File uploads are deceptively complex: large files (5GB+), flaky networks, deduplication for cost, lifecycle for compliance. S3 sets the API standard most others emulate. The interesting design choices: multipart strategy, dedup level, and lifecycle transitions.
Multipart upload
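Files above roughly 100 MB are split into parts. Typical part sizes are 5-100 MB; S3 itself allows 5 MiB to 5 GiB per part (the last part may be smaller) and at most 10,000 parts per upload. Parts upload in parallel and retry independently. Flow: CreateMultipartUpload (named InitiateMultipartUpload in older documentation and some SDKs) → UploadPart × N → CompleteMultipartUpload. A network blip retries only the failed part, not the whole file.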
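A minimal sketch of the splitting step, assuming S3-style limits (5 MiB minimum part, 10,000 parts maximum). The function name `plan_parts` and the 16 MiB default part size are illustrative choices, not part of any SDK.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 16 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    smallest_allowed = -(-file_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)  # last part may be short
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple maps to one UploadPart call: read `length` bytes at `offset` and send them under `part_number`. Because the parts are independent, they can be handed to a worker pool and retried one at a time.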
Resumable uploads
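Client tracks completed parts. On reconnect, ask the service which parts it already holds (ListParts) and upload only the missing ones. Works across days, but expiry is service-specific: S3 keeps an incomplete multipart upload, and bills for its stored parts, until it is completed or aborted. Set a lifecycle rule (AbortIncompleteMultipartUpload) to clean up abandoned uploads, for example after 7 days. Critical for mobile and large files.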
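The resume logic can be sketched without any real storage service. `FakeUploadSession` below is a hypothetical in-memory stand-in for the server side; its `list_parts` plays the role of ListParts and `upload_part` the role of UploadPart.

```python
def missing_parts(total_parts: int, uploaded: set[int]) -> list[int]:
    """Part numbers (1-based) that the service does not hold yet."""
    return [n for n in range(1, total_parts + 1) if n not in uploaded]


class FakeUploadSession:
    """In-memory stand-in for a server-side multipart upload (hypothetical)."""

    def __init__(self) -> None:
        self.parts: dict[int, bytes] = {}

    def list_parts(self) -> set[int]:
        return set(self.parts)

    def upload_part(self, number: int, data: bytes) -> None:
        self.parts[number] = data


def resume_upload(session: FakeUploadSession, chunks: list[bytes]) -> list[int]:
    """Upload only the chunks the session lacks; return the part numbers sent."""
    todo = missing_parts(len(chunks), session.list_parts())
    for number in todo:
        session.upload_part(number, chunks[number - 1])
    return todo
```

This sketch trusts the part number alone. A real client would likely also compare each listed part's size or checksum against the local file before skipping it, in case the file changed between attempts.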
Deduplication
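Hash the file (or each part) with SHA-256. Store one copy keyed by hash. Multiple logical uploads of the same file point to the same physical bytes, so deletes need reference counting: the bytes go away only when the last reference does. Can save 30-70% on storage for typical user content, though the actual figure depends on how much of the data is duplicated. Adds CPU cost for hashing, usually worth it.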
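A minimal sketch of whole-file deduplication as a content-addressed store with reference counts. `DedupStore` and its in-memory dictionaries are illustrative only; a real system would keep the index in a database and the blobs in object storage.

```python
import hashlib


class DedupStore:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # content hash -> physical bytes
        self.refcount: dict[str, int] = {}   # content hash -> logical references
        self.objects: dict[str, str] = {}    # logical key -> content hash

    def put(self, key: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        previous = self.objects.get(key)
        if previous == digest:
            return digest  # same key, same content: nothing to do
        if digest not in self.blobs:
            self.blobs[digest] = data  # first copy of this content
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.objects[key] = digest
        if previous is not None:
            self._release(previous)  # key was overwritten with new content
        return digest

    def get(self, key: str) -> bytes:
        return self.blobs[self.objects[key]]

    def delete(self, key: str) -> None:
        self._release(self.objects.pop(key))

    def _release(self, digest: str) -> None:
        # Drop the physical bytes only when no logical object points at them.
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]
```

Keying by part hash instead of file hash follows the same pattern, with each logical object mapping to a list of digests rather than one.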
Lifecycle policies
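Hot tier (last 30d): SSD-backed, immediate access. Warm (30d–1yr): HDD, slightly slower. Cold (>1yr): tape or archive tier, retrieval in minutes to hours. S3 Glacier is the model. Auto-transition can save around 80% on long-term cost. Note that S3 lifecycle rules key on object age (days since creation); for transitions driven by last access, S3 Intelligent-Tiering is the model.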
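The tier boundaries above reduce to a small decision function. This is a sketch using the 30-day and 1-year thresholds from the text; `tier_for` is a hypothetical name, and `reference_time` stands for whichever timestamp the policy keys on (creation time or last access).

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def tier_for(reference_time: datetime, now: datetime) -> str:
    """Pick a storage tier from the time elapsed since reference_time."""
    age = now - reference_time
    if age <= HOT_WINDOW:
        return "hot"    # SSD-backed, immediate access
    if age <= WARM_WINDOW:
        return "warm"   # HDD, slightly slower
    return "cold"       # archive tier, slow retrieval


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(tier_for(now - timedelta(days=3), now))
print(tier_for(now - timedelta(days=90), now))
print(tier_for(now - timedelta(days=800), now))
```

In a managed service this logic lives in the lifecycle configuration rather than in application code; the function is mainly useful for a self-hosted store or for estimating how much data would land in each tier.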
Pre-signed URLs
Client uploads/downloads directly to/from object storage using a pre-signed URL, so your app server never proxies the bytes. Massive bandwidth savings and lower latency. Sign with limited scope: one object, one HTTP method, short TTL.