AWS S3 max file and upload sizes - amazon-web-services

AWS S3 documentation says:
Individual Amazon S3 objects can range in size from a minimum of 0
bytes to a maximum of 5 terabytes. The largest object that can be
uploaded in a single PUT is 5 gigabytes.
How do I store a file of size 5TB if I can only upload a file of size 5GB?

According to the documentation here, you should use multipart uploads:
Upload objects in parts—Using the multipart upload API, you can upload
large objects, up to 5 TB.
The multipart upload API is designed to
improve the upload experience for larger objects. You can upload
objects in parts. These object parts can be uploaded independently, in
any order, and in parallel. You can use a multipart upload for objects
from 5 MB to 5 TB in size.
Here is a list of the APIs and an example of how to use each one.
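To make the flow concrete, here is a minimal sketch of the multipart API calls using Python's boto3 (the bucket, key, file path, and part size are placeholders, not anything from the question):

import boto3

s3 = boto3.client("s3")
bucket, key, path = "my-bucket", "big-file.bin", "/tmp/big-file.bin"  # placeholders
part_size = 100 * 1024 * 1024  # each part must be at least 5 MB, except the last

# 1) Start the multipart upload and keep its id
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# 2) Upload the file piece by piece; parts could also be sent in parallel or out of order
parts = []
with open(path, "rb") as f:
    part_number = 1
    while True:
        data = f.read(part_size)
        if not data:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=upload_id, Body=data)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# 3) Ask S3 to assemble the parts into a single object (up to 5 TB total)
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})

In practice, the SDKs' higher-level helpers (for example boto3's upload_file, or the transfer utilities in other SDKs) switch to this multipart flow automatically once a file crosses a configurable size threshold.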

Related

What is the best way to upload multiple images in aws s3

I am creating a website where users can upload a maximum of 15 images. I store the images and resized copies (produced with an AWS Lambda function) in S3, but if I send the images to S3 one by one, it will be too expensive on the S3 bill. Should I zip them into a folder, send the archive to S3, and then unzip and resize them in AWS? Thanks for any answers.
I am using React with Spring Boot.
Storing the images as a zip will reduce the per-request cost for S3 and save a little space (although zipping may not shrink images much, depending on the compression used).
For every user upload (approximate cost as per the AWS pricing page):
Not storing as a zip: $0.000005 per PUT request × 15 + $0.000004 per GET request × 15
Storing as a zip: $0.000005 × 1 + $0.000004 × 1
Another option: why not resize the images in Lambda (as an asynchronous call) and then store the results directly in S3? Whether this is worthwhile depends on how long your function runs, which incurs additional Lambda cost.
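As a rough illustration of the zip idea (the question uses React/Spring Boot, so this Python/boto3 sketch with placeholder names is only meant to show the shape of it):

import io
import zipfile
import boto3

s3 = boto3.client("s3")

def upload_images_as_zip(image_paths, bucket, key):
    # Pack the user's images into one archive so the upload is a single PUT
    # request instead of one PUT per image.
    buffer = io.BytesIO()
    # Images are already compressed, so ZIP_STORED skips pointless re-compression.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
        for path in image_paths:
            zf.write(path, arcname=path.rsplit("/", 1)[-1])
    buffer.seek(0)
    s3.upload_fileobj(buffer, bucket, key)

# hypothetical usage
upload_images_as_zip(["cat.jpg", "dog.jpg"], "my-bucket", "uploads/user-123/images.zip")

The trade-off is that anything downstream (the Lambda resizer, the website) now has to unzip the archive before it can work with the individual images.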

Upload slow on S3 Bucket for large number of files

I am uploading a large number of images to an AWS S3 bucket via the S3 API from my local machine. 800 images get uploaded in 15 minutes, but 8,000 images take close to 24 hours. Each image is around 10 MB. The time taken to upload grows disproportionately with the number of files.
Transfer Acceleration didn't help, as I am close to the data center location, and multipart upload is mostly recommended for larger files (>100 MB). What am I missing here?
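For illustration only: with many objects of this size, the usual way to cut total wall-clock time is to keep several uploads in flight at once rather than uploading sequentially. A minimal sketch with boto3 and a thread pool (bucket name, paths, and worker count are placeholders):

import glob
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

def upload_one(path):
    # Each ~10 MB image is a single PUT; the key layout here is arbitrary.
    s3.upload_file(path, bucket, "images/" + path.rsplit("/", 1)[-1])

# Sequential uploads leave the connection idle between requests; a modest
# thread pool keeps several uploads in flight at once.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(upload_one, glob.glob("./images/*.jpg")))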

Intermediate blob storage before uploading to S3

I am designing a service that receives large amounts of binary data (proprietary images). I want to accumulate these multiple images into a single S3 object and finally upload it to S3. It doesn't make sense to keep the accumulation in memory, because I want to be able to scale the service horizontally by running multiple instances. And since S3 objects are "replace-only", I can't keep re-uploading the object to S3 as it grows. In such a case, what is the best option for storing this intermediate data before uploading to S3? One option I am considering is Redis, but it limits the size of a value to 512 MB.

Is AWS Cloudsearch Scalable?

I have 500 MB worth of data to push to CloudSearch.
Here are the options I have tried:
Upload directly from the console:
Tried to upload the file; there is a 5 MB limitation.
So I uploaded the file to S3 and selected the S3 option.
Upload to S3 and give the S3 URL in the console:
Fails and asks me to try the command line.
Tried with the command line:
aws cloudsearchdomain upload-documents --endpoint-url http://endpoint
--content-type application/json --documents s3://bucket/cs.json
Error parsing parameter '--documents': Blob values must be a path to a file.
OK, so I copied the file from S3 to local and tried to upload again.
Tried with a local file and the CLI:
aws cloudsearchdomain upload-documents --endpoint-url http://endpoint
--content-type application/json --documents ./cs.json
Connection was closed before we received a valid response from endpoint URL: "http://endpoint/2013-01-01/documents/batch?format=sdk".
Is there any way to get CloudSearch to work?
As I understand it, this question is not about the scalability of CloudSearch, as the title suggests, but about the upload limitations and how to load a large file into Amazon CloudSearch.
The best solution is to chunk your data: break your documents into batches and upload them batch by batch (but keep in mind the limitations listed below).
The advantage of this is that if you have multiple documents to submit, you submit them together in a single call rather than always submitting batches of size one. AWS recommends grouping documents (up to 5 MB per batch) and sending them in one call. Each 1,000 batch calls cost about $0.10, I believe, so grouping also saves you some money.
This worked for me. Below are a few guidelines that help tackle the problem.
Guidelines to follow when uploading data into Amazon CloudSearch:
Group documents into batches before you upload them. Continuously uploading batches that consist of only one document has a huge, negative impact on the speed at which Amazon CloudSearch can process your updates. Instead, create batches that are as close to the limit as possible and upload them less frequently. (The limits are explained below)
To upload data to your domain, it must be formatted as a valid JSON or XML batch.
Now, here are the limits Amazon CloudSearch places on file uploads:
1) Batch size: the maximum batch size is 5 MB.
2) Document size: the maximum document size is 1 MB.
3) Document fields: documents can have no more than 200 fields.
4) Data loading volume: you can load one document batch every 10 seconds (approximately 10,000 batches every 24 hours), with each batch up to 5 MB.
If you wish to increase the limits, you can contact Amazon CloudSearch support; at the moment, however, Amazon does not allow the upload size limits to be increased.
You can submit a request if you need to increase the maximum number of
partitions for a search domain. For information about increasing other
limits such as the maximum number of search domains, contact Amazon
CloudSearch.
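To make the batching concrete, here is a rough sketch in Python with boto3 that groups documents into batches under the 5 MB limit and submits each batch through the domain's document service endpoint (the endpoint URL and document shape are placeholders):

import json
import boto3

# Document service endpoint of your search domain (placeholder)
client = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-mydomain-xxxxxxxx.us-east-1.cloudsearch.amazonaws.com",
)

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB batch limit

def send_batch(batch):
    client.upload_documents(documents=json.dumps(batch).encode("utf-8"),
                            contentType="application/json")

def upload_in_batches(docs):
    # docs: iterable of SDF "add" operations, e.g. {"type": "add", "id": "1", "fields": {...}}
    batch, size = [], 2  # 2 bytes for the enclosing "[]"
    for doc in docs:
        encoded = json.dumps(doc)
        if batch and size + len(encoded) + 1 > MAX_BATCH_BYTES:
            send_batch(batch)
            batch, size = [], 2
        batch.append(doc)
        size += len(encoded) + 1  # +1 for the separating comma
    if batch:
        send_batch(batch)

Given the data loading limit above, you may also want to pace the calls to roughly one batch every 10 seconds.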

AWS S3 TransferUtility.UploadDirectoryRequest and HTTP PUT Limit

Here is a little background. I have designed a Web API which provides methods for cloud operations (upload, download, ...). This API internally calls AWS API methods to fulfill these cloud operation requests.
WebDev solution --> WEB API --> AWS API
I am trying to upload a directory to AWS S3. This directory contains large individual files of more than 5 GB each. Amazon S3 has a limit of 5 GB for a single PUT operation, but it provides a multipart upload mechanism with which files of up to 5 TB can be uploaded.
The AWS documentation says the TransferUtility.UploadDirectory method will use the multipart upload mechanism to upload large files, so in my Web API a [PUT] UploadDirectoryMethod method calls TransferUtility.UploadDirectory to upload the directory.
I am receiving the error "Amazon.S3.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size".
Shouldn't TransferUtility.UploadDirectory take care of breaking the larger objects (larger than 5 GB) into parts and then uploading them with multiple PUT operations?
How does multipart upload work internally for objects more than 5 GB in size? Does it create multiple PUT requests internally?