I am creating a website where users can upload a maximum of 15 images. I store the original images and resized copies (produced by an AWS Lambda function) in Amazon S3, but if I send the images to S3 one by one, it will be too expensive on my S3 bill. Should I zip them into one archive, send that to S3, and then unzip and resize them in AWS? Thanks for any answers.
I am using React and Spring Boot.
Storing the images in a ZIP will help you reduce the per-request cost for S3 and save a little space (though ZIP compression usually does not shrink already-compressed image formats by much).
For every user upload, the approximate cost (as per the AWS pricing page) is:
Not storing as ZIP: $0.000005 per PUT request x 15 + $0.000004 per GET request x 15
Storing as ZIP: $0.000005 x 1 + $0.000004 x 1
Another option: why not resize the images in Lambda (invoked asynchronously) and then store the results directly in S3? Whether this is worthwhile depends on how long your function runs, since execution time incurs additional Lambda cost.
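A minimal sketch of the ZIP approach, shown in Python with boto3 for brevity (the bucket and key names are hypothetical; the same idea applies from Spring Boot with the AWS SDK for Java):

import io
import os
import zipfile
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

def upload_images_as_zip(image_paths, bucket, key):
    # Bundle up to 15 images into one in-memory ZIP and upload it in a single request.
    buffer = io.BytesIO()
    # ZIP_STORED skips recompression; JPEG/PNG data is already compressed anyway
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
        for path in image_paths:
            zf.write(path, arcname=os.path.basename(path))
    buffer.seek(0)
    # One PUT instead of 15 (very large archives may be split into multipart uploads)
    s3.upload_fileobj(buffer, bucket, key)

# Hypothetical usage:
# upload_images_as_zip(["a.jpg", "b.jpg"], "my-upload-bucket", "uploads/user123/batch.zip")

The Lambda that resizes the images would then download this single object, unzip it in /tmp, and write the resized copies back to S3.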
I have an AWS S3 bucket with 10,000 files totalling around 1 GB in size.
I call:
aws s3 sync <remote bucket> <local folder> --exact-timestamps
And no files are found to be changed, so no actual file downloads take place.
However, there must be data exchange for the sync - does anyone know how much?
The ListObjects() API call returns a maximum of 1000 objects.
Therefore, the AWS CLI would require at least 10 API calls to retrieve information about 10,000 objects to determine whether files need to be synced.
However, since these requests cost only $0.005 per 1,000, the total cost would be quite small.
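As a rough illustration (a Python/boto3 sketch with a hypothetical bucket name, not what the CLI literally runs), the listing work behind the sync is just pagination:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")  # each page returns at most 1,000 keys

pages = 0
for page in paginator.paginate(Bucket="my-bucket"):  # hypothetical bucket name
    pages += 1

# 10,000 objects -> 10 LIST requests; LIST requests are billed at $0.005 per 1,000
print(pages, "LIST requests, approx.", pages * 0.005 / 1000, "USD")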
In my project, users upload images to an S3 bucket. I have created a TensorFlow ResNet model to interpret the contents of each image. Based on the TensorFlow interpretation, the data is to be stored in an Elasticsearch instance.
For this, I have created an S3 bucket, a Lambda function that is triggered when an image is uploaded, and an AWS Elasticsearch instance. Since my TF models are large, I zipped them, put them in an S3 bucket, and gave the S3 URL to Lambda.
Issue: since my unzipped files were larger than 266 MB, I could not complete the Lambda function.
Alternative approach: instead of an S3 bucket, I am thinking of creating an EC2 instance with a larger volume to store the images, and receiving the images directly on the EC2 instance instead of in S3. However, since I will be receiving millions of images within a year, I am not sure this will scale.
I can think of two approaches here:
You side-load the app. The Lambda can be a small bootstrap script that downloads your app from S3 and unzips it (a rough sketch follows after this list). This is a popular pattern in serverless frameworks. You pay for this during a cold start of the Lambda, so you will need to keep it warm in a production environment.
You can store the images in S3 itself and create an event on image upload with SQS as the destination. Then an EC2 instance can periodically poll the SQS queue for new messages and process the images using your TF models.
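A minimal sketch of the first approach, assuming Python and boto3, with hypothetical bucket, key, and path names; note it only helps if the unzipped model fits within Lambda's /tmp storage limit:

import os
import zipfile
import boto3

s3 = boto3.client("s3")
MODEL_BUCKET = "my-model-bucket"   # hypothetical names
MODEL_KEY = "models/resnet.zip"
MODEL_DIR = "/tmp/model"           # Lambda's only writable scratch space

def load_model():
    # Download and unzip the model on cold start; warm invocations reuse /tmp.
    if not os.path.exists(MODEL_DIR):
        os.makedirs(MODEL_DIR)
        archive = "/tmp/model.zip"
        s3.download_file(MODEL_BUCKET, MODEL_KEY, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(MODEL_DIR)
        os.remove(archive)         # free /tmp space before loading the model itself
    # load the TensorFlow model from MODEL_DIR here

def handler(event, context):
    load_model()
    # ... run inference on the uploaded image and index the result in Elasticsearch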
I have 500 MB worth of data to push to CloudSearch.
Here are the options I have tried:
Upload directly from the console:
Tried to upload the file; there is a 5 MB limitation.
Upload to S3 and give the S3 URL in the console:
Uploaded the file to S3 and selected the S3 option; this fails and asks to try the command line.
Tried with the command line:
aws cloudsearchdomain upload-documents --endpoint-url http://endpoint
--content-type application/json --documents s3://bucket/cs.json
Error parsing parameter '--documents': Blob values must be a path to a file.
OK, copied the file from S3 to local and tried to upload.
Tried with a local file and the CLI:
aws cloudsearchdomain upload-documents --endpoint-url http://endpoint
--content-type application/json --documents ./cs.json
Connection was closed before we received a valid response from endpoint URL: "http://endpoint/2013-01-01/documents/batch?format=sdk".
Any way to get CloudSearch to work?
As I understand the question, this is not about the scalability of CloudSearch (as the question header suggests), but about the limitations of uploading and how to upload a large file into Amazon CloudSearch.
The best solution is to upload the data in chunks: break your documents into batches and upload them batch by batch (but keep in mind the associated limitations listed below).
The advantage is that if you have multiple documents to submit, you submit them together in a single call rather than always submitting batches of size 1. AWS recommends grouping documents (up to 5 MB per batch) and sending each group in one call. Every 1,000 batch calls costs you about $0.10, I think, so grouping also saves you some money.
This worked for me. Given below are a few guidelines to help tackle the problem better.
Guidelines to follow when uploading data into Amazon CloudSearch:
Group documents into batches before you upload them. Continuously uploading batches that consist of only one document has a huge, negative impact on the speed at which Amazon CloudSearch can process your updates. Instead, create batches that are as close to the limit as possible and upload them less frequently. (The limits are explained below.)
To upload data to your domain, it must be formatted as a valid JSON or XML batch.
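As an illustration of the batching guideline, here is a rough Python/boto3 sketch; the endpoint URL is hypothetical, and the documents are assumed to already be in CloudSearch's JSON batch (SDF) format:

import json
import boto3

# hypothetical document service endpoint of your search domain
client = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com")

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB batch limit (each document must stay under 1 MB)

def upload_in_batches(documents):
    # documents: list of dicts like {"type": "add", "id": "...", "fields": {...}}
    batch, size = [], 2            # 2 bytes for the surrounding "[]"
    for doc in documents:
        encoded = json.dumps(doc)
        if batch and size + len(encoded) + 1 > MAX_BATCH_BYTES:
            _send(batch)
            batch, size = [], 2
        batch.append(encoded)
        size += len(encoded) + 1   # +1 for the comma separator
    if batch:
        _send(batch)

def _send(batch):
    body = "[" + ",".join(batch) + "]"
    client.upload_documents(documents=body.encode("utf-8"),
                            contentType="application/json")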
Now, let me explain the upload-related limitations of Amazon CloudSearch.
1) Batch size: the maximum batch size is 5 MB.
2) Document size: the maximum document size is 1 MB.
3) Document fields: documents can have no more than 200 fields.
4) Data loading volume: you can load one document batch every 10 seconds (approximately 10,000 batches every 24 hours), with each batch up to 5 MB.
If you wish to increase these limits, you can contact Amazon CloudSearch. At the moment, however, Amazon does not allow the upload size limits to be increased:
You can submit a request if you need to increase the maximum number of partitions for a search domain. For information about increasing other limits such as the maximum number of search domains, contact Amazon CloudSearch.
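Applied to the 500 MB in the question (a rough estimate, assuming batches packed close to the 5 MB limit): 500 MB / 5 MB ≈ 100 batches; at roughly one batch every 10 seconds that is about 100 x 10 s ≈ 17 minutes of uploading, and around 100 / 1,000 x $0.10 = $0.01 in batch upload charges.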
Is there any way to upload 50,000 image files to an Amazon S3 bucket? The 50,000 image file URLs are saved in a .txt file. Can someone please tell me a better way to do this?
It sounds like your requirement is: for each image URL listed in a text file, copy the image to an Amazon S3 bucket.
There is no built-in capability in Amazon S3 to do this. Instead, you would need to write an app (a rough sketch follows the list) that:
Reads the text file and, for each URL
Downloads the image
Uploads the image to Amazon S3
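A rough sketch of such an app in Python, using requests and boto3 (the bucket name and the key-naming scheme are assumptions):

import boto3
import requests

s3 = boto3.client("s3")
BUCKET = "my-image-bucket"  # hypothetical bucket name

def copy_urls_to_s3(url_file):
    # Read one image URL per line, download it, and upload it to S3 under its file name.
    with open(url_file) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            key = url.rsplit("/", 1)[-1]   # naive key: the last path segment of the URL
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            s3.put_object(Bucket=BUCKET, Key=key, Body=resp.content)

# copy_urls_to_s3("image_urls.txt")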
Doing this on an Amazon EC2 instance would be the fastest, due to low latency between S3 and EC2.
You could also get fancy and do it via Amazon EMR. It would be even faster due to parallel processing, but would require knowledge of how to use Hadoop.
If you have a local copy of the images, you could order an AWS Snowball and use it to transfer the files to Amazon S3. However, it would probably be faster just to copy the files over the Internet (rough guess: at 1 MB per file, the total volume is about 50 GB).
I am using CloudFront-backed S3 to store lots (sometimes gigs) of images and videos on client sites, and for developers to debug issues sometimes you just need the whole set of images.
We use awscli to sync files down, and it works fine.
But if, instead of pulling from S3, I could pull from the CloudFront URL, the download would be much faster and would use less of our outbound S3 data transfer.
Is there an easy way to do this? Maybe:
A command or flag I just don't know about?
Rewriting the S3 url on the fly?
Outputting a list of files that would be downloaded, so I can script curling them?
Using the cf command to do something?
Amazon S3 outbound data transfer pricing (to the Internet) is free for the first 1 GB, then $0.09 per GB for the first 10 TB. Amazon CloudFront pricing ranges from $0.085 per GB for the first 10 TB (US) up to $0.25 per GB (South America).
Even using CloudFront caching, you are only going to save $0.005 per GB. Let's say you transfer 100 GB per month: you would save $0.41 (41 cents) by downloading from CloudFront instead of S3. One benefit of CloudFront is usually faster downloads.
You could write a program, for example in Python, that lists the objects in an S3 bucket or directory and generates the equivalent CloudFront path for each. Or use a simple search-and-replace script on the URLs.
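For example, a rough Python/boto3 sketch (the bucket and CloudFront domain names are hypothetical):

import boto3

s3 = boto3.client("s3")
BUCKET = "my-media-bucket"                      # hypothetical names
CLOUDFRONT_DOMAIN = "d1234abcd.cloudfront.net"  # the distribution in front of the bucket

def print_cloudfront_urls(prefix=""):
    # List every object under the prefix and print its CloudFront equivalent.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            print(f"https://{CLOUDFRONT_DOMAIN}/{obj['Key']}")

# The output could then be fed to a downloader, e.g. wget -i urls.txt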
Firstly, downloading data from CloudFront can be more expensive, because that traffic goes over HTTP/HTTPS to the Internet, whereas downloading from S3 can stay local to AWS (if you create an endpoint).
Also, if you are thinking of downloading lots of images via CloudFront, do not assume it will be faster: for each first request, the image will not be found in CloudFront's cache, and the request will go back to the origin, which is S3.
Hence, you will end up paying more money and getting worse latency.