EC2 and S3 image server

I'm creating an image upload service using EC2 and S3.
The user uploads an image to EC2 using PHP; EC2 uploads it to S3 and then responds to the user with the image link.
I was wondering how fast the upload between EC2 and S3 in the same region is.
Would it be better to store the image temporarily on EC2, respond to the user first, and upload to S3 later, or to wait for the upload to finish before responding to the user?

"I was wondering how fast the upload between EC2 and S3 in the same region is"
It's fast. Test it.
You should find that you can upload the image to S3 very quickly, then return the S3 URL to the client, where they'll immediately be able to fetch the image.
Caveat: if you are overwriting an S3 object at the same path, rather than creating a new object, there can be a delay after you upload the object before the new version is consistently returned for every request. This delay is unlikely, but possible, due to the eventual consistency model of S3. Deletes work the same way -- a deleted object may still be fetchable, briefly, before requests to S3 return 404 or 403.
See "What is maximum Amazon S3 replication time on file upload?" and note the change you should make to the endpoint if you're working in the US Standard (us-east-1) region to ensure immediate consistency.
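The question's service is written in PHP, but as a rough, language-agnostic illustration of the flow described above (receive the upload on the instance, push it to S3 in the same region, return the object URL), here is a minimal Python/boto3 sketch. The bucket name, key prefix, and region are assumptions, not values from the original post.

```python
# Minimal sketch (boto3): push an uploaded image to S3 and return its URL.
# Bucket name, key prefix, and region are illustrative assumptions.
import uuid
import boto3

BUCKET = "example-image-bucket"   # assumed bucket name
REGION = "us-east-1"              # assumed region, same as the EC2 instance

s3 = boto3.client("s3", region_name=REGION)

def store_image(fileobj, content_type="image/jpeg"):
    """Upload a file-like object to S3 and return the object URL."""
    # A fresh key per upload also sidesteps the overwrite-consistency caveat above.
    key = f"uploads/{uuid.uuid4().hex}.jpg"
    s3.upload_fileobj(fileobj, BUCKET, key, ExtraArgs={"ContentType": content_type})
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{key}"
```

Timing that call from an instance in the same region is the quickest way to answer the "how fast is it" question for your own object sizes.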

It will be plenty fast; the latency between the user and your EC2 instance will be much higher than the latency between EC2 and S3.
On the other hand, if EC2 is not doing anything to the image before uploading it to S3, why not upload it directly to S3?
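If you do go the direct-to-S3 route, the usual pattern is to have the server hand the browser a short-lived presigned POST so the image bytes never pass through EC2 at all. A minimal boto3 sketch, with an assumed bucket name, key prefix, and size cap:

```python
# Minimal sketch: generate a presigned POST so the browser can upload
# straight to S3. Bucket name, key prefix, and size limit are assumptions.
import uuid
import boto3

s3 = boto3.client("s3")

def presigned_upload(bucket="example-image-bucket", expires=300):
    key = f"uploads/{uuid.uuid4().hex}.jpg"
    post = s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Conditions=[["content-length-range", 0, 10 * 1024 * 1024]],  # cap at 10 MB
        ExpiresIn=expires,
    )
    # post["url"] and post["fields"] go back to the browser, which POSTs the
    # file to that URL with those fields as form data.
    return post
```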

Related

How does the data flow work between S3 and CloudFront in AWS when an S3 object is accessed/updated from multiple regions?

An S3 bucket is in Region 1 and is served through CloudFront.
1. A user from Region 2 pulls the file from the S3 bucket for the first time through the edge server, makes some changes, and uploads it.
2. After that, a user from Region 1 makes changes to the same file in the bucket and uploads the changes.
Now, when users from Region 2 pull the file from the S3 bucket again, how does the data flow work? Does the edge server check with the origin (the S3 bucket) at the time of the Region 2 user's request, or is the cached file on the edge server updated automatically?
CloudFront does not check whether your original S3 object has been updated or not. Cached objects stay in CloudFront for as long as they are valid, or until CloudFront evicts them because they are not popular enough to keep.
If you want to force CloudFront to discard old files and start serving new ones, you have to invalidate the files yourself, or use one of the recommended strategies for managing this, such as versioned file names.
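For the invalidation route, a minimal boto3 sketch (the distribution ID and path are placeholders, not values from the question):

```python
# Minimal sketch: force CloudFront to drop a cached object by creating an
# invalidation. Distribution ID and path are illustrative assumptions.
import time
import boto3

cloudfront = boto3.client("cloudfront")

def invalidate(path, distribution_id="E1EXAMPLE12345"):
    return cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": [path]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )

# invalidate("/shared/file.txt")
```

Versioned file names avoid both the invalidation request and the wait for it to propagate, which is why they are usually the recommended strategy for frequently changing objects.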

Workaround for handling a CPU-intensive task on AWS EC2?

I have created a Django application (running on AWS EC2) which converts media files from one format to another, but this process consumes CPU, and I have to pay AWS for that usage.
I am trying to find a workaround where my local PC (Ubuntu) takes care of the CPU-intensive task and the final result is uploaded to an S3 bucket, which I can share with the user.
Possible solution: when the user uploads a media file (HTML upload form) it goes to the S3 bucket, and at the same time the S3 file link is sent over a socket connection to my Ubuntu machine, which downloads the file, processes it, and uploads the result back to the S3 bucket.
Could anyone please suggest a better solution, as this does not seem efficient?
Please note: I have a decent internet connection and a computer that can handle the backend very well, but I am not in a position to pay the compute charges to AWS.
The best solution for this is to create a separate Lambda function for this task. Trigger the Lambda whenever someone uploads a file to S3. The Lambda will process the file and store the result back to S3.
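A minimal sketch of that Lambda, assuming an s3:ObjectCreated trigger and an output bucket; convert_media() stands in for whatever conversion the Django app does today and is not real code from the question:

```python
# Minimal sketch: Lambda triggered by s3:ObjectCreated downloads the media
# file, converts it, and uploads the result to an output bucket.
# OUTPUT_BUCKET and convert_media() are illustrative assumptions.
import os
import urllib.parse
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "example-converted-media")

def convert_media(src_path, dst_path):
    raise NotImplementedError  # placeholder for the actual conversion step

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        src = f"/tmp/{os.path.basename(key)}"
        dst = src + ".out"
        s3.download_file(bucket, key, src)
        convert_media(src, dst)
        s3.upload_file(dst, OUTPUT_BUCKET, key + ".out")
```

Keep in mind that Lambda caps execution time at 15 minutes and /tmp space is limited, so very large media conversions may still need an EC2 or local worker.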

Connecting S3 - Lambda - EC2 - Elasticsearch

In my project, users upload images into an S3 bucket. I have created a TensorFlow ResNet model to interpret the contents of each image. Based on the TensorFlow interpretation, the data is to be stored in an Elasticsearch instance.
For this, I have created an S3 bucket, a Lambda function that gets triggered when an image is uploaded, and an AWS Elasticsearch instance. Since my TensorFlow models are large, I have zipped them, put them in an S3 bucket, and supplied the S3 URL to Lambda.
Issue: since my unzipped files were larger than 266 MB, I could not get the Lambda function working.
Alternative approach: instead of an S3 bucket, I am thinking of creating an EC2 instance with a larger volume to store the images, and receiving the images directly on the EC2 instance instead of S3. However, since I will be receiving millions of images within a year, I am not sure this will be scalable.
I can think of two approaches here:
You side-load the app: the Lambda can be a small bootstrap script that downloads your app from S3 and unzips it. This is a popular pattern in serverless frameworks. You pay for this during a cold start of the Lambda, so you will need to keep it warm in a production environment.
You store the images in S3 itself and create an event on image upload with SQS as the destination. Then an EC2 instance can periodically poll SQS for new messages and process them with your TensorFlow models.
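A minimal sketch of the second approach, with the queue URL, run_resnet(), and index_result() as assumed placeholders rather than anything from the question:

```python
# Minimal sketch: an EC2 worker polls an SQS queue fed by S3 "object created"
# notifications, runs the model on each image, and indexes the result.
# QUEUE_URL, run_resnet(), and index_result() are illustrative assumptions.
import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/image-events"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

def run_resnet(local_path):
    raise NotImplementedError  # load the TensorFlow model once, reuse per image

def index_result(key, result):
    raise NotImplementedError  # write the interpretation to Elasticsearch

def poll_forever():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                local = "/tmp/" + key.replace("/", "_")
                s3.download_file(bucket, key, local)
                index_result(key, run_resnet(local))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

This keeps S3 as the image store (so it scales to millions of objects) while the heavy model stays on an instance you size yourself.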

Is there any way to upload 50000 image files to Amazon S3 Bucket from a list of URLs

Is there any way to upload 50,000 image files to an Amazon S3 bucket? The 50,000 image file URLs are saved in a .txt file. Can someone please tell me a good way to do this?
It sounds like your requirement is: For each image URL listed in a text file, copy the images to an Amazon S3 bucket.
There is no in-built capability with Amazon S3 to do this. Instead, you would need to write an app that:
Reads the text file and, for each URL
Downloads the image
Uploads the image to Amazon S3
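A minimal sketch of such an app, assuming the list lives in urls.txt and an example bucket name:

```python
# Minimal sketch: read image URLs from a text file, fetch each image, and
# store it in S3. The bucket name and urls.txt path are assumptions.
import os
import urllib.request
import boto3

BUCKET = "example-image-bucket"   # assumed bucket name

s3 = boto3.client("s3")

with open("urls.txt") as f:
    for url in (line.strip() for line in f):
        if not url:
            continue
        key = "images/" + os.path.basename(url)
        with urllib.request.urlopen(url) as resp:
            s3.put_object(Bucket=BUCKET, Key=key, Body=resp.read())
```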
Doing this on an Amazon EC2 instance would be the fastest, due to low latency between S3 and EC2.
You could also get fancy and do it via Amazon EMR. That would be faster still due to parallel processing, but would require knowledge of how to use Hadoop.
If you have a local copy of the images, you could order an AWS Snowball and use it to transfer the files to Amazon S3. However, it would probably be faster just to copy the files over the Internet (rough guess... at 1MB per file, total volume is 50GB).

Amazon Beanstalk, S3 and local files

I have a web application in PHP that accepts image uploads from a user interface (sent using some javascript). The PHP application processes the image and saves to disk several different versions and in different formats and resolutions.
Now I'm trying to integrate Amazon S3 into this application.
1) At which point do I actually save the file to S3?
2) Should I only do it at the end, to store the final versions, and in the meantime save temporary versions on the EC2 instance, or should I never save to the EC2 instance at all?
3) One of my main worries: let's say the user uploads the file but does not press Save, which is the step that would actually store it to Amazon S3, and the load increases before Save is pressed. Is there a chance that, by the time the user presses Save, they could end up on a different instance where the local image does not exist?
This is probably not the best solution, but it is at least worth mentioning: you could mount the S3 bucket as a local folder on your server (using RioFS, for example), and when a file is ready to be uploaded to S3, copy it into that folder; RioFS will automatically upload it to the remote S3 bucket.
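Assuming the bucket has already been mounted (with RioFS or a similar FUSE tool) at some path, the application then only has to copy its finished renditions into the mount; a small Python sketch with assumed paths:

```python
# Minimal sketch: publish processed image versions by copying them into an
# S3-backed mount point. Both paths below are illustrative assumptions.
import shutil
from pathlib import Path

MOUNT_POINT = Path("/mnt/s3-bucket")        # assumed S3 mount location
PROCESSED_DIR = Path("/var/app/processed")  # assumed output dir of the PHP app

def publish(filename: str) -> Path:
    src = PROCESSED_DIR / filename
    dst = MOUNT_POINT / "images" / filename
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # the mount handles the actual upload to the bucket
    return dst
```

Note that this does not address question 3 above: if instances can be swapped out while a user is mid-flow, the safest pattern is still to get files off local disk and into S3 as soon as they arrive.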