Workaround for handling CPU-intensive tasks on AWS EC2? - django

I have created a Django application (running on AWS EC2) that converts media files from one format to another. The conversion is CPU-intensive, which drives up the charges I have to pay AWS.
I am trying to find a workaround where my local PC (Ubuntu) takes care of the CPU-intensive work and the final result is uploaded to an S3 bucket that I can share with the user.
Solution: one possible approach is that when a user uploads a media file (via an HTML upload form) it goes to an S3 bucket, and at the same time the link to the S3 object is sent over a socket connection to my Ubuntu machine, which downloads the file, processes it and uploads it back to the S3 bucket.
Could anyone please suggest a better solution, as this does not seem efficient?
Please note: I have a decent internet connection and a computer that can handle the backend very well, but I am not in a position to keep paying AWS for the extra compute.

The best solution for this is to create a separate Lambda function for the task. Trigger the Lambda whenever someone uploads a file to S3; the Lambda processes the file and stores the result back to S3.
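A minimal sketch of what such a handler could look like, assuming the S3 PUT event trigger is configured and ffmpeg is packaged with the function (the output bucket and target format here are made up):

```python
# Hedged sketch of an S3-triggered Lambda handler. The output bucket and
# target format are assumptions; ffmpeg itself must be provided via a
# Lambda layer or container image, since it is not in the default runtime.
import os
import subprocess
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-converted-media"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])

        src = os.path.join("/tmp", os.path.basename(key))
        dst = os.path.splitext(src)[0] + ".mp4"  # assumed target format

        # Pull the uploaded object into Lambda's /tmp scratch space.
        s3.download_file(bucket, key, src)

        # Convert it; ffmpeg must be available in the runtime.
        subprocess.run(["ffmpeg", "-y", "-i", src, dst], check=True)

        # Store the converted file back to S3.
        s3.upload_file(dst, OUTPUT_BUCKET, os.path.basename(dst))
```

Keep in mind that Lambda has a 15-minute execution limit and limited /tmp space, so very large media files may still need a bigger worker.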

Related

How to retrieve last added files programmatically from Amazon S3

I'm using the Agora SDK and REST API to record live streams and upload them to AWS S3. The SDK comes with a snapshot feature that I'm planning to use for stream thumbnails.
Both cloud recording and snapshot recording are integrated with my app and work perfectly;
the remaining problem is that the snapshots are named down to the microsecond
(see: Agora snapshot services image file naming).
From my overview, the service works as follows: the mobile app sends data to my server, my server makes requests to the Agora API so it joins the live-stream channel, starts snapshotting and saves the images to AWS. So I suppose it's impossible to have time synchronization between AWS, the Agora REST API, my server and my mobile app.
I've gone through their docs and I can't find anything about retrieving the file names.
I was thinking maybe I could have a Lambda function that retrieves the last added file in a given bucket/folder, but due to my lack of knowledge of AWS and Lambda functions I don't know how that would work or whether it's possible.
Any suggestions would be appreciated.
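Two options that should work here: trigger a Lambda on the S3 PUT event, in which case the event payload already carries the new object's key, or simply list the bucket/prefix and take the newest LastModified timestamp. A minimal boto3 sketch of the listing approach (bucket and prefix names are made up):

```python
# Hedged sketch: return the key of the most recently added object under a
# prefix. Bucket and prefix names below are hypothetical.
import boto3

s3 = boto3.client("s3")

def latest_object_key(bucket, prefix=""):
    newest = None
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest["LastModified"]:
                newest = obj
    return newest["Key"] if newest else None

print(latest_object_key("my-snapshots-bucket", "channel-123/"))
```

Listing gets slow on very large prefixes, which is why the event-driven route is usually preferred.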

download files from AWS S3 bucket in parallel

I want to download millions of files from an S3 bucket, which would take more than a week to download one by one. Is there any way or command to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
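If you would rather script it yourself than tune the CLI, a minimal sketch of the same idea in Python with boto3 and a thread pool (bucket, prefix, destination and worker count are all made up):

```python
# Hedged sketch: list objects under a prefix and download them in parallel.
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")  # boto3 clients are safe to share across threads
BUCKET = "my-big-bucket"
PREFIX = "photos/2023/"
DEST = "./download"

def fetch(key):
    path = os.path.join(DEST, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    s3.download_file(BUCKET, key, path)

keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(fetch, keys))  # consume the iterator so errors surface
```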
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
AWS Snowball
Another option is to use an AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB) device: physical appliances that can be pre-loaded with content from S3 and shipped to your location. You then connect the device to your network and download the data. (It works in reverse too, for uploading bulk data to Amazon S3.)

Uploading Directly to S3 vs Uploading Through EC2

I'm developing a mobile app that will use AWS for its backend services. In the app I need to upload video files to S3 on a frequent basis, and I'm wondering what the recommended architecture would look like to make this scalable and efficient. Traffic could be high and file sizes could be large.
- On one hand, I could upload directly to S3 using the S3 API on the client side. This would be the easiest option, but I'm not sure of the negative implications associated with it.
- The other way to do it would be to go through an EC2 instance, handle the request with some PHP scripts and upload from there.
So my question is: are these two options equal, or are there major drawbacks to one of them compared to the other? I will already have EC2 instances configured for database access, if that makes any difference to how you approach the question.
I recommend uploading directly to S3 using the S3 API on the client side, since you can speed up the process with S3 multipart upload given that your video files are going to be large.
The second method would put extra CPU load on your EC2 instance, as both running the script and uploading to S3 consume CPU.
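One common way to keep the upload on the client side while still controlling access is to have your backend hand out short-lived presigned URLs; a minimal boto3 sketch (bucket, key, content type and expiry are assumptions):

```python
# Hedged sketch: the backend issues a presigned PUT URL so the mobile client
# uploads the video straight to S3. Names below are hypothetical.
import boto3

s3 = boto3.client("s3")

def presigned_upload_url(bucket, key, expires=3600):
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "video/mp4"},
        ExpiresIn=expires,
    )

url = presigned_upload_url("my-video-uploads", "user-42/clip.mp4")
# The client then PUTs the file body to `url` with a matching Content-Type.
```

For very large files, the client-side SDKs also offer multipart/transfer utilities that upload in resumable parts.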

EC2 and S3 image server

I'm creating an image upload service using EC2 and S3.
The user uploads an image to EC2 using PHP; EC2 uploads it to S3 and then responds to the user with the image link.
I was wondering how fast the upload between EC2 and S3 in the same region is.
Would it be better to store the image temporarily on EC2, respond to the user first and upload to S3 later, or to wait for the upload to finish before responding to the user?
I was wondering how fast the upload between EC2 and S3 in the same region is
It's fast. Test it.
You should find that you can upload the image to S3 very quickly, then return the S3 URL to the client, where they'll immediately be able to fetch the image.
Caveat: if you are overwriting an S3 object at the same path, rather than creating a new object, there can be a delay after you upload the object before the new version is consistently returned for every request. This delay is unlikely, but possible, due to the eventual consistency model of S3. Deletes are the same way -- a deleted object may still be fetchable, briefly, before requests to S3 return 404 or 403.
See What is maximum Amazon S3 replication time on file upload? and note the change you should make to the endpoint if you're working in the US Standard (us-east-1) region to ensure immediate consistency.
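If you want a number rather than a guess ("Test it"), a quick timing sketch you can run from the EC2 instance itself (bucket name and payload size are made up; the same test is easy to reproduce in PHP):

```python
# Hedged sketch: time a single EC2 -> S3 upload within one region.
import os
import time

import boto3

s3 = boto3.client("s3")
payload = os.urandom(1024 * 1024)  # 1 MB of random bytes

start = time.time()
s3.put_object(Bucket="my-test-bucket", Key="latency-test.bin", Body=payload)
print(f"put_object of 1 MB took {time.time() - start:.3f} s")
```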
It will be plenty fast; the latency between the user and your EC2 instance will be much bigger than the latency between EC2 and S3.
On the other hand, if EC2 is not doing anything to the image before uploading it to S3, why not upload it directly to S3?

Big data zip on amazon S3 files

I have a large amount of data stored on Amazon S3 in the form of objects.
For example, I have a user with 200+ GB of photos (about 100,000+ objects) stored on Amazon S3; each object is a photo with an average size of 5 MB.
Now I want to give the user a link to download the data.
Currently this is what I am doing:
1. Using s3cmd, I copy all the objects from S3 to EC2.
2. Then, using the zip or tar command, I create a zip.
3. After the zip process is complete, I move the zip file back to S3.
4. Then I create a signed link that I send to the user by email.
But this process takes a very long time; most of the time it runs into out-of-memory and storage issues, and it is very slow.
I need to know:
1. Is there any way I can speed this process up?
2. Is there a third-party service/tool with which I can quickly create a zip of my files and send it to the user, or any other third-party solution? I am ready to pay for it.
Try using EMR (Elastic MapReduce) with S3DistCp, which can be helpful in your situation. For EMR you have to create a cluster and then run your job on it.
The direction you are following is correct at a high level. However, there isn't a single straightforward answer that will solve your problem in one shot.
These are the things you can try:
1. Ask your user to create an AWS account (or create an IAM user) and give that user/account read-only access.
2. While uploading to S3, group the photos into bundles of 50 or 100, compress each bundle and put the bundle in S3 (from EC2, i.e. at the time the media is created); see the sketch after this list.
3. Export from S3 to external media using AWS Import/Export.
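A minimal sketch of option 2, assuming the photos are sitting on the EC2 instance at upload time (bucket, key and batch size are made up). Since photos are already compressed, storing them uncompressed in the zip is nearly as small and much faster:

```python
# Hedged sketch: bundle a batch of photos into one zip and upload it,
# instead of storing 100,000+ loose objects. Names are hypothetical.
import zipfile

import boto3

s3 = boto3.client("s3")

def upload_bundle(photo_paths, bucket, key):
    archive = "/tmp/bundle.zip"
    # ZIP_STORED skips re-compressing already-compressed JPEGs.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        for path in photo_paths:
            zf.write(path)
    s3.upload_file(archive, bucket, key)

# e.g. upload_bundle(batch_of_100_paths, "user-media", "user-42/bundle-0001.zip")
```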
S3DistCp is a tool that can greatly help in cases such as this.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
S3DistCp can copy from and to S3 using an EMR cluster instead of a single instance, and can compress objects on the fly.
However, in "big data" processing, the user will probably have a better experience if you either create the bundles proactively in advance, or start the process asynchronously on demand and notify the user on completion with the download link.
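For completeness, this is roughly how an s3-dist-cp step can be submitted to an already-running EMR cluster from Python; the cluster id, bucket names, groupBy pattern and sizes are all hypothetical, and note that --groupBy concatenates matching objects into larger aggregates rather than building a browsable zip:

```python
# Hedged sketch: add an s3-dist-cp step to an existing EMR cluster.
import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical cluster id
    Steps=[{
        "Name": "bundle-user-photos",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "s3-dist-cp",
                "--src", "s3://user-media/user-42/",
                "--dest", "s3://user-bundles/user-42/",
                "--groupBy", ".*(user-42).*",  # concatenate files matching the group
                "--targetSize", "1024",        # target aggregate size in MiB
                "--outputCodec", "gz",         # compress the aggregates on the fly
            ],
        },
    }],
)
```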