Uploading Directly to S3 vs Uploading Through EC2

I'm developing a mobile app that will use AWS for its backend services. The app needs to upload video files to S3 frequently, and I'm wondering what the recommended architecture would look like to make this scalable and efficient. Traffic could be high, and file sizes could be large.
- On one hand, I could upload directly to S3 using the S3 API on the client side. This would be the easiest option, but I'm not sure of the negative implications associated with it.
- The other way would be to go through an EC2 instance, handle the request with some PHP scripts, and upload from there.
So my question is... Are these two options equal, or are there major drawbacks to one as opposed to the other? I will already have EC2 instances configured for database access, if that makes any difference in how you approach the question.

I recommend uploading directly to S3 using the S3 API on the client side, as you can speed up the process with S3 multipart upload, and your video files are going to be large.
The second method puts extra CPU load on your EC2 instance, since the script has to receive each file and then upload it to S3.
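For the direct-upload option, here is a minimal sketch (in Python with boto3, as an illustration; the bucket name, key and expiry are placeholder assumptions) of how the EC2 backend you already run could hand the mobile client a short-lived presigned URL, so the client PUTs the video straight to S3 without the file ever passing through the instance:

    import boto3

    s3 = boto3.client("s3")

    def get_upload_url(video_key):
        # The mobile client uploads with a plain HTTP PUT to this URL;
        # it expires after 15 minutes. Bucket name is a placeholder.
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": "my-video-uploads", "Key": video_key},
            ExpiresIn=900,
        )

    print(get_upload_url("uploads/user123/video.mp4"))

For very large videos the client-side SDKs can also perform multipart uploads directly, which is what makes the direct-to-S3 route faster.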

Related

Workaround for handling CPU-intensive tasks in AWS EC2?

I have created a Django application (running on AWS EC2) which converts media files from one format to another, but this process consumes CPU, which I have to pay AWS for.
I am trying to find a workaround where my local PC (Ubuntu) takes care of the CPU-intensive task and the final result is uploaded to an S3 bucket that I can share with the user.
Solution: One possible solution is that when a user uploads a media file (HTML upload form) it goes to the S3 bucket, and at the same time the S3 file link is sent over a socket connection to my Ubuntu machine, which downloads the file, processes it and uploads it back to the S3 bucket.
Could anyone please suggest a better solution, as this does not seem efficient?
Please note: I have a decent internet connection and a computer which can handle the backend very well, but I am not in a position to pay those charges to AWS.
The best solution for this is to create a separate Lambda function for the task. Trigger the Lambda whenever someone uploads a file to S3; the Lambda will process the file and store the result back in S3.
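As a rough illustration of that Lambda approach (the output bucket name and the transcode() step are placeholders, not your actual conversion code), an S3 ObjectCreated trigger could drive a handler like this:

    import os
    import boto3

    s3 = boto3.client("s3")
    OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "my-processed-media")  # placeholder

    def transcode(src_path, dst_path):
        # Placeholder for the real conversion your Django app does today
        # (e.g. ffmpeg packaged in a Lambda layer).
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            dst.write(src.read())

    def handler(event, context):
        # Invoked by the S3 ObjectCreated event configured on the upload bucket.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            local_in = "/tmp/" + os.path.basename(key)
            local_out = local_in + ".out"
            s3.download_file(bucket, key, local_in)
            transcode(local_in, local_out)
            s3.upload_file(local_out, OUTPUT_BUCKET, key + ".out")

Keep Lambda's execution time, memory and /tmp storage limits in mind when the media files are large.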

How shall I expose an S3 endpoint for clients to upload files

I am running a project wherein my clients upload data files to my own server through SFTP.
Now the requirement is to move my application to the cloud, so I want those clients to upload the data files to my S3 bucket.
From a design and security perspective, what are the approaches through which I can ask my clients to upload those files to S3? Shall I expose an application API (which will upload files to S3) to my clients, or is there a better and more proper way to achieve this?
EDIT:
I would be uploading approximately 200 files daily, each approximately 2-3 MB in size. These file uploads can't be scheduled; they are event-driven. Our clients SFTP the files as and when they need some processing of those files at our end.
If your clients are already using SFTP then you should consider simply migrating them to the managed SFTP service on AWS, which is part of AWS Transfer Family.
This will mean minimal change for your clients, and will allow you to shift their uploads directly into S3, which is ultimately where you want them to be.
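If it helps, here is a rough boto3 sketch of what standing up such a managed SFTP endpoint could look like; the user name, role ARN, home directory and SSH key are all placeholder assumptions:

    import boto3

    transfer = boto3.client("transfer")

    # Create a service-managed SFTP server (uploaded files land directly in S3).
    server = transfer.create_server(
        IdentityProviderType="SERVICE_MANAGED",
        Protocols=["SFTP"],
    )

    # One user per client, confined to their own bucket prefix.
    transfer.create_user(
        ServerId=server["ServerId"],
        UserName="client-a",                                   # placeholder
        Role="arn:aws:iam::123456789012:role/transfer-to-s3",  # placeholder role that can write to the bucket
        HomeDirectory="/my-client-uploads/client-a",           # placeholder bucket/prefix
        SshPublicKeyBody="ssh-rsa AAAA... client-a-key",       # placeholder public key
    )

The same setup can also be done entirely from the AWS console if you prefer not to script it.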
If all your service does is upload to S3, use IAM users/policies to grant your clients access to the S3 bucket instead; otherwise your service acts only as a proxy and adds extra maintenance and cost (a policy sketch is included at the end of this answer).
If the data that you store on S3 is very critical, I'd suggest you look at this:
https://docs.aws.amazon.com/AmazonS3/latest/dev/security-best-practices.html#security-best-practices-prevent
However, there can be cases where you would want to expose an endpoint, let's say:
The client only requires the functionality to upload a file and no other operation. Here, the implementation is abstracted from the client, and you can internally use or migrate to any other data store (be it S3) without affecting the clients. But consider this only if it is a real possibility.
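To illustrate the IAM users/policies option mentioned at the start of this answer, here is a sketch (bucket, prefix and user names are placeholders) that grants a client write-only access to its own prefix, so no proxy service is needed:

    import json
    import boto3

    iam = boto3.client("iam")

    # Allow this client to do nothing but put objects under its own prefix.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-client-uploads/client-a/*",  # placeholder
        }],
    }

    iam.put_user_policy(
        UserName="client-a",        # placeholder IAM user created for this client
        PolicyName="upload-only",
        PolicyDocument=json.dumps(policy),
    )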

Difference between S3 bucket vs host files for Amazon CloudFront

Background
We have developed an e-commerce application where I want to use a CDN to improve the speed of the app and also reduce the load on the host.
The application is hosted on an EC2 server and now we are going to use CloudFront.
Questions
After reading a lot of articles and documents, I have created a distribution for my sample site. After all this experimenting I have come to understand the following things, and I want to be sure whether I am right about these points or not.
When we create a distribution, it takes the accessible data from the given origin path; we don't need to copy or sync our files to CloudFront.
We just have to change the paths in our application according to this distribution's CNAME (if a CNAME is given).
There is no difference between placing the images/JS/CSS files on S3 or on our own host; CloudFront will just fetch them by itself.
The application will have thousands of product pictures. Should we place them on S3, or is it OK if they stay on the host itself? Please share any good article that explains the difference between the two approaches.
Because if S3 is significantly better, then I'll have to make a program to sync all such data to S3.
Thanks for the help.
Some reasons to store the images on Amazon S3 rather than your own host (and then serve them via Amazon CloudFront):
Less load on your servers
Even though content is cached in Amazon CloudFront, your servers will still be hit with requests for the first access of each object from every edge location (each edge location maintains its own cache), repeated every time that the object expires. (Refreshes will generate a HEAD request, and will only re-download content that has changed or been flushed from the cache.)
More durable storage
Amazon S3 keeps copies of your data across multiple Availability Zones within the same Region. You could also replicate data between your servers to improve durability but then you would need to manage the replication and pay for storage on every server.
Lower storage cost
Storing data on Amazon S3 is lower cost than storing it on Amazon EBS volumes. If you plan on keeping the data in both locations, then obviously that costs more, but you should also consider storing it only on S3, which makes it lower cost, more durable, and one less thing for you to back up on your server.
Reasons to NOT use S3:
More moving parts -- maintaining code to move files to S3 (a minimal sync sketch is shown after this list)
Not as convenient as using a local file system
Having to merge log files from S3 and your own servers to gather usage information
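For reference, the "program to sync all such data" mentioned in the question does not have to be elaborate; a minimal sketch (directory and bucket names are placeholders, and aws s3 sync from the CLI does the same job) looks like this:

    import os
    import boto3

    s3 = boto3.client("s3")
    LOCAL_DIR = "/var/www/app/static/products"  # placeholder
    BUCKET = "my-product-images"                # placeholder

    # Walk the local image directory and upload every file, keeping the
    # relative path as the S3 key.
    for root, _, files in os.walk(LOCAL_DIR):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, LOCAL_DIR)
            s3.upload_file(path, BUCKET, key)
            print("uploaded", key)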

Fastest way to download large files from AWS EC2 EBS

Suppose I have a couple of terabytes worth of data files that have accumulated on an EC2 instance's block storage.
What would be the most efficient way of downloading them to a local machine? scp? ftp? nfs? http? rsync? Going through an intermediate s3 bucket? Torrent via multiple machines? Any special tools or scripts out there for this particular problem?
As I did not really receive a convincing answer, I decided to make a small measurement myself. Here are the results I got:
More details here.
Please follow these rules:
Move it as one file: tar everything into a single archive file.
Create the S3 bucket in the same region as your EC2/EBS.
Use the AWS CLI S3 commands to upload the file to the S3 bucket.
Use the AWS CLI to pull the file to your local machine or wherever the other storage is.
This will be the easiest and most efficient way for you; a rough boto3 equivalent is sketched below.
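Here is that boto3 equivalent, assuming a placeholder bucket created in the same region as the EC2/EBS (boto3's managed transfer uses multipart uploads automatically for large files):

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-transfer-bucket"  # placeholder, same region as the EC2/EBS

    # On the EC2 instance: push the tar archive to S3.
    s3.upload_file("/data/archive.tar", BUCKET, "archive.tar")

    # On the local machine: pull it back down.
    s3.download_file(BUCKET, "archive.tar", "/home/me/archive.tar")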
Some more info about this use case is needed, but I hope the concepts below are helpful:
HTTP - fast, easy to implement, versatile and has small overhead.
Resilio (formerly BitTorrent Sync) - fast, easy to deploy, decentralized, and secure. Can handle transfer interruptions. Works if both endpoints are behind NAT.
rsync - old school and well-known solution. Can resume transfers and is fast at syncing large amounts of data.
Upload to S3 and get it from there - uploading to S3 is fast. Then you can use HTTP(S) or BitTorrent to get the data locally.

Using AWS Kinesis for large file uploads

My client has a service which stores a lot of files, like video or sound files. The service works well; however, long-term file storage is proving quite a challenge, and we would like to use AWS for storing these files.
The problem is the following: the client wants to use AWS Kinesis to transfer every file from our servers to AWS. Is this possible? Can we transfer files using that service? There are a lot of video files, we get more every day, and every file is relatively big.
We would also like to save some details of the files, possibly in DynamoDB; we could use Lambda functions for that.
The most important thing is that we need a reliable data transfer option.
Kinesis would not be the right tool to upload files, unless they were all very small, and most videos would almost certainly be over the 1 MB record size limit:
The maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB).
https://aws.amazon.com/kinesis/streams/faqs/
Use S3 with multipart upload using one of the SDKs. Objects you won't be accessing for 90+ days can be moved to Glacier.
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 4302-4306). Amazon Web Services, Inc.. Kindle Edition.
To further optimize file upload speed, use transfer acceleration:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 2060-2062). Amazon Web Services, Inc.. Kindle Edition.
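Putting both suggestions together, here is a sketch of an upload that uses multipart transfers and the accelerate endpoint (bucket and file names are placeholders, and Transfer Acceleration must also be enabled on the bucket itself):

    import boto3
    from botocore.client import Config
    from boto3.s3.transfer import TransferConfig

    # Route requests through the S3 Transfer Acceleration endpoint.
    s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))

    # Split anything over 100 MB into 100 MB parts, uploading up to 8 in parallel.
    transfer_config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=100 * 1024 * 1024,
        max_concurrency=8,
    )

    s3.upload_file("video.mp4", "my-video-archive", "videos/video.mp4",
                   Config=transfer_config)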
Kinesis has launched a new service, "Kinesis Video Streams" (https://aws.amazon.com/kinesis/video-streams/), which may be helpful for moving large amounts of data.