AWS S3 TransferUtility.UploadDirectoryRequest and HTTP PUT Limit - amazon-web-services

Here is a little background. I have designed a Web API which provides methods for cloud operations (upload, download, etc.). This API internally calls AWS API methods to fulfill these cloud operation requests.
WebDev solution --> Web API --> AWS API
I am trying to upload a directory to AWS S3. This directory contains large individual files, each more than 5 GB. Amazon S3 has a limit of 5 GB for a single PUT operation, but it provides a multipart upload mechanism with which files up to 5 TB can be uploaded.
The AWS documentation says the TransferUtility.UploadDirectory method will use the multipart upload mechanism to upload large files. So in my Web API, a [PUT] UploadDirectoryMethod method calls TransferUtility.UploadDirectory to upload the directory.
I am receiving the error "Amazon.S3.AmazonS3Exception: Your proposed upload exceeds the maximum allowed size".
Shouldn't TransferUtility.UploadDirectory take care of breaking the larger objects (larger than 5 GB) into parts and then uploading them with multiple PUT(?) operations?
How does multipart upload work internally for objects more than 5 GB in size? Does it create multiple PUT requests internally?
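For context, here is a minimal sketch (not the actual Web API code) of how such a TransferUtility.UploadDirectory call is typically wired up with the AWS SDK for .NET; the bucket name and local directory path are placeholders:

```csharp
// Minimal sketch: uploading a directory with the high-level .NET TransferUtility.
// Bucket name and directory path are placeholders, not taken from the question.
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

class UploadDirectoryExample
{
    static async Task Main()
    {
        using var s3Client = new AmazonS3Client();   // credentials/region from the default chain
        var transferUtility = new TransferUtility(s3Client);

        var request = new TransferUtilityUploadDirectoryRequest
        {
            BucketName = "my-bucket",                 // placeholder
            Directory = @"C:\data\to-upload",         // placeholder
            SearchOption = SearchOption.AllDirectories,
            SearchPattern = "*"
        };

        // Per the AWS documentation cited in the question, TransferUtility uses the
        // multipart upload API (initiate / upload parts / complete) for large files
        // rather than a single PUT Object request.
        await transferUtility.UploadDirectoryAsync(request);
        Console.WriteLine("Directory upload complete.");
    }
}
```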

Related

Flutter upload files to AWS s3 faster with upload progress

I am facing a problem while uploading one or more files (i.e. images/videos) to an AWS S3 bucket using the aws_s3_client plugin:
1. It takes too much time to upload a 10 MB file.
2. I am not able to track the upload progress percentage.
3. There is no option to upload multiple files at once (to the same bucket).
4. Every time we upload, we have to verify the IAM user access. (Why can't we use a single instance to verify once and keep the connection persistent/keep-alive until the application is closed?)
Since I am not familiar with AWS services, please suggest the best way to upload one or more files to an AWS S3 bucket quickly, with an upload progress percentage, multiple file uploads at once, and a persistent/keep-alive connection for verification.
For 1 and 2, use managed uploads; they provide an event to track upload progress and make uploads faster by using multipart upload. Be aware that multipart uploads only work for files with sizes from 5 MB to 5 TB.
For 3, AWS S3 does not allow uploading files having the same names or keys to the same bucket. Depending on your requirements, you can turn on versioning in your bucket, and that will save different versions of the same file.
For 4, you can generate and use pre-signed URLs. Pre-signed URLs have configurable timeouts that you can adjust depending on how long you want the link to be available for an upload; a minimal sketch of generating one server-side follows below.
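For illustration, a minimal sketch of generating such a pre-signed PUT URL server-side with the AWS SDK for .NET (the same operation exists in the other SDKs); the bucket name, key, and expiry are placeholders:

```csharp
// Minimal sketch, assuming a .NET backend: generate a pre-signed PUT URL that a
// client can use to upload directly to S3. Bucket, key, and expiry are placeholders.
using System;
using Amazon.S3;
using Amazon.S3.Model;

class PresignedUrlExample
{
    static void Main()
    {
        var s3Client = new AmazonS3Client();

        var request = new GetPreSignedUrlRequest
        {
            BucketName = "my-bucket",                 // placeholder
            Key = "uploads/video-001.mp4",            // placeholder
            Verb = HttpVerb.PUT,
            Expires = DateTime.UtcNow.AddMinutes(15)  // how long the link stays valid
        };

        string url = s3Client.GetPreSignedURL(request);
        Console.WriteLine(url);  // hand this URL to the client; it PUTs the file to it
    }
}
```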
Use multipart upload. Multipart upload will upload files to S3 more quickly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html

Is it possible to upload more than one file to Amazon S3 in a single request?

We are fetching binary blobs (PDF, JPG) from SQL Server and adding the objects to Amazon S3 using AWSSDK.S3 (.NET) v3.7.2.2.
Currently the process is adding the binary objects to Amazon S3 sequentially (one by one).
Is there any way/API to add more than one object to Amazon S3 in a single request, as this could improve performance?
While adding the binary objects, we have to pass metadata (binary object properties like width, height, extension, etc.) as well.
It is not possible to upload/download multiple objects in one request.
However, Amazon S3 is highly scalable, so you can send multiple requests in parallel (see the sketch below). This will also make better use of your available bandwidth, given the overhead of file transfer protocols.
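For illustration, a minimal sketch of issuing the PutObject requests in parallel with the AWSSDK.S3 (.NET) client mentioned in the question, with per-object metadata attached; the bucket name, keys, and metadata values are placeholders:

```csharp
// Minimal sketch: upload several blobs in parallel, each with its own metadata.
// One PutObject request per object, but the requests are issued concurrently.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

class ParallelUploadExample
{
    static async Task UploadAllAsync(
        IAmazonS3 s3Client,
        IEnumerable<(string Key, byte[] Bytes, string Extension)> blobs)
    {
        var uploads = blobs.Select(async blob =>
        {
            using var stream = new MemoryStream(blob.Bytes);
            var request = new PutObjectRequest
            {
                BucketName = "my-bucket",   // placeholder
                Key = blob.Key,
                InputStream = stream
            };
            request.Metadata.Add("extension", blob.Extension);  // custom metadata per object

            await s3Client.PutObjectAsync(request);
        });

        await Task.WhenAll(uploads);  // wait for all parallel uploads to finish
    }
}
```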

How shall I expose an S3 endpoint for clients to upload files

I am running a project wherein my clients upload data files to my own server through SFTP.
Now the requirement is to move my application to the cloud, so I want those clients to upload those data files to my S3 bucket instead.
From a design and security perspective, what are the approaches through which I can ask my clients to upload those files to S3? Shall I expose an application API (which will upload files to S3) to my clients, or is there a better and more proper way to achieve this?
EDIT:
I would be uploading approximately 200 files daily, each approximately 2-3 MB in size. These file uploads can't be scheduled; they are event-driven. Our clients SFTP the files as and when they need some processing of those files at our end.
If your clients are already using SFTP then you should consider simply migrating them to the managed SFTP service on AWS, which is part of AWS Transfer Family.
This will mean minimal change for your clients, and will allow you to shift their uploads directly into S3, which is ultimately where you want them to be.
If all your service does is upload to S3, use IAM users/policies to grant your clients access to the S3 bucket instead, as your service would act only as a proxy and add extra maintenance and costs.
If the data that you store on S3 is very critical, I'd suggest you look at this:
https://docs.aws.amazon.com/AmazonS3/latest/dev/security-best-practices.html#security-best-practices-prevent
However, there can be cases where you would want to expose an endpoint, let's say:
The client only requires the functionality to upload a file and no other operation. Here, the implementation is abstracted from the client, and you can internally use or migrate to any other data store (be it S3) without affecting the clients. But consider this only if it is a real possibility. A minimal sketch of such an endpoint follows below.
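As a rough illustration of that option, a minimal sketch (not a production design) of an ASP.NET Core endpoint that accepts a client upload and forwards it to S3, keeping the storage backend hidden from the client; the route, bucket name, and key scheme are placeholders:

```csharp
// Minimal sketch: an upload endpoint that proxies files into S3 so clients never
// talk to S3 directly. Route, bucket name, and key scheme are placeholders.
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/files")]
public class FileUploadController : ControllerBase
{
    private readonly IAmazonS3 _s3Client;

    public FileUploadController(IAmazonS3 s3Client) => _s3Client = s3Client;

    [HttpPut("upload")]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        if (file == null || file.Length == 0)
            return BadRequest("No file provided.");

        var transferUtility = new TransferUtility(_s3Client);
        using var stream = file.OpenReadStream();

        // The client never sees S3; swapping the data store later only changes this method.
        await transferUtility.UploadAsync(stream, "my-bucket", file.FileName);  // bucket is a placeholder

        return Ok();
    }
}
```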

Using AWS Kinesis for large file uploads

My client has a service which stores a lot of files, like video or sound files. The service works well; however, long-term file storage looks like quite a challenge, and we would like to use AWS for storing these files.
The problem is the following: the client wants to use AWS Kinesis for transferring every file from our servers to AWS. Is this possible? Can we transfer files using that service? There are a lot of video files, we get more every day, and every file is relatively big.
We would also like to save some details of the files, possibly into DynamoDB; we could use Lambda functions for that.
The most important thing is that we need a reliable data transfer option.
Kinesis would not be the right tool to upload files unless they were all very small, and most videos would almost certainly be over the 1 MB record size limit:
The maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB).
https://aws.amazon.com/kinesis/streams/faqs/
Use S3 with multipart upload via one of the SDKs. Objects you won't be accessing for 90+ days can be moved to Glacier.
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 4302-4306). Amazon Web Services, Inc.. Kindle Edition.
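To make the quoted flow concrete, here is a minimal sketch of the low-level multipart sequence using the AWS SDK for .NET: initiate the upload, upload each part as its own request, then complete. The 100 MB part size and all names are placeholders:

```csharp
// Minimal sketch of the low-level multipart flow: initiate, upload parts (each its
// own HTTP request), then complete so S3 assembles the parts into one object.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

class LowLevelMultipartExample
{
    static async Task UploadAsync(IAmazonS3 s3, string bucket, string key, string filePath)
    {
        const long partSize = 100L * 1024 * 1024;  // 100 MB per part (placeholder)

        var init = await s3.InitiateMultipartUploadAsync(
            new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });

        var partResponses = new List<UploadPartResponse>();
        long fileLength = new FileInfo(filePath).Length;
        long position = 0;

        for (int partNumber = 1; position < fileLength; partNumber++)
        {
            var partResponse = await s3.UploadPartAsync(new UploadPartRequest
            {
                BucketName = bucket,
                Key = key,
                UploadId = init.UploadId,
                PartNumber = partNumber,
                PartSize = Math.Min(partSize, fileLength - position),
                FilePosition = position,
                FilePath = filePath
            });
            partResponses.Add(partResponse);
            position += partSize;
        }

        var complete = new CompleteMultipartUploadRequest
        {
            BucketName = bucket,
            Key = key,
            UploadId = init.UploadId
        };
        complete.AddPartETags(partResponses);  // S3 assembles the parts into one object
        await s3.CompleteMultipartUploadAsync(complete);
    }
}
```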
To further optimize file upload speed, use transfer acceleration:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 2060-2062). Amazon Web Services, Inc.. Kindle Edition.
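A minimal sketch of pointing the .NET client at the accelerate endpoint, assuming Transfer Acceleration has already been enabled on the bucket; the region is a placeholder:

```csharp
// Minimal sketch: route requests through the S3 Transfer Acceleration endpoint
// (requires acceleration to be enabled on the bucket). Region is a placeholder.
using Amazon;
using Amazon.S3;

class AccelerateClientExample
{
    static IAmazonS3 CreateAcceleratedClient()
    {
        var config = new AmazonS3Config
        {
            RegionEndpoint = RegionEndpoint.USEast1,  // placeholder region
            UseAccelerateEndpoint = true              // send requests via CloudFront edge locations
        };
        return new AmazonS3Client(config);
    }
}
```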
Kinesis launched a new service, "Kinesis Video Streams" (https://aws.amazon.com/kinesis/video-streams/), which may be helpful for moving large amounts of data.

Uploading Directly to S3 vs Uploading Through EC2

I'm developing a mobile app that will use AWS for its backend services. In the app, I need to upload video files to S3 on a frequent basis, and I'm wondering what the recommended architecture would look like to make this scalable and efficient. Traffic could be high, and file sizes could be large.
- On one hand, I could upload directly to S3 using the S3 API on the client side. This would be the easiest option, but I'm not sure of the negative implications associated with it.
- The other way would be to go through an EC2 instance, handle the request using some PHP scripts, and upload from there.
So my question is: are these two options equal, or are there major drawbacks to one of them compared to the other? I will already have EC2 instances configured for database access, if that makes any difference in how you approach the question.
I recommend uploading directly to S3 using the S3 API on the client side, as you can speed up the upload process by using AWS S3 multipart upload since your video files are going to be large; see the sketch below.
The second method will put extra CPU load on your EC2 instance, as both the script processing and the upload to S3 will consume CPU.
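As a rough sketch of the first option's upload path, here is how the high-level transfer API in the AWS SDK for .NET performs a multipart upload with progress reporting (the mobile SDKs expose a similar high-level transfer utility); the bucket, key, file path, and part size are placeholders:

```csharp
// Minimal sketch (placeholders throughout): multipart upload of a large video file
// with progress reporting via the high-level TransferUtility.
using System;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

class DirectUploadExample
{
    static async Task UploadVideoAsync()
    {
        using var s3Client = new AmazonS3Client();
        var transferUtility = new TransferUtility(s3Client);

        var request = new TransferUtilityUploadRequest
        {
            BucketName = "my-video-bucket",   // placeholder
            Key = "videos/clip-001.mp4",      // placeholder
            FilePath = "/tmp/clip-001.mp4",   // placeholder
            PartSize = 16 * 1024 * 1024       // 16 MB parts (placeholder)
        };

        // Report progress as parts are transferred.
        request.UploadProgressEvent += (sender, args) =>
            Console.WriteLine($"{args.PercentDone}% ({args.TransferredBytes}/{args.TotalBytes} bytes)");

        await transferUtility.UploadAsync(request);
    }
}
```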