How shalL I expose S3 endpoint for clients to load files - amazon-web-services

I am running a project where in my clients upload the data file on my own server through SFTP.
Now the requirement is to move my application on cloud. So, I want those clients to upload those data file on my S3.
From design & security perspective, what are the approach or ways through which I can ask my clients to upload those files on S3? Shall I expose an application api (which will upload files to S3) to my clients or is there any other better & proper way to achieve this?
EDIT:
I would be uploading daily approx 200 files with each file of size approx 2-3 MB. These file uploads can't be scheduled, they are event driven. Our client SFTP the files as and when they need some processing of those files at our end.

If your clients are already using SFTP then you should consider simply migrating them to the managed SFTP service on AWS, which is part of AWS Transfer Family.
This will mean minimal change for your clients, and will allow you to shift their uploads directly into S3, which is ultimately where you want them to be.

If all your service does is upload to S3 , Use IAM Users/Policies to grant access to s3 bucket to your clients instead as your service will act only as a proxy and add extra maintenance and costs .
If the data that you store on S3 is very critical , I'd suggest you look at this
https://docs.aws.amazon.com/AmazonS3/latest/dev/security-best-practices.html#security-best-practices-prevent
However, there can be cases where you would want to expose an endpoint, lets say -
The client only requires the functionality to upload a file and no other operation. Here, the implementation is abstracted from the client and you can internally use or migrate to any other data store(be it s3) without affecting the clients. But consider this only if this is a possibility.

Related

File transfer from basespace to aws S3

I use the Illumina Basespace service to do high throughput sequencing secondary analyzes. This service uses AWS servers and therefore all files are stored on s3.
I would like to transfer the files (results of analyzes) from basespace to my own aws s3 account. I would like to know what would be the best strategy to make things go quickly knowing that in the end we can summarize it as a copy of files from an s3 bucket belonging to Illumina to an s3 bucket belonging to me.
The solutions I'm thinking of:
use the CLI basespace tool to copy the files to our on premise servers then transfer them back to aws
use this tool from an ec2 instance.
use the illumina API to get a pre-signed download url (but then how can I use this url to download the file directly into my s3 bucket?).
If I use an ec2 instance, what kind of instance do you recommend to have enough resources without having too much (and therefore spending money for nothing)?
Thanks in advance,
Quentin

Work around for handling CPU Intensive task in aws ec2?

I have created a django application (running on aws ec2) which convert media file from one format to another format ,but during this process it consume CPU resource due to which I have to pay charges to aws.
I am trying to find a work around where my local pc (ubuntu) takes care of CPU intensive task and final result is uploaded to s3 bucket which I can share with user.
Solution :- One possible solution is that when user upload media file (html upload form) it goes to s3 bucket and at the same time via socket connection the s3 bucket file link is send to my ubuntu where it download file, process it and upload back to s3 bucket.
Could anyone please suggest me better solution as it seems to be not efficient.
Please note :- I have decent internet connection and computer which can handle backend very well but i not in state to pay throttle charges to aws.
Best solution for this is to create separate lambda function for this task. Trigger lambda whenever someone upload files on S3. Lambda will process files and store back to S3.

Best way to transfer data from on-prem to AWS

I have a requirement to transfer data(one time) from on prem to AWS S3. The data size is around 1 TB. I was going through AWS Datasync, Snowball etc... But these managed services are better to migrate if the data is in petabytes. Can someone suggest me the best way to transfer the data in a secured way cost effectively
You can use the AWS Command-Line Interface (CLI). This command will copy data to Amazon S3:
aws s3 sync c:/MyDir s3://my-bucket/
If there is a network failure or timeout, simply run the command again. It only copies files that are not already present in the destination.
The time taken will depend upon the speed of your Internet connection.
You could also consider using AWS Snowball, which is a piece of hardware that is sent to your location. It can hold 50TB of data and costs $200.
If you have no specific requirements (apart from the fact that it needs to be encrypted and the file-size is 1TB) then I would suggest you stick to something plain and simple. S3 supports an object size of 5TB so you wouldn't run into trouble. I don't know if your data is made up of many smaller files or 1 big file (or zip) but in essence its all the same. Since the end-points or all encrypted you should be fine (if your worried, you can encrypt your files before and they will be encrypted while stored (if its backup of something). To get to the point, you can use API tools for transfer or just file-explorer type of tools which have also connectivity to S3 (e.g. https://www.cloudberrylab.com/explorer/amazon-s3.aspx). some other point: cost-effectiviness of storage/transfer all depends on how frequent you need the data, if just a backup or just in case. archiving to glacier is much cheaper.
1 TB is large but it's not so large that it'll take you weeks to get your data onto S3. However if you don't have a good upload speed, use Snowball.
https://aws.amazon.com/snowball/
Snowball is a device shipped to you which can hold up to 100TB. You load your data onto it and ship it back to AWS and they'll upload it to the S3 bucket you specify when loading the data.
This can be done in multiple ways.
Using AWS Cli, we can copy files from local to S3
AWS Transfer using FTP or SFTP (AWS SFTP)
Please refer
There are tools like cloudberry clients which has a UI interface
You can use AWS DataSync Tool

Using AWS Kinesis for large file uploads

My client has a service which stores a lot of files, like video or sound files. The service works well, however looks like the long-time file storing is quite a challenge, and we would like to use AWS for storing these files.
The problem is the following, the client wants to use AWS kinesis for transferring every file from our servers to AWS. Is this possible? Can we transfer files using that service? There's a lot of video files, and we got more and more every day. And every files is relatively big.
We would also like to save some detail of the files, possibly into dynamoDB, we could use Lambda functions for that.
The most important thing, that we need a reliable data transfer option.
KInesis would not be the right tool to upload files, unless they were all very small - and most videos would almost certainly be over the 1MB record size limit:
The maximum size of a data blob (the data payload before
Base64-encoding) within one record is 1 megabyte (MB).
https://aws.amazon.com/kinesis/streams/faqs/
Use S3 with multi-part upload using one of the SDK's. Objects you won't be accessing for 90+ days can be moved to Glacier.
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 4302-4306). Amazon Web Services, Inc.. Kindle Edition.
To further optimize file upload speed, use transfer acceleration:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Amazon Web Services. Amazon Simple Storage Service (S3) Developer Guide (Kindle Locations 2060-2062). Amazon Web Services, Inc.. Kindle Edition.
Kinesis launched a new service "Kinesis Video Streams" - https://aws.amazon.com/kinesis/video-streams/ which may be helpful to move large amount of data.

Uploading Directly to S3 vs Uploading Through EC2

Im developing a mobile app that will use AWS for its backend services. In the app I need to upload video files to S3 on a frequent basis, and I'm wondering what the recommended architecture would look like to make this scalable and efficient. Traffic could be high, and file sizes could be large.
-On one hand, I could upload directly to S3 using the S3 API on the client side. This would be the easiest option, but Im not sure of the negative implications associated with it.
-The other way to do it would be to go through an EC2 instance and handle the request using some PHP scripts and upload from there.
So my question is... Are these two options equal, or are there major drawbacks to one of them opposed to another? I will already have EC2 instances configured for database access if that makes any difference in how you approach the question.
I will recommend using "upload directly to S3 using the S3 API on the client side" as you can speed up the upload process by using AWS S3 part upload as your video files are going to large.
The second method will put extra CPU usage load on your EC2 instance as the script processing and upload to S3 will utilize CPU for the process.