How does Amazon S3 Transfer Acceleration accelerate S3 file transfers? - amazon-web-services

I'm not clear how Amazon S3 Transfer Acceleration accelerates S3 file transfers.
I've been using https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html as a reference.
Supposing there is fileA in us-east-1, a user A in the UK, and there's a link to that fileA S3 endpoint.
Here's my understanding of how it works:
Before enabling Amazon S3 Transfer Acceleration user A would click on that link to fileA and it might take 10 seconds.
After enabling Amazon S3 Transfer Acceleration user A would click on that link to fileA and it might take 7 seconds.
I'm not clear how Amazon would achieve that reduction in time. It still has to get from the bucket to the user and goes over the public internet.
Or does Amazon intercept the link, move the file to a local CDN server in the meantime, then return a 302 to the new file location?

Under Amazon S3 Transfer Acceleration, the user is directed to the closest AWS edge location and the request then travels across the AWS network, which would have fewer hops and less traffic than the public Internet.
Content is not cached.
From Amazon S3 Transfer Acceleration - Amazon Simple Storage Service:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.

According to the Amazon S3 FAQ, Amazon S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, data is routed to your Amazon S3 bucket over an optimized network path.
However, this will not always lead to an increase in transfer speed. Each time you use S3 Transfer Acceleration to upload an object, AWS will check whether S3 Transfer Acceleration is likely to be faster than a regular Amazon S3 transfer. If AWS determines that S3 Transfer Acceleration is not likely to be faster than a regular Amazon S3 transfer of the same object to the same destination AWS Region, they will not charge for the use of S3 Transfer Acceleration for that transfer, and may bypass the S3 Transfer Acceleration system for that upload.

Related

download files from AWS S3 bucket in parallel

I want to download millions of files from an S3 bucket, which would take more than a week if downloaded one by one. Is there any way or command to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
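For anyone scripting this instead of using aws s3 sync, the two ideas above (parallel GetObject requests, and working prefix by prefix) can be sketched roughly as follows. This is only an illustration using the Python standard library: download_one is a hypothetical callable standing in for whatever performs a single GetObject (for example a wrapper around an SDK call), and group_by_prefix and download_all are names made up for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def group_by_prefix(keys, depth=1):
    """Group object keys by their first `depth` path segments ("folders")."""
    groups = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        # Keys without enough "/" separators are grouped under the empty prefix.
        prefix = "/".join(parts[:depth]) + "/" if len(parts) > depth else ""
        groups[prefix].append(key)
    return dict(groups)


def download_all(keys, download_one, max_workers=10):
    """Call download_one(key) for every key using a thread pool.

    download_one is a hypothetical stand-in for a single GetObject call.
    Returns (succeeded, failed), where failed maps key -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                succeeded.append(key)
            except Exception as exc:  # record the failure and keep going
                failed[key] = exc
    return succeeded, failed
```

With millions of objects you would probably call download_all once per prefix group rather than submitting every key at once, so that the list of pending tasks stays small and a failed prefix can be retried on its own.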
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
AWS Snowball
Another option is to use AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB), which are physical devices that you can have pre-loaded with content from S3 and shipped to your location. You then connect the device to your network and download the data. (It works in reverse too, for uploading bulk data to Amazon S3.)

why aws s3 transfer acceleration is not working?

I have to upload some files that are between 3 and 7 GB to S3. The default upload speed when using the AWS Console is about 1.3 Mbs. I read about transfer acceleration here:
https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html#transfer-acceleration-examples-aws-cli
So I followed the steps:
Turn on transfer acceleration on the bucket, in the console. Then...
aws s3api put-bucket-accelerate-configuration --bucket [bucket name] --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp some_file.txt s3://[bucket]/some_file.txt --region us-east-1 --endpoint-url http://[bucket].s3-accelerate.amazonaws.com
It still uploads at the same 1.3 Mbs. I am working from home, so I'm limited by my Wi-Fi, but I still wish it could be better. Is there anything else I can try? Do I need to use Python boto3? I was hoping this would be quicker.
From Amazon S3 Transfer Acceleration - Amazon Simple Storage Service:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Basically, instead of traversing the Internet to get to the AWS endpoint, traffic is directed to the closest Edge Location and then goes across the Amazon network to the desired region.
If your closest Edge Location is in the same location as an AWS Region, then you will gain no benefit from using Amazon S3 Transfer Acceleration. This is because the traffic will follow exactly the same path.
You can use the Amazon S3 Transfer Acceleration Speed Comparison tool to test whether it provides additional speed.
You might check your maximum upstream bandwidth with a speed tester like Google's. That'll set an upper bound for the maximum upload speed you can expect.
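To put a number on that upper bound: the best-case upload time is simply file size divided by upstream bandwidth, whatever endpoint is used. The sketch below assumes decimal units (1 GB = 10^9 bytes). Because "1.3 Mbs" in the question could mean either megabytes or megabits per second, it evaluates both readings; min_upload_seconds is a name made up for this illustration.

```python
def min_upload_seconds(size_bytes, upstream_bytes_per_sec):
    """Lower bound on transfer time: size divided by upstream bandwidth."""
    if upstream_bytes_per_sec <= 0:
        raise ValueError("upstream bandwidth must be positive")
    return size_bytes / upstream_bytes_per_sec


GB = 1000 ** 3              # decimal gigabyte, in bytes
MB_PER_S = 1000 ** 2        # 1 megabyte per second, in bytes per second
MBIT_PER_S = 1000 ** 2 / 8  # 1 megabit per second, in bytes per second

for size_gb in (3, 7):
    # Two readings of the ambiguous "1.3 Mbs" figure from the question.
    as_megabytes = min_upload_seconds(size_gb * GB, 1.3 * MB_PER_S)
    as_megabits = min_upload_seconds(size_gb * GB, 1.3 * MBIT_PER_S)
    print(f"{size_gb} GB: {as_megabytes / 60:.0f} min at 1.3 MB/s, "
          f"{as_megabits / 3600:.1f} h at 1.3 Mbit/s")
```

If the speed test shows your upstream link is already saturated at the rate you are seeing, no endpoint change will make the upload faster. Acceleration can only help with the part of the path beyond your own connection.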

S3 Transfer Acceleration Semantics

I have a rather simple question which I cannot find an explicit answer to, but anyone using the subject should be able to answer.
Does S3 Transfer Acceleration follow an eventual model, i.e. clients upload to a CF edge location, get the response back, and then the data is eventually moved to a bucket? Or is the performance (speed) gain simply due to use of the AWS internal network, so that upon request completion the data is always 100% IN the S3 bucket?
If it's the former is there any SLA regarding how fast this eventual process is?
S3 Transfer Acceleration uses CloudFront, and CloudFront doesn't cache POST/PUT requests, which means the data is uploaded to S3 as part of the same request. CloudFront doesn't store it; it simply reduces your RTT (round-trip time) by letting you connect to the nearest edge location rather than to an S3 endpoint situated far from you.
And, since the transfer between CloudFront and S3 stays within the AWS network, it should be faster.
(Any buffering is only in the sense that you can think of the acceleration endpoint as a proxy.)
S3 Transfer Acceleration uses portions of the CloudFront infrastructure to provide low-latency, performance-optimized connectivity from browser to edge to bucket.
It does not use any storage or caching components of CloudFront.
The acceleration is only TLS and transport (buffer and routing) related; all HTTP interactions are ultimately end-to-end with the actual S3 bucket, with CloudFront edge servers providing termination for the browser-facing TLS session and a reverse-proxy function.
Nothing is stored outside the bucket, so S3's standard consistency model applies.

Amazon S3 Bucket data transfer charges applicable while mounted in EC2 server?

I have created an S3 Bucket and mounted into one of my EC2 servers in the same region. Then I put data into the bucket using FTP account created for that EC2 instance. Finally, I access the data by Http request.
I'm not accessing S3 bucket directly from Internet, either for writing or accessing. All the data transferred through EC2 instance.
So, I assume per month charges as below, for fully used up 1TB S3 bucket (standard storage),
Storage Pricing - $0.0300*1024 = $30.72
Request Pricing - $0.005*10 = $0.05 (assuming 10,000 requests per month)
Data Transfer Pricing - Nil (since the bucket is not being accessed directly)
Is that correct? or data transfer pricing is applicable?
Ref: Pricing Details
You do not pay for data transfer between S3 and EC2 in the same region, however you pay for Data Transfer OUT From Amazon EC2 To Internet or EC2 instance in a different availability zone in the same region.
See EC2 pricing for more details.
If you transfer 1TB of data OUT to Internet from AWS, either directly from S3 or through EC2 instance, you will pay the same price.
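As a rough way to redo the estimate from the question with the transfer term included, here is a small sketch. The storage and request rates are the ones quoted in the question ($0.03 per GB-month, $0.005 per 1,000 requests). The Internet egress rate is not given anywhere in this thread, so egress_rate_per_gb is a placeholder you would need to fill in from the current pricing page. The function name is made up for this example.

```python
def monthly_cost(storage_gb, requests, internet_egress_gb, egress_rate_per_gb,
                 storage_rate_per_gb=0.03, rate_per_1000_requests=0.005):
    """Rough monthly estimate: storage + requests + data transfer out to the Internet.

    Transfer between S3 and EC2 in the same region is not counted (it is free);
    only data leaving AWS to the Internet is charged here.
    egress_rate_per_gb is an assumption supplied by the caller.
    """
    storage = storage_gb * storage_rate_per_gb
    request = (requests / 1000) * rate_per_1000_requests
    egress = internet_egress_gb * egress_rate_per_gb
    return {
        "storage": round(storage, 2),
        "requests": round(request, 2),
        "egress": round(egress, 2),
        "total": round(storage + request + egress, 2),
    }


# The question's own assumption: nothing leaves AWS, so the egress term is zero.
print(monthly_cost(1024, 10_000, internet_egress_gb=0, egress_rate_per_gb=0.0))

# Same bucket, but the full 1 TB is served to the Internet through EC2.
# 0.09 is a hypothetical rate used purely for illustration.
print(monthly_cost(1024, 10_000, internet_egress_gb=1024, egress_rate_per_gb=0.09))
```

The first call reproduces the $30.72 + $0.05 from the question. The second shows why the "Nil" line only holds if the data never leaves AWS: once it is served to Internet clients over HTTP, the egress term is charged whether the bytes leave from S3 or from the EC2 instance.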
TIP:
If you are transferring a large amount of data from S3 out to the Internet, look into CloudFront. Data transfer EC2/S3/ELB -> CloudFront is free of charge, and CloudFront has cheaper rates per GB compared to downloading files directly from S3.
EDIT:
See @Michael - sqlbot's comment: this is often but not always true, depending on the S3 bucket's region and the CloudFront edge location serving the content.
TIP 2:
For really large amounts of data it might be worth setting up a Direct Connect connection (a private connection from your office / on-premises setup to AWS). Data transfer then becomes even cheaper per GB, but you start paying an hourly rate for your Direct Connect link. Do the math to calculate what's best for you.
If you are reading data from S3 to your EC2 instance, and the S3 bucket is in the same region as your EC2 instance, then there are no data transfer costs.
Broken down:
There are no “data transfer in” costs to your EC2 instance if the data is coming from an S3 bucket in the same region: EC2 Instance Pricing – Amazon Web Services (AWS)
There are no “data transfer out” costs from your S3 bucket if the data is going to an EC2 instance in the same region: Cloud Storage Pricing – Amazon Simple Storage Service (S3) – AWS
There are no "data transfer out" costs from EC2 to S3 in the same region.
More info:
https://www.quora.com/In-AWS-EC2-what-counts-towards-data-transfer-costs

Is there any difference between Amazon CloudFront and Amazon S3 Transfer Acceleration?

I have read the documentation for both, but I don't understand exactly how they differ.
Could you tell me what the difference is?
TL;DR: CloudFront is for content delivery. S3 Transfer Acceleration is for faster transfers and higher throughput to S3 buckets (mainly uploads).
Amazon S3 Transfer Acceleration is an S3 feature that accelerates uploads to S3 buckets using AWS Edge locations - the same Edge locations as in AWS CloudFront service.
However, (a) creating a CloudFront distribution with an origin pointing to your S3 bucket and (b) enabling S3 Transfer acceleration for your bucket - are two different things serving two different purposes.
When you create a CloudFront distribution with an origin pointing to your S3 bucket, you enable caching on Edge locations. Subsequent requests for the same objects will be served from the Edge cache, which is faster for the end user and also reduces the load on your origin. CloudFront is primarily used as a content delivery service.
When you enable S3 Transfer Acceleration for your S3 bucket and use <bucket>.s3-accelerate.amazonaws.com instead of the default S3 endpoint, the transfers are performed via the same Edge locations, but the network path is optimized for long-distance large-object uploads. Extra resources and optimizations are used to achieve higher throughput. No caching on Edge locations.
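To make the endpoint switch concrete, here is a small sketch that builds the accelerate hostname for a bucket. It relies on a requirement from the AWS documentation rather than from this thread: bucket names used with Transfer Acceleration must be DNS-compliant and must not contain periods. The helper names are made up for this illustration, and the resulting URL would still need a signed request (or a public object) to actually work.

```python
import re
from urllib.parse import quote

# 3-63 characters, lowercase letters, digits and hyphens only (no periods).
_ACCELERATE_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if not _ACCELERATE_BUCKET_RE.fullmatch(bucket):
        raise ValueError(
            f"bucket name {bucket!r} is not usable with Transfer Acceleration "
            "(must be DNS-compliant and contain no periods)"
        )
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


def accelerate_object_url(bucket, key):
    """Return the accelerate URL for one object key (unsigned)."""
    # quote() leaves "/" intact, so key "folders" are preserved in the path.
    return f"{accelerate_endpoint(bucket)}/{quote(key)}"


print(accelerate_object_url("my-bucket", "videos/raw footage.mp4"))
```

Only the hostname changes compared to a normal S3 request; the bucket, keys and permissions stay the same. This is also why nothing is cached at the edge: every request still ends up at the bucket itself.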
More information:
https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/
http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html
https://aws.amazon.com/about-aws/whats-new/2016/04/transfer-files-into-amazon-s3-up-to-300-percent-faster/
If you are interested in the difference between these two options pertaining to uploading content to S3 you may be interested in the following from Amazon's FAQ for S3:
Q. How should I choose between Transfer Acceleration and Amazon
CloudFront’s PUT/POST? Transfer Acceleration optimizes the TCP
protocol and adds additional intelligence between the client and the
S3 bucket, making Transfer Acceleration a better choice if a higher
throughput is desired. If you have objects that are smaller than 1GB
or if the data set is less than 1GB in size, you should consider using
Amazon CloudFront's PUT/POST commands for optimal performance.
As the FAQ answer states, transfer acceleration should be used if you need higher throughput.
Per the FAQs:
Q: How should I choose between S3 Transfer Acceleration and Amazon CloudFront’s PUT/POST?
S3 Transfer Acceleration optimizes the TCP protocol and adds additional intelligence between the client and the S3 bucket, making S3 Transfer Acceleration a better choice if a higher throughput is desired. If you have objects that are smaller than 1GB or if the data set is less than 1GB in size, you should consider using Amazon CloudFront's PUT/POST commands for optimal performance.
https://aws.amazon.com/s3/faqs/#s3ta
Amazon CloudFront and Amazon S3 are very different services. Here is what each is for:
Amazon S3 provides a storage service on the Internet, while Amazon CloudFront is a web service for content delivery that serves your content through a worldwide network of edge locations.
S3 Transfer Acceleration, in turn, takes advantage of Amazon CloudFront’s globally distributed edge locations to provide fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
CloudFront is geared toward the download direction (content delivery), so it is not optimized for high-throughput uploads to the origin. S3 Transfer Acceleration, by contrast, uses the same Edge locations as CloudFront for both uploads and downloads.