I have to upload some files that are between 3 and 7 GB to S3. The default upload speed when using the AWS Console is about 1.3 Mbs. I read about transfer acceleration here:
https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html#transfer-acceleration-examples-aws-cli
So I followed the steps:
Turn on transfer acceleration on the bucket, in the console. Then...
aws s3api put-bucket-accelerate-configuration --bucket [bucket name] --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true
aws s3 cp some_file.txt s3://[bucket]/some_file.txt --region us-east-1 --endpoint-url http://[bucket].s3-accelerate.amazonaws.com
It still uploads at the same 1.3 Mbs. I am working from home, so I am limited by my Wi-Fi, but I still wish it could be better. Is there anything else I can try? Do I need to use Python boto3? I was hoping this would be quicker.
From Amazon S3 Transfer Acceleration - Amazon Simple Storage Service:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
Basically, instead of traversing the Internet to get to the AWS endpoint, traffic is directed to the closest Edge Location and then goes across the Amazon network to the desired region.
If your closest Edge Location is in the same location as an AWS Region, then you will gain no benefit from using Amazon S3 Transfer Acceleration. This is because the traffic will follow exactly the same path.
You can use the Amazon S3 Transfer Acceleration Speed Comparison tool to test whether it provides additional speed.
You might check your maximum upstream bandwidth with a speed tester such as Google's. That sets an upper bound on the upload speed you can expect.
Related
I want to download millions of files from an S3 bucket, which would take more than a week if they were downloaded one by one. Is there any way or command to download those files in parallel using a shell script?
Thanks,
AWS CLI
You can certainly issue GetObject requests in parallel. In fact, the AWS Command-Line Interface (CLI) does exactly that when transferring files, so that it can take advantage of available bandwidth. The aws s3 sync command will transfer the content in parallel.
See: AWS CLI S3 Configuration
If your bucket has a large number of objects, it can take a long time to list the contents of the bucket. Therefore, you might want to sync the bucket by prefix (folder) rather than trying it all at once.
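For anyone who would rather script the parallel GetObject requests than rely on aws s3 sync, here is a minimal sketch of the idea in Python. It is an illustration, not what the CLI does internally: download_all and make_s3_fetch are hypothetical names, the default of 16 workers is an arbitrary assumption, and the boto3 part assumes credentials are already configured.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def download_all(keys: Iterable[str],
                 fetch: Callable[[str], None],
                 max_workers: int = 16) -> dict:
    """Run fetch(key) for every key on a thread pool.

    Returns {"ok": [...], "failed": {key: error message}} so that failed
    keys can be retried without restarting the whole transfer.
    """
    ok, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                ok.append(key)
            except Exception as exc:  # record the error and keep going
                failed[key] = str(exc)
    return {"ok": sorted(ok), "failed": failed}


def make_s3_fetch(bucket: str, dest_dir: str) -> Callable[[str], None]:
    """Build a fetch function backed by boto3 (imported lazily)."""
    import os
    import boto3  # third-party; only needed for real downloads

    s3 = boto3.client("s3")

    def fetch(key: str) -> None:
        target = os.path.join(dest_dir, key)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        s3.download_file(bucket, key, target)

    return fetch
```

A call might look like download_all(keys_for_one_prefix, make_s3_fetch("my-bucket", "./downloads")), where the bucket name, the destination directory, and the key list are placeholders. Feeding it one prefix at a time keeps the list of pending work small, in line with the advice above.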
AWS DataSync
You might instead want to use AWS DataSync:
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect... Move active datasets rapidly over the network into Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server. DataSync includes automatic encryption and data integrity validation to help make sure that your data arrives securely, intact, and ready to use.
DataSync uses a protocol that takes full advantage of available bandwidth and will manage the parallel downloading of content. A fee of $0.0125 per GB applies.
AWS Snowball
Another option is to use AWS Snowcone (8TB) or AWS Snowball (50TB or 80TB), which are physical devices that you can pre-load with content from S3 and have it shipped to your location. You then connect it to your network and download the data. (It works in reverse too, for uploading bulk data to Amazon S3).
I'm not clear how Amazon S3 Transfer Acceleration accelerates S3 file transfers.
I've been using https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html to refer to.
Supposing there is fileA in us-east-1, a user A in the UK, and there's a link to that fileA S3 endpoint.
Here's my understanding of how it works:
Before enabling Amazon S3 Transfer Acceleration user A would click on that link to fileA and it might take 10 seconds.
After enabling Amazon S3 Transfer Acceleration user A would click on that link to fileA and it might take 7 seconds.
I'm not clear how Amazon would achieve that reduction in time. It still has to get from the bucket to the user and goes over the public internet.
Or does Amazon intercept the link, move the file to a local CDN server in the meantime, then return a 302 to the new file location?
Under Amazon S3 Transfer Acceleration, the user is directed to the closest AWS endpoint and the request travels across the AWS network, which would have fewer hops and less traffic than the normal Internet.
Content is not cached.
From Amazon S3 Transfer Acceleration - Amazon Simple Storage Service:
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
According to the Amazon S3 FAQ, Amazon S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, data is routed to your Amazon S3 bucket over an optimized network path.
However, this will not always lead to an increase in transfer speed. Each time you use S3 Transfer Acceleration to upload an object, AWS will check whether S3 Transfer Acceleration is likely to be faster than a regular Amazon S3 transfer. If AWS determines that S3 Transfer Acceleration is not likely to be faster than a regular Amazon S3 transfer of the same object to the same destination AWS Region, they will not charge for the use of S3 Transfer Acceleration for that transfer, and may bypass the S3 Transfer Acceleration system for that upload.
I am using Amazon Connect and storing the call recording in one region.
I have Amazon Transcribe in another region and I followed How to create an audio transcript with Amazon Transcribe | AWS to convert the audio file to transcript format. Steps seem very simple.
However, when I click Create in Amazon Transcribe (to convert the audio recording generated by Connect into a transcript), it throws an error saying that the recording is in another region. That is expected in my case, because the audio file is not in the same region:
The S3 URI that you provided points to the incorrect region. Make sure that the bucket is in the XXX-XXX region and try your request again.
where XXX-XXX is the region of Amazon Transcribe. It expects the recording (audio file) to be in the same region.
But:
Is there a way to expose the S3 bucket with an audio file so that it can be accessed from other regions too?
If not, what is the other way to solve this?
"Is there a way to expose the S3 bucket...?"
As it turns out, exposing the bucket isn't the problem. Buckets are always physically located in exactly one region, but are accessible from all regions as well as from outside AWS if the requester is in possession of appropriate and authorized credentials and no policy explicitly denies the access.
But nothing in S3 about the bucket can be changed to fix the error you're getting, because the problem is somewhere else -- not S3.
From the API data types in the Amazon Transcribe Developer Guide:
MediaFileUri
The S3 location of the input media file. The URI must be in the same region as the [Amazon Transcribe] API endpoint that you are calling.
https://docs.aws.amazon.com/transcribe/latest/dg/API_Media.html
Transcribe was designed not to reach across regional boundaries to access media in a bucket, and stops you if you try, with the message you're getting.
Why does it work that way? Possibly performance/efficiency. Possibly security. Possibly to help unwitting users avoid unexpected billing charges for cross-region data transport. Possibly other reasons, maybe in combination with the above.
Possible solutions:
Use Connect, an S3 bucket, and Transcribe, all in the same region; or
Use two buckets and S3 Cross-Region Replication to replicate files from the Connect region to the Transcribe region. Be aware that this can have significant costs at scale, since S3 is moving data across regional boundaries. Be further aware that replication is fast but not instantaneous, so calls to Transcribe might fail to find media that has arrived in the first bucket but not yet the second; or
Use two buckets, and make a call in your code to S3's PUT+Copy API to copy the file to the second bucket in the Transcribe region, before calling Transcribe.
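The last option (copy first, then transcribe) could be sketched roughly as follows. This is an illustration under assumptions, not a tested integration: copy_then_transcribe is a hypothetical helper, both clients are expected to be boto3 clients created for the Transcribe region, and the "wav" format and "en-US" language defaults are guesses about the recordings that you would need to adjust.

```python
def copy_then_transcribe(s3_client, transcribe_client, *,
                         source_bucket: str, key: str,
                         transcribe_bucket: str, job_name: str,
                         media_format: str = "wav",
                         language_code: str = "en-US") -> str:
    """Copy one recording into the Transcribe-region bucket, then start a job.

    Both clients should target the Transcribe region. Returns the media URI
    that was handed to Transcribe.
    """
    # Server-side copy (PUT Object - Copy): the data moves inside AWS and is
    # not downloaded to the machine running this code.
    s3_client.copy_object(
        Bucket=transcribe_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # The URI now points at a bucket in the same region as the Transcribe API.
    media_uri = f"s3://{transcribe_bucket}/{key}"
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )
    return media_uri
```

The clients would typically come from boto3.client("s3", region_name=...) and boto3.client("transcribe", region_name=...), both using the Transcribe region. Because the copy call returns before Transcribe is invoked, this avoids the timing gap described for replication, at the price of an extra API call per recording.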
I need to copy some buckets from one account to another. I have all the permissions, so I started transferring the data via the CLI (cp command). I am operating on a c4.large. The problem is that there is quite a lot of data (9 TB) and it goes really slowly. In 20 minutes I transferred about 20 GB...
I checked the internet speed and the download is 3000 Mbit/s and the upload is 500 Mbit/s. How can I speed it up?
The AWS Command-Line Interface (CLI) aws s3 cp command simply sends the copy request to Amazon S3. The data is transferred between the Amazon S3 buckets without downloading to your computer. Therefore, the size and bandwidth of the computer issuing the command are not related to the speed of data transfer.
It is likely that the aws s3 cp command is only copying a small number of files simultaneously. You could increase the speed by setting the max_concurrent_requests parameter to a higher value:
aws configure set default.s3.max_concurrent_requests 20
See:
AWS CLI S3 Configuration — AWS CLI Command Reference
Getting the Most Out of the Amazon S3 CLI | AWS Partner Network (APN) Blog
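To judge whether a change such as raising max_concurrent_requests is helping, it can be useful to turn the observed progress into a rate and a projected total. The sketch below does only that arithmetic. It assumes decimal units (1 TB = 1000 GB) and a steady rate, and the function names are made up for illustration. With the figures from the question (20 GB in 20 minutes, 9 TB in total) it works out to roughly 133 Mbit/s and 150 hours, or about 6.25 days.

```python
def rate_mbit_per_s(done_gb: float, elapsed_min: float) -> float:
    """Observed throughput in megabits per second (decimal units)."""
    if done_gb <= 0 or elapsed_min <= 0:
        raise ValueError("done_gb and elapsed_min must be positive")
    bits = done_gb * 1e9 * 8
    seconds = elapsed_min * 60
    return bits / seconds / 1e6


def projected_hours(done_gb: float, elapsed_min: float, total_gb: float) -> float:
    """Hours needed for total_gb if the observed rate holds steady."""
    if done_gb <= 0 or elapsed_min <= 0:
        raise ValueError("done_gb and elapsed_min must be positive")
    gb_per_min = done_gb / elapsed_min
    return total_gb / gb_per_min / 60


if __name__ == "__main__":
    # Figures taken from the question: 20 GB copied in 20 minutes, 9 TB total.
    print(f"{rate_mbit_per_s(20, 20):.0f} Mbit/s")
    print(f"{projected_hours(20, 20, 9000):.0f} hours")
```

Re-running the same calculation after changing the concurrency setting gives a quick before-and-after comparison.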
I have read documents about them, but I don't know exactly how they differ.
Could you let me know what the difference is?
TL;DR: CloudFront is for content delivery. S3 Transfer Acceleration is for faster transfers and higher throughput to S3 buckets (mainly uploads).
Amazon S3 Transfer Acceleration is an S3 feature that accelerates uploads to S3 buckets using AWS Edge locations - the same Edge locations as in AWS CloudFront service.
However, (a) creating a CloudFront distribution with an origin pointing to your S3 bucket and (b) enabling S3 Transfer acceleration for your bucket - are two different things serving two different purposes.
When you create a CloudFront distribution with an origin pointing to your S3 bucket, you enable caching on Edge locations. Subsequent requests to the same objects will be served from the Edge cache which is faster for the end user and also reduces the load on your origin. CloudFront is primarily used as a content delivery service.
When you enable S3 Transfer Acceleration for your S3 bucket and use <bucket>.s3-accelerate.amazonaws.com instead of the default S3 endpoint, the transfers are performed via the same Edge locations, but the network path is optimized for long-distance large-object uploads. Extra resources and optimizations are used to achieve higher throughput. No caching on Edge locations.
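As a rough illustration of switching to the accelerate endpoint from code, the sketch below builds the endpoint URL for a bucket and shows one way to ask boto3 to use it. The helper names are hypothetical. The check for periods reflects the requirement that accelerated bucket names be DNS-compliant without periods, and the boto3 part assumes acceleration is already enabled on the bucket and that credentials are configured.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration URL for a bucket.

    Bucket names containing periods are rejected, since acceleration
    requires a DNS-compliant name without them.
    """
    if not bucket or "." in bucket:
        raise ValueError("bucket name must be non-empty and contain no periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


def accelerated_client():
    """Create a boto3 S3 client that sends requests via the accelerate endpoint."""
    # Imported lazily so that accelerate_endpoint() has no third-party dependencies.
    import boto3
    from botocore.config import Config

    return boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```

With such a client, an upload would be an ordinary call like accelerated_client().upload_file("some_file.txt", "my-bucket", "some_file.txt"), where the file and bucket names are placeholders. Whether it ends up faster still depends on where the client sits relative to the bucket's region.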
More information:
https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/
http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration-examples.html
https://aws.amazon.com/about-aws/whats-new/2016/04/transfer-files-into-amazon-s3-up-to-300-percent-faster/
If you are comparing these two options for uploading content to S3, the following from Amazon's FAQ for S3 may be of interest:
Q. How should I choose between Transfer Acceleration and Amazon CloudFront’s PUT/POST?
Transfer Acceleration optimizes the TCP protocol and adds additional intelligence between the client and the S3 bucket, making Transfer Acceleration a better choice if a higher throughput is desired. If you have objects that are smaller than 1GB or if the data set is less than 1GB in size, you should consider using Amazon CloudFront's PUT/POST commands for optimal performance.
As the FAQ answer states, transfer acceleration should be used if you need higher throughput.
Per the FAQs:
Q: How should I choose between S3 Transfer Acceleration and Amazon CloudFront’s PUT/POST?
S3 Transfer Acceleration optimizes the TCP protocol and adds additional intelligence between the client and the S3 bucket, making S3 Transfer Acceleration a better choice if a higher throughput is desired. If you have objects that are smaller than 1GB or if the data set is less than 1GB in size, you should consider using Amazon CloudFront's PUT/POST commands for optimal performance.
https://aws.amazon.com/s3/faqs/#s3ta
Amazon CloudFront and Amazon S3 are very different. Here is what they are for:
Amazon S3 provides a storage service on the internet, while Amazon CloudFront is a web service for content delivery. Amazon S3 stores objects in buckets that each reside in a single region, while Amazon CloudFront delivers your content through a worldwide network of edge locations. The major differences between the features of the two services are described here.
As for S3 Transfer Acceleration, it takes advantage of Amazon CloudFront’s globally distributed edge locations to provide fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. To read more about S3 Transfer Acceleration, click here.
CloudFront is aimed primarily at the download direction, so it does not offer performant uploads to the origin, whereas S3 with Transfer Acceleration uses Edge locations, like CloudFront, for both upload and download.