Folks,
What is the throughput limit on GET calls to a single object in an S3 bucket? The AWS documentation suggests implementing CloudFront; however, it does not cover the case where a bucket contains only a single object. Does anyone know if the same limit applies, i.e. ~300 GET requests/sec?
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Thanks!
Note: as of July 17, 2018, the request limits have been dramatically increased, along with automatic partitioning of S3 buckets.
More information here
There is no throughput limit applied to objects in Amazon S3. However, a high rate of requests per second may limit S3's ability to respond to queries. As per the documentation you linked, this only becomes a concern above 300 requests per second.
Larger objects can therefore provide more throughput than smaller objects at the same number of requests per second.
Amazon CloudFront can provide faster responses because information is cached rather than served directly from Amazon S3. CloudFront also has over 50 edge locations throughout the world, allowing it to serve content in parallel from multiple locations and at lower latency compared to S3.
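As a rough illustration of the point about object size (the 10 MB object size below is just an assumed example):

    # Rough arithmetic (sizes assumed for illustration): at a fixed request rate,
    # larger objects deliver proportionally more throughput.
    request_rate = 300            # GET requests per second (the documented guideline)
    object_size_mb = 10           # assumed object size
    throughput_mb_per_s = request_rate * object_size_mb
    print(f"~{throughput_mb_per_s:,} MB/s for {object_size_mb} MB objects at {request_rate} req/s")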
I am trying to use AWS S3 and read the data into a Jupyter Notebook. The total file size is 42 MB, yet the time it is taking to upload to S3 is very high. After 5 hours, only 22% has completed, and the estimated time to finish is 12 hours. Is there a way to upload to S3 more effectively, or another platform that provides higher speed?
The upload bandwidth is determined by many factors:
Your local internet connection
Any VPN the traffic goes through
The public internet
S3 bandwidth
Typically the last two aren't your problem (especially for 42 MB), but the first two may be.
If you upload data to an S3 bucket in a region that's far away from you, you can take a look at S3 Transfer Acceleration, which lets you send the data to the nearest CloudFront edge location, from where it traverses the global AWS backbone to the destination region. Given the small size of the data, though, I doubt this will help much, since the bottleneck is most likely one of the first two factors.
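If you do want to try Transfer Acceleration, a minimal boto3 sketch looks roughly like this (the bucket name and file paths are placeholders, and acceleration adds an extra per-GB charge):

    import boto3
    from botocore.config import Config

    s3 = boto3.client('s3')

    # One-time setup: enable Transfer Acceleration on the bucket
    s3.put_bucket_accelerate_configuration(
        Bucket='my-bucket',                              # placeholder bucket name
        AccelerateConfiguration={'Status': 'Enabled'},
    )

    # Upload through the accelerate endpoint instead of the regular S3 endpoint
    s3_accel = boto3.client('s3', config=Config(s3={'use_accelerate_endpoint': True}))
    s3_accel.upload_file('local-data.csv', 'my-bucket', 'data/local-data.csv')  # placeholder paths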
I have been using the Amazon S3 service to store some files.
I have uploaded 4 videos and they are public. I'm using a third-party video player for those videos (JW Player). As a new user on the AWS Free Tier, my free PUT, POST and LIST requests are almost used up out of the 2,000 allowed requests, and for four videos that seems ridiculous.
Am I missing something, or shouldn't one upload be one PUT request? I don't understand how I've hit that limit already.
The AWS Free Tier for Amazon S3 includes:
5GB of standard storage (normally $0.023 per GB)
20,000 GET requests (normally $0.0004 per 1,000 requests)
2,000 PUT requests (normally $0.005 per 1,000 requests)
In total, it is worth up to 13.3 cents every month!
So, don't be too worried about your current level of usage, but do keep an eye on charges so you don't get too many surprises. You can always Create a Billing Alarm to Monitor Your Estimated AWS Charges.
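As a sketch of how such a billing alarm could be created with boto3 (the alarm name, $1 threshold and SNS topic ARN are assumptions for the example; billing metrics are only published in us-east-1):

    import boto3

    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')  # billing metrics live in us-east-1
    cloudwatch.put_metric_alarm(
        AlarmName='estimated-charges-above-1-usd',       # assumed alarm name
        Namespace='AWS/Billing',
        MetricName='EstimatedCharges',
        Dimensions=[{'Name': 'Currency', 'Value': 'USD'}],
        Statistic='Maximum',
        Period=21600,                                    # check every 6 hours
        EvaluationPeriods=1,
        Threshold=1.0,                                   # assumed: alert once charges exceed $1
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:billing-alerts'],  # placeholder SNS topic
    )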
The AWS Free Tier is provided to explore AWS services. It is not intended for production usage.
It would be very hard to find the reason for this without debugging a bit, so I would suggest the following debugging steps:
See if you have CloudTrail enabled. If yes, you can track the API calls to S3 to see if anything is wrong there.
If CloudTrail is enabled, it itself puts data into an S3 bucket, which might also account for some of the requests.
See if you have logging enabled at the bucket level; that might give you more insight into which requests are reaching your bucket.
Your videos are public, and that is the biggest concern here, as you don't know who can access them.
Set up CloudWatch alarms to avoid any surprises, and look at the logs to find the issue.
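For the bucket-level logging suggestion, a minimal boto3 sketch (both bucket names are placeholders; the target bucket must allow the S3 log delivery service to write to it):

    import boto3

    s3 = boto3.client('s3')
    s3.put_bucket_logging(
        Bucket='my-video-bucket',                  # placeholder: the bucket to audit
        BucketLoggingStatus={
            'LoggingEnabled': {
                'TargetBucket': 'my-log-bucket',   # placeholder: bucket that receives the logs
                'TargetPrefix': 'access-logs/',
            }
        },
    )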
I am in a position where I have a static site hosted in S3 that I need to front with CloudFront; in other words, I have no option but to put CloudFront in front of it. I would like to reduce my S3 costs by changing the objects' storage class to S3 Infrequent Access (IA), which would cut my S3 costs by about 45%, which is nice since I now have to spend money on CloudFront. Is this a good practice, given that the resources will be cached by CloudFront anyway? S3 IA has 99.9% availability, which means it can have as much as 8.75 hours of downtime per year.
First, don't worry about the downtime. Unless you are using Reduced Redundancy or One-Zone Storage, all data on S3 has pretty much the same redundancy and therefore very high availability.
S3 Standard-IA is pretty much half price for storage ($0.0125 per GB) compared to S3 Standard ($0.023 per GB). However, data retrieval for Standard-IA costs $0.01 per GB. Thus, if the data is retrieved more than once per month, Standard-IA is more expensive.
While using Amazon CloudFront in front of S3 would reduce data access frequency, it's worth noting that CloudFront caches separately in each region. So, if users in Singapore, Sydney and Tokyo all requested the data, it would be fetched three times from S3. So, data stored as Standard-IA would incur 3 x $0.01 per GB charges, making it much more expensive.
See: Announcing Regional Edge Caches for Amazon CloudFront
Bottom line: If the data is going to be accessed at least once per month, it is cheaper to use Standard Storage instead of Standard-Infrequent Access.
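To make the break-even concrete, here is the same comparison as a small calculation using the prices quoted above (storage and retrieval only; request charges are ignored for simplicity):

    # Monthly cost per GB: S3 Standard vs S3 Standard-IA, varying retrievals per month
    standard_storage = 0.023   # $/GB-month
    ia_storage = 0.0125        # $/GB-month
    ia_retrieval = 0.01        # $/GB retrieved

    for retrievals in (0, 1, 2, 3):
        standard_cost = standard_storage
        ia_cost = ia_storage + ia_retrieval * retrievals
        cheaper = "Standard-IA" if ia_cost < standard_cost else "Standard"
        print(f"{retrievals} retrievals/month: Standard ${standard_cost:.4f}, IA ${ia_cost:.4f} -> {cheaper}")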
Let's imagine a situation like this:
We have a Node.js app that renders views server-side and sends HTML to the browser. In the generated HTML we have a few static assets (images, stylesheets, etc.).
Why should I (or not) choose S3 over Lambda to serve this content?
Here are the pros & cons I see:
Performance
I was quite sure that serving content from S3 would be much faster than from Lambda (there is no script that needs to be executed)...
...until I performed some tests (file size ~44 kB, average of 10 requests):
API GW + S3: 285ms
API GW + Lambda: 290ms
S3: 135ms
As you can see, there is no difference between serving content from Lambda via API GW and from S3 via API GW. The only significant difference is between the direct link to S3 and the two previous tests.
Lambda 1 : S3 1
Cost
And here Lambda definitely wins.
First of all, we have a free tier of 1,000,000 requests.
Second, here is the pricing:
S3: $0.004 per 10,000 requests
Lambda: around $0.002000624 per 10,000 requests
($0.20 per 1 million requests + $0.000000208 per 100 ms)
So on pricing, Lambda wins.
Summarizing
My observations show that Lambda is the better way to serve even static content (speed is similar to S3, and pricing is about half).
Is there anything I am missing?
I believe you've made a couple of errors.
S3 request pricing is $0.004 per 10,000 requests, which is $0.40 per million. That's correct.
Lambda is $0.20 per million invocations, plus CPU time. Agreed.
But I believe you've overlooked the fact that you can't invoke Lambda functions from the Internet without API Gateway, which is an additional $3.50 per million requests.
Net cost for serving static content from Lambda is $3.70 per million requests, plus CPU time.¹
This makes S3 substantially less expensive.
Then, consider bandwidth costs: CloudFront, when coupled with S3, is faster than S3 alone, has a higher per-request cost, but is also slightly less expensive for bandwidth. If you constrain your CloudFront distribution to Price Class 100 then you will actually pay less under some circumstances than just using S3 alone.
S3 download bandwidth in the least expensive regions is $0.09/GB.
CloudFront download bandwidth in the least expensive class is $0.085/GB.
Bandwidth from S3 to CloudFront is free (e.g. for cache misses).
The cost per GB downloaded is $0.005 less when using CloudFront with S3 than when using S3 alone. CloudFront charges $0.0075 per 10,000 requests, or $0.0035 more than S3... but, if we assume a 50% cache hit rate, the numbers look like this:
Per 10,000 objects: $0.0075 [CF] + ($0.004 [S3] × 0.5 [miss rate, at a 50% cache hit ratio]) = $0.0095... for simplicity, let's just round that up to $0.01.
Now, we can see that the request cost for 10K objects is exactly offset by the savings on 2 GB of download, so if your objects are larger than 2 GB/10K = 2 MB/10 = 200 KB each, then using CloudFront with S3 is actually slightly cheaper than using S3 alone. If not, the cost is still too close to be significant and, as mentioned, the download turnaround time is much shorter.
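To sanity-check that break-even figure, here is the same arithmetic in code, using the prices above and the assumed 50% cache hit rate:

    # Cost of serving 10,000 objects of ~200 KB each (≈2 GB total)
    gb_downloaded = 2.0
    hit_rate = 0.5

    s3_alone = 0.004 + gb_downloaded * 0.09                               # S3 requests + S3 bandwidth
    cf_plus_s3 = 0.0075 + (1 - hit_rate) * 0.004 + gb_downloaded * 0.085  # CF requests + origin fetches + CF bandwidth

    print(f"S3 alone:        ${s3_alone:.4f}")    # ≈ $0.1840
    print(f"CloudFront + S3: ${cf_plus_s3:.4f}")  # ≈ $0.1795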
Additionally, CloudFront supports HTTP/2.
¹ This assumes API Gateway + Lambda. Since this answer was written, there are now two more ways to allow a Lambda function to return static (or dynamic) content: CloudFront's Lambda@Edge feature supports generating HTTP responses from a Lambda function, but the function runs in a special, lightweight "edge" container that only supports Node.js. However, the minimum runtime here is 50ms rather than the standard 100ms. Application Load Balancers also support using a Lambda function as a target, and these are standard Lambda invocations in standard containers, so all runtimes are supported. Both of these can be more cost-effective than API Gateway, although the baseline cost of the ALB itself also has to be considered unless you already have an ALB. Both are also limited to a 1MB response body (on Lambda@Edge, this requires an "origin request" trigger), which is a smaller limit than API Gateway.
Another important factor to consider is Lambda cold-start time, which will impact your performance. For static resources, it might significantly increase page load time. This gets worse if your Lambda is VPC-based, since that requires a new ENI to be attached, which takes longer to create.
I have an S3 bucket in account A with millions of files that take up many GBs.
I want to migrate all of this data into a new bucket in account B.
So far, I've given account B permissions to run s3 commands on the bucket in account A.
I am able to get some results with the aws s3 sync command, using the setting aws configure set default.s3.max_concurrent_requests 100. It's fast, but it only achieves a speed of around 20,000 parts per minute.
Is there an approach to sync/move data across AWS buckets in different accounts REALLY fast?
I tried AWS Transfer Acceleration, but it seems that is good for uploading to and downloading from buckets, and I think it works within a single AWS account.
20,000 parts per minute.
That's > 300/sec, so, um... that's pretty fast. It's also 1.2 million per hour, which is also pretty respectable.
S3 Request Rate and Performance Considerations implies that 300 PUT req/sec is something of a default performance threshold.
At some point, if you make too many requests too quickly, you'll overwhelm your index partition and start encountering 503 Slow Down errors -- though hopefully aws-cli will handle that gracefully.
The idea, though, seems to be that S3 will scale up to accommodate the offered workload, so if you leave this process running, you may find that it actually does get faster with time.
Or...
If you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Note, also, that it says "temporary limits." This is where I come to the conclusion that, all on its own, S3 will -- at some point -- provision more index capacity (presumably this means a partition split) to accommodate the increased workload.
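If you do hit 503 Slow Down errors and end up scripting the copy yourself with boto3 instead of aws-cli, you can ask the SDK to back off automatically. A sketch, with placeholder bucket names and key:

    import boto3
    from botocore.config import Config

    # Adaptive retry mode backs off automatically on throttling errors such as 503 Slow Down
    s3 = boto3.client('s3', config=Config(retries={'max_attempts': 10, 'mode': 'adaptive'}))

    s3.copy(
        CopySource={'Bucket': 'source-bucket', 'Key': 'asset/1/example.jpg'},  # placeholders
        Bucket='dest-bucket',
        Key='asset/1/example.jpg',
    )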
You might also find that you get away with a much higher aggregate trx/sec if you run multiple separate jobs, each handling a different object prefix (e.g. asset/1, asset/2, asset/3, etc., depending on how the keys are designed in your bucket), because you're not creating such a hot spot in the object index.
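A rough sketch of that idea, launching one aws s3 sync job per prefix in parallel (bucket names and prefixes are placeholders, and the cross-account permissions are assumed to already be in place):

    import subprocess

    # Hypothetical prefixes; pick these based on how your keys are actually laid out
    prefixes = ['asset/1', 'asset/2', 'asset/3']

    # Start one sync job per prefix so each works on a different part of the key space
    jobs = [
        subprocess.Popen(['aws', 's3', 'sync',
                          f's3://source-bucket/{p}/',
                          f's3://dest-bucket/{p}/'])
        for p in prefixes
    ]

    # Wait for all jobs to finish
    for job in jobs:
        job.wait()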
The copy operation going on here is an internal S3-to-S3 copy; it isn't download + upload. Transfer Acceleration only applies to actual uploads and downloads over the Internet, so it doesn't help here.