After reading the documentation on AWS Cloudfront, I am aware that there is a cost for every 10,000 requests (either HTTP or HTTPS).
I was curious whether the cost applies to all requests on CloudFront, including requests for files that are already cached, or whether it applies only to requests for invalidating currently cached files.
I am guessing it's the first option, but I wanted to check just in case!
The pricing page indicates the request pricing for all HTTP/HTTPS methods per 10,000 requests (which will include both content fetch and invalidation requests). Note that HTTPS requests cost more.
Invalidation requests themselves have an additional cost, after the first 1,000 per month.
This applies to all requests including the ones that are hitting the cache.
I was using GCP cloud storage.
But there is a limit on how frequently you can update. I need to add/update 10000 pages per minute.
Quotas & limits | Cloud Storage | Google Cloud
Any other options with GCP or any other CDN to achieve frequent updates?
Adding pages can be quite different to updating pages.
For example, Amazon CloudFront is a 'pull' cache, so it only caches pages when a user requests the page and it only caches it in the region/edge location of the user who made the request.
Thus, there is no action required when new pages are added to your origin.
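As a rough illustration of the pull behaviour described above, here is a toy model in which each edge location keeps its own copy and only contacts the origin on a miss. The class name `PullCache` and the edge names are hypothetical; this is a sketch of the idea, not how CloudFront is implemented.

```python
from typing import Callable, Dict, Tuple


class PullCache:
    """Toy pull cache: one independent store per edge location."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[Tuple[str, str], str] = {}
        self.origin_fetches = 0  # counts how often the origin was contacted

    def get(self, edge: str, path: str) -> str:
        key = (edge, path)
        if key not in self._store:
            # Cache miss: pull the object from the origin and keep it at this edge only.
            self._store[key] = self._fetch(path)
            self.origin_fetches += 1
        return self._store[key]


origin = {"/index.html": "v1"}
cache = PullCache(lambda path: origin[path])

origin["/new.html"] = "hello"              # adding a page needs no cache action
first = cache.get("edge-a", "/new.html")   # miss: pulled from the origin
second = cache.get("edge-a", "/new.html")  # hit: served from edge-a
third = cache.get("edge-b", "/new.html")   # miss: edge-b has its own cache
```

Nothing is cached until the first request arrives, which is why new pages appear without any extra step.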
If, however, you want to invalidate a page that has been cached, it would cost $0.005 per path requested for invalidation (after the first 1000 paths each month). Thus, it would be expensive to invalidate thousands of pages per minute.
You might instead consider using a low Time-to-Live (TTL) and simply having pages expire themselves.
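To put a number on the invalidation cost mentioned above, here is a small estimate. It assumes every updated page needs its own invalidation path and a 30-day month; both are assumptions for illustration. The price and free allowance are the figures quoted in this answer.

```python
FREE_PATHS_PER_MONTH = 1_000   # free allowance quoted above
PRICE_PER_PATH_USD = 0.005     # price per path quoted above


def invalidation_cost(paths_in_month: int) -> float:
    """Monthly invalidation cost in USD after the free allowance."""
    billable = max(0, paths_in_month - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_PATH_USD


paths_per_minute = 10_000           # update rate from the question
minutes_per_month = 60 * 24 * 30    # assumption: 30-day month
monthly_paths = paths_per_minute * minutes_per_month

print(f"${invalidation_cost(monthly_paths):,.2f} per month")
```

At that rate the free allowance is used up within the first minute, which is why a short TTL is the more practical route.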
Let's imagine a situation like this:
We have a Node.js app that renders the view on the server side and sends HTML to the browser. The generated HTML references a few static assets (images, stylesheets, etc.).
Why should I (or not) choose S3 over Lambda to serve this content?
Here are pros & cons which I see:
Performance
I was quite sure that serving content from S3 is much faster than from Lambda (there is no script that needs to be executed)...
...Until I performed some tests (file size ~44kB) average of 10 requests:
API GW + S3: 285ms
API GW + Lambda: 290ms
S3: 135ms
As you can see, there is almost no difference between serving content from Lambda via API GW and from S3 via API GW. The only significant difference is between the direct link to S3 and the two previous tests.
Lambda 1 : S3 1
Cost
And here Lambda definitely wins.
First of all, there is a free tier of 1,000,000 requests.
Second, here is the pricing:
S3: $0.004 per 10,000 requests
Lambda: around $0.002000624 per 10,000 requests:
($0.20 per 1 million requests + $0.000000208 per 100ms)
So in pricing Lambda wins.
Summarizing
My observations show that Lambda is the better way to serve even static content (speed is similar to S3, and the price is half as much).
Is there anything I am missing?
I believe you've made a couple of errors.
S3 request pricing is $0.004 per 10,000 requests, which is $0.40 per million. That's correct.
Lambda is $0.20 per million invocations, plus CPU time. Agreed.
But I believe you've overlooked the fact that you can't invoke Lambda functions from the Internet without API Gateway, which is an additional $3.50 per million requests.
Net cost for serving static content from Lambda is $3.70 per million requests, plus CPU time.¹
This makes S3 substantially less expensive.
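The comparison above can be checked with a few lines of arithmetic. The prices are the ones quoted in this answer; Lambda compute time is left out, so the Lambda figure is a lower bound.

```python
S3_PER_10K_USD = 0.004            # S3 GET request price quoted above
LAMBDA_PER_MILLION_USD = 0.20     # Lambda invocation price quoted above
API_GATEWAY_PER_MILLION_USD = 3.50


def s3_request_cost(requests: int) -> float:
    """Request cost in USD when serving directly from S3."""
    return requests / 10_000 * S3_PER_10K_USD


def lambda_request_cost(requests: int) -> float:
    """Request cost in USD via API Gateway + Lambda, excluding compute time."""
    per_million = LAMBDA_PER_MILLION_USD + API_GATEWAY_PER_MILLION_USD
    return requests / 1_000_000 * per_million


million = 1_000_000
print(f"S3:                   ${s3_request_cost(million):.2f} per million")
print(f"API Gateway + Lambda: ${lambda_request_cost(million):.2f} per million")
```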
Then, consider bandwidth costs: CloudFront, when coupled with S3, is faster than S3 alone, has a higher per-request cost, but is also slightly less expensive for bandwidth. If you constrain your CloudFront distribution to Price Class 100 then you will actually pay less under some circumstances than just using S3 alone.
S3 download bandwidth in the least expensive regions is $0.09/GB.
CloudFront download bandwidth in the least expensive class is $0.085/GB.
Bandwidth from S3 to CloudFront is free (e.g. for cache misses).
The cost per GB downloaded is $0.005 less when using CloudFront with S3 than when using S3 alone. CloudFront charges $0.0075 per 10,000 requests, or $0.0035 more than S3... but, if we assume a 50% cache hit rate, the numbers look like this:
Per 10,000 objects $0.0075 [CF] + ($0.004 [S3] * 0.5 [hit rate]) = $0.0095... for simplicity, let's just round that up to $0.01.
Now, we can see that the request cost for 10K objects is exactly offset by the savings on 2GB of download, so if your objects are larger than 2G/10K = 2M/10 = 200KB/each then using CloudFront with S3 is actually slightly cheaper than using S3 alone. If not, the cost is still too close to be significant and, as mentioned, the download turnaround time is much shorter.
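The break-even estimate above can be reproduced as follows. This is a sketch using the prices quoted in this answer and decimal units (1 GB = 1,000,000 KB); the function names are made up for illustration.

```python
CF_PER_10K_USD = 0.0075   # CloudFront request price quoted above
S3_PER_10K_USD = 0.004    # S3 request price quoted above
S3_PER_GB_USD = 0.09      # S3 download bandwidth
CF_PER_GB_USD = 0.085     # CloudFront download bandwidth, cheapest price class


def request_cost_per_10k(origin_fetch_fraction: float) -> float:
    """CloudFront request cost plus the S3 requests caused by cache misses."""
    return CF_PER_10K_USD + S3_PER_10K_USD * origin_fetch_fraction


def break_even_object_kb(cost_per_10k: float) -> float:
    """Object size at which bandwidth savings offset the given request cost."""
    saving_per_gb = S3_PER_GB_USD - CF_PER_GB_USD
    gb_to_offset = cost_per_10k / saving_per_gb
    return gb_to_offset * 1_000_000 / 10_000  # KB per object across 10,000 objects


exact = request_cost_per_10k(0.5)       # 50% of requests reach S3
rounded = 0.01                          # rounded up, as in the text
print(f"request cost per 10K: ${exact:.4f}")
print(f"break-even object size: {break_even_object_kb(rounded):.0f} KB")
```

Without the rounding step the break-even size comes out slightly lower, so the conclusion is not sensitive to it.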
Additionally, CloudFront supports HTTP/2.
¹ This assumes API Gateway + Lambda. Since this answer was written, there are now two more ways to allow a Lambda function to return static (or dynamic) content: CloudFront's Lambda@Edge feature supports generating HTTP responses from a Lambda function, but the function runs in a special, lightweight "edge" container that only supports Node.js. However, minimum runtime here is 50ms rather than the standard 100ms. Application Load Balancers also support using a Lambda function as a target, and these are standard Lambda invocations in standard containers, so all runtimes are supported. Both of these can be more cost-effective than API Gateway, although the baseline cost of the ALB itself also has to be considered unless you already have an ALB. Both are also limited to a 1MB response body (on Lambda@Edge, this requires an "origin request" trigger), which is a smaller limit than API Gateway.
Another important factor to consider is the Lambda cold start time, which will affect your performance. For static resources, it might significantly increase the page load time. It gets worse if your Lambda function is VPC-based, because a new ENI has to be created and attached, which takes longer.
I have a RESTful webservice running on Amazon EC2. Since my application needs to deal with large number of photos, I plan to put them on Amazon S3. So the URL for retrieving a photo from S3 could look like this:
http://johnsmith.s3.amazonaws.com/photos/puppy.jpg
Is there any way or necessity to cache the images on EC2? The pros and cons I can think of is:
1) Reduced S3 usage and cost with improved image fetching performance. However on the other hand EC2 cost can rise plus EC2 may not have the capability to handle the image cache due to bandwidth restrictions.
2) Increased development complexity, because you need to check the cache first, then ask S3 to transfer the image to EC2, and then transfer it to the client.
I'm using an EC2 micro instance and feel it might be better not to do the image caching on EC2. But the scale might grow fast and I will eventually need an image cache. (Am I right?) If a cache is needed, is it better to do it on EC2 or on S3? (Is there a way to cache for S3?)
By the way, when the client uploads an image, should it be uploaded to EC2 or S3 directly?
Why bring EC2 into the equation? I strongly recommend using CloudFront for the scenario.
When you use CloudFront with S3 as the origin, the content gets distributed to 49 edge locations worldwide (the count as of today), which act as a global cache, and content is fetched from the location with the lowest latency to your end users.
That way you don't need to worry about the scale and performance of a cache on EC2; you can simply offload this to CloudFront and S3.
Static vs dynamic
Generally speaking, here are the tiers:
best CDN (cloudfront)
good static hosting (S3)
okay dynamic (EC2)
Why? There are a few reasons.
maintainability and scalability: cloudfront and S3 scale "for free". You don't need to worry about capacity or bandwidth or request rate.
price: approximately speaking, it's cheaper to use S3 than EC2.
latency: CDNs are located around the world, leading to shorter load times.
Caching
No matter where you are serving your static content from, proper use of the Cache-Control header will make life better. With that header you can tell a browser how long the content is good for. If it is something that never changes, you can instruct a browser to keep it for a year. If it frequently changes, you can instruct a browser to keep it for an hour, or a minute, or revalidate every time. You can give similar instructions to a CDN.
Here's a good guide, and here are some examples:
# keep for one year
Cache-Control: max-age=31536000
# keep for a day on a CDN, but a minute on client browsers
Cache-Control: s-maxage=86400, max-age=60
You can add this to pages served from your EC2 instance (no matter if it's nginx, Tornado, Tomcat, IIS), you can add it to the headers on S3 files, and CloudFront will use these values.
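As a small sketch of how such header values could be generated in application code, the helper below builds a Cache-Control string from lifetimes in seconds. The function name `cache_control` is hypothetical; only the `max-age` and `s-maxage` directives themselves are standard.

```python
from typing import Optional


def cache_control(max_age: int, s_maxage: Optional[int] = None) -> str:
    """Build a Cache-Control value; s-maxage applies to shared caches such as CDNs."""
    directives = []
    if s_maxage is not None:
        directives.append(f"s-maxage={s_maxage}")
    directives.append(f"max-age={max_age}")
    return ", ".join(directives)


ONE_MINUTE = 60
ONE_DAY = 24 * 60 * 60
ONE_YEAR = 365 * ONE_DAY

never_changes = cache_control(ONE_YEAR)
cdn_day_browser_minute = cache_control(ONE_MINUTE, s_maxage=ONE_DAY)

print("Cache-Control:", never_changes)
print("Cache-Control:", cdn_day_browser_minute)
```

Deriving the number of seconds from named constants avoids mistakes in hand-typed values.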
I would not pull the images from S3 to EC2 and then serve them. It's wasted effort. There are only a small number of use cases where that makes sense.
A few scenarios where an EC2 caching instance makes sense:
your upload/download ratio is far from 50/50
you hit S3 limit 100req/sec
you need URL masking
you want to optimise kernel, TCP/IP settings, cache SSL session for clients
you want proper cache invalidating mechanism for all geo locations
you need 100% control where data is stored
you need to count number of requests
you have custom authentication mechanism
For a number of reasons, I recommend taking a look at an Nginx S3 proxy.
Folks,
What is the throughput limit on GET calls to a single object in a S3 bucket? The AWS documentation suggests implementing CloudFront, however, they do not cover the case when a single object exists in a bucket. Does anyone know if the same applies, ie ~300 GET requests/sec?
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Thanks!
Note: as of July 17 2018, the request limits have been dramatically increased along with the auto-partitioning of s3 buckets.
More information here
There is no throughput limit applied on objects in Amazon S3. However, a high rate of requests per second may limit the ability for S3 to respond to queries. As per the documentation you linked, this will only be of concern above 300 requests per second.
Larger objects can therefore provide more throughput than smaller objects at the same number of requests per second.
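To make the point about object size concrete: data throughput is the request rate multiplied by the object size. The sketch below uses the 300 requests per second figure from above; the object sizes are arbitrary examples.

```python
def throughput_mb_per_s(requests_per_second: float, object_size_kb: float) -> float:
    """Data throughput in MB/s (decimal units) at a given request rate."""
    return requests_per_second * object_size_kb / 1_000


small = throughput_mb_per_s(300, 50)      # 50 KB objects (example size)
large = throughput_mb_per_s(300, 5_000)   # 5 MB objects (example size)

print(f"50 KB objects: {small:.0f} MB/s")
print(f"5 MB objects:  {large:.0f} MB/s")
```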
Amazon CloudFront can provide faster responses because information is cached rather than served directly from Amazon S3. CloudFront also has over 50 edge locations throughout the world, allowing it to serve content in parallel from multiple locations and at lower latency compared to S3.
In the Amazon docs they say that
Invalidation Requests: No additional charge for the first 1,000 files that you request for invalidation each month. $0.005 per file listed in your invalidation requests thereafter.
Does it mean that if I use www.cloudfront.net/abc.jpg 1000 times and the image is not there, I will be charged?
A request received by CloudFront for an object that doesn't exist is still a request, so will be charged at whatever cost tier you are currently at. (Requests are cheaper when you have a high volume of them.)
If you try to invalidate an object that doesn't exist, it will still count against your free invalidation quota (and be charged if you go above the 1000/month limit that is mentioned in the docs).
Mike B's comment is correct, a more detailed explanation is as follows:
Amazon CloudFront provides support for Invalidating Objects:
If you need to remove an object from CloudFront edge-server caches before it would expire, you can do one of the following:
Invalidate the object. The next time an end user requests the object, CloudFront returns to the origin to fetch the latest version of the object.
Use object versioning to serve a different version of the object that has a different name. For more information, see Updating Existing Objects Using Versioned Object Names.
[emphasis mine]
That is, this is solely a feature supporting the lifecycle of objects in CloudFront's edge-server caches and does not relate in any way to a HTTP 404 (Not Found) status code.
Consequently you won't be charged for the scenario you describe.
Appendix
In case you might be thinking about using CloudFront invalidation as well now, please be aware of the two related FAQs:
Is there a limit to the number of invalidation requests I can make? - There are no limits on the total number of objects you can invalidate; however, each invalidation request you make can have a maximum of 1,000 objects. In addition, you can only have 3 invalidation requests in progress at any given time. [...] You should use invalidation only in unexpected circumstances; if you know beforehand that your files will need to be removed from cache frequently, it is recommended that you either implement a versioning system for your files and/or set a short expiration period. [emphasis mine]
What is the price of Amazon CloudFront? - [...] You may invalidate up to 1,000 files each month from Amazon CloudFront at no additional charge. Beyond the first 1,000 files, you will be charged per file for each file listed in your invalidation requests. You can see the rates for invalidation requests here.
So the pricing reflects this feature not being intended to be a regular cache control mechanism, rather only for out of band invalidation needs.
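If you do end up invalidating many files, the limits quoted in the FAQ above (at most 1,000 objects per request, at most 3 requests in progress) mean the paths have to be split into batches. Here is a minimal sketch of that batching; the paths are made up, the limits may have changed since the FAQ was quoted, and actually submitting the batches is left out.

```python
from typing import List, Sequence

MAX_OBJECTS_PER_REQUEST = 1_000   # limit quoted in the FAQ above
MAX_REQUESTS_IN_PROGRESS = 3      # limit quoted in the FAQ above


def batch_paths(paths: Sequence[str],
                batch_size: int = MAX_OBJECTS_PER_REQUEST) -> List[List[str]]:
    """Split paths into lists of at most batch_size entries."""
    return [list(paths[i:i + batch_size]) for i in range(0, len(paths), batch_size)]


paths = [f"/images/{n}.jpg" for n in range(2_500)]  # hypothetical paths
batches = batch_paths(paths)

# Number of rounds needed if only 3 requests may be in progress at once.
rounds = -(-len(batches) // MAX_REQUESTS_IN_PROGRESS)  # ceiling division

print(f"{len(batches)} invalidation requests, submitted in {rounds} round(s)")
```

Each listed path still counts toward the monthly allowance, so batching only deals with the per-request limit, not the cost.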