I was using GCP Cloud Storage, but there is a limit on how frequently you can update the same object. I need to add or update 10,000 pages per minute.
Quotas & limits | Cloud Storage | Google Cloud
Any other options with GCP or any other CDN to achieve frequent updates?
Adding pages can be quite different to updating pages.
For example, Amazon CloudFront is a 'pull' cache, so it only caches pages when a user requests the page and it only caches it in the region/edge location of the user who made the request.
Thus, there is no action required when new pages are added to your origin.
If, however, you want to invalidate a page that has been cached, it would cost $0.005 per path requested for invalidation (after the first 1000 paths each month). Thus, it would be expensive to invalidate thousands of pages per minute.
You might instead consider using a low Time-to-Live (TTL) and simply having pages expire themselves.
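If you go the low-TTL route, the expiry can be set on the objects themselves at the origin. A rough sketch with the google-cloud-storage Python client follows; the bucket and object names are just placeholders:

# Rough sketch: upload a page with a short Cache-Control TTL so a pull-through
# CDN re-fetches it after a minute instead of needing an explicit invalidation.
# "my-pages-bucket" and the object name are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-pages-bucket")
blob = bucket.blob("pages/page-0001.html")

blob.cache_control = "public, max-age=60"  # caches may keep it for 60 seconds
blob.upload_from_string("<html>...</html>", content_type="text/html")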
After reading the documentation on AWS CloudFront, I am aware that there is a cost for every 10,000 requests (either HTTP or HTTPS).
I was curious whether this cost applies to all requests on CloudFront, including requests for files that are already cached, or only to requests for invalidating currently cached files?
I am guessing it's the first option, but I wanted to check just in case!
The pricing page indicates the request pricing for all HTTP/HTTPS methods per 10,000 requests (which will include both content fetch and invalidation requests). Note that HTTPS requests cost more.
Invalidation requests themselves have an additional cost, after the first 1,000 per month.
This applies to all requests including the ones that are hitting the cache.
I have been using the Amazon S3 service to store some files.
I have uploaded 4 videos and they are public. I'm using a third-party video player for those videos (JW Player). As a new user on the AWS Free Tier, I have almost used up my 2,000 free PUT, POST and LIST requests, which seems ridiculous for four videos.
Am I missing something? Shouldn't one upload be one PUT request? I don't understand how I've hit that limit already.
The AWS Free Tier for Amazon S3 includes:
5GB of standard storage (normally $0.023 per GB)
20,000 GET requests (normally $0.0004 per 1,000 requests)
2,000 PUT requests (normally $0.005 per 1,000 requests)
In total, it is worth up to 13.3 cents every month!
So, don't be too worried about your current level of usage, but do keep an eye on charges so you don't get too many surprises. You can always Create a Billing Alarm to Monitor Your Estimated AWS Charges.
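If you want to set that up programmatically, here is a rough boto3 sketch of such a billing alarm. The SNS topic ARN and the $5 threshold are placeholders, and billing metrics must be enabled in the billing preferences (they live in us-east-1):

# Rough sketch: alarm when estimated AWS charges exceed a threshold.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-over-5-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                    # billing metrics only update a few times a day
    EvaluationPeriods=1,
    Threshold=5.0,                   # placeholder threshold in USD
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder ARN
)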
The AWS Free Tier is provided to explore AWS services. It is not intended for production usage.
It would be very hard to find the reason for this without some debugging, so I would suggest the following:
See if you have CloudTrail enabled. If so, you can track the API calls to S3 to see if anything is wrong there.
If CloudTrail is enabled, it writes its own data into an S3 bucket, which may itself account for some of the requests.
See if you have logging enabled at the bucket level; that will give you more insight into which requests are reaching your bucket (see the sketch after this list).
Your videos are public, and that is the biggest concern here, as you don't know who can access them.
Set up CloudWatch alarms to avoid any surprises, and look at the logs to find the issue.
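If bucket-level logging is not already enabled, it can be turned on with a single call. A rough boto3 sketch; the bucket names are placeholders, and the target bucket must grant the S3 log-delivery service permission to write log objects:

# Rough sketch: enable S3 server access logging for the video bucket so each
# request against it is recorded.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-video-bucket",                     # bucket being investigated (placeholder)
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-access-logs-bucket",  # separate bucket for the logs (placeholder)
            "TargetPrefix": "video-bucket-logs/",
        }
    },
)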
I have a RESTful web service running on Amazon EC2. Since my application needs to deal with a large number of photos, I plan to put them on Amazon S3. So the URL for retrieving a photo from S3 could look like this:
http://johnsmith.s3.amazonaws.com/photos/puppy.jpg
Is there any way or necessity to cache the images on EC2? The pros and cons I can think of is:
1) Reduced S3 usage and cost, with improved image-fetching performance. On the other hand, EC2 costs can rise, and EC2 may not be able to handle the image cache due to bandwidth restrictions.
2) Increased development complexity, because you need to check the cache first, ask S3 to transfer the image to EC2, and then transfer it to the client.
I'm using the EC2 micro instance and feel it might be better not to do the image cache on EC2. But the scale might grow fast, and eventually an image cache will be needed. (Am I right?) If a cache is needed, is it better to do it on EC2 or on S3? (Is there a way of caching for S3?)
By the way, when the client uploads an image, should it be uploaded to EC2 or S3 directly?
Why bring EC2 into the equation? I strongly recommend using CloudFront for this scenario.
When you use CloudFront in conjunction with S3 as the origin, the content gets distributed to dozens of edge locations worldwide (49 at the time of writing), effectively acting as a global cache, with content served from the location nearest to your end users.
That way you don't need to worry about the scale and performance of the cache or of EC2; you can simply offload this to CloudFront and S3.
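For illustration, here is a rough boto3 sketch of creating such a distribution with the S3 bucket as its origin. The names and settings are placeholders, not a production-ready configuration (which would normally also lock the bucket down to an origin access identity):

# Rough sketch: put a CloudFront distribution in front of an S3 bucket so the
# images are cached at edge locations instead of being proxied through EC2.
import time
import boto3

cloudfront = boto3.client("cloudfront")

response = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),       # any unique string
        "Comment": "images served from S3",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-images",
                "DomainName": "johnsmith.s3.amazonaws.com",   # placeholder bucket endpoint
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-images",
            "ViewerProtocolPolicy": "redirect-to-https",
            "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "MinTTL": 0,
        },
    }
)
print(response["Distribution"]["DomainName"])      # e.g. dxxxxxxxxxxxx.cloudfront.net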
Static vs dynamic
Generally speaking, here are the tiers:
best: CDN (CloudFront)
good: static hosting (S3)
okay: dynamic (EC2)
Why? There are a few reasons.
maintainability and scalability: CloudFront and S3 scale "for free". You don't need to worry about capacity, bandwidth, or request rate.
price: approximately speaking, it's cheaper to use S3 than EC2.
latency: CDNs are located around the world, leading to shorter load times.
Caching
No matter where you are serving your static content from, proper use of the Cache-Control header will make life better. With that header you can tell a browser how long the content is good for. If it is something that never changes, you can instruct a browser to keep it for a year. If it frequently changes, you can instruct a browser to keep it for an hour, or a minute, or revalidate every time. You can give similar instructions to a CDN.
Here's a good guide, and here are some examples:
# keep for one year
Cache-Control: max-age=31536000
# keep for a day on a CDN, but a minute on client browsers
Cache-Control: s-maxage=86400, max-age=60
You can add this to pages served from your EC2 instance (no matter if it's nginx, Tornado, Tomcat, IIS), you can add it to the headers on S3 files, and CloudFront will use these values.
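On the S3 side, the header can be attached as object metadata at upload time, and CloudFront will pass it through to viewers. A minimal boto3 sketch; the bucket, key, and local file name are placeholders:

# Rough sketch: set the Cache-Control header as S3 object metadata at upload
# time so CloudFront and browsers honour it.
import boto3

s3 = boto3.client("s3")

with open("logo.png", "rb") as f:
    s3.put_object(
        Bucket="my-static-assets",               # placeholder bucket
        Key="img/logo.png",                      # placeholder key
        Body=f,
        ContentType="image/png",
        CacheControl="public, max-age=31536000",  # asset that never changes: one year
    )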
I would not pull the images from S3 to EC2 and then serve them. It's wasted effort. There are only a small number of use cases where that makes sense.
A few scenarios where an EC2 caching instance makes sense:
your upload/download ratio is far from 50/50
you hit the S3 limit of ~100 requests/sec
you need URL masking
you want to optimise kernel and TCP/IP settings, or cache SSL sessions for clients
you want a proper cache-invalidation mechanism for all geo locations
you need 100% control over where data is stored
you need to count the number of requests
you have a custom authentication mechanism
For a number of reasons, I recommend taking a look at an Nginx S3 proxy.
Folks,
What is the throughput limit on GET calls to a single object in an S3 bucket? The AWS documentation suggests implementing CloudFront; however, it does not cover the case where only a single object exists in a bucket. Does anyone know if the same limit applies, i.e. ~300 GET requests/sec?
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
Thanks!
Note: as of July 17, 2018, the request limits have been dramatically increased, along with the auto-partitioning of S3 buckets.
More information here
There is no throughput limit applied to objects in Amazon S3. However, a high rate of requests per second may limit S3's ability to respond to queries. As per the documentation you linked, this only becomes a concern above roughly 300 requests per second.
Larger objects can therefore provide more throughput than smaller objects at the same number of requests per second.
Amazon CloudFront can provide faster responses because information is cached rather than served directly from Amazon S3. CloudFront also has over 50 edge locations throughout the world, allowing it to serve content in parallel from multiple locations and at lower latency compared to S3.
In the Amazon docs they say that
Invalidation Requests: No additional charge for the first 1,000 files that you request for invalidation each month. $0.005 per file listed in your invalidation requests thereafter.
Does this mean that if I request www.cloudfront.net/abc.jpg 1,000 times and the image is not there, I will be charged?
A request received by CloudFront for an object that doesn't exist is still a request, so will be charged at whatever cost tier you are currently at. (Requests are cheaper when you have a high volume of them.)
If you try to invalidate an object that doesn't exist, it will still count against your free invalidation quota (and be charged if you go above the 1000/month limit that is mentioned in the docs).
Mike B's comment is correct; a more detailed explanation follows:
Amazon CloudFront provides support for Invalidating Objects:
If you need to remove an object from CloudFront edge-server caches before it would expire, you can do one of the following:
Invalidate the object. The next time an end user requests the object, CloudFront returns to the origin to fetch the latest version of the object.
Use object versioning to serve a different version of the object that has a different name. For more information, see Updating Existing Objects Using Versioned Object Names.
[emphasis mine]
That is, this is solely a feature supporting the lifecycle of objects in CloudFront's edge-server caches and does not relate in any way to an HTTP 404 (Not Found) status code.
Consequently you won't be charged for the scenario you describe.
Appendix
In case you are now thinking about using CloudFront invalidation as well, please be aware of the two related FAQs:
Is there a limit to the number of invalidation requests I can make? - There are no limits on the total number of objects you can invalidate; however, each invalidation request you make can have a maximum of 1,000 objects. In addition, you can only have 3 invalidation requests in progress at any given time. [...] You should use invalidation only in unexpected circumstances; if you know beforehand that your files will need to be removed from cache frequently, it is recommended that you either implement a versioning system for your files and/or set a short expiration period. [emphasis mine]
What is the price of Amazon CloudFront? - [...] You may invalidate up to 1,000 files each month from Amazon CloudFront at no additional charge. Beyond the first 1,000 files, you will be charged per file for each file listed in your invalidation requests. You can see the rates for invalidation requests here.
So the pricing reflects that this feature is not intended as a regular cache-control mechanism, but rather only for out-of-band invalidation needs.
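For completeness, this is roughly what an out-of-band invalidation looks like through the API. A boto3 sketch; the distribution ID and paths are placeholders, and every path listed counts towards the 1,000 free invalidation paths per month mentioned above:

# Rough sketch: request invalidation of a couple of cached paths.
import time
import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",             # placeholder distribution ID
    InvalidationBatch={
        "CallerReference": str(time.time()),     # any unique string per request
        "Paths": {
            "Quantity": 2,
            "Items": ["/abc.jpg", "/css/site.css"],
        },
    },
)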