Is it possible to set Amazon CloudFront cache to far-future expiry? - amazon-web-services

How do I change the cache expiry in CloudFront on AWS? I can't see a way to do it, and I think I saw an old post from a few years ago on here where somebody said it couldn't be done.
I've gone through every option in S3 and CloudFront and every option on the outer folder and on the file, but nothing.
Can it be done now, or is there any alternative? I really want to set the cache to 6 months or a year if I can.
AWS is hard work.

You can, but it's not exactly obvious how this works.
You can store custom HTTP headers with your S3 objects. In the console, this is under the metadata section for an object. With this you can set a far-future Expires header.
CloudFront will take the existing headers and pass them on. If CloudFront is already caching the object, you will need to invalidate it to see the new headers after you set them.
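For objects that are already in the bucket, one way to apply this without re-uploading is an in-place copy that replaces the metadata. A minimal sketch using Python's boto3; the bucket and key names are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Copy the object onto itself, replacing its metadata with far-future cache headers.
    s3.copy_object(
        Bucket="my-bucket",                               # hypothetical bucket
        Key="assets/logo.png",                            # hypothetical key
        CopySource={"Bucket": "my-bucket", "Key": "assets/logo.png"},
        MetadataDirective="REPLACE",                      # discard old metadata, use the values below
        CacheControl="public, max-age=31536000",          # one year
        ContentType="image/png",                          # re-set, since REPLACE drops the old content type
    )

After that, an invalidation (or waiting out the existing TTL) is still needed before CloudFront serves the object with the new headers.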

Related

submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for any documentation/sample-code (in JavaScript) for uploading files to S3 via CloudFront but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but as far as I've searched, it is possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 that says AWS just added functionality to make POST/PUT requests, but it says not a single thing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. All they do is assume that we already know everything about the subject. If I knew, why would I dive into the documentation in the first place?
I believe there is some confusion here about what adding these requests means. This feature was added simply so that POST/PUT requests are supported for your origin, so that functionality in your application such as form submissions or API requests now works through CloudFront.
The recommended approach, as you pointed out, is to make use of S3 Transfer Acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
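The question asks about JavaScript, but as a rough illustration of the Transfer Acceleration route, here is a sketch with Python's boto3 (the JavaScript SDK exposes equivalent options; the bucket and file names are hypothetical):

    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3")

    # One-time setting: turn on Transfer Acceleration for the bucket.
    s3.put_bucket_accelerate_configuration(
        Bucket="my-upload-bucket",                        # hypothetical bucket
        AccelerateConfiguration={"Status": "Enabled"},
    )

    # Uploads through this client go via the accelerate endpoint (CloudFront edge network).
    s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    s3_accel.upload_file("photo.jpg", "my-upload-bucket", "uploads/photo.jpg")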

Invalidate Cloudfront's cached data by passing in custom header

I need some resources or general direction.
I am looking into using CloudFront to help combat latency on calls to my service.
I want to be able to serve cached data, but need to allow the client to be able to specify when they want to bypass cached data and get the latest data instead.
I know that I can send a random value in the query parameter to invalidate the cache. But I want to be able to send a custom header that will do the same thing.
Ideally, I would like to use the CloudFront distribution that is created behind the scenes with API Gateway. Is this possible? Or would I need to create a new CloudFront distribution to sit in front of API Gateway?
Has anyone done this? Are there any resources you can point me to?
You cannot actually invalidate the CloudFront cache by passing a specific header -- or with a query parameter, for that matter. That is cache busting, and not invalidation.
You can configure CloudFront to include the value of a specific header in the cache key, simply by whitelisting that header for forwarding to the origin -- even if the origin ignores it.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders
However... the need to give your API's consumers a way to bypass your cache suggests there's a problem with your design. Use an adaptive Cache-Control response header and cache the responses in CloudFront for an appropriate amount of time, and this issue goes away.
Otherwise, the clever ones will just bypass it all the time, by continually changing that value.
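As an illustration of that adaptive Cache-Control idea, here is a hypothetical Lambda proxy handler behind API Gateway that lets the origin decide, per response, how long CloudFront may cache it (all names and values are made up):

    import json

    def handler(event, context):
        # Hypothetical response; volatile data gets a short TTL, stable data could get a longer one.
        payload = {"status": "ok"}
        return {
            "statusCode": 200,
            "headers": {"Cache-Control": "public, max-age=30"},
            "body": json.dumps(payload),
        }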
CloudFront does cache based on headers.
Create a custom header and whitelist that header.
CloudFront will fetch from the origin if the header value is not found in the cache.
Hope it helps.
EDIT:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html
Header based caching.
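In current CloudFront terms, putting a custom header into the cache key is done with a cache policy rather than the legacy forwarding whitelist. A rough boto3 sketch; the policy name, TTLs, and header name are all hypothetical, and the policy still has to be attached to the distribution's cache behavior afterwards:

    import boto3

    cloudfront = boto3.client("cloudfront")

    # Cache policy that makes the value of a custom header part of the cache key.
    response = cloudfront.create_cache_policy(
        CachePolicyConfig={
            "Name": "include-x-cache-bust",                       # hypothetical name
            "MinTTL": 0,
            "DefaultTTL": 300,
            "MaxTTL": 3600,
            "ParametersInCacheKeyAndForwardedToOrigin": {
                "EnableAcceptEncodingGzip": True,
                "HeadersConfig": {
                    "HeaderBehavior": "whitelist",
                    "Headers": {"Quantity": 1, "Items": ["x-cache-bust"]},  # hypothetical header
                },
                "CookiesConfig": {"CookieBehavior": "none"},
                "QueryStringsConfig": {"QueryStringBehavior": "none"},
            },
        }
    )
    print(response["CachePolicy"]["Id"])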

Best way to handle Cloudfront/S3 website with www redirected to bare domain

I have a website that I would like the www-prefixed version to redirect to the bare domain.
After searching for different solutions, I found this closed topic here with this answer that seems to work great: https://stackoverflow.com/a/42869783/8406990
However, I have a problem where if I update the root object "index.html" in my S3 bucket, it can take over a day before CloudFront serves the new version. I have even manually invalidated the file, and while that updates the "index.html" file correctly, CloudFront still serves the old one.
To better explain, if I type in: http://mywebsite.com/index.html, it will serve the new version. But if I type in http://mywebsite.com/, it serves the old index.html.
I went ahead and added "index.html" as the Default Root Object property of my CloudFront distribution (for the bare domain), and it immediately worked as I wanted. Typing in just the domain (without adding /index.html) returned the new version.
However, this is in contrast with the answer in the thread I just linked to, which explicitly states NOT to set a "default root object" when using two distributions to do the redirect. I was hoping to gain a better understanding of this "Default Root Object", and to find out whether there is a better way to make sure the cached copy of the root object updates correctly.
Thank you.
If you really put index.html/ as the default root object, and your CloudFront distribution is pointing to the web site hosting endpoint of the bucket, and it worked, then you were almost certainly serving up an object in your bucket called index.html/, which would appear in your bucket as a folder, or an object named index.html inside a folder named index.html. The trailing slash doesn't belong there. This might explain the strange behavior. But that also might just be a typo in your question.
Importantly... one purpose of CloudFront is to minimize requests to the back-end and keep copies cached in locations that are geographically near where they are frequently requested. Updating an object in S3 isn't designed to update what CloudFront serves right away, unless you have configured it to do so. One way of doing this is to set (for example) Cache-Control: public, max-age=600 on the object metadata when you save it to S3. This would tell CloudFront never to serve up a cached copy of the object that it obtained from S3 longer than 600 seconds (10 minutes) ago. If you don't set this, CloudFront will not check back for 24 hours, by default (the "Default TTL").
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
This only works in one direction -- it tells CloudFront how long it is permitted to retain a cached copy without checking for updates. It doesn't tell CloudFront that it must wait that long before checking. Objects that are requested infrequently might be released by CloudFront before their max-age expires. The next request fetches a fresh copy from S3.
If you need to wipe an object from CloudFront's cache right away, that's called a cache invalidation. These are billed at $0.005 for each path (not each file) that you request be invalidated, but the first 1,000 paths per month per AWS account are free. You can invalidate all your files by requesting an invalidation for /*. This leaves S3 untouched, but CloudFront discards anything it cached before the invalidation request.
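A minimal boto3 sketch of requesting such an invalidation (the distribution ID is hypothetical):

    import time

    import boto3

    cloudfront = boto3.client("cloudfront")

    cloudfront.create_invalidation(
        DistributionId="E1ABCDEF234567",                  # hypothetical distribution ID
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},    # one billed path, even as a wildcard
            "CallerReference": str(time.time()),          # any string unique per request
        },
    )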
The default root object is a legacy feature that is no longer generally needed since S3 introduced static web site hosting buckets. Before that -- and still, if you point CloudFront to the REST endpoint for the bucket -- someone hitting the root of your web site would see a listing of all your objects. Obviously, that's almost always undesirable, so the default root object allowed you to substitute a different page at the root of the site.
With static hosting in S3, you have index documents, which work in any "directory" on your site, unlike the CloudFront option, which only works at the root of the site. So it's relatively uncommon to use this feature now.

S3 cloudfront expiry on images - performance very slow

I recently started serving my website images from the CloudFront CDN instead of S3, thinking that it would be much faster. It is not. In fact it is much, much slower. After a lot of investigation I'm being hinted that setting an expiry date on image objects is the key, as CloudFront will then know how long to keep cached static content. Makes sense. But this is poorly documented by AWS and I can't figure out how to change the expiry date. People have said "you can change this in the AWS console". Please show how dumb I am, because I cannot see this. I've been at it for several hours. Needless to say I'm quite frustrated from fumbling around on this. Anyway, any hints, as small as they might be, would be terrific. I like AWS, and what CloudFront promised, but so far it's not what it seems.
EDIT: ADDITIONAL DETAIL
Added expiry date headers per the answer. In my case I had no headers at all. My hypothesis was that my slow CloudFront performance serving images had to do with having NO expiry in the header. Having set an expiry date as shown in the screenshot, and described in the answer, I'm seeing no noticeable difference in performance (going from no headers to adding an expiry date only). My site takes on average 7s to load with 10 core images (each <60 KB). Those 10 images (served via CloudFront) account for 60-80% of the load-time latency, depending on the performance tool used. Obviously something is wrong, given that serving the files from my VPS is faster. I hate to conclude that CloudFront is the problem given that so many people use it, and I'd hate to break off from EC2 and S3, but right now testing MaxCDN is showing better results. I'll keep testing over the next 24 hrs, but my conclusion is that the expiry date header is just a confusing detail with no favorable impact. Hopefully I'm wrong, because I'd like to keep it all in the AWS family. Perhaps I'm barking up the wrong tree on the expiry date thing?
You will need to set it in the metadata when uploading the file to S3. This article describes how you can achieve this.
The format for the expiry date is the RFC1123 date which is formatted like this:
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Setting this to a far-future date will enable caches like CloudFront to hold the file for a long time and, in this case, speed up delivery, as the individual edge locations (servers all over the world delivering content for CloudFront) already have the data and don't need to fetch it again and again.
Even with a far-future Expires header, the first request for an object will be slow, as the edge location has to fetch the object once before it can be served from the cache.
Alternatively, you can omit the Expires header and use Cache-Control instead. CloudFront understands that one too, and you are more flexible with the expiry. In that case you can, for example, state that the object should be held for one day from the first request the edge location made for the object:
Cache-Control: public, max-age=86400
In that case the time is given using seconds instead of a fixed date.
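As a rough sketch of setting either header at upload time with Python's boto3 (bucket, key and file name are hypothetical):

    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")

    with open("logo.png", "rb") as f:                     # hypothetical local file
        s3.put_object(
            Bucket="my-bucket",                           # hypothetical bucket
            Key="images/logo.png",                        # hypothetical key
            Body=f,
            ContentType="image/png",
            # Either header works; if both are present, Cache-Control max-age takes precedence.
            CacheControl="public, max-age=86400",
            Expires=datetime(2030, 1, 1, tzinfo=timezone.utc),
        )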
While setting Cache-Control or Expires headers will improve the cache-ability of your objects, it won't improve your 60k/sec download speed. This requires some help from AWS. Therefore, I would recommend posting to the AWS CloudFront Forums with some sample response headers, traceroutes and resolver information.

CloudFront atomic replication

I want to host a static site via Amazon CloudFront from an S3 bucket. If I update the content of the bucket with a new version of the page, is there a way I can ensure the distribution happens in an atomic way?
What I mean is, if I have assets like a.js and b.js, that the updated version of both is served at the same time, and not e.g. the old a.js and new b.js.
You have a couple of options:
You can request an invalidation. It takes about 15 minutes or so to complete.
You can give your new assets a new name. This is a bit harder to do, but in my opinion the preferable route, since it's easier to enable long-expiration client-side caching.
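A rough sketch of that second option, uploading each asset under a content-hashed key so the old and new versions can coexist (bucket and paths are hypothetical):

    import hashlib

    import boto3

    s3 = boto3.client("s3")

    def upload_versioned(local_path: str, bucket: str) -> str:
        # Upload an asset under a key derived from its content hash, e.g. assets/a.3f2b9c1d.js
        with open(local_path, "rb") as f:
            body = f.read()
        digest = hashlib.md5(body).hexdigest()[:8]
        stem, _, ext = local_path.rpartition(".")
        key = f"assets/{stem}.{digest}.{ext}"
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            CacheControl="public, max-age=31536000",      # safe: the key changes with every new version
            ContentType="application/javascript",         # assumes a .js asset
        )
        return key

The HTML that references the hashed names is then uploaded last, with a short TTL, so references to a.js and b.js switch over together the moment the new HTML is served.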
If you perform object invalidation, there is no guarantee that the two JS files will be invalidated at the same time. There would definitely be some window during which your site behaves unexpectedly.
Either do it at a time when you expect the fewest users visiting your site, or create new resources as "datasage" mentioned and then use the names of these newly created resources to update all the files that reference them.