Best way to handle Cloudfront/S3 website with www redirected to bare domain - amazon-web-services

I have a website that I would like the www-prefixed version to redirect to the bare domain.
After searching for different solutions, I found this closed topic here with this answer that seems to work great: https://stackoverflow.com/a/42869783/8406990
However, I have a problem where if I update the root object "index.html" in my S3 bucket, it can take over a day before CloudFront serves the new version. I have even manually invalidated the file, and while that updates the "index.html" file correctly, CloudFront still serves the old one.
To better explain, if I type in: http://mywebsite.com/index.html, it will serve the new version. But if I type in http://mywebsite.com/, it serves the old index.html.
I went ahead and added "index.html" in the Default Root Object Property of my Cloudfront distribution (for the bare domain), and it immediately worked as I wanted. Typing in just the domain (without adding /index.html) returned the new version.
However, this is in contrast with the answer in the thread I just linked to, which explicitly states NOT to set a "default root object" when using two distributions to do the redirect. I was hoping to gain a better understanding of this "Default Root Object", and whether there is a better way to make sure the root object updates the cached version correctly?
Thank you.

If you really put index.html/ as the default root object and your CloudFront distribution is pointing to the web site hosting endpoint of the bucket and it worked, then you were almost certainly serving up an object in your bucket called index.html/ -- which would appear in your bucket as a folder, or as an object named index.html inside a folder named index.html. The trailing slash doesn't belong there. This might explain the strange behavior, but it might also just be a typo in your question.
Importantly... one purpose of CloudFront is to minimize requests to the back-end and keep copies cached in locations that are geographically near where they are frequently requested. Updating an object in S3 isn't designed to update what CloudFront serves right away, unless you have configured it to do so. One way of doing this is to set (for example) Cache-Control: public, max-age=600 on the object metadata when you save it to S3. This would tell CloudFront never to serve up a cached copy of the object that it obtained from S3 longer than 600 seconds (10 minutes) ago. If you don't set this, CloudFront will not check back for 24 hours, by default (the "Default TTL").
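As a sketch of that freshness logic (a toy model with a hypothetical function name, not CloudFront's actual implementation):

```python
import time

# Default TTL CloudFront applies when the origin sends no Cache-Control
# or Expires header (configurable per cache behavior).
DEFAULT_TTL = 24 * 60 * 60  # 24 hours, in seconds


def is_cache_fresh(fetched_at, max_age=None, now=None):
    """Return True if a copy fetched at `fetched_at` (epoch seconds)
    may still be served without checking back with the origin.

    `max_age` is the value from the origin's Cache-Control: max-age=N
    header, or None if the origin sent no caching headers.
    """
    if now is None:
        now = time.time()
    ttl = max_age if max_age is not None else DEFAULT_TTL
    return (now - fetched_at) < ttl


# With Cache-Control: public, max-age=600, a copy fetched 15 minutes
# ago is stale; without the header, it can be served for 24 hours.
fetched = time.time() - 15 * 60
print(is_cache_fresh(fetched, max_age=600))  # False: older than 600s
print(is_cache_fresh(fetched))               # True: within default 24h
```

Note that this models the upper bound only; as the next paragraph explains, an infrequently requested object may be evicted well before its max-age expires.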
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
This only works in one direction -- it tells CloudFront how long it is permitted to retain a cached copy without checking for updates. It doesn't tell CloudFront that it must wait that long before checking. Objects that are requested infrequently might be released by CloudFront before their max-age expires. The next request fetches a fresh copy from S3.
If you need to wipe an object from CloudFront's cache right away, that's called a cache invalidation. These are billed $0.005 for each path (not each file) that you request be invalidated, but the first 1,000 per month per AWS account are billed at $0.00. You can invalidate all your files by requesting an invalidation for /*. This leaves S3 untouched but CloudFront discards anything it cached before the invalidation request.
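The per-path billing works out like this (a simple illustrative calculator using the figures quoted above; the helper name is made up):

```python
def invalidation_cost(paths_this_month, free_tier=1000, rate=0.005):
    """Estimate the monthly charge for CloudFront invalidation paths.

    Billing is per *path* listed in invalidation requests, not per
    file: a single wildcard path like /* counts as one path no matter
    how many cached objects it matches. The first `free_tier` paths
    in a month cost nothing.
    """
    billable = max(0, paths_this_month - free_tier)
    return billable * rate


print(invalidation_cost(1))     # 0.0 -- covered by the free tier
print(invalidation_cost(1500))  # 2.5 -- 500 billable paths at $0.005
```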
The default root object is a legacy feature that is no longer generally needed since S3 introduced static web site hosting buckets. Before that -- and still, if you point CloudFront to the REST endpoint for the bucket -- someone hitting the root of your web site would see a listing of all your objects. Obviously, that's almost always undesirable, so the default root object allowed you to substitute a different page at the root of the site.
With static hosting in S3, you have index documents, which work in any "directory" on your site -- unlike the CloudFront default root object, which only works at the root of the site. So it's relatively uncommon to use this feature, now.

Related

CloudFlare or AWS CDN links

I have a script that I install on a page and it will load some more JS and CSS from an S3 bucket.
I have versions, so when I do a release on Github for say 1.1.9 it will get deployed to /my-bucket/1.1.9/ on S3.
Question, if I want to have something like a symbolic link /my-bucket/v1 -> /my-bucket/1.1.9, how can I achieve this with AWS or CloudFlare?
The idea is that I want to release a new version by deploying it to my bucket or whatever CDN, and then, when I am ready, switch v1 to the latest 1.x.y version released. I want all websites to point to /v1 and get the latest when there is a new release.
Is there a CDN or AWS service or configuration that will allow me to create a sort of a linux-like symbolic link like that?
A simple solution with CloudFront requires a slight change in your path design:
Bucket:
/1.1.9/v1/foo
Browser:
/v1/foo
CloudFront Origin Path (on the Origin tab):
/1.1.9
Whatever you configure as the Origin Path is added to the beginning of whatever the browser requested before sending the request to the Origin server.
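That prepending step can be sketched as (a hypothetical helper, not CloudFront code):

```python
def origin_request_path(origin_path, requested_path):
    """Build the path CloudFront sends to the origin: the configured
    Origin Path is prepended to whatever the browser requested."""
    return origin_path.rstrip("/") + requested_path


# Browser asks for /v1/foo; with Origin Path /1.1.9 the origin
# (the S3 bucket) sees a request for /1.1.9/v1/foo.
print(origin_request_path("/1.1.9", "/v1/foo"))  # /1.1.9/v1/foo
```

Switching releases then means editing one field (the Origin Path) rather than copying or renaming objects in the bucket.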
Note that changing this means you also need to do a cache invalidation, because responses are cached based on what was requested, not what was fetched.
There is a potential race condition between the time you change the config and the time you invalidate -- there is no guaranteed ordering between configuration changes and invalidation requests, and an invalidation submitted after a config change may complete before the change has propagated.¹ So you will probably need to invalidate, update the config, invalidate again, verify that the distribution has progressed to a stable state, then invalidate once more. You don't need to invalidate objects individually; just use /* or /v1*. It would be best if only the resource directly requested is subject to the rewrite, and not its dependencies. Remember, also, that browser caching is a big cost-saver that you can't leverage as fully if you use the same request URI to represent a different object over time.
More complicated path rewriting in CloudFront requires a Lambda@Edge Origin Request trigger (or you could use a Viewer Request trigger, but those run on every request and thus cost more and add to overall latency).
¹ Invalidation requests -- though this is not documented and is strictly anecdotal -- appear to involve a bit of time travel. Invalidations are timestamped, and it appears that they invalidate anything cached before their timestamp, rather than before the time they propagate to the edge locations. Architecturally, it would make sense if CloudFront is designed such that invalidations don't actively purge content, but only serve as directives for the cache to consider any cached object as stale if it pre-dates the timestamp on the invalidation request, allowing the actual purge to take place in the background. Invalidations seem to complete too rapidly for any other explanation. This means creating an invalidation request after the distribution returns to the stable Deployed state would assure that everything old is really purged, and that another invalidation request when the change is initially submitted would catch most of the stragglers that might be served from cache before the change is propagated. Changes and invalidations do appear to propagate to the edges via independent pipelines, based on observed completion timing.
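That inferred model can be sketched as a lazy purge: each cached object keeps its fetch timestamp, and a lookup treats anything fetched before the newest invalidation timestamp as stale. This is purely illustrative -- it models the anecdotal behavior described above, not documented CloudFront internals:

```python
import time


class EdgeCache:
    """Toy model of timestamp-based invalidation: nothing is actively
    purged; lookups just compare fetch time to the invalidation mark."""

    def __init__(self):
        self._objects = {}        # path -> (fetched_at, body)
        self._invalidated_at = 0  # timestamp of the newest invalidation

    def store(self, path, body, fetched_at=None):
        when = fetched_at if fetched_at is not None else time.time()
        self._objects[path] = (when, body)

    def invalidate(self, when=None):
        self._invalidated_at = when if when is not None else time.time()

    def get(self, path):
        entry = self._objects.get(path)
        if entry is None:
            return None  # cache miss: fetch from origin
        fetched_at, body = entry
        if fetched_at <= self._invalidated_at:
            return None  # cached before the invalidation: treat as stale
        return body


cache = EdgeCache()
cache.store("/index.html", "old", fetched_at=100)
cache.invalidate(when=200)       # invalidation timestamped later
print(cache.get("/index.html"))  # None -- the old copy is now stale
cache.store("/index.html", "new", fetched_at=300)
print(cache.get("/index.html"))  # new
```

Under this model an invalidation "completes" as soon as the timestamp directive reaches the edges, which would explain the observed speed.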

Can a distribution automatically match the subdomain from a request to figure out the origin

We're adding a lot of nearly equivalent apps on the same domain; each app can be accessed through its specific subdomain. Each app has some specific assets (not a lot).
Every app refers to the same cdn.mydomain.com to get its assets from CloudFront.
Assets are namespaced. For example:
app1:
Can be reached from app1.mydomain.com
assets URL is cdn.mydomain.com/assets/app1
CloudFront origin app1.mydomain.com
cache behavior /assets/app1/* to origin app1.mydomain.com
When CloudFront doesn't have the assets in cache, it downloads them from the right origin.
Currently, we add a new origin and cache behavior on the same distribution each time we add a new app.
We're trying to simplify that process so CloudFront can get the assets from the right origin without our having to specify it. This would also avoid the problem of hitting the limit on the number of origins in one distribution.
How can we do this and is it possible?
We're thinking of making an origin of mydomain.com with a cache behavior configured to forward the Host header, but we're not sure that this will do the trick.
Origins are tied to Cache Behaviors, which are tied to path patterns. You can't really do what you're thinking about doing.
I would suggest that you should create a distribution for each app and each subdomain. It's very easy to script this using aws-cli, since once you have one set up the way you like it, you can use its configuration output as a template to make more, with minimal changes. (I use a Perl script to build the final JSON to create each distribution, with minimal inputs like alternate domain name and certificate ARN and pipe its output into aws-cli.)
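That templating approach can be sketched in Python rather than Perl. The field names below are drawn from CloudFront's DistributionConfig JSON, but heavily pared down -- a real config has many more required fields, all of which would come from the captured template:

```python
import copy
import json

# A pared-down template captured from an existing distribution
# (e.g. via `aws cloudfront get-distribution-config`). Only the
# per-app values change between distributions.
TEMPLATE = {
    "CallerReference": None,
    "Aliases": {"Quantity": 1, "Items": [None]},
    "ViewerCertificate": {"ACMCertificateArn": None,
                          "SSLSupportMethod": "sni-only"},
    "Comment": "",
}


def distribution_config(alias, cert_arn, caller_reference):
    """Fill the per-app fields of the shared template."""
    cfg = copy.deepcopy(TEMPLATE)  # deep copy: never mutate the template
    cfg["CallerReference"] = caller_reference
    cfg["Aliases"]["Items"] = [alias]
    cfg["ViewerCertificate"]["ACMCertificateArn"] = cert_arn
    cfg["Comment"] = f"Distribution for {alias}"
    return cfg


cfg = distribution_config(
    "app1.cdn.example.com",
    "arn:aws:acm:us-east-1:123456789012:certificate/example",
    "app1-initial")
print(json.dumps(cfg, indent=2))
# The resulting JSON could then be fed to:
#   aws cloudfront create-distribution --distribution-config file://cfg.json
```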
I believe this is the right approach, because:
CloudFront cannot select the origin based on the Host header. Only the path pattern is used to select the origin.
Lambda@Edge can rewrite the path and can inspect the Host header, but it cannot rewrite the path before the matching is done that selects the Cache Behavior (and thus the origin). You cannot use Lambda@Edge to cause CloudFront to switch or select origins, unless you generate browser redirects, which you probably don't want to do, for performance reasons. I've submitted a feature request to allow a Lambda trigger to signal CloudFront that it should return to the beginning of processing and re-evaluate the path, but I don't know if it is being considered as a future feature -- AWS tends to keep their plans for future functionality close to the vest, and understandably so.
you don't gain any efficiency or cost savings by combining your sites in a single distribution, since the resources are different
if you decide to whitelist the Host header, that means CloudFront will cache responses, separately, based on the Host header, the same as it would do if you had created multiple distributions. Even if the path is identical, it will still cache separate responses if the Host header differs, as it must to ensure sensible behavior
the default limit for distributions is 200, while the limit for origins and cache behaviors is 25. Both can be raised by request, but the number of distributions they can give you is unlimited, while the other resources are finite because they increase the workload on the system for each request and would eventually have a negative performance impact
separate distributions give you separate logs and reports
provisioning errors have a smaller blast radius when each app has its own distribution
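The Host-header caching point above can be sketched as a toy cache-key model (illustrative only; CloudFront's real cache key is internal):

```python
def cache_key(path, host, whitelist_host=False):
    """Toy model of the edge cache key: whitelisting the Host header
    makes it part of the key, so identical paths requested via
    different subdomains are cached as separate responses."""
    return (path, host if whitelist_host else None)


# With Host whitelisted, app1 and app2 cache separately even for the
# same path -- exactly as separate distributions would.
print(cache_key("/assets/x.js", "app1.example.com", whitelist_host=True))
print(cache_key("/assets/x.js", "app2.example.com", whitelist_host=True))
```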
You can also go into Amazon Certificate Manager and create a wildcard certificate for *.cdn.example.com. Then use e.g. app1.cdn.example.com as the alternate domain name for the app1 distribution and attach the wildcard cert. Then reuse the same cert on the app2.cdn.example.com distribution, etc.
Note that you also have an easy migration strategy from your current solution: You can create a single distribution with *.cdn.example.com as its alternate domain name. Code the apps to use their own unique-name-here.cdn.example.com. Point all the DNS records here. Later, when you create a distribution with a specific alternate domain name foo.cdn.example.com, CloudFront will automatically stop routing those requests to the wildcard distribution and start routing them to the one with the specific domain. You will need to change the DNS entry... but CloudFront will actually handle the requests correctly, routing them to the newly-created distribution, before you change the DNS, because it has some internal magic that will match the non-wildcard hostname to the correct distribution regardless of whether the browser connects to the new endpoint or the old... so the migration event should pretty much be a non-event.
I'd suggest the wildcard strategy is a good one, anyway, so that your apps are each connecting to a specific endpoint hostname, allowing you much more flexibility in the future.

GET from CloudFront after PUT returns old data

I am using CloudFront to access my S3 bucket.
I perform both GET and PUT operations to retrieve and update the data. The problem is that after I send a PUT request with new data, a GET request still returns the older data. I do see that the file is updated in the S3 bucket.
I am performing both GET and PUT from an iOS application. However, I tried performing the GET request using regular browsers and I still receive the older data.
Do I need to do anything else to make CloudFront refresh its data?
Cloudfront caches your data. How long depends on the headers the origin serves content with and the distribution settings.
Amazon has a document with the full details of how they interact, but if you haven't set your cache control headers and haven't changed any CloudFront settings, then by default data is cached for up to 24 hours.
You can either:
Set headers indicating how long to cache content (e.g. Cache-Control: max-age=300 to allow caching for up to 5 minutes). How exactly you do this depends on how you are uploading the content; in a pinch you can use the console.
Use the console / API to invalidate content. Beware that only the first 1,000 invalidation paths a month are free; beyond that, Amazon charges. In addition, invalidations take 10-15 minutes to process.
Change the naming strategy for your S3 data so that new data is served under a different name (perhaps less relevant in your case).
When you PUT an object into S3 by sending it through Cloudfront, Cloudfront proxies the PUT request back to S3, without interpreting it within Cloudfront... so the PUT request changes the S3 object, but the old version of the object, if cached, would have no reason to be evicted from the Cloudfront cache, and would continue to be served until it is expired, evicted, or invalidated.
"The" Cloudfront cache is not a single thing. Cloudfront has over 50 global edge locations (requests are routed to what should be the closest one, using geolocating DNS), and objects are only cached in locations through which they have been requested. Sending an invalidation request to purge an object from cache causes a background process at AWS to contact all of the edge locations and request the object be purged, if it exists.
What's the point of uploading this way, then? The point has to do with the impact of packet loss, latency, and overall network performance on the throughput of a TCP connection.
The Cloudfront edge locations are connected to the S3 regions by high bandwidth, low loss, low latency (within the bounds of the laws of physics) connections... so the connection from the "back side" of Cloudfront towards S3 may be a connection of higher quality than the browser would be able to establish.
Since the Cloudfront edge location is also likely to be closer to the browser than S3 is, the browser connection is likely to be of higher quality and more resilient... thereby improving the net quality of the end-to-end logical connection, by splitting it into two connections. This feature is solely about performance:
http://aws.amazon.com/blogs/aws/amazon-cloudfront-content-uploads-post-put-other-methods/
If you don't have any issues sending directly to S3, then uploads "through" Cloudfront serve little purpose.

CloudFront atomic replication

I want to host a static site via Amazon CloudFront from an S3 bucket. If I update the content of the bucket with a new version of the page, is there a way I can ensure the distribution happens in an atomic way?
What I mean is, if I have assets like a.js and b.js, that the updated version of both is served at the same time, and not e.g. the old a.js and new b.js.
You have a couple of options:
You can request an invalidation. Takes about 15 minutes or so to complete.
You can give your new assets a new name. This is a bit harder to do, but in my opinion the preferable route, since it's easier to enable long-expiration client-side caching.
If you perform object invalidation, there is no guarantee that the two JS files will be invalidated at the same time. There would definitely be some window in which your site behaves unexpectedly.
Either do it at a time when you expect the least number of users visiting your site, or create new resources as "datasage" mentioned and then update all the files that reference them to use the new names.
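The versioned-name approach can be sketched as content-hash naming: each deploy writes assets under names derived from their bytes, and the HTML that references them is the only file updated in place. Since the HTML names both a.js and b.js explicitly, viewers always get a matching pair (the helper below is hypothetical, not an AWS API):

```python
import hashlib


def hashed_name(filename, content):
    """Derive a versioned object name from the file's content, so each
    release gets a fresh name and old cached copies can never be mixed
    with new ones."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


# Each release uploads a.<hash>.js and b.<hash>.js, then rewrites the
# HTML to reference both new names in a single update. Viewers get
# either the old HTML (old pair) or the new HTML (new pair), never a mix.
print(hashed_name("a.js", b"console.log('v2');"))
```

Because the asset names never change meaning, they can also be served with far-future Cache-Control headers, avoiding invalidations entirely; only the (short-lived) HTML ever needs revalidating.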

Is it possible to set Amazon CloudFront cache to far-future expiry?

How do I change the cache expiry in CloudFront on AWS? I can't see a way to do it, and I think I saw an old post on here from a few years ago where somebody said it couldn't be done.
I've gone through every option in S3 and CloudFront and every option on the outer folder and on the file, but nothing.
Can it be done now, or is there any alternative? I really want to set the cache to 6 months or a year if I can.
AWS is hard work.
You can, but it's not exactly obvious how this works.
You can store custom HTTP headers with your S3 objects. If you look at the console, this is under the metadata section for an object. With this you can set a far-future Expires header.
CloudFront will take the existing headers and pass them on. If CloudFront is already caching the object, you will need to invalidate it to see the headers after you set them.
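Building the header values is simple arithmetic (the helper name here is made up; the resulting strings are what you'd store as the object's Cache-Control and Expires metadata in S3):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

SECONDS_PER_DAY = 86400


def far_future_headers(days=365, now=None):
    """Build Cache-Control / Expires values for a long-lived object.

    The Cache-Control max-age (in seconds) is the modern mechanism;
    the Expires HTTP-date is included for older clients.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Cache-Control": f"public, max-age={days * SECONDS_PER_DAY}",
        "Expires": format_datetime(now + timedelta(days=days), usegmt=True),
    }


# One year out:
print(far_future_headers(days=365)["Cache-Control"])
# public, max-age=31536000
```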