AWS CloudFront | Invalidations - amazon-web-services

I am using a single CloudFront distribution to serve multiple domains (*.domain.com); this distribution uses an API server as its origin.
Any request made to a domain, say test1.domain.com, goes to the API server, and we send the response back depending on the Host header.
Now, if I want to purge the cache, I can't do it for a specific domain (CloudFront only allows path invalidation like /users/* or /users//books*).
Note: I don't want to purge the entire cache (/*), as it would cause performance issues.
Has anyone faced this situation?
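To make the limitation concrete, here is a toy model (not CloudFront's implementation): the cache key includes the Host header, but an invalidation path matches only the URL path, so invalidating /users/* drops the cached copies for every subdomain at once. The hostnames and cached values are hypothetical.

```python
from fnmatch import fnmatchcase

# Toy edge cache keyed on (Host header, path), because the origin returns
# different content per subdomain. Hostnames are hypothetical.
cache = {
    ("test1.domain.com", "/users/1"): "user 1 for test1",
    ("test2.domain.com", "/users/1"): "user 1 for test2",
    ("test2.domain.com", "/books/9"): "book 9 for test2",
}


def invalidate(cache, path_pattern):
    """Drop every entry whose path matches, whatever its host.

    An invalidation path carries no host name, so it cannot be scoped to
    test1.domain.com alone.
    """
    doomed = [key for key in cache if fnmatchcase(key[1], path_pattern)]
    for key in doomed:
        del cache[key]
    return doomed


removed = invalidate(cache, "/users/*")
print(sorted(host for host, _ in removed))
# ['test1.domain.com', 'test2.domain.com']
```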

Related

AWS Cloudfront Origin Groups "cannot include POST, PUT, PATCH, or DELETE for a cached behavior"

I have two paths in a CloudFront distribution which have the following Behaviors:
Path Pattern | Origin | Viewer Protocol Policy
/api/* | APIOriginGroup | Redirect HTTP to HTTPS
/* | S3OriginGroup | Redirect HTTP to HTTPS
And these Origins:
Origin Name | Origin Domain | Origin Group
S3Origin1 | us-east-1.s3.[blah] | S3OriginGroup
S3Origin2 | us-east-2.s3.[blah] | S3OriginGroup
APIOrigin1 | a domain | APIOriginGroup
APIOrigin2 | a domain | APIOriginGroup
This setup works fine for GET requests, but if I add POST requests into the Cache Behavior's Cache Methods I get an error:
"cannot include POST, PUT, PATCH, or DELETE for a cached behavior"
This doesn't make sense to me. If CloudFront really is used by AWS customers to serve billions of requests per day, and AWS recommends using CloudFront origin failover, which requires origin groups, then there must be some way to configure CloudFront behaviors that allow POST requests. Is this not true? Are all of the API requests made by this customer GET requests?
To be clear, my fundamental problem is that I want to use CloudFront origin failover to switch between my primary region and secondary region when an AWS region fails. To make that possible, I need to switch over not only my front-end, S3-based traffic (GET requests) but also my back-end traffic (POST requests).
Note: CloudFront supports routing behaviors with POST requests if you do not use an origin group. This error appeared only when I added the origin group (to support the second region).
Short Answer: You can't do origin failover in CloudFront for request methods other than GET, HEAD, or OPTIONS. Period.
TL;DR
CloudFront always caches responses to GET and HEAD requests, and it can be configured to cache OPTIONS requests too. However, it doesn't cache POST, PUT, PATCH, DELETE, ... requests, which is consistent with most public CDNs out there, although some of them let you write custom hooks that cache POST, PUT, PATCH, DELETE, ... requests. You might be wondering why. Why can't I cache POST requests? The answer is RFC 2616. Since POST requests are not idempotent, the specification advises against caching them; they should always be sent through to the intended origin server. There's a very nice SO thread here which you can read for a better understanding.
CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS. CloudFront does not fail over when the viewer sends a different HTTP method (for example POST, PUT, and so on).
OK, POST requests are not cached by CloudFront. But why does CloudFront not provide failover for POST requests?
Let's see how CloudFront handles requests when the primary origin fails:
CloudFront routes all incoming requests to the primary origin, even when a previous request failed over to the secondary origin. CloudFront only sends requests to the secondary origin after a request to the primary origin fails.
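The two documented rules quoted above (failover only for GET, HEAD and OPTIONS, and every request tries the primary first) can be modeled as a small decision function. This is a toy sketch, not CloudFront code: the codes in FAILOVER_STATUS_CODES are an example of the failover criteria you select on an origin group, `primary` and `secondary` are hypothetical callables that return a status code or raise TimeoutError, and reporting an unanswered request as 504 is an assumption.

```python
# Methods for which CloudFront origin groups will fail over.
FAILOVER_METHODS = {"GET", "HEAD", "OPTIONS"}

# Example failover criteria; on a real origin group you choose these codes.
FAILOVER_STATUS_CODES = {500, 502, 503, 504}


def route(method, primary, secondary):
    """Return (origin_used, status) for a single viewer request.

    Every request starts at the primary, even if earlier requests failed
    over; the secondary is tried only after this request's primary attempt
    fails, and only for GET, HEAD or OPTIONS.
    """
    try:
        status = primary()
    except TimeoutError:
        status = None  # connection failure or timeout at the primary
    primary_failed = status is None or status in FAILOVER_STATUS_CODES
    if primary_failed and method.upper() in FAILOVER_METHODS:
        return "secondary", secondary()
    if status is None:
        return "primary", 504  # assumption: viewer sees a gateway timeout
    return "primary", status
```

With a primary that times out, `route("GET", ...)` ends at the secondary, while `route("POST", ...)` returns the primary's failure to the viewer.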
Now, since POST requests are not cached, CloudFront has to go to the primary origin every time, come back with an invalid response or, worse, a timeout, and then hit the secondary origin in the origin group. We're talking about region failures here. The volume of failover requests from primary to secondary would be extremely high, and we might expect a cascading failure due to the load. This would lead to CloudFront PoP failures, which defeats the whole purpose of high availability, doesn't it? Again, this explanation is only my assumption. Of course, I'm sure the folks at CloudFront will come up with a solution for POST request region failover soon.
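The load argument above can be put into rough numbers with Little's law: requests in flight = arrival rate × time each request is held. The figures are assumptions for illustration: 1,000 POST requests per second, a healthy primary answering in about 100 ms, and a failed primary that makes every request wait out a 30-second origin response timeout (CloudFront's default for custom origins).

```python
def in_flight(requests_per_second, seconds_per_request):
    """Little's law: average number of requests held open concurrently."""
    return requests_per_second * seconds_per_request


healthy = in_flight(1_000, 0.1)   # primary answers in ~100 ms
degraded = in_flight(1_000, 30)   # every request waits out a 30 s timeout
print(round(healthy), degraded, round(degraded / healthy))  # 100 30000 300
```

That is roughly 300 times as many connections held open at the edge for the same traffic, before counting the extra trip to the secondary.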
So far so good. But how are other AWS customers able to guarantee high availability to their users in case of AWS region failures?
Well, other AWS customers use CloudFront region failover only to make static websites, SPAs, and static content such as videos (live and on demand) and images failure-proof, which, by the way, only requires GET, HEAD and occasionally OPTIONS HTTP requests. Imagine a SaaS company that drives its sales and discoverability through a static website. If the method above could reduce your downtime and keep your sales and growth from taking a hit, why wouldn't you use it?
Got it. But I really do need region failover for my backend APIs. How can I do it?
One way would be to write a custom Lambda@Edge function. CloudFront hits the intended primary origin; the function checks for timeouts, response codes, etc., and, if failover has to be triggered, calls the other origin's endpoint and returns its response. Again, this goes against the way CloudFront is currently designed.
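A minimal sketch of that Lambda@Edge idea as a Python origin-request handler, not a hardened implementation. It assumes the cache behavior is configured to include the request body, uses two hypothetical regional API hostnames, forwards only Content-Type back to the viewer for brevity, and replays the request to the secondary when the primary times out or returns one of the assumed 5xx codes. Lambda@Edge limits on function timeout and generated response size still apply.

```python
import base64
import urllib.error
import urllib.request

# Hypothetical regional endpoints for the same API.
PRIMARY = "https://api-us-east-1.example.com"
SECONDARY = "https://api-us-east-2.example.com"
FAILOVER_STATUSES = {500, 502, 503, 504}  # assumed failover criteria
TIMEOUT_SECONDS = 5
SKIP_HEADERS = {"host", "accept-encoding", "content-length"}


def fetch_origin(base_url, method, path, headers, body):
    """Send one request and return (status, content_type, body_bytes)."""
    req = urllib.request.Request(base_url + path, data=body,
                                 headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
            return resp.status, resp.headers.get("Content-Type"), resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a response
        return err.code, err.headers.get("Content-Type"), err.read()


def to_cf_response(status, content_type, body):
    """Build a Lambda@Edge generated response."""
    response = {
        "status": str(status),
        "body": base64.b64encode(body).decode("ascii"),
        "bodyEncoding": "base64",
        "headers": {},
    }
    if content_type:
        response["headers"]["content-type"] = [
            {"key": "Content-Type", "value": content_type}]
    return response


def handler(event, context, fetch=fetch_origin):
    request = event["Records"][0]["cf"]["request"]
    path = request["uri"]
    if request.get("querystring"):
        path += "?" + request["querystring"]
    # Forward viewer headers except hop/transport ones urllib manages itself.
    headers = {values[0]["key"]: values[0]["value"]
               for name, values in request.get("headers", {}).items()
               if name not in SKIP_HEADERS}
    body = None
    raw = request.get("body", {})
    if raw.get("data"):
        data = raw["data"]
        body = (base64.b64decode(data) if raw.get("encoding") == "base64"
                else data.encode("utf-8"))

    method = request["method"]
    try:
        status, ctype, payload = fetch(PRIMARY, method, path, headers, body)
        if status not in FAILOVER_STATUSES:
            return to_cf_response(status, ctype, payload)
    except OSError:  # URLError and socket timeouts are OSError subclasses
        pass
    status, ctype, payload = fetch(SECONDARY, method, path, headers, body)
    return to_cf_response(status, ctype, payload)
```

Replaying a POST like this is only safe if the API is idempotent: if the primary processed the request before failing, the secondary will process it a second time.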
Another solution, which in my opinion is much cleaner, is to use Route 53 latency-based routing. You can read about how to do that here. This method should work for your backend APIs if you have different subdomain names for your S3 files and your APIs (with those subdomains pointing to different CloudFront distributions), since it relies on CloudFront canonical names; I'm a bit skeptical that it would work in your setup, though. You can try it out anyway.
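For the Route 53 route, latency-based routing means one record set per region under the same name, each with its own SetIdentifier and Region, optionally tied to a health check so Route 53 stops answering with an unhealthy region. The sketch below only builds a ChangeBatch in the shape accepted by Route 53's ChangeResourceRecordSets API (boto3's `change_resource_record_sets`); the record name, regional hostnames, health check IDs and TTL are all hypothetical placeholders.

```python
import json


def latency_change_batch(name, targets, ttl=60):
    """Build UPSERTs for latency-based CNAME records.

    targets maps an AWS region to (regional_hostname, health_check_id).
    """
    changes = []
    for region, (hostname, health_check_id) in sorted(targets.items()):
        record = {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": f"api-{region}",
            "Region": region,
            "TTL": ttl,
            "ResourceRecords": [{"Value": hostname}],
        }
        if health_check_id:
            record["HealthCheckId"] = health_check_id
        changes.append({"Action": "UPSERT", "ResourceRecordSet": record})
    return {"Comment": "Latency-based routing for the API", "Changes": changes}


batch = latency_change_batch(
    "api.domain.com.",
    {
        "us-east-1": ("api-use1.example.com", "hc-primary-id"),
        "us-east-2": ("api-use2.example.com", "hc-secondary-id"),
    },
)
print(json.dumps(batch, indent=2))
# With boto3 (placeholder zone id):
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="<your zone id>", ChangeBatch=batch)
```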
Edit: As suggested by the OP, a third approach is to handle this on the client side. Whenever the client receives an unexpected response code or a timeout, it makes the API call to another endpoint hosted in another region. This solution is cheaper, simpler, and easier to implement with what is currently available.
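The client-side approach amounts to trying a list of regional endpoints in order. A minimal Python sketch using only the standard library; the endpoint URLs are hypothetical, and treating timeouts, connection errors and 5xx responses as "unexpected" is an assumption. A browser client would do the same with fetch and an AbortController timeout.

```python
import urllib.error
import urllib.request

# Hypothetical endpoints for the same API in two regions, in priority order.
ENDPOINTS = ["https://api-us-east-1.example.com",
             "https://api-us-east-2.example.com"]


def call_api(path, data=None, method="POST", endpoints=ENDPOINTS, timeout=3):
    """Try each regional endpoint in turn; return (endpoint, status, body).

    Moves on to the next endpoint on a timeout, a connection error or a 5xx.
    Other HTTP errors (e.g. 400, 404) are returned as-is, because another
    region would answer them the same way.
    """
    last_error = None
    for base in endpoints:
        req = urllib.request.Request(base + path, data=data, method=method)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return base, resp.status, resp.read()
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return base, err.code, err.read()
            last_error = err
        except OSError as err:  # URLError, timeouts, refused connections
            last_error = err
    raise RuntimeError(f"all endpoints failed: {last_error}")
```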

How to enable browser cache in AWS cloudfront

I have configured my CloudFront distribution to use the managed cache policy; however, none of the tools (Google PageSpeed, cache checkers, etc.) detect any caching.
Nor does the browser detect any cache-related header.
What am I missing here?
sample CDN url: https://cdn.thekiwi.app/images/skills/SubCategories/81/desktop_b/81.png
I think you misunderstood how Cloudfront cache works.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/controlling-the-cache-key.html
CloudFront is a CDN that caches resources on its edge servers. Every time your users want to access your application's resources, one of the following two scenarios happens:
Cache hit: the resource is cached on CloudFront servers and returned to your users without a request to your app's origin server.
Cache miss: the resource is not cached, so a request is sent to your app's origin server to retrieve it. Then, based on the cache policy you define in your CloudFront distribution, the resource may or may not be cached on CloudFront servers.
The cache policy defined in your CloudFront distribution tells CloudFront when and for how long to cache the resource.
Tools like Google PageSpeed do not check the CDN cache; they check browser caching, which works based on HTTP response headers such as Cache-Control.
In order to set the Cache-Control value, you can either
(1) Write a CloudFront function that instructs CloudFront to insert the header before returning responses to your users
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/example-function-add-cache-control-header.html
or
(2) Update your app's origin server to add the Cache-Control header for your resources.
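For option (2), the change is one extra response header. A minimal sketch with Python's built-in http.server standing in for the origin; the one-day max-age and the placeholder body are assumptions, so pick a lifetime that matches how often your images change. CloudFront passes the origin's Cache-Control header through to the browser, and with a managed policy such as CachingOptimized it also feeds into the edge TTL, within the policy's minimum and maximum TTL.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumed browser cache lifetime: one day.
CACHE_CONTROL = "public, max-age=86400"


class StaticHandler(BaseHTTPRequestHandler):
    """Toy origin that serves a placeholder PNG body with Cache-Control."""

    placeholder = b"\x89PNG\r\n\x1a\n"  # PNG signature bytes only

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(self.placeholder)))
        # The header that browsers and PageSpeed look for:
        self.send_header("Cache-Control", CACHE_CONTROL)
        self.end_headers()
        self.wfile.write(self.placeholder)


def serve(port=8080):
    """Run the toy origin locally; check it with: curl -I localhost:8080/x"""
    ThreadingHTTPServer(("", port), StaticHandler).serve_forever()
```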

AWS S3 : Pricing for invalidation requests for React. js application hosted on S3

I have a web application developed using React.js, HTML and JavaScript.
This React.js web application calls the backend REST APIs.
I have hosted this web application on AWS S3.
I am able to access the web application using HTTP.
To enable HTTPS-based access, I am planning to use AWS CloudFront.
I don't have much static media content, just a few CSS and JS files and a few small images.
As I understand it, CloudFront pricing is based on:
Amount of data transfer
No. of HTTP/HTTPS requests
Invalidation requests
In my case, the web application makes HTTPS calls to the backend when the user requests a web page or searches the records.
I want to know whether every request to the backend is treated as an "invalidation request".
Or do invalidation requests apply only when the static content (HTML, CSS, JS, images) changes?
Is there any other cost-effective option for enabling HTTPS for S3-based web applications?
You would only create an invalidation request if you wanted to purge cached content (i.e., old versions of files) from the CloudFront cache so that a new version is used.
With your React / HTML / CSS project, you'll put it in the S3 bucket, and set your S3 bucket as the Origin for CloudFront. When CloudFront fetches the objects from S3, it will cache them in its edge cache for future requests for the TTL (time-to-live) of the object. The object will remain there, and CloudFront will not check your origin for a new version of the object until the TTL has expired.
An invalidation request will tell CloudFront to purge the objects from the cache and since they are no longer in the cache, when a request comes to CloudFront it will get the object from your S3 bucket.
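The TTL and invalidation behavior described in the two paragraphs above can be modeled with a tiny cache. This is a toy illustration with a fake clock and a dict standing in for the S3 bucket, not how CloudFront is implemented.

```python
class EdgeCache:
    """Toy edge cache: serves cached copies until their TTL expires."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin          # dict standing in for the S3 bucket
        self.ttl = ttl_seconds
        self.entries = {}             # path -> (body, expires_at)

    def get(self, path, now):
        cached = self.entries.get(path)
        if cached and now < cached[1]:
            return cached[0], "Hit"
        body = self.origin[path]      # miss or expired: fetch from origin
        self.entries[path] = (body, now + self.ttl)
        return body, "Miss"

    def invalidate(self, path):
        self.entries.pop(path, None)


bucket = {"/main.js": "v1"}
edge = EdgeCache(bucket, ttl_seconds=86400)
print(edge.get("/main.js", now=0))      # ('v1', 'Miss')
bucket["/main.js"] = "v2"               # new build uploaded to S3
print(edge.get("/main.js", now=60))     # ('v1', 'Hit'), stale until TTL
edge.invalidate("/main.js")
print(edge.get("/main.js", now=120))    # ('v2', 'Miss')
```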
Basically, every time you publish a new production build, you'll need CloudFront to use the new versions of your objects, so you'll want to do an invalidation every time you put new versions of your objects into production.
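In a deploy script, that invalidation is one CreateInvalidation call per release. A minimal sketch of the request that boto3's `create_invalidation` takes; the distribution ID is a placeholder, and invalidating `/*` (one wildcard path) rather than listing every changed file is a design choice here, not a requirement.

```python
import time


def invalidation_batch(paths, release_id=None):
    """Build the InvalidationBatch for CloudFront's CreateInvalidation API.

    CallerReference must be unique per request; a release id or timestamp works.
    """
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": release_id or f"deploy-{time.time_ns()}",
    }


def invalidate_release(distribution_id, paths=("/*",), release_id=None):
    import boto3  # third-party AWS SDK; only needed when actually calling AWS

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch(paths, release_id),
    )


print(invalidation_batch(["/*"], release_id="release-42"))
# {'Paths': {'Quantity': 1, 'Items': ['/*']}, 'CallerReference': 'release-42'}
# invalidate_release("EDFDVBD6EXAMPLE")  # placeholder distribution ID
```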
You can read more about invalidations and the costs of invalidations here:
https://aws.amazon.com/blogs/aws/simplified-multiple-object-invalidation-for-amazon-cloudfront/
It's worth noting that we do 5-10 releases per day, and our CodePipeline takes care of the invalidations for us. We've never paid any charges for invalidations. Also, just a heads up that, in my experience, depending on the number of objects being invalidated, invalidations can take anywhere from just a few minutes, to over 30 minutes.
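Never paying for invalidations is consistent with CloudFront's published invalidation pricing: the first 1,000 invalidation paths each month are free, each path after that is billed (US$0.005 per path at the time of writing; check the current pricing page), and a wildcard path such as /* counts as one path no matter how many objects it covers. A quick estimate for the release cadence above, assuming 30 days per month:

```python
from decimal import Decimal

FREE_PATHS_PER_MONTH = 1_000
PRICE_PER_PATH = Decimal("0.005")  # USD; check current CloudFront pricing


def monthly_invalidation_cost(releases_per_day, paths_per_release, days=30):
    """Estimate invalidation charges; a '/*' wildcard counts as one path."""
    paths = releases_per_day * paths_per_release * days
    billable = max(0, paths - FREE_PATHS_PER_MONTH)
    return paths, billable * PRICE_PER_PATH


print(monthly_invalidation_cost(10, 1))   # 10 releases/day invalidating '/*'
print(monthly_invalidation_cost(10, 50))  # listing 50 files per release
```

The first case is 300 paths a month, well inside the free tier; listing 50 individual files per release instead would be 15,000 paths and about $70.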

cloudfront cache with CloudFront-Is-Mobile-Viewer header

I am new to CloudFront and trying to use the CloudFront cache effectively.
If I whitelist the CloudFront-Is-Mobile-Viewer header, will CloudFront start maintaining the cache based on each user agent,
or will it just track whether the request is from a mobile device?
Thanks,
Manish
CloudFront will not maintain a separate cache for each User-Agent.
In fact, CloudFront discourages whitelisting headers that can have many possible values, because doing so increases the number of cached copies on the CDN as well as the number of requests sent to the origin.
It will cache the response from the origin server based only on the CloudFront-Is-Mobile-Viewer header, whose value is "true" or "false". So all devices that CloudFront detects as mobile are served the same cached content.
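A toy illustration of why the header choice matters: count the cached copies of one object when the cache key includes the full User-Agent versus only CloudFront-Is-Mobile-Viewer. The User-Agent strings and the `looks_mobile` check are made-up stand-ins; CloudFront does its own device detection.

```python
# Hypothetical viewers requesting the same object.
user_agents = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) Mobile Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Safari/605.1.15",
]


def looks_mobile(user_agent):
    """Crude stand-in for CloudFront's own device detection."""
    return "true" if "Mobile" in user_agent else "false"


keys_by_user_agent = {("/home", ua) for ua in user_agents}
keys_by_mobile_flag = {("/home", looks_mobile(ua)) for ua in user_agents}
print(len(keys_by_user_agent), len(keys_by_mobile_flag))  # 4 2
```

Real traffic carries thousands of distinct User-Agent strings, so keying on them multiplies cached copies and origin requests, while the mobile flag never produces more than two variants per object.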

Dynamically choose an S3 object to be served by CloudFront

Is it possible to have a custom origin server tell CloudFront to directly serve a file from an S3 bucket, similar to the way X-Sendfile (X-Accel-Redirect in Nginx) works? I'd like to avoid having to read the file from S3 and pipe it to CloudFront.
No, this isn't possible.
Once the request is sent from CloudFront to the origin server, the only thing CloudFront will do (unless an error occurs, of course) is return the origin server's response to the requester.
The only way I can think of that this could work would be if CloudFront followed HTTP redirects, which it does not.
If you want to return content from elsewhere once the request has arrived at the origin, you'll have to fetch it and stream it back... which will probably perform better than you expect, if the bucket is in the same region as the origin server and your code is tight. The latency to S3 within a region is very low and the available bandwidth is high. I have an application that does exactly this, many thousands of times each day on a little t2 instance, so it's certainly viable.
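The fetch-and-stream approach mostly comes down to copying in fixed-size chunks instead of buffering the whole object. A minimal sketch, assuming the source is any file-like object: with boto3, `s3.get_object(Bucket=..., Key=...)["Body"]` returns such a streaming body, and `write` stands for your web framework's response writer. The 64 KiB chunk size is an assumption.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB; an assumption, tune for your workload


def stream_object(source, write, chunk_size=CHUNK_SIZE):
    """Copy a file-like source to a write callable chunk by chunk.

    Memory use stays at about one chunk regardless of object size, and the
    response can start before the whole object has been read.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return total
        write(chunk)
        total += len(chunk)


# Usage with an in-memory stand-in for an S3 object body:
received = []
sent = stream_object(io.BytesIO(b"x" * 200_000), received.append)
print(sent, len(received))  # 200000 4
```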
Of course, a single CloudFront distribution can have multiple origins, such as your server and S3. CloudFront can choose which origin will handle each request based on path pattern matching, but that's a static mapping, so I don't know whether it applies to what you're trying to do.
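That static mapping behaves roughly like an ordered list of path patterns checked in precedence order, with the default behavior last. A sketch using `fnmatch.fnmatchcase`, which handles `*` and `?` the way CloudFront path patterns do and is case-sensitive like CloudFront (fnmatch also accepts `[...]` sets, which CloudFront does not). The patterns and origin names are hypothetical.

```python
from fnmatch import fnmatchcase

# Cache behaviors in precedence order; "*" is the default behavior.
BEHAVIORS = [
    ("/api/*", "APIOrigin"),
    ("/images/*.png", "S3Origin"),
    ("*", "AppServerOrigin"),
]


def pick_origin(path):
    """Return the origin of the first behavior whose pattern matches."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(path)  # unreachable while a "*" default exists


for p in ["/api/users/1", "/images/logo.png", "/images/logo.PNG", "/about"]:
    print(p, "->", pick_origin(p))
```

Note that /images/logo.PNG falls through to the default behavior, because the match is case-sensitive.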