I am new to CloudFront and am trying to maintain and use the CloudFront cache effectively.
If I whitelist the CloudFront-Is-Mobile-Viewer header, will CloudFront start maintaining a separate cache entry for each user agent,
or will it only distinguish whether the request is from a mobile device or not?
Thanks,
Manish
CloudFront will not maintain a separate cache entry for each User-Agent.
In fact, CloudFront discourages whitelisting headers that can take many possible values, because doing so increases the number of cached copies on the CDN and also increases the number of requests sent to the origin.
It will cache the response from the origin server based on the value of the CloudFront-Is-Mobile-Viewer header only. So every device that CloudFront detects as mobile (and flags via the CloudFront-Is-Mobile-Viewer header) receives the same cached content.
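To make the difference concrete, here is a minimal sketch of how whitelisted headers affect the number of cached copies. It is a simplified model, not CloudFront's actual implementation: the `cache_key` function, the key format, and the sample User-Agent strings are all made up for illustration.

```python
def cache_key(path, headers, whitelisted_headers):
    """Build a simplified cache key: the path plus the values of whitelisted headers."""
    parts = [path]
    for name in sorted(h.lower() for h in whitelisted_headers):
        # Header names are case-insensitive, so compare in lowercase.
        value = next((v for k, v in headers.items() if k.lower() == name), "")
        parts.append(f"{name}={value}")
    return "|".join(parts)


# Three hypothetical viewers: two mobile devices and one desktop browser.
requests = [
    {"User-Agent": "iPhone Safari", "CloudFront-Is-Mobile-Viewer": "true"},
    {"User-Agent": "Android Chrome", "CloudFront-Is-Mobile-Viewer": "true"},
    {"User-Agent": "Desktop Firefox", "CloudFront-Is-Mobile-Viewer": "false"},
]

by_mobile_flag = {
    cache_key("/index.html", r, ["CloudFront-Is-Mobile-Viewer"]) for r in requests
}
by_user_agent = {cache_key("/index.html", r, ["User-Agent"]) for r in requests}

print(len(by_mobile_flag))  # 2 cached copies: mobile and non-mobile
print(len(by_user_agent))   # 3 cached copies: one per distinct User-Agent
```

With only the mobile flag in the key there can be at most two copies per path, whereas keying on User-Agent grows with every distinct browser string.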
I have two paths in a CloudFront distribution which have the following Behaviors:
| Path Pattern | Origin | Viewer Protocol Policy |
|---|---|---|
| /api/* | APIOriginGroup | Redirect HTTP to HTTPS |
| /* | S3OriginGroup | Redirect HTTP to HTTPS |
And these Origins:
| Origin Name | Origin Domain | Origin Group |
|---|---|---|
| S3Origin1 | us-east-1.s3.[blah] | S3OriginGroup |
| S3Origin2 | us-east-2.s3.[blah] | S3OriginGroup |
| APIOrigin1 | a domain | APIOriginGroup |
| APIOrigin2 | a domain | APIOriginGroup |
This setup works fine for GET requests, but if I add POST requests into the Cache Behavior's Cache Methods I get an error:
"cannot include POST, PUT, PATCH, or DELETE for a cached behavior"
This doesn't make sense to me. If CloudFront really is used by AWS customers to serve billions of requests per day, and AWS recommends using CloudFront Origin Failover, which requires origin groups, then it follows that there must be some way to configure CloudFront behaviors that use an origin group and also allow POST requests. Is this not true? Are all of the API requests made by those customers GET requests?
To be clear, my fundamental problem is that I want to use CloudFront Origin Failover to switch between my primary region and secondary region when an AWS region fails. To make that possible, I need to switch over not only my front end, S3-based traffic (GET requests), but also switch over my back-end traffic (POST requests).
Note: CloudFront supports routing behaviors with POST requests if you do not use an Origin Group. The error appeared only after I added this Origin Group (to support the second region).
Short Answer: You can't do origin failover in CloudFront for request methods other than GET, HEAD, or OPTIONS. Period.
TL;DR
CloudFront always caches responses to GET and HEAD requests, and it can be configured to cache responses to OPTIONS requests too. It does not cache responses to POST, PUT, PATCH, or DELETE requests, which is consistent with most public CDNs, although some of them let you write custom hooks to cache those methods. You might be wondering why POST requests can't be cached. The answer is RFC 2616: since POST requests are not idempotent, the specification advises against caching them and says they should always be sent on to the intended origin server. There's a very nice SO thread here which you can read for a better understanding.
CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS. CloudFront does not fail over when the viewer sends a different HTTP method (for example POST, PUT, and so on).
Ok. POST requests are not cached by CloudFront. But, why does CloudFront not provide failover for POST requests?
Let's see how CloudFront handles requests when the primary origin fails. See below:
CloudFront routes all incoming requests to the primary origin, even when a previous request failed over to the secondary origin. CloudFront only sends requests to the secondary origin after a request to the primary origin fails.
Now, since POST requests are not cached, CloudFront has to go to the primary origin each time, come back with an invalid response or, worse, a timeout, and then hit the secondary origin in the origin group. We're talking about region failures here. The volume of failover requests from primary to secondary would be extremely high, and we might expect a cascading failure due to the load. This could lead to CloudFront PoP failures, which defeats the whole purpose of high availability, doesn't it? Again, this explanation is only my assumption. Of course, I'm sure the folks at CloudFront will come up with a solution for region failover of POST requests soon.
So far so good. But how are other AWS customers able to guarantee high availability to their users in case of AWS region failures?
Well, other AWS customers only use CloudFront region failover to make their static websites, SPAs, and static content such as videos (live and on demand) and images failure-proof, which by the way only requires GET, HEAD, and occasionally OPTIONS HTTP requests. Imagine a SaaS company that drives its sales and discoverability via a static website. If you could reduce your downtime with the method above, so that your sales and growth don't take a hit, why wouldn't you?
Got the point. But I do really need to have region failover for my backend APIs. How can I do it?
One way would be to write a custom Lambda@Edge function. CloudFront hits the intended primary origin, the code inside the function checks for timeouts, response codes, and so on, and if failover has to be triggered, it hits the other origin's endpoint and returns that response. This, again, runs contrary to how CloudFront currently works.
Another solution, which in my opinion is much cleaner, is to make use of Route 53's latency-based routing. You can read about how to do that here. This method would surely work for your backend APIs if you had different subdomains for your S3 files and your APIs (with those subdomains pointing to different CloudFront distributions), since it leverages CloudFront canonical names, but I'm a bit skeptical that it would work in your setup. You can try it and test it out anyway.
Edit: As suggested by the OP, there is a third approach, which is to handle this on the client side. Whenever the client receives an unexpected response code or a timeout, it makes the API call again against another endpoint that is hosted in another region. This solution is a bit cheaper and simpler, and it is easier to implement with what is currently available.
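The client-side approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the two endpoint URLs are hypothetical placeholders, "unexpected response code" is interpreted here as any 5xx status, and the 3-second timeout is arbitrary. Keep in mind that POST is not idempotent, so a request that timed out on the first region may still have been processed there before the retry reaches the second one.

```python
import urllib.error
import urllib.request

# Hypothetical regional endpoints (assumption); replace with your own.
ENDPOINTS = [
    "https://api-us-east-1.example.com",
    "https://api-us-east-2.example.com",
]


def http_post(url, body, timeout):
    """Send a POST and return (status, response_body); network errors propagate."""
    req = urllib.request.Request(url, data=body, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; turn them back into a status.
        return err.code, err.read()


def post_with_failover(path, body, endpoints=ENDPOINTS, send=http_post, timeout=3.0):
    """Try each endpoint in order; move on after a network error or a 5xx status."""
    last_error = None
    for base in endpoints:
        try:
            status, payload = send(base + path, body, timeout)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            continue
        if status < 500:
            return status, payload
        last_error = RuntimeError(f"{base} returned HTTP {status}")
    raise RuntimeError("all endpoints failed") from last_error
```

A call such as `post_with_failover("/orders", b'{"item": 1}')` would hit the first region and only fall back to the second if the first one errors out. The `send` parameter exists so the transport can be swapped out, for example in tests.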
I have configured my CloudFront distribution to use the managed cache policy; however, none of the tools I tried (Google PageSpeed, cache checkers, etc.) detect any caching.
The browser does not see any cache-related headers either.
What am I missing here?
sample CDN url: https://cdn.thekiwi.app/images/skills/SubCategories/81/desktop_b/81.png
I think you misunderstood how the CloudFront cache works.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/controlling-the-cache-key.html
CloudFront is a CDN that caches resources on CloudFront edge servers. Every time your users request one of your application's resources, one of the following two scenarios happens:
Cache hit: the resource is cached on CloudFront's servers and is returned to your users without a request to your app's origin server.
Cache miss: the resource is not cached, so a request is sent to your app's origin server to retrieve it. Then, based on the cache policy you define in your CloudFront distribution, the resource might or might not be cached on CloudFront's servers.
The cache policy you define in your CloudFront distribution tells CloudFront when and how long to cache the resource.
Tools like Google PageSpeed do not check the CDN cache; they check browser caching, which works based on HTTP response headers such as Cache-Control.
In order to set the Cache-Control value, you can either
(1) Write a CloudFront function that instructs CloudFront to insert the header before returning responses to your users
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/example-function-add-cache-control-header.html
or
(2) Update your app's origin server to add the Cache-Control header for your resources.
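The linked example uses a CloudFront function written in JavaScript. As a rough alternative sketch, the same idea can be expressed as a Lambda@Edge handler in Python attached to the origin-response event. This is a different mechanism from the one in the link, and the `max-age` value below is an arbitrary assumption, not a recommendation. Because it runs on the origin response, the header it adds may also influence how long CloudFront itself keeps the object, within the TTL bounds of your cache policy.

```python
# Assumed value for illustration only; pick whatever suits your content.
DEFAULT_CACHE_CONTROL = "public, max-age=86400"


def lambda_handler(event, context):
    """Add a Cache-Control header to the origin response if the origin sent none."""
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]
    # Lambda@Edge keys headers by lowercase name; each value is a list of
    # {"key": ..., "value": ...} entries.
    if "cache-control" not in headers:
        headers["cache-control"] = [
            {"key": "Cache-Control", "value": DEFAULT_CACHE_CONTROL}
        ]
    return response
```

If the origin already sets Cache-Control, the handler leaves it untouched, so option (2) still takes precedence wherever it is in place.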
I am using a single CloudFront distribution to serve multiple domains (*.domain.com). This distribution uses an API server as its origin.
Any request made to a domain, say test1.domain.com, goes to the API server, and we send back a response that depends on the Host header.
Now, if I want to purge the cache, I can't do it for a specific domain (as CloudFront only allows path invalidation like /users/* or /users//books*).
Note: I don't want to purge the entire cache (*/**), as it would cause performance issues.
Did someone face this situation?
We have an AWS S3 bucket that hosts our dynamic images, which are fetched by web and mobile apps over HTTPS in different sizes (url/width x height/image_name, e.g. http://test.s3.com/200x300/image.png).
For this we did two things:
1- Realtime resizing: I have a redirection rule in my S3 bucket that redirects 404 errors for non-existing image sizes to an API Gateway endpoint, which calls a Lambda function. The Lambda function fetches the original image, resizes it, and places it in a folder in the bucket matching the requested size.
We followed the steps in this article:
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
2- HTTPS: I created a CloudFront distribution with an SSL certificate, and its origin is the S3 static website endpoint.
Problem: Requesting an image from S3 using the CloudFront HTTPS domain always results in a 404 error, which gets redirected by my redirection rule to API Gateway, even if this specific image size already exists.
I tried to debug this issue with no luck. I examined the requests and from what I see things should work normally.
I'd appreciate a hint on what to do to better debug this issue (and what kind of logs I need to provide here).
Thanks
Sary
This solution relies on S3 generating HTTP redirects for missing objects, to redirect the browser to API Gateway to resize the object... and save it at the original URL.
The problem is two-fold:
S3 generated redirects don't include any Cache-Control headers, and
CloudFront's default behavior when Cache-Control is absent in a response is to cache the response internally for the value of a timer called Default TTL, which by default is set to 86400 seconds (24 hours).
The problem this causes is that CloudFront will remember the original redirect and send the browser to it, again and again, even though the object is now present.
Selecting Customize instead of Use Origin Cache Headers for "Object caching" and then setting Default TTL to 0 (all in the CloudFront Cache Behavior settings) will resolve the issue, because it configures CloudFront not to cache responses where the origin didn't include any relevant Cache-Control headers.
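A simplified model of how the TTL is chosen may help explain why setting Default TTL to 0 fixes this. The sketch below is not CloudFront's real logic: it only looks at `max-age` and ignores `s-maxage`, `Expires`, `no-cache`, `no-store`, and `private`. The function name is hypothetical, and the default Minimum and Maximum TTL values are assumed for illustration.

```python
import re


def effective_ttl(cache_control, min_ttl=0, default_ttl=86400, max_ttl=31536000):
    """Return how many seconds a response would be cached in this simplified model."""
    if cache_control:
        match = re.search(r"max-age=(\d+)", cache_control.lower())
        if match:
            # The origin gave a lifetime: clamp it to the [min_ttl, max_ttl] range.
            return max(min_ttl, min(int(match.group(1)), max_ttl))
    # No usable Cache-Control from the origin: fall back to Default TTL.
    return default_ttl


# An S3-generated redirect carries no Cache-Control header.
print(effective_ttl(None))                 # 86400: the redirect is cached for 24 hours
print(effective_ttl(None, default_ttl=0))  # 0: the redirect is not reused
print(effective_ttl("max-age=600"))        # 600: the origin's value is honored
```

In this model, changing Default TTL only affects responses that lack caching headers, so objects that do send Cache-Control keep their own lifetime.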
For more background:
What is Cloudfront Minimum TTL for? explains the Minimum/Default/Maximum TTL timers and how/when they apply.
Setting "Object Caching" on CloudFront explains the confusing UI labeling of these options, which is likely a holdover from a time before all three timers were configurable.
Is it possible to have a custom origin server tell CloudFront to directly serve a file from an S3 bucket, similar to the way X-Sendfile works in Nginx? I'd like to avoid having to read the file from S3 and pipe it to CloudFront.
No, this isn't possible.
Once the request is sent from CloudFront to the origin server, the only thing CloudFront will do (unless an error occurs, of course) is return the origin server's response to the requester.
The only way that comes to mind in which this could really be possible would be if CloudFront followed HTTP redirects, which it does not do.
If you want to return content from elsewhere once the request has arrived at the origin, you'll have to fetch it and stream it back... which will probably perform better than you expect, if the bucket is in the same region as the origin server and your code is tight. The latency to S3 within a region is very low and the available bandwidth is high. I have an application that does exactly this, many thousands of times each day on a little t2 instance, so it's certainly viable.
Of course, with a single CloudFront distribution, you can have multiple origins, such as your server and S3. CloudFront can choose which origin will handle each request based on path pattern matching, but that's a static mapping, so I don't know whether it applies to what you're trying to do.
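To illustrate what that static mapping means, here is a small sketch of path-pattern routing. The patterns and origin names are hypothetical, and Python's `fnmatchcase` is only an approximation of CloudFront's pattern matching (it also accepts `[...]` character sets, for instance).

```python
from fnmatch import fnmatchcase

# Hypothetical cache behaviors, checked in order; the first matching pattern wins.
BEHAVIORS = [
    ("/api/*", "app-server"),
    ("/downloads/*", "s3-bucket"),
]
DEFAULT_ORIGIN = "app-server"  # stands in for the default (*) behavior


def pick_origin(path):
    """Return the origin for a request path based only on static path patterns."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN


print(pick_origin("/downloads/report.pdf"))  # s3-bucket
print(pick_origin("/api/v1/users"))          # app-server
print(pick_origin("/index.html"))            # app-server (default behavior)
```

The decision depends only on the request path, before the origin is contacted, which is why the application server cannot redirect CloudFront to S3 on a per-request basis.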