Tell CloudFront to only cache 200 response codes - amazon-web-services

Is it possible to configure Amazon CloudFront to only ever cache 200 responses? I want it to never cache 3xx responses, because I want to connect it to an on-the-fly image processing tool built with Lambda that performs a 307 redirect via S3, as described here: https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/

There isn't a way to explicitly tell CloudFront to cache only 2XX's and not cache 3XX's unless you can configure the origin to set the Cache-Control header accordingly -- CloudFront considers 2XX and 3XX as "success" and treats them the same. (It has different rules for 4XX and 5XX only, and an obvious exception for a 304 response to a conditional request.)
In the case of S3 redirects, the problem with this is that S3 redirection rules do not allow a Cache-Control header to be set.
However, if you are setting the Cache-Control headers correctly on the objects when you create them in S3 -- as you should be -- then you can probably¹ rely on CloudFront's Default TTL setting to solve the problem entirely, by telling CloudFront that responses lacking a Cache-Control header should not be cached. This would mean setting the Default TTL to 0, and would of course require that the Minimum TTL also be set to 0, since minimum <= default is required.
The Maximum TTL should be left at its default value, since it is used to shorten the CloudFront cache time for objects with a max-age that is larger than Maximum TTL. You don't likely want to shorten the cacheability of 2XX responses.
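To make the interaction of the three TTL settings concrete, here is a minimal sketch of the decision in JavaScript. It is a simplified model, not CloudFront's actual implementation: the function and property names (effectiveTtl, minTtl, defaultTtl, maxTtl) are made up for illustration, and it only considers Cache-Control: max-age, ignoring s-maxage, Expires, and the no-cache/no-store/private directives.

```javascript
'use strict';

// Simplified model of how CloudFront picks a cache lifetime (in seconds).
// Assumption: originMaxAge is the max-age from the origin's Cache-Control
// header, or undefined when the origin sent no Cache-Control at all.
function effectiveTtl(originMaxAge, ttl) {
  if (originMaxAge === undefined) {
    // No Cache-Control from the origin (e.g. an S3 redirect): Default TTL applies.
    return ttl.defaultTtl;
  }
  // Origin supplied max-age: clamp it between Minimum TTL and Maximum TTL.
  return Math.min(Math.max(originMaxAge, ttl.minTtl), ttl.maxTtl);
}

// The settings suggested above: Minimum = Default = 0, Maximum left at its default.
const settings = { minTtl: 0, defaultTtl: 0, maxTtl: 31536000 };

console.log(effectiveTtl(undefined, settings)); // 0     (redirect without Cache-Control)
console.log(effectiveTtl(86400, settings));     // 86400 (object stored with max-age=86400)
```

Under this model the redirect gets a lifetime of 0 while objects with their own max-age keep it, which is the outcome the settings above are meant to produce.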
Assuming browsers behave correctly and do not cache the redirect (which they shouldn't, for 307 or 302), then your issue is resolved, because CloudFront behaves as expected in this configuration -- honoring Cache-Control when it's present, and not caching responses when it's absent.
However, you might have to get more aggressive, if you find that browsers or other downstream caches are holding on to your redirects.
The only way to explicitly add Cache-Control (or other headers) to responses when the origin doesn't provide them would be with Lambda@Edge. The following code, used as an Origin Response² trigger, would add Cache-Control: no-cache, no-store, private (yes, it's a bit redundant) to any 302 or 307 HTTP response received from an origin server. If any Cache-Control header is present on the origin's response, it would be overwritten. Any other response (e.g. 2XX) would not be modified.
'use strict';
// add Cache-Control: no-cache, ... only if the response status code is 302 or 307
exports.handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;
    // status is a string in Lambda@Edge events, e.g. '307'
    if (response.status.match(/^30[27]/)) {
        response.headers['cache-control'] = [{
            key: 'Cache-Control',
            value: 'no-cache, no-store, private'
        }];
    }
    callback(null, response);
};
With this trigger in place, 2XX responses do not have their headers modified, but 302/307 responses will be modified as shown. This will tell CloudFront and the browser not to cache the response.
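If you want to check the header logic locally before deploying, something like the following sketch may help. It pulls the same rule into a plain function and runs it against hand-built objects shaped like event.Records[0].cf.response; the function name noCacheForRedirects and the sample header values are assumptions made for illustration.

```javascript
'use strict';

// Same rule as the trigger: overwrite Cache-Control on 302/307 responses only.
function noCacheForRedirects(response) {
  if (/^30[27]$/.test(response.status)) {
    response.headers['cache-control'] = [{
      key: 'Cache-Control',
      value: 'no-cache, no-store, private'
    }];
  }
  return response;
}

// Hand-built stand-ins for event.Records[0].cf.response (hypothetical values).
const redirect = {
  status: '307',
  headers: { location: [{ key: 'Location', value: '/resized/image.jpg' }] }
};
const ok = {
  status: '200',
  headers: { 'cache-control': [{ key: 'Cache-Control', value: 'max-age=86400' }] }
};

console.log(noCacheForRedirects(redirect).headers['cache-control'][0].value);
console.log(noCacheForRedirects(ok).headers['cache-control'][0].value);
```

The first line printed should be the no-cache value added to the redirect, and the second the untouched max-age of the 200 response.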
¹ probably... is not intended to imply that CloudFront merely might do the right thing. CloudFront behaves exactly as expected. Probably refers to this being the only action needed: You can probably consider this solution sufficient, because probably browsers will not cache the redirect. Browser behavior, as usual, is the wildcard that may require the more aggressive addition of explicit Cache-Control headers to prevent caching of the redirect by the browser.
² Origin Response triggers examine and can modify certain aspects of responses before they are cached (if they are cached) and returned to the viewer. Modifying or adding Cache-Control headers at this point in the flow would prevent the response from being stored in the CloudFront cache, and should prevent browser caching as well.

You can ignore Response Page Path and HTTP Response Code in your use case.
Next, in the CloudFront cache behavior, make sure the caching TTLs are set to zero if you want every request to be retrieved from the origin.
If you are relying on headers, make sure the origin's Cache-Control headers carry the right caching values.

Related

How to get CloudFront to cache locally for max-age

I have a custom origin that I am trying to cache locally. The default cache-control sends no-store, no-cache, must-revalidate, and the pre and post checks are set to 0. I actually need the opposite. I need the browser to store, to cache, and NOT to revalidate until my max age (24h / 86400s) is hit. There is private data being used and I don't want users seeing other users' data. I set up a response header policy to override cache-control with "private, max-age=86400", and I only get 200 HTTP responses (I am looking for a 304), and in x-cache I keep getting "miss from cloudfront" as well.
Some info about my setup:
Protocol: Match Viewer
SSL: TLSv1.1 (Custom cert that AWS handles)
Methods: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE
Unrestricted viewer access
Legacy Cache: Headers(None), Query strings (ALL), Cookies (None)
Object cache: Use origin headers
Response Policy: Custom cache-control "private, max-age=86400"
The only other behaviors are using CachingDisabled on folders I don't want caching. I know this is likely just a bad config issue but I have no idea what's causing it.

Google Cloud CDN vary:cookie response never gets cache hit

I'm using Google Cloud CDN to cache an HTML page.
I've configured all the correct headers as per the docs, and the page is caching fine. Now, I want to change it so that it only caches when the request has no cookies, i.e. no cookie header set.
My understanding was that this was simply a case of changing my origin server to add a vary: cookie header to all responses for the page, then only adding the caching headers Cache-Control: public and Cache-Control: max-age=300 when no cookie header is set on the request.
However, this doesn't work. Using curl I can see that all caching headers, and the vary: cookie header, are set as expected when I send requests with and without cookies, but I never get cache hits on the requests without cookies.
Digging into the Cloud CDN logs, I see that every request with no cookie header has cacheFillBytes populated with the same number as the response size - whereas it's not for the requests with a cookie header set with a value (as expected).
So it appears like Cloud CDN is attempting to populate the cache as expected for requests with no cookies, it's just that I never get a cache hit - i.e. it's just cacheFillBytes every time, cacheHit: true never appears in the logs.
Has anyone come across anything similar? I've triple-checked all my headers for typos, and indeed just removing the vary: cookie header makes caching work as expected, so I'm almost certain my configuration is right in terms of headers and what Cloud CDN considers cacheable.
Should Cloud CDN handle vary: cookie like I'm expecting it to? The docs suggest it handles arbitrary vary headers. And if so, why would I see cacheFillBytes on every request, with Cache-Control: public and Cache-Control: max-age=300 set on the response, but then never see a cacheHit: true on any subsequent request (I've tried firing hundreds with curl in a loop, it really never hits, it's not just that I'm populating a few different edge caches)?
I filed a bug with Google and it turns out that, indeed, the documentation was wrong.
vary: cookie is not supported by Cloud CDN
The docs have been updated - the only headers that can be used with vary are Accept, Accept-Encoding and Origin.
According to the GCP documentation [1], Cloud CDN respects any Vary headers that origin servers include in responses. Based on that, it looks like vary: cookie is supported by Cloud CDN, since any Vary header that the origin serves will be respected. Keep in mind, though, that this will negatively impact caching, because the Vary header indicates that the response varies depending on the client's request headers. Therefore, if a request for an object has the request header Cookie: abc, then a subsequent request for the same object with the request header Cookie: xyz would not be served from the cache. So, yes, it is supported and respected, but it will impact caching (https://cloud.google.com/cdn/docs/troubleshooting-steps?hl=en#low-hit-rate).
[1] https://cloud.google.com/cdn/docs/caching#vary_headers

Content-Encoding header not returned from Cloudfront

I'm trying to deliver compressed CSS and JS files to my web app. The files are hosted on S3, with a CloudFront distribution in front of the S3 origin to provide edge caching. I'm having trouble getting these files to the browser both compressed and with the right cache-related headers to allow the browser to cache as well.
I have a cloudfront distribution with S3 as the Origin to deliver the JS and CSS files for my web app. I initially set up CloudFront to compress the files, but it would not send the Cache-Control or ETag headers in the response.
Since I also wanted to leverage the browser cache too, I thought of storing the gzipped files in S3, with the Cache-Control, and Content-Encoding headers attached. I did this, and the CloudFront did start returning the Cache-Control and ETag headers in the response, but it would not return the Content-Encoding: gzip header in the response (that I set in the file metadata in S3). Because this header is missing in the response, the browser doesn't know to uncompress the response and ends up with an unreadable file.
I've also tried setting up a viewer response edge lambda to add the Content-Encoding header, but this is disallowed (see the AWS docs) and results in a LambdaValidationError.
Is there something I'm missing here that would allow the files to make it to the browser with compression, AND still allow the Cache-Control and ETag headers to make it through to the browser?
Any help would be much appreciated!
The way I usually do this is to upload uncompressed content to the S3 bucket and put Cache-Control headers on your items there. The Cache-Control header is the only thing I set in the origin (S3).
In Cloudfront I check the 'Compress Objects Automatically' option in Behavior Settings to have Cloudfront compress the files for me. That takes care of the Content-Encoding and Last-Modified headers and the gzipping. That should be all you need. You won't see an ETag header from Cloudfront but Last-Modified does essentially the same thing here.
If you don't see your changes coming through, check that you properly invalidated your Cloudfront cache. I see a lot of people put / in the box but it's really /* to invalidate the entire distribution.
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
This should take care of gzipping, caching from the CDN and browser caching.
Good luck!
In your particular case I think you are missing one bit. You need to modify your distribution in CloudFront like this:
-> Edit the default behavior [*] and select "Cache Based on Selected Request Headers" to Whitelist the "Accept-Encoding" header.
In general, this is how caching works in CloudFront:
If you have compression enabled in CloudFront, all files that can be compressed, meaning those that:
- have a compressible type: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront-file-types
- are larger than 1 KB: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront
will be compressed by CloudFront and will have the ETag header removed by default. CloudFront does not touch or modify the Cache-Control header, which you can set as metadata on your S3 objects.
It can be confusing to diagnose the disappearance of the ETag with curl. By default, curl will get the ETag back, because it does not send the header:
"Accept-Encoding: gzip, deflate, br"
until you specify it. For non-compressed content etag is preserved by CloudFront.
One thing you can do to keep the ETag is to disable compression in CloudFront, but that means increased cost and longer load times.
The other option is to whitelist the Accept-Encoding header in CloudFront (-> edit the default behavior [*] and select "Cache Based on Selected Request Headers" to whitelist the "Accept-Encoding" header) and upload compressed S3 objects. Remember to set the "Content-Encoding" metadata accordingly. You will find instructions here: https://medium.com/@graysonhicks/how-to-serve-gzipped-js-and-css-from-aws-s3-211b1e86d1cd
From now on CloudFront will keep cached version and share etag. More reading: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html?shortFooter=true#header-caching-web-compressed
Additionally CloudFront adds:
last-modified: Sat, 13 Jul 2019 07:11:35 GMT
header.
If you have Cache-Control present without an ETag, then the caching behavior works as described here:
https://developer.mozilla.org/pl/docs/Web/HTTP/Headers/Cache-Control
If you have only Last-Modified, then it is not 100% obvious how long the browser will cache such a response.
Based on my experience, when Firefox and Chrome already have the object cached, they add this request header when retrieving it again from CloudFront:
if-modified-since: Sat, 13 Jul 2019 07:11:35 GMT
CloudFront will respond with the object data only if it was modified after this date.
On IE it seems that a heuristic caching algorithm is used; you can read more about it here: https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/.
For IE the object can be cached for as long as: (current time - last-modified) * 10%.
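The 10% rule above can be written out as a small calculation. This is only a rough sketch of the heuristic as stated, not any browser's actual implementation; the function name heuristicFreshnessSeconds and the "now" timestamp are made up for illustration.

```javascript
'use strict';

// Heuristic freshness: 10% of the time elapsed since Last-Modified.
// Both arguments are Date objects; the result is in whole seconds.
function heuristicFreshnessSeconds(lastModified, now) {
  const ageMs = now.getTime() - lastModified.getTime();
  return Math.max(0, Math.floor(ageMs / 10 / 1000));
}

// Last-Modified value from the example response above.
const lastModified = new Date('Sat, 13 Jul 2019 07:11:35 GMT');
// Hypothetical request time: exactly 10 days later.
const now = new Date('2019-07-23T07:11:35Z');

console.log(heuristicFreshnessSeconds(lastModified, now)); // 86400
```

So an object last modified 10 days before the request could be reused for roughly one day without the browser asking again.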

Implement Lambda@Edge authentication for CloudFront

I am looking to add Lambda@Edge to one of our services. The goal is to regex the URL for certain values and compare those against a header value to ensure authorization. If the value is present, it is compared, and if the comparison fails, a 403 should be returned to the user immediately. If the compared value matches, or the URL doesn't contain the particular value, then the request continues on as an authorized request.
Initially I was thinking that this would occur with a "viewer request" event. Some of the posts and comments on SO suggest that the "origin request" is more ideal for this check. But right now I've been trying to play around with the examples in the documentation on one of our CF end points but I'm not seeing expected results. The code is the following:
'use strict';
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    request.headers["edge-test"] = [{
        key: 'edge-test',
        value: Date.now().toString()
    }];
    console.log(require('util').inspect(event, { depth: null }));
    callback(null, request);
};
I would expect that there should be a logged value inside cloudwatch and a new header value in the request, yet I'm not seeing any logs nor am I seeing the header value when the request comes in.
Can someone shed some light on why things don't seem to be executing as to what I would think should be the response? Is my understanding of what the expected output wrong? Is there configuration that I may be missing (My distribution ID on the trigger is set to the instance we want, and the behavior was set to '*')? Any help is appreciated :)
First, a few notes:
CloudFront is (among other things) a web cache.
A web cache's purpose is to serve content directly to the browser instead of sending the request to the origin server.
However, one of the most critical things a cache must do correctly is not return the wrong content. One of the ways a cache can return the wrong content is by not realizing that certain request headers may cause the origin server to vary the response it returns for a given URI.
CloudFront has no perfect way of knowing this, so its solution -- by default -- is to remove almost all of the headers from the request before forwarding it to the origin. Then it caches the received response against exactly the request that it sent to the origin, and will only use that cached response for future identical requests.
Injecting a new header in a Viewer Request trigger will cause that header to be discarded after it passes through the matching Cache Behavior, unless the cache behavior specifically is configured to whitelist that header for forwarding to the origin. This is the same behavior you would see if the header had been injected by the browser, itself.
So, your solution to get this header to pass through to the origin is to whitelist it in the cache behavior settings.
If you tried this same code as an Origin Request trigger, without the header whitelisted, CloudFront would actually throw a 502 Bad Gateway error, because you're trying to inject a header that CloudFront already knows you haven't whitelisted in the matching Cache Behavior. (In Viewer Request, the Cache Behavior match hasn't yet occurred, so CloudFront can't tell if you're doing something with the headers that will not ultimately work. In Origin Request, it knows.) The flow is Viewer Request > Cache Behavior > Cache Check > (if cache miss) Origin Request > send to Origin Server. Whitelisting the header would resolve this, as well.
Any header you want the origin to see, whether it comes from the browser, or a request trigger, must be whitelisted.
Note that some headers are inaccessible or immutable, particularly those that could be used to co-opt CloudFront for fraudulent purposes (such as request forgery and spoofing) and those that simply make no sense to modify.
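For the authorization check described in the question, a Viewer Request trigger could look roughly like the sketch below. Everything specific in it is an assumption made for illustration, not something taken from the question: the /tenants/<id>/ URL pattern, the x-tenant-id header name, and the function names. The one CloudFront-specific part is that handing back a response object instead of the request makes CloudFront answer the viewer directly.

```javascript
'use strict';

// Hypothetical conventions, not taken from the question:
//   - protected URLs look like /tenants/<id>/...
//   - the caller's allowed tenant arrives in an "x-tenant-id" request header
const TENANT_PATTERN = /^\/tenants\/([^/]+)\//;

function authorize(request) {
  const match = TENANT_PATTERN.exec(request.uri);
  if (!match) {
    return request; // URL carries no tenant id: let the request continue
  }
  const header = request.headers['x-tenant-id'];
  if (header && header[0].value === match[1]) {
    return request; // header matches the URL: authorized
  }
  // Returning a response object instead of the request makes CloudFront
  // answer the viewer directly, without contacting the origin.
  return {
    status: '403',
    statusDescription: 'Forbidden',
    headers: {
      'content-type': [{ key: 'Content-Type', value: 'text/plain' }]
    },
    body: 'Forbidden'
  };
}

// Lambda@Edge entry point; assign it to exports.handler when deploying.
const handler = (event, context, callback) => {
  callback(null, authorize(event.Records[0].cf.request));
};
```

Because this sketch only reads a viewer-supplied header and never adds one for the origin, the whitelisting requirement discussed above does not come into play for the check itself.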

Cloudfront how to avoid If-Modified-Since header request everytime

AWS Cloudfront document says:
If you set the TTL for a particular origin to 0, CloudFront will still
cache the content from that origin. It will then make a GET request
with an If-Modified-Since header, thereby giving the origin a chance
to signal that CloudFront can continue to use the cached content if it
hasn't changed at the origin
I need to configure my dynamic content. I have already set the TTL to 0. I want every request to always go to the origin. Is there a way I avoid this additional GET request with an If-Modified-Since header? Why is this extra request made every time?
Is there a way I avoid this additional GET request
It sounds as if you are misinterpreting what you are reading. Unfortunately, you didn't cite the source, so it's difficult to go back and pick up more context; however, this is not referring to an "extra" request.
It will then make a GET request with an If-Modified-Since header
This refers to each time the object is subsequently requested by a browser. CloudFront sends the next request with If-Modified-Since: so that your origin server has the option of returning a 304 Not Modified response... it doesn't send two requests to the origin in response to one request from a browser.
If your content is always dynamic, return Cache-Control: private, no-cache, no-store and set Minimum TTL to 0.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#ExpirationDownloadDist
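On a custom origin, returning that header could look something like the following sketch, assuming a plain Node.js HTTP server. The listener name dynamicContent and the response body are made up for illustration.

```javascript
'use strict';
const http = require('http');

// Request listener for always-dynamic content: every response is marked
// as uncacheable for CloudFront and for browsers.
function dynamicContent(req, res) {
  res.setHeader('Cache-Control', 'private, no-cache, no-store');
  res.setHeader('Content-Type', 'text/plain');
  res.end(`generated at ${new Date().toISOString()}\n`);
}

// Creating the server does not open a port; call server.listen(port) to serve.
const server = http.createServer(dynamicContent);
```

With Minimum TTL at 0, CloudFront should honor these directives and forward each viewer request to the origin.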
This is the answer I got from AWS:
However, if you forward all headers for that particular origin, the
request will go to the origin every time without the If-Modified-Since
header mentioned [1]. Please view the excerpt from the link below for
further detail:
“Forward all headers to your origin Important If you configure
CloudFront to forward all headers to your origin, CloudFront doesn't
cache the objects associated with this cache behavior. Instead, it
sends every request to the origin.”