Content-Encoding header not returned from Cloudfront


I'm trying to deliver compressed CSS and JS files to my web app. The files are hosted on S3, with a CloudFront distribution in front of the S3 origin to provide edge caching. I'm having trouble getting these files to the browser both compressed and with the right cache-related headers so that the browser can cache them as well.
I have a CloudFront distribution with S3 as the origin to deliver the JS and CSS files for my web app. I initially set up CloudFront to compress the files, but it would not send the Cache-Control or ETag headers in the response.
Since I also wanted to leverage the browser cache, I thought of storing the gzipped files in S3 with the Cache-Control and Content-Encoding headers attached. I did this, and CloudFront did start returning the Cache-Control and ETag headers in the response, but it would not return the Content-Encoding: gzip header (which I set in the file metadata in S3). Because this header is missing from the response, the browser doesn't know to decompress the response and ends up with an unreadable file.
I've also tried setting up a viewer response edge lambda to add the Content-Encoding header, but this is disallowed (see the AWS docs) and results in a LambdaValidationError.
Is there something I'm missing here that would allow the files to make it to the browser with compression, AND still allow the Cache-Control and ETag headers to make it through to the browser?
Any help would be much appreciated!

The way I usually do this is to upload uncompressed content to the S3 bucket and put Cache-Control headers on your items there. The Cache-Control header is the only thing I set in the origin (S3).
In Cloudfront I check the 'Compress Objects Automatically' option in Behavior Settings to have Cloudfront compress the files for me. That takes care of the Content-Encoding and Last-Modified headers and the gzipping. That should be all you need. You won't see an ETag header from Cloudfront but Last-Modified does essentially the same thing here.
If you don't see your changes coming through, check that you properly invalidated your Cloudfront cache. I see a lot of people put / in the box but it's really /* to invalidate the entire distribution.
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
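As a rough sketch of the invalidation step above, the request body for a CloudFront CreateInvalidation call could be built like this. The function name and the distribution ID are placeholders I made up for illustration, and the snippet only builds the parameters; actually sending them with the AWS SDK or CLI is left out.

```javascript
// Build the input for a CloudFront CreateInvalidation request.
// 'EXAMPLEDISTID' below is a placeholder, not a real distribution ID.
function buildInvalidationParams(distributionId, paths = ['/*']) {
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      // CallerReference has to be unique for each invalidation request.
      CallerReference: `invalidate-${Date.now()}`,
      // '/*' covers the whole distribution; a bare '/' only covers the root object.
      Paths: { Quantity: paths.length, Items: paths },
    },
  };
}

const params = buildInvalidationParams('EXAMPLEDISTID');
console.log(JSON.stringify(params.InvalidationBatch.Paths));
// {"Quantity":1,"Items":["/*"]}
```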
This should take care of gzipping, caching from the CDN and browser caching.
Good luck!

In your particular case I think you are missing one bit. You need to modify your distribution in CloudFront like this:
-> Edit the default behavior [*] and select "Cache Based on Selected Request Headers" to whitelist the "Accept-Encoding" header.
In general, caching in CloudFront works like this:
If you have compression enabled in CloudFront, all files that can be compressed, meaning those that:
- have a compressible type: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront-file-types
- are larger than 1 KB: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront
will be compressed by CloudFront and will have the ETag header removed by default. CloudFront does not touch or modify the Cache-Control header, which you can set as an attribute on your S3 objects.
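To make the conditions above concrete, here is a small sketch of the eligibility check. It is not CloudFront's code: the type list is only an illustrative subset of the documented list, and the size bounds (1,000 to 10,000,000 bytes) and the "skip if the origin already set Content-Encoding" rule are my reading of the linked documentation page. The function and field names are made up.

```javascript
// Illustrative subset of compressible types; the full list is in the CloudFront docs.
const COMPRESSIBLE_TYPES = new Set([
  'text/css',
  'text/html',
  'application/javascript',
  'application/json',
]);
const MIN_BYTES = 1000;      // assumption taken from the docs ("above 1kb")
const MAX_BYTES = 10000000;  // assumption taken from the docs

// Returns true when an edge would be expected to compress this response.
function wouldCompress({ contentType, contentLength, acceptEncoding, contentEncoding }) {
  const type = (contentType || '').split(';')[0].trim().toLowerCase();
  const offered = (acceptEncoding || '')
    .toLowerCase()
    .split(',')
    .map((token) => token.trim().split(';')[0]);
  const viewerSupports = offered.includes('gzip') || offered.includes('br');
  const alreadyEncoded = Boolean(contentEncoding);
  return (
    viewerSupports &&
    !alreadyEncoded &&
    COMPRESSIBLE_TYPES.has(type) &&
    contentLength >= MIN_BYTES &&
    contentLength <= MAX_BYTES
  );
}

console.log(wouldCompress({
  contentType: 'text/css',
  contentLength: 20480,
  acceptEncoding: 'gzip, deflate, br',
})); // true
```

Note that a request without an Accept-Encoding header (the curl default mentioned below) fails the first condition, which is why the uncompressed response with its ETag shows up there.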
It can be confusing to diagnose the disappearance of the ETag with curl. By default curl will show the ETag, because it does not send the header:
"Accept-Encoding: gzip, deflate, br"
unless you specify it. For non-compressed content the ETag is preserved by CloudFront.
One way to keep the ETag is to disable compression in CloudFront, but that means increased cost and longer load times.
The other way is to whitelist the Accept-Encoding header in CloudFront: -> Edit the default behavior [*] and select "Cache Based on Selected Request Headers" to whitelist the "Accept-Encoding" header, then upload the compressed object to S3. Remember to set the "Content-Encoding" metadata accordingly. Instructions are here: https://medium.com/@graysonhicks/how-to-serve-gzipped-js-and-css-from-aws-s3-211b1e86d1cd
From then on CloudFront will keep the cached version and pass the ETag through. More reading: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html?shortFooter=true#header-caching-web-compressed
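For the pre-compressed upload route, a minimal sketch of preparing the object might look like the following. It gzips the file with Node's built-in zlib and builds parameters shaped like an S3 PutObject input (Bucket, Key, Body, ContentType, ContentEncoding, CacheControl). The bucket name, key, and Cache-Control value are example values I picked, and the actual upload call is omitted.

```javascript
const zlib = require('zlib');

// Gzip the content and describe it the way S3 needs it stored.
// Bucket, key, and the Cache-Control value are example values only.
function buildGzippedPutParams(bucket, key, content, contentType) {
  return {
    Bucket: bucket,
    Key: key,
    Body: zlib.gzipSync(Buffer.from(content, 'utf8')),
    ContentType: contentType,
    // Stored as object metadata and returned as the Content-Encoding header.
    ContentEncoding: 'gzip',
    CacheControl: 'public, max-age=31536000',
  };
}

const css = 'body { margin: 0; }\n'.repeat(200);
const putParams = buildGzippedPutParams('example-bucket', 'assets/app.css', css, 'text/css');
console.log(putParams.ContentEncoding, putParams.Body.length < css.length);
```

The resulting object can then be handed to whatever S3 client you use; the point is only that Content-Encoding and Cache-Control travel with the object.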
Additionally CloudFront adds:
last-modified: Sat, 13 Jul 2019 07:11:35 GMT
header.
If you have cache-control present without etag, then the caching behavior works as described here:
https://developer.mozilla.org/pl/docs/Web/HTTP/Headers/Cache-Control
If you have only last-modified, then it is not 100% obvious how long the browser will cache the response.
In my experience, when Firefox and Chrome already have the object cached, they add this request header when retrieving it again from CloudFront:
if-modified-since: Sat, 13 Jul 2019 07:11:35 GMT
CloudFront will respond with the proper data if it was modified after this date.
On IE it seems that a heuristic caching algorithm is used; you can read more about it here: https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/.
For IE the object can be cached for as long as: (current time - last-modified) * 10%.
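The 10% rule above works out like this. This is just the arithmetic from the formula, with a made-up function name; real browsers may apply extra caps that are not modelled here.

```javascript
// Heuristic freshness lifetime: 10% of the time since the object was last modified.
// Both arguments are HTTP date strings; the result is in whole seconds.
function heuristicFreshnessSeconds(lastModified, now) {
  const ageMs = Date.parse(now) - Date.parse(lastModified);
  // ageMs / 1000 gives seconds, and a further / 10 takes 10% of that.
  return Math.max(0, Math.floor(ageMs / 10000));
}

// An object last modified 10 days before the request is treated as fresh for 1 day.
console.log(heuristicFreshnessSeconds(
  'Sat, 13 Jul 2019 07:11:35 GMT',
  'Tue, 23 Jul 2019 07:11:35 GMT'
)); // 86400
```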

Related

Why does CloudFront sometimes serve gzip instead of br, when both are enabled?

I am reading the CloudFront documentation (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html, retrieved June 20, 2021) regarding CloudFront compression:
The viewer includes the Accept-Encoding HTTP header in the request, and the header values include gzip, br, or both. This indicates that the viewer supports compressed content. When the viewer supports both formats, CloudFront uses Brotli.
My plain reading of this is that if a request is made with the following header:
accept-encoding: gzip, deflate, br
Then CloudFront should use Brotli.
However, observing our production web apps (which use the Managed-CachingOptimized cache policy, which has both gzip and br enabled) served via CloudFront, I can plainly see that this is NOT the case - about 70% of the files are served using Brotli, while the remainder use gzip.
What's even more confusing is that all these files are part of the same compilation, served via the same origin, having the same metadata. I don't see how they differ at all, other than content and size.
My only intuition is that CloudFront in some cases determines that gzip produces a better file size than br, and thus decides to use that instead, but I cannot find this behavior documented.
Why is this happening?

So after speaking to AWS support about this, I found the issue.
In short - if the FIRST request for a CloudFront resource (i.e. before it is cached) only supports gzip, then ALL future requests to that (now cached) resource will be served using gzip, even if the client specifies that it supports brotli.
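A toy model of the behaviour described above, to show why the first requester matters. This is not how CloudFront is implemented; it is only a sketch of a cache that stores one variant per path and keeps serving it, which is what AWS support's explanation amounts to. All names here are made up.

```javascript
// Pick the best encoding a client offers (Brotli preferred over gzip).
function pickEncoding(acceptEncoding) {
  const offered = acceptEncoding
    .split(',')
    .map((token) => token.trim().split(';')[0].toLowerCase());
  if (offered.includes('br')) return 'br';
  if (offered.includes('gzip')) return 'gzip';
  return 'identity';
}

// Toy edge cache: the encoding is chosen on the first (miss) request only.
class ToyEdgeCache {
  constructor() {
    this.store = new Map();
  }

  get(path, acceptEncoding) {
    if (this.store.has(path)) {
      return { ...this.store.get(path), cache: 'Hit' };
    }
    const entry = { path, encoding: pickEncoding(acceptEncoding) };
    this.store.set(path, entry);
    return { ...entry, cache: 'Miss' };
  }
}

const cache = new ToyEdgeCache();
console.log(cache.get('/app.js', 'gzip').encoding);              // first hit is gzip-only
console.log(cache.get('/app.js', 'gzip, deflate, br').encoding); // still gzip
```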
The reason this happened to us is that we use Cypress to run automated tests against our webapps. Cypress currently only supports gzip when making requests to a target site (https://github.com/cypress-io/cypress/issues/6197#issuecomment-684847493), and would occasionally be the first to access new files as we uploaded them - causing them to be permanently cached as gzip.
The only resolution I found for now is to, as @LovekeshKumar suggested in a comment, manually clear the CloudFront cache, and then immediately fetch all the files via Chrome or something else that supports Brotli. Strange and tedious, but hopefully this will be solved both on the AWS and Cypress side of things eventually.

Google Cloud CDN vary:cookie response never gets cache hit

I'm using Google Cloud CDN to cache an HTML page.
I've configured all the correct headers as per the docs, and the page is caching fine. Now, I want to change it so that it only caches when the request has no cookies, i.e. no cookie header set.
My understanding was that this was simply a case of changing my origin server to add a vary: cookie header to all responses for the page, then only adding the caching headers Cache-Control: public and Cache-Control: max-age=300 when no cookie header is set on the request.
However, this doesn't work. Using curl I can see that all caching headers, the vary: cookie header, are set as expected when I send requests with and without cookies, but I never get cache hits on the requests without cookies.
Digging into the Cloud CDN logs, I see that every request with no cookie header has cacheFillBytes populated with the same number as the response size - whereas it's not for the requests with a cookie header set with a value (as expected).
So it appears like Cloud CDN is attempting to populate the cache as expected for requests with no cookies, it's just that I never get a cache hit - i.e. it's just cacheFillBytes every time, cacheHit: true never appears in the logs.
Has anyone come across anything similar? I've triple-checked all my headers for typos, and indeed just removing the vary: cookie header makes caching work as expected, so I'm almost certain my configuration is right in terms of headers and what Cloud CDN considers cacheable.
Should Cloud CDN handle vary: cookie like I'm expecting it to? The docs suggest it handles arbitrary vary headers. And if so, why would I see cacheFillBytes on every request, with Cache-Control: public and Cache-Control: max-age=300 set on the response, but then never see a cacheHit: true on any subsequent request (I've tried firing hundreds with curl in a loop, it really never hits, it's not just that I'm populating a few different edge caches)?

I filed a bug with Google and it turns out that, indeed, the documentation was wrong.
vary: cookie is not supported by Cloud CDN
The docs have been updated - the only headers that can be used with vary are Accept, Accept-Encoding and Origin.
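If you want to catch this before deploying, a small check on the origin side could flag Vary values that fall outside that list. The allowed set below is taken from the sentence above; the function name is made up and this is only a sketch.

```javascript
// Vary fields that Cloud CDN can cache on, per the updated docs mentioned above.
const CACHEABLE_VARY = new Set(['accept', 'accept-encoding', 'origin']);

// Returns the Vary fields (lowercased) that would prevent cache hits.
function unsupportedVaryFields(varyHeader) {
  return varyHeader
    .split(',')
    .map((field) => field.trim().toLowerCase())
    .filter((field) => field !== '' && !CACHEABLE_VARY.has(field));
}

console.log(unsupportedVaryFields('Accept-Encoding, Cookie')); // [ 'cookie' ]
```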

The GCP documentation [1] states that Cloud CDN respects any Vary headers that origin servers include in responses. Based on this, it looks like vary: cookie is supported by Cloud CDN, since any Vary header that the origin serves will be respected. Keep in mind, though, that this will negatively impact caching, because the Vary header indicates that the response varies depending on the client's request headers. Therefore, if a request for an object has the request header Cookie: abc, a subsequent request for the same object with the request header Cookie: xyz would not be served from the cache. So yes, it is supported and respected, but it will impact caching (https://cloud.google.com/cdn/docs/troubleshooting-steps?hl=en#low-hit-rate).
[1]https://cloud.google.com/cdn/docs/caching#vary_headers

Tell CloudFront to only cache 200 response codes

Is it possible to configure Amazon CloudFront to only ever cache 200 codes? I want it to never cache 3xx, as I want to connect it to an on-the-fly image processing tool with Lambda that performs a 307 via S3, as described here: https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/

There isn't a way to explicitly tell CloudFront to cache only 2XX's and not cache 3XX's unless you can configure the origin to set the Cache-Control header accordingly -- CloudFront considers 2XX and 3XX as "success" and treats them the same. (It has different rules for 4XX and 5XX only, and an obvious exception for a 304 response to a conditional request.)
In the case of S3 redirects, the problem with this is that S3 redirection rules do not allow a Cache-Control header to be set.
However, if you are setting the Cache-Control headers correctly on the objects when you create them in S3 -- as you should be -- then you can probably¹ rely on CloudFront's Default TTL setting to solve the problem entirely, by telling CloudFront that responses lacking a Cache-Control header should not be cached. This would mean setting the Default TTL to 0, and would of course require that the Minimum TTL also be set to 0, since minimum <= default is required.
The Maximum TTL should be left at its default value, since it is used to shorten the CloudFront cache time for objects with a max-age that is larger than Maximum TTL. You don't likely want to shorten the cacheability of 2XX responses.
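A simplified sketch of how the three TTL settings interact with the origin's Cache-Control header, as I understand the CloudFront expiration docs. It only looks at max-age and the no-cache / no-store / private directives; s-maxage and Expires are deliberately ignored, and the function and option names are made up.

```javascript
// Approximate edge cache lifetime in seconds for a response.
function effectiveTtlSeconds(cacheControl, { minTtl, defaultTtl, maxTtl }) {
  const header = (cacheControl || '').toLowerCase();
  // No Cache-Control at all: the Default TTL applies.
  if (header.trim() === '') return defaultTtl;
  // Directives that forbid caching: kept only for Minimum TTL (0 means not cached).
  if (/\b(no-cache|no-store|private)\b/.test(header)) return minTtl;
  const match = /(?:^|,)\s*max-age=(\d+)/.exec(header);
  if (!match) return defaultTtl;
  // max-age is clamped into the [minTtl, maxTtl] range.
  return Math.min(Math.max(Number(match[1]), minTtl), maxTtl);
}

// Settings suggested above: Minimum 0, Default 0, Maximum left at one year.
const ttls = { minTtl: 0, defaultTtl: 0, maxTtl: 31536000 };
console.log(effectiveTtlSeconds(undefined, ttls));        // S3 redirect without Cache-Control
console.log(effectiveTtlSeconds('max-age=86400', ttls));  // object with Cache-Control set
```

With these settings the redirect (no Cache-Control) gets a lifetime of 0, while objects that carry max-age keep their own value.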
Assuming browsers behave correctly and do not cache the redirect (which they shouldn't, for 307 or 302), then your issue is resolved, because CloudFront behaves as expected in this configuration -- honoring Cache-Control when it's present, and not caching responses when it's absent.
However, you might have to get more aggressive, if you find that browsers or other downstream caches are holding on to your redirects.
The only way to explicitly add Cache-Control (or other headers) to responses when the origin doesn't provide them would be with Lambda@Edge. The following code, used as an Origin Response² trigger, would add Cache-Control: no-cache, no-store, private (yes, it's a bit redundant) to any 302 or 307 HTTP response received from an origin server. If any Cache-Control header is present on the origin's response, it would be overwritten. Any other response (e.g. 2XX) would not be modified.
'use strict';
// Add Cache-Control: no-cache, ... only if the response status code is 302 or 307.
exports.handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;
    // response.status is a string in Lambda@Edge events, e.g. '307'.
    if (response.status.match(/^30[27]/)) {
        response.headers['cache-control'] = [{
            key: 'Cache-Control',
            value: 'no-cache, no-store, private'
        }];
    }
    callback(null, response);
};
With this trigger in place, 2XX responses do not have their headers modified, but 302/307 responses will be modified as shown. This will tell CloudFront and the browser not to cache the response.
¹ probably... is not intended to imply that CloudFront merely might do the right thing. CloudFront behaves exactly as expected. Probably refers to this being the only action needed: You can probably consider this solution sufficient, because probably browsers will not cache the redirect. Browser behavior, as usual, is the wildcard that may require the more aggressive addition of explicit Cache-Control headers to prevent caching of the redirect by the browser.
² Origin Response triggers examine and can modify certain aspects of responses before they are cached (if they are cached) and returned to the viewer. Modifying or adding Cache-Control headers at this point in the flow would prevent the response from being stored in the CloudFront cache, and should prevent browser caching as well.

You can ignore Response Page Path and HTTP Response Code in your use case.
Next, in the CloudFront behavior settings, make sure the caching TTL is zero if you want every request to be retrieved from the origin.
If you are relying on headers, make sure the origin's Cache-Control headers have the right caching values.

Cloudfront how to avoid If-Modified-Since header request everytime

The AWS CloudFront documentation says:
If you set the TTL for a particular origin to 0, CloudFront will still
cache the content from that origin. It will then make a GET request
with an If-Modified-Since header, thereby giving the origin a chance
to signal that CloudFront can continue to use the cached content if it
hasn't changed at the origin
I need to configure my dynamic content. I have already set the TTL to 0, and I want every request to always go to the origin. Is there a way to avoid this additional GET request with an If-Modified-Since header? Why is this extra request made every time?

Is there a way I avoid this additional GET request
It sounds as if you are misinterpreting what you are reading. Unfortunately, you didn't cite the source, so it's difficult to go back and pick up more context; however, this is not referring to an "extra" request.
It will then make a GET request with an If-Modified-Since header
This refers to each time the object is subsequently requested by a browser. CloudFront sends the next request with If-Modified-Since: so that your origin server has the option of returning a 304 Not Modified response... it doesn't send two requests to the origin in response to one request from a browser.
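On the origin side, the decision for such a conditional request boils down to a date comparison. Here is a bare-bones sketch; the function name is made up, and a real origin would usually also handle ETag / If-None-Match.

```javascript
// Decide the status code for a GET that may carry If-Modified-Since.
// lastModified: HTTP date of the resource; ifModifiedSince: request header value, if any.
function conditionalStatus(lastModified, ifModifiedSince) {
  if (!ifModifiedSince) return 200;              // unconditional request
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return 200;           // unparseable date: ignore the header
  // 304 lets the cache keep using its copy; 200 sends the new body.
  return Date.parse(lastModified) > since ? 200 : 304;
}

console.log(conditionalStatus(
  'Sat, 13 Jul 2019 07:11:35 GMT',
  'Sat, 13 Jul 2019 07:11:35 GMT'
)); // 304
```

Either way it is a single request to the origin; the 304 just saves sending the body again.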
If your content is always dynamic, return Cache-Control: private, no-cache, no-store and set Minimum TTL to 0.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#ExpirationDownloadDist

This is the answer I got from AWS:
However, if you forward all headers for that particular origin, the
request will go to the origin every time without the If-Modified-Since
header mentioned [1]. Please view the excerpt from the link below for
further detail:
“Forward all headers to your origin Important If you configure
CloudFront to forward all headers to your origin, CloudFront doesn't
cache the objects associated with this cache behavior. Instead, it
sends every request to the origin.”

Being able to download, not just stream files, from Amazon S3

I have Amazon S3 where all of my files are stored. Currently my users can go to a link where they can stream, but not download, audio and video files. How can I set up a link through either Amazon S3 or perhaps Amazon CloudFront that will allow someone to download an MP3 file or something of that nature?
Thanks for any advice!

You must set the file's content header to something other than the media type the browser understands. For example:
Content-Disposition: attachment; filename=FILENAME.EXT
Content-Type: application/octet-stream
This used to be a big issue if you wanted to have both features (ability to display/view and ability to download) and you used to have to proxy the file download through your EC2 or other annoying ways. Now S3 has it built in:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
You can override values for a set of response headers using the query
parameters listed in the following table. These response header values
are only sent on a successful request, that is, when status code 200
OK is returned. The set of headers you can override using these
parameters is a subset of the headers that Amazon S3 accepts when you
create an object. The response headers that you can override for the
GET response are Content-Type, Content-Language, Expires,
Cache-Control, Content-Disposition, and Content-Encoding. To override
these header values in the GET response, you use the request
parameters described in the following table. (link above)
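As a sketch of what those query parameters look like on a URL: response-content-disposition and response-content-type are the parameter names from the S3 GET Object documentation, while the bucket URL, file name, and function name are made up. As far as I know, these overrides are only honoured on signed requests, so in practice they have to be included when the presigned URL is generated; the signing step is not shown here.

```javascript
// Add S3 response-header overrides that force a download instead of inline playback.
// The result still needs to be signed before S3 will accept the overrides.
function withDownloadOverrides(objectUrl, filename) {
  const url = new URL(objectUrl);
  url.searchParams.set(
    'response-content-disposition',
    `attachment; filename="${filename}"`
  );
  url.searchParams.set('response-content-type', 'application/octet-stream');
  return url.toString();
}

// Hypothetical object URL, for illustration only.
const downloadUrl = withDownloadOverrides(
  'https://example-bucket.s3.amazonaws.com/audio/song.mp3',
  'song.mp3'
);
console.log(new URL(downloadUrl).searchParams.get('response-content-disposition'));
// attachment; filename="song.mp3"
```

This keeps a single stored object usable for both streaming (plain URL) and downloading (URL with overrides).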
