Browser not responding with 200 (from cache) and sending 304 Not Modified - amazon-web-services

My understanding of setting Cache-Control with a max-age value is that the browser is instructed to Cache the file.
What I then expect is that if I hit "enter" on the address bar for the same link, the browser would return a 200 (from cache) response.
My question is: why is it returning a 304 Not Modified response instead?
The way I see it, with a 200 (from cache) the browser no longer makes a connection to the server to validate the file and immediately serves the cached content. With a 304, the browser will not download the file again (the server simply instructs it to serve the cached copy), but it still has to send a request to validate the freshness of the content.
The assets here are served through Amazon's CloudFront CDN with Amazon S3 buckets as the origin. The Cache-Control headers there (in S3) have already been set. This was not an issue for any of the self-hosted assets.
Thanks for the help!
EDIT: I found this: What is the difference between HTTP status code 200 (cache) vs status code 304?. Additional question: I already have Cache-Control set to max-age=31536000, s-maxage=2592000, no-transform, public and I'm still getting a 304. Do I need to set Expires as well? I could cache fine before on self-hosted sites with just Cache-Control.

You expect to see a 200 with the content, rather than a 304 saying "not modified". The 304 is the server's answer to the browser asking whether the content is newer than what it has cached: it means "no, don't waste your bandwidth, your content is current". The browser can ask this with a couple of mechanisms: ETag (via If-None-Match) and If-Modified-Since.
As an example, we can use your Stack Overflow avatar image. When I load it in Chrome and look at the Developer Tools, I can see it gets a 304 response and sends these two headers:
if-modified-since:Thu, 28 Jan 2016 13:16:24 GMT
if-none-match:"484ab25da1294b24f8d9d13afb913afd"

Related

Why does CloudFront sometimes serve gzip instead of br, when both are enabled?

I am reading the CloudFront documentation (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html, retrieved June 20, 2021) regarding CloudFront compression:
The viewer includes the Accept-Encoding HTTP header in the request, and the header values include gzip, br, or both. This indicates that the viewer supports compressed content. When the viewer supports both formats, CloudFront uses Brotli.
My plain reading of this is that if a request is made with the following header:
accept-encoding: gzip, deflate, br
Then CloudFront should use Brotli.
However, observing our production web apps (which use the Managed-CachingOptimized cache policy, which has both gzip and br enabled) served via CloudFront, I can plainly see that this is NOT the case - about 70% of the files are served using Brotli, while the remainder use gzip.
What's even more confusing is that all these files are part of the same compilation, served via the same origin, having the same metadata. I don't see how they differ at all, other than content and size.
My only intuition is that CloudFront in some cases determines that gzip produces a better file size than br, and thus decides to use that instead, but I cannot find this behavior documented.
Why is this happening?
So after speaking to AWS support about this, I found the issue.
In short - if the FIRST request for a CloudFront resource (i.e. before it is cached) only supports gzip, then ALL future requests to that (now cached) resource will be served using gzip, even if the client specifies that it supports brotli.
The reason this happened to us is that we use Cypress to run automated tests against our webapps. Cypress currently only supports gzip when making requests to a target site (https://github.com/cypress-io/cypress/issues/6197#issuecomment-684847493), and would occasionally be the first to access new files as we uploaded them - causing them to be permanently cached as gzip.
The only resolution I found for now is to, as @LovekeshKumar suggested in a comment, manually clear the CloudFront cache, and then immediately fetch all the files via Chrome or something else that supports Brotli. Strange and tedious, but hopefully this will be solved both on the AWS and Cypress side of things eventually.
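As a sanity check after the invalidation, a quick fetch like this sketch (Python with the requests library; the CloudFront URL is a placeholder) shows which encoding is now cached:

import requests

# Hypothetical asset URL behind the CloudFront distribution.
url = "https://dxxxxxxxxxxxx.cloudfront.net/app.js"

# Advertise brotli support, as a modern browser would.
resp = requests.get(url, headers={"Accept-Encoding": "gzip, deflate, br"})

# If the first post-invalidation request came from a brotli-capable client,
# this should report "br"; if a gzip-only client won the race, it stays "gzip".
print(resp.headers.get("Content-Encoding"))
print(resp.headers.get("X-Cache"))  # e.g. "Miss from cloudfront" / "Hit from cloudfront"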

Google Cloud CDN vary:cookie response never gets cache hit

I'm using Google Cloud CDN to cache an HTML page.
I've configured all the correct headers as per the docs, and the page is caching fine. Now, I want to change it so that it only caches when the request has no cookies, i.e. no cookie header set.
My understanding was that this was simply a case of changing my origin server to add a vary: cookie header to all responses for the page, then only adding the caching headers Cache-Control: public and Cache-Control: max-age=300 when no cookie header is set on the request.
However, this doesn't work. Using curl I can see that all the caching headers, including the vary: cookie header, are set as expected when I send requests both with and without cookies, but I never get cache hits on the requests without cookies.
Digging into the Cloud CDN logs, I see that every request with no cookie header has cacheFillBytes populated with the same number as the response size - whereas it's not populated for requests that do have a cookie header set (as expected).
So it appears like Cloud CDN is attempting to populate the cache as expected for requests with no cookies, it's just that I never get a cache hit - i.e. it's just cacheFillBytes every time, cacheHit: true never appears in the logs.
Has anyone come across anything similar? I've triple-checked all my headers for typos, and indeed just removing the vary: cookie header makes caching work as expected, so I'm almost certain my configuration is right in terms of headers and what Cloud CDN considers cacheable.
Should Cloud CDN handle vary: cookie like I'm expecting it to? The docs suggest it handles arbitrary vary headers. And if so, why would I see cacheFillBytes on every request, with Cache-Control: public and Cache-Control: max-age=300 set on the response, but then never see a cacheHit: true on any subsequent request (I've tried firing hundreds with curl in a loop, it really never hits, it's not just that I'm populating a few different edge caches)?
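For reference, the loop I've been firing amounts to something like this sketch (Python with the requests library; the URL is a placeholder; on a cache hit Cloud CDN should surface a growing Age header):

import requests

url = "https://example.com/cached-page"  # placeholder for the CDN-fronted page

for i in range(10):
    # No Cookie header at all, so the response should be cacheable.
    resp = requests.get(url)
    # A non-empty, growing Age header would indicate a cache hit;
    # in my case it never appears, matching cacheFillBytes on every request.
    print(i, resp.status_code, resp.headers.get("Age"))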
I filed a bug with Google and it turns out that, indeed, the documentation was wrong.
vary: cookie is not supported by Cloud CDN
The docs have been updated - the only headers that can be used with vary are Accept, Accept-Encoding and Origin.
As per the GCP documentation [1], Cloud CDN respects any Vary headers that origin servers include in responses, so on that reading vary: cookie is supported. Keep in mind, though, that this will negatively impact caching, because the Vary header indicates that the response varies depending on the client's request headers. For example, if a request for an object has the request header Cookie: abc, then a subsequent request for the same object with the request header Cookie: xyz would not be served from the cache. So yes, it is supported and respected, but it will impact caching (https://cloud.google.com/cdn/docs/troubleshooting-steps?hl=en#low-hit-rate).
[1]https://cloud.google.com/cdn/docs/caching#vary_headers

Content-Encoding header not returned from Cloudfront

I'm trying to deliver compressed CSS and JS files to my web app. The files are hosted on S3, with a Cloudfront distribution in front of the S3 origin to provide edge cacheing. I'm having trouble getting these files to the browser both compressed and with the right cache-related headers to allow the browser to cache as well.
I have a cloudfront distribution with S3 as the Origin to deliver the JS and CSS files for my web app. I initially set up CloudFront to compress the files, but it would not send the Cache-Control or ETag headers in the response.
Since I also wanted to leverage the browser cache, I thought of storing the gzipped files in S3 with the Cache-Control and Content-Encoding headers attached. I did this, and CloudFront did start returning the Cache-Control and ETag headers in the response, but it would not return the Content-Encoding: gzip header (which I set in the file metadata in S3). Because this header is missing from the response, the browser doesn't know to uncompress the response and ends up with an unreadable file.
I've also tried setting up a viewer response edge lambda to add the Content-Encoding header, but this is disallowed (see the AWS docs) and results in a LambdaValidationError.
Is there something I'm missing here that would allow the files to make it to the browser with compression, AND still allow the Cache-Control and ETag headers to make it through to the browser?
Any help would be much appreciated!
The way I usually do this is to upload uncompressed content to the S3 bucket and put Cache-Control headers on your items there. The Cache-Control header is the only thing I set in the origin (S3).
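For example, with boto3 that upload might look like the following sketch (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Placeholder bucket/key; CacheControl becomes the Cache-Control response header.
s3.upload_file(
    "dist/app.js",
    "my-bucket",
    "app.js",
    ExtraArgs={
        "ContentType": "application/javascript",
        "CacheControl": "public, max-age=31536000",
    },
)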
In Cloudfront I check the 'Compress Objects Automatically' option in Behavior Settings to have Cloudfront compress the files for me. That takes care of the Content-Encoding and Last-Modified headers and the gzipping. That should be all you need. You won't see an ETag header from Cloudfront but Last-Modified does essentially the same thing here.
If you don't see your changes coming through, check that you properly invalidated your Cloudfront cache. I see a lot of people put / in the box but it's really /* to invalidate the entire distribution.
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
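Programmatically, the same /* invalidation looks roughly like this (a boto3 sketch; the distribution ID is a placeholder):

import time
import boto3

cf = boto3.client("cloudfront")

# "/*" invalidates every object in the distribution; "/" alone does not.
cf.create_invalidation(
    DistributionId="EXXXXXXXXXXXXX",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # any unique string
    },
)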
This should take care of gzipping, caching from the CDN and browser caching.
Good luck!
In your particular case I think you are missing one bit. You need to modify your distribution in CloudFront like this:
-> Edit the default behavior [*] and select "Cache Based on Selected Request Headers" to whitelist the "Accept-Encoding" header.
In general, the way caching works in CloudFront is this:
If you have compression enabled in CloudFront, all files which can be compressed, meaning they:
- have a compressible type (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront-file-types), and
- are above 1 KB in size (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront),
will be compressed by CloudFront and will have the ETag header removed by default. CloudFront does not touch or modify the Cache-Control header, which you can set as an attribute on your S3 objects.
It might be confusing when diagnosing the disappearance of the ETag with curl. By default curl will get the ETag back, because it does not send the header
Accept-Encoding: gzip, deflate, br
until you specify it. For non-compressed content the ETag is preserved by CloudFront.
One thing you can do to keep the ETag is to disable compression in CloudFront, but that means increased transfer costs and higher load times.
The other option is to whitelist the Accept-Encoding header in CloudFront (-> Edit the default behavior [*] and select "Cache Based on Selected Request Headers" to whitelist the "Accept-Encoding" header) and upload an already-compressed S3 object. Remember to set the "Content Encoding" metadata accordingly. Here you will find an instruction: https://medium.com/@graysonhicks/how-to-serve-gzipped-js-and-css-from-aws-s3-211b1e86d1cd
From then on, CloudFront will keep the cached version and return the ETag. More reading: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html?shortFooter=true#header-caching-web-compressed
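Uploading the pre-compressed object with its metadata might look like this sketch (boto3, placeholder bucket and key):

import gzip
import boto3

s3 = boto3.client("s3")

# Pre-compress the file ourselves, since S3 will not do it for us.
with open("dist/app.js", "rb") as f:
    body = gzip.compress(f.read())

s3.put_object(
    Bucket="my-bucket",            # placeholder
    Key="app.js",
    Body=body,
    ContentType="application/javascript",
    ContentEncoding="gzip",        # so browsers know to decompress
    CacheControl="public, max-age=31536000",
)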
Additionally, CloudFront adds a header such as:
last-modified: Sat, 13 Jul 2019 07:11:35 GMT
If you have Cache-Control present without an ETag, then the caching behavior works as described here:
https://developer.mozilla.org/pl/docs/Web/HTTP/Headers/Cache-Control
If you have only Last-Modified, then it is not 100% obvious how long the browser will cache the response.
Based on my experience, when Firefox and Chrome already have the object cached, they will add this request header when retrieving it again:
if-modified-since: Sat, 13 Jul 2019 07:11:35 GMT
and CloudFront will respond with the full data only if the object was modified after that date.
On IE it seems a heuristic caching algorithm is used; you can read more about it here: https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/.
For IE the object can be cached for as long as (current time - last-modified) * 10%.
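As a concrete example of that heuristic (a small Python sketch with made-up dates):

from datetime import datetime, timedelta

# Suppose the object was last modified 10 days before it was fetched.
fetched = datetime(2019, 7, 23, 7, 11, 35)
last_modified = fetched - timedelta(days=10)

# Heuristic freshness: 10% of the time since Last-Modified.
freshness = (fetched - last_modified) * 0.10
print(freshness)  # 1 day, 0:00:00 -- cached for roughly one day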

Cache website to browser when navigating back

I have a single URL/page of my website that I want cached in the browser, so that whenever a user navigates away from that page and then presses the BACK button, the request doesn't go to Django at all and the browser serves its cached version of the page. Also, I can't use solutions that cache the page between the web server and Django, as every user has different permissions on what data they can see.
So I added this in my nginx config:
...
location /search {
    expires 300s;
    add_header Cache-Control "private";
    ...
And this works very well, 50% of the time :). How can I make it work always?
whenever a user navigates away from that page, and then presses the BACK button of the browser, I don't want that request to go to django at all, but serve the cached version of the page that is in the browser
For some browsers, this is the default behavior: if you have set no caching directives on the server, the browser keeps not only a copy of the response but the entire rendered page in memory, so that when you click the back button it can be shown instantly.
But if you want to explicitly instruct the browser to cache the response you can use a max-age directive on the Cache-Control header. Set
Cache-Control: max-age=3600
This is a more modern and reliable way than using an "Expires" header, especially for small durations. If the user's browser has the incorrect time or time zone set "Expires" might not work at all, but "max-age" still should.
If you are serving each person a different version of the page, you can add "private" too, to prevent caching by proxies (as in your example):
Cache-Control: private, max-age=3600
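If you'd rather set this from Django instead of nginx, the stock cache_control decorator produces the same header; a minimal sketch (the view itself is hypothetical):

from django.http import HttpResponse
from django.views.decorators.cache import cache_control

@cache_control(private=True, max_age=3600)
def search(request):  # hypothetical view for the /search page
    # Renders the per-user page; the decorator adds
    # "Cache-Control: private, max-age=3600" to the response.
    return HttpResponse("...")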
Note: you can't force a browser to always use the cache. If you are noticing that sometimes it doesn't use the cache, it could be:
The item in the cache has expired. You were giving it only 5 minutes, so if you request it again more than 5 minutes after the response entered the cache, the browser will send the request through to the remote server, even if there were other requests in between.
The browser cache became full and some items were purged.
For some reason the browser believed or was configured to believe that the response should not be cached regardless of cache directives.
The user pressed reload.
A proxy between client and server stripped the Cache-Control or other headers.

getting 304 response even with django-cors-headers

I installed django-cors-headers in my Django application.
I want to display an SVG file in the web browser.
The first time, it's not loading properly, and it's showing a 304 response in the network tab.
Can anyone help me rectify this problem?
This response should be fine; it just indicates that your browser has a cached version. It saves Django from having to send the response body again.
From Wikipedia
304 Not Modified
Indicates that the resource has not been modified since the version specified by the request headers If-Modified-Since or If-None-Match. This means that there is no need to retransmit the resource, since the client still has a previously-downloaded copy.
This sounds like what you are looking for, as the SVG should still be rendered by the browser.
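If you want Django itself to keep answering these conditional requests cheaply, the built-in condition decorator computes the validators for you; a minimal sketch (the ETag function and view here are hypothetical):

import hashlib
from django.http import HttpResponse
from django.views.decorators.http import condition

def svg_etag(request, *args, **kwargs):
    # Hypothetical: derive an ETag from whatever backs the SVG.
    return hashlib.md5(b"svg-file-contents").hexdigest()

@condition(etag_func=svg_etag)
def svg_view(request):
    # Django compares If-None-Match against svg_etag() and returns
    # 304 Not Modified automatically when they match.
    return HttpResponse(b"<svg>...</svg>", content_type="image/svg+xml")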