How to get CloudFront to cache locally for max-age - amazon-web-services

I have a custom origin that I am trying to cache locally. The default Cache-Control sends no-store, no-cache, must-revalidate, and the pre- and post-check directives are set to 0. I actually need the opposite: I need the browser to store and cache, and NOT revalidate, until my max age (24h / 86400s) is hit. There is private data involved, and I don't want users seeing other users' data. I set up a response headers policy to override Cache-Control with "private, max-age=86400", but I only get 200 HTTP responses (I am looking for a 304), and in x-cache I keep getting "Miss from cloudfront" as well.
Some info about my setup:
Protocol: Match Viewer
SSL: TLSv1.1 (Custom cert that AWS handles)
Methods: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE
Unrestricted viewer access
Legacy Cache: Headers (None), Query strings (ALL), Cookies (None)
Object cache: Use origin headers
Response Policy: Custom cache-control "private, max-age=86400"
The only other behaviors use CachingDisabled on folders I don't want cached. I know this is likely just a bad config issue, but I have no idea what's causing it.
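For reference, here is a minimal sketch of what my origin would need to send for the behavior I'm after, assuming a Node/Express custom origin (the route and payload here are hypothetical). Express computes an ETag for res.json()/res.send() bodies and answers If-None-Match conditional requests with a 304 on its own:

// A minimal sketch, assuming a Node/Express custom origin; the route and
// payload are hypothetical. Express generates an ETag for the body and
// answers If-None-Match conditional requests with 304 automatically.
const express = require('express');
const app = express();

app.get('/api/profile', (req, res) => {
    // Let the browser cache for 24h; 'private' keeps shared caches out,
    // so one user's data is never served to another user.
    res.set('Cache-Control', 'private, max-age=86400');
    res.json({ user: 'example', plan: 'pro' }); // hypothetical payload
});

app.listen(8080);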


Access Denied from CloudFront with secure cookies returns no CORS headers, preventing reading error information from an XHR request

I'm using CloudFront secure cookies to keep some files private. When cookie auth succeeds and the origin is hit, CloudFront returns the proper CORS headers (Access-Control-Allow-Origin) from the origin, but how do I make CloudFront return CORS headers during a 403/Access Denied? This validation happens entirely in CloudFront before the request reaches the origin, but is there a setting to enable it? I want to be able to make an XHR request to CloudFront and know why the request failed. Since CloudFront doesn't return CORS headers on a 403, most modern browsers will prevent reading any information on the request, including the status code, and it's tough to determine why the request failed.
Thanks!
As you know, CloudFront doesn't spontaneously emit CORS headers -- they need to come from the origin server -- so in order to see CORS headers in the response, the request needs to be allowed by CloudFront... but, of course, it can't be allowed, because the condition you're trying to catch is 403 Forbidden.
So, what we need in order to allow your unauthorized responses to be CORS-friendly is an additional origin that can provide us with an alternate error response, and that origin needs to be CORS-aware. The solution seems to be something we can accomplish with a little help from CloudFront Custom Error Responses and an otherwise-empty S3 bucket, created for the purpose.
Custom error responses allow you to configure CloudFront to fetch the custom error response from another origin, rather than generating it internally. As part of that process, some headers from the original request are included in the upstream fetch, and the response headers from the error document are returned.
S3 makes a handy origin, since it has configurable CORS support.
Create a new, empty bucket.
Enable CORS for the bucket, and configure CORS with the appropriate parameters. The default configuration may be fine for this purpose. (A scripted sketch of the bucket steps follows after this list.)
Create a simple file that your CloudFront distribution will use instead of its built-in response for a 403. For test purposes, that can just be a text file that says "Access denied."
Upload the file to the bucket with whatever name you like, such as 403.txt. Select the option to make the object publicly-readable. Set metadata Cache-Control: no-cache, no-store, private, must-revalidate and Content-Type: text/plain (or text/html, depending on what exactly you put in the error file).
In CloudFront, create a new Origin. For the Origin Domain Name, select the bucket from the list of buckets.
Create a new Cache Behavior, matching path /403.txt (or whatever you named the file). Whitelist the Origin, Access-Control-Request-Headers, and Access-Control-Request-Method headers for forwarding. Set Restrict Viewer Access to No, because for this one path, we don't require signed credentials. Note that this path needs to be exactly the same as the filename in the bucket (except the leading slash, which isn't shown in the bucket but should be included, here).
In CloudFront Custom Error Responses, choose Create Custom Error Response. Select error code 403, set Error Caching Minimum TTL to 0, choose Customize Error Response Yes, set Response Page Path /403.txt and set HTTP Response code to 403.
Profit!
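For reference, here's a sketch of the bucket half of this (create the bucket, enable CORS, upload the error object) scripted with the AWS SDK for JavaScript. The bucket name and region are placeholders, and note that newer accounts may need S3 Block Public Access relaxed before the public-read ACL takes effect:

// Sketch of the S3 setup steps above; bucket name and region are hypothetical.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-1' });

async function setupErrorBucket() {
    const Bucket = 'my-cf-error-pages'; // hypothetical bucket name

    await s3.createBucket({ Bucket }).promise();

    // A permissive CORS configuration; tighten AllowedOrigins as needed.
    await s3.putBucketCors({
        Bucket,
        CORSConfiguration: {
            CORSRules: [{
                AllowedHeaders: ['*'],
                AllowedMethods: ['GET', 'HEAD'],
                AllowedOrigins: ['*'],
                MaxAgeSeconds: 3000
            }]
        }
    }).promise();

    // The error document itself: publicly readable and never cached.
    await s3.putObject({
        Bucket,
        Key: '403.txt',
        Body: 'Access denied.',
        ACL: 'public-read',
        ContentType: 'text/plain',
        CacheControl: 'no-cache, no-store, private, must-revalidate'
    }).promise();
}

setupErrorBucket().catch(console.error);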
Test:
$ curl -v dzczcnnnnexample.cloudfront.net -H 'Origin: http://example.com'
* Rebuilt URL to: dzczcnnnnexample.cloudfront.net/
* Trying 203.0.113.208...
* Connected to dzczcnnnnexample.cloudfront.net (203.0.113.208) port 80 (#0)
> GET / HTTP/1.1
> Host: dzczcnnnnexample.cloudfront.net
> User-Agent: curl/7.47.0
> Accept: */*
> Origin: http://example.com
>
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain
< Content-Length: 16
< Connection: keep-alive
< Date: Sun, 08 Apr 2018 14:01:25 GMT
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, HEAD
< Access-Control-Max-Age: 3000
< Last-Modified: Sun, 08 Apr 2018 13:29:19 GMT
< ETag: "fd9e8f7be7b65381c4acc272b6afc858"
< x-amz-server-side-encryption: AES256
< Cache-Control: private, no-cache, no-store, must-revalidate
< Accept-Ranges: bytes
< Server: AmazonS3
< Vary: Origin,Access-Control-Request-Headers,Access-Control-Request-Method
< X-Cache: Error from cloudfront
< Via: 1.1 1234567890a26beddeac6bfc77b2d348.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: ExAmPlEIbQBtaqExamPLEQs4VwcxhvtU1YXBi47uUzUgami0Hj0MwQ==
<
Access denied.
* Connection #0 to host dzczcnnnnexample.cloudfront.net left intact
Here, Access denied. is what I put in the text file I created. You may want to get a little more creative, after confirming that this works for you, as it does for me. The content of this new file in S3 will always be returned whenever CloudFront throws a 403 error. Additionally, it will also be returned whenever your origin throws a 403, because custom error responses are designed to replace all errors with a given HTTP status code.
You'll note, above, that we see Access-Control-Allow-Origin: *. This is the default behavior of S3 CORS. If you provide explicit origins in the S3 CORS config, you get a response like this...
Access-Control-Allow-Origin: http://example.com
...but for GET requests, I assume this level of specificity would not be necessary and the wildcard would suffice. The scenario described here isn't setting CORS for the entire CloudFront distribution -- just for the error response.

Content-Encoding header not returned from Cloudfront

I'm trying to deliver compressed CSS and JS files to my web app. The files are hosted on S3, with a CloudFront distribution in front of the S3 origin to provide edge caching. I'm having trouble getting these files to the browser both compressed and with the right cache-related headers to allow the browser to cache as well.
I have a cloudfront distribution with S3 as the Origin to deliver the JS and CSS files for my web app. I initially set up CloudFront to compress the files, but it would not send the Cache-Control or ETag headers in the response.
Since I also wanted to leverage the browser cache, I thought of storing the gzipped files in S3 with the Cache-Control and Content-Encoding headers attached. I did this, and CloudFront did start returning the Cache-Control and ETag headers in the response, but it would not return the Content-Encoding: gzip header (which I set in the file metadata in S3). Because this header is missing from the response, the browser doesn't know to uncompress the response and ends up with an unreadable file.
I've also tried setting up a viewer response edge Lambda to add the Content-Encoding header, but this is disallowed (see the AWS docs) and results in a LambdaValidationError.
Is there something I'm missing here that would allow the files to make it to the browser with compression, AND still allow the Cache-Control and ETag headers to make it through to the browser?
Any help would be much appreciated!
The way I usually do this is to upload uncompressed content to the S3 bucket and put Cache-Control headers on your items there. The Cache-Control header is the only thing I set in the origin (S3).
In Cloudfront I check the 'Compress Objects Automatically' option in Behavior Settings to have Cloudfront compress the files for me. That takes care of the Content-Encoding and Last-Modified headers and the gzipping. That should be all you need. You won't see an ETag header from Cloudfront but Last-Modified does essentially the same thing here.
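For example, here's a sketch of that upload step with the AWS SDK for JavaScript (the bucket, key, and local file path are hypothetical):

const AWS = require('aws-sdk');
const fs = require('fs');
const s3 = new AWS.S3();

// Upload the uncompressed asset with a long Cache-Control; with
// "Compress Objects Automatically" on, CloudFront gzips it and sets
// Content-Encoding itself.
s3.putObject({
    Bucket: 'my-assets-bucket',            // hypothetical
    Key: 'css/app.css',                    // hypothetical
    Body: fs.readFileSync('dist/app.css'),
    ContentType: 'text/css',
    CacheControl: 'public, max-age=31536000'
}).promise().catch(console.error);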
If you don't see your changes coming through, check that you properly invalidated your Cloudfront cache. I see a lot of people put / in the box but it's really /* to invalidate the entire distribution.
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
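If you'd rather script the invalidation, a minimal sketch with the AWS SDK for JavaScript (the distribution ID is a placeholder; CallerReference just has to be unique per request):

const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

// Invalidate the whole distribution: note the path is "/*", not "/".
cloudfront.createInvalidation({
    DistributionId: 'E1234567890ABC', // hypothetical distribution ID
    InvalidationBatch: {
        CallerReference: Date.now().toString(), // must be unique per request
        Paths: { Quantity: 1, Items: ['/*'] }
    }
}).promise().catch(console.error);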
This should take care of gzipping, caching from the CDN and browser caching.
Good luck!
In your particular case I think you are missing one bit. You need to modify your distribution in CloudFront like this:
-> Edit the default behavior [*] and, under "Cache Based on Selected Request Headers", select Whitelist and whitelist the "Accept-Encoding" header.
In general, the way caching works in CloudFront is this: if you have compression enabled in CloudFront, all files which can be compressed, meaning they
have a compressible type: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront-file-types
and are above 1 KB: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html?shortFooter=true#compressed-content-cloudfront
will be compressed by CloudFront and will have the etag header removed by default. CloudFront does not touch/modify the cache-control header, which you can set as an attribute on your S3 objects.
This can be confusing when diagnosing the disappearance of the etag with curl. By default, curl will get the etag back, because it does not send the header
"Accept-Encoding: gzip, deflate, br"
until you specify it. For non-compressed content, the etag is preserved by CloudFront.
One way to keep the etag is to disable compression on CloudFront, but that means increased cost and higher load times.
The other is to whitelist the Accept-Encoding header on CloudFront (-> Edit the default behavior [*] and, under "Cache Based on Selected Request Headers", whitelist "Accept-Encoding") and upload a compressed S3 object yourself. Remember to set the "Content Encoding" metadata accordingly (a sketch follows below). Here you will find instructions: https://medium.com/@graysonhicks/how-to-serve-gzipped-js-and-css-from-aws-s3-211b1e86d1cd
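A sketch of that pre-compressed upload with the AWS SDK for JavaScript (bucket, key, and local path are hypothetical):

const AWS = require('aws-sdk');
const fs = require('fs');
const zlib = require('zlib');
const s3 = new AWS.S3();

// Gzip the asset yourself and upload it with matching metadata, so
// CloudFront serves it as-is and the S3 etag survives.
s3.putObject({
    Bucket: 'my-assets-bucket',            // hypothetical
    Key: 'js/app.js',                      // hypothetical
    Body: zlib.gzipSync(fs.readFileSync('dist/app.js')),
    ContentType: 'application/javascript',
    ContentEncoding: 'gzip',
    CacheControl: 'public, max-age=31536000'
}).promise().catch(console.error);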
From now on, CloudFront will keep the cached version and share the etag. More reading: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html?shortFooter=true#header-caching-web-compressed
Additionally, CloudFront adds a header like:
last-modified: Sat, 13 Jul 2019 07:11:35 GMT
If you have cache-control present without an etag, then the caching behavior works as described here:
https://developer.mozilla.org/pl/docs/Web/HTTP/Headers/Cache-Control
If you have only last-modified, then it is not 100% obvious how long the browser will cache such a response.
In my experience, when Firefox or Chrome already has the object cached and retrieves it again from CloudFront, it adds the request header:
if-modified-since: Sat, 13 Jul 2019 07:11:35 GMT
and CloudFront responds with full data only if the object was modified after that date.
On IE, it seems a heuristic caching algorithm is used; you can read more about it here: https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/.
For IE, the object can be cached for as long as: (current time - last-modified) * 10%.

Tell CloudFront to only cache 200 response codes

Is it possible to configure Amazon CloudFront to only ever cache 200 codes? I want it to never cache 3xx, as I want to connect it to an on-the-fly image processing tool with Lambda that performs a 307 via S3, as described here: https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
There isn't a way to explicitly tell CloudFront to cache only 2XX's and not cache 3XX's unless you can configure the origin to set the Cache-Control header accordingly -- CloudFront considers 2XX and 3XX as "success" and treats them the same. (It has different rules for 4XX and 5XX only, and an obvious exception for a 304 response to a conditional request.)
In the case of S3 redirects, the problem with this is that S3 redirection rules do not allow a Cache-Control header to be set.
However, if you are setting the Cache-Control headers correctly on the objects when you create them in S3 -- as you should be -- then you can probably¹ rely on CloudFront's Default TTL setting to solve the problem entirely, by telling CloudFront that responses lacking a Cache-Control header should not be cached. This would mean setting the Default TTL to 0, and would of course require that the Minimum TTL also be set to 0, since minimum <= default is required.
The Maximum TTL should be left at its default value, since it is used to shorten the CloudFront cache time for objects with a max-age that is larger than Maximum TTL. You don't likely want to shorten the cacheability of 2XX responses.
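In API terms, the relevant fields of the cache behavior would look something like this (a hypothetical fragment in the CloudFront DistributionConfig shape, mirroring the advice above):

// Hypothetical cache-behavior fragment (CloudFront DistributionConfig shape):
const ttlSettings = {
    MinTTL: 0,        // required, since minimum <= default must hold
    DefaultTTL: 0,    // responses without Cache-Control are not cached
    MaxTTL: 31536000  // left at its default so long max-age values are honored
};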
Assuming browsers behave correctly and do not cache the redirect (which they shouldn't, for 307 or 302), then your issue is resolved, because CloudFront behaves as expected in this configuration -- honoring Cache-Control when it's present, and not caching responses when it's absent.
However, you might have to get more aggressive, if you find that browsers or other downstream caches are holding on to your redirects.
The only way to explicitly add Cache-Control (or other headers) to responses when the origin doesn't provide them would be with Lambda@Edge. The following code, used as an Origin Response² trigger, would add Cache-Control: no-cache, no-store, private (yes, it's a bit redundant) to any 302 or 307 HTTP response received from an origin server. If any Cache-Control header is present on the origin's response, it would be overwritten. Any other response (e.g. 2XX) would not be modified.
'use strict';

// Add Cache-Control: no-cache, ... only if the response status is 302 or 307
exports.handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;

    if (response.status.match(/^30[27]/)) {
        response.headers['cache-control'] = [{
            key: 'Cache-Control',
            value: 'no-cache, no-store, private'
        }];
    }

    callback(null, response);
};
With this trigger in place, 2XX responses do not have their headers modified, but 302/307 responses will be modified as shown. This will tell CloudFront and the browser not to cache the response.
¹ probably... is not intended to imply that CloudFront merely might do the right thing. CloudFront behaves exactly as expected. Probably refers to this being the only action needed: You can probably consider this solution sufficient, because probably browsers will not cache the redirect. Browser behavior, as usual, is the wildcard that may require the more aggressive addition of explicit Cache-Control headers to prevent caching of the redirect by the browser.
² Origin Response triggers examine and can modify certain aspects of responses before they are cached (if they are cached) and returned to the viewer. Modifying or adding Cache-Control headers at this point in the flow would prevent the response from being stored in the CloudFront cache, and should prevent browser caching as well.
You can ignore Response Page Path and HTTP Response Code in your use case.
Next, in the CloudFront behavior, make sure caching is set to zero if you want to retrieve from the origin every time.
If you are using headers, make sure the origin's Cache-Control header has the right caching values.

CloudFront: how to avoid an If-Modified-Since header request every time

The AWS CloudFront documentation says:
If you set the TTL for a particular origin to 0, CloudFront will still cache the content from that origin. It will then make a GET request with an If-Modified-Since header, thereby giving the origin a chance to signal that CloudFront can continue to use the cached content if it hasn't changed at the origin.
I need to configure my dynamic content. I have already set TTL to 0. I want every request to always go to the origin. Is there a way I can avoid this additional GET request with an If-Modified-Since header? Why this extra request every time?
Is there a way I can avoid this additional GET request
It sounds as if you are misinterpreting what you are reading. Unfortunately, you didn't cite the source, so it's difficult to go back and pick up more context; however, this is not referring to an "extra" request.
It will then make a GET request with an If-Modified-Since header
This refers to each time the object is subsequently requested by a browser. CloudFront sends the next request with If-Modified-Since so that your origin server has the option of returning a 304 Not Modified response... it doesn't send two requests to the origin in response to one request from a browser.
If your content is always dynamic, return Cache-Control: private, no-cache, no-store and set Minimum TTL to 0.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#ExpirationDownloadDist
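As an illustration, a minimal sketch of sending those headers for dynamic routes, assuming a Node/Express origin (the path prefix is hypothetical):

const express = require('express');
const app = express();

// Mark truly dynamic routes as uncacheable; with Minimum TTL 0,
// CloudFront will then go to the origin on every request.
app.use('/dynamic', (req, res, next) => {
    res.set('Cache-Control', 'private, no-cache, no-store');
    next();
});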
This is the answer I got from AWS:
However, if you forward all headers for that particular origin, the request will go to the origin every time, without the If-Modified-Since header mentioned [1]. Please view the excerpt from the link below for further detail:
"Forward all headers to your origin. Important: If you configure CloudFront to forward all headers to your origin, CloudFront doesn't cache the objects associated with this cache behavior. Instead, it sends every request to the origin."

What Headers need to be whitelisted in AWS CloudFront for Parse Server

I'm running Parse Server behind AWS CloudFront and I'm still trying to figure out what the best configuration would be. Currently I've configured the CloudFront behavior as follows:
Allowed HTTP Methods: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE
Cached HTTP Methods: GET, HEAD (Cached by default)
Forward Headers: Whitelist
Accept-Language
Content-Type
Host
Origin
Referer
Object Caching: Customize:
Minimum TTL: 0
Maximum TTL: 31536000
Default TTL: 28800
Forward Cookies: All
My GET requests (using the Parse REST API) seem to be cached as expected with this configuration. All requests made using the Parse JS SDK seem to be sent via POST and produce a 504 error in the browser console:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
For some reason those requests are still fulfilled by the Parse Server, because e.g. saving objects still stores them in my MongoDB even though there's this Access-Control-Allow-Origin error.
The fix for this is not in CloudFront; it has to come from the Parse Server side.
In the file /src/middlewares.js, add the code below and CloudFront will no longer throw that error.
var allowCrossDomain = function(req, res, next) {
    res.header('Access-Control-Allow-Origin', '*');
    res.header('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS');
    res.header('Access-Control-Allow-Headers', 'X-Parse-Master-Key, X-Parse-REST-API-Key, X-Parse-Javascript-Key, X-Parse-Application-Id, X-Parse-Client-Version, X-Parse-Session-Token, X-Requested-With, X-Parse-Revocable-Session, Content-Type');
    next(); // without this, the request would stall in the middleware chain
};
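If the function isn't already wired into the middleware chain, registering it is one line (the app variable is hypothetical; use whichever Express app Parse Server mounts on):

app.use(allowCrossDomain); // register before the Parse API routes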