AWS CloudFront behaviors not working as expected

I have a PHP app on AWS Elastic Beanstalk, and I have created an assets bucket on S3. I'm trying to set up a CloudFront distribution with a behavior that sends requests for assets/* to S3 and a default behavior that sends all other requests to EB. The domain points to CloudFront.
All requests are going to EB, which returns a 404 since there is no assets directory in the EB environment.
I have created two CloudFront origins, one for EB and one for the S3 bucket. This is what my behaviors look like:
Precedence | Path Pattern | Origin                                        | Protocol Policy | Forward Query Strings
0          | assets/*     | S3-example-bucket                             | HTTP and HTTPS  | No
1          | Default (*)  | Custom-example.us-east-1.elasticbeanstalk.com | HTTP and HTTPS  | Yes
It seems as though this should be pretty straightforward, so I assume I'm missing something basic. Any help is greatly appreciated.
Edit:
Request header:
GET /assets/images/10waysaudiobook.png HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wordpress_logged_in_8a27500b7747be1e4fbad7f473f238e5=snickerspixy%7C1466021823%7Cr7rE5moINjanjHEqb1TGbsSkn9F7OCZLfX69IbcnGJu%7C28fc452885f3fe6e954243abab585a188f6511cdd6eeec6fa5ec5c50b9f3d393; wp-settings-7674=m4%3Do%26m5%3Do%26m9%3Do%26m6%3Do%26editor%3Dhtml%26m10%3Do%26m0%3Do%26m3%3Do%26hidetb%3D1%26m2%3Dc%26m1%3Do%26m8%3Do%26m12%3Do%26m7%3Do%26m11%3Do%26urlbutton%3Dnone%26m13%3Do%26tml1%3D1%26imgsize%3Dfull%26align%3Dcenter%26libraryContent%3Dbrowse%26ed_size%3D569%26unfold%3D1%26wplink%3D1%26mfold%3Do%26post_dfw%3Doff%26advImgDetails%3Dshow%26posts_list_mode%3Dlist; wp-settings-time-7674=1464816549; AWSELB=1FCB85F51606EBAFF15FEADB01C8069AEDE17E2A043407E615EF1A0E1ABF24607545A45D3DC206631F7AAE4503ADA423788B5E6B5B48FAE93EE916DE068509E64F92AC10FF; PHPSESSID=cpi2su7s967phu87rlpjgneel6; wordpress_test_cookie=WP+Cookie+check
Connection: keep-alive
Response header:
HTTP/1.1 404 Not Found
Cache-Control: no-cache, must-revalidate, max-age=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 05 Jun 2016 00:54:23 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Link: <http://example.com/wp-json/>; rel="https://api.w.org/"
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Connection: keep-alive

The response headers indicate that this request wasn't served by CloudFront at all, because there are headers that should be present... but are absent.
CloudFront adds Via:, X-Cache:, and x-amz-cf-id: headers to every response, and sometimes Age: (on cache hits and errors) or Vary: if you're forwarding the CloudFront-Is-*-Viewer: headers to the origin.
The absence of these headers suggests that the DNS for the site hasn't been pointed to CloudFront and may still be pointing directly to the EB environment or, if the change was recent, that the former TTL for the DNS entry may not yet have expired.
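The header check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the marker list is taken from the three headers named in the answer.

```python
# Headers the answer says CloudFront adds to every response (lowercased).
CLOUDFRONT_MARKERS = ("via", "x-cache", "x-amz-cf-id")


def parse_headers(raw):
    """Parse a raw header dump into a dict keyed by lowercased header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the status line) are skipped
            headers[name.strip().lower()] = value.strip()
    return headers


def missing_cloudfront_headers(headers):
    """Return the CloudFront marker headers that are absent from a response."""
    return [marker for marker in CLOUDFRONT_MARKERS if marker not in headers]


def looks_like_cloudfront(headers):
    """True only if every marker header is present."""
    return not missing_cloudfront_headers(headers)


# A shortened version of the 404 response from the question.
origin_only = parse_headers("HTTP/1.1 404 Not Found\nServer: Apache\nConnection: keep-alive")
print(missing_cloudfront_headers(origin_only))  # ['via', 'x-cache', 'x-amz-cf-id']
```

If all three markers are missing, as in the response posted in the question, the request most likely never passed through CloudFront.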

Related

CloudFront is removing Access-Control-* headers from my Origin

I have a CloudFront distribution with a custom origin to an APIGateway that forwards calls to a Lambda which is my API code. I have a separate CloudFront distribution for my static single-page website. My website is not working because it is getting CORS errors when calling my API on a separate subdomain. It is my Lambda that is currently responsible for sending back CORS headers.
Looking into it, it seems CloudFront is removing CORS headers from the responses from the APIGateway and I cannot figure out how to get it to allow the headers. I can make the same call directly to my APIGateway and I get the correct response headers.
Request:
OPTIONS https://api.mywebsite.com/some/endpoint
User-Agent: ...snip...
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Access-Control-Request-Method: GET
Access-Control-Request-Headers: authorization
Referer: https://www.mywebsite.com/
Origin: https://www.mywebsite.com
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
APIGateway Response:
200 OK
Date: Fri, 27 Jan 2023 03:47:55 GMT
Content-Type: application/json
Content-Length: 0
Connection: keep-alive
x-amzn-RequestId: ...snip...
X-XSS-Protection: 1; mode=block
Access-Control-Allow-Origin: https://www.mywebsite.com
Access-Control-Allow-Headers: authorization
X-Frame-Options: DENY
x-amz-apigw-id: ...snip...
Vary: Origin
Vary: Access-Control-Request-Method
Vary: Access-Control-Request-Headers
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Expires: 0
X-Content-Type-Options: nosniff
Access-Control-Allow-Methods: GET
Pragma: no-cache
Access-Control-Max-Age: 3600
CloudFront Response:
200 OK
Content-Type: application/json
Content-Length: 0
Connection: keep-alive
Date: Fri, 27 Jan 2023 03:51:58 GMT
x-amzn-RequestId: ...snip...
X-XSS-Protection: 1; mode=block
Accept-Patch:
Access-Control-Allow-Origin: https://www.cicerone.development.loesoft.com
Allow: GET,HEAD,OPTIONS
X-Frame-Options: DENY
x-amz-apigw-id: ...snip...
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Expires: 0
X-Content-Type-Options: nosniff
Pragma: no-cache
X-Cache: Miss from cloudfront
Via: 1.1 ...snip....cloudfront.net (CloudFront)
X-Amz-Cf-Pop: DFW56-P2
X-Amz-Cf-Id: ...snip...
The browser is rejecting the desired GET call because the preflight response didn't contain the expected information. I suspect this is because one or more of the Access-Control-* headers are missing.
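To make the difference between the two responses explicit, a small sketch like the following could list the Access-Control-* headers that the origin returned but that did not come back through CloudFront. The function name is hypothetical, and the header names in the usage lines are copied from the two responses above.

```python
def missing_cors_headers(origin_header_names, cdn_header_names):
    """Access-Control-* headers present at the origin but absent via the CDN."""
    cdn = {name.lower() for name in cdn_header_names}
    origin = {name.lower() for name in origin_header_names}
    return sorted(
        name for name in origin
        if name.startswith("access-control-") and name not in cdn
    )


apigw = [
    "Access-Control-Allow-Origin",
    "Access-Control-Allow-Headers",
    "Access-Control-Allow-Methods",
    "Access-Control-Max-Age",
    "Vary",
]
cloudfront = ["Access-Control-Allow-Origin", "Allow", "X-Cache", "Via"]

print(missing_cors_headers(apigw, cloudfront))
# ['access-control-allow-headers', 'access-control-allow-methods', 'access-control-max-age']
```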
I've tried configuring CloudFront a few different ways with no success. In the original configuration, the default (and only) behavior had a Cache policy and no Origin Request policy or Response Headers policy assigned. I first tried adding the managed "All Viewer" Origin Request policy, which should send all inbound request headers to my APIGateway, in case any request headers were being removed. This made no difference. I then added a Response Headers policy that set generic values for the various CORS headers, and made sure the "override origin" flag was off so that the Access-Control-* headers coming from my origin would be used. This also did not solve the issue. I've tried various other configurations for all the policies, but I'm not having much luck.
Additionally, if I have my UI bypass CloudFront and access my API directly, the API calls from the browser work without issue.
Is there a way to configure CloudFront to solve my CORS issue, or even just to not filter any headers coming from my origin?
Thank you in advance.
The issue turned out to have two parts. First, without an assigned Origin Request policy, CF was stripping many of the CORS request headers before sending the request to the origin, so my backend Lambda was not generating the appropriate CORS response headers. Second, adding the AllViewer Origin Request policy resulted in all requests returning 403 without ever reaching my backend Lambda. It appears that this policy causes the viewer's Host header to be sent with the downstream request, and APIGateway was rejecting the call.
I ended up creating my own Origin Request policy that included all the viewer headers except the Host header. My downstream Lambda then started receiving the headers and returning the correct response headers, which CF passed back to the browser.
I did not need a Response Headers policy in place for this to work.
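As a rough sketch of what such a policy might look like, the helper below builds a config dict in the shape that boto3's `create_origin_request_policy` expects. It is not the exact policy from the answer: instead of "all viewer headers except Host", it uses an explicit whitelist of the CORS preflight headers. The policy name, the header list, and the cookie and query-string settings are all assumptions for illustration, and the list would need extending with whatever other headers the API relies on.

```python
def build_origin_request_policy_config(name, headers):
    """Build an OriginRequestPolicyConfig dict with an explicit header whitelist.

    Refuses to include Host, since forwarding the viewer's Host header is what
    caused API Gateway to reject requests in the scenario described above.
    """
    items = sorted(set(headers), key=str.lower)
    if any(h.lower() == "host" for h in items):
        raise ValueError("Host must not be forwarded to the API Gateway origin")
    return {
        "Name": name,
        "Comment": "Forward CORS request headers, but not Host",
        "HeadersConfig": {
            "HeaderBehavior": "whitelist",
            "Headers": {"Quantity": len(items), "Items": items},
        },
        # Assumed settings; adjust to what the API actually needs.
        "CookiesConfig": {"CookieBehavior": "none"},
        "QueryStringsConfig": {"QueryStringBehavior": "all"},
    }


config = build_origin_request_policy_config(
    "cors-without-host",  # hypothetical policy name
    ["Origin", "Access-Control-Request-Method", "Access-Control-Request-Headers"],
)

# With AWS credentials configured, the dict could then be passed along as:
#   boto3.client("cloudfront").create_origin_request_policy(
#       OriginRequestPolicyConfig=config)
```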

Google cloud CDN backend service load balancer not caching any resources

We have a requirement to generate images on the fly and cache them using a CDN. For this we have configured a backend service behind a load balancer with Cloud CDN enabled. We are using an Nginx proxy server. We have added the headers specified in the Google Cloud CDN docs, but unfortunately responses are not being cached.
Request:
GET /resize?size=l&url=https://example.com/image.jpeg HTTP/1.1
Host: resize.example.com
Request Headers:
Host: resize.example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Response headers:
HTTP/1.1 200 OK
Server: nginx/1.17.2
Date: Wed, 15 Jan 2020 15:01:14 GMT
Content-Type: image/jpeg
Content-Length: 62771
cache-control: max-age=86400, public, s-maxage=86400
Via: 1.1 google
Here are a couple of pages from the Cloud CDN documentation that may help:
Cacheability for HTTP responses. It notes: "Not all HTTP responses are cacheable. Cloud CDN caches only those responses that meet all the requirements in this section. Some of these requirements are specified by RFC 7234, and others are specific to Cloud CDN."
Responses aren't being cached (Troubleshooting). It includes the following:
The following example demonstrates using curl to check the HTTP response headers for http://example.com/style.css:
$ curl -s -D - -o /dev/null http://example.com/style.css
HTTP/1.1 200 OK
Date: Tue, 16 Feb 2016 12:00:00 GMT
Content-Type: text/css
Content-Length: 1977
Via: 1.1 google
Given the headers you have already added to the response, though, you may have read these pages already.
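A few of the cacheability requirements from that documentation can be checked mechanically. The sketch below covers only a subset (a `public` directive, a freshness lifetime, no `private`/`no-store`/`no-cache`, no `Set-Cookie`, and a length or transfer-encoding header), so an empty result does not prove a response is cacheable. The function names are made up for this example.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def cacheability_problems(headers):
    """Return reasons (from a partial checklist) why a response may not be cached."""
    h = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(h.get("cache-control", ""))
    problems = []
    if "public" not in cc:
        problems.append("Cache-Control has no 'public' directive")
    if not (cc.keys() & {"max-age", "s-maxage"}) and "expires" not in h:
        problems.append("no max-age, s-maxage, or Expires")
    for directive in ("private", "no-store", "no-cache"):
        if directive in cc:
            problems.append("Cache-Control contains '%s'" % directive)
    if "set-cookie" in h:
        problems.append("response sets a cookie")
    if not (h.keys() & {"content-length", "content-range", "transfer-encoding"}):
        problems.append("no Content-Length, Content-Range, or Transfer-Encoding")
    return problems


# The response headers posted in the question pass these particular checks,
# which suggests looking at the requirements this sketch does not cover.
posted = {
    "Content-Type": "image/jpeg",
    "Content-Length": "62771",
    "cache-control": "max-age=86400, public, s-maxage=86400",
}
print(cacheability_problems(posted))  # []
```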

Cloudfront, Compression, HTTPS Load error

I've been working on a project, and we recently switched from HTTP to HTTPS.
Here is our setup. We're hosting our website on S3 with static website hosting enabled. For the server, we created an instance through Elastic Beanstalk. Using ACM, we obtained a certificate and successfully attached it to our frontend through CloudFront and to our server through the Elastic Beanstalk load balancer.
Now our site is finally live and shows as secure! For some reason, the first load sometimes takes upwards of 10 seconds and usually ends up timing out. This happens every couple of hours or whenever we clear our cache. But as soon as we refresh, the site takes 2-3 seconds to load and everything works perfectly. We think this is because we're using Angular 5, and ng build produces a large file called vendor.bundle.js that's around 13 MB.
We want to gzip this file to hopefully solve the problem, because we think the website is timing out before the vendor file has even loaded. We went into CloudFront and enabled compression. We then went into our S3 bucket and added "Content-Length" as an allowed header. I looked up the request and response headers, and they are as follows:
Request
:method: GET
:scheme: https
:authority: riftapp.io
:path: /vendor.bundle.js
Host: riftapp.io
If-None-Match: "de078424bf26a6b1e2873009a37924ee-2"
Accept: */*
Connection: keep-alive
Accept-Language: en-us
Accept-Encoding: br, gzip, deflate
Cookie: __stripe_mid=333bb2df-0b1a-4757-a193-0596baeabd7c; __stripe_sid=bff74933-d3fd-4ee2-a7a8-e5e73a224fdb
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
If-Modified-Since: Wed, 25 Jul 2018 03:02:42 GMT
Referer: https://riftapp.io/home
Response
:status: 304
Via: 1.1 e3a844a9e0d478ce4d12c6d1f3a2d892.cloudfront.net (CloudFront)
ETag: "de078424bf26a6b1e2873009a37924ee-2"
Age: 3021
Date: Wed, 25 Jul 2018 03:55:47 GMT
Server: AmazonS3
x-amz-cf-id: JHS0o-e-tBh1wfTy4AgKHALzHEEvdbUOkO0mgzSc56LAmnx515WS9A==
x-cache: Hit from cloudfront
We think the problem may be that the Host in the request is riftapp.io instead of abc.cloudfront.net or something similar, but when I tried to delete the origin, I couldn't because our S3 bucket is linked to it.
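One thing worth checking here: CloudFront's documentation states that it only compresses objects between 1,000 and 10,000,000 bytes, so a bundle of around 13 MB would fall outside that range. A possible workaround, sketched below, is to gzip the file before uploading it to S3 and store it with a `Content-Encoding: gzip` header. The function names and the `Content-Type` value are assumptions for this example, and the upload step itself is left out.

```python
import gzip

# Size range for automatic compression, per the CloudFront documentation.
CLOUDFRONT_MIN_BYTES = 1_000
CLOUDFRONT_MAX_BYTES = 10_000_000


def cloudfront_can_compress(size_bytes):
    """True if an object of this size is within CloudFront's compression range."""
    return CLOUDFRONT_MIN_BYTES <= size_bytes <= CLOUDFRONT_MAX_BYTES


def precompress(body):
    """Gzip a payload and return it with the headers to store alongside it."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "application/javascript",  # assumed for a .js bundle
    }
    return compressed, headers


bundle = b"console.log('hello');\n" * 1000  # stand-in for vendor.bundle.js
compressed, headers = precompress(bundle)
print(len(bundle), "->", len(compressed), headers["Content-Encoding"])
```

An object stored this way is always served gzipped, which is fine for browsers but would need handling for any client that does not send `Accept-Encoding: gzip`.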

Amazon CloudFront: Identify if there have been any updates with the file - Reload file (and re-cache)

I have recently set up CloudFront to cache files (images, CSS and JavaScript).
Sometimes I have to clear the browser cache manually to get the new file.
Caching files within CloudFront is really tough. A week ago I tried enabling GZIP compression on CloudFront and noticed that until I added my file to the "Invalidation" table, the updated file wasn't served on my website.
(Clearing the cache in my browser didn't help either, by the way.)
Is there any "smart" way of caching on Amazon CloudFront?
Something like: "if the file has been updated, send the new file (gzipped, of course)".
Here are my headers:
Request URL:https://abc.cloudfront.net/live/static/rcss/bootstrap3.min.css
Request Method:GET
Status Code:200 OK
Remote Address:77.77.77.77:443
Referrer Policy:no-referrer-when-downgrade
Response Headers
HTTP/1.1 200 OK
Content-Type: text/css
Connection: keep-alive
Date: Sun, 07 May 2017 08:14:04 GMT
Last-Modified: Wed, 26 Apr 2017 08:43:05 GMT
Server: AmazonS3
Content-Encoding: gzip
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 abc.cloudfront.net (CloudFront)
X-Amz-Cf-Id: abc-iuxzccRYcxce8LjxzcDeYdasdehHqFJGj80iczxcz8Q4asdDpWg==
Request Headers
Accept:text/css,*/*;q=0.1
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Host:abc.cloudfront.net
Pragma:no-cache
Referer:https://example.com/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
If there is anything else I can add to my post - please do ask.
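The manual invalidation step mentioned above can also be scripted. The sketch below builds the `InvalidationBatch` dict that boto3's `create_invalidation` call takes. The function name is made up for this example, and the distribution ID in the trailing comment is a placeholder.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build an InvalidationBatch dict for a list of object paths.

    Paths are normalized to start with '/', and CallerReference must be
    unique per invalidation request, so it defaults to the current time.
    """
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        "CallerReference": caller_reference or str(time.time()),
    }


batch = build_invalidation_batch(["live/static/rcss/bootstrap3.min.css"])

# With AWS credentials configured, this could then be submitted as:
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="EXAMPLEDISTID",  # placeholder
#       InvalidationBatch=batch)
```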

MISS from Cloudfront after HIT from Cloudfront

I am switching to Amazon CloudFront for serving images on my website. To reduce load when we finally go live, I thought of warming up the cache by requesting the image URLs. (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of each object on all edge locations worldwide.)
The problem is that the script uses curl to request an image, and when I then access the same URL in a browser I get a Miss from CloudFront. So CloudFront is keeping two copies of the object for these two requests.
My current CloudFront configuration forwards the Content-Type request header to the origin.
How should I configure CloudFront so that it doesn't care about request headers at all, and once I have made a request (whether with curl or a browser) it serves all future requests for the same resource from the edge and not the origin?
Request/response headers:
I am afraid that the CloudFront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check the caching headers at the origin - https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive requests using the browser:
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now the same URL requested using curl gives a Miss:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Repeating the same curl request now gives a Hit:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==
This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
CloudFront edge servers cache the object separately depending on whether or not the request includes the "Accept-Encoding: gzip" header. Since browsers send this header by default, and your site is likely to be accessed mostly via browsers, I suggest changing your curl call to include this header.
I was facing the same problem. After making this change to my curl call, I started getting a Hit on my first request from the browser (after the curl call).
Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will be cached. If you download the file partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part gets cached, as a different object. The same goes for a curl call that is terminated partway through. I also tried wget with the --spider option, but that only makes a HEAD request and thus does not get the content cached on the edge server.
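Putting those observations together, a warm-up script might send a full GET with `Accept-Encoding: gzip` and read the whole body. The sketch below uses Python's standard library in place of curl. The function names are made up for this example, and the URL in the usage comment is a placeholder.

```python
import urllib.request


def build_warm_request(url):
    """Build a GET request that mimics a browser's Accept-Encoding header.

    No Range header is set, so the full object is requested.
    """
    return urllib.request.Request(
        url,
        headers={"Accept-Encoding": "gzip"},
        method="GET",
    )


def warm(url, timeout=30.0):
    """Fetch the full object and return the X-Cache response header, if any."""
    with urllib.request.urlopen(build_warm_request(url), timeout=timeout) as resp:
        # Read the body to the end so the transfer is not cut short.
        while resp.read(64 * 1024):
            pass
        return resp.headers.get("X-Cache")


# Example (requires network access; the URL is a placeholder):
#   print(warm("https://dxxxxxxxx.cloudfront.net/image/upload/example.jpg"))
```

Calling `warm` twice for the same URL and comparing the returned `X-Cache` values is a simple way to see whether the second request was served from the edge.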