I have a static website using S3, CloudFront and Route53. I had version 1 and have now updated to version 2.
If I view the website using the S3 endpoint (http://abc.s3-website-eu-west-1.amazonaws.com/) I see version 2.
If I use the CloudFront endpoint (https://xyz.cloudfront.net/) I see version 2 as well.
If I use the domain I have configured in Route53 (an A record pointing to the CloudFront distribution) I see version 1. I have not set up any TTL for the DNS records (default behavior), and this has been going on for ~1 week now.
Some extra checks:
dig A xyz.cloudfront.net and dig A mydomain.com point to the same IPs.
And the output of curl describes the previous situation (version 1), when mydomain.com was configured to point to www.example.com. Now I have it pointing directly to the CloudFront distribution:
curl -sD - https://example.com -o /dev/null
HTTP/1.1 302 Moved Temporarily
Content-Length: 0
Connection: keep-alive
Server: CloudFront
Location: https://www.example.com/
X-Cache: Miss from cloudfront
Via: 1.1 qwe.cloudfront.net (CloudFront)
...
curl -sD - https://xyz.cloudfront.net -o /dev/null
HTTP/2 200
content-type: text/html
content-length: 537
date: Fri, 18 Nov 2022 20:35:41 GMT
last-modified: Tue, 15 Nov 2022 10:09:03 GMT
etag: "..."
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 bla.cloudfront.net (CloudFront)
...
Is there anything else I could check to find out what is misconfigured?
The DNS server (Route53) and the TTL on those DNS records are entirely irrelevant here. Those have nothing to do with content caching.
Your curl commands show your requests are being routed to different CloudFront edge nodes (qwe.cloudfront.net and bla.cloudfront.net).
It appears that one of those nodes had served a request for your website before, so it still has a cached copy, which it keeps serving for new requests that hit it. The other node had nothing cached, so when your request hit it, it went back to the origin (S3) and pulled in the latest version.
This is pretty much the expected behavior of CloudFront, or any other CDN, when you publish new content on your origin server without notifying the CDN that it needs to clear the old cached content. You need to tell CloudFront to invalidate the cache, which will cause it to remove the cached version of your content from all edge locations.
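For example, you can create an invalidation with boto3 (a minimal sketch; the distribution ID is a placeholder, and invalidating /* clears the whole site):

import time
import boto3

cloudfront = boto3.client("cloudfront")

# "EXAMPLEID" is a placeholder for your distribution's ID.
# Invalidating /* removes every cached object for this distribution.
cloudfront.create_invalidation(
    DistributionId="EXAMPLEID",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # any string unique per request
    },
)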
Related
I'm trying to configure CloudFront to send an If-None-Match header to my custom origin when a resource expires, so that I can respond with a 304 if nothing has changed. For some reason, I'm unable to get CloudFront to do so.
My origin responds with these headers:
HTTP/2 200
content-type: application/json; charset=utf-8
content-length: 181691
access-control-allow-origin: *
cache-control: max-age=5
date: Fri, 17 Feb 2023 21:16:49 GMT
x-content-type-options: nosniff
x-frame-options: DENY
etag: W/"15-mbAPvGdFm9PuCZHJFTtrwm#3"
vary: Accept-Encoding
So, I'm sending a Cache-Control of 5 seconds and a weak ETag.
My CloudFront cache policy has a minimum TTL of 1, forwards the Origin header and a few x- ones, and forwards all query strings. No cookies. Compression is turned on.
My origin request policy is "AllViewer".
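For reference, expressed through the CloudFront API the cache policy described above would look roughly like this (a boto3 sketch; the policy name and the exact x- headers are hypothetical stand-ins):

import boto3

cf = boto3.client("cloudfront")

# "api-cache-policy" and "x-example-header" are hypothetical names.
cf.create_cache_policy(CachePolicyConfig={
    "Name": "api-cache-policy",
    "MinTTL": 1,
    "ParametersInCacheKeyAndForwardedToOrigin": {
        "EnableAcceptEncodingGzip": True,  # compression turned on
        "HeadersConfig": {
            "HeaderBehavior": "whitelist",
            "Headers": {"Quantity": 2, "Items": ["Origin", "x-example-header"]},
        },
        "CookiesConfig": {"CookieBehavior": "none"},
        "QueryStringsConfig": {"QueryStringBehavior": "all"},
    },
})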
The request goes to CloudFront, then through a classic AWS load balancer, and hits a Kubernetes pod that handles it and sends the response.
For some reason, CloudFront never sends an If-None-Match header to my origin when a resource expires. If I manually specify an If-None-Match header in a curl request to CloudFront, my origin does see it and responds correctly. No intermediate hop is removing the If-None-Match header, so it must be that CloudFront is not sending it in the first place.
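My manual check is roughly equivalent to this small Python script (the URL path and ETag value are placeholders):

import urllib.error
import urllib.request

# Placeholder URL and ETag; in the real test these come from my API.
req = urllib.request.Request(
    "https://xyz.cloudfront.net/api/data",
    headers={"If-None-Match": 'W/"some-etag"'},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.headers.get("ETag"))
except urllib.error.HTTPError as err:
    print(err.code)  # a 304 Not Modified lands here, since urllib raises for non-2xx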
Any ideas what could be wrong? I've been poring over the documentation but have not found anything that worked.
Thanks!
I have configured a redirect rule on my AWS Application Load Balancer to redirect all HTTP traffic to HTTPS.
The issue is that when I now curl the domain (or visit it in the browser), I get this ugly and redundant Location response (domain changed to example.com):
~ $ curl -I http://www.example.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Mon, 14 Sep 2020 18:28:48 GMT
Content-Type: text/html
Content-Length: 150
Connection: keep-alive
Location: https://www.example.com:443/
I know that https://www.example.com:443/ works just fine in practice, and I know that it will not be shown in the end user's browser's URL field. But it will still show up in the browser's network tab under 'Response headers', and to me it just looks unprofessional compared to a redirect without the port, e.g.:
~ $ curl -I http://www.apple.com
HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: https://www.apple.com/
Cache-Control: max-age=0
Expires: Mon, 14 Sep 2020 18:33:23 GMT
Date: Mon, 14 Sep 2020 18:33:23 GMT
Connection: keep-alive
strict-transport-security: max-age=31536000
Set-Cookie: geo=FI; path=/; domain=.apple.com
Set-Cookie: ccl=izHQtSPVGso4jrdTGqyAkA==; path=/; domain=.apple.com
It would seem like a logical thing to just drop the port from the URL, but unfortunately it's a required field in the rule editor. The 'Switch to full URL' option doesn't really help either: even though the port can be cleared there, it reappears after saving.
Is there any way to make this work?
Edit:
My domain is managed through AWS Route 53.
UPDATE: maybe it's worth mentioning that this is nice as an exercise and to show what's possible. However, between a Lambda redirect and a pure load-balancer redirect, I'd still suggest you go with the ALB-native one. It's more performant (no cold start, no extra hop) and less error-prone (no coding required), at the cost of mere aesthetics in a network trace or the developer tools. End users will never notice (you probably browse lots of such sites every day without knowing). So if you ask me, I'd suggest against going to production with something like this.
If you want something very simple as a redirect, you could just create a Target Group pointing to a Lambda function that returns a redirect. It doesn't require the overhead of managing a CloudFront distribution or trying to make it work behind a load balancer; that would require changes in your DNS as well, since you cannot just place CloudFront behind a load balancer (not that I know of, at least).
For your case, I created a Lambda function (Python 3.8) with the following code:
def lambda_handler(event, context):
    # Redirect plain-HTTP requests to HTTPS; anything else falls
    # through with an empty response.
    response = {}
    if event.get('headers', {}).get('x-forwarded-proto', '') == 'http':
        print(event)  # log the incoming event for debugging
        response["statusCode"] = 302
        response["headers"] = {"Location": f"https://{event['headers']['host']}{event['path']}"}
        response["body"] = ""
    return response
Then I created a new Target Group with that Lambda function as the backend, and finally configured my listener on port 80 to forward to that redirect Target Group.
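Roughly the same steps via the API, if you prefer boto3 to the console (the function ARN and listener ARN are hypothetical placeholders):

import boto3

elbv2 = boto3.client("elbv2")
lambda_client = boto3.client("lambda")

# Hypothetical Lambda ARN for illustration.
function_arn = "arn:aws:lambda:eu-west-1:123456789012:function:http-redirect"

# Create a Lambda-type target group and allow the ALB to invoke the function.
tg = elbv2.create_target_group(Name="http-redirect", TargetType="lambda")
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId="alb-invoke",
    Action="lambda:InvokeFunction",
    Principal="elasticloadbalancing.amazonaws.com",
    SourceArn=tg_arn,
)
elbv2.register_targets(TargetGroupArn=tg_arn, Targets=[{"Id": function_arn}])

# Point the existing port-80 listener at the new target group
# (the listener ARN is a hypothetical placeholder).
elbv2.modify_listener(
    ListenerArn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/my-alb/abc/def",
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)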
With this implementation, your 302 redirect shows up as you expected, without the port in the Location header.
It would still display in your browser's URL field without the port, if that is what you're worried about.
By default, clients assume that if the protocol is HTTPS then the port is 443, so you do not need to specify anything; when the redirect occurs, no port number is appended to the URL unless you specify a non-standard port such as 4430.
You have to specify the protocol (HTTPS) and port (443) simply because the configuration requires these fields; they will not be displayed to the user. This is the same as other AWS configurations, such as when configuring an ELB.
You could put CloudFront in front of your server and have it perform the HTTPS redirect. If you are only using the ELB for SSL termination, you could remove it entirely and have CloudFront terminate your SSL. This would save you some money, as an ELB is pretty expensive if you're only using it for one origin server.
https://aws.amazon.com/cloudfront/
You can use CloudFront and set the origin to the load balancer, then force HTTPS redirection on CloudFront by setting the Viewer Protocol Policy to Redirect HTTP to HTTPS.
In this approach, all the ELB rules will still be applied.
I've been working on a project and we recently switched from HTTP to HTTPS.
So here are a few things. We're hosting our website on S3 with static website hosting enabled. For the server, we created an instance through Elastic Beanstalk. Using ACM, we got a certificate and successfully attached it to our frontend through CloudFront and to our server through the Elastic Beanstalk load balancer.
Now our site is finally live and says secure! But for some reason, on the first load it sometimes takes upwards of 10 seconds to load the page and usually ends up timing out. This happens every couple of hours, or whenever we clear our cache. But as soon as we refresh, the site takes 2-3 seconds to load and everything works perfectly. We think this is because we're using Angular 5: ng build produces a large file called vendor.bundle.js that's around 13 MB.
We want to gzip this to hopefully solve the problem, because we think the website is timing out before that vendor file is even loaded. We went into CloudFront and enabled compression. We then went into our S3 bucket and added "Content-Length" as an allowed header. I looked up the request and response headers; they are as follows:
Request
:method: GET
:scheme: https
:authority: riftapp.io
:path: /vendor.bundle.js
Host: riftapp.io
If-None-Match: "de078424bf26a6b1e2873009a37924ee-2"
Accept: */*
Connection: keep-alive
Accept-Language: en-us
Accept-Encoding: br, gzip, deflate
Cookie: __stripe_mid=333bb2df-0b1a-4757-a193-0596baeabd7c;
__stripe_sid=bff74933-d3fd-4ee2-a7a8-e5e73a224fdb
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4)
AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1
Safari/605.1.15
If-Modified-Since: Wed, 25 Jul 2018 03:02:42 GMT
Referer: https://riftapp.io/home
Response
:status: 304
Via: 1.1 e3a844a9e0d478ce4d12c6d1f3a2d892.cloudfront.net (CloudFront)
ETag: "de078424bf26a6b1e2873009a37924ee-2"
Age: 3021
Date: Wed, 25 Jul 2018 03:55:47 GMT
Server: AmazonS3
x-amz-cf-id: JHS0o-e-tBh1wfTy4AgKHALzHEEvdbUOkO0mgzSc56LAmnx515WS9A==
x-cache: Hit from cloudfront
We think that maybe it's because the request's Host header is riftapp.io instead of abc.cloudfront.net or something, but when I tried to delete the origin, I couldn't, because our S3 bucket is linked to it.
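As for the gzip plan itself, pre-compressing the bundle at upload time (instead of relying only on CloudFront's on-the-fly compression) would look roughly like this (a boto3 sketch; the bucket name and file path are hypothetical):

import gzip
import boto3

s3 = boto3.client("s3")

# Compress the bundle locally, then upload it with Content-Encoding set
# so browsers know to decompress it on delivery.
with open("dist/vendor.bundle.js", "rb") as src:
    body = gzip.compress(src.read())

s3.put_object(
    Bucket="riftapp-frontend",  # hypothetical bucket name
    Key="vendor.bundle.js",
    Body=body,
    ContentType="application/javascript",
    ContentEncoding="gzip",
)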
I went through numerous articles and forums online on how to create a distribution, but all of them use S3 as the origin domain name.
I created a distribution using my Rails server as the origin domain name, e.g. assets.abcd.efgh.com. I can access the asset via assets.abcd.efgh.com/assets/abcdefghti-ieajife.css, but I cannot access it using the distribution domain name, 1234test.cloudfront.net/assets/abcdefghti-ieajife.css. I am getting the error:
Failed to contact the origin
The result I get using curl is:
curl -I -s -X GET -H "Origin: https://assets.abcd.efgh.com" 1234test.cloudfront.net/assets/abcdefghti-ieajife.css
HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Content-Length: 507
Connection: keep-alive
Server: CloudFront
Date: Tue, 25 Oct 2016 16:48:17 GMT
Expires: Tue, 25 Oct 2016 16:48:17 GMT
X-Cache: Error from cloudfront
Via: 1.1 8f18deab0e501ffbd2fa94cfd46e4785.cloudfront.net (CloudFront)
X-Amz-Cf-Id: PLAjGN5UuFEEFZSRYu_fGfsMDBcjH1w7Ruy1x1fv9bWiftWak3k1QA==
Can someone guide me on what other settings I need while creating the distribution, or what I am missing?
Found out that the origin needed to be updated to accept public requests. It was accepting private requests only.
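A quick way to confirm the fix is to request the asset both directly from the origin and through the distribution and compare status codes (a Python sketch; the hostnames and path are the placeholders from the question):

import urllib.error
import urllib.request

for url in (
    "https://assets.abcd.efgh.com/assets/abcdefghti-ieajife.css",
    "https://1234test.cloudfront.net/assets/abcdefghti-ieajife.css",
):
    try:
        with urllib.request.urlopen(url) as resp:
            print(url, resp.status)
    except urllib.error.HTTPError as err:
        print(url, err.code)  # a 503 here means CloudFront still cannot reach the origin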
I am switching to Amazon CloudFront for serving images on my website. To reduce load when we finally make it live, I thought of warming up the cache by hitting the image URLs in advance (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of the object on all edge locations worldwide).
The problem is that the script uses curl to request each image, and when I then access the same URL in a browser I get a MISS from CloudFront. So CloudFront is storing two copies of the object for these two requests.
My current CloudFront configuration forwards the Content-Type request header to the origin.
How should I configure CloudFront so that it doesn't care about request headers at all, and once I have made a request (whether via curl or a browser) it serves all future requests for the same resource from the edge and not the origin?
Request/response headers:
I am afraid the CloudFront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check out the caching headers at the origin - https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive requests using a browser:
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now the same URL requested using curl gives me a miss:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Now the same command gives a HIT:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==
This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
Depending on whether you provide the "Accept-Encoding: gzip" header or not, the CloudFront edge server caches the object separately. Since browsers provide this header by default, and your site is likely to be accessed mostly via browsers, I suggest changing your curl call to include this header.
I was facing the same problem; after adding the header to my curl call, the first browser request made after the curl call started getting a Hit.
Another thing I noticed is that CloudFront requires the full object to be downloaded before it will cache it. If you download the file partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part gets cached, as a different object. The same goes for a curl that is terminated midway. I also tried a wget call with the spider option, but that internally does only a HEAD request and thus does not get the content cached on the edge server.
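Putting both points together, a warming script should send the browser's Accept-Encoding and read each object to completion. A minimal sketch in Python (the URL is the placeholder from the question):

import urllib.request

URLS = [
    "https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg",
]

for url in URLS:
    # Send the same Accept-Encoding a browser would, so the cached copy
    # matches later browser requests.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        resp.read()  # read the full body; partial downloads are cached differently
        print(url, resp.headers.get("X-Cache"))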