We have a requirement to generate images on the fly and cache them with a CDN. For this we have configured a backend service behind a load balancer with Cloud CDN enabled. We are using an Nginx proxy server. We have added the headers specified in the Google Cloud CDN docs, but unfortunately the responses are not being cached.
Request:
GET /resize?size=l&url=https://example.com/image.jpeg HTTP/1.1
Host: resize.example.com
Request Headers:
Host: resize.example.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Response headers:
HTTP/1.1 200 OK
Server: nginx/1.17.2
Date: Wed, 15 Jan 2020 15:01:14 GMT
Content-Type: image/jpeg
Content-Length: 62771
cache-control: max-age=86400, public, s-maxage=86400
Via: 1.1 google
Here are a couple of documentation pages that could help you.
a) Not all HTTP responses are cacheable. Cloud CDN caches only those responses that meet all the requirements in this section. Some of these requirements are specified by RFC 7234, and others are specific to Cloud CDN.
Cacheability for HTTP responses
Responses aren't being cached--Troubleshooting
The following example demonstrates using curl to check the HTTP response headers for http://example.com/style.css:
$ curl -s -D - -o /dev/null http://example.com/style.css
HTTP/1.1 200 OK
Date: Tue, 16 Feb 2016 12:00:00 GMT
Content-Type: text/css
Content-Length: 1977
Via: 1.1 google
Since you have already added the response headers, though, you may have read these pages already.
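As a rough aid when reading those pages, the sketch below checks a set of response headers against a handful of the cacheability requirements. The rule list is an assumption paraphrased from the documentation (it is neither exhaustive nor guaranteed to be current), and the function names `parse_cache_control` and `cache_blockers` are made up for this example.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip() or None
    return directives


def cache_blockers(headers):
    """Return reasons a response might not be cached (empty list = none found).

    Assumption: these rules paraphrase only part of the documented
    requirements; treat the result as a hint, not a verdict.
    """
    h = {name.lower(): value for name, value in headers.items()}
    cc = parse_cache_control(h.get("cache-control", ""))
    reasons = []
    if "public" not in cc:
        reasons.append("missing Cache-Control: public")
    if not ({"max-age", "s-maxage"} & cc.keys()) and "expires" not in h:
        reasons.append("no max-age, s-maxage, or Expires")
    for directive in ("private", "no-store", "no-cache"):
        if directive in cc:
            reasons.append("Cache-Control contains " + directive)
    if "set-cookie" in h:
        reasons.append("response sets a cookie")
    if not {"content-length", "content-range", "transfer-encoding"} & h.keys():
        reasons.append("no Content-Length, Content-Range, or Transfer-Encoding")
    return reasons


# Headers copied from the curl example in this answer (style.css).
example = {"Content-Type": "text/css", "Content-Length": "1977", "Via": "1.1 google"}
for reason in cache_blockers(example):
    print(reason)
```

An empty result only means that none of these particular checks failed; the remaining requirements in the docs (request method, status code, `Vary`, object size and so on) still need to be checked by hand.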
I am developing a simple automation tool using a Go Fiber HTTP server to start and stop AWS instances using the Go SDK v1.44.156.
The service listens to an endpoint at /csm/aws/:region/:instance_id/powerOn.
My code works well when I send requests from Postman. When I send requests using the Go HTTP client, AWS returns the following error:
AuthFailure: AWS was not able to validate the provided access credentials
The Postman request that works fine:
2022/12/23 16:26:12 Request came :#0000000100000003 - 127.0.0.1:7000 <-> 127.0.0.1:34976 - POST http://127.0.0.1:7000/csm/aws/us-east-1/i-0f9c5fe6b5c7b0a87/powerOn
Params: map[instance_id:i-0f9c5fe6b5c7b0a87 region:us-east-1]
Request: POST /csm/aws/us-east-1/i-0f9c5fe6b5c7b0a87/powerOn HTTP/1.1
User-Agent: PostmanRuntime/7.30.0
Host: 127.0.0.1:7000
Content-Type: application/json
Content-Length: 136
Accept: */*
Postman-Token: e27b899f-5125-497a-b154-61cd3214cd74
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
{"aws_access_key_id":"my-id","aws_secret_access_key":"my-key","account":"","region":""}
The Go request which returns the error:
2022/12/23 16:22:02 Request came :#0000000200000002 - 127.0.0.1:7000 <-> 127.0.0.1:34278 - POST http://127.0.0.1:7000/csm/aws/us-east-1/i-0f9c5fe6b5c7b0a87/powerOn
Params: map[instance_id:i-0f9c5fe6b5c7b0a87 region:us-east-1]
Request: POST /csm/aws/us-east-1/i-0f9c5fe6b5c7b0a87/powerOn HTTP/1.1
User-Agent: Go-http-client/1.1
Host: 127.0.0.1:7000
Content-Type: application/json
Content-Length: 136
Accept-Encoding: gzip
{"aws_access_key_id":"my-id","aws_secret_access_key":"my-key","account":"","region":""}
I searched the web for this error message. It seems it can be caused by the PC's clock being wrong, so I set my computer's time to automatic, but I still see the same error.
My code was working a few days ago.
I've been working on a project and we recently switched from HTTP to HTTPS.
So here are a few things. We're hosting our website on S3 with the static website enabled. For the server, we created an instance through Elastic Beanstalk. Using ACM, we have a certificate and successfully attached it to our frontend through Cloudfront and our server through Elastic Beanstalk Load Balancer.
Now our site is finally live and says secure! For some reason, on the first load it sometimes takes upwards of 10 seconds to load the page and usually ends up timing out. This happens every couple of hours or whenever we clear our cache. But as soon as we refresh, the site takes 2-3 seconds to load and everything works perfectly. We think that this is because we're using Angular 5 and, using ng build, we get a large file called vendor.bundle.js that's around 13 MB.
We want to gzip this to hopefully solve the problem, because we think the website is timing out before that vendor file is even loaded. We went into Cloudfront and enabled compression. We then went into our S3 bucket and added "Content-Length" as an allowed header. I looked up the request and response, and they are as follows:
Request
:method: GET
:scheme: https
:authority: riftapp.io
:path: /vendor.bundle.js
Host: riftapp.io
If-None-Match: "de078424bf26a6b1e2873009a37924ee-2"
Accept: */*
Connection: keep-alive
Accept-Language: en-us
Accept-Encoding: br, gzip, deflate
Cookie: __stripe_mid=333bb2df-0b1a-4757-a193-0596baeabd7c; __stripe_sid=bff74933-d3fd-4ee2-a7a8-e5e73a224fdb
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15
If-Modified-Since: Wed, 25 Jul 2018 03:02:42 GMT
Referer: https://riftapp.io/home
Response
:status: 304
Via: 1.1 e3a844a9e0d478ce4d12c6d1f3a2d892.cloudfront.net (CloudFront)
ETag: "de078424bf26a6b1e2873009a37924ee-2"
Age: 3021
Date: Wed, 25 Jul 2018 03:55:47 GMT
Server: AmazonS3
x-amz-cf-id: JHS0o-e-tBh1wfTy4AgKHALzHEEvdbUOkO0mgzSc56LAmnx515WS9A==
x-cache: Hit from cloudfront
We think that maybe it's because the host in the request is riftapp.io instead of abc.cloudfront.net or something, but when I tried to delete the origin, I couldn't because our S3 bucket is linked to it.
I have recently set Cloudfront to cache files (images, CSS and JavaScript).
Sometimes I have to clear the browser cache manually to get the new file.
Caching files with Cloudfront is really tough. A week ago I tried enabling gzip compression on Cloudfront and noticed that the updated file was not shown on my website until I added it to the "Invalidation" table.
(I cleared cache from my browser and this didn't help either btw)
Is there any "smart" way for caching on Amazon Cloudfront?
Something like: "if the file has been updated - send the new file (as gzip of course). "
Here are my headers:
Request URL:https://abc.cloudfront.net/live/static/rcss/bootstrap3.min.css
Request Method:GET
Status Code:200 OK
Remote Address:77.77.77.77:443
Referrer Policy:no-referrer-when-downgrade
Response Headers
HTTP/1.1 200 OK
Content-Type: text/css
Connection: keep-alive
Date: Sun, 07 May 2017 08:14:04 GMT
Last-Modified: Wed, 26 Apr 2017 08:43:05 GMT
Server: AmazonS3
Content-Encoding: gzip
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 abc.cloudfront.net (CloudFront)
X-Amz-Cf-Id: abc-iuxzccRYcxce8LjxzcDeYdasdehHqFJGj80iczxcz8Q4asdDpWg==
Request Headers
Accept:text/css,*/*;q=0.1
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Host:abc.cloudfront.net
Pragma:no-cache
Referer:https://example.com/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
If there is anything else I can add to my post - please do ask.
I have a PHP app on AWS Elastic Beanstalk, and I have created an assets bucket on S3. I'm trying to set up a Cloudfront distribution with a behavior that sends requests for assets/* to S3 and a default behavior that sends requests to EB. The domain points to Cloudfront.
All requests are going to EB, which returns a 404 since there is no assets directory in the EB environment.
I have created 2 Cloudfront origins, one for EB and one for the S3 bucket. This is what my behaviors look like:
Precedence  Path Pattern  Origin                                         Protocol Policy  Fwd Query Strings
0           assets/*      S3-example-bucket                              HTTP and HTTPS   No
1           Default (*)   Custom-example.us-east-1.elasticbeanstalk.com  HTTP and HTTPS   Yes
It seems as though this should be pretty straightforward, so I assume I'm missing something basic. Any help is greatly appreciated.
Edit:
Request header:
GET /assets/images/10waysaudiobook.png HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wordpress_logged_in_8a27500b7747be1e4fbad7f473f238e5=snickerspixy%7C1466021823%7Cr7rE5moINjanjHEqb1TGbsSkn9F7OCZLfX69IbcnGJu%7C28fc452885f3fe6e954243abab585a188f6511cdd6eeec6fa5ec5c50b9f3d393; wp-settings-7674=m4%3Do%26m5%3Do%26m9%3Do%26m6%3Do%26editor%3Dhtml%26m10%3Do%26m0%3Do%26m3%3Do%26hidetb%3D1%26m2%3Dc%26m1%3Do%26m8%3Do%26m12%3Do%26m7%3Do%26m11%3Do%26urlbutton%3Dnone%26m13%3Do%26tml1%3D1%26imgsize%3Dfull%26align%3Dcenter%26libraryContent%3Dbrowse%26ed_size%3D569%26unfold%3D1%26wplink%3D1%26mfold%3Do%26post_dfw%3Doff%26advImgDetails%3Dshow%26posts_list_mode%3Dlist; wp-settings-time-7674=1464816549; AWSELB=1FCB85F51606EBAFF15FEADB01C8069AEDE17E2A043407E615EF1A0E1ABF24607545A45D3DC206631F7AAE4503ADA423788B5E6B5B48FAE93EE916DE068509E64F92AC10FF; PHPSESSID=cpi2su7s967phu87rlpjgneel6; wordpress_test_cookie=WP+Cookie+check
Connection: keep-alive
Response header:
HTTP/1.1 404 Not Found
Cache-Control: no-cache, must-revalidate, max-age=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 05 Jun 2016 00:54:23 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Link: <http://example.com/wp-json/>; rel="https://api.w.org/"
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Connection: keep-alive
The response headers indicate that this request wasn't served by CloudFront at all, because there are headers that should be present... but are absent.
CloudFront adds Via:, X-Cache:, and x-amz-cf-id: headers to every response, and sometimes Age: (on cache hits and errors) or Vary: if you're forwarding the CloudFront-Is-*-Viewer: headers to the origin.
The absence of these headers suggests that the DNS for the site hasn't been pointed to CloudFront and may still be pointing directly to the EB environment, or, if the change was recent, that the former TTL for the DNS entry may not yet have expired.
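The check described above can be sketched in a few lines of Python. The helper name `missing_cloudfront_headers` is made up for this example, and it only looks for the three headers named above (`Via`, `X-Cache` and `x-amz-cf-id`); it is an illustration of the reasoning, not an official detection method.

```python
# Headers that, per the explanation above, CloudFront adds to every response.
CLOUDFRONT_MARKERS = ("via", "x-cache", "x-amz-cf-id")


def missing_cloudfront_headers(headers):
    """Return the marker headers absent from a response (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return [marker for marker in CLOUDFRONT_MARKERS if marker not in present]


# Response headers copied from the 404 in the question.
response_404 = {
    "Cache-Control": "no-cache, must-revalidate, max-age=0",
    "Content-Type": "text/html; charset=UTF-8",
    "Pragma": "no-cache",
    "Server": "Apache",
    "Transfer-Encoding": "chunked",
    "Connection": "keep-alive",
}
print(missing_cloudfront_headers(response_404))
```

If all three names come back as missing, as they do for the response in the question, the request most likely never passed through CloudFront.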
I am switching to Amazon Cloudfront for serving images on my website. To reduce load when we finally make it live, I thought of warming up the cache by hitting the image URLs (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of each object on all edge locations worldwide).
The problem is that the script uses curl to request an image, and when I access the same URL in the browser I get a MISS from Cloudfront. So Cloudfront is keeping two copies of the object for these two requests.
My current Cloudfront configuration forwards the Content-Type request header to the origin.
How should I configure Cloudfront so that it doesn't care about request headers at all, and once I have made a request (whether with curl or a browser) it serves all future requests for the same resource from the edge and not the origin?
Request/response headers:
I am afraid that the Cloudfront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check out the caching headers at the origin - https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive requests using the browser:
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now the same URL requested using curl gave me a miss:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Repeating the same curl request now gives a hit:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==
This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
Depending on whether or not you send the "Accept-Encoding: gzip" header, the CloudFront edge server caches the object separately. Since browsers send this header by default, and your site is likely to be accessed mostly via browsers, I suggest changing your curl call to include this header.
I was facing the same problem. After making this change to my curl call, I started to get a hit on my first request from the browser (after making the curl call).
Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will be cached. If you download the file partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part gets cached, as a different object. The same goes for a curl call that is terminated midway. I also tried wget with the spider option, but that internally makes only a HEAD request and thus does not get the content cached on the edge server.
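If the warm-up script is written in Python rather than curl, the two observations above (send "Accept-Encoding: gzip", and download the whole object with a plain GET) could be sketched as follows. The function names and the timeout are made up for this example; it uses only the standard library and has not been run against a real distribution.

```python
import urllib.request


def build_warmup_request(url):
    """Build a full GET (no Range header) that sends Accept-Encoding: gzip like a browser."""
    return urllib.request.Request(
        url,
        headers={"Accept-Encoding": "gzip"},
        method="GET",
    )


def warm(url, timeout=30):
    """Fetch the whole object and return (X-Cache header value, bytes read).

    The body is read to the end because, per the observation above, a partial
    or aborted download does not leave the full object in the edge cache.
    """
    with urllib.request.urlopen(build_warmup_request(url), timeout=timeout) as response:
        body = response.read()  # raw (possibly gzip-compressed) bytes; not decoded here
        return response.headers.get("X-Cache"), len(body)
```

Calling `warm(url)` twice for the same image should, if the explanation above holds, report a miss followed by a hit in the returned `X-Cache` value.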