MISS from Cloudfront after HIT from Cloudfront - amazon-web-services

I am switching to Amazon Cloudfront for serving images on my website. To reduce load when we finally make it live, I thought of warming up the cache by hitting image URLs (I am making these request from India and expect majority of users to request from the same region so no need to have a copy of object on all edge locations worldwide).
The problem is that script uses curl to request image and when I access the same URL in browser I get MISS from Cloudfront. So Cloudfront is making two copies of object for these two request.
My current Cloudfront configuration forwards Content-Type request Header to origin.
How should I configure Cloudfront so that it doesn't care about request headers at all and once I made a request (whether curl or using browser) it should serve all future request for same resource from edge and not origin.
Request/Response headers-
I am afraid that the Cloudfront url won't be accessible from outside (until we go live) but I am posting request/response headers, this should give you fair idea. Also you can check out caching headers at origin - https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive request using browser
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
view source
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now same url requested using curl but gave me miss
curl manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Now this will give HIT
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==

This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
Depending on whether you provide the "Accept-Encoding: gzip" header or not, CloudFront edge server caches the object separately. Since browsers provides this header by default, and your site is likely to be accessed majorly via browser, I will suggest changing your curl call to include this header.
I was facing the same problem, after making the change in my curl call, I started to get a Hit from the browser on my first try via browser (after making a curl call).
Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will be cached. If you try to download the file partially by specifying the byte range in the curl, the intended object does not get cached, only the downloaded part gets cached as a different object. Same goes for a curl that was terminated in between. The other options I tried were wget call with spider option, but that internally does a HEAD call only and thus does not get the content cached on the edge server.

Related

Cloudfront not sending If-None-Match to my origin

I'm trying to configure Cloudfront to send an If-None-Match to my custom origin when a resource expires, so that I can respond with a 304 if nothing has changed. For some reason, I'm unable to get Cloudfront to do so.
My origin responds with these headers:
HTTP/2 200
content-type: application/json; charset=utf-8
content-length: 181691
access-control-allow-origin: *
cache-control: max-age=5
date: Fri, 17 Feb 2023 21:16:49 GMT
x-content-type-options: nosniff
x-frame-options: DENY
etag: W/"15-mbAPvGdFm9PuCZHJFTtrwm#3"
vary: Accept-Encoding
So, sending cache-control of 5 seconds and a weak e-tag.
My cloudfront cache policy has min ttl of 1, forwards headers Origin and a few x- ones, forwards all query strings. No cookies. Compression is turned on.
My origin request policy is "AllViewer".
The request is traveling to Cloudfront, which goes through a classic AWS load balancer, which hits a kubernetes pod that handles and sends the response.
For some reason, Cloudfront never sends an If-None-Match header to my origin when resource expires. If I manually specify an If-None-Match header in my request in a curl command to Cloudfront, my origin does see it and responds correctly. No no intermediate hop is removing the If-None-Match header, so it must be that Cloudfront is not sending it in the first place.
Any ideas what could be wrong? I've been pouring over the documentations but have not found anything that worked.
Thanks!

CloudFront responds with 403 Forbidden instead of triggering Lambda

Here is a schema of CDN to resize images and serve them via AWS CloudFront:
If an image is not found in the S3 bucket, it issues a 307 Temporary Redirect (instead of 404) to access Lambda via API Gateway. Lambda resizes the image (based on the original from the S3 bucket) and uploads it into the S3 bucket. The browser gets once again permanently redirected to the S3 bucket with the newly generated image.
When I want to access the same image via CloudFront, I am receiving a 403 Forbidden error. It comes either from the S3 or CloudFront. As the status indicates, this may have something to do with access rights.
Why does adding CloudFront into the working request chain cause the 403 error?
What works:
https://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 307 Temporary Redirect
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:12 GMT
Location: https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
Content-Length: 0
Server: AmazonS3
https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
HTTP/1.1 301 Moved Permanently
Content-Type: application/json
Content-Length: 0
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:37:16 GMT
x-amzn-RequestId: xxxx
location: http://{bucket}.s3-website-eu-west-1.amazonaws.com/100x100/image.jpg
X-Amzn-Trace-Id: xxxx
X-Cache: Miss from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
http://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 200 OK
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:18 GMT
Last-Modified: Sat, 19 Aug 2017 15:37:17 GMT
x-amz-version-id: null
ETag: xxxx
Content-Type: image/png
Content-Length: 20495
Server: AmazonS3
What doesn't work:
https://{distribution}.cloudfront.net/100x100/image.jpg
HTTP/1.1 403 Forbidden
Content-Type: application/xml
Transfer-Encoding: chunked
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:38:24 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
I've added the S3 bucket as origin into CloudFront
The error was caused by using a REST endpoint (e.g. s3.amazonaws.com) for website-like functionality (redirects, html error messages, and index documents). These features are only provided by the web site endpoints (e.g. bucketname.s3-website-us-east-1.amazonaws.com).
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
It confused me because the REST endpoint was offered via autocomplete in the console, when creating the CloudFront distribution. The correct endpoint has to be entered manually.
CloudFront also caches 40x 50x status codes coming from S3 (doc.: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html#HTTPStatusCodes-cached-errors ).
You should invalidate the Cloudfront cache for the resized img path. You can do it by calling the CreateInvalidation API from your Lambda function.
Doc:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html#invalidating-objects-api

AWS Cloudfront behaviors not working as expected

I have a PHP app on AWS Elastic Beanstalk, I have created an assets bucket on S3. I'm trying to setup a Cloudfront distribution with behaviors to send requests for assets/* to S3 with a default behavior to send requests to EB. The domain points to Cloudfront.
All requests are going to EB which returns a 404 since there is no assets diretory in the EB environment.
I have created 2 Cloudfront origins, one for EB and one for the S3 bucket. This is what my behaviors look like:
Precedence Path Pattern Origin Protocol Policy Fwd Query Strings
0 assets/* S3-example-bucket HTTP and HTTPS No
1 Default (*) Custom-example.us-east-1.elasticbeanstalk.com HTTP and HTTPS Yes
It seems as though this should be pretty straight forward so I assume I'm missing something basic. Any help is greatly appreciated.
Edit:
Request header:
GET /assets/images/10waysaudiobook.png HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wordpress_logged_in_8a27500b7747be1e4fbad7f473f238e5=snickerspixy%7C1466021823%7Cr7rE5moINjanjHEqb1TGbsSkn9F7OCZLfX69IbcnGJu%7C28fc452885f3fe6e954243abab585a188f6511cdd6eeec6fa5ec5c50b9f3d393; wp-settings-7674=m4%3Do%26m5%3Do%26m9%3Do%26m6%3Do%26editor%3Dhtml%26m10%3Do%26m0%3Do%26m3%3Do%26hidetb%3D1%26m2%3Dc%26m1%3Do%26m8%3Do%26m12%3Do%26m7%3Do%26m11%3Do%26urlbutton%3Dnone%26m13%3Do%26tml1%3D1%26imgsize%3Dfull%26align%3Dcenter%26libraryContent%3Dbrowse%26ed_size%3D569%26unfold%3D1%26wplink%3D1%26mfold%3Do%26post_dfw%3Doff%26advImgDetails%3Dshow%26posts_list_mode%3Dlist; wp-settings-time-7674=1464816549; AWSELB=1FCB85F51606EBAFF15FEADB01C8069AEDE17E2A043407E615EF1A0E1ABF24607545A45D3DC206631F7AAE4503ADA423788B5E6B5B48FAE93EE916DE068509E64F92AC10FF; PHPSESSID=cpi2su7s967phu87rlpjgneel6; wordpress_test_cookie=WP+Cookie+check
Connection: keep-alive
Response header:
HTTP/1.1 404 Not Found
Cache-Control: no-cache, must-revalidate, max-age=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 05 Jun 2016 00:54:23 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Link: <http://example.com/wp-json/>; rel="https://api.w.org/"
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Connection: keep-alive
The response headers indicate that this request wasn't served by CloudFront at all, because there are headers that should be present... but are absent.
CloudFront adds Via:, X-Cache:, and x-amz-cf-id: headers to every response, and sometimes Age: (on cache hits and errors) or Vary: if you're forwarding the CloudFront-Is-*-Viewer: headers to the origin.
The absence of these headers suggest that the DNS for the site hasn't been pointed to CloudFront and may still be pointing directly to the EB environment, or if the change was recent, that the former TTL for the DNS entry may not yet have expired.

How to find out where a cookie is set?

I am trying to find out where a cookie is being set.
I am running Varnish cache and want to know where the cookie is being set so I know if I can safely remove it for caching purposes.
The response headers look like this;
HTTP/1.1 200 OK
Server: Apache/2.2.17 (Ubuntu)
Expires: Mon, 05 Dec 2011 15:11:39 GMT
Cache-Control: no-store, max-age=7200
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
X-Session: NO
X-Cacheable: YES
Date: Tue, 04 Dec 2012 15:29:40 GMT
X-Varnish: 1233768756 1233766580
Age: 1081
Via: 1.1 varnish
Connection: keep-alive
X-Cache: HIT
There is no cookie present. But when loading the same page in a browser the headers are the same, I get a cache hit and no cookie in the response headers.
But then the cookie is there all of a sudden, so it must be being somewhere. Even if I remove it it reappears. It even appears in Incognito mode in Chrome. But it is not in the header response.
I have been through all the javascript on the site and cannot find anything, is there any other way of setting a cookie?
Thanks.
If the Set-Cookie header goes through Varnish at some point, you can use varnishlog to find the request URL:
$ varnishlog -b -m 'RxHeader:Set-Cookie.*COOKIENAME'
This will give you a full varnishlog listing for the backend requests, including the TxURL to the backend which tells you what the client asked for when it got Set-Cookie back.

Is it valid to respond to an HTTP 1.1 request with an HTTP 1.0 response?

I am setting up video delivery for video files to tv set-top boxes.
I want to use Amazon Cloudfront.
The video files are requested as usual http requets that may contain a range header to request partial resources (to enable the user on the box to jump into any position within the video).
My problem is that it is working on 2 of 3 boxes, one makes problems.
The request looks like this (sample data):
GET /path/file.mp4 HTTP/1.1
User-Agent: My User Agent
Host:myhost.com
Accept:*/*
Range: bytes=100-200
So if I do a request to cloudfront using telnet I see that the response is HTTP 1.0:
joe#flimmit-joe:~$ telnet d2zf9fl0izzsf6.cloudfront.net 80
Trying 216.137.61.164...
Connected to d2zf9fl0izzsf6.cloudfront.net.
Escape character is '^]'.
GET /skin/frontend/default/flimmit/images/headerbanners/02_green.png HTTP/1.1
User-Agent: My User Agent
Host:d2zf9fl0izzsf6.cloudfront.net
Accept:*/*
Range: bytes=100-200
HTTP/1.0 206 Partial Content
Date: Sun, 12 Feb 2012 18:42:15 GMT
Server: Apache/2.2.16 (Ubuntu)
Last-Modified: Tue, 26 Jul 2011 10:37:54 GMT
ETag: "1e0b8a-2d2b-4a8f6863ac11a"
Accept-Ranges: bytes
Cache-Control: max-age=2592000
Expires: Tue, 13 Mar 2012 18:42:15 GMT
Content-Type: image/png
Age: 351213
Content-Range: bytes 100-200/11563
Content-Length: 101
X-Cache: Hit from cloudfront
X-Amz-Cf-Id: W2fzPeBSWb8_Ha_UzvIepZH-Z9xibXyRddoHslJZ3TDXyFfjwE3UMQ==,CwiKc8-JGfE77KBVTTOyE9g-OYf7P-bCJZEWGwef9Es5rzhUBYKE8A==
Via: 1.0 972e3ba2f91fd0a38ea062d0cc03be37.cloudfront.net (CloudFront)
Connection: close
q�]#��ĥM�oӘ�i��i��������Y�.��/��ib���&
���
�Ⱦ�00�>�����Y`��X���r���s�=�n�s�b���7MConnection closed by foreign host.
joe#flimmit-joe:~$
Unfortunately I have only limited access to the box for testing purposes.
However this behavior by cloud-front seems strange to me so I wanted to ask whether it is even valid.
It is absolutely "valid" to answer with Http 1.0 to an Http 1.1 request.
I'll cite the Appendix 19.6 to the RFC2068 "It is beyond the scope of a protocol specification to mandate compliance with previous versions. HTTP/1.1 was deliberately designed, however, to make supporting previous versions easy."
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.6
The important part is basically that the RFC does not force an Http 1.1 answer, so it's up to the server.