According to Amazon's documentation it is possible to serve compressed files via CloudFront/S3 if I upload a compressed and an uncompressed version of the same file. Both files need to have the same content type; the compressed one additionally needs Content-Encoding set to "gzip".
So now I have two files on S3:
https://s3-eu-west-1.amazonaws.com/kiga-client/gzip/client/config.js
https://s3-eu-west-1.amazonaws.com/kiga-client/gzip/client/config.js.gz
On my website I create a link to CloudFront which points to config.js at
https://d1v5g5yve3hx29.cloudfront.net/gzip/client/config.js
I would now expect to automatically get the compressed file when the client sends Accept-Encoding "gzip", e.g. via:
curl -I -H 'Accept-Encoding: gzip,deflate' https://d1v5g5yve3hx29.cloudfront.net/gzip/client/config.js
Unfortunately the uncompressed file is returned:
HTTP/1.1 200 OK
Content-Type: application/x-javascript
Content-Length: 3509
Connection: keep-alive
Date: Wed, 26 Nov 2014 11:12:43 GMT
Cache-Control: max-age=31536000
Last-Modified: Wed, 26 Nov 2014 10:50:15 GMT
ETag: "c310121403754f4faab782504912c15c"
Accept-Ranges: bytes
Server: AmazonS3
Age: 2405
X-Cache: Hit from cloudfront
Via: 1.1 8a256bddd45845f932a0a374e95fa057.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 4HRqstvYGYD1A-vfvltNrXGffg0D5XbFjSpoWReI5UNYf-2jQfE8jQ==
The response header Content-Encoding: gzip should be set but is missing.
To serve compressed files you need to actually request the compressed file's URL from CloudFront. See point 5 here: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html#CompressedS3
To be precise, one actually has to compress the files manually and then upload them to S3 with the appropriate metadata.
Furthermore, one must keep the original filename, even though the file is compressed.
So given a file image.jpg which gets compressed to image.jpg.gz, one has to upload image.jpg.gz and rename it to image.jpg.
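For illustration, here is a minimal sketch (using boto3; the bucket and key are assumptions based on the URLs above) of compressing config.js locally and uploading it under its original key with the metadata described above:

import gzip

import boto3  # AWS SDK for Python; bucket and key below are assumptions based on the URLs above

s3 = boto3.client("s3")

# Compress config.js locally; the object keeps its original name on S3.
with open("config.js", "rb") as src:
    body = gzip.compress(src.read())

s3.put_object(
    Bucket="kiga-client",                  # assumed bucket name
    Key="gzip/client/config.js",           # original filename, gzip-compressed body
    Body=body,
    ContentType="application/javascript",
    ContentEncoding="gzip",                # tells clients/CloudFront the body is gzipped
    CacheControl="max-age=31536000",
)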
We are experiencing upload errors to BigQuery / Cloud Storage:
REQUEST
POST https://www.googleapis.com/upload/bigquery/v2/projects/XXX HTTP/1.1
Content-Type: multipart/related; boundary="PART_TAG_DATA_IMPORTER"
Host: www.googleapis.com
Content-Length: 652
--PART_TAG_DATA_IMPORTER
Content-Type: application/json; charset=UTF-8
{"configuration":{"load":{"createDisposition":"CREATE_IF_NEEDED","destinationTable":{"datasetId":"XX","projectId":"XX","tableId":"XX"},"schema":{"fields":[{"mode":"required","name":"xx1","type":"INTEGER"},{"mode":"required","name":"xx2","type":"STRING"},{"mode":"required","name":"xx3","type":"INTEGER"}]},"skipLeadingRows":1,"sourceFormat":"CSV","sourceUris":["gs://XXX/9f41d369-b63e-4858-9108-7d1243175955.csv"],"writeDisposition":"WRITE_TRUNCATE"}}}
--PART_TAG_DATA_IMPORTER--
RESPONSE:
HTTP/1.1 400 Bad Request
X-GUploader-UploadID: XXX
Content-Length: 77
Date: Fri, 15 Nov 2019 10:23:33 GMT
Server: UploadServer
Content-Type: text/html; charset=UTF-8
Alt-Svc: quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000
Payload parts count different from expected 2. Request payload parts count: 1
Is anyone else receiving this? Everything worked fine until last night. There were no changes in our codebase, and the error happens in about 80% of the cases, but after 5-6 attempts it (sometimes) goes through.
We are using .NET and have the latest Google.Apis libraries, but this is reproducible with a simple request to the server. It also sometimes goes through normally.
Google has added a check to the /upload/bigquery/v2/projects/{projectId}/jobs endpoint: it no longer accepts a single-part message.
/bigquery/v2/projects/{projectId}/jobs needs to be used instead when loading from GCS, as per the documentation here (which does not say this explicitly):
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert
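For comparison, this is roughly what a load job from GCS looks like with the Python BigQuery client, which goes through the plain jobs endpoint rather than the multipart upload endpoint (the project, dataset, table, and bucket names below are the placeholders from the request above; Python is used only for illustration, not the asker's .NET code):

from google.cloud import bigquery  # Python client, shown only for illustration

client = bigquery.Client(project="XXX")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    schema=[
        bigquery.SchemaField("xx1", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("xx2", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("xx3", "INTEGER", mode="REQUIRED"),
    ],
)

# load_table_from_uri creates a plain load job (jobs.insert); no multipart body is sent
# because the data already lives in Cloud Storage.
load_job = client.load_table_from_uri(
    "gs://XXX/9f41d369-b63e-4858-9108-7d1243175955.csv",
    "XXX.XX.XX",  # placeholder project.dataset.table
    job_config=job_config,
)
load_job.result()  # wait for completion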
This looks quite odd. It appears you're using the inline upload endpoint but you're passing a reference to a GCS object in the load config, and not sending an inline upload.
Could you share a snippet of how you're constructing this from the .NET code?
I have just created a custom error page, but for some reason I cannot set several headers on the file(s).
Currently my headers look like this:
X-Firefox-Spdy h2
accept-ranges bytes
age 432
content-length 1931
content-type text/html
date Fri, 06 Oct 2017 10:55:47 GMT
etag "6fc24050256bab8cec351de1c6c74a4f"
last-modified Fri, 06 Oct 2017 10:55:33 GMT
server AmazonS3
via 1.1 a57f85bbf89c6dasdasdasddcddasd9687e0.cloudfront.net (CloudFront)
x-amz-cf-id JZAiF7gZnnUVrorerfasusQu84gQVGwV0UU4h3mjaw4E-CKL2_Xm6zOg==
x-cache Error from cloudfront
but should really look like this:
X-Firefox-Spdy h2
age 1512
content-encoding gzip
content-type text/html
date Fri, 14 Jul 2017 06:42:03 GMT
last-modified Sat, 17 Jan 2015 17:35:49 GMT
server AmazonS3
vary Accept-Encoding
via 1.1 a57f85bbf89c6dasdasdasddcddasd9687e0.cloudfront.net (CloudFront)
x-amz-cf-id JZAiF7gZnnUVrorerfasusQu84gQVGwV0UU4h3mjaw4E-CKL2_Xm6zOg==
x-cache Error from cloudfront
There is an option in the metadata to enter Content-Encoding, but when I enter gzip I keep getting an error and the page is not displayed. In addition, the Accept-Encoding header cannot be set, and when I try to delete the accept-ranges header it keeps coming back again and again.
What should I do, or not do, to make this right?
Here is the documentation on how to set up S3 or custom origins to serve compressed content through CloudFront:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
You do not need to compress the file and store it in S3; CloudFront will handle that for you automatically.
Hope it helps.
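As a quick check, this small boto3 sketch (the distribution ID is a placeholder) prints whether automatic compression is enabled on the distribution's default cache behavior:

import boto3  # AWS SDK for Python; the distribution ID is a placeholder

cloudfront = boto3.client("cloudfront")

resp = cloudfront.get_distribution_config(Id="E123EXAMPLE")
behavior = resp["DistributionConfig"]["DefaultCacheBehavior"]

# "Compress objects automatically" in the console corresponds to this flag.
print("Compress enabled:", behavior["Compress"])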
Here is the scheme of a CDN that resizes images and serves them via AWS CloudFront:
If an image is not found in the S3 bucket, S3 issues a 307 Temporary Redirect (instead of a 404) to a Lambda function behind API Gateway. Lambda resizes the image (based on the original from the S3 bucket) and uploads it into the S3 bucket. The browser is then redirected (301) back to the S3 bucket, where the newly generated image now exists.
When I want to access the same image via CloudFront, I receive a 403 Forbidden error. It comes either from S3 or from CloudFront; as the status suggests, this may have something to do with access rights.
Why does adding CloudFront to the working request chain cause the 403 error?
What works:
https://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 307 Temporary Redirect
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:12 GMT
Location: https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
Content-Length: 0
Server: AmazonS3
https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
HTTP/1.1 301 Moved Permanently
Content-Type: application/json
Content-Length: 0
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:37:16 GMT
x-amzn-RequestId: xxxx
location: http://{bucket}.s3-website-eu-west-1.amazonaws.com/100x100/image.jpg
X-Amzn-Trace-Id: xxxx
X-Cache: Miss from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
http://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 200 OK
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:18 GMT
Last-Modified: Sat, 19 Aug 2017 15:37:17 GMT
x-amz-version-id: null
ETag: xxxx
Content-Type: image/png
Content-Length: 20495
Server: AmazonS3
What doesn't work:
https://{distribution}.cloudfront.net/100x100/image.jpg
HTTP/1.1 403 Forbidden
Content-Type: application/xml
Transfer-Encoding: chunked
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:38:24 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
I've added the S3 bucket as an origin in CloudFront.
The error was caused by using a REST endpoint (e.g. s3.amazonaws.com) for website-like functionality (redirects, HTML error messages, and index documents). These features are only provided by the website endpoints (e.g. bucketname.s3-website-us-east-1.amazonaws.com):
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
It confused me because the REST endpoint was offered via autocomplete in the console when creating the CloudFront distribution; the correct endpoint has to be entered manually.
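To see the difference in behavior, a small sketch with Python requests (bucket and region are placeholders) can hit both endpoints for a missing resized image; only the website endpoint answers with the 307 redirect:

import requests  # bucket and region below are placeholders

rest_url = "https://my-bucket.s3.amazonaws.com/100x100/image.jpg"
website_url = "http://my-bucket.s3-website-eu-west-1.amazonaws.com/100x100/image.jpg"

for url in (rest_url, website_url):
    # Don't follow redirects, so the 307 from the website endpoint stays visible.
    r = requests.head(url, allow_redirects=False)
    print(r.status_code, url)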
CloudFront also caches 4xx and 5xx status codes coming from S3 (doc: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html#HTTPStatusCodes-cached-errors ).
You should invalidate the CloudFront cache for the resized image path. You can do this by calling the CreateInvalidation API from your Lambda function.
Doc:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html#invalidating-objects-api
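A minimal sketch of that invalidation call from the Lambda function (boto3; the distribution ID and path are placeholders):

import time

import boto3  # AWS SDK for Python; distribution ID and path are placeholders

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E123EXAMPLE",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/100x100/image.jpg"]},
        # CallerReference must be unique for each invalidation request.
        "CallerReference": str(time.time()),
    },
)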
I am switching to Amazon CloudFront for serving images on my website. To reduce load when we finally go live, I thought of warming up the cache by hitting image URLs (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of the object on all edge locations worldwide).
The problem is that the script uses curl to request the image, and when I access the same URL in a browser I get a MISS from CloudFront. So CloudFront is keeping two copies of the object for these two requests.
My current CloudFront configuration forwards the Content-Type request header to the origin.
How should I configure CloudFront so that it doesn't care about request headers at all, and once I have made a request (whether via curl or a browser) it serves all future requests for the same resource from the edge and not the origin?
Request/response headers:
I am afraid the CloudFront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check out the caching headers at the origin: https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive requests using a browser:
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now the same URL requested using curl, which gave me a miss:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Now this will give a HIT:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==
This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
Depending on whether you provide the "Accept-Encoding: gzip" header or not, the CloudFront edge server caches the object separately. Since browsers provide this header by default, and your site is most likely accessed mainly via browsers, I suggest changing your curl call to include this header.
I was facing the same problem; after making that change to my curl call, I started to get a Hit from the browser on my first try (after making the curl call).
Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will cache it. If you download the file only partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part gets cached as a different object. The same goes for a curl that is terminated midway. Another option I tried was a wget call with the spider option, but that internally does only a HEAD call and thus does not get the content cached on the edge server.
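Putting both observations together, a warming script should send the same Accept-Encoding header a browser sends and download each object completely. A minimal sketch in Python (the URL list is a placeholder):

import requests  # the URL below is a placeholder for your real image URLs

URLS = [
    "https://dxxxxxxxxxxxx.cloudfront.net/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg",
]

session = requests.Session()
# Browsers send this header by default; matching it avoids creating a second cache entry.
session.headers.update({"Accept-Encoding": "gzip"})

for url in URLS:
    resp = session.get(url, stream=True)
    # Consume the whole body: CloudFront only caches fully downloaded objects.
    for _ in resp.iter_content(chunk_size=64 * 1024):
        pass
    print(resp.status_code, resp.headers.get("X-Cache"), url)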
I am setting up delivery of video files to TV set-top boxes.
I want to use Amazon CloudFront.
The video files are requested via ordinary HTTP requests that may contain a Range header to request partial resources (so the user on the box can jump to any position within the video).
My problem is that it works on 2 of the 3 boxes; one is causing problems.
The request looks like this (sample data):
GET /path/file.mp4 HTTP/1.1
User-Agent: My User Agent
Host:myhost.com
Accept:*/*
Range: bytes=100-200
So if I make a request to CloudFront using telnet, I see that the response is HTTP/1.0:
joe#flimmit-joe:~$ telnet d2zf9fl0izzsf6.cloudfront.net 80
Trying 216.137.61.164...
Connected to d2zf9fl0izzsf6.cloudfront.net.
Escape character is '^]'.
GET /skin/frontend/default/flimmit/images/headerbanners/02_green.png HTTP/1.1
User-Agent: My User Agent
Host:d2zf9fl0izzsf6.cloudfront.net
Accept:*/*
Range: bytes=100-200
HTTP/1.0 206 Partial Content
Date: Sun, 12 Feb 2012 18:42:15 GMT
Server: Apache/2.2.16 (Ubuntu)
Last-Modified: Tue, 26 Jul 2011 10:37:54 GMT
ETag: "1e0b8a-2d2b-4a8f6863ac11a"
Accept-Ranges: bytes
Cache-Control: max-age=2592000
Expires: Tue, 13 Mar 2012 18:42:15 GMT
Content-Type: image/png
Age: 351213
Content-Range: bytes 100-200/11563
Content-Length: 101
X-Cache: Hit from cloudfront
X-Amz-Cf-Id: W2fzPeBSWb8_Ha_UzvIepZH-Z9xibXyRddoHslJZ3TDXyFfjwE3UMQ==,CwiKc8-JGfE77KBVTTOyE9g-OYf7P-bCJZEWGwef9Es5rzhUBYKE8A==
Via: 1.0 972e3ba2f91fd0a38ea062d0cc03be37.cloudfront.net (CloudFront)
Connection: close
(binary response body omitted)
Connection closed by foreign host.
joe#flimmit-joe:~$
Unfortunately I have only limited access to the box for testing purposes.
However, this behavior by CloudFront seems strange to me, so I wanted to ask whether it is even valid.
It is absolutely "valid" to answer an HTTP/1.1 request with HTTP/1.0.
I'll cite Section 19.6 of RFC 2616: "It is beyond the scope of a protocol specification to mandate compliance with previous versions. HTTP/1.1 was deliberately designed, however, to make supporting previous versions easy."
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.6
The important part is basically that the RFC does not mandate an HTTP/1.1 answer, so it's up to the server.
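For what it's worth, a minimal sketch using the Python standard library reproduces the telnet test and prints which protocol version CloudFront answered with (host and path are the ones from the example above and may no longer exist):

import http.client

# Host and path are taken from the telnet session above; they may no longer exist.
conn = http.client.HTTPConnection("d2zf9fl0izzsf6.cloudfront.net", 80)
conn.request(
    "GET",
    "/skin/frontend/default/flimmit/images/headerbanners/02_green.png",
    headers={"Range": "bytes=100-200", "User-Agent": "My User Agent"},
)
resp = conn.getresponse()

# resp.version is 10 for an HTTP/1.0 response and 11 for HTTP/1.1.
print("HTTP version:", resp.version, "status:", resp.status)
print("Content-Range:", resp.getheader("Content-Range"))
conn.close()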