I went through numerous articles and forum posts on how to create a CloudFront distribution, but all of them use S3 as the origin domain name.
I created a distribution whose origin domain name is my Rails server, e.g. assets.abcd.efgh.com. I can access an asset at assets.abcd.efgh.com/assets/abcdefghti-ieajife.css, but I cannot access it through the distribution domain name at 1234test.cloudfront.net/assets/abcdefghti-ieajife.css. I get this error:
Failed to contact the origin
The result I get using curl is:
curl -I -s -X GET -H "Origin: https://assets.abcd.efgh.com" 1234test.cloudfront.net/assets/abcdefghti-ieajife.css
HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Content-Length: 507
Connection: keep-alive
Server: CloudFront
Date: Tue, 25 Oct 2016 16:48:17 GMT
Expires: Tue, 25 Oct 2016 16:48:17 GMT
X-Cache: Error from cloudfront
Via: 1.1 8f18deab0e501ffbd2fa94cfd46e4785.cloudfront.net (CloudFront)
X-Amz-Cf-Id: PLAjGN5UuFEEFZSRYu_fGfsMDBcjH1w7Ruy1x1fv9bWiftWak3k1QA==
Can someone tell me what other settings I need when creating the distribution, or what I am missing?
I found out that the origin needed to be updated to accept public requests. It was only accepting private requests.
I have a static website using S3, CloudFront and Route53. I had version 1 deployed and have now updated to version 2.
If I view the website using the S3 endpoint (http://abc.s3-website-eu-west-1.amazonaws.com/) I see version 2.
If I use the CloudFront endpoint (https://xyz.cloudfront.net/) I see version 2 again.
If I use the domain I have configured in Route53 (A record pointing to the CloudFront distribution) I see version 1. I have not set up any TTL for the DNS records (default behavior), and this has been going on for ~1 week now.
Some extra checks:
dig A xyz.cloudfront.net and dig A mydomain.com point to the same IPs.
The output of curl still reflects the previous situation (version 1), where mydomain.com was configured to point to www.example.com. Now I have it pointing directly to the CloudFront distribution:
curl -sD - https://example.com -o /dev/null
HTTP/1.1 302 Moved Temporarily
Content-Length: 0
Connection: keep-alive
Server: CloudFront
Location: https://www.example.com/
X-Cache: Miss from cloudfront
Via: 1.1 qwe.cloudfront.net (CloudFront)
...
curl -sD - https://xyz.cloudfront.net -o /dev/null
HTTP/2 200
content-type: text/html
content-length: 537
date: Fri, 18 Nov 2022 20:35:41 GMT
last-modified: Tue, 15 Nov 2022 10:09:03 GMT
etag: "..."
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 bla.cloudfront.net (CloudFront)
...
Is there anything else I could check to find out what is misconfigured?
The DNS server (Route53) and the TTL on those DNS records are entirely irrelevant here. Those have nothing to do with content caching.
Your curl commands show your requests are being routed to different CloudFront edge nodes (qwe.cloudfront.net and bla.cloudfront.net).
It appears that one of those nodes had served a request for your website before, so it had a cached version stored, which it is still serving when you make new requests that hit that CloudFront edge node now. The other node didn't have a cached version stored, so when your request hit it, it went back to the origin (S3) and pulled in the latest version.
This is pretty much the expected behavior of CloudFront, or any other CDN, when you publish new content on your origin server without notifying the CDN that it needs to clear the old cached content. You need to tell CloudFront to invalidate the cache, which will cause it to remove the cached version of your content from all edge locations.
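As a minimal sketch of what such an invalidation could look like from Python, assuming boto3 is installed and AWS credentials are configured: the helper names `build_invalidation_batch` and `invalidate` are made up for illustration, and the distribution ID you pass in is your own. The only AWS call used is the CloudFront client's `create_invalidation`.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure that CreateInvalidation expects."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one path is required")
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # CallerReference must be unique for each invalidation request.
        "CallerReference": caller_reference or f"invalidate-{time.time_ns()}",
    }


def invalidate(distribution_id, paths):
    """Send the invalidation request. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

Calling `invalidate(distribution_id, ["/*"])` with your own distribution ID would request removal of every cached object; a narrower path such as `/index.html` limits the invalidation to that object.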
Production setup: Django v3.0.5 on Nginx / Gunicorn / Supervisor (I followed directions from here)
(I don't think this is an issue, but I am using dj-stripe for Django/Stripe integration)
In development (Django's built-in HTTP server), everything seems to work (i.e. Stripe can send webhook events just fine). In production, however, I get emails saying that Stripe can't reach my server.
When I run
curl -D - -d "user=user1&pass=abcd" -X POST https://my.server/stripe/webhook/
I get this response
HTTP/1.1 400 Bad Request
Server: nginx/1.15.9 (Ubuntu)
Date: Thu, 18 Jun 2020 19:44:07 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
Vary: Cookie
However, non-webhook traffic (i.e. visiting the website in a browser) seems to work normally; only webhooks fail.
Any idea where this is going wrong?
Your curl request doesn't include the Stripe signature, which is derived from the webhook secret and is needed to authenticate the request.
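To illustrate what the curl test is missing, here is a rough sketch of Stripe's documented signing scheme: the `Stripe-Signature` header carries a timestamp `t` and an HMAC-SHA256 `v1` signature computed over `"<t>.<raw body>"` with the endpoint's signing secret. The function names and the `whsec_test` secret are placeholders for illustration. In a real project the Stripe library's `stripe.Webhook.construct_event` (which dj-stripe relies on) should do the verification.

```python
import hashlib
import hmac
import time


def stripe_signature_header(payload, secret, timestamp=None):
    """Return a Stripe-Signature style header value for a raw request body."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    signed_payload = f"{ts}.".encode() + payload
    digest = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


def verify_stripe_signature(payload, header, secret, tolerance=300, now=None):
    """Check a header against the payload; False if missing, stale, or wrong."""
    timestamp = None
    candidates = []
    for item in header.split(","):
        key, _, value = item.partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False  # too old (or too far in the future): possible replay
    expected = stripe_signature_header(payload, secret, int(timestamp))
    expected_digest = expected.split("v1=", 1)[1]
    return any(hmac.compare_digest(expected_digest, c) for c in candidates)
```

A hand-written POST like the one above has no such header at all, so a correctly configured webhook view is expected to reject it. The Stripe CLI or the dashboard's test events are a more realistic way to exercise the endpoint.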
Here is a schema of CDN to resize images and serve them via AWS CloudFront:
If an image is not found in the S3 bucket, it issues a 307 Temporary Redirect (instead of 404) to access Lambda via API Gateway. Lambda resizes the image (based on the original from the S3 bucket) and uploads it into the S3 bucket. The browser gets once again permanently redirected to the S3 bucket with the newly generated image.
When I try to access the same image via CloudFront, I receive a 403 Forbidden error. It comes from either S3 or CloudFront. As the status indicates, this may have something to do with access rights.
Why does adding CloudFront into the working request chain cause the 403 error?
What works:
https://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 307 Temporary Redirect
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:12 GMT
Location: https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
Content-Length: 0
Server: AmazonS3
https://{gateway}.execute-api.{region}.amazonaws.com/prod/resize?key=100x100/image.jpg
HTTP/1.1 301 Moved Permanently
Content-Type: application/json
Content-Length: 0
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:37:16 GMT
x-amzn-RequestId: xxxx
location: http://{bucket}.s3-website-eu-west-1.amazonaws.com/100x100/image.jpg
X-Amzn-Trace-Id: xxxx
X-Cache: Miss from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
http://{bucket}.s3-website-{region}.amazonaws.com/100x100/image.jpg
HTTP/1.1 200 OK
x-amz-id-2: xxxx
x-amz-request-id: xxxx
Date: Sat, 19 Aug 2017 15:37:18 GMT
Last-Modified: Sat, 19 Aug 2017 15:37:17 GMT
x-amz-version-id: null
ETag: xxxx
Content-Type: image/png
Content-Length: 20495
Server: AmazonS3
What doesn't work:
https://{distribution}.cloudfront.net/100x100/image.jpg
HTTP/1.1 403 Forbidden
Content-Type: application/xml
Transfer-Encoding: chunked
Connection: keep-alive
Date: Sat, 19 Aug 2017 15:38:24 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 {distribution}.cloudfront.net (CloudFront)
X-Amz-Cf-Id: xxxx
I've added the S3 bucket as an origin in CloudFront.
The error was caused by using a REST endpoint (e.g. s3.amazonaws.com) for website-like functionality (redirects, html error messages, and index documents). These features are only provided by the web site endpoints (e.g. bucketname.s3-website-us-east-1.amazonaws.com).
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
It confused me because the REST endpoint was offered via autocomplete in the console, when creating the CloudFront distribution. The correct endpoint has to be entered manually.
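As a rough illustration of the difference, the sketch below classifies an origin hostname by its shape. The function name `classify_s3_origin` and the regular expressions are my own assumptions based on the hostname patterns quoted above (`s3-website-<region>` or `s3-website.<region>` for website endpoints), not an official AWS API, so treat it as a sanity check rather than a definitive test.

```python
import re

# Website endpoints look like <bucket>.s3-website-<region>.amazonaws.com
# or <bucket>.s3-website.<region>.amazonaws.com.
_WEBSITE = re.compile(r"\.s3-website[.-][a-z0-9-]+\.amazonaws\.com$")
# REST endpoints look like <bucket>.s3.amazonaws.com,
# <bucket>.s3.<region>.amazonaws.com or <bucket>.s3-<region>.amazonaws.com.
_REST = re.compile(r"\.s3[.-]([a-z0-9-]+\.)?amazonaws\.com$")


def classify_s3_origin(hostname):
    """Return 'website', 'rest', or 'other' for an origin domain name."""
    host = hostname.strip().lower()
    # Check the website pattern first: the REST pattern would also match it.
    if _WEBSITE.search(host):
        return "website"
    if _REST.search(host):
        return "rest"
    return "other"
```

If the distribution's origin comes back as `rest` while the setup depends on redirects, index documents, or HTML error pages, the origin domain name is the first thing to change.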
CloudFront also caches 4xx and 5xx status codes coming from S3 (doc: http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/HTTPStatusCodes.html#HTTPStatusCodes-cached-errors ).
You should invalidate the CloudFront cache for the resized image path. You can do that by calling the CreateInvalidation API from your Lambda function.
Doc:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html#invalidating-objects-api
I have some images that I need to do a HttpRequestMethod.HEAD in order to find out some details of the image.
When I go to the image url on a browser it loads without a problem.
When I attempt to get the header info via my code or via online tools, it fails.
An example URL is http://www.adorama.com/images/large/CHHB74P.JPG
As mentioned, I have used the online tool Hurl.it to try to make the HEAD request, but I get the same 403 Forbidden message that I get in my code.
I have tried adding many various headers to the Head request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests) but none of this seems to work.
A normal GET request via Hurl.it also fails with the same 403 error.
If it is relevant, my code is a C# web service running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this I also spun up an EC2 instance (Linux box) and ran curl, which also returned the 403 error. Running curl locally on my personal computer returns binary data, which is presumably just the image.
And just to rule out the obvious, my code works successfully for many other websites; it is just this one where there is an issue.
Any idea what is required for me to download the image headers and not get the 403?
Same problem here.
Locally it works smoothly. From an AWS instance I get the very same problem.
I thought it was a DNS resolution problem (redirecting to a malfunctioning node). I therefore tried specifying the same IP address that my local client resolved, but that didn't fix the problem.
My guess is that Akamai (the service is provided by an Akamai CDN in this case) is blocking AWS. That is somewhat understandable: CDN customers pay by traffic, so people abusing it can generate huge bills.
Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
I tried that URL from Amazon EC2 and it didn't work for me. wget did work, however, from other servers that weren't on EC2. Here is the wget output on EC2:
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 299
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 08:42:33 GMT
Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.
But from another Linux host it did work. Here is the output:
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Content-Type: image/jpeg
Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
ServerID: C01
Content-Length: 15131
Cache-Control: private, max-age=604800
Date: Wed, 23 Mar 2016 08:43:11 GMT
Connection: keep-alive
Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”
100%[=====================================>] 15,131 --.-K/s in 0s
2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]
I would guess that the image provider is deliberately blocking requests from EC2 address ranges.
The resolved IP address differs between the two wget examples because of DNS resolution by the CDN provider that Adorama uses.
A web server may check particular fingerprint attributes to block automated bots. Here are a few of the things it can check:
GeoIP, IP
Browser headers
User agents
Plugin info
Browser fonts returned
You can simulate the browser headers and learn about some fingerprinting "attributes" here: https://panopticlick.eff.org
You can try to replicate how a browser behaves and send similar headers and a similar user agent. Plain curl/wget are not likely to satisfy those conditions, and even tools like PhantomJS occasionally get blocked. That is why some prefer tools like Selenium WebDriver, which launch an actual browser.
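A minimal sketch of a HEAD request that sends browser-like headers, using only Python's standard library. The header values are illustrative assumptions (the user agent is the Firefox string quoted elsewhere in this thread), and `build_head_request` / `fetch_headers` are hypothetical helper names. If the server blocks by source IP range, as suspected above for EC2, no choice of headers will help.

```python
import urllib.error
import urllib.request

# Illustrative values only; copy the headers your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) "
                  "Gecko/20100101 Firefox/70.0",
    "Accept": "image/webp,*/*",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_head_request(url, extra_headers=None):
    """Build (but do not send) a HEAD request with browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers, method="HEAD")


def fetch_headers(url, timeout=10):
    """Send the HEAD request and return (status, headers), even on 4xx/5xx."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 403 and similar; the response headers are still useful.
        return err.code, dict(err.headers.items())
```

Comparing the result of `fetch_headers` from a local machine and from an EC2 instance with identical headers is one way to tell header-based blocking apart from IP-based blocking.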
I found that another URL, also protected by AkamaiGHost, was blocking requests because of certain parts of the user agent. In particular, a user agent containing a link with a protocol was blocked:
Using curl -H 'User-Agent: some-user-agent' https://some.website I found the following results for different user agents:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0: okay
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php): 403
https ://bar: okay
https://bar: 403
All I could find for now is this (downvoted) answer https://stackoverflow.com/a/48137940/230422 stating that colons (:) are not allowed in header values. That is clearly not the only thing happening here, as the Mozilla example also has a colon, just not a link.
I guess that most web servers don't care and allow Facebook's bot and other bots that have a contact URL in their user agent. But apparently AkamaiGHost does block it.
I am switching to Amazon CloudFront for serving images on my website. To reduce load when we finally go live, I thought of warming up the cache by hitting image URLs (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of each object at all edge locations worldwide).
The problem is that the script uses curl to request an image, and when I then access the same URL in a browser I get a Miss from CloudFront. So CloudFront is keeping two copies of the object for these two requests.
My current CloudFront configuration forwards the Content-Type request header to the origin.
How should I configure CloudFront so that it doesn't care about request headers at all, and once I have made a request (whether with curl or a browser) it serves all future requests for the same resource from the edge and not the origin?
Request/response headers:
I am afraid that the CloudFront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check the caching headers at the origin - https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Response after two successive requests using the browser:
Remote Address:54.230.156.66:443
Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
Request Method:GET
Status Code:200 OK
Response Headers
Accept-Ranges:bytes
Age:23
Cache-Control:public, max-age=31557600
Connection:keep-alive
Content-Length:8708
Content-Type:image/jpg
Date:Fri, 27 Nov 2015 09:16:03 GMT
ETag:"-170562206"
Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
Vary:Accept-Encoding
Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
X-Backend:image6_40
X-Cache:Hit from cloudfront
X-Cache-Hits:0
X-Device:pc
X-DeviceType:pc
X-Powered-By:xyz
Now the same URL requested using curl, which gave me a Miss:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Age: 0
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==
Requesting it again with curl now gives a Hit:
manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
HTTP/1.1 200 OK
Content-Type: image/jpg
Content-Length: 8708
Connection: keep-alive
Cache-Control: public, max-age=31557600
Date: Fri, 27 Nov 2015 09:16:47 GMT
ETag: "-170562206"
Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
X-Backend: image6_40
X-Cache-Hits: 0
X-Device: pc
X-DeviceType: pc
X-Powered-By: xyz
Age: 3
Vary: Accept-Encoding
X-Cache: Hit from cloudfront
Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==
This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?
Depending on whether or not you provide the "Accept-Encoding: gzip" header, the CloudFront edge server caches the object separately. Since browsers send this header by default, and your site is likely to be accessed mostly via browsers, I suggest changing your curl call to include this header.
I was facing the same problem. After making this change in my curl call, I started getting a Hit on my first request from the browser (after making a curl call).
Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will be cached. If you download the file partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part gets cached, as a different object. The same goes for a curl call that was terminated midway. The other option I tried was wget with the --spider option, but that internally does only a HEAD call and thus does not get the content cached on the edge server.
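A small warm-up sketch along these lines, using only the Python standard library: it sends a full GET with `Accept-Encoding: gzip`, reads the whole body so the download is not cut short, and reports what the `X-Cache` response header says. The helper names are hypothetical, and whether a given edge actually caches the object still depends on the distribution's settings.

```python
import urllib.request


def build_warm_request(url):
    """Full GET (not HEAD, no Range) with the header browsers send by default."""
    return urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip"}, method="GET"
    )


def cache_status(headers):
    """Map an X-Cache header such as 'Hit from cloudfront' to hit/miss/unknown."""
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "x-cache":
            value = header_value.lower()
    if value.startswith("hit"):
        return "hit"
    if value.startswith("miss"):
        return "miss"
    return "unknown"


def warm(url, timeout=30):
    """Request the object, drain the entire body, and return the cache status."""
    with urllib.request.urlopen(build_warm_request(url), timeout=timeout) as resp:
        while resp.read(64 * 1024):
            pass  # discard the data; only the complete transfer matters
        return cache_status(dict(resp.headers.items()))
```

Calling `warm(url)` twice for the same image should, if the edge cached it, report `miss` and then `hit`, provided both requests reach the same edge location.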