Is it possible to have a custom origin server tell CloudFront to directly serve a file from an S3 bucket, similar to the way X-Sendfile works in Nginx? I'd like to avoid having to read the file from S3 and pipe it to CloudFront.
No, this isn't possible.
Once the request is sent from CloudFront to the origin server, the only thing CloudFront will do (unless an error occurs, of course) is return the origin server's response to the requester.
The only way that comes to mind in which this could really be possible would be if CloudFront followed HTTP redirects, which it does not do.
If you want to return content from elsewhere once the request has arrived at the origin, you'll have to fetch it and stream it back... which will probably perform better than you expect, if the bucket is in the same region as the origin server and your code is tight. The latency to S3 within a region is very low and the available bandwidth is high. I have an application that does exactly this, many thousands of times each day on a little t2 instance, so it's certainly viable.
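A minimal sketch of the fetch-and-stream approach, in Python. It assumes the S3 object body is available as a file-like object with a read(size) method (boto3's get_object returns such a body under the "Body" key); the function name and chunk size are illustrative, not something from the application described above.

```python
import io

CHUNK_SIZE = 64 * 1024  # illustrative value; tune for your workload


def stream_body(body, chunk_size=CHUNK_SIZE):
    """Yield the object in fixed-size chunks so it is never fully buffered in memory."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        yield chunk


# With boto3 (an assumption, not part of this sketch) the body would come from:
#   body = boto3.client("s3").get_object(Bucket="my-bucket", Key="some/key")["Body"]
# An in-memory stand-in is used here so the sketch runs without AWS access.
fake_body = io.BytesIO(b"x" * 150_000)
chunks = list(stream_body(fake_body))
```

Each chunk can be written to the response as soon as it arrives, so memory use on the origin server stays flat regardless of object size.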
Of course, with a single CloudFront distribution, you can have multiple origins -- such as your server and S3. CloudFront can choose which origin will handle each request based on path pattern matching... but that's a static mapping, so I don't know whether it applies to what you're trying to do.
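To illustrate why the mapping is static, here is a rough Python model of path-pattern routing: behaviors are checked in order and the first match picks the origin, with no input from the origin server. The origin names and patterns are hypothetical, and fnmatch is only an approximation of CloudFront's matching (it also interprets [seq], which CloudFront does not).

```python
from fnmatch import fnmatchcase

# Ordered cache behaviors: (path pattern, origin name). Names are hypothetical.
BEHAVIORS = [
    ("/api/*", "app-server"),
    ("/reports/*.pdf", "app-server"),
]
DEFAULT_ORIGIN = "s3-bucket"  # stands in for the default (*) behavior


def pick_origin(path):
    """Return the origin of the first behavior whose pattern matches the path."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN
```

The choice depends only on the request path, so the server cannot decide per request that S3 should answer instead.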
Related
I have two paths in a CloudFront distribution which have the following Behaviors:
| Path Pattern | Origin | Viewer Protocol Policy |
|---|---|---|
| /api/* | APIOriginGroup | Redirect HTTP to HTTPS |
| /* | S3OriginGroup | Redirect HTTP to HTTPS |
And these Origins:
| Origin Name | Origin Domain | Origin Group |
|---|---|---|
| S3Origin1 | us-east-1.s3.[blah] | S3OriginGroup |
| S3Origin2 | us-east-2.s3.[blah] | S3OriginGroup |
| APIOrigin1 | a domain | APIOriginGroup |
| APIOrigin2 | a domain | APIOriginGroup |
This setup works fine for GET requests, but if I add POST requests into the Cache Behavior's Cache Methods I get an error:
"cannot include POST, PUT, PATCH, or DELETE for a cached behavior"
This doesn't make sense to me. If CloudFront really is used by AWS customers to serve billions of requests per day, and AWS recommends using CloudFront Origin Failover, which requires origin groups, then it follows that there must be some way to configure CloudFront behaviors that use origin groups and still allow POST requests. Is this not true? Are all of the API requests made by these customers GET requests?
To be clear, my fundamental problem is that I want to use CloudFront Origin Failover to switch between my primary region and secondary region when an AWS region fails. To make that possible, I need to switch over not only my front end, S3-based traffic (GET requests), but also switch over my back-end traffic (POST requests).
Note: CloudFront supports behaviors with POST requests if you do not use an Origin Group. The error appeared only after I added the Origin Group (to support the second region).
Short Answer: You can't do origin failover in CloudFront for request methods other than GET, HEAD, or OPTIONS. Period.
TL;DR
CloudFront always caches responses to GET and HEAD requests, and it can be configured to cache OPTIONS responses too. It does not cache responses to POST, PUT, PATCH, or DELETE requests, which is consistent with most public CDNs. Some CDNs do let you write custom hooks through which such responses can be cached. You might be wondering why that is: why can't I cache POST requests? The answer is RFC 2616. Since POST requests are not idempotent, the specification advises against caching them; they should always be sent on to the intended origin server. There's a very nice SO thread here which you can read to get a better understanding.
CloudFront fails over to the secondary origin only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS. CloudFront does not fail over when the viewer sends a different HTTP method (for example POST, PUT, and so on).
Ok. POST requests are not cached by CloudFront. But, why does CloudFront not provide failover for POST requests?
Let's see how CloudFront handles requests when the primary origin fails. See below:
CloudFront routes all incoming requests to the primary origin, even when a previous request failed over to the secondary origin. CloudFront only sends requests to the secondary origin after a request to the primary origin fails.
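The two documentation excerpts above can be modelled in a few lines of Python. This is a simplified sketch, not CloudFront's implementation: origins are represented as callables that return a status code, and the set of failover status codes is an illustrative choice (in CloudFront it is configured per origin group).

```python
FAILOVER_METHODS = {"GET", "HEAD", "OPTIONS"}
# Illustrative set; a time-out could be modelled as 504 here.
FAILOVER_STATUSES = {500, 502, 503, 504}


def route(method, primary, secondary):
    """Model one request through an origin group.

    primary and secondary are callables returning an HTTP status code.
    Returns (origin_used, status).
    """
    status = primary()  # the primary is always tried first, on every request
    if method in FAILOVER_METHODS and status in FAILOVER_STATUSES:
        return "secondary", secondary()
    return "primary", status  # other methods get the primary's answer as-is
```

For example, route("GET", lambda: 503, lambda: 200) ends up at the secondary, while the same call with "POST" returns the primary's 503.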
Now, since POST requests are not cached, CloudFront would have to go to the primary origin every time, come back with an invalid response or, worse, a time-out, and then hit the secondary origin in the origin group. We're talking about region failures here. The number of requests failing over from primary to secondary would be ridiculously high, and we might expect a cascading failure due to the load. This would lead to CloudFront PoP failures, which defeats the whole purpose of high availability, doesn't it? Again, this explanation is only my assumption. Of course, I'm sure the folks at CloudFront will come up with a solution for POST request region failover soon.
So far so good. But how are other AWS customers able to guarantee high availability to their users in case of AWS region failures?
Well, other AWS customers use CloudFront region failover only to make their static websites, SPAs, and static content such as videos (live and on demand) and images failure-proof, which, by the way, only requires GET, HEAD, and occasionally OPTIONS requests. Imagine a SaaS company which drives its sales and discoverability via a static website. If the method above could reduce your downtime and ensure that your sales and growth don't take a hit, why wouldn't you use it?
Got the point. But I do really need to have region failover for my backend APIs. How can I do it?
One way would be to write a custom Lambda@Edge function. The function sends the request to the intended primary origin, checks for time-outs, error response codes, etc., and, if failover has to be triggered, hits the other origin's endpoint and returns that response. This again runs contrary to CloudFront's current scheme of things.
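A rough sketch of such a function in Python, written as an origin-request handler that calls the regional endpoints itself and returns a generated response. The endpoint URLs, the status-code set, and the injectable fetch parameter are assumptions made for illustration; request headers are not forwarded and response headers are not copied, and Lambda@Edge limits on generated response size and execution time still apply.

```python
import base64
import urllib.error
import urllib.request

# Hypothetical regional API endpoints, tried in order.
ORIGINS = ["https://api.us-east-1.example.com", "https://api.us-east-2.example.com"]
FAILOVER_STATUSES = {500, 502, 503, 504}  # illustrative choice


def http_fetch(url, method, body, timeout=3):
    """Send one request; return (status, body_bytes). Raises OSError on network errors."""
    req = urllib.request.Request(url, data=body, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status and a body
        return err.code, err.read()


def handler(event, context, fetch=http_fetch):
    """Call each origin in turn and return the first response that is not a failure."""
    request = event["Records"][0]["cf"]["request"]
    path = request["uri"]
    if request.get("querystring"):
        path += "?" + request["querystring"]

    # The body is only exposed when "Include Body" is enabled for the trigger.
    body_info = request.get("body") or {}
    data = body_info.get("data", "")
    if not data:
        body = None
    elif body_info.get("encoding") == "base64":
        body = base64.b64decode(data)
    else:
        body = data.encode("utf-8")

    status, payload = 502, b"all origins failed"
    for origin in ORIGINS:
        try:
            status, payload = fetch(origin + path, request["method"], body)
        except OSError:  # time-out, DNS failure, refused connection
            status, payload = 502, b"all origins failed"
            continue
        if status not in FAILOVER_STATUSES:
            break

    return {
        "status": str(status),
        "bodyEncoding": "base64",
        "body": base64.b64encode(payload).decode("ascii"),
    }
```

Note that replaying a POST against a second region can duplicate side effects if the first attempt was partially processed, so the APIs behind this would need to tolerate retries.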
Another solution, which in my opinion is much cleaner, is to make use of Route 53's latency-based routing. You can read about how to do that here. This method would surely work for your backend APIs if you had different subdomains for your S3 files and your APIs (with those subdomains pointing to different CloudFront distributions), since it leverages CloudFront canonical names, but I'm a bit skeptical about whether it would work in your setup. You can try it and test it out anyway.
Edit: As suggested by the OP, there is a third approach, which is to handle this on the client side. Whenever the client receives an unexpected response code or a timeout, it makes the API call to another endpoint hosted in another region. This solution is a bit cheaper, simpler, and easier to implement with what is currently available.
I am using a single CloudFront distribution to serve multiple domains (*.domain.com); this distribution uses an API server as its origin.
Any request made to a domain, say test1.domain.com, goes to the API server, and we send the response back depending on the Host header.
Now, if I want to purge the cache, I can't do it for a specific domain (as CloudFront only allows path invalidation like /users/* or /users//books*).
Note: I don't want to purge the entire cache (*/**), as it would cause performance issues.
Has anyone faced this situation?
I'm using CloudFront CDN to simply cache my static contents in "Origin Pull" mode. The CloudFront origin is my website.
However, I've encountered a CORS problem. My browser doesn't let my web pages load my font files and SVGs from CloudFront.
After googling this matter a bit, I noticed that all blogs/tutorials explain how to enable CORS on an S3 bucket used as the origin for CloudFront, and letting CloudFront forward the Access-Control-Allow-XXX headers from S3 to the client.
I don't need an S3 bucket and would like to keep it that way for the sake of simplicity, if possible.
Is it possible to enable CORS on CloudFront? Even a quick and dirty solution, such as setting the access control header on all responses would be good enough.
Following up on the comment above: a CORS request is one made FROM a domain different from the domain it is sent TO. The key to making it work is for the server that answers those requests to return the header allowing cross-origin requests.
Your fonts, which should be your website's assets, should be kept in the same server as your website, therefore CORS should not be an issue.
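If the assets do stay behind CloudFront, the origin server is the place to add the header. As a quick-and-dirty illustration, here is a small WSGI wrapper in Python that puts Access-Control-Allow-Origin on every response; the wrapper and the stand-in app are hypothetical, and the question does not say what the website actually runs on.

```python
def add_cors(app, allow_origin="*"):
    """Wrap a WSGI app so every response carries Access-Control-Allow-Origin."""
    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header is not sent twice.
            headers = [h for h in headers if h[0].lower() != "access-control-allow-origin"]
            headers.append(("Access-Control-Allow-Origin", allow_origin))
            return start_response(status, headers, exc_info)
        return app(environ, cors_start_response)
    return wrapped


def font_app(environ, start_response):
    """Stand-in for the real website; serves a dummy font payload."""
    start_response("200 OK", [("Content-Type", "font/woff2")])
    return [b"font-bytes"]


application = add_cors(font_app)
```

With a fixed value of * the cached response is valid for every requesting site. If the header were made to depend on the request's Origin header instead, CloudFront would also have to forward that header and vary its cache on it.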
We have an AWS S3 bucket that hosts our dynamic images, which are fetched by web and mobile apps over HTTPS in different sizes (url/width x height/image_name, e.g. http://test.s3.com/200x300/image.png).
For this we did two things:
1- Realtime resizing: I have a redirection rule in my s3 bucket to redirect 404 errors requesting non-existing image sizes to an API gateway that calls a Lambda function. The lambda function fetches the original image and resizes it and places it in a folder in the bucket matching the requested size.
We followed the steps in this article:
https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/
2- HTTPS: I created a CloudFront distribution with an SSL certificate; its origin is the S3 static website endpoint.
Problem: Requesting an image from S3 through the CloudFront HTTPS domain always causes a 404 error, which my redirection rule redirects to the API Gateway, even if this specific image size already exists.
I tried to debug this issue with no luck. I examined the requests and from what I see things should work normally.
I'd appreciate a hint on what to do to better debug this issue (and what kind of logs I need to provide here).
Thanks
Sary
This solution relies on S3 generating HTTP redirects for missing objects, to redirect the browser to API Gateway to resize the object... and save it at the original URL.
The problem is two-fold:
S3 generated redirects don't include any Cache-Control headers, and
CloudFront's default behavior when Cache-Control is absent in a response is to cache the response internally for the value of a timer called Default TTL, which by default is set to 86400 seconds (24 hours).
The problem this causes is that CloudFront will remember the original redirect and send the browser to it, again and again, even though the object is now present.
Selecting Customize instead of Use Origin Cache Headers for "Object caching" and then setting Default TTL to 0 (all in the CloudFront Cache Behavior settings) will resolve the issue, because it configures CloudFront not to cache responses where the origin didn't include any relevant Cache-Control headers.
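The interaction between the origin's headers and the three timers can be approximated in Python as below. This is a simplified model under stated assumptions: it only considers Cache-Control max-age and ignores s-maxage, Expires, and the no-cache/no-store/private directives. The function name is made up for illustration, and the defaults are CloudFront's default timer values.

```python
def effective_ttl(max_age, min_ttl=0, default_ttl=86400, max_ttl=31536000):
    """Approximate how long CloudFront caches a response, in seconds.

    max_age is the origin's Cache-Control max-age, or None when the origin
    sent no caching headers (as with S3-generated redirects).
    """
    if max_age is None:
        # No caching headers: Default TTL applies (never below Minimum TTL).
        return max(default_ttl, min_ttl)
    # Otherwise the origin's value is clamped between Minimum and Maximum TTL.
    return max(min_ttl, min(max_age, max_ttl))
```

With the defaults, effective_ttl(None) gives 86400, which is why the redirect sticks for a day; with default_ttl=0 it gives 0, so the redirect is not cached.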
For more background:
What is Cloudfront Minimum TTL for? explains the Minimum/Default/Maximum TTL timers and how/when they apply.
Setting "Object Caching" on CloudFront explains the confusing UI labeling of these options, which is likely a holdover from a time before all three timers were configurable.
Using AWS CloudFront with S3 to host an Angular-based web client.
Is there any rewrite rule or setting that allows one of the following examples? It is very unclear from what AWS is trying to explain.
Using friendly route, for example:
domain.com?lang=en&fun=no => domain.com/en/no
Configuration folders to have a default file, for example:
domain.com\en => domain.com (but now the client knows it has a parameter lang=en)
Obviously both examples can be done with an HTML file which routes to the desired URL, BUT that doesn't work well with some analytics models such as Google's.
I would suggest using Lambda@Edge to provide the custom rewriting you want:
Using CloudFront with Lambda@Edge
Lambda@Edge is an extension of AWS Lambda, a compute service that lets you execute functions that customize the content that CloudFront delivers. Lambda@Edge scales automatically, from a few requests per day to thousands per second. Processing requests at AWS locations closer to the viewer instead of on origin servers significantly reduces latency and improves the user experience.
When you associate a CloudFront distribution with a Lambda@Edge function, CloudFront intercepts requests and responses at CloudFront edge locations. You can execute Lambda functions when the following CloudFront events occur:
When CloudFront receives a request from a viewer (viewer request)
Before CloudFront forwards a request to the origin (origin request)
When CloudFront receives a response from the origin (origin response)
Before CloudFront returns the response to the viewer (viewer response)
and here is an aCloudGuru blog post with lots of good examples, including one specifically about url rewriting:
https://read.acloud.guru/supercharging-a-static-site-with-lambda-edge-da5a1314238b
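As a rough idea of what such a rewrite can look like for the friendly-route case, here is a minimal Python handler for a request trigger. It is a sketch under assumptions, not the code from the blog post: it treats any URI whose last segment has no file extension as an application route and serves the single-page app's /index.html for it.

```python
def handler(event, context):
    """Serve index.html for extensionless (application route) URIs."""
    request = event["Records"][0]["cf"]["request"]
    last_segment = request["uri"].rsplit("/", 1)[-1]
    if "." not in last_segment:
        # /en/no, /en/ and / all load the SPA shell. The URL in the browser is
        # unchanged, so the client-side router can still read "en" and "no".
        request["uri"] = "/index.html"
    return request  # returning the request lets CloudFront continue processing it
```

Requests for real files such as /assets/app.js pass through untouched, and no HTML redirect page is involved, so analytics sees the friendly URL.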
In a multipage web app (say 12 pages), you will want an automated, worry-free strategy, and AWS Lambda@Edge provides one. It solves this completely.
First, create an AWS Lambda function and then attach your CloudFront distribution as a trigger.
In the code section of the AWS Lambda page, add the snippet from the repository below.
https://github.com/CloudUnder/lambda-edge-nice-urls/blob/master/lambdaRewrite.js
Content delivery will still be as fast as you can blink your eyes.
PS: Note the options in the readme section of the repo above