Mapping Host Header to custom header for secondary origin - amazon-web-services

I'm looking for a way to pass the requesting host header onto either the API Gateway or a custom endpoint (outside of amazon) from a cloudfront origin.
Essentially I have multiple domains mapped to a cloudfront catchall and I'm trying to pre-render based off the index request on the server while letting all other resources through.
If this is not possible, would Lambda@Edge be able to achieve such a thing?
Thanks!

Until such time as Lambda@Edge leaves preview, here's your workaround:
For each domain name, create a separate CloudFront distribution, and add a unique custom origin header.
If you've configured more than one CloudFront distribution to use the same origin, you can specify different custom headers for the origins in each distribution and use the logs for your web server to distinguish between the requests that CloudFront forwards for each distribution.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/forward-custom-headers.html
It should go without saying that "use the logs for your web server" is only one possible use for this value. You can also use it to identify which domain the request is for, by inspecting the inserted request header.
For example, for the site api-42.example.com, add a custom origin header X-Forwarded-Host with the static value the same as the hostname, api-42.example.com.
CloudFront adds the custom origin header to each request when sending it to the origin server.
If the client, for whatever reason, sends the same header, CloudFront discards what the client sent, before adding your header and value to each request.
Since the actual CloudFront distributions themselves are free, there's no real harm in this solution. If you need to create a lot of them, that's easily scripted with aws-cli. By default, accounts can create 200 different distributions, but you can submit a free support request to increase that limit.
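The "easily scripted" part might look like the following sketch: it generates a minimal distribution config per domain, each carrying its hostname as a custom origin header. The domain names and origin hostname here are placeholders, and a real DistributionConfig needs more required fields (default cache behavior, etc.) before being passed to boto3's CloudFront create_distribution call.

```python
# Sketch: build one CloudFront distribution config per domain, each
# injecting the domain name as a custom origin header. All names here
# are hypothetical; a complete DistributionConfig has more required
# fields than shown.

def distribution_config(domain, origin_host):
    """Minimal partial DistributionConfig with an X-Forwarded-Host header."""
    return {
        "CallerReference": domain,
        "Comment": f"Distribution for {domain}",
        "Enabled": True,
        "Aliases": {"Quantity": 1, "Items": [domain]},
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": f"origin-{domain}",
                "DomainName": origin_host,
                "CustomHeaders": {
                    "Quantity": 1,
                    "Items": [{
                        "HeaderName": "X-Forwarded-Host",
                        # static value = the site's own hostname
                        "HeaderValue": domain,
                    }],
                },
            }],
        },
    }

configs = [distribution_config(d, "origin.example.com")
           for d in ["api-42.example.com", "api-43.example.com"]]
```

Each config would then be submitted in a loop, one create call per domain.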
You may now be contemplating the impact of this on your cache hit rate, since the different sites wouldn't share a common cache. That's a valid concern, but the impact may not be as substantial as you expect, for a variety of reasons -- not the least of which is that CloudFront's cache is not monolithic. If you have viewers hitting a single distribution but from two different parts of the world, those users are almost certainly connecting to different CloudFront edge locations, thus hitting different cache instances anyway.

Related

Is there some equivalent of X-Sendfile or X-Accel-Redirect for S3?

I'm building an API, and for some responses it will stream the content of S3 objects back to the requester. I would prefer to serve the content directly rather than send a 302 redirect (e.g. to a CloudFront distro).
The default is that I read the file into the application and then stream it back out.
If I were using apache or nginx with a local file system I could ask the reverse proxy to stream the content directly from disk with X-Sendfile or X-Accel-Redirect.
Is there an AWS-native mechanism for doing this, so I can avoid loading the file into the application and serving back out again?
I’m not entirely sure I understand your scenario correctly, but I’m thinking in the following direction:
Generally, CloudFront works like a reverse proxy with a cache attached. (Unlike other vendors' products where you would “deploy on” the CDN.)
You can attach different types of origins to CloudFront; it has native support for S3 buckets, but basically everything that speaks HTTP can be attached as a custom origin.
So, in the most trivial scenario, you would place your S3 bucket behind CloudFront, add an Origin Access Identity (OAI) and a bucket policy which permits the OAI to access your content.
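As a rough illustration of that bucket policy (the bucket name and OAI id below are placeholders), the statement grants the OAI's IAM principal read access to the bucket's objects:

```python
# Sketch of an S3 bucket policy permitting a CloudFront Origin Access
# Identity (OAI) to read objects. Bucket name and OAI id are made up.
import json

OAI_ID = "E2EXAMPLE123"        # hypothetical OAI id
BUCKET = "my-site-content"     # hypothetical bucket name

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontOAIRead",
        "Effect": "Allow",
        "Principal": {
            # the IAM principal CloudFront assumes for this OAI
            "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {OAI_ID}"
        },
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}

# Serialized form, as you would hand it to s3.put_bucket_policy(...)
policy_json = json.dumps(bucket_policy)
```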
In order to benefit from caching at the CloudFront edge, you will need to configure it appropriately; otherwise it will just be a proxy. Make sure to set the CloudFront TTLs for your content, and check how min/max/default TTL work.
But also don't forget to set headers for your clients to cache (Cache-Control etc.); this may save you a lot of money if the same clients need the same content over and over again.
As we know, caching, and cache invalidation in particular, is tricky. Make sure to understand how CloudFront handles caching so you don't run into problems. For example: cache busting with query parameters does work, but you need to make CloudFront aware that the query string is significant.
Now here comes the exciting part: if you need to react dynamically to the client's request, you have Lambda@Edge and CloudFront Functions at your disposal.
Lambda@Edge is basically what it says: Lambda functions on the edge. They can work in four modes: viewer request, origin request, origin response, and viewer response. Which one you need depends on what you want to modify: incoming vs. outgoing data, and client-CloudFront vs. CloudFront-origin communication.
CloudFront Functions are pretty limited (ES5 only, no XHR or anything, viewer request/response only) but very cheap at the same time. Check the AWS docs to determine what you need.
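As a hedged sketch of the viewer-request mode (Python runtime; the custom header name is my own choice), a Lambda@Edge handler that copies the requested Host into a header the origin can read might look like:

```python
# Sketch of a Lambda@Edge viewer-request handler (Python runtime).
# It copies the viewer's Host header into a custom X-Forwarded-Host
# header so the origin knows which domain was requested. The header
# name is an assumption, not a CloudFront convention.

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]
    # CloudFront lower-cases header names (the keys of this dict);
    # each value is a list of {"key", "value"} pairs.
    host = headers.get("host", [{}])[0].get("value", "")
    headers["x-forwarded-host"] = [
        {"key": "X-Forwarded-Host", "value": host}
    ]
    # Returning the request lets it continue to the cache/origin.
    return request
```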
FWIW, Cloudfront also supports signed cookies and signed URLs in case you need to restrict the content to particular viewers.

How AWS Cloudfront works for both static website and dynamic website when website is externally hosted (not hosted on AWS or S3)?

I am trying to understand how Cloudfront works. Assume static website is static.com and dynamic website is dynamic.com. static.com has thousands of html files containing img tags referencing images coming from static.com.
dynamic.com is Java based, dynamically generating HTML and img tags, and the images come from dynamic.com.
Assume images are not manually copied to s3. No modifications are made in both sites in regards to Cloudfront other than DNS settings.
Assume Cloudfront url setup for static.com is mystaticxyzz.cloudfront.net and for dynamic.com it is mydynamicxyz.cloudfront.net
CloudFront works as a CDN sitting in front of what are called Origins.
These origins are the endpoints that CloudFront forwards traffic to in order to retrieve the response and content. This could be a single server, a load balancer, or any other publicly accessible, resolvable hostname.
If you want to split between static and dynamic content you would create an origin for each type of content within the same distribution. One would be the default origin whilst the other would be matched based on a file path (/css or /images).
Each of these origins can include their own cache behaviours which enable you to define whether they should be cached and how long.
When a user accesses the CloudFront domain, it will route to the appropriate origin depending on the path, or serve the response from the edge cache where possible.
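As a rough model of that path-based routing (origin ids and patterns are made up, and CloudFront's real pattern matching has its own rules; this only illustrates the idea of behaviors falling through to a default origin):

```python
# Illustrative sketch of CloudFront-style path-based origin selection:
# each behavior maps a path pattern to an origin, and anything that
# matches no pattern goes to the default origin.
from fnmatch import fnmatch

BEHAVIORS = [
    ("/images/*", "static-origin"),
    ("/css/*", "static-origin"),
]
DEFAULT_ORIGIN = "dynamic-origin"

def pick_origin(path):
    """Return the first origin whose path pattern matches, else the default."""
    for pattern, origin in BEHAVIORS:
        if fnmatch(path, pattern):
            return origin
    return DEFAULT_ORIGIN
```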
I know this is rather late, but I am just going to add this here for those struggling to cache dynamic and static content.
Firstly, you need to understand your application.
Client Side Rendering
If you have a React app, you don't need to worry too much about your caching behavior, as rendering happens client side with data fetched from an API.
None of the static files/content delivered to the end user will be changing.
Since the API requests will be going to a different domain, that data won't be cached by the CDN. Moreover, the data being rendered will update the HTML via JavaScript. If your JavaScript files are continuously updated, you can use invalidations for them.
If you have content that is not stored on your origin and your CSR app fetches it from a separate domain, you will need to set up a separate CDN and point that domain name at it. You won't need to make any changes to your application, as the domain name stays the same.
However, if you have static content that exists in the same origin (e.g. S3), you would just request the content using the CDN's domain name, so the request flows from client to CDN to origin (if not cached or expired).
Lastly, assume we have separate origins, like an S3 bucket for the React app and an S3 bucket for images. We can set up a single CDN with multiple origins, using CloudFront as an aggregator; you will then be able to cache content from the different origins under special paths.
This means that wherever you previously made calls to those origins (i.e. using the S3 domain names), you would need to update them to that single domain name, as the cache behaviors will route the requests to the respective origins.
example:
www.example.com(react-app react s3 bucket)
www.example.com/images (some s3 image bucket)
<img source={{url:"www.example.com/images/example.jpg"}} />
CloudFront will make the request to the origin configured for the "/images" behavior.
Server Side Rendered
For server-side rendered apps, ideally the default cache behavior should allow all the HTTP methods, because you will have POST and PUT requests that you will want CloudFront to forward to the origin.
Make sure that you forward all query strings and cookies to the origin using a request policy. You can fine-tune this by whitelisting specific query strings or cookies, but forwarding everything makes life easier. Also, the default cache behavior should use a cache policy that disables caching (i.e. min/default/max TTLs of 0 seconds). This is because the content is dynamic in nature and gets rendered on the server, not client side; otherwise you will encounter unexpected behavior in your application depending on how it is set up.
If you have static content on different paths like "/img", "/css", or "/web/pages/information", cache those independently of the default behavior with their respective TTLs.
You could also do some cool stuff using the Cache-Control header, which can bypass the cache if you don't want to configure 101 different behaviors.
https://aws.amazon.com/blogs/networking-and-content-delivery/improve-your-website-performance-with-amazon-cloudfront/
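A sketch of that Cache-Control idea, with illustrative paths and TTL values: instead of one CloudFront behavior per path, the origin emits a Cache-Control value per path so static responses get cached and dynamic ones bypass the cache.

```python
# Sketch: origin-side choice of Cache-Control per request path.
# Prefixes and max-age values are illustrative, not prescriptive.

STATIC_PREFIXES = ("/img/", "/css/", "/web/pages/information/")

def cache_control_for(path):
    """Pick a Cache-Control header value based on the request path."""
    if path.startswith(STATIC_PREFIXES):
        return "public, max-age=86400"   # cache static assets for a day
    return "no-store"                    # never cache dynamic responses
```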
Just understand your application and you will be able to leverage the CDN properly.
If you have a web server that does a mixture of server-side and client-side rendering, just identify which paths are client-side rendered and cache those static files.
For anything that is dynamic in nature and requires requests to reach the origin, use the caching-disabled policy within a behavior.
Moreover, any of the patterns mentioned earlier (a single CDN with one or multiple origins, or multiple CDNs with differing origins) is applicable to server-side rendering if some content, such as images, gets rendered client side.

How to get past CloudFront cached redirect and hit API Gateway instead?

I have a CloudFront distribution set up so that <domain>/api redirects me to <api-gateway-url>/<env>/api. However I find that sometimes CloudFront caches responses to GET requests and the browser does not redirect to the API Gateway endpoint and returns the cached response.
Example: /api/getNumber redirects to <api-gateway-url>/<env>/api/getNumber and returns me 2. I change the response so that it should return the number 300, but when I make a request through my browser now there is no redirect and I still get back the number two. The x-cache response header says cache hit from CloudFront.
AWS CloudFront is often used for caching, thus decreasing the number of requests that hit the back-end resources. Therefore you shouldn't use CloudFront on your testing environment if you want to immediately see changes.
In your case it seems that your endpoint doesn't have any parameters (path/query), so essentially what CloudFront sees is the same request every time; naturally, in this case you will hit the cache.
You have a couple of options to "fix" that:
Diversify your API requests (using parameters for example)
Use CloudFront's TTL options to make CloudFront keep cached objects for less time
NOTE: This is not advisable if this is production environment, because it might eliminate the whole point of caching and disrupt expected behavior
Disable CloudFront's caching for those paths that don't take parameters and/or whose response will change often, thus keeping caching on for the rest of your distribution:
https://aws.amazon.com/premiumsupport/knowledge-center/prevent-cloudfront-from-caching-files/
And lastly, if this is just your test environment, disable CloudFront; the points above might later apply to your production environment.
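The first option (diversifying requests with a cache-busting query parameter) can be sketched as below. The `_cb` parameter name is arbitrary, and this only helps if the distribution is configured to treat query strings as significant for caching:

```python
# Sketch: append a cache-busting query parameter so CloudFront sees a
# distinct request each time. Parameter name `_cb` is a made-up choice.
import time
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def bust_cache(url, value=None):
    """Return `url` with an extra `_cb` query parameter appended."""
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    token = str(value if value is not None else int(time.time()))
    query.append(("_cb", token))
    return urlunparse(parts._replace(query=urlencode(query)))
```

Useful on a test environment only; in production it defeats the point of caching.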

How to pass UserAgent header to origin but not cache on it

We use CloudFront for caching user requests on CDN level.
The problem is that we want to pass the User-Agent header to the origin to be able to track it, but we do not want to use it as a caching key (there are just too many user agents, and the value of caching on them would be zero).
When using the Whitelist strategy for headers, AWS CloudFront strips all headers which are not whitelisted and not default, and doesn't send them to the origin.
As we use the Whitelist forward strategy now, one solution would be to switch to the All strategy, but we do like the other headers that we are using as cache keys.
Any ideas?
I checked similar questions, but found no answer to this question by far:
Does Amazon pass custom headers to origin?
Pass cookie to CloudFront origin but prevent from caching
(similar but for cookies)

How to add basic logic in AWS S3 or cloudfront?

Let's say I have two files: one for safari and one for Firefox.
I want to check User-Agent and return file based on the User-Agent.
How do I do this without adding external server?
You can't do this without adding an extra server.
S3 supports static content. It does not¹ vary its response based on request headers.
CloudFront relies on the origin server if content needs to vary based on request headers. Note that by default, CloudFront doesn't forward most headers to the origin, but this can be changed in the cache behavior configuration.
If you forward the User-Agent header to the origin, your cache hit rate drops dramatically, since CloudFront has no choice but to assume any and every change in the user agent string could trigger a change in the response: an object in the cache that was requested with a specific user agent string will only be served to a future browser with an identical user agent string. CloudFront will cache each different copy, but this still hurts your hit rate.
If you only want to know the general type of browser, CloudFront can inject special headers to tell the origin whether the user agent is desktop, smart-tv, mobile, or tablet, without actually forwarding the user agent string and causing the same negative impact on the cache hit ratio.
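Assuming the CloudFront-Is-*-Viewer headers are whitelisted for forwarding, the origin-side selection could be sketched as follows (framework-agnostic; headers are assumed to arrive as a dict of lower-cased names):

```python
# Sketch: map CloudFront's device-detection headers to a coarse device
# class at the origin. Assumes the distribution forwards these headers.

def device_class(headers):
    """Return 'mobile', 'tablet', 'smart-tv', 'desktop', or 'unknown'.

    `headers` is a dict of lower-cased request header names to values,
    as a typical origin framework would expose them.
    """
    checks = [
        ("cloudfront-is-mobile-viewer", "mobile"),
        ("cloudfront-is-tablet-viewer", "tablet"),
        ("cloudfront-is-smarttv-viewer", "smart-tv"),
        ("cloudfront-is-desktop-viewer", "desktop"),
    ]
    for header, label in checks:
        if headers.get(header) == "true":
            return label
    return "unknown"
```

The origin can then pick, say, the Safari or Firefox variant of a file from the device class, while CloudFront caches one copy per class instead of one per user agent string.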
So CloudFront will correctly cache the appropriate version of a page for each unique user agent... but the origin server must implement the actual content selection logic. And when the origin is S3, that isn't supported -- unless you have a server between CloudFront and S3. This is a perfectly valid configuration -- I have such a setup, with a server that rewrites the request path received from CloudFront before sending the request to S3, then returns the content from S3 back to CloudFront, which returns the content to the browser.
AWS Lambda would be a potential candidate for an application like this, acting as the necessary server (a serverless server, if you will) between CloudFront and S3... but it does not yet support binary data, so for anything other than text, that isn't an option, either.
¹At least, not in any sense that is relevant, here. Exceptions exist for CORS and when access is granted or denied based on a limited subset of request headers.