I have a URL, that is backed by cloudfront to host images. I've since then moved to a new image hosting solution, and need to redirect traffic from the old URL to the new URL, efficiently and without degrading user experience.
For example:
When my website loads an image from www.images.companyName.com/bucket/itemGroup1/itemId, I need traffic to go instead to www.someWebsite.com/xx/bucket/itemGroup1/itemId (notice the path has also slightly changed)
How can this be done leveraging AWS, if it cannot, what other options do I have. I've been thinking Lambda#Edge, but I am not sure if this is efficient.
If this is a one time Migration where you are expecting to achieve zero down time for the end users, you can set up a syncing mechanism from old bucket to new (keeping the old files also for the moment) and update the Cloudfront distribution with both Origins and Behaviors as necessary. This is assuming some clients will have the old paths and some will access the latest. Then later analyzing the Cloudfront logs you can decide to remove old Origin and Behaviors.
If you are planning for an URL rewrite, then you need to use Lambda#Edge for this, specially if there is no way of informing the end users that the URL is updated.
Related
I'm building an API and for some responses it will stream the content of S3 objects back to the requester. I would prefer to serve the content directly rather than redirect to send a 302 (e.g. to redirect to a cloudfront distro).
The default is that I read the file into the application and then stream it back out.
If I were using apache or nginx with a local file system I could ask the reverse proxy to stream the content directly from disk with X-Sendfile or X-Accel-Redirect.
Is there an AWS-native mechanism for doing this, so I can avoid loading the file into the application and serving back out again?
I’m not entirely sure I understand your scenario correctly, but I’m thinking in the following direction:
Generally, Cloudfront works like a reverse proxy with a cache attached. (Unlike other vendor’s products where you would “deploy on” the CDN.)
You can attach different types of origins to Cloudfront, it has native support for S3 buckets, but basically everything that speaks HTTP can be attached as a custom origin.
So, in the most trivial scenario, you would place your S3 bucket behind the Cloudfront, add an Origin Access Policy (OAI) and a bucket policy which permits the OAI to access your content.
In order to benefit from caching on the Cloudfront edge, you will need to configure it appropriately, otherwise it will just be a proxy. Make sure to set the Cloudfront TTLs for your content. Check how min/max/default TTL work.
But also don’t forget to set headers for your clients to cache (Cache-Control etc); this may save you a lot of money if the same clients need the same content over and over again.
As we know, caching and cach invalidation in particular, are tricky. Make sure to understand how Cloudfront handles caching to not run into problems. For example: cache busting with query parameters does work, but you need to make Cloudfront aware that the query sting is significant.
Now here comes the exciting part: If you need to react dynamically to the request of the client, you have Lambda#Edge and Cloudfront Functions at your disposal.
Lambda#Edge is basically what it says; Lambda functions on the edge. They can work in four modes: Client request, origin request, origin response, client response. Depends what you need to modify; incoming vs. outgoing data and client-Cloudfront vs. Cloudfront-origin communication.
CF Functions are pretty limited (ES5 only, no XHR or anything, only works on viewer request/response) but very cheap at the same time. Check the AWS docs to determine what you need.
FWIW, Cloudfront also supports signed cookies and signed URLs in case you need to restrict the content to particular viewers.
NOTE: I'm providing details of my setup, but really this is a "how is this possible" question, not a "please debug my setup" question.
I have a "singe page application" (ie. an HTML file that uses the History API to simulate URLs). I'm serving this app on AWS S3, behind an AWS Cloudfront ... front.
I had successfully configured things so that if someone went to www.example.com/foo (let's pretend I own example.com), Cloudfront would serve an "error page" of my index.html. My index.html would then see the URL, and use its routing to show the user the correct page.
That all worked great ... until it didn't. Now for some reason when I go to www.example.com/foo, I get redirected to www.example.com. I'm trying to debug things, but what I can't understand is how I'm going from /foo to the main page.
When I look in the Network panel of my developer tools, I can see the request made to the original (/foo). Then I can see the chain of requests (for images, css files, etc.), and they all have a referrer of www.example.com/foo.
Then all of the sudden I see a request for React Developer tools (why it needs to make a request is beyond me) ... and it's from referrer www.example.com. After that I get one last image request from /foo, and then all subsequent requests come from www.example.com.
Can anyone explain how this could be working? I know that if a server returns a redirect (either type) that could change my URL ... but every request has a 200 status (ie. no server redirects).
I know Javascript could "push" a new URL to my browser ... but that would leave a history entry right? When I go "back" (either with my browser or history.back()) I go to the page before; I don't go "back" to /foo.
So somehow I'm not making a history entry, but I am switching my URL, and the URL I make requests from, and this all happens within milliseconds on page load ... without any redirects. How?
P.S. When I use my dev tools to add an beforeunload breakpoint, then try to navigate from example.com to example.com/foo I don't hit that break point (either for going to /foo, or when I'm "redirected" back to example.com).
When I check the box for any Load event, I do see some happen ... after my URL has already switched. In other words, I type example.com/foo, hit enter, and by the time any event fires I'm back on example.com. Whatever mechanism is doing the "redirection" here ... it doesn't trigger any load events.
I figured out my (AWS-specific) problem, thanks to a bit of Gatsby documentation. I'll include the details below in case it helps others, but I won't accept this answer, as I still don't understand how AWS did what it did (and I'd still welcome an answer for that).
What happened was that I had my Cloudfront "Origin Domain Name and Path" pointing to:
example.com.s3.amazonaws.com
However, as explained on https://www.gatsbyjs.com/docs/deploying-to-s3-cloudfront/:
There are two ways that you can connect CloudFront to an S3 origin. The most obvious way, which the AWS Console will suggest, is to type the bucket name in the Origin Domain Name field. This sets up an S3 origin, and allows you to configure CloudFront to use IAM to access your bucket. Unfortunately, it also makes it impossible to perform serverside (301/302) redirects, and it also means that directory indexes (having index.html be served when someone tries to access a directory) will only work in the root directory. You might not initially notice these issues, because Gatsby’s clientside JavaScript compensates for the latter and plugins such as gatsby-plugin-meta-redirect can compensate for the former. But just because you can’t see these issues, doesn’t mean they won’t affect search engines.
In order for all the features of your site to work correctly, you must instead use your S3 bucket’s Static Website Hosting Endpoint as the CloudFront origin. This does (sadly) mean that your bucket will have to be configured for public-read, because when CloudFront is using an S3 Static Website Hosting Endpoint address as the Origin, it’s incapable of authenticating via IAM.
Once I changed my Cloudfront "Origin Domain Name and Path" to the bucket's static hosting URL:
http://example.com.s3-website-us-west-1.amazonaws.com
Everything worked!
But again, I still don't understand how AWS did what it did when I mis-set my "Origin Domain Name and Path". It redirected me to my root domain, seemingly without either a redirect response OR a client-side redirect, and I'd love to hear how that was accomplished.
I am trying to understand how Cloudfront works. Assume static website is static.com and dynamic website is dynamic.com. static.com has thousands of html files containing img tags referencing images coming from static.com.
dynamic.com is Java based dynamically generating HTML and img tags and images comes from dynamic.com
Assume images are not manually copied to s3. No modifications are made in both sites in regards to Cloudfront other than DNS settings.
Assume Cloudfront url setup for static.com is mystaticxyzz.cloudfront.net and for dynamic.com it is mydynamicxyz.cloudfront.net
CloudFront works as a CDN sitting in front of what are called Origins.
These origins are the endpoints that CloudFront forwards traffic to, to retrieve the response and content. This could be a single server, a load balancer or any other resolvable hostname that is publicly accessible.
If you want to split between static and dynamic content you would create an origin for each type of content within the same distribution. One would be the default origin whilst the other would be matched based on a file path (/css or /images).
Each of these origins can include their own cache behaviours which enable you to define whether they should be cached and how long.
When a user accesses the CloudFront domain dependant on the path it will route to the appropriate origin or retrieve a response from the edge cache where possible.
I know this is rather late, but I am just going to add this here for those struggling to cache dynamic and static content.
Firstly, you need to understand your application your application.
Client Side Rendering
if you have a reactjs you don't need to worry too much about your caching behavior as you will be rendering , the data which will be fetched from an api client side.
none of the static files/content will be changing which are being delivered to the end user.
Since the APIs requests will be coming from a different domain , that data won't be cached by the cdn . Moreover, the data being rendered will update the html via javascript. If your javascript files are continously updating then you can use invalidations for them.
If you have content that is not stored on the origin and your CSR app is fetching the content from using a separate domain from your website domain, you will need to set up a separate cdn and point the domain name to that cdn. You wont need to make any changes to your application as the domain name stays the same for that.
However, if you static content that exists in the same origin (e.g. s3) then you would just request the content using the domain name of the cdn from which the request will come from client to cdn to origin (if not cached / expired)
lastly, assume we have separate origins like an s3 bucket for react app and s3 bucket for images . We can set up a single cdn with multiple origins . This means we can use cloudfront as an aggregator , you will then be able to cache content from different origins by using special paths.
This means , where ever you make calls to those origins previously. i.e. using the the s3 domain names, you would need to update them to that single domain name as the caching behaviors will handle the requests to the respective domains
example:
www.example.com(react-app react s3 bucket)
www.example.com/images (some s3 image bucket)
<img source={{url:"www.example.com/images/example.jpg"}} />
cloud front will make a request to that server based on that origin for the behavior configured on "/images"
Server Side Rendered
for serverside rendered apps , ideally the default cahcing behavior on the origin should allow all the different http methods , because you will have post and put http requests which you will want cloudfront to forward to the origin .
Make sure that you forward all query strings and cookies to the origin using a request policy. You can fine tune it with white listing query strings or cookies but this will make life easier. Also, the default caching behavior should use a caching policy that disables cache i.e. min,default,max ttls = 0secs . this is because the content is dynamic in nature and gets rendered on the server and not client side thus you will encounter unexpected behaviors in your application depending on how it is set up.
if you have static content on different paths like "/img", "/css" , or "/web/pages/information" cache those independently from the default behavior the respective ttls on them.
you could do some cool stuff using the cache-control header which can by pass the cache if you dont want to configure a 101 behaviors.
https://aws.amazon.com/blogs/networking-and-content-delivery/improve-your-website-performance-with-amazon-cloudfront/
Just understand your application and you will be able to leverage cdn properly
if you have a webserver that does a mixture of server side and client side rendering
just identify which paths are client-side rendered and cache those static files.
Any thing that is dynamic in nature that requires the application to make requests to the origin , make use of the caching disabled policy within a behavior.
Moreover, any of those patterns(of using a single cdn with a single/multiple origins or multiple cdns with differing origins ) mentioned earlier is applicable to serverside rendering if some content gets rendered clients side such as images
I have a legacy site and a new site, and I'm using Cloudfront to route traffic to the two different server groups based on URL path (eg. /new goes to the new servers, everything else to the old ones).
In AWS Cloudfront, my Default (*) path pattern captures all traffic not caught by the other rules, and routes these requests to the legacy site. This site explicitly prevents caching with its headers and I want to override this.
It looks from the AWS Console, though, like I can't do this:
All the other cache behaviours (eg. for the /new path pattern) allow me to set these options. Does this mean that Cloudfront doesn't allow customisation of TTLs for the default path? If not, the only way I can fix these cache headers will be to manually change them at the source origin, which I'd prefer to avoid.
Is there a way I can change these settings for my default route?
I'm looking for a way to pass the requesting host header onto either the API Gateway or a custom endpoint (outside of amazon) from a cloudfront origin.
Essentially I have multiple domains mapped to a cloudfront catchall and I'm trying to pre-render based off the index request on the server while letting all other resources through.
IF this is not possible, would lambda edge be able to achieve such a thing?
Thanks!
Until such time as Lambda#Edge leaves preview, here's your workaround:
For each domain name, create a separate CloudFront distribution, and add a unique custom origin header.
If you've configured more than one CloudFront distribution to use the same origin, you can specify different custom headers for the origins in each distribution and use the logs for your web server to distinguish between the requests that CloudFront forwards for each distribution.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/forward-custom-headers.html
It should go without saying that "use the logs for your web server" is only one possible use for this value. You can also use it to identify which domain the request is for, by inspecting the inserted request header.
For example, for the site api-42.example.com, add a custom origin header X-Forwarded-Host with the static value the same as the hostname, api-42.example.com.
CloudFront adds the custom origin header to each request when sending it to the origin server.
If the client, for whatever reason, sends the same header, CloudFront discards what the client sent, before adding your header and value to each request.
Since the actual CloudFront distributions themselves are free, there's no real harm in this solution. If you need to create a lot of them, that's easily scripted with aws-cli. By default, accounts can create 200 different distributions, but you can submit a free support request to increase that limit.
You may now be contemplating the impact of this on your cache hit rate, since the different sites wouldn't share a common cache. That's a valid concern, but the impact may not be as substantial as you expect, for a variety of reasons -- not the least of which is that CloudFront's cache is not monolithic. If you have viewers hitting a single distribution but from two different parts of the world, those users are almost certainly connecting to different CloudFront edge locations, thus hitting different cache instances anyway.