Invalidate Cloudfront's cached data by passing in custom header - amazon-web-services

I need some resources or general direction.
I am looking into using Cloudfront to help combat latency on calls to my service.
I want to be able to serve cached data, but need to allow the client to be able to specify when they want to bypass cached data and get the latest data instead.
I know that I can send a random value in the query parameter to invalidate the cache. But I want to be able to send a custom header that will do the same thing.
Ideally, I would like to use the CloudFront distribution that is created behind the scenes with API Gateway. Is this possible? Or would I need to create a new CloudFront distribution to sit in front of API Gateway?
Has anyone done this? Are there any resources you can point me to?

You cannot actually invalidate the CloudFront cache by passing a specific header -- or with a query parameter, for that matter. That is cache busting, and not invalidation.
You can configure CloudFront to include the value of a specific header in the cache key, simply by whitelisting that header for forwarding to the origin -- even if the origin ignores it.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders
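To make that concrete, here is a minimal Python sketch -- an illustration of the idea, not CloudFront's actual internals -- of how a whitelisted header becomes part of the cache key. The header name X-Refresh is invented for the example:

```python
def cache_key(path, query, headers, whitelisted=("X-Refresh",)):
    """Build a cache key from the path, query string, and any
    whitelisted headers. Two requests that differ only in a
    whitelisted header get different keys, so one can be a cache
    miss (fetched from the origin) while the other is a hit."""
    forwarded = tuple(sorted(
        (name, value) for name, value in headers.items()
        if name in whitelisted
    ))
    return (path, query, forwarded)

# Same path, different whitelisted header value -> different cache entries.
assert cache_key("/api/data", "", {"X-Refresh": "1"}) != \
       cache_key("/api/data", "", {"X-Refresh": "2"})

# Non-whitelisted headers don't affect the key at all.
assert cache_key("/api/data", "", {"Accept": "text/html"}) == \
       cache_key("/api/data", "", {})
```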
However... the need to give your API's consumers a way to bypass your cache suggests a problem with the design. Use an adaptive Cache-Control response header and cache the responses in CloudFront for an appropriate amount of time, and this issue goes away.
Otherwise, the clever ones will just bypass it all the time, by continually changing that value.
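A sketch of what "adaptive" could mean here, assuming the origin knows (or can estimate) when its data next changes; the function name and policy are hypothetical:

```python
def adaptive_cache_control(seconds_until_next_update):
    """Hypothetical origin-side policy: cache each response only until
    the underlying data could next change, so clients never need a
    cache-bypass mechanism in the first place."""
    ttl = max(0, int(seconds_until_next_update))
    return {"Cache-Control": "public, max-age=%d" % ttl}

assert adaptive_cache_control(300) == {"Cache-Control": "public, max-age=300"}
```

CloudFront honors max-age from the origin by default, so responses naturally expire just as they become stale.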

CloudFront can cache based on headers.
Create a custom header and whitelist that header for forwarding.
CloudFront will fetch from the origin if the header's value is not found in the cache.
Hope it helps.
EDIT:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html
Header-based caching.

Related

submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for any documentation/sample-code (in JavaScript) for uploading files to S3 via CloudFront but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yeah, Transfer Acceleration essentially does the job through CloudFront edge points, but as far as I searched, it is possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 saying that AWS had just added functionality to make POST/PUT requests, but it says not a single thing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. It just assumes that we already know all the things about the subject. If I knew, why would I dive into the documentation in the first place?
I believe there is some confusion here about what adding these request methods means. This feature was added simply to allow POST/PUT requests to be passed through to your origin, so that functionality in your application such as form submissions or API requests would now work.
The recommended approach as you pointed out is to make use of S3 transfer acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
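In practice, enabling Transfer Acceleration on a bucket just means uploading to a different endpoint hostname. A small Python sketch of the endpoint selection (the bucket name is a placeholder):

```python
def s3_endpoint(bucket, accelerate=False):
    """Return the S3 endpoint URL for a bucket. With Transfer
    Acceleration enabled on the bucket, uploads go to the
    s3-accelerate hostname, which routes the data through
    CloudFront edge locations onto an optimized network path."""
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return "https://%s.%s" % (bucket, host)

assert s3_endpoint("my-bucket", accelerate=True) == \
    "https://my-bucket.s3-accelerate.amazonaws.com"
```

The SDKs handle this switch for you (e.g. the `useAccelerateEndpoint` option in the JavaScript SDK's S3 client), so in application code you rarely build the hostname by hand.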

CloudFlare or AWS CDN links

I have a script that I install on a page and it will load some more JS and CSS from an S3 bucket.
I have versions, so when I do a release on Github for say 1.1.9 it will get deployed to /my-bucket/1.1.9/ on S3.
Question, if I want to have something like a symbolic link /my-bucket/v1 -> /my-bucket/1.1.9, how can I achieve this with AWS or CloudFlare?
The idea is that I want to release a new version by deploying it, to my bucket or whatever CDN, and then when I am ready I want to switch v1 to the latest 1.x.y version released. I want all websites to point to /v1 and get the latest when there is a new release.
Is there a CDN or AWS service or configuration that will allow me to create a sort of a linux-like symbolic link like that?
A simple solution with CloudFront requires a slight change in your path design:
Bucket:
/1.1.9/v1/foo
Browser:
/v1/foo
CloudFront Origin Path (on the Origin tab)
/1.1.9
Whatever you configure as the Origin Path is added to the beginning of whatever the browser requested before sending the request to the Origin server.
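The prepending can be sketched as (an illustration of the behavior, not CloudFront internals):

```python
def origin_request_uri(origin_path, requested_path):
    """CloudFront prepends the configured Origin Path to the path the
    browser requested before sending the request to the origin."""
    return origin_path + requested_path

# Browser asks for /v1/foo; Origin Path is /1.1.9;
# the origin (the S3 bucket) sees /1.1.9/v1/foo.
assert origin_request_uri("/1.1.9", "/v1/foo") == "/1.1.9/v1/foo"
```

Releasing a new version then means changing the Origin Path to, say, /1.2.0 and invalidating.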
Note that changing this means you also need to do a cache invalidation, because responses are cached based on what was requested, not what was fetched.
There is a potential race condition here between the time you change the config and the time you invalidate -- there is no guaranteed ordering between configuration changes and invalidation requests, and a config change followed by an invalidation may be completed after it,¹ so you will probably need to update the config, invalidate, verify that the distribution has progressed to a stable state, then invalidate once more. You don't need to invalidate objects individually, just /* or /v1*. It would be best if only the resource directly requested is subject to the rewrite, and not its dependencies. Remember, also, that browser caching is a big cost-saver that you can't leverage as fully if you use the same request URI to represent a different object over time.
More complicated path rewriting in CloudFront requires a Lambda@Edge Origin Request trigger (or you could use a Viewer Request trigger, but these run more often, and thus cost more and add to overall latency).
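A hedged sketch of what such an Origin Request trigger could look like, written as a Python Lambda handler (Lambda@Edge also runs Node.js; the version constant is an assumption and would have to be baked in at deploy time, since Lambda@Edge functions can't use environment variables):

```python
# Maps the stable /v1 prefix to the currently released version before
# CloudFront contacts the origin. CURRENT_VERSION is hypothetical.
CURRENT_VERSION = "1.1.9"

def handler(event, context):
    # Lambda@Edge delivers the request under Records[0].cf.request
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    if uri.startswith("/v1/"):
        # /v1/foo.js -> /1.1.9/foo.js
        request["uri"] = "/" + CURRENT_VERSION + uri[len("/v1"):]
    return request
```

With this approach the bucket layout stays /1.1.9/foo rather than /1.1.9/v1/foo, but you take on the per-request cost of the trigger.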
¹ Invalidation requests -- though this is not documented and is strictly anecdotal -- appear to involve a bit of time travel. Invalidations are timestamped, and it appears that they invalidate anything cached before their timestamp, rather than before the time they propagate to the edge locations. Architecturally, it would make sense if CloudFront is designed such that invalidations don't actively purge content, but only serve as directives for the cache to consider any cached object as stale if it pre-dates the timestamp on the invalidation request, allowing the actual purge to take place in the background. Invalidations seem to complete too rapidly for any other explanation. This means creating an invalidation request after the distribution returns to the stable Deployed state would assure that everything old is really purged, and that another invalidation request when the change is initially submitted would catch most of the stragglers that might be served from cache before the change is propagated. Changes and invalidations do appear to propagate to the edges via independent pipelines, based on observed completion timing.

Can a distribution automatically match the subdomain from a request to figure out the origin

We're adding a lot of nearly equivalent apps on the same domain, each app can be accessed through its specific subdomain. Each app has got specific assets (not a lot).
Every app refers to the same cdn.mydomain.com to get its assets from CloudFront.
Assets are namespaced. For example:
app1:
Can be reached from app1.mydomain.com
assets url is cdn.mydomain.com/assets/app1
CloudFront origin app1.mydomain.com
cache behavior /assets/app1/* to origin app1.mydomain.com
When CloudFront doesn't have the assets in cache, it downloads them from the right origin.
Currently, we're creating a new origin and cache behavior on the same distribution each time we add a new app.
We're trying to simplify that process so that CloudFront can get the assets from the right origin without our having to specify it. This would also solve the problem if we hit the limit on the number of origins in one distribution.
How can we do this and is it possible?
We're thinking of making an origin of mydomain.com with a cache behavior configured to forward the Host header, but we're not sure that this will do the trick.
Origins are tied to Cache Behaviors, which are tied to path patterns. You can't really do what you're thinking about doing.
I would suggest that you should create a distribution for each app and each subdomain. It's very easy to script this using aws-cli, since once you have one set up the way you like it, you can use its configuration output as a template to make more, with minimal changes. (I use a Perl script to build the final JSON to create each distribution, with minimal inputs like alternate domain name and certificate ARN and pipe its output into aws-cli.)
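The same template-and-substitute approach can be sketched in Python; the field names follow the shape of `aws cloudfront get-distribution-config` output, heavily abridged, and the helper name is invented:

```python
import copy

def config_from_template(template, alias, origin_domain, caller_reference):
    """Hypothetical helper: clone one distribution's config and swap in
    the per-app values, producing input for
    `aws cloudfront create-distribution --distribution-config file://...`."""
    cfg = copy.deepcopy(template)              # leave the template untouched
    cfg["CallerReference"] = caller_reference  # must be unique per create call
    cfg["Aliases"] = {"Quantity": 1, "Items": [alias]}
    cfg["Origins"]["Items"][0]["DomainName"] = origin_domain
    return cfg
```

One call per app, with only the alternate domain name, origin, and caller reference changing, gives you a fleet of near-identical distributions.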
I believe this is the right approach, because:
CloudFront cannot select the origin based on the Host header. Only the path pattern is used to select the origin.
Lambda@Edge can rewrite the path and can inspect the Host header, but it cannot rewrite the path before the matching is done that selects the Cache Behavior (and thus the origin). You cannot use Lambda@Edge to cause CloudFront to switch or select origins, unless you generate browser redirects, which you probably don't want to do, for performance reasons. I've submitted a feature request to allow a Lambda trigger to signal CloudFront that it should return to the beginning of processing and re-evaluate the path, but I don't know if it is being considered as a future feature -- AWS tends to keep their plans for future functionality close to the vest, and understandably so.
you don't gain any efficiency or cost savings by combining your sites in a single distribution, since the resources are different
if you decide to whitelist the Host header, that means CloudFront will cache responses, separately, based on the Host header, the same as it would do if you had created multiple distributions. Even if the path is identical, it will still cache separate responses if the Host header differs, as it must to ensure sensible behavior
the default limit for distributions is 200, while the limit for origins and cache behaviors is 25. Both can be raised by request, but the number of distributions they can give you is unlimited, while the other resources are finite because they increase the workload on the system for each request and would eventually have a negative performance impact
separate distributions give you separate logs and reports
provisioning errors have a smaller blast radius when each app has its own distribution
You can also go into Amazon Certificate Manager and create a wildcard certificate for *.cdn.example.com. Then use e.g. app1.cdn.example.com as the alternate domain name for the app1 distribution and attach the wildcard cert. Then reuse the same cert on the app2.cdn.example.com distribution, etc.
Note that you also have an easy migration strategy from your current solution: You can create a single distribution with *.cdn.example.com as its alternate domain name. Code the apps to use their own unique-name-here.cdn.example.com. Point all the DNS records here. Later, when you create a distribution with a specific alternate domain name foo.cdn.example.com, CloudFront will automatically stop routing those requests to the wildcard distribution and start routing them to the one with the specific domain. You will need to change the DNS entry... but CloudFront will actually handle the requests correctly, routing them to the newly-created distribution, before you change the DNS, because it has some internal magic that will match the non-wildcard hostname to the correct distribution regardless of whether the browser connects to the new endpoint or the old... so the migration event should pretty much be a non-event.
I'd suggest the wildcard strategy is a good one, anyway, so that your apps are each connecting to a specific endpoint hostname, allowing you much more flexibility in the future.

Returning images through AWS API Gateway

I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through but it gets displayed as a big chunk of ASCII because Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned w/o quotes and discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
I apologize, in advance, for giving an answer that does not directly answer the question, and instead suggests you adopt a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside Amazon Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront provides fetching of content from a back-end server, caching of content (at over 50 "edge" locations globally), no charge for the storage of cached content, and you can configure up to 100 distinct hostnames pointing to a single CloudFront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it, quite successfully for exactly the scenario you describe: "a proxy in front of an image service." Exactly what my image service and your image service do may be different (mine is a resizer that can look up the source URL of missing/never before requested images, fetch, and resize) but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective, but in fact using CloudFront can be almost $0.005 per gigabyte downloaded cheaper than not using it.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working in another question, with base64-encoded image data, which you may find helpful in your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question, to get the Content-Type to come through as a hard-coded value, you would first go into the method response screen and add a Content-Type header and whatever Content type you want.
Then you'd go into the Integration Response screen and set the Content type to your desired value (image/png in this example). Wrap 'image/png' in single quotes.

Is it possible to set Amazon CloudFront cache to far-future expiry?

How do I change the cache expiry in CloudFront on AWS? I can't see a way to do it and I think I saw an old post of a few years ago on here was somebody said it couldn't be done.
I've gone through every option in S3 and CloudFront and every option on the outer folder and on the file, but nothing.
Can it be done now, or is there any alternative? I really want to set the cache to 6 months or a year if I can.
AWS is hard work.
You can, but it's not exactly obvious how this works.
You can store custom HTTP headers with your S3 objects. If you look at the console, this is under the metadata section for an object. With this, you can set a far-future Expires header.
CloudFront will take the existing headers and pass them on. If CloudFront is already caching the object, you will need to invalidate it to see the headers after you set them.
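A sketch of computing such metadata in Python; the six-month figure matches the question, and the 30-day month is an approximation:

```python
from datetime import datetime, timedelta, timezone

def far_future_headers(months=6):
    """Headers to store as S3 object metadata so that CloudFront and
    browsers cache the object long-term. A 30-day month is an
    approximation; adjust max-age to taste."""
    max_age = months * 30 * 24 * 3600
    expires = datetime.now(timezone.utc) + timedelta(seconds=max_age)
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Expires": expires.strftime("%a, %d %b %Y %H:%M:%S GMT"),
    }

assert far_future_headers(6)["Cache-Control"] == "public, max-age=15552000"
```

You can also set the same metadata at upload time, e.g. with `aws s3 cp --cache-control "public, max-age=15552000" ...`, rather than editing each object in the console.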