I'm looking for a service that allows me to proxy/modify incoming requests inside AWS.
Currently I am using cloudfront, but that has limited functions.
I need to be able to see user agent strings and make proxy decisions based on that - like reverse proxying to another domain, or routing all requests to /index.html.
Does anyone know of a service that does this, either within AWS or outside of it?
It sounds like you are describing Lambda@Edge, which is a CloudFront enhancement that allows you to define Lambda functions that will fire at any of 4 hook points in the CloudFront signal flow, and modify the request or generate a dynamic response.
Viewer Request triggers allow inspection/modification of requests and dynamic generation of small responses before the cache lookup.
Origin Request triggers are similar, but fire after the cache is checked. They allow you to inspect and modify the request, including changing the origin server, path, and/or query string, or to generate a response instead of allowing CloudFront to proceed with the connection to the origin.
If the request goes to the origin, then once it returns, an Origin Response trigger can fire to modify the response headers or replace the response body with a different body you generate. The response after this trigger is finished with it is what gets stored in the cache, if cacheable.
Once a response is cached, the Origin Request and Origin Response triggers no longer fire for subsequent requests that can be served from the cache.
Finally, when the response is ready, whether it came from the cache or the origin, a Viewer Response trigger can modify it further, if desired.
Response triggers can also inspect many of the headers from the original request.
Lambda@Edge functions are written in Node.js, and are presented with the request or responses as simple structured objects that you inspect and/or modify.
Related
I'm building an API, and for some responses it will stream the content of S3 objects back to the requester. I would prefer to serve the content directly rather than send a 302 redirect (e.g. to a CloudFront distribution).
The default is that I read the file into the application and then stream it back out.
If I were using apache or nginx with a local file system I could ask the reverse proxy to stream the content directly from disk with X-Sendfile or X-Accel-Redirect.
Is there an AWS-native mechanism for doing this, so I can avoid loading the file into the application and serving back out again?
I’m not entirely sure I understand your scenario correctly, but I’m thinking in the following direction:
Generally, Cloudfront works like a reverse proxy with a cache attached. (Unlike other vendors’ products where you would “deploy on” the CDN.)
You can attach different types of origins to Cloudfront. It has native support for S3 buckets, but basically anything that speaks HTTP can be attached as a custom origin.
So, in the most trivial scenario, you would place your S3 bucket behind Cloudfront, add an Origin Access Identity (OAI), and add a bucket policy that permits the OAI to access your content.
In order to benefit from caching on the Cloudfront edge, you will need to configure it appropriately, otherwise it will just be a proxy. Make sure to set the Cloudfront TTLs for your content. Check how min/max/default TTL work.
But also don’t forget to set headers for your clients to cache (Cache-Control etc); this may save you a lot of money if the same clients need the same content over and over again.
As we know, caching, and cache invalidation in particular, is tricky. Make sure you understand how Cloudfront handles caching so you don't run into problems. For example: cache busting with query parameters does work, but you need to make Cloudfront aware that the query string is significant.
Now here comes the exciting part: If you need to react dynamically to the request of the client, you have Lambda@Edge and Cloudfront Functions at your disposal.
Lambda@Edge is basically what it says: Lambda functions on the edge. They can be attached at four trigger points: viewer request, origin request, origin response, and viewer response. Which one you need depends on what you want to modify: incoming vs. outgoing data, and viewer-to-Cloudfront vs. Cloudfront-to-origin communication.
CF Functions are pretty limited (ES5 only, no XHR or anything, only works on viewer request/response) but very cheap at the same time. Check the AWS docs to determine what you need.
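For comparison, here is a minimal sketch of a CF Function on the viewer request event, written in ES5-style JavaScript. The rule it applies (send any path whose last segment has no file extension to /index.html) is an assumption chosen for illustration, not something the scenario above requires.

```javascript
// CloudFront Function for the viewer request event (illustrative sketch).
function handler(event) {
    var request = event.request;

    // Take the last path segment, e.g. "settings" from "/app/settings".
    var lastSegment = request.uri.substring(request.uri.lastIndexOf('/') + 1);

    // Assumed rule: no dot in the last segment means an application route,
    // so serve the single-page app entry point instead.
    if (lastSegment.indexOf('.') === -1) {
        request.uri = '/index.html';
    }

    // Returning the request lets CloudFront continue with the cache lookup.
    return request;
}
```

Note that the event shape differs from Lambda@Edge: the request is at `event.request` rather than `event.Records[0].cf.request`.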
FWIW, Cloudfront also supports signed cookies and signed URLs in case you need to restrict the content to particular viewers.
I have a CloudFront distribution set up so that <domain>/api redirects me to <api-gateway-url>/<env>/api. However I find that sometimes CloudFront caches responses to GET requests and the browser does not redirect to the API Gateway endpoint and returns the cached response.
Example: /api/getNumber redirects to <api-gateway-url>/<env>/api/getNumber and returns me 2. I change the response so that it should return the number 300, but when I make a request through my browser now there is no redirect and I still get back the number two. The x-cache response header says cache hit from CloudFront.
AWS CloudFront is often used for caching, thus decreasing the number of requests that will hit the back-end resources. Therefore you shouldn't use CloudFront in your testing environment if you want to see changes immediately.
In your case it seems that your endpoint doesn't take any parameters (path or query), so what CloudFront sees is essentially the same request every time. Naturally, in this case you will hit the cache.
You have a couple of options to "fix" that:
Diversify your API requests (using parameters for example)
Use CloudFront's TTL options, to make CloudFront keep the cached objects less time
NOTE: This is not advisable in a production environment, because it might eliminate the whole point of caching and disrupt expected behavior
Disable CloudFront's caching for those paths that don't take parameters and/or whose response will change often, thus keeping caching on for the rest of your distribution:
https://aws.amazon.com/premiumsupport/knowledge-center/prevent-cloudfront-from-caching-files/
And lastly, if this is just your test environment, disable CloudFront. The options above might still apply to your production environment later on.
(My setup: CloudFront + S3 Origin)
Hi everyone!
This is what I’m trying to do:
Step 1. Trigger a Lambda function on viewer request. Get cookie with user preferred language if available (this cookie is set when the user chooses site language).
Step 2. Trigger a Lambda function on origin response. If response is an error (ex. 404), return an error page to the viewer based on the preferred language cookie from step 1.
My question is: how do I make information gotten in step 1 available in step 2? In general, how do I process a response based on user request AND origin response information? I would appreciate any suggestion. Thank you!
You shouldn't need step 1.
Whitelist the cookie for forwarding to the origin in the cache behavior. This causes CloudFront to cache a separate copy of each page, based on the value of the cookie. You'd need this anyway if your origin is going to be able to see the cookie.
In Lambda@Edge, there are viewer-side triggers (in front of the cache) and origin-side triggers (behind the cache).
An Origin Response trigger can see the response returned from the origin, but can also see the request that was sent to the origin.
request
Origin response – The request that CloudFront forwarded to the origin and that might have been modified by the Lambda function that was triggered by an origin request event
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html#lambda-event-structure-response
There's not a straightforward way to send information from a viewer request trigger to an origin response trigger, because they are on opposite sides of the cache and not able to communicate directly.
Your handler will be passed an event.
Everything you need is in event.Records[0].cf.
const cf = event.Records[0].cf;
The response is in cf.response and the request is in cf.request.
If the response status isn't 404, bail out of the origin response trigger and allow CloudFront to continue processing.
if (cf.response.status !== '404') {
    // Not an error we handle: pass the origin's response through unchanged.
    return callback(null, cf.response);
}
Otherwise, extract the cookie from cf.request.headers.cookie (you'll need to parse this array after verifying that it exists -- it will not, if the browser didn't supply cookies), generate your custom response based on the cookie, and return it.
See Generated Responses - Examples for how to return a generated response.
Since you are generating the response in an origin response trigger, it will be stored in the cache according to the value of the Error Caching Minimum TTL (default 5 minutes).
I am looking to add Lambda@Edge to one of our services. The goal is to regex the url for certain values and compare those against a header value to ensure authorization. If the value is present then it is compared and if rejected should return a 403 immediately to the user. If the value compared matches or the url doesn't contain a particular value, then the request continues on as an authorized request.
Initially I was thinking that this would occur with a "viewer request" event. Some of the posts and comments on SO suggest that the "origin request" is more ideal for this check. But right now I've been trying to play around with the examples in the documentation on one of our CF end points but I'm not seeing expected results. The code is the following:
'use strict';

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    // Inject a test header carrying the current timestamp.
    request.headers["edge-test"] = [{
        key: 'edge-test',
        value: Date.now().toString()
    }];

    // Log the full event so it shows up in CloudWatch.
    console.log(require('util').inspect(event, { depth: null }));
    callback(null, request);
};
I would expect that there should be a logged value inside cloudwatch and a new header value in the request, yet I'm not seeing any logs nor am I seeing the header value when the request comes in.
Can someone shed some light on why things don't seem to be executing the way I expect? Is my understanding of the expected output wrong? Is there configuration that I may be missing (my distribution ID on the trigger is set to the instance we want, and the behavior was set to '*')? Any help is appreciated :)
First, a few notes:
CloudFront is (among other things) a web cache.
A web cache's purpose is to serve content directly to the browser instead of sending the request to the origin server.
However, one of the most critical things a cache must do correctly is not return the wrong content. One of the ways a cache can return the wrong content is by not realizing that certain request headers may cause the origin server to vary the response it returns for a given URI.
CloudFront has no perfect way of knowing this, so its solution -- by default -- is to remove almost all of the headers from the request before forwarding it to the origin. Then it caches the received response against exactly the request that it sent to the origin, and will only use that cached response for future identical requests.
Injecting a new header in a Viewer Request trigger will cause that header to be discarded after it passes through the matching Cache Behavior, unless the cache behavior specifically is configured to whitelist that header for forwarding to the origin. This is the same behavior you would see if the header had been injected by the browser, itself.
So, your solution to get this header to pass through to the origin is to whitelist it in the cache behavior settings.
If you tried this same code as an Origin Request trigger, without the header whitelisted, CloudFront would actually throw a 502 Bad Gateway error, because you're trying to inject a header that CloudFront already knows you haven't whitelisted in the matching Cache Behavior. (In Viewer Request, the Cache Behavior match hasn't yet occurred, so CloudFront can't tell if you're doing something with the headers that will not ultimately work. In Origin Request, it knows.) The flow is Viewer Request > Cache Behavior > Cache Check > (if cache miss) Origin Request > send to Origin Server. Whitelisting the header would resolve this, as well.
Any header you want the origin to see, whether it comes from the browser, or a request trigger, must be whitelisted.
Note that some headers are inaccessible or immutable, particularly those that could be used to co-opt CloudFront for fraudulent purposes (such as request forgery and spoofing) and those that simply make no sense to modify.
Let's say I have two files: one for safari and one for Firefox.
I want to check User-Agent and return file based on the User-Agent.
How do I do this without adding external server?
You can't do this without adding an extra server.
S3 supports static content. It does not¹ vary its response based on request headers.
CloudFront relies on the origin server if content needs to vary based on request headers. Note that by default, CloudFront doesn't forward most headers to the origin, but this can be changed in the cache behavior configuration. If you forward the User-Agent header to the origin, your cache hit rate drops dramatically, since CloudFront has no choice but to assume any and every change in the user agent string could trigger a change in the response, so an object in the cache that was requested by a specific user agent string will only be served to a future browser with an identical user agent string. It will cache each different copy, but this still hurts your hit rate. If you only want to know the general type of browser, CloudFront can inject special headers to tell the origin whether the user agent is desktop, smart-tv, mobile, or tablet, without actually forwarding the user agent string and causing the same negative impact on the cache hit ratio.
So CloudFront will correctly cache the appropriate version of a page for each unique user agent... but the origin server must implement the actual content selection logic. And when the origin is S3, that isn't supported -- unless you have a server between CloudFront and S3. This is a perfectly valid configuration -- I have such a setup, with a server that rewrites the request path received from CloudFront before sending the request to S3, then returns the content from S3 back to CloudFront, which returns the content to the browser.
AWS Lambda would be a potential candidate for an application like this, acting as the necessary server (a serverless server, if you will) between CloudFront and S3... but it does not yet support binary data, so for anything other than text, that isn't an option, either.
¹At least, not in any sense that is relevant, here. Exceptions exist for CORS and when access is granted or denied based on a limited subset of request headers.