(My setup: CloudFront + S3 Origin)
Hi everyone!
This is what I’m trying to do:
Step 1. Trigger a Lambda function on viewer request. Get cookie with user preferred language if available (this cookie is set when the user chooses site language).
Step 2. Trigger a Lambda function on origin response. If response is an error (ex. 404), return an error page to the viewer based on the preferred language cookie from step 1.
My question is: how do I make information gotten in step 1 available in step 2? In general, how do I process a response based on user request AND origin response information? I would appreciate any suggestion. Thank you!
You shouldn't need step 1.
Whitelist the cookie for forwarding to the origin in the cache behavior. This causes CloudFront to cache a separate copy of each page, based on the value of the cookie. You'd need this anyway if your origin is going to be able to see the cookie.
In Lambda#Edge, there are viewer-side triggers (in front of the cache) and origin-side triggers (behind the cache).
An Origin Response trigger can see the response returned from the origin, but can also see the request that was sent to the origin.
request
Origin response – The request that CloudFront forwarded to the origin and that might have been modified by the Lambda function that was triggered by an origin request event
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-event-structure.html#lambda-event-structure-response
There's not a straigtforward way to send information from a viewer request trigger to an origin response trigger, because they are on opposite sides of the cache and not able to communicate directly.
Your handler will be passed an event.
Everything you need is in event.Records[0].cf.
const cf = event.Records[0].cf;
The response is in cf.response and the request is in cf.request.
If the response status isn't 404, bail out of the origin response trigger and allow CloudFront to continue processing.
if(cf.response.status != "404')
{
return callback(null, cf.response);
}
Otherwise, extract the cookie from cf.request.headers.cookie (you'll need to parse this array after verifying that it exists -- it will not, if the browser didn't supply cookies), generate your custom response based on the cookie, and return it.
See Generated Responses - Examples for how to return a generated response.
Since you are generating the response in an origin response trigger, it will be stored in the cache according to the value of the Error Caching Minimum TTL (default 5 minutes).
Related
I am doing something probably unusual and ill-advised to overcome a limitation with request and response behavior.
I am having an Origin Request Lambda call back to the initial URL via https.get, with a parameter passed in the header. This will cause a secondary behavior for the same URL request, allowing me to mutate the response in the original Origin Request Lambda before returning a custom response.
The long version:
Function 1 of Viewer Request Lambda fires when there is not the custom property my-uuid in the header. It will create the UUID, set that UUID in the my-uuid property on the header, and then fire the callback with the updated header.
Function 1 of the Origin Request Lambda fires where my-uuid header is present. Cloudfront is configured to cache based on this header alone, so that the generated UUID will always trigger Function 1 of the Origin Request Lambda. Function 1 makes an https.get call to the URL called in the original request, but passed along the my-uuid header.
Function 2 of the Viewer Request Lambda fires based on the presence of the my-uuid header in this second run. This simply strips the my-uuid header and fires the callback sans my-uuid header property.
This page has been called before and is in the Cloudfront cache. As the request does not have the my-uuid header property, there is no cache-busting, and the cached page is returned to Function 1 of the Origin Request Lambda. OR:
This page has not yet been cached, so Function 2 of the Origin Request Lambda is invoked. In the absence of the my-uuid header property, it simply fires the callback with the request as-is.
Either way, Function 1 of the Origin Request Lambda receives the HTML from the https.get call, and uses this to create a custom response object with the body of the desired page but also the set-cookie header containing the UUID I generated in the initial Viewer Request Lambda. This custom response object is passed into the callback.
While on that path, the solution I crafted brought me to another issue:
Steps 3 and 4.2 (the Function 2 of either Request Lambda) are not logging at all when I call my endpoint via Postman. I have a plethora of console logs to track what's happening internally. However, the response has any headers I try to set in the final response (except, annoyingly, the set-cookie header which appears to simply disappear and is why I need the logging to work).
If I set the my-uuid header on my Postman request to trigger the Function 2 behavior, I do see those in the log.
I just came across the same issue today and found this SO question.
I was able to make the cookies set by origin request Lambda work by changing the "Forward Cookies" chache behavior setting from "None (Improves Caching)" to "Forward all, cache based on all"
Update: Collected my thoughts better
I'm generating a unique identifier (UUID) for each user in the Viewer Request Lambda, and then selecting a cached page to return based upon that UUID. This works.
Ideally, this user would always have the same UUID.
I must generate that UUID in the Viewer Request if it is not present in a cookie on that Viewer Request. I also need that UUID to be set as a cookie, which of course happens in the response not the request.
Without caching, my server simply handles taking a custom header and creating a Set-Cookie in the response header.
I am not finding a way to handle this if I want to cache the page. I can ignore the request header for caching and serve the correct cached page, but then the user does not persist that UUID as no cookie is set to be utilized in their next request.
Has anyone accomplished something like this?
Things I'm trying
There are a few angles I'm working on with this, but haven't been able to get to work yet:
Some sort of setting in Cloudfront I'm unaware of that handles the header or other data pass-through from Viewer Request to Viewer Response, which could be used in a second lambda in Cloudfront.
Modify the response object headers preemptively in the Viewer Request. I don't think this is possible, as they return headers are not yet created, unless there's some built-in Cloudfront methodology I'm missing.
An existing pass-through header of some sort, I don't know if that's even a thing since I'm not intimately familiar with this aspect of request-response handling, but worth a shot.
Possibly (haven't tried yet though) I could create the entire response object in the Client Request lambda and somehow serve the cached page from there, modifying the response headers then passing it into the callback method.
Tobin's answer actually works, but is not a solid solution. If the user is not storing or serving their cookies it becomes an infinite loop, plus I'd rather not throw a redirect up in front of all of my pages if I can avoid it
Somewhat-working concept
Viewer Request Lambda, when UUID not present in cookies, generates UUID
Viewer Request Lambda sets UUID in cookies on header in request object. Callback with updated request object passed in
Presence of UUID cookie busts Cloudfront cache
Origin Request Lambda is triggered with UUID present
Origin Request Lambda calls original request URL again via http.get with UUID cookie set (40KB limit makes doing this in the Viewer Request Lambda impractical)
Second scenario for Viewer Request Lambda, seeing UUID now present, strips the UUID cookie then continues the request normally
Second Origin Request if not yet cached - Cached response if cached, as cache-busting UUID is not present - returns actual page HTML to First Origin Request
First Origin Request receives response from http.get containing HTML
First Origin Request creates custom response object containing response body from http.get and Set-Cookie header set with our original UUID
Subsequent calls, having the UUID already set, will strip the UUID from the cookie (to prevent cache busting) and skip directly to the second-scenario in the Viewer Request Lambda which will directly load the cached version of the page.
I say "somewhat" because when I try to hit my endpoint, I get a binary file downloaded.
EDIT
This is because I was not setting the content-type header. I now have only a 302 redirect problem... if I overcome this I'll post a full answer.
Original question
I have a function on the Viewer Request that picks an option and sets some things in the request before it's retrieved from the cache or server.
That works, but I want it to remember that choice for future users. The thought is to simply set a cookie I can read the next time that user comes through. As this is on the Viewer Request and not the Viewer Response I haven't figured out how to make that happen, or if it even is possible via the Lambda itself.
Viewer Request ->
Lambda picks options (needs to set cookie) ->
gets corresponding content ->
returns to Viewer with set-cookie header intact
I have seen the examples and been able to set cookies successfully in the Viewer Response via a Lambda. That doesn't help me much as the decision needs to be made on the request. Quite unsurprisingly adding this code into the Viewer Request shows nothing in the response.
I would argue that the really correct way to set a nonexistent cookie would be to return a 302 redirect to the same URI with Set-Cookie, and let the browser redo the request. This probably would not have much of an impact since the browser can reuse the same connection to "follow" the redirect.
But if you insist on not doing it that way, then you can inject the cookie into the request with your Viewer Request trigger and then emit a Set-Cookie with the same value in your Viewer Response trigger.
The request object, in a viewer response event, can be found at the same place where it's found in the original request event, event.Records[0].cf.request.
In a viewer-response trigger, this part of the structure contains the "request that CloudFront received from the viewer and that might have been modified by the Lambda function that was triggered by a viewer request event."
Use caution to ensure that you handle the cookie header correctly. The Cookie request header requires careful and accurate manipulation because the browser can use multiple formats when multiple cookies exist.
Once upon a time, cookies were required to be sent as a single request header.
Cookie: foo=bar; buzz=fizz
Parse these by splitting the values on ; followed by <space>.
But the browser may also split them with multiple headers, like this:
Cookie: foo=bar
Cookie: buzz=fizz
In the latter case, the array event.Records[0].cf.request.headers.cookie will contain multiple members. You need to examine the value attribute of each object in that array, check for multiple values within each, as well as accommodating the fact that the array will be completely undefined (not empty) if no cookies exist.
Bonus: Here's a function I wrote, that I believe correctly handles all cases including the case where there are no cookies. It will extract the cookie with the name you are looking for. Cookie names are case-sensitive.
// extract a cookie value from request headers, by cookie name
// const my_cookie_value = extract_cookie(event.Records[0].cf.request.headers,'MYCOOKIENAME');
// returns null if the cookie can't be found
// https://stackoverflow.com/a/55436033/1695906
function extract_cookie(headers, cname) {
const cookies = headers['cookie'];
if(!cookies)
{
console.log("extract_cookie(): no 'Cookie:' headers in request");
return null;
}
// iterate through each Cookie header in the request, last to first
for (var n = cookies.length; n--;)
{
// examine all values within each header value, last to first
const cval = cookies[n].value.split(/;\ /);
const vlen = cval.length;
for (var m = vlen; m--;)
{
const cookie_kv = cval[m].split('=');
if(cookie_kv[0] === cname)
{
return cookie_kv[1];
}
} // for m (each value)
} // for n (each header)
// we have no match if we reach this point
console.log('extract_cookie(): cookies were found, but the specified cookie is absent');
return null;
}
Are you able to add another directory: with the first cookie setter request, return (from the lambda) a redirect which includes the cookie-set header, that redirects to your actual content?
OK, long way round but:
Take cookie instruction from the incoming request
Set this somewhere (cache, etc)
Let the request get your object
on the Response, also call a function that reads the (cache) and sets the set-cookie header on the response if needed?
It's been more than one year since the question was published. I hope you found a solution and you can share it with us!
I am facing the same problem and I've thinking also about the infinite loop... What about this?
The viewer request event sends back a 302 response with the cookie set, e.g. uuid=whatever and a GET parameter added to the URL in the Location header, e.g. _uuid_set_=1.
In the next viewer request where the GET parameter _uuid_set_ is set (and equals 1, but this is not needed), there will be two options:
Either the cookie uuid is not set, in which case you can send back a response 500 to break the loop, or whatever fits your needs,
or the cookie is set, in which case you send another 302 back with the parameter _uuid_set_ removed, so that it is never seen by the end user and cannot be copy-pasted and shared and we all can sleep at night.
I currently have two Lambda#Edge functions:
language-redirect: invoked on viewer request, returns a 302 or passes the request on to CloudFront
HSTS: invoked on viewer response, adds response header
The current flow is:
viewer request -> language-redirect
if 302 -> viewer response
if not 302 -> pass on to CloudFront -> HSTS -> viewer response
Would it be possible to combine both in one function (combined) that is invoked only a single time per viewer request?
viewer request -> combined
if 302 -> viewer response
if not 302 -> pass on to CloudFront -> combined -> viewer response
The goal is to have the same function invoked once, not having the same function invoked twice.
There are 4 distinct trigger events in CloudFront's Lambda#Edge enhancement. Their interaction with the cache is in bold, and becomes important, later:
Viewer Request - fires on every request, before the cache is checked on request arrival; a spontaneously-generated response is not cached
Origin Request - fires only on cache miss, before request goes to the origin; if this trigger generates a response, no HTTP request is sent to the origin, and the response will be stored in the cache, if cacheable
Origin Response - fires only on cache miss, when response returns from origin, before response is checked for cacheability and stored in the cache; if this trigger modifies the response, the modified response is what is stored in the cache, if cacheable
Viewer Response - fires immediately before response is returned to viewer, whether cache hit or miss; any modifications to the response made by this trigger are not cached
One Lambda function, correctly written to understand where it has been fired within the transaction cycle, can be triggered at any combination of these points -- but since these events all occur at different times, it isn't possible to handle more than a single event with one invocation of the trigger function.
However, note the points in bold, above. You can substantially reduce the number of trigger invocations in many cases by using the origin-side triggers. As noted above, using these triggers results in the response of the triggers being cacheable -- so when your redirect trigger fires, if it generates a redirect, the redirect can be cached, and the next request doesn't need to invoke the trigger at all. Similarly, adding your HSTS header to a cacheable response in an origin response trigger means that future cache hits will return the modified response with HSTS headers, without the trigger firing.
Update: in 2021, CloudFront (finally!) launched a new feature called response header policies, which allows you to configure static HTTP response headers as part of the CloudFront distribution. This allows adding of static response headers, such as for HSTS (as mentioned in the original question), without using Lambda#Edge or CloudFront Functions.
I'm looking for a service that allows me to proxy/modify incoming requests inside AWS.
Currently I am using cloudfront, but that has limited functions.
I need to be able to see user agent strings and make proxy decisions based on that - like reverse proxying to another domain, or routing all requests to /index.html.
Anyone know of a service that within AWS - or outside of AWS.
It sounds like you are describing Lambda#Edge, which is a CloudFront enhancement that allows you to define Lambda functions that will fire at any of 4 hook points in the CloudFront signal flow, and modify the request or generate a dynamic response.
Viewer Request triggers allow inspection/modification of requests and dynamic generation of small responses before the cache lookup.
Origin Request triggers are similar, but fire after the cache is checked. They allow you to inspect and modify the request, including changing the origin server, path, and/or query string, or to generate a response instead of allowing CloudFront to proceed with the connection to the origin.
If the request goes to the origin, then once it returns, an Origin Response trigger can fire to modify the response headers or replace the response body with a different body you generate. The response after this trigger is finished with it is what gets stored in the cache, if cacheable.
Once a reaponse is cached, further firing of the Origin Request and Origin Response triggers doesn't occur for subsequent requests that can be served from the cache.
Finally, when the response is ready, whether it came from the cache or the origin, a Viewer Response trigger can modify it further, if desired.
Response triggers can also inspect many of the headers from the original request.
Lambda#Edge functions are written in Node.js, and are presented with the request or responses as simple structured objects that you inspect and/or modify.
I am looking to add the Lambda#Edge to one of our services. The goal is to regex the url for certain values and compare those against a header value to ensure authorization. If the value is present then it is compared and if rejected should return a 403 immediately to the user. If the value compared matches or the url doesn't contain a particular value, then the request continues on as an authorized request.
Initially I was thinking that this would occur with a "viewer request" event. Some of the posts and comments on SO suggest that the "origin request" is more ideal for this check. But right now I've been trying to play around with the examples in the documentation on one of our CF end points but I'm not seeing expected results. The code is the following:
'use strict';
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request;
request.headers["edge-test"] = [{
key: 'edge-test',
value: Date.now().toString()
}];
console.log(require('util').inspect(event, { depth: null }));
callback(null, request);
};
I would expect that there should be a logged value inside cloudwatch and a new header value in the request, yet I'm not seeing any logs nor am I seeing the header value when the request comes in.
Can someone shed some light on why things don't seem to be executing as to what I would think should be the response? Is my understanding of what the expected output wrong? Is there configuration that I may be missing (My distribution ID on the trigger is set to the instance we want, and the behavior was set to '*')? Any help is appreciated :)
First, a few notes;
CloudFront is (among other things) a web cache.
A web cache's purpose is to serve content directly to the browser instead of sending the request to the origin server.
However, one of the most critical things a cache must do correctly is not return the wrong content. One of the ways a cache can return the wrong content is by not realizing that certain request headers may cause the orogin server to vary the response it returns for a given URI.
CloudFront has no perfect way of knowing this, so its solution -- by default -- is to remove almost all of the headers from the request before forwarding it to the origin. Then it caches the received response against exactly the request that it sent to the origin, and will only use that cached response for future identical requests.
Injecting a new header in a Viewer Request trigger will cause that header to be discarded after it passes through the matching Cache Behavior, unless the cache behavior specifically is configured to whitelist that header for forwarding to the origin. This is the same behavior you would see if the header had been injected by the browser, itself.
So, your solution to get this header to pass through to the origin is to whitelist it in the cache behavior settings.
If you tried this same code as an Origin Request trigger, without the header whitelisted, CloudFront would actually throw a 502 Bad Gateway error, because you're trying to inject a header that CloudFront already knows you haven't whitelisted in the matching Cache Behavior. (In Viewer Request, the Cache Behavior match hasn't yet occurred, so CloudFront can't tell if you're doing something with the headers that will not ultimately work. In Origin Request, it knows.) The flow is Viewer Request > Cache Behavior > Cache Check > (if cache miss) Origin Request > send to Origin Server. Whitelisting the header would resolve this, as well.
Any header you want the origin to see, whether it comes from the browser, or a request trigger, must be whitelisted.
Note that some headers are inaccessible or immutable, particularly those that could be used to co-opt CloudFront for fraudulent purposes (such as request forgery and spoofing) and those that simply make no sense to modify.