AWS Lambda@Edge: How to read an HTML file from S3 and put its content in the response body

Specifically, in an origin-response triggered function (e.g. on a 404 status), how can I read an HTML file stored in S3 and use its content as the response body?
(I would like to manually return a custom error page, just as CloudFront does, but choose it based on cookies.)
NOTE: The HTML file is stored in the same S3 bucket as my website, with OAI enabled.
Thank you very much!

Lambda@Edge functions don't currently¹ have direct access to any body content from the origin.
You will need to grant your Lambda Execution Role the necessary privileges to read from the bucket, and then use s3.getObject() from the JavaScript SDK to fetch the object from the bucket, then use its body.
The SDK is already in the environment,² so you don't need to bundle it with your code. You can just require it, and create the S3 client globally, outside the handler, which saves time on subsequent invocations.
'use strict';
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'us-east-2' }); // use the correct region for your bucket
exports.handler ...
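For comparison, here is a minimal Python sketch of the same idea: an origin-response handler that swaps in an HTML body read from S3 when the origin returns a 404. It is an illustration, not the answerer's code. The bucket name, region, key layout, and the `lang` cookie are all hypothetical, and the cookie is only visible to the function if the cache behavior forwards it to the origin. The S3 client is cached at module level so warm invocations reuse it, in the spirit of the advice above.

```python
from http.cookies import SimpleCookie

# Hypothetical values: substitute your own bucket, region, and key layout.
BUCKET = "my-website-bucket"
REGION = "us-east-2"
SUPPORTED_LANGS = {"en", "fr"}

_s3_client = None  # created once per container, reused on later invocations


def fetch_html(bucket, key):
    """Read an object from S3 and return its body as text."""
    global _s3_client
    if _s3_client is None:
        import boto3  # provided by the Lambda runtime
        _s3_client = boto3.client("s3", region_name=REGION)
    obj = _s3_client.get_object(Bucket=bucket, Key=key)
    return obj["Body"].read().decode("utf-8")


def pick_error_key(request):
    """Choose an error page key from a (hypothetical) 'lang' cookie."""
    lang = "en"
    for header in request.get("headers", {}).get("cookie", []):
        cookie = SimpleCookie()
        cookie.load(header["value"])
        if "lang" in cookie and cookie["lang"].value in SUPPORTED_LANGS:
            lang = cookie["lang"].value
    return f"errors/404-{lang}.html"


def handler(event, context, fetch=fetch_html):
    cf = event["Records"][0]["cf"]
    response = cf["response"]
    if response["status"] == "404":  # status is a string in the event
        response["body"] = fetch(BUCKET, pick_error_key(cf["request"]))
        response["headers"]["content-type"] = [
            {"key": "Content-Type", "value": "text/html; charset=utf-8"}
        ]
    return response
```

The `fetch` parameter exists only so the handler can be exercised locally without AWS credentials; Lambda itself calls `handler(event, context)`.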
Note that one of the perceived hassles of updating a Lambda@Edge function is that the Lambda console gives the impression that redeploying it is annoyingly complicated... but you don't have to use the Lambda console to do this. The wording of the "enable trigger and replicate" checkbox gives you the impression that it's doing something important, but it turns out... it isn't. Changing the version number in the CloudFront configuration and saving changes accomplishes the same purpose.
After you create a new version of the function, you can simply go to the Cache Behavior in the CloudFront console and edit the trigger ARN to use the new version number, then save changes.
¹ currently: I have submitted this as a feature request; it could potentially allow a response trigger to receive a copy of the response body and rewrite it. It would necessarily be limited to the maximum size of the Lambda API (or smaller, as generated responses are currently limited), and might not be applicable in this case, since I assume you may be fetching a language-specific response.
² already in the environment: If I remember right, long ago, Lambda@Edge didn't include the SDK, but it is always there now.

Related

How to let AWS lambda to pass a string value to index.html in s3 bucket

I'm currently building a Lambda function that an IoT trigger invokes with an event['key'] value; based on that value, the function should update the index.html stored in an S3 bucket. For example, if event['key'] = 'Yes', the HTML should display the string 'hi'.
I'm not quite sure how I'd be able to update html since I'm fairly new to AWS. I know there's like an API that has that functionality but can't seem to find it. putObject seems fairly close but it's not the one that I'm looking for since it needs to update the string value in html. Any way to do so?
Details can vary based on the environment and stack you are using to write your Lambda.
If you want to update an index.html file located in S3 based on an IoT (or any other) trigger, your Lambda needs to call getObject (read the file from S3), modify the content (by a simple find and replace, or by more advanced parsing, traversing, and DOM manipulation), and then call putObject to write it back to S3.
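The read, modify, write cycle described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `{{message}}` placeholder, the separate `index.template.html` key, and the bucket name are hypothetical. A separate template is used because replacing a placeholder directly in index.html would remove it after the first run.

```python
PLACEHOLDER = "{{message}}"  # hypothetical marker inside the template HTML
BUCKET = "my-site-bucket"    # hypothetical bucket name


def render(template_html, key_value):
    """Return the HTML with the placeholder replaced based on event['key']."""
    message = "hi" if key_value == "Yes" else ""
    return template_html.replace(PLACEHOLDER, message)


def update_index(client, bucket, key_value,
                 template_key="index.template.html", output_key="index.html"):
    """getObject -> modify -> putObject, using any boto3-like S3 client."""
    obj = client.get_object(Bucket=bucket, Key=template_key)
    html = obj["Body"].read().decode("utf-8")
    client.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=render(html, key_value).encode("utf-8"),
        ContentType="text/html",
    )


def handler(event, context):
    import boto3  # provided by the Lambda runtime
    update_index(boto3.client("s3"), BUCKET, event["key"])
```

The Lambda execution role would need read and write permissions on the bucket for this to work.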

AWS Lambda function getting called repeatedly

I have written a Lambda function which gets invoked automatically when a file comes into my S3 bucket.
I perform certain validations on this file, modify its contents, and put the file back at the same location.
Due to this "put", my lambda is called again and the process goes on till my lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and can apply a check in lambda function, but can there be any other approach where DynamoDB's use can be avoided?
You have a couple of options:
You can put the file in a different location in S3 and delete the original.
You can add a metadata field to the S3 object when you update it, then check for the presence of that field so you know whether you have already processed the object. This might not work perfectly, since S3 does not always provide the most recent data on reads after updates.
AWS allows different types of S3 event triggers. You can try experimenting with s3:ObjectCreated:Put vs. s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated in another folder, say
s3://bucket-name/validated.
Update your S3 event notification to invoke your Lambda function whenever there is an ObjectCreated (All) event under the notvalidated/ prefix.
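A minimal Python sketch of the two-prefix approach, using the folder names from this answer. The helper names are hypothetical. The prefix filter on the event notification is what actually breaks the loop; the in-code check is just a second guard in case the trigger is ever misconfigured. Object keys in S3 event payloads are URL-encoded, hence the `unquote_plus`.

```python
from urllib.parse import unquote_plus

SOURCE_PREFIX = "notvalidated/"
TARGET_PREFIX = "validated/"


def target_key(source_key):
    """Map notvalidated/x to validated/x; return None for keys outside the source prefix."""
    if not source_key.startswith(SOURCE_PREFIX):
        return None
    return TARGET_PREFIX + source_key[len(SOURCE_PREFIX):]


def keys_to_process(event):
    """Return (source_key, destination_key) pairs for records that should be handled."""
    pairs = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])
        dest = target_key(key)
        if dest is not None:
            pairs.append((key, dest))
    return pairs
```

The handler would then validate each source object and write the result to the destination key, which never triggers the function again.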
The second answer does not seem to be correct (put vs post) - there is not really a concept of update in S3 in terms of POST or PUT. The request to update an object will be the same as the initial POST of the object. See here for details on the available S3 events.
I had this exact problem last year: I was doing an image resize on PUT, and every time a file was overwritten, the function would be triggered again. My recommended solution is to have two folders in your S3 bucket, one for the original file and one for the finalized file. You can then create the Lambda trigger with a prefix filter so it only fires for files in the original folder.
S3 triggers events when an object is created via Put, Post, Copy, or CompleteMultipartUpload. All of these operations correspond to ObjectCreated, as per the AWS documentation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object-created event to a particular bucket location (prefix), so that only changes in that location trigger the Lambda function.
You can then write the modified file to some other location that is not configured to trigger the Lambda function.
Hope it helps!

Amazon S3 Object Lifecycle Management via header

I've been searching for an answer to this question for quite some time but apparently I'm missing something.
I use s3cmd heavily to automate document uploads to AWS S3, via script. One of the parameters that can be used in s3cmd is --add-header, which I assume allows for lifecycle rules to be added.
My objective is to add this parameter and specify a +X (where X is days) for the upload. In the event of ... --add-header=...1 ... the lifecycle rule would delete this file after 24h.
I know this can be easily done via the console, but I would like to have a more detailed control over individual files/scripts.
I've read the parameters that can be passed to S3 via s3cmd, but I somehow can't understand how to put all of those together to get the intended result.
Thank you very much for any help or assistance!
The S3 API itself does not implement support for any request header that triggers lifecycle management at the object level.
The --add-header option for s3cmd can add headers that S3 understands, such as Content-Type, but there is no lifecycle header you can send using any tool.
You might be thinking of this:
If you make a GET or a HEAD request on an object that has been scheduled for expiration, the response will include an x-amz-expiration header that includes this expiration date and the corresponding rule Id
https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/
This is a response header, and is read-only.
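To illustrate the read-only side, here is a small Python sketch that parses an x-amz-expiration value into its parts. The sample format (expiry-date="...", rule-id="...") is my understanding of what S3 returns and should be treated as an assumption; with boto3, I believe the same string is exposed as the `Expiration` field of a `head_object` response.

```python
import re
from email.utils import parsedate_to_datetime


def parse_expiration(header_value):
    """Split an x-amz-expiration value into an expiry datetime and a rule id."""
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', header_value))
    expiry = fields.get("expiry-date")
    return {
        "expiry_date": parsedate_to_datetime(expiry) if expiry else None,
        "rule_id": fields.get("rule-id"),
    }


# Assumed example value, for illustration only.
sample = 'expiry-date="Fri, 21 Dec 2012 00:00:00 GMT", rule-id="picture-deletion-rule"'
info = parse_expiration(sample)
print(info["rule_id"], info["expiry_date"].date())
```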

Resize images on the fly in CloudFront and get them in the same URL instantly: AWS CloudFront -> S3 -> Lambda -> CloudFront

TLDR: We have to trick CloudFront's 307 redirect caching by creating a new cache behavior for responses coming from our Lambda function.
You will not believe how close we are to achieving this. We are badly stuck on the last step.
Business case:
Our application stores images in S3 and serves them with CloudFront in order to avoid any geographic slow downs around the globe.
Now, we want to be really flexible with the design and be able to request new image dimensions directly in the CloudFront URL!
Each new image size will be created on demand and then stored in S3, so the second time it is requested it will be served really quickly, as it will exist in S3 and will also be cached in CloudFront.
Let's say the user has uploaded the image chucknorris.jpg.
Only the original image will be stored in S3, and it will be served on our page like this:
//xxxxx.cloudfront.net/chucknorris.jpg
We have calculated that we now need to display a thumbnail of 200x200 pixels.
Therefore we set the image src in our template to:
//xxxxx.cloudfront.net/chucknorris-200x200.jpg
When this new size is requested, AWS has to create it on the fly in the same bucket and under the requested key.
This way the image will be directly loaded in the same URL of CloudFront.
I made an ugly drawing with the architecture overview and the workflow on how we are doing this in AWS:
Here is how Python Lambda ends:
return {
    'statusCode': '301',
    'headers': {'location': redirect_url},
    'body': ''
}
The problem:
If we make the Lambda function redirect to S3, it works like a charm.
If we redirect to CloudFront, it goes into redirect loop because CloudFront caches 307 (as well as 301, 302 and 303).
As soon as our Lambda function redirects to CloudFront, CloudFront calls the API Gateway URL instead of fetching the image from S3.
I would like to create a new cache behavior in CloudFront's Behaviors settings tab.
This behavior should not cache responses from Lambda or S3 (I don't know exactly what is happening internally there), but should still cache any subsequent requests for this very same resized image.
I am trying to set the path pattern -\d+x\d+\..+$, add the ARN of the Lambda function under "Lambda Function Association", and set the Event Type to Origin Response.
Next to that, I am setting the "Default TTL" to 0.
But I cannot save the behavior due to some error:
Are we on the right way, or is the idea of this "Lambda Function Association" totally different?
Finally I was able to solve it. Although this is not really a structural solution, it does what we need.
First, thanks to Michael's answer, I have used path patterns to match all media types. Second, the Cache Behavior page was a bit misleading to me: the Lambda association is indeed for Lambda@Edge, although none of the tooltips on the cache behavior page say so; all you see is just "Lambda". This feature cannot help us, as we do not want to extend our AWS service scope with Lambda@Edge just because of this particular problem.
Here is the solution approach:
I have defined multiple cache behaviors, one per media type that we support:
For each cache behavior I set the Default TTL to be 0.
And the most important part: In the Lambda function, I have added a Cache-Control header to the resized images when putting them in S3:
s3_resource.Bucket(BUCKET).put_object(Key=new_key,
                                      Body=edited_image_obj,
                                      CacheControl='max-age=12312312',
                                      ContentType=content_type)
To validate that everything works, I can now see that the new image dimension is served with the cache header in CloudFront.
You're on the right track... maybe... but there are at least two problems.
The "Lambda Function Association" that you're configuring here is called Lambda@Edge, and it's not yet available. The only users who can access it are those who have applied to be included in the limited preview. The "maximum allowed is 0" error means you are not a preview participant. I have not seen any announcements related to when this will be live for all accounts.
But even once it is available, it's not going to help you, here, in the way you seem to expect, because I don't believe an Origin Response trigger allows you to do anything to trigger CloudFront to try a different destination and follow the redirect. If you see documentation that contradicts this assertion, please bring it to my attention.
However... Lambda@Edge will be useful for setting Cache-Control: no-cache on the 307 so CloudFront won't cache it, but the redirect itself will still need to go all the way back to the browser.
Note also, Lambda@Edge only supports Node, not Python... so maybe this isn't even part of your plan, yet. I can't really tell, from the question.
Read about the Lambda@Edge limited preview.
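As an illustration of the Cache-Control idea, an Origin Response trigger that marks redirects as non-cacheable might look like the sketch below. It is written in Python only for consistency with the question's code; the answer describes Lambda@Edge as Node-only at the time, and Python runtimes were added to Lambda@Edge later. The set of statuses is an assumption based on the redirect codes mentioned in the question.

```python
# Redirect statuses the question says CloudFront caches.
REDIRECT_STATUSES = {"301", "302", "303", "307"}


def handler(event, context):
    """Origin-response trigger: tell CloudFront not to cache redirects."""
    response = event["Records"][0]["cf"]["response"]
    if response["status"] in REDIRECT_STATUSES:
        # Lambda@Edge headers are keyed by lowercase name, each a list of key/value dicts.
        response["headers"]["cache-control"] = [
            {"key": "Cache-Control", "value": "no-cache"}
        ]
    return response
```

The browser still has to follow the redirect itself; this only stops CloudFront from storing it.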
The second problem:
I am trying to set path pattern -\d+x\d+\..+$
You can't do that. Path patterns are string matches supporting * wildcards. They are not regular expressions. You might get away with /*-*x*.jpg, though, since multiple wildcards appear to be supported.
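To get a feel for how much looser a wildcard pattern is than the regular expression, Python's `fnmatchcase` can serve as a rough local stand-in for CloudFront's matching. This is an approximation, not CloudFront's actual matcher: `fnmatch` also gives special meaning to `?` and `[...]`, which may not behave the same way in path patterns.

```python
from fnmatch import fnmatchcase

PATTERN = "/*-*x*.jpg"  # the wildcard pattern suggested above


def matches(path, pattern=PATTERN):
    """Case-sensitive wildcard match, approximating a CloudFront path pattern."""
    return fnmatchcase(path, pattern)


for path in ["/chucknorris-200x200.jpg", "/chucknorris.jpg", "/a-box.jpg"]:
    print(path, matches(path))
```

The last path shows the trade-off: anything containing a hyphen followed later by an "x" matches, so original images with such names would also hit this behavior.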

Limit Size Of Objects While Uploading To Amazon S3 Using Pre-Signed URL

I know of limiting the upload size of an object using this method: http://doc.s3.amazonaws.com/proposals/post.html#Limiting_Uploaded_Content
But I would like to know how it can be done while generating a pre-signed URL using the S3 SDK on the server side as an IAM user.
This SDK method has no such option in its parameters: http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property
Neither in this:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property
Please note: I already know of this answer: AWS S3 Pre-signed URL content-length, and it is NOT what I am looking for.
The V4 signing protocol offers the option to include arbitrary headers in the signature. See:
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-query-string-auth.html
So, if you know the exact Content-Length in advance, you can include that in the signed URL. Based on some experiments with curl, S3 will truncate the file if you send more than specified in the Content-Length header. Here is an example V4 signature with multiple headers in the signature:
http://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html
You may not be able to limit content upload size ex-ante, especially considering POST and multipart uploads. You could use AWS Lambda to create an ex-post solution: set up a Lambda function to receive notifications from the S3 bucket, have the function check the object size, and have it delete the object or take some other action.
Here's some documentation on Handling Amazon S3 Events Using AWS Lambda.
For any other wanderers that end up on this thread: if you set the Content-Length attribute when sending the request from your client, there are a few possibilities:
The Content-Length is calculated automatically, and S3 will store up to 5GB per file
The Content-Length is manually set by your client, which means one of these three scenarios will occur:
The Content-Length matches your actual file size and S3 stores it.
The Content-Length is less than your actual file size, so S3 will truncate your file to fit it.
The Content-Length is larger than your actual file size, and you will receive a 400 Bad Request
In any case, a malicious user can override your client and manually send an HTTP request with whatever headers they want, including a much larger Content-Length than you may be expecting. Signed URLs do not protect against this! The only way is to set up a POST policy. Official docs here: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTConstructPolicy.html
More details here: https://janac.medium.com/sending-files-directly-from-client-to-amazon-s3-signed-urls-4bf2cb81ddc3?postPublishedType=initial
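As a rough sketch of what such a POST policy contains, the Python below builds a policy document with a content-length-range condition and base64-encodes it. The function names are hypothetical, and the policy is deliberately incomplete: a real SigV4 POST policy also needs the x-amz-credential, x-amz-algorithm, and x-amz-date conditions plus a signature. In practice boto3's `generate_presigned_post` can produce all of that when given `Conditions=[["content-length-range", 0, max_bytes]]`.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key, max_bytes, expires_in=300, now=None):
    """Build an (unsigned) S3 POST policy that caps the upload size."""
    now = now or datetime.now(timezone.utc)
    expiration = (now + timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            {"key": key},
            # S3 rejects uploads whose size falls outside this range.
            ["content-length-range", 0, max_bytes],
        ],
    }


def encode_policy(policy):
    """Base64-encode the policy as it would be sent in the form's 'policy' field."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
```

Unlike a Content-Length baked into a signed URL, the size limit here is enforced by S3 against the policy, so a client cannot simply send different headers.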
Alternatively, you can have a Lambda that automatically deletes files that are larger than expected.
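A minimal Python sketch of such a cleanup Lambda, assuming it is subscribed to the bucket's object-created notifications. The 5 MB limit and the helper names are hypothetical. The object size is read from the event record, and keys are URL-decoded because S3 encodes them in event payloads.

```python
from urllib.parse import unquote_plus

MAX_BYTES = 5 * 1024 * 1024  # hypothetical limit


def oversized_objects(event, max_bytes=MAX_BYTES):
    """Return (bucket, key) pairs for event records larger than max_bytes."""
    found = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        size = s3_info["object"].get("size", 0)
        if size > max_bytes:
            found.append((s3_info["bucket"]["name"],
                          unquote_plus(s3_info["object"]["key"])))
    return found


def handler(event, context, client=None):
    if client is None:
        import boto3  # provided by the Lambda runtime
        client = boto3.client("s3")
    deleted = oversized_objects(event)
    for bucket, key in deleted:
        client.delete_object(Bucket=bucket, Key=key)
    return deleted
```

This is an after-the-fact control: the oversized object is stored, and billed, briefly before the function removes it.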