Trigger AWS Lambda in Java for the newly uploaded file

I am working on a requirement where I want to trigger an AWS Lambda function written in Java when a file is uploaded to an S3 bucket. The condition is that the function should pick up the latest file in the bucket. Right now, I have a Lambda function which picks up a specific file (the file name is hard-coded). But as per the requirement, the file name can be anything (e.g. web-log-). Is there any way to do that?
Since Lambda functions have access to the event object, can I use it to find out which file was just uploaded?

You could check out the AWS Lambda S3 tutorials, which should show how the uploaded object is passed in as event data. The example code contains a line which should point you in the right direction:
event.Records[0].s3.object.key
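For a Java handler, the same information is exposed through the aws-lambda-java-events library. A minimal sketch (the class name and logging are illustrative, not your actual code):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class NewFileHandler implements RequestHandler<S3Event, String> {
    @Override
    public String handleRequest(S3Event event, Context context) {
        // The notification already identifies the object that was just created,
        // so there is no need to search the bucket for the "latest" file.
        String bucket = event.getRecords().get(0).getS3().getBucket().getName();
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        // Note: keys containing spaces or special characters arrive URL-encoded.
        context.getLogger().log("New object: s3://" + bucket + "/" + key);
        return key;
    }
}

This works for any file name (web-log-..., etc.), since the event always refers to whichever object triggered it.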

Related

AWS S3 trigger Lambda on new files that are NOT temporary files

I want to launch a Lambda for any new, complete file. The process is quite simple:
I upload files to S3.
For every new file in the directory I launch a Lambda.
Unfortunately, I see that my Lambdas are invoked for _temporary/* files, which are files that are not fully uploaded to S3. What can I do?
Thanks!
There is no concept of a "partially uploaded" file in Amazon S3. Either the whole object is created, or the object is not created. Nor does Amazon S3 have a concept of "temporary" or _temporary/ files. If they exist, it is because your application is uploading those files.
When creating an Amazon S3 event, you can specify a Prefix. The event will only be triggered for objects matching that prefix.
Alternatively, you could add a line of code at the start of the AWS Lambda function that checks the Key of the object and exits if it does not need to perform any actions on the object.
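As a sketch of that early-exit check in a Java handler (the _temporary/ pattern is taken from the question; everything else here is illustrative):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class CompletedFileHandler implements RequestHandler<S3Event, Void> {
    @Override
    public Void handleRequest(S3Event event, Context context) {
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        // Skip the writer's intermediate output; only act on finished files.
        if (key.startsWith("_temporary/") || key.contains("/_temporary/")) {
            return null;
        }
        // ... process the completed file here ...
        return null;
    }
}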

AWS lambda function and Athena to create partitioned table

Here are my requirements. Every day I'm receiving a CSV file into an S3 bucket. I need to partition that data and store it as Parquet to eventually map a table. I was thinking about using an AWS Lambda function that is triggered whenever a file is uploaded. I'm not sure what the steps are to do that.
There are (as usual in AWS!) several ways to do this; the first 2 that come to mind are:
using a CloudWatch Event, with an S3 PutObject (Object-level) action as the trigger and the Lambda function you have already created as the target.
starting from the Lambda function itself, where it is slightly easier to add suffix-filtered triggers (e.g. for any .csv file): go to the function configuration in the Console, add a trigger in the Designer section, choose S3 and set the options you want (bucket, event type, prefix, suffix).
In both cases you will need to write the Lambda function to do the work you have described, and it will need IAM access to the bucket to pull the files and process them.
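As a rough sketch of that Lambda in Java, assuming you already have a raw CSV table and a partitioned Parquet table defined in Athena (the database, table, and bucket names below are placeholders, not part of the question), the function can simply submit an Athena query that rewrites the new data into Parquet:

import com.amazonaws.services.athena.AmazonAthena;
import com.amazonaws.services.athena.AmazonAthenaClientBuilder;
import com.amazonaws.services.athena.model.QueryExecutionContext;
import com.amazonaws.services.athena.model.ResultConfiguration;
import com.amazonaws.services.athena.model.StartQueryExecutionRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class CsvToParquetHandler implements RequestHandler<S3Event, String> {
    private final AmazonAthena athena = AmazonAthenaClientBuilder.defaultClient();

    @Override
    public String handleRequest(S3Event event, Context context) {
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        context.getLogger().log("Converting " + key);
        // INSERT INTO rewrites the raw CSV rows into the partitioned Parquet table.
        // Table/database names and the partition column are assumptions for this sketch.
        String sql = "INSERT INTO daily_parquet SELECT *, current_date AS dt FROM daily_csv_raw";
        StartQueryExecutionRequest request = new StartQueryExecutionRequest()
                .withQueryString(sql)
                .withQueryExecutionContext(new QueryExecutionContext().withDatabase("mydb"))
                .withResultConfiguration(new ResultConfiguration()
                        .withOutputLocation("s3://my-athena-results-bucket/"));
        return athena.startQueryExecution(request).getQueryExecutionId();
    }
}

The function's IAM role would also need Athena, Glue and S3 permissions for this to run.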

AWS Lambda S3 event infinite loop

I want to use S3 events to publish to AWS Lambda whenever a video file (.mp4) gets uploaded, so that it can be compressed. The problem is that the path to the video file is stored in RDS, so I want the path to remain the same after compression. From what I've read, replacing the file will again call the Object Created event leading to an infinite loop.
Is there any way to replace the file without triggering any event? What are my options?
You are correct that you cannot completely distinguish these cases. From the documentation, the following events are supported:
s3:ObjectCreated:Put – An object was created by an HTTP PUT operation.
s3:ObjectCreated:Post – An object was created by an HTTP POST operation.
s3:ObjectCreated:Copy – An object was created by an S3 copy operation.
s3:ObjectCreated:CompleteMultipartUpload – An object was created by the completion of an S3 multipart upload.
s3:ObjectCreated:* – An object was created by one of the event types listed above or by a similar object creation event added in the future.
s3:ReducedRedundancyObjectLost – An S3 object stored with Reduced Redundancy has been lost.
The architecture that I would generally use for this type of problem has 2 S3 buckets:
1 S3 bucket stores the source material without any modification; this is the bucket that triggers the Lambda event.
1 S3 bucket stores the processed artifact, i.e. the compressed output.
By doing this you keep the original, and can re-run the processing if needed to correct anything.
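A bare-bones sketch of that layout in Java (the destination bucket name is a placeholder, and a simple copy stands in for the actual compression step):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class VideoCompressHandler implements RequestHandler<S3Event, Void> {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private static final String PROCESSED_BUCKET = "my-processed-videos"; // destination bucket

    @Override
    public Void handleRequest(S3Event event, Context context) {
        String sourceBucket = event.getRecords().get(0).getS3().getBucket().getName();
        String key = event.getRecords().get(0).getS3().getObject().getKey();
        // Real compression (e.g. ffmpeg) would happen here; the copy is a placeholder.
        // Writing to a different bucket means the event can never re-trigger this function.
        s3.copyObject(sourceBucket, key, PROCESSED_BUCKET, key);
        return null;
    }
}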
There is an ungraceful solution to this problem, which is not documented anywhere.
The event parameter in the Lambda function contains a userIdentity dict which contains a principalId. For an event which originated from AWS Lambda itself (like updating the S3 object, as mentioned in the question), this principalId has the name of the Lambda function appended at the end.
Therefore, by checking the principalId one can deduce whether the event came from Lambda or not, and accordingly compress or not.
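In a Java handler that check would look roughly like this, relying on the (undocumented, so potentially fragile) behaviour described above:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class CompressOnceHandler implements RequestHandler<S3Event, Void> {
    @Override
    public Void handleRequest(S3Event event, Context context) {
        String principalId = event.getRecords().get(0).getUserIdentity().getPrincipalId();
        // If this function itself wrote the object, the principalId ends with the
        // function name (undocumented behaviour); bail out to avoid the infinite loop.
        if (principalId != null && principalId.endsWith(context.getFunctionName())) {
            return null;
        }
        // ... compress the video and overwrite the object here ...
        return null;
    }
}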

Does AWS S3 have a concept of files being 'updated'?

I'd like to write a Lambda function that is triggered when files are added or modified in an s3 bucket and processes them and moves them elsewhere, clobbering older versions of the files.
I'm wondering if AWS Lambda can be configured to trigger when files are updated?
After reviewing the Boto3 documentation for S3, it looks like the only things that can happen to objects in an S3 bucket are creations and deletions.
Additionally, the AWS documentation seems to indicate there is no way to trigger things on 'updates' to S3.
Am I correct in thinking there is no real concept of an 'update' to a file in S3 and that an update would actually be when something was destroyed and recreated? If I'm mistaken, how can I trigger a Lambda function when an S3 file is changed in a bucket?
No, there is no concept of updating a file on S3. A file on S3 is updated the same way it is uploaded in the first place - through a PUT object request. (Relevant answer here.) An S3 bucket notification configured to trigger on a PUT object request can execute a Lambda function.
There is now new functionality for S3 buckets: under Properties there is the option to enable versioning for the bucket. If you set a trigger for object creation on S3 assigned to your Lambda function, it will be executed every time you 'update' the same file, since each 'update' creates a new version.
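If you prefer to switch versioning on programmatically rather than in the console, the v1 AWS SDK for Java has a call for it (the bucket name below is a placeholder):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class EnableVersioning {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Turn on versioning so every overwrite of the same key creates a new version
        // (and still fires the ObjectCreated notification).
        s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                "my-bucket",
                new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));
    }
}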

AWS Lambda and zip upload from S3

The benefits of this feature are not clear to me (I didn't find any good documentation):
Is it just faster in the case where you reuse the same zip for many Lambda functions, because you upload it only once and just give the S3 link URL to each Lambda function?
If you use an S3 link, will all your Lambda functions be updated with the latest code automatically when you re-upload the zip file? In other words, is the zip file on S3 a "reference" that is read on each call to a Lambda function?
Thank you.
EDIT:
I have been asked "Why do you want the same code for multiple Lambda functions anyway?"
Because I use AWS Lambda with AWS API Gateway, I have 1 project with all my handlers, which are the actual "endpoints" of my RESTful API.
EDIT #2:
I confirm that uploading a modified version of the zip file on S3 doesn't change the existing lambda functions result.
If an AWS person reads this message, it would be great to have a kind of batch-update feature that updates a set of selected Lambda functions from 1 zip file on S3 in 1 click (or even an "automatic update" feature that detects when the file has been updated ;-)).
Let's say you have 50 handlers in 1 project and you modify something global impacting all of them: currently you have to go through all your Lambda functions and update the zip file manually...
The code is imported from the zip into Lambda. It is exactly the same as uploading the zip file through the Lambda console or API. However, if your Lambda function package is big (they say >10MB), they recommend uploading it to S3 and then using the S3 import functionality, because that is more stable than uploading directly from the Lambda page. Other than that, there is no benefit.
So for question 1: no. Why do you want the same code for multiple Lambda functions anyway?
Question 2: If you overwrite the zip you will not update the Lambda function code.
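If you do want several functions to pick up a re-uploaded zip, you currently have to call UpdateFunctionCode for each of them yourself. A small sketch with the v1 AWS SDK for Java (the function names and S3 location are placeholders, not anything from the question):

import java.util.List;
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.UpdateFunctionCodeRequest;

public class BatchLambdaUpdater {
    public static void main(String[] args) {
        AWSLambda lambda = AWSLambdaClientBuilder.defaultClient();
        List<String> functions = List.of("handler-one", "handler-two", "handler-three");
        for (String name : functions) {
            // Point every function at the same freshly uploaded zip on S3.
            lambda.updateFunctionCode(new UpdateFunctionCodeRequest()
                    .withFunctionName(name)
                    .withS3Bucket("my-deploy-bucket")
                    .withS3Key("yourzipfile.zip"));
        }
    }
}

Run after each upload of the zip, this is effectively the "batch update" wished for in the question's second edit.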
To add to other people's use cases, having the ability to update a Lambda function from S3 is extremely useful within an automated deployment / CI process.
The instructions under New Deployment Options for AWS Lambda include a simple Lambda function that can be used to copy a ZIP file from S3 to Lambda itself, as well as instructions for triggering its execution when a new file is uploaded.
As an example of how easy this can make development and deployment, my current workflow is:
I update my Node lambda application on my local machine, and git commit it to a remote repository.
A Jenkins instance picks up the commit, pulls down the appropriate files, adds them into a ZIP file and uploads this to an S3 bucket.
The LambdaDeployment function then automatically deploys this new version for me, without me needing to even leave my development environment.
To answer what I think is the essence of your question, AWS allows you to use S3 as the origin for your Lambda zip file because sometimes uploading large files via your browser can time out. Also, storing your code on S3 allows you to store it centrally, rather than on your computer, and I'm sure there is a CodeCommit tie-in there as well.
Using the S3 method of uploading your code to Lambda also allows you to upload larger files (AWS has a 10MB limit when uploading via web browser).
#!/bin/bash
cd /your/workspace
# Zip up the new code (refresh the archive in place, excluding git metadata, binaries and old zips)
zip -FSr yourzipfile.zip . -x *.git* *bin/\* *.zip
# Update the Lambda function code from the new zip file
aws lambda update-function-code --function-name arn:aws:lambda:us-west-2:YOURID:function:YOURFUNCTIONNAME --zip-file file://yourzipfile.zip
# Push the new zip file to the S3 bucket used as the CloudFormation lambda CodeUri source
aws s3 cp yourzipfile.zip s3://YOURBUCKET/yourzipfile.zip
This depends on the AWS CLI being installed and an AWS profile being set up:
aws --profile yourProfileName configure