Amazon S3 object event creation - amazon-web-services

My AWS S3 bucket is associated with a Lambda function. The Lambda function is triggered when a file with a certain file name is inserted. There is an XML file in the same bucket from which the Lambda function reads its settings. The problem I am facing is that whenever the settings file is not present, or the settings in the XML are wrong, the Lambda invocation fails. But once the settings are corrected, the Lambda also gets triggered again for the old files for which it previously failed. I don't want to trigger the Lambda again once it has failed for the same file. Can someone direct me how to do that?

Take a look at the docs here:
http://aws.amazon.com/lambda/faqs/
Lambda will attempt to execute your event at least 3 times before it gives up.
If you can handle the errors in your function (related to reading the config file) so that the Lambda function exits without error, then Lambda will not (normally) run your function again.
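For example, here is a minimal sketch of that approach, assuming the settings live at a key such as settings.xml in the same bucket (the key name and return values are illustrative):

import boto3
import botocore

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    try:
        # 'settings.xml' is an illustrative key; use your actual settings location
        settings_xml = s3.get_object(Bucket=bucket, Key='settings.xml')['Body'].read()
    except botocore.exceptions.ClientError as error:
        # Log the problem and return normally so Lambda does not treat this as a failure
        print(f"Could not read settings file: {error}")
        return {'status': 'skipped'}

    # ... parse settings_xml and process the uploaded file here ...
    return {'status': 'ok'}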

Related

How to move file to a sublevel in s3 without triggering lambda?

An excel file (say my_excel_file.xlsx) will be uploaded to s3://my-bucket/a/b/
A trigger is set in lambda with following properties:
bucket-name: my-bucket, prefix: a/b/
I want my lambda to:
Read the excel file uploaded to s3://my-bucket/a/b/ into a pandas dataframe
After processing it, move the excel file to s3://my-bucket/a/b/archive/ with the name: my_excel_file_timestamp.xlsx
If I am able to achieve the above, will the lambda get invoked recursively? If yes, is there a workaround?
Since the Amazon S3 event is configured to trigger on the prefix a/b/, it will also trigger the AWS Lambda function when an object is placed into a/b/archive/.
I recommend adding a line of code at the top of the Lambda function that checks the Key, which is passed to the function via the event parameter. If the Key starts with a/b/archive/ (or matches a similar rule), the function should exit immediately. This will not incur a significant cost because the function exits quickly and Lambda is only charged per millisecond.
The alternative is to put your archive folder in a different location.
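A minimal sketch of that early-exit check, assuming the standard S3 put-notification event layout (the processing steps are only indicated in comments):

from urllib.parse import unquote_plus

def lambda_handler(event, context):
    # Object keys arrive URL-encoded in S3 event notifications
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Ignore anything already under the archive prefix so the function
    # does not re-process (and re-archive) its own output
    if key.startswith('a/b/archive/'):
        print(f"Skipping archived object: {key}")
        return

    # ... read the Excel file into a pandas DataFrame, process it, then
    # copy it to a/b/archive/my_excel_file_<timestamp>.xlsx and delete
    # the original ...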

Write into a file in S3 directly

I have a requirement where I need to write a long String log for each execution of my Lambda function; this String is basically a log recording the failed and successful cases.
As this code runs in Lambda, it's not possible for me to create a file in a physical location like a local directory and then upload it to S3, so I need to create the file in S3 directly and write the String in one go.
Thanks in advance.
AWS Lambda functions can write data to the /tmp/ directory. There is a 512MB limit, so delete any files before your Lambda function exits so that the space is clear for the next invocation.
You can therefore create a 'local' file and upload it to Amazon S3.
Alternatively, you can use the Amazon S3 API to create an object (PutObject()), specifying the Body content directly without creating an actual file. This might be easier for you.
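A minimal sketch of the PutObject() approach using boto3 (the bucket and key names are illustrative):

import boto3

s3 = boto3.client('s3')

def write_log(log_text):
    # Write the whole log string as a single S3 object in one call;
    # 'my-log-bucket' and the key below are placeholders
    s3.put_object(
        Bucket='my-log-bucket',
        Key='logs/execution-log.txt',
        Body=log_text.encode('utf-8')
    )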
By the way, if your Lambda function prints output, it will automatically be captured by the Lambda service and stored in CloudWatch Logs. It will be intermixed with other messages from Lambda (e.g. the amount of memory used), but it's an easy way to log information.

Is there a way to pass file size as an input parameter from an AWS S3 bucket to a StepFunctions state machine?

I'm currently struggling to configure automated input to my AWS StepFunctions state machine. Basically, I am trying to set up a state machine that is notified whenever an object create event takes place in a certain S3 bucket. When that happens, the input is passed to a choice state which checks the file size. If the file is small enough, it invokes a Lambda function to process the file contents. If the file is too large, it invokes a Lambda to split up the file into files of manageable size, and then invokes the other Lambda to process the contents of those files. The problem with this is that I cannot figure out a way to pass the file size in as input to the state machine.
I am generally aware of how input is passed to StepFunctions, and I know that S3 Lambda triggers contain file size as a parameter, but I still haven't been able to figure out a practical way of passing file size as an input parameter to a StepFunctions state machine.
I would greatly appreciate any help on this issue and am happy to clarify or answer any questions that you have to the best of my ability. Thank you!
Currently, S3 events can't trigger Step Functions directly, so one option is to create an S3 event that triggers a Lambda function. The Lambda function works as a proxy: it passes the file info to the state machine and kicks it off, and you can select only the data you want to pass to Step Functions.
The other option is to configure a state machine as a target for a CloudWatch Events rule. This will start an execution when files are added to an Amazon S3 bucket.
The first option is more flexible.
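A minimal sketch of the proxy Lambda for the first option; the state machine ARN below is illustrative, and the object size comes straight from the S3 event record:

import json
import boto3

sfn = boto3.client('stepfunctions')

def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    # The S3 event notification already includes the object size in bytes
    sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:ProcessFile',  # illustrative
        input=json.dumps({
            'bucket': record['bucket']['name'],
            'key': record['object']['key'],
            'size': record['object']['size']
        })
    )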

AWS Lambda function getting called repeatedly

I have written a Lambda function which is invoked automatically when a file arrives in my S3 bucket.
I perform certain validations on this file, modify it, and put the file back at the same location.
Due to this "put", my Lambda is called again and the process goes on until my Lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and apply a check in the Lambda function, but is there another approach where the use of DynamoDB can be avoided?
You have a couple options:
You can put the file to a different location in s3 and delete the original
You can add a metadata field to the S3 object when you update it, then check for the presence of that field so you know whether you have already processed it (a sketch of this check follows this list). This might not work perfectly, though, since S3 does not always provide the most recent data on reads immediately after updates.
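A minimal sketch of that metadata check, assuming an illustrative user-metadata key named processed:

import boto3

s3 = boto3.client('s3')

def already_processed(bucket, key):
    # User metadata keys are returned lower-cased, without the x-amz-meta- prefix
    head = s3.head_object(Bucket=bucket, Key=key)
    return head['Metadata'].get('processed') == 'true'

def mark_processed(bucket, key):
    # Overwriting the object in place via CopyObject attaches the flag;
    # the event this generates is then filtered out by already_processed()
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={'Bucket': bucket, 'Key': key},
        Metadata={'processed': 'true'},
        MetadataDirective='REPLACE'
    )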
AWS allows different types of S3 event triggers. You can try playing with s3:ObjectCreated:Put vs s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated files in another folder, say
s3://bucket-name/validated.
Update your S3 event notification to invoke your Lambda function whenever there is an ObjectCreated (All) event under the notvalidated/ prefix.
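If you configure that notification programmatically, a sketch with boto3 might look like the following (the bucket name and function ARN are illustrative):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='bucket-name',  # illustrative
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:validate-file',  # illustrative
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'notvalidated/'}
                    ]
                }
            }
        }]
    }
)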
The second answer does not seem to be correct (Put vs Post): there is not really a concept of "update" in S3 in terms of POST or PUT. The request that overwrites an object is the same kind of operation as the one that created it in the first place, so you can't tell them apart by event type. See here for details on the available S3 events.
I had this exact problem last year: I was doing an image resize on PUT, and every time a file was overwritten the function would be triggered again. My recommended solution is to have two folders in your S3 bucket, one for the original file and one for the finalized file. You can then create the Lambda trigger with a prefix filter so it only fires for files in the original folder.
The events are triggered in S3 when an object is Put, Posted, Copied, or a multipart upload completes; all of these operations correspond to ObjectCreated as per the AWS documentation.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object-created event to a particular bucket location (prefix), so that only changes in that location trigger the Lambda function.
You can then write the modified file to some other bucket location which is not configured to trigger the Lambda function when an object is created there.
Hope it helps!

Iterating a list of lists within Python for aws lambda

I have a question related to Python.
The use case is to write a Python function (in AWS Lambda) which will look for a set of files in multiple buckets and take some action, like creating a dummy file in an S3 bucket or triggering another Lambda.
For example:
list1 = ["file1", "file2", "file3"]
list2 = ["file4", "file5", "file6"]
list3 = ["f7", "f8", "f9"]

def lambda_handler(event, context):
    if len(list1) == 9:
        print("something")
        # create dummy file in S3 OR trigger another lambda
    elif len(list2) == 9:
        print("Something")
    else:
        print("all files are not available")
and so on.
I am a bit confused about how to iterate over the three lists and trigger one Lambda for one set of files, say list1, list2, or list3. Alternatively, I could create a dummy file in S3.
Can anyone please help me with the way to do it?
I would recommend this architecture:
Create an AWS Lambda function that is triggered whenever any file is added to the S3 bucket
The Lambda function will receive details of the file that was added (which caused the function to be triggered)
The function can then check whether all the associated files are also present
If they are not present, it simply exits and does nothing
If they are present, it can then do the desired processing or invoke another Lambda function to do the processing
This way, things only happen when files arrive, rather than having to check every n minutes. Also, it will only be triggered on new files arriving, rather than having to skip over existing files that have already been processed or are still awaiting other files.
The only potential danger is if all the desired files arrive within a short space of time. Each file would trigger a separate Lambda function, and each of them might see that all files are available and then attempt to trigger the next process. So be a little careful around that second trigger; you might need to include some logic to make sure the set isn't processed twice.
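A minimal sketch of the "check whether all associated files are present" step described above, assuming illustrative file names and an illustrative marker key:

import boto3

s3 = boto3.client('s3')

REQUIRED_KEYS = {'file1', 'file2', 'file3'}   # illustrative set of expected keys
MARKER_KEY = '_all_files_present'             # illustrative marker object

def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Ignore the marker object itself so writing it doesn't re-trigger processing
    if key == MARKER_KEY:
        return

    # See which of the expected files are already in the bucket
    # (add a Prefix= argument if they live under a common folder)
    response = s3.list_objects_v2(Bucket=bucket)
    present = {obj['Key'] for obj in response.get('Contents', [])}

    if REQUIRED_KEYS.issubset(present):
        # All expected files are there: create the dummy/marker file,
        # or invoke the next Lambda function here instead
        s3.put_object(Bucket=bucket, Key=MARKER_KEY, Body=b'')
    else:
        print('Not all files are available yet; exiting.')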