Temporarily disable DynamoDB Lambda Triggers / Stream - amazon-web-services

I'm looking for a way to temporarily disable Lambda triggers on a DynamoDB. I want to be able do apply manual Updates on a table (e.g. such as importing data from a S3 backup) without the Lambda code being triggers. I tried the disable button next to the trigger in the lambda functions "Trigger" tab. I also tried to disable the whole Stream for the table. In both cases, when reactivating the trigger/stream all the trigger events (that happened, while they were deactivated) are executed then.
How can i prevent this code being triggered?
Thank you very much!

For others that arrive at this answer - https://alestic.com/2015/11/aws-lambda-kinesis-pause-resume/ provides a CLI solution for pausing stream reading, and resuming from the same place at some point in the future.

Related

How to handle AWS Lambda timeout?

I have a lambda function that generates thumbnails on S3 Put event. It works fine. But I want to handle the case when it accidentally takes longer than reserved time(3 sec).
It's because I'm fetching the lambda generated thumbnail by suffixing '-small.jpg' or '-medium.jpg'. If the timeout happens and the thumbnails are not generated, I must have an alternative image in my bucket.
if you want to increase function timeout you can edit in the general setting of your function. steps and screenshot below will explain how to do it.
Click on the lambda function hyperlink and click on General Configuration.
click on edit [top right pane], and increase the function timeout.
2:
I suspect that creating thumbnails will not take more than 15 minutes at maximum Lambda RAM (and correspondingly max CPU) but if you need to handle that possibility then configure your Lambda function with a DLQ and trigger subsequent processing of failed Lambda functions.

Prevent lambda from processing dynamodb stream events during table recovery

I'm working on preparing a disaster recovery plan for DynamoDB. In a DR situation we would create a temporary table to restore a snapshot to. From the Temp table we would copy data to a table that has been provisioned with IAC. Our DynamoDB tables have a stream and associated lambda trigger which would process all events that were copied in from the data in the temp table, this is unwanted and would cause a bunch of downstream issues.
Ideally I would like to disable the stream/ lambda trigger until the restore is complete and then enable and not process/ignore any of the changes from the copy/restore process.
I've read through DynamoDB stream documentation and it isn't clear to me if disabling the stream will clear events. It is my understanding that disabling/ enabling the DynamoDB stream although will provide you with a new arn is still the same stream behind the scenes and that log lives for 24 hours and once enable events would be sent to the lambda trigger.
Seems I might be able to configure the trigger on the lambda side, disable it, and then set the ShardIteratorType to 'LATEST' in order to prevent reading events from copied data.
Thanks in advance for any advice.
For anyone traveling down this path in the future I received some good answers on AWS Re:post here.
Some take aways:
When you disable/enable a stream it changes ARN and is completely separate to the old stream.
If the stream is disabled and you copy data to the table the stream log will remain empty once you reenable.
lambda event source mapping solution: LATEST will start reading from the the point of when you enable the trigger on the ESM. So if your copy puts events on the stream and you later create an ESM with iterator position as LATEST your Lambda will disregard all the data from the copy.

Automated Real Time Data Processing on AWS with Lambda

I am interested in doing automated real-time data processing on AWS using Lambda and I am not certain about how I can trigger my Lambda function. My data processing code involves taking multiple files and concatenating them into a single data frame after performing calculations on each file. Since files are uploaded simultaneously onto S3 and files are dependent on each other, I would like the Lambda to be only triggered when all files are uploaded.
Current Approaches/Attempts:
-I am considering an S3 trigger, but my concern is that an S3 Trigger will result in an error in the case where a single file upload triggers the Lambda to start. An alternate option would be adding a wait time but that is not preferred to limit the computation resources used.
-A scheduled trigger using Cloudwatch/EventBridge, but this would not be real-time processing.
-SNS trigger, but I am not certain if the message can be automated without knowing the completion in file uploads.
Any suggestion is appreciated! Thank you!
If you really cannot do it with a scheduled function, the best option is to trigger a Lambda function when an object is created.
The tricky bit is that it will fire your function on each object upload. So you either can identify the "last part", e.g., based on some meta data, or you will need to store and track the state of all uploads, e.g. in a DynamoDB, and do the actual processing only when a batch is complete.
Best, Stefan
Your file coming in parts might be named as -
filename_part1.ext
filename_part2.ext
If any of your systems is generating those files, then use the system to generate a final dummy blank file name as -
filename.final
Since in your S3 event trigger you can use a suffix to generate an event, use .final extension to invoke lambda, and process records.
In an alternative approach, if you do not have access to the server putting objects to your s3 bucket, then with each PUT operation in your s3 bucket, invoke the lambda and insert an entry in dynamoDB.
You need to put a unique entry per file (not file parts) in dynamo with -
filename and last_part_recieved_time
The last_part_recieved_time keeps getting updated till you keep getting the file parts.
Now, this table can be looked up by a cron lambda invocation which checks if the time skew (time difference between SYSTIME of lambda invocation and dynamoDB entry - last_part_recieved_time) is enough to process the records.
I will still prefer to go with the first approach as the second one still has a chance for error.
Since you want this to be as real time as possible, perhaps you could just perform your logic every single time a file is uploaded, updating the version of the output as new files are added, and iterating through an S3 prefix per grouping of files, like in this other SO answer.
In terms of the architecture, you could add in an SQS queue or two to make this more resilient. An S3 Put Event can trigger an SQS message, which can trigger a Lambda function, and you can have error handling logic in the Lambda function that puts that event in a secondary queue with a visibility timeout (sort of like a backoff strategy) or back in the same queue for retries.

AWS - want to upload multiple files to S3 and only when all are uploaded trigger a lambda function

I am seeking advice on what's the best way to design this -
Use Case
I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.
Naive Approach
The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
A basic implementation would be to take my existing lambda function which is invoked anytime my S3 bucket is written into, and have it check manually whether all the other files been saved.
The Lambda function would know (look in Dynamo to determine what we're looking for) and query S3 to see if the other files are in. If so, use SNS to trigger my other lambda that will do the other work.
Edit: Another approach is have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.
Thoughts
I don't think this is a good idea. Mainly because Lambda functions should be lightweight, and polling the database from within the Lambda function to get the S3 keys of all the uploaded files and then checking in S3 if they are there - doing this each time seems ghetto and very repetitive.
What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).
Edit In response to mbaird's suggestions below-
Option 1 (SNS) This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principal. That is, the client uploads the files and sends a notification (via SNS) that its work is done.
Option 2 (Dynamo streams) So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case, results in a table update vs. a SNS notification (Option 1). This update would trigger the Lambda function, as opposed to notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.
In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D
Option 3 with S3 triggers My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementations we use) query Dynamo and gets back N as the completed uploads (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!
So Option 1 wins.
If you don't want the client program responsible for invoking the Lambda function directly, then would it be OK if it did something a bit more generic?
Option 1: (SNS) What if it simply notified an SNS topic that it had completed a batch of S3 uploads? You could subscribe your Lambda function to that SNS topic.
Option 2: (DynamoDB Streams) What if it simply updated the DynamoDB record with something like an attribute record.allFilesUploaded = true. You could have your Lambda function trigger off the DynamoDB stream. Since you are already creating a DynamoDB record via the client, this seems like a very simple way to mark the batch of uploads as complete without having to code in knowledge about what needs to happen next. The Lambda function could then check the "allFilesUploaded" attribute instead of having to go to S3 for a file listing every time it is called.
Alternatively, don't insert the DynamoDB record until all files have finished uploading, then your Lambda function could just trigger off new records being created.
Option 3: (continuing to use S3 triggers) If the client program can't be changed from how it works today, then instead of listing all the S3 files and comparing them to the list in DynamoDB each time a new file appears, simply update the DynamoDB record via an atomic counter. Then compare the result value against the size of the file list. Once the values are the same you know all the files have been uploaded. The down side to this is that you need to provision enough capacity on your DynamoDB table to handle all the updates, which is going to increase your costs.
Also, I agree with you that SWF is overkill for this task.

AWS Lambda: error creating the event source mapping: Configuration is ambiguously defined

There was an error creating the event source mapping: Configuration is ambiguously defined. Cannot have overlapping suffixes in two rules if the prefixes are overlapping for the same event type.
I created an event earlier from the GUI console 6-7 days ago and it was working fine. The next day the event just missing, i cant see it anymore at the Lambda console GUI. But every S3 objects still seems triggering the lambda function not a problem. If i cant see, it is not good; So i deleted the Lambda function, waited for 5-10 seconds before creating another new function. And now, i receive the same above when i try to create the event sources like this:
When i click "Submit" the event sources tab says "You do not have any event sources for this function", Lambda does not get triggered; it means the entire application flow is now broken :(
The problem is almost the same as: "https://forums.aws.amazon.com/thread.jspa?messageID=670712򣯸" But somehow i cant reply to that thread, so i created a new thread here instead. anyone encounter this issue?
In fact, i try to response to the existing AWS forum thread: https://forums.aws.amazon.com/thread.jspa?messageID=670712&#670712
but i keep getting this funny error: "Your message quota has been reached. Please try again later.". And i wasnt even posting anything, how can i use up my quota?
What I suspect is your S3 bucket may still be "linked" to the lambda function.
Maybe check your S3 bucket for events and remove them there, then try creating the lambda events again?
i.e. S3 bucket-> properties-> Events
After 6 years nice to see some people still befitting from this answer,
Here is a shamless plug to youtube video I uploaded 2022-12-13.
https://www.youtube.com/watch?v=rjpOU7jbgEs
The issue must be that the s3 bucket is already linked with the suffix/prefix you are trying to link. Remove the link in S3 and try again.
When you setup a lambda function and setup a trigger related to S3. The notification gets updated in the properties sections of that S3 bucket.
The mentioned error occurs when the earlier lambda function is deleted and you're trying to setup same kind of trigger again. This time the thing to note is, the S3 notification is still not deleted when you deleted the lambda function.
Goto S3 bucket > Properties > Event notifications
and delete the old setting and then setup new trigger in the new lambda function trigger.
Here is a link to a youtube video profiling this issue and demonstrating the solution:
https://www.youtube.com/watch?v=1Tfmc9nEtbU
Just as Ridwaan Manuel, you must remove the events by going to S3 bucket-> properties-> Events as the video shows.
Steps to reproduce this issue:
Create a bucket and create a folder called “example/”
Create Lambda Function
Add S3 trigger to the lambda using the bucket from (1) with default settings
Save the trigger
Click Save and notice error
Refresh the page and notice that the triggers disappeared
Add the same bucket again and notice the ambiguous reference error