Is there any way to prevent a DynamoDB stream from triggering a lambda upon every DynamoDB change?
A DynamoDB stream is set up to trigger a lambda function. The lambda sits at the end of a step function, and the DynamoDB table is updated in a few places throughout the step function. But those aren't the updates the lambda needs from the stream, so the stream triggers the lambda prematurely, several times over the duration of the step function. This causes all sorts of problems.
One very specific change to the DynamoDB table is what's needed to trigger the lambda. That change doesn't come from the step function, but from a UI via GraphQL. The lambda needs to be able to run at both the end of the step function and whenever that change happens on the UI.
Basically, there are two scenarios when the lambda is supposed to run: 1) at the end of the step function, and 2) when the DynamoDB table is updated in the UI, bypassing the step function.
I'm writing code that stops the lambda execution if the event isn't the desired DynamoDB change, but that doesn't seem right... I don't want it to constantly invoke when it doesn't need to. Over the lifecycle of the step function, the DynamoDB table can change several times before execution ever reaches the lambda.
These numbers aren't exact, but say the step function runs 10 times in a row and each run updates the DynamoDB table 3 times. That's 30 lambda invocations before the step function has ever triggered the lambda the way it's supposed to. Is there any way to prevent those invocations?
No, if you attach a Lambda function to a DDB stream, it will execute on every DDB update. You need to change your architecture. You could return immediately when you don't want it to run (what you do now), but you do pay for the invocation requests.
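For reference, that early-exit guard can stay very cheap: inspect the stream records and return unless the one attribute you care about actually changed. Here's a minimal Python sketch; it assumes the stream is configured with NEW_AND_OLD_IMAGES, and "status" is a hypothetical stand-in for whatever attribute the UI update modifies:

    # Hypothetical attribute name; replace with the field the UI update changes.
    TRIGGER_ATTRIBUTE = "status"

    def handler(event, context):
        for record in event["Records"]:
            # Only MODIFY events can represent the UI-driven change.
            if record["eventName"] != "MODIFY":
                continue
            old = record["dynamodb"].get("OldImage", {})
            new = record["dynamodb"].get("NewImage", {})
            # Proceed only when the one attribute we care about actually changed.
            if old.get(TRIGGER_ATTRIBUTE) != new.get(TRIGGER_ATTRIBUTE):
                do_real_work(new)

    def do_real_work(new_image):
        # Placeholder for the actual processing.
        print(new_image)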
Alternatively, you could change the code that updates DDB (scenario 2): replace it with a Lambda function that performs the DDB update and then invokes the Lambda function you want. You can then safely remove the stream, as you no longer rely on it.
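A rough sketch of that replacement Lambda, with hypothetical table, key, and function names:

    import json
    import boto3

    dynamodb = boto3.client("dynamodb")
    lambda_client = boto3.client("lambda")

    TABLE_NAME = "my-table"                     # placeholder
    TARGET_FUNCTION = "my-downstream-function"  # placeholder

    def handler(event, context):
        # Apply the update that previously went straight from GraphQL to DynamoDB.
        dynamodb.update_item(
            TableName=TABLE_NAME,
            Key={"pk": {"S": event["id"]}},
            UpdateExpression="SET #s = :s",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":s": {"S": event["status"]}},
        )
        # Then invoke the downstream Lambda directly, asynchronously.
        lambda_client.invoke(
            FunctionName=TARGET_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps(event),
        )

The step function can likewise invoke the downstream Lambda as its final state, so both scenarios are covered without the stream.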
I have an AWS Lambda function. When it receives only one trigger, it always succeeds. But when it receives more than one trigger, it sometimes throws an error. The first trigger always succeeds.
Can I configure an AWS Lambda function to receive only one trigger at a time?
Can one AWS Lambda function handle multiple triggers at once?
Yes, Lambda functions can handle multiple triggers at once.
When it receives more than one trigger, it sometimes throws an error.
This is most probably related to your implementation. Are you doing something different based on the inputs? Is the code behaving differently based on time?
Can I configure an AWS Lambda function to receive only one trigger at a time?
You can limit the concurrency of the Lambda function. If you set it to 1, at most one instance of the function can run at any given time.
See: Set Concurrency Limits on Individual AWS Lambda Functions
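If you'd rather set this in code than in the console, boto3 exposes it via put_function_concurrency (the function name below is a placeholder):

    import boto3

    lambda_client = boto3.client("lambda")

    # Reserve a concurrency of 1 so at most one instance runs at a time.
    lambda_client.put_function_concurrency(
        FunctionName="my-function",  # placeholder name
        ReservedConcurrentExecutions=1,
    )

Be aware that events arriving while that single instance is busy will be throttled (and, for asynchronous sources, retried).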
Here's what I know, or think I know.
In AWS Lambda, the first time you call a function is commonly called a "cold start" -- this is akin to starting up your program for the first time.
If you make a second function invocation relatively quickly after your first, this cold start won't happen again. This is colloquially known as a "warm start".
If a function is idle for long enough, the execution environment goes away, and the next request will need to cold start again.
It's also possible to have a single AWS Lambda function with multiple triggers. Here's an example of a single function that's handling both API Gateway requests and SQS messages.
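Something along these lines, as a minimal Python sketch that dispatches on the shape of the incoming event:

    def handler(event, context):
        # SQS deliveries arrive as {"Records": [...]} with eventSource "aws:sqs".
        if event.get("Records") and event["Records"][0].get("eventSource") == "aws:sqs":
            for record in event["Records"]:
                process_message(record["body"])
            return
        # API Gateway (REST proxy) events carry an httpMethod.
        if "httpMethod" in event:
            return {"statusCode": 200, "body": "handled HTTP request"}
        raise ValueError("unrecognized event shape")

    def process_message(body):
        # Placeholder for the SQS message handling.
        print(body)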
My question: Will AWS Lambda reuse (warm start) an execution environment when different event triggers come in? Or will each event trigger have its own cold start? Or is this behavior simply not guaranteed by Lambda?
Yes, different triggers will reuse the same containers, since the execution environment is the same regardless of the trigger; the only difference is the event that is passed to your Lambda.
You can verify this by executing your Lambda with two types of triggers (e.g. API Gateway and simply the Test function on the Lambda Console) and looking at the CloudWatch logs. Each Lambda container creates its own Log Stream inside your Lambda's Log Group. You should see both event logs going to the same Log Stream, which means the second event is successfully using the warm container created by the first event.
Is there a way to run a Lambda on every DynamoDB table record?
I have a Dynamo table with name, last name, and email, and a Lambda that takes name, last name, and email as parameters. I am trying to configure the environment such that, every day, the Lambda runs automatically for every value it finds within Dynamo; I can't do all the records in one Lambda as it won't scale (it will time out once more users are added).
I currently have a CloudWatch rule set up that triggers the Lambda on a schedule, but I had to manually copy the parameters from Dynamo into the trigger - it's not automatic and not dynamic, not connected to Dynamo.
--
Another option would be to run a Lambda every time a DynamoDB record is updated... I could update all the records weekly, and upon updating them the Lambda would be triggered, but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
Is there a way to run a Lambda on every DynamoDB table record?
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item (see the sketch after this list).
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that an instance of it will dispatch whenever something is put on the queue.
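A minimal Python sketch of the dispatch function described above, using placeholder table and queue names:

    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    sqs = boto3.client("sqs")

    TABLE_NAME = "users"  # placeholder
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def handler(event, context):
        table = dynamodb.Table(TABLE_NAME)
        kwargs = {}
        # Paginate through the whole table, enqueueing one message per item.
        while True:
            page = table.scan(**kwargs)
            for item in page["Items"]:
                sqs.send_message(
                    QueueUrl=QUEUE_URL,
                    MessageBody=json.dumps(item, default=str),
                )
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

The worker Lambda then receives one item per SQS record, and Lambda scales the workers out automatically.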
Since the limiting factor is lambda timeouts, run multiple lambdas using step functions. Perform a paginated scan of the table; each lambda will return the LastEvaluatedKey and pass it to the next invocation for the next page.
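As a sketch of that approach (table name and page size are placeholders), each invocation processes one page and hands the pagination key back to the state machine:

    import boto3

    dynamodb = boto3.client("dynamodb")
    TABLE_NAME = "users"  # placeholder

    def handler(event, context):
        kwargs = {"TableName": TABLE_NAME, "Limit": 100}
        # The state machine passes back the key from the previous invocation.
        if event.get("lastEvaluatedKey"):
            kwargs["ExclusiveStartKey"] = event["lastEvaluatedKey"]

        page = dynamodb.scan(**kwargs)
        for item in page["Items"]:
            process(item)

        # A Choice state can loop until lastEvaluatedKey comes back empty.
        return {"lastEvaluatedKey": page.get("LastEvaluatedKey")}

    def process(item):
        print(item)  # placeholder for the per-item work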
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.
Streams are an ordered record of changes that happen to a table. These can invoke a Lambda, so it's automatic (however, beware that each change appears only once in the stream; set up a DLQ in case your Lambda fails). This approach scales well and is also pretty evolvable. If need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.
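If you go this route, the stream-to-Lambda connection, including the DLQ for failed batches, can be created with boto3's create_event_source_mapping; the ARNs and names below are placeholders:

    import boto3

    lambda_client = boto3.client("lambda")

    STREAM_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/2020-01-01T00:00:00.000"  # placeholder
    DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:stream-dlq"  # placeholder

    lambda_client.create_event_source_mapping(
        EventSourceArn=STREAM_ARN,
        FunctionName="my-worker",  # placeholder
        StartingPosition="LATEST",
        BatchSize=100,
        MaximumRetryAttempts=3,
        # Failed batches land here instead of being retried indefinitely.
        DestinationConfig={"OnFailure": {"Destination": DLQ_ARN}},
    )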
Currently I have a use case where a CloudWatch rule triggers a Step Function every 5 minutes. I want logic that skips starting another execution if one is already running in the Step Function.
Any way to do that?
Instead of having your CloudWatch event rule trigger the Step Function directly, you could have it trigger a Lambda function. The Lambda function could check if there are any Step Function executions in the RUNNING state, via the ListExecutions API. If not, the Lambda function could start a new execution via the StartExecution API.
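A minimal Python sketch of that gatekeeper Lambda (the state machine ARN is a placeholder):

    import boto3

    sfn = boto3.client("stepfunctions")

    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:my-machine"  # placeholder

    def handler(event, context):
        # Skip if an execution is already in flight.
        running = sfn.list_executions(
            stateMachineArn=STATE_MACHINE_ARN,
            statusFilter="RUNNING",
            maxResults=1,
        )
        if running["executions"]:
            return {"started": False, "reason": "execution already running"}

        execution = sfn.start_execution(stateMachineArn=STATE_MACHINE_ARN)
        return {"started": True, "executionArn": execution["executionArn"]}

Note there is a small race window between the check and the start; with a 5-minute schedule, that is usually acceptable.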
I have written a Python script for Auto Scaling server naming, which checks the current servers in the Auto Scaling group and gives an appropriate name and sequence number to the new server.
I am triggering my AWS Lambda function from an Auto Scaling event, and when I bring up 3 servers at the same time (or a new Auto Scaling group with desired capacity 10), I don't want the Lambda to execute in parallel; it makes my script assign the same count to all servers.
Alternatively, I could implement some kind of locking to put the other Lambdas in a wait state. What should I use for this?
There are 2 options:
1. You can create a distributed lock using a DynamoDB table. This will help you maintain a state saying "operation in progress". Every time a Lambda function is invoked due to an autoscaling event, check whether the lock record exists in DynamoDB. If it does not, create it and proceed; if it does, do nothing and return. After your Lambda function executes successfully, remove the entry. This will probably add no more than $2.00 to your total AWS bill, as the read and write capacity for this table will be really low.

2. You can make use of Step Functions to implement this scenario. With Step Functions you can check if one is already running and skip.

Read more about Step Functions here: https://aws.amazon.com/step-functions/
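For the first option, here is a minimal sketch of the DynamoDB lock (the table and key names are hypothetical; the lock table needs a string partition key lock_id):

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")
    LOCK_TABLE = "locks"  # placeholder

    def acquire_lock(lock_id):
        # The conditional write succeeds for exactly one caller.
        try:
            dynamodb.put_item(
                TableName=LOCK_TABLE,
                Item={"lock_id": {"S": lock_id}},
                ConditionExpression="attribute_not_exists(lock_id)",
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False
            raise

    def release_lock(lock_id):
        dynamodb.delete_item(TableName=LOCK_TABLE, Key={"lock_id": {"S": lock_id}})

    def handler(event, context):
        if not acquire_lock("naming"):
            return  # another invocation holds the lock; do nothing
        try:
            assign_server_name(event)
        finally:
            release_lock("naming")

    def assign_server_name(event):
        pass  # placeholder for the naming logic

If the function can crash while holding the lock, consider adding a TTL attribute to the lock item so stale locks expire on their own.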