Is there a way to run a Lambda on every DynamoDb table record?
I have a Dynamo table with name, last name, email and a Lambda that takes name, last name, email as parameters. I am trying to configure the environment such that, every day, the Lambda runs automatically for every value it finds within Dynamo; can't do all the records in one Lambda as it won't scale (will timeout once more users are added).
I currently have a CloudWatch rule set up that triggers the lambda on schedule but I had to manually add the parameters to the trigger from Dynamo - It's not automatic and not dynamic/not connected to dynamo.
--
Another option would be to run a lambda every time a DynamoDb record is updated... I could update all the records weekly and then upon updating them the Lambda would be triggered but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
Is there a way to run a Lambda on every DynamoDb table record?
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that an instance of it will dispatch whenever something is put on the queue.
Since the limiting factor is lambda timeouts, run multiple lambdas using step functions. Perform a paginated scan of the table; each lambda will return the LastEvaluatedKey and pass it to the next invocation for the next page.
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.
Streams are a ordered record of changes that happen to a table. These can invoke a Lambda, so it's automatic (however beware that the change appears only once in the stream, set up a DLQ in case your Lambda fails). This approach scales well and is also pretty evolvable. If need be, you can either push the events from the stream to an SQS or Kinesis, fan out, etc., depending on the requirements.
Related
I am building a simple app in AWS which lets user rent out cars for limited amount of time. I am using AWS Lambda for computation, dynamoDB for storage and API Gateway to handle requests to lambda functions.
My question is if there is any AWS service or dynamoDB feature that allows me to track time for "Car" object in dynamoDB such that when rental time is over, it triggers a lambda function to notify the user and perform other action?
You could consider using DynamoDB Time to Live along with DynamoDB streams and a lambda function.
In this scenario, the items specific to the rental time would be placed in a separate table. They would have TTL values set to the rental time. DynamoDB automatically scans and deletes items based on the TTL. These automatic deletions could be picked up by DynamoDB streams and forwarded to a lambda function. The function would take action based on the expired time.
However, a possible issue could be that sometimes DynamoDB will take 48 hours to delete an item.
DynamoDB Streams and TTL are not good solutions because DynamoDB provides no SLA for TTL deletes (it can even take longer than 48 hours in rare cases) and the item will be deleted so cannot be used by downstream applications or analytics later on.
For this you should use Cloudwatch event rules (or Amazon Eventbridge) with a cron schedule expression. So your code that puts the item into the DynamoDB table can subsequently create a Cloudwatch event rule for the time in the future when the rental time will expire, using a cron schedule expression. This will trigger a lambda that can call your notification service to notify the customer.
A possible solution would be having a Lambda cron job that runs on a timer that scans or queries the DynamoDB table for values that have a date matching the end date of the rental. This lambda could then invoke your NotifyUser lambda using AWS Step Functions, or could emit an event to a SNS Topic where your lambda has subscribed to.
Some links that may be helpful:
CronJob
SNS
I have a dynamo db and every entry has a specific cron expression on which we need to perform actions for the entry.
Basically, I want every entry in the dynamo to have a custom cron expression(they can be different for all the entries) on the basis of which a schedule is triggered and the contents of the entry is sent to a queue or published to an SNS topic in the form of a message.
Is that possible to achieve via CloudWatch events? As in, having a specific CloudWatch event for every entry in the DynamoDb?
Also, can it be done in a different way if not CloudWatch?
You could use dynamodb streams to trigger a step function on insert event
I can't seem to find the documentation on what kinds of events DynamoDB is able to trigger a lambda function based on. All I can find is mentions of when a new record is added to a table or a record is updated. Are those the two "only" actions/events available? Or could I also trigger a lambda function when I request a records that does not exists (which is what I need in my case, where I will be using DynamoDB as a cache)?
Triggering AWS Lambda through events happening in DynamoDB is done by utilizing DynamoDB Streams.
As stated in the documentation:
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours.
So they only capture operations which modify data, which isn't the case for read operations.
Triggering a Lambda function automatically because somebody queried for a key that doesn't exist is not supported by DynamoDB. You would have to handle that in your querying code.
I'm trying to set up ElasticSearch import process from DynamoDB table. I have already created AWS Lambda and enabled DynamoDB stream with trigger that invokes my lambda for every added/updated record. Now I want to perform initial seed operation (import all records that are currently in my DynamoDB table to ElasticSearch). How do I do that? Is there any way to make all records in a table be "reprocessed" and added to stream (so they can be processed by my lambda)? Or is it better to write a separate function that will manually read all data from the table and send it to ElasticSearch - so basically have 2 lambdas: one for initial data migration (executed only once and triggered manually by me), and another one for syncing new records (triggered by DynamoDB stream events)?
Thanks for all the help :)
Depending on how Large your dataset is you won't be able to seed your database in Lambda as there is a max timeout of 300 seconds (EDIT: This is now 15 minutes, thanks #matchish).
You could spin up an EC2 instance and use the SDK to perform a DynamoDB scan operation and batch write to your Elasticsearch instance.
You could also use Amazon EMR to perform a Map Reduce Job to export to S3 and from there process all your data.
I would write a script that will touch each records in dynamodb. For each items in your dynamodb, add a new property called migratedAt or whatever you wish. Adding this property will trigger dynamodb stream which in turn will trigger your lambda handler. Based on your question, your lambda handler already handles the update so there is no change there.
I have a scenario where I have a DynamoDB table with a trigger (a stream) to an AWS Lambda function.
I want to use DynamoDB as an Event Store and use the Lambda function to maintain a projection / aggregate view / read view of the data.
I need to make sure that when I save an CreateEntity event in DynamoDB and then maybe right after when I save an UpdateEntity that the Lambda function will process the CreateEntity event before the UpdateEntity event.
My understanding is that the parallelism of the triggers to Lambda depends on the number of Shards the DynamoDB stream consists of. So if the DynamoDB Stream that the Lambda function use have 2 shards and one event goes on Shard1 and the other event goes on Shard2 then they can be processed in parallel by two instances of the Lambda function.
So if the CreateEntity event is on Shard1 and UpdateEntity is on Shard2 then if Shard1 or the Lambda function instance for some reason is slow then the UpdateEntity event in Shard2 might be processed first. Meaning that it cannot be added to the projection because there is no entity created first.
Is my understanding correct?
Is there a way to ensure that the events are only processed by one instance of the Lambda function so that I can ensure the ordering of processing of the messages?
Or do I have to use something else than Lambda for this? For example DynamoDB stream to Kinesis with my own application where I can ensure that only one instance of the application is running and ensure the ordering this way.
this is partly correct
if you CreateEntity X, and then UpdateEntity X, then in almost all of the cases. it will happen on the same shard (entities are split on shards according their composite key).
the only case that it wont work is when your entity is split over shard, and this can happen only if you have small amount of unique entities, any many of them. and if you are in this case then you are doing something wrong..
so in your case its being ensured...