I have a DynamoDB table, and every entry has a specific cron expression on which we need to perform actions for that entry.
Basically, I want every entry in the table to have a custom cron expression (they can be different for all the entries), on the basis of which a schedule is triggered and the contents of the entry are sent to a queue or published to an SNS topic as a message.
Is it possible to achieve this via CloudWatch Events, i.e. by having a specific CloudWatch Events rule for every entry in the DynamoDB table?
Also, if not CloudWatch, can it be done in a different way?
You could use DynamoDB Streams to trigger a Step Functions state machine on each insert event.
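A minimal sketch of the stream-triggered Lambda, assuming the stream view type includes new images and a hypothetical state machine ARN (EntryScheduler). The state machine itself is not shown; it is assumed to compute the next run time from the entry's cron expression, wait, and then send the entry to SQS or SNS. The Step Functions client is an optional parameter so the filtering logic can be tried without AWS.

```python
import json

# Hypothetical ARN of a state machine that waits for the entry's next
# scheduled time and then publishes the entry to SQS/SNS (not shown).
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:EntryScheduler"


def handler(event, context, sfn_client=None):
    """Start one Step Functions execution per newly inserted DynamoDB item."""
    if sfn_client is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        sfn_client = boto3.client("stepfunctions")

    started = []
    for record in event.get("Records", []):
        # Only react to new items; MODIFY and REMOVE records are skipped.
        if record.get("eventName") != "INSERT":
            continue
        # NewImage is present when the stream view type includes new images.
        new_image = record["dynamodb"]["NewImage"]
        response = sfn_client.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            # The item is passed as-is, still in DynamoDB JSON form.
            input=json.dumps({"item": new_image}),
        )
        started.append(response["executionArn"])
    return started
```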
I am building a simple app in AWS which lets users rent cars for a limited amount of time. I am using AWS Lambda for computation, DynamoDB for storage and API Gateway to handle requests to the Lambda functions.
My question is whether there is any AWS service or DynamoDB feature that lets me track time for a "Car" object in DynamoDB so that, when the rental time is over, it triggers a Lambda function to notify the user and perform other actions.
You could consider using DynamoDB Time to Live (TTL) along with DynamoDB Streams and a Lambda function.
In this scenario, the items specific to the rental time would be placed in a separate table. They would have TTL values set to the rental end time (a Unix epoch timestamp in seconds). DynamoDB automatically scans for and deletes items based on the TTL. These automatic deletions could be picked up by DynamoDB Streams and forwarded to a Lambda function, which would then take action based on the expired time.
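A minimal sketch of that TTL approach, assuming a hypothetical Number attribute expires_at configured as the table's TTL attribute, a hypothetical rental_id key, and a stream view type that includes old images. TTL deletions appear in the stream as REMOVE records whose userIdentity names the DynamoDB service as the principal, which lets the function ignore ordinary deletes.

```python
# Hypothetical name of the attribute configured as the table's TTL attribute.
RENTAL_TTL_ATTRIBUTE = "expires_at"


def rental_item(rental_id, rental_end_epoch):
    """Build a low-level DynamoDB item whose TTL equals the rental end time.

    TTL attributes must be a Number holding a Unix epoch time in seconds.
    """
    return {
        "rental_id": {"S": rental_id},
        RENTAL_TTL_ATTRIBUTE: {"N": str(int(rental_end_epoch))},
    }


def is_ttl_delete(record):
    """True if a stream record is a deletion performed by the TTL process."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )


def handler(event, context):
    """Return the IDs of rentals whose TTL expired; notify users from here."""
    expired = []
    for record in event.get("Records", []):
        if is_ttl_delete(record):
            # OldImage requires an OLD_IMAGE or NEW_AND_OLD_IMAGES stream view type.
            old_image = record["dynamodb"]["OldImage"]
            expired.append(old_image["rental_id"]["S"])
    return expired
```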
However, one possible issue is that DynamoDB can sometimes take as long as 48 hours to delete an expired item.
DynamoDB Streams and TTL are not good solutions, because DynamoDB provides no SLA for TTL deletes (in rare cases they can take even longer than 48 hours), and because the item is deleted, it can no longer be used by downstream applications or analytics later on.
For this, you should use CloudWatch Events rules (or Amazon EventBridge) with a cron schedule expression. The code that puts the item into the DynamoDB table can subsequently create a CloudWatch Events rule, with a cron schedule expression, for the future time when the rental period expires. This rule will trigger a Lambda function that can call your notification service to notify the customer.
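A sketch of creating such a one-off rule, assuming an EventBridge (CloudWatch Events) client is passed in and using a hypothetical rule naming scheme. Schedule expressions are evaluated in UTC and have six fields (minutes, hours, day-of-month, month, day-of-week, year); pinning all of them, with `?` for day-of-week, makes the rule match a single minute.

```python
import json
from datetime import timezone


def cron_for(moment):
    """Return a cron expression that matches one specific minute (in UTC).

    Format: cron(minutes hours day-of-month month day-of-week year).
    Day-of-week must be '?' when day-of-month is specified.
    """
    moment = moment.astimezone(timezone.utc)
    return "cron({} {} {} {} ? {})".format(
        moment.minute, moment.hour, moment.day, moment.month, moment.year
    )


def schedule_rental_expiry(events_client, rental_id, rental_end, target_lambda_arn):
    """Create a rule that fires when the rental ends and invokes a Lambda."""
    rule_name = "rental-expiry-" + rental_id  # hypothetical naming scheme
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=cron_for(rental_end),
        State="ENABLED",
    )
    response = events_client.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "notify",
            "Arn": target_lambda_arn,
            # Constant JSON delivered to the Lambda as its event.
            "Input": json.dumps({"rental_id": rental_id}),
        }],
    )
    if response.get("FailedEntryCount"):
        raise RuntimeError("put_targets failed: %r" % response["FailedEntries"])
    return rule_name
```

The target Lambda also needs a resource-based permission that allows events.amazonaws.com to invoke it. Since each rule is only useful once, remove its target and delete the rule after it fires, because the number of rules per event bus is limited.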
Another possible solution is a scheduled Lambda function (a cron job) that scans or queries the DynamoDB table for items whose rental end date has been reached. This Lambda could then invoke your NotifyUser Lambda using AWS Step Functions, or publish an event to an SNS topic that your Lambda is subscribed to.
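A minimal sketch of the scheduled function's core, assuming low-level boto3 DynamoDB and SNS clients are passed in, and a hypothetical string attribute end_date stored as ISO-8601 UTC (which sorts correctly as a plain string). It scans with pagination and publishes one SNS message per ended rental; a Lambda handler triggered by the cron rule would call it.

```python
import json
from datetime import datetime, timezone


def find_ended_rentals(dynamodb_client, table_name, now_iso):
    """Yield items whose end_date is at or before now_iso, following pagination."""
    kwargs = {
        "TableName": table_name,
        "FilterExpression": "#end <= :now",
        "ExpressionAttributeNames": {"#end": "end_date"},  # hypothetical attribute
        "ExpressionAttributeValues": {":now": {"S": now_iso}},
    }
    while True:
        page = dynamodb_client.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def notify_ended_rentals(dynamodb_client, sns_client, table_name, topic_arn, now=None):
    """Publish one SNS message per ended rental; returns how many were sent."""
    if now is None:
        now = datetime.now(timezone.utc)
    now_iso = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    published = 0
    for item in find_ended_rentals(dynamodb_client, table_name, now_iso):
        sns_client.publish(TopicArn=topic_arn, Message=json.dumps(item))
        published += 1
    return published
```

As written, a rental matches on every run after it ends; in practice you would mark items once notified (for example a hypothetical notified_at attribute plus attribute_not_exists in the filter) or query a narrow time window through an index on the end date instead of scanning.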
Some links that may be helpful:
CronJob
SNS
I have an SQS queue that triggers a Lambda function each time a message arrives in the queue.
The message contains information about a product; let's call it A. When the Lambda is executed, it inserts product A's data into RDS.
However, about 30 seconds later another message arrives containing other information about product A, which inserts data into RDS again.
Is there any way to add a delay before SQS triggers the Lambda?
Also, can the newest message received for product A be processed and the older ones discarded? I want to use SQS message deduplication so that each message received for a product is treated as unique, but I am not sure it is a good fit for this use case.
The other solution I considered was to replace SQS with a "custom queue" backed by an RDS Aurora instance: the Lambda would then run on a cron schedule against the instance and pick up the products with an expired TTL to insert them into the DB. I find this a bit overkill, though. Is there any other way to do this?
Thanks
Based on the comments, a partial solution to the problem is to set up an event source mapping between Lambda and SQS.
Ideally, the producer should be modified. However, since the producer can't be modified, a caching solution (e.g. ElastiCache) could be implemented to store the "incomplete" SQS messages before writing them to RDS and to filter out duplicates.
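One way to keep only the newest message per product within a single Lambda invocation is sketched below. It assumes the event source mapping uses a batching window (MaximumBatchingWindowInSeconds) long enough that messages sent about 30 seconds apart usually land in the same batch, and that each message body is JSON with a hypothetical product_id field. It only collapses messages inside one batch; messages that arrive in different batches still need shared state such as the cache mentioned above.

```python
import json


def latest_per_product(records):
    """Keep only the most recent SQS message for each product in a batch.

    Ordering uses the SentTimestamp attribute (epoch milliseconds, as a
    string) that Lambda includes on each SQS record. Assumes each body is
    JSON with a hypothetical 'product_id' field.
    """
    latest = {}
    for record in records:
        body = json.loads(record["body"])
        sent = int(record["attributes"]["SentTimestamp"])
        product_id = body["product_id"]
        # On a timestamp tie, the record that appears later in the batch wins.
        if product_id not in latest or sent >= latest[product_id][0]:
            latest[product_id] = (sent, body)
    return {pid: body for pid, (_, body) in latest.items()}
```

A handler would call `latest_per_product(event["Records"])` and write only the returned bodies to RDS.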
I have a workflow where I need to pass in some information that is stored for a period of time, after which a trigger is sent off with the same information.
I considered using TTL on a DynamoDB table, but I was wondering whether I could use CloudWatch Events instead, since its rules support cron expressions, which seems ideal.
I know I can set up a CloudWatch rule to trigger, say, every 15 minutes, but how do I set up CloudWatch so that only my custom information gets picked up by the rule, and how can I pass information into the event so that, when the trigger is sent to the target, my custom information is sent along with it?
DynamoDB TTL is a bad fit because it doesn't give you much timing granularity.
You can use the CloudWatch Events PutEvents API to put a custom event. That should get you where you want.
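A minimal sketch of the PutEvents approach, assuming an EventBridge (CloudWatch Events) client is passed in, plus a hypothetical source com.example.workflow and detail type DelayedTrigger. Your own information goes in Detail; a rule created with EVENT_PATTERN as its EventPattern (instead of a schedule expression) matches only these events and passes the whole event, Detail included, to its targets. Note that PutEvents delivers to matching rules right away; it does not add a delay on its own.

```python
import json

# Hypothetical source and detail type used to identify your custom events.
EVENT_SOURCE = "com.example.workflow"
DETAIL_TYPE = "DelayedTrigger"

# Event pattern for the rule: only events with this source and detail type match.
EVENT_PATTERN = json.dumps({
    "source": [EVENT_SOURCE],
    "detail-type": [DETAIL_TYPE],
})


def put_custom_event(events_client, payload):
    """Publish a custom event whose Detail carries your own information."""
    response = events_client.put_events(
        Entries=[{
            "Source": EVENT_SOURCE,
            "DetailType": DETAIL_TYPE,
            "Detail": json.dumps(payload),  # must be a JSON object string
        }]
    )
    # PutEvents reports per-entry failures in the response instead of raising.
    if response.get("FailedEntryCount", 0):
        raise RuntimeError("PutEvents failed: %r" % response["Entries"])
    return response["Entries"][0]["EventId"]
```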
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/AddEventsPutEvents.html
Is there a way to run a Lambda on every DynamoDB table record?
I have a DynamoDB table with name, last name and email attributes, and a Lambda that takes name, last name and email as parameters. I am trying to configure the environment so that, every day, the Lambda runs automatically for every item it finds in the table. I can't process all the records in one Lambda invocation because it won't scale (it will time out once more users are added).
I currently have a CloudWatch rule set up that triggers the Lambda on a schedule, but I had to manually copy the parameters from DynamoDB into the trigger. It's not automatic, not dynamic, and not connected to DynamoDB.
--
Another option would be to run a Lambda every time a DynamoDB record is updated... I could update all the records weekly, and updating them would then trigger the Lambda, but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
1. Set up a CloudWatch Events rule that triggers on a schedule. Have this rule trigger a dispatch Lambda function.
2. The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
3. Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
4. Connect the worker Lambda to the jobs SQS queue so that it is invoked whenever something is put on the queue.
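The dispatch and worker steps above can be sketched as follows, assuming low-level boto3 DynamoDB and SQS clients are passed in (table name and queue URL are up to you). The dispatcher follows LastEvaluatedKey pagination and sends messages with send_message_batch, which accepts at most 10 entries per call; each worker invocation receives a batch of SQS records whose bodies are single items.

```python
import json


def scan_all(dynamodb_client, table_name):
    """Yield every item in the table, following LastEvaluatedKey pagination."""
    kwargs = {"TableName": table_name}
    while True:
        page = dynamodb_client.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def _send(sqs_client, queue_url, entries):
    response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
    # A batch send can partially fail without raising, so check explicitly.
    if response.get("Failed"):
        raise RuntimeError("SQS rejected entries: %r" % response["Failed"])
    return len(entries)


def dispatch(dynamodb_client, sqs_client, table_name, queue_url):
    """Dispatch step: one SQS message per item, sent 10 at a time (the batch maximum)."""
    sent, batch = 0, []
    for item in scan_all(dynamodb_client, table_name):
        # Ids only need to be unique within a single batch request.
        batch.append({"Id": str(len(batch)), "MessageBody": json.dumps(item)})
        if len(batch) == 10:
            sent += _send(sqs_client, queue_url, batch)
            batch = []
    if batch:
        sent += _send(sqs_client, queue_url, batch)
    return sent


def worker_handler(event, context):
    """Worker step: each SQS record's body is one item in DynamoDB JSON form."""
    items = [json.loads(record["body"]) for record in event["Records"]]
    # Call your existing per-user logic (name, last name, email) for each item here.
    return len(items)
```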
Since the limiting factor is Lambda timeouts, run multiple Lambda invocations orchestrated by Step Functions. Perform a paginated scan of the table: each invocation processes one page and returns the LastEvaluatedKey, which is passed to the next invocation so it can fetch the next page.
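A sketch of the per-page task, assuming a low-level boto3 DynamoDB client and hypothetical event fields last_evaluated_key, done and processed. Each call scans one page (Limit keeps it small enough to finish well within the timeout) and returns the state the next invocation needs.

```python
def process_page(dynamodb_client, table_name, event, page_size=100):
    """Handle one page of a paginated scan; Step Functions loops until done.

    `event` is the previous invocation's return value ({} on the first call).
    The field names used here are hypothetical.
    """
    kwargs = {"TableName": table_name, "Limit": page_size}
    if event.get("last_evaluated_key"):
        kwargs["ExclusiveStartKey"] = event["last_evaluated_key"]
    page = dynamodb_client.scan(**kwargs)
    items = page.get("Items", [])
    # Process `items` here.
    last_key = page.get("LastEvaluatedKey")
    return {
        "last_evaluated_key": last_key,
        "done": last_key is None,
        "processed": event.get("processed", 0) + len(items),
    }
```

In the state machine, a Choice state would loop back to this task while done is false, passing the returned object in as the next input.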
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB Streams.
Streams are an ordered record of the changes made to a table. They can invoke a Lambda, so the process is automatic (however, beware that each change appears only once in the stream, so set up a DLQ in case your Lambda fails). This approach scales well and is also quite evolvable: if need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.
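A sketch of wiring the stream to the Lambda with failure handling, assuming a boto3 Lambda client and hypothetical ARNs and function name. For stream sources, Lambda's equivalent of a DLQ is an on-failure destination (for example an SQS queue). The message it receives describes the failed batch (shard and sequence numbers) rather than containing the records, so a recovery job would read them from the stream while they are still retained (24 hours).

```python
def connect_stream(lambda_client, stream_arn, function_name, failure_queue_arn):
    """Attach a Lambda to a DynamoDB stream with an SQS on-failure destination."""
    return lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="LATEST",  # only changes made after the mapping exists
        BatchSize=100,
        MaximumRetryAttempts=3,  # stop retrying a failing batch after 3 attempts
        BisectBatchOnFunctionError=True,  # split failing batches to isolate bad records
        DestinationConfig={"OnFailure": {"Destination": failure_queue_arn}},
    )
```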
I'm trying to set up an Elasticsearch import process from a DynamoDB table. I have already created an AWS Lambda and enabled a DynamoDB stream with a trigger that invokes my Lambda for every added/updated record. Now I want to perform an initial seed operation (import all records that are currently in my DynamoDB table into Elasticsearch). How do I do that? Is there any way to make all records in a table be "reprocessed" and added to the stream (so they can be processed by my Lambda)? Or is it better to write a separate function that manually reads all data from the table and sends it to Elasticsearch? That would mean having two Lambdas: one for the initial data migration (executed only once and triggered manually by me), and another for syncing new records (triggered by DynamoDB stream events).
Thanks for all the help :)
Depending on how large your dataset is, you might not be able to seed your database from Lambda, as there is a maximum timeout of 300 seconds (EDIT: this is now 15 minutes, thanks @matchish).
You could spin up an EC2 instance and use the SDK to perform a DynamoDB scan operation and batch write to your Elasticsearch instance.
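On the EC2 instance, the scan itself is the usual paginated Scan loop; the Elasticsearch-specific part is turning items into a _bulk request body. The sketch below assumes a hypothetical attribute to use as the document _id and handles only S, N, BOOL and NULL attributes; boto3's TypeDeserializer covers the full set of DynamoDB types if you need them. Sending the body (for example an HTTP POST to /_bulk with Content-Type application/x-ndjson, or an Elasticsearch client's bulk helper) is left out.

```python
import json


def plain(attribute_value):
    """Convert one DynamoDB-JSON value to a plain value (S, N, BOOL, NULL only)."""
    (type_code, value), = attribute_value.items()
    if type_code == "S":
        return value
    if type_code == "N":
        # Numbers arrive as strings; keep integers as int.
        return float(value) if any(c in value for c in ".eE") else int(value)
    if type_code in ("BOOL", "NULL"):
        return None if type_code == "NULL" else value
    raise ValueError("unsupported DynamoDB type: " + type_code)


def bulk_body(items, index_name, id_attribute):
    """Build an Elasticsearch _bulk request body (NDJSON) from scanned items."""
    lines = []
    for item in items:
        doc = {name: plain(value) for name, value in item.items()}
        action = {"index": {"_index": index_name, "_id": str(doc[id_attribute])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```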
You could also use Amazon EMR to run a MapReduce job that exports the table to S3, and process all your data from there.
I would write a script that touches each record in DynamoDB. For each item in your table, add a new attribute called migratedAt (or whatever you wish). Adding this attribute will trigger the DynamoDB stream, which in turn will trigger your Lambda handler. Based on your question, your Lambda handler already handles updates, so no change is needed there.
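A sketch of that "touch every item" script, assuming a low-level boto3 DynamoDB client and that key_names lists the table's partition key (and sort key, if any). Each update_item call changes the item, so the stream emits a MODIFY record that the existing handler picks up.

```python
from datetime import datetime, timezone


def touch_all(dynamodb_client, table_name, key_names):
    """Set migratedAt on every item so each one produces a stream record."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    touched = 0
    kwargs = {"TableName": table_name}
    while True:
        page = dynamodb_client.scan(**kwargs)
        for item in page.get("Items", []):
            # Only the primary key attributes identify the item to update.
            key = {name: item[name] for name in key_names}
            dynamodb_client.update_item(
                TableName=table_name,
                Key=key,
                UpdateExpression="SET migratedAt = :t",
                ExpressionAttributeValues={":t": {"S": stamp}},
            )
            touched += 1
        if "LastEvaluatedKey" not in page:
            return touched
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

If an item is deleted between the scan and the update, update_item would recreate it with only its key and migratedAt; adding a ConditionExpression with attribute_exists on the partition key avoids that. The touch also consumes write capacity for every item.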