Why do I receive two events after an update on DynamoDB? - amazon-web-services

I have configured a DynamoDB stream to trigger my Lambda. When I update an item in the DynamoDB table, my Lambda is triggered twice with two different events. The NewImage and OldImage are the same in both events; they differ only in eventID, ApproximateCreationDateTime, SequenceNumber, etc.
There is only about one millisecond of difference between the timestamps.
I updated the item via the DynamoDB console, which means only one action should have happened; it is impossible to update an item twice within one millisecond through the console.
Is it expected to see two events?

This would not be expected behaviour.
If you're seeing two separate events, that indicates two separate actions occurred. Since the timestamps differ, a second action must have taken place.
From the AWS documentation, the following is true:
DynamoDB Streams helps ensure the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
This is likely related to your application; make sure you're not issuing multiple writes where you think there is only one.
Also check CloudTrail to see whether there are multiple API calls. If you're using global tables, it's possible you'd see a secondary API call, since the item's contents are also modified by the DynamoDB service itself.
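To confirm whether the two invocations really correspond to two distinct writes, it can help to log the identifying fields of each stream record the function receives and compare them. A minimal sketch of such a handler, assuming a standard Python Lambda attached to the stream (field names follow the DynamoDB stream record format):

    import json

    def lambda_handler(event, context):
        # Log the fields that differ between the two invocations so they can
        # be compared later: eventID, eventName, sequence number and the keys.
        for record in event.get("Records", []):
            ddb = record.get("dynamodb", {})
            print(json.dumps({
                "eventID": record.get("eventID"),
                "eventName": record.get("eventName"),  # INSERT / MODIFY / REMOVE
                "sequenceNumber": ddb.get("SequenceNumber"),
                "approxCreation": ddb.get("ApproximateCreationDateTime"),
                "keys": ddb.get("Keys"),
            }, default=str))

If both records show the same keys but MODIFY events with identical images, that points at a duplicate write rather than stream misbehaviour.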

Related

Trigger Lambda function based on attribute value in DynamoDB

I have a DynamoDB table whose items have these attributes: id, user, status. Status can take values A or B.
Is it possible to trigger a Lambda based only on the value of the attribute 'status'?
Example, trigger the lambda when a new item is added to DDB with status == A or when the status of an existing item is updated to A.
(I am looking into DynamoDB streams for achieving this, but I have not come across an example where anyone is using it for this use case.)
Is it possible to monitor a DDB table based on the value of a certain attribute?
Example: when status == B, I don't want to trigger the Lambda, but only emit a metric for that item. Basically, I want a metric showing how many items in the table have status == B at a given point.
If not with DynamoDB, are the above two possible with any other storage type?
Yes, as your initial research has uncovered, this is something you'll want to use DynamoDB Streams for.
You can trigger a Lambda function when an item is written, updated, or removed in DynamoDB, and you can configure your stream subscription to filter on only the attributes and values you care about.
AWS recently introduced the ability to filter stream events before they invoke your function; you can read more about how that works and how to configure it here.
For more information about DynamoDB Stream use cases, this post may be helpful.
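For illustration, here is a rough sketch of how such a filter might be set up with boto3. The stream ARN, function name, and the 'status' attribute are placeholders taken from the question; the pattern syntax follows the Lambda event-filtering rules for DynamoDB stream sources:

    import json
    import boto3

    lambda_client = boto3.client("lambda")

    # Placeholder ARN and function name -- substitute your own values.
    stream_arn = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/LABEL"
    pattern = {
        "eventName": ["INSERT", "MODIFY"],
        "dynamodb": {"NewImage": {"status": {"S": ["A"]}}},
    }

    # Only invoke the function when an item is inserted or updated with status == "A".
    lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName="process-status-a",
        StartingPosition="LATEST",
        FilterCriteria={"Filters": [{"Pattern": json.dumps(pattern)}]},
    )

For the status == B case, a second mapping with a different pattern could route those records to a function that only publishes a CloudWatch metric.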

Set Dynamodb tables replication with boto3 as soon as possible

I'm creating DynamoDB tables with boto3/Python using the client.create_table() method, which is an asynchronous action. However, you cannot create a global table (replication) until after the table is in ACTIVE status. We would like the replication to happen as soon as possible; however, we need to return to the user ASAP and can't wait for the tables to be ready before running create_global_table() on them.
My first thought was to create an SQS message referencing the new table, which a Lambda would then process: monitor the table's status until it is ACTIVE, only then make the call to replicate the table, and only then delete the message.
I suppose I could also create a Lambda that runs every minute, scans for tables that don't have global tables enabled, and runs create_global_table() on them. But this seems inefficient, since these tables are not created very often.
Can anyone think of a better way of doing this?
If the client doesn't need to wait until the table is created, just set up a SQS Queue with a Lambda function behind it.
When a client requests a new table to be created, you send a message to the queue and respond to the client with something along the lines of the HTTP 202 status code.
You can then write the Lambda function that listens to the queue in a way to create the table, wait for it to become active and then create a global table.
Another option would be a Step Function that creates the table, loops until it becomes active, and then creates the global table. That way you'd have no long-running Lambdas. You could trigger the Step Function from the initial Lambda that accepts the request.
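As a rough sketch of what the queue-triggered Lambda (or a Step Function task) might do, assuming a placeholder table name, key schema, and regions. Note that create_global_table() is the legacy (2017) global-tables API the question refers to, and it expects a table of the same name to exist in each listed region; newer tables use update_table with ReplicaUpdates instead:

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    def create_and_replicate(table_name, replica_region):
        dynamodb.create_table(
            TableName=table_name,
            AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
            KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST",
        )

        # The built-in waiter polls DescribeTable until the table is ACTIVE.
        dynamodb.get_waiter("table_exists").wait(TableName=table_name)

        # Legacy (2017) global tables call, as used in the question.
        dynamodb.create_global_table(
            GlobalTableName=table_name,
            ReplicationGroup=[
                {"RegionName": "us-east-1"},
                {"RegionName": replica_region},
            ],
        )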

DynamoDb persist items for a specific time

I have a flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.
I was thinking about two solutions:
1) Move expiry check to Java logic:
Retrieve the DynamoDB data in batches, verify which items have expired in Java, then delete the data in batches and notify the other services.
There are some limitations:
BatchGetItem lets you retrieve at most 100 items.
BatchWriteItem lets you delete at most 25 items.
2) Move expiry check to the db logic:
Query DynamoDB to check which items have expired (and delete them), and return the IDs to the client so we can notify the other services.
Again, there are some limitations:
The result set from a Query is limited to 1 MB per call.
For both solutions, there will be a job that runs periodically, or we'll use an AWS Lambda triggered periodically that calls an endpoint in our app to delete the items from the DB and notify the other services.
My question is whether DynamoDB is a good fit for this case, or should I use a relational DB like MySQL that doesn't have these kinds of limitations? What do you think? Thanks!
Have you considered using the DynamoDB TTL feature? This allows you to create a time-based column in your table that DynamoDB will use to automatically delete the items based on the time value.
This requires no implementation on your part and no polling, querying, or batching limitations. You will need to populate a TTL column, but you may already have that information if you are rolling your own expiration logic.
If other services need to be notified when a TTL event occurs, you can create a Lambda that processes the DynamoDB stream and takes action when a TTL delete event occurs.
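As a rough illustration of that last point, a stream-triggered handler can distinguish TTL deletes from user deletes via the record's userIdentity field; TTL-initiated deletes are performed by the DynamoDB service. The notify_other_services helper below is hypothetical, and OldImage is only present if the stream view type includes old images:

    def notify_other_services(expired_item):
        # Hypothetical placeholder for the notification call described in the question.
        print("expired item:", expired_item)

    def lambda_handler(event, context):
        for record in event.get("Records", []):
            identity = record.get("userIdentity", {})
            # TTL deletes are REMOVE events issued by the DynamoDB service itself.
            if (
                record.get("eventName") == "REMOVE"
                and identity.get("type") == "Service"
                and identity.get("principalId") == "dynamodb.amazonaws.com"
            ):
                notify_other_services(record["dynamodb"].get("OldImage", {}))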

Is dynamoDB' item available for querying immediately?

I added some items to a DynamoDB table using DynamoDBMapper.save and then queried an item immediately. Will I definitely get the saved item? Or should I put Thread.sleep() before querying the item? In a SQL database, we use transactions and can guarantee that we will get the record once it has been inserted into the table. But for DynamoDB, I am not sure. I checked the AWS DynamoDB documentation but didn't find related information.
DynamoDB reads are eventually consistent by default. However, DynamoDB does allow you to specify strongly consistent reads using the ConsistentRead parameter on read operations. It does come at a cost, however: strongly consistent reads consume twice as many read capacity units.
See: Read consistency in DynamoDB
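For example, a strongly consistent read with boto3 looks roughly like this (table and key names are placeholders); with DynamoDBMapper the analogous switch is its consistent-reads mapper configuration:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # ConsistentRead=True returns the latest committed value, at twice the
    # read-capacity cost of an eventually consistent read.
    response = dynamodb.get_item(
        TableName="MyTable",
        Key={"id": {"S": "item-123"}},
        ConsistentRead=True,
    )
    print(response.get("Item"))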

Can a DynamoDB stream see an uncommitted transaction?

I have a DynamoDB table where I am using transactional writes (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html). The transaction consists of 2 puts. Let's say the first put succeeds and the second fails. In this scenario, the first put will be rolled back by the transaction library.
I also have DynamoDB streams (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) enabled on the table and another application consumes from that stream.
Question: In the rollback scenario, will the first successful put result in a DynamoDB stream event and the rollback in another? If yes, is there a way to prevent this, that is, to ensure that a stream event is triggered only for a fully completed transaction?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html
Changes made with transactions are propagated to global secondary indexes (GSIs), DynamoDB streams, and backups eventually, after the transaction completes successfully. Because of eventual consistency, tables restored from an on-demand or point-in-time-recovery (PITR) backup might contain some but not all of the changes made by a recent transaction.
So as I read it, you won't see anything in the stream until after the transaction completes successfully.
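For context, if the writes go through the native transactional write API (which is what the quoted documentation describes), a two-put transaction like the one in the question might look roughly like this with boto3; table and item values are placeholders. If any part is cancelled, neither put is applied, so nothing is emitted to the stream:

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.client("dynamodb")

    try:
        # Both puts commit atomically; stream records for them appear only
        # after the whole transaction completes successfully.
        dynamodb.transact_write_items(
            TransactItems=[
                {"Put": {"TableName": "TableA", "Item": {"id": {"S": "a-1"}}}},
                {"Put": {"TableName": "TableB", "Item": {"id": {"S": "b-1"}}}},
            ]
        )
    except ClientError as err:
        # On TransactionCanceledException neither put is applied and nothing
        # reaches the stream.
        print("transaction failed:", err.response["Error"]["Code"])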