Can a DynamoDB stream see an uncommitted transaction? - amazon-web-services

I have a DynamoDB table where I am using transactional writes (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html). The transaction consists of 2 puts. Let's say the first put succeeds and the second fails. In this scenario, the first put will be rolled back by the transaction library.
I also have DynamoDB streams (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) enabled on the table and another application consumes from that stream.
Question: In the rollback scenario, will the first successful put result in a DynamoDB stream event and the rollback in another? If yes, is there a way to prevent this, that is, to ensure that a stream event is triggered only for a fully completed transaction?

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html
Changes made with transactions are propagated to global secondary indexes (GSIs), DynamoDB streams, and backups eventually, after the transaction completes successfully. Because of eventual consistency, tables restored from an on-demand or point-in-time recovery (PITR) backup might contain some but not all of the changes made by a recent transaction.
So, as I read it, you won't see anything in the stream until after the transaction completes successfully.
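To make this concrete, here is a minimal sketch of the two-put transaction from the question using the AWS SDK for JavaScript v3 (the table and attribute names are made up for illustration). Both puts commit together or not at all, and per the documentation above, stream records for them should only appear after the transaction has completed successfully:
const { DynamoDBClient, TransactWriteItemsCommand } = require("@aws-sdk/client-dynamodb");

const client = new DynamoDBClient({});

async function writeBothOrNothing() {
  // Both puts are submitted as one all-or-nothing request; a canceled
  // transaction should leave no trace in the table or its stream.
  await client.send(new TransactWriteItemsCommand({
    TransactItems: [
      { Put: { TableName: "Orders", Item: { pk: { S: "order#123" }, status: { S: "CREATED" } } } },
      { Put: { TableName: "OrderEvents", Item: { pk: { S: "order#123#created" }, type: { S: "ORDER_CREATED" } } } },
    ],
  }));
}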

Related

AWS Lambda | How to rollback database changes due to execution timeout

My team is working on an AWS Lambda function that has a configured timeout of 30 seconds. Given that Lambdas have this timeout constraint and the fact that they can be reused for subsequent requests, it seems like there will always be the potential for the function's execution to time out before completing all of its necessary steps. Is this a correct assumption? If so, how do we bake in resiliency so that DB updates can be rolled back if a timeout occurs after records have been updated, but a response hasn't been returned to the function's caller?
To be more specific, my team is managing a JavaScript-based Lambda (Node.js 16.x) that sits behind an API Gateway and is an implementation of a REST method to retrieve and update job records. The method works by retrieving records from DynamoDB given certain conditions, updating their states, then returning the updated job records to the caller. Is there a means to detect when a timeout has occurred and to roll back (either manually or automatically) the updated DB records so that they're in the same state as when the Lambda began execution?
It is important to consider the consequences of what you are trying to do here. Instead of finding ways to detect when your Lambda function is about to expire, the best practice is to first monitor a good chunk of executed requests and analyze how much time, on average, it takes to complete those requests. Perhaps 30 seconds is simply not enough to complete the work implemented in your Lambda function.
Once you settle on a timeout that suits the average execution time of your requests, you can minimize the possibility of rollbacks caused by incomplete executions by using DynamoDB's support for transactions. Transactions allow you to group multiple operations together and submit them as a single all-or-nothing request, thus ensuring atomicity.
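As a sketch of that idea, assuming a hypothetical Jobs table keyed by jobId with a status attribute (names invented for illustration), the job-state updates from the question could be submitted as one all-or-nothing request through the SDK's document client:
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, TransactWriteCommand } = require("@aws-sdk/lib-dynamodb");

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function claimJobs(jobIds) {
  // Every update commits together; if any condition check fails, none of them are applied.
  await docClient.send(new TransactWriteCommand({
    TransactItems: jobIds.map((jobId) => ({
      Update: {
        TableName: "Jobs",                    // hypothetical table name
        Key: { jobId },
        UpdateExpression: "SET #s = :claimed",
        ConditionExpression: "#s = :pending", // only claim jobs that are still pending
        ExpressionAttributeNames: { "#s": "status" },
        ExpressionAttributeValues: { ":claimed": "CLAIMED", ":pending": "PENDING" },
      },
    })),
  }));
}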
Another aspect of your implementation's design is how fast you can retrieve data from DynamoDB without compromising the timeout. Currently, your code retrieves records from DynamoDB and then updates them if certain conditions are met, so the read needs to happen as fast as possible so that the subsequent update can start. One way to speed up this read is to enable DAX (DynamoDB Accelerator) for in-memory acceleration. It acts as a cache for DynamoDB with microsecond latency.
Finally, if you want to be extra careful and not even start a transaction in DynamoDB when there isn't enough time left to complete it, you can use the context object from the Lambda API to query the function's remaining time. In Node.js, you can do it like this:
exports.handler = async (event, context) => {
  // TIMEOUT_PASSED_AS_ENVIRONMENT_VARIABLE is a threshold (in ms) you read from an environment variable
  const remainingTimeInMillis = context.getRemainingTimeInMillis();
  if (remainingTimeInMillis < TIMEOUT_PASSED_AS_ENVIRONMENT_VARIABLE) {
    // Cancel the execution and clean things up
    return;
  }
};
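With that guard in place, the function can fail fast and return an error to its caller rather than being killed mid-transaction by the Lambda timeout.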

DynamoDB Transaction Write

DynamoDB's documentation for transactional writes states:
Multiple transactions updating the same items simultaneously can cause conflicts that cancel the transactions. We recommend following DynamoDB best practices for data modeling to minimize such conflicts.
If there are multiple simultaneous TransactWriteItems requests on the same item, will all of the transactional write requests fail with TransactionCanceledException? Or will at least one request succeed?
This purely depends on how the conflicts happen. If all the writes in a transaction succeed, the transaction will be successful. If some other write or transaction modifies one of the items in the meantime, it will fail.
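When a conflict does cancel the request, the SDK for JavaScript v3 raises a TransactionCanceledException whose CancellationReasons list has one entry per item, so you can see which write caused the cancellation and decide whether to retry. A minimal sketch (error handling only; the transaction items are whatever you submit):
const { DynamoDBClient, TransactWriteItemsCommand, TransactionCanceledException } = require("@aws-sdk/client-dynamodb");

const client = new DynamoDBClient({});

async function tryTransaction(transactItems) {
  try {
    await client.send(new TransactWriteItemsCommand({ TransactItems: transactItems }));
    return true;
  } catch (err) {
    if (err instanceof TransactionCanceledException) {
      // One reason per item; a code of "TransactionConflict" means another
      // transaction touched that item, so retrying with backoff is a common strategy.
      console.log(err.CancellationReasons);
      return false;
    }
    throw err;
  }
}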

Why do I receive two events after an update on DynamoDB?

I have configured a DynamoDB stream to trigger my Lambda. When I update an item on a DynamoDB table, I see my Lambda triggered twice with two different events. The NewImage and OldImage are the same in these two events; they only differ in eventID, ApproximateCreationDateTime, SequenceNumber, etc.
And based on the timestamps, there is only about one millisecond between them.
I updated the item via the DynamoDB console, which means there should have been only one action. Otherwise, it would be impossible to update the item twice within one millisecond via the console.
Is it expected to see two events?
This would not be expected behaviour.
If you're seeing 2 separate events, this would indicate that 2 separate actions occurred. As there is a different timestamp on each, this indicates a secondary action has occurred.
From the AWS documentation, the following is true:
DynamoDB Streams helps ensure the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
This is likely related to your application; ensure that you're not issuing multiple writes where you think there is a single one.
Also check CloudTrail to see whether there are multiple API calls. I would imagine that if you're using global tables, there's a possibility of seeing a secondary API call, as the contents of the item would be modified by the DynamoDB service.
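One way to narrow this down is to log the identifying fields of every stream record the function receives, so the two events can be compared side by side. A minimal Node.js handler sketch:
exports.handler = async (event) => {
  for (const record of event.Records) {
    // eventName is INSERT, MODIFY or REMOVE; two MODIFY records with different
    // sequence numbers point to two separate writes against the item.
    console.log({
      eventID: record.eventID,
      eventName: record.eventName,
      approximateCreationDateTime: record.dynamodb.ApproximateCreationDateTime,
      sequenceNumber: record.dynamodb.SequenceNumber,
      keys: record.dynamodb.Keys,
    });
  }
};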

How to achieve consistent reads across multiple SELECTs using AWS RDS DataService (Aurora Serverless)

I'm not sure how to achieve consistent read across multiple SELECT queries.
I need to run several SELECT queries and make sure that between them, no UPDATE, DELETE or CREATE has altered the overall consistency. The best case for me would be something non-blocking, of course.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is that when I'm using RDS DataService beginTransaction with several executeStatement calls (with the provided transactionId), I'm NOT getting the full result at the end when calling commitTransaction.
The commitTransaction only provides me with a { transactionStatus: 'Transaction Committed' }.
I don't understand, isn't the commitTransaction function supposed to give me the whole dataset (from my many SELECTs) as a result?
Instead, even with a transactionId, each executeStatement is returning individual results... This behaviour is obviously NOT consistent.
With SELECTs in one transaction under REPEATABLE READ you should see the same data and won't see any changes made by other transactions. Yes, the data can be modified by other transactions, but while inside a transaction you operate on a snapshot view and can't see those changes. So it is consistent.
To make sure that no data is actually changed between the SELECTs, the only way is to lock the tables/rows, e.g. with SELECT ... FOR UPDATE, but that should not be necessary here.
Transactions should be short and fast; locking tables or preventing updates while some long-running chain of SELECTs runs is obviously not an option.
Queries issued against the database run at the time they are issued. Changes they make stay uncommitted until commit. A query may be blocked if it targets a resource another transaction has locked. A query may fail if another transaction modified a resource, resulting in a conflict.
Transaction isolation affects how the effects of this and other transactions happening at the same moment are handled. Wikipedia
With isolation level REPEATABLE READ (which, by the way, Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables) you operate on a read view of the database and see only data committed before the transaction began.
This means that SELECTs in one transaction will see the same data, even if changes were made by other transactions.
By comparison, with transaction isolation level READ COMMITTED, subsequent SELECTs in one transaction may see different data that was committed between them by other transactions.
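For completeness, here is a minimal sketch of that pattern with the Data API (SDK for JavaScript v2 style, with placeholder ARNs and table names): every executeStatement that shares the transactionId reads from the same REPEATABLE READ snapshot and returns its own records, which you collect as you go, while commitTransaction only reports the transaction status, exactly as observed in the question:
const AWS = require("aws-sdk");
const rdsData = new AWS.RDSDataService();

const base = {
  resourceArn: process.env.CLUSTER_ARN, // placeholder: your Aurora cluster ARN
  secretArn: process.env.SECRET_ARN,    // placeholder: your Secrets Manager secret ARN
  database: "mydb",                     // placeholder database name
};

async function consistentReads() {
  // All statements share one transactionId, so they see the same snapshot of the data.
  const { transactionId } = await rdsData.beginTransaction(base).promise();
  const orders = await rdsData.executeStatement({ ...base, transactionId, sql: "SELECT * FROM orders" }).promise();
  const items = await rdsData.executeStatement({ ...base, transactionId, sql: "SELECT * FROM order_items" }).promise();
  // commitTransaction ends the transaction and only returns its status, not the accumulated results.
  const { transactionStatus } = await rdsData.commitTransaction({
    resourceArn: base.resourceArn,
    secretArn: base.secretArn,
    transactionId,
  }).promise();
  return { orders: orders.records, items: items.records, transactionStatus };
}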

How does DynamoDB handle atomic transactions if further writes are made through DynamoDB Streams?

The question is how DynamoDB handles ACID transactions if some transactions are associated with DynamoDB Streams which cause further writes to DynamoDB tables.
Is it going to revert the writes that occurred via DynamoDB Streams, or leave them unchanged? And if it does revert those changes, how does that work? This behaviour could not be found in any official AWS documentation.
For example, consider a situation where there are three operations, A, B and C, that need to be atomic. When A executes, it emits a DynamoDB stream record that triggers a Lambda function, which makes a write D modifying some other DynamoDB table. Now, if the transaction fails, how would it revert the modifications made by write D?
So, I'm expecting some expert can point me in the right direction.
Thanks,
varnit