Datomic Transaction Functions on Every Transaction - clojure

Is there a way to append a transaction function to every transaction before it is committed in datomic? For example, regardless of the source of the transaction, we want to invalidate a record.

Yes, you can "append", or rather call, a transaction function before a transaction is committed. The transaction processor will look up the function in its :db/fn attribute and then invoke it, passing in the value of the db (current as of the beginning of the transaction). As such you will need to make the call as part of each transaction.
Please note there is no API or hook that automatically invokes a transaction function on every call to transact. For every call to transact you will need to include your validating transaction function yourself (see the sketch below the docs link).
http://docs.datomic.com/database-functions.html
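A minimal sketch of what that looks like on the peer, assuming the function name :my/invalidate and the attribute :record/valid? (both made up for illustration):

(require '[datomic.api :as d])

;; Install the function as an entity with a :db/fn attribute.
(defn install-invalidate-fn [conn]
  @(d/transact conn
     [{:db/ident :my/invalidate
       :db/fn    (d/function
                   '{:lang   :clojure
                     :params [db eid]
                     ;; returns the tx-data to merge into the transaction
                     :code   [[:db/add eid :record/valid? false]]})}]))

;; There is no automatic hook, so every call to transact has to include
;; the function call itself; the transactor supplies db as the first argument.
(defn transact-with-invalidate [conn eid tx-data]
  @(d/transact conn (into [[:my/invalidate eid]] tx-data)))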

If you want to validate the incoming transaction data, you can do some trickery on the peer side. For example, you can take a db, use with to get a "fake" db with the incoming transaction applied, and then validate against that db. Using the normal Datomic APIs on this speculative db from with, you can easily see which entities the transaction touched, and check whether it touched entities it was not allowed to touch.
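A rough peer-side sketch of that approach; the allowed-entity? predicate is hypothetical and stands in for whatever rule you want to enforce:

(require '[datomic.api :as d])

(defn validated-transact!
  "Speculatively applies tx-data with d/with, checks which entities it
   touches, and only then transacts for real."
  [conn tx-data allowed-entity?]
  (let [db      (d/db conn)
        report  (d/with db tx-data)          ; "fake" db with the tx applied
        ;; every datom in :tx-data names a touched entity (this also includes
        ;; the transaction entity itself, which carries :db/txInstant)
        touched (set (map :e (:tx-data report)))]
    (if (every? allowed-entity? touched)
      @(d/transact conn tx-data)
      (throw (ex-info "Transaction touches a forbidden entity"
                      {:touched touched})))))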
If, however, you want to validate something on the transactor side, your only option is manually invoking database functions.

Related

AWS Lambda | How to rollback database changes due to execution timeout

My team is working on an AWS Lambda function that has a configured timeout of 30 seconds. Given that lambdas have this timeout constraint and the fact that they can be reused for subsequent requests, it seems like there will always be the potential for the function's execution to time out prior to completing all of its necessary steps. Is this a correct assumption? If so, how do we bake in resiliency so that db updates can be rolled back in the case of a timeout occurring after records have been updated, but a response hasn't been returned to the function's caller?
To be more specific, my team is managing a Javascript-based lambda (Node.js 16.x) that sits behind an API Gateway and is an implementation of a REST method to retrieve and update job records. The method works by retrieving records from DynamoDB given certain conditions, updating their states, then returning the updated job records to the caller. Is there a means to detect when a timeout has occurred and to roll back (either manually or automatically) the updated db records so that they're in the same state as when the lambda began execution?
It is important to consider the consequences of what you are trying to do here. Instead of finding ways to detect when your Lambda function is about to expire, the best practice is to first monitor a good chunk of executed requests and analyze how much time, on average, it takes to complete them. Perhaps 30 seconds is simply not enough to complete the work this Lambda function does.
Once you settle on a timeout that suits the average execution time of your requests, you can minimize the possibility of rollbacks caused by incomplete executions by using DynamoDB's support for transactions. It allows you to group multiple operations together and submit them as a single all-or-nothing request, thus ensuring atomicity (see the sketch below).
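For illustration, here is a minimal all-or-nothing write with TransactWriteItems, sketched in Clojure using the cognitect aws-api client (the table, key and attribute names are made up); the equivalent call exists in the AWS SDK for JavaScript. Either both operations are applied or neither is.

(require '[cognitect.aws.client.api :as aws])

(def ddb (aws/client {:api :dynamodb}))

(aws/invoke ddb
  {:op :TransactWriteItems
   :request
   {:TransactItems
    [{:Update {:TableName "jobs"
               :Key {"jobId" {:S "job-1"}}
               :UpdateExpression          "SET #s = :claimed"
               :ExpressionAttributeNames  {"#s" "state"}
               :ExpressionAttributeValues {":claimed" {:S "CLAIMED"}}}}
     {:Put {:TableName "job-audit"
            :Item {"jobId" {:S "job-1"}
                   "event" {:S "claimed"}}}}]}})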
Another aspect of your design is how fast you can retrieve data from DynamoDB without compromising the timeout. Currently, your code retrieves records from DynamoDB and then updates them if certain conditions are met, so the read needs to happen as fast as possible so the subsequent update can start. One way to speed up this read is to enable DAX (DynamoDB Accelerator) for in-memory acceleration. It acts as a cache in front of DynamoDB with microsecond latency.
Finally, if you want to be extra careful and not even start a transaction in DynamoDB when there will not be enough time to finish it, you can use the context object from the Lambda API to query the function's remaining time. In Node.js, it looks like this:
// Milliseconds left before this invocation is forcibly timed out
const remainingTimeInMillis = context.getRemainingTimeInMillis()
if (remainingTimeInMillis < TIMEOUT_PASSED_AS_ENVIRONMENT_VARIABLE) {
  // Cancel the execution and clean things up
}

How to achieve consistent read across multiple SELECT using AWS RDS DataService (Aurora Serverless)

I'm not sure how to achieve consistent read across multiple SELECT queries.
I need to run several SELECT queries and to make sure that between them, no UPDATE, DELETE or CREATE has altered the overall consistency. The best case for me would be something non blocking of course.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is when I'm using RDS DataService beginTransaction with several executeStatement (with the provided transactionId). I'm NOT getting the full result at the end when calling commitTransaction.
The commitTransaction call only gives me back { transactionStatus: 'Transaction Committed' }.
I don't understand, isn't the commit transaction function supposed to give me the whole dataset result (of my many SELECTs)?
Instead, even with a transactionId, each executeStatement returns its own individual result... This behaviour is obviously NOT consistent.
With SELECTs in one transaction under REPEATABLE READ you should see the same data and will not see any changes made by other transactions. Yes, data can be modified by other transactions, but while inside your transaction you operate on a read view and can't see those changes. So it is consistent.
To make sure that no data is actually changed between the selects, the only way is to lock the tables / rows, e.g. with SELECT ... FOR UPDATE - but that should not be needed here.
Transactions should be short / fast and locking tables / preventing updates while some long-running chain of selects runs is obviously not an option.
Queries run against the database at the time they are issued. Their effects stay uncommitted until commit. A query may be blocked if it targets a resource another transaction holds a lock on, and it may fail if another transaction modified that resource, resulting in a conflict.
Transaction isolation determines how the effects of this transaction and of other transactions happening at the same time are handled (see the Wikipedia article on isolation levels).
With isolation level REPEATABLE READ (which, by the way, Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables) you operate on a read view of the database and see only data committed before the BEGIN of the transaction.
This means that SELECTs in one transaction will see the same data, even if changes were made by other transactions in the meantime.
By comparison, with transaction isolation level READ COMMITTED, subsequent selects in one transaction may see different data - data that was committed in between them by other transactions.
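For what it's worth, here is a rough sketch of that flow with the Data API, using Clojure's cognitect aws-api (the ARNs, database name and SQL are placeholders). The point it illustrates: each ExecuteStatement call returns its own records, and CommitTransaction returns only a status, so you collect the SELECT results as you go while they all read from the same snapshot.

(require '[cognitect.aws.client.api :as aws])

(def rds (aws/client {:api :rds-data}))

(def base {:resourceArn "arn:aws:rds:eu-west-1:123456789012:cluster:my-cluster"
           :secretArn   "arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-secret"
           :database    "mydb"})

(let [{:keys [transactionId]}
      (aws/invoke rds {:op :BeginTransaction :request base})
      select (fn [sql]
               (:records (aws/invoke rds {:op :ExecuteStatement
                                          :request (assoc base
                                                          :sql sql
                                                          :transactionId transactionId)})))
      orders (select "SELECT * FROM orders")
      items  (select "SELECT * FROM order_items")]   ; same REPEATABLE READ snapshot
  (aws/invoke rds {:op :CommitTransaction
                   :request {:resourceArn   (:resourceArn base)
                             :secretArn     (:secretArn base)
                             :transactionId transactionId}})
  {:orders orders :order-items items})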

Can DynamoDB stream see an uncommitted transaction?

I have a DynamoDB table where I am using transactional writes (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html). The transaction consists of 2 puts. Let's say the first put succeeds and the second fails. In this scenario, the first put will be rolled back by the transaction library.
I also have DynamoDB streams (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) enabled on the table and another application consumes from that stream.
Question: In the rollback scenario, will the first successful put result in a DynamoDB stream event and the rollback in another? If yes, is there a way to prevent this, that is, to ensure that a stream event is triggered only for a fully completed transaction?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html
Changes made with transactions are propagated to global secondary indexes (GSIs), DynamoDB streams, and backups eventually, after the transaction completes successfully. Because of eventual consistency, tables restored from an on-demand or point-in-time-recovery (PITR) backup might contain some but not all of the changes made by a recent transaction.
So as I read it, you won't see anything in the stream until after the transaction completes successfully.

Is Redis atomic when multiple clients attempt to read/write an item at the same time?

Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
    serve the value of that key to the client
if the key does not exist:
    get the most recent item from dynamoDB
    insert that item as the value for that key, and set an expiration time
    delete that item from dynamoDB, so that it only gets read into memory once
    serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then lambda needs to first get the item from the database and put it back into Redis.
But what happens if 2 clients make an API call to lambda simultaneously? Will both lambda processes see that there is no key, and will both take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one overwriting the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this (a Clojure sketch follows the steps):
GET the key from Redis
If the value exists:
    Serve the value to the client
If the value does not exist:
    Get the most recent item from DynamoDB
    Insert that item into Redis with SETNX
        If the key already exists, go back to step 1
    Set an expiration time with EXPIRE
    Delete that item from DynamoDB
    Serve the value to the client
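Here is a rough version of that loop in Clojure with carmine; the Redis connection map, the TTL, and the fetch-most-recent-item! / delete-item! helpers are placeholders for the DynamoDB calls described above, not a drop-in implementation.

(require '[taoensso.carmine :as car])

(def redis-conn {:pool {} :spec {:host "localhost" :port 6379}})
(defmacro wcar* [& body] `(car/wcar redis-conn ~@body))

(defn fetch-value [k ttl-seconds fetch-most-recent-item! delete-item!]
  (loop []
    (if-let [v (wcar* (car/get k))]
      v                                           ; steps 1-2: key exists, serve it
      (let [item (fetch-most-recent-item!)]       ; read a candidate from DynamoDB
        (if (= 1 (wcar* (car/setnx k item)))      ; only one concurrent writer wins
          (do (wcar* (car/expire k ttl-seconds))  ; set the expiration
              (delete-item! item)                 ; remove it from DynamoDB
              item)
          (recur))))))                            ; someone else won: re-read the key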

Locking Behavior In Spanner vs MySQL

I'm exploring moving an application built on top of MySQL into Spanner and am not sure if I can replicate certain functionality from our MySQL db.
Basically, a simplified version of our MySQL schema would look like this:
users: id, name, balance
user_transactions: id, user_id, external_id, amount
user_locks: user_id, date
When the application receives a transaction for a user, the app starts a MySQL transaction, updates the user_lock row for that user, checks if the user has sufficient balance for the transaction, creates a new transaction, and then updates the balance. It is possible the application receives transactions for the same user at the same time, and so the lock forces them to be sequential.
Is it possible to replicate this in Spanner? How would I do so? Basically, if the application receives two transactions at the same time, I want to ensure that they are given an order and that the changed data from the first transaction is propagated to the second transaction.
Cloud Spanner would do this by default since it provides serializability, which means that all transactions appear to have occurred in serial order. You can read more about the transaction semantics here:
https://cloud.google.com/spanner/docs/transactions#rw_transaction_semantics
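As an illustration, here is a sketch of such a read-write transaction using the Java client from Clojure (the project/instance/database ids, table and column names are made up, and the row is assumed to exist). Because Spanner retries the whole callable on conflict and guarantees serializability, you don't need the explicit user_locks row; two concurrent transactions for the same user effectively run one after the other.

(import '(com.google.cloud.spanner SpannerOptions DatabaseId Key Mutation
                                   TransactionRunner$TransactionCallable))

(def spanner (.getService (SpannerOptions/getDefaultInstance)))
(def client  (.getDatabaseClient spanner
               (DatabaseId/of "my-project" "my-instance" "my-database")))

(defn debit! [user-id amount]
  (.run (.readWriteTransaction client)
    (reify TransactionRunner$TransactionCallable
      (run [_ txn]
        ;; read the current balance inside the transaction
        (let [row     (.readRow txn "users" (Key/of user-id) ["balance"])
              balance (.getLong row "balance")]
          (when (>= balance amount)
            ;; buffer the balance update; it commits atomically with the read
            ;; (you would buffer the user_transactions insert here as well)
            (.buffer txn (-> (Mutation/newUpdateBuilder "users")
                             (.set "id")      (.to user-id)
                             (.set "balance") (.to (- balance amount))
                             (.build))))
          nil)))))

(debit! 42 100)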