I read Eric Evans's book about DDD, the chapter on Aggregates.
When dealing with the Order/OrderLine example, it is stated:
When both users have saved their changes, an Order will be stored in the
database that violates the invariant of the domain model. An important
business rule has been broken. And nobody even knows. Clearly, locking
a single line-item isn’t an adequate safeguard. If, instead, we locked
an entire Order at a time, the problem would have been prevented.
I know well that the essence of an Aggregate is to protect invariants within a single wrapping database transaction.
But should every aggregate be protected with a read lock on the database side to prevent potential concurrency issues (race conditions) when multiple users modify the same aggregate simultaneously?
Is the real point of forming an aggregate to gather elements together for a read lock on the database side?
Any clarification about this would make me happy.
No, the two are orthogonal:
The goal of the aggregate design is to establish a consistency boundary and protect the invariants within that boundary. The goal of the locking design is to enable an appropriate level of concurrency within the application.
What this means is that with the same aggregate design, different locking mechanisms might make sense (depending on the non-functional requirements of the application).
I know well that the essence of an Aggregate is to protect invariants
within a single wrapping database transaction.
Is it? Aggregate roots are there as a consistency boundary around their own invariants. Given that persistence happens across a network in a completely different process, how can an aggregate in a process ever hope to guarantee consistency around something that is WAY outside its control?
The essence of DDD is the separation of domain and infrastructure concerns; aggregates should be couched in terms of the domain being modelled (Orders, Products, Customers), and everything else (databases, persistence, locking, transactions) is infrastructure and should not pollute your domain model.
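To make the distinction concrete, here is a minimal sketch in Java (the class names and the particular invariant are invented for illustration): the aggregate root enforces its rule in plain domain code, and nothing in it says anything about how concurrent saves are detected or prevented - that choice (optimistic version column, pessimistic row lock, etc.) is made separately in the infrastructure layer.

import java.util.ArrayList;
import java.util.List;

public class Order {
    private static final int MAX_TOTAL_QUANTITY = 100; // example invariant: an order may not exceed 100 items

    private final List<OrderLine> lines = new ArrayList<>();

    // Lines can only be added through the root, so the invariant is checked
    // on every change, no matter which caller is acting.
    public void addLine(String product, int quantity) {
        if (totalQuantity() + quantity > MAX_TOTAL_QUANTITY) {
            throw new IllegalStateException("Order would exceed the allowed total quantity");
        }
        lines.add(new OrderLine(product, quantity));
    }

    public int totalQuantity() {
        return lines.stream().mapToInt(OrderLine::quantity).sum();
    }

    public record OrderLine(String product, int quantity) { }
}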
Related
I was reading the DynamoDB documentation and found some interesting features:
Eventually consistent reads
Strongly consistent reads
Conditional updates
My question is, how do these three things interact with each other? Mostly I'm wondering whether conditional updates use strongly consistent reads for checking the condition, or eventually consistent reads. If it's the latter, there is still a race condition, correct?
For a conditional update you need strong consistency. I am going to guess that an update is a separate operation in which the consistent read and the write happen atomically and fail or succeed together.
The way to think of Dynamo is as a group of separate entities that all keep track of the state, inform each other of updates that are made, and agree whether such updates can be propagated to the whole group or not.
When you (or the Dynamo API on your behalf) write, you basically inform a subset of these entities that you want to update data. After that, the data propagates to all of these entities.
When you do an eventually consistent read, you read from one of these entities. It is eventually consistent, meaning there is a possibility that you will read from an entity that did not get the memo yet.
When doing a strongly consistent read, you read from enough entities to ensure that what you read has propagated. If propagation is still in progress, you have to wait.
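To make that concrete, a conditional update expresses the check as part of the write itself, and DynamoDB rejects the write if the condition does not hold, so the check and the write succeed or fail together rather than relying on a separate read. Here is a hedged sketch using the AWS SDK for Java v2 (table, key and attribute names are made up):

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

public class ConditionalUpdateExample {
    // Apply the update only if the stored version still matches the one we last read.
    static boolean updateIfUnchanged(DynamoDbClient dynamo, String expectedVersion, String newVersion) {
        UpdateItemRequest request = UpdateItemRequest.builder()
                .tableName("Orders")                                       // made-up table name
                .key(Map.of("orderId", AttributeValue.builder().s("order-123").build()))
                .updateExpression("SET quantity = quantity + :inc, version = :newVersion")
                .conditionExpression("version = :expectedVersion")
                .expressionAttributeValues(Map.of(
                        ":inc", AttributeValue.builder().n("1").build(),
                        ":newVersion", AttributeValue.builder().s(newVersion).build(),
                        ":expectedVersion", AttributeValue.builder().s(expectedVersion).build()))
                .build();
        try {
            dynamo.updateItem(request);
            return true;                                                    // condition held, write applied
        } catch (ConditionalCheckFailedException e) {
            return false;                                                   // the item changed since we read it
        }
    }
}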
Having read the following statement from the official documentation of OrientDB:
In order to guarantee atomicity and consistency, OrientDB acquire an
exclusive lock on the storage during transaction commit.
I am wondering if my understanding of the situation is correct. Here is how I assume this will work:
Thread 1 opens a transaction, and reads records #1:100 to #1:200, some from class A, and some from class B (something which cannot be determined without the transaction coming to a close).
Thread 1 massages the data, maybe even inserting a few records.
Thread 1 starts to commit the data. As the database does not have any way to know which parts of the data might be affected by the open transaction, it blindly blocks the whole storage unit and verifies the #version to enforce optimistic locking on all possibly affected records.
Thread 2 tries to read record #1:1 (or any other record from the whole database) and is blocked by the commit process, which is in line, AFAIK, with exclusive locking on the storage unit. This block occurs, if I'm not off, regardless of the cluster the original data resides on, since we have multi-master datasets.
Thread 1 ends the commit process and the database becomes consistent, effectively lifting the lock.
At this point, any thread can operate on the dataset, transactionally or otherwise, and will not be bound by the exclusive locking mechanism.
If this is the case, then during the exchange highlighted in point 3 the data store, in its entirety, is in an effective trance state and cannot be reached, read from, or interacted with in any meaningful way.
I do so hope that my guess is off the mark.
Disclaimer: I have not had the chance to dig into the underlying code of the rather rich OrientDB codebase. As such, this is, at best, an educated guess and should not be taken as any sort of reference as to how OrientDB actually operates.
Possible Workarounds:
Should worst come to worst and this happens to be the way OrientDB actually works, I would dearly welcome any workarounds to this conundrum. We are looking for meaningful ways that will still keep OrientDB a viable option for an enterprise-grade, scalable, high-end application.
In the current release of OrientDB, transactions lock the storage in exclusive mode. Fortunately, OrientDB works in an optimistic way and this is done "only" at commit() time, so it does not matter when the transaction was begun.
If this is a showstopper for your use case, you could consider to:
don't use transactions. In this case you'll go in parallel with no locks, but keep in mind that using indexes requires a lock at the index level. If an index becomes a bottleneck, the most common workaround is to create X sub-classes with an index on each; OrientDB will use the sub-classes' indexes if needed, and on a CRUD operation only the specific index will be locked
wait for OrientDB 3.0 where this limitation will be removed with real parallel transaction execution
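If you do keep transactions, the usual pattern that goes with OrientDB's optimistic approach is to treat a failed version check at commit() as a signal to redo the whole unit of work. A rough sketch with the pre-3.0 document API (connection details, record id and field name are made up):

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.exception.OConcurrentModificationException;
import com.orientechnologies.orient.core.id.ORecordId;
import com.orientechnologies.orient.core.record.impl.ODocument;

public class RetryingWriter {
    // Re-run the unit of work when the optimistic version check fails at commit() time.
    static void updateWithRetry(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            ODatabaseDocumentTx db =
                    new ODatabaseDocumentTx("remote:localhost/mydb").open("admin", "admin");
            try {
                db.begin();
                ODocument doc = db.load(new ORecordId("#9:1"));   // made-up record id
                doc.field("quantity", 3);
                doc.save();
                db.commit();                                      // the storage lock is held only here
                return;                                           // success
            } catch (OConcurrentModificationException e) {
                // Another transaction changed the record in the meantime: loop and redo the work.
            } finally {
                db.close();
            }
        }
    }
}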
For my implementation, a particular write must be done in bulk and without the chance of another interfering.
I have been told that two competing transactions in this way will lead to the first one blocking the second, and the second may or may not complete after the first has.
Please post the documentation that confirms this. Also, what exactly happens to the second transaction if the first is blocking? Will it be queued, fail, or some combination?
If this cannot be confirmed, should the transaction isolation level for this transaction be set to SERIALIZABLE? If so, how can that be done with libpqxx prepared statements?
If the transactions are serialized, will the second transaction fail or be queued until the first has completed?
If either fail, how can this be detected with libpqxx?
The only way to conclusively prevent concurrency effects is to LOCK TABLE ... IN ACCESS EXCLUSIVE MODE each table you wish to modify.
This means you're really only doing one thing at a time. It also leads to fun problems with deadlocks if you don't always acquire your locks in the same order.
So usually, what you need to do is figure out what exactly the operations you wish to do are, and how they interact. Determine what concurrency effects you can tolerate, and how to prevent those you cannot.
This question as it stands is just too broad to usefully answer.
Options include:
Exclusively locking tables. (This is the only way to do a multi-row upsert without concurrency problems in PostgreSQL right now). Beware of lock upgrade and lock order related deadlocks.
Appropriate use of SERIALIZABLE isolation - but remember, you have to be able to keep a record of what you did during the transaction and retry it if the tx aborts (see the sketch after this list).
Careful row-level locking - SELECT ... FOR UPDATE, SELECT ... FOR SHARE.
"Optimistic locking" / optimistic concurrency control, where appropriate
Writing your queries in ways that make them more friendly toward concurrent operation. For example, replacing read-modify-write cycles with in-place updates.
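The question mentions libpqxx, but to keep things short here is the retry pattern sketched in Java/JDBC; the same idea carries over to libpqxx. PostgreSQL aborts a SERIALIZABLE transaction it cannot serialize with SQLSTATE 40001, and the application's job is to roll back and re-run the whole transaction. Table name and statements are placeholders:

import java.sql.*;

public class SerializableRetry {
    // Run a unit of work at SERIALIZABLE isolation and retry when PostgreSQL
    // aborts it with a serialization failure (SQLSTATE 40001).
    static void transferWithRetry(String url, int maxAttempts) throws SQLException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Connection con = DriverManager.getConnection(url)) {
                con.setAutoCommit(false);
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                try (Statement st = con.createStatement()) {
                    // Placeholder work: a read-modify-write cycle that could conflict.
                    st.executeUpdate("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
                    st.executeUpdate("UPDATE accounts SET balance = balance + 100 WHERE id = 2");
                    con.commit();
                    return; // success
                } catch (SQLException e) {
                    con.rollback();
                    if ("40001".equals(e.getSQLState()) && attempt < maxAttempts) {
                        continue; // serialization failure: redo the whole transaction
                    }
                    throw e; // some other error, or out of attempts
                }
            }
        }
    }
}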
I am relatively new to JPA and have become very confused about how to best optimistically lock and refresh entities. I would like a set of general purpose methods to handle this consistently in my project.
I may be calling the lock/refresh methods from within a method that does not know the state of the entity; it may have been passed a detached or new/not-yet-saved entity object, as well as one previously read from the database. For simplicity I would like my utility methods to handle all eventualities. Semantically, the methods I am trying to implement are:
MyEntity refreshAndLock(MyEntity e)
Re-reads the entity from the database and locks it optimistically, or do nothing for entities yet to be saved to the database. Detached entities would also be re-read and locked and a managed version returned.
MyEntity refresh(MyEntity e)
Just re-read the entity, or do nothing for entities yet to be saved to the database. Detached entities would also be re-read.
MyEntity lockAndNotRefresh(MyEntity e)
Lock the version of the entity in memory (may already be out of date)
Any tips or links gratefully accepted. I haven't managed to find clear guidance on this which I'm surprised at since it seems like a common requirement.
1st, my main recommendation is: don't try to implement your own generic data access layer. You have the EntityManager at hand doing all the work for you. Keep your code simple and don't overengineer. With a generic layer you are very likely to introduce new problems and lower maintainability.
2nd, you have to ask yourself what the typical use case of your application will be in order to decide about locking. Locking always brings the problem of bottlenecks and possible deadlocks. So if your application reads much more than it writes, or is unlikely to access the same entity concurrently, you're better off with optimistic locking and then handling the exceptions. JPA provides you with versioning, so you always know if some other thread changed your object. If you really need pessimistic locking, then go ahead and set it for those cases.
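As a hedged sketch of what the standard EntityManager already gives you (entity and field names are placeholders; use javax.persistence instead of jakarta.persistence on older stacks): a @Version field enables optimistic locking, merge() re-attaches a detached instance, refresh() re-reads the database state, and lock(..., LockModeType.OPTIMISTIC) forces a version check even if you only read the entity.

import jakarta.persistence.*;

@Entity
public class MyEntity {
    @Id @GeneratedValue
    private Long id;

    // The provider increments this on every update and fails the flush/commit
    // with an OptimisticLockException if another transaction changed the row.
    @Version
    private long version;
}

class LockingUsage {
    static void refreshAndLock(EntityManager em, MyEntity detached) {
        em.getTransaction().begin();

        MyEntity managed = em.merge(detached);        // re-attach; its version is compared on flush
        em.refresh(managed);                          // re-read the current database state
        em.lock(managed, LockModeType.OPTIMISTIC);    // check the version at commit even without changes

        try {
            em.flush();                               // version conflicts surface here
            em.getTransaction().commit();
        } catch (OptimisticLockException e) {
            em.getTransaction().rollback();
            // Someone else changed the entity concurrently: reload and retry, or report it.
        }
    }
}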
What is the need for a shared read lock?
I can understand that write locks have to be exclusive. But why would many clients need to access the document simultaneously while sharing only read privileges? Practical applications of shared read locks would be of great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this question comes purely from the ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working - otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes that's not enough - for example if the data source is so large that you have to break it up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In these cases, transaction isolation is no longer enough, and you need to lock the entity on a logical level.
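Conceptually this is the same shared-versus-exclusive semantics any read-write lock gives you: many readers may hold the lock at once and see a stable state, while a writer must wait for all of them (and vice versa). In ABAP the analogous mechanism is a lock object requested in shared mode; here is a small Java sketch purely to illustrate the idea (names are invented):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReportVsPosting {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many long-running reports may read the postings at the same time,
    // because none of them changes anything.
    static void runLongReport() {
        lock.readLock().lock();            // shared: does not block other readers
        try {
            // ... long, multi-step calculation over many datasets ...
        } finally {
            lock.readLock().unlock();
        }
    }

    // A posting changes the datasets, so it must wait until no report
    // holds the shared lock, and new reports wait until it is done.
    static void postDocument() {
        lock.writeLock().lock();           // exclusive
        try {
            // ... modify the underlying datasets ...
        } finally {
            lock.writeLock().unlock();
        }
    }
}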