How should I best lock and refresh JPA entities? - jpa-2.0

I am relatively new to JPA and have become very confused about how to best optimistically lock and refresh entities. I would like a set of general purpose methods to handle this consistently in my project.
I may be calling the lock / refresh methods from within code that does not know the state of the entity: it may have been passed a detached or new / not-yet-saved entity object, or one previously read from the database. For simplicity I would like my utility methods to handle all eventualities. Semantically, the methods I am trying to implement are:
MyEntity refreshAndLock(MyEntity e)
Re-reads the entity from the database and locks it optimistically, or does nothing for entities not yet saved to the database. Detached entities would also be re-read and locked, with a managed version returned.
MyEntity refresh(MyEntity e)
Just re-reads the entity, or does nothing for entities not yet saved to the database. Detached entities would also be re-read.
MyEntity lockAndNotRefresh(MyEntity e)
Locks the version of the entity in memory (which may already be out of date).
Any tips or links gratefully accepted. I haven't managed to find clear guidance on this which I'm surprised at since it seems like a common requirement.

1st, my main recommendation is: don't try to implement your own generic data access layer. You have the EntityManager at hand doing all of this for you. Keep your code simple and don't overengineer; with a generic layer you are very likely to introduce new problems and lower maintainability.
2nd, you have to ask yourself what the typical use case of your application will be in order to decide about locking. Locking always brings the risk of bottlenecks and possible deadlocks. So if your application reads much more than it writes, or is unlikely to access the same entity concurrently, you're better off with optimistic locking and handling the resulting exceptions (e.g. OptimisticLockException). JPA provides you with versioning, so you always know if some other thread changed your object. If you really need pessimistic locking, then go ahead and set it for those cases.
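If you still want thin convenience helpers, a minimal sketch built directly on the plain EntityManager API could look like the following. This is an illustration, not a reference implementation: the class and method names are taken from the question, and the assumption that a null id means "not yet persisted" is mine.

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Sketch of the three helpers from the question, built on the EntityManager
// API. Assumes the entity has a @Version field and that a null id means
// "not yet persisted".
public class EntityUtil {

    @SuppressWarnings("unchecked")
    public static <T> T refreshAndLock(EntityManager em, T e, Object id) {
        if (id == null) {
            return e; // new entity: nothing to refresh or lock yet
        }
        if (em.contains(e)) {
            em.refresh(e, LockModeType.OPTIMISTIC); // re-read and lock the managed entity
            return e;
        }
        // detached: load a fresh managed copy with an optimistic lock
        return em.find((Class<T>) e.getClass(), id, LockModeType.OPTIMISTIC);
    }

    @SuppressWarnings("unchecked")
    public static <T> T refresh(EntityManager em, T e, Object id) {
        if (id == null) {
            return e;
        }
        if (em.contains(e)) {
            em.refresh(e); // re-read the managed entity
            return e;
        }
        return em.find((Class<T>) e.getClass(), id); // managed, freshly read
    }

    public static <T> void lockAndNotRefresh(EntityManager em, T e) {
        // em.lock() throws for detached entities, so only managed ones qualify
        if (em.contains(e)) {
            em.lock(e, LockModeType.OPTIMISTIC); // lock the in-memory version as-is
        }
    }
}
```

Note that `lockAndNotRefresh` cannot do anything useful for a detached entity: JPA's `lock()` only accepts managed instances, which is one more reason a fully generic "handle everything" utility tends to fight the API.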

Related

asio, shared data, Active Object vs mutexes

I want to understand the idiomatic ("true Asio") way to use shared data.
Reading the Asio and Beast examples, the only example of using shared data is http_crawl.cpp (perhaps I missed something).
In that example the shared object is only used to collect statistics for sessions; that is, the sessions do not read that object's data.
As a result I have three questions:
Is it implied that interaction with shared data in Asio style is an Active Object? I.e. should mutexes be avoided?
Is it correct that reading shared data must also go through "requests" to the Active Object, again with no mutexes?
Has anyone tried to evaluate the overhead of "requests" to an Active Object, compared to using mutexes?
Is it implied that interaction with shared data in asio-style is an Active Object? i.e. should mutexes be avoided?
Starting at the end: yes, mutexes should be avoided. This is because all service handlers (initiations and completions) will be executed on the service thread(s), which means that blocking in a handler blocks all other handlers.
Whether that leads to Active Object seems to be a choice to me. Yes, a typical approach would be like Active Object (see e.g. boost::asio and Active Object), where operations queue for the data.
However, other approaches are viable and frequently seen, such as the data moving along with its task(s), e.g. through a task flow.
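The Active Object idea itself is language-agnostic, so here is a minimal sketch in Java rather than C++ (a single-threaded executor standing in for an Asio strand): both reads and writes become queued requests, and no mutex guards the state.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Active Object sketch: all access to `count` happens on one worker thread,
// so no mutex is needed. Reads are also posted as requests and return futures.
class Counter {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private long count = 0; // confined to the worker thread

    CompletableFuture<Void> increment() {
        return CompletableFuture.runAsync(() -> count++, worker);
    }

    CompletableFuture<Long> get() {
        // a read is itself a queued request, ordered after earlier writes
        return CompletableFuture.supplyAsync(() -> count, worker);
    }

    void shutdown() { worker.shutdown(); }
}

public class Demo {
    public static void main(String[] args) {
        Counter c = new Counter();
        for (int i = 0; i < 1000; i++) c.increment();
        long value = c.get().join(); // runs after all queued increments
        System.out.println(value);   // 1000
        c.shutdown();
    }
}
```

Because the executor runs requests in submission order, the read observes every increment submitted before it, which is exactly the ordering guarantee a strand gives you in Asio.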
Is it correct that reading shared data must also go through "requests" to the Active Object, again with no mutexes?
Yes, synchronization needs to happen for shared state, regardless of the design pattern chosen (although some design patterns reduce sharing altogether).
The Asio approach is using strands, which abstract away the scheduling from the control flow. This gives the service the option to optimize for various cases (e.g. continuation on the same strand, the case where there's only one service thread anyway etc.).
has anyone tried to evaluate the overhead of "requests" to Active Object, compared to using mutexes?
Lots of people, lots of times. People are often wary of trying Asio because "it uses locking internally". If you know what you're doing, throughput can be excellent, which goes for most patterns and industrial-strength frameworks.
Specific benchmarks depend heavily on specific implementation choices. I'm pretty sure you can find examples on github, blogs and perhaps even on this site.
(perhaps I missed something)
You're missing the fact that IO objects are not thread-safe, which means that they themselves are shared data for any composed asynchronous operation (chain).

Should aggregate involve a read-lock in database?

I read Eric Evans's book about DDD, the chapter on Aggregates.
When dealing with the Order/OrderLine example, it is stated:
When both users have saved their changes, an Order will be stored in the
database that violates the invariant of the domain model. An important
business rule has been broken. And nobody even knows. Clearly, locking
a single line-item isn’t an adequate safeguard. If, instead, we locked
an entire Order at a time, the problem would have been prevented.
I well know that the essence of Aggregate is to protect invariants with a single wrapped database transaction.
But should every aggregate be backed by a read lock on the database side to prevent potential concurrency issues (race conditions) when the aggregate is modified simultaneously by multiple users?
Is the real purpose of an aggregate to gather some elements together for a read lock on the database side?
Any clarification about this would make me happy.
No, the two are orthogonal:
The goal of the aggregate design is to establish a consistency boundary and protect the invariants within that boundary. The goal of the locking design is to enable an appropriate level of concurrency within the application.
What this means is that with the same aggregate design, different locking mechanisms might make sense (depending on the non-functional requirements of the application).
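As a concrete illustration of that orthogonality, the same aggregate design can be paired with optimistic locking simply by versioning the root. A JPA-style sketch (entity and table names invented for the example) that addresses the Order/OrderLine scenario from the book:

```java
import javax.persistence.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: the aggregate root carries the version. Bumping it on any change
// inside the boundary means two users editing different OrderLines of the
// same Order conflict at commit (OptimisticLockException) instead of
// silently violating the invariant.
@Entity
@Table(name = "ORDERS") // ORDER is a reserved word in SQL
public class Order {
    @Id @GeneratedValue
    private Long id;

    @Version
    private long version; // optimistic lock over the whole aggregate

    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderLine> lines = new ArrayList<>();

    public void addLine(OrderLine line) {
        line.setOrder(this);
        lines.add(line);
        // For collection-only changes, lock the root explicitly with
        // LockModeType.OPTIMISTIC_FORCE_INCREMENT so the version still bumps.
    }
}

@Entity
@Table(name = "ORDER_LINES")
class OrderLine {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    private Order order;

    void setOrder(Order order) { this.order = order; }
}
```

Swap the locking strategy (pessimistic, optimistic, none) and the aggregate boundary itself does not move, which is the point of the answer above.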
I well know that the essence of Aggregate is to protect invariants
with a single wrapped database transaction.
Is it? Aggregate roots are there as a consistency boundary around their own invariants. Given that persistence happens across a network in a completely different process, how can an aggregate in a process ever hope to guarantee consistency around something that is WAY outside its control?
The essence of DDD is separation of domain and infrastructure concerns; aggregates should be couched in terms of the domain modelled (Orders, Products, Customers) and everything else (databases, persistence, locking, transactions) are infrastructure and should not be polluting your domain model.

What does the exclusive lock on storage for OrientDB entail exactly?

Having read the following statement from the official documentation of OrientDB:
In order to guarantee atomicity and consistency, OrientDB acquire an
exclusive lock on the storage during transaction commit.
I am wondering if my understanding of the situation is correct. Here is how I assume this will work:
Thread 1 opens a transaction, and reads records #1:100 to #1:200, some from class A, and some from class B (something which cannot be determined without the transaction coming to a close).
Thread 1 massages the data, maybe even inserting a few records.
Thread 1 starts to commit the data. As the database does not have any way to know which parts of the data might be affected by the open transaction, it blindly blocks the whole storage unit and verifies the #version to enforce optimistic locking on all possibly affected records.
Thread 2 tries to read record #1:1 (or any other record from the whole database) and is blocked by the commit process, which is aligned, AFAIK with exclusive locking on the storage unit. This block occurs, if I'm not off, regardless of the cluster the original data resides on, since we have multi-master datasets.
Thread 1 ends the commit process and the database becomes consistent, effectively lifting the lock.
At this point, any thread can operate on the dataset, transactionally or otherwise, and will not be bound by the exclusive locking mechanism.
If this is the case, during the exchange highlighted in point 3 the data store, in its entirety, is in an effective trance state and cannot be read from, written to, or interacted with in any meaningful way.
I do so hope that I am missing my guess.
Disclaimer: I have not had the chance to dig into the underlying code from the rather rich OrientDB codebase. As such, this is, at its best, an educated guess and should not be taken as any sort of reference as to how OrientDB actually operates.
Possible Workarounds:
Should worse come to worse and this happens to be the way OrientDB actually works, I would dearly welcome any workarounds to this conundrum. We are looking for meaningful ways that will still keep OrientDB as a viable option for an enterprise, scalable high-end application.
In the current release of OrientDB, transactions lock the storage in exclusive mode. Fortunately OrientDB works in an optimistic way, so this happens "only" at commit() time; it does not matter when the transaction was begun.
If this is a showstopper for your use case, you could consider the following:
Don't use transactions. In this case you'll run in parallel with no locks, but note that using indexes requires a lock at the index level. If an index is the bottleneck, the most common workaround is to create X sub-classes with an index on each; OrientDB will use the sub-class indexes if needed, and a CRUD operation will lock only the specific index involved.
Wait for OrientDB 3.0, where this limitation will be removed by real parallel transaction execution.
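Independent of OrientDB internals, the usual client-side companion to commit-time optimistic validation is a retry loop around the whole attempt. A generic sketch (the versioned record is simulated here with an AtomicLong; against OrientDB you would retry on its concurrent-modification exception instead):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Generic optimistic retry loop: read a value (remembering its version),
// compute the change, and retry the whole attempt if someone else committed
// in between. The AtomicLong stands in for a versioned record store.
public class OptimisticRetry {

    static long updateWithRetry(AtomicLong record, LongUnaryOperator change, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            long seen = record.get();                  // read + remember "version"
            long proposed = change.applyAsLong(seen);  // do the work off the stale copy
            if (record.compareAndSet(seen, proposed)) {
                return proposed;                       // commit succeeded: nobody interfered
            }
            // conflict: someone committed first -> loop re-reads and retries
        }
        throw new IllegalStateException("gave up after " + maxRetries + " retries");
    }

    public static void main(String[] args) {
        AtomicLong record = new AtomicLong(10);
        long result = updateWithRetry(record, v -> v + 5, 3);
        System.out.println(result); // 15
    }
}
```

The key property is that the retry restarts from a fresh read, so a conflicting commit costs a repeat of the work rather than a held lock.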

Application of Shared Read Locks

What is the need for a shared read lock?
I can understand that write locks have to be exclusive. But why would many clients need to access the document simultaneously while sharing only the read privilege? Practical applications of shared read locks would be a great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working; otherwise the calculations might be wrong. Most of the time the ACID principles will ensure this, but sometimes that's not enough, for example if the data source is so large that you have to break it up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In these cases transaction isolation is no longer enough, and you need to lock the entity on a logical level.
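The question is about ABAP, but the semantics are the same in any language. A small Java sketch with ReentrantReadWriteLock shows the point of a shared read lock: many readers proceed in parallel and see a consistent snapshot, while a writer needs the exclusive lock and must wait for all of them.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Shared read lock demo: a multi-step read (the balance check) holds the
// read lock so a writer cannot change the data between its two reads.
// Other readers may hold the read lock at the same time.
public class Ledger {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private int debit = 0, credit = 0;

    public void post(int amount) {
        lock.writeLock().lock(); // exclusive: no readers, no other writers
        try {
            debit += amount;
            credit += amount;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean balanced() {
        lock.readLock().lock(); // shared: other readers may hold this too
        try {
            // without the read lock, a writer could change `debit` between
            // these two reads and we would report a spurious imbalance
            return debit == credit;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        Ledger l = new Ledger();
        l.post(100);
        System.out.println(l.balanced()); // true
    }
}
```

If readers took the exclusive lock instead, every balance check would serialize against every other one, for no correctness benefit; the shared mode exists precisely so that read-only work scales.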

Is it ok to store large objects (java component for example) in an Application variable?

I am developing an app right now which creates and stores a connection to a local XMPP server in the Application scope. The connection methods are stored in a cfc that makes sure the Application.XMPPConnection is connected and authorized each time it is used, and makes use of the connection to send live events to users. As far as I can tell, this is working fine. BUT it hasn't been tested under any kind of stress.
My question is: Will this set up cause problems later on? I only ask because I can't find evidence of other people using Application variables in this way. If I weren't using railo I would be using CF's event gateway instead to accomplish the same task.
Size itself isn't a problem. If you were to initialize one object per request, you'd burn a lot more memory. The problem is access.
If you have a large number of requests competing for the same object, you need to measure the access time for that object vs. instantiation. Keep in mind that, for data objects, more than one thread can read them. My understanding, though, is that when an object's function is called, it locks that object to other threads until the function returns.
Also, if the object maintains state, you need to consider what to do when multiple threads are getting/setting that data. Will you end up with race conditions?
You might consider handling this object in the session scope, so that it is only instantiated per user (who, likely, will only make one or two simultaneous requests).
Of course you can use the application scope for storing these components if they are used by all users in different parts of the application.
Now, possible issues are :
size of the component(s)
time needed for initialization if these are set during application start
race conditions between setting/getting the states of these components
For the first, there are ways to calculate the size of a component in memory. Lately there have been lots of posts on this topic, so it should be easy to find some. If you don't have some large structure or query saved inside, I guess you're OK here.
Second, again, if you are not filling this CFC with some large query from the DB or doing some slow parsing, you're OK here too.
Third, pay attention to situations where multiple users are changing the state of these components. If so, use cflock around each write to the component's state.
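The cflock advice is language-agnostic: the same get/modify/set race exists for any singleton shared across requests. A sketch in Java (the class and method names are invented for illustration) of what the exclusive lock buys you:

```java
// Shared-scope singleton sketch: two request threads mutating the same
// component. The unguarded bump can lose updates; the guarded one is the
// equivalent of wrapping the write in an exclusive cflock.
public class SharedComponent {
    private int state = 0;
    private final Object monitor = new Object();

    // unsafe: threads can interleave between the read and the write-back
    public void unsafeBump() { state = state + 1; }

    // safe: one thread at a time performs the read-modify-write
    public void safeBump() {
        synchronized (monitor) {
            state = state + 1;
        }
    }

    public int get() {
        synchronized (monitor) { return state; }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedComponent c = new SharedComponent();
        Runnable work = () -> { for (int i = 0; i < 10_000; i++) c.safeBump(); };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c.get()); // 20000: no lost updates with safeBump
    }
}
```

Run the same experiment with unsafeBump and the total will usually come up short, which is exactly the silent corruption cflock (or a synchronized block) prevents.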