Avoiding concurrency issues in a web based application - concurrency

Consider a web application where students can book a room. Two students might try to book the same room at the same time, leaving the data in an inconsistent state. How can this be avoided?
My thoughts:
The bookings will be stored in a database, so using the database's transaction support should be enough to handle this.
What I am not sure about is whether the transaction code in the web application (written in Java, say) also needs to go inside a synchronized block.

It's effectively impossible to answer this question categorically; the "correct" way depends on your implementation, tools, languages and desired user experience.
There are, however, some common models for both user experience and implementation. The most common models I've seen in your specific scenario are:
Pre-allocation, where selecting the room as a candidate for booking causes the Webapp to 'lock' that room for a set period of time, giving the user a chance to complete their booking. This is annoying if people often select rooms then don't complete their booking, as it means that rooms will be unavailable to book often (but never get booked).
Transaction Locking, where you attempt to book a room and allow transaction exceptions from your DB to indicate that it's already booked. This is annoying if rooms are in high demand and people tend to try to book concurrently; they'll get to the end of the process, find the room gone, and be irritated.
If your system will have a single web server, I would probably lean towards the former method with a short timeout. Have the specific room selection be made at the end of the process (so the time between selecting a room and completing the booking is as short as possible). Assume users who reach that stage will complete their booking, and that it won't take long, then set the timeout accordingly. Then control concurrency with a global, thread-safe pool and lean on your language's safe concurrent access model to decide who wins when two users contend for the same room.
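A minimal sketch of such a pre-allocation pool in Java, assuming a single server process. The names RoomHolds, tryHold and release are hypothetical, and the hold lives only in memory; the final booking would still be written to the database.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry of short-lived room holds (single server only). */
class RoomHolds {
    /** Who holds a room and until when. */
    private static final class Hold {
        final String userId;
        final Instant expiresAt;
        Hold(String userId, Instant expiresAt) {
            this.userId = userId;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Hold> holds = new ConcurrentHashMap<>();
    private final Duration timeout;

    RoomHolds(Duration timeout) {
        this.timeout = timeout;
    }

    /** Returns true if userId holds roomId after this call, false if someone else does. */
    boolean tryHold(String roomId, String userId, Instant now) {
        // compute() runs atomically per key, so only one contender can win.
        Hold result = holds.compute(roomId, (id, current) -> {
            boolean free = current == null || !current.expiresAt.isAfter(now);
            if (free || current.userId.equals(userId)) {
                return new Hold(userId, now.plus(timeout));
            }
            return current; // still held by another user
        });
        return result.userId.equals(userId);
    }

    /** Drops the hold, but only if it belongs to userId. */
    void release(String roomId, String userId) {
        holds.computeIfPresent(roomId,
                (id, current) -> current.userId.equals(userId) ? null : current);
    }
}
```

Passing the current time in as a parameter keeps the timeout logic easy to test. With more than one web server this map would have to move into shared storage.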

To handle the logic in your database, you have to lock the entity correctly: take the lock, check whether a reservation already exists, write the new reservation if it does not, and only then release the lock.
This keeps the data consistent, because two reservations for the same room can never both be written.
Done this way, you don't need business logic to handle locking. You just receive a return value and decide what to do with it: either tell the user the reservation succeeded, or ask them to choose a different one.
Of course you can also do the locking in the business logic. In that case you don't have to lock in the database. As you mentioned, you would need a synchronized block, and inside that block you do the check and the write to the database.
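As a rough illustration of the business-logic variant, here is a sketch in Java. BookingService and book are hypothetical names, and a plain HashMap stands in for the bookings table so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical service that serializes check-then-write inside one JVM. */
class BookingService {
    // Stand-in for the bookings table: roomId -> studentId.
    private final Map<String, String> bookings = new HashMap<>();
    private final Object lock = new Object();

    /** Returns true if the room was free and is now booked by studentId. */
    boolean book(String roomId, String studentId) {
        synchronized (lock) {
            // The check and the write happen under the same lock,
            // so no other thread can slip in between them.
            if (bookings.containsKey(roomId)) {
                return false;
            }
            bookings.put(roomId, studentId);
            return true;
        }
    }
}
```

Keep in mind that a synchronized block only protects against threads in the same JVM. As soon as a second server instance writes to the same database, the lock has to live in the database after all.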

Event Sourcing/CQRS doubts about aggregates, atomicity, concurrency and eventual consistency

I'm studying event sourcing and command/query segregation and I have a few doubts that I hope someone with more experience will easily answer:
A) should a command handler work with more than one aggregate? (a.k.a. should they coordinate things between several aggregates?)
B) If my command handler generates more than one event to store, how do you guys push all those events atomically to the event store? (how can I guarantee no other command handler will "interleave" events in between?)
C) In many articles I read people suggest using optimistic locking to write the new events generated, but in my use case I will have around 100 requests / second. This makes me think that a lot of requests will just fail at huge rates (a lot of ConcurrencyExceptions), how you guys deal with this?
D) How to deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus? (how to eventually push those "confirmed" events back to the event bus?)
E) How you guys deal with the eventual consistency in the projections? you just live with it? or in some cases people lock things there too? (waiting for an update for example)
I made a sequence diagram to better illustrate all those questions
(and sorry for the bad english)
If my command handler generates more than one event to store, how do you guys push all those events atomically to the event store?
Most reasonable event store implementations will allow you to batch multiple events into the same transaction.
In many articles I read people suggest using optimistic locking to write the new events generated, but in my use case I will have around 100 requests / second.
If you have lots of parallel threads trying to maintain a complex invariant, something has gone badly wrong.
For "events" that aren't expected to establish or maintain any invariant, then you are just writing things to the end of a stream. In other words, you are probably not trying to write an event into a specific position in the stream. So you can probably use batching to reduce the number of conflicting writes, and a simple retry mechanism. In effect, you are using the same sort of "fan-in" patterns that appear when you have concurrent writers inserting into a queue.
For the cases where you are establishing/maintaining an invariant, you don't normally have many concurrent writers. Instead, specific writers have authority to write events (think "sharding"); the concurrency controls there are primarily to avoid making a mess in abnormal conditions.
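To make the batching and retry idea concrete, here is a small in-memory sketch in Java. EventStream, append and appendWithRetry are hypothetical names and not the API of any particular event store; the point is only that a batch is written in one atomic step against an expected version, and that a writer with no invariant to protect can simply re-read the version and try again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory event stream with optimistic, batched appends. */
class EventStream {
    private final List<String> events = new ArrayList<>();

    synchronized int version() {
        return events.size();
    }

    /** Appends the whole batch atomically, but only if nobody wrote since expectedVersion. */
    synchronized boolean append(int expectedVersion, List<String> batch) {
        if (events.size() != expectedVersion) {
            return false; // concurrent write detected, nothing was appended
        }
        events.addAll(batch);
        return true;
    }

    synchronized List<String> read() {
        return new ArrayList<>(events);
    }

    /** For events that carry no invariant: re-read the version and try again. */
    static boolean appendWithRetry(EventStream stream, List<String> batch, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (stream.append(stream.version(), batch)) {
                return true;
            }
        }
        return false;
    }
}
```

A writer that does maintain an invariant would not retry blindly like this. It would reload the stream, re-run its decision, and only then attempt the append again.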
How to deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus?
Use pull, rather than push, as the primary subscription mechanism. Make sure that subscribers can handle duplicate messages safely (aka "idempotent"). Don't use a message subscription that can re-order events when you need events strictly ordered.
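A sketch of what a pull-based, idempotent subscriber might look like in Java. PullingSubscriber is a hypothetical name, and the event store is reduced to a plain list whose index is the event's position. A real subscriber would persist its checkpoint together with whatever it updates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical subscriber that pulls from the store and ignores duplicates. */
class PullingSubscriber {
    private long lastProcessed = -1; // checkpoint: position of the last applied event
    private final List<String> applied = new ArrayList<>();

    /** Pulls everything after the checkpoint; safe to call again after a crash. */
    void poll(List<String> store) {
        for (long pos = lastProcessed + 1; pos < store.size(); pos++) {
            handle(pos, store.get((int) pos));
        }
    }

    /** Idempotent: an event at or before the checkpoint is silently skipped. */
    void handle(long position, String event) {
        if (position <= lastProcessed) {
            return;
        }
        applied.add(event);
        lastProcessed = position;
    }

    List<String> applied() {
        return applied;
    }
}
```

Because the subscriber asks the store for what it has not yet seen, a handler that crashes before publishing loses nothing. The events are picked up on the next poll.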
How you guys deal with the eventual consistency in the projections? you just live with it?
Pretty much. Views and reports have metadata information in them to let you know at what fixed point in "time" the report was accurate.
Unless you lock out all writers while a report is being consumed, there's a potential for any data being out of date, regardless of whether you are using events vs some other data model, regardless of whether you are using a single data model or several.
It's all part of the tradeoff; we accept that there will be a larger window between report time and current time in exchange for lower response latency, an "immutable" event history, etc.
should a command handler work with more than one aggregate?
Probably not, which isn't the same thing as "never".
Usual framing goes something like this: aggregate isn't a domain modeling pattern, like entity. It's a lifecycle pattern, used to make sure that all of the changes we make at one time are consistent.
In the case where you find that you want a command handler to modify multiple domain entities at the same time, and those entities belong to different aggregates, then have you really chosen the correct aggregate boundaries?
What you can do sometimes is have a single command handler that manages multiple transactions, updating a different aggregate in each. But it might be easier, in the long run, to have two different command handlers that each receive a copy of the command and decide what to do, independently.

How should I best lock and refresh JPA entities?

I am relatively new to JPA and have become very confused about how to best optimistically lock and refresh entities. I would like a set of general purpose methods to handle this consistently in my project.
I may be calling the lock / refresh methods from within a method that does not know the state of the entity; it may have been passed a detached or new / not yet saved entity object, as well as one previously read from the database. For simplicity I would like my utility methods to handle all eventualities. Semantically, the methods I am trying to implement are:
MyEntity refreshAndLock(MyEntity e)
Re-reads the entity from the database and locks it optimistically, or does nothing for entities yet to be saved to the database. Detached entities would also be re-read and locked, and a managed version returned.
MyEntity refresh(MyEntity e)
Just re-read the entity, or do nothing for entities yet to be saved to the database. Detached entities would also be re-read.
MyEntity lockAndNotRefresh(MyEntity e)
Lock the version of the entity in memory (may already be out of date)
Any tips or links gratefully accepted. I haven't managed to find clear guidance on this which I'm surprised at since it seems like a common requirement.
First, my main recommendation: don't try to implement your own generic data access layer. You have the EntityManager at hand, doing all of this for you. Keep your code simple and don't overengineer. With a generic layer you are very likely to introduce new problems and lower maintainability.
Second, to decide about locking you have to ask yourself what the typical use case of your application is. Locking always brings the risk of bottlenecks and possible deadlocks. So if your application reads much more than it writes, or is unlikely to access the same entity concurrently, you're better off with optimistic locking and handling the resulting exceptions. JPA provides versioning, so you always know if some other thread changed your object. If you really need pessimistic locking, then go ahead and set it for those cases.

Application of Shared Read Locks

What is the need for a shared read lock?
I can understand that write locks have to be exclusive only. But what is the need for many clients to access the document simultaneously and still share only read privilege? Practical applications of Shared read locks would be of great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working; otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes that's not enough, for example if the data source is so large that you have to break it up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In these cases transaction isolation is no longer enough, and you need to lock the entity on a logical level. A shared read lock does exactly that: any number of readers can hold it at once, and it only keeps writers out until the readers are done.
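Since you guessed the idea is generic across languages, here is an in-process analogue in Java using ReentrantReadWriteLock. This is not the ABAP lock mechanism, and Postings, total and set are hypothetical names, but it shows the same shape: many readers may compute over the data at once, while a writer has to wait until no reader is active.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical set of postings guarded by a shared (read) / exclusive (write) lock. */
class Postings {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long[] amounts;

    Postings(long... amounts) {
        this.amounts = amounts.clone();
    }

    /** Long-running read: several threads may hold the read lock at the same time. */
    long total() {
        lock.readLock().lock();
        try {
            long sum = 0;
            for (long amount : amounts) {
                sum += amount;
            }
            return sum;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Write: blocks until every reader has released the lock. */
    void set(int index, long amount) {
        lock.writeLock().lock();
        try {
            amounts[index] = amount;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

If reads took an exclusive lock instead, parallel subtasks that only read the postings would needlessly queue up behind each other.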

taking a snapshot of complex mutable structure in concurrent environment

Given: a complex structure of various nested collections, with refs scattered in different levels.
Need: A way to take a snapshot of such a structure, while allowing writes to continue to happen in other threads.
So the one "reader" thread needs to read the whole complex state in a single long transaction. The "writer" thread meanwhile makes modifications in multiple short transactions. As far as I understand, in such a case the STM engine uses the refs' history.
Here we have some interesting results. E.g., the reader reaches some ref 10 seconds after the beginning of its transaction. The writer modifies this ref once per second. This results in 10 values in the ref's history. If that exceeds the ref's :max-history limit, the reader transaction will retry forever. If it exceeds :min-history, the transaction may be rerun several times.
But really the reader needs just a single value of ref (the 1st one) and the writer needs just the recent one. All intermediate values in history list are useless. Is there a way to avoid such history overuse?
Thanks.
To me it's a bit of a "design smell" to have a large structure with lots of nested refs. You are effectively emulating a mutable object graph, which is a bad idea if you believe Rich Hickey's take on concurrency.
Some various thoughts to try out:
The idiomatic way to solve this problem in Clojure would be to put the state in a single top-level ref, with everything inside it being immutable. Then the reader can take a snapshot of the entire concurrent state for free (without even needing a transaction). Might be difficult to refactor to this from where you currently are, but I'd say it is best practice.
If you only want the reader to get a snapshot of the top level ref, you can just deref it directly outside of a transaction. Just be aware that the refs inside may continue to get mutated, so whether this is useful or not depends on the consistency requirements you have for the reader.
You can do everything within a (dosync...) transaction as normal for both readers and writer. You may get contention and transaction retries, but it may not be an issue.
You can create a "snapshot" function that quickly traverses the graph and dereferences all the refs within a transaction, returning the result with the refs stripped out (or replaced by new cloned refs). The reader calls snapshot once, then continues to do the rest of its work after the snapshot is completed.
You could take a snapshot immediately each time after the writer finishes, and store it separately in an atom. Readers can use this directly (i.e. only the writer thread accesses the live data graph directly)
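To illustrate the first suggestion (one top-level reference, everything inside it immutable) outside of Clojure, here is a rough Java analogue. StateHolder is a hypothetical name, and AtomicReference plays the role of the single top-level ref.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder: one mutable reference pointing at immutable state. */
class StateHolder {
    private final AtomicReference<Map<String, Integer>> state =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());

    /** Writer: builds a new immutable map and swaps it in atomically. */
    void put(String key, int value) {
        state.updateAndGet(current -> {
            Map<String, Integer> next = new HashMap<>(current);
            next.put(key, value);
            return Collections.unmodifiableMap(next);
        });
    }

    /** Reader: a single read yields a snapshot that later writes never touch. */
    Map<String, Integer> snapshot() {
        return state.get();
    }
}
```

The reader pays nothing for the snapshot and never retries. Note that this sketch copies the whole map on every write, whereas Clojure's persistent collections share structure between versions, so the write cost is much lower there.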
The general answer to your question is that you need two things:
A flag to indicate that the system is in "snapshot write" mode
A queue to hold all transactions that occur while the system is in snapshot mode
As for what to do if the queue overflows because the snapshot process isn't fast enough, well, there isn't much you can do about that except either optimize that process or increase the size of your queue. It's a delicate balance that you'll have to strike depending on the needs of your app, and it is going to take some pretty extensive testing, depending on how complex your system is.
But you're on the right track. If you basically put the system in "snapshot write mode", then your reader/writer methods should automatically change where they are reading/writing from, so that the thread that is making changes gets all the "current values" and the thread reading the snapshot state is reading all the "snapshot values". You can split these up into separate methods - the snapshot reader will use the "snapshot value" methods, and all other threads will read the "current value" methods.
When the snapshot reader is done with its work, it needs to clear the snapshot state.
If a thread tries to read the "snapshot values" when no "snapshot state" is currently set, they should simply respond with the "current values" instead. No biggie.
Systems that allow snapshots of file systems to be taken for backup purposes, while not preventing new data from being written, follow a similar scheme.
Finally, unless you need to keep a record of all changes to the system (i.e. for an audit trail), then the queue of transactions actually doesn't need to be a queue of changes to be applied - it just needs to store the latest value of whatever thing you're changing in the system. When the "snapshot state" is cleared, you simply write all those non-committed values to the system, and call it done. The thing you might want to consider is making a log of those changes yet to be made, in case you need to recover from a crash, and have those changes still applied. The log file will give you a record of what happened, and can let you do this recovery. That's an oversimplification of the recovery process, but that's not really what your question is about, so I'll stop there.
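The scheme described above could be sketched roughly like this in Java. SnapshotStore and its method names are hypothetical, the pending map keeps only the latest value per key (the non-audit variant), and crash recovery is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store with a "snapshot write" mode and a buffer of pending values. */
class SnapshotStore {
    private final Map<String, Integer> base = new HashMap<>();    // frozen while a snapshot is open
    private final Map<String, Integer> pending = new HashMap<>(); // latest values written meanwhile
    private boolean snapshotMode = false;

    synchronized void beginSnapshot() {
        snapshotMode = true;
    }

    /** Writers never touch the frozen data while a snapshot is open. */
    synchronized void write(String key, int value) {
        if (snapshotMode) {
            pending.put(key, value);
        } else {
            base.put(key, value);
        }
    }

    /** Normal readers see pending values first, then the base data. */
    synchronized Integer readCurrent(String key) {
        return pending.containsKey(key) ? pending.get(key) : base.get(key);
    }

    /** Snapshot reader; with no snapshot open this equals the current value. */
    synchronized Integer readSnapshot(String key) {
        return base.get(key);
    }

    /** Clears the snapshot state and applies everything that was held back. */
    synchronized void endSnapshot() {
        base.putAll(pending);
        pending.clear();
        snapshotMode = false;
    }
}
```

A single coarse lock keeps the sketch short. A real implementation would want finer-grained locking so that the snapshot reader does not slow the writers down.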
What you are after is the state-of-the-art in high-performance concurrency. You should look at the work of Nathan Bronson, and his lab's collaborations with Aleksandar Prokopec, Phil Bagwell and the Scala team.
Binary Tree:
http://ppl.stanford.edu/papers/ppopp207-bronson.pdf
https://github.com/nbronson/snaptree/
Tree-of-arrays -based Hash Map
http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf
However, a quick look at the implementations above should convince you this is not "roll-your-own" territory. I'd try to adapt an off-the-shelf concurrent data structure to your needs if possible. Everything I've linked to is freely available on the JVM, but it's not native Clojure as such.

Is it ok to store large objects (java component for example) in an Application variable?

I am developing an app right now which creates and stores a connection to a local XMPP server in the Application scope. The connection methods are stored in a cfc that makes sure the Application.XMPPConnection is connected and authorized each time it is used, and makes use of the connection to send live events to users. As far as I can tell, this is working fine. BUT it hasn't been tested under any kind of stress.
My question is: Will this set up cause problems later on? I only ask because I can't find evidence of other people using Application variables in this way. If I weren't using railo I would be using CF's event gateway instead to accomplish the same task.
Size itself isn't a problem. If you were to initialize one object per request, you'd burn a lot more memory. The problem is access.
If you have a large number of requests competing for the same object, you need to measure the access time for that object vs. instantiation. Keep in mind that, for data objects, more than one thread can read them. My understanding, though, is that when an object's function is called, it locks that object to other threads until the function returns.
Also, if the object maintains state, you need to consider what to do when multiple threads are getting/setting that data. Will you end up with race conditions?
You might consider handling this object in the session scope, so that it is only instantiated per user (who, likely, will only make one or two simultaneous requests).
Of course you can use application scope for storing these components if they are used by all users in different parts of application.
Now, possible issues are:
size of the component(s)
time needed for initialization if these are set during application start
race conditions between setting/getting states of these components
For the first, there are ways to calculate the size of a component in memory. There have been lots of posts on this topic lately, so it should be easy to find some. If you don't have some large structure or query saved inside, I guess you're ok here.
Second, again, if you are not filling this cfc with some large query from the DB or doing some slow parsing, you're ok here too.
Third, pay attention to situations where several users change the state of these components at the same time. If that can happen, use cflock around every change to a component's state.
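The third point is the same in any language. As a sketch of the idea in Java rather than CFML, with synchronized standing in for cflock: SharedConnection is a hypothetical name, and a counter takes the place of the real connect-and-authorize call to the XMPP server.

```java
/** Hypothetical application-scoped component whose state changes are serialized. */
class SharedConnection {
    private boolean connected = false;
    private int connectCount = 0;

    /** Reconnects at most once, even if many requests notice the drop together. */
    synchronized void ensureConnected() {
        if (!connected) {
            connectCount++;   // stands in for the real connect-and-authorize call
            connected = true;
        }
    }

    synchronized void markDisconnected() {
        connected = false;
    }

    synchronized int connectCount() {
        return connectCount;
    }
}
```

Without the lock, two requests could both see connected as false and both open a connection, which is exactly the race described above.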