libpqxx transaction serialization & consequences - C++

For my implementation, a particular write must be done in bulk, without any chance of another transaction interfering.
I have been told that two transactions competing in this way will lead to the first blocking the second, and that the second may or may not complete after the first has.
Please point me to the documentation that confirms this. Also, what exactly happens to the second transaction while the first is blocking it? Will it be queued, fail, or some combination of the two?
If this cannot be confirmed, should the transaction isolation level for this transaction be set to SERIALIZABLE? If so, how can that be done with libpqxx prepared statements?
If the transactions are serialized, will the second transaction fail, or be queued until the first has completed?
If either fails, how can this be detected with libpqxx?

The only way to conclusively prevent concurrency effects is to LOCK TABLE ... IN ACCESS EXCLUSIVE MODE each table you wish to modify.
This means you're really only doing one thing at a time. It also leads to fun problems with deadlocks if you don't always acquire your locks in the same order.
So usually, what you need to do is figure out what exactly the operations you wish to do are, and how they interact. Determine what concurrency effects you can tolerate, and how to prevent those you cannot.
This question as it stands is just too broad to usefully answer.
Options include:
Exclusively locking tables. (This is currently the only way to do a multi-row upsert without concurrency problems in PostgreSQL.) Beware of lock-upgrade and lock-order related deadlocks.
Appropriate use of SERIALIZABLE isolation - but remember, you have to keep a record of what you did during a transaction and retry it if the transaction aborts.
Careful row-level locking - SELECT ... FOR UPDATE, SELECT ... FOR SHARE.
"Optimistic locking" / optimistic concurrency control, where appropriate.
Writing your queries in ways that make them friendlier to concurrent operation - for example, replacing read-modify-write cycles with in-place updates.
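The SERIALIZABLE option above implies a retry loop: the database aborts one of the competing transactions with a serialization failure (SQLSTATE 40001 in PostgreSQL, surfaced by libpqxx as a thrown exception), and the application must replay the whole transaction. A minimal, driver-agnostic sketch of that pattern in Python - the `SerializationFailure` class and the flaky transaction body are stand-ins for a real driver's error type and your real work, not any library's API:

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for a driver-specific error, e.g. SQLSTATE 40001."""

def run_with_retry(txn_body, max_attempts=5):
    """Re-run txn_body until it commits or attempts are exhausted.

    txn_body must be safe to repeat: it has to contain *all* of the
    transaction's work, so a retry replays everything from the start.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Back off briefly, with jitter, before retrying.
            time.sleep(0.01 * attempt * (1 + random.random()))

# Demo body: fails twice with a serialization error, then succeeds.
attempts = []
def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()
    return "committed"
```

The key design point is that the retry wraps the entire transaction body, not a single statement - a serialization failure invalidates everything the transaction did.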

Related

Event Sourcing/CQRS doubts about aggregates, atomicity, concurrency and eventual consistency

I'm studying event sourcing and command/query segregation and I have a few doubts that I hope someone with more experience will easily answer:
A) Should a command handler work with more than one aggregate? (i.e., should it coordinate changes across several aggregates?)
B) If my command handler generates more than one event to store, how do you push all of those events atomically to the event store? (How can I guarantee that no other command handler will "interleave" events in between?)
C) In many articles people suggest using optimistic locking to write the new events, but in my use case I will have around 100 requests/second. This makes me think that many requests will simply fail (lots of ConcurrencyExceptions). How do you deal with this?
D) How do you deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus? (How do you eventually push those "confirmed" events back to the event bus?)
E) How do you deal with eventual consistency in the projections? Do you just live with it, or do people sometimes lock things there too (waiting for an update, for example)?
I made a sequence diagram to better illustrate all these questions.
(And sorry for the bad English.)
If my command handler generates more than one event to store, how do you guys push all those events atomically to the event store?
Most reasonable event store implementations will allow you to batch multiple events into the same transaction.
In many articles I read people suggest using optimistic locking to write the new events generated, but in my use case I will have around 100 requests / second.
If you have lots of parallel threads trying to maintain a complex invariant, something has gone badly wrong.
For "events" that aren't expected to establish or maintain any invariant, then you are just writing things to the end of a stream. In other words, you are probably not trying to write an event into a specific position in the stream. So you can probably use batching to reduce the number of conflicting writes, and a simple retry mechanism. In effect, you are using the same sort of "fan-in" patterns that appear when you have concurrent writers inserting into a queue.
For the cases where you are establishing/maintaining an invariant, you don't normally have many concurrent writers. Instead, specific writers have authority to write events (think "sharding"); the concurrency controls there are primarily to avoid making a mess in abnormal conditions.
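The batching-plus-conflict-detection idea above can be sketched as a compare-and-swap on the stream version: a whole batch of events is appended atomically only if the stream is still at the version the writer expected. This toy in-memory store is illustrative only - real event stores expose an equivalent expected-version parameter on their append call, and `ExpectedVersionMismatch` stands in for their conflict error:

```python
import threading

class ExpectedVersionMismatch(Exception):
    pass

class InMemoryEventStore:
    """Toy event store: one lock-protected list per stream."""

    def __init__(self):
        self._streams = {}
        self._lock = threading.Lock()

    def append(self, stream, events, expected_version):
        """Append a batch atomically iff the stream is at expected_version."""
        with self._lock:
            current = self._streams.setdefault(stream, [])
            if len(current) != expected_version:
                # Another writer got in first; caller may reload and retry.
                raise ExpectedVersionMismatch(
                    f"expected {expected_version}, stream is at {len(current)}")
            current.extend(events)   # the whole batch lands, or nothing does
            return len(current)      # new stream version

    def read(self, stream):
        with self._lock:
            return list(self._streams.get(stream, []))
```

Because the version check and the append happen under one lock, no other handler can interleave events inside the batch; a losing writer gets a conflict it can retry after re-reading the stream.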
How to deal with the fact that the command handler can crash after storing the events in the event store but before publishing them to the event bus?
Use pull, rather than push, as the primary subscription mechanism. Make sure that subscribers can handle duplicate messages safely (aka "idempotent"). Don't use a message subscription that can re-order events when you need events strictly ordered.
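A pull-based, idempotent subscriber can be as simple as polling an ordered log from a remembered position and de-duplicating by event id. The shapes below are made up for illustration (in a real system the checkpoint would be stored durably, and the log would be the event store's read API):

```python
class Projection:
    """Polls an ordered event log; safe against re-delivery and restarts."""

    def __init__(self, log):
        self.log = log            # ordered list of (event_id, payload)
        self.position = 0         # durable checkpoint in a real system
        self.seen_ids = set()     # dedup guard for at-least-once delivery
        self.state = []

    def poll(self, batch_size=10):
        batch = self.log[self.position:self.position + batch_size]
        for event_id, payload in batch:
            if event_id not in self.seen_ids:   # idempotence: skip repeats
                self.seen_ids.add(event_id)
                self.state.append(payload)
            self.position += 1
```

Because the subscriber pulls from its own checkpoint, a crash between storing and publishing is harmless: on restart it simply resumes from the last position and re-reads anything it missed, and the dedup guard absorbs any duplicates.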
How do you guys deal with eventual consistency in the projections? Do you just live with it?
Pretty much. Views and reports carry metadata to let you know at what fixed point in "time" the report was accurate.
Unless you lock out all writers while a report is being consumed, there's a potential for any data being out of date, regardless of whether you are using events vs some other data model, regardless of whether you are using a single data model or several.
It's all part of the tradeoff; we accept that there will be a larger window between report time and current time in exchange for lower response latency, an "immutable" event history, etc.
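The "metadata" idea above can be as simple as stamping the view with the stream position it was folded from. A toy sketch (field names made up for the example):

```python
def build_report(events):
    """Fold events into a view and record how current the view is."""
    total = sum(e["amount"] for e in events)
    return {
        "total": total,
        # The 'fixed point in time' metadata: any consumer can see
        # exactly which prefix of the history this report reflects.
        "as_of_version": len(events),
    }
```

A consumer that needs read-your-writes behaviour can compare `as_of_version` against the version its own command produced and wait (or re-poll) until the projection catches up.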
should a command handler work with more than one aggregate?
Probably not - which isn't the same thing as "never".
The usual framing goes something like this: an aggregate isn't a domain-modeling pattern, like an entity. It's a lifecycle pattern, used to make sure that all of the changes we make at one time are consistent.
In the case where you find that you want a command handler to modify multiple domain entities at the same time, and those entities belong to different aggregates, then have you really chosen the correct aggregate boundaries?
What you can do sometimes is have a single command handler that manages multiple transactions, updating a different aggregate in each. But it might be easier, in the long run, to have two different command handlers that each receive a copy of the command and decide what to do, independently.

`sqlite3` ignores `sqlite3_busy_timeout`?

I use sqlite3 in a multi-threaded application (it is compiled with SQLITE_THREADSAFE=2). In the watch window I see that sqlite->busyTimeout == 600000, i.e. it is supposed to have a 10-minute timeout. However, sqlite3_step returns SQLITE_BUSY obviously faster than after 10 minutes (it actually returns instantly, as if I had never called sqlite3_busy_timeout).
Why does sqlite3 ignore the timeout and return the error instantly?
One possibility: SQLite ignores the timeout when it detects a deadlock.
The scenario is as follows. Transaction A starts as a reader, and later attempts to perform a write. Transaction B is a writer (either started that way, or started as a reader and was promoted to a writer first). B holds a RESERVED lock, waiting for readers to clear so it can start writing. A holds a SHARED lock (it's a reader) and tries to acquire a RESERVED lock (so it can start writing). For a description of the various lock types, see http://sqlite.org/lockingv3.html
The only way to make progress in this situation is for one of the transactions to roll back. No amount of waiting will help, so when SQLite detects this situation, it doesn't honor the busy timeout.
There are two ways to avoid the possibility of a deadlock:
Switch to WAL mode - it allows one writer to co-exist with multiple readers.
Use BEGIN IMMEDIATE to start a transaction that may eventually need to write - this way, it starts as a writer right away. This of course reduces the potential concurrency in the system, as the price of avoiding deadlocks.
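The difference is easy to demonstrate with Python's stdlib sqlite3 binding (`isolation_level=None` disables its implicit transaction handling). With both connections declaring write intent up front via BEGIN IMMEDIATE, the second one fails cleanly with "database is locked" once its timeout expires, instead of deadlocking later on a lock upgrade:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Writer 1 declares its intent to write up front.
a = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("BEGIN IMMEDIATE")          # takes the RESERVED lock now
a.execute("INSERT INTO t VALUES (1)")

# Writer 2 cannot also take RESERVED; it gets SQLITE_BUSY
# ("database is locked") after its timeout, rather than a deadlock.
b = sqlite3.connect(path, timeout=0.2, isolation_level=None)
try:
    b.execute("BEGIN IMMEDIATE")
    outcome = "acquired"
except sqlite3.OperationalError:
    outcome = "busy"

a.execute("COMMIT")                   # releases the lock for writer 2
```

Here the busy timeout is honored, because waiting can actually help: writer 2 just has to wait for writer 1 to finish. It is only the reader-promoted-to-writer case that SQLite fails immediately, since no amount of waiting resolves it.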
I ran a lot of tests and am sharing the results here for other people who use SQLite in a multithreaded environment. SQLite's threading support is not well documented; there is no good tutorial that describes all the threading issues in one place. I wrote a test program that creates 100 threads and sends random queries (INSERT, SELECT, UPDATE, DELETE) concurrently to a single database. My answer is based on observing this program.
The only really thread-safe journal mode is WAL. It allows multiple connections to do anything they need with the same database within one process, in the same way a single-threaded application does. No other mode is thread-safe, regardless of timeouts, busy handlers, and the SQLITE_THREADSAFE preprocessor definition: they generate SQLITE_BUSY periodically, and always expecting and handling that error is a complex programming task. If you need thread-safe SQLite that never returns SQLITE_BUSY, just as a single thread never does, you have to set the WAL journal mode.
Additionally, you have to build with the SQLITE_THREADSAFE=2 or SQLITE_THREADSAFE=1 preprocessor definition.
Once that is done, you have to choose between two options:
You can call sqlite3_busy_timeout. That is enough; you are not required to also call sqlite3_busy_handler, even though the documentation does not make this obvious. It gives you the "default", "built-in" timeout behaviour.
You can call sqlite3_busy_handler and implement the timeout yourself. I don't see why you would, but maybe some nonstandard OS requires it. Note that calling sqlite3_busy_handler resets the timeout to 0 (i.e. disabled). On desktop Linux and Windows you don't need it unless you like writing more complex code.
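The WAL-plus-timeout setup described above can be checked with Python's stdlib sqlite3 binding (the `PRAGMA busy_timeout` is the SQL-level equivalent of `sqlite3_busy_timeout`): a reader proceeds even while a writer holds an open write transaction, which is exactly what the other journal modes can't do.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal.db")

writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("PRAGMA busy_timeout=600000")   # same as sqlite3_busy_timeout
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

# Start a write transaction and leave it open.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (2)")

# In WAL mode a reader is not blocked by the in-progress writer;
# it sees the last committed state of the database.
reader = sqlite3.connect(path, timeout=0.2)
rows = reader.execute("SELECT x FROM t ORDER BY x").fetchall()

writer.execute("COMMIT")
```

In a rollback-journal mode the reader's SELECT could instead return SQLITE_BUSY while the write is in progress; in WAL it simply reads the snapshot that existed before the open transaction.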

What does the exclusive lock on storage for OrientDB entail exactly?

Having read the following statement from the official documentation of OrientDB:
In order to guarantee atomicity and consistency, OrientDB acquire an
exclusive lock on the storage during transaction commit.
I am wondering if my understanding of the situation is correct. Here is how I assume this will work:
Thread 1 opens a transaction, and reads records #1:100 to #1:200, some from class A and some from class B (something which cannot be determined until the transaction comes to a close).
Thread 1 massages the data, maybe even inserting a few records.
Thread 1 starts to commit the data. As the database does not have any way to know which parts of the data might be affected by the open transaction, it blindly blocks the whole storage unit and verifies the #version field to enforce optimistic locking on all possibly affected records.
Thread 2 tries to read record #1:1 (or any other record in the whole database) and is blocked by the commit process, which is aligned, AFAIK, with exclusive locking on the storage unit. This block occurs, if I'm not mistaken, regardless of the cluster the original data resides on, since we have multi-master datasets.
Thread 1 ends the commit process and the database becomes consistent, effectively lifting the lock.
At this point, any thread can operate on the dataset, transactionally or otherwise, and will not be bound by the exclusive locking mechanism.
If this is the case, during the exchange highlighted in point 3 the data store, in its entirety, is effectively in a trance state and cannot be written to, read from, or interacted with in any meaningful way.
I do so hope that I am missing my guess.
Disclaimer: I have not had the chance to dig into the underlying code from the rather rich OrientDB codebase. As such, this is, at its best, an educated guess and should not be taken as any sort of reference as to how OrientDB actually operates.
Possible Workarounds:
Should worst come to worst and this happens to be the way OrientDB actually works, I would dearly welcome any workarounds to this conundrum. We are looking for meaningful ways that will still keep OrientDB a viable option for an enterprise, scalable, high-end application.
In the current release of OrientDB, transactions lock the storage in exclusive mode. Fortunately, OrientDB works optimistically, so this happens "only" at commit() time; it does not matter when the transaction was begun.
If this is a showstopper for your use case, you could consider the following:
Don't use transactions. In this case you'll run in parallel with no locks, but note that using indexes requires a lock at the index level. If an index becomes a bottleneck, the most common workaround is to create X sub-classes, each with its own index. OrientDB will use the sub-class indexes when needed, and a CRUD operation will lock only the specific index involved.
Wait for OrientDB 3.0, where this limitation will be removed by real parallel transaction execution.

Application of Shared Read Locks

What is the need for a shared read lock?
I can understand that write locks have to be exclusive. But why do many clients need to access the document simultaneously while sharing only the read privilege? Practical applications of shared read locks would be a great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working - otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes that's not enough - for example if the datasource is so large that you have to break the work up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In such cases, transaction isolation is no longer enough, and you need to lock the entity at a logical level.
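The point of a shared read lock is precisely that many readers can hold it at once, while any writer must wait for exclusivity. A minimal reader-writer lock sketched in Python illustrates the mechanics (in ABAP this role is played by lock objects via ENQUEUE/DEQUEUE; the class below is a generic illustration, not ABAP's implementation):

```python
import threading

class SharedExclusiveLock:
    """Many concurrent readers OR one writer, never both."""

    def __init__(self):
        self._readers = 0
        self._mutex = threading.Lock()       # guards the reader count
        self._write_gate = threading.Lock()  # held while readers or a writer are active

    def acquire_shared(self):
        with self._mutex:
            self._readers += 1
            if self._readers == 1:           # first reader locks writers out
                self._write_gate.acquire()

    def release_shared(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:           # last reader lets writers in
                self._write_gate.release()

    def acquire_exclusive(self):
        self._write_gate.acquire()

    def release_exclusive(self):
        self._write_gate.release()
```

This maps onto the calculation scenario above: every parallel subtask takes the shared lock so reads don't block each other, while a posting change must take the exclusive lock and therefore waits until the whole calculation has finished.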

SQLite and checkout semantic

I'm starting to investigate using SQLite. What I would like to do (among other things) is implement some kind of checkout semantics, i.e. if one SQL connection takes a checkout lock on a column or row (it doesn't matter which), I would like no other connection to be allowed to read or modify that data until the first connection releases the lock, or the first connection closes, the application crashes, etc.
Would this be implementable in SQLite?
/Thanks in advance!
SQLite is not really designed for heavy concurrency - its locking model is to lock at the database level. If you need record-level locking (mostly you don't), you need a server based RDBMS.
Databases in general don't really support checkout semantics. They guarantee transaction isolation, but since they don't guarantee that a transaction succeeds, they can let another transaction proceed with an old version of data that a first transaction has modified (but not yet committed), and if the transactions turn out to be non-serializable, simply roll one of them back. Even when they do use locking, they don't expose it explicitly: you read a row and it becomes read-locked, you write it and it becomes write-locked, but you don't have any control over this.
SQLite in particular locks the whole database when you start writing in a transaction, unless it is in WAL mode. You can force the lock by starting the transaction with BEGIN IMMEDIATE instead of plain BEGIN. WAL mode does support some kind of concurrency, though unfortunately I don't know the exact details.
In any case, you'll probably end up having to implement the checkout semantics yourself - or do without them, because checkout semantics are complicated considerably by having to deal with stale checkouts.
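One way to implement checkout semantics yourself on top of SQLite, stale checkouts included, is a checkouts table recording the owner and an expiry, so that locks from a crashed client simply age out. A sketch using Python's stdlib sqlite3 binding (the table and column names are made up for the example):

```python
import sqlite3
import time

def connect(path=":memory:"):
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("""CREATE TABLE IF NOT EXISTS checkouts (
        item_id  TEXT PRIMARY KEY,
        owner    TEXT NOT NULL,
        expires  REAL NOT NULL)""")
    return conn

def checkout(conn, item_id, owner, ttl=30.0):
    """Try to check out item_id for owner; returns True on success."""
    now = time.time()
    conn.execute("BEGIN IMMEDIATE")          # serialize competing checkouts
    try:
        # Stale checkouts (e.g. from a crashed client) age out here.
        conn.execute("DELETE FROM checkouts WHERE expires < ?", (now,))
        try:
            conn.execute("INSERT INTO checkouts VALUES (?, ?, ?)",
                         (item_id, owner, now + ttl))
            return True
        except sqlite3.IntegrityError:       # someone else holds the item
            return False
    finally:
        conn.execute("COMMIT")

def release(conn, item_id, owner):
    conn.execute("DELETE FROM checkouts WHERE item_id = ? AND owner = ?",
                 (item_id, owner))
```

Application code then checks `checkout(...)` before touching the protected data; the database itself never blocks readers, which is exactly why the lock has to live at this logical level. A holder doing long work would periodically re-insert with a fresh expiry to keep its checkout alive.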