I'm currently reading a DBMS book, and as I understand it, MVCC (multi-version concurrency control) is used for highly concurrent read and write transactions.
But the "concurrency control on search structures" chapter mentions different locking concepts (lock coupling, the link technique, etc.) for B-Trees.
Isn't MVCC applied to a B-Tree's internal and leaf nodes in a DBMS?
Are B-Tree concurrency and MVCC completely different things? If so, how is MVCC implemented in a DBMS?
MVCC can be implemented in a variety of ways. The only requirement is that somehow older row versions are available.
For instance, SQL Server stores them in a temporary database that is reset when the server restarts.
Postgres stores row versions as hidden rows directly in the b-tree. It adds a hidden key column to the tree. When reading from the tree, it only exposes the version that logically should be visible.
RavenDB's Voron manages b-tree pages as immutable data. Writes create entirely new trees. MVCC is therefore implemented as reading from the correct immutable tree.
Databases rarely lock physical structures for an extended amount of time. It is not a good idea to let a database client stall progress on database-internal structures, so internal structures are usually locked very briefly. Logical row locks are treated separately.
If I had to guess, "concurrency control on search structures" refers to physical thread-safety. This usually does not involve MVCC because there is no need to manage multiple versions; normal in-memory locks (latches) are sufficient for brief accesses.
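For example, a latch-coupling descent might look like this (a toy sketch, all names mine, not a real b-tree implementation):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A minimal sketch of lock coupling (hand-over-hand latching) on a
// b-tree descent. Node is a simplified stand-in for a real b-tree node.
class Node {
    final ReentrantReadWriteLock latch = new ReentrantReadWriteLock();
    long[] keys = new long[0];     // separator keys
    Node[] children = new Node[0]; // keys.length + 1 children
    boolean isLeaf;

    Node childFor(long key) {      // pick the child subtree covering key
        int i = 0;
        while (i < keys.length && key >= keys[i]) i++;
        return children[i];
    }
}

class BTree {
    Node root;

    // Lock coupling: the parent's latch is held only until the child's
    // latch is acquired, so each latch is held very briefly and
    // descents on disjoint paths proceed in parallel.
    Node findLeaf(long key) {
        Node current = root;
        current.latch.readLock().lock();
        while (!current.isLeaf) {
            Node child = current.childFor(key);
            child.latch.readLock().lock();     // take the child latch first...
            current.latch.readLock().unlock(); // ...then release the parent
            current = child;
        }
        return current; // caller releases the leaf latch when done
    }
}
```

Note how no latch is ever held for longer than one step of the descent, which is why this is a physical thread-safety concern rather than a transaction isolation concern.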
MVCC means you don't need to use locks for reads.
Imagine each transaction gets a numeric timestamp that increases with each transaction. We have transactions 1 and 2 in this example.
Transaction 1 reads A and writes the value (A + 1). Snapshot isolation creates a temporary version of A which transaction 1 owns. The read timestamp of A is set to transaction 1's timestamp.
If transaction 2 comes along at the same time and reads A, it will read the committed A -- it won't see A + 1 because that hasn't been committed. Transaction 2 can only see versions of A that are committed and whose timestamp is <= transaction 2's timestamp.
When transaction 2 reads A, it also checks the read timestamp of A, sees that transaction 1 is there, and checks whether transaction 1's timestamp < transaction 2's timestamp. Because 1 < 2, transaction 2 will be aborted, because it already depends on an old value of A.
(I have implemented MVCC in Java -- see my transaction, runner and MVCC code -- and I have implemented a btree in Python.)
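For illustration, here is a compact sketch of the versioning scheme described above (mine, not from any book or database). It follows the textbook multiversion timestamp-ordering rule, which aborts the conflicting writer rather than the reader -- a slight variant of the abort described above. All names are made up:

```java
import java.util.TreeMap;

// A minimal, single-threaded sketch of multiversion timestamp
// ordering (MVTO) for one data item.
class MvccItem {
    static class Version {
        long value;
        long readTs;           // largest timestamp that has read this version
        Version(long v) { value = v; }
    }

    // commit timestamp -> version committed at that timestamp
    final TreeMap<Long, Version> versions = new TreeMap<>();

    MvccItem(long initial) { versions.put(0L, new Version(initial)); }

    // Read the latest version committed at or before the reader's
    // timestamp; readers never block and never abort.
    long read(long ts) {
        Version v = versions.floorEntry(ts).getValue();
        v.readTs = Math.max(v.readTs, ts);
        return v.value;
    }

    // Abort if a younger transaction already read the version this
    // write would supersede; otherwise install a new version.
    void write(long ts, long value) {
        Version prev = versions.floorEntry(ts).getValue();
        if (prev.readTs > ts)
            throw new IllegalStateException("abort: read by a younger txn");
        versions.put(ts, new Version(value));
    }
}
```

With timestamps 1 and 2 as above: if transaction 2 reads A first (setting its readTs to 2), transaction 1's later write aborts, because a younger transaction already depends on the value it would overwrite.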
I have a system I need to design with low latency in mind, processing power and memory are generous. I have a large (several GB) data structure that is updated once every few seconds. Many (read only) operations are going to run against this data structure between updates, in parallel, accessing it heavily. As soon as an update occurs, all computations in progress should be cleanly cancelled, as their results are invalidated by the update.
The issue I'm running into is that writes are infrequent, and readers access the structure so often, that locking around individual reader accesses would be a huge performance hit. I'm fine with the readers reading invalid data, but then I need to deal with any broken invariants (assertions) or segfaults due to stale pointers, etc. At the same time, I can't have readers block writers, so reader-writer locks acquired at every reader's thread start are unacceptable.
The only solution I can think of has a number of issues, which is to allocate a mapping with mmap, put the readers in separate processes, and mprotect the memory to kill the workers when it's time to update. I'd prefer a cross-platform solution (ideally pure C++), however, and ideally without forking every few seconds. This would also require some surgery to get all the data structures located in shm.
Something like a revocable lock would do exactly what I need, but I don't know of any libraries that provide such functionality.
If this were a database, I'd use multi-version concurrency control: readers obtain a logical snapshot while the underlying physical data structures are mostly lock-free (or locked very briefly and at a fine granularity).
You say your memory is generously equipped. Can you just create a complete copy of the data structure? Then you modify the copy and swap it in atomically.
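For example (sketched in Java for brevity; in C++ the same idea would be an atomically swapped pointer to an immutable snapshot -- BigStructure is a placeholder for your data):

```java
import java.util.concurrent.atomic.AtomicReference;

class BigStructure { /* stand-in for the several-GB read-mostly data */ }

// Readers always see a complete, immutable snapshot; the writer builds
// a fresh copy off to the side and publishes it with one atomic swap.
class SnapshotHolder {
    private final AtomicReference<BigStructure> current =
            new AtomicReference<>(new BigStructure());

    BigStructure snapshot() { return current.get(); }      // readers: no locks

    void publish(BigStructure next) { current.set(next); } // writer: one swap
}
```

Old snapshots are reclaimed once the last reader drops its reference; in C++ you'd get the same lifetime behavior from reference counting.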
Or, can you use immutable data structures, so that readers continue to use the old version while the writer creates new objects?
Or, you implement MVCC in a fine-grained way. Let's say you want to version a hash-set. Instead of keeping one value per key, you keep one value per key per version. Readers read from the latest version that is <= the version that existed when they started to read. Writers create a new version number for each write "transaction". Only when all writes are complete do readers start picking up changes from the new version. This is how MVCC databases do it.
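A sketch of that per-key versioning (in Java for brevity; the single-writer assumption and all names are mine):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Fine-grained MVCC over a hash map: one value per key per version.
// Readers pick the newest version <= the snapshot they started with;
// a single writer bumps the global version only after all writes land.
class VersionedMap<K, V> {
    private final AtomicLong committedVersion = new AtomicLong(0);
    private final ConcurrentHashMap<K, TreeMap<Long, V>> data =
            new ConcurrentHashMap<>();

    long beginRead() { return committedVersion.get(); }   // reader's snapshot

    V get(K key, long snapshot) {
        TreeMap<Long, V> versions = data.get(key);
        if (versions == null) return null;
        synchronized (versions) {                          // brief, per-key lock
            Map.Entry<Long, V> e = versions.floorEntry(snapshot);
            return e == null ? null : e.getValue();
        }
    }

    // Writer: stage all changes at version (committed + 1), then commit.
    void put(K key, V value) {
        long next = committedVersion.get() + 1;
        TreeMap<Long, V> versions =
                data.computeIfAbsent(key, k -> new TreeMap<>());
        synchronized (versions) { versions.put(next, value); }
    }

    void commit() { committedVersion.incrementAndGet(); }
}
```

Staged values sit at version committed + 1, which no reader's snapshot can reach until commit() publishes it, so readers never observe a half-applied update.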
Besides these approaches, I also like your mmap idea. I don't think you need a separate process if your OS supports copy-on-write memory mappings. Then you can map the same memory area multiple times and provide a stable snapshot to readers.
I have an app that spins up multiple processes to read large amounts of data from several PostgreSQL tables to do number crunching, and then stores the results in separate tables.
When I tested this with just a single process, it was blazing fast and was using almost 100% CPU, but when I tried using 8 processes on an 8 core machine, all processes registered about 1% CPU and the whole task seemed to take even longer.
When I checked pg_stat_activity, I saw several connections listed as "<IDLE> in transaction". Following some advice here, I looked at pg_locks, and I'm seeing hundreds of "AccessShareLock" locks on the dozens of read-only tables. Based on the docs, I believe this is the default, but I think it is causing the processes to step on each other's feet, negating any benefit of multi-processing.
Is there a more efficient isolation level to use, or better way to tune PostgreSQL to allow faster read-only access to several processes, so each doesn't need to lock the table? Specifically, I'm using Django as my ORM.
Not sure what throttles your multiple cores, but it has nothing to do with the isolation level, even if you have concurrent write operations. Per the documentation:
The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so **reading never blocks writing and writing never blocks reading**. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.
Bold emphasis mine.
Of course, reading also never blocks reading.
Maybe you need to reconfigure resource allocation on your server? The default configuration is regularly too conservative. On the other hand, some parameters should not be set too high in a multi-user environment; work_mem comes to mind. Check the list under Performance Optimization in the Postgres Wiki.
And finally:
"Django as my ORM."
ORMs often try to stay platform-independent and fail to get the full potential out of a particular RDBMS. They are primitive crutches and don't play well with performance optimization.
Does Cassandra guarantee consistency of replicas in case of concurrent writes? For example, if N=3, W=3 and there are 3 concurrent writers, is it possible to end up with 3 different values on each replica?
Is this a Cassandra-specific problem, or does the canonical Dynamo design also have this problem, despite its use of vector clocks?
Cassandra uses client-provided timestamps in this case, to ensure each replica keeps the 'latest' value. In your example, where you write to each replica, even when replicas receive the writes in a different order, they will use the timestamp provided with each write to decide which one to keep. Writing the same key with an older timestamp to a replica will simply be ignored.
This mechanism isn't just needed to cope with concurrent writes -- Cassandra can receive writes out of order over long periods of time (e.g. replaying hints to a recently down node). To cope with this, when Cassandra compacts SSTables and encounters two keys that are the same, it uses the timestamps to decide which one is kept.
Similarly, Cassandra has a feature called read repair. On read, Cassandra will compare the timestamp given by each replica and return the value associated with the latest timestamp to the client. It will then write this value back to any replicas which were out of date (this can have a performance impact, so the chance of it doing the subsequent write is tuneable).
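To make the rule concrete, a toy sketch of last-write-wins reconciliation (illustrative only -- this is neither Cassandra's code nor its API, and real Cassandra also breaks timestamp ties deterministically):

```java
// The same deterministic rule is applied on write, during compaction,
// and in read repair, so replicas converge on one value regardless of
// the order in which they received the writes.
class TimestampedValue {
    final byte[] value;
    final long writeTimestamp; // supplied by the client/coordinator

    TimestampedValue(byte[] value, long writeTimestamp) {
        this.value = value;
        this.writeTimestamp = writeTimestamp;
    }

    // Given two versions of the same key, keep the newer one.
    static TimestampedValue reconcile(TimestampedValue a, TimestampedValue b) {
        return a.writeTimestamp >= b.writeTimestamp ? a : b;
    }
}
```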
Just to add to tom.wilkie's answer:
If you want to guarantee that the latest value is always kept, read AND write at LOCAL_QUORUM or QUORUM consistency. With N=3, quorum reads and writes satisfy R + W > N, so every read overlaps at least one replica that has the latest write.
I am trying to find out what the difference between optimistic concurrency control (OCC) and multi-version concurrency control (MVCC) is.
So far I know that both are based on version checking for updates.
In OCC, I read about transactions that acquire no locks for reading, only for the later update, which will fail if the version was incremented in the meantime and the version check fails. In this case the transaction is rolled back.
In MVCC, it is basically the same, or not? Where is the difference?
I think they are sometimes used interchangeably, and if the transaction only involves one object then they are essentially the same, but MVCC is an extension of optimistic concurrency (or a version of it) that provides guarantees when more than one object is involved.
Say that you have two objects, A and B, which must maintain some invariant between them, e.g. they are two numbers whose sum is constant. Now, a transaction T1 subtracts 10 from A and adds it to B, while, concurrently, another transaction T2 is reading the two numbers.
Even if you optimistically update A and B independently (CAS them), T2 could get an inconsistent view of the two numbers (say, if it reads A before it's modified but reads B after it's been modified). MVCC would ensure T2 reads a consistent view of A and B by possibly returning their old values, i.e., it must save the old versions.
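A small sketch of the difference (names are mine): independent CAS updates can expose a torn view, while publishing both numbers as one immutable pair -- the degenerate, single-version form of what MVCC does by keeping old versions -- keeps readers consistent:

```java
import java.util.concurrent.atomic.AtomicReference;

// Invariant: a + b is constant. Two separate AtomicLongs would let a
// reader see A after the transfer but B before it (sum != 100).
class Pair {
    final int a, b;
    Pair(int a, int b) { this.a = a; this.b = b; }
}

class ConsistentPair {
    private final AtomicReference<Pair> current =
            new AtomicReference<>(new Pair(60, 40));

    Pair read() { return current.get(); } // T2: always sees a + b == 100

    void transfer(int amount) {           // T1: move amount from a to b
        Pair p, next;
        do {
            p = current.get();
            next = new Pair(p.a - amount, p.b + amount);
        } while (!current.compareAndSet(p, next));
    }
}
```

A full MVCC system generalizes this by keeping the old Pair around, tagged with a version, so long-running readers keep reading it even after newer versions are committed.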
To sum up, optimistic locking (or optimistic concurrency control) is a general principle for synchronization without locks. MVCC is an optimistic technique which allows isolated transactions spanning multiple objects.
To directly reply to the question, multi version concurrency control (MVCC) is a concurrency control method, (typically) belonging in the category of optimistic concurrency control (OCC).
There are two main concurrency control approaches:
Pessimistic concurrency control: this approach assumes that conflicting operations happen frequently (that's why it's called pessimistic). Since conflicts are common, this approach uses locks to prevent conflicting operations from executing, on the assumption that the locking itself imposes no significant overhead.
Optimistic concurrency control: this approach assumes that conflicting operations are rare. Under this assumption, locks would impose significant and unneeded overhead. For this reason, this approach generally avoids locking: it executes the operations and checks, at the commit of each transaction, whether there was a conflict with another transaction. If there was, it aborts the transactions with conflicting operations.
One widely known algorithm for pessimistic concurrency control is two-phase locking (2PL).
Two widely known algorithms of optimistic concurrency control are:
The timestamp-based concurrency control
The multi-version concurrency control
The main difference between these two algorithms is the following. The timestamp-based algorithm assigns a single timestamp (more precisely, one per kind of operation, read and write) to each object, denoting the last transaction that accessed it. Each transaction then checks, during its operations, whether it conflicts with the last transaction that accessed the object. The multi-version approach instead maintains multiple versions of each object, each corresponding to a transaction. As a result, the multi-version approach has fewer aborts than the first, since a potentially conflicting transaction can often write a new version instead of aborting. This is achieved at the cost of more storage for all the versions.
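To make the contrast concrete, here is a toy sketch of the single-version, timestamp-based check (names are mine). Note that even a reader can be forced to abort -- exactly the abort the multi-version approach avoids by handing the reader an older version:

```java
// Single-version timestamp ordering for one object: the object keeps
// only the last read/write timestamps, so a transaction arriving
// "too late" must abort instead of being served an older version.
class TimestampOrderedItem {
    private long value;
    private long readTs = 0;  // timestamp of the youngest reader so far
    private long writeTs = 0; // timestamp of the youngest writer so far

    synchronized long read(long ts) {
        if (ts < writeTs)     // a younger txn already overwrote the value
            throw new IllegalStateException("abort reader " + ts);
        readTs = Math.max(readTs, ts);
        return value;
    }

    synchronized void write(long ts, long v) {
        if (ts < readTs || ts < writeTs) // a younger txn already saw/overwrote it
            throw new IllegalStateException("abort writer " + ts);
        writeTs = ts;
        value = v;
    }
}
```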
Strictly speaking, MVCC is mostly concerned with how data is stored, i.e. the fact that there can be multiple physical versions of each data item. As a result, it is theoretically possible to combine it with pessimistic methods as well (e.g. locking), but its multi-version nature is best combined with optimistic methods.
Just to rectify Dimos's answer: timestamp-based concurrency control is still a pessimistic method (it may still abort/block transactions during their execution phase).
We are looking at db4o for a high-volume e-commerce website using Java on the server side. Concurrency and transactional support are very important to us. When a customer purchases an item we need to lock the product and customer objects to update inventory and customer order history, respectively, in a single transaction. Is this possible with db4o? I want to make sure it supports multi-object transactions.
There are already similar questions here, like this one. And my answer is more or less the same.
About the high-volume e-commerce website: db4o was never built as a high-volume, big database, but rather for embedded use cases like desktop and mobile apps. Well, it depends what 'high volume' means; I assume it means hundreds of concurrent transactions. That's certainly out of scope for db4o.
Concurrency and transactional support: the db4o core is still inherently single-threaded and therefore can only serve a small number of concurrent operations. db4o supports transactions with read-committed isolation. That means a transaction can only see the committed state of other transactions. In practice that's a very weak guarantee.
To your example: you can update the purchase with the product and customer in one transaction. However, another transaction could update any of these objects and commit. Then a running transaction which has already read some objects might do calculations with the old values and store the results. So the weak isolation 'taints' your state.
You could use locks to prevent that, but db4o doesn't have a nice object-locking mechanism, and it would decrease performance further.
All in all, I think you probably need a 'larger' database with better support for concurrency and transaction handling.
It sounds like you need to use db4o semaphores.
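For example (a hedged sketch: Product, Customer and the lock-naming scheme are stand-ins for the poster's domain, and the semaphore calls are from memory of the db4o API -- db4o semaphores are named, server-wide locks keyed by string, and setSemaphore() returns false if another client holds the name past the wait timeout):

```java
import com.db4o.ObjectContainer;

class Product { long getId() { return 1; } void decrementInventory() {} } // stand-in
class Customer { void addOrder(Product p) {} }                            // stand-in

class PurchaseService {
    void purchase(ObjectContainer db, Product product, Customer customer) {
        String lock = "product-" + product.getId(); // illustrative naming scheme
        if (!db.ext().setSemaphore(lock, 1000)) {   // wait up to 1s for the lock
            throw new IllegalStateException("product locked by another client");
        }
        try {
            product.decrementInventory();
            customer.addOrder(product);
            db.store(product);
            db.store(customer);
            db.commit();                            // one multi-object transaction
        } finally {
            db.ext().releaseSemaphore(lock);        // also released on disconnect
        }
    }
}
```

This gives you application-level pessimistic locking on top of db4o's read-committed transactions, which addresses the 'tainting' problem above at the cost of serializing purchases per product.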