Concurrency Kit hash table and multiple writers - concurrency

In Concurrency Kit documentation I see SPMC in get and put operations:
ck_ht_get_spmc()
ck_ht_put_spmc()
Does it mean I can't use this hash table with multiple writers, and that I will get race conditions?
Currently I use a mutex around the put operation so that it behaves as if there were a single writer.

Yes, from the documentation for ck_ht_init:
The hash table is safe to access by multiple readers in the presence of one concurrent writer. Behavior is undefined in the presence of concurrent writers.

Related

What is the fastest possible solution for concurrent read/write into hash-maps?

I am writing a network service which receives raw packets, converts them, and puts them into a queue. There are also a couple of worker threads that take the converted packets from the queue and, based on some rules, update a hash-map. In order to prevent concurrent updates to the hash-map from different worker threads, I have to use a mutex. Unfortunately, using a mutex imposes a big performance hit, and I need to find a workaround.
EDITED:
The converted packets contain a session_id, which is used as the hash-map key. Before any insertion or update, the session_id is first searched; if no entry is found, a new one is added, and this is exactly where I use the mutex lock. Otherwise, if the session_id already exists, I just update the existing value, with no mutex lock for a mere value update. It might help to know that I use boost::unordered_map as the underlying hash-map.
Below is pseudocode of the logic I use:
if hash.find(session_id) then
    hash.update(value)
else
    mutex.lock()
    hash.insert(value)
    mutex.unlock()
end
What is your suggestion?
By the way, this is my working environment and toolchain:
Compiler: C++(gcc)
Thread library: pthread
OS: Ubuntu 14.04
The fastest solution would be to split the data in a way that each thread uses its own data set, so you would not need any locking at all. Maybe you can get there by distributing the messages among the threads based on some key data.
The second-best solution would be a read-write spinlock implemented using either C++11 atomics or the GCC atomic built-ins, see https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
Read-write spinlocks typically allow multiple parallel read accesses, but only one write access (which of course also blocks all read accesses).
There is also a read-write mutex on Linux (pthread_rwlock), but I found it to be slightly slower than a hand-made implementation.
Have you looked into lock-free data structures? You can refer to an interesting paper from Andrei Alexandrescu and Maged Michael, Lock-Free Data Structures with Hazard Pointers. Some implementations using similar ideas can for instance be found on the libcds Github repository.
Although they use locking to some extent, Facebook's folly AtomicHashMap and Intel's TBB also provide high-performance concurrent hash-maps.
Of course these approaches will require a bit of extra reading and integration work, but if you have determined that your current locking strategy is the bottleneck, it may well be worth the cost.

How do DynamoDB's conditional updates work with its eventual consistency model?

I was reading the DynamoDB documentation and found these interesting features:
Eventually consistent reads
Strongly consistent reads
Conditional updates
My question is, how do these three things interact with each other? Mostly I'm wondering whether conditional updates use strongly consistent reads for checking the condition, or eventually consistent reads. If it's the latter, there is still a race condition, correct?
For a conditional update you need strong consistency. I am going to guess that an update is a single operation in which a consistent read and a write happen atomically and fail or succeed together.
The way to think of Dynamo is like a group of separated entities that all keep track of the state and inform each other of updates that are made / agree if such updates can be propagated to the whole group or not.
When you write (the DynamoDB API on your behalf), you basically inform a subset of these entities that you want to update data. After that, the data propagates to all of them.
When you do an eventually consistent read, you read from one of the entities. It is eventually consistent, meaning there is a possibility that you will read from an entity that did not get the memo yet.
When doing a strongly consistent read, you read from enough entities to ensure that what you read has propagated. If propagation is in progress, you have to wait.

Can I use WAL mode in SQLite3 if I use an additional mutex for multiple writers?

http://sqlite.org/wal.html
WAL mode is specified for N readers and one writer. Is the writer's identity interchangeable? For example, I have N writers and an additional mutex; each writer has to obtain this mutex to be able to write, so there are never two writers writing at the same time. Is this possible? And if so, why is this not part of SQLite, since to me it appears to be a natural extension of WAL mode.
In WAL mode, it is not possible to have multiple active writers.
However, it is possible for multiple writers to attempt to write.
This conflict is handled in exactly the same way as in rollback journal mode, i.e., the first writer locks the database, and the others have to wait.

libpqxx transaction serialization & consequences

For my implementation, a particular write must be done in bulk and without the chance of another interfering.
I have been told that two competing transactions in this way will lead to the first one blocking the second, and the second may or may not complete after the first has.
Please post the documentation that confirms this. Also, what exactly happens to the second transaction if the first is blocking? Will it be queued, fail, or some combination?
If this cannot be confirmed, should the transaction isolation level for this transaction be set to SERIALIZABLE? If so, how can that be done with libpqxx prepared statements?
If the transactions are serialized, will the second transaction fail or be queued until the first has completed?
If either fail, how can this be detected with libpqxx?
The only way to conclusively prevent concurrency effects is to LOCK TABLE ... IN ACCESS EXCLUSIVE MODE each table you wish to modify.
This means you're really only doing one thing at a time. It also leads to fun problems with deadlocks if you don't always acquire your locks in the same order.
So usually, what you need to do is figure out what exactly the operations you wish to do are, and how they interact. Determine what concurrency effects you can tolerate, and how to prevent those you cannot.
This question as it stands is just too broad to usefully answer.
Options include:
Exclusively locking tables. (This is the only way to do a multi-row upsert without concurrency problems in PostgreSQL right now). Beware of lock upgrade and lock order related deadlocks.
Appropriate use of SERIALIZABLE isolation, but remember that you have to be able to keep a record of what you did during a transaction and retry it if the transaction aborts.
Careful row-level locking - SELECT ... FOR UPDATE, SELECT ... FOR SHARE.
"Optimistic locking" / optimistic concurrency control, where appropriate
Writing your queries in ways that make them more friendly toward concurrent operation. For example, replacing read-modify-write cycles with in-place updates.

Application of Shared Read Locks

What is the need for a shared read lock?
I can understand that write locks have to be exclusive. But what is the need for many clients to access the document simultaneously while still sharing only the read privilege? Practical applications of shared read locks would be of great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e.g. postings), you have to ensure that none of these datasets is changed while you're working; otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes that's not enough: for example, if the datasource is so large that you have to break it up into parallel subtasks, or if you have to call some function that performs a database commit or rollback internally. In these cases, transaction isolation is no longer enough, and you need to lock the entity on a logical level.