How can I protect a vector with a Mutex? - c++

I am working on designing a C++ server that accepts multiple different interacting clients, and I use vectors to keep track of all of them individually. However, I realized that, because of so many threads running, there's a tiny chance a vector might be read and written to at the same time by two threads. Is there a quick and safe way to add a mutex or something to them so that it will wait until all the reads are done before another function adds to it? Not doing so can mess up the protocol and maybe even crash the server.
I had an idea to create a global variable that would lock all reads to a vector, but I'm not sure if the threads can be told to mutually exclude that variable too (i.e. not change bool to false and check it as true at the same time, rendering the mechanism pointless).
I am using Windows 7 (Visual Studio 2010 Pro). Thanks for any and all advice!

The quickest solution is to replace the std::vector with a concurrent_vector. This class mimics much of std::vector's interface and is safe for concurrent appends and element access.
However, this will make the code non-portable because the concurrent_vector class is part of the Microsoft Parallel Patterns Library and not the C++ standard library. If you want to maintain portability, you'll have to use a boost::mutex from Boost.Thread (since VS2010 doesn't support std::mutex) to gain exclusive access to the vector from each thread. Using a plain global bool to prevent concurrent access is useless, because checking and setting it is not atomic: two threads can both see it as false and both proceed.
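A minimal sketch of the mutex approach, assuming a C++11 compiler with std::mutex. On VS2010, boost::mutex and boost::lock_guard have the same shape and could be substituted. The GuardedVector name and its members are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper: every access to the vector goes through one mutex,
// so a read can never overlap with a write.
template <typename T>
class GuardedVector {
public:
    void push_back(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);  // released at end of scope
        items_.push_back(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

    // Returns a copy so the caller can iterate without holding the lock.
    std::vector<T> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock it
    std::vector<T> items_;
};
```

Returning a copy from snapshot() trades memory for safety: handing out references or iterators into the vector would let callers touch it after the lock is released.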

Since you are using VS2010, you should use Concurrency::concurrent_vector. But be aware that this class has limitations and is not fully thread-safe. Alternatively, you may protect a std::vector with Concurrency::critical_section or Concurrency::reader_writer_lock. A reader-writer lock gives good performance when there are more reads than writes. You may also use the native Windows reader-writer locks, but they are supported only on Windows Vista and higher.

Related

How to create a single process mutex within C++?

So I'm reading about monitors vs mutexes and finding mentions that suggest that monitors are faster than mutexes because they don't lock system-wide but rather only across the threads of a given process.
Is there some way in C++ to accomplish or simulate this?
Edit: I'm curious now what the difference is between system wide mutex and one restricted to a specific process.
The C++ standard does not define system-wide vs per-process primitives, so it does not specify whether std::mutex is system-wide.
Reasonable implementations have an efficient per-process std::mutex; to get a system-wide mutex you'll need to use libraries or operating system objects for your platform.
The difference is that a per-process mutex can rely on ordinary memory operations to avoid system calls, since the process's memory is shared among its threads. Atomic operations on that memory are more efficient, and they often avoid a system call entirely. A system-wide mutex will either start with system calls (not efficient) or have to use shared memory (which might be unsafe and may still have some overhead).
The answer by @Alex Guteniev is as accurate as one can get (and should be considered the accepted answer). It states that the C++ standard doesn't define a system-wide concept, and that mutexes are for all practical purposes per-process, i.e. for synchronization between threads (execution agents) in a single process (and therefore fit your needs). The C++ standard makes it clear what a thread (std::thread) is (33.3 - ... intended to map one-to-one with OS threads (in my draft, at least...N4687)).
Microsoft post VC2015 has improved their implementation to use windows primitives as stated here. This is also indicated here in the most upvoted answer. I've also looked at the boost library implementations (which often precedes/influences the c++ standard) for microsoft and (AFAICT) it doesn't use any inter-process calls.
So to answer your question: in C++, mutexes and monitors are practically the same thing if this definition is to be considered accurate.
Update, stumbled across the answer to this while researching something related.
On Windows, Critical Sections can be used for single processes instead of system wide mutexes and are often faster:
Edit:
While the above statement is correct, C++ doesn't have the concept of a system-wide mutex. This concept only exists when using OS-specific primitives such as Win32 CreateMutex and is not relevant to standard C++.
Source:
std::mutex performance compared to win32 CRITICAL_SECTION
On Linux, pthread mutexes are per-process by default.

What is the fastest possible solution for concurrent read/write into hash-maps?

I am writing a network service which receives raw packets, converts them, and puts them into a queue. There are also a couple of worker threads that take the converted packets from the queue and, based on some rules, update a hash-map. In order to prevent concurrent updates to the hash-map from different worker threads I have to use a mutex. Unfortunately, using a mutex imposes a big performance hit. I need to find a workaround for this.
EDITED:
The converted packets contain a session_id, which is used as the hash-map key. Before any insertion or update, the session_id is first searched for. If no session_id is found, a new entry is added, and this is exactly where I use the mutex lock. Otherwise, if the session_id already exists, I just update the existing value, and no mutex lock is used for a mere value update. It might help to know that I use boost::unordered_map as the underlying hash-map.
Below is pseudocode of the logic I use:
if hash.find(session_id) then
    hash.update(value)
else
    mutex.lock()
    hash.insert(value)
    mutex.unlock()
end
What is your suggestion?
By the way, this is my working environment and tools:
Compiler: C++(gcc)
Thread library: pthread
OS: Ubuntu 14.04
The fastest solution would be to split the data in a way that each thread uses its own data set, so you would not need any locking at all. Maybe you can get there by distributing the messages among the threads based on some key data.
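One way to read "distributing the messages among the threads based on some key data" is to route each packet by a hash of its session_id, so that a given session is only ever handled by one worker. The sketch below assumes the session id is a string and the stored value is an int. Packet, Worker, and worker_for are hypothetical names, and the per-worker queues that would carry the packets are left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical packet type; only the session id matters for routing.
struct Packet {
    std::string session_id;
    int value;
};

// Picks the worker that owns a session. The same id always maps to the same
// worker, so no two workers ever touch the same map entry.
std::size_t worker_for(const std::string& session_id, std::size_t worker_count) {
    return std::hash<std::string>()(session_id) % worker_count;
}

// Each worker owns a private map and is the only thread that touches it.
struct Worker {
    std::unordered_map<std::string, int> sessions;

    void handle(const Packet& p) {
        sessions[p.session_id] = p.value;  // insert or update, no lock needed
    }
};
```

The receiving thread would call worker_for on each converted packet and push it to that worker's own queue. The maps themselves then need no locking, although each queue still needs its own synchronization.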
Second best solution would be to have a read-write spinlock implemented using either C++11 atomics or GCC's atomic builtins, see https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
A read-write spinlock typically allows multiple parallel read accesses, but only one write access (which of course also blocks all read accesses).
There is also a read-write mutex in Linux, but I found it to be slightly slower than a hand-made implementation.
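A read-write spinlock of the kind described above might look roughly like this with C++11 atomics. This is an illustrative sketch, not the answerer's implementation: the RwSpinLock name is hypothetical, a single counter holds the state, and there is no fairness, so a steady stream of readers can starve a writer.

```cpp
#include <atomic>

// Hypothetical read-write spinlock.
// state_ == 0  : free
// state_ >  0  : that many readers hold the lock
// state_ == -1 : one writer holds the lock
class RwSpinLock {
public:
    bool try_lock_shared() {
        int s = state_.load(std::memory_order_relaxed);
        while (s >= 0) {
            // On failure, s is reloaded with the current value and we retry.
            if (state_.compare_exchange_weak(s, s + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;  // a writer holds the lock
    }

    bool try_lock() {
        int expected = 0;
        return state_.compare_exchange_strong(expected, -1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    void lock_shared() { while (!try_lock_shared()) { /* spin */ } }
    void lock()        { while (!try_lock())        { /* spin */ } }

    void unlock_shared() { state_.fetch_sub(1, std::memory_order_release); }
    void unlock()        { state_.store(0, std::memory_order_release); }

private:
    std::atomic<int> state_{0};
};
```

Spinning only pays off when the protected section is very short, such as a single hash-map lookup or insert. For longer sections a blocking lock is usually the safer choice.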
Have you looked into lock-free data structures? You can refer to an interesting paper from Andrei Alexandrescu and Maged Michael, Lock-Free Data Structures with Hazard Pointers. Some implementations using similar ideas can for instance be found on the libcds Github repository.
Although they use locking to some extent, Facebook's folly AtomicHashMap and Intel's TBB also provide high performance concurrent hash-maps.
Of course these approaches will require a bit of extra reading and integration work, but if you have determined that your current locking strategy is the bottleneck, it may well be worth the cost.

Does Boost have support for Windows EnterCriticalSection API?

I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question - is it possible in Boost to take advantage of this API, and fall back to a spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW. I am agree that mutex is more universal solution from a performance point of view. But to be fair - CS are faster in simple design. I believe that possibility to support them should be at least taken in account.
This was the article that someone pointed me to. The conclusion was that CS are only faster if:
There are less than 8 threads total in the process.
You weren't running in the background.
You weren't on an dual processor machine.
To me this means that simple testing yields good CS performance results, but any real world program is better off with a full blown mutex.
I'm not adverse to supporting a CS implementation. However, I originally chose not to for the following reasons:
You get either construction and destruction hits from using a PIMPL idiom or you must include Windows.h in the Boost.Threads headers, which I simply don't want to do. (This can be worked around by emulating a CS ala OPTEX from the MSDN.)
According to this research paper most programs won't benefit from a CS design.
It's trivial to code a (non-portable) critical_section class that follows the Mutex model if you truly can make use of this.
For now I think I've made the right choice, though down the road we may change the implementation to use a critical section or OPTEX.
Bill Kempf
Speaking as someone who helps out maintaining Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added nor would be added to Boost for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable including being constexpr ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64) and then grab it using a lock_guard around the critical section.
Some may object that a win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a win32 event object which it then waits upon. Nothing special.
As much as you might think critical sections (really user mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big vast heavyweight thing on Windows internally using a win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general purpose use context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst case progression ordering, i.e. whether certain patterns of locks, waits and unlocks produce predictable outcomes. This is why threading facilities can appear to look slow in narrow use case scenarios, so Boost.Thread, much like the STL, can appear to be much slower than hand rolled locking code in, say, an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to kernel sleep on Windows. On POSIX any of the major pthreads implementations also does substantial work to avoid kernel sleeps and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling to load behaviours, though inevitably Boost.Thread v4 especially on Windows does a ton load of work a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows as it can assume Windows Vista or above).
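To illustrate the earlier point about matching the (Basic)Lockable concept with an atomic exchange, here is a minimal sketch. It uses std::atomic and std::lock_guard rather than the Boost equivalents, which is an assumption made for portability. The SpinLock and next_id names are hypothetical. Unlike a real Win32 critical section, it never falls back to a kernel wait object and simply spins.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical spinlock: lock() and unlock() are all that BasicLockable
// requires, so std::lock_guard (or boost::lock_guard) can manage it.
class SpinLock {
public:
    void lock() {
        // exchange returns the previous value; true means someone else holds it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // spin until the holder stores false
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Example critical section: the guard unlocks on every exit path.
int next_id(SpinLock& lock, int& counter) {
    std::lock_guard<SpinLock> guard(lock);
    return ++counter;
}
```

As the answer warns, a lock like this can look fast in an uncontended benchmark while behaving badly under heavy contention, since waiting threads burn CPU instead of sleeping.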
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>

using boost::asio::detail::mutex;
using boost::lock_guard;

int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);
    . . .
}

Multi-threaded one reader and one writer using boost

I'm programming in C++ on Linux. My program uses two threads, where one reads from and one writes to a shared data structure. The data structure is of type boost::bimaps::unordered_set_of.
So my question is whether I need to worry about any synchronization issues, i.e. do I need to protect the reads and writes from the data structure with locks (or something like that)? Or will it work fine without using mutexes?
Thanks.
You should work with the mutex provided by boost
http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_concepts
In common with the standard containers, Boost.Bimap does not provide thread synchronisation. You will have to provide that yourself.

Are STL Map or HashMaps thread safe?

Can I use a map or hashmap in a multithreaded program without needing a lock?
i.e. are they thread safe?
I'm wanting to potentially add and delete from the map at the same time.
There seems to be a lot of conflicting information out there.
By the way, I'm using the STL library that comes with GCC under Ubuntu 10.04
EDIT: Just like the rest of the internet, I seem to be getting conflicting answers?
You can safely perform simultaneous read operations, i.e. call const member functions. But you can't do any simultaneous operations if one of them involves writing, i.e. a call to a non-const member function must have exclusive access to the container and can't be mixed with any other calls.
In other words, you can't change the container from multiple threads, so you need to use a lock or rw-lock to make the access safe.
No.
Honest. No.
edit
Ok, I'll qualify it.
You can have any number of threads reading the same map. This makes sense because reading it doesn't have any side-effects, so it can't matter whether anyone else is also doing it.
However, if you want to write to it, then you need to get exclusive access, which means preventing any other threads from writing or reading until you're done.
Your original question was about adding and removing in parallel. Since these are both writes, the answer to whether they're thread-safe is a simple, unambiguous "no".
TBB is a free open-source library that provides thread-safe associative containers. (http://www.threadingbuildingblocks.org/)
The most commonly used model for STL containers' thread safety is the SGI one:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe.
But in the end it's up to the STL library authors - AFAIK the standard (before C++11) says nothing about STL's thread-safety.
According to the docs, GNU's libstdc++ implementation follows it (as of gcc 3.0+), if a number of conditions are met.
HIH
The answer (like most threading problems) is it will work most of the time. Unfortunately if you catch the map while it's resizing then you're going to end up in trouble. So no.
To get the best performance you'll need a multi stage lock. Firstly a read lock which allows accessors which can't modify the map and which can be held by multiple threads (more than one thread reading items is ok). Secondly a write lock which is exclusive which allows modification of the map in ways that could be unsafe (add, delete etc..).
edit Reader-writer locks are good but whether they're better than standard mutex depends on the usage pattern. I can't recommend either without knowing more. Profile both and see which best fits your needs.
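The two-stage locking described above can be sketched with std::shared_mutex, which assumes a C++17 compiler. On the older GCC mentioned in the question, a pthread read-write lock or boost::shared_mutex would play the same role. SharedMap and its members are hypothetical names for illustration.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical wrapper: readers take a shared lock, writers an exclusive one.
class SharedMap {
public:
    // Many threads may run find() at the same time.
    bool find(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

    // Adding or replacing an entry needs exclusive access.
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        data_[key] = value;
    }

    // Deleting an entry also needs exclusive access.
    bool erase(const std::string& key) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return data_.erase(key) > 0;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> data_;
};
```

Swapping std::shared_mutex for std::mutex (and both lock types for std::lock_guard) gives the plain-mutex variant, which makes it straightforward to profile both as suggested.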