I need several STL containers, threadsafe.
Basically I was thinking I just need 2 methods added to each of the STL container objects,
.lock()
.unlock()
I could also break it into
.lockForReading()
.unlockForReading()
.lockForWriting()
.unlockForWriting()
The way that would work is any number of locks for parallel reading are acceptable, but if there's a lock for writing then reading AND writing are blocked.
An attempt to lock for writing waits until the lockForReading semaphore drops to 0.
Is there a standard way to do this?
Is how I'm planning on doing this wrong or shortsighted?
This is really kind of bad. External code will not recognize or understand your threading semantics, and the ease of availability of aliases to objects in the containers makes them poor thread-safe interfaces.
Thread-safety occurs at design time. You can't solve thread safety by throwing locks at the problem. You solve thread safety by not having two threads writing to the same data at the same time- in the general case, of course. However, it is not the responsibility of a specific object to handle thread safety, except direct threading synchronization primitives.
You can have concurrent containers, designed to allow concurrent use. However, their interfaces are vastly different to what's offered by the Standard containers. Less aliases to objects in the container, for example, and each individual operation is encapsulated.
The standard way to do this is acquire the lock in a constructor, and release it in the destructor. This is more commonly know as Resource Acquisition Is Initialization, or RAII. I strongly suggest you use this methodology rather than
.lock()
.unlock()
Which is not exception safe. You can easily forget to unlock the mutex prior to throwing, resulting in a deadlock the next time a lock is attempted.
There are several synchronization types in the Boost.Thread library that will be useful to you, notably boost::mutex::scoped_lock. Rather than add lock() and unlock() methods to whatever container you wish to access from multiple threads, I suggest you use a boost:mutex or equivalent and instantiate a boost::mutex::scoped_lock whenever accessing the container.
Is there a standard way to do this?
No, and there's a reason for that.
Is how I'm planning on doing this
wrong or shortsighted?
It's not necessarily wrong to want to synchronize access to a single container object, but the interface of the container class is very often the wrong place to put the synchronization (like DeadMG says: object aliases, etc.).
Personally I think both TBB and stuff like concurrent_vector may either be overkill or still the wrong tools for a "simple" synchronization problem.
I find that ofttimes just adding a (private) Lock object (to the class holding the container) and wrapping up the 2 or 3 access patterns to the one container object will suffice and will be much easier to grasp and maintain for others down the road.
Sam: You don't want a .lock() method because something could go awry that prevents calling the .unlock() method at the end of the block, but if .unlock() is called as a consequence of object destruction of a stack allocated variable then any kind of early return from the function that calls .lock() will be guaranteed to free the lock.
DeadMG:
Intel's Threading Building Blocks (open source) may be what you're looking for.
There's also Microsoft's concurrent_vector and concurrent_queue, which already comes with Visual Studio 2010.
Related
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. Is std::vector thread safe? Or do I need to implement the mechanism myself to make it thread safe?
I want to avoid doing extra "locking and freeing" work because I'm a library user rather than a library designer. I hope to look for existing thread-safe solutions for vector. How about boost::vector, which was newly introduced from boost 1.48.0 onward. Is it thread safe?
The C++ standard makes certain threading guarantees for all the classes in the standard C++ library. These guarantees may not be what you'd expect them to be but for all standard C++ library classes certain thread safety guarantees are made. Make sure you read the guarantees made, though, as the threading guarantees of standard C++ containers don't usually align with what you would want them to be. For some classes different, usually stronger, guarantees are made and the answer below specifically applies to the containers. The containers essentially have the following thread-safety guarantees:
there can be multiple concurrent readers of the same container
if there is one writer, there shall be no more writers and no readers
These are typically not what people would want as thread-safety guarantees but are very reasonable given the interface of the standard containers: they are intended to be used efficiently in the absence of multiple accessing threads. Adding any sort of locking for their methods would interfere with this. Beyond this, the interface of the containers isn't really useful for any form of internal locking: generally multiple methods are used and the accesses depend on the outcome of previous accesses. For example, after having checked that a container isn't empty() an element might be accessed. However, with internal locking there is no guarantee that the object is still in the container when it is actually accessed.
To meet the requirements which give the above guarantees you will probably have to use some form of external locking for concurrently accessed containers. I don't know about the boost containers but if they have an interface similar to that of the standard containers I would suspect that they have exactly the same guarantees.
The guarantees and requirements are given in 17.6.4.10 [res.on.objects] paragraph 1:
The behavior of a program is undefined if calls to standard library functions from different threads may introduce a data race. The conditions under which this may occur are specified in 17.6.5.9. [ Note: Modifying an object of a standard library type that is shared between threads risks undefined behavior unless objects of that type are explicitly specified as being sharable without data races or the user supplies a locking mechanism. —endnote]
... and 17.6.5.9 [res.on.data.races]. This section essentially details the more informal description in the not.
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. Is std::vector thread safe?
This is unsafe.
Or do I need to implement the mechanism myself to make it thread safe?
Yes.
I want to avoid doing extra "locking and freeing" work because I'm a library user rather than a library designer. I hope to look for existing thread-safe solutions for vector.
Well, vector's interface isn't optimal for concurrent use. It is fine if the client has access to a lock, but for for the interface to abstract locking for each operation -- no. In fact, vector's interface cannot guarantee thread safety without an external lock (assuming you need operations which also mutate).
How about boost::vector, which was newly introduced from boost 1.48.0 onward. Is it thread safe?
Docs state:
//! boost::container::vector is similar to std::vector but it's compatible
//! with shared memory and memory mapped files.
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. ... I hope to look for existing thread-safe solutions for vector.
Take a look at concurrent_vector in Intel's TBB. Strictly speaking, it's quite different from std::vector internally and is not fully compatible by API, but still might be suitable. You might find some details of its design and functionality in the blogs of TBB developers.
I've been playing around with gtkmm and multi-threaded GUIs and stumbled into the concept of a mutex. From what I've been able to gather, it serves the purpose of locking access to a variable for a single thread in order to avoid concurrency issues. This I understand, seems rather natural, however I still don't get how and when one should use a mutex. I've seen several uses where the mutex is only locked to access particular variables (e.g.like this tutorial). For which type of variables/data should a mutex be used?
PS: Most of the answers I've found on this subject are rather technical, and since I am far from an expert on this I was looking more for a conceptual answer.
If you have data that is accessed from more than a single thread, you probably need a mutex. You usually see something like
theMutex.lock()
do_something_with_data()
theMutex.unlock()
or a better idiom in c++ would be:
{
MutexGuard m(theMutex)
do_something_with_data()
}
where MutexGuard c'tor does the lock() and d'tor does the unlock()
This general rule has a few exceptions
if the data you are using can be accessed in an atomic manner, you don't need a lock. In Visual Studio you have functions like InterlockedIncrement() that do this. gcc has it's own facilities to do this.
If you are accessing the data to only ever read it and never change it, it's usually safe to do without locking. but if even a single thread does any change to the data, all the other threads need to make sure they don't try to read the data while it is being changed. You can also read about Reader-Writer lock for this kind of situations.
variables that are changed among multiple threads. So data that is not modified (immutable) or data that is not shared does not need
While reading up on POSIX threading, I came across an example of thread-specific-data. I did have one area of confusion in my mind...
The thread-specific-data interface looks a little clunky, especially once you mix in having to use pthread_once, the various initializers, etc.
Is there any reason I can't just use a static std::map where the key is the pthread_self() id and the data value is held in the second part of the std::pair?
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex, but I see no suggestion of it or anything similar which confuses me given it sounds much easier than the provided API. I know threading can have alot of catch-22's so I thought I'd ask and see if I was about to step in... something unpleasant? :)
I can't think of a reason that this wouldn't work as long as it was wrapped in a mutex
That in itself is a very good reason; implemented properly, you can access your thread-specific data without preventing other threads from simultaneously creating or accessing theirs.
There's also general efficiency (constant time access, versus logarithmic time if you use std::map), no guarantee that pthread_t has a suitable ordering defined, and automatic cleanup along with all the other thread resources.
You could use C++11's thread_local keyword, or boost::thread_specific_ptr, if you don't like the Posix API.
pthread thread-specific-data existed before the standard library containers
thread-specific-data avoids the need of locking and makes sure no other thread messes with the data
The data is cleaned up automatically when the thread disappears
Having said that, nothing stops you from using your own solution. If you can be sure that the container is completely constructed before any threads are running (static threading model), you don't even need the mutex.
I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
std::map<string, vector<string>> myMap;
How do we guarantee concurrency?
mutex_lock;
write at myMap;
mutex_unlock;
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Liunx, g++, pthreads, no boost, no Intel TBB
The C++03 Standard does not talk about concurrency at all, So the concurrency aspect is left out as an implementation detail for compilers. So the documentation that comes with your compiler is where one should look to for answers related to concurrency.
Most of the STL implementations are not thread safe as such.
Since STL containers do not provide any explicit Thread safety, So yes you will have to use your own synchronization mechanism. And while you are at it You should use RAII rather than manage the synchronization resource(mutex unlock etc) manually.
You can refer the Documentations here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.
The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)
You should take a look at intel thread building blocks tbb ( http://threadingbuildingblocks.org/ ). They have a few very optimized data structures that handle concurrency internally using non-blocking strategies.
Can I use a map or hashmap in a multithreaded program without needing a lock?
i.e. are they thread safe?
I'm wanting to potentially add and delete from the map at the same time.
There seems to be a lot of conflicting information out there.
By the way, I'm using the STL library that comes with GCC under Ubuntu 10.04
EDIT: Just like the rest of the internet, I seem to be getting conflicting answers?
You can safely perform simultaneous read operations, i.e. call const member functions. But you can't do any simultaneous operations if one of then involves writing, i.e. call of non-const member functions should be unique for the container and can't be mixed with any other calls.
i.e. you can't change the container from multiple threads. So you need to use lock/rw-lock
to make the access safe.
No.
Honest. No.
edit
Ok, I'll qualify it.
You can have any number of threads reading the same map. This makes sense because reading it doesn't have any side-effects, so it can't matter whether anyone else is also doing it.
However, if you want to write to it, then you need to get exclusive access, which means preventing any other threads from writing or reading until you're done.
Your original question was about adding and removing in parallel. Since these are both writes, the answer to whether they're thread-safe is a simple, unambiguous "no".
TBB is a free open-source library that provides thread-safe associative containers. (http://www.threadingbuildingblocks.org/)
The most commonly used model for STL containers' thread safety is the SGI one:
The SGI implementation of STL is thread-safe only in the sense that
simultaneous accesses to distinct
containers are safe, and simultaneous
read accesses to to shared containers
are safe.
but in the end it's up to the STL library authors - AFAIK the standard says nothing about STL's thread-safety.
But according to the docs GNU's stdc++ implementation follows it (as of gcc 3.0+), if a number of conditions are met.
HIH
The answer (like most threading problems) is it will work most of the time. Unfortunately if you catch the map while it's resizing then you're going to end up in trouble. So no.
To get the best performance you'll need a multi stage lock. Firstly a read lock which allows accessors which can't modify the map and which can be held by multiple threads (more than one thread reading items is ok). Secondly a write lock which is exclusive which allows modification of the map in ways that could be unsafe (add, delete etc..).
edit Reader-writer locks are good but whether they're better than standard mutex depends on the usage pattern. I can't recommend either without knowing more. Profile both and see which best fits your needs.