I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
std::map<std::string, std::vector<std::string> > myMap;
How do we guarantee concurrency?
mutex_lock;
write at myMap;
mutex_unlock;
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Linux, g++, pthreads, no boost, no Intel TBB
The C++03 Standard does not talk about concurrency at all, so the concurrency aspect is left as an implementation detail for compilers. The documentation that comes with your compiler is therefore the place to look for answers related to concurrency.
Most of the STL implementations are not thread safe as such.
Since STL containers do not provide any explicit thread safety, yes, you will have to use your own synchronization mechanism. While you are at it, you should use RAII rather than manage the synchronization resource (mutex lock and unlock, etc.) manually.
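As a rough illustration of the RAII advice, here is a minimal sketch for the asker's setup (pthreads, no boost). The names ScopedLock, StringMap, myMapMutex, addValue, countValues, worker and runDemo are made up for this example and are not part of any library; only the pthread_* calls are real API.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical RAII guard: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : mutex_(m) { pthread_mutex_lock(&mutex_); }
    ~ScopedLock() { pthread_mutex_unlock(&mutex_); }
private:
    ScopedLock(const ScopedLock&);             // non-copyable (C++03 style)
    ScopedLock& operator=(const ScopedLock&);
    pthread_mutex_t& mutex_;
};

typedef std::map<std::string, std::vector<std::string> > StringMap;

StringMap myMap;
pthread_mutex_t myMapMutex = PTHREAD_MUTEX_INITIALIZER;

// Every access to myMap, reads included, goes through the same mutex.
void addValue(const std::string& key, const std::string& value) {
    ScopedLock lock(myMapMutex);
    myMap[key].push_back(value);
}   // the mutex is released here, even if push_back throws

std::size_t countValues(const std::string& key) {
    ScopedLock lock(myMapMutex);
    StringMap::const_iterator it = myMap.find(key);
    return it == myMap.end() ? 0 : it->second.size();
}

extern "C" void* worker(void*) {
    for (int i = 0; i < 1000; ++i) addValue("key", "value");
    return NULL;
}

// Starts four writer threads and returns how many values ended up under "key".
std::size_t runDemo() {
    pthread_t threads[4];
    for (int i = 0; i < 4; ++i) pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 4; ++i) pthread_join(threads[i], NULL);
    return countValues("key");
}
```

Compile with g++ -pthread. The guard does not make the map any faster or slower than manual lock/unlock calls; its only job is to make sure the unlock cannot be forgotten on an early return or an exception.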
You can refer to the documentation here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.
The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)
You should take a look at Intel Threading Building Blocks (TBB) (http://threadingbuildingblocks.org/). It has a few highly optimized data structures that handle concurrency internally using non-blocking strategies.
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. Is std::vector thread safe? Or do I need to implement the mechanism myself to make it thread safe?
I want to avoid doing extra "locking and freeing" work because I'm a library user rather than a library designer. I hope to look for existing thread-safe solutions for vector. How about boost::vector, which was newly introduced from boost 1.48.0 onward. Is it thread safe?
The C++ standard makes certain threading guarantees for all the classes in the standard C++ library. These guarantees may not be what you'd expect them to be but for all standard C++ library classes certain thread safety guarantees are made. Make sure you read the guarantees made, though, as the threading guarantees of standard C++ containers don't usually align with what you would want them to be. For some classes different, usually stronger, guarantees are made and the answer below specifically applies to the containers. The containers essentially have the following thread-safety guarantees:
there can be multiple concurrent readers of the same container
if there is one writer, there shall be no more writers and no readers
These are typically not what people would want as thread-safety guarantees but are very reasonable given the interface of the standard containers: they are intended to be used efficiently in the absence of multiple accessing threads. Adding any sort of locking for their methods would interfere with this. Beyond this, the interface of the containers isn't really useful for any form of internal locking: generally multiple methods are used and the accesses depend on the outcome of previous accesses. For example, after having checked that a container isn't empty() an element might be accessed. However, with internal locking there is no guarantee that the object is still in the container when it is actually accessed.
To meet the requirements which give the above guarantees you will probably have to use some form of external locking for concurrently accessed containers. I don't know about the boost containers but if they have an interface similar to that of the standard containers I would suspect that they have exactly the same guarantees.
The guarantees and requirements are given in 17.6.4.10 [res.on.objects] paragraph 1:
The behavior of a program is undefined if calls to standard library functions from different threads may introduce a data race. The conditions under which this may occur are specified in 17.6.5.9. [ Note: Modifying an object of a standard library type that is shared between threads risks undefined behavior unless objects of that type are explicitly specified as being sharable without data races or the user supplies a locking mechanism. —end note ]
... and 17.6.5.9 [res.on.data.races]. This section essentially details the more informal description in the note.
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. Is std::vector thread safe?
This is unsafe.
Or do I need to implement the mechanism myself to make it thread safe?
Yes.
I want to avoid doing extra "locking and freeing" work because I'm a library user rather than a library designer. I hope to look for existing thread-safe solutions for vector.
Well, vector's interface isn't optimal for concurrent use. It is fine if the client has access to a lock, but the interface cannot hide the locking inside each individual operation. In fact, vector's interface cannot guarantee thread safety without an external lock (assuming you need operations which also mutate).
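To make the "external lock" point concrete, here is a small sketch assuming a C++11 compiler (std::mutex, std::thread). LockedIntVector and pushFromThreads are hypothetical names invented for this example. Note that it deliberately exposes only two whole operations rather than vector's full interface, since iterators and references handed out to callers could not be protected this way.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: only the operations that are needed are exposed,
// and each one holds the lock for its whole duration.
class LockedIntVector {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(value);
    }
    // Returns a copy, so the caller never touches the shared storage directly.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;
    }
private:
    mutable std::mutex mutex_;
    std::vector<int> data_;
};

// Several threads call push_back() on one shared object; returns the final size.
std::size_t pushFromThreads(int threadCount, int perThread) {
    LockedIntVector shared;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&shared, perThread] {
            for (int i = 0; i < perThread; ++i) shared.push_back(i);
        });
    }
    for (std::thread& th : threads) th.join();
    return shared.snapshot().size();
}
```

Compile with g++ -std=c++11 -pthread. The order of the elements is still whatever the scheduler produces; the lock only guarantees that no push_back is lost or corrupts the vector.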
How about boost::vector, which was newly introduced from boost 1.48.0 onward. Is it thread safe?
Docs state:
//! boost::container::vector is similar to std::vector but it's compatible
//! with shared memory and memory mapped files.
I have multiple threads simultaneously calling push_back() on a shared object of std::vector. ... I hope to look for existing thread-safe solutions for vector.
Take a look at concurrent_vector in Intel's TBB. Strictly speaking, it's quite different from std::vector internally and is not fully compatible by API, but still might be suitable. You might find some details of its design and functionality in the blogs of TBB developers.
I'm currently engaged in a project developed in C++. Recently we have been considering replacing some self-defined thread-safe containers with STL equivalents to gain some efficiency.
However, after looking for a while, I found that the STL provides no thread-safe container at all, which surprised me quite a lot. Is there any reason?
Probably because it wouldn't actually be all that useful (in addition to what @Luchian Grigore says in another answer). Even if individual container operations are thread-safe, you still need to do a lot of work to ensure thread safety. For instance, this simple code contains a race condition even if the container itself is thread-safe:
if (!container.empty())
container.pop();
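The race in the two lines above goes away only when the check and the removal happen under one lock. A minimal sketch of that idea, assuming C++11 and a queue-like container; LockedQueue, try_pop and drainWithTwoThreads are hypothetical names for this example, not an existing library interface:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class LockedQueue {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
    }
    // The emptiness check and the pop form one critical section, so no other
    // thread can remove the element between the two steps.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<T> items_;
};

// Two consumers drain the same queue; returns how many items were popped in total.
int drainWithTwoThreads(int itemCount) {
    LockedQueue<int> q;
    for (int i = 0; i < itemCount; ++i) q.push(i);
    std::atomic<int> popped(0);
    auto consumer = [&q, &popped] {
        int value = 0;
        while (q.try_pop(value)) ++popped;
    };
    std::thread a(consumer), b(consumer);
    a.join();
    b.join();
    return popped.load();
}
```

This is also why a container with per-method internal locking would not help much: the useful unit of locking is the combined operation, and only the caller knows what that is.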
Standard Library containers do provide some basic thread safety, but performance was a more important design goal than safety for the designers of the Standard Library containers.
All Standard Library containers guarantee:
Multiple concurrent reads from the same container are safe, but
If there is at least one writer thread, there is no thread safety, and there shall not be any other writer or reader.
The Standard Library containers were primarily designed to work efficiently in single-threaded environments, and providing only basic thread safety is a way to ensure full performance for containers that do not need concurrent access.
Basic thread safety means that users need some form of synchronization, such as mutexes or locks, to avoid race conditions. Locking or other forms of synchronization are typically expensive and hence need to be avoided when not necessary.
Also, given the interfaces exposed by the Standard Library containers, it is easy for the client or user of the container to provide the necessary locking by wrapping the underlying container operations with a lock acquisition and release if the intended use is in multi-threaded environments.
Note that all implementations conform to the following requirements specified by the C++ Standard:
17.6.3.10 Shared objects and the library [res.on.objects]
The behavior of a program is undefined if calls to standard library functions from different threads may introduce a data race. The conditions under which this may occur are specified in 17.6.4.8. [ Note: Modifying an object of a standard library type that is shared between threads risks undefined behavior unless objects of that type are explicitly specified as being sharable without data races or the user supplies a locking mechanism. —end note ]
Because thread safety is highly platform and compiler specific.
The C++ STL provides the kind of thread-safety that pretty much everything else provides: You can safely use STL containers from multiple threads so long as an object isn't accessed in one thread while another thread is, or might be, modifying it.
In a sentence - because it's hard.
Because thread-safe containers require specific design - e.g. they must be persistent data structures. Such containers are easiest to implement in functional / garbage collected / event-based environments. Which C++ is not.
That is to say, implementing these would still require the user to handle all resource allocation/deallocation. That kind of defeats the point of having a collection.
I'm using Android 2.2, which comes with a version of STLport. For some reason, it was configured to be non-thread safe. This was done using a #define _NOTHREADS in a configuration header file.
When I constructed and initialized distinct non-shared containers (e.g. strings) from different pthreads, I was getting memory corruption.
With _NOTHREADS, it looks like some low-level code in STL inside allocator.cpp doesn't do proper locking. It seems analogous to C not providing thread safety for malloc.
Does anyone know why STL might be built with _NOTHREADS by default on Android? By turning this off, I'm wondering if there may be a side effect. One thing I can think of is slightly degraded performance, but I don't see much of a choice given I'm using lots of threading.
The SGI STL
The SGI STL is the grandmother of all of the other STL implementations.
See the SGI STL docs.
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
G++
libstdc++ docs
We currently use the SGI STL definition of thread safety.
STLPort
STLPort docs
Please refer to SGI site for detailed document on thread safety. Basic points are:
simultaneous read access to the same container from within separate threads is safe;
simultaneous access to distinct containers (not shared between threads) is safe;
user must provide synchronization for all accesses if any thread may modify shared container.
General Information about the C++ Standard
The current C++ standard doesn't address concurrency issues at all, so at least for now there's no requirement that applies to all implementations.
A meaningful answer can only really apply to a specific implementation (STLPort, in this case). STLPort is basically a version of the original SGI STL implementation with improvements to its portability, so you'd probably want to start with the documentation about thread safety in the original SGI version.
Of course, the reason it is built without re-entrancy is performance: speed, less memory use, and less resource use. Presumably the assumption is that most client programs won't be multi-threaded.
Sun WorkShop 5.0
This is a bit old but quite informative. The bottom line is that STL only provides locks on allocators.
Strings, however, are reference-counted objects, and any change to their reference count is done atomically. This only holds when passing strings around by value. Two threads holding the same reference to a single string object will need to do their own locking.
When you use std::string or similar objects and change them from different threads, you are sharing the same object between threads. For a member function of string to be reentrant, no other thread could affect the call, and the call could not affect any calls in other threads. The truth is exactly the opposite, since you are sharing the same object via the this pointer, which is passed implicitly when calling member functions. To illustrate, the equivalent of this call
std::string a;
a.insert( ... );
without using OOP syntax would be:
std::string a;
insert( &a, ... );
So, you are implicitly violating requirement that no resource is shared between function calls. You can see more here.
Hope this helps.
Can I use a map or hashmap in a multithreaded program without needing a lock?
i.e. are they thread safe?
I'm wanting to potentially add and delete from the map at the same time.
There seems to be a lot of conflicting information out there.
By the way, I'm using the STL library that comes with GCC under Ubuntu 10.04
EDIT: Just like the rest of the internet, I seem to be getting conflicting answers?
You can safely perform simultaneous read operations, i.e. call const member functions. But you can't do any simultaneous operations if one of them involves writing, i.e. a call to a non-const member function must have exclusive access to the container and can't be mixed with any other calls.
In other words, you can't change the container from multiple threads, so you need to use a lock or a reader-writer lock to make the access safe.
No.
Honest. No.
edit
Ok, I'll qualify it.
You can have any number of threads reading the same map. This makes sense because reading it doesn't have any side-effects, so it can't matter whether anyone else is also doing it.
However, if you want to write to it, then you need to get exclusive access, which means preventing any other threads from writing or reading until you're done.
Your original question was about adding and removing in parallel. Since these are both writes, the answer to whether they're thread-safe is a simple, unambiguous "no".
TBB is a free open-source library that provides thread-safe associative containers. (http://www.threadingbuildingblocks.org/)
The most commonly used model for STL containers' thread safety is the SGI one:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe.
but in the end it's up to the STL library authors - AFAIK the standard says nothing about STL's thread-safety.
But according to the docs GNU's stdc++ implementation follows it (as of gcc 3.0+), if a number of conditions are met.
HIH
The answer (like most threading problems) is it will work most of the time. Unfortunately if you catch the map while it's resizing then you're going to end up in trouble. So no.
To get the best performance you'll need a multi stage lock. Firstly a read lock which allows accessors which can't modify the map and which can be held by multiple threads (more than one thread reading items is ok). Secondly a write lock which is exclusive which allows modification of the map in ways that could be unsafe (add, delete etc..).
edit Reader-writer locks are good but whether they're better than standard mutex depends on the usage pattern. I can't recommend either without knowing more. Profile both and see which best fits your needs.
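For reference, the read lock / write lock split described above could look roughly like this with a POSIX pthread_rwlock_t. This is only a sketch: ReadGuard, WriteGuard, SharedMap, writer, reader and runRwDemo are hypothetical names, and the map uses int keys just to keep the example short.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical RAII guards for a pthread reader-writer lock.
class ReadGuard {
public:
    explicit ReadGuard(pthread_rwlock_t& l) : lock_(l) { pthread_rwlock_rdlock(&lock_); }
    ~ReadGuard() { pthread_rwlock_unlock(&lock_); }
private:
    ReadGuard(const ReadGuard&);
    ReadGuard& operator=(const ReadGuard&);
    pthread_rwlock_t& lock_;
};

class WriteGuard {
public:
    explicit WriteGuard(pthread_rwlock_t& l) : lock_(l) { pthread_rwlock_wrlock(&lock_); }
    ~WriteGuard() { pthread_rwlock_unlock(&lock_); }
private:
    WriteGuard(const WriteGuard&);
    WriteGuard& operator=(const WriteGuard&);
    pthread_rwlock_t& lock_;
};

class SharedMap {
public:
    SharedMap() { pthread_rwlock_init(&lock_, NULL); }
    ~SharedMap() { pthread_rwlock_destroy(&lock_); }

    // Writers (add, delete) take the exclusive lock.
    void set(int key, int value) { WriteGuard g(lock_); data_[key] = value; }
    void erase(int key) { WriteGuard g(lock_); data_.erase(key); }

    // Readers take the shared lock; several may hold it at the same time.
    bool get(int key, int& out) const {
        ReadGuard g(lock_);
        std::map<int, int>::const_iterator it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
    std::size_t size() const { ReadGuard g(lock_); return data_.size(); }

private:
    SharedMap(const SharedMap&);
    SharedMap& operator=(const SharedMap&);
    mutable pthread_rwlock_t lock_;
    std::map<int, int> data_;
};

extern "C" void* writer(void* arg) {
    SharedMap* m = static_cast<SharedMap*>(arg);
    for (int i = 0; i < 1000; ++i) m->set(i, i * 2);     // add 1000 keys
    for (int i = 0; i < 1000; i += 2) m->erase(i);       // delete the even ones
    return NULL;
}

extern "C" void* reader(void* arg) {
    SharedMap* m = static_cast<SharedMap*>(arg);
    int value = 0;
    for (int i = 0; i < 1000; ++i) m->get(i, value);     // may or may not find the key
    return NULL;
}

// One writer and two readers run concurrently; returns the final map size.
std::size_t runRwDemo() {
    SharedMap m;
    pthread_t w, r1, r2;
    pthread_create(&w, NULL, writer, &m);
    pthread_create(&r1, NULL, reader, &m);
    pthread_create(&r2, NULL, reader, &m);
    pthread_join(w, NULL);
    pthread_join(r1, NULL);
    pthread_join(r2, NULL);
    return m.size();
}
```

Swapping pthread_rwlock_t for a plain pthread_mutex_t only changes the two guard classes, which makes it fairly easy to profile both variants as suggested.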
I have two questions about STL
1) why STL is not thread-safe? Is there any structure that is thread-safe?
2) How to debug STL using GDB? In GDB, how can I print a vector?
Container data structures almost always require synchronization (e.g. a mutex) to prevent race conditions. Since threading is not supported by the C++ standard (pre C++0x), synchronization could not be added to the STL. Also, synchronization is very expensive for cases where it is not needed. STL containers may be used in multi-threaded applications as long as you perform this synchronization manually. Alternatively, you may create your own thread-safe containers that are compatible with STL algorithms like this thread-safe circular queue.
A vector contains a contiguous block of memory. So, it can be displayed in the same way as a regular array once you find the pointer to this memory block. The exact details depend on the STL implementation you use.
The standard c++ containers are not thread safe because you most likely actually want higher level locking than just the containers themselves. In other words you are likely to want two or more operations to be safe together.
For example, if you have multiple threads running:
v.push_back(0);
v.push_back(1);
You won't get a nice vector of alternating 0s and 1s; they could be jumbled. You would need to lock around both commands to get what you want.
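A small sketch of "locking around both commands", assuming C++11; vMutex, appendPair and pairsStayTogether are hypothetical names used only here:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> v;
std::mutex vMutex;

// Both push_back calls sit inside one critical section, so no other
// thread can slip an element in between the 0 and the 1.
void appendPair() {
    std::lock_guard<std::mutex> lock(vMutex);
    v.push_back(0);
    v.push_back(1);
}

// Runs four threads that each append 500 pairs, then checks that the
// vector is a strict alternation 0, 1, 0, 1, ...
bool pairsStayTogether() {
    v.clear();
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([] {
            for (int i = 0; i < 500; ++i) appendPair();
        });
    }
    for (std::thread& th : threads) th.join();

    if (v.size() != 4000) return false;
    for (std::size_t i = 0; i < v.size(); i += 2) {
        if (v[i] != 0 || v[i + 1] != 1) return false;
    }
    return true;
}
```

A vector whose push_back locked internally could not give this guarantee, because the lock would be released between the two calls.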
STL is not thread-safe because a lot of people don't need thread safety, and because that introduces threading context into classes that otherwise have no need to know anything about the concept of threads.
You can encapsulate access to containers and provide your own thread safety (or other restrictions imposed by your specific design and implementation.)
Because there are still single-threaded programs.
Take a look here.