Soon I'll start working on a parallel version of a mesh refinement algorithm using shared memory.
A professor at the university pointed out that we have to be very careful about thread safety because neither the compiler nor the STL is thread-aware.
I searched for this question and the answer depended on the compiler (some try to be somewhat thread-aware) and the platform (whether the system calls used by the compiler are thread-safe or not).
So, on Linux, does the GCC 4 compiler produce thread-safe code for the new operator?
If not, what is the best way to overcome this problem? Maybe lock each call to the new operator?
You will have to look very hard to find a platform that supports threads but doesn't have a thread safe new. In fact, the thread safety of new (and malloc) is one of the reasons it's so slow.
If you want a thread safe STL on the other hand, you may consider Intel TBB which has thread aware containers (although not all operations on them are thread safe).
Generally the new operator is thread safe. Thread safety guarantees for calls into the STL and the standard library, however, are governed by the standard. That doesn't mean they are thread unaware: they tend to have very well defined guarantees of thread safety for certain operations. For example, iterating through a list in a read-only fashion is thread safe for multiple readers, while iterating through a list and making updates is not. You have to read the documentation and see what the various guarantees are, although they aren't that onerous and they tend to make sense.
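To make the multiple-readers case concrete, here is a minimal sketch. It assumes C++11 std::thread, and the function name sum_concurrently is made up for the illustration. Every thread only reads the list, and each one writes to its own slot of the result vector, so no lock is taken.

```cpp
#include <list>
#include <numeric>
#include <thread>
#include <vector>

// Sums a list that no thread modifies while the readers run.
// Each reader writes only to its own element of `sums`, so no lock is needed.
std::vector<long> sum_concurrently(const std::list<int>& values, unsigned readers) {
    std::vector<long> sums(readers, 0);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < readers; ++i) {
        threads.emplace_back([&values, &sums, i] {
            sums[i] = std::accumulate(values.begin(), values.end(), 0L);
        });
    }
    for (std::thread& t : threads) t.join();
    return sums;
}
```

If any of those threads also inserted into or erased from the list, the same code would need external synchronization around every access.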
While I'm talking about concepts I have not used, I feel I should mention that if you're using shared memory, then you likely want to ensure that you use only POD types, and to use placement new.
Secondly, if you're using shared memory as it is commonly understood on Linux systems, then you may be using multiple processes, not threads, to allocate memory and 'do stuff', with shared memory as a communication layer. If this is the case, then the thread safety of your application and libraries is not important; what is important, however, is the thread safety of anything using the shared memory allocation! This is a different situation from running one process with many threads, in which case asking about the thread safety of the new operator IS a valid concern, and could be addressed by placement new if it is not thread safe, or by defining your own allocators.
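As a rough illustration of the POD plus placement new idea, the sketch below constructs a plain struct inside caller-provided storage. The Vertex type and make_vertex_in are hypothetical names. In a real multi-process setup the storage would come from a shared memory mapping; an ordinary buffer can stand in for it.

```cpp
#include <new>

// Hypothetical POD record: fixed-size fields only, no pointers or virtuals.
struct Vertex {
    double x;
    double y;
    int id;
};

// Constructs a Vertex inside caller-provided storage instead of on the heap.
// `storage` must be suitably aligned and at least sizeof(Vertex) bytes long.
Vertex* make_vertex_in(void* storage, double x, double y, int id) {
    return new (storage) Vertex{x, y, id};
}
```

Placement new only decides where the object lives. It does nothing to synchronize access to that memory, so concurrent writers still need a lock or similar.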
Well, this is not a definitive answer to my question, just that I found out that Google implemented a high-performance multi-threaded malloc.
So, if you're in doubt of whether your implementation is thread safe, maybe you should use the Google Performance Tools.
Thread-safe or thread-compatible code is good.
However there are cases in which one could implement things differently (more simply or more efficiently) if one knows that the program will not be using threads.
For example, I once heard that things like std::shared_ptr could use different implementations to optimize the non-threaded case (but I can't find a reference).
I think historically std::string in some implementations could use copy-on-write in non-threaded code.
I am neither for nor against these techniques, but I would like to know if there is a way (at least a nominal way) to determine at compile time whether the code is being compiled with the intention of using threads.
The closest I could get is to realize that threaded code is usually (?) compiled with the -pthread (not -lpthread) compiler option.
(Not sure if it is a hard requirement or just recommended.)
In turn, -pthread defines some macros, like _REENTRANT or _THREAD_SAFE, at least in gcc and clang.
In some answers on SO, I also read that they are obsolete.
Are these macros the right way to determine if the program is intended to be used with threads (e.g. threads launched from that same program)? Are there other mechanisms to detect this at compile time? How reliable would the detection method be?
EDIT: since the question can be applied to many contexts apparently, let me give a concrete case:
I am writing a header-only library that uses another third-party library inside. I would like to know if I should initialize that library to be thread-safe (or at least to give a certain level of thread support). If I assume the maximum level of thread support but the user of the library does not use threads, then there is a cost paid for nothing. Since the third-party library is an implementation detail, I thought I could make a decision about the level of thread safety requested based on a guess.
EDIT2 (2021): By chance I found this historical (but influential) library Blitz++ which in the documentation says (emphasis mine)
8.1 Blitz++ and thread safety
To enable thread-safety in Blitz++, you need to do one of these
things:
Compile with gcc -pthread, or CC -mt under Solaris. (These options define _REENTRANT, which tells Blitz++ to generate thread-safe code).
Compile with -DBZ_THREADSAFE, or #define BZ_THREADSAFE before including any Blitz++ headers.
In threadsafe mode, Blitz++ array reference counts are safeguarded by
a mutex. By default, pthread mutexes are used. If you would prefer a
different mutex implementation, add the appropriate BZ_MUTEX macros to
<blitz/blitz.h> and send them to blitz-dev#oonumerics.org for
incorporation. Blitz++ does not do locking for every array element
access; this would result in terrible performance. It is the job of
the library user to ensure that appropriate synchronization is used.
So it seems that at some point _REENTRANT was used as a clue for the need of multi-threading code.
Maybe it is too old a reference to take seriously.
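One nominal way to act on those macros could be sketched as follows. This is only a guess-based hint. MYLIB_THREADS is a hypothetical override macro that a user could define as 0 or 1, and ThreadLevel stands in for whatever thread-support levels the wrapped library really offers.

```cpp
// Assumption: the toolchain defines _REENTRANT (or _THREAD_SAFE) when threads
// are requested, e.g. via -pthread. MYLIB_THREADS is a hypothetical override
// that must be defined as 0 or 1 before this header is included.
#if defined(MYLIB_THREADS)
constexpr bool mylib_threads_hint = (MYLIB_THREADS != 0);
#elif defined(_REENTRANT) || defined(_THREAD_SAFE)
constexpr bool mylib_threads_hint = true;
#else
constexpr bool mylib_threads_hint = false;
#endif

// Hypothetical thread-support levels for the wrapped third-party library.
enum class ThreadLevel { Single, Multiple };

// Maps the compile-time hint to the level that would be requested.
constexpr ThreadLevel requested_level() {
    return mylib_threads_hint ? ThreadLevel::Multiple : ThreadLevel::Single;
}
```

The explicit override matters because the macros are only a clue: a program can use threads without them being defined, and the reverse.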
I support the other answer in that thread-safety decisions ideally should not be made on a whole-program basis; rather, they should be made for specific areas.
Note that boost::shared_ptr has thread-unsafe version called boost::local_shared_ptr. boost::intrusive_ptr has safe and unsafe counter implementation.
Some libraries use the "null mutex" pattern, that is, a mutex which does nothing on lock / unlock. See boost or Intel TBB null_mutex, or ATL CComFakeCriticalSection. This exists specifically so that a real mutex can be substituted for thread-safe code, and a fake one for thread-unsafe code.
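A minimal sketch of the null mutex pattern, assuming C++11. NullMutex and Counter are names invented for this example, not the Boost, TBB or ATL types mentioned above.

```cpp
#include <mutex>

// A mutex whose operations do nothing. It offers lock()/unlock(), which is
// all std::lock_guard requires of its mutex type.
struct NullMutex {
    void lock() {}
    void unlock() {}
};

// Hypothetical counter parameterised on its mutex type.
template <class Mutex>
class Counter {
public:
    void increment() {
        std::lock_guard<Mutex> guard(mutex_);
        ++value_;
    }
    long value() const {
        std::lock_guard<Mutex> guard(mutex_);
        return value_;
    }
private:
    mutable Mutex mutex_;
    long value_ = 0;
};

using SafeCounter = Counter<std::mutex>;  // real locking
using FastCounter = Counter<NullMutex>;   // locking compiles away to nothing
```

The calling code is identical in both cases; only the type alias decides whether locking is paid for.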
Moreover, it sometimes makes sense to use the same objects in a thread-safe or thread-unsafe way, depending on the current phase of execution. There's also atomic_ref, which provides thread-safe access to an underlying object while still allowing thread-unsafe access to it at other times.
I know a good example of runtime switches between thread-safe and thread-unsafe. See HeapCreate with HEAP_NO_SERIALIZE, and HeapAlloc with HEAP_NO_SERIALIZE.
I know also a questionable example of the same. Delphi recommends calling its BeginThread wrapper instead of CreateThread API function. The wrapper sets a global variable telling that from now on Delphi Memory Manager should be thread-safe. Not sure if this behavior is still in place, but it was there for Delphi 7.
Fun fact: in Windows 10, there are virtually no single-threaded programs. Before the first statement in main is executed, static DLL dependencies are loaded. Current Windows versions parallelize this DLL loading where possible by using a thread pool. Once the program is loaded, the thread pool threads wait for other tasks that could be issued through Windows API calls or std::async. Of course, if the program itself does not use threads or TLS, it will not notice, but technically it is multi-threaded from the OS perspective.
How confident would the detection method be?
Not really. Even if you can unambiguously detect if code is compiled to be used with multiple threads, not everything must be thread safe.
Making everything thread-safe by default, even though it is only ever used by a single thread, would defeat the purpose of your approach. You need more fine-grained control to turn thread safety on and off if you do not want to pay for what you do not use.
If you have class that has a thread-safe and a non-thread-safe version then you could use a template parameter
template <bool isThreadSafe> class Foo;
and let the user decide on a case-by-case basis.
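For instance, a reference counter could pick its storage type from the flag. This is only a sketch, assuming C++11; RefCount is a made-up class, not taken from any library.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical reference counter: uses std::atomic<long> when isThreadSafe is
// true and a plain long otherwise.
template <bool isThreadSafe>
class RefCount {
    using Storage =
        typename std::conditional<isThreadSafe, std::atomic<long>, long>::type;
public:
    RefCount() : count_(1) {}
    void acquire() { ++count_; }
    // Returns true when the last reference has been released.
    bool release() { return --count_ == 0; }
    long get() const { return count_; }
private:
    Storage count_;
};
```

A user would then write RefCount<true> for objects shared between threads and RefCount<false> for objects that stay on one thread.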
I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
std::map<string, vector<string>> myMap;
How do we guarantee concurrency?
mutex_lock;
write at myMap;
mutex_unlock;
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Linux, g++, pthreads, no boost, no Intel TBB
The C++03 Standard does not talk about concurrency at all, so the concurrency aspect is left as an implementation detail for compilers. The documentation that comes with your compiler is therefore where you should look for answers related to concurrency.
Most of the STL implementations are not thread safe as such.
Since STL containers do not provide any explicit thread safety, yes, you will have to use your own synchronization mechanism. And while you are at it, you should use RAII rather than managing the synchronization resource (mutex unlock etc.) manually.
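With plain pthreads and no Boost, the RAII idea could look roughly like this. ScopedLock, myMapMutex, append_value and count_values are names made up for the sketch; only the pthread_mutex_* calls and PTHREAD_MUTEX_INITIALIZER come from the pthreads API.

```cpp
#include <cstddef>
#include <map>
#include <pthread.h>
#include <string>
#include <vector>

// Minimal RAII guard: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : mutex_(m) { pthread_mutex_lock(&mutex_); }
    ~ScopedLock() { pthread_mutex_unlock(&mutex_); }
private:
    ScopedLock(const ScopedLock&);             // non-copyable (C++03 style)
    ScopedLock& operator=(const ScopedLock&);
    pthread_mutex_t& mutex_;
};

std::map<std::string, std::vector<std::string> > myMap;
pthread_mutex_t myMapMutex = PTHREAD_MUTEX_INITIALIZER;

// Every access to myMap, read or write, goes through the same mutex.
void append_value(const std::string& key, const std::string& value) {
    ScopedLock lock(myMapMutex);
    myMap[key].push_back(value);
}

std::size_t count_values(const std::string& key) {
    ScopedLock lock(myMapMutex);
    std::map<std::string, std::vector<std::string> >::const_iterator it = myMap.find(key);
    return it == myMap.end() ? 0 : it->second.size();
}
```

Because the unlock happens in the destructor, an early return or an exception inside either function cannot leave the mutex locked.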
You can refer to the documentation here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.
The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)
You should take a look at Intel Threading Building Blocks (TBB) ( http://threadingbuildingblocks.org/ ). It has a few highly optimized data structures that handle concurrency internally using non-blocking strategies.
First of all, I'm fairly experienced with C++ and understand the basics of threading and thread synchronization. I also want to write a custom memory allocator as a pet project of mine and have read that they should be thread-safe.
I understand what the term "thread-safe" means, but I have no idea on how to make C++ code thread-safe.
Are there any practical examples or tutorials on how to make code thread-safe?
In a memory allocator scenario, is it essentially ensuring that all mutating functions are marked as critical sections? Or is there something more to it?
Same as all threading issues: make sure that when one thread is changing something, no other thread is accessing it. For a memory allocation system, I would imagine you would need a way of making sure you don't allocate the same block of memory to 2 threads at the same time. Whether that is by wrapping the entire search, or by allowing multiple searches but locking when the allocation table is to be updated (which could then cause the result of the search to become invalid, necessitating another search) would be up to you.
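A minimal sketch of the "wrap the entire search" option, assuming C++11 and a fixed-size block pool. BlockPool is a hypothetical class; a real allocator would also have to deal with alignment and variable sizes, which this ignores.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical fixed-size block pool. The free list is the shared state, so
// allocate() and deallocate() both take the same mutex.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : storage_(block_size * block_count) {
        free_.reserve(block_count);
        for (std::size_t i = 0; i < block_count; ++i)
            free_.push_back(storage_.data() + i * block_size);
    }
    // Returns nullptr when the pool is exhausted.
    void* allocate() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (free_.empty()) return nullptr;
        void* block = free_.back();
        free_.pop_back();
        return block;
    }
    void deallocate(void* block) {
        std::lock_guard<std::mutex> guard(mutex_);
        free_.push_back(static_cast<unsigned char*>(block));
    }
private:
    std::mutex mutex_;
    std::vector<unsigned char> storage_;  // backing memory for all blocks
    std::vector<unsigned char*> free_;    // blocks currently available
};
```

Since a block is removed from the free list while the lock is held, two threads cannot be handed the same block.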
I need several STL containers, threadsafe.
Basically I was thinking I just need 2 methods added to each of the STL container objects,
.lock()
.unlock()
I could also break it into
.lockForReading()
.unlockForReading()
.lockForWriting()
.unlockForWriting()
The way that would work is any number of locks for parallel reading are acceptable, but if there's a lock for writing then reading AND writing are blocked.
An attempt to lock for writing waits until the lockForReading semaphore drops to 0.
Is there a standard way to do this?
Is how I'm planning on doing this wrong or shortsighted?
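For reference, the reader/writer scheme described above can be sketched with std::shared_timed_mutex and std::shared_lock, assuming a C++14 compiler (these may not have been available when this was asked). SharedTable is a hypothetical wrapper, and the lock lives next to the container rather than inside it.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical wrapper: any number of concurrent readers, one exclusive writer.
class SharedTable {
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);  // exclusive
        data_[key] = value;
    }
    // Copies the value out so that no reference into the map escapes the lock.
    bool get(const std::string& key, int& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);  // shared
        std::map<std::string, int>::const_iterator it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }
private:
    mutable std::shared_timed_mutex mutex_;
    std::map<std::string, int> data_;
};
```

Both lock types release in their destructors, so there are no explicit unlockForReading() or unlockForWriting() calls to forget.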
This is really kind of bad. External code will not recognize or understand your threading semantics, and the ease of availability of aliases to objects in the containers makes them poor thread-safe interfaces.
Thread-safety occurs at design time. You can't solve thread safety by throwing locks at the problem. You solve thread safety by not having two threads writing to the same data at the same time- in the general case, of course. However, it is not the responsibility of a specific object to handle thread safety, except direct threading synchronization primitives.
You can have concurrent containers, designed to allow concurrent use. However, their interfaces are vastly different from what's offered by the Standard containers: fewer aliases to objects in the container, for example, and each individual operation is encapsulated.
The standard way to do this is to acquire the lock in a constructor and release it in the destructor. This is more commonly known as Resource Acquisition Is Initialization, or RAII. I strongly suggest you use this methodology rather than
.lock()
.unlock()
which is not exception safe. You can easily forget to unlock the mutex prior to throwing, resulting in a deadlock the next time a lock is attempted.
There are several synchronization types in the Boost.Thread library that will be useful to you, notably boost::mutex::scoped_lock. Rather than add lock() and unlock() methods to whatever container you wish to access from multiple threads, I suggest you use a boost::mutex or equivalent and instantiate a boost::mutex::scoped_lock whenever accessing the container.
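The exception-safety point can be shown with a small sketch. It uses std::mutex and std::lock_guard from C++11 as stand-ins for the Boost types, and items, items_mutex and add_positive are invented for the example.

```cpp
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex items_mutex;
std::vector<int> items;

// May throw while the lock is held; the guard's destructor still unlocks.
void add_positive(int value) {
    std::lock_guard<std::mutex> guard(items_mutex);
    if (value <= 0) throw std::invalid_argument("value must be positive");
    items.push_back(value);
}
```

With manual lock() and unlock() calls, the throw would skip the unlock and the next caller would block forever.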
Is there a standard way to do this?
No, and there's a reason for that.
Is how I'm planning on doing this wrong or shortsighted?
It's not necessarily wrong to want to synchronize access to a single container object, but the interface of the container class is very often the wrong place to put the synchronization (like DeadMG says: object aliases, etc.).
Personally I think both TBB and stuff like concurrent_vector may either be overkill or still the wrong tools for a "simple" synchronization problem.
I find that ofttimes just adding a (private) Lock object (to the class holding the container) and wrapping up the 2 or 3 access patterns to the one container object will suffice and will be much easier to grasp and maintain for others down the road.
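Something along these lines, as a sketch assuming C++11. MessageLog is a made-up class; the point is only that the lock and the container are both private and callers see two or three operations.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical owner of a container: callers never touch the vector itself,
// only the few operations the class chooses to expose.
class MessageLog {
public:
    void add(const std::string& message) {
        std::lock_guard<std::mutex> guard(lock_);
        messages_.push_back(message);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(lock_);
        return messages_.size();
    }
    // Returns a copy, so no alias into the container outlives the lock.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> guard(lock_);
        return messages_;
    }
private:
    mutable std::mutex lock_;
    std::vector<std::string> messages_;
};
```

Returning copies costs something, but it avoids handing out iterators or references that another thread could invalidate.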
Sam: You don't want a .lock() method because something could go awry that prevents calling the .unlock() method at the end of the block, but if .unlock() is called as a consequence of object destruction of a stack allocated variable then any kind of early return from the function that calls .lock() will be guaranteed to free the lock.
DeadMG:
Intel's Threading Building Blocks (open source) may be what you're looking for.
There are also Microsoft's concurrent_vector and concurrent_queue, which already come with Visual Studio 2010.
I'm using Android 2.2, which comes with a version of STLport. For some reason, it was configured to be non-thread safe. This was done using a #define _NOTHREADS in a configuration header file.
When I constructed and initialized distinct non-shared containers (e.g. strings) from different pthreads, I was getting memory corruption.
With _NOTHREADS, it looks like some low-level code in STL inside allocator.cpp doesn't do proper locking. It seems analogous to C not providing thread safety for malloc.
Does anyone know why STL might be built with _NOTHREADS by default on Android? By turning this off, I'm wondering if there may be a side effect. One thing I can think of is slightly degraded performance, but I don't see much of a choice given I'm using lots of threading.
The SGI STL
The SGI STL is the grandmother of all of the other STL implementations.
See the SGI STL docs.
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
G++
libstdc++ docs
We currently use the SGI STL definition of thread safety.
STLPort
STLPort docs
Please refer to SGI site for detailed document on thread safety. Basic points are:
simultaneous read access to the same container from within separate threads is safe;
simultaneous access to distinct containers (not shared between threads) is safe;
user must provide synchronization for all accesses if any thread may modify shared container.
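The "distinct containers" rule is the one the question depends on. A minimal sketch, assuming C++11 std::thread instead of raw pthreads and a made-up function name build_per_thread: each thread fills only its own vector, and no locks are used.

```cpp
#include <string>
#include <thread>
#include <vector>

// Each thread builds its own container; nothing is shared, so no lock is taken.
std::vector<std::vector<std::string>> build_per_thread(unsigned threads, unsigned items) {
    std::vector<std::vector<std::string>> results(threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&results, t, items] {
            std::vector<std::string>& mine = results[t];  // touched by this thread only
            for (unsigned i = 0; i < items; ++i)
                mine.push_back("t" + std::to_string(t) + "-" + std::to_string(i));
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

This is safe only if the allocator underneath the containers is itself thread-safe, which is the part a build with _NOTHREADS appears to drop.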
General Information about the C++ Standard
The current C++ standard doesn't address concurrency issues at all, so at least for now there's no requirement that applies to all implementations.
A meaningful answer can only really apply to a specific implementation (STLPort, in this case). STLPort is basically a version of the original SGI STL implementation with improvements to its portability, so you'd probably want to start with the documentation about thread safety in the original SGI version.
Of course, the reason it is built without re-entrancy is performance: speed, lower memory use, and lower resource use. Presumably the assumption is that most client programs won't be multi-threaded.
Sun WorkShop 5.0
This is a bit old but quite informative. The bottom line is that STL only provides locks on allocators.
Strings, however, are reference-counted objects, and any changes to their reference count are done atomically. This is only true, however, when passing strings around by value. Two threads holding the same reference to a single string object will need to do their own locking.
When you use e.g. std::string or similar objects and change them from different threads, you are sharing the same object between threads. For a member function of string to be reentrant, no other thread could affect the call, and the call could not affect any calls in other threads. The truth is exactly the opposite, since you are sharing the same object via the this pointer, which is passed implicitly when calling member functions. To illustrate, the equivalent of this call
std::string a;
a.insert( ... );
without using OOP syntax would be:
std::string a;
insert( &a, ... );
So, you are implicitly violating the requirement that no resource is shared between function calls. You can see more here.
Hope this helps.