Thread safety of google/dense_hash_map

Are reading operations in the dense_hash_map thread safe?

A const C++ object of reentrant type (most are) is generally assumed to be thread-safe.
The documentation of dense_hash_map doesn't specify anything regarding thread safety, so the most defensive approach would be to assume it isn't even reentrant. It takes unprotected global mutable state to make a class non-reentrant, though, and it's hard to see why dense_hash_map would need any, so reentrancy is probably the most you can hope for. Assuming it is thread-safe even for mutating operations is far-fetched without confirmation from the documentation.
Barring documentation, you might want to have a look at the implementation to see whether you can verify reentrancy for at least some subset of the API.

According to the paper Scalable, High Performance Ethernet Forwarding with CUCKOOSWITCH (2013), google::dense_hash_map is not thread-safe for reads and writes:
[...] We therefore also compare to three non-thread-safe hash tables: the STL’s hash_map and Google’s sparse_hash_map and dense_hash_map. [...] These non-thread-safe tables do not support concurrent reads and writes.
I could not find any other information about google::dense_hash_map being thread-safe or not.
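The reasoning above (a const object of a reentrant type is usually safe to read from several threads) suggests a simple usage pattern: finish all inserts in one thread, then let readers see the table only through a const reference. The sketch below assumes std::unordered_map as a stand-in for google::dense_hash_map (the real class additionally needs set_empty_key() before any insert), and the helper names build_table, count_hits and read_concurrently are hypothetical. Whether dense_hash_map's own const members are free of hidden mutable state is exactly what you would still need to confirm in its implementation.

```cpp
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Stand-in for google::dense_hash_map (assumption); swap in the real type
// only after verifying its const member functions touch no shared mutable state.
using Table = std::unordered_map<int, std::string>;

// Build phase: single-threaded, all mutation happens here.
Table build_table(int n) {
    Table t;
    for (int i = 0; i < n; ++i) {
        t.emplace(i, std::to_string(i));
    }
    return t;
}

// Read phase: readers only get a const reference, so they can only call
// const member functions such as find().
int count_hits(const Table& t, int lo, int hi) {
    int hits = 0;
    for (int k = lo; k < hi; ++k) {
        auto it = t.find(k);
        if (it != t.end() && it->second == std::to_string(k)) {
            ++hits;
        }
    }
    return hits;
}

// Starts n_threads readers on the same const table; the table is fully
// built before the threads start, so its contents are visible to them.
int read_concurrently(const Table& t, int n_threads, int keys_per_thread) {
    std::vector<int> results(n_threads, 0);
    std::vector<std::thread> readers;
    for (int i = 0; i < n_threads; ++i) {
        // Each reader writes only to its own slot in `results`.
        readers.emplace_back([&t, &results, i, keys_per_thread] {
            results[i] = count_hits(t, 0, keys_per_thread);
        });
    }
    for (auto& r : readers) r.join();
    int total = 0;
    for (int r : results) total += r;
    return total;
}
```

Any write to the table after the readers start would break this pattern and require a lock.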

Related

Is shared_future<void> a legitimate replacement for a condition_variable?

Josuttis states ["Standard Library", 2nd ed, pg 1003]:
Futures allow you to block until data by another thread is provided or another thread is done. However, a future can pass data from one thread to another only once. In fact, a future's major purpose is to deal with return values or exceptions of threads.
On the other hand, a shared_future<void> can be used by multiple threads, to identify when another thread has done its job.
Also, in general, high-level concurrency features (such as futures) should be preferred to low-level ones (such as condition_variables).
Therefore, I'd like to ask: Is there any situation (requiring synchronization of multiple threads) in which a shared_future<void> won't suffice and a condition_variable is essential?

As already pointed out in the comments by @T.C. and @hlt, the use of futures/shared_futures is mostly limited in the sense that they can only be used once. So for every communication task you have to have a new future. The pros and cons are nicely explained by Scott Meyers in:
Item 39: Consider void futures for one-shot event communication.
Scott Meyers: Effective Modern C++ (emphasis mine)
His conclusion is that using promise/future pairs dodges many of the problems with the use of condition_variables, providing a nicer way of communicating one-shot events. The price to pay is that you are using dynamically allocated memory for the shared states and, more importantly, that you have to have one promise/future pair for every event that you want to communicate.
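A minimal sketch of the one-shot pattern Meyers describes, assuming C++11 and a hypothetical helper run_one_shot_broadcast: one std::promise<void> is fulfilled once, and any number of threads wait on copies of the same std::shared_future<void>. Signalling a second event would need a fresh promise/future pair.

```cpp
#include <atomic>
#include <future>
#include <thread>
#include <vector>

// One-shot "go" signal: every waiter blocks on its own copy of a
// shared_future<void> until the single promise is fulfilled.
int run_one_shot_broadcast(int n_waiters) {
    std::promise<void> go;                                   // the one-shot event
    std::shared_future<void> ready = go.get_future().share();
    std::atomic<int> released{0};

    std::vector<std::thread> waiters;
    for (int i = 0; i < n_waiters; ++i) {
        // Each thread gets its own copy; all copies refer to the same shared state.
        waiters.emplace_back([ready, &released] {
            ready.wait();                                    // blocks until set_value()
            released.fetch_add(1);
        });
    }

    go.set_value();   // fires exactly once; a second call would throw std::future_error
    for (auto& t : waiters) t.join();
    return released.load();
}
```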

While the notion of using high-level abstractions instead of low-level ones is laudable, there is a misconception here. std::future is not a high-level replacement for std::condition_variable. Instead, it is a specific high-level construct built for a specific use case of std::condition_variable, namely a one-time return of a value.
Obviously, not all uses of condition variables fit this scenario. For example, a message queue cannot be implemented with std::future, no matter how hard you try. Such a queue is another high-level construct built on low-level building blocks. So yes, aim for high-level constructs, but do not expect a one-to-one mapping between high-level and low-level ones.

STL containers and threads (concurrent writes) in Linux

I am looking for the optimal strategy to use STL containers (like std::map and std::vector) and pthreads.
What is the canonical way to go? A simple example:
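std::map<std::string, std::vector<std::string>> myMap;
pthread_mutex_t myMapMutex = PTHREAD_MUTEX_INITIALIZER;
How do we guarantee concurrency?
pthread_mutex_lock(&myMapMutex);
myMap["key"].push_back("value");  // write to myMap
pthread_mutex_unlock(&myMapMutex);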
Additionally, I would like to know if pthreads and STL face performance issues when used together.
System: Linux, g++, pthreads, no boost, no Intel TBB

The C++03 standard does not talk about concurrency at all, so the concurrency aspect is left to the implementation. The documentation that comes with your compiler is therefore where you should look for answers related to concurrency.
Most STL implementations are not thread-safe as such.
Since STL containers do not provide any explicit thread safety, yes, you will have to use your own synchronization mechanism. And while you are at it, you should use RAII rather than managing the synchronization resource (mutex unlock etc.) manually.
You can refer to the documentation here:
MSDN:
If a single object is being written to by one thread, then all reads and writes to that object on the same or other threads must be protected. For example, given an object A, if thread 1 is writing to A, then thread 2 must be prevented from reading from or writing to A.
GCC Documentation says:
We currently use the SGI STL definition of thread safety, which states:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Point to Note: GCC's Standard Library is a derivative of SGI's STL code.

The canonical way to provide concurrency is to hold a lock while accessing the collection.
That works in 90% of the cases where access to the collection isn't performance-critical anyway. If you're accessing a shared collection so much that locking around it harms performance, you should rethink your design. (And odds are, your design is okay and it won't affect performance anywhere near as much as you might suspect.)

You should take a look at Intel Threading Building Blocks (TBB) ( http://threadingbuildingblocks.org/ ). It has a few highly optimized data structures that handle concurrency internally, using fine-grained locking or non-blocking strategies.

is GCC STL thread-safe?

I found contradictory information on the web:
http://www.sgi.com/tech/stl/thread_safety.html
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html
The user-code must guard against concurrent method calls which may access any particular library object's state. Typically, the application programmer may infer what object locks must be held based on the objects referenced in a method call. Without getting into great detail, here is an example which requires user-level locks:
All library objects are safe to use in a multithreaded program as long as each thread carefully locks out access by any other thread while it uses any object visible to another thread, i.e., treat library objects like any other shared resource. In general, this requirement includes both read and write access to objects; unless otherwise documented as safe, do not assume that two threads may access a shared standard library object at the same time.
I bolded the important part. Maybe I don't understand what they mean by that, but when I read "object state" I think of STL containers.

How I understand this:
Both documents say the same thing in different ways. The MS STL implementation (actually the Dinkumware one) says almost the same as the SGI doc you quoted. They mean that they did nothing to make STL objects (e.g. containers) thread-safe, most probably because this would add overhead that is unnecessary in many single-threaded applications. In their terms, any object is thread-safe for reading: you can read it from multiple threads at once.
The docs also guarantee that STL objects are not modified behind the scenes by background threads.

FWIW I updated the libstdc++ docs a while ago, it now says (emphasis mine):
The user code must guard against concurrent function calls which access any particular library object's state when one or more of those accesses modifies the state.

The information you cite is not contradictory. STL libraries should be safe to use in a multi-threaded environment (although I've worked with one implementation where that was not the case), but it is the user's burden to synchronize access to library objects. For instance, if you create a set of ints in one thread and another set of ints in another thread, and you don't share either of them among threads, you should be able to use them; if you share an instance of a set among threads, it's up to you to synchronize access to the set.
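The distinction in this answer can be sketched with two hypothetical helpers, assuming C++11: in fill_private_sets each thread inserts into its own std::set<int>, which needs no locking, while fill_shared_set has all threads insert into one set and therefore serializes every access with a std::mutex.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Case 1: each thread owns a distinct std::set<int>; nothing is shared,
// so no locking is needed. The outer vector is sized up front and never resized.
std::size_t fill_private_sets(int n_threads, int per_thread) {
    std::vector<std::set<int>> sets(n_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&sets, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) sets[t].insert(i);
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (const auto& s : sets) total += s.size();
    return total;
}

// Case 2: one std::set<int> shared by all threads; the user must
// synchronize every access, here with a single mutex.
std::size_t fill_shared_set(int n_threads, int per_thread) {
    std::set<int> shared;
    std::mutex shared_guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&shared, &shared_guard, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(shared_guard);
                shared.insert(t * per_thread + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.size();
}
```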

STL is no more. It is superseded by the C++ Standard Library. If you use the ISO C++ and the Standard Library, you should read (a) the Standard and (b) documentation that comes with your implementation of C++.
SGI STL documentation is mostly of historical interest, unless you for some reason actually use SGI STL.

Is C++ STL thread-safe for distinct containers (using STLport implementation)?

I'm using Android 2.2, which comes with a version of STLport. For some reason, it was configured to be non-thread-safe. This was done using a #define _NOTHREADS in a configuration header file.
When I constructed and initialized distinct non-shared containers (e.g. strings) from different pthreads, I was getting memory corruption.
With _NOTHREADS, it looks like some low-level code in STL inside allocator.cpp doesn't do proper locking. It seems analogous to C not providing thread safety for malloc.
Does anyone know why STL might be built with _NOTHREADS by default on Android? By turning this off, I'm wondering if there may be a side effect. One thing I can think of is slightly degraded performance, but I don't see much of a choice given I'm using lots of threading.

The SGI STL
The SGI STL is the grandmother of all of the other STL implementations.
See the SGI STL docs.
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
G++
libstdc++ docs
We currently use the SGI STL definition of thread safety.
STLPort
STLPort docs
Please refer to SGI site for detailed document on thread safety. Basic points are:
simultaneous read access to the same container from within separate threads is safe;
simultaneous access to distinct containers (not shared between threads) is safe;
user must provide synchronization for all accesses if any thread may modify shared container.

General Information about the C++ Standard
The C++ standard current at the time of writing (C++03) doesn't address concurrency issues at all, so, at least for C++03, there's no requirement that applies to all implementations.
A meaningful answer can only really apply to a specific implementation (STLPort, in this case). STLPort is basically a version of the original SGI STL implementation with improvements to its portability, so you'd probably want to start with the documentation about thread safety in the original SGI version.

Of course, the reason it is built without reentrancy is performance: speed, lower memory use, fewer resources used. Presumably the assumption is that most client programs won't be multi-threaded.

Sun WorkShop 5.0
This is a bit old but quite informative. The bottom line is that the STL only provides locks on allocators.
Strings, however, are reference-counted objects, but any changes to their reference count are made atomically. This only holds when passing strings around by value, though. Two threads holding a reference to the same string object will need to do their own locking.

When you use e.g. std::string or similar objects and change them from different threads, you are sharing the same object between threads. For any member function you can call on a string to be reentrant, no other thread could affect that member function, and the function could not affect any calls in other threads. The truth is exactly the opposite, since you are sharing the same object via the this pointer, which is implicitly passed when calling member functions. To illustrate, the equivalent of this call
std::string a;
a.insert( ... );
without using OOP syntax would be:
std::string a;
insert( &a, ... );
So you are implicitly violating the requirement that no resource is shared between function calls. You can see more here.
Hope this helps.

Are STL Map or HashMaps thread safe?

Can I use a map or hashmap in a multithreaded program without needing a lock?
i.e. are they thread safe?
I'm wanting to potentially add and delete from the map at the same time.
There seems to be a lot of conflicting information out there.
By the way, I'm using the STL library that comes with GCC under Ubuntu 10.04
EDIT: Just like the rest of the internet, I seem to be getting conflicting answers?

You can safely perform simultaneous read operations, i.e. call const member functions. But you can't perform any simultaneous operations if one of them involves writing, i.e. a call to a non-const member function must not overlap with any other call on the same container.
In other words, you can't change the container from multiple threads, so you need a lock or a reader-writer lock to make the access safe.

No.
Honest. No.
Edit: OK, I'll qualify it.
You can have any number of threads reading the same map. This makes sense because reading it doesn't have any side-effects, so it can't matter whether anyone else is also doing it.
However, if you want to write to it, then you need to get exclusive access, which means preventing any other threads from writing or reading until you're done.
Your original question was about adding and removing in parallel. Since these are both writes, the answer to whether they're thread-safe is a simple, unambiguous "no".

TBB is a free open-source library that provides thread-safe associative containers. (http://www.threadingbuildingblocks.org/)

The most commonly used model for STL containers' thread safety is the SGI one:
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe.
but in the end it's up to the STL library authors - AFAIK the standard says nothing about STL's thread-safety.
But according to the docs, GNU's libstdc++ implementation follows it (as of GCC 3.0+) if a number of conditions are met.
HIH

The answer (as with most threading problems) is that it will work most of the time. Unfortunately, if you catch the map while it's being modified (for example, while it's resizing), you're going to end up in trouble. So no.
To get the best performance you'll need a multi-stage lock. First, a read lock, which allows accessors that can't modify the map and which can be held by multiple threads at once (more than one thread reading items is OK). Second, an exclusive write lock, which allows modifying the map in ways that could be unsafe (add, delete, etc.).
Edit: Reader-writer locks are good, but whether they're better than a standard mutex depends on the usage pattern. I can't recommend either without knowing more. Profile both and see which best fits your needs.
