Thread safety of std::map for read-only operations - c++

I have a std::map that I use to map values (field IDs) to a human-readable string. This map is initialised once when my program starts, before any other threads are started, and after that it is never modified again. Right now I give every thread its own copy of this (rather large) map, but this is obviously an inefficient use of memory and it slows program startup. So I was thinking of giving each thread a pointer to the map, but that raises a thread-safety issue.
If all I'm doing is reading from the map using the following code:
std::string name;
//here N is the field id for which I want the human readable name
unsigned field_id = N;
std::map<unsigned,std::string>::const_iterator map_it;
// fields_p is a const std::map<unsigned, std::string>* to the map concerned.
// multiple threads will share this.
map_it = fields_p->find(field_id);
if (map_it != fields_p->end())
{
    name = map_it->second;
}
else
{
    name = "";
}
Will this work or are there issues with reading a std::map from multiple threads?
Note: I'm working with Visual Studio 2008 currently, but I'd like this to work across most mainstream STL implementations.
Update: Edited code sample for const correctness.

This will work from multiple threads as long as your map is never modified. Since your map is de facto immutable after initialization, any find is a find in a map that does not change.
Here is a relevant link: http://www.sgi.com/tech/stl/thread_safety.html
The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
You fall into the "simultaneous read accesses to shared containers" category.
Note: this is true for the SGI implementation. If you use another implementation, you need to check its documentation. Of the two implementations that seem widely used as alternatives, STLport provides the same thread-safety guarantee as far as I know; I don't know about the Apache implementation.

It should be fine.
You can use const references to it if you want to document/enforce read-only behaviour.
Note that correctness isn't strictly guaranteed even if you only use const methods: in principle the map could choose to rebalance itself on a call to find (a really perverse implementation could declare the tree mutable). However, this seems pretty unlikely in practice.
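For illustration, here is a minimal sketch of the sharing pattern being discussed: the map is built once, then every worker only ever calls const members through a const reference. It uses C++11 std::thread purely for brevity (the original question targets Visual Studio 2008, where you would use Windows threads or Boost.Thread instead); the sharing pattern itself is the point.

#include <iostream>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Read-only lookup: only const members of the shared map are used.
std::string field_name(const std::map<unsigned, std::string>& fields, unsigned id)
{
    auto it = fields.find(id);
    return it != fields.end() ? it->second : std::string();
}

int main()
{
    // Built once, before any worker threads start; never modified afterwards.
    const std::map<unsigned, std::string> fields = {
        {1, "customer name"}, {2, "order id"}, {3, "shipping address"}};

    std::vector<std::thread> workers;
    for (unsigned i = 1; i <= 4; ++i)
        workers.emplace_back([&fields, i] {
            // Concurrent const reads of the same container are safe.
            std::cout << field_name(fields, i) << '\n';
        });
    for (auto& t : workers)
        t.join();
}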

Yes, it is.
See related post with same question about std::set:
Is the C++ std::set thread-safe?

For MS STL implementation
Thread Safety in the C++ Standard Library
The following thread safety rules apply to all classes in the C++ Standard Library—this includes shared_ptr, as described below. Stronger guarantees are sometimes provided—for example, the standard iostream objects, as described below, and types specifically intended for multithreading, like those in <atomic>.
An object is thread-safe for reading from multiple threads. For example, given an object A, it is safe to read A from thread 1 and from thread 2 simultaneously.

Related

Is thread::id used anywhere in the standard C++ library?

std::thread::get_id() gives you an implementation-defined value which uniquely identifies a given thread, but the interesting thing for me is that there is a dedicated type for this, thread::id. Is this type used anywhere in the standard library?
Is a thread::id used somewhere, in any interface that you know of? AFAIK this type is used nowhere, hence it looks like it's quite useless at the moment.
The purpose of such a dedicated, user-defined type is to make things easy for implementors.
In many cases you will be implementing C++ threads on top of existing code bases, operating systems, or the like. These may have different types of thread identification.
With that type, the C++ std implementation is more likely to be able to expose the underlying thread identification value directly, or with minimal modification.
Knowing what thread you are on is quite useful in many situations from the client side, and implementing it without an id from the system is complex.
std::thread::id can be sorted and compared (with a total order and sensible equality) and hashed with std::hash, all of which are useful with the std library. They can be copied (trivially) and default-constructed (giving the id that represents no thread). They can be turned into a string via ostream <<, with the only guarantee being that two ids produce the same string if and only if they compare equal.
Any operations on them beyond that are undefined. But an implementation could make thread::id essentially a pointer, or an unsigned integer index into an array, or one of many other underlying representations. Code that relies on such information is doing something completely implementation-dependent, however.
Is thread::id used anywhere in the standard C++ library?
No, thread::id is not used in the interface of the standard C++ library.
It might be used in the implementation of one of the recursive mutexes, but that would be an implementation detail. I do not know if any implementation currently uses it.
AFAIK this type is used nowhere, hence it looks like it's quite useless at the moment.
Here are a few of the other types in the standard library that are "useless" by this definition:
list
set
multiset
map
multimap
unordered_set
unordered_multimap
array
atomic
bitset
complex
condition_variable
condition_variable_any
forward_list
fstream
reverse_iterator
move_iterator
mutex
queue
stack
regex
thread
This is not an exhaustive list.
Somewhat reluctant update
Me:
Is your question: What is a motivating use case for thread::id?
user2485710:
yes, that sounds right, with, maybe, a special focus on the standard library.
thread::id is sometimes used to map a thread to attributes, or vice versa. For example, one might implement "named threads" by associating a std::string with a std::thread::id in a std::map. When a thread of execution logs something, throws an exception, or hits some other notable event, one could look up the thread's name to create a message for the log or error report, giving better context. For example, threads might have suggestive names such as "database server" or "table updater".
thread::id is more convenient to use than thread for this application as thread is usually needed elsewhere to control joining.
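A minimal sketch of that named-thread idea; the registry and function names here are invented for illustration:

#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical registry mapping a thread::id to a human-readable name.
// The map itself is written from several threads, so it needs a lock.
std::map<std::thread::id, std::string> thread_names;
std::mutex names_mtx;

void register_current_thread(const std::string& name)
{
    std::lock_guard<std::mutex> lock(names_mtx);
    thread_names[std::this_thread::get_id()] = name;
}

std::string current_thread_name()
{
    std::lock_guard<std::mutex> lock(names_mtx);
    auto it = thread_names.find(std::this_thread::get_id());
    return it != thread_names.end() ? it->second : "unnamed";
}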
Another use for thread::id is to detect if the current thread executing a function is the same thread as the last thread that executed that same function. I've seen this technique used in the implementation of recursive_mutex::lock(). For example, if the mutex is locked and this_thread::get_id() == stored_id, then increment lock count.
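A sketch of that owner check, assuming a simplified recursive lock layered on std::mutex (real recursive_mutex implementations use lower-level primitives; this only shows how the stored thread::id comparison is used):

#include <atomic>
#include <mutex>
#include <thread>

class recursive_lock
{
public:
    void lock()
    {
        const std::thread::id me = std::this_thread::get_id();
        if (owner_.load() == me) {             // already held by this thread: just bump the count
            ++count_;
            return;
        }
        mtx_.lock();                           // otherwise acquire the underlying mutex
        owner_.store(me);
        count_ = 1;
    }
    void unlock()
    {
        if (--count_ == 0) {
            owner_.store(std::thread::id());   // back to "no thread"
            mtx_.unlock();
        }
    }
private:
    std::mutex mtx_;
    std::atomic<std::thread::id> owner_{std::thread::id()};  // thread currently holding the lock
    int count_ = 0;                                          // only touched by the owning thread
};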
As far as "focus on the standard C++ library" is concerned, I really don't know what that means. If it means: "Used in the interface", then this question has already been answered earlier in this answer, and in other answers:
No, thread::id is not used in the interface of the standard C++ library.
There are many, many types in the std::lib that are not part of the API of other parts of the std::lib. thread::id was not standardized because it was needed in the API of other parts of the library. It was standardized because std::thread was being standardized, and thread::id is a natural part of the std::thread library, and because it is useful for the use cases such as those mentioned above.
The key difference between thread and thread::id is that thread maintains unique ownership of a thread of execution. Thus thread is a move-only type. This is very analogous to unique_ptr. Only one std::thread can be used to join(). In contrast, a thread::id is just a "name" for a thread. Names are copyable and comparable. They aren't used for ownership, only for identification.
This separation of concerns (privilege-to-join vs identification) is made more obvious in a language which supports both move-only and copyable types.
AFAIK this type is used nowhere, hence it looks like it's quite useless at the moment.
A thread::id implements relational operators and hash support. This allows you, the user, to have them as keys in associative and unordered containers.

Is erasing and inserting in a single linked list thread safe?

Using std::forward_list, are there any data races when erasing and inserting? For example, I have one thread that does nothing but add new elements at the end of the list, and another thread that walks the (same) list and can erase elements from it.
From what I know of linked lists, each element holds a pointer to the next element, so if I erase the last element, at the same time that I am inserting a new element, would this cause a data race or do these containers work differently (or do they handle that possibility)?
If it is a data race, is there a (simple and fast) way to avoid this? (Note: The thread that inserts is the most speed critical of the two.)
There are thread-safety guarantees for the standard C++ library containers, but they tend not to be what people expect of thread-safety guarantees (that expectation, however, is the error, not the guarantee). The thread-safety guarantees of standard library containers are roughly (the relevant section is 17.6.5.9 [res.on.data.races]):
You can have as many readers of a container as you want. What exactly qualifies as a reader is a bit subtle, but it roughly amounts to users of const member functions, plus a few non-const members used only to read the data (the thread safety of the data being read isn't the container's concern, i.e. 23.2.2 [container.requirements.dataraces] specifies that the elements can be changed without the container introducing data races).
If there is one writer of a container, there shall be no other readers or writers of the container in another thread.
That is, reading one end of a container and writing the other end is not thread safe! In fact, even if the actual container changes don't affect the reader immediately, you always need synchronization of some form when communicating a piece of data from one thread to another thread. That is, even if you can guarantee that the consumer doesn't erase() the node the producer currently insert()s, there would be a data race.
No, neither forward_list nor any other STL containers are thread-safe for writes. You must provide synchronization so that no other threads read or write to the container while a write is occurring. Only simultaneous reads are safe.
The simplest way to do this is to use a mutex to lock access to the container while an insert is occurring. Doing this in a portable way requires C++11 (std::mutex) or platform-specific features (mutexes/critical sections on Windows, pthreads on Linux/Unix).
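For example, a minimal sketch with std::mutex; the wrapper type and member names are made up for illustration, and the "append" is simplified to push_front (appending at the true end of a forward_list would also require keeping an iterator to the last node):

#include <forward_list>
#include <mutex>

// Hypothetical wrapper: every access to the shared list goes through one mutex.
struct guarded_list
{
    std::forward_list<int> list;
    std::mutex mtx;

    void add(int value)                        // called by the inserting thread
    {
        std::lock_guard<std::mutex> lock(mtx);
        list.push_front(value);
    }

    void erase_negatives()                     // called by the walking/erasing thread
    {
        std::lock_guard<std::mutex> lock(mtx);
        list.remove_if([](int v) { return v < 0; });
    }
};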
Unless you're using a version of the STL that explicitly states it is thread-safe then no, the containers are not thread safe.
It's rare to make general-purpose containers thread-safe by default, as it imposes a performance hit on users who don't require thread-safe access to the container, and that is by far the normal usage pattern.
If thread safety is an issue for you then you'll need to surround your code with locks, or use a data structure specifically designed for multithreaded access.
std containers are not meant to be thread-safe.
You should carefully protect them during modifying operations.

Why map is not multithread safe in C++?

I met this problem when I tried to solve a concurrency issue in my code. In the original code, we only use a unique lock to protect write operations on a cache, which is an STL map. But there are no restrictions on read operations on the cache. So I was thinking of adding a shared lock for the read operations and keeping the unique lock for writes. But someone told me that it's not safe to do multithreading on a map due to some internal caching it supposedly does.
Can someone explain the reason in details? What does the internal caching do?
The implementations of std::map must all meet the usual guarantees: if all you do is read, then there is no need for external synchronization, but as soon as one thread modifies, all accesses must be synchronized.
It's not clear to me what you mean by "shared lock"; there is no such thing in the standard. But if any one thread is writing, you must ensure that no other threads may read at the same time. (Something like POSIX's pthread_rwlock could be used, but there's nothing similar in the standard, at least not that I can find offhand.)
Since C++11 at least, a const operation on a standard library class is guaranteed to be thread safe (assuming const operations on objects stored in it are thread safe).
All const member functions of std types can be safely called from multiple threads in C++11 without explicit synchronization. In fact, any type that is ever used in conjunction with the standard library (e.g. as a template parameter to a container) must fulfill this guarantee.
Clarification: The standard guarantees that your program will have the desired behaviour as long as you never cause a write and any other access to the same data location without a synchronization point in between. The rationale behind this is that modern CPUs don't have strictly sequentially consistent memory models, which would limit scalability and performance. Under the hood, your compiler and standard library will emit appropriate memory fences at places where stronger memory orderings are needed.
I really don't see why there would be any caching issue...
If I refer to the STL definition of a map, it should be implemented as a binary search tree.
A binary search tree is simply a tree with a pool of key-value nodes. Those nodes are sorted following the natural order of their keys and, to avoid any problem, keys must be unique. So no internal caching is needed at all.
As no internal caching is required, read operations are safe in multi-threading context. But it's not the same story for write operations, for those you must provide your own synchronization mechanism as for any non-threading-aware data structure.
Just be aware that you must also forbid any read operations while a write operation is being performed by a thread, because a write can trigger a slow, complete rebalancing of the binary tree, i.e. a quick read operation during a long write operation could return a wrong result.
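A sketch of the shared-lock-for-reads / unique-lock-for-writes scheme the question describes, using std::shared_mutex (added in C++17; on older toolchains boost::shared_mutex or pthread_rwlock plays the same role). The cache class and its members are invented for illustration:

#include <map>
#include <shared_mutex>
#include <string>

// Hypothetical cache: many concurrent readers, writers get exclusive access.
class cache
{
public:
    // Readers take a shared lock, so they never block each other.
    std::string get(int key) const
    {
        std::shared_lock<std::shared_mutex> lock(mtx_);
        auto it = data_.find(key);
        return it != data_.end() ? it->second : std::string();
    }

    // Writers take a unique lock, excluding both readers and other writers,
    // so a rebalance of the underlying tree can never be observed mid-way.
    void put(int key, std::string value)
    {
        std::unique_lock<std::shared_mutex> lock(mtx_);
        data_[key] = std::move(value);
    }

private:
    mutable std::shared_mutex mtx_;
    std::map<int, std::string> data_;
};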

C++ Thread safe vector.erase

I wrote a threaded renderer for SFML which takes pointers to drawable objects and stores them in a vector to be drawn each frame. Starting out, adding objects to the vector and removing objects from the vector would frequently cause segmentation faults (SIGSEGV). To try to combat this, I would put objects that needed to be removed/added into a queue to be processed later (before drawing the frame). This seemed to fix it, but lately I have noticed that if I add many objects at one time (or add/remove them fast enough) I get the same SIGSEGV.
Should I be using locks when I add/remove from the vector?
You need to understand the thread-safety guarantees the C++ standard (and implementations of C++2003 for possibly concurrent systems) gives you. The standard containers are thread-safe in the following sense:
It is OK to have multiple concurrent threads reading the same container.
If there is one thread modifying a container there shall be no concurrent threads reading or writing the same container.
Different containers are independent of each other.
Many people misunderstand container thread-safety to mean that these rules are enforced by the container implementation: they are not! It is your responsibility to obey these rules.
The reason these aren't, and actually can't, be imposed by the containers is that they don't have an interface suitable for this. Consider for example the following trivial piece of code:
if (!c.empty()) {
    auto value = c.back();
    // do something with the read value
}
The container can control the access within the calls to empty() and back(). However, between these calls it necessarily has to release any synchronization facility it holds, i.e. by the time the thread tries to read c.back() the container may be empty again! There are essentially two ways to deal with this problem:
You need to use external locking, if there is a possibility that a concurrent thread may be changing the container, spanning the entire range of accesses which are interdependent in some form.
You change the interface of the containers to become monitors. However, the container interface isn't at all suitable to be changed in this direction because monitors essentially only support "fire and forget" style of interfaces.
Both strategies have their advantages, and the standard library containers clearly support the first style, i.e. they require external locking when used concurrently with the potential of at least one thread modifying the container. They don't require any kind of locking (neither internal nor external) if only one thread ever uses them in the first place. This is actually the scenario they were designed for. The thread-safety guarantees given for them are in place to guarantee that no internal facilities are used which are not thread-safe, say a per-object iterator object, or a memory allocation facility shared by multiple threads without being thread-safe, etc.
To answer the original question: yes, you need to use external synchronization, e.g. in the form of mutex locks, if you modify the container in one thread and read it in another thread.
Should I be using locks when I add/remove from the vector?
Yes. If you're using the vector from two threads at the same time and a reallocation happens, then the backing allocation may be swapped out and freed out from under the other thread. The other thread would be reading from or writing to freed memory, or memory in use by another, unrelated allocation.
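A minimal sketch of the fix, with a single mutex serializing additions, removals and the per-frame walk. The render_list type and the drawable placeholder are invented here; in the real program the element type would be the SFML drawable pointer:

#include <algorithm>
#include <mutex>
#include <vector>

struct drawable;   // placeholder for the real SFML drawable type

// Hypothetical wrapper: the lock ensures the vector can never reallocate
// (or have elements erased) while the render thread is walking it.
struct render_list
{
    std::vector<drawable*> items;
    std::mutex mtx;

    void add(drawable* d)
    {
        std::lock_guard<std::mutex> lock(mtx);
        items.push_back(d);
    }

    void remove(drawable* d)
    {
        std::lock_guard<std::mutex> lock(mtx);
        items.erase(std::remove(items.begin(), items.end(), d), items.end());
    }

    template <class Fn>
    void for_each(Fn fn)                     // called once per frame by the renderer
    {
        std::lock_guard<std::mutex> lock(mtx);
        for (drawable* d : items)
            fn(d);
    }
};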

Locked write, unlocked read

Algorithmic question
How to allow only the following types of threaded operations on an object?
multiple simultaneous reads, no writes
single write, no reads
Example: a wrapper for an STL container allowing efficient search from multiple threads. For simplicity, let's assume no iterator can be accessed from outside the wrapper in question.
Let's assume we have semaphores and mutexes at our disposal.
I know that the Boost libraries have this concept implemented. I'd like to understand how it is usually done.
Use boost::shared_mutex to handle frequent read, infrequent write access patterns.
As you've noted, STL containers are 'leaky' in that you can retrieve an iterator which has to be treated as either an implicit ongoing operation for write (if non-const) or read (if const), until the iterator goes out of scope. Writes to the container by other threads while you hold such an iterator can invalidate it. Your wrapper would have to be carefully designed to handle this case and keep the wrapper class efficient.
You want a "multiple-reader / single-writer" mutex : Boost.Thread provides one.
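Since the question asks how this is usually done given only mutexes, here is a minimal sketch of a readers-writer lock built from std::mutex and std::condition_variable. boost::shared_mutex (and, since C++17, std::shared_mutex) gives you the same thing ready-made with better fairness; this naive version can starve writers if readers keep arriving:

#include <condition_variable>
#include <mutex>

class rw_lock
{
public:
    void lock_shared()                           // multiple simultaneous reads, no writes
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writer_; });   // wait until no writer holds the lock
        ++readers_;
    }
    void unlock_shared()
    {
        std::unique_lock<std::mutex> lk(m_);
        if (--readers_ == 0)
            cv_.notify_all();                    // last reader wakes any waiting writer
    }
    void lock()                                  // single write, no reads
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !writer_ && readers_ == 0; });
        writer_ = true;
    }
    void unlock()
    {
        std::unique_lock<std::mutex> lk(m_);
        writer_ = false;
        cv_.notify_all();                        // wake both waiting readers and writers
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int readers_ = 0;
    bool writer_ = false;
};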