std::queue pop push thread safety - c++

Basically my question is: is it safe to call front+pop and push from two threads without synchronization?
I've read about this and never found a clear answer. People are saying you should use a mutex, but some are hinting you can use two different mutexes, one for push and one for pop. Is that true?
Does this code have undefined behavior?
std::queue<int> queue;
int pop()
{
int x = queue.front();
queue.pop();
return x;
}
void push(int x)
{
queue.push(x);
}
int main()
{
queue.push(1);
std::thread t1(pop);
std::thread t2(push, 2); // push takes an int, so the thread needs an argument
t1.join();
t2.join();
}
I would say it's undefined behavior, but you can design a queue that is safe for concurrent pop and push, so why isn't std::queue like that?

No, it's not. The standard containers are not thread-safe -- you can't mutate them from two threads. You will have to use a mutex, or a lock-free queue. The problem is that the std::queue has to be able to work with objects like std::string, which cannot be atomically moved or constructed, and the std::queue also has to support arbitrary sizes.
Most lock-free queues only work with machine-word size types and a fixed maximum size. If you need the flexibility of std::queue and thread-safety, you'll have to manually use a mutex. Adding the mutex to the default implementation would also be extremely wasteful, as every application would then pay for thread safety even if it doesn't need it.
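A minimal sketch of the "manually use a mutex" approach, assuming C++17 for std::optional. The class name LockedQueue and its try_pop() member are made up for illustration and are not part of the standard library. The point is that front() and pop() are performed inside one critical section, and that push uses the same mutex:

```cpp
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical wrapper: every access to the underlying std::queue
// happens while holding the same mutex.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
    }

    // Combines front() and pop() in a single critical section.
    // Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) {
            return std::nullopt;
        }
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::queue<T> queue_;
};
```

With such a wrapper, one thread can call push() while another calls try_pop(), because the two calls can never touch the std::queue at the same time.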

Related

How to individually lock unordered_map elements in C++

I have an unordered_map that I want to be accessible by multiple threads but locking the whole thing with a mutex would be too slow.
To get around this I put a mutex in each element of the unordered_map:
struct exampleClass{ // struct, so that m and data are accessible from outside
std::mutex m;
int data;
};
std::unordered_map<int,exampleClass> exampleMap;
The issue is I'm unable to safely erase elements, because in order to destroy a mutex it must be unlocked but if it's unlocked then another thread could lock it and be writing to or reading the element during destruction.
unordered_map is not suitable for fine-grained parallelism. It is not legal
to add or remove elements without ensuring mutual exclusion during the process.
I would suggest using something like tbb::concurrent_hash_map instead, which will result in less lock contention than locking the map as a whole. (There are other concurrent hash table implementations out there; the advantage of TBB is that it's well-supported and stable.)
@Sneftel's answer is good enough.
But if you insist on using std::unordered_map, I suggest you use one mutex to protect insertion into and deletion from the map, and another mutex per element for modifying that element.
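struct exampleClass{ // struct, so that m and data are accessible from outside
std::mutex m;
int data = 0;
};
std::unordered_map<int,exampleClass> exampleMap;
std::mutex mapLock;
// Other threads must look an element up and lock its mutex while still holding mapLock.
void add(int key, int value) {
std::unique_lock<std::mutex> _(mapLock);
// std::mutex is neither copyable nor movable, so the element is
// default-constructed in place by operator[] (an existing value is overwritten)
exampleMap[key].data = value;
}
void remove(int key) { // 'delete' is a keyword and cannot be used as a function name
std::unique_lock<std::mutex> _(mapLock);
auto it = exampleMap.find(key);
if (it != exampleMap.end()) {
std::unique_lock<std::mutex> _1(it->second.m); // wait for current users of the element
_1.unlock(); // a mutex must not be locked when it is destroyed
exampleMap.erase(it);
}
}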
This should perform better than one big lock on the whole map if deletion is not a frequent operation.
But be careful with this kind of code, because it is hard to reason about and to get right.
I strongly recommend @Sneftel's answer.
You have the following options:
Lock the entire map with a single mutex
Use a container of shared_ptr so the actual class can be modified (with or without a mutex) unrelated to the container.
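The shared_ptr option could look roughly like the following sketch. The names Entry and EntryMap are hypothetical, and the design is only one possible reading of that suggestion: the map mutex protects the map structure, each element carries its own mutex, and erasing only removes the map's reference, so the element (and its mutex) stays alive until the last thread using it lets go.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical element type: the mutex guards 'data' only.
struct Entry {
    std::mutex m;
    int data = 0;
};

class EntryMap {
public:
    // Inserts or overwrites the value stored under 'key'.
    void set(int key, int value) {
        std::shared_ptr<Entry> entry;
        {
            std::lock_guard<std::mutex> lock(mapMutex_);
            std::shared_ptr<Entry>& slot = map_[key];
            if (!slot) {
                slot = std::make_shared<Entry>();
            }
            entry = slot;
        }
        // The map lock is released; only this element is locked now.
        std::lock_guard<std::mutex> lock(entry->m);
        entry->data = value;
    }

    // Returns a shared owner of the element, or nullptr if it is absent.
    std::shared_ptr<Entry> find(int key) {
        std::lock_guard<std::mutex> lock(mapMutex_);
        auto it = map_.find(key);
        if (it == map_.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Removes the element from the map. The Entry itself is destroyed
    // only when the last shared_ptr to it goes away, so a thread that
    // is still using it never sees a destroyed mutex.
    bool erase(int key) {
        std::lock_guard<std::mutex> lock(mapMutex_);
        return map_.erase(key) > 0;
    }

private:
    std::mutex mapMutex_;
    std::unordered_map<int, std::shared_ptr<Entry>> map_;
};
```

A thread that obtained an element through find() can keep locking and modifying it after another thread has erased the key; it simply works on an element that is no longer reachable through the map.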

std::atomic_bool for cancellation flag: is std::memory_order_relaxed the correct memory order?

I have a thread that reads from a socket and generates data. After every operation, the thread checks a std::atomic_bool flag to see if it must exit early.
In order to cancel the operation, I set the cancellation flag to true, then call join() on the worker thread object.
The code of the thread and the cancellation function looks something like this:
std::thread work_thread;
std::atomic_bool cancel_requested{false};
void thread_func()
{
while(! cancel_requested.load(std::memory_order_relaxed))
process_next_element();
}
void cancel()
{
cancel_requested.store(true, std::memory_order_relaxed);
work_thread.join();
}
Is std::memory_order_relaxed the correct memory order for this use of an atomic variable?
As long as there is no dependency between cancel_requested flag and anything else, you should be safe.
The code as shown looks OK, assuming you use cancel_requested only to expedite the shutdown, but also have a provision for an orderly shutdown, such as a sentinel entry in the queue (and of course that the queue itself is synchronized).
Which means your code actually looks like this:
std::thread work_thread;
std::atomic_bool cancel_requested{false};
std::mutex work_queue_mutex;
std::condition_variable work_queue_filled_cond;
std::queue<Element> work_queue; // Element: whatever type process_next_element() consumes
void thread_func()
{
while(! cancel_requested.load(std::memory_order_relaxed))
{
std::unique_lock<std::mutex> lock(work_queue_mutex);
work_queue_filled_cond.wait(lock, []{ return !work_queue.empty(); });
auto element = work_queue.front();
work_queue.pop();
lock.unlock();
if (element == exit_sentinel)
break;
process_next_element(element);
}
}
void cancel()
{
std::unique_lock<std::mutex> lock(work_queue_mutex);
work_queue.push(exit_sentinel); // std::queue has push(), not push_back()
work_queue_filled_cond.notify_one();
lock.unlock();
cancel_requested.store(true, std::memory_order_relaxed);
work_thread.join();
}
And if we're that far, then cancel_requested may just as well become a regular variable, the code even becomes simpler.
std::thread work_thread;
bool cancel_requested = false;
std::mutex work_queue_mutex;
std::condition_variable work_queue_filled_cond;
std::queue<Element> work_queue; // Element: whatever type process_next_element() consumes
void thread_func()
{
while(true)
{
std::unique_lock<std::mutex> lock(work_queue_mutex);
work_queue_filled_cond.wait(lock, []{ return cancel_requested || !work_queue.empty(); });
if (cancel_requested)
break;
auto element = work_queue.front();
work_queue.pop();
lock.unlock();
process_next_element(element);
}
}
void cancel()
{
std::unique_lock<std::mutex> lock(work_queue_mutex);
cancel_requested = true;
work_queue_filled_cond.notify_one();
lock.unlock();
work_thread.join();
}
memory_order_relaxed is generally hard to reason about, because it blurs the general notion of sequentially executing code. So its usefulness is very, very limited, as Herb Sutter explains in his "atomic<> Weapons" talk.
Note that std::thread::join() by itself synchronizes the two threads: everything the worker thread did happens-before join() returns in the joining thread.
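A small sketch of what the join() guarantee means in practice. The function name compute_in_worker is invented for this example; the variable it shares with the worker is a plain int, neither atomic nor protected by a mutex:

```cpp
#include <thread>

// Hypothetical example: 'result' is a plain int, not an atomic.
int compute_in_worker() {
    int result = 0;
    std::thread worker([&result] { result = 42; });
    // The completion of the worker thread synchronizes with the return
    // of join(), so the write to 'result' is visible afterwards.
    worker.join();
    return result;
}
```

Reading result before join() returns would be a data race; reading it afterwards is fine without any further synchronization.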
Whether this code is correct depends on a lot of things. Most of all it depends on what exactly you mean by "correct". As far as I can tell, the bits of code that you show don't invoke undefined behavior (assuming your work_thread and cancel_requested are not actually initialized in the order your snippet above suggests as you would then have the thread potentially reading the uninitialized value of the atomic). If all you need to do is change the value of that flag and have the thread eventually see the new value at some point independent of whatever else may be going on, then std::memory_order_relaxed is sufficient.
However, I see that your worker thread calls a process_next_element() function. That suggests that there is some mechanism through which the worker thread receives elements to process. I don't see any way for the thread to exit when all elements have been processed. What does process_next_element() do when there's no next element available right away? Does it just return immediately? In that case you've got yourself a busy wait for more input or cancellation, which will work but is probably not ideal.
Or does process_next_element() internally call some function that blocks until an element becomes available? If that is the case, then cancelling the thread would have to involve first setting the cancellation flag and then doing whatever is needed to make sure the call your thread is potentially blocking on returns. In this case, it's potentially essential that the thread can never miss the cancellation flag after the blocking call returns. Otherwise, you could have the call return, go back into the loop, still read the old value of the flag and then call process_next_element() again. If process_next_element() is guaranteed to just return again, then you're fine. If that is not the case, you have a deadlock.
So I believe it technically depends on what exactly process_next_element() does. One could imagine an implementation of process_next_element() where you would need more than relaxed memory order. However, if you already have a mechanism for fetching new elements to process, why even use a separate cancellation flag? You could simply handle cancellation through that same mechanism, e.g., by having it return a next element with a special value, or no element at all, to signal cancellation of processing and cause the thread to return instead of relying on a separate flag…
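Handling cancellation through the element-fetching mechanism itself could be sketched as follows, assuming C++17 for std::optional. WorkQueue, close() and pop() are hypothetical names, and the element type is fixed to int only to keep the example short. "No element at all" is represented by an empty optional:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

// Hypothetical work queue: close() is the cancellation signal, so no
// separate flag is needed.
class WorkQueue {
public:
    void push(int element) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(element);
        }
        filled_.notify_one();
    }

    // Cancels processing; elements that are still queued are discarded.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        filled_.notify_all();
    }

    // Blocks until an element is available or the queue is closed.
    // Returns std::nullopt once the queue has been closed.
    std::optional<int> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        filled_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (closed_) {
            return std::nullopt;
        }
        int element = queue_.front();
        queue_.pop();
        return element;
    }

private:
    std::mutex mutex_;
    std::condition_variable filled_;
    std::queue<int> queue_;
    bool closed_ = false;
};
```

The worker loop then reduces to something like `while (auto element = queue.pop()) process(*element);`, and cancelling means calling close() followed by join(). Because the closed state is read and written under the same mutex as the queue, the worker cannot go back to sleep after close() has been called.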

is it ok to access value(entry in thread safe map) pointed by pointer inside non-thread safe container?

For example,
// I am using thread safe map from
// code.google.com/p/thread-safe-stl-containers
#include <thread_safe_map.h>
class B{
public: // b1 has to be accessible through C[1]->b1 below
vector<int> b1;
};
//Thread safe map
thread_safe::map<int, B> A;
B b_object;
A[1] = b_object;
// Non thread safe map.
map<int, B*> C;
C[1] = &A[1]; // operator[] already returns a reference to the mapped B
So are following operations still thread safe?
Thread1:
for(int i=0; i<10000; i++) {
cout << C[1]->b1[i];
}
Thread2:
for(int i=0; i<10000; i++) {
C[1]->b1.push_back(i);
}
Is there any problem in the above code? If so how can I fix it?
Is it OK to access value(entry in thread safe map) pointed by pointer inside non-thread safe container?
No, what you are doing there is not safe. The way your thread_safe_map is implemented is to take a lock for the duration of every function call:
//Element Access
T & operator[]( const Key & x ) { boost::lock_guard<boost::mutex> lock( mutex ); return storage[x]; }
The lock is released as soon as the access function ends which means that any modification you make through the returned reference has no protection.
As well as being not entirely safe this method is very slow.
A safe(er), efficient, but highly experimental way to lock containers is proposed here: https://github.com/isocpp/CppCoreGuidelines/issues/924
with source code here https://github.com/galik/GSL/blob/lockable-objects/include/gsl/gsl_lockable (shameless self promotion disclaimer).
In general, STL containers can be accessed from multiple threads as long as all threads either:
read from the same container
modify elements in a thread safe manner
You cannot push_back (or erase, insert, etc.) from one thread and read from another thread. Suppose that you are trying to access an element in thread 1 while push_back in thread 2 is in the middle of reallocation of vector's storage. This might crash the application, might return garbage (or might work, if you're lucky).
The second bullet point applies to situations like this:
std::vector<std::atomic_int> elements(100); // sized up front; the vector itself is never resized
// Thread 1:
elements[10].store(5);
// Thread 2:
int v = elements[10].load();
In this case, you're concurrently reading and writing an atomic variable, but the vector itself is not modified - only its element is.
Edit: using thread_safe::map doesn't change anything in your case. While modifying the map is OK, modifying its elements is not. Putting a std::vector in a thread-safe collection doesn't automagically make it thread-safe too.
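One way to make the reader/writer pair from the question safe is to guard the vector itself, as in this sketch (C++17 for std::optional). SharedInts is a hypothetical name; the idea is simply that push_back and element reads go through the same mutex, and that a read returns a copy rather than a reference into storage that may be reallocated:

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical wrapper around a vector that is shared between threads.
class SharedInts {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_.push_back(value);
    }

    // Returns a copy of the element, or std::nullopt if 'index' is
    // out of range at the time of the call.
    std::optional<int> at(std::size_t index) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (index >= values_.size()) {
            return std::nullopt;
        }
        return values_[index];
    }

private:
    mutable std::mutex mutex_;  // mutable so that at() can stay const
    std::vector<int> values_;
};
```

The reading thread has to cope with at() returning no value, since the writer may not have produced that index yet.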

C++: Thread Safety in a Signal/Slot Library

I'm implementing a Signal/Slot framework, and got to the point that I want it to be thread-safe. I already had a lot of support from the Boost mailing-list, but since this is not really boost-related, I'll ask my pending question here.
When is a signal/slot implementation (or any framework that calls functions outside itself, specified in some way by the user) considered thread-safe? Should it be safe w.r.t. its own data, i.e. the data associated to its implementation details? Or should it also take into account the user's data, which might or might not be modified by whatever functions are passed to the framework?
This is an example given on the mailing-list (Edit: this is an example use-case --i.e. user code--. My code is behind the calls to the Emitter object):
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
In the above code, it might happen that Event is emitted, causing someFunction to be executed. If somePtr is non-null, but becomes null just after the if, but before the assignment, we're in trouble. From the point of view of thread2, this is not obvious because it is disconnecting someFunction before calling cleanupPtr.
I can see why this could potentially lead to trouble, but whose responsibility is this? Should my library protect the user from using it in every irresponsible but imaginable way?
I suspect there is no clearly good answer, but clarity will come from documenting the guarantees you wish to make about concurrent access to an Emitter object.
One level of guarantee, which to me is what is implied by a promise of thread safety, is that:
Concurrent operations on the object are guaranteed to leave the object in a consistent state (at least, from the point of view of the accessing threads.)
Non-commutative operations will be performed as if they were scheduled serially in some (unknown) order.
Then the question is, what does the emit method promise semantically: passing control to the connected routine, or evaluation of the function? If the former, then your work sounds like it is already done; if the latter, then the 'as-if ordered' requirement would mean that you need to enforce some level of synchronisation.
Users of the library can work with either, provided it is clear what is being promised.
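To make the "passing control" reading concrete, here is a sketch of an emitter that protects only its own data. SimpleEmitter and its id-based connect/disconnect are invented for this illustration and are not the asker's Emitter API. The slot list is copied under the lock and the slots are called after the lock has been released:

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical emitter: thread-safe with respect to its own slot list,
// but it makes no promise about what the slots themselves touch.
class SimpleEmitter {
public:
    using Slot = std::function<void()>;

    // Registers a slot and returns an id that disconnect() accepts.
    int connect(Slot slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        int id = nextId_++;
        slots_.emplace(id, std::move(slot));
        return id;
    }

    void disconnect(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.erase(id);
    }

    void emit() {
        // Copy the slots under the lock, then call them without it, so
        // that a slot may itself call connect() or disconnect().
        std::vector<Slot> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (const auto& entry : slots_) {
                snapshot.push_back(entry.second);
            }
        }
        for (const Slot& slot : snapshot) {
            slot();
        }
    }

private:
    std::mutex mutex_;
    std::map<int, Slot> slots_;
    int nextId_ = 0;
};
```

With this design a slot can still be running, or about to run, in another thread when disconnect() returns, which is exactly the situation in the example from the question. Promising that no slot runs after disconnect() returns would require extra synchronization between emit() and disconnect().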
Firstly the simplest possibility: If you don't claim your library to be thread-safe, you don't have to bother about this.
(But even) if you do:
In your example the user would have to take care of thread safety, since both functions would be dangerous even without your event system (IMHO, this is a pretty good way to determine who should take care of this kind of problem). A possible way for them to do this in C++11 could be:
#include <mutex>
// A mutex is used to control thread access to a shared resource
std::mutex _somePtr_mutex;
int* somePtr = nullptr;
void someFunction()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
if(somePtr)
*somePtr = 17;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
void cleanupPtr()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
The last question is easy. If you say your library is threadsafe, it should be threadsafe. It makes no sense to say it is partly threadsafe, or that it is only threadsafe if you do not abuse it; in that case you have to explain exactly what is not threadsafe.
Now to your first question regarding someFunction:
The operation is not atomic, which means the thread can be interrupted between the if and the assignment. And that will happen, I know that :-) The other thread can delete the pointer at any time, even between two short and fast-looking statements.
Now to cleanupPtr:
I am not a compiler expert, but declaring somePtr volatile only tells the compiler not to keep the value cached in a register. It does not make the accesses atomic, and it does not stop the compiler or the CPU from reordering them relative to other memory operations.
In standard C++, volatile is therefore not enough to synchronize a reader thread and a writer thread, even when the values exchanged between the threads are of built-in types.
For that you need a mutex or atomics. I will give you an example with a mutex. I use C++11 for that, but it works similarly in earlier versions of C++ using boost.
Using mutex:
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
std::recursive_mutex g_mutex;
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
I only added a recursive mutex here without changing any other code of the sample, even if it's now cargo code.
There are two kinds of mutex in std that matter here: std::mutex, which I find of little use, and std::recursive_mutex, which works the way you would expect a mutex to work. A std::mutex must not be locked again by the thread that already owns it (that is undefined behavior, and in practice usually a deadlock), which can happen if a method that needs mutex protection calls a public method that uses the same mutex. A std::recursive_mutex can be locked repeatedly by the same thread.
Atomics (or interlocks in win32) are another way, but only to exchange values between threads or access them concurrently. Your example is missing such values, but in your case, I would look a little deeper in them (std::atomic).
UPDATE
If you are the user of a library that is not explicitly declared threadsafe by its developer, treat it as not threadsafe and shield every call to it with a mutex lock.
To stick with the example: if you cannot change someFunction, then you have to wrap the function like this:
void threadsafeSomeFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
someFunction();
}

Mutex when writing to queue held in map for thread safety

I have a map<int, queue<int>> with one thread writing into it, i.e. pushing messages into the queues. The key refers to a client_id, and the queue holds messages for the client. I am looking to make this read-write thread safe.
Currently, the thread that writes into it does something like this
map<int, queue<int>> msg_map;
if (msg_map.find(client_id) == msg_map.end()) // no queue for this client yet
{
queue<int> dummy_queue;
dummy_queue.push(msg); //msg is an int
msg_map.insert(make_pair(client_id, dummy_queue));
}
else
{
msg_map[client_id].push(msg);
}
There are many clients reading - and removing - from this map.
if (msg_map.find(client_id) != msg_map.end())
{
if (!msg_map.find(client_id)->second.empty())
{
int msg_rxed = msg_map[client_id].front();
//processing message
msg_map[client_id].pop();
}
}
I am reading this on mutexes (haven't used them before) and I was wondering when and where I ought to lock the mutex. My confusion lies in the fact that they are accessing individual queues (held within the same map). Do I lock the queues, or the map?
Is there a standard/accepted way to do this - and is using a mutex the best way to do this? There are tens of client threads, and just that one single writing thread.
Simplifying and optimizing your code
For now we'll not concern ourselves with mutexes, we'll handle that later when the code is cleaned up a bit (it will be easier then).
First, from the code you showed there seems to be no reason to use an ordered std::map (logarithmic complexity), you could use the much more efficient std::unordered_map (average constant-time complexity). The choice is entirely up to you, if you don't need the container to be ordered you just have to change its declaration:
std::map<int, std::queue<int>> msg_map;
// or
std::unordered_map<int, std::queue<int>> msg_map; // C++11 only though
Now, maps are quite efficient by design but if you insist on doing lookups for each and every operation then you lose all the advantage of maps.
Concerning the writer thread, all your block of code (for the writer) can be efficiently replaced by just this line:
msg_map[client_id].push(msg);
Note that operator[] for both std::map and std::unordered_map is defined as:
Inserts a new element to the container using key as the key and a default constructed mapped value and returns a reference to the newly constructed mapped value. If an element with key key already exists, no insertion is performed and a reference to its mapped value is returned.
Concerning your reader threads, you can't directly use operator[] because it would create a new entry if none currently exists for a specific client_id so instead, you need to cache the iterator returned by find in order to reuse it and thus avoid useless lookups:
auto iter = msg_map.find(client_id);
// iter will be either std::map<int, std::queue<int>>::iterator
// or std::unordered_map<int, std::queue<int>>::iterator
if (iter != msg_map.end()) {
std::queue<int>& q = iter->second;
if (!q.empty()) {
int msg = q.front();
q.pop();
// process msg
}
}
The reason why I pop the message immediately, before processing it, is because it will improve concurrency when we add mutexes (we can unlock the mutex sooner, which is always good).
Making the code thread-safe
@hmjd's idea about multiple locks (one for the map, and one per queue) is interesting, but based on the code you showed us I disagree: any benefit you'll get from the additional concurrency will quite probably be negated by the additional time it takes to lock the queue mutexes (indeed, locking mutexes is a very expensive operation), not to mention the additional code complexity you'll have to handle. I'll bet my money on a single mutex (protecting the map and all the queues at once) being more efficient.
Incidentally, a single mutex solves the iterator invalidation problem if you want to use the more efficient std::unordered_map (std::map doesn't suffer from that problem though).
Assuming C++11, just declare a std::mutex along with your map:
std::mutex msg_map_mutex;
std::map<int, std::queue<int>> msg_map; // or std::unordered_map
Protecting the writer thread is quite straightforward, just lock the mutex before accessing the map:
std::lock_guard<std::mutex> lock(msg_map_mutex);
// the lock is held while the lock_guard object stays in scope
msg_map[client_id].push(msg);
Protecting the reader threads is barely any harder, the only trick is that you'll probably want to unlock the mutex ASAP in order to improve concurrency so you'll have to use std::unique_lock (which can be unlocked early) instead of std::lock_guard (which can only unlock when it goes out of scope):
std::unique_lock<std::mutex> lock(msg_map_mutex);
auto iter = msg_map.find(client_id);
if (iter != msg_map.end()) {
std::queue<int>& q = iter->second;
if (!q.empty()) {
int msg = q.front();
q.pop();
// assuming you don't need to access the map from now on, let's unlock
lock.unlock();
// process msg, other threads can access the map concurrently
}
}
If you can't use C++11, you'll have to replace std::mutex et al. with whatever your platform provides (pthreads, Win32, ...) or with the boost equivalent (which has the advantage of being as portable and as easy to use as the new C++11 classes, unlike the platform-specific primitives).
Read and write access to both the map and the queue needs to be synchronized, as both structures are being modified, including the map:
map<int, queue<int>> msg_map;
if (msg_map.find(client_id) == msg_map.end()) // no queue for this client yet
{
queue<int> dummy_queue;
dummy_queue.push(msg); //msg is an int
msg_map.insert(make_pair(client_id, dummy_queue)); // Modified here.
}
else
{
msg_map[client_id].push(msg); // Modified here.
}
Two options are a mutex that locks both the map and queue or have a mutex for the map and a mutex per queue. The second approach is preferable as it reduces the length of time a single lock is held and means multiple threads can be updating several queues concurrently.
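The second option (one mutex for the map plus one per queue) might be sketched like this, assuming C++17 for std::optional. ClientQueue and MessageMap are hypothetical names. The sketch relies on two assumptions that are not stated in the question: entries are never erased from the map, and the container is a std::map, whose references to existing elements stay valid when new elements are inserted.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <queue>

// Hypothetical per-client queue with its own mutex.
struct ClientQueue {
    std::mutex mutex;
    std::queue<int> messages;
};

class MessageMap {
public:
    void push(int client_id, int msg) {
        ClientQueue& cq = lookup_or_create(client_id);
        // Only this client's queue is locked while pushing.
        std::lock_guard<std::mutex> lock(cq.mutex);
        cq.messages.push(msg);
    }

    // Returns the oldest message for the client, or std::nullopt if the
    // client is unknown or its queue is empty.
    std::optional<int> try_pop(int client_id) {
        ClientQueue* cq = lookup(client_id);
        if (cq == nullptr) {
            return std::nullopt;
        }
        std::lock_guard<std::mutex> lock(cq->mutex);
        if (cq->messages.empty()) {
            return std::nullopt;
        }
        int msg = cq->messages.front();
        cq->messages.pop();
        return msg;
    }

private:
    // The map mutex is held only for the lookup or insertion itself.
    ClientQueue& lookup_or_create(int client_id) {
        std::lock_guard<std::mutex> lock(mapMutex_);
        return map_[client_id];
    }

    ClientQueue* lookup(int client_id) {
        std::lock_guard<std::mutex> lock(mapMutex_);
        auto it = map_.find(client_id);
        return it == map_.end() ? nullptr : &it->second;
    }

    std::mutex mapMutex_;
    std::map<int, ClientQueue> map_;  // assumption: entries are never erased
};
```

If clients can be removed from the map, the references returned by the lookup helpers are no longer safe to use after the map mutex is released, and the design needs shared ownership of the queues or a single mutex instead.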