`std::condition_variable::notify_all` deadlocks - c++

I have C++ code where one thread produces data, pushing it into a queue, and another consumes it before passing it to other libraries for processing.
std::mutex lock;
std::condition_variable new_data;
std::vector<uint8_t> pending_bytes;
bool data_done=false;
// producer
void add_bytes(size_t byte_count, const void *data)
{
if (byte_count == 0)
return;
std::lock_guard<std::mutex> guard(lock);
uint8_t *typed_data = (uint8_t *)data;
pending_bytes.insert(pending_bytes.end(), typed_data,
typed_data + byte_count);
new_data.notify_all();
}
void finish()
{
std::lock_guard<std::mutex> guard(lock);
data_done = true;
new_data.notify_all();
}
// consumer
Result *process(void)
{
data_processor = std::unique_ptr<Processor>(new Processor());
bool done = false;
while (!done)
{
std::unique_lock<std::mutex> guard(lock);
new_data.wait(guard, [&]() {return data_done || pending_bytes.size() > 0;});
size_t byte_count = pending_bytes.size();
std::vector<uint8_t> data_copy;
if (byte_count > 0)
{
data_copy = pending_bytes; // vector copies on assignment
pending_bytes.clear();
}
done = data_done;
guard.unlock();
if (byte_count > 0)
{
data_processor->process(byte_count, data_copy.data());
}
}
return data_processor->finish();
}
Where Processor is a rather involved class with a lot of multi-threaded processing, but as far as I can see it should be separated from the code above.
Now sometimes the code deadlocks, and I'm trying to figure out the race condition. My biggest clue is that the producer thread appears to be stuck inside notify_all(). In GDB I get the following backtrace, showing that notify_all is waiting on something:
[Switching to thread 3 (Thread 0x7fffe8d4c700 (LWP 45177))]
#0 0x00007ffff6a4654d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff6a44240 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x00007ffff67e1b29 in std::condition_variable::notify_all() () from /lib64/libstdc++.so.6
#3 0x0000000001221177 in add_bytes (data=0x7fffe8d4ba70, byte_count=256,
this=0x7fffc00dbb80) at Client/file.cpp:213
while also owning the lock
(gdb) p lock
$12 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 1, __count = 0, __owner = 45177, __nusers = 1, __kind = 0,
__spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
with the other thread waiting in the condition variable wait
[Switching to thread 5 (Thread 0x7fffe7d4a700 (LWP 45180))]
#0 0x00007ffff6a43a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff6a43a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007ffff67e1aec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x000000000121f9a6 in std::condition_variable::wait<[...]::{lambda()#1}>(std::
unique_lock<std::mutex>&, [...]::{lambda()#1}) (__p=..., __lock=...,
this=0x7fffc00dbb28) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/std_mutex.h:104
There are two other threads running inside the data-processing part, which also hang in pthread_cond_wait, but as far as I'm aware they do not share any synchronization primitives with the code above (they are just waiting for calls to processor->add_data or processor->finish).
Any ideas what notify_all is waiting for? Or ways of finding the culprit?
Edit: I reproduced the code with a dummy processor here:
https://onlinegdb.com/lp36ewyRSP
But, pretty much as expected, this doesn't reproduce the issue, so I assume there is something more intricate going on. Possibly just different timings, but maybe some interaction between condition_variable and OpenMP (used by the real processor) could cause this?

I also encountered the same problem. After a few experiments, I found that if notify_all is called after the condition_variable has been destroyed, notify_all deadlocks.
See the code below.
#include <iostream>
#include <condition_variable>
#include <thread>
#include <chrono>
std::thread* t;
void test() {
std::condition_variable cv;
std::mutex cv_m;
t = new std::thread([&](){
std::this_thread::sleep_for(std::chrono::seconds(3));
std::cout << "...before notify_all\n";
cv.notify_all();
std::cout << "...after notify_all\n";
});
std::unique_lock<std::mutex> lk(cv_m);
std::cout << "Waiting... \n";
cv.wait(lk, []{return true;});
std::cout << "...finished waiting\n";
}
int main()
{
test();
t->join();
}
On linux:
LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.3 (Final)
Release: 6.3
Codename: Final
uname info:
Linux xxx_name 3.10.0_3-0-0-34 #1 SMP Sun Apr 26 22:58:21 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
Compile the code using gcc 8.2.0:
g++ --std=c++11 test.cpp -o test_cond -lpthread
The program hangs after printing "...before notify_all" and never reaches "...after notify_all".
However, when the same code is compiled with gcc 12.1.0, the program runs successfully.
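In the reproduction above, `cv` and `cv_m` are locals of `test()`, so they are destroyed when `test()` returns, while the thread is still sleeping; the later `notify_all()` then touches a destroyed object. A minimal sketch of one way to avoid that, assuming the notifier thread can simply be joined before the condition variable goes out of scope (`wait_for_notifier` and the `ready` flag are hypothetical names, not part of the original code):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical variant of test(): returns true once the waiter has been woken.
bool wait_for_notifier()
{
    std::condition_variable cv;
    std::mutex cv_m;
    bool ready = false;

    std::thread notifier([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        {
            std::lock_guard<std::mutex> lk(cv_m);
            ready = true;
        }
        cv.notify_all();
    });

    {
        std::unique_lock<std::mutex> lk(cv_m);
        cv.wait(lk, [&] { return ready; });
    }

    // Join before cv and cv_m go out of scope, so notify_all() can never
    // run on a destroyed condition variable.
    notifier.join();
    return ready;
}
```

The same rule applies when the condition variable is a class member: the owning object has to outlive every thread that may still call `notify_all()` on it.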

It seems to me that you should unlock your mutex in the producer before the call to notify_all (https://en.cppreference.com/w/cpp/thread/condition_variable)
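A minimal sketch of what this suggestion looks like, reusing the names from the question (`lock`, `new_data`, `pending_bytes`, `data_done`). The `ByteQueue` wrapper, `consume_all` and `run_demo` are hypothetical helpers added for illustration. Notifying while the mutex is held is also valid C++, so this is a common tidy-up rather than a confirmed fix for the hang:

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around the question's shared state.
struct ByteQueue {
    std::mutex lock;
    std::condition_variable new_data;
    std::vector<std::uint8_t> pending_bytes;
    bool data_done = false;

    // Producer: change shared state under the mutex, release it, then notify.
    void add_bytes(std::size_t byte_count, const void *data) {
        if (byte_count == 0)
            return;
        {
            std::lock_guard<std::mutex> guard(lock);
            const auto *typed_data = static_cast<const std::uint8_t *>(data);
            pending_bytes.insert(pending_bytes.end(), typed_data,
                                 typed_data + byte_count);
        } // mutex released here
        new_data.notify_all();
    }

    void finish() {
        {
            std::lock_guard<std::mutex> guard(lock);
            data_done = true;
        }
        new_data.notify_all();
    }

    // Consumer: drains the queue until finish() has been seen and returns
    // the total number of bytes received.
    std::size_t consume_all() {
        std::size_t total = 0;
        bool done = false;
        while (!done) {
            std::vector<std::uint8_t> chunk;
            {
                std::unique_lock<std::mutex> guard(lock);
                new_data.wait(guard, [this] {
                    return data_done || !pending_bytes.empty();
                });
                chunk.swap(pending_bytes); // take the data without copying
                done = data_done;
            }
            total += chunk.size(); // stand-in for the real processing
        }
        return total;
    }
};

// Hypothetical driver: one producer thread sends `chunks` blocks of 256 bytes.
std::size_t run_demo(int chunks) {
    ByteQueue queue;
    std::thread producer([&queue, chunks] {
        std::uint8_t block[256] = {};
        for (int i = 0; i < chunks; ++i)
            queue.add_bytes(sizeof block, block);
        queue.finish();
    });
    const std::size_t total = queue.consume_all();
    producer.join(); // queue outlives the producer thread
    return total;
}
```

Calling `run_demo(100)` should return the number of bytes pushed, since the consumer only stops after it has observed `data_done` and drained whatever was pending at that point.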

Related

Embedding python interpreter into static library

I have a function that I want to put into a static library:
run_me.h:
void run_me();
run_me.cpp:
namespace {
std::mutex python_mutex;
struct PythonScopedLock {
PythonScopedLock() : _lock(python_mutex) {
PyEval_AcquireLock();
}
~PythonScopedLock() {
PyEval_ReleaseLock();
}
private:
std::lock_guard<decltype(python_mutex)> _lock;
};
struct PythonInterpreter {
PythonInterpreter() {
Py_InitializeEx(0);
PyEval_InitThreads();
PyEval_ReleaseLock();
}
~PythonInterpreter() {
PyEval_AcquireLock();
Py_Finalize();
}
};
const PythonInterpreter python_interpreter;
} //namespace
void run_me()
{
PythonScopedLock lock;
// import python module and call some python code
}
Next, in the following executable, I get a crash almost instantly on PyObject_Malloc.
#include <tbb/tbb.h>
#include "run_me.h"
int main(){
using namespace tbb;
parallel_for(blocked_range<size_t>(0, 1234, 1), [&](const blocked_range<size_t>& r) {
for (auto it = r.begin(); it != r.end(); ++it) {
run_me();
}
});
}
I checked the threads with gdb and they are nicely waiting for lock to be unlocked, but one of them crashes:
1 Thread 0x7ffff7fdb900 (LWP 21474) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
2 Thread 0x7fffef349700 (LWP 21483) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
3 Thread 0x7fffeef48700 (LWP 21484) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
* 4 Thread 0x7fffeeb47700 (LWP 21485) "tests" PyObject_Malloc (nbytes=41) at Objects/obmalloc.c:831
5 Thread 0x7fffee746700 (LWP 21486) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
6 Thread 0x7fffedf44700 (LWP 21488) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
7 Thread 0x7fffee345700 (LWP 21487) "tests" 0x00007ffff69374ed in __lll_lock_wait () from /lib64/libpthread.so.0
The situation changes when I move PythonScopedLock out of the body of run_me and put it into the lambda body. This runs just fine:
#include <tbb/tbb.h>
#include "run_me.h"
namespace {
std::mutex python_mutex;
struct PythonScopedLock {
PythonScopedLock() : _lock(python_mutex) {
PyEval_AcquireLock();
}
~PythonScopedLock() {
PyEval_ReleaseLock();
}
private:
std::lock_guard<decltype(python_mutex)> _lock;
};
} // namespace
int main(){
using namespace tbb;
parallel_for(blocked_range<size_t>(0, 1234, 1), [&](const blocked_range<size_t>& r) {
for (auto it = r.begin(); it != r.end(); ++it) {
PythonScopedLock lock;
run_me();
}
});
}
I have a few questions:
1. What is the difference, in this context, between having the lock inside the lambda and having it inside the body of the function? To me it's the same; I just want to hide that lock inside the body of the function.
2. What are possible causes of the crash?
3. What other problems, besides the static initialisation fiasco, can I encounter when the interpreter is initialised through a variable in an anonymous namespace, i.e. through const PythonInterpreter python_interpreter;?
EDIT
After a bit of debugging I noticed that when the lock is inside the run_me definition, in the static library, PyGILState_GetThisThreadState returns NULL. When the lock is outside, in the main thread, it returns the same pointer as _PyThreadState_Current. That more or less answers questions 1 and 2.
Question 3 still remains open.

boost::function deallocation segmentation fault in thread pool

I'm trying to make a thread pool that blocks the main thread until all its children have completed. The real-world use-case for this is a "Controller" process that spawns independent processes for the user to interact with.
Unfortunately, when the main exits, a segmentation fault is encountered. I cannot figure out the cause of this segmentation fault.
I've authored a Process class which is little more than opening a shell script (called waiter.sh that contains a sleep 5) and waiting for the pid to exit. The Process class is initialized and then the Wait() method is placed in one of the threads in the thread pool.
The problem arises when ~thread_pool() is called. The std::queue cannot properly deallocate the boost::function passed to it, even though the reference to Process is still valid.
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <queue>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
extern char **environ;
class Process {
private:
pid_t pid;
int status;
public:
Process() : status(0), pid(-1) {
}
~Process() {
std::cout << "calling ~Process" << std::endl;
}
void Spawn(char **argv) {
// spawn with posix_spawn; Wait() later waits for the pid to exit
status = posix_spawn(&pid, "waiter.sh", NULL, NULL, argv, environ);
if (status != 0) {
perror("unable to spawn");
return;
}
}
void Wait() {
std::cout << "spawned proc with " << pid << std::endl;
waitpid(pid, &status, 0);
// wait(&pid);
std::cout << "wait complete" << std::endl;
}
};
Below is the thread_pool class. This is loosely adapted from the accepted answer for this question
class thread_pool {
private:
std::queue<boost::function<void() >> tasks;
boost::thread_group threads;
std::size_t available;
boost::mutex mutex;
boost::condition_variable condition;
bool running;
public:
thread_pool(std::size_t pool_size) : available(pool_size), running(true) {
std::cout << "creating " << pool_size << " threads" << std::endl;
for (std::size_t i = 0; i < available; ++i) {
threads.create_thread(boost::bind(&thread_pool::pool_main, this));
}
}
~thread_pool() {
std::cout << "~thread_pool" << std::endl;
{
boost::unique_lock<boost::mutex> lock(mutex);
running = false;
condition.notify_all();
}
try {
threads.join_all();
} catch (const std::exception &) {
// suppress exceptions
}
}
template <typename Task>
void run_task(Task task) {
boost::unique_lock<boost::mutex> lock(mutex);
if (0 == available) {
return; //\todo err
}
--available;
tasks.push(boost::function<void()>(task));
condition.notify_one();
return;
}
private:
void pool_main() {
// wait on condition variable while the task is empty and the pool is still
// running
boost::unique_lock<boost::mutex> lock(mutex);
while (tasks.empty() && running) {
condition.wait(lock);
}
// copy task locally and remove from the queue. this is
// done within its own scope so that the task object is destructed
// immediately after running the task. This is useful in the
// event that the function contains shared_ptr arguments
// bound via 'bind'
{
auto task = tasks.front();
tasks.pop();
lock.unlock();
// run the task
try {
std::cout << "running task" << std::endl;
task();
} catch (const std::exception &) {
// suppress
}
}
// task has finished so increment count of available threads
lock.lock();
++available;
}
};
Here is the main:
int main() {
// input arguments are not required
char *argv[] = {NULL};
Process process;
process.Spawn(argv);
thread_pool pool(5);
pool.run_task(boost::bind(&Process::Wait, &process));
return 0;
}
The output for this is
creating 5 threads
~thread_pool
I am waiting... (from waiting.sh)
running task
spawned proc with 2573
running task
running task
running task
running task
wait complete
Segmentation fault (core dumped)
And here is the stack trace:
Starting program: /home/jandreau/NetBeansProjects/Controller/dist/Debug/GNU-Linux/controller
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
creating 5 threads
[New Thread 0x7ffff691d700 (LWP 2600)]
[New Thread 0x7ffff611c700 (LWP 2601)]
[New Thread 0x7ffff591b700 (LWP 2602)]
[New Thread 0x7ffff511a700 (LWP 2603)]
[New Thread 0x7ffff4919700 (LWP 2604)]
~thread_pool
running task
running task
spawned proc with 2599
[Thread 0x7ffff611c700 (LWP 2601) exited]
running task
[Thread 0x7ffff591b700 (LWP 2602) exited]
running task
[Thread 0x7ffff511a700 (LWP 2603) exited]
running task
[Thread 0x7ffff4919700 (LWP 2604) exited]
I am waiting...
wait complete
[Thread 0x7ffff691d700 (LWP 2600) exited]
Thread 1 "controller" received signal SIGSEGV, Segmentation fault.
0x000000000040f482 in boost::detail::function::basic_vtable0<void>::clear (
this=0xa393935322068, functor=...)
at /usr/include/boost/function/function_template.hpp:509
509 if (base.manager)
(gdb) where
#0 0x000000000040f482 in boost::detail::function::basic_vtable0<void>::clear (
this=0xa393935322068, functor=...)
at /usr/include/boost/function/function_template.hpp:509
#1 0x000000000040e263 in boost::function0<void>::clear (this=0x62ef50)
at /usr/include/boost/function/function_template.hpp:883
#2 0x000000000040cf20 in boost::function0<void>::~function0 (this=0x62ef50,
__in_chrg=<optimized out>)
at /usr/include/boost/function/function_template.hpp:765
#3 0x000000000040b28e in boost::function<void ()>::~function() (
this=0x62ef50, __in_chrg=<optimized out>)
at /usr/include/boost/function/function_template.hpp:1056
#4 0x000000000041193a in std::_Destroy<boost::function<void ()> >(boost::function<void ()>*) (__pointer=0x62ef50)
at /usr/include/c++/5/bits/stl_construct.h:93
#5 0x00000000004112df in std::_Destroy_aux<false>::__destroy<boost::function<void ()>*>(boost::function<void ()>*, boost::function<void ()>*) (
__first=0x62ef50, __last=0x62ed50)
at /usr/include/c++/5/bits/stl_construct.h:103
#6 0x0000000000410d16 in std::_Destroy<boost::function<void ()>*>(boost::function<void ()>*, boost::function<void ()>*) (__first=0x62edd0, __last=0x62ed50)
at /usr/include/c++/5/bits/stl_construct.h:126
#7 0x0000000000410608 in std::_Destroy<boost::function<void ()>*, boost::function<void ()> >(boost::function<void ()>*, boost::function<void ()>*, std::allocator<boost::function<void ()> >&) (__first=0x62edd0, __last=0x62ed50)
at /usr/include/c++/5/bits/stl_construct.h:151
#8 0x000000000040fac5 in std::deque<boost::function<void ()>, std::allocator<boost::function<void ()> > >::_M_destroy_data_aux(std::_Deque_iterator<boost::function<void ()>, boost::function<void ()>&, boost::function<void ()>*>, std::_Deque_iterator<boost::function<void ()>, boost::function<void ()>&, boost::function<void ()>*>) (this=0x7fffffffdaf0, __first=..., __last=...)
at /usr/include/c++/5/bits/deque.tcc:845
#9 0x000000000040e6e4 in std::deque<boost::function<void ()>, std::allocator<boost::function<void ()> > >::_M_destroy_data(std::_Deque_iterator<boost::function<void ()>, boost::function<void ()>&, boost::function<void ()>*>, std::_Deque_iterator<boost::function<void ()>, boost::function<void ()>&, boost::function<void ()>*>, std::allocator<boost::function<void ()> > const&) (
this=0x7fffffffdaf0, __first=..., __last=...)
at /usr/include/c++/5/bits/stl_deque.h:2037
#10 0x000000000040d0c8 in std::deque<boost::function<void ()>, std::allocator<boost::function<void ()> > >::~deque() (this=0x7fffffffdaf0,
__in_chrg=<optimized out>) at /usr/include/c++/5/bits/stl_deque.h:1039
#11 0x000000000040b3ce in std::queue<boost::function<void ()>, std::deque<boost::function<void ()>, std::allocator<boost::function<void ()> > > >::~queue() (
this=0x7fffffffdaf0, __in_chrg=<optimized out>)
at /usr/include/c++/5/bits/stl_queue.h:96
#12 0x000000000040b6c0 in thread_pool::~thread_pool (this=0x7fffffffdaf0,
__in_chrg=<optimized out>) at main.cpp:63
#13 0x0000000000408b60 in main () at main.cpp:140
I'm puzzled by this because the Process hasn't yet gone out of scope and I'm passing a copy of the boost::function<void()> to the thread pool for processing.
Any ideas?
The stack trace indicates that you are destroying a boost::function that has not been properly initialized (e.g. some random memory location that is treated as being a boost::function) or that you are destroying a boost::function twice.
The problem is that your program pushes to tasks only once, but pops five times, hence you remove elements from an empty deque, which is undefined behaviour.
The while loop in pool_main terminates if running is false, and running may be false even if the deque is empty. Then you pop unconditionally. You might consider correcting pool_main as follows:
void pool_main() {
// wait on the condition variable while the task queue is empty and the
// pool is still running
boost::unique_lock<boost::mutex> lock(mutex);
while (tasks.empty() && running) {
condition.wait(lock);
}
// running may have become false while the queue is still empty, so only
// pop if there actually is a task
if (!tasks.empty()) { // <--- !!!!!!!!!!!!!!!!!!!!!!!!
// copy task locally and remove from the queue. this is
// done within its own scope so that the task object is destructed
// immediately after running the task. This is useful in the
// event that the function contains shared_ptr arguments
// bound via 'bind'
{
auto task = tasks.front();
tasks.pop();
lock.unlock();
// run the task
try {
std::cout << "running task" << std::endl;
task();
} catch (const std::exception &) {
// suppress
}
}
// task has finished so increment count of available threads. The lock
// was released only on this path, so it is re-acquired only here;
// calling lock() on a boost::unique_lock that already owns the mutex
// throws boost::lock_error
lock.lock();
++available;
}
}
I am, however, not sure whether the logic regarding available is correct. Shouldn't available be decremented on starting the processing of a task and be incremented when it is finished (hence be changed within pool_main only and only within the newly introduced if clause)?
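A minimal sketch of that bookkeeping, written with the standard library equivalents (`std::thread`, `std::mutex`, `std::condition_variable`) instead of Boost, which is a substitution made here so the example is self-contained. It assumes that each worker should loop over tasks rather than exit after one, and `count_tasks_run` is a hypothetical helper for trying it out:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class thread_pool {
public:
    explicit thread_pool(std::size_t pool_size) : available(pool_size) {
        for (std::size_t i = 0; i < pool_size; ++i)
            threads.emplace_back([this] { pool_main(); });
    }

    ~thread_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            running = false;
        }
        condition.notify_all();
        for (auto &t : threads)
            t.join();
    }

    void run_task(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(std::move(task));
        }
        condition.notify_one();
    }

private:
    void pool_main() {
        std::unique_lock<std::mutex> lock(mutex);
        for (;;) {
            condition.wait(lock, [this] { return !tasks.empty() || !running; });
            if (tasks.empty())
                return; // shutting down with nothing queued: never pop an empty queue
            std::function<void()> task = std::move(tasks.front());
            tasks.pop();
            --available; // this worker is now busy
            lock.unlock();
            try {
                task();
            } catch (const std::exception &) {
                // suppress
            }
            lock.lock();
            ++available; // this worker is idle again
        }
    }

    std::mutex mutex;
    std::condition_variable condition;
    std::queue<std::function<void()>> tasks;
    std::size_t available;
    bool running = true;
    std::vector<std::thread> threads;
};

// Hypothetical helper: queues task_count trivial tasks and reports how many ran.
int count_tasks_run(std::size_t pool_size, int task_count) {
    std::atomic<int> counter{0};
    {
        thread_pool pool(pool_size);
        for (int i = 0; i < task_count; ++i)
            pool.run_task([&counter] { ++counter; });
    } // the destructor lets the workers drain the queue, then joins them
    return counter.load();
}
```

With this layout `available` is only touched inside `pool_main`, under the mutex, and only on the path where a task was actually taken from the queue.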
You don't seem to be allocating memory for
extern char **environ;
anywhere. Though wouldn't that be a link error?
Cutting this back to be a minimal reproduction case would help a lot. There's a lot of code here that's presumably not necessary to reproduce the problem.
Also, what is this:
// suppress exceptions
If you are getting exceptions while joining your threads, then you presumably haven't joined them all and cleaning up the threads without joining them will cause an error after main exits.

thread sanitizer is not showing data race

I tried the program given here
small_race.c
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
Global = 43;
pthread_join(t, NULL);
return Global;
}
compilation
$ clang -fsanitize=thread -g -pthread -O1 small_race.c
$ ./a.out ==> no data race is reported; the program exits normally
I also tried creating two more threads, and sleeping in one of the threads, but it still runs without any report. I am using Debian.
Something is wrong with your platform or installation. With your exact code, I get:
==================
WARNING: ThreadSanitizer: data race (pid=20087)
Write of size 4 at 0x000000601080 by thread T1:
#0 Thread1(void*) /tmp/a.cpp:4 (a2+0x000000400a7f)
#1 <null> <null> (libtsan.so.0+0x0000000235b9)
Previous write of size 4 at 0x000000601080 by main thread:
#0 main /tmp/a.cpp:10 (a2+0x000000400ac5)
Location is global '<null>' of size 0 at 0x000000000000 (a2+0x000000601080)
Thread T1 (tid=20089, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027a67)
#1 main /tmp/a.cpp:9 (a2+0x000000400abb)
SUMMARY: ThreadSanitizer: data race /tmp/a.cpp:4 Thread1(void*)
==================

packaged_task hanging on operator()

Compiling with gcc 4.7.2 on Ubuntu, compiled with -std=c++11 -O0 -pthread, I somehow created a deadlock in code that doesn't seem like it should ever run into that problem. I have a thread which just acquires a lock and then runs through a vector<function<void()>>, calling everything. Meanwhile, the main thread pushes std::packaged_task<int()>s onto it one-by-one and blocks on when that task's future returns. The tasks themselves are trivial (print and return).
Here is the full code. Running the app sometimes succeeds, but within a few tries will hang:
#include <iostream>
#include <future>
#include <thread>
#include <vector>
#include <functional>
std::unique_lock<std::mutex> lock() {
static std::mutex mtx;
return std::unique_lock<std::mutex>{mtx};
}
int main(int argc, char** argv)
{
std::vector<std::function<void()>> messages;
std::atomic<bool> running{true};
std::thread thread = std::thread([&]{
while (running) {
auto lk = lock();
std::cout << "[T] locked with " << messages.size() << " messages." << std::endl;
for (auto& fn: messages) {
fn();
}
messages.clear();
}
});
for (int i = 0; i < 1000000; ++i) {
std::packaged_task<int()> task([=]{
std::cout << "[T] returning " << i << std::endl;
return i;
});
{
auto lk = lock();
messages.emplace_back(std::ref(task));
}
task.get_future().get();
}
running = false;
thread.join();
}
Sample output:
[T] returning 127189
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
[T] returning 127190
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
[T] returning 127191
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 0 messages.
[T] locked with 1 messages.
... hangs forever ...
What's going on? Why does the call into packaged_task::operator() hang? Where is the deadlock? Is this a gcc bug?
[update] Upon deadlock, the two threads are at:
Thread 1 (line 39 is the task.get_future().get() line):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00007feb01fe800c in __gthread_cond_wait (this=Unhandled dwarf expression opcode 0xf3
)
at [snip]/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:879
#2 std::condition_variable::wait (this=Unhandled dwarf expression opcode 0xf3
) at [snip]/gcc-4.7.2/libstdc++-v3/src/c++11/condition_variable.cc:52
#3 0x0000000000404aff in void std::condition_variable::wait<std::__future_base::_State_base::wait()::{lambda()#1}>(std::unique_lock<std::mutex>&, std::__future_base::_State_base::wait()::{lambda()#1}) (this=0x6111e0, __lock=..., __p=...)
at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/condition_variable:93
#4 0x0000000000404442 in std::__future_base::_State_base::wait (this=0x6111a8)
at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:331
#5 0x00000000004060fb in std::__basic_future<int>::_M_get_result (this=0x7fffc451daa0)
at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:601
#6 0x0000000000405488 in std::future<int>::get (this=0x7fffc451daa0)
at [snip]gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:680
#7 0x00000000004024dc in main (argc=1, argv=0x7fffc451dbb8) at test.cxx:39
and Thread 2 (line 22 is the fn() line):
#0 pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:95
#1 0x00000000004020f6 in __gthread_once (__once=0x611214, __func=0x401e68 <__once_proxy@plt>)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/gthr-default.h:718
#2 0x0000000000404db1 in void std::call_once<void (std::__future_base::_State_base::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&), std::__future_base::_State_base* const, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()> >, std::reference_wrapper<bool> >(std::once_flag&, void (std::__future_base::_State_base::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&), std::__future_base::_State_base* const&&, std::reference_wrapper<std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()> >&&, std::reference_wrapper<bool>&&) (__once=..., __f=@0x7feb014fdc10)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/mutex:819
#3 0x0000000000404517 in std::__future_base::_State_base::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>, bool) (this=0x6111a8, __res=..., __ignore_failure=false)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:362
#4 0x0000000000407af0 in std::__future_base::_Task_state<int ()()>::_M_run() (this=0x6111a8)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:1271
#5 0x00000000004076cc in std::packaged_task<int ()()>::operator()() (this=0x7fffc451da30)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/future:1379
#6 0x000000000040745a in std::_Function_handler<void ()(), std::reference_wrapper<std::packaged_task<int ()()> > >::_M_invoke(std::_Any_data const&) (
__functor=...) at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1956
#7 0x00000000004051f2 in std::function<void ()()>::operator()() const (this=0x611290)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:2311
#8 0x000000000040232f in operator() (__closure=0x611040) at test.cxx:22
#9 0x0000000000403d8e in _M_invoke<> (this=0x611040)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1598
#10 0x0000000000403cdb in operator() (this=0x611040)
at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1586
#11 0x0000000000403c74 in _M_run (this=0x611028) at [snip]/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/thread:115
#12 0x00007feb01feae10 in execute_native_thread_routine (__p=Unhandled dwarf expression opcode 0xf3
) at [snip]/gcc-4.7.2/libstdc++-v3/src/c++11/thread.cc:73
#13 0x00007feb018879ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007feb015e569d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()
It seems that the problem is that you possibly destroy the packaged_task before operator() has returned in the worker thread. This is most likely undefined behaviour. The program works fine for me if I re-acquire the mutex in the loop after waiting for the future to return a result. This serializes operator() and the destructor of the packaged_task.
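A minimal sketch of that re-acquire step, assuming a reduced version of the question's program: the printing is dropped, the mutex is a plain local instead of the `lock()` helper, and `run_tasks` is a hypothetical wrapper that returns the sum of the task results so the outcome can be checked:

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: pushes `count` packaged_tasks through a worker thread
// and returns the sum of their results.
long run_tasks(int count)
{
    std::mutex mtx;
    std::vector<std::function<void()>> messages;
    std::atomic<bool> running{true};

    std::thread worker([&] {
        while (running) {
            {
                std::lock_guard<std::mutex> lk(mtx);
                for (auto &fn : messages)
                    fn();
                messages.clear();
            }
            std::this_thread::yield(); // give the main thread a chance to lock
        }
    });

    long sum = 0;
    for (int i = 0; i < count; ++i) {
        std::packaged_task<int()> task([i] { return i; });
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lk(mtx);
            messages.emplace_back(std::ref(task));
        }
        sum += result.get();
        {
            // The worker holds mtx for the whole fn() call and the clear(),
            // so once this lock succeeds operator() has returned and the
            // worker no longer refers to `task`.
            std::lock_guard<std::mutex> lk(mtx);
        }
    } // `task` is destroyed here, after the worker is done with it

    running = false;
    worker.join();
    return sum;
}
```

The empty locked scope after `result.get()` is the only change relative to the question's loop; it makes the main thread wait until the worker has left its critical section before the `packaged_task` goes out of scope.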
I can't explain why your code was broken, but I did find a way to fix it (storing tasks, not std::functions constructed from tasks):
#include <iostream>
#include <future>
#include <thread>
#include <vector>
#include <functional>
#include <unistd.h>
int main(int argc, char** argv)
{
// Let's face it - your lock() function was kinda weird.
std::mutex mtx;
// I've changed this to a vector of tasks, from a vector
// of functions. Seems to have done the job. Not sure exactly
// why but this seems to be the proper way to go.
std::vector<std::packaged_task<int()>> messages;
std::atomic<bool> running{true};
std::thread thread([&]{
while (running) {
std::unique_lock<std::mutex> l{mtx};
std::cout << "[T] locked with " << messages.size() << " messages." << std::endl;
for (auto& fn: messages) {
fn();
}
messages.clear();
}
});
for (int i = 0; i < 1000000; ++i) {
std::packaged_task<int()> task([i]{
std::cout << "[T] returning " << i << std::endl;
return i;
});
// Without grabbing this now, if the thread executed fn()
// before I do f.get() below, it complained about having
// no shared state.
std::future<int> f = task.get_future();
{
std::unique_lock<std::mutex> l{mtx};
messages.emplace_back(std::move(task));
}
f.get();
}
running = false;
thread.join();
}
At the very least, if this code also deadlocks, then it hasn't yet for me.

GCC's TSAN reports a data race with a thread safe static local

I wrote the following toy example:
std::map<char, size_t> getMap(const std::string& s)
{
std::map<char, size_t> map;
size_t i = 0;
for (const char * b = s.data(), *end = b + s.size(); b != end; ++b)
{
map[*b] = i++;
}
return map;
}
void check(const std::string& s)
{
//The creation of the map should be thread safe according to the C++11 rules.
static const auto map = getMap("12abcd12ef");
//Now we can read the map concurrently.
size_t n = 0;
for (const char* b = s.data(), *end = b + s.size(); b != end; ++b)
{
auto iter = map.find(*b);
if (iter != map.end())
{
n += iter->second;
}
}
std::cout << "check(" << s << ")=" << n << std::endl;
}
int main()
{
std::thread t1(check, "abc");
std::thread t2(check, "def");
t1.join();
t2.join();
return 0;
}
According to the C++11 standard, this should not contain any data race (cf. this post).
However TSAN with gcc 4.9.2, reports a data race:
==================
WARNING: ThreadSanitizer: data race (pid=14054)
Read of size 8 at 0x7f409f5a3690 by thread T2:
#0 TestServer::check(std::string const&) <null>:0 (TestServer+0x0000000cc30a)
#1 std::thread::_Impl<std::_Bind_simple<void (*(char const*))(std::string const&)> >::_M_run() <null>:0 (TestServer+0x0000000cce37)
#2 execute_native_thread_routine ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/thread.cc:84 (libstdc++.so.6+0x0000000b5bdf)
Previous write of size 8 at 0x7f409f5a3690 by thread T1:
#0 TestServer::getMap(std::string const&) <null>:0 (TestServer+0x0000000cc032)
#1 TestServer::check(std::string const&) <null>:0 (TestServer+0x0000000cc5dd)
#2 std::thread::_Impl<std::_Bind_simple<void (*(char const*))(std::string const&)> >::_M_run() <null>:0 (TestServer+0x0000000cce37)
#3 execute_native_thread_routine ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/thread.cc:84 (libstdc++.so.6+0x0000000b5bdf)
Location is global 'TestServer::check(std::string const&)::map' of size 48 at 0x7f409f5a3680 (TestServer+0x00000062b690)
Thread T2 (tid=14075, running) created by main thread at:
#0 pthread_create ../../../../gcc-4.9.2/libsanitizer/tsan/tsan_interceptors.cc:877 (libtsan.so.0+0x000000047c03)
#1 __gthread_create /home/Guillaume/Compile/objdir/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:662 (libstdc++.so.6+0x0000000b5d00)
#2 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/thread.cc:142 (libstdc++.so.6+0x0000000b5d00)
#3 TestServer::main() <null>:0 (TestServer+0x0000000ae914)
#4 StarQube::runSuite(char const*, void (*)()) <null>:0 (TestServer+0x0000000ce328)
#5 main <null>:0 (TestServer+0x0000000ae8bd)
Thread T1 (tid=14074, finished) created by main thread at:
#0 pthread_create ../../../../gcc-4.9.2/libsanitizer/tsan/tsan_interceptors.cc:877 (libtsan.so.0+0x000000047c03)
#1 __gthread_create /home/Guillaume/Compile/objdir/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:662 (libstdc++.so.6+0x0000000b5d00)
#2 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) ../../../../../gcc-4.9.2/libstdc++-v3/src/c++11/thread.cc:142 (libstdc++.so.6+0x0000000b5d00)
#3 TestServer::main() <null>:0 (TestServer+0x0000000ae902)
#4 StarQube::runSuite(char const*, void (*)()) <null>:0 (TestServer+0x0000000ce328)
#5 main <null>:0 (TestServer+0x0000000ae8bd)
SUMMARY: ThreadSanitizer: data race ??:0 TestServer::check(std::string const&)
==================
What is wrong here?
Is TSan buggy? (When I am using Clang's toolchain, I get no data race report.)
Does GCC emit code which is not thread safe? (I am not using -fno-threadsafe-statics, though.)
Is my understanding of static locals incorrect?
I believe this is a bug in the part of GCC that generates the instrumentation for TSan.
I tried this:
#include <thread>
#include <iostream>
#include <string>
std::string message()
{
static std::string msg("hi");
return msg;
}
int main()
{
std::thread t1([]() { std::cout << message() << "\n"; });
std::thread t2([]() { std::cout << message() << "\n"; });
t1.join();
t2.join();
}
If you look at the code generated by Clang and GCC, the initialization itself is fine:
__cxa_guard_acquire is called in both cases on the path that initializes the static local variable. The problem is in the check that decides whether msg needs to be initialized at all.
The code looks like this:
if (atomic_flag/*uint8_t*/) {
lock();
call_constructor_of_msg();
unlock();
}
In the case of Clang, callq __tsan_atomic8_load is generated for that check,
but in the case of GCC it generates callq __tsan_read1.
Note that these calls only annotate the real memory operations;
they do not perform the operations themselves.
So at run time the TSan runtime library sees a plain, non-atomic read of the guard, concludes that something is wrong,
and reports a data race. I reported the problem here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68338
and it looks like it is fixed in trunk, but not in the current stable release of GCC (5.2).