I tried the program given here
small_race.c
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
Global = 43;
pthread_join(t, NULL);
return Global;
}
compilation
$ clang -fsanitize=thread -g -pthread -O1 small_race.c
$ ./a.out ==> no error is reported; it passes successfully
I also tried creating two more threads and sleeping in one of them, but it still passes. I am using Debian.
Something is wrong with your platform or installation. With your exact code, I get:
==================
WARNING: ThreadSanitizer: data race (pid=20087)
Write of size 4 at 0x000000601080 by thread T1:
#0 Thread1(void*) /tmp/a.cpp:4 (a2+0x000000400a7f)
#1 <null> <null> (libtsan.so.0+0x0000000235b9)
Previous write of size 4 at 0x000000601080 by main thread:
#0 main /tmp/a.cpp:10 (a2+0x000000400ac5)
Location is global '<null>' of size 0 at 0x000000000000 (a2+0x000000601080)
Thread T1 (tid=20089, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000027a67)
#1 main /tmp/a.cpp:9 (a2+0x000000400abb)
SUMMARY: ThreadSanitizer: data race /tmp/a.cpp:4 Thread1(void*)
==================
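One way to narrow down an installation problem like this is to check whether the binary was actually built with ThreadSanitizer instrumentation. The following is a minimal sketch, not a definitive diagnostic: the macro name BUILT_WITH_TSAN and the helpers built_with_tsan and report_tsan_status are made up for illustration. It relies on Clang's __has_feature(thread_sanitizer) and on GCC's __SANITIZE_THREAD__ macro.

```cpp
#include <cstdio>

// Clang reports sanitizer state through __has_feature; GCC defines
// __SANITIZE_THREAD__ when -fsanitize=thread is in effect.
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define BUILT_WITH_TSAN 1
#  endif
#endif
#if !defined(BUILT_WITH_TSAN) && defined(__SANITIZE_THREAD__)
#  define BUILT_WITH_TSAN 1
#endif
#ifndef BUILT_WITH_TSAN
#  define BUILT_WITH_TSAN 0
#endif

// Hypothetical helper: true only if this translation unit was instrumented.
bool built_with_tsan() { return BUILT_WITH_TSAN == 1; }

// Hypothetical helper: print the result so a silent run can be interpreted.
void report_tsan_status() {
    std::printf("ThreadSanitizer instrumentation: %s\n",
                built_with_tsan() ? "enabled" : "NOT enabled");
}
```

Calling report_tsan_status() at the top of main shows whether the flag reached the compiler. If it prints "enabled" and the race is still not reported, the problem is more likely in the runtime or the platform than in the build flags.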
I have C++ code where one thread produces data, pushing it into a queue, and another thread consumes it before passing it to other libraries for processing.
std::mutex lock;
std::condition_variable new_data;
std::vector<uint8_t> pending_bytes;
bool data_done=false;
// producer
void add_bytes(size_t byte_count, const void *data)
{
if (byte_count == 0)
return;
std::lock_guard<std::mutex> guard(lock);
uint8_t *typed_data = (uint8_t *)data;
pending_bytes.insert(pending_bytes.end(), typed_data,
typed_data + byte_count);
new_data.notify_all();
}
void finish()
{
std::lock_guard<std::mutex> guard(lock);
data_done = true;
new_data.notify_all();
}
// consumer
Result *process(void)
{
data_processor = std::unique_ptr<Processor>(new Processor());
bool done = false;
while (!done)
{
std::unique_lock<std::mutex> guard(lock);
new_data.wait(guard, [&]() {return data_done || pending_bytes.size() > 0;});
size_t byte_count = pending_bytes.size();
std::vector<uint8_t> data_copy;
if (byte_count > 0)
{
data_copy = pending_bytes; // vector copies on assignment
pending_bytes.clear();
}
done = data_done;
guard.unlock();
if (byte_count > 0)
{
data_processor->process(byte_count, data_copy.data());
}
}
return data_processor->finish();
}
Where Processor is a rather involved class with a lot of multi-threaded processing, but as far as I can see it should be separated from the code above.
Now sometimes the code deadlocks, and I'm trying to figure out the race condition. My biggest clue is that the producer thread appears to be stuck inside notify_all(). In GDB I get the following backtrace, showing that notify_all is waiting on something:
[Switching to thread 3 (Thread 0x7fffe8d4c700 (LWP 45177))]
#0 0x00007ffff6a4654d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff6a44240 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x00007ffff67e1b29 in std::condition_variable::notify_all() () from /lib64/libstdc++.so.6
#3 0x0000000001221177 in add_bytes (data=0x7fffe8d4ba70, byte_count=256,
this=0x7fffc00dbb80) at Client/file.cpp:213
while also owning the lock
(gdb) p lock
$12 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 1, __count = 0, __owner = 45177, __nusers = 1, __kind = 0,
__spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
with the other thread waiting in the condition variable wait
[Switching to thread 5 (Thread 0x7fffe7d4a700 (LWP 45180))]
#0 0x00007ffff6a43a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff6a43a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007ffff67e1aec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x000000000121f9a6 in std::condition_variable::wait<[...]::{lambda()#1}>(std::
unique_lock<std::mutex>&, [...]::{lambda()#1}) (__p=..., __lock=...,
this=0x7fffc00dbb28) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/std_mutex.h:104
There are two other threads running in the data-processing part, which also hang in pthread_cond_wait, but as far as I'm aware they do not share any synchronization primitives with this code (they are just waiting for calls to processor->add_data or processor->finish).
Any ideas what notify_all is waiting for? Or ways of finding the culprit?
Edit: I reproduced the code with a dummy processor here:
https://onlinegdb.com/lp36ewyRSP
But, pretty much as expected, this doesn't reproduce the issue, so I assume there is something more intricate going on. Possibly just different timings, but maybe some interaction between condition_variable and OpenMP (used by the real processor) could cause this?
I also encountered the same problem. After a few experiments, I found that if notify_all runs after the condition_variable has been destroyed, notify_all deadlocks.
See the code below.
#include <iostream>
#include <condition_variable>
#include <thread>
#include <chrono>
std::thread* t;
void test() {
std::condition_variable cv;
std::mutex cv_m;
t = new std::thread([&](){
std::this_thread::sleep_for(std::chrono::seconds(3));
std::cout << "...before notify_all\n";
cv.notify_all();
std::cout << "...after notify_all\n";
});
std::unique_lock<std::mutex> lk(cv_m);
std::cout << "Waiting... \n";
cv.wait(lk, []{return true;});
std::cout << "...finished waiting\n";
}
int main()
{
test();
t->join();
}
On Linux:
LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.3 (Final)
Release: 6.3
Codename: Final
uname info:
Linux xxx_name 3.10.0_3-0-0-34 #1 SMP Sun Apr 26 22:58:21 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
Compile the code using gcc 8.2.0:
g++ --std=c++11 test.cpp -o test_cond -lpthread
The program hangs after printing "...before notify_all" and never reaches "...after notify_all".
However, when the code is compiled with gcc 12.1.0, the program runs successfully.
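In the test above, the wait predicate is already true, so cv.wait returns without blocking and test() destroys cv and cv_m about three seconds before the thread calls notify_all. Using a condition variable after its destruction is undefined behavior, so either outcome (hang or success) is possible. A minimal sketch of one way to avoid it, assuming the thread can be joined inside the same scope; the function name wait_with_live_cv and the flags ready and notified are made up for illustration:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical example: keeps the condition variable alive until the
// notifying thread is done with it. Returns true when both sides finished.
bool wait_with_live_cv() {
    std::condition_variable cv;
    std::mutex cv_m;
    bool ready = false;     // shared state, guarded by cv_m
    bool notified = false;  // written by the worker, read only after join()

    std::thread worker([&] {
        {
            std::lock_guard<std::mutex> lk(cv_m);
            ready = true;
        }
        cv.notify_all();
        notified = true;
    });

    {
        std::unique_lock<std::mutex> lk(cv_m);
        cv.wait(lk, [&] { return ready; });
    }

    // Join before cv and cv_m go out of scope, so notify_all can never run
    // on a destroyed object.
    worker.join();
    return ready && notified;
}
```

The important part is only the ordering: worker.join() happens before the locals are destroyed. If the thread has to outlive the function, the condition variable and mutex would need a longer lifetime as well, for example as members of an object that outlives both threads.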
It seems to me that you should unlock your mutex in the producer before the call to notify_all (https://en.cppreference.com/w/cpp/thread/condition_variable)
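For reference, here is a sketch of what notifying after releasing the mutex could look like for the producer and consumer in the question. The C++ standard allows notifying either with or without the lock held, so this is at most an optimization and not a confirmed fix for the hang. The ByteQueue wrapper and the drain function are assumptions made for the example; the member names follow the question.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around the globals from the question.
struct ByteQueue {
    std::mutex lock;
    std::condition_variable new_data;
    std::vector<std::uint8_t> pending_bytes;
    bool data_done = false;

    // Producer: append under the lock, notify after releasing it.
    void add_bytes(std::size_t byte_count, const void* data) {
        if (byte_count == 0) return;
        const auto* typed = static_cast<const std::uint8_t*>(data);
        {
            std::lock_guard<std::mutex> guard(lock);
            pending_bytes.insert(pending_bytes.end(), typed, typed + byte_count);
        }  // mutex released here
        new_data.notify_all();
    }

    void finish() {
        {
            std::lock_guard<std::mutex> guard(lock);
            data_done = true;
        }
        new_data.notify_all();
    }

    // Consumer: takes pending data until finish() is seen and returns the
    // total number of bytes received.
    std::size_t drain() {
        std::size_t total = 0;
        bool done = false;
        while (!done) {
            std::vector<std::uint8_t> chunk;
            {
                std::unique_lock<std::mutex> guard(lock);
                new_data.wait(guard, [&] {
                    return data_done || !pending_bytes.empty();
                });
                chunk.swap(pending_bytes);  // take the data, leave it empty
                done = data_done;
            }
            total += chunk.size();  // processing happens outside the lock
        }
        return total;
    }
};
```

One caveat with this ordering: because the producer touches new_data after unlocking, the queue must not be destroyed until the producer thread has been joined.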
According to ThreadSanitizer, I have a data race in the following code.
#include <iostream>
#include <opencv2/imgcodecs.hpp> // cv::imread
#include <opencv2/imgproc.hpp> // cv::cvtColor, cv::COLOR_BGR2HSV
int main() {
cv::Mat img = cv::imread("img.png");
if (img.empty()) {
std::cout << "failed to read image" << std::endl;
return 1;
}
cv::Mat imgHSV;
cv::cvtColor(img, imgHSV, cv::COLOR_BGR2HSV);
return 0;
}
Here's the result of running the code.
$ ./bin/main
==================
WARNING: ThreadSanitizer: data race (pid=10930)
Write of size 8 at 0x7b2000000100 by thread T5:
#0 operator delete[](void*) <null> (main+0xdf7b7)
#1 <null> <null> (libtbb.so.2+0x23236)
Previous read of size 8 at 0x7b2000000100 by thread T2:
#0 memcmp <null> (main+0x97cd0)
#1 <null> <null> (libtbb.so.2+0x214a4)
Thread T5 (tid=10936, running) created by thread T2 at:
#0 pthread_create <null> (main+0x9235a)
#1 <null> <null> (libtbb.so.2+0x206e0)
Thread T2 (tid=10933, finished) created by main thread at:
#0 pthread_create <null> (main+0x9235a)
#1 <null> <null> (libtbb.so.2+0x206e0)
#2 __libc_start_main <null> (libc.so.6+0x27b24)
SUMMARY: ThreadSanitizer: data race (/home/user/bin/main+0xdf7b7) in operator delete[](void*)
==================
ThreadSanitizer: reported 1 warnings
This is how I'm compiling.
clang++ -std=c++17 \
$(pkg-config --cflags --libs opencv4) \
-O1 -g -fsanitize=thread -fno-omit-frame-pointer \
-fsanitize=signed-integer-overflow,null,alignment \
-fno-sanitize-recover=null -fsanitize-trap=alignment \
-Wunused-value -Wshadow \
-o bin/main main.cpp
I'm on Arch Linux, using opencv 4.5.4-6.
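Both stacks in the report are entirely inside libtbb.so.2, which is presumably not built with ThreadSanitizer instrumentation, so the report may be a false positive that TSan cannot judge properly. If that assumption holds, one hedged option is to suppress reports that originate from that library. The sketch below uses the __tsan_default_suppressions hook of the TSan runtime with a called_from_lib suppression; whether this is appropriate depends on whether the race is real, which the report alone does not settle.

```cpp
// Assumption: races reported from inside the uninstrumented libtbb.so are
// noise for this program. The TSan runtime calls this hook at startup and
// reads the returned text as a suppressions list.
extern "C" const char* __tsan_default_suppressions() {
    return "called_from_lib:libtbb.so\n";
}
```

The same line can be placed in a suppressions file and passed at run time with TSAN_OPTIONS=suppressions=<file>, which avoids rebuilding.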
This question has been asked before, but the solutions proposed there do not work for me. The issue is that when an unhandled exception occurs and std::terminate is invoked, the original stack trace of the point where the exception was thrown is gone. How can we get it?
Here is a simple test app where a child thread throws an exception and std::terminate is called. The core file does not show the original stack trace.
I tried the following; neither works:
i) std::set_terminate(myhandler): myhandler is called as expected, but this does not affect stack unwinding.
ii) compiling with -fno-exceptions: no effect, likely because the exception is thrown from the standard library.
Any suggestions will be greatly appreciated!
$ cat thread.cpp
#include <utility>
#include <thread>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>
void f1()
{
for (int i = 0; i < 10; ++i) {
std::cout << "Thread 1 executing\n";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
// !!! create an exception, how to get the stack trace here.
std::vector<int> vec;
vec.reserve(-1);
}
}
int main()
{
std::thread t1(f1);
t1.join();
std::cout << "Done " << '\n';
}
$ g++ -g thread.cpp -pthread
$ ./a.out
Thread 1 executing
terminate called after throwing an instance of 'std::length_error'
what(): vector::reserve
Aborted (core dumped)
$ gdb a.out core
...
Core was generated by `./a.out'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f7e13084277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
[Current thread is 1 (Thread 0x7f7e1304d700 (LWP 106036))]
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-28.el7_5.1.x86_64 libstdc++-4.8.5-28.el7_5.1.x86_64
(gdb) thread apply all bt
Thread 2 (Thread 0x7f7e14054740 (LWP 106035)):
#0 0x00007f7e13423f97 in pthread_join (threadid=140179461691136, thread_return=0x0) at pthread_join.c:92
#1 0x00007f7e13c03e37 in std::thread::join() () from /lib64/libstdc++.so.6
#2 0x0000000000400f8f in main () at thread.cpp:22
Thread 1 (Thread 0x7f7e1304d700 (LWP 106036)):
#0 0x00007f7e13084277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7e13085968 in __GI_abort () at abort.c:90
#2 0x00007f7e13baf7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007f7e13bad746 in ?? () from /lib64/libstdc++.so.6
#4 0x00007f7e13bad773 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00000000004023de in execute_native_thread_routine ()
#6 0x00007f7e13422e25 in start_thread (arg=0x7f7e1304d700) at pthread_create.c:308
#7 0x00007f7e1314cbad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
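The backtrace shows std::terminate being called from execute_native_thread_routine, that is, after the frames of f1 are already gone. The sketch below does not recover the throw-site stack; it only shows a way to keep the exception from reaching std::terminate at all, by catching it inside the thread and handing it to the joining thread through std::exception_ptr. The helper name run_and_capture is made up for illustration. For the actual throw-site stack, running the program under gdb with a "catch throw" catchpoint stops at the point of the throw, before any unwinding.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper: runs `body` on a new thread and returns any exception
// it threw, or a null exception_ptr if it finished normally.
std::exception_ptr run_and_capture(const std::function<void()>& body) {
    std::exception_ptr error;  // written by the thread, read after join()
    std::thread t([&] {
        try {
            body();
        } catch (...) {
            // Capture instead of letting the exception escape the thread,
            // which would end in std::terminate.
            error = std::current_exception();
        }
    });
    t.join();
    return error;
}
```

The caller can then pass the result to std::rethrow_exception inside its own try block and report what() from there, instead of getting an abort with no context.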
In some of the answers to related questions I could see that gdb 7.3 should support displaying thread names, at least with the 'info threads' command.
But I am not even getting that. Please help me understand what I am doing wrong.
My sample code used for testing:
#include <stdio.h>
#include <pthread.h>
#include <sys/prctl.h>
#include <unistd.h> /* for sleep() */
static pthread_t ta, tb;
void *
fx (void *param)
{
int i = 0;
prctl (PR_SET_NAME, "Mythread1", 0, 0, 0);
while (i < 1000)
{
i++;
printf ("T1%d ", i);
}
}
void *
fy (void *param)
{
int i = 0;
prctl (PR_SET_NAME, "Mythread2", 0, 0, 0);
while (i < 100)
{
i++;
printf ("T2%d ", i);
}
sleep (10);
/* generating segmentation fault */
int *p;
p = NULL;
printf ("%d\n", *p);
}
int
main ()
{
pthread_create (&ta, NULL, fx, 0);
pthread_create (&tb, NULL, fy, 0);
void *retval;
pthread_join (ta, &retval);
pthread_join (tb, &retval);
return 0;
}
Output (using the core dump generated by the segmentation fault):
(gdb) core-file core.14001
[New LWP 14003]
[New LWP 14001]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `./thread_Ex'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x08048614 in fy (param=0x0) at thread_Ex.c:30
30 printf("%d\n",*p);
(gdb) info threads
Id Target Id Frame
2 Thread 0xb77d76c0 (LWP 14001) 0x00b95424 in __kernel_vsyscall ()
* 1 Thread 0xb6dd5b70 (LWP 14003) 0x08048614 in fy (param=0x0) at thread_Ex.c:30
(gdb) bt
#0 0x08048614 in fy (param=0x0) at thread_Ex.c:30
#1 0x006919e9 in start_thread () from /lib/libpthread.so.0
#2 0x005d3f3e in clone () from /lib/libc.so.6
(gdb) thread apply all bt
Thread 2 (Thread 0xb77d76c0 (LWP 14001)):
#0 0x00b95424 in __kernel_vsyscall ()
#1 0x006920ad in pthread_join () from /lib/libpthread.so.0
#2 0x080486a4 in main () at thread_Ex.c:50
Thread 1 (Thread 0xb6dd5b70 (LWP 14003)):
#0 0x08048614 in fy (param=0x0) at thread_Ex.c:30
#1 0x006919e9 in start_thread () from /lib/libpthread.so.0
#2 0x005d3f3e in clone () from /lib/libc.so.6
(gdb) q
As you can see, I can't see any of the thread names I set. What could be wrong?
Note:
I am using gdb version 7.7 (downloaded and compiled with no special options).
Commands used to compile and install gdb: ./configure && make && make install
As far as I am aware, thread names are not present in the core dump.
If they are available somehow, please file a gdb bug.
I get the thread names displayed on CentOS 6.5, but not on CentOS 6.4.
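Before blaming gdb or the core file, it may help to confirm that the name is really set on the live thread. This is a minimal sketch, assuming Linux with glibc: it sets the name with prctl(PR_SET_NAME) as in the question and reads it back with the non-portable pthread_getname_np. The function names name_and_read and thread_name_roundtrip are made up for illustration.

```cpp
#include <pthread.h>
#include <sys/prctl.h>
#include <string>

// Runs on the new thread: set the name the way the question does, then read
// it back through the glibc-specific pthread_getname_np.
static void* name_and_read(void* out) {
    prctl(PR_SET_NAME, "Mythread1", 0, 0, 0);
    char buf[16] = {0};  // Linux limits thread names to 16 bytes incl. NUL
    if (pthread_getname_np(pthread_self(), buf, sizeof buf) == 0) {
        *static_cast<std::string*>(out) = buf;
    }
    return nullptr;
}

// Hypothetical helper: returns the name the thread reported for itself,
// or an empty string if the thread could not be created or queried.
std::string thread_name_roundtrip() {
    std::string name;
    pthread_t t;
    if (pthread_create(&t, nullptr, name_and_read, &name) != 0) return name;
    pthread_join(t, nullptr);
    return name;
}
```

If this returns "Mythread1", the name is being set correctly in the running process, and the missing names are specific to how the debugger reads them from a core file, as the answer above suggests.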
Consider the following C++ program. I expect that the first thread to invoke exit will terminate the program. This is what happens when I compile it with g++ -g test.cxx -lpthread. However, when I link against TCMalloc (g++ -g test.cxx -lpthread -ltcmalloc), it hangs. Why?
Examination of the stack frames suggests that the first thread to invoke exit is stuck in __unregister_atfork, waiting for some sort of reference-counted variable to reach 0. Since it previously acquired the mutex, all other threads become deadlocked. My guess is that there is some sort of interaction between tcmalloc's atfork handlers and my code.
Tested on CentOS 6.4 with gperftools 2.0.
$ cat test.cxx
#include <unistd.h>
#include <iostream>
#include <pthread.h>
#include <stdlib.h>
using namespace std;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static void* task(void*) {
if (fork() == 0)
return NULL;
pthread_mutex_lock(&m);
exit(0);
}
int main(int argc, char **argv) {
cout << getpid() << endl;
pthread_t t;
for (unsigned i = 0; i < 100; ++i) {
pthread_create(&t, NULL, task, NULL);
}
sleep(9999);
}
$ g++ -g test.cxx -lpthread && ./a.out
19515
$ g++ -g test.cxx -lpthread -ltcmalloc && ./a.out
24252
<<< process hangs indefinitely >>>
^C
$ pstack 24252
Thread 101 (Thread 0x7ffaabdf7700 (LWP 24253)):
#0 0x000000328c4f84c4 in __unregister_atfork () from /lib64/libc.so.6
#1 0x00007ffaac02d2c6 in __do_global_dtors_aux () from /usr/lib64/libtcmalloc.so.4
#2 0x0000000000000000 in ?? ()
Thread 100 (Thread 0x7ffaab3f6700 (LWP 24254)):
#0 0x000000328cc0e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000328cc09388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x000000328cc09257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000400abf in task(void*) ()
#4 0x000000328cc07851 in start_thread () from /lib64/libpthread.so.0
#5 0x000000328c4e894d in clone () from /lib64/libc.so.6
<<< the other 98 threads are also deadlocked >>>
Thread 1 (Thread 0x7ffaabdf9740 (LWP 24252)):
#0 0x000000328c4acbcd in nanosleep () from /lib64/libc.so.6
#1 0x000000328c4aca40 in sleep () from /lib64/libc.so.6
#2 0x0000000000400b33 in main ()
EDIT: I think the problem might be that exit is not thread-safe. According to POSIX, exit is thread-safe. However, the glibc documentation states that exit is not thread-safe.
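A side note on the test program: the forked child returns from task, which ends the only thread that exists in the child, and a process whose last thread terminates exits as if exit(0) had been called, so the child also runs exit-time handlers and global destructors. POSIX only guarantees async-signal-safe functions in the child of a multi-threaded process until it calls exec, and _exit is one of them. The sketch below shows a child leaving through _exit instead. It is not a confirmed fix for the tcmalloc hang in the parent; the helper name fork_and_exit_child is made up for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks, has the child leave immediately via _exit, and returns the child's
// exit status as seen by the parent (or -1 on failure).
int fork_and_exit_child(int child_status) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: skip atexit handlers and global destructors entirely.
        _exit(child_status);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the question's task function, this would correspond to calling _exit(0) in the fork() == 0 branch instead of returning NULL.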