I have problems getting Helgrind and DRD working with g++ and C++11 threads.
My setup:
- Red Hat Linux 2.6
- g++ 4.7.2
- Valgrind 3.7.0
I tried the program posted here, after adding the definitions listed in the first answer, thus:
#include <valgrind/helgrind.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1
#include <thread>
int main()
{
    std::thread t( []() { } );
    t.join();
    return 0;
}
I then build the program:
$ g++ -std=c++11 -Wall -Wextra -pthread main.cc
The program (which doesn't do much) runs correctly:
$ ./a.out
also with valgrind:
$ valgrind ./a.out
==21284== Memcheck, a memory error detector
==21284== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==21284== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21284== Command: ./a.out
==21284==
==21284==
==21284== HEAP SUMMARY:
==21284== in use at exit: 0 bytes in 0 blocks
==21284== total heap usage: 2 allocs, 2 frees, 344 bytes allocated
==21284==
==21284== All heap blocks were freed -- no leaks are possible
==21284==
==21284== For counts of detected and suppressed errors, rerun with: -v
==21284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
But then, with Helgrind, I get false positives:
$ valgrind --tool=helgrind ./a.out
==21467== Helgrind, a thread error detector
==21467== Copyright (C) 2007-2011, and GNU GPL'd, by OpenWorks LLP et al.
==21467== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21467== Command: ./a.out
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467==
==21467== Thread #1 is the program's root thread
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467== [lines removed]
==21467==
==21467== ----------------------------------------------------------------
==21467==
==21467== Possible data race during write of size 8 at 0x5B7A058 by thread #1
==21467== Locks held: none
==21467== [lines removed]
==21467==
==21467== This conflicts with a previous write of size 8 by thread #2
==21467== Locks held: none
==21467== at 0x4EE0A25: execute_native_thread_routine (shared_ptr_base.h:587)
==21467== by 0x4C2D3AD: mythread_wrapper (hg_intercepts.c:219)
==21467== by 0x55D1850: start_thread (in /lib64/libpthread-2.12.so)
==21467== by 0x58CF90C: clone (in /lib64/libc-2.12.so)
==21467==
==21467== [lines removed]
==21467==
==21467==
==21467== For counts of detected and suppressed errors, rerun with: -v
==21467== Use --history-level=approx or =none to gain increased speed, at
==21467== the cost of reduced accuracy of conflicting-access information
==21467== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Similar bogus reports with DRD instead of Helgrind.
Any idea what could be wrong?
I found this in the DRD manual. It seems you need to recompile the functions execute_native_thread_routine() and std::thread::_M_start_thread() and link them into your program, rather than using the shared library's versions of these functions at run time.
The macros _GLIBCXX_SYNCHRONIZATION_HAPPENS_* are seen by your code, but they were not seen by the internal functions of the C++ library when that library was built. That's why you need to recompile those functions using the same header inclusion and macro definitions you used for your code.
Here is a simple parallel fortran program which clearly has no threading errors.
program example
implicit none
!$OMP PARALLEL
!$OMP END PARALLEL
end program
I compile it with gfortran example_parallel.f90 -fopenmp -g, then run it under Valgrind with valgrind --tool=helgrind ./a.out. Part of the output is shown below:
==43721== Helgrind, a thread error detector
==43721== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==43721== Using Valgrind-3.17.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==43721== Command: ./a.out
==43721==
==43721== ---Thread-Announcement------------------------------------------
==43721==
==43721== Thread #2 was created
==43721== at 0x1009D1DC6: __bsdthread_create (in /usr/lib/system/libsystem_kernel.dylib)
==43721== by 0x100A33DF3: _pthread_create (in /usr/lib/system/libsystem_pthread.dylib)
==43721== by 0x1004705D1: gomp_team_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100468BCC: GOMP_parallel (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721==
==43721== ---Thread-Announcement------------------------------------------
==43721==
==43721== Thread #1 is the program's root thread
==43721==
==43721== ----------------------------------------------------------------
==43721==
==43721== Possible data race during read of size 8 at 0x104914638 by thread #2
==43721== Locks held: none
==43721== at 0x10046FED8: gomp_thread_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100C3124F: ???
==43721== by 0x70000C0ABFFF: ???
==43721== by 0x70000C0ABFDF: ???
==43721== by 0x70000C0ABFFF: ???
==43721== by 0x70000C0ABFCF: ???
==43721==
==43721== This conflicts with a previous write of size 8 by thread #1
==43721== Locks held: none
==43721== at 0x100A33DFE: _pthread_create (in /usr/lib/system/libsystem_pthread.dylib)
==43721== by 0x1004705D1: gomp_team_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100468BCC: GOMP_parallel (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== Address 0x104914638 is on thread #1's stack
[... many more errors follow; skipping to the end]
--43721:0:schedule VG_(sema_down): read returned -4
==43721==
==43721== Use --history-level=approx or =none to gain increased speed, at
==43721== the cost of reduced accuracy of conflicting-access information
==43721== For lists of detected and suppressed errors, rerun with: -s
==43721== ERROR SUMMARY: 35 errors from 32 contexts (suppressed: 0 from 0)
Are these 35 errors false positives? If not, what am I doing wrong? I'm using macOS 10.15.7, with Valgrind installed via Homebrew following this post.
As @Hristolliev pointed out, libgomp is not compatible with Helgrind and is known to generate false positives. I found the following in the Valgrind manual:
Runtime support library for GNU OpenMP (part of GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime library (libgomp.so) constructs its own synchronisation primitives using combinations of atomic memory instructions and the futex syscall, which causes total chaos since in Helgrind since it cannot "see" those.
Fortunately, this can be solved using a configuration-time option (for GCC). Rebuild GCC from source, and configure using --disable-linux-futex. This makes libgomp.so use the standard POSIX threading primitives instead. Note that this was tested using GCC 4.2.3 and has not been re-tested using more recent GCC versions. We would appreciate hearing about any successes or failures with more recent versions.
I was toying around with Valgrind, when I noticed something weird:
my C++ program does nothing, yet there is 1 memory alloc and 1 free.
My simple program:
int main() {
return 0;
}
when compiled with g++ and checked with Valgrind
> g++ main.cpp
> valgrind --leak-check=full --track-origins=yes ./a.out
==40790== Memcheck, a memory error detector
==40790== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==40790== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==40790== Command: ./a.out
==40790==
==40790==
==40790== HEAP SUMMARY:
==40790== in use at exit: 0 bytes in 0 blocks
==40790== total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==40790==
==40790== All heap blocks were freed -- no leaks are possible
==40790==
==40790== For lists of detected and suppressed errors, rerun with: -s
==40790== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
My question: My program does nothing. Where do the alloc and free come from?
Interestingly enough, the same program compiled with gcc shows zero allocs and frees:
> gcc main.c
> valgrind --leak-check=full --track-origins=yes ./a.out
==40740== Memcheck, a memory error detector
==40740== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==40740== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==40740== Command: ./a.out
==40740==
==40740==
==40740== HEAP SUMMARY:
==40740== in use at exit: 0 bytes in 0 blocks
==40740== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==40740==
==40740== All heap blocks were freed -- no leaks are possible
==40740==
==40740== For lists of detected and suppressed errors, rerun with: -s
==40740== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Follow up question: Why do the two memory allocations differ, for the same piece of code?
compiler: gcc (GCC) 10.1.0
valgrind: valgrind-3.16.0.GIT
The main function is the entry point of your code. It doesn't have to be (and seldom is) the entry point to the process for the operating system that is loading your program.
There's usually plenty of code running first to set up things needed for the standard library (like setting up the standard I/O streams, and fetching the actual arguments from the operating system) before your main function is called.
And it's important to note that the main function is called like any other function. Once it returns it will return to the initialization code which will now clean up after itself (like freeing memory it might have allocated, and closing streams, etc.).
I have developed a pure-C implementation of FIFO lists (queues) in files fifo.h and fifo.c, and have written a test programme testfifo.c which I compile to ./bin/testfifo. The node structure is defined in list.h.
I run my programme through Valgrind on OS X 10.6 like this
valgrind --tool=memcheck --leak-check=full --show-reachable=yes ./bin/testfifo
and get the following output
==54688== Memcheck, a memory error detector
==54688== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==54688== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==54688== Command: bin/testfifo
==54688==
--54688-- bin/testfifo:
--54688-- dSYM directory is missing; consider using --dsymutil=yes
==54688==
==54688== HEAP SUMMARY:
==54688== in use at exit: 88 bytes in 1 blocks
==54688== total heap usage: 11 allocs, 10 frees, 248 bytes allocated
==54688==
==54688== LEAK SUMMARY:
==54688== definitely lost: 0 bytes in 0 blocks
==54688== indirectly lost: 0 bytes in 0 blocks
==54688== possibly lost: 0 bytes in 0 blocks
==54688== still reachable: 0 bytes in 0 blocks
==54688== suppressed: 88 bytes in 1 blocks
==54688==
==54688== For counts of detected and suppressed errors, rerun with: -v
==54688== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
According to the leak summary, there are no leaks, but I am still wondering what the "suppressed" leaks are. Besides, the numbers of allocs and frees do not match, and hence I am unsure whether there are leaks or not.
----EDIT----
Running
valgrind --tool=memcheck --leak-check=full --show-reachable=yes -v ./bin/testfifo
on OS X 10.6 produces a quite long and confusing output, but I have run
valgrind --tool=memcheck --leak-check=full --show-reachable=yes ./bin/testfifo
on a Linux machine and got this output:
==32688== Memcheck, a memory error detector
==32688== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==32688== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==32688== Command: bin/testfifo
==32688==
==32688==
==32688== HEAP SUMMARY:
==32688== in use at exit: 0 bytes in 0 blocks
==32688== total heap usage: 10 allocs, 10 frees, 160 bytes allocated
==32688==
==32688== All heap blocks were freed -- no leaks are possible
==32688==
==32688== For counts of detected and suppressed errors, rerun with: -v
==32688== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
The allocs and frees now match, so the extra alloc on OS X seems to be due to some system library, as has been suggested.
I have run the very same command with the -v option, in order to reveal the 4 suppressed errors, but I have not got any easily understandable new information.
Those are leaks outside of your code, in (probably shared) libraries or known false positives. Running valgrind with -v should inform you about the suppressions used.
I have an R script that throws a segfault error. The R script uses a package "RSofia" that internally calls a C++ program using the Rcpp package, which I believe is causing the issue.
Please refer to the link for the question I posted on the same: RSofia Issue
I am trying to debug and identify what is causing the issue using valgrind as follows:
R -d "valgrind --leak-check=full --show-reachable=yes" -f svm.r
This throws the following output:
==11235== Memcheck, a memory error detector
==11235== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11235== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11235== Command: /usr/lib64/R/bin/exec/R -f svm.r
==11235==
vex: priv/main_main.c:319 (LibVEX_Translate): Assertion `are_valid_hwcaps(VexArchAMD64, vta->archinfo_host.hwcaps)' failed.
vex storage: T total 0 bytes allocated
vex storage: P total 0 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
==11235== at 0x38031DA7: report_and_quit (m_libcassert.c:235)
==11235== by 0x38031E0E: panic (m_libcassert.c:319)
==11235== by 0x38031E68: vgPlain_core_panic_at (m_libcassert.c:324)
==11235== by 0x38031E7A: vgPlain_core_panic (m_libcassert.c:329)
==11235== by 0x3804D162: failure_exit (m_translate.c:708)
==11235== by 0x380D4C38: vex_assert_fail (main_util.c:219)
==11235== by 0x380D3009: LibVEX_Translate (main_main.c:319)
==11235== by 0x3804AACE: vgPlain_translate (m_translate.c:1559)
==11235== by 0x38079D9F: vgPlain_scheduler (scheduler.c:991)
==11235== by 0x380A6409: run_a_thread_NORETURN (syswrap-linux.c:103)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==11235== at 0x4000B00: ??? (in /lib64/ld-2.12.so)
==11235== by 0x2: ???
==11235== by 0x7FF00036E: ???
==11235== by 0x7FF000386: ???
==11235== by 0x7FF000389: ???
Can someone help with how to locate the error from this message and what could be a possible fix to this?
Looks like it is using an assert() which, according to Writing R Extensions, one should not be using in the first place.
Now why the assert() evaluates the way it does and hence aborts is another matter. But for that one would need a minimally reproducible example, plus some spare time and patience.
I have the following situation (Ubuntu 15.10 and Debian Testing):
I have Lib A, which is compiled without cxx11, and Lib B, which uses -std=c++11. B includes and links against A; A uses Boost.
If I link B to A, the resulting application crashes during dynamic loading.
If I compile A with cxx11 or B without cxx11, everything works fine.
My question: as far as I understood, the ABI namespace add-on should prevent this kind of problem. Am I wrong here?
I created an example project to clarify the problem:
https://github.com/goldhoorn/sandbox/tree/gcc5.2-issue
test1 fails; the other tests pass.
GDB tells me:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bceb2e in _GLOBAL__sub_I_Lib.cpp () from ./libmyLib.so
(gdb) bt
#0 0x00007ffff7bceb2e in _GLOBAL__sub_I_Lib.cpp () from ./libmyLib.so
#1 0x00007ffff7deaa0a in call_init (l=<optimized out>, argc=argc@entry=1,
argv=argv@entry=0x7fffffffe688, env=env@entry=0x7fffffffe698)
at dl-init.c:78
#2 0x00007ffff7deaaf3 in call_init (env=0x7fffffffe698, argv=0x7fffffffe688,
argc=1, l=<optimized out>) at dl-init.c:36
#3 _dl_init (main_map=0x7ffff7ffe1a8, argc=1, argv=0x7fffffffe688,
env=0x7fffffffe698) at dl-init.c:126
#4 0x00007ffff7ddd1ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ?? ()
#6 0x00007fffffffe89f in ?? ()
#7 0x0000000000000000 in ?? ()
Result from Valgrind:
goldhoorn@debian:/tmp/example$ LD_LIBRARY_PATH=. valgrind --show-below-main=yes ./main
==17140== Memcheck, a memory error detector
==17140== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17140== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17140== Command: ./main
==17140==
==17140==
==17140== Process terminating with default action of signal 11 (SIGSEGV)
==17140== Bad permissions for mapped region at address 0x401FE8
==17140== at 0x4E3EB2E: _GLOBAL__sub_I_Lib.cpp (in /tmp/example/libmyLib.so)
==17140== by 0x400EA09: call_init.part.0 (dl-init.c:78)
==17140== by 0x400EAF2: call_init (dl-init.c:36)
==17140== by 0x400EAF2: _dl_init (dl-init.c:126)
==17140== by 0x40011C9: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==17140==
==17140== HEAP SUMMARY:
==17140== in use at exit: 72,704 bytes in 1 blocks
==17140== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==17140==
==17140== LEAK SUMMARY:
==17140== definitely lost: 0 bytes in 0 blocks
==17140== indirectly lost: 0 bytes in 0 blocks
==17140== possibly lost: 0 bytes in 0 blocks
==17140== still reachable: 72,704 bytes in 1 blocks
==17140== suppressed: 0 bytes in 0 blocks
==17140== Rerun with --leak-check=full to see details of leaked memory
==17140==
==17140== For counts of detected and suppressed errors, rerun with: -v
==17140== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault
Please check your library A against the list of Cxx11 ABI backward incompatibilities: https://gcc.gnu.org/wiki/Cxx11AbiCompatibility
The C++98 language is ABI-compatible with the C++11 language, but several places in the library break compatibility. This makes it dangerous to link C++98 objects with C++11 objects.