Here is a simple parallel Fortran program which clearly has no threading errors.
program example
implicit none
!$OMP PARALLEL
!$OMP END PARALLEL
end program
I compile it with gfortran example_parallel.f90 -fopenmp -g, then run it under Helgrind with valgrind --tool=helgrind ./a.out. Part of the result is shown below:
==43721== Helgrind, a thread error detector
==43721== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==43721== Using Valgrind-3.17.0.GIT-lbmacos and LibVEX; rerun with -h for copyright info
==43721== Command: ./a.out
==43721==
==43721== ---Thread-Announcement------------------------------------------
==43721==
==43721== Thread #2 was created
==43721== at 0x1009D1DC6: __bsdthread_create (in /usr/lib/system/libsystem_kernel.dylib)
==43721== by 0x100A33DF3: _pthread_create (in /usr/lib/system/libsystem_pthread.dylib)
==43721== by 0x1004705D1: gomp_team_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100468BCC: GOMP_parallel (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721==
==43721== ---Thread-Announcement------------------------------------------
==43721==
==43721== Thread #1 is the program's root thread
==43721==
==43721== ----------------------------------------------------------------
==43721==
==43721== Possible data race during read of size 8 at 0x104914638 by thread #2
==43721== Locks held: none
==43721== at 0x10046FED8: gomp_thread_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100C3124F: ???
==43721== by 0x70000C0ABFFF: ???
==43721== by 0x70000C0ABFDF: ???
==43721== by 0x70000C0ABFFF: ???
==43721== by 0x70000C0ABFCF: ???
==43721==
==43721== This conflicts with a previous write of size 8 by thread #1
==43721== Locks held: none
==43721== at 0x100A33DFE: _pthread_create (in /usr/lib/system/libsystem_pthread.dylib)
==43721== by 0x1004705D1: gomp_team_start (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== by 0x100468BCC: GOMP_parallel (in /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib)
==43721== Address 0x104914638 is on thread #1's stack
[... Lots more errors below I just skip to the end]
--43721:0:schedule VG_(sema_down): read returned -4
==43721==
==43721== Use --history-level=approx or =none to gain increased speed, at
==43721== the cost of reduced accuracy of conflicting-access information
==43721== For lists of detected and suppressed errors, rerun with: -s
==43721== ERROR SUMMARY: 35 errors from 32 contexts (suppressed: 0 from 0)
Are these 35 errors false positives? If not, what am I doing wrong? I'm using macOS 10.15.7, with valgrind installed via Homebrew following this post.
As @HristoIliev pointed out, libgomp is not compatible with Helgrind and is known to generate false positives. I found the following in the Valgrind manual:
Runtime support library for GNU OpenMP (part of GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime library (libgomp.so) constructs its own synchronisation primitives using combinations of atomic memory instructions and the futex syscall, which causes total chaos since in Helgrind since it cannot "see" those.
Fortunately, this can be solved using a configuration-time option (for GCC). Rebuild GCC from source, and configure using --disable-linux-futex. This makes libgomp.so use the standard POSIX threading primitives instead. Note that this was tested using GCC 4.2.3 and has not been re-tested using more recent GCC versions. We would appreciate hearing about any successes or failures with more recent versions.
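As a rough sketch of what the rebuild described in the manual could look like on a Linux host (futex is a Linux facility), the script below only prints the commands rather than running them. GCC_SRC, PREFIX, the gcc-build directory and the lib64 runtime path are my own assumptions, not something the manual prescribes; only --disable-linux-futex comes from the quoted text.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for an out-of-tree GCC build whose libgomp
# avoids futex-based synchronisation. Nothing is downloaded or built here.
# GCC_SRC and PREFIX are hypothetical locations; adjust them to your system.
GCC_SRC="${GCC_SRC:-$HOME/src/gcc}"
PREFIX="${PREFIX:-$HOME/opt/gcc-nofutex}"

print_gcc_build_plan() {
    echo "mkdir -p gcc-build && cd gcc-build"
    # --disable-linux-futex is the option quoted from the Valgrind manual.
    echo "$GCC_SRC/configure --prefix=$PREFIX --enable-languages=c,c++,fortran --disable-linux-futex"
    echo "make && make install"
    # Rebuild the test program with the new compiler before rerunning Helgrind.
    echo "$PREFIX/bin/gfortran example_parallel.f90 -fopenmp -g"
    # lib64 is an assumption; some systems install the runtime under lib.
    echo "LD_LIBRARY_PATH=$PREFIX/lib64 valgrind --tool=helgrind ./a.out"
}

print_gcc_build_plan
```

The last printed line matters: the program has to load the rebuilt libgomp at run time, otherwise Helgrind still sees the system copy. As the manual itself notes, this route was only tested with an old GCC, so treat it as an experiment.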
I'm trying to run valgrind in order to identify memory leaks in my program.
Unfortunately, running it fails as follows:
igor@WaylandGnome ~/dbhandler/Debug/dbhandler $ valgrind --leak-check=full dbhandler
==32622== Memcheck, a memory error detector
==32622== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32622== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==32622== Command: dbhandler
==32622==
vex amd64->IR: unhandled instruction bytes: 0x8F 0xEA 0x78 0x10 0xD0 0x8 0x4 0x0 0x0 0x89
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==32622== valgrind: Unrecognised instruction at address 0x40197cf.
==32622== at 0x40197CF: get_common_indices.constprop.0 (in /lib64/ld-2.33.so)
==32622== by 0x401ACB6: init_cpu_features.constprop.0 (in /lib64/ld-2.33.so)
==32622== by 0x401BE1D: _dl_sysdep_start (in /lib64/ld-2.33.so)
==32622== by 0x4001FDB: _dl_start (in /lib64/ld-2.33.so)
==32622== by 0x4001057: ??? (in /lib64/ld-2.33.so)
==32622== Your program just tried to execute an instruction that Valgrind
==32622== did not recognise. There are two possible reasons for this.
==32622== 1. Your program has a bug and erroneously jumped to a non-code
==32622== location. If you are running Memcheck and you just saw a
==32622== warning about a bad jump, it's probably your program's fault.
==32622== 2. The instruction is legitimate but Valgrind doesn't handle it,
==32622== i.e. it's Valgrind's fault. If you think this is the case or
==32622== you are not sure, please let us know and we'll try to fix it.
==32622== Either way, Valgrind will now raise a SIGILL signal which will
==32622== probably kill your program.
==32622==
==32622== Process terminating with default action of signal 4 (SIGILL)
==32622== Illegal opcode at address 0x40197CF
==32622== at 0x40197CF: get_common_indices.constprop.0 (in /lib64/ld-2.33.so)
==32622== by 0x401ACB6: init_cpu_features.constprop.0 (in /lib64/ld-2.33.so)
==32622== by 0x401BE1D: _dl_sysdep_start (in /lib64/ld-2.33.so)
==32622== by 0x4001FDB: _dl_start (in /lib64/ld-2.33.so)
==32622== by 0x4001057: ??? (in /lib64/ld-2.33.so)
==32622==
==32622== HEAP SUMMARY:
==32622== in use at exit: 0 bytes in 0 blocks
==32622== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==32622==
==32622== All heap blocks were freed -- no leaks are possible
==32622==
==32622== For lists of detected and suppressed errors, rerun with: -s
==32622== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction
Any idea what is happening?
I'm running Gentoo here and my program is built with C++11 standard.
This is a known Valgrind bug: https://bugs.kde.org/show_bug.cgi?id=381819
A patch is attached to the bug report, but it is not in the latest Valgrind release because it is reported to be incomplete.
I'm learning C from Learn C The Hard Way. I'm on exercise 6 and, while I can make it work, valgrind reports a lot of errors.
Here's the stripped down minimal program from a file ex6.c:
#include <stdio.h>
int main(int argc, char *argv[])
{
char initial = 'A';
float power = 2.345f;
printf("Character is %c.\n", initial);
printf("You have %f levels of power.\n", power);
return 0;
}
Content of Makefile is just CFLAGS=-Wall -g.
I compile the program with $ make ex6 (there are no compiler warnings or errors). Executing with $ ./ex6 produces the expected output.
When I run the program with $ valgrind ./ex6 I get errors which I can't solve. Here's the full output:
==69691== Memcheck, a memory error detector
==69691== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==69691== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==69691== Command: ./ex6
==69691==
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==69691== Conditional jump or move depends on uninitialised value(s)
==69691== at 0x1003FBC3F: _platform_memchr$VARIANT$Haswell (in /usr/lib/system/libsystem_platform.dylib)
==69691== by 0x1001EFBB6: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001FA005: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021F9CE: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021FCA0: __xvprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F5B91: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F39F7: printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x100000F1B: main (ex6.c:8)
==69691==
Character is A.
==69691== Invalid read of size 32
==69691== at 0x1003FBC1D: _platform_memchr$VARIANT$Haswell (in /usr/lib/system/libsystem_platform.dylib)
==69691== by 0x1001EFBB6: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001FA005: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021F9CE: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021FCA0: __xvprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F5B91: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F39F7: printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x100000F31: main (ex6.c:9)
==69691== Address 0x100809680 is 32 bytes before a block of size 32 in arena "client"
==69691==
You have 2.345000 levels of power.
==69691==
==69691== HEAP SUMMARY:
==69691== in use at exit: 39,365 bytes in 429 blocks
==69691== total heap usage: 510 allocs, 81 frees, 45,509 bytes allocated
==69691==
==69691== LEAK SUMMARY:
==69691== definitely lost: 16 bytes in 1 blocks
==69691== indirectly lost: 0 bytes in 0 blocks
==69691== possibly lost: 13,090 bytes in 117 blocks
==69691== still reachable: 26,259 bytes in 311 blocks
==69691== suppressed: 0 bytes in 0 blocks
==69691== Rerun with --leak-check=full to see details of leaked memory
==69691==
==69691== For counts of detected and suppressed errors, rerun with: -v
==69691== Use --track-origins=yes to see where uninitialised values come from
==69691== ERROR SUMMARY: 5 errors from 2 contexts (suppressed: 0 from 0)
I'm on OS X Yosemite. Valgrind is installed via brew with this command $ brew install valgrind --HEAD.
So, does anyone know what's the issue here? How do I fix the valgrind errors?
If the programme you are running through Valgrind is exactly the one you posted in your question, it clearly doesn't have any memory leaks. In fact, you don't even use malloc/free yourself!
It looks to me like these are spurious errors / false positives that Valgrind detects on OS X (only!), similar to what happened to myself some time ago.
If you have access to a different operating system, e.g. a Linux machine, try to analyze the programme using Valgrind on that system.
EDIT: I haven't tried this myself, since I don't have access to a Mac right now, but you should try what M Oehm suggested: use a suppressions file, as mentioned in this other SO question.
This issue is fixed for Darwin 14.3.0 (Mac OS X 10.10.2) using Valgrind r14960 with VEX r3124 for Xcode6.2 and Valgrind r15088 for Xcode 6.3.
If you are using MacPorts (at the time of writing), sudo port install valgrind-devel will give you Valgrind r14960 with VEX r3093.
Here's my build script to install Valgrind r14960 with VEX r3124:
#! /usr/bin/env bash
mkdir -p buildvalgrind
cd buildvalgrind
svn co svn://svn.valgrind.org/valgrind/trunk/@14960 valgrind
cd valgrind
./autogen.sh
./configure --prefix=/usr/local
make && sudo make install
# check that we have our valgrind installed
/usr/local/bin/valgrind --version
(reference: http://calvinx.com/2015/04/10/valgrind-on-mac-os-x-10-10-yosemite/)
My macports-installed valgrind is located at /opt/local/bin/valgrind.
If I now run
/opt/local/bin/valgrind --leak-check=yes --suppressions=`pwd`/objc.supp ./ex6
I will get exactly the same errors you described above. (Using my objc.supp file here https://gist.github.com/calvinchengx/0b1d45f67be9fdca205b)
But if I run
/usr/local/bin/valgrind --leak-check=yes --suppressions=`pwd`/objc.supp ./ex6
everything works as expected and the system-level memory leak errors no longer show up.
Judging from this topic, I assume that valgrind is not guaranteed to give correct results on your platform. If you can, try this code on another platform.
The culprit is either in valgrind itself or in your system's implementation of printf, both of which would be impractical for you to fix.
Rerun with --leak-check=full to see details of leaked memory. This should give you some more information about the leak you are experiencing. If nothing helps, you can create a suppression file to stop the errors from being displayed.
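For what it's worth, a suppression file is just a text file with one block per report you want hidden. Here is a minimal sketch that writes one for the "Conditional jump" report from the question. The file location and the suppression name osx_printf_memchr_cond are made up for illustration; the frame names are taken from the stack trace in the question.

```shell
#!/usr/bin/env bash
# Sketch: write a Memcheck suppression file for the printf-related
# "Conditional jump or move depends on uninitialised value(s)" report.
# The file location and the suppression name are hypothetical.
SUPP_FILE="${SUPP_FILE:-${TMPDIR:-/tmp}/osx-printf.supp}"

write_suppressions() {
    # Quoted heredoc delimiter: nothing inside is expanded by the shell.
    cat > "$1" <<'EOF'
{
   osx_printf_memchr_cond
   Memcheck:Cond
   fun:_platform_memchr*
   ...
   fun:printf
}
EOF
}

write_suppressions "$SUPP_FILE"
echo "wrote $SUPP_FILE; use it with:"
echo "valgrind --suppressions=$SUPP_FILE ./ex6"
```

The "..." line matches any number of intermediate frames, so the block covers the whole __sfvwrite/__vfprintf chain. Running valgrind with --gen-suppressions=all prints ready-made blocks for every report, which is usually easier than writing them by hand. Keep in mind that this only hides the reports; it does not change what the system library does.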
I have an R script that throws a segfault error. The R script uses a package "RSofia" that internally calls a C++ program using the Rcpp package, which I believe is causing the issue.
Please refer to the link for the question I posted on the same: RSofia Issue
I am trying to debug and identify what is causing the issue using valgrind as follows:
R -d "valgrind --leak-check=full --show-reachable=yes" -f svm.r
This throws the following output:
==11235== Memcheck, a memory error detector
==11235== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11235== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11235== Command: /usr/lib64/R/bin/exec/R -f svm.r
==11235==
vex: priv/main_main.c:319 (LibVEX_Translate): Assertion `are_valid_hwcaps(VexArchAMD64, vta->archinfo_host.hwcaps)' failed.
vex storage: T total 0 bytes allocated
vex storage: P total 0 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
==11235== at 0x38031DA7: report_and_quit (m_libcassert.c:235)
==11235== by 0x38031E0E: panic (m_libcassert.c:319)
==11235== by 0x38031E68: vgPlain_core_panic_at (m_libcassert.c:324)
==11235== by 0x38031E7A: vgPlain_core_panic (m_libcassert.c:329)
==11235== by 0x3804D162: failure_exit (m_translate.c:708)
==11235== by 0x380D4C38: vex_assert_fail (main_util.c:219)
==11235== by 0x380D3009: LibVEX_Translate (main_main.c:319)
==11235== by 0x3804AACE: vgPlain_translate (m_translate.c:1559)
==11235== by 0x38079D9F: vgPlain_scheduler (scheduler.c:991)
==11235== by 0x380A6409: run_a_thread_NORETURN (syswrap-linux.c:103)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==11235== at 0x4000B00: ??? (in /lib64/ld-2.12.so)
==11235== by 0x2: ???
==11235== by 0x7FF00036E: ???
==11235== by 0x7FF000386: ???
==11235== by 0x7FF000389: ???
Can someone help with how to locate the error from this message and what could be a possible fix to this?
Looks like it is using an assert() which, according to Writing R Extensions, one should not be using in the first place.
Now why the assert() evaluates the way it does and hence aborts is another matter. But for that one would need a minimally reproducible example, plus some spare time and patience.
I have the following situation (Ubuntu 15.10 and Debian Testing)
I have lib A, which is compiled without cxx11, and a lib B that uses -std=c++11. B includes and links against A; A uses Boost.
If I link B to A, the resulting application crashes during dynamic loading.
If I compile A without cxx11 or B with cxx11 everything works fine.
My question: as far as I understood, the ABI namespace add-on should guard against this kind of problem. Am I wrong here?
I created a example project to clarify the problem:
https://github.com/goldhoorn/sandbox/tree/gcc5.2-issue
test1 fails; the other tests pass.
GDB tells me:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bceb2e in _GLOBAL__sub_I_Lib.cpp () from ./libmyLib.so
(gdb) bt
#0 0x00007ffff7bceb2e in _GLOBAL__sub_I_Lib.cpp () from ./libmyLib.so
#1 0x00007ffff7deaa0a in call_init (l=<optimized out>, argc=argc@entry=1,
argv=argv@entry=0x7fffffffe688, env=env@entry=0x7fffffffe698)
at dl-init.c:78
#2 0x00007ffff7deaaf3 in call_init (env=0x7fffffffe698, argv=0x7fffffffe688,
argc=1, l=<optimized out>) at dl-init.c:36
#3 _dl_init (main_map=0x7ffff7ffe1a8, argc=1, argv=0x7fffffffe688,
env=0x7fffffffe698) at dl-init.c:126
#4 0x00007ffff7ddd1ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ?? ()
#6 0x00007fffffffe89f in ?? ()
#7 0x0000000000000000 in ?? ()
Result from Valgrind:
goldhoorn@debian:/tmp/example$ LD_LIBRARY_PATH=. valgrind --show-below-main=yes ./main
==17140== Memcheck, a memory error detector
==17140== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17140== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17140== Command: ./main
==17140==
==17140==
==17140== Process terminating with default action of signal 11 (SIGSEGV)
==17140== Bad permissions for mapped region at address 0x401FE8
==17140== at 0x4E3EB2E: _GLOBAL__sub_I_Lib.cpp (in /tmp/example/libmyLib.so)
==17140== by 0x400EA09: call_init.part.0 (dl-init.c:78)
==17140== by 0x400EAF2: call_init (dl-init.c:36)
==17140== by 0x400EAF2: _dl_init (dl-init.c:126)
==17140== by 0x40011C9: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==17140==
==17140== HEAP SUMMARY:
==17140== in use at exit: 72,704 bytes in 1 blocks
==17140== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==17140==
==17140== LEAK SUMMARY:
==17140== definitely lost: 0 bytes in 0 blocks
==17140== indirectly lost: 0 bytes in 0 blocks
==17140== possibly lost: 0 bytes in 0 blocks
==17140== still reachable: 72,704 bytes in 1 blocks
==17140== suppressed: 0 bytes in 0 blocks
==17140== Rerun with --leak-check=full to see details of leaked memory
==17140==
==17140== For counts of detected and suppressed errors, rerun with: -v
==17140== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault
Please check your library A against the list of Cxx11 ABI backward incompatibilities: https://gcc.gnu.org/wiki/Cxx11AbiCompatibility
The C++98 language is ABI-compatible with the C++11 language, but several places in the library break compatibility. This makes it dangerous to link C++98 objects with C++11 objects.
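One quick check that may help narrow this down is to see whether both libraries were built against the same libstdc++ string/list ABI. A small sketch, assuming nm is available; the library paths in the comments and the two sample symbol listings are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: count symbol lines that refer to the new libstdc++ ABI.
# Reads symbol names on stdin, so it works on the output of `nm -DC`.
count_cxx11_symbols() {
    # Demangled names from the new ABI contain "std::__cxx11::" or "[abi:cxx11]".
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -e '__cxx11' -e 'abi:cxx11' || true
}

# Usage with real libraries (paths are hypothetical):
#   nm -DC ./libA.so | count_cxx11_symbols
#   nm -DC ./libB.so | count_cxx11_symbols

# Self-contained demonstration on two made-up symbol listings.
old_abi='0000000000001139 T describe(std::string const&)'
new_abi='0000000000001139 T describe(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
echo "old-style listing: $(printf '%s\n' "$old_abi" | count_cxx11_symbols)"
echo "new-style listing: $(printf '%s\n' "$new_abi" | count_cxx11_symbols)"
```

If one library reports zero and the other does not, they disagree about std::string and friends. Matching counts do not rule out the other incompatibilities listed on the wiki page, so this is only a first filter.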
I have problems getting Helgrind and DRD working with g++ and C++11 threads.
My setup:
- Red Hat Linux 2.6
- g++ 4.7.2
- Valgrind 3.7.0
I tried the program posted here, after adding the definitions listed in the first answer, thus:
#include <valgrind/helgrind.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1
#include <thread>
int main()
{
std::thread t( []() { } );
t.join();
return 0;
}
I then build the program:
$ g++ -std=c++11 -Wall -Wextra -pthread main.cc
The program (which doesn't do much) runs correctly:
$ ./a.out
also with valgrind:
$ valgrind ./a.out
==21284== Memcheck, a memory error detector
==21284== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==21284== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21284== Command: ./a.out
==21284==
==21284==
==21284== HEAP SUMMARY:
==21284== in use at exit: 0 bytes in 0 blocks
==21284== total heap usage: 2 allocs, 2 frees, 344 bytes allocated
==21284==
==21284== All heap blocks were freed -- no leaks are possible
==21284==
==21284== For counts of detected and suppressed errors, rerun with: -v
==21284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
But then, with Helgrind, I get false positives:
$ valgrind --tool=helgrind ./a.out
==21467== Helgrind, a thread error detector
==21467== Copyright (C) 2007-2011, and GNU GPL'd, by OpenWorks LLP et al.
==21467== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21467== Command: ./a.out
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467==
==21467== Thread #1 is the program's root thread
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467== [lines removed]
==21467==
==21467== ----------------------------------------------------------------
==21467==
==21467== Possible data race during write of size 8 at 0x5B7A058 by thread #1
==21467== Locks held: none
==21467== [lines removed]
==21467==
==21467== This conflicts with a previous write of size 8 by thread #2
==21467== Locks held: none
==21467== at 0x4EE0A25: execute_native_thread_routine (shared_ptr_base.h:587)
==21467== by 0x4C2D3AD: mythread_wrapper (hg_intercepts.c:219)
==21467== by 0x55D1850: start_thread (in /lib64/libpthread-2.12.so)
==21467== by 0x58CF90C: clone (in /lib64/libc-2.12.so)
==21467==
==21467== [lines removed]
==21467==
==21467==
==21467== For counts of detected and suppressed errors, rerun with: -v
==21467== Use --history-level=approx or =none to gain increased speed, at
==21467== the cost of reduced accuracy of conflicting-access information
==21467== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Similar bogus reports with DRD instead of Helgrind.
Any idea what could be wrong?
I found this in the DRD manual. It seems you need to recompile the functions execute_native_thread_routine() and std::thread::_M_start_thread() and link them into your program, rather than using the versions from the shared library at execution time.
The macros _GLIBCXX_SYNCHRONIZATION_HAPPENS_* are seen by your code, but they were not seen by the internal functions of the C++ library when that library was built. That's why you need to recompile those functions using the same header inclusion and macro definitions you used for your code.