How should the results from cachegrind be interpreted? - profiling

I need to profile a program in development to understand what bottlenecks there may be and in particular whether there are any due to memory accesses.
To do this I used Cachegrind, one of the tools that ships with Valgrind.
I compiled the program with gcc and the -g flag, then ran it under Valgrind with the command valgrind --tool=cachegrind ./a.out.
The result printed on the terminal was as follows:
==11611== Cachegrind, a cache and branch-prediction profiler
==11611== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==11611== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==11611== Command: ./profiling
==11611==
--11611-- warning: L3 cache found, using its data for the LL simulation.
Elapsed computation time: 33.46223 seconds
==11611==
==11611== I refs: 10,918,854,735
==11611== I1 misses: 1,655
==11611== LLi misses: 1,646
==11611== I1 miss rate: 0.00%
==11611== LLi miss rate: 0.00%
==11611==
==11611== D refs: 4,620,671,815 (4,235,254,268 rd + 385,417,547 wr)
==11611== D1 misses: 3,222,370 ( 2,887,833 rd + 334,537 wr)
==11611== LLd misses: 18,506 ( 16,679 rd + 1,827 wr)
==11611== D1 miss rate: 0.1% ( 0.1% + 0.1% )
==11611== LLd miss rate: 0.0% ( 0.0% + 0.0% )
==11611==
==11611== LL refs: 3,224,025 ( 2,889,488 rd + 334,537 wr)
==11611== LL misses: 20,152 ( 18,325 rd + 1,827 wr)
==11611== LL miss rate: 0.0% ( 0.0% + 0.0% )
What I don't understand is the final percentage for the LL miss rate: computing LL misses / LL refs * 100 gives about 0.6%, while the terminal reports 0.0%. Is this an approximation made by Cachegrind?
Using kcachegrind I only get percentages next to the event types and next to the lines of code (as in the figure). Is it possible to see the number of misses instead?

Related

Is the output of valgrind still reliable when it reports the 'impossible' happened: LibVEX called failure_exit()?

Here is the brief log:
# valgrind --error-limit=no --leak-check=full --tool=memcheck /mnt/aarch64/ld-linux-aarch64.so.1 ./program
==12104== Memcheck, a memory error detector
==12104== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12104== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==12104== Command: /mnt/aarch64/ld-linux-aarch64.so.1 ./program
==12104==
vex: priv/host_arm64_defs.c:2796 (genSpill_ARM64): Assertion `offsetB < 4096' failed.
vex storage: T total 4207069920 bytes allocated
vex storage: P total 0 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
host stacktrace:
==12104== at 0x5803F488: show_sched_status_wrk (m_libcassert.c:406)
==12104== by 0x5803F5C7: report_and_quit (m_libcassert.c:477)
==12104== by 0x5803F7FB: panic (m_libcassert.c:553)
==12104== by 0x5803F7FB: vgPlain_core_panic_at (m_libcassert.c:558)
==12104== by 0x5803F81F: vgPlain_core_panic (m_libcassert.c:563)
==12104== by 0x5805481B: failure_exit (m_translate.c:761)
==12104== by 0x5811E043: vex_assert_fail (main_util.c:245)
==12104== by 0x5817A897: genSpill_ARM64 (host_arm64_defs.c:2796)
==12104== by 0x58172217: spill_vreg (host_generic_reg_alloc3.c:338)
==12104== by 0x5817324F: doRegisterAllocation_v3 (host_generic_reg_alloc3.c:1280)
==12104== by 0x5811CD97: libvex_BackEnd (main_main.c:1133)
==12104== by 0x5811CD97: LibVEX_Translate (main_main.c:1236)
==12104== by 0x58056FCB: vgPlain_translate (m_translate.c:1830)
==12104== by 0x58092A27: handle_chain_me (scheduler.c:1169)
==12104== by 0x580954A7: vgPlain_scheduler (scheduler.c:1514)
==12104== by 0x580D8E8F: thread_wrapper (syswrap-linux.c:101)
==12104== by 0x580D8E8F: run_a_thread_NORETURN (syswrap-linux.c:154)
==12104== by 0x580D916F: vgModuleLocal_start_thread_NORETURN (syswrap-linux.c:328)
==12104== by 0x580A68D3: ??? (in /mnt/aarch64/lib/valgrind/memcheck-arm64-linux)
sched status:
running_tid=3
I want to know whether the rest of valgrind's output is still meaningful for pointing out memory leaks in the program when it reports this error (i.e. the 'impossible' happened: LibVEX called failure_exit()).
Updated:
The program does run; I can see many stack traces, e.g.:
Thread 1: status = VgTs_WaitSys syscall 98 (lwpid 12168)
==12168== at 0x6E85274: syscall (in /lib/libc-2.31.so)
==12168== by 0x6B99FF3: std::__atomic_futex_unsigned_base
...
Thread 2:
...
In general, if Valgrind reports some errors and then encounters an internal error, those initial reports should be valid.
In the case of the message that you show, the first report is an internal error related to the virtualization of the ARM aarch64 CPU. That's only useful for Valgrind developers.
Try running Valgrind with -d -v. There will be a lot of output, but you should see:
- syswrap- run_a_thread_NORETURN(tid=1): pre-thread_wrapper (that is the guest executable starting)
- a load of gdbsrv messages
- a mix of REDIR and mallocrf/hashtbl messages as the guest startup code invokes ld.so to load shared libraries
I can't tell where you get the subsequent traces from - could be trace options, more likely from the assert.

TCMalloc memory leak debugging

I've compiled an application with tcmalloc and used the HEAPPROFILE environment variable to get heap files.
A new heap file is created every 10 MB or so, and according to the tcmalloc page I can use the pprof tool to compare heap files and see which additional objects have not been released (a possible leak).
pprof --text myapp --base=myapp.0001.heap myapp.0047.heap
The result is:
...
Total: 4600.7 MB
4592.3 99.8% 99.8% 4592.3 99.8% 0x00000000009f1d25
7.3 0.2% 100.0% 7.3 0.2% 0x00000000009f1cfc
1.0 0.0% 100.0% 1.0 0.0% 0x00000000009f74f1
0.0 0.0% 100.0% 4600.7 100.0% 00007f07fe149b44
0.0 0.0% 100.0% 4600.7 100.0% 0x0000000000480da1
0.0 0.0% 100.0% 4600.7 100.0% 0x00000000004b5a3e
0x00000000009f1d25 is a nice address, but I can't really do anything with this data.
I've tried running the same in a helloworld application:
pprof --text helloworld helloworld.0001.heap
Using local file helloworld.
Using local file helloworld.0001.heap.
Total: 9.5 MB
9.5 100.0% 100.0% 9.5 100.0% BigNumber::BigNumber
0.0 0.0% 100.0% 0.0 0.0% __GI__IO_file_doallocate
0.0 0.0% 100.0% 9.5 100.0% main
0.0 0.0% 100.0% 0.0 0.0% _IO_new_file_overflow
0.0 0.0% 100.0% 0.0 0.0% _IO_new_file_xsputn
0.0 0.0% 100.0% 0.0 0.0% __GI__IO_doallocbuf
0.0 0.0% 100.0% 0.0 0.0% __GI__IO_fwrite
0.0 0.0% 100.0% 9.5 100.0% __libc_start_main
0.0 0.0% 100.0% 9.5 100.0% _start
0.0 0.0% 100.0% 0.0 0.0% std::__ostream_insert
0.0 0.0% 100.0% 0.0 0.0% std::operator<<
Here all the functions have readable names, and the leak clearly comes from the BigNumber constructor.
Can anyone point me in the right direction toward finding out what the address above refers to?

Valgrind is Telling me there are bytes allocated on my heap? [duplicate]

I have developed a pure-C implementation of FIFO lists (queues) in files fifo.h and fifo.c, and have written a test programme testfifo.c which I compile to ./bin/testfifo. The node structure is defined in list.h.
I run my programme through Valgrind on OS X 10.6 like this
valgrind --tool=memcheck --leak-check=full --show-reachable=yes ./bin/testfifo
and get the following output
==54688== Memcheck, a memory error detector
==54688== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==54688== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==54688== Command: bin/testfifo
==54688==
--54688-- bin/testfifo:
--54688-- dSYM directory is missing; consider using --dsymutil=yes
==54688==
==54688== HEAP SUMMARY:
==54688== in use at exit: 88 bytes in 1 blocks
==54688== total heap usage: 11 allocs, 10 frees, 248 bytes allocated
==54688==
==54688== LEAK SUMMARY:
==54688== definitely lost: 0 bytes in 0 blocks
==54688== indirectly lost: 0 bytes in 0 blocks
==54688== possibly lost: 0 bytes in 0 blocks
==54688== still reachable: 0 bytes in 0 blocks
==54688== suppressed: 88 bytes in 1 blocks
==54688==
==54688== For counts of detected and suppressed errors, rerun with: -v
==54688== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
According to the leak summary, there are no leaks, but I am still wondering what the "suppressed" leaks are. Besides, the numbers of allocs and frees do not match, so I am unsure whether there are leaks or not.
----EDIT----
Running
valgrind --tool=memcheck --leak-check=full --show-reachable=yes -v ./bin/testfifo
on OS X 10.6 produces quite long and confusing output, but I have run
valgrind --tool=memcheck --leak-check=full --show-reachable=yes ./bin/testfifo
on a Linux machine and got this output:
==32688== Memcheck, a memory error detector
==32688== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==32688== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==32688== Command: bin/testfifo
==32688==
==32688==
==32688== HEAP SUMMARY:
==32688== in use at exit: 0 bytes in 0 blocks
==32688== total heap usage: 10 allocs, 10 frees, 160 bytes allocated
==32688==
==32688== All heap blocks were freed -- no leaks are possible
==32688==
==32688== For counts of detected and suppressed errors, rerun with: -v
==32688== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
The allocs and frees now match, so the extra alloc on OS X seems to be due to some system library, as has been suggested.
I have run the very same command with the -v option, in order to reveal the 4 suppressed errors, but I have not got any easily understandable new information.
Those are leaks outside of your code, in (probably shared) libraries or known false positives. Running valgrind with -v should inform you about the suppressions used.

Compiling with -mpopcnt causes Illegal instruction error

I compile the following C++ code
// main.cpp
#include <cstdio>
int main() {
    unsigned char tab[4] = {0};
    printf("%d\n", __builtin_popcount(*((int *)tab)));
}
using command line:
g++ -o prog main.cpp -mpopcnt
When I run the program I get the error:
Illegal instruction
Compiling without -mpopcnt does not give an error (it just prints 0).
Question: what is causing this error?
I am compiling and running the program on the same machine.
Valgrind detects no problem. Running
valgrind --leak-check=full ./prog
gives
==12917== Memcheck, a memory error detector
==12917== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==12917== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==12917== Command: ./prog
==12917==
0
==12917==
==12917== HEAP SUMMARY:
==12917== in use at exit: 0 bytes in 0 blocks
==12917== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==12917==
==12917== All heap blocks were freed -- no leaks are possible
==12917==
==12917== For counts of detected and suppressed errors, rerun with: -v
==12917== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Below I give some specifications of my system.
I'm using Ubuntu 12.04. Running
uname -a
gives me
Linux wtu-82 3.2.0-65-generic #99-Ubuntu SMP Fri Jul 4 21:03:29 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Running
g++ -v
gives
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.4-1ubuntu1~12.04' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.4 (Ubuntu/Linaro 4.6.4-1ubuntu1~12.04)
The output of
cat /proc/cpuinfo
is
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
stepping : 10
microcode : 0xa0c
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips : 6317.48
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
stepping : 10
microcode : 0xa0c
cpu MHz : 2000.000
cache size : 6144 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips : 6317.38
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
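POPCNT was introduced together with SSE4.2 and has its own CPU feature flag, shown as popcnt in /proc/cpuinfo. Your processor supports only up to SSE4.1: the flags line above lists sse4_1 but neither sse4_2 nor popcnt. So the instruction is simply missing. You get an illegal instruction error because -mpopcnt forces the compiler to generate code using an instruction your processor does not implement.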

cannot get Helgrind/DRD work with C++11 thread

I have problems getting Helgrind and DRD working with g++ and C++11 threads.
My setup:
- Red Hat Linux 2.6
- g++ 4.7.2
- Valgrind 3.7.0
I tried the program posted here, after adding the definitions listed in the first answer, thus:
#include <valgrind/helgrind.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1
#include <thread>
int main()
{
    std::thread t( []() { } );
    t.join();
    return 0;
}
I then build the program:
$ g++ -std=c++11 -Wall -Wextra -pthread main.cc
The program (which doesn't do much) runs correctly:
$ ./a.out
and also under valgrind:
$ valgrind ./a.out
==21284== Memcheck, a memory error detector
==21284== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==21284== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21284== Command: ./a.out
==21284==
==21284==
==21284== HEAP SUMMARY:
==21284== in use at exit: 0 bytes in 0 blocks
==21284== total heap usage: 2 allocs, 2 frees, 344 bytes allocated
==21284==
==21284== All heap blocks were freed -- no leaks are possible
==21284==
==21284== For counts of detected and suppressed errors, rerun with: -v
==21284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)
But then, with Helgrind, I get false positives:
$ valgrind --tool=helgrind ./a.out
==21467== Helgrind, a thread error detector
==21467== Copyright (C) 2007-2011, and GNU GPL'd, by OpenWorks LLP et al.
==21467== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==21467== Command: ./a.out
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467==
==21467== Thread #1 is the program's root thread
==21467==
==21467== ---Thread-Announcement------------------------------------------
==21467== [lines removed]
==21467==
==21467== ----------------------------------------------------------------
==21467==
==21467== Possible data race during write of size 8 at 0x5B7A058 by thread #1
==21467== Locks held: none
==21467== [lines removed]
==21467==
==21467== This conflicts with a previous write of size 8 by thread #2
==21467== Locks held: none
==21467== at 0x4EE0A25: execute_native_thread_routine (shared_ptr_base.h:587)
==21467== by 0x4C2D3AD: mythread_wrapper (hg_intercepts.c:219)
==21467== by 0x55D1850: start_thread (in /lib64/libpthread-2.12.so)
==21467== by 0x58CF90C: clone (in /lib64/libc-2.12.so)
==21467==
==21467== [lines removed]
==21467==
==21467==
==21467== For counts of detected and suppressed errors, rerun with: -v
==21467== Use --history-level=approx or =none to gain increased speed, at
==21467== the cost of reduced accuracy of conflicting-access information
==21467== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
I get similar bogus reports with DRD instead of Helgrind.
Any idea what could be wrong?
I found this in the DRD manual. It seems you need to recompile the functions execute_native_thread_routine() and std::thread::_M_start_thread() and link them into your program, rather than using the versions from the shared library at execution time.
The _GLIBCXX_SYNCHRONIZATION_HAPPENS_* macros are seen by your code, but they were not seen by the internal functions of the C++ library when that library was built. That is why you need to recompile those functions with the same header inclusion and macro definitions that you used for your own code.