Valgrind reporting memory leak in RocksDB - profiling

I am trying to profile the performance of RocksDB using Callgrind / KCacheGrind on a Mac. I am running the command valgrind --tool=callgrind ./simple_example on one of the example programs that ship with RocksDB in the examples folder. However, I seem to be getting a memory leak, which prevents me from doing the performance profiling that I ultimately want to do.
==54628== Callgrind, a call-graph generating cache profiler
==54628== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==54628== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==54628== Command: ./simple_example
==54628==
==54628== For interactive control, run 'callgrind_control -h'.
==54628==
==54628== Process terminating with default action of signal 11 (SIGSEGV)
==54628== Access not within mapped region at address 0x18
==54628== at 0x1016D25BA: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==54628== by 0x1016D250C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==54628== by 0x1016D1BF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==54628== If you believe this happened as a result of a stack
==54628== overflow in your program's main thread (unlikely but
==54628== possible), you can try to increase the size of the
==54628== main thread stack using the --main-stacksize= flag.
==54628== The main thread stack size used in this run was 8388608.
--54628:0:schedule VG_(sema_down): read returned -4
==54628==
==54628== Events : Ir
==54628== Collected : 14961779
==54628==
==54628== I refs: 14,961,779
Segmentation fault: 11

Related

Performance testing in C++ project

I wrote a project in C++ with 10 threads. One thread loads the data into memory (it writes the buffer), and the other 9 threads simultaneously read the buffer and store the data in an SQLite database. All threads are synchronized with a mutex to avoid conflicts.
Now I need to evaluate the performance of this project, such as the time to completion per thread, memory usage, etc. How can I do that in a C++ environment? I used Valgrind to check these, but I don't think it is working.
This is the command I ran with Valgrind:
valgrind --tool=memcheck --leak-check=yes ./executable
It gives a message like this,
callers=20 --track-fds=yes ./monerosci
==24262== Memcheck, a memory error detector
==24262== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24262== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==24262== Command: ./monerosci
==24262==
valgrind: m_syswrap/syswrap-linux.c:5361 (vgSysWrap_linux_sys_fcntl_before): Assertion 'Unimplemented functionality' failed.
valgrind: valgrind
host stacktrace:
==24262== at 0x38083F48: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x38084064: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380841F1: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380FB399: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380D6234: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380D2D2A: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380D43DE: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
==24262== by 0x380E3946: ??? (in /usr/lib/valgrind/memcheck-amd64-linux)
How can I test the performance of the project in C++?
Well, it seems there are two separate problems here:
1) memcheck is failing to run due to a bug or some limitation. Apparently one variation of an fcntl call is not supported by your version of valgrind. You could reduce the code size and remove libraries until you can pinpoint which call triggers the problem, or just run it under a different version of valgrind. However, I think memcheck will not give you the data you want...
2) memcheck is not a tool for profiling. Valgrind is composed of several different tools that can be selected with the --tool parameter. Here's an overview of them. The one most likely to give you the info you want is callgrind.
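Callgrind reports instruction counts rather than wall-clock time, so for the "time per thread" part of the question a rough complementary measurement can be taken inside the program itself. The following is only a minimal sketch using std::chrono and std::thread; the function name time_tasks and the idea of passing each thread's work as a std::function are assumptions made for illustration, not part of the original project.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs each task on its own thread and returns the
// wall-clock duration of each task in milliseconds.
std::vector<double> time_tasks(const std::vector<std::function<void()>>& tasks) {
    std::vector<double> elapsed_ms(tasks.size(), 0.0);
    std::vector<std::thread> threads;
    threads.reserve(tasks.size());

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        // Each thread writes only to its own slot, so no mutex is needed here.
        threads.emplace_back([&tasks, &elapsed_ms, i] {
            const auto start = std::chrono::steady_clock::now();
            tasks[i]();
            const auto stop = std::chrono::steady_clock::now();
            elapsed_ms[i] =
                std::chrono::duration<double, std::milli>(stop - start).count();
        });
    }

    // Wait for every thread before the results are read.
    for (auto& t : threads) {
        t.join();
    }
    return elapsed_ms;
}
```

steady_clock is used because it is monotonic, so the measured interval is not affected by system clock adjustments. The numbers include time spent waiting on locks, which is usually what matters for a reader/writer design like the one described.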

Valgrind shows question marks

I have compiled a program using Code::Blocks. I have turned on "produce debugging symbols" under the "Debug" target, and also turned off "strip all symbols..." But when I run the program with Valgrind I get question marks in the output:
$ valgrind --leak-check=yes --track-origins=yes --log-file=valgrind_output.txt ~/bin/myprg
==3766== Memcheck, a memory error detector
==3766== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3766== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==3766== Command: /home/xxxxxx/bin/myprg
==3766== Parent PID: 3209
==3766==
==3766== Warning: client switching stacks? SP change: 0xffefff978 --> 0xffed13da0
==3766== to suppress, use: --max-stackframe=3062744 or greater
==3766== Invalid write of size 4
==3766== at 0x40892B: ??? (in /home/xxxxxx/bin/myprg)
==3766== by 0x40275C: ??? (in /home/xxxxxx/bin/myprg)
==3766== by 0x56FB82F: (below main) (libc-start.c:291)
==3766== Address 0xffed13ddc is on thread 1's stack
==3766==
==3766== Invalid write of size 4
==3766== at 0x408931: ??? (in /home/xxxxxx/bin/myprg)
==3766== by 0x40275C: ??? (in /home/xxxxxx/bin/myprg)
==3766== by 0x56FB82F: (below main) (libc-start.c:291)
==3766== Address 0xffed13dd4 is on thread 1's stack
==3766==
...
What is the meaning of this output, and how do I find the piece of code that is causing this error?
Update: Solution
The problem was with Code::Blocks. It is necessary to correctly configure the project build options for the whole project and not just the "Debug" target. So all flags except "-std=c++11" were removed from the "whole project" options, so nothing was overriding the "Debug" options. Also the linker ".o" files need to be deleted when the options are changed, to force Code::Blocks to rebuild the executable.
The code needs to be compiled and linked with debug info (-g command line option) and -fno-omit-frame-pointer for valgrind to show correct stack traces.
See "The stack traces given by Memcheck (or another tool) aren't helpful. How can I improve them?" for more details.
I recently had this problem and was able to resolve it by using the "--keep-debuginfo=yes" option, as suggested by the FAQ on this page.

valgrind error with __builtin_ctz

I'm trying to profile my code but run into problems.
If I run the following code:
#include <iostream>
int main() {
size_t val = 8;
std::cout << sizeof(val) << std::endl;
std::cout << __builtin_ctz(val) << std::endl;
}
It returns as expected
8
3
If I run valgrind on it it returns:
==28602== Memcheck, a memory error detector
==28602== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==28602== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==28602== Command: ./test
==28602==
8
vex amd64->IR: unhandled instruction bytes: 0xF3 0xF 0xBC 0xC0 0x89 0xC6 0xBF 0x60
==28602== valgrind: Unrecognised instruction at address 0x400890.
==28602== at 0x400890: main (in /home/magu_/sod/test/test)
==28602== Your program just tried to execute an instruction that Valgrind
==28602== did not recognise. There are two possible reasons for this.
==28602== 1. Your program has a bug and erroneously jumped to a non-code
==28602== location. If you are running Memcheck and you just saw a
==28602== warning about a bad jump, it's probably your program's fault.
==28602== 2. The instruction is legitimate but Valgrind doesn't handle it,
==28602== i.e. it's Valgrind's fault. If you think this is the case or
==28602== you are not sure, please let us know and we'll try to fix it.
==28602== Either way, Valgrind will now raise a SIGILL signal which will
==28602== probably kill your program.
==28602==
==28602== Process terminating with default action of signal 4 (SIGILL)
==28602== Illegal opcode at address 0x400890
==28602== at 0x400890: main (in /home/magu_/sod/test/test)
==28602==
==28602== HEAP SUMMARY:
==28602== in use at exit: 0 bytes in 0 blocks
==28602== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==28602==
==28602== All heap blocks were freed -- no leaks are possible
==28602==
==28602== For counts of detected and suppressed errors, rerun with: -v
==28602== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Illegal instruction (core dumped)
Is this a bug in valgrind, or should I not use __builtin_ctz on my computer? __builtin_popcount does not raise any errors.
My system:
g++ (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1
CPU : Intel Core Duo T7500
You need to upgrade valgrind to at least 4.8.1 or use a gcc older than v4.8.
The opcode you ran into -- F3 0F BC -- is the TZCNT opcode, introduced in BMI1, which your CPU doesn't implement. However, it is also REP;BSF (F3 is REP) and older CPUs, including yours, ignore the REP for this opcode, and the similar LZCNT == REP;BSR pair. There is very little difference between TZCNT and BSF (they differ in how they handle 0).
Older gcc versions used BSF for older CPUs and TZCNT for newer ones, but since the opcode is relatively rare, in newer gcc versions the logic was simplified and TZCNT is always used, since both older and newer CPUs understand it.
Unfortunately, valgrind did not correctly fall back from TZCNT to BSF until v4.8.1. See bug 295808.
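To make the remark about zero handling concrete: for a zero input, TZCNT returns the operand width, while BSF leaves its destination undefined, and GCC documents __builtin_ctz(0) as undefined as well. Below is a minimal sketch of a portable helper that defines the zero case explicitly. The name count_trailing_zeros and the choice to return 64 for zero (mirroring TZCNT on a 64-bit operand) are assumptions for illustration; the compiler remains free to pick whatever instructions it likes for the loop.

```cpp
#include <cstdint>

// Hypothetical helper: counts trailing zero bits of a 64-bit value.
// Returns 64 when value is 0, which is what TZCNT reports for a 64-bit
// operand; __builtin_ctz and BSF give no defined result in that case.
int count_trailing_zeros(std::uint64_t value) {
    if (value == 0) {
        return 64;
    }
    int count = 0;
    // Shift right until the lowest set bit reaches position 0.
    while ((value & 1u) == 0) {
        value >>= 1;
        ++count;
    }
    return count;
}
```

With the value from the question, count_trailing_zeros(8) gives 3, the same result as the builtin.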
On Debian/Sid/x86-64 (Intel i4750HQ processor) with gcc version 4.9.1 (Debian 4.9.1-4) and valgrind-3.9.0, your test works fine (valgrind runs successfully without reporting any errors).
So I suggest you upgrade your GCC compiler and, most importantly, valgrind. Start by compiling valgrind from its valgrind-3.9.0 source code tarball (and run aptitude build-dep valgrind beforehand).
BTW, your distribution version is quite old. Have you considered upgrading to Ubuntu 14.04 LTS?
If you don't have root access, consider passing an explicit --prefix (e.g. $HOME/pub/) to valgrind-3.9.0/configure.

Valgrind reports write error? Why?

When running Valgrind's memcheck, valgrind occasionally reports an error like this:
==2745== Memcheck, a memory error detector
==2745== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==2745== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info
==2745== Command: ./HSFramework
==2745==
==2745== Invalid write of size 8
==2745== at 0x3B81C097C0: do_lookup_x (in /lib64/ld-2.12.so)
==2745== by 0x1C31032D: ???
==2745== by 0x3B81C09E19: _dl_lookup_symbol_x (in /lib64/ld-2.12.so)
==2745== Address 0x7feffee78 is on thread 1's stack
==2745==
platform: Linux 2.6.32-220.el6.x86_64 x86_64 x86_64 x86_64 GNU/Linux
There is no clue about my code in this error report.
I have no idea what this error report means.
What could lead to this error?
This error means you are getting a buffer overrun in do_lookup_x. If you have its source, look at that or share it with us.
http://valgrind.org/docs/manual/quick-start.html
This means that the do_lookup_x function has performed an invalid write access. That function is part of the runtime library (and not likely the origin of the issue). I would contact the author of HSFramework to see if they can fix this issue by running valgrind as you did.

Invisible SIGSEGV on Linux that does not happen on Windows?

INTRO
I have a TCP/HTTP server that supports plugins in the form of shared libraries (DLL and .so). Its build system generates make and .sln files via premake. When I start my application, I feed it a configuration file like this, describing which libraries the server shall use as plugins and which arguments it shall pass to them. For some time I had 2 plugins and everything worked just fine, and it still works just fine if I feed my server config files like this. But now I have a new plugin that I am developing, and so a new config file.
SETUP
The steps required to set up my server on Linux are few and simple:
download the build script (from here, as described here)
run ./cloud_server_net_setup.sh (no superuser needed; requires curl, make and g++)
In the regular (non-development) case this is enough: it will fetch Boost and the other libraries it needs into a local folder, build all of them, and build the server in release form.
now you can cd into cloud_server/install-dir/
call export LD_LIBRARY_PATH=./:./lib_boost
and run our server ./CloudServer
But we need the debug version, so after running the script we do:
cd cloud_server/CloudServer/projects/linux-gmake/
make
cd bin/debug
export LD_LIBRARY_PATH=./:(place from where we called our script)/cloud_server/install-dir/lib_boost
PROBLEM
And now, finally, we can call gdb.
So we call it, and this is what we see:
gdb ./CloudServer
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer...done.
(gdb) r
Starting program: /home/ole_jak/cloud_server/CloudServer/projects/linux-gmake/bin/debug/CloudServer
[Thread debugging using libthread_db enabled]
Cloud Server v0.5
Copyright (c) 2011 Cloud Forever. All rights reserved.
Type 'help' to see help messages.
Config file path: config.xml
[New Thread 0x7ffff5967700 (LWP 11516)]
[New Thread 0x7ffff5166700 (LWP 11517)]
[New Thread 0x7ffff4965700 (LWP 11518)]
[New Thread 0x7ffff4164700 (LWP 11519)]
[New Thread 0x7ffff3963700 (LWP 11520)]
[New Thread 0x7ffff3162700 (LWP 11521)]
[New Thread 0x7ffff2961700 (LWP 11522)]
[New Thread 0x7ffff2160700 (LWP 11523)]
[New Thread 0x7ffff195f700 (LWP 11524)]
[New Thread 0x7ffff115e700 (LWP 11525)]
[New Thread 0x7ffff095d700 (LWP 11526)]
[New Thread 0x7fffebfff700 (LWP 11527)]
[New Thread 0x7fffeb7fe700 (LWP 11528)]
[New Thread 0x7fffeaffd700 (LWP 11529)]
[New Thread 0x7fffea7fc700 (LWP 11530)]
[New Thread 0x7fffe9ffb700 (LWP 11531)]
Library libFileService.so opened.
[New Thread 0x7fffe953c700 (LWP 11532)]
Library libUsersFilesService.so opened.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) x/i $pc
0x0: Cannot access memory at address 0x0
I am a Linux newbie, and all I know about segmentation faults I know from Wikipedia. But I know one more thing about my server and this new service I am creating: it compiles and runs on Windows with no errors at all (VS2008 and VS2010 solutions can be created from the same premake script).
So I wonder how and where in these 2 files (.cpp and .h) I have introduced an error that does not show up on Windows at all and shows up so dramatically on Linux. And is it fixable, or visible to a fresh eye?
UPDATE:
Valgrind output
ole_jak#dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$ valgrind ./CloudServer
==11682== Memcheck, a memory error detector
==11682== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11682== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11682== Command: ./CloudServer
==11682==
Cloud Server v0.5
Copyright (c) 2011 Cloud Forever. All rights reserved.
Type 'help' to see help messages.
Config file path: config.xml
Library libFileService.so opened.
Library libUsersFilesService.so opened.
==11682== Jump to the invalid address stated on the next line
==11682== at 0x0: ???
==11682== by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
==11682== by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
==11682== by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
==11682== by 0x102A159F: openDatabase (sqlite3.c:108909)
==11682== by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
==11682== by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
==11682== by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
==11682== by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
==11682== by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
==11682== by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
==11682== by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
==11682== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==11682==
==11682==
==11682== Process terminating with default action of signal 11 (SIGSEGV)
==11682== Bad permissions for mapped region at address 0x0
==11682== at 0x0: ???
==11682== by 0x4D49BE: sqlite3_free (sqlite3.c:18155)
==11682== by 0x102242D5: sqlite3OsInit (sqlite3.c:14162)
==11682== by 0x1029EB28: sqlite3_initialize (sqlite3.c:107299)
==11682== by 0x102A159F: openDatabase (sqlite3.c:108909)
==11682== by 0x102A1B29: sqlite3_open (sqlite3.c:109156)
==11682== by 0x1021CAB0: sqlite3pp::database::connect(char const*) (sqlite3pp.cpp:89)
==11682== by 0x1021C6E3: sqlite3pp::database::database(char const*) (sqlite3pp.cpp:74)
==11682== by 0x1020DDDF: users_files_service::create_files_table(std::string) (users_files_service.cpp:171)
==11682== by 0x1020BAFC: users_files_service::apply_config(boost::shared_ptr<boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> > >) (users_files_service.cpp:38)
==11682== by 0x4B5432: server_utils::parse_config_services(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:156)
==11682== by 0x4B6436: server_utils::parse_config(boost::property_tree::basic_ptree<std::string, std::string, std::less<std::string> >) (server_utils.cpp:208)
==11682==
==11682== HEAP SUMMARY:
==11682== in use at exit: 124,050 bytes in 1,083 blocks
==11682== total heap usage: 1,814 allocs, 731 frees, 183,517 bytes allocated
==11682==
==11682== LEAK SUMMARY:
==11682== definitely lost: 0 bytes in 0 blocks
==11682== indirectly lost: 0 bytes in 0 blocks
==11682== possibly lost: 46,248 bytes in 799 blocks
==11682== still reachable: 77,802 bytes in 284 blocks
==11682== suppressed: 0 bytes in 0 blocks
==11682== Rerun with --leak-check=full to see details of leaked memory
==11682==
==11682== For counts of detected and suppressed errors, rerun with: -v
==11682== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Killed
ole_jak#dspproc:~/cloud_server/CloudServer/projects/linux-gmake/bin/debug$
This is a nasty one. I am unsure about the exact root cause, but this seems to be a multi-threading related issue. The immediate cause of the problem is that the sqlite3Config.m.xSize function pointer is NULL at the place and time the error happens.
This pointer is supposed to be initialized to point to a proper function the first time that sqlite3_initialize() is called, which normally happens the first time you open an SQLite database file. By setting breakpoints and watchpoints in GDB I was able to verify that the pointer is successfully set, yet at the time of the segmentation fault its value is NULL.
That could mean one of two things:
The new pointer value is not properly propagated to all threads. SQLite3 is supposed to be thread-safe, but well, threads can be nasty little buggers...
Something resets the pointer after it has been initialized. I considered this highly unlikely since the sqlite3Config structure is not usually modified after initialization.
I performed a simple test, which incidentally can be used as a temporary workaround: I added an explicit call to sqlite3_initialize() as the first statement in main(), allowing it to be executed before any threads are launched. As a result, the segmentation fault went away and I got a shell prompt for your server, which points to the first of the two alternatives. Note that this is a workaround at best, since sqlite3_initialize() is not supposed to be explicitly called. The root cause of the issue may still be present and make itself known otherwise - or, worse, it could break things in subtle, yet hard to detect, ways.
Since SQLite3 is supposed to be thread-safe (and the source code of the sqlite3_initialize() function seems correct in that regard), I am unsure what is happening. It could be a problem with the sqlite3pp wrapper or with the way the threads are launched.
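The shape of the workaround described above (initialize once on the main thread, then start the workers) can be sketched without SQLite itself. Everything in the block below is a hypothetical stand-in: FakeConfig, fake_initialize and worker_use only model the idea of a global function pointer that is NULL until initialization. In the real server, the call in that position would be sqlite3_initialize().

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library's global configuration, loosely
// modelled on the sqlite3Config.m.xSize pointer mentioned above.
struct FakeConfig {
    std::size_t (*xSize)(const void*) = nullptr;  // NULL until initialized
};

static FakeConfig g_config;

static std::size_t fake_size(const void*) { return 16; }

// Stand-in for sqlite3_initialize(): fills in the function pointer.
void fake_initialize() { g_config.xSize = &fake_size; }

// Work done by each thread. Calling through xSize while it is still NULL
// would be a jump to address 0, like the one in the valgrind report.
std::size_t worker_use() { return g_config.xSize(nullptr); }

// Initializes on the calling thread first, then launches n workers and
// returns the sum of their results.
std::size_t run_workers(int n) {
    fake_initialize();  // runs before any worker thread exists

    std::vector<std::size_t> results(static_cast<std::size_t>(n));
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&results, i] { results[i] = worker_use(); });
    }
    for (auto& t : threads) {
        t.join();
    }

    std::size_t total = 0;
    for (std::size_t r : results) {
        total += r;
    }
    return total;
}
```

Because the pointer is written before any std::thread is constructed, every worker is guaranteed to see the initialized value. This only mirrors the workaround; it does not explain why the original initialization failed to take effect.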
Here are my suggestions.
Turn off optimizations. Sometimes optimizations cause errors. Use -O0, for example.
Remove dynamic loading: try linking your code in statically, and see if the problem still occurs.
Reduce the size of the problem. Make the smallest possible program that can reproduce the error and then post it here.
thanks,
mike