valgrind does not show memory leak in log [duplicate] - c++

This question already has an answer here: valgrind, gcc 6.2.0 and "-fsanitize=address"
I have a service running locally. When I stop the service, ASAN gives me memory leak messages, so I tried to use Valgrind to find where the leaks are, but it reports no such errors.
I run it with
valgrind --leak-check=full --show-leak-kinds=all --verbose --log-file=out.txt /my/path/to/myshell -m myservice.py
"/my/path/to/myshell -m myservice.py" is how I start my service locally.
myshell invokes a custom Python interpreter with os.execve.
After I stop my service, ASAN gives me a lot of messages about memory leaks, but out.txt contains only the pid (which matches the process shown by ps -ef) and no memory leak info at all. What is wrong?

If you intend to run the program under valgrind, then don't compile with ASAN. They do not work together.
Recompile without -fsanitize=address and try again.
You also need the --trace-children=yes flag to valgrind in order for it to check subprocesses executed by execve.
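Putting both pieces of advice together, a sketch of the corrected invocation (assuming -fsanitize=address was the only sanitizer flag in your build; the %p in the log file name gives each traced child process its own log):

valgrind --leak-check=full --show-leak-kinds=all --trace-children=yes \
    --verbose --log-file=out.%p.txt /my/path/to/myshell -m myservice.py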

Related

How to find "index out of bound" if I don't get Segmentation Fault

I have a program which causes a segfault on a machine that is not accessible to me. However, when I compile and run it with the same compiler and the same input on my machine, nothing happens. The problem is probably an array index out of bounds, which might lead to a segfault in some circumstances; however, the compiler does not show any warning. The program is huge and complicated, so I cannot find the problem just by inspecting the code.
Any suggestion on how to reproduce the segmentation fault on my machine too? That way I can debug the code and find the problem.
You could use valgrind if you are on a Linux machine.
To use valgrind you just type on console:
valgrind --leak-check=full --num-callers=20 --tool=memcheck ./program
It should report an invalid read/write of size X for the offending access, and (if you compiled with debugging information) it will tell you the line where the problem might be.
By the way, you can install valgrind in Ubuntu/Debian Linux (for example) just as easy as:
sudo apt-get install valgrind
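As a minimal illustration of what memcheck reports (a hypothetical program, not the asker's code):

// oob.cpp -- deliberately reads one element past a heap allocation
#include <vector>

int main()
{
    std::vector<int> v(4);
    // Reads 4 bytes just past the heap block backing v; memcheck
    // reports "Invalid read of size 4" at this line when built with -g.
    return v.data()[4];
}

Compile and run with g++ -g oob.cpp -o oob && valgrind ./oob to see the report including the file name and line number.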
You can try a solution such as Valgrind, as other posters mentioned, or your compiler may have a specific ability to insert guard words around buffers to detect this kind of access (for example, GCC's -fstack-protector).

MPI and Valgrind not showing line numbers

I've written a large program and I'm having a really hard time tracking down a segmentation fault. I posted a question, but I didn't have enough information to go on (see the link below; if you follow it, note that I spent almost an entire day trying several times to come up with a minimal compilable version of the code that reproduced the error, to no avail).
https://stackoverflow.com/questions/16025411/phantom-bug-involving-stdvectors-mpi-c
So now I'm trying my hand at valgrind for the first time. I just installed it (simply "sudo apt-get install valgrind") with no special installation to account for MPI (if there is any). I'm hoping for concrete information including file names and line numbers (I understand it's impossible for valgrind to provide variable names). While I am getting useful information, including
Invalid read of size 4
Conditional jump or move depends on uninitialised value(s)
Uninitialised value was created by a stack allocation
4 bytes in 1 blocks are definitely lost
in addition to this magical thing
Syscall param sched_setaffinity(mask) points to unaddressable byte(s) at 0x433CE77: syscall (syscall.S:31) Address 0x0 is not stack'd, malloc'd or (recently) free'd
I am not getting file names and line numbers. Instead, I get
==15095== by 0x406909A: ??? (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
Here's how I compile my code:
mpic++ -Wall -Wextra -g -O0 -o Hybrid.out (…file names)
Here are two ways I've executed valgrind:
valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log.txt mpirun -np 1 Hybrid.out
and
mpirun -np 1 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
The second version is based on the instructions in
Segmentation faults occur when I run a parallel program with Open MPI
which, if I'm understanding the chosen answer correctly, appears to be contradicted by
openmpi with valgrind (can I compile with MPI in Ubuntu distro?)
I am deliberately running valgrind on one processor because that's the only way my program will execute to completion without the segmentation fault. I have also run it with two processors, and my program seg faulted as expected, but the log I got back from valgrind seemed to contain essentially the same information. I'm hoping that by resolving the issues valgrind reports on one processor, I'll magically solve the issue happening on more than one.
I tried to include "-static" in the program compilation as suggested in
Valgrind not showing line numbers in spite of -g flag (on Ubuntu 11.10/VirtualBox)
but the compilation failed, saying (in addition to several warnings)
dynamic STT_GNU_IFUNC symbol "strcmp" with pointer equality in '…' can not be used when making an executable; recompile with fPIE and relink with -pie
I have not looked into what "fPIE" and "-pie" mean. Also, please note that I am not using a makefile, nor do I currently know how to write one.
A few more notes: My code does not use the commands malloc, calloc, or new. I'm working entirely with std::vector; no C arrays. I do use commands like .resize(), .insert(), .erase(), and .pop_back(). My code also passes vectors to functions by reference and constant reference. As for parallel commands, I only use MPI_Barrier(), MPI_Bcast(), and MPI_Allgatherv().
How do I get valgrind to show the file names and line numbers for the errors it is reporting? Thank you all for your help!
EDIT
I continued working on it, and a friend of mine pointed out that the reports without line numbers all come from the MPI libraries, which I did not compile from source; since I did not compile them, they were not built with -g and hence show no line information. So I tried valgrind again based on this command,
mpirun -np 1 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
but now for two processors, which is
mpirun -np 2 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
The program ran to completion (I did not see the segfault reported on the command line), but this run of valgrind did give me line numbers within my files. The line valgrind points to is a line where I call MPI_Bcast(). Is it safe to say that this appeared because the memory problem only manifests itself on multiple processors (since the program runs successfully with -np 1)?
It sounds like you are using the wrong tool. If you want to know where a segmentation fault occurs, use gdb.
Here's a simple example. This program will segfault at *b=5
// main.c
int
main(int argc, char** argv)
{
    int* b = 0;  // b is a null pointer
    *b = 5;      // SIGSEGV: write through a null pointer
    return *b;
}
To see what happened, use gdb (the <---- annotations mark the lines you type):
svengali ~ % g++ -g -c main.c -o main.o # include debugging symbols in .o file
svengali ~ % g++ main.o -o a.out # executable is linked (no -g here)
svengali ~ % gdb a.out
GNU gdb (GDB) 7.4.1-debian
<SNIP>
Reading symbols from ~/a.out...done.
(gdb) run <--------------------------------------- RUNS THE PROGRAM
Starting program: ~/a.out
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005a3 in main (argc=1, argv=0x7fffffffe2d8) at main.c:5
5 *b = 5;
(gdb) bt <--------------------------------------- PRINTS A BACKTRACE
#0 0x00000000004005a3 in main (argc=1, argv=0x7fffffffe2d8) at main.c:5
(gdb) print b <----------------------------------- EXAMINE THE CONTENTS OF 'b'
$2 = (int *) 0x0
(gdb)
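Since the original program is MPI-based, a common way to get a gdb session per rank (suggested in the Open MPI debugging FAQ) is to launch each rank in its own terminal, for example:

mpirun -np 2 xterm -e gdb ./Hybrid.out

Each xterm then gives you an interactive gdb prompt where you can run the corresponding rank and print a backtrace with bt when it faults.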

Encountering exit code 139 when running, but gdb makes it through

My question sounds specific, but I suspect it is still a general C++ debugging issue.
I am using OMNeT++, which simulates wireless networks; OMNeT++ itself is a C++ program.
I encountered a strange phenomenon when running my program (a modified INET framework with OMNeT++ 4.2.2 on Ubuntu 12.04): the program exits with code 139 (people say this means memory fragmentation) when it reaches a certain part of the code. Yet when I debug, gdb doesn't report anything wrong with the 'problematic' code where the simulation previously exited; the debugger runs through that part of the code and produces the expected output.
gdb version info: GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Could anybody tell me why the run fails but debug doesn't?
Many thanks!
exit code 139 (people say this means memory fragmentation)
No, it means that your program died with signal 11 (SIGSEGV on Linux and most other UNIXes), also known as a segmentation fault; the shell reports 128 plus the signal number, and 128 + 11 = 139.
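You can confirm the mapping directly from a POSIX shell (a throwaway sketch; any crashing program will do):

./a.out     # dies with SIGSEGV
echo $?     # prints 139 (= 128 + 11)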
Could anybody tell me why the run fails but debug doesn't?
Your program exhibits undefined behavior, and can do anything (that includes appearing to work correctly sometimes).
Your first step should be running this program under Valgrind, and fixing all errors it reports.
If after doing the above the program still crashes, then you should let it dump core (ulimit -c unlimited; ./a.out) and then analyze that core dump with GDB: gdb ./a.out core, then use the where command.
This error can also be caused by a null pointer dereference: if you use a pointer that has not been initialized, it can cause this error. To guard a pointer before using it, you can try something like:
Class* pointer = new Class();
if (pointer != nullptr) {
    pointer->myFunction();
}
(Note that operator new throws std::bad_alloc on failure instead of returning nullptr, so a check like this mainly protects against pointers that may have been left null elsewhere.)

force coredump on glibc free error

I get the following error when I run my program, and it doesn't happen under gdb. How can I force glibc or Ubuntu to dump core on abort? I tried ulimit -c unlimited, but this is not a segfault, so no luck. Also, I have too many memory errors in valgrind; fixing all of them would take a lot of time.
Also, setting MALLOC_CHECK_ to 0 stops the program from aborting, but that's not an option for me.
*** glibc detected *** ./main: free(): invalid next size (fast): 0x0000000000ae0560 ***
Edit
Anyway, I found exactly what is causing this glibc corruption using valgrind. I'm keeping the question open to see whether forcing a core dump is possible.
From glibc documentation:
If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately.
Calling abort() usually produces a core dump (subject to ulimit -c setting).
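So a sketch of the workflow, assuming a bash-like shell and the ./main binary from the error message above:

ulimit -c unlimited      # allow core files
MALLOC_CHECK_=2 ./main   # abort() immediately on detected heap corruption
gdb ./main core          # load the resulting core dump (the file name may vary on your system)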
Use Valgrind to diagnose and fix the problem. It will be quicker and straight to the point, since this indeed looks like a classic heap corruption.
There is likely a (Valgrind) package available for your distro, if you use a common one.
The only other method to create a core dump would be to attach GDB to the process before it happens. But that still doesn't get you closer to the solution of what causes the problem. Valgrind is the superior approach.
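A sketch of the attach approach (fill in the real pid by hand):

gdb -p <pid>      # attach to the running process
(gdb) continue    # let it run until glibc calls abort()
(gdb) bt          # backtrace from inside the abort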

Aborted (core dumped) C++

I have a large C++ function which uses the OpenCV library, running on Windows with the Cygwin g++ compiler. At the end it gives "Aborted (core dumped)", but the function runs to completion before that. I have also put a print statement at the end of the function, and it gets printed. So I think there is no logical bug in the code that would generate the fault.
Please explain.
I am also using assert statements, but the abort is not due to a failed assertion: it does not say "assertion failed". It comes at the very end, without any message.
Also, the file is part of a large project, so I cannot post the code.
gdb results:
Program received signal SIGABRT, Aborted.
0x7c90e514 in ntdll!LdrAccessResource () from /c/WINDOWS/system32/ntdll.dll
It looks like a memory fault (a write to freed memory, a double free, a stack overflow, ...). If the code can be compiled and run under Linux, you can use valgrind to see if there are memory issues. You can also try disabling parts of the application until the problem disappears, to get a clue where the error happens; but this method can give false positives, since memory-related bugs can cause modules to fail that are not the cause of the error. You can also run the program in gdb, but there too the position the debugger points to may not be where the error actually happened.
You don't give us much to go on; however, this looks like you are running into a problem when freeing resources, maybe a heap corruption. Have you tried running it under gdb and looking where it crashes? Also, check that all your new/delete calls match.
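To make the "check your new/delete calls" advice concrete, here is a minimal sketch (hypothetical code, not the asker's) of heap corruption that only surfaces at cleanup time, far from the actual bug:

// overrun.cpp -- the crash happens at delete[], not at the bad write
#include <cstring>

int main()
{
    char* buf = new char[8];
    // Heap overrun: writes past the 8-byte block, corrupting allocator metadata.
    std::strcpy(buf, "far too long for eight bytes");
    // The corruption is only detected here (or even later, at program exit),
    // typically aborting with a message like "free(): invalid next size".
    delete[] buf;
    return 0;
}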
Load the core dump together with the binary into gdb to get an idea of where the problem lies. The command line is:
gdb <path to the binary> <path to the core file>
For more details on gdb see GDB: The GNU Project Debugger.
Run it through AppVerifier and cdb.
E.g.
cdb -xd sov -xd av -xd ch <program> <args>