Valgrind is not showing line numbers - Fortran

My Fortran code fails when I run it without a debugger, but it runs to completion under valgrind, which surely points to a memory issue. I decided to debug the code with valgrind, and it does report errors like...
==988381== at 0x15EA7C: __nautilus_main_MOD_init_reaction_rates (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x16AD99: __nautilus_main_MOD_initialisation (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x10B44C: MAIN__ (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x10B30E: main (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== Address 0x4fb7c98 is 8 bytes before a block of size 20,864 alloc'd
==988381== at 0x483C855: malloc (vg_replace_malloc.c:381)
==988381== by 0x150140: __global_variables_MOD_initialize_global_arrays (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x16AD2B: __nautilus_main_MOD_initialisation (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x10B44C: MAIN__ (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
==988381== by 0x10B30E: main (in /home/seps/Desktop/softwares/DNautilus_spin/bin/dnautilus_spin)
But valgrind is not showing any line numbers. I compiled the code with flags like -g3, -g, -fbacktrace, and -O0, and even then valgrind does not show line numbers.
I tried varying all the flags except -g3 while compiling, and I tried different valgrind versions (valgrind-3.19.0, valgrind-3.20.0), based on suggestions from a web search.
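(The commands below are illustrative, not from the original post; the source file name is guessed from the module names in the trace.) For valgrind to resolve line numbers, every object linked into the binary must be compiled with -g and the executable must not be stripped, for example:
gfortran -g -O0 -fbacktrace -c nautilus_main.f90     # -g on every object; file name assumed
gfortran -g -O0 -o dnautilus_spin *.o                # no strip step afterwards
valgrind --track-origins=yes ./dnautilus_spin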
All I want is to get line numbers so that I can proceed further. Can anyone help me?
Thanks in advance.

Related

gdb only finds some debug symbols

So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see, it did find debug symbols. Then I start my program (using start), which causes gdb to pause at the entry to the main function. At this point I can also print out the backtrace using bt, and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However, if I do something in my application that happens in a thread other than the startup thread (startup all happens in thread 1) and interrupt my program there, gdb is no longer able to display the stack trace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with CMake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also verified that debug symbols are present in the binary using objdump --debug mumble (which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure whether this is related to the problem I am seeing).
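(Side note, not from the original post: two quick checks that can confirm DWARF data is really in the binary and that -g reaches every compile line.)
readelf -S mumble | grep debug        # expect .debug_info, .debug_line, .debug_str sections
make VERBOSE=1 2>&1 | grep ' -g '     # confirm -g appears on each compiler invocation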
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which makes me suspect that there is indeed some issue involving threads here...
Finally, I tried starting gdb and running set verbose on before loading my binary, which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This also looks suspicious to me, as only main.cpp is explicitly listed here (even though the project has many, many more source files). I should also note that all successful backtraces I am able to produce (as described above) originate from main.cpp.
I am honestly a bit clueless as to what might be the root issue here. Does someone have an idea what could be going on? Or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1

c++ - Valgrind on Code::Blocks (Linux)

I already use Valgrind in small programs to check memory leaks and it works well.
Now I have a big program with many classes and .cpp and .h files, and I'm trying to use Valgrind to check for memory leaks because I use a lot of pointers, memory, etc.
I'm using Linux and Code::Blocks 16.01 with gcc, and when I try to run Valgrind directly in Code::Blocks I get the following error:
--------------- Application output --------------
valgrind: /myPathToTheProject/ValgrindOut.xml: No such file or directory
If I test with a small project containing only one .cpp file and main, it works fine and Valgrind generates ValgrindOut.xml. With this big project I always get this error. Does anyone have an idea what is wrong, or another way or tool to test for memory leaks?
EDIT - LEAK SUMMARY after running Valgrind
Leak summary:
definitely lost: 673 bytes in 6 blocks.
indirectly lost: 89,128 bytes in 68 blocks.
possibly lost: 232 bytes in 2 blocks.
still reachable: 80,944 bytes in 6 blocks.
suppressed: 0 bytes in 0 blocks.
I am not sure how to run valgrind directly from Code::Blocks. I suggest you build your project in Code::Blocks and then run the executable under valgrind with the command below.
Command
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --log-file=leak.txt ./myexecutable <my command line arguments>
Example
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --log-file=leak.txt ./myexecutable -i 192.168.1.10 -p 5000
This way you generate a valgrind output file, leak.txt, that contains the memory leak reports.
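If you specifically need the XML report that the Code::Blocks plugin looks for, valgrind can also write XML directly (the output file name here simply mirrors the one from the error message):
valgrind --tool=memcheck --leak-check=full --xml=yes --xml-file=ValgrindOut.xml ./myexecutable <my command line arguments>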

Is this error from Qt or my program?

I'm new to valgrind. I have written a program in C++ using Qt 5.5.1 libraries on Ubuntu 15.10. I'm using Qt Creator with Debug build set. I checked for memory leaks using Valgrind with the following command:
valgrind --leak-check=yes --track-origins=yes ./texteditor
Valgrind then gives me the following message:
==2977== Conditional jump or move depends on uninitialised value(s)
==2977== at 0x97ED1EC: ??? (in /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0.2400.28)
==2977== by 0x97EE58A: ??? (in /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0.2400.28)
==2977== by 0x5B3380B: g_cclosure_marshal_VOID__VOID (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x5B31B8A: g_closure_invoke (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x5B43FFB: ??? (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x5B4CC95: g_signal_emit_valist (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x5B4CFC4: g_signal_emit (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x96ECD00: gtk_adjustment_changed (in /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0.2400.28)
==2977== by 0x5B35465: ??? (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x5B384FC: g_object_thaw_notify (in /usr/lib/i386-linux-gnu/libgobject-2.0.so.0.4600.2)
==2977== by 0x96ED182: gtk_adjustment_configure (in /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0.2400.28)
==2977== by 0x4563C7F: ??? (in /home/tembo/Qt/5.5/gcc/lib/libQt5Widgets.so.5.5.1)
==2977== Uninitialised value was created by a stack allocation
==2977== at 0x456215F: ??? (in /home/tembo/Qt/5.5/gcc/lib/libQt5Widgets.so.5.5.1)
Nothing in the above message points to the location of myProgram at all. Is this from Qt and other libraries, or am I missing something that points back to myProgram?
By default Valgrind only shows the top 12 entries of the call stack, but this can be changed with the --num-callers=xx parameter. The functions from your own program code are likely further down on the stack.
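For example (the value 40 is arbitrary, just larger than the default of 12):
valgrind --leak-check=yes --track-origins=yes --num-callers=40 ./texteditor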

MPI and Valgrind not showing line numbers

I've written a large program and I'm having a really hard time tracking down a segmentation fault. I posted a question but I didn't have enough information to go on (see link below - and if you do, note that I spent almost an entire day trying several times to come up with a minimally compilable version of the code that reproduced the error to no avail).
https://stackoverflow.com/questions/16025411/phantom-bug-involving-stdvectors-mpi-c
So now I'm trying my hand at valgrind for the first time. I just installed it (simply "sudo apt-get install valgrind") with no special installation to account for MPI (if there is any). I'm hoping for concrete information including file names and line numbers (I understand it's impossible for valgrind to provide variable names). While I am getting useful information, including
Invalid read of size 4
Conditional jump or move depends on uninitialised value(s)
Uninitialised value was created by a stack allocation
4 bytes in 1 blocks are definitely lost
in addition to this magical thing
Syscall param sched_setaffinity(mask) points to unaddressable byte(s) at 0x433CE77: syscall (syscall.S:31) Address 0x0 is not stack'd, malloc'd or (recently) free'd
I am not getting file names and line numbers. Instead, I get
==15095== by 0x406909A: ??? (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
Here's how I compile my code:
mpic++ -Wall -Wextra -g -O0 -o Hybrid.out (…file names)
Here are two ways I've executed valgrind:
valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log.txt mpirun -np 1 Hybrid.out
and
mpirun -np 1 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
The second version based on instructions in
Segmentation faults occur when I run a parallel program with Open MPI
which, if I'm understanding the chosen answer correctly, appears to be contradicted by
openmpi with valgrind (can I compile with MPI in Ubuntu distro?)
I am deliberately running valgrind on one processor because that's the only way my program will execute to completion without the segmentation fault. I have also run it with two processors, and my program seg faulted as expected, but the log I got back from valgrind seemed to contain essentially the same information. I'm hoping that by resolving the issues valgrind reports on one processor, I'll magically solve the issue happening on more than one.
I tried to include "-static" in the program compilation as suggested in
Valgrind not showing line numbers in spite of -g flag (on Ubuntu 11.10/VirtualBox)
but the compilation failed, saying (in addition to several warnings)
dynamic STT_GNU_IFUNC symbol "strcmp" with pointer equality in '…' can not be used when making an executable; recompile with -fPIE and relink with -pie
I have not looked into what "-fPIE" and "-pie" mean. Also, please note that I am not using a makefile, nor do I currently know how to write one.
A few more notes: My code does not use the commands malloc, calloc, or new. I'm working entirely with std::vector; no C arrays. I do use commands like .resize(), .insert(), .erase(), and .pop_back(). My code also passes vectors to functions by reference and constant reference. As for parallel commands, I only use MPI_Barrier(), MPI_Bcast(), and MPI_Allgatherv().
How do I get valgrind to show the file names and line numbers for the errors it is reporting? Thank you all for your help!
EDIT
I continued working on it, and a friend of mine pointed out that the reports without line numbers are all coming from the MPI libraries, which I did not compile from source; since I did not compile them with -g, I don't see line numbers for them. So I tried valgrind again based on this command,
mpirun -np 1 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
but now for two processors, which is
mpirun -np 2 valgrind --tool=memcheck --leak-check=full --track-origins=yes --log-file=log4.txt -v ./Hybrid.out
The program ran to completion (I did not see the seg fault reported on the command line), and this run of valgrind did give me line numbers within my files. The line valgrind points to is a line where I call MPI_Bcast(). Is it safe to say that this appeared because the memory problem only manifests itself on multiple processors (since I've run it successfully with -np 1)?
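(Added for reference, not part of the original post: since the frames without line numbers come from the MPI libraries, they can usually be hidden with the suppression file that Open MPI ships; the install path below is an assumption.)
mpirun -np 2 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp --leak-check=full --track-origins=yes --log-file=log.%p.txt ./Hybrid.out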
It sounds like you are using the wrong tool. If you want to know where a segmentation fault occurs, use gdb.
Here's a simple example. This program will segfault at *b=5
// main.c
int
main(int argc, char** argv)
{
    int* b = 0;
    *b = 5;
    return *b;
}
To see what happened using gdb (the <---- parts explain the input lines):
svengali ~ % g++ -g -c main.c -o main.o # include debugging symbols in .o file
svengali ~ % g++ main.o -o a.out # executable is linked (no -g here)
svengali ~ % gdb a.out
GNU gdb (GDB) 7.4.1-debian
<SNIP>
Reading symbols from ~/a.out...done.
(gdb) run <--------------------------------------- RUNS THE PROGRAM
Starting program: ~/a.out
Program received signal SIGSEGV, Segmentation fault.
0x00000000004005a3 in main (argc=1, argv=0x7fffffffe2d8) at main.c:5
5 *b = 5;
(gdb) bt <--------------------------------------- PRINTS A BACKTRACE
#0 0x00000000004005a3 in main (argc=1, argv=0x7fffffffe2d8) at main.c:5
(gdb) print b <----------------------------------- EXAMINE THE CONTENTS OF 'b'
$2 = (int *) 0x0
(gdb)
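For an MPI run, a common variant of the same idea (assuming an X session is available; none of this is from the answer above) is to start one gdb per rank, or to collect a core dump and inspect it afterwards:
mpirun -np 2 xterm -e gdb ./Hybrid.out   # one gdb window per rank
ulimit -c unlimited                      # or: allow core dumps, run normally,
mpirun -np 2 ./Hybrid.out                # then inspect with: gdb ./Hybrid.out core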

How to solve segmentation fault problems happening in support libraries?

I have a very odd problem. I can replicate it with the following small sample program:
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <iostream>

void printSSLErrors()
{
    int l_err = ERR_get_error();
    while(l_err != 0)
    {
        std::cout << "SSL ERROR: " << ERR_error_string(l_err, NULL) << std::endl;
        l_err = ERR_get_error();
    }
}

int main(int argc, char* argv[]) {
    SSL_library_init();
    SSL_load_error_strings();

    // context
    SSL_CTX* mp_ctx;
    if(!(mp_ctx = SSL_CTX_new(SSLv23_server_method())))
    {
        printSSLErrors();
        return 0;
    }
    std::cout << "CTX created OK" << std::endl;

    // set certificate and private key
    if(SSL_CTX_use_certificate_file(mp_ctx, argv[1], SSL_FILETYPE_PEM) != 1)
    {
        printSSLErrors();
        return 0;
    }
    std::cout << "Certificate intialised OK" << std::endl;

    if(SSL_CTX_use_PrivateKey_file(mp_ctx, argv[2], SSL_FILETYPE_PEM) != 1)
    {
        printSSLErrors();
        return 0;
    }
    std::cout << "Key intialised OK" << std::endl;

    SSL_CTX_free(mp_ctx);
    ERR_free_strings();
}
This program works as expected when I compile and link it with -lssl. The problem, however, is that the OpenSSL routines are part of an application that also links in the mysqlclient libraries. If I recompile the above code with -lssl -lmysqlclient (note that I don't include or use anything from that library here) and execute the program again, I get a segmentation fault inside the OpenSSL library. The most I can pull out of gdb is:
[Thread debugging using libthread_db enabled]
[New Thread -1208158528 (LWP 32359)]
CTX created OK
Certificate intialised OK
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208158528 (LWP 32359)]
0x001b1213 in X509_get_pubkey () from /lib/libcrypto.so.4
(gdb) backtrace
#0 0x001b1213 in X509_get_pubkey () from /lib/libcrypto.so.4
#1 0x00de8a6c in SSL_rstate_string () from /lib/libssl.so.4
#2 0x086f2258 in ?? ()
#3 0xbffceb64 in ?? ()
#4 0x086f1130 in ?? ()
#5 0xbffceaa8 in ?? ()
#6 0x086f2258 in ?? ()
#7 0x086f0d90 in ?? ()
#8 0x00df4858 in ?? () from /lib/libssl.so.4
#9 0x086f2258 in ?? ()
#10 0x086f1130 in ?? ()
#11 0xbffceaa8 in ?? ()
#12 0x00de9d50 in SSL_CTX_use_PrivateKey_file () from /lib/libssl.so.4
Previous frame inner to this frame (corrupt stack?)
(gdb) frame 0
#0 0x001b1213 in X509_get_pubkey () from /lib/libcrypto.so.4
For some reason this only happens when I use mysqlclient v 15 and not with mysqlclient v 16. This is probably too obscure for anyone to solve, but some comments on how linking against a dynamic library that the code itself doesn't even use can cause these errors would be very helpful.
The system is:
RHEL ES4, gcc 3.4.6, openssl-0.9.7a, MySQL-5.11
Any thoughts?
Edit: Here is the output to possibly clarify things a little more:
[Lieuwe ~]$ c++ openssl_test.cpp -lssl -o ssltest
[Lieuwe ~]$ ./ssltest /etc/httpd/conf/certs/test.crt /etc/httpd/conf/certs/test.key
CTX created OK
Certificate intialised OK
Key intialised OK
[Lieuwe ~]$ c++ openssl_test.cpp -lmysqlclient -lssl -o ssltest
[Lieuwe ~]$ ./ssltest /etc/httpd/conf/certs/test.crt /etc/httpd/conf/certs/test.key
CTX created OK
Certificate intialised OK
Segmentation fault (core dumped)
[Lieuwe ~]$
Note that for this test I use the .crt and .key files that the Apache server also uses (and they work there).
Edit 2: Here is the (relevant?) output of valgrind for the program
CTX created OK
--5429-- REDIR: 0x5F6C80 (memchr) redirected to 0x4006184 (memchr)
Certificate intialised OK
==5429== Invalid read of size 4
==5429== at 0xCF4205: X509_get_pubkey (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xDE8A6B: (within /lib/libssl.so.0.9.7a)
==5429== by 0xDE9D4F: SSL_CTX_use_PrivateKey_file (in /lib/libssl.so.0.9.7a)
==5429== by 0x8048C77: main (in /home/liwu/ssltest)
==5429== Address 0x4219940 is 0 bytes inside a block of size 84 free'd
==5429== at 0x4004EFA: free (vg_replace_malloc.c:235)
==5429== by 0xC7FD00: CRYPTO_free (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xCE53A7: (within /lib/libcrypto.so.0.9.7a)
==5429== by 0xCE5562: ASN1_item_free (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xCE0560: X509_free (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xDE979E: SSL_CTX_use_certificate_file (in /lib/libssl.so.0.9.7a)
==5429== by 0x8048C23: main (in /home/liwu/ssltest)
==5429==
==5429== Invalid read of size 4
==5429== at 0xCD4A5F: EVP_PKEY_copy_parameters (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xDE8A7C: (within /lib/libssl.so.0.9.7a)
==5429== by 0xDE9D4F: SSL_CTX_use_PrivateKey_file (in /lib/libssl.so.0.9.7a)
==5429== by 0x8048C77: main (in /home/liwu/ssltest)
==5429== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==5429==
==5429== Process terminating with default action of signal 11 (SIGSEGV)
==5429== Access not within mapped region at address 0x0
==5429== at 0xCD4A5F: EVP_PKEY_copy_parameters (in /lib/libcrypto.so.0.9.7a)
==5429== by 0xDE8A7C: (within /lib/libssl.so.0.9.7a)
==5429== by 0xDE9D4F: SSL_CTX_use_PrivateKey_file (in /lib/libssl.so.0.9.7a)
==5429== by 0x8048C77: main (in /home/liwu/ssltest)
==5429==
I would suggest running your program under Valgrind. Valgrind is intended to provide help with exactly this kind of problem and it is generally much easier to use than a debugger.
If I were to hazard a guess, I would first suspect a memory error in your application (or, less likely, in one of the shared libraries) that is sensitive to the memory layout of the resulting executable. Adding a new shared library or, say, enabling debugging options could very well make the problem appear or disappear for no apparent reason.
The only logical explanation may be that the public key, which is needed by X509_get_pubkey(), cannot be located.
Can you please verify that the public key requested by the function is available?
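One quick way to verify that the certificate and key actually belong together (paths taken from the question) is to compare their moduli; the two digests should be identical:
openssl x509 -noout -modulus -in /etc/httpd/conf/certs/test.crt | openssl md5
openssl rsa -noout -modulus -in /etc/httpd/conf/certs/test.key | openssl md5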
I'd think that the MySQL client library is linked against another version of libssl. If you are on Linux: are both libraries installed via your distro's official repositories? Are you linking against the static (.a) or dynamic (.so) versions of those libraries?
You can play around with the nm command to find out more (read the manpage).
You can also try rebuilding the MySQL client library yourself so that the same libssl version is used, and see whether the problem disappears.
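For example (the path to libmysqlclient is a guess; adjust it for your system):
ldd ./ssltest | grep -i -e ssl -e crypto        # which libssl/libcrypto the test binary loads
ldd /usr/lib/libmysqlclient.so | grep -i ssl    # whether the MySQL client pulls in its own copy
nm -D /usr/lib/libmysqlclient.so | grep -i ssl  # OpenSSL symbols it defines or needs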