How to debug further based on Valgrind output - c++

I have C/C++ code which is giving a segfault. It is compiled using gcc/g++ on a RH Linux Enterprise server. I used the Valgrind memory checker on the executable with:
valgrind --tool=memcheck --leak-check=full --show-reachable=yes
I get this as one of Valgrind's output messages:
==7053== Invalid read of size 1
==7053== at 0xDBC96C: func1 (file1:4742)
==7053== by 0xDB8769: func2 (file1.c:3478)
==7053== by 0xDB167E: func3 (file1.c:2032)
==7053== by 0xDB0378: func4 (file1.c:1542)
==7053== by 0xDB97D8: func5 (file1.c:3697)
==7053== by 0xDB17A7: func6 (file1.c:2120)
==7053== by 0xDBD55E: func7 (file2.c:271)
==7053== Address 0x1bcaf2f0 is not stack'd, malloc'd or (recently) free'd
I read that to mean that my code has accessed a memory location it is not allowed to access.
My questions:
How do I find out which buffer access was invalid, and which of the functions above performed it?
How can I use the address 0x1bcaf2f0, which Valgrind says is invalid? How can I find the symbol (essentially, the buffer name) at that address? Through a memory map file, or some other way?
Are there any other general pointers, Valgrind options, or other tools for detecting memory (heap/stack corruption) errors?

Ad 1: In your example, that'd be func1, at line 4742 of file1. The functions listed below it are the call stack that led there. Analyzing that line should lead you to the invalid memory access.
Ad 2: If that line is too complex for it to be obvious which access triggers the warning, try splitting it into several simpler statements.
Ad 3: memcheck is the quintessential valgrind tool for detecting errors with heap memory. It won't help for stack corruption though.
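The splitting suggested in Ad 2 could look like the sketch below. It is only an illustration, not the asker's code: Record, first_char_compound and first_char_split are hypothetical names. The compound version performs three memory reads on one source line, so a report that names that line cannot say which read was invalid. The split version gives each read its own line, and since the report says "Invalid read of size 1", the one-byte read is the first suspect.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type, used only for this illustration.
struct Record {
    const char* name;
};

// Three reads share one source line: table[i], rec->name and name[0].
char first_char_compound(Record* const* table, std::size_t i) {
    return table[i]->name[0];
}

// Same logic with one read per line, so a Valgrind line number
// identifies exactly one memory access.
char first_char_split(Record* const* table, std::size_t i) {
    assert(table != nullptr);
    Record* rec = table[i];        // read 1: pointer-sized
    assert(rec != nullptr);
    const char* name = rec->name;  // read 2: pointer-sized
    assert(name != nullptr);
    return name[0];                // read 3: one byte
}
```

After rebuilding with -g and rerunning under Valgrind, the reported line number then points at a single statement rather than a whole expression.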

If you have Valgrind 3.7.0 or later, you can use its embedded gdbserver to debug your application with gdb while it runs under Valgrind.
See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver

Related

How to enable address sanitizer at godbolt.org

I am trying to enable address sanitizer at godbolt.org with -fsanitize=address, but get error:
==3==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==3==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Example: https://godbolt.org/z/5GDtrr
How can I enable address sanitizer correctly?
By design, ASan reserves a huge amount of virtual memory (about 20 TB on x86_64 machines) at startup. This can be a problem if overcommit is disabled or virtual memory is limited with ulimit -v.
In both cases there's nothing Asan can do - you'll need to raise this with Godbolt VM maintainers in https://github.com/mattgodbolt/compiler-explorer/issues

Can I use result of windbg analyse if I have some symbol warnings?

I am new to WinDbg and to memory analysis on Windows.
I am trying to analyze a memory dump (crash dump) from an x64 system.
After loading all symbols (mine and Microsoft's),
I type !analyze -v
This is part of the output:
......
FAULTING_SOURCE_CODE: <some code here>
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: rtplogic!CSRTPStack::Finalize+19d
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: RTPLogic
IMAGE_NAME: RTPLogic.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 58542837
STACK_COMMAND: ~544s; .ecxr ; kb
FAILURE_BUCKET_ID: WRONG_SYMBOLS_c0000374_RTPLogic.dll!CSRTPStack::Finalize
BUCKET_ID: X64_APPLICATION_FAULT_WRONG_SYMBOLS_rtplogic!CSRTPStack::Finalize+19d
......
This WRONG_SYMBOLS worries me.
Can I be sure that the code in FAULTING_SOURCE_CODE is the code related to the crash?
No, unfortunately you can't trust it. There's at least one point in the analysis of the call stack where the debugger wasn't 100% sure whether it got the stack unwinding right.
When you type ~544s; .ecxr; k you'll see a call stack. That call stack will include a warning at the point where it becomes uncertain. You can trust everything before the warning, which may already help, but you can't trust the stack frames below it.
You can compare the k output to dps @ebp (dps @rsp on an x64 target; maybe add L fff if it's not enough) in order to see what else the debugger could have guessed.
Note that in the output of dps you may also see totally unrelated stuff if, by accident, one of your calculations on the stack resulted in a value that could be interpreted as a symbol.
c0000374 is STATUS_HEAP_CORRUPTION. A normal dump only shows the code that was running after the corruption had already occurred.
Activate PageHeap with gflags.exe for your exe.
PageHeap enables Windows features that reserve memory at the boundary of each allocation to detect attempts to access memory beyond the allocation. This will crash the app sooner, at the point where you can see the real cause of the crash. Open the new dump and run !analyze -v to see what gets corrupted.

I am getting the following error at run time. What does it mean and how do I debug it?

*** glibc detected *** ./main: corrupted double-linked list: 0x086c4f30 ***
After this the program does not exit and I am forced to quit with Ctrl+C. I am not using any memory deallocation such as "delete" anywhere in my code either.
Running Valgrind, I get the following message:
Invalid write of size 4
==20358== at 0x8049932: main (main.cpp:123)
==20358== Address 0x432e6f8 is 0 bytes after a block of size 16 alloc'd
==20358== at 0x402C454: operator new[](unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==20358== by 0x8049907: main (main.cpp:120)
And the corresponding piece of code in line 123 is
float **der_global= new float *[NODES];
for(int i=0; i<no_element; i++)
{
der_global[i]=new float [no_element];
}
Your original new call gives you space to store NODES pointers; but your for-loop tries to set no_element of them, which doesn't have to be the same number. Your for loop should have i less than NODES, not i less than no_element.
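A minimal sketch of the fix, assuming the intent is one row of floats per node: alloc_matrix and free_matrix are hypothetical helper names, and the loop bound is the same count that was passed to the outer new[]. The Valgrind report is consistent with this reading: the build is 32-bit x86, so a block of size 16 holds four 4-byte pointers, and the invalid 4-byte write lands right after the fourth.

```cpp
#include <cassert>
#include <cstddef>

// Allocates `rows` row pointers, then one row of `cols` floats for each.
float** alloc_matrix(std::size_t rows, std::size_t cols) {
    float** m = new float*[rows];
    for (std::size_t i = 0; i < rows; ++i) {  // same bound as the outer new[]
        m[i] = new float[cols]();              // () zero-initializes the row
    }
    return m;
}

// Releases every row, then the array of row pointers.
void free_matrix(float** m, std::size_t rows) {
    assert(m != nullptr);
    for (std::size_t i = 0; i < rows; ++i) {
        delete[] m[i];
    }
    delete[] m;
}
```

With the names from the question this would be called as alloc_matrix(NODES, no_element). A std::vector<std::vector<float>> would avoid the manual delete[] calls altogether.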
This error usually shows up when the program frees memory that is no longer valid.
Are you using malloc or any other dynamic allocation?
It would be easier to solve your problem if you could add some of your code.
Try using valgrind:
valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-reachable=yes --log-file=val.log ./<executable> <parameters>
and look at val.log.
You could also use gdb, but for that you will need to compile with the -g flag.

Regarding Possible Lost in Valgrind

What is wrong with pushing strings into a vector like this:
globalstructures->schema.columnnames.push_back("id");
When I run valgrind on my code, it reports
27 bytes in 1 blocks are possibly lost in loss record 7 of 19
It shows "possibly lost" like that in many places. Because of this the allocations and frees do not match, which results in a strange error like
malloc.c:No such file or directory
Although I am using calloc for every allocation in my code, I am getting warnings like
Syscall param write(buf) points to uninitialised byte(s)
The code causing that error is
datapage *dataPage=(datapage *)calloc(1,PAGE_SIZE);
writePage(dataPage,dataPageNumber);
int writePage(void *buffer,long pagenumber)
{
int fd;
fd=open(path,O_WRONLY, 0644);
if (fd < 0)
return -1;
lseek(fd,pagenumber*PAGE_SIZE,SEEK_SET);
if(write(fd,buffer,PAGE_SIZE)==-1)
return false;
close(fd);
return true;
}
The exact error I get when running under gdb is:
Breakpoint 1, getInfoFromSysColumns (tid=3, numColumns=@0x7fffffffdf24: 1, typesVector=..., constraintsVector=..., lengthsVector=...,
columnNamesVector=..., offsetsVector=...) at dbheader.cpp:1080
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff78bd720, bytes=8) at malloc.c:3498
3498 malloc.c: No such file or directory.
When I run the same program under valgrind, it works fine.
Well,
malloc.c:No such file or directory
can occur while you are debugging with gdb and use the command "s" (step) instead of "n" (next) near a call to malloc. That means you are trying to step into malloc, whose source may not be available on your Linux machine.
That is perhaps also why it works fine with valgrind.
Why the error is in malloc:
The problem is that you overwrote some memory buffer and corrupted one of the structures used by the memory manager. (c)
Try running valgrind with --track-origins=yes to see where the uninitialized access comes from. If you believe the data should be initialized and it is not, it may have come from a bad pointer; valgrind will show you exactly where the values were created. Those uninitialized values probably overwrote your buffer, including the memory manager's bookkeeping bytes.
Also, review all valgrind warnings before the crash.
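Separately from the corruption, the writePage shown in the question returns -1, false, or true from a function declared int, and it leaks the descriptor when write fails. A sketch of a version with a single return convention is below. It rests on assumptions: the path is passed as a parameter (the original reads a variable named path), kPageSize stands in for PAGE_SIZE with a made-up value of 4096, and pwrite replaces the lseek/write pair.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Assumed page size; the question's PAGE_SIZE value is not shown.
constexpr std::size_t kPageSize = 4096;

// Writes one page at the given page index.
// Returns 0 on success and -1 on any failure.
int writePage(const char* path, const void* buffer, long pagenumber) {
    const int fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -1;
    }
    const off_t offset =
        static_cast<off_t>(pagenumber) * static_cast<off_t>(kPageSize);
    // pwrite positions and writes in one call; a short write counts as failure.
    const ssize_t written = pwrite(fd, buffer, kPageSize, offset);
    const int close_rc = close(fd);  // always close, even after a failed write
    if (written != static_cast<ssize_t>(kPageSize)) {
        return -1;
    }
    return close_rc == 0 ? 0 : -1;
}
```

Callers can then test the result with a single comparison against 0 instead of juggling three different values.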

Segmentation fault when run as root?

My C++ program segfaults when I run it as root on my computer, but not when I start a remote session. On my computer the program only runs as a regular user. What could be the problem? I wrote my program for an embedded device and I'm using this to compile:
gcc -Werror notify.cc -o notify `pkg-config --libs --cflags gtk+-2.0 hildon-notifymm hildonmm hildon-fmmm`
I'm not getting any error. Could it be a flag problem? I can post my code.
EDIT: When I start my program with gdb I get this:
Program received signal SIGSEGV, Segmentation fault.
0x40eed060 in strcmp () from /lib/libc.so.6
0x40eed060 <strcmp+0>: ldrb r2, [r0], #1
Backtrace gives this:
(gdb) backtrace
#0 0x40eed060 in strcmp () from /lib/libc.so.6
#1 0x40b7f190 in dbus_set_g_error ()
from /usr/lib/libdbus-glib-1.so.2
#2 0x40b7d060 in dbus_g_bus_get () from /usr/lib/libdbus-glib-1.so.2
#3 0x400558ec in notify_init () from /usr/lib/libnotify.so.1
#4 0x4004a240 in Notify::init(Glib::ustring const&) ()
from /usr/lib/libnotifymm-1.0.so.7
#5 0x40033794 in Hildon::notify_init(Glib::ustring const&) ()
from /usr/lib/libhildon-notifymm-1.0.so.1
Here is my code:
#include <hildonmm.h>
#include <hildon-notifymm.h>
#include <hildon/hildon-notification.h>
#include <libnotifymm/init.h>
#include <gtkmm/stock.h>
#include <dbus/dbus.h>
#include <dbus/dbus-glib.h>
#include <dbus/dbus-glib-lowlevel.h>
#include <iostream>
int main(int argc, char *argv[])
{
// Initialize gtkmm and maemomm:
Hildon::init();
Hildon::notify_init("Notification Example");
// Initialize D-Bus (needed by hildon-notify):
DBusConnection* conn = dbus_bus_get(DBUS_BUS_SESSION, NULL);
dbus_connection_setup_with_g_main(conn, NULL);
// Create a new notification:
Glib::RefPtr<Hildon::Notification> notification = Hildon::Notification::create("Something Happened", "A thing has just happened.", Gtk::Stock::OPEN);
// Show the notification:
std::auto_ptr<Glib::Error> ex;
notification->show(ex);
if(ex.get())
{
std::cerr << "Notification::show() failed: " << ex->what() << std::endl;
}
return 0;
}
EDIT: Problem solved. The program needs DBUS_SESSION_BUS_ADDRESS set in the environment of the terminal.
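A guard along these lines would turn the crash into a readable error. This is a sketch under assumptions: has_session_bus_address is a hypothetical helper, and it checks the variable D-Bus reads for the session bus, DBUS_SESSION_BUS_ADDRESS. It does not prove the bus is reachable, only that the variable is present and non-empty.

```cpp
#include <cassert>
#include <cstdlib>

// True when DBUS_SESSION_BUS_ADDRESS is set to a non-empty value.
bool has_session_bus_address() {
    const char* addr = std::getenv("DBUS_SESSION_BUS_ADDRESS");
    return addr != nullptr && addr[0] != '\0';
}
```

Calling it at the top of main, before Hildon::notify_init, and printing a message to std::cerr when it returns false would make the root case fail cleanly. Checking the pointer returned by dbus_bus_get for NULL before using it is a further safeguard.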
The problem is that you've invoked undefined behavior somewhere. Undefined behavior can behave differently on different machines, different runs on the same machine, whatever. You've got to find where you let a wild pointer happen and deal with it.
Most likely you're just getting "lucky" when running as a limited user and either the page permissions on your process are set to allow whatever invalid memory access you're getting, or you have some root-specific code which isn't being reached when run in usermode only.
You might want to run your program under valgrind. I wrote a tiny program that writes outside of an allocated array:
$ valgrind ./segfault
==11830== Memcheck, a memory error detector
==11830== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11830== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11830== Command: ./segfault
==11830==
==11830== Invalid write of size 1
==11830== at 0x4004BF: main (in /tmp/segfault)
==11830== Address 0x7feff65bf is not stack'd, malloc'd or (recently) free'd
==11830==
==11830==
==11830== Process terminating with default action of signal 11 (SIGSEGV)
==11830== Access not within mapped region at address 0x7FEFF65BF
==11830== at 0x4004BF: main (in /tmp/segfault)
==11830== If you believe this happened as a result of a stack
==11830== overflow in your program's main thread (unlikely but
==11830== possible), you can try to increase the size of the
==11830== main thread stack using the --main-stacksize= flag.
==11830== The main thread stack size used in this run was 8388608.
==11830==
==11830== HEAP SUMMARY:
==11830== in use at exit: 0 bytes in 0 blocks
==11830== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==11830==
==11830== All heap blocks were freed -- no leaks are possible
==11830==
==11830== For counts of detected and suppressed errors, rerun with: -v
==11830== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Segmentation fault
The most important part of this output is here:
==11830== Invalid write of size 1
==11830== at 0x4004BF: main (in /tmp/segfault)
The write of size 1 might help you figure out which line was involved:
int main(int argc, char *argv[]) {
char f[1];
f[-40000]='c';
return 0;
}
Another very useful tool to know is gdb. If you set your rlimits to allow dumping core (see setrlimit(2) for details on the limits, and your shell's manual (probably bash(1)) for details on the ulimit built-in command) then you can get a core file for use with gdb:
$ ulimit -c 1000
$ ./segfault
Segmentation fault (core dumped)
$ gdb --core=core ./segfault
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/segfault...(no debugging symbols found)...done.
[New Thread 11951]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libc.so.6...Reading symbols from /usr/lib/debug/lib/libc-2.12.1.so...done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.12.1.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004004bf in main ()
(gdb) bt
#0 0x00000000004004bf in main ()
(gdb) quit
Depending upon the size of your program, you might need to give way more than 1000 blocks to the allowed core file. If this program were remotely complicated, knowing the call chain to get to the segfault could be vital information.
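The same limit can also be raised from inside the program with the setrlimit(2) call mentioned above. The sketch below assumes a hypothetical helper name, raise_core_limit, and lifts the soft limit to the hard limit, which an unprivileged process is allowed to do. If the hard limit is itself 0, core files stay disabled.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit for this process.
// Returns true on success.
bool raise_core_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;  // soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling it early in main has roughly the effect of running ulimit -c in the shell first, but only for that one process and its children.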
It's hard to say anything specific without seeing any code, so I'll give you some general advice: learn to use your debugger (probably gdb), and try to reproduce the failure under the debugger. If you're lucky, the segfault will still occur under the debugger, you'll get a stack trace showing where it failed, and that will give you a starting point that will let you work your way back to the true source of the problem.
If you're unlucky, the problem might disappear if you compile with debugging support, or run it under gdb. In that case you'll have to resort to code inspection, and scrub your code for any undefined behavior (for example, wild or uninitialized pointers, as Billy ONeal suggests).
Set ulimit -c unlimited.
Run your program and let it crash. It should now core dump.
Run gdb <program-name> core
If you use the bt (backtrace) command, it should give you a good idea where the crash is happening. This should then help you fix it.