Segmentation fault when run as root? - c++

My C++ program segfaults when I run it as root from my computer, but not when I start a remote session. From my computer it only runs as a regular user. What can be the problem? I wrote the program for an embedded device and I'm using this to compile:
gcc -Werror notify.cc -o notify `pkg-config --libs --cflags gtk+-2.0 hildon-notifymm hildonmm hildon-fmmm`
I'm not getting any error. Could it be a flag problem? I can post my code.
EDIT: When I start my program with gdb I get this:
Program received signal SIGSEGV, Segmentation fault.
0x40eed060 in strcmp () from /lib/libc.so.6
0x40eed060 <strcmp+0>: ldrb r2, [r0], #1
The backtrace gives this:
(gdb) backtrace
#0 0x40eed060 in strcmp () from /lib/libc.so.6
#1 0x40b7f190 in dbus_set_g_error ()
from /usr/lib/libdbus-glib-1.so.2
#2 0x40b7d060 in dbus_g_bus_get () from /usr/lib/libdbus-glib-1.so.2
#3 0x400558ec in notify_init () from /usr/lib/libnotify.so.1
#4 0x4004a240 in Notify::init(Glib::ustring const&) ()
from /usr/lib/libnotifymm-1.0.so.7
#5 0x40033794 in Hildon::notify_init(Glib::ustring const&) ()
from /usr/lib/libhildon-notifymm-1.0.so.1
Here is my code:
#include <hildonmm.h>
#include <hildon-notifymm.h>
#include <hildon/hildon-notification.h>
#include <libnotifymm/init.h>
#include <gtkmm/stock.h>
#include <dbus/dbus.h>
#include <dbus/dbus-glib.h>
#include <dbus/dbus-glib-lowlevel.h>
#include <iostream>
int main(int argc, char *argv[])
{
// Initialize gtkmm and maemomm:
Hildon::init();
Hildon::notify_init("Notification Example");
// Initialize D-Bus (needed by hildon-notify):
DBusConnection* conn = dbus_bus_get(DBUS_BUS_SESSION, NULL);
dbus_connection_setup_with_g_main(conn, NULL);
// Create a new notification:
Glib::RefPtr<Hildon::Notification> notification = Hildon::Notification::create("Something Happened", "A thing has just happened.", Gtk::Stock::OPEN);
// Show the notification:
std::auto_ptr<Glib::Error> ex;
notification->show(ex);
if(ex.get())
{
std::cerr << "Notification::show() failed: " << ex->what() << std::endl;
}
return 0;
}
EDIT: Problem solved. The program needs DBUS_SESSION_BUS_ADDRESS set in the environment of the terminal.
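Since the crash came from a missing session bus address, one defensive option is to check the environment before any D-Bus or notification initialization. The following is a minimal sketch, assuming the variable in question is the standard DBUS_SESSION_BUS_ADDRESS; the helper name has_session_bus_address is hypothetical and not part of any library.

```cpp
#include <cstdlib>

// Hypothetical helper: returns true if the environment names a D-Bus
// session bus. Assumes the standard variable DBUS_SESSION_BUS_ADDRESS.
bool has_session_bus_address()
{
    const char* addr = std::getenv("DBUS_SESSION_BUS_ADDRESS");
    return addr != nullptr && addr[0] != '\0';
}
```

A program could call this at the top of main and print a message to std::cerr when it returns false, instead of crashing inside the library.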

The problem is that you've invoked undefined behavior somewhere. Undefined behavior can behave differently on different machines, different runs on the same machine, whatever. You've got to find where you let a wild pointer happen and deal with it.
Most likely you're just getting "lucky" when running as a limited user and either the page permissions on your process are set to allow whatever invalid memory access you're getting, or you have some root-specific code which isn't being reached when run in usermode only.

You might want to run your program under valgrind. I wrote a tiny program that writes outside of an allocated array:
$ valgrind ./segfault
==11830== Memcheck, a memory error detector
==11830== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11830== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11830== Command: ./segfault
==11830==
==11830== Invalid write of size 1
==11830== at 0x4004BF: main (in /tmp/segfault)
==11830== Address 0x7feff65bf is not stack'd, malloc'd or (recently) free'd
==11830==
==11830==
==11830== Process terminating with default action of signal 11 (SIGSEGV)
==11830== Access not within mapped region at address 0x7FEFF65BF
==11830== at 0x4004BF: main (in /tmp/segfault)
==11830== If you believe this happened as a result of a stack
==11830== overflow in your program's main thread (unlikely but
==11830== possible), you can try to increase the size of the
==11830== main thread stack using the --main-stacksize= flag.
==11830== The main thread stack size used in this run was 8388608.
==11830==
==11830== HEAP SUMMARY:
==11830== in use at exit: 0 bytes in 0 blocks
==11830== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==11830==
==11830== All heap blocks were freed -- no leaks are possible
==11830==
==11830== For counts of detected and suppressed errors, rerun with: -v
==11830== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Segmentation fault
The most important part of this output is here:
==11830== Invalid write of size 1
==11830== at 0x4004BF: main (in /tmp/segfault)
The write of size 1 might help you figure out which line was involved:
int main(int argc, char *argv[]) {
char f[1];
f[-40000]='c';
return 0;
}
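For comparison, here is a sketch of the same write done through a bounds-checked accessor, so the bad index is rejected instead of corrupting memory. It assumes a std::array in place of the raw array; checked_write is a hypothetical helper name.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Tries buf.at(index) = value and returns false instead of writing out of
// bounds. A negative index converts to a huge std::size_t, which at() rejects.
bool checked_write(std::array<char, 1>& buf, long index, char value)
{
    try {
        buf.at(static_cast<std::size_t>(index)) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With this helper, the write at index -40000 is refused while a write at index 0 succeeds.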
Another very useful tool to know is gdb. If you set your rlimits to allow dumping core (see setrlimit(2) for details on the limits, and your shell's manual (probably bash(1)) for details on the ulimit built-in command) then you can get a core file for use with gdb:
$ ulimit -c 1000
$ ./segfault
Segmentation fault (core dumped)
$ gdb --core=core ./segfault
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/segfault...(no debugging symbols found)...done.
[New Thread 11951]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libc.so.6...Reading symbols from /usr/lib/debug/lib/libc-2.12.1.so...done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.12.1.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./segfault'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004004bf in main ()
(gdb) bt
#0 0x00000000004004bf in main ()
(gdb) quit
Depending upon the size of your program, you might need to give way more than 1000 blocks to the allowed core file. If this program were remotely complicated, knowing the call chain to get to the segfault could be vital information.
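If changing the shell's ulimit is inconvenient (for example, when the program is started by another process), the limit can also be raised from inside the program with setrlimit(2). This is a minimal sketch; the function name enable_core_dumps is hypothetical, and it can only raise the soft limit up to the existing hard limit.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit for this process.
// Returns true on success. It cannot exceed the hard limit already in force.
bool enable_core_dumps()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling it early in main means a later crash can produce a core file even when the shell's soft limit was 0, provided the hard limit allows it.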

It's hard to say anything specific without seeing any code, so I'll give you some general
advice: learn to use your debugger (probably gdb), and try to reproduce the failure
under the debugger. If you're lucky, the segfault will still occur under the debugger,
you'll get a stack trace showing where it failed, and that will give you a starting point that will let you work your way back to the true source of the problem.
If you're unlucky, the problem might disappear if you compile with debugging support, or
run it under gdb. In that case you'll have to resort to code inspection, and scrub your
code for any undefined behavior (for example, wild or uninitialized pointers, as
Billy ONeal suggests).

Set ulimit -c unlimited.
Run your program and let it crash. It should now core dump.
Run gdb <program-name> core
If you use the bt (backtrace) command, it should give you a good idea where the crash is happening. This should then help you fix it.

Related

How to debug crash, when backtrace starts with zero

My long-running application crashes randomly with a segmentation fault. When trying to debug the generated core dump, I get stuck with a weird stack trace:
(gdb) bt full
#0 __memmove_ssse3 () at ../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:2582
No locals.
#1 0x00000000 in ?? ()
No symbol table info available.
How can it happen that the backtrace starts at 0x00000000?
What else can I do to debug this issue? I can't run it under gdb, as it may take as long as a week until the crash occurs.
Generally this means that the return address on the stack has been overwritten with 0, probably due to overrunning the end of an on-stack array. You can try building with AddressSanitizer (-fsanitize=address) in gcc or clang, if you are using them. Or you can try running under valgrind to see if it will tell you about invalid memory writes.
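To illustrate the kind of bug described (this is not the asker's code), copying a string of unknown length into a fixed-size stack array can run past the end and overwrite the saved return address. A minimal sketch of a bounded copy follows; copy_bounded is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies src into dst, truncating so that dst never overflows and is always
// NUL-terminated. Returns true if the whole string fit.
template <std::size_t N>
bool copy_bounded(char (&dst)[N], const char* src)
{
    static_assert(N > 0, "destination must hold at least the terminator");
    assert(src != nullptr);
    const std::size_t len = std::strlen(src);
    const std::size_t n = len < N - 1 ? len : N - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return len == n;
}
```

Because the array size is deduced from the destination, the copy length cannot drift out of sync with the buffer declaration.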

I am getting the following error at run time. What does it mean and how do I debug it?

*** glibc detected *** ./main: corrupted double-linked list: 0x086c4f30 ***
After this the program does not exit and I am forced to kill it with Ctrl+C. I am not using any memory deallocation such as "delete" anywhere in my code either.
Running Valgrind, I get the following message:
Invalid write of size 4
==20358== at 0x8049932: main (main.cpp:123)
==20358== Address 0x432e6f8 is 0 bytes after a block of size 16 alloc'd
==20358== at 0x402C454: operator new[](unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==20358== by 0x8049907: main (main.cpp:120)
And the corresponding piece of code in line 123 is
float **der_global= new float *[NODES];
for(int i=0; i<no_element; i++)
{
der_global[i]=new float [no_element];
}
Your original new call gives you space to store NODES pointers; but your for-loop tries to set no_element of them, which doesn't have to be the same number. Your for loop should have i less than NODES, not i less than no_element.
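One way to make this class of mismatch impossible is to let a container own both dimensions. The sketch below assumes a rectangular matrix is what was intended; make_matrix is a hypothetical helper, and with the names from the question it would be called as make_matrix(NODES, no_element).

```cpp
#include <cstddef>
#include <vector>

// Builds a rows x cols matrix of zeros. The row count used for allocation and
// for filling is the same value by construction, so the two cannot disagree.
std::vector<std::vector<float> > make_matrix(std::size_t rows, std::size_t cols)
{
    return std::vector<std::vector<float> >(rows, std::vector<float>(cols, 0.0f));
}
```

The vectors also release their memory automatically, so no matching delete[] calls are needed.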
This error usually shows up when the program frees memory that is no longer valid.
Are you using malloc or any other dynamic allocation?
It would be easier to solve your problem if you could add some of your code.
Try using valgrind:
valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-reachable=yes --log-file=val.log ./<executable> <parameters>
and look at val.log.
You could also use gdb, but for that you will need to compile with the -g flag.

GDB backtrace from hard fault handler on ARM cortex M0 NRF51822

I'm on an ARM Cortex M0 (Nordic NRF51822) using the Segger JLink. When my code hard faults (say, due to dereferencing an invalid pointer), I see only the following stack trace:
(gdb) bt
#0 HardFault_HandlerC (hardfault_args=<optimized out>) at main_display.cpp:440
#1 0x00011290 in ?? ()
I have a hard fault handler installed and it can give me the lr and pc:
(gdb) p/x stacked_pc
$1 = 0x18ea6
(gdb) p/x stacked_lr
$2 = 0x18b35
And I know I can use addr2line to translate these to source code lines:
> arm-none-eabi-addr2line -e main_display.elf 0x18ea6
/Users/cmason/code/nrf/src/../libs/epaper/EPD_Display.cpp:33
> arm-none-eabi-addr2line -e main_display.elf 0x18b35
/Users/cmason/code/nrf/src/../libs/epaper/EPD.cpp:414
Can I get the rest of the backtrace somehow? If I stop at a normal breakpoint I can get a backtrace, so I know GDB can do the (somewhat complex) algorithm to unwind the stack on ARM. I understand that, in the general case, the stack may be screwed up by my code to the point where it's unreadable, but I don't think that's what's happening in this case.
I think this may be complicated by Nordic's memory protection scheme. Their Bluetooth stack installs its own interrupt vector and prevents access to certain memory regions. Or maybe this is Segger's fault? On other Cortex M0 parts, do most people see regular backtraces from hard faults?
Thanks!
-c
Cortex-M0 and Cortex-M3 are close enough that you can use the answer from this question:
Stack Backtrace for ARM core using GCC compiler (when there is a MSP to PSP switch)
In short: GCC has a function, _Unwind_Backtrace, to generate a full call stack; this needs to be hacked up a bit to simulate doing a backtrace from before the exception entry happened. Details are in the linked question.
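As a rough sketch of the basic _Unwind_Backtrace usage only: the code below collects program counters from the current call stack. It does not show the fault-handler-specific part (starting the walk from the stacked registers), which is what the linked question covers. The names TraceState, on_frame and capture_backtrace are hypothetical, and on a Cortex-M target this assumes the code is built with unwind tables (for example -funwind-tables) and linked against an unwinder.

```cpp
#include <unwind.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical state passed to the unwinder callback.
struct TraceState {
    std::uintptr_t* frames;
    std::size_t capacity;
    std::size_t count;
};

// Called once per stack frame; records the frame's program counter.
static _Unwind_Reason_Code on_frame(struct _Unwind_Context* ctx, void* arg)
{
    TraceState* st = static_cast<TraceState*>(arg);
    if (st->count >= st->capacity)
        return _URC_END_OF_STACK;  // any non-_URC_NO_REASON value stops the walk
    const std::uintptr_t pc = static_cast<std::uintptr_t>(_Unwind_GetIP(ctx));
    if (pc != 0)
        st->frames[st->count++] = pc;
    return _URC_NO_REASON;
}

// Fills frames with up to capacity program counters; returns how many were stored.
std::size_t capture_backtrace(std::uintptr_t* frames, std::size_t capacity)
{
    assert(frames != nullptr || capacity == 0);
    TraceState st = {frames, capacity, 0};
    _Unwind_Backtrace(on_frame, &st);
    return st.count;
}
```

The collected addresses can then be translated to source lines with addr2line, the same way as the stacked pc and lr values.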

Regarding Possible Lost in Valgrind

What is wrong with pushing strings into a vector like this:
globalstructures->schema.columnnames.push_back("id");
When I run Valgrind on my code, it shows:
27 bytes in 1 blocks are possibly lost in loss record 7 of 19
It reports "possibly lost" like that in many places. Because of this, the allocations and frees do not match, which results in strange errors like
malloc.c:No such file or directory
Although I am using calloc for every allocation in my code, I am getting warnings like
Syscall param write(buf) points to uninitialised byte(s)
The code causing that error is
datapage *dataPage=(datapage *)calloc(1,PAGE_SIZE);
writePage(dataPage,dataPageNumber);
int writePage(void *buffer,long pagenumber)
{
int fd;
fd=open(path,O_WRONLY, 0644);
if (fd < 0)
return -1;
lseek(fd,pagenumber*PAGE_SIZE,SEEK_SET);
if(write(fd,buffer,PAGE_SIZE)==-1)
{
close(fd); // do not leak the descriptor when the write fails
return false;
}
close(fd);
return true;
}
The exact error I get when running under gdb is:
Breakpoint 1, getInfoFromSysColumns (tid=3, numColumns=#0x7fffffffdf24: 1, typesVector=..., constraintsVector=..., lengthsVector=...,
columnNamesVector=..., offsetsVector=...) at dbheader.cpp:1080
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff78bd720, bytes=8) at malloc.c:3498
3498 malloc.c: No such file or directory.
When I run the same program under Valgrind, it works fine.
Well,
malloc.c:No such file or directory
can occur while you are debugging with gdb and you use the command "s" instead of "n" near malloc, which essentially means you are trying to step into malloc, the source of which may not be available on your Linux machine.
That is perhaps the same reason why it is working fine with valgrind.
Why the error is in malloc:
The problem is that you overwrote some memory buffer and corrupted one
of the structures used by the memory manager. (c)
Try to run valgrind with --track-origins=yes and see where that uninitialized access comes from. If you believe that it should be initialized and it is not, maybe the data came from a bad pointer, valgrind will show you where exactly the values were created. Probably those uninitialized values overwrote your buffer, including memory manager special bytes.
Also, review all valgrind warnings before the crash.
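Independently of the corruption itself, the page-writing routine in the question ignores short writes and the result of lseek. The following is a sketch of a stricter variant, assuming the caller opens and closes the file descriptor; write_page and its parameters are hypothetical and not the asker's API. It uses POSIX pwrite so that no separate seek is needed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes exactly page_size bytes from buffer at page index page_number.
// Retries on short writes and EINTR. Returns true only if every byte was
// written. The caller owns fd and is responsible for closing it.
bool write_page(int fd, const void* buffer, std::size_t page_size, long page_number)
{
    assert(buffer != nullptr || page_size == 0);
    const char* p = static_cast<const char*>(buffer);
    off_t offset = static_cast<off_t>(page_number) * static_cast<off_t>(page_size);
    std::size_t left = page_size;
    while (left > 0) {
        const ssize_t n = pwrite(fd, p, left, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;      // interrupted before writing anything: retry
            return false;      // real error, e.g. bad descriptor or full disk
        }
        if (n == 0)
            return false;      // no progress; avoid looping forever
        p += n;
        offset += n;
        left -= static_cast<std::size_t>(n);
    }
    return true;
}
```

Returning a plain bool also avoids mixing -1, false and true in one int return value.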

How to debug further based on Valgrind output

I have C/C++ code which is giving a segfault. It is compiled using gcc/g++ on a RH Linux Enterprise server. I used the Valgrind memory checker on the executable with:
valgrind --tool=memcheck --leak-check=full --show-reachable=yes
I get this as one of Valgrind's output messages:
==7053== Invalid read of size 1
==7053== at 0xDBC96C: func1 (file1:4742)
==7053== by 0xDB8769: func2 (file1.c:3478)
==7053== by 0xDB167E: func3 (file1.c:2032)
==7053== by 0xDB0378: func4 (file1.c:1542)
==7053== by 0xDB97D8: func5 (file1.c:3697)
==7053== by 0xDB17A7: func6 (file1.c:2120)
==7053== by 0xDBD55E: func7 (file2.c:271)
==7053== Address 0x1bcaf2f0 is not stack'd, malloc'd or (recently) free'd
I read that to mean that my code has accessed an invalid memory location it is not allowed to.
My questions:
1. How do I find out which buffer memory access was invalid, and which of the functions above performed it?
2. How can I use the address 0x1bcaf2f0, which valgrind says is invalid? How can I find the symbol (essentially, the buffer name) at that address? A memory map file, or some other way?
3. Any other general pointers, valgrind options or other tools for using Valgrind to detect memory (heap/stack corruption) errors?
Ad 1: In your example, that'd be func1 in line file1:4742 (1). The following functions are the stack trace. Analyzing that line should lead you to the invalid memory access.
Ad 2: Try splitting it into multiple simpler lines in case it's too complex and not obvious which exact call is causing the warning.
Ad 3: memcheck is the quintessential valgrind tool for detecting errors with heap memory. It won't help for stack corruption though.
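To illustrate the suggestion of splitting a complex line, here is a hypothetical example (not taken from the question's code). When three dependent reads share one line, Valgrind's line number cannot tell them apart; with one read per line it can. The function lookup and its parameters are made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Dense form: if Valgrind flags this line, any of the three reads could be at fault.
//   return table[index[keys[i]]];
// Split form: each read sits on its own line, so the reported line number
// identifies the exact access.
int lookup(const std::vector<int>& table,
           const std::vector<std::size_t>& index,
           const std::vector<std::size_t>& keys,
           std::size_t i)
{
    const std::size_t key = keys[i];
    const std::size_t slot = index[key];
    const int value = table[slot];
    return value;
}
```

This only helps when the binary is compiled with debug information (-g), so that Valgrind can report line numbers at all.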
If you have Valgrind 3.7.0 or later, you can use the embedded gdbserver to
debug your application with gdb while it runs under Valgrind.
See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver