I have a program running on Linux and it has been crashing mysteriously. I already know that one way to find out where it crashes is to use GDB, but I don't want to attach to it every time I restart it (which I do a lot, since I'm testing). Is there an alternative way to do this?
First use ulimit -c unlimited to allow crashed programs to write core dumps.
After the program crashes, you'll find a core dump file, called core, or perhaps core.<pid> if your kernel is configured to append the PID (see /proc/sys/kernel/core_uses_pid).
You can load this into GDB to examine the state at the point of the crash with gdb program core.
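For example, a typical session might look like this (the program name myprog and the output shown are illustrative):

$ ulimit -c unlimited
$ ./myprog
Segmentation fault (core dumped)
$ gdb ./myprog core
(gdb) bt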
First do a ulimit -c unlimited, so the program will leave a core dump.
Then, when it crashes, invoke gdb with the core dump to read the state of the program at the moment of the crash.
You can configure your OS to dump a core file any time a program crashes. You can then examine the core to determine the crash location.
-> Compile the code with debugging symbols enabled:
gcc -g -o <binary name> <file.c> (assuming it is a C program; use g++ for C++)
-> Run the executable within gdb:
gdb <binary name>
After this, there are two ways to find the crash location:
1. Stepwise execution (see the sketch below).
2. Run the code; when it crashes (as expected), type "where" within gdb (without quotes) to get the backtrace. From that, you can find out the crashing address.
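A minimal sketch of both approaches (myprog is a stand-in for your binary):

$ gcc -g -o myprog myprog.c
$ gdb ./myprog

For stepwise execution:
(gdb) break main
(gdb) run
(gdb) next    # repeat until the crash; use step to descend into calls

Or to go straight to the backtrace:
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
(gdb) where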
A nice quick guide to gdb: http://www.cs.cmu.edu/~gilpin/tutorial/
When an application compiled with -O2 crashes, how can I locate the function or the line of code that made the application crash?
In exactly the same way as you would for an application compiled without -O2.
When the application crashes in the production environment, it's hard to locate the problem.
Your first step should be to arrange for the production environment to save a core dump somewhere, or at least to print the crashing address (often logged in /var/log/messages or the like).
Once you have a core, you can use a debugger, e.g. gdb a.out core, and then the where command will list the functions leading to the crash.
If you want file and line info, you need to build a.out with the -g flag (in addition to -O2).
If you don't have a core, but do have the crashing address, then addr2line -fe a.out $address should give you the function name.
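For example, with a crashing address taken from the kernel log (the address, PID, function name, and file/line below are made up):

$ dmesg | tail -1
a.out[1234]: segfault at 0 ip 0000000000400b12 sp 00007ffc0badbeef error 4 in a.out[400000+2000]
$ addr2line -fe a.out 0x400b12
do_work
/home/user/src/main.c:42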
I am using vim for C++ programming. I have bound the compile command to Ctrl+C in vim, and I run the result in another tmux pane with ./main.out. My problem is that when my C++ program gives a segmentation fault, I don't know which line caused the problem. But when I compiled and ran the program in VS Code, it showed me the line that caused the error.
I'm looking for a way to find the lines that cause runtime errors like segmentation faults while running the program's binary in the console.
This is an example output when I do ./main.out:
[1] 24656 segmentation fault (core dumped) ./main.out
When compiling the program, add the -g compiler flag, or even better -ggdb3, to include debugging symbols in the executable; -ggdb3 will give you much prettier output. Also, make sure that you compile with the -O0 optimization level.
To actually debug the program, run gdb ./main.out to start the program in a debugging session. If you then run r, gdb will start executing the program, and then stop at the line that gives the segfault.
To figure out how you got to that point, run bt while in the debugging session, and you will get a backtrace, which will show you all the function calls that were made to get to the line of code that crashed.
You can of course do a lot more than this (and you will probably need to, since locating the source of an error is often only the first step). You can use p to print the values of variables, set watchpoints, and much more. For a while now, gdb has even shipped with a full-fledged Python interpreter, so you can write a Python script for your custom debugging needs.
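For instance, once gdb has stopped at the faulting line, a quick inspection session might look like this (the variable names and output are made up):

(gdb) bt
(gdb) p my_pointer
$1 = (struct node *) 0x0
(gdb) watch my_counter
Hardware watchpoint 1: my_counter
(gdb) python print(gdb.selected_frame().name())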
Learning how to use gdb can seem overwhelming at the start, but persevere, and I guarantee the effort will pay off big time :)
Ditto on Adin's answer.
Also, your code can crash on a call whose parameters look acceptable but which triggers the proverbial out-of-range protection fault deep inside some library, especially if you don't have debug versions of those libraries. If an assembly routine is used in there, it can do some strange things.
So don't be afraid to add temporary code to help, for example to find the single call that crashes when 1,000,000 other calls to the same function did not.
That's also why I like to test with lots of randomly generated inputs, where possible, once you think you have it fixed.
Is there any gcc option I can set that will give me the line number of the segmentation fault?
I know I can:
Debug line by line
Put printfs in the code to narrow down.
Edits:
bt / where in gdb give No stack.
I don't know of a gcc option, but you should be able to run the application with gdb and then, when it crashes, type where to take a look at the stack at the moment of the crash, which should get you close.
$ gdb blah
(gdb) run
(gdb) where
Edit for completeness:
You should also make sure to build the application with debug flags on using the -g gcc option to include line numbers in the executable.
Another option is to use the bt (backtrace) command.
Here's a complete shell/gdb session
$ gcc -ggdb myproj.c
$ gdb a.out
gdb> run --some-option=foo --other-option=bar
(gdb will say your program hit a segfault)
gdb> bt
(gdb prints a stack trace)
gdb> q
[are you sure, your program is still running]? y
$ emacs myproj.c # heh, I know what the error is now...
Happy hacking :-)
You can get your program to print a stacktrace when it gets a SEGV signal, similar to how Java and other friendlier languages handle null pointer exceptions. See my answer here for more details:
how to generate a stacktrace when my C++ app crashes (using gcc compiler)
The nice thing about this is you can just leave it in your code; you don't need to run things through gdb to get the nice debug output.
If you compile with -g and follow the instructions there, you can use a command-line tool like addr2line to get file/line information from the output.
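The linked answer boils down to installing a SIGSEGV handler that prints the stack. A minimal sketch using glibc's backtrace facilities (an illustration of the technique, not the exact code from that answer):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void segv_handler(int sig)
{
    void *frames[64];
    (void)sig;
    /* Collect the return addresses and write them to stderr.
       backtrace_symbols_fd avoids malloc, so it is reasonably
       safe to call from a signal handler. */
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(1);  /* don't return into the faulting instruction */
}

int main(void)
{
    signal(SIGSEGV, segv_handler);
    /* ... rest of the program ... */
    return 0;
}

Link with -rdynamic so the addresses resolve to symbol names, then run the printed addresses through addr2line for file/line information, as described above.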
Run it under valgrind.
You also need to build with the -g debug flag.
You can also open the core dump with gdb (you need -g though).
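A quick sketch of the valgrind workflow (the program name and output are illustrative); with -g in the build, valgrind reports the file and line of the invalid access directly:

$ gcc -g -O0 -o myprog myprog.c
$ valgrind ./myprog
==1234== Invalid write of size 4
==1234==    at 0x40052A: main (myprog.c:7)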
If all the preceding suggestions to compile with debugging (-g) and run under a debugger (gdb, run, bt) are not working for you, then:
Elementary: Maybe you're not running under the debugger; you're just trying to analyze the postmortem core dump. (If you start a debug session but don't run the program, or if it exits, then when you ask for a backtrace, gdb will say "No stack" -- because there's no running program at all. Don't forget to type "run".) If it segfaulted, don't forget to add the third argument (core) when you run gdb; otherwise you start in the same state, not attached to any particular process or memory image.
Difficult: If your program is/was really running but your gdb is saying "No stack", perhaps your stack pointer is badly smashed. In that case, you may have a buffer overflow problem somewhere, severe enough to mash your runtime state entirely. GCC 4.1 supports the ProPolice "Stack Smashing Protector", enabled with -fstack-protector-all; it can be added to GCC 3.x with a patch.
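For example (the file name is hypothetical):

gcc -g -fstack-protector-all -o myprog myprog.c

With the protector in place, a smashed stack aborts the program with a "*** stack smashing detected ***" message instead of silently corrupting your state, which gives gdb something usable to work with.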
There is no method for GCC to provide this information, you'll have to rely on an external program like GDB.
GDB can give you the line where a crash occurred with the "bt" (short for "backtrace") command after the program has seg faulted. This will give you not only the line of the crash, but the whole stack of the program (so you can see what called the function where the crash happened).
The No stack problem seems to happen when the program exits instead of crashing.
For the record, I had this problem because I had forgotten a return in my code, which made my program exit with a failure code.
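In that situation the session looks something like this (the PID is illustrative); there is simply no crashed frame left to inspect:

(gdb) run
[Inferior 1 (process 1234) exited with code 01]
(gdb) bt
No stack.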
I'm using gcc 4.9.2 & gdb 7.2 on Solaris 10 on SPARC. The following was tested after compiling/linking with -g, -ggdb, and -ggdb3.
When I attach to a process:
~ gdb
/snip/
(gdb) attach pid_goes_here
... it does not load symbolic information. I started with NetBeans, which starts gdb without specifying the executable name until after the attach occurs, but I've eliminated NetBeans as the cause.
I can force it to load the symbol table if I attach to the process and then, in the debugger console, do one of the following:
(gdb) detach
(gdb) file /path/to/file
(gdb) attach the_pid_goes_here
or
(gdb) file /path/to/file
(gdb) sharedlibrary .
I want to know if there's a more automatic way I can force this behavior. So far googling has turned up zilch.
It looks like a bug.
Are you sure that the main executable symbols are loaded? This bug says that attach pid without giving the binary doesn't work on Solaris at all.
In any case, it's supposed to work automatically, so your best bet to make it work better is probably to file a bug, and wait for it to be fixed (or send a patch to fix it yourself :-)
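In the meantime, one way to script the same file-then-attach sequence from the shell (a sketch, untested on Solaris) is gdb's -ex option, or passing the executable on the command line so the symbols are read before the attach:

$ gdb -ex 'attach the_pid_goes_here' /path/to/file

or

$ gdb /path/to/file the_pid_goes_here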
I'm using Red Hat Linux 3. Can someone explain how it is possible that I am able to analyze, with gdb, a core dump generated on Red Hat Linux 5?
Not that I'm complaining :) but I need to be sure this will always work...
EDIT: the shared libraries are the same version, so no worries about that; they are placed on shared storage so they can be accessed from both Red Hat 5 and Red Hat 3.
Thanks.
You can try the following GDB commands to open a core file:
gdb
(gdb) exec-file <path to executable>
(gdb) set solib-absolute-prefix <path to shared libraries>
(gdb) core-file <path to core file>
The reason you can't rely on it is that every process uses libc or other system shared libraries, which will certainly have changed between Red Hat 3 and Red Hat 5. So the instruction addresses and the number of instructions in the library functions will differ, and that is where the debugger gets goofed up and can show you wrong data to analyze. So it's always best to analyze the core on the same platform, or else to copy all the required shared libraries to the other machine and set the path through set solib-absolute-prefix.
In my experience, analyzing a core file generated on another system does not work, because the standard library (and the other libraries your program uses) will typically be different, so the addresses of the functions differ and you cannot even get a sensible backtrace.
Don't do it: even if it works sometimes, you cannot rely on it.
You can always run gdb -c /path/to/corefile /path/to/program_that_crashed. However, if program_that_crashed has no debug info (i.e. was not compiled and linked with the -g gcc/ld flag), the coredump is not that useful unless you're a hard-core debugging expert ;-)
Note that the generation of core files can be disabled (and it's very likely that it is disabled by default on most distros). See man ulimit. Call ulimit -c to see the limit for core files; "0" means disabled, so try ulimit -c unlimited in that case. If a size limit is imposed, the coredump will not exceed the limit, which may cut off valuable information.
Also, the path where a coredump is generated depends on /proc/sys/kernel/core_pattern. Use cat /proc/sys/kernel/core_pattern to query the current pattern. It's actually a path, and if it doesn't start with /, the file will be generated in the current working directory of the process. And if cat /proc/sys/kernel/core_uses_pid returns "1", the coredump will have the PID of the crashed process as its file extension. You can also set both values; e.g. echo -n /tmp/core > /proc/sys/kernel/core_pattern will force all coredumps to be generated in /tmp.
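Putting those pieces together, a quick check might look like this (run the echo as root; the program name, PID, and values shown are examples):

$ ulimit -c
0
$ ulimit -c unlimited
$ cat /proc/sys/kernel/core_pattern
core
# echo -n /tmp/core > /proc/sys/kernel/core_pattern
$ ./myprog
Segmentation fault (core dumped)
$ ls /tmp/core*
/tmp/core.1234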
I understand the question as: how is it possible that I am able to analyse a core that was produced under one version of an OS under another version of that OS?
Just because you are lucky (and even that is questionable). There are a lot of things that can go wrong when trying to do so:
1. the toolchains (gcc, gdb, etc.) will be of different versions
2. the shared libraries will be of different versions
So no, you shouldn't rely on that.
You have asked a similar question before and accepted an answer (your own, of course) here: Analyzing core file of shared object
Once you load the core file, you can get the stack trace, find the last function call, and check the code for the cause of the crash.
There is a small tutorial here to get started with.
EDIT: I'm assuming you want to know how to analyze a core file using gdb on Linux, as your question is a little unclear.