I want to get the address of function _dl_start (entry point of the dynamic linker). I am able to set a breakpoint using gdb. I expected to find the symbol using readelf but I did not. How can I get the address / how does the gdb resolve _dl_start?
The example source (main.cpp) to set the breakpoint using the gdb is
int main( int argc, char** argv, char** envp )
{
return 0;
}
I compiled it with
g++ main.cpp -o teststart
The gdb output when running the program was
(gdb) b _dl_start
Function "_dl_start" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_dl_start) pending.
(gdb) r
Starting program: /tmp/teststart
Breakpoint 1, 0x00007fa7ee8c4fc4 in _dl_start () from /lib64/ld-linux-x86-64.so.2
The _dl_start symbol is in ld-linux-x86-64.so.2 (the dynamic loader), and that symbol is private to ld-linux. This means that the only way to find it from inside the program is to do the same thing GDB does: read the symbol table of ld-linux, and search it for the "_dl_start" function (by name). Linking to it directly (as Martin suggested) can not and will not work (as you've already discovered).
Reading ELF symbol tables is not very complicated -- you just have to find .symtab and .strtab sections, and read the .symtab as a table of Elf64_Sym entries. Or use libelf (start here).
An additional complication is that ld-linux could be stripped (the symbol table is not required for it to work). If it is stripped, neither GDB nor your program will be able to find _dl_start.
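As a rough illustration of the .symtab/.strtab lookup described above, here is a minimal sketch for 64-bit ELF files on Linux. The function name find_symbol and its signature are made up for this example; it reads the whole file into memory, walks the section headers, and returns false when the file has no .symtab (i.e. it is stripped) or the name is not present.

```cpp
#include <elf.h>     // Elf64_Ehdr, Elf64_Shdr, Elf64_Sym (Linux system header)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: looks up `name` in the .symtab of the 64-bit ELF file
// at `path`. On success stores the symbol's st_value in `value_out`.
bool find_symbol(const std::string& path, const std::string& name,
                 std::uint64_t& value_out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());

    Elf64_Ehdr eh;
    if (buf.size() < sizeof eh) return false;
    std::memcpy(&eh, buf.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return false;

    // True if the byte range [off, off + len) lies inside the file.
    auto in_file = [&buf](std::uint64_t off, std::uint64_t len) {
        return off <= buf.size() && len <= buf.size() - off;
    };
    if (!in_file(eh.e_shoff, std::uint64_t{eh.e_shnum} * sizeof(Elf64_Shdr)))
        return false;

    std::vector<Elf64_Shdr> sections(eh.e_shnum);
    if (!sections.empty())
        std::memcpy(sections.data(), buf.data() + eh.e_shoff,
                    sections.size() * sizeof(Elf64_Shdr));

    for (const Elf64_Shdr& symtab : sections) {
        if (symtab.sh_type != SHT_SYMTAB) continue;           // not .symtab
        if (symtab.sh_link >= sections.size()) continue;
        const Elf64_Shdr& strtab = sections[symtab.sh_link];  // its string table
        if (!in_file(symtab.sh_offset, symtab.sh_size) ||
            !in_file(strtab.sh_offset, strtab.sh_size))
            continue;

        const std::size_t count = symtab.sh_size / sizeof(Elf64_Sym);
        for (std::size_t i = 0; i < count; ++i) {
            Elf64_Sym sym;
            std::memcpy(&sym, buf.data() + symtab.sh_offset + i * sizeof sym,
                        sizeof sym);
            // The name plus its terminating NUL must fit inside the string table.
            if (sym.st_name >= strtab.sh_size ||
                strtab.sh_size - sym.st_name <= name.size())
                continue;
            const char* s = buf.data() + strtab.sh_offset + sym.st_name;
            if (std::memcmp(s, name.data(), name.size()) == 0 &&
                s[name.size()] == '\0') {
                value_out = sym.st_value;
                return true;
            }
        }
    }
    return false;
}
```

For a shared object such as ld-linux, st_value is relative to the load base, so the run-time address is the base plus this value; inside the process the base of the dynamic linker can be obtained with getauxval(AT_BASE).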
Finally, it is somewhat likely that your attempt to find _dl_start is pointless: you do realize that this function is called long before the first instruction of your program is executed. By the time you hit main, _dl_start has long finished, never to be called again.
Update:
I still wonder how gdb gets the address of _dl_start in ld-linux (it is stripped)
If ld-linux is stripped, GDB will not be able to find _dl_start in it. Since your GDB does find it, either
your ld-linux is not actually stripped, or
you have "separate debuginfo" package for glibc installed.
To verify that ld-linux is really fully stripped, run nm /lib64/ld-linux-x86-64.so.2 | grep _dl_start and readelf -S /lib64/ld-linux-x86-64.so.2 | grep symtab. Both commands should produce no output.
To see where GDB is loading symbols from, you can use set print symbol-loading on command (before running the executable).
I wanted to call _dl_start (after preparing the stack and adjusting the auxiliary vector) to create an executable image of a program stored already in memory (file representation)...
I don't see how that could possibly work. _dl_start expects certain state (e.g. its global variables to be zeroed out) before it is called, so calling it for a second time is very likely to result in assertion failure even if you don't adjust the aux vector. And assert is even more likely if you do adjust aux vector in some non-trivial way, which is (apparently) your goal.
_dl_start is not part of your program itself, it is contained in the runtime loader (as you can see from the output "..._dl_start () from /lib64/ld-linux-x86-64.so.2").
GDB initially cannot set a breakpoint, because the symbol is not contained in your executable.
It is a bit unclear to me, if you want to know the address of _dl_start from inside the program or from outside? From the inside, you should be able to simply assign it e.g. to a void* variable like this:
void* address = _dl_start;
Sometimes there is a function in my binary that I'm sure hasn't been optimized away, because it's called by another function:
(gdb) disassemble 'k3::(anonymous namespace)::BM_AwaitLongReadyChain(testing::benchmark::State&)'
Dump of assembler code for function k3::(anonymous namespace)::BM_AwaitLongReadyChain(testing::benchmark::State&):
[...]
0x00000000003a416d <+45>: call 0x3ad0e0 <k3::(anonymous namespace)::RecursivelyAwait<k3::(anonymous namespace)::Immediate17>(unsigned long, k3::(anonymous namespace)::Immediate17&&)>
End of assembler dump.
But if I ask GDB to disassemble it using the very same name that it refers to the function with, it claims the function doesn't exist:
(gdb) disassemble 'k3::(anonymous namespace)::RecursivelyAwait<k3::(anonymous namespace)::Immediate17>(unsigned long, k3::(anonymous namespace)::Immediate17&&)'
No symbol "k3::(anonymous namespace)::RecursivelyAwait<k3::(anonymous namespace)::Immediate17>(unsigned long, k3::(anonymous namespace)::Immediate17&&)" in current context.
However, if I disassemble it using its address, it works fine:
(gdb) disassemble 0x3ad0e0
Dump of assembler code for function k3::(anonymous namespace)::RecursivelyAwait<k3::(anonymous namespace)::Immediate17>(unsigned long, k3::(anonymous namespace)::Immediate17&&):
0x00000000003ad0e0 <+0>: push rbp
[...]
End of assembler dump.
This is terribly inconvenient, because I don't know the address a priori—I have to go disassemble a caller just to find the address of the callee. It's really cumbersome.
How can I get GDB to disassemble this function by name? I assume this is some issue with name mangling/canonicalization, probably around the rvalue references and/or anonymous namespaces, but I can't figure out what exactly is going on. I'm using GDB 10.0-gg5.
But if I ask GDB to disassemble it using the very same name that it refers to the function with, it claims the function doesn't exist
There are many possible mangling schemes; the relationship between mangled and unmangled names is not 1:1.
The parser built into GDB which turns foo::bar(int) into something which can be used to look up the symbol in the symbol table may have bugs.
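To illustrate the "not 1:1" point, here is a small sketch using abi::__cxa_demangle from the GCC/Clang runtime. The helper name demangle and the class name Foo are made up for the example. It shows two distinct mangled names (the complete-object and base-object constructors of the Itanium C++ ABI) demangling to the same string; this is only one way the mapping can be ambiguous, and not necessarily the cause of the problem in the question.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ runtime)
#include <cstdlib>
#include <string>

// Hypothetical helper: demangles an Itanium-ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return std::string();
    std::string result(out);
    std::free(out);  // the buffer is malloc()ed by __cxa_demangle
    return result;
}
```

With this helper, demangle("_ZN3FooC1Ev") and demangle("_ZN3FooC2Ev") both yield Foo::Foo(), so the demangled name alone does not identify a single symbol.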
This is terribly inconvenient, because I don't know the address a priori—I have to go disassemble a caller just to find the address of the callee.
If the called function is already on the stack (i.e. part of active call chain), you can easily disassemble it via disas $any_address_in_fn -- you don't need to give GDB the starting address. So you could do e.g. frame 5 followed by disas $pc -- GDB will find enclosing function in the symbol table and disassemble it in its entirety.
Another option is to get the address from file:line info: info line foo.cc:123 followed by disas $addr_given_by_previous_command.
If you know that foo::bar() exists somewhere, but don't know its source location, another option is to set a breakpoint on it via e.g. rbreak 'foo::bar'. This will tell you the address where the breakpoint was set, and you can disassemble that address.
I am doing a postmortem analysis of a crashed program. I am on Linux (Ubuntu 12.04, x86), the code is written in C++. The Program is using some singletons that may contain valuable information. Is it possible to find the pointer to the instance of a singleton if it was created like this:
SingletonType& SingletonType::getInstance(){
static SingletonType* instance = new SingletonType();
return *instance;
}
And if it is possible, how is it done in GDB?
Run gdb with the core file, and run the command
disassemble SingletonType::getInstance
On my test program I found a mov 0x<addr>, %eax instruction near the end of the method. A print *(*(SingletonType**) 0x<addr>) should print the contents of your singleton structure.
show modules¹ should probably tell you the base addresses, and instance, being statically allocated, should be visible in some kind of objdump/nm report. Yeah hairy maths.
The alternative would be to disassemble SingletonType::getInstance() and see what effective address gets loaded in the initialization/return path.
¹ Mmm can't find the exact match I was remembering. info sharedlibrary would get you most info.
This is what I do, while inside the core with gdb:
(gdb) info var instance
This will list the addresses of all the singleton instance variables, among which you will find the one for SingletonType:
0x86aa960 SingletonType::getInstance()::instance
Now that I have the address, I can print the memory the instance points to. The address is that of the static variable itself, so when instance is a pointer, as in the question, dereference it twice:
(gdb) p **((SingletonType**)0x86aa960)
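For trying these GDB commands on a small test case, a stand-in for the question's class could look like the sketch below. The state member is invented for the example so there is something recognisable to print. Note that the function-local static instance is only a pointer; the object it points to is allocated on the heap.

```cpp
#include <string>

// Hypothetical stand-in for the SingletonType from the question.
class SingletonType {
public:
    static SingletonType& getInstance();
    std::string state = "valuable information";  // something to find in the core
private:
    SingletonType() = default;
};

SingletonType& SingletonType::getInstance() {
    // `instance` has static storage duration and holds a pointer;
    // the SingletonType object itself lives on the heap.
    static SingletonType* instance = new SingletonType();
    return *instance;
}
```

Calling SingletonType::getInstance() followed by std::abort() in main() produces a core file to experiment with, provided core dumps are enabled.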
How does one determine where the mistake is in the code that causes a segmentation fault?
Can my compiler (gcc) show the location of the fault in the program?
GCC can't do that but GDB (a debugger) sure can. Compile your program using the -g switch, like this:
gcc program.c -g
Then use gdb:
$ gdb ./a.out
(gdb) run
<segfault happens here>
(gdb) backtrace
<offending code is shown here>
Here is a nice tutorial to get you started with GDB.
Where the segfault occurs is generally only a clue as to where "the mistake which causes" it is in the code. The given location is not necessarily where the problem resides.
Also, you can give valgrind a try: if you install valgrind and run
valgrind --leak-check=full <program>
then it will run your program and display stack traces for any segfaults, as well as any invalid memory reads or writes and memory leaks. It's really quite useful.
You could also use a core dump and then examine it with gdb. To get useful information you also need to compile with the -g flag.
Whenever you get the message:
Segmentation fault (core dumped)
a core file is written into your current directory. And you can examine it with the command
gdb your_program core_file
The file contains the state of the memory when the program crashed. A core dump can be useful during the deployment of your software.
Make sure your system doesn't set the core dump file size to zero. You can set it to unlimited with:
ulimit -c unlimited
Careful though: core dumps can become huge.
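ulimit -c changes the limit for the shell and the processes started from it. If changing the shell environment is not an option, a program can also raise its own soft limit at startup. The following is a minimal sketch using the POSIX getrlimit/setrlimit calls; the function name enable_core_dumps is made up for this example.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)

// Hypothetical helper: raises the soft core-file size limit of the current
// process to its hard limit. Returns true on success.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
    lim.rlim_cur = lim.rlim_max;  // an unprivileged process may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

If the hard limit itself is zero, this succeeds but still leaves core dumps disabled.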
There are a number of tools available which help debugging segmentation faults and I would like to add my favorite tool to the list: Address Sanitizers (often abbreviated ASAN).
Modern¹ compilers come with the handy -fsanitize=address flag, which adds extra error checking at the cost of some compile-time and run-time overhead.
According to the documentation these checks include catching segmentation faults by default. The advantage here is that you get a stack trace similar to gdb's output, but without running the program inside a debugger. An example:
int main() {
volatile int *ptr = (int*)0;
*ptr = 0;
}
$ gcc -g -fsanitize=address main.c
$ ./a.out
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4848==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x5654348db1a0 bp 0x7ffc05e39240 sp 0x7ffc05e39230 T0)
==4848==The signal is caused by a WRITE memory access.
==4848==Hint: address points to the zero page.
#0 0x5654348db19f in main /tmp/tmp.s3gwjqb8zT/main.c:3
#1 0x7f0e5a052b6a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x26b6a)
#2 0x5654348db099 in _start (/tmp/tmp.s3gwjqb8zT/a.out+0x1099)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/tmp.s3gwjqb8zT/main.c:3 in main
==4848==ABORTING
The output is slightly more complicated than what gdb would output but there are upsides:
There is no need to reproduce the problem to receive a stack trace. Simply enabling the flag during development is enough.
ASANs catch a lot more than just segmentation faults. Many out of bounds accesses will be caught even if that memory area was accessible to the process.
¹ That is Clang 3.1+ and GCC 4.8+.
All of the above answers are correct and recommended; this answer is intended only as a last-resort if none of the aforementioned approaches can be used.
If all else fails, you can always recompile your program with various temporary debug-print statements (e.g. fprintf(stderr, "CHECKPOINT REACHED # %s:%i\n", __FILE__, __LINE__);) sprinkled throughout what you believe to be the relevant parts of your code. Then run the program, and observe what the last debug-print printed just before the crash was -- you know your program got that far, so the crash must have happened after that point. Add or remove debug-prints, recompile, and run the test again, until you have narrowed it down to a single line of code. At that point you can fix the bug and remove all of the temporary debug-prints.
It's quite tedious, but it has the advantage of working just about anywhere -- the only times it might not is if you don't have access to stdout or stderr for some reason, or if the bug you are trying to fix is a race-condition whose behavior changes when the timing of the program changes (since the debug-prints will slow down the program and change its timing).
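To avoid retyping the fprintf call, the checkpoint can be wrapped in a macro. This is only a sketch: the macro name CHECKPOINT and the function suspect are made up for illustration. The macro has to be a macro rather than a function so that __FILE__ and __LINE__ expand at the place where it is used.

```cpp
#include <cstdio>

// Hypothetical macro: prints the file and line where it is expanded.
// stderr is used because it is normally unbuffered, so the message is
// not lost when the program crashes right afterwards.
#define CHECKPOINT() \
    std::fprintf(stderr, "CHECKPOINT REACHED # %s:%d\n", __FILE__, __LINE__)

// Hypothetical function under investigation.
int suspect(const int* p) {
    CHECKPOINT();
    int value = p ? *p : 0;
    CHECKPOINT();
    return value;
}
```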
Lucas's answer about core dumps is good. In my .cshrc I have:
alias core 'ls -lt core; echo where | gdb -core=core -silent; echo "\n"'
to display the backtrace by entering 'core'. And the date stamp, to ensure I am looking at the right file :(.
Added: If there is a stack corruption bug, then the backtrace applied to the core dump is often garbage. In this case, running the program within gdb can give better results, as per the accepted answer (assuming the fault is easily reproducible). And also beware of multiple processes dumping core simultaneously; some OS's add the PID to the name of the core file.
This is a crude way to find the exact line after which there was the segmentation fault.
Define line logging function
#include <iostream>
void log(int line) {
std::cout << line << std::endl;
}
Find and replace every semicolon after the log function with "; log(__LINE__);"
Make sure to undo the replacement inside for (;;) loop headers, where the inserted calls would break the code
If you have a reproducible exception like segmentation fault, you can use a tool like a debugger to reproduce the error.
I used the following procedure to find the source code location even for non-reproducible errors. It's based on the Microsoft compiler tool chain, but the idea is general.
1. Save the MAP file for each binary (DLL,EXE) before you give it to the customer.
2. If an exception occurs, lookup the address in the MAP file and determine the function whose start address is just below the exception address. As a result you know the function, where the exception occurred.
3. Subtract the function start address from the exception address. The result is the offset in the function.
4. Recompile the source file containing the function with assembly listing enabled. Extract the function's assembly listing.
5. The assembly includes the offset of each instruction in the function. Lookup the source code line, that matches the offset in the function.
6. Evaluate the assembler code for the specific source code line. The offset points exactly the assembler instruction that caused the thrown exception. Evaluate the code of this single source code line. With a bit of experience with the compiler output you can say what caused the exception.
7. Be aware the reason for the exception might be at a totally different location. e.g. the code dereferenced a NULL pointer, but the actual reason, why the pointer is NULL can be somewhere else.
Steps 6 and 7 go beyond the question, since you asked only for the line of code, but I recommend that you be aware of them.
I hope you get a similar environment with the GCC compiler for your platform. If you don't have a usable MAP file, use the tool chain tools to get the addresses of the functions. I am sure the ELF file format supports this.
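The lookup of the enclosing function and the offset calculation can be sketched as follows. The type alias SymbolMap and the function locate are made up for this example; parsing the actual MAP file (or nm output) into the map is left out.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Function start address -> function name, as extracted from a MAP file.
using SymbolMap = std::map<std::uint64_t, std::string>;

// Hypothetical helper: returns the name of the function whose start address
// is the closest one at or below `addr`, together with the offset of `addr`
// inside that function. Returns an empty name if `addr` lies below every
// known function.
std::pair<std::string, std::uint64_t> locate(const SymbolMap& symbols,
                                             std::uint64_t addr) {
    auto it = symbols.upper_bound(addr);  // first function starting after addr
    if (it == symbols.begin())
        return std::make_pair(std::string(), std::uint64_t{0});
    --it;                                 // last function starting at or below addr
    return std::make_pair(it->second, addr - it->first);
}
```

For example, with functions recorded at 0x1000 and 0x1040, a fault address of 0x1044 resolves to the second function at offset 4. The result is only as good as the map: an address past the end of the last function is still attributed to that function.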
In case any of you (like me!) were looking for this same question but with gfortran, not gcc, the compiler is much more powerful these days and before resorting to the use of the debugger, you can also try out these compile options. For me, this identified exactly the line of code where the error occurred and which variable I was accessing out of bounds to cause the segmentation fault error.
-O0 -g -Wall -fcheck=all -fbacktrace
What does it mean when it gives a backtrace with the following output?
#0 0x00000008009c991c in pthread_testcancel () from /lib/libpthread.so.2
#1 0x00000008009b8120 in sigaction () from /lib/libpthread.so.2
#2 0x00000008009c211a in pthread_mutexattr_init () from /lib/libpthread.so.2
#3 0x0000000000000000 in ?? ()
The program has crashed with a standard signal 11, segmentation fault.
My application is a multi-threaded FastCGI C++ program running on FreeBSD 6.3, using pthread as the threading library.
It has been compiled with -g and all the symbol tables for my source are loaded, according to info sources.
As is clear, none of my actual code appears in the trace but instead the error seems to originate from standard pthread libraries. In particular, what is ?? () ????
EDIT: eventually tracked the crash down to a standard invalid memory access in my main code. Doesn't explain why the stack trace was corrupted, but that's a question for another day :)
gdb wasn't able to extract the proper return address from pthread_mutexattr_init; it got an address of 0. The "??" is the result of looking up address 0 in the symbol table. It cannot find a symbolic name, so it prints a default "??"
Unfortunately right offhand I don't know why it could not extract the correct return address.
Something you did caused the threading library to crash. Since the threading library itself is not compiled with debugging symbols (-g), it cannot display the source code file or line number the crash happened on. In addition, since it's threads, the call stack does not point back to your file. Unfortunately this will be a tough bug to track down, you're gonna need to step through your code and try and narrow down when exactly the crash happens.
Make sure you compile with debug symbols. (For gcc I think that is the -g option). Then you should be able to get more interesting information out of GDB. Don't forget to turn it off when you compile the production version.
I could be missing something, but isn't this indicative of someone using NULL as a function pointer?
#include <stdio.h>
typedef int (*funcptr)(void);
int
func_caller(funcptr f)
{
return (*f)();
}
int
main()
{
return func_caller(NULL);
}
This produces the same style of a backtrace if you run it in gdb:
rivendell$ gcc -g -O0 foo.c -o foo
rivendell$ gdb --quiet foo
Reading symbols for shared libraries .. done
(gdb) r
Starting program: ...
Reading symbols for shared libraries . done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x00001f9d in func_caller (f=0) at foo.c:8
#2 0x00001fb1 in main () at foo.c:14
This is a pretty strange crash though... pthread_mutexattr_init rarely does anything more than allocate a data structure and memset it. I'd look for something else going on. Is there a possibility of mismatched threading libraries or something? My BSD knowledge is a little dated, but there used to be issues around this.
Maybe the bug that caused the crash has broken the stack (overwritten parts of the stack)? In that case, the backtrace might be useless; no idea what to do in that case...
I am looking for a more technical explanation than "the OS calls the function".
Is there a website or book?
The .exe file (or equivalent on other platforms) contains an 'entry point' address. To a first approximation, the OS loads the relevant sections of the .EXE file into RAM, and then jumps to the entry point.
As others have said, this entry point will not be 'main', but will instead be a part of the runtime library - it will do things like initialising static objects, setting up the argc and argv parameters, setting up standard input, standard output, standard error, etc. When it's done all that, it will call your main() function. When main exits, the runtime goes through an analogous process of passing your return code back to the environment, calling static destructors, calling _atexit routines, etc.
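The work the runtime does around main() can be made visible with a small experiment. In this sketch all names (constructed_by_runtime, Tracer, exit_hook, register_exit_hook) are made up for illustration. The global object is constructed by the startup code, in practice before main() is entered, and both its destructor and the atexit handler run after main() returns.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical flag and type, used only to make the startup work observable.
bool constructed_by_runtime = false;

struct Tracer {
    Tracer()  { constructed_by_runtime = true; std::puts("static constructor"); }
    ~Tracer() { std::puts("static destructor"); }
};
static Tracer tracer;  // constructed by the runtime startup code, not by main()

void exit_hook() { std::puts("atexit handler"); }

// Call this from main(); returns true if the handler was registered.
bool register_exit_hook() { return std::atexit(exit_hook) == 0; }
```

Setting a breakpoint in the Tracer constructor and printing the backtrace shows which runtime functions sit between the entry point and user code on a given platform.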
If you have Microsoft tools (perhaps not the freebie ones), then you have all the runtime source, and an easy way to look at it is to put a breakpoint on the closing brace of your main() method, and single step back up into the runtime.
main() is called by the C runtime library; it is not a system function. I don't know about OS X or Linux, but Windows usually starts a program with WinMainCRTStartup(). This function initializes your process, extracts the command line arguments and environment (argc, argv, envp) and calls main(). It is also responsible for calling any code that should run after main(), like atexit() handlers.
By looking in your Visual Studio files, you should be able to find the default implementation of WinMainCRTStartup to see what it does.
You can also define a function of your own to call at startup; this is done by changing the "entry point" in the linker options. This is often a function that takes no arguments and returns void.
As far as Windows goes, the entry point functions are:
Console: void __cdecl mainCRTStartup( void ) {}
GUI: void __stdcall WinMainCRTStartup( void ) {}
DLL: BOOL __stdcall _DllMainCRTStartup(HINSTANCE hinstDLL,DWORD fdwReason,void* lpReserved) {}
The only reason to use these over the normal main, WinMain, and DllMain is if you wanted to use your own run time library. (If you want smaller file size or custom features.)
For custom run-time implementations and other tricks to get smaller PE files, see:
http://www.microsoft.com/msj/archive/S569.aspx
http://www.codeproject.com/KB/tips/aggressiveoptimize.aspx
http://www.catch22.net/tuts/minexe.asp
http://www.hailstorm.net/papers/smallwin32.htm
It's OS dependent.
In OS X, there's a load command in the Mach-O header (LC_UNIXTHREAD) that contains the start address for the EIP (instruction pointer) register.
Once the binary is loaded, the OS launches execution from this address:
cristi:test diciu$ otool -l ./a.out | grep -A 10 LC_UNIXTHREAD
cmd LC_UNIXTHREAD
cmdsize 80
flavor i386_THREAD_STATE
count i386_THREAD_STATE_COUNT
[..]
ss 0x00000000 eflags 0x00000000 eip 0x00001f8c cs 0x00000000
[..]
The address is the address of the "start" function from the binary:
cristi:test diciu$ nm ./a.out
0000200c D _NXArgc
00002008 D _NXArgv
00002000 D ___progname
00001fe0 t __dyld_func_lookup
00001000 A __mh_execute_header
[..]
00001f8c T start
In Mac OS X, it's the "start" function that gets called first, even before the "main" function:
(gdb) b start
Breakpoint 1 at 0x1f90
(gdb) b main
Breakpoint 2 at 0x1ff4
(gdb) r
Starting program: /Users/diciu/Programming/test/a.out
Reading symbols for shared libraries ++. done
Breakpoint 1, 0x00001f90 in start ()
Expert C++/CLI (check around page 279) has very specific details of the different bootstrap scenarios for native, mixed, and pure CLR assemblies.