Linking against library corrupts ELF program headers - c++

I am trying to link my C++ executable against a shared C library (also written by me / my colleagues).
The C++ executable just contains a main that prints "Hello World" to stdout and returns 0.
It runs fine. But as soon as I dynamically link against my C library (without even calling into it), my C++ executable fails at startup:
/opt/av-server # strace ./program
execve("./program", ["./program"], [/* 10 vars */]) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault
I did some research and found out that as soon as I link against the C library my ELF header starts looking very strange. The entry point address is suspiciously low and the addresses of the first LOAD are all zero:
Corrupt Header
Elf file type is DYN (Shared object file)
Entry point 0x6d8
There are 6 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x000960 0x00000960 0x00000960 0x00010 0x00010 R 0x4
LOAD 0x000000 0x00000000 0x00000000 0x00974 0x00974 R E 0x10000
LOAD 0x000ee0 0x00010ee0 0x00010ee0 0x0015c 0x00164 RW 0x10000
DYNAMIC 0x000ef0 0x00010ef0 0x00010ef0 0x00110 0x00110 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x000ee0 0x00010ee0 0x00010ee0 0x00120 0x00120 R 0x1
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01 .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
02 .init_array .fini_array .jcr .dynamic .got .data .bss
03 .dynamic
04
05 .init_array .fini_array .jcr .dynamic
I guess this explains why the program is segfaulting: it jumps to 0. For comparison, this is the header that is generated when I don't link against the lib:
Good Header
Elf file type is EXEC (Executable file)
Entry point 0x10588
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x0007fc 0x000107fc 0x000107fc 0x00018 0x00018 R 0x4
PHDR 0x000034 0x00010034 0x00010034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x00010154 0x00010154 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.3]
LOAD 0x000000 0x00010000 0x00010000 0x00818 0x00818 R E 0x10000
LOAD 0x000ef0 0x00020ef0 0x00020ef0 0x00148 0x001dc RW 0x10000
DYNAMIC 0x000f00 0x00020f00 0x00020f00 0x00100 0x00100 RW 0x4
NOTE 0x000168 0x00010168 0x00010168 0x00020 0x00020 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x000ef0 0x00020ef0 0x00020ef0 0x00110 0x00110 R 0x1
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01
02 .interp
03 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
04 .init_array .fini_array .jcr .dynamic .got .data .bss
05 .dynamic
06 .note.ABI-tag
07
08 .init_array .fini_array .jcr .dynamic
I understand why it must segfault. But the real question is: What inside the C library could possibly be responsible for corrupting the ELF header in that way?
I want to know what to look for. My suspicion is that somewhere in the lib some compiler intrinsics are used that go wrong. Just a wild guess.
I cross compile for Linux bha-1CCAE370CE47 3.4.35 #1 Thu Mar 9 11:20:13 HKT 2017 armv5tejl GNU/Linux with GCC 6.3. The toolchain was built with crosstool-ng. The build system is macOS.
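One way to narrow this down is to check the two fields that differ most visibly between the dumps above: the file type (DYN versus EXEC) and the presence of an INTERP program header. The following is a minimal Python sketch, not part of the original post, that reads those fields from an ELF32 file; the function name summarize_elf32 is made up for illustration. An output of type DYN without INTERP looks like something linked as a shared object, so it may be worth checking whether a flag such as -shared ends up on the link line, though that is an assumption and not something the dumps prove.

```python
import struct

PT_INTERP = 3   # program header type that names the dynamic loader
ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}

def summarize_elf32(data):
    """Return file type, entry point and whether a PT_INTERP header exists."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:
        raise ValueError("this sketch only handles ELF32")
    endian = "<" if data[5] == 1 else ">"
    # Fixed offsets inside the 52-byte ELF32 header
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    e_entry, e_phoff = struct.unpack_from(endian + "II", data, 24)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    # p_type is the first 32-bit word of every program header
    p_types = [
        struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "entry": e_entry,
        "has_interp": PT_INTERP in p_types,
    }
```

Calling summarize_elf32(open("./program", "rb").read()) on both builds should show at a glance whether the failing one lost its interpreter request.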

Related

Member variable allocated at start of memory

I'm trying to use C++ on an STM32 device, compiling with gcc. The device loads the code and starts executing it, but hard faults on any member variable write.
I can see with GDB that member variables are stored at the beginning of memory (0x7 to be specific); of course the STM32 hard faults at the first write of that location.
I can see that the BSS section is not generated unless I declare a variable in main (I used readelf on the final ELF file).
Shouldn't member variables be placed in bss?
I'm compiling and linking with -nostdlib -mcpu=cortex-m0plus -fno-exceptions -O0 -g.
The linker script is:
ENTRY(start_of_memory);
MEMORY {
    rom (rx) : ORIGIN = 0x08000000, LENGTH = 16K
    ram (xrw) : ORIGIN = 0x20000000, LENGTH = 2K
}
SECTIONS {
    .text : {
        *(.text)
    } > rom
    .data : {
        *(.data)
        *(.data.*)
    } > ram
    .bss : {
        *(.bss)
        *(.bss.*)
        *(COMMON)
    } > ram
}
The output of readelf (no variables declaration, only object usage):
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x8000000
Start of program headers: 52 (bytes into file)
Start of section headers: 76536 (bytes into file)
Flags: 0x5000200, Version5 EABI, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 13
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 08000000 010000 0005a8 00 AX 0 0 4
[ 2] .rodata PROGBITS 080005a8 0105a8 00005c 00 A 0 0 4
[ 3] .ARM.attributes ARM_ATTRIBUTES 00000000 010604 00002d 00 0 0 1
[ 4] .comment PROGBITS 00000000 010631 000049 01 MS 0 0 1
[ 5] .debug_info PROGBITS 00000000 01067a 000a93 00 0 0 1
[ 6] .debug_abbrev PROGBITS 00000000 01110d 0003b8 00 0 0 1
[ 7] .debug_aranges PROGBITS 00000000 0114c5 000060 00 0 0 1
[ 8] .debug_line PROGBITS 00000000 011525 000580 00 0 0 1
[ 9] .debug_str PROGBITS 00000000 011aa5 000416 01 MS 0 0 1
[10] .debug_frame PROGBITS 00000000 011ebc 000228 00 0 0 4
[11] .symtab SYMTAB 00000000 0120e4 000640 10 12 86 4
[12] .strtab STRTAB 00000000 012724 000344 00 0 0 1
[13] .shstrtab STRTAB 00000000 012a68 00008f 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x010000 0x08000000 0x08000000 0x00604 0x00604 R E 0x10000
Section to Segment mapping:
Segment Sections...
00 .text .rodata
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
Symbol table '.symtab' contains 100 entries:
Main (init platform probably does not use any variables):
int main(void) {
    init_platform(SPEED_4_MHz);
    gpio testpin(GPIO_A, 5);
    testpin.dir(MODE_OUTPUT);
    while (1) {
        testpin.high();
        wait();
        testpin.low();
        wait();
    }
    return 0;
}
Update #1:
The vector table is at beginning of memory, sp and msp are initialized successfully.
(gdb) p/x *0x00000000
$2 = 0x20000700
(gdb) p/x *0x00000004
$3 = 0x80000f1
(gdb) info registers
sp 0x20000700 0x20000700
lr 0xffffffff -1
pc 0x80000f6 0x80000f6 <main()+6>
xPSR 0xf1000000 -251658240
msp 0x20000700 0x20000700
psp 0xfffffffc 0xfffffffc
Putting a breakpoint on a constructor for the GPIO class, I can see variables are at 0x00000XXX
Breakpoint 2, gpio::gpio (this=0x7, port=0 '\000', pin=5 '\005') at gpio.cpp:25
25 mypin = pin;
(gdb) p/x &mypin
$6 = 0xb
I tried making mypin a public member variable (it was private); it made no difference.
Starting to think that dynamic allocation is needed with C++.
Address 0x7 is in the initial vector table in ROM, it is not writeable.
Unfortunately you don't have a section to populate the vector table, so this code is never going to work. You also don't appear to have a stack, which is where the members of gpio would be placed (because it is defined inside a function without the static keyword).
Start by taking the linker script provided as part of the STM32Cube package and then (if you must) modify it a little bit at a time until you break it. Then you will know what you have broken. It is not reasonable to write such a naïve linker script as this and expect it to work on a microcontroller.
of course the STM32 hard faults at the first write of that location.
The STM32 does not "fault" if you try to write to FLASH. The write will simply have no effect.
You need to have a vector table at the beginning of the FLASH memory. As a minimum, it has to contain a valid stack pointer address and the firmware entry point.
Your linker script and the code (I understand you do not use any STM-supplied startup code) are far from sufficient.
My advice:
Create the project using STM32Cube.
Then see how it should be done.
Having this knowledge, you can start to reinvent the wheel.
The issue was in the launch script:
Not working:
toolchain\bin\arm-none-eabi-gdb.exe ^
-ex "target remote 127.0.0.1:3333" ^
-ex "load" ^
-ex "b main" ^
-ex "b unmanaged_isr_call" ^
-ex "b hard_fault_isr" ^
-ex "j main" binaries\main.elf
Working:
toolchain\bin\arm-none-eabi-gdb.exe ^
-ex "target remote 127.0.0.1:3333" ^
-ex "load" ^
-ex "b unmanaged_isr_call" ^
-ex "b hard_fault_isr" ^
-ex "set $pc = &main" binaries\main.elf
Made it work.
The issue was in j main.
The jump command does not modify the stack frame where all the objects are placed by the compiler.
With set $pc, execution starts at the given address; with jump, execution starts at the first C line after that address. A big difference!
From the gdb jump documentation:
The jump command does not change the current stack frame, or the stack pointer, or the contents of any memory location or any register other than the program counter. If locspec resolves to an address in a different function from the one currently executing, the results may be bizarre if the two functions expect different patterns of arguments or of local variables. For this reason, the jump command requests confirmation if the jump address is not in the function currently executing. However, even bizarre results are predictable if you are well acquainted with the machine-language code of your program.
The first instructions of main make space on the stack for the objects "created" by main, space the objects need during execution (verified by launching both commands and seeing different msp values at the first C line).
With jump, those instructions are not executed and the space is not allocated on the stack: when the code calls a function, the parameters will overwrite member data.
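The effect can be modelled with a toy calculation. This is a hedged sketch in Python, not a trace of the real binary: the frame size (16 bytes), the use of r7 as a frame pointer, the offset of the object within the frame (7) and the assumption that r7 holds 0 when the prologue is skipped are all illustrative guesses, chosen only to show how skipping the prologue can leave "this" pointing at the start of memory.

```python
INITIAL_SP = 0x20000700   # value of msp shown in the register dump above

def address_of_local(skip_prologue, frame_size=16, local_offset=7, stale_r7=0):
    """Model where a local object of main ends up.

    frame_size, local_offset and stale_r7 are hypothetical values.
    """
    sp = INITIAL_SP
    r7 = stale_r7            # whatever the frame pointer held before main ran
    if not skip_prologue:
        sp -= frame_size     # the prologue reserves room for the locals
        r7 = sp              # and points the frame pointer at that room
    return r7 + local_offset # locals are addressed relative to the frame pointer

print(hex(address_of_local(skip_prologue=False)))  # lands inside RAM
print(hex(address_of_local(skip_prologue=True)))   # lands near address 0
```

With the prologue executed the object sits just below the initial stack pointer; with it skipped, the address is computed from a stale register and falls into the vector table region.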

What is _IO_stdin_used

Can someone explain to me what _IO_stdin_used is in the following line:
114a: 48 8d 3d b3 0e 00 00 lea rdi,[rip+0xeb3] # 2004 <_IO_stdin_used+0x4>
Sorry for the noob question.
Expanding on the comment by Peter Cordes (https://stackoverflow.com/users/224132/peter-cordes), some tools can help make sense of this. Starting with burt.c:
#include <stdio.h>
void main (void) {
    puts("hello!");
}
built like so: LDFLAGS=-Wl,-Map=burt.map make burt
We will look at the map file in a moment; first, inspect the executable:
$ objdump -xsd -M intel burt
...
0000000000002000 g O .rodata 0000000000000004 _IO_stdin_used
...
Contents of section .rodata:
2000 01000200 68656c6c 6f2100 ....hello!.
...
1148: 48 8d 05 b5 0e 00 00 lea rax,[rip+0xeb5] # 2004 <_IO_stdin_used+0x4>
114f: 48 89 c7 mov rdi,rax
1152: e8 d9 fe ff ff call 1030 <puts@plt>
...
Note, the string's location (2004h) does not have an entry in the symbol table (where the 2000h entry _IO_stdin_used occurs). So, objdump's disassembly comments mention it relative to another symbol it knows about.
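The way such a comment is produced can be sketched in a few lines of Python. This is an illustration of the idea (pick the closest symbol at or below the address and print the distance), not objdump's actual implementation; the function name annotate is made up.

```python
import bisect

def annotate(address, symbols):
    """Describe address as 'name' or 'name+0xoffset' using the closest
    symbol at or below it. symbols maps start address -> name."""
    starts = sorted(symbols)
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return hex(address)      # no symbol at or below this address
    base = starts[i]
    name = symbols[base]
    if base == address:
        return name
    return "%s+0x%x" % (name, address - base)

symtab = {0x2000: "_IO_stdin_used"}
print(annotate(0x2004, symtab))  # _IO_stdin_used+0x4
```

Adding a symbol at 0x2004 to the table makes the function return that name instead, which mirrors what happens once the string itself is given a name.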
But why does it know about _IO_stdin_used? Or rather, why is that symbol included, and why does it have a name? burt.map shows it came from Scrt1.o:
...
.rodata 0x0000000000002000 0xb
*(.rodata .rodata.* .gnu.linkonce.r.*)
.rodata.cst4 0x0000000000002000 0x4 /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.1/../../../../lib/Scrt1.o
.rodata 0x0000000000002004 0x7 /tmp/ccYG1XvK.o
...
On my computer, Scrt1.o belongs to glibc, and the symbol can be traced back here: https://sourceware.org/git/?p=glibc.git;a=blob;f=csu/init.c;h=c2f978f3da565590bcab355fefa3d81cf211cb36;hb=63fb8f9aa9d19f85599afe4b849b567aefd70a36.
To show that this is displayed in the objdump disassembly because there is no name for the address, we can give our string a name to add it to the symbol table, so that objdump will show it instead. To do this let's modify burt.c like so:
#include <stdio.h>
const char greeting[] = "hello!";
void main (void) {
    puts(greeting);
}
Build, then inspect:
...
0000000000002004 g O .rodata 0000000000000007 greeting
...
0000000000002000 g O .rodata 0000000000000004 _IO_stdin_used
...
Contents of section .rodata:
2000 01000200 68656c6c 6f2100 ....hello!.
...
113d: 48 8d 05 c0 0e 00 00 lea rax,[rip+0xec0] # 2004 <greeting>
1144: 48 89 c7 mov rdi,rax
1147: e8 e4 fe ff ff call 1030 <puts@plt>
The offset relative to rip has changed a little, because the .text and .rodata sections are now slightly further apart. The .rodata contents remain the same, because the source modification was chosen to avoid modifying .rodata: if greeting were not const, it would be located in .data instead. The map file below now shows the contribution of burt.c's temporary object file to .rodata defining the greeting symbol, whereas in the previous map file the same data had no name:
.rodata 0x0000000000002000 0xb
*(.rodata .rodata.* .gnu.linkonce.r.*)
.rodata.cst4 0x0000000000002000 0x4 /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.1/../../../../lib/Scrt1.o
0x0000000000002000 _IO_stdin_used
.rodata 0x0000000000002004 0x7 /tmp/ccDnWXjG.o
0x0000000000002004 greeting
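The rip-relative displacements in the two disassembly listings above can be checked by hand: the target is the address of the next instruction plus the displacement. A small Python sketch of that arithmetic follows; the function name is made up, and the numbers are copied from the listings.

```python
def rip_relative_target(insn_address, insn_length, displacement):
    """Address referenced by a rip-relative operand. When the displacement
    is applied, rip already points at the next instruction."""
    return insn_address + insn_length + displacement

# (address, length in bytes, displacement) of the two lea instructions
first = rip_relative_target(0x1148, 7, 0xeb5)
second = rip_relative_target(0x113d, 7, 0xec0)
print(hex(first), hex(second))  # 0x2004 0x2004
```

Both instructions resolve to the same address, so only the distance between code and data changed, not the location of the string.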

How to find EH_FRAME

Suppose I wanted a program written in C++ to read its own eh_frame, in order to get the information required for stack unwinding and exception handling. How can it find out where eh_frame begins?
Suppose I wanted a program written in C++ to read its own eh_frame
The .eh_frame_hdr is linked into its own GNU_EH_FRAME program header, precisely so it's easy for runtime libraries to locate .eh_frame at runtime.
Here is a typical layout:
readelf -Wl /bin/date
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x3cd0
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x002790 0x002790 R 0x1000
LOAD 0x003000 0x0000000000003000 0x0000000000003000 0x010139 0x010139 R E 0x1000
LOAD 0x014000 0x0000000000014000 0x0000000000014000 0x005ad8 0x005ad8 R 0x1000
LOAD 0x01a250 0x000000000001b250 0x000000000001b250 0x001090 0x001248 RW 0x1000
DYNAMIC 0x01adf8 0x000000000001bdf8 0x000000000001bdf8 0x0001e0 0x0001e0 RW 0x8
NOTE 0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x017fe0 0x0000000000017fe0 0x0000000000017fe0 0x00040c 0x00040c R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x01a250 0x000000000001b250 0x000000000001b250 0x000db0 0x000db0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .data.rel.ro .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.build-id .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .data.rel.ro .dynamic .got
So start with GNU_EH_FRAME segment, and follow the links.
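As a rough illustration of "following the links": .eh_frame_hdr begins with a version byte and three encoding bytes, followed by an encoded pointer to .eh_frame. The Python sketch below decodes that pointer for a single encoding only (pc-relative, signed 4 bytes, value 0x1b), which is an assumption about what the toolchain emitted. The sample bytes are made up for the example and are not taken from /bin/date. In the running program itself, the segment could be located with dl_iterate_phdr before applying the same decoding.

```python
import struct

DW_EH_PE_PCREL_SDATA4 = 0x1b   # DW_EH_PE_pcrel (0x10) | DW_EH_PE_sdata4 (0x0b)

def eh_frame_address(hdr_bytes, hdr_vaddr):
    """Decode eh_frame_ptr from the start of .eh_frame_hdr (little-endian).

    hdr_vaddr is the virtual address of the GNU_EH_FRAME segment.
    """
    version, ptr_enc, _count_enc, _table_enc = struct.unpack_from("<4B", hdr_bytes, 0)
    if version != 1:
        raise ValueError("unexpected .eh_frame_hdr version %d" % version)
    if ptr_enc != DW_EH_PE_PCREL_SDATA4:
        raise NotImplementedError("encoding 0x%x not handled in this sketch" % ptr_enc)
    (rel,) = struct.unpack_from("<i", hdr_bytes, 4)
    # pc-relative: the value is relative to the address of the field itself
    return hdr_vaddr + 4 + rel

# Hypothetical header: version 1, three encoding bytes, then the pointer
sample = bytes([1, 0x1b, 0x03, 0x3b]) + struct.pack("<i", 0x40c)
print(hex(eh_frame_address(sample, 0x17fe0)))
```

A fuller reader would handle the other DW_EH_PE encodings and then walk the CIE and FDE records starting at the returned address.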
You could use (like RefPerSys and GCC do) Ian Lance Taylor's libbacktrace for that purpose.
(but you'd better compile your C++ code with DWARF debug information, so g++ -Wall -g -O ...)

Hook a statically linked ELF binary

I have an application, an ELF binary with OpenSSL statically linked, and I want to hook some of its OpenSSL functions to get the pre-master key, which would allow me to decrypt the connections using Wireshark.
I know how to hook a shared library with LD_PRELOAD or LD_LIBRARY_PATH, but this binary is statically linked.
Fortunately, the debug symbols were not stripped from the binary, so all the named functions I want to hook can be identified.
What do I have to do to hook this statically linked ELF?
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x80ceae0
Start of program headers: 52 (bytes into file)
Start of section headers: 3285112 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 8
Size of section headers: 40 (bytes)
Number of section headers: 28
Section header string table index: 27
Program Headers:
Elf file type is EXEC (Executable file)
Entry point 0x80ceae0
There are 8 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x08048134 0x08048134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x309507 0x309507 R E 0x1000
LOAD 0x309520 0x08352520 0x08352520 0x13168 0x29934 RW 0x1000
DYNAMIC 0x31c0fc 0x083650fc 0x083650fc 0x00100 0x00100 RW 0x4
NOTE 0x000148 0x08048148 0x08048148 0x00020 0x00020 R 0x4
GNU_EH_FRAME 0x2ccc30 0x08314c30 0x08314c30 0x0a06c 0x0a06c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table
03 .data .dynamic .ctors .dtors .jcr .got .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
Symbol Table:
...
8627: 081ddbb0 408 FUNC GLOBAL DEFAULT 12 SSL_free
8629: 081de360 190 FUNC GLOBAL DEFAULT 12 SSL_copy_session_id
8665: 081deba0 148 FUNC GLOBAL DEFAULT 12 SSL_get_shared_ciphers
8848: 081df2f0 17 FUNC GLOBAL DEFAULT 12 SSL_CTX_set_default_passw
8927: 081e03a0 42 FUNC GLOBAL DEFAULT 12 SSL_CTX_set_cert_store
8996: 081de2d0 94 FUNC GLOBAL DEFAULT 12 SSL_get_peer_certificate
9079: 081e0250 14 FUNC GLOBAL DEFAULT 12 SSL_get_verify_result
9130: 081e52e0 269 FUNC GLOBAL DEFAULT 12 SSL_CTX_use_RSAPrivateKey
9193: 081e0f70 20 FUNC GLOBAL DEFAULT 12 SSL_SESSION_get_ex_data
9266: 081e0230 17 FUNC GLOBAL DEFAULT 12 SSL_set_verify_result
9305: 081df350 17 FUNC GLOBAL DEFAULT 12 SSL_CTX_set_verify_depth
9394: 081de230 14 FUNC GLOBAL DEFAULT 12 SSL_CTX_get_verify_depth
9409: 081e1840 36 FUNC GLOBAL DEFAULT 12 SSL_CTX_remove_session
9590: 081e3390 63 FUNC GLOBAL DEFAULT 12 SSL_rstate_string
9655: 081df8c0 122 FUNC GLOBAL DEFAULT 12 SSL_set_ssl_method
9662: 081e0360 20 FUNC GLOBAL DEFAULT 12 SSL_CTX_get_ex_data
9691: 081de330 38 FUNC GLOBAL DEFAULT 12 SSL_get_peer_cert_chain
9696: 081e0d20 20 FUNC GLOBAL DEFAULT 12 SSL_CTX_set_client_CA_lis
9798: 081e0d50 68 FUNC GLOBAL DEFAULT 12 SSL_get_client_CA_list
9810: 081de6f0 138 FUNC GLOBAL DEFAULT 12 SSL_write
...
You'll have to use GDB with a breakpoint command (perhaps involving Python scripting), or Systemtap. There is no direct way to interpose functions which are not listed in the .dynsym section (which is of course missing due to static linking).
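The breakpoint-command approach can be scripted. The Python sketch below only generates a GDB command file that attaches commands to a breakpoint on each chosen function. The helper name make_gdb_script is made up, SSL_write is taken from the symbol table above, and what gets printed at each stop (here a single backtrace frame) is a placeholder. Extracting the actual key material would require knowledge of the OpenSSL structures in that binary, which this sketch does not attempt.

```python
def make_gdb_script(functions):
    """Build GDB commands that log each hit of the given functions and
    let the program keep running."""
    lines = []
    for name in functions:
        lines += [
            "break %s" % name,
            "commands",      # commands to run every time the breakpoint is hit
            "silent",        # suppress the usual breakpoint banner
            "backtrace 1",   # placeholder: print the innermost frame
            "continue",      # resume the program automatically
            "end",
        ]
    return "\n".join(lines) + "\n"

print(make_gdb_script(["SSL_write"]))
```

The generated text can be saved to a file and passed to gdb with -x when starting or attaching to the application.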

gdb add-symbol-file all sections and load address

I'm debugging a boot loader (syslinux) with gdb and the gdb-stub of qemu. At some point the main file loads a shared object, ldlinux.elf.
I would like to add the symbols in gdb for that file. The command add-symbol-file seems like the way to go. However, since it is a relocatable file, I have to specify the memory address it has been loaded at. And here comes the problem.
Although I know the base address at which the LOAD segment has been loaded, add-symbol-file works section-wise and wants me to specify the address at which each section has been loaded.
Can I tell gdb to load all the symbols of all the sections provided that I specify the base address of the file in memory?
Does the behavior of gdb make sense? The section headers aren't used for running an ELF and are even optional. I can't see a use case where specifying the load address of the sections would be useful.
Example
Here are the program headers and section headers of the shared object.
Elf file type is DYN (Shared object file)
Entry point 0x4c60
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x1db10 0x20bfc RWE 0x1000
DYNAMIC 0x01d618 0x0001d618 0x0001d618 0x00098 0x00098 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
Section to Segment mapping:
Segment Sections...
00 .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text .rodata .ctors .dtors .data.rel.ro .dynamic .got .got.plt .data .bss
01 .dynamic
02
There are 29 section headers, starting at offset 0x78618:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .gnu.hash GNU_HASH 00000094 000094 0007e0 04 A 2 0 4
[ 2] .dynsym DYNSYM 00000874 000874 0015c0 10 A 3 1 4
[ 3] .dynstr STRTAB 00001e34 001e34 0010f4 00 A 0 0 1
[ 4] .rel.dyn REL 00002f28 002f28 000ce8 08 A 2 0 4
[ 5] .rel.plt REL 00003c10 003c10 000568 08 AI 2 6 4
[ 6] .plt PROGBITS 00004180 004180 000ae0 04 AX 0 0 16
[ 7] .text PROGBITS 00004c60 004c60 013816 00 AX 0 0 4
[ 8] .rodata PROGBITS 00018480 018480 00462f 00 A 0 0 32
[ 9] .ctors INIT_ARRAY 0001cab0 01cab0 000010 00 WA 0 0 4
[10] .dtors FINI_ARRAY 0001cac0 01cac0 000004 00 WA 0 0 4
[11] .data.rel.ro PROGBITS 0001cae0 01cae0 000b38 00 WA 0 0 32
[12] .dynamic DYNAMIC 0001d618 01d618 000098 08 WA 3 0 4
[13] .got PROGBITS 0001d6b0 01d6b0 0000d0 04 WA 0 0 4
[14] .got.plt PROGBITS 0001d780 01d780 0002c0 04 WA 0 0 4
[15] .data PROGBITS 0001da40 01da40 0000d0 00 WA 0 0 32
[16] .bss NOBITS 0001db20 01db10 0030dc 00 WA 0 0 32
[17] .comment PROGBITS 00000000 01db10 000026 01 MS 0 0 1
[18] .debug_aranges PROGBITS 00000000 01db38 0010c0 00 0 0 8
[19] .debug_info PROGBITS 00000000 01ebf8 021ada 00 0 0 1
[20] .debug_abbrev PROGBITS 00000000 0406d2 009647 00 0 0 1
[21] .debug_line PROGBITS 00000000 049d19 00bd3a 00 0 0 1
[22] .debug_frame PROGBITS 00000000 055a54 004574 00 0 0 4
[23] .debug_str PROGBITS 00000000 059fc8 00538c 01 MS 0 0 1
[24] .debug_loc PROGBITS 00000000 05f354 01312d 00 0 0 1
[25] .debug_ranges PROGBITS 00000000 072481 0005d0 00 0 0 1
[26] .shstrtab STRTAB 00000000 072a51 000101 00 0 0 1
[27] .symtab SYMTAB 00000000 072b54 003530 10 28 504 4
[28] .strtab STRTAB 00000000 076084 002593 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
If I try to load the file at the address 0x7fab000 then it will relocate the symbols so that the .text section starts at 0x7fab000.
(gdb) add-symbol-file bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
add symbol table from file "bios/com32/elflink/ldlinux/ldlinux.elf" at
.text_addr = 0x7fab000
(y or n) y
Reading symbols from bios/com32/elflink/ldlinux/ldlinux.elf...done.
And then all the symbols are off by 0x4c60 bytes.
So, finally, I made my own command with Python and the readelf tool. It's not very clean, since it runs readelf in a subprocess and parses its output instead of parsing the ELF file directly, but it works (for 32-bit ELF files only).
It uses the section headers to generate and run an add-symbol-file command with all the sections correctly relocated. The usage is pretty simple: you give it the ELF file and the base address of the file. And since remove-symbol-file wasn't working properly when given just the filename, I made a remove-symbol-file-all command that generates and runs the right remove-symbol-file -a address command.
(gdb) add-symbol-file-all bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
add symbol table from file "bios/com32/elflink/ldlinux/ldlinux.elf" at
.text_addr = 0x7fafc50
.gnu.hash_addr = 0x7fab094
.dynsym_addr = 0x7fab874
.dynstr_addr = 0x7face34
.rel.dyn_addr = 0x7fadf28
.rel.plt_addr = 0x7faec08
.plt_addr = 0x7faf170
.rodata_addr = 0x7fc34e0
.ctors_addr = 0x7fc7af0
.dtors_addr = 0x7fc7b00
.data.rel.ro_addr = 0x7fc7b20
.dynamic_addr = 0x7fc8658
.got_addr = 0x7fc86f0
.got.plt_addr = 0x7fc87bc
.data_addr = 0x7fc8a80
.bss_addr = 0x7fc8b60
(gdb) remove-symbol-file-all bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
Here is the code to be added in the .gdbinit file.
python
import subprocess
import re

def relocatesections(filename, addr):
    # Parse the output of `readelf -S` into a list of section dicts.
    # addr is accepted for symmetry with the callers but not used here.
    p = subprocess.Popen(["readelf", "-S", filename], stdout = subprocess.PIPE)
    sections = []
    textaddr = '0'
    for line in p.stdout.readlines():
        line = line.decode("utf-8").strip()
        # Keep only the section rows, which start with "[ n]"
        if not line.startswith('[') or line.startswith('[Nr]'):
            continue
        line = re.sub(r' +', ' ', line)
        line = re.sub(r'\[ *(\d+)\]', r'\g<1>', line)
        fieldsvalue = line.split(' ')
        fieldsname = ['number', 'name', 'type', 'addr', 'offset', 'size', 'entsize', 'flags', 'link', 'info', 'addralign']
        sec = dict(zip(fieldsname, fieldsvalue))
        if sec['number'] == '0':
            continue
        sections.append(sec)
        if sec['name'] == '.text':
            textaddr = sec['addr']
    return (textaddr, sections)

class AddSymbolFileAll(gdb.Command):
    """The right version for add-symbol-file"""

    def __init__(self):
        super(AddSymbolFileAll, self).__init__("add-symbol-file-all", gdb.COMMAND_USER)
        self.dont_repeat()

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        filename = argv[0]
        if len(argv) > 1:
            offset = int(str(gdb.parse_and_eval(argv[1])), 0)
        else:
            offset = 0
        (textaddr, sections) = relocatesections(filename, offset)
        # .text goes first as the positional address, other sections via -s
        cmd = "add-symbol-file %s 0x%08x" % (filename, int(textaddr, 16) + offset)
        for s in sections:
            addr = int(s['addr'], 16)
            # Skip .text (already given) and sections that are not loaded
            if s['name'] == '.text' or addr == 0:
                continue
            cmd += " -s %s 0x%08x" % (s['name'], addr + offset)
        gdb.execute(cmd)

class RemoveSymbolFileAll(gdb.Command):
    """The right version for remove-symbol-file"""

    def __init__(self):
        super(RemoveSymbolFileAll, self).__init__("remove-symbol-file-all", gdb.COMMAND_USER)
        self.dont_repeat()

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        filename = argv[0]
        if len(argv) > 1:
            offset = int(str(gdb.parse_and_eval(argv[1])), 0)
        else:
            offset = 0
        (textaddr, _) = relocatesections(filename, offset)
        cmd = "remove-symbol-file -a 0x%08x" % (int(textaddr, 16) + offset)
        gdb.execute(cmd)

# Register both commands with gdb
AddSymbolFileAll()
RemoveSymbolFileAll()
end
Can I tell gdb to load all the symbols of all the sections provided that I specify the base address of the file in memory?
Yes, but you need to provide the address of the .text section, i.e. 0x7fab000+0x00004c60 here. I agree: it's quite annoying to have to fish out the address of .text, and I have wanted to fix it many times, so that e.g.
(gdb) add-symbol-file foo.so @0x7abc0000
just works. Feel free to file a feature request in GDB bugzilla.
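Until something like that exists, the address arithmetic is easy to script. A minimal Python sketch, assuming the load base and the .text address from the section headers are already known (the function name is made up for illustration):

```python
def add_symbol_file_command(filename, base, text_section_addr):
    """Build an add-symbol-file command from the load base and the .text
    address listed in the section headers."""
    return "add-symbol-file %s 0x%x" % (filename, base + text_section_addr)

print(add_symbol_file_command(
    "bios/com32/elflink/ldlinux/ldlinux.elf", 0x7fab000, 0x4c60))
# add-symbol-file bios/com32/elflink/ldlinux/ldlinux.elf 0x7fafc60
```

This covers only the .text address; the other sections would still need -s arguments if their symbols matter.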
Does the behavior of gdb make sense?
I am guessing that this is rooted in how GDB was used to debug embedded ROMs, where each section can be at arbitrary memory address.