This is weird.
So I'm trying to make a little kernel and I decided to use C++ for this. I did everything and I now have an (almost) working VGA Text Mode Driver. Why almost? Because whenever I pass the write method a const char* the multiboot header literally disappears.
And after a bit of fiddling I realized that ANY const char* use makes it go bonkers. Even just a variable.
The weird thing is that if I never create a const char* it just works. I can print individual characters too.
Note: I based this on the Bare Bones tutorial on OSDev.
Here's the relevant code:
# Main.asm
MBALIGN  equ 1 << 0
MEMINFO  equ 1 << 1
FLAGS    equ MBALIGN | MEMINFO
MAGIC    equ 0x1BADB002
CHECKSUM equ -(MAGIC + FLAGS)

section .multiboot
align 4
    dd MAGIC
    dd FLAGS
    dd CHECKSUM

section .bss
align 16
stack_bottom:
    resb 16384
stack_top:

section .text
global _start:function (_start.end - _start)
_start:
    mov esp, stack_top
    extern kernel_main
    call kernel_main
    cli
.hang:  hlt
    jmp .hang
.end:
// Main.cpp
void init() {
    Drivers::VGA vga;
    vga.putc('h');
    vga.write("hello", 5);
}

extern "C" void kernel_main() {
    init();
}
// Part of VGA.cpp
void VGA::write(const char* data, size_t size) {
    for (size_t i = 0; i < size; i += 1) {
        s_buffer[i] = vga_entry(data[i], _color);
    }
}
[...]
u16 VGA::vga_entry(unsigned char c, u8 color) {
    return (u16)c | (u16)color << 8;
}
# Linker.ld
ENTRY(_start)

SECTIONS {
    . = 1M;

    .text : ALIGN(4K) {
        KEEP(*(.multiboot))
        *(.text)
    }

    .rodata : ALIGN(4K) {
        *(.rodata)
    }

    .data : ALIGN(4K) {
        *(.data)
    }

    .bss : ALIGN(4K) {
        *(COMMON)
        *(.bss)
    }
}
Compiler Options: -target i686-pc-elf -c -IKernel -ffreestanding -nostdlib++ -fno-exceptions -fno-rtti -fno-stack-protector -m32 -fno-use-cxa-atexit
Toolchain: Clang, Nasm and ld.lld
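The problem turned out to be how ld.lld lays out the output (a bug, or at least a difference from GNU ld). The linker script only lists *(.rodata), so the .rodata.str1.1 section that holds the string literal is not matched by any rule and each linker places it where it sees fit. ld.lld put it before .text, which moved .text, and with it the multiboot header, to file offset 0x2000. That is past the first 8 KiB of the file, where a Multiboot loader looks for the header, so the header wasn't visible.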
Here's the output of readelf -S Kernel with ld.lld
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .rodata.str1.1 PROGBITS 00100000 001000 000001 01 AMS 0 0 1
[ 2] .text PROGBITS 00101000 002000 0002fe 00 AX 0 0 4096
[ 3] .data PROGBITS 00102000 003000 000004 00 WA 0 0 4096
[ 4] .bss NOBITS 00103000 003004 004000 00 WA 0 0 4096
[ 5] .comment PROGBITS 00000000 003004 000029 01 MS 0 0 1
[ 6] .symtab SYMTAB 00000000 003030 0001a0 10 8 14 4
[ 7] .shstrtab STRTAB 00000000 0031d0 000044 00 0 0 1
[ 8] .strtab STRTAB 00000000 003214 00018f 00 0 0 1
And here's the output with the system's ld (GNU)
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00100000 001000 0002fe 00 AX 0 0 4096
[ 2] .rodata.str1.1 PROGBITS 001002fe 0012fe 000001 01 AMS 0 0 1
[ 3] .data PROGBITS 00101000 002000 000004 00 WA 0 0 4096
[ 4] .bss NOBITS 00102000 002004 004000 00 WA 0 0 4096
[ 5] .comment PROGBITS 00000000 002004 000015 01 MS 0 0 1
[ 6] .symtab SYMTAB 00000000 00201c 0001f0 10 7 19 4
[ 7] .strtab STRTAB 00000000 00220c 00018f 00 0 0 1
[ 8] .shstrtab STRTAB 00000000 00239b 000044 00 0 0 1
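The two readelf listings above differ in the file offset of .text (0x1000 with GNU ld, 0x2000 with ld.lld). A minimal Python sketch of the check a Multiboot 1 loader performs, assuming the header has to sit 4-byte aligned and entirely inside the first 8192 bytes of the file. The names find_multiboot_header and make_image are hypothetical helpers for illustration, and the fake image contains only zero padding plus the three header fields from Main.asm:

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002
SEARCH_LIMIT = 8192  # Multiboot 1: the header must lie entirely within the first 8 KiB


def find_multiboot_header(image: bytes):
    """Return the file offset of a valid Multiboot 1 header, or None."""
    # The header is 4-byte aligned and its three required fields take 12 bytes.
    last_start = min(len(image), SEARCH_LIMIT) - 12
    for offset in range(0, last_start + 1, 4):
        magic, flags, checksum = struct.unpack_from("<III", image, offset)
        # Same rule as CHECKSUM in Main.asm: the three fields sum to 0 mod 2**32.
        if magic == MULTIBOOT_MAGIC and (magic + flags + checksum) & 0xFFFFFFFF == 0:
            return offset
    return None


def make_image(header_offset: int) -> bytes:
    """Build a fake image (hypothetical helper) with a header at header_offset."""
    flags = (1 << 0) | (1 << 1)  # MBALIGN | MEMINFO, as in Main.asm
    checksum = -(MULTIBOOT_MAGIC + flags) & 0xFFFFFFFF
    header = struct.pack("<III", MULTIBOOT_MAGIC, flags, checksum)
    return bytes(header_offset) + header + bytes(64)


if __name__ == "__main__":
    print(find_multiboot_header(make_image(0x1000)))  # .text offset with GNU ld
    print(find_multiboot_header(make_image(0x2000)))  # .text offset with ld.lld
```

With the header at 0x1000 the search finds it; at 0x2000 it starts exactly at byte 8192, so the search returns None. Running the same function on the real kernel file would be a quick way to confirm a linker-script change.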
Having the entire standard library or any other library already compiled and ready to be linked to form an executable is a nice feature and it makes the compilation faster, but as far as I know, the entire library is linked even if only a few functions in it are used.
So for instance on my machine, the following code is 1.6 kB when compiled to object code but it becomes almost 17 kB when I link it to the standard library.
#include <stdio.h>

int main(void)
{
    printf("Hello world\n");
}
Is there any other way to recompile only the parts of the standard library (or any other library) that are necessary in order to make the program more space-efficient?
Sorry if the question is already asked, I googled it but couldn't find any answers to it.
If we dump the contents of the generated ELF executable we will see no parts of the C runtime library embedded in the binary, because the GLIBC library is linked in dynamically (e.g. from libc.so.6).
$ gcc -Os -s a.c
$ ls -la a.out
-rwxrwxrwx 1 user user 14408 Feb 18 12:56 a.out
$ ldd a.out
linux-vdso.so.1 (0x00007ffd157f8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007feb0a33f000)
/lib64/ld-linux-x86-64.so.2 (0x00007feb0a518000)
$ objdump -T a.out
a.out: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 puts
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
We notice also that GCC optimized a printf with no placeholders into a puts (for the file size it doesn't matter).
To see "inside" the ELF we can dump its sections:
$ readelf -SW a.out
There are 28 section headers, starting at offset 0x3148:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 00000000000002a8 0002a8 00001c 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 00000000000002c4 0002c4 000020 00 A 0 0 4
[ 3] .note.gnu.build-id NOTE 00000000000002e4 0002e4 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000000308 000308 000024 00 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000000330 000330 0000a8 18 A 6 1 8
[ 6] .dynstr STRTAB 00000000000003d8 0003d8 000082 00 A 0 0 1
[ 7] .gnu.version VERSYM 000000000000045a 00045a 00000e 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000000468 000468 000020 00 A 6 1 8
[ 9] .rela.dyn RELA 0000000000000488 000488 0000c0 18 A 5 0 8
[10] .rela.plt RELA 0000000000000548 000548 000018 18 AI 5 23 8
[11] .init PROGBITS 0000000000001000 001000 000017 00 AX 0 0 4
[12] .plt PROGBITS 0000000000001020 001020 000020 10 AX 0 0 16
[13] .plt.got PROGBITS 0000000000001040 001040 000008 08 AX 0 0 8
[14] .text PROGBITS 0000000000001050 001050 000171 00 AX 0 0 16
[15] .fini PROGBITS 00000000000011c4 0011c4 000009 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000002000 002000 000010 00 A 0 0 4
[17] .eh_frame_hdr PROGBITS 0000000000002010 002010 00003c 00 A 0 0 4
[18] .eh_frame PROGBITS 0000000000002050 002050 000100 00 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000003de8 002de8 000008 08 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000003df0 002df0 000008 08 WA 0 0 8
[21] .dynamic DYNAMIC 0000000000003df8 002df8 0001e0 10 WA 6 0 8
[22] .got PROGBITS 0000000000003fd8 002fd8 000028 08 WA 0 0 8
[23] .got.plt PROGBITS 0000000000004000 003000 000020 08 WA 0 0 8
[24] .data PROGBITS 0000000000004020 003020 000010 00 WA 0 0 8
[25] .bss NOBITS 0000000000004030 003030 000008 00 WA 0 0 1
[26] .comment PROGBITS 0000000000000000 003030 00001c 01 MS 0 0 1
[27] .shstrtab STRTAB 0000000000000000 00304c 0000f7 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
So ~14 kB is pretty much the minimal ELF executable size these days due to PIC, relro, eh_frame and other sections generated by the GNU linker.
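The 14408-byte figure from ls can be cross-checked against the readelf header line above ("28 section headers, starting at offset 0x3148"). This is a small sketch that assumes the section header table is the last thing in the file, which is the usual layout but not something the ELF format guarantees. The function name is made up for illustration:

```python
ELF64_SHDR_SIZE = 64  # sizeof(Elf64_Shdr), fixed by the ELF64 format


def size_from_section_table(shoff: int, shnum: int,
                            shentsize: int = ELF64_SHDR_SIZE) -> int:
    """File size, assuming the section header table ends the file."""
    return shoff + shnum * shentsize


if __name__ == "__main__":
    # "There are 28 section headers, starting at offset 0x3148"
    print(size_from_section_table(0x3148, 28))
```

The result is 14408, the size ls reports. The section table itself accounts for only 1792 bytes; the rest comes from the sections being spread over several 4 KiB-aligned file offsets (0x1000, 0x2000, 0x2de8, 0x3000), not from any libc code.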
You can reduce the size somewhat by turning off relro (which reduces the security a little).
$ gcc -Os -s -z norelro -Wl,--gc-sections a.c
$ ls -la a.out
-rwxrwxrwx 1 user user 10960 Feb 18 13:06 a.out
When I run objdump on an ELF file emitted by GHS Multi, the physical address field of every section is set to 0. The section file offsets are available, but they do not correspond to the memory region addresses from the linker directive file, so the offset alone is not sufficient; it appears to be a file offset only.
Question: How does one compute the physical load address for each section?
My idea is to read through the ELF file and determine the load address for each of the 90 sections and then check to see what memory range the address falls within. In this manner, I can tally memory usage for each memory region; internal sram, external sram, flash, dataflash, etc.
Is the idea flawed? Is there something else I am missing?
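For illustration, the tallying step of that idea could look like the sketch below. It assumes the load address of each section is already known from some other source; the region boundaries and the section addresses are invented for the example (only the section sizes are taken from the readelf output further down), and a section that straddles two regions is not handled:

```python
from collections import defaultdict

# Hypothetical memory map: (name, start, end_exclusive). Real values would come
# from the linker directive file, which is not shown in the question.
REGIONS = [
    ("flash",         0x00000000, 0x00400000),
    ("external_sram", 0x20000000, 0x20100000),
    ("internal_sram", 0xFEBE0000, 0xFEC00000),
]


def region_of(addr):
    """Name of the region containing addr, or None if it is outside all of them."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return None


def tally(sections):
    """sections: iterable of (name, load_address, size). Returns bytes per region."""
    used = defaultdict(int)
    for _name, addr, size in sections:
        if size == 0:
            continue
        used[region_of(addr) or "unmapped"] += size
    return dict(used)


if __name__ == "__main__":
    # Addresses are placeholders; sizes are from the .text, .rodata and .bss rows.
    example = [
        (".text",   0x00010000, 0x0C596A),
        (".rodata", 0x000D6000, 0x135928),
        (".bss",    0xFEBE0000, 0x0033B0),
    ]
    for region, size in tally(example).items():
        print(region, hex(size))
```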
No virtual addressing is used. It is a simple executable intended for a 32-bit micro-controller. We are running an RTOS on a Renesas RH850. There is no MMU or MPU active. The program entry point is hooked directly to the processor reset vector. The one exception: in production we use a small boot loader, but it jumps directly to the start address of the image.
I wish I could share the map file or ELF, but unfortunately, it is not mine to share. I can, however, share the output of readelf.
There are 92 section headers, starting at offset 0xa2d6b0:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] /DISCARD/ PROGBITS 00000000 000000 000000 00 0 0 0
[ 2] .text PROGBITS 00000000 000040 0c596a 00 AX 0 0 2
[ 3] .data PROGBITS 00000000 0c59b0 0038b8 00 WA 0 0 8
[ 4] .bss NOBITS 00000000 000000 0033b0 00 WA 0 0 8
[ 5] .entry_section PROGBITS 00000000 0c9268 0000e8 00 AX 0 0 2
[ 6] .vectorTableCOde PROGBITS 00000000 0c9350 000004 00 AX 0 0 2
[ 7] .rela.text RELA 00000000 0c9354 0a4b14 0c 89 2 4
[ 8] .rela.entry_secti RELA 00000000 16de68 000108 0c 89 5 4
[ 9] .rela.vectorTable RELA 00000000 16df70 00000c 0c 89 6 4
[10] .rodata PROGBITS 00000000 16df7c 135928 00 A 0 0 4
[11] .note.renesas NOTE 00000000 2a38a4 000050 00 0 0 4
[12] .bss_sram NOBITS 00000000 000000 07acd0 00 WA 0 0 8
[13] .rbss NOBITS 00000000 000000 00b5dc 00 WA 0 0 8
[14] .rela.data RELA 00000000 2a38f4 002124 0c 89 3 4
[15] .rela.rodata RELA 00000000 2a5a18 00d41c 0c 89 10 4
[16] .rela.ghsinfo RELA 00000000 2b2e34 000774 0c 89 88 4
[17] .slbss NOBITS 00000000 000000 003534 00 WA 0 0 8
[18] .applicationInfo PROGBITS 00000000 2b35a8 000040 00 A 0 0 4
[19] .applicationEnd PROGBITS 00000000 2b35e8 000014 00 A 0 0 4
[20] .R_FDL_Text PROGBITS 00000000 2b35fc 0016ea 00 AX 0 0 2
[21] .rela.R_FDL_Text RELA 00000000 2b4ce8 001278 0c 89 20 4
[22] .R_FDL_Const PROGBITS 00000000 2b5f60 000014 00 A 0 0 4
[23] .R_FDL_Data NOBITS 00000000 000000 000064 00 WA 0 0 4
[24] .OS_CONST PROGBITS 00000000 2b5f80 0002b4 00 A 0 0 16
[25] .rela.OS_CONST RELA 00000000 2b6234 000624 0c 89 24 4
[26] .OS_CODE PROGBITS 00000000 2b6858 0114c0 00 AX 0 0 4
[27] .rela.OS_CODE RELA 00000000 2c7d18 0094b0 0c 89 26 4
[28] .OS_CORE0_CONST PROGBITS 00000000 2d11c8 002af0 00 A 0 0 8
[29] .OS_CORE0_VAR_NOI NOBITS 00000000 000000 001604 00 WA 0 0 8
[30] .rela.OS_CORE0_CO RELA 00000000 2d3cb8 0040bc 0c 89 28 4
[31] .OS_STACK_BLETASK NOBITS 00000000 000000 001000 00 WA 0 0 4
[32] .OS_STACK_BSWTASK NOBITS 00000000 000000 001000 00 WA 0 0 4
[33] .OS_STACK_CANCM_T NOBITS 00000000 000000 000400 00 WA 0 0 4
[34] .OS_STACK_CANCOM_ NOBITS 00000000 000000 000800 00 WA 0 0 4
[35] .OS_STACK_CLITASK NOBITS 00000000 000000 000400 00 WA 0 0 4
[36] .OS_STACK_CPUMONI NOBITS 00000000 000000 000400 00 WA 0 0 4
[37] .OS_STACK_CHIMEAP NOBITS 00000000 000000 001388 00 WA 0 0 4
[38] .OS_STACK_CHIMEDR NOBITS 00000000 000000 000fa0 00 WA 0 0 4
[39] .OS_STACK_DIRANAB NOBITS 00000000 000000 000400 00 WA 0 0 4
[40] .OS_STACK_DIRANAD NOBITS 00000000 000000 000320 00 WA 0 0 4
[41] .OS_STACK_EXTAMPT NOBITS 00000000 000000 000800 00 WA 0 0 4
[42] .OS_STACK_HWTIMER NOBITS 00000000 000000 000320 00 WA 0 0 4
[43] .OS_STACK_IPC2TAS NOBITS 00000000 000000 000400 00 WA 0 0 4
[44] .OS_STACK_IPCMPTA NOBITS 00000000 000000 000400 00 WA 0 0 4
[45] .OS_STACK_IDLETAS NOBITS 00000000 000000 000400 00 WA 0 0 4
[46] .OS_STACK_IPCGATE NOBITS 00000000 000000 000400 00 WA 0 0 4
[47] .OS_STACK_KEYTASK NOBITS 00000000 000000 000800 00 WA 0 0 4
[48] .OS_STACK_LOGGERT NOBITS 00000000 000000 000800 00 WA 0 0 4
[49] .OS_STACK_MODULEM NOBITS 00000000 000000 000800 00 WA 0 0 4
[50] .OS_STACK_NVMTASK NOBITS 00000000 000000 001000 00 WA 0 0 4
[51] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000800 00 WA 0 0 4
[52] .OS_STACK_OSCORE_ NOBITS 00000000 000000 001000 00 WA 0 0 4
[53] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[54] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[55] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000800 00 WA 0 0 4
[56] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[57] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[58] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[59] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000400 00 WA 0 0 4
[60] .OS_STACK_OSCORE_ NOBITS 00000000 000000 001000 00 WA 0 0 4
[61] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000800 00 WA 0 0 4
[62] .OS_STACK_OSCORE_ NOBITS 00000000 000000 000800 00 WA 0 0 4
[63] .OS_STACK_POLLING NOBITS 00000000 000000 000400 00 WA 0 0 4
[64] .OS_STACK_PWRAMPT NOBITS 00000000 000000 000800 00 WA 0 0 4
[65] .OS_STACK_RTCTASK NOBITS 00000000 000000 000400 00 WA 0 0 4
[66] .OS_STACK_TESTSIM NOBITS 00000000 000000 000400 00 WA 0 0 4
[67] .OS_STACK_TIMERTA NOBITS 00000000 000000 000400 00 WA 0 0 4
[68] .OS_STACK_VEHCFG3 NOBITS 00000000 000000 000960 00 WA 0 0 4
[69] .OS_STACK_VEHCFG4 NOBITS 00000000 000000 000320 00 WA 0 0 4
[70] .OS_STACK_VEHCFG5 NOBITS 00000000 000000 0004b0 00 WA 0 0 4
[71] .OS_STACK_VIDEOIN NOBITS 00000000 000000 000800 00 WA 0 0 4
[72] .OS_STACK_HWPWRCT NOBITS 00000000 000000 0005dc 00 WA 0 0 4
[73] .OS_STACK_SENSORT NOBITS 00000000 000000 000514 00 WA 0 0 4
[74] .OS_STACK_STARTUP NOBITS 00000000 000000 0008fc 00 WA 0 0 4
[75] .OS_STACK_SYSCTRL NOBITS 00000000 000000 000dac 00 WA 0 0 4
[76] .OS_STACK_VEHCFG1 NOBITS 00000000 000000 0005dc 00 WA 0 0 4
[77] .OS_STACK_VEHCFG2 NOBITS 00000000 000000 0005dc 00 WA 0 0 4
[78] .OS_EXCVEC_CORE0_ PROGBITS 00000000 2d7d80 0001fc 00 AX 0 0 512
[79] .OS_INTVEC_CORE0_ PROGBITS 00000000 2d7f80 000800 00 AX 0 0 512
[80] .rela.OS_EXCVEC_C RELA 00000000 2d8780 000180 0c 89 78 4
[81] .rela.OS_INTVEC_C RELA 00000000 2d8900 001800 0c 89 79 4
[82] .OS_VAR_NOCACHE_N NOBITS 00000000 000000 00000c 00 WA 0 0 4
[83] .OS_CORESTATUS_CO NOBITS 00000000 000000 000014 00 WA 0 0 4
[84] .OS_BARRIER_CORE0 NOBITS 00000000 000000 000008 00 WA 0 0 4
[85] .bootstrap PROGBITS 00000000 2da100 000040 00 AX 0 0 2
[86] .rela.bootstrap RELA 00000000 2da140 000018 0c 89 85 4
[87] .gstackfix PROGBITS 00000000 000000 000000 00 0 0 4
[88] .ghsinfo NOTE 00000000 2da158 000480 00 0 0 1
[89] .symtab SYMTAB 00000000 2da5d8 261080 10 90 110134 4
[90] .strtab STRTAB 00000000 53b658 4f173c 00 0 0 1
[91] .shstrtab STRTAB 00000000 a2cd94 00091c 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
In gdb (e.g. with this pointing to some MyClass instance), running
info vtbl this
on Linux sometimes shows a high address somewhere below 0x800000000000.
But running
readelf --symbols --wide /usr/bin/my-binary|c++filt|grep "vtable.*MyClass"
shows a different, much lower value.
I guess this is due to code relocation.
How can I correlate these values?
Where is it "documented" which sections/areas are relocated?
I am asking because I have a dangling instance whose memory has already been re-used for some other object. As a consequence, my this pointer has a "wrong" vtable entry.
How can I find out which class the vtable now refers to?
A simple (small) test program does not show this problem:
frank@frank-PC:~/TEST$ cat test.cpp
#include <iostream>

class Base
{
public:
    virtual ~Base()
    {
    }

    virtual void DoIt(void)
    {
        std::cout << "Base::DoIt()" << std::endl;
    }
};

class Derived : public Base
{
public:
    void DoIt(void)
    {
        std::cout << "Derived::DoIt()" << std::endl;
    }
};

int
main(
    int,
    char**)
{
    Derived d;
    d.DoIt();
    return 0;
}
compile:
g++ -fPIC -o test -g3 test.cpp
gdb:
frank@frank-PC:~/TEST$ gdb --batch -ex "break 22" -ex "r" -ex "info vtbl this" ./test
warning: /home/frank/.gdbinit: Success
Breakpoint 1 at 0x400c13: file test.cpp, line 22.
Derived::DoIt()
Breakpoint 1, Derived::DoIt (this=0x7fffffffdad0) at test.cpp:22
22 }
vtable for 'Derived' @ 0x601d88 (subobject @ 0x7fffffffdad0):
[0]: 0x400c62 <Derived::~Derived()>
[1]: 0x400ca4 <Derived::~Derived()>
[2]: 0x400bdc <Derived::DoIt()>
symbol from readelf:
frank@frank-PC:~/TEST$ readelf --symbols --wide ./test|c++filt |grep vtable
11: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 (4)
13: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 (4)
85: 0000000000601d78 40 OBJECT WEAK DEFAULT 23 vtable for Derived
86: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND vtable for __cxxabiv1::__class_type_info@@CXXABI_1.3
97: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND vtable for __cxxabiv1::__si_class_type_info@@CXXABI_1.3
104: 0000000000601da0 40 OBJECT WEAK DEFAULT 23 vtable for Base
So vtable for Derived is at 0x0000000000601d78.
sections from readelf:
frank@frank-PC:~/TEST$ readelf --sections --wide ./test
There are 40 section headers, starting at offset 0x11a38:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400238 000238 00001c 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 000254 000020 00 A 0 0 4
[ 3] .note.gnu.build-id NOTE 0000000000400274 000274 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 000298 00001c 00 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002b8 0002b8 0001c8 18 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400480 000480 00021e 00 A 0 0 1
[ 7] .gnu.version VERSYM 000000000040069e 00069e 000026 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 00000000004006c8 0006c8 000080 00 A 6 3 8
[ 9] .rela.dyn RELA 0000000000400748 000748 0000a8 18 A 5 0 8
[10] .rela.plt RELA 00000000004007f0 0007f0 0000c0 18 AI 5 26 8
[11] .init PROGBITS 00000000004008b0 0008b0 00001a 00 AX 0 0 4
[12] .plt PROGBITS 00000000004008d0 0008d0 000090 10 AX 0 0 16
[13] .plt.got PROGBITS 0000000000400960 000960 000008 00 AX 0 0 8
[14] .text PROGBITS 0000000000400970 000970 0003d2 00 AX 0 0 16
[15] .fini PROGBITS 0000000000400d44 000d44 000009 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000400d50 000d50 000037 00 A 0 0 8
[17] .eh_frame_hdr PROGBITS 0000000000400d88 000d88 000084 00 A 0 0 4
[18] .eh_frame PROGBITS 0000000000400e10 000e10 00025c 00 A 0 0 8
[19] .gcc_except_table PROGBITS 000000000040106c 00106c 00000c 00 A 0 0 1
[20] .init_array INIT_ARRAY 0000000000601d58 001d58 000010 00 WA 0 0 8
[21] .fini_array FINI_ARRAY 0000000000601d68 001d68 000008 00 WA 0 0 8
[22] .jcr PROGBITS 0000000000601d70 001d70 000008 00 WA 0 0 8
[23] .data.rel.ro PROGBITS 0000000000601d78 001d78 000078 00 WA 0 0 8
[24] .dynamic DYNAMIC 0000000000601df0 001df0 0001f0 10 WA 6 0 8
[25] .got PROGBITS 0000000000601fe0 001fe0 000020 08 WA 0 0 8
[26] .got.plt PROGBITS 0000000000602000 002000 000058 08 WA 0 0 8
[27] .data PROGBITS 0000000000602058 002058 000018 00 WA 0 0 8
[28] .bss NOBITS 0000000000602070 002070 000008 00 WA 0 0 1
[29] .comment PROGBITS 0000000000000000 002070 000034 01 MS 0 0 1
[30] .debug_aranges PROGBITS 0000000000000000 0020a4 0000b0 00 0 0 1
[31] .debug_info PROGBITS 0000000000000000 002154 001925 00 0 0 1
[32] .debug_abbrev PROGBITS 0000000000000000 003a79 000516 00 0 0 1
[33] .debug_line PROGBITS 0000000000000000 003f8f 00080f 00 0 0 1
[34] .debug_str PROGBITS 0000000000000000 00479e 009ba4 01 MS 0 0 1
[35] .debug_ranges PROGBITS 0000000000000000 00e342 0000a0 00 0 0 1
[36] .debug_macro PROGBITS 0000000000000000 00e3e2 00251e 00 0 0 1
[37] .shstrtab STRTAB 0000000000000000 0118ac 000186 00 0 0 1
[38] .symtab SYMTAB 0000000000000000 010900 000a50 18 39 59 8
[39] .strtab STRTAB 0000000000000000 011350 00055c 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
So relevant section is .data.rel.ro
objdump:
frank@frank-PC:~/TEST$ objdump -s -j .data.rel.ro ./test
./test: file format elf64-x86-64
Contents of section .data.rel.ro:
601d78 00000000 00000000 c81d6000 00000000 ..........`.....
601d88 620c4000 00000000 a40c4000 00000000 b.@.......@.....
601d98 dc0b4000 00000000 00000000 00000000 ..@.............
601da8 e01d6000 00000000 460b4000 00000000 ..`.....F.@.....
601db8 7c0b4000 00000000 a20b4000 00000000 |.@.......@.....
601dc8 00000000 00000000 780d4000 00000000 ........x.@.....
601dd8 e01d6000 00000000 00000000 00000000 ..`.............
601de8 810d4000 00000000 ..@.....
Note
There is an offset of 0x10 (#16) between the "vtable for Derived" information of readelf (0x0000000000601d78) and gdb (0x0000000000601d88).
The reason for that is described here: https://www.rpi.edu/dept/cis/software/g77-mingw32/info-html/g++int.html#Vtables, chapter "Specification of thunked vtables".
"For vtable thunks, each slot only consists of a pointer to the virtual function, which might be a thunk function. The first slot in the vtable is an offset of the this pointer to the complete object, which is needed as a parameter to __dynamic_cast. The second slot is the virtual typeinfo function. All other slots are allocated with the same procedure as in the non-thunked case. Allocation of vfields also uses the same procedure as described above."
Looking again at those leading #16 octets in my example we have
601d78 00000000 00000000 c81d6000 00000000 ..........`.....
1st slot 0x0000000000000000
2nd slot 0x0000000000601dc8 (octets re-ordered for a valid address)
Cross-check for 2nd slot:
> objdump -z -D -j .data.rel.ro test|c++filt
0000000000601dc8 <typeinfo for Derived>:
601dc8: 00 00 add %al,(%rax)
601dca: 00 00 add %al,(%rax)
601dcc: 00 00 add %al,(%rax)
601dce: 00 00 add %al,(%rax)
601dd0: 78 0d js 601ddf <typeinfo for Derived+0x17>
601dd2: 40 00 00 add %al,(%rax)
601dd5: 00 00 add %al,(%rax)
601dd7: 00 e0 add %ah,%al
601dd9: 1d 60 00 00 00 sbb $0x60,%eax
601dde: 00 00 add %al,(%rax)
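The slot decoding above can be reproduced with a few lines of Python. This is a sketch that assumes the layout g++ uses on x86-64 Linux for a class like this one without virtual bases: two little-endian 8-byte words (offset to the complete object, then the typeinfo pointer) sit in front of the first virtual function slot, and the address stored in the object points just past them. The function name is hypothetical:

```python
import struct


def decode_vtable_prefix(symbol_addr: int, raw: bytes):
    """Split the 16 bytes that precede the first virtual function slot."""
    offset_to_top, typeinfo_ptr = struct.unpack("<qQ", raw[:16])
    # The vtable pointer stored in the object skips these two words.
    address_point = symbol_addr + 16
    return offset_to_top, typeinfo_ptr, address_point


# First row of the objdump output: bytes stored at 0x601d78..0x601d87.
RAW = bytes.fromhex("0000000000000000" "c81d600000000000")

if __name__ == "__main__":
    offset_to_top, typeinfo_ptr, address_point = decode_vtable_prefix(0x601D78, RAW)
    print(offset_to_top, hex(typeinfo_ptr), hex(address_point))
```

The decoded values are 0, 0x601dc8 (typeinfo for Derived) and 0x601d88, the address gdb prints for the vtable. Going the other way, subtracting 16 from the pointer found in a live object gives the address to look up in the readelf symbol list.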
I'm debugging a boot loader (syslinux) with gdb and the gdb stub of qemu. At some point the main file loads a shared object, ldlinux.elf.
I would like to add the symbols for that file in gdb. The command add-symbol-file seems like the way to go. However, since it is a relocatable file, I have to specify the memory address it has been loaded at. And here comes the problem.
Although I know the base address at which the LOAD segment has been loaded, add-symbol-file works section-wise and wants me to specify the address at which each section has been loaded.
Can I tell gdb to load all the symbols of all the sections provided that I specify the base address of the file in memory?
Does the behavior of gdb make sense? The section headers aren't used for running an ELF and are even optional. I can't see a use case where specifying the load address of the sections would be useful.
Example
Here are the program headers and section headers of the shared object.
Elf file type is DYN (Shared object file)
Entry point 0x4c60
There are 3 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x1db10 0x20bfc RWE 0x1000
DYNAMIC 0x01d618 0x0001d618 0x0001d618 0x00098 0x00098 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
Section to Segment mapping:
Segment Sections...
00 .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text .rodata .ctors .dtors .data.rel.ro .dynamic .got .got.plt .data .bss
01 .dynamic
02
There are 29 section headers, starting at offset 0x78618:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .gnu.hash GNU_HASH 00000094 000094 0007e0 04 A 2 0 4
[ 2] .dynsym DYNSYM 00000874 000874 0015c0 10 A 3 1 4
[ 3] .dynstr STRTAB 00001e34 001e34 0010f4 00 A 0 0 1
[ 4] .rel.dyn REL 00002f28 002f28 000ce8 08 A 2 0 4
[ 5] .rel.plt REL 00003c10 003c10 000568 08 AI 2 6 4
[ 6] .plt PROGBITS 00004180 004180 000ae0 04 AX 0 0 16
[ 7] .text PROGBITS 00004c60 004c60 013816 00 AX 0 0 4
[ 8] .rodata PROGBITS 00018480 018480 00462f 00 A 0 0 32
[ 9] .ctors INIT_ARRAY 0001cab0 01cab0 000010 00 WA 0 0 4
[10] .dtors FINI_ARRAY 0001cac0 01cac0 000004 00 WA 0 0 4
[11] .data.rel.ro PROGBITS 0001cae0 01cae0 000b38 00 WA 0 0 32
[12] .dynamic DYNAMIC 0001d618 01d618 000098 08 WA 3 0 4
[13] .got PROGBITS 0001d6b0 01d6b0 0000d0 04 WA 0 0 4
[14] .got.plt PROGBITS 0001d780 01d780 0002c0 04 WA 0 0 4
[15] .data PROGBITS 0001da40 01da40 0000d0 00 WA 0 0 32
[16] .bss NOBITS 0001db20 01db10 0030dc 00 WA 0 0 32
[17] .comment PROGBITS 00000000 01db10 000026 01 MS 0 0 1
[18] .debug_aranges PROGBITS 00000000 01db38 0010c0 00 0 0 8
[19] .debug_info PROGBITS 00000000 01ebf8 021ada 00 0 0 1
[20] .debug_abbrev PROGBITS 00000000 0406d2 009647 00 0 0 1
[21] .debug_line PROGBITS 00000000 049d19 00bd3a 00 0 0 1
[22] .debug_frame PROGBITS 00000000 055a54 004574 00 0 0 4
[23] .debug_str PROGBITS 00000000 059fc8 00538c 01 MS 0 0 1
[24] .debug_loc PROGBITS 00000000 05f354 01312d 00 0 0 1
[25] .debug_ranges PROGBITS 00000000 072481 0005d0 00 0 0 1
[26] .shstrtab STRTAB 00000000 072a51 000101 00 0 0 1
[27] .symtab SYMTAB 00000000 072b54 003530 10 28 504 4
[28] .strtab STRTAB 00000000 076084 002593 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
If I try to load the file at the address 0x7fab000 then it will relocate the symbols so that the .text section starts at 0x7fab000.
(gdb) add-symbol-file bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
add symbol table from file "bios/com32/elflink/ldlinux/ldlinux.elf" at
.text_addr = 0x7fab000
(y or n) y
Reading symbols from bios/com32/elflink/ldlinux/ldlinux.elf...done.
And then all the symbols are off by 0x4c60 bytes.
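The 0x4c60 error follows directly from the numbers above. A minimal sketch of the arithmetic for a symbol in .text, assuming gdb treats the address argument as the start of .text and keeps each symbol's distance from that start. The function names are made up for illustration:

```python
BASE = 0x7FAB000     # load address of the file, from the question
TEXT_VADDR = 0x4C60  # Addr of .text in the section headers above


def gdb_address(symbol_vaddr: int, text_arg: int) -> int:
    """Where a .text symbol lands when gdb is told .text starts at text_arg."""
    return text_arg + (symbol_vaddr - TEXT_VADDR)


def real_address(symbol_vaddr: int) -> int:
    """Where the symbol really is once the file is loaded at BASE."""
    return BASE + symbol_vaddr


if __name__ == "__main__":
    entry = 0x4C60  # entry point from the ELF header
    print(hex(real_address(entry) - gdb_address(entry, BASE)))
    print(hex(real_address(entry) - gdb_address(entry, BASE + TEXT_VADDR)))
```

Passing the base address leaves every .text symbol 0x4c60 too low; passing BASE + TEXT_VADDR (0x7fafc60 for the headers shown here) makes the difference zero.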
So, finally, I made my own command with Python and the readelf tool. It's not very clean, since it runs readelf in a subprocess and parses its output instead of parsing the ELF file directly, but it works (for 32-bit ELF only).
It uses the section headers to generate and run an add-symbol-file command with all the sections correctly relocated. The usage is pretty simple: you give it the ELF file and the base address of the file. And since remove-symbol-file wasn't working properly when given just the filename, I made a remove-symbol-file-all command that generates and runs the right remove-symbol-file -a address command.
(gdb) add-symbol-file-all bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
add symbol table from file "bios/com32/elflink/ldlinux/ldlinux.elf" at
.text_addr = 0x7fafc50
.gnu.hash_addr = 0x7fab094
.dynsym_addr = 0x7fab874
.dynstr_addr = 0x7face34
.rel.dyn_addr = 0x7fadf28
.rel.plt_addr = 0x7faec08
.plt_addr = 0x7faf170
.rodata_addr = 0x7fc34e0
.ctors_addr = 0x7fc7af0
.dtors_addr = 0x7fc7b00
.data.rel.ro_addr = 0x7fc7b20
.dynamic_addr = 0x7fc8658
.got_addr = 0x7fc86f0
.got.plt_addr = 0x7fc87bc
.data_addr = 0x7fc8a80
.bss_addr = 0x7fc8b60
(gdb) remove-symbol-file-all bios/com32/elflink/ldlinux/ldlinux.elf 0x7fab000
Here is the code to be added in the .gdbinit file.
python
import subprocess
import re

def relocatesections(filename, addr):
    # Parse `readelf -S` output; return the link-time address of .text and
    # the list of sections. The addr parameter is currently unused.
    p = subprocess.Popen(["readelf", "-S", filename], stdout = subprocess.PIPE)

    sections = []
    textaddr = '0'
    for line in p.stdout.readlines():
        line = line.decode("utf-8").strip()
        # Keep only the "[Nr] Name Type Addr ..." rows, not the column header.
        if not line.startswith('[') or line.startswith('[Nr]'):
            continue

        line = re.sub(r' +', ' ', line)
        # Turn "[ 7]" into "7" so the row splits cleanly on single spaces.
        line = re.sub(r'\[ *(\d+)\]', r'\g<1>', line)
        fieldsvalue = line.split(' ')
        fieldsname = ['number', 'name', 'type', 'addr', 'offset', 'size', 'entsize', 'flags', 'link', 'info', 'addralign']
        sec = dict(zip(fieldsname, fieldsvalue))

        if sec['number'] == '0':
            continue

        sections.append(sec)

        if sec['name'] == '.text':
            textaddr = sec['addr']

    return (textaddr, sections)


class AddSymbolFileAll(gdb.Command):
    """The right version for add-symbol-file"""

    def __init__(self):
        super(AddSymbolFileAll, self).__init__("add-symbol-file-all", gdb.COMMAND_USER)
        self.dont_repeat()

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        filename = argv[0]

        if len(argv) > 1:
            offset = int(str(gdb.parse_and_eval(argv[1])), 0)
        else:
            offset = 0

        (textaddr, sections) = relocatesections(filename, offset)

        cmd = "add-symbol-file %s 0x%08x" % (filename, int(textaddr, 16) + offset)

        for s in sections:
            addr = int(s['addr'], 16)
            # .text is already given; sections with address 0 are not loaded.
            if s['name'] == '.text' or addr == 0:
                continue

            cmd += " -s %s 0x%08x" % (s['name'], addr + offset)

        gdb.execute(cmd)


class RemoveSymbolFileAll(gdb.Command):
    """The right version for remove-symbol-file"""

    def __init__(self):
        super(RemoveSymbolFileAll, self).__init__("remove-symbol-file-all", gdb.COMMAND_USER)
        self.dont_repeat()

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        filename = argv[0]

        if len(argv) > 1:
            offset = int(str(gdb.parse_and_eval(argv[1])), 0)
        else:
            offset = 0

        (textaddr, _) = relocatesections(filename, offset)

        cmd = "remove-symbol-file -a 0x%08x" % (int(textaddr, 16) + offset)
        gdb.execute(cmd)


AddSymbolFileAll()
RemoveSymbolFileAll()
end
Can I tell gdb to load all the symbols of all the sections provided that I specify the base address of the file in memory?
Yes, but you need to provide the address of .text section, i.e. 0x7fab000+0x00004c60 here. I agree: it's quite annoying to have to fish out address of .text, and I wanted to fix it many times, so that e.g.
(gdb) add-symbol-file foo.so @0x7abc0000
just works. Feel free to file a feature request in GDB bugzilla.
Does the behavior of gdb make sense?
I am guessing that this is rooted in how GDB was used to debug embedded ROMs, where each section can be at arbitrary memory address.
I'm trying to count static initializers in a C++ file.
Solution I already have (which used to work with gcc-4.4) is looking at size of the .ctors ELF section.
After an upgrade to gcc-4.6, this seems to no longer return valid results (calculated number of static initializers is 0, which doesn't match reality, e.g. as returned by nm).
Now the issue is I'd like the solution to work even in absence of symbols (otherwise I'd have used nm).
Below is the output of readelf -SW of an example executable:
There are 35 section headers, starting at offset 0x4f39820:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 00000174 000174 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 00000188 000188 000020 00 A 0 0 4
[ 3] .note.gnu.build-id NOTE 000001a8 0001a8 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 000001cc 0001cc 000918 04 A 5 0 4
[ 5] .dynsym DYNSYM 00000ae4 000ae4 00a5e0 10 A 6 1 4
[ 6] .dynstr STRTAB 0000b0c4 00b0c4 00ef72 00 A 0 0 1
[ 7] .gnu.version VERSYM 0001a036 01a036 0014bc 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 0001b4f4 01b4f4 000450 00 A 6 13 4
[ 9] .rel.dyn REL 0001b944 01b944 268480 08 A 5 0 4
[10] .rel.plt REL 00283dc4 283dc4 0048c8 08 A 5 12 4
[11] .init PROGBITS 0028868c 28868c 00002e 00 AX 0 0 4
[12] .plt PROGBITS 002886c0 2886c0 0091a0 04 AX 0 0 16
[13] .text PROGBITS 00291860 291860 3ac5638 00 AX 0 0 16
[14] malloc_hook PROGBITS 03d56ea0 3d56ea0 00075a 00 AX 0 0 16
[15] google_malloc PROGBITS 03d57600 3d57600 008997 00 AX 0 0 16
[16] .fini PROGBITS 03d5ff98 3d5ff98 00001a 00 AX 0 0 4
[17] .rodata PROGBITS 03d5ffc0 3d5ffc0 ffa640 00 A 0 0 64
[18] .eh_frame_hdr PROGBITS 04d5a600 4d5a600 0004b4 00 A 0 0 4
[19] .eh_frame PROGBITS 04d5aab4 4d5aab4 001cb8 00 A 0 0 4
[20] .gcc_except_table PROGBITS 04d5c76c 4d5c76c 0003ab 00 A 0 0 4
[21] .tbss NOBITS 04d5df0c 4d5cf0c 000014 00 WAT 0 0 4
[22] .init_array INIT_ARRAY 04d5df0c 4d5cf0c 000090 00 WA 0 0 4
[23] .ctors PROGBITS 04d5df9c 4d5cf9c 000008 00 WA 0 0 4
[24] .dtors PROGBITS 04d5dfa4 4d5cfa4 000008 00 WA 0 0 4
[25] .jcr PROGBITS 04d5dfac 4d5cfac 000004 00 WA 0 0 4
[26] .data.rel.ro PROGBITS 04d5dfc0 4d5cfc0 1b160c 00 WA 0 0 32
[27] .dynamic DYNAMIC 04f0f5cc 4f0e5cc 000220 08 WA 6 0 4
[28] .got PROGBITS 04f0f7ec 4f0e7ec 00a800 04 WA 0 0 4
[29] .data PROGBITS 04f1a000 4f19000 0206b8 00 WA 0 0 32
[30] .bss NOBITS 04f3a6c0 4f396b8 04c800 00 WA 0 0 32
[31] .comment PROGBITS 00000000 4f396b8 00002a 01 MS 0 0 1
[32] .shstrtab STRTAB 00000000 4f396e2 00013e 00 0 0 1
[33] .symtab SYMTAB 00000000 4f39d98 4ff960 10 34 140163 4
[34] .strtab STRTAB 00000000 54396f8 144992a 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Should I be looking at .init or .init_array instead? Could you point me to corresponding documentation that explains the change between gcc or linker versions?
Static constructors can be triggered by any of the three sections .init, .ctors, or .init_array (oldest to newest in that order). .init contains a fragment of code, .ctors and .init_array contain pointers to code. The difference between .ctors and .init_array has to do with the overall order in which constructors are executed. As far as I know, none of this is documented anywhere other than code comments and mailing list posts, but it's probably worth checking the ELF ABI documents (both the generic gABI and the processor-specific psABI).
You cannot deduce the number of static constructors in a file from the size of any of these sections. It is permitted, and common, for compilers to generate a single special function which invokes all of the constructors in a file, and reference only that one function in whichever of the sections it uses. All you can know for sure (without examining the contents of the sections, applying relocations, and chasing pointers / call instructions into the .text segment and reverse engineering whatever gets called) is: in an object file, if at least one of these sections has nonzero size, then there is at least one file- or global-scope constructor in the file; if all three sections are empty, then there are none. (In an executable, all three sections are always nonempty, because the data structures that they define need headers and trailers, which are automatically added at link time.)
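What the section sizes do give is the number of function pointers stored, which is a count of initializer functions and not of constructors. A small sketch using the sizes from the readelf output in the question, assuming 4-byte pointers (the addresses there are 32-bit) and assuming the linked .ctors list carries one header word and one trailer word. Both function names are made up for illustration:

```python
POINTER_SIZE = 4  # 32-bit ELF: addresses in the question's table are 8 hex digits


def init_array_entries(section_size: int, pointer_size: int = POINTER_SIZE) -> int:
    """Number of function pointers stored in .init_array."""
    return section_size // pointer_size


def ctors_entries(section_size: int, pointer_size: int = POINTER_SIZE) -> int:
    """Pointers in a linked .ctors list, not counting its header and trailer words."""
    return max(section_size // pointer_size - 2, 0)


if __name__ == "__main__":
    print(init_array_entries(0x90))  # .init_array size from the question
    print(ctors_entries(0x08))       # .ctors size from the question
```

For that executable this gives 36 entries in .init_array and 0 in .ctors, which is consistent with the .ctors-based count dropping to zero after the compiler upgrade. Each of the 36 pointers may still stand for any number of constructors.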
Note also that constructors for block-scoped static objects are not invoked from any of these sections; they're invoked the first time control reaches their declaration.
I am assuming you have access to all the source code of your application (and perhaps of all the libraries it calls). This obviously is true for free software.
Then, you might measure that more precisely at compilation time, when compiling (with a recent version of GCC, e.g. 4.7 or 4.8) your application. You could extend it with MELT (that is a high level domain specific language to extend GCC), or with painful GCC plugins coded in C++, to measure such things.
And I am not entirely sure that your question makes precise sense. If your application is e.g. linked to some shared library which uses visibility tricks to hide its static constructors, how many static constructors that library calls is not really well defined.