ELF core file format - gdb

Short of digging through GDB source, where can I find documentation about the format used to create core files?
The ELF specification leaves the core file format open, so I guess this should be part of the GDB specifications! Sadly, I did not find any help in this regard from GNU's gdb documentation.
Here's what I am trying to do: Map virtual addresses to function names in executable/libraries that comprised the running process. To do that, I would first like to figure out, from the core file, the map from virtual address space to the name of the executable file/libraries, and then dig into the relevant file to get the symbolic information.
Now 'readelf -a core' tells me that nearly all the segments in the core file are of the type 'load' -- I'd guess these are the .text and .bss/.data segments from all the participating files, plus a stack segment. Barring these load segments, there is one note segment, but that does not seem to contain the map. So how is the information about which file a segment corresponds to, stored in the core file? Are those 'load' segments format in a particular way to include the file information?

The core dump file format is using the ELF format but is not described in the ELF standard. AFAIK, there is no authoritative reference on this.
so how is the information about which file a segment corresponds to, stored in the core file?
A lot of extra information is contained within the ELF notes. You can use readelf -n to see them.
The CORE/NT_FILE note defines the association between memory address range and file (+ offset):
Page size: 1
Start End Page Offset
0x0000000000400000 0x000000000049d000 0x0000000000000000
/usr/bin/xchat
0x000000000069c000 0x00000000006a0000 0x000000000009c000
/usr/bin/xchat
0x00007f2490885000 0x00007f24908a1000 0x0000000000000000
/usr/share/icons/gnome/icon-theme.cache
0x00007f24908a1000 0x00007f24908bd000 0x0000000000000000
/usr/share/icons/gnome/icon-theme.cache
0x00007f24908bd000 0x00007f2490eb0000 0x0000000000000000
/usr/share/fonts/opentype/ipafont-gothic/ipag.ttf
[...]
For each thread, you should have a CORE/NT_PRSTATUS note which gives you the registers of the thread (including the stack pointer). You might be able to infer the position of the stacks from this.
More information about format of ELF core files:
Anatomy of an ELF core file (disclaimer: I wrote this one)
A brief look into core dumps

A core dump is the in-memory image of the process when it crashed. It includes the program segments, the stack, the heap and other data. You'll still need the original program in order to make sense of the contents: the symbol tables and other data make the raw addresses and structures in the memory image meaningful.

Additional information about the process that generated the core file is stored in an ELF note section, albeit in an operating system specific manner. For example, see the core(5) manual page for NetBSD.

Not so much gdb as the bfd library used by gdb, binutils, etc.

A simpler solution to your problem may be to parse text from /proc/$pid/maps to determine what file a given virtual address maps to. You could then parse the corresponding file.
Kenshoto's open source VDB (debugger) uses this approach, if you can read python it is a good example.

Related

Trace32 command to read symbol name for a given address from the ELF File

I used T32 to load bin files and elf and wrote scripts to extract the Pc , Lr register values from the ELF file. Now I have the address for e.g say PC's address is 0xccccdddd. Now I need to get the symbol corresponding to that.
I ran gdb and used gdb info symbol 0xccccdddd and got the symbol name.
But I need to know if there is any command in T32 itself to get the symbol name. Or can I get the symbol name from some commands like readelf or objdump.
Thanks in advance.
The command to open a window to see all the static symbols is
sYmbol.Browse
To learn more about that window, I recommend to check the "Training HLL Debugging" (training_hll.pdf) from your TRACE32 installation.
To get only the symbol related to one single address use the PRACTICE function sYmbol.Name(<addr>). Functions have to be used together with a command. To simply display the name use the command PRINT.
E.g.:
PRINT sYmbol.Name(P:0xccccdddd)
Note that the address-offset has to be prefixed by an access class. Usually the access class "P:" stands for program memory, while "D:" stands for data memory. See the "Processor Architecture Manual" for more CPU specific access classes (Menu > Help > Processor Architecture Manual)

With ASLR turned on, are all sections of an image get loaded at the same offsets relative to the image base address every time?

Do different sections of libc (such as .text, .plt, .got, .bss, .rodata, and others) get loaded at the same offset relative to the libc base address every time?
I know the loader loads libc at a random location every time I run my program.
Thank you in advance.
I guess I found the answer to my own question. I wrote a pin-tool using Intel PIN that on every libc section get loaded outputs the section offset relative to the address of libc. Here are the sections having get loaded at same offsets with their corresponding offsets (the thing before - is the library name which is libc with its version and what comes after that is the section name):
libc.so.6-.note.gnu.build-id 0x0000000000000270
libc.so.6-.note.ABI-tag 0x0000000000000294
libc.so.6-.gnu.hash 0x00000000000002b8
libc.so.6-.dynsym 0x0000000000003d80
libc.so.6-.dynstr 0x0000000000010ff8
libc.so.6-.gnu.version 0x00000000000169d8
libc.so.6-.gnu.version_d 0x0000000000017b68
libc.so.6-.gnu.version_r 0x0000000000017ee0
libc.so.6-.rela.dyn 0x0000000000017f10
libc.so.6-.rela.plt 0x000000000001f680
libc.so.6-.plt 0x000000000001f7c0
libc.so.6-.plt.got 0x000000000001f8a0
libc.so.6-.text 0x000000000001f8b0
libc.so.6-__libc_freeres_fn 0x0000000000172b10
libc.so.6-__libc_thread_freeres_fn 0x0000000000175030
libc.so.6-.rodata 0x0000000000175300
libc.so.6-.stapsdt.base 0x0000000000196650
libc.so.6-.interp 0x0000000000196660
libc.so.6-.eh_frame_hdr 0x000000000019667c
libc.so.6-.eh_frame 0x000000000019bb38
libc.so.6-.gcc_except_table 0x00000000001bc3cc
libc.so.6-.hash 0x00000000001bc810
libc.so.6-.tdata 0x00000000003c07c0
libc.so.6-.tbss 0x00000000003c07d0
libc.so.6-.init_array 0x00000000003c07d0
libc.so.6-__libc_subfreeres 0x00000000003c07e0
libc.so.6-__libc_atexit 0x00000000003c08d8
libc.so.6-__libc_thread_subfreeres 0x00000000003c08e0
libc.so.6-.data.rel.ro 0x00000000003c0900
libc.so.6-.dynamic 0x00000000003c3ba0
libc.so.6-.got 0x00000000003c3d80
libc.so.6-.got.plt 0x00000000003c4000
libc.so.6-.data 0x00000000003c4080
libc.so.6-.bss 0x00000000003c5720
And there are indeed sections that get loaded at different offsets every time. You may see them in below. However, as I do not recognize them and to me not important, I would like to conclude that yes the sections we most concern about get loaded at the same offset each time the program runs.
libc.so.6-.note.stapsdt
libc.so.6-.gnu.warning.sigstack
libc.so.6-.gnu.warning.sigreturn
libc.so.6-.gnu.warning.siggetmask
libc.so.6-.gnu.warning.tmpnam
libc.so.6-.gnu.warning.tmpnam_r
libc.so.6-.gnu.warning.tempnam
libc.so.6-.gnu.warning.sys_errlist
libc.so.6-.gnu.warning.sys_nerr
libc.so.6-.gnu.warning.gets
libc.so.6-.gnu.warning.getpw
libc.so.6-.gnu.warning.re_max_failures
libc.so.6-.gnu.warning.lchmod
libc.so.6-.gnu.warning.getwd
libc.so.6-.gnu.warning.sstk
libc.so.6-.gnu.warning.revoke
libc.so.6-.gnu.warning.mktemp
libc.so.6-.gnu.warning.gtty
libc.so.6-.gnu.warning.stty
libc.so.6-.gnu.warning.chflags
libc.so.6-.gnu.warning.fchflags
libc.so.6-.gnu.warning.__compat_bdflush
libc.so.6-.gnu.warning.__memset_zero_constant_len_parameter
libc.so.6-.gnu.warning.__gets_chk
libc.so.6-.gnu.warning.inet6_option_space
libc.so.6-.gnu.warning.inet6_option_init
libc.so.6-.gnu.warning.inet6_option_append
libc.so.6-.gnu.warning.inet6_option_alloc
libc.so.6-.gnu.warning.inet6_option_next
libc.so.6-.gnu.warning.inet6_option_find
libc.so.6-.gnu.warning.getmsg
libc.so.6-.gnu.warning.putmsg
libc.so.6-.gnu.warning.fattach
libc.so.6-.gnu.warning.fdetach
libc.so.6-.gnu.warning.setlogin
libc.so.6-.gnu_debuglink
libc.so.6-.shstrtab

How to make GDB use a RAM dump file?

We have an embedded board with ColdFire CPU which runs µC-OS/II. When the embebbed program crashes, the CPU dumps (or copies) the entire RAM in the embedded flash. Then, we have a procedure to retrieve the RAM content (which was dumped into the flash) into a simple .bin file.
When we want to debug, we use GDB (m68k-elf-gdb.exe) combined with the .elf file. For example :
$ gdb our_elf_file
(gdb) print some_var
Cannot access memory at address 0x30617890
(gdb) ptype some_var
type = unsigned int
(gdb)
This allows us to know the address of the variable. Then, we perform a simple offset operation with the previous given address and read the RAM dump at a specific location.
For example, if we want to read some_var located at 0x30617890, we know that the dump represent the RAM content starting from 0x20000000. After that, we read 4 bytes of the .bin file at the offset (0x30617890 - 0x20000000).
(Sometimes we also use objdump (m68k-elf-objdump.exe) for other purposes).
I am completely new to this kind of stuff so maybe my question is stupid, but, is there some way to tell gdb where the RAM content is ?

How to read frames from a core dump (without GDB)?

I would like to access the frames stored in a core dump of a program that doesn't has debug symbols (I want to do this in C). When I open up the program and the core dump inside GDB I get a stack trace including the names of the functions. For example:
(gdb) bt
#0 0x08048443 in layer3 ()
#1 0x08048489 in layer2 ()
#2 0x080484c9 in layer1 ()
#3 0x0804854e in main ()
The names of all functions are stored in the executable in the .strtab section. How can I build up the stack trace with the different frames? Running GDB in batch mode is not an option. And also just "copy the parts from gdb the are needed" is also a bad idea because the code is not independently written.
So to make my question more precisely: Where do I find the point inside a core dump where I can start reading the stack information? Is there a library of some sort for accessing those information? A struct I can use? Or even better, a documentation how those informations are structured inside a core dump?
(I already seen the question "how to generate a stack trace from a core dump file in C, without invoking an external tool such as gdb", but since there is no valid answer, I thought I would ask it again)
[Edit] I'm doing this under Linux x86
Coredump contains stack information as well. If you can use this stack information along with the EBP and EIP register values in the coredump file, you can print the stack trace. I had written a program to do this. You can find the program in the following link.
http://www.emntech.com/programs/corestrace.c
Usage: Compile the above program and give the corefile when you execute it.
$corestrace core
If you want symbols also to be printed, you do like this: Let's assume the program that generated the core is 'test'.
$ nm -n test > symbols
$ corestrace core symbols
Sample output looks like this:
$ ./coretrace core symbols
0x80483cd foo+0x9
0x8048401 func+0x1f
0x8048430 main+0x2d

ELF File Format

I'm attempting to manually load the hexdump of an elf file that I compiled using g++ into a processor simulation I designed. There are 30 sections to a standard elf file and I am loading all 30 segments with their proper memory location offset taken into account. I then start my program counter at the beginning of the .text section (00400130) but it seems that the program isn't running correctly. I have verified my processor design relatively thoroughly using SPIM as a gold standard. The strange thing is that, if I load an assembly file into SPIM, and then take the disassembled .text and .data sections that are generated by the software, load them into my processor's memory, the programs work. This is different from what I want to do because I want to:
write a c++ program
compile it using mipseb-linux-g++ (cross compiler)
hex dump all sections into their own file
read files and load contents into processor "memory"
run program
Where in the ELF file should I place my program counter initially? I have it at the beginning of .text right now. Also, do I only need to include .text and .data for my program to work correctly? What am I doing wrong here?
The ELF header should include the entry address, which is not necessarily the same as the first address in the .text region. Use objdump -f to see what the entry point of the file is -- it'll be called the "start address".
The format is described here - you should be using the program headers rather than the section headers for loading the ELF image into memory (I doubt that there are 30 program headers), and the entry point will be described by the e_entry field in the ELF header.
Use the e_entry field of the ELF header to determine where to set the Program Counter.
Look into Elf32_Ehdr.e_entry (or Elf64_Ehdr.e_entry if you are on 64-bit platform). You should at least also include the .bss section, which is empty, but has "in-memory" size in the disk ELF image.
Wikipedia will lead you to all necessary documentation.
Edit:
Here's from objdump -h /usr/bin/vim on my current box:
Sections:
Idx Name Size VMA LMA File off Algn
...
22 .bss 00009628 00000000006df760 00000000006df760 001df760 2**5
ALLOC
23 .comment 00000bc8 0000000000000000 0000000000000000 001df760 2**0
CONTENTS, READONLY
Note the File off is the same for both .bss and .comment, which means .bss is empty in the disk file, but should be 0x9628 bytes in memory.