ELF File Format - c++

I'm attempting to manually load the hexdump of an elf file that I compiled using g++ into a processor simulation I designed. There are 30 sections to a standard elf file and I am loading all 30 segments with their proper memory location offset taken into account. I then start my program counter at the beginning of the .text section (00400130) but it seems that the program isn't running correctly. I have verified my processor design relatively thoroughly using SPIM as a gold standard. The strange thing is that, if I load an assembly file into SPIM, and then take the disassembled .text and .data sections that are generated by the software, load them into my processor's memory, the programs work. This is different from what I want to do because I want to:
write a c++ program
compile it using mipseb-linux-g++ (cross compiler)
hex dump all sections into their own file
read files and load contents into processor "memory"
run program
Where in the ELF file should I place my program counter initially? I have it at the beginning of .text right now. Also, do I only need to include .text and .data for my program to work correctly? What am I doing wrong here?

The ELF header should include the entry address, which is not necessarily the same as the first address in the .text region. Use objdump -f to see what the entry point of the file is -- it'll be called the "start address".
The format is described here - you should be using the program headers rather than the section headers for loading the ELF image into memory (I doubt that there are 30 program headers), and the entry point will be described by the e_entry field in the ELF header.

Use the e_entry field of the ELF header to determine where to set the Program Counter.

Look into Elf32_Ehdr.e_entry (or Elf64_Ehdr.e_entry if you are on 64-bit platform). You should at least also include the .bss section, which is empty, but has "in-memory" size in the disk ELF image.
Wikipedia will lead you to all necessary documentation.
Edit:
Here's from objdump -h /usr/bin/vim on my current box:
Sections:
Idx Name Size VMA LMA File off Algn
...
22 .bss 00009628 00000000006df760 00000000006df760 001df760 2**5
ALLOC
23 .comment 00000bc8 0000000000000000 0000000000000000 001df760 2**0
CONTENTS, READONLY
Note the File off is the same for both .bss and .comment, which means .bss is empty in the disk file, but should be 0x9628 bytes in memory.

Related

gcc objdump - what is section .ARM.exidx.text

I need to find out what a particular BYTE - .obj address 0x584a7 - in my .obj is mapped to i.e. what is responsible for generating it (code/debug info/etc).
I've successfully run objdump -xSsDg on my .obj file.
Looking at the output, I've identified the areas that refer to that byte:
First section: note the address is 0x584a0 and the size is 8 so this is what includes my byte of interest (0x584a7)
1012 .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE 00000008 00000000 00000000 000584a0 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
...
Second section: using a hex editor, I've identified that this data - 00000000 00000080 - matches my actual .obj file. The first four 0 at the start of the line indicate the offset INTO this section, which itself starts at 0x584a0
Contents of section .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE:
0000 00000000 00000080 ........
...
Third section: looks like code but not sure what is what.
Disassembly of section .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE:
00000000 <.ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE>:
0: 00000000 andeq r0, r0, r0
0: R_ARM_PREL31 .text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE
0: R_ARM_NONE __aeabi_unwind_cpp_pr1
4: 80000000 andhi r0, r0, r0
4: R_ARM_PREL31 .ARM.extab.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE
So I have my .cpp and I have the .obj and I can see the mangled name has something to do with QHash, QString, QHashData . . .
QUESTION
How do I conclusively map this section to a specific . . . THING be it code or debug info or . . . whatever so that I can know what affects this particular byte.
In an ARM object file, sections named .ARM.exidx.text.* contain the exception index table. Each entry in this table contains the association between a function in the code being compiled and its exception-handling code; in other words, each entry specifies what should happen when an exception is thrown during execution of the function (or in other cases where the function call stack frame needs to be unwound outside the normal code execution path). These sections don't contain ARM or Thumb instructions, so their contents cannot be disassembled, and the andeq and andhi instructions you are seeing in your disassembly are misleading (the -D option you are passing to objdump makes it disassemble the contents of all object sections, even if they don't contain instructions).
In your case, you are looking at the frame unwinding instructions associated to the deleteNode() method of the QHash class from the Qt library. The format of these instructions is defined in the exception handling ABI for the ARM architecture (available for download at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf), and specifically at section 9.3 of the document.

How to get line number from libunwind and AddressSanitizer listed as <shared object>+offset?

I often get stack traces from libunwind or AddressSanitizer like this:
#12 0x7ffff4b47063 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1f5063)
#13 0x7ffff4b2c783 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1da783)
#14 0x7ffff4b2cca4 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1daca4)
#15 0x7ffff4b2cf42 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1daf42)
I know that if I have gdb attached to the still living process, I can use this to get details
on the location:
(gdb) list *0x7ffff4b47063
But if the process has died, I can not just restart it under gdb and use the above because
address randomization makes that I get the wrong result (at least, that is my assumption;
I clearly do not get meaningful locations). So, I tried
% gdb program
% run
<get to the place everything is loaded and type Control-C>
(gdb) info shared
<Dumps mapping location of shared objects>
(gdb) list *(<base of libswipl.so.7.1.13>+0x1f5063)
But, this either lists nothing or clearly the wrong location. This sounds simple, but
I failed to find the answer :-( Platform is 64-bit Linux, but I guess this applies to
any platform.
(gdb) info shared
<Dumps mapping location of shared objects>
Unfortunately, above does not dump actual mapping location that is usable with this:
libswipl.so.7.1.13+0x1f5063
(as you've discovered). Rather, GDB output lists where the .text section was mapped, not where the ELF binary itself was mapped.
You can adjust for .text offset by finding it in
readelf -WS libswipl.so.7.1.13 | grep '\.text'
It might be easier to use addr2line instead. Something like
addr2line -fe libswipl.so.7.1.13 0x1f5063 0x1da783
should work.
Please see http://clang.llvm.org/docs/AddressSanitizer.html for the instructions on using the asan_symbolize.py script and/or the symbolize=true option.

How to make GDB use a RAM dump file?

We have an embedded board with ColdFire CPU which runs µC-OS/II. When the embebbed program crashes, the CPU dumps (or copies) the entire RAM in the embedded flash. Then, we have a procedure to retrieve the RAM content (which was dumped into the flash) into a simple .bin file.
When we want to debug, we use GDB (m68k-elf-gdb.exe) combined with the .elf file. For example :
$ gdb our_elf_file
(gdb) print some_var
Cannot access memory at address 0x30617890
(gdb) ptype some_var
type = unsigned int
(gdb)
This allows us to know the address of the variable. Then, we perform a simple offset operation with the previous given address and read the RAM dump at a specific location.
For example, if we want to read some_var located at 0x30617890, we know that the dump represent the RAM content starting from 0x20000000. After that, we read 4 bytes of the .bin file at the offset (0x30617890 - 0x20000000).
(Sometimes we also use objdump (m68k-elf-objdump.exe) for other purposes).
I am completely new to this kind of stuff so maybe my question is stupid, but, is there some way to tell gdb where the RAM content is ?

ELF core file format

Short of digging through GDB source, where can I find documentation about the format used to create core files?
The ELF specification leaves the core file format open, so I guess this should be part of the GDB specifications! Sadly, I did not find any help in this regard from GNU's gdb documentation.
Here's what I am trying to do: Map virtual addresses to function names in executable/libraries that comprised the running process. To do that, I would first like to figure out, from the core file, the map from virtual address space to the name of the executable file/libraries, and then dig into the relevant file to get the symbolic information.
Now 'readelf -a core' tells me that nearly all the segments in the core file are of the type 'load' -- I'd guess these are the .text and .bss/.data segments from all the participating files, plus a stack segment. Barring these load segments, there is one note segment, but that does not seem to contain the map. So how is the information about which file a segment corresponds to, stored in the core file? Are those 'load' segments format in a particular way to include the file information?
The core dump file format is using the ELF format but is not described in the ELF standard. AFAIK, there is no authoritative reference on this.
so how is the information about which file a segment corresponds to, stored in the core file?
A lot of extra information is contained within the ELF notes. You can use readelf -n to see them.
The CORE/NT_FILE note defines the association between memory address range and file (+ offset):
Page size: 1
Start End Page Offset
0x0000000000400000 0x000000000049d000 0x0000000000000000
/usr/bin/xchat
0x000000000069c000 0x00000000006a0000 0x000000000009c000
/usr/bin/xchat
0x00007f2490885000 0x00007f24908a1000 0x0000000000000000
/usr/share/icons/gnome/icon-theme.cache
0x00007f24908a1000 0x00007f24908bd000 0x0000000000000000
/usr/share/icons/gnome/icon-theme.cache
0x00007f24908bd000 0x00007f2490eb0000 0x0000000000000000
/usr/share/fonts/opentype/ipafont-gothic/ipag.ttf
[...]
For each thread, you should have a CORE/NT_PRSTATUS note which gives you the registers of the thread (including the stack pointer). You might be able to infer the position of the stacks from this.
More information about format of ELF core files:
Anatomy of an ELF core file (disclaimer: I wrote this one)
A brief look into core dumps
A core dump is the in-memory image of the process when it crashed. It includes the program segments, the stack, the heap and other data. You'll still need the original program in order to make sense of the contents: the symbol tables and other data make the raw addresses and structures in the memory image meaningful.
Additional information about the process that generated the core file is stored in an ELF note section, albeit in an operating system specific manner. For example, see the core(5) manual page for NetBSD.
Not so much gdb as the bfd library used by gdb, binutils, etc.
A simpler solution to your problem may be to parse text from /proc/$pid/maps to determine what file a given virtual address maps to. You could then parse the corresponding file.
Kenshoto's open source VDB (debugger) uses this approach, if you can read python it is a good example.

Linux: What is the best way to estimate the code & static data size of program?

I want to be able to get an estimate of how much code & static data is used by my C++ program?
Is there a way to find this out by looking at the executable or object files? Or perhaps something I can do at runtime?
Will objdump & readelf help?
"size" is the traditional tool. "readelf" has a lot of options.
$ size /bin/sh
text data bss dec hex filename
712739 37524 21832 772095 bc7ff /bin/sh
If you want to take the next step of identifying the functions and data structures to focus on for footprint reduction, the --size-sort argument to nm can show you:
$ nm --size-sort /usr/bin/fld | tail -10
000000ae T FontLoadFontx
000000b0 T CodingByRegistry
000000b1 t ShmFont
000000ec t FontLoadw
000000ef T LoadFontFile
000000f6 T FontLoadDFontx
00000108 D fSRegs
00000170 T FontLoadMinix
000001e7 T main
00000508 T FontLoadBdf
readelf will indeed help. You can use the -S option; that will show the sizes of all sections. .text is (the bulk of) your executable code. .data and .rodata is your static data. There are other sections too, some of which are used at runtime, others only at link time.
size -A