What's the meaning of HIGHLOW in a disassembled binary file? - c++

I just used DUMPBIN for the first time and I see the term HIGHLOW repeatedly in the output file:
BASE RELOCATIONS #7
11000 RVA, E0 SizeOfBlock
...
3B5 HIGHLOW 2001753D ___onexitbegin
3C1 HIGHLOW 2001753D ___onexitbegin
...
I'm curious what this term stands for. I didn't find anything on Google or Stackoverflow about it.

To apply a fixup, a delta is calculated as the difference between the
preferred base address, and the base where the image is actually
loaded.
The basic idea is that when doing a fixup at some address, we must know
what memory must be changed ("offset" field)
what value is needed for its relocation ("delta" value)
which parts of relocated data and delta value to use ("type" field)
Here are some possible values of the "type" field
HIGH - add higher word (16 bits) of delta to the 16-bit value at "offset"
LOW - add lower word of delta to the value at "offset"
HIGHLOW - add full delta to the 32-bit value at "offset"
In other words, HIGHLOW type tells the program that it's doing a fix-up on offset "offset" from the page of this relocation block*, and that there is a doubleword that needs to be modified in order to have properly working executable.
* all of the relocation entries are grouped into blocks, and every block has a page on which its entries are applied
Let's say that you have this instruction in your code:
section .data
message: "Hello World!", 0
section .code
...
mov eax, message
...
You run assembler and immediately after it you run disassembler. Now your code looks like this:
mov eax, dword [0x702000]
You're now curious why is it 0x700000, and when you look into file dump, you see that
ImageBase: 0x00700000
Now you understand where did this number come from and you'e ready to run the executable.
Loader which loads executable files into memory and creates address space for them finds out, that memory 0x700000 is unavailable and it needs to place that file somewhere else. It decides that 0xf00000 will be OK and copies the file contents there.
But, your program was linked to work only with data on 0x700000 and there was no way for linker to know that its output would be relocated. Because of this, loader must do its magic. It
calculates delta value - the old address (image base) is 0x700000 but it wants 0xf00000 (preferred address). It subtracts one from another and gets 0x800000 as result.
gets to the .reloc section of the file
checks if there is still another page (4KB of data) to be relocated. If no, it continues toward calling fileĀ“s entry point.
4.for every relocation for the current page, it
gets data at relocation offset
adds the delta value (in the way as type field states)
places the new value at relocation offset
continues on step 3
There are also more types of relocation entry and some of them are architecture-specific. To see a full list, read the "Microsoft Portable Executable and Common Object File Format, section 6.6.2. Fixup Types".

What you see here is the content of the "Base relocation table" in Microsoft Windows executable files.
Base relocation tables are necessary in Windows for DLL files and they are optional for executable files; they contain information about the location of address information in the EXE/DLL file that must be updated when the actual address of the DLL file in memory is known (when loading the DLL into memory). Windows uses the information stored in this table to update the address information.
The table supports different types of addresses while the naming is Microsoft-specific: ABSOLUTE (= dummy), HIGH, LOW, HIGHLOW, HIGHADJ and MIPS_JMPADDR.
The full name of the constant is "IMAGE_REL_BASED_HIGHLOW".
The "ABSOLUTE" type is typically a dummy entry inserted to ensure the parts of the table are a multiple of 4 (or 8) bytes long.
On x86 CPUs only the "HIGHLOW" type is used: It tells Windows about the location of an absolute (32-bit) address in the file.
Some background info:
In your example the "Image Base" could be 0x20000000 which means that the EXE/DLL file has been compiled to be loaded into address 0x20000000. At the addresses 0x200113B5 (0x20000000 + 0x11000 + 0x3B5) and 0x200113C1 there are absolute addresses.
Let's say the memory at location 0x200113B5 contains the value 0x20012345 which is the address of a function or variable in the program.
Maybe the memory at address 0x20000000 cannot be used and Windows decides to load the DLL into the memory at 0x50000000 instead. Then the 0x20012345 must be replaced by 0x50012345.
The information in the base relocation table is used by Windows to find all addresses that must be replaced.

Related

Trace32 CMM script : understanding the Data.Set command

What does the following command mean?
sYmbol.NEW _VectorTable 0x34000000
sYmbol.NEW _ResetVector 0x34000020
sYmbol.NEW _InitialSP 0x34000100
Data.Set EAXI:_VectorTable %Long _InitialSP _ResetVector+1
The command Data.Set writes data values to your target's memory. The syntax of the command is
Data.Set <address>|<address_range> [<access_width>] {value(s)}
The <address> to which the data is written to has the form:
<access_class>:<address_offset>
A full address, just the address offset and the values (you want to write), can also be represented by debug symbols. These symbols are usually the variables, function names and labels defined in your target application and are declared to the debugger, by loading the target application's ELF file.
In this case however the symbols are declared in the debugger manually by the command sYmbol.NEW.
Anyway: By replacing the symbols with their value in the command Data.Set EAXI:_VectorTable %Long _InitialSP _ResetVector+1 we get the command
Data.Set EAXI:0x34000000 %Long 0x34000100 0x34000021
So what does this command actually do?
The access-width specifier %Long indicate that 32-bit values should be written. As a result the address will increment automatically by 4 for each specified data value.
The value 0x34000100 is written to address EAXI:0x34000000
The value 0x34000021 is written to address EAXI:0x34000004
The <access_class> "EAXI" indicates that the debugger should access the address 0x34000000 directly via the AXI bus (Advanced eXtensible Interface). By writing directly to the AXI bus, you bypass your target's CPU core (bypassing any MMU, MPU or caches). The leading 'E' of the access class EAXI indicates that the write operation may also performed while the CPU core is running (or considered to be running (e.g. in Prepare mode)). The description of all possible access classes is specific to the target's core-architecture and thus, you can find the description in the debugger's "Target Architecture Manual".
And what does this exactly mean for your target and the application running on it?
Well, I don't know you chip or SoC (nor do I know your application).
But from the data I see, I guess that you are debugging a chip with an ARM architecture - probably Cortex-M. Your chip's Boot-ROM seems to start at address 0x34000000, while your actual application's start-up code starts at 0x34000020 (maybe with symbol _start).
For Cortex-M cores you have to program at offset 0 of your vector table (in the boot ROM) the initial value of the stack-pointer, while at offset 4 you have to write the initial value of the program counter. In your case the program counter should be initialized with 0x34000021. Why 0x34000021 and not 0x34000020? Because your start-up code is probably encoded in ARM Thumb. (Cortex-M can only execute Thumb code). By setting the least significant bit of the initial value for the program counters to 1, the core knows, that it should start decoding Thumb instructions. (Not setting the least significant bit to 1 on a Cortex-M will cause an exception).

Why does pre_c_init access memory outside of the defined program segments?

While looking through the assembly for a console "hello world" program (compiled using the visual c++ compiler), I came across this:
pre_c_init proc near
.text:00401AFE mov eax, 5A4Dh
.text:00401B03 cmp ds:400000h, ax
The code above seems to be accessing memory that isn't filled with anything in particular: All segments start at 0x401000 or even further down in the file. (The image base is at 0x400000, but the first segment is at 0x401000).
I used OllyDbg to see what the actual value at 0x400000 is, and every single time it's the same as in the code (0x5A4D). What's going on here?
5A4D is "MZ" in little-endian ASCII, and MZ is the signature of MS-DOS and, more recently, PE executables.
The comparison checks whether the executable has been mapped at the default base address, 0x400000. This, I believe, is used to determine whether it is necessary to perform relocation.
This is discussed further in the following thread: Why does PE need a relocation table?

How do relocations work in COFF object (not image) files

What steps exactly are taken by the linker while resolving relocations in an object file before creating the final image? More specifically, how does the linker treat the value which is already stored at the relocation site? Does it always add it to the final VA/RVA, or is it sometimes ignored (e.g certain relocation types)?
I couldn't find a clear explanation in the MS PE/COFF Specfication, and after googling and experimenting for a while, all I could find out was this:
In the MS COFF spec, chapter 5.6.2 "Base Relocation Types", it is said that "The base relocation applies all 32 bits of the difference to the 32-bit field at offset", which I guess means that the relocation should take into account whatever address is already stored at the specified offset. However, chapter 5.6 (the .reloc section) is only relevant to image files, and not object files.
The dumpbin utility adds a column named "Applied To" when printing the relocations table, which seems to always (no matter the relocation type) contain the value which is stored at the relocation site.
The Relocation Directives chapter in the DJGPP COFF Specification clearly states that the value currently stored at the location should be added to the address of the symbol pointed to by the relocation table entry.
Can you point me to any (relevant) documentation which explains how relocations are handled by the linker?
The relocation section used in "image files" has a slightly different purpose from the relocation information present in "object files".
Unlike Linux Shared Libraries, Windows DLLs do not typically use position independent code. Instead they are defined relative to a fixed based address. The Windows loader, however, has the ability to relocate a DLL in the event of a conflict. To support this, DLL images contain relocation sections that specify what data needs to be modified when the image is relocated. Many intra-dll symbol references will use "eip" (or rip) relative addressing, so they may not need to be modified on DLL relocation.
Image file relocations are always specified relative to the base address of the executable image. Object file relocations are specified relative to the address (within an image, using the images preferred based address) of a symbol in a symbol table. Image files don't have a symbol table (they have an IAT, but that's not a symbol table). The set of supported relocations in object files is richer then the set supported in image files.
The details are covered in the "COFF Relocations (Object Only)" section of the PE/COFF spec (I'm looking at version 3 as I type this).

Getting the offset to .text section code PE file format? VirtualAddress, PointerToRawData?

I've been trying to do this for about two days, with no success. I have been reading over many PE file format tutorials to no avail.
I map a 32 bit executable into memory via CreateFileMapping which works perfectly. My program then loops through the section headers, and checks the characteristics against my default characteristics (to make sure the section is executable and is code). If it is true the program returns the (PIMAGE_SECTION_HEADER) pointer to that section header (program works perfectly so far).
Now that I have the pointer, there are two specific entries to the structure that have baffled me, and that is PointerToRawData and VirtualAddress, when I cout the entries;
VirtualSize = 4096, PointerToRawData = 1536.
From what I have read in PE documentation, is that PointerToRawData is a supposed offset (RVA???) to the first byte of data in the section on disk (am I correct?), and is a multiple of a alignment value (512). The question is what do I set this value to, to obtain a pointer which I can use to access the section's data. On a memory-mapped file would it be better to use (VirtualAddress value + the imagebase value) to find the first byte of the section?
Another point of confusion is VirtualSize vs SizeOfRawData. This has confused me because in this article - http://msdn.microsoft.com/en-us/library/ms809762.aspx, it says "The SizeOfRawData field (seems a bit of a misnomer) later on in the structure holds the rounded up value" yet my VirtualSize is greater than my SizeOfRawData value which has led to confusion on which one I should use.
The object of this program is to find the executable section (.text section) and perform a bitwise operation on all the bits in the section, and end the operation before the next section.
I don't want it to seem like I expect a spoonfeed, I just want some clarifications.
Thank you for your time/help, it is appreciated.
I don't happen to have the spec handy or any PE code to look at for reference (I'm writing this on my iPad from my couch ;) but the key point to realize is that there are two modes to consider: all talk of RVAs is only relevant when the PE is mapped into memory and the alignment there is page-alignment. When you're reading the file off disk, the offsets are file offsets and each section is using the file alignment.
I hope this helps.

How to understand the call stack of Visual Studio?

VCamD.ax!CFactoryTemplate::CreateInstance() + 0x3f bytes
> VCamD.ax!CClassFactory::CreateInstance() + 0x7f bytes
What's 0x7f and 0x3f?
Those values are the offset of the instruction pointer from the start of the listed function.
It's a way of expressing which assembly instruction is currently being executed in the function. Similar to having the the current source code line highlighted in the editor. Except that the offset is at the assembly level not the source code level.
You got .pdb files that were stripped. Their source code file and line number info was removed. Typical for example for .pdb files you can get from the Microsoft symbol server.
Which is okay, you can't get the source code anyway. Without the line number info, the debugger falls back to using the 'closest' symbol whose address it does have available, typically the function entry point, and adds the offset of the call instruction.
Sometimes you get bogus information if debug info is missing for long stretches of code. That 'closest' symbol is in no way related to the original code. Any time the offset gets to be larger than, oh, 0x200 then you ought to downplay the relevance of the displayed symbol name. There are a couple of places in the Windows code where you actually see the name of a string variable. The stack trace you posted has high confidence, those offsets are small.
If it's at the top of the stack, then it's the offset of the instruction pointer relative to the given symbol. So if the top stack frame is VCamD.ax!CClassFactory::CreateInstance() + 0x7f bytes, and VCamD.ax!CClassFactory::CreateInstance() is at location 0x3000 in memory (fictional obviously), then EIP is currently 0x307f. This shows you how far into the function you are.
If it's further up the stack then it's the return offset from the start of the given symbol. So if VCamD.ax!CFactoryTemplate::CreateInstance() called VCamD.ax!CClassFactory::CreateInstance(), and VCamD.ax!CFactoryTemplate::CreateInstance() is at location 0x4000 , then when VCamD.ax!CClassFactory::CreateInstance() returns, EIP will be at 0x403f.
One important thing to notice though is that if you see something like somedll!SomeFunc() + 0x5f33 bytes, you can be pretty certain that this is NOT the correct symbol. Since functions are rarely 0x5f33 bytes long, you can see that EIP is simply in a place that the debugger doesn't have symbols for.