What's the structure of an .arm.extab entry in armcc?

I'm trying to understand exactly how the exception table (.arm.extab) works.
I'm aware that this is compiler dependent, so I'll restrict myself to armcc (as I'm using Keil).
A typical entry in the table looks something like:
b0aa0380 2a002c00 01000000 00000000
To my understanding, the first word encodes instructions for the personality routine, while the third word is a R_ARM_PREL31 relocation to the start of the catch block.
What baffles me is the second word - it appears to be split into 2 shorts, the second of which measures some distance from the start of the throwing function, but I'm not sure exactly to what (nor what the first short does).
Is there any place where the structure of these entries is documented?
I've found 2 relevant documents, but as far as I can see they have no compiler-dependent information, so they're not sufficient:
https://github.com/ARM-software/abi-aa/releases/download/2022Q1/aaelf32.pdf
https://github.com/ARM-software/abi-aa/releases/download/2022Q1/ehabi32.pdf

If you happen to have the byte ordering mixed up, the below applies. Some of this information is probably useful even if the byte order in your original example is correct.
extab and exidx are sections added by the AAPCS, the newer ARM ABI.
For the older APCS, the frame pointer (fp) is the root of a linked list of frames from the active routine back to the main routine (or _start). With AAPCS, records are created and placed in the exidx and extab sections instead. These are needed to unwind stacks (and release resources) when the fp is used as a generic register.
The exidx is an ordered table of routine start addresses and an extab index (or can't unwind). A PC (program counter) value can be looked up in this table to find the corresponding extab entry.
The ARM EHABI documentation has a section 7 on exception-handling table entries. These are extab entries, and you can at least start from there to learn more. There are two models defined:
Generic (or C++)
ARM compact
The compact model will be used for most 'C' code. There are no objects to be destroyed on the stack as there are with C++. The hex value 8003aab0 decodes as:
1000b for the leading nibble, so this is compact.
0000b for the index: Su16, the short frame-unwinding description.
03h - pop 16 bytes, some locals or padding.
aah - pop r4-r6, r14.
b0h - finish.
Table 4, ARM-defined frame-unwinding instructions, gives the meaning of each unwind byte.
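As a rough sketch (handling only the opcodes that appear in this example), decoding those bytes per Table 4 might look like this:

#include <stdint.h>
#include <stdio.h>

/* Minimal sketch: decode a compact (Su16) extab word such as 0x8003aab0.
   Only the unwind opcodes used in the example above are handled. */
static void decode_su16(uint32_t word)
{
    printf("compact=%u index=%u\n",
           (unsigned)(word >> 31), (unsigned)((word >> 24) & 0xF));

    /* Bytes 2..0 hold up to three unwind opcodes, most significant first. */
    for (int shift = 16; shift >= 0; shift -= 8) {
        uint8_t op = (word >> shift) & 0xFF;
        if ((op & 0xC0) == 0x00)        /* 00xxxxxx: vsp += (x << 2) + 4 */
            printf("pop %u bytes\n", ((op & 0x3Fu) << 2) + 4);
        else if ((op & 0xF0) == 0xA0)   /* 1010Rnnn: pop r4-r(4+nnn)[, r14] */
            printf("pop r4-r%u%s\n", 4 + (op & 7u), (op & 8) ? ", r14" : "");
        else if (op == 0xB0) {          /* 10110000: finish */
            printf("finish\n");
            break;
        } else {
            printf("opcode 0x%02X not handled in this sketch\n", op);
        }
    }
}

Calling decode_su16(0x8003aab0) prints the pop 16 bytes, pop r4-r6, r14, finish sequence walked through above.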
The next word, 0x002c002a, is an offset to the generic personality routine. The values after that should be the data structures of section 8.2, which are a size (and should be zero)... Next would be a stride and then a four-byte object type info. The offset 0x2c002a would be used to call the object's destructor, or some sort of wrapper that does this.
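As an aside, turning a prel31 field back into an address is just a sign-extension of the low 31 bits plus the field's own address; this mirrors what the Linux unwinder's prel31_to_addr() helper does (a sketch, assuming a 32-bit target):

#include <stdint.h>

/* Sign-extend the low 31 bits of *field and add the field's own address. */
static uintptr_t prel31_to_addr(const uint32_t *field)
{
    int32_t offset = ((int32_t)(*field << 1)) >> 1;
    return (uintptr_t)field + (uintptr_t)offset;
}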
I think all C++ code is intended to use this generic method. The other models are meant for different languages, not for different compilers.
Related Q/A and links:
Arm exidx - about the exidx.
ARM link and frame pointer - situation for older APCS and many AAPCS functions.
Linux ARM Unwind - sample unwinding code for 'C'.
prel31 - SO Q/A on prel31 in Linux code above.
Generating unwind in ARM gnu assembler
gas ARM directives See: .cantunwind, .vsave, etc.


Decoding and matching Chip 8 opcodes in C/C++

I'm writing a Chip 8 emulator as an introduction to emulation and I'm kind of lost. Basically, I've read a Chip 8 ROM and stored it in a char array in memory. Then, following a guide, I use the following code to retrieve the opcode at the current program counter (pc):
// Fetch opcode
opcode = memory[pc] << 8 | memory[pc + 1];
Chip 8 opcodes are 2 bytes each. This is code from a guide, which I vaguely understand as shifting memory[pc] left by 8 bits to make room (using << 8), then merging memory[pc + 1] into it (using |) and storing the result in the opcode variable.
Now that I have the opcode isolated however, I don't really know what to do with it. I'm using this opcode table and I'm basically lost in regards to matching the hex opcodes I read to the opcode identifiers in that table. Also, I realize that many of the opcodes I'm reading also contain operands (I'm assuming the latter byte?), and that is probably further complicating my situation.
Help?!
Basically, once you have the instruction, you need to decode it. For example, from your opcode table:
if ((inst & 0xF000) == 0x1000)
{
    write_register(pc, (inst & 0x0FFF) << 1);
}
And I'm guessing that, since you are accessing the ROM two bytes per instruction, the address is probably a (16-bit) word address rather than a byte address, so I shifted it left by one. (You need to study how those instructions are encoded; the opcode table you provided is inadequate for that, at least without making assumptions.)
There is a lot more that has to happen, and I don't know if I wrote anything about it in my github samples. I recommend you create a fetch function for fetching instructions at an address, plus read-memory, write-memory, read-register, and write-register functions. I also recommend that your decode-and-execute function decode and execute only one instruction at a time. Normal execution is then just calling it in a loop; this gives you the ability to handle interrupts and things like that without a lot of extra work, and it modularizes your solution. Creating fetch(), read_mem_byte(), read_mem_word(), etc. functions modularizes your code (at a slight cost in performance) and makes debugging much easier, since you have a single place where you can watch registers or memory accesses and figure out what is or isn't going on.
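One possible shape for those helpers (a sketch; the names and sizes are placeholders, keeping to CHIP-8's 4 KB address space and 16 V registers):

#include <stdint.h>

static uint8_t memory[4096];  /* CHIP-8 address space */
static uint8_t v[16];         /* V0..VF registers */

static uint8_t read_mem_byte(uint16_t addr)             { return memory[addr & 0x0FFF]; }
static void    write_mem_byte(uint16_t addr, uint8_t x) { memory[addr & 0x0FFF] = x; }
static uint8_t read_register(unsigned r)                { return v[r & 0xF]; }
static void    write_register(unsigned r, uint8_t x)    { v[r & 0xF] = x; }

/* Fetch one 16-bit big-endian instruction. */
static uint16_t fetch(uint16_t pc)
{
    return (uint16_t)((read_mem_byte(pc) << 8) | read_mem_byte(pc + 1));
}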
Based on your question, and where you are in this process, I think the first thing you need to do before writing an emulator is to write a disassembler. Because this is a fixed-length (16-bit) instruction set, that is much, much easier. You can start at some interesting point in the ROM, or at the beginning if you like, and decode everything you see. For example:
if ((inst & 0xF000) == 0x1000)
{
    printf("jmp 0x%04X\n", (inst & 0x0FFF) << 1);
}
With only 35 instructions, that shouldn't take but an afternoon, maybe a whole Saturday, if this is your first time decoding instructions (which I assume, based on your question). The disassembler becomes the core decoder for your emulator. Replace the printf()s with emulation, or even better, leave the printf()s in and just add code to emulate the instruction execution; this way you can follow the execution. (Same deal: have a disassemble-a-single-instruction function and call it for each instruction. This becomes the foundation for your emulator.)
Your understanding of what that fetch line of code is doing needs to be more than vague; to pull off this task you are going to need a strong grasp of bit manipulation.
Also, I would call that line of code you provided buggy, or at least risky. If memory[] is an array of bytes, the danger is the left shift being performed using byte-sized math, resulting in zero; zero ORed with the second byte then yields only the second byte. (Strictly speaking, C's integer promotions mean the shift is done as int for a plain byte array, so a conforming compiler won't do this, but being explicit costs nothing.)
The worry, in other words, is the compiler turning this:
opcode = memory[pc] << 8 | memory[pc + 1];
Into this:
opcode = memory[pc + 1];
which won't work for you at all. A very quick fix:
opcode = memory[pc + 0];
opcode <<= 8;
opcode |= memory[pc + 1];
will save you some headaches. With minimal optimization the compiler won't store the intermediate results to RAM for each operation, so you get the same (desired) output and performance.
The instruction set simulators I wrote and mentioned above are not intended for performance but instead for readability, visibility, and hopefully education. I would start with something like that; then, if performance for example is of interest, you can rewrite it. This chip8 emulator, once you have the experience, would be an afternoon task from scratch, so once you get through this the first time you could rewrite it three or four times in a weekend; it is not a monumental task to have to rewrite. (The thumbulator one took me a weekend for the bulk of it. The msp430 one was probably more like an evening or two of work; getting the overflow flag right, once and for all, was the biggest task, and that came later.) Anyway, the point being: look at things like the mame sources. Most if not all of those instruction set simulators are designed for execution speed, and many are barely readable without a fair amount of study, often heavily table driven, sometimes full of C programming tricks, etc. Start with something manageable, get it functioning properly, then worry about improving it for speed, size, portability, or whatever. This chip8 thing looks to be graphics based, so you are also going to have to deal with a lot of line drawing and other bit manipulation on a bitmap/screen/wherever - or you could just call API or operating system functions. Basically, this chip8 thing is not your traditional instruction set with registers and a laundry list of addressing modes and ALU operations.
Basically -- Mask out the variable part of the opcode, and look for a match. Then use the variable part.
For example 1NNN is the jump. So:
int a = opcode & 0xF000;
int b = opcode & 0x0FFF;

if (a == 0x1000)
    doJump(b);
Then the game is to make that code fast or small, or elegant, if you like. Good clean fun!
Different CPUs store values in memory differently. Big endian machines store a number like $FFCC in memory in that order FF,CC. Little-endian machines store the bytes in reverse order CC, FF (that is, with the "little end" first).
The CHIP-8 architecture is big endian, so the code you will run has the instructions and data written in big endian.
In your statement "opcode = memory[pc] << 8 | memory[pc + 1];", it doesn't matter if the host CPU (the CPU of your computer) is little endian or big endian. It will always put a 16-bit big endian value into an integer in the correct order.
There are a couple of resources that might help: http://www.emulator101.com gives a CHIP-8 emulator tutorial along with some general emulator techniques. This one is good too: http://www.multigesture.net/articles/how-to-write-an-emulator-chip-8-interpreter/
You're going to have to set up a bunch of different bit masks to get the actual opcode from the 16-bit word, in combination with a finite state machine, in order to interpret those opcodes, since it appears that there are some complications in how the opcodes are encoded (i.e., certain opcodes have register identifiers, etc., while others are fairly straightforward with a single identifier).
Your finite state machine can basically do the following:
Get the first nibble of the opcode using a mask like 0xF000. This will allow you to "categorize" the opcode.
Based on the category from step 1, apply more masks to get the register values or whatever other variables might be encoded in the opcode; these narrow down the actual function that needs to be called, as well as its arguments.
Once you have the opcode and the variable information, do a look-up into a fixed-length table of functions that have the appropriate handlers for the opcode functionality and the variables that go along with it. While you can hard-code, in your state machine, the names of the functions that go with each opcode once you've isolated the proper functionality, a table that you initialize with function pointers for each opcode is a more flexible approach that lets you modify the code's functionality more easily (e.g., you could easily swap between debug handlers and "normal" handlers).
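A sketch of that function-pointer table (the handler names and layout here are made up for illustration; only the 1NNN jump from earlier is filled in):

#include <stdint.h>
#include <stdio.h>

typedef void (*op_handler)(uint16_t opcode);

static void op_jump(uint16_t opcode)    { printf("jmp 0x%03X\n", opcode & 0x0FFF); }
static void op_unknown(uint16_t opcode) { printf("unhandled 0x%04X\n", opcode); }

/* One entry per top nibble; unimplemented categories fall through to op_unknown. */
static op_handler dispatch[16] = {
    op_unknown, op_jump,    op_unknown, op_unknown,
    op_unknown, op_unknown, op_unknown, op_unknown,
    op_unknown, op_unknown, op_unknown, op_unknown,
    op_unknown, op_unknown, op_unknown, op_unknown,
};

static void execute(uint16_t opcode)
{
    dispatch[opcode >> 12](opcode);  /* top nibble selects the category */
}

Swapping a debug handler in or out is then just an assignment into dispatch[].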

What does "BUS_ADRALN - Invalid address alignment" error means?

We are on HPUX and my code is in C++.
We are getting
BUS_ADRALN - Invalid address alignment
in our executable on a function call. What does this error mean?
The same function works many times, then suddenly it gives a core dump.
In GDB, when I try to print the object values, it says they are not in context.
Any clue where to check?
You are having a data alignment problem. This is likely caused by trying to read or write through a bad pointer of some kind.
A data alignment problem is when the address a pointer is pointing at isn't 'aligned' properly. For example, some architectures (the old Cray 2 for example) require that any attempt to read anything other than a single character from memory only occur through a pointer in which the last 3 bits of the pointer's value are 0. If any of the last 3 bits are 1, the hardware will generate an alignment fault which will result in the kind of problem you're seeing.
Most architectures are not nearly so strict, and frequently the required alignment depends on the exact type being accessed. For example, a 32 bit integer might require only the last 2 bits of the pointer to be 0, but a 64 bit float might require the last 3 bits to be 0.
Alignment problems are usually caused by the same kinds of problems that would cause a SEGFAULT or segmentation fault. Usually a pointer that isn't initialized. But it could be caused by a bad memory allocator that isn't returning pointers with the proper alignment, or by the result of pointer arithmetic on the pointer when it isn't of the correct type.
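As a small illustration, the classic way to manufacture such a fault is to force a misaligned pointer (a sketch; this is undefined behavior in C, and whether it actually traps depends on the hardware):

#include <stdint.h>

int main(void)
{
    unsigned char buf[8] = { 0 };
    uint32_t *p = (uint32_t *)(buf + 1);  /* buf + 1 is almost certainly not 4-byte aligned */
    return (int)*p;  /* may raise a bus error (e.g. BUS_ADRALN) on strict-alignment CPUs */
}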
The system implementation of malloc and/or operator new are almost certainly correct or your program would be crashing way before it currently does. So I think the bad memory allocator is the least likely tree to go barking up. I would check first for an uninitialized pointer and then bad pointer arithmetic.
As a side note, the x86 and x86_64 architectures don't have any alignment requirements. But, because of how cache lines work, and for various other reasons, it's often a good idea for performance to align your data on a boundary that's as big as the datatype being stored (i.e. a 4 byte boundary for a 32 bit int).
Most processors (not x86 and friends... the black sheep of the family, lol) require accesses to certain elements to be aligned on multiples of their size. That is, reading an integer from address 0x04 is okay, but trying to do the same from 0x03 will cause an interrupt to be thrown.
This is because it's easier to implement the load/store hardware if it's always on a multiple of the data size with which you're working.
Since HP-UX runs only on RISC processors, which typically have such constraints, you should see here -> http://en.wikipedia.org/wiki/Data_structure_alignment#RISC.
Actually, HP-UX has its own great forum on ITRC, and some HP staff members are very helpful. I just took a look at the same topic you are asking about, and here are some results: in one case, a similar problem was actually caused by a bad input parameter. I strongly advise you to first read answers to similar questions and, if necessary, post your question there.
By the way, it is likely that you will be asked to post the results of these gdb commands:
(gdb) bt
(gdb) info reg
(gdb) disas $pc-16*8 $pc+16*4
Most of these issues are caused by multiple upstream dependencies linking to different versions of the same library.
For example, both gnustl and stlport provide distinct implementations of the C++ standard library. If you compile and link against gnustl while one of your dependencies was compiled and linked against stlport, then each of you carries a different implementation of the standard functions and classes. When your program is launched, the dynamic linker will attempt to resolve all of the exported symbols and will discover known symbols at incorrect offsets, resulting in the BUS_ADRALN signal.

Align native code on fixed size memory boundaries with GCC/G++/AS?

I have a C function that contains all the code that will implement the bytecodes of a bytecode interpreter.
I'm wondering if there is a way to align segments of the compiled code in memory on fixed size boundaries so that I could directly calculate the address to jump to from the value of the bytecode? Sort of the same way an array works but instead of reading from the calculated address, I'm jumping to it.
I am aware that I will have to put the code to perform the next jump at the end of every "bytecode code" segment, and that I will have to make the boundary size at least as big as the size of the largest segment.
If this is even possible, how would I tell the compiler/assembler (gcc / g++ / as) to align in said way?
I realize this is not exactly what you are asking for, but this is the standard way to implement byte code interpreters with GCC.
GCC's "computed goto" or "labels as values" feature allows you to put labels in an array and efficiently jump to different bytecode instructions. See Fast interpreter using gcc's computed goto. Also look at this related Stack Overflow question: C/C++ goto, and the GCC documentation on labels as values.
The code to do this would look something like this:
void *jumptable[] = { &&label1, &&label2 };

label1:
    /* Code here... */
label2:
    /* Other code here... */
You can then jump to different instructions using the table:
goto *jumptable[i];
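Put together, a self-contained toy interpreter might look like this (a sketch with two made-up bytecodes; computed goto is a GCC/Clang extension, not standard C):

#include <stdio.h>

int main(void)
{
    static const unsigned char program[] = { 0, 0, 1 };  /* print, print, halt */
    const unsigned char *ip = program;

    /* One label per bytecode: 0 = print, 1 = halt. */
    static void *dispatch[] = { &&op_print, &&op_halt };

    goto *dispatch[*ip++];

op_print:
    puts("print");
    goto *dispatch[*ip++];

op_halt:
    return 0;
}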
There are two issues here, but the answer is the same. First, you're writing (binary) data to a (binary) file. Second, you're loading that (binary) data into memory. You control where it goes on disk, and you control where it goes in memory. You can easily calculate what you're looking for.
Personally, I would probably use an array when loading data into memory, and I would make sure all data started at a valid index in that array. Arrays are laid out contiguously and are relatively easy to work with. Kernighan and Ritchie's book The C Programming Language mentions a technique of using unions for alignment, but that doesn't make pointer arithmetic any easier.
If you are using Linux, use posix_memalign(). I am sure there is a similar function for Windows.
If you want to align your own code, have a look at the gcc __attribute__ syntax.
The ld -Ttext option may also be helpful.
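For instance, a minimal sketch of the attribute approach (assuming GCC; the op_add name is made up, and -falign-functions=N is the related command-line switch):

/* Ask GCC to place this function on a 64-byte boundary. */
__attribute__((aligned(64)))
void op_add(void)
{
    /* bytecode implementation for one opcode ... */
}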

what is "stack alignment"?

What is stack alignment?
Why is it used?
Can it be controlled by compiler settings?
The details of this question are taken from a problem faced when trying to use ffmpeg libraries with msvc; however, what I'm really interested in is an explanation of what "stack alignment" is.
The Details:
When running my msvc compiled program which links to avcodec I get the
following error: "Compiler did not align stack variables. Libavcodec has
been miscompiled", followed by a crash in avcodec.dll.
avcodec.dll was not compiled with msvc, so I'm unable to see what is going on inside.
When running ffmpeg.exe and using the same avcodec.dll everything works well.
ffmpeg.exe was not compiled with msvc, it was compiled with gcc / mingw (same as avcodec.dll)
Thanks,
Dan
Alignment of variables in memory (a short history).
In the past, computers had an 8-bit data bus. This means that each clock cycle, 8 bits of information could be processed. Which was fine then.
Then came 16-bit computers. Due to backward compatibility and other issues, the 8-bit byte was kept and the 16-bit word was introduced. Each word was 2 bytes, and each clock cycle 16 bits of information could be processed. But this posed a small problem.
Let's look at a memory map:
+----+
|0000|
|0001|
+----+
|0002|
|0003|
+----+
|0004|
|0005|
+----+
| .. |
At each address there is a byte which can be accessed individually.
But words can only be fetched at even addresses. So if we read a word at 0000, we read the bytes at 0000 and 0001. But if we want to read the word at position 0001, we need two read accesses. First 0000,0001 and then 0002,0003 and we only keep 0001,0002.
Of course this took some extra time, and that was not appreciated. That's why they invented alignment: we store word variables at word boundaries and byte variables at byte boundaries.
For example, if we have a structure with a byte field (B) and a word field (W) (and a very naive compiler), we get the following:
+----+
|0000| B
|0001| W
+----+
|0002| W
|0003|
+----+
Which is not fun. But when using word alignment we find:
+----+
|0000| B
|0001| -
+----+
|0002| W
|0003| W
+----+
Here memory is sacrificed for access speed.
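The same trade-off is easy to observe in C (a small sketch; the exact padding is ABI-dependent, but this layout is typical):

#include <stdio.h>
#include <stdint.h>

struct Example {
    uint8_t  b;  /* offset 0 */
    /* one byte of padding here so w starts on a 2-byte boundary */
    uint16_t w;  /* offset 2 */
};

int main(void)
{
    printf("sizeof(struct Example) = %zu\n", sizeof(struct Example));  /* typically 4, not 3 */
    return 0;
}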
You can imagine that when using double words (4 bytes) or quad words (8 bytes) this is even more important. That's why with most modern compilers you can choose which alignment to use when compiling the program.
Some CPU architectures require specific alignment of various datatypes, and will throw exceptions if you don't honor this rule. In standard mode, x86 doesn't require this for the basic data types, but can suffer performance penalties (check www.agner.org for low-level optimization tips).
However, the SSE instruction set (often used for high-performance audio/video processing) has strict alignment requirements, and will throw exceptions if you attempt to use it on unaligned data (unless you use the unaligned versions, which on some processors are much slower).
Your issue is probably that one compiler expects the caller to keep the stack aligned, while the other expects the callee to align the stack when necessary.
EDIT: as for why the exception happens, a routine in the DLL probably wants to use SSE instructions on some temporary stack data, and fails because the two different compilers don't agree on calling conventions.
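For intuition, a library can detect this at run time with a check along these lines (a hypothetical sketch, not the actual libavcodec code):

#include <stdint.h>
#include <stdio.h>

static void check_stack_alignment(void)
{
    float probe[4];  /* SSE code would want this 16-byte aligned */
    if (((uintptr_t)probe) & 15)
        fprintf(stderr, "stack variable is not 16-byte aligned\n");
}

If the compiler that built the calling code only kept the stack 4-byte aligned, the check fires, which is essentially what the "Libavcodec has been miscompiled" message is reporting.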
IIRC, stack alignment is when variables are placed on the stack "aligned" to a particular number of bytes. So if you are using a 16 bit stack alignment, each variable on the stack is going to start from a byte that is a multiple of 2 bytes from the current stack pointer within a function.
This means that if you use a variable that is < 2 bytes, such as a char (1 byte), there will be 8 bits of unused "padding" between it and the next variable. This allows certain optimisations with assumptions based on variable locations.
When calling functions, one method of passing arguments to the next function is to place them on the stack (as opposed to placing them directly into registers). Whether or not alignment is being used here is important, as the calling function places the variables on the stack, to be read off by the called function using offsets. If the calling function aligns the variables and the called function expects them to be non-aligned, the called function won't be able to find them.
It seems that the msvc compiled code is disagreeing about variable alignment. Try compiling with all optimisations turned off.
As far as I know, compilers don't typically align variables that are on the stack. The library may be depending on some set of compiler options that isn't supported by your compiler. The normal fix is to declare the variables that need to be aligned as static, but if you go about doing this in other people's code, you'll want to be sure that the variables in question are initialized later in the function rather than in the declaration.
// Some compilers won't align this as it's on the stack...
__declspec(align(32)) int needsToBe32Aligned = 0;
// Change to
static __declspec(align(32)) int needsToBe32Aligned;
needsToBe32Aligned = 0;
Alternately, find a compiler switch that aligns the variables on the stack. Obviously the "__declspec" align syntax I've used here may not be what your compiler uses.