Finding Memory Size in Boot without DOS, Windows, Linux - c++

I am writing a simple program in assembly (NASM). When the boot sector loads, it has to display the total memory (RAM) installed in the computer in megabytes. There will be no operating system (DOS, Windows, Linux) when the boot sector is loaded, so how can I find out the total RAM size? I have 2 GB of RAM in my computer. I have searched a lot on the internet but could not find a solution.
Is there a BIOS interrupt that reports a memory size of 2 GB? There is an interrupt that was used on old computers to report memory, but it does not show all 2 GB. I checked Ralf Brown's Interrupt List and found no solution there. Maybe someone knows a lot more about the BIOS. If the BIOS does not provide this facility, can I use C/C++ to find the total RAM size, and call that C/C++ code from assembly? Which C/C++ function would be used to find the total RAM size?
Remember that my assembly code runs from a cold boot, so there will be no operating system to provide any services to my code.
EDITED:
I read the page http://wiki.osdev.org/Detecting_Memory_%28x86%29 and decided to check whether INT 0x15 works. I took the code from that page and edited it to test whether INT 0x15 with EAX = 0xE820 works. It fails, and the output is 'F' from .failed1. 'F' is the test case I added to detect "unsupported function"; the test cases are 'F', 'G' and 'H'. Here is the code.
; use the INT 0x15, eax= 0xE820 BIOS function to get a memory map
; inputs: es:di -> destination buffer for 24 byte entries
; outputs: bp = entry count, trashes all registers except esi
do_e820:
xor ebx, ebx ; ebx must be 0 to start
xor bp, bp ; keep an entry count in bp
mov edx, 0x0534D4150 ; Place "SMAP" into edx
mov eax, 0xe820
mov [es:di + 20], dword 1 ; force a valid ACPI 3.X entry
mov ecx, 24 ; ask for 24 bytes
int 0x15
jc short .failed1 ; carry set on first call means "unsupported function"
mov edx, 0x0534D4150 ; Some BIOSes apparently trash this register?
cmp eax, edx ; on success, eax must have been reset to "SMAP"
jne short .failed2
test ebx, ebx ; ebx = 0 implies list is only 1 entry long (worthless)
je short .failed3
jmp short .jmpin
.e820lp:
mov eax, 0xe820 ; eax, ecx get trashed on every int 0x15 call
mov [es:di + 20], dword 1 ; force a valid ACPI 3.X entry
mov ecx, 24 ; ask for 24 bytes again
int 0x15
jc short .e820f ; carry set means "end of list already reached"
mov edx, 0x0534D4150 ; repair potentially trashed register
.jmpin:
jcxz .skipent ; skip any 0 length entries
cmp cl, 20 ; got a 24 byte ACPI 3.X response?
jbe short .notext
test byte [es:di + 20], 1 ; if so: is the "ignore this data" bit clear?
je short .skipent
.notext:
mov ecx, [es:di + 8] ; get lower dword of memory region length
or ecx, [es:di + 12] ; "or" it with upper dword to test for zero
jz .skipent ; if length qword is 0, skip entry
inc bp ; got a good entry: ++count, move to next storage spot
add di, 24
.skipent:
test ebx, ebx ; if ebx resets to 0, list is complete
jne short .e820lp
.e820f:
mov [mmap_ent], bp ; store the entry count
clc ; we may arrive here via jc at the end of the list, so clear the carry
mov ah, 0x0E ; Teletype command
mov bh, 0x00 ; Page number
mov bl, 0x07 ; Attributes (7 == white foreground, black background)
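mov al, [mmap_ent] ; load the stored count (the value, not the label's address)
add al, '0' ; convert to an ASCII digit (only correct for counts 0-9)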
int 0x10
ret
.failed1:
push eax
push ebx
mov ah, 0x0E ; Teletype command
mov bh, 0x00 ; Page number
mov bl, 0x07 ; Attributes (7 == white foreground, black background)
mov al, 70 ; Character 'F' to print
int 0x10
pop ebx
pop eax
stc ; "function unsupported" error exit
ret
.failed2:
push eax
push ebx
mov ah, 0x0E ; Teletype command
mov bh, 0x00 ; Page number
mov bl, 0x07 ; Attributes (7 == white foreground, black background)
mov al, 71 ; Character 'G' to print
int 0x10
pop ebx
pop eax
stc ; "function unsupported" error exit
ret
.failed3:
push eax
push ebx
mov ah, 0x0E ; Teletype command
mov bh, 0x00 ; Page number
mov bl, 0x07 ; Attributes (7 == white foreground, black background)
mov al, 72 ; Character 'H' to print
int 0x10
pop ebx
pop eax
stc ; "function unsupported" error exit
ret
mmap_ent dw 0 ; a word, because the 16-bit bp is stored here
failmsg db 0
failmem db 'Failed', 0
;times 512-($-$$) db 0
;dw 0xAA55
EDITED:
I assembled it with nasm memext.asm -o memext.com -l memext.lst, used MagicISO to make a bootable image file, memext.iso, and burned it to a DVD-RW with the Windows disc burner. In Oracle VM I created a new virtual machine with 256 MB RAM, a CD/DVD drive and a 2 GB hard disk, then booted from the DVD for a cold-boot test. It does not print anything.
I also opened a command console and typed memext, and it printed 'F'.

You will need to read the ACPI tables on a PC (or other machines that support ACPI).
Note that this will not give you the total size as one number, but give you the memory size of each region of memory - on a simple machine, that may just be two or three regions (there are holes of "not real memory" at the 0xA0000-0xFFFFF and wherever the BIOS decides to put the "PCI-hole").
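To turn a region list into the single megabyte figure the question asks for, the usable regions have to be added up. The C++ sketch below assumes the 24-byte entry layout that do_e820 in the question stores (base, length, type, ACPI 3.x attributes) and that entries marked "ignore" were already skipped by that routine; the sample map is made up for illustration, and real BIOSes report different regions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout of one 24-byte entry, matching the offsets used by do_e820:
// base at +0, length at +8, type at +16, ACPI 3.x extended attributes at +20.
struct E820Entry {
    std::uint64_t base;
    std::uint64_t length;
    std::uint32_t type;      // 1 = usable RAM; other values are reserved, ACPI, etc.
    std::uint32_t acpi_ext;  // not checked here: do_e820 already skips "ignore" entries
};
static_assert(sizeof(E820Entry) == 24, "E820 entries are 24 bytes");

// Sum the lengths of all usable (type 1) regions.
std::uint64_t usable_bytes(const std::vector<E820Entry>& map) {
    std::uint64_t total = 0;
    for (const E820Entry& e : map) {
        if (e.type == 1) total += e.length;
    }
    return total;
}

// Whole megabytes (1 MB = 1024 * 1024 bytes), rounded down.
std::uint64_t usable_megabytes(const std::vector<E820Entry>& map) {
    return usable_bytes(map) / (1024 * 1024);
}

// Hypothetical map for a 2 GB machine, for illustration only.
std::vector<E820Entry> sample_map() {
    return {
        {0x00000000, 0x0009FC00, 1, 1},  // conventional memory
        {0x0009FC00, 0x00000400, 2, 1},  // reserved (extended BIOS data area)
        {0x000F0000, 0x00010000, 2, 1},  // BIOS ROM inside the 0xA0000-0xFFFFF hole
        {0x00100000, 0x7FEF0000, 1, 1},  // RAM above 1 MB
        {0x7FFF0000, 0x00010000, 3, 1},  // ACPI reclaimable
    };
}
```

With the made-up map, usable_megabytes(sample_map()) comes out at 2047 rather than 2048, because the holes and the reserved/ACPI regions are not counted.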
I suspect it won't be entirely trivial to fit the ACPI reader into a single sector, considering that only around 400 bytes of the boot sector are available (although if you skip the partition table completely, I suppose you can use almost all of the 512 bytes).
As to "how to call C/C++": you will not be able to fit any meaningful C or C++ program into less than several sectors. You will need to look at an OS bootloader and see how it sets up the environment that compiled code expects (in many cases you will also need special tools to produce code for a particular address, so that it can be loaded into memory and executed directly). This page may help with that (I haven't read through all of it; it may even tell you how much memory you have): http://www.codeproject.com/Articles/36907/How-to-develop-your-own-Boot-Loader

EDIT: my mistake, the wiki is correct, just leaving this here because...
Looks like there's a typo in the wiki - the line:
mov edx,0x0534D4150
should look like:
mov edx,0x050414D53
Notice the bytes are in reverse order (since x86 is little endian).
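Why the wiki's constant is in fact correct: the BIOS compares the value in the register, so EDX must hold 0x534D4150, whose bytes read from the most significant end are 'S', 'M', 'A', 'P'. Only the copy of that value in memory is byte-reversed on little-endian x86. (In NASM, mov edx, 'PAMS' assembles to the same constant, because NASM packs multi-character constants little-endian.) A small C++ check, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint32_t kSmap = 0x534D4150;  // value the BIOS expects in EDX

// Characters of the value read from the most significant byte down.
std::string as_big_endian_chars(std::uint32_t v) {
    std::string s;
    for (int shift = 24; shift >= 0; shift -= 8)
        s += static_cast<char>((v >> shift) & 0xFF);
    return s;
}

// Bytes in the order a little-endian machine such as x86 stores them in memory.
std::string as_little_endian_bytes(std::uint32_t v) {
    std::string s;
    for (int shift = 0; shift <= 24; shift += 8)
        s += static_cast<char>((v >> shift) & 0xFF);
    return s;
}
```

as_big_endian_chars(kSmap) gives "SMAP" and as_little_endian_bytes(kSmap) gives "PAMS", so the reversed constant 0x50414D53 would only be right if the comparison were done on bytes in memory.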

Related

Compiler generates costly MOVZX instruction

My profiler has identified the following function as the hotspot.
typedef unsigned short ushort;
bool isInteriorTo( const std::vector<ushort>& point , const ushort* coord , const ushort dim )
{
for( unsigned i = 0; i < dim; ++i )
{
if( point[i + 1] >= coord[i] ) return false;
}
return true;
}
In particular, one assembly instruction, MOVZX (Move with Zero-Extend), is responsible for the bulk of the runtime. The if statement is compiled into:
mov rcx, QWORD PTR [rdi]
lea r8d, [rax+1]
add rsi, 2
movzx r9d, WORD PTR [rsi-2]
mov rax, r8
cmp WORD PTR [rcx+r8*2], r9w
jae .L5
I'd like to coax the compiler out of generating this instruction, but I suppose I first need to understand why it is generated. Why the widening/zero extension, considering that I'm working with the same data type?
(Find the entire function on godbolt compiler explorer.)
Thank you for the good question!
Clearing Registers and Dependency Breaking Idioms
A quote from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Section 3.5.1.8:
Code sequences that modifies partial register can experience some delay in its dependency chain, but can be avoided by using dependency breaking idioms. In processors based on Intel Core microarchitecture, a number of instructions can help clear execution dependency when software uses these instructions to clear register content to zero. Break dependences on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.
Assembly/Compiler Coding Rule 37. (M impact, MH generality): Break dependences on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.
movzx vs mov
The compiler knows that movzx is not costly and uses it as often as possible. It may take more bytes to encode movzx than mov, but it is not expensive to execute.
Counterintuitively, a program with movzx (which writes the entire register) actually runs faster than one with plain mov, which sets only the lower part of the register.
Let me demonstrate this on the following code fragment. It is part of the code that implements CRC-32 calculation using the Slicing-by-N algorithm. Here it is:
movzx ecx, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
movzx ecx, bl
shr ebx, 8
xor eax, dword ptr [ecx * 4 + edi + 1024 * 2]
movzx ecx, bl
shr ebx, 8
xor eax, dword ptr [ecx * 4 + edi + 1024 * 1]
; ... 6 more similar movzx/shr/xor triplets skipped ...
dec <counter register>
jnz <loop start> ; repeat the whole loop again
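For readers who don't know the algorithm, here is a minimal C++ sketch of table-driven CRC-32 in the same style, written as slicing-by-4 to keep it short (the fragment above appears to use more tables). Each table holds 256 four-byte entries, 1024 bytes, which matches the 1024 * k offsets in the assembly, and each (crc >> 8*k) & 0xFF byte extraction plays the role of one movzx ecx, bl / shr ebx, 8 pair. It assumes a little-endian host such as x86; make_tables and crc32_slice4 are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Tables = std::array<std::array<std::uint32_t, 256>, 4>;

// t[0] is the classic byte-at-a-time table (reflected polynomial 0xEDB88320);
// t[k] advances a byte through k additional zero bytes.
Tables make_tables() {
    Tables t{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        t[0][i] = c;
    }
    for (std::uint32_t i = 0; i < 256; ++i)
        for (int s = 1; s < 4; ++s)
            t[s][i] = (t[s - 1][i] >> 8) ^ t[0][t[s - 1][i] & 0xFF];
    return t;
}

std::uint32_t crc32_slice4(const unsigned char* p, std::size_t n) {
    static const Tables t = make_tables();
    std::uint32_t crc = 0xFFFFFFFFu;
    while (n >= 4) {
        std::uint32_t w;
        std::memcpy(&w, p, 4);  // little-endian load, as on x86
        crc ^= w;
        // Four independent byte extractions and table lookups per step.
        crc = t[3][crc & 0xFF] ^ t[2][(crc >> 8) & 0xFF] ^
              t[1][(crc >> 16) & 0xFF] ^ t[0][crc >> 24];
        p += 4;
        n -= 4;
    }
    while (n--)  // leftover bytes, one at a time
        crc = (crc >> 8) ^ t[0][(crc ^ *p++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}
```

Running crc32_slice4 over the nine bytes "123456789" should give the standard CRC-32 check value 0xCBF43926.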
Here is the second code fragment. We have cleared ecx in advance, and now, instead of “movzx ecx, bl”, we just do “mov cl, bl”:
; ecx has already been cleared to 0 here
mov cl, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
mov cl, bl
shr ebx, 8
xor eax, dword ptr [ecx * 4 + edi + 1024 * 2]
mov cl, bl
shr ebx, 8
xor eax, dword ptr [ecx * 4 + edi + 1024 * 1]
; ... and so on, as in example #1
Now guess which of the two code fragments above runs faster. Did you previously think that the speed is the same, or that the movzx version is slower? In fact, the movzx code is faster, because all CPUs since the Pentium Pro perform out-of-order execution of instructions and register renaming.
Register Renaming
Register renaming is a technique used internally by a CPU that eliminates the false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them.
Let me just take the first 4 instructions from the first code fragment:
movzx ecx, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
movzx ecx, bl
As you see, instruction 4 depends on instruction 2. Instruction 4 does not rely on the result of instruction 3.
So the CPU could execute instructions 3 and 4 in parallel, but instruction 3 reads the register that instruction 4 overwrites (a write-after-read, or false, dependency), so without renaming, instruction 4 could only start executing after instruction 3 had read ecx. Let us then rename the register ecx to edx after the first triplet to avoid this dependency:
movzx ecx, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
movzx edx, bl
shr ebx, 8
xor eax, dword ptr [edx * 4 + edi + 1024 * 2]
movzx ecx, bl
shr ebx, 8
xor eax, dword ptr [ecx * 4 + edi + 1024 * 1]
Here is what we have now:
movzx ecx, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
movzx edx, bl
Now instruction 4 in no way uses any register needed for instruction 3, and vice versa, so instructions 3 and 4 can execute simultaneously for sure!
This is what the CPU does for us. When translating instructions into micro-operations (micro-ops) for the out-of-order engine, the CPU renames the registers internally to eliminate these dependencies, so the micro-ops deal with renamed internal registers rather than with the architectural ones as we know them. Thus we don't need to rename registers ourselves, as I did in the example above: the CPU automatically renames everything while translating instructions to micro-ops.
The micro-ops of instruction 3 and instruction 4 will be executed in parallel, since the micro-ops of instruction 4 deal with an entirely different internal register (exposed to the outside as ecx) than the micro-ops of instruction 3, so we don't need to rename anything.
Let me revert the code to the initial version. Here it is:
movzx ecx, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
movzx ecx, bl
(Instructions 3 and 4 run in parallel because the ecx of instruction 3 is not the same ecx as in instruction 4 but a different, renamed register: the CPU has automatically allocated a fresh register from its pool of internal registers for the micro-ops of instruction 4.)
Now let us go back to movzx vs mov.
movzx writes the entire register, so the CPU knows for sure that the result does not depend on any previous value left in the upper bits of the register. When the CPU sees the movzx instruction, it knows that it can safely rename the register internally and execute the instruction in parallel with previous instructions. Now take the first 4 instructions from our example #2, where we use mov rather than movzx:
mov cl, bl
shr ebx, 8
mov eax, dword ptr [ecx * 4 + edi + 1024 * 3]
mov cl, bl
In this case, instruction 4, by modifying cl, modifies bits 0-7 of ecx, leaving bits 8-31 unchanged. Thus the CPU cannot simply rename the register for instruction 4 and allocate another, fresh register, because instruction 4 depends on bits 8-31 left by previous instructions. The CPU has to preserve bits 8-31 before it can execute instruction 4, so it cannot just rename the register; it will wait until instruction 3 completes before executing instruction 4. Instruction 4 did not become fully independent: it depends on both the previous value of ecx and the value of bl, that is, on two registers at once. Had we used movzx, it would have depended on just one register, bl. Consequently, instructions 3 and 4 do not run in parallel because of their interdependence. Sad but true.
That's why it is generally faster to operate on complete registers. Suppose we need to modify only part of a register. In that case, it's quicker to write the entire register (for example, with movzx), letting the CPU know for sure that the register no longer depends on its previous value. Writing complete registers allows the CPU to rename the register and lets the out-of-order engine execute this instruction together with other instructions, rather than one by one.
The movzx instruction zero-extends a value into a larger register. In your case, a word (two bytes) is zero-extended into a dword (four bytes). Zero-extending itself is usually free; the slow part is loading the memory operand WORD PTR [rsi-2] from RAM.
To speed this up, you can try to ensure that the datum you want to fetch from RAM is in the L1 cache at the time you need it. You can do this by placing strategic prefetch intrinsics into an appropriate place. For example, assuming that one cache line is 64 bytes, you could add a prefetch intrinsic to fetch array entry i + 32 every time you go through the loop.
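A sketch of the prefetch idea applied to the function from the question, assuming GCC or Clang (__builtin_prefetch is a compiler builtin; MSVC would use _mm_prefetch from <xmmintrin.h> instead) and 64-byte cache lines. isInteriorToPrefetch and kPrefetchAhead are hypothetical names, and whether prefetching helps at all depends on dim and the access pattern, so it should be checked with the profiler.

```cpp
#include <cassert>
#include <vector>

typedef unsigned short ushort;

// Prefetch distance in elements: 32 ushorts = 64 bytes, one cache line
// ahead (assumes 64-byte lines, as in the text above).
constexpr unsigned kPrefetchAhead = 32;

// Same logic as isInteriorTo; point must hold at least dim + 1 elements.
bool isInteriorToPrefetch(const std::vector<ushort>& point, const ushort* coord, const ushort dim)
{
    for (unsigned i = 0; i < dim; ++i) {
        if (i + kPrefetchAhead < dim) {
            // Hints only: they never change the result, only cache behaviour.
            __builtin_prefetch(&coord[i + kPrefetchAhead]);
            __builtin_prefetch(&point[i + 1 + kPrefetchAhead]);
        }
        if (point[i + 1] >= coord[i]) return false;
    }
    return true;
}
```

The bounds check keeps the prefetch addresses inside both arrays, so the sketch stays well-defined even for small dim (where it simply never prefetches).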
You can also consider an algorithmic improvement such that less data needs to be fetched from memory, but that seems unlikely to be possible.

Which, in x86, is more efficient? Using a variable or lea of the variable? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I have a section of x86 code inside some C++ code:
void encrypt_chars (int lengthW, char EKey)
{
__asm { //
xor esi, esi //zeroise esi
mov edi, lengthW //store the max loop counter in a register
for:
movzx ecx, OChars[esi] //store the character to encrypt
lea eax, EKey //by ref
movzx ebx, byte ptr[eax] //store the EKey value in EBX as a keep safe for when the original is changed later
sub ecx, 0x0A //change the current characters hex value by -10 (denary)
and byte ptr[eax], 0xAA //and EKey with 170(denary) to get an encryption value
not byte ptr[eax] //not the encryption value to obtain a different value
movzx edx, byte ptr[eax] //store the encryption value in EDX
or ebx, 0xAA //create a second encryption value
add bl, dl //add the values in the last 8 bits of EBX and EDX (the two encryption values), store them in the last 8 bits of EBX (ignores the 9th bit from carry)
xor ecx, ebx //encrypt the original letter with the encryption value
rol cl, 2 //further encryption by rotating the low 8 bits of ECX (CL) 2 bits to the left
mov EChars[esi], cl //store the encrypted character in EChars
inc esi //increment loop counter
cmp esi, edi //compare loop counter and the max number of loops
jl for //jump if esi is less than the loop counter
}
return;
}
My question is, which is more efficient, to use the lea into eax then use a pointer, or use the variable itself instead of all of the byte ptr[eax].
I know that lea is a very quick instruction, but I'm unsure as to whether referencing it in memory is more efficient than just using the variable.
Using a register (not necessarily EAX) is better when you have multiple accesses to the given variable and the variable is global, i.e. addressed by an absolute address.
In the code from the question, the variables are function arguments, addressed relative to ESP or EBP (depending on the compiler), so accessing them by name costs the same as going through EAX.
So using the variables by name will save one instruction (lea eax, EKey) in the inner loop, and the code will be a little bit faster.
Notice how inline assembly makes the code less readable and more obscure, because of the hidden code generated by the compiler. It is better to write everything in assembly language and then link the assembled object file into your C program.
It seems that most of this code performs 8-bit operations, and if the key is 8 bits, why not just load it into al? You could also get rid of the indexed offsets for a slight improvement in speed.
__asm {
lea esi, OChars
mov edi, lengthW
add edi, esi
mov al, EKey
for:
mov cl, [esi]
mov bl, al
sub cl, 0x0a
...
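When hand-optimizing this loop (loading the key into al, dropping the indexed addressing, and so on), it helps to have a plain C++ model of what one iteration computes, so each rewritten version can be checked against it. The sketch below is my reading of the __asm block in the question, with hypothetical names encrypt_model and rol8; note that EKey itself is updated on every iteration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Rotate an 8-bit value left by n bits (what "rol cl, n" does to CL).
std::uint8_t rol8(std::uint8_t v, unsigned n) {
    n &= 7;
    return static_cast<std::uint8_t>((v << n) | (v >> ((8 - n) & 7)));
}

// Plain C++ model of the __asm loop in encrypt_chars.
std::vector<std::uint8_t> encrypt_model(const std::string& input, std::uint8_t key) {
    std::vector<std::uint8_t> out;
    out.reserve(input.size());
    for (char ch : input) {
        std::uint8_t c = static_cast<std::uint8_t>(ch);           // movzx ecx, OChars[esi]
        const std::uint8_t old_key = key;                         // movzx ebx, byte ptr[eax]
        c = static_cast<std::uint8_t>(c - 0x0A);                  // sub ecx, 0x0A (only CL survives)
        key = static_cast<std::uint8_t>(~(key & 0xAA));           // and / not byte ptr[eax]
        const std::uint8_t k2 =
            static_cast<std::uint8_t>((old_key | 0xAA) + key);    // or ebx, 0xAA / add bl, dl
        c ^= k2;                                                  // xor ecx, ebx
        out.push_back(rol8(c, 2));                                // rol cl, 2 / mov EChars[esi], cl
    }
    return out;
}
```

With a key of 0, encrypting "AAA" should give the bytes 0x7A, 0x8D, 0x27; the key changes after each character, which is why identical input characters encrypt differently.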

Simple encryption Assembly Program - Access violation writing to memory location

I had to implement the cdecl calling convention in this program, which originally used a non-standard convention. As far as I can tell it looks right, but I get an unhandled exception saying "Access violation writing location 0x00000066", which seems to happen when the program reaches the line "not byte ptr[eax]", or at least that is where the arrow points after breaking the program.
Could anyone tell me what is wrong with my program and how I may fix it? Thank you.
void encrypt_chars (int length, char EKey)
{ char temp_char;
for (int i = 0; i < length; i++)
{
temp_char = OChars [i];
__asm {
push eax
movzx eax, temp_char
push eax
lea eax, EKey
push eax
call encrypt
mov temp_char, al
pop eax
}
EChars[i] = temp_char;
}
return;
// Inputs: register EAX = 32-bit address of Ekey,
// ECX = the character to be encrypted (in the low 8-bit field, CL).
// Output: register EAX = the encrypted value of the source character (in the low 8-bit field, AL).
__asm {
encrypt:
push ebp
mov ebp, esp
mov ecx, 8[ebp]
mov eax, 12[ebp]
push edi
push ecx
not byte ptr[eax]
add byte ptr[eax], 0x04
movzx edi, byte ptr[eax]
pop eax
xor eax, edi
pop edi
rol al, 1
rol al, 1
add al, 0x04
mov esp, ebp
pop ebp
ret
}
}
By inspection, the comment on the encrypt function is wrong. Remember: the stack grows down, so when the arguments are pushed onto the stack, the ones pushed first have the higher address and, therefore, the higher offset from the base pointer in the stack frame.
The comment to encrypt says:
// Inputs: register EAX = 32-bit address of Ekey,
// ECX = the character to be encrypted
However, your calling sequence is:
movzx eax, temp_char ; push the char to encrypt FIRST
push eax
lea eax, EKey ; push the encryption key SECOND
push eax
call encrypt
So the character is pushed first and the EKey pointer second. But encrypt loads them this way:
; On function entry, the old Instruction Pointer (4 bytes) is pushed onto the stack
; so now the EKey is +4 bytes from the stack pointer
; and the character is +8 bytes from the stack pointer
;
push ebp
mov ebp, esp
; We just pushed another 4 bytes onto the stack (the old ebp value)
; and THEN we put the stack pointer (esp) into ebp as base pointer
; to the stack frame.
;
; That means EKey is now +8 bytes off of the base pointer
; and the char to encrypt is +12 off of the base pointer
;
mov ecx, 8[ebp] ; This loads EKey pointer to ECX
mov eax, 12[ebp] ; This loads char-to-encrypt to EAX
The code then dereferences EAX as if it were the EKey pointer, but EAX actually holds the character to encrypt, so the first dereference causes an access violation (the faulting address 0x00000066 looks like the ASCII code of 'f', a character rather than a pointer). That happens here:
not byte ptr[eax]
So your debugger pointer was right! :)
You can fix it just by swapping these two registers:
mov eax, 8[ebp] ; This loads EKey pointer to EAX
mov ecx, 12[ebp] ; This loads char-to-encrypt to ECX
Finally, your call to encrypt doesn't clean up the stack afterwards. Under cdecl the caller removes the arguments: you pushed 8 bytes of arguments before calling encrypt, and encrypt returns with a plain ret and no stack clean-up, so you need to clean up after the call:
...
call encrypt
add esp, 8
...
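The offsets in the corrected prologue and the need for add esp, 8 can be checked with a toy model of the 32-bit stack. This is only a simulation, not real x86 code: Stack32, CallTrace and simulate_call are hypothetical names, the return address, old ebp and key address are placeholder values, and only the two argument pushes are modelled (the extra push eax/pop eax around the call is left out).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of a 32-bit x86 stack: push moves esp down by 4 and stores a dword.
struct Stack32 {
    std::uint32_t esp = 0x1000;                  // arbitrary starting address
    std::map<std::uint32_t, std::uint32_t> mem;  // address -> stored dword
    void push(std::uint32_t v) { esp -= 4; mem[esp] = v; }
    std::uint32_t pop() { std::uint32_t v = mem.at(esp); esp += 4; return v; }
};

struct CallTrace {
    std::uint32_t at_ebp_plus_8;      // what "mov eax, 8[ebp]" reads
    std::uint32_t at_ebp_plus_12;     // what "mov ecx, 12[ebp]" reads
    std::uint32_t esp_before_pushes;
    std::uint32_t esp_after_ret;      // before the caller's "add esp, 8"
    std::uint32_t esp_after_cleanup;
};

CallTrace simulate_call(std::uint32_t ch, std::uint32_t key_addr) {
    Stack32 s;
    CallTrace t{};
    t.esp_before_pushes = s.esp;
    s.push(ch);           // movzx eax, temp_char / push eax (pushed FIRST)
    s.push(key_addr);     // lea eax, EKey / push eax (pushed SECOND)
    s.push(0xCAFE0000u);  // call encrypt: placeholder return address
    s.push(0xBEEF0000u);  // push ebp: placeholder old ebp
    const std::uint32_t ebp = s.esp;  // mov ebp, esp
    t.at_ebp_plus_8 = s.mem.at(ebp + 8);
    t.at_ebp_plus_12 = s.mem.at(ebp + 12);
    s.esp = ebp;          // mov esp, ebp
    s.pop();              // pop ebp
    s.pop();              // ret (pops the return address)
    t.esp_after_ret = s.esp;
    s.esp += 8;           // add esp, 8: the caller removes its two arguments
    t.esp_after_cleanup = s.esp;
    return t;
}
```

Calling simulate_call(0x66, some_address) shows the key pointer at [ebp+8], the character at [ebp+12], and esp still 8 bytes low after ret until add esp, 8 restores it.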

Porting old compiler ftol (float to long) function to C

I'm relying on an old implementation that does some calculations and converts floats to int.
However, after replicating the calculations, some values are off due to different rounding results.
It boils down to the binary using the following code to convert a float to an int (long).
lea ecx, [esp+var_8] ; Load Effective Address
sub esp, 10h ; Integer Subtraction
and ecx, 0FFFFFFF8h ; Logical AND
fld st ; Load Real
fistp qword ptr [ecx] ; Store Integer and Pop
fild qword ptr [ecx] ; Load Integer
mov edx, [ecx+4]
mov eax, [ecx]
test eax, eax ; Logical Compare
jz short loc_3 ; Jump if Zero (ZF=1)
loc_1:
fsubp st(1), st ; Subtract Real and Pop
test edx, edx ; Logical Compare
jz short loc_2 ; Jump if Zero (ZF=1)
fstp dword ptr [ecx] ; Store Real and Pop
mov ecx, [ecx]
add esp, 10h ; Add
xor ecx, 80000000h ; Logical Exclusive OR
add ecx, 7FFFFFFFh ; Add
adc eax, 0 ; Add with Carry
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_2:
fstp dword ptr [ecx] ; Store Real and Pop
mov ecx, [ecx]
add esp, 10h ; Add
add ecx, 7FFFFFFFh ; Add
sbb eax, 0 ; Integer Subtraction with Borrow
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_3:
test edx, 7FFFFFFFh ; Logical Compare
jnz short loc_1 ; Jump if Not Zero (ZF=0)
fstp dword ptr [ecx] ; Store Real and Pop
fstp dword ptr [ecx] ; Store Real and Pop
add esp, 10h ; Add
retn ; Return Near from Procedure
This behaves differently than simply doing unsigned int var = (unsigned int)floatVal;.
I believe this is an old ftol implementation, written this way because converting from float to int was very slow when the compiler needed to change the FPU rounding mode.
It looks very similar to this one http://www.libsdl.org/release/SDL-1.2.15/src/stdlib/SDL_stdlib.c
Can anyone help me convert the function to C? Or tell me how I can create an inline ASM function in Visual Studio that takes a float parameter and returns an int? The one in SDL_stdlib.c has no header, and I'm not sure how to call it without function arguments.
This doesn't exactly answer the questions as asked, but I wanted to start by trying a more thorough English translation. It's not perfect, and there are a few lines where I'm still trying to track down the intent. Everyone please speak up with questions and corrections.
lea ecx, [esp+var_8]; Load Effective Address // make ecx point to somewhere on the stack (I don't know where var_8 is being generated in this case, but I'm guessing it's set such that it makes ecx point to the local stack space allocated on the next line)
sub esp, 10h ; Integer Subtraction // make room on stack for 16 bytes of local variable -- doesn't all get used but adds padding to allow aligned loads and stores
and ecx, 0FFFFFFF8h ; Logical AND // align pointer in ecx to 8-byte boundary
fld st ; Load Real // duplicates whatever was last left (passed by calling convention) on the top of the FPU stack -- st(1) = st(0)
fistp qword ptr [ecx] ; Store Integer and Pop // convert st(0) to a *64bit* int using the current FPU rounding mode (round-to-nearest-even by default, NOT truncation), store it in the aligned 8 bytes (of local variable space?) pointed to by ecx, and pop the top value off the FPU stack
fild qword ptr [ecx] ; Load Integer // convert the rounded value back to floating point and leave it on top of the FPU stack
;// at this point:
;// - st(0) is the rounded value
;// - st(1) is still the original float.
;// - There is a 64bit integer representation pointed to by [ecx]
mov edx, [ecx+4] ; // move [bytes 4 thru 7 of integer output] to edx (most significant bytes)
mov eax, [ecx] ; // move [bytes 0 thru 3 of integer output] to eax (least significant bytes) -- makes sense, as EAX should hold integer return value in x86 calling conventions
test eax, eax ; Logical Compare // (http://stackoverflow.com/questions/13064809/the-point-of-test-eax-eax)
jz short loc_3 ; Jump if Zero (ZF=1) // if the least significant 4 bytes are zero, goto loc_3
; // else fall through to loc_1
loc_1: ;
fsubp st(1), st ; Subtract Real and Pop // st(1) = st(1) - st(0), i.e. original minus rounded, then pop. (i.e. for 1.25, st(0) ends up 0.25, and the original float is no longer on the FPU stack; for 1.75 it ends up -0.25)
test edx, edx ; Logical Compare // same trick as earlier, but for the most significant bytes now
jz short loc_2 ; Jump if Zero (ZF=1) // if the most significant 4 bytes were all zero, goto loc_2 -- (i.e. the rounded value is non-negative and fits in 32 bits); otherwise (negative or too large) fall through
fstp dword ptr [ecx] ; Store Real and Pop // else, dump the fractional portion of the original float over the least significant bytes of the 64bit integer
;// at this point:
;// - the FPU stack should be empty
;// - eax holds a copy of the least significant 4 bytes of the 64bit integer (return value)
;// - edx holds a copy of the most significant 4 bytes of the 64bit integer
;// - [ecx] points to a float holding original minus rounded (the amount the rounding dropped or added)
;// - [ecx+4] points at the most significant 4 bytes of our 64bit integer output (probably considered garbage now and not used again)
mov ecx, [ecx] ; // make ecx store what it's pointing at directly instead of the pointer to it
add esp, 10h ; Add // clean up stack space from the beginning
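xor ecx, 80000000h ; Logical Exclusive OR // flip the sign bit of the stored difference, so ecx now holds the bits of (rounded - original)
add ecx, 7FFFFFFFh ; Add // treat those bits as an integer: this carries exactly when ecx >= 80000001h, i.e. sign bit set and value not -0.0, which here means original > rounded
adc eax, 0 ; Add with Carry // add that carry: a negative value that was rounded down (away from zero) moves up by one, giving truncation toward zero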
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_2:
;// at this point: the FPU stack still holds the difference original minus rounded (the part the rounding dropped or added)
fstp dword ptr [ecx] ; Store Real and Pop // store non-integer part as float in local stack space, and remove it from the FPU stack
mov ecx, [ecx] ; // make ecx store what it's pointing at directly instead of the pointer to it
add esp, 10h ; Add // clean up stack space from the beginning
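add ecx, 7FFFFFFFh ; Add // treat the stored float's bits as an integer: this carries exactly when the sign bit is set and the value is not -0.0, i.e. when original < rounded
sbb eax, 0 ; Integer Subtraction with Borrow // subtract that carry: a non-negative value that was rounded up (away from zero) moves down by one, giving truncation toward zero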
retn ; Return Near from Procedure
; ---------------------------------------------------------------------------
loc_3:
test edx, 7FFFFFFFh ; Logical Compare // test the most significant 4 bytes, ignoring the sign bit
jnz short loc_1 ; Jump if Not Zero (ZF=0) // if any of those bits are set, go back to loc_1 for the normal correction
fstp dword ptr [ecx] ; Store Real and Pop // else the result is 0 (or the 8000000000000000h "integer indefinite" value fistp stores on overflow), which needs no correction, so empty the FPU stack
fstp dword ptr [ecx] ; Store Real and Pop // empty the FPU stack
add esp, 10h ; Add // clean up stack space from the beginning
retn ; Return Near from Procedure
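As a starting point for the conversion the question asks for, here is a C++ model that follows the annotated routine branch by branch. It is a sketch under assumptions: the input arrives as a double (a float converts to double exactly), the FPU is in its default round-to-nearest mode, std::nearbyint stands in for fistp, and overflow/NaN is not modelled; old_ftol is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

std::uint32_t old_ftol(double x)
{
    // fld st / fistp qword: round to nearest, ties to even (default mode).
    const double rounded = std::nearbyint(x);
    const std::uint64_t bits =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(rounded));  // assumes |x| < 2^63
    std::uint32_t lo = static_cast<std::uint32_t>(bits);             // eax
    const std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32); // edx

    // test eax, eax / loc_3: a zero result needs no correction.
    if (lo == 0 && (hi & 0x7FFFFFFFu) == 0) {
        return lo;
    }

    // fsubp: original minus rounded. The asm stores this as a float, but
    // only its sign (and whether it is zero) matters afterwards.
    const double diff = x - rounded;
    if (hi != 0) {
        if (diff > 0) lo += 1;   // loc_1: xor / add 7FFFFFFFh / adc eax, 0
    } else {
        if (diff < 0) lo -= 1;   // loc_2: add 7FFFFFFFh / sbb eax, 0
    }
    return lo;                   // only eax is adjusted, never edx
}
```

If the model is right, for inputs between -2^31 and 2^31 its result should match static_cast<std::uint32_t>(static_cast<std::int64_t>(x)), a portable way to get truncation plus the low 32 bits. A plain (unsigned int)floatVal, by contrast, is undefined behaviour in C and C++ when the truncated value is negative.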

Hooking into a rather big application

I have this code:
.text:0045A020 ; int __thiscall CMapConnection__OnItemOptionCombination(CMapConnection *this, _tagRequestMAP_COMPOSITION_OPTIONITEM *prcreq)
.text:0045A020 ?OnItemOptionCombination#CMapConnection##QAEHPAU_tagRequestMAP_COMPOSITION_OPTIONITEM###Z proc near
.text:0045A020
.text:0045A020 000 push ebp
.text:0045A021 004 mov ebp, esp
.text:0045A023 004 sub esp, 440h ; Integer Subtraction
.text:0045A029 444 mov eax, ___security_cookie
.text:0045A02E 444 xor eax, ebp ; Logical Exclusive OR
.text:0045A030 444 mov [ebp+var_2F0], eax
.text:0045A036 444 push esi
.text:0045A037 448 push edi
.text:0045A038 44C mov [ebp+this], ecx
.text:0045A03E 44C mov eax, [ebp+this]
.text:0045A044 44C mov ecx, [eax+534h]
.text:0045A04A 44C mov [ebp+pPlayer], ecx
.text:0045A050 44C cmp [ebp+pPlayer], 0 ; Compare Two Operands
.text:0045A057 44C jnz short loc_45A063 ; Jump if Not Zero (ZF=0)
.text:0045A057
.text:0045A059 44C mov eax, 1
.text:0045A05E 44C jmp loc_45A97B ; Jump
Long story short, I need to do the following:
- hook into the beginning of the function
- do some checks (a lot of code is required for those checks)
- based on the result of the checks, either let the function continue its normal course, or make it jump to the section where it triggers some errors, or simply stop it from advancing.
I have to do this with only a basic understanding of asm.
From what I've read I can do that with a hook, but here's my problem:
The checking function needs to read the _tagRequestMAP_COMPOSITION_OPTIONITEM *prcreq data, so it can gather some numbers.
.text:0041A464 784C mov ecx, [ebp+pPacket] ; jumptable 00417B7A case 27
.text:0041A467 784C add ecx, 4 ; Add
.text:0041A46A 784C mov [ebp+var_1874], ecx
.text:0041A470 784C mov edx, [ebp+var_1874]
.text:0041A476 784C push edx ; prcreq
.text:0041A477 7850 mov ecx, [ebp+this] ; this
.text:0041A47D 7850 call ?OnItemOptionCombination#CMapConnection##QAEHPAU_tagRequestMAP_COMPOSITION_OPTIONITEM###Z ;
Here's how the original function is called.
My questions :
How do I read the data from *prcreq in C++ code? Is it possible?
Is it possible to call another function from my hook while passing it the same parameters the hooked function received?
I don't touch the parameters of the OnItemOptionCombination function at all; do I have to restore the stack when I exit from my hook?
Since you can't "pause" the program in order to inject the DLL/so and do the checks (or at least I've never heard of such a thing) you could modify the startup code in order to loop around a variable.
While the program is spinning, perform your checks into the injected DLL/so then get the static pointer used for that variable and modify it to allow the continuation of the injected program.
This will probably take some skill to achieve.
Eagerly waiting for more answers,
Cheers.
Update:
Here's what I had in mind.
Edit the startup code of the program to spin in a loop like the following, using jmp and cmp instructions:
static bool spin = true;
while(spin){ }
Then inject your DLL/so and do your checks. Once you're done, change spin to false to let the program continue.
To change spin you'll have to find the static pointer. You can do that by studying the instructions or with a program like CheatEngine.
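If the spin loop is written in C++ rather than patched in with jmp/cmp, a plain static bool is not enough: the compiler may assume no other thread changes it and turn the loop into an infinite one. A minimal sketch with std::atomic<bool>, where g_spin, wait_for_release and release are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// std::atomic forces every iteration to re-read the flag, so a write from
// another thread (or from injected code) is actually observed.
std::atomic<bool> g_spin{true};

void wait_for_release()
{
    while (g_spin.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // avoid burning a whole core while waiting
    }
}

// What the injected code would do once its checks are finished.
void release()
{
    g_spin.store(false, std::memory_order_release);
}
```

The acquire/release pair also guarantees that anything the injected code wrote before calling release() is visible to the program once wait_for_release() returns.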
Detours Library
http://research.microsoft.com/en-us/projects/detours/
or
EasyHook
http://www.codeproject.com/Articles/27637/EasyHook-The-reinvention-of-Windows-API-hooking
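A rough shape for such a hook, assuming a library like Detours or EasyHook redirects calls to it and hands back a trampoline pointer to the original code. Once a struct matching the request is declared, *prcreq can be read like any other pointer, and the original can be called with the same arguments. As long as the hook has the same signature and calling convention as the original, the compiler manages the stack, so nothing has to be restored by hand. On 32-bit MSVC the __thiscall signature is usually matched with a __fastcall function that takes an extra, unused second parameter (ECX = this, EDX unused); that detail is left out here so the sketch stays portable. All type and field names below are hypothetical stand-ins, because the real layouts are not known from the listing.

```cpp
#include <cassert>
#include <cstdint>

struct CMapConnection;                // only ever used through a pointer
struct RequestOptionItem {            // stands in for _tagRequestMAP_COMPOSITION_OPTIONITEM
    std::uint32_t item_id;            // hypothetical field
    std::uint32_t option_id;          // hypothetical field
};

// Same parameters as the hooked function: `this` plus the request pointer.
using OnItemOptionCombinationFn = int (*)(CMapConnection* self, RequestOptionItem* prcreq);

// Set by the hooking library to a trampoline that runs the original code.
OnItemOptionCombinationFn g_original = nullptr;

// Hypothetical check; real checks would read whatever fields prcreq really has.
bool request_is_allowed(const RequestOptionItem& req)
{
    return req.option_id != 0;
}

int HookedOnItemOptionCombination(CMapConnection* self, RequestOptionItem* prcreq)
{
    if (prcreq == nullptr || !request_is_allowed(*prcreq)) {
        return 1;  // early exit; 1 mirrors "mov eax, 1" at 0045A059 (its meaning is assumed)
    }
    return g_original(self, prcreq);  // forward the untouched arguments
}
```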