I compiled a C++ DLL from the same source with Visual Studio C++ 2010 Express on both 64-bit Windows 7 and 32-bit Windows XP. An external 64-bit app on Windows 7 calls the DLL and executes properly. The equivalent 32-bit app on Windows XP crashes on return from the DLL call with stack or memory corruption.
Trying to debug this I put a breakpoint where the DLL is copying the data from some internal structures to what the external app wants, last step before returning. At a given point I'm looking at something like this in Visual Studio:
destination[i].field = source[i].field;
where both fields in the source and destination are doubles or longs.
Hovering over the source shows the correctly computed values. Hovering over the destination, before executing the statement, shows that it was properly initialized to zeros. After executing the statement the destination contains a different value, e.g. 36.3468 becomes 0.00104800000000122891, 6 becomes 10, etc.
This is strange. Maybe there is a structure element misalignment, but wouldn't that show up as a warning somewhere else? Maybe I'm stepping over memory (in the 32 bit version only!?), but then shouldn't the value be apparently correct after stepping over the assignment? Haven't stepped into machine code in a while and don't know x86/x86_64 assembly that well, do I have to do that to see what the code that does that assignment is really doing?
Here is one of the lines that seems to not execute properly and the disassembly in both the 64 bit and 32 bit versions, in that order:
destination[i].field = source[i].field;
000007FEEF6B4DD3 movsxd rax,dword ptr [i]
000007FEEF6B4DDB imul rax,rax,4A68h
000007FEEF6B4DE2 movsxd rcx,dword ptr [i]
000007FEEF6B4DEA imul rcx,rcx,30h
000007FEEF6B4DEE mov rdx,qword ptr [destination]
000007FEEF6B4DF3 mov r8,qword ptr [source]
000007FEEF6B4DF8 movsd xmm0,mmword ptr [r8+rax+4A40h]
000007FEEF6B4E02 movsd mmword ptr [rdx+rcx+8],xmm0
destination[i].field = source[i].field;
09E64361 mov eax,dword ptr [i]
09E64364 imul eax,eax,4A38h
09E6436A mov ecx,dword ptr [i]
09E6436D imul ecx,ecx,2Ch
09E64370 mov edx,dword ptr [destination]
09E64373 mov esi,dword ptr [source]
09E64376 fld qword ptr [esi+eax+4A10h]
09E6437D fstp qword ptr [edx+ecx+4]
If I step over that line in the 64 bit version, VS shows me the proper value for destination[i].field, but not in the 32 bit version. Seems that the structures have different sizes in different versions, thus different offsets and 4 vs 8 bytes in the last assignment, but shouldn't at that point VS show me the proper value?
If I step over the fld instruction in the 32-bit version, I can see that st0 is loaded with the wrong value, i.e. not what is shown for source[i].field. For i=0, eax=0 and esi=source, so the 4A10h offset is probably wrong, and/or it is computed differently in the generated code than in whatever VS uses to show me the value. How is this possible?
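One possible cause, which the question does not confirm, is that the DLL code and whatever declares the struct to the debugger or the calling app were compiled with different packing settings (for example /Zp4 or a #pragma pack left open by a header). The 32-bit listing uses a stride of 2Ch and a field offset of 4 where the 64-bit one uses 30h and 8, which would be consistent with that. A minimal sketch of how to compare layouts, using a hypothetical two-member record rather than the real structures:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical record (not the poster's real struct): a 4-byte member
// followed by a double.
struct RecordDefault {
    int    id;
    double value;
};

// The same members compiled under 4-byte packing, as /Zp4 or a
// #pragma pack left open by some header would produce.
#pragma pack(push, 4)
struct RecordPacked4 {
    int    id;
    double value;
};
#pragma pack(pop)

// Print both layouts so two builds (or two modules) can be compared.
void print_layouts() {
    // Packing can only shrink a layout, never grow it.
    assert(sizeof(RecordPacked4) <= sizeof(RecordDefault));
    std::printf("default : sizeof=%zu offsetof(value)=%zu\n",
                sizeof(RecordDefault), offsetof(RecordDefault, value));
    std::printf("pack(4) : sizeof=%zu offsetof(value)=%zu\n",
                sizeof(RecordPacked4), offsetof(RecordPacked4, value));
}
```

Printing sizeof and offsetof for the real structures from both the DLL and the calling app, and comparing the numbers with the constants in the disassembly, should show which side disagrees.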
I am writing simple programs and then analyzing them.
Today I've written this:
#include <stdio.h>
int x;
int main(void) {
    printf("Enter X:\n");
    scanf("%d", &x);
    printf("You enter %d...\n", x);
    return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x <- don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA pro 7.6.
TL;DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.
In IDA mov eax, x means mov eax, DWORD [x] which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x (i.e. putting the address of x in rax).
In 64-bit mode, displacements are still 32-bit, so it's not always possible to address a variable by encoding its absolute address (the address is 64-bit and would not fit into a 32-bit field). And in position-independent code, such as a Position Independent Executable, it's not desirable anyway.
Instead, RIP-relative addressing is used.
In NASM, RIP-relative addressing takes the form mov eax, [REL x], in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.
Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, the use or absence of RIP-relative addressing tells you whether the program can be loaded anywhere or only below 4GiB.
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register but I don't think the RIP-relative addressing implies a cs segment override.
In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.
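To make "RIP-relative" concrete: the operand's address is the address of the next instruction plus the sign-extended 32-bit displacement stored in the instruction. The sketch below is only that arithmetic; the function name and the sample addresses are made up for illustration and do not come from the binary in the question.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of a RIP-relative operand:
// address of the NEXT instruction + sign-extended 32-bit displacement.
std::uint64_t rip_relative_target(std::uint64_t instr_addr,
                                  unsigned instr_len,
                                  std::int32_t disp32) {
    assert(instr_len >= 1 && instr_len <= 15);  // x86 instructions are 1..15 bytes
    const std::uint64_t next_rip = instr_addr + instr_len;
    // Sign-extend to 64 bits first; unsigned wraparound then handles
    // negative displacements correctly.
    return next_rip + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}
```

For example, mov eax, [rip+disp32] is encoded as 8B 05 followed by four displacement bytes, so it is 6 bytes long. Placed at the made-up address 0x1000 with a displacement of 0x2EB6, it would read from rip_relative_target(0x1000, 6, 0x2EB6), i.e. 0x3EBC. No segment register enters the calculation, which is why the cs: in IDA's output is only notation.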
I am currently reading "The Ultimate Anti Debugging Reference" and I am trying to implement some of the techniques.
To check the value of NtGlobalFlag they use this code:
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
I made all the adjustments needed to run x64 assembly in Visual Studio 2017, following this tutorial.
I used this instruction to access the NtGlobalFlag
lodsq gs:[rsi]
because their syntax didn't work on Visual studio.
But still, it didn't work.
While debugging I've noticed that the value of the gs register is set to 0x0000000000000000 while the fs register is set to a real value 0x0000007595377000.
I don't understand why the value of GS was zeroed, because it should have its value set on x64.
64 bit Windows is apparently using fs to point to "per thread" memory, since gs is zero. I don't know what variables are kept in "per thread" memory, other than the seed value for rand(). You could debug a program that used rand(), and step through it in a disassembler window, to see how it is accessed.
The success of adding an anti-debugger feature to a program will depend on how much motivation there is to defeat it. The main issue is Windows remote debugging, and/or using a hacker installed device driver running in kernel mode to defeat an anti-debugger feature.
So I still don't understand why the code posted here caused so many problems. As I said, I just copied it from "The “Ultimate” Anti-Debugging Reference":
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
But I've found a simpler solution that works perfectly.
As @Peter Cordes said, I should be fine just accessing the value directly, without lodsq, like so:
mov rax, gs:[60h]
After further investigation, I found this reference, which uses this code:
mov rax, gs:[60h]
mov al, [rax+BCh]
and al, 70h
cmp al, 70h
jz being_debugged
And I modified it a little bit for my program -
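.code
GetValueFromASM proc
    mov rax, gs:[60h]       ; rax = address of the PEB
    mov al, [rax+0BCh]      ; al = low byte of NtGlobalFlag
    and al, 70h
    cmp al, 70h             ; are all three heap-debug flags set?
    jz being_debugged
    mov rax, 0              ; return 0: no debugger detected
    ret
being_debugged:
    mov rax, 1              ; return 1: debugger detected
    ret
GetValueFromASM endp
end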
Just one thing to note:
When running inside Visual Studio 2017 the result returned was 0, meaning no debugger attached, which is wrong (because I was using the Local Windows Debugger).
But when launching the process with WinDbg it did return 1, which means that it works.
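For reference, the 70h mask is the OR of three heap-debugging bits: FLG_HEAP_ENABLE_TAIL_CHECK (10h), FLG_HEAP_ENABLE_FREE_CHECK (20h) and FLG_HEAP_VALIDATE_PARAMETERS (40h). The sketch below reproduces only the bit test in portable C++. It reads from a caller-supplied byte buffer standing in for the PEB, so the function names and the buffer are illustrative assumptions, not a way to read the real PEB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Heap-debugging bits checked by "and al, 70h / cmp al, 70h".
constexpr std::uint32_t FLG_HEAP_ENABLE_TAIL_CHECK   = 0x10;
constexpr std::uint32_t FLG_HEAP_ENABLE_FREE_CHECK   = 0x20;
constexpr std::uint32_t FLG_HEAP_VALIDATE_PARAMETERS = 0x40;
constexpr std::uint32_t kDebugHeapMask =
    FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK |
    FLG_HEAP_VALIDATE_PARAMETERS;
static_assert(kDebugHeapMask == 0x70, "mask used in the post");

// Offset of NtGlobalFlag in the 64-bit PEB, as used in the post.
constexpr std::size_t kNtGlobalFlagOffset64 = 0xBC;

// Models "mov al, [rax+0BCh]" on a copy of the PEB bytes (hypothetical
// buffer supplied by the caller; this does not touch the real PEB).
std::uint8_t low_byte_of_nt_global_flag(const unsigned char* peb_bytes,
                                        std::size_t size) {
    assert(size > kNtGlobalFlagOffset64);
    return peb_bytes[kNtGlobalFlagOffset64];
}

// True only when all three heap-debugging bits are set.
bool nt_global_flag_indicates_debugger(std::uint32_t nt_global_flag) {
    return (nt_global_flag & kDebugHeapMask) == kDebugHeapMask;
}
```

Because the check depends on the debug heap flags rather than on a debugger being attached, it reports nothing when those flags are not set. That may be what happens under Visual Studio 2017, which, as far as I know, launches debuggees with the debug heap disabled by default; treat that explanation as a likely guess rather than a confirmed cause.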
I have some C code that, when given to Compiler Explorer, outputs:
mov BYTE PTR [rbp-4], al
mov eax, ecx
mov BYTE PTR [rbp-8], al
mov eax, edx
mov BYTE PTR [rbp-12], al
However, if I use GCC or G++, it gives me this:
mov BYTE PTR 16[rbp], al
mov eax, edx
mov BYTE PTR 24[rbp], al
mov eax, ecx
mov BYTE PTR 32[rbp], al
I have no idea why the BYTE PTR operands are different. They have completely different offsets, and I don't get why the offsets are written before the [rbp] part.
If you know how to reproduce the first output using gcc or g++ please help!
gcc.exe (GCC) 8.2.0
Looks like GCC for the Windows x64 calling convention is using the shadow space (32 bytes above the return address) reserved by its caller. Godbolt's GCC installs target GNU/Linux, i.e. the x86-64 System V ABI.
You can get the same code on Godbolt by marking your function with __attribute__((ms_abi)). Of course that means your caller has to see that attribute in the prototype so it knows to reserve that space, and which registers to pass function args in.
The Windows x64 calling convention is mostly worse than x86-64 System V; it has fewer arg-passing registers, for example. Its only real advantages are easier implementation of variadic functions (because of the shadow space) and having some call-preserved XMM regs. (Probably too many, but x86-64 SysV has zero.) So more likely you want to use a cross compiler (targeting GNU/Linux) on Windows, or use __attribute__((sysv_abi)) on all your functions. (https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html)
The XMM part of the calling convention is normally irrelevant for kernel code; most kernels avoid saving/restoring the SIMD/FPU state on kernel entry/exit by not letting the compiler use SIMD/FP instructions.
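A small sketch of the two attributes mentioned above, assuming GCC or Clang targeting x86-64 (the macros expand to nothing elsewhere, and the function names are made up). Note that 16[rbp] in GCC's Intel-syntax output is just another spelling of [rbp+16].

```cpp
#include <cassert>

// The attributes exist only for GCC-compatible compilers on x86-64;
// on anything else the macros expand to nothing (an assumption made so
// that this sketch still compiles there).
#if defined(__GNUC__) && defined(__x86_64__)
#define WIN64_CC __attribute__((ms_abi))    // args in rcx, rdx, r8, r9 + shadow space
#define SYSV_CC  __attribute__((sysv_abi))  // args in rdi, rsi, rdx, rcx, r8, r9
#else
#define WIN64_CC
#define SYSV_CC
#endif

// Same body, two calling conventions. At -O0 the ms_abi version should
// spill its register args to the caller's shadow space (positive offsets
// from rbp), the sysv_abi version to its own frame (negative offsets).
WIN64_CC int sum_bytes_win64(char a, char b, char c) { return a + b + c; }
SYSV_CC  int sum_bytes_sysv(char a, char b, char c)  { return a + b + c; }

// The convention changes where arguments live, not the result.
int call_both(char a, char b, char c) {
    const int w = sum_bytes_win64(a, b, c);
    const int s = sum_bytes_sysv(a, b, c);
    assert(w == s);
    return w;
}
```

Compiling this with -O0 -masm=intel and comparing the two function bodies should reproduce both styles of output from the question.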
I was trying to build a function in assembly (FASM) that takes more than 4 parameters. In x86 it works fine, but I know that in x64 with fastcall you have to spill the parameters into the shadow space in the order rcx, rdx, r8, r9. I read that the 5th and later parameters are passed on the stack, but I don't know how to handle them. This is what I tried, but it keeps saying "invalid operand". I know that I am handling the first 4 parameters correctly because I have written x64 functions before; it is the last 3 that I don't know how to spill.
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
If I try
mov [buffer3],rsp+8*4
it says "extra characters on the line".
I also saw that some people use rsp+20h, rsp+28h, etc., but that does not work either.
How do I handle more than 4 parameters using fastcall on x64?
Also, do I have to make room on the stack? I saw some people put add rsp,20h right before their spill code. I tried that and it did not help with the invalid operand error.
thanks
update
After playing around with it for a bit, I found that the only way it seems to work is if I spill the first 4 parameters and leave the rest (5 onward) alone:
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
;start the regular code. ignore spilling buffer3,startposition and length
x86/x64 CPUs have no mov instruction that takes two memory operands, so the following instructions do not exist:
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
The workaround is to go through a register such as rax: read the value from memory into the register, then write the register to the destination memory location:
mov rax,[rsp+8*4]
mov [buffer3],rax
mov rax,[rsp+8*5]
mov [startposition],rax
mov rax,[rsp+8*6]
mov [length],rax
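The rsp offsets to use depend on where rsp points when the loads execute. In the Windows x64 convention, at function entry (before any push or sub rsp) the return address sits at [rsp], the four shadow slots follow it, and the 5th argument comes after those. The sketch below is just that arithmetic, with hypothetical helper names. If FASM's proc macro sets up an rbp frame and defines the parameter names relative to it (an assumption worth checking in the macro source), the stack arguments are already addressable by name, which would explain why spilling only the first four works.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlot = 8;  // every stack slot is 8 bytes in x64

// Offset from rsp of the n-th argument (1-based, n >= 5) at function
// entry, i.e. before the callee has pushed anything or adjusted rsp.
std::size_t stack_arg_offset_at_entry(unsigned n) {
    assert(n >= 5);  // arguments 1..4 arrive in rcx, rdx, r8, r9
    return kSlot               // return address at [rsp]
         + (n - 1) * kSlot;    // 4 shadow slots, then args 5..n-1
}

// Offset from rbp of the same argument after "push rbp / mov rbp, rsp".
std::size_t stack_arg_offset_from_rbp(unsigned n) {
    return stack_arg_offset_at_entry(n) + kSlot;  // saved rbp adds one slot
}
```

Under those assumptions the 5th, 6th and 7th arguments are at [rsp+28h], [rsp+30h] and [rsp+38h] on entry, or at [rbp+30h], [rbp+38h] and [rbp+40h] once a standard rbp frame is set up.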
Here is some simple inline asm I have inserted in a VC++ project. addr_curr_ebp is the current value of the EBP register, which points to the saved (old) EBP value inside the stack frame. 4 bytes above that is the return address inside the application function. I extract a single byte from the code section. I run my code along with other applications like gtalk, vlc, etc. The application always crashes when I include the ProbStat 1 and 2 lines in my code. When I remove these statements everything works fine. What do you think is going on?
__asm{
push eax
push ebx
push cx
mov ebx, addr_curr_ebp
mov eax, [ebx + 4]
mov cl, BYTE PTR [eax - 5] //ProbStat 1
mov ret_5, cl // ProbStat 2
pop cx
pop ebx
pop eax
}
Your code snippet doesn't show where "ret_5" is located. You'll get an automatic crash if it is a member of a class: the ecx register holds the "this" pointer, and you're overwriting it.
I'm not sure what this is meant to do, but it sounds to me like you need the _ReturnAddress intrinsic. It returns the address of the instruction after the call instruction that called the current function. Assign it to an unsigned char*; there's no need for assembly this way.
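A minimal sketch of that suggestion. _ReturnAddress is the MSVC intrinsic named above; the __builtin_return_address(0) branch is a substitute added so the sketch also builds with GCC or Clang, and the function name is made up.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_ReturnAddress)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Returns the address this function will return to, i.e. the instruction
// right after the call in the caller. Kept out of line so that it has a
// return address of its own.
NOINLINE const unsigned char* address_after_call() {
#if defined(_MSC_VER)
    const void* addr = _ReturnAddress();
#else
    const void* addr = __builtin_return_address(0);  // GCC/Clang substitute
#endif
    assert(addr != nullptr);
    return static_cast<const unsigned char*>(addr);
}
```

With the pointer in hand, the byte the asm was after would be ret[-5], where ret is the value returned by address_after_call(). That assumes the call site is a 5-byte near call, which the original code also assumes but which is not guaranteed.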