MASM Fixing 64 bit Truncation in a DLL

MASM Fixing 64 bit Truncation in a DLL - c++

I am working with the Adobe Flash ocx by loading it into my C++ program. The ocx is supposed to be 64 bit but for some reason it has issues when I compile with the x64 platform. I have read up on this and found that it is likely that some function receives DWORD userData instead of void* userData through some structure and then casts it to an object pointer. This works ok in a 32-bit environment, but crashes in 64-bit.
The disassembly of the function calls inside the ocx that cause the crash are the following lines:
mov ecx,r8d
The first operation copies only low 32-bits from R8D to ECX (ECX is 32-bit).
cmp dword ptr [rcx+11BCh],0
The second operation accesses 64-bit register, where low 32-bits contains correct address and high 32-bits contains some junk. Leading to a crash, of course.
Solution
I have read that one possible solution is to do the following:
Create an asm file containing the following code:
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8d // I've replaced ecx with rcx here
cmp dword ptr [rcx+11BCh],0
Build an obj file using this asm file and MASM.exe
Open the obj file with a hex editor and locate the 90's that represent the nop
In the Flash ocx locate the first string of bytes between the nops and replace it with the new string of bytes that comes after the nops. This will change it from 32 bit to 64 bit function calls.
Problem
I have attempted this by making the following asm file and building it with ml64.exe (I do not have masm.exe but I think that ml.exe is the new 32 bit version of it, and this code would only build with the ml64.exe, probably because of the 64-bit only operators?):
TITLE: Print String Assembly Program (test.asm)
.Code
main Proc
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8
cmp dword ptr [rcx+11BCh],0
main ENDP
END
I had trouble getting it to build (I kept getting errors about instruction length matching) until I changed r8d to r8 in the second section.
I got this obj to build, and opened it with a hex editor and was able to locate the two byte strings. But where my problem comes is that when I search for the first byte string that should be in the flash ocx, I cannot find it. It is not there, so I cannot replace it with the second one.
What am I doing wrong?
Thanks!

Create an asm file containing the following code:
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8d // I've replaced ecx with rcx here
cmp dword ptr [rcx+11BCh],0
Build an obj file using this asm file and MASM.exe
Open the obj file with a hex editor and locate the 90's that represent the nop
In the Flash ocx locate the first string of bytes between the nops and replace it with the new string of bytes that comes after the nops. This will change it from 32 bit to 64 bit function calls.
I made the following asm file and built it with ml64.exe
TITLE: Print String Assembly Program (test.asm)
.Code
main Proc
nop
nop
nop
mov ecx,r8d
cmp dword ptr [rcx+11BCh],0
nop
nop
nop
mov rcx,r8
cmp dword ptr [rcx+11BCh],0
main ENDP
END
I got this obj to build, and opened it with a hex editor and was able to locate the two byte strings. I found the first byte string in the Flash OCX and changed it to the second one. (The only actual change was a 41 to a 49 in the strings)

Related

asm inspection of c++ compiled object. what is the meaning of this cs: part [duplicate]

I am writing simple programs then analyze them.
Today I've written this:
#include <stdio.h>
int x;
int main(void){
printf("Enter X:\n");
scanf("%d",&x);
printf("You enter %d...\n",x);
return 0;
}
It's compiled into this:
push rbp
mov rbp, rsp
lea rdi, s ; "Enter X:"
call _puts
lea rsi, x
lea rdi, aD ; "%d"
mov eax, 0
call ___isoc99_scanf
mov eax, cs:x <- don't understand this
mov esi, eax
lea rdi, format ; "You enter %d...\n"
mov eax, 0
call _printf
mov eax, 0
pop rbp
retn
I don't understand what cs:x means.
I use Ubuntu x64, GCC 10.3.0, and IDA pro 7.6.

TL:DR: IDA confusingly uses cs: to indicate a RIP-relative addressing mode in 64-bit code.
In IDA mov eax, x means mov eax, DWORD [x] which in turn means reading a DWORD from the variable x.
For completeness, mov rax, OFFSET x means mov rax, x (i.e. putting the address of x in rax).
In 64-bit displacements are still 32-bit, so, for a Position Independent Executable, it's not always possible to address a variable by encoding its address (because it's 64-bit and it would not fit into a 32-bit field). And in position-independent code, it's not desirable.
Instead, RIP-relative addressing is used.
In NASM, RIP-relative addressing takes the form mov eax, [REL x], in gas it is mov x(%rip), %eax.
Also, in NASM, if DEFAULT REL is active, the instruction can be shortened to mov eax, [x] which is identical to the 32-bit syntax.
Each disassembler will disassemble a RIP-relative operand differently. As you commented, Ghidra gives mov eax, DWORD PTR [x].
IDA uses mov eax, cs:x to mean mov eax, [REL x]/mov x(%rip), %eax.
;IDA listing, 64-bit code
mov eax, x ;This is mov eax, [x] in NASM and most likely wrong unless your exec is not PIE and always loaded <= 4GiB
mov eax, cs:x ;This is mov eax, [REL x] in NASM and idiomatic to 64-bit programs
In short, you can mostly ignore the cs: because that's just the way variables are addressed in 64-bit mode.
Of course, as the listing above shows, the use or absence of RIP-relative addressing tells you the program can be loaded anywhere or just below the 4GiB.
The cs prefix shown by IDA threw me off.
I can see that it could mentally resemble "code" and thus the rip register but I don't think the RIP-relative addressing implies a cs segment override.
In 32-bit mode, the code segment is usually read-only, so an instruction like mov [cs:x], eax will fault.
In this scenario, putting a cs: in front of the operand would be wrong.
In 64-bit mode, segment overrides (other than fs/gs) are ignored (and the read-bit of the code segment is ignored anyway), so the presence of a cs: doesn't really matter because ds and cs are effectively indistinguishable. (Even an ss or ds override doesn't change the #GP or #SS exception for a non-canonical address.)
Probably the AGU doesn't even read the segment shadow registers anymore for segment bases other than fs or gs. (Although even in 32-bit mode, there's a lower latency fast path for the normal case of segment base = 0, so hardware may just let that do its job.)
Still cs: is misleading in my opinion - a 2E prefix byte is still possible in machine code as padding. Most tools still call it a CS prefix, although http://ref.x86asm.net/coder64.html calls it a "null prefix" in 64-bit mode. There's no such byte here, and cs: is not an obvious or clear way to imply RIP-relative addressing.

GCC/G++ addresses and can't read the registers [duplicate]

This question already has answers here:
GDB Cannot insert breakpoint, Cannot access memory at address XXX? [duplicate]
(2 answers)
Closed 5 years ago.
Dump of assembler code for function main():
0x000000000000071a <+0>: push rbp
0x000000000000071b <+1>: mov rbp,rsp
0x000000000000071e <+4>: sub rsp,0x20
0x0000000000000722 <+8>: mov rax,QWORD PTR fs:0x28
0x000000000000072b <+17>: mov QWORD PTR [rbp-0x8],rax
0x000000000000072f <+21>: xor eax,eax
0x0000000000000731 <+23>: lea rax,[rbp-0x20]
0x0000000000000735 <+27>: mov rdi,rax
0x0000000000000738 <+30>: call 0x764 <Test::Test()>
0x000000000000073d <+35>: lea rax,[rbp-0x20]
0x0000000000000741 <+39>: mov rdi,rax
0x0000000000000744 <+42>: call 0x7ae <Test::a()>
0x0000000000000749 <+47>: mov eax,0x0
0x000000000000074e <+52>: mov rdx,QWORD PTR [rbp-0x8]
0x0000000000000752 <+56>: xor rdx,QWORD PTR fs:0x28
0x000000000000075b <+65>: je 0x762 <main()+72>
0x000000000000075d <+67>: call 0x5f0 <__stack_chk_fail#plt>
0x0000000000000762 <+72>: leave
0x0000000000000763 <+73>: ret
End of assembler dump.
I have a problem.. I'm trying to debug the program but the addresses are weird and I can't read the registers(after start). "The program has no registers now."
and that's happens at any program that I've compiled in my computer.
EDIT:
gef➤ break*0x0000000000000763
Breakpoint 1 at 0x763: file 1.cpp, line 36.
gef➤ r
Starting program: /root/Desktop/Challenges/AdvancedMemoryChallenges/1.bin
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x763
gef➤ info reg $rip
rip 0x7ffff7dd9c20 0x7ffff7dd9c20
gef➤
gef➤ start
[+] Breaking at '{int (void)} 0x55555555471a <main()>'
[!] Command 'entry-break' failed to execute properly, reason: Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x763

0x763 is an address before relocation. (It is unclear whether it is from an object file or the actual executable.)
The addresses of code in a running program are never this low in the address space.
You need to set a breakpoint on _start or main, start the program, and see which addresses the kernel assigns to the machine code in question. The GDB disassemble command print will print such addresses.
GDB automatically disables address space layout randomization (ASLR), so the addresses will be constant as long as you do not change the program, its libraries, or the kernel (which sometimes results in process layout changes, too).

fastcall how to use for more than 4 parameters

I was trying to build a function in assebmly(FASM) that used more than 4 parameters. in x86 it works fine but I know in x64 with fastcall you have to spill the parameters into the shadow space in the order of rcx,rdx,r8,r9 I read that for 5 and etc you have to pass them onto the stack, but I don't know how to do this. this is what I tried but it keeps saying invalid operand. I know that the first 4 parameters I am doing right because I have made x64 functions before but it is the last 3 I don't know how to spill
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
if I try
mov [buffer3],rsp+8*4
it says extra characters on the line.
I also saw that somepeople use rsp+20h, rsp+28h etc but that does not work either.
how do I call more than 4 parameters using fastcall on x64?
also do I have to make room on the stack? I saw some people have to put add rsp,20h right before their spill code. I tried that and it did not help the invlaid operand.
thanks
update
after playing around with it for a little bit I found that the only way it seems to work is if I spill the first 4 parameters and then ignore the rest 5-infinity
proc substr,inputstring,outputstring,buffer1,buffer2,buffer3,startposition,length
;spill
mov [inputstring],rcx
mov [outputstring],rdx
mov [buffer1],r8
mov [buffer2],r9
;start the regular code. ignore spilling buffer3,startposition and length

On x86/x64-CPUs this following instructions does not exist:
mov [buffer3],[rsp+8*4]
mov [startposition],[rsp+8*5]
mov [length],[rsp+8*6]
Workaround with using the rax-register for to read and for to write a values from and to a memory loaction:
mov rax,[rsp+8*4]
mov [buffer3],rax
mov rax,[rsp+8*5]
mov [startposition],rax
mov rax,[rsp+8*6]
mov [length],rax

Injecting 64 Bit DLL using code cave

I'm trying to inject a 64 Bit DLL into 64 Bit Process (explorer for the matter).
I've tried using Remote-thread\Window Hooks techniques but some Anti-Viruses detects my loader as a false positive.
After reading this article : Dll Injection by Darawk, I decided to use code caves.
It worked great for 32bit but because VS doesn't support inline assembly for 64 Bit I had to write the op-codes and operands explicitly.
I looked at this article : 64Bit injection using code cave, as the article states, there are some differences:
There are several differences that had to be incorporated here:
MASM64 uses fastcall, so the function's argument has to be passed in a
register and not on the stack.
The length of the addresses - 32 vs. 64 bit - must be taken into account.
MASM64 has no instruction that
pushes all registers on the stack (like pushad in 32bit) so this had
to be done by pushing all the registers explicitly.
I followed those guidelines and ran the article's example but none of what I did worked.
The target process just crashed at the moment I resumed the main thread and I don't know how to really look into it because ollydbg has no 64 bit support.
This is how the code looks before I injected it:
codeToInject:
000000013FACD000 push 7741933Ah
000000013FACD005 pushfq
000000013FACD006 push rax
000000013FACD007 push rcx
000000013FACD008 push rdx
000000013FACD009 push rbx
000000013FACD00A push rbp
000000013FACD00B push rsi
000000013FACD00C push rdi
000000013FACD00D push r8
000000013FACD00F push r9
000000013FACD011 push r10
000000013FACD013 push r11
000000013FACD015 push r12
000000013FACD017 push r13
000000013FACD019 push r14
000000013FACD01B push r15
000000013FACD01D mov rcx,2CA0000h
000000013FACD027 mov rax,76E36F80h
000000013FACD031 call rax
000000013FACD033 pop r15
000000013FACD035 pop r14
000000013FACD037 pop r13
000000013FACD039 pop r12
000000013FACD03B pop r11
000000013FACD03D pop r10
000000013FACD03F pop r9
000000013FACD041 pop r8
000000013FACD043 pop rdi
000000013FACD044 pop rsi
000000013FACD045 pop rbp
000000013FACD046 pop rbx
000000013FACD047 pop rdx
000000013FACD048 pop rcx
000000013FACD049 pop rax
000000013FACD04A popfq
000000013FACD04B ret
Seems fine to me but I guess I'm missing something.
My complete code can be found here : Source code
Any ideas\suggestions\alternatives?

The first push that stores the return value only pushes a 32-bit value. dwOldIP in your code is a DWORD as well, it should be a DWORD64. Having to cast to DWORD from ctx.Rip should've been enough of a hint ;)
Also, make sure the stack is 16-byte aligned upon entering the call to LoadLibrary. Some APIs throw exceptions if the stack is not aligned properly.

Apparently, The main problem was that I allocated the code cave data without the EXECUTE_PAGE_READWRITE permission and therefore the chunk of data was treated as data and not as opcodes.

Why would fclose hang / deadlock? (Windows)

I have a directory change monitor process that reads updates from files within a set of directories. I have another process that performs small writes to a lot of files to those directories (test program). Figure about 100 directories with 10 files in each, and about 500 files being modified per second.
After running for a while, the directory monitor process hangs on a call to fclose() in a method that is basically tailing the file. In this method, I fopen() the file, check that the handle is valid, do a few seeks and reads, and then call fclose(). These reads are all performed by the same thread in the process. After the hang, the thread never progresses.
I couldn't find any good information on why fclose() might deadlock instead of returning some kind of error code. The documentation does mention _fclose_nolock(), but it doesn't seem to be available to me (Visual Studio 2003).
The hang occurs for both debug and release builds. In a debug build, I can see that fclose() calls _free_base(), which hangs before returning. Some kind of call into kernel32.dll => ntdll.dll => KernelBase.dll => ntdll.dll is spinning. Here's the assembly from ntdll.dll that loops indefinitely:
77CEB83F cmp dword ptr [edi+4Ch],0
77CEB843 lea esi,[ebx-8]
77CEB846 je 77CEB85E
77CEB848 mov eax,dword ptr [edi+50h]
77CEB84B xor dword ptr [esi],eax
77CEB84D mov al,byte ptr [esi+2]
77CEB850 xor al,byte ptr [esi+1]
77CEB853 xor al,byte ptr [esi]
77CEB855 cmp byte ptr [esi+3],al
77CEB858 jne 77D19A0B
77CEB85E mov eax,200h
77CEB863 cmp word ptr [esi],ax
77CEB866 ja 77CEB815
77CEB868 cmp dword ptr [edi+4Ch],0
77CEB86C je 77CEB87E
77CEB86E mov al,byte ptr [esi+2]
77CEB871 xor al,byte ptr [esi+1]
77CEB874 xor al,byte ptr [esi]
77CEB876 mov byte ptr [esi+3],al
77CEB879 mov eax,dword ptr [edi+50h]
77CEB87C xor dword ptr [esi],eax
77CEB87E mov ebx,dword ptr [ebx+4]
77CEB881 lea eax,[edi+0C4h]
77CEB887 cmp ebx,eax
77CEB889 jne 77CEB83F
Any ideas what might be happening here?

I posted this as a comment, but I realize this could be an answer in its own right...
Based on the disassembly, my guess is you've overwritten some internal heap structure maintained by ntdll, and it is looping forever iterating through a linked list.
In particular at the start of the loop, the current list node seems to be in ebx. At the end of the loop, the expected last node (or terminator, if you like -- it looks a bit like these are circular lists and the last node is the same as the first, pointer to this node being at [edi+4Ch]) is contained in eax. Probably the result of cmp ebx, eax is never equal, because there is some cycle in the list introduced by a heap corruption.
I don't think this has anything to do with locks, otherwise we would see some atomic instructions (eg. lock cmpxchg, xchg, etc.) or calls to other synchronization functions.

I had a same case with file close function. In my case, I solved by located the close function embedded other function body instead of having own function.
I was also suspicious on
(1) the name of file being duplicated (2) Windows scheduling (file IO wasn't completed before next task treading being started. Windows scheduling and multi-threading is behind of the curtain, so it is hard to verify, but I have similar issue when I tried to save many data in ASCII in the loop. Saving on binary solved at this case.)
My environment, IDE: Visual Studio 2015, OS: Windows 7, language: C++

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js