Entering Ring 0 with C++ using Visual Studio 2015

Yes, I know there are some similar questions around, but none of them is satisfying.
I know it is a stupid idea, but I need to enter kernel mode (aka Ring 0) from my Visual Studio 2015 C++ project.
I also want to do it with the minimum effort necessary (meaning that I do not want to write a driver specifically for testing and then have to sign and redeploy it after every build, as this is very tedious).
How can I achieve this?
It does not matter to me whether the project runs on my host machine or on a remote (or virtual) one -- I have enough machines at my disposal.
Background: I am currently working on the Cosmos operating system, and I need to test x86 assembly instructions which require Ring 0 privilege, e.g. rdmsr, out, in etc.
Running the following code breaks at the first rdmsr (marked below) with a 0xC0000096: Privileged instruction error:
int* ptr = new int[4];
int* va = ptr;
__asm
{
    mov esi, va         // load the buffer address (lea esi, va would load the address of the local pointer variable instead)
    mov ecx, 0xe7       // MSR 0xE7 = IA32_MPERF
    rdmsr               // error, as this must run in ring0
    mov [esi + 4], eax
    mov [esi], edx
    mov ecx, 0xe8       // MSR 0xE8 = IA32_APERF
    rdmsr
    mov [esi + 12], eax
    mov [esi + 8], edx
    xor eax, eax        // clear eax
}
....
And yes, I am fully aware of any risk I am taking, so please do not ask why I need to do such a thing, or whether I am trying to win the programmer's Darwin Award ;)

AFAIK Visual Studio cannot debug kernel code, but there are other debuggers that can: WinDbg and KD. You'll need some time to figure them out, but there's no other way.
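For the instructions themselves there is no way around kernel mode: rdmsr, in and out fault in user mode by design. The signing pain can be reduced, though: with test signing enabled (bcdedit /set testsigning on, then a reboot), a one-time self-signed test certificate is enough, so you do not need to re-sign manually after every build. Below is a minimal sketch of such a throwaway driver, assuming a WDK driver project; the names and the debug output are illustrative, not a definitive implementation:

// Minimal test driver reading the two MSRs from the question.
// In kernel mode the __readmsr intrinsic replaces the raw rdmsr asm
// (which also sidesteps the lack of inline asm in x64 MSVC).
#include <ntddk.h>

extern "C" NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(DriverObject);
    UNREFERENCED_PARAMETER(RegistryPath);

    unsigned __int64 mperf = __readmsr(0xE7); // IA32_MPERF
    unsigned __int64 aperf = __readmsr(0xE8); // IA32_APERF
    DbgPrint("MPERF=%I64x APERF=%I64x\n", mperf, aperf);

    return STATUS_SUCCESS;
}

Load it once with sc create / sc start, watch the output in DbgView or WinDbg, and you have a Ring 0 playground without a full deployment cycle per build.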

Why is the gs segment register set to 0x0000000000000000 in Visual Studio x64 (MASM)?

I am currently reading "The Ultimate Anti-Debugging Reference" and I am trying to implement some of its techniques.
To check the value of NtGlobalFlag, it uses this code:
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
I made all the adjustments needed for running x64 code in Visual Studio 2017; I used this tutorial.
I used this instruction to access NtGlobalFlag
lodsq gs:[rsi]
because their syntax didn't work in Visual Studio.
But it still didn't work.
While debugging, I noticed that the value of the gs register is shown as 0x0000000000000000, while the fs register shows a real value, 0x0000007595377000.
I don't understand why the value of GS was zeroed, because on x64 it should be the one that is set.
What the register window shows is misleading here. On 64-bit Windows it is gs, not fs, that points to the "per thread" memory (the TEB, which holds the PEB pointer at gs:[60h]); fs is only used for the 32-bit TEB in WOW64 processes. The gs base is loaded from the IA32_GS_BASE MSR rather than from a descriptor-table entry, so a debugger may display gs as zero even though gs-relative addressing works fine, which matches the observation above.
The success of adding an anti-debugger feature to a program will depend on how much motivation there is to defeat it. The main issue is Windows remote debugging, and/or a hacker-installed device driver running in kernel mode to defeat the anti-debugger feature.
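A quick sanity check from C++ supports this; here is a sketch using the documented MSVC intrinsic __readgsqword (MSVC, x64 build only):

// gs really does point at the TEB, and gs:[60h] holds the PEB pointer,
// even when the debugger's register window displays gs as zero.
#include <intrin.h>
#include <cstdio>

int main()
{
    unsigned long long peb = __readgsqword(0x60);               // PEB pointer from the TEB
    unsigned char ntGlobalFlag = *(unsigned char*)(peb + 0xBC); // NtGlobalFlag at its x64 offset
    std::printf("PEB = %llx, NtGlobalFlag = %02x\n", peb, ntGlobalFlag);
    return 0;
}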
So I still don't understand why the code posted here caused so many problems. As I said, I just copied it from "The Ultimate Anti-Debugging Reference":
push 60h
pop rsi
gs:lodsq ;Process Environment Block
mov al, [rsi*2+rax-14h] ;NtGlobalFlag
and al, 70h
cmp al, 70h
je being_debugged
But I've found a simpler solution that works perfectly.
As @Peter Cordes said, I should be fine just accessing the value directly, without lodsq, like so:
mov rax, gs:[60h]
And after further investigation, I found this reference with the following code:
mov rax, gs:[60h]
mov al, [rax+BCh]
and al, 70h
cmp al, 70h
jz being_debugged
And I modified it a little bit for my program:
.code
GetValueFromASM proc
    mov rax, gs:[60h]    ; PEB pointer from the TEB
    mov al, [rax+0BCh]   ; NtGlobalFlag (x64 PEB offset 0BCh)
    and al, 70h          ; keep the three heap debug flags
    cmp al, 70h
    jz being_debugged
    mov rax, 0           ; no debugger detected
    ret
being_debugged:
    mov rax, 1           ; process was created under a debugger
    ret
GetValueFromASM endp
end
Just one thing to note:
When running inside Visual Studio 2017 the result returned was 0, meaning "no debugger attached", which is false (because I used the Local Windows Debugger).
But when launching the process with WinDbg it did return 1, which means that the check works.
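For completeness, here is a hedged sketch of calling the routine from C++, assuming the .asm file is built into the same project with ml64 as in the tutorial mentioned above:

#include <cstdio>

// Defined in the MASM file above; returns 1 when the NtGlobalFlag heap
// flags (70h) are set, i.e. the process was created under a debugger.
extern "C" unsigned long long GetValueFromASM();

int main()
{
    std::printf("being debugged: %llu\n", GetValueFromASM());
    return 0;
}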

Any advantage of XOR AL,AL + MOVZX EAX, AL over XOR EAX,EAX?

I have some unknown C++ code that was compiled in Release build, so it's optimized. The point I'm struggling with is:
xor al, al
add esp, 8
cmp byte ptr [ebp+userinput], 31h
movzx eax, al
This is my understanding:
xor al, al ; set eax to 0x??????00 (clear the low byte)
add esp, 8 ; for some unclear reason, set the stack pointer higher
cmp byte ptr [ebp+userinput], 31h ; set zero flag if user input was "1"
movzx eax, al ; set eax to AL and extend with zeros, so eax = 0x000000??
I don't care about lines 2 and 3. They might be there in this order for pipelining reasons and IMHO have nothing to do with EAX.
However, I don't understand why I would clear AL first, just to clear the rest of EAX later. The result will IMHO always be EAX = 0, so this could also be
xor eax, eax
instead. What is the advantage or "optimization" of that piece of code?
Some background info:
I will get the source code later. It's a short C++ console demo program, maybe 20 lines of code, so nothing that I would call "complex" code. IDA shows a single loop in that program, but not around this piece. The Stud_PE signature scan didn't find anything, but it's likely the Visual Studio 2013 or 2015 compiler.
xor al,al is already slower than xor eax,eax on most CPUs. e.g. on Haswell/Skylake it needs an ALU uop and doesn't break the dependency on the old value of eax/rax. It's equally bad on AMD CPUs, or Atom/Silvermont. (Well, maybe not equally because AMD doesn't eliminate xor eax,eax at issue/rename, but it still has a false dependency which could serialize the new dependency chain with whatever used eax last).
On CPUs that do rename al separately from the rest of the register (Intel pre-IvyBridge), the xor al,al may still be recognized as a zeroing idiom, but unless you actively want to preserve the upper bytes of the register, the best way to zero al is xor eax,eax.
Doing movzx on top of that just makes it even worse.
I'm guessing your compiler somehow got confused and decided it needed a 1-byte zero, but then realized it needed to promote it to 32 bits. xor sets flags, so it couldn't xor-zero after the cmp, and it failed to notice that it could have just xor-zeroed eax before the cmp.
Either that or it's something like Jester's suggestion, where the movzx is a branch target. Even if that's the case, xor eax,eax would still have been better because zero-extending into eax follows unconditionally on this code path.
I'm curious what compiler produced this from what source.
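For illustration only (the real source is unknown), one kind of C++ that plausibly produces a byte-sized compare followed by a zero-extended int result is a bool that gets promoted, roughly like this hypothetical snippet:

#include <cstdio>

int checkInput(char userinput)
{
    bool isOne = (userinput == '1'); // '1' is 0x31, matching the cmp in the question
    return isOne;                    // byte result promoted to int, hence a movzx
}

int main()
{
    std::printf("%d\n", checkInput('1'));
    return 0;
}

Even then, a good compiler would use xor eax,eax or setcc rather than xor al,al, which is what makes the posted sequence odd.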

Difference between += 1 and ++ as executed by computer?

Are the commands the computer performs the same for these two expressions in C++:
myInt += 1;
myInt ++;
Am I supposed to use GCC? Or, if you know the answer, can you tell me?
For many compilers, these will be identical. (Note that I said many compilers - see my disclaimer below). For example, for the following C++ code, both Test 1 and Test 2 will result in the same assembly language:
int main()
{
    int test = 0;
    // Test 1
    test++;
    // Test 2
    test += 1;
    return 0;
}
Many compilers (including Visual Studio) can be configured to show the resulting assembly language, which is the best way to settle questions like this. For example, in Visual Studio, I right-clicked on the project file, went to "Properties", and enabled the assembler output listing (Configuration Properties > C/C++ > Output Files > Assembler Output).
In this case, as shown below, Visual Studio does, in fact, compile them to the same assembly language:
; Line 8
push ebp
mov ebp, esp
sub esp, 204 ; 000000ccH
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-204]
mov ecx, 51 ; 00000033H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 9
mov DWORD PTR _test$[ebp], 0
; Line 12 - this is Test 1
mov eax, DWORD PTR _test$[ebp]
add eax, 1
mov DWORD PTR _test$[ebp], eax
; Line 15 - This is Test 2 - note that the assembly is identical
mov eax, DWORD PTR _test$[ebp]
add eax, 1
mov DWORD PTR _test$[ebp], eax
; Line 17
xor eax, eax
; Line 18
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END
Interestingly enough, its C# compiler also produces the same MSIL (which is C#'s "equivalent" of assembly language) for similar C# code, so this apparently holds across multiple languages as well.
By the way, if you're using another compiler like gcc, you can follow the directions here to get assembly language output. According to the accepted answer, you should use the -S option, like the following:
gcc -S helloworld.c
If you're writing in Java and would like to do something similar, you can follow these directions to use javap to get the bytecode, which is Java's "equivalent" of assembly language, so to speak.
Also of interest, since this question originally asked about Java as well as C++, this question discusses the relationship between Java code, bytecode, and the eventual machine code (if any).
Caution: Different compilers may produce different assembly language, especially if you're compiling for different processors and platforms. Whether or not you have optimization turned on can also affect things. So, strictly speaking, the fact that Visual Studio is "smart enough" to know that those two statements "mean" the same thing doesn't necessarily mean that all compilers for all possible platforms will be that smart.
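If you want to repeat the experiment with optimization turned on, a hedged sketch (the function names are illustrative) is to put the two forms into separate functions and diff the listings, e.g. with cl /FA /O2 or g++ -O2 -S:

// Compare the listings generated for these two functions; with
// optimization enabled, both typically compile to the same add/inc/lea.
int incrementWithPlusEquals(int x)
{
    x += 1;
    return x;
}

int incrementWithPostfix(int x)
{
    x++;
    return x;
}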

When can an assignment fail to show properly in Visual C?

I compiled a C++ DLL (from the same source) with Visual C++ 2010 Express on both 64-bit Windows 7 and 32-bit Windows XP. An external 64-bit app on Windows 7 calls the DLL and executes properly. The equivalent 32-bit app on Windows XP bombs on return from the DLL call with stack or memory corruption.
Trying to debug this, I put a breakpoint where the DLL copies the data from some internal structures to what the external app wants, the last step before returning. At a given point I'm looking at something like this in Visual Studio:
destination[i].field = source[i].field;
where both fields in the source and destination are doubles or longs.
Hovering over the source shows the correctly computed values. Hovering over the destination, before executing the statement, shows that it was properly initialized to zeros. After executing the statement, the destination contains a different value, e.g. 36.3468 becomes 0.00104800000000122891, 6 becomes 10, etc.
This is strange. Maybe there is a structure element misalignment, but wouldn't that show up as a warning somewhere else? Maybe I'm stomping over memory (in the 32-bit version only!?), but then shouldn't the value at least appear correct after stepping over the assignment? I haven't stepped into machine code in a while and don't know x86/x86_64 assembly that well; do I have to do that to see what the code doing this assignment is really up to?
Here is one of the lines that seems to not execute properly and the disassembly in both the 64 bit and 32 bit versions, in that order:
destination[i].field = source[i].field;
000007FEEF6B4DD3 movsxd rax,dword ptr [i]
000007FEEF6B4DDB imul rax,rax,4A68h
000007FEEF6B4DE2 movsxd rcx,dword ptr [i]
000007FEEF6B4DEA imul rcx,rcx,30h
000007FEEF6B4DEE mov rdx,qword ptr [destination]
000007FEEF6B4DF3 mov r8,qword ptr [source]
000007FEEF6B4DF8 movsd xmm0,mmword ptr [r8+rax+4A40h]
000007FEEF6B4E02 movsd mmword ptr [rdx+rcx+8],xmm0
destination[i].field = source[i].field;
09E64361 mov eax,dword ptr [i]
09E64364 imul eax,eax,4A38h
09E6436A mov ecx,dword ptr [i]
09E6436D imul ecx,ecx,2Ch
09E64370 mov edx,dword ptr [destination]
09E64373 mov esi,dword ptr [source]
09E64376 fld qword ptr [esi+eax+4A10h]
09E6437D fstp qword ptr [edx+ecx+4]
If I step over that line in the 64-bit version, VS shows me the proper value for destination[i].field, but not in the 32-bit version. It seems that the structures have different sizes in the two versions, and thus different offsets (note the +8 vs +4 displacement in the final store), but shouldn't VS still show me the proper value at that point?
If I step over the fld instruction in the 32-bit version, I can see that st0 is loaded with the wrong value, i.e. not what is shown for source[i].field. For i=0, eax=0 and esi=source, so probably the 4A10h offset is wrong and/or computed differently in the code than in what VS uses to show me the value. How is this possible?
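Given that the two disassemblies already show different element strides (4A68h vs 4A38h) and field displacements (+8 vs +4), the DLL and the expression evaluator are clearly disagreeing about the structure layout. A hedged sketch of how to catch such a mismatch at compile time (Record and its members are illustrative, not from the post):

#include <cstddef>

#pragma pack(push, 8)   // make the packing explicit in the header shared by DLL and caller
struct Record {
    long   id;          // illustrative members only
    double field;       // the double being copied in the post
};
#pragma pack(pop)

// Compile-time tripwires: if the two builds disagree on the layout,
// the build fails instead of corrupting memory at run time.
static_assert(sizeof(Record) == 16, "Record size differs between builds");
static_assert(offsetof(Record, field) == 8, "field offset differs between builds");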

ASSEMBLY Offset to C++ Code question

I've been trying to convert this code to C++ without any inlining and I cannot figure it out.
Say you have this line:
sub edx, (offset loc_42C1F5+5)
Hex-Rays gives me
edx -= (uint)((char*)loc_42C1F5 + 5))
But how would it really look without the loc_42C1F5 symbol?
I would think it would be
edx -= 0x42C1FA;
But is that correct? (I can't really step through this code in any assembly-level debugger, as it's well protected.)
loc_42C1F5 is actually a label:
seg000:0042C1F5 loc_42C1F5: ; DATA XREF: sub_4464A0+2B5o
seg000:0042C1F5 mov edx, [esi+4D98h]
seg000:0042C1FB lea ebx, [esi+4D78h]
seg000:0042C201 xor eax, eax
seg000:0042C203 xor ecx, ecx
seg000:0042C205 mov [ebx], eax
loc_42C1F5 is a symbol. Given the information you've provided, I cannot say what its offset is. It may be 0x42C1F5 or it may be something else entirely.
If it is 0x42C1F5, then your translation should be correct.
IDA has incorrectly identified 0x42C1FA as an offset, and Hex-Rays used that interpretation. Just convert it to a plain number (press O) and all will be well. That's why it's called the Interactive Disassembler :)
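Assuming the label's link-time offset really is 0x42C1F5, as the listing suggests, the C++ equivalent is indeed just a constant subtraction; a minimal sketch (the function name is illustrative):

#include <cstdint>

// Equivalent of `sub edx, (offset loc_42C1F5+5)` once the operand is
// read as a plain number rather than an address.
uint32_t subtractLabelOffset(uint32_t edx)
{
    edx -= (0x42C1F5u + 5u); // i.e. edx -= 0x42C1FA
    return edx;
}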