I found a function written in C++ that is able to detect a debugger when it executes
xor eax, eax
div eax
The problem is that when a debugger is attached, the process crashes after reaching div eax. I put that inline asm into a __try/__except block, but after reaching the div eax instruction the process just freezes. Whole code:
#include "windows.h"
#include <iostream>
using namespace std;
LONG WINAPI UnhandledExcepFilter(PEXCEPTION_POINTERS pExcepPointers) {
SetUnhandledExceptionFilter((LPTOP_LEVEL_EXCEPTION_FILTER)pExcepPointers->ContextRecord->Eax);
pExcepPointers->ContextRecord->Eip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}
int main() {
SetUnhandledExceptionFilter(UnhandledExcepFilter);
__try {
__asm {
xor eax, eax
div eax
}
}
__except (EXCEPTION_INT_DIVIDE_BY_ZERO) {
cout << "DEBUGGER NOT FOUND" << endl;
}
return NULL;
}
I just need to silently detect the debugger.
Thanks for any help.
Given that you are using "windows.h", you can simplify this to IsDebuggerPresent and/or DebugBreak.
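For instance, a minimal sketch using just the documented API (no SEH or inline asm required):

#include <windows.h>
#include <iostream>

int main()
{
    // IsDebuggerPresent() reads the BeingDebugged flag from the current process's PEB.
    if (IsDebuggerPresent())
        std::cout << "DEBUGGER FOUND" << std::endl;
    else
        std::cout << "DEBUGGER NOT FOUND" << std::endl;
    return 0;
}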
I am writing assembly language code in MASM under Visual Studio to increment a variable three times.
// Increment.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
using namespace std;

extern "C" {
    void incr();
}

int main()
{
    incr(); // Breakpoint Here
    return 0;
}
My assembly language code is:
PUBLIC incr
EXTERN puts:PROC

.data
var dword 0005

.code
incr PROC
    mov eax, var
    inc eax    ; Breakpoint Here
    inc eax    ; Breakpoint Here
    inc eax    ; Breakpoint Here
incr ENDP
END
The program builds successfully and, while debugging, also shows the desired values of RAX, but when it returns to Increment.cpp it throws an exception at return 0; ("Exception Thrown: Increment.exe has triggered a breakpoint"). Why does this happen, and how do I get rid of it?
You need to add a ret instruction to your incr procedure so that your function returns properly.
incr PROC
    mov eax, var
    inc eax
    inc eax
    inc eax
    ret
incr ENDP
The exception you're getting is thrown by the debugger, which has decided you're mixing calling conventions. What is actually happening is that execution falls through the end of incr into some debugger padding, which then checks why the stack pointer is out of whack; execution should never reach that padding in the first place. You stop it from getting there by returning (via ret) from your MASM procedure.
I have been wondering how the V8 JavaScript engine and other JIT compilers execute the generated code.
Here are the articles I read during my attempt to write a small demo.
http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introduction
http://nullprogram.com/blog/2015/03/19/
I know very little about assembly, so I initially used http://gcc.godbolt.org/ to write a function and get the disassembled output, but that code did not work on Windows.
I then wrote a small C++ program, compiled it with -g -Og, and got the disassembled output with gdb.
#include <stdio.h>

int square(int num) {
    return num * num;
}

int main() {
    printf("%d\n", square(10));
    return 0;
}
Output:
Dump of assembler code for function square(int):
=> 0x00000000004015b0 <+0>: imul %ecx,%ecx
0x00000000004015b3 <+3>: mov %ecx,%eax
0x00000000004015b5 <+5>: retq
I copy-pasted the output (with the '%' removed) into an online x86 assembler and got { 0x0F, 0xAF, 0xC9, 0x89, 0xC1, 0xC3 }.
Here is my final code. If I compile it with GCC, I always get 1. If I compile it with VC++, I get a random number. What is going on?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

typedef unsigned char byte;
typedef int (*int0_int)(int);

const byte square_code[] = {
    0x0f, 0xaf, 0xc9,
    0x89, 0xc1,
    0xc3
};

int main() {
    byte* buf = reinterpret_cast<byte*>(VirtualAlloc(0, 1 << 8, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (buf == nullptr) return 0;

    memcpy(buf, square_code, sizeof(square_code));

    {
        DWORD old;
        VirtualProtect(buf, 1 << 8, PAGE_EXECUTE_READ, &old);
    }

    int0_int square = reinterpret_cast<int0_int>(buf);
    int ans = square(100);
    printf("%d\n", ans);

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
Note
I am trying to learn how JIT works, so please do not suggest that I use LLVM or any other library. I promise I will use a proper JIT library in a real project rather than writing one from scratch.
Note: as Ben Voigt points out in the comments, this is really only valid for x86, not x86_64. For x86_64 you simply have some errors in your assembly (which are errors on x86 as well), as Ben Voigt also points out in his answer.
This is happening because your compiler could see both sides of the function call when you generated your assembly. Since the compiler was in control of generating code for both the caller and the callee, it didn't have to follow the cdecl calling convention, and it didn't.
The default calling convention for MSVC is cdecl. Basically, function parameters are pushed onto the stack in the reverse of the order they're listed, so a call to foo(10, 100) could result in the assembly:
push 100
push 10
call foo(int, int)
In your case, the compiler will generate something like the following at the call site:
push 100
call esi ; assuming the address of your code is in the register esi
That's not what your code is expecting though. Your code is expecting its argument to be passed in the register ecx, not the stack.
The compiler has used what looks like the fastcall calling convention. If I compile a similar program (I get slightly different assembly) I get the expected result:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

typedef unsigned char byte;
typedef int (__fastcall *int0_int)(int);

const byte square_code[] = {
    0x8b, 0xc1,        // mov eax, ecx
    0x0f, 0xaf, 0xc0,  // imul eax, eax
    0xc3               // ret
};

int main() {
    byte* buf = reinterpret_cast<byte*>(VirtualAlloc(0, 1 << 8, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (buf == nullptr) return 0;

    memcpy(buf, square_code, sizeof(square_code));

    {
        DWORD old;
        VirtualProtect(buf, 1 << 8, PAGE_EXECUTE_READ, &old);
    }

    int0_int square = reinterpret_cast<int0_int>(buf);
    int ans = square(100);
    printf("%d\n", ans);

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
Note that I've told the compiler to use the __fastcall calling convention. If you want to use cdecl, the assembly would need to look more like this:
push ebp
mov ebp, esp
mov eax, DWORD PTR _n$[ebp]
imul eax, eax
pop ebp
ret 0
(DISCLAIMER: I'm not great at assembly; that listing was generated by Visual Studio.)
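If you do want to stay with cdecl, a hand-assembled sketch of the buffer and typedef (untested; it reads the argument from the stack at [esp+4] instead of from ecx) might look like:

typedef int (__cdecl *square_fn)(int);

const unsigned char square_code_cdecl[] = {
    0x8b, 0x44, 0x24, 0x04,  // mov eax, [esp+4]  ; load the argument from the stack
    0x0f, 0xaf, 0xc0,        // imul eax, eax     ; square it
    0xc3                     // ret               ; cdecl: the caller cleans up the argument
};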
I copy-pasted the output ('%' removed)
Well, that means your second instruction was
mov ecx, eax
which makes no sense at all (it overwrites the result of the multiplication with the uninitialized return value).
On the other hand
mov eax, foo
ret
is a very common pattern for ending a function with non-void return type.
The difference between your two assembly dialects (AT&T style vs. Intel style) is more than just the % markers: the operand order is reversed, and pointers and offsets are denoted very differently as well.
You'll want to issue a set disassembly-flavor intel command in gdb
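For example (a gdb session sketch; depending on your gdb you may need to quote the full C++ signature of the function):

(gdb) set disassembly-flavor intel
(gdb) disassemble square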
I'm trying to get the PEB address of the current process with assembly.
The cpp file:
#include <iostream>
//#include <windows.h>

extern "C" int* __ptr64 Get_Ldr_Addr();

int main(int argc, char **argv)
{
    std::cout << "asm " << Get_Ldr_Addr() << "\n";
    //std::cout << "peb " << GetModuleHandle(0) << "\n";
    return 0;
}
The asm file:
.code
Get_Ldr_Addr proc
    push rax
    mov rax, GS:[30h]
    mov rax, [rax + 60h]
    pop rax
    ret
Get_Ldr_Addr endp
end
But I get different addresses from GetModuleHandle(0) and Get_Ldr_Addr()!
What is the problem? Aren't they supposed to be the same?
Q: If the function is external, will it check the PEB of the process that called it, or of the function's DLL (it is supposed to be a DLL)?
Thanks
If you don't mind C: this works in Microsoft Visual Studio 2015 and uses the __readgsqword() intrinsic.
#include <winnt.h>
#include <winternl.h>
// Thread Environment Block (TEB)
#if defined(_M_X64) // x64
PTEB tebPtr = reinterpret_cast<PTEB>(__readgsqword(reinterpret_cast<DWORD_PTR>(&static_cast<NT_TIB*>(nullptr)->Self)));
#else // x86
PTEB tebPtr = reinterpret_cast<PTEB>(__readfsdword(reinterpret_cast<DWORD_PTR>(&static_cast<NT_TIB*>(nullptr)->Self)));
#endif
// Process Environment Block (PEB)
PPEB pebPtr = tebPtr->ProcessEnvironmentBlock;
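A minimal usage sketch (assuming the snippet above sits at file scope and the translation unit pulls in <windows.h> ahead of <winternl.h>):

#include <cstdio>

int main()
{
    // pebPtr comes from the snippet above; just print the PEB address.
    std::printf("PEB = %p\n", static_cast<void*>(pebPtr));
    return 0;
}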
Just two comments.
There's no need to push/pop rax, because it's a scratch (volatile) register on Windows; see the caller/callee-saved registers. In particular, rax will hold the return value of your function.
It often helps to step through the machine code when you call GetModuleHandle() and compare it with your own assembly code. You'll probably encounter something like this implementation.
I like Sirmabus' answer but I much prefer it with simple C casts and the offsetof macro:
PPEB get_peb()
{
#if defined(_M_X64) // x64
PTEB tebPtr = (PTEB)__readgsqword(offsetof(NT_TIB, Self));
#else // x86
PTEB tebPtr = (PTEB)__readfsdword(offsetof(NT_TIB, Self));
#endif
return tebPtr->ProcessEnvironmentBlock;
}
Get_Ldr_Addr didn't save your result.
You should not preserve rax with push and pop, because rax is the register that holds the return value.
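A corrected sketch of the procedure (untested) simply leaves the result in rax:

Get_Ldr_Addr proc
    mov rax, gs:[30h]      ; Self pointer in the TIB -> linear address of the TEB
    mov rax, [rax + 60h]   ; PEB pointer stored in the TEB
    ret                    ; the result stays in rax, the return register
Get_Ldr_Addr endp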
I have a program in which a simple function is called a large number of times. I have added some simple logging code and find that this significantly affects performance, even when the logging code is not actually called. A complete (but simplified) test case is shown below:
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>

using namespace std::chrono;

std::mt19937 rng;

uint32_t getValue()
{
    // Just some pointless work, helps stop this function from getting inlined.
    for (int x = 0; x < 100; x++)
    {
        rng();
    }

    // Get a value, which happens never to be zero
    uint32_t value = rng();

    // This (by chance) is never true
    if (value == 0)
    {
        value++; // This if statement won't get optimized away when the printing below is commented out.
        std::stringstream ss;
        ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
        std::cout << ss.str();
    }

    return value;
}

int main(int argc, char* argv[])
{
    // Just for timing
    high_resolution_clock::time_point start = high_resolution_clock::now();

    uint32_t sum = 0;
    for (uint32_t i = 0; i < 10000000; i++)
    {
        sum += getValue();
    }

    milliseconds elapsed = duration_cast<milliseconds>(high_resolution_clock::now() - start);

    // Use (print) the sum to make sure it doesn't get optimized away.
    std::cout << "Sum = " << sum << ", Elapsed = " << elapsed.count() << "ms" << std::endl;

    return 0;
}
Note that the code contains stringstream and cout but these are never actually called. However, the presence of these three lines of code increases the run time from 2.9 to 3.3 seconds. This is in release mode on VS2013. Curiously, if I build with GCC using the '-O3' flag, the extra three lines of code actually decrease the runtime by half a second or so.
I understand that the extra code could impact the resulting executable in a number of ways, such as by preventing inlining or causing more cache misses. The real question is whether there is anything I can do to improve this situation. Switching to sprintf()/printf() doesn't seem to make a difference. Do I simply need to accept that adding such logging code to small functions will affect performance even when it is not called?
Note: For completeness, my real/full scenario is that I use a wrapper macro to throw exceptions, and I like to log when such an exception is thrown. So when I call THROW_EXCEPT(...) it inserts code similar to that shown above and then throws. This then hurts when I throw exceptions from inside a small function. Any better alternatives here?
Edit: Here is a VS2013 solution for quick testing, and so compiler settings can be checked: https://drive.google.com/file/d/0B7b4UnjhhIiEamFyS0hjSnVzbGM/view?usp=sharing
So I initially thought that this was due to branch prediction and branches being optimised out, so I took a look at the annotated assembly with the logging code commented out:
if (value == 0)
00E21371 mov ecx,1
00E21376 cmove eax,ecx
{
value++;
Here we see that the compiler has helpfully optimised out our branch, so what if we put in a more complex statement to prevent it from doing so:
if (value == 0)
00AE1371 jne getValue+99h (0AE1379h)
{
value /= value;
00AE1373 xor edx,edx
00AE1375 xor ecx,ecx
00AE1377 div eax,ecx
Here the branch is left in, but when running this it is about as fast as the previous example with the logging lines commented out. So let's have a look at the assembly with those lines left in:
if (value == 0)
008F13A0 jne getValue+20Bh (08F14EBh)
{
value++;
std::stringstream ss;
008F13A6 lea ecx,[ebp-58h]
008F13A9 mov dword ptr [ss],8F32B4h
008F13B3 mov dword ptr [ebp-0B0h],8F32F4h
008F13BD call dword ptr ds:[8F30A4h]
008F13C3 push 0
008F13C5 lea eax,[ebp-0A8h]
008F13CB mov dword ptr [ebp-4],0
008F13D2 push eax
008F13D3 lea ecx,[ss]
008F13D9 mov dword ptr [ebp-10h],1
008F13E0 call dword ptr ds:[8F30A0h]
008F13E6 mov dword ptr [ebp-4],1
008F13ED mov eax,dword ptr [ss]
008F13F3 mov eax,dword ptr [eax+4]
008F13F6 mov dword ptr ss[eax],8F32B0h
008F1401 mov eax,dword ptr [ss]
008F1407 mov ecx,dword ptr [eax+4]
008F140A lea eax,[ecx-68h]
008F140D mov dword ptr [ebp+ecx-0C4h],eax
008F1414 lea ecx,[ebp-0A8h]
008F141A call dword ptr ds:[8F30B0h]
008F1420 mov dword ptr [ebp-4],0FFFFFFFFh
That's a lot of instructions if that branch is ever hit. So what if we try something else?
if (value == 0)
011F1371 jne getValue+0A6h (011F1386h)
{
value++;
printf("This never gets printed, but commenting out these three lines improves performance.");
011F1373 push 11F31D0h
011F1378 call dword ptr ds:[11F30ECh]
011F137E add esp,4
Here we have far fewer instructions and once again it runs as quickly as with all lines commented out.
So I'm not sure I can say for certain exactly what is happening here, but I suspect it is a combination of branch prediction and CPU instruction cache misses.
In order to solve this problem you could move the logging into a function like so:
void log()
{
    std::stringstream ss;
    ss << "This never gets printed, but commenting out these three lines improves performance." << std::endl;
    std::cout << ss.str();
}
and
if (value == 0)
{
    value++;
    log();
}
Then it runs as fast as before, with all those instructions replaced by a single call log (011C12E0h).
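Applied to the THROW_EXCEPT scenario from the question, a hedged sketch might look like the following; logAndThrow and the macro shape are illustrative, not the asker's actual code. The point is the same: the formatting and I/O stay out of line, so the hot function only pays for a call on the cold path.

#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Out-of-line helper: builds the message, logs it, and throws.
// Keeping this out of the hot function keeps that function's machine code small.
void logAndThrow(const char* file, int line, const std::string& msg)
{
    std::stringstream ss;
    ss << file << ":" << line << ": " << msg << std::endl;
    std::cout << ss.str();
    throw std::runtime_error(msg);
}

// Hypothetical macro shape; the real THROW_EXCEPT may take different arguments.
#define THROW_EXCEPT(msg) logAndThrow(__FILE__, __LINE__, (msg))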
I bumped into a very serious error using Visual Studio 2005 while running a C++ Win32 console application. The problem shows up when running the code below (simplified) with the following project properties: C++|Optimization|Optimization|/O2 (or /O1, or /Ox), C++|Optimization|Whole Program Optimization|/GL, Linker|Optimization|/LTCG.
#include "stdafx.h"
#include <iostream>
using namespace std;
const int MAXVAL=10;
class MyClass
{
private:
int p;
bool isGood;
public:
int SetUp(int val);
};
int MyClass::SetUp(int val)
{
isGood = true;
if (MAXVAL<val)
{
int wait;
cerr<<"ERROR, "<<MAXVAL<<"<"<<val<<endl;
cin>>wait;
//exit(1); //for x64 uncomment, for win32 leave commented
}
if (isGood) p=4;
return 1;
}
int _tmain(int argc, _TCHAR* argv[])
{
int wait=0, setupVal1=10, setupVal2=12;
MyClass classInstance1;
MyClass classInstance2;
if (MAXVAL>=setupVal1) classInstance1.SetUp(setupVal1);
if (MAXVAL>setupVal2) classInstance2.SetUp(setupVal2);
cerr<<"exit, enter value to terminate\n";
cin>>wait;
return 0;
}
The output shows that the value 10 is smaller than the value 10! I already found out that changing the setting from /O2 to /Od solves the problem (/Og, which is part of /O2, causes it), but that really slows down execution. Changing the code a bit can also solve it, but then I can never be sure that the code is reliable. I am using Visual Studio 2005 Professional (version 8.0.50727.867) on Windows 7.
My questions are: can someone try to reproduce this error using Visual Studio 2005 (I already tried VS 2010; no problem there), and if so, what is happening here?
Can I assume that newer versions have solved this problem (I am considering buying VS 2012)?
Thank you
You can reduce your example significantly and still get the same problem! You don't need two instances, and you don't need any of the other local or member variables. Also, you can hardcode MAXVAL.
Quick summary of what "solves" the problem:
making MAXVAL a nonconst int
setting setupVal2 to a value less than 10
surprisingly, changing the condition 10<val to val>10 !!!
Here's my minimal version to reproduce the problem:
#include "stdafx.h"
#include <iostream>
using namespace std;
class MyClass
{
public:
int SetUp(int val);
};
int MyClass::SetUp(int val)
{
if (10<val)
cout<<10<<"<"<<val<<endl;
return 1;
}
int _tmain(int argc, _TCHAR* argv[])
{
int setupVal1=10, setupVal2=12;
MyClass classInstance;
classInstance.SetUp(setupVal1);
classInstance.SetUp(setupVal2);
cin.get();
return 0;
}
The problem, as witnessed by the disassembly, is that the compiler thinks 10<val is always true and therefore omits the check.
_TEXT SEGMENT
?SetUp@MyClass@@QAEHH@Z PROC ; MyClass::SetUp
; _val$ = ecx
; 16 : if (10<val)
; 17 : cout<<10<<"<"<<val<<endl;
    mov eax, DWORD PTR __imp_?endl@std@@YAAAV?$basic_ostream@DU?$char_traits@D@std@@@1@AAV21@@Z
    push eax
    push ecx
    mov ecx, DWORD PTR __imp_?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A
    push 10 ; 0000000aH
    call DWORD PTR __imp_??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@H@Z
    push eax
    call ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU?$char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> >
    add esp, 4
    mov ecx, eax
    call DWORD PTR __imp_??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@H@Z
    mov ecx, eax
    call DWORD PTR __imp_??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@P6AAAV01@AAV01@@Z@Z
; 18 : return 1;
    mov eax, 1
; 19 : }