How to trap stack overflow in a Windows x64 C++ application


I am trying to compile an application for the x64 platform on Windows. A couple of threads that handle the parsing of a scripting language use this code, recommended by Microsoft, to trap stack overflows and avoid access violation exceptions:
__try
{
DoSomethingThatMightUseALotOfStackMemory();
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
LPBYTE lpPage;
static SYSTEM_INFO si;
static MEMORY_BASIC_INFORMATION mi;
static DWORD dwOldProtect;
// Get page size of system
GetSystemInfo(&si);
// Find SP address
_asm mov lpPage, esp;
// Get allocation base of stack
VirtualQuery(lpPage, &mi, sizeof(mi));
// Go to page beyond current page
lpPage = (LPBYTE)(mi.BaseAddress)-si.dwPageSize;
// Free portion of stack just abandoned
if (!VirtualFree(mi.AllocationBase,
(LPBYTE)lpPage - (LPBYTE)mi.AllocationBase,
MEM_DECOMMIT))
{
exit(1);
}
// Reintroduce the guard page
if (!VirtualProtect(lpPage, si.dwPageSize,
PAGE_GUARD | PAGE_READWRITE,
&dwOldProtect))
{
exit(1);
}
Sleep(2000);
}
Unfortunately, it uses one line of inline assembler to get the stack pointer. Visual Studio does not support inline assembly in x64 mode, and I can't find a compiler intrinsic for getting the stack pointer either.
Is it possible to do this in an x64-friendly manner?

As pointed out in a comment to the question, the whole "hack" above can be replaced by the _resetstkoflw function. This works fine in both x86 and x64 mode.
The code snippet above then becomes:
// Filter for the stack overflow exception. This function traps
// the stack overflow exception, but passes all other exceptions through.
int stack_overflow_exception_filter(int exception_code)
{
if (exception_code == EXCEPTION_STACK_OVERFLOW)
{
// Do not call _resetstkoflw here, because at this point
// the stack is not yet unwound. Instead, signal that the
// handler (the __except block) is to be executed.
return EXCEPTION_EXECUTE_HANDLER;
}
else
return EXCEPTION_CONTINUE_SEARCH;
}
// Returns 0 on success, 3 if the stack could not be reset.
int example()
{
int result = 0;
__try
{
DoSomethingThatMightUseALotOfStackMemory();
}
__except(stack_overflow_exception_filter(GetExceptionCode()))
{
// Here, it is safe to reset the stack.
result = _resetstkoflw();
}
// Terminate if _resetstkoflw failed (returned 0)
if (!result)
return 3;
return 0;
}
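If the address of the current stack position is still needed (the job done by the _asm mov lpPage, esp line in the question), one portable approximation is to take the address of a local variable. The sketch below is an illustration under that assumption, not part of the original answer: approximate_stack_pointer is a hypothetical helper name, and the value it returns is only expected to lie somewhere inside the current thread's stack, not to equal the stack pointer register exactly.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original answer): returns an address
// that lies inside the current thread's stack. It approximates the stack
// pointer; it is not the exact register value.
std::uintptr_t approximate_stack_pointer()
{
    volatile char marker = 0;   // lives in this function's stack frame
    return reinterpret_cast<std::uintptr_t>(&marker);
}
```

On Windows, the returned value could be cast to a pointer and passed to VirtualQuery in place of lpPage, as in the question's code. When only recovery from the overflow is needed, _resetstkoflw remains the simpler route.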

Related

Read access violation using lock cmpxchg16b _InterlockedCompareExchange128

I'm trying to hook a function with a lock cmpxchg16b. I have tried about 20 different things.
Expected result:
In real func
In hook func
Result in Debug build:
Exception thrown at 0x..: 0xC0000005 Access violation reading 0xFFFFFFFFFFFFFFFF
I'm not sure why it is trying to read from 0xFFFFFFFFFFFFFFFF, none of the pointers go there.
In a Release build, it doesn't crash! But it doesn't hook the function either.
Source:
#include <stdio.h>
#include <Windows.h>
int RealFunc()
{
printf("In real func\n");
return 2;
}
int HookFunc()
{
printf("In hook func\n");
return 1;
}
int main()
{
DWORD dwOld;
if (!VirtualProtect(&RealFunc, 0x1000, PAGE_EXECUTE_READWRITE, &dwOld))
{
printf("Unable to make mem RWX.\n");
return 0;
}
RealFunc();
__declspec(align(16)) PVOID ProcAddress = &RealFunc;
__declspec(align(16)) LONG64 Restore[2];
Restore[0] = 0x0000000025ff9090; // nop, nop, jmp [rip + 0]
Restore[1] = (LONG64)&HookFunc;
_InterlockedCompareExchange128((LONG64*)ProcAddress, Restore[0], Restore[1], Restore);
RealFunc();
system("PAUSE");
return 0;
}
Here is the function documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/hh972640(v=vs.85).aspx

_InterlockedCompareExchange128((LONG64*)ProcAddress,
Restore[0], Restore[1], Restore);
This is of course wrong. Look at the function signature:
unsigned char __cdecl InterlockedCompareExchange128(
_Inout_ LONGLONG volatile *Destination,
_In_ LONGLONG ExchangeHigh,
_In_ LONGLONG ExchangeLow,
_Inout_ LONGLONG *ComparandResult
);
The second parameter is ExchangeHigh and the third is ExchangeLow, so the call must pass Restore[1], Restore[0], not Restore[0], Restore[1]. Also, ComparandResult must hold the original function bytes, so it cannot be Restore.
Also note the following, from MSDN:
The parameters for this function must be aligned on a 16-byte
boundary; otherwise, the function will behave unpredictably on x64
systems.
But which parameters? Obviously not all of them. ExchangeHigh and ExchangeLow, for example, are passed by value; we can pass immediate values that have no address at all, so talking about alignment makes no sense for the second and third parameters. In practice, InterlockedCompareExchange128 compiles to a lock cmpxchg16b instruction. From the Intel manual:
Note that CMPXCHG16B requires that the destination (memory) operand be
16-byte aligned.
So only Destination must be 16-byte aligned. ComparandResult need not be (its contents are loaded into the RDX:RAX register pair, while the exchange value goes into RCX:RBX).
So you do not need __declspec(align(16)) LONG64 Restore[2]; at all; you can pass the values directly to InterlockedCompareExchange128. Next,
__declspec(align(16)) PVOID ProcAddress = &RealFunc;
with _InterlockedCompareExchange128((LONG64*)ProcAddress..
is wrong and pointless. What difference does the alignment of ProcAddress make? The memory that ProcAddress points to must be 16-byte aligned, not ProcAddress itself. And again, no temporary variable is needed here; we can directly use
_InterlockedCompareExchange128((LONG64*)RealFunc, ...)
Of course, RealFunc must be 16-byte aligned; otherwise we get exactly the 0xC0000005 Access violation reading 0xFFFFFFFFFFFFFFFF exception.
So I guess that in the Debug build RealFunc is not 16-byte aligned.
In a Release build, it doesn't crash! But it doesn't hook the function
either.
It does not hook because you pass Restore as ComparandResult, and there is no exception because RealFunc happened to be 16-byte aligned.
Because a function can in general have any address and need not be aligned on 16 bytes, _InterlockedCompareExchange128 is not really useful here. It is also available only on x64, not on x86.
The code (which still does not hook the function if RealFunc is not aligned on 16 bytes) can look like this:
int RealFunc()
{
printf("In real func\n");
return 2;
}
int HookFunc()
{
printf("In hook func\n");
return 1;
}
int xxx()
{
DWORD dwOld;
if (VirtualProtect(RealFunc, 2*sizeof(PVOID), PAGE_EXECUTE_READWRITE, &dwOld))
{
RealFunc();
#if defined(_M_X64)
if (!((LONG_PTR)RealFunc & 15))
{
LONG64 Comparand[2] = { ((LONG64*)RealFunc)[0], ((LONG64*)RealFunc)[1] };
InterlockedCompareExchange128((LONG64*)RealFunc, (LONG64)HookFunc, 0x0000000025ff9090, Comparand);
}
else
{
printf("bad function address %p\n", RealFunc);
}
#elif defined(_M_IX86)
static PVOID pvHookFunc = HookFunc;
LARGE_INTEGER Exchange = { 0x25ff9090, (LONG)&pvHookFunc };
LONG64 Comparand;
memcpy(&Comparand, RealFunc, sizeof(Comparand));
InterlockedCompareExchange64((LONG64*)RealFunc, Exchange.QuadPart, Comparand);
#else
#error not implemented
#endif
FlushInstructionCache(GetCurrentProcess(), RealFunc, 2*sizeof(PVOID));
if (dwOld != PAGE_EXECUTE_READWRITE)
{
VirtualProtect(RealFunc, 2*sizeof(PVOID), dwOld, &dwOld);
}
RealFunc();
}
return 0;
}
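The two ingredients discussed above, the 16-byte alignment requirement on the destination and the layout of the 16-byte patch, can be checked in isolation without touching any executable memory. The following is a minimal sketch under stated assumptions: is_aligned16 and make_jump_patch are hypothetical helper names, and the byte layout assumes a little-endian host (as x64 is).

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when p sits on a 16-byte boundary, which is
// what cmpxchg16b requires of its memory (destination) operand.
bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: builds the 16 bytes that the answer's code writes
// over the start of the function. Low 8 bytes: nop, nop, jmp [rip+0]
// (90 90 FF 25 00 00 00 00); high 8 bytes: the absolute jump target.
// Assumes a little-endian host.
std::array<unsigned char, 16> make_jump_patch(std::uint64_t target)
{
    std::array<unsigned char, 16> patch{};
    const std::uint64_t low = 0x0000000025ff9090ULL;
    std::memcpy(patch.data(), &low, sizeof(low));
    std::memcpy(patch.data() + 8, &target, sizeof(target));
    return patch;
}
```

The low half corresponds to the ExchangeLow argument and the high half to ExchangeHigh, which is why the hook address has to be passed as the second parameter of InterlockedCompareExchange128.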

setjmp, longjump and stack reconstruction

Normally setjmp and longjmp do not care about the call stack; the functions just save and restore registers.
I would like to use setjmp and longjmp in such a way that the call stack is preserved and then restored in a different execution context:
void EnableFeature( bool bEnable )
{
if( bEnable )
{
if( setjmp( jmpBuf ) == 0 )
{
// pseudocode: back up the call stack
} else {
return; // Play back the backed-up call stack + new call stack
}
} else {
// pseudocode: restore the saved call stack on top of the current call stack
// pseudocode: modify jmpBuf so we will jump to the new stack ending
longjmp( jmpBuf, 1 );
}
}
Is this kind of approach possible? Can someone write some sample code for this?
The reason I believe it is doable is a similar code snippet I have already coded / prototyped:
Communication protocol and local loopback using setjmp / longjmp
There, two call stacks run simultaneously, independently of each other.
To help with this task, here is a function for getting the call stack for native and managed code:
//
// Originated from: https://sourceforge.net/projects/diagnostic/
//
// Similar to windows API function, captures N frames of current call stack.
// Unlike windows API function, works with managed and native functions.
//
int CaptureStackBackTrace2(
int FramesToSkip, //[in] frames to skip, 0 - capture everything.
int nFrames, //[in] frames to capture.
PVOID* BackTrace //[out] filled callstack with total size nFrames - FramesToSkip
)
{
#ifdef _WIN64
CONTEXT ContextRecord;
RtlCaptureContext( &ContextRecord );
int iFrame; // signed, so the frame-count arithmetic after the loop cannot wrap around
for( iFrame = 0; iFrame < nFrames; iFrame++ )
{
DWORD64 ImageBase;
PRUNTIME_FUNCTION pFunctionEntry = RtlLookupFunctionEntry( ContextRecord.Rip, &ImageBase, NULL );
if( pFunctionEntry == NULL )
{
if( iFrame > 0 )
iFrame--; // Eat last as it's not valid.
break;
}
PVOID HandlerData;
DWORD64 EstablisherFrame;
RtlVirtualUnwind( 0 /*UNW_FLAG_NHANDLER*/,
ImageBase,
ContextRecord.Rip,
pFunctionEntry,
&ContextRecord,
&HandlerData,
&EstablisherFrame,
NULL );
if( FramesToSkip > (int)iFrame )
continue;
BackTrace[iFrame - FramesToSkip] = (PVOID)ContextRecord.Rip;
}
#else
//
// This approach was taken from StackInfoManager.cpp / FillStackInfo
// http://www.codeproject.com/Articles/11221/Easy-Detection-of-Memory-Leaks
// - slightly simplified the function itself.
//
int regEBP;
__asm mov regEBP, ebp;
long *pFrame = (long*)regEBP; // pointer to current function frame
void* pNextInstruction;
int iFrame = 0;
//
// Using __try/__except is faster than using ReadProcessMemory or VirtualProtect.
// We return whatever frames we have collected so far after exception was encountered.
//
__try {
for( ; iFrame < nFrames; iFrame++ )
{
pNextInstruction = (void*)(*(pFrame + 1));
if( !pNextInstruction ) // Last frame
break;
if( iFrame >= FramesToSkip )
BackTrace[iFrame - FramesToSkip] = pNextInstruction;
pFrame = (long*)(*pFrame); // always advance to the caller's frame, even for skipped frames
}
}
__except( EXCEPTION_EXECUTE_HANDLER )
{
}
#endif //_WIN64
iFrame -= FramesToSkip;
if( iFrame < 0 )
iFrame = 0;
return iFrame;
} //CaptureStackBackTrace2
I think it can be modified to obtain the actual stack pointer as well (on x64 from the stack pointer register in the captured context; on x86 the frame pointer is already available).

Legally, setjmp/longjmp can only be used to jump "back" in the nested call sequence. Which means that it never needs to really "reconstruct" anything - at the moment when you execute the longjmp everything is still intact, right there in the stack. All you need to do is rollback the extra stuff accumulated on top of that between the moment of setjmp and the moment of longjmp.
longjmp automatically does a "shallow" rollback for you (i.e. it simply purges the raw bytes off the top of the stack without calling any destructors). So, if you wanted to do a proper "deep" rollback (like what exceptions do as they fly up the call hierarchy) you'd have to setjmp at each level that needs deep cleanup, "intercept" the jump, perform the cleanup manually and then longjmp further up the call hierarchy.
But this would basically be a manual implementation of "poor-man's exception handling". Why would you want to reimplement it manually? I'd understand if you wanted to do it in C code. But why in C++?
P.S. And yes, setjmp/longjmp are sometimes used in a non-standard way to implement coroutines in C, which does involve jumping "across" and a raw form of stack restoration. But this is non-standard, and in the general case it would be much more painful to implement in C++, for the very same reasons I mentioned above.
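For reference, the legal "jump back" case described above can be sketched as follows. This is a minimal illustration, not code from the question: g_env, g_depth_reached, descend and run_demo are hypothetical names, and only trivially destructible objects sit between the setjmp and the longjmp, since longjmp runs no destructors.

```cpp
#include <csetjmp>

static std::jmp_buf g_env;        // hypothetical: saved context of run_demo
static int g_depth_reached = 0;   // hypothetical: deepest call level entered

// Recurses three levels deep, then jumps straight back to run_demo.
static void descend(int depth)
{
    g_depth_reached = depth;
    if (depth == 3)
        std::longjmp(g_env, 1);   // discards the frames of descend(0..3)
    descend(depth + 1);
}

// Returns the depth reached before the jump, or -1 if no jump happened.
int run_demo()
{
    if (setjmp(g_env) == 0)
    {
        descend(0);               // never returns normally in this sketch
        return -1;
    }
    // Reached via longjmp: run_demo's frame is still intact on the stack.
    return g_depth_reached;
}
```

The jump is valid only because run_demo, the function that called setjmp, is still active when longjmp executes. Restoring a stack that has already been abandoned, as the question proposes, falls outside this guarantee.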

Is __finally supposed to run after EXCEPTION_CONTINUE_SEARCH?

In the following code, the function foo calls itself recursively once. The inner call causes an access violation to be raised. The outer call catches the exception.
#include <windows.h>
#include <stdio.h>
void foo(int cont)
{
__try
{
__try
{
__try
{
if (!cont)
*(int *)0 = 0;
foo(cont - 1);
}
__finally
{
printf("inner finally %d\n", cont);
}
}
__except (!cont? EXCEPTION_CONTINUE_SEARCH: EXCEPTION_EXECUTE_HANDLER)
{
printf("except %d\n", cont);
}
}
__finally
{
printf("outer finally %d\n", cont);
}
}
int main()
{
__try
{
foo(1);
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
printf("main\n");
}
return 0;
}
The expected output here should be
inner finally 0
outer finally 0
inner finally 1
except 1
outer finally 1
However, outer finally 0 is conspicuously missing from the real output. Is this a bug or is there some detail I'm overlooking?
For completeness: this happens with VS2015 when compiling for x64. Surprisingly, it doesn't happen on x86, which leads me to believe that it really is a bug.

There is an even simpler example (we can remove the inner try/finally block):
void foo(int cont)
{
__try
{
__try
{
if (!cont) *(int *)0 = 0;
foo(cont - 1);
}
__except (cont? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
printf("except %d\n", cont);
}
}
__finally
{
printf("finally %d\n", cont);
}
}
with output
except 1
finally 1
So the finally 0 block is not executed. In the non-recursive case, however, there is no bug:
__try
{
foo(0);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
printf("except\n");
}
output:
finally 0
except
The bug is in the following function:
EXCEPTION_DISPOSITION
__C_specific_handler (
_In_ PEXCEPTION_RECORD ExceptionRecord,
_In_ PVOID EstablisherFrame,
_Inout_ PCONTEXT ContextRecord,
_Inout_ PDISPATCHER_CONTEXT DispatcherContext
);
The old implementation of this function contains the bug here:
//
// try/except - exception filter (JumpTarget != 0).
// After the exception filter is called, the exception
// handler clause is executed by the call to unwind
// above. Having reached this point in the scan of the
// scope tables, any other termination handlers will
// be outside the scope of the try/except.
//
if (TargetPc == ScopeTable->ScopeRecord[Index].JumpTarget) { // bug
break;
}
If you have the latest VC compiler/libraries installed, search for chandler.c (in my installation it is located at \VC\crt\src\amd64\chandler.c).
The file now contains the following code:
if (TargetPc == ScopeTable->ScopeRecord[Index].JumpTarget
// Terminate only when we are at the Target frame;
// otherwise, continue search for outer finally:
&& IS_TARGET_UNWIND(ExceptionRecord->ExceptionFlags)
) {
break;
}
So the additional condition IS_TARGET_UNWIND(ExceptionRecord->ExceptionFlags) was added, which fixes this bug.
__C_specific_handler is implemented in several CRT libraries (in some cases it is statically linked, in others it is imported from vcruntime*.dll or msvcrt.dll, where it was forwarded to ntdll.dll). ntdll.dll also exports this function; however, in the latest Win10 builds (14393) it is still not fixed.
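The difference between the two checks can be modelled with a pair of small predicates. This is a simplified sketch, not the CRT source: old_should_stop and new_should_stop are hypothetical names, and kExceptionTargetUnwind is assumed to match the EXCEPTION_TARGET_UNWIND flag (0x20) from winnt.h. A plausible reading of why recursion triggers the bug is that both activations of foo share the same code, so the JumpTarget of the scope record equals TargetPc in the inner frame as well, even though the inner frame is not the unwind target.

```cpp
#include <cstdint>

// Assumed to mirror EXCEPTION_TARGET_UNWIND from winnt.h.
constexpr std::uint32_t kExceptionTargetUnwind = 0x20;

// Old check: stop scanning the scope table as soon as the unwind target PC
// equals the JumpTarget of the current scope record.
bool old_should_stop(std::uint64_t targetPc, std::uint64_t jumpTarget,
                     std::uint32_t /*exceptionFlags*/)
{
    return targetPc == jumpTarget;
}

// Fixed check: additionally require that the frame being processed is the
// target frame of the unwind; otherwise keep looking for outer __finally blocks.
bool new_should_stop(std::uint64_t targetPc, std::uint64_t jumpTarget,
                     std::uint32_t exceptionFlags)
{
    return targetPc == jumpTarget
        && (exceptionFlags & kExceptionTargetUnwind) != 0;
}
```

For the inner frame of the recursive example (matching addresses, flag not set), the old predicate stops the scan and the outer __finally is skipped, while the fixed predicate continues.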

Detect stack overflow with old GLIBC version

We suspect that we are encountering a stack overflow in our multithreaded program. However, as it is an embedded application, we have been unable to get valgrind etc working for it. Also, we are constrained to using GCC version v4.0.0 and GLIBC v2.3.2, which do not support the flag -fstack-protector-all.
How could we go about detecting whether the segmentation faults we are seeing are the result of a stack overflow in this instance? We have doubled the stack size of all our threads, and this fixes the problem, but we would like to be sure that this is a genuine fix.

You can figure this out for yourself with a bit of care. If you set up your program to use a stack you allocated you can add a "guard page" to catch reads and writes to the first page past the end of the given stack. You can then install a signal handler to catch the signal and tell you if the segfault was caused by an access within that guard page.
This is the smallest example I could make that shows how to do this:
#include <stdio.h>
#include <ucontext.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <malloc.h>
#include <signal.h>
static char *guard = NULL;
static const int pagesize = getpagesize();
static void handler(int sig, siginfo_t *info, void *ctx) {
if ((char*)info->si_addr >= guard && (char*)info->si_addr - guard < pagesize) {
write(2, "stack overflow\n", 15);
}
write(2, "sigsegv caught\n", 15);
_exit(-1);
}
static void install_handler() {
// register sigsegv handler:
static struct sigaction act;
act.sa_sigaction = handler;
sigemptyset(&act.sa_mask);
act.sa_flags=SA_SIGINFO|SA_ONSTACK;
// give the signal handler an alternative stack
static char stack[4096];
stack_t ss;
ss.ss_size = sizeof(stack);
ss.ss_sp = stack;
ss.ss_flags = 0; // must be initialized; a garbage value can make sigaltstack fail
if (sigaltstack(&ss, 0)) {
perror("sigaltstack");
fprintf(stderr,"failed to set sigstack\n");
exit(-1);
}
if (sigaction(SIGSEGV, &act, NULL)) {
perror("sigaction");
fprintf(stderr,"failed to set handler\n");
exit(-1);
}
}
static int overflow() {
return overflow() + 1;
}
static void test()
{
install_handler();
puts("start test");
// real code that might overflow
// test non-overflow segv
//*(char*)0 = 0;
// test overflow
overflow();
puts("finish test");
}
int main()
{
// create a stack and guard page (assumes a downward-growing stack, as on x86 and ARM):
const int pagesize = getpagesize();
char *st1=(char*)memalign(pagesize,pagesize*5);
if (!st1) {
perror("memalign");
return -1;
}
// the guard page is the lowest page, directly below the stack, so an overflow runs into it
guard = st1;
if (mprotect(guard, pagesize, PROT_NONE)) {
perror("mprotect");
fprintf(stderr,"failed to protect guard page: %p \n", guard);
return -1;
}
ucontext_t ctx[2];
getcontext(&ctx[1]);
ctx[1].uc_stack.ss_sp = st1+pagesize;
ctx[1].uc_stack.ss_size = 4*pagesize;
ctx[1].uc_link = &ctx[0];
makecontext(&ctx[1], test, 0);
swapcontext(&ctx[0], &ctx[1]);
return 0;
}
As well as using your own stack for your code to run on, you have to supply another stack on which the signal can be delivered; otherwise the signal delivery itself will fail because of the guard page.
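The decision the signal handler makes, whether the faulting address falls inside the guard page, can be pulled out into a small function and tested on its own. A minimal sketch, assuming a half-open range [guard, guard + pagesize); in_guard_page is a hypothetical helper name, not part of the example above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when addr lies in [guard, guard + pagesize).
// Working on integer values avoids pointer arithmetic between unrelated objects.
bool in_guard_page(const void* addr, const void* guard, std::size_t pagesize)
{
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addr);
    const std::uintptr_t g = reinterpret_cast<std::uintptr_t>(guard);
    return a >= g && a - g < pagesize;
}
```

In the handler this would be called with info->si_addr, the guard pointer and the page size. A fault that lands outside that range points to an ordinary invalid access rather than a stack overflow.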

Do you get a core file? You should be able to examine a stack trace (either by running the code in GDB or from the core file) and see whether there is a very deep call stack at the time of the crash.

GetLastError called in a catch block yields incorrect value

Fun exploring some legacy code today. Ran into this little number:
void Func1()
{
DWORD dwError;
try
{
dwError = 1;
throw "Hey!";
} catch (LPCTSTR szError)
{
Log("Log1: %d", dwError);
SetLastError(dwError);
throw szError;
}
}
void Func2()
{
try {
Func1();
}
catch (LPCTSTR szError)
{
DWORD dwLastError = GetLastError();
Log("Log2: %d", dwLastError); ///OMG is 0!
}
}
GetLastError() returns 0! Why is that? The functions are actually a bit more complicated than this. They do include a few things on the stack (DWORDs, CString, BYTE[]). What should I be looking for?
Logs look like:
Log1: 1
Log2: 0

C++ exceptions in the MSVC compiler and runtime are built on top of native Windows SEH. Stack unwinding is actually performed by Windows, and the Windows API functions called along the way can overwrite the thread's last-error value, so GetLastError() no longer returns what SetLastError() stored. More details about the connection with SEH in this answer.
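One way to sidestep the problem is to carry the error code inside the exception object instead of relying on the thread's last-error value surviving the unwind. The sketch below is an illustration under that assumption, not the legacy code's actual fix: ErrorWithCode and run_func2 are hypothetical names, and the code is plain C++ with no Windows calls.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries the error code with it.
class ErrorWithCode : public std::runtime_error
{
public:
    ErrorWithCode(const std::string& message, unsigned long code)
        : std::runtime_error(message), code_(code) {}
    unsigned long code() const noexcept { return code_; }
private:
    unsigned long code_;
};

// Mirrors the Func1/Func2 structure: the inner handler rethrows, and the
// outer handler reads the code from the exception object itself.
unsigned long run_func2()
{
    try
    {
        try
        {
            throw ErrorWithCode("Hey!", 1);
        }
        catch (const ErrorWithCode&)
        {
            throw;                // rethrow the same exception object
        }
    }
    catch (const ErrorWithCode& e)
    {
        return e.code();          // unaffected by anything done during unwinding
    }
    return 0;
}
```

In the original code, the equivalent step would be to read GetLastError() into a local variable immediately after the failing call and pass that value along, rather than calling GetLastError() again in an outer catch block.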
