I use Windows 10 and Visual Studio 2019.
My program creates threads. I need a way to find out how much stack space is still available at any point during execution.
#include <iostream>
#include <thread>
void thread_function()
{
//AvailableStackSize()?
//Code1 //variables on stack allocation + function call
//AvailableStackSize()? it should decrease
//Code2 //variables on stack allocation + function call
//AvailableStackSize()? it should decrease
//Code3 //variables on stack allocation + function call
//AvailableStackSize()? it should decrease
}
int main()
{
std::thread t(&thread_function);
std::cout << "main thread\n";
std::thread t2 = std::move(t); // std::thread is move-only, so it cannot be copied
t2.join();
return 0;
}
I tried the approach below, but I am not sure how to proceed: I can only query the total size of the stack, not the available part.
SIZE_T AvailableStackSize()
{
// Get the stack pointer (MSVC inline _asm only works in 32-bit x86 builds)
PBYTE pEsp;
_asm {
mov pEsp, esp
};
// Query the accessible stack region
MEMORY_BASIC_INFORMATION mbi;
VERIFY(VirtualQuery(pEsp, &mbi, sizeof(mbi)));
return mbi.RegionSize; // This is the total size! What is available? Where is the guard page?
}
I could probably also check whether some address is inside the stack:
PVOID add; // the address to test
bool inside = (PBYTE(add) >= PBYTE(mbi.BaseAddress)) && (PBYTE(add) < PBYTE(mbi.BaseAddress) + mbi.RegionSize);
I have seen several similar questions, but none of them fully answers this one.
What is the correct approach to get the available stack size?
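Since the stack grows downward, the available space is the difference between the lowest address the stack can grow to (the base of its reserved region) and the end of what is currently used. By the way, you don't need any inline assembly: locals are placed on the stack, so the address of a local variable tells you roughly where the stack pointer is.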
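#include <windows.h>
#include <cstdint>
__declspec(noinline) size_t AvailableStackSize()
{
// Query the committed stack region that contains this local variable
MEMORY_BASIC_INFORMATION mbi;
if (!VirtualQuery(&mbi, &mbi, sizeof(mbi)))
return 0;
// AllocationBase is the lowest address of the whole stack reservation
return uintptr_t(&mbi) - uintptr_t(mbi.AllocationBase);
}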
The result will be slightly different from what is actually available in the caller, because the call itself uses some stack (for the return address and any preserved registers). You would also need to investigate whether the final guard page shows up in the same VirtualQuery result or in a neighboring one.
But this is the general approach.
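An alternative sketch, not taken from the answer above: on Windows 8 and later, GetCurrentThreadStackLimits returns the low and high bounds of the current thread's stack directly, and on Linux/glibc the GNU extension pthread_getattr_np together with pthread_attr_getstack gives the same bounds. The available space is then the distance from a local variable down to the low bound. Like the VirtualQuery version, this is approximate: on Windows the low bound can still include the guard pages at the bottom of the reservation, so the truly usable amount is a few pages smaller. NOINLINE and available_after_big_frame are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>   // GetCurrentThreadStackLimits (Windows 8 or later)
#define NOINLINE __declspec(noinline)
#else
#include <pthread.h>   // pthread_getattr_np (GNU extension, glibc)
#define NOINLINE __attribute__((noinline))
#endif

// Approximate bytes left between this call's frame and the lowest address
// the current thread's stack may grow down to.
NOINLINE std::size_t AvailableStackSize()
{
    char marker = 0;  // a local: its address approximates the stack pointer
    const std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
#if defined(_WIN32)
    ULONG_PTR low = 0, high = 0;
    GetCurrentThreadStackLimits(&low, &high);  // [low, high) is the stack reservation
    const std::uintptr_t bottom = static_cast<std::uintptr_t>(low);
#else
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    void* lowest = nullptr;
    std::size_t size = 0;
    pthread_attr_getstack(&attr, &lowest, &size);  // lowest address of the stack
    pthread_attr_destroy(&attr);
    const std::uintptr_t bottom = reinterpret_cast<std::uintptr_t>(lowest);
#endif
    return here > bottom ? here - bottom : 0;
}

// Hypothetical helper: measures from inside a frame that holds a 16 KiB
// buffer, so the result should be smaller than a call made from main().
NOINLINE std::size_t available_after_big_frame()
{
    volatile char buf[16 * 1024];
    buf[0] = 1;
    const std::size_t avail = AvailableStackSize();
    buf[sizeof(buf) - 1] = buf[0];  // keep buf alive across the call
    return avail;
}
```

Calling AvailableStackSize() directly from main() and then available_after_big_frame() should give a smaller second value, which is the "it should decrease" behavior the question asks for.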
I'm designing a preloader-based lock tracing utility that attaches to Pthreads, and I've run into a weird issue. The program works by providing wrappers that replace relevant Pthreads functions at runtime; these do some logging, and then pass the args to the real Pthreads function to do the work. They do not modify the arguments passed to them, obviously. However, when testing, I discovered that the condition variable pointer passed to my pthread_cond_wait() wrapper does not match the one that gets passed to the underlying Pthreads function, which promptly crashes with "futex facility returned an unexpected error code," which, from what I've gathered, usually indicates an invalid sync object passed in. Relevant stack trace from GDB:
#8 __pthread_cond_wait (cond=0x7f1b14000d12, mutex=0x55a2b961eec0) at pthread_cond_wait.c:638
#9 0x00007f1b1a47b6ae in pthread_cond_wait (cond=0x55a2b961f290, lk=0x55a2b961eec0)
at pthread_trace.cpp:56
I'm pretty mystified. Here's the code for my pthread_cond_wait() wrapper:
int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* lk) {
// log arrival at wait
the_tracer.add_event(lktrace::event::COND_WAIT, (size_t) cond);
// run pthreads function
GET_REAL_FN(pthread_cond_wait, int, pthread_cond_t*, pthread_mutex_t*);
int e = REAL_FN(cond, lk);
if (e == 0) the_tracer.add_event(lktrace::event::COND_LEAVE, (size_t) cond);
else {
the_tracer.add_event(lktrace::event::COND_ERR, (size_t) cond);
}
return e;
}
// GET_REAL_FN is defined as:
#define GET_REAL_FN(name, rtn, params...) \
typedef rtn (*real_fn_t)(params); \
static const real_fn_t REAL_FN = (real_fn_t) dlsym(RTLD_NEXT, #name); \
assert(REAL_FN != NULL) // semicolon absence intentional
And here's the code for __pthread_cond_wait in glibc 2.31 (this is the function that gets called when you call pthread_cond_wait normally; it has a different name because of symbol versioning. The stack trace above confirms that this is the function REAL_FN points to):
int
__pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex)
{
/* clockid is unused when abstime is NULL. */
return __pthread_cond_wait_common (cond, mutex, 0, NULL);
}
As you can see, neither of these functions modifies cond, yet it is not the same in the two frames. Examining the two different pointers in a core dump shows that they point to different contents, as well. I can also see in the core dump that cond does not appear to change in my wrapper function (i.e. it's still equal to 0x5... in frame 9 at the crash point, which is the call to REAL_FN). I can't really tell which pointer is correct by looking at their contents, but I'd assume it's the one passed in to my wrapper from the target application. Both pointers point to valid segments for program data (marked ALLOC, LOAD, HAS_CONTENTS).
My tool is definitely causing the error somehow, the target application runs fine if it is not attached. What am I missing?
UPDATE: Actually, this doesn't appear to be what's causing the error, because calls to my pthread_cond_wait() wrapper succeed many times before the error occurs, and exhibit similar behavior (pointer value changing between frames without explanation) each time. I'm leaving the question open, though, because I still don't understand what's going on here and I'd like to learn.
UPDATE 2: As requested, here's the code for tracer.add_event():
// add an event to the calling thread's history
// hist_entry ctor gets timestamp & stack trace
void tracer::add_event(event e, size_t obj_addr) {
size_t tid = get_tid();
hist_map::iterator hist = histories.contains(tid);
assert(hist != histories.end());
hist_entry ev (e, obj_addr);
hist->second.push_back(ev);
}
// hist_entry ctor:
hist_entry::hist_entry(event e, size_t obj_addr) :
ts(chrono::steady_clock::now()), ev(e), addr(obj_addr) {
// these are set in the tracer ctor
assert(start_addr && end_addr);
void* buf[TRACE_DEPTH];
int v = backtrace(buf, TRACE_DEPTH);
int a = 0;
// find first frame outside of our own code
while (a < v && start_addr < (size_t) buf[a] &&
end_addr > (size_t) buf[a]) ++a;
// skip requested amount of frames
a += TRACE_SKIP;
if (a >= v) a = v-1;
caller = buf[a];
}
histories is a lock-free concurrent hashmap from libcds (mapping tid->per-thread vectors of hist_entry), and its iterators are guaranteed to be thread-safe as well. The GNU docs say backtrace() is thread-safe, and the C++ docs mention no data races for steady_clock::now(). get_tid() just calls pthread_self() using the same method as the wrapper functions and casts its result to size_t.
Hah, figured it out! The issue is that Glibc exposes multiple versions of pthread_cond_wait(), for backwards compatibility. The version I reproduce in my question is the current version, the one we want to call. The version that dlsym() was finding is the backwards-compatible version:
int
__pthread_cond_wait_2_0 (pthread_cond_2_0_t *cond, pthread_mutex_t *mutex)
{
if (cond->cond == NULL)
{
pthread_cond_t *newcond;
newcond = (pthread_cond_t *) calloc (sizeof (pthread_cond_t), 1);
if (newcond == NULL)
return ENOMEM;
if (atomic_compare_and_exchange_bool_acq (&cond->cond, newcond, NULL))
/* Somebody else just initialized the condvar. */
free (newcond);
}
return __pthread_cond_wait (cond->cond, mutex);
}
As you can see, this version tail-calls the current one, which is probably why this took so long to detect: GDB is normally pretty good at detecting frames elided by tail calls, but I'm guessing it didn't detect this one because the functions have the "same" name (and the error doesn't affect the mutex functions because they don't expose multiple versions). This blog post goes into much more detail, coincidentally specifically about pthread_cond_wait(). I stepped through this function many times while debugging and sort of tuned it out, because every call into glibc is wrapped in multiple layers of indirection; I only realized what was going on when I set a breakpoint on the pthread_cond_wait symbol, instead of a line number, and it stopped at this function.
Anyway, this explains the changing pointer phenomenon: what happens is that the old, incorrect function gets called, reinterprets the pthread_cond_t object as a struct containing a pointer to a pthread_cond_t object, allocates a new pthread_cond_t for that pointer, and then passes the newly allocated one to the new, correct function. The frame of the old function gets elided by the tail-call, and to a GDB backtrace after leaving the old function it looks like the correct function gets called directly from my wrapper, with a mysteriously changed argument.
The fix for this was simple: GNU provides the libdl extension dlvsym(), which is like dlsym() but also takes a version string. Looking up pthread_cond_wait with the version string "GLIBC_2.3.2" solves the problem. Note that these versions do not usually correspond to the current glibc version (e.g. pthread_create()/exit() have the version string "GLIBC_2.2.5"), so they need to be looked up on a per-function basis. The correct string can be determined either from the compat_symbol() or versioned_symbol() macros near the function definition in the glibc source, or by using readelf to list the symbol names in the compiled library (mine has "pthread_cond_wait@@GLIBC_2.3.2" and "pthread_cond_wait@GLIBC_2.2.5"; @@ marks the default version).
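A minimal sketch of that fix, assuming x86-64 glibc, where the current pthread_cond_wait is versioned GLIBC_2.3.2 and the compatibility one GLIBC_2.2.5 (other architectures use different version strings; check with readelf). GET_REAL_FN_V is a hypothetical variant of the question's GET_REAL_FN macro that takes the version as an extra argument. On glibc older than 2.34, dlvsym lives in libdl, so the program must be linked with -ldl.

```cpp
#include <dlfcn.h>    // dlvsym, RTLD_NEXT (GNU extensions; g++ defines _GNU_SOURCE)
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical variant of GET_REAL_FN that also pins the symbol version,
// so the wrapper cannot pick up the GLIBC_2.2.5 compatibility entry point.
#define GET_REAL_FN_V(name, version, rtn, params...) \
    typedef rtn (*real_fn_t)(params); \
    static const real_fn_t REAL_FN = \
        (real_fn_t) dlvsym(RTLD_NEXT, #name, version); \
    assert(REAL_FN != NULL) // semicolon absence intentional

// Returns the address a wrapper built with GET_REAL_FN_V would forward to.
void* resolve_current_cond_wait()
{
    GET_REAL_FN_V(pthread_cond_wait, "GLIBC_2.3.2", int, pthread_cond_t*, pthread_mutex_t*);
    return reinterpret_cast<void*>(REAL_FN);
}
```

Inside the wrapper, the GET_REAL_FN line would become GET_REAL_FN_V(pthread_cond_wait, "GLIBC_2.3.2", int, pthread_cond_t*, pthread_mutex_t*); and REAL_FN(cond, lk) then reaches the current implementation directly instead of going through __pthread_cond_wait_2_0.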
#include <iostream>
int testFun(int A)
{
return A+1;
}
int main()
{
int x=0;
int y= testFun(x);
std::cout<<y;
}
As we know, the stack holds local variables: while main is running, the stack contains x and y, and when testFun is called it also contains A. When testFun returns, its frame is popped off the stack.
But here is my question: when testFun returns, how does the program know where it was in main before the call?
The compiler parses the code and generates machine instructions that run on the CPU. A function call produces a CALL instruction. When the function exits, a RET instruction is used to return to the caller.
The CALL instruction pushes the address of the instruction that follows the CALL itself onto the call stack, then jumps to the starting address of the specified function.
The RET instruction pops that address from the call stack, then jumps to the specified address.
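A toy model can make this concrete. The sketch below is a hypothetical interpreter, not real x86: its Call pushes the index of the next instruction and jumps to the callee, and its Ret pops that index and continues there, as described above. Its stack holds only return addresses; a real stack frame also holds locals such as A and saved registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction set (hypothetical) modeling CALL/RET semantics.
enum class Op { Call, Ret, Trace, Halt };
struct Instr { Op op; std::size_t arg; };

// Runs the program from index 0 and returns the sequence of Trace arguments.
std::vector<std::size_t> run(const std::vector<Instr>& prog)
{
    std::vector<std::size_t> call_stack;  // return addresses only
    std::vector<std::size_t> trace;
    std::size_t pc = 0;
    while (true) {
        const Instr& in = prog.at(pc);
        switch (in.op) {
        case Op::Call:
            call_stack.push_back(pc + 1);  // address of the instruction after CALL
            pc = in.arg;                   // jump to the callee
            break;
        case Op::Ret:
            assert(!call_stack.empty() && "RET with empty call stack");
            pc = call_stack.back();        // pop the return address...
            call_stack.pop_back();         // ...and continue from there
            break;
        case Op::Trace:
            trace.push_back(in.arg);
            ++pc;
            break;
        case Op::Halt:
            return trace;
        }
    }
}
```

For example, with indices 0 to 3 playing main and 4 to 5 playing testFun, the program {Trace 1, Call 4, Trace 2, Halt, Trace 3, Ret} yields the trace 1, 3, 2: after Ret, execution resumes at index 2, right after the Call.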
I need to increase the stack size of a boost::thread object. The thread's task is to store a large set of 3D points, which is implemented recursively and therefore needs quite a lot of stack space.
int main(int argc, char* argv[]) {
Flashlight *flashlight = new Flashlight();
flashlight->thread_group = new boost::thread_group();
boost::thread::attributes attrs;
attrs.set_stack_size(16*1024*1024);
flashlight->orbslam_thread = new boost::thread(attrs, boost::bind(&Flashlight::orbslam_loop, flashlight));
flashlight->thread_group->add_thread(flashlight->orbslam_thread);
// initializing some more threads ...
flashlight->thread_group->join_all();
return 0;
}
A.) Did I increase the thread's stack size to 16MB correctly in the code listed above?
B.) Is it possible to read the attributes, in particular the current stack size, of a boost::thread object somehow?
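Yes, you have set the stack size to 16 MB. You can read the value back from the attributes object (this is the size you requested, not a value queried from the running thread):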
std::cout << attrs.get_stack_size() << std::endl;
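To see what a running thread actually received, one option on POSIX systems is the GNU extension pthread_getattr_np. boost::thread::native_handle() returns the underlying pthread_t on pthread-based platforms, so it could be passed to a helper like reported_stack_size below. This is a plain-pthreads sketch under the assumption that Boost's POSIX backend applies the requested size with pthread_attr_setstacksize; reported_stack_size and stack_size_seen_by_new_thread are hypothetical names, and glibc may report a value rounded up from the request.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Stack size the C library reports for a running thread (GNU extension).
// With Boost on POSIX, thread.native_handle() yields a pthread_t for this.
std::size_t reported_stack_size(pthread_t thread)
{
    pthread_attr_t attr;
    if (pthread_getattr_np(thread, &attr) != 0)
        return 0;
    std::size_t size = 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}

namespace {
void* report_own_stack(void* out)
{
    *static_cast<std::size_t*>(out) = reported_stack_size(pthread_self());
    return nullptr;
}
}  // namespace

// Starts a thread with the requested stack size (the pthread counterpart of
// attrs.set_stack_size) and returns the size that thread sees for itself.
std::size_t stack_size_seen_by_new_thread(std::size_t requested)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, requested);
    std::size_t seen = 0;
    pthread_t t;
    if (pthread_create(&t, &attr, report_own_stack, &seen) == 0)
        pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    return seen;
}
```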
I'm trying to hook a function using PAGE_GUARD but it does not raise any exception when the page/address is called.
void HookMe(){
printf("Not hooked\n");
}
void GoodFnc(){
printf("Hooked!\n");
}
long ExceptionHandler(PEXCEPTION_POINTERS ex){
printf("ExceptionHandler called\n");
}
/*Called by CreateThread in main*/
DWORD WINAPI ExceptionTesting(LPVOID) {
DWORD old = 0;
AddVectoredExceptionHandler(1, ExceptionHandler);
if (VirtualProtect((LPVOID)HookMe, 1, PAGE_EXECUTE_READWRITE | PAGE_GUARD, &old))
printf("PAGE_GUARD set\n");
//This was for testing:
//*(char*)0 = 0;//ExceptionHandler gets called when ACCESS_VIOLATION happens
while (1) {
HookMe();
Sleep(1000);
}
return 0;
}
The code above will only show PAGE_GUARD set and then Not hooked each second, without raising any kind of exception.
I've also made sure that HookMe() is in a different memory page than ExceptionHandler(...) and ExceptionTesting(LPVOID)
Causing any other kind of exception, such as ACCESS_VIOLATION (as in the commented-out line above the infinite loop), does result in ExceptionHandler being called.
It is possible, depending on your compiler, that the call to HookMe is inlined. Inspect the generated code. You should be able to prevent this with something like __declspec(noinline) on the declaration of HookMe (MS VC++). Note that you can take the address of a function even if every call to it is inlined!
The documentation for VirtualProtect says that addresses being protected must be part of a reserved region acquired by using VirtualAlloc (or VirtualAllocEx). Code within your program was not allocated this way.
Also, protection is applied on a page basis (usually 4 KB), so most likely all of the code in your example above would be protected, and the guard would go off immediately when the call to VirtualProtect returned, not when HookMe was called.
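The VirtualProtect documentation states that the affected region includes every page containing at least one byte in the range from lpAddress to lpAddress + dwSize. The portable arithmetic below (affected_pages and PageRange are hypothetical names) shows why protecting even 1 byte of HookMe guards the entire page around it, and why a 2-byte range that straddles a boundary touches two pages.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range of page-aligned addresses: [first, end).
struct PageRange { std::uintptr_t first; std::uintptr_t end; };

// Pages touched when protecting `size` bytes starting at `addr`.
// page_size must be a power of two (0x1000 on typical x86/x64 Windows).
PageRange affected_pages(std::uintptr_t addr, std::uintptr_t size, std::uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(size > 0);
    const std::uintptr_t mask = ~(page_size - 1);
    const std::uintptr_t first = addr & mask;                          // round down
    const std::uintptr_t end = (addr + size + page_size - 1) & mask;   // round up
    return {first, end};
}
```

For instance, protecting 1 byte at 0x401234 with 4 KB pages affects the whole range 0x401000 to 0x402000, including any other function that happens to live there.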
About VirtualProtect: according to its documentation, it "changes the protection on a region of committed pages in the virtual address space of the calling process."
Pages, not single bytes: PAGE_GUARD can only be set on whole pages (0x1000 bytes). So when you try to set PAGE_GUARD on some function, you set the guard attribute not only on it but on many bytes around it (before and after). With code like yours (which is pseudocode anyway and would not even compile), the guard exception will most likely fire right after VirtualProtect returns, on the next instruction after the call. If you want only a single function to be affected by the guard page, you need to place it in a separate executable section, for example with #pragma code_seg. Also note that no infinite loop or separate thread is needed for the test:
#include <windows.h>
#include <cwchar>
//#pragma code_seg(".guard")
void HookMe(){
MessageBoxW(0, 0, L"HookMe", MB_ICONINFORMATION);
}
#pragma code_seg()
LONG NTAPI ExceptionHandler(::PEXCEPTION_POINTERS pep)
{
if (pep->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION)
{
WCHAR msg[64];
swprintf(msg, _countof(msg), L"guard violation at %p (%p)", pep->ExceptionRecord->ExceptionAddress, (PVOID)HookMe);
MessageBoxW(0, msg, L"ExceptionHandler", MB_ICONWARNING);
// The system clears PAGE_GUARD when it fires, so execution can simply continue
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
void gtest()
{
if (PVOID pv = AddVectoredExceptionHandler(TRUE, ExceptionHandler))
{
ULONG op;
if (VirtualProtect((PVOID)HookMe, 1, PAGE_EXECUTE_READ|PAGE_GUARD, &op))
{
HookMe();
}
RemoveVectoredExceptionHandler(pv);
}
}
int main()
{
gtest();
}
Why isn't setjmp saving the stack?
Consider the following code:
#include <setjmp.h>
#include <iostream>
jmp_buf Buf;
jmp_buf Buf2;
void MyFunction()
{
for(int i = 0; i < 5; i++)
{
std::cout << i << std::endl;
if(!setjmp(Buf))
longjmp(Buf2, 1);
}
}
int main (int argc, const char * argv[])
{
while(true)
{
if(!setjmp(Buf2))
{
MyFunction();
break;
}
longjmp(Buf, 1);
}
return 0;
}
What I expect is that the code will jump back and forth between main and the function, printing an increasing number every time.
What actually happens is that it prints 0 and then 1 an infinite number of times. It is as if the stack is reset to its defaults when it jumps back into the function. Why is it doing this? Is there any way I can make it save the stack too?
I know setjmp and longjmp are even worse than goto when it comes to coding style and readable code, but I am experimenting right now, and this code will probably never see the light of a usable application.
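Unfortunately, that's not how setjmp works. setjmp saves the current instruction pointer and register set (including the stack pointer) into the jump buffer, but it does not copy the stack itself, since the stack can be huge. Once MyFunction has been left via longjmp(Buf2, 1), its stack frame is dead, and any later function call (including longjmp itself) may reuse that memory. The C standard makes jumping back into it with longjmp(Buf, 1) undefined behavior, because the function that called setjmp(Buf) has already terminated. It looks like you want some kind of coroutine technique. If you want to do this yourself, check out the ucontext procedures (ucontext.h) http://compute.cnr.berkeley.edu/cgi-bin/man-cgi?ucontext.h+3; they will help you allocate and manage additional thread stacks.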
Or you could use something like Russ Cox's libtask (http://swtch.com/libtask/), which does this for you. If you want to do it yourself, take a look at the libtask code (also available through that link). It's pretty easy to read, so it's a good resource.
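For comparison, here is a minimal sketch using the ucontext functions mentioned above (getcontext, makecontext and swapcontext from ucontext.h; obsolescent in newer POSIX versions but still provided by glibc on Linux). It reproduces the ping-pong the question intended: because the coroutine runs on its own stack, its loop variable survives every switch. my_function, run_ping_pong and the 64 KiB stack size are illustrative choices, and the values are collected in a vector instead of being printed so they can be checked.

```cpp
#include <ucontext.h>
#include <cassert>
#include <vector>

ucontext_t main_ctx;        // where main() is parked while the coroutine runs
ucontext_t func_ctx;        // the coroutine, running on its own stack
std::vector<int> produced;  // the values the question printed with std::cout
bool finished = false;

void my_function()
{
    for (int i = 0; i < 5; i++) {
        produced.push_back(i);
        swapcontext(&func_ctx, &main_ctx);  // yield to main; i survives on this stack
    }
    finished = true;  // returning follows uc_link back to main_ctx
}

// Drives the coroutine until it finishes and returns what it produced.
std::vector<int> run_ping_pong()
{
    alignas(16) static char stack[64 * 1024];
    produced.clear();
    finished = false;
    getcontext(&func_ctx);
    func_ctx.uc_stack.ss_sp = stack;
    func_ctx.uc_stack.ss_size = sizeof(stack);
    func_ctx.uc_link = &main_ctx;            // resume here when my_function returns
    makecontext(&func_ctx, my_function, 0);
    while (!finished)
        swapcontext(&main_ctx, &func_ctx);   // resume the coroutine where it left off
    return produced;
}
```

Unlike longjmp, swapcontext switches between two stacks that both stay alive, so each resume continues the same loop iteration count rather than a clobbered frame.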