#include <iostream>
int testFun(int A)
{
    return A+1;
}
int main()
{
    int x=0;
    int y= testFun(x);
    std::cout<<y;
}
As we know, local variables are stored on the stack. So while I am in the main function, the stack holds the variables x and y, and when I call testFun, a new frame holding the variable A is pushed.
When I return from testFun, the stack pops that last frame.
But the question is: when I return from testFun, how does the program know where it was in main before calling testFun?
The compiler parses the code and generates machine instructions that run on the CPU. On x86, a function call is compiled to a CALL instruction, and when the function exits, a RET instruction is used to return to the caller.
The CALL instruction pushes the address of the instruction that follows the CALL itself onto the call stack, then jumps to the starting address of the called function.
The RET instruction pops that address from the call stack, then jumps to it.
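To make the mechanism concrete, here is a toy model rather than real machine code: a tiny interpreter with a hypothetical instruction set (Op, Instr and Run are invented for illustration). Its Call pushes the index of the next instruction onto a stack and jumps; its Ret pops that index and continues there, which is how execution finds its way back into main.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction set, used only to illustrate CALL/RET.
enum class Op { Call, Ret, Log, Halt };

struct Instr {
    Op op;
    std::size_t target;  // jump target, used by Call
    std::string text;    // message, used by Log
};

// Executes the program and returns the messages produced by Log.
std::vector<std::string> Run(const std::vector<Instr>& program)
{
    std::vector<std::size_t> callStack;  // holds saved return addresses
    std::vector<std::string> log;
    std::size_t pc = 0;                  // plays the role of the instruction pointer
    while (pc < program.size()) {
        const Instr& in = program[pc];
        switch (in.op) {
        case Op::Call:
            callStack.push_back(pc + 1); // push the address of the instruction after CALL
            pc = in.target;              // then jump to the function
            break;
        case Op::Ret:
            if (callStack.empty())
                return log;              // RET with nothing to return to: stop
            pc = callStack.back();       // pop the saved return address
            callStack.pop_back();        // and continue from there
            break;
        case Op::Log:
            log.push_back(in.text);
            ++pc;
            break;
        case Op::Halt:
            return log;
        }
    }
    return log;
}
```

For example, a program that logs, calls a "function" at index 4 which logs and returns, and then logs again produces the three messages in that order: the Ret resumes right after the Call.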
I use Windows 10 and Visual Studio 2019.
The program creates threads. I need to add a function that tells me the available stack size at any point during execution.
#include <iostream>
#include <thread>
void thread_function()
{
    //AvailableStackSize()?
    //Code1 //variables allocated on the stack + function call
    //AvailableStackSize()? it should decrease
    //Code2 //variables allocated on the stack + function call
    //AvailableStackSize()? it should decrease
    //Code3 //variables allocated on the stack + function call
    //AvailableStackSize()? it should decrease
}
int main()
{
    std::thread t(&thread_function);
    std::cout << "main thread\n";
    std::thread t2 = std::move(t); // std::thread is move-only: it cannot be copied
    t2.join();
    return 0;
}
I tried to use . But I am not sure how to proceed: I can only query the total size of the stack, not the available size.
size_t AvailableStackSize()
{
    // Get the stack pointer (MSVC inline assembly is only supported for 32-bit x86)
    PBYTE pEsp;
    _asm {
        mov pEsp, esp
    };
    // Query the accessible stack region
    MEMORY_BASIC_INFORMATION mbi;
    VERIFY(VirtualQuery(pEsp, &mbi, sizeof(mbi)));
    return mbi.RegionSize; // This is the total size! What is the available size? Where is the guard page?
}
I could probably also check whether some address is inside the stack:
PVOID add; // the address to check
bool inStack = (PBYTE(add) >= PBYTE(mbi.BaseAddress)) && (PBYTE(add) < PBYTE(mbi.BaseAddress) + mbi.RegionSize);
I have seen several similar questions, but none of the answers fully addresses this one.
What is the correct approach to get the available stack size?
Since the stack grows downward, the available space is the difference between the lowest address the stack can grow to and the end of the part currently in use. By the way, you don't need any inline assembly: locals are placed on the stack, so the address of a local variable tells you approximately where the stack pointer is.
__declspec(noinline) size_t AvailableStackSize()
{
// Query the accessible stack region
MEMORY_BASIC_INFORMATION mbi;
VERIFY(VirtualQuery(&mbi, &mbi, sizeof(mbi)));
return uintptr_t(&mbi) - uintptr_t(mbi.AllocationBase);
}
This will be slightly different from what's actually available in the caller, because the function call itself uses some stack space (to store the return address and any preserved registers). You would also need to investigate whether the final guard page appears in the same VirtualQuery result or in a neighboring one.
But this is the general approach.
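The same idea can also be sketched without VERIFY or VirtualQuery. This is an alternative under stated assumptions, not the answer's code: on Windows it uses GetCurrentThreadStackLimits (Windows 8 or later), and elsewhere it assumes glibc and its pthread_getattr_np extension. In both cases the address of a local stands in for the stack pointer, and guard pages may be counted as available, so the result is an estimate.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>   // GetCurrentThreadStackLimits (Windows 8 or later)
#else
#include <pthread.h>   // pthread_getattr_np is a GNU extension (glibc)
#endif

// Estimated bytes between the current stack position and the lowest address
// this thread's stack may grow to. Guard pages may be included, so treat the
// value as an upper bound.
#if defined(_MSC_VER)
__declspec(noinline)
#else
__attribute__((noinline))
#endif
std::size_t AvailableStackSize()
{
    char marker = 0;  // a local, so its address is close to the stack pointer
    std::uintptr_t low = 0;
#if defined(_WIN32)
    ULONG_PTR lowLimit = 0, highLimit = 0;
    GetCurrentThreadStackLimits(&lowLimit, &highLimit);
    low = lowLimit;
#else
    pthread_attr_t attr;
    void *stackAddr = nullptr;
    std::size_t stackSize = 0;
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;  // could not query the stack: report nothing available
    pthread_attr_getstack(&attr, &stackAddr, &stackSize);  // stackAddr is the lowest address
    pthread_attr_destroy(&attr);
    low = reinterpret_cast<std::uintptr_t>(stackAddr);
#endif
    return reinterpret_cast<std::uintptr_t>(&marker) - low;
}
```

Called at the points marked AvailableStackSize()? in thread_function, it should return smaller values as deeper frames and larger locals are used.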
I have written a pintool that maintains its own stack while the program's instructions execute. When a call instruction is encountered, it pushes the address of the next instruction onto the stack. When the called procedure completes and a return instruction is encountered, it checks that the target address of the ret instruction equals the top of the stack, and pops it.
Normally the number of call instructions should equal the number of return instructions, but this tool counts more return instructions. How is this possible? What is the problem, and how can I solve it?
Edit 1: code for the pintool:
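#include <iostream>
#include <stack>
#include "pin.H"
using namespace std;
stack<ADDRINT> s;   // the tool's own stack of expected return addresses
UINT64 icount1 = 0; // number of calls seen
UINT64 icount2 = 0; // number of returns seen
// Pin passes addresses as ADDRINT; declaring the parameters as int truncates them on 64-bit targets.
VOID f_jump(ADDRINT target, ADDRINT returnAddr)
{
    s.push(returnAddr);
    cout<<s.top()<<"\t";
    icount1++;
}
VOID f_ret(ADDRINT ip, ADDRINT target)
{
    // calling top() on an empty std::stack is undefined behavior, so check first
    if (!s.empty() && target==s.top())
    {
        cout<<s.top();
        s.pop();
        cout<<"\tOK"<<endl;
    }
    else if (s.empty())
        cout<<"Exploit\t(stack empty)\t"<<target<<endl;
    else
        cout<<"Exploit\t"<<endl<<s.top()<<"\t"<<target<<endl;
    icount2++;
}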
VOID Instruction(INS ins, VOID *v)
{
if( INS_IsCall(ins) )
{
INS_InsertCall(ins,IPOINT_TAKEN_BRANCH,AFUNPTR(f_jump),
IARG_BRANCH_TARGET_ADDR,IARG_RETURN_IP, IARG_END);
}
if( INS_IsRet(ins) )
{
INS_InsertCall(ins,IPOINT_BEFORE,AFUNPTR(f_ret),
IARG_INST_PTR,IARG_BRANCH_TARGET_ADDR, IARG_END);
}
}
I ran it on various binaries and processes, but the problem remained the same. Please help.
A function call can end in longjmp, a C++ exception, or an exit call instead of a normal return. In that case you miss the return instructions of that function call, and your stack no longer matches the real one.
This has been discussed many times here.
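One possible way to tolerate such non-local exits is sketched below. It is plain C++ rather than Pin API code, and ShadowStack with its methods is a hypothetical name: on a return, entries are popped until one matches the return target, skipped entries are counted as frames abandoned without a RET, and only a return matching no entry at all is reported. In the tool, OnCall and OnRet would be driven from f_jump and f_ret.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stack of expected return addresses that resynchronizes
// after non-local exits (longjmp, exceptions, ...).
class ShadowStack {
public:
    void OnCall(std::uintptr_t returnAddr) { frames_.push_back(returnAddr); }

    // Returns true if the return target matched some saved return address.
    bool OnRet(std::uintptr_t target) {
        // Search from the top down for a matching return address.
        for (std::size_t i = frames_.size(); i > 0; --i) {
            if (frames_[i - 1] == target) {
                skipped_ += frames_.size() - i;  // frames abandoned without a RET
                frames_.resize(i - 1);           // drop the match and everything above it
                return true;
            }
        }
        ++unmatched_;  // a RET with no corresponding CALL on record
        return false;
    }

    std::size_t Depth() const { return frames_.size(); }
    std::size_t Skipped() const { return skipped_; }
    std::size_t Unmatched() const { return unmatched_; }

private:
    std::vector<std::uintptr_t> frames_;
    std::size_t skipped_ = 0;
    std::size_t unmatched_ = 0;
};
```

The trade-off is that a return matching a deeper entry is accepted, so this is more permissive than the strict top-of-stack check in the original tool.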
Here is my program, written just to find the difference between pthread_exit and returning from a thread.
#include <iostream>
#include <cstdlib>
#include <pthread.h>
using namespace std;
struct foo{
    int a,b,c,d;
    ~foo(){cout<<"foo destructor called"<<endl;}
};
//struct foo foo={1,2,3,4};
void printfoo(const char *s, const struct foo *fp)
{
    cout<<s;
    cout<<"struct at "<<static_cast<const void *>(fp)<<endl; // casting the pointer to unsigned would truncate it on 64-bit
    cout<<"foo.a="<<fp->a<<endl;
    cout<<"foo.b="<<fp->b<<endl;
    cout<<"foo.c="<<fp->c<<endl;
    cout<<"foo.d="<<fp->d<<endl;
}
void *thr_fn1(void *arg)
{
    struct foo foo={1,2,3,4};
    printfoo("thread1:\n",&foo);
    pthread_exit((void *)&foo);
    //return((void *)&foo);
}
int main(int argc, char *argv[])
{
    int err;
    pthread_t tid1,tid2;
    struct foo *fp;
    err=pthread_create(&tid1,NULL,thr_fn1,NULL);
    if(err!=0)
        cout<<"can't create thread 1"<<endl;
    err=pthread_join(tid1,(void **)&fp);
    if(err!=0)
        cout<<"can't join with thread 1"<<endl;
    exit(0);
}
In the thread function thr_fn1 I create an object foo.
According to the site pthread_exit vs. return,
when I exit the thread function thr_fn1() using "return((void *)&foo);", the destructor of foo should be called, but it should not be called when I use "pthread_exit((void *)&foo);" to leave thr_fn1() and get back to main.
But in both cases, using either "return((void *)&foo);" or "pthread_exit((void *)&foo);", the destructor of the local object foo in thr_fn1() is called.
I don't think this is the expected behaviour: the destructor should be called only in the "return((void *)&foo);" case.
Please tell me if I am wrong.
Your code has a serious problem. Specifically, you're using a local variable as the exit value for pthread_exit():
void *thr_fn1(void *arg)
{
struct foo foo={1,2,3,4};
printfoo("thread1:\n",&foo);
pthread_exit((void *)&foo);
//return((void *)&foo);
}
Per the Pthreads spec, "After a thread has terminated, the result of access to local (auto) variables of the thread is undefined."
Therefore, returning the address of a stack-allocated variable from your thread function as the thread exit value (in your case, pthread_exit((void *)&foo) ) will cause problems for any code that retrieves and attempts to dereference this address.
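A common way to avoid this is to allocate the exit value on the heap so that it outlives the thread, and to make the joining thread responsible for freeing it. A minimal sketch, with a hypothetical Result type and helper names:

```cpp
#include <pthread.h>

struct Result { int a, b, c, d; };  // hypothetical payload type

// Thread function: the exit value lives on the heap, not on the thread's stack.
void *worker(void *)
{
    Result *r = new Result{1, 2, 3, 4};
    pthread_exit(r);  // same effect as: return r;
}

// Starts the worker, waits for it, and copies out its result.
// Returns false if the thread could not be created or joined.
bool RunWorker(Result &out)
{
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0)
        return false;
    void *ret = nullptr;
    if (pthread_join(tid, &ret) != 0)
        return false;
    Result *r = static_cast<Result *>(ret);
    out = *r;   // safe: the heap object is still alive after the thread ended
    delete r;   // the joining thread owns the result and frees it
    return true;
}
```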
Yes, that's right. pthread_exit() immediately exits the current thread, without calling any destructors of objects higher up on the stack. If you're coding in C++, you should make sure to either always return from your thread procedure, or only call pthread_exit() from one of the bottommost stack frames with no objects with destructors still alive in that frame or any higher frames; otherwise, you will leak resources or cause other bad problems.
With glibc, pthread_exit() works by throwing an exception, which causes the stack to unwind and destructors to be called for locals. See https://stackoverflow.com/a/11452942/12711 for more details.
The exception thrown is of type abi::__forced_unwind (from cxxabi.h); an Internet search can give you more details.
Note: as other answers and comments have mentioned, returning the address of a local wouldn't work anyway, but that is beside the point of the question. You get the same behavior regarding the destruction of foo if some other valid address (or the null pointer) is returned instead of &foo.
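The unwinding can be observed directly. This sketch assumes Linux with glibc and libstdc++ (other platforms may behave differently); Guard, thread_fn and RunThread are hypothetical names. It catches abi::__forced_unwind only to record that unwinding happened, then rethrows, because swallowing it is not allowed.

```cpp
#include <atomic>
#include <cxxabi.h>
#include <pthread.h>

std::atomic<bool> g_destructorRan{false};
std::atomic<bool> g_sawForcedUnwind{false};

struct Guard {
    ~Guard() { g_destructorRan = true; }  // runs while pthread_exit unwinds the stack
};

void *thread_fn(void *)
{
    try {
        Guard g;
        pthread_exit(nullptr);  // unwinds this frame, destroying g
    } catch (abi::__forced_unwind &) {
        g_sawForcedUnwind = true;
        throw;  // must rethrow: swallowing the forced unwind aborts the program
    }
    return nullptr;  // never reached
}

// Runs thread_fn on a new thread and waits for it to finish.
bool RunThread()
{
    pthread_t tid;
    if (pthread_create(&tid, nullptr, thread_fn, nullptr) != 0)
        return false;
    return pthread_join(tid, nullptr) == 0;
}
```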
SOLVED / SHORT ANSWER: Yes you can. Bug was somewhere else. Read on if you want to know where it was.
I have to process items (do calculations that are independent between items). Items are processed in a function a();
What I want to do is: whenever a() is called, create a new thread containing all of a()'s processing code, and immediately exit a(). The next time a() is called (the caller, which I don't have access to, calls it again immediately), it will again create a new thread and return. When 8 consecutive calls have been made (I have 8 cores), a() joins the 8 previous threads and goes on...
Is this possible? Can I join, inside a(), threads that were created in a previous call of a()?
My program runs perfectly with 1 thread, but it crashes with any other number of threads.
=================================================================================
ADDED CODE FOR YOU TO SEE:
First of all, I don't have access to the function that calls a(). If no threading is involved, the caller waits until a() finishes its calculations and then calls it again with the next x, y* pair. What I want is to run the calculations of 8 a() calls in parallel. If a() can start its calculations and return (create a thread and exit), the caller will call a() again with the new x, y* while the old ones are still being calculated. That is the concept. The calculation for each x, y* pair is completely independent of every other pair.
#include <pthread.h>
struct thread_args { int x; int *y; int my_counter; };
void *calculate_xy(void *arg); //defined elsewhere; stores its result in res[args->my_counter]
int counter = 0;
pthread_t threads[8]; //I have 8 cores
thread_args args[8]; //arguments passed to the threads
int res[8]; //threads store their results here
void a(int x, int *y) { //the caller calls a() again immediately after it returns, with a new x,y* pair
    args[counter].x = x;
    args[counter].y = y;
    args[counter].my_counter = counter;
    pthread_create(&threads[counter], NULL, calculate_xy, (void *)&args[counter]);
    if(++counter != 8)
        return;
    //reached on every 8th call of a() (the total number of a() calls is an exact multiple of 8)
    counter = 0;
    for (int i = 0; i < 8; ++i)
        pthread_join(threads[i], NULL);
    //GO ON... append the 8 results to a text and go on...
}//end a()
First of all, whatever the bug in your code, this is a bad design. Your function a() has global state (the past-created threads and the number created so far) which would be bad enough in a single-threaded program, but in a multi-threaded program, things could go very wrong if multiple threads could simultaneously call a(). Even if not, there are many reasons to avoid global state:
http://www.youtube.com/watch?v=-FRm3VPhseI
A much better design would be for the a() function to take an extra argument, a pointer to a structure containing the counter and an array of pthread_t values for all the threads created so far. Then, the "state of a()" would not be global state, but would be state belonging to the part of the program using a().
As for why your program is crashing right now, it's hard to say without seeing any code. I suspect you're either calling a() from multiple threads without synchronization, or just have a careless error/typo somewhere in your array indexing...
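A minimal sketch of the suggested design, assuming POSIX threads: the counter, thread handles and per-thread arguments live in a caller-owned BatchState passed to a() instead of in globals. WorkItem, Compute and BatchState are hypothetical, and x + y is only a placeholder for the real calculation. The sketch copies the value *y points to rather than the pointer, so a worker never depends on memory the caller might reuse, and it computes inline if a thread cannot be created.

```cpp
#include <pthread.h>

// Hypothetical unit of work; y is stored by value, not as a pointer.
struct WorkItem {
    int x;
    int y;
    int result;
};

// Placeholder worker: stands in for the real per-item calculation.
static void *Compute(void *arg)
{
    WorkItem *item = static_cast<WorkItem *>(arg);
    item->result = item->x + item->y;
    return nullptr;
}

// State that used to be global, now owned by whoever uses a().
struct BatchState {
    static constexpr int kBatch = 8;  // e.g. one thread per core
    int counter = 0;
    pthread_t threads[kBatch];
    bool started[kBatch] = {};
    WorkItem items[kBatch];
};

// Starts work on (x, *y). Returns true when a full batch has just been
// joined, at which point st->items[i].result holds all kBatch results.
bool a(BatchState *st, int x, const int *y)
{
    int i = st->counter;
    st->items[i] = WorkItem{x, *y, 0};  // copy the value now
    st->started[i] =
        pthread_create(&st->threads[i], nullptr, Compute, &st->items[i]) == 0;
    if (!st->started[i])
        Compute(&st->items[i]);  // no thread available: compute synchronously
    if (++st->counter != BatchState::kBatch)
        return false;
    for (int j = 0; j < BatchState::kBatch; ++j)
        if (st->started[j])
            pthread_join(st->threads[j], nullptr);
    st->counter = 0;
    return true;
}
```

The results of a batch must be read before the next call to a() starts overwriting items[0].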
The answer to my original question is yes.
You can join threads in a function that were created in a previous call of this same function.
The bug in my code was that the memory y* pointed to was reused by the caller every time it called a(). So while I thought the previously created threads were still doing their job correctly, they were not: during their lifetime, the memory the y* argument pointed to was overwritten at every new a() call with the contents of the next x, y* pair, corrupting the threads' calculations.
Thank you all. You guided me to solution.
The following code summarizes the problem I have at the moment. This is my current execution flow; I'm using GCC 4.3.
#include <stdio.h>
#include <setjmp.h>
jmp_buf a_buf;
jmp_buf b_buf;
void b_helper()
{
    printf("entering b_helper\n");
    if(setjmp(b_buf) == 0)
    {
        printf("longjmping to a_buf\n");
        longjmp(a_buf, 1);
    }
    printf("returning from b_helper\n");
    return; //segfaults right here
}
void b()
{
    b_helper();
}
void a()
{
    printf("setjmping a_buf\n");
    if(setjmp(a_buf) == 0)
    {
        printf("calling b\n");
        b();
    }
    printf("longjmping to b_buf\n");
    longjmp(b_buf, 1);
}
int main()
{
    a();
}
The above execution flow causes a segfault right after the return in b_helper. It's almost as if only the b_helper stack frame is valid, and the stack frames below it have been erased.
Can anyone explain why this is happening? I'm guessing it's a GCC optimization that's erasing unused stack frames or something.
Thanks.
You can only longjmp() back up the call stack. The call to longjmp(b_buf, 1) is where things start to go wrong, because the stack frame referenced by b_buf no longer exists after the longjmp(a_buf).
From the documentation for longjmp:
The longjmp() routines may not be called after the routine which called the setjmp() routines returns.
This includes "returning" through a longjmp() out of the function.
The standard says this about longjmp() (7.13.2.1 The longjmp function):
The longjmp function restores the environment saved by the most recent invocation of
the setjmp macro in the same invocation of the program with the corresponding
jmp_buf argument. If there has been no such invocation, or if the function containing
the invocation of the setjmp macro has terminated execution in the interim, [...] the behavior is undefined.
with a footnote that clarifies this a bit:
For example, by executing a return statement or because another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set of nested calls.
So you can't longjmp() back and forth across nested setjmp/longjmp sets.
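For contrast, here is a sketch of the pattern that is allowed: longjmp only ever targets a setjmp whose function is still active, that is, it jumps back up the call stack. Outer and Inner are hypothetical names. The frames skipped by the jump hold no objects with non-trivial destructors, which longjmp would not run in C++.

```cpp
#include <csetjmp>

static std::jmp_buf g_env;

// Nested function that may bail out to the still-active setjmp in Outer.
static void Inner(int depth, bool bail)
{
    if (depth == 0) {
        if (bail)
            std::longjmp(g_env, 1);  // jump back up the stack to Outer
        return;                      // normal return
    }
    Inner(depth - 1, bail);
}

// Returns true if Inner bailed out via longjmp.
bool Outer(int depth, bool bail)
{
    if (setjmp(g_env) == 0) {
        Inner(depth, bail);  // Outer's frame stays alive for the whole call
        return false;
    }
    return true;             // reached only via longjmp
}
```

Unlike the question's code, no frame that called setjmp has returned before its jmp_buf is used.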