Base of Global Call Stack in C/C++ - c++

I have read that each function invocation leads to pushing of a stack frame in the global call stack and once the function call is completed the call stack is popped off and the control passes to the address that we get from the popped of stack frame. If a called function calls on to yet another function, it will push another return address onto the top of the same call stack, and so on, with the information stacking up and unstacking as the program dictates.
I was wondering what's at the base of global call stack in a C or C++ program?
I did some searching on the internet but none of the sources explicitly mention about it. Is the call stack empty when our program starts and only once a function is called, the call stack usage starts? OR Is the address where main() function has to return, gets implicitly pushed as the base of our call stack and is a stack frame in our call stack? I expect the main() would also have a stack frame in our call stack since we are always returning something at end of our main() function and there needs to be some address to return to. OR is this dependent on compiler/OS and differs according to implementation?
It would be helpful if someone has some informative links about this or could provide details on the process that goes into it.

main() is invoked by the libc code that handles setting up the environment for the executable etc. So by the time main() is called, the stack already has at least one frame created by the caller.

I'm not sure if there is a universal answer, as stack is something that may be implemented differently per architecture. For example a stack may grow up (i.e. stack position pointer value increases when pushing onto the stack) or grow downwards.
Exiting main() is usually done by calling an operating function to indicate the program wishes to to terminate (with the specified return code), so I don't expect a return address for main() to be present on the stack, but this may differ per operating system and even compiler.
I'm not sure why you need to know this, as this is typically something you leave up to the system.

First of all, there is no such thing as a "global call stack". Each thread has a stack, and the stack for the main thread is often looking quite different from the thread of any thread spawned later on. And mostly, each of these "stacks" is just an arbitrary memory segment currently declared to be used as such, sub-allocated from any arbitrary suitable memory pool.
And due to compiler optimizations, many function calls will not even end up on the stack, usually. Meaning there isn't necessarily a distinguishable stack frame. You are only guaranteed that you can reference variables you put on the stack, but not that the compiler must preserve anything you didn't explicitly reference.
There is not even a guarantee that the memory layout for your call stack must even be organized in distinguishable frames. Function pointers are never guaranteed to be part of the stack frame, just happens to be an implementation detail in architectures where data and function pointers may co-exist in the address space. (As there are architectures which require return addresses to be stored in a different address space than the data used in the call stack.)
That aside, yes, there is code which is executed outside of the main() function. Specifically initializers for global static variables, code to set up the runtime environment (env, call parameters, stdin/stdout) etc.
E.g. when having linked to libc, there is __libc_start_main which will call your main function after initialization is done. And clean up when your main function returns.
__libc_start_main is about the point where "stack" starts being used, as far as you can see from within the program. That's not actually true though, there has already been some loader code been executed in kernel space, for reserving memory for your process to operate in initially (including memory for the future stack), initializing registers and memory to well defined values etc.
Right before actually "starting" your process, after dropping out of kernel mode, arbitrary pointers to a future stack, and the first instruction of your program, are loaded into the corresponding processor registers. Effectively, that's where __libc_start_main (or any other initialization function, depending on your runtime) starts running, and the stack visible to you starts building up.
Getting back into the kernel usually involves an interrupt now, which doesn't follow the stack either, but may just directly access processor registers to simply swap the contents of the corresponding processor registers. (E.g. if you call a function from the kernel, the memory required by the call stack inside the function call is not allocated from your stack, but from one you don't even have access to.)
Either way, everything that happens before main() is called, and whenever you enter a syscall, is implementation dependent, and you are not guaranteed any specific observable behavior. And messing around with processor registers, and thereby alternating the program flow, is also far outside defined behavior as far as a pure C / C++ run time is concerned.

Every system I have seen, when main() is called a stack is setup. It has to be or just declaring a variable inside main would fail. A stack is setup once a thread or process is created. Thus any thread of execution has a stack. Further in every assembly language i know, a register or fixed memory location is used to indicate the current value of the stack pointer, so the concept of a stack always exists (the stack pointer might be bad, but stack operations always exist since they are built into the every mainstream assembly language).

Related

Can I get a backtrace from an explicitly allocated stack

I have a pool of threads that have explicitly allocated stacks (i.e., using pthread_attr_setstack).
Is it possible to use something like gnulib's backtrace & backtrace_symbols on this stack?
Thanks.
backtrace() returns a backtrace for the calling program, in the array
pointed to by buffer.
Therefore yes, it does not care where the stack was allocated. You simply call it from the target thread and the stack information is implicit in the current stack pointer.
If you want to backtrace another thread (other than the one that's running) that's trickier. For one thing, it may be running, which would make all attempts to backtrace subject to races. But for another, I don't think there's any standard way for the current stack pointer (which is where the trace should start) to be made visible to another thread. Without that information you can't start the trace, because the base of the stack isn't really enough information.
(ptrace can get the current register values. This is the system call used by debuggers. It will be quite disruptive, though)

Do cpp object methods have their own stack frame?

I have a hypothesis here, but it's a little tough to verify.
Is there a unique stack frame for each calling thread when two threads invoke the same method of the same object instance? In a compiled binary, I understand a class to be a static code section filled with function definitions in memory and the only difference between different objects is the this pointer which is passed beneath the hood.
But therefore the thread calling it must have its own stack frame, or else two threads trying to access the same member function of the same object instance, would be corrupting one another's local variables.
Just to reiterate here, I'm not referring to whether or not two threads can corrupt the objects data by both modifying this at the same time, I'm well aware of that. I'm more getting at whether or not, in the case that two threads enter the same method of the same instance at the same time, whether or not the local variables of that context are the same places in memory. Again, my assumption is that they are not.
You are correct. Each thread makes use of its own stack and each stack makes local variables distinct between threads.
This is not specific to C++ though. It's just the way processors function. (That is in modern processors, some older processors had only one stack, like the 6502 that had only 256 bytes of stack and no real capability to run threads...)
Objects may be on the stack and shared between threads and thus you can end up modifying the same object on another thread stack. But that's only if you share that specific pointer.
you are right that different threads have unique stacks. That is not a feature of c++ or cpp but something provided by the OS. class objects won't necessary be different. This depends on how they are allocated. Different threads could share heap objects which might lead to concurrent problem.
Local variables of any function or class method are stored in each own stack (actually place in thread's stack, stack frame), so it is doesn't matter from what thread you're calling method - it will use it's own stack during execution for each call
a little different explanation: each method call creates its own stack (or better stack frame)
NOTE: static variables will be the same
of course there exists techniques to get access to another's method's stack memory during execution, but there are kinda hacks

How does the compiler know where control should return to after a function call?

Consider the following functions:
int main()
{
//statement(s);
func1();
//statement(s);
}
void func1()
{
//statement(s);
func2();
//statement(s);
}
void func2()
{
//statement(s);
}
How does the compiler know where to return to after the func2 has performed all its operations? I know the control transfers to function func1 (and exactly which statement), but how does the compiler knows it? What tells the compiler where to return to?
This is typically implemented using a call stack:
When control is being transfered to a function, the address to return to is pushed onto the stack.
When the function finishes, the address is popped off the stack and used to transfer control back to the callee.
The details are typically mandated by the hardware architecture for which the code is being compiled.
Actually, the compiler doesn't run the code, but the machine does, and when it calls a new function, it stores the address of the next instruction to be executed after the function currently being called on the stack, so that when the function returns it can pop it off back in to the Instruction Pointer (IP) and resume from there.
I've simplified things a bit for the sake of explanation.
When a function is called, the correct return address in the calling function is placed somewhere, usually the stack though the standard does not mandate that, that is used for precisely the purpose of storing the return address.
It is the compiler's duty to ensure that its calling conventions are such that unless something goes wrong (for example, a stack overflow), then the called function knows how to return to the calling function.
The runtime makes use of some thing called as a 'call stack' which basically holds the address of the next statement to call after the function being called is returned. So when a function call is made and before the control jumps to the new instruction address, the next instruction address in the calling function is pushed on to the stack. And this process is repeated for every subsequent call to any function. Now why only a stack? because it's necessary to get back to the point where it left off - which is basically a 'last in first out' behavior and stack is the data structure that does that. You can actually look at this call stack when you are debugging a program in Visual Studio - there's a separate window called 'Call Stack' which shows the entries of the addresses placed in the call stack.

Exit the entire recursion stack

I'm calling a function fooA from main() that calls another function fooB that is recursive.
When I wish to return, I keep using exit(1) to halt execution. What is the right way to exit when the recursion tree is deep?
Returning through the recursion stack may not be of help because returning usually clears a part solution I build and I don't want to do that. I want to do execute more piece of code from main().
I read Exceptions can be used, it would be nice if I can get a code snippet.
The goto statement won't work to hop from one function back to another; Nikos C. is correct that it wouldn't account for releasing the stack frames of each of the calls you've made, so when you got to the function you goto'ed to, the stack pointer would be pointing to the stack frame of the function you were just in... no, that just won't work. Similarly, you can't simply call (either directly, or indirectly via a function pointer) the function you want to end up in when your algorithm is done. You'd never get back to the context you were in prior to diving into your recursive algorithm. You could conceivably architect a system this way, but in essence each time you did this you'd "leak" what was currently on the stack (not quite the same as leaking heap memory, but a similar effect). And if you were deep into a highly recursive algorithm, that could be a lot of "leaked" stack space.
No, you need to somehow return back to the calling context. There are only three ways to do so in C++:
Exit each function in turn by returning from it to its caller
backing up through the call chain in an orderly fashion.
Throw an exception and catch it at the point right after you
launched into your recursive algorithm (which automatically destroys
any objects created by each function on the stack in an orderly
fashion).
Use setjmp() & longjmp() to do something similar to throwing &
catching an exception, but "throwing" a longjmp() will not destroy
objects on the stack; if any such objects own heap allocations,
those allocations will be leaked.
To do option 1, you have to write your recursive function such that once a solution is reached, it returns some sort of indication that it's complete to its caller (which may be the same function), and its caller sees that fact & relays that fact on to its caller by returning to it (which may be the same function), so on and so on, until finally all stack frames of the recursive algorithm are released and you return to whatever function called the first function in the recursive algorithm.
To do option 2, you wrap the call to your recursive algorithm in a try{...} and immediately after it you catch(){...} the expected thrown object (which could conceivably be the result of the computation, or just some object that lets the caller know "hey, I'm done, you know where to find the result"). Example:
try
{
callMyRecursiveFunction(someArg);
}
catch( whateverTypeYouWantToThrow& result )
{
...do whatever you want to do with the result,
including copy it to somewhere else...
}
...and in your recursive function, when you finish the results, you simply:
throw(whateverTypeYouWantToThrow(anyArgsItsConstructorNeeds));
To do option 3...
#include <setjmp.h>
static jmp_buf jmp; // could be allocated other ways; the longjmp() user just needs to have access to it.
.
.
.
if (!setjmp(jmp)) // setjmp() returns zero 1st time, or whatever int value you send back to it with longjmp()
{
callMyRecursiveFunction(someArg);
}
...and in your recursive function, when you finish the results, you simply:
longjmp(jmp, 1); // this passes 1 back to the setjmp(). If your result is an int, you
// could pass that back to setjmp(), but you can't pass zero back.
The bad thing about using setjmp()/longjmp() is that if there are any stack-allocated objects still "alive" on the stack when you call longjmp(), execution will jump back to the setjmp() point, skipping the destructors for those objects. If your algorithm uses only POD types, that's not an issue. It's also not an issue if the non-POD types your algorithm uses do NOT contain any heap allocations (e.g. from malloc() or new). If your algorithm uses non-POD types that contain heap allocations, then you're only safe with options 1 & 2 above. But if your algorithm meets the criteria of being OK with setjmp()/longjmp(), and if your algorithm is buried under a ton of recursive calls at the point it finishes, setjmp()/longjmp() may be the fastest way back to the initial calling context. If that won't work, option 1 is probably your best bet in terms of speed. Option 2 may seem convenient (and would possibly eliminate a condition check at the start of each recursion call), but the overhead associated with the system automatically unwinding the callstack is somewhat significant.
It's typically said you should reserve exceptions for "exceptional events" (events expected to be very rare), and the overhead associated with unwinding the callstack is why. Older compilers used something akin to setjmp()/longjmp() to implement exceptions (setjmp() at the location of the try & catch, and longjmp() at the location of a throw), but there was of course extra overhead associated with determining what objects on the stack need destroyed, even if there are no such objects. Plus, every time you'd run across a try, it would have to save the context just in case there was a throw, and if exceptions are truly exceptional events, the time spent saving that context was simply wasted. Newer compilers are now more likely to use what are known as "Zero Cost Exceptions" (a.k.a. Table Based Exceptions), which seems like that would solve all the world's problems, but it doesn't.... It makes normal runtime faster because there is no longer a need to save the context every time you run across a try, but in the event that a throw executes, there is now even more overhead associated with decoding information stored in massive tables that the runtime has to process in order to figure out how to unwind the stack based on the location where the throw was encountered and content of the runtime stack. So exceptions aren't free, even though they're very convenient. You'll find a lot of stuff on the internet where people make claims about how unreasonably expensive they are and how much they slow down your code, and you'll also find lots of stuff by people refuting those claims, with both sides presenting hard data to bolster their claims. What you should take away from the arguments is that using exceptions is great if you expect them to rarely occur, because they result in cleaner interfaces & logic that's free of a ton of condition checking for "badness" every time you make a function call. But you shouldn't use exceptions as a means of normal communication between a caller and its callees, because that mode of communication is significantly more expensive than simply using return values.
This happened to me while finding the path from root to node of a binary tree. I was using a stack to store the nodes in preorder and the recursion wouldnt stop until the last node returned NULL. I used a global variable, integer i=1, and when I reached the node I was looking for I set that variable to 0 and used while(i==0) return stack; to allow the program to go back up the memory stack without popping my nodes off.

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.
Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference
I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).
I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?
There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}
That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.
With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.
Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.
It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.
Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)
I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.
I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.
I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.
It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.