A function's static and dynamic parent - c++

I'm reading Thinking in C++ (vol. 2):
Whenever a function is called,
information about that function is
pushed onto the runtime stack in an
activation record instance (ARI), also
called a stack frame. A typical stack
frame contains (1) the address of the
calling function (so execution can
return to it), (2) a pointer to the ARI of
the function’s static parent (the
scope that lexically contains the
called function, so variables global
to the function can be accessed), and
(3) a pointer to the function that called
it (its dynamic parent). The path that
logically results from repetitively
following the dynamic parent links is
the dynamic chain, or call chain
I'm unable to understand what the author means by a function's static and dynamic parent. I'm also not able to differentiate between items 1, 2, and 3; they all seem to be the same. Can someone please explain this passage to me?

I think this passage is not about C++ specifically, but about the general structure of a stack frame.
1) is the return address: the address of the instruction after the call in the calling function. When the function returns, that address is popped from the stack and execution resumes at that point (valid for C++).
2) and 3) apply to languages that allow nested functions (a function declared inside another function). Such functions may have access to the parent's variables, so they carry a link (the static link) to the lexical parent's stack frame; the dynamic link points to the caller's frame, which is what lets such functions call themselves recursively, since each invocation gets its own frame linked back to the one that called it.

This all sounds very odd to me. Static frame pointers are normally used in languages with lexical scope, such as functional languages, and the pascal family with their nested functions. Globals are bound once either at compile time or runtime, and shouldn't need frame pointers. (1) is valid, but (2) doesn't exist in C++, AFAIK.
I suspect that (3) was meant to refer to the parent frame pointer. Call stacks are usually set up as linked lists so that debuggers and related tools can walk them without requiring deep knowledge of the program.
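C++ has no nested functions, but a lambda that captures by reference illustrates the same "static link" idea the passage describes: the closure object keeps a link to the enclosing lexical scope's variables. A small sketch (outer/inner are made-up names):

```cpp
// A lambda capturing by reference plays the role of a function with a
// "static parent": the closure stores a reference into the enclosing
// (lexical) frame, much like a static link in Pascal-family languages.
int outer() {
    int counter = 0;                        // lives in outer()'s frame
    auto inner = [&counter] { ++counter; }; // holds a "link" to that frame
    inner();
    inner();
    return counter;                         // inner() mutated outer()'s local
}
```

The dynamic parent, by contrast, is simply whoever happened to call the function at runtime, which the lambda knows nothing about.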

Related

Understanding C++ syntax for an experienced C programmer

I am looking through the source code for MongoDB, and see the following declaration inside of a function, which I don't understand and haven't seen in my C programming experience.
Lock::DBLock dbLock(txn->lockState(), ns.db(), MODE_X);
I am trying to understand what this declaration is doing in C++ terms (ie. I want to understand the syntax, not specifics about the functionality of what is going on).
This breaks down into two main questions:
1) What is the purpose of the Lock::DBLock statement in front of the function?
2) Is this a function call, or a function declaration?
It is a variable declaration with a constructor - so it's BOTH a function call [to the object constructor] and a declaration of a variable.
It declares a variable of the type;
Lock::DBLock
The variable is called dbLock. It calls the constructor with txn->lockState(), ns.db(), and MODE_X as arguments.
My guess is that txn->lockState actually returns a lock-object, and the thing we're locking is ns.db() - in "exclusive mode". But that's a guess, and you have to look those things up within the environment.
Without looking up the documentation, I expect Lock::DBLock is a "lock manager", in other words, it takes the lock when created, and releases when destroyed.
A simple lock manager would look something like this:
class LockMgr
{
public:
    LockMgr(SomeLockType &X) : keeper(X)  // acquire the lock on construction
    {
        keeper.Lock();
    }
    ~LockMgr()                            // release it on destruction
    {
        keeper.Unlock();
    }
private:
    SomeLockType& keeper; // Must be a reference to the original lock
};
The destructor is automatically called when the variable goes out of scope (in other words, when you leave the {} pair that the variable is declared within).
In C++, one common strategy for "resource handling" is called RAII (Resource Acquisition Is Initialization): a variable is used to hold a resource, and the resource is acquired during the variable's initialization. This helps a lot with "not forgetting to undo" - for example in code that returns in the middle of a function, breaks out of a loop, or similar things. In C, you always have to watch your step when releasing locks, closing files, freeing memory, etc. - and of course, if you use C++ in the wrong way, you can fall into the same pitfalls there: calling new will definitely need a call to delete, and directly calling a lock's Lock() member function requires an Unlock() call somewhere. But if we "wrap" the resource into an object that holds it for the duration we need it, and automatically "lets go" (frees, unlocks, etc.) in the destructor, there's no need to remember to release memory, unlock locks, and so on.
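The standard library ships this exact pattern; a minimal sketch using std::mutex and std::lock_guard (try_update and shared_value are made-up names), showing that the lock is released on every exit path:

```cpp
#include <mutex>

std::mutex m;
int shared_value = 0;

// RAII in action: the mutex is released on *every* path out of the
// function - early return, exception, or normal fall-through -
// because ~lock_guard runs when `guard` goes out of scope.
bool try_update(int v) {
    std::lock_guard<std::mutex> guard(m); // acquires m here
    if (v < 0)
        return false;                     // m released here, automatically
    shared_value = v;
    return true;                          // and here too
}
```

There is no Unlock() call anywhere in the function body; the destructor handles it.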
I would suggest that before asking further questions, you read the first chapter about classes and constructor & destructor pairs in your C++ book (you DO have a book, right?)
The code is the declaration of a variable named dbLock of type Lock::DBLock. The parenthesized list contains arguments to a constructor for that type.

Allocation of global variables C++

It seems unclear where global variables get stored when declared in different ways, and which way is best.
For example, where are the variables stored in each example and what is their scope?
//Example 1 (at the top of a cpp file):
Rectangle rect(15,12);
//Example 2:
Rectangle *rect = new Rectangle(15,12);
"Where are the variables stored" is usually the wrong question. It varies between platforms and the language is designed to provide an abstraction over such details anyway.
Example 1 creates a Rectangle object with static storage duration. It will normally be destroyed automatically after main returns.
Example 2 creates a Rectangle object with dynamic storage duration. It will be destroyed whenever you call delete on the pointer (or, perhaps, call the destructor explicitly); otherwise it won't be destroyed. Informally people say objects of dynamic storage duration are "on the heap", but the implementation detail this evokes has a platform-dependent meaning.
If the first is defined outside a function, it is going to be stored in the DATA segment. If it's defined in a function, it is going to be stored on the stack.
With the second it's the same (for the pointer itself), but the object the pointer points to is going to be allocated on the heap.
At the risk of oversimplification . . . .
A compiler will divide the compilation unit into sections:
- Executable data
- Read only data
- read write data
The linker will collect all the sections with the same attributes together. At the end of the link process, global read/write data usually gets merged with the other read/write data.
The following creates read/write data:
Rectangle rect(15,12);
The following creates read/write data for rect as well as executable code that calls new at startup:
Rectangle *rect = new Rectangle(15,12);
Ignoring debug information, local variables only have scope during compilation. After compilation, local variables are just [relative] memory locations. Global variables remain identifiable after compilation, but after linking they essentially disappear as well.
(For simplicity I ignore universal symbols and shared libraries.)
Where the variables "get stored" is implementation defined, and is not in the scope of the C++ standard, except as to the specific semantics of their scope.
Assuming both declarations are at file scope, in both cases 'rect' itself is stored with static storage duration. In the second case, rect refers to a heap-allocated object, and throughout the application's lifetime the application may delete the pointer and/or reassign it to point to some other instance of the class.
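A sketch of that lifetime: the pointer itself has static storage duration, while the objects it points at come and go (Rectangle and area are illustrative names):

```cpp
struct Rectangle {
    int w, h;
    Rectangle(int w, int h) : w(w), h(h) {}
};

// The pointer has static storage duration; the object it points at
// is dynamic and can be deleted and replaced at any time.
Rectangle* rect = new Rectangle(15, 12);

int area() { return rect->w * rect->h; }
```

The same static-duration pointer can later be rebound: delete rect; rect = new Rectangle(2, 3); and area() now reflects the new object.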

Why are function parameters pushed earlier on call stack than the return address?

From http://en.wikipedia.org/wiki/Stack_pointer#Structure
I am wondering why the return address for a function is placed above the parameters for that function.
It would make more sense to have the return address pushed onto the stack before the parameters for DrawLine, because the parameters are no longer required when the return address is popped to return to the calling function.
What are the reasons for preferring the implementation shown in the diagram above?
The return address is usually pushed by the call machine instruction, which is part of the CPU's instruction set, while the parameters and locals are pushed by several instructions that the compiler generates.
Thus, the return address is the last thing pushed by the caller, and comes before anything (local variables) pushed by the callee.
The parameters are all pushed before the return address because the jump to the function and the push of the return address onto the stack are done by the same machine instruction.
Another reason: the caller is the one allocating stack space for the parameters, so it [the caller] should also be the one that cleans it up.
The reason is simple: The function arguments are pushed onto the stack by the calling function (which is the only one which can do it because only it has the necessary information; after all the whole point of doing so is to pass that information to the called function). The return address is pushed to the stack by the function call mechanism. The function is called after the calling function has set up the parameters, because after the call it's the called function which is executed, not the calling one.
OK, so now you could argue that the calling function could put the parameters beyond the currently used stack, and the called function could then just adjust the stack pointer accordingly. But that would not work out well, because at any time there could be an interrupt or a signal, which would push the current state onto the stack in order to restore it later (I wouldn't be surprised if a task switch did so, too). If you set up the parameters beyond the current stack, those asynchronous events would overwrite them, and since you cannot predict when they will happen, you cannot avoid that (beyond disabling interrupts, which has other drawbacks and may be impossible in the case of a task switch). Basically, everything beyond the current stack has to be considered volatile.
Also note that this is independent of the question of who cleans up the parameters. In principle, the called function could call the destructors of the arguments even if physically they lie in the caller's stack frame. Also, many processors (including the x86) have instructions which automatically pop extra space above the return address on return. For example, Pascal compilers usually used that, because in Pascal you don't have any cleanup beyond returning memory, and at least for the processors of the time it was more efficient to clean up with that one instruction (I have no idea whether that is still true for modern processors). However, C didn't use that mechanism, due to variable-length argument lists: for those, the mechanism isn't applicable, because you'd need to know at compile time how much extra space to release, and K&R C did not require variadic functions to be forward-declared (C89 does, but few if any compilers take advantage of that, for compatibility with old code). So there was no way for the calling function to know whether the callee would clean up the arguments, unless the caller always did it itself.
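The variadic point can be illustrated: only the caller knows how many arguments it passed, so it has to communicate that in-band (here via a leading count parameter), and by the same logic only the caller knows how much argument space to pop afterwards. A sketch:

```cpp
#include <cstdarg>

// With a variadic function, only the *caller* knows how many
// arguments were pushed - here it communicates that via `count`.
// This is also why cdecl-style conventions make the caller pop the
// arguments: the callee cannot know how much stack space to release.
int sum(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```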

c/c++ passing argument by pointer/argument by reference stack frame layout

Will the compiler produce the same code for both of these statements?
void foo1(int* val) { (*val)++; }
void foo2(int& val) { val++; }
Will it simply write a pointer into the parameter part of foo's stack frame? Or, in the second case, will the callers' and foos' stack frames somehow overlap such that the callers' local variable takes the same memory on the stack as the parameter for foo?
Those two calls should generate exactly the same code, unless you have some kind of weird compiler.
It depends.
The code generated for both will be equivalent if not identical on most platforms if compiled to a library.
Any good compiler will inline such a small function, so rather than taking the address of something on the stack and incrementing the pointed-to value, it will likely just increment the value directly. An inlined function's stack frame is embedded in the caller's stack frame, so they will overlap in that case.
The stacks cannot be made to overlap.
Consider that the argument could be a global, a heap object, or, even if stored on the stack, not the very last element. Depending on the calling convention, other elements (e.g. the return address) might be placed between one stack frame and the parameters passed into the function...
And note that even if nothing was added in the stack, the decision cannot be made while compiling the function, but rather when the compiler is processing the calling function. Once the function is compiled, it will not change depending on where it is called from.
Regarding overlapping of stack frames, I found the following info here:
For some purposes, the stack frame of a subroutine and that of its caller can be considered to overlap, the overlap consisting of the area where the parameters are passed from the caller to the callee. In some environments, the caller pushes each argument onto the stack, thus extending its stack frame, then invokes the callee. In other environments, the caller has a preallocated area at the top of its stack frame to hold the arguments it supplies to other subroutines it calls. This area is sometimes termed the outgoing arguments area or callout area. Under this approach, the size of the area is calculated by the compiler to be the largest needed by any called subroutine.
So in your case, if only variables in the local scope of the calling function are passed to foo2, this kind of overlapping may be possible!
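For what it's worth, the two forms in the question behave identically from the caller's point of view; a reference parameter is typically implemented as a pointer under the hood. A runnable sketch:

```cpp
// Behaviorally identical; with optimization most compilers emit the
// same code for both, since a reference is usually implemented as a
// pointer that is dereferenced implicitly.
void foo1(int* val) { (*val)++; }
void foo2(int& val) { val++; }
```

The only syntactic difference is at the call site: foo1(&a) versus foo2(a).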

C++/C object->isOnStack()

I would like to be able to determine at runtime whether a pointer points onto the stack, for a number of reasons - e.g. if I pass it into a function call, I can determine whether I need to clone it, or whether I need to delete it.
In Microsoft C (VC 6/7/8), is there a way to bounds-check a pointer to see if it is on the stack or not? I am only concerned with determining this on the thread that owns the stack the object was placed on.
something like
static const int __stack_size
and __stack_top
????
Thanks!
Knowing whether an object is on the stack or heap isn't going to tell you whether it should be cloned or deleted by the called function. After all, you can clone either kind, and while you must never delete a stack-allocated object, you shouldn't try to delete every heap pointer either.
Having a function that will make some arcane check to see whether it should delete a passed pointer or not is going to cause confusion down the line. You don't want a situation where you may or may not be able to refer to fields in an object you passed, depending on context. Nor do you want to risk a mistake that will result in trying to free a stack object.
There isn't any standard way to tell what a pointer points to, and any nonstandard way is likely to break. You can't count on stack contiguity, particularly in multithreaded applications (and somebody could easily add a thread to an application without realizing the consequences).
The only safe ways are to have a calling convention that the called function will or will not delete a passed object, or to pass some sort of smart pointer. Anything else is asking for trouble.
Interesting question.
Here's an idea on how to determine it, but not a function call.
Create a dummy variable at the very start of your application on the stack.
Create a variable on the stack in a function isOnStack( void *ptr )
Check to see that the 'ptr' is between the dummy variable and the local variable.
Remember that the stack is contiguous for a given thread. I'm not sure what would happen when you started checking from one thread to another for this information.
If it's not in the stack, then it must be on the heap.
I do not know any method to determine where an object was allocated.
I see this kind of behaviour should be avoided. Such things should imho be solved by contract between user and library developer. State these things in the documentation! If unsure copy the object (which requires a copy constructor and saves you from trying to copy uncopyable objects).
You can also use smart pointers from Boost. If it is unclear when an object is no longer needed, pass it as a shared pointer.
Doing this depends on the calling convention of the function. Some calling conventions pass arguments in registers, others place them in memory beyond the head of the stack. Each one is a different agreement between caller and callee, so at any function boundary in the stack a different convention could have been used. This forces you to track the calling convention used at every level.
For example, in fastcall, one or more arguments can be passed via registers.
See MSDN for more. This would mess up any scheme that tries to figure out whether an address lies within a certain range. In MS's thiscall, the this pointer is passed via a register, so its address would not resolve to a value between the beginning and end of the stack.
Bottom line: research calling conventions; they specify how stack memory will be laid out. Here is a good tutorial.
Note this is very platform specific!
This is very platform specific, and IMO suitable only for debug build diagnostics. What you'd need to do (on WIntel) is this:
When a thread is created, create a stack variable, and store its address in a global (threadid, stack base address) map.
IsOnStack needs to create its own local variable, and check if the pointer passed is between the stack base and the address in the current stack frame.
This will not tell you anything about variables within other threads. Stack addresses decrease as the stack grows, so the base address is higher than the current address.
As a portable solution, I'd pass a boost::shared_ptr, which can be associated with a deleter. (In boost, this is not a template parameter, so it doesn't "infect" the function consuming the pointer).
you can create an "unmanaged" pointer like this:
inline void boost_null_deleter(void *) {}

template <typename T> inline
boost::shared_ptr<T> unmanaged_ptr(T *x)
{
    return boost::shared_ptr<T>(x, ::boost_null_deleter);
}
and call your function like this
Foo local = { ... };
FooPtr heapy(new Foo);
FunnyFunc(unmanaged_ptr(&local));
FunnyFunc(heapy);
I've wanted such a feature in C++ for a while now, but nothing good really exists. The best you can hope for is to document that you expect to be passed an object that lives on the heap, and then to establish an idiom in the code so that everyone working on the code base will know to pass heap allocated objects to your code. Using something like auto_ptr or boost::shared_ptr is a good idiom for this kind of requirement.
Well, I agree there is probably a better way of doing what you're trying to do. But it's an interesting question anyway. So for discussion's sake...
First, there is no way of doing this in portable C or C++. You have to drop to assembly, using at least an asm { } block.
Secondly, I wouldn't use this in production code. But for VC++/x86, you can find out if a variable is on your stack by checking that its address is between the values of the ESP and EBP registers.
Your ESP (Extended Stack Pointer, the lower value) holds the top of your stack, and EBP (Extended Base Pointer) usually the bottom. Here's the structure of the call stack on x86.
Calling convention will affect function parameters mainly, and how the return address is handled, etc. So it doesn't relate to your stack much. Not for your case anyway.
What throws things off are compiler optimizations. Your compiler may omit the frame pointer (EBP); this is the /Oy flag in VC++. So instead of using EBP as the base pointer, you can use the address of a function parameter, if you have any, since those are a bit higher up the stack.
But what if that variable you're testing is on your caller's stack? Or a caller's stack several generations above you? Well you can walk the entire call stack, but you can see how this can get very ugly ( as if it isn't already :-) )
Since you're living dangerously, another compiler flag that may interest you is the /Gh flag. With that flag and a suitable _penter hook function, you can set up these calculations for functions, files, modules, etc. easily. But please don't do this unless you'd just like to see how things work.
Figuring out what's on the heap is even worse....
On some platforms, the stack can be split by the run-time system. That is, instead of getting a (no pun intended) stack overflow, the system automatically grabs some more stack space. Of course, the new stack space is usually not contiguous with the old stack space.
It's therefore really not safe to depend on whether something is on the stack.
The use of auto_ptr generally eliminates the need for this kind of thing, and is way cooler besides.
An MSVC/Windows-specific answer. This is of course specific to the thread the object lives in. It's a pretty bad idea to pass any stack-allocated item into any thread other than the one whose stack it is on, so I'm not worried about that :)
bool __isOnStack(const void *ptr)
{
    // FS:[0x04] Win9x and NT: top of stack
    // FS:[0x08] Win9x and NT: current bottom of stack
    const char *sTop;
    const char *sBot;
    __asm {
        mov EAX, FS:[04h]
        mov [sTop], EAX
        mov EAX, FS:[08h]
        mov [sBot], EAX
    }
    return sTop > (const char *)ptr && (const char *)ptr > sBot;
}