C++/C object->isOnStack() - c++

I would like to be able to determine at runtime whether a pointer points to the stack or not, for a number of reasons. For example, if I pass it into a function call, I can determine whether I need to clone it or not, or whether I need to delete it.
In Microsoft C (VC 6, 7, 8), is there a way to bounds-check a pointer to see if it is on the stack or not? I am only concerned with determining this on the thread that owns the stack the object was placed on.
something like
static const int __stack_size
and __stack_top
????
Thanks!

Knowing whether an object is on the stack or heap isn't going to tell you whether it should be cloned or deleted by the called function. After all, you can clone either type, and while you shouldn't try to delete a stack-allocated object, you shouldn't try to delete all heap pointers either.
Having a function that will make some arcane check to see whether it should delete a passed pointer or not is going to cause confusion down the line. You don't want a situation where you may or may not be able to refer to fields in an object you passed, depending on context. Nor do you want to risk a mistake that will result in trying to free a stack object.
There isn't any standard way to tell what a pointer points to, and any nonstandard way is likely to break. You can't count on stack contiguity, particularly in multithreaded applications (and somebody could easily add a thread to an application without realizing the consequences).
The only safe ways are to have a calling convention that the called function will or will not delete a passed object, or to pass some sort of smart pointer. Anything else is asking for trouble.
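As a minimal sketch of such a convention, the function signature itself can say who owns the object. The names Widget, inspect and consume are made up for illustration, and std::unique_ptr (C++11) stands in for "some sort of smart pointer":

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for this illustration.
struct Widget {
    std::string name;
};

// Borrows the object: the caller keeps ownership, so it does not matter
// whether the Widget lives on the stack or on the heap.
std::size_t inspect(const Widget& w) {
    return w.name.size();
}

// Takes ownership: the unique_ptr parameter documents that the callee
// destroys the object, so only heap objects can be handed over.
std::size_t consume(std::unique_ptr<Widget> w) {
    return w->name.size();  // w is destroyed when this function returns
}
```

A caller would write inspect(local) for a stack object and consume(std::move(ptr)) to give a heap object away; neither function ever has to guess where the object lives.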

Interesting question.
Here's an idea on how to determine it, but not a function call.
Create a dummy variable at the very start of your application on the stack.
Create a variable on the stack in a function isOnStack( void *ptr )
Check to see that the 'ptr' is between the dummy variable and the local variable.
Remember that the stack is contiguous for a given thread. I'm not sure what would happen when you started checking from one thread to another for this information.
If it's not in the stack, then it must be on the heap.
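The idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: the stack is contiguous, it grows toward lower addresses, and the check is made on the thread that recorded the base. The names recordStackBase, g_stackBase and the two demo helpers are invented for this sketch, and the noinline attributes are compiler-specific.

```cpp
#include <cstdint>

// Keep the probe functions out of line so their locals really sit in a
// deeper frame than the caller's (compiler-specific, assumption).
#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Highest address this thread treats as part of its stack (0 = not recorded).
static thread_local std::uintptr_t g_stackBase = 0;

// Call once, as early as possible in the thread's entry function,
// passing the address of a local variable declared there (the "dummy").
void recordStackBase(const void* marker) {
    g_stackBase = reinterpret_cast<std::uintptr_t>(marker);
}

// True if ptr lies between this function's own frame and the recorded base.
SKETCH_NOINLINE bool isOnStack(const void* ptr) {
    volatile char here = 0;  // lives in this function's frame
    const std::uintptr_t low = reinterpret_cast<std::uintptr_t>(&here);
    const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
    return p >= low && p <= g_stackBase;
}

// Demo helper: a local variable in a frame below the recorded base.
SKETCH_NOINLINE bool localIsOnStack() {
    int local = 42;
    return isOnStack(&local);
}

// Demo helper: a heap allocation.
SKETCH_NOINLINE bool heapIsOnStack() {
    int* heap = new int(42);
    const bool result = isOnStack(heap);
    delete heap;
    return result;
}
```

Note that a "false" result only means "not on this thread's stack as recorded"; the pointer could still refer to a global, a static, or another thread's stack rather than the heap.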

I do not know any method to determine where an object was allocated.
I think this kind of behaviour should be avoided. Such things should imho be solved by contract between user and library developer. State these things in the documentation! If unsure, copy the object (which requires a copy constructor and saves you from trying to copy uncopyable objects).
You can also use smart pointers from Boost. If unsure when an object is no longer needed, pass it as a shared pointer.

Doing this depends on the calling convention of the function. Some calling conventions place arguments in registers, others place them in memory after the head of the stack. Each one is a different agreement between the caller/callee. So at any function boundary in the stack a different convention could have been used. This forces you to track the calling convention used at every level.
For example, in fastcall, one or more arguments can be passed via registers.
See MSDN for more. This would mess up any scheme to figure out if an address exists within a certain range. In MS's thiscall, the this pointer is passed via registers. The &this would not resolve to somewhere between a range of values between the begin and end of the stack.
Bottom line, research calling conventions, it specifies how stack memory will be laid out. Here is a good tutorial
Note this is very platform specific!

This is very platform specific, and IMO suitable only for debug build diagnostics. What you'd need to do (on Wintel) is this:
When a thread is created, create a stack variable, and store its address in a global (threadid, stack base address) map.
IsOnStack needs to create its own local variable, and check if the pointer passed is between the stack base and the address in the current stack frame.
This will not tell you anything about variables within other threads. Stack addresses decrease, so the base address is higher than the current address.
As a portable solution, I'd pass a boost::shared_ptr, which can be associated with a deleter. (In boost, this is not a template parameter, so it doesn't "infect" the function consuming the pointer).
You can create an "unmanaged" pointer like this:
inline void boost_null_deleter(void *) {}
template <typename T> inline
boost::shared_ptr<T> unmanaged_ptr(T * x)
{
return boost::shared_ptr<T>(x, ::boost_null_deleter);
}
and call your function like this
Foo local = { ... };
FooPtr heapy(new Foo);
FunnyFunc(unmanaged_ptr(&local));
FunnyFunc(heapy);
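For readers without Boost, here is a self-contained sketch of the same pattern with std::shared_ptr swapped in for boost::shared_ptr (that substitution is mine, not the answer's). Foo, nullDeleter, unmanagedPtr and readValue are illustrative names only:

```cpp
#include <memory>

// Hypothetical payload type for the illustration.
struct Foo {
    int value;
};

// Deleter that deliberately does nothing, for objects the caller still owns.
inline void nullDeleter(const void*) {}

// Wraps a raw pointer in a shared_ptr that will never delete it.
template <typename T>
std::shared_ptr<T> unmanagedPtr(T* x) {
    return std::shared_ptr<T>(x, nullDeleter);
}

// The consuming function sees the same type for both kinds of object.
int readValue(const std::shared_ptr<Foo>& p) {
    return p->value;
}
```

The caveat is the usual one: the consumer must not keep the shared_ptr beyond the lifetime of the stack object, because the no-op deleter does nothing to extend it.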

I've wanted such a feature in C++ for a while now, but nothing good really exists. The best you can hope for is to document that you expect to be passed an object that lives on the heap, and then to establish an idiom in the code so that everyone working on the code base will know to pass heap allocated objects to your code. Using something like auto_ptr or boost::shared_ptr is a good idiom for this kind of requirement.

Well, I agree there is probably a better way of doing what you're trying to do. But it's an interesting question anyway. So for discussion's sake...
First, there is no way of doing this in portable C or C++. You have to drop to assembly, using at least an asm{ } block.
Secondly, I wouldn't use this in production code. But for VC++/x86 you can find out if a variable is on your stack by checking that its address is between the values of the ESP and EBP registers.
Your ESP ( Extended Stack Pointer, the low value ) holds the top of your stack and the EBP ( Extended Base Pointer ) usually the bottom. Here's the Structure of the Call Stack on x86.
Calling convention will affect function parameters mainly, and how the return address is handled, etc. So it doesn't relate to your stack much. Not for your case anyway.
What throws things off are compiler optimizations. Your compiler may leave out the frame pointer ( EBP ). This is the -Oy flag in VC++. So instead of using the EBP as the base pointer you can use the address of function parameters, if you have any, since those are a bit higher up on the stack.
But what if that variable you're testing is on your caller's stack? Or a caller's stack several generations above you? Well you can walk the entire call stack, but you can see how this can get very ugly ( as if it isn't already :-) )
Since you're living dangerously, another compiler flag that may interest you is the -Gh flag. With that flag and a suitable _penter hook function, you can set up these calculations for the functions or files or modules, etc. easily. But please don't do this unless you'd just like to see how things work.
Figuring out what's on the heap is even worse....

On some platforms, the stack can be split by the run-time system. That is, instead of getting a (no pun intended) stack overflow, the system automatically grabs some more stack space. Of course, the new stack space is usually not contiguous with the old stack space.
It's therefore really not safe to depend on whether something is on the stack.
The use of auto_ptr generally eliminates the need for this kind of thing, and is way cooler besides.

The MSVC Windows compiler specific answer. This is of course specific to the thread the object is in. It's a pretty bad idea to pass any auto-stack item into any thread other than the one whose stack it is on, so I'm not worried about that :)
bool __isOnStack(const void *ptr)
{
// FS:[0x04]  4  Win9x and NT  Top of stack
// FS:[0x08]  4  Win9x and NT  Current bottom of stack
const char *sTop;
const char *sBot;
__asm {
mov EAX, FS:[04h]
mov [sTop], EAX
mov EAX, FS:[08h]
mov [sBot], EAX
}
return( sTop > ((const char *)ptr) && ((const char *)ptr) > sBot);
}

Related

would you recommend using assembly to access arguments in this exceptional case?

consider the following function that won't get inlined and assume x86 as platform:
void doSomething(int & in){
//do something
}
Firstly, I'm not sure such a scenario would happen, but since I think it is possible, I'm going to ask. Suppose that, in every caller of this function, the argument to be supplied lies exactly at the top of the caller's stack frame, so that the called function could reach it through the ebp register in assembly language (after the callee has moved the contents of esp into ebp). Would you suggest not declaring a parameter for the function at all and using assembly to access the argument in this exceptional case, or just leaving the function definition as it was and letting the compiler do what it does? I haven't read anywhere that the compiler would consider such an exceptional case as a factor for the calling convention, and I think it'll simply generate code to pass a pointer to the argument in the callee's stack frame or in one of the registers.
First of all, it's SO easy for this to break - for example, you get a different version of compiler, that generates code differently. Or you change optimisation features. Never mind the situation where you suddenly need to use doSomething in a different place and then it won't work, because the variable is no longer on the top of the stack.
Second, assuming that the code inside the function is short enough, it's highly likely that the compiler will inline the function, so you don't "lose" anything at all.
Third, with modern compilers a single argument is typically passed in a register anyway, so there is no benefit in this when optimisation is enabled.
If you really think there is worthwhile benefit in this, and the compiler won't inline or otherwise optimise the code [have you looked at the generated code?], then try using forceinline or always_inline or whatever it is called in your compiler (most compilers have such an option). If that doesn't work, use a macro to inline it by hand. Or simply move the code to where it is called by "copy-n-paste".
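A rough sketch of the "force inline" suggestion: __forceinline is the MSVC keyword and __attribute__((always_inline)) the GCC/Clang attribute; the macro name SKETCH_FORCE_INLINE, the function body and caller are placeholders made up for illustration.

```cpp
// Pick the compiler-specific spelling; fall back to a plain inline hint.
#if defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Same signature as in the question; the body is only a placeholder.
SKETCH_FORCE_INLINE void doSomething(int& in) {
    in *= 2;
}

// Once inlined, no argument is passed at all: the compiler works on x directly.
int caller(int x) {
    doSomething(x);
    return x;
}
```

Even these annotations are requests with documented exceptions, so checking the generated code remains the only way to be sure what happened.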
Your note "the argument to be supplied lies exactly at the top of the caller stack frame so that in the called function access to that through ebp register" contains a factual misunderstanding.
That's because of the following things:
you're assuming a stack-based calling convention, i.e. function arguments being pushed to the stack by the caller before calling the function. That's not generally the case; even on 32bit x86, there's non-stack-based calling conventions (for example, Windows fastcall or the GNU GCC ones used in the 32bit linux kernel). If such are used, the argument wouldn't be found on top of the stack, but rather in ... whatever register is used to hold the first argument.
But even if you have stack-based parameter passing ... still:
you've missed that on x86 at the very least the call instruction is pushing a return address onto the top of the stack, so that when the first instruction of a function reached that way is executing, ESP will not point to the first arg of that function, but to the return address.
you've missed that EBP is a callee-saved register (preserved over function calls), and is not initialized on your behalf by the architecture - it's necessary for the generated code to explicitly set it up. A function which wants to use it (even if only as a frame pointer) is therefore obliged to save it somewhere before using it. That means the normal prologue will have push EBP; mov EBP, ESP (you cannot do only MOV EBP, ESP because that would overwrite the caller's EBP, which you must not do). Therefore, if you'd like to refer to the first argument of the function, you need [ EBP + 8 ], not [ EBP ].
If you're not using framepointers, then the first argument (due to the call which was used to reach the function having pushed a return address) is at [ ESP + 4 ] not [ ESP ].
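The offsets above can be checked with a toy model. This is not real assembly, just a simulation of a 32-bit stack with a single stack-passed argument; ToyStack, CallSnapshot and simulateCall are names invented for the sketch, and the addresses are arbitrary.

```cpp
#include <cstdint>
#include <map>

// Toy model of a 32-bit x86 stack: sparse memory plus ESP and EBP.
struct ToyStack {
    std::map<std::uint32_t, std::uint32_t> memory;  // address -> 32-bit word
    std::uint32_t esp = 0x1000;                     // arbitrary starting value
    std::uint32_t ebp = 0;

    void push(std::uint32_t value) {
        esp -= 4;  // the stack grows toward lower addresses
        memory[esp] = value;
    }
    std::uint32_t read(std::uint32_t address) const {
        return memory.at(address);
    }
};

struct CallSnapshot {
    ToyStack stack;            // state after the callee's prologue
    std::uint32_t espAtEntry;  // ESP when the callee's first instruction runs
};

// Simulates: caller pushes one argument, call pushes the return address,
// callee runs "push EBP; mov EBP, ESP".
CallSnapshot simulateCall(std::uint32_t arg,
                          std::uint32_t returnAddress,
                          std::uint32_t callerEbp) {
    ToyStack s;
    s.ebp = callerEbp;
    s.push(arg);            // caller: push the argument
    s.push(returnAddress);  // call: pushes the return address
    const std::uint32_t espAtEntry = s.esp;
    s.push(s.ebp);          // prologue: push EBP (save the caller's value)
    s.ebp = s.esp;          // prologue: mov EBP, ESP
    return CallSnapshot{s, espAtEntry};
}
```

In this model the word at espAtEntry is the return address and the argument sits at espAtEntry + 4; after the prologue, [EBP] holds the caller's EBP, [EBP + 4] the return address and [EBP + 8] the argument.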
I hope this clarifies a little.
I agree with the other posters that clarifying the question would help, what exactly you want to achieve and why you think assembly language might be useful here.
No, I would not. Calling conventions may vary (between x86 and x86_64); parameters could be pushed onto the stack or put into registers, and I'm not sure you can know for sure where they'll be.
Writing this in assembly, unless you really know what you're doing, is likely to lead to code with undefined behavior.

Pointer to address and Memory Stack

void EventsStack::push(Event *e){
EventNode *q = new EventNode();
q->data = e;
q->next = _top;
_top = q;
}
int main() {
EventsStack eventStack;
Event e1(1);
eventStack.push(&e1);
Event e2(2);
eventStack.push(&e2);
}
First question: when I do
eventStack.push(&e1);
am I sending the ADDRESS of e1 to the push function, and the push function receiving it as a pointer? as if I am doing:
Event *e = 1000 (1000 is the offset (address) of e1 for example on the stack)
?
Second question: I am asked to illustrate the stack upon running the main function. When I get to the line
eventStack.push(&e1);
does a 4-byte return address and a 4-byte pointer to e1 get allocated as the function's activation frame, or is there no activation frame in this situation, since eventStack is an object of the class EventsStack and push is one of its member functions?
With regards to your first question: the expression &e1 takes
the address of e1, which is a pointer. In other words, e1
has type Event, and &e1 has type Event*, and a value such
that dereferencing it (the unary * operator) will have the
same effect as using e1. This is what you pass to the push
function.
And a pointer cannot be just an offset on the stack, since it
must be possible to access the object with it from elsewhere,
where the stack isn't necessarily available. Most modern
desktop machines use linear addressing, which means that the
pointer is simply an integer, but that hasn't always been the
case, and it's probably not the case on some embedded processors
(and for historical reasons on some mainframes).
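A small sketch of the point about &e1: the Event definition below is a guess (the question does not show it), and idThroughPointer and renumber are hypothetical helpers standing in for push.

```cpp
#include <type_traits>
#include <utility>

// Assumed shape of the question's Event class.
struct Event {
    explicit Event(int id) : id(id) {}
    int id;
};

// Taking the address of an Event lvalue yields an Event*.
static_assert(std::is_same<decltype(&std::declval<Event&>()), Event*>::value,
              "&e1 has type Event*");

// Receives the address as a pointer, like EventsStack::push(Event *e).
int idThroughPointer(Event* e) {
    return e->id;  // dereferencing reaches the caller's object, not a copy
}

// Writing through the pointer changes the original object.
void renumber(Event* e, int newId) {
    e->id = newId;
}
```

Because the callee only holds the address, anything that stores it (as the EventNode in the question does) must not outlive e1 itself.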
With regards to the second question: formally, it's completely
unspecified how functions are called. What is necessary is
that the compiler put the return address and the arguments
somewhere where the called function can find them. In the
expression eventStack.push( &e1 ), the function has two
arguments, the address of eventStack (which will become the
this pointer in the function), and the expression &e1. How
the compiler passes these in varies enormously, and almost
always depends on their type, but usually, on most modern
machines, the first 3 to 5 arguments will go into machine
registers if they fit (and both of the addresses you pass will
fit), so all that gets pushed on the machine stack is the return
address. And parameters are not usually considered part of the
current frame, although on some older processors, the compiler
did treat them as such. (You say you are "asked to illustrate
the stack". Does the question refer to a specific machine
architecture, or what? What actually happens on the stack will
vary enormously between compilers, and g++ under Linux will
often do something completely different than Visual Studio,
even when running on the same processor.)
Regarding your first question, yes: the & operator yields the memory address of your Event.
As to your second question, well, it's complicated. The description you made seems to indicate an underlying confusion as to how the stack works. I'd strongly recommend going over some introductory material on the topic, it's going to leave you with a much stronger understanding.
You'll be able to answer your own question a few hours from now if you go Google a bit :).

Why are function parameters pushed earlier on call stack than the return address?

From http://en.wikipedia.org/wiki/Stack_pointer#Structure
I am wondering why the return address for a function is placed above the parameters for that function?
It makes more sense to have Return Address pushed onto the stack before the Parameters for Drawline because the parameters are not required any more when the Return Address is popped for returning back to the calling function.
What are the reasons for preferring the implementation shown in diagram above?
The return address is usually pushed via the call machine command [which is part of the processor's native instruction set], while the parameters and variables are pushed with several machine commands, which the compiler creates.
Thus, the return address is the last thing pushed by the caller, and before anything [local variables] pushed by the callee.
The parameters are all pushed before the return address, because the jump to the actual function and the insertion of the return address to the stack is done in the same machine command.
Also, another reason is - the caller is the one allocating space on stack for the parameters - It [the caller] should also be the one who cleans it up.
The reason is simple: The function arguments are pushed onto the stack by the calling function (which is the only one which can do it because only it has the necessary information; after all the whole point of doing so is to pass that information to the called function). The return address is pushed to the stack by the function call mechanism. The function is called after the calling function has set up the parameters, because after the call it's the called function which is executed, not the calling one.
OK, so now you could argue that the calling function could put the parameters beyond the currently used stack, and the called function could then just adjust the stack pointer accordingly. But that would not work out well because at any time there could be an interrupt or a signal, which would push the current state onto the stack in order to restore later (I wouldn't be surprised if a task switch did so, too). But if you set up the parameters beyond the current stack, those asynchronous events would overwrite it, and since you cannot predict when they will happen, you cannot avoid that (beyond disabling, which may have other drawbacks or even be impossible, in the case of task switch). Basically, everything beyond the current stack has to be considered volatile.
Also note that this is independent of the question of who cleans up the parameters. In principle, the called function could call destructors of the arguments even if physically they lie in the caller's stack frame. Also, many processors (including the x86) have instructions which automatically pop extra space above the return address on return. For example, Pascal compilers usually did that because in Pascal you don't have any cleanup beyond returning memory, and at least for the processors of the time, it was more efficient to clean up with that processor instruction (I have no idea if that is still true for modern processors). However, C didn't use that mechanism due to variable-length argument lists: for those, the mechanism wasn't applicable because you'd need to know at compile time how much extra space to release, and K&R C did not require forward declarations of variadic functions (C89 does, but few if any compilers take advantage of that, due to compatibility with old code), so there was no way for the calling function to know whether to clean up the arguments unless it had to do that always.

Visual Studio not able to show the value of 'this' in release mode (with debug information)

Original question:
Why is the this pointer 0 in a VS c++ release build?
When breaking in a Visual Studio 2008 SP1 release build with the /Zi (Compiler: Debug Information Format - Program Database) and /DEBUG (Linker: Generate Debug Info, yes) options, why are 'this'-pointers always 0x00000000?
EDIT: Rephrased question:
My original question was quite unclear, sorry for that. When using the Visual Studio 2008 debugger to step through a program I can see all variables, except the local object's member variables. This is probably because the debugger derives these from the this pointer, but VS always says it's 0x00000000, so it cannot derive the current object's member variables (it does not know the memory position of the object).
When loading a megadump (like a Windows minidump, but containing the entire memory space of the process), I can look at all my local variables (defined in the function) and even entire tree structures on the heap that I have pointers to.
For example: when breaking in A::foo() in Release mode
'this' will have value 0x00000000
'f_' will show garbage
Somehow this information needs to be available to the process. Is this a missing feature in VS2008? Any other debugger that does handle this properly?
class A
{
void foo() { /*break here*/ }
int f_;
};
As some others have mentioned, compiling in Release mode makes certain optimizations (especially eliminating the use of ebp/rbp as a frame pointer) that break assumptions on which the debugger relies for figuring out your local variables. However, knowing why it happens isn't very helpful for debugging your program!
Here's a way you can work around it: at the very beginning of a method call (breaking on the first line of the function, not the opening brace), the this pointer will always be found in a specific register (ecx on 32-bit systems or rcx on 64-bit systems). The debugger knows that and so you should be able to see the value of this right at the start of your method call. You can then copy the address from the Value column and watch that specifically (as (MyObject *)0x003f00f0 or whatever), which will allow you to see into this later in the method.
If that's not good enough (for example, because you only want to stop when a bug manifests itself, which is a very small percentage of the time the given method is called), you can try this slightly more advanced (and less reliable) trick. Usually, the this pointer is taken out of ecx/rcx very early in a function call, because that is a "caller-saves" register, meaning that its value may be clobbered and not restored by function calls your method makes (it's also needed for some instructions that can only use that register for their operand, like REP* and some of the shift instructions). However, if your method uses the this pointer a lot (including the implicit use of referring to member variables or calling virtual member functions), the compiler will probably have saved this in another register, a "callee-saves" register (meaning that any function that clobbers it must restore it before returning).
The practical upshot of this is that, in your watch window, you can try looking at (MyObject *) ebp, (MyObject *) esi, and so on with other registers, until you find that you're looking at a pointer that is probably the correct one (because the member variables line up with your expectation of the contents of this at the time of your breakpoint). On x86, the callee-saved registers are ebp, esi, edi, and ebx. On x86-64, they are rbp, rsi, rdi, rbx, r12, r13, r14, and r15. If you don't want to search all those, you could always try looking at the disassembly of your function prologue to see what ecx (or rcx) is being copied into.
Local variables (including this) when viewed in the Locals window cannot be relied upon in the Release build in the way that they can in Debug builds. Whether the variable value shown is correct at any given instruction depends on how the underlying register is being used at that point. If the code runs OK in Debug it's most unlikely that the value is actually 0.
Optimization in Release builds makes values in the Locals window a crap shoot, to the naked eye. Without concurrent display and correlation of the Disassembly window, you cannot be sure that the Locals window is telling you the actual value of the variable. If you step through the code (maybe in Disassembly not Source) to a line that actually uses this, it's more likely that you will see a valid value there.
Because you wrote a bugged program and called a member function on a NULL pointer.
Edit: Reread your question. Most likely, it's because the optimizer did a number on your code and the debugger can't read it anymore. If you have a problem specific to Release build, then it's a hint that your code has a dodgy #ifdef in it, or you invoked UB that just happens to work in Debug mode. Else, debug with Debug build. However, that's not terribly helpful if you actually have a problem in Release mode you can't find.
Your function foo is inline (it's declared in the class definition, so is implicitly inline), and doesn't access any members. Therefore the optimizer will likely not actually pass the this pointer at all when it compiles the code, so it is not available to the debugger.
In release builds, the optimizer will rearrange code quite substantially in order to improve performance, particularly with inline functions (though it does optimize other functions too, especially if whole program optimization is enabled). Rather than passing this, it may instead pass a pointer to a used member directly, or even just pass the member's value in a register that it loaded for a previous function call.
Sometimes the debug info is enough that the debugger can actually piece together a this pointer, and the values of local variables. Often, it is not, and the this pointer shown in the watch window (and consequently the member variables) are nonsense.
Because it is a release build. The entire point in optimizations is to change the implementation details of the program, while preserving the overall functionality.
Does the program still work? Then it doesn't matter that the this pointer is seemingly null.
In general, when you're working with a release build, you should expect that the debugger is going to get confused. Code is going to be reordered, variables removed entirely, or containing weird unexpected values.
When optimizations are enabled, no guarantees are given about any of these things. But the compiler won't break your program. If it worked without optimizations, it'll still work with optimizations. If it suddenly doesn't work, it's because you have a bug that was only exposed because the compiler optimized and modified the code.
Are they "const" functions?
A const function is one which is declared with the keyword const, and this indicates that it will not change any of the members, only read them (like accessor functions)
An optimising compiler may not bother passing the 'this' pointer to some const functions if it doesn't even read from non-static member variables
An optimising compiler may search for functions which could be const, make them constant, and then not pass a this pointer into them, causing the debugger to be unable to find the hook.
It isn't the this pointer that is NULL, but rather the pointer you are using to call a member function:
class A
{
public:
void f() {}
};
int main()
{
A* a = NULL;
a->f(); // DO'H! NULL pointer access ...
// FIX
a = new A;
a->f(); // Aha!
}
As others already said, you should make sure that the compiler does not do anything which can confuse the debugger, which optimizations are likely to do.
The fact that you have NULL pointer can happen IF you call the function statically like :
A* b=NULL;
b->foo();
The function is not static here but called a static way.
The best spot to find the real this pointer is to take a look at the stack. For non-static class functions the this pointer MUST be the first ( hidden ) argument of your function.
class A
{
void foo() { } // this is "void foo(A *this)" really
int f_;
};
If your this pointer is null here, then you have a problem before calling the function. If the pointer is correct here, then your debugger is kinda messed up.
I've been using Code::Blocks with Mingw for years now, with the built in debugger ( gdb )
I only have problems with the pointer when I have optimizations turned on; otherwise it always knows the this pointer and can dereference it any time.

Thought experiment with __stdcall and corrupted stack (C++)

My mind was wandering today on the topic of function pointers, and I came up with the following scenario in my head:
__stdcall int function (int)
{
return 0;
}
int main()
{
(*(int(*)(char*,char*))function)("thought", "experiment");
return 0;
}
AFAIK this code would corrupt the stack, so what types of issues could I be looking at if I ran this code?
I'd do this investigating myself however I'm away from my dev machine for a week.
EDIT: Hold on a second, I've been thinking a bit more. As has been observed in the comments, the intent of this code was to have a parameter left on the stack when all is said and done (caller puts two params on the stack, callee -- expecting only one param -- pops only one off). However, since my cast doesn't make mention of the calling convention, am I casting away stdcall, at least from the view of the caller? int function(int) will still pop a param off the stack, but does the caller revert to thinking the function is __cdecl (the default) because of the cast? (i.e. three total params popped?)
EDIT2: The answer to that second question, as confirmed by Rob, is yes. I would have to restate __stdcall if I wanted to leave a param on the stack:
(*(__stdcall int(*)(char*,char*))function)("thought", "experiment");
You are calling the function as if it is _cdecl which means the caller pushes the arguments and cleans up the stack.
The receiving function is _stdcall which implies the callee cleans up the stack. The callee is expecting a single argument so will pop 4 bytes off the stack.
When the function returns the caller will then pop off two pointers (having previously pushed on two pointers), so your stack is being corrupted by 4 bytes.
Both calling conventions use the same return mechanism, and have the same register rules (eax, ecx and edx are not preserved). See wikipedia for more details.
Depending on the stack frame layout and alignment this mismatch could cause a number of effects. If you are lucky then you get away with it. If not you might mess up the return address of your main function, causing the program to crash when it branches to who-knows-where. If the compiler has injected some kind of stack guard to catch corruption then it will likely detect this and abort the program.
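The 4-byte figure can be reproduced with simple bookkeeping. This is a worked-arithmetic sketch for 32-bit x86 only, not a simulation of real code; espDriftAfterCall is a name invented here.

```cpp
#include <cstdint>

// Returns how far ESP ends up from its starting value after one call in
// which the caller behaves as cdecl (removes what it pushed) while the
// callee removes calleePoppedBytes itself (stdcall uses "ret n" for this).
std::int32_t espDriftAfterCall(std::int32_t callerPushedBytes,
                               std::int32_t calleePoppedBytes) {
    std::int32_t esp = 0;          // relative to ESP before the call sequence
    esp -= callerPushedBytes;      // caller pushes the arguments
    esp -= 4;                      // call pushes the return address
    esp += 4 + calleePoppedBytes;  // ret n: pops return address plus n bytes
    esp += callerPushedBytes;      // caller "cleans up" its arguments
    return esp;                    // 0 means the stack is balanced
}
```

For the question's case, espDriftAfterCall(8, 4) is 4: two pointers pushed, one int popped by the callee, two pointers popped again by the caller. A genuine cdecl callee corresponds to espDriftAfterCall(8, 0), which is 0.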
No, it will definitely not cause a blue screen. No user-mode process is able to do that. Even if such bug were in kernel-mode code, the BSOD would occur only after accessing invalid memory or passing wrong arguments to a function.
You are simply corrupting private memory of your process, and the corruption may (or may not) later result in an invalid operation (eg. dereferencing a pointer pointing to invalid memory). When this happens, the OS terminates your process, but no sooner.
I think you would have 'undefined behavior' in this case.
From the C standard: (I would assume it's the same in C++)
"If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined."
Edit: On most operating systems, this type of error would not cause problems in your whole operating system. But it would cause undefined problems in your program. It would be very hard for a user mode program to be able to cause a blue-screen.
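For contrast, here is a sketch of the well-defined way to carry a function pointer around under a different type: convert it back to its original type before calling. The names increment, Original, Other, disguise and callSafely are made up for illustration, and a plain function is used instead of the question's __stdcall one so the snippet is not MSVC-specific.

```cpp
// Stand-in for the question's function.
int increment(int x) {
    return x + 1;
}

using Original = int (*)(int);
using Other = int (*)(char*, char*);

// Converting to another function pointer type is allowed by itself...
Other disguise() {
    return reinterpret_cast<Other>(&increment);
}

// ...as long as the pointer is converted back to the original type
// before it is called. Calling it through Other would be undefined.
int callSafely(Other disguised, int arg) {
    Original restored = reinterpret_cast<Original>(disguised);
    return restored(arg);
}
```

Some compilers warn about the cast itself, which is a fair hint that the pattern deserves a comment wherever it is used.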