Pointer to address and Memory Stack

void EventsStack::push(Event *e){
    EventNode *q = new EventNode();
    q->data = e;
    q->next = _top;
    _top = q;
}
int main() {
    EventsStack eventStack;
    Event e1(1);
    eventStack.push(&e1);
    Event e2(2);
    eventStack.push(&e2);
}
First question: when I do
eventStack.push(&e1);
am I sending the ADDRESS of e1 to the push function, and the push function receiving it as a pointer? as if I am doing:
Event *e = 1000 (1000 is the offset (address) of e1 for example on the stack)
?
Second question: I am asked to illustrate the stack upon running the main function. When I get to the line
eventStack.push(&e1);
does a 4-byte return address and a 4-byte pointer to e1 get allocated as the function's activation frame, or is there no activation frame in this situation, since eventStack is an object of the class EventsStack and push is one of its member functions?

With regards to your first question: the expression &e1 takes
the address of e1, which is a pointer. In other words, e1
has type Event, and &e1 has type Event*, and a value such
that dereferencing it (the unary * operator) will have the
same effect as using e1. This is what you pass to the push
function.
And a pointer cannot be just an offset on the stack, since it
must be possible to access the object with it from elsewhere,
where the stack isn't necessarily available. Most modern
desktop machines use linear addressing, which means that the
pointer is simply an integer, but that hasn't always been the
case, and it's probably not the case on some embedded processors
(and for historical reasons on some mainframes).
With regards to the second question: formally, it's completely
unspecified how functions are called. What is necessary is
that the compiler put the return address and the arguments
somewhere where the called function can find them. In the
expression eventStack.push( &e1 ), the function has two
arguments, the address of eventStack (which will become the
this pointer in the function), and the expression &e1. How
the compiler passes these in varies enormously, and almost
always depends on their type, but usually, on most modern
machines, the first 3 to 5 arguments will go into machine
registers if they fit (and both of the addresses you pass will
fit), so all that gets pushed on the machine stack is the return
address. And parameters are not usually considered part of the
current frame, although on some older processors, the compiler
did treat them as such. (You say you are "asked to illustrate
the stack". Does the question refer to a specific machine
architecture, or what? What actually happens on the stack will
vary enormously between compilers, and g++ under Linux will
often do something completely different from Visual Studio,
even when running on the same processor.)

Regarding your first question, yes: the & operator yields the memory address of your Event.
As to your second question, well, it's complicated. The description you made seems to indicate an underlying confusion as to how the stack works. I'd strongly recommend going over some introductory material on the topic, it's going to leave you with a much stronger understanding.
You'll be able to answer your own question a few hours from now if you go Google a bit :).

Function returning value vs modifying value passed by reference

In what situations is it preferred to return an object instead of just modifying an object passed to that function by reference? How do I know which one I should pick?
Actually, the question is whether there are things I wouldn't be able to do without the ability to return an object from a function, if I could only modify objects passed by reference.

The main pragmatic difference between
TYPE function () ;
and
void function (TYPE &value) ;
is that the former can be used in expressions.
a = 10 * function () ;
if (function ())
{
}
That is the main design consideration.
Beyond that, I would have to get into opinion. I am going to stick to the objective difference.

It's very simple...
Return by "value":
- If you want a copy of the object with its current state;
- And it isn't important if the state of the original object changes later;
Return by reference:
- If you want to see the correct state when the object is updated by others;
- If you want to change the state of the object and have others be aware of those changes;
Those are the most important reasons.
But there is a special case that you must be aware of:
For technical reasons, returning by reference is normally faster in current languages.
If speed is a requirement, you should also take this constraint into account to make the best decision.
But the fundamental decision is about how you want to deal with the state of the object.

Returning something is the easy option but you'll use the reference if something speaks against it. One reason is because you have something else in mind for the return value, like a success/failure code. Another is because it would look big on the stack and you're on a tiny computer. Then, of course, you might want to modify an existing variable or structure instead of making a new one. I think that's it.
I'm not aware of anything you can't do by passing a reference or pointer to receive the result.

No one has mentioned speed, so I'll just add this tidbit:
Passing by reference or pointer is frequently faster than passing by value (so long as the size of a pointer is < the size of the value).
Returning/passing by reference is faster (same as returning by a pointer) than returning by value if sizeof(object) > sizeof(void *), or in other words, if the object being returned is larger than an object pointer (object pointers normally all have the same size on a given system, which is 8 bytes on most 64-bit systems).
So, if you're returning any integer, float, or double type, returning by value is as fast as or faster than returning by pointer. If you're returning a 9 byte, 10 byte, or 1200 byte object, however, returning by reference or pointer is faster, and much much faster for the 1200-byte-object case.

In C++ are all names essentially "under-the-hood" pointers?

In C++ are names actually pointers-under-the-hood? For example:
// num is a name that points to the address of the
// memory location containing 32.53
double num (32.53);
*num; // so can we dereference it and get 32.53?
num; // same as *num but the compiler automatically
// dereferences the name for us?
Clarification:
This question was kind of an odd mix of "What's happening at the machine level?" and also about the C++ language semantics of pointers. Thus, I can see why the answers are yes/no. "Yes" because outside of the language semantics, an identifier could be thought of as referring to a location in memory; and "No" because that is still not a C++ pointer and it is incorrect to dereference a non-pointer double as shown in the code.
So I think both camps have successfully answered my question. It could perhaps be restated as, "Since names refer to memory locations, why can't they be treated as implicit pointers?" But such a question might generate fuzzy debates or just not be worth answering. I will carefully go over the answers and try to find the one that (I feel) answers the question the best from both angles. In other words, one that doesn't just say "That's not a C++ pointer dummy!" or "of course the name points to memory somehow".

A pointer is a programmer-visible value which holds the location of some object (or else a null value that doesn't point to an object, or an indeterminate value).
Although addressing is involved in resolving name references at run-time, names of variables are not pointers in C++. This is because variable names are not run-time values which represent locations; they are compile-time (and in the case of external names with linkage, link-time) symbols that denote locations.
"Pointer" and "address" are not the same thing. For instance, when we call a function, assuming it is not tail-call optimized or inlined, typically a "return address" is stored somewhere so that the function can return to the caller. This address is not a pointer; or at least not a C++ pointer. It is a "no user serviceable component" managed behind the scenes by the implementation, just like the stack pointer (if there is such a thing), stack frame pointer and other machine-level features. C++ references, at least in some cases, are also implemented using run-time addresses. C++ references are also not pointers.
There is a run-time pointer value associated with a variable name, and you can access this property using the address-of operator:
double *pnum = &num;
but not like this:
*num; // so can we dereference it and get 32.53?
Have you tried it? The unary dereference operator requires an expression of pointer type; it won't work with an expression of type double. (It can work with an expression of class type, if suitably overloaded, which is something else.)
Though by means of the & operator we can get a pointer to the storage location named by num, the name num itself isn't that pointer.
When we evaluate code like num = 3, quite likely an address is involved. Or maybe not. For instance, if num is optimized into a register, then this just loads 3 into that register. Even if num has a memory address, that address is not programmer-visible in that situation. Pointers are programmer-visible: the programmer creates them, displaces them, dereferences them, stores them in variables, passes them into functions, and so on.
In fact a name isn't anything in C++; names need not be retained at run time and are not accessible in any portable way. (Implementations have ways of retaining information about names after compilation, for the sake of symbolic debugging, and platforms that support dynamic linking have ways of looking up dynamic symbols from strings.)

If you extend the definition of pointer to "any conceptual device that refers to some information in memory" then, yes, absolutely.
However, nobody does.
You'd be closer to the money if you used the term handle which has, variably, been used to mean "pointer", "reference", "variable", "resource object", "name" in source code, "accessor", "identifier", and myriad other things.
One generally comes to the conclusion that general terms are too ambiguous, and ends up sticking with terms that are either language-specific (such as C++'s "pointer", with its very specific semantics, not including those which you have posited), or unambiguous and commonly accepted across the realm of the industry. There are very few of those.

No.
From the definition of pointer type of the C standard (at §6.2.5/20):
A pointer type may be derived from a function type or an object type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type.
(emphasis mine).
In your case:
double num (32.53);
you have a double, with identifier num, whose value is 32.53, which is not intended to be used as a pointer value. Not even the type of num is a pointer type.
Therefore no, num is not a pointer and as the compiler would have told you, if you had tried to compile:
*num;
you can't dereference it.

No.
In this case, num is the name of an object of type double. Its declaration does not explicitly or implicitly create a pointer. It creates a floating-point object, initialized to 32.53.
You can obtain a pointer value (not a pointer object) by taking the address of the object, as in &num, but you can do that for any object.
As for *num, that's illegal if num is of type double.
The name of an object is something that refers to the object itself; in that sense, if I squint really hard, I can think of that as a kind of "pointer". But a pointer in the C++ sense is a value or object containing a memory address, and it exists while the program is executing; an identifier exists only in the C++ source code. A C++ compiler will have some internal compile-time data structure that refers to a declared variable, that will include (or refer to) information about its name and type, but IMHO it's not reasonable to call that data structure a "pointer". It's likely to be something far more complex than a memory address. And the name of a variable declared inside a function will refer to different objects, or to none at all, at different times during the execution of the program.

Yes, in the sense that, like a pointer, an identifier is a handle for an object, and not the object itself. But it is still one fewer level of indirection than a pointer variable (whose name is a handle to the address which is a handle to the object).
In fact, the compiler will maintain a symbol table which is a mapping from identifier to location of the object (here location is typically not memory address, but offset from the bottom of the stack, or from the beginning of the data segment -- but then again C++ pointers aren't physical addresses either on virtual memory systems) Normally this symbol table is output by the compiler for use during debugging. Dynamic languages would actually use the symbol table during execution, to support late-binding and eval(). But C++ doesn't use identifiers at runtime.

I think variables in assembly work like that, i.e. that a name is actually a human-readable label for an address. (This article would indicate so, at least: http://www.friedspace.com/assembly/memory.php ) In C/C++, however, the name of the variable is actually a dereferenced address. To get the address, you have to use the & operator.
Pointers in C/C++ are actually variables that contain an address--something that most likely has its own address (though it doesn't need to if the compiler chooses to store the pointer in a CPU register and you don't try to get its address--if you do, then, you're guaranteed to get one).

What you're talking about really comes down to the definition of an "lvalue".
Let's consider a simple variable like you gave in the question:
double num = 32.53;
num is an lvalue, which roughly translates to the fact that it refers to some location in memory.
When it's used in an expression, an lvalue can be converted to an rvalue, which translates (roughly) to retrieving the value from that memory location. For example, given an expression like:
num = num + 1.5;
num starts out as an lvalue, but where it's used on the right side of the assignment, it's converted to an rvalue. That basically means the value from that location in memory is fetched, then 1.5 is added to it, then the resulting value is written back to the location in memory that num refers to.
That does not, however, mean that num is a pointer. num does refer to a location in memory. A pointer is different: it's a variable that refers to some other location in memory. A pointer variable is itself an lvalue. It has a location in memory, and its name refers to that location in memory. The value stored in that location in memory, however, is itself a reference to some location in memory (normally some location other than the pointer itself).
Perhaps a picture would help:
Here I'm using a solid arrow-line to indicate an association that can't be changed. The rectangles stand for the names of variables, and the sideways diamonds for memory locations. So, num is immutably associated with one location in memory. It can't be modified to refer to any other location, and it can't be dereferenced.
If we define something like:
double num2 = 12.34;
double *pointer = &num2;
Then we get roughly the situation depicted in the second part of the picture. We've defined num2 as a double, just like num is (but holding a different value). We've then defined a pointer (named pointer) that points to num2. In other words, when we dereference pointer, we get num2. The connection from pointer to num2 is a dashed line though--indicating that if we chose to, we could change this--we could assign the address of some other double to pointer, which would make pointer refer to that other variable's location.
Though you haven't asked about it (yet) the third part of the picture shows what we'd get if we defined something like:
double num3 = 98.76;
double &reference = num3;
In this case, we've created a third object in memory (num3), and we've created a reference (named reference) that refers to num3. I've drawn the location for reference in a lighter color to signify the fact that there may or may not be an actual location in memory used to store the reference itself--it could just be a second name that refers (more or less) directly to the num3. Unlike the pointer, this has a solid line from the reference to the object to which it refers, because it can't be changed--once the reference is created, it always refers to that same location.
To answer your other question, with a definition like you gave (double num = 32.53;), no you cannot dereference num. An expression like *num = 10.2; simply won't compile. Since the name of the variable always refers to that variable's location, trying to use * to refer to its location simply isn't necessary, supported, or allowed.

No. The value num is probably stored in a register, for such a simple example. And on today's CPUs, registers do not have an address.
num is a name of an object. C++ pretends that the object lives at a specific place in memory, but that's not very efficient on modern architectures. Most operations require the value to be in a register, so if the compiler can keep it in a register it will. If not, the compiler may benefit from putting the value in a cache-friendly location. This may not be the same location every time, so the value may move around in memory.
So, the name is far more powerful than a pointer: it follows the value as it moves around in memory. Creating and storing a pointer &num disallows such movements, as moving num would then invalidate the pointer.

Is (*i).member less efficient than i->member

Having
struct Person {
string name;
};
Person* p = ...
Assume that no operators are overloaded.
Which is more efficient (if any) ?
(*p).name vs. p->name
Somewhere in the back of my head I hear some bells ringing, that the * dereference operator may create a temporary copy of an object; is this true?
The background of this question are cases like this:
Person& Person::someFunction(){
...
return *this;
}
and I began to wonder, if changing the result to Person* and the last line to simply return this would make any difference (in performance)?

There's no difference. Even the standard says the two are equivalent, and if there's any compiler out there that doesn't generate the same binary for both versions, it's a bad one.

When you return a reference, that's exactly the same as passing back a pointer, pointer semantics excluded.
You pass back a sizeof(void*) element, not a sizeof(yourClass).
So when you do that:
Person& Person::someFunction(){
...
return *this;
}
You return a reference, and that reference has the same intrinsic size as a pointer, so there's no runtime difference.
The same goes for your use of (*i).name, but in that case you create an l-value, which then has the same semantics as a reference.

Yes, it's much harder to read and type, so you are much better off using the x->y than (*x).y - but other than typing efficiency, there is absolutely no difference. The compiler still needs to read the value of x and then add the offset to y, whether you use one form or the other [assuming there are no funny objects/classes involved that override the operator-> and operator* respectively, of course]
There is definitely no extra object created when (*x) is referenced. The value of the pointer is loaded into a register in the processor [1]. That's it.
Returning a reference is typically more efficient, as it returns a pointer (in disguise) to the object, rather than making a copy of the object. For objects that are bigger than the size of a pointer, this is typically a win.
[1] Yes, we can have a C++ compiler for a processor that doesn't have registers. I know of at least one processor from Rank-Xerox that I saw in about 1984, which doesn't have registers; it was a dedicated LiSP processor, and it just has a stack for LiSP objects... But they are far from common in today's world. If someone is working on a processor that doesn't have registers, please don't downvote my answer simply because I don't cover that option. I'm trying to keep the answer simple.

Any good compiler will produce the same results. You can answer this yourself: compile both versions to assembly and compare the generated code.

C++/C object->isOnStack()

I would like to be able to determine if a pointer is on the stack or not at runtime for a number of reasons. Like if I pass it into a function call, I can determine whether I need to clone it or not, or whether I need to delete it.
In Microsoft C (VC 6,7,8) is there a way to bounds check a pointer to see if it is on the stack or not? I am only concerned with determining this on the thread that owns the stack the object was placed on.
something like
static const int __stack_size
and __stack_top
????
Thanks!

Knowing whether an object is on the stack or heap isn't going to tell you whether it should be cloned or deleted by the called function. After all, you can clone either type, and while you shouldn't try to delete a stack-allocated object you shouldn't try to delete all heap pointers either.
Having a function that will make some arcane check to see whether it should delete a passed pointer or not is going to cause confusion down the line. You don't want a situation where you may or may not be able to refer to fields in an object you passed, depending on context. Nor do you want to risk a mistake that will result in trying to free a stack object.
There isn't any standard way to tell what a pointer points to, and any nonstandard way is likely to break. You can't count on stack contiguity, particularly in multithreaded applications (and somebody could easily add a thread to an application without realizing the consequences).
The only safe ways are to have a calling convention that the called function will or will not delete a passed object, or to pass some sort of smart pointer. Anything else is asking for trouble.

Interesting question.
Here's an idea on how to determine it, but not a function call.
Create a dummy variable at the very start of your application on the stack.
Create a variable on the stack in a function isOnStack( void *ptr )
Check to see that the 'ptr' is between the dummy variable and the local variable.
Remember that the stack is contiguous for a given thread. I'm not sure what would happen when you started checking from one thread to another for this information.
If it's not in the stack, then it must be on the heap.

I do not know any method to determine where an object was allocated.
I think this kind of behaviour should be avoided. Such things should imho be solved by contract between user and library developer. State these things in the documentation! If unsure, copy the object (which requires a copy constructor and saves you from trying to copy uncopyable objects).
You can also use smart pointers from Boost. If unsure when an object is no longer needed, pass it as a shared pointer.

Doing this depends on the calling convention of the function. Some calling conventions place arguments in registers, others place them in memory after the head of the stack. Each one is a different agreement between the caller/callee. So at any function boundary in the stack a different convention could have been used. This forces you to track the calling convention used at every level.
For example, in fastcall, one or more arguments can be passed via registers.
See MSDN for more. This would mess up any scheme to figure out if an address exists within a certain range. In MS's thiscall, the this pointer is passed via registers. The &this would not resolve to somewhere between a range of values between the begin and end of the stack.
Bottom line: research calling conventions; they specify how stack memory will be laid out.
Note this is very platform specific!

This is very platform specific, and IMO suitable only for debug build diagnostics. What you'd need to do (on Wintel) is this:
When a thread is created, create a stack variable, and store its address in a global (threadid, stack base address) map.
IsOnStack needs to create its own local variable, and check if the pointer passed is between the stack base and the address in the current stack frame.
This will not tell you anything about variables within other threads. Stack addresses decrease, so the base address is higher than the current address.
As a portable solution, I'd pass a boost::shared_ptr, which can be associated with a deleter. (In boost, this is not a template parameter, so it doesn't "infect" the function consuming the pointer).
you can create an "unmanaged" pointer like this:
inline void boost_null_deleter(void *) {}
template <typename T> inline
boost::shared_ptr<T> unmanaged_ptr(T * x)
{
    return boost::shared_ptr<T>(x, ::boost_null_deleter);
}
and call your function like this
Foo local = { ... };
FooPtr heapy(new Foo);
FunnyFunc(unmanaged_ptr(&local));
FunnyFunc(heapy);

I've wanted such a feature in C++ for a while now, but nothing good really exists. The best you can hope for is to document that you expect to be passed an object that lives on the heap, and then to establish an idiom in the code so that everyone working on the code base will know to pass heap allocated objects to your code. Using something like auto_ptr or boost::shared_ptr is a good idiom for this kind of requirement.

Well, I agree there is probably a better way of doing what you're trying to do. But it's an interesting question anyway. So for discussion's sake...
First, there is no way of doing this in portable C or C++. You have to drop to assembly, using at least an asm{ } block.
Secondly, I wouldn't use this in production code. But for VC++/x86 you can find out if a variable is on your stack by checking that its address is between the values of the ESP and EBP registers.
Your ESP ( Extended Stack Pointer, the low value ) holds the top of your stack and the EBP ( Extended Base Pointer ) usually the bottom.
Calling convention will affect function parameters mainly, and how the return address is handled, etc. So it doesn't relate to your stack much. Not for your case anyway.
What throws things off are compiler optimizations. Your compiler may leave out the frame pointer ( EBP ). This is the -Oy flag in VC++. So instead of using the EBP as the base pointer you can use the address of function parameters, if you have any, since those are a bit higher up on the stack.
But what if that variable you're testing is on your caller's stack? Or a caller's stack several generations above you? Well you can walk the entire call stack, but you can see how this can get very ugly ( as if it isn't already :-) )
Since you're living dangerously, another compiler flag that may interest you is the -Gh flag. With that flag and a suitable _penter hook function, you can set up these calculations for the functions or files or modules, etc. easily. But please don't do this unless you'd just like to see how things work.
Figuring out what's on the heap is even worse....

On some platforms, the stack can be split by the run-time system. That is, instead of getting a (no pun intended) stack overflow, the system automatically grabs some more stack space. Of course, the new stack space is usually not contiguous with the old stack space.
It's therefore really not safe to depend on whether something is on the stack.
The use of auto_ptr generally eliminates the need for this kind of thing, and is way cooler besides.

The MSVC Windows compiler specific answer. This is of course specific to the thread the object is in. It's a pretty bad idea to pass any auto-stack item into any thread other than the one whose stack it is on, so I'm not worried about that :)
bool __isOnStack(const void *ptr)
{
    // FS:[0x04] 4 Win9x and NT Top of stack
    // FS:[0x08] 4 Win9x and NT Current bottom of stack
    const char *sTop;
    const char *sBot;
    __asm {
        mov EAX, FS:[04h]
        mov [sTop], EAX
        mov EAX, FS:[08h]
        mov [sBot], EAX
    }
    return( sTop > ((const char *)ptr) && ((const char *)ptr) > sBot);
}

Thought experiment with __stdcall and corrupted stack (C++)

My mind was wandering today on the topic of function pointers, and I came up with the following scenario in my head:
__stdcall int function (int)
{
    return 0;
}
int main()
{
    (*(int(*)(char*,char*))function)("thought", "experiment");
    return 0;
}
AFAIK this code would corrupt the stack, so what types of issues could I be looking at if I ran this code?
I'd do this investigating myself however I'm away from my dev machine for a week.
EDIT: Hold on a second, I've been thinking a bit more. As has been observed in the comments, the intent of this code was to have a parameter left on the stack when all is said and done (caller puts two params on the stack, callee -- expecting only one param -- pops only one off). However, since my cast doesn't make mention of the calling convention, am I casting away stdcall, at least from the view of the caller? int function(int) will still pop a param off the stack, but does the caller revert to thinking the function is __cdecl (the default) because of the cast? (i.e. three total params popped?)
EDIT2: The answer to that second question, as confirmed by Rob, is yes. I would have to restate __stdcall if I wanted to leave a param on the stack:
(*(__stdcall int(*)(char*,char*))function)("thought", "experiment");

You are calling the function as if it is _cdecl which means the caller pushes the arguments and cleans up the stack.
The receiving function is _stdcall which implies the callee cleans up the stack. The callee is expecting a single argument so will pop 4 bytes off the stack.
When the function returns the caller will then pop off two pointers (having previously pushed on two pointers), so your stack is being corrupted by 4 bytes.
Both calling conventions use the same return mechanism, and have the same register rules (eax, ecx and edx are not preserved). See wikipedia for more details.
Depending on the stack frame layout and alignment this mismatch could cause a number of effects. If you are lucky then you get away with it. If not you might mess up the return address of your main function, causing the program to crash when it branches to who-knows-where. If the compiler has injected some kind of stack guard to catch corruption then it will likely detect this and abort the program.

No, it will definitely not cause a blue screen. No user-mode process is able to do that. Even if such a bug were in kernel-mode code, the BSOD would occur only after accessing invalid memory or passing wrong arguments to a function.
You are simply corrupting private memory of your process, and the corruption may (or may not) later result in an invalid operation (eg. dereferencing a pointer pointing to invalid memory). When this happens, the OS terminates your process, but no sooner.

I think you would have 'undefined behavior' in this case.
From the C standard: (I would assume it's the same in C++)
If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
Edit: On most operating systems, this type of error would not cause problems in your whole operating system. But it would cause undefined problems in your program. It would be very hard for a user mode program to be able to cause a blue-screen.
