In C++ are all names essentially "under-the-hood" pointers? - c++

In C++ are names actually pointers-under-the-hood? For example:
// num is a name that points to the address of the
// memory location containing 32.53
double num (32.53);
*num; // so can we dereference it and get 32.53?
num; // same as *num but the compiler automatically
// dereferences the name for us?
Clarification:
This question was kind of an odd mix of "What's happening at the machine level?" and also about the C++ language semantics of pointers. Thus, I can see why the answers are yes/no. "Yes" because outside of the language semantics, an identifier could be thought of as referring to a location in memory; and "No" because that is still not a C++ pointer and it is incorrect to dereference a non-pointer double as shown in the code.
So I think both camps have successfully answered my question. It could perhaps be restated as, "Since names refer to memory locations, why can't they be treated as implicit pointers?" But such a question might generate fuzzy debates or just not be worth answering. I will carefully go over the answers and try to find the one that (I feel) answers the question the best from both angles. In other words, one that doesn't just say "That's not a C++ pointer dummy!" or "of course the name points to memory somehow".

A pointer is a programmer-visible value which holds the location of some object (or else a null value that doesn't point to an object, or an indeterminate value).
Although addressing is involved in resolving name references at run-time, names of variables are not pointers in C++. This is because variable names are not run-time values which represent locations; they are compile-time (and in the case of external names with linkage, link-time) symbols that denote locations.
"Pointer" and "address" are not the same thing. For instance, when we call a function, assuming it is not tail-call optimized or inlined, typically a "return address" is stored somewhere so that the function can return to the caller. This address is not a pointer; or at least not a C++ pointer. It is a "no user serviceable component" managed behind the scenes by the implementation, just like the stack pointer (if there is such a thing), stack frame pointer and other machine-level features. C++ references, at least in some cases, are also implemented using run-time addresses. C++ references are also not pointers.
There is a run-time pointer value associated with a variable name, and you can access this property using the address-of operator:
double *pnum = &num;
but not like this:
*num; // so can we dereference it and get 32.53?
Have you tried it? The unary dereference operator requires an expression of pointer type; it won't work with an expression of type double. (It can work with an expression of class type, if suitably overloaded, which is something else.)
Though by means of the & operator we can get a pointer to the storage location named by num, the name num itself isn't that pointer.
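The distinction between the name and the pointer obtained from it can be checked at compile time. The following is a minimal sketch; the helper name read_through_pointer is an assumption made for illustration and is not part of the question's code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: reads num back through a pointer obtained with &.
double read_through_pointer() {
    double num(32.53);

    // The name num denotes an object of type double, not a pointer.
    static_assert(std::is_same<decltype(num), double>::value,
                  "num has type double");
    // Applying & produces a pointer value of type double*.
    static_assert(std::is_same<decltype(&num), double*>::value,
                  "&num has type double*");

    double* pnum = &num;
    // *num;   // would not compile: unary * needs an operand of pointer type
    assert(pnum != nullptr);
    return *pnum;  // dereferencing the pointer yields the stored value
}
```

Calling read_through_pointer() returns the value stored in num, while uncommenting the *num line should make the compiler reject the program.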
When we evaluate code like num = 3, quite likely an address is involved. Or maybe not. For instance, if num is optimized into a register, then this just loads 3 into that register. Even if num has a memory address, that address is not programmer-visible in that situation. Pointers are programmer-visible: the programmer creates them, displaces them, dereferences them, stores them in variables, passes them into functions, and so on.
In fact a name isn't anything in C++; names need not be retained at run time and are not accessible in any portable way. (Implementations have ways of retaining information about names after compilation, for the sake of symbolic debugging, and platforms that support dynamic linking have ways of looking up dynamic symbols from strings.)

If you extend the definition of pointer to "any conceptual device that refers to some information in memory" then, yes, absolutely.
However, nobody does.
You'd be closer to the money if you used the term handle which has, variably, been used to mean "pointer", "reference", "variable", "resource object", "name" in source code, "accessor", "identifier", and myriad other things.
One generally comes to the conclusion that general terms are too ambiguous, and ends up sticking with terms that are either language-specific (such as C++'s "pointer", with its very specific semantics, not including those which you have posited), or unambiguous and commonly accepted across the industry. There are very few of those.

No.
From the definition of pointer type of the C standard (at §6.2.5/20):
A pointer type may be derived from a function type or an object type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type.
(emphasis mine).
In your case:
double num (32.53);
you have a double, with identifier num, whose value is 32.53, which is not intended to be used as a pointer value. The type of num is not even a pointer type.
Therefore no, num is not a pointer and as the compiler would have told you, if you had tried to compile:
*num;
you can't dereference it.

No.
In this case, num is the name of an object of type double. Its declaration does not explicitly or implicitly create a pointer. It creates a floating-point object, initialized to 32.53.
You can obtain a pointer value (not a pointer object) by taking the address of the object, as in &num, but you can do that for any object.
As for *num, that's illegal if num is of type double.
The name of an object is something that refers to the object itself; in that sense, if I squint really hard, I can think of that as a kind of "pointer". But a pointer in the C++ sense is a value or object containing a memory address, and it exists while the program is executing; an identifier exists only in the C++ source code. A C++ compiler will have some internal compile-time data structure that refers to a declared variable, that will include (or refer to) information about its name and type, but IMHO it's not reasonable to call that data structure a "pointer". It's likely to be something far more complex than a memory address. And the name of a variable declared inside a function will refer to different objects, or to none at all, at different times during the execution of the program.

Yes, in the sense that, like a pointer, an identifier is a handle for an object, and not the object itself. But it is still one fewer level of indirection than a pointer variable (whose name is a handle to the address which is a handle to the object).
In fact, the compiler will maintain a symbol table which is a mapping from identifier to location of the object (here location is typically not a memory address, but an offset from the bottom of the stack, or from the beginning of the data segment -- but then again C++ pointers aren't physical addresses either on virtual memory systems). Normally this symbol table is output by the compiler for use during debugging. Dynamic languages would actually use the symbol table during execution, to support late-binding and eval(). But C++ doesn't use identifiers at runtime.

I think variables in assembly work like that, i.e. that a name is actually a human-readable label for an address. (This article would indicate so, at least: http://www.friedspace.com/assembly/memory.php ) In C/C++, however, the name of the variable is actually a dereferenced address. To get the address, you have to use the & operator.
Pointers in C/C++ are actually variables that contain an address--something that most likely has its own address (though it doesn't need to if the compiler chooses to store the pointer in a CPU register and you don't try to get its address--if you do, then, you're guaranteed to get one).
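To illustrate the point that a pointer is itself a variable that can have its own address, here is a small sketch. The helper name pointer_is_its_own_object is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: checks that a pointer variable is itself an object
// with an address distinct from the object it points to.
bool pointer_is_its_own_object() {
    double num = 32.53;
    double* p = &num;    // p holds the address of num
    double** pp = &p;    // taking &p means p must have an address of its own
    assert(pp != nullptr);

    // Following the chain pp -> p -> num reaches the stored value.
    bool chain_ok = (*pp == p) && (**pp == num);
    // p and num are distinct objects, so their addresses differ.
    bool distinct = static_cast<void*>(pp) != static_cast<void*>(&num);
    return chain_ok && distinct;
}
```

Here num needs only the & operator to produce an address, whereas p stores one, and pp stores the address of p in turn.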

What you're talking about really comes down to the definition of an "lvalue".
Let's consider a simple variable like you gave in the question:
double num = 32.53;
num is an lvalue, which roughly translates to the fact that it refers to some location in memory.
When it's used in an expression, an lvalue can be converted to an rvalue, which translates (roughly) to retrieving the value from that memory location. For example, given an expression like:
num = num + 1.5;
num starts out as an lvalue, but where it's used on the right side of the assignment, it's converted to an rvalue. That basically means the value from that location in memory is fetched, then 1.5 is added to it, then the resulting value is written back to the location in memory that num refers to.
That does not, however, mean that num is a pointer. num does refer to a location in memory. A pointer is different: it's a variable that refers to some other location in memory. A pointer variable is itself an lvalue. It has a location in memory, and its name refers to that location in memory. The value stored in that location in memory, however, is itself a reference to some location in memory (normally some location other than the pointer itself).
Perhaps a picture would help:
Here I'm using a solid arrow-line to indicate an association that can't be changed. The rectangles stand for the names of variables, and the sideways diamonds for memory locations. So, num is immutably associated with one location in memory. It can't be modified to refer to any other location, and it can't be dereferenced.
If we define something like:
double num2 = 12.34;
double *pointer = &num2;
Then we get roughly the situation depicted in the second part of the picture. We've defined num2 as a double, just like num is (but holding a different value). We've then defined a pointer (named pointer) that points to num2. In other words, when we dereference pointer, we get num2. The connection from pointer to num2 is a dashed line though--indicating that if we chose to, we could change this--we could assign the address of some other double to pointer, which would make pointer refer to that other variable's location.
Though you haven't asked about it (yet) the third part of the picture shows what we'd get if we defined something like:
double num3 = 98.76;
double &reference = num3;
In this case, we've created a third object in memory (num3), and we've created a reference (named reference) that refers to num3. I've drawn the location for reference in a lighter color to signify the fact that there may or may not be an actual location in memory used to store the reference itself--it could just be a second name that refers (more or less) directly to the num3. Unlike the pointer, this has a solid line from the reference to the object to which it refers, because it can't be changed--once the reference is created, it always refers to that same location.
To answer your other question, with a definition like you gave (double num = 32.53;), no you cannot dereference num. An expression like *num = 10.2; simply won't compile. Since the name of the variable always refers to that variable's location, trying to use * to refer to its location simply isn't necessary, supported, or allowed.
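The difference between the changeable association of a pointer and the fixed association of a reference can be sketched in code. The helper name below is an assumption for illustration; the variable names follow the answer.

```cpp
#include <cassert>

// Hypothetical helper: returns true if the pointer was reseated while the
// reference stayed bound to its original object.
bool pointer_reseats_reference_does_not() {
    double num2 = 12.34;
    double num3 = 98.76;

    double* pointer = &num2;
    pointer = &num3;       // allowed: pointer now refers to num3 instead of num2

    double& reference = num3;
    reference = num2;      // assigns through the reference: num3 becomes 12.34,
                           // and the reference itself still names num3
    assert(&reference == &num3);

    return pointer == &num3 && &reference == &num3 && num3 == 12.34;
}
```

Assigning to the pointer changes what it points to, while assigning to the reference changes the value of the object it was bound to at creation.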

No. The value num is probably stored in a register, for such a simple example. And on today's CPUs, registers do not have an address.
num is the name of an object. C++ pretends that the object lives at a specific place in memory, but that's not very efficient on modern architectures. Most operations require the value to be in a register, so if the compiler can keep it in a register it will. If not, the compiler may benefit from putting the value in a cache-friendly location. This may not be the same location every time, so the value may move around in memory.
So, the name is far more powerful than a pointer: it follows the value as it moves around in memory. Creating and storing a pointer &num disallows such movements, as moving num would then invalidate the pointer.

Related

How is the type of a pointer implemented in c++?

Pointer types like int*, char*, and float* point to different types. But I have heard that pointers are simply implemented as links to other addresses - then how is this link associated with a type that the compiler can match with the type of the linked address (the variable at this location)?
Types are mostly compile time things in c++. A variable's type is used at compile time to determine what the operations (in other C++ code) do on that variable.
So when you ++ a variable bob of type int*, that maps at runtime to a generic pointer-sized integer being increased by sizeof(int).
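A minimal sketch of that step size, measuring the distance in bytes between the pointer before and after the increment. The helper name bytes_moved_by_increment is hypothetical; bob is the variable name used above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes a single ++ moves an int*.
std::size_t bytes_moved_by_increment() {
    int values[2] = {0, 0};
    int* bob = &values[0];
    int* before = bob;
    ++bob;                                   // now points at values[1]
    assert(bob == &values[1]);

    // Measure the distance in bytes by viewing both addresses as char*.
    const char* lo = reinterpret_cast<const char*>(before);
    const char* hi = reinterpret_cast<const char*>(bob);
    return static_cast<std::size_t>(hi - lo);
}
```

The returned distance equals sizeof(int), even though the source code only says ++.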
To a certain extent this is a lie; C++'s behavior is specified in terms of an abstract machine, not a concrete one. The compiler interprets your code as expressing operations on that abstract machine (that doesn't exist), then writes concrete assembly code that realizes those operations (insofar as they are defined) on concrete hardware.
In that abstract machine, int* and double* are not just numbers. If you dereference an int* and write to some memory, then do the same with a double*, and the memory overlaps, in the abstract machine the result is undefined behavior.
In the concrete implementation of that abstract machine, pointers are just numbers, and dereferencing an int* and a double* that hold the same address results in quite well defined behavior.
This difference is important. The compiler is free to assume the abstract machine (where int* and double* are very distinct things) is the only reality that matters. So if you write to an int*, write to a double*, then read back from the int*, the compiler can skip the read back, because it can prove that in the abstract machine writing to a double* cannot change the value that an int* points to.
So
int buf[10]={0};
int* a = &buf[0];
double* d = reinterpret_cast<double*>(&buf[0]);
*a = 77;
*d = 3.14;
std::cout << *a;
the apparent read at std::cout << *a can be skipped by the compiler. Meanwhile, if it actually happened on real hardware, it would read bits generated by the *d write.
When reasoning about C++ you have to think of 3 things at once; what happens at compile time, the abstract machine behavior, and the concrete implementation of your code. In two of these (compile time and abstract machine) int* is implemented differently than float*. At actual runtime, int* and float* are both going to be 64 or 32 bit integers in a register or in memory somewhere.
Type checking is done at compile time. The error happens then, or never, excluding cases of RTTI (runtime type information).
RTTI is things like dynamic_cast, which does not work on pointers to primitives like float* or int*.
At compile time that variable carries with it the fact it is an int* everywhere it goes. In the abstract machine, ditto. In the concrete compiled output, it has forgotten it is an int*.
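As a rough illustration of the compile-time side, the sketch below checks that int* and float* are distinct types and that the address alone can pass through a void*, which carries no pointee type. The helper name is an assumption for illustration.

```cpp
#include <cassert>
#include <type_traits>

// At compile time the two pointer types are distinct.
static_assert(!std::is_same<int*, float*>::value,
              "int* and float* are different types");

// Hypothetical helper: an int* survives a round trip through void*,
// which carries the address but not the pointee type.
bool address_survives_type_erasure() {
    int value = 42;
    int* typed = &value;
    void* untyped = typed;                       // pointee type dropped
    int* restored = static_cast<int*>(untyped);  // programmer re-asserts the type
    assert(restored != nullptr);
    return restored == typed && *restored == 42;
}
```

Nothing stored in untyped records that it once came from an int*; it is the static_cast in the source code that supplies the type again.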
There's no particular "link" at this stage, nor any hidden meta-data stored somewhere. Since C and C++ are compiled and eventually produce a standalone executable, the compiler "trusts" the programmer and simply provides him with a data type that represents a memory address.
If there's nothing explicitly defined at this address, you can use a void * pointer. If you know that this will be the location of something in particular, you can qualify it with a certain data type like int * or char *. The compiler will therefore be able to directly access the object that lies behind it, but the way this address is stored remains the same in every case and keeps the same format.
Note that this qualification is done at compilation time only. It totally disappears in the final executable code. This means that this generated code will be produced to handle certain kinds of objects, but nothing will tell you which ones at first if you disassemble the machine code. You'll have to figure this out by yourself.
Variables represent data which is stored in one or more memory cells or "bytes". The compiler will associate this group of bytes with a name and a type when the variable is defined.
The hardware uses a binary number to access a memory cell. This is known as the "address" of the memory cell.
When you store some data in a variable, the compiler will look up the name of the variable and check that the data you want to store is compatible with its type. If it is, it then generates code which will save it in the memory cell(s) at that address.
Since this address is a number, it can itself be stored in a variable. The type of this address variable will be "pointer to T", where T is the type of the data stored in that address.
It is the responsibility of the programmer to make sure that this address variable does correspond to valid data and not some random area of memory. The compiler will not check this for you.

What is the meaning of this.attribute when it should be this->attribute

I'm working on a C++ program, and while debugging I was in following function:
int CClass::do_something()
{
... // I've put a breakpoint here
}
My CClass has an attribute, let's call it att.
When my program is halted at my breakpoint, I've put three things in my Watch window:
att
this->att
this.att
The first two, att and this->att contain the correct value, but this.att contains a wrong value (at least it looks wrong).
The fact that it shows value means that this.att has some kind of meaning.
What is that meaning? What is the meaning of this.att compared to this->att?
For your information, I'm using Visual Studio as a development environment.
The arrow operator -> operates on a pointer on the left, meaning that it accesses a member of the object pointed to by the pointer on the left. As pointed out in many other places, the -> operator is shorthand for manually dereferencing the pointer and then using the dot notation: a->b is equivalent to (*a).b. This means that the context information it needs to look up, it can find through the pointer value. The dot operator . assumes that you have the object itself (or a reference to it) on the left, so it does not dereference anything first.
If your debugger somehow allows you to get away with using the dot operator instead of the arrow operator, probably because it does not (or cannot) validate the type used, it will assume that the object lives at the address of the pointer variable itself, and not at the address stored in that pointer.
In other words, it will look at a location in memory that is not the object itself. The behaviour is most likely undefined, and you will only see garbage data: whatever is actually stored at the memory offset of the member att from that wrong base address.
Assume your object is laid out so that an attribute foo is at offset 0 and an attribute att is at offset 4; then the debugger is basically looking at what it takes to be the address of the object instance plus the offset of your member variable. I am not at my home computer, but I can add a sketch of what happens to clarify a bit later.
att, by itself, returns the member variable att for that instance of the class.
this->att returns the same thing as att except that the this-> explicitly tells the program to return the att member variable of the class.
An example of a situation when att and this->att would return different values is if att was redefined in a local scope inside a class. In that case, att would refer to the locally defined att and this->att would refer to the att member variable of the class.
Since this is a pointer to the current object, it requires the -> notation (which is a combination of dereferencing the pointer and member/method access). The . notation is only member/method access. To answer your question of "what is the meaning of this.att", I would say it is meaningless and honestly I'm surprised it even returned anything at all.
The only "right" way you can use this with . is if you dereference the pointer first, like this:
(*this).att
However, the result is identical to simply using ->.
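The equivalence of this->att and (*this).att, and the shadowing case described above, can be sketched as follows. Widget is a hypothetical class standing in for the CClass from the question.

```cpp
#include <cassert>

// Hypothetical class, standing in for the CClass from the question.
struct Widget {
    int att = 7;

    // att, this->att and (*this).att all name the same member.
    bool same_member() {
        return &(this->att) == &((*this).att) && &att == &(this->att);
    }

    // A local variable named att hides the member; this->att still reaches it.
    int local_minus_member() {
        int att = 99;
        assert(&att != &(this->att));
        return att - this->att;
    }
};
```

For a default-constructed Widget, same_member() is true and local_minus_member() yields 99 - 7, showing that the plain name att refers to the local while this->att refers to the member.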
Ok, just blindly guessing without further information, but let me try to make some sense of what you see in your debugger, since this is your question right? So I will not explain the difference between . and ->. Assume the following situation:
class CClass
{
int x; // at offset 0 of CClass
int att; // at offset 4 of CClass
};
CClass* p = // ... (say p points to the address 0xF0 for now)
int CClass::do_something()
{
CClass q;
CClass* r;
}
So, now inspecting the value of the expression p->att in your debugger, what the debugger does is take the value of p (here 0xF0) and add the offset of att to it, resulting in the memory address 0xF4. At this address read 4 bytes of memory (since we have an int) and interpret it as a signed integer. For the above situation, this should show the correct value of the att member of the object pointed to by p.
Now, assume you are inside the do_something function, where you have a stack allocated instance of CClass. If you now try to get the value of q.att, the debugger takes the address of the object itself (basically &q) and again adds the corresponding offset 4. Since q is allocated on the stack, the memory fragment at &q + 4 is the location inside the stack frame where the value of member att of object q is stored. Everything fine so far.
Next, consider the expression r.att, which is not correctly typed C++, but the debugger seems to just proceed in the usual manner. It takes the address of r (&r), not the address pointed to by r, and again adds the offset of att (4) to it, before reading an int from this location. Since the address of r is on the stack, the resulting location &r + 4 is not the location where the member of the object pointed to by r is located, but is an unrelated location inside your stack frame (like inside other local variables or a function argument).
In your situation, the same happens with this. Since the this pointer is an invisible first argument of non-static member functions, it is as well located in the function's stack frame. The debugger now seems to take the address where the this pointer is located in the stack frame, adds the corresponding offset and outputs the value of the 4 bytes found there. What the actual value is, depends on the layout of your stack frame (could be something like the return address), but it is clearly not inside the object pointed to by this.
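The "base address plus member offset" lookup described above can be imitated in ordinary C++. This is only a sketch of the idea, not of how any particular debugger is implemented; CClassSketch and read_att_by_offset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for CClass with the same two members.
struct CClassSketch {
    int x;
    int att;
};

// Reads att by taking the object's base address, adding the offset of att,
// and copying sizeof(int) bytes from there.
int read_att_by_offset(const CClassSketch* p) {
    assert(p != nullptr);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(p);
    int value;
    std::memcpy(&value, base + offsetof(CClassSketch, att), sizeof value);
    return value;
}
```

Passing the address of a real object gives the correct att. The this.att case corresponds to starting from the address of the pointer variable instead, so the same offset lands on unrelated bytes.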

The address of const variable, C++

Recently I was rereading Effective C++ by Scott Meyers (3rd edition). And according to Meyers:
"Also, though good compilers won’t set aside storage for const objects of integral types (unless you create a pointer or reference to the object), sloppy compilers may, and you may not be willing to set aside memory for such objects."
Here in my code I can print the address of a const variable, but I have not created a pointer or reference to it. I use Visual Studio 2012.
int main()
{
const int x = 8;
std::cout<<x<<" "<<&x<<std::endl;
}
The output is:
8 0015F9F4
Can anybody explain the mismatch between the book and my code? Or am I mistaken somewhere?
By using the address-of operator on a variable, you are in fact creating a pointer. The pointer is a temporary object, not a declared variable, but it's very much there.
Furthermore there is a declared variable of pointer type that points to your variable: the argument to the overloaded operator << that you used to print the pointer.
std::cout<<x<<" "<<&x<<std::endl;
You tried to get the address of the variable x, so the compiler considers it necessary to generate code that sets aside storage for the const object.
By &x, you ODR-used the variable, which makes allocating actual storage for x necessary.
A good compiler (when using optimizations) will try to replace any compile-time constant by its value in your code to avoid making a memory access. However, if you do request the address of a constant (like you do) it can't do the optimization of not allocating memory to it.
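Both uses of the constant can appear side by side: as a compile-time value, where no storage is needed, and through its address, which odr-uses it. A minimal sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: uses a const int both as a compile-time value and
// through its address.
int read_const_through_address() {
    const int x = 8;

    // Used as a value, x is a constant expression; no storage is needed here.
    static_assert(x == 8, "x is usable at compile time");

    // Taking the address odr-uses x, so the object must exist somewhere.
    const int* px = &x;
    static_assert(std::is_same<decltype(&x), const int*>::value,
                  "&x is a pointer to const int");
    assert(px != nullptr);
    return *px;
}
```

The expression &x is itself the pointer that Meyers' caveat refers to, even though no pointer variable is declared in the question's code.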
However, one important thing to note is that this doesn't mean the search and replace wasn't done in your code. As you are not supposed to change the value of the constant, the compiler will assume it is safe to do a "search and replace" on it. If you do change the value with a const_cast you will get undefined behavior. It tends to work fine if you compile in debug but usually fails if your compiler optimizes the code.
In C++, for constants of basic data types, the compiler will put the constant in the symbol table without allocating storage space, whereas a const object of an ADT (Abstract Data Type) or UDT (User Defined Type) will need storage space allocated (large objects). There are other cases that also require storage space, such as symbolic constants explicitly declared extern, or taking the address of a symbolic constant.

does a variable consume memory in addition to just its content (e.g. type, location)?

Quite likely this has been asked/answered before, but not sure how to phrase it best, a link to a previously answered question would be great.
If you define something like
char myChar = 'a';
I understand that this will take up one byte in memory (depending on implementation and assuming no unicode and so on, the actual number is unimportant).
But I would assume the compiler/computer would also need to keep a table of variable types, addresses (i.e. pointers), and possibly more. Otherwise it would have the memory reserved, but would not be able to do anything with it. So that's already at least a few more bytes of memory consumed per variable.
Is this a correct picture of what happens, or am I misunderstanding what happens when a program gets compiled/executed? And if the above is correct, is it more to do with compilation, or execution?
The compiler will keep track of the properties of a variable - its name, lifetime, type, scope, etc. This information will exist in memory only during compilation. Once the program has been compiled and the program is executed, however, all that is left is the object itself. There is no type information at run-time (except if you use RTTI, then there will be some, but only because you required it for your program to function - such as is required for dynamic_casting).
Everything that happens in the code that accesses the object has been compiled into a form that treats it exactly as a single byte (because it's a char). The address that the object is located at can only be known at run-time anyway. However, variables with automatic storage duration (like local variables), are typically located simply by some fixed offset from the current stack frame. That offset is hard-baked into the executable.
Whether a variable contains extra information depends on the type of the variable and your compiler options. If you use RTTI, extra information is stored. If you compile with debug information, extra overhead will also be added.
For native datatypes like your example of char there is usually no overhead, unless you have structs, which can also contain padding bytes. If you define classes, there may be a virtual table associated with your class. However, if you dynamically allocate memory, then there usually will be some overhead along with your allocated memory.
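The effect of padding and of a virtual table on object size can be observed with sizeof. The sketch below uses hypothetical types; the exact sizes are implementation-specific, so only the relations are meaningful.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration.
struct Padded {              // the compiler may insert padding after c
    char c;
    int i;
};
struct Polymorphic {         // typically carries a hidden vtable pointer
    virtual ~Polymorphic() {}
    char c;
};

std::size_t size_of_char()        { return sizeof(char); }
std::size_t size_of_padded()      { return sizeof(Padded); }
std::size_t size_of_polymorphic() { return sizeof(Polymorphic); }
```

A plain char occupies exactly one byte, with no per-variable bookkeeping at run time. Padded is at least as large as its members combined, and on common implementations Polymorphic is larger than its single char member because of the virtual table pointer.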
Sometimes a variable may not even exist, because the optimizer realizes that no storage is needed for it and can keep it in a register.
So in total, you cannot rely on counting your variables and summing their sizes to calculate the amount of memory the program requires, because there is not necessarily a 1:1 relation.
Some types can be determined at compile time, say in this code:
void foo(char c) {...}
it is obvious at compile time what the type of the variable c is.
In the case of inheritance you cannot know the real type of the object at compile time, like:
void draw(Drawable* drawable); // where drawable can be Circle, Line etc.
But the C++ compiler can help determine the real type of the Drawable through dynamic_cast. In this case it uses the pointer to the virtual method table associated with the object to determine the real type.
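A minimal sketch of that run-time check follows. The class definitions and the helper is_circle are assumptions made for illustration, since the answer only names Drawable, Circle and Line.

```cpp
#include <cassert>

// Hypothetical hierarchy; a virtual member is required for dynamic_cast
// to be able to inspect the dynamic type.
struct Drawable { virtual ~Drawable() = default; };
struct Circle : Drawable {};
struct Line : Drawable {};

// Returns true only if drawable actually points to a Circle.
bool is_circle(Drawable* drawable) {
    return dynamic_cast<Circle*>(drawable) != nullptr;
}
```

Given a Circle c and a Line l, is_circle(&c) is true and is_circle(&l) is false, even though both arguments have the static type Drawable*.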

const variable in c++

These are some silly questions I want to ask; please help me understand.
const int i=100; //1
///some code
long add=(long)&i; //2
Doubt: for the above code, will the compiler first go through the whole code to decide whether memory should be allocated, or will it first store the variable in a read-only memory location and then allocate storage as well at 2?
Doubt: why does taking the address of a variable force the compiler to store the variable in memory, even though ROM and registers have addresses too?
In your code example, add contains the address, not the value, of i. I believe you may have thought that i was not stored in normal memory unless/until you take its address. This is not the case.
const does not mean the value is stored in ROM. It is stored in normal memory (often the stack) just like any other variable. const means the compiler will go to some lengths to prevent you from modifying the value.
const is not, and was never intended, to be some sort of security mechanism. If you obtain the address of the memory and want to modify it, you can do so. Of course this is almost always a bad idea, but if you really need to do it, it is possible.
I never wrote a compiler implementing this, but I think that it would be simple to just handle the variable as a normal variable but using the constant value where the variable value is used and using the address of the variable if the address is used.
If at the end of the scope of the variable no one took the address then I can just drop it instead of doing a real allocation because for all other uses the constant value has been used instead of compiling a variable loading operation.
Constant values (not the only use for const, but the one used here) are not 'stored in normal memory' (nor in ROM, of course). The compiler simply uses the value (100 in this case) whenever the code uses the variable.
Of course, if the value isn't stored anywhere, there's no meaning of an address for the constant.
Other uses of const are stored in 'normal memory', and you can take their address, but the result is a 'pointer to const value', so it's (in principle) unusable for modification of the value. A hard cast would of course change that, so they trigger a nasty compiler warning.
Also, remember that the C/C++ compiler operates entirely at compile time (by definition!), so it's nothing unusual that some use in a later part affects the code generation of an earlier part.
A very obvious example is the declaration of stack variables: the compiler has to take into account all the variables declared at any given level to be able to generate the stack allocation at the block entry.
I am a little confused about what you are asking but looking at your code:
i = 100 with an address of 0x?????????????
add = whatever that address is, stored as a long int
There is no (dynamic) memory allocation in this code. The two local variables are created on stack. The address of i is taken and brutally cast into long, which is then assigned to the second variable.
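The cast in the question can be written with an integer type meant for holding addresses. This sketch swaps long for std::uintptr_t, which is a substitution of mine and not what the question uses; long is not guaranteed to be wide enough for a pointer on every platform. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores the address of a const int in an integer and
// converts it back to a pointer.
bool address_round_trips() {
    const int i = 100;

    // The address of i is an ordinary value that can be kept as an integer.
    std::uintptr_t add = reinterpret_cast<std::uintptr_t>(&i);
    assert(add != 0);

    // Converting back yields a pointer to the same object.
    const int* back = reinterpret_cast<const int*>(add);
    return back == &i && *back == 100;
}
```

As the answer says, add holds the address of i rather than its value, and both variables are ordinary locals with no dynamic allocation involved.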