A pointer variable in C++ points only to variables in the heap? - c++

According to Warford's "Computer Systems" (4th ed), "When you declare a global or local variable, you specify its type. For example, you can specify the type to be an integer, or a character, or an array. Similarly, when you declare a pointer, you must declare that it points to some type. The pointer itself can be global or local. The value to which it points, however, resides in the heap and is neither global nor local."
I know that you can create local variables in the stack, global variables in a fixed location, and dynamically allocated variables in the heap. Can pointers only point to dynamically allocated variables then? Why can't a pointer point to global or local variables?
Thanks

The value to which it points, however, resides in the heap and is neither global nor local.
That is incorrect.
Can pointers only point to dynamically allocated variables then?
No.
Why can't a pointer point to global or local variables?
They sure can.
You might want to read the book more carefully to see whether the statement in book was made with some additional context. If not, it's time to abandon the book and learn from a different one.

The pointer point to the place in memory where the variable is located. I don't think you should care about where it is. You can make a pointer to absolutely any variable even outside of your program (however this is dangerous and may also crash your app). Pointer is just a number, address in memory. There are many cases where you point to global/local variable. For example if you want to pass class/array to some function.

A pointer can point to local variables or global variables (or fields inside them). But you might prefer references in that case.
A pointer can also point to memory obtained by some ad-hoc mechanism. For instance, the ::operator new on Linux would indirectly call mmap(2) (or similar system calls) which grows the virtual address space of your process. Or in some embedded processors you have special primitives to get some memory (e.g. switch memory banks), etc...
A pointer can also point to some address outside of your virtual address space. Then dereferencing it is undefined behavior and often gives a segmentation fault.
However, it is often (but not always) considered bad taste to point to local or global data. It is often simpler to code with the convention that pointers should be to heap allocated memory. Read also about smart pointers (notably std::shared_ptr) and reference counting.
(there are some excellent reasons to use pointers to local data on the call stack or to global data, but that should be done with caution and care; I leave you to find out more about them, perhaps by studying existing C++ source code - e.g. of free software)
Read also about memory leaks and dangling pointers. You should avoid both of them. On Linux and POSIX systems, learn about valgrind (a tool to help hunting memory related bugs).
The terminology and concepts of garbage collection are useful to know.
A common style might be (for pointers stored in fields of classes or structures) to allocate memory for them with new in constructors and to release with delete in destructors.

Pointers can point to any object, such that statement "The value to which it points, however, resides in the heap" is at least incomplete, actually wrong. A pointer can point to any object, regardless of whether it resides on a "stack", on a "heap" or on same "fixed" memory, and regardless of whether it is a "global variable" or a "local variable". Note that all these terms are imprecise and not even used in the standard in the context of objects or pointers. The standard just talks about storage duration of objects (automatic, static, dynamic), and about the scope (i.e. visibility) of the variable name (block, function, namespace, global namespace,...).
So the only important thing is that a pointer can point to any object, regardless of its storage duration, but it must not be "used" (i.e. dereferenced) before the lifetime of the object it points to has begun or after the lifetime has ended. Otherwise, the behaviour of your program is undefined.
BTW: regardless of how old your book is, there has not been any C++ standard where the statement "points always to the heap" has been true.

Related

How do Dynamic variables get disposed of?

I have a few questions regarding dynamic variables and pointers so I can understand them better.
Are dynamic variables automatically disposed of when their scope expires? If not, then what happens?
Is the value of a pointer a memory address? If not, then what are they?
It's important to understand that there two complete separate, discreet, independent "things" that you are asking about.
A pointer to an object that was created in dynamic scope (i.e. with the new statement).
And the object itself, that the pointer is pointing to.
It's important for you two separate the two in your mind, and consider them as independent entities, in of themselves. Insofar as the actual pointer itself, its lifetime and scope are no different than any other object's. When it goes out of scope it gets destroyed.
But this has no effect on the object the pointer was pointing to. Only delete destroys the object. If there's some other pointer, that's still in scope, that points to the same object, you can use that pointer for the requisite delete.
Otherwise you end up with a memory leak.
And the value of the pointer is, for all practical purposes, a memory address. This is not specified in any form or fashion in the C++ standard, which defines pointers and objects in dynamic scope in terms of how their behavior is specified. However on all run-of-the-mill operating systems you'll see a memory address in the pointer.
It's also important to understand that since a pointer is an independent object, a pointer doesn't have to point to an object in dynamic scope. There are many pointers that don't. It's not a trivial task to keep track of all objects, all pointers, and to figure out which ones need to be deleted properly. Modern C++ has additional classes and templates that will help you do that, which you'll learn about in due time.

C++ variables and where they are stored in memory (stack, heap, static)

I've just begun learning C++ and I wanted to get my head around the different ways to create variables and what the different keywords mean. I couldn't find any description that really went through it, so I wrote this to try to understand what's going on. Have I missed anything? Am I wrong about anything?
Global Variables
Global variables are stored neither on the heap nor stack.
static global variables are non-exported (standard global variables can be accessed with extern, static globals cannot)
Dynamic Variables
Any variable that is accessed with a pointer is stored on the heap.
Heap variables are allocated with the new keyword, which returns a pointer to the memory address on the heap.
The pointer itself is a standard stack variable.
Variables inside {} that aren't created with new
Stored in the stack, which is limited in size so it should be used only for primitives and small data structures.
static keyword means the variable is essentially global and stored in the same memory space as global variables, but scope is restricted to this function/class.
const keyword means you can't change the variable.
thread_local is like static but each thread gets its own variable.
Register
A variable can be declared as register to hint to the compiler that it should be stored in the register.
The compiler will probaby ignore this and apply it to whatever it thinks would be the best improvement.
Typical usage would be for an index or pointer being used as an interator in a loop.
Good practice
Use const by default when applicable, its faster.
Be wary of static and globals in multithreaded applications, instead use thread_local or mutex
Use register on iterators
Notes
Any variables created inside a function (non-global) that is not static or thread_local and is not created with new, will be on the stack. Stack variables should not exceed more than a few KB in memory, otherwise use new to put it on the heap.
The full available system memory can be used for variables with static keyword, thread_local keyword, created with new, or global.
Variables created with new need to be freed with delete. All others are automatically freed when they're out of scope, except static, thead_local and globals which are freed when the program ends.
Despite all the parroting about how globals should not be used, don't be bullied: they are great for some use cases, and more efficient than variables allocated on the heap. Mutexes will be needed to avoid race conditions in multi-threaded applications.
Mostly right.
Any variable that is accessed with a pointer is stored on the heap.
This isn't true. You can have pointers to stack-based or global variables.
Also it's worth pointing out that global variables are generally unified by the linker (i.e. if two modules have "int i" at global scope, you'll only have one global variable called "i"). Dynamic libraries complicate that slightly; on Windows, DLLs don't have that behaviour (i.e. an "int i" in a Windows DLL will not be the same "int i" as in another DLL in the same process, or as the main executable), while most other platforms dynamic libraries do. There are some additional complications on Darwin (iOS/macOS) which has a hierarchical namespace for symbols; as long as you're linking with the flat_namespace option, what I just said will hold.
Additionally, it's worth talking about initialisation behaviour; global variables are initialised automatically by the runtime (typically either using special linker features or by means of a call that is inserted into the code for your main function). The order of initialisation of globals isn't guaranteed. However, static variables declared at function scope are initialised when that function is first executed, and not at program start-up as you might suppose, and that feature is commonly used by C++ programmers to do lazy initialisation.
(Similar concerns apply to destructors for global objects; those are best avoided entirely IMO, not least because on some platforms there are fast termination features that simply won't call them.)
const keyword means you can't change the variable.
Almost. const affects the type, and there is a difference depending on where you write it exactly. For example
const char *foo;
should be read as foo is a pointer to a const char, i.e. foo itself is not const, but the thing it points at is. Contrast with
char * const foo;
which says that foo is a const pointer to char.
Finally, you've missed out volatile, the point of which is to tell the compiler not to make assumptions about the thing to which it applies (e.g. it can't assume that it's safe to cache a volatile value in a register, or to optimise away accesses, or in general to optimise across any operation that affects a volatile value). Hopefully you'll never need to use volatile; it's most often useful if you're doing really low-level things that frankly a lot of people have no need to go anywhere near.
The other answer is correct, but doesn't mention the use of register.
The compiler will probaby ignore this and apply it to whatever it thinks would be the best improvement.
This is correct. Compilers are so good at choosing variables to put in registers (and typical programmer is bad at that), that C++ committees decided it's completely useless.
This keyword was deprecated in C++11 and removed in C++17 (but it's still reserved for possible future use).
Do not use it.
You need to differentiate between specification and implementation. The specification does not say anything about stack and heap, because that's an implementation detail. They purposely talk about Storage duration.
How this storage duration is achieved depends on the target environment and if the compiler needs to do allocations those at all or if these values can be determined at the compile-time, and are then only part of the machine code (which for sure is also at some part of the memory).
So most of your descriptions would be For the target platform XY it will generally allocate on stack/heap if I do XY
C++ could also be used as an interpreted language e.g. cling that could have completely different ways of handling memory.
It could be cross-compiled to some kind of byte interpreter in which every type is dynamically allocated.
And when it comes to embedded systems the way how memory is managed/handled might be even more different.
Heap variables are allocated with the new keyword, which returns a pointer to the memory address on the heap.
If the default operator new, operator new[] are mapped to something like malloc (or any other equivalent in the given OS) this is likely the case (if the object really needs to be allocated).
But for embedded systems, it might be the case that operator new, operator new[] aren't implemented at all. The "OS" just might provide you a chunk of memory for the application that is handled like stack memory for which you manually reserve a certain amount of memory, and you implement a operator new and operator new[] that works with this preallocated memory, so in such a case you only have stack memory.
Besides that, you can create a custom operator new for certain classes that allocates the memory on some hardware that is different to the "regular" memory provided by the OS.
The std::vector is allocating the memory in the same memory space that new is allocating it, i.e. the heap, or its not? This is important because it changes how I use it.
A std::vector is defined as template<class T, class Allocator = std::allocator<T>> class vector; so there is a default behavior (that is given by the implementation) where the vector allocates memory, for common Desktop OS it uses something like OS call like malloc to dynamically allocate memory. But you could also provide a custom allocator that uses memory at any other addressable memory location (e.g. stack).

Where memory for 'this' pointer allocated

In C++, this pointer get passed to method as a hidden argument which actually points to current object, but where 'this' pointer stored in memory... in stack, heap, data where?
The standard doesn't specify where the this pointer is stored.
When it's passed to a member function in a call of that function, some compilers pass it in a register, and others pass it on the stack. It can also depend on the compiler options.
About the only thing you can be sure of is that this is an rvalue of basic type, so you can't take its address.
It wasn't always that way.
In pre-standard C++ you could assign to this, e.g. in order to indicate constructor failure. This was before exceptions were introduced. The modern standard way of indicating construction failure is to throw an exception, which guarantees an orderly cleanup (if not foiled by the user's code, such as the infamous MFC placement new bug).
The this pointer is allocated on the stack of your class functions (or sometimes a register).
This is however not likely the question you are actually asking.
In C++, this is "simply" a pointer to the current object. It allows you to access object-specific data.
For example, when code in a class has the following snippet:
this->temperature = 40.0f;
it sets the temperature for whatever object is being acted upon (assuming temperature is not a class-level static, shared amongst all objects of the class).
The this pointer itself doesn't have to be allocated (in terms of dynamic memory), it depends entirely on how it's all handled under the covers, something the standard doesn't actually mandate (the standard tends to focus more on behaviour than internals).
There are any number of places it could be: on the stack, at a specific memory location, in a register, and so on. All you have to concern yourself with is its behaviour which is basically how you use it to get access to the object.
What this points to is the object itself and it's usually allocated with new for dynamic allocation, or on the stack.
The this pointer is in the object itself*. Sort of. It is the memory location of the object.
If the object is on the stack, the this pointer is on the stack.
If the object is on the heap, the this pointer is on the heap.
Bottom line, it's nothing you need to worry about.
*[UPDATE] Let me backpedal/clarify/correct my answer. The physical this pointer is not in the object.
I would conclude the this pointer is derived by the compiler and is simply the address of the object which is stored in the symbol table. Semantically, it is inside the object but that was not what the OP was asking.
Here is the memory layout of 3 variables on the stack. The middle one is an object. You can see it holds it's one variable and nothing else:

C++: Global variable as pointer

I am new to c++ and have one question to global variables. I see in many examples that global variables are pointers with addresses of the heap. So the pointers are in the memory for global/static variables and the data behind the addresses is on the heap, right?
Instead of this you can declare global (no-pointer) variables that are stored the data. So the data is stored in the memory for global/static variables and not on the heap.
Has this solution any disadvantages over the first solution with the pointers and the heap?
Edit:
First solution:
//global
Sport *sport;
//somewhere
sport = new Sport;
Second solution:
//global
Sport sport;
A disadvantage of storing your data in a global/static variable is that the size is fixed at compile time and can't be changed as opposed to heap storage where the size can be determined at runtime and grow or shrink repeatedly over the run. The lifetime is also fixed as the complete run of the program from start to finish for global/static variables as opposed to heap storage where it can be acquired and released (even repeatedly) all through the runtime of the program. On the other hand, global and static storage management is all handled for you by the compiler where as heap storage has to be explicitly managed by your code. So in summary, global/static storage is easier but not as flexible as heap storage.
You are right in your hypothesis of where the objects are located. About usage,
It's horses for courses. There is no definite rule, it depends on the design & the type of functionality you want to implement. For example:
One may choose the pointer version to achieve lazy initialization or polymorphic behavior, neither of which is possible with global non pointer object approach.
Right. Declared variables go in the DataSegment. And they sit there for the life of the program. You cannot free them. You cannot reallocate them. In Windows, the DataSegment is a fixed size....if you put everything there you may run out of memory (at least it used to be this way).

Detect dynamically allocated object?

Can I check if an object (passed by pointer or reference) is dynamically allocated?
Example:
T t;
T* pt = new T();
is_tmp(&t); // false
is_tmp(pt); // true
Context
I perfectly realise this smells like bad design, and as a matter of fact it is, but I am trying to extend code I cannot (or should not) modify (of course I blame code that isn't mine ;) ). It calls a method (which I can override) that will delete the passed object among other things that are only applicable to dynamically allocated objects. Now, I want to check whether I have something that is okay to be deleted or if it is a temporary.
I will never pass a global (or static) variable, so I leave this undefined, here.
Not portably. Under Solaris or Linux on a PC (at least 32 bit Linux),
the stack is at the very top of available memory, so you can compare the
address passed in to the address of a local variable: if the address
passed in is higher than that of the local variable, the object it
points to is either a local variable or a temporary, or a part of a
local variable or temporary. This technique, however, invokes undefined
behavior right and left—it just happens to work on the two
platforms I mention (and will probably work on all platforms where the
stack is at the top of available memory and grows down).
FWIW: you can also check for statics on these machines. All statics are
at the bottom of memory, and the linker inserts a symbol end at the
end of them. So declare an external data (of any type) with this name,
and compare the address with it.
With regards to possibly deleting the object, however... just knowing
that the object is not on the heap (nor is a static) is not enough. The
object might be a member of a larger dynamically allocated object.
In general, as DeadMG said, there's no way you can tell from a pointer where it comes from. However, as a debugging or porting or analyzing measure, you could add a member operator new to your class which tracks dynamic allocations (provided nobody uses the explicit global ::new -- that includes containers, I'm afraid). You could then build up a set<T*> of dynamically allocated memory and search in there.
That's not at all suitable for any sort of serious application, but perhaps this can help you track where things are coming from. You can even add debug messages with line numbers to your operator.
No, it's impossible to know. You should fix the bug. In the least case, you can use a smart pointer (like shared_ptr) and give it an empty custom destructor if you don't want it to be deleted.
If you have access to the dynamic memory allocator code itself, you could scan the internal structure and see if the current pointer is in its allocated list/stack/area or however it is being stored. Quite often they are stored as linked list style structs and it wouldn't be too hard to scan for your var's address.
In my opinion it should be possible
because you can check if the memory is on the heap or on the stack
This is going to be highly platform depended code
First you have to get the range of the heap, and then you have to check if the passed memory adress is in this range...
(sounds simple, but the first step is probably tricky :-) )