According to Warford's "Computer Systems" (4th ed), "When you declare a global or local variable, you specify its type. For example, you can specify the type to be an integer, or a character, or an array. Similarly, when you declare a pointer, you must declare that it points to some type. The pointer itself can be global or local. The value to which it points, however, resides in the heap and is neither global nor local."
I know that you can create local variables in the stack, global variables in a fixed location, and dynamically allocated variables in the heap. Can pointers only point to dynamically allocated variables then? Why can't a pointer point to global or local variables?
Thanks
The value to which it points, however, resides in the heap and is neither global nor local.
That is incorrect.
Can pointers only point to dynamically allocated variables then?
No.
Why can't a pointer point to global or local variables?
They sure can.
You might want to read the book more carefully to see whether the statement in the book was made with some additional context. If not, it's time to abandon the book and learn from a different one.
A pointer points to the place in memory where the variable is located; it doesn't matter where that place is. You can make a pointer to absolutely any address, even one outside of your program (although that is dangerous and may crash your app). A pointer is just a number, an address in memory. There are many cases where you point to a global or local variable, for example when you want to pass a class instance or an array to some function.
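For illustration, here is a minimal sketch (the names are made up) of pointers to a global variable, to a local variable, and to an array passed to a function:
#include <iostream>

int globalCounter = 0;              // a global variable

void fill(int *arr, int n) {        // receives a pointer to the caller's array
    for (int i = 0; i < n; ++i)
        arr[i] = i;
}

int main() {
    int local = 42;
    int *pGlobal = &globalCounter;  // pointer to a global variable
    int *pLocal  = &local;          // pointer to a local (stack) variable

    int numbers[5];
    fill(numbers, 5);               // the array decays to a pointer to its first element

    std::cout << *pGlobal << ' ' << *pLocal << ' ' << numbers[4] << '\n';
}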
A pointer can point to local variables or global variables (or fields inside them). But you might prefer references in that case.
A pointer can also point to memory obtained by some ad-hoc mechanism. For instance, the ::operator new on Linux would indirectly call mmap(2) (or similar system calls) which grows the virtual address space of your process. Or in some embedded processors you have special primitives to get some memory (e.g. switch memory banks), etc...
A pointer can also point to some address outside of your virtual address space. Then dereferencing it is undefined behavior and often gives a segmentation fault.
However, it is often (but not always) considered bad taste to point to local or global data. It is often simpler to code with the convention that pointers should be to heap allocated memory. Read also about smart pointers (notably std::shared_ptr) and reference counting.
(there are some excellent reasons to use pointers to local data on the call stack or to global data, but that should be done with caution and care; I leave you to find out more about them, perhaps by studying existing C++ source code - e.g. of free software)
Read also about memory leaks and dangling pointers. You should avoid both of them. On Linux and POSIX systems, learn about valgrind (a tool to help hunting memory related bugs).
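For concreteness, a tiny sketch of both bugs (illustrative only; do not imitate):
// A dangling pointer: the local variable dies when the function returns,
// but the returned address still refers to its old location.
int *dangling() {
    int local = 5;
    return &local;       // dereferencing this in the caller is undefined behavior
}

// A memory leak: the allocation is never freed, and once p goes out of
// scope nothing can reach it any more.
void leak() {
    int *p = new int(5);
    (void)p;
}                        // no delete: the int is leaked

int main() {}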
The terminology and concepts of garbage collection are useful to know.
A common style might be (for pointers stored in fields of classes or structures) to allocate memory for them with new in constructors and to release with delete in destructors.
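A minimal sketch of that style (the Buffer class and its field are made-up names):
#include <cstddef>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]) {}  // acquire in the constructor
    ~Buffer() { delete[] data_; }                                    // release in the destructor

    // Copying is disabled here to avoid double deletion; a complete
    // implementation would follow the rule of three/five.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int *data_;
};

int main() {
    Buffer b(100);   // memory is allocated here...
}                    // ...and released automatically here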
Pointers can point to any object, so the statement "The value to which it points, however, resides in the heap" is at best incomplete and actually wrong. A pointer can point to any object, regardless of whether it resides on a "stack", on a "heap" or in some "fixed" memory, and regardless of whether it is a "global variable" or a "local variable". Note that all these terms are imprecise and not even used in the standard in the context of objects or pointers. The standard just talks about the storage duration of objects (automatic, static, dynamic) and about the scope (i.e. visibility) of the variable name (block, function, namespace, global namespace, ...).
So the only important thing is that a pointer can point to any object, regardless of its storage duration, but it must not be "used" (i.e. dereferenced) before the lifetime of the object it points to has begun or after the lifetime has ended. Otherwise, the behaviour of your program is undefined.
BTW: regardless of how old your book is, there has not been any C++ standard where the statement "points always to the heap" has been true.
I am new to C++ and have a question about global variables. I see in many examples that global variables are pointers holding addresses in the heap. So the pointers are in the memory for global/static variables and the data behind the addresses is on the heap, right?
Instead of this you can declare global (non-pointer) variables that store the data directly. Then the data is kept in the memory for global/static variables and not on the heap.
Does this solution have any disadvantages compared to the first solution with the pointers and the heap?
Edit:
First solution:
//global
Sport *sport;
//somewhere
sport = new Sport;
Second solution:
//global
Sport sport;
A disadvantage of storing your data in a global/static variable is that the size is fixed at compile time and can't be changed, as opposed to heap storage, where the size can be determined at runtime and grow or shrink repeatedly over the run. The lifetime is also fixed as the complete run of the program from start to finish for global/static variables, as opposed to heap storage, where it can be acquired and released (even repeatedly) all through the runtime of the program. On the other hand, global and static storage management is all handled for you by the compiler, whereas heap storage has to be explicitly managed by your code. So in summary, global/static storage is easier but not as flexible as heap storage.
You are right in your hypothesis of where the objects are located. As for usage, it's horses for courses: there is no definite rule, it depends on the design and the type of functionality you want to implement. For example:
One may choose the pointer version to achieve lazy initialization or polymorphic behavior, neither of which is possible with the global non-pointer object approach.
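As a hedged sketch of both motivations, reusing the Sport name from the question plus a made-up Football subclass:
struct Sport {
    virtual ~Sport() = default;
    virtual void play() {}
};

struct Football : Sport {
    void play() override { /* ... */ }
};

// Global pointer: nothing is constructed yet (lazy initialization), and it
// may later point to any class derived from Sport (polymorphism).
Sport *sport = nullptr;

void configure(bool wantFootball) {
    if (wantFootball)
        sport = new Football;   // the concrete type is decided at runtime
    else
        sport = new Sport;
}

int main() {
    configure(true);
    sport->play();   // dispatches to Football::play
    delete sport;
}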
Right. Declared variables go in the data segment, and they sit there for the life of the program. You cannot free them; you cannot reallocate them. In Windows, the data segment is a fixed size... if you put everything there you may run out of memory (at least it used to be this way).
I'm aware that questions about the stack vs. the heap have been asked several times, but I'm confused about one small aspect of choosing how to declare objects in C++.
I understand that the heap--accessed with the "new" operator--is used for dynamic memory allocation. According to an answer to another question on Stack Overflow, "the heap is for storage of data where the lifetime of the storage cannot be determined ahead of time". The stack is faster than the heap, and seems to be used for variables of local scope, i.e., the variables are automatically deleted when the relevant section of code is completed. The stack also has a relatively limited amount of available space.
In my case, I know prior to runtime that I will need an array of pointers to exactly 500 objects of a particular class, and I know I will need to store the pointers and the objects throughout the duration of runtime. The heap doesn't make sense because I know beforehand how long I will need the memory and I know exactly how many objects I will need. The stack also doesn't make sense if it is limited in scope; plus, I don't know if it can actually hold all of my objects/pointers.
What would be the best way to approach this situation and why? Thanks!
Objects allocated on the stack in main() have a lifetime of the entire run of the program, so that's an option. An array of 500 pointers is either 2000 or 4000 bytes depending on whether your pointers are 32 or 64 bits wide -- if you were programming in an environment whose stack limit was that small, you would know it (such environments do exist: for instance, kernel mode stacks are often 8192 bytes or smaller in total) so I wouldn't hesitate to put the array there.
Depending on how big your objects are, it might also be reasonable to put them on the stack -- the typical stack limit in user space nowadays is order of 8 megabytes, which is not so large that you can totally ignore it, but is not peanuts, either.
If they are too big for the stack, I would seriously consider making a global variable that was an array of the objects themselves. The major downside of this is you can't control precisely when they are initialized. If the objects have nontrivial constructors this is very likely to be a problem. An alternative is to allocate storage for the objects as a global variable, initialize them at the appropriate point within main using placement new, and explicitly call their destructors on the way out. This requires care in the presence of exceptions; I'd write a one-off RAII class that encapsulated the job.
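A minimal sketch of that placement-new approach (Thing and the count are made up; a strictly conforming version would also use std::launder and the RAII wrapper suggested above):
#include <new>        // placement new
#include <cstddef>

struct Thing {
    explicit Thing(int v) : value(v) {}
    int value;
};

constexpr std::size_t kCount = 500;

// Raw, correctly aligned storage with static storage duration.
alignas(Thing) unsigned char storage[kCount * sizeof(Thing)];

int main() {
    Thing *things = reinterpret_cast<Thing*>(storage);

    for (std::size_t i = 0; i < kCount; ++i)
        new (things + i) Thing(static_cast<int>(i));   // construct each object in place

    // ... use things[0] .. things[kCount - 1] ...

    for (std::size_t i = kCount; i > 0; --i)
        things[i - 1].~Thing();                        // explicit destructor calls, in reverse order
}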
It is not a matter of stack or heap (which, to be accurate, do not mean what you think in C++: they are just data structures like vector, set or queue). It is a matter of storage duration.
You most likely need here static duration objects, which can be either global, or members of a class. Automatic variables declared inside the main function could also do the job, if you design a way to access them from your other code.
There is some information about the different storage durations of C++ (automatic, static, dynamic) there. The accepted answer however uses the confusing stack/heap terminology, but the explanation is correct.
the heap is for storage of data where the lifetime of the storage cannot be determined ahead of time
While that is correct, it's also incomplete.
The stack unwinds when you exit its scope, so using it for globally scoped variables is unfeasible, like you said. This, however, is where you stop being on the right track: while you know the lifetime (or, more accurately, the scope, since that's the most important factor here), you also know it extends beyond any single stack frame, so given only those two choices, you put it on the heap.
There is a third option, an actual static variable declared at the top scope, but this will only work if your objects have default constructors.
TL;DR: use global (static) storage for either a pointer to the array (dynamic allocation) or just the actual array (static allocation).
Note: your assumption that the stack is somehow "faster" than the heap is wrong; they're both backed by the same RAM, you just access them relative to different registers. Also I'd like to mention yet again how much I dislike the use of the terms stack and heap.
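As a sketch of the two options from the TL;DR above (MyClass stands in for your own type):
#include <array>

struct MyClass { int value = 0; };

// Static allocation: the array itself has static storage duration.
std::array<MyClass, 500> objects;

// Dynamic allocation: only the pointer has static storage duration;
// the objects themselves live in dynamic storage.
MyClass *objectsDyn = nullptr;

int main() {
    objectsDyn = new MyClass[500];
    // ... use objects[i] or objectsDyn[i] ...
    delete[] objectsDyn;
}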
The stack, as you mention, often has size limits. That leaves you with two choices - dynamically allocate your objects, or make them global. The time cost to allocate all of your objects once at application startup is almost certainly not of significant concern. So just pick whichever method you prefer and do it.
I think you are confusing the use of the stack (storage for local variables and parameters) and unscoped data (static class variables and data allocated via new or malloc). One appropriate solution based on your description would be a static class that has your array of pointers declared as a static class member. This would be allocated in a heap-like structure (maybe even the heap, depending on your C++ implementation). A "quick and dirty" solution would be to declare the array as a static variable (basically a global variable); however, it isn't the best approach from a maintainability perspective.
The best concept to go by is "Use the stack when you can and the heap when you must." I don't see why the stack wouldn't be able to hold all of your objects unless they're large or you're working with a limited resources system. Try the stack and if it can't handle it, the time it would take to allocate the memory on the heap can all be done early in the program's execution and wouldn't be a significant problem.
Unless speed is an issue or you can't afford the overhead, you should stick the objects in a std::vector. If copy semantics aren't defined for the objects, you should use a std::vector of std::shared_ptrs.
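A short sketch of both suggestions (MyClass is a placeholder):
#include <memory>
#include <vector>

struct MyClass { int id = 0; };

int main() {
    // Copyable objects: store them directly in the vector.
    std::vector<MyClass> objects(500);

    // Non-copyable (or polymorphic) objects: store shared pointers instead.
    std::vector<std::shared_ptr<MyClass>> ptrs;
    ptrs.reserve(500);
    for (int i = 0; i < 500; ++i)
        ptrs.push_back(std::make_shared<MyClass>());
}   // everything is cleaned up automatically here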
Related to a lot of questions and answers on SO, I've learned that it's better to refer to objects whose lifetime is managed as residing in automatic storage rather than the stack.
Also, dynamically allocated objects shouldn't be referred to as residing on the heap, but in dynamic storage.
I get that there is automatic, dynamic and static storage, but never really understood the difference between automatic-stack and dynamic-heap. Why are the former preferred?
I'm not asking what stack/heap mean or how memory management works. I'm asking why the terms automatic/dynamic storage are preferred over the terms stack/heap.
Automatic tells me something about the lifetime of an object: specifically that it is bound automatically to the enclosing scope, and will be destroyed automatically when that scope exits.
Dynamic tells me that the lifetime of an object is not controlled automatically by the compiler, but is under my direct control.
Stack is an overloaded name for a type of container, and for the related popular instruction pointer protocol supported by common call and ret instructions. It doesn't tell me anything about the lifetime of an object, except through a historical association to object lifetimes in C, due to popular stack frame conventions.
Note also that in some implementations, thread-local storage is on the stack of a thread, but is not limited to the scope of any single function.
Heap is again an overloaded name, indicating either a type of partially ordered container or a free-store management system. This is not the only free store available on all systems, nor does it tell me anything concrete about the lifetime of an object allocated with new.
Most implementations use a stack to back objects with automatic storage. This is not required by the standard, but it works out well on most CPU architectures these days.
Implementations use various strategies to back objects with dynamic storage duration. I'm not sure a heap is the best way to describe what modern memory allocators use, but that appears to be the "historical" term for that.
So automatic/dynamic storage are terms the standard uses to classify ("abstract") object lifetimes. Those are the proper terms to use if you want to talk objects as the standard describes them.
Stacks and heaps are ("concrete") implementation techniques that can be used to back them. Using those terms is less correct, unless you're talking about a specific implementation.
Technically speaking stack/heap allocation are implementation details, while automatic/dynamic storage are the more general terms. The standard itself doesn't mandate that the allocator has to use a stack/heap. Therefore automatic/dynamic is the more proper term, although personally I find that distinction to be a bit overly pedantic.
The automatic/dynamic storage terms are preferable simply because this is what the standard requires. Stack/heap are implementation based and can theoretically be implemented another way.
The terms "static storage duration", "automatic storage duration", and "dynamic storage duration" appear in the C++ standard.
The terms "stack" and "heap" are used to refer to features in the Standard Library (stack<>, make_heap(), push_heap(), etc.) which have little to do with storage duration.
Stack and heap bring implementation-related concepts into the picture, whereas the terms "automatic" and "dynamic" are more general.
I've been using C++ for a short while, and I've been wondering about the new keyword. Simply, should I be using it, or not?
With the new keyword...
MyClass* myClass = new MyClass();
myClass->MyField = "Hello world!";
Without the new keyword...
MyClass myClass;
myClass.MyField = "Hello world!";
From an implementation perspective, they don't seem that different (but I'm sure they are)... However, my primary language is C#, and of course the 1st method is what I'm used to.
The difficulty seems to be that method 1 is harder to use with the std C++ classes.
Which method should I use?
Update 1:
I recently used the new keyword for heap memory (or free store) for a large array which was going out of scope (i.e. being returned from a function). Where before I was using the stack, which caused half of the elements to be corrupt outside of scope, switching to heap usage ensured that the elements were intact. Yay!
Update 2:
A friend of mine recently told me there's a simple rule for using the new keyword; every time you type new, type delete.
Foobar *foobar = new Foobar();
delete foobar; // TODO: Move this to the right place.
This helps to prevent memory leaks, as you always have to put the delete somewhere (i.e. when you cut and paste it to either a destructor or otherwise).
Method 1 (using new)
Allocates memory for the object on the free store (This is frequently the same thing as the heap)
Requires you to explicitly delete your object later. (If you don't delete it, you could create a memory leak)
Memory stays allocated until you delete it. (i.e. you could return an object that you created using new)
The example in the question will leak memory unless the pointer is deleted; and it should always be deleted, regardless of which control path is taken, or if exceptions are thrown.
Method 2 (not using new)
Allocates memory for the object on the stack (where all local variables go). There is generally less memory available on the stack; if you allocate too many objects, you risk a stack overflow.
You won't need to delete it later.
Memory is no longer allocated when it goes out of scope. (i.e. you shouldn't return a pointer to an object on the stack)
As far as which one to use; you choose the method that works best for you, given the above constraints.
Some easy cases:
If you don't want to worry about calling delete, (and the potential to cause memory leaks) you shouldn't use new.
If you'd like to return a pointer to your object from a function, you must use new.
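As a sketch of the second case (Widget and makeWidget are made-up names):
struct Widget { int value = 0; };

// Returning the address of a local variable would dangle, so an object
// that must outlive the function is allocated with new instead.
Widget *makeWidget() {
    return new Widget;   // the caller now owns this and must delete it
}

int main() {
    Widget *w = makeWidget();
    w->value = 42;
    delete w;            // required, or the memory leaks
}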
There is an important difference between the two.
Everything not allocated with new behaves much like value types in C# (and people often say that those objects are allocated on the stack, which is probably the most common/obvious case, but not always true). More precisely, objects allocated without using new have automatic storage duration.
Everything allocated with new is allocated on the heap, and a pointer to it is returned, exactly like reference types in C#.
Anything allocated on the stack has to have a constant size, determined at compile-time (the compiler has to set the stack pointer correctly, or if the object is a member of another class, it has to adjust the size of that other class). That's why arrays in C# are reference types. They have to be, because with reference types, we can decide at runtime how much memory to ask for. And the same applies here. Only arrays with constant size (a size that can be determined at compile-time) can be allocated with automatic storage duration (on the stack). Dynamically sized arrays have to be allocated on the heap, by calling new.
(And that's where any similarity to C# stops)
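To illustrate the array point, a minimal sketch:
#include <cstddef>

void demo(std::size_t n) {           // n is known only at runtime
    int fixed[10];                   // size fixed at compile time: automatic storage works
    fixed[0] = 0;

    int *dynamic = new int[n];       // size decided at runtime: must be allocated with new
    // ... use dynamic[0] .. dynamic[n - 1] ...
    delete[] dynamic;
}

int main() {
    demo(500);
}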
Now, anything allocated on the stack has "automatic" storage duration (before C++11 you could actually declare a variable as auto, but since automatic was already the default whenever no other storage class was specified, the keyword was hardly ever used in practice; that is, however, where the name comes from)
Automatic storage duration means exactly what it sounds like, the duration of the variable is handled automatically. By contrast, anything allocated on the heap has to be manually deleted by you.
Here's an example:
void foo() {
    bar b;
    bar* b2 = new bar();
}
This function creates three values worth considering:
On line 1, it declares a variable b of type bar on the stack (automatic duration).
On line 2, it declares a bar pointer b2 on the stack (automatic duration), and calls new, allocating a bar object on the heap. (dynamic duration)
When the function returns, the following will happen:
First, b2 goes out of scope (order of destruction is always opposite of order of construction). But b2 is just a pointer, so nothing happens, the memory it occupies is simply freed. And importantly, the memory it points to (the bar instance on the heap) is NOT touched. Only the pointer is freed, because only the pointer had automatic duration.
Second, b goes out of scope, so since it has automatic duration, its destructor is called, and the memory is freed.
And the bar instance on the heap? It's probably still there. No one bothered to delete it, so we've leaked memory.
From this example, we can see that anything with automatic duration is guaranteed to have its destructor called when it goes out of scope. That's useful. But anything allocated on the heap lasts as long as we need it to, and can be dynamically sized, as in the case of arrays. That is also useful. We can use that to manage our memory allocations. What if the Foo class allocated some memory on the heap in its constructor, and deleted that memory in its destructor? Then we could get the best of both worlds: safe memory allocations that are guaranteed to be freed again, but without the limitations of forcing everything to be on the stack.
And that is pretty much exactly how most C++ code works.
Look at the standard library's std::vector for example. That is typically allocated on the stack, but can be dynamically sized and resized. And it does this by internally allocating memory on the heap as necessary. The user of the class never sees this, so there's no chance of leaking memory, or forgetting to clean up what you allocated.
This principle is called RAII (Resource Acquisition is Initialization), and it can be extended to any resource that must be acquired and released. (network sockets, files, database connections, synchronization locks). All of them can be acquired in the constructor, and released in the destructor, so you're guaranteed that all resources you acquire will get freed again.
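Here is a minimal RAII sketch for a non-memory resource, a C FILE handle (the class name is made up):
#include <cstdio>
#include <stdexcept>

class File {
public:
    File(const char *path, const char *mode) : handle_(std::fopen(path, mode)) {
        if (!handle_) throw std::runtime_error("could not open file");  // acquire in the constructor
    }
    ~File() { std::fclose(handle_); }                                   // release in the destructor

    File(const File&) = delete;             // exactly one owner of the handle
    File& operator=(const File&) = delete;

    std::FILE *get() const { return handle_; }

private:
    std::FILE *handle_;
};

int main() {
    try {
        File f("example.txt", "r");
        // ... read from f.get() ...
    } catch (const std::exception&) {
        // the file could not be opened
    }
}   // the destructor closes the file on every path, including when exceptions are thrown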
As a general rule, never use new/delete directly from your high level code. Always wrap it in a class that can manage the memory for you, and which will ensure it gets freed again. (Yes, there may be exceptions to this rule. In particular, smart pointers require you to call new directly, and pass the pointer to its constructor, which then takes over and ensures delete is called correctly. But this is still a very important rule of thumb)
The short answer is: if you're a beginner in C++, you should never be using new or delete yourself.
Instead, you should use smart pointers such as std::unique_ptr and std::make_unique (or less often, std::shared_ptr and std::make_shared). That way, you don't have to worry nearly as much about memory leaks. And even if you're more advanced, best practice would usually be to encapsulate the custom way you're using new and delete into a small class (such as a custom smart pointer) that is dedicated just to object lifecycle issues.
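A minimal sketch of that style, reusing MyClass from the question (std::make_unique needs C++14):
#include <memory>
#include <string>

struct MyClass {
    std::string MyField;
};

int main() {
    // Instead of: MyClass* myClass = new MyClass(); ... delete myClass;
    auto myClass = std::make_unique<MyClass>();
    myClass->MyField = "Hello world!";

    // Shared ownership, for when several owners genuinely need the same object:
    auto shared = std::make_shared<MyClass>();
    auto alias  = shared;                        // the reference count is now 2
}   // both objects are deleted automatically here; no delete, no leak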
Of course, behind the scenes, these smart pointers are still performing dynamic allocation and deallocation, so code using them would still have the associated runtime overhead. Other answers here have covered these issues, and how to make design decisions on when to use smart pointers versus just creating objects on the stack or incorporating them as direct members of an object, well enough that I won't repeat them. But my executive summary would be: don't use smart pointers or dynamic allocation until something forces you to.
Which method should I use?
This is almost never determined by your typing preferences but by the context. If you need to keep the object alive across a few stack frames, or if it's too heavy for the stack, you allocate it on the free store. Also, since you are allocating an object, you are also responsible for releasing the memory. Look up the delete operator.
To ease the burden of free-store management, people have invented things like auto_ptr and unique_ptr. I strongly recommend you take a look at these. They might even be of help to your typing issues ;-)
If you are writing in C++ you are probably writing for performance. Using new and the free store is much slower than using the stack (especially when using threads) so only use it when you need it.
As others have said, you need new when your object needs to live outside the function or object scope, the object is really large or when you don't know the size of an array at compile time.
Also, try to avoid ever using delete. Wrap your new into a smart pointer instead. Let the smart pointer call delete for you.
There are some cases where a smart pointer isn't smart. Never store std::auto_ptr<> inside a STL container. It will delete the pointer too soon because of copy operations inside the container. Another case is when you have a really large STL container of pointers to objects. boost::shared_ptr<> will have a ton of speed overhead as it bumps the reference counts up and down. The better way to go in that case is to put the STL container into another object and give that object a destructor that will call delete on every pointer in the container.
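A hedged sketch of that last suggestion (the owner class is made up; in modern C++ a vector of std::unique_ptr is usually the simpler choice):
#include <vector>

struct Item { int id; };

// Owns every raw pointer in its container and deletes each one exactly once.
class ItemStore {
public:
    ItemStore() = default;
    ~ItemStore() {
        for (Item *p : items_)
            delete p;
    }

    ItemStore(const ItemStore&) = delete;            // copying would double-delete
    ItemStore& operator=(const ItemStore&) = delete;

    void add(Item *p) { items_.push_back(p); }

private:
    std::vector<Item*> items_;
};

int main() {
    ItemStore store;
    for (int i = 0; i < 3; ++i)
        store.add(new Item{i});
}   // ItemStore's destructor deletes all three Items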
Without the new keyword you're storing that on the call stack. Storing excessively large variables on the stack will lead to stack overflow.
If your variable is used only within the context of a single function, you're better off using a stack variable, i.e., Option 2. As others have said, you do not have to manage the lifetime of stack variables - they are constructed and destructed automatically. Also, allocating/deallocating a variable on the heap is slow by comparison. If your function is called often enough, you'll see a tremendous performance improvement if you use stack variables instead of heap variables.
That said, there are a couple of obvious instances where stack variables are insufficient.
If the stack variable has a large memory footprint, then you run the risk of overflowing the stack. By default, the stack size of each thread is 1 MB on Windows. It is unlikely that you'll create a stack variable that is 1 MB in size, but you have to keep in mind that stack utilization is cumulative. If your function calls a function which calls another function which calls another function which..., the stack variables in all of these functions take up space on the same stack. Recursive functions can run into this problem quickly, depending on how deep the recursion is. If this is a problem, you can increase the size of the stack (not recommended) or allocate the variable on the heap using the new operator (recommended).
The other, more likely condition is that your variable needs to "live" beyond the scope of your function. In this case, you'd allocate the variable on the heap so that it can be reached outside the scope of any given function.
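A small sketch of the first case, with illustrative sizes:
#include <vector>

void process() {
    // char buffer[8 * 1024 * 1024];           // an ~8 MB local would likely overflow the stack
    std::vector<char> buffer(8 * 1024 * 1024); // the same amount of storage, but on the heap
    buffer[0] = 'x';
    // ... use buffer.data() ...
}

int main() {
    process();
}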
The simple answer is yes - new() creates an object on the heap (with the unfortunate side effect that you have to manage its lifetime by explicitly calling delete on it), whereas the second form creates an object on the stack in the current scope, and that object will be destroyed when it goes out of scope.
Are you passing myClass out of a function, or expecting it to exist outside that function? As some others said, it is all about scope when you aren't allocating on the heap. When you leave the function, it goes away (eventually). One of the classic mistakes made by beginners is the attempt to create a local object of some class in a function and return a pointer or reference to it without allocating it on the heap. I can remember debugging this kind of thing back in my earlier days doing C++.
C++ Core Guidelines R.11: Avoid using new and delete explicitly.
Things have changed significantly since most answers to this question were written. Specifically, C++ has evolved as a language, and the standard library is now richer. Why does this matter? Because of a combination of two factors:
Using new and delete is potentially dangerous: memory might leak if you don't keep a very strong discipline of delete'ing everything you've allocated when it's no longer used, and of never delete'ing what's not currently allocated.
The standard library now offers smart pointers which encapsulate the new and delete calls, so that you don't have to take care of managing allocations on the free store/heap yourself. So do other containers, in the standard library and elsewhere.
This has evolved into one of the C++ community's "core guidelines" for writing better C++ code, as the linked document shows. Of course, there are exceptions to this rule: somebody needs to write those encapsulating classes which do use new and delete; but that somebody is rarely yourself.
Adding to #DanielSchepler's valid answer:
The second method creates the instance on the stack, along with such things as variables declared int and the list of parameters that are passed into the function.
The first method makes room for a pointer on the stack, which you've set to the location in memory where a new MyClass has been allocated on the heap - or free store.
The first method also requires that you delete what you create with new, whereas in the second method, the class is automatically destructed and freed when it falls out of scope (the next closing brace, usually).
The short answer is yes, the "new" keyword is incredibly important: when you use it, the object data is stored on the heap as opposed to the stack, which is what matters most!