What is the best way to put large objects on the heap? - c++

I am working on a project that needs to load many objects from a data file and store them in memory. Since I have been told that stack space is scarce and larger amounts of data should go on the heap, I put everything on the heap. However, my impression is that I overdid it a bit.
My current design looks like this:
class RoadMap
{
    unique_ptr<set<unique_ptr<Node>>> allNodes;

    void addNode(unique_ptr<Node> node)
    {
        this->allNodes->insert(std::move(node));
    }
};
int main()
{
    unique_ptr<RoadMap> map(new RoadMap());
    // open file etc.
    for (auto nodeData : nodesInFile)
    {
        map->addNode(unique_ptr<Node>(new Node(nodeData)));
    }
}
From what I understand by now, this creates a lot of overhead, because many unique pointers are involved that I think I do not need. If I understand correctly, it should be sufficient to have only one unique pointer barrier in the "pointer chain". However, I am unsure what the best practice for this is.
Option 1
class RoadMap
{
    unique_ptr<set<Node>> allNodes;

    void addNode(Node node)
    {
        this->allNodes->insert(node);
    }
};
int main()
{
    RoadMap map;
    // open file etc.
    for (auto nodeData : nodesInFile)
    {
        map.addNode(Node(nodeData));
    }
}
The advantage of this seems to be that the RoadMap class itself is the only one that needs to take care of heap allocation, and it does so only once, when creating the set.
Option 2
class RoadMap
{
    set<Node> allNodes;

    void addNode(Node node)
    {
        this->allNodes.insert(node);
    }
};
int main()
{
    unique_ptr<RoadMap> map(new RoadMap());
    // open file etc.
    for (auto nodeData : nodesInFile)
    {
        map->addNode(Node(nodeData));
    }
}
Here the unique pointer is only in the main function, meaning that users of the RoadMap class need to know that this object can become quite large and should not be put on the stack. I don't think that is a particularly nice solution.
Option 3
class RoadMap
{
    set<unique_ptr<Node>> allNodes;

    void addNode(unique_ptr<Node> node)
    {
        this->allNodes.insert(std::move(node));
    }
};
int main()
{
    RoadMap map;
    // open file etc.
    for (auto nodeData : nodesInFile)
    {
        map.addNode(unique_ptr<Node>(new Node(nodeData)));
    }
}
This solution uses many unique pointers, which means that when deleting the RoadMap many destructors and deletes need to be called. Also, the RoadMap caller has to supply a unique_ptr when adding a node, meaning the caller has to do the heap allocation himself.
Right now, I am favouring option 1 over the others. However, I have only been coding C++ for a comparatively short time and am unsure whether I fully understand the concepts behind memory management, which is why I would like you to (in)validate my opinion. Am I correct in assuming that option 1 is the best way to do this? Do you have any additional references to best practices for this sort of thing?

Give Node a move constructor and move assignment operator (to make operations on the set cheap), then use a mix of options 1 and 2. std::set already allocates its contents on the heap, so you don't need to worry about allocating the RoadMap on the heap as well. Note the extra std::move inside addNode to allow Nodes to be moved into the set rather than copied. A sketch of a move-enabled Node follows the code.
class RoadMap
{
    set<Node> allNodes;

    void addNode(Node node)
    {
        allNodes.emplace(std::move(node));
    }
};

int main()
{
    RoadMap map;
    // open file etc.
    for (const auto& nodeData : nodesInFile)
    {
        map.addNode(Node(nodeData));
    }
}
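For illustration, a move-enabled Node might look like the following. This is a minimal sketch; the name member and its type are assumptions for the example, not taken from the question.

#include <string>
#include <utility>

class Node
{
public:
    explicit Node(std::string name) : name(std::move(name)) {}

    // Moving transfers the string's heap buffer instead of copying it,
    // so inserting a Node into the set stays cheap.
    Node(Node&& other) noexcept = default;
    Node& operator=(Node&& other) noexcept = default;

    // std::set needs a strict weak ordering.
    bool operator<(const Node& other) const { return name < other.name; }

private:
    std::string name;
};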

Each option is quite different from the others.
I would suggest option 2 for simplicity, but it might be more expensive in some operations such as sorting, because you would be moving the entire Node and not just a pointer to it.
I assume that is not a problem, since you are using a set. You can still optimize this by using move semantics on your Node object. Without this, you still pay one copy per add.
The issue I mention above might have been a problem with a vector. Another issue you would have with storing the objects directly is the lack of polymorphism: you can't store subtypes of Node, they would get sliced (see the sketch at the end of this answer).
If this is an issue I would suggest option 3: storing pointers means that moving them is faster, and polymorphism works.
I see no reason for Option 1 or your original solution.
P.S. The this-> in your code is unnecessary.
P.P.S. As DyP points out, set uses the heap anyway, which is what makes option 2 good. The clue: stack-based structures cannot grow; only std::array is, I believe, stored entirely on the stack.
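To illustrate the slicing issue, here is a small sketch; the types are made up for the example and are not from the question.

#include <set>

struct Node {
    int id = 0;
    bool operator<(const Node& other) const { return id < other.id; }
};

struct CityNode : Node {
    int population = 0;
};

int main()
{
    std::set<Node> nodes;   // stores Nodes by value
    CityNode city;
    city.population = 100000;
    nodes.insert(city);     // compiles, but only the Node part is stored:
                            // 'population' is sliced away
}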

Let me talk a little about the meta-problem: you don't want the stack to overflow, and hence you put your data structures on the heap. That's the right thing to do. But the important thing to understand here is when things actually end up on the heap.
Every local variable is allocated on the stack. If you have data structures of dynamic size, they refer to the heap in almost all cases. (The only exception I know of is when you reserve memory on the stack on purpose, with alloca() or something like it.) In particular, all STL containers keep their element storage on the heap, and only a small, fixed amount of stack memory is used for the container object itself (except std::array, whose size is known at compile time).
Hence, wrapping dynamically sized data structures in unique_ptrs has very little effect if you want to save stack memory, but it adds indirection to your program, which complicates your code, slows down execution, and increases heap memory usage unnecessarily.
Here's an example: on Visual Studio 2010, with 32-bit compilation, a std::set uses 20 bytes of stack memory, independent of the template type parameter and of the actual number of elements contained in the set. The memory for the set's elements is on the heap.
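You can verify this on your own implementation with a small sketch like the one below; the exact numbers vary by compiler and standard library, but they stay small and fixed.

#include <iostream>
#include <set>

int main()
{
    // The container object itself has a small, fixed footprint;
    // its elements live on the heap no matter how many there are.
    std::cout << sizeof(std::set<int>) << '\n';
    std::cout << sizeof(std::set<double>) << '\n';
}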
I believe you can now make your own decision on whether to use unique_ptrs for the purpose you intend.

Basically it also depends on how you want to access the stored Node instances inside your RoadMap instance. I assume your Node instance will release the wrapped node data.
I would go for an adjusted version of option 2.

Related

Intuition on C++ situations where an unknown number of objects of a custom class will be needed at runtime

Everything below has to do with situations where a developer writes a custom C++ class (I have in mind something like OnlyKnowDemandAtRuntime below) and there is no way of knowing how many instances/objects "the user" will need during runtime.
Question 1: As a sanity check, is it fair to say that in Case One below, RAII is being used to manage the "dynamic" usage of OnlyKnowDemandAtRuntime?
Question 2: Is it correct to say that in Case Two, RAII isn't the first option that comes to mind simply because it is inconvenient (possibly a major understatement) to hack up a way to wrap the nodes of a tree in an STL container? And that it is therefore simpler to just use new and destructor/delete (or smart pointers), rather than scramble for a way to have the standard library manage memory for us? Note: nothing here is a question about whether trees are often used in day-to-day work; rather, everything in this post is about the intuition behind the decisions one must make when using C++ to create objects at runtime. Note: of course smart pointers are themselves part of the library, and of course they handle memory for us just as the library containers do, but for the purposes of this question I'm putting smart pointers and new on the same footing, because my question is about the limits on the ability of the STL containers to have more and more instances of something like OnlyKnowDemandAtRuntime inserted into them at runtime, while also handling the relationships between those instances (without lots of extra logic to keep track of where things are in the container).
Question 3: If 1 and 2 are reasonable enough, then would a fair summary be this: when a developer writes a custom class but doesn't know how many objects of it will be needed during runtime, either...
Wrap the objects in an STL container when the structure between said objects is "trackable" with the STL container being used (or perhaps trackable with the STL container being used plus some reasonably simple extra logic), or
Explicitly use the heap to build the objects with new and destructor/delete, or smart pointers, and manually build the structure "between" said objects (as in left_ and right_ of Case Two below).
Quick reminder: this isn't about whether we need to build trees in day-to-day work with C++. Also (and I suppose this is already clear to anyone who would answer this question), this is not about "use the heap when an object is too big for the stack or when an object needs a lifetime beyond the scope in which it was created".
Case One:
// This is the class "for" which an unknown number of objects will be created during runtime
class OnlyKnowDemandAtRuntime {
public:
    OnlyKnowDemandAtRuntime(int num) : number_(num) {}
private:
    int number_;
};
// This is the class where an a priori unknown number of `OnlyKnowDemandAtRuntime` objects are created at runtime.
class SomeOtherClass {
public:
    void NeedAnotherOnlyKnownAtRuntime(int num) {
        v_only_know_demand_at_runtime_.emplace_back(num);
    }
private:
    std::vector<OnlyKnowDemandAtRuntime> v_only_know_demand_at_runtime_;
};
Case Two:
// This is the class "for" which an unknown number of objects will be created during runtime
class Node {
public:
    Node(int value) : value_(value), left_(nullptr), right_(nullptr) {}
private:
    int value_;
    Node *left_;
    Node *right_;
    friend class Tree;
};
// This is the class where an a priori unknown number of `Node` objects are created at runtime.
class Tree {
public:
    ~Tree() { /* Traverse the tree and delete every Node* */ }
    void Insert(int value) {
        Node *new_node = new Node(value);
        ThisMethodPlacesNewNodeInTheAppropriateLeafPosition(new_node);
    }
private:
    Node *root;
};
This doesn't answer your literal questions, but you might find it useful.
Smart pointers like std::unique_ptr are the most basic RAII classes.
Using RAII is the only reasonably sane way to ensure exception safety.
In your particular example, I'd use std::unique_ptr<Node> specifically (see the sketch at the end of this answer). With an arbitrary graph that'd be more complicated, of course.
Also,
makes a custom C++ class but doesn't know how many objects of it will be needed during runtime.
That’s highly unspecific. It is important that you have a container (be it SomeOtherClass or Tree or whatever) that manages these objects. Otherwise, things may become really really complicated.

How to use new and delete in this scenario

So I have a class aCollection that has as its members a binary search tree and a hash table (for organizing data by different parameters). The way I designed the program, aCollection has an add(aVendor& vendor) function that takes a dummy vendor object created in main and produces a pointer (using new) to a vendor object, which is then passed to the bst's and hash table's add functions.
Within the bst and hash table, new is used to create a node that contains a pointer to a vendor object and the requisite linking pointers (next, left, right, etc.).
In summary: a dummy data object goes to aCollection::add(aVendor& vendor), and a pointer to an aVendor object (with the data inside it) is sent to the bst and hash table, which then store that pointer in their own node objects allocated with new.
My question is: how should I use delete to properly release the memory? The bst and hash table share the pointers to the aVendor objects that were passed to them, and they each have their own nodes to delete. I know I need to call delete in the bst's and hash table's remove functions (to delete their respective nodes), but how do I ensure that the aVendor created in aCollection is deleted once and only once?
P.S. Is calling new in aCollection even necessary? I figure the pointer needs to stay allocated so the data always exists.
The code is a bit verbose so I made a quick illustration of what is going on.
----Solution----
Thanks to Ped7g's excellent explanation I figured out that, since aCollection should be the one deleting the pointers, I needed to keep track of the pointers to be deleted. Going by his/her suggestion, I decided to use a std::list to record every pointer added to the program, and I wrote a while loop in the destructor that iterates through those pointers and deletes them, preventing memory leaks stemming from aCollection. Here is the code I wrote to do so:
//Constructor
aCollection::aCollection()
{
    //allocates one instance of a hash table and bst
    hashTable = new hashFunctions;
    bst = new aBst;

    //Creates a list to track ptrs
    trackList = std::list<aVendor*>();
    return;
}

//Destructor
aCollection::~aCollection()
{
    //Destroys hashTable and bst
    delete hashTable;
    delete bst;
    //Deletes vendor pointer objects
    while(!trackList.empty())
    {
        delete trackList.front();
        trackList.pop_front();
    }
    return;
}
In the add function I used this line of code to add the pointers to the list
trackList.push_front(vendorPtr);
And finally this is how I declared the list as part of aCollection.h
list<aVendor*> trackList;
Try to follow the "everything belongs somewhere" principle.
Only the owner of the information handles new/delete (better said, "avoids new/delete as much as possible"); other subjects don't care. If they receive a pointer, they assume it's live through the whole time of their processing, because that is the owner's responsibility: if it gave the pointer away, it should be aware how long the pointer will be used outside, and adjust its new/delete strategy to fulfil those needs. So they don't delete it.
In your case it depends a lot on whether you remove vendors from the structure often, or only ever add them. If you remove vendors only very rarely, and a huge performance penalty is acceptable when you do, you can have std::vector<aVendor> vendors; (1) in aCollection, giving the bst and hash nodes only an iterator (or pointer/index) to the vendor. When a vendor is removed from the vector, every bst/hash node has to be updated with a fresh iterator (pointer/index), hence the performance penalty on removal. Actually, inserting vendors invalidates iterators and pointers too; only indices would survive, so use indices, which also makes it quite clear how bst/hash nodes relate to deleting a vendor (you don't delete an index, that makes no sense).
If you remove vendors often, a std::list is a better choice, as inserting/removing does not invalidate iterators, so all the copies of iterators held in bst/hash nodes remain valid.
Overall it looks like you wrote your own implementation of the std::* containers... any particular reason for that? It's a good learning exercise, and in very rare cases a performance decision, but in such a case you would end up with a completely different design, because what you have is as horrible as std::list, performance-wise. Otherwise, in production code it's usually much more efficient to stick with the standard containers and design the implementation around them, as they have reasonably OK-ish performance and you don't have to implement them, only use them.
(1) This avoids new/delete completely (see the sketch below). If that way is not practical for you (e.g. the aVendor default constructor is costly), use std::vector<aVendor *> vendors; with manual new/delete upon insertion/removal of a vendor.
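A minimal sketch of the index-based variant; the member names here are placeholders, not your actual classes.

#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct aVendor {
    std::string name;  // stand-in for the real vendor data
};

class aCollection {
public:
    // The returned index is what the bst/hash nodes store; there is
    // nothing for them to new or delete.
    std::size_t add(aVendor vendor) {
        vendors.push_back(std::move(vendor));
        return vendors.size() - 1;
    }
    const aVendor& get(std::size_t index) const { return vendors[index]; }
private:
    std::vector<aVendor> vendors;  // single owner of all vendor data
};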
Edit:
"how do I ensure that aVendor that is created in aCollection is deleted once and only once?"
Well, you delete it just once, in aCollection.
It's not clear from the question what your actual problem is. (Maybe you are struggling to detect when a vendor has been removed from all nodes, and you want to release it from aCollection then too? That's a completely different question, and it would require much more architectural insight into the app's algorithms to see whether there's a good place to detect a "dangling" vendor no longer used by any node, triggering the delete in aCollection.)
Edit: an example of where new/delete belong:
#include <iostream>
#include <list>
#include <string>

class Vendor {
private:
    std::string name;
public:
    Vendor(const std::string & name) : name(name) {}
    const std::string & getName() const { return name; }
};

// A bit pointless example of how to handle naked new/delete.
class VendorList {
private:
    std::list<Vendor *> vendors;
    // usually with neatly designed classes the std::list<Vendor>
    // would suffice, saving all the hassle with new/delete
    // (that's why this example is a bit pointless)
    // Also storing iterators to the internal list outside
    // of the VendorList class feels quite wrong, that's a code smell.
public:
    ~VendorList() {
        std::cout << "~VendorList() destructor called.\n";
        // release any remaining vendors from heap
        auto vendorIterator = vendors.begin();
        while (vendorIterator != vendors.end()) {
            auto toRemove = vendorIterator++;
            removeVendor(toRemove);
        }
        // release the (now invalid) pointers
        vendors.clear();
        // at this point, whoever still holds an iterator
        // to a vendor has a problem, it's invalid now.
    }

    // stores vendor into collection of vendors
    // + data of vendor are allocated on heap by "new"
    // returns iterator pointing to the newly added vendor
    // (not the best choice for a public API)
    std::list<Vendor *>::iterator addVendor(const Vendor & vendor) {
        Vendor * copyOnHeap = new Vendor(vendor);
        std::cout << "VendorList: adding vendor: "
                  << copyOnHeap->getName() << std::endl;
        return vendors.insert(vendors.end(), copyOnHeap);
    }

    // removes a particular vendor from the list
    // to be used after the rest of the application does not hold any iterator
    void removeVendor(std::list<Vendor *>::iterator vendor_iterator) {
        std::cout << "VendorList: releasing specific vendor: "
                  << (*vendor_iterator)->getName() << std::endl;
        // release the heap memory containing the vendor's data
        delete *vendor_iterator;
        // remove the released pointer from the list
        vendors.erase(vendor_iterator);
        // at this point, whoever still holds an iterator
        // to that vendor has a problem, it's invalid now.
    }

    const std::list<Vendor *> & get() const {
        return vendors;
    }
};

int main()
{
    VendorList vlist;
    vlist.addVendor(Vendor("v1"));
    auto v2iterator = vlist.addVendor(Vendor("v2"));
    vlist.removeVendor(v2iterator);
    for (auto vendorPtr : vlist.get()) {
        std::cout << "Vendor in list: " << vendorPtr->getName() << std::endl;
    }
}
I'm not really happy with this example, as it feels wrong on many levels (like bad OOP design), but to make the API fit your purpose, the purpose would have to be known first.
So take this only as an example of where new and delete belong: in one place, in VendorList, which is the owner responsible for managing the vendors. Nobody else in the app should use new/delete on Vendor; everyone else should call VendorList's add/remove functions and let the list manage the implementation details (like where vendors are stored, and how many new/deletes are used per vendor).
Usually, by designing your data classes lean and avoiding naked pointers, you can avoid new/delete completely in C++ code. You can try to turn this example into the std::list<Vendor> variant; the code will be simplified a lot (an empty destructor, as the default releases the list; removal/insert called only on the list; etc.). A sketch of that variant is at the end of this answer.
You then handle the life cycle of data in memory by scope. For example, in main you have VendorList vlist;, as that instance of the vendor list is used through the whole life cycle of the application. Or, if you need it only during invoice processing, you can declare it inside processInvoices(), again as a local variable, init it, and then forget about it; it will be released when execution goes out of the scope of processInvoices().
That will release all the initialised Vendors too, as they belong to the VendorList. Etc.
So as long as you manage to design your classes with clear "belongs to" and "is responsible for" relations, you can get very far using only local and member variables, without new/delete/shared_ptr/etc. at all. The source will look almost like Java with GC, just shorter and faster, and you implicitly know when the release of particular data happens (when it goes out of scope).
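For comparison, here is a sketch of the std::list<Vendor> variant suggested above; the hand-written destructor and every new/delete call are gone.

#include <list>
#include <string>
#include <utility>

class Vendor {
public:
    explicit Vendor(std::string name) : name(std::move(name)) {}
    const std::string & getName() const { return name; }
private:
    std::string name;
};

class VendorList {
public:
    // Vendors are stored by value; std::list owns their memory.
    std::list<Vendor>::iterator addVendor(Vendor vendor) {
        return vendors.insert(vendors.end(), std::move(vendor));
    }
    void removeVendor(std::list<Vendor>::iterator it) {
        vendors.erase(it);  // the element is destroyed here, no delete needed
    }
    const std::list<Vendor> & get() const { return vendors; }
private:
    std::list<Vendor> vendors;  // the default destructor releases everything
};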

C++ Linked List remove all

So this is a bit of a conceptual question. I'm writing a linked list in C++, and as Java is my first language, I started writing my removeAll function so that it just joins the head and the tail nodes (I'm using sentinel nodes, btw). But I instantly realized that this won't work in C++, because I have to free the memory for the nodes!
Is there some way around iterating through the entire list, deleting every element manually?
You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
    // blah blah
    std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
If this is a doubly-linked list, however, you should not use unique_ptrs in both directions: that would make each node own the next one and be owned by the next one! You should make this ownership relation exist in only one direction; in the other, use a regular non-owning pointer: node* previous;
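For instance, a doubly-linked node might look like this (the payload member is made up for illustration):

#include <memory>

struct node {
    int value = 0;               // payload, just for illustration
    std::unique_ptr<node> next;  // owning: destroying a node frees the rest
    node* previous = nullptr;    // non-owning back pointer
};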
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, for example by checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
struct delete_if_not_sentinel {
    void operator()(node* ptr) const {
        if(!ptr->is_sentinel) delete ptr;
    }
};

typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;

struct node {
    // blah blah
    node_handle next;
};
This stops the chain reaction at the sentinel.
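A usage sketch under those assumptions (a boolean is_sentinel flag identifying the sentinel):

#include <memory>

struct node;

struct delete_if_not_sentinel {
    void operator()(node* ptr) const;
};

typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;

struct node {
    bool is_sentinel = false;
    node_handle next;
};

void delete_if_not_sentinel::operator()(node* ptr) const {
    if (!ptr->is_sentinel) delete ptr;
}

int main()
{
    node sentinel;               // not heap-allocated, must never be deleted
    sentinel.is_sentinel = true;

    node_handle head(new node);  // a one-element list: head -> sentinel
    head->next = node_handle(&sentinel);

    head.reset();                // chain reaction; the deleter skips the sentinel
}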
You could do it like Java if you used a C++ garbage collector. Not many people do. In any case, it saves you at most a constant factor in running time, as you pay the cost of allocating each element of the list anyway.
Yes. Well, sort of... If you implement your list to use a memory pool, then the pool is responsible for all data within it, and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
limitations on how your objects are created and destroyed;
limitations on what kind of data you can store;
extra memory requirements on each node (to reference the pool);
a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally, when I've needed fast memory management it has been for memory that is populated once, with no need to maintain free lists and the like. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want a magic bullet that works for all situations, you're probably out of luck.

Why use non-pointer fields in C++?

These days I'm starting to get my feet wet with C++ and, due to my Java-ish background, I obviously have some problems understanding some C++ features.
Since Java offers only references and primitives, one of the most mysterious C++ features for me is non-pointer (and non-primitive) fields.
Here is an example of what I mean.
If I were to write a C++ implementation of a list of objects of type X, I would write something like:
class XList {
private:
    struct node {
        X* data;
        node* next;
    };
    node* first;
public:
    /*
    a lot of methods
    */
};
This code is probably awful, and I know about templates, the STL and whatnot, but the problem for me here is just the field "data". If I declare "data" as an X pointer, I presume I can use it in a way very similar to Java references.
What could instead be the reason to declare data as an X (X data;)? What is the difference? I know the difference between allocating on the stack and on the heap, but is there any connection here?
Please help me get a bit more of a grip on this topic.
Thank you.
--- UPDATE: ----
Most of the answers seem to focus on the difference between using the plain type and a pointer in general.
Probably I wrote the question the wrong way, but I already know the difference between allocating on the stack and on the heap (the basics at least).
What I can't understand is that, in my (probably wrong) opinion, the usage of a plain type in a member variable (not field, thank you for the correction) should be just some kind of corner case. Especially when templates are involved, a copy of the data makes no sense to me.
Yet every time I see an implementation of some data structure, the plain type is used.
E.g., if you search "bst c++ template" on Google you will find a lot of implementations like this one:
template<class T>
class BinarySearchTree
{
private:
    struct tree_node
    {
        tree_node* left;
        tree_node* right;
        T data;
    };
    tree_node* root;
public:
    /*
    methods, methods and methods
    */
};
Do you really want to make a copy of every piece of data of type T inserted into this tree, without knowing its size? Since I'm new to the language, I suppose I have misunderstood something.
The advantage of using an X instead of an X* is that with the pointer you also have to allocate space for the X itself, which uses more memory (4 or 8 bytes for the pointer, plus the overhead of the allocation of the X via new), whereas with the plain type you avoid that overhead. So it is simpler just to use the plain X.
You'd use the pointer when you definitely do not want to make a copy of the X value, though you can end up with dangling pointers if you are not careful. You'd also use the pointer when there are circumstances in which you might not have an object to point to.
Summary
Use the direct object to simplify memory management.
Use the pointer when you cannot afford copying or need to represent the absence of a value.
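A small sketch of what each choice means for the containing object; the type names are made up for illustration.

#include <memory>

struct X { int payload[16]; };

struct ByValue {
    X data;                   // the X lives inside ByValue itself
};

struct ByPointer {
    std::unique_ptr<X> data;  // only a pointer here; the X goes on the heap
};

// sizeof(ByValue) is about sizeof(X), with no extra allocation;
// sizeof(ByPointer) is one pointer, but each X costs a separate allocation.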
The difference is that a pointer points to a hunk of memory allocated (or determined) elsewhere. When you leave off the *, you're saying "allocate space for this entire class (not just a pointer to a class) along with this class."
The former, using a pointer, puts all the memory allocation and maintenance in your hands. The latter just gives you the object as a member of your class.
The former doesn't use much space (what, four to eight bytes, depending on the architecture?). The latter can use anywhere from a little up to a LOT, depending on what the class X has as its members.
I use non-pointer data members when: 1) I'm sure the data shouldn't and won't be shared among objects, and 2) I can rely on automatic deallocation when the object is finally destroyed. A good example is a wrapper (of something to be wrapped):
class Wrapper
{
private:
    Wrapped _wrapped;
public:
    Wrapper(args) : _wrapped(args) { }
};
The primary difference is that if you declare an X* you're responsible for memory management (new and delete) on the heap, while with a plain X the memory is handled for you and is allocated/freed together with the enclosing scope or object.
There are other subtle things as well, like worrying about assignment to self, etc.

Theory on C++ convention regarding cleanup of the heap, a suggested build, is it good practice?

I have another theory question; as the title suggests, it's to evaluate a build of code. Basically I'm considering using this template everywhere.
I am using VC++ VS2008 (all included)
Stapel.h
class Stapel
{
public:
    //local vars
    int x;

public:
    Stapel();
    Stapel(int value);
    ~Stapel() {}

    //getters setters
    void set_x(int value)
    {
        x = value;
    }
    int get_x()
    {
        return x;
    }

    void CleanUp();
};
Stapel.cpp
#include "Stapel.h"
Stapel::Stapel()
{
}
Stapel::Stapel(int value)
{
set_x(value);
}
void Stapel::CleanUp()
{
//CleanUpCalls
}
The focal point here is the CleanUp method. Basically, I want to put that method in all my files everywhere and simply let it do my delete calls when needed, so that it's all in one place and I can prevent deletes from flying around, which, as a rookie, even I know is probably not something you want to mess around with; nor do you want a sloppy heap.
What about this build?
Good, bad? Why?
And what about using destructors for such tasks?
Boost provides several utilities for RAII-style heap management:
Smart pointers (there are several implementations for different scenarios)
Pointer Containers
Drawbacks of your proposal:
In your implementation, you still have to remember to place a delete in the CleanUp method for every heap allocation you do. Tracking these allocations can be very difficult if your program has any kind of non-linear control flow (some allocations might happen only under certain circumstances). By binding the deallocation of resources (in this case, memory) to the lifetime of objects on the stack, you do not have to worry as much. You will still have to consider things like circular references.
RAII helps you write exception-safe code.
In my experience, RAII leads to more structured code. Objects that are only needed inside a certain loop or branch will not be initialized somewhere else, but right inside the block where they are needed. This makes code easier to read and to maintain.
Edit: A good way to start implementing that is to get Boost. Then search your code for raw pointers, and try to replace every pointer by
A reference
A smart-pointer
A pointer container, if it is a container that owns pointers
If this is done, your code should not contain any deletes anymore. If you use make_shared, you can even eliminate all news. If you run into any problems that you cannot solve by yourself, check out stackoverflow.com ... oh wait, you know that one already ;)
Use smart pointers and RAII instead. That will not centralize all the deletes in one place, but rather remove them from your code entirely. If you need to perform any cleanup yourself, that is what destructors are for; use them, as that is the convention in C++.
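For example, here is a sketch of Stapel reworked along those lines; Resource is a stand-in for whatever CleanUp() would otherwise have deleted.

#include <memory>

struct Resource { int x = 0; };  // placeholder for the owned data

class Stapel
{
public:
    explicit Stapel(int value)
        : resource(new Resource())
    {
        resource->x = value;
    }
    // No CleanUp() and no hand-written destructor: ~Stapel() implicitly
    // destroys 'resource', which deletes the Resource exactly once.
private:
    std::unique_ptr<Resource> resource;
};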