C++ - When to use pointers to class data members? - c++

I've been coding in C++ and I was wondering if someone could help me with the general reason why we sometimes need to make pointers to class members and other times we don't.
For example, if we are coding a binary tree,
I implement it as:
class BinaryTree{
BinaryTree * left;
BinaryTree * right;
int val;
public:
BinaryTree(int v) {left = NULL; right = NULL; val = v;}
//implementation of any other necessary functions
};
I use the BinaryTree pointers to left and right, because we can't do it without the pointer since BinaryTree does not exist at that point in time.
Are there any other reasons to do this? Is there any way around this?
Also, if we use pointer data members, will the implicitly generated destructor delete the objects they point to?
Thanks for your time.

This is a box:
it has a volume of about one cubic meter, so it can only store objects whose total volume is at most one cubic meter. And it definitely can't store two boxes identical to itself. Note that each one of these two boxes would also need to contain two boxes like it, and so on, and so on.
This is a struct:
struct BinaryTree {
BinaryTree left;
BinaryTree right;
int val;
};
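it has a finite size equal to sizeof(BinaryTree), so it can only store objects whose total size is less than or equal to sizeof(BinaryTree). And it definitely can't store two values of type BinaryTree. Note that each one of these two values would also need to store two values like it, and so on, and so on. (This is why the struct above does not compile: inside its own definition BinaryTree is still an incomplete type, and a data member cannot have an incomplete type.)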
Since the struct instances can't contain other instances of the same struct, and we need to define relations between them, and trees are definitely hierarchical, we use pointers here.
Note that the only thing a so-called raw pointer to T (that is, T*) does is point to a T. Since pointing is its only task, destroying the pointer won't destroy the pointed-to object, only the pointer itself.
There exist types that behave like pointers but also do other tasks, like managing the lifetime of the pointed-to object. Examples are C++11's std::unique_ptr and std::shared_ptr, among many others. I highly recommend using them.
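A minimal sketch of the question's BinaryTree using std::unique_ptr (std::make_unique needs C++14), assuming each node owns its two children. The insert and contains members are hypothetical, added only so the tree can be exercised. Because the children are owned, the implicitly generated destructor frees the whole subtree, which also answers the question about implicit destructors.

```cpp
#include <memory>

// Each node owns its children, so destroying the root frees the whole tree.
class BinaryTree {
    std::unique_ptr<BinaryTree> left;   // null when there is no left child
    std::unique_ptr<BinaryTree> right;  // null when there is no right child
    int val;

public:
    explicit BinaryTree(int v) : val(v) {}

    // Hypothetical helper: ordinary binary-search-tree insertion.
    void insert(int v) {
        std::unique_ptr<BinaryTree>& slot = (v < val) ? left : right;
        if (slot) {
            slot->insert(v);
        } else {
            slot = std::make_unique<BinaryTree>(v);
        }
    }

    // Hypothetical helper: binary-search-tree lookup.
    bool contains(int v) const {
        if (v == val) return true;
        const std::unique_ptr<BinaryTree>& slot = (v < val) ? left : right;
        return slot && slot->contains(v);
    }
    // No user-written destructor: the implicit one destroys left and right,
    // and each unique_ptr deletes the node it owns.
};
```

One caveat: destruction is recursive, so a very deep, unbalanced tree could exhaust the stack.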

Objects often hold members that only need to be created based on run time conditions or parameters. You want to delay creation as late as possible. This is a common case for using pointers.
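A small sketch of delayed creation, assuming a hypothetical ExpensiveResource that should only be built once it is needed, using a name known only at run time; the std::unique_ptr member stays null until then.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type that is costly to construct.
struct ExpensiveResource {
    std::string name;
    explicit ExpensiveResource(std::string n) : name(std::move(n)) {}
};

class Widget {
    std::unique_ptr<ExpensiveResource> resource_;  // null until first use

public:
    bool hasResource() const { return resource_ != nullptr; }

    // Creates the member on first call; later calls reuse the existing object.
    ExpensiveResource& get(const std::string& name) {
        if (!resource_) {
            resource_ = std::make_unique<ExpensiveResource>(name);
        }
        return *resource_;
    }
};
```

Since C++17, std::optional can delay construction the same way without a separate heap allocation.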

Related

Modern C++ Object Relationships

I have a graph implemented using a struct Node and a struct Edge where:
Each Edge has a start and an end Node
Each Node maintains a list of Edge objects which start from or end at it
The following is one possible implementation:
struct Node;
struct Edge {
Node *st;
Node *en;
int some_data;
};
const int MAX_EDGES = 100;
struct Node {
Edge *edges[MAX_EDGES];
int some_data;
};
While the above structs can represent the graph I have in mind, I would like to do it the "Modern C++" way while satisfying the following requirements:
1. Avoid pointers
2. Use an std::vector for Node::edges
3. Be able to store Node and Edge objects in standard C++ containers
How is this done in Modern C++? Can all of 1-3 be achieved?
Avoid pointers
You can use std::shared_ptr and std::weak_ptr for this. Just decide whether you want nodes to own edges, or vice versa. The non-owning type should use weak_ptr (to avoid cycles).
Unless your graph is acyclic you might still need to be careful about ownership cycles.
std::unique_ptr is not an option, because there is not a one-to-one relationship between nodes and edges, so there cannot be a unique owner of any given object.
Use an std::vector for Node::edges
No problem. Make it a std::vector<std::weak_ptr<Edge>> or a std::vector<std::shared_ptr<Edge>>, depending on whether edges own nodes or vice versa.
Be able to store Node and Edge objects in standard C++ containers
No problem, just ensure your type can be safely moved/copied without leaking or corrupting memory, i.e. has correct copy/move constructors and assignment operators. That will happen automatically if you use smart pointers and std::vector as suggested above.
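A minimal sketch of that design, assuming nodes own their edges (std::shared_ptr) and edges refer back to nodes without owning them (std::weak_ptr). The connect function is a hypothetical helper, and the nodes themselves are assumed to be owned by an outside container such as std::vector<std::shared_ptr<Node>>.

```cpp
#include <memory>
#include <vector>

struct Edge;

struct Node {
    std::vector<std::shared_ptr<Edge>> edges;  // nodes own (share) their edges
    int some_data = 0;
};

struct Edge {
    std::weak_ptr<Node> st;  // non-owning: avoids a Node -> Edge -> Node cycle
    std::weak_ptr<Node> en;
    int some_data = 0;
};

// Hypothetical helper: creates an edge and registers it with both endpoints.
std::shared_ptr<Edge> connect(const std::shared_ptr<Node>& a,
                              const std::shared_ptr<Node>& b, int data) {
    auto e = std::make_shared<Edge>();
    e->st = a;
    e->en = b;
    e->some_data = data;
    a->edges.push_back(e);
    b->edges.push_back(e);
    return e;
}
```

When the outside container releases a node, it is destroyed, and the weak_ptrs in its edges simply expire instead of dangling.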
Modern C++ eschews assigning dynamically allocated memory to a raw pointer, because it is all too easy to forget to delete that pointer. Having said that, there is nothing wrong with using raw pointers as references to objects, provided you can guarantee that each object's lifetime will be longer than the use of the pointer.
The rules generally are:
1. Use std::unique_ptr if an object has a single owner.
2. Use raw pointers to reference objects created in 1., provided you can guarantee that the object's lifetime will be longer than the use of your reference.
3. Use std::shared_ptr for reference-counted objects.
4. Use std::weak_ptr to refer to a reference-counted object when you do not want to increase the reference count.
So in your case, if the Edge owns the Nodes, use std::unique_ptr; if not, keep the raw pointers.
In your Node class, if the Node owns the Edges, use a std::vector<Edge>; otherwise use a std::vector<Edge*>, although it might be more efficient to link your Edges together in their own intrusive linked list.
Having done some work on complex graphs, I'd say it might be better to allocate all your Nodes and Edges in vectors outside your graph and then refer to them internally using raw pointers inside the graph (as long as those vectors are not resized afterwards, since reallocation invalidates the pointers). Remember that memory allocation is slow, so the fewer allocations you do, the faster your algorithm will be.
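A rough sketch of the arena idea, assuming a hypothetical Graph type that owns every Node and Edge in two vectors. It refers to elements by index rather than by raw pointer, so the vectors can keep growing after elements have been linked.

```cpp
#include <cstddef>
#include <vector>

struct Edge {
    std::size_t st;  // index into Graph::nodes
    std::size_t en;  // index into Graph::nodes
    int some_data;
};

struct Node {
    std::vector<std::size_t> edges;  // indices into Graph::edges
    int some_data;
};

// Hypothetical graph: owns every Node and Edge in two contiguous arrays.
struct Graph {
    std::vector<Node> nodes;
    std::vector<Edge> edges;

    std::size_t addNode(int data) {
        Node n;
        n.some_data = data;
        nodes.push_back(n);
        return nodes.size() - 1;
    }

    // Appends the edge and records its index in both endpoint nodes.
    std::size_t addEdge(std::size_t a, std::size_t b, int data) {
        Edge e = {a, b, data};
        edges.push_back(e);
        std::size_t id = edges.size() - 1;
        nodes[a].edges.push_back(id);
        nodes[b].edges.push_back(id);
        return id;
    }
};
```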
By using std::shared_ptr or std::unique_ptr
I don't think vector is the right choice here, since a graph usually is not linear (generally speaking; also, in most cases you can't linearize it like you can with a heap).
There is no standard 'general-use' graph container, but you can use templates here for genericity.
For example, your Element class could look like this:
template <class T>
struct Elem {
std::shared_ptr<Node> st , en;
T some_data;
};
Speaking of modern C++, I don't think a plain struct is encouraged here; you should encapsulate your data.

Statically allocating array of inherited objects

The title of this question is pretty convoluted, so I'll try to frame it with an example. Let's say that I have an abstract base class, with a number of classes which inherit from it. In the example below I've only shown two inherited classes, but in reality there could be more.
class Base {
public:
Base();
virtual ~Base() = 0;
/// Other methods/members
};
class SmallChild: public Base {
public:
SmallChild();
~SmallChild();
/// Other methods/members such that sizeof(SmallChild) < sizeof(LargeChild)
};
class LargeChild : public Base {
public:
LargeChild();
~LargeChild();
/// Other methods/members such that sizeof(LargeChild) > sizeof(SmallChild)
};
I need to implement a container which stores up to N inherited objects. These objects need to be created/destroyed at runtime and placed in the container, but due to constraints in the project (specifically that it's on embedded hardware), dynamic memory allocation isn't an option. The container needs to have all of its space statically allocated. Also, C++11 is not supported by the compiler.
There was only one way I could think of to implement this. To reference the N objects, I'd first need to create an array of pointers to the base class, and then, to actually store the objects, I'd need to create a buffer large enough to store N copies of the largest inherited object, which in this case is LargeChild:
Base * children[N];
uint8_t childBuffer[N * sizeof(LargeChild)];
I could then distribute the pointers in children across childBuffer, each separated by sizeof(LargeChild). As objects need to be created, C++'s "placement new" could be used to place them at the specified locations in the array. I'd need to keep track of the type of each object in childBuffer in order to dereference the pointers in children, but this shouldn't be too bad.
I have a few questions regarding this entire setup/implementation:
Is this a good approach to solving the problem as I've described it? I've never implemented ANYTHING like this before, so I have no idea if I'm way out to lunch here and there's a much easier way to accomplish this.
How much of this can be done at compile-time? If I have M types of inherited classes (SmallChild, LargeChild, etc.) but I don't know their size in relation to each other, how can I determine the size of childBuffer? This size depends on the size of the largest class, but is there a way to determine this size at compile-time? I can imagine some preprocessor macros iterating through the classes, evaluating sizeof and finding the maximum, but I have very little experience with this level of preprocessor work and have no idea what this would look like. I can also imagine this being possible using templates, but again, I don't have any experience with compile-time template sorcery so I'm only basing this on my intuition. Any direction on how to implement this would be appreciated.
Do you need to be able to deallocate the objects? If not, it may be easier to overload operator new. I refer to this:
void* operator new (std::size_t size) throw (std::bad_alloc);
All your overloads would allocate memory from a single large buffer. How much memory to allocate is specified by the size parameter.
This way you should be able to just say
children[i] = new SmallChild();
Edit: if you do need to deallocate, you need more complex data structures. You may end up re-implementing the heap anyway.
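A minimal sketch of the no-deallocation variant. It uses a class-specific operator new on Base rather than replacing the global one, so only the Base hierarchy draws from the buffer. The arena size, the 16-byte rounding, and the names Arena, arenaUsed and id() are assumptions for illustration; check the alignment your platform actually requires.

```cpp
#include <cstddef>
#include <new>

namespace {
// Fixed backing store; the extra union members only raise its alignment.
union Arena {
    unsigned char bytes[1024];
    long double alignLd;
    void* alignPtr;
};
Arena arena;
std::size_t arenaUsed = 0;
const std::size_t kAlign = 16;  // assumed to satisfy every child's alignment
}

struct Base {
    virtual ~Base() {}
    virtual int id() const = 0;

    // Bump allocation: hand out the next free block, never reuse memory.
    static void* operator new(std::size_t size) {
        std::size_t start = (arenaUsed + kAlign - 1) / kAlign * kAlign;
        if (start + size > sizeof(arena.bytes)) {
            throw std::bad_alloc();  // buffer exhausted
        }
        arenaUsed = start + size;
        return arena.bytes + start;
    }

    // delete still runs destructors but gives nothing back to the arena.
    static void operator delete(void*) {}
};

struct SmallChild : Base {
    int id() const { return 1; }
};

struct LargeChild : Base {
    double payload[8];
    int id() const { return 2; }
};
```

With this in place, children[i] = new SmallChild(); works as in the answer, and the memory stays claimed for the life of the program.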
If the set of objects is fully static (set at build time and doesn't change at runtime), the usual approach is to use a set of arrays of each derived class and build up the 'global' array with pointers into the other arrays:
static SmallChild small_children[] = {
{ ...initializer for first small child... },
{ ...initializer for second small child... },
...
};
static LargeChild large_children[] = {
{ ...initializer for first large child... },
...
};
Base *children[N] = { &small_children[0], &small_children[1], &large_children[0], ....
This can be tricky to maintain if there are children being added/removed from the build frequently, or if the order in the children array is important. It may be desirable to generate the above source file with a script or build program that reads a description of the children needed.
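A concrete version of that pattern, with hypothetical constructors and an id() function standing in for the elided initializers. Classes with virtual functions are not aggregates, so in C++03 each element is initialized by calling a constructor rather than with a brace list of member values.

```cpp
#include <cstddef>

struct Base {
    virtual ~Base() {}
    virtual int id() const = 0;
};

struct SmallChild : Base {
    explicit SmallChild(int v) : v_(v) {}
    int id() const { return v_; }
    int v_;
};

struct LargeChild : Base {
    explicit LargeChild(int v) : v_(v) {}
    int id() const { return 100 + v_; }
    int v_;
};

// Every object lives in a statically allocated array of its own type.
static SmallChild small_children[] = { SmallChild(1), SmallChild(2) };
static LargeChild large_children[] = { LargeChild(1) };

// The "global" array only points into the typed arrays.
Base* children[] = { &small_children[0], &small_children[1], &large_children[0] };
const std::size_t N = sizeof(children) / sizeof(children[0]);
```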
Your approach is interesting, given your constraints (i.e. no use of dynamic allocation).
In effect, you are managing, in your own way, a kind of array of union anyChild { SmallChild o1; LargeChild o2; ... }; and sizeof(anyChild) would give you the largest block size you are looking for.
By the way, there could be a risk of dangling pointers in your approach if not all objects have been created with placement new, or if some of them are destroyed through an explicit call of their destructor.
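A minimal sketch of the slot-buffer approach from the question, written to compile as C++03. SmallChild, LargeChild, id(), Slot, emplaceChild and destroyChild are hypothetical names. A pointer is stored in children only after placement new succeeds and is cleared when the destructor is called explicitly; because Base has a virtual destructor and virtual functions, the slots can be used through children[i] without recording each slot's type. The slots are also aligned, which a plain uint8_t array does not guarantee.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-ins for the classes in the question.
struct Base {
    virtual ~Base() {}
    virtual int id() const = 0;
};
struct SmallChild : Base {
    int id() const { return 1; }
};
struct LargeChild : Base {
    double payload[8];
    int id() const { return 2; }
};

const std::size_t N = 4;

// One slot can hold any child. The long double and void* members only raise
// the union's alignment (a pre-C++11 trick; verify it suffices on your target).
union Slot {
    unsigned char small[sizeof(SmallChild)];
    unsigned char large[sizeof(LargeChild)];
    long double alignLd;
    void* alignPtr;
};

static Slot childBuffer[N];  // all storage is static
static Base* children[N];    // zero-initialized: null means "slot empty"

// Construct a T in slot i with placement new (no heap allocation).
// The slot must be empty; call destroyChild(i) before reusing it.
template <class T>
T* emplaceChild(std::size_t i) {
    // C++03-style static assertion: an array size of -1 fails to compile.
    typedef char SlotTooSmall[sizeof(T) <= sizeof(Slot) ? 1 : -1];
    (void)sizeof(SlotTooSmall);
    T* obj = new (&childBuffer[i]) T();
    children[i] = obj;
    return obj;
}

// Run the destructor explicitly; the storage itself is never freed.
void destroyChild(std::size_t i) {
    if (children[i]) {
        children[i]->~Base();  // virtual, so the derived destructor runs
        children[i] = 0;
    }
}
```

Reusing a slot without calling destroyChild first would skip the old object's destructor, so the two calls should always come in pairs.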
If you put your derived types into a union:
union Child{
SmallChild asSmallChild;
LargeChild asLargeChild;
};
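Then the union will automatically have the size of the largest type. One caveat for the asker's compiler: C++03 does not allow a class with a non-trivial constructor or destructor (which includes any class with virtual functions) to be a union member, so there the union would have to hold raw storage, such as unsigned char arrays of sizeof(SmallChild) and sizeof(LargeChild) bytes, instead; C++11 relaxes this rule. Of course, now you have a new problem: what type is represented in the union? You could give yourself a hint in the base class, or you could instead make Child a struct which contains a hint and then the union inlined within. For examples, look at the components Espressif publishes for the ESP32 on GitHub; there are lots of good union uses there.
Anyway, if you allocate an array of the union type, every element will be as large as the largest child, because that's what unions do.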
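Another compile-time option that works under C++03 is a small template that computes the larger of two sizeof values. StaticMax, MediumChild and kLargestChild are hypothetical names; with M types, the template is nested M - 1 times. In C++11 and later, a constexpr function would be a simpler way to express the same thing.

```cpp
#include <cstddef>

// Compile-time maximum of two sizes. In C++03, a static const integral
// member initialized in-class is usable as a constant expression.
template <std::size_t A, std::size_t B>
struct StaticMax {
    static const std::size_t value = (A > B) ? A : B;
};

// Hypothetical stand-ins; their relative sizes need not be known in advance.
struct Base { virtual ~Base() {} };
struct SmallChild : Base { char c; };
struct MediumChild : Base { int i[3]; };
struct LargeChild : Base { double d[4]; };

// With M types, nest StaticMax M - 1 times.
const std::size_t kLargestChild =
    StaticMax<sizeof(SmallChild),
              StaticMax<sizeof(MediumChild), sizeof(LargeChild)>::value>::value;

const std::size_t N = 8;

// Sized for N of the largest child; alignment still has to be arranged separately.
unsigned char childBuffer[N * kLargestChild];
```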

why to use non-pointer for c++ fields?

These days I'm starting to get my feet wet with C++ and, due to my Java-ish background, I obviously have some problems understanding some C++ features.
Since Java offers only references and primitives, one of the most mysterious C++ features for me is non-pointer (and non-primitive) fields.
Here is an example of what I mean.
If I were to write a C++ implementation of a list of objects of type X, I would write something like:
class XList{
private:
struct node {
X* data;
node* next;
};
node* first;
public:
/*
a lot of methods
*/
};
This code is probably awful; I know about templates, the STL and whatnot, but the problem for me here is just the field "data". If I declare "data" as an X pointer, I presume that I can use it in a way very similar to Java references.
What, instead, could be the reason to declare data as an X (X data;)? What is the difference? I know the difference between allocating on the stack and on the heap, but is there any connection here?
Please help me get a bit more of a grip on this topic.
Thank you.
--- UPDATE: ----
Most of the answers seem to focus on the difference between using the plain type or a pointer in general.
Probably I wrote the question in the wrong way, but I already know the difference between allocating on the stack or on the heap (the basics at least).
What I can't understand is that, in my (probably wrong) opinion, the usage of a plain type in member variables (not fields, thank you for your correction) should be just some kind of corner case. Especially when templates are involved, a copy of the data makes no sense to me.
Instead, every time I see an implementation of some data structure, the plain type is used.
E.g.: if you search "bst c++ template" on Google you will find a lot of implementations like this one:
template<class T>
class BinarySearchTree
{
private:
struct tree_node
{
tree_node* left;
tree_node* right;
T data;
};
tree_node* root;
public:
/*
methods, methods and methods
*/
};
Do you really want to make a copy of every data of type T inserted on this tree without knowing its size? Since I'm new to the language I suppose that I misunderstood something.
The advantage of using an X instead of an X * is that with the pointer, you have to allocate the space for the X as well, which uses more space (4 bytes or 8 bytes for the pointer, plus the overhead of the allocation for the X via new), whereas with the plain type, you avoid that overhead. So, it is simpler just to use the plain X.
You'd use the pointer when you definitively do not want to make a copy of the X value, but you could end up with dangling pointers if you are not careful. You'd also use the pointer if there are circumstances where you might not have an object to point to.
Summary
Use the direct object to simplify memory management.
Use the pointer when you cannot afford copying or need to represent the absence of a value.
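A small illustration of the difference, with hypothetical X and Holder types: a value member is a copy that lives inside the enclosing object and is destroyed with it, while a pointer member refers to an object that lives elsewhere and may be shared (or left dangling).

```cpp
#include <string>

struct X {
    std::string text;
};

// Value member: each ValueHolder has its own X, stored inside the holder.
struct ValueHolder {
    X data;
};

// Pointer member: the holder refers to an X owned by someone else.
struct PointerHolder {
    X* data;  // may be null to represent "no value"
};
```

Changing the original after both holders are built shows the difference: the value holder keeps its own copy, while the pointer holder sees the change. The standard containers work the same way: std::vector<T> and std::list<T> store copies of the T objects you insert, so callers who want to avoid copies can store pointers or smart pointers as T, or, since C++11, move values in.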
The difference is that a pointer points to a separate block of memory that may not even be allocated or determined yet. But when you leave off the *, you're saying "allocate space for this entire object (not just a pointer to it) as part of this class."
The former, using a pointer, puts all the memory allocation and maintenance in your hands. The latter just gives you the object as a member of your class.
The former doesn't use much space (four to eight bytes, depending on the architecture). The latter can use a little or a LOT, depending on what members class X has.
I use non-pointer data members when: 1) I'm sure this data shouldn't and won't be shared among objects; 2) I can rely on automatic deallocation when the object is finally destroyed. A good example is a wrapper (of something to be wrapped):
class Wrapper
{
private:
Wrapped _wrapped;
public:
Wrapper(args) : _wrapped(args) { }
};
The primary difference is that if you declare an X*, you're responsible for managing the pointed-to object's memory yourself (typically with new and delete on the heap), while an X member is part of the enclosing object: it lives wherever that object lives (on the stack, on the heap, or in static storage) and is constructed and destroyed along with it.
There are other subtleties as well, like handling self-assignment.

FIFO implementation

While implementing a FIFO I have used the following structure:
struct Node
{
T info_;
Node* link_;
Node(T info, Node* link=0): info_(info), link_(link)
{}
};
I think this is a well-known trick used in lots of STL containers (for example, list). Is this a good practice? What does it mean for the compiler when you say that Node has a member whose type is a pointer to Node? Is this a kind of infinite loop?
And finally, if this is a bad practice, how could I implement a better FIFO?
EDIT: People, this is all about implementation. I am familiar enough with the STL, and know plenty of containers from several libraries. I just want to discuss this with people who can give a good implementation or good advice.
Is this a good practice?
I don't see anything in particular wrong with it.
What does it mean for the compiler when you say that Node has a member whose type is a pointer to Node?
There's nothing wrong with a class storing a pointer to an object of the same class.
And finally, if this is a bad practice, how could I implement a better FIFO?
I'd use std::queue ;)
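A short sketch of std::queue used as a FIFO. By default it adapts std::deque; the second template argument selects another container, such as std::list, that provides front, back, push_back and pop_front. makeQueue and ListQueue are hypothetical names.

```cpp
#include <list>
#include <queue>
#include <string>

// Default underlying container is std::deque.
std::queue<std::string> makeQueue() {
    std::queue<std::string> q;
    q.push("first");  // enqueue at the back
    q.push("second");
    q.push("third");
    return q;
}

// Same FIFO interface, backed by a linked list instead.
typedef std::queue<int, std::list<int> > ListQueue;
```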
Obviously you are using linked-list as the underlying implementation of your queue. There's nothing particularly bad about that.
Just FYI though, that in terms of implementation, std::queue itself is using std::deque as its underlying implementation. std::deque is a more sophisticated data structure that consists of blocks of dynamic arrays that are cleverly managed.
It ends up being better than linked list because:
With linked-list, each insertion means you have to do an expensive dynamic memory allocation. With dynamic arrays, you don't. You only allocate memory when the buffer has to grow.
Array elements are contiguous and that means elements access can be cached easily in hardware.
Pointers to objects of the type being declared are fine in both C and C++. This works because a pointer has a fixed size (for example, 32 bits on a typical 32-bit platform) that does not depend on the pointed-to type, so the compiler does not need to know the full size of that type.
In fact, you don't even need a full type declaration to declare a pointer. A forward declaration would suffice:
class A; // forward declared type
struct B
{
A* pa; //< pointer to A - perfectly legal
};
Of course, you need a full declaration in scope at the point where you actually access members:
#include <A.hpp> // bring in full declaration of class A
...
B b;
b.pa = &a; // address of some instance of A
...
b.pa->func(); // invoke A's member function - this needs full declaration
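For a FIFO, look into std::queue. std::list, std::deque, and std::vector could all be used for that purpose, but they also provide other facilities (and std::vector has no pop_front, so removing from its front means erase(begin()), which takes linear time).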
You can use the existing FIFO, std::queue.
This is one good way of implementing a node. The node pointer is used to create the link to the next node in the container. You're right though, it can be used to create a loop. If the last node in the container references the first, iterating that container would loop through all of the nodes.
For example, if the container is a FIFO queue the pointer would reference the next node in the queue. That is, the value of link_ would be the address of another instance of class Node.
If the value type T implemented an expensive copy constructor, a more efficient Node class would be
struct Node
{
T * info_;
Node* link_;
Node(T * info, Node* link=0): info_(info), link_(link)
{}
};
Note that info_ is now a pointer to an instance of T. The idea behind using a pointer is that assigning a pointer is less expensive than copying complex objects.

Are data structures an appropriate place for shared_ptr?

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?
I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
};
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared; however, a node in a binary tree is never shared. However, as Martin York pointed out, why reinvent the wheel? There's already a smart pointer type that does what we're trying to do: auto_ptr. (auto_ptr was deprecated in C++11 and removed in C++17; std::unique_ptr is its replacement and can be used the same way here.) So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
};
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.
Because left and right are not shared, boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<> (or, since C++11, its replacement std::unique_ptr<>).
Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.
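A minimal sketch of the cycle problem, using a hypothetical Peer type that counts live instances: two objects holding std::shared_ptr to each other keep each other alive after the last outside reference is gone, while making one link a std::weak_ptr lets both be freed.

```cpp
#include <memory>

struct Peer {
    static int alive;              // live instance counter for the demo
    std::shared_ptr<Peer> strong;  // owning link
    std::weak_ptr<Peer> weak;      // non-owning link
    Peer() { ++alive; }
    ~Peer() { --alive; }
};
int Peer::alive = 0;

// Owning links in both directions form a cycle. The returned weak handle
// lets the caller reach the pair to break the cycle by hand.
std::weak_ptr<Peer> makeStrongCycle() {
    std::shared_ptr<Peer> a = std::make_shared<Peer>();
    std::shared_ptr<Peer> b = std::make_shared<Peer>();
    a->strong = b;
    b->strong = a;
    return a;
}  // a and b go out of scope, but each object still owns the other

// Same shape, but the back link does not own.
void makeWeakCycle() {
    std::shared_ptr<Peer> a = std::make_shared<Peer>();
    std::shared_ptr<Peer> b = std::make_shared<Peer>();
    a->strong = b;
    b->weak = a;
}  // a's count drops to zero, a releases b, and both are destroyed
```

In the strong case, resetting one of the links (for example a->strong.reset()) is the "manual clearing" mentioned above.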
Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
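A minimal sketch of the manual single-owner approach, with a hypothetical Node template and insert/size helpers: each node deletes its children in its destructor, and copying is disabled (C++03 style, by declaring the copy operations private and leaving them undefined) so that two nodes can never both believe they own the same child. In C++11 and later, = delete expresses the same intent more clearly.

```cpp
#include <cstddef>

// Each node is the single owner of its two children.
template <class T>
class Node {
public:
    explicit Node(const T& value) : left(0), right(0), data(value) {}

    // Deleting a null pointer is a no-op, so leaves need no special case.
    ~Node() {
        delete left;
        delete right;
    }

    // Hypothetical helper: ordinary binary-search-tree insertion.
    void insert(const T& value) {
        Node*& slot = (value < data) ? left : right;
        if (slot) {
            slot->insert(value);
        } else {
            slot = new Node(value);
        }
    }

    // Hypothetical helper: number of nodes in this subtree.
    std::size_t size() const {
        return 1 + (left ? left->size() : 0) + (right ? right->size() : 0);
    }

private:
    Node(const Node&);             // not copyable: copies would double-delete children
    Node& operator=(const Node&);  // not assignable, for the same reason

    Node* left;
    Node* right;
    T data;
};
```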
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you can switch between the two and compare their performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)
Never use shared_ptr for the nodes of a data structure. It can cause the destruction of a node to be suspended or delayed if at any point its ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.
There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.
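Do you even need pointers? It may seem you could use boost::optional<BinaryTreeNode<T> > left, right, but that does not work: boost::optional stores its value inside itself, so a node would again have to contain whole nodes, and BinaryTreeNode<T> is still an incomplete type at that point. Some form of pointer is still needed for the children.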