Is it considered bad manners/bad practice to explicitly place object members on the heap (via new)? I would think you might want to allow the client to choose the memory region to instantiate the object. I know there might be a situation where heap members might be acceptable. If you know a situation could you describe it please?
If you have a class that's designed for copy semantics and you're allocating/deallocating a bunch of memory unnecessarily, I could see this being bad practice. In general, though, it's not. There are a lot of classes that can make use of heap storage. Just make sure you're free of memory leaks (deallocate things in the destructor, reference count, etc.) and you're fine.
If you want more flexibility, consider letting your user specify an Allocator. I'll explain.
Certain classes, e.g. std::vector, string, map, etc. need heap storage for the data structures they represent. It's not considered bad manners; when you have an automatically allocated vector, the user is expected to know that a buffer is allocated when the vector constructor gets called:
void foo() {
// user of vector knows a buffer that can hold at least 10 ints
// gets allocated here.
std::vector<int> foo(10);
}
Likewise, for std::string, you know there's an internal, heap-allocated char* buffer. Whether there's one per string instance is up to the STL implementation; historically, implementations often reference counted them (modern ones typically use a small-string optimization instead).
However, for nearly all of the STL classes, users do have a choice of where things are put, in that they can specify an allocator. vector is defined kind of like this:
template <typename T, typename Alloc = std::allocator<T> >
class vector {
// etc.
};
Internally, vector uses Alloc (which defaults to the standard allocator for T) to allocate the buffer and any other heap storage it may need. If users don't like the default allocation strategy, they can specify one of their own:
vector<int, MyCustomAllocator<int> > foo(10);
Now when the constructor allocates, it will use a MyCustomAllocator instead of the default. Here are some details on writing your own STL allocator.
If you're worried that it might be "bad manners" to use the heap for certain storage in your class, you might want to consider giving users of your class an option like this so that they can specify how things are to be allocated if your default strategy doesn't fit their needs.
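To make that concrete, here's a minimal sketch of what such an allocator might look like (the name MyCustomAllocator is illustrative; a real one might draw from a pool or arena instead of plain operator new):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Minimal C++11-style allocator that simply delegates to
// ::operator new / ::operator delete.
template <typename T>
struct MyCustomAllocator {
    using value_type = T;

    MyCustomAllocator() = default;
    template <typename U>
    MyCustomAllocator(const MyCustomAllocator<U>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ::operator delete(p);
    }
};

// Stateless allocators of any specialization compare equal.
template <typename T, typename U>
bool operator==(const MyCustomAllocator<T>&, const MyCustomAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const MyCustomAllocator<T>&, const MyCustomAllocator<U>&) { return false; }

// The container now draws its buffer through the custom allocator:
using MyIntVector = std::vector<int, MyCustomAllocator<int>>;
```

Since C++11 this minimal interface is enough; std::allocator_traits fills in the rest.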
I don't consider it bad practice at all. There are all sorts of reasons why you might want to explicitly allocate a member variable via new. Here are a few off the top of my head.
Say your class has a very large buffer, e.g., 512kb or 1MB. If this buffer is not stored on the heap, your users might potentially exceed the default stack space if they create multiple local variables of your class. In this case, it would make sense to allocate the buffer in your constructor and store it as a pointer.
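As a sketch of that pattern (class and member names are hypothetical, not from the question):

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example: a 1 MB scratch buffer kept on the heap so that
// ImageFilter itself stays small enough to live comfortably on the stack.
class ImageFilter {
public:
    static const std::size_t kBufferSize = 1024 * 1024; // 1 MB

    ImageFilter() : buffer_(new unsigned char[kBufferSize]) {}

    std::size_t bufferSize() const { return kBufferSize; }

private:
    // unique_ptr keeps the heap allocation leak-free without a hand-written
    // destructor; sizeof(ImageFilter) is now roughly one pointer.
    std::unique_ptr<unsigned char[]> buffer_;
};
```

A local `ImageFilter f;` now costs a pointer's worth of stack, not a megabyte.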
If you are doing any kind of reference counting, you'll need a pointer to keep track of how many objects are actually pointing to your data.
If your member variable has a different lifetime than your class, a pointer is the way to go. A perfect example of this is lazy evaluation, where you only pay for the creation of the member if the user asks for it.
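A minimal sketch of lazy creation, with illustrative names:

```cpp
#include <memory>

// Stand-in for an expensive-to-build member; the counter just lets us
// observe when construction actually happens.
struct ExpensiveReport {
    static int instances;
    ExpensiveReport() { ++instances; }
};
int ExpensiveReport::instances = 0;

class Document {
public:
    // The member's lifetime starts on first access, not with the Document.
    const ExpensiveReport& report() {
        if (!report_)
            report_.reset(new ExpensiveReport()); // pay only on first use
        return *report_;
    }
    bool hasReport() const { return report_ != nullptr; }
private:
    std::unique_ptr<ExpensiveReport> report_; // null until asked for
};
```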
Although it is not necessarily a direct benefit to your users, compilation time is another reason to use pointers instead of objects. If you put an object in your class, you have to include the header file that defines the object in the header file for your class. If you use a pointer, you can forward declare the class and only include the header file that defines the class in the source files that need it. In large projects, using forward declarations can drastically speed up compilation time by reducing the overall size of your compilation units.
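For illustration, here's roughly what the forward-declaration arrangement looks like; the two "files" are shown in one listing, and the names are made up:

```cpp
// --- widget.h: only a forward declaration is needed for a pointer member ---
class Engine;              // forward declaration; engine.h is NOT included here

class Widget {
public:
    Widget();
    ~Widget();             // must be defined where Engine is a complete type
    int power() const;
private:
    Engine* engine_;       // a pointer to an incomplete type is fine
};

// --- widget.cpp: the full definition is only needed here ---
class Engine {             // stands in for #include "engine.h"
public:
    int power = 9000;
};

Widget::Widget() : engine_(new Engine) {}
Widget::~Widget() { delete engine_; }  // Engine is complete here, so delete is safe
int Widget::power() const { return engine_->power; }
```

Only widget.cpp recompiles when Engine's definition changes; files that merely include widget.h do not.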
On the flip side, if your users create a lot of instances of your class for use on the stack, it would be advantageous to use objects instead of pointers for your member variables simply because heap allocations/deallocations are slow by comparison. It's more efficient to avoid the heap in this case, taking into account the first bullet above of course.
Where the class puts its members is less important than that the management of them is contained within the class; i.e. clients and subclasses shouldn't have to worry about the object's member variable.
The simplest way to do this would be to make them stack variables. But in some cases, such as if your class has a dynamic data structure like a linked list, it doesn't make sense.
But if you make sure your objects clean up after themselves, that should be fine for most applications.
hmm, I don't really understand your question.
If you have a class :
class MyOtherClass;
class MyClass
{
MyOtherClass* m_pStruct;
};
Then the client of MyClass has no real choice in how m_pStruct will be allocated.
But it will be the client's decision on how the class MyClass will itself be allocated, either on the stack or on the heap:
MyClass* pMyClass = new MyClass;
or
MyClass myClass;
Related
I encountered the following code snippet.
int main() {
auto a = new A(/* arguments */);
// Do something
delete a;
}
Here A is a very nontrivial class I cannot easily reason about (highly parallelized and networking involved). Because a is instantiated in main (or possibly any other function) and then deleted at the end of the scope, I thought there was no reason to heap-allocate this variable. Instead, I would simply instantiate it as a stack variable with A a(/* arguments */).
However, I wonder if there are any valid reasons to allocate A dynamically. One thing that comes to my mind is the possibility of saving some space in stack if A is a massive object, while I doubt that this would really make sense with modern machines.
First of all, there is no reason not to use a smart pointer here, i.e.
auto a = std::make_unique<A>(/* arguments */);
but as to why you would want to heap allocate rather than creating the object on the stack, reasons include
Size. Class A might be huge. Stack space is not inexhaustible; heap space is much much larger. Seriously, you can overflow the stack surprisingly easily, even on modern machines. You don't want to stack allocate an array of 100,000 items, etc.
You might need a pointer for runtime polymorphism. Say instead of calling A's constructor directly you are calling some factory function that returns a unique_ptr to a base class from which A inherits and the rest of your code depends on polymorphic calls to a; you'd need to use dynamic allocation in that case.
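A sketch of the polymorphic-factory case, with invented names standing in for the real A:

```cpp
#include <memory>
#include <string>

// Callers only see the Transport interface, so the concrete object must
// live behind a pointer; a stack variable would have a fixed static type.
class Transport {
public:
    virtual ~Transport() = default;
    virtual std::string name() const = 0;
};

class TcpTransport : public Transport {   // stands in for the real "A"
public:
    std::string name() const override { return "tcp"; }
};

class UdpTransport : public Transport {
public:
    std::string name() const override { return "udp"; }
};

// The concrete type is chosen at runtime: this is only possible
// with dynamic allocation behind a base-class pointer.
std::unique_ptr<Transport> makeTransport(bool reliable) {
    if (reliable)
        return std::make_unique<TcpTransport>();
    return std::make_unique<UdpTransport>();
}
```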
I would like to have a class that contains an array member, but the constructor lets me set the size of an array member.
Is this doable? I do not think I need dynamic allocation, since once the class instances are created there is no need for the array to change size; it is just that each class instance will have a different size.
Although several comments suggest that this would be impossible, it is actually not.
The simplest way, of course, is to use an indirection and allocate the array during construction just the normal way (with a = new type[size] and calling delete[] a - not delete a - in the destructor).
But if for some reason you really do not want to have the array data being allocated separately from your object, you can use placement-new to construct your object into a pre-allocated buffer that is large enough to contain all your elements. This avoids a separate allocation for your array and you can still have dynamic size.
I would not recommend using this technique, though, unless you really have a demanding use case for it.
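For completeness, here's a minimal (and deliberately simplified) sketch of the placement-new technique; real code would also need to worry about alignment for arbitrary element types and about exception safety:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Object header and element array live in ONE malloc'd block, sized at
// creation time, so the "array member" needs no separate allocation.
class IntArray {
public:
    static IntArray* create(std::size_t n) {
        void* mem = std::malloc(sizeof(IntArray) + n * sizeof(int));
        return new (mem) IntArray(n);   // placement-new into the block
    }
    static void destroy(IntArray* p) {
        p->~IntArray();                 // explicit destructor call...
        std::free(p);                   // ...then free the raw block
    }
    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) {
        // elements start immediately after the object itself
        return reinterpret_cast<int*>(this + 1)[i];
    }
private:
    explicit IntArray(std::size_t n) : size_(n) {}
    std::size_t size_;
};
```

Note the matching pair: placement-new on creation, explicit destructor call plus free() on destruction. Plain delete would be undefined behavior here.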
I'm starting with the assumption that, generally, it is a good idea to allocate small objects in the stack, and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass{
public:
SomeClass(){ /* initialize some potentially big STL container */ }
private:
std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass{
public:
SomeOtherClass() { /* initialize some potentially big STL container,
but is it allocated in any different way? */ }
private:
std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions.
std::unique_ptr<SomeClass> someClassFactory(){
return std::make_unique<SomeClass>();
}
std::unique_ptr<SomeOtherClass> someOtherClassFactory(){
return std::make_unique<SomeOtherClass>();
}
Use case:
int main(){
//in my case I can reliably assume that objects themselves
//are going to always be allocated in dynamic memory
auto pSomeClassObject(someClassFactory());
auto pSomeOtherClassObject(someOtherClassFactory());
return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage for choosing A or B? Specifically, should I generally choose design A because it's simpler or are there more considerations? Is B morally wrong because it can dangle for a std::vector?
tl;dr: Is it wrong to have a smart pointer pointing to an STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some google keywords lead me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and its only advantage is that it carries one additional possible state: "the vector doesn't exist". However, if you care about that state, use boost::optional<std::vector<double>> (or, since C++17, std::optional) instead. You should almost never have a heap-allocated container, and definitely never through a unique_ptr. It actually works fine, no "dangling"; it's just pointlessly slow.
Using std::unique_ptr here is just wasteful unless your goal is a compiler firewall (basically hiding the compile-time dependency on vector, though note that you cannot portably forward-declare the standard containers).
You're adding an indirection but, more importantly, the full contents of SomeClass turn into 3 separate memory blocks to load when accessing the contents (the SomeClass block containing the unique_ptr, pointing to the std::vector block, pointing to its element array). In addition, you're paying one extra, superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers are themselves small objects pointing to potentially big blocks: vector isn't big, its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. Before move semantics there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable); with move semantics in place, it's better thought of as a little handle pointing to big, dynamic contents that can be shallow-moved and swapped cheaply.
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third party library that provides some data structures whose contents can get very large but they fail to provide those cheap, non-invalidating move/swap semantics, then you might superficially wrap it around a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
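A sketch of that workaround, with LegacyMatrix standing in for the hypothetical third-party type:

```cpp
#include <memory>
#include <utility>

// Hypothetical third-party type: a big payload that (pretend) offers no
// cheap move of its own, so copying it means copying every element.
struct LegacyMatrix {
    int cells[256];
    LegacyMatrix() : cells() {}
};

// Wrapping it in unique_ptr makes move/swap a pointer exchange, at the
// cost of one heap allocation and an indirection on every access.
class MatrixHandle {
public:
    MatrixHandle() : m_(new LegacyMatrix) {}
    LegacyMatrix* get() const { return m_.get(); }

    // moves now just steal the pointer:
    MatrixHandle(MatrixHandle&&) = default;
    MatrixHandle& operator=(MatrixHandle&&) = default;
private:
    std::unique_ptr<LegacyMatrix> m_;
};
```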
I agree with @MooingDuck; I don't think using std::unique_ptr has any compelling advantages. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).
Just a design/optimization question. When do you store pointers or objects and why? For example, I believe both of these work (barring compile errors):
class A{
std::unique_ptr<Object> object_ptr;
};
A::A():object_ptr(new Object()){}
class B{
Object object;
};
B::B():object(Object()){}
I believe one difference comes when instantiating on stack or heap?
For example:
int main(){
    std::unique_ptr<A> a_ptr;
    std::unique_ptr<B> b_ptr;
    a_ptr.reset(new A()); // *a_ptr on heap, and its (*object_ptr) on heap
    b_ptr.reset(new B()); // *b_ptr on heap, and its object member inside that heap block
    A a; // a on stack, (*object_ptr) on heap
    B b; // b on stack, object on stack
}
Also, sizeof(A) should be < sizeof(B)?
Are there any other issues that I am missing?
(Daniel reminded me about the inheritance issue in his related post in the comments)
So, since stack allocation is generally faster than heap allocation, but sizeof(A) is smaller than sizeof(B), is this one of those tradeoffs that cannot be answered without measuring performance in the case in question, even with move semantics? Or are there rules of thumb for when it is more advantageous to use one over the other?
(San Jacinto corrected me: it is stack allocation that is faster than heap allocation, not stack access versus heap access)
I would guess that more copy construction would lead to the same performance issue (three copies would cost roughly three times as much as creating the first instance). But move construction may make it more advantageous to use the stack as much as possible?
Here is a related question, but not exactly the same.
C++ STL: should I store entire objects, or pointers to objects?
Thanks!
If you have a big object inside your A class, then I'd store a pointer to it, but for small objects, or primitive types, you should not really need to store pointers, in most cases.
Also, whether something is stored on the stack or on the heap (free store) is really implementation dependent, and A a is not always guaranteed to be on the stack.
It's better to call this an automatic object, because its storage duration is determined by the scope of the function it is declared in. When the function returns, a will be destroyed.
Pointers require the use of new, which does carry some overhead, but on today's machines I'd say it is trivial in most cases, unless of course you start newing up millions of objects; then you will start seeing performance issues.
Each situation is different, and when you should and shouldn't use a pointer, instead of an automatic object, is largely dependent on your situation.
This depends on a lot of specific factors, and either approach can have its merits. I'd say if you will exclusively use the outer object through dynamic allocation, then you might as well make all the members direct members and avoid the additional member allocation. On the other hand, if the outer object is allocated automatically, large members should probably be handled through a unique_ptr.
There's an additional benefit to handling members only through pointers: You remove compile-time dependencies, and the header file for the outer class may be able to get away with a forward-declaration of the inner class, rather than requiring full inclusion of the inner class's header ("PIMPL"). In large projects this sort of decoupling may turn out to be economically sensible.
The heap is not "slower" than the stack. Heap allocation can be slower than stack allocation, and poor cache locality may cause a lot of cache misses if you design your objects and data structures in such a way that there is not a lot of contiguous memory access. So from this standpoint, it depends on what your design and code use goals are.
Even setting this aside, you have to question your copy semantics too. If you want deep copies of your objects (and your objects' objects are also deeply copied), then why even store pointers? If it's okay to have shared memory due to copy semantics, then store pointers but make sure you don't free the memory twice in the dtor.
I tend to use pointers under two conditions: class member initialization order matters deeply, and I'm injecting dependencies into an object. In most other cases, I use non-pointer types.
edit: There are two additional cases when I use pointers: 1) to avoid circular include dependencies (although I may use a reference in some cases), 2) With the intention of using polymorphic function calls.
There are a few cases where you have almost no choice but to store a pointer. One obvious one is when you're creating something like a binary tree:
template <class T>
struct tree_node {
struct tree_node *left, *right;
T data;
};
In this case, the definition is basically recursive, and you don't know up-front how many descendants a tree node might have. You're pretty much stuck with (at least some variation of) storing pointers, and allocating descendant nodes as needed.
There are also cases like dynamic strings where you have only a single object (or array of objects) in the parent object, but its size can vary over a wide enough range that you just about need to (at least provide for the possibility to) use dynamic allocation. With strings, small sizes are common enough that there's a fairly widely-used "short string optimization", where the string object directly includes enough space for strings up to some limit, as well as a pointer to allow dynamic allocation if the string exceeds that size:
template <class T>
class some_string {
static const size_t limit = 20;
size_t allocated;
size_t in_use;
union {
T short_data[limit];
T *long_data;
};
// ...
};
A less obvious reason to use a pointer instead of directly storing a sub-object is for the sake of exception safety. Just for one obvious example, if you store only pointers in a parent object, that can (usually does) make it trivial to provide a swap for those objects that gives the nothrow guarantee:
template <class T>
class parent {
T *data;
friend void swap(parent &a, parent &b) throw() {
T *temp = a.data;
a.data = b.data;
b.data = temp;
}
};
With only a couple of (usually valid) assumptions:
the pointers are valid to start with, and
assigning valid pointers will never throw an exception
...it's trivial for this swap to give the nothrow guarantee unconditionally (i.e., we can just say: "swap will not throw"). If parent stored objects directly instead of pointers, we could only guarantee that conditionally (e.g., "swap will throw if and only if the copy constructor or assignment operator for T throws").
For C++11, using a pointer like this often (usually?) makes it easy to provide an extremely efficient move constructor (that also gives the nothrow guarantee). Of course, using a pointer to (most of) the data isn't the only possible route to fast move construction -- but it is an easy one.
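A minimal sketch of such a move constructor (a simplified, non-copyable single-value parent, not meant as the full class from the swap example):

```cpp
#include <utility>

// Because parent holds only a pointer, move construction is a pointer
// steal: no T is copied, and the operation can be declared noexcept.
template <class T>
class parent {
public:
    explicit parent(T value) : data(new T(std::move(value))) {}

    parent(parent&& other) noexcept : data(other.data) {
        other.data = nullptr;   // leave the source empty but destructible
    }

    ~parent() { delete data; }  // deleting nullptr is a safe no-op

    parent(const parent&) = delete;
    parent& operator=(const parent&) = delete;

    T* get() const { return data; }
private:
    T* data;
};
```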
Finally, there are the cases I suspect you had in mind when you asked the question -- ones where the logic involved doesn't necessarily indicate whether you should use automatic or dynamic allocation. In this case, it's (obviously) a judgement call. From a purely theoretical viewpoint, it probably makes no difference at all which you use in these cases. From a practical viewpoint, however, it can make quite a bit of difference. Even though neither the C nor C++ standard guarantees (or even hints at) anything of the sort, the reality is that on most typical systems, objects using automatic allocation will end up on the stack. On most typical systems (e.g., Windows, Linux) the stack is limited to only a fairly small fraction of the available memory (typically on the order of single-digit to low double-digit megabytes).
This means that if all the objects of these types that might exist at any given time might exceed a few megabytes (or so) you need to ensure that (at least most of) the data is allocated dynamically, not automatically. There are two ways to do that: you can either leave it to the user to allocate the parent objects dynamically when/if they might exceed the available stack space, or else you can have the user work with relatively small "shell" objects that allocate space dynamically on the user's behalf.
If that's at all likely to be an issue, it's almost always preferable for the class to handle the dynamic allocation instead of forcing the user to do so. This has two obvious good points:
The user gets to use stack-based resource management (SBRM, aka RAII), and
The effects of limited stack space are limited instead of "percolating" through the whole design.
Bottom line: especially for a template where the type being stored isn't known up-front, I'd tend to favor a pointer and dynamic allocation. I'd reserve direct storage of sub-objects primarily for situations where I know the stored type will (almost?) always be quite small, or where profiling has indicated that dynamic allocation is causing a real speed problem. In the latter case, however, I'd give at least some thought to alternatives like overloading operator new for that class.
In C++, if I have a class that needs to hold a member which could be dynamically allocated and used as a pointer, or not, like this:
class A {
type a;
};
or
class A {
A();
~A();
type* a;
};
and in the constructor:
A::A() {
a = new type();
}
and destructor:
A::~A() {
delete a;
}
are there any advantages or disadvantages to either one, aside from the dynamic one requiring more code? Do they behave differently (aside from the pointer having to be dereferenced) or is one slower than the other? Which one should I use?
There are several differences:
The size of every member must be known when you're defining a class. This means you must include your type header, and you can't just use a forward-declaration as you would with a pointer member (since the size of all pointers is known). This has implications for #include clutter and compile times for large projects.
The memory for the data member is part of the enclosing class instance, so it will be allocated at the same time, in the same place, as all the other class members (whether on the stack or the heap). This has implications for data locality - having everything in the same place could potentially lead to better cache utilization, etc. Stack allocation will likely be a tad faster than heap allocation. Declaring too many huge object instances could blow your stack quicker.
The pointer type is trickier to manage: since it doesn't automatically get allocated or destroyed along with the class, you need to make sure to do that yourself. This becomes tricky with multiple pointer members: if you're newing all of them in the constructor and halfway through the process there's an exception, the destructor doesn't get called and you have a memory leak. It's better to assign pointer members to a "smart pointer" wrapper (such as std::unique_ptr; the older std::auto_ptr is deprecated) immediately; this way the cleanup gets handled automatically (and you often don't need to worry about deleting them in the destructor, frequently saving you from writing one at all). Also, any time you're handling resources manually you need to worry about copy constructors and assignment operators.
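To illustrate the constructor-exception point above, here's a sketch (all names invented) showing that smart-pointer members are cleaned up during stack unwinding even when a later member initializer throws:

```cpp
#include <memory>
#include <stdexcept>

// A member whose constructor can fail.
struct Throws {
    explicit Throws(bool doThrow) {
        if (doThrow) throw std::runtime_error("constructor failed");
    }
};

// A member whose lifetime we can observe via a counter.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

class Holder {
public:
    explicit Holder(bool failSecond)
        : first_(new Tracked),               // owned by unique_ptr immediately
          second_(new Throws(failSecond)) {} // if this throws, first_ is still
                                             // destroyed during unwinding
private:
    std::unique_ptr<Tracked> first_;   // with a raw Tracked* here, a throw
    std::unique_ptr<Throws> second_;   // in second_'s init would leak it
};
```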
With the pointer you have more control, but also more responsibilities. You have more control in the sense that you can decide the lifetime of the object more precisely, while without the pointer the lifetime is essentially equal to the lifetime of the containing object. Also, with the pointer the member could actually be an instance of a subclass of the pointer type.
Performance-wise, using the pointer does mean more memory usage, more memory fragmentation, and dereferencing does take a small amount of time. For all but the most performance-critical code, none of this is really worth worrying about, however.
The main difference is that the pointer can potentially point somewhere else.
edit
Laurence's answer isn't wrong, but it's a bit general. In specific, dynamic allocation is going to be slightly slower. Dereferencing through the pointer is likewise going to be very slightly slower. Again, this is not a lot of speed loss, and the flexibility it buys may well be very much worth it.
The main difference is that if you don't use a pointer, the memory for the inner member will be allocated as a part of the memory allocated for the containing object. If you use new, you will get memory in separate chunks (you already seem to have proper creation and destruction of the referenced object down)
You need to understand the implications of default copy constructor and copy assignment operators when using raw pointers. The raw pointer gets copied in both the cases. In other words, you will end up having multiple objects (or raw pointers) pointing to the same memory location. Therefore, your destructor written as is above will attempt to delete the same memory multiple times.
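A sketch of the fix, the classic "rule of three" (here via copy-and-swap) applied to a class like the one above, using an int member for brevity:

```cpp
#include <algorithm>

// With a raw owning pointer, the compiler-generated copy operations would
// just copy the pointer, leading to a double delete; so A defines its own.
class A {
public:
    A() : a(new int(0)) {}
    ~A() { delete a; }

    A(const A& other) : a(new int(*other.a)) {}  // deep copy

    A& operator=(A other) {      // copy-and-swap: 'other' is a by-value copy
        std::swap(a, other.a);   // steal its buffer; old one freed with other
        return *this;
    }

    int& value() { return *a; }
private:
    int* a;
};
```

Taking the assignment parameter by value makes the operator self-assignment-safe and reuses the copy constructor.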
If the member variable should live beyond the lifetime of the object, or if its ownership should be transferred to another object, then the member should be dynamically (heap) allocated using "new". If it is not, then it is often the best choice to make it a direct member of the class in order to simplify code and lessen the burden on the memory-allocator. Memory allocation is expensive.