Performance of smart pointer and raw pointer in containers - c++

I'm curious about the answer to this question as I mostly work with containers.
Which one is more logical to use for a minimum of 100 (and a maximum of 10k) elements in a vector or map container?
std::vector<std::unique_ptr<(struct or class name)>>
std::vector<std::shared_ptr<(struct or class name)>>
std::vector<(struct or class name)*>
Machine details: FreeBSD 12.1 + clang-devel or gcc11.

This is really opinion-based, but I'll describe the rules of thumb I use.
std::vector<(struct or class name)> is my default unless I have specific requirements that are not met by that option. More specifically, it is my go-to option UNLESS at least one of the following conditions is true:
struct or class name is polymorphic and instances of classes derived from struct or class name need to be stored in the vector.
struct or class name does not comply with the rule of three (before C++11), the rule of five (from C++11), OR the rule of zero
there are SPECIFIC requirements to dynamically manage lifetime of instances of struct or class name
The above criteria amount to "use std::vector<(struct or class name)> if struct or class name meets requirements to be an element of a standard container".
If struct or class name is polymorphic AND there is a requirement that the vector contain instances of derived classes, my default choice is std::vector<std::unique_ptr<(struct or class name)>>, i.e. the first of the options mentioned in the question.
I will only go past that choice if there are special requirements for managing the lifetime of the objects in the vector that aren't met by either std::vector<(struct or class name)> or std::vector<std::unique_ptr<(struct or class name)>>.
Practically, the above meets the vast majority of real-world needs.
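As a minimal sketch of that polymorphic case (the Shape/Circle/Square names here are my own invention, not from the question), this is roughly what a vector of unique_ptr to a base class looks like:

#include <iostream>
#include <memory>
#include <vector>

struct Shape {                            // hypothetical polymorphic base class
    virtual ~Shape() = default;           // virtual destructor so deletion through Shape* is well-defined
    virtual double area() const = 0;
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};

int main() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>(1.0));
    shapes.push_back(std::make_unique<Square>(2.0));

    for (const auto& p : shapes)          // derived objects accessed through the base interface
        std::cout << p->area() << '\n';   // everything is destroyed automatically with the vector
}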
If there is a need for two unrelated pieces of code to have control over the lifetime of objects stored in a vector, then (and only then) I will consider std::vector<std::shared_ptr<(struct or class name)>>. The premise is that there will be some code that doesn't have access to our vector, but has access to its elements via (for example) being passed a std::shared_ptr<(struct or class name)>.
Now, I get to the case which is VERY rare in my experience - where there are requirements to manage the lifetime of objects that aren't properly handled by std::vector<(struct or class name)>, std::vector<std::unique_ptr<(struct or class name)>>, or std::vector<std::shared_ptr<(struct or class name)>>.
In that case, and only that case, I will - and only if I'm desperate - use std::vector<(struct or class name)*>. This is the situation to be avoided as much as possible. To give you an idea of how bad I think this option is, I've been known to change other system-level requirements in a quest to avoid it. The reason I avoid this option like the plague is that it becomes necessary to write and debug EVERY bit of code that explicitly manages the lifetime of each struct or class name. That includes writing new expressions everywhere and ensuring every new expression is eventually matched by a corresponding delete expression. It also means there is a need to debug hand-written code to ensure no object is deleted twice (undefined behaviour) and every object is deleted exactly once (i.e. no leaks). In other words, this option involves lots of effort and - in non-trivial situations - is really hard to get working correctly.
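For contrast, here is a minimal sketch (Widget is a name I made up) of the manual lifetime management the raw-pointer option forces you to write and get right by hand:

#include <vector>

struct Widget { int value; };             // hypothetical element type

int main() {
    std::vector<Widget*> v;
    for (int i = 0; i < 3; ++i)
        v.push_back(new Widget{i});       // every new must eventually be matched by exactly one delete

    // ... use the elements ...

    for (Widget* p : v)                   // forget this loop and you leak; run it twice and you
        delete p;                         // delete the same object twice (undefined behaviour)
    v.clear();
}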

Start with correct behavior, not performance.
Does your container own your objects? If no, use raw pointers. If yes, use smart pointers. But which ones? See below.
Do you need to support several containers containing the same object, and is it unclear which container will be deleted first? If the answer to both is "yes", use shared_ptr. Otherwise, use unique_ptr.
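For illustration, here is a minimal sketch of the shared_ptr case, with two containers co-owning the same object (the Session type and the container names are my own, purely for the example):

#include <iostream>
#include <memory>
#include <vector>

struct Session { int id; };               // hypothetical shared object

int main() {
    std::vector<std::shared_ptr<Session>> active;
    std::vector<std::shared_ptr<Session>> pending;

    auto s = std::make_shared<Session>(Session{42});
    active.push_back(s);
    pending.push_back(s);                 // both containers now co-own the same object

    active.clear();                       // the object survives as long as any owner remains
    std::cout << pending[0]->id << '\n';  // prints 42
}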
Later, if you discover that accessing the smart pointers wastes too much time (unlikely), replace the smart pointers by raw pointers together with highly optimized memory management, which you will have to implement according to your specific needs.
As noted in the comments, you could do it without pointers. So, before applying this answer, ask yourself why you need pointers at all (my guess is polymorphism, but I'm not sure).

It's hard to provide a firm solution to your question without seeing the context and the way your struct/class operates.
But I still want to provide some basic info about smart pointers so that, hopefully, you can make a wise decision.
An example:
#include <iostream>
#include <vector>
#include <memory>

int main( )
{
    struct MyStruct
    {
        int a;
        double b;
        MyStruct( int a, double b ) : a( a ), b( b ) { } // constructor so make_unique/new can forward the arguments (needed before C++20)
    };

    std::cout << "Size of unique_ptr: " << sizeof( std::unique_ptr< MyStruct > ) << '\n';
    std::cout << "Size of shared_ptr: " << sizeof( std::shared_ptr< MyStruct > ) << '\n';
    std::cout << '\n';

    std::vector< std::unique_ptr<MyStruct> > vec1; // a container holding unique pointers
    std::vector< MyStruct* > vec2;                 // another container holding raw pointers

    vec1.emplace_back( std::make_unique<MyStruct>( 2, 3.6 ) ); // deletion process automatically handled
    vec2.emplace_back( new MyStruct( 5, 11.2 ) );              // you'll have to manually delete all objects later

    std::cout << vec1[0]->a << ' ' << vec1[0]->b << '\n';
    std::cout << vec2[0]->a << ' ' << vec2[0]->b << '\n';

    for ( MyStruct* p : vec2 ) delete p; // the manual cleanup required for the raw-pointer container
}
The possible output:
Size of unique_ptr: 8
Size of shared_ptr: 16
2 3.6
5 11.2
Check the assembly output and compare the two containers; as far as I could see, they generate exactly the same code.
unique_ptr is very fast; with the default deleter I don't think it has any overhead over a raw pointer. shared_ptr, however, has a bit of overhead due to its reference counting mechanism, but it still might be more efficient than a handwritten reference counting system. Don't underestimate the facilities provided in the STL; use them in most cases, except the ones in which the STL does not perform exactly the specific task you need.
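A small sketch of that reference counting in action (just to make the mechanism concrete; nothing here is specific to the question):

#include <iostream>
#include <memory>

int main() {
    auto p = std::make_shared<int>(7);
    std::cout << p.use_count() << '\n';       // 1: only p owns the int
    {
        std::shared_ptr<int> q = p;           // copying a shared_ptr bumps the (atomic) reference count
        std::cout << p.use_count() << '\n';   // 2
    }                                         // q is destroyed here, the count drops back
    std::cout << p.use_count() << '\n';       // 1; the int is freed when the last owner goes away
}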
Speaking of performance, std::vector<(struct or class name)> is better in most cases, since all the objects are stored in one contiguous block of heap memory and no extra indirection is needed to reach them.
However, when using a container of pointers, your objects will be scattered around heap memory and your program will be less cache-friendly.

Related

Do standard library containers store copies or references?

I'm afraid that standard library containers store inside themselves copies of all the elements that I push into them. I hoped that they worked with references or pointers to my elements, so that they wouldn't waste extra memory and time making copies of each element. I wrote this test:
#include <iostream>
#include <queue>
using namespace std;

int main() {
    queue<int> prueba;
    int x = 5;
    prueba.push(x);
    x++;
    cout << prueba.front() << ", ";
    cout << x;
    prueba.pop();
}
And the result was: 5, 6.
So, if I make a big class with a lot of heavy members and then push a lot of objects of that class into a standard library container:
Will the container make a copy of each object inside? That's terrible!
Is there any way to avoid this catastrophic end, other than creating containers of pointers?
Do STL structures store copies or references?
C++ standard containers are non-intrusive containers, and as such they have the following properties:
An object doesn't "know" about, and doesn't contain any details of, the container in which it is stored. Example:
struct Node
{
    T data;   // the node stores only the user's data, no links back into the container
};
1. Pros:
the object does not contain additional information related to container integration.
the object's lifetime is managed by the container (less complexity).
2. Cons:
the container stores copies of the values passed by the user (though in-place construction with emplace is possible).
an object can belong to only one container (unless the container stores pointers to objects).
there is overhead in storing copies (bookkeeping on each allocation).
you can't store a derived object and still maintain its original type (slicing - polymorphism is lost).
Thus, the answer to your question is - they store copies.
Is there any way to avoid this catastrophic end, other than create just containers of pointers?
As far as I know, a reasonable solution is a container of smart pointers.
The answer to your question is simple: STL containers store copies.
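To make that concrete, here is a minimal sketch (the Heavy type is invented for the example) of how a container of shared_ptr stores only handles, not copies of the heavy object:

#include <iostream>
#include <memory>
#include <vector>

struct Heavy { std::vector<double> samples; };   // stand-in for a "big class with heavy members"

int main() {
    auto obj = std::make_shared<Heavy>();
    obj->samples.assign(1000000, 0.0);

    std::vector<std::shared_ptr<Heavy>> container;
    container.push_back(obj);                // copies only the shared_ptr handle, not the Heavy object

    obj->samples[0] = 3.14;                  // visible through the container: it is the same object
    std::cout << container[0]->samples[0] << '\n';   // prints 3.14
}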

Cast char array struct vector to a POD vector?

Say I want to create a vector type only for holding POD structures and regular data types. Can I do the following? It looks very unsafe, but it works. If it is unsafe, what sort of issues might arise?
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

template <std::size_t N>
struct Bytes {
    char data[N];
};

int main() {
    std::vector<Bytes<sizeof(double)> > d_byte_vector;
    // pretend the vector of byte blobs is a vector of double (this is the questionable cast)
    std::vector<double>* d_vectorP = reinterpret_cast<std::vector<double>*>(&d_byte_vector);
    for (int i = 0; i < 50; i++) {
        d_vectorP->push_back(static_cast<double>(rand()) / RAND_MAX); // cast needed: rand()/RAND_MAX alone is integer division
    }
    std::cout << d_vectorP->size() << ":" << d_byte_vector.size() << std::endl;
}
No, this is not safe. Specific compilers may make some guarantee that is particular to that dialect of C++, but according to the Standard you invoke Undefined Behavior by instantiating an array of char and pretending it's actually something else completely unrelated.
Usually when I see code like this, the author was going for one of three things and missed one of the Right Ways to go about it:
Maybe the author wanted an abstract data structure. A better way to go about that is to use an abstract base class, and move the implementation elsewhere.
Maybe the author wanted an opaque data structure. In that case I would employ some variant of the pimpl idiom, where the void* (or char*) presented to the user actually points to some real data type.
Maybe the author wanted some kind of memory pool. The best way to accomplish that is to allocate a large buffer of char, and then use placement new to construct real objects within it (a minimal sketch of this follows below).
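A minimal sketch of that last approach (the Point type is just an example):

#include <iostream>
#include <new>        // placement new

struct Point { double x, y; };

int main() {
    // a raw, suitably aligned buffer of bytes...
    alignas(Point) unsigned char buffer[sizeof(Point) * 4];

    // ...in which a real object is constructed with placement new
    Point* p = new (buffer) Point{1.0, 2.0};
    std::cout << p->x << ", " << p->y << '\n';

    p->~Point();      // objects built with placement new must be destroyed explicitly
}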
No, this is not safe and generally not recommended, because compilers aren't required to operate in a way that allows it. With that said, I've found exactly one reason to ever do this (very recently, as well), and that is a variant of the pimpl idiom where I wanted to avoid pointers so that all of my data access could avoid the need to allocate memory for, deallocate memory for, and dereference the extra pointer. That code isn't in production yet, and I'm still keeping an eye on that section of code to make sure it doesn't start causing any other problems.
Unless you're generating code that has to be extremely optimized, I would recommend finding some other way of doing whatever it is you need to do.

How to keep track of C++ objects

I currently have a class grounds which is used to make objects for all the blocks that make up the ground for a game I am making. What is the best way to keep track of this somewhat large list of blocks? I know how to keep track of objects in Python, but I recently moved to C++ and I am unsure how to go about setting up some sort of list that is easy to iterate through.
In C++, the standard library (often loosely referred to as the standard template library, or STL) provides several container classes that store a collection of things. The "things" may be any data type, including fundamental and user-defined types.
Which container to use depends on your needs. Here's an authoritative article on it from Microsoft.
Your best bet is to use either a vector, if you need the ability to refer to specific elements by their position in your container, or a set, if the order of elements doesn't matter and you need to be able to quickly check whether a certain element is present.
Some examples:
vector<MyClass> mycontainer;    // a vector that holds objects of type MyClass
MyClass myObj;
mycontainer.push_back(myObj);   // stores a copy of myObj
cout << mycontainer[0] << endl; // equivalent to cout << myObj << endl; (assumes MyClass provides operator<<)
Or using a set:
set<MyClass> mycontainer;   // requires MyClass to provide operator< (or a custom comparator)
MyClass myObj;
mycontainer.insert(myObj);
if (mycontainer.find(myObj) != mycontainer.end())   // find returns an iterator, not a bool
    cout << "Yep, myObj is in the set." << endl;
The reason there's no one ultimate container is that there are efficiency tradeoffs. One container may be blazing-fast at identifying whether an element is present within it, while another is optimal for removing an arbitrary element, etc.
So your best bet is to consider what behaviors you want your container to support (and how efficiently!), and then to review the authoritative article I linked to earlier.

Is it inefficient to use a std::vector when it only contains two elements?

I am building a C++ class A that needs to contain a bunch of pointers to other objects B.
In order to make the class as general as possible, I am using a std::vector<B*> inside this class. This way any number of different B can be held in A (there are no restrictions on how many there can be).
Now this might be a bit of overkill because most of the time, I will be using objects of type A that only hold either 2 or 4 B*'s in the vector.
Since there is going to be a lot of iterative calculations going on, involving objects of class A, I was wondering if there is a lot of overhead involved in using a vector of B's when there are only two B's needed.
Should I overload the class to use another container when there are fewer than 3 B objects present?
To make things clearer: A are multipoles and B are magnetic coils that constitute the multipoles.
Premature optimization. Get it working first. If you profile your application and see that you need more efficiency (in memory or performance), then you can change it. Otherwise, it's a potential waste of time.
I would use a vector for now, but create an alias for it instead of spelling std::vector out directly where it's used:
// a plain typedef can't alias a class template, but a C++11 alias template can
template <class T>
using vec_type = std::vector<T>;

class A {
    vec_type<B*> whatever;
};
Then, when/if it becomes a problem, you can change that alias to refer to a vector-like class that's optimized for a small number of contained objects (e.g., does something like the small-string optimization that's common with many implementations of std::string).
Another possibility (though I don't like it quite as well) is to continue to use the name vector directly, but use a using declaration (at namespace scope, since a using declaration for a namespace member can't appear inside a class) to specify which vector to use:
using std::vector;

class A {
    vector<B*> whatever;
};
In this case, when/if necessary, you put your replacement vector into a namespace, and change the using declaration to point to that instead:
using my_optimized_version::vector;   // again at namespace scope

class A {
    // the rest of the code remains unchanged:
    vector<B*> whatever;
};
As far as how to implement the optimized class, the typical way is something like this:
template <class T>
class pseudo_vector {
    T small_data[5];     // in-object buffer used while the vector stays small
    T *data;             // points to heap storage once the small buffer is outgrown
    size_t size;
    size_t allocated;
public:
    // ...
};
Then, if you have 5 or fewer items to store, you put them in small_data. When/if your vector contains more items than that fixed limit, you allocate space on the heap, and use data to point to it.
Depending a bit on what you're trying to optimize, you may want to use an abstract base class, with two descendants, one for small vectors and the other for large vectors, with a pimpl-like class to wrap them and make either one act like something you can use directly.
Yet another possibility that can be useful for some situations is to continue to use std::vector, but provide a custom Allocator object for it to use when obtaining storage space. Googling for "small object allocator" should turn up a number of candidates that have already been written. Depending on the situation, you may want to use one of those directly, or you may want to use them as inspiration to write your own.
If you need an array of B* that will never change its size, you won't need the dynamic shrinking and growing abilities of the std::vector.
So, probably not for reasons of efficiency, but to make the intent clearer, you could consider using a fixed-length array:
struct A {
    enum { ndims = 2 };
    B* b[ndims];
};
or std::array (if available):
struct A {
    std::array<B*, 2> b;
};
see also this answer on that topic.
Vectors are pretty lean as far as overhead goes. I'm sure someone here can give more detailed information about what that really means. But if you've got performance issues, they're not going to come from vector.
In addition I'd definitely avoid the tactic of using different containers depending on how many items there are. That's just begging for a disaster and won't really give you anything in return.

Array: Storing Objects or References

As a Java developer I have the following C++ question.
If I have objects of type A and I want to store a collection of them in an array,
then should I just store pointers to the objects or is it better to store the object itself?
In my opinion it is better to store pointers because:
1) One can easily remove an object, by setting its pointer to null
2) One saves space.
Pointers or just the objects?
You can't put references in an array in C++. You can make an array of pointers, but I'd still prefer a container of actual objects rather than pointers because:
No chance to leak, exception safety is easier to deal with.
It isn't less space - if you store an array of pointers you need the memory for the object plus the memory for a pointer.
The only times I'd advocate putting pointers (smart pointers would be better) in a container (or array, if you must) are when your object isn't copy constructible and assignable (a requirement for container elements; pointers always meet it) or you need the elements to be polymorphic. E.g.
#include <vector>

struct foo {
    virtual void it() {}
};

struct bar : public foo {
    int a;
    virtual void it() {}
};

int main() {
    std::vector<foo> v;
    v.push_back(bar());      // not doing what you expected! (the temporary bar gets "made into"
                             // a foo before being stored, so your vector doesn't get a bar added)
    std::vector<foo*> v2;
    v2.push_back(new bar()); // Fine
}
If you want to go down this road, Boost pointer containers might be of interest because they do all of the hard work for you.
Removing from arrays or containers.
Assigning NULL doesn't leave any fewer pointers in your container/array (and it doesn't handle the delete either); the size remains the same, but there are now pointers you can't legally dereference. This makes the rest of your code more complex, in the form of extra if statements, and prohibits things like:
// need to go out of our way to make sure there's no NULL here
std::for_each(v2.begin(), v2.end(), std::mem_fun(&foo::it)); // note: std::mem_fun was removed in C++17; std::mem_fn is the modern replacement
I really dislike the idea of allowing NULLs in sequences of pointers in general because you quickly end up burying all the real work in a sequence of conditional statements. The alternative is that std::vector provides an erase method that takes an iterator so you can write:
v2.erase(v2.begin());
to remove the first element, or v2.begin()+1 for the second. There's no easy "erase the nth element" method on std::vector, though, because of the time complexity; if you're doing lots of erasing, there are other containers which might be more appropriate.
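If the NULLs have already crept in, the usual cleanup is the erase-remove idiom, sketched here with non-owning pointers for brevity (foo is the same hypothetical type as above):

#include <algorithm>
#include <cassert>
#include <vector>

struct foo { virtual void it() {} };

int main() {
    foo a, b;
    std::vector<foo*> v2{ &a, nullptr, &b, nullptr };   // non-owning pointers, purely for the sketch

    // compact the non-null pointers to the front, then chop off the tail
    v2.erase(std::remove(v2.begin(), v2.end(), nullptr), v2.end());

    assert(v2.size() == 2);   // every remaining element is now safe to dereference
}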
For an array you can simulate erasing with:
#include <utility>
#include <iterator>
#include <algorithm>
#include <iostream>

int main() {
    int arr[] = {1, 2, 3, 4};
    int len = sizeof(arr) / sizeof(*arr);
    std::copy(arr, arr + len, std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    // remove 2nd element, without preserving order:
    std::swap(arr[1], arr[len - 1]);
    len -= 1;
    std::copy(arr, arr + len, std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;

    // and again, first element:
    std::swap(arr[0], arr[len - 1]);
    len -= 1;
    std::copy(arr, arr + len, std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
Preserving the order requires a series of shuffles instead of a single swap, which nicely illustrates the complexity of erasing that std::vector faces. Of course, by doing this you've just reinvented a pretty big wheel, a whole lot less usefully and flexibly than a standard library container would do it for you for free!
It sounds like you are confusing references with pointers. C++ has 3 common ways of representing object handles
References
Pointers
Values
Coming from Java the most analogous way is to do so with a pointer. This is likely what you are trying to do here.
How they are stored, though, has some pretty fundamental effects on their behavior. When you store by value you are dealing with copies of the values, whereas with pointers you are dealing with one object referred to from multiple places. Giving a flat answer that one is better than the other is not really possible without a bit more context on what these objects do.
It completely depends on what you want to do... but you're misguided in some ways.
Things you should know are:
You can't set a reference to NULL in C++, though you can set a pointer to NULL.
A reference can only be made to an existing object - it must start initialized as such.
A reference cannot be changed (though the referenced value can be).
You wouldn't save space; in fact you would use more, since you're storing an object and a reference. If you need to reference the same object multiple times then you save space, but you might as well use a pointer; it's more flexible in MOST (read: not all) scenarios.
A last important one: STL containers (vector, list, etc) have COPY semantics - they cannot work with references. They can work with pointers, but it gets complicated, so for now you should always use copyable objects in those containers and accept that they will be copied, like it or not. The STL is designed to be efficient and safe with copy semantics.
Hope that helps! :)
PS (EDIT): You can use some new features in BOOST/TR1 (google them), and make a container/array of shared_ptr (reference counting smart pointers) which will give you a similar feel to Java's references and garbage collection. There's a flurry of differences but you'll have to read about it yourself - they are a great feature of the new standard.
You should always store objects when possible; that way, the container will manage the objects' lifetimes for you.
Occasionally, you will need to store pointers; most commonly, pointers to a base class where the objects themselves will be of different types. In that case, you need to be careful to manage the lifetime of the objects yourself; ensuring that they are not destroyed while in the container, but that they are destroyed once they are no longer needed.
Unlike Java, setting a pointer to null does not deallocate the object pointed to; instead, you get a memory leak if there are no more pointers to the object. If the object was created using new, then delete must be called at some point. Your best options here are to store smart pointers (shared_ptr, or perhaps unique_ptr if available), or to use Boost's pointer containers.
You can't store references in a container. You could store (naked) pointers instead, but that's prone to errors and is therefore frowned upon.
Thus, the real choice is between storing objects and smart pointers to objects. Both have their uses. My recommendation would be to go with storing objects by value unless the particular situation demands otherwise. This could happen:
if you need to NULL out the object without removing it from the container;
if you need to store pointers to the same object in multiple containers;
if you need to treat elements of the container polymorphically.
One reason not to go with pointers is space: storing elements by value is likely to be more space-efficient.
To add to the answer of aix:
If you want to store polymorphic objects, you must use smart pointers, because the containers make a copy and for derived types they copy only the base part (at least the standard containers do; I think Boost has some containers which work differently). Therefore you'll lose any polymorphic behaviour (and any derived-class state) of your objects.