I'm afraid that standard library containers store inside themselves copies of all elements that I push into them. I hoped that they worked with references or pointers to my elements, so they don't waste extra memory and time making copies of each element. I made this proof:
queue<int> prueba;
int x = 5;
prueba.push(x);
x++;
cout << prueba.front() << ", ";
cout << x;
prueba.pop();
And the result was: 5, 6.
So, If I make a big class with a lot of heavy members, and then, I push a lot of objects of that class into a standard library container.
Will containers make a copy of each object inside? That's terrible!
Is there any way to avoid this catastrophic end, other than create just containers of pointers?
Does STL structures stores copies or references?
C++ standard containers are non-instrusive containers and as such they have the following properties:
Object doesn't "know" and contain details about the container in which is to be stored. Example:
struct Node
{
T data;
}
1. Pros:
does not containe additional information regarding the container integration.
object's lifetime managed by the container. (less complex.)
2. Cons:
store copies of values passed by the user. (inplace emplace construction possible.)
an object can belong only to one container. (or the contaier should store pointers to objects.)
overhead on storing copies. (bookkeeping on each allocation.)
can't store derived object and still maintain its original type. (slicing - looses polymorphism.)
Thus, the answer to your question is - they store copies.
Is there any way to avoid this catastrophic end, other than create just containers of pointers?
As far as I know, a reasonable solution is container of smart pointers.
The answer to your question is simple: STL containers store copies.
Related
I'm curious about the answer to this question as I mostly work with containers.
which one is more logical to use in minimum of 100 (and maximum of 10k) elements in vector or map container in?
std:::vector<std::unique_ptr<(struct or class name)>>
std:::vector<std::shared_ptr<(struct or class name)>>
std:::vector<(struct or class name)*>
Machine detais: FreeBSD 12.1 + clang-devel or gcc11.
This is really opinion-based, but I'll describe the rules of thumb I use.
std:::vector<(struct or class name)> is my default unless I have specific requirements that are not met by that option. More specifically, it is my go-to option UNLESS at least one of the following conditions are true;
struct or class name is polymorphic and instances of classes derived from struct or class name need to be stored in the vector.
struct or class name does not comply with the rule of three (before C++11), the rule of five (from C++11), OR the rule of zero
there are SPECIFIC requirements to dynamically manage lifetime of instances of struct or class name
The above criteria amount to "use std::vector<(struct or class name)> if struct or class name meets requirements to be an element of a standard container".
If struct or class name is polymorphic AND there is a requirement that the vector contain instances of derived classes my default choice is std:::vector<std::unique_ptr<(struct or class name)> >. i.e. none of the options mentioned in the question.
I will only go past that choice if there are special requirements for managing lifetime of the objects in the vector that aren't met by either std:::vector<(struct or class name)> or std:::vector<std::unique_ptr<(struct or class name)> >.
Practically, the above meets the vast majority of real-world needs.
If there is a need for two unrelated pieces of code to have control over the lifetime of objects stored in a vector then (and only then) I will consider std:::vector<std::shared_ptr<(struct or class name)> >. The premise is that there will be some code that doesn't have access to our vector, but has access to its elements via (for example) being passed a std::shared_ptr<(struct or class name)>.
Now, I get to the case which is VERY rare in my experience - where there are requirements to manage lifetime of objects that aren't properly handled by std:::vector<(struct or class name)>, std:::vector<std::unique_ptr<(struct or class name)> >, or by std:::vector<std::shared_ptr<(struct or class name)> >.
In that case, and only that case, I will - and only if I'm desperate - use std:::vector<(struct or class name)*>. This is the situation to be avoided, as much as possible. To give you an idea of how bad I think this option is, I've been known to change other system-level requirements in a quest to avoid this option. The reason I avoid this option like the plague is that it becomes necessary to write and debug EVERY bit of code that explicitly manages the lifetime of each struct or class name. This includes writing new expressions everywhere, ensuring every new expression is eventually matched by a corresponding delete expression. This option also means there is a need to debug hand-written code to ensure no object is deleted twice (undefined behaviour) and every object is deleted once (i.e. avoid leaks). In other words, this option involves lots of effort and - in non-trivial situations - is really hard to get working correctly.
Start with correct behavior, not performance.
Does your container own your objects? If no, use raw pointers. If yes, use smart pointers. But which ones? See below.
Do you need to support several containers containing the same object, and is it unclear which container will be deleted first? If the answer to both is "yes", use shared_ptr. Otherwise, use unique_ptr.
Later, if you discover that accessing the smart pointers wastes too much time (unlikely), replace the smart pointers by raw pointers together with highly optimized memory management, which you will have to implement according to your specific needs.
As noted in comments, you could do it without pointers. So, before applying this answer, ask yourself why you need pointers at all (I guess the answer is polymorphism, but not sure).
It's hard to provide a firm solution to your question without seeing the context and the way your struct/class operates.
But I still want to provide some basic info about smart pointers so hopefully, you can make a wise decision.
An example:
#include <iostream>
#include <vector>
#include <memory>
int main( )
{
struct MyStruct
{
int a;
double b;
};
std::cout << "Size of unique_ptr: " << sizeof( std::unique_ptr< MyStruct > ) << '\n';
std::cout << "Size of shared_ptr: " << sizeof( std::shared_ptr< MyStruct > ) << '\n';
std::cout << '\n';
std::vector< std::unique_ptr<MyStruct> > vec1; // a container holding unique pointers
std::vector< MyStruct* > vec2; // another container holding raw pointers
vec1.emplace_back( std::make_unique<MyStruct>(2, 3.6) ); // deletion process automatically handled
vec2.emplace_back( new MyStruct(5, 11.2) ); // you'll have to manually delete all objects later
std::cout << vec1[0]->a << ' ' << vec1[0]->b << '\n';
std::cout << vec2[0]->a << ' ' << vec2[0]->b << '\n';
}
The possible output:
Size of unique_ptr: 8
Size of shared_ptr: 16
2 3.6
5 11.2
Check the assembly output here and compare the two containers. As I saw, they generate the exact same code.
The unique_ptr is very fast. I don't think it has any overhead. However, the shared_ptr has a bit of overhead due to its reference counting mechanism. But it still might be more efficient than a handwritten reference counting system. Don't underestimate the facilities provided in the STL. Use them in most cases except the ones in which STL does not exactly perform the specific task you need.
Speaking of performance, std::vector<(struct or class name)> is better in most cases since all the objects are stored in a contiguous block of heap memory, and also dereferencing them is not required.
However, when using a container of pointers, your objects will be scattered around heap memory and your program will be less cache-friendly.
Simply written I would like to ask "what is a good reason to use smart pointers?"
for ex std::unique_ptr
However, I am not asking for reasons to use smart pointers over regular (dumb) pointers. I think every body knows that or a quick search can find the reason.
What I am asking is a comparison of these two cases:
Given a class (or a struct) named MyObject use
std:queue<std::unique_ptr<MyObject>>queue;
rather than
std:queue<MyObject> queue;
(it can be any container, not necessarily a queue)
Why should someone use option 1 rather than 2?
That is actually a good question.
There are a few reasons I can think of:
Polymorphism works only with references and pointers, not with value types. So if you want to hold derived objects in a container you can't have std::queue<MyObject>. One options is unique_ptr, another is reference_wrapper
the contained objects are referenced (*) from outside of the container. Depending on the container, the elements it holds can move, invalidating previous references to it. For instance std::vector::insert or the move of the container itself. In this case std::unique_ptr<MyObject> assures that the reference is valid, regardless of what the container does with it (ofc, as long as the unique_ptr is alive).
In the following example in Objects you can add a bunch of objects in a queue. However two of those objects can be special and you can access those two at any time.
struct MyObject { MyObject(int); };
struct Objects
{
std::queue<std::unique_ptr<MyObject>> all_objects_;
MyObject* special_object_ = nullptr;
MyObject* secondary_special_object_ = nullptr;
void AddObject(int i)
{
all_objects_.emplace(std::make_unique<MyObject>(i));
}
void AddSpecialObject(int i)
{
auto& emplaced = all_objects_.emplace(std::make_unique<MyObject>(i));
special_object_ = emplaced.get();
}
void AddSecondarySpecialObject(int i)
{
auto& emplaced = all_objects_.emplace(std::make_unique<MyObject>(i));
secondary_special_object_ = emplaced.get();
}
};
(*) I use "reference" here with its english meaning, not the C++ type. Any way to refer to an object (e.g. via a raw pointer)
Usecase: You want to store something in a std::vector with constant indices, while at the same time being able to remove objects from that vector.
If you use pointers, you can delete a pointed to object and set vector[i] = nullptr, (and also check for it later) which is something you cannot do when storing objects themselves. If you'd store Objects you would have to keep the instance in the vector and use a flag bool valid or something, because if you'd delete an object from a vector all indices after that object's index change by -1.
Note: As mentioned in a comment to this answer, the same can be archieved using std::optional, if you have access to C++17 or later.
The first declaration generates a container with pointer elements and the second one generates pure objects.
Here are some benefits of using pointers over objects:
They allow you to create dynamically sized data structures.
They allow you to manipulate memory directly (such as when packing or
unpacking data from hardware devices.)
They allow object references(function or data objects)
They allow you to manipulate an object(through an API) without needing to know the details of the object(other than the API.)
(raw) pointers are usually well matched to CPU registers, which makes dereferencing a value via a pointer efficient. (C++ “smart” pointers are more complicated data objects.)
Also, polymorphism is considered as one of the important features of Object-Oriented Programming.
In C++ polymorphism is mainly divided into two types:
Compile-time Polymorphism
This type of polymorphism is achieved by function overloading or operator overloading.
Runtime Polymorphism
This type of polymorphism is achieved by Function Overriding which if we want to use the base class to use these functions, it is necessary to use pointers instead of objects.
From a C background I find myself falling back into C habits where there is generally a better way. In this case I can't think of a way to do this without pointers.
I would like
struct foo {
int i;
int j;
};
mymap['a'] = foo
mymap['b'] = bar
As long as only one key references a value mymap.find will return a reference so I can modify the value, but if I do this:
mymap['c'] = mymap.find('a') // problematic because foo is copied right?
The goal is to be able to find 'a' or 'c' modify foo and then the next find of 'a' or 'c' will show the updated result.
No, you will need to use pointers for this. Each entry in the map maintains a copy of the value assigned, which means that you cannot have two keys referring to the same element. Now if you store pointers to the element, then two keys will refer to two separate pointers that will refer to the exact same in memory element.
For some implementation details, std::map is implemented as a balanced tree where in each node contains a std::pair<const Key,Value> object (and extra information for the tree structure). When you do m[ key ] the node containing the key is looked up or a new node is created in the tree and the reference to the Value subobject of the pair is returned.
I would use std::shared_ptr here. You have an example of shared ownership, and shared_ptr is made for that. While pointers tend to be overused, it is nothing wrong with using them when necessary.
Boost.Intrusive
Boost.Intrusive is a library presenting some intrusive containers to
the world of C++. Intrusive containers are special containers that
offer better performance and exception safety guarantees than
non-intrusive containers (like STL containers).
The performance benefits of intrusive containers makes them ideal as a
building block to efficiently construct complex containers like
multi-index containers or to design high performance code like memory
allocation algorithms.
While intrusive containers were and are widely used in C, they became
more and more forgotten in C++ due to the presence of the standard
containers which don't support intrusive techniques.Boost.Intrusive
not only reintroduces this technique to C++, but also encapsulates the
implementation in STL-like interfaces. Hence anyone familiar with
standard containers can easily use Boost.Intrusive.
As a Java developer I have the following C++ question.
If I have objects of type A and I want to store a collection of them in an array,
then should I just store pointers to the objects or is it better to store the object itself?
In my opinion it is better to store pointers because:
1) One can easily remove an object, by setting its pointer to null
2) One saves space.
Pointers or just the objects?
You can't put references in an array in C++. You can make an array of pointers, but I'd still prefer a container and of actual objects rather than pointers because:
No chance to leak, exception safety is easier to deal with.
It isn't less space - if you store an array of pointers you need the memory for the object plus the memory for a pointer.
The only times I'd advocate putting pointers (or smart pointers would be better) in a container (or array if you must) is when your object isn't copy construable and assignable (a requirement for containers, pointers always meet this) or you need them to be polymorphic. E.g.
#include <vector>
struct foo {
virtual void it() {}
};
struct bar : public foo {
int a;
virtual void it() {}
};
int main() {
std::vector<foo> v;
v.push_back(bar()); // not doing what you expected! (the temporary bar gets "made into" a foo before storing as a foo and your vector doesn't get a bar added)
std::vector<foo*> v2;
v2.push_back(new bar()); // Fine
}
If you want to go down this road boost pointer containers might be of interest because they do all of the hard work for you.
Removing from arrays or containers.
Assigning NULL doesn't cause there to be any less pointers in your container/array, (it doesn't handle the delete either), the size remains the same but there are now pointers you can't legally dereference. This makes the rest of your code more complex in the form of extra if statements and prohibits things like:
// need to go out of our way to make sure there's no NULL here
std::for_each(v2.begin(),v2.end(), std::mem_fun(&foo::it));
I really dislike the idea of allowing NULLs in sequences of pointers in general because you quickly end up burying all the real work in a sequence of conditional statements. The alternative is that std::vector provides an erase method that takes an iterator so you can write:
v2.erase(v2.begin());
to remove the first or v2.begin()+1 for the second. There's no easy "erase the nth element" method though on std::vector because of the time complexity - if you're doing lots of erasing then there are other containers which might be more appropriate.
For an array you can simulate erasing with:
#include <utility>
#include <iterator>
#include <algorithm>
#include <iostream>
int main() {
int arr[] = {1,2,3,4};
int len = sizeof(arr)/sizeof(*arr);
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// remove 2nd element, without preserving order:
std::swap(arr[1], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// and again, first element:
std::swap(arr[0], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
preserving the order requires a series of shuffles instead of a single swap, which nicely illustrates the complexity of erasing that std::vector faces. Of course by doing this you've just reinvented a pretty big wheel a whole lot less usefully and flexibly than a standard library container would do for you for free!
It sounds like you are confusing references with pointers. C++ has 3 common ways of representing object handles
References
Pointers
Values
Coming from Java the most analogous way is to do so with a pointer. This is likely what you are trying to do here.
How they are stored though has some pretty fundamental effects on their behaviors. When you store as a value you are often dealing with copies of the values. Where pointers are dealing with one object with multiple references. Giving a flat answer of one is better than the other is not really possible without a bit more context on what these objects do
It completely depends on what you want to do... but you're misguided in some ways.
Things you should know are:
You can't set a reference to NULL in C++, though you can set a pointer to NULL.
A reference can only be made to an existing object - it must start initialized as such.
A reference cannot be changed (though the referenced value can be).
You wouldn't save space, in fact you would use more since you're using an object and a reference. If you need to reference the same object multiple times then you save space, but you might as well use a pointer - it's more flexible in MOST (read: not all) scenarios.
A last important one: STL containers (vector, list, etc) have COPY semantics - they cannot work with references. They can work with pointers, but it gets complicated, so for now you should always use copyable objects in those containers and accept that they will be copied, like it or not. The STL is designed to be efficient and safe with copy semantics.
Hope that helps! :)
PS (EDIT): You can use some new features in BOOST/TR1 (google them), and make a container/array of shared_ptr (reference counting smart pointers) which will give you a similar feel to Java's references and garbage collection. There's a flurry of differences but you'll have to read about it yourself - they are a great feature of the new standard.
You should always store objects when possible; that way, the container will manage the objects' lifetimes for you.
Occasionally, you will need to store pointers; most commonly, pointers to a base class where the objects themselves will be of different types. In that case, you need to be careful to manage the lifetime of the objects yourself; ensuring that they are not destroyed while in the container, but that they are destroyed once they are no longer needed.
Unlike Java, setting a pointer to null does not deallocate the object pointed to; instead, you get a memory leak if there are no more pointers to the object. If the object was created using new, then delete must be called at some point. Your best options here are to store smart pointers (shared_ptr, or perhaps unique_ptr if available), or to use Boost's pointer containers.
You can't store references in a container. You could store (naked) pointers instead, but that's prone to errors and is therefore frowned upon.
Thus, the real choice is between storing objects and smart pointers to objects. Both have their uses. My recommendation would be to go with storing objects by value unless the particular situation demands otherwise. This could happen:
if you need to NULL out the object without removing it from the
container;
if you need to store pointers to the same object in
multiple containers;
if you need to treat elements of the container
polymorphically.
One reason to not do it is to save space, since storing elements by value is likely to be more space-efficient.
To add to the answer of aix:
If you want to store polymorphic objects, you must use smart pointers because the containers make a copy, and for derived types only copy the base part (at least the standard ones, I think boost has some containers which work differently). Therefore you'll lose any polymorphic behaviour (and any derived-class state) of your objects.
I just started learning about pointers in C++, and I'm not very sure on when to use pointers, and when to use actual objects.
For example, in one of my assignments we have to construct a gPolyline class, where each point is defined by a gVector. Right now my variables for the gPolyline class looks like this:
private:
vector<gVector3*> points;
If I had vector< gVector3 > points instead, what difference would it make? Also, is there a general rule of thumb for when to use pointers? Thanks in advance!
The general rule of thumb is to use pointers when you need to, and values or references when you can.
If you use vector<gVector3> inserting elements will make copies of these elements and the elements will not be connected any more to the item you inserted. When you store pointers, the vector just refers to the object you inserted.
So if you want several vectors to share the same elements, so that changes in the element are reflected in all the vectors, you need the vectors to contain pointers. If you don't need such functionality storing values is usually better, for example it saves you from worrying about when to delete all these pointed to objects.
Pointers are generally to be avoided in modern C++. The primary purpose for pointers nowadays revolves around the fact that pointers can be polymorphic, whereas explicit objects are not.
When you need polymorphism nowadays though it's better to use a smart pointer class -- such as std::shared_ptr (if your compiler supports C++0x extensions), std::tr1::shared_ptr (if your compiler doesn't support C++0x but does support TR1) or boost::shared_ptr.
Generally, it's a good idea to use pointers when you have to, but references or alternatively objects objects (think of values) when you can.
First you need to know if gVector3 fulfils requirements of standard containers, namely if the type gVector3 copyable and assignable. It is useful if gVector3 is default constructible as well (see UPDATE note below).
Assuming it does, then you have two choices, store objects of gVector3 directly in std::vector
std::vector<gVector3> points;
points.push_back(gVector(1, 2, 3)); // std::vector will make a copy of passed object
or manage creation (and also destruction) of gVector3 objects manually.
std::vector points;
points.push_back(new gVector3(1, 2, 3));
//...
When the points array is no longer needed, remember to talk through all elements and call delete operator on it.
Now, it's your choice if you can manipulate gVector3 as objects (you can assume to think of them as values or value objects) because (if, see condition above) thanks to availability of copy constructor and assignment operator the following operations are possible:
gVector3 v1(1, 2, 3);
gVector3 v2;
v2 = v1; // assignment
gVector3 v3(v2); // copy construction
or you may want or need to allocate objects of gVector3 in dynamic storage using new operator. Meaning, you may want or need to manage lifetime of those objects on your own.
By the way, you may be also wondering When should I use references, and when should I use pointers?
UPDATE: Here is explanation to the note on default constructibility. Thanks to Neil for pointing that it was initially unclear. As Neil correctly noticed, it is not required by C++ standard, however I pointed on this feature because it is an important and useful one. If type T is not default constructible, what is not required by the C++ standard, then user should be aware of potential problems which I try to illustrate below:
#include <vector>
struct T
{
int i;
T(int i) : i(i) {}
};
int main()
{
// Request vector of 10 elements
std::vector<T> v(10); // Compilation error about missing T::T() function/ctor
}
You can use pointers or objects - it's really the same at the end of the day.
If you have a pointer, you'll need to allocate space for the actual object (then point to it) any way. At the end of the day, if you have a million objects regardless of whether you are storing pointers or the objects themselves, you'll have the space for a million objects allocated in the memory.
When to use pointers instead? If you need to pass the objects themselves around, modify individual elements after they are in the data structure without having to retrieve them each and every time, or if you're using a custom memory manager to manage the allocation, deallocation, and cleanup of the objects.
Putting the objects themselves in the STL structure is easier and simpler. It requires less * and -> operators which you may find to be difficult to comprehend. Certain STL objects would need to have the objects themselves present instead of pointers in their default format (i.e. hashtables that need to hash the entry - and you want to hash the object, not the pointer to it) but you can always work around that by overriding functions, etc.
Bottom line: use pointers when it makes sense to. Use objects otherwise.
Normally you use objects.
Its easier to eat an apple than an apple on a stick (OK 2 meter stick because I like candy apples).
In this case just make it a vector<gVector3>
If you had a vector<g3Vector*> this implies that you are dynamically allocating new objects of g3Vector (using the new operator). If so then you need to call delete on these pointers at some point and std::Vector is not designed to do that.
But every rule is an exception.
If g3Vector is a huge object that costs a lot to copy (hard to tell read your documentation) then it may be more effecient to store as a pointer. But in this case I would use the boost::ptr_vector<g3Vector> as this automatically manages the life span of the object.