I came across an issue with std::vector<T>, where T is a built-in type: a trait check reported that the vector is not trivially copyable.
I was wondering if it's right and am looking for the reason.
Formally, a std::vector<T> (for any T) is not trivially copyable because its copy constructor is not trivial, if only because it's user-provided (as opposed to implicitly-defined).
Practically speaking, copying a vector involves more than making a shallow copy of its data members - it requires allocating a memory buffer on the heap and copying over its contents from another vector's heap-allocated buffer.
A vector grows as data is added to it. This means that one does not need to know upfront how much space is needed to store all its data. The vector solves this problem by allocating (and reallocating) a separate storage buffer on the heap. This buffer is managed internally while providing an interface that can be thought of as a variably sized array.
Now if an object is trivially copyable, one should be able to copy/clone the object simply using memcpy(dest, &a, sizeof(a)). If one were to do this for a vector, one would have 2 vector objects pointing to the same storage buffer. This would result in horrible undefined behaviour. Copying a vector therefore requires that one duplicate the internal storage, duplicate its parameters and then set the internal pointer to point to the correct storage buffer. This requires internal knowledge of the object to do.
std::array however, has a static size set at compile time. It does not have internal pointers and can therefore be copied simply by using memcpy. It therefore is trivial to copy.
Related
I'm reading about mutable_buffer and it says
The mutable_buffer class provides a safe representation of a buffer
that can be modified. It does not own the underlying data, and so is
cheap to copy or assign.
By copy I think it means copying data using memcpy. What does assign mean?
Also, if I have a pointer to some data, can't I simply make the mutable_buffer point to this data instead of copying it? Assuming, of course, that the sizes of both are consistent.
If you look at more context you can see that the boost buffers are constructed from existing data aggregates like arrays, std::arrays, boost::arrays or std::vectors. These are the owners of the data, meaning they are responsible for allocation and deletion.
The mutable_buffer class, by contrast, just points to the data provided by one of the mentioned containers. It does not acquire the data when it is created and does not delete it when it is destroyed; this is what's meant by "it does not own the data".
Because it consists of just a pointer and an integral size, and could not care less about the data it points to, it is cheap to create, copy, assign and destroy. (But obviously care must be taken that the data pointed to is still valid - that's the difference from, e.g., a std::vector, which automagically takes care of that no matter when and how often you copy, create and destroy it. The downside is that copying a vector copies all the data.)
What I have read says that a common approach to making a vector of pointers that owns its pointers (to MyObject, for example, for simple uses) is vector<unique_ptr<MyObject>>.
But each access to an element goes through unique_ptr::get(), which adds a little overhead.
Why isn't a vector of pointers with a "custom deleter", if such a thing exists (I haven't used allocators), more standard? That is, a smart vector instead of a vector of smart pointers. It would eliminate the little overhead of using unique_ptr::get().
Something like vector<MyObject*, delete_on_destroy_allocator<MyObject>> or unique_vector<MyObject>.
The vector would take on the behaviour "delete the pointer on destruction" instead of duplicating this behaviour in each unique_ptr. Is there a reason, or is the overhead just negligible?
Why isn't vector of pointer with "custom deleter", if such a thing exists
Because such a thing doesn't exist and cannot exist.
The allocator supplied to a container exists to allocate memory for the container and (optionally) creates/destroys the objects in that container. A vector<T*> is a container of pointers; therefore, the allocator allocates memory for the pointer and (optionally) creates/destroys the pointers. It is not responsible for the content of the pointer: the object it points to. That is the domain of the user to provide and manage.
If an allocator takes responsibility for destroying the object being pointed to, then it must logically also have responsibility for creating the object being pointed to, yes? After all, if it didn't, and we copied such a vector<T*, owning_allocator>, each copy would expect to destroy the objects being pointed to. But since they're pointing to the same objects (copying a vector<T> copies the Ts), you get a double destroy.
Therefore, if owning_allocator::destruct is going to delete the memory, owning_allocator::construct must also create the object being pointed to.
So... what does this do:
vector<T*, owning_allocator> vec;
vec.push_back(new T());
See the problem? allocator::construct cannot decide when to create a T and when not to. It doesn't know if it's being called because of a vector copy operation or because push_back is being called with a user-created T*. All it knows is that it is being called with a T* value (technically a reference to a T*, but that's irrelevant, since it will be called with such a reference in both cases).
Therefore, either it 1) allocates a new object (initialized via a copy from the pointer it is given), or 2) it copies the pointer value. And since it cannot detect which situation is in play, it must always pick the same option. If it does #1, then the above code is a memory leak, because the vector didn't store the new T(), and nobody else deleted it. If it does #2, then you can't copy such a vector (and the story for internal vector reallocation is equally hazy).
What you want is not possible.
A vector<T> is a container of Ts, whatever T may be. It treats T as whatever it is; any meaning of this value is up to the user. And ownership semantics are part of that meaning.
T* has no ownership semantics, so vector<T*> also has no ownership semantics. unique_ptr<T> has ownership semantics, so vector<unique_ptr<T>> also has ownership semantics.
This is why Boost has ptr_vector<T>, which is explicitly a vector-style class that specifically contains pointers to Ts. It has a slightly modified interface because of this; if you hand it a T*, it knows it is adopting the T* and will destroy it. If you hand it a T, then it allocates a new T and copies/moves the value into the newly allocated T. This is a different container, with a different interface, and different behavior; therefore, it merits a different type from vector<T*>.
Neither a vector of unique_ptrs nor a vector of plain pointers is the preferred way to store data. In your example, std::vector<MyObject> is usually just fine, and if you know the size at compile time, try std::array<MyObject, N>.
If you absolutely need indirect references, you can also consider std::vector<std::reference_wrapper<MyObject>>. Read about reference wrappers here.
Having said that... if you:
Need to store your vector somewhere else than your actual data, or
If MyObjects are very large / expensive to move, or
If construction or destruction of MyObjects has real-world side-effects which you want to avoid;
and, additionally, you want each MyObject to be freed when it's no longer referred to from the vector - then a vector of unique pointers is relevant.
Now, pointers are just a plain and simple data type inherited from the C language; they don't have custom deleters or custom anything... but std::unique_ptr does support custom deleters. Also, you may have more complex resource-management needs for which it doesn't make sense to have each element manage its own allocation and deallocation, in which case a "smart" vector class may be relevant.
So: Different data structures fit different scenarios.
I was trying to use a user-defined object with std::vector. I read that for user-defined classes, one can insert objects into an STL container only if the copy constructor and assignment operator are public. This is for two reasons:
All STL containers always store a copy of the inserted objects, not the actual one. So whenever we insert an element or object into a container, its copy constructor is called to create a copy, and this copy is inserted into the container.
While inserting into a std::vector, storage relocation may take place internally due to insufficient space. In such cases the assignment operator will be called on objects inside the container to copy them from one location to another.

Why do all STL containers always store a copy of the inserted objects, not the actual one? I couldn't understand why they didn't allow storing the actual object. What would be the disadvantage?
The standard containers in C++ allocate memory that they manage. If your program creates an object, then that object is in another memory place, so to be part of the container, a copy is made in the memory of the container.
Instead of copying, moving could have been done, but in a lot of cases, that would not be more efficient and sometimes it could even be quite inconvenient.
A good solution to avoid copying is to create the object directly in the container with the emplace-functions.
About the vector growing, because it is possible that the new vector has to be at a different memory address and the memory contains the objects, they have to be moved or copied. This answer shows how you can make the vector move upon resizing.
Assume that a vector internally contains an array of std::aligned_storage instances, each of which actually holds an element of the type the vector is templated on when that slot is in use.
When the vector has to allocate a new block of memory and move all its elements, why does it invoke the move constructor of each element in use and then destroy the old element? Why not just copy over all the bytes byte by byte and just delete the old array without calling destructors? This would make the new array an exact copy of the old array without the overhead of moving and destroying elements.
Maybe I'm a little tired and am missing something really basic. But I cannot think of a reason why this would not work.
It would not be safe to merely copy the bytes. Imagine, for example, that your object has two members, p and d, and p is a pointer that points to d. If you just copy the bytes, you'd copy the value of p that points to the old location of d, which has been destroyed.
This is a simple example, but in general, the reason for C++ constructors, destructors, copy and move constructors, is to allow your object to be "smarter" than just a sequence of bytes would be. Member variables have meaning, and that meaning is understood by your code, not by the compiler.
I've had some experience in C++ from schoolwork. I've learned, among other things, that objects should be stored in a container (vector, map, etc.) as pointers. The main reason given was that we need the new operator, along with a copy constructor, in order to create a copy of the object on the heap (otherwise called dynamic memory). This method also necessitates defining a destructor.
However, from what I've read since then, it seems that STL containers already store the values they contain on the heap. Thus, if I were to store my objects as values, a copy (using the copy constructor) would be made on the heap anyway, and there would be no need to define a destructor. All in all, a copy on the heap would be made anyway???
Also, if that's true, then the only other reason I can think of for storing objects using pointers would be to alleviate resource needs for copying the container, as pointers are easier to copy than whole objects. However, this would require the use of std::shared_ptr instead of regular pointers, since you don't want elements in the copied container to be deleted when the original container is destroyed. This method would also alleviate the need for defining a destructor, wouldn't it?
Edit : The destructor to be defined would be for the class using the container, not for the class of the objects stored.
Edit 2 : I guess a more precise question would be : "Does it make a difference to store objects as pointers using the new-operator, as opposed to plain values, on a memory and resources used standpoint?"
The main reason to avoid storing full objects in containers (rather than pointers) is because copying or moving those objects is expensive. In that case, the recommended alternative is to store smart pointers in the container.
So...
vector<something_t> ................. Usually perfectly OK
vector<shared_ptr<something_t>> ..... Preferred if you want pointers
vector<something_t*> ................ Usually best avoided
The problem with raw pointers is that, when a raw pointer disappears, the object it points to hangs around causing memory and resource leaks - unless you've explicitly deleted it. C++ doesn't have garbage collection, and when a pointer is discarded, there's no way to know if other pointers may still be pointing to that object.
Raw pointers are a low-level tool - mostly used to write libraries such as vector and shared_ptr. Smart pointers are a high-level tool.
However, particularly with C++11 move semantics, the cost of moving items around in a vector is normally very small even for huge objects. For example, a vector<string> is fine even if all the strings are megabytes long. You mostly worry about the cost of moving objects if sizeof(classname) is big - if the object holds lots of data inside itself rather than in separate heap-allocated memory.
Even then, you don't always worry about the cost of moving objects. It doesn't matter that moving an object is expensive if you never move it. For example, a map doesn't need to move items around much. When you insert and delete items, the nodes (and contained items) stay where they are, it's just the pointers that link the nodes that change.