Create std::vector in-place from raw data

Given a raw array of elements, how to create a std::vector that takes ownership of the raw array without reallocate & copy?
For example, having the raw array:
int* elems = new int[33];
how to create a std::vector of size 33 pointing to elems?
I am sure that theoretically this is possible, as usually std::vector is implemented as a structure containing three pointers, one pointing to the beginning of the allocated memory, one to the end of the valid elements, and one to the end of the allocated memory. But is there a standard way of initializing the std::vector structure with the raw array?

What you need is a "view" rather than a container. Containers own their elements, and their main purpose is to encapsulate the raw memory they manage. If you need to manage the memory yourself, then you don't need a container. Take a look at string_view, which would be your solution if you had a string. Perhaps Boost ranges are something that you could apply. From the docs (emphasis mine):
The motivation for the Range concept is that there are many useful
Container-like types that do not meet the full requirements of
Container, and many algorithms that can be written with this reduced
set of requirements. In particular, a Range does not necessarily
own the elements that can be accessed through it,
have copy semantics,
PS: Actually std::array_view was considered for C++17, but unfortunately it didn't make it into that standard. Its successor, std::span, was added in C++20.
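To make the "view" idea concrete, here is a minimal sketch of a non-owning view. The names ArrayView and sum are made up for illustration and are not standard types or functions; in C++20, std::span plays this role.

```cpp
#include <cstddef>

// Hypothetical minimal non-owning view over a contiguous array.
// It never allocates or frees; the caller keeps ownership of the memory.
template <typename T>
class ArrayView {
public:
    ArrayView(T* data, std::size_t size) : data_(data), size_(size) {}

    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) const { return data_[i]; }

private:
    T* data_;
    std::size_t size_;
};

// Example use: sum the elements through the view without copying them.
inline long sum(ArrayView<int> view) {
    long total = 0;
    for (int value : view) total += value;
    return total;
}
```

With the array from the question, ArrayView<int> view(elems, 33) gives range-for and size() over elems, and the caller still has to delete[] elems once the view is no longer used.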

The reason why this is not directly possible is that the standard library uses allocators to reserve memory for the containers.
Therefore if you have an std::vector which uses a certain type of allocator and give it some pointer you have created, you effectively break the allocator idiom. If your implementation of the standard library for example uses malloc and free instead of new and delete, your program will fail.
For this to become a standard way, the standard library would need to provide a constructor that accepts a T* which additionally must have been returned by the same allocator the vector uses later. So the signature of the constructor you need would be something like std::vector::vector(T* data, size_type used_elements, size_type capacity, const Allocator& alloc). Notice that the allocator argument is necessary as the T* must (theoretically) have been returned by the exact same allocator that is used in the vector.
You can achieve some of this functionality by creating your own allocator according to this concept, but to keep your 33 elements from being reconstructed you will also have to provide an allocator::construct(..) function that is a no-op for the first 33 elements. Additionally, you will have to initially resize your vector to 33 elements to force the vector to have the correct size.
That being said, this is nevertheless a bad idea, because the conditional construct and allocate functions will probably cost you more overhead than copying your elements once.

According to this, there is no constructor that accepts a pointer to data. So you can't pass ownership of a raw array to a vector.
You can only create a vector and place data into it.
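A minimal sketch of "create a vector and place data into it", using the iterator-range constructor. The helper name copy_to_vector is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Copies `size` elements from a raw array into a new vector.
// The vector owns its own copy; the caller still owns (and must free) `data`.
template <typename T>
std::vector<T> copy_to_vector(const T* data, std::size_t size) {
    return std::vector<T>(data, data + size);  // iterator-range constructor
}
```

After std::vector<int> v = copy_to_vector(elems, 33); the raw array can be released with delete[] elems, since the vector no longer depends on it.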

If the type of the object you are working on is movable, you can do this:
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Moves each element of the raw array into its own heap allocation.
template<typename T>
std::vector<std::unique_ptr<T>> ConvertArrayToVector(T* data, std::size_t size)
{
    std::vector<std::unique_ptr<T>> result(size);
    for (std::size_t i = 0; i < size; ++i)
        result[i] = std::make_unique<T>(std::move(data[i])); // requires C++14
    return result;
}
The resulting vector owns the elements now, in the sense that it stores pointers to them and makes sure the objects are deleted when the vector is destroyed. The elements of the original array are left in a moved-from state, and the array itself must still be freed with delete[].

Given a raw array of elements, how to create a std::vector that
takes ownership of the raw array without reallocate & copy?
There is no way.
how to create a std::vector of size 33 pointing to elems?
Impossible.
I am sure that theoretically this is possible,
No, it isn't.
But is there a standard way of initializing the std::vector structure with the raw array?
No.
That being said, chances are that you may be able to hack together a solution with a custom allocator. However, in addition to the fact that writing custom allocators is a rarely-used error-prone technique, you shouldn't overestimate the usability of such a solution.
std::vector<int> and std::vector<int, MyAllocator> are two different classes. If your goal is to interface with code that expects std::vector<int>, then std::vector<int, MyAllocator> cannot be used; and if you intend to create and use std::vector<int, MyAllocator> in your code, then honestly you'd be better off just implementing your own non-owning container class, i.e. something like a custom VectorView<T>.
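To illustrate the point about the two vector types being unrelated, here is a small sketch. MyAllocator below is a hypothetical do-nothing-special allocator that just forwards to std::allocator, and element_count is a made-up stand-in for "code that expects std::vector<int>".

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical allocator that only forwards to std::allocator.
template <typename T>
struct MyAllocator {
    using value_type = T;
    MyAllocator() = default;
    template <typename U>
    MyAllocator(const MyAllocator<U>&) {}
    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) { return false; }

// The allocator is part of the vector's type, so these are unrelated classes.
static_assert(!std::is_same<std::vector<int>,
                            std::vector<int, MyAllocator<int>>>::value,
              "different allocator, different type");

// A function written against std::vector<int> ...
inline std::size_t element_count(const std::vector<int>& v) { return v.size(); }
// ... cannot be called with a std::vector<int, MyAllocator<int>>;
// passing one would not compile. The elements have to be copied across first.
```

The only way to hand the custom-allocator vector to element_count is to build a std::vector<int> from its begin()/end(), which is exactly the copy the question wanted to avoid.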

Related

Why is vector of unique_ptr the preferred way to store pointers?

From what I have read, a common approach to making a vector that owns its pointers (to MyObject, for example, for simple uses) is vector<unique_ptr<MyObject>>.
But each time we access an element, unique_ptr::get() is called. There is also a little overhead.
Why isn't a vector of pointers with a "custom deleter", if such a thing exists (I haven't used allocators), more standard? That is, a smart vector instead of a vector of smart pointers. It would eliminate the little overhead of using unique_ptr::get().
Something like vector<MyObject*, delete_on_destroy_allocator<MyObject>> or unique_vector<MyObject>.
The vector would take on the "delete pointer on destruction" behaviour instead of duplicating this behaviour in each unique_ptr. Is there a reason, or is the overhead just negligible?

Why isn't vector of pointer with "custom deleter", if such a thing exists
Because such a thing doesn't exist and cannot exist.
The allocator supplied to a container exists to allocate memory for the container and (optionally) creates/destroys the objects in that container. A vector<T*> is a container of pointers; therefore, the allocator allocates memory for the pointer and (optionally) creates/destroys the pointers. It is not responsible for the content of the pointer: the object it points to. That is the domain of the user to provide and manage.
If an allocator takes responsibility for destroying the object being pointed to, then it must logically also have responsibility for creating the object being pointed to, yes? After all, if it didn't, and we copied such a vector<T*, owning_allocator>, each copy would expect to destroy the objects being pointed to. But since they're pointing to the same objects (copying a vector<T> copies the Ts), you get a double destroy.
Therefore, if owning_allocator::destroy is going to delete the memory, owning_allocator::construct must also create the object being pointed to.
So... what does this do:
vector<T*, owning_allocator> vec;
vec.push_back(new T());
See the problem? allocator::construct cannot decide when to create a T and when not to. It doesn't know if it's being called because of a vector copy operation or because push_back is being called with a user-created T*. All it knows is that it is being called with a T* value (technically a reference to a T*, but that's irrelevant, since it will be called with such a reference in both cases).
Therefore, either it 1) allocates a new object (initialized via a copy from the pointer it is given), or 2) it copies the pointer value. And since it cannot detect which situation is in play, it must always pick the same option. If it does #1, then the above code is a memory leak, because the vector didn't store the new T(), and nobody else deleted it. If it does #2, then you can't copy such a vector (and the story for internal vector reallocation is equally hazy).
What you want is not possible.
A vector<T> is a container of Ts, whatever T may be. It treats T as whatever it is; any meaning of this value is up to the user. And ownership semantics are part of that meaning.
T* has no ownership semantics, so vector<T*> also has no ownership semantics. unique_ptr<T> has ownership semantics, so vector<unique_ptr<T>> also has ownership semantics.
This is why Boost has ptr_vector<T>, which is explicitly a vector-style class that specifically contains pointers to Ts. It has a slightly modified interface because of this; if you hand it a T*, it knows it is adopting the T* and will destroy it. If you hand it a T, then it allocates a new T and copies/moves the value into the newly allocated T. This is a different container, with a different interface, and different behavior; therefore, it merits a different type from vector<T*>.
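The difference in ownership semantics can be made visible with a small sketch. Tracked is a hypothetical type that counts live instances; the two function names are made up for this example.

```cpp
#include <memory>
#include <vector>

// Hypothetical type that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// vector<T*> copies and destroys only the pointers, never the pointees.
inline int alive_after_raw_vectors() {
    Tracked* object = new Tracked();
    {
        std::vector<Tracked*> raw;
        raw.push_back(object);
        std::vector<Tracked*> copy = raw;  // both vectors point at the same object
    }                                      // vectors gone, object still alive
    int still_alive = Tracked::alive;
    delete object;  // the user has to clean up, exactly once
    return still_alive;
}

// vector<unique_ptr<T>> destroys each pointee when the vector is destroyed.
inline int alive_after_owning_vector() {
    {
        std::vector<std::unique_ptr<Tracked>> owning;
        owning.push_back(std::unique_ptr<Tracked>(new Tracked()));
    }
    return Tracked::alive;
}
```

If both raw vectors tried to delete their elements on destruction, the shared object would be destroyed twice, which is the double destroy described above.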

Neither a vector of unique_ptrs nor a vector of plain pointers is the preferred way to store data. In your example, std::vector<MyObject> is usually just fine, and if you know the size at compile time, try std::array<MyObject, N>.
If you absolutely need indirect references, you can also consider std::vector<std::reference_wrapper<MyObject>>.
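A short sketch of the reference_wrapper option. MyObject here is a stand-in with a single int member, and bump_all is a made-up helper; the objects must outlive the vector that refers to them.

```cpp
#include <functional>
#include <vector>

// Stand-in for the question's MyObject.
struct MyObject {
    int value;
};

// Adds `delta` to every referenced object.
// The vector holds references to existing objects, not copies of them.
inline void bump_all(std::vector<std::reference_wrapper<MyObject>>& refs, int delta) {
    for (MyObject& object : refs) object.value += delta;
}
```

Elements are added with refs.push_back(std::ref(a)), and changes made through the vector are visible in the original a.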
Having said that... if you:
Need to store your vector somewhere else than your actual data, or
If MyObjects are very large / expensive to move, or
If construction or destruction of MyObjects has real-world side-effects which you want to avoid;
and, additionally, you want your MyObject to be freed when it is no longer referred to, i.e. when the vector is gone, then the vector of unique pointers is relevant.
Now, pointers are just a plain and simple data type inherited from the C language; they don't have custom deleters or custom anything... but std::unique_ptr does support custom deleters. Also, it may be the case that you have more complex resource management needs for which it doesn't make sense to have each element manage its own allocation and de-allocation, in which case a "smart" vector class may be relevant.
So: Different data structures fit different scenarios.

Can a std::array<T, N> release its data?

Consider as an example the unique_ptr and its release method that returns a pointer to the managed object and releases the ownership.
Is there any way to release the ownership of the underlying array of a std::array?
Ok, one could use a std::unique_ptr instead of a std::array and that's all. Anyway, the latter has a few nice features like the size member method that are useful sometimes.

Is there any way to release the ownership of the underlying array of a std::array?
No. A std::array is just a wrapper for a raw array. It can be reassigned but that is actually a copy operation of all the elements in the array. The destination array does not point to the source array.
You should also note that a std::array and a std::unique_ptr<type[]> are different in that the std::array size must be known at compile time, whereas the std::unique_ptr<type[]> size can be set at run time. All std::unique_ptr<type[]> really does is wrap a type * name = new type[some_size].
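A small sketch of the contrast, with made-up function names: assigning a std::array copies the elements, while a std::unique_ptr<int[]> can actually give up its buffer through release().

```cpp
#include <array>
#include <memory>

// Assigning a std::array copies every element; the two arrays never share storage.
inline bool arrays_share_storage() {
    std::array<int, 3> source = {{1, 2, 3}};
    std::array<int, 3> destination = {{0, 0, 0}};
    destination = source;
    return destination.data() == source.data();
}

// unique_ptr<int[]> can hand its buffer over; the caller then owns the raw
// pointer and is responsible for calling delete[] on it.
inline int* release_buffer(std::unique_ptr<int[]>& owner) {
    return owner.release();
}
```

After release_buffer(owner), owner holds a null pointer, which is the behaviour the question was hoping to get from std::array.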

Nope, a std::array is just a simple wrapper around a native array: the elements live inside the object itself (on the stack, for a local variable), so it cannot release its contents; they are destroyed automatically when the array goes out of scope.
You should consider using a std::vector instead, since you are already dealing with an array on the heap. You can then std::move the vector into another one to "transfer" ownership of the contents. For example:
another_vec = std::move(old_vec); // now another_vec has the contents
Note: If you use a unique_ptr, the array you are getting is on the heap and not on the stack! So you might be better off using a std::vector and its data() function instead. But I am not completely sure of your use case.
Another note: Something that is not obvious when thinking about using a std::array is that the type is a heavyweight object. Moving it moves every element individually, so the regular rvalue optimizations might not work as well, since it is not as trivial to move as a vector.
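To show what "transferring" the contents of a vector means in practice, here is a small sketch (the function name is made up). It uses move construction, after which pointers to the elements stay valid and refer to the new vector.

```cpp
#include <utility>
#include <vector>

// Returns true if the moved-to vector reuses the source's heap buffer
// instead of allocating a new one and copying the elements.
inline bool move_reuses_buffer() {
    std::vector<int> old_vec(33, 7);
    const int* buffer = old_vec.data();
    std::vector<int> another_vec = std::move(old_vec);  // move construction
    return another_vec.data() == buffer && another_vec.size() == 33;
}
```

A std::array cannot do this, because its elements are stored inside the object itself rather than behind a pointer.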

«releasing» a std::array basically means calling the destructor and using the memory for something else.
You can explicitly destroy an explicitly constructed std::array by using the std::*::destroy functionality found in the standard library or calling the destructor explicitly.
This is something you usually want to avoid, unless you are implementing very basic data structures where you have no other choice. One use case is if you want to control when and how you construct and destruct a global array without an indirection through a pointer.
C++17 provides the new functions std::destroy_at, std::destroy and std::destroy_n to explicitly destruct objects.
See also the C++ FAQ on destructors.

Is it a bad idea to replace POD C-style array with std::valarray?

I'm working with a code base that is poorly written and has a lot of memory leaks.
It uses a lot of structs that contains raw pointers, which are mostly used as dynamic arrays.
Although the structs are often passed between functions, the allocation and deallocation of those pointers are placed at random places and cannot be easily tracked/reasoned/understood.
I changed some of them to classes and made the classes themselves manage those pointers via RAII. They work well and don't look very ugly, except that I banned copy-construction and copy-assignment of those classes simply because I don't want to spend time implementing them.
Now I'm thinking, am I re-inventing the wheel? Why don't I replace the C-style arrays with std::array or std::valarray?
I would prefer std::valarray because it uses heap memory and is RAII-managed. And std::array is not (yet) available in my development environment.
Edit1: Another plus of std::valarray is that the majority of those dynamic arrays are POD (mostly int16_t, int32_t, and float) arrays, and the numeric API can possibly make life easier.
Is there anything that I need to be aware of before I start?
One I can think of is that there might not be an easy way to convert std::valarray or std::array back to C-style arrays, and part of our code does use pointer arithmetic and needs data to be presented as plain C-style arrays.
Anything else?
EDIT 2
I came across this question recently. A VERY BAD thing about std::valarray is that it's not safely copy-assignable until C++11.
As is quoted in that answer, in C++03 and earlier, it's UB if source and destination are of different sizes.

The standard replacement of C-style array would be std::vector. std::valarray is some "weird" math-vector for doing number-calculation-like stuff. It is not really designed to store an array of arbitrary objects.
That being said, using std::vector is most likely a very good idea. It would fix your leaks, use the heap, is resizable, has great exception-safety and so on.
It also guarantees that the data is stored in one contiguous block of memory. You can get a pointer to said block with the data() member function or, if you are pre-C++11, with &v[0] for a non-empty vector v. You can then do your pointer business with it as usual.
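As a sketch of "doing your pointer business as usual": scale_in_place below is a made-up stand-in for legacy code that takes a plain pointer and a count, and scaled_copy shows a vector being passed to it.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for legacy code that expects a plain C-style array
// and walks it with pointer arithmetic.
inline void scale_in_place(float* values, std::size_t count, float factor) {
    for (float* p = values; p != values + count; ++p) *p *= factor;
}

inline std::vector<float> scaled_copy(std::vector<float> samples, float factor) {
    if (!samples.empty()) {
        // data() (or &samples[0] before C++11) points at the contiguous storage.
        scale_in_place(samples.data(), samples.size(), factor);
    }
    return samples;
}
```

The pointer is only valid until the vector reallocates or is destroyed, so legacy code should not hold on to it across calls that may resize the vector.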

std::unique_ptr<int[]> is close to a drop-in replacement for an owning int*. It has the nice property that it will not implicitly copy itself, but it will implicitly move.
An operation that copies will generate compile time errors, instead of run time inefficiency.
It also has next to no run time overhead over that owning int* other than a null-check at destruction. It uses no more space than an int*.
std::vector<int> stores 3 pointers and implicitly copies (which can be expensive, and does not match your existing code behavior).
I would start with std::unique_ptr<int[]> as a first pass and get it working. I might transition some code over to std::vector<int> after I decide that intelligent buffer management is worth it.
Actually, as a first pass, I'd look for memcpy and memset and similar functions and make sure they aren't operating on the structures in question before I start adding RAII members.
A std::unique_ptr<int[]> means that the default created destructor for a struct will do the RAII cleanup for you without having to write any new code.
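A sketch of what that first pass might look like. Samples and make_samples are hypothetical names standing in for one of the legacy structs and its allocation site.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical legacy struct, with the owning raw pointer replaced.
struct Samples {
    std::unique_ptr<std::int16_t[]> data;  // was: int16_t* data;
    std::size_t count = 0;
    // No destructor needed: the implicitly generated one calls delete[].
};

inline Samples make_samples(std::size_t count) {
    Samples s;
    s.data.reset(new std::int16_t[count]());  // value-initialized to zero
    s.count = count;
    return s;
}
```

Code that needs the plain pointer can use s.data.get(). Samples b = std::move(a); compiles and transfers the buffer, while an accidental Samples b = a; is rejected at compile time.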

I would prefer std::vector as the replacement of c-style arrays. You can have a direct access to the underlying data (something like bare pointers) via .data():
Returns pointer to the underlying array serving as element storage.

vector<X*> vec vs vector<X>* vec

What is the difference in memory usage between:
std::vector<X*> vec
where each element is on the heap, but the vector itself isn't
and
std::vector<X>* vec
where the vector is declared on the heap, but each element is (on the stack?).
The second option doesn't make much sense- does it mean the vector pointer is on the heap, but it points back at each element, which are on the stack??

std::vector<X*> vec
is a vector of pointers to objects of class X. This is useful, for example, when making an array of non-copyable classes/objects like std::fstream in C++98. So
std::vector<std::fstream> vec;
is WRONG, and won't work. But
std::vector<std::fstream*> vec;
works, while you have to create a new object for each element, so for example if you want 5 fstream elements you have to write something like
vec.resize(5);
for(unsigned long i = 0; i < vec.size(); i++)
{
vec[i] = new std::fstream;
}
Of course, there are many other uses depending on your application.
Now the second case is a pointer to the vector itself. So:
vector<int>* vec;
is just a pointer! It doesn't hold a vector yet, and you can't use it until you create the vector object itself, like
vec = new vector<int>();
and eventually you may use it as:
vec->resize(5);
Now this is not really useful, since vectors store their data on the heap anyway and manage the memory they carry. So use it only if you have a good reason to, which you sometimes will. I don't have an example in mind of how it could be useful.

If this is what you really asked:
vector<X>* vec = new vector<X>();
it means that the whole vector with all its elements is on the heap. The elements occupy a contiguous memory block on the heap.

The difference is where (and what) you need to do for manual memory management.
Whenever you have a raw C-style pointer in C++, you need to do some manual memory management -- the raw pointer can point at anything, and the compiler won't do any automatic construction or destruction for you. So you need to be aware of where the pointer points and who 'owns' the memory pointed at in the rest of your code.
So when you have
std::vector<X*> vec;
you don't need to worry about memory management for the vector itself (the compiler will do it for you), but you do need to worry about the memory management of the pointed at X objects for the pointers you put in the vector. If you're allocating them with new, you need to make sure to manually delete them at some point.
When you have
std::vector<X> *vec;
You DO need to worry about memory management for the vector itself, but you DON'T need to worry about memory management for the individual elements.
Simplest is if you have:
std::vector<X> vec;
then you don't need to worry about memory management at all -- the compiler will take care of it for you.

In code using good modern C++ style, none of the above is true.
std::vector<X*> is a collection of handles to objects of type X or any of its subclasses, which you do not own. The owner knows how they were allocated and will deallocate them -- you don't know and don't care.
std::vector<X>* would in practice, only ever be used as a function argument which represents a vector you do not own (the caller does) but which you are going to modify. According to one common approach, the fact that it's a pointer rather than a vector means that it is optional. Much more rarely it might be used as a class member where the lifetime of the attached vector is known to outlive the class pointing to it.
std::vector<std::unique_ptr<X>> is a polymorphic collection of mixed objects of various subclasses of X (and maybe X itself directly). Occasionally you might use it non-polymorphically if X is expensive to move, but modern style makes most types cheap to move.
Prior to C++11, std::vector<some_smart_pointer<X> > (yes, there's a space between the closing brackets) would be used for both the polymorphic case and the non-copyable case. Note that some_smart_pointer isn't std::unique_ptr, which didn't exist yet, nor std::auto_ptr, which wasn't usable in collections. boost::unique_ptr was a good choice. With C++11, the copyability requirement for collection elements is relaxed to moveability, so this reason completely went away. (There remain some types which are neither copyable nor moveable, such as the ScopeGuard pattern, but these should not be stored in a collection anyway.)
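A compact sketch of the two modern roles described above. Shape, Circle, Square, describe and make_shapes are all made-up names for illustration.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

// Non-owning handles: this function borrows the objects and never deletes them.
inline std::string describe(const std::vector<const Shape*>& handles) {
    std::string result;
    for (const Shape* shape : handles) result += shape->name() + " ";
    return result;
}

// Owning polymorphic collection: each element deletes its object automatically.
inline std::vector<std::unique_ptr<Shape>> make_shapes() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Circle()));
    shapes.push_back(std::unique_ptr<Shape>(new Square()));
    return shapes;
}
```

The owner keeps the vector of unique_ptr, and anything that only needs to look at the objects gets raw pointers obtained with get(), with no delete anywhere in user code.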

Pointer to vector vs vector of pointers vs pointer to vector of pointers

Just wondering what you think is the best practice regarding vectors in C++.
If I have a class containing a vector member variable.
When should this vector be declared as:
1. "Whole-object" vector member variable containing values, i.e. vector<MyClass> my_vector;
2. Pointer to a vector, i.e. vector<MyClass>* my_vector;
3. Vector of pointers, i.e. vector<MyClass*> my_vector;
4. Pointer to vector of pointers, i.e. vector<MyClass*>* my_vector;
I have a specific example in one of my classes where I have currently declared a vector as case 4, i.e. vector<AnotherClass*>* my_vector;
where AnotherClass is another of the classes I have created.
Then, in the initialization list of my constructor, I create the vector using new:
MyClass::MyClass()
: my_vector(new vector<AnotherClass*>())
{}
In my destructor I do the following:
MyClass::~MyClass()
{
for (int i=my_vector->size(); i>0; i--)
{
delete my_vector->at(i-1);
}
delete my_vector;
}
The elements of the vectors are added in one of the methods of my class.
I cannot know how many objects will be added to my vector in advance. That is decided when the code executes, based on parsing an xml-file.
Is this good practice? Or should the vector instead be declared as one of the other cases 1, 2 or 3 ?
When to use which case?
I know the elements of a vector should be pointers if they are subclasses of another class (polymorphism). But should pointers be used in any other cases ?
Thank you very much!!

Usually solution 1 is what you want since it’s the simplest in C++: you don’t have to take care of managing the memory, C++ does all that for you (for example you wouldn’t need to provide any destructor then).
There are specific cases where this doesn’t work (most notably when working with polymorphic objects) but in general this is the only good way.
Even when working with polymorphic objects or when you need heap allocated objects (for whatever reason) raw pointers are almost never a good idea. Instead, use a smart pointer or container of smart pointers. std::shared_ptr and std::unique_ptr have been part of the standard library since C++11. If you’re using a compiler that doesn’t yet have them, you can use the implementation from Boost.

Definitely the first!
You use vector for its automatic memory management. Using a raw pointer to a vector means you don't get automatic memory management anymore, which does not make sense.
As for the value type: all containers basically assume value-like semantics. Again, you'd have to do memory management when using pointers, and it's vector's purpose to do that for you. This is also described in item 79 from the book C++ Coding Standards. If you need to use shared ownership or "weak" links, use the appropriate smart pointer instead.

Deleting all elements in a vector manually is an anti-pattern and violates the RAII idiom in C++. So if you have to store pointers to objects in a vector, better use a 'smart pointer' (for example boost::shared_ptr) to facilitate resource destructions. boost::shared_ptr for example calls delete automatically when the last reference to an object is destroyed.
There is also no need to allocate MyClass::my_vector using new. A simple solution would be:
class MyClass {
std::vector<whatever> m_vector;
};
Assuming whatever is a smart pointer type, there is no extra work to be done. That's it, all resources are automatically destroyed when the lifetime of a MyClass instance ends.
In many cases you can even use a plain std::vector<MyClass> - that's when the objects in the vector are safe to copy.
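Applied to the class from the question, this could look roughly like the sketch below. It swaps in std::shared_ptr for boost::shared_ptr (an assumption; the Boost version is used the same way here), and AnotherClass is given an instance counter purely to make the cleanup visible. The add and size member functions are made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type that counts live instances.
struct AnotherClass {
    static int alive;
    AnotherClass() { ++alive; }
    ~AnotherClass() { --alive; }
};
int AnotherClass::alive = 0;

class MyClass {
public:
    // Called while parsing; the number of elements is only known at run time.
    void add() { my_vector.push_back(std::make_shared<AnotherClass>()); }
    std::size_t size() const { return my_vector.size(); }
    // No constructor, no destructor, no new/delete:
    // the members clean up after themselves.
private:
    std::vector<std::shared_ptr<AnotherClass>> my_vector;
};
```

When a MyClass instance goes out of scope, the vector is destroyed, each shared_ptr releases its object, and the hand-written loop in the destructor is no longer needed.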

In your example, the vector is created when the object is created, and it is destroyed when the object is destroyed. This is exactly the behavior you get when making the vector a normal member of the class.
Also, in your current approach, you will run into problems when making copies of your object. By default, a pointer would result in a flat copy, meaning all copies of the object would share the same vector. This is the reason why, if you manually manage resources, you usually need The Big Three.
A vector of pointers is useful in cases of polymorphic objects, but there are alternatives you should consider:
If the vector owns the objects (that means their lifetime is bounded by that of the vector), you could use a boost::ptr_vector.
If the objects are not owned by the vector, you could either use a vector of boost::shared_ptr, or a vector of boost::ref.

A pointer to a vector is very rarely useful - a vector is cheap to construct and destruct.
For elements in the vector, there's no correct answer. How often does the vector change? How much does it cost to copy-construct the elements in the vector? Do other containers have references or pointers to the vector elements?
As a rule of thumb, I'd go with no pointers until you see or measure that the copying of your classes is expensive. And of course the case you mentioned, where you store various subclasses of a base class in the vector, will require pointers.
A reference counting smart pointer like boost::shared_ptr will likely be the best choice if your design would otherwise require you to use pointers as vector elements.

Complex answer: it depends.
If your vector is shared or has a lifecycle different from the class which embeds it, it might be better to keep it as a pointer.
If the objects you're referencing have no (or have expensive) copy constructors, then it's better to keep a vector of pointers. On the contrary, if your objects use shallow copy, using a vector of objects prevents you from leaking...
