Is there a way to transfer ownership of the data contained in a std::vector (pointed to by, say T*data) into another construct, preventing having "data" become a dangling pointer after the vector goes out of scope?
EDIT: I DON'T WANT TO COPY THE DATA (which would be an easy but ineffective solution).
Specifically, I'd like to have something like:
template<typename T>
T* transfer_ownership(vector<T>&v){
T*data=&v[0];
v.clear();
...//<--I'd like to make v's capacity 0 without freeing data
}
int main(){
T*data=NULL;
{
vector<double>v;
...//grow v dynamically
data=transfer_ownership<double>(v);
}
...//do something useful with data (user responsible for freeing it later)
// for example mxSetData(mxArray*A,double*data) from matlab's C interface
}
The only thing that comes to my mind to emulate this is:
{
vector<double>*v=new vector<double>();
//grow *v...
data=(*v)[0];
}
and then data will later either be freed or (in my case) used as mxSetData(mxArrayA,doubledata). However this results in a small memory leak (data struct for handling v's capacity, size, etc... but not the data itself of course).
Is it possible without leaking ?
A simple workaround would be swapping the vector with one you own:
vector<double> myown;
vector<double> someoneelses = foo();
std::swap( myown, someoneelses );
A tougher but maybe better approach is write your own allocator for the vector, and let it allocate out of a pool you maintain. No personal experience, but it's not too complicated.
The point of using a std::vector is not to have to worry about the data in it:
Keep your vector all along your application;
Pass it by const-ref to other functions (to avoid unnecessary copies);
And feed functions expecting a pointer-to-T with &v[0].
If you really don't want to keep your vector, you will have to copy your data -- you can't transfer ownership because std::vector guarantees it will destroy its content when going out-of-scope. In that case, use the std::copy() algorithm.
If your vector contains values you can only copy them (which happens when you call std::copy, std::swap, etc.). If you keep non-primitive objects in a vector and don't want to copy them (and use in another data structure), consider storing pointers
Does something like this work for you?
int main()
{
double *data = 0;
{
vector<double> foo;
// insert some elements to foo
data = new double[foo.size()];
std::copy(foo.begin(), foo.end(), &data[0]);
}
// Pass data to Matlab function.
delete [] data;
return 0;
}
Since you don't want to copy data between containers, but want to transfer ownership of data between containers, I suggest using a container of smart pointers as follows.
void f()
{
std::vector<boost::shared_ptr<double> > doubles;
InitVector(doubles);
std::vector<boost::shared_ptr<double> > newDoubles(doubles);
}
You really can't transfer ownership of data between standard containers without making a copy of it, since standard containers always copy the data they encapsulate. If you want to minimize the overhead of copying expensive objects, then it is a good idea to use a reference-counted smart pointer to wrap your expensive data structure. boost::shared_ptr is suitable for this task since it is fairly cheap to make a copy of it.
Related
I am reading a lot of different things on C++ optimization and I am getting quite mixed up. I would appreciate some help. Basically, I want to clear up what needs to be a pointer or not when I am passing vectors and structures as parameters or returning vectors and structures.
Say I have a Structure that contains 2 elements: an int and then a vector of integers. I will be creating this structure locally in a function, and then returning it. This function will be called multiple times and generate a new structure every time. I would like to keep the last structure created in a class member (lastStruct_ for example). So before returning the struct I could update lastStruct_ in some way.
Now, what would be the best way to do this, knowing that the vector in the structure can be quite large (would need to avoid copies). Does the vector in the struct need to be a pointer ? If I want to share lastStruct_ to other classes by creating a get_lastStruct() method, should I return a reference to lastStruct_, a pointer, or not care about that ? Should lastStruct_ be a shared pointer ?
This is quite confusing to me because apparently C++ knows how to avoid copying, but I also see a lot of people recommending the use of pointers while others say a pointer to a vector makes no sense at all.
struct MyStruct {
std::vector<int> pixels;
int foo;
}
class MyClass {
MyStruct lastStruct_;
public:
MyStruct create_struct();
MyStruct getLastStruct();
}
MyClass::create_struct()
{
MyStruct s = {std::vector<int>(100, 1), 1234};
lastStruct_ = s;
return s;
}
MyClass::getLastStruct()
{
return lastStruct_;
}
If the only copy you're trying to remove is the one that happen when you return it from your factory function, I'd say containing the vector directly will be faster all the time.
Why? Two things. Return Value Optimisation (RVO/NRVO) will remove any need for temporaries when returning. This is enough for almost all cases.
When return value optimisation don't apply, move semantics will. returning a named variable (eg: return my_struct;) will do implicit move in the case NRVO won't apply.
So why is it always faster than a shared pointer? Because when copying the shared pointer, you must dereference the control block to increase the owner count. And since it's an atomic operation, the incrementation is not free.
Also, using a shared pointer brings shared ownership and non-locality. If you were to use a shared pointer, use a pointer to const data to bring back value semantics.
Now that you added the code, it's much clearer what you're trying to do.
There's no way around the copy here. If you measure performance degradation, then containing a std::shared_ptr<const std::vector<int>> might be the solution, since you'll keep value semantic but avoid vector copy.
First of all, my motivation is to do efficient memory management on top of a C like computational kernel. And I tried to use the std::unique_ptr and std::vector, my code looks like below
// my data container
typedef std::unique_ptr<double> my_type;
std::vector<my_type> my_storage;
// when I need some memory for computation kernel
my_storage.push_back(my_type());
my_storage.back.reset(new double[some_length]);
// get pointer to do computational stuff
double *p_data=my_storage.back.get();
Notice here in practice p_data may be stored in some other container(e.g. map) to indexing each allocated array according to the domain problem, nevertheless, my main questions are
Here is std::vector a good choice? what about other container like std::list/set?
Is there fundamental problem with my allocation method?
Suppose after I use p_data for some operations, now I want to release the memory chunk pointed by the raw pointer p_data, what is the best practice here?
First of all, if you are allocating an array you need to use the specialization std::unique_ptr<T[]> or you won't get a delete [] on memory release but a simple delete.
std::vector is a good choice unless you have any explicit reason to use something different. For example, if you are going to move many elements inside the container then a std::list could perform better (less memmove operations to shift things around).
Regarding how to manage memory, it depends mainly on the pattern of utilization. If my_storage is mainly responsible for everything (which in your specification it is, since unique_ptr expresses ownership), it means that it will be the only one who can release memory. Which could be done simply by calling my_storage[i].reset().
Mind that storing raw pointers of managed objects inside other collections leads to dangling pointers if memory is released, for example:
using my_type = std::unique_ptr<double[]>;
using my_storage = std::vector<my_type>;
my_storage data;
data.push_back(my_type(new double[100]));
std::vector<double*> rawData;
rawData.push_back(data[0].get());
data.clear(); // delete [] is called on array and memory is released
*rawData[0] = 1.2; // accessing a dangling pointer -> bad
This could be a problem or not, if data is released by last then there are no problems, otherwise you could store const references to std::unique_ptr so that at least you'd be able to check if memory is still valid, e.g.:
using my_type = std::unique_ptr<double[]>;
using my_managed_type = std::reference_wrapper<const my_type>;
std::vector<my_managed_type> rawData;
Using std::unique_ptr with any STL container , including std::vector, is fine in general. But you are not using std::unique_ptr the correct way (you are not using the array specialized version of it), and you don't need to resort to using back.reset() at all. Try this instead:
// my data container
typedef std::unique_ptr<double[]> my_type;
// or: using my_type = std::unique_ptr<double[]>;
std::vector<my_type> my_storage;
my_type ptr(new double[some_length]);
my_storage.push_back(std::move(ptr));
// or: my_storage.push_back(my_type(new double[some_length]));
// or: my_storage.emplace_back(new double[some_length]);
Let's say my function:
vector<MyClass>* My_func(int a)
{
vector<MyClass>* ptr = new vector<MyClass>;
//...... Add a lot of elements to this vector, and let's say MyClass is also relatively big structure.
return ptr;
}
This method leaves responsibility for user to free the pointer.
Another method I can think of is just creating local variable in function and return the value:
vector<MyClass> My_func(int a)
{
vector<MyClass> vec;
//...... Add a lot of elements to this vector, and let's say MyClass is also relatively big structure.
return vec;
}
This one avoid the responsibility for user but may take a lot of space when return and copy the value.
Maybe smart pointer in C++ is a better choice but I am not sure. I did not use smart pointer before. What do people do when they come across this situation? What kind of return type will they choose?
Thanks ahead for your tips:-)
In most cases of this sort of construct, the compiler will do "Return Value Optimisation", which means that it's not actually copying the data structure being returned, but instead writing straight into one that lives on the place where it will be returned to.
So, you can safely do this without worrying about it being copied.
However, another method would be to not return a vector in the first, place, but request that the calling code pass one in:
So, something like:
void My_func(int a, vector<MyClass>& vec)
{
...
}
This is GUARANTEED to avoid copying.
In many situations return value optimizations easily can take care of the unnecessary copying. The question is: how are you planning to use this vector outside the function? If you have something like:
vector<MyClass> ret = My_func(a);
Then the optimizations can take care of the problem.
On the other hand, if you want to reuse an existing vector, you could pass a non-const reference to an existing vector, but there aren't many situations where this is needed or useful.
vector<MyClass> ret;
// do something with ret ...
My_func(a, ret);
Plus, this also changes the semantics of your function (e.g. you may need to clear() the vector).
Here is the internal structure of a vector
template <typename T>
class vector {
private:
size_t m_size;
size_t m_cap;
T * m_data;
public:
//methods push pop etc.
};
As you can the size of the vector (with 2 additional size_t data members) is not much larger than size of a pointer. There will be negligible performance benefit in passing a vector instead of pointer, infact using a pointer, accessing the vector will be slower as each time you will have to dereference the pointer. Generally, we don't create a pointer to a vector.
Also, never return a pointer to a local variable. The memory of the local variable will be wiped off once you return value & go out of the scope of the method. Ideally, you should create a vector in the calling function and pass a reference to the vector, when calling your method My_func.
I just read this post on SO, that discusses where in memory, STL vectors are stored. According to the accepted answer,
vector<int> temp;
the header info of the vector on the stack but the contents on the heap.
In that case, would the following code be erroneous?
vector<int> some_function() {
vector<int> some_vector;
some_vector.push_back(10);
some_vector.push_back(20);
return some_vector;
}
Should I have used vector<int> *some_vector = new vector<int> instead? Would the above code result in some code of memory allocation issues? Would this change if I used an instance of a custom class instead of int?
Your code is precisely fine.
Vectors manage all the memory they allocate for you.
It doesn't matter whether they store all their internal data using dynamic allocations, or hold some metadata as direct members (with automatic storage duration). Any dynamic allocations performed internally will be safely cleaned-up in the vector's destructor, copy constructor, and other similar special functions.
You do not need to do anything as all of that is abstracted away from your code. Your code has no visibility into that mechanism, and dynamically allocating the vector itself will not have any effect on it.
That is the purpose of them!
If you decide for dynamic allocation of the vector, you will have really hard time destroying it correctly even in very simple cases (do not forget about exceptions!). Do avoid dynamic allocation at all costs whenever possible.
In other words, your code is perfectly correct. I would not worry about copying the returned vector in memory. In these simple cases compilers (in release builds) should use return value optimization / RVO (http://en.wikipedia.org/wiki/Return_value_optimization) and create some_vector at memory of the returned object. In C++11 you can use move semantics.
But if you really do not trust the compiler using RVO, you can always pass a reference to a vector and fill it in inside the function.
//function definition
void some_function(vector<int> &v) { v.push_back(10); v.push_back(20); }
//function usage
vector<int> vec;
some_function(vec);
And back to dynamic allocation, if you really need to use it, try the pattern called RAII. Or use smart pointers.
It is not important where internally vectors define their data because you return the vector by copy.:) (by value) It is the same as if you would return an integer
int some_function()
{
int x = 10;
return x;
}
I am looking for a way to insert multiple objects of type A inside a container object, without making copies of each A object during insertion. One way would be to pass the A objects by reference to the container, but, unfortunately, as far as I've read, the STL containers only accept passing objects by value for insertions (for many good reasons). Normally, this would not be a problem, but in my case, I DO NOT want the copy constructor to be called and the original object to get destroyed, because A is a wrapper for a C library, with some C-style pointers to structs inside, which will get deleted along with the original object...
I only require a container that can return one of it's objects, given a particular index, and store a certain number of items which is determined at runtime, so I thought that maybe I could write my own container class, but I have no idea how to do this properly.
Another approach would be to store pointers to A inside the container, but since I don't have a lot of knowledge on this subject, what would be a proper way to insert pointers to objects in an STL container? For example this:
std::vector<A *> myVector;
for (unsigned int i = 0; i < n; ++i)
{
A *myObj = new myObj();
myVector.pushBack(myObj);
}
might work, but I'm not sure how to handle it properly and how to dispose of it in a clean way. Should I rely solely on the destructor of the class which contains myVector as a member to dispose of it? What happens if this destructor throws an exception while deleting one of the contained objects?
Also, some people suggest using stuff like shared_ptr or auto_ptr or unique_ptr, but I am getting confused with so many options. Which one would be the best choice for my scenario?
You can use boost or std reference_wrapper.
#include <boost/ref.hpp>
#include <vector>
struct A {};
int main()
{
A a, b, c, d;
std::vector< boost::reference_wrapper<A> > v;
v.push_back(boost::ref(a)); v.push_back(boost::ref(b));
v.push_back(boost::ref(c)); v.push_back(boost::ref(d));
return 0;
}
You need to be aware of object lifetimes when using
reference_wrapper to not get dangling references.
int main()
{
std::vector< boost::reference_wrapper<A> > v;
{
A a, b, c, d;
v.push_back(boost::ref(a)); v.push_back(boost::ref(b));
v.push_back(boost::ref(c)); v.push_back(boost::ref(d));
// a, b, c, d get destroyed by the end of the scope
}
// now you have a vector full of dangling references, which is a very bad situation
return 0;
}
If you need to handle such situations you need a smart pointer.
Smart pointers are also an option but it is crucial to know which one to use. If your data is actually shared, use shared_ptr if the container owns the data use unique_ptr.
Anyway, I don't see what the wrapper part of A would change. If it contains pointers internally and obeys the rule of three, nothing can go wrong. The destructor will take care of cleaning up. This is the typical way to handle resources in C++: acquire them when your object is initialized, delete them when the lifetime of your object ends.
If you purely want to avoid the overhead of construction and deletion, you might want to use vector::emplace_back.
In C++11, you can construct container elements in place using emplace functions, avoiding the costs and hassle of managing a container of pointers to allocated objects:
std::vector<A> myVector;
for (unsigned int i = 0; i < n; ++i)
{
myVector.emplace_back();
}
If the objects' constructor takes arguments, then pass them to the emplace function, which will forward them.
However, objects can only be stored in a vector if they are either copyable or movable, since they have to be moved when the vector's storage is reallocated. You might consider making your objects movable, transferring ownership of the managed resources, or using a container like deque or list that doesn't move objects as it grows.
UPDATE: Since this won't work on your compiler, the best option is probably std::unique_ptr - that has no overhead compared to a normal pointer, will automatically delete the objects when erased from the vector, and allows you to move ownership out of the vector if you want.
If that's not available, then std::shared_ptr (or std::tr1::shared_ptr or boost::shared_ptr, if that's not available) will also give you automatic deletion, for a (probably small) cost in efficiency.
Whatever you do, don't try to store std::auto_ptr in a standard container. It's destructive copying behaviour makes it easy to accidentally delete the objects when you don't expect it.
If none of these are available, then use a pointer as in your example, and make sure you remember to delete the objects once you've finished with them.