Returning a vector in C++ - c++

I just read this post on SO that discusses where in memory STL vectors are stored. According to the accepted answer,
vector<int> temp;
stores the header info of the vector on the stack but the contents on the heap.
In that case, would the following code be erroneous?
vector<int> some_function() {
vector<int> some_vector;
some_vector.push_back(10);
some_vector.push_back(20);
return some_vector;
}
Should I have used vector<int> *some_vector = new vector<int> instead? Would the above code result in some kind of memory allocation issue? Would this change if I used an instance of a custom class instead of int?

Your code is perfectly fine.
Vectors manage all the memory they allocate for you.
It doesn't matter whether they store all their internal data using dynamic allocations, or hold some metadata as direct members (with automatic storage duration). Any dynamic allocations performed internally are handled safely by the vector's destructor, copy constructor, and other similar special member functions.
You do not need to do anything as all of that is abstracted away from your code. Your code has no visibility into that mechanism, and dynamically allocating the vector itself will not have any effect on it.
That is the whole purpose of vectors!
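To make the custom-class part of the question concrete, here is a minimal sketch. Tracked, make_tracked and live_after_use are hypothetical names that are not in the question; Tracked just counts how many instances are alive so the cleanup can be observed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how many instances are alive.
struct Tracked {
    static int alive;
    int value;
    explicit Tracked(int v) : value(v) { ++alive; }
    Tracked(const Tracked& other) : value(other.value) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Same shape as some_function() in the question, but with a class type.
std::vector<Tracked> make_tracked() {
    std::vector<Tracked> v;
    v.push_back(Tracked(10));
    v.push_back(Tracked(20));
    return v;  // returned by value; the vector owns its heap buffer
}

// Returns the number of live Tracked objects after the vector is gone.
int live_after_use() {
    {
        std::vector<Tracked> result = make_tracked();
        assert(result.size() == 2);
        assert(Tracked::alive == 2);
    }  // result's destructor destroys both elements and frees the buffer
    return Tracked::alive;
}
```

Calling live_after_use() should give 0, since every element is destroyed together with the vector and no delete appears anywhere in the calling code.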

If you opt for dynamic allocation of the vector, you will have a really hard time destroying it correctly even in very simple cases (do not forget about exceptions!). Avoid dynamic allocation whenever possible.
In other words, your code is perfectly correct. I would not worry about copying the returned vector in memory. In simple cases like this, compilers (in release builds) should apply return value optimization / RVO (http://en.wikipedia.org/wiki/Return_value_optimization) and construct some_vector directly in the memory of the returned object. In C++11, move semantics keep the return cheap even when RVO is not applied.
But if you really do not trust the compiler to use RVO, you can always pass a reference to a vector and fill it in inside the function.
//function definition
void some_function(vector<int> &v) { v.push_back(10); v.push_back(20); }
//function usage
vector<int> vec;
some_function(vec);
And back to dynamic allocation: if you really need to use it, try the pattern called RAII, or use smart pointers.
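If the vector really must live on the heap, a smart pointer can own it. The following is only a sketch, assuming C++14 for std::make_unique; make_heap_vector and sum_heap_vector are made-up names for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper: builds the vector on the heap and hands back an owner.
std::unique_ptr<std::vector<int>> make_heap_vector() {
    auto v = std::make_unique<std::vector<int>>();  // requires C++14
    v->push_back(10);
    v->push_back(20);
    return v;  // ownership moves to the caller; no delete needed anywhere
}

// Uses the heap-allocated vector and lets the unique_ptr clean it up.
int sum_heap_vector() {
    std::unique_ptr<std::vector<int>> v = make_heap_vector();
    assert(v != nullptr);
    int sum = 0;
    for (int x : *v) sum += x;
    return sum;  // v's destructor deletes the vector on every exit path
}
```

The unique_ptr is the RAII part: the vector is deleted when the owner goes out of scope, including when an exception propagates through the function.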

It does not matter where vectors store their data internally, because you return the vector by value. It is the same as returning an integer:
int some_function()
{
int x = 10;
return x;
}

Related

Use unique_ptr and appropriate container to do memory management

First of all, my motivation is to do efficient memory management on top of a C like computational kernel. And I tried to use the std::unique_ptr and std::vector, my code looks like below
// my data container
typedef std::unique_ptr<double> my_type;
std::vector<my_type> my_storage;
// when I need some memory for computation kernel
my_storage.push_back(my_type());
my_storage.back().reset(new double[some_length]);
// get pointer to do computational stuff
double *p_data=my_storage.back().get();
Note that in practice p_data may be stored in some other container (e.g. a map) to index each allocated array according to the domain problem. Nevertheless, my main questions are:
Is std::vector a good choice here? What about other containers like std::list/set?
Is there a fundamental problem with my allocation method?
Suppose that after I use p_data for some operations, I want to release the memory chunk pointed to by the raw pointer p_data. What is the best practice here?
First of all, if you are allocating an array you need to use the specialization std::unique_ptr<T[]>; otherwise the memory is released with a plain delete instead of delete[], which is undefined behavior.
std::vector is a good choice unless you have any explicit reason to use something different. For example, if you are going to move many elements inside the container then a std::list could perform better (less memmove operations to shift things around).
Regarding how to manage memory, it depends mainly on the pattern of utilization. If my_storage is mainly responsible for everything (which in your specification it is, since unique_ptr expresses ownership), it means that it will be the only one who can release memory. Which could be done simply by calling my_storage[i].reset().
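As a rough sketch of that release pattern (Buffer, add_buffer, release_buffer and demo_release are hypothetical names, not part of the question's code):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Buffer = std::unique_ptr<double[]>;  // array form, so delete[] is used

// Hypothetical helper: allocates a zeroed array and returns its index in storage.
std::size_t add_buffer(std::vector<Buffer>& storage, std::size_t length) {
    storage.push_back(Buffer(new double[length]()));
    return storage.size() - 1;
}

// Releases one array; the slot stays in the vector but now holds nullptr.
void release_buffer(std::vector<Buffer>& storage, std::size_t i) {
    assert(i < storage.size());
    storage[i].reset();
}

bool demo_release() {
    std::vector<Buffer> storage;
    std::size_t i = add_buffer(storage, 100);
    storage[i][0] = 1.5;             // use the array through the owner
    release_buffer(storage, i);      // delete[] runs here
    return storage[i] == nullptr;    // true: the slot is empty now
}
```

Keeping the empty slot in place means indices handed out earlier stay stable; erasing the element instead would shift every later index.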
Mind that storing raw pointers of managed objects inside other collections leads to dangling pointers if memory is released, for example:
using my_type = std::unique_ptr<double[]>;
using my_storage = std::vector<my_type>;
my_storage data;
data.push_back(my_type(new double[100]));
std::vector<double*> rawData;
rawData.push_back(data[0].get());
data.clear(); // delete [] is called on array and memory is released
*rawData[0] = 1.2; // accessing a dangling pointer -> bad
Whether this is a problem depends on the order of release: if data is released last, there is no problem. Otherwise you could store const references to the std::unique_ptr objects, so that at least you'd be able to check whether the memory is still valid, e.g.:
using my_type = std::unique_ptr<double[]>;
using my_managed_type = std::reference_wrapper<const my_type>;
std::vector<my_managed_type> rawData;
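A possible way to use such references for the validity check is sketched below. still_valid and demo_validity_check are hypothetical helpers. Note the assumption that the owning vector is neither cleared nor reallocated while the references exist, because either would leave the references themselves dangling.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

using my_type = std::unique_ptr<double[]>;
using my_managed_type = std::reference_wrapper<const my_type>;

// Returns true if the owner referred to by ref still holds an array.
bool still_valid(my_managed_type ref) {
    return ref.get() != nullptr;
}

bool demo_validity_check() {
    std::vector<my_type> data;
    data.push_back(my_type(new double[100]));

    std::vector<my_managed_type> rawData;
    rawData.push_back(std::cref(data[0]));  // refers to the owner, not the array

    assert(still_valid(rawData[0]));
    data[0].reset();                        // release the array, keep the slot
    return !still_valid(rawData[0]);        // the observer can see the release
}
```

This only works when memory is released with reset() on the element; after data.clear() the unique_ptr object itself is gone and the check would read a destroyed object.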
Using std::unique_ptr with any STL container, including std::vector, is fine in general. But you are not using std::unique_ptr correctly (you are not using its array specialization), and you don't need to resort to using back().reset() at all. Try this instead:
// my data container
typedef std::unique_ptr<double[]> my_type;
// or: using my_type = std::unique_ptr<double[]>;
std::vector<my_type> my_storage;
my_type ptr(new double[some_length]);
my_storage.push_back(std::move(ptr));
// or: my_storage.push_back(my_type(new double[some_length]));
// or: my_storage.emplace_back(new double[some_length]);

Program terminates in initialization of large array

I have V = 3997962, and I want to have an array of that size consisting of vectors of ints in C++.
When I initialize it like this:
const int V = 3997962;
vector<int> array[V];
The program terminates without reporting any error.
Is it a stack overflow? How can I fix this?
Should I define it like this:
vector<int>* test = new vector<int>[V];
How can I pass this variable to a function? And how should it be defined as an argument? Do I need to delete it afterwards?
If this is a local variable, you are basically asking for about 4 million vector objects, each typically the size of three pointers, in automatic storage. This will likely fail due to stack overflow.
You can instead use a vector of vectors.
vector<vector<int>> array(V);
The above results in a vector called array that is populated with V default-constructed (empty) vector<int>s.
It is very likely a stack overflow.
You are allocating V vector<int>s. While the elements of those vectors will be allocated on the heap, the vectors themselves (which contain a pointer and a few other objects) are being allocated on the stack. If you have V of these, you will likely hit your stack limit.
vector<int>* test = new vector<int>[V];
This is one possible solution, but not ideal. It will require you to delete[] the array later, with delete[] test;. You could get around this problem by wrapping this dynamically allocated array in a smarter pointer, but keep reading for a better solution.
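For completeness, wrapping the dynamically allocated array in a smart pointer could look roughly like this. make_vector_array and demo_vector_array are hypothetical names, and the sketch assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Allocates count empty vectors on the heap; delete[] runs automatically.
std::unique_ptr<std::vector<int>[]> make_vector_array(std::size_t count) {
    assert(count > 0);
    return std::unique_ptr<std::vector<int>[]>(new std::vector<int>[count]);
}

// Example use: index like a plain array, no manual cleanup.
std::size_t demo_vector_array() {
    auto test = make_vector_array(1000);
    test[42].push_back(7);
    return test[42].size();
}
```

The array form std::unique_ptr<T[]> matters here, because it is what makes the owner call delete[] rather than delete.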
How you would pass this to other functions is not really relevant (you should design function parameters completely independent of how the client might allocate them), but you could just pass the pointer around:
void f(vector<int>* param);
f(test);
The parameter could alternatively be written as vector<int> param[], which might better express that this pointer points to an array. consts could also be added where you want immutability. However, we can find a nicer solution by avoiding using new and raw pointers entirely.
Instead, I would recommend having a vector<vector<int>>:
vector<vector<int>> test(V);
Now you only actually have one vector on the stack. The elements of that vector, which are themselves vectors, will be allocated on the heap, and their elements also.
You should make sure that the array has static storage duration.
Either define it outside any function in some namespace, or use the keyword static if you want to define it inside a function (for example main):
static std::vector<int> array[V];
because stack memory is very limited.
Otherwise allocate it on the heap or use a vector of vectors instead. For example
std::vector<std::vector<int>> array( V );
Or
std::vector<std::vector<int>> array;
array.reserve( V );
Note that std::vector has a member function max_size() that reports the maximum number of elements the vector can hold.
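The two variants are not equivalent, which a small sketch may make clearer. make_sized and make_reserved are hypothetical helper names: constructing with a count creates the elements, while reserve only sets capacity and leaves the vector empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Creates count empty inner vectors; size() == count afterwards.
std::vector<std::vector<int>> make_sized(std::size_t count) {
    assert(count <= std::vector<std::vector<int>>().max_size());
    std::vector<std::vector<int>> v(count);
    return v;
}

// Only reserves room; size() stays 0 until elements are added.
std::vector<std::vector<int>> make_reserved(std::size_t count) {
    std::vector<std::vector<int>> v;
    v.reserve(count);
    return v;
}
```

With the reserved version, indexing such as array[0] is not valid until elements have been added with push_back or emplace_back.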
You're asking for 3997962 * sizeof(std::vector<int>) in automatic storage space if this is declared local to some function. To better understand how much space the basic management members of a std::vector<int> occupy, consider:
#include <iostream>
#include <vector>
int main()
{
std::cout << sizeof(std::vector<int>) << '\n';
}
Output (OSX 10.10, 64bit clang 3.5)
24
Thus (at least on my platform) you're requesting at least 3997962 * 24, or 95951088 bytes (roughly 92 MB), of automatic storage. So yes, you're very likely blowing out your automatic storage space. To place all but the primary management data of a single vector on the heap, you can write:
std::vector<std::vector<int>> vv(V);
which will create a vector of V vectors of int, all of which are initially empty and heap-managed. The core management data internal to the base vector vv is still in automatic storage, but as you can see:
#include <iostream>
#include <vector>
int main()
{
std::cout << sizeof(std::vector<std::vector<int>>) << '\n';
}
Output
24
the footprint in automatic storage is considerably reduced.
To address your followup questions of:
How can I pass this variable to a function?
How should it be defined as an argument?
How to pass this (the first question) depends entirely on whether you need to modify its content, and that will affect how you declare the parameter (the second question). To avoid expensive copies, pass it by reference. If the callee doesn't need to modify the data, pass it by const reference:
// read-only, pass as const-reference
void doesnt_modify(const std::vector<std::vector<int>>& vv)
{
// use here, can't modify
}
// writable, pass as reference
void can_modify(std::vector<std::vector<int>>& vv)
{
// use here, can modify
}
The vector's elements live on the heap, but the vector object itself is 24 bytes (on 64-bit Linux), so you are allocating 24 * 3997962 bytes, roughly 96 MB, on the stack. The default stack limit on a Linux machine, for example, is about 8 MB (try ulimit -a to check). So it is very likely a stack overflow.
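The arithmetic can be written down as a small check. This is only a sketch: array_footprint, fits_on_stack and the 8 MiB figure in assumed_stack_limit are assumptions for illustration, and the real limit depends on the platform and its settings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes of automatic storage a local `std::vector<int> array[count]` would need.
constexpr std::size_t array_footprint(std::size_t count) {
    return count * sizeof(std::vector<int>);
}

// Assumed stack limit of 8 MiB (a common Linux default, not a guarantee).
constexpr std::size_t assumed_stack_limit = 8u * 1024u * 1024u;

// True if the array of vector objects would fit under the assumed limit.
bool fits_on_stack(std::size_t count) {
    assert(count != 0);
    return array_footprint(count) < assumed_stack_limit;
}
```

For V = 3997962 the footprint is far above the assumed limit, while an array of a thousand vectors would fit comfortably.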

Can an rvalue vector be allocated on the heap?

I'm not sure if I'm using the right terminology here, but say I have a function that returns a vector:
std::vector<int> func()
{
std::vector<int> vec(100,1);
return vec;
}
And when I call this function I want to allocate the vector on the heap. Can I do this?
I'm thinking something along the lines of this:
std::shared_ptr<std::vector<int>> vec(new std::vector<int>);
vec->swap(func());
Is there a way of doing this that is less convoluted, without changing func()?
Just remove that std::move. The language has a special rule for return statements, so you should not write std::move there; let the compiler do the rest.
std::vector<int> func()
{
std::vector<int> vec(100,1);
return vec; // NOT: return std::move(vec);
}
Why?
Because the automatic object vec is about to be destroyed when the function returns, it is treated as an rvalue in the return statement. The compiler will then move it if it cannot elide the copy altogether. Writing std::move explicitly prevents the compiler from applying NRVO.
Simply returning the vector is already optimized, so don't worry about performance.
The only better way I can think of is to not use std::move, since the compiler automatically applies RVO.
The second expression can be shortened a bit:
std::vector<int>* vec2 = new std::vector<int>( func() );
And as others have said, allocating the vector on the heap isn't really necessary.
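If a shared_ptr is wanted, as in the question, one less convoluted option is to pass the temporary straight to std::make_shared. This is a sketch assuming C++11; func_on_heap is a hypothetical wrapper name, and func() is repeated from the question only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Same function as in the question, unchanged.
std::vector<int> func() {
    std::vector<int> vec(100, 1);
    return vec;
}

// Move-constructs a heap-allocated vector from the temporary returned by func().
std::shared_ptr<std::vector<int>> func_on_heap() {
    auto p = std::make_shared<std::vector<int>>(func());
    assert(p->size() == 100);
    return p;
}
```

The temporary returned by func() is an rvalue, so the heap-allocated vector takes over its buffer rather than copying the 100 elements.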
I don't know what you're doing, but it is unclear what you gain by moving the vector off the stack, and more heap allocations mean more fragmentation of your memory. If you have millions of records to allocate on the heap, my first concern is that the chance of corrupting them through other operations in your code is high. Second, the heap is also limited in size, so you may end up writing your own allocator to cope with storing large data in little memory.
If you insist on managing the heap yourself as an optimization strategy, then returning a pointer to the vector seems promising:
std::vector<int*>* yourfunc()
{
std::vector<int*>* pVec = new std::vector<int*>;
// do something with *pVec
return pVec;
}
The vector is then allocated with new, but it still has to be deleted at the end.

C++ New vs Malloc for dynamic memory array of Objects

I have a class Bullet that takes several arguments for its construction. However, I am using a dynamic memory array to store them. I am using C++, so I want to conform to its standard by using the new operator to allocate the memory. The problem is that the new operator is asking for the constructor arguments when I'm allocating the array, which I don't have at that time. I can accomplish this using malloc to get the right size and then fill it in from there, but that's not what I want to use :) Any ideas?
pBulletArray = (Bullet*) malloc(iBulletArraySize * sizeof(Bullet)); // Works
pBulletArray = new Bullet[iBulletArraySize]; // Requires constructor arguments
Thanks.
You can't.
And if you truly want to conform to C++ standards, you should use std::vector.
FYI, it would probably be even more expensive than what you're trying to achieve. If you could do this, new would call a constructor. But since you'll modify the object later on anyway, the initial construction is useless.
1) std::vector
A std::vector really is the proper C++ way to do this.
std::vector<Bullet> bullets;
bullets.reserve(10); // allocate memory for bullets without constructing any
bullets.push_back(Bullet(10.2,"Bang")); // put a Bullet in the vector.
bullets.emplace_back(10.2,"Bang"); // (C++11 only) construct a Bullet in the vector without copying.
2) new [] operator
It is also possible to do this with new, but you really shouldn't. Manually managing resources with new/delete is an advanced task, similar to template meta-programming in that it's best left to library builders, who'll use these features to build efficient, high level libraries for you. In fact to do this correctly you'll basically be implementing the internals of std::vector.
When you use the new operator to allocate an array, every element in the array is default initialized. Your code could work if you added a default constructor to Bullet:
class Bullet {
public:
Bullet() {} // default constructor
Bullet(double,std::string const &) {}
};
std::unique_ptr<Bullet[]> b(new Bullet[10]); // default construct 10 bullets
Then, when you have the real data for a Bullet you can assign it to one of the elements of the array:
b[3] = Bullet(20.3,"Bang");
Note the use of unique_ptr to ensure that proper clean-up occurs, and that it's exception safe. Doing these things manually is difficult and error prone.
3) operator new
The new operator initializes its objects in addition to allocating space for them. If you want to simply allocate space, you can use operator new.
std::unique_ptr<Bullet,void(*)(Bullet*)> bullets(
static_cast<Bullet*>(::operator new(10 * sizeof(Bullet))),
[](Bullet *b){::operator delete(b);});
(Note that the unique_ptr ensures that the storage will be deallocated but no more. Specifically, if we construct any objects in this storage we have to manually destruct them and do so in an exception safe way.)
bullets now points to storage sufficient for an array of Bullets. You can construct an array in this storage:
new (bullets.get()) Bullet[10];
However the array construction again uses default initialization for each element, which we're trying to avoid.
AFAIK C++ doesn't specify any well defined method of constructing an array without constructing the elements. I imagine this is largely because doing so would be a no-op for most (all?) C++ implementations. So while the following is technically undefined, in practice it's pretty well defined.
bool constructed[10] = {}; // a place to mark which elements are constructed
// construct some elements of the array
for(int i=0;i<10;i+=2) {
try {
// pretend bullets points to the first element of a valid array. Otherwise 'bullets.get()+i' is undefined
new (bullets.get()+i) Bullet(10.2,"Bang");
constructed[i] = true;
} catch(...) {}
}
That will construct elements of the array without using the default constructor. You don't have to construct every element, just the ones you want to use. However when destroying the elements you have to remember to destroy only the elements that were constructed.
// destruct the elements of the array that we constructed before
for(int i=0;i<10;++i) {
if(constructed[i]) {
bullets.get()[i].~Bullet(); // unique_ptr<Bullet, D> has no operator[], so index the raw pointer
}
}
// unique_ptr destructor will take care of deallocating the storage
The above is a pretty simple case. Making non-trivial uses of this method exception safe without wrapping it all up in a class is more difficult. Wrapping it up in a class basically amounts to implementing std::vector.
4) std::vector
So just use std::vector.
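Here is a minimal sketch of what that looks like for a Bullet without a default constructor. The members speed_ and sound_ and the helper make_bullets are made up for illustration, since the question does not show the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Bullet with no default constructor.
class Bullet {
public:
    Bullet(double speed, std::string sound)
        : speed_(speed), sound_(std::move(sound)) {}
    double speed() const { return speed_; }
    const std::string& sound() const { return sound_; }
private:
    double speed_;
    std::string sound_;
};

// Reserves raw capacity up front, then constructs only the bullets requested.
std::vector<Bullet> make_bullets(std::size_t capacity, std::size_t count) {
    assert(count <= capacity);
    std::vector<Bullet> bullets;
    bullets.reserve(capacity);              // allocates, constructs nothing
    for (std::size_t i = 0; i < count; ++i) {
        bullets.emplace_back(10.2 + static_cast<double>(i), "Bang");
    }
    return bullets;
}
```

This gives the malloc-like behaviour the question asks for: memory for the full capacity is obtained once, and each Bullet is constructed in place only when its arguments are known.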
It's possible to do what you want -- search for "operator new" if you really want to know how. But it's almost certainly a bad idea. Instead, use std::vector, which will take care of all the annoying details for you. You can use std::vector::reserve to allocate all the memory you'll use ahead of time.
Bullet** pBulletArray = new Bullet*[iBulletArraySize];
Then populate pBulletArray:
for(int i = 0; i < iBulletArraySize; i++)
{
pBulletArray[i] = new Bullet(arg0, arg1);
}
Just don't forget to free the memory afterwards: delete each Bullet, then delete[] the pointer array.
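The matching cleanup for the pointer-array approach might look like the sketch below. This Bullet is a stand-in that merely counts live instances, and create_bullets, destroy_bullets and demo_cleanup are hypothetical names. The sketch is not exception safe: if one of the new Bullet calls throws, the earlier allocations leak.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical Bullet that counts live instances so the cleanup can be checked.
struct Bullet {
    static int alive;
    Bullet(int, int) { ++alive; }
    ~Bullet() { --alive; }
};
int Bullet::alive = 0;

// Allocates the pointer array and one Bullet per slot.
Bullet** create_bullets(std::size_t n) {
    Bullet** arr = new Bullet*[n];
    for (std::size_t i = 0; i < n; ++i) arr[i] = new Bullet(1, 2);
    return arr;
}

// Mirror image: delete every Bullet, then delete[] the pointer array.
void destroy_bullets(Bullet** arr, std::size_t n) {
    assert(arr != nullptr);
    for (std::size_t i = 0; i < n; ++i) delete arr[i];
    delete[] arr;
}

int demo_cleanup() {
    Bullet** arr = create_bullets(5);
    assert(Bullet::alive == 5);
    destroy_bullets(arr, 5);
    return Bullet::alive;
}
```

Having to write and remember destroy_bullets by hand is exactly the bookkeeping that a std::vector would otherwise do for you.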
The way C++ new normally works is allocating the memory for the class instance and then calling the constructor for that instance. You basically have already allocated the memory for your instances.
You can call only the constructor for the first instance like this:
new((void*)pBulletArray) Bullet(foo);
Calling the constructor of the second one would look like this (and so on):
new((void*)(pBulletArray+1)) Bullet(bar);
if the Bullet constructor takes an int.
If what you're really after here is just fast allocation/deallocation, then you should look into "memory pools." I'd recommend using boost's implementation, rather than trying to roll your own. In particular, you would probably want to use an "object_pool".

preventing data from being freed when vector goes out of scope

Is there a way to transfer ownership of the data contained in a std::vector (pointed to by, say T*data) into another construct, preventing having "data" become a dangling pointer after the vector goes out of scope?
EDIT: I DON'T WANT TO COPY THE DATA (which would be an easy but inefficient solution).
Specifically, I'd like to have something like:
template<typename T>
T* transfer_ownership(vector<T>&v){
T*data=&v[0];
v.clear();
...//<--I'd like to make v's capacity 0 without freeing data
}
int main(){
double*data=NULL;
{
vector<double>v;
...//grow v dynamically
data=transfer_ownership<double>(v);
}
...//do something useful with data (user responsible for freeing it later)
// for example mxSetData(mxArray*A,double*data) from matlab's C interface
}
The only thing that comes to my mind to emulate this is:
{
vector<double>*v=new vector<double>();
//grow *v...
data=&(*v)[0];
}
and then data will later either be freed or (in my case) used as mxSetData(mxArray*A,double*data). However this results in a small memory leak (data struct for handling v's capacity, size, etc... but not the data itself of course).
Is it possible without leaking ?
A simple workaround would be swapping the vector with one you own:
vector<double> myown;
vector<double> someoneelses = foo();
std::swap( myown, someoneelses );
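A sketch of why the swap helps: the buffer simply changes owner, so the data outlives the vector that originally held it. Here foo() is a stand-in for whatever produces the vector, and swap_keeps_buffer is a hypothetical test helper; data() assumes C++11.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a function that produces a vector; hypothetical.
std::vector<double> foo() {
    return std::vector<double>(1000, 0.5);
}

// Returns true if the buffer survived the inner scope without being copied.
bool swap_keeps_buffer() {
    std::vector<double> myown;
    const double* before = nullptr;
    {
        std::vector<double> someoneelses = foo();
        before = someoneelses.data();
        std::swap(myown, someoneelses);  // exchanges buffers, copies no elements
    }  // someoneelses (now empty) is destroyed here
    assert(myown.size() == 1000);
    return myown.data() == before;
}
```

The data is still owned by a vector, though, so this keeps the pointer valid for as long as myown lives; it does not hand the buffer to a raw pointer.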
A tougher but maybe better approach is to write your own allocator for the vector, and let it allocate out of a pool you maintain. I have no personal experience with this, but it's not too complicated.
The point of using a std::vector is not to have to worry about the data in it:
Keep your vector all along your application;
Pass it by const-ref to other functions (to avoid unnecessary copies);
And feed functions expecting a pointer-to-T with &v[0].
If you really don't want to keep your vector, you will have to copy your data -- you can't transfer ownership because std::vector guarantees it will destroy its content when going out-of-scope. In that case, use the std::copy() algorithm.
If your vector contains values you can only copy them (which happens when you call std::copy, std::swap, etc.). If you keep non-primitive objects in a vector and don't want to copy them (and use in another data structure), consider storing pointers
Does something like this work for you?
int main()
{
double *data = 0;
{
vector<double> foo;
// insert some elements to foo
data = new double[foo.size()];
std::copy(foo.begin(), foo.end(), &data[0]);
}
// Pass data to Matlab function.
delete [] data;
return 0;
}
Since you don't want to copy data between containers, but want to transfer ownership of data between containers, I suggest using a container of smart pointers as follows.
void f()
{
std::vector<boost::shared_ptr<double> > doubles;
InitVector(doubles);
std::vector<boost::shared_ptr<double> > newDoubles(doubles);
}
You really can't transfer ownership of data between standard containers without making a copy of it, since standard containers always copy the data they encapsulate. If you want to minimize the overhead of copying expensive objects, then it is a good idea to use a reference-counted smart pointer to wrap your expensive data structure. boost::shared_ptr is suitable for this task since it is fairly cheap to make a copy of it.
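A small sketch of what copying such a container does. It uses std::shared_ptr from C++11 as a stand-in for boost::shared_ptr, which is an assumption on my part, and copy_shares_pointees is a hypothetical helper name.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Copies the vector of pointers; the doubles themselves are shared, not copied.
bool copy_shares_pointees() {
    std::vector<std::shared_ptr<double>> doubles;
    doubles.push_back(std::make_shared<double>(1.5));
    doubles.push_back(std::make_shared<double>(2.5));

    std::vector<std::shared_ptr<double>> newDoubles(doubles);
    assert(newDoubles.size() == doubles.size());

    // Same address and a reference count of 2 for each element.
    return newDoubles[0].get() == doubles[0].get()
        && doubles[0].use_count() == 2;
}
```

Each double is freed only when the last shared_ptr referring to it goes away, so either container can be destroyed first.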