Program terminates in initialization of large array - c++

I have V = 3997962, and I want to have an array of that size consisting of vectors of ints in C++.
When I initialize it like this:
const int V = 3997962;
vector<int> array[V];
The program terminates without reporting any error.
Is it stack overflow error? How can I go about this?
Should I define it like this:
vector<int>* test = new vector<int>[V];
How can I pass this variable to a function? And how it should be defined as an argument? Do I need to delete it after all?

If this is a local variable, you are asking for nearly 4 million vector objects (typically three pointers each) in automatic storage. This will likely fail due to stack overflow.
You can instead use a vector of vectors.
vector<vector<int>> array(V);
The above results in a vector called array that is populated with V default initialized vector<int>s.

It is very likely a stack overflow.
You are allocating V vector<int>s. While the elements of those vectors will be allocated on the heap, the vectors themselves (which contain a pointer and a few other objects) are being allocated on the stack. If you have V of these, you will likely hit your stack limit.
vector<int>* test = new vector<int>[V];
This is one possible solution, but not ideal. It will require you to delete[] the array later, with delete[] test;. You could get around this problem by wrapping this dynamically allocated array in a smarter pointer, but keep reading for a better solution.
How you would pass this to other functions is not really relevant (you should design function parameters completely independent of how the client might allocate them), but you could just pass the pointer around:
void f(vector<int>* param);
f(test);
The parameter could alternatively be written as vector<int> param[], which might better express that this pointer points to an array. consts could also be added where you want immutability. However, we can find a nicer solution by avoiding using new and raw pointers entirely.
Instead, I would recommend having a vector<vector<int>>:
vector<vector<int>> test(V);
Now you only actually have one vector on the stack. The elements of that vector, which are themselves vectors, will be allocated on the heap, and their elements also.
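A minimal sketch of that layout, assuming two hypothetical helpers, make_buckets and total_elements (neither name comes from the question). Only the outer vector's small control block sits in the caller's stack frame, and the consumer takes the container by const reference so nothing is copied:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds n empty inner vectors. The inner vector
// objects themselves live in dynamically allocated storage.
std::vector<std::vector<int>> make_buckets(std::size_t n) {
    return std::vector<std::vector<int>>(n);
}

// Hypothetical consumer: takes the container by const reference, so the
// function does not care how the caller obtained it.
std::size_t total_elements(const std::vector<std::vector<int>>& buckets) {
    std::size_t total = 0;
    for (const std::vector<int>& bucket : buckets) {
        total += bucket.size();
    }
    return total;
}
```

With V from the question, the call would be make_buckets(V); each inner vector can then be filled with push_back as usual.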

You should make sure that the array has static storage duration.
Either define it outside any function in some namespace, or use the keyword static if you want to define it inside a function (for example main)
static std::vector<int> array[V];
because the stack memory is very limited.
Otherwise allocate it on the heap or use a vector of vectors. For example
std::vector<vector<int>> array( V );
Or
std::vector<vector<int>> array;
array.reserve( V );
Take into account that class std::vector has a member function max_size that reports the maximum number of elements the vector can hold.
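The two variants above are not interchangeable as written: the sized constructor creates V elements, while reserve only sets aside capacity and leaves size() at 0. A small sketch of the difference, using the hypothetical names build_with_size and build_with_reserve:

```cpp
#include <cstddef>
#include <vector>

// Sized constructor: the result already holds n empty inner vectors.
std::vector<std::vector<int>> build_with_size(std::size_t n) {
    return std::vector<std::vector<int>>(n);
}

// reserve() only allocates capacity; size() stays 0 until elements are
// added, so indexing right after reserve() would be out of bounds.
std::vector<std::vector<int>> build_with_reserve(std::size_t n) {
    std::vector<std::vector<int>> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.emplace_back();  // appends one empty inner vector
    }
    return result;
}
```

Both functions end up with n empty inner vectors; the reserve version just makes the element creation explicit.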

You're asking for 3997962 * sizeof(std::vector<int>) in automatic storage space if this is declared local to some function. To better understand how much space the basic management members of a std::vector<int> occupy, consider:
#include <iostream>
#include <vector>
int main()
{
std::cout << sizeof(std::vector<int>) << '\n';
}
Output (OSX 10.10, 64bit clang 3.5)
24
Thus (at least on my platform) you're requesting at least 3997962 * 24, or 95951088 bytes (roughly 92 MB), of automatic storage. So yes, you're very likely blowing out your automatic storage space. To place all but the primary management data of a single vector on the heap, you can write:
std::vector<std::vector<int>> vv(N);
which will create a vector of N vectors of int, all of which are initially empty and heap-managed. The core management data internal to the base vector vv is still in automatic storage, but as you can see:
#include <iostream>
#include <vector>
int main()
{
std::cout << sizeof(std::vector<std::vector<int>>) << '\n';
}
Output
24
the footprint in automatic storage is considerably reduced.
To address your followup questions of:
How can I pass this variable to a function?
How it should be defined as an argument?
How to pass this (the first question) depends entirely on whether you need to modify its content, and that will affect how you declare the parameter (the second question). To avoid expensive copies, pass it by reference. If the callee doesn't need to modify the data, pass it as a const reference:
// read-only, pass as const-reference
void doesnt_modify(const std::vector<std::vector<int>>& vv)
{
// use here, can't modify
}
// writable, pass as reference
void can_modify(std::vector<std::vector<int>>& vv)
{
// use here, can modify
}

The vector data is located on the heap, but the vector object itself is 24 bytes (on 64-bit Linux), so you are allocating 24 * 3997962 bytes, roughly 95 MB, on the stack. The default stack limit on a Linux machine, for example, is about 8 MB (try ulimit -a to check). So it is likely a stack overflow.
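The arithmetic can be checked at compile time. This is only a sketch: the 8 MiB limit is an assumed value (a common Linux default, not something the question states), and the names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kCount = 3997962;  // V from the question

// Bytes needed for the array of vector objects themselves (not their contents).
constexpr std::size_t array_footprint_bytes(std::size_t count) {
    return count * sizeof(std::vector<int>);
}

// Assumed stack limit of 8 MiB; the real value is platform-specific.
constexpr std::size_t kAssumedStackLimit = 8u * 1024u * 1024u;

constexpr bool fits_on_assumed_stack(std::size_t count) {
    return array_footprint_bytes(count) < kAssumedStackLimit;
}
```

On a platform where sizeof(std::vector<int>) is 24, array_footprint_bytes(kCount) is 95951088, far beyond the assumed limit.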

Related

How can I pass and store an array of variable size containing pointers to objects?

For my project I need to store pointers to objects of type ComplicatedClass in an array. This array is stored in a class Storage along with other information I have omitted here.
Here's what I would like to do (which obviously doesn't work, but hopefully explains what I'm trying to achieve):
class ComplicatedClass
{
...
}
class Storage
{
public:
Storage(const size_t& numberOfObjects, const std::array<ComplicatedClass *, numberOfObjects>& objectArray)
: size(numberOfObjects),
objectArray(objectArray)
{}
...
public:
size_t size;
std::array<ComplicatedClass *, size> objectArray;
...
}
int main()
{
ComplicatedClass * object1 = new ComplicatedClass(...);
ComplicatedClass * object2 = new ComplicatedClass(...);
Storage myStorage(2, {object1, object2});
...
return 0;
}
What I am considering is:
Using std::vector instead of std::array. I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that. As a plus I would be able to ditch size.
Changing Storage to a class template. I would like to avoid this because then I have templates all over my code. This is not terrible but it would make classes that use Storage much less readable, because they would also have to have templated functions.
Are there any other options that I am missing?
How can I pass and store an array of variable size containing pointers to objects?
By creating the objects dynamically. Most convenient solution is to use std::vector.
size_t size;
std::array<ComplicatedClass *, size> objectArray;
This cannot work. Template arguments must be compile time constant. Non-static member variables are not compile time constant.
I would like to avoid this because there are parts of my program that are not allowed to allocate memory on the free-store. As far as I know, std::vector would have to do that.
std::vector would not necessarily require the use of free-store. Like all standard containers (besides std::array), std::vector accepts an allocator. If you implement a custom allocator that doesn't use free-store, then your requirement can be satisfied.
Alternatively, even if you do use the default allocator, you could write your program in such way that elements are inserted into the vector only in parts of your program that are allowed to allocate from the free-store.
I thought C++ had "free-store" instead of heap, does it not?
Those are just different words for the same thing. "Free store" is the term used in C++. It's often informally called "heap memory" since "heap" is a data structure that is sometimes used to implement it.
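To illustrate the custom-allocator route mentioned above, here is a rough sketch of a bump allocator over a caller-supplied buffer. FixedArena and ArenaAllocator are hypothetical names, not part of any library, and the sketch makes simplifying assumptions: memory is never reused, and allocation throws std::bad_alloc once the buffer is exhausted.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical bump arena over a caller-supplied buffer; it never touches
// the free store.
class FixedArena {
public:
    FixedArena(unsigned char* buffer, std::size_t size)
        : begin_(buffer), size_(size), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t alignment) {
        void* p = begin_ + used_;
        std::size_t space = size_ - used_;
        if (std::align(alignment, bytes, p, space) == nullptr) {
            throw std::bad_alloc();  // buffer exhausted
        }
        used_ = static_cast<std::size_t>(static_cast<unsigned char*>(p) - begin_) + bytes;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* begin_;
    std::size_t size_;
    std::size_t used_;
};

// Minimal allocator that forwards every request to a FixedArena.
template <class T>
class ArenaAllocator {
public:
    using value_type = T;

    explicit ArenaAllocator(FixedArena& arena) noexcept : arena_(&arena) {}

    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena_(other.arena()) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }

    // A bump arena does not reuse memory; storage goes away with the buffer.
    void deallocate(T*, std::size_t) noexcept {}

    FixedArena* arena() const noexcept { return arena_; }

private:
    FixedArena* arena_;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena() == b.arena();
}

template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}
```

A Storage class could then hold a std::vector<ComplicatedClass*, ArenaAllocator<ComplicatedClass*>> backed by a stack or static buffer. Calling reserve once up front avoids wasting arena space on regrowth, since this arena never gives memory back.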
Beginning with C++11 std::vector has the data() method to access the underlying array the vector is using for storage.
And in most cases a std::vector can be used similar to an array allowing you to take advantage of the size adjusting container qualities of std::vector when you need them or using it as an array when you need that. See https://stackoverflow.com/a/261607/1466970
Finally, you are aware that you can use vectors in place of arrays,
right? Even when a function expects c-style arrays you can use
vectors:
vector<char> v(50); // Ensure there's enough space
strcpy(&v[0], "prefer vectors to c arrays");
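The same idea with data(), as a small sketch; copy_via_vector is a hypothetical helper, and the buffer is sized from the input so the strcpy cannot overrun it:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Copies a C string through a std::vector<char> buffer using a C API.
std::string copy_via_vector(const char* text) {
    std::vector<char> buffer(std::strlen(text) + 1);  // room for the terminator
    std::strcpy(buffer.data(), text);                 // data() is a plain char*
    return std::string(buffer.data());
}
```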

Can I create a C++ string/vector with specified length but no initialization?

I need to create a string/vector. I know how long it should be, but I'd like to write the right contents into it later. Can I create it with a specified length but without any initialization (neither explicit nor implicit), like what malloc does? Because I'll write into it properly before reading from it, it would be a waste of time to initialize it at construction.
I hoped I could write with arbitrary order after creating the vector, like
vector<int> v(10); // Some magic to create v with 10 of uninitialized ints
v[6] = 1;
v[3] = 2;
...
Seemingly that's impossible.
If I understand your question properly, you want std::vector::reserve or std::basic_string::reserve.
std::vector<int> v; // empty vector
v.reserve(how_long_it_should_be); // ensure the capacity
v.push_back(the_right_thing); // add elements
...
Edit for question's edit
vector<int> v(10); will always construct v with 10 value-initialized ints, i.e. 0. You might want std::array if you know the size at compile time.
std::array<int, 10> v; // construct v with 10 uninitialized int
v[6] = 1;
v[3] = 2;
LIVE
Using .reserve() on either container will increase the .capacity() of the internal memory block allocated without calling any default constructors.
You can assert that the container has the right capacity at the moment you need it using .capacity(). Note that .size() will differ from .capacity() after a .reserve(): the first returns the number of actual objects inside the container, while the second returns the total number of objects the current memory block can hold without reallocation.
It is good practice (especially for std::vector) to empirically .reserve() your containers to avoid extra allocations at runtime. If you are using at least C++11, in case you want the remaining memory back and you can deal with some copying/moving, you can use shrink_to_fit().
Note that, before C++20, std::string::reserve differs from std::vector::reserve when the requested capacity is smaller than the current capacity. The string will take it as a non-binding request to shrink, while the vector will ignore the request.
Yes, you can, with boost::noinit_adaptor:
vector<int, boost::noinit_adaptor<std::allocator<int>>> v(10);
Under the hood it redefines allocator::construct to do default initialization using new(p) T instead of value initialization new(p) T().
For built-in types default initialization does nothing, whereas value initialization zero-initializes.
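Without Boost, the same mechanism can be sketched by hand. The adaptor below is an illustration with a hypothetical name (default_init_allocator), not Boost's actual implementation. It overrides the no-argument construct to default-initialize and forwards every other construction to the wrapped allocator:

```cpp
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical adaptor: construct() with no arguments default-initializes
// instead of value-initializing.
template <class T, class A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <class U>
    struct rebind {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // No-argument construct: default initialization, which is a no-op for int.
    template <class U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // All other constructions are forwarded to the base allocator unchanged.
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};
```

With this, std::vector<int, default_init_allocator<int>> v(10); has size 10 with indeterminate contents, so each element must be written before it is read.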
When growing a vector, the new object must be initialized one way or another. It's not possible to create a vector of uninitialized int objects, for example.
The closest you could get would be to define a class with a data member and a default constructor that does not initialize that member, e.g.:
struct bar { int x; bar() {} };
// ...
std::vector<bar> vec(5);
Then vec ultimately contains 5 uninitialized int subobjects.
The reserve function allocates memory but does not increase the count of objects in the vector; it does not help with the problem that when you do eventually want an object in the vector you must initialize that object.
Use .resize(x) to change the actual number of elements.
* If .size() > x then elements will be destroyed, back first.
* If .size() < x new elements will be added using their default constructor.
Use .reserve(x) to have the vector allocate memory for x elements but not instantiate any (.size() will not change).
* If .capacity() >= x no action will be taken, and the vector will be unaffected.
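The rules above can be traced on a small example. This is a sketch with a hypothetical function name; the comments show the contents after each call, assuming v starts as {1, 2, 3, 4, 5}:

```cpp
#include <vector>

// Applies resize() and reserve() in sequence to show their effect on size().
void shrink_then_grow(std::vector<int>& v) {
    v.resize(2);     // size() > 2: trailing elements are destroyed -> {1, 2}
    v.resize(4);     // size() < 4: value-initialized ints are appended -> {1, 2, 0, 0}
    v.reserve(100);  // only capacity() changes; size() is still 4
}
```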
From C++11 you can use fill constructor:
http://www.cplusplus.com/reference/vector/vector/vector/
(2) fill constructor
Constructs a container with n elements. Each element is a copy of val (if provided).
BTW: there is quite big difference between string and vector.

Returning a vector in C++

I just read this post on SO, that discusses where in memory, STL vectors are stored. According to the accepted answer,
vector<int> temp;
the header info of the vector is on the stack but the contents are on the heap.
In that case, would the following code be erroneous?
vector<int> some_function() {
vector<int> some_vector;
some_vector.push_back(10);
some_vector.push_back(20);
return some_vector;
}
Should I have used vector<int> *some_vector = new vector<int> instead? Would the above code result in some kind of memory allocation issue? Would this change if I used an instance of a custom class instead of int?
Your code is perfectly fine.
Vectors manage all the memory they allocate for you.
It doesn't matter whether they store all their internal data using dynamic allocations, or hold some metadata as direct members (with automatic storage duration). Any dynamic allocations performed internally will be safely cleaned-up in the vector's destructor, copy constructor, and other similar special functions.
You do not need to do anything as all of that is abstracted away from your code. Your code has no visibility into that mechanism, and dynamically allocating the vector itself will not have any effect on it.
That is the purpose of them!
If you decide to allocate the vector dynamically, you will have a really hard time destroying it correctly even in very simple cases (do not forget about exceptions!). Avoid dynamic allocation whenever possible.
In other words, your code is perfectly correct. I would not worry about copying the returned vector in memory. In these simple cases compilers (in release builds) should use return value optimization / RVO (http://en.wikipedia.org/wiki/Return_value_optimization) and construct some_vector directly in the memory of the returned object. In C++11 you can also rely on move semantics.
But if you really do not trust the compiler using RVO, you can always pass a reference to a vector and fill it in inside the function.
//function definition
void some_function(vector<int> &v) { v.push_back(10); v.push_back(20); }
//function usage
vector<int> vec;
some_function(vec);
And back to dynamic allocation, if you really need to use it, try the pattern called RAII. Or use smart pointers.
It does not matter where a vector stores its data internally, because you return the vector by value. It is the same as returning an integer:
int some_function()
{
int x = 10;
return x;
}
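The question also asks whether anything changes for a custom class instead of int. A short sketch suggests it does not; Widget and make_widgets are hypothetical names used only for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical custom element type standing in for "a custom class".
struct Widget {
    std::string name;
    int weight;
};

// Returned by value: the compiler elides the copy or moves the vector, and
// the vector's destructor frees its storage once the caller is done with it.
std::vector<Widget> make_widgets() {
    std::vector<Widget> widgets;
    widgets.push_back(Widget{"bolt", 10});
    widgets.push_back(Widget{"nut", 20});
    return widgets;
}
```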

Can std::array (or boost::array) be used in this case, or am I stuck with std::vector and native arrays?

So std::array and boost::array (which are almost identical, and I will hereafter refer to ambiguously as just "array") were designed to provide a container object for arrays that does not incur the overheads of vector that are unnecessary if the array does not dynamically change size. However, they are both designed by taking the array size not as a constructor parameter but a template argument. The result: vector allows dynamic resizing after object creation; array requires the size to be known at compile time.
As far as I can see, if you have an array whose size you will know at object creation but not at compile time, then your only options are 1) unnecessarily incur extra overheads by using vector, 2) use the (non-container type) native array (e.g., int foo[42];), or 3) write your own array-wrapper class from scratch. So is this correct, that this is an in-between case where you may want to use array rather than vector, but cannot? Or is there some magic I can do that will make array work for me?
Here's a little detail (ok, a lot) on what inspired this question, in case it helps you understand:
I have a module - say the caller - that will repeatedly produce binary data at runtime (unsigned char[], or array), and then pass it to another module - say the callee. The callee module does not modify the array (it will make a copy and modify that if necessary), so once the caller module creates the array initially, it will not change size (nor indeed contents). However, two problems arise: 1) The caller may not generate arrays of the same size each time an array is generated - it will know the array size at runtime when it creates the array, but not at compile time. 2) The method for caller to pass the array to callee needs to be able to take an array of whatever size the caller passes to it.
I thought about making it a templated function, e.g.,
template<size_t N> void foo(const array<unsigned char, N>& my_array);
However, I'm using an interface class to separate interface from implementation in the callee module. Therefore, the function must be a virtual method, which is mutually exclusive with being templated. Furthermore, even if that were not an issue, it would still have the same problem as #1 above - if the array sizes are not known at compile time then the templated function cannot be resolved at compile time either.
My actual function:
virtual void foo(const array<unsigned char, N>& my_array); // but what is N???
So in summary, am I correct that my only real choices are to use a vector or native array, e.g.,
virtual void foo(const vector<unsigned char> my_array); // unnecessary overhead
virtual void foo(const unsigned char my_array[], size_t my_array_len); // yuk
Or is there some trick I'm overlooking that will let me use a std::array or boost::array?
std::dynarray was proposed for C++14 but was never standardized; in the meantime, you can use std::unique_ptr:
std::unique_ptr<Foo[]> arr(new Foo[100]);
You can use this as arr[0], arr[1], etc., and it will call the correct delete[] upon destruction. The overhead is minimal (just the pointer).
I think the only difference between an array-typed unique pointer and std::dynarray is that the latter has iterators, size, and other "containery" properties, and that it'll be in the "Containers" section rather than "General utilities". [Update: And that compilers may choose to natively support dynarray and optimize it to use stack storage.]
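Because an array-typed unique pointer does not remember its length, the size has to travel with it. One way to sketch that, using the hypothetical names ByteBuffer and make_buffer:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical pairing of an owning array pointer with its runtime length.
struct ByteBuffer {
    std::unique_ptr<unsigned char[]> data;
    std::size_t size;
};

// Allocates n zeroed bytes; the () after the bound value-initializes them.
// delete[] runs automatically when the ByteBuffer is destroyed.
ByteBuffer make_buffer(std::size_t n) {
    return ByteBuffer{std::unique_ptr<unsigned char[]>(new unsigned char[n]()), n};
}
```

The callee can then be handed buffer.data.get() together with buffer.size.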
You simply cannot use any form of std::array if you don't know the length at compile time.
If you don't know the size of your array at compile time, seriously consider using std::vector. Using variable length arrays (like int foo[n]) is not standard C++ and will cause stack overflows if the given length is big enough. Also, you cannot write any array-like wrapper with (measurably) less overhead than std::vector.
I would just use
virtual void foo(const unsigned char* my_array, size_t my_array_len);
And call it like
obj.foo(&vec[0], vec.size());
There is no overhead attached and it does what you want. In addition to normal arrays (int foo[42]) this can also be called with vectors and std::arrays with zero overhead.
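A sketch of how one pointer-plus-length signature serves all three kinds of storage. checksum is a hypothetical stand-in for the callee's foo, chosen only so the example has something to compute:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for foo(): reads len bytes and modifies nothing.
unsigned int checksum(const unsigned char* data, std::size_t len) {
    unsigned int sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

It can be called as checksum(vec.data(), vec.size()) for a std::vector<unsigned char>, checksum(arr.data(), arr.size()) for a std::array, or checksum(raw, 3) for a native array of three bytes. Using data() rather than &vec[0] also stays well-defined when the vector is empty.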
Other considerations:
A std::array stores its elements inline, so a local one is allocated on the stack. This is much faster than allocating on the heap.
A std::array of a class type always constructs all of its elements when it is created.
So:
class Foo;
std::array<Foo, 100> aFoo;
constructs 100 Foo objects, (calls Foo::Foo() 100 times) while
std::vector<Foo> vFoo;
vFoo.reserve(100);
reserves space for 100 Foo objects (on the heap), but doesn't construct any of them.
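The difference can be observed by counting constructor calls. A sketch, using a hypothetical Counted type in place of Foo:

```cpp
#include <array>
#include <vector>

// Hypothetical type that counts how many times its default constructor runs.
struct Counted {
    static int constructed;
    Counted() { ++constructed; }
};
int Counted::constructed = 0;

int constructions_for_array() {
    Counted::constructed = 0;
    std::array<Counted, 100> items;  // all 100 elements are constructed here
    (void)items;
    return Counted::constructed;
}

int constructions_for_reserve() {
    Counted::constructed = 0;
    std::vector<Counted> items;
    items.reserve(100);  // raw storage only, no Counted objects yet
    return Counted::constructed;
}
```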

Pass nested C++ vector as built-in style multi-dimensional array

If I have a vector in C++, I know I can safely pass it as an array (pointer to the contained type):
void some_function(size_t size, int array[])
{
// impl here...
}
// ...
std::vector<int> test;
some_function(test.size(), &test[0]);
Is it safe to do this with a nested vector?
void some_function(size_t x, size_t y, size_t z, int* multi_dimensional_array)
{
// impl here...
}
// ...
std::vector<std::vector<std::vector<int> > > test;
// initialize with non-jagged dimensions, ensure they're not empty, then...
some_function(test.size(), test[0].size(), test[0][0].size(), &test[0][0][0]);
Edit:
If it is not safe, what are some alternatives, both if I can change the signature of some_function, and if I can't?
Short answer is "no".
The elements of std::vector<std::vector<std::vector<int> > > test; are not placed in one contiguous memory area.
You can only expect multi_dimensional_array to point to a contiguous memory block of size test[0][0].size() * sizeof(int). But that is probably not what you want.
It is risky to take the address of a location in a vector and hold on to it. It might seem to work, but the pointer stays valid only until the vector reallocates.
The reason is closely tied to why a vector is a vector and not an array. We want a vector to grow dynamically, unlike an array. We want appending to a vector to have a constant cost that does not depend on the vector's size, just as writing into an array does until you hit the array's allocated size.
So how does the magic work? When there is no more internal space for the next element, a new, larger block is allocated (commonly twice the size of the old one, though the growth factor is implementation-defined). The old contents are copied to the new block and the old block is released, which leaves any pointer into it dangling. Growing geometrically is what makes the average (amortized) cost of insertion constant.
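The growth behaviour described above can be observed by counting how often capacity() changes while appending. A sketch with a hypothetical function name; the exact count depends on the standard library's growth factor:

```cpp
#include <cstddef>
#include <vector>

// Counts how often capacity() changes while appending n elements; each change
// is a reallocation that invalidates pointers into the old storage.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Reserving up front brings the count to zero, which is also the way to keep element pointers stable while filling the vector.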
Is it safe to do this with a nested vector?
Yes, IF you want to access the inner-most vector only, and as long you know the number of elements it contains, and you don't try accessing more than that.
But judging by your function signature, it seems that you want to access all three dimensions; in that case, no, that isn't valid.
The alternative is that you can call the function some_function(size_t size, int array[]) for each inner-most vector (if that solves your problem); and for that you can do this trick (or something similar):
void some_function(std::vector<int> & v1int)
{
//the final call to some_function(size_t size, int array[])
//which actually process the inner-most vectors
some_function(v1int.size(), &v1int[0]);
}
void some_function(std::vector<std::vector<int> > & v2int)
{
//call some_function(std::vector<int> & v1int) for each element!
std::for_each(v2int.begin(), v2int.end(), some_function);
}
//call some_function(std::vector<std::vector<int> > & v2int) for each element!
std::for_each(test.begin(), test.end(), some_function);
A very simple solution would be to simply copy the contents of the nested vector into one vector and pass it to that function. But this depends on how much overhead you are willing to take.
That being said: nested vectors aren't good practice. A matrix class that stores everything in contiguous memory and manages access is more efficient and less ugly, and could offer something like T* matrix::get_raw(), although the ordering of the contents would still be an implementation detail.
Simple answer - no, it is not. Did you try compiling this? And why not just pass the whole 3D vector as a reference? If you are trying to access old C code in this manner, then you cannot.
It would be much safer to pass the vector, or a reference to it:
void some_function(std::vector<std::vector<std::vector<int>>> & vector);
You can then get the size and items within the function, leaving less risk for mistakes. You can copy the vector or pass a pointer/reference, depending on expected size and use.
If you need to pass across modules, then it becomes slightly more complicated.
Trying to use &top_level_vector[0] and pass that to a C-style function that expects an int* isn't safe.
To support correct C-style access to a multi-dimensional array, all the bytes of the whole hierarchy of arrays would have to be contiguous. In a C++ std::vector, this is true for the items contained by a vector, but not for the vector itself. If you take the address of the top-level vector's first element, à la &top_level_vector[0], you get a pointer to an array of vectors, not to an array of int.
The vector structure isn't simply an array of the contained type. It is implemented as a structure containing a pointer, as well as size and capacity book-keeping data. Therefore the question's std::vector<std::vector<std::vector<int> > > is more or less a hierarchical tree of structures, stitched together with pointers. Only the final leaf nodes in that tree are blocks of contiguous int values. And each of those blocks of memory is not necessarily contiguous with any other block.
In order to interface with C, you can only pass the contents of a single vector. So you'll have to create a single std::vector<int> of size x * y * z. Or you could decide to re-structure your C code to handle a single 1-dimensional stripe of data at a time. Then you could keep the hierarchy, and only pass in the contents of leaf vectors.
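A sketch of the single-vector approach, assuming a row-major layout (an assumption; the C code may expect a different ordering). Grid3D and sum_cells are hypothetical names; sum_cells stands in for a C-style function with the question's signature:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style consumer: expects x * y * z contiguous ints.
int sum_cells(std::size_t x, std::size_t y, std::size_t z, const int* cells) {
    int total = 0;
    for (std::size_t i = 0; i < x * y * z; ++i) {
        total += cells[i];
    }
    return total;
}

// Hypothetical wrapper: one flat std::vector<int> addressed with 3D indices.
class Grid3D {
public:
    Grid3D(std::size_t x, std::size_t y, std::size_t z)
        : x_(x), y_(y), z_(z), cells_(x * y * z) {}

    int& at(std::size_t i, std::size_t j, std::size_t k) {
        assert(i < x_ && j < y_ && k < z_);
        return cells_[(i * y_ + j) * z_ + k];  // row-major: k varies fastest
    }

    int* data() { return cells_.data(); }
    std::size_t x() const { return x_; }
    std::size_t y() const { return y_; }
    std::size_t z() const { return z_; }

private:
    std::size_t x_, y_, z_;
    std::vector<int> cells_;
};
```

A call then looks like sum_cells(grid.x(), grid.y(), grid.z(), grid.data()), and the pointer really does cover all x * y * z values.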