Initializing multi-dimensional std::vector without knowing dimensions in advance - c++

Context: I have a class, E (think of it as an organism) and a struct, H (a single cell within the organism). The goal is to estimate some characterizing parameters of E. H has some properties that are stored in multi-dimensional matrices. But, the dimensions depend on the parameters of E.
E reads a set of parameters from an input file, declares some objects of type H, solves each of their problems and fills the matrices, computes a likelihood function, exports it, and moves on to next set of parameters.
What I used to do: I used to declare pointers to pointers to pointers in H's header, and postpone memory allocation to H's constructor. This way, E could pass parameters to constructor, and memory allocation could be done afterwards. I de-allocated memory in the destructor.
Problem: Yesterday, I realized this is bad practice! So, I decided to try vectors. I have read several tutorials. At the moment, the only thing that I can think of is using push_back() as used in the question here. But, I have a feeling that this might not be the best practice (as mentioned by many, e.g., here, under method 3).
There are tens of questions that are tangent to this, but none answers this question directly: What is the best practice if dimensions are not known in advance?
Any suggestion helps: Do I have any other solution? Should I stick to arrays?

Using push_back() should be fine, as long as the vector has reserved the appropriate capacity.
If your only hesitation about using push_back() is the copy overhead when a reallocation is performed, there is a straightforward way to resolve that. Use the reserve() method to tell the vector how many elements it will eventually hold. So long as reserve() is called before the vector is filled, there will be just a single allocation of the needed amount, and push_back() will not trigger any reallocation while the vector is being filled.
From the example in your cited source:
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
    // construct a vector of ints with the given default value
    std::vector<int> v;
    v.reserve(N);
    for (int j = 0; j < N; j++) {
        v.push_back(default_value);
    }
    // push back above one-dimensional vector
    matrix.push_back(v);
}
This particular example is contrived. As @kei2e noted in a comment, the inner v variable could be initialized once outside the loop, and then reused for each row.
However, as noted by @Jarod42 in a comment, the whole thing can actually be accomplished with the appropriate construction of matrix:
std::vector<std::vector<int>> matrix(M, std::vector<int>(N, default_value));
If this initialization task was populating matrix with values from some external source, then the other suggestion by @Jarod42 could be used: move each row into place to avoid a copy.
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
    std::vector<int> v;
    v.reserve(N);
    for (int j = 0; j < N; j++) {
        v.push_back(source_of_value());
    }
    matrix.push_back(std::move(v));
}
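Applied to the original question, the same constructor form lets H size its matrices from parameters that E only learns at run time. This is a minimal sketch, assuming a three-dimensional matrix of double; the member name values and the parameters n1, n2, n3 are hypothetical and not taken from the question:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's struct H.
struct H {
    using Matrix3D = std::vector<std::vector<std::vector<double>>>;

    // Dimensions arrive at run time; the member initializer list builds
    // every level at its final size, zero-filled.
    H(std::size_t n1, std::size_t n2, std::size_t n3)
        : values(n1, std::vector<std::vector<double>>(
                         n2, std::vector<double>(n3, 0.0))) {}

    Matrix3D values;  // released automatically; no destructor needed
};
```

E would then create a cell with something like H cell(n1, n2, n3); and access elements as cell.values[i][j][k].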

Related

Populating a vector with known number of elements: specify its size in constructor or by using reserve method?

I would like to create a vector of some complex type, by reading individual elements from a stream. I know the vector size in advance. Is it better to specify the number of elements in the vector constructor or by using reserve method? Which one of these two is better?
int myElementCount = stream.ReadInt();
vector<MyElement> myVector(myElementCount);
for (int i = 0; i < myElementCount; i++)
{
    myVector[i] = stream.ReadMyElement();
}
or
int myElementCount = stream.ReadInt();
vector<MyElement> myVector;
myVector.reserve(myElementCount);
for (int i = 0; i < myElementCount; i++)
{
    myVector.push_back(stream.ReadMyElement());
}
What about the case where I just create a vector of ints or some other simple type.
It depends on what MyElement is, especially what its operator= does, so this is largely the usual "try both and use whichever is faster for you". There is a third choice: use C++11 and emplace_back, especially if MyElement is heavy.
As a datapoint, for int or double I found that using the constructor (or resize()) and [] is faster. Specifically, this way the loop is much easier for the compiler to vectorize.
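The emplace_back option could look like the sketch below. MyElement here is a hypothetical two-field type (the real one is not shown in the question), readElements is an invented helper, and a std::istream stands in for the question's stream object:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type; note that it has no default constructor.
struct MyElement {
    MyElement(int id_, std::string name_) : id(id_), name(std::move(name_)) {}
    int id;
    std::string name;
};

// Reads a count followed by that many (id, name) pairs.
std::vector<MyElement> readElements(std::istream& in) {
    std::size_t count = 0;
    in >> count;

    std::vector<MyElement> result;
    result.reserve(count);  // one allocation up front; size stays 0

    int id = 0;
    std::string name;
    for (std::size_t i = 0; i < count && (in >> id >> name); ++i) {
        result.emplace_back(id, name);  // constructs the element in place
    }
    return result;
}
```

One design point: the reserve-based variants also work for a type without a default constructor, such as this MyElement, whereas vector<MyElement> myVector(myElementCount) requires one.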

C++ Class Variables: Initialization vs. Assignment and Initialization of vectors

I am working on a C++ program that has a series of class variables that contain vectors on some or all of the member variables. My question is three-fold:
Is it straight-forward to use constructors to initialize vector variables that are part of a class (see sample class definition below)? Could someone post an example constructor for the class below (or for at least the single and two-dimension vector variables)?
Is there a problem with simply initializing the variables myself in my code (i.e., iterating through each element of the vectors using loops to assign an initial value)?
Along the same lines, if the variables need to be initialized to different values in different contexts (e.g., zero in one instance, some number in another instance), is there a way to handle that through constructors?
Sample class definition:
class CreditBasedPoolLevel {
public:
    int NumofLoans;
    int NumofPaths;
    int NumofPeriods;
    double TotalPoolBal;
    vector<int> NumofModeled;
    vector<double> ForbearanceAmt;
    vector<double> TotalAmtModeled;
    vector<vector<int>> DefCountPoolVector;
    vector<vector<double>> TermDefBalPoolVector;
    vector<vector<double>> BalloonDefBalPoolVector;
    vector<vector<double>> TermDefBalPoolVectorCum;
    vector<vector<double>> TermSeverityAmt;
    vector<vector<double>> TermELAmtPoolVector;
    vector<vector<double>> BalloonELAmtPoolVector;
    vector<vector<double>> TermELAmtPoolVectorCum;
};
In C++, defining a variable of class type calls its constructor. For a vector, default construction produces an empty vector: its size is 0, and its capacity is unspecified (typically 0), not a fixed number of slots. At this point you need push_back to fill it; accessing an element that does not exist (such as NumofModeled[0]) is undefined behavior, whatever the capacity happens to be. You can also create the vector with a given number of elements by writing vector<int> NumofModeled(x), which gives you x value-initialized (zero) elements that can be assigned through []. Because vectors grow dynamically, though, it is generally easier to use push_back unless there is some reason you need to enter your data out of order.
This relates to the capacity point in the first answer: if you try to access an element beyond a vector's size, you get undefined behavior. It is pretty standard practice to fill a vector with a loop, though, such as:
vector<int> v;
int in = 0;
while (cin >> in) // stops on end of input or a failed read, so no stale value is pushed
{
    v.push_back(in);
}
Yes, but remember that, as with functions, constructor overloads are distinguished only by the types of their parameters. So, for example, you could have CreditBasedPoolLevel(int level) and CreditBasedPoolLevel(vector<int> levels), but not another with the definition CreditBasedPoolLevel(int otherint), because it would conflict with the first. If you want to accept different contextual input of the same type, you can add another parameter that selects the behavior, such as CreditBasedPoolLevel(int input, string type), and branch on type to choose the initialization logic. Note that C++ cannot switch on a string, so use an if/else chain, or make the selector an enum if you want a switch block.
As for question number three, simply add a constructor with an argument that is the value you want to initialize the vectors with.
And if you just want the vectors to be default constructed, then there's nothing that needs to be done.
Constructor may look something like this:
CreditBasedPoolLevel::CreditBasedPoolLevel()
{
    const int numDefCountPools = 13;
    const int numDefCountPoolEntries = 25;
    for(int i = 0; i < numDefCountPools; i++)
    {
        vector<int> v;
        for(int j = 0; j < numDefCountPoolEntries; j++)
        {
            v.push_back(j + i * 5); // Don't know what value you ACTUALLY want to fill here
        }
        DefCountPoolVector.push_back(v);
    }
}
Note that this is ONE solution; what the "right" solution is for your case really depends on what values you want, how you want them organized, and so on.
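If the dimensions and the initial value are known when the object is created, the loops can be replaced by a member initializer list. The following is a sketch on a trimmed-down version of the class; the name PoolLevelSketch and the parameter names are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Trimmed-down, hypothetical variant of CreditBasedPoolLevel.
class PoolLevelSketch {
public:
    // initialValue defaults to 0, so one constructor covers both the
    // "zero everywhere" case and the "some other number" case.
    PoolLevelSketch(std::size_t numPaths, std::size_t numPeriods,
                    double initialValue = 0.0)
        : NumofModeled(numPaths, 0),
          ForbearanceAmt(numPaths, initialValue),
          TermDefBalPoolVector(numPaths,
                               std::vector<double>(numPeriods, initialValue)) {}

    std::vector<int> NumofModeled;
    std::vector<double> ForbearanceAmt;
    std::vector<std::vector<double>> TermDefBalPoolVector;
};
```

PoolLevelSketch zeros(3, 4); would then give a 3 x 4 matrix of zeros, and PoolLevelSketch filled(2, 5, 1.5); a 2 x 5 matrix of 1.5.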

Multidimensional vector bus error

I have a 11663 Bus Error when I attempt to do the following;
std::vector< std::vector<int> > bullets;
std::vector<int> num;
num[0] = 7;
bullets.push_back(num);
I thought this would work as the vector bullets's type is a vector. Why doesn't this work as expected? Also, the following works;
std::vector< std::vector<int> > bullets;
std::vector<int> num (4, 100);
bullets.push_back(num);
And I don't know why this works, but not my other code.
std::vector<int> num;
num[0] = 7;
num is empty: it has no elements yet, so num[0] refers to an element that does not exist, which is undefined behavior (here it showed up as a bus error). Only use the indexing syntax [] if you know an element exists at that index. Otherwise, use push_back, which appends an element and grows the vector's storage if needed. The second example works because that constructor creates the elements themselves (4 in this case, all with the value 100), so indices 0 through 3 are valid.
std::vector<int> num;
num.push_back(7);
bullets.push_back(num);
On a side note, "this doesn't work" is not a very helpful problem description. Also, note that a vector of vectors used as a matrix is not a good idea in performance critical code should you need to iterate over each element.
Don't scrap it just yet, and don't worry about it unless you know for a fact that it will be a problem, but realize that you lose locality of data with this approach because each vector allocates its storage separately. If this data is being iterated over in a tight loop, you are better off allocating one big vector and calculating the offset to each individual position manually.
num[0] = 7;
should be
num.push_back(7);
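When it is unclear whether an index exists, at() is a safer way to find out than []: it throws std::out_of_range instead of invoking undefined behavior. A small sketch follows; the helper name trySet is made up:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes value at index if that element exists; reports failure otherwise.
bool trySet(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // bounds-checked, unlike v[index]
        return true;
    } catch (const std::out_of_range&) {
        return false;  // index >= v.size(): nothing was written
    }
}
```

With an empty num, trySet(num, 0, 7) returns false; after num.push_back(0), the same call returns true and stores 7.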

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j].setOn(true);
        ...
    }
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j] = new Pixel();
        pixelArray[i][j]->setOn(true);
        ...
    }
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on the stack (assuming the array is a local variable), which is otherwise fine but leads to a stack overflow when the array is too large. The default stack limit is typically somewhere between 1 and 8 megabytes depending on the OS, so allocating an array of 1300 * 1372 elements on the stack is not an option.
Your second approach makes 1300 * 1372 separate allocations on the heap, which is a tremendous load for the allocator, which holds multiple linked lists of chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i * 1372 + j].setOn(true);
        ...
    }
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information (and as written it is not even valid C++: only the first array dimension may be left unspecified). You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: the double subscript, as in pixelArray[x][y], works. Disadvantages: creating such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector are not guaranteed to all have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
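The single-vector option with a row-major convention might be passed to a function as in this sketch. The Pixel struct here is a minimal hypothetical stand-in for the asker's class, and indexOf and countOn are invented example functions:

```cpp
#include <cstddef>
#include <vector>

// Minimal hypothetical stand-in for the asker's Pixel class.
struct Pixel {
    bool on = false;
    void setOn(bool value) { on = value; }
};

// Row-major convention: element (row, col) lives at row * cols + col.
inline std::size_t indexOf(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// The vector carries its own total size; only the column count has to be
// passed alongside it.
std::size_t countOn(const std::vector<Pixel>& pixels, std::size_t cols) {
    std::size_t count = 0;
    const std::size_t rows = cols == 0 ? 0 : pixels.size() / cols;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            if (pixels[indexOf(r, c, cols)].on) ++count;
    return count;
}
```

Keeping the convention in one helper such as indexOf means a caller cannot accidentally mix row-major and column-major addressing.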
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
for(gray8_image_t::iterator pIt = pixelArray.begin(); pIt != pixelArray.end(); pIt++) {
    *pIt = 1;
}
My personal preference would be to use std::vector:
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
//                     ^^^^           ^^^^  ^^^^^^^^^^^
//                     dim1           dim2  default value
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
    Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
        for (size_t r = 0; r < rsz; r++)
            for (size_t c = 0; c < csz; c++)
                at(r, c).setOn(true);
    }
    Pixel &at(size_t row, size_t col) {return data.at(row*csz+col);}
    std::vector<Pixel> data;
    size_t rsz;
    size_t csz;
};

Advantage of STL resize()

The resize() function makes a vector contain the required number of elements. If we request fewer elements than the vector already contains, the last ones are removed. If we ask the vector to grow, it enlarges its size and fills the newly created elements with zeroes.
vector<int> v(20);
for(int i = 0; i < 20; i++) {
    v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
    v[i] = i*2;
}
But if we use push_back() after resize(), it will add elements AFTER the newly allocated size, but not INTO it. In the example above the size of the resulting vector is 25, while if we use push_back() in a second loop, it would be 30.
vector<int> v(20);
for(int i = 0; i < 20; i++) {
    v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
    v.push_back(i*2); // Writes to elements with indices [25..30), not [20..25) !
}
Then where is the advantage of the resize() function? Doesn't it create confusion when indexing and accessing elements of the vector?
It sounds as though you should be using vector::reserve.
vector::resize is used to initialize the newly created space with a given value (or just the default.) The second parameter to the function is the initialization value to use.
Remember the alternative: reserve. resize is used when you want to act on the vector using the [] operator, so you need an "empty" table of elements. resize is not intended to be used with push_back. Use reserve if you want to prepare the vector for push_back.
resize is mainly useful if the element type has a meaningful default ("empty") constructor, so that you can create a vector of empty elements and change only the ones that matter.
The resize() method changes the vector's size, which is not the same as the vector's capacity.
It is very important to understand the distinction between these two values:
The size is the number of actual elements that the vector contains.
The capacity is the maximum number of elements that the vector could contain without reallocating a larger chunk of memory.
A vector's capacity is always greater than or equal to its size. Reducing the size does not shrink the capacity; to release unused memory you can use swap() to exchange the contents with another vector or, since C++11, call shrink_to_fit(), which is a non-binding request. And as others have mentioned, you can increase a vector's capacity by calling reserve().
I think that using the correct terminology for size and capacity makes it easier to understand the C++ vector class and to speak clearly about its behavior.
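The distinction can be observed directly through size() and capacity(). In this sketch the helper names measure and shrinkBySwap are invented, and exact capacity values are implementation-defined, so the comments state only what can be relied on:

```cpp
#include <cstddef>
#include <vector>

struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// Reports both values so they can be compared side by side.
SizeAndCapacity measure(const std::vector<int>& v) {
    return {v.size(), v.capacity()};
}

// The swap idiom mentioned above: copy the elements into a fresh vector
// and exchange buffers. The resulting capacity is implementation-defined
// but is typically close to size().
void shrinkBySwap(std::vector<int>& v) {
    std::vector<int>(v).swap(v);
}
```

After v.reserve(1000) followed by v.resize(10), measure(v) reports a size of 10 and a capacity of at least 1000; shrinkBySwap(v) keeps the 10 elements and usually releases the excess.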
resize() function changes the actual content of the vector by inserting or erasing elements from the vector. It does not only change its storage capacity. To direct a change only in storage capacity, use vector::reserve instead. Have a look at the vector visualization in the link, notice where v.back is pointing to.
I don't really understand the confusion. The advantage of resize is that it resizes your vector. Having to do a loop of push_backs is both tedious and may require more than one "actual" resize.
If you want to "resize" your vector without changing its accessible indexes then use std::vector<T>::reserve. That will change the size of the internal allocated array without actually "adding" anything.
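Side by side, the two approaches from the answers look like this. The function names are invented; both build the same 25 elements as the question's first example:

```cpp
#include <vector>

// resize() creates the elements, so they can be assigned through [].
std::vector<int> fillWithResize() {
    std::vector<int> v(20);
    for (int i = 0; i < 20; i++) v[i] = i + 1;
    v.resize(25);  // size is now 25; v[20] through v[24] exist
    for (int i = 20; i < 25; i++) v[i] = i * 2;
    return v;
}

// reserve() only allocates; push_back() then creates each element.
std::vector<int> fillWithReserve() {
    std::vector<int> v;
    v.reserve(25);  // size is still 0
    for (int i = 0; i < 20; i++) v.push_back(i + 1);
    for (int i = 20; i < 25; i++) v.push_back(i * 2);
    return v;
}
```

The rule of thumb this illustrates: pair resize() with [], and pair reserve() with push_back().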