Multidimensional vector bus error - c++

I have a 11663 Bus Error when I attempt to do the following;
std::vector< std::vector<int> > bullets;
std::vector<int> num;
num[0] = 7;
bullets.push_back(num);
I thought this would work as the vector bullets's type is a vector. Why doesn't this work as expected? Also, the following works;
std::vector< std::vector<int> > bullets;
std::vector<int> num (4, 100);
bullets.push_back(num);
And I don't know why this works, but not my other code.

std::vector<int> num;
num[0] = 7;
num has no elements yet, so there is nothing at index 0 to assign to. Only use the indexing syntax [] if you know an element exists at that index. Otherwise, use push_back, which adds an element and grows the vector's storage if needed. The second example works because you used the constructor that creates the elements up front (4 in this case, all with the value 100).
std::vector<int> num;
num.push_back(7);
bullets.push_back(num);
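A minimal sketch of the same point, not taken from the original answer: at() is the bounds-checked counterpart of [] and throws std::out_of_range where [] would be undefined behavior. The helper names try_store and make_bullets are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes `value` at `index` through the bounds-checked
// at(). Returns false if no element exists at that index.
bool try_store(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // throws std::out_of_range if index >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Builds the question's `bullets` without touching a missing element.
std::vector<std::vector<int>> make_bullets() {
    std::vector<std::vector<int>> bullets;
    std::vector<int> num;              // empty: there is no element 0 yet
    bool ok = try_store(num, 0, 7);    // refused, where num[0] = 7 would be UB
    assert(!ok);
    num.push_back(7);                  // creates element 0
    ok = try_store(num, 0, 8);         // the index is valid now
    assert(ok);
    bullets.push_back(num);
    return bullets;
}
```

Using at() while debugging is one way to turn this kind of silent memory error into an exception you can catch.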
On a side note, "this doesn't work" is not a very helpful problem description. Also, note that a vector of vectors used as a matrix is not a good idea in performance-critical code if you need to iterate over every element.
Don't scrap it just yet, and don't worry about it unless you know for a fact that it will be a problem, but realize that you lose locality of data with this approach because each vector allocates its storage separately. If this data is being iterated over in a tight loop, you are better off allocating one big vector and calculating the offset to each individual position manually.
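The single-vector layout mentioned above could look roughly like this. It is a sketch under assumptions, not the answerer's code: the class name FlatGrid and its members are invented for illustration, and a row-major layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a rows x cols grid stored in one contiguous vector.
class FlatGrid {
public:
    FlatGrid(std::size_t rows, std::size_t cols, int value = 0)
        : rows_(rows), cols_(cols), data_(rows * cols, value) {}

    // Row-major offset: the elements of one row sit next to each other.
    int& at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const std::vector<int>& data() const { return data_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

For example, FlatGrid g(3, 4); g.at(1, 2) = 7; writes to offset 1 * 4 + 2 of the underlying vector.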

num[0] = 7;
should be
num.push_back(7);

Related

Initializing multi-dimensional std::vector without knowing dimensions in advance

Context: I have a class, E (think of it as an organism) and a struct, H (a single cell within the organism). The goal is to estimate some characterizing parameters of E. H has some properties that are stored in multi-dimensional matrices. But, the dimensions depend on the parameters of E.
E reads a set of parameters from an input file, declares some objects of type H, solves each of their problems and fills the matrices, computes a likelihood function, exports it, and moves on to next set of parameters.
What I used to do: I used to declare pointers to pointers to pointers in H's header, and postpone memory allocation to H's constructor. This way, E could pass parameters to constructor, and memory allocation could be done afterwards. I de-allocated memory in the destructor.
Problem: Yesterday, I realized this is bad practice! So, I decided to try vectors. I have read several tutorials. At the moment, the only thing that I can think of is using push_back() as used in the question here. But, I have a feeling that this might not be the best practice (as mentioned by many, e.g., here, under method 3).
There are tens of questions that are tangent to this, but none answers this question directly: What is the best practice if dimensions are not known in advance?
Any suggestion helps: Do I have any other solution? Should I stick to arrays?
Using push_back() should be fine, as long as the vector has reserved the appropriate capacity.
If your only hesitation about using push_back() is the copy overhead when a reallocation is performed, there is a straightforward way to resolve that issue. You use the reserve() method to tell the vector how many elements it will eventually have. So long as reserve() is called before the vector is filled, there will be just a single allocation for the needed amount. Then, push_back() will not incur any reallocations as the vector is being filled.
From the example in your cited source:
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
// construct a vector of ints with the given default value
std::vector<int> v;
v.reserve(N);
for (int j = 0; j < N; j++) {
v.push_back(default_value);
}
// push back above one-dimensional vector
matrix.push_back(v);
}
This particular example is contrived. As @kei2e noted in a comment, the inner v variable could be initialized once outside the loop, and then reused for each row.
However, as noted by @Jarod42 in a comment, the whole thing can actually be accomplished with the appropriate construction of matrix:
std::vector<std::vector<int>> matrix(M, std::vector<int>(N, default_value));
If this initialization task were populating matrix with values from some external source, then the other suggestion by @Jarod42 could be used: move each row into place to avoid a copy.
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
std::vector<int> v;
v.reserve(N);
for (int j = 0; j < N; j++) {
v.push_back(source_of_value());
}
matrix.push_back(std::move(v));
}
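Applied to the question's setting, where the dimensions only become known when E passes its parameters along, the sized constructor can go straight into a member initializer list. The following is a sketch under assumptions: the member name values and the helper make_cell are made up, and H is reduced to a single two-dimensional matrix of doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the asker's H: the dimensions arrive as constructor arguments.
struct H {
    std::vector<std::vector<double>> values;  // hypothetical member name

    // Allocates rows x cols zeros. No destructor is needed, because the
    // vectors release their own memory.
    H(std::size_t rows, std::size_t cols)
        : values(rows, std::vector<double>(cols, 0.0)) {}
};

// Hypothetical stand-in for "E reads parameters, then creates a cell".
H make_cell(std::size_t rows, std::size_t cols) {
    H cell(rows, cols);
    assert(cell.values.size() == rows);
    return cell;
}
```

This replaces the pointer-to-pointer members and the manual allocation and deallocation in the constructor and destructor.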

Performance impact when resizing vector within capacity

I have the following synthesized example of my code:
#include <vector>
#include <array>
#include <cstdlib>
#define CAPACITY 10000
int main() {
std::vector<std::vector<int>> a;
std::vector<std::array<int, 2>> b;
a.resize(CAPACITY, std::vector<int> {0, 0});
b.resize(CAPACITY, std::array<int, 2> {0, 0});
for (;;) {
size_t new_rand_size = (std::rand() % CAPACITY);
a.resize(new_rand_size);
b.resize(new_rand_size);
for (size_t i = 0; i < new_rand_size; ++i) {
a[i][0] = std::rand();
a[i][1] = std::rand();
b[i][0] = std::rand();
b[i][1] = std::rand();
}
process(a); // respectively process(b)
}
}
so obviously, the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?). It just gets reinitialized when up-resizing again within capacity.
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
so obviously,
The word "obviously" is typically used to mean "I really, really want the following to be true, so I'm going to skip the part where I determine if it is true." ;) (Admittedly, you did better than most since you did bring up some reasons for your conclusion.)
the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?).
The truth of this depends on the implementation, but there is some validity here. I would go with a less micro-managing approach and say that the array version is preferable because the final size is fixed. Using a tool designed for your specialized situation (fixed size array) tends to incur less overhead than using a tool for a more general situation. Not always less, though.
Another factor to consider is the cost of default-initializing the elements. When a std::array is constructed, all of its elements are constructed as well. With a std::vector, you can defer constructing elements until you have the parameters for construction. For objects that are expensive to default-construct, you might be able to measure a performance gain using a vector instead of an array. (If you cannot measure a difference, don't worry about it.)
When you do a comparison, make sure the vector is given a fair chance by using it well. Since the size is known in advance, reserve the required space right away. Also, use emplace_back to avoid a needless copy.
Final note: "contiguous" is a bit more accurate/descriptive than "continuous".
It just gets reinitialized when up-resizing again within capacity.
This is a factor that affects both approaches. In fact, this causes your code to exhibit undefined behavior. For example, let's suppose that your first iteration resizes the outer vector to 1, while the second resizes it to 5. Compare what your code does to the following:
std::vector<std::vector<int>> a;
a.resize(CAPACITY, std::vector<int> {0, 0});
a.resize(1);
a.resize(5);
std::cout << "Size " << a[1].size() <<".\n";
The output indicates that the size is zero at this point, yet your code would assign a value to a[1][0]. If you want each element of a to default to a vector of 2 elements, you need to specify that default each time you resize a, not just initially.
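A small sketch of that fix, with a made-up helper name (resize_rows): the fill value is passed on every resize, so rows created by growing again also get two elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: resizes the outer vector and gives every newly
// created row two elements. Existing rows are left as they are.
void resize_rows(std::vector<std::vector<int>>& a, std::size_t new_size) {
    // The second argument is only used for elements that resize() creates.
    a.resize(new_size, std::vector<int>(2, 0));
    assert(a.empty() || a.back().size() == 2);
}
```

With this helper, the sequence resize to 1, then resize to 5 leaves a[1] with two elements, so a[1][0] is safe to assign.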
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.
Yes, you can skip the initialization. In fact, it is advisable to do so. Use the tool designed for the task at hand. Your initialization serves to increase the capacity of your vectors. So use the method whose sole purpose is to increase the capacity of a vector: vector::reserve.
Another option, depending on the exact situation, might be to not resize at all. Start with an array of arrays, and track the last usable element in the outer array. This is sort of a step backwards in that you now have a separate variable for tracking the size, but if your real code has enough iterations, the savings from not calling destructors when the size decreases might make this approach worth it. (For cleaner code, write a class that wraps the array of arrays and that tracks the usable size.)
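Such a wrapper class might look roughly like this. It is only a sketch: the name PairBuffer, its interface, and the template parameter are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: fixed storage plus a count of the usable pairs.
template <std::size_t Capacity>
class PairBuffer {
public:
    // Changes only the logical size. No element is constructed or destroyed.
    void set_size(std::size_t n) {
        assert(n <= Capacity);
        size_ = n;
    }

    std::size_t size() const { return size_; }

    std::array<int, 2>& operator[](std::size_t i) {
        assert(i < size_);
        return storage_[i];
    }

private:
    std::array<std::array<int, 2>, Capacity> storage_{};  // zeroed once
    std::size_t size_ = 0;
};
```

Elements keep whatever was last written to them when the size shrinks and grows again, which is acceptable here because every usable element is overwritten on each iteration. For a large Capacity, the object is better placed in static or heap storage than on the stack.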
Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization
Yes: Don't resize. Instead, reserve the capacity and push (or emplace) the new elements.
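As a rough sketch of that advice for the array version (the function name refill is made up, and the caller is assumed to have called reserve once before the loop):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: refills `b` with `count` fresh pairs without
// initializing them first. Assumes enough capacity was reserved up front.
void refill(std::vector<std::array<int, 2>>& b, std::size_t count) {
    assert(count <= b.capacity());
    b.clear();  // size becomes 0, the capacity is kept
    for (std::size_t i = 0; i < count; ++i) {
        // push_back with a braced list; std::array is an aggregate, so
        // emplace_back(x, y) would not compile before C++20.
        b.push_back({std::rand(), std::rand()});
    }
}
```

Each element is written exactly once per iteration, and no reallocation happens as long as count stays within the reserved capacity.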

How to reserve a multi-dimensional Vector without increasing the vector size?

I have data which is N by 4 which I push back data as follows.
vector<vector<int>> a;
for(some loop){
...
a.push_back(vector<int>{val1, val2, val3, val4});
}
N would be less than 13000. In order to prevent unnecessary reallocation, I would like to reserve 13000 by 4 spaces in advance.
After reading multiple related posts on this topic (eg How to reserve a multi-dimensional Vector?), I know the following will do the work. But I would like to do it with reserve() or any similar function if there are any, to be able to use push_back().
vector<vector<int>> a(13000, vector<int>(4));
or
vector<vector<int>> a;
a.resize(13000,vector<int>(4));
How can I just reserve memory without increasing the vector size?
If your data is guaranteed to be N x 4, you do not want to use a std::vector<std::vector<int>>, but rather something like std::vector<std::array<int, 4>>.
Why?
It's the more semantically-accurate type - std::array is designed for fixed-width contiguous sequences of data. (It also opens up the potential for more performance optimizations by the compiler, although that depends on exactly what it is that you're writing.)
Your data will be laid out contiguously in memory, rather than every one of the different vectors allocating potentially disparate heap locations.
Having said that, @pasbi's answer is correct: You can use std::vector::reserve() to allocate space for your outer vector before inserting any actual elements (both for vectors-of-vectors and for vectors-of-arrays). Also, later on, you can use the std::vector::shrink_to_fit() method if you ended up inserting a lot fewer elements than you had planned.
Finally, one other option is to use a gsl::multi_span and pre-allocate memory for it (GSL is the C++ Core Guidelines Support Library).
You've already answered your own question.
There is a function vector::reserve which does exactly what you want.
vector<vector<int>> a;
a.reserve(N);
for(some loop){
...
a.push_back(vector<int>{val1, val2, val3, val4});
}
This will reserve memory to fit N times vector<int>. Note that the actual size of the inner vector<int> is irrelevant at this point since the data of a vector is allocated somewhere else, only a pointer and some bookkeeping is stored in the actual std::vector-class.
Note: this answer is only here for completeness in case you ever come to have a similar problem with an unknown size; keeping a std::vector<std::array<int, 4>> in your case will do perfectly fine.
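For comparison, a sketch of the vector-of-arrays variant with reserve() and push_back(). The function name load_rows, the alias Row, and the placeholder values are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::array<int, 4>;

// Hypothetical loader: stores `n` rows after reserving room for `max_rows`.
std::vector<Row> load_rows(std::size_t n, std::size_t max_rows) {
    assert(n <= max_rows);
    std::vector<Row> a;
    a.reserve(max_rows);  // capacity only; a.size() is still 0
    for (std::size_t i = 0; i < n; ++i) {
        const int v = static_cast<int>(i);
        a.push_back({v, v + 1, v + 2, v + 3});  // placeholder values
    }
    a.shrink_to_fit();  // non-binding request to drop the unused capacity
    return a;
}
```

A call such as load_rows(10, 13000) performs one allocation for the rows while filling, and the four ints of each row live inside that single block.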
To pick up on einpoklum's answer, and in case you didn't find this earlier, it is almost always a bad idea to have nested std::vectors, because of the memory layout he spoke of. Each inner vector will allocate its own chunk of data, which won't (necessarily) be contiguous with the others, which will produce cache misses.
Preferably, either:
Like already said, use an std::array if you have a fixed and known amount of elements per vector;
Or flatten your data structure by having a single std::vector<T> of size N x M.
// Assuming N = 13000, M = 4
std::vector<int> vec;
vec.reserve(13000 * 4); // capacity only: push_back the values before indexing
Then you can access it like so:
// Before:
int& element = vec[nIndex][mIndex];
// After (row-major, so the M values of one row stay adjacent):
int& element = vec[nIndex * 4 + mIndex]; // Still assuming M = 4

std::vector capacity smart implementation

I know that std::vector capacity behavior is implementation specific, is there any smart implementation that does this :
vector<int> v;
for(int i = 0; i < 10000 ; ++i){
v.push_back(i);
}
At initialisation, it can predict the capacity of the 'vector', in this example it will initiate the capacity to 10000
I am asking for this because I always thought gcc does this kind of predictions, but I couldn't find anything about this ... I think I have seen this somewhere, so is there any implementation that does this ?
Nothing gets predicted. However:
one can use reserve to preallocate the maximum required number of elements. push_back will then never need to reallocate.
push_back uses the vector's growth strategy, which allocates room for more than just one more element. The growth factor is implementation-specific (IIRC it is 2 in GCC's libstdc++), which means that the number of reallocations in a series of N push_back calls grows only logarithmically, roughly log2(N) for a factor of 2. Therefore, the total cost of N calls to push_back stays linear in N, i.e. amortized constant time per call.
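The two points above can be checked with a small experiment. This is a sketch with a made-up function name (count_reallocations); it treats every change of capacity() as one reallocation, and the exact count without reserve depends on the implementation's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often capacity() changes while pushing `n` ints.
// If `reserve_first` is true, the full capacity is requested up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

For n = 10000, the version with reserve reports no reallocations, while the version without reports a small number that grows with log(n) rather than with n.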
There are several constructors for std::vector. One of them takes the number of elements you want in the vector and the value to give each of them.
From the documentation of std::vector:
// constructors used in the same order as described above:
std::vector<int> first; // empty vector of ints
std::vector<int> second (4,100); // four ints with value 100
std::vector<int> third (second.begin(),second.end()); // iterating through second
std::vector<int> fourth (third); // a copy of third
This is useful if you know in advance the maximum size of your vector.

Error: Deallocating a 2D array

I am developing a program in which one of the tasks is to read points (x, y and z) from a text file and then store them in an array. The text file may contain 10^2 or even 10^6 points, depending on which file the user selects. Therefore I am defining a dynamic array.
For allocating a dynamic 2D array, I wrote as below and it works fine:
const int array_size = 100000;
float** array = new float* [array_size];
for(int i = 0; i < array_size; ++i){
array[i] = new float[2]; // 0,1,2 being the columns for x,y,z co-ordinates
}
After the points are saved in the array, I write the following to deallocate the allocated memory:
for (int i = 0; i < array_size; i++){
delete [] array[i];
}
delete [] array;
and then my program stops working and shows "Project.exe stopped working".
If I don't deallocate, the program works just fine.
In your comment you say "0,1,2 being the columns for x,y,z co-ordinates". If that's the case, you need to allocate float[3]. When you allocate an array float[N], you are allocating a chunk of memory of size N * sizeof(float), and you index its elements from 0 to N - 1. Therefore, if you need indices 0, 1, 2, you need memory of size 3 * sizeof(float), which means float[3].
Other than that, I can compile and run the code without an error. Writing to index 2 of a float[2] is out of bounds and can corrupt the heap's bookkeeping, which often only shows up when the memory is freed. If you fix it and still get an error, the problem might lie with your compiler. In that case, decrease 100000 to a small number and try again.
You say that you are trying to implement a dynamic array. This is what std::vector does, and I would highly recommend that you use it. This way you are using something from the standard library that is extremely well tested, and you won't run into issues by essentially trying to roll your own version of std::vector. Additionally, this approach manages memory better because it uses RAII, which leverages the language to solve a lot of memory management issues. This has other benefits too, like making your code more exception safe.
Also, if you are storing x, y, z coordinates, consider using a struct or a tuple; I think that enhances readability a lot. You can typedef the coordinate type too. Something like std::vector< coord_t > is more readable to me.
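A sketch of that suggestion, under assumptions: the struct name Point, the function read_points, and the whitespace-separated "x y z" input format are all made up for illustration, and an in-memory stream stands in for the text file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical coordinate type, as suggested above.
struct Point {
    float x, y, z;
};

// Reads whitespace-separated "x y z" triples until the stream runs out
// or a triple is incomplete.
std::vector<Point> read_points(std::istream& in) {
    std::vector<Point> points;
    Point p;
    while (in >> p.x >> p.y >> p.z) {
        points.push_back(p);  // the vector grows as needed and frees itself
    }
    return points;
}

// Usage with an in-memory stream standing in for the text file.
std::vector<Point> demo() {
    std::istringstream file("0 0 0\n1.5 2 -3\n");
    std::vector<Point> pts = read_points(file);
    assert(pts.size() == 2);
    return pts;
}
```

The same function works with a std::ifstream, and it handles 10^2 or 10^6 points without any new[] or delete[].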
(Thanx a lot for suggestions!!)
Finally I am using vectors for the stated problem for reasons as below:
1. Unlike raw arrays (not the std::array object, of course), I don't need to manually deallocate the allocated memory.
2. There are numerous built-in methods defined in the vector class.
3. The vector's size can be extended at later stages.
Below is how I used 2D Vector to store points (x,y,z co-ordinates)
Initialized (allocated memory) a 2D vector:
vector<vector<float>> array(1000, vector<float>(3));
Where 1000 is the number of rows, and 3 is the number of columns
Once declared, values can be passed simply as:
array[i][j] = some value;
Also, at later stage I declared functions taking vector arguments and returning vectors as:
vector <vector <float>> function_name ( vector <vector <float>>);
vector <vector <float>> function_name ( vector <vector <float>> input_vector_name)
{
return output_vector_name_created_inside_function;
}
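Note: because the parameter is taken by value, the vector passed in is copied on every call; take it by const reference to avoid that. Returning a vector that was created inside the function by value is fine, since it is moved (or the copy is elided) rather than copied element by element. Returning a reference or pointer to such a local vector cannot work, because the vector is destroyed when the function returns, which is probably why my attempt to return by reference failed :(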
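A sketch of a function in that style that takes its input by const reference and returns a new vector by value. The alias Grid, the function name scaled, and the scaling operation itself are placeholders chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Takes the input by const reference (no copy on the call) and returns a
// new grid by value. Hypothetical operation: scale every element.
Grid scaled(const Grid& input, float factor) {
    Grid output = input;  // the one deliberate copy
    for (auto& row : output) {
        for (float& value : row) {
            value *= factor;
        }
    }
    assert(output.size() == input.size());
    return output;  // a local returned by value is moved or elided
}
```

The caller writes Grid result = scaled(points, 2.0f); and the original points stay unchanged.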
For multidimensional arrays, I recommend using boost::multi_array.
Example:
typedef boost::multi_array<double, 3> array_type;
array_type A(boost::extents[3][4][2]);
A[0][0][0] = 3.14;