Efficient memory allocation for large nested vectors - C++

I'm creating a huge matrix that is stored inside nested vectors:
typedef vector<vector<pair<unsigned int, char>>> Matrix;
The outer vector will eventually contain ~400,000 vectors, each of which contains at most ~220 pairs (most contain fewer). This takes about 1GB of RAM and is done like this:
Matrix matrix;
for (unsigned int i = 0; i < rows; i++) {
    vector<pair<unsigned int, char>> row;
    for (unsigned int j = 0; j < cols; j++) {
        // ...calculations...
        row.push_back( pair<unsigned int, char>(x, y) );
    }
    matrix.push_back(row);
}
The first 20% go quite fast, but the larger the outer vector grows, the slower the whole process gets. I'm pretty sure some optimization is possible, but I'm not an expert in this field. Are there any simple tricks to speed this up, or are there any major faults in my approach?

It would be better to use a single one-dimensional vector and wrap the row/column indexing in some functions or a class. That way the memory for the entire matrix is guaranteed to be contiguous.
And instead of using push_back, allocate the entire matrix up front:
std::vector<pair<unsigned int, char>> matrix(rows * cols);
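A minimal sketch of such an index-wrapping class (the FlatMatrix name and operator() interface are illustrative, not from the original post):
#include <utility>
#include <vector>

struct FlatMatrix {
    std::vector<std::pair<unsigned int, char>> data;
    unsigned int cols;
    FlatMatrix(unsigned int rows, unsigned int cols)
        : data(rows * cols), cols(cols) {}
    // row-major layout: element (r, c) lives at index r * cols + c
    std::pair<unsigned int, char>& operator()(unsigned int r, unsigned int c) {
        return data[r * cols + c];
    }
};
With this, matrix(i, j) replaces matrix[i][j], and every element sits in one contiguous allocation.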

I would start with the obvious optimization.
If you know the number of rows before you start filling in the values (or a usable upper bound), just reserve the space beforehand. Most of the time spent when push_back-ing a lot of values goes into reallocating memory and copying the already contained values.
Matrix matrix(rows);
for (unsigned i = 0; i < rows; i++) {
    vector<pair<unsigned int, char>> row(cols);
    for (unsigned j = 0; j < cols; j++) {
        row[j] = // value
    }
    matrix[i] = row;
}

Using the VS 2010 compiler, the following turned out to work best:
Matrix matrix;
matrix.reserve(rows);
vector<pair<unsigned int, char>> row;
row.reserve(cols);
for (unsigned int i = 0; i < rows; i++) {
    for (unsigned int j = 0; j < cols; j++) {
        // ...calculations...
        row.push_back( pair<unsigned int, char>(x, y) );
    }
    matrix.push_back(row);
    row.clear();
}
Creating just a single vector that is used to build up all the rows consumes much less memory than creating a fresh one that allocates memory for "cols" entries every time. I'm not entirely sure why, but a likely reason is that the reused row allocates its buffer only once (clear() keeps the capacity), so the per-iteration heap allocation and the resulting fragmentation disappear; the copy pushed into matrix is sized to the actual contents either way.
However, I'm accepting Andreas' answer as this one is only a solution for my specific case while his answer provided the general information needed for such optimizations.

The problem is a lot of data copying as the outer vector grows. Consider changing your typedef to
typedef vector< shared_ptr< vector<pair<unsigned int, char>> > > Matrix;
and calling matrix.reserve(rows) before you start filling it with values.
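A sketch of how the fill loop might then look (make_shared and auto need C++11; x and y are the per-element values computed in the question's loop):
Matrix matrix;
matrix.reserve(rows);
for (unsigned int i = 0; i < rows; i++) {
    auto row = make_shared<vector<pair<unsigned int, char>>>();
    row->reserve(cols);
    for (unsigned int j = 0; j < cols; j++) {
        // ...calculations...
        row->push_back(make_pair(x, y));
    }
    matrix.push_back(row); // copies a pointer, not the row's data
}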

Related

Allocate a (fixed size) vector<vector<double>> for efficient access and add them?

I am implementing an algorithm that uses rather large vector<vector<double>> types for storage; no elements will be added or removed after preallocation. I would like to make sure that element access is as fast as possible, and I also need to add and scale several of them (elementwise). What is the best way to do this?
Here are the relevant parts of my (naive) code, which I doubt is efficient:
vector<vector<double>> z;
vector<vector<double>> mu;
vector<vector<double>> temp_NNZ;
..
for(int i = 0; i < init.valsA.size(); ++i){
    z.push_back({});
    mu.push_back({});
    temp_NNZ.push_back({});
    for(int j = 0; j < init.valsA[i].size(); ++j){
        z[i].push_back(0);
        mu[i].push_back(0);
        temp_NNZ[i].push_back(0);
    }
}
..
for(int i = 0; i < z.size(); ++i){
    for(int j = 0; j < z[i].size(); ++j){
        z[i][j] = temp_NNZ[i][j] - mu[i][j]/rho - z[i][j];
    }
}
There are two ways to do this: vector::resize will create all the elements and value-initialize them (or copy-initialize them if you give it an initial value), while vector::reserve will let you allocate the amount of memory you need in advance without initializing it (which may be more efficient). In the first case you'll assign the final value to already existing elements (z[i] = x), whereas in the second you'll still create the elements as your current code does (z.push_back(x)).
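As a concrete sketch of the two options applied to z and mu (using the sizes from init.valsA; emplace_back needs C++11):
// Option 1: resize creates the elements immediately, then assign by index
vector<vector<double>> z(init.valsA.size());
for (size_t i = 0; i < z.size(); ++i)
    z[i].resize(init.valsA[i].size()); // doubles are value-initialized to 0.0

// Option 2: reserve allocates memory only; elements are still push_back'ed
vector<vector<double>> mu;
mu.reserve(init.valsA.size());
for (size_t i = 0; i < init.valsA.size(); ++i) {
    mu.emplace_back();
    mu.back().reserve(init.valsA[i].size());
    for (size_t j = 0; j < init.valsA[i].size(); ++j)
        mu.back().push_back(0.0);
}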

outputting a row of a vector

I am learning C++ and I am currently at a halt.
I am trying to write a function such that:
It takes as input a one-dimensional vector and an integer which specifies a row.
The numbers on that row are put into an output vector for later use.
The only issue is that this online course states that I must use another function I made earlier, which lets a 1D vector be indexed with two indices. It is:
int twod_to_oned(int row, int col, int rowlen){
return row*rowlen+col;
}
Logically, what I am trying to do: I use this function to store the input vector into a temporary vector as a 2D matrix, with i as the vertical axis and j as the horizontal axis. From there I have a loop which reads out the numbers on the needed row and stores them in the output vector.
so far I have:
void get_row(int r, const std::vector<int>& in, std::vector<int>& out){
    int rowlength = std::sqrt(in.size());
    std::vector <int> temp;
    for(int i = 0; i < rowlength; i++){ // i is vertical and j is horizontal
        for(int j = 0; j < rowlength; j++){
            temp[in[twod_to_oned(i,j,side)]]; // now stored as a 2D array(matrix?)
        }
    }
    for(int i=r; i=r; i++){
        for(int j=0; j< rowlength; j++){
            out[temp[i][j]];
        }
    }
}
I'm pretty sure there is something wrong in the first and last loops, which turn the input into a 2D matrix and then store the row.
I starred the parts that are incomplete due to my lack of knowledge.
How could I overcome this issue? I would appreciate any help, many thanks.
takes as input a one-dimensional vector
but
int rowlength = std::sqrt(in.size());
This line of code appears to assume that the input is actually a square two-dimensional matrix (i.e. the same number of rows and columns). So which is it? What if the number of elements in the in vector is not a perfect square?
This confusion about the input is likely the cause of your problem and should be sorted out before doing anything else.
I think what you wanted to do is the following:
void get_row(int r, const std::vector<int>& in, std::vector<int>& out) {
    int rowlength = std::sqrt(in.size());
    std::vector<std::vector<int>> temp; // now a 2D vector (vector of vectors)
    for(int i = 0; i < rowlength; i++) {
        temp.emplace_back(); // for each row, we need to emplace it
        for(int j = 0; j < rowlength; j++) {
            // we need to copy every value to the i-th row
            temp[i].push_back(in[twod_to_oned(i, j, rowlength)]);
        }
    }
    for(int j = 0; j < rowlength; j++) {
        // we copy the r-th row to out
        out.push_back(temp[r][j]);
    }
}
Your solution used std::vector<int> instead of std::vector<std::vector<int>>. The former does not support accessing elements by [][] syntax.
You were also assigning to that vector out of its bounds. That leads to undefined behaviour. Always use push_back or emplace_back to add elements; use operator[] only to access data that is already present.
Lastly, the same holds true for inserting the row into the out vector. Only you can know whether out already holds enough elements; my solution assumes that out is empty, so we push_back the entire row into it.
In addition: you might want to use std::vector::insert instead of manual for() loop. Consider replacing the third loop with:
out.insert(out.end(), temp[r].begin(), temp[r].end());
which may prove more efficient and readable. The inner for() loop of the first loop could also be replaced in the same way (or, even better, one could emplace each row vector using iterators obtained from the in vector). I highly advise you to try to implement that.
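For reference, one way the iterator-based version might look (a sketch that skips the temporary matrix entirely; it assumes the same square layout as above):
void get_row(int r, const std::vector<int>& in, std::vector<int>& out) {
    int rowlength = std::sqrt(in.size());
    // row r occupies the flat indices [r*rowlength, (r+1)*rowlength)
    out.insert(out.end(),
               in.begin() + twod_to_oned(r, 0, rowlength),
               in.begin() + twod_to_oned(r, rowlength, rowlength));
}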

two dimensional vector matrices addition

vector<vector<int>> AsumB(
    int kolumny, vector<vector<int>> matrix1, vector<vector<int>> matrix2) {
    vector<vector<int>> matrix(kolumny);
    matrix = vector<vector<int>>(matrix1.size());
    for (int i = 0; i < kolumny; ++i)
        for (int j = 0; i <(static_cast<signed int>(matrix1.size())); ++i)
            matrix[i][j] = matrix1[i][j] + matrix2[i][j];
    return matrix;
}
Please tell me what I don't understand and help me solve this problem, because for a one-dimensional vector this kind of code would work.
What about
vector<vector<int>> AsumB(vector<vector<int>> const & matrix1,
                          vector<vector<int>> const & matrix2) {
    vector<vector<int>> matrix(matrix1);
    for (std::size_t i = 0U; i < matrix.size(); ++i)
        for (std::size_t j = 0U; j < matrix[i].size(); ++j)
            matrix[i][j] += matrix2[i][j];
    return matrix;
}
?
Unable to reproduce, and OP's reported compiler error doesn't look like it matches the code, so the problem is probably somewhere else.
However, there is a lot wrong here that could be causing all sorts of bad behaviour and should be addressed. I've taken the liberty of reformatting the code a bit to make it easier to explain.
vector<vector<int>> AsumB(int kolumny,
                          vector<vector<int>> matrix1,
                          vector<vector<int>> matrix2)
matrix1 and matrix2 are passed by value. There is nothing wrong logically, but this means there is the potential for a lot of unnecessary copying unless the compiler is very sharp.
{
vector<vector<int>> matrix(kolumny);
Declares a vector of vectors with the outer vector sized to kolumny. There are no inner vectors allocated, so 2D operations are doomed.
matrix = vector<vector<int>>(matrix1.size());
Makes a temporary vector of vectors with the outer vector sized to match the outer vector of matrix1. This temporary vector is then assigned to the just-created matrix, replacing its current contents, and is then destroyed. matrix still has no inner vectors allocated, so 2D operations are still doomed.
for (int i = 0; i < kolumny; ++i)
for (int j = 0; i < (static_cast<signed int>(matrix1.size())); ++i)
i and j should never go negative (huge logic problem if they do), so use an unsigned type. Use the right unsigned type and the static_cast is meaningless.
In addition the inner for loop increments and tests i, not j
matrix[i][j] = matrix1[i][j] + matrix2[i][j];
I see nothing wrong here other than matrix having nothing for j to index. This will result in Undefined Behaviour as access go out of bounds.
return matrix;
}
Cleaning this up so that it is logically sound:
vector<vector<int>> AsumB(const vector<vector<int>> & matrix1,
const vector<vector<int>> & matrix2)
We don't need the number of columns. The vector already knows all the sizes involved. A caveat, though: vector<vector<int>> allows different sizes of all of the inner vectors. Don't do this and you should be good.
Next, this function now takes its parameters by constant reference. With the reference there is no copying. With const the compiler knows the vectors will not be changed inside the function and can prevent errors and make a bunch of optimizations.
{
size_t row = matrix1.size();
size_t is an unsigned data type guaranteed to be large enough to index any representable object. It will be big enough, and you don't have to worry about pesky negative numbers. It also eliminates the need for any casting later.
if (!(row > 0 && row == matrix2.size()))
{
return vector<vector<int>>();
}
Here we make sure that everyone agrees on the number of rows involved, and return an empty vector if they don't. You could also throw an exception; that may be the better solution, but I don't know the use case.
size_t column = matrix1[0].size();
if (!(column > 0 && column == matrix2[0].size()))
{
return vector<vector<int>>();
}
Does the same as above, but makes sure the number of columns makes sense.
vector<vector<int>> matrix(row, vector<int>(column));
Creates a local row-by-column matrix to store the result. Note the second parameter: vector<int>(column) tells the compiler to initialize all row inner vectors to vectors of size column.
for (size_t i = 0; i < row; ++i)
{
    for (size_t j = 0; j < column; ++j)
    {
Here we simplified the loops just a bit since we know all the sizes.
        matrix[i][j] = matrix1[i][j] + matrix2[i][j];
    }
}
return matrix;
The compiler has a number of tricks at its disposal to eliminate copying matrix on return. Look up Return Value Optimization with your preferred web search engine if you want to know more.
}
All together:
vector<vector<int>> AsumB(const vector<vector<int>> & matrix1,
                          const vector<vector<int>> & matrix2)
{
    size_t row = matrix1.size();
    if (!(row > 0 && row == matrix2.size()))
    {
        return vector<vector<int>>();
    }
    size_t column = matrix1[0].size();
    if (!(column > 0 && column == matrix2[0].size()))
    {
        return vector<vector<int>>();
    }
    vector<vector<int>> matrix(row, vector<int>(column));
    for (size_t i = 0; i < row; ++i)
    {
        for (size_t j = 0; j < column; ++j)
        {
            matrix[i][j] = matrix1[i][j] + matrix2[i][j];
        }
    }
    return matrix;
}

Optimized way to find M largest elements in an NxN array using C++

I need a blazing fast way to find the 2D positions and values of the M largest elements in an NxN array.
Right now I'm doing this:
struct SourcePoint {
    Point point;
    float value;
};
SourcePoint* maxValues = new SourcePoint[ M ];
for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
        if (sample > maxValues[0].value) {
            int q = 1;
            while ( sample > maxValues[q].value && q < M ) {
                maxValues[q-1] = maxValues[q]; // shuffle the values back
                q++;
            }
            maxValues[q-1].value = sample;
            maxValues[q-1].point = Point(i,j);
        }
    }
}
A Point struct is just two ints - x and y.
This code basically does an insertion sort of the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encountered so far. This gives us a quick and easy bailout: if sample <= maxValues[0].value, we don't do anything. The issue I'm having is the shuffling every time a new, better value is found: it works its way all the way down maxValues until it finds its spot, shuffling all the elements in maxValues to make room for itself.
I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm since this is called many many times and accounts for 60-80% of my overall cost.
I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is this operation is called repeatedly, and it can't be precomputed.
I just had a thought to pre-load the first M elements of maxValues which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.
Any help from optimization gurus would be much appreciated :)
A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010 but test it yourself to see whether any of them help in your case. Some of these changes may have no or even a negative effect depending on the actual usage/data and compiler optimizations.
Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
Changing g_Source[i][j] to g_Source[j][i] gains you a very small amount (not as much as I'd thought it would).
Using the structure SourcePoint1 listed at the bottom gets me another few percent.
The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array which it can't do if you use a local variable.
Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
If possible try to keep the source data in arr[][] sorted, even if only partially. Ideally you'd want to generate maxValues[] at the same time the source data is created.
Looking at how the data is created/stored/organized may give you patterns or information that reduce the amount of time needed to generate your maxValues[] array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.
Code for above:
struct SourcePoint1 {
int x;
int y;
float value;
int test; //Play with manual/compiler padding if needed
};
If you want to go into micro-optimizations at this point, a simple first step would be to get rid of the Points and just pack both dimensions into a single int. That reduces the amount of data you need to shift around, and gets SourcePoint down to a power-of-two size, which simplifies indexing into it.
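A sketch of that packing (assuming both coordinates fit in 16 bits; the helper names are made up here):
struct PackedPoint {
    int coords;   // (x << 16) | y: both dimensions in one int
    float value;  // the struct is now 8 bytes, a power of two
};

inline int pack(int x, int y) { return (x << 16) | (y & 0xFFFF); }
inline int unpack_x(int p)    { return p >> 16; }
inline int unpack_y(int p)    { return p & 0xFFFF; }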
Also, are you sure that keeping the list sorted is better than simply recomputing which element is the new lowest after each time you shift the old lowest out?
(Updated 22:37 UTC 2011-08-20)
I propose a binary min-heap of fixed size holding the M largest elements (but still in min-heap order!). It probably won't be faster in practice, as I think the OP's insertion sort probably has decent real-world performance (at least when the recommendations of the other posters in this thread are taken into account).
Look-up in the case of failure should be constant time: If the current element is less than the minimum element of the heap (containing the max M elements) we can reject it outright.
If it turns out that we have an element bigger than the current minimum of the heap (the Mth biggest element) we extract (discard) the previous min and insert the new element.
If the elements are needed in sorted order the heap can be sorted afterwards.
First attempt at a minimal C++ implementation:
template<unsigned size, typename T>
class m_heap {
private:
    T nodes[size];
    static unsigned parent(unsigned i) { return (i - 1) / 2; }
    // children of i are 2i+1 and 2i+2, matching parent() above
    static unsigned left(unsigned i) { return i * 2 + 1; }
    static unsigned right(unsigned i) { return i * 2 + 2; }
    void bubble_down(unsigned i) {
        for (;;) {
            unsigned j = i;
            if (left(i) < size && nodes[left(i)] < nodes[i])
                j = left(i);
            if (right(i) < size && nodes[right(i)] < nodes[j])
                j = right(i);
            if (i != j) {
                swap(nodes[i], nodes[j]);
                i = j;
            } else {
                break;
            }
        }
    }
    void bubble_up(unsigned i) {
        while (i > 0 && nodes[i] < nodes[parent(i)]) {
            swap(nodes[parent(i)], nodes[i]);
            i = parent(i);
        }
    }
public:
    m_heap() {
        for (unsigned i = 0; i < size; i++) {
            // lowest(), not min(): min() is the smallest *positive* value
            // for floating-point types (lowest() requires C++11)
            nodes[i] = numeric_limits<T>::lowest();
        }
    }
    void add(const T& x) {
        if (x < nodes[0]) {
            // smaller than the smallest kept element: reject outright
            return;
        }
        nodes[0] = x;   // replace the old minimum...
        bubble_down(0); // ...and restore the heap property
    }
    T* get() { return nodes; } // used by the test below
};
Small test/usage case:
#include <iostream>
#include <limits>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <assert.h>
#include <math.h>
using namespace std;
// INCLUDE TEMPLATED CLASS FROM ABOVE
typedef vector<float> vf;
bool compare(float a, float b) { return a > b; }
int main()
{
    int N = 2000;
    vf v;
    for (int i = 0; i < N; i++) v.push_back( rand()*1e6 / RAND_MAX );
    static const int M = 50;
    m_heap<M, float> h;
    for (int i = 0; i < N; i++) h.add( v[i] );
    sort(v.begin(), v.end(), compare);
    vf heap(h.get(), h.get() + M); // get() exposes the node array (defined above)
    sort(heap.begin(), heap.end(), compare);
    cout << "Real\tFake" << endl;
    for (int i = 0; i < M; i++) {
        cout << v[i] << "\t" << heap[i] << endl;
        if (fabs(v[i] - heap[i]) > 1e-5) abort();
    }
}
You're looking for a priority queue:
template < class T, class Container = vector<T>,
           class Compare = less<typename Container::value_type> >
class priority_queue;
You'll need to figure out the best underlying container to use, and probably define a Compare function to deal with your Point type.
If you want to optimize it, you could run a queue on each row of your matrix in its own worker thread, then run an algorithm to pick the largest item of the queue fronts until you have your M elements.
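For illustration, the single-queue version of that idea might look like this (a sketch using the arr/N/M from the question, values only, dropping the Point bookkeeping; std::greater makes it a min-queue, so the smallest retained element is always on top):
#include <functional>
#include <queue>
#include <vector>

std::priority_queue<float, std::vector<float>, std::greater<float> > q;
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        if ((int)q.size() < M) {
            q.push(arr[i][j]);
        } else if (arr[i][j] > q.top()) {
            q.pop();             // evict the smallest of the current top M
            q.push(arr[i][j]);
        }
    }
}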
A quick optimization would be to add a sentinel value to your maxValues array. If maxValues[M].value is equal to std::numeric_limits<float>::max(), then you can eliminate the q < M test in your while loop condition.
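Sketched out, the sentinel arrangement would look roughly like this:
#include <limits>

SourcePoint maxValues[M + 1];                       // one extra slot for the sentinel
maxValues[M].value = std::numeric_limits<float>::max();
// ...
int q = 1;
while (sample > maxValues[q].value) {               // q < M test gone: the sentinel
    maxValues[q-1] = maxValues[q];                  // always stops the loop at q == M
    q++;
}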
One idea would be to use the std::partial_sort algorithm on a plain one-dimensional sequence of references into your NxN array. You could probably also cache this sequence of references for subsequent calls. I don't know how well it performs, but it's worth a try; if it works well enough, you don't have as much "magic". In particular, you don't have to resort to micro-optimizations.
Consider this showcase:
#include <algorithm>
#include <iostream>
#include <vector>
#include <string.h> // for memset
#include <stddef.h>
static const int M = 15;
static const int N = 20;
// Represents a reference to a sample of some two-dimensional array
class Sample
{
public:
    Sample( float *arr, size_t row, size_t col )
        : m_arr( arr ),
          m_row( row ),
          m_col( col )
    {
    }
    inline operator float() const {
        return m_arr[m_row * N + m_col];
    }
    bool operator<( const Sample &rhs ) const {
        // inverted: "less than" means "has the larger value", so the
        // sort orders samples descending
        return (float)rhs < (float)*this;
    }
    int row() const {
        return m_row;
    }
    int col() const {
        return m_col;
    }
private:
    float *m_arr;
    size_t m_row;
    size_t m_col;
};
int main()
{
    // Setup a demo array
    float arr[N][N];
    memset( arr, 0, sizeof( arr ) );
    // Put in some sample values
    arr[2][1] = 5.0;
    arr[9][11] = 2.0;
    arr[5][4] = 4.0;
    arr[15][7] = 3.0;
    arr[12][19] = 1.0;
    // Setup the sequence of references into this array; you could keep
    // a copy of this sequence around to reuse it later, I think.
    std::vector<Sample> samples;
    samples.reserve( N * N );
    for ( size_t row = 0; row < N; ++row ) {
        for ( size_t col = 0; col < N; ++col ) {
            samples.push_back( Sample( (float *)arr, row, col ) );
        }
    }
    // Let partial_sort find the M largest entries
    std::partial_sort( samples.begin(), samples.begin() + M, samples.end() );
    // Print out the row/column of the M largest entries.
    for ( std::vector<Sample>::size_type i = 0; i < M; ++i ) {
        std::cout << "#" << (i + 1) << " is " << (float)samples[i] << " at " << samples[i].row() << "/" << samples[i].col() << std::endl;
    }
}
First of all, you are marching through the array in the wrong order!
You always, always, always want to scan through memory linearly. That means the last index of your array needs to be changing fastest. So instead of this:
for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
Try this:
for (int i = 0; i < cols; i++) {
    for (int j = 0; j < rows; j++) {
        float sample = arr[i][j];
I predict this will make a bigger difference than any other single change.
Next, I would use a heap instead of a sorted array. The standard <algorithm> header already has push_heap and pop_heap functions to use a vector as a heap. (This will probably not help all that much, though, unless M is fairly large. For small M and a randomized array, you do not wind up doing all that many insertions on average... Something like O(log N) I believe.)
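For reference, a minimal sketch of that push_heap/pop_heap pattern for keeping the M largest values (values only, same idea as the priority_queue suggestion above but using the raw algorithms; the best name and the surrounding sample loop are assumptions):
#include <algorithm>
#include <functional>
#include <vector>

std::vector<float> best;   // min-heap at the front, holds the M largest seen so far
// for each incoming sample:
if ((int)best.size() < M) {
    best.push_back(sample);
    std::push_heap(best.begin(), best.end(), std::greater<float>());
} else if (sample > best.front()) {
    std::pop_heap(best.begin(), best.end(), std::greater<float>()); // min moves to back
    best.back() = sample;                                           // overwrite it
    std::push_heap(best.begin(), best.end(), std::greater<float>());
}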
Next after that is to use SSE2. But that is peanuts compared to marching through memory in the right order.
You should be able to get nearly linear speedup with parallel processing.
With N CPUs, you can process a band of rows/N rows (and all columns) with each CPU, finding the top M entries in each band. And then do a selection sort to find the overall top M.
You could probably do that with SIMD as well (but here you'd divide up the task by interleaving columns instead of banding the rows). Don't try to make SIMD do your insertion sort faster, make it do more insertion sorts at once, which you combine at the end using a single very fast step.
Naturally you could do both multi-threading and SIMD, but on a problem which is only 30x30, that's not likely to be worthwhile.
I tried replacing float by double, and interestingly that gave me a speed improvement of about 20% (using VC++ 2008). That's a bit counterintuitive, but it seems modern processors or compilers are optimized for double value processing.
Use a linked list to store the best M values so far. You'll still have to iterate over it to find the right spot, but the insertion is O(1). It would probably even be better than binary search plus insertion: O(N)+O(1) vs. O(lg(N))+O(N).
Interchange the fors, so you're not accessing every N-th element in memory and thrashing the cache.
LE (later edit): Throwing in another idea that might work for uniformly distributed values.
Find the min and max in 3/2*O(N^2) comparisons.
Create anywhere from N to N^2 uniformly distributed buckets, preferably closer to N^2 than N.
For every element in the NxN matrix, place it in bucket[(int)((value - min) / range * nbuckets)], where range = max - min.
Finally, create a set starting from the highest bucket down to the lowest, adding whole buckets to it while |current set| + |next bucket| <= M.
If you get M elements, you're done.
You'll likely get fewer elements than M, let's say P.
Apply your algorithm to the remaining bucket and get the biggest M-P elements out of it.
If the elements are uniform and you use N^2 buckets, its complexity is about 3.5*(N^2), vs. your current solution, which is about O(N^2)*ln(M).
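A rough sketch of that bucketing pass (assuming the arr/N/M from the question, with minv/maxv coming from the min/max pass above; nbuckets and the variable names are illustrative):
#include <vector>

float range = maxv - minv;
const int nbuckets = N * N;               // closer to N^2 than N, as suggested
std::vector<std::vector<float> > buckets(nbuckets);
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++) {
        int b = (int)((arr[i][j] - minv) / range * (nbuckets - 1));
        buckets[b].push_back(arr[i][j]);
    }
// drain buckets from the top while the next whole bucket still fits
std::vector<float> top;
int b = nbuckets - 1;
while (b >= 0 && (int)(top.size() + buckets[b].size()) <= M) {
    top.insert(top.end(), buckets[b].begin(), buckets[b].end());
    --b;
}
// the remaining M - top.size() elements come from applying the original
// insertion-sort approach to bucket b alone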

How to avoid reallocation using the STL (C++)

This question is derived from the topic:
vector reserve c++
I am using a data structure of the type vector<vector<vector<double> > >. It is not possible to know the size of each of these vectors (except the outer one) before items (doubles) are added. I can get an approximate size (an upper bound) on the number of items in each "dimension".
A solution with shared pointers might be the way to go, but I would like to try a solution where the vector<vector<vector<double> > > simply has .reserve()'ed enough space (or has allocated enough memory in some other way).
Will A.reserve(500) (assuming 500 is the size or, alternatively, an upper bound on the size) be enough to hold "2D" vectors of large size, say [1000][10000]?
The reason for my question is mainly that I cannot see any way of reasonably estimating the size of the interior of A at the time of .reserve(500).
An example of my question:
vector<vector<vector<int> > > A;
A.reserve(500+1);
vector<vector<int> > temp2;
vector<int> temp1 (666,666);
for(int i=0;i<500;i++)
{
    A.push_back(temp2);
    for(int j=0; j< 10000;j++)
    {
        A.back().push_back(temp1);
    }
}
Will this ensure that no reallocation is done for A?
If temp2.reserve(100000) and temp1.reserve(1000) were added at creation, will this ensure that no reallocation at all will occur?
In the above please disregard the fact that memory could be wasted due to conservative .reserve() calls.
Thank you all in advance!
Your example will cause a lot of copying and allocation.
vector<vector<vector<double>>> A;
A.reserve(500+1);
vector<vector<double>> temp2;
vector<double> temp1 (666,666);
for(int i=0;i<500;i++)
{
    A.push_back(temp2);
    for(int j=0; j< 10000;j++)
    {
        A.back().push_back(temp1);
    }
}
Q: Will this ensure that no reallocation is done for A?
A: Yes.
Q: If temp2.reserve(100000) and temp1.reserve(1000) were added at creation, will this ensure that no reallocation at all will occur?
A: Here temp1 already knows its own length at creation time and will not be modified, so adding the temp1.reserve(1000) will only force an unneeded reallocation.
I don't know what the vector classes copy in their copy constructor; using A.back().reserve(10000) should work for this example.
Update: Just tested with g++ — the capacity of temp2 will not be copied, so temp2.reserve(10000) will not work.
And please use the source formatting when you post code; it makes it more readable :-).
How can reserving 500 entries in A beforehand be enough for [1000][10000]?
You need to reserve > 1000 for A (which is your actual upper bound), and then, whenever you add an entry to A, reserve in it another 1000 or so (again the upper bound, but for the second dimension).
i.e.
A.resize(UPPERBOUND);            // resize rather than reserve: A[i] must exist before use
for (size_t i = 0; i < A.size(); ++i)
    A[i].reserve(UPPERBOUND);
BTW, reserve reserves the number of elements, not the number of bytes.
The reserve function will work properly for your vector A, but will not work as you expect for temp1 and temp2.
The temp1 vector is initialized with a given size, so it will be set with the proper capacity, and you don't need to use reserve with it as long as you don't plan to increase its size.
Regarding temp2, the capacity attribute is not carried over in a copy. Considering that whenever you use the push_back function you are adding a copy to your vector, code like this
vector<vector<double>> temp2;
temp2.reserve(1000);
A.push_back(temp2); // A.back().capacity() == 0
is just increasing the allocated memory for a temporary that will soon be deallocated, not increasing the capacity of the vector element as you expect. If you really want to use a vector of vectors as your solution, you will have to do something like this:
vector<vector<double>> temp2;
A.push_back(temp2);
A.back().reserve(1000); // A.back().capacity() == 1000
I had the same issue one day. A clean way to do this (I think) is to write your own Allocator and use it for the inner vectors (the last template parameter of std::vector<>). The idea is to write an allocator that doesn't actually allocate memory but simply returns the right address inside the memory of your outer vector. You can easily know this address if you know the size of each previous vector.
In order to avoid copies and reallocation for a data structure such as vector<vector<vector<double> > >, I would suggest the following:
vector<vector<vector<double> > > myVector(FIXED_SIZE);
In order to 'assign' values to it, don't define your inner vectors until you actually know their required dimensions, then use swap() instead of assignment:
vector<vector<double> > innerVector( KNOWN_DIMENSION );
myVector[i].swap( innerVector );
Note that push_back() will do a copy operation and might cause reallocation, while swap() won't (assuming same allocator types are used for both vectors).
It seems to me that you need a real matrix class instead of nesting vectors. Have a look at boost, which has some strong sparse matrix classes.
OK, now I have done some small-scale testing on my own. I used a "2DArray" obtained from http://www.tek-tips.com/faqs.cfm?fid=5575 to represent a structure that allocates its memory statically. For the dynamic allocation I used vectors almost as indicated in my original post.
I tested the following code (hr_time is a timing routine found on the web, which I unfortunately cannot post due to anti-spam measures, but credit to David Bolton for providing it):
#include <vector>
#include "hr_time.h"
#include "2dArray.h"
#include <iostream>
using namespace std;
int main()
{
    vector<int> temp;
    vector<vector<int> > temp2;
    CStopWatch mytimer;
    mytimer.startTimer();
    for(int i=0; i<1000; i++)
    {
        temp2.push_back(temp);
        for(int j=0; j< 2000; j++)
        {
            temp2.back().push_back(j);
        }
    }
    mytimer.stopTimer();
    cout << "With vectors without reserved: " << mytimer.getElapsedTime() << endl;
    vector<int> temp3;
    vector<vector<int> > temp4;
    temp3.reserve(1001);
    mytimer.startTimer();
    for(int i=0; i<1000; i++)
    {
        temp4.push_back(temp3);
        for(int j=0; j< 2000; j++)
        {
            temp4.back().push_back(j);
        }
    }
    mytimer.stopTimer();
    cout << "With vectors with reserved: " << mytimer.getElapsedTime() << endl;
    int** MyArray = Allocate2DArray<int>(1000,2000);
    mytimer.startTimer();
    for(int i=0; i<1000; i++)
    {
        for(int j=0; j< 2000; j++)
        {
            MyArray[i][j]=j;
        }
    }
    mytimer.stopTimer();
    cout << "With 2DArray: " << mytimer.getElapsedTime() << endl;
    //Test
    for(int i=0; i<1000; i++)
    {
        for(int j=0; j< 200; j++)
        {
            //cout << "My Array stores :" << MyArray[i][j] << endl;
        }
    }
    return 0;
}
It turns out that there is approximately a factor of 10 between them for these sizes. I should thus reconsider whether dynamic allocation is appropriate for my application, since speed is of utmost importance!
Why not subclass the inner containers and reserve() in constructors ?
If the matrix does get really large and sparse, I'd try a sparse matrix lib too. Otherwise, before messing with allocators, I'd try replacing vector with deque. A deque won't reallocate its existing elements when growing and offers almost as fast random access as a vector.
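The change is mostly mechanical; a minimal sketch (deque grows in chunks, so already-stored elements are never moved by push_back):
#include <deque>

std::deque<std::deque<double> > m;
m.push_back(std::deque<double>());
for (int j = 0; j < 1000; j++)
    m.back().push_back(j);   // grows chunk-wise; existing elements stay put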
This was more or less answered here. So your code would look something like this:
vector<vector<vector<double> > > foo(maxdim1,
    vector<vector<double> >(maxdim2,
        vector<double>(maxdim3)));