2-dimensional array on heap, which version is faster? - c++

double **array = new double*[X];
for (int i = 0; i < X; i++)
    array[i] = new double[Y];
array[x][y] = n;
or
double *array = new double [X*Y];
array[x*Y+y] = n;
The second version is created faster, but access is faster in the first version (e.g. image processing using convolution), isn't it? Or is it all negligible?

In theory the second version should be faster, because the entire array is allocated contiguously, so it's more cache-friendly than the first.
But in practice, profile it and see what happens. This kind of performance question depends heavily on your architecture, OS, etc.
My advice here (in addition to profiling) is: consider using standard containers (a std::vector<std::vector<T>> in this case), which have been profiled and tested, and which also make your life easier by moving you away from raw pointers and manual memory management.
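For example, a minimal self-contained sketch of both container-based layouts (X, Y, x, y and n here are just example values standing in for the ones in the question):
#include <vector>

int main()
{
    const int X = 1000, Y = 1000;   // example sizes
    int x = 3, y = 5;
    double n = 1.5;

    // Nested vectors: keeps the array[x][y] syntax, but each row is its own allocation.
    std::vector<std::vector<double>> nested(X, std::vector<double>(Y));
    nested[x][y] = n;

    // Single contiguous vector: one allocation, row-major indexing as in the second version.
    std::vector<double> flat(X * Y);
    flat[x * Y + y] = n;
}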

OK, I have a 1000x1000 image with a conventionally implemented Fourier transform on double arrays: Windows 7 Pro 64-bit, VC++ 2010 Express -> exactly the same (2:11 minutes)!

Related

C++ 0xC0000005: Trying to make large vector[180][360]

I'm having a hard time trying to get my computer to allocate a large amount of memory (well within the 32GB on my machine). I'm in Visual Studio 2015, running the target as x64 on a 64-bit machine, so there should be no memory limitations.
Originally I tried to make vector<int> my_vector[4][180][360], which resulted in a stack overflow.
According to the Visual Studio debugger, memory usage does not go above 70MB, so I'm quite curious how the memory problems are occurring. Memory usage on the computer stays with over 25GB free.
Is there any way to declare an array of vectors such as vector<int> my_vector[4][180][360] without memory problems? So far I can only get as high as v[180][180]. Most of my vectors will have very few elements. Any help is much appreciated, thank you.
static std::vector<int> my_vector[4][180][360];

for (int i = 0; i < 4; i++)
    for (int j = 0; j < 180; j++)
        for (int k = 0; k < 360; k++)
            my_vector[i][j][k].resize(90000);

my_vector[1][2][3][4] = 99;
This works on my machine with 24 GB by paging virtual memory to disk, but it is more likely than not going to be slow. You might be better off indexing a disk file.
You can also use std::map to create a sparse array
static std::map<int,int> my_map[4][180][360];
my_map[1][2][3][4]=99;
Are you allocating memory on the stack? If so, I believe there is a limit before you get a stack overflow error. For Visual Studio 2015, I think the default stack size is 1MB.
For larger arrays, you need to allocate them on the heap using the new keyword.
If you are trying to allocate a multidimensional array, it can get fairly complex. A two-dimensional integer array is allocated dynamically as an array of pointers, with each pointer pointing to a newly allocated array of integers. In a naïve version, you need a loop to allocate (and eventually deallocate):
int **a = new int*[1000];
for (int i = 0; i < 1000; i++) {
    a[i] = new int[1000];
}
As you can see, multiple dimensions become even more complex and eat up additional memory just to store pointers. However, if you know the total number of elements, you can allocate just a single array to store all of them (1,000,000 for my 1000x1000 example) and calculate the position of each element accordingly.
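As a rough sketch of that single-allocation idea for the 1000x1000 case (a row-major convention and illustrative variable names are assumed here):
int main()
{
    const int rows = 1000, cols = 1000;

    int *a = new int[rows * cols];   // one heap allocation for all 1,000,000 elements

    // Element (i, j) lives at offset i * cols + j under the row-major convention.
    a[123 * cols + 456] = 7;

    delete[] a;                      // one matching deallocation
}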
I'll leave the rest for you to figure out...

Why does compressed_matrix in Boost uBLAS allocate more memory than it needs for the nonzero elements?

I dug into the Boost uBLAS code and found that the uBLAS memory allocation in compressed_matrix is not as standard as in CSC or CSR.
There is one line that causes the trouble, namely,
non_zeros = (std::max) (non_zeros, (std::min) (size1_, size2_)); in the private restrict_capacity method.
Does that mean that if I create a sparse matrix, the number of nonzeros allocated by Boost uBLAS will always be at least min(nrow, ncol)?
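(With the numbers used below, nrow = 5, ncol = 4 and nnz = 2, that line would bump the reserved capacity up to max(2, min(5, 4)) = 4.)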
I used the following code to demonstrate the problem. The output has zeros in the unused part of the vectors allocated by compressed_matrix.
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <iostream>
#include <vector>

using std::cout;
using std::endl;

typedef boost::numeric::ublas::compressed_matrix<double, boost::numeric::ublas::column_major, 0,
        std::vector<std::size_t>, std::vector<double> > Matrix;

int main()
{
    long nrow = 5;
    long ncol = 4;
    long nnz  = 2;                 // requested number of nonzeros
    Matrix m(nrow, ncol, nnz);

    cout << "setting" << endl;
    m(1, 2) = 1.1;
    m(2, 2) = 2.1;

    for (std::size_t i = 0; i < m.index1_data().size(); i++)
        cout << "ind1 -" << i << " " << m.index1_data()[i] << endl;
    for (std::size_t i = 0; i < m.index2_data().size(); i++)
        cout << "ind2 -" << i << " " << m.index2_data()[i] << endl;
    for (std::size_t i = 0; i < m.value_data().size(); i++)
        cout << "val -" << i << " " << m.value_data()[i] << endl;
}
Perhaps it is a performance-design choice with certain use cases in mind.
The idea is that when filling the compressed_matrix one might want to minimize reallocations of the arrays that hold the indices and values. If one starts from 0 allocated space, it will have to speculatively reallocate every once in a while (e.g. reserving twice the space each time the allocated space is exceeded, like std::vector does).
Since the idea is to avoid the $N^2$ scaling of the dense matrix, a good guess is that a sparse matrix will use more or less $N$ elements out of the $N^2$. If you use more than $N$, reallocation will happen at some point, but not as many times. By then you will probably be in the case where it is better to switch to a dense matrix anyway.
What is a little more surprising is that it overwrites the passed value. But still, the above applies.

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for (int i = 0; i < 1300; i++) {
    for (int j = 0; j < 1372; j++) {
        pixelArray[i][j].setOn(true);
        ...
    }
}
and
Pixel* pixelArray[1300][1372];
for (int i = 0; i < 1300; i++) {
    for (int j = 0; j < 1372; j++) {
        pixelArray[i][j] = new Pixel();
        pixelArray[i][j]->setOn(true);
        ...
    }
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on the stack, which is otherwise fine, but leads to a stack overflow when you try to allocate too much. The limit is usually around 8 megabytes on modern OSes, so allocating an array of 1300 * 1372 elements on the stack is not an option.
Your second approach allocates 1300 * 1372 elements individually on the heap, which is a tremendous load for the allocator, which maintains multiple linked lists of chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for (int i = 0; i < 1300; i++) {
    for (int j = 0; j < 1372; j++) {
        pixelArray[i * 1372 + j].setOn(true);
        ...
    }
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information. You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
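A sketch of what that could look like with a row-major convention (the doWork name is reused from the example above; the exact parameters are only illustrative, and Pixel/setOn are assumed from the question):
#include <cstddef>
#include <vector>

// Hypothetical signature: the flat vector travels together with its dimensions.
void doWork(std::vector<Pixel> &pixels, std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            pixels[y * width + x].setOn(true);   // row-major: row y, column x
}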
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: the double subscript, as in pixelArray[x][y], works. Disadvantages: creating such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector are not forced to all have the same size.
These are basically your options using the Standard Library. The right solution would be something like a std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but their matrix and image classes are most likely limited to primitive data types for their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
for(gray8_image_t::iterator pIt = pixelArray.begin(); pIt != pixelArray.end(); pIt++) {
*pIt = 1;
}
My personal preference would be to use std::vector:
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
// 1300 rows, each a PixelRow of 1372 pixels, all initialized to Pixel(true)
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
    Pixel2D(size_t rsz_, size_t csz_) : data(rsz_ * csz_), rsz(rsz_), csz(csz_) {
        for (size_t r = 0; r < rsz; r++)
            for (size_t c = 0; c < csz; c++)
                at(r, c).setOn(true);
    }

    // Row-major access into the single flat vector.
    Pixel &at(size_t row, size_t col) { return data.at(row * csz + col); }

    std::vector<Pixel> data;
    size_t rsz;   // number of rows
    size_t csz;   // number of columns
};
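Usage would then look something like this (dimensions from the question):
Pixel2D image(1300, 1372);       // allocates and initializes all pixels in one go
image.at(10, 20).setOn(false);   // bounds-checked access through at()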

How to speed up memory allocation for 2D triangular matrix in c++?

I need to allocate memory for a very large array which represents a triangular matrix.
I wrote the following code:
const int max_number_of_particles=20000;
float ***dis_vec;
dis_vec = new float **[max_number_of_particles];
for (int i = 0; i < max_number_of_particles; i++)
    dis_vec[i] = new float *[i];
for (int i = 0; i < max_number_of_particles; i++)
    for (int j = 0; j < i; j++)
        dis_vec[i][j] = new float[2];
The problem is that the time needed to allocate the memory quickly increases with the size of the matrix. Does anyone know a better solution to this problem?
Thanks.
Allocate a one-dimensional array and convert indices to subscripts and vice versa. One allocation compared to O(N^2) of them should be much faster.
EDIT
Specifically, just allocate N(N+1)/2 elements, and when you want to access [r][c] in the original, just access [r*(r+1)/2 + c] instead.
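A minimal sketch of that layout, keeping the two floats per entry from the original code; dis_flat and tri_index are made-up names, and the index formula is the one given above (valid for j <= i):
#include <cstddef>

const int N = 20000;   // max_number_of_particles from the question

// Maps (i, j, k), with j <= i and k in {0, 1}, to an offset in the flat array.
inline std::size_t tri_index(std::size_t i, std::size_t j, std::size_t k)
{
    return (i * (i + 1) / 2 + j) * 2 + k;
}

int main()
{
    // One allocation covering all N*(N+1)/2 entries of two floats each.
    float *dis_flat = new float[tri_index(N - 1, N - 1, 1) + 1];

    dis_flat[tri_index(5, 3, 0)] = 1.0f;   // what used to be dis_vec[5][3][0]

    delete[] dis_flat;
}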
Yes.
First... start with your inner loop.
"new float[2]"
That allocates an array, which I imagine is slower to allocate than a fixed size object that happens to have 2 floats.
struct Float2D {
    float a;
    float b;
};
Float2D *x = new Float2D;
that seems better.
But really, forget all that. If you want it fast... just malloc a bunch of floats.
I'd say... let some floats go to waste. Just alloc a plain old 2D array.
float* f = (float*)malloc( max_number_of_particles*max_number_of_particles*2*sizeof(float) );
The only size saving you could get over this, is a 2x size saving by using a triangle instead of a square.
However, I'm pretty damn sure you KILLED that entire "size saving" already by using "new float[2]" and "new float *[i]". I'm not sure how much the overhead of "new" is, but I imagine it's like malloc except worse, and I think most mallocs have about 8 bytes of overhead per allocation.
So what you have already is WORSE than the 2X size loss from allocating a square.
Also, the square array makes the math simpler. With the triangle you'd need to do some weird "triangular number" math to get the pointer. Something like (n+1)*n/2 or whatever it is :)
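And a sketch of indexing into that flat square allocation (error checking omitted; f mirrors the malloc line earlier, and the final * 2 + k picks one of the two floats per pair):
#include <cstdlib>

const std::size_t N = 20000;   // max_number_of_particles

int main()
{
    // Roughly 3.2 GB at this size: N * N pairs of floats in one block.
    float *f = (float *)malloc(N * N * 2 * sizeof(float));

    // What used to be dis_vec[i][j][k] becomes f[(i * N + j) * 2 + k]:
    f[(5 * N + 3) * 2 + 0] = 1.0f;

    free(f);
}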

C++ - STL vector question

Is there any way to make std::vector faster on reserving + resizing?
I would like to achieve the performance which would be somewhat equivalent to plain C arrays.
See the following code snippets:
TEST(test, vector1) {
    for (int i = 0; i < 50; ++i) {
        std::vector<int> a;
        a.reserve(10000000);
        a.resize(10000000);
    }
}

TEST(test, vector2) {
    for (int i = 0; i < 50; ++i) {
        std::vector<int> a(10000000);
    }
}

TEST(test, carray) {
    for (int i = 0; i < 50; ++i) {
        int* new_a = new int[10000000];
        delete[] new_a;
    }
}
The first two tests are two times slower (4095 ms vs 2101 ms), and obviously that happens because std::vector zero-initializes the elements in it. Any ideas on how this could be avoided?
Or perhaps there is some standard (Boost?) container that implements a fixed-size, heap-based array?
Thank you
Well, naturally the first two tests are slower: they explicitly go through the entire vector and call int() on each element. Edit: this has the effect of setting all the elements to 0.
Just try reserving.
There is some very relevant info to your question in this question i asked a while back:
std::vector reserve() and push_back() is faster than resize() and array index, why?
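In other words, something along these lines skips the up-front zeroing, at the cost of writing the elements through push_back (a sketch only; whether it actually wins depends on how you fill the data afterwards):
#include <vector>

int main()
{
    std::vector<int> a;
    a.reserve(10000000);        // allocates capacity but constructs no elements

    for (int i = 0; i < 10000000; ++i)
        a.push_back(i);         // each element is written exactly once
}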
There's boost::array.
Were your tests performed in debug or release mode? I know the microsoft compiler adds a lot of debug checks that can really slow down performance.
Maybe you could use a boost::scoped_array, but if this really is that performance critical, maybe you should try putting the initialization/allocation outside the innermost loop somehow?
I'm going to give you the benefit of the doubt and assume you've already done some profiling and determined that using vector in this fashion is a hotspot. If not, it's a bit premature to consider the differences, unless you're working on a very tight, small-scale application where every clock cycle counts, in which case it's even easier to use a profiler and there's just as much reason to do so.
boost::scoped_array is one solution. There's no way to get vector to not initialize the elements it stores. Another one is std::deque if you don't need a contiguous memory block. deque can be significantly faster than vector or a dynamically-allocated array with the same number of elements, because it creates smaller memory blocks, which operating systems tend to deal with better, along with being cache-friendly.
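A sketch of the boost::scoped_array route (assuming Boost is available): the ints are heap-allocated, left uninitialized, and freed automatically when the object goes out of scope.
#include <boost/scoped_array.hpp>

int main()
{
    boost::scoped_array<int> a(new int[10000000]);   // no zero-initialization of the ints
    a[42] = 7;                                       // raw, unchecked element access
}   // delete[] runs here automatically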