Speed difference of dynamic and classical multi-dimensional arrays - C++

Is there a speed difference in the usage (not the creation) of dynamic versus classical multi-dimensional arrays?
I mean, for example, when I access all the values of a three-dimensional array with loops, is there any speed difference between an array created dynamically and one created the classical way?
When I say "dynamic three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is created like this.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
When I say "classical three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is simply created like this.
float matris_cos[kuanta][d][angle_scale];
But please note, I am not asking about the creation speed of these arrays. I want to access the values of these arrays via some loops. Is there any speed difference when I access the values?

An array of pointers (to arrays of pointers) will require extra levels of indirection to access a random element, while a multi-dimensional array will require basic arithmetic (multiplication and pointer addition). On most modern platforms, indirection is likely to be slower unless you use cache-friendly access patterns. Also, all the elements of the multi-dimensional array will be contiguous, which could help caching if you iterate over the whole array.
Whether this difference is measurable or not is something you can only tell by measuring it.
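For example, a minimal timing sketch along these lines (the time_sweep helper name is just made up here) lets you compare a full sweep over the pointer-to-pointer layout against a sweep over a flat layout on your own data sizes:
#include <chrono>

// Times one full pass over the data; pass in a lambda that loops over every element.
template <typename Fn>
long long time_sweep(Fn sweep) {
    auto start = std::chrono::steady_clock::now();
    sweep();   // touch every element once
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
Call it once with a loop over each layout and compare the numbers; if the difference is in the noise, the indirection doesn't matter for your workload.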
If the extra indirection does prove to be a bottleneck, you could replace the array-of-pointers with a class to represent the multi-dimensional array with a flat array:
#include <cstddef>
#include <vector>

class array_3d {
    size_t d1, d2, d3;
    std::vector<float> flat;
public:
    array_3d(size_t d1, size_t d2, size_t d3) :
        d1(d1), d2(d2), d3(d3), flat(d1*d2*d3)
    {}
    float & operator()(size_t x, size_t y, size_t z) {
        return flat[x*d2*d3 + y*d3 + z];
    }
    // and a similar const overload
};
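Access then looks much like before; a usage sketch, assuming kuanta, d and angle_scale hold your dimensions:
array_3d matris_cos(kuanta, d, angle_scale);   // one contiguous allocation, freed automatically
for (size_t i = 0; i < kuanta; ++i)
    for (size_t j = 0; j < d; ++j)
        for (size_t k = 0; k < angle_scale; ++k)
            matris_cos(i, j, k) = 0.0f;        // operator() does the flat-index arithmetic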
A future C++ standard may yet include dynamically sized arrays (runtime-sized arrays were proposed around C++14 but not adopted), in which case you would be able to use the multi-dimensional form in all cases.

You won't be able to spot any difference between them in a typical application unless your arrays are pretty huge and you spend a lot of time reading/writing to them, but nonetheless, there is a difference.
float matris_cos[kuanta][d][angle_scale];
1) The memory for this multidimensional array will be contiguous. There will be less cache misses as a result.
2) The array will require space only for the floats themselves.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
1) The memory for this multidimensional array is allocated in blocks and is thus much less likely to be contiguous. This may result in cache misses.
2) This method requires space for the pointers as well as the floats themselves.
Since there's indirection in the second case, you can expect a tiny speed difference when attempting to access or change values.
To recap:
Second case uses more memory
Second case involves indirection
Second case does not have guaranteed cache locality.
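If you want the contiguous layout without writing a wrapper class, a single new[] of the whole block works too; a minimal sketch, assuming kuanta, d and angle_scale are your dimensions:
float *matris_cos = new float[kuanta * d * angle_scale];        // one contiguous block
for (int i = 0; i < kuanta; ++i)
    for (int j = 0; j < d; ++j)
        for (int k = 0; k < angle_scale; ++k)
            matris_cos[(i * d + j) * angle_scale + k] = 0.0f;   // element [i][j][k]
// ...
delete[] matris_cos;                                            // one delete[] instead of a loop of them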

Related

Alignment of Eigen matrices in an `std::array`

I have an application where I need to pass N Eigen matrices to some functions. N is a compile-time constant, and these functions are called a significant number of times within tight loops. To avoid dynamic allocations at runtime, I thought it might be nice to store these matrices in an std::array, and then pass iterators into this array as function arguments. As a trivial example, consider:
#include <array>
#include <Eigen/Dense>

const int N = 3;
const int SIZE = 125;
typedef std::array<Eigen::Matrix<double, SIZE, 1>, N> MatrixArray;

void computeMatrixProductArray(MatrixArray::const_iterator BeginIn,
                               MatrixArray::const_iterator BeginEnd,
                               MatrixArray::iterator BeginOut,
                               MatrixArray::iterator EndOut)
{
    Eigen::Matrix<double, SIZE, 1> Test;
    for (int J = 0; J < N; ++J) {
        *(BeginOut + J) = Test.array() * (*(BeginIn + J)).array();
    }
}

int main()
{
    MatrixArray ArrayIn, ArrayOut;
    computeMatrixProductArray(ArrayIn.cbegin(), ArrayIn.cend(),
                              ArrayOut.begin(), ArrayOut.end());
}
My question has to do with the alignment of the matrices stored in MatrixArray, and how Eigen3.3 treats unaligned memory. The size of the matrices, and the properties of std::array, ensure that an individual matrix in a MatrixArray will certainly not be aligned on any nice boundary. However, to my understanding, Eigen3.3 can still vectorize this case with unaligned operations.
Can anyone provide insight on what happens in the above example when J == 1, and the first entry of this matrix is not aligned? Does Eigen3.3 now treat this case similarly to what happens with equivalent dynamic matrices? My understanding here is that scalar, or unaligned, operations are used until an appropriate alignment boundary is reached, at which time fully aligned operations will be used. Or, since the matrix is not aligned to begin with, there is no chance of aligned vectorization? Or is something else happening entirely?
Thanks for any insight, and thanks to the developers for maintaining such a powerful library.
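One quick diagnostic (not an answer to the vectorization question, just a sketch reusing the N and MatrixArray definitions above) is to print the address of each matrix's storage via data(), which every Eigen matrix provides, and see how it relates to 16- and 32-byte boundaries:
#include <cstdint>
#include <iostream>

void printAlignment(const MatrixArray &Arr)
{
    for (int J = 0; J < N; ++J) {
        auto Addr = reinterpret_cast<std::uintptr_t>(Arr[J].data());
        std::cout << "matrix " << J << ": address % 16 = " << Addr % 16
                  << ", address % 32 = " << Addr % 32 << '\n';
    }
}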

How Physically are Arrays Stored (Specifically with dimensions greater than 2)?

What I Know
I know that arrays like int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is: if these two are the same, how are arrays physically stored?
I used to think that the elements are stored next to each other in RAM, like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i) by just incrementing the pointer's location by the index.
The Issue
The issue comes in when I am to define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[height];
for (int i = 0; i < height; i++) {
    array2D[i] = new Vector[width];
    for (int j = 0; j < width; j++) { array2D[i][j] = Vector(i, j); }
}
Here the class Vector is just my way of storing both x and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
It cannot also be stored in a 2d array like below, as the physical RAM is a single 1d array of 8 bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advance.
What OP is struggling with is a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int ** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
    twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        twodee[i][j] = count++;
    }
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory, twodee might look like a crazy, scattered mess: one block holding the four row pointers, and four separate blocks of ints sitting wherever the allocator happened to put them.
You can't really predict where your memory will be; you're at the mercy of whatever memory manager handles the allocations and of whatever is already in storage where it might have been efficient for your memory to go. This makes trying to lay dynamically-allocated multi-dimensional arrays out in your head almost a waste of time.
And there are a whole lot of things wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, its ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
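For the 4x4 example above, the single-block version might look like this sketch (shown with std::vector here, though a single new int[rows * cols] behaves the same way), with the [i][j] mapping written out by hand:
#include <vector>

const int rows = 4, cols = 4;
std::vector<int> twodee(rows * cols);    // one contiguous block, freed automatically

int count = 1;
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        twodee[i * cols + j] = count++;  // names the same element that [i][j] would in a real 2D array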
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are laid out directly in memory, or you can create "2D arrays" that are actually 1D arrays, some holding data, some holding pointers to arrays.
Either way, the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element when you use brackets to access an element in the array.
Generally, arrays that are known to be 2D at compile time (e.g. int array2D[a][b]) will be laid out in memory without extra pointers, and the compiler knows to multiply AND add to get an address for each access. If your compiler isn't good at optimizing out the multiply, repeated accesses become much slower than they could be, so in the old days we often did the pointer math ourselves to avoid the multiply when possible.
There is the issue that a compiler might optimize by rounding the lower dimension size up to a power of two, so a shift can be used instead of multiply, which would then require padding the locations (then even though they are all in one memory block, there are meaningless holes).
(Also, I'm pretty sure I've run into the problem that within a procedure, the compiler needs to know which kind of 2D array it is really dealing with, so you may need to declare parameters in a way that lets the compiler know how to generate code for the procedure, e.g. a[][] is different from *a[].) And obviously you can actually get a pointer from the array of pointers, if that is what you want--which isn't the same thing as the array it points to, of course.
In your code, you have clearly declared a full set of the lower-dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers you use to get to each one without a multiply--instead by a dereference. So all those things will be in memory. Each 1D array will surely be laid out sequentially in a contiguous block of memory. It is just that it is entirely up to the memory manager where those 1D arrays are, relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behavior if it did.)
Using the extra array of pointers avoids the multiply entirely. But it takes more space, and for sequential access it actually makes the accesses slower and bigger (the extra dereference) compared with maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break it with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.

Is it worth using a vector when making a map

I have got a class that represents a 2D map with size 40x40.
I read some data from sensors and build this map by marking cells when my sensors find something, setting a value for the probability of an obstacle. For example, when I find an obstacle in cell [52,22] I add, say, 10 to its value and add 5 to the surrounding cells.
So each cell of this map only needs to hold some small value (probably nothing bigger). When a cell is marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
And my question is: is it worth using a plain (C-style) array, or is it better to use a vector even though I do not sort the cells, don't remove them, etc.? I just set their values and read them later.
Update:
Actually I have in my header file:
#include <cstdint>

using cell = uint8_t;

class Grid {
private:
    int xSize, ySize;
    cell *cells;
public:
    //some methods
};
In cpp :
using cell = uint8_t;

Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
    cells = new cell[xSize * ySize];
    for (int i = 0; i < xSize; i++) {
        for (int j = 0; j < ySize; j++)
            cells[i + j * xSize] = 0;   // i indexes x, j indexes y
    }
}
Grid::~Grid(void) {
    delete[] cells;   // array form of delete to match new[]
}
inline cell* Grid::getCell(int x, int y) const {
    return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
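A sketch of what that could look like for the 40x40 map of byte-sized cells (sizes taken from the question, names made up here):
#include <array>
#include <cstdint>

using cell = std::uint8_t;
using Map  = std::array<std::array<cell, 40>, 40>;

int main() {
    Map grid{};            // zero-initialized; all 1600 cells sit in one contiguous block
    grid[22][12] += 10;    // a cell the sensor marked
    grid[22][13] += 5;     // one of its neighbours
}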
A static (C-style) array is possible in your case since the size is known at compile time.
BUT. It may be interesting to have the data on the heap instead of the stack.
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (say, in your main() function), then a stack overflow may occur. Well, it's very unlikely for a 40*40 array of tiny things, but I'd prefer to have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO you should definitely go for the vector, it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data: if you know your values fit in a single byte, go for it!
An uint8_t (same as unsigned char) can store values from 0 to 255. If it's enough, use it.
using cell = uint8_t; // define a nice name for your data type
std::vector<cell> myMap;
size_t size = 40;
myMap.resize(size*size); // resize (not reserve) so the cells actually exist, zero-initialized
side note: don't use new[]. Well, you can, but it has no advantages over a vector. You will probably only gain headaches handling memory manually.
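If you prefer to keep the Grid class from the question, a sketch of backing it with a vector instead of new[] (same indexing, no destructor needed) might look like:
#include <cstdint>
#include <vector>

using cell = std::uint8_t;

class Grid {
    int xSize, ySize;
    std::vector<cell> cells;                 // owns and frees the storage for us
public:
    Grid(int x, int y) : xSize(x), ySize(y), cells(x * y, 0) {}
    cell& at(int x, int y)       { return cells[x + y * xSize]; }
    cell  at(int x, int y) const { return cells[x + y * xSize]; }
};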
Some advantages of using a std::vector are that it can be dynamically allocated (flexible size, can be resized during execution, etc.) and can be passed to or returned from a function. Since you have a fixed size of 40x40 and know you have a single small element in every cell, I don't think it matters much in your case, and I would NOT suggest using std::vector for this simple task.
And here is a possible duplicate.

Memory layout : 2D N*M data as pointer to N*M buffer or as array of N pointers to arrays

I'm hesitating on how to organize the memory layout of my 2D data.
Basically, what I want is an N*M 2D double array, where N ~ M are in the thousands (and are derived from user-supplied data)
The way I see it, I have 2 choices :
double *data = new double[N*M];
or
double **data = new double*[N];
for (size_t i = 0; i < N; ++i)
    data[i] = new double[M];
The first choice is what I'm leaning to.
The main advantages I see are the shorter new/delete syntax, the contiguous memory layout (which means adjacent memory accesses at runtime if I arrange my accesses correctly), and possibly better performance for vectorized code (auto-vectorized or using vector libraries such as vDSP or vecLib).
On the other hand, it seems to me that allocating one big chunk of contiguous memory could fail or take more time compared to allocating a bunch of smaller ones. And the second method also has the advantage of the shorter data[i][j] syntax compared to data[i*M+j].
What would be the most common / better way to do this, mainly from a performance standpoint? (Even though these are going to be small improvements, I'm curious to see which would perform better.)
Between the first two choices, for reasonable values of M and N, I would almost certainly go with choice 1. You skip a pointer dereference, and you get nice caching if you access data in the right order.
In terms of your concerns about size, we can do some back-of-the-envelope calculations.
Since M and N are in the thousands, suppose each is 10000 as an upper bound. Then your total memory consumed is
10000 * 10000 * sizeof(double) = 8 * 10^8
This is roughly 800 MB, which while large, is quite reasonable given the size of memory in modern day machines.
If N and M are constants, it is better to just statically declare the memory you need as a two dimensional array. Or, you could use std::array.
std::array<std::array<double, M>, N> data;
If only M is a constant, you could use a std::vector of std::array instead.
std::vector<std::array<double, M>> data(N);
If M is not constant, you need to perform some dynamic allocation. But, std::vector can be used to manage that memory for you, so you can create a simple wrapper around it. The wrapper below returns a row intermediate object to allow the second [] operator to actually compute the offset into the vector.
#include <cstddef>
#include <vector>

template <typename T>
class matrix {
    const size_t N;
    const size_t M;
    std::vector<T> v_;

    struct row {
        matrix &m_;
        const size_t r_;
        row (matrix &m, size_t r) : m_(m), r_(r) {}
        T & operator [] (size_t c) { return m_.v_[r_ * m_.M + c]; }
        T operator [] (size_t c) const { return m_.v_[r_ * m_.M + c]; }
    };

    struct const_row {
        const matrix &m_;
        const size_t r_;
        const_row (const matrix &m, size_t r) : m_(m), r_(r) {}
        T operator [] (size_t c) const { return m_.v_[r_ * m_.M + c]; }
    };

public:
    matrix (size_t n, size_t m) : N(n), M(m), v_(N*M) {}
    row operator [] (size_t r) { return row(*this, r); }
    const_row operator [] (size_t r) const { return const_row(*this, r); }
};
matrix<double> data(10,20);
data[1][2] = .5;
std::cout << data[1][2] << '\n';
In addressing your particular concern about performance: Your rationale for wanting a single memory access is correct. You should want to avoid doing new and delete yourself, however (which is something this wrapper provides), and if the data is more naturally interpreted as multi-dimensional, then showing that in the code will make the code easier to read as well.
Multiple allocations, as shown in your second technique, are inferior because they take more time, but their advantage is that they may succeed more often if your memory is fragmented (the free memory consists of smaller holes, and there is no free chunk of memory large enough to satisfy the single allocation request). But multiple allocations have another downside: some extra memory is needed to hold the pointers to each row.
My suggestion provides the single allocation technique without needing to explicitly call new and delete, as the memory is managed by the vector. At the same time, it allows the data to be addressed with the 2-dimensional syntax [x][y]. So it provides all the benefits of a single allocation with all the benefits of the multi-allocation, provided you have enough memory to fulfill the allocation request.
Consider using something like the following:
// array of pointers to doubles, pointing at the beginning of each row
double ** data = new double*[N];
// allocate enough doubles to the first row that it can hold the whole matrix
data[0] = new double[N * M];
// distribute pointers to the individual rows as well
for (size_t i = 1; i < N; i++)
    data[i] = data[0] + i * M;
I'm not sure if this is general practice or not; I just came up with it. Some downsides still apply to this approach, but I think it eliminates most of them, while keeping things like the ability to access the individual doubles as data[i][j].
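One detail the snippet leaves out: freeing it mirrors the two new[] calls.
delete[] data[0];   // releases the single block of N * M doubles
delete[] data;      // releases the array of row pointers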

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j].setOn(true);
        ...
    }
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j] = new Pixel();
        pixelArray[i][j]->setOn(true);
        ...
    }
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on the stack, which is otherwise fine, but leads to a stack overflow when you try to allocate too much. The limit is usually around 8 megabytes on modern OSes, so allocating an array of 1300 * 1372 elements on the stack is not an option.
Your second approach allocates 1300 * 1372 individual elements on the heap, which is a tremendous load for the allocator, which holds multiple linked lists to chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i * 1372 + j].setOn(true);
        ...
    }
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information. You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
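For example, with a row-major convention, the function could take the flat vector plus its dimensions (the names here are only illustrative; Pixel is the type from the question):
#include <cstddef>
#include <vector>

void doWork(std::vector<Pixel>& pixels, std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            pixels[y * width + x].setOn(true);   // row-major: row y, column x
}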
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: the double subscript like in pixelArray[x][y] works. But creating such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector do not necessarily all have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above only lists arguments for and against each option. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, a vector plus a defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
// iterate over the pixels through a view (the image object itself has no begin()/end())
gray8_view_t pixelView = view(pixelArray);
for(gray8_view_t::iterator pIt = pixelView.begin(); pIt != pixelView.end(); pIt++) {
    *pIt = 1;
}
My personal preference would be to use std::vector
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
// ^^^^ ^^^^ ^^^^^^^^^^^
// Size 1 Size 2 default Value
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
    Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
        for (size_t r = 0; r < rsz; r++)
            for (size_t c = 0; c < csz; c++)
                at(r, c).setOn(true);
    }
    Pixel &at(size_t row, size_t col) { return data.at(row*csz+col); }
    std::vector<Pixel> data;
    size_t rsz;
    size_t csz;
};
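Usage would then be along these lines (again assuming Pixel has the setOn member from the question):
Pixel2D image(1300, 1372);        // one contiguous allocation of 1300*1372 Pixels
image.at(10, 20).setOn(false);    // bounds-checked access via vector::at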