Memory layout: 2D N*M data as pointer to N*M buffer or as array of N pointers to arrays - C++

I'm hesitating on how to organize the memory layout of my 2D data.
Basically, what I want is an N*M 2D array of doubles, where N and M are in the thousands (and are derived from user-supplied data).
The way I see it, I have two choices:
double *data = new double[N*M];
or
double **data = new double*[N];
for (size_t i = 0; i < N; ++i)
data[i] = new double[M];
The first choice is what I'm leaning to.
The main advantages I see are the shorter new/delete syntax, a contiguous memory layout (which means adjacent memory accesses at runtime if I arrange my accesses correctly), and possibly better performance for vectorized code (auto-vectorized or using vector libraries such as vDSP or vecLib).
On the other hand, it seems to me that allocating one big chunk of contiguous memory could fail or take more time than allocating a bunch of smaller ones. The second method also has the advantage of the shorter access syntax data[i][j] compared to data[i*M+j].
What would be the most common / better way to do this, mainly from a performance standpoint? (Even though the improvements are likely to be small, I'm curious to see which approach performs better.)

Between the first two choices, for reasonable values of M and N, I would almost certainly go with choice 1. You skip a pointer dereference, and you get nice caching if you access data in the right order.
In terms of your concerns about size, we can do some back-of-the-envelope calculations.
Since M and N are in the thousands, suppose each is 10000 as an upper bound. Then your total memory consumed is
10000 * 10000 * sizeof(double) = 8 * 10^8 bytes
This is roughly 800 MB, which, while large, is quite reasonable given the amount of memory in modern machines.

If N and M are compile-time constants, it is better to just statically declare the memory you need as a two-dimensional array. Or, you could use std::array.
std::array<std::array<double, M>, N> data;
If only M is a constant, you could use a std::vector of std::array instead.
std::vector<std::array<double, M>> data(N);
If M is not constant, you need to perform some dynamic allocation. But, std::vector can be used to manage that memory for you, so you can create a simple wrapper around it. The wrapper below returns a row intermediate object to allow the second [] operator to actually compute the offset into the vector.
template <typename T>
class matrix {
    const size_t N;
    const size_t M;
    std::vector<T> v_;
    // proxy returned by operator[], so the second [] can compute the offset into the vector
    struct row {
        matrix &m_;
        const size_t r_;
        row (matrix &m, size_t r) : m_(m), r_(r) {}
        T & operator [] (size_t c) { return m_.v_[r_ * m_.M + c]; }
    };
    // const counterpart, so a const matrix can be read through [][]
    struct const_row {
        const matrix &m_;
        const size_t r_;
        const_row (const matrix &m, size_t r) : m_(m), r_(r) {}
        const T & operator [] (size_t c) const { return m_.v_[r_ * m_.M + c]; }
    };
public:
    matrix (size_t n, size_t m) : N(n), M(m), v_(N*M) {}
    row operator [] (size_t r) { return row(*this, r); }
    const_row operator [] (size_t r) const { return const_row(*this, r); }
};
matrix<double> data(10,20);
data[1][2] = .5;
std::cout << data[1][2] << '\n';
In addressing your particular concern about performance: your rationale for wanting a single contiguous allocation is correct. However, you should avoid doing new and delete yourself (which is something this wrapper takes care of), and if the data is more naturally interpreted as multi-dimensional, then showing that in the code will make the code easier to read as well.
Multiple allocations, as in your second technique, are inferior because they take more time, but their advantage is that they may succeed more often if your memory is fragmented (the free memory consists of smaller holes, and you do not have a free chunk large enough to satisfy the single allocation request). But multiple allocations have another downside: some extra memory is needed for the pointers to each row.
My suggestion provides the single-allocation technique without needing to call new and delete explicitly, as the memory is managed by vector. At the same time, it allows the data to be addressed with the 2-dimensional syntax [x][y]. So it provides all the benefits of a single allocation together with all the benefits of the multi-allocation approach, provided you have enough memory to fulfill the allocation request.

Consider using something like the following:
// array of pointers to doubles, to point to the beginning of each row
double ** data = new double*[N];
// allocate one contiguous block big enough to hold all N * M doubles,
// owned by the first row pointer
data[0] = new double[N * M];
// distribute pointers to the individual rows within that block
for (size_t i = 1; i < N; i++)
    data[i] = data[0] + i * M;
I'm not sure if this is a general practice or not, I just came up with this. Some downsides still apply to this approach, but I think it eliminates most of them, while keeping the ability to access the individual doubles as data[i][j] and so on.
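One detail worth spelling out: with this layout, cleanup is just two delete[] calls rather than N+1, since data[0] owns the whole block. A minimal sketch, matching the allocation above:
// matching cleanup: data[0] owns the single N*M block,
// the other data[i] are just pointers into it
delete[] data[0];  // frees the one big block of doubles
delete[] data;     // frees the array of row pointers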

Related

Overloaded vector operators causing a massive performance reduction?

I am summing vectors and multiplying them by a constant many, many times, so I overloaded the operators * and +. However, working with vectors greatly slowed down my program; working with a standard C-array improved the time by a factor of 40. What would cause such a slowdown?
An example program showing my overloaded operators and exhibiting the slowdown is below. This program does k = k + (0.0001)*q, log(N) times (here N = 1000000). At the end the program prints the times taken to do the operations using vectors and C-arrays, and also the ratio of the times.
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <time.h>
#include <vector>
using namespace std;
// -------- OVERLOADING VECTOR OPERATORS ---------------------------
vector<double> operator*(const double a,const vector<double> & vec)
{
vector<double> result;
for(int i = 0; i < vec.size(); i++)
result.push_back(a*vec[i]);
return result;
}
vector<double> operator+(const vector<double> & lhs,
const vector<double> & rhs)
{
vector<double> result;
for(int i = 0; i < lhs.size();i++)
result.push_back(lhs[i]+rhs[i]);
return result;
}
//------------------------------------------------------------------
//--------------- Basic C-Array operations -------------------------
// s[k] = y[k];
void populate_array(int DIM, double *y, double *s){
for(int k=0;k<DIM;k++)
s[k] = y[k];
}
//sums the arrays y and s as y+c s and sends them to s;
void sum_array(int DIM, double *y, double *s, double c){
for(int k=0;k<DIM;k++)
s[k] = y[k] + c*s[k];
}
// sums the array y and s as a*y+c*s and sends them to s;
void sum_array2(int DIM, double *y, double *s,double a,double c){
for(int k=0;k<DIM;k++)
s[k] = a*y[k] + c*s[k];
}
//------------------------------------------------------------------
int main(){
vector<double> k = {1e-8,2e-8,3e-8,4e-8};
vector<double> q = {1e-8,2e-8,3e-8,4e-8};
double ka[4] = {1e-8,2e-8,3e-8,4e-8};
double qa[4] = {1e-8,2e-8,3e-8,4e-8};
int N = 3;
clock_t begin,end;
double elapsed_sec,elapsed_sec2;
begin = clock();
do
{
k = k + 0.0001*q;
N = 2*N;
}while(N<1000000);
end = clock();
elapsed_sec = double(end-begin) / CLOCKS_PER_SEC;
printf("vector time: %g \n",elapsed_sec);
N = 3;
begin = clock();
do
{
sum_array2(4, qa, ka,0.0001,1.0);
N = 2*N;
}while(N<1000000);
end = clock();
elapsed_sec2 = double(end-begin) / CLOCKS_PER_SEC;
printf("array time: %g \n",elapsed_sec2);
printf("time ratio : %g \n", elapsed_sec/elapsed_sec2);
}
I get the ratio of vector time to C-array time to be typically ~40 on my Linux system. What is it about my overloaded operators that causes the slowdown compared to the C-array operations?
Let's take a look at this line:
k = k + 0.0001*q;
To evaluate this, first the computer needs to call your operator*. This function creates a vector and needs to allocate dynamic storage for its elements. Actually, since you use push_back rather than setting the size ahead of time via constructor, resize, or reserve, it might allocate too few elements the first time and need to allocate again to grow the vector.
This created vector (or one move-constructed from it) is then used as a temporary object representing the subexpression 0.0001*q within the whole statement.
Next the computer needs to call your operator+, passing k and that temporary vector. This function also creates and returns a vector, doing at least one dynamic allocation and possibly more. There's a second temporary vector for the subexpression k + 0.0001*q.
Finally, the computer calls an operator= belonging to std::vector. Luckily, there's a move assignment overload, which (probably) just moves the allocated memory from the second temporary to k and deallocates the memory that was in k.
Now that the entire statement has been evaluated, the temporary objects are destroyed. First the temporary for k + 0.0001*q is destroyed, but it no longer has any memory to clean up. Then the temporary for 0.0001*q is destroyed, and it does need to deallocate its memory.
Doing lots of allocating and deallocating of memory, even in small amounts, tends to be somewhat expensive. (The vectors will use std::allocator, which is allowed to be smarter and avoid some allocations and deallocations, but I couldn't say without investigation how likely it would be to actually help here.)
On the other hand, your "C-style" implementation does no allocating or deallocating at all. It does an "in-place" calculation, just modifying arrays passed in to store the values passed out. If you had another C-style implementation with individual functions like double* scalar_times_vec(double s, const double* v, unsigned int len); that used malloc to get memory for the result and required the results to eventually be freed, you would probably get similar results.
So how might the C++ implementation be improved?
As mentioned, you could either reserve the vectors before adding data to them, or give them an initial size and do assignments like v[i] = out; rather than push_back(out);.
The next easiest step would be to use more operators that allow in-place calculations. If you overloaded:
std::vector<double>& operator+=(const std::vector<double>&);
std::vector<double>& operator*=(double);
then you could do:
k += 0.0001*q;
n *= 2;
// or:
n += n;
to do the final computations on k and n in-place. This doesn't easily help with the expression 0.0001*q, though.
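For reference, here is a minimal sketch of how those in-place overloads could be implemented (this assumes both vectors have the same length, which the original code assumes as well):

#include <cstddef>
#include <vector>

// in-place element-wise add: a += b (assumes a.size() == b.size())
std::vector<double>& operator+=(std::vector<double>& a, const std::vector<double>& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] += b[i];
    return a;
}

// in-place scaling: a *= s
std::vector<double>& operator*=(std::vector<double>& a, double s)
{
    for (double& x : a)
        x *= s;
    return a;
}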
Another option that sometimes helps is to overload operators to accept rvalues in order to reuse storage that belonged to temporaries. If we had an overload:
std::vector<double> operator+(const std::vector<double>& a, std::vector<double>&& b);
it would get called for the + in the expression k + 0.0001*q, and the implementation could create the return value from std::move(b), reusing its storage. This gets tricky to get both flexible and correct, though. And it still doesn't eliminate the temporary representing 0.0001*q or its allocation and deallocation.
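A sketch of such an rvalue overload (again assuming equal lengths; a complete implementation would provide the full set of overloads):

#include <cstddef>
#include <utility>
#include <vector>

// reuse the temporary's storage for the result of a + b
std::vector<double> operator+(const std::vector<double>& a, std::vector<double>&& b)
{
    for (std::size_t i = 0; i < b.size(); ++i)
        b[i] += a[i];
    return std::move(b);  // no new allocation; b's buffer becomes the result
}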
Another solution that allows in-place calculations in the most general cases is called expression templates. That's rather a lot of work to implement, but if you really need a combination of convenient syntax and efficiency, there are some existing libraries that might be worth looking into.
Edit:
I should have taken a closer look at how you perform the C-array operations... See aschepler's answer for why growing the vectors is the least of your problems.
–––
If you have any idea how many elements you are going to add to a vector, you should always call reserve on the vector before adding them. Otherwise you are going to trigger a potentially large amount of reallocations, which are costly.
A vector occupies a contiguous block of memory. To grow, it has to allocate a larger block of memory and copy its entire contents to the new location. To avoid this happening every time an element is added, the vector usually allocates more memory than is presently needed to store all its elements. The number of elements it can store without reallocation is its capacity. How large this capacity should be is of course a trade-off between avoiding potential future reallocation and wasting memory.
However, if you know (or have a good idea) how many elements will eventually be stored in the vector, you can call reserve(n) to set its capacity to (at least) n and avoid unnecessary reallocation.
Edit:
See also here. push_back has to check whether the vector needs to grow, and is thus a bit slower than just writing to the vector through operator[]. In your case it might be fastest to directly construct a vector of size (not just capacity) n, as doubles are POD and cheap to construct, and then insert the correct values through operator[].
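As an illustration of that suggestion, one possible rewrite of the questioner's operator* constructs the result at its final size and writes through operator[] instead of push_back:

#include <cstddef>
#include <vector>

// same semantics as the original operator*, but with a single up-front allocation
std::vector<double> operator*(double a, const std::vector<double>& vec)
{
    std::vector<double> result(vec.size());   // size set once, no regrowth
    for (std::size_t i = 0; i < vec.size(); ++i)
        result[i] = a * vec[i];
    return result;
}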

use std::vector for dynamically allocated 2d array?

So I am writing a class which has 1D arrays and 2D arrays that I dynamically allocate in the constructor:
class Foo{
    int** array2d;   // renamed: identifiers cannot start with a digit
    int*  array1d;
public:
    Foo(int num1, int num2);
};
Foo::Foo(int num1, int num2){
    array2d = new int*[num1];
    for(int i = 0; i < num1; i++)
    {
        array2d[i] = new int[num2];
    }
    array1d = new int[num1];
}
Then I will have to delete every 1d-array and every array in the 2d array in the destructor, right?
I want to use std::vector for not having to do this. Is there any downside of doing this? (makes compilation slower etc?)
TL;DR: when to use std::vector for dynamically allocated arrays, which do NOT need to be resized during runtime?
vector is fine for the vast majority of uses. Hand-tuned scenarios should first attempt to tune the allocator [1], and only then modify the container. Correctness of memory management (and your program in general) is worth much, much more than any compilation time gains.
In other words, vector should be your starting point, and until you find it unsatisfactory, you shouldn't care about anything else.
As an additional improvement, consider using a 1-dimensional vector as the backing storage and only providing a 2-dimensional indexed view. This layout can improve cache locality and overall performance, while also making some operations, like copying the whole structure, much easier.
[1] The second of two template parameters that vector accepts, which defaults to the standard allocator for the given type.
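As a rough sketch of what "tuning the allocator" means mechanically: the allocator is just a type you pass as vector's second template parameter. The counting allocator below is purely illustrative (not a recommendation), but it shows where such a customization plugs in:

#include <cstddef>
#include <memory>
#include <vector>

// Illustrative only: a pass-through allocator that counts allocations.
template <class T>
struct counting_allocator {
    using value_type = T;
    static std::size_t count;                       // crude global counter, for illustration
    counting_allocator() = default;
    template <class U> counting_allocator(const counting_allocator<U>&) {}
    T* allocate(std::size_t n) { ++count; return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};
template <class T> std::size_t counting_allocator<T>::count = 0;
template <class T, class U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

// usage: std::vector<double, counting_allocator<double>> v(1000);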
There should not be any drawbacks, since vector guarantees contiguous memory. But if the size is fixed and C++11 is available, a std::array may be worth considering among other options:
it doesn't allow resizing
depending on how the vector would have been initialized, it avoids reallocations
its size is hardcoded in the instructions (as a template argument). See Ped7g's comment for a more detailed description
A 2D array is not an array of pointers.
If you define it that way, each row/column can have a different size.
Furthermore, the elements won't be sequential in memory.
This might lead to poor performance, as the prefetcher won't be able to predict your access patterns very well.
Therefore, it is not advisable to nest std::vectors inside each other to model multi-dimensional arrays.
A better approach is to map a contiguous chunk of memory onto a multi-dimensional space by providing custom access methods.
You can test it in the browser: http://fiddle.jyt.io/github/3389bf64cc6bd7c2218c1c96f62fa203
#include <vector>
template<class T>
struct Matrix {
    Matrix(std::size_t n=1, std::size_t m=1)
        : n{n}, m{m}, data(n*m)
    {}
    Matrix(std::size_t n, std::size_t m, std::vector<T> const& data)
        : n{n}, m{m}, data{data}
    {}
    //Matrix M(2,2, {1,1,1,1});
    T const& operator()(size_t i, size_t j) const {
        return data[i*m + j];
    }
    T& operator()(size_t i, size_t j) {
        return data[i*m + j];
    }
    size_t n;
    size_t m;
    std::vector<T> data;
    using ScalarType = T;
};
You can implement operator[] by returning a VectorView which has access to the data, an index, and the dimensions.
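A minimal sketch of that idea (the exact members of VectorView are an assumption for illustration, not part of the code above):

#include <cstddef>
#include <vector>

// row view into the Matrix's flat storage: M[i][j] maps to data[i*m + j]
template<class T>
struct VectorView {
    std::vector<T>& data;   // reference to the matrix's flat vector
    std::size_t offset;     // start index of the selected row
    T& operator[](std::size_t j) { return data[offset + j]; }
};

// a possible operator[] inside Matrix<T>:
// VectorView<T> operator[](std::size_t i) { return {data, i * m}; }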

Speed difference of dynamic and classical multi-dimentional arrays

Is the usage (not creation) speed of dynamic and classical multi-dimensional arrays different?
I mean, for example, when I try to access all the values in a three-dimensional array with loops, is there any speed difference between arrays created with the dynamic and the classical methods?
When I say "dynamic three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is created like this.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
When I say "classical three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is simply created like this.
float matris_cos[kuanta][d][angle_scale];
But please note, I am not asking about the creation speed of these arrays. I want to access the values of these arrays via some loops. Is there any speed difference when I try to access the values?
An array of pointers (to arrays of pointers) will require extra levels of indirection to access a random element, while a multi-dimensional array will require basic arithmetic (multiplication and pointer addition). On most modern platforms, indirection is likely to be slower unless you use cache-friendly access patterns. Also, all the elements of the multi-dimensional array will be contiguous, which could help caching if you iterate over the whole array.
Whether this difference is measurable or not is something you can only tell by measuring it.
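If you do want to measure it, a minimal timing sketch along these lines can help (the sizes and the row-major flat layout here are arbitrary illustrative choices):

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t rows = 2000, cols = 2000;
    std::vector<float> flat(rows * cols, 1.0f);  // flat row-major storage

    auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += flat[r * cols + c];           // access pattern under test
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    std::printf("sum = %f, elapsed = %f s\n", sum, elapsed.count());
}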
If the extra indirection does prove to be a bottleneck, you could replace the array-of-pointers with a class to represent the multi-dimensional array with a flat array:
class array_3d {
    size_t d1,d2,d3;
    std::vector<float> flat;
public:
    array_3d(size_t d1, size_t d2, size_t d3) :
        d1(d1), d2(d2), d3(d3), flat(d1*d2*d3)
    {}
    float & operator()(size_t x, size_t y, size_t z) {
        return flat[x*d2*d3 + y*d3 + z];
    }
    // and a similar const overload
};
Dynamically sized arrays (arrays of runtime bound) were proposed for the following C++ standard, which would have allowed the multi-dimensional form in all cases, but the proposal was ultimately not adopted.
You won't be able to spot any difference between them in a typical application unless your arrays are pretty huge and you spend a lot of time reading/writing to them, but nonetheless, there is a difference.
float matris_cos[kuanta][d][angle_scale];
1) The memory for this multidimensional array will be contiguous. There will be less cache misses as a result.
2) The array will require space only for the floats themselves.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
1) The memory for this multidimensional array is allocated in blocks and is thus much less likely to be contiguous. This may result in cache misses.
2) This method requires space for the pointers as well as the floats themselves.
Since there's indirection in the second case, you can expect a tiny speed difference when attempting to access or change values.
To recap:
Second case uses more memory
Second case involves indirection
Second case does not have guaranteed cache locality.

1D or 2D array, what's faster?

I'm in need of representing a 2D field (axes x, y) and I face a problem: should I use a 1D array or a 2D array?
I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y), but I can also imagine that the 1D array could be in the CPU cache...
I did some googling, but only found pages regarding static arrays (and stating that 1D and 2D are basically the same). But my arrays must be dynamic.
So, what's
faster,
smaller (RAM)
dynamic 1D arrays or dynamic 2D arrays?
tl;dr : You should probably use a one-dimensional approach.
Note: One cannot dig into the details affecting performance when comparing dynamic 1d and dynamic 2d storage patterns without filling books, since the performance of code depends on a very large number of parameters. Profile if possible.
1. What's faster?
For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.
2. What's smaller?
Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.
Remarks
I laid out a pretty long answer beneath with several reasons but I want to make some remarks on your assumptions first.
I can imagine, that recalculating indices for 1D arrays (y + x*n) could be slower than using 2D array (x, y)
Let's compare these two functions:
int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c) { return p[c + C*r]; } // C: the (compile-time constant) number of columns
The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:
?get_1d@@YAHPAHII@Z PROC
push ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0
?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0
The difference is mov (2d) vs. lea (1d).
The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to Agner Fog's instruction tables).
Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely to identify this difference itself to be the bottleneck in any program.
This brings us to the next (and more interesting) point:
... but I could imagine that 1D could be in CPU cache ...
True, but 2d could be in the CPU cache, too. See The Downsides: Memory locality below for an explanation of why 1d is still better.
The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.
Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.
The Problem
To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.
Example case using pointer to pointer syntax
int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4];

    // do some stuff here, using p[x][y]

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}
The downsides
Memory locality
For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in an arbitrary memory position.
The following will give you an idea of how the memory may be laid out.
For the real 2d case:
The violet square is the memory position occupied by p itself.
The green squares assemble the memory region p points to (4 x int*).
The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region
For the 2d mapped on 1d case:
The green square is the only required pointer int *
The blue squares assemble the memory region for all matrix elements (16 x int).
This means that (when using the pointer-to-pointer layout) you will probably observe worse performance than with a contiguous storage pattern (the 1d mapping), due to caching for instance.
Let's say a cache line is "the amount of data transferred into the cache at once" and let's imagine a program accessing the whole matrix one element after another.
If you have a properly aligned 4x4 matrix of 32-bit values, a processor with a 64-byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes).
If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned).
There will probably not be any more misses while processing that data.
In the case of a dynamic, "real two-dimensional" system with unrelated locations for each row/column, the processor needs to load every memory location separately.
Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would, in a worst-case scenario, actually transfer 256 bytes and waste 75% of the bandwidth.
If you process the data using the 2d scheme you'll again (if not already cached) face a cache miss on the first element.
But now, only the first row/column will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one.
As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.
Long story short: The 2d pattern has a higher chance of cache misses, while the 1d scheme offers better potential for performance due to the locality of the data.
Frequent Allocation / Deallocation
As many as N + 1 (4 + 1 = 5) allocations (using either new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
The same number of proper, respective deallocation operations must be applied as well.
Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.
This is getting even worse with a growing number of rows.
Memory consumption overhead
I'll assume a size of 32 bits for int and 32 bits for pointers. (Note: this is system dependent.)
Let's remember: We want to store a 4×4 int matrix which means 64 bytes.
For a NxM matrix, stored with the presented pointer-to-pointer scheme we consume
N*M*sizeof(int) [the actual blue data] +
N*sizeof(int*) [the green pointers] +
sizeof(int**) [the violet variable p] bytes.
That makes 4*4*4 + 4*4 + 4 = 84 bytes in case of the present example and it gets even worse when using std::vector<std::vector<int>>.
It will require N * M * sizeof(int) + N * sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, instead of 64 bytes for 4 x 4 int.
In addition, depending on the allocator used, each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some "infobytes" which store the number of allocated bytes for the purpose of proper deallocation.)
This means the worst case is:
N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes! (Overhead: 156%)
The share of the overhead will reduce as the size of the matrix grows but will still be present.
Risk of memory leaks
The bunch of allocations requires appropriate exception handling in order to avoid memory leaks if one of the allocations fails!
You'll need to keep track of allocated memory blocks and you must not forget them when deallocating the memory.
If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.
Example:
In the above mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.
// allocate memory for 4x4 integers; quick & dirty
size_t const N = 4;
// we don't need try for this allocation
// if it fails there is no leak
int ** p = new int*[N];
size_t allocs(0U);
try
{ // try block doing further allocations
    for (size_t i=0; i<N; ++i)
    {
        p[i] = new int[4]; // allocate
        ++allocs;          // advance counter if no exception occurred
    }
}
catch (std::bad_alloc & be)
{ // if an exception occurs we need to free our memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
    delete[] p; // free p
    throw;      // rethrow bad_alloc
}
/*
    do some stuff here, using p[x][y]
*/
// deallocate memory according to the number of allocations
for (size_t i=0; i<allocs; ++i) delete[] p[i];
delete[] p;
Summary
There are cases where "real 2d" memory layouts fit and make sense (e.g. if the number of columns per row is not constant), but in the simplest and most common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.
Alternative
You should use a contiguous block of memory and map your rows onto that block.
The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like
What is The Rule of Three?
What is meant by Resource Acquisition is Initialization (RAII)?
C++ concept: Container (on cppreference.com)
Example
To provide an idea of how such a class may look, here's a simple example with some basic features:
2d-size-constructible
2d-resizable
operator()(size_t, size_t) for 2d row-major element access
at(size_t, size_t) for checked 2d row-major element access
Fulfills Concept requirements for Container
Source:
#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>
namespace matrices
{
template<class T>
class simple
{
public:
// misc types
using data_type = std::vector<T>;
using value_type = typename std::vector<T>::value_type;
using size_type = typename std::vector<T>::size_type;
// ref
using reference = typename std::vector<T>::reference;
using const_reference = typename std::vector<T>::const_reference;
// iter
using iterator = typename std::vector<T>::iterator;
using const_iterator = typename std::vector<T>::const_iterator;
// reverse iter
using reverse_iterator = typename std::vector<T>::reverse_iterator;
using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;
// empty construction
simple() = default;
// default-insert rows*cols values
simple(size_type rows, size_type cols)
: m_rows(rows), m_cols(cols), m_data(rows*cols)
{}
// copy initialized matrix rows*cols
simple(size_type rows, size_type cols, const_reference val)
: m_rows(rows), m_cols(cols), m_data(rows*cols, val)
{}
// 1d-iterators
iterator begin() { return m_data.begin(); }
iterator end() { return m_data.end(); }
const_iterator begin() const { return m_data.begin(); }
const_iterator end() const { return m_data.end(); }
const_iterator cbegin() const { return m_data.cbegin(); }
const_iterator cend() const { return m_data.cend(); }
reverse_iterator rbegin() { return m_data.rbegin(); }
reverse_iterator rend() { return m_data.rend(); }
const_reverse_iterator rbegin() const { return m_data.rbegin(); }
const_reverse_iterator rend() const { return m_data.rend(); }
const_reverse_iterator crbegin() const { return m_data.crbegin(); }
const_reverse_iterator crend() const { return m_data.crend(); }
// element access (row major indexation)
reference operator() (size_type const row,
size_type const column)
{
return m_data[m_cols*row + column];
}
const_reference operator() (size_type const row,
size_type const column) const
{
return m_data[m_cols*row + column];
}
reference at (size_type const row, size_type const column)
{
return m_data.at(m_cols*row + column);
}
const_reference at (size_type const row, size_type const column) const
{
return m_data.at(m_cols*row + column);
}
// resizing
void resize(size_type new_rows, size_type new_cols)
{
// new matrix new_rows times new_cols
simple tmp(new_rows, new_cols);
// select smaller row and col size
auto mc = std::min(m_cols, new_cols);
auto mr = std::min(m_rows, new_rows);
for (size_type i(0U); i < mr; ++i)
{
// iterators to begin of rows
auto row = begin() + i*m_cols;
auto tmp_row = tmp.begin() + i*new_cols;
// move mc elements to tmp
std::move(row, row + mc, tmp_row);
}
// move assignment to this
*this = std::move(tmp);
}
// size and capacity
size_type size() const { return m_data.size(); }
size_type max_size() const { return m_data.max_size(); }
bool empty() const { return m_data.empty(); }
// dimensionality
size_type rows() const { return m_rows; }
size_type cols() const { return m_cols; }
// data swapping
void swap(simple &rhs)
{
using std::swap;
m_data.swap(rhs.m_data);
swap(m_rows, rhs.m_rows);
swap(m_cols, rhs.m_cols);
}
private:
// content
size_type m_rows{ 0u };
size_type m_cols{ 0u };
data_type m_data{};
};
template<class T>
void swap(simple<T> & lhs, simple<T> & rhs)
{
lhs.swap(rhs);
}
template<class T>
bool operator== (simple<T> const &a, simple<T> const &b)
{
if (a.rows() != b.rows() || a.cols() != b.cols())
{
return false;
}
return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
template<class T>
bool operator!= (simple<T> const &a, simple<T> const &b)
{
return !(a == b);
}
}
Note several things here:
T needs to fulfill the requirements of the used std::vector member functions
operator() doesn't do any "out of range" checks
No need to manage data on your own
No destructor, copy constructor or assignment operators required
So you don't have to bother about proper memory handling for each application but only once for the class you write.
Restrictions
There may be cases where a dynamic "real" two dimensional structure is favourable. This is for instance the case if
the matrix is very large and sparse (if any of the rows do not even need to be allocated but can be handled using a nullptr; see the sketch after this list) or if
the rows do not have the same number of columns (that is if you don't have a matrix at all but another two-dimensional construct).
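As a hedged illustration of the sparse case (not part of the original answer), rows can be owned by unique_ptrs so that unallocated rows are simply null:

#include <cstddef>
#include <memory>
#include <vector>

int main() {
    const std::size_t rows = 1000, cols = 1000;
    // each row is allocated lazily; an untouched row costs only one null pointer
    std::vector<std::unique_ptr<int[]>> m(rows);

    auto set = [&](std::size_t r, std::size_t c, int v) {
        if (!m[r]) m[r] = std::make_unique<int[]>(cols);  // allocate (zeroed) on first write
        m[r][c] = v;
    };
    set(3, 7, 42);  // only row 3 is actually allocated
}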
Unless you are talking about static arrays, 1D is faster.
Here’s the memory layout of a 1D array (std::vector<T>):
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |
+---+---+---+---+---+---+---+---+---+
And here’s the same for a dynamic 2D array (std::vector<std::vector<T>>):
+---+---+---+
| * | * | * |
+-|-+-|-+-|-+
| | V
| | +---+---+---+
| | | | | |
| | +---+---+---+
| V
| +---+---+---+
| | | | |
| +---+---+---+
V
+---+---+---+
| | | |
+---+---+---+
Clearly the 2D case loses the cache locality and uses more memory. It also introduces an extra indirection (and thus an extra pointer to follow) but the first array has the overhead of calculating the indices so these even out more or less.
1D and 2D Static Arrays
Size: Both will require the same amount of memory.
Speed: You can assume that there will be no speed difference, because the memory for both of these arrays should be contiguous (the whole 2D array should appear as one chunk in memory rather than a bunch of chunks spread across memory). (This could be compiler dependent, however.)
1D and 2D Dynamic Arrays
Size: The 2D array will require a tiny bit more memory than the 1D array because of the pointers needed in the 2D array to point to the set of allocated 1D arrays. (This tiny bit is only tiny when we're talking about really big arrays. For small arrays, the tiny bit could be pretty big relatively speaking.)
Speed: The 1D array may be faster than the 2D array because the memory for the 2D array would not be contiguous, so cache misses would become a problem.
Use what works and seems most logical, and if you face speed problems, then refactor.
The existing answers all only compare 1-D arrays against arrays of pointers.
In C (but not C++) there is a third option; you can have a contiguous 2-D array that is dynamically allocated and has runtime dimensions:
int (*p)[num_columns] = malloc(num_rows * sizeof *p);
and this is accessed like p[row_index][col_index].
I would expect this to have very similar performance to the 1-D array case, but it gives you nicer syntax for accessing the cells.
In C++ you can achieve something similar by defining a class which maintains a 1-D array internally, but can expose it via 2-D array access syntax using overloaded operators. Again I would expect that to have similar or identical performance to the plain 1-D array.
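A minimal sketch of that C++ approach (the class name and members are illustrative): keep a flat buffer, but have operator[] return a pointer to the start of a row so the p[row][col] syntax still works:

#include <cstddef>
#include <vector>

// flat storage with 2-D access syntax via a row pointer
struct Grid {
    std::size_t rows, cols;
    std::vector<int> data;
    Grid(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    int* operator[](std::size_t r) { return data.data() + r * cols; }
    const int* operator[](std::size_t r) const { return data.data() + r * cols; }
};

// usage: Grid g(10, 20); g[3][4] = 7;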
Another difference between 1D and 2D arrays appears in memory allocation: we cannot be sure that the members of a 2D (pointer-based) array are sequential in memory.
It really depends on how your 2D array is implemented.
Consider the code below:
int a[200], b[10][20], *c[10], *d[10];
for (int ii = 0; ii < 10; ++ii)
{
    c[ii] = &b[ii][0];
    d[ii] = (int*) malloc(20 * sizeof(int)); // The cast is needed for C++ only.
}
There are 3 implementations here: b, c and d
There won't be a lot of difference accessing b[x][y] or a[x*20 + y], since one is you doing the computation and the other is the compiler doing it for you. c[x][y] and d[x][y] are slower, because the machine has to find the address that c[x] points to and then access the yth element from there. It is not one straight computation. On some machines (eg AS400 which has 36 byte (not bit) pointers), pointer access is extremely slow. It all depends on the architecture in use. On x86 type architectures, a and b are the same speed, c and d are slower than b.
I love the thorough answer provided by Pixelchemist. A simpler version of this solution may be as follows. First, declare the dimensions:
constexpr int M = 16; // rows
constexpr int N = 16; // columns
constexpr int P = 16; // planes
Next, create an alias and get and set methods:
template<typename T>
using Vector = std::vector<T>;

template<typename T>
inline T& set_elem(Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}

template<typename T>
inline const T& get_elem(const Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
    // check indexes here...
    return m_[i_*N*P + j_*P + k_];
}
Finally, a vector may be created and indexed as follows:
Vector<int> array3d(M*N*P, 0); // create 3-d array containing M*N*P zero ints
set_elem(array3d, 0, 0, 1) = 5; // array3d[0][0][1] = 5
auto n = get_elem(array3d, 0, 0, 1); // n = 5
Defining the vector size at initialization provides optimal performance. This solution is modified from this answer. The functions may be overloaded to support varying dimensions with a single vector. The downside of this solution is that the M, N, P dimensions are implicitly baked into the get and set functions. This can be resolved by implementing the solution within a class, as done by Pixelchemist.

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j].setOn(true);
        ...
    }
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j] = new Pixel();
        pixelArray[i][j]->setOn(true);
        ...
    }
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on the stack, which is otherwise fine, but leads to a stack overflow when you try to allocate too much. The limit is usually around 8 megabytes on modern OSes, so allocating an array of 1300 * 1372 elements on the stack is not an option.
Your second approach allocates 1300 * 1372 elements on the heap, which is a tremendous load for the allocator, which holds multiple linked lists to chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i * 1372 + j].setOn(true);
        ...
    }
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information. You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
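For illustration, a sketch of such a function using a flat vector and a row-major convention (the signature is an assumption; Pixel and setOn are from the question and assumed declared elsewhere):

#include <cstddef>
#include <vector>

void doWork(std::vector<Pixel>& pixels, std::size_t width, std::size_t height)
{
    // row-major convention: element (row, col) lives at row * width + col
    for (std::size_t row = 0; row < height; ++row)
        for (std::size_t col = 0; col < width; ++col)
            pixels[row * width + col].setOn(true);
}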
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: the double subscript like in pixelArray[x][y] works. But the creation of such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector do not necessarily all have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
for(gray8_image_t::iterator pIt = pixelArray.begin(); pIt != pixelArray.end(); pIt++) {
*pIt = 1;
}
My personal preference would be to use std::vector
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
// ^^^^ ^^^^ ^^^^^^^^^^^
// Size 1 Size 2 default Value
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
    Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
        for (size_t r = 0; r < rsz; r++)
            for (size_t c = 0; c < csz; c++)
                at(r, c).setOn(true);
    }
    Pixel &at(size_t row, size_t col) {return data.at(row*csz+col);}
    std::vector<Pixel> data;
    size_t rsz;
    size_t csz;
};