C++ array of Eigen dynamically sized matrix

In my application, I have a one-dimensional grid and for each grid point there is a matrix (equally sized and quadratic). For each matrix, a certain update procedure must be performed. At the moment, I define a type
typedef Eigen::Matrix<double, N, N> my_matrix_t;
and allocate the matrices for all grid points using
my_matrix_t *matrices = new my_matrix_t[num_gridpoints];
Now I would like to address matrices whose sizes are only known at run time (but still quadratic), i.e.,
typedef Eigen::Matrix<double, Dynamic, Dynamic> my_matrix_t;
The allocation procedure remains the same and the code seems to work. However, I assume that the array "matrices" contains only the pointers to each individual matrix storage, and the overall performance will degrade as the memory has to be collected from random places before the operation on each matrix can be carried out.
Q0: Contiguous Memory?
Is the assumption correct that
new[] will only store the pointers and the matrix data is stored somewhere else on the heap?
it is beneficial to have a contiguous memory region for such problems?
Q1: new[] or std::vector?
Using a std::vector was suggested in the comments. Does this make any difference? Advantages/drawbacks of both solutions?
Q2: Overloading new[]?
I think by overloading the operator new[] in the Eigen::Matrix class (or one of its bases) such an allocation could be achieved. Is this a good idea?
Q3: Alternative ways?
As an alternative, I could think of using a large Eigen::Matrix. Can anyone share their experience here? Do you have other suggestions for me?

Let us sum up what we have so far based on the comments to the question and the mailing list post here. I would like to encourage everyone to edit and add things.
Q0: Contiguous memory region.
Yes, only the matrix objects themselves (a pointer plus the dimensions) are stored in the array; the actual coefficient data lives in separate heap allocations (independent of using new[] or std::vector).
Generally, in HPC applications, contiguous memory accesses are beneficial.
Q1: The basic mechanisms are the same.
However, std::vector offers more comfort and takes work off the developer. The latter also reduces mistakes and memory leaks.
Q2: Use std::vector.
Overloading new[] is not recommended as it is difficult to get it right. For example, alignment issues could lead to errors on different machines. In order to guarantee correct behavior on all machines, use
std::vector<my_matrix_t, Eigen::aligned_allocator<my_matrix_t>> storage;
as explained here.
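For illustration, a minimal sketch of allocating and updating the per-gridpoint matrices with this container (the values of N and num_gridpoints are made up, and the "add the identity" update is only a placeholder):
#include <vector>
#include <Eigen/Core>

typedef Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> my_matrix_t;

int main() {
    const int N = 8;                  // run-time matrix dimension (example value)
    const int num_gridpoints = 1000;  // example value
    // aligned_allocator is only strictly required for fixed-size vectorizable
    // Eigen types, but it is harmless here and future-proofs the code
    std::vector<my_matrix_t, Eigen::aligned_allocator<my_matrix_t>> storage(
        num_gridpoints, my_matrix_t::Zero(N, N));
    // note: each dynamic-size matrix still owns its own heap block (see Q0)
    for (my_matrix_t &m : storage)
        m += my_matrix_t::Identity(N, N);  // placeholder update procedure
}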
Q3: Use a large Eigen Matrix for the complete grid.
Alternatively, let the Eigen library do the complete allocation directly by using one of its data structures. This guarantees that issues such as alignment and a contiguous memory region are addressed properly. The matrix
Eigen::Matrix<double, Dynamic, Dynamic> storage(N, num_gridpoints * N);
contains all matrices for the complete grid and can be addressed using
/* (i+1)-th matrix, for i in [0, num_gridpoints - 1] */
auto matrix = storage.block(0, i * N, N, N);
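The per-gridpoint update then becomes a loop over blocks of this single allocation (sketch; the "add the identity" update is just a placeholder):
for (int i = 0; i < num_gridpoints; ++i) {
    // i-th N x N matrix, living inside the one contiguous allocation
    storage.block(0, i * N, N, N) += Eigen::MatrixXd::Identity(N, N);
}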

Related

Do multi-dimensional arrays cause any problems in C and/or C++?

I know that this question seems a little bit hilarious at first sight. But as I came across this question I found a comment by @BasileStarynkevitch, an experienced C and C++ user, in which he claimed that multidimensional arrays should not be used, neither in C nor in C++:
Don't use multi-dimensional arrays in C++ (or in C).
Why? Why shouldn't I use multi-dimensional arrays in C++ or in C?
What did he mean by this statement?
Thereafter, another user replied on this comment:
Basile is right. It's possible to declare a 3D array in C/C++, but it causes too many problems.
Which problems?
I use multi-dimensional arrays a lot and see no disadvantages in using them. On the contrary, I think they have only advantages.
Are there any issues with using multi-dimensional arrays that I do not know about?
Can anyone explain to me what they meant?
This is quite a broad (and interesting) performance-related topic. We could discuss cache misses, the cost of initialization of multi-dimensional arrays, vectorization, allocation of a multidimensional std::array on the stack, allocation of a multidimensional std::vector on the heap, access to the latter two, and so on.
That said, if your program works fine with your multidimensional arrays, leave it the way it is, especially if your multidimensional arrays allow for more readability.
A performance related example:
Consider a std::vector which holds many std::vector<double>:
std::vector<std::vector<double>> v;
We know that each std::vector object inside v is allocated contiguously. Also, all the elements in a std::vector<double> in v are allocated contiguously. However, not all the double's present in v are in contiguous memory. So, depending on how you access those elements (how many times, in what order, ...), a std::vector of std::vector's can be very slow compared to a single std::vector<double> containing all the double's in contiguous memory.
Matrix libraries will typically store a 5x5 matrix in a plain array of size 25.
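A minimal sketch of that flat layout (row-major indexing assumed):
// one contiguous block of 25 doubles; element (i, j) lives at index i * 5 + j
std::vector<double> m(5 * 5);
m[2 * 5 + 3] = 1.0;  // element (2, 3)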
You cannot answer this question for C and C++ at once, because there is a fundamental difference between these two languages and their handling of multidimensional arrays. So this answer contains two parts:
C++
Multidimensional arrays are pretty useless in C++ because you cannot allocate them with dynamic sizes. The sizes of all dimensions except the outermost one must be compile-time constants. In virtually all the use cases for multidimensional arrays I have encountered, the size parameters are simply not known at compile time, because they come from the dimensions of an image file, some simulation parameter, etc.
There might be some special cases where the dimensions are actually known at compile time, and in these cases, there is no issue with using multidimensional arrays in C++. In all the other cases, you'll need to either use pointer arrays (tedious to set up), nested std::vector<std::vector<std::vector<...>>>, or a 1D array with manual index computation (error prone).
C
C allows for true multidimensional arrays with dynamic sizes since C99. These are called VLAs (variable length arrays), and they allow you to create fully dynamically sized multidimensional arrays both on the stack and the heap.
However, there are two catches:
You can pass a multidimensional VLA to a function, but you can't return it. If you want to pass multidimensional data out of a function, you must pass it out through a reference parameter.
void foo(int width, int height, int (*data)[width]); //works
//int (*bar(int width, int height))[width]; //does not work
You can have pointers to multidimensional arrays in variables, and you can pass them to functions, but you cannot store them in structs.
struct foo {
int width, height;
//int (*data)[width]; //does not work
};
Both problems can be worked around (passing by reference to return a multidimensional array, and storing the pointer as a void* in the struct), but it's not trivial. And since it's not a heavily used feature, only very few people know how to do it right.
Compile time array sizes
Both C and C++ allow you to use multidimensional arrays with dimensions known at compile time. These do not have the drawbacks listed above.
But their usefulness is reduced greatly: There are just so many cases where you would want to use a multidimensional array, and where you do not have the ghost of a chance to know the involved sizes at compile time. An example is image processing: You don't know the dimensions of the image before you have opened the image file. Likewise with any physics simulation: You do not know how large your working domain is until your program has loaded its configuration files. Etc.
So, in order to be useful, multidimensional arrays must support dynamic sizes imho.
As with most data structures, there is a "right" time to use them, and a "wrong" time. This is largely subjective, but for the purposes of this question let's just assume you're using a 2D array in a place where it wouldn't make sense.
That said, I think there are two notable reasons to avoid using multidimensional arrays in C++, and they mainly arise based on the use cases of the array. Namely:
1. Slow(er) Memory Traversal
A 2-dimensional array accessed as arr[j][k] can be accessed contiguously, but the computer must spend extra time computing the address of each element, more than it would spend on a 1D array. More importantly, iterators lose their usability with multidimensional arrays, forcing you to use the [j][k] notation, which is slower. One main advantage of simple arrays is the ability to sequentially access all members. This is partially lost with a 2+D array.
2. Inflexible size
This is just an issue with arrays in general, but resizing a multidimensional array becomes much more complex with 2, 3, or more dimensions. If one dimension needs to change size, the entire structure has to be copied over. If your application needs the array to be resized, it's best to use some structure other than a multidimensional array.
Again these are use-case based, but those are both significant issues that could arise by using multidimensional arrays. In both cases above, there are other solutions available that would be better choices than a multi-dimensional array.
Well the "problems" referred to are not using the structure properly, walking off the end of one or another of the dimensions of the array. If you know what you are doing and code carefully it will work perfectly.
I have often used multidimensional arrays for complex matrix manipulations in C and C++. They come up very frequently in signal analysis and signal detection, as well as in high-performance libraries for analyzing geometries in simulations. I did not even consider dynamic array allocation as part of the question. Even then, typically sized arrays for certain bounded problems, combined with a reset function, could save memory and speed up complex analyses. One could use a cache for smaller matrix manipulations in a library and a more complex C++ OO treatment for larger dynamic allocations on a per-problem basis.
The statements are widely applicable, but not universal. If you have static bounds, it's fine.
In C++, if you want dynamic bounds, you can't have a single contiguous allocation, because the dimensions are part of the type. Even if you don't care for a contiguous allocation, you have to be extra careful, especially if you wish to resize a dimension.
Much simpler is to have a single dimension in some container that will manage the allocation, and a multidimensional view on top of it.
Given:
std::size_t N, M, L;
std::cin >> N >> M >> L;
Compare:
// nested allocation; std::generate_n / std::for_each_n need <algorithm>,
// and std::for_each_n is C++17
int *** arr = new int**[N];
std::generate_n(arr, N, [M, L]()
{
    int ** sub = new int*[M];
    std::generate_n(sub, M, [L]() { return new int[L]; });
    return sub;
});
// use arr
std::for_each_n(arr, N, [M](int** sub)
{
    std::for_each_n(sub, M, [](int* subsub) { delete[] subsub; });
    delete[] sub;
});
delete[] arr;
With:
std::vector<int> vec(N * M * L);
gsl::multi_span arr(vec.data(), gsl::strided_bounds<3>({ N, M, L }));
// use arr
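If gsl::multi_span is not available to you, the same "one allocation plus a multidimensional view" idea is also in the standard library since C++23 as std::mdspan (sketch, assuming a C++23 toolchain):
#include <mdspan>
#include <vector>

std::vector<int> vec(N * M * L);
std::mdspan arr(vec.data(), N, M, L);  // non-owning 3D view over the single allocation
// arr[i, j, k] accesses element (i, j, k)
// use arr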

Declaring 3D array structure in c++ using vector

Hi, I am a graduate student studying scientific computing using C++. Some of our research focuses on the speed of an algorithm, therefore it is important to construct an array structure that is fast enough.
I've seen two ways of constructing 3D arrays.
The first one is to use the vector library.
vector<vector<vector<double>>> a(isize, vector<vector<double>>(jsize, vector<double>(ksize, 0)));
This gives 3D array structure of size isize x jsize x ksize.
The other one is to construct a structure containing a 1D array of size isize * jsize * ksize using
new double[isize*jsize*ksize]. To access the specific location (i,j,k) easily, operator overloading is necessary (am I right?).
And from what I have experienced, the first one is much faster since it can access location (i,j,k) easily, while the latter one has to compute the location and return the value. But I have seen some people preferring the latter one over the first one. Why do they prefer the latter setting? And is there any disadvantage to using the first one?
Thanks in advance.
Main difference between those will be the layout:
vector<vector<vector<T>>>
This will get you a 1D array of vector<vector<T>>.
Each item will be a 1D array of vector<T>.
And each item of those 1D arrays will be a 1D array of T.
The point is, vector itself does not store its content. It manages a chunk of memory, and stores the content there. This has a number of bad consequences:
For a matrix of dimension X·Y·Z, you will end up allocating 1 + X + X·Y memory chunks. That's horribly slow, and will trash the heap. Imagine: a cubic matrix of size 20 would trigger 1 + 20 + 20·20 = 421 calls to new!
To access a cell, you have 3 levels of indirection:
You must access the vector<vector<vector<T>>> object to get the pointer to the top-level memory chunk.
You must then access the vector<vector<T>> object to get the pointer to the second-level memory chunk.
You must then access the vector<T> object to get the pointer to the leaf memory chunk.
Only then can you access the T data.
Those memory chunks will be spread around the heap, causing a lot of cache misses and slowing the overall computation.
Should you get it wrong at some point, it is possible to end up with some lines in your matrix having different lengths. After all, they're independent 1-d arrays.
Having a contiguous memory block (like new T[X * Y * Z]) on the other hand gives:
You allocate 1 memory chunk. No heap trashing, O(1).
You only need to access the pointer to the memory chunk, then you can go straight to the desired element.
The whole matrix is contiguous in memory, which is cache-friendly.
These days, a single cache miss means dozens or hundreds of lost computing cycles; do not underestimate the cache-friendliness aspect.
By the way, there is a probably better way you didn't mention: using one of the numerous matrix libraries that will handle this for you automatically and provide nice support tools (like SSE-accelerated matrix operations). One such library is Eigen, but there are plenty others.
→ You want to do scientific computing? Let a lib handle the boilerplate and the basics so you can focus on the scientific computing part.
From my point of view, std::vector has too many advantages over plain arrays to list them all.
In short, here are some:
It is much harder to create memory leaks with std::vector. This point alone is one of the biggest advantages. This has nothing to do with performance, but should be considered all the time.
std::vector is part of the STL. This part of C++ is one of the most heavily used. Thousands of people use the STL, so it gets "tested" every day. Over the last years it has been optimized so radically that it no longer lacks performance. (Please correct me if I'm seeing this wrong.)
Handling a std::vector is as easy as 1, 2, 3. No pointer handling, no nothing... Just access it via its methods or with the [] operator.
First of all, the idea that you access (i,j,k) in your vec^3 directly is somewhat flawed. What you have is a structure of pointers where you need to dereference three pointers along the way. Note that I have no idea whether that is faster or slower than computing the position within a one-dimensional array, though. You'd need to test that and it might depend on the size of your data (especially whether it fits in a chunk).
Second, the vector^3 requires pointers and vector sizes, which require more memory. In many cases, this will be irrelevant (as the image grows cubically but the memory difference only quadratically), but if your algorithm is really going to fill out any memory available, that can matter.
Third, the raw array stores everything in consecutive memory, which is good for streaming and can be good for certain algorithms because of quick cache accesses. For example when you add one 3D image to another.
Note that all of this is about hyper-optimization that you might not need. The advantages of vectors that skratchi.at pointed out in his answer are quite strong, and I add the advantage that vectors usually increase readability. If you do not have very good reasons not to use vectors, then use them.
If you should decide for the raw array, in any case, make sure that you wrap it well and keep the class small and simple, in order to counter problems regarding leaks and such.
Welcome to SO.
If those two alternatives are all you have, then the first one could be better.
Prefer using STL array or vector instead of a C array
You should avoid using plain C arrays, since you need to manage the memory yourself, allocating/deallocating with new/delete, along with other boilerplate code like keeping track of the size and checking bounds. In clear words: "C arrays are less safe, and have no advantages over array and vector."
However, there are some important drawbacks in the first alternative. Something I would like to highlight is that:
std::vector<std::vector<std::vector<T>>>
is not a 3-D matrix. In a matrix, all the rows must have the same size. On the other hand, in a "vector of vectors" there is no guarantee that all the nested vectors have the same length. The reason is that a vector is a linear 1-D structure, as pointed out in @spectras' answer. Hence, to avoid all sorts of bad or unexpected behaviour, you must include guards in your code to maintain the rectangular invariant.
Luckily, the first alternative is not the only one you may have in hands.
For example, you can replace the C-style array by a std::array (note that the size of a std::array must be a compile-time constant, so i_size, j_size and k_size would have to be constexpr here):
const int n = i_size * j_size * k_size;
std::array<int, n> myFlattenMatrix;
or use std::vector in case your matrix dimensions can change.
Accessing element by its 3 coordinates
Regarding your question
To access the specific location (i,j,k) easily, operator overloading is necessary (am I right?).
Not exactly. Since there isn't a 3-parameter subscript operator for either std::vector or std::array, you can't overload it that way. But you can create a template class or function to wrap it for you, as sketched below. In any case you will have to either dereference the 3 nested vectors or compute the flattened index of the element in the linear storage.
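A minimal sketch of such a helper (the function name and parameters are made up for illustration):
#include <cstddef>

// flat index of element (i, j, k) in a row-major i_size x j_size x k_size block
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k,
                              std::size_t j_size, std::size_t k_size) {
    return (i * j_size + j) * k_size + k;
}
// usage: storage[flat_index(i, j, k, j_size, k_size)]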
Consider not using a third-party matrix library like Eigen for your experiments
You aren't coding this for production but for research purposes. In particular, your research is exactly about the performance of algorithms. In that case, I would not recommend using a third-party library like Eigen. Of course it depends a lot on what kind of "speed of an algorithm" metrics you are willing to gather, but Eigen, for instance, will do a lot of things under the hood (like vectorization) which will have a tremendous influence on your experiments. Since it will be hard for you to control those unseen optimizations, the library's features may lead you to wrong conclusions about your algorithms.
Algorithm's performance and big-o notation
Usually, the performance of algorithms is analysed using the big-O approach, where factors like the actual time spent, hardware speed, or programming language traits aren't taken into account. The book "Data Structures and Algorithms in C++" by Adam Drozdek provides more details about it.

C++ Eigen Matrix Operations vs. Memory Allocation Performance

I have an algorithm that requires the construction of an NxN matrix inside a function that will return the product of this matrix with an Nx1 vector that's also built on the fly.
(N is usually 8 or 9, but must be generalized for values greater than that).
I'm using the Eigen library for performing algebraic operations that are even more complex (least squares and several other constrained problems), so switching it isn't an option.
I've benchmarked the functions, and there's a huge bottleneck due to the intensive memory allocations. I aim to build a thread-safe application, so, for some cases, I replaced these matrices and vectors with references to elements from a global vector that serves as a provider for objects that cannot be stored on the stack. This avoids calling the constructors/destructors of the Eigen matrices and vectors, but it's not an elegant solution and it can lead to huge problems if considerable care is not taken.
As such, does Eigen offer a workaround (I don't see the option of passing an allocator as a template argument for these objects), or is there a more obvious thing to do?
You can manage your own memory in a way that fits your needs and use Eigen::Map instead of Eigen::Matrix to perform calculations with it. Just make sure the data is aligned properly or notify Eigen if it isn't.
See the reference Eigen::Map for details.
Here is a short example:
#include <iostream>
#include <Eigen/Core>

int main() {
    int mydata[3 * 4]; // Manage your own memory as you see fit
    int* data_ptr = mydata;
    Eigen::Map<Eigen::MatrixXi, Eigen::Unaligned> mymatrix(data_ptr, 3, 4);
    // use mymatrix like you would any other matrix
    mymatrix = Eigen::MatrixXi::Zero(3, 4);
    std::cout << mymatrix << '\n';
    // This line will trigger a failed assertion in debug mode.
    // To change that behaviour, see
    // http://eigen.tuxfamily.org/dox-devel/TopicAssertions.html
    mymatrix = Eigen::MatrixXi::Ones(3, 6);
    std::cout << mymatrix << '\n';
}
To gather my comments into a full idea, here is how I would try to do it.
Because memory allocation in Eigen is pretty advanced stuff IMO and it does not expose many places to tap into it, the best bet is to wrap the Eigen objects themselves in some kind of resource manager, like the OP did.
I would make it a simple bin that holds Matrix<Scalar, Dynamic, Dynamic> objects. This way you template on the Scalar type and have one manager for matrices of any size.
Whenever you ask for an object, you check whether you have a free object of the desired size and return a reference to it. If not, you allocate a new one. Simple. When you want to release the object, you mark it free in the resource manager. I don't think anything more complicated is needed, but of course it's possible to implement more sophisticated logic.
To ensure thread safety, I would put a lock in the manager. Initialize it in the constructor if needed. Of course, locking on free and allocate would be needed.
However, it depends on the work schedule. If the threads work on their own arrays, I would consider making one resource manager instance for each thread, so they don't block each other. The thing is, a global lock or a global manager could become a bottleneck if you have, say, 12 cores working heavily on allocations/deallocations, effectively serializing your app through this one lock. A sketch of such a bin follows.
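A minimal sketch of such a bin (a hypothetical class, not part of Eigen; std::deque is used so that handed-out references stay valid when the bin grows):
#include <Eigen/Core>
#include <deque>
#include <mutex>

template <typename Scalar>
class MatrixBin {
public:
    using Mat = Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic>;

    // return a free matrix of the requested size, or allocate a new one
    Mat& acquire(Eigen::Index rows, Eigen::Index cols) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (Entry& e : entries_)
            if (e.free && e.mat.rows() == rows && e.mat.cols() == cols) {
                e.free = false;
                return e.mat;
            }
        entries_.push_back(Entry{Mat(rows, cols), false});
        return entries_.back().mat;
    }

    // mark a previously acquired matrix as reusable
    void release(Mat& m) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (Entry& e : entries_)
            if (&e.mat == &m) { e.free = true; return; }
    }

private:
    struct Entry { Mat mat; bool free; };
    std::deque<Entry> entries_;
    std::mutex mutex_;
};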
You can try replacing your default memory allocator with jemalloc or tcmalloc. It's pretty easy to try out thanks to the LD_PRELOAD mechanism.
https://github.com/jemalloc/jemalloc/wiki/Getting-Started
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
C++ memory allocation mechanism performance comparison (tcmalloc vs. jemalloc)
what're the differences between tcmalloc/jemalloc and memory pool
I think it works for most C++ projects as well.
You could allocate memory for some common matrix sizes before calling that function with operator new or operator new[], store the void* pointers somewhere, and let the function itself retrieve a memory block of the right size. After that, you can use placement new for matrix construction. Details are given in More Effective C++, item 8.

c++11 std::array vs static array vs std::vector

First question: is it a good idea to start using C++11 if I will be developing code for the following 3 years?
Then, if it is, what is the "best" way to implement a matrix if I want to use it with LAPACK? I mean, using std::vector<std::vector<Type>> Matrix is not easily compatible with LAPACK.
Up to now, I have stored my matrix with Type* Matrix(new Type[N]) (the pointer form with new and delete was important because the size of the array is not given as a number like 5, but as a variable).
But with C++11 it is possible to use std::array. According to this site, this container seems to be the best solution... What do you think?
First things first: if you are going to learn C++, learn C++11. The previous C++ standard was released in 2003, meaning it's already ten years old. That's a lot in the IT world. C++11 skills will also smoothly translate to the upcoming C++1y (most probably C++14) standard.
The main difference between std::vector and std::array is the dynamic (in size and allocation) and static storage. So if you want to have a matrix class that's always, say, 4x4, std::array<float, 4*4> will do just fine.
Both of these classes provide a .data() member, which should produce a compatible pointer. Note, however, that std::vector<std::vector<float>> will NOT occupy contiguous 16*sizeof(float) memory (so v[0].data() won't work). If you need a dynamically sized matrix, use a single vector and resize it to width*height.
Since access to the elements will be a bit harder (v[width * y + x] or v[height * x + y]), you might want to provide a wrapper class that allows you to access an arbitrary field by a row/column pair, as sketched below.
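A sketch of such a wrapper (a hypothetical class; note that LAPACK routines typically expect column-major storage, so swap the index computation if you pass data() to them):
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}
    // row-major element access
    double& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
    // contiguous block, e.g. for LAPACK-style APIs
    double* data() { return data_.data(); }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};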
Since you've also mentioned C-style arrays: std::array provides a nicer interface to deal with the same type of storage and should thus be preferred; there's nothing to gain with static arrays over std::array.
This is a very late reply to the question, but if someone reads this, I just want to point out that one should almost never implement a matrix as a ''vector of vectors''. The reason is that each row of the matrix gets stored in some random location on the heap. This means that matrix operations will do a lot of random memory accesses leading to cache misses, which slows down the implementation considerably.
In other words, if you care at all about performance, just allocate an array/std::array/std::vector of size rows * columns, then use wrapper functions that transform a pair of integers into the corresponding element of the array. Unless you need to support things like returning references to rows of the matrix, all of these options should work just fine.

Optimising C++ 2-D arrays

I need a way to represent a 2-D array (a dense matrix) of doubles in C++, with absolute minimum accessing overhead.
I've done some timing on various linux/unix machines and gcc versions. An STL vector of vectors, declared as:
vector<vector<double> > matrix(n,vector<double>(n));
and accessed through matrix[i][j] is between 5% and 100% slower to access than an array declared as:
double *matrix = new double[n*n];
accessed through an inlined index function matrix[index(i,j)], where index(i,j) evaluates to i+n*j. Other ways of arranging a 2-D array without STL - an array of n pointers to the start of each row, or defining the whole thing on the stack as a constant size matrix[n][n] - run at almost exactly the same speed as the index function method.
Recent GCC versions (> 4.0) seem to be able to compile the STL vector-of-vectors to nearly the same efficiency as the non-STL code when optimisations are turned on, but this is somewhat machine-dependent.
I'd like to use STL if possible, but will have to choose the fastest solution. Does anyone have any experience in optimising STL with GCC?
If you're using GCC the compiler can analyze your matrix accesses and change the order in memory in certain cases. The magic compiler flag is defined as:
-fipa-matrix-reorg
Perform matrix flattening and transposing. Matrix flattening tries to replace an m-dimensional matrix with its equivalent n-dimensional matrix, where n < m. This reduces the level of indirection needed for accessing the elements of the matrix. The second optimization is matrix transposing, which attempts to change the order of the matrix's dimensions in order to improve cache locality. Both optimizations need the -fwhole-program flag. Transposing is enabled only if profiling information is available.
Note that this option is not enabled by -O2 or -O3. You have to pass it yourself.
My guess would be that the fastest approach, for a matrix, is to use a 1D STL array and overload the () operator to use it as a 2D matrix.
However, the STL also defines a type specifically for non-resizeable numerical arrays: std::valarray. It also comes with various optimisations for in-place operations.
valarray accepts a numerical type as its template argument:
valarray<double> a;
Then you can use slices, indirect arrays, ... and of course you can inherit from valarray (or wrap it) and define your own operator()(int i, int j) for 2D arrays; a sketch follows.
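For illustration, a minimal sketch of such a 2D wrapper (hypothetical class name; it uses composition rather than inheritance):
#include <cstddef>
#include <valarray>

class ValMatrix {
public:
    ValMatrix(std::size_t rows, std::size_t cols)
        : data_(0.0, rows * cols), cols_(cols) {}
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
private:
    std::valarray<double> data_;
    std::size_t cols_;
};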
Very likely this is a locality-of-reference issue. vector uses new to allocate its internal array, so each row will be at least a little apart in memory due to each block's header; it could be a long distance apart if memory is already fragmented when you allocate them. Different rows of the array are likely to at least incur a cache-line fault and could incur a page fault; if you're really unlucky two adjacent rows could be on memory lines that share a TLB slot and accessing one will evict the other.
In contrast your other solutions guarantee that all the data is adjacent. It could help your performance if you align the structure so it crosses as few cache lines as possible.
vector is designed for resizable arrays. If you don't need to resize the arrays, use a regular C++ array. STL operations can generally operate on C++ arrays.
Do be sure to walk the array in the correct direction, i.e. across (consecutive memory addresses) rather than down. This will reduce cache faults.
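For example, with the indexing from the question (index(i,j) = i + n*j), make the inner loop run over i so that consecutive iterations touch consecutive memory addresses (sketch):
for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < n; ++i)
        matrix[i + n * j] *= 2.0;  // i varies fastest, so access is contiguous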
My recommendation would be to use Boost.UBLAS, which provides fast matrix/vector classes.
To be fair, it depends on the algorithms you are using on the matrix.
The double name[n*m] format is very fast when you are accessing data by rows, both because it has almost no overhead besides a multiplication and an addition, and because your rows are packed data that will be coherent in cache.
If your algorithms access column-ordered data, then other layouts might have much better cache coherence. If your algorithm accesses data in quadrants of the matrix, yet other layouts might be better.
Try to do some research directed at the type of usage and algorithms you are using. That is especially important if the matrices are very large, since cache misses may hurt your performance far more than needing 1 or 2 extra math operations to access each address.
You could just as easily do vector< double >( n*m );
You may want to look at the Eigen C++ template library at http://eigen.tuxfamily.org/ . It generates AltiVec or sse2 code to optimize the vector/matrix calculations.
There is the uBLAS implementation in Boost. It is worth a look.
http://www.boost.org/doc/libs/1_36_0/libs/numeric/ublas/doc/matrix.htm
Another related library is Blitz++: http://www.oonumerics.org/blitz/docs/blitz.html
Blitz++ is designed to optimize array manipulation.
I have done this some time back for raw images by declaring my own 2 dimensional array classes.
In a normal 2D array, you access the elements like array[2][3]. Now to get that effect, you'd have a class with an overloaded [] accessor. But this would essentially have to return another array, thereby giving you the second dimension.
The problem with this approach is that it has a double function call overhead.
The way I did it was to use the ()-style overload. So instead of array[2][3], I had it use the style array(2,3). That () function was very tiny and I made sure it was inlined.
See this link for the general concept of that:
http://www.learncpp.com/cpp-tutorial/99-overloading-the-parenthesis-operator/
You can template the type if you need to.
The difference I had was that my array was dynamic. I had a block of char memory I'd declare. And I employed a column cache, so I knew where in my sequence of bytes the next row began. Access was optimized for accessing neighbouring values, because I was using it for image processing.
It's hard to explain without the code but essentially the result was as fast as C, and much easier to understand and use.