Three-dimensional array as a vector of arrays - c++

I have a 3-dimensional double array with two of its dimensions known at compile time.
So to make it efficient, I'd like to write something like
std::vector<double[7][19]> v;
v.resize(3);
v[2][6][18] = 2 * 7*19 + 6 * 19 + 18;
It would be perfect, except it does not compile: "v.resize(3);" fails because built-in arrays are not copyable or assignable, so they cannot be used as vector elements here.
Please don't recommend me using nested vectors like
std::vector<std::vector<std::vector<double>>> v;
because I don't want to set sizes for known dimensions each time I extend v by its first dimension.
What is the cleanest solution here?

Why not a std::vector of std::array of std::array of double?
std::vector<std::array<std::array<double, 19>, 7>> v;
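With that element type, the snippet from the question works essentially unchanged (a quick sketch):

#include <array>
#include <vector>

int main() {
    std::vector<std::array<std::array<double, 19>, 7>> v;
    v.resize(3);                              // only the runtime dimension needs resizing
    v[2][6][18] = 2 * 7 * 19 + 6 * 19 + 18;   // same indexing syntax as the original attempt
}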

This is a good example of when it's both reasonable and useful to inherit from std::vector, provided that:
you can be sure they won't be deleted using a pointer to their vector base class (which would be Undefined Behaviour as the vector destructor's not virtual)
you're prepared to write a forwarding constructor or two if you want to use them
ideally, you're using this in a relatively small application rather than making it part of a library API with lots of distributed client users - the more hassle client impact could be, the more pedantic one should be about full encapsulation
#include <array>
#include <cstddef>
#include <vector>

template <typename T, size_t Y, size_t Z>
struct Vec_3D : std::vector<std::array<std::array<T, Y>, Z>>
{
    T& operator()(size_t x, size_t y, size_t z)
    {
        return (*this)[x][y][z];
    }
    const T& operator()(size_t x, size_t y, size_t z) const
    {
        return (*this)[x][y][z];
    }
};
So little effort, and you've got a nicer and less error-prone v(a, b, c) notation available.
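For instance, with the question's [7][19] inner dimensions (a usage sketch; Y = 19 is the innermost extent and Z = 7 the middle one):

Vec_3D<double, 19, 7> v;
v.resize(3);                               // grow only the runtime (first) dimension
v(2, 6, 18) = 2 * 7 * 19 + 6 * 19 + 18;    // bounds: x < v.size(), y < 7, z < 19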
Concerning the derivation, note that:
your derived type makes no attempt to enforce different invariants on the object
it doesn't add any data members or other bases
(The lack of data members / extra bases means even slicing and accidental deletion via a vector* are likely to work in practice; even though they should be avoided, it's kind of nice to know you're likely playing with smoke rather than fire.)
Yes I know Alexandrescu, Sutter, Meyers etc. recommend not to do this - I've read their reasons very carefully several times and if you want to champion them even for this usage, please bring relevant technical specifics to the table....

Related

data locality for implementing 2d array in c/c++

A long time ago, inspired by "Numerical Recipes in C", I started to use the following construct for storing matrices (2D arrays).
double **allocate_matrix(int NumRows, int NumCol)
{
    double **x;
    int i;
    x = (double **)malloc(NumRows * sizeof(double *));
    for (i = 0; i < NumRows; ++i) x[i] = (double *)calloc(NumCol, sizeof(double));
    return x;
}
double **x = allocate_matrix(1000,2000);
x[m][n] = ...;
But recently I noticed that many people implement matrices as follows:
double *x = (double *)malloc(NumRows * NumCol * sizeof(double));
x[NumCol * m + n] = ...;
From the locality point of view the second method seems perfect, but has awful readability... So I started to wonder: is my first method, which stores an auxiliary array of double* pointers, really that bad, or will the compiler eventually optimize it so that it is more or less equivalent in performance to the second method? I am suspicious because I think that in the first method two jumps are made when accessing a value, x[m] and then x[m][n], and there is a chance that each time the CPU will first load the x array and then the x[m] array.
P.S. Do not worry about the extra memory for storing the double* pointers; for large matrices it is just a small percentage.
P.P.S. Since many people did not understand my question very well, I will try to rephrase it: do I understand correctly that the first method is a kind of locality hell, where each time x[m][n] is accessed the x array is first loaded into the CPU cache and then the x[m] array is loaded, making each access run at the speed of talking to RAM? Or am I wrong, and the first method is also OK from a data-locality point of view?
For C-style allocations you can actually have the best of both worlds:
double **allocate_matrix(int NumRows, int NumCol)
{
    double **x;
    int i;
    x = (double **)malloc(NumRows * sizeof(double *));
    x[0] = (double *)calloc(NumRows * NumCol, sizeof(double)); // <<< single contiguous memory allocation for the entire array
    for (i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCol;
    return x;
}
This way you get data locality and its associated cache/memory-access benefits, and you can treat the array as a double ** or as a flattened 2D array (array[i * NumCol + j]) interchangeably. You also have fewer allocation and free calls (2 versus NumRows + 1).
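Freeing the matrix then also takes just two calls (a small sketch matching the allocation above):

void free_matrix(double **x)
{
    free(x[0]);  // releases the single contiguous block holding all the values
    free(x);     // releases the array of row pointers
}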
No need to guess whether the compiler will optimize the first method. Just use the second method which you know is fast, and use a wrapper class that implements for example these methods:
double& operator()(int x, int y);
double const& operator()(int x, int y) const;
... and access your objects like this:
arr(2, 3) = 5;
Alternatively, if you can bear a little more code complexity in the wrapper class(es), you can implement a class that can be accessed with the more traditional arr[2][3] = 5; syntax. This is implemented in a dimension-agnostic way in the Boost.MultiArray library, but you can do your own simple implementation too, using a proxy class.
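A bare-bones version of that proxy idea might look like this (all names here are illustrative, not taken from Boost.MultiArray):

#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols) : cols_(cols), data_(rows * cols) {}

    // Row proxy: Matrix::operator[] returns it, and its own operator[]
    // finishes the two-step lookup into the flat row-major buffer.
    class Row {
    public:
        explicit Row(double* row) : row_(row) {}
        double& operator[](std::size_t j) { return row_[j]; }
    private:
        double* row_;
    };

    Row operator[](std::size_t i) { return Row(&data_[i * cols_]); }

private:
    std::size_t cols_;
    std::vector<double> data_;
};

// Matrix arr(1000, 2000);
// arr[2][3] = 5.0;   // same syntax as the double** version, one contiguous block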
Note: Considering your usage of C style (a hardcoded non-generic "double" type, plain pointers, function-beginning variable declarations, and malloc), you will probably need to get more into C++ constructs before you can implement either of the options I mentioned.
The two methods are quite different.
While the first method allows for easier direct access to the values by adding another indirection (the double** array, hence you need 1+N mallocs), ...
the second method guarantees that ALL values are stored contiguously and only requires one malloc.
I would argue that the second method is always superior. Malloc is an expensive operation and contiguous memory is a huge plus, depending on the application.
In C++, you'd just implement it like this:
std::vector<double> matrix(NumRows * NumCols);
matrix[y * NumCols + x] = value; // Access
and if you're concerned about the inconvenience of having to compute the index yourself, add a wrapper that implements operator()(int x, int y) to it.
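A minimal sketch of such a wrapper (the names are made up for illustration):

#include <cstddef>
#include <vector>

struct Matrix {
    Matrix(std::size_t rows, std::size_t cols) : numCols(cols), data(rows * cols) {}

    double& operator()(std::size_t y, std::size_t x)             { return data[y * numCols + x]; }
    const double& operator()(std::size_t y, std::size_t x) const { return data[y * numCols + x]; }

    std::size_t numCols;
    std::vector<double> data;   // single contiguous buffer, row-major
};

// Matrix matrix(NumRows, NumCols);
// matrix(y, x) = value;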
You are also right that the first method is more expensive when accessing the values, because you need two memory lookups, as you described: x[m] and then x[m][n]. The compiler will not "optimize this away". The first array, depending on its size, may be cached, so the performance hit may not be that bad. In the second case, you need an extra multiplication for direct access.
In the first method you use, the double* entries in the master array point to logical rows (arrays of NumCol values each).
So, if you write something like below, you get the benefits of data locality in some sense (pseudocode):
foreach(row in rows):
foreach(elem in row):
//Do something
If you tried the same thing with the second method, and if element access was done the way you specified (i.e. x[NumCol*m + n]), you still get the same benefit. This is because you treat the array to be in row-major order. If you tried the same pseudocode while accessing the elements in column-major order, I assume you'd get cache misses given that the array size is large enough.
In addition to this, the second method has the additional desirable property of being a single contiguous block of memory which further improves the performance even when you loop through multiple rows (unlike the first method).
So, in conclusion, the second method should be much better in terms of performance.
If NumCol is a compile-time constant, or if you are using GCC with language extensions enabled, then you can do:
double (*x)[NumCol] = (double (*)[NumCol]) malloc(NumRows * sizeof (double[NumCol]));
and then use x as a 2D array and the compiler will do the indexing arithmetic for you. The caveat is that unless NumCol is a compile-time constant, ISO C++ won't let you do this, and if you use GCC language extensions you won't be able to port your code to another compiler.
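As a minimal sketch of how this looks in use (assuming NumCol really is a constant expression; the numbers are just placeholders):

#include <cstdlib>

int main() {
    constexpr int NumCol = 2000;   // must be a compile-time constant in ISO C++
    const int NumRows = 1000;

    // one contiguous block; the compiler does the m*NumCol + n arithmetic for you
    double (*x)[NumCol] =
        (double (*)[NumCol]) std::malloc(NumRows * sizeof(double[NumCol]));

    x[3][5] = 42.0;   // indexed like a genuine 2D array
    std::free(x);     // a single free releases the whole matrix
}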

POD real and complex vectors and arrays

[Christian Hacki and Barry point out that a question has been asked specifically about complex previously. My question is more general, as it applies to std::vector, std::array, and all the container classes that use allocators. Also, the answers on the other question are not adequate, IMO. Perhaps I could bump the other question somehow.]
I have a C++ application that uses lots of arrays and vectors of real values (doubles) and complex values. I do not want them initialized to zeros. Hey compiler and STL! - just allocate the dang memory and be done with it. It's on me to put the right values in there. Should I fail to do so, I want the program to crash during testing.
I managed to prevent std::vector from initializing with zeros by defining a custom allocator for use with PODs. (Is there a better way?)
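For reference, a minimal sketch of the kind of allocator I mean (not necessarily my exact code): its construct() does default-initialization instead of value-initialization, so PODs are left uninitialized.

#include <memory>
#include <utility>
#include <vector>

template <typename T, typename Base = std::allocator<T>>
struct default_init_allocator : Base {
    template <typename U>
    struct rebind {
        using other = default_init_allocator<
            U, typename std::allocator_traits<Base>::template rebind_alloc<U>>;
    };

    using Base::Base;

    template <typename U>
    void construct(U* p) {
        ::new (static_cast<void*>(p)) U;   // default-init: no zeroing for PODs
    }
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
    }
};

// std::vector<double, default_init_allocator<double>> v(1000000); // memory not zeroed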
What to do about std::complex? It is not defined as a POD. It has a default constructor that spews zeros. So if I write
std::complex<double> A[compile_time_const];
it spews. Ditto for
std::array <std::complex<double>, compile_time_constant>;
What's the best way to utilize the std::complex<> functionality without provoking swarms of zeros?
[Edit] Consider this actual example from a real-valued FFT routine.
{
    cvec Out(N);
    for (int k : range(0, N / 2)) {
        complex Evenk = Even[k];
        complex T = twiddle(k, N, sgn);
        complex Oddk = Odd[k] * T;
        Out[k] = Evenk + Oddk;
        Out[k + N / 2] = Evenk - Oddk; // Note: not in order
    }
    return Out;
}

Overloading operator[] to start at 1 and performance overhead

I am doing some C++ computational mechanics (don't worry, no physics knowledge required here) and there is something that really bothers me.
Suppose I want to represent a 3D math Vector (nothing to do with std::vector):
class Vector {
public:
    Vector(double x=0., double y=0., double z=0.) {
        coordinates[0] = x;
        coordinates[1] = y;
        coordinates[2] = z;
    }
private:
    double coordinates[3];
};
So far so good. Now I can overload operator[] to extract coordinates:
double& Vector::operator[](int i) {
    return coordinates[i];
}
So I can type:
Vector V;
… //complex computation with V
double x1 = V[0];
V[1] = coord2;
The problem is, indexing from 0 is NOT natural here. I mean, when sorting arrays, I don't mind, but the fact is that the conventional notation in every paper, book or whatever always subscripts coordinates beginning with 1.
It may seem a quibble, but the fact is that in formulas it always takes a double-take to understand what we are talking about. Of course, this is much worse with matrices.
One obvious solution is just a slightly different overloading:
double& Vector::operator[](int i) {
    return coordinates[i-1];
}
so I can type
double x1 = V[1];
V[2] = coord2;
It seems perfect except for one thing: this i-1 subtraction, which seems a good candidate for a small overhead. Very small, you would say, but I am doing computational mechanics, so this is typically something we can't afford.
So now (finally) my question: do you think a compiler can optimize this, or is there a way to make it optimize ? (templates, macro, pointer or reference kludge...)
Logically, in
double xi = V[i];
the integer between the brackets being a literal most of the time (except in 3-iteration for loops), inlining operator[] should make it possible, right?
(sorry for this looong question)
EDIT:
Thanks for all your comments and answers
I kind of disagree with people telling me that we are used to 0-indexed vectors.
From an object-oriented perspective, I see no reason for a math Vector to be 0-indexed just because it is implemented with a 0-indexed array. We're not supposed to care about the underlying implementation. Now, suppose I don't care about performance and use a map to implement the Vector class. Then I would find it natural to map '1' to the '1st' coordinate.
That said, I tried it out with 1-indexed vectors and matrices, and after writing some code I find it does not interact nicely whenever I use an array. I thought Vector and the standard containers (std::array, std::vector...) would not interact often (meaning, transferring data between one another), but it seems I was wrong.
Now I have a solution that I think is less controversial (please give me your opinion):
Every time I use a Vector in some physical context, I think of using an enum:
enum Coord {
    x = 0,
    y = 1,
    z = 2
};
Vector V;
V[x] = 1;
The only disadvantage I see is that these x, y and z can be redefined without even a warning...
This one should be measured or verified by looking at the disassembly, but my guess is: The getter function is tiny and its arguments are constant. There is a high chance the compiler will inline the function and constant-fold the subtraction. In that case the runtime cost would be zero.
Why not try this:
class Vector {
public:
    Vector(double x=0., double y=0., double z=0.) {
        coordinates[1] = x;
        coordinates[2] = y;
        coordinates[3] = z;
    }
private:
    double coordinates[4];
};
If you are not instantiating your objects in quantities of millions, then the memory waste might be affordable.
Have you actually profiled it or examined the generated code? That's how this question is answered.
If the operator[] implementation is visible then this is likely to be optimized to have zero overhead.
I recommend you define this in the header (.h) for your class. If you define it in the .cpp then the compiler can't optimize as much. Also, your index should not be an "int" which can have negative values... make it a size_t:
class Vector {
    // ...
public:
    double& operator[](const size_t i) {
        return coordinates[i-1];
    }
};
You cannot say anything objective about performance without benchmarking. On x86, this subtraction can be compiled using relative addressing, which is very cheap. If operator[] is inlined, then the overhead is zero—you can encourage this with inline or with compiler-specific instructions such as GCC’s __attribute__((always_inline)).
If you must guarantee it, and the offset is a compile-time constant, then using a template is the way to go:
template<size_t I>
double& Vector::get() {
    return coordinates[I - 1];
}
double x = v.get<1>();
For all practical purposes, this is guaranteed to have zero overhead thanks to constant-folding. You could also use named accessors:
double Vector::x() const { return coordinates[0]; }
double Vector::y() const { return coordinates[1]; }
double Vector::z() const { return coordinates[2]; }
double& Vector::x() { return coordinates[0]; }
double& Vector::y() { return coordinates[1]; }
double& Vector::z() { return coordinates[2]; }
And for loops, iterators:
const double* Vector::begin() const { return coordinates; }
const double* Vector::end() const { return coordinates + 3; }
double* Vector::begin() { return coordinates; }
double* Vector::end() { return coordinates + 3; }
// (x, y, z) -> (x + 1, y + 1, z + 1)
for (auto& i : v) ++i;
Like many of the others here, however, I disagree with the premise of your question. You really should simply use 0-based indexing, as it is more natural in the realm of C++. The language is already very complex, and you need not complicate things further for those who will maintain your code in the future.
Seriously, benchmark this all three ways (i.e., compare the subtraction and the double[4] methods to just using zero-based indices in the caller).
It's entirely possible you'll get a huge win from forcing 16-byte alignment on some cache architectures, and equally possible the subtraction is effectively free on some compiler/instruction set/code path combinations.
The only way to tell is to benchmark realistic code.
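If you want to experiment with the alignment angle mentioned above, a one-line sketch (using C++11 alignas; not something from the answers themselves) is:

struct Vector {
    alignas(16) double coordinates[4];   // padded to 4 doubles, forced to a 16-byte boundary
};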

Class design: arrays vs multiple variables

I have a bit of a theoretical question; however, it is a problem I sometimes face when designing classes, and I see it done differently when reading others' code. Which of the following would be better, and why?
example 1:
class Color
{
public:
    Color(float, float, float);
    ~Color();
    friend bool operator==(Color& lhs, Color& rhs);
    void multiply(Color);
    // ...
    float get_r();
    float get_g();
    float get_b();
private:
    float color_values[3];
};
example 2:
class Color
{
public:
    // as above
private:
    float r;
    float g;
    float b;
};
Is there a general rule one should follow in cases like this or is it just up to a programmer and what seems to make more sense?
Both!
Use this:
class Color {
    // ...
private:
    union {
        struct {
            float r, g, b;
        };
        float c[3];
    };
};
Then c[0] will be equivalent to r, et cetera.
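Inside the class's member functions the two views then alias the same storage, e.g. (a sketch with a hypothetical member function; note that anonymous struct members and reading through the other union member rely on widely supported compiler behaviour rather than strict ISO C++):

void Color::scale(float f) {
    for (int i = 0; i < 3; ++i)
        c[i] *= f;        // touches the same floats that r, g and b name
}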
It depends: do you intend to iterate over the whole array?
In that case, I think solution 1 is more appropriate.
It is very useful to have an array like that when you have functions that operate on the data in a loop, e.g.:
void BumpColors(float idx)
{
    for (int i = 0; i < 3; ++i)
        color_values[i] += idx;
}
vs
void BumpColors(float idx)
{
    color_values[0] += idx;
    color_values[1] += idx;
    color_values[2] += idx;
}
Of course this is trivial, and I think it really is a matter of preference. On some rare occasions you might have APIs that take a pointer to the data though, and while you can do
awesomeAPI((float*)&r);
I would much prefer doing
awesomeAPI((float*)&color_values[0]);
because the array guarantees contiguity, whereas you can break that contiguity by mistakenly adding another, unrelated member variable after float r.
Performance wise there would be no difference.
I'd say the second one is the best one.
First, the data your variables contain isn't (physically) supposed to be in an array. If you had, for example, a class with 3 students, not more, not less, you'd put them in an array, because they are an array of students; but here, it's just colors.
Second, someone who reads your code can also understand really fast in the second case what your variables contain (r is red, etc.). That isn't the case with an array.
Third, you'll have fewer bugs: you won't have to remember "oh, in my array, red is 0, g is 1, b is 2", and you won't mistakenly replace
return color_values[0]
by
return color_values[1]
in your code.
I think that you are right: "It is just up to a programmer and what seems to make more sense." If this were my program, I would choose one form or the other without worrying too much about it, then write some other parts of the program, then revisit the matter later.
One of the benefits of class-oriented design is that it makes internal implementation details of this kind private, which makes it convenient to alter them later.
I think that your question does matter, only I doubt that one can answer it well until one has written more code. In the abstract, there are only three elements, and the three have names -- red, green and blue -- so I think that you could go either way with this. If forced to choose, I choose example 2.
Is there a general rule one should follow in cases like this or is it just up to a programmer and what seems to make more sense?
It's definitely up to the programmer and whatever makes more sense.
In your case, the second option seems more appropriate. After all, logically thinking, your member isn't an array of values, but values for r, g and b.
Advantages of using an array:
Maintainability: You can loop over the values in the array.
Maintainability: When a value has to be added later (yellow, perhaps?), you don't have to change a lot of code.
Disadvantage:
Readability: The separate variables have clearer names (namely r, g and b in this case).
In your case the r, g, b variables are probably best, since it's unlikely a color will be added, and a loop over 3 elements probably matters less than readability.
Sometimes a programmer will use an array (or other contiguous data structure) in order to save the data to disk (or memory) faster, using one write operation.
This is especially useful if you are reading and writing a lot of data.
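For example, with the array layout the whole color can go out in a single call (a small sketch, independent of the class above):

#include <cstdio>

int main() {
    float color_values[3] = {0.1f, 0.2f, 0.3f};
    std::FILE* f = std::fopen("color.bin", "wb");
    if (f) {
        std::fwrite(color_values, sizeof(float), 3, f);  // all three components in one write
        std::fclose(f);
    }
}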

What's the most efficient way to convert between two implementations of the same types?

For example, I have a QMatrix4x4 and I have a Ogre::Matrix4.
Converting back and forth between QMatrix4x4 and Ogre::Matrix4 is a bit tedious, so I would like to know if there are any solid solutions.
Right now I'm simply copying each element over in a for loop, any suggestions?
In C++11, if both types are layout-compatible, you can simply reinterpret_cast between them. Example:
#include <cassert>

struct X {
    int a, b, c, d;
};

struct Y {
    int arr[4];
};

int main() {
    X x{0, 1, 2, 3};
    Y& y = reinterpret_cast<Y&>(x);
    assert(y.arr[2] == 2);
}
(I hope I got the example right; live on Ideone.)
The problem with this approach is that you'll need to dig into the internals of the implementations, which might not be stable over different releases even. As such, simply copying is likely the best (as in most portable / known / solid) approach.
One thing though: I personally don't know which interfaces the types in question offer; Ogre might also offer algorithms that don't operate directly on a Matrix4 but on a (2D) array as well. Check the API documentation.
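For the element-by-element copy, a hedged sketch of a small helper (the accessors used here are assumptions to verify against the Qt and Ogre documentation: QMatrix4x4's operator()(row, column) and Ogre::Matrix4's row-pointer operator[]):

// Assumed accessors; verify against the Qt and Ogre API docs.
Ogre::Matrix4 toOgre(const QMatrix4x4& q)
{
    Ogre::Matrix4 m;
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            m[row][col] = q(row, col);
    return m;
}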