Does 2d Array Work in glUniformMatrix4fv, C++?

I have a 4x4 matrix class that holds its values as a 2D float array, float mat[4][4], and I overloaded the [] operator:
inline float *Matrix4::operator[](int row)
{
    return mat[row];
}
I have a uniform mat4 array in my shader (uniform mat4 Bones[64];) which I will upload my data onto, and I hold the bones as a Matrix4 pointer (Matrix4 *finalBones), which is a matrix array containing numJoints elements. This is what I use to upload the data:
glUniformMatrix4fv(shader.getLocation("Bones"), numJoints, false, finalBones[0][0]);
I am totally unsure about what is going to happen, so I need to ask whether this will work, or whether I need to extract everything into 16*numJoints sized float arrays, which seems to be expensive for each tick.
Edit:
This is what I have done, which seems really expensive for each draw operation:
void MD5Loader::uploadToGPU()
{
    float* finalmat = new float[16 * numJoints];
    for (int i = 0; i < numJoints; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++)
                finalmat[16*i + j*4 + k] = finalBones[i][j][k];
    glUniformMatrix4fv(shader.getLocation("Bones"), numJoints, false, finalmat);
    delete[] finalmat;
}

It depends on two things. Assuming your compiler is C++03 (and you care about standards compliance), your Matrix4 class must be a POD type; in particular, it must have no constructors. C++11 relaxes these rules significantly.
The other thing is that your matrices appear to be row-major: your operator[] treats the first index as the row, so the storage order is row-major.
If you're going to give OpenGL row-major matrices, you need to tell it that they are row-major:
glUniformMatrix4fv(shader.getLocation("Bones"), numJoints, GL_TRUE, finalBones[0][0]);

With C++11, a pointer to a standard-layout class can be converted to a pointer to its first member and back again. Combined with a sizeof check, that gives you a guarantee of contiguous matrix data:
#include <type_traits>
static_assert((sizeof(Matrix4) == (sizeof(GLfloat) * 16)) &&
              (std::is_standard_layout<Matrix4>::value),
              "Matrix4 does not satisfy contiguous storage requirements");
Trivial (or POD) layouts are just too restrictive once you have non-trivial constructors, etc. Your post suggests you will need to set the transpose parameter to GL_TRUE.
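Given that check, a sketch of an upload without the per-frame copy (assuming Matrix4 stores its float mat[4][4] contiguously as above, and that finalBones is a plain array of numJoints such matrices):
void MD5Loader::uploadToGPU()
{
    // With standard layout and no padding, numJoints Matrix4 objects are
    // 16 * numJoints contiguous floats. GL_TRUE because the data is row-major.
    glUniformMatrix4fv(shader.getLocation("Bones"),
                       numJoints,
                       GL_TRUE,
                       reinterpret_cast<const GLfloat*>(finalBones));
}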

Related

Alignment of Eigen matrices in an `std::array`

I have an application where I need to pass N Eigen matrices to some functions. N is a compile-time constant, and these functions are called a significant number of times within tight loops. To avoid dynamic allocations at runtime, I thought it might be nice to store these matrices in an std::array, and then pass iterators to this array as function arguments. As a trivial example, consider:
const int N = 3;
const int SIZE = 125;
typedef std::array<Eigen::Matrix<double, SIZE, 1>, N> MatrixArray;
void computeMatrixProductArray(MatrixArray::const_iterator BeginIn,
                               MatrixArray::const_iterator BeginEnd,
                               MatrixArray::iterator BeginOut,
                               MatrixArray::iterator EndOut)
{
    Eigen::Matrix<double, SIZE, 1> Test;
    for (int J = 0; J < N; ++J) {
        *(BeginOut + J) = Test.array() * (*(BeginIn + J)).array();
    }
}
int main()
{
    MatrixArray ArrayIn, ArrayOut;
    computeMatrixProductArray(ArrayIn.cbegin(), ArrayIn.cend(),
                              ArrayOut.begin(), ArrayOut.end());
}
My question has to do with the alignment of the matrices stored in MatrixArray, and how Eigen3.3 treats unaligned memory. The size of the matrices, and the properties of std::array, ensure that an individual matrix in a MatrixArray will certainly not be aligned on any nice boundary. However, to my understanding, Eigen3.3 can still vectorize this case with unaligned operations.
Can anyone provide insight on what happens in the above example when J == 1, and the first entry of this matrix is not aligned? Does Eigen3.3 now treat this case similarly to what happens with equivalent dynamic matrices? My understanding here is that scalar, or unaligned, operations are used until an appropriate alignment boundary is reached, at which time fully aligned operations will be used. Or, since the matrix is not aligned to begin with, there is no chance of aligned vectorization? Or is something else happening entirely?
Thanks for any insight, and thanks to the developers for maintaining such a powerful library.
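For reference, a small sketch (not part of the original question) that prints how each element of such an array happens to be aligned, which makes the situation described above concrete:
#include <array>
#include <cstdint>
#include <iostream>
#include <Eigen/Dense>

int main()
{
    const int N = 3;
    const int SIZE = 125;
    typedef std::array<Eigen::Matrix<double, SIZE, 1>, N> MatrixArray;

    MatrixArray A;
    for (int J = 0; J < N; ++J) {
        // Each element occupies SIZE * sizeof(double) = 1000 bytes, so elements
        // after the first generally do not start on a 16- or 32-byte boundary.
        std::cout << "matrix " << J << ": address % 32 = "
                  << (reinterpret_cast<std::uintptr_t>(A[J].data()) % 32) << '\n';
    }
}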

How to copy-by-value (not by reference) a CPLEX IloArray, the easy way

I work with CPLEX and C++ via Concert Technology, and a recurrent issue I keep encountering is that internally the IloArray structures seem to be overloaded vector structures with copy-by-reference operators, which I must acknowledge is a quite neat and memory-efficient way of handling the array structures.
Yet this implies that making IloIntArray Array_copy = Array, for a previously declared and initialized IloIntArray Array, will make a reference copy of Array into Array_copy. Hence, any change to Array is automatically transferred to Array_copy. The same applies to multi-dimensional IloArray structures via the add() routine.
Let us say, for instance, that I need to populate a 2D IloArray<IloIntArray> Array2D(env), inside a for-loop indexed in i = 1 to iSize, storing in each position of Array2D[i], from i = 1 to iSize, the values of Array which will be different at each iteration of the loop. Making either:
Array2D[i].add(Array) or,
Array2D[i] = Array, assuming Array2D i-dimension was initially set to be of size iSize.
Fails to make the intended copy-by-value, since each time, a copy-by-reference is made to the elements of the i-dimension and you end up with all identical elements, equal to the last value of Array.
Now, besides making my own copy-by-value constructor (Option I) or a copy routine (Option II) that receives both the origin and destination arrays, as well as the position in the destination (e.g. multi-dimensional) array where the origin array is to be copied:
Is there another way to make the copy-by-value? In any case, can you help me decide which one of these options is neater and/or more memory efficient? Intuitively I deem Option I to be the more efficient, but I don't know how to implement it...
Thanks in advance for your help
Y
So far, I am solving my own issue by implementing a copy() function.
I have typedefed my multi-dimensional arrays as follows:
typedef IloArray<IloIntArray> Ilo2IntArray; and typedef IloArray<IloNumArray> Ilo2NumArray and so on for three or four dimensional integer or numeric arrays.
An example of my Ilo3IntArray copy(Ilo3IntArray origin) overload of the copy function I am using as a patch to make copy-by-value copies, is as follows:
Ilo3IntArray copy(Ilo3IntArray origin) {
    IloInt iSize = origin.getSize();
    Ilo3IntArray destination(env, iSize);
    IloInt jSize = origin[0].getSize();
    IloInt zSize = origin[0][0].getSize();
    for (IloInt i = 0; i < iSize; i++) {
        destination[i] = Ilo2IntArray(env, jSize);
        for (IloInt j = 0; j < jSize; j++) {
            destination[i][j] = IloIntArray(env, zSize);
            for (IloInt z = 0; z < zSize; z++) {
                destination[i][j][z] = origin[i][j][z];
            }
        }
    }
    return destination;
    // Note: no destination.end() here; anything after the return is unreachable,
    // and ending the array would invalidate what the caller receives.
}
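For completeness, a 2D variant along the same lines (a sketch, assuming an IloEnv named env is visible, as in the 3D version above):
Ilo2IntArray copy(Ilo2IntArray origin) {
    IloInt iSize = origin.getSize();
    IloInt jSize = origin[0].getSize();
    Ilo2IntArray destination(env, iSize);
    for (IloInt i = 0; i < iSize; i++) {
        destination[i] = IloIntArray(env, jSize);
        for (IloInt j = 0; j < jSize; j++)
            destination[i][j] = origin[i][j];  // element-wise copy-by-value
    }
    return destination;
}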
Your comments and/or better answers are welcome!!!

What data does GLSL take?

What data does GLSL take? For example, I have a 4x4 matrix of the following kind:
float **matrix = new float*[4];
for(int i = 0; i < 4; i++)
    matrix[i] = new float[4];
Can GLSL take this as a mat4x4?
Or is it better to use the following:
float *matrix = new float[16];
I haven't found this information in the specification of GLSL 1.30 (the version I am using in particular).
First of all, you should use GLfloat instead of float, since it exists exactly to represent the GL data exchanged between your program and the GPU.
Regarding your specific question, glUniform comes in many flavours for sending what you need. Among others you have:
void glUniformMatrix2fv(GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
void glUniformMatrix4fv(GLint location, GLsizei count, GLboolean transpose, const GLfloat *value);
You can use them easily by passing the pointer to your data:
GLint location = glGetUniformLocation(program, "variable_name");
glUniformMatrix4fv(location, 1, GL_FALSE, matrix);
You can't use your first variant (the float**), because the 16 floats of the matrix data will be allocated in different memory areas: you store an array of 4 pointers, each pointing to a separate array of 4 floats. OpenGL expects the data to be laid out sequentially, so either use the second variant or a struct/static 2D array such as:
GLfloat matrix[4][4];
It may be convenient to use an existing library for that, for example glMatrix.
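For example, a static 2D array is contiguous, so it can be handed to glUniformMatrix4fv directly (a sketch, reusing the location obtained above):
GLfloat matrix[4][4] = {
    {1.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 1.0f, 0.0f, 0.0f},
    {0.0f, 0.0f, 1.0f, 0.0f},
    {0.0f, 0.0f, 0.0f, 1.0f}
};
// &matrix[0][0] points to 16 contiguous GLfloats, exactly what the call expects.
glUniformMatrix4fv(location, 1, GL_FALSE, &matrix[0][0]);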
What data does GLSL take? For example, I have a 4x4 matrix of the following kind:
float **matrix = new float*[4];
for(int i = 0; i < 4; i++)
    matrix[i] = new float[4];
That's not a 4×4 matrix. That's an array of pointers to arrays of 4 float elements.
Can GLSL take this as a mat4x4?
No, because it's not a matrix.
A 4×4 matrix would be a region of contiguous memory that contains 4·4 = 16 values (floats if you will) of which you denote, that each n-tuple (n=4) of values forms a vector and the m-tuple (m=4) of vectors forms a matrix.
Or is it better to use the following:
float *matrix = new float[16];
That would be a contiguous region of memory holding 16 = 4·4 float values, which by that convention can be interpreted as a 4×4 matrix of floats. So yes, this can be interpreted by OpenGL / GLSL as a 4×4 matrix of floats, i.e. a mat4.
However, if you make this part of some class, don't use dynamic memory:
class foo {
    foo() { matrix = new float[16]; }
    float *matrix;
};
is bad, because it creates unnecessary overhead: if the class is dynamically allocated (with new), that triggers another dynamic memory allocation on top of the first; if the instances are in automatic memory (on the stack, i.e. no new), the extra allocation is still pure overhead.
class foo {
    foo() { … }
    float matrix[16];
};
is much better, because if the class instances are created with new, that single allocation also covers the memory for the matrix, and if they live in automatic memory it completely avoids the overhead of dynamic allocation.
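A sketch of how such an embedded array can then be handed to OpenGL (the data() accessor is added here purely for illustration):
class foo {
public:
    foo() { /* fill matrix ... */ }
    const float *data() const { return matrix; }  // 16 contiguous floats
private:
    float matrix[16];
};

// later, with a valid uniform location and an instance f:
// glUniformMatrix4fv(location, 1, GL_FALSE, f.data());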

Speed difference of dynamic and classical multi-dimensional arrays

Is there a difference between the usage (not creation) speed of dynamic and classical multi-dimensional arrays?
I mean, for example, when I try to access all values in a three-dimensional array with the help of loops, is there any speed difference between arrays created with the dynamic and the classical methods?
When I say "dynamic three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is created like this:
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
When I say "classical three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is simply created like this.
float matris_cos[kuanta][d][angle_scale];
But please attention, I don't ask the creation speed of these arrays. I want to access the values of these arrays via some loops. Is there any speed difference when I try to access the values.
An array of pointers (to arrays of pointers) will require extra levels of indirection to access a random element, while a multi-dimensional array will require basic arithmetic (multiplication and pointer addition). On most modern platforms, indirection is likely to be slower unless you use cache-friendly access patterns. Also, all the elements of the multi-dimensional array will be contiguous, which could help caching if you iterate over the whole array.
Whether this difference is measurable or not is something you can only tell by measuring it.
If the extra indirection does prove to be a bottleneck, you could replace the array-of-pointers with a class to represent the multi-dimensional array with a flat array:
#include <cstddef>
#include <vector>

class array_3d {
    size_t d1, d2, d3;
    std::vector<float> flat;
public:
    array_3d(size_t d1, size_t d2, size_t d3) :
        d1(d1), d2(d2), d3(d3), flat(d1*d2*d3)
    {}
    float & operator()(size_t x, size_t y, size_t z) {
        return flat[x*d2*d3 + y*d3 + z];
    }
    // and a similar const overload
};
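For example, the class above could replace the pointer-based array like this (the sizes come from the question; the stored values are just illustrative):
array_3d matris_cos(kuanta, d, angle_scale);   // one contiguous allocation
matris_cos(1, 2, 3) = 0.5f;                    // the element matris_cos[1][2][3] would address
float v = matris_cos(1, 2, 3);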
There is a proposal to add dynamically sized arrays (arrays of runtime bound) to a future C++ standard; if that is adopted, you would be able to use the multi-dimensional form in all cases.
You won't be able to spot any difference between them in a typical application unless your arrays are pretty huge and you spend a lot of time reading/writing to them, but nonetheless, there is a difference.
float matris_cos[kuanta][d][angle_scale];
1) The memory for this multidimensional array will be contiguous. There will be less cache misses as a result.
2) The array will require space only for the floats themselves.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
1) The memory for this multidimensional array is allocated in blocks and is thus much less likely to be contiguous. This may result in cache misses.
2) This method requires space for the pointers as well as the floats themselves.
Since there's indirection in the second case, you can expect a tiny speed difference when attempting to access or change values.
To recap:
Second case uses more memory
Second case involves indirection
Second case does not have guaranteed cache locality.

What's the proper way to declare and initialize a (large) two dimensional object array in c++?

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j].setOn(true);
        ...
    }
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i][j] = new Pixel();
        pixelArray[i][j]->setOn(true);
        ...
    }
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.
Your first approach allocates everything on the stack, which is otherwise fine, but leads to a stack overflow when you try to allocate too much there. The limit is usually around 8 megabytes on modern OSes, so allocating arrays of 1300 * 1372 elements on the stack is not an option.
Your second approach allocates 1300 * 1372 elements on the heap, which is a tremendous load for the allocator, which holds multiple linked lists to chunks of allocated and free memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
    for(int j=0; j<1372; j++) {
        pixelArray[i * 1372 + j].setOn(true);
        ...
    }
}
This way you allocate one large chunk of memory on the heap. The stack is happy, and so is the heap allocator.
If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information. You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: The double subscript like in pixelArray[x][y] works, but the creation of such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector must not necessarily have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
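A minimal sketch of what passing a flat std::vector<Pixel> with a documented row-major convention could look like (the function and parameter names here are only illustrative):
void doWork(std::vector<Pixel>& pixels, std::size_t width, std::size_t height)
{
    // row-major convention: element (row, col) lives at index row * width + col
    for (std::size_t row = 0; row < height; ++row)
        for (std::size_t col = 0; col < width; ++col)
            pixels[row * width + col].setOn(true);
}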
I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value
Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300, 1372);
gray8_view_t pixelView = view(pixelArray);  // the image's pixels are traversed through its view
for(gray8_view_t::iterator pIt = pixelView.begin(); pIt != pixelView.end(); ++pIt) {
    *pIt = 1;
}
My personal preference would be to use std::vector:
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
// 1300 rows (size 1), each a PixelRow of 1372 elements (size 2),
// every element initialized to the default value Pixel(true).
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
    Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
        for (size_t r = 0; r < rsz; r++)
            for (size_t c = 0; c < csz; c++)
                at(r, c).setOn(true);
    }
    Pixel &at(size_t row, size_t col) { return data.at(row*csz + col); }
    std::vector<Pixel> data;
    size_t rsz;
    size_t csz;
};
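For example (assuming Pixel has the setOn member used in the question):
Pixel2D image(1300, 1372);      // one contiguous block of 1300 * 1372 Pixels
image.at(10, 20).setOn(false);  // element at row 10, column 20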