HLSL mul() variables clarification - c++

The parameters for HLSL's mul(x, y), as documented here, say that:
if x is a vector, it is treated as a row vector.
if y is a vector, it is treated as a column vector.
Does this then follow through meaning that:
a.
if x is a vector, y is treated as a row-major matrix
if y is a vector, x is treated as a column-major matrix
b.
since ID3DXBaseEffect::SetMatrix() passes in a row-major matrix, I'd use the matrix passed into the shader in the following order:
ex. Output.mPosition = mul( Input.mPosition, SetMatrix() value ); ?
I'm just starting out with shaders and currently relearning my matrix math. It would be nice if someone could clarify this.

No. The terms "row-major" and "column-major" refer purely to the order of storage of the matrix components in memory. They have nothing to do with the order of multiplication of matrices and vectors. In fact, the D3D9 HLSL mul call interprets matrix arguments as column-major in all cases. The ID3DXBaseEffect::SetMatrix() call interprets its matrix argument as row-major, and transposes behind the scenes to mul's expected column-major order.
If you have a matrix that abstractly looks like this:
[ a b c d ]
[ e f g h ]
[ i j k l ]
[ m n o p ]
then when stored in row-major order, its memory looks like this:
a b c d e f g h i j k l m n o p
i.e. the elements of a row are all contiguous in memory. If stored in column-major order, its memory would look like this:
a e i m b f j n c g k o d h l p
with the elements of a column all contiguous. However, this has precisely zero effect on which element is which. Element b is still in the first row and second column, either way. The labeling of the elements has not changed, only the way they're mapped to memory.
If you declare an array like float matrix[rows][cols] in C, then you are using row-major storage. However, some other languages, like FORTRAN, use column-major storage for their multidimensional arrays by default; and OpenGL also uses column-major storage.
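To make the storage distinction concrete, here is a minimal C++ sketch (illustrative only) that stores the same abstract matrix in both orders and reads back the same element 'b' from each layout:

#include <cstdio>

int main() {
    // The same abstract 4x4 matrix [a..p] in the two storage orders.
    // Row-major: element (row, col) lives at index row * 4 + col.
    // Column-major: element (row, col) lives at index col * 4 + row.
    const char rowMajor[16] = { 'a','b','c','d','e','f','g','h',
                                'i','j','k','l','m','n','o','p' };
    const char colMajor[16] = { 'a','e','i','m','b','f','j','n',
                                'c','g','k','o','d','h','l','p' };
    int row = 0, col = 1;
    // Both layouts agree that 'b' sits in the first row, second column;
    // only the index formula differs. Prints "b b".
    std::printf("%c %c\n", rowMajor[row * 4 + col], colMajor[col * 4 + row]);
    return 0;
}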
Now, entirely separately, there is another choice of convention, which is whether to use row-vector or column-vector math. This has nothing at all to do with the memory layout of matrices, but it affects how you build your matrices, and the order of multiplication. If you use row vectors, you'll do vector-matrix multiplication:
            [ a b c d ]
[x y z w] * [ e f g h ] = [ x*a + y*e + z*i + w*m, ... ]
            [ i j k l ]
            [ m n o p ]
and if you use column vectors, then you'll do matrix-vector multiplication:
[ a b c d ]   [ x ]
[ e f g h ] * [ y ] = [ x*a + y*b + z*c + w*d, ... ]
[ i j k l ]   [ z ]
[ m n o p ]   [ w ]
This is because in row-vector math, a vector is really a 1×n matrix (a single row), and in column-vector math it's an n×1 matrix (a single column), and the rule about what sizes of matrices are allowed to be multiplied together determines the order. (You can't multiply a 4×4 matrix by a 1×4 matrix, but you can multiply a 4×4 matrix with a 4×1 one.)
Note that the matrix didn't change between the two equations above; only the interpretation of the vector changed.
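If it helps to see the two conventions side by side in code, here is a minimal C++ sketch (plain arrays, no graphics API involved). The matrix parameter M is identical in both functions; only the summation pattern changes with the vector convention:

// Row-vector convention: r = v * M, so r[j] = sum over i of v[i] * M[i][j].
void mulRowVec(const float v[4], const float M[4][4], float r[4]) {
    for (int j = 0; j < 4; ++j) {
        r[j] = 0.0f;
        for (int i = 0; i < 4; ++i)
            r[j] += v[i] * M[i][j];
    }
}

// Column-vector convention: r = M * v, so r[i] = sum over j of M[i][j] * v[j].
void mulColVec(const float M[4][4], const float v[4], float r[4]) {
    for (int i = 0; i < 4; ++i) {
        r[i] = 0.0f;
        for (int j = 0; j < 4; ++j)
            r[i] += M[i][j] * v[j];
    }
}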
So, to get back to your original question:
When you pass a vector to HLSL's mul, it automatically interprets it "correctly" according to which argument it is. If the vector is on the left, it's a row vector, and if it's on the right, it's a column vector.
However, the matrix gets interpreted the same way always. A matrix is a matrix, regardless of whether it's being multiplied with a row vector on the left or a column vector on the right. You can freely decide whether to use row-vector or column-vector math in your code, as long as you're consistent about it. HLSL is agnostic on this point, although the D3DX math library uses row vectors.
And it turns out that for some reason, in D3D9 HLSL, mul always expects matrices to be stored in column-major order. However, the D3DX math library stores matrices in row-major order, and as the documentation says, ID3DXBaseEffect::SetMatrix() expects its input in row-major order. It does a transpose behind the scenes to prepare the matrix for use with mul.
BTW, D3D11 HLSL defaults to column-major order, but allows you to use a compiler directive to tell it to use row-major order instead. It is still agnostic as to row-vector versus column-vector math. And OpenGL GLSL also uses column-major order, but does not (as far as I know) provide a way to change it.
Further reading on these issues:
A word on Matrices by Catalin Zima
Row major vs. column major, row vectors vs. column vectors by Fabian Giesen

Yes: if x is a vector, then x is treated as a row vector and y is treated as a row-major matrix; vice versa for column-major. So, for a row-major matrix system:
float4 transformed = mul(position, world);
and for column-major:
float4 transformed = mul(world, position);
Because of the way matrix multiplication works, if the matrix is column-major then you must post-multiply it by a column vector to get the correct result. If the matrix is row-major, you must pre-multiply it by a row vector.
So really, HLSL doesn't care whether your matrix is row- or column-major; it is up to you to apply the vector multiplication in the correct order to get the correct result.

Related

Does OpenVX warpAffine accept a transposed matrix, and how is it defined as row-major?

I am new to OpenVX, learning from the documentation that OpenVX uses row-major storage, and the matrix access example there illustrates it, just like the ordinary row-major access pattern we use in plain C code.
Then I went to the vx_matrix and vxCreateMatrix documentation pages. The former has these statements:
VX_MATRIX_ROWS - The M dimension of the matrix [REQ-1131]. Read-only [REQ-1132]. Use a vx_size parameter.
VX_MATRIX_COLUMNS - The N dimension of the matrix [REQ-1133]. Read-only [REQ-1134]. Use a vx_size parameter.
While the latter said:
vx_matrix vxCreateMatrix(
    vx_context c,
    vx_enum    data_type,
    vx_size    columns,
    vx_size    rows);
So according to my understanding, in the OpenVX world, when I say an MxN matrix, M refers to the row count and N refers to the column count. And the vxCreateMatrix declaration just follows what the row-major storage section said: the columns parameter first and then rows.
However, it really confused me when I reached the Warp Affine page, which says:
This kernel performs an affine transform with a 2x3 Matrix M with this method of pixel coordinate translation [REQ-0498]:
And the C declaration:
// x0 = a x + b y + c;
// y0 = d x + e y + f;
vx_float32 mat[3][2] = {
    {a, d}, // 'x' coefficients
    {b, e}, // 'y' coefficients
    {c, f}, // 'offsets'
};
vx_matrix matrix = vxCreateMatrix(context, VX_TYPE_FLOAT32, 2, 3);
vxCopyMatrix(matrix, mat, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST);
If M is a 2x3 matrix then, according to the previous section, it should have 2 rows and 3 columns. Why, then, is it declared as mat[3][2], and why does vxCreateMatrix accept columns=2 and rows=3 as arguments? Is my understanding totally wrong?
This would be a good start and a help for your implementation:
https://software.intel.com/content/www/us/en/develop/documentation/sample-color-copy/top/color-copy-pipeline/color-copy-pipeline-the-scan-pre-process-openvx-graph.html

Storing a matrix where every second element is 0

I have to store matrices where the non-zero elements are arranged in a checkerboard-like pattern: (1,1), (1,3), (2,2), (2,4), etc. I can't store the zero elements, but I also need to implement addition and multiplication.
The elements are stored in this vector:
std::vector<std::vector<int>> _v;
And I have
size_t n, m
for their size, so printing and addition are fairly straightforward.
Where I run into problems is multiplication.
As an example, multiplying [ [1,0,3],[0,2,0] ] and [ [2,0],[0,1], [1,0] ] would result in [ [5,0],[0,2] ]. The way I'm storing the first matrix is [ [1],[2],[3] ] and the second [ [2,1], [1] ]. Is storing these matrices this way fundamentally wrong? If so, what would be a proper way to store these so I can multiply them?
I would strongly recommend:
1. Storing the data in a simple one-dimensional valarray or vector of integers. The size of the vector would be width * height / 2, since you won't store every second element.
2. Implementing your own get/set methods for accessing items. In these methods you perform the transformation from (x, y) to a vector index, and you get or set the value only if x + y is an even number, because you know that every second item is zero. Example:
class CheckeredMatrix
{
    // ...
public:
    void set(int x, int y, int value)
    {
        // Only "checkerboard" cells (x + y even) hold data; the rest are zero.
        if ((x + y) % 2 == 0)
            _data[(y * _width + x) / 2] = value;
    }
    int get(int x, int y) const
    {
        // Skipped cells are implicitly zero.
        return ((x + y) % 2 == 0) ? _data[(y * _width + x) / 2] : 0;
    }
};
3. Implementing the operations add(), sub(), mul() using only these get/set methods (see the sketch just after this list). Then you don't need to modify the algorithms in any way. Since you know that every second item is zero, you can optimize the algorithms by skipping every second step, but they will work without that.
4. Wrapping everything in a class.
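As an illustration of point 3, here is a minimal sketch of mul() written purely against get()/set(); the rows()/cols() accessors and the two-argument constructor are hypothetical additions to the class above:

// Textbook triple loop; get() returns 0 for the skipped cells, so no
// special casing is needed. The product of two checkerboard matrices is
// itself checkerboard, so set() never discards a nonzero result.
CheckeredMatrix mul(const CheckeredMatrix& a, const CheckeredMatrix& b)
{
    CheckeredMatrix c(a.rows(), b.cols());  // assumed helpers, not shown above
    for (int y = 0; y < a.rows(); ++y)
        for (int x = 0; x < b.cols(); ++x) {
            int sum = 0;
            for (int k = 0; k < a.cols(); ++k)
                sum += a.get(k, y) * b.get(x, k);
            c.set(x, y, sum);
        }
    return c;
}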
I suggest representing your matrix as 2 matrices: one holding the elements at (2*i+1, 2*j+1) coordinates, the other those at (2*i, 2*j).
So
a 0 b 0 c
0 d 0 e 0
f 0 g 0 h
->
a b c
f g h
and
d e
Those 2 matrices allow some regular matrix operations (sum, multiplication, transposition).
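A minimal C++ sketch of that split; for illustration it assumes the input is given densely (zeros included) rather than in the compressed form from the question:

#include <vector>

// Split a checkerboard matrix m into two dense matrices:
// 'even' holds the elements at (2*i, 2*j), 'odd' those at (2*i+1, 2*j+1).
void split(const std::vector<std::vector<int>>& m,
           std::vector<std::vector<int>>& even,
           std::vector<std::vector<int>>& odd) {
    std::size_t rows = m.size(), cols = rows ? m[0].size() : 0;
    even.assign((rows + 1) / 2, std::vector<int>((cols + 1) / 2, 0));
    odd.assign(rows / 2, std::vector<int>(cols / 2, 0));
    for (std::size_t i = 0; 2 * i < rows; ++i)
        for (std::size_t j = 0; 2 * j < cols; ++j)
            even[i][j] = m[2 * i][2 * j];
    for (std::size_t i = 0; 2 * i + 1 < rows; ++i)
        for (std::size_t j = 0; 2 * j + 1 < cols; ++j)
            odd[i][j] = m[2 * i + 1][2 * j + 1];
}

Running this on the 3x5 example above yields exactly the two matrices shown: [[a,b,c],[f,g,h]] and [[d,e]].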

Understanding the Translation Matrix in OpenGL

Assume we want to translate a point p(1, 2, 3, w=1) with a vector v(a, b, c, w=0) to a new point p'
Note: w=0 represents a vector and w=1 represents a point in OpenGL; please correct me if I'm wrong.
In Affine transformation definition, we have:
p + v = p'
=> p(1, 2, 3, 1) + v(a, b, c, 0) = p'(1 + a, 2 + b, 3 + c, 1)
=> point + vector = point (everything works as expected)
In OpenGL, the translation matrix is as following:
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 1
I assume (a, b, c, 1) is the vector from the affine transformation definition.
Why do we have w=1, but not w=0, such as:
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 0
Note: w=0 represents a vector and w=1 represents a point in OpenGL; please correct me if I'm wrong.
You are wrong. First of all, this hasn't really anything to do with OpenGL. This is about homogeneous coordinates, which is a purely mathematical concept. It works by embedding an n-dimensional vector space into an (n+1)-dimensional vector space. In the 3D case, we use 4D homogeneous coordinates, with the definition that the homogeneous vector (x, y, z, w) represents the 3D point (x/w, y/w, z/w) in cartesian coordinates.
As a result, for any w != 0, you get a certain finite point, and for w = 0, you are describing an infinitely far away point in a specific direction. This means that homogeneous coordinates are more powerful in the regard that they can actually describe infinitely far away points with finite coordinates (which comes in very handy for perspective transformations, where infinitely far away points are mapped to finite points, and vice versa).
You can, as a shortcut, imagine (x, y, z, 0) as some direction vector. But for a point, it is not just w=1 but any w value not equal to 0. Conceptually, this means that any cartesian 3D point is represented by a line in homogeneous space (we did go up one dimension, so this actually makes sense).
I assume (a, b, c, 1) is the vector from the affine transformation definition. Why do we have w=1 and not w=0?
Your assumption is wrong. One thing about homogeneous coordinates is that we do not apply a translation in the 4D space. We get the effect of a translation in the 3D space by actually doing a shearing operation in 4D space.
So what we really want to do in homogeneous space is
(x + w*a, y + w*b, z + w*c, w)
since the 3D interpretation of the resulting vector will then be
(x + w*a) / w == x/w + a
(y + w*b) / w == y/w + b
(z + w*c) / w == z/w + c
which will represent the translation that we were after.
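Here is a small self-contained C++ check of exactly that (the numbers are just for illustration): multiplying the translation matrix by a homogeneous vector with w = 2 still translates the corresponding 3D point by (a, b, c):

#include <cstdio>

int main() {
    float a = 10, b = 20, c = 30;
    // The OpenGL-style translation matrix from the question.
    float T[4][4] = { {1, 0, 0, a},
                      {0, 1, 0, b},
                      {0, 0, 1, c},
                      {0, 0, 0, 1} };
    float p[4] = { 1, 2, 3, 2 };   // homogeneous point with w = 2, not 1
    float r[4] = { 0, 0, 0, 0 };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += T[i][j] * p[j];
    // r = (x + w*a, y + w*b, z + w*c, w) = (21, 42, 63, 2).
    // Its 3D interpretation is (10.5, 21, 31.5), which is the original
    // 3D point (0.5, 1, 1.5) translated by (a, b, c) = (10, 20, 30).
    std::printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
    return 0;
}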
So to try to make this even more clear:
What you wrote in your question:
p(1, 2, 3, 1) + v(a, b, c, 0) = p'(1 + a, 2 + b, 3 + c, 1)
is explicitly not what we want to do. What you describe is an affine translation with respect to the 4D vector space.
But what we actually want is a translation in the 3D cartesian coordinates, so
(1, 2, 3) + (a, b, c) = (1 + a, 2 + b, 3 + c)
Applying your formula would actually mean doing a translation in the homogeneous space, which would have the effect of doing a translation which is scaled by the w coordinate, while the formula I gave will always translate the point by (a, b, c), no matter what w we choose for the point.
This is of course not true if we choose w=0. Then we will get no change at all, which is also correct, because a translation will never change a direction; your formula would change the direction. Your formula is correct only for w=1, which is only a special case. But the key point here is that we are not doing a vector addition after all, but a matrix * vector multiplication. And homogeneous coordinates just allow us (among other, more powerful things) to represent a translation via matrix multiplication. But this does not mean that we can just interpret the last column as a translation vector as if we were doing vector addition.
Simple Answer
The reason is the way matrix multiplication works. If you multiply a matrix by a vector, then the w-component of the result is the inner product of the 4th row of the matrix with the vector. After applying the transformation, a point should still be a point and a direction should still be a direction. If you set that 4th row to a zero vector, the resulting w will always be 0, and thus the resulting vector would have changed from a position (w=1) to a direction (w=0).
More detailed answer
The definition of an affine transformation is:
x' = A * x + t,
where A is a linear map and t a translation vector. Traditionally, linear maps are written by mathematicians in matrix form. Note that t is here, similar to x, a 3-dimensional vector. It would be cumbersome (and less general, thinking of projective mappings) if we always had to handle the linear mapping matrix and the translation vector separately. This can be solved by introducing an additional dimension to the mapping, the so-called homogeneous coordinate, which allows us to store the linear mapping as well as the translation vector in a combined 4x4 matrix. This is called an augmented matrix, and by definition,
[ x' ]   [ A | t ]   [ x ]
[    ] = [---+---] * [   ]
[ 1  ]   [ 0 | 1 ]   [ 1 ]
It should also be noted that affine transformations can now be combined very easily by just multiplying their augmented matrices, which would be hard to do in matrix-plus-vector notation.
One should also note that the bottom-right 1 is not part of the translation vector, which is still 3-dimensional, but part of the matrix augmentation.
You might also want to read the section about "Augmented matrix" here: https://en.wikipedia.org/wiki/Affine_transformation#Augmented_matrix
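To see the composition claim in code, here is a small C++ sketch: multiplying the two augmented matrices produces a matrix whose linear part is A2*A1 and whose translation column is A2*t1 + t2, the same as applying x' = A1*x + t1 first and then x'' = A2*x' + t2.

// Compose two affine transforms given as 4x4 augmented matrices.
// out = M2 * M1; applying 'out' equals applying M1 first, then M2.
void compose(const float M2[4][4], const float M1[4][4], float out[4][4]) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            out[i][j] = 0.0f;
            for (int k = 0; k < 4; ++k)
                out[i][j] += M2[i][k] * M1[k][j];
        }
}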

vector * matrix product efficiency issue

Just as Z boson recommended, I am using a column-major matrix format in order to avoid having to use the dot product. I don't see a feasible way to avoid it when multiplying a vector with a matrix, though. The matrix multiplication trick requires efficient extraction of rows (or columns, if we transpose the product). To multiply a vector by a matrix, we therefore transpose:
(b * A)^T = A^T * b^T
A is a matrix, b a row vector, which, after being transposed, becomes a column vector. Its rows are just single scalars and the vector * matrix product implementation becomes an inefficient implementation of dot products of columns of (non-transposed) matrix A with b. Is there a way to avoid performing these dot products? The only way I see that could do it, would involve row extraction, which is inefficient with the column-major matrix format.
This can be understood from my original post on this (my first on SO): efficient-4x4-matrix-vector-multiplication-with-sse-horizontal-add-and-dot-prod. The rest of the discussion applies to 4x4 matrices.
Here are two methods to do matrix times vector (v = Mu, where v and u are column vectors):
method 1) v1 = dot(row1, u), v2 = dot(row2, u), v3 = dot(row3, u), v4 = dot(row4, u)
method 2) v = u1*col1 + u2*col2 + u3*col3 + u4*col4.
The first method is more familiar from math class while the second is more efficient for a SIMD computer. The second method uses vectorized math (like numpy) e.g.
u1*col1 = (u1*col1x, u1*col1y, u1*col1z, u1*col1w).
Now let's look at vector times matrix (v = uM where v and u are row vectors)
method 1) v1 = dot(col1, u), v2 = dot(col2, u), v3 = dot(col3, u), v4 = dot(col4, u)
method 2) v = u1*row1 + u2*row2 + u3*row3 + u4*row4.
Now the roles of columns and rows have swapped but method 2 is still the efficient method to use on a SIMD computer.
To do matrix times vector efficiently on a SIMD computer, the matrix should be stored in column-major order. To do vector times matrix efficiently on a SIMD computer, the matrix should be stored in row-major order.
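For the 4x4 float case, a minimal SSE sketch of method 2 might look like the following; it assumes M points at a column-major 4x4 matrix and that M and v are 16-byte aligned (use unaligned loads/stores otherwise):

#include <xmmintrin.h>

// v = M * u via column scaling: v = u[0]*col0 + u[1]*col1 + u[2]*col2 + u[3]*col3.
// M is column-major, so each column is one contiguous aligned load.
void mat4_mul_vec4(const float* M, const float* u, float* v) {
    __m128 r = _mm_mul_ps(_mm_set1_ps(u[0]), _mm_load_ps(M + 0));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(u[1]), _mm_load_ps(M + 4)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(u[2]), _mm_load_ps(M + 8)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(u[3]), _mm_load_ps(M + 12)));
    _mm_store_ps(v, r);  // no horizontal adds or dot products needed
}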
As far as I understand, OpenGL uses column-major ordering and does matrix times vector, while DirectX uses row-major ordering and does vector times matrix.
If you have three matrix transformations that you apply in the order M1 first, then M2, then M3, then with matrix times vector you write it as
v = M3*M2*M1*u //u and v are column vectors - OpenGL form
With vector times matrix you write
v = u*M1*M2*M3 //u and v are row vectors - DirectX form
Neither form is better than the other in terms of efficiency. It's just a question of notation (and causing confusion which is useful when you have competition).
It's important to note that for matrix*matrix multiplication, row-major versus column-major storage is irrelevant.
If you want to know why the vertical SIMD instructions are faster than the horizontal ones, that's a separate question which should be asked, but in short the horizontal ones really act in serial rather than in parallel and are broken up into several micro-ops (which is why, ironically, dppd is faster than dpps).

Image reconstruction using SVD Decomposition

I have performed a block SVD decomposition over an image and stored the results.
Now I need to reconstruct the image from those results. I found a few examples, all written in Matlab, which is a mystery to me.
I only need the formula from which I can reconstruct my picture, or an example written in C.
Matrix A equals U*S*V'. What would the formula look like, e.g. for a reconstruction from the first five singular values (a product of which rows and columns)? Please provide the formula with indices in a C-like style. U and V' are matrices and S is a vector (not a matrix).
Not sure if I get your question right, but if you just need to know the singular values, they are the diagonal values of the middle matrix S. S in general is a diagonal matrix, which is stored here as a vector: only the diagonal is stored, and you should imagine it as a matrix when thinking in terms of matrix calculations.
Those diagonal values are your singular values; if you need the biggest singular values, just take the 5 biggest values of the vector S.
Quoting from Wikipedia:
The diagonal entries Σi,i of Σ are known as the singular values of M. The m columns of U and the n columns of V are called the left-singular vectors and right-singular vectors of M, respectively.
In the above quote, sigma is your S, and M is the original matrix.
You have asked for C code, yet my hope is that pseudocode will suffice (it's late, I'm tired). The target matrix A has m rows, n columns and rank rho. The variable p = min(m, n).
One strategy is to first form the intermediate matrix product B = U*S. This is trivial due to the diagonal-like nature of the matrix of singular values. Assume you have rho (= 5) singular values. You must enforce rho <= p.
Replace column vector u1 with s1*u1.
Replace column vector u2 with s2*u2.
...
Replace column vector u_rho with s_rho*u_rho.
Replace column vector u_(rho+1) with a zero vector of length m.
Replace column vector u_(rho+2) with a zero vector of length m.
...
Replace column vector u_p with a zero vector of length m.
Next form the new image matrix A = B*V^T. The matrix element in row r and column c is the dot product of the rth row vector of B with the cth column vector of V^T (each effectively of length rho, since the remaining entries are zero).
Another strategy is to jump directly to the form where the matrix element of A in row r and column c is
a[r][c] = sum( s[k] * u[r][k] * v[c][k], { k, 1, rho } )
The row counter r runs from 1 to m; the column counter c runs from 1 to n.
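Since the question asked for a C-like formulation, here is a sketch of that second strategy with 0-based indices; it assumes u holds U as u[r][k], v holds V as v[c][k], and s[k] holds the singular values in descending order (all names illustrative):

// Rank-rho reconstruction A = U * S * V^T, element by element.
for (int r = 0; r < m; ++r)
    for (int c = 0; c < n; ++c) {
        float sum = 0.0f;
        for (int k = 0; k < rho; ++k)
            sum += s[k] * u[r][k] * v[c][k];
        a[r][c] = sum;
    }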