I'm having trouble solving what I think should be a fairly simple problem. The basic problem is I want to modify an Eigen PermutationMatrix but I don't know how.
I'm doing a QR decomposition of some matrix X using the C++ Eigen library. I'm doing this on rank-deficient matrices and I need a particular output. Specifically, I need
R^{-1} * t(R^{-1})
The problem is that using Eigen::ColPivHouseholderQR returns a permuted version of R. This is easy enough to fix when X is full rank, but I'd like the fastest solution for when it is rank-deficient. Let me demonstrate:
using namespace Eigen;
// Do QR
ColPivHouseholderQR<MatrixXd> PQR(X);
// Get permutation matrix
ColPivHouseholderQR<MatrixXd>::PermutationType Pmat(PQR.colsPermutation());
int r(PQR.rank());
int p(X.cols());
// Notice I'm only getting r elements, so R_inv is r by r
MatrixXd R_inv = PQR.matrixQR().topLeftCorner(r, r).triangularView<Upper>().solve(MatrixXd::Identity(r, r));
// This only works if r = p and X is full-rank
R_inv = Pmat * R_inv * Pmat;
MatrixXd XtX_inv = R_inv * R_inv.transpose();
So the basic problem is that I would like to modify Pmat so that it only permutes the r columns of R_inv that I've extracted from PQR.matrixQR(). The trouble is that I have no idea how to modify or otherwise work with an Eigen PermutationMatrix, as it doesn't seem to have any of the methods or properties of a normal matrix.
One possible solution is the following: when I multiply Pmat * MatrixXd::Identity(p, p), I get a useful matrix.
For example, I get something like:
[0, 1, 0, 0,
1, 0, 0, 0,
0, 0, 0, 1,
0, 0, 1, 0]
If p = 4 and r = 3, then I would just like this sub-view, where I drop all columns right of the first r columns, and then remove all rows that are all 0:
[0, 1, 0,
1, 0, 0,
0, 0, 1]
So I could do something like taking P = Pmat * MatrixXd::Identity(p, p) and then working with P.leftCols(r):
MatrixXd P = Pmat * Eigen::MatrixXd::Identity(p, p);
// https://stackoverflow.com/questions/41305178/removing-zero-columns-or-rows-using-eigen
// find non-zero columns:
Matrix<bool, Dynamic, 1> non_zeros = P.leftCols(r).cast<bool>().rowwise().any();
// allocate result matrix:
MatrixXd res(non_zeros.count(), r);
// fill result matrix:
Index j=0;
for(Index i=0; i<P.rows(); ++i)
{
if(non_zeros(i))
res.row(j++) = P.row(i).leftCols(r);
}
R_inv = res * R_inv * res;
XtX_inv = R_inv * R_inv.transpose();
but this seems expensive and doesn't take advantage of the fact that Pmat already knows which rows should be dropped. I'm guessing there is an easier way to work with Pmat.
Is there any way to easily modify an Eigen PermutationMatrix to only consider columns that weren't placed beyond the first r positions?
Any help or tips would be greatly appreciated.
I've come up with another solution, which probably requires less computation.
// Get all column indices
ArrayXi Pmat_indices = Pmat.indices();
// Get the order for the columns you are keeping
ArrayXi Pmat_keep = Pmat_indices.head(r);
// Get the indices for columns you are discarding
ArrayXi Pmat_toss = Pmat_indices.tail(p - r);
// This code takes the indices you are keeping and, while preserving their order, maps them into the range [0, r-1]
// For each one, see how many dropped indices are smaller, and subtract that difference
// Ex: p = 4, r = 2
// Pmat_indices = {3, 1, 0, 2}
// Pmat_keep = {3, 1}
// Pmat_toss = {0, 2}
// Now we go through each keeper, count how many in toss are smaller, and then modify accordingly
// 3 - 2 and 1 - 1
// Pmat_keep = {1, 0}
for(Index i=0; i<r; ++i)
{
Pmat_keep(i) = Pmat_keep(i) - (Pmat_toss < Pmat_keep(i)).count();
}
// Now this will order just the first few columns in the right order
PermutationMatrix<Dynamic, Dynamic> P = PermutationWrapper<ArrayXi>(Pmat_keep);
R_inv = P * R_inv * P;
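For concreteness, here is a minimal standalone sketch (my own, assuming Eigen 3.x) of the same remapping on the p = 4, r = 2 example from the comments above; it builds the reduced permutation with PermutationMatrix's indices-based constructor instead of PermutationWrapper:
#include <Eigen/Dense>
#include <iostream>
int main()
{
    using namespace Eigen;
    const int p = 4, r = 2;
    // Indices as in the example above: the original permutation is {3, 1, 0, 2}
    ArrayXi indices(p);
    indices << 3, 1, 0, 2;
    ArrayXi keep = indices.head(r);      // {3, 1}
    ArrayXi toss = indices.tail(p - r);  // {0, 2}
    // Remap the kept indices into the range [0, r-1], preserving their order
    for (Index i = 0; i < r; ++i)
        keep(i) -= (toss < keep(i)).count();
    std::cout << keep.transpose() << "\n";  // prints: 1 0
    // Build an r-by-r permutation from the remapped indices
    PermutationMatrix<Dynamic, Dynamic> P(keep.matrix());
    MatrixXd Pdense = P * MatrixXd::Identity(r, r);
    std::cout << Pdense << "\n";  // prints the 2x2 swap: 0 1 / 1 0
    return 0;
}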
I've created a Vector and Matrix class and I am trying to perform operations such as the multiplication of a matrix and vector, the multiplication of a matrix and matrix, and the multiplication of a matrix and a float (scalar). I seem to be having problems getting the right product for the matrix * vector and matrix * matrix.
Here is the part of Matrix class meant to handle those operations:
// Matrix * vector, result vector
Vector Matrix::operator*(const Vector & other) const
{
if (other.getDimensions() == 4)
{
float floats[4];
const float* temp = other.getData();
for (int j = 0; j < 4; j++)
{
Vector myCol = column(j);
floats[j] = (temp[0] * myCol.getData(0)) + (temp[1] * myCol.getData(1)) + (temp[2] * myCol.getData(2)) + (temp[3] * myCol.getData(3));
}
return Vector(floats[0], floats[1], floats[2], floats[3]);
}
else
{
return Vector();
}
}
// Matrix * scalar, result matrix
Matrix Matrix::operator*(float c) const
{
Matrix myMatrix;
for (int i = 0; i < 16; i++)
{
myMatrix.data[i] = this->data[i] * c;
}
return myMatrix;
}
In my main.cpp,
Matrix m = Matrix(Vector(1, 0, 0, 1), Vector(0, 1, 0, 2), Vector(0, 0, 1, 3), Vector(0, 0, 0, 1));
is the value of the matrix, and
v = Vector(1, 0, -1, 1);
is the value of the vector.
When I multiply m * v I get <1, 0, -1, -1>, but the answer is <2, 2, 2, 1>.
And when doing the matrix * scalar with the same m matrix above and vector v with the values
v = Vector(1, 0, -1, 0);
I get m*v to be <1, 0, -1, 2> when it should be <1, 0, -1, 0>.
My Vector class works fine so I'm suspecting I messed up somewhere with the math for implementing the matrix operations.
To expand on Klaus' answer: mathematically, in the expression M*V the vector V is a column, and the elements of the result are (dot) products of the matrix rows with V. Replace column(j) with row(j).
I calculated your example by hand, and if you expect the result to be <2, 2, 2, 1>, then you definitely swapped rows and columns in your matrix. When you multiply a matrix with a vector, you want to put the products of the rows of the matrix with the vector into a result vector. Kind of:
Vector Matrix::operator*(const Vector & other) const
{
float floats[4];
const float* temp = other.getData();
for (int j = 0; j < 4; j++)
{
Vector my_row = row(j);
floats[j] = 0;
for(int i=0; i!=4; ++i)
floats[j] += temp[i] * my_row.getData(i);
}
//(maybe provide a better constructor to take an array)
return Vector(floats[0], floats[1], floats[2], floats[3]);
}
For the example with the scalar, I don't get the point: you describe it as multiplying the matrix by a scalar, but what you are actually computing is the matrix times a vector.
Also you could improve the error handling by only accepting vectors of size 4 (imposing that as a requirement in your vector class), if you just use vectors of size 4.
PS: maybe you should also move that long sum into a second, inner loop (as above), so that it is more readable and extensible.
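To make the row-times-vector rule concrete, here is a minimal standalone sketch (using plain arrays instead of your Vector/Matrix classes, and assuming the constructor arguments of your Matrix are its rows) that reproduces the expected <2, 2, 2, 1>:
#include <cstdio>
int main()
{
    // The matrix and vector from the question, rows taken from the constructor arguments
    float m[4][4] = { {1, 0, 0, 1},
                      {0, 1, 0, 2},
                      {0, 0, 1, 3},
                      {0, 0, 0, 1} };
    float v[4] = {1, 0, -1, 1};
    float result[4];
    // Each result element is the dot product of a matrix row with v
    for (int i = 0; i < 4; ++i)
    {
        result[i] = 0;
        for (int j = 0; j < 4; ++j)
            result[i] += m[i][j] * v[j];
    }
    std::printf("%g %g %g %g\n", result[0], result[1], result[2], result[3]); // prints 2 2 2 1
    return 0;
}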
I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.
A vertex may not contain any point. So every point has coordinates in {0, 1}^D. I am only interested in query time, as long as the memory cost is reasonable (not exponential in N, for example :) ).
Given a query that lies on one of the cube's vertices and an input parameter r, find all the vertices (and thus points) that have hamming distance <= r from the query.
What's the way to go in a C++ environment?
I am thinking of a kd-tree, but I am not sure and want help; any input, even approximate, would be appreciated! Since hamming distance comes into play, bitwise manipulations should help (e.g. XOR).
There is a nice bithack to go from one bitmask with k bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with k bits set. XORing these masks with an initial value gives all the values at hamming distance exactly k away from it.
So for D dimensions, where D is less than 32 (otherwise change the types),
uint32_t limit = (1u << D) - 1;
for (int k = 1; k <= r; k++) {
uint32_t diff = (1u << k) - 1;
while (diff <= limit) {
// v is the input vertex
uint32_t vertex = v ^ diff;
// use it
diff = nextBitPermutation(diff);
}
}
Where nextBitPermutation may be implemented in C++ as something like (if you have __builtin_ctz)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
Or for MSVC (not tested)
uint32_t nextBitPermutation(uint32_t v) {
// see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
uint32_t t = v | (v - 1);
unsigned long tzc;
_BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
}
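Putting the pieces together, here is a small self-contained sketch (my own, assuming GCC/Clang for __builtin_ctz; collecting the results into a std::vector is only for illustration) that gathers all vertices within distance r of a query vertex:
#include <cstdint>
#include <cstdio>
#include <vector>
uint32_t nextBitPermutation(uint32_t v) {
    uint32_t t = v | (v - 1);
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
// Collect all vertices of the D-cube within hamming distance r of v
// (including v itself at distance 0, which the k-loop skips)
std::vector<uint32_t> verticesWithin(uint32_t v, int r, int D) {
    std::vector<uint32_t> out;
    out.push_back(v);
    uint32_t limit = (1u << D) - 1;
    for (int k = 1; k <= r; k++) {
        uint32_t diff = (1u << k) - 1;
        while (diff <= limit) {
            out.push_back(v ^ diff);
            diff = nextBitPermutation(diff);
        }
    }
    return out;
}
int main() {
    // D = 3, query vertex 0b101, radius 1
    for (uint32_t x : verticesWithin(5u /* 0b101 */, 1, 3))
        std::printf("%u ", x);   // prints 5 4 7 1
    std::printf("\n");
}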
If D is really low, 4 or lower, the old popcnt-with-pshufb trick works really well and generally everything just lines up nicely, like this:
uint16_t query(int vertex, int r, int8_t* validmask)
{
// validmask should be array of 16 int8_t's,
// 0 for a vertex that doesn't exist, -1 if it does
__m128i valid = _mm_loadu_si128((__m128i*)validmask);
__m128i t0 = _mm_set1_epi8(vertex);
__m128i r0 = _mm_set1_epi8(r + 1);
__m128i all = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
__m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
__m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
__m128i close_enough = _mm_cmpgt_epi8(r0, dist);
__m128i result = _mm_and_si128(close_enough, valid);
return _mm_movemask_epi8(result);
}
This should be fairly fast; fast compared to the bithack above (nextBitPermutation, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin popcnt, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.
This also scales down to D=3 of course, just mark every vertex >= 8 as invalid. D>4 can be done similarly, but it takes more code then, and since this is really a brute force solution that is only fast due to parallelism, it fundamentally gets exponentially slower in D.
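A small usage sketch (my own; it assumes the query function above is in scope and the code is compiled with SSSE3 support, e.g. -mssse3):
#include <immintrin.h>
#include <cstdint>
#include <cstdio>
int main() {
    // Mark every vertex of the 4-cube as existing
    int8_t validmask[16];
    for (int i = 0; i < 16; ++i) validmask[i] = -1;
    // Vertices within hamming distance 1 of 0b0101
    uint16_t mask = query(5 /* 0b0101 */, 1, validmask);
    for (int i = 0; i < 16; ++i)
        if (mask & (1u << i))
            std::printf("%d ", i);   // prints 1 4 5 7 13
    std::printf("\n");
}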
I have a vector of size n; n is power of 2. I need to treat this vector as a matrix n = R*C. Then I need to transpose the matrix.
For example, I have vector: [1,2,3,4,5,6,7,8]
I need to find R and C. In this case it would be: 4,2. And treat vector as matrix:
[1,2]
[3,4]
[5,6]
[7,8]
Transpose it to:
[1, 3, 5, 7]
[2, 4, 6, 8]
After transposition vector should be: [1, 3, 5, 7, 2, 4, 6, 8]
Are there existing algorithms to perform in-place non-square matrix transposition? I don't want to reinvent the wheel.
My vector is very big so I don't want to create intermediate matrix. I need an in-place algorithm. Performance is very important.
All modifications should be done in the original vector. Ideally the algorithm should work with chunks that fit in the CPU cache.
I can't use an iterator-based view because of memory locality, so I need a real transposition.
It does not matter whether the matrix is 2x4 or 4x2.
The problem can be divided in two parts. First, find R and C and then, reshape the matrix. Here is something I would try to do:
Since n is a power of 2, i.e. n = 2^k, then if k is even we have R = C = sqrt(n). And if k is odd, then R = 2^((k-1)/2) and C = 2^((k+1)/2); since you said the orientation does not matter, it makes no difference which of the two is the larger one.
Note: Since you mentioned you want to avoid using extra memory, I have made some edits to my original answer.
The code to calculate R and C would be something like:
void getRandC(const size_t& n, size_t& R, size_t& C)
{
int k = (int)log2(double(n)),
i, j;
if (k & 1) // k is odd
i = (j = (k + 1) / 2) - 1;
else
i = j = k / 2;
R = (size_t)exp2(i);
C = (size_t)exp2(j);
}
Which needs C++11. For the second part, in case you want to keep the original vector:
void transposeVector(const std::vector<int>& vec, std::vector<int>& mat)
{
size_t R, C;
getRandC(vec.size(), R, C);
// first, reserve the memory
mat.resize(vec.size());
// now, do the transposition directly
for (size_t i = 0; i < R; i++)
{
for (size_t j = 0; j < C; j++)
{
mat[i * C + j] = vec[i + R * j];
}
}
}
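As a quick sanity check (a minimal sketch of my own, assuming getRandC and transposeVector from above are in scope), the example vector from the question comes out as requested:
#include <cstdio>
#include <vector>
int main()
{
    std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<int> mat;
    transposeVector(vec, mat);
    for (int x : mat)
        std::printf("%d ", x);  // prints 1 3 5 7 2 4 6 8
    std::printf("\n");
}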
And, if you want to modify the original vector and avoid using extra memory, you can write:
void transposeInPlace(std::vector<int>& vec)
{
size_t R, C;
getRandC(vec.size(), R, C);
for (size_t j = 0; R > 1; j += C, R--)
{
for (size_t i = j + R, k = j + 1; i < vec.size(); i += R)
{
vec.insert(vec.begin() + k++, vec[i]);
vec.erase(vec.begin() + i + 1);
}
}
}
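And the in-place version, exercised the same way (again my own sketch, assuming getRandC and transposeInPlace from above are in scope):
#include <cstdio>
#include <vector>
int main()
{
    std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8};
    transposeInPlace(vec);
    for (int x : vec)
        std::printf("%d ", x);  // prints 1 3 5 7 2 4 6 8
    std::printf("\n");
}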
See the live example
Since you haven't provided us with any of your code, can I suggest a different approach (which may or may not work for your particular situation)?
I would use an algorithm based on your matrix layout to transpose the values yourself, without creating a new matrix. Since performance is an issue, this will help even more, if it is applicable for you.
Have a vector
[1, 2, 3, 4, 5, 6, 7, 8]
Create your matrix
[1, 2]
[3, 4]
[5, 6]
[7, 8]
Reorder vector without another matrix
[1, 3, 5, 7, 2, 4, 6, 8]
Overwrite the values in the current matrix (so you don't have to create a new one) and reorder the values based on your current matrix.
Add values in order
R1 and C1 to transposed_vector[0]
R2 and C1 to transposed_vector[1]
R3 and C1 to transposed_vector[2]
R4 and C1 to transposed_vector[3]
R1 and C2 to transposed_vector[4]
And so on.
For a non-square matrix representation, I think it may be tricky, and not worth the effort, to make the transpose of your flat vector without creating another one. Here is a snippet of what I came up with:
chrono::steady_clock::time_point start = chrono::steady_clock::now();
int i, j, p, k;
vector<int> t_matrix(matrix.size());
for(k=0; k< R*C ;++k)
{
i = k/C;
j = k - i*C;
p = j*R + i;
t_matrix[p] = matrix[k];
}
cout << chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - start).count() << endl;
Here, matrix is your flat vector, t_matrix is the "transposed" flat vector, and R and C are, respectively, the number of rows and columns you found for your matrix representation.
I've recently been reading matrix tutorials with OpenGL and stumbled upon an optimized method for matrix multiplication that I cannot understand.
//Create an alias type for a Matrix type
typedef struct Matrix
{
float m[16];
} Matrix;
//default matrix
static const Matrix IDENTITY_MATRIX = { {
1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1
} };
Matrix MultiplyMatrices(const Matrix* m1, const Matrix* m2)
{
Matrix out = IDENTITY_MATRIX;
unsigned int row, column, row_offset;
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
for (column = 0; column < 4; ++column)
out.m[row_offset + column] =
(m1->m[row_offset + 0] * m2->m[column + 0]) +
(m1->m[row_offset + 1] * m2->m[column + 4]) +
(m1->m[row_offset + 2] * m2->m[column + 8]) +
(m1->m[row_offset + 3] * m2->m[column + 12]);
return out;
}
These are the questions I have:
In the method MultiplyMatrices, why are the params m1 and m2 pointers? If you're just copying their values and returning a new matrix, why use pointers?
Why is the for loop condition identical to its increment?
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
The MultiplyMatrices function calculates the product of two matrices. So that's why you need two matrices as the input arguments of this function. Note that the definition of the matrix
typedef struct Matrix
{
float m[16];
} Matrix;
defines a 4 by 4 matrix with a 1-D array. So the offset is 4 for each row. This is just to simulate a 2-D matrix with a 1-D array. The two input matrices are passed by pointer so the function can read their element values without copying the structs.
The reason why you see two identical statements in the for loop is:
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
Initially, row_offset is set to 0. As the loop goes through each row of the matrix, row_offset increases with row. This is because in the 1-D array representation of a 2-D matrix, the element a[i][j] corresponds to the flat-array element
a[i*num_col + j]
and here num_col is 4. So these two statements are not the same: the first initializes row_offset, and the second recomputes it each time the row index increases by 1.
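As a small illustration (my own sketch, not from the tutorial), the following shows that indexing the flat float[16] with row_offset + column is the same as 2-D indexing with num_col = 4:
#include <cassert>
int main()
{
    // A 4x4 matrix stored both as a 2-D array and as a flat 1-D array
    float a2d[4][4];
    float flat[16];
    for (int i = 0; i < 16; ++i)
    {
        flat[i] = (float)i;
        a2d[i / 4][i % 4] = (float)i;
    }
    // a[row][column] corresponds to flat[row * 4 + column];
    // row_offset in the tutorial code is exactly row * 4
    for (int row = 0; row < 4; ++row)
    {
        int row_offset = row * 4;
        for (int column = 0; column < 4; ++column)
            assert(a2d[row][column] == flat[row_offset + column]);
    }
    return 0;
}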
In the method MultiplyMatrices why is there a pointer to m1 and m2? If you're just copying their values, why use a pointer?
Maybe I don't understand your question, but how would you propose to do it differently? You're writing the product of m1 and m2 into a third memory location, out, and passing the inputs by pointer avoids copying them. This is the most efficient way.
Why is the for loop condition identical to its increment?
It's not - the ++row increments row before the row_offset = row * 4 assignment on each iteration. The condition is only the middle part, row < 4 - maybe that's the confusion.
I was trying to understand how to store a band matrix. I found an example in the book "C++ and Object Oriented Numeric Computing", but I cannot figure out the purpose of the line bda[i] += P;, and it also gives me problems when trying to print the band matrix. Here it is:
int N = 5; //Matrix of NxN
int P = 1; //Left bandwidth
int R = 2; //Right bandwidth
//Matrix A
double A[5][5] = { { 1, 6, 10, 0, 0 },
{ 13, 2, 0, 11, 0 },
{ 0, 14, 3, 8, 12 },
{ 0, 0, 0, 4, 9 },
{ 0, 0, 0, 16, 5 } };
//Allocate memory for rows
double** bda = new double*[N];
for (int i = 0; i < N; i++) {
bda[i] = new double[P + R + 1]; //Allocate memory for cols
bda[i] += P; //What's the purpose of this?
}
This is used for a compact way to store a matrix that has P nonzero diagonals to the left of the main diagonal, and R nonzero diagonals to the right, with all other elements being zero. For each row, we only allocate space for the P+R+1 elements around the main diagonal.
The bda[i] += P line makes bda[i] point to an element on the main diagonal. This can make it more convenient to use the matrix: bda[i][0] is on the main diagonal for every i, bda[i][1] is on the first diagonal to the right, bda[i][-1] is on the first diagonal to the left, etc. This allows you to find elements on the main diagonal or near it without having to add P each time. Whether this is helpful depends on how you use the matrix.
Note that if you do this, you will need to subtract P from bda[i] before you delete[] it.
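For illustration, here is a hypothetical continuation of the snippet above (my own sketch, not from the book; it assumes <algorithm> and <iostream> are included). Row i of bda holds columns i-P .. i+R of A, so A[i][j] lives at bda[i][j - i] once bda[i] has been shifted by P; the cleanup undoes the shift before freeing:
// Fill the band storage from A
for (int i = 0; i < N; i++)
    for (int j = std::max(0, i - P); j <= std::min(N - 1, i + R); j++)
        bda[i][j - i] = A[i][j];
// bda[i][0] is the main diagonal, bda[i][1] the first diagonal to the right, bda[i][-1] the first to the left
std::cout << bda[2][0] << " " << bda[2][-1] << " " << bda[2][2] << "\n"; // prints 3 14 12
// When freeing, subtract P again before delete[]
for (int i = 0; i < N; i++) {
    bda[i] -= P;
    delete[] bda[i];
}
delete[] bda;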