Band matrix storage offset - C++

I was trying to understand how to store a band matrix. I found an example in the book "C++ and Object Oriented Numeric Computing", but I cannot figure out the purpose of the line bda[i] += P;, which also gives me problems when trying to print the band matrix. Here it is:
int N = 5; //Matrix of NxN
int P = 1; //Left bandwidth
int R = 2; //Right bandwidth
//Matrix A
double A[5][5] = { { 1, 6, 10, 0, 0 },
{ 13, 2, 0, 11, 0 },
{ 0, 14, 3, 8, 12 },
{ 0, 0, 0, 4, 9 },
{ 0, 0, 0, 16, 5 } };
//Allocate memory for rows
double** bda = new double*[N];
for (int i = 0; i < N; i++) {
bda[i] = new double[P + R + 1]; //Allocate memory for cols
bda[i] += P; //What's the purpose of this?
}

This is a compact way to store a matrix that has P nonzero diagonals to the left of the main diagonal and R nonzero diagonals to the right, with all other elements being zero. For each row, only the P + R + 1 elements around the main diagonal are allocated.
The bda[i] += P line makes bda[i] point to the element on the main diagonal. This can make the matrix more convenient to use: bda[i][0] is on the main diagonal for every i, bda[i][1] is on the first diagonal to the right, bda[i][-1] is on the first diagonal to the left, and so on. This lets you address elements relative to the main diagonal without adding P each time. Whether this is helpful depends on how you use the matrix.
Note that if you do this, you will need to subtract P from bda[i] before you delete[] it.
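For completeness, a minimal sketch (building directly on the snippet above and assuming <iostream> is included) that fills bda from A, prints the stored band, and frees the memory correctly:
// Copy the band of A into the offset storage: bda[i][j] holds A[i][i+j]
// for j in [-P, R], skipping entries that fall outside the matrix.
for (int i = 0; i < N; i++)
    for (int j = -P; j <= R; j++)
        if (i + j >= 0 && i + j < N)
            bda[i][j] = A[i][i + j];
// Print the stored band row by row.
for (int i = 0; i < N; i++) {
    for (int j = -P; j <= R; j++)
        if (i + j >= 0 && i + j < N)
            std::cout << "A(" << i << "," << i + j << ") = " << bda[i][j] << "   ";
    std::cout << "\n";
}
// Undo the offset before freeing, otherwise delete[] gets the wrong pointer.
for (int i = 0; i < N; i++) {
    bda[i] -= P;
    delete[] bda[i];
}
delete[] bda;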

Related

Modify/Shrink Eigen Permutation Matrix

I'm having trouble solving what I think should be a fairly simple problem. The basic problem is I want to modify an Eigen PermutationMatrix but I don't know how.
I'm doing a QR decomposition of some matrix X using the C++ Eigen library. I'm doing this on rank-deficient matrices and I need a particular output. Specifically, I need
R^{-1} * t(R^{-1}) (that is, R-inverse times its transpose)
The problem is that using Eigen::ColPivHouseholderQR returns a permuted version of R. This is easy enough to fix when X is full rank, but I'd like the fastest solution for when it is rank-deficient. Let me demonstrate:
using namespace Eigen;
// Do QR
ColPivHouseholderQR<MatrixXd> PQR(X);
// Get permutation matrix
ColPivHouseholderQR<MatrixXd>::PermutationType Pmat(PQR.colsPermutation());
int r(PQR.rank());
int p(X.cols());
// Notice I'm only getting r elements, so R_inv is r by r
MatrixXd R_inv = PQR.matrixQR().topLeftCorner(r, r).triangularView<Upper>().solve(MatrixXd::Identity(r, r));
// This only works if r = p and X is full-rank
R_inv = Pmat * R_inv * Pmat;
XtX_inv = R_inv * R_inv.transpose();
So the basic problem is that I would like to modify Pmat so that it only permutes the r columns of R_inv that I've extracted from PQR.matrixQR(). The trouble is that I have no idea how to work with an Eigen PermutationMatrix, as it doesn't seem to have any of the methods or properties of a normal matrix.
One possible solution is the following: when I multiply Pmat * MatrixXd::Identity(p, p), I get a useful matrix.
For example, I get something like:
[0, 1, 0, 0,
1, 0, 0, 0,
0, 0, 0, 1,
0, 0, 1, 0]
If p = 4 and r = 3, then I would just like this sub-view, where I drop all columns right of the first r columns, and then remove all rows that are all 0:
[0, 1, 0,
1, 0, 0,
0, 0, 1]
So I could do the following:
MatrixXd P = Pmat * Eigen::MatrixXd::Identity(p, p);
// https://stackoverflow.com/questions/41305178/removing-zero-columns-or-rows-using-eigen
// find rows that have a non-zero entry in the first r columns:
Matrix<bool, Dynamic, 1> non_zeros = P.leftCols(r).cast<bool>().rowwise().any();
// allocate result matrix:
MatrixXd res(non_zeros.count(), r);
// fill result matrix:
Index j=0;
for(Index i=0; i<P.rows(); ++i)
{
if(non_zeros(i))
res.row(j++) = P.row(i).leftCols(r);
}
R_inv = res * R_inv * res;
XtX_inv = R_inv * R_inv.transpose();
but this seems expensive and doesn't take advantage of the fact that Pmat already knows which rows should be dropped. I'm guessing there is an easier way to work with Pmat.
Is there any way to easily modify an Eigen PermutationMatrix to only consider columns that weren't placed beyond the first r positions?
Any help or tips would be greatly appreciated.
I've come up with another solution, which probably requires less computation.
// Get all column indices
ArrayXi Pmat_indices = Pmat.indices();
// Get the order for the columns you are keeping
ArrayXi Pmat_keep = Pmat_indices.head(r);
// Get the indices for columns you are discarding
ArrayXi Pmat_toss = Pmat_indices.tail(p - r);
// this code takes the indices you are keeping, and, while preserving order, keeps them in the range [0, r-1]
// For each one, see how many dropped indices are smaller, and subtract that difference
// Ex: p = 4, r = 2
// Pmat_indices = {3, 1, 0, 2}
// Pmat_keep = {3, 1}
// Pmat_toss = {0, 2}
// Now we go through each keeper, count how many in toss are smaller, and then modify accordingly
// 3 - 2 and 1 - 1
// Pmat_keep = {1, 0}
for(Index i=0; i<r; ++i)
{
Pmat_keep(i) = Pmat_keep(i) - (Pmat_toss < Pmat_keep(i)).count();
}
// Now this will order just the first few columns in the right order
PermutationMatrix<Dynamic, Dynamic> P = PermutationWrapper<ArrayXi>(Pmat_keep);
R_inv = P * R_inv * P;
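If it helps to sanity-check the index bookkeeping, here is a small standalone sketch (entirely mine, not from the question) showing what a dynamic PermutationMatrix built from an index vector does when applied from the left versus the right:
#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

int main() {
    VectorXi idx(3);
    idx << 2, 0, 1;                       // index array describing a 3-element permutation
    PermutationMatrix<Dynamic, Dynamic> perm(idx);
    MatrixXd M(3, 3);
    M << 1, 2, 3,
         4, 5, 6,
         7, 8, 9;
    MatrixXd rowPerm = perm * M;          // applying from the left permutes the rows of M
    MatrixXd colPerm = M * perm;          // applying from the right permutes the columns of M
    std::cout << rowPerm << "\n\n" << colPerm << "\n";
}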

Matrix and vector multiplication, outputting incorrect product

I've created a Vector and Matrix class and I am trying to perform operations such as the multiplication of a matrix and vector, the multiplication of two matrices, and the multiplication of a matrix by a float (scalar). I seem to be having problems getting the right product for matrix * vector and matrix * matrix.
Here is the part of Matrix class meant to handle those operations:
// Matrix * vector, result vector
Vector Matrix::operator*(const Vector & other) const
{
if (other.getDimensions() == 4)
{
float floats[4];
const float* temp = other.getData();
for (int j = 0; j < 4; j++)
{
Vector myCol = column(j);
floats[j] = (temp[0] * myCol.getData(0)) + (temp[1] * myCol.getData(1)) + (temp[2] * myCol.getData(2)) + (temp[3] * myCol.getData(3));
}
return Vector(floats[0], floats[1], floats[2], floats[3]);
}
else
{
return Vector();
}
}
// Matrix * scalar, result matrix
Matrix Matrix::operator*(float c) const
{
Matrix myMatrix;
for (int i = 0; i < 16; i++)
{
myMatrix.data[i] = this->data[i] * c;
}
return myMatrix;
}
In my main.cpp,
Matrix m = Matrix(Vector(1, 0, 0, 1), Vector(0, 1, 0, 2), Vector(0, 0, 1, 3), Vector(0, 0, 0, 1));
is the value of the matrix, and
v = Vector(1, 0, -1, 1);
is the value of the vector.
When I multiply m * v I get <1, 0, -1, -1>, but the answer is <2, 2, 2, 1>.
And when doing the matrix * scalar with the same m matrix above and vector v with the values
v = Vector(1, 0, -1, 0);
I get m*v to be <1, 0, -1, 2> when it should be <1, 0, -1, 0>.
My Vector class works fine so I'm suspecting I messed up somewhere with the math for implementing the matrix operations.
To expand on @Klaus's answer: mathematically, in the expression M*V the vector V is a column, and the elements of the result are (dot) products of the matrix rows with V. Replace column(j) with row(j).
I calculated your example by hand now, and if you expect the result to be <2, 2, 2, 1>, then you definitely swapped rows and columns in your matrix. When you multiply a matrix with a vector you want to put the products of the rows of the matrix and the vector in a result vector. Kind of:
Vector Matrix::operator*(const Vector & other) const
{
float floats[4];
const float* temp = other.getData();
for (int j = 0; j < 4; j++)
{
Vector my_row = row(j);
floats[j] = 0;
for(int i=0; i!=4; ++i)
floats[j] += temp[i] * my_row.getData(i);
}
//(maybe provide a better constructor to take an array)
return Vector(floats[0], floats[1], floats[2], floats[3]);
}
As for the scalar example, I don't follow it: you say matrix * scalar, but you are again multiplying the matrix by a vector.
Also, if you only ever use vectors of size 4, you could improve the error handling by making size 4 a requirement of your Vector class rather than checking it here.
PS: you could also move the four-term sum into an inner loop, as above, which is more readable and easier to extend.

Partitioning of an AABB

I have a problem where I need to divide an AABB into a number of small AABBs. I need to find the minimum and maximum points in each of the smaller AABB.
If we take this cuboid as an example, we can see that it is divided into 64 smaller cuboids. I need to calculate the minimum and maximum points of all of these smaller cuboids, where the number of cuboids (here 64) can be specified by the end user.
I have made a basic attempt with the following code:
// Half the length of each side of the AABB.
float h = side * 0.5f;
// The length of each side of the inner AABBs.
float l = side / NUMBER_OF_PARTITIONS;
// Calculate the minimum point on the parent AABB.
Vector3 minPointAABB(
origin.getX() - h,
origin.getY() - h,
origin.getZ() - h
);
// Calculate all inner AABBs which completely fill the parent AABB.
for (int i = 0; i < NUMBER_OF_PARTITIONS; i++)
{
// This is not correct! Given a parent AABB of min (-10, 0, 0) and max (0, 10, 10) I need to
// calculate the following positions as minimum points of InnerAABB (with 8 inner AABBs).
// (-10, 0, 0), (-5, 0, 0), (-10, 5, 0), (-5, 5, 0), (-10, 0, 5), (-5, 0, 5),
// (-10, 5, 5), (-5, 5, 5)
Vector3 minInnerAABB(
minPointAABB.getX() + i * l,
minPointAABB.getY() + i * l,
minPointAABB.getZ() + i * l
);
// We can calculate the maximum point of the AABB from the minimum point
// by adding the length of each side to each coordinate of the minimum point.
Vector3 maxInnerAABB(
minInnerAABB.getX() + l,
minInnerAABB.getY() + l,
minInnerAABB.getZ() + l
);
// Add the inner AABB points to a container for later use.
}
Many thanks!
I assume that your problem is that you don't get enough sub-boxes. The number of partitions refers to partitions per dimension, right? So 2 partitions yield 8 sub-boxes, 3 partitions yield 27 sub-boxes and so on.
Then you must have three nested loops, one for each dimension:
for (int k = 0; k < NUMBER_OF_PARTITIONS; k++) {
for (int j = 0; j < NUMBER_OF_PARTITIONS; j++) {
for (int i = 0; i < NUMBER_OF_PARTITIONS; i++)
{
Vector3 minInnerAABB(
minPointAABB.getX() + i * l,
minPointAABB.getY() + j * l,
minPointAABB.getZ() + k * l
);
Vector3 maxInnerAABB(
minInnerAABB.getX() + l,
minInnerAABB.getY() + l,
minInnerAABB.getZ() + l
);
// Add the inner AABB points to a container for later use.
}
}
}
Alternatively, you can have one huge loop over the cube of your partition count and sort out the indices by division and remainder operations inside the loop, which is a bit messy for three dimensions.
It might also be a good idea to make the code more general by calculating three independent sub-box lengths for each dimension based on the side lengths of the original box.
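For instance, a sketch of that generalisation with one sub-box length per axis (maxPointAABB is assumed here; the original snippet only computes minPointAABB):
// Per-axis sub-box lengths, computed from the parent AABB's extents.
float lx = (maxPointAABB.getX() - minPointAABB.getX()) / NUMBER_OF_PARTITIONS;
float ly = (maxPointAABB.getY() - minPointAABB.getY()) / NUMBER_OF_PARTITIONS;
float lz = (maxPointAABB.getZ() - minPointAABB.getZ()) / NUMBER_OF_PARTITIONS;
for (int k = 0; k < NUMBER_OF_PARTITIONS; k++)
    for (int j = 0; j < NUMBER_OF_PARTITIONS; j++)
        for (int i = 0; i < NUMBER_OF_PARTITIONS; i++)
        {
            Vector3 minInnerAABB(
                minPointAABB.getX() + i * lx,
                minPointAABB.getY() + j * ly,
                minPointAABB.getZ() + k * lz);
            Vector3 maxInnerAABB(
                minInnerAABB.getX() + lx,
                minInnerAABB.getY() + ly,
                minInnerAABB.getZ() + lz);
            // Add the inner AABB points to a container for later use.
        }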

C++ Matrix Multiplication - understanding the logic behind an optimized method for it

I've recently been reading matrix tutorials with OpenGL and stumbled upon an optimized method for matrix multiplication that I cannot understand.
//Create an alias type for a Matrix Type
typedef struct Matrix
{
float m[16];
} Matrix;
//default matrix
static const Matrix IDENTITY_MATRIX = { {
1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1
} };
Matrix MultiplyMatrices(const Matrix* m1, const Matrix* m2)
{
Matrix out = IDENTITY_MATRIX;
unsigned int row, column, row_offset;
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
for (column = 0; column < 4; ++column)
out.m[row_offset + column] =
(m1->m[row_offset + 0] * m2->m[column + 0]) +
(m1->m[row_offset + 1] * m2->m[column + 4]) +
(m1->m[row_offset + 2] * m2->m[column + 8]) +
(m1->m[row_offset + 3] * m2->m[column + 12]);
return out;
}
These are the questions I have:
In the method MultiplyMatrices, why is there a pointer to params m1 and m2? If you're just copying their values and returning a new matrix, why use a pointer?
Why is the for loop condition identical to its increment?
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
The MultiplyMatrices function calculates the product of two matrices. So that's why you need two matrices as the input arguments of this function. Note that the definition of the matrix
typedef struct Matrix
{
float m[16];
} Matrix;
defines a 4 by 4 matrix with a 1-D array, so the offset is 4 for each row. This is just to simulate a 2-D matrix with a 1-D array. You need to pass in pointers to the two input matrices so that you can read their element values inside the function.
The reason why you see two identical statements in the for loop is:
for (row = 0, row_offset = row * 4; row < 4; ++row, row_offset = row * 4)
Initially row_offset is set to 0. As the loop goes through each row of the matrix, row_offset increases with row. This is because in the 1-D array representation of a 2-D matrix, the element a[i][j] can be written as:
a[i][j] = a[i*num_col + j]
and here num_col is 4. So the two statements play different roles: the first initializes row_offset, and the second updates it each time the row index increases by 1.
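In other words, the four-term sum in MultiplyMatrices is just the innermost k loop of the textbook triple loop written out by hand. An equivalent, unoptimized version (the name MultiplyMatricesLoop is mine) makes that explicit:
Matrix MultiplyMatricesLoop(const Matrix* m1, const Matrix* m2)
{
    Matrix out = IDENTITY_MATRIX;
    for (unsigned int row = 0; row < 4; ++row)
        for (unsigned int column = 0; column < 4; ++column)
        {
            float sum = 0.0f;
            for (unsigned int k = 0; k < 4; ++k)
                // m1[row][k] * m2[k][column], both stored row-major in a 1-D array
                sum += m1->m[row * 4 + k] * m2->m[k * 4 + column];
            out.m[row * 4 + column] = sum;
        }
    return out;
}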
In the method MultiplyMatrices why is there a pointer to m1 and m2? If you're just copying their values why use a pointer?
Maybe I don't understand your question, but how would you propose to do it differently? You're writing the product into a third memory location, out, which is the product of m1 and m2; passing m1 and m2 by pointer avoids copying 16 floats per input just to read them, so this is the most efficient way.
Why is the for loop condition identical to its increment?
It's not: the ++row increments row before the row_offset assignment on each iteration. The "condition" is only row < 4; the other two parts are the initialization and the increment - maybe that's the confusion.

Coin flipping game: Optimization problem

There is a rectangular grid of coins, with heads being represented by the value 1 and tails being represented by the value 0. You represent this using a 2D integer array table (between 1 and 10 rows/columns, inclusive).
In each move, you choose any single cell (R, C) in the grid (R-th row, C-th column) and flip the coins in all cells (r, c), where r is between 0 and R, inclusive, and c is between 0 and C, inclusive. Flipping a coin means inverting the value of a cell from zero to one or one to zero.
Return the minimum number of moves required to change all the cells in the grid to tails. This will always be possible.
Examples:
1111
1111
returns: 1
01
01
returns: 2
010101011010000101010101
returns: 20
000
000
001
011
returns: 6
This is what i tried:
Since the order of flipping doesn't matter, and making the same move twice is the same as not making it at all, we can just find all distinct combinations of coins to make a move on, and minimize the size of the good combinations (good meaning those that give all tails).
This can be done by making a set consisting of all coins, each represented by an index (i.e. if there were 20 coins in all, this set would contain 20 elements, indexed 1 to 20). Then make all possible subsets and see which of them give the answer (i.e. making a move on each coin in the subset gives all tails). Finally, minimize the size of the good combinations.
I don't know if I've been able to express myself clearly enough... I'll post the code if you want.
Anyway, this method is too time-consuming and wasteful, and infeasible for more than about 20 coins (in my code).
How to go about this?
I think a greedy algorithm suffices, with one step per coin.
Every move flips a rectangular subset of the board. Some coins are included in more subsets than others: the coin at (0,0) upper-left is in every subset, and the coin at lower-right is in only one subset, namely the one which includes every coin.
So, choosing the first move is obvious: flip every coin if the lower-right corner must be flipped. Eliminate that possible move.
Now, the lower-right coin's immediate neighbors, left and above, can only potentially be flipped by a single remaining move. So, if that move must be performed, do it. The order of evaluation of the neighbors doesn't matter, since they aren't really alternatives to each other. However, a raster pattern should suffice.
Repeat until finished.
Here is a C++ program:
#include <iostream>
#include <valarray>
#include <cstdlib>
#include <ctime>
using namespace std;
void print_board( valarray<bool> const &board, size_t cols ) {
for ( size_t i = 0; i < board.size(); ++ i ) {
cout << board[i] << " ";
if ( i % cols == cols-1 ) cout << endl;
}
cout << endl;
}
int main() {
srand( time(NULL) );
int const rows = 5, cols = 5;
valarray<bool> board( false, rows * cols );
for ( size_t i = 0; i < board.size(); ++ i ) board[i] = rand() % 2;
print_board( board, cols );
int taken_moves = 0;
for ( size_t i = board.size(); i > 0; ) {
if ( ! board[ -- i ] ) continue;
size_t sizes[] = { i%cols +1, i/cols +1 }, strides[] = { 1, cols };
gslice cur_move( 0, valarray<size_t>( sizes, 2 ),
valarray<size_t>( strides, 2 ) );
board[ cur_move ] ^= valarray<bool>( true, sizes[0] * sizes[1] );
cout << sizes[1] << ", " << sizes[0] << endl;
print_board( board, cols );
++ taken_moves;
}
cout << taken_moves << endl;
}
Not C++. I agree with @Potatoswatter that the optimal solution is greedy, but I wondered whether a linear Diophantine system would also work. This Mathematica function does it:
f[ei_] := (
xdim = Dimensions[ei][[1]];
ydim = Dimensions[ei][[2]];
(* Construct XOR matrixes. These are the base elements representing the
possible moves *)
For[i = 1, i < xdim + 1, i++,
For[j = 1, j < ydim + 1, j++,
b[i, j] = Table[If[k <= i && l <= j, -1, 0], {k, 1, xdim}, {l, 1, ydim}]
]
];
(*Construct Expected result matrix*)
Table[rv[i, j] = -1, {i, 1, xdim}, {j, 1, ydim}];
(*Construct Initial State matrix*)
Table[eiv[i, j] = ei[[i, j]], {i, 1, xdim}, {j, 1, ydim}];
(*Now Solve*)
repl = FindInstance[
Flatten[Table[(Sum[a[i, j] b[i, j], {i, 1, xdim}, {j, 1, ydim}][[i]][[j]])
eiv[i, j] == rv[i, j], {i, 1, xdim}, {j, 1, ydim}]],
Flatten[Table[a[i, j], {i, 1, xdim}, {j, 1, ydim}]]][[1]];
Table[c[i, j] = a[i, j] /. repl, {i, 1, xdim}, {j, 1, ydim}];
Print["Result ",xdim ydim-Count[Table[c[i, j], {i, 1, xdim}, {j, 1,ydim}], 0, ydim xdim]];)
When called with your examples (-1 instead of 0)
ei = ({
{1, 1, 1, 1},
{1, 1, 1, 1}
});
f[ei];
ei = ({
{-1, 1},
{-1, 1}
});
f[ei];
ei = {{-1, 1, -1, 1, -1, 1, -1, 1, 1, -1, 1, -1, -1, -1, -1, 1, -1,
1, -1, 1, -1, 1, -1, 1}};
f[ei];
ei = ({
{-1, -1, -1},
{-1, -1, -1},
{-1, -1, 1},
{-1, 1, 1}
});
f[ei];
The result is
Result :1
Result :2
Result :20
Result :6
Or :)
Solves a 20x20 random problem in 90 seconds on my poor man's laptop.
Basically, you're taking the N+M-1 coins in the right and bottom borders and solving them, then just calling the algorithm recursively on everything else. This is basically what Potatoswatter is saying to do. Below is a very simple recursive algorithm for it.
Solver(Grid[N][M])
if Grid[N-1][M-1] == Heads
Flip(Grid,N-1,M-1)
for each element i from N-2 to 0 inclusive //This is empty if N is 1
If Grid[i][M-1] == Heads
Flip(Grid,i,M-1)
for each element i from M-2 to 0 inclusive //This is empty if M is 1
If Grid[N-1][i] == Heads
Flip(Grid,N-1,i)
if N>1 and M > 1:
Solver(Grid.ShallowCopy(N-1, M-1))
return;
Note: It probably makes sense to implement Grid.ShallowCopy by just having Solver have arguments for the width and the height of the Grid. I only called it Grid.ShallowCopy to indicate that you should not be passing in a copy of the grid, though C++ won't do that with arrays by default anyhow.
An easy criterion for whether rectangle(x,y) should be flipped seems to be: exactly when the number of ones in the 2x2 square with top-left corner (x,y) is odd.
(code in Python)
def flipgame(grid):
w, h = len(grid[0]), len(grid)
sol = [[0]*w for y in range(h)]
for y in range(h-1):
for x in range(w-1):
sol[y][x] = grid[y][x] ^ grid[y][x+1] ^ grid[y+1][x] ^ grid[y+1][x+1]
for y in range(h-1):
sol[y][w-1] = grid[y][w-1] ^ grid[y+1][w-1]
for x in range(w-1):
sol[h-1][x] = grid[h-1][x] ^ grid[h-1][x+1]
sol[h-1][w-1] = grid[h-1][w-1]
return sol
The 2D array returned has a 1 in position (x,y) if rectangle(x,y) should be flipped, so the number of ones in it is the answer to your original question.
EDIT: To see why it works:
If we do moves (x,y), (x,y-1), (x-1,y), (x-1,y-1), only square (x,y) is inverted. This leads to the code above. The solution must be optimal, as there are 2^(hw) possible configurations of the board and 2^(hw) possible ways to transform the board (assuming every move can be done 0 or 1 times). In other words, there is only one solution, hence the above produces the optimal one.
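If you want this in C++ rather than Python, here is a minimal translation of the same parity rule (the function and variable names are mine) that just counts the required moves:
#include <iostream>
#include <vector>

// Cell (x,y) needs a move exactly when the number of heads in the 2x2 square
// with top-left corner (x,y) is odd; cells past the right/bottom edge count as tails.
int countMoves(const std::vector<std::vector<int>>& grid)
{
    int h = grid.size(), w = grid[0].size();
    int moves = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
        {
            int parity = grid[y][x];
            if (x + 1 < w)              parity ^= grid[y][x + 1];
            if (y + 1 < h)              parity ^= grid[y + 1][x];
            if (x + 1 < w && y + 1 < h) parity ^= grid[y + 1][x + 1];
            moves += parity;
        }
    return moves;
}

int main()
{
    std::vector<std::vector<int>> grid = { { 1, 1, 1, 1 },
                                           { 1, 1, 1, 1 } };
    std::cout << countMoves(grid) << "\n";   // prints 1, matching the first example
}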
You could use recursive trials.
You would need at least the move count and to pass a copy of the vector. You'd also want to set a maximum-move cutoff to limit the breadth of branches coming out of each node of the search tree. Note this is a "brute force" approach.
Your general algorithm structure would be:
const int MAX_FLIPS=10;
const unsigned int TREE_BREADTH=10;
int run_recursion(std::vector<std::vector<bool>> my_grid, int flips)
{
bool found = true;
int temp_val = -1;
int result = -1;
//Search the grid with for loops; if any cell is still true (heads), set found = false;
...
if ( ! found && flips < MAX_FLIPS )
{
//flip coin.
for ( unsigned int more_flips=0; more_flips < TREE_BREADTH; more_flips++ )
{
//flip one coin
...
//run recursion
temp_val = run_recursion(my_grid, flips + 1);
if ( (result == -1 && temp_val != -1) ||
(temp_val != -1 && temp_val < result) )
result = temp_val;
}
}
return result;
}
...sorry in advance for any typos/minor syntax errors. Wanted to prototype a fast solution for you, not write the full code...
Or easier still, you could just use a brute force of linear trials: the outer for loop would be the number of trials, and the inner for loop would be the flips within a trial. On each iteration you'd flip and check whether you'd succeeded, recycling your success and flip code from above. Success would short-circuit the inner loop. At the end of the inner loop, store the result in an array; if there is still no success after max_moves flips, store -1. Then search for the smallest non-negative value.
A more elegant solution would be to use a multithreading library to start a bunch of threads flipping, and have one thread signal to others when it finds a match, and if the match is lower than the # of steps run thus far in another thread, that thread exits with failure.
I suggest MPI, but CUDA might win you brownie points as it's hot right now.
Hope that helps, good luck!