I am trying to test the LAPACK method CGESV, but I am encountering an issue. I want to reuse my 'A' matrix in other parts of my code, but it changes when I pass it into the method. The definition of 'A':
(input/output) COMPLEX array, dimension (LDA,N)
On entry, the N-by-N coefficient matrix A.
On exit, the factors L and U from the factorization
A = P*L*U; the unit diagonal elements of L are not stored.
Is there a way to keep the value of A after passing it into CGESV short of creating a temp variable to store the value?
As you already noticed, A is overwritten with the L and U factors of its P*L*U decomposition. If the matrix is not too big, you can copy A and pass the copy to CGESV instead (the call below treats A as N*N contiguous elements, so it assumes LDA = N):
CALL CCOPY(N*N, A, 1, A_NEW, 1)
If the matrix is so big that you cannot keep two copies of it in memory, you can perform the math operations with the factored matrix instead. For example, to compute y = A*x = P*L*U*x:
* y = x
CALL CCOPY(N, X, 1, Y, 1)
* y = U * y
CALL CTRMV('Upper', 'No transpose', 'Non-unit', N, A, N, Y, 1)
* y = L * y
CALL CTRMV('Lower', 'No transpose', 'Unit', N, A, N, Y, 1)
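* y = P * y (A is complex, so use CLASWP; INCX = -1 applies the
* interchanges in reverse order, which is what multiplying by P needs)
CALL CLASWP( 1, Y, N, 1, N, IPIV, -1 )
The only additional memory needed is the integer array IPIV of size N, which CGESV fills in anyway.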
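The same idea can be checked without LAPACK. The following is only a minimal C++ sketch under stated assumptions: real numbers instead of complex, 0-based pivots, a nonsingular matrix, and hand-written loops standing in for the factorization, CTRMV and CLASWP. The names `lu_in_place` and `apply_plu` are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Column-major n-by-n matrix: element (i, j) lives at a[i + j * n].
using Matrix = std::vector<double>;

// In-place LU with partial pivoting, leaving the same layout xGETRF does:
// U in the upper triangle, the multipliers of L below the diagonal, and
// piv[k] = row swapped with row k (0-based here, 1-based in LAPACK).
// Assumes the matrix is nonsingular.
std::vector<int> lu_in_place(Matrix& a, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    std::vector<int> piv(n);
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        piv[k] = p;
        if (p != k)
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
        for (int i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (int j = k + 1; j < n; ++j)
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
        }
    }
    return piv;
}

// y = P * L * U * x, using only the factored storage and the pivots.
std::vector<double> apply_plu(const Matrix& lu, const std::vector<int>& piv,
                              const std::vector<double>& x, int n) {
    std::vector<double> y(n);
    // y = U * x (upper triangle, non-unit diagonal)
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = i; j < n; ++j) s += lu[i + j * n] * x[j];
        y[i] = s;
    }
    // y = L * y (unit diagonal); bottom-up so only old entries are read
    for (int i = n - 1; i >= 0; --i)
        for (int j = 0; j < i; ++j) y[i] += lu[i + j * n] * y[j];
    // y = P * y: replay the row interchanges in reverse order
    for (int k = n - 1; k >= 0; --k) std::swap(y[k], y[piv[k]]);
    return y;
}
```

Factoring a copy of a small matrix with `lu_in_place` and comparing `apply_plu` against a direct matrix-vector product is a cheap way to convince yourself that the order of the three steps, and the direction of the pivot loop, are right.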
The routines do their work in-place, so the only way to keep the original array is to make a copy.
I am new to OpenVX. From the documentation I learned that OpenVX uses row-major storage, and the matrix access example illustrates it: it is the ordinary row-major access pattern we use in plain C code.
Then I went to the vx_matrix and vxCreateMatrix documentation pages. The former has these statements:
VX_MATRIX_ROWS - The M dimension of the matrix [REQ-1131]. Read-only [REQ-1132]. Use a vx_size parameter.
VX_MATRIX_COLUMNS - The N dimension of the matrix [REQ-1133]. Read-only [REQ-1134]. Use a vx_size parameter.
While the latter says:
vx_matrix vxCreateMatrix(
vx_context c,
vx_enum data_type,
vx_size columns,
vx_size rows);
So, as I understand it, in the OpenVX world an MxN matrix has M rows and N columns. And the vxCreateMatrix declaration just follows the row-major convention: the column count comes first, then the row count.
However, I got really confused when I reached the Warp Affine page, which says:
This kernel performs an affine transform with a 2x3 Matrix M with this method of pixel coordinate translation [REQ-0498]:
And the C declaration:
// x0 = a x + b y + c;
// y0 = d x + e y + f;
vx_float32 mat[3][2] = {
{a, d}, // 'x' coefficients
{b, e}, // 'y' coefficients
{c, f}, // 'offsets'
};
vx_matrix matrix = vxCreateMatrix(context, VX_TYPE_FLOAT32, 2, 3);
vxCopyMatrix(matrix, mat, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST);
If M is a 2x3 matrix, then according to the previous section it should have 2 rows and 3 columns. So why is it declared as mat[3][2], and why does vxCreateMatrix receive columns=2 and rows=3 as arguments? Is my understanding completely wrong?
This would be a good start and help for your implementation
https://software.intel.com/content/www/us/en/develop/documentation/sample-color-copy/top/color-copy-pipeline/color-copy-pipeline-the-scan-pre-process-openvx-graph.html
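For what it is worth, the memory layout of the array in the question can be checked in plain C++ without OpenVX. The sketch below only shows how a `mat[3][2]` array is laid out and how the commented formulas read from it; it does not settle what the specification means by "2x3". The names `flat_index`, `Point` and `warp` are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Row-major layout: element (r, c) of a matrix with `columns` columns
// sits at flat index r * columns + c.
inline std::size_t flat_index(std::size_t r, std::size_t c, std::size_t columns) {
    assert(c < columns);
    return r * columns + c;
}

struct Point {
    float x;
    float y;
};

// Apply x0 = a x + b y + c, y0 = d x + e y + f using the mat[3][2] layout
// from the question: column 0 holds (a, b, c), column 1 holds (d, e, f).
inline Point warp(const float mat[3][2], Point p) {
    return Point{mat[0][0] * p.x + mat[1][0] * p.y + mat[2][0],
                 mat[0][1] * p.x + mat[1][1] * p.y + mat[2][1]};
}
```

Seen this way, the array passed to vxCopyMatrix has 3 rows of 2 columns in memory, which is at least consistent with the `columns = 2, rows = 3` arguments given to vxCreateMatrix.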
I have to store matrices where the non-zero elements are arranged in a "chess table"-like pattern (1,1), (1,3), (2,2), (2,4), etc. I can't store the zero elements, but I also need to implement addition and multiplication.
The elements are stored in this vector:
std::vector<std::vector<int>> _v;
And I have
size_t n,m
for their size, so printing and addition are fairly straightforward.
Where I run into problems is multiplication.
As an example, multiplying [ [1,0,3],[0,2,0] ] and [ [2,0],[0,1], [1,0] ] would result in [ [5,0],[0,2] ]. The way I'm storing the first matrix is [ [1],[2],[3] ] and the second [ [2,1], [1] ]. Is storing these matrices this way fundamentally wrong? If so, what would be a proper way to store these so I can multiply them?
I would strongly recommend:
Storing data in a simple one-dimensional valarray or vector of integers. The size of the vector would be "(width * height + 1) / 2", since you won't store every second element (the "+ 1" covers the case where both dimensions are odd).
Implementing your own get/set methods for accessing items. In these methods, you'll perform the transformation from x, y to index. And you'll get or set the value only if x + y is an even number, because you know that every other item is zero. Example:
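class CheckeredMatrix
{
    // ...
public:
    void set(int x, int y, int value)
    {
        // Only cells with x + y even are stored; all others are implicitly zero.
        // For those cells, i / 2 is a unique slot whether _width is odd or even.
        int i = y * _width + x;
        if ((x + y) % 2 == 0)
            _data[i / 2] = value;
    }
};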
Implementing operations add(), sub(), mul() using
only these get/set methods. Then you don't need to modify algorithms
in any way. Since you know that every second item is zero, you can
then optimize the algorithms by skipping every second step, but it
will work without it.
Wrapping everything in a class.
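Putting the points above together, a minimal sketch could look like the following. It assumes 0-based indices with the stored cells at row + column even, and it uses the fact that the product of two such matrices has the same pattern. The member and function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CheckeredMatrix {
public:
    CheckeredMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_((rows * cols + 1) / 2, 0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Cells with r + c odd are never stored and always read as zero.
    int get(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return stored(r, c) ? data_[(r * cols_ + c) / 2] : 0;
    }

    // Writes to cells with r + c odd are silently ignored.
    void set(std::size_t r, std::size_t c, int value) {
        assert(r < rows_ && c < cols_);
        if (stored(r, c)) data_[(r * cols_ + c) / 2] = value;
    }

private:
    static bool stored(std::size_t r, std::size_t c) { return (r + c) % 2 == 0; }

    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};

// Product written only in terms of get/set. Entry (i, j) can be nonzero
// only when i + j is even, and only k with the parity of i contributes,
// so both inner loops step by 2.
CheckeredMatrix multiply(const CheckeredMatrix& a, const CheckeredMatrix& b) {
    assert(a.cols() == b.rows());
    CheckeredMatrix c(a.rows(), b.cols());
    for (std::size_t i = 0; i < a.rows(); ++i) {
        for (std::size_t j = i % 2; j < b.cols(); j += 2) {
            int sum = 0;
            for (std::size_t k = i % 2; k < a.cols(); k += 2)
                sum += a.get(i, k) * b.get(k, j);
            c.set(i, j, sum);
        }
    }
    return c;
}
```

With the 2x3 and 3x2 matrices from the question, this yields 5 at (0, 0) and 2 at (1, 1), with zeros elsewhere.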
I suggest representing your matrix as 2 matrices, one with (2*i+1, 2*j+1) coordinates, the other with (2*i, 2*j).
So
a 0 b 0 c
0 d 0 e 0
f 0 g 0 h
->
a b c
f g h
and
d e
Those 2 matrices allow some regular matrix operations (sum, multiplication, transposition).
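As a rough sketch of why this works for multiplication: with 0-based indices, an entry of the product at an even row and even column only picks up terms from even inner indices, and likewise for odd, so the two small matrices can be multiplied independently. The code below assumes that split. The type and function names are invented here, and degenerate empty blocks are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dense = std::vector<std::vector<int>>;

// Plain dense product; assumes a's column count equals b's row count.
Dense multiply_dense(const Dense& a, const Dense& b) {
    const std::size_t inner = b.size();
    const std::size_t cols = b.empty() ? 0 : b[0].size();
    Dense c(a.size(), std::vector<int>(cols, 0));
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].size() == inner);
        for (std::size_t k = 0; k < inner; ++k)
            for (std::size_t j = 0; j < cols; ++j)
                c[i][j] += a[i][k] * b[k][j];
    }
    return c;
}

// `even` holds the entries at (even row, even column),
// `odd` holds the entries at (odd row, odd column).
struct SplitMatrix {
    Dense even;
    Dense odd;
};

// The checkered product splits into two independent dense products.
SplitMatrix multiply_split(const SplitMatrix& a, const SplitMatrix& b) {
    return SplitMatrix{multiply_dense(a.even, b.even),
                       multiply_dense(a.odd, b.odd)};
}
```

For the example in the question, the even parts are [[1, 3]] and [[2], [1]], the odd parts are [[2]] and [[1]], and the two products are [[5]] and [[2]].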
I want to use LAPACK to calculate Q * x and Q^T * x, where Q comes from the reduced QR factorization of an m by n matrix A (m > n), stored in the form of Householder reflectors and a vector tau, as obtained from DGEQRF, and x is a vector of length n in the case of Q * x and length m in the case of Q^T * x.
The documentation of DORMQR states that x is overwritten with the result, which already confuses me, since x and Q * x obviously have different dimensions if the original matrix A, and subsequently its reduced Q, are not square. Furthermore it states that
"Q is of order M if SIDE = 'L' and of order N if SIDE = 'R'."
In my case, only the first half applies and M refers to the length of x. What do they mean by order? I have rarely ever heard the term "order" in the context of non-square matrices, and if so, it would be something like m by n, and not just a single number. Do they mean rank?
Can I even use DORMQR to calculate both Q * x and Q^T * x for a non-square Q, or is it not designed for this? Do I need to pad x with zeros?
DORMQR applies only to a square matrix Q ("order M" simply means that Q is M-by-M). Although the input A to the routine holds elementary reflectors, such as the output of DGEQRF, which can come from a non-square matrix, the documentation adds the restriction that Q "is a real orthogonal matrix".
Of course, to be orthogonal, Q must be square.
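On the padding question, one way to see it is to split the full m-by-m orthogonal factor into the reduced factor and the rest. The symbols Q_1 and Q_2 are introduced here for illustration and are not LAPACK names; Q_1 is the m-by-n reduced factor.

```latex
Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \qquad
Q_1 x = Q \begin{bmatrix} x \\ 0 \end{bmatrix} \quad (x \in \mathbb{R}^{n}), \qquad
Q_1^{T} x = \left( Q^{T} x \right)_{1:n} \quad (x \in \mathbb{R}^{m}).
```

Under that reading, Q_1 * x can be obtained by padding x with m - n zeros to length m and calling DORMQR with SIDE = 'L' and TRANS = 'N', and Q_1^T * x by calling it with TRANS = 'T' on the length-m vector and keeping only the first n entries of the result.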
In C++ interface of SuiteSparse, I can use
SuiteSparseQR_factorization <double> *QR;
QR = SuiteSparseQR_factorize(A) ;
to calculate the QR decomposition of matrix A so that I can reuse QR for further calculation. But can I get the actual Q and R directly from this QR object?
SuiteSparse is awesome, but the interface can be confusing. Unfortunately, the methods that involve the SuiteSparseQR_factorization struct, which appear to be the most convenient, haven't worked so well for me in practice. For instance, using SuiteSparseQR_factorize and then SuiteSparseQR_qmult with a sparse matrix input argument actually converts it to a dense matrix first, which seems completely unnecessary!
Instead, use
template <typename Entry> SuiteSparse_long SuiteSparseQR
(
// inputs, not modified
int ordering, // all, except 3:given treated as 0:fixed
double tol, // only accept singletons above tol
SuiteSparse_long econ, // number of rows of C and R to return; a value
// less than the rank r of A is treated as r, and
// a value greater than m is treated as m.
int getCTX, // if 0: return Z = C of size econ-by-bncols
// if 1: return Z = C' of size bncols-by-econ
// if 2: return Z = X of size econ-by-bncols
cholmod_sparse *A, // m-by-n sparse matrix
// B is either sparse or dense. If Bsparse is non-NULL, B is sparse and
// Bdense is ignored. If Bsparse is NULL and Bdense is non-NULL, then B is
// dense. B is not present if both are NULL.
cholmod_sparse *Bsparse,
cholmod_dense *Bdense,
// output arrays, neither allocated nor defined on input.
// Z is the matrix C, C', or X
cholmod_sparse **Zsparse,
cholmod_dense **Zdense,
cholmod_sparse **R, // the R factor
SuiteSparse_long **E, // size n; fill-reducing ordering of A.
cholmod_sparse **H, // the Householder vectors (m-by-nh)
SuiteSparse_long **HPinv,// size m; row permutation for H
cholmod_dense **HTau, // size nh, Householder coefficients
// workspace and parameters
cholmod_common *cc
) ;
This method will perform the factorization and then, optionally, output (among other things) R, the matrix product Z = Q^T * B (or its transpose -- B^T * Q), or the solution of a linear system. To get Q, define B as the identity matrix. Here's an example to get Q and R.
cholmod_common Common, * cc;
cc = &Common;
cholmod_l_start(cc);
cholmod_sparse *A;//assume you have already defined this
int ordering = SPQR_ORDERING_BEST;
double tol = 0;
SuiteSparse_long econ = A->nrow;// same type as in the signature above
int getCTX = 1;// Z = (Q^T * B)^T = B^T * Q
cholmod_sparse *B = cholmod_l_speye(A->nrow, A->nrow, CHOLMOD_REAL, cc);//the identity matrix
cholmod_sparse *Q, *R;//output pointers to the Q and R sparse matrices
SuiteSparseQR<double>(ordering, tol, econ, getCTX, A, B, NULL, &Q, NULL, &R, NULL, NULL, NULL, NULL, cc);
If you want any of the other outputs so that you can perform subsequent operations without explicitly forming Q and/or R, replace the corresponding NULLs with additional pointers and then make calls to SuiteSparseQR_qmult.
Is there a function in LAPACK that will give me the elements of a particular submatrix? If so, what is the syntax in C++?
Or do I need to code it up?
There is no function for accessing a submatrix. However, because of the way matrix data is stored in LAPACK routines, you don't need one. This saves a lot of copying, and the data layout was (partially) chosen for this reason:
Recall that a dense (i.e., not banded, triangular, hermitian, etc) matrix in LAPACK is defined by four values:
a pointer to the top left corner of the matrix
the number of rows in the matrix
the number of columns in the matrix
the "leading dimension" of the matrix; typically this is the distance in memory between adjacent elements of a row.
Most of the time, most people only ever use a leading dimension that is equal to the number of rows; a 3x3 matrix is typically stored like so:
a[0] a[3] a[6]
a[1] a[4] a[7]
a[2] a[5] a[8]
Suppose instead that we wanted a 3x3 submatrix of a huge matrix with leading dimension lda. Suppose we specifically want the 3x3 submatrix whose top-left corner is located at a(15,42):
. . .
. . .
... a[15+42*lda] a[15+43*lda] a[15+44*lda] ...
... a[16+42*lda] a[16+43*lda] a[16+44*lda] ...
... a[17+42*lda] a[17+43*lda] a[17+44*lda] ...
. . .
. . .
We could copy this 3x3 matrix into contiguous storage, but if we want to pass it as an input (or output) matrix to an LAPACK routine, we don't need to; we only need to define the parameters appropriately. Let's call this submatrix b; we then define:
// pointer to the top-left corner of b:
float *b = &a[15 + 42*lda];
// number of rows in b:
const int nb = 3;
// number of columns in b:
const int mb = 3;
// leading dimension of b:
const int ldb = lda;
The only thing that might be surprising is the value of ldb; by using the value lda of the "big matrix", we can address the submatrix without copying, and operate on it in-place.
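To make the addressing concrete without calling LAPACK, here is a small sketch of a routine that takes a matrix the same way LAPACK routines do (pointer, row count, column count, leading dimension) and works on a block in place. The function `scale_block` is hypothetical and only stands in for a real LAPACK or BLAS routine.

```cpp
#include <cassert>
#include <cstddef>

// Scale every entry of an m-by-n column-major block in place.
// `ld` is the leading dimension of the array that contains the block,
// i.e. the distance in memory between b(i, j) and b(i, j + 1).
void scale_block(float* b, int m, int n, int ld, float alpha) {
    assert(ld >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            b[i + static_cast<std::size_t>(j) * ld] *= alpha;
}
```

Calling `scale_block(&a[15 + 42*lda], 3, 3, lda, 2.0f)` would then double exactly the nine entries of the submatrix and leave the rest of the big matrix untouched.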
However
I lied (sort of). Sometimes you really can't operate on a submatrix in place, and genuinely need to copy it. I didn't want to talk about that, because it's rare, and you should use in-place operations whenever possible, but I would feel bad not telling you that it is possible. The routine:
SLACPY(UPLO,M,N,A,LDA,B,LDB)
copies the MxN matrix whose top-left corner is A and is stored with leading dimension LDA to the MxN matrix whose top-left corner is B and has leading dimension LDB. The UPLO parameter indicates whether to copy the upper triangle, lower triangle, or the whole matrix.
In the example I gave above, you would use it like this (assuming the clapack bindings):
...
const int m = 3;
const int n = 3;
float b[9];
const int ldb = 3;
slacpy("A", // anything except "U" or "L" means "copy everything"
&m, // number of rows to copy
&n, // number of columns to copy
&a[15 + 42*lda], // pointer to top-left element to copy
&lda, // leading dimension of a (something huge); passed by pointer, like m and n
b, // pointer to top-left element of destination
&ldb); // leading dimension of b (== m, so storage is dense)
...
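For readers without the bindings at hand, the effect of the "copy everything" case can be sketched in a few lines of C++. This is an illustration of what the copy does, not the LAPACK implementation, and `copy_block` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>

// Copy an m-by-n column-major block stored with leading dimension lda
// into a destination stored with leading dimension ldb.
void copy_block(int m, int n, const float* a, int lda, float* b, int ldb) {
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            b[i + static_cast<std::size_t>(j) * ldb] =
                a[i + static_cast<std::size_t>(j) * lda];
}
```

With `ldb == m` the destination is densely packed, which is the situation in the snippet above.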