I'm trying to understand the documentation for sgemm as I am transitioning code from using this library to a different library.
The function prototype is
sgemm ( character TRANSA,
character TRANSB,
integer M,
integer N,
integer K,
real ALPHA,
real, dimension(lda,*) A,
integer LDA,
real, dimension(ldb,*) B,
integer LDB,
real BETA,
real, dimension(ldc,*) C,
integer LDC
)
I am having trouble understanding the role of LDA and LDB. The documentation says
LDA is INTEGER
On entry, LDA specifies the first dimension of A as declared
in the calling (sub) program. When TRANSA = 'N' or 'n' then
LDA must be at least max( 1, m ), otherwise LDA must be at
least max( 1, k ).
What does it mean that it specifies the first dimension of A? Is this like switching between row and column major? Or is this slicing the tensor?
LD stands for leading dimension. BLAS is originally a library of Fortran 77 subroutines and in Fortran matrices are stored column-wise: A(i,j) is immediately followed in memory by A(i+1,j), which is opposite of C/C++ where a[i][j] is followed by a[i][j+1]. In order to access element A(i,j) of a matrix that has dimensions A(LDA,*) (which reads as LDA rows and an unspecified number of columns), you need to look (j-1)*LDA + (i-1) elements from the beginning of the matrix (Fortran arrays are 1-indexed by default), therefore you need to know the value of LDA. You don't need to know the actual number of columns, therefore the * in the dummy argument.
It is the same in C/C++. If you have a 2D array declared as a[something][LDA], then element a[i][j] is located i*LDA + j positions after the start of the array, and you only need to know LDA - the value of something does not affect the calculation of the address of a[i][j].
Although GEMM operates on an M x K matrix A, the actual data may be embedded in a bigger matrix that is LDA x L, where LDA >= M and L >= K, therefore the LDA is specified explicitly. The same applies to LDB and LDC.
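To make the role of the leading dimension concrete, here is a minimal C++ sketch with 0-based indices. The names colmajor_offset and submatrix_sum are made up for illustration and are not part of BLAS; the sketch only assumes column-major storage with lda allocated rows per column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (i, j), 0-based, in column-major storage whose
// leading dimension (allocated row count) is lda.
inline std::size_t colmajor_offset(std::size_t i, std::size_t j, std::size_t lda) {
    assert(i < lda);  // the row index must fit inside one allocated column
    return j * lda + i;
}

// Sum the entries of an m x k sub-matrix whose top-left corner sits at
// (row0, col0) of a larger column-major matrix with leading dimension lda.
// The pointer a plays the role of the A argument one would hand to GEMM,
// and lda plays the role of LDA.
inline float submatrix_sum(const std::vector<float>& big, std::size_t lda,
                           std::size_t row0, std::size_t col0,
                           std::size_t m, std::size_t k) {
    const float* a = big.data() + colmajor_offset(row0, col0, lda);
    float sum = 0.0f;
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t i = 0; i < m; ++i)
            sum += a[j * lda + i];  // stride between columns is lda, not m
    return sum;
}
```

The point of the sketch is that the stride from one column to the next is lda regardless of how many rows m the sub-matrix uses, which is why GEMM cannot infer it from M or K.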
BLAS was developed many years ago when computer programming was quite different than what it is today. Memory management, in particular, was not as flexible as it is nowadays. Allocating one big matrix and then using and reusing portions of it to store smaller matrices was the norm. Also, GEMM is extensively used in, e.g., iterative algorithms that work on various sub-matrices and it is faster to keep the data in the original matrix and just specify the sub-matrix location and dimension, so you need to provide both dimensions.
Starting with Fortran 90, the language has array slicing and automatic array descriptors that allow one to discover both the dimensions of a slice and those of the bigger matrix, so if GEMM were written in Fortran 90 or later, it wouldn't need to be that verbose with respect to its arguments. But even if that were the case, C doesn't have array descriptors, so you would still have to provide all those arguments in order to make GEMM callable from C. In C++, one can hide the descriptor inside a matrix class, and many math libraries actually do so (for example, Scythe).
I have a huge m by 1 array (m is very large) called X which is a result of Fortran matmul operation. My problem is to store this apparently 2D array into an 1D array Y of size m.
I tried Y = reshape(X, [[2]]), and this results in some elements being NaN. Can anyone point me to the Fortran commands to do it quickly? The elements of X may be zero or non-zero.
The second argument of reshape (or the one with keyword shape=) is the shape of the function's result. In your call, you have requested shape [2].
An array with shape [2] is a rank-1 array with two elements. You want a rank-1 array with m elements:
Y = RESHAPE(X, [m])
Now, in this case there's no need to use reshape:
Y = X(:,1)
where the right-hand side is the rank-1 array section of X.
When you have Y=reshape(X,[2]), if Y is not allocatable and not of size 2 then you have a problem, which may indeed result in your compiler deciding (as it is quite entitled to do) to give you a few NaNs.
Note also that you may not need to reshape your array, depending on how you intend to later use it.
I'm creating a 2D square matrix in Excel VBA
Dim Matrix(0 To Nrows, 0 To Ncols) as double
and then I pass it as argument ByRef to a function library,
My_foo(Matrix(0,0))
which calls C++ code. In the C++ code I want to access the matrix to get values and perform other operations. For this purpose, I read these values as a 1D array, so I created an index [R*Ncols + C], where R and C are the element's position in the 2D representation
void _stdcall My_foo(double* M, long Nrows, long Ncols){
    long R = 10;  // row of the element I want
    long C = 0;   // column of the element I want
    double Value = M[R*Ncols + C];  // row-major index, as described above
}
At this point I expect to find the element having position (10,0), but I find the element (0,10). It seems my matrix is stored with an inverted column/rows order.
It would be enough to set an index like [C*Nrows + R], and access it by column. I tried this and it works, but it contradicts many blogs and posts...
Why? Is it normal?
While C/C++ stores multi-dimensional arrays in row-major order, other languages made different choices, so it is not very surprising that Excel VBA uses column-major order, as is also documented in this forum question.
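On the C++ side, a column-major accessor could look like the sketch below. The helper name colmajor_at is invented for illustration. One assumption worth stating loudly: with Dim Matrix(0 To Nrows, 0 To Ncols) the array has Nrows + 1 rows, so the stride between columns is the allocated row count, not necessarily the value named Nrows.

```cpp
#include <cassert>
#include <cstddef>

// Read element (r, c), 0-based, from a column-major buffer such as the one
// VBA hands over when the call passes Matrix(0, 0) by reference.
// rowCount is the number of rows actually allocated; for
// Dim Matrix(0 To Nrows, 0 To Ncols) that is Nrows + 1.
inline double colmajor_at(const double* m, std::size_t rowCount,
                          std::size_t r, std::size_t c) {
    assert(r < rowCount);        // row index must stay inside one column
    return m[c * rowCount + r];  // consecutive rows are adjacent in memory
}
```

Passing the row count explicitly keeps the index formula correct whichever bounds the VBA side uses.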
I tried to port my .m (MATLAB) code to C or C++.
In my code there is a line
A = sparse(I,J,IA,nR,nC);
which converts row index I, col index J, and data IA to sparse matrix A with size nR x nC.
Is there any equivalent code with C++ or C?
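A naive algorithm that reproduces the result as a full matrix is
double *A;
A = malloc(sizeof(double)*nR*nC);
memset(A, 0, sizeof(double)*nR*nC); /* zero the whole buffer, not just the first element */
for(k=0; k<size_of_IA; k++)
    A[I[k]*nC + J[k]] += IA[k]; /* assumes 0-based I and J; subtract 1 from each if they are 1-based as in MATLAB */
Note that if there are repeated indices, the values are not overwritten but accumulated.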
Eigen is an example of a C++ matrix library that contains sparse matrices. It overloads operators to make them feel like a built-in feature.
There are many C and C++ matrix libraries. None ship as part of std, nor is there anything built in.
Writing a good sparse matrix library would be quite hard; your best bet is finding a pre-written one. Recommendation questions are off topic.
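If only the accumulate-on-duplicates behaviour is needed and a full library is not an option, a very small stand-in using the standard library might look like this. It is a sketch, not a substitute for a real sparse matrix library: SparseMap and from_triplets are hypothetical names, and the index arrays are assumed to be 1-based as in the original .m code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical minimal sparse container: only entries that were set are
// stored, keyed by 0-based (row, col).
using SparseMap = std::map<std::pair<std::size_t, std::size_t>, double>;

// Build the map from triplet arrays holding 1-based indices.
// Repeated (row, col) pairs are summed rather than overwritten.
inline SparseMap from_triplets(const std::vector<std::size_t>& I,
                               const std::vector<std::size_t>& J,
                               const std::vector<double>& V,
                               std::size_t nR, std::size_t nC) {
    assert(I.size() == J.size() && J.size() == V.size());
    SparseMap A;
    for (std::size_t k = 0; k < V.size(); ++k) {
        assert(I[k] >= 1 && I[k] <= nR && J[k] >= 1 && J[k] <= nC);
        A[{I[k] - 1, J[k] - 1}] += V[k];  // a new entry starts at 0.0
    }
    return A;
}
```

A std::map keeps lookups simple but is much slower than the compressed formats real libraries use, so this is only reasonable for small problems or for checking a port.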
I've never written in Fortran, but I'm trying to adapt a script to R and the following lines are confusing me. So this is how the variable is defined:
real, dimension(n,nd) :: x
Does this mean x is n arrays filled with nd number of real values or a n x nd matrix?
Then
amax = maxval(abs(x))
x = x/amax
is applied. Is the variable amax a global max of the absolute values in x or is it an array of n max values, one for each row? This is important to know if the x = x/amax is being applied to each row or the entire matrix. The purpose of this function seems to be some type of normalization.
The question of the title is much more general than that of the body, so I'll come to that later.
The result of maxval(array) is a scalar, being the maximum value in array (if it's of non-zero size).
In your example, x is a single array of rank 2 (which is commonly thought of as being a matrix). Thus, maxval(x) is indeed what you call the global maximum of that matrix. An alternative form of maxval is required to give the row-by-row maxima: maxval(x,dim=2).
Now, there is something else to note from your example:
x = x/amax
has a requirement about the shapes of x and amax.
You don't give a declaration for amax but there are two possibilities:
amax has the same shape as x; or
amax is a scalar.
[Note that amax needn't be a scalar just because it is assigned a scalar result from that maxval reference. However, you will see that amax won't be declared as rank 1 with size the number of rows of x, so that's another clue that maxval is giving the global maximum.]
These two possibilities come from conformability rules for division. With amax a scalar each element of x is divided by that value; with amax an array each element of x is divided by the corresponding element in amax.
If you want to normalize each individual row of x then you just can't use that division expression with amax a rank 1 array.
Coming to the more general question: even though it's an either/or question the answer is "no". There is no single way. Each function acts as it is defined.
As a general rule, though, the intrinsic functions of Fortran rarely care about the specific case of arrays which have "rows". But one useful thought is that a function acts either:
on all elements individually, returning an array of the same shape;
on the array as a whole, returning a scalar.
This is moderated by the fact that many of these functions take a dim argument, which causes the function to act on slices instead.
The first line means that the variable x is an array of two dimensions (n,nd) and not n arrays of nd values. The function maxval returns the maximum value in this array.
See page 130 (in the PDF not the printed number) in F90_notes.pdf (you will also find a whole chapter concerning the arrays in the same document).
To add to Baruchel's answer: x/amax divides each element of the 2D array x by the scalar amax.
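For anyone porting this to another language, the difference between the two normalizations can be sketched in C++ as follows. Both function names are invented for illustration, x is assumed to be stored column-major in a flat buffer as Fortran would store it, and the sketch assumes the relevant maximum is non-zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counterpart of  amax = maxval(abs(x)); x = x/amax :
// one scalar maximum over all elements, then every element is divided by it.
inline double normalize_by_global_max(std::vector<double>& x) {
    double amax = 0.0;
    for (double v : x) amax = std::max(amax, std::fabs(v));
    assert(amax > 0.0);  // assumption of this sketch: x is not all zeros
    for (double& v : x) v /= amax;
    return amax;
}

// Row-by-row variant, corresponding to maxval(abs(x), dim=2):
// x is n x nd, column-major, so element (i, j) lives at x[j*n + i].
inline void normalize_each_row(std::vector<double>& x, std::size_t n, std::size_t nd) {
    assert(x.size() == n * nd);
    for (std::size_t i = 0; i < n; ++i) {
        double rmax = 0.0;
        for (std::size_t j = 0; j < nd; ++j)
            rmax = std::max(rmax, std::fabs(x[j * n + i]));
        assert(rmax > 0.0);  // assumption of this sketch: no all-zero row
        for (std::size_t j = 0; j < nd; ++j)
            x[j * n + i] /= rmax;
    }
}
```

The Fortran lines in the question correspond to the first function only.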
In C++, what is the indexing value for a W * H * D sized 3D array?
For a particular i, j, k, is this the correct indexing?
i*W*H+j*W+k
What you have written is equivalent to the pointer arithmetic that this would do:
T x[D][H][W];
x[i][j][k]; // Pointer arithmetic done here
Obviously, depending on how you order D, H and W (or i, j, k), the calculation will differ.
There is no one "correct" order, but the version you've given should work. The order in which you apply the indices will determine whether you do row-major or column-major indexing. If you're porting Fortran code (for example) it can make sense to reverse the "normal" C order.
Width, height and depth are meaningless in this context. What you need to know is that multidimensional arrays in C++ are stored in row-major order.
Yes, assuming i varies from 0 ... D-1, j varies from 0 ... H-1, and k varies from 0 ... W-1.
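Written out as a small helper, under those same range assumptions, the formula could look like this. The name index3d is made up for illustration; the expression is just i*W*H + j*W + k with the common factor W pulled out.

```cpp
#include <cassert>
#include <cstddef>

// Flat row-major index for data laid out like T x[D][H][W]:
// i in [0, D), j in [0, H), k in [0, W).
// (i*H + j)*W + k is algebraically the same as i*W*H + j*W + k.
inline std::size_t index3d(std::size_t i, std::size_t j, std::size_t k,
                           std::size_t H, std::size_t W) {
    assert(j < H && k < W);  // D is not needed to compute the offset
    return (i * H + j) * W + k;
}
```

Note that D never appears in the computation, for the same reason the number of rows does not matter when indexing a 2D C array.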
Usually, though, the purpose of having an indexer, I thought, was to express relations within a sparse matrix so you didn't need to deal with the whole thing (and expend memory for it). If your data span the whole matrix, you might look into creating the 3d matrix as a pointer to an array of pointers, which themselves each point to an array of pointers. Using this allows you to use the x[i][j][k] notation but may be faster.
See http://www.nr.com/cpppages/chapappsel.pdf for a description.
If you need to iterate over all elements, it is best to do it in
for i
for j
for k
order. This way, it would be fastest, because index of array is incremented by one each time and values could be precached.
There is no single correct way to do this, but you have probably chosen the best one.