I am trying to store the stiffness matrix in Fortran in sparse format to save memory, i.e. I am using three vectors for the non-zero elements (irows, icols, A). After determining the size of these arrays, the next step is to insert the values into them. I loop over Gauss points: for each Gauss point I compute the local stiffness matrix and then insert it into the global (irows, icols, A) arrays.
The main problem with this insertion is that every time I have to check whether the new entry already exists in the global arrays: if it exists, add the new value to the old one; if not, append it to the end. That means searching the whole array for every entry. If the arrays (irows, icols, A) are large, this search is computationally very expensive.
Can anyone suggest a better way of inserting the local stiffness matrix for each Gauss point into the global stiffness matrix?
I am fairly sure that this is a well-known problem in FEM analysis - I found a reference to it in this scipy documentation, but of course the principles are language independent. Basically, you should build your matrix in the format you already have, but instead of searching the matrix to see whether an entry already exists, just assume that it doesn't and append it. This means that you will end up with duplicate (row, column) entries, which need to be added together to get the correct value.
Once you have constructed your matrix, you will usually convert it to some more efficient form for solving (e.g. CSR) - the exact format may be determined by the sparse solver you are using. During this conversion, duplicate entries should be added together, and some sparse matrix libraries will do this for you. I know that scipy does this, and many of its internal routines are written in Fortran, so you might be able to use one of them (they are all open source). Or you could check whether anything suitable is on netlib.
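The append-then-merge idea can be sketched as follows. This is a minimal illustration in C++ rather than Fortran, and the names Triplet, Csr and coo_to_csr are hypothetical, not taken from scipy or any other library. It assumes zero-based indices and that the triplet list fits in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One assembled contribution: K(row, col) += value (zero-based indices).
struct Triplet {
    int row;
    int col;
    double value;
};

// Compressed sparse row storage (hypothetical layout for this sketch).
struct Csr {
    std::vector<int> row_ptr;    // size n_rows + 1
    std::vector<int> col_idx;    // one column index per stored entry
    std::vector<double> values;  // one value per stored entry
};

// Sort the triplets by (row, col), then merge runs of equal (row, col)
// by summing their values. The list is taken by value, so the caller's
// copy is left untouched.
Csr coo_to_csr(std::vector<Triplet> coo, int n_rows) {
    assert(n_rows >= 0);
    std::sort(coo.begin(), coo.end(), [](const Triplet& a, const Triplet& b) {
        return a.row != b.row ? a.row < b.row : a.col < b.col;
    });

    Csr csr;
    csr.row_ptr.assign(static_cast<std::size_t>(n_rows) + 1, 0);
    for (std::size_t k = 0; k < coo.size(); ++k) {
        const Triplet& t = coo[k];
        assert(t.row >= 0 && t.row < n_rows);
        const bool duplicate =
            k > 0 && coo[k - 1].row == t.row && coo[k - 1].col == t.col;
        if (duplicate) {
            csr.values.back() += t.value;  // same position: accumulate
        } else {
            csr.col_idx.push_back(t.col);
            csr.values.push_back(t.value);
            ++csr.row_ptr[t.row + 1];      // count unique entries per row
        }
    }
    // Turn the per-row counts into row start offsets (prefix sum).
    for (int r = 0; r < n_rows; ++r) {
        csr.row_ptr[r + 1] += csr.row_ptr[r];
    }
    return csr;
}
```

During assembly each local stiffness entry is simply pushed onto the triplet vector with no search at all; the one-off sort costs O(nnz log nnz), where nnz counts the appended entries including duplicates.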
If you use a data structure that is kept sorted, searching it becomes very efficient. You can use it either as your primary data structure or as an auxiliary one. You want a structure that lets you insert a new entry in the middle cheaply, for example a binary search tree (http://en.wikipedia.org/wiki/Binary_search_tree).
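As a rough illustration of the sorted-structure idea, here is a C++ sketch that uses std::map (typically implemented as a balanced binary search tree) keyed on the (row, column) pair. The class name SortedAccumulator is made up for this example.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Accumulates contributions in a sorted associative container, so each
// lookup-or-insert costs O(log nnz) instead of a linear scan.
class SortedAccumulator {
public:
    void add(int row, int col, double value) {
        assert(row >= 0 && col >= 0);
        // operator[] creates a missing entry initialised to 0.0,
        // so "exists" and "does not exist" are handled by the same line.
        entries_[{row, col}] += value;
    }

    // Entries are ordered by row first, then by column.
    const std::map<std::pair<int, int>, double>& entries() const {
        return entries_;
    }

private:
    std::map<std::pair<int, int>, double> entries_;
};
```

Iterating over entries() visits the non-zeros in row-major order, which is convenient when copying them into the final (irows, icols, A) arrays. The price is one node allocation per non-zero, so this tends to use more memory than flat arrays.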
Let's assume we have a huge symmetric diagonal matrix. What is an efficient way to implement this?
The only way I could think of is to use the symmetry property Xij = Xji, which cuts the storage roughly in half. But representing this matrix as a 2D array would be inefficient, since a rectangular array cannot drop the redundant half.
Representing this matrix as an adjacency list would also be inefficient: viewed as a graph, it would be a dense graph, and adjacency-list operations such as removing, inserting and searching take a lot of time.
But what about using heaps?
There is no one answer until you decide what you are going to do with this matrix (or maybe matrices?).
If you are just going to store and remember it, then just store it sequentially, leaving out the redundant entries. (Your code knows how to access it, because that is all it does, right?)
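One way to "store it sequentially, leaving out the redundant entries" is packed lower-triangular storage. The sketch below is only an illustration; the class name PackedSymmetric is hypothetical and it assumes a dense symmetric matrix with zero-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stores only the lower triangle (including the diagonal) of an n x n
// symmetric matrix in one contiguous array of n * (n + 1) / 2 values.
class PackedSymmetric {
public:
    explicit PackedSymmetric(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        return data_[index(i, j)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return data_[index(i, j)];
    }

    // Number of values actually stored.
    std::size_t stored() const { return data_.size(); }

private:
    std::size_t index(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        if (i < j) std::swap(i, j);  // map the upper triangle onto the lower
        return i * (i + 1) / 2 + j;  // row-major offset in the lower triangle
    }

    std::size_t n_;
    std::vector<double> data_;
};
```

With this layout s(1, 3) and s(3, 1) refer to the same stored value, so a 4 x 4 matrix needs 10 doubles instead of 16.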
More probably, you want to do normal matrix operations on it. In that case, are you trying to make the storage efficient, or the execution? In the latter case, I don't see many opportunities based on it being symmetric - the multiplies are the expensive thing and you probably still need all of those. If it is the storage, then are you limiting yourself to operations that only take symmetric in and give symmetric out? That sounds awfully specific. If so, then you only need to do the calculations for the part you are storing, because by definition the other entries are symmetric, so just write your code to generate that part of the matrix and you are done.
I'm working on the Levenshtein distance with Wagner–Fischer algorithm in Julia.
It is easy to get the optimal value, but a little harder to get the optimal operation sequence (insertions, deletions and so on) while backtracing from the bottom-right corner of the matrix.
I can record the pointer information for each d[i][j], but it might give me up to 3 directions to go back: to d[i-1][j-1] for substitution, d[i-1][j] for deletion and d[i][j-1] for insertion. So I'm trying to get all combinations of operation sets that give me the optimal Levenshtein distance.
It seems that I can store one operation set in one array, but I don't know the total number of combinations or their lengths, so it is hard for me to define an array to store the operation sets during the enumeration. How can I generate new arrays while keeping the former ones? Or should I use a DataFrame?
If you implement the Wagner-Fischer algorithm, at some point, you choose the minimum over three alternatives (see Wikipedia pseudo-code). At this point, you save the chosen alternative in another matrix. Using a statement like:
c[i,j] = indmin([d[i-1,j]+1,d[i,j-1]+1,d[i-1,j-1]+1])
# indmin returns the index of the minimum element in a collection.
Now c[i,j] contains 1,2 or 3 according to deletion, insertion or substitution.
At the end of the calculation, you have the final d matrix element achieving the minimum distance, you then follow the c matrix backwards and read the action at each step. Keeping track of i and j allows reading the exact substitution by looking which element was in string1 at i and string2 at j in the current step. Keeping a matrix like c cannot be avoided because at the end of the algorithm, the information about the intermediate choices (done by min) would be lost.
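The choice-matrix idea can be sketched like this. The example is in C++ rather than Julia, recovers one optimal script (not all of them), and uses a made-up function name edit_script. It also distinguishes a free match 'M' from a paid substitution 'S', which the one-line statement above does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns one optimal edit script turning `a` into `b`:
// 'M' match, 'S' substitution, 'D' deletion from a, 'I' insertion from b.
std::string edit_script(const std::string& a, const std::string& b) {
    const std::size_t m = a.size();
    const std::size_t n = b.size();
    // d holds distances, c holds the choice made at each cell.
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1, 0));
    std::vector<std::vector<char>> c(m + 1, std::vector<char>(n + 1, 'M'));

    for (std::size_t i = 1; i <= m; ++i) { d[i][0] = i; c[i][0] = 'D'; }
    for (std::size_t j = 1; j <= n; ++j) { d[0][j] = j; c[0][j] = 'I'; }

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            const bool same = a[i - 1] == b[j - 1];
            const std::size_t diag = d[i - 1][j - 1] + (same ? 0 : 1);
            const std::size_t del = d[i - 1][j] + 1;
            const std::size_t ins = d[i][j - 1] + 1;
            d[i][j] = std::min({diag, del, ins});
            // Record which alternative achieved the minimum.
            if (d[i][j] == diag) c[i][j] = same ? 'M' : 'S';
            else if (d[i][j] == del) c[i][j] = 'D';
            else c[i][j] = 'I';
        }
    }

    // Follow the recorded choices back from the bottom-right corner.
    std::string ops;
    std::size_t i = m;
    std::size_t j = n;
    while (i > 0 || j > 0) {
        const char op = c[i][j];
        ops.push_back(op);
        if (op == 'D') { --i; }
        else if (op == 'I') { --j; }
        else { --i; --j; }
    }
    std::reverse(ops.begin(), ops.end());
    return ops;
}
```

The number of non-'M' characters in the returned string equals the Levenshtein distance. To enumerate every optimal script, c would have to store all alternatives that tie for the minimum, and the backtrace would branch at each tie.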
I'm not sure that I got your question, but anyway: vectors in Julia are dynamic data structures, so you can always grow them using the appropriate function, e.g. push!(), append!() or prepend!(). It is also possible to reshape() the result vector into an array of the desired size.
One particular approach for the above case uses a sparse() matrix:
import Base.zero
Base.zero(ASCIIString)=""
module GSparse
export insertion,deletion,substitude,result
s=sparse(ASCIIString[])
deletion(newval::ASCIIString)=begin
global s
s.n+=1
push!(s.colptr,last(s.colptr))
s[s.m,s.n]=newval
end
insertion(newval::ASCIIString)=begin
global s
s.m+=1
s[s.m,s.n]=newval
end
substitude(newval::ASCIIString)=begin
global s
s.n+=1
s.m+=1
push!(s.colptr,last(s.colptr))
s[s.m,s.n]=newval
end
result()=begin
global s
ret=zeros(ASCIIString,size(s))
le=length(s)
for (i = 1:le)
ret[le-i+1]=s[i]
end
ret
end
end
using GSparse
insertion("test");
insertion("testo");
insertion("testok");
substitude("1estok");
deletion("1stok");
result()
I like this approach because for large texts you could have many zero elements. Also, I fill the data structure in the forward direction and create the result by reversing it.
I'm trying to write a driver in C++ to calculate the eigenvalues for an asymmetric, real-valued sparse matrix using the fortran functions offered by ARPACK, but I am having a bit of trouble with the reverse communication approach.
Generally, I am trying to solve the normal eigenvalue equation:
A*v = lambda*v
and any interaction with the matrix A is done in ARPACK via a function 'av':
av(n, workd[ipntr[0]], workd[ipntr[1]])
which multiplies the vector held in the array 'workd' beginning at location 'ipntr[0]' by A and writes the result into the array 'workd' beginning at location 'ipntr[1]'. Examples of this approach are given in the manual at http://www.caam.rice.edu/software/ARPACK/ and also in the ARPACK/EXAMPLES/SIMPLE/dnsimp.f code.
What I would like to know is how do I actually involve the matrix A? If it is not passed to the routine then how is it possible to find its action on the vector provided?
In the example code dnsimp.f their matrix A is calculated within the function 'av', and is 'derived from the standard central difference discretisation of the 2 dimensional convection-diffusion operator'. However, I believe this is problem specific? It also doesn't seem too useful to have to code the derivation of the matrix A into the function. I can't find much information on this from the manual either.
It doesn't seem to be too much of a problem, since as it is a user defined function I am able to just change the definition of 'av' to include the matrix A as a parameter. However I would like to know how it is done properly in case of any potential compatibility issues.
Thank you!
You don't have to supply the matrix to ARPACK.
All you have to do is multiply the matrix by the vectors ARPACK hands back to you (hence "reverse communication") until the desired convergence is reached.
For information on the algorithms, you should take a look at the users guide and especially on the chapter about the underlying algorithms.
Response to comment: The underlying algorithm is a form of Arnoldi iteration. The basic algorithm is shown on Wikipedia, and it shows that the solver never accesses the matrix A itself, neither directly nor indirectly; it only needs the products A*q.
In particular, the algorithm starts with an arbitrary normalized vector q_1. This vector is returned to the user. The user multiplies this vector by the matrix A using their favourite routine (usually some efficient sparse matrix-vector multiplication) and returns the result to the Arnoldi iteration, which uses it to calculate a part of the Hessenberg matrix H (whose eigenvalues typically converge to the extreme eigenvalues of A) and the next vector q_2. This is repeated until your results have converged.
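The user-side half of that exchange might look like the sketch below: a matrix object that owns A and exposes the product y = A*x. The CsrMatrix struct and its multiply member are hypothetical names for illustration, and no ARPACK calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A sparse matrix in CSR form that lives entirely on the user's side;
// the eigensolver only ever sees the vectors x and y.
struct CsrMatrix {
    int n;                        // matrix is n x n
    std::vector<int> row_ptr;     // size n + 1
    std::vector<int> col_idx;     // column index of each stored value
    std::vector<double> values;   // stored non-zero values

    // Computes y = A * x. Both x and y must point to at least n doubles
    // and must not overlap.
    void multiply(const double* x, double* y) const {
        assert(row_ptr.size() == static_cast<std::size_t>(n) + 1);
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                sum += values[k] * x[col_idx[k]];
            }
            y[i] = sum;
        }
    }
};
```

Inside the reverse-communication loop, each time the solver asks for a product the driver would call something like A.multiply(x, y) with x and y pointing into the two segments of 'workd' selected by 'ipntr'. Those offsets come from Fortran and are, as far as I know, 1-based, so check the indexing when computing the pointers in C++. Because 'av' is your own function, keeping A in an object like this (or passing it as an extra parameter) does not affect ARPACK at all.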
I'm using the Yale representation of a sparse matrix in a power iteration algorithm, and everything works well and fast.
But now I have a problem: my professor will send the sparse matrix in an unordered data file, and since the matrix is symmetric, only one of each pair of indices (i,j) / (j,i) will be there.
The problem is that my implementation needs the elements to be inserted in order.
I tried a few ways of reading the file and then inserting into my sparse matrix:
1) Using a dense matrix.
2) Using another sparse-matrix implementation, I tried with std::map.
3) Priority queues: I made an array of priority_queues. I insert element (i,j) into priority_queue[i], so when I pop priority_queue[i] I get the lowest j-index of row i.
But I need something really fast and memory efficient, because the largest matrix I'll use will be about 100k x 100k, and my attempts were very slow, almost 200 times slower than the power iteration itself.
Any suggestions? Sorry for the poor English :(
The way many sparse loaders work is that you use an intermediate pure triples structure. I.e. whatever the file looks like, you load it into something like vector< tuple< row, column, value> >.
You then build the sparse structure from that. The reason is precisely what you're running into. Your sparse matrix structure likely has constraints, like you need to know the number of elements in each row/column, or the input needs to be sorted, etc. You can massage your triples array into whatever you need (i.e. by sorting it).
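This also makes it trivial to solve your symmetry dilemma. For every triple in the source file, you insert both (row, column, value) and (column, row, value) into your intermediate structure. (Skip the mirrored copy for diagonal entries, where row equals column, or they will be counted twice.)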
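A minimal sketch of such a loader is shown below. The file layout is an assumption (whitespace-separated "row column value" records with zero-based indices), and the names Triple, load_symmetric and load_symmetric_from_string are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

using Triple = std::tuple<int, int, double>;  // (row, column, value)

// Reads "row column value" records in any order, mirrors every
// off-diagonal entry, and returns the triples sorted by (row, column).
std::vector<Triple> load_symmetric(std::istream& in) {
    std::vector<Triple> triples;
    int r = 0;
    int c = 0;
    double v = 0.0;
    while (in >> r >> c >> v) {
        assert(r >= 0 && c >= 0);
        triples.emplace_back(r, c, v);
        if (r != c) {
            triples.emplace_back(c, r, v);  // mirrored entry; diagonal is stored once
        }
    }
    // std::tuple compares lexicographically, so this orders by row, then column.
    std::sort(triples.begin(), triples.end());
    return triples;
}

// Convenience wrapper for in-memory input, e.g. in tests.
std::vector<Triple> load_symmetric_from_string(const std::string& text) {
    std::istringstream in(text);
    return load_symmetric(in);
}
```

After the sort, one linear pass over the triples is enough to fill the Yale arrays in order. The total cost is O(nnz log nnz) with contiguous memory, which usually beats per-element inserts into a std::map or a set of priority queues.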
Another option is to simply write a script that will sort your professor's file.
FYI, in the sparse world the number of elements (nonzeros) is what matters, not the dimensions of the matrix. 100k-by-100k is a meaningless piece of information. That entire matrix could be totally empty, for example.
I am using the Armadillo matrix library, and I am aware that the efficient way of accessing a column in a 2D matrix is a simple call to .col(i).
Is there an efficient way of extracting a column stored in a "cube" without first having to call the slice command?
I need the most efficient possible way of accessing the data stored in, for instance (using MATLAB notation), A(:,i,j). I will be doing this millions of times on a very large dataset, so speed and efficiency are a high priority.
I think you want
B = A.subcube( span::all, span(i), span(j) );
or equivalently
B = A.subcube( span(), span(i), span(j) );
where B will be a row or column vector of the same type as A (e.g. containing double by default, or a number of other available types).
.slice() should be pretty quick. It simply provides a reference to the underlying Mat class. You could try something along these lines:
cube C(4,3,2);
double* mem = C.slice(1).colptr(2);
Also, bear in mind that Armadillo has range checks enabled by default. If you want to avoid the range checks, use the .at() element accessors:
cube C(4,3,2);
C.at(3,2,1) = 456;
Alternatively, you can store your matrices in a field class:
field<mat> F(100);
F(0).ones(12,34);
Corresponding element access:
F(0)(1,2); // with range checks
F.at(0).at(1,2); // without range checks
You can also compile your code with ARMA_NO_DEBUG defined, which will remove all run-time debugging (such as range checks). This will give you a speedup, but it is only recommended once you have debugged all your code (ie. verified that your algorithm is working correctly). The debugging checks are very useful in picking up mistakes.