Smart way to implement lookup in high-perf FORTRAN code

I'm writing a simulation in FORTRAN 77 in which I need to get a parameter value from some experimental data. The data comes from an internet database, so I have downloaded it in advance, but there is no simple mathematical model that provides values on a continuous scale - I only have discrete data points. However, I will need this parameter for any value on the x axis, not only the discrete ones I have from the database.
To simplify, you could say that I know the value of f(x) for all integer values of x, and need a way to find f(x) for any real x value (never outside the smallest or largest x I have knowledge of).
My idea was to take the data and do a linear interpolation to be able to fetch a parameter value; in pseudo-code:
double xd = largest_data_x_lower_than(x)
double slope = (f(xd+dx) - f(xd)) / dx // dx is the distance between two x values
double xtra = x - xd
double fofx = f(xd) + slope*xtra
To implement this, I need some kind of lookup for the data points. I could make the lookup for xd easy by getting values from the database for all integer x, so that xd = int(x) and dx = 1, but I still have no idea how to implement the lookup for f(xd).
What would be a good way to implement this?
The value will be fetched something like 10^7 to 10^9 times during one simulation run, so performance is critical. In other words, reading from IO each time I need a value for f(xd) is not an option.
I currently have the data points in a text file with one pair of (tab-delimited) x,f(x) on each line, so bonus points for a solution that also provides a smooth way of getting the data from there into whatever shape it needs to be.
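The question is about FORTRAN 77, but to keep the illustration compact, here is the whole pipeline sketched in C++ (the file name, the whitespace-based parsing, and the assumption of unit spacing are mine): load the pairs into an array once, then interpolate with an index computed directly from x.

#include <cstdio>
#include <fstream>
#include <vector>

// Load tab-delimited "x f(x)" pairs, one per line. The x values are assumed
// to be the consecutive integers xmin, xmin+1, ..., so only f needs storing.
static std::vector<double> load_table(const char* path, double& xmin) {
    std::ifstream in(path);
    std::vector<double> f;
    double x, fx;
    bool first = true;
    while (in >> x >> fx) {   // operator>> skips tabs and newlines
        if (first) { xmin = x; first = false; }
        f.push_back(fx);
    }
    return f;
}

// Linear interpolation for any real x in [xmin, xmin + f.size() - 1].
static double lookup(const std::vector<double>& f, double xmin, double x) {
    std::size_t i = static_cast<std::size_t>(x - xmin);   // xd = int(x), dx = 1
    if (i + 1 >= f.size()) i = f.size() - 2;              // clamp the top endpoint
    double xtra = (x - xmin) - static_cast<double>(i);
    return f[i] + (f[i + 1] - f[i]) * xtra;
}

int main() {
    double xmin = 0.0;
    std::vector<double> f = load_table("data.tsv", xmin); // hypothetical file name
    std::printf("%f\n", lookup(f, xmin, 3.7));
}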

You say that you have the values for all integers. Do you have pairs i, f(i) for all integers i from M to N? Then read the values f(i) into an array y dimensioned M:N - unless the number of values is HUGE. For real values between M and N it is easy to index into the array and interpolate between the nearest pair of values.
And why use FORTRAN 77? Fortran 90/95/2003 have been with us for some years now...
EDIT: Answering the question in the comment, re how to read the data values only once, in FORTRAN 77, without having to pass them as an argument in a long chain of calls. Technique 1: on program startup, read them into an array that lives in a named common block. Technique 2: the first time the function that returns f(x) is called, read the values into a local array that appears in a SAVE statement; use a SAVEd logical to designate whether or not the function is on its first call. Generally I'd prefer technique 2 as being more "local", but it's not thread safe. If you are doing the simulation in parallel, the first technique could be done in a startup phase, before the program goes multi-threaded.
Here is an example of the use of SAVE: fortran SAVE statement. (In Fortran 95 notation ... convert to FORTRAN 77). Put the read of the data into the array in the IF block.
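As a side-by-side illustration of technique 2 in another language (a sketch, not the Fortran answer itself - read_data is a hypothetical one-shot reader for the data file), C++ gets the same effect with a function-local static:

#include <vector>

std::vector<double> read_data();  // hypothetical one-shot reader for the file

// C++ analogue of technique 2: the function-local static is initialized on
// the first call only (and, unlike FORTRAN 77 SAVE, thread-safely in C++11).
double f_of_x(double x) {
    static const std::vector<double> f = read_data();  // runs exactly once
    std::size_t i = static_cast<std::size_t>(x);       // xd = int(x), dx = 1
    return f[i] + (f[i + 1] - f[i]) * (x - static_cast<double>(i));
}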

You probably want a way to interpolate or fit your data, but you need to be more specific about, say, the dimensionality of your data, how the data behave, the pattern in which you access them (for example, maybe your next request is always near the last one), how the grid is laid out (evenly spaced, random, or some other fashion), and where you need the values, in order to know which method is best for you.
However, if the existing data set is very dense and near linear then you can certainly do a linear interpolation.

Using your database (file), you could create an array fvals with fvals(ii) being the function f(xmin + (ii-1) * dx). The mapping between x-value xx and your array index is ii = floor((xx - xmin) / dx) + 1. Once you know ii, you can use the points around it for interpolation: either linear interpolation using ii and ii+1, or some higher-order polynomial interpolation. For the latter, you could use the corresponding polint routine from Numerical Recipes (see page 103).
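For the interpolation step, here is a C++ sketch (the thread is Fortran, but the arithmetic is identical): a 0-based version of the answer's index mapping plus a small Neville evaluator, which is the same scheme polint is built on. The grid parameters xmin and dx and the 4-point window are assumptions.

#include <algorithm>
#include <cmath>
#include <vector>

// Neville's algorithm: value at x of the unique polynomial through (xs[k], p[k]).
double neville(const std::vector<double>& xs, std::vector<double> p, double x) {
    for (std::size_t k = 1; k < xs.size(); ++k)
        for (std::size_t i = 0; i + k < xs.size(); ++i)
            p[i] = ((x - xs[i + k]) * p[i] + (xs[i] - x) * p[i + 1])
                   / (xs[i] - xs[i + k]);
    return p[0];
}

// 0-based version of the answer's index mapping, followed by cubic
// interpolation through the 4 grid points surrounding xx.
// Assumes an evenly spaced grid with at least 4 values in fvals.
double interp4(const std::vector<double>& fvals, double xmin, double dx, double xx) {
    long i = static_cast<long>(std::floor((xx - xmin) / dx));
    long lo = std::max(0L, std::min(i - 1, static_cast<long>(fvals.size()) - 4));
    std::vector<double> xs(4), ys(4);
    for (long k = 0; k < 4; ++k) {
        xs[k] = xmin + (lo + k) * dx;
        ys[k] = fvals[lo + k];
    }
    return neville(xs, ys, xx);
}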

Related

Record all optimal sequence alignments when calculating Levenshtein distance in Julia

I'm working on the Levenshtein distance with the Wagner–Fischer algorithm in Julia.
It is easy to get the optimal value, but a little harder to get the optimal operation sequence, like insert or delete, while backtracing from the bottom-right corner of the matrix.
I can record the pointer information of each d[i][j], but it might give me up to 3 directions to go back: d[i-1][j-1] for substitution, d[i-1][j] for deletion and d[i][j-1] for insertion. So I'm trying to get all combinations of the operation sets that give me the optimal Levenshtein distance.
It seems that I can store one operation set in one array, but I don't know the total number of combinations in advance, nor their lengths, so it would be hard to define an array to store the operation sets during the enumeration process. How can I generate new arrays while keeping the previous ones? Or should I use a DataFrame?
If you implement the Wagner-Fischer algorithm, at some point you choose the minimum over three alternatives (see the Wikipedia pseudo-code). At this point, you save the chosen alternative in another matrix, using a statement like:
c[i,j] = argmin([d[i-1,j] + 1,                                # deletion
                 d[i,j-1] + 1,                                # insertion
                 d[i-1,j-1] + (s1[i-1] == s2[j-1] ? 0 : 1)])  # substitution (free on a match)
# argmin (called indmin before Julia 0.7) returns the index of the minimum element.
Now c[i,j] contains 1, 2 or 3 according to deletion, insertion or substitution.
At the end of the calculation you have the final d matrix element achieving the minimum distance; you then follow the c matrix backwards and read the action at each step. Keeping track of i and j lets you read off the exact substitution, by looking at which element was in string1 at i and in string2 at j in the current step. A matrix like c cannot be avoided, because at the end of the algorithm the information about the intermediate choices (made by min) would otherwise be lost.
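To make the backtracking concrete, here is a compact sketch of that single-path walk in C++ (the thread is Julia, but the structure is identical; the names and test strings are mine, and enumerating all co-optimal paths, as the question ultimately wants, would branch wherever alternatives tie):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string a = "kitten", b = "sitting";
    std::size_t m = a.size(), n = b.size();
    // d[i][j] = edit distance between a[0..i) and b[0..j);
    // c[i][j] records which alternative achieved the minimum.
    std::vector<std::vector<int>> d(m + 1, std::vector<int>(n + 1));
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 0; i <= m; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= n; ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j) {
            int del = d[i - 1][j] + 1;
            int ins = d[i][j - 1] + 1;
            int sub = d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            d[i][j] = std::min({del, ins, sub});
            c[i][j] = (d[i][j] == del) ? 1 : (d[i][j] == ins) ? 2 : 3;
        }
    // Follow c backwards from (m, n); actions print from the end of the
    // strings back to the start.
    std::size_t i = m, j = n;
    while (i > 0 || j > 0) {
        if (j == 0 || (i > 0 && c[i][j] == 1)) std::cout << "delete " << a[--i] << '\n';
        else if (i == 0 || c[i][j] == 2)       std::cout << "insert " << b[--j] << '\n';
        else {
            --i; --j;
            std::cout << (a[i] == b[j] ? "keep " : "substitute ") << b[j] << '\n';
        }
    }
}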
I'm not sure that I got your question, but anyway, vectors in Julia are dynamic data structures, so you can always grow them using the appropriate function, e.g. push!(), append!(), prepend!(); it is also possible to reshape() the result vector into an array of the desired size.
But one particular approach for the above case could be built on a sparse() matrix:
import Base.zero
Base.zero(ASCIIString) = ""   # zeros() over strings needs a zero element

module GSparse
export insertion, deletion, substitude, result

s = sparse(ASCIIString[])   # grows as edit operations are recorded

# deletion: advance one column, keep the row
deletion(newval::ASCIIString) = begin
    global s
    s.n += 1
    push!(s.colptr, last(s.colptr))
    s[s.m, s.n] = newval
end

# insertion: advance one row, keep the column
insertion(newval::ASCIIString) = begin
    global s
    s.m += 1
    s[s.m, s.n] = newval
end

# substitude: advance both row and column
substitude(newval::ASCIIString) = begin
    global s
    s.n += 1
    s.m += 1
    push!(s.colptr, last(s.colptr))
    s[s.m, s.n] = newval
end

# result: read the stored values back in reverse order
result() = begin
    global s
    ret = zeros(ASCIIString, size(s))
    le = length(s)
    for i in 1:le
        ret[le - i + 1] = s[i]
    end
    ret
end
end
using GSparse
insertion("test");
insertion("testo");
insertion("testok");
substitude("1estok");
deletion("1stok");
result()
I like the approach because for large texts you could have many zero elements. Also, I fill the data structure in a forward fashion and create the result by reversing.

C++ array/vector with a float index

I noticed today that I can index a C++ vector or array with a float value (e.g. tab[0.5f]).
The float is converted to an int and then gives me the same result as tab[0].
This behavior is not what I'm after, as I'm looking for the fastest possible way to access an object based on a float key.
Is it possible to keep the access speed of an array/vector with a float index?
I understand that my keys will have an inaccuracy problem, but I expect my float values to keep a maximum of 3 digits of precision.
Would a std::map<float, Object> do the job? I've read in the C++ reference documentation that map access is "logarithmic in size", which is far less appealing to me.
Thank you :).
Edit:
I need to transform a mesh M containing X shared vertices into a mesh M' containing X' non-shared vertices.
Indexes of vertices are set in M, and I know it's in TRIANGLE mode.
My current algorithm is:
for i in M.indexes, i += 3
take 3 indexes, and deduce the vertices they point to (get the 3 vertices of a triangle)
calculate the normal on these vertices
check, for each couple {Vertex_i, Normal} (i between 1 and 3, for my 3 vertices) whether I already have this couple stored, and act accordingly
... next steps
To check the couple {Vertex, Normal}, I use an Array[x][y][z] based on the position of the vertex, which IS a float, though I know it won't have more than 3 digits of precision.
Use a std::unordered_map. Its find method has constant complexity in the average case, and complexity linear in the container size in the worst case.
Note: since you were willing to use an array, I'm assuming you're not interested in having an ordered container.
That being said, in any case the performance depends on the input (mesh size) and its characteristics, and the only way to choose an optimal solution is to implement the reasonable candidates and benchmark them against each other. In many cases theoretical complexity is irrelevant due to implementation specifics/intrinsics. I mean, even if someone told me that a std::vector<std::pair<float, mapped_value>> would perform better in your case, I'd have to actually do some tests to prove them right/wrong.
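Given the stated 3 digits of precision, one way to keep array-like O(1) average access (a sketch building on this answer; the scale factor and the key type are my assumptions) is to quantize each float to an exact integer key before it ever touches the unordered_map:

#include <cmath>
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Quantize a float with <= 3 digits of precision to an exact integer key,
// sidestepping float equality and hashing issues.
static std::int64_t quantize(float x) {
    return std::llround(static_cast<double>(x) * 1000.0);
}

int main() {
    std::unordered_map<std::int64_t, int> table;  // key: quantized float
    table[quantize(0.125f)] = 42;
    // The same float reproduces the same key bit-for-bit, so lookup works.
    auto it = table.find(quantize(0.125f));
    if (it != table.end()) std::cout << it->second << '\n';
}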

Efficiently searching arrays in FORTRAN

I am trying to store the stiffness matrix in FORTRAN in sparse format to save memory, i.e. I am using three vectors for the non-zero elements (irows, icols, A). After finding out the size of these arrays, the next step is to insert the values into them. I am using Gauss points, i.e. for each Gauss point I find the local stiffness matrix and then insert it into the global (irows, icols, A) one.
The main problem with this insertion is that every time we have to check whether the new entry already exists in the global array: if it exists, add the new value to the old one; if not, append it to the end. That is, we have to search the whole array to find out whether the entry exists. If these arrays (irows, icols, A) are large, this search is computationally very expensive.
Can anyone suggest a better way of inserting the local stiffness matrix for each Gauss point into the global stiffness matrix?
I am fairly sure that this is a well known problem in FEM analysis - I found reference to it in this scipy documentation, but of course the principles are language independent. Basically what you should do is create your matrix in the format you have, but instead of searching the matrix to see whether an entry already exists, just assume that it doesn't. This means that you will end up with duplicate entries which need to be added together to get the correct value.
Once you have constructed your matrix, you will usually convert it to some more efficient form for solving it (e.g. CSR etc.) - the exact format may be determined by the sparse solver you are using. During this conversion process duplicate entries should get added together - and some sparse matrix libraries will do this for you. I know that scipy does this, and many of its internal routines are written in fortran, so you might be able to use one of them (they are all open source). Or you could check if anything suitable is on netlib.
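A minimal C++ sketch of that strategy (illustrative only - the thread is Fortran, and all names here are mine): append every contribution as a (row, col, value) triplet without any searching, then sort once and merge duplicates after assembly is finished.

#include <algorithm>
#include <cstddef>
#include <vector>

struct Triplet { int row, col; double val; };

// Append-only assembly: never search during insertion.
void add_entry(std::vector<Triplet>& coo, int i, int j, double v) {
    coo.push_back({i, j, v});
}

// After assembly: one sort plus one linear pass merges all duplicates.
void compress(std::vector<Triplet>& coo) {
    std::sort(coo.begin(), coo.end(), [](const Triplet& a, const Triplet& b) {
        return a.row != b.row ? a.row < b.row : a.col < b.col;
    });
    std::size_t out = 0;
    for (std::size_t k = 0; k < coo.size(); ++k) {
        if (out > 0 && coo[out - 1].row == coo[k].row
                    && coo[out - 1].col == coo[k].col)
            coo[out - 1].val += coo[k].val;   // duplicate entry: accumulate
        else
            coo[out++] = coo[k];
    }
    coo.resize(out);
}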
If you use a data structure that is kept sorted, it is very efficient to search, either as your primary data structure or as an auxiliary one. You want one into which you can insert another entry in the middle - for example, a binary search tree (http://en.wikipedia.org/wiki/Binary_search_tree).
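In C++, the standard library already provides such a sorted structure; a minimal sketch of the same idea, keyed on the (row, col) pair (names are illustrative):

#include <map>
#include <utility>

// Sorted associative container: lookup and insertion are O(log n), and
// duplicate (row, col) contributions accumulate automatically.
std::map<std::pair<int, int>, double> A;

void add_entry(int i, int j, double v) {
    A[{i, j}] += v;   // inserts 0.0 first if the entry is new
}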

Armadillo C++: Efficient access of columns in a cube structure

Using the Armadillo matrix library, I am aware that the efficient way of accessing a column in a 2d matrix is via a simple call to .col(i).
I am wondering: is there an efficient way of extracting a column stored in a "cube", without first having to call the slice command?
I need the most efficient possible way of accessing the data stored in, for instance (using Matlab notation), A(:,i,j). I will be doing this millions of times on a very large dataset, so speed and efficiency are a high priority.
I think you want
B = A.subcube( span::all, span(i), span(j) );
or equivalently
B = A.subcube( span(), span(i), span(j) );
where B will be a row or column vector of the same type as A (e.g. containing double by default, or a number of other available types).
.slice() should be pretty quick. It simply provides a reference to the underlying Mat class. You could try something along these lines:
cube C(4,3,2);
double* mem = C.slice(1).colptr(2);
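If you want to operate on that raw column without copying it, Armadillo's advanced vector constructors can wrap the memory in place (a sketch - check the advanced-constructor documentation for your Armadillo version; the false argument means the auxiliary memory is not copied):

#include <armadillo>

int main() {
    arma::cube C(4, 3, 2, arma::fill::randu);
    double* mem = C.slice(1).colptr(2);
    // Wrap the existing memory in a vec: no allocation, no copy.
    // This aliases C.slice(1).col(2), i.e. A(:,2,1) in Matlab notation.
    arma::vec col_view(mem, C.n_rows, false, true);
    col_view.print("column:");
    return 0;
}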
Also, bear in mind that Armadillo has range checks enabled by default. If you want to avoid the range checks, use the .at() element accessors:
cube C(4,3,2);
C.at(3,2,1) = 456;
Alternatively, you can store your matrices in a field class:
field<mat> F(100);
F(0).ones(12,34);
Corresponding element access:
F(0)(1,2); // with range checks
F.at(0).at(1,2); // without range checks
You can also compile your code with ARMA_NO_DEBUG defined, which will remove all run-time debugging (such as range checks). This will give you a speedup, but it is only recommended once you have debugged all your code (i.e. verified that your algorithm is working correctly). The debugging checks are very useful for picking up mistakes.

How to create a 20000*20000 matrix in C++

I am trying to solve a problem with 20000 points, which involves a distance matrix with 20000*20000 elements. How can I store this matrix in C++? I use Visual Studio 2008, on a computer with 4 GB of RAM. Any suggestion will be appreciated.
A sparse matrix may be what you're looking for. Many problems don't have values in every cell of a matrix. SparseLib++ is a library which allows for efficient matrix operations.
Avoid the brute force approach you're contemplating and try to envision a solution that involves populating a single 20000 element list, rather than an array that covers every possible permutation.
For starters, consider the following simplistic approach which you may be able to improve upon, given the specifics of your problem:
int bestResult = -1; // some invalid value
int bestInner = -1;
int bestOuter = -1;
for ( int outer = 0; outer < MAX; outer++ )
{
    for ( int inner = 0; inner < MAX; inner++ )
    {
        // SomeFunction stands in for whatever pairwise score you need
        int candidateResult = SomeFunction( list[ inner ], list[ outer ] );
        if ( candidateResult > bestResult )
        {
            bestResult = candidateResult;
            bestInner = inner;
            bestOuter = outer;
        }
    }
}
You can represent your matrix as a single large array. Whether it's a good idea to do so is for you to determine.
If you need four bytes per cell, your matrix is only 4*20000*20000, that is, 1.6GB. Any platform should give you that much memory for a single process. Windows gives you 2GiB by default for 32-bit processes -- and you can play with the linker options if you need more. All 32-bit unices I tried gave you more than 2.5GiB.
Is there a reason you need the matrix in memory?
Depending on the complexity of the calculations you need to perform, you could simply use a function that calculates your distances on the fly. This could even be faster than precalculating every single distance value if you only use some of them.
Without more references to the problem at hand (and the use of the matrix), you are going to get a lot of answers... so indulge me.
The classic approach here would be to go with a sparse matrix; however, the default value would probably be something like 'not computed', which would require special handling.
Perhaps you could use a caching approach instead.
Presumably you want to avoid recomputing the distances over and over, which is why you'd like to keep them in this huge matrix. However, note that you can always recompute them. In general, storing values that can be recomputed in exchange for speed is exactly what caching is about.
So I would suggest using a distance class that abstracts the caching for you.
The basic idea is simple:
When you request a distance, either you already computed it, or not
If computed, return it immediately
If not computed, compute it and store it
If the cache is full, delete some elements to make room
In practice it is a bit more complicated, of course, especially for efficiency, and because the limited cache size requires a policy for selecting which elements to evict, etc.
So before we delve into the technical implementation, just tell me if that's what you're looking for.
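As a rough illustration of that design (a sketch only, since the answer deliberately stops short of implementation details: the names, the coordinate layout, the key packing, and the crude eviction policy are all assumptions):

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

class CachedDistance {
public:
    explicit CachedDistance(std::size_t maxEntries) : maxEntries_(maxEntries) {}

    // Distance between points i and j with coordinates in the x/y arrays.
    double operator()(int i, int j, const double* x, const double* y) {
        std::uint64_t key = (static_cast<std::uint64_t>(i) << 32)
                          | static_cast<std::uint32_t>(j);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;       // already computed
        double d = std::hypot(x[i] - x[j], y[i] - y[j]); // compute and store
        if (cache_.size() >= maxEntries_)
            cache_.clear();  // crude "make room" policy; a real one would use LRU
        cache_.emplace(key, d);
        return d;
    }

private:
    std::size_t maxEntries_;
    std::unordered_map<std::uint64_t, double> cache_;
};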
Your computer should be able to handle 1.6 GB of data (assuming a 32-bit distance type):
size_t n = 20000;
typedef long dist_type; // 32 bit on Windows; std::int32_t is more portable
std::vector<dist_type> matrix(n*n); // one flat allocation, indexed row by row
And then use:
dist_type value = matrix[n * y + x];
You can (by using small datatypes), but you probably don't want to.
You are better off using a quad tree (if you need to find the nearest N matches), or a grid of lists (if you want to find all points within R).
In physics, you can just approximate distant points with a field, or a representative amalgamation of points.
There's always a solution. What's your problem?
Man, you should avoid the n² problem...
Put your 20000 points into a voxel grid.
Finding the closest pair of points should then be something like n log n.
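A bare-bones sketch of such a grid in C++ (purely illustrative; the cell size, the 64-bit key packing, and the neighbor query are assumptions to be tuned to the real point density):

#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };

// Uniform grid: each point goes into the bucket of the cell containing it,
// so neighbor queries only touch the 3x3 block of cells around the query.
class VoxelGrid {
public:
    explicit VoxelGrid(double cellSize) : cell_(cellSize) {}

    void insert(int id, const Point& p) {
        buckets_[key(p)].push_back(id);
        points_.push_back(p); // assumes ids are inserted as 0, 1, 2, ...
    }

    // All point ids within radius r of p (assumes r <= cell size).
    std::vector<int> near(const Point& p, double r) const {
        std::vector<int> out;
        long cx = cellOf(p.x), cy = cellOf(p.y);
        for (long dx = -1; dx <= 1; ++dx)
            for (long dy = -1; dy <= 1; ++dy) {
                auto it = buckets_.find(pack(cx + dx, cy + dy));
                if (it == buckets_.end()) continue;
                for (int id : it->second) {
                    double ddx = points_[id].x - p.x, ddy = points_[id].y - p.y;
                    if (ddx * ddx + ddy * ddy <= r * r) out.push_back(id);
                }
            }
        return out;
    }

private:
    long cellOf(double v) const { return static_cast<long>(std::floor(v / cell_)); }
    std::uint64_t key(const Point& p) const { return pack(cellOf(p.x), cellOf(p.y)); }
    static std::uint64_t pack(long cx, long cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32)
             | static_cast<std::uint32_t>(cy);
    }
    double cell_;
    std::unordered_map<std::uint64_t, std::vector<int>> buckets_;
    std::vector<Point> points_;
};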
As stated in other answers, you should try hard to either use a sparse matrix or come up with a different algorithm that doesn't need to have all the data in the matrix at once.
If you really need it, maybe a library like stxxl might be useful, since it's specially designed for huge datasets. It handles the swapping for you almost transparently.
Thanks a lot for your answers. What I am doing is solving a vehicle routing problem with about 20000 nodes. I need one matrix for distances and one matrix for neighbor lists (for each node, a list of all other nodes ordered by distance). The neighbor list will be used very often to find candidates. I guess the distance matrix can sometimes be omitted if we calculate distances on demand, but the neighbor list is not convenient to recreate every time. The list data type could be int.
To mgb:
How much can a 64-bit Windows system help this situation?