Performance optimization of RK4 in Python - python-2.7

I am implementing the classical Runge-Kutta scheme (RK4) for a large number of coupled equations in Python 2.7. Since there will be over a hundred coupled first-order equations, the for-loops will be very large, and I am looking for optimization hints.
1. When calculating the vector of returned values for the RK coefficients, is it better to:
Preallocate a NumPy array and fill it, or
Use list.append for each variable and numpy.array(list) at the end?
2. The coupled equations obviously have coefficients. Is it better to:
Define them inside the function called in the RK4 steps (i.e. redefine them each time the function is evaluated), or
Declare them as global variables?

Related

Functor computing error vector and Jacobian in a single call using Eigen::LevenbergMarquardt

I'm using the Eigen::LevenbergMarquardt solver for a model-fitting application. The functor I'm providing to it includes functions to compute the error vector and the Jacobian. These functions contain a lot of similar code, and some costly calculations are duplicated.
The prototype of the () operator used to compute the error vector includes what appears to be an optional pointer to a Jacobian matrix. If the Eigen::LevenbergMarquardt solver can be set up to compute the error vector and the Jacobian at the same time, in the cases where both are needed, it would really speed up my algorithm.
int operator()(const Eigen::VectorXf& z, Eigen::VectorXf& fvec, Eigen::MatrixXf* _j = 0)
I have not found any documentation describing this _j parameter or how it can be used. Checking its value while my code runs shows it is always a NULL pointer.
Does anyone know what this parameter is used for and if it's possible to compute the error vector and Jacobian simultaneously when both are needed?
Not sure about this particular solver, but parameters like this are commonly used only when the solver needs them. Maybe it is just an extension point for the future. Looking at the source code, the LM solver never calls the functor with that parameter.
I think a better approach in your case would be to cache the redundant parts of the computation within your functor. Maybe just keep a copy of the input vector and do a quick memcmp against it before doing the computation. Not ideal, but since the interface has no way of telling you when the inputs change, that's probably the most robust option you have.
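A minimal sketch of that caching pattern, assuming the usual Eigen::LevenbergMarquardt convention of operator() for the error vector and df() for the Jacobian; the members last_z and shared and the helper computeShared() are illustrative, not part of any Eigen interface:
#include <cstddef>
#include <cstring>
#include <Eigen/Dense>

struct CachingFunctor {
    Eigen::VectorXf last_z;   // copy of the most recent input
    Eigen::MatrixXf shared;   // costly intermediate used by both paths

    // Redo the expensive common work only when the input actually changes.
    void refresh(const Eigen::VectorXf& z) {
        if (last_z.size() == z.size() &&
            std::memcmp(last_z.data(), z.data(),
                        sizeof(float) * static_cast<std::size_t>(z.size())) == 0)
            return;                // cache hit: same input as last call
        last_z = z;
        shared = computeShared(z);
    }

    int operator()(const Eigen::VectorXf& z, Eigen::VectorXf& fvec) {
        refresh(z);
        // ... fill fvec using 'shared' ...
        return 0;
    }

    int df(const Eigen::VectorXf& z, Eigen::MatrixXf& fjac) {
        refresh(z);
        // ... fill fjac using 'shared' ...
        return 0;
    }

    Eigen::MatrixXf computeShared(const Eigen::VectorXf& z);  // hypothetical helper
};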

The most optimized way of calculating distance between data in c++

I have n points in a 2D plane and want to calculate the distance between each pair of points in C++. The position of the m-th point in the plane is (x(m), y(m)). The points move over time, and there are 10^5 time steps.
I wrote the code below, but since n is large (5000) and the distances have to be recomputed 10^5 times, I'm looking for the most optimized way to do this. Could anyone tell me the least time-consuming approach?
for (i = 0; i < n; ++i)
    for (j = 0; j < n; ++j)
        if (i > j)
            r[i][j] = r[j][i];                   // reuse the mirrored value
        else {
            double dx = x[i] - x[j], dy = y[i] - y[j];
            r[i][j] = sqrt(dx*dx + dy*dy);       // note: ^ is XOR in C++, not power
        }
I know that in Matlab I can do this with the bsxfun function. I would also like to know which can calculate the distances faster: Matlab or C++?
Regarding Matlab, you also have pdist, which does exactly that (though it is not especially fast), and you should also read this.
About comparing Matlab and C++, first read this and this. Also keep in mind that Matlab, as a desktop program, requires knowing not only a generally efficient way to implement your code but also the right way to do it in Matlab. One example is the difference between functions: built-in functions are written in FORTRAN or C and run much faster than non-built-in functions. To check whether a function is built-in, type:
which function_name
and check if you see "built-in" at the start of the output:
built-in (C:\Program Files\MATLAB\...)
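On the C++ side of the question, here is a minimal sketch of the usual first optimizations, assuming flat arrays x and y and a preallocated n-by-n result r (all names illustrative): compute each pair once and mirror it, and keep the data contiguous. If you only ever compare distances, store the squared distance and drop the sqrt entirely.
#include <cmath>
#include <vector>

// Each pair is computed once (j > i) and mirrored. With n = 5000 the full
// matrix holds 25e6 doubles (~200 MB), so consider whether you need it at all.
void pairwise(const std::vector<double>& x, const std::vector<double>& y,
              std::vector<double>& r, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        r[i * n + i] = 0.0;
        for (std::size_t j = i + 1; j < n; ++j) {
            double dx = x[i] - x[j];
            double dy = y[i] - y[j];
            double d = std::sqrt(dx * dx + dy * dy);
            r[i * n + j] = d;   // upper triangle
            r[j * n + i] = d;   // mirrored lower triangle
        }
    }
}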

Calling the ARPACK reverse communicated matrix-vector routine

I'm trying to write a driver in C++ to calculate the eigenvalues of an asymmetric, real-valued sparse matrix using the Fortran routines offered by ARPACK, but I am having a bit of trouble with the reverse communication approach.
Generally, I am trying to solve the normal eigenvalue equation:
A*v = lambda*v
and any interaction with the matrix A is done in ARPACK via a function 'av':
av(n, workd[ipntr[0]], workd[ipntr[1]])
which multiplies the vector held in the array 'workd' beginning at location 'ipntr[0]' and inserts the result into the array 'workd' beginning at location 'ipntr[1]'. Examples of this approach are given in the manual at http://www.caam.rice.edu/software/ARPACK/ and also in the ARPACK/EXAMPLES/SIMPLE/dnsimp.f code.
What I would like to know is: how do I actually involve the matrix A? If it is not passed to the routine, how can its action on the provided vector be computed?
In the example code dnsimp.f, the matrix A is computed within the function 'av', and is 'derived from the standard central difference discretisation of the 2 dimensional convection-diffusion operator'. However, I believe this is problem-specific. It also doesn't seem very useful to have to hard-code the construction of the matrix A into the function. I can't find much information on this in the manual either.
It doesn't seem to be too much of a problem: since 'av' is a user-defined function, I can simply change its definition to take the matrix A as a parameter. However, I would like to know how it is done properly, in case of any potential compatibility issues.
Thank you!
You don't have to supply the matrix to ARPACK.
All you have to do is multiply the matrix with the vectors ARPACK returns (hence, reverse communication) until the desired convergence is reached.
For information on the algorithms, take a look at the users' guide, especially the chapter about the underlying algorithms.
Response to comment: The underlying algorithm is a form of Arnoldi iteration. The basic algorithm is shown on Wikipedia, and it makes clear that ARPACK never needs to access the matrix A itself, directly or indirectly; it only needs the products A*q.
In particular, the algorithm starts with an arbitrary normalized vector q_1. This vector is returned to the user. The user multiplies it with the matrix A using their favourite routine (usually some efficient sparse matrix-vector multiplication) and returns the result to the Arnoldi iteration, which uses it to compute part of the Hessenberg matrix H (whose eigenvalues typically converge to the extreme eigenvalues of A) and the next vector q_2. This is iterated until your results converge.
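A sketch of that loop in C++, modelled on dnsimp.f. The extern "C" declaration is an assumption - the trailing underscore and the absence of hidden character-length arguments depend on your Fortran compiler - and the workspace sizes follow the dnaupd documentation:
#include <cstddef>
#include <vector>

extern "C" void dnaupd_(int* ido, const char* bmat, int* n, const char* which,
                        int* nev, double* tol, double* resid, int* ncv,
                        double* v, int* ldv, int* iparam, int* ipntr,
                        double* workd, double* workl, int* lworkl, int* info);

// Your side of the deal: y = A*x, using whatever sparse format you like.
// ARPACK never sees A itself.
void av(int n, const double* x, double* y);

void arnoldi_driver(int n, int nev, int ncv) {
    int ido = 0, info = 0, ldv = n;
    int lworkl = 3 * ncv * ncv + 6 * ncv;
    double tol = 0.0;                       // 0 means machine precision
    std::vector<double> resid(n), v(static_cast<std::size_t>(ldv) * ncv),
                        workd(3 * n), workl(lworkl);
    int iparam[11] = {0}, ipntr[14] = {0};
    iparam[0] = 1;                          // exact shifts
    iparam[2] = 300;                        // max Arnoldi iterations
    iparam[6] = 1;                          // mode 1: standard problem A*v = lambda*v

    do {
        dnaupd_(&ido, "I", &n, "SM", &nev, &tol, resid.data(), &ncv,
                v.data(), &ldv, iparam, ipntr, workd.data(), workl.data(),
                &lworkl, &info);
        if (ido == -1 || ido == 1)
            // ipntr holds 1-based Fortran offsets into workd
            av(n, &workd[ipntr[0] - 1], &workd[ipntr[1] - 1]);
    } while (ido == -1 || ido == 1);        // ido == 99 signals completion

    // ... check info, then call dneupd_ to extract eigenvalues/eigenvectors ...
}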

Smart way to implement lookup in high-perf FORTRAN code

I'm writing a simulation in FORTRAN 77 in which I need to get a parameter value from some experimental data. The data come from an internet database, so I have downloaded them in advance, but there is no simple mathematical model that can provide values on a continuous scale - I only have discrete data points. However, I will need to know this parameter for any value on the x axis, not only the discrete ones I have from the database.
To simplify, you could say that I know the value of f(x) for all integer values of x, and need a way to find f(x) for any real x value (never outside the smallest or largest x I have knowledge of).
My idea was to take the data and make a linear interpolation, to be able to fetch a parameter value; in pseudo-code:
double xd = largest_data_x_lower_than(x)
double slope = (f(xd+dx)-f(xd))/dx // dx is the distance between two x values
double xtra = x-xd
double fofx = f(xd)+slope*xtra
To implement this, I need some kind of lookup for the data points. I could make the lookup for xd easy by getting values from the database for all integer x, so that xd = int(x) and dx = 1, but I still have no idea how to implement the lookup for f(xd).
What would be a good way to implement this?
The value will be fetched something like 10^7 to 10^9 times during one simulation run, so performance is critical. In other words, reading from IO each time I need a value for f(xd) is not an option.
I currently have the data points in a text file with one pair of (tab-delimited) x,f(x) on each line, so bonus points for a solution that also provides a smooth way of getting the data from there into whatever shape it needs to be.
You say that you have the values for all integers. Do you have pairs i, f(i) for all integers i from M to N? Then read the values f(i) into an array y dimensioned M:N. Unless the number of values is HUGE. For real values between M and N it is easy to index into the array and interpolate between the nearest pair of values.
And why use FORTRAN 77? Fortran 90/95/2003 have been with us for some years now...
EDIT: Answering the question in the comment about how to read the data values only once in FORTRAN 77, without having to pass them as an argument through a long chain of calls. Technique 1: on program startup, read them into an array that lives in a named COMMON block. Technique 2: the first time the function that returns f(x) is called, read the values into a local array that appears in a SAVE statement, and use a SAVEd logical to record whether the function has already been called. Generally I'd prefer technique 2 as being more "local", but it's not thread-safe. If you are doing the simulation in parallel, the first technique could be done in a startup phase, before the program goes multi-threaded.
Here is an example of the use of SAVE: fortran SAVE statement. (In Fortran 95 notation ... convert to FORTRAN 77). Put the read of the data into the array in the IF block.
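For comparison, here is the read-once idea of technique 2 sketched in C++, where a function-local static plays the role of the SAVEd array plus first-call flag; "fdata.txt" is a placeholder for the question's tab-delimited file of x, f(x) pairs, assumed sorted by x with one value per integer x:
#include <fstream>
#include <vector>

const std::vector<double>& f_table() {
    // Built on the first call only; later calls just return the reference.
    static const std::vector<double> fvals = [] {
        std::vector<double> v;
        std::ifstream in("fdata.txt");
        double xi, fi;
        while (in >> xi >> fi)
            v.push_back(fi);   // x is implicit in the position
        return v;
    }();
    return fvals;
}
Unlike the SAVE version, a C++11 function-local static is initialized thread-safely.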
You probably want a way to interpolate or fit your data, but to know which method is best for you, you need to be more specific about, say, the dimensionality of your data, how the data behave, the pattern in which you access them (for example, maybe your next request is always near the last one), and how the grid is laid out (evenly spaced, random, or something else).
However, if the existing data set is very dense and nearly linear between points, then you can certainly use linear interpolation.
Using your database (file), you could create an array fvals with fvals(ii) holding the function value f(xmin + (ii-1) * dx). The mapping between an x-value xx and your array index is ii = floor((xx - xmin) / dx) + 1. Once you know ii, you can use the points around it for interpolation: either linear interpolation using ii and ii+1, or some higher-order polynomial interpolation. For the latter, you could use the corresponding polint routine from Numerical Recipes; see page 103.
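A sketch of that index arithmetic, in C++ for brevity (it carries over to Fortran unchanged apart from the 1-based indexing), assuming an evenly spaced grid with at least two entries and x always inside the table:
#include <cmath>
#include <cstddef>
#include <vector>

// fvals[i] holds f(xmin + i*dx), 0-based.
double interp(const std::vector<double>& fvals, double xmin, double dx, double x) {
    std::size_t i = static_cast<std::size_t>(std::floor((x - xmin) / dx));
    if (i + 1 >= fvals.size()) i = fvals.size() - 2;  // clamp at the upper edge
    double t = (x - xmin - i * dx) / dx;              // fractional offset in [0,1]
    return fvals[i] + t * (fvals[i + 1] - fvals[i]);  // linear blend
}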

Performance question: Inverting an array of pointers in-place vs array of values

The background to this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repeated in an iterative process until convergence.
The way I'm doing it now, using a matrix library (MTL4):
For every iteration I copy all coefficients of A (the values) into the matrix object, then invert. This is the easiest and safest option.
Using an array of pointers instead:
For my particular case, the coefficients of A happen to be updated between iterations. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential performance gain if I set up A as an array of pointers to these coefficient variables, and then inverted A in place?
The nice thing about the latter option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values pointed to in A would automatically be updated between iterations.
So the performance question boils down to this, as I see it:
- The matrix inversion process takes roughly the same amount of time either way, assuming dereferencing of pointers is inexpensive.
- The array-of-pointers option does not need the extra memory for a matrix A of values.
- The array-of-pointers option does not have to copy all NxN values of A between iterations.
- The values pointed to by the array-of-pointers option are generally NOT ordered in memory. Hopefully all the values lie relatively close together, but *A[0][1] is generally not next to *A[0][0], etc.
Any comments on this? Will the last point hurt performance enough to outweigh the positive effects?
Test, test, test.
Especially in the field of numerical linear algebra. There are many effects in play, which is why there are a number of optimized libraries that have taken that burden off you.
Some effects to consider:
Memory locality and cache effects
Multithreading effects (some algorithms that are optimal when running on a single core cause memory collisions/cache evictions when more than one core is utilized).
There is no substitute for testing.
Here are some comments:
Is the function you use for the inversion capable of handling a matrix of pointers instead of values? If it does not realise it has to do an indirection, all kinds of strange effects could happen.
When doing an in-place matrix inversion (meaning the inverted matrix overwrites the input matrix), all input coefficients will get overwritten with new values, because matrix inversion cannot be done by re-ordering the elements of the matrix.
During the inversion process, none of the input coefficients may be changed by an outside process. All such updates have to be performed between iterations.
So, you get the following set of trade-offs when you chose the pointer solution:
The coefficients making up matrix A can no longer be calculated asynchronously with the matrix inversion.
Either all coefficients must be recalculated for each iteration (when you use in-place inversion, meaning the inverted matrix uses the same memory as the input matrix), or you still have to use a matrix of N x N values to hold the result of the inversion.
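To put the copying cost in perspective, here is a sketch of the per-iteration gather step that the pointer scheme is meant to avoid; 'sources' is hypothetical, a flat row-major array of N*N pointers to wherever the coefficients actually live:
#include <vector>

void gather(const std::vector<const double*>& sources,
            std::vector<double>& A, int N) {
    for (int k = 0; k < N * N; ++k)
        A[k] = *sources[k];   // one scattered read, one contiguous write each
}
For N = 100 this is 10^4 dereferences per iteration, which is small next to the O(N^3) work of the inversion itself, so the copy is rarely the bottleneck.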
You're getting good answers here. The only thing I would add is some general experience with performance.
You are thinking about performance a priori. That's reasonable, but the real payoff is a posteriori. In other words, you don't know for certain where the real optimization opportunities are until the running code tells you.
You don't know if the bulk of the time will be spent in matrix inversion, multiplication, copying the matrix, dereferencing, or what. People can guess. If I had to guess, it would be matrix inversion, because it's 100x100.
However, something else I can't guess might be even bigger.
Guessing has a very poor track record, especially when you can just find out.
Here's an example of what I mean.