How to optimize evaluation of the Hessian in SymPy?

I'm using the statsmodels library for generic likelihood models. As I have a quite complicated likelihood function, I used sympy to calculate the gradient and Hessian for me. This works fine, but it is too slow for my needs, because the likelihood function contains the term b0*x0 + b1*x1 + ... + bn*xn. As a result the Hessian grows as N^2, and so does the complexity.
Elements of the Hessian are often pretty similar, e.g. expensive_operation * x0 and expensive_operation * x1, etc. This means that if I could pre-calculate expensive_operation and reuse it across the functions in the Hessian, I would drastically improve performance.
So the question is: is there a tool which would take a list of functions, optimize them, and then evaluate them efficiently? Something like numexpr, but one that accepts a list of functions?

SymPy has cse, which stands for common subexpression elimination. See the docs.
A simple example:
>>> print(cse(sin(x**2)*cos(x**2) + 2*sin(x**2) - cos(x**2)))
([(x0, x**2), (x1, sin(x0)), (x2, cos(x0))], [x1*x2 + 2*x1 - x2])
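For the Hessian use case you can hand the whole matrix to cse and wrap the result in a single fast function with lambdify. A minimal sketch, using a stand-in likelihood expression rather than your actual model (recent SymPy versions accept a cse=True flag in lambdify; if yours does not, you can generate code from the replacement list yourself):
from sympy import symbols, exp, hessian, cse, lambdify

b0, b1, x0, x1 = symbols('b0 b1 x0 x1')
loglike = -exp(b0*x0 + b1*x1) + (b0*x0 + b1*x1)   # placeholder likelihood, not your model

H = hessian(loglike, (b0, b1))            # symbolic Hessian
replacements, reduced = cse(list(H))      # shared pieces like exp(b0*x0 + b1*x1) appear once

# let lambdify apply cse internally so the expensive term is evaluated a single time
fast_hess = lambdify((b0, b1, x0, x1), H, cse=True)
print(fast_hess(0.1, 0.2, 1.0, 2.0))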

Related

Computing the Jacobian matrix in C++ (symbolic math)

Introduction
Let’s assume that I need the Jacobian matrix for the following set of ODEs:
dxdt[ 0 ] = -90.0 * x[0] - 50.0 * x[1];
dxdt[ 1 ] = x[0] + 3*x[1];
dxdt[ 2 ] = x[1] + 50*x[2];
In Matlab/Octave this would be pretty easy:
syms x0 x1 x2;
f = [-90.0*x0-50.0*x1, x0+3*x1, x1+50*x2]
v=[x0, x1, x2]
fp = jacobian(f,v)
This results in the following output matrix:
[-90 -50 0 ]
[ 1 3 0 ]
[ 0 1 50]
What I need
Now I want to reproduce the same results in C++. I can’t compute the Jacobian beforehand and hard-code it, as it will depend, for example, on user inputs and time. So my question is: how do I do this? Usually for mathematical operations I use the Boost library; however, in this case I can’t find any solution. There’s only a short note about this for implicit systems, but the following code doesn’t work:
sys.second( x , jacobi , t )
It also requires the time (t), so it probably doesn’t generate an analytic form of the solution. Am I misunderstanding the documentation? Or should I use another function? I would prefer to stay within Boost, as I need the Jacobian as a ublas::matrix and I want to avoid conversions.
EDIT:
More specifically, I will use the Jacobian inside the rosenbrock4 ODE solver. Example here - lines 47-52. I need automatic generation of this structure, as the ODE set may change later and I want to avoid rewriting the Jacobian by hand every time. Also, some variables inside the ODE definitions are not constant in time.
I know this is long after the fact, but I have recently been wanting to do the same thing and have come across many automatic differentiation (AD) libraries that do this pretty well. I have mostly been using Eigen's AD because I am already using Eigen everywhere. Here's an example of how you can use Eigen's AD to get the Jacobian as you asked.
There's also a long list of C++ AD libraries on autodiff.org.
Hope this helps someone!
The Jacobian is based on derivatives of the function. If the function f is only known at run time (and there are no constraints such as linearity), you have to automate the differentiation. If you want this to happen exactly (as opposed to a numerical estimation), you need to use symbolic computation. Look for example here and here for libraries supporting this.
Note that the Jacobian usually depends on the state and the time, so it’s impossible to represent it as a constant matrix (such as in your example), unless your problem is so boring that you can solve it analytically anyway.
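If generating the C code offline is acceptable, a symbolic package can do the differentiation and emit the expressions for you. A small sketch using SymPy (a Python library; ccode just prints C expressions that you could paste into, or generate into, the code that fills your ublas::matrix):
from sympy import symbols, Matrix, ccode

x0, x1, x2 = symbols('x0 x1 x2')
f = Matrix([-90.0*x0 - 50.0*x1, x0 + 3*x1, x1 + 50*x2])
J = f.jacobian(Matrix([x0, x1, x2]))      # 3x3 symbolic Jacobian

for i in range(J.rows):
    for j in range(J.cols):
        print("J(%d, %d) = %s;" % (i, j, ccode(J[i, j])))
SymPy's codegen utilities can also emit a complete C function, but the loop above is enough to show the idea.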

finding maximum of a function with least probes taken

I have some code, basically a function, that returns a value. This function takes a long time to run. The function takes a double as a parameter:
double estimate(double factor);
My goal is to find the parameter factor at which this estimate function returns its maximum value. I could simply brute-force it and iterate over different factor inputs to get what I need, but the function takes a long time to run, so I'd like to minimize the number of "probes" I take (i.e. call the estimate function as few times as possible).
Usually, the maximum is returned for factor values between 0.5 and 3.5. If I graph the returned values, I get something that looks like a bell curve. What's the most efficient way to partition the possible inputs so that I can discover the maximum faster?
The previous answer suggested a 2-point approach. This is a good idea for functions that are approximately linear, because a line is defined by 2 parameters: y = ax + b.
However, the actual bell-shaped curve is more like a parabola, which is defined by ax² + bx + c (so 3 parameters). You should therefore take 3 points {x1, x2, x3} and solve for {a, b, c}. This gives you an estimate of the top at xtop = -b/2a. (The linked answer uses the name x0 here.)
You'll need to iterate to approximate the actual top if the function isn't a true parabola, but this process converges really fast. The easiest scheme is to take the original triplet x1, x2, x3, add xtop and remove the xn value which is furthest away from xtop. The advantage of this is that you can reuse 2 of the old f(x) values, which helps a lot with the stated goal of minimal samples. A rough sketch of the loop follows.
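Here it is in Python (estimate is a cheap placeholder for the expensive function; np.polyfit does the 3-point parabola fit, and there are no safeguards against a degenerate fit):
import numpy as np

def estimate(factor):                         # placeholder for the expensive function
    return -(factor - 1.7)**2 + 5.0

xs = [0.5, 2.0, 3.5]                          # initial triplet x1, x2, x3
ys = [estimate(x) for x in xs]

for _ in range(10):
    a, b, c = np.polyfit(xs, ys, 2)           # fit a*x^2 + b*x + c through the 3 points
    x_top = -b / (2.0 * a)                    # vertex of the fitted parabola
    xs.append(x_top)
    ys.append(estimate(x_top))                # only one new probe per iteration
    drop = max(range(len(xs)), key=lambda i: abs(xs[i] - x_top))
    xs.pop(drop); ys.pop(drop)                # discard the point furthest from the vertex

print(xs[int(np.argmax(ys))], max(ys))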
If your function indeed has a bell-shaped curve then you can use a binary search as follows:
Choose an initial x1 (say x1 = 2, midway between 0.5 and 3.5) and compute f(x1) and f(x1 + delta), where delta is small. If f(x1 + delta) > f(x1), the peak is to the right of x1; otherwise it is to the left.
Carry out the binary search this way until you are as close to the peak as you want.
You can refine the above approach by choosing the next x_t according to the size of the difference f(x1 + delta) - f(x1).
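A sketch of this search with a placeholder estimate (delta and the stopping tolerance are arbitrary; each step costs two probes):
def estimate(factor):                         # placeholder for the expensive function
    return -(factor - 1.7)**2 + 5.0

lo, hi, delta = 0.5, 3.5, 1e-4
while hi - lo > 1e-3:
    mid = 0.5 * (lo + hi)
    # the sign of the finite difference tells us on which side of mid the peak lies
    if estimate(mid + delta) > estimate(mid):
        lo = mid                              # rising: the peak is to the right
    else:
        hi = mid                              # falling: the peak is to the left

print("peak near", 0.5 * (lo + hi))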

Two point boundary with odeint

I am trying to solve a two-point boundary value problem with odeint. My equation has the form
y'' + a*y' + b*y + c = 0
It is pretty trivial when I have conditions at a single point, y(x_1) = y_1, y'(x_1) = y'_1, but when the boundary conditions are y(x_1) = y_1, y(x_2) = y_2 I am lost. Does anybody know a way to deal with problems like this using odeint or another scientific library?
In this case you need a shooting method. odeint does not have such a method; it solves the initial value problem (IVP), which is your first case. I think this method is explained in Numerical Recipes, and you can use Boost.Odeint to do the time stepping.
An alternative and more efficient way to solve this type of problem is the finite difference or finite element method. For finite differences you can check Numerical Recipes. For finite elements I recommend the deal.II library.
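To make the shooting idea from the first paragraph concrete, here is a rough sketch in Python, with SciPy standing in for Boost.Odeint (the coefficients, boundary values and the slope bracket are all made up): integrate the IVP with a guessed initial slope s and root-find on the miss y(x2; s) - y2.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

a, b, c = 1.0, 2.0, 0.5                       # example coefficients of y'' + a*y' + b*y + c = 0
x1, x2, y1, y2 = 0.0, 1.0, 0.0, 1.0           # boundary conditions y(x1) = y1, y(x2) = y2

def rhs(x, u):                                # rewrite as a first-order system (y, y')
    y, yp = u
    return [yp, -a*yp - b*y - c]

def miss(s):                                  # shoot with initial slope s, measure the miss at x2
    sol = solve_ivp(rhs, (x1, x2), [y1, s])
    return sol.y[0, -1] - y2

s_star = brentq(miss, -100.0, 100.0)          # slope for which the shot hits y(x2) = y2
print("initial slope:", s_star)
Because this ODE is linear, the miss is an affine function of s, so in practice two shots and a linear interpolation are already enough.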
Another approach is to use B-splines: assuming you know the initial point x0 and final point xfinal of the integration, you can expand the solution y(x) in a B-spline basis defined over (x0, xfinal), i.e.
y(x)= \sum_{i=1}^n A_i*B_i(x),
where the A_i are constant coefficients to be determined, and the B_i(x) are B-spline basis functions (well-defined polynomial functions that can be differentiated numerically). For scientific applications you can find an implementation of B-splines in GSL.
With this substitution the boundary value problem is reduced to a linear problem, since (using Einstein summation for repeated indices):
A_i*[ B_i''(x) + a*B_i'(x) + b*B_i(x)] + c =0
You can choose a set of points x and create a linear system from the above equation. You can find information on this type of method in the review paper "Applications of B-splines in Atomic and Molecular Physics" by H. Bachau, E. Cormier, P. Decleva, J. E. Hansen and F. Martín:
http://iopscience.iop.org/0034-4885/64/12/205/
I do not know of any library that solves this problem directly, but there are several libraries for B-splines (I recommend GSL for your needs) that will allow you to form the linear system. See this Stack Overflow question:
Spline, B-Spline and NURBS C++ library
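For illustration, here is a rough collocation sketch of this construction in Python using SciPy's BSpline (the answer above points to GSL for C/C++; the ODE coefficients a, b, c, the grid size and the collocation points below are arbitrary stand-ins):
import numpy as np
from scipy.interpolate import BSpline

a, b, c = 1.0, 2.0, 0.5                     # made-up coefficients of y'' + a*y' + b*y + c = 0
x0, xfinal, y1, y2 = 0.0, 1.0, 0.0, 1.0     # boundary conditions y(x0) = y1, y(xfinal) = y2

k, n = 3, 12                                # cubic basis, n basis functions B_i
t = np.concatenate(([x0]*(k+1),
                    np.linspace(x0, xfinal, n - k + 1)[1:-1],
                    [xfinal]*(k+1)))        # clamped knot vector, len(t) = n + k + 1

def B(i, nu=0):                             # i-th basis function (or its nu-th derivative)
    coef = np.zeros(n); coef[i] = 1.0
    spl = BSpline(t, coef, k)
    return spl.derivative(nu) if nu else spl

xs = np.linspace(x0, xfinal, n)             # collocation points
M = np.zeros((n, n)); rhs = np.full(n, -c)  # A_i*[B_i'' + a*B_i' + b*B_i](x_j) = -c
for j, x in enumerate(xs):
    for i in range(n):
        M[j, i] = B(i, 2)(x) + a*B(i, 1)(x) + b*B(i)(x)
# replace the first and last collocation rows by the two boundary conditions
for row, (x, val) in ((0, (x0, y1)), (n - 1, (xfinal, y2))):
    M[row] = [B(i)(x) for i in range(n)]
    rhs[row] = val

A = np.linalg.solve(M, rhs)                 # coefficients A_i of y(x) = sum_i A_i*B_i(x)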

Predefinition of often used values in computations - does it change anything?

I'm auto-generating C code to compute large expressions and am trying to figure out, with simple examples, whether it makes sense to predefine certain sub-expressions in separate variables.
As a simple example, say we compute something of the form:
#include <cmath>

double test(double x, double y) {
    const double c[9][9] = { ... }; // constants properly initialized, irrelevant
    double expr = c[0][0]*x*y
        + c[1][0]*pow(x,2)*y + ... + c[8][0]*pow(x,9)*y
        + c[1][1]*pow(x,2)*pow(y,2) + ... + c[8][1]*pow(x,9)*pow(y,2)
        + ...
with all c[i][j] properly initialized. In reality those expressions contain tens of millions of multiplications and additions.
A colleague now proposed, in order to reduce the number of calls to pow() and to cache often-needed values in the expressions, to define every power of x and y in a separate variable. This is no big deal, as the code is auto-generated anyway, like this:
double xp2 = pow(x,2);
double xp3 = pow(x,3);
double xp4 = pow(x,4);
// ...
// same for pow(y,n)
I think, however, that this is unnecessary, as the compiler should take care of these optimizations.
Unfortunately, I have no experience with reading and interpreting assembly, but I think I can see that all the calls to pow() are optimized out. Is this right? Also, does the compiler cache the values of pow(x,2), pow(x,3), etc.?
Thanks in advance for your input!
Using pow with integer arguments... ouch! Typical implementations of pow are tuned for the general case of floating-point arguments, which is why it is usually much slower to write
pow(x, 2) ( = exp(2 * log(x)) )
than
x * x
What I state here is very compiler-dependent, though. On one hand, some compilers may not even know that pow(x, 2) will yield the same value for a given x (after all, the extern function pow could have side effects), so you have no guarantee that common subexpressions will be eliminated. The pow function, on some (many?) platforms/toolchains, is provided by a library the compiler has no control over.
On other implementations, though, the compiler may turn those pow calls into multiplications, or at least into intrinsics, which may in turn be specialized for integer exponents. Your mileage will vary.
The first thing I'd do is to replace calls to pow by multiplications. For larger exponents, you may also do, eg.
double x2 = x * x;
double x3 = x * x2;
double x4 = x2 * x2;
Note that (credit to @Stephen Canon) doing repeated multiplications (with the above quick exponentiation scheme) introduces roundoff error whose magnitude is proportional to the number of multiplications (i.e. O(log exponent)). This error is typically tolerable, whereas pow guarantees exactness to within one unit in the last place.
The compiler may perform common subexpression elimination. Remember that it can't assume arbitrary functions are free of side effects, but if pow is inlined, then it may well do this.
A good way to compute polynomials is Horner's rule (e.g. here), which doesn't require pow() or any extra memory.
Your expression is x*y times a polynomial in y each of whose coefficients is a polynomial in x.
Each of these coefficients can be calculated using Horner's rule with 8 multiplies and additions, and the polynomial in y with 8 more multiplies and additions, for a total of 74 multiplies and 72 additions, whereas your sample code looks to me like more than 200 multiplications and more than a hundred calls to pow().
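A sketch of that nesting in Python (the same structure translates directly into the generated C; the 9x9 coefficient table here is just a placeholder):
def eval_poly(c, x, y):
    # evaluates x*y * sum_{i,j} c[i][j] * x^i * y^j with a Horner scheme in x
    # for every coefficient of y^j, then a Horner scheme in y
    acc = 0.0
    for j in reversed(range(len(c[0]))):
        inner = 0.0
        for i in reversed(range(len(c))):
            inner = inner * x + c[i][j]
        acc = acc * y + inner
    return x * y * acc

c = [[1.0] * 9 for _ in range(9)]    # placeholder coefficient table
print(eval_poly(c, 1.1, 0.9))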
pow may be optimized away depending on the toolchain. The only way you can tell is to try it and see.
In the general case, unless the implementation of pow is visible to the compiler as a macro or an inline function, the compiler can't cache the result, as it doesn't know what side effects the function may have.
Profile, find out where the bottlenecks are.
If the sub-expressions are used frequently, it may make sense to cache or store the intermediate values. However, accessing these values may take more time than letting the values sit in a data pipeline within the processor. Data fetches outside of the processor are much slower than fetching from its internal data cache.
Also try using Algebra to simplify the mathematical expressions. Perhaps even Linear Algebra to find some more efficient matrix expressions.
You may want to isolate the calculations to expressions involving one variable. Compilers can optimize code better when only one variable is used or changing at a time. For example, substitute the y variable with expressions involving x, if possible. This would reduce to an expression only involving x.
Also search the web for "data driven design" or "data oriented design". These sites show how to optimize code for data centric applications.

Sparse constrained linear least-squares solver

This great SO answer points to a good sparse solver for Ax=b, but I've got constraints on x such that each element of x is >= 0 and <= N.
Also, A is huge (around 2e6x2e6) but very sparse with <=4 elements per row.
Any ideas/recommendations? I'm looking for something like MATLAB's lsqlin but with huge sparse matrices.
I'm essentially trying to solve the large-scale bounded-variable least squares problem on sparse matrices: minimize ||Ax - b|| subject to 0 <= x <= N.
EDIT:
In CVX:
cvx_begin
variable x(n)
minimize( norm(A*x-b) );
subject to
x <= N;
x >= 0;
cvx_end
You are trying to solve least squares with box constraints. Standard sparse least squares algorithms include LSQR and more recently, LSMR. These only require you to apply matrix-vector products. To add in the constraints, realize that if you are in the interior of the box (none of the constraints are "active"), then you proceed with whatever interior point method you chose. For all active constraints, the next iteration you perform will either deactivate the constraint, or constrain you to move along the constraint hyperplane. With some (conceptually relatively simple) suitable modifications to the algorithm you choose, you can implement these constraints.
Generally however, you can use any convex optimization package. I have personally solved this exact type of problem using the Matlab package CVX, which uses SDPT3/SeDuMi for a backend. CVX is merely a very convenient wrapper around these semidefinite program solvers.
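If Python is an option, a bounded sparse least-squares solver along these lines is available as SciPy's lsq_linear (its default 'trf' method uses LSMR internally for sparse input and handles the box constraints for you). A small sketch with random data standing in for the actual A and b:
import numpy as np
import scipy.sparse as sp
from scipy.optimize import lsq_linear

n, N = 1000, 5.0                                      # problem size and upper bound (examples)
A = sp.random(n, n, density=4.0 / n, format='csr')    # roughly 4 nonzeros per row, as in the question
b = np.random.rand(n)

res = lsq_linear(A, b, bounds=(0.0, N), lsmr_tol='auto', verbose=1)
x = res.x                                             # solution with 0 <= x_i <= N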
Your problem is similar to a nonnegative least-squares problem (NNLS), which can be formulated as
$$\min_x ||Ax-b||_2^2 \text{ subject to } x \ge 0$$,
for which there seems to exist many algorithms.
Actually, your problem can be more or less converted into an NNLS problem if, in addition to your original nonnegative variables $x$, you create additional variables $x'$ and link them with the linear constraints $x_i + x_i' = N$. The problem with this approach is that these additional linear constraints might not be satisfied exactly in the least-squares solution; it might then be appropriate to weight them with a large number.
If you reformulate your model as
$$\min_x \; x^T A^T A x - 2 b^T A x \quad \text{subject to} \quad 0 \le x \le N,$$
then it is a standard quadratic programming problem. This is a common type of model that can be easily solved with a commercial solver such as CPLEX or Gurobi. (Disclaimer: I currently work for Gurobi Optimization and formerly worked for ILOG, which provided CPLEX.)
Your matrix A^T A is positive semi-definite, so your problem is convex; be sure to take advantage of that when setting up your solver.
Most go-to QP solvers are written in Fortran and/or are non-free; however, I've heard good things about OOQP (http://www.mcs.anl.gov/research/projects/otc/Tools/OOQP/OoqpRequestForm.html), though it's a bit of a pain to get a copy.
How about CVXOPT? It works with sparse matrices, and it seems that some of the cone programming solvers may help:
http://abel.ee.ucla.edu/cvxopt/userguide/coneprog.html#quadratic-cone-programs
This is a simple modification of the code in the doc above, to solve your problem:
from cvxopt import matrix, solvers

A = matrix([[ .3, -.4,  -.2,  -.4,  1.3],
            [ .6, 1.2, -1.7,   .3,  -.3],
            [-.3,  .0,   .6, -1.2, -2.0]])
b = matrix([1.5, .0, -1.2, -.7, .0])
N = 2.0
m, n = A.size

I = matrix(0.0, (n, n))
I[::n+1] = 1.0                       # n x n identity
G = matrix([-I, I])                  # stacking -I over I encodes -x <= 0 and x <= N
h = matrix(n*[0.0] + n*[N])
print(G)
print(h)

# the 'dims' argument used by the cone programs in the linked doc is not needed here,
# since all the constraints are plain linear inequalities G*x <= h
x = solvers.coneqp(A.T*A, -A.T*b, G, h)['x']
print(x)
CVXOPT supports sparse matrices, so it should be useful for you.
If you have Matlab, this is something you can do with TFOCS. This is the syntax you would use to solve the problem:
x = tfocs( smooth_quad, { A, -b }, proj_box( 0, N ) );
You can pass A as a function handle if it's too big to fit into memory.