Efficient evaluation of arbitrary functions given as data, in C++

Consider the following goal:
Create a program that solves: minimize f(x) for an arbitrary f and x supplied as input.
How could one design a C++ program that could receive a description of f and x and process it efficiently?
If the program were actually a C++ library, one could explicitly write the code for f and x (probably inheriting from some base function class for f and state class for x).
However, what should one do if the program is, for example, a service, and the user sends the description of f and x in some high-level representation, e.g. a JSON object?
Ideas that come to mind
1- Convert f into an internal function representation (e.g. a list of basic operations). Apply those whenever f is evaluated.
Problems: this is inefficient unless each operation is a batch operation (e.g. if we are doing vector or matrix operations with large vectors / matrices); a minimal sketch of this idea follows the list.
2- Somehow generate C++ code and compile the code for representing x and computing f. Is there a way to restrict compilation so that only that code needs to be compiled, but the rest of the code is 'pre-compiled' already?
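To make idea 1 concrete, here is a minimal sketch of what such an internal representation could look like (the operation set and all names are invented for illustration): f is lowered to a flat "tape" of batch operations, and each instruction does O(n) work on a whole vector, so the interpretation overhead per instruction is amortised:
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical internal representation of f: a flat tape of batch operations.
enum class Op { AddVec, MulScalar, SumSquares };
struct Instr {
    Op op;
    int src;        // index into 'data' for AddVec, unused otherwise
    double scalar;  // used by MulScalar
};

// Interpret the tape on a working copy of x; each instruction touches the whole vector.
double eval(const std::vector<Instr>& tape, std::vector<double> x,
            const std::vector<std::vector<double>>& data)
{
    double result = 0.0;
    for (const Instr& ins : tape) {
        switch (ins.op) {
        case Op::AddVec:
            for (std::size_t i = 0; i < x.size(); ++i) x[i] += data[ins.src][i];
            break;
        case Op::MulScalar:
            for (double& v : x) v *= ins.scalar;
            break;
        case Op::SumSquares:
            result = 0.0;
            for (double v : x) result += v * v;
            break;
        default:
            throw std::runtime_error("unknown op");
        }
    }
    return result;
}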

The usual approach, used by the mp library and others, is to create an expression tree (or DAG) and apply some kind of nonlinear optimization method, which normally relies on derivative information that can be computed using automatic or numeric differentiation.
An expression tree can be efficiently traversed for evaluation using a generic visitor pattern. Using a JIT might be overkill unless function evaluation takes a substantial fraction of the optimization time.
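For illustration only, here is a minimal sketch of such a tree with a visitor-based evaluator (the node and visitor types below are invented, not taken from the mp library). The JSON description would be parsed once into this tree, which is then evaluated many times during optimization:
#include <cmath>
#include <memory>
#include <vector>

struct Var; struct Const; struct Add; struct Sin;

// Generic visitor over the (hypothetical) node types.
struct Visitor {
    virtual ~Visitor() = default;
    virtual double visit(const Var&) = 0;
    virtual double visit(const Const&) = 0;
    virtual double visit(const Add&) = 0;
    virtual double visit(const Sin&) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual double accept(Visitor& v) const = 0;
};

struct Var : Expr {
    int index;
    explicit Var(int i) : index(i) {}
    double accept(Visitor& v) const override { return v.visit(*this); }
};
struct Const : Expr {
    double value;
    explicit Const(double c) : value(c) {}
    double accept(Visitor& v) const override { return v.visit(*this); }
};
struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) : lhs(std::move(l)), rhs(std::move(r)) {}
    double accept(Visitor& v) const override { return v.visit(*this); }
};
struct Sin : Expr {
    std::unique_ptr<Expr> arg;
    explicit Sin(std::unique_ptr<Expr> a) : arg(std::move(a)) {}
    double accept(Visitor& v) const override { return v.visit(*this); }
};

// One visitor evaluates the tree at a point x; another could compute derivatives.
struct Evaluator : Visitor {
    const std::vector<double>& x;
    explicit Evaluator(const std::vector<double>& x_) : x(x_) {}
    double visit(const Var& e)   override { return x[e.index]; }
    double visit(const Const& e) override { return e.value; }
    double visit(const Add& e)   override { return e.lhs->accept(*this) + e.rhs->accept(*this); }
    double visit(const Sin& e)   override { return std::sin(e.arg->accept(*this)); }
};

// Building and evaluating f(x0) = sin(x0) + 3.0:
//   auto f = std::make_unique<Add>(std::make_unique<Sin>(std::make_unique<Var>(0)),
//                                  std::make_unique<Const>(3.0));
//   Evaluator ev(x);  double value = f->accept(ev);
A second visitor with the same interface could walk the same tree to accumulate derivatives, which is where automatic differentiation would fit in.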

Related

C++ pre-runtime symbolic differentiation

Suppose I want to implement a gradient descent based optimizer of some function f(x),
which is a combination of simple functions: +,*,sin,cos (leaving out / for simplicity)
Is there a way, using templates, to symbolically calculate the derivative and produce a function f'(x)
in C++, and then use the function and its gradient at runtime to optimize it?
I'm comfortable with symbolic mathematics, so that isn't the focus of the question.
I could write a parser and input the function as a string, expanding it dynamically at runtime, but especially for more complex functions, this is liable to be slow.
If there's a way to produce the function at compile time, that would be awesome.
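A minimal sketch of what the template route could look like (every type name here is invented): the expression is encoded as a type, and a metafunction produces the derivative type at compile time, so both f and f' are ordinary inlinable functions at runtime. Only +, * and sin/cos are covered:
#include <cmath>

// Hypothetical expression types; the function is encoded in the type, not in data.
struct X {                               // the variable
    static double eval(double x) { return x; }
};
template <int N>
struct Cst {                             // small integer constants, enough for this sketch
    static double eval(double) { return N; }
};
template <class L, class R>
struct Add {
    static double eval(double x) { return L::eval(x) + R::eval(x); }
};
template <class L, class R>
struct Mul {
    static double eval(double x) { return L::eval(x) * R::eval(x); }
};
template <class A>
struct Sin {
    static double eval(double x) { return std::sin(A::eval(x)); }
};
template <class A>
struct Cos {
    static double eval(double x) { return std::cos(A::eval(x)); }
};

// D<E>::type is the symbolic derivative of E, built entirely at compile time.
template <class E> struct D;
template <>        struct D<X>      { using type = Cst<1>; };
template <int N>   struct D<Cst<N>> { using type = Cst<0>; };
template <class L, class R>
struct D<Add<L, R>> { using type = Add<typename D<L>::type, typename D<R>::type>; };
template <class L, class R>
struct D<Mul<L, R>> {                    // product rule: (L*R)' = L'*R + L*R'
    using type = Add<Mul<typename D<L>::type, R>, Mul<L, typename D<R>::type>>;
};
template <class A>
struct D<Sin<A>> { using type = Mul<Cos<A>, typename D<A>::type>; };          // chain rule
template <class A>
struct D<Cos<A>> { using type = Mul<Mul<Cst<-1>, Sin<A>>, typename D<A>::type>; };

// Example: f(x) = x * sin(x); both f and f' are plain functions at runtime.
using F  = Mul<X, Sin<X>>;
using dF = D<F>::type;
// double y = F::eval(1.0), g = dF::eval(1.0);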

How to extract matrixL() and matrixU() when using Eigen::CholmodSupernodalLLT?

I'm trying to use Eigen::CholmodSupernodalLLT for Cholesky decomposition, however, it seems that I could not get matrixL() and matrixU(). How can I extract matrixL() and matrixU() from Eigen::CholmodSupernodalLLT for future use?
A partial answer to integrate what others have said.
Consider Y ~ MultivariateNormal(0, A). One may want to (1) evaluate the (log-)likelihood (a multivariate normal density), and (2) sample from that density.
For (1), it is necessary to solve Ax = b where A is symmetric positive-definite, and to compute its log-determinant. (2) requires L such that A = L * L.transpose(), since Y ~ MultivariateNormal(0, A) can be obtained as Y = L u where u ~ MultivariateNormal(0, I).
A Cholesky LLT or LDLT decomposition is useful because chol(A) can be used for both purposes. Solving Ax = b is easy given the decomposition, and the (log-)determinant can easily be derived from the (sum-)product of the (log-)components of D or of the diagonal of L. By definition, L can then be used for sampling.
So, in Eigen one can use:
Eigen::SimplicialLDLT solver(A) (or Eigen::SimplicialLLT): then solve with solver.solve(b) and compute the (log-)determinant from solver.vectorD(). Useful because if A is a covariance matrix, then solver can be used for likelihood evaluations, and matrixL() for sampling.
Eigen::CholmodDecomposition does not give access to matrixL() or vectorD(), but it exposes .logDeterminant(), which achieves goal (1) but not (2).
Eigen::PardisoLDLT does not give access to matrixL() or vectorD() and does not expose a way to get the determinant.
In some applications, step (2) - sampling - can be done at a later stage, so Eigen::CholmodDecomposition is enough. At least in my configuration, Eigen::CholmodDecomposition runs 2 to 5 times faster than Eigen::SimplicialLDLT (I guess because of the permutations done under the hood to facilitate parallelization).
Example: in Bayesian spatial Gaussian process regression, the spatial random effects can be integrated out and do not need to be sampled. So MCMC can proceed swiftly with Eigen::CholmodDecomposition to achieve convergence for the unknown parameters. The spatial random effects can then be recovered in parallel using Eigen::SimplicialLDLT. Typically this is only a small part of the computation, but having matrixL() directly from CholmodDecomposition would simplify it a bit.
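Putting the SimplicialLDLT pieces above together, a hedged sketch of both uses (A is assumed sparse and SPD; the random-number plumbing is only illustrative):
#include <Eigen/Sparse>
#include <random>

// Sketch: A is a sparse symmetric positive-definite covariance matrix, b a data vector.
void loglik_and_sample(const Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
{
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> solver(A);

    // Goal (1): solve A x = b and obtain log|A| from the diagonal D of the factorisation.
    Eigen::VectorXd x = solver.solve(b);
    double logdet = solver.vectorD().array().log().sum();

    // Goal (2): the factorisation is P A P' = L D L', so Y = P' L D^(1/2) u has covariance A
    // when u is a vector of independent standard normal draws.
    std::mt19937 gen(123);
    std::normal_distribution<double> stdnorm(0.0, 1.0);
    Eigen::VectorXd u(A.rows());
    for (int i = 0; i < u.size(); ++i) u[i] = stdnorm(gen);

    Eigen::SparseMatrix<double> L = solver.matrixL();   // materialise the (unit-lower) factor
    Eigen::VectorXd v = (solver.vectorD().array().sqrt() * u.array()).matrix();
    Eigen::VectorXd y = solver.permutationPinv() * (L * v);
    (void)x; (void)logdet; (void)y;                     // use these in real code
}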
You cannot do this using the given class. The class you are referencing is an equation solver (which indeed uses a Cholesky decomposition internally). To decompose your matrix you should instead use Eigen::LLT. Code example from their website:
MatrixXd A(3,3);
A << 4,-1,2, -1,6,0, 2,0,5;
LLT<MatrixXd> lltOfA(A);
MatrixXd L = lltOfA.matrixL();
MatrixXd U = lltOfA.matrixU();
As reported elsewhere, it cannot be done easily.
I am copying a possible recommendation (answered by Gael Guennebaud himself), even if somewhat old:
If you really need access to the factor to do your own cooking, then
better use the built-in SimplicialL{D}LT<> class. Extracting the
factors from the supernodal internal representations of Cholmod/Pardiso
is indeed not straightforward and very rarely needed. We have to
check, but if Cholmod/Pardiso provide routines to manipulate the
factors, like applying it to a vector, then we could let
matrix{L,U}() return a pseudo expression wrapping these routines.
Developing code for extracting this is likely beyond SO, and probably a topic for a feature request.
Of course, the solution with LLT is at hand (but not the topic of the OP).

Minimization of anonymous function in C++

I have a C++ program that runs in a loop: each iteration composes a function (a different one every time) and then minimizes it. Composing the function is implemented with the GiNaC package (symbolic expressions).
I tried to minimize the functions using Matlab's fmincon, but it ate all the memory while converting the string to a lambda function (the functions are rather complicated). And I couldn't manage to export a function from C++ to Matlab in any way other than as a string.
Is there any way to compose a complicated function (3 variables, sin, cos, square root, etc.) and minimize it without determining the gradient myself, given that I don't know what the functions look like before running the program?
I also looked at NLopt, and as I understood it, it requires gradients to be written by the programmer.
Most optimization algorithms do require the gradient. However, if it's impossible to 'know' it directly, you may evaluate it by considering a small increment of every coordinate. If your function F depends on the coordinates of the vector x, you may approximate the i-th component of your gradient vector G as
x1 = x;
x1[i] += dx;
G[i] = (F(x1) - F(x))/dx;
where dx is some small increment. Although such a calculation is only approximate, it is usually perfectly adequate for finding a minimum, provided that dx is small enough.
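Putting that together, a self-contained sketch (the objective function and the step size are arbitrary examples, not part of the original answer):
#include <cmath>
#include <cstdio>
#include <vector>

// Example objective; in the OP's setting this would be whatever GiNaC composed.
double F(const std::vector<double>& x)
{
    return std::sin(x[0]) * std::cos(x[1]) + std::sqrt(x[2] * x[2] + 1.0);
}

// Forward-difference approximation: G[i] ~= (F(x + dx*e_i) - F(x)) / dx.
std::vector<double> gradient(const std::vector<double>& x, double dx = 1e-6)
{
    std::vector<double> G(x.size());
    const double f0 = F(x);
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<double> x1 = x;
        x1[i] += dx;
        G[i] = (F(x1) - f0) / dx;
    }
    return G;
}

int main()
{
    std::vector<double> x = {1.0, 2.0, 3.0};
    const double step = 0.1;                 // fixed step size, for illustration only
    for (int iter = 0; iter < 1000; ++iter) {
        std::vector<double> G = gradient(x);
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= step * G[i];
    }
    std::printf("minimum found near F(x) = %g\n", F(x));
}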

LPSolve - specify constant coefficients

I'm using the LPSolve IDE to solve an LP problem. I have to test the model against about 10 or 20 sets of different parameters and compare them.
Is there any way for me to keep the general model, but to specify the constants as I wish? For example, if I have the following constraint:
A >= [c]*B
I want to test how the model behaves when [c] = 10, [c] = 20, and so on. For now, I'm simply preparing different .lp files via search&replace, but:
a) it doesn't seem too efficient
b) at some point, I need to consider a constraint of the form A >= B/[c] // = (1/[c])*B. It seems, however, that LPSolve doesn't recognize the division operator. Is specifying 1/[c] directly each time the only option?
It is not completely clear which format you use with lp_solve. With the CPLEX LP format, for example, there is no better way: you cannot use division for a coefficient (or even multiplication, for that matter), and there is no facility to 'include' another file or to introduce a symbolic name for a parameter. It is a very simple language and not suitable for any complex task.
There are several solutions to your problem; it depends on whether you are interested in something fast to implement, or in something 'clean', reusable, and with a short runtime (of course this is a compromise).
You have the possibility to generate your lp files from another language, e.g. Python, bash, etc. This is a 'quick and dirty' solution: very slow at runtime, but probably the fastest to implement (a small sketch is given at the end of this answer).
Like every LP solver I know of, lp_solve comes with several modelling interfaces: you can, for example, use the GNU MathProg format instead of the current one. It recognizes multiplication, division, conditionals, etc. (everything you are looking for; see section 3.1, 'numeric expressions').
Finally, you have the possibility to use the lp_solve API directly from another programming language (e.g. C), which will be the most flexible option, but it may require a bit more work.
See the lp_solve documentation for more details on the supported input formats and the API reference.
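As a sketch of the first option (generating the .lp files programmatically), with a made-up objective and the OP's constraint rewritten so that only a constant appears on the right-hand side; the exact lp-format syntax should be checked against the lp_solve documentation:
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Writes one .lp file per value of the parameter c, so each variant can be fed to lp_solve.
// The model itself is only a stand-in for the OP's real one.
void write_model(const std::string& filename, double c)
{
    std::ofstream lp(filename);
    lp << "/* generated model, c = " << c << " */\n";
    lp << "max: 3 A + 2 B;\n\n";
    lp << "c1: A - " << c << " B >= 0;\n";   // i.e. A >= c * B
    lp << "c2: A + B <= 100;\n";
}

int main()
{
    std::vector<double> cs = {10, 20, 30};
    for (double c : cs) {
        char name[64];
        std::snprintf(name, sizeof(name), "model_c%.0f.lp", c);
        write_model(name, c);
    }
}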

Why is polymorphism so costly in Haskell (GHC)?

I am asking this question with reference to this SO question.
The accepted answer by Don Stewart starts with "Your code is highly polymorphic change all float vars to Double ..", and that change gives a 4x performance improvement.
I am interested in doing matrix computations in Haskell; should I make it a habit of writing highly monomorphic code?
But some languages (read: C++ or D) make good use of ad-hoc polymorphism to generate fast code, so why won't or can't GHC?
And why can't we have something like Blitz++ or Eigen for Haskell? I don't understand how typeclasses and (ad-hoc) polymorphism work in GHC.
With polymorphic code, there is usually a tradeoff between code size and code speed. Either you produce a separate version of the same code for each type that it will operate on, which results in larger code, or you produce a single version that can operate on multiple types, which will be slower.
C++ implementations of templates choose in favor of increasing code speed at the cost of increasing code size. By default, GHC takes the opposite tradeoff. However, it is possible to get GHC to produce separate versions for different types using the SPECIALIZE and INLINABLE pragmas. This will result in polymorphic code that has speed similar to monomorphic code.
I want to supplement Dirk's answer by saying that INLINABLE is usually recommended over SPECIALIZE. An INLINABLE annotation on a function guarantees that the module exports the original source code of the function so that it can be specialized at the point of usage. This usually removes the need to provide separate SPECIALIZE pragmas for every single use case.
Unlike INLINE, INLINABLE does not change GHC's optimization heuristics. It just says "Please export the source code".
I don't understand how typeclasses work in GHC.
OK, consider this function:
linear :: Num x => x -> x -> x -> x
linear a b x = a*x + b
This takes three numbers as input, and returns a number as output. This function accepts any number type; it is polymorphic. How does GHC implement that? Well, essentially the compiler creates a "class dictionary" which holds all the class methods inside it (in this case, +, -, *, etc.) This dictionary becomes an extra, hidden argument to the function. Something like this:
data NumDict x =
    NumDict
    {
        method_add      :: x -> x -> x,
        method_subtract :: x -> x -> x,
        method_multiply :: x -> x -> x,
        ...
    }
linear :: NumDict x -> x -> x -> x -> x
linear dict a b x = method_add dict (method_multiply dict a x) b
Whenever you call the function, the compiler automatically inserts the correct dictionary - unless the calling function is also polymorphic, in which case it will have received a dictionary itself, so just pass that along.
In truth, functions that lack polymorphism are typically faster not so much because of a lack of function look-ups, but because knowing the types allows additional optimisations to be done. For example, our polymorphic linear function will work on numbers, vectors, matrices, ratios, complex numbers, anything. Now, if the compiler knows that we want to use it on, say, Double, all the operations become single machine-code instructions, all the operands can be passed in processor registers, and so on. All of which results in fantastically efficient code. Even if it's complex numbers with Double components, we can make it nice and efficient. If we have no idea what type we'll get, we can't do any of those optimisations... That's where most of the speed difference typically comes from.
For a tiny function like linear, it's highly likely it will be inlined every time it's called, resulting in no polymorphism overhead and a small amount of code duplication - rather like a C++ template. For a larger, more complex polymorphic function, there may be some cost. In general, the compiler decides this, not you - unless you want to start sprinkling pragmas around the place. ;-) Or, if you don't actually use any polymorphism, you can just give everything monomorphic type signatures...