SymPy: Interval Math

Intro
We have been working on a recent project and have been looking for a suitable system for the numerical calculations involved. SymPy was recommended as a rich mathematical library, but we have been unable to make it "work" with our project.
The specific issue we have been struggling with is that many of the values we use have been rounded numerous times and are likely susceptible to floating-point error. To work around this on a previous project, we put Interval Arithmetic for JavaScript to fairly effective use. mpmath for Python appears to be similar, but SymPy not only uses mpmath, it also offers other potentially useful functions we may need in the future.
Problem
A sample equation that we have been working with lately is a = b * (1 + c * d * e), and we are looking to solve for e when all other variables are known. However, some of the variables need to be represented as a range of values, because we don't know the exact value, only a small range containing it.
Code
from sympy import *
from sympy.sets.setexpr import SetExpr
a, b, c, d, e = symbols('a b c d e')
b = 40
c = 1
d = 0.1
a = SetExpr(Interval(45.995, 46.005))
equ = Eq(b * (1 + c * d * e), a)
solveset(equ, e)
This raises:
ValueError: The argument '45.995*I' is not comparable.
This was just the latest attempt; I have also tried setting domains, setting inequalities on symbols, using AccumBounds, and numerous other approaches, but I can't help thinking that we have completely overlooked something simple.
Solution
It appears that using one interval is doable with the code provided in the selected answer, but the approach doesn't extend to multiple symbols requiring intervals or ranges of values. It looks like we will be extending the mpmath library to support the additional interval functions we need.
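For what it's worth, mpmath's built-in interval type (mpmath.mpi) can already evaluate the rearranged equation with more than one uncertain input. A minimal sketch, assuming the algebra e = (a - b) / (b * c * d) is done by hand:
from mpmath import mpi

b = 40
d = 0.1
a = mpi(45.995, 46.005)   # value known only to within a small range
c = mpi(0.95, 1.05)       # a second uncertain input

e = (a - b) / (b * c * d)
print(e)                  # roughly [1.42738..., 1.58026...], matching the result further down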

There is an intervalmath module in SymPy, in the plotting module for some reason. It doesn't subclass Basic, though, so it can't be used directly in an expression. We can, however, use lambdify to substitute it into an expression:
from sympy import *
from sympy.plotting.intervalmath import interval
b = 40
c = 1
d = 0.1
a, e = symbols('a, e', real=True)
equ = Eq(b * (1 + c * d * e), a)
sol_e, = solveset(equ, e)
f_e = lambdify(a, sol_e)
int_a = interval(45.995, 46.005)
int_e = f_e(a=int_a)
print(int_e)
This gives
[1.498750, 1.501250]
I don't think the intervalmath module is used much, though, so there's a good chance it might not fully work in your real problem.
Using sets is probably a better approach, and it seems that imageset can do this:
In [16]: set_e = imageset(Lambda(a, sol_e), Interval(45.995, 46.005))
In [17]: set_e
Out[17]: [1.49875, 1.50125]
I'm not sure how well this works with more than one symbol/interval though.
EDIT: For completeness I'm showing how you would use intervalmath with more than one interval:
from sympy import *
from sympy.plotting.intervalmath import interval
b = 40
d = 0.1
a, c, e = symbols('a, c, e', real=True)
equ = Eq(b * (1 + c * d * e), a)
sol_e, = solveset(equ, e)
f_e = lambdify((a, c), sol_e)
int_a = interval(45.995, 46.005)
int_c = interval(0.95, 1.05)
int_e = f_e(a=int_a, c=int_c)
print(int_e)
That gives
[1.427381, 1.580263]

Related

Summing two Poisson distributions with SymPy

I'm new to SymPy and I'm trying to use it to sum two Poisson distributions.
Here's what I have so far (using a Jupyter notebook):
from sympy import *
from sympy.stats import *
init_printing(use_latex='mathjax')
lamda_1, lamda_2 = symbols('lamda_1, lamda_2')
n_1 = Symbol('n_1')
n_2 = Symbol('n_2')
n = Symbol('n')
#setting up distributions
N_1 = density(Poisson('N_1', lamda_1))(n_1)
N_2 = density(Poisson('N_2', lamda_2))(n_2)
display(N_1)
display(N_2)
print('setting N_2 in terms of N and N_1')
N_2 = N_2.subs(n_2,n-n_1)
display(N_2)
print("N_1 * N_2")
N = N_1 * N_2
#display(N)
Sum(N,(n_1,0,n))
#summation(N,(n_1,0,n))
Everything works fine until I try to run the summation. There are no errors; it just doesn't do anything, and Jupyter says it's still running. I've let it run for 10 minutes and nothing...
When declaring symbols, include their properties: positive, integer, nonnegative, and so on. This helps SymPy decide whether certain transformations are legitimate.
lamda_1, lamda_2 = symbols('lamda_1, lamda_2', positive=True)
n_1, n_2, n = symbols('n_1 n_2 n', nonnegative=True, integer=True)
Unfortunately, summation still fails because SymPy cannot come up with the key trick: multiplying and dividing by factorial(n). It seems one has to tell it to do that.
s = summation(N*factorial(n), (n_1, 0, n))/factorial(n)
print(s.simplify())
This prints
Piecewise(((lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n), ((-n >= 0) & (lamda_1/lamda_2 <= 1)) | ((-n < 0) & (lamda_1/lamda_2 <= 1))), (lamda_2**n*exp(-lamda_1 - lamda_2)*Sum(lamda_1**n_1*lamda_2**(-n_1)/(factorial(n_1)*factorial(n - n_1)), (n_1, 0, n)), True))
which is a piecewise formula full of unnecessary conditions... but if we ignore those conditions (they are just artifacts of how SymPy performed the summation), the correct result
(lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n)
is there.
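As a quick sanity check (my own addition, not part of the original answer), this closed form agrees with the density of a single Poisson variable with rate lamda_1 + lamda_2:
from sympy import symbols, exp, factorial, simplify
from sympy.stats import density, Poisson

lamda_1, lamda_2 = symbols('lamda_1 lamda_2', positive=True)
n = symbols('n', nonnegative=True, integer=True)

# Density of one Poisson variable with the combined rate, evaluated at n
combined = density(Poisson('M', lamda_1 + lamda_2))(n)
closed_form = (lamda_1 + lamda_2)**n * exp(-lamda_1 - lamda_2) / factorial(n)

print(simplify(combined - closed_form))  # prints 0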
Aside: avoid doing import * from both sympy and sympy.stats; there are notational clashes, such as E meaning 2.718... in one and the expected value in the other. from sympy.stats import density, Poisson would be better. Also, N is a built-in SymPy function and is best avoided as a variable name.

Timothy Lottes generic tonemapper

I'm looking at a presentation by Timothy Lottes in which he derives a generic tonemapper (slides 37 and following).
Although the purpose of the different parameters is explained nicely, I find it quite hard to adjust them properly. I wrote a simple script to compare different tonemappers and am having trouble finding reasonable settings for the generic tonemapper.
In general I cannot get the shoulder of the curve to behave comparably to the other operators. Maybe it is a mistake in my implementation (the original source code is in the slides).
import math

def generic(x):
    a = 1.2        # contrast
    d = 1.1        # shoulder
    mid_in = 1
    mid_out = 0.18
    hdr_max = 16
    # It seems to work better when omitting the minus
    b = (-math.pow(mid_in, a) + math.pow(hdr_max, a) * mid_out) / (math.pow(math.pow(hdr_max, a), d) - math.pow(math.pow(mid_in, a), d) * mid_out)
    c = (math.pow(math.pow(hdr_max, a), d) * math.pow(mid_in, a) - math.pow(hdr_max, a) * math.pow(math.pow(mid_in, a), d) * mid_out) / (math.pow(math.pow(hdr_max, a), d) - math.pow(math.pow(mid_in, a), d) * mid_out)
    z = math.pow(x, a)
    y = z / (math.pow(z, d) * b + c)
    return y
Has anybody experimented with this by chance?
Apparently there is a problem in the code presented in the slides. Bart Wronski gives some corrected code in the comments section of his blog post.
I have also updated the github project to reflect this.
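For reference, here is a sketch of the constants I get when deriving b and c directly from the two constraints f(mid_in) = mid_out and f(hdr_max) = 1. This is my own derivation, so check it against the corrected code in Wronski's comments before relying on it:
import math

def generic_tonemap(x, a=1.2, d=1.1, mid_in=1.0, mid_out=0.18, hdr_max=16.0):
    # The curve is y = z / (z^d * b + c) with z = x^a.
    z0 = math.pow(mid_in, a)     # mid_in^a
    z1 = math.pow(hdr_max, a)    # hdr_max^a
    w0 = math.pow(z0, d)         # (mid_in^a)^d
    w1 = math.pow(z1, d)         # (hdr_max^a)^d
    # Solving f(mid_in) = mid_out and f(hdr_max) = 1 gives these constants;
    # note the denominator is (w1 - w0) * mid_out, not w1 - w0 * mid_out.
    b = (-z0 + z1 * mid_out) / ((w1 - w0) * mid_out)
    c = (w1 * z0 - z1 * w0 * mid_out) / ((w1 - w0) * mid_out)
    z = math.pow(x, a)
    return z / (math.pow(z, d) * b + c)

print(generic_tonemap(1.0))   # should print 0.18, i.e. mid_in maps to mid_out
print(generic_tonemap(16.0))  # should print 1.0, i.e. hdr_max maps to white
With these constants the two anchor points land exactly where they should, which is an easy way to check any implementation of the constants.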

Parameter optimization using a genetic algorithm?

I am trying to optimize parameters of a known function to fit an experimental data plot. The function is fairly involved,
where x sweeps along a known set of numbers and p, g and c are the independent parameters to be optimized. Any ideas or resources that could be of assistance?
I would not recommend genetic algorithms. Instead, go for straightforward optimization.
Scipy has some resources.
You haven't provided any data, so I'll just go for something that should run. Below is something to get you started; I can't know whether it works without seeing the data. Also, there is probably a way to feed objectivefunc your x and y data dynamically; that's likely covered in the docs for scipy.optimize.minimize.
What I've done: create a function to minimize, here called objectivefunc. For that I've taken your function y = x^2 * p^2 * g / ... and transformed it to the form x^2 * p^2 * g / (...) - y = 0. Then square the left-hand side and try to minimise it. Because you will have multiple (x, y) data samples, I'd minimise the sum of the squares. Put it all in a function and pass it to minimize from scipy.
import numpy as np
from scipy.optimize import minimize

def objectivefunc(pgq):
    """Your function transformed so that it can be minimised.
    I've renamed the input pgq, so that pgq[0] is p, pgq[1] is g, and pgq[2] is c.
    """
    p = pgq[0]
    g = pgq[1]
    c = pgq[2]
    x = [10, 9.4, 17]  # Some input data.
    y = [12, 42, 0.8]
    sum_ = 0
    for i in range(len(x)):
        sum_ += (x[i]**2 * p**2 * g - y[i] * ((c**2 - x[i]**2)**2 + x[i]**2 * g**2))**2
    return sum_

pgq = np.array([1.3, 0.7, 0.5])  # Supply sensible initial values
res = minimize(objectivefunc, pgq, method='nelder-mead',
               options={'xatol': 1e-8, 'disp': True})
Have you tried good old Levenberg-Marquardt as implemented in Levenberg-Marquardt.vi? If it does not suit your needs, you can try the Waptia library for LabVIEW, which has several genetic algorithms implemented.
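If you'd rather stay in Python, Levenberg-Marquardt is also what scipy.optimize.curve_fit uses by default when no bounds are given. A rough sketch, using the model shape from the answer above as a placeholder and synthetic data just to make it self-contained (substitute your real formula and measured data):
import numpy as np
from scipy.optimize import curve_fit

def model(x, p, g, c):
    # Placeholder for y = x^2 p^2 g / ((c^2 - x^2)^2 + x^2 g^2)
    return x**2 * p**2 * g / ((c**2 - x**2)**2 + x**2 * g**2)

# Synthetic data generated from known parameters, only to make the sketch runnable.
true_p, true_g, true_c = 1.3, 0.7, 12.0
xdata = np.linspace(5.0, 20.0, 50)
ydata = model(xdata, true_p, true_g, true_c)

# With no bounds, curve_fit uses the Levenberg-Marquardt algorithm ('lm').
popt, pcov = curve_fit(model, xdata, ydata, p0=[1.0, 1.0, 10.0], method='lm')
print(popt)  # should recover roughly [1.3, 0.7, 12.0]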

Why does it require 20X more time for these calculations when the values get tiny [duplicate]

This question was closed as a duplicate of "Why does changing 0.1f to 0 slow down performance by 10x?".
I am a circuit designer, not a software engineer, so I have no idea how to track down this problem.
I am working with some IIR filter code and I am having problems with extremely slow execution times when I process extremely small values through the filter. To find the problem, I wrote this test code.
Normally, the loop runs in about 200 ms or so (I didn't measure it), but when TestCheckBox->Checked is true, it requires about 7 seconds. The problem lies with the reduction in size of A, B, C and D within the loop, which is exactly what happens to the values in an IIR filter after its input goes to zero.
I believe the problem is that the variables' exponent values become less than -308. A simple fix is to declare the variables as long doubles, but that isn't an easy fix in the actual code, and it doesn't seem like I should have to do this.
Any ideas why this happens and what a simple fix might be?
In case it matters, I am using C++ Builder XE3.
int j;
double A, B, C, D, E, F, G, H;
//long double A, B, C, D, E, F, G, H; // a fix
A = (double)random(100000000) / 10000000.0 - 5.0;
B = (double)random(100000000) / 10000000.0 - 5.0;
C = (double)random(100000000) / 10000000.0 - 5.0;
D = (double)random(100000000) / 10000000.0 - 5.0;
if(TestCheckBox->Checked)
{
    A *= 1.0E-300;
    B *= 1.0E-300;
    C *= 1.0E-300;
    D *= 1.0E-300;
}
for(j = 0; j <= 1000000; j++)
{
    A *= 0.9999;
    B *= 0.9999;
    C *= 0.9999;
    D *= 0.9999;
    E = A * B + C - D; // some exercise code
    F = A - C * B + D;
    G = A + B + C + D;
    H = A * C - B + G;
    E = A * B + C - D;
    F = A - C * B + D;
    G = A + B + C + D;
    H = A * C - B + G;
    E = A * B + C - D;
    F = A - C * B + D;
    G = A + B + C + D;
    H = A * C - B + G;
}
EDIT:
As the answers said, the cause of this problem is denormal math, something I had never heard of. Wikipedia has a pretty nice description of it as does the MSDN article given by Sneftel.
http://en.wikipedia.org/wiki/Denormal_number
Having said this, I still can't get my code to flush denormals. The MSDN article says to do this:
_controlfp(_DN_FLUSH, _MCW_DN)
These definitions are not in the XE3 math libraries however, so I used
controlfp(0x01000000, 0x03000000)
per the article, but this has no effect in XE3. Nor does the code suggested in the Wikipedia article.
Any suggestions?
You're running into denormal numbers (ones less than DBL_MIN, in which the most significant digit is treated as a zero). Denormals extend the range of the representable floating-point numbers, and are important to maintain certain useful error bounds in FP arithmetic, but operating on them is far slower than operating on normal FP numbers. They also have lower precision. So you should try to keep all your numbers (both intermediate and final quantities) greater than DBL_MIN.
In order to increase performance, you can force denormals to be flushed to zero by calling _controlfp(_DN_FLUSH, _MCW_DN) (or, depending on OS and compiler, a similar function). http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You've entered the realm of floating-point underflow, resulting in denormalized numbers - depending on the hardware you're likely trapping into software, which will be much much slower than hardware operations.
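If you want to see the effect without touching the C++ code, here is a small NumPy sketch (my own illustration, not from the answers above). On many x86 CPUs the denormal case runs dramatically slower unless flush-to-zero is enabled; the exact ratio depends on the hardware:
import time
import numpy as np

def time_scaling(start_value, reps=100, size=100_000):
    # Repeatedly scale an array by 0.9999, mirroring the loop in the question.
    a = np.full(size, start_value)
    t0 = time.perf_counter()
    for _ in range(reps):
        a *= 0.9999
    return time.perf_counter() - t0

print("normal values:  ", time_scaling(1.0))
print("denormal values:", time_scaling(1e-310))  # already below DBL_MIN (~2.2e-308)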

Optimization to find complex number as input

I am wondering if there is a C/C++ library or MATLAB technique for determining real and complex unknowns using a minimization solver. Here is a code snippet showing what I would like to do. For example, suppose that I know Utilde, but not x and U. I want to use optimization (fminsearch) to determine x and U, given Utilde. Note that Utilde is a complex number.
x = 1.5;
U = 50 + 1i*25;
x0 = [1 20]; % starting values
Utilde = U * (1 / exp(2 * x)) * exp( 1i * 2 * x);
xout = fminsearch(@(v)optim(v, Utilde), x0);

function diff = optim(v, Utilde)
    x = v(1);
    U = v(2);
    diff = abs( -(Utilde/U) + (1 / exp(2 * x)) * exp( 1i * 2 * x ) );
The code above does not converge to the proper values, and xout = 1.7318 88.8760. However, if U = 50, which is not a complex number, then xout = 1.5000 50.0000, which are the proper values.
Is there a way in Matlab or C/C++ to ensure proper convergence, given Utilde as a complex number? Maybe I have to change the code above?
If there isn't a way to do this natively in Matlab, then perhaps one gist of the question is this: is there a multivariate (i.e. Nelder-Mead or similar) optimization library that is able to work with real and complex inputs and outputs?
Yet another question is whether the function is convergent or not. I don't know if it is the algorithm or the function. Might I need to change something in the Utilde = U * (1 / exp(2 * x)) * exp( 1i * 2 * x) expression to make it convergent?
The main problem here is that there is no unique solution to this optimization or parameter fitting problem. For example, looking at the expected and actual results above, Utilde is equivalent (ignoring round-off differences) for the two (x, U) pairs, i.e.
Utilde(x = 1.5, U = 50 + 25i) = Utilde(x = 1.7318, U = 88.8760)
Although I have not examined it in depth, I even suspect that for any value of x, you can find a U such that Utilde(x, U) = Utilde(x = 1.5, U = 50 + 25i).
The solution here would thus be to further constrain the parameter fitting problem so that the solver yields any solution that can be considered acceptable. Alternatively, reformulate Utilde to have a unique value for any (x, U) pair.
UPDATE, AUG 1
Given reasonable starting values, it actually seems like it is sufficient to restrict x to be real-valued. Performing unconstrained non-linear optimization using the diff function formulated above, I get the following result:
x = 1.50462926953244
U = 50.6977768845879 + 24.7676554234729i
diff = 3.18731710515855E-06
However, changing the starting guess to values more distant from the desired values does yield different solutions, so restricting x to be real-valued does not by itself provide a unique solution to the problem.
I have implemented this in C#, using the BOBYQA optimizer, but the numerics should be the same as above. If you want to try this outside of Matlab, it should also be relatively simple to turn the C# code below into C++ using the std::complex class and an (unconstrained) nonlinear C++ optimizer of your choice. You can find some C++-compatible codes that do not require gradient computation here, and there are also various implementations available in Numerical Recipes. For example, you can access the C version of NR online here.
For reference, here are the relevant parts of my C# code:
class Program
{
    private static readonly Complex Coeff = new Complex(-2.0, 2.0);
    private static readonly Complex UTilde0 = GetUTilde(1.5, new Complex(50.0, 25.0));

    static void Main(string[] args)
    {
        // xstart = 1.0, Ustart = 25.0 (vars[2] is the imaginary part of U)
        double[] vars = new[] {1.0, 25.0, 0.0};
        BobyqaExitStatus status = Bobyqa.FindMinimum(GetObjfnValue, vars.Length, vars);
    }

    public static Complex GetUTilde(double x, Complex U)
    {
        return U * Complex.Exp(Coeff * x);
    }

    public static double GetObjfnValue(int n, double[] vars)
    {
        double x = vars[0];
        Complex U = new Complex(vars[1], vars[2]);
        return Complex.Abs(-UTilde0 / U + Complex.Exp(Coeff * x));
    }
}
The documentation for fminsearch says how to deal with complex numbers in the limitations section:
fminsearch only minimizes over the real numbers, that is, x must only consist of real numbers and f(x) must only return real numbers. When x has complex variables, they must be split into real and imaginary parts.
You can use the functions real and imag to extract the real and imaginary parts, respectively.
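The same splitting idea carries over to Python; here is a rough sketch with scipy.optimize.minimize and Nelder-Mead (the names and starting values are mine, not from the answers above):
import numpy as np
from scipy.optimize import minimize

U_true = 50 + 25j
x_true = 1.5
Utilde = U_true * np.exp((-2 + 2j) * x_true)   # U * (1/exp(2x)) * exp(2ix)

def objective(v):
    # v = [x, Re(U), Im(U)]: the complex unknown U is split into two real variables.
    x, u_re, u_im = v
    U = u_re + 1j * u_im
    return abs(-Utilde / U + np.exp((-2 + 2j) * x))

res = minimize(objective, x0=[1.0, 20.0, 0.0], method='Nelder-Mead')
print(res.x)  # as discussed above, the problem is under-determined,
              # so the answer depends strongly on the starting point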
It appears that there is no easy way to do this, even if both x and U are real numbers. The equation for Utilde is not well-posed for an optimization problem, and so it must be modified.
I've tried to code up my own version of the Nelder-Mead optimization algorithm, as well as Powell's method. Neither seems to work well for this problem, even when I attempted to modify these methods.