what does the 'find_MAP' output mean in pymc3? - pymc3

What are the returned values of find_MAP in pymc3 ?
It seems that pymc3.Normal and pymc3.Uniform variables are not considered the same: for pymc3.Normal variables, find_MAP returns a value that looks like the maximum a posteriori probability. But for pymc3.Uniform variables, I get a '_interval' suffix added to the name of the variable and I don't find anywhere in the doc the meaning of the returned value (that may seem absurd, not even within the physical limits).
For example:
import numpy as np
import pymc3 as pm3
# create basic data such as obs = (x*0.95)**2+1.1+noise
x=np.arange(10)+1
obs=(x*0.95)**2+np.random.randn(10)+1.1
# fitting the model y=a(1*x)**2+a0 on data points
with pm3.Model() as model:
a0 = pm3.Uniform("a0",0,5)
a1 = pm3.Normal("a1",mu=1,sd=1)
a2 = pm3.Deterministic('a2',(x*a1)**2+a0)
hypothesis = pm3.Normal('hypothesis', mu=a2, sd=0.1, observed=obs)
start = pm3.find_MAP()
print('start: ',start)
returns:
Optimization terminated successfully.
Current function value: 570.382509
Iterations: 13
Function evaluations: 17
Gradient evaluations: 17
start: {'a1': array(0.9461006484031161), 'a0_interval_': array(-1.0812715249577414)}

By default, pymc3 transforms some variables with bounded support to the set of real numbers. The enables a variety of operations which would otherwise choke when given bounded distributions (e.g. some methods of optimizations and sampling). When this automatic transformation is applied, the random variable that you added to the model becomes a child of the transformed variable. This transformed variable is added to the model with [var]_[transform]_ as the name.
The default transformation for a uniform random variable is called the "interval transformation" and the new name of this variable is [name]_interval_. The MAP estimate is the found by optimizing all of the parameters to maximize the posterior probability. We only need to optimize the transformed variable, since this fully determines the value of the variable you originally added to the model. pm.find_MAP() returns only the variables being optimized, not the original variables. Notice that a2 is also not returned, since it is fully determined by a0 and a1.
The code that pymc3 uses for the interval transformation [^1] is
def forward(self, x):
a, b = self.a, self.b
r = T.log((x - a) / (b - x))
return r
Where a is the lower bound, b is the upper bound, and x is the variable being transformed. Using this map, values which are very close to the lower bound have transformed values approaching negative infinity, and values very close to the upper bound approach positive infinity.
Knowing the bounds, we can convert back from the real line to the bounded interval. The code that pymc3 uses for this is
def backward(self, x):
a, b = self.a, self.b
r = (b - a) * T.exp(x) / (1 + T.exp(x)) + a
return r
If you apply this backwards transformation yourself, you can recover a0 itself:
(5 - 0) * exp(-1.0812715249577414) / (1 + exp(-1.0812715249577414)) + 0 = 1.26632733897
Other transformations automatically applied include the log transform (for variables bounded on one side), and the stick-breaking transform (for variables which sum to 1).
[^1] As of commit 87cdd712c86321121c2ed3150764f3d847f5083c.

Related

Find numerical roots in poly rational elements with Sympy function set

I am not pratice in Sympy manipulation.
I need to find roots on particular poly:
-4x**(11/2)-24x**(9/2)-16x**(7/2)+2x**(5/2)+16x**(5)+23x**(4)+5x**(3)-x**(2)
I verified that I have 2 real solution and I find one of them with Sympy function
nsolve(mypoly,x,1).
Why the previous step doesn't look the other?
How can I proceed to find ALL roots?
Thank you to all for assistance
A.
To my knowledge, nsolve looks in the proximity of the provided initial guess to find one root for each equations.
I would plot the expression to find suitable initial guesses:
from sympy import *
from sympy.plotting import PlotGrid
expr = -4*x**(S(11)/2)-24*x**(S(9)/2)-16*x**(S(7)/2)+2*x**(S(5)/2)+16*x**(5)+23*x**(4)+5*x**(3)-x**(2)
p1 = plot(expr, (x, 0, 0.5), adaptive=False, n=1000, ylim=(-0.01, 0.05), show=False)
p2 = plot(expr, (x, 0, 5), adaptive=False, n=1000, ylim=(-200, 200), show=False)
PlotGrid(1, 2, p1, p2)
Now, we can do:
nsolve(expr, x, 0.2)
# out: 0.169003536680445
nsolve(expr, x, 4)
# out: 4.28968831654177
EDIT: to find all roots (even the complex one), we can:
compute the derivative of the expression.
convert both the expression and the derivative to numerical functions with sympy's lambdify.
visually inspect the expression in the complex plane to determine good initial values for the root finding algorithm. I'm going to use this plotting module, SymPy Plotting Backend which exposes a very handy function, plot_complex, to generate domain coloring plots. In particular, I will plot alternating black and white stripes corresponding to modulus.
use scipy's newton method to compute the actual roots. EDIT: I just discovered that nsolve works too :)
# step 1 and 2
f = lambdify(x, expr)
f_der = lambdify(x, expr.diff(x))
# step 3
from spb import plot_complex
r = (x, -1-0.8j, 4.5+0.8j)
w = r[1].real - r[2].real
h = r[1].imag - r[2].imag
# number of discretization points, watch out memory usage
n1 = 1500
n2 = int(h / w * n1)
plot_complex(expr, r, {"interpolation": "spline36"}, grid=False, coloring="e", n1=n1, n2=n2, size=(10, 5))
In the above picture we see circular stripes getting bigger and deforming. The center of these circular stripes represent a pole or a zero. But this is an easy case: there are no poles. So, from the above pictures we count 7 zeros. We already know 3, the two computed above and the value 0. Let's find the others:
from scipy.optimize import newton
r1 = newton(f, x0=-0.9+0.1j, fprime=f_der)
r2 = newton(f, x0=-0.9-0.1j, fprime=f_der)
r3 = newton(f, x0=0.6+0.6j, fprime=f_der)
r4 = newton(f, x0=0.6-0.6j, fprime=f_der)
for r in (r1, r2, r3, r4):
print(r, ": is it a zero?", expr.subs(x, r).evalf())
# out:
# (-0.9202719950522663+0.09010409402273806j) : is it a zero? -8.21787666002984e-15 + 2.06697764417957e-15*I
# (-0.9202719950522663-0.09010409402273806j) : is it a zero? -8.21787666002984e-15 - 2.06697764417957e-15*I
# (0.6323265751497729+0.6785871500619469j) : is it a zero? -2.2103533615688e-15 - 2.77549897301442e-15*I
# (0.6323265751497729-0.6785871500619469j) : is it a zero? -2.2103533615688e-15 + 2.77549897301442e-15*I
As you can see, inserting those values into the original expression get values very very close to zero. It is perfectly normal to see these kind of errors.
I just discovered that you can use also use nsolve instead of newton to compute complex roots. This makes step 1 and 2 unnecessary.
nsolve(expr, x, -0.9+0.1j)
# out: −0.920271995052266+0.0901040940227375𝑖

Using gradient descent to solve a nonlinear system

I have the following code, which uses gradient descent to find the global minimum of y = (x+5)^2:
cur_x = 3 # the algorithm starts at x=3
rate = 0.01 # learning rate
precision = 0.000001 # this tells us when to stop the algorithm
previous_step_size = 1
max_iters = 10000 # maximum number of iterations
iters = 0 # iteration counter
df = lambda x: 2*(x+5) # gradient of our function
while previous_step_size > precision and iters < max_iters:
prev_x = cur_x # store current x value in prev_x
cur_x = cur_x - rate * df(prev_x) # grad descent
previous_step_size = abs(cur_x - prev_x) # change in x
iters = iters+1 # iteration count
print("Iteration",iters,"\nX value is",cur_x) # print iterations
print("The local minimum occurs at", cur_x)
The procedure is fairly simple, and among the most intuitive and brief for solving such a problem (at least, that I'm aware of).
I'd now like to apply this to solving a system of nonlinear equations. Namely, I want to use this to solve the Time Difference of Arrival problem in three dimensions. That is, given the coordinates of 4 observers (or, in general, n+1 observers for an n dimensional solution), the velocity v of some signal, and the time of arrival at each observer, I want to reconstruct the source (determine it's coordinates [x,y,z].
I've already accomplished this using approximation search (see this excellent post on the matter: ), and I'd now like to try doing so with gradient descent (really, just as an interesting exercise). I know that the problem in two dimensions can be described by the following non-linear system:
sqrt{(x-x_1)^2+(y-y_1)^2}+s(t_2-t_1) = sqrt{(x-x_2)^2 + (y-y_2)^2}
sqrt{(x-x_2)^2+(y-y_2)^2}+s(t_3-t_2) = sqrt{(x-x_3)^2 + (y-y_3)^2}
sqrt{(x-x_3)^2+(y-y_3)^2}+s(t_1-t_3) = sqrt{(x-x_1)^2 + (y-y_1)^2}
I know that it can be done, however I cannot determine how.
How might I go about applying this to 3-dimensions, or some nonlinear system in general?

Use Chi-Squared statistic in pymc3

I am trying to use PyMC3 to fit a model to some observed data. This model is based on external code (interfaced via theano.ops.as_op), and depends on multiple parameters that should be fit by the MCMC process. Since the gradient of the external code cannot be determined, I use the Metropolis-Hastings sampler.
I have established Uniform priors for my inputs, and generate a model using my custom code. However, I want to compare the simulated data to my observations (a 3D np.ndarray) using the chi-squared statistic (sum of the squares of data-model/sigma^2) to obtain a log-likelihood. When the MCMC samples are drawn, this should lead to the trace converging on the best values of each parameter.
My model is explained in the following semi-pseudocode (if that's even a word):
import pymc3 as pm
#Some stuff setting up the data, preparing some functions etc.
#theano.compile.ops.as_op(itypes=[input types],otypes = [output types])
def make_model(inputs):
#Wrapper to external code to generate simulated data
return simulated data
model = pm.model()
with model:
#priors for 13 input parameters
simData = make_model(inputs)
I now want to obtain the chi-squared logLikelihood for this model versus the data, which I think can be done using pm.ChiSquared, however I do not see how to combine the data, model and this distribution together to cause the sampler to perform correctly. I would guess it might look something like:
chiSq = pm.ChiSquared(nu=data.size, observed = (data-simData)**2/err**2)
trace = pm.sample(1000)
Is this correct? In running previous tests, I have found the samples appear to be simply drawn from the priors.
Thanks in advance.
Taking aloctavodia's advice, I was able to get parameter estimates for some toy exponential data using a pm.Normal likelihood. Using a pm.ChiSquared likelihood as the OP suggested, the model converged to correct values, but the posteriors on the parameters were roughly three times as broad. Here's the code for the model; I first generated data and then fit with PyMC3.
# Draw `nPoints` observed data points `y_obs` from the function
# 3. + 18. * numpy.exp(-.2 * x)
# with the points evaluated at `x_obs`
# x_obs = numpy.linspace(0, 100, nPoints)
# Add Normal(mu=0,sd=`cov`) noise to each point in `y_obs`
# Then instantiate PyMC3 model for fit:
def YModel(x, c, a, l):
# exponential model expected to describe the data
mu = c + a * pm.math.exp(-l * x)
return mu
def logp(y_mod, y_obs):
# Normal distribution likelihood
return pm.Normal.dist(mu = y_mod, sd = cov).logp(y_obs)
# Chi squared likelihood (to use, comment preceding line & uncomment next 2 lines)
#chi2 = chi2 = pm.math.sum( ((y_mod - y_obs)/cov)**2 )
#return pm.ChiSquared.dist(nu = nPoints).logp(chi2)
with pm.Model() as model:
c = pm.Uniform('constant', lower = 0., upper = 10., testval = 5.)
a = pm.Uniform('amplitude', lower = 0., upper = 50., testval = 25.)
l = pm.Uniform('lambda', lower = 0., upper = 10., testval = 5.)
y_mod = YModel(x_obs, c, a, l)
L = pm.DensityDist('L', logp, observed = {'y_mod': y_mod, 'y_obs': y_obs}, testval = {'y_mod': y_mod, 'y_obs': y_obs})
step = pm.Metropolis([c, a, l])
trace = pm.sample(draws = 10000, step = step)
The above model converged, but I found that success was sensitive to the bounds on the priors and the initial guesses on those parameters.
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
c 3.184397 0.111933 0.002563 2.958383 3.397741 1834.0 1.000260
a 18.276887 0.747706 0.019857 16.882025 19.762849 1343.0 1.000411
l 0.200201 0.013486 0.000361 0.174800 0.226480 1282.0 0.999991
(Edited: I had forgotten to sum the squares of the normalized residuals for chi2)

Finds the optimal cross-validated L1-penalty for a given L2-penalty

I have a list of vectors (each vector is a variable) which values can be 0 or 1, and this values represent the coefficient (a1, a2, ...) of my models:
y = x1 * a1 + x2 * a2 ...
I need to use cross-validation to build a Poisson regression model that balances model fit with complexity by pushing variable coefficients towards zero; in order to reduce the large set of variables to a smaller set of variables with non-zero coefficients.
I know that this result can be obtained with the penalized R package, but I am wondering if it could be done by using any C++ library.

Parameter optimization using a genetic algorithm?

I am trying to optimize parameters for a known function to fit an experimental data plot. The function is fairly involved
where x sweeps along a know set of numbers and p, g and c are the independent parameters to be optimized. Any ideas or resources that could be of assistance?
I would not recommend Genetic Algorithms. Instead go for straight forward Optimization.
Scipy has some resources.
You haven't provided any data or so, so I'll just go for something that should run. Below is something to get you started. I can't know if it works without seeing the data. Also, there must probably is a way to dynamically feed objectivefunc your x and y data. That's probably in the docs to scipy.optimize.minimize.
What I've done. Create a function to minimize. Here, I've called it objectivefunc. For that I've taken your function y = x^2 * p^2 * g / ... and transformed it to be of the form x^2 * p^2 * g / (...) - y = 0. Then square the left hand side and try to minimise it. Because you will have multiple (x/y) data samples, I'd minimise the sum of the squares. Put it all in a function and pass it to the minimize from scipy.
import numpy as np
from scipy.optimize import minimize
def objectivefunc(pgq):
"""Your function transformed so that it can be minimised.
I've renamed the input pgq, so that pgq[0] is p, pgq[1] is g, etc.
"""
p = pgq[0]
g = pgq[1]
q = pgq[2]
x = [10, 9.4, 17] # Some input data.
y = [12, 42, 0.8]
sum_ = 0
for i in range(len(x)):
sum_ += (x[i]**2 * p**2 * g - y[i] * ( (c**2 - x**2)**2 + x**2 * g**2) )**2
return sum_
pgq = np.array([1.3, 0.7, 0.5]) # Supply sensible initivial values
res = minimize(objectivefunc, pgq, method='nelder-mead',
options={'xtol': 1e-8, 'disp': True})
Have you tired old good Levenberg-Marquardt as implemented in Levenberg-Marquardt.vi. If it does not suite your needs, you can try Waptia libraryfor LabVIEW with one of the genetic algorithms implemented.