FInd mode of Poisson distribution with SymPy - sympy

Why I can't find the value at the mode of the Poisson pdf?
import sympy
from sympy import Symbol, simplify
from sympy.stats import Poisson, density
nobs = Symbol('nobs', positive=True, domain=sympy.Integers)
mu = Symbol('μ', domain=sympy.Reals, positive=True)
poisson_pdf = lambda n, _lambda: density(Poisson('X', _lambda))(n)
simplify(sympy.calculus.util.maximum(poisson_pdf(nobs, mu), mu))
I get
Max(oo*(-1)**nobs, (nobs*exp(-1))**nobs/gamma(nobs + 1))
the second argument of max is the solution I would like. What am I missing? And Why I get a gamma function if nobs is positive integer?

The mode of a Poisson is not always unique. (See Wikipedia, note the modes on the plots for λ = 1 and λ = 4.)
Gamma(n + 1) == factorial(n) for positive integer n.

Related

Find numerical roots in poly rational elements with Sympy function set

I am not pratice in Sympy manipulation.
I need to find roots on particular poly:
-4x**(11/2)-24x**(9/2)-16x**(7/2)+2x**(5/2)+16x**(5)+23x**(4)+5x**(3)-x**(2)
I verified that I have 2 real solution and I find one of them with Sympy function
nsolve(mypoly,x,1).
Why the previous step doesn't look the other?
How can I proceed to find ALL roots?
Thank you to all for assistance
A.
To my knowledge, nsolve looks in the proximity of the provided initial guess to find one root for each equations.
I would plot the expression to find suitable initial guesses:
from sympy import *
from sympy.plotting import PlotGrid
expr = -4*x**(S(11)/2)-24*x**(S(9)/2)-16*x**(S(7)/2)+2*x**(S(5)/2)+16*x**(5)+23*x**(4)+5*x**(3)-x**(2)
p1 = plot(expr, (x, 0, 0.5), adaptive=False, n=1000, ylim=(-0.01, 0.05), show=False)
p2 = plot(expr, (x, 0, 5), adaptive=False, n=1000, ylim=(-200, 200), show=False)
PlotGrid(1, 2, p1, p2)
Now, we can do:
nsolve(expr, x, 0.2)
# out: 0.169003536680445
nsolve(expr, x, 4)
# out: 4.28968831654177
EDIT: to find all roots (even the complex one), we can:
compute the derivative of the expression.
convert both the expression and the derivative to numerical functions with sympy's lambdify.
visually inspect the expression in the complex plane to determine good initial values for the root finding algorithm. I'm going to use this plotting module, SymPy Plotting Backend which exposes a very handy function, plot_complex, to generate domain coloring plots. In particular, I will plot alternating black and white stripes corresponding to modulus.
use scipy's newton method to compute the actual roots. EDIT: I just discovered that nsolve works too :)
# step 1 and 2
f = lambdify(x, expr)
f_der = lambdify(x, expr.diff(x))
# step 3
from spb import plot_complex
r = (x, -1-0.8j, 4.5+0.8j)
w = r[1].real - r[2].real
h = r[1].imag - r[2].imag
# number of discretization points, watch out memory usage
n1 = 1500
n2 = int(h / w * n1)
plot_complex(expr, r, {"interpolation": "spline36"}, grid=False, coloring="e", n1=n1, n2=n2, size=(10, 5))
In the above picture we see circular stripes getting bigger and deforming. The center of these circular stripes represent a pole or a zero. But this is an easy case: there are no poles. So, from the above pictures we count 7 zeros. We already know 3, the two computed above and the value 0. Let's find the others:
from scipy.optimize import newton
r1 = newton(f, x0=-0.9+0.1j, fprime=f_der)
r2 = newton(f, x0=-0.9-0.1j, fprime=f_der)
r3 = newton(f, x0=0.6+0.6j, fprime=f_der)
r4 = newton(f, x0=0.6-0.6j, fprime=f_der)
for r in (r1, r2, r3, r4):
print(r, ": is it a zero?", expr.subs(x, r).evalf())
# out:
# (-0.9202719950522663+0.09010409402273806j) : is it a zero? -8.21787666002984e-15 + 2.06697764417957e-15*I
# (-0.9202719950522663-0.09010409402273806j) : is it a zero? -8.21787666002984e-15 - 2.06697764417957e-15*I
# (0.6323265751497729+0.6785871500619469j) : is it a zero? -2.2103533615688e-15 - 2.77549897301442e-15*I
# (0.6323265751497729-0.6785871500619469j) : is it a zero? -2.2103533615688e-15 + 2.77549897301442e-15*I
As you can see, inserting those values into the original expression get values very very close to zero. It is perfectly normal to see these kind of errors.
I just discovered that you can use also use nsolve instead of newton to compute complex roots. This makes step 1 and 2 unnecessary.
nsolve(expr, x, -0.9+0.1j)
# out: −0.920271995052266+0.0901040940227375𝑖

Summing two Poisson distributions with SymPy

I'm new to SymPy and I'm trying to use it to sum two Poisson distributions
Here's what I have so far (using jupyter notebook)
from sympy import *
from sympy.stats import *
init_printing(use_latex='mathjax')
lamda_1, lamda_2 = symbols('lamda_1, lamda_2')
n_1 = Symbol('n_1')
n_2 = Symbol('n_2')
n = Symbol('n')
#setting up distributions
N_1 = density(Poisson('N_1', lamda_1))(n_1)
N_2 = density(Poisson('N_2', lamda_2))(n_2)
display(N_1)
display(N_2)
print('setting N_2 in terms of N and N_1')
N_2 = N_2.subs(n_2,n-n_1)
display(N_2)
print("N_1 * N_2")
N = N_1 * N_2
#display(N)
Sum(N,(n_1,0,n))
#summation(N,(n_1,0,n))
Everything works fine until I try and run the summation. No errors just doesn't do anything and jupyter says it's running. I've let it run for 10 mins and nothing...
When declaring symbols, include their properties: being positive, integer, nonnegative, etc. This helps SymPy decide whether some transformations are legitimate.
lamda_1, lamda_2 = symbols('lamda_1, lamda_2', positive=True)
n_1, n_2, n = symbols('n_1 n_2 n', nonnegative=True, integer=True)
Unfortunately, summation still fails because SymPy cannot come up with the key trick: multiplying and dividing by factorial(n). It seems one has to tell it to do that.
s = summation(N*factorial(n), (n_1, 0, n))/factorial(n)
print(s.simplify())
This prints
Piecewise(((lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n), ((-n >= 0) & (lamda_1/lamda_2 <= 1)) | ((-n < 0) & (lamda_1/lamda_2 <= 1))), (lamda_2**n*exp(-lamda_1 - lamda_2)*Sum(lamda_1**n_1*lamda_2**(-n_1)/(factorial(n_1)*factorial(n - n_1)), (n_1, 0, n)), True))
which is a piecewise formula full of unnecessary conditions... but if we ignore those conditions (they are just artifacts of how SymPy performed the summation), the correct result
((lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n)
is there.
Aside: avoid doing import * from both sympy and sympy.stats, there are notational clashes such as E being 2.718... versus expected value. from sympy.stats import density, Poisson would be better. Also, N is a built-in SymPy function and is best avoided as a variable name.

Plot a cube root function with SymPy, including negative arguments

I'm trying to plot a cube root function with SymPy. I know what this should look like, but I'm only seeing values for x >= 0, not for negative numbers. I've tried two approaches.
cbrt:
from sympy import symbols, plot
from sympy.functions.elementary.miscellaneous import cbrt
x = symbols('x')
eqn = cbrt(x)
p = plot(eqn)
nthroot:
from sympy import symbols, plot
from sympy.simplify.simplify import nthroot
x = symbols('x')
eqn = nthroot(x, 3)
p = plot(eqn)
SymPy's functions cbrt and root use the principal branch of the root. The principal branch of the multivalued function z->z**(1/3) is equal to -1/2 + I*sqrt(3)/2 at -1. It is not a real number, so you don't see it on the plot.
But it is often desired to get the real-valued root for all real inputs, which is possible for odd degrees. This is provided by the function real_root. So, in principle your code should be
from sympy import symbols, plot, real_root
x = symbols('x')
eqn = real_root(x, 3)
p = plot(eqn)
However, the implementation of real_root does not fit the expectations of the SymPy plotting routine, so the above throws an error as of now. (Different errors in different versions of SymPy). Instead, plot the mathematically equivalent function |x|**(1/3) * sign(x):
from sympy import symbols, plot, root, sign, Abs
x = symbols('x')
eqn = root(Abs(x), 3)*sign(x)
p = plot(eqn)
Remark: The function nthroot from simplify module is not for computing the nth root, it is for simplifying expressions with radicals.

How to calculate the log-normal density (starting from the normal density) using SymPy

from sympy import *
x, y, mu, sigma, density1, density2 = symbols('x y mu sigma density1 density2')
eq1 = Eq(density1, 1/(sqrt(2*pi)*sigma)
*exp(-(x-mu)**2/(2*sigma**2))) # normal
eq2 = Eq(y, exp(x)) # substitution
eq3 = Eq(density2, 1/(y*sqrt(2*pi)*sigma)
*exp(-(ln(y)-mu)**2/(2*sigma**2))) # lognormal
[eq1, eq2, eq3]
Output:
How can I make SymPy take the normal density (eq1), apply the x to y substitution (eq2) and output the lognormal density (eq3)?
(I received no answer to this question at https://stats.stackexchange.com/q/55353/14202 .)
When we change a variable in a probability density function, it is also necessary to multiply the density by the derivative of the function that performs the substitution. This is how substitution works in integrals, and probability density must have integral 1, so we have to respect that.
Let's call the substitution-performing function f (it's log(y) in your example). Here is how the process works, starting from your setup:
f = solve(eq2, x)[0]
new_density = eq1.rhs.subs(x, f) * f.diff()
# test it now
Eq(new_density, eq3.rhs) # True

Python (Scipy): Finding the scale parameter (standard deviation) of a gaussian distribution

it is quite common to calculate the probability density of a value within a probability density function (PDF). Imagine we have a gaussian distribution with mean = 40, a standard deviation of 5 and now would like to get the probability density of value 32. We'd go like:
In [1]: import scipy.stats as stats
In [2]: print stats.norm.pdf(32, loc=40, scale=5)
Out [2]: 0.022
--> The probability density is 2.2%.
But now, let's consider the inverse problem. I have the mean value, I have the value at probabilty density of 0.05 and I would like to get the standard deviation (i.e. the scale parameter).
What I could implement is a numerical approach: create stats.norm.pdf several times with the scale-parameter increased stepwise and take that one with the result getting as closest as possible.
In my case, I specify the value 30 as the 5% mark. So I need to solve this "equation":
stats.norm.pdf(30, loc=40, scale=X) = 0.05
There is a scipy function called "ppf" which is the inverse of the PDF, so it will return the value for a specific probability density, but I haven't found a function to return the scale parameter.
Implementing an iteration would take too much time (both creating and calculating). My script is going to be huge, so I should save computation time. Could the lambda-function help in this case? I roughly know what it's doing, but I haven't used it so far. Any ideas on this?
Thank you!
The normal probability density function, f is given by
Given f and x we wish to solve for 𝞼. Let's ask sympy if it can solve the equation:
import sympy as sy
from sympy.abc import x, y, sigma
expr = (1/(sy.sqrt(2*sy.pi)*sigma) * sy.exp(-x**2/(2*sigma**2))) - y
ans = sy.solve(expr, sigma)[0]
print(ans)
# sqrt(2)*exp(LambertW(-2*pi*x**2*y**2)/2)/(2*sqrt(pi)*y)
So it appears there is a closed-formed solution in terms of the LambertW function, W, which satisfies
z = W(z) * exp(W(z))
for all complex-valued z.
We could use sympy to also find the numerical result for given x and y, but
perhaps it would be faster to do the numerical work with
scipy.special.lambertw:
import numpy as np
import scipy.special as special
def sigma_func(x, y):
results = set([np.real_if_close(
np.sqrt(2)*np.exp(special.lambertw(-2*np.pi*x**2*y**2, k=k)/2)
/(2*np.sqrt(np.pi)*y)).item() for k in (0, -1)])
results = [s for s in results if np.isreal(s)]
return results
In general, the LambertW function returns complex values, but we are only
interested in real-valued solutions for sigma. Per the
docs,
special.lambertw has two partially-real branches, when k=0 and k=1. So the
code above checks if the returned value (for those two branches) is real, and
returns a list of any real solutions if they exist. If no real solution exists,
then an empty list is returned. That happens if the pdf value y is not
attained for any real value of sigma (for the given value of x).
You can use it like this:
x = 30.0
loc = 40.0
y = 0.02
s = sigma_func(loc-x, y)
print(s)
# [16.65817044316178, 6.830458938511113]
import scipy.stats as stats
for si in s:
assert np.allclose(stats.norm.pdf(x, loc=loc, scale=si), y)
In the example you gave, with y = 0.025, there is no solution for sigma:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
x = 30.0
loc = 40.0
y = 0.025
s = np.linspace(5, 20, 100)
plt.plot(s, stats.norm.pdf(x, loc=loc, scale=s))
plt.hlines(y, 4, 20, color='red') # the horizontal line y = 0.025
plt.ylabel('pdf')
plt.xlabel('sigma')
plt.show()
and so sigma_func(40-30, 0.025) returns an empty list:
In [93]: sigma_func(40-30, 0.025)
Out [93]: []
The plot above is typical in the sense that when y is too large there are zero
solutions, at the maximum of the curve (let's call it y_max) there is one
solution
In [199]: y_max = np.nextafter(np.sqrt(1/(np.exp(1)*2*np.pi*(10)**2)), -np.inf)
In [200]: y_max
Out[200]: 0.024197072451914336
In [201]: sigma_func(40-30, y_max)
Out[201]: [9.9999999776424]
and for y smaller than the y_max there are two solutions.
The will be two solutions, because normal PDF is symmetric around the mean.
As it stands, you have a single-variable equation to solve. It won't have a closed-form solution, so you can use e.g. scipy.optimize.fsolve to solve it.
EDIT: see #unutbu's answer for the closed form solution in terms of Lambert W function.