I'm new to SymPy and I'm trying to use it to sum two Poisson distributions
Here's what I have so far (using a Jupyter notebook):
from sympy import *
from sympy.stats import *
init_printing(use_latex='mathjax')
lamda_1, lamda_2 = symbols('lamda_1, lamda_2')
n_1 = Symbol('n_1')
n_2 = Symbol('n_2')
n = Symbol('n')
#setting up distributions
N_1 = density(Poisson('N_1', lamda_1))(n_1)
N_2 = density(Poisson('N_2', lamda_2))(n_2)
display(N_1)
display(N_2)
print('setting N_2 in terms of N and N_1')
N_2 = N_2.subs(n_2,n-n_1)
display(N_2)
print("N_1 * N_2")
N = N_1 * N_2
#display(N)
Sum(N,(n_1,0,n))
#summation(N,(n_1,0,n))
Everything works fine until I try to run the summation. There are no errors; the cell just never finishes, and Jupyter reports it is still running. I've let it run for 10 minutes and got nothing.
When declaring symbols, include their properties: being positive, integer, nonnegative, etc. This helps SymPy decide whether some transformations are legitimate.
lamda_1, lamda_2 = symbols('lamda_1, lamda_2', positive=True)
n_1, n_2, n = symbols('n_1 n_2 n', nonnegative=True, integer=True)
Unfortunately, summation still fails because SymPy cannot come up with the key trick: multiplying and dividing by factorial(n). It seems one has to tell it to do that.
s = summation(N*factorial(n), (n_1, 0, n))/factorial(n)
print(s.simplify())
This prints
Piecewise(((lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n), ((-n >= 0) & (lamda_1/lamda_2 <= 1)) | ((-n < 0) & (lamda_1/lamda_2 <= 1))), (lamda_2**n*exp(-lamda_1 - lamda_2)*Sum(lamda_1**n_1*lamda_2**(-n_1)/(factorial(n_1)*factorial(n - n_1)), (n_1, 0, n)), True))
which is a piecewise formula full of unnecessary conditions... but if we ignore those conditions (they are just artifacts of how SymPy performed the summation), the correct result
(lamda_1 + lamda_2)**n*exp(-lamda_1 - lamda_2)/factorial(n)
is there.
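As a quick sanity check (just a sketch, assuming s.simplify() returns the Piecewise shown above), the first branch can be compared against the density of a single Poisson variable with rate lamda_1 + lamda_2:
# compare the first Piecewise branch with the density of Poisson(lamda_1 + lamda_2)
expected = density(Poisson('M', lamda_1 + lamda_2))(n)
first_branch = s.simplify().args[0][0]  # expression of the first branch
print(simplify(first_branch - expected))  # expect 0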
Aside: avoid doing import * from both sympy and sympy.stats; there are notational clashes, such as E being 2.718... in one and the expectation operator in the other. from sympy.stats import density, Poisson would be better. Also, N is a built-in SymPy function and is best avoided as a variable name.
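For instance, a minimal import setup that avoids the clash might look like this (just a sketch):
from sympy import symbols, exp, factorial, summation, simplify, init_printing
from sympy.stats import density, Poisson
from sympy.stats import E as expectation  # keep the expectation operator distinct from sympy's E = 2.718...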
I'm trying to compute eigenvalues of a symbolic complex matrix M of size 3x3. In some cases, eigenvals() works perfectly. For example, the following code:
import sympy as sp
kx = sp.symbols('kx')
x = 0.
M = sp.Matrix([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
M[0, 0] = 1.
M[0, 1] = 2./3.
M[0, 2] = 2./3.
M[1, 0] = sp.exp(1j*kx) * 1./6. + x
M[1, 1] = sp.exp(1j*kx) * 2./3.
M[1, 2] = sp.exp(1j*kx) * -1./3.
M[2, 0] = sp.exp(-1j*kx) * 1./6.
M[2, 1] = sp.exp(-1j*kx) * -1./3.
M[2, 2] = sp.exp(-1j*kx) * 2./3.
dict_eig = M.eigenvals()
returns me 3 correct complex symbolic eigenvalues of M. However, when I set x=1., I get the following error:
raise MatrixError("Could not compute eigenvalues for {}".format(self))
I also tried to compute eigenvalues as follows:
lam = sp.symbols('lambda')
cp = sp.det(M - lam * sp.eye(3))
eigs = sp.solveset(cp, lam)
but it returns a ConditionSet in every case, even when eigenvals() can do the job.
Does anyone know how to properly solve this eigenvalue problem, for any value of x?
Your definition of M made life too hard for SymPy because it introduced floating point numbers. When you want a symbolic solution, floats are to be avoided. That means:
instead of 1./3. (Python's floating point number) use sp.Rational(1, 3) (SymPy's rational number) or sp.S(1)/3 which has the same effect but is easier to type.
instead of 1j (Python's imaginary unit) use sp.I (SymPy's imaginary unit)
instead of x = 1., write x = 1 (Python 2.7 habits and SymPy go poorly together).
With these changes either solveset or solve find the eigenvalues, although solve gets them much faster. Also, you can make a Poly object and apply roots to it, which is probably most efficient:
M = sp.Matrix([
[
1,
sp.Rational(2, 3),
sp.Rational(2, 3),
],
[
sp.exp(sp.I*kx) * sp.Rational(1, 6) + x,
sp.exp(sp.I*kx) * sp.Rational(1, 6),
sp.exp(sp.I*kx) * sp.Rational(-1, 3),
],
[
sp.exp(-sp.I*kx) * sp.Rational(1, 6),
sp.exp(-sp.I*kx) * sp.Rational(-1, 3),
sp.exp(-sp.I*kx) * sp.Rational(2, 3),
]
])
lam = sp.symbols('lambda')
cp = sp.det(M - lam * sp.eye(3))
eigs = sp.roots(sp.Poly(cp, lam))
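For comparison, the solve route mentioned above is a one-liner on the same characteristic polynomial (roots on the Poly is still probably the most efficient):
eigs_solve = sp.solve(cp, lam)  # list of the three eigenvalues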
(It would be easier to do from sympy import * than type all these sp.)
I'm not quite clear on why SymPy's eigenvals method reports failure even with the above modifications. As you can see in the source, it doesn't do much more than what the above code does: call roots on the characteristic polynomial. The difference appears to be in the way this polynomial is created: M.charpoly(lam) returns
PurePoly(lambda**3 + (I*sin(kx)/2 - 5*cos(kx)/6 - 1)*lambda**2 + (-I*sin(kx)/2 + 11*cos(kx)/18 - 2/3)*lambda + 1/6 + 2*exp(-I*kx)/3, lambda, domain='EX')
with mysterious (to me) domain='EX'. Subsequently, an application of roots returns {}, no roots found. Looks like a deficiency of the implementation.
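A small sketch to inspect the difference (assuming a reasonably recent SymPy, where Poly objects expose a domain attribute): build the characteristic polynomial both ways and compare the coefficient domains that roots has to work with.
p1 = M.charpoly(lam)                            # ends up with domain='EX'; roots() finds nothing
p2 = sp.Poly(sp.det(M - lam * sp.eye(3)), lam)
print(p1.domain, p2.domain)                     # compare the domains SymPy picked
print(sp.roots(p2))                             # this route yields the three eigenvalues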
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
This function handles the situation where vector v has the norm value of 0.
Are there any similar functions provided in sklearn or numpy?
If you're using scikit-learn you can use sklearn.preprocessing.normalize:
import numpy as np
from sklearn.preprocessing import normalize
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.all(norm1 == norm2))
# True
I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that gives optimal performance.
import numpy as np
def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)
A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))
print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
This might also work for you
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but fails when v has length 0.
In that case, introducing a small constant to prevent the zero division solves this.
As proposed in the comments one could also use
v/np.linalg.norm(v)
To avoid zero division I use eps, but that's maybe not great.
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
If you have multidimensional data and want each axis normalized to its max or its sum:
def normalize(_d, to_sum=True, copy=True):
    # d is an (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d
This uses numpy's peak-to-peak function, np.ptp.
a = np.random.random((5, 3))
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), the columns sum to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1
If you don't need utmost precision, your function can be reduced to:
v_norm = v / (np.linalg.norm(v) + 1e-16)
You mentioned scikit-learn, so I want to share another solution.
scikit-learn MinMaxScaler
In scikit-learn, there is an API called MinMaxScaler which lets you customize the target value range as you like.
It also deals with NaN issues for us.
NaNs are treated as missing values: disregarded in fit, and maintained
in transform. ... see reference [1]
Code sample
The code is simple, just type
# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# call the MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
Reference
[1] sklearn.preprocessing.MinMaxScaler
There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:
import transformations as trafo
import numpy as np
data = np.array([[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
[1.0, 2.0, 3.0]])
print(trafo.unit_vector(data, axis=1))
If you work with multidimensional arrays, the following fast solution is possible.
Say we have a 2D array that we want to normalize along the last axis, while some rows have zero norm.
import numpy as np
arr = np.array([
[1, 2, 3],
[0, 0, 0],
[5, 6, 7]
], dtype=float)
lengths = np.linalg.norm(arr, axis=-1)
print(lengths) # [ 3.74165739 0. 10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0. 0. 0. ]
# [0.47673129 0.57207755 0.66742381]]
If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:
import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize
vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.
import numpy as np
import vg
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.
Without sklearn, using just numpy.
Just define a function, assuming that the rows are the variables and the columns the samples (axis=1):
import numpy as np
# Example array
X = np.array([[1,2,3],[4,5,6]])
def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)
output:
X
array([[1, 2, 3],
[4, 5, 6]])
stdmtx(X)
array([[-1., 0., 1.],
[-1., 0., 1.]])
For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.
a / np.linalg.norm(a, axis=1, keepdims=True)
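For example, with illustrative values:
a = np.array([[3.0, 4.0], [1.0, 0.0]])
print(a / np.linalg.norm(a, axis=1, keepdims=True))
# [[0.6 0.8]
#  [1.  0. ]]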
If you want all values in [0, 1] for a 1d array, then just use
(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
Where a is your 1d-array.
An example:
>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])
A note on this method: to preserve the proportions between values, there is a restriction: the 1d array must contain at least one 0 and consist only of 0 and positive numbers.
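For example, without a 0 in the array the ratios between values change:
>>> b = np.array([2, 4])
>>> (b - b.min(axis=0)) / (b.max(axis=0) - b.min(axis=0))
array([0., 1.])  # the original 2:1 ratio between the values is not preserved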
A simple dot product would do the job. No need for any extra package.
x = x/np.sqrt(x.dot(x))
By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use
norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x
This is a simple approximation to the Biot-Savart law.
I've implemented the integral (sum) in the function calc().
If the number of spatial points is big, say 10^7 or 10^8-ish, can calc be written to use NumPy arrays more efficiently? Thanks for your suggestions!
def calc(points, x_seg, idl_seg):
    r = points[:, None, :] - x_seg[None, :, :]       # START CALCULATION
    bottom = ((r**2).sum(axis=-1)**1.5)[..., None]   # 1/|r|**3, add axis for vector
    top = np.cross(idl_seg[None, :, :], r)           # np.cross defaults to last axis
    db = (mu0 / four_pi) * top / bottom
    b = db.sum(axis=-2)                              # sum over the segments of the current loop
    return b
EDIT: So for example, I can do this. Now there are just two arrays (r and hold) of size nx * ny * nz * nseg * 3. Maybe I should pass smaller chunks of points at a time, so it can all fit in cache at once?
def calc_alt(points, x_seg, idl_seg):
    r = points[:, None, :] - x_seg[None, :, :]
    hold = np.ones_like(r) * ((r**2).sum(axis=-1)**-1.5)[..., None]  # note the negative exponent **-1.5
    b = (hold * np.cross(idl_seg[None, :, :], r)).sum(axis=-2)
    return b * (mu0 / four_pi)
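One way to try the chunking idea from the edit above is a small wrapper that calls calc on blocks of points, so the (chunk, nseg, 3) temporaries stay small. calc_chunked and the chunk size here are just a sketch to be tuned:
def calc_chunked(points, x_seg, idl_seg, chunk=20000):
    # hypothetical helper: evaluate calc() block by block so the intermediate
    # (chunk, nseg, 3) arrays fit in cache / memory
    out = np.empty_like(points, dtype=float)
    for start in range(0, points.shape[0], chunk):
        out[start:start + chunk] = calc(points[start:start + chunk], x_seg, idl_seg)
    return out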
The rest of the code is posted to show how calc is used.
import numpy as np
import matplotlib.pyplot as plt
pi, four_pi = np.pi, 4. * np.pi
mu0 = four_pi * 1E-07 # Tesla m/A exact, defined
r0 = 0.05 # meters
I0 = 100.0 # amps
nx, ny, nz = 48, 49, 50
x,y,z = np.linspace(0,2*r0,nx), np.linspace(0,2*r0,ny), np.linspace(0,2*r0,nz)
xg = np.zeros((nx, ny, nz, 3)) # 3D grid of position vectors
xg[...,0] = x[:, None, None] # fill up the positions
xg[...,1] = y[None, :, None]
xg[...,2] = z[None, None, :]
xgv = xg.reshape(nx*ny*nz, 3) # flattened view of spatial points
nseg = 32 # approximate the current loop as a set of discrete points I*dl
theta = np.linspace(0, 2.*pi, nseg+1)[:-1] # get rid of the repeat
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
b = calc(xgv, xdl, idl) # HERE IS THE CALCULATION
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2] # make component views
bperp = np.sqrt(bx**2 + by**2) # new array for perp field
zround = np.round(z, 4)
iz = 5 # choose a transverse plane for a plot
fields = [ bz, bperp, bx, by]
names = ['Bz', 'Bperp', 'Bx', 'By']
titles = ["approx " + name + " at z = " + str(zround[iz])
for name in names]
plt.figure()
for i, field in enumerate(fields):
    print(i)
    plt.subplot(2, 2, i+1)
    plt.imshow(field[..., iz], origin='lower')  # fields at iz; don't use Jet !!!
    plt.title(titles[i])
    plt.colorbar()
plt.show()
The plotting at the end is just to see that it appears to be working. In reality, never use the default colormap. Bad, awful, naughty Jet! In this case, a divergent cmap with symmetric vmin = -vmax might be good (see Jake VanderPlas' post, the matplotlib documentation, and the lovely colormap demos there).
You could compress these lines:
b = db.sum(axis=-2) # sum over the segments of the current loop
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2]
into
bx, by, bz = np.split(db.sum(axis=-2).reshape(nx, ny, nz, 3), 3, -1)
I doubt it makes any difference in speed. Whether it makes this clearer or not is debatable.
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
could be rewritten as (not tested)
xdl = r0 * np.array([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
idl = I0 * np.array([-np.sin(theta), np.cos(theta), np.zeros_like(theta)])
though these would make these (3,nseg). Note that the default axis for split is 0. Combining and split on the 1st axis is usually more natural. Also [None,...] broadcasting is automatic.
The xg construction might also be streamlined.
Mostly these are a cosmetic changes that won't make big differences in performance.
I have just run across the numexpr package, which does (among other things) what I suggested in the edit - it breaks the arrays into "chunks" so that they can fit into cache, including all temporary arrays needed to evaluate expressions.
https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html
There are nice explanations here and especially in this wiki.
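As a rough illustration only (numexpr handles elementwise expressions, so the cross product would stay in NumPy), the 1/|r|**3 factor inside calc could be evaluated with it:
import numexpr as ne
rx, ry, rz = r[..., 0], r[..., 1], r[..., 2]
bottom = ne.evaluate('(rx**2 + ry**2 + rz**2)**1.5')[..., None]  # same as ((r**2).sum(-1)**1.5)[..., None]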
I am trying to integrate a second order differential equation using 'scipy.integrate.odeint'. My equation is as follows
m*x[i]'' + x[i]' = K/N * sum(j=0 to N) of sin(x[j] - x[i])
which I have converted into two first-order ODEs, as follows. In the code below, yinit is an array of the initial values x(0) and x'(0). My question is: what should the values of x(0) and x'(0) be?
x'[i] = y[i]
y'[i] = (-y[i] + K/N * sum(j=0 to N) of sin(x[j] - x[i])) / m
from numpy import *
from scipy.integrate import odeint
N = 50
def f(theta, t):
    global N
    x, y = theta
    m = 0.95
    K = 1.0
    fx = zeros(N, float)
    for i in range(N):
        s = 0.0
        for j in range(i+1, N):
            s = s + sin(x[j] - x[i])
        fx[i] = (-y[i] + (K*s)/N)/m
    return array([y, fx])
t = linspace(0, 10, 100, endpoint=False)
Uniformly generating random numbers:
theta = random.uniform(-180, 180, N)
Integrating function f using odeint:
yinit = array([x(0), x'(0)])
y = odeint(f, yinit, t)[:,0]
print (y)
You can choose as initial condition whatever you want.
In your case, you decided to use a random initial condition for x for all the oscillators. You can use a random initial condition for 'y' as well I guess, as I did below.
There were a few errors in the above code, mostly on how to unpack x,y from theta and how to repack them at the end (see concatenate below in the corrected code). See also the concatenate for yinit.
The rest are stylistic/minor changes.
from numpy import concatenate, linspace, random, mod, zeros, sin
from scipy.integrate import odeint
Nosc = 20
assert mod(Nosc, 2) == 0
def f(theta, _):
    N = theta.size // 2  # integer division, so the slices below get an int index
    x, y = theta[:N], theta[N:]
    m = 0.95
    K = 1.0
    fx = zeros(N, float)
    for i in range(N):
        s = 0.0
        for j in range(i + 1, N):
            s = s + sin(x[j] - x[i])
        fx[i] = (-y[i] + (K * s) / N) / m
    return concatenate((y, fx))
t = linspace(0, 10, 50, endpoint=False)
theta = random.uniform(-180, 180, Nosc)
theta2 = random.uniform(-180, 180, Nosc) #added initial condition for the velocities of the oscillators
yinit = concatenate((theta, theta2))
res = odeint(f, yinit, t)
X = res[:, :Nosc].T
Y = res[:, Nosc:].T
To plot the time evolution of the system, you can use something like
import matplotlib.pylab as plt
fig, ax = plt.subplots()
for displacement in X:
    ax.plot(t, displacement)
ax.set_xlabel('t')
ax.set_ylabel('x')
fig.show()
What are you modelling? At first the equation looked a bit like Kuramoto oscillators, but then I noticed you also have an x[i]'' term.
Notice how, in your model, since you do not have a spring term in the equation (a term proportional to x(t) on the LHS), the value of x converges to an arbitrary value.