I am trying to write a multivariate Singular Spectrum Analysis (SSA) with a Monte Carlo test. To this end I am working on a piece of code that reconstructs the input series using the lagged trajectory matrix and the projection basis (ST-PCs) resulting from the PCA/SSA decomposition of the input series. The attached code works for a lagged univariate (that is, single) time series, but I am struggling to make the reconstruction work for a lagged multivariate time series. I don't quite get the procedure mathematically and, not surprisingly, I also did not manage to program it. Useful links are attached to the function descriptions in the accompanying code. Input data should be of the form (time * number of series), so e.g. 288x3, meaning 3 time series of 288 time levels each.
I hope you can help me out!
import numpy as np
def lagged_covariance_matrix(data, M):
""" Computes the lagged covariance matrix using the Broomhead & King method
Background: Plaut, G., & Vautard, R. (1994). Spells of low-frequency oscillations and
weather regimes in the Northern Hemisphere. Journal of the atmospheric sciences, 51(2), 210-236.
Arguments:
data : pxn time series, where p denotes the length of the time series and n the number of channels
M : window length """
# explicitly 'add' a spatial dimension if the input is a single time series
if np.ndim(data) == 1:
data = np.reshape(data,(len(data),1))
T = data.shape[0]
L = data.shape[1]
N = T - M + 1
X = np.zeros((T, L, M))
for i in range(M):
X[:,:,i] = np.roll(data, -i, axis = 0)
X = X[:N]
# X constitutes the trajectory matrix and is a stacked hankel matrix
X = np.reshape(X, (N, M*L), order = 'C') # https://www.jstatsoft.org/article/viewFile/v067i02/v67i02.pdf
# choose the smallest projection basis for computation of the covariance matrix
if M*L >= N:
return 1/(M*L) * X.dot(X.T), X
else:
return 1/N * X.T.dot(X), X
def sort_by_eigenvalues(eigenvalues, PCs):
""" Sorts the PCs and eigenvalues by descending size of the eigenvalues """
desc = np.argsort(-eigenvalues)
return eigenvalues[desc], PCs[:,desc]
def Reconstruction(M, E, X):
""" Reconstructs the series as the sum of M subseries.
See: https://en.wikipedia.org/wiki/Singular_spectrum_analysis, 'Basic SSA' &
the work of Vivien Sainte Fare Garnot on univariate time series (https://github.com/VSainteuf/mcssa)
Arguments:
M : window length
E : eigenvector basis
X : trajectory matrix """
time = len(X) + M - 1
RC = np.zeros((time, M))
# step 3: grouping
for i in range(M):
d = np.zeros(M)
d[i] = 1
I = np.diag(d)
Q = np.flipud(X @ E @ I @ E.T)
# step 4: diagonal averaging
for k in range(time):
RC[k, i] = np.diagonal(Q, offset = -(time - M - k)).mean()
return RC
#=====================================================================================================
#=====================================================================================================
#=====================================================================================================
# input data
data = None
# number of lags a.k.a. window length
M = 45 # M = 1 means no lag
covmat, X = lagged_covariance_matrix(data, M)
# get the eigenvalues and vectors of the covariance matrix
vals, vecs = np.linalg.eig(covmat)
eig_data, eigvec_data = sort_by_eigenvalues(vals, vecs)
# component reconstruction
recons_data = Reconstruction(M, eigvec_data, X)
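As a quick sanity check of the univariate path, here is a minimal run with a synthetic sine series (my own test signal, not real data; np.linalg.eigh is used instead of eig since the covariance matrix is symmetric): the M reconstructed subseries should sum back to the input.
t_axis = np.arange(288)
series = np.sin(2 * np.pi * t_axis / 48)               # hypothetical single test series
covmat_u, X_u = lagged_covariance_matrix(series, M)
vals_u, vecs_u = sort_by_eigenvalues(*np.linalg.eigh(covmat_u))
rc_u = Reconstruction(M, vecs_u, X_u)
print(np.allclose(rc_u.sum(axis=1), series))           # expect True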
The following works but does not make direct use of the projection basis (ST-PCs). Hence the original question still stands, but this already helps a great deal and solves the problem for me. This piece of code exploits the similarity between the ST-PC projection basis and the u and vt matrices obtained from the singular value decomposition of the lagged trajectory matrix. I think it gives back the same answer as one would obtain using the ST-PC projection basis?
def lag_reconstruction(data, X, M, pairs = None):
""" Reconstructs the series as the sum of M subseries using the lagged trajectory matrix.
Based on equation 2.9 of Plaut, G., & Vautard, R. (1994). Spells of low-frequency oscillations and weather regimes in the Northern Hemisphere. Journal of Atmospheric Sciences, 51(2), 210-236.
Inspired by work of R. van Westen and C. Wieners """
time = data.shape[0] # number of time levels of the original series
L = data.shape[1] # number of input series
N = time - M + 1
u, s, vt = np.linalg.svd(X, full_matrices = False)
rc = np.zeros((time, L, M))
for t in range(time):
counter = 0
for i in range(M):
if t-i >= 0 and t-i < N:
counter += 1
if pairs:
for k in pairs:
rc[t,:,i] += u[t-i, k] * s[k] * vt[k, i*L : i*L + L]
else:
for k in range(len(s)):
rc[t,:,i] += u[t-i, k] * s[k] * vt[k, i*L : i*L + L]
rc[t] = rc[t]/counter
return rc
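For completeness, here is a sketch of the same reconstruction done directly with the ST-PC projection basis. It assumes the M*L < N branch of lagged_covariance_matrix, so that the columns of E are the sorted eigenvectors of the M*L x M*L lagged covariance matrix and the ST-PCs are A = X @ E (cf. equation 2.9 of Plaut & Vautard, 1994). Since E is orthonormal, summing over all components reproduces u @ diag(s) @ vt term by term, which is why the SVD route above should give the same answer.
def stpc_reconstruction(data, E, X, M, components=None):
    """ Sketch: channel-wise reconstruction from the ST-PCs A = X @ E
    and the ST-EOFs stored in the columns of E. """
    time, L = data.shape
    N = time - M + 1
    A = X @ E                                    # ST-PCs, shape (N, M*L)
    ks = range(E.shape[1]) if components is None else components
    rc = np.zeros((time, L))
    for t in range(time):
        js = [j for j in range(M) if 0 <= t - j < N]
        for j in js:                             # diagonal averaging over the valid lags
            for k in ks:
                rc[t] += A[t - j, k] * E[j * L:(j + 1) * L, k]
        rc[t] /= len(js)
    return rc
A quick consistency check with a hypothetical 288x3 input of the shape mentioned in the question:
t_axis = np.arange(288)
data3 = np.column_stack([np.sin(2 * np.pi * t_axis / 48),
                         np.cos(2 * np.pi * t_axis / 48),
                         np.sin(2 * np.pi * t_axis / 96)])
covmat3, X3 = lagged_covariance_matrix(data3, M)
vals3, vecs3 = sort_by_eigenvalues(*np.linalg.eigh(covmat3))
rc3 = stpc_reconstruction(data3, vecs3, X3, M)
print(np.allclose(rc3, data3))                                           # expect True with all components
print(np.allclose(lag_reconstruction(data3, X3, M).sum(axis=2), data3))  # the SVD route agrees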
I haven't been able to find particular solutions to this differential equation.
from sympy import *
m = float(input('Mass:\n> '))
g = 9.8
k = float(input('Drag Coefficient:\n> '))
t = Symbol('t')
v = Function('v')
f1 = g * m
equation = dsolve(f1 - k * v(t) - m * Derivative(v(t), t), v(t))
print(equation)
For m = 1000 and k = 0.2 it returns
Eq(v(t), C1*exp(-0.0002*t) + 49000.0)
which is correct, but I want the equation solved with the initial condition v(0) = 0, which should return
Eq(v(t), 49000.0*(1 - exp(-0.0002*t)))
I believe SymPy is not yet able to take initial conditions into account. Although dsolve has the option ics for entering initial conditions (see the documentation), it appears to be of limited use.
Therefore, you need to apply the initial conditions manually. For example:
C1 = Symbol('C1')
C1_ic = solve(equation.rhs.subs({t:0}),C1)[0]
print(equation.subs({C1: C1_ic}))
Eq(v(t), 49000.0 - 49000.0*exp(-0.0002*t))
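Note that newer SymPy releases do accept initial conditions directly via the ics keyword of dsolve, so (assuming a reasonably recent version) the same particular solution comes out of a single call:
equation = dsolve(f1 - k * v(t) - m * Derivative(v(t), t), v(t), ics={v(0): 0})
print(equation)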
I have a numerical analysis assignment and I need to find some coefficients by multiplying matrices. We were given an example in Mathcad, but now we have to do it in another programming language so I chose Python.
The problem is that I get different results when multiplying matrices in the respective environments. Here's the function in Python:
from numpy import *
def matrica(C, n):
N = len(C) - 1
m = N - n
A = [[0] * (N + 1) for i in range(N+1)]
A[0][0] = 1
for i in range(0, n + 1):
A[i][i] = 1
for j in range(1, m + 1):
for i in range(0, N + 1):
if i + j <= N:
A[i+j][n+j] = A[i+j][n+j] - C[i]/2
A[abs(i - j)][n+j] = A[abs(i - j)][n+j] - C[i]/2
M = matrix(A)
c = matrix([[ci] for ci in C])
return [float(y) for y in M.I * c]
As you can see, I am using the numpy library. This function is consistent with its analog in Mathcad up to the return statement, the part where the matrices are multiplied, to be more specific. One more observation: this function returns the correct matrix if N = 1.
I'm not sure I understand exactly what your code does; could you explain a little more what you're actually computing mathematically? But if you want a plain matrix product and you are using a numpy.matrix, why not use the matrix product that is already built in?
a = numpy.matrix(...)
b = numpy.matrix(...)
p = a * b #matrix product
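For reference, numpy.matrix is deprecated in current NumPy; with plain ndarrays the same product is written with the @ operator:
import numpy as np
a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 6.], [7., 8.]])
p = a @ b  # matrix product, equivalent to np.dot(a, b)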
This is a simple approximation to the Biot-Savart law, dB = (mu0 / 4*pi) * (I dl x r) / |r|^3, summed over discrete current segments.
I've implemented the integral (sum) in the function calc().
If the number of spatial points is big, say 10^7 or 10^8-ish, can calc be written to use NumPy arrays more efficiently? Thanks for your suggestions!
def calc(points, x_seg, idl_seg):
r = points[:, None, :] - x_seg[None, :, :] # START CALCULATION
bottom = ((r**2).sum(axis=-1)**1.5)[...,None] # 1/|r|**3 add axis for vector
top = np.cross(idl_seg[None,:,:], r) # np.cross defaults to last axis
db = (mu0 / four_pi) * top / bottom
b = db.sum(axis=-2) # sum over the segments of the current loop
return b
EDIT: So for example, I can do this. Now there are just two arrays (r and hold) of size nx * ny * nz * nseg * 3. Maybe I should pass smaller chunks of points at a time, so it can all fit in cache at once?
def calc_alt(points, x_seg, idl_seg):
r = points[:, None, :] - x_seg[None, :, :]
hold = np.ones_like(r)*((r**2).sum(axis=-1)**-1.5)[...,None] # note **-1.5 neg
b = (hold * np.cross(idl_seg[None,:,:], r)).sum(axis=-2)
return b * (mu0 / four_pi)
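A minimal sketch of the chunking idea (the chunk size of 4096 is just a guess to tune so the (chunk, nseg, 3) temporaries fit in cache):
def calc_chunked(points, x_seg, idl_seg, chunk=4096):
    # run calc_alt on slices of points to bound the size of the temporaries
    out = np.empty((len(points), 3))
    for start in range(0, len(points), chunk):
        out[start:start + chunk] = calc_alt(points[start:start + chunk], x_seg, idl_seg)
    return out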
The rest of the code is posted to show how calc is used.
import numpy as np
import matplotlib.pyplot as plt
pi, four_pi = np.pi, 4. * np.pi
mu0 = four_pi * 1E-07 # Tesla m/A exact, defined
r0 = 0.05 # meters
I0 = 100.0 # amps
nx, ny, nz = 48, 49, 50
x,y,z = np.linspace(0,2*r0,nx), np.linspace(0,2*r0,ny), np.linspace(0,2*r0,nz)
xg = np.zeros((nx, ny, nz, 3)) # 3D grid of position vectors
xg[...,0] = x[:, None, None] # fill up the positions
xg[...,1] = y[None, :, None]
xg[...,2] = z[None, None, :]
xgv = xg.reshape(nx*ny*nz, 3) # flattened view of spatial points
nseg = 32 # approximate the current loop as a set of discrete points I*dl
theta = np.linspace(0, 2.*pi, nseg+1)[:-1] # get rid of the repeat
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
b = calc(xgv, xdl, idl) # HERE IS THE CALCULATION
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2] # make component views
bperp = np.sqrt(bx**2 + by**2) # new array for perp field
zround = np.round(z, 4)
iz = 5 # choose a transverse plane for a plot
fields = [ bz, bperp, bx, by]
names = ['Bz', 'Bperp', 'Bx', 'By']
titles = ["approx " + name + " at z = " + str(zround[iz])
for name in names]
plt.figure()
for i, field in enumerate(fields):
print(i)
plt.subplot(2, 2, i+1)
plt.imshow(field[..., iz], origin='lower') # fields at iz don't use Jet !!!
plt.title(titles[i])
plt.colorbar()
plt.show()
The plotting at the end is just to check that it appears to be working. In reality, never use the default colormap. Bad, awful, naughty Jet! In this case a divergent cmap with symmetric vmin = -vmax might be good (see Jake VanderPlas' post and the matplotlib documentation; there are some lovely demos there).
You could compress these lines:
b = db.sum(axis=-2) # sum over the segments of the current loop
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2]
into
bx, by, bz = np.split(db.sum(axis=-2).reshape(nx, ny, nz, 3), 3, -1)
I doubt it makes any difference in speed. Whether it makes this clearer or not is debatable.
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
could be rewritten as (not tested)
xdl = r0 * np.array([np.cos(theta), np.sin(theta), np.zeros(nseg)])
idl = I0 * np.array([-np.sin(theta), np.cos(theta), np.zeros(nseg)])
though these would make the arrays (3, nseg) instead of (nseg, 3). Note that the default axis for split is 0. Combining and splitting on the first axis is usually more natural. Also, [None, ...] broadcasting is automatic.
The xg construction might also be streamlined.
Mostly these are cosmetic changes that won't make big differences in performance.
I have just run across numexpr, which does (among other things) what I suggested in the edit: it breaks the arrays into "chunks" so that they fit into cache, including all the temporary arrays needed to evaluate expressions.
https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html
There are nice explanations here and especially in this wiki.
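As a sketch of how that could apply here (assuming r from inside calc, shape (npoints, nseg, 3); numexpr handles elementwise math and simple reductions, so the cross product stays in NumPy):
import numexpr as ne
r2 = ne.evaluate('sum(r**2, axis=2)')           # reduction evaluated in cache-sized chunks
inv_r3 = ne.evaluate('r2**-1.5')[..., None]     # 1/|r|**3, add axis for broadcasting
db = (mu0 / four_pi) * np.cross(idl_seg[None, :, :], r) * inv_r3
b = db.sum(axis=-2)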
I'm using Newton's method, and I want to find the positions of all six roots of a sixth-order polynomial, i.e. the points where the function is zero.
I found the rough values on my graph with the code below, but I want to output the positions of all six roots. I'm thinking of using an array of starting values for x, but I'm not sure. I'm using 1.0 for now to locate a rough value. Any suggestions from here?
import numpy as np
import matplotlib.pyplot as plt
def P(x):
return 924*x**6 - 2772*x**5 + 3150*x**4 - 1680*x**3 + 420*x**2 - 42*x + 1
def dPdx(x):
return 5544*x**5 - 13860*x**4 + 12600*x**3 - 5040*x**2 + 840*x - 42
accuracy = 1e-10
x = 1.0
xlast = float("inf")
while np.abs(x - xlast) > accuracy:
xlast = x
x = xlast - P(xlast)/dPdx(xlast)
print(x)
x_points = np.linspace(0, 1, 100)
y_points = P(x_points)  # P accepts arrays, so no explicit loop is needed
plt.plot(x_points,y_points)
plt.savefig("roots.png")
plt.show()
The traditional way is to use deflation to factor out the already found roots. If you want to avoid manipulations of the coefficient array, then you have to divide the roots out.
Having found z[1],...,z[k] as root approximations, form
g(x)=(x-z[1])*(x-z[2])*...*(x-z[k])
and apply Newton's method to h(x)=f(x)/g(x) with h'(x)=f'/g-f*g'/g^2. In the Newton iteration this gives
xnext = x - f(x)/( f'(x) - f(x)*g'(x)/g(x) )
Fortunately the quotient g'/g has a simple form
g'(x)/g(x) = 1/(x-z[1])+1/(x-z[2])+...+1/(x-z[k])
So with a slight modification to the Newton step you can avoid finding the same root over again.
This all still keeps the iteration real. To get at the complex root, use a complex number to start the iteration.
Proof of concept: adding eps=1e-8j to g'(x)/g(x) allows the iteration to go complex without pushing real roots away. This solves the equivalent problem 0=exp(-eps*x)*f(x)/g(x).
import numpy as np
import matplotlib.pyplot as plt
def P(x):
return 924*x**6 - 2772*x**5 + 3150*x**4 - 1680*x**3 + 420*x**2 - 42*x + 1
def dPdx(x):
return 5544*x**5 - 13860*x**4 + 12600*x**3 - 5040*x**2 + 840*x - 42
accuracy = 1e-10
roots = []
for k in range(6):
x = 1.0
xlast = float("inf")
x_points = np.linspace(0.0, 1.0, 200)
y_points = P(x_points)
for rt in roots:
y_points /= (x_points - rt)
y_points = np.clip(np.real(y_points), -1.0, 1.0)
plt.plot(x_points,y_points,x_points,0*y_points)
plt.show()
while np.abs(x - xlast) > accuracy:
xlast = x
corr = 1e-8j
for rt in roots:
corr += 1/(xlast-rt)
Px = P(xlast)
dPx = dPdx(xlast)
x = xlast - Px/(dPx - Px*corr)
print(x)
roots.append(x)
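As a cross-check, the six values can be compared with NumPy's polynomial root finder; this sextic is in fact the shifted Legendre polynomial of degree 6, so all six roots are real and lie in (0, 1):
coeffs = [1, -42, 420, -1680, 3150, -2772, 924]  # ascending powers, as polyroots expects
print(np.sort(np.polynomial.polynomial.polyroots(coeffs)))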