Is there a way to do this with better "in place" methods? - python-2.7

This is a simple approximation to the Biot-Savart law.
I've implemented the integral (sum) in the function calc().
If the number of spatial points is big, say 10^7 or 10^8 -ish, can calc be written to use NumPy arrays more efficiently? Thanks for your suggestions!
def calc(points, x_seg, idl_seg):
    r = points[:, None, :] - x_seg[None, :, :]      # START CALCULATION
    bottom = ((r**2).sum(axis=-1)**1.5)[..., None]  # |r|**3; add an axis for the vector division
    top = np.cross(idl_seg[None, :, :], r)          # np.cross defaults to the last axis
    db = (mu0 / four_pi) * top / bottom
    b = db.sum(axis=-2)                             # sum over the segments of the current loop
    return b
EDIT: So for example, I can do this. Now there are just two arrays (r and hold) of size nx * ny * nz * nseg * 3. Maybe I should pass smaller chunks of points at a time, so it can all fit in cache at once?
def calc_alt(points, x_seg, idl_seg):
    r = points[:, None, :] - x_seg[None, :, :]
    hold = np.ones_like(r) * ((r**2).sum(axis=-1)**-1.5)[..., None]  # note the negative power **-1.5
    b = (hold * np.cross(idl_seg[None, :, :], r)).sum(axis=-2)
    return b * (mu0 / four_pi)
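For reference, here is a minimal sketch of that chunking idea (the chunk size below is an arbitrary guess that you would tune to your cache; each block only allocates (chunk, nseg, 3) temporaries):

def calc_chunked(points, x_seg, idl_seg, chunk=100000):
    # evaluate calc on blocks of points so the temporary
    # (chunk, nseg, 3) arrays stay small enough to be cache-friendly
    b = np.empty_like(points)
    for i in range(0, points.shape[0], chunk):
        b[i:i + chunk] = calc(points[i:i + chunk], x_seg, idl_seg)
    return b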
The rest of the code is posted to show how calc is used.
import numpy as np
import matplotlib.pyplot as plt
pi, four_pi = np.pi, 4. * np.pi
mu0 = four_pi * 1E-07 # Tesla m/A exact, defined
r0 = 0.05 # meters
I0 = 100.0 # amps
nx, ny, nz = 48, 49, 50
x,y,z = np.linspace(0,2*r0,nx), np.linspace(0,2*r0,ny), np.linspace(0,2*r0,nz)
xg = np.zeros((nx, ny, nz, 3)) # 3D grid of position vectors
xg[...,0] = x[:, None, None] # fill up the positions
xg[...,1] = y[None, :, None]
xg[...,2] = z[None, None, :]
xgv = xg.reshape(nx*ny*nz, 3) # flattened view of spatial points
nseg = 32 # approximate the current loop as a set of discrete points I*dl
theta = np.linspace(0, 2.*pi, nseg+1)[:-1] # get rid of the repeat
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
b = calc(xgv, xdl, idl) # HERE IS THE CALCULATION
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2] # make component views
bperp = np.sqrt(bx**2 + by**2) # new array for perp field
zround = np.round(z, 4)
iz = 5 # choose a transverse plane for a plot
fields = [ bz, bperp, bx, by]
names = ['Bz', 'Bperp', 'Bx', 'By']
titles = ["approx " + name + " at z = " + str(zround[iz])
          for name in names]
plt.figure()
for i, field in enumerate(fields):
    print i
    plt.subplot(2, 2, i+1)
    plt.imshow(field[..., iz], origin='lower')  # fields at iz; don't use Jet !!!
    plt.title(titles[i])
    plt.colorbar()
plt.show()
The plotting at the end is just to see that it appears to be working. In reality, never use the default colormap. Bad, awful, naughty Jet! In this case, a divergent cmap with symmetric vmin = -vmax might be good. (See Jake VanderPlas' post, the matplotlib documentation, and the lovely demos linked there.)

You could compress these lines:
b = db.sum(axis=-2) # sum over the segments of the current loop
bv = b.reshape(nx, ny, nz, 3) # make a "3D view" again to use for plotting
bx, by, bz = bv[...,0], bv[...,1], bv[...,2]
into
bx, by, bz = np.split(db.sum(axis=-2).reshape(nx, ny, nz, 3), 3, -1)
I doubt if it makes any difference in speed. Whether it makes this clearer or not is debatable.
xdl = np.zeros((nseg, 3)) # these are the position vectors
idl = np.zeros((nseg, 3)) # these are the current vectors
xdl[:,0], xdl[:,1] = r0 * np.cos(theta), r0 * np.sin(theta)
idl[:,0], idl[:,1] = I0 * -np.sin(theta), I0 * np.cos(theta)
could be rewritten as (not tested)
xdl = r0 * np.array([np.cos(theta), np.sin(theta)])
idl = I0 * np.array([-np.sin(theta), np.cos(theta)])
though as written these would have shape (2, nseg): transposed relative to the original, and missing the row of zeros for the z component (a fixed version is sketched below). Note that the default axis for split is 0. Combining and splitting on the first axis is usually more natural. Also, [None, ...] broadcasting is automatic.
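A version that keeps the (nseg, 3) layout (untested) could use column_stack, with the z component written out explicitly:

zeros = np.zeros_like(theta)
xdl = r0 * np.column_stack([np.cos(theta), np.sin(theta), zeros])
idl = I0 * np.column_stack([-np.sin(theta), np.cos(theta), zeros])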
The xg construction might also be streamlined. Mostly these are cosmetic changes that won't make a big difference in performance.

I have just run across numexpr, which does (among other things) what I suggested in the edit: it breaks the arrays into "chunks" so that they can fit into cache, including all temporary arrays needed to evaluate expressions.
https://numexpr.readthedocs.io/projects/NumExpr3/en/latest/user_guide.html
There are nice explanations here and especially in this wiki.
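As a rough, untested sketch of how this might plug into calc_alt (the element-wise power is handed to numexpr, while the cross product stays in plain NumPy):

import numexpr as ne  # third-party package

def calc_ne(points, x_seg, idl_seg):
    r = points[:, None, :] - x_seg[None, :, :]
    top = np.cross(idl_seg[None, :, :], r)  # still plain NumPy
    x, y, z = r[..., 0], r[..., 1], r[..., 2]
    # numexpr evaluates the expression in cache-sized chunks, avoiding
    # the large temporaries that (r**2).sum(axis=-1)**-1.5 creates
    inv_r3 = ne.evaluate('(x*x + y*y + z*z)**-1.5')
    return (mu0 / four_pi) * (top * inv_r3[..., None]).sum(axis=-2)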

Related

How to fit a 2D ellipse to given points

I would like to fit a 2D set of points with an ellipse: (x / a)² + (y / b)² = 1 (and so get a and b), and then be able to replot it on my graph.
I found many examples on the internet, but none with this simple Cartesian equation. I probably searched badly! I think a basic solution to this problem could help many people.
Here is an example of the data:
Sadly, I cannot post the values... So let's assume that I have X, Y arrays defining the coordinates of each of those points.
This can be solved directly using least squares. You can frame it as minimizing the sum of squares of the quantity (alpha * x_i^2 + beta * y_i^2 - 1), where alpha is 1/a^2 and beta is 1/b^2. You have all the x_i's in X and the y_i's in Y, so you can find the minimizer of ||Ax - b||^2, where A is an Nx2 matrix (i.e. [X^2, Y^2]), x is the column vector [alpha; beta], and b is a column vector of all ones.
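A minimal sketch of that idea (assuming X and Y are 1-D arrays of point coordinates):

import numpy as np

def fit_axis_aligned_ellipse(X, Y):
    # least squares for alpha = 1/a^2, beta = 1/b^2 in alpha*x^2 + beta*y^2 = 1
    A = np.column_stack([X**2, Y**2])
    alpha, beta = np.linalg.lstsq(A, np.ones_like(X))[0]
    return 1.0 / np.sqrt(alpha), 1.0 / np.sqrt(beta)  # a, b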
The following code solves the more general problem of an ellipse of the form Ax^2 + Bxy + Cy^2 + Dx + Ey = 1, though the idea is exactly the same. The print statement gives 0.0776x^2 + 0.0315xy + 0.125y^2 + 0.00457x + 0.00314y = 1, and the image of the ellipse generated is shown below.
import numpy as np
import matplotlib.pyplot as plt
alpha = 5
beta = 3
N = 500
DIM = 2
np.random.seed(2)
# Generate random points on the unit circle by sampling uniform angles
theta = np.random.uniform(0, 2*np.pi, (N,1))
eps_noise = 0.2 * np.random.normal(size=[N,1])
circle = np.hstack([np.cos(theta), np.sin(theta)])
# Stretch and rotate circle to an ellipse with random linear transformation
B = np.random.randint(-3, 3, (DIM, DIM))
noisy_ellipse = circle.dot(B) + eps_noise
# Extract x coords and y coords of the ellipse as column vectors
X = noisy_ellipse[:,0:1]
Y = noisy_ellipse[:,1:]
# Formulate and solve the least squares problem ||Ax - b ||^2
A = np.hstack([X**2, X * Y, Y**2, X, Y])
b = np.ones_like(X)
x = np.linalg.lstsq(A, b)[0].squeeze()
# Print the equation of the ellipse in standard form
print('The ellipse is given by {0:.3}x^2 + {1:.3}xy+{2:.3}y^2+{3:.3}x+{4:.3}y = 1'.format(x[0], x[1],x[2],x[3],x[4]))
# Plot the noisy data
plt.scatter(X, Y, label='Data Points')
# Plot the original ellipse from which the data was generated
phi = np.linspace(0, 2*np.pi, 1000).reshape((1000,1))
c = np.hstack([np.cos(phi), np.sin(phi)])
ground_truth_ellipse = c.dot(B)
plt.plot(ground_truth_ellipse[:,0], ground_truth_ellipse[:,1], 'k--', label='Generating Ellipse')
# Plot the least squares ellipse
x_coord = np.linspace(-5,5,300)
y_coord = np.linspace(-5,5,300)
X_coord, Y_coord = np.meshgrid(x_coord, y_coord)
Z_coord = x[0] * X_coord ** 2 + x[1] * X_coord * Y_coord + x[2] * Y_coord**2 + x[3] * X_coord + x[4] * Y_coord
plt.contour(X_coord, Y_coord, Z_coord, levels=[1], colors=('r'), linewidths=2)
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Following the suggestion by ErroriSalvo, here is the complete process of fitting an ellipse using the SVD. The arrays x, y are coordinates of the given points, let's say there are N points. Then U, S, V are obtained from the SVD of the centered coordinate array of shape (2, N). So, U is a 2 by 2 orthogonal matrix (rotation), S is a vector of length 2 (singular values), and V, which we do not need, is an N by N orthogonal matrix.
The linear map transforming the unit circle to the ellipse of best fit is
sqrt(2/N) * U * diag(S)
where diag(S) is the diagonal matrix with singular values on the diagonal. To see why the factor of sqrt(2/N) is needed, imagine that the points x, y are taken uniformly from the unit circle. Then sum(x**2) + sum(y**2) is N, and so the coordinate matrix consists of two orthogonal rows of length sqrt(N/2), hence its norm (the largest singular value) is sqrt(N/2). We need to bring this down to 1 to have the unit circle.
import numpy as np
import matplotlib.pyplot as plt

N = 300
t = np.linspace(0, 2*np.pi, N)
x = 5*np.cos(t) + 0.2*np.random.normal(size=N) + 1
y = 4*np.sin(t+0.5) + 0.2*np.random.normal(size=N)
plt.plot(x, y, '.') # given points
xmean, ymean = x.mean(), y.mean()
x -= xmean
y -= ymean
U, S, V = np.linalg.svd(np.stack((x, y)))
tt = np.linspace(0, 2*np.pi, 1000)
circle = np.stack((np.cos(tt), np.sin(tt))) # unit circle
transform = np.sqrt(2.0/N) * U.dot(np.diag(S))  # transformation matrix; 2.0 avoids integer division on Python 2
fit = transform.dot(circle) + np.array([[xmean], [ymean]])
plt.plot(fit[0, :], fit[1, :], 'r')
plt.show()
But if you assume that there is no rotation, then np.sqrt(2/N) * S is all you need; these are a and b in the equation of the ellipse.
You could try a Singular Value Decomposition of the data matrix.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.svd.html
First center the data by subtracting the mean of X and Y from each column respectively.
X = X - np.mean(X)
Y = Y - np.mean(Y)
D = np.vstack((X, Y))
Then, apply SVD and extract
- singular values (members of s) -> axis lengths
- left singular vectors (columns of U) -> axis orientations
U, s, V = np.linalg.svd(D, full_matrices=True)
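The semi-axis lengths then follow from the sqrt(2/N) scaling argument in the previous answer, something like:

N = X.size  # number of points
a, b = np.sqrt(2.0 / N) * s  # semi-axis lengths of the best-fit ellipse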
This should be a least-squares fit.
Of course, things can get more complicated than this, please see
https://www.emis.de/journals/BBMS/Bulletin/sup962/gander.pdf

Calculate distance by comparing different points

I have different locations, and I need to calculate the distance between them:
Location   Lat     Long    Distance
A          -20     -50     (A-B)+(A-C)+(A-D)
B          -20.3   -51     (B-A)+(B-C)+(B-D)
C          -21     -50     (C-A)+(C-B)+(C-D)
D          -20.8   -50.2   (D-A)+(D-B)+(D-C)
Would anyone know how to help me?
I'm using this equation to calculate the distance between two points, but I don't know how to calculate it between several points.
R = 6373.0
dist_lat = lat2 - lat
dist_lon = long2 - lon
a = np.sin(dist_lat/2)**2 + np.cos(lat) * np.cos(lat2) * np.sin(dist_lon/2)**2
b = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
Dist = R * b
this is my first time with python so beware, also I made this rather quickly - it is probably buggy.
Tested on python2.7.
import numpy as np
import itertools as it
# these are the points
points = {
    'A': {'lat': -20,   'lon': -50},
    'B': {'lat': -20.3, 'lon': -51},
    'C': {'lat': -21,   'lon': -50},
    'D': {'lat': -20.8, 'lon': -50.2},
}
# this calculates the distance between two points
# basically your formula wrapped in a function
# - used below
# (note: for a true haversine distance the lat/lon values should
#  first be converted from degrees to radians)
def distance(a, b):
    R = 6373.0
    dist_lat = a['lat'] - b['lat']
    dist_lon = a['lon'] - b['lon']
    x = np.sin(dist_lat/2)**2 + np.cos(b['lat']) * np.cos(a['lat']) * np.sin(dist_lon/2)**2
    y = 2 * np.arctan2(np.sqrt(x), np.sqrt(1 - x))
    return R * y
distances = {}
# produce the distance between all combinations
# see: https://docs.python.org/2/library/itertools.html
# - uses the function above
for x in it.combinations(points.keys(), 2):
    distances[''.join(x)] = distance(points[x[0]], points[x[1]])
# for every starting point:
# - filter the distances it is involved in
# - sum them
for CH in ('A', 'B', 'C', 'D'):
    print CH, ' ', sum(dict((key, value) for key, value in distances.iteritems() if CH in key).values())
output:
$ python2.7 test.py
A 6983.4634697
B 9375.21128818
C 5859.5650322
D 9472.23979008
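For many points, a vectorized sketch using NumPy broadcasting might look like the following (untested; note that the haversine formula expects radians, so this converts the coordinates first and will therefore give different numbers from the output above):

import numpy as np

def distance_sums(lat_deg, lon_deg, R=6373.0):
    # lat_deg, lon_deg: 1-D arrays of coordinates in degrees
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat/2)**2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon/2)**2
    d = 2 * R * np.arctan2(np.sqrt(a), np.sqrt(1 - a))  # (n, n) pairwise distances
    return d.sum(axis=1)  # entry i is the sum of distances from point i to all others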

User defined SVM kernel with scikit-learn

I encounter a problem when defining a kernel by myself in scikit-learn.
I define by myself the gaussian kernel and was able to fit the SVM but not to use it to make a prediction.
More precisely I have the following code
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.utils import shuffle
import scipy.sparse as sparse
import numpy as np
digits = load_digits(2)
X, y = shuffle(digits.data, digits.target)
gamma = 1.0
X_train, X_test = X[:100, :], X[100:, :]
y_train, y_test = y[:100], y[100:]
m1 = SVC(kernel='rbf',gamma=1)
m1.fit(X_train, y_train)
m1.predict(X_test)
def my_kernel(x, y):
    d = x - y
    c = np.dot(d, d.T)
    return np.exp(-gamma*c)
m2 = SVC(kernel=my_kernel)
m2.fit(X_train, y_train)
m2.predict(X_test)
m1 and m2 should be the same, but m2.predict(X_test) returns the error:
operands could not be broadcast together with shapes (260,64) (100,64)
I don't understand the problem.
Furthermore, if x is one data point, m1.predict(x) gives a +1/-1 result, as expected, but m2.predict(x) gives an array of +1/-1...
No idea why.
The error is at the x - y line. You cannot subtract the two like that, because the first dimensions of both may not be equal. Here is how the rbf kernel is implemented in scikit-learn, taken from here (only keeping the essentials):
def row_norms(X, squared=False):
    if issparse(X):
        norms = csr_row_norms(X)
    else:
        norms = np.einsum('ij,ij->i', X, X)
    if not squared:
        np.sqrt(norms, norms)
    return norms
def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False):
    """
    Considering the rows of X (and Y=X) as vectors, compute the
    distance matrix between each pair of vectors.
    [...]
    Returns
    -------
    distances : {array, sparse matrix}, shape (n_samples_1, n_samples_2)
    """
    X, Y = check_pairwise_arrays(X, Y)
    if Y_norm_squared is not None:
        YY = check_array(Y_norm_squared)
        if YY.shape != (1, Y.shape[0]):
            raise ValueError(
                "Incompatible dimensions for Y and Y_norm_squared")
    else:
        YY = row_norms(Y, squared=True)[np.newaxis, :]
    if X is Y:  # shortcut in the common case euclidean_distances(X, X)
        XX = YY.T
    else:
        XX = row_norms(X, squared=True)[:, np.newaxis]
    distances = safe_sparse_dot(X, Y.T, dense_output=True)
    distances *= -2
    distances += XX
    distances += YY
    np.maximum(distances, 0, out=distances)
    if X is Y:
        # Ensure that distances between vectors and themselves are set to 0.0.
        # This may not be the case due to floating point rounding errors.
        distances.flat[::distances.shape[0] + 1] = 0.0
    return distances if squared else np.sqrt(distances, out=distances)
def rbf_kernel(X, Y=None, gamma=None):
    X, Y = check_pairwise_arrays(X, Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = euclidean_distances(X, Y, squared=True)
    K *= -gamma
    np.exp(K, K)  # exponentiate K in-place
    return K
You might want to dig deeper into the code, but look at the comments for the euclidean_distances function. A naive implementation of what you're trying to achieve would be this:
def my_kernel(x, y):
    d = np.zeros((x.shape[0], y.shape[0]))
    for i, row_x in enumerate(x):
        for j, row_y in enumerate(y):
            # the RBF kernel uses the *squared* Euclidean distance
            d[i, j] = np.exp(-gamma * np.linalg.norm(row_x - row_y)**2)
    return d
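Equivalently, a vectorized version (a sketch, assuming SciPy is available) replaces the double loop with cdist:

from scipy.spatial.distance import cdist

def my_kernel(x, y):
    # (n_x, n_y) matrix of squared Euclidean distances, exponentiated element-wise
    return np.exp(-gamma * cdist(x, y, 'sqeuclidean'))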

Matrix related calculation in python

I have run into a very interesting problem while calculating a matrix update in Python. I have to calculate the error (the difference between the previous and the updated matrix).
import numpy as np
import matplotlib.pyplot as plt
#from matplotlib import animation
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
def update(A):
    C = A
    D = A
    D[1:-1, 1:-1] = (C[0:-2, 1:-1] + C[2:, 1:-1] + C[1:-1, 0:-2] + C[1:-1, 2:]) / 4
    return (np.abs(D - C), D)

def error(A, B):
    C = np.zeros(np.shape(A), np.float64)
    #e = np.max(np.max(np.abs(C)))
    e = (np.abs(C))
    return (e.sum(dtype='float64'))

def initial(C):
    C[0, :] = 0     ## Top Boundary
    C[-1, :] = 0    ## Bottom Boundary
    C[:, 0] = 0     ## Left Boundary
    C[:, -1] = 100  ## Right Boundary
    return (C)
def SolveLaplace(nx, ny, epsilon, imax):
    ## Initialize the mesh with some values
    U = np.zeros((nx, ny), np.float64)
    ## Set boundary conditions for the problem
    U = initial(U)
    ## Store previous grid values to check against error tolerance
    UN = np.zeros((nx, ny), np.float64)
    UN = initial(UN)
    ## Constants
    k = 1  ## Iteration counter
    ## Iterative procedure
    while k < imax:
        err, U = update(U)
        print(err.sum())
        k += 1
    return (U)
nx = 50  # integer grid sizes (used as array shapes)
ny = 50
dx = 0.001
epsilon = 1e-6 ## Absolute Error tolerance
imax = 5000 ## Maximum number of iterations allowed
Z = SolveLaplace(nx, ny,epsilon,imax)
#x = np.linspace(0, nx * dx, nx)
#y = np.linspace(0, ny * dx, ny)
#X, Y = np.meshgrid(x,y)
##===================================================================
def PlotSolution(nx, ny, dx, T):
    ## Set up x and y vectors for meshgrid
    x = np.linspace(0, nx * dx, nx)
    y = np.linspace(0, ny * dx, ny)
    fig = plt.figure()
    ax = fig.gca(projection='3d')
    X, Y = np.meshgrid(x, y)
    ax.plot_surface(X, Y, T.transpose(), rstride=1, cstride=1, cmap=cm.cool, linewidth=0, antialiased=False)
    plt.xlabel("X")
    plt.ylabel("Y")
    #plt.zlabel("T(X,Y)")
    plt.figure()
    plt.contourf(X, Y, T.transpose(), 32, cmap=cm.cool)  # rstride/cstride are surface, not contourf, options
    plt.colorbar()
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.show()
##===================================================================
PlotSolution(nx, ny, dx, Z)
I am supposed to solve the Laplace equation for a 2-D sheet (temperature distribution); equilibrium is reached when the error falls below a certain minimum value. But while calculating the error I always get 0, even though when I print my matrix I can see that it should not be zero. I think I have a conceptual problem here, so please help.
Your problem is that you use shallow copies, i.e., only copy the reference, when assigning C=A; D=A in the update function. Essentially, after the construction of D, all three variables A,C,D point to the same object. Use
def update(A):
    C = 1.0*A
    D = 1.0*A
    D[1:-1, 1:-1] = (C[0:-2, 1:-1] + C[2:, 1:-1] + C[1:-1, 0:-2] + C[1:-1, 2:]) / 4
    return (np.abs(D - C), D)
or even shorter
def update(A):
    D = A.copy()
    D[1:-1, 1:-1] = (A[0:-2, 1:-1] + A[2:, 1:-1] + A[1:-1, 0:-2] + A[1:-1, 2:]) / 4
    return (np.abs(D - A), D)
Arithmetic operations such as 1.0*A allocate a new array, so they automatically produce a genuine (deep) copy; plain assignment, by contrast, only copies the reference.
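A quick illustration of the difference:

A = np.zeros((2, 2))
C = A          # plain assignment: C is the very same object as A
D = A.copy()   # explicit copy: D is independent of A
A[0, 0] = 1.0
print C[0, 0], D[0, 0]  # 1.0 0.0 -- C tracks A, D does not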
You know that the (geometric, first order) convergence rate is something like max(1 - C/nx^2, 1 - C/ny^2), i.e., very slow for even moderately large grids? For real applications, better use conjugate gradients, other Krylov-related algorithms or multigrid approaches (or sparse solver libraries, UMFPACK, ...).
In the (unused) error procedure, should there not be something like
e = abs(A-B)
At the moment, you return the norm of the freshly generated zero matrix C.
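That is, a corrected version might read:

def error(A, B):
    # summed absolute difference between consecutive iterates
    return np.abs(A - B).sum()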

Second order ODE integration using scipy

I am trying to integrate a second order differential equation using scipy.integrate.odeint. My equation is as follows:
m*x[i]'' + x[i]' = (K/N) * sum(j=0 to N) of sin(x[j] - x[i])
which I have converted into two first order ODEs as follows. In the code below, yinit is the array of the initial values x(0) and x'(0). My question is: what should the values of x(0) and x'(0) be?
x'[i] = y[i]
y'[i] = (-y[i] + (K/N) * sum(j=0 to N) of sin(x[j] - x[i])) / m
from numpy import *
from scipy.integrate import odeint

N = 50

def f(theta, t):
    global N
    x, y = theta
    m = 0.95
    K = 1.0
    fx = zeros(N, float)
    for i in range(N):
        s = 0.0
        for j in range(i+1, N):
            s = s + sin(x[j] - x[i])
        fx[i] = (-y[i] + (K*s)/N)/m
    return array([y, fx])
t = linspace(0, 10, 100, endpoint=False)
# Uniformly generate random initial angles
theta = random.uniform(-180, 180, N)
# Integrate function f using odeint
yinit = array([x(0), x'(0)])  # pseudocode -- what should these initial values be?
y = odeint(f, yinit, t)[:, 0]
print (y)
You can choose whatever initial condition you want.
In your case, you decided to use a random initial condition for x for all the oscillators. You can use a random initial condition for 'y' as well, I guess, as I did below.
There were a few errors in the above code, mostly on how to unpack x,y from theta and how to repack them at the end (see concatenate below in the corrected code). See also the concatenate for yinit.
The rest are stylistic/minor changes.
from numpy import concatenate, linspace, random, mod, zeros, sin
from scipy.integrate import odeint
Nosc = 20
assert mod(Nosc, 2) == 0
def f(theta, _):
    N = theta.size / 2
    x, y = theta[:N], theta[N:]
    m = 0.95
    K = 1.0
    fx = zeros(N, float)
    for i in range(N):
        s = 0.0
        for j in range(i + 1, N):
            s = s + sin(x[j] - x[i])
        fx[i] = (-y[i] + (K * s) / N) / m
    return concatenate(([y, fx]))
t = linspace(0, 10, 50, endpoint=False)
theta = random.uniform(-180, 180, Nosc)
theta2 = random.uniform(-180, 180, Nosc) #added initial condition for the velocities of the oscillators
yinit = concatenate((theta, theta2))
res = odeint(f, yinit, t)
X = res[:, :Nosc].T
Y = res[:, Nosc:].T
To plot the time evolution of the system, you can use something like
import matplotlib.pylab as plt
fig, ax = plt.subplots()
for displacement in X:
ax.plot(t, displacement)
ax.set_xlabel('t')
ax.set_ylabel('x')
fig.show()
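As a side note, the double Python loop in f can be vectorized with broadcasting. A sketch, assuming the sum in the original equation really runs over all j (the loops above only use j > i):

import numpy as np

def f_vec(theta, _):
    N = theta.size // 2
    x, y = theta[:N], theta[N:]
    m, K = 0.95, 1.0
    # sum over j of sin(x[j] - x[i]) for every i at once
    coupling = np.sin(x[None, :] - x[:, None]).sum(axis=1)
    fx = (-y + K * coupling / N) / m
    return np.concatenate((y, fx))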
What are you modelling? At first the equation looked a bit like Kuramoto oscillators, but then I noticed you also have an x[i]'' term.
Notice how in your model, since you do not have a spring term in the equation (a term proportional to x(t) on the LHS), the value of x converges to an arbitrary value.