Can auto differentiation handle separate functions of array slices? - gradient

Given a vector v of length, say, 30: are auto-differentiation tools such as Theano or TensorFlow able to take the gradient of something like this:
x = np.random.rand(5, 1)
v = f(x, z)
w = v[0:25].reshape(5, 5)
y = g(np.matmul(w, x) + v[25:30])
minimize ( || y - x || )
Would this even make sense? The way I picture it in my mind, I would have to do some multiplications by identity vectors/matrices padded with trailing zeros to convert v --> w.

Slice and reshape operations fit into the standard reverse-mode AD framework in the same way as any other op. Below is a simple TensorFlow program similar to your example (I had to change a couple of things to make the dimensions match), together with the resulting computation graph for the gradient.
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-style graph code

def f(x, z):
    """Adds the values together and reshapes the result into a vector."""
    return tf.reshape(x + z, (5,))

x = tf.Variable(np.random.rand(5, 1))
z = tf.Variable(np.random.rand(5, 1))
v = f(x, z)
w = tf.slice(v, [0], [5])      # take a slice of v ...
w = tf.reshape(w, (5, 1))      # ... and reshape it into a matrix
y = tf.matmul(w, tf.transpose(x)) + tf.slice(v, [0], [5])
cost = tf.square(tf.reduce_sum(y - x))
print(tf.gradients(cost, [x, z]))

Let us take a look at the source code:
@ops.RegisterGradient("Reshape")
def _ReshapeGrad(op, grad):
    return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
This is the gradient TensorFlow registers for the Reshape op: the incoming gradient is simply reshaped back to the shape of the op's original input, so reshape is essentially transparent to reverse-mode differentiation.
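To connect this with the "identity matrix with trailing zeros" intuition from the question: the adjoint of a slice is just a zero-padded scatter of the upstream gradient. A minimal NumPy sketch (my own illustration, not TensorFlow source):
import numpy as np

# forward: w = v[0:25]
# backward: the upstream gradient dL/dw is scattered back into a zero vector
# shaped like v; the unused entries v[25:30] receive zero gradient.
v = np.random.rand(30)
grad_w = np.random.rand(25)      # pretend upstream gradient w.r.t. v[0:25]
grad_v = np.zeros_like(v)
grad_v[0:25] = grad_w            # same effect as multiplying by a padded identity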

Related

Integrate Legendre polynomials in SymPy and use these integrals as coefficients

I am trying to make a simple example in SymPy that computes some coefficients, uses them in a sum of Legendre polynomials, and finally plots the result. It should be very simple, but I cannot make it work. I want to use it for my electromagnetism course. I get errors in both of the attempts below:
%matplotlib inline
from sympy import *
x, y, z = symbols('x y z')
k, m, n = symbols('k m n', integer=True)
f, step, potential = symbols('f step potential', cls=Function)
var('n x')
A=SeqFormula(2*(2*m+1)*Integral(legendre(2*m+1,x),(x,0,1)).doit(),(m,0,oo)).doit()
Sum(A.coeff(m).doit()*legendre(2*m+1,x),(m,0,10)).doit()
B=Tuple.fromiter(2*(2*n+1)*Integral(legendre(2*n+1,x),(x,0,1)).doit() for n in range(50))
Sum(B[m]*legendre(2*m+1,x),(m,0,10)).doit()
Here is part of a Mathematica script showing what I would like to replicate:
Nn = 50;
Array[A, Nn]
For[i = 0, i <= Nn, i++, A[i + 1] = Integrate[LegendreP[2*i + 1, x]*(2*(2*i + 1) + 1), {x, 0, 1}]];
Step = Sum[A[n + 1]*LegendreP[2*n + 1, #], {n, 0, Nn}] &
Plot[Step[x], {x, -1, 1}]
I think the structure you were searching for with A is Python's lambda.
A = lambda m: 2*(2*m+1)*Integral(legendre(2*m+1, x), (x, 0, 1))
f = Sum(A(m)*legendre(2*m+1, x), (m, 0, 10)).doit()
plot(f, (x, -1, 1))
The key point is that m has to be explicit in order for the integration to happen; SymPy does not know a general formula for integrating legendre(n, x). So the integration is only carried out once m takes a concrete value (0, 1, 2, ...), which happens when the Sum is expanded by doit().
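If you prefer to precompute the coefficients up front, mirroring the Mathematica Array/For loop, a minimal sketch (my own, assuming the same symbols as above) could be:
from sympy import symbols, legendre, integrate, plot

x = symbols('x')
# Precompute the first 11 coefficients with concrete m, as in the Mathematica loop
coeffs = [2*(2*m + 1)*integrate(legendre(2*m + 1, x), (x, 0, 1)) for m in range(11)]
step = sum(c*legendre(2*m + 1, x) for m, c in enumerate(coeffs))
plot(step, (x, -1, 1))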

How to fit a 2D ellipse to given points

I would like to fit a set of 2D points with an ellipse of the form (x / a)^2 + (y / b)^2 = 1, and so obtain a and b.
And then, be able to replot it on my graph.
I found many examples on the internet, but none with this simple Cartesian equation. I have probably searched badly! I think a basic solution to this problem could help many people.
Here is an example of the data:
Sadly, I cannot post the values, so let's assume I have X and Y arrays defining the coordinates of each of those points.
This can be solved directly using least squares. You can frame it as minimizing the sum of squares of the quantity (alpha * x_i^2 + beta * y_i^2 - 1), where alpha is 1/a^2 and beta is 1/b^2. You have all the x_i's in X and the y_i's in Y, so you can find the minimizer of ||Ax - b||^2, where A is the Nx2 matrix [X^2, Y^2], x is the column vector [alpha; beta], and b is a column vector of all ones.
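For the simple axis-aligned form in the question, a minimal sketch of this idea (assuming X and Y hold the point coordinates; the variable names are mine) could look like:
import numpy as np

# Design matrix [x_i^2, y_i^2] and right-hand side of ones
A = np.column_stack([X.ravel()**2, Y.ravel()**2])
rhs = np.ones(A.shape[0])
alpha, beta = np.linalg.lstsq(A, rhs, rcond=None)[0]
a, b_axis = 1/np.sqrt(alpha), 1/np.sqrt(beta)   # recover the semi-axes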
The following code solves the more general problem of an ellipse of the form Ax^2 + Bxy + Cy^2 + Dx + Ey = 1, though the idea is exactly the same. The print statement gives 0.0776x^2 + 0.0315xy + 0.125y^2 + 0.00457x + 0.00314y = 1, and the fitted ellipse is plotted at the end.
import numpy as np
import matplotlib.pyplot as plt
alpha = 5
beta = 3
N = 500
DIM = 2
np.random.seed(2)
# Generate random points on the unit circle by sampling uniform angles
theta = np.random.uniform(0, 2*np.pi, (N,1))
eps_noise = 0.2 * np.random.normal(size=[N,1])
circle = np.hstack([np.cos(theta), np.sin(theta)])
# Stretch and rotate the circle into an ellipse with a random linear transformation
B = np.random.randint(-3, 3, (DIM, DIM))
noisy_ellipse = circle.dot(B) + eps_noise
# Extract x coords and y coords of the ellipse as column vectors
X = noisy_ellipse[:,0:1]
Y = noisy_ellipse[:,1:]
# Formulate and solve the least squares problem ||Ax - b ||^2
A = np.hstack([X**2, X * Y, Y**2, X, Y])
b = np.ones_like(X)
x = np.linalg.lstsq(A, b, rcond=None)[0].squeeze()
# Print the equation of the ellipse in standard form
print('The ellipse is given by {0:.3}x^2 + {1:.3}xy+{2:.3}y^2+{3:.3}x+{4:.3}y = 1'.format(x[0], x[1],x[2],x[3],x[4]))
# Plot the noisy data
plt.scatter(X, Y, label='Data Points')
# Plot the original ellipse from which the data was generated
phi = np.linspace(0, 2*np.pi, 1000).reshape((1000,1))
c = np.hstack([np.cos(phi), np.sin(phi)])
ground_truth_ellipse = c.dot(B)
plt.plot(ground_truth_ellipse[:,0], ground_truth_ellipse[:,1], 'k--', label='Generating Ellipse')
# Plot the least squares ellipse
x_coord = np.linspace(-5,5,300)
y_coord = np.linspace(-5,5,300)
X_coord, Y_coord = np.meshgrid(x_coord, y_coord)
Z_coord = x[0] * X_coord ** 2 + x[1] * X_coord * Y_coord + x[2] * Y_coord**2 + x[3] * X_coord + x[4] * Y_coord
plt.contour(X_coord, Y_coord, Z_coord, levels=[1], colors=('r'), linewidths=2)
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Following the suggestion by ErroriSalvo, here is the complete process of fitting an ellipse using the SVD. The arrays x, y are coordinates of the given points, let's say there are N points. Then U, S, V are obtained from the SVD of the centered coordinate array of shape (2, N). So, U is a 2 by 2 orthogonal matrix (rotation), S is a vector of length 2 (singular values), and V, which we do not need, is an N by N orthogonal matrix.
The linear map transforming the unit circle to the ellipse of best fit is
sqrt(2/N) * U * diag(S)
where diag(S) is the diagonal matrix with singular values on the diagonal. To see why the factor of sqrt(2/N) is needed, imagine that the points x, y are taken uniformly from the unit circle. Then sum(x**2) + sum(y**2) is N, and so the coordinate matrix consists of two orthogonal rows of length sqrt(N/2), hence its norm (the largest singular value) is sqrt(N/2). We need to bring this down to 1 to have the unit circle.
import numpy as np
import matplotlib.pyplot as plt

N = 300
t = np.linspace(0, 2*np.pi, N)
x = 5*np.cos(t) + 0.2*np.random.normal(size=N) + 1
y = 4*np.sin(t+0.5) + 0.2*np.random.normal(size=N)
plt.plot(x, y, '.') # given points
xmean, ymean = x.mean(), y.mean()
x -= xmean
y -= ymean
U, S, V = np.linalg.svd(np.stack((x, y)))
tt = np.linspace(0, 2*np.pi, 1000)
circle = np.stack((np.cos(tt), np.sin(tt))) # unit circle
transform = np.sqrt(2/N) * U.dot(np.diag(S)) # transformation matrix
fit = transform.dot(circle) + np.array([[xmean], [ymean]])
plt.plot(fit[0, :], fit[1, :], 'r')
plt.show()
But if you assume that there is no rotation, then np.sqrt(2/N) * S is all you need; these are a and b in the equation of the ellipse.
You could try a Singular Value Decomposition of the data matrix.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.svd.html
First, center the data by subtracting the mean of X and Y from each column respectively:
X = X - np.mean(X)
Y = Y - np.mean(Y)
D = np.vstack((X, Y))
Then apply the SVD and extract:
- the singular values (entries of s) -> axis lengths
- the left singular vectors (columns of U) -> axis orientation
U, s, V = np.linalg.svd(D, full_matrices=True)
This should be a least-squares fit.
Of course, things can get more complicated than this, please see
https://www.emis.de/journals/BBMS/Bulletin/sup962/gander.pdf

Graph theory: how to approach these kinds of problems? I want to know the logic and the way one needs to think while trying to solve this.

Find the number of paths on the Cartesian plane from (0, 0) to (n, n) which never rise above the line y = x. It is possible to make three types of moves along the path:
move up, i.e. from (i, j) to (i, j + 1);
move to the right, i.e. from (i, j) to (i + 1, j);
the right-up move, i.e. from (i, j) to (i + 1, j + 1)
Path count 101
First, we solve a simpler problem:
Find the number of paths on the Cartesian plane from (0, 0) to (n, n) with:
move up, i.e. from (i, j) to (i, j + 1);
move to the right, i.e. from (i, j) to (i + 1, j);
and no restriction: we may also visit points where x < y.
How to solve it? Too hard? Okay, let's first find the number of paths from (0, 0) to (2, 2). We could draw all the paths in a grid:
We define
f(x,y) => the number of paths from (0, 0) to (x, y)
You can see that a path reaches (2, 2) from either (2, 1) or (1, 2), so we get:
f(2, 2) = f(2, 1) + f(1, 2)
And then you will notice that a path reaches any point (x, y) from either (x, y - 1) or (x - 1, y). That's very natural, since we have only two possible moves:
move up, i.e. from (i, j) to (i, j + 1);
move to the right, i.e. from (i, j) to (i + 1, j);
I drew a larger illustration for you, so you can check our conclusion:
So we can get that:
f(x, y) = f(x, y - 1) + f(x - 1, y)
Wait... What if x = 0 or y = 0? That's quite direct:
if x = 0 => f(x, y) = f(x, y - 1)
if y = 0 => f(x, y) = f(x - 1, y)
The last... How about f(0, 0)? We define:
f(0, 0) = 1
since there is exactly one path from (0, 0) to each of (1, 0) and (0, 1).
OK, to summarise:
f(x, y) = f(x, y - 1) + f(x - 1, y)
if x = 0 => f(x, y) = f(x, y - 1)
if y = 0 => f(x, y) = f(x - 1, y)
f(0, 0) = 1
And by recursion, we can solve that problem.
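For instance, here is a minimal sketch of this simpler recursion (my own illustration; the helper name paths is not part of the final code):
from functools import lru_cache

# Simpler problem: only up/right moves, no restriction on crossing y = x.
@lru_cache(maxsize=None)
def paths(x, y):
    if x == 0 and y == 0:
        return 1
    total = 0
    if x > 0:
        total += paths(x - 1, y)   # arrived by a right move
    if y > 0:
        total += paths(x, y - 1)   # arrived by an up move
    return total

print(paths(2, 2))  # 6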
Your problem
Now let's discuss your original problem; we just modify our equations a little bit:
f(x, y) = f(x, y - 1) + f(x - 1, y) + f(x - 1, y - 1)
if x = 0 => f(x, y) = f(x, y - 1)
if y = 0 => f(x, y) = f(x - 1, y)
if x < y => f(x, y) = 0
f(0, 0) = 1
and that leads to the code below.
The last thing I add to the code is memoization. In short, memoization eliminates repeated calculation: if we have already calculated f(x, y), we store it in a dictionary and never calculate it again. You can read the Wikipedia page to learn more.
So, that's all of my code. If you still have questions, you can leave a comment here and I will reply as soon as possible.
Code:
d = {}  # memoization cache

def find(x, y):
    if x == 0 and y == 0:
        return 1
    if x < y:
        return 0
    if d.get((x, y)) is not None:
        return d.get((x, y))
    ret = 0
    if x > 0:
        ret += find(x - 1, y)
    if y > 0:
        ret += find(x, y - 1)
    if x > 0 and y > 0:
        ret += find(x - 1, y - 1)
    d[(x, y)] = ret
    return ret

print(find(2, 1))  # 4
For additional ideas on solving problems such as this one, there is a mathematical curiosity that is in one-to-one correspondence not only with the sequences produced by lattice walks whose steps must remain below the line x = y, but with a veritable plethora of fantastical mathematical beasts that cut a wide swath of applicability across problem solving and research.
What are these curiosities?
Catalan Numbers:
C_n = (1 / (n + 1)) * C(2n, n) for n >= 0, with C_0 = 1, where C(2n, n) is the binomial coefficient "2n choose n".
They also count:
The number of expressions containing n pairs of correctly matched parentheses,
e.g. n = 3: ((())), (()()), (())(), ()(()), ()()()
The number of plane trees with n + 1 vertices
The number of triangulations of a convex (n + 2)-gon
The monomials in the product p(x1, ..., xn) = x1(x1 + x2)(x1 + x2 + x3) ... (x1 + ... + xn)
bipartite vertices of rooted planar trees
And soooo many more things
These objects appear in a lot of active research, for instance in mathematical physics, which is itself an active area of algorithms research due to the enormous data sets involved.
So you never know what seemingly far flung concepts are intimately linked in some deep dark mathematical recess.
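As a quick sanity check (my own sketch, not part of the original answer), the two-move walks from the "Path count 101" section that stay on or below the line y = x are counted exactly by the Catalan numbers:
from math import comb
from functools import lru_cache

# Two-move walks (up/right) from (0, 0) to (n, n) that never rise above y = x
@lru_cache(maxsize=None)
def below_diagonal(x, y):
    if x < y:
        return 0
    if x == 0 and y == 0:
        return 1
    total = 0
    if x > 0:
        total += below_diagonal(x - 1, y)
    if y > 0:
        total += below_diagonal(x, y - 1)
    return total

for n in range(6):
    assert below_diagonal(n, n) == comb(2 * n, n) // (n + 1)  # Catalan number C_n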

User defined SVM kernel with scikit-learn

I ran into a problem when defining my own kernel in scikit-learn.
I defined the Gaussian kernel myself and was able to fit the SVM, but not to use it to make predictions.
More precisely, I have the following code:
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.utils import shuffle
import scipy.sparse as sparse
import numpy as np
digits = load_digits(n_class=2)
X, y = shuffle(digits.data, digits.target)
gamma = 1.0
X_train, X_test = X[:100, :], X[100:, :]
y_train, y_test = y[:100], y[100:]
m1 = SVC(kernel='rbf',gamma=1)
m1.fit(X_train, y_train)
m1.predict(X_test)
def my_kernel(x, y):
    d = x - y
    c = np.dot(d, d.T)
    return np.exp(-gamma * c)
m2 = SVC(kernel=my_kernel)
m2.fit(X_train, y_train)
m2.predict(X_test)
m1 and m2 should be the same, but m2.predict(X_test) returns the error:
operands could not be broadcast together with shapes (260,64) (100,64)
I don't understand the problem.
Furthermore, if x is a single data point, m1.predict(x) gives a +1/-1 result as expected, but m2.predict(x) gives an array of +1/-1 values...
No idea why.
The error is at the x - y line. You cannot subtract the two like that, because the first dimensions of the arrays may not be equal: at prediction time the kernel is called with X_test (260 samples) and the training data (100 samples), and it must return the full (260, 100) Gram matrix rather than an elementwise difference. Here is how the rbf kernel is implemented in scikit-learn, taken from here (only keeping the essentials):
def row_norms(X, squared=False):
    if issparse(X):
        norms = csr_row_norms(X)
    else:
        norms = np.einsum('ij,ij->i', X, X)
    if not squared:
        np.sqrt(norms, norms)
    return norms

def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False):
    """
    Considering the rows of X (and Y=X) as vectors, compute the
    distance matrix between each pair of vectors.
    [...]
    Returns
    -------
    distances : {array, sparse matrix}, shape (n_samples_1, n_samples_2)
    """
    X, Y = check_pairwise_arrays(X, Y)
    if Y_norm_squared is not None:
        YY = check_array(Y_norm_squared)
        if YY.shape != (1, Y.shape[0]):
            raise ValueError(
                "Incompatible dimensions for Y and Y_norm_squared")
    else:
        YY = row_norms(Y, squared=True)[np.newaxis, :]
    if X is Y:  # shortcut in the common case euclidean_distances(X, X)
        XX = YY.T
    else:
        XX = row_norms(X, squared=True)[:, np.newaxis]
    distances = safe_sparse_dot(X, Y.T, dense_output=True)
    distances *= -2
    distances += XX
    distances += YY
    np.maximum(distances, 0, out=distances)
    if X is Y:
        # Ensure that distances between vectors and themselves are set to 0.0.
        # This may not be the case due to floating point rounding errors.
        distances.flat[::distances.shape[0] + 1] = 0.0
    return distances if squared else np.sqrt(distances, out=distances)

def rbf_kernel(X, Y=None, gamma=None):
    X, Y = check_pairwise_arrays(X, Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = euclidean_distances(X, Y, squared=True)
    K *= -gamma
    np.exp(K, K)  # exponentiate K in-place
    return K
You might want to dig deeper into the code, but look at the comments for the euclidean_distances function. A naive implementation of what you're trying to achieve would be this:
def my_kernel(x, y):
    d = np.zeros((x.shape[0], y.shape[0]))
    for i, row_x in enumerate(x):
        for j, row_y in enumerate(y):
            # squared Euclidean distance, as in the RBF kernel
            d[i, j] = np.exp(-gamma * np.linalg.norm(row_x - row_y) ** 2)
    return d
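If you want to avoid the explicit double loop, a vectorized variant (a sketch assuming SciPy is available and that gamma is the global defined in the question) could be:
import numpy as np
from scipy.spatial.distance import cdist

def my_kernel(x, y):
    # Full (n_samples_x, n_samples_y) Gram matrix from squared Euclidean distances
    return np.exp(-gamma * cdist(x, y, 'sqeuclidean'))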

Calculating index in a subdivided interval?

Consider an interval of values [x, y] equally subdivided into n samples in the following way:
y can be greater than, equal to, or less than x.
Now, we pick a value z between x and y.
Question: what is the formula to compute the index i of z? (If x = y, the formula should return 0 or n-1.) (I repeat: y can be greater than, equal to, or less than x.)
For example: if x = -5, y = -10 and n = 5, then for z = -7.5, i = 2 (if z = -7, i = 2, but if z = -8, i = 3).
You can compute the length of the interval as:
len = y - x
Then you can compute the increase per single element:
increase = len / n
And now you have i = (z - x) / increase. In short, you compute how much the value increases per element and then how many of these increases are needed to get from x to z.
EDIT: if you really require the solution in C++, take care to do all the calculations in double. Also note that the value of i should be an integer, rounded down.
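A minimal Python sketch of this approach (my own; it assumes half-open subintervals and clamps boundary values into [0, n-1]):
import math

def interval_index(x, y, z, n):
    if x == y:
        return 0                      # degenerate interval
    step = (y - x) / n                # signed width of one subinterval
    i = math.floor((z - x) / step)    # number of whole steps from x to z
    return min(max(i, 0), n - 1)      # clamp values that land on the far boundary

print(interval_index(-5, -10, -7.5, 5))  # 2
print(interval_index(-5, -10, -8.0, 5))  # 3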
Answer logic (in Java):
i = Math.abs(Math.ceil(z - Math.min(x, y)));

if (x > y) { high = x; low = y; }
else       { high = y; low = x; }

if (y >= x)
    i = ceil((z - low + 1) / (high - low + 1) * n) - 1;
else
    i = ceil((high - z + 1) / (high - low + 1) * n) - 1;