Gradient descent rule in neural network (R) - gradient

I am getting confused about the package "neuralnet".
I want to know the gradient descent rule used in neuralnet(R).
delta w(t) = -n(dE/dw) + u(delta w(t-1))
where E = error, w = weight, u is momentum, n = learning rate and t = time
Is that correct?

Related

Using gradient descent to solve a nonlinear system

I have the following code, which uses gradient descent to find the global minimum of y = (x+5)^2:
cur_x = 3 # the algorithm starts at x=3
rate = 0.01 # learning rate
precision = 0.000001 # this tells us when to stop the algorithm
previous_step_size = 1
max_iters = 10000 # maximum number of iterations
iters = 0 # iteration counter
df = lambda x: 2*(x+5) # gradient of our function
while previous_step_size > precision and iters < max_iters:
prev_x = cur_x # store current x value in prev_x
cur_x = cur_x - rate * df(prev_x) # grad descent
previous_step_size = abs(cur_x - prev_x) # change in x
iters = iters+1 # iteration count
print("Iteration",iters,"\nX value is",cur_x) # print iterations
print("The local minimum occurs at", cur_x)
The procedure is fairly simple, and among the most intuitive and brief for solving such a problem (at least, that I'm aware of).
I'd now like to apply this to solving a system of nonlinear equations. Namely, I want to use this to solve the Time Difference of Arrival problem in three dimensions. That is, given the coordinates of 4 observers (or, in general, n+1 observers for an n dimensional solution), the velocity v of some signal, and the time of arrival at each observer, I want to reconstruct the source (determine it's coordinates [x,y,z].
I've already accomplished this using approximation search (see this excellent post on the matter: ), and I'd now like to try doing so with gradient descent (really, just as an interesting exercise). I know that the problem in two dimensions can be described by the following non-linear system:
sqrt{(x-x_1)^2+(y-y_1)^2}+s(t_2-t_1) = sqrt{(x-x_2)^2 + (y-y_2)^2}
sqrt{(x-x_2)^2+(y-y_2)^2}+s(t_3-t_2) = sqrt{(x-x_3)^2 + (y-y_3)^2}
sqrt{(x-x_3)^2+(y-y_3)^2}+s(t_1-t_3) = sqrt{(x-x_1)^2 + (y-y_1)^2}
I know that it can be done, however I cannot determine how.
How might I go about applying this to 3-dimensions, or some nonlinear system in general?

Plot variables out of differential equations system function

I have a 4-4 differential equations system in a function (subsystem4) and I solved it with odeint funtion. I managed to plot the results of the system. My problem is that I want to plot and some other equations (e.g. x,y,vcxdot...) which are included in the same function (subsystem4) but I get NameError: name 'vcxdot' is not defined. Also, I want to use some of these equations (not only the results of the equation's system) as inputs in a following differential equations system and plot all the equations in the same period of time (t). I have done this using Matlab-Simulink but it was much easier because of Simulink blocks. How can I have access to and plot all the equations of a function (subsystem4) and use them as input in a following system? I am new in python and I use Python 2.7.12. Thank you in advance!
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def subsystem4(u,t):
added_mass_x = 0.03 # kg
added_mass_y = 0.04
mb = 0.3 # kg
m1 = mb-added_mass_x
m2 = mb-added_mass_y
l1 = 0.07 # m
l2 = 0.05 # m
J = 0.00050797 # kgm^2
Sa = 0.0110 # m^2
Cd = 2.44
Cl = 3.41
Kd = 0.000655 # kgm^2
r = 1000 # kg/m^3
f = 2 # Hz
c1 = 0.5*r*Sa*Cd
c2 = 0.5*r*Sa*Cl
c3 = 0.5*mb*(l1**2)
c4 = Kd/J
c5 = (1/(2*J))*(l1**2)*mb*l2
c6 = (1/(3*J))*(l1**3)*mb
vcx = u[0]
vcy = u[1]
psi = u[2]
wz = u[3]
x = 3 + 0.3*np.cos(t)
y = 0.5 + 0.3*np.sin(t)
xdot = -0.3*np.sin(t)
ydot = 0.3*np.cos(t)
xdotdot = -0.3*np.cos(t)
ydotdot = -0.3*np.sin(t)
vcx = xdot*np.cos(psi)-ydot*np.sin(psi)
vcy = ydot*np.cos(psi)+xdot*np.sin(psi)
psidot = wz
vcxdot = xdotdot*np.cos(psi)-xdot*np.sin(psi)*psidot-ydotdot*np.sin(psi)-ydot*np.cos(psi)*psidot
vcydot = ydotdot*np.cos(psi)-ydot*np.sin(psi)*psidot+xdotdot*np.sin(psi)+xdot*np.cos(psi)*psidot
g1 = -(m1/c3)*vcxdot+(m2/c3)*vcy*wz-(c1/c3)*vcx*np.sqrt((vcx**2)+(vcy**2))+(c2/c3)*vcy*np.sqrt((vcx**2)+(vcy**2))*np.arctan2(vcy,vcx)
g2 = (m2/c3)*vcydot+(m1/c3)*vcx*wz+(c1/c3)*vcy*np.sqrt((vcx**2)+(vcy**2))+(c2/c3)*vcx*np.sqrt((vcx**2)+(vcy**2))*np.arctan2(vcy,vcx)
A = 12*np.sin(2*np.pi*f*t+np.pi)
if A>=0.1:
wzdot = ((m1-m2)/J)*vcx*vcy-c4*wz**2*np.sign(wz)-c5*g2-c6*np.sqrt((g1**2)+(g2**2))
elif A<-0.1:
wzdot = ((m1-m2)/J)*vcx*vcy-c4*wz**2*np.sign(wz)-c5*g2+c6*np.sqrt((g1**2)+(g2**2))
else:
wzdot = ((m1-m2)/J)*vcx*vcy-c4*wz**2*np.sign(wz)-c5*g2
return [vcxdot,vcydot,psidot,wzdot]
u0 = [0,0,0,0]
t = np.linspace(0,15,1000)
u = odeint(subsystem4,u0,t)
vcx = u[:,0]
vcy = u[:,1]
psi = u[:,2]
wz = u[:,3]
plt.figure(1)
plt.subplot(211)
plt.plot(t,vcx,'r-',linewidth=2,label='vcx')
plt.plot(t,vcy,'b--',linewidth=2,label='vcy')
plt.plot(t,psi,'g:',linewidth=2,label='psi')
plt.plot(t,wz,'c',linewidth=2,label='wz')
plt.xlabel('time')
plt.legend()
plt.show()
To the immediate question of plotting the derivatives, you can get the velocities by directly calling the ODE function again on the solution,
u = odeint(subsystem4,u0,t)
udot = subsystem4(u.T,t)
and get the separate velocity arrays via
vcxdot,vcydot,psidot,wzdot = udot
In this case the function involves branching, which is not very friendly to vectorized calls of it. There are ways to vectorize branching, but the easiest work-around is to loop manually through the solution points, which is slower than a working vectorized implementation. This will again procude a list of tuples like odeint, so the result has to be transposed as a tuple of lists for "easy" assignment to the single array variables.
udot = [ subsystem4(uk, tk) for uk, tk in zip(u,t) ];
vcxdot,vcydot,psidot,wzdot = np.asarray(udot).T
This may appear to double somewhat the computation, but not really, as the solution points are usually interpolated from the internal step points of the solver. The evaluation of the ODE function during integration will usually happen at points that are different from the solution points.
For the other variables, extract the computation of position and velocities into functions to have the constant and composition in one place only:
def xy_pos(t): return 3 + 0.3*np.cos(t), 0.5 + 0.3*np.sin(t)
def xy_vel(t): return -0.3*np.sin(t), 0.3*np.cos(t)
def xy_acc(t): return -0.3*np.cos(t), -0.3*np.sin(t)
or similar that you can then use both inside the ODE function and in preparing the plots.
What Simulink most likely does is to collect all the equations of all the blocks and form this into one big ODE system which is then solved for the whole state at once. You will need to implement something similar. One big state vector, and each subsystem knows its slice of the state resp. derivatives vector to get its specific state variables from and write the derivatives to. The computation of the derivatives can then use values communicated among the subsystems.
What you are trying to do, solving the subsystems separately, works only for resp. will likely result in a order 1 integration method. All higher order methods need to be able to simultaneously shift the state in some direction computed from previous stages of the method, and evaluate the whole system there.

Schrodinger equation not evolving properly with time?

I'm writing a code in python to evolve the time-dependent Schrodinger equation using the Crank-Nicolson scheme. I didn't know how to deal with the potential so I looked around and found a way from this question, which I have verified from a couple other sources. According to them, for a harmonic oscillator potential, the C-N scheme gives
AΨn+1=A∗Ψn
where the elements on the main diagonal of A are dj=1+[(iΔt) / (2m(Δx)^2)]+[(iΔt(xj)^2)/4] and the elements on the upper and lower diagonals are a=−iΔt/[4m(Δx)^2]
The way I understand it, I'm supposed to give an initial condition(I've chosen a coherent state) in the form of the matrix Ψn and I need to compute the matrix Ψn+1 , which is the wave function after time Δt. To obtain Ψn+1 for a given step, I'm inverting the matrix A and multiplying it with the matrix A* and then multiplying the result with Ψn. The resulting matrix then becomes Ψn for the next step.
But when I'm doing this, I'm getting an incorrect animation. The wave packet is supposed to oscillate between the boundaries but in my animation, it is barely moving from its initial mean value. I just don't understand what I'm doing wrong. Is my understanding of the problem wrong? Or is it a flaw in my code?Please help! I've posted my code below and the video of my animation here. I'm sorry for the length of the code and the question but it's driving me crazy not knowing what my mistake is.
import numpy as np
import matplotlib.pyplot as plt
L = 30.0
x0 = -5.0
sig = 0.5
dx = 0.5
dt = 0.02
k = 1.0
w=2
K=w**2
a=np.power(K,0.25)
xs = np.arange(-L,L,dx)
nn = len(xs)
mu = k*dt/(dx)**2
dd = 1.0+mu
ee = 1.0-mu
ti = 0.0
tf = 100.0
t = ti
V=np.zeros(len(xs))
u=np.zeros(nn,dtype="complex")
V=K*(xs)**2/2 #harmonic oscillator potential
u=(np.sqrt(a)/1.33)*np.exp(-(a*(xs - x0))**2)+0j #initial condition for wave function
u[0]=0.0 #boundary condition
u[-1] = 0.0 #boundary condition
A = np.zeros((nn-2,nn-2),dtype="complex") #define A
for i in range(nn-3):
A[i,i] = 1+1j*(mu/2+w*dt*xs[i]**2/4)
A[i,i+1] = -1j*mu/4.
A[i+1,i] = -1j*mu/4.
A[nn-3,nn-3] = 1+1j*mu/2+1j*dt*xs[nn-3]**2/4
B = np.zeros((nn-2,nn-2),dtype="complex") #define A*
for i in range(nn-3):
B[i,i] = 1-1j*mu/2-1j*w*dt*xs[i]**2/4
B[i,i+1] = 1j*mu/4.
B[i+1,i] = 1j*mu/4.
B[nn-3,nn-3] = 1-1j*(mu/2)-1j*dt*xs[nn-3]**2/4
X = np.linalg.inv(A) #take inverse of A
plt.ion()
l, = plt.plot(xs,np.abs(u),lw=2,color='blue') #plot initial wave function
T=np.matmul(X,B) #multiply A inverse with A*
while t<tf:
u[1:-1]=np.matmul(T,u[1:-1]) #updating u but leaving the boundary conditions unchanged
l.set_ydata((abs(u))) #update plot with new u
t += dt
plt.pause(0.00001)
After a lot of tinkering, it came down to reducing my step size. That did the job for me- I reduced the step size and the program worked. If anyone is facing the same problem as I am, I recommend playing around with the step sizes. Provided that the rest of the code is fine, this is the only possible area of error.

Object Looking Skewed After Essential Matrix Calculation and Projection

I am trying to calculate an essential and a projection matrix from two images. I will then use them to project a 3D object onto the image. The two images I used are
I picked a few pixel correspondences, and fed that to a SVD based least square mechanism which the books say gives me the essential matrix. I used the code below for this task (code is based mostly on Eric Solem's Programming Computer Vision with Python book):
import scipy.linalg as lin
import pandas as pd
def skew(a):
return np.array([[0,-a[2],a[1]],[a[2],0,-a[0]],[-a[1],a[0],0]])
def essential(x1,x2):
n = x1.shape[1]
A = np.zeros((n,9))
for i in range(n):
A[i] = [ x1[0,i]*x2[0,i], \
x1[0,i]*x2[1,i], \
x1[0,i]*x2[2,i], \
x1[1,i]*x2[0,i], \
x1[1,i]*x2[1,i], \
x1[1,i]*x2[2,i], \
x1[2,i]*x2[0,i], \
x1[2,i]*x2[1,i], \
x1[2,i]*x2[2,i]]
U,S,V = lin.svd(A)
F = V[-1].reshape(3,3)
return F
def compute_P_from_essential(E):
U,S,V = lin.svd(E)
if lin.det(np.dot(U,V))<0: V = -V
E = np.dot(U,np.dot(np.diag([1,1,0]),V))
Z = skew([0,0,-1])
W = np.array([[0,-1,0],[1,0,0],[0,0,1]])
P2 = [np.vstack((np.dot(U,np.dot(W,V)).T,U[:,2])).T,
np.vstack((np.dot(U,np.dot(W,V)).T,-U[:,2])).T,
np.vstack((np.dot(U,np.dot(W.T,V)).T,U[:,2])).T,
np.vstack((np.dot(U,np.dot(W.T,V)).T,-U[:,2])).T]
return P2
points = [ \
[266,163,296,160],[265,237,297,266],\
[76,288,51,340],[135,31,142,4],\
[344,167,371,156],[48,165,71,164],\
[151,68,166,56],[237,26,259,19],\
[226,147,254,140]]
df = pd.DataFrame(points)
df['uno'] = 1.
x1 = np.array(df[[0,1,'uno']].T)
x2 = np.array(df[[2,3,'uno']].T)
print x1
print x2
E = essential(x1,x2)
P = compute_P_from_essential(E)
import pandas as pd
x0 = 3.; y0 = 1.; z0 = 1.
print df.shape
e = 1
cube = [[x0,y0,z0],[x0+e,y0,z0],[x0+e,y0+e,z0],[x0,y0+e,z0],
[x0,y0,z0+e],[x0+e,y0,z0+e],[x0+e,y0+e,z0+e],[x0,y0+e,z0+e]]
cube = pd.DataFrame(cube)
cube['1'] = 1.
xx = np.dot(P[1], cube.T) * 100.
xx[1,:] = 360-xx[1,:]
#xx = xx / xx[2]
print xx[0].shape
plt.plot(xx[0], xx[1],'.')
plt.xlim(0,640)
plt.ylim(0,360)
I calculated the essential matrix, then the projection matrix, then used that to project a 3D cube. The result:
This looks skewed, I am not sure why this happened. Any ideas on how to fix this?
Thanks,
First of all, it looks like you are computing the essential matrix using exactly 9 points. You can do this using only 8 (since scale is a free parameter, you can multiply the essential by a scalar and it will stay the same so you can fix one of the parameters and just use 8 points, but I digress.) However, in practice this is a very bad idea because your 8 points might have poor spatial configuration. So what you want to do is to select N matches (600 for example), and use an algorithm like RANSAC to determine the best Essential matrix. But aside from that, what I'd recommend to debug such applications is this: compute the Fundalental matrix F based on the Essential you just computed. Now you can select a point in image 1 and then display the corresponding epipolar line in the second one. That will help you visually evaluate and thus debug the estimation of the Essential.

How to correctly set a random number generator?

In order to develop my implementation of the particle filter algorithm, I need to generate hypotheses about the movements relating to the object to be tracked: if I set N samples and if I use a 2-by-1 state vector, then at each step I have to generate N pairs of random values (a 2-by-N matrix). Moreover, if I know the statistics of movements (mean and standard deviation), then I could use the mean and standard deviation to generate all N values. Finally, to model the uncertainty of the movement, I could generate a noise matrix (a 2-by-N matrix) and add it to the matrix of movements.
Based on these premises, I have implemented the algorithm running in matlab, and I used the following code in order to generate the hypotheses of movement.
ds_mean = [dx_mean dy_mean];
ds_stddev = [dx_stddev dy_stddev];
d = 5;
V = zeros(2,N);
V(1,:) = normrnd(ds_mean(1),ds_stddev(1),1,N); % hypotheses of movement on x axis
V(2,:) = normrnd(ds_mean(2),ds_stddev(2),1,N); % hypotheses of movement on y axis
E = d*randn(2,N); % weighted noise
M = V + E; % hypotheses of movement
A problem occurred when I had to implement the same algorithm using C++ and OpenCV: substantially, while the above matlab code generates good predictions (it works great), instead the same code written in C++ (see the code below) generates poor predictions (ie far away from the object). Why?
RNG m_rng;
x_mean = // ...
y_mean = // ...
x_stddev = // ...
y_stddev = // ...
Mat velocity(STATE_DIM, NUM_PARTICLES, DataType<double>::type);
m_rng.fill(velocity.row(0), RNG::NORMAL, x_mean, x_stddev);
m_rng.fill(velocity.row(1), RNG::NORMAL, y_mean, y_stddev);
Mat noise(STATE_DIM, NUM_PARTICLES, DataType<double>::type);
m_rng.fill(noise,RNG::NORMAL,0,1);
noise *= d; % weighted noise
movements = velocity + noise;
How to make sure that the C++ algorithm works as well as the algorithm implemented in matlab?
I think I just serendipitously answered your question here, or at least provided an alternative solution.
https://stackoverflow.com/a/13897938/1899861
I believe this will generate proper random numbers, and has been tested to death when compiled using Microsoft C on Intel processors (386, 486, Pentium).
FYI, 4.0 * atan(1.0) yields a much better value of PI than the constant in the above environment.