Creating Emax model in pymc3

I am trying to build an Emax model using pymc3, based on the data and model in this video (about 40 minutes in):
https://www.youtube.com/watch?v=U9Nf-ZYHRQA&feature=youtu.be&list=PLvLDbH2lpyXNGV8mpBdF7EFK9LQJzGL-Y
Here is a screenshot showing the model...
My code is here:
import numpy as np
from pymc3 import Model, Normal, Uniform, Lognormal

pkpd_model = Model()
with pkpd_model:
    # Hyperparameter priors
    mu_e0 = Normal('mu_e0', mu=0, tau=1000)
    tau_e0 = Uniform('tau_e0', lower=0, upper=100)
    mu_emax = Normal('mu_emax', mu=0, tau=1000)
    tau_emax = Uniform('tau_emax', lower=0, upper=100)
    e0 = Lognormal('e0', mu=mu_e0, tau=tau_e0, shape=n_studies)
    emax = Lognormal('emax', mu=mu_emax, tau=tau_emax, shape=n_studies)
    ed50 = Lognormal('ed50', mu=1, tau=1000)
    # Normalise sigma for sample size
    sigma = np.sqrt(np.square(Uniform('sigma', lower=0, upper=1000)) / n)
    # Expected value of outcome
    resp_median = e0[study] + (emax[study] * dose) / (ed50 + dose)
    # Likelihood (sampling distribution) of observations
    resp = Lognormal('resp', mu=resp_median, tau=sigma, observed=mean_response)
    resp_pred = Lognormal('resp_pred', mu=resp_median, tau=sigma, shape=len(dose))
The model runs and fits okay; it's just that the posterior estimates for the model parameters are not near what I am expecting. For example, my emax estimate is around 2, but you can clearly see from the data that it should be around 10. So I can only assume I have made an error in building the model, but I cannot for the life of me see what it is.
Can you help?
Thanks
Mark

After re-examining the model specs I saw that resp_median is actually the log of that expression. So...
resp_median = np.log(e0[study] + (emax[study]*dose)/(ed50+dose))
improves things. However, I am still not sure that I have specified the distributions correctly. Any pointers, anyone?
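For reference, a minimal sketch of the corrected likelihood lines, replacing the corresponding lines in the model above (it uses pm.math.log, Theano's log, which is generally safer than np.log inside a pymc3 model):

import pymc3 as pm

with pkpd_model:
    # Lognormal's mu is the log-scale median, so take the log of the Emax curve
    resp_median = pm.math.log(e0[study] + (emax[study] * dose) / (ed50 + dose))
    resp = Lognormal('resp', mu=resp_median, tau=sigma, observed=mean_response)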

Related

Plot variables out of differential equations system function

I have a system of 4 differential equations in a function (subsystem4) and I solved it with the odeint function. I managed to plot the results of the system. My problem is that I also want to plot some other quantities (e.g. x, y, vcxdot...) that are computed inside the same function (subsystem4), but I get NameError: name 'vcxdot' is not defined. Also, I want to use some of these quantities (not only the results of the equation system) as inputs to a following system of differential equations, and plot all the quantities over the same time span (t). I have done this in Matlab-Simulink, where it was much easier because of Simulink blocks. How can I access and plot all the quantities computed in a function (subsystem4) and use them as input to a following system? I am new to Python and I use Python 2.7.12. Thank you in advance!
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def subsystem4(u, t):
    added_mass_x = 0.03  # kg
    added_mass_y = 0.04
    mb = 0.3  # kg
    m1 = mb - added_mass_x
    m2 = mb - added_mass_y
    l1 = 0.07  # m
    l2 = 0.05  # m
    J = 0.00050797  # kgm^2
    Sa = 0.0110  # m^2
    Cd = 2.44
    Cl = 3.41
    Kd = 0.000655  # kgm^2
    r = 1000  # kg/m^3
    f = 2  # Hz
    c1 = 0.5*r*Sa*Cd
    c2 = 0.5*r*Sa*Cl
    c3 = 0.5*mb*(l1**2)
    c4 = Kd/J
    c5 = (1/(2*J))*(l1**2)*mb*l2
    c6 = (1/(3*J))*(l1**3)*mb
    vcx = u[0]
    vcy = u[1]
    psi = u[2]
    wz = u[3]
    x = 3 + 0.3*np.cos(t)
    y = 0.5 + 0.3*np.sin(t)
    xdot = -0.3*np.sin(t)
    ydot = 0.3*np.cos(t)
    xdotdot = -0.3*np.cos(t)
    ydotdot = -0.3*np.sin(t)
    vcx = xdot*np.cos(psi) - ydot*np.sin(psi)
    vcy = ydot*np.cos(psi) + xdot*np.sin(psi)
    psidot = wz
    vcxdot = xdotdot*np.cos(psi) - xdot*np.sin(psi)*psidot - ydotdot*np.sin(psi) - ydot*np.cos(psi)*psidot
    vcydot = ydotdot*np.cos(psi) - ydot*np.sin(psi)*psidot + xdotdot*np.sin(psi) + xdot*np.cos(psi)*psidot
    g1 = -(m1/c3)*vcxdot + (m2/c3)*vcy*wz - (c1/c3)*vcx*np.sqrt((vcx**2)+(vcy**2)) + (c2/c3)*vcy*np.sqrt((vcx**2)+(vcy**2))*np.arctan2(vcy, vcx)
    g2 = (m2/c3)*vcydot + (m1/c3)*vcx*wz + (c1/c3)*vcy*np.sqrt((vcx**2)+(vcy**2)) + (c2/c3)*vcx*np.sqrt((vcx**2)+(vcy**2))*np.arctan2(vcy, vcx)
    A = 12*np.sin(2*np.pi*f*t + np.pi)
    if A >= 0.1:
        wzdot = ((m1-m2)/J)*vcx*vcy - c4*wz**2*np.sign(wz) - c5*g2 - c6*np.sqrt((g1**2)+(g2**2))
    elif A < -0.1:
        wzdot = ((m1-m2)/J)*vcx*vcy - c4*wz**2*np.sign(wz) - c5*g2 + c6*np.sqrt((g1**2)+(g2**2))
    else:
        wzdot = ((m1-m2)/J)*vcx*vcy - c4*wz**2*np.sign(wz) - c5*g2
    return [vcxdot, vcydot, psidot, wzdot]

u0 = [0, 0, 0, 0]
t = np.linspace(0, 15, 1000)
u = odeint(subsystem4, u0, t)
vcx = u[:, 0]
vcy = u[:, 1]
psi = u[:, 2]
wz = u[:, 3]

plt.figure(1)
plt.subplot(211)
plt.plot(t, vcx, 'r-', linewidth=2, label='vcx')
plt.plot(t, vcy, 'b--', linewidth=2, label='vcy')
plt.plot(t, psi, 'g:', linewidth=2, label='psi')
plt.plot(t, wz, 'c', linewidth=2, label='wz')
plt.xlabel('time')
plt.legend()
plt.show()
To the immediate question of plotting the derivatives, you can get the velocities by directly calling the ODE function again on the solution,
u = odeint(subsystem4,u0,t)
udot = subsystem4(u.T,t)
and get the separate velocity arrays via
vcxdot,vcydot,psidot,wzdot = udot
In this case the function involves branching, which is not very friendly to vectorized calls. There are ways to vectorize branching, but the easiest workaround is to loop manually over the solution points, which is slower than a working vectorized implementation. This will again produce a list of tuples like odeint, so the result has to be transposed into a tuple of lists for "easy" assignment to the single array variables.
udot = [subsystem4(uk, tk) for uk, tk in zip(u, t)]
vcxdot,vcydot,psidot,wzdot = np.asarray(udot).T
This may appear to roughly double the computation, but not really: the solution points are usually interpolated from the internal step points of the solver, so the evaluation of the ODE function during integration usually happens at points different from the solution points.
For the other variables, extract the computation of positions and velocities into functions, so that the constants and the composition live in one place only:
def xy_pos(t): return 3 + 0.3*np.cos(t), 0.5 + 0.3*np.sin(t)
def xy_vel(t): return -0.3*np.sin(t), 0.3*np.cos(t)
def xy_acc(t): return -0.3*np.cos(t), -0.3*np.sin(t)
or similar that you can then use both inside the ODE function and in preparing the plots.
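For instance (a small sketch, assuming the helpers above and the t array from the question), the plotting section could then do:

x, y = xy_pos(t)
xdot, ydot = xy_vel(t)
plt.plot(t, xdot, 'm-', linewidth=2, label='xdot')
plt.plot(t, ydot, 'k--', linewidth=2, label='ydot')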
What Simulink most likely does is collect all the equations of all the blocks into one big ODE system, which is then solved for the whole state at once. You will need to implement something similar: one big state vector, where each subsystem knows its slice of the state and derivatives vectors, reads its specific state variables from there, and writes its derivatives there. The computation of the derivatives can then use values communicated among the subsystems.
What you are trying to do, solving the subsystems separately, only works for, or will effectively degrade to, an order-1 integration method. All higher-order methods need to be able to simultaneously shift the whole state in some direction computed from previous stages of the method, and evaluate the whole system there. A minimal sketch of the coupled-state pattern follows.
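Here, f1 and f2 and their toy dynamics are made-up placeholders, just to show the slicing pattern:

import numpy as np
from scipy.integrate import odeint

def f1(u1, u2, t):
    # toy dynamics for subsystem 1, driven by subsystem 2's state
    return np.array([u1[1], -u1[0] + 0.1*u2[0]])

def f2(u2, u1, t):
    # toy dynamics for subsystem 2, driven by subsystem 1's state
    return np.array([-0.5*u2[0] + 0.1*u1[0]])

def coupled(u, t):
    u1, u2 = u[:2], u[2:]                    # each subsystem knows its slice
    return np.concatenate([f1(u1, u2, t), f2(u2, u1, t)])

t = np.linspace(0, 15, 1000)
u = odeint(coupled, np.array([1.0, 0.0, 0.0]), t)
u1, u2 = u[:, :2], u[:, 2:]                  # unpack per-subsystem results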

Use Chi-Squared statistic in pymc3

I am trying to use PyMC3 to fit a model to some observed data. This model is based on external code (interfaced via theano.ops.as_op), and depends on multiple parameters that should be fit by the MCMC process. Since the gradient of the external code cannot be determined, I use the Metropolis-Hastings sampler.
I have established Uniform priors for my inputs, and generate a model using my custom code. However, I want to compare the simulated data to my observations (a 3D np.ndarray) using the chi-squared statistic (the sum of (data - model)^2 / sigma^2) to obtain a log-likelihood. When the MCMC samples are drawn, this should lead the trace to converge on the best values of each parameter.
My model is explained in the following semi-pseudocode (if that's even a word):
import pymc3 as pm

# Some stuff setting up the data, preparing some functions etc.

# @theano.compile.ops.as_op(itypes=[input types], otypes=[output types])
def make_model(inputs):
    # wrapper around external code that generates the simulated data
    return simulated_data

model = pm.Model()
with model:
    # priors for 13 input parameters
    simData = make_model(inputs)
I now want to obtain the chi-squared log-likelihood for this model versus the data. I think this can be done using pm.ChiSquared; however, I do not see how to combine the data, the model, and this distribution so that the sampler performs correctly. I would guess it might look something like:
chiSq = pm.ChiSquared(nu=data.size, observed=(data - simData)**2 / err**2)
trace = pm.sample(1000)
Is this correct? In running previous tests, I have found the samples appear to be simply drawn from the priors.
Thanks in advance.
Taking aloctavodia's advice, I was able to get parameter estimates for some toy exponential data using a pm.Normal likelihood. Using a pm.ChiSquared likelihood as the OP suggested, the model converged to correct values, but the posteriors on the parameters were roughly three times as broad. Here's the code for the model; I first generated data and then fit with PyMC3.
# Draw `nPoints` observed data points `y_obs` from the function
#     3. + 18. * numpy.exp(-.2 * x)
# with the points evaluated at `x_obs`
#     x_obs = numpy.linspace(0, 100, nPoints)
# Add Normal(mu=0, sd=`cov`) noise to each point in `y_obs`,
# then instantiate the PyMC3 model for the fit:

def YModel(x, c, a, l):
    # exponential model expected to describe the data
    mu = c + a * pm.math.exp(-l * x)
    return mu

def logp(y_mod, y_obs):
    # Normal distribution likelihood
    return pm.Normal.dist(mu=y_mod, sd=cov).logp(y_obs)
    # Chi-squared likelihood (to use, comment the preceding line
    # and uncomment the next two lines)
    #chi2 = pm.math.sum(((y_mod - y_obs) / cov)**2)
    #return pm.ChiSquared.dist(nu=nPoints).logp(chi2)

with pm.Model() as model:
    c = pm.Uniform('constant', lower=0., upper=10., testval=5.)
    a = pm.Uniform('amplitude', lower=0., upper=50., testval=25.)
    l = pm.Uniform('lambda', lower=0., upper=10., testval=5.)
    y_mod = YModel(x_obs, c, a, l)
    L = pm.DensityDist('L', logp, observed={'y_mod': y_mod, 'y_obs': y_obs},
                       testval={'y_mod': y_mod, 'y_obs': y_obs})
    step = pm.Metropolis([c, a, l])
    trace = pm.sample(draws=10000, step=step)
The above model converged, but I found that success was sensitive to the bounds on the priors and the initial guesses on those parameters.
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
c 3.184397 0.111933 0.002563 2.958383 3.397741 1834.0 1.000260
a 18.276887 0.747706 0.019857 16.882025 19.762849 1343.0 1.000411
l 0.200201 0.013486 0.000361 0.174800 0.226480 1282.0 0.999991
(Edited: I had forgotten to sum the squares of the normalized residuals for chi2)

Schrodinger equation not evolving properly with time?

I'm writing code in Python to evolve the time-dependent Schrodinger equation using the Crank-Nicolson scheme. I didn't know how to deal with the potential, so I looked around and found a way in this question, which I have verified from a couple of other sources. According to them, for a harmonic oscillator potential, the C-N scheme gives
A Ψ^(n+1) = A* Ψ^n
where the elements on the main diagonal of A are d_j = 1 + iΔt/(2m(Δx)^2) + iΔt(x_j)^2/4, and the elements on the upper and lower diagonals are a = −iΔt/(4m(Δx)^2).
The way I understand it, I'm supposed to give an initial condition (I've chosen a coherent state) in the form of the vector Ψ^n, and I need to compute the vector Ψ^(n+1), which is the wave function after time Δt. To obtain Ψ^(n+1) for a given step, I invert the matrix A, multiply it by A*, and then multiply the result by Ψ^n. The result then becomes Ψ^n for the next step.
But when I do this, I get an incorrect animation. The wave packet is supposed to oscillate between the boundaries, but in my animation it is barely moving from its initial mean value. I just don't understand what I'm doing wrong. Is my understanding of the problem wrong? Or is it a flaw in my code? Please help! I've posted my code below and the video of my animation here. I'm sorry for the length of the code and the question, but it's driving me crazy not knowing what my mistake is.
import numpy as np
import matplotlib.pyplot as plt

L = 30.0
x0 = -5.0
sig = 0.5
dx = 0.5
dt = 0.02
k = 1.0
w = 2
K = w**2
a = np.power(K, 0.25)
xs = np.arange(-L, L, dx)
nn = len(xs)
mu = k*dt/(dx)**2
dd = 1.0 + mu
ee = 1.0 - mu
ti = 0.0
tf = 100.0
t = ti
V = np.zeros(len(xs))
u = np.zeros(nn, dtype="complex")
V = K*(xs)**2/2  # harmonic oscillator potential
u = (np.sqrt(a)/1.33)*np.exp(-(a*(xs - x0))**2) + 0j  # initial condition for wave function
u[0] = 0.0   # boundary condition
u[-1] = 0.0  # boundary condition

A = np.zeros((nn-2, nn-2), dtype="complex")  # define A
for i in range(nn-3):
    A[i, i] = 1 + 1j*(mu/2 + w*dt*xs[i]**2/4)
    A[i, i+1] = -1j*mu/4.
    A[i+1, i] = -1j*mu/4.
A[nn-3, nn-3] = 1 + 1j*mu/2 + 1j*dt*xs[nn-3]**2/4

B = np.zeros((nn-2, nn-2), dtype="complex")  # define A*
for i in range(nn-3):
    B[i, i] = 1 - 1j*mu/2 - 1j*w*dt*xs[i]**2/4
    B[i, i+1] = 1j*mu/4.
    B[i+1, i] = 1j*mu/4.
B[nn-3, nn-3] = 1 - 1j*(mu/2) - 1j*dt*xs[nn-3]**2/4

X = np.linalg.inv(A)  # take inverse of A
plt.ion()
l, = plt.plot(xs, np.abs(u), lw=2, color='blue')  # plot initial wave function
T = np.matmul(X, B)  # multiply A inverse with A*
while t < tf:
    u[1:-1] = np.matmul(T, u[1:-1])  # update u, leaving the boundary values unchanged
    l.set_ydata(abs(u))              # update plot with new u
    t += dt
    plt.pause(0.00001)
After a lot of tinkering, it came down to reducing my step size: I reduced the step sizes and the program worked. If anyone is facing the same problem, I recommend playing around with the step sizes. Provided the rest of the code is fine, this is the most likely source of error.
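As a side note on the linear algebra (a sketch, not the fix above): rather than explicitly inverting A, you can solve the linear system at each step, which is generally better conditioned numerically. Assuming the A, B, u, t, tf and dt defined in the code above:

while t < tf:
    # Crank-Nicolson step: solve A u_new = B u_old instead of applying inv(A)
    u[1:-1] = np.linalg.solve(A, np.matmul(B, u[1:-1]))
    t += dt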

Trying to fit a sine function to phased light curve

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model,Parameters
f2= "KELT_N16_lc_006261_V01_west_tfa.dat"
t2="TIMES" # file name
NewData2 = np.loadtxt(t2, dtype=float, unpack=True)
NewData = np.loadtxt(f2,dtype=float, unpack=True, usecols=(1,))
flux = NewData
time= NewData2
new_flux=np.hstack([flux,flux])
# fold
period = 2.0232 # period (must be known already!)
foldTimes = ((time)/ period) # divide by period to convert to phase
foldTimes = foldTimes % 1 # take fractional part of phase only (i.e. discard whole number part)
new_phase=np.hstack([foldTimes+1,foldTimes])
print len(new_flux)
print len(new_phase)
def Wave(x, new_flux, new_phase):
    wave = new_flux*np.sin(new_phase + x)
    return wave
model = Model(Wave)
print "Independent Vars:", model.independent_vars
print "Parameters:",model.param_names
p = Parameters()
p.add_many(('new_flux',13.42, True, None, None, None) )
p.add_many(('new_phase',0,True, None, None, None) )
result=model.fit(new_flux,x=new_phase,params=p,weights= None)
plt.scatter(new_phase,new_flux,marker='o',edgecolors='none',color='blue',s=5.0, label="Period: 2.0232 days")
plt.ylim([13.42,13.54])
plt.xlim(0,2)
plt.gca().invert_yaxis()
plt.title('HD 240121 Light Curve with BJD Correction')
plt.ylabel('KELT Instrumental Magnitude')
plt.xlabel('Phase')
legend = plt.legend(loc='lower right', shadow=True)
plt.scatter(new_phase,result.best_fit,label="One Oscillation Fit", color='red',s=60.0)
plt.savefig('NewEpoch.png')
print result.fit_report()
I am trying to fit a sine function to phased light curve data for a research project. However, I am unsure where I am going wrong, and I believe it lies in my parameters. It appears that the fit has an amplitude that is too high and a period that is too long. Any help would be appreciated. Thank you!
This is what the graph looks like now (Attempt at fitting a sine function to my dataset):
A couple of comments/suggestions:
First, it is almost certainly better to replace
p = Parameters()
p.add_many(('new_flux',13.42, True, None, None, None) )
p.add_many(('new_phase',0,True, None, None, None) )
with
p = Parameters()
p.add('new_flux', value=13.42, vary=True)
p.add('new_phase', value=0, vary=True)
Second, your model does not include a DC offset, but your data clearly has one. The offset is approximately 13.4 and the amplitude of the sine wave is approximately 0.05. While you're at it, you probably want to include a scale for the phase as well as an offset, so that the model is
offset + amplitude * sin(scale*x + phase_shift)
You don't necessarily have to vary all of those, but making your model more general will allow you to see how the phase shift and scale are correlated -- given the noise level in your data, that might be important.
With the more general model, you can try a few sets of parameter values, using model.eval() to evaluate the model with a set of Parameters. Once you have a better model and reasonable starting points, you should get a reasonable fit. A sketch of such a model follows.
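For example (a sketch only; the parameter names and starting values are my own guesses, with scale near 2*pi since one folded cycle spans a phase of 1):

import numpy as np
from lmfit import Model

def sine_model(x, offset, amplitude, scale, phase_shift):
    # generalized model: DC offset plus a scaled, shifted sine
    return offset + amplitude*np.sin(scale*x + phase_shift)

model = Model(sine_model)
params = model.make_params(offset=13.48, amplitude=0.05,
                           scale=2*np.pi, phase_shift=0.0)
# trial evaluation without fitting, to sanity-check starting values:
# y_trial = model.eval(params, x=new_phase)
result = model.fit(new_flux, params, x=new_phase)
print(result.fit_report())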
How could we help you with your uncommented code?
How do we know what is what and what it should do?
What method are you using for the fitting?
Where is the data, and in what form?
I would start with computing the approximate sine wave parameters. Let's assume you have input data in the form of n points with x,y coordinates, and want to fit a sine wave:
y(t) = y0+A*sin(x0+x(t)*f)
where y0 is the y offset, x0 is the phase offset, A is the amplitude, and f is the angular frequency.
I would:

1. Compute the average y value

   y0 = sum(data[i].y)/n where i={0,1,2,...n-1}

   This mean value represents the possible y offset y0 of your sine wave.

2. Compute the average distance to y0

   d = sum(|data[i].y-y0|)/n where i={0,1,2,...n-1}

   If my memory serves well, this should be the effective value of the amplitude, so:

   A = sqrt(2)*d

3. Find the zero crossings in the dataset

   For this the dataset should be sorted by x, so sort it if it is not. Remember the index i for the first crossing i0 and the last crossing i1, and the number of crossings found j. From these we can estimate the frequency and phase offset:

   f=M_PI*double(j-1)/(datax[i1]-datax[i0]);
   x0=-datax[i0]*f;

   To determine which half of the sine wave we aligned to, just check the sign of the middle point between the first two zero crossings:

   i1=i0+((i1-i0)/(j-1));
   if (datay[(i0+i1)>>1]<=y0) x0+=M_PI;

   Or check for a specific zero-crossing pattern instead.

That is all; now we have approximate x0,y0,f,A parameters of the sine wave.
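Since the question is in Python, here is a rough, untested numpy transcription of the steps above (a sketch only; the tested version is the C++ below):

import numpy as np

def estimate_sine_params(x, y):
    # rough estimates for y(t) = y0 + A*sin(x0 + f*t); mirrors the C++ below
    order = np.argsort(x)                      # sort by x
    x, y = x[order], y[order]
    y0 = y.mean()                              # y offset
    A = np.sqrt(2.0)*np.abs(y - y0).mean()     # amplitude estimate
    crossings = []                             # rising zero crossings
    i, n = 0, len(x)
    while i < n:
        while i < n and y[i] - y0 > -0.75*A:   # advance to a point well below zero
            i += 1
        e = i
        while i < n and y[i] - y0 < +0.75*A:   # then to a point well above zero
            i += 1
        if i >= n:
            break
        seg = np.arange(e, i)
        crossings.append(seg[np.argmin(np.abs(y[seg] - y0))])
    if len(crossings) < 2:
        return y0, A, None, None               # not enough crossings found
    i0, i1, j = crossings[0], crossings[-1], len(crossings)
    f = 2.0*np.pi*(j - 1)/(x[i1] - x[i0])      # one rising crossing per period
    x0 = -x[i0]*f                              # phase offset
    return y0, A, x0, f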
Here is some C++ code I tested with (sorry, I do not use Python):
//---------------------------------------------------------------------------
#include <math.h>
// input data
const int n=5000;
double datax[n];
double datay[n];
// fitted sin wave
double A,x0,y0,f;
//---------------------------------------------------------------------------
void data_generate() // generate a random noisy sine wave
{
    int i;
    double A=150.0,x0=250.0,y0=200.0,f=0.03,r=20.0;
    double x,y;
    Randomize();
    for (i=0;i<n;i++)
    {
        x=800.0*double(i)/double(n);
        y=y0+A*sin(x0+x*f);
        datax[i]=x+r*Random();
        datay[i]=y+r*Random();
    }
}
//---------------------------------------------------------------------------
void data_fit() // find raw approximation of x0,y0,f,A
{
    int i,j,e,i0,i1;
    double x,y,q0,q1;
    // y0 = avg(y)
    for (y0=0.0,i=0;i<n;i++) y0+=datay[i]; y0/=double(n);
    // A = avg(|y-y0|)
    for (A=0.0,i=0;i<n;i++) A+=fabs(datay[i]-y0); A/=double(n); A*=sqrt(2.0);
    // bubble sort data by x asc
    for (e=1,j=n;e;j--)
        for (e=0,i=1;i<j;i++)
            if (datax[i-1]>datax[i])
            {
                x=datax[i-1]; datax[i-1]=datax[i]; datax[i]=x;
                y=datay[i-1]; datay[i-1]=datay[i]; datay[i]=y;
                e=1;
            }
    // find zero crossings
    for (i=0,j=0;i<n;)
    {
        // find value below zero
        for (;i<n;i++) if (datay[i]-y0<=-0.75*A) break; e=i;
        // find value above zero
        for (;i<n;i++) if (datay[i]-y0>=+0.75*A) break;
        if (i>=n) break;
        // find point closest to zero
        for (i1=e;e<i;e++)
            if (fabs(datay[i1]-y0)>fabs(datay[e]-y0)) i1=e;
        if (!j) i0=i1; j++;
    }
    f=2.0*M_PI*double(j-1)/(datax[i1]-datax[i0]);
    x0=-datax[i0]*f;
}
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
And a preview:
The dots are the generated noisy data and the blue curve is the fitted sine wave.
On top of all this you can build your fitting to increase precision. It does not matter which method you use for the search around the found parameters. For example, I would go for:
How approximation search works

How efficient/intelligent is Theano in computing gradients?

Suppose I have an artificial neural network with 5 hidden layers. For the moment, forget about the details of the neural network model such as biases, the activation functions used, the type of data, and so on. Of course, the activation functions are differentiable.
With symbolic differentiation, the following computes the gradients of the objective function with respect to the layers' weights:
w1_grad = T.grad(lost, [w1])
w2_grad = T.grad(lost, [w2])
w3_grad = T.grad(lost, [w3])
w4_grad = T.grad(lost, [w4])
w5_grad = T.grad(lost, [w5])
w_output_grad = T.grad(lost, [w_output])
This way, to compute the gradients w.r.t. w1, the gradients w.r.t. w2, w3, w4 and w5 must first be computed. Similarly, to compute the gradients w.r.t. w2, the gradients w.r.t. w3, w4 and w5 must be computed first.
However, the following code also computes the gradients w.r.t. each weight matrix:
w1_grad, w2_grad, w3_grad, w4_grad, w5_grad, w_output_grad = T.grad(lost, [w1, w2, w3, w4, w5, w_output])
I was wondering, is there any difference between these two methods in terms of performance? Is Theano intelligent enough to avoid re-computing the gradients using the second method? By intelligent I mean to compute w3_grad, Theano should [preferably] use the pre-computed gradients of w_output_grad, w5_grad and w4_grad instead of computing them again.
Well, it turns out Theano does not reuse previously-computed gradients to compute the gradients in the lower layers of a computational graph. Here's a dummy example of a neural network with 3 hidden layers and an output layer. However, it's not going to be a big deal at all, since computing the gradients is a once-in-a-lifetime operation unless you have to re-derive them on each iteration: Theano returns a symbolic expression for the derivatives as a computational graph, and you can simply compile it into a function and use that from then on to compute numerical values and update the weights.
import time
import numpy as np
import theano
import theano.tensor as T
from theano import shared

class neuralNet(object):
    def __init__(self, examples):
        self.w = shared(np.random.random((16384, 5000)).astype(theano.config.floatX), borrow=True, name='w')
        self.w2 = shared(np.random.random((5000, 3000)).astype(theano.config.floatX), borrow=True, name='w2')
        self.w3 = shared(np.random.random((3000, 512)).astype(theano.config.floatX), borrow=True, name='w3')
        self.w4 = shared(np.random.random((512, 40)).astype(theano.config.floatX), borrow=True, name='w4')
        self.b = shared(np.ones(5000, dtype=theano.config.floatX), borrow=True, name='b')
        self.b2 = shared(np.ones(3000, dtype=theano.config.floatX), borrow=True, name='b2')
        self.b3 = shared(np.ones(512, dtype=theano.config.floatX), borrow=True, name='b3')
        self.b4 = shared(np.ones(40, dtype=theano.config.floatX), borrow=True, name='b4')
        self.x = examples
        L1 = T.nnet.sigmoid(T.dot(self.x, self.w) + self.b)
        L2 = T.nnet.sigmoid(T.dot(L1, self.w2) + self.b2)
        L3 = T.nnet.sigmoid(T.dot(L2, self.w3) + self.b3)
        L4 = T.dot(L3, self.w4) + self.b4
        self.forwardProp = T.nnet.softmax(L4)
        self.predict = T.argmax(self.forwardProp, axis=1)

    def loss(self, y):
        return -T.mean(T.log(self.forwardProp)[T.arange(y.shape[0]), y])

x = T.matrix('x')
y = T.ivector('y')
nnet = neuralNet(x)
loss = nnet.loss(y)

differentiationTime = []
for i in range(100):
    t1 = time.time()
    gw, gw2, gw3, gw4, gb, gb2, gb3, gb4 = T.grad(loss, [nnet.w, nnet.w2, nnet.w3, nnet.w4, nnet.b, nnet.b2, nnet.b3, nnet.b4])
    differentiationTime.append(time.time() - t1)
print 'Efficient Method: Took %f seconds with std %f' % (np.mean(differentiationTime), np.std(differentiationTime))

differentiationTime = []
for i in range(100):
    t1 = time.time()
    gw = T.grad(loss, [nnet.w])
    gw2 = T.grad(loss, [nnet.w2])
    gw3 = T.grad(loss, [nnet.w3])
    gw4 = T.grad(loss, [nnet.w4])
    gb = T.grad(loss, [nnet.b])
    gb2 = T.grad(loss, [nnet.b2])
    gb3 = T.grad(loss, [nnet.b3])
    gb4 = T.grad(loss, [nnet.b4])
    differentiationTime.append(time.time() - t1)
print 'Inefficient Method: Took %f seconds with std %f' % (np.mean(differentiationTime), np.std(differentiationTime))
This will print the following:
Efficient Method: Took 0.061056 seconds with std 0.013217
Inefficient Method: Took 0.305081 seconds with std 0.026024
This shows that Theano uses a dynamic-programming approach to compute gradients for the efficient method.
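For completeness, a minimal sketch (toy shapes, my own variable names) of the derive-once, compile-once, reuse-many-times pattern described above:

import numpy as np
import theano
import theano.tensor as T

# symbolic differentiation and compilation happen once; the compiled
# function is then cheap to call repeatedly with numeric inputs
w = theano.shared(np.random.random((3, 2)).astype(theano.config.floatX), name='w')
x = T.matrix('x')
loss = T.sum(T.nnet.sigmoid(T.dot(x, w)))
grad_w = T.grad(loss, w)               # build the gradient graph (once)
f_grad = theano.function([x], grad_w)  # compile (once)
g = f_grad(np.ones((4, 3), dtype=theano.config.floatX))  # reuse many times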