Why is using the dot product worsening performance in PyMC3?

I am trying to run a simple linear regression using PyMC3. The code below is a snippet:
import numpy as np
from pymc3 import Model, sample, Normal, HalfCauchy
import pymc3 as pm
X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
with Model() as multiple_regression_model:
    beta = Normal('beta', mu=0, sd=1000, shape=2)
    sigma = HalfCauchy('sigma', 1000)
    y_hat = beta[0] + X * beta[1]
    exp = Normal('y', y_hat, sigma=sigma, observed=y)

with multiple_regression_model:
    trace = sample(1000, tune=1000)

trace['beta'].mean(axis=0)
The above code runs in about 6 seconds and gives reasonable estimates for the betas ([-0.19646408, 1.00053091]).
But when I try to use the dot product, things get really bad:
X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
X_aug_np = np.squeeze(np.dstack((np.ones((500, 1)), X)))
with Model() as multiple_regression_model:
    beta = Normal('beta', mu=0, sd=1000, shape=2)
    sigma = HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)
    exp = Normal('y', y_hat, sigma=sigma, observed=y)

with multiple_regression_model:
    trace = sample(1000, tune=1000)

trace['beta'].mean(axis=0)
Now the code finishes in 56 seconds and the estimates are totally off ([249.52363555, -0.0000481]).
I thought using the dot product would make things faster. Why is it behaving this way? Am I doing something wrong here?

This is a subtle shape and broadcasting bug: if you change the shape of beta to (2, 1), then it works.
To see why, I renamed the two models and tidied the code a bit:
import numpy as np
import pymc3 as pm
X = np.arange(500).reshape(500, 1)
y = np.random.normal(0, 5, [500, 1]) + X
X_aug_np = np.squeeze(np.dstack((np.ones((500, 1)), X)))
with pm.Model() as basic_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=2)
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = beta[0] + X * beta[1]
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)

with pm.Model() as matmul_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=(2, 1))
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)
How would you have found that out? Since the models looked the same but were not sampling similarly, I ran
print(matmul_model.check_test_point())
print(basic_model.check_test_point())
which computes the log probability of each variable at a sensible default point. The results did not match up, so I checked exp.tag.test_value.shape and found it was (500, 500) when I expected it to be (500, 1). Shape handling is super hard in probabilistic programming, and this happened because exp broadcasts y_hat, sigma, and y together: with shape=2, pm.math.dot(X_aug_np, beta) has shape (500,), and broadcasting that against the (500, 1) observed y silently produces a (500, 500) array.
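For reference, here is a minimal sketch of that diagnosis (my own reconstruction of the broken shape=2 dot-product model, not code quoted from the original post):

with pm.Model() as broken_model:
    beta = pm.Normal('beta', mu=0, sd=1000, shape=2)      # vector, not a column
    sigma = pm.HalfCauchy('sigma', 1000)
    y_hat = pm.math.dot(X_aug_np, beta)                   # shape (500,)
    exp = pm.Normal('y', y_hat, sigma=sigma, observed=y)  # broadcasts against (500, 1)

print(broken_model.check_test_point())    # logp of 'y' is far off what basic_model reports
print(exp.tag.test_value.shape)           # (500, 500) instead of (500, 1)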
As an added problem, I could not get matmul_model to sample on my machine without setting cores=1, chains=4.
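In case it helps, that corresponds to calling the sampler as:

with matmul_model:
    trace = pm.sample(1000, tune=1000, cores=1, chains=4)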

Related

Computing gradients for outputs taken from intermediate layers and updating weights using optimizer

I am trying to implement the architecture below and am not sure I am applying gradient tape properly.
In the architecture we can see that outputs are taken from multiple layers, shown in the blue boxes. Each blue box is termed a loss branch in the paper, and each contains two losses, namely cross entropy and L2 loss. I wrote the architecture in TensorFlow 2 and am using gradient tape for custom training. One thing I am not sure about is how I should update the losses using gradient tape.
I have two queries:
How am I supposed to use gradient tape for multiple losses in this scenario? I am interested in seeing code!
For instance, consider the 3rd blue box (3rd loss branch) in the image, where we take inputs from the conv 13 layer and get two outputs, one for classification and the other for regression.
After computing the losses, how am I supposed to update the weights: should I update all the layers above (from conv 1 to conv 13), or only the layers that fed conv 13 (conv 11, 12 and 13)?
I am also attaching a link to a more detailed question I posted yesterday.
Below is the snippet I have tried for gradient descent. Please correct me if I am wrong.
images = batch.data[0]
images = (images - 127.5) / 127.5
targets = batch.label
with tensorflow.GradientTape() as tape:
    outputs = self.net(images)
    loss = self.loss_criterion(outputs, targets)
    self.scheduler(i, self.optimizer)
grads = tape.gradient(loss, self.net.trainable_variables)
self.optimizer.apply_gradients(zip(grads, self.net.trainable_variables))
Below is the code for custom loss function which is used as loss_criterion above.
losses = []
for i in range(self.num_output_scales):
    pred_score = outputs[i * 2]
    pred_bbox = outputs[i * 2 + 1]
    gt_mask = targets[i * 2]
    gt_label = targets[i * 2 + 1]
    pred_score_softmax = tensorflow.nn.softmax(pred_score, axis=1)
    loss_mask = tensorflow.ones(pred_score_softmax.shape, tensorflow.float32)

    if self.hnm_ratio > 0:
        pos_flag = (gt_label[:, 0, :, :] > 0.5)
        pos_num = tensorflow.math.reduce_sum(tensorflow.cast(pos_flag, dtype=tensorflow.float32))

        if pos_num > 0:
            neg_flag = (gt_label[:, 1, :, :] > 0.5)
            neg_num = tensorflow.math.reduce_sum(tensorflow.cast(neg_flag, dtype=tensorflow.float32))
            neg_num_selected = min(int(self.hnm_ratio * pos_num), int(neg_num))
            neg_prob = tensorflow.where(neg_flag, pred_score_softmax[:, 1, :, :],
                                        tensorflow.zeros_like(pred_score_softmax[:, 1, :, :]))
            neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
            prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
            neg_grad_flag = (neg_prob <= prob_threshold)
            loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                                           tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)
        else:
            neg_choice_ratio = 0.1
            neg_num_selected = int(tensorflow.cast(tensorflow.size(pred_score_softmax[:, 1, :, :]),
                                                   dtype=tensorflow.float32) * neg_choice_ratio)
            neg_prob = pred_score_softmax[:, 1, :, :]
            neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
            prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
            neg_grad_flag = (neg_prob <= prob_threshold)
            loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                                           tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)

    pred_score_softmax_masked = tensorflow.where(loss_mask, pred_score_softmax,
                                                 tensorflow.zeros_like(pred_score_softmax, dtype=tensorflow.float32))
    pred_score_log = tensorflow.math.log(pred_score_softmax_masked)
    score_cross_entropy = - tensorflow.where(loss_mask, gt_label[:, :2, :, :],
                                             tensorflow.zeros_like(gt_label[:, :2, :, :], dtype=tensorflow.float32)) * pred_score_log
    loss_score = (tensorflow.math.reduce_sum(score_cross_entropy) /
                  tensorflow.cast(tensorflow.size(score_cross_entropy), tensorflow.float32))

    mask_bbox = gt_mask[:, 2:6, :, :]
    predict_bbox = pred_bbox * mask_bbox
    label_bbox = gt_label[:, 2:6, :, :] * mask_bbox
    # l2 loss of boxes
    # loss_bbox = tensorflow.math.reduce_sum(tensorflow.nn.l2_loss((label_bbox - predict_bbox)) ** 2) / 2
    loss_bbox = mse(label_bbox, predict_bbox) / tensorflow.math.reduce_sum(mask_bbox)

    # Adding only the losses relevant to a branch and sending them for backprop
    losses.append(loss_score + loss_bbox)
    # losses.append(loss_bbox)

    # Adding all losses and sending to backprop: Approach 1
    # loss_cls += loss_score
    # loss_reg += loss_bbox
    # loss_branch.append(loss_score)
    # loss_branch.append(loss_bbox)
    # loss = loss_cls + loss_reg

return losses
I am not getting any error, but my losses aren't minimizing. Here is the log for my training.
Could someone please help me fix this?
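For reference, a common pattern for this situation (a hedged sketch reusing the names from the snippets above, not the poster's verified code) is to sum the per-branch losses inside the tape and differentiate once. Automatic differentiation then sends each branch's gradient back through every layer that contributed to it, so for the 3rd branch conv 1 through conv 13 all receive updates:

with tensorflow.GradientTape() as tape:
    outputs = self.net(images)
    branch_losses = self.loss_criterion(outputs, targets)  # list of per-branch losses
    total_loss = tensorflow.add_n(branch_losses)           # sum over all loss branches
# one gradient call for the combined loss; gradients flow to every
# variable that influenced any branch
grads = tape.gradient(total_loss, self.net.trainable_variables)
self.optimizer.apply_gradients(zip(grads, self.net.trainable_variables))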

Bayesian Inference with PyMC3. Compilation error.

The following two codes do a simple Bayesian inference in Python using PyMC3. While the first code, for an exponential model, compiles and runs perfectly fine, the second one, for a simple ODE model, gives an error. I do not understand why one works and the other does not. Please help.
Code #1
import numpy as np
import pymc3 as pm
def f(a, b, x, c):
    return a * np.exp(b*x) + c

# Generating data with error
a, b = 5, 0.2
xdata = np.linspace(0, 10, 21)
ydata = f(a, b, xdata, 0.5)
yerror = 5 * np.random.rand(len(xdata))
ydata += np.random.normal(0.0, np.sqrt(yerror))

model = pm.Model()
with model:
    alpha = pm.Uniform('alpha', lower=a/2, upper=2*a)
    beta = pm.Uniform('beta', lower=b/2, upper=2*b)
    mu = f(alpha, beta, xdata, 0.5)
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=yerror, observed=ydata)
    trace = pm.sample(100, tune=50, nchains=1)
Code #2
import numpy as np
import pymc3 as pm
def solver(I, a, T, dt):
    """Solve u'=-a*u, u(0)=I, for t in (0,T] with steps of dt."""
    dt = float(dt)              # avoid integer division
    N = int(round(T/dt))        # number of time intervals
    print(N)
    T = N*dt                    # adjust T to fit time step dt
    u = np.zeros(N+1)           # array of u[n] values
    t = np.linspace(0, T, N+1)  # time mesh
    u[0] = I                    # assign initial condition
    for n in range(0, N):       # n = 0, 1, ..., N-1
        u[n+1] = (1 - a*dt)*u[n]
    return np.ravel(u)

# Generating data
ydata = solver(1, 1.7, 10, 0.1)
yerror = 5 * np.random.rand(101)
ydata += np.random.normal(0.0, np.sqrt(yerror))

model = pm.Model()
with model:
    alpha = pm.Uniform('alpha', lower=1.0, upper=2.5)
    mu = solver(1, alpha, 10, 0.1)
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=yerror, observed=ydata)
    trace = pm.sample(100, nchains=1)
The error is
Traceback (most recent call last):
  File "1.py", line 27, in <module>
    mu = solver(1,alpha,10,0.1)
  File "1.py", line 16, in solver
    u[n+1] = (1 - a*dt)*u[n]
ValueError: setting an array element with a sequence.
Please help.
The error is in this line:
mu = solver(1, alpha, 10, 0.1)
You are trying to pass alpha as a value, but alpha is a PyMC3 random variable (a symbolic Theano tensor), not a number. solver only works when its second argument is a plain number: the assignment u[n+1] = (1 - a*dt)*u[n] tries to store a symbolic expression into a NumPy float array, which raises the ValueError.
Code #1 works because this function
def f(a, b, x, c):
    return a * np.exp(b*x) + c
only uses operations (multiplication, np.exp, addition) that are also defined for symbolic tensors, so it returns an expression PyMC3 can use as mu.
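One way around this (a sketch of my own, not part of the original answer) is to note that the recurrence u[n+1] = (1 - a*dt)*u[n] has the closed form u[n] = I*(1 - a*dt)**n, which can be written with tensor-friendly operations so that alpha stays symbolic (reusing ydata and yerror generated above):

import numpy as np
import pymc3 as pm

dt, T, I = 0.1, 10, 1
n = np.arange(int(round(T/dt)) + 1)   # time-step indices 0..N

model = pm.Model()
with model:
    alpha = pm.Uniform('alpha', lower=1.0, upper=2.5)
    mu = I * (1 - alpha*dt)**n        # closed form of the recurrence, stays symbolic
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=yerror, observed=ydata)
    trace = pm.sample(100, nchains=1)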

Trying to divide by solution of odeint

I am using odeint in Python to solve something (the Friedmann equation for a matter-only universe) and it gives me the values of a that I want. However, how do I get it to return/plot (da/dt)/a? I.e., how can I divide the values of the derivative function by the corresponding values of the solution?
This is my attempted code (ignore the earlier bits, i.e. the figure 1 plot; it's the part with H I'm concerned about):
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
from scipy.integrate import odeint
t_0 = 0.0004
a_0 = 0.001
omega_m = 1.0 #for EdS
H_0 = 1./13.7
#the function for EdS universe
def Friedmann(a, t):
    dadt = H_0 * (omega_m)**(1./2.) * a**(-1./2.)
    return dadt
t = np.linspace(t_0,13.7,101)
a = odeint(Friedmann, a_0, t)
a = np.array(a).flatten()
plt.figure(1)
plt.subplot(211)
plt.plot(t, a)
plt.title("Einstein-de Sitter Universe")
plt.xlabel("t")
plt.ylabel("a")
#comparing to analytic solution
an = (((3. / 2.) * (H_0 * omega_m**(1./2.)) * (t - t_0)) + a_0**(3. / 2.))**(2. / 3.)
an = np.array(an).flatten()
plt.figure(1)
plt.subplot(212)
plt.plot(t, a, t, an, "+")
H = [x/y for x, y in zip(Friedmann(a, t), a)]
plt.figure(2)
plt.plot(t, H)
plt.show()
Any help is much appreciated.
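As an aside (a suggestion of my own, since no answer is quoted here): Friedmann(a, t) already returns a NumPy array when a is an array, so the list comprehension can be replaced by elementwise division:

H = Friedmann(a, t) / a   # (da/dt)/a at each time step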

2 layer NN weights not updating

I have a fairly simple NN that has 1 hidden layer.
However, the weights don't seem to be updating. Or perhaps they are, but the variable values don't change?
Either way, my accuracy is 0.1 and it doesn't change no matter how I change the learning rate or the activation function. Not sure what is wrong. Any ideas?
I've posted the entire code, correctly formatted, so you can copy, paste, and run it directly on your local machine.
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [784, None],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([25, 784],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([10, 25],stddev=1.0/math.sqrt(25)),name='W')
# bias units
b1 = tf.Variable(tf.zeros([25,1]),name='b1')
b2 = tf.Variable(tf.zeros([25,1]),name='b2')
b3 = tf.Variable(tf.zeros([10,1]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(W1, x,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(W3, hidden1,name='y') + b3
y_ = tf.placeholder(tf.float32, [10, None],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(2).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)
init = tf.global_variables_initializer()
sess.run(init)
# Train
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary = sess.run(train_step, feed_dict={x: np.transpose(batch_xs), y_: np.transpose(batch_ys)})
    if summary is not None:
        summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: np.transpose(mnist.test.images), y_: np.transpose(mnist.test.labels)}))
The reason you are getting 0.1 accuracy consistently is mainly the order of the dimensions of the input placeholder and of the weights following it. The learning rate is another factor: if it is very high, gradient descent oscillates and never reaches a minimum.
TensorFlow takes the number of instances (the batch size) as the first index of a placeholder. So the code which declares the input x
x = tf.placeholder(tf.float32, [784, None],name='x')
should be declared as
x = tf.placeholder(tf.float32, [None, 784],name='x')
Consequently, W1 should be declared as
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
and so on. Even the bias variables should be declared in the transposed sense, as one-dimensional vectors (that's how TensorFlow takes it :) ).
For example
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
I'm putting the corrected full code below for your reference. I achieved an accuracy of 0.9262 with this :D
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [None, 784],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([25, 10],stddev=1.0/math.sqrt(25)),name='W')
# bias units
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(x, W1,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(hidden1, W3,name='y') + b3
y_ = tf.placeholder(tf.float32, [None, 10],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)
init = tf.initialize_all_variables()
sess.run(init)
# Train
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary = sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if summary is not None:
        summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

generate N random numbers from a skew normal distribution using numpy

I need a function in python to return N random numbers from a skew normal distribution. The skew needs to be taken as a parameter.
e.g. my current use is
x = numpy.random.randn(1000)
and the ideal function would be e.g.
x = randn_skew(1000, skew=0.7)
Solution needs to conform with: python version 2.7, numpy v.1.9
A similar answer is here: skew normal distribution in scipy. However, that answer generates a PDF, not random numbers.
I start by generating the PDF curves for reference:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

NUM_SAMPLES = 100000
SKEW_PARAMS = [-3, 0]

def skew_norm_pdf(x, e=0, w=1, a=0):
    # adapted from:
    # http://stackoverflow.com/questions/5884768/skew-normal-distribution-in-scipy
    t = (x-e) / w
    # skew normal density: (2/w) * phi(t) * Phi(a*t)
    return 2.0 * stats.norm.pdf(t) * stats.norm.cdf(a*t) / w

# generate the skew normal PDF for reference:
location = 0.0
scale = 1.0
x = np.linspace(-5, 5, 100)
plt.subplots(figsize=(12, 4))
for alpha_skew in SKEW_PARAMS:
    p = skew_norm_pdf(x, location, scale, alpha_skew)
    # n.b. alpha is a parameter that controls skew, but the 'skewness'
    # as measured will be different. see the wikipedia page:
    # https://en.wikipedia.org/wiki/Skew_normal_distribution
    plt.plot(x, p)
Next I found a VB implementation of sampling random numbers from the skew normal distribution and converted it to python:
# literal adaptation from:
# http://stackoverflow.com/questions/4643285/how-to-generate-random-numbers-that-follow-skew-normal-distribution-in-matlab
# original at:
# http://www.ozgrid.com/forum/showthread.php?t=108175
def rand_skew_norm(fAlpha, fLocation, fScale):
    sigma = fAlpha / np.sqrt(1.0 + fAlpha**2)
    afRN = np.random.randn(2)
    u0 = afRN[0]
    v = afRN[1]
    u1 = sigma*u0 + np.sqrt(1.0 - sigma**2) * v
    if u0 >= 0:
        return u1*fScale + fLocation
    return (-u1)*fScale + fLocation

def randn_skew(N, skew=0.0):
    return [rand_skew_norm(skew, 0, 1) for x in range(N)]

# let's check they at least visually match the PDF:
plt.subplots(figsize=(12, 4))
for alpha_skew in SKEW_PARAMS:
    p = randn_skew(NUM_SAMPLES, alpha_skew)
    sns.distplot(p)
And then wrote a quick version which (without extensive testing) appears to be correct:
def randn_skew_fast(N, alpha=0.0, loc=0.0, scale=1.0):
    sigma = alpha / np.sqrt(1.0 + alpha**2)
    u0 = np.random.randn(N)
    v = np.random.randn(N)
    u1 = (sigma*u0 + np.sqrt(1.0 - sigma**2)*v) * scale
    u1[u0 < 0] *= -1
    u1 = u1 + loc
    return u1

# let's check again
plt.subplots(figsize=(12, 4))
for alpha_skew in SKEW_PARAMS:
    p = randn_skew_fast(NUM_SAMPLES, alpha_skew)
    sns.distplot(p)
from scipy.stats import skewnorm

a = 10
data = skewnorm.rvs(a, size=1000)

Here, a is the skewness parameter; for details see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skewnorm.html
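To match the x = randn_skew(1000, skew=0.7) call from the question, a thin wrapper (my own naming, assuming scipy is available) could look like:

from scipy.stats import skewnorm

def randn_skew(N, skew=0.0, loc=0.0, scale=1.0):
    # delegate to scipy's skew normal sampler; skew plays the role of a
    return skewnorm.rvs(skew, loc=loc, scale=scale, size=N)

x = randn_skew(1000, skew=0.7)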
Adapted from the rsnorm function in the fGarch R package:

import numpy

def random_snorm(n, mean=0, sd=1, xi=1.5):
    def random_snorm_aux(n, xi):
        weight = xi/(xi + 1/xi)
        z = numpy.random.uniform(-weight, 1-weight, n)
        xi_ = xi**numpy.sign(z)
        random = -numpy.absolute(numpy.random.normal(0, 1, n))/xi_ * numpy.sign(z)
        m1 = 2/numpy.sqrt(2 * numpy.pi)
        mu = m1 * (xi - 1/xi)
        sigma = numpy.sqrt((1 - m1**2) * (xi**2 + 1/xi**2) + 2 * m1**2 - 1)
        return (random - mu)/sigma
    return random_snorm_aux(n, xi) * sd + mean