PyMC3: Defining centered covariates in the model specification - pymc3

Good morning,
I'm in the process of learning PyMC3, and to start, I generated some synthetic data to use in estimating a Poisson regression. One thing I realized quickly is that I needed to make my model covariates zero mean and unit variance prior to fitting the model to avoid numerical issues. I performed the normalization outside of the PyMC3 model specification, which leaves me wondering whether there's some way to integrate it. If the normalization is not integrated, I have to estimate the means and standard deviations of the covariates from the training data and make sure I apply that same normalization to future test data. I'd love it if that could be part of the model itself so that I don't always have to apply these external operations as well. Is that possible? If so, I'd love any pointers to examples of how to do it.
Thanks!
Chris
Here's an example I've been experimenting with that helps clarify my question. I'm wondering how I might move the normalization of the covariates into the PyMC3 model specification at the end of the code block.
import numpy as np
import pandas as pd
import patsy as pt
import pymc3 as pm
from theano import shared

# Generate synthetic data
N = 10000
tree_height = []
wind_speed = []
event_count = []
for i in range(N):
    tree_height.append(np.random.rayleigh(scale=10))
    wind_speed.append(np.random.rayleigh(scale=5))
    event_count.append(np.random.poisson(tree_height[-1] + wind_speed[-1] + 0.1*tree_height[-1]*wind_speed[-1]))
# Normalize the synthetic covariates to be zero mean, unit variance
mn_tree_height = np.mean(tree_height)
std_tree_height = np.std(tree_height)
mn_wind_speed = np.mean(wind_speed)
std_wind_speed = np.std(wind_speed)
tree_height = (np.asarray(tree_height) - mn_tree_height) / std_tree_height
wind_speed = (np.asarray(wind_speed) - mn_wind_speed) / std_wind_speed
# Build the data frame
df = pd.DataFrame.from_dict({'tree_height': tree_height, 'wind_speed': wind_speed, 'event_count': event_count})
# Patsy model specification
fml = 'event_count ~ tree_height * wind_speed'
# Design matrices
(outcome, covars) = pt.dmatrices(fml, df, return_type='dataframe', NA_action='raise')
# Theano shared variables for mini-batch training and testing
wind_speed = shared(covars.wind_speed.values)
tree_height = shared(covars.tree_height.values)
ws_th = shared(covars['tree_height:wind_speed'].values)
# PyMC3 model specification
with pm.Model() as m:
    b0 = pm.Normal('b0_intercept', mu=0, sigma=10)
    b1 = pm.Normal('b1_wind_speed', mu=0, sigma=10)
    b2 = pm.Normal('b2_tree_height', mu=0, sigma=10)
    b3 = pm.Normal('b3_tree_height:wind_speed', mu=0, sigma=10)
    theta = (b0 +
             b1 * wind_speed +
             b2 * tree_height +
             b3 * ws_th)
    # pm.math.exp keeps the expression a Theano tensor
    y = pm.Poisson('outcome', mu=pm.math.exp(theta), observed=outcome['event_count'].values)
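One hedged way to avoid re-applying the normalization at prediction time is to keep fitting on standardized covariates and then map the coefficients back to the raw scale, so predictions on raw test data need no external step. Expanding b0 + b1*z_ws + b2*z_th + b3*z_ws*z_th with z = (raw - mean) / std gives an equivalent linear predictor in the raw covariates. The sketch below is NumPy only; `standardized_to_raw` is a hypothetical helper, and the coefficient values are made-up stand-ins for posterior draws (in PyMC3 you would pass arrays such as trace['b0_intercept']; the function works elementwise per draw). Another route, not shown here, is to wrap the raw covariates in pm.Data, standardize them inside the model using the training mean and std as fixed constants, and swap in test data with pm.set_data.

```python
import numpy as np

def standardized_to_raw(b0, b1, b2, b3,
                        mn_wind_speed, std_wind_speed,
                        mn_tree_height, std_tree_height):
    """Re-express b0 + b1*z_ws + b2*z_th + b3*z_ws*z_th on the raw covariates.

    z_ws = (wind_speed - mn_wind_speed) / std_wind_speed, and likewise z_th.
    Works elementwise, so the b's may be arrays of posterior draws.
    """
    s = std_wind_speed * std_tree_height
    c3 = b3 / s                                         # raw wind_speed*tree_height
    c1 = b1 / std_wind_speed - b3 * mn_tree_height / s  # raw wind_speed
    c2 = b2 / std_tree_height - b3 * mn_wind_speed / s  # raw tree_height
    c0 = (b0 - b1 * mn_wind_speed / std_wind_speed
          - b2 * mn_tree_height / std_tree_height
          + b3 * mn_wind_speed * mn_tree_height / s)    # intercept
    return c0, c1, c2, c3

# Synthetic covariates drawn like those in the question
rng = np.random.default_rng(0)
ws = rng.rayleigh(scale=5, size=1000)
th = rng.rayleigh(scale=10, size=1000)
mn_ws, sd_ws = ws.mean(), ws.std()
mn_th, sd_th = th.mean(), th.std()

# Hypothetical coefficient draws (4 "posterior samples" per coefficient)
b0 = rng.normal(1.0, 0.1, size=4)
b1 = rng.normal(0.3, 0.1, size=4)
b2 = rng.normal(0.4, 0.1, size=4)
b3 = rng.normal(0.1, 0.1, size=4)

# Linear predictor (log of the Poisson mean) on the standardized scale
z_ws = (ws - mn_ws) / sd_ws
z_th = (th - mn_th) / sd_th
eta_std = (b0[:, None] + b1[:, None] * z_ws + b2[:, None] * z_th
           + b3[:, None] * z_ws * z_th)

# Same predictor computed directly from the raw covariates
c0, c1, c2, c3 = standardized_to_raw(b0, b1, b2, b3, mn_ws, sd_ws, mn_th, sd_th)
eta_raw = c0[:, None] + c1[:, None] * ws + c2[:, None] * th + c3[:, None] * ws * th
```

Because the mapping is exact for every draw, the transformed draws form the posterior of the raw-scale coefficients, and only the training means and standard deviations need to be stored alongside them.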


How do I fit a pymc3 model when each person has multiple data points?

I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points, and each person comes from a different group (so trying a hierarchical model).
Here's the practice scenario I'm using: Suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10 question survey, where each person can respond ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.
To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model), and the probability of responding ("1") is given by: (1 + exp(-a(theta - b)))^(-1). (That is, the logistic function applied to a(theta - b).)
Here is how I tried to fit it using pymc3:
traces = {}
for grp in range(2):
    group = prac_data["Group ID"] == grp
    data = prac_data[group]["Response"]
    with pm.Model() as irt:
        # Priors
        a_tmp = pm.Normal('a_tmp', mu = 0, sd = 1, shape = 10)
        a = pm.Deterministic('a', pm.math.exp(a_tmp))
        # We do this transformation since we must have a >= 0
        b = pm.Normal('b', mu = 0, sd = 1, shape = 10)
        # Now for the hyperpriors on the groups:
        theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1)
        theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0)
        theta = pm.Normal('theta', mu = theta_mu,
                          sd = theta_sigma, shape = N)
        # a = discrimination, b = difficulty
        p = getProbs(a, b, theta, N)
        y = pm.Bernoulli('y', p = p, observed = data)
        traces[grp] = pm.sample(1000)
The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, since the probability of responding 1 changes across trials/survey questions for each person. But this gives me an error saying to "specify one of p or logit_p", even though I thought I did that with the function.
Here's the code for "getProbs" in case it's helpful:
def getProbs(Disc, Diff, THETA, Nprt):
    # Get a large array of probabilities for the bernoulli random variable
    n = len(Disc)
    m = Nprt
    probs = np.array([])
    for th in range(m):
        for t in range(n):
            p = item(Disc[t], Diff[t], THETA[th])
            probs = np.append(probs, p)
    return probs
I added the Nprt parameter because if I tried to get the length of THETA, it would give me an error since it is a FreeRV object. I know I can try and vectorize the "item" function, which is just the logistic function I put above, instead of doing it this way, but that also got me an error when I tried to run it.
I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.
Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.
Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.
EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:
p = theano.tensor.stack(p, axis = 0)
I then put this new variable in the "logit_p" argument of the Bernoulli instance and it worked! Here's the updated full model (below, I imported theano.tensor as T):
import numpy as np
import pymc3 as pm
import theano.tensor as T
import arviz as az

group = group.astype('int')
data = prac_data["Response"]
with pm.Model() as irt:
    # Priors
    # Item parameters:
    a = pm.Gamma('a', alpha = 1, beta = 1, shape = 10) # Discrimination
    b = pm.Normal('b', mu = 0, sd = 1, shape = 10) # Difficulty
    # Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
    theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1, shape = 2)
    theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0, shape = 2)
    # Individual-level person parameters:
    # group is a 2*N array that lets the model know which
    # theta_mu to use for each theta to estimate
    theta = pm.Normal('theta', mu = theta_mu[group],
                      sd = theta_sigma[group], shape = 2*N)
    # Here, we're building an array of the logits we need for
    # each trial: logit P(response = 1) = a*(theta - b)
    p = np.array([])
    for n in range(2*N):
        for t in range(10):
            x = a[t]*(theta[n] - b[t])
            p = np.append(p, x)
    # Here, we turn p into a tensor object to put as an argument to the
    # Bernoulli random variable
    p = T.stack(p, axis = 0)
    y = pm.Bernoulli('y', logit_p = p, observed = data)
    # On my computer, this took about 5 minutes to run.
    traces = pm.sample(1000, cores = 1)
    print(az.summary(traces)) # Summary of parameter distributions
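The nested loop above creates 2*N*10 separate scalar nodes in the graph, which is likely part of why sampling is slow. The same logits can be built in one expression, either by broadcasting or by indexing with per-observation person and item indices (the closest analogue of a JAGS loop over data rows). A minimal NumPy sketch, assuming the responses are ordered person by person with each person's 10 items consecutive (the order the loop implies); `person_idx` and `item_idx` are hypothetical names. The same expressions should carry over to Theano tensors in the model (using .flatten() in place of .ravel()), but treat that as an untested assumption.

```python
import numpy as np

def logits_loop(a, b, theta):
    # Mirrors the nested loop in the model: person-major, item-minor order
    out = []
    for n in range(len(theta)):
        for t in range(len(a)):
            out.append(a[t] * (theta[n] - b[t]))
    return np.array(out)

def logits_broadcast(a, b, theta):
    # theta[:, None] has shape (persons, 1); a and b have shape (items,).
    # Broadcasting gives a (persons, items) matrix; ravel() flattens it row
    # by row, i.e. in the same person-major order as the loop.
    return (a * (theta[:, None] - b)).ravel()

def logits_indexed(a, b, theta, person_idx, item_idx):
    # "Long format": each observation carries the index of its person and item
    return a[item_idx] * (theta[person_idx] - b[item_idx])

rng = np.random.default_rng(1)
n_persons, n_items = 60, 10
a = rng.gamma(1.0, 1.0, size=n_items)                  # discrimination
b = rng.normal(size=n_items)                           # difficulty
theta = rng.normal(size=n_persons)                     # person traits
person_idx = np.repeat(np.arange(n_persons), n_items)  # 0,0,...,0,1,1,...
item_idx = np.tile(np.arange(n_items), n_persons)      # 0,1,...,9,0,1,...

ref = logits_loop(a, b, theta)
p = 1.0 / (1.0 + np.exp(-ref))   # (1 + exp(-a(theta - b)))^(-1) from the question
```

The indexed form is the most flexible of the three: it also works when people answer different numbers of items, because it never assumes a rectangular persons-by-items layout.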

Computing gradients for outputs taken from intermediate layers and updating weights using optimizer

I am trying to implement the architecture below and am not sure I am applying gradient tape properly.
In the architecture above, outputs are taken from multiple layers (the blue boxes). Each blue box is termed a loss branch in the paper, and each contains two losses, namely cross entropy and L2 loss. I wrote the architecture in TensorFlow 2 and am using gradient tape for custom training. One thing I am not sure about is how I should update the weights from these losses using gradient tape.
I have two queries:
How am I supposed to use gradient tape for multiple losses in this scenario? I am interested in seeing code!
For instance, consider the 3rd blue box (3rd loss branch) in the above image, where we take inputs from the conv 13 layer and get two outputs, one for classification and the other for regression.
After computing the losses, how am I supposed to update the weights? Should I update all the layers above it (conv 1 to conv 13), or only the weights of the layers that produced conv 13 (conv 11, 12 and 13)?
I am also attaching a link to a more detailed question I posted yesterday.
Below is the snippet which I have tried for gradient descent. Please correct me if I am wrong.
images = batch.data[0]
images = (images - 127.5) / 127.5
targets = batch.label
with tensorflow.GradientTape() as tape:
    outputs = self.net(images)
    loss = self.loss_criterion(outputs, targets)
self.scheduler(i, self.optimizer)
grads = tape.gradient(loss, self.net.trainable_variables)
self.optimizer.apply_gradients(zip(grads, self.net.trainable_variables))
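On the two queries: in TensorFlow 2, tape.gradient(target, sources) accepts a list of targets and returns the gradient of their sum, so passing the list returned by loss_criterion is equivalent to summing the per-branch losses. As for which layers get updated, a loss attached after conv 13 produces gradients for every trainable layer on the path from the input to conv 13 (conv 1 to conv 13, not only conv 11 to 13) and none for layers after the branch point; tape.gradient returns None for variables that no target depends on. The NumPy sketch below illustrates this under a simplifying assumption: two linear layers stand in for the convolutional stack, and the hand-derived gradients are checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input
W1 = rng.normal(size=(4, 3))    # layers up to the branch point (think conv 1..13)
W2 = rng.normal(size=(2, 4))    # layers after the branch point
t1 = rng.normal(size=4)         # target for the intermediate loss branch
t2 = rng.normal(size=2)         # target for the final head

def losses(W1, W2):
    h1 = W1 @ x                 # output at the branch point
    h2 = W2 @ h1                # output of the final head
    branch = 0.5 * np.sum((h1 - t1) ** 2)
    head = 0.5 * np.sum((h2 - t2) ** 2)
    return branch, head

def grads(W1, W2):
    h1 = W1 @ x
    h2 = W2 @ h1
    d2 = h2 - t2
    g_branch = (np.outer(h1 - t1, x), np.zeros_like(W2))  # branch loss never touches W2
    g_head = (np.outer(W2.T @ d2, x), np.outer(d2, h1))   # head loss reaches W1 and W2
    return g_branch, g_head

def numerical_grad_W1(W1, W2, eps=1e-6):
    # Central differences of branch + head, i.e. what a summed loss gives
    g = np.zeros_like(W1)
    for i in range(W1.shape[0]):
        for j in range(W1.shape[1]):
            Wp, Wm = W1.copy(), W1.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (sum(losses(Wp, W2)) - sum(losses(Wm, W2))) / (2 * eps)
    return g

g_branch, g_head = grads(W1, W2)
total_W1 = g_branch[0] + g_head[0]   # gradient of the summed loss w.r.t. W1
```

The early weights W1 receive contributions from both losses, while W2 only hears from the head; that is the same split you would see for conv 1 to conv 13 versus the layers after the 3rd branch.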
Below is the code for custom loss function which is used as loss_criterion above.
losses = []
for i in range(self.num_output_scales):
    pred_score = outputs[i * 2]
    pred_bbox = outputs[i * 2 + 1]
    gt_mask = targets[i * 2]
    gt_label = targets[i * 2 + 1]
    pred_score_softmax = tensorflow.nn.softmax(pred_score, axis=1)
    loss_mask = tensorflow.ones(pred_score_softmax.shape, tensorflow.float32)
    if self.hnm_ratio > 0:
        pos_flag = (gt_label[:, 0, :, :] > 0.5)
        pos_num = tensorflow.math.reduce_sum(tensorflow.cast(pos_flag, dtype=tensorflow.float32))
        if pos_num > 0:
            neg_flag = (gt_label[:, 1, :, :] > 0.5)
            neg_num = tensorflow.math.reduce_sum(tensorflow.cast(neg_flag, dtype=tensorflow.float32))
            neg_num_selected = min(int(self.hnm_ratio * pos_num), int(neg_num))
            neg_prob = tensorflow.where(neg_flag, pred_score_softmax[:, 1, :, :],
                                        tensorflow.zeros_like(pred_score_softmax[:, 1, :, :]))
            neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
            prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
            neg_grad_flag = (neg_prob <= prob_threshold)
            loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                                           tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)
        else:
            neg_choice_ratio = 0.1
            neg_num_selected = int(tensorflow.cast(tensorflow.size(pred_score_softmax[:, 1, :, :]), dtype=tensorflow.float32) * 0.1)
            neg_prob = pred_score_softmax[:, 1, :, :]
            neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
            prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
            neg_grad_flag = (neg_prob <= prob_threshold)
            loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                                           tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)
    pred_score_softmax_masked = tensorflow.where(loss_mask, pred_score_softmax,
                                                 tensorflow.zeros_like(pred_score_softmax, dtype=tensorflow.float32))
    pred_score_log = tensorflow.math.log(pred_score_softmax_masked)
    score_cross_entropy = - tensorflow.where(loss_mask, gt_label[:, :2, :, :],
                                             tensorflow.zeros_like(gt_label[:, :2, :, :], dtype=tensorflow.float32)) * pred_score_log
    # Parentheses let the division continue onto the next line
    loss_score = (tensorflow.math.reduce_sum(score_cross_entropy) /
                  tensorflow.cast(tensorflow.size(score_cross_entropy), tensorflow.float32))
    mask_bbox = gt_mask[:, 2:6, :, :]
    predict_bbox = pred_bbox * mask_bbox
    label_bbox = gt_label[:, 2:6, :, :] * mask_bbox
    # l2 loss of boxes
    # loss_bbox = tensorflow.math.reduce_sum(tensorflow.nn.l2_loss((label_bbox - predict_bbox)) ** 2) / 2
    loss_bbox = mse(label_bbox, predict_bbox) / tensorflow.math.reduce_sum(mask_bbox)
    # Adding only losses relevant to a branch and sending them for back prop
    losses.append(loss_score + loss_bbox)
    # losses.append(loss_bbox)
    # Adding all losses and sending to back prop Approach 1
    # loss_cls += loss_score
    # loss_reg += loss_bbox
    # loss_branch.append(loss_score)
    # loss_branch.append(loss_bbox)
    # loss = loss_cls + loss_reg
return losses
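One thing worth ruling out (an observation from reading the code, not confirmed against the training log): entries outside loss_mask are set to 0 in pred_score_softmax_masked before tensorflow.math.log, so those entries become -inf. The matching label entries are also set to 0, and in floating point 0 * -inf is NaN rather than 0, so a single masked entry would turn loss_score into NaN. The NumPy sketch reproduces that arithmetic on toy values; in TensorFlow the analogous fix would be to clip with tensorflow.clip_by_value before the log (or compute log-probabilities with tensorflow.nn.log_softmax on the raw scores) and apply the mask after the multiplication.

```python
import numpy as np

# Three toy entries (not taken from the question's data)
probs = np.array([0.7, 0.3, 0.0])    # last entry zeroed by the mask, like tf.where above
labels = np.array([1.0, 0.0, 0.0])   # the masked label entry is zeroed as well
mask = np.array([True, True, False])

# Pattern used in the question: log of the masked probabilities, times masked labels
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.sum(labels * np.log(probs))   # 0 * log(0) = 0 * (-inf) = nan

# Safer pattern: clip before the log so it is never -inf, then mask the product
eps = 1e-7
safe_log = np.log(np.clip(probs, eps, 1.0))
safe = -np.sum(np.where(mask, labels * safe_log, 0.0))
```

With the safe pattern, the masked entry contributes exactly 0 and the remaining terms give the usual cross entropy, -log(0.7) here.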
I am not getting any error but my losses aren't minimizing. Here is the log for my training.
Could someone please help me fix this?

Joining of curve fitting models

I have these 7 quasi-Lorentzian curves fitted to my data,
and I would like to join them into one connected curve. Do you have any ideas how to do this? I've read about composite models (CompositeModel) in the lmfit documentation, but it's not clear to me how to do this.
Here is a sample of my code of two fitted curves.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from lmfit import Model

for dataset in [Bxfft]:
    dataset = np.asarray(dataset)
    freqs, psd = signal.welch(dataset, fs=266336/300, window='hamming', nperseg=16192, scaling='spectrum')
    plt.semilogy(freqs[0:-7000], psd[0:-7000]/dataset.size**0, color='r', label='Bx')
    x = freqs[100:-7900]
    y = psd[100:-7900]
    # 8 Hz
    model = Model(lorentzian)
    params = model.make_params(amp=6, cen=5, sig=1, e=0)
    result = model.fit(y, params, x=x)
    final_fit = result.best_fit
    print("8 Hz mode")
    print(result.fit_report(min_correl=0.25))
    plt.plot(x, final_fit, 'k-', linewidth=2)
    # 14 Hz
    x2 = freqs[220:-7780]
    y2 = psd[220:-7780]
    model2 = Model(lorentzian)
    pars2 = model2.make_params(amp=6, cen=10, sig=3, e=0)
    pars2['amp'].value = 6
    result2 = model2.fit(y2, pars2, x=x2)
    final_fit2 = result2.best_fit
    print("14 Hz mode")
    print(result2.fit_report(min_correl=0.25))
    plt.plot(x2, final_fit2, 'k-', linewidth=2)
UPDATE:
I've used some hints from user MNewville, who posted an answer, and using his code I got this:
My code is similar to his, but extended with each peak. What I'm struggling with now is replacing the built-in LorentzianModel with my own model function.
The problem is that when I do this, the code gives me a warning like this:
C:\Python27\lib\site-packages\lmfit\printfuncs.py:153: RuntimeWarning: invalid value encountered in double_scalars
  spercent = '({0:.2%})'.format(abs(par.stderr/par.value))
[[Model]]
About my own model:
from lmfit import Model

def lorentzian(x, amp, cen, sig, e):
    return (amp*(1-e)) / ((pow((1.0 * x - cen), 2)) + (pow(sig, 2)))

peak1 = Model(lorentzian, prefix='p1_')
peak2 = Model(lorentzian, prefix='p2_')
peak3 = Model(lorentzian, prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amp=6, p1_cen=8, p1_sig=1, p1_e=0,
                           p2_amp=16, p2_cen=14, p2_sig=3, p2_e=0,
                           p3_amp=16, p3_cen=21, p3_sig=3, p3_e=0)
The rest of the code is similar to MNewville's answer.
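A composite lmfit model such as peak1 + peak2 + peak3 evaluates every component on the same x and adds them, so its best fit is already one connected curve over the full data range. The sketch below shows that idea with plain NumPy (no lmfit calls), using the lorentzian function from the question with its make_params starting values. It also shows something worth checking in that function: amp and e only enter as amp*(1-e), so different (amp, e) pairs give identical curves and the fit cannot determine them separately. That degeneracy, together with e starting at 0 (the warning line divides par.stderr by par.value), is a plausible source of the RuntimeWarning, though this is an inference rather than a confirmed diagnosis. If e is not needed, one option is to hold it fixed for each prefix, e.g. params['p1_e'].value = 0 and params['p1_e'].vary = False.

```python
import numpy as np

def lorentzian(x, amp, cen, sig, e):
    # Same functional form as the custom model in the question
    return (amp * (1 - e)) / ((1.0 * x - cen) ** 2 + sig ** 2)

def composite(x, peaks):
    # The "joined" curve: every component evaluated on one x grid, then summed.
    # This is the role of peak1 + peak2 + peak3 in lmfit.
    return sum(lorentzian(x, **p) for p in peaks)

x = np.linspace(0, 30, 301)
# Starting values taken from the question's make_params call
peaks = [dict(amp=6, cen=8, sig=1, e=0),
         dict(amp=16, cen=14, sig=3, e=0),
         dict(amp=16, cen=21, sig=3, e=0)]
y = composite(x, peaks)

# amp and e only appear as amp*(1-e): (6, 0) and (12, 0.5) give the same peak
same_curve = np.allclose(lorentzian(x, 6, 8, 1, 0), lorentzian(x, 12, 8, 1, 0.5))
```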
A composite model for 3 Lorentzians would look like this:
from lmfit.models import LorentzianModel
peak1 = LorentzianModel(prefix='p1_')
peak2 = LorentzianModel(prefix='p2_')
peak3 = LorentzianModel(prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amplitude=10, p1_center=8, p1_sigma=3,
p2_amplitude=10, p2_center=15, p2_sigma=3,
p3_amplitude=10, p3_center=20, p3_sigma=3)
# perhaps set bounds to prevent peaks from swapping or crazy values
params['p1_amplitude'].min = 0
params['p2_amplitude'].min = 0
params['p3_amplitude'].min = 0
params['p1_sigma'].min = 0
params['p2_sigma'].min = 0
params['p3_sigma'].min = 0
params['p1_center'].min = 2
params['p1_center'].max = 11
params['p2_center'].min = 10
params['p2_center'].max = 18
params['p3_center'].min = 17
params['p3_center'].max = 25
# then do a fit over the full data range
result = model.fit(y, params, x=x)
I think the key parts you were missing were: a) just add models together, and b) use prefix to avoid name collisions of parameters.
I hope that is enough to get you started...

How can I implement a joint hyperprior?

I'm trying to recreate results from Bayesian Data Analysis Third Edition.
Chapter 5, Section 3 concerns tumors in rats. A hierarchical model is fit, and the hyperprior used is not one of the densities included in PyMC3.
The hyperprior is a*b*(a+b)^-2.5. Here is my attempt using pymc3.
import pymc3 as pm
with pm.Model() as model:
    def ab_dist(x):
        #Should be log density, from what I have read
        a = x[0]
        b = x[1]
        return a+b-5/2*(a+b)
    ab = pm.DensityDist('ab', ab_dist, shape = 2, testval=[2,2])
    a = ab[0]
    b = ab[1]
    theta = pm.Beta('theta', alpha = a, beta = b)
    Y = pm.Binomial('y', n = n, p = theta, observed = y)
At this stage, I get an error:
ValueError: Input dimension mis-match. (input[0].shape[0] = 71, input[1].shape[0] = 20000)
What have I done wrong? Have I correctly implemented the density?
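On the density itself: DensityDist expects a function returning the log density, and the log of a*b*(a+b)^-2.5 is log(a) + log(b) - 2.5*log(a+b). The expression returned by ab_dist, a+b-5/2*(a+b), leaves out the logs. The NumPy sketch below checks the corrected expression against the log of the density computed directly. Inside the model the same line would use pm.math.log on x[0] and x[1]; a and b must also stay positive for this log density to be defined, which a DensityDist without a transform does not enforce on its own (an assumption worth verifying in the PyMC3 docs). Writing 2.5 rather than 5/2 also avoids integer division under Python 2. It may also be worth checking against Section 5.3 whether the a*b factor belongs to the prior on (log(a/b), log(a+b)) rather than on (a, b) directly.

```python
import numpy as np

def log_hyperprior(a, b):
    # log(a * b * (a + b)**-2.5) = log(a) + log(b) - 2.5 * log(a + b)
    return np.log(a) + np.log(b) - 2.5 * np.log(a + b)

def original_expression(a, b):
    # The expression returned by ab_dist in the question, for comparison
    return a + b - 5 / 2 * (a + b)

a, b = 2.0, 2.0
direct = np.log(a * b * (a + b) ** -2.5)   # log of the density, computed directly
```

At a = b = 2 the corrected log density is -3*log(2) (about -2.08), while the original expression gives -6.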

2 layer NN weights not updating

I have a fairly simple NN that has 1 hidden layer.
However, the weights don't seem to be updating, or perhaps they are but the variable values don't change?
Either way, my accuracy is 0.1 and it doesn't change no matter how I change the learning rate or the activation function. I'm not sure what is wrong. Any ideas?
I've posted the entire code, correctly formatted, so you can copy, paste and run it directly on your local machine.
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [784, None],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([25, 784],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([10, 25],stddev=1.0/math.sqrt(25)),name='W')
# bias units
b1 = tf.Variable(tf.zeros([25,1]),name='b1')
b2 = tf.Variable(tf.zeros([25,1]),name='b2')
b3 = tf.Variable(tf.zeros([10,1]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(W1, x,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(W3, hidden1,name='y') + b3
y_ = tf.placeholder(tf.float32, [10, None],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(2).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)
init = tf.global_variables_initializer()
sess.run(init)
# Train
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary = sess.run(train_step, feed_dict={x: np.transpose(batch_xs), y_: np.transpose(batch_ys)})
    if summary is not None:
        summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: np.transpose(mnist.test.images), y_: np.transpose(mnist.test.labels)}))
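The reason you consistently get 0.1 accuracy is mainly the order of dimensions of the input placeholder and of the weights that follow it. The learning rate is another factor: if it is very high (2 here), the updates oscillate and may never settle into a minimum.
TensorFlow expects the number of instances (the batch) as the first index of the placeholder: tf.nn.softmax_cross_entropy_with_logits normalizes over the last dimension, and tf.argmax(y, 1) picks the largest entry in each row, so with examples stored as columns both operate across the batch instead of across the 10 classes. So the code which declares input x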
x = tf.placeholder(tf.float32, [784, None],name='x')
should be declared as
x = tf.placeholder(tf.float32, [None, 784],name='x')
Consequently, W1 should be declared as
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
and so on for W2 and W3. The bias variables should likewise be declared in the transposed sense, as 1-D vectors that broadcast across the batch (that's how TensorFlow takes it :) ).
For example
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
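To see why the layout matters, here is a small NumPy sketch of a softmax over the last axis, which is the default class dimension of tf.nn.softmax_cross_entropy_with_logits. With the question's [classes, batch] layout, each row is normalized across the examples in the batch; after transposing to [batch, classes], each row is a proper distribution over the 10 classes. The function name and random logits are illustrative only.

```python
import numpy as np

def softmax_last_axis(z):
    # Normalize over the last axis, like the default class dimension in TF
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits_class_major = rng.normal(size=(10, 5))   # question's layout: [classes, batch]
logits_batch_major = logits_class_major.T       # answer's layout: [batch, classes]

p_wrong = softmax_last_axis(logits_class_major)  # each row sums to 1 across the 5 examples
p_right = softmax_last_axis(logits_batch_major)  # each row sums to 1 across the 10 classes
```

In the wrong layout, the per-example class probabilities (the columns) do not sum to 1, so the loss being minimized is not the classification cross entropy at all.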
I'm putting the corrected full code below for your reference. I achieved an accuracy of 0.9262 with this :D
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [None, 784],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([25, 10],stddev=1.0/math.sqrt(25)),name='W')
# bias units
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(x, W1,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(hidden1, W3,name='y') + b3
y_ = tf.placeholder(tf.float32, [None, 10],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.summary.FileWriter('log_simple_graph', sess.graph)
init = tf.global_variables_initializer()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    summary = sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if summary is not None:
        summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))