Tensorflow Deep Learning - model size and parameters - python-2.7

According to Andrej's blog -
Where he says that for a Convolutional Layer, with parameter sharing, it introduces F x F x D weights per filter, for a total of (F x F x D) x K weights and K biases.
In my tensorflow code, I have an architecture like this (where D=1)
conv1 : F = 3, K = 32, S = 1, P = 1.
pool1 :
conv2
and so on...
According to the formula,
A model generated with F=3 for conv1 should have 9K weights ,i.e. smaller model, and
A model generated with F=5 should have 25K weights i.e. bigger model
Question
In my code, when I write out the model files for both these cases, I see that the .ckpt file is about 380MB (F=3) and 340MB (F=5). Am I missing something?
Code:
Here's the reference code for saving the variables to a model and printing its size.
''' Run the session and save the model'''
#Add a saver here
saver = tf.train.Saver()
# Run session
sess.run(tf.initialize_all_variables())
for i in range(201):
batch = mnist.train.next_batch(50)
if i%100 == 0:
train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
print("step %d, training accuracy %g"%(i, train_accuracy))
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
# Save model
save_path = saver.save(sess, "/Users/voladoddi/Desktop/dropmodel.ckpt")
print("Model saved in file: %s" % save_path)
# Test
print("test accuracy %g"%accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
# Print model size.
vars = 0
for v in tf.all_variables():
vars += np.prod(v.get_shape().as_list())
print(vars*4)/(1024**2),"MB"

Related

How do I fit a pymc3 model when each person has multiple data points?

I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points, and each person comes from a different group (so trying a hierarchical model).
Here's the practice scenario I'm using: Suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10 question survey, where each person can response ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.
To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model), and the probability of responding ("1") is given by: (1 + exp(-a(theta - b)))^(-1). (Logistic applied to a(theta - b) .)
Here is how I tried to fit it using pymc3:
traces = {}
for grp in range(2):
group = prac_data["Group ID"] == grp
data = prac_data[group]["Response"]
with pm.Model() as irt:
# Priors
a_tmp = pm.Normal('a_tmp',mu=0, sd = 1, shape = 10)
a = pm.Deterministic('a', np.exp(a_tmp))
# We do this transformation since we must have a >= 0
b = pm.Normal('b', mu = 0, sd = 1, shape = 10)
# Now for the hyperpriors on the groups:
theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1)
theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0)
theta = pm.Normal('theta', mu = theta_mu,
sd = theta_sigma, shape = N)
p = getProbs(Disc, Diff, theta, N)
y = pm.Bernoulli('y', p = p, observed = data)
traces[grp] = pm.sample(1000)
The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, as the probability of responding 1 changes across trials/survey questions for each person. But this method gives me an error because it says to "specify one of p or logit_p", but I thought I did with the function?
Here's the code for "getProbs" in case it's helpful:
def getProbs(Disc, Diff, THETA, Nprt):
# Get a large array of probabilities for the bernoulli random variable
n = len(Disc)
m = Nprt
probs = np.array([])
for th in range(m):
for t in range(n):
p = item(Disc[t], Diff[t], THETA[th])
probs = np.append(probs, p)
return probs
I added the Nprt parameter because if I tried to get the length of THETA, it would give me an error since it is a FreeRV object. I know I can try and vectorize the "item" function, which is just the logistic function I put above, instead of doing it this way, but that also got me an error when I tried to run it.
I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.
Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.
Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.
EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:
p = theano.tensor.stack(p, axis = 0)
I then put this new variable in the "p" argument of the Bernoulli instance and it worked! Here's the updated full model: (below, I imported theano.tensor as T)
group = group.astype('int')
data = prac_data["Response"]
with pm.Model() as irt:
# Priors
# Item parameters:
a = pm.Gamma('a', alpha = 1, beta = 1, shape = 10) # Discrimination
b = pm.Normal('b', mu = 0, sd = 1, shape = 10) # Difficulty
# Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1, shape = 2)
theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0, shape = 2)
# Individual-level person parameters:
# group is a 2*N array that lets the model know which
# theta_mu to use for each theta to estimate
theta = pm.Normal('theta', mu = theta_mu[group],
sd = theta_sigma[group], shape = 2*N)
# Here, we're building an array of the probabilities we need for
# each trial:
p = np.array([])
for n in range(2*N):
for t in range(10):
x = -a[t]*(theta[n] - b[t])
p = np.append(p, x)
# Here, we turn p into a tensor object to put as an argument to the
# Bernoulli random variable
p = T.stack(p, axis = 0)
y = pm.Bernoulli('y', logit_p = p, observed = data)
# On my computer, this took about 5 minutes to run.
traces = pm.sample(1000, cores = 1)
print(az.summary(traces)) # Summary of parameter distributions

TypeError: only length-1 arrays can be converted to Python scalars, python2.7

I searched about this issue, I got more questions "the same error" but different code and different reason. So, I was hesitant more to put my issue here. After reading the majority of answers, I didn't find a solution for my issue.
The original and full code here
chapter6.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from datasets import gtsrb
from classifiers import MultiClassSVM
def main():
strategies = ['one-vs-one', 'one-vs-all']
features = [None, 'gray', 'rgb', 'hsv', 'surf', 'hog']
accuracy = np.zeros((2, len(features)))
precision = np.zeros((2, len(features)))
recall = np.zeros((2, len(features)))
for f in xrange(len(features)):
print "feature", features[f]
(X_train, y_train), (X_test, y_test) = gtsrb.load_data(
"datasets/gtsrb_training",
feature=features[f],
test_split=0.2,
seed=42)
# convert to numpy
X_train = np.squeeze(np.array(X_train)).astype(np.float32)
y_train = np.array(y_train)
X_test = np.squeeze(np.array(X_test)).astype(np.float32)
y_test = np.array(y_test)
# find all class labels
labels = np.unique(np.hstack((y_train, y_test)))
for s in xrange(len(strategies)):
print " - strategy", strategies[s]
# set up SVMs
MCS = MultiClassSVM(len(labels), strategies[s])
# training phase
print " - train"
MCS.fit(X_train, y_train)
# test phase
print " - test"
acc, prec, rec = MCS.evaluate(X_test, y_test)
accuracy[s, f] = acc
precision[s, f] = np.mean(prec)
recall[s, f] = np.mean(rec)
print " - accuracy: ", acc
print " - mean precision: ", np.mean(prec)
print " - mean recall: ", np.mean(rec)
# plot results as stacked bar plot
f, ax = plt.subplots(2)
for s in xrange(len(strategies)):
x = np.arange(len(features))
ax[s].bar(x - 0.2, accuracy[s, :], width=0.2, color='b',
hatch='/', align='center')
ax[s].bar(x, precision[s, :], width=0.2, color='r', hatch='\\',
align='center')
ax[s].bar(x + 0.2, recall[s, :], width=0.2, color='g', hatch='x',
align='center')
ax[s].axis([-0.5, len(features) + 0.5, 0, 1.5])
ax[s].legend(('Accuracy', 'Precision', 'Recall'), loc=2, ncol=3,
mode='expand')
ax[s].set_xticks(np.arange(len(features)))
ax[s].set_xticklabels(features)
ax[s].set_title(strategies[s])
plt.show()
if __name__ == '__main__':
main()
classifiers.py
import cv2
import numpy as np
from abc import ABCMeta, abstractmethod
from matplotlib import pyplot as plt
__author__ = "Michael Beyeler"
__license__ = "GNU GPL 3.0 or later"
class Classifier:
"""
Abstract base class for all classifiers
A classifier needs to implement at least two methods:
- fit: A method to train the classifier by fitting the model to
the data.
- evaluate: A method to test the classifier by predicting labels of
some test data based on the trained model.
A classifier also needs to specify a classification strategy via
setting self.mode to either "one-vs-all" or "one-vs-one".
The one-vs-all strategy involves training a single classifier per
class, with the samples of that class as positive samples and all
other samples as negatives.
The one-vs-one strategy involves training a single classifier per
class pair, with the samples of the first class as positive samples
and the samples of the second class as negative samples.
This class also provides method to calculate accuracy, precision,
recall, and the confusion matrix.
"""
__metaclass__ = ABCMeta
#abstractmethod
def fit(self, X_train, y_train):
pass
#abstractmethod
def evaluate(self, X_test, y_test, visualize=False):
pass
def _accuracy(self, y_test, Y_vote):
"""Calculates accuracy
This method calculates the accuracy based on a vector of
ground-truth labels (y_test) and a 2D voting matrix (Y_vote) of
size (len(y_test), num_classes).
:param y_test: vector of ground-truth labels
:param Y_vote: 2D voting matrix (rows=samples, cols=class votes)
:returns: accuracy e[0,1]
"""
# predicted classes
y_hat = np.argmax(Y_vote, axis=1)
# all cases where predicted class was correct
mask = y_hat == y_test
return np.float32(np.count_nonzero(mask)) / len(y_test)
def _precision(self, y_test, Y_vote):
"""Calculates precision
This method calculates precision extended to multi-class
classification by help of a confusion matrix.
:param y_test: vector of ground-truth labels
:param Y_vote: 2D voting matrix (rows=samples, cols=class votes)
:returns: precision e[0,1]
"""
# predicted classes
y_hat = np.argmax(Y_vote, axis=1)
if self.mode == "one-vs-one":
# need confusion matrix
conf = self._confusion(y_test, Y_vote)
# consider each class separately
prec = np.zeros(self.num_classes)
for c in xrange(self.num_classes):
# true positives: label is c, classifier predicted c
tp = conf[c, c]
# false positives: label is c, classifier predicted not c
fp = np.sum(conf[:, c]) - conf[c, c]
if tp + fp != 0:
prec[c] = tp * 1. / (tp + fp)
elif self.mode == "one-vs-all":
# consider each class separately
prec = np.zeros(self.num_classes)
for c in xrange(self.num_classes):
# true positives: label is c, classifier predicted c
tp = np.count_nonzero((y_test == c) * (y_hat == c))
# false positives: label is c, classifier predicted not c
fp = np.count_nonzero((y_test == c) * (y_hat != c))
if tp + fp != 0:
prec[c] = tp * 1. / (tp + fp)
return prec
def _recall(self, y_test, Y_vote):
"""Calculates recall
This method calculates recall extended to multi-class
classification by help of a confusion matrix.
:param y_test: vector of ground-truth labels
:param Y_vote: 2D voting matrix (rows=samples, cols=class votes)
:returns: recall e[0,1]
"""
# predicted classes
y_hat = np.argmax(Y_vote, axis=1)
if self.mode == "one-vs-one":
# need confusion matrix
conf = self._confusion(y_test, Y_vote)
# consider each class separately
recall = np.zeros(self.num_classes)
for c in xrange(self.num_classes):
# true positives: label is c, classifier predicted c
tp = conf[c, c]
# false negatives: label is not c, classifier predicted c
fn = np.sum(conf[c, :]) - conf[c, c]
if tp + fn != 0:
recall[c] = tp * 1. / (tp + fn)
elif self.mode == "one-vs-all":
# consider each class separately
recall = np.zeros(self.num_classes)
for c in xrange(self.num_classes):
# true positives: label is c, classifier predicted c
tp = np.count_nonzero((y_test == c) * (y_hat == c))
# false negatives: label is not c, classifier predicted c
fn = np.count_nonzero((y_test != c) * (y_hat == c))
if tp + fn != 0:
recall[c] = tp * 1. / (tp + fn)
return recall
def _confusion(self, y_test, Y_vote):
"""Calculates confusion matrix
This method calculates the confusion matrix based on a vector of
ground-truth labels (y-test) and a 2D voting matrix (Y_vote) of
size (len(y_test), num_classes).
Matrix element conf[r,c] will contain the number of samples that
were predicted to have label r but have ground-truth label c.
:param y_test: vector of ground-truth labels
:param Y_vote: 2D voting matrix (rows=samples, cols=class votes)
:returns: confusion matrix
"""
y_hat = np.argmax(Y_vote, axis=1)
conf = np.zeros((self.num_classes, self.num_classes)).astype(np.int32)
for c_true in xrange(self.num_classes):
# looking at all samples of a given class, c_true
# how many were classified as c_true? how many as others?
for c_pred in xrange(self.num_classes):
y_this = np.where((y_test == c_true) * (y_hat == c_pred))
conf[c_pred, c_true] = np.count_nonzero(y_this)
return conf
class MultiClassSVM(Classifier):
"""
Multi-class classification using Support Vector Machines (SVMs)
This class implements an SVM for multi-class classification. Whereas
some classifiers naturally permit the use of more than two classes
(such as neural networks), SVMs are binary in nature.
However, we can turn SVMs into multinomial classifiers using at least
two different strategies:
* one-vs-all: A single classifier is trained per class, with the
samples of that class as positives (label 1) and all
others as negatives (label 0).
* one-vs-one: For k classes, k*(k-1)/2 classifiers are trained for each
pair of classes, with the samples of the one class as
positives (label 1) and samples of the other class as
negatives (label 0).
Each classifier then votes for a particular class label, and the final
decision (classification) is based on a majority vote.
"""
def __init__(self, num_classes, mode="one-vs-all", params=None):
"""
The constructor makes sure the correct number of classifiers is
initialized, depending on the mode ("one-vs-all" or "one-vs-one").
:param num_classes: The number of classes in the data.
:param mode: Which classification mode to use.
"one-vs-all": single classifier per class
"one-vs-one": single classifier per class pair
Default: "one-vs-all"
:param params: SVM training parameters.
For now, default values are used for all SVMs.
Hyperparameter exploration can be achieved by
embedding the MultiClassSVM process flow in a
for-loop that classifies the data with
different parameter values, then pick the
values that yield the best accuracy.
Default: None
"""
self.num_classes = num_classes
self.mode = mode
self.params = params or dict()
# initialize correct number of classifiers
self.classifiers = []
if mode == "one-vs-one":
# k classes: need k*(k-1)/2 classifiers
for _ in xrange(num_classes*(num_classes - 1) / 2):
self.classifiers.append(cv2.ml.SVM_create())
elif mode == "one-vs-all":
# k classes: need k classifiers
for _ in xrange(num_classes):
self.classifiers.append(cv2.ml.SVM_create())
else:
print "Unknown mode ", mode
def fit(self, X_train, y_train, params=None):
"""Fits the model to training data
This method trains the classifier on data (X_train) using either
the "one-vs-one" or "one-vs-all" strategy.
:param X_train: input data (rows=samples, cols=features)
:param y_train: vector of class labels
:param params: dict to specify training options for cv2.SVM.train
leave blank to use the parameters passed to the
constructor
"""
if params is None:
params = self.params
if self.mode == "one-vs-one":
svm_id = 0
for c1 in xrange(self.num_classes):
for c2 in xrange(c1 + 1, self.num_classes):
# indices where class labels are either `c1` or `c2`
data_id = np.where((y_train == c1) + (y_train == c2))[0]
# set class label to 1 where class is `c1`, else 0
y_train_bin = np.where(y_train[data_id] == c1, 1,
0).flatten()
self.classifiers[svm_id].train(X_train[data_id, :],
y_train_bin,
params=self.params)
svm_id += 1
elif self.mode == "one-vs-all":
for c in xrange(self.num_classes):
# train c-th SVM on class c vs. all other classes
# set class label to 1 where class==c, else 0
y_train_bin = np.where(y_train == c, 1, 0).flatten()
# train SVM
self.classifiers[c].train(X_train, y_train_bin,
params=self.params)
def evaluate(self, X_test, y_test, visualize=False):
"""Evaluates the model on test data
This method evaluates the classifier's performance on test data
(X_test) using either the "one-vs-one" or "one-vs-all" strategy.
:param X_test: input data (rows=samples, cols=features)
:param y_test: vector of class labels
:param visualize: flag whether to plot the results (True) or not
(False)
:returns: accuracy, precision, recall
"""
# prepare Y_vote: for each sample, count how many times we voted
# for each class
Y_vote = np.zeros((len(y_test), self.num_classes))
if self.mode == "one-vs-one":
svm_id = 0
for c1 in xrange(self.num_classes):
for c2 in xrange(c1 + 1, self.num_classes):
data_id = np.where((y_test == c1) + (y_test == c2))[0]
X_test_id = X_test[data_id, :]
y_test_id = y_test[data_id]
# set class label to 1 where class==c1, else 0
# y_test_bin = np.where(y_test_id==c1,1,0).reshape(-1,1)
# predict labels
y_hat = self.classifiers[svm_id].predict_all(X_test_id)
for i in xrange(len(y_hat)):
if y_hat[i] == 1:
Y_vote[data_id[i], c1] += 1
elif y_hat[i] == 0:
Y_vote[data_id[i], c2] += 1
else:
print "y_hat[", i, "] = ", y_hat[i]
# we vote for c1 where y_hat is 1, and for c2 where y_hat
# is 0 np.where serves as the inner index into the data_id
# array, which in turn serves as index into the results
# array
# Y_vote[data_id[np.where(y_hat == 1)[0]], c1] += 1
# Y_vote[data_id[np.where(y_hat == 0)[0]], c2] += 1
svm_id += 1
elif self.mode == "one-vs-all":
for c in xrange(self.num_classes):
# set class label to 1 where class==c, else 0
# predict class labels
# y_test_bin = np.where(y_test==c,1,0).reshape(-1,1)
# predict labels
y_hat = self.classifiers[c].predict_all(X_test)
# we vote for c where y_hat is 1
if np.any(y_hat):
Y_vote[np.where(y_hat == 1)[0], c] += 1
# with this voting scheme it's possible to end up with samples
# that have no label at all...in this case, pick a class at
# random...
no_label = np.where(np.sum(Y_vote, axis=1) == 0)[0]
Y_vote[no_label, np.random.randint(self.num_classes,
size=len(no_label))] = 1
accuracy = self._accuracy(y_test, Y_vote)
precision = self._precision(y_test, Y_vote)
recall = self._recall(y_test, Y_vote)
return accuracy, precision, recall
when running chapter6.py
The output is:
feature None
- strategy one-vs-one
- train
Traceback (most recent call last):
File "/home/redhwan/Downloads/opencv-python-blueprints-master/chapter6/chapter6.py", line 77, in <module>
main()
File "/home/redhwan/Downloads/opencv-python-blueprints-master/chapter6/chapter6.py", line 44, in main
MCS.fit(X_train, y_train)
File "/home/redhwan/Downloads/opencv-python-blueprints-master/chapter6/classifiers.py", line 258, in fit
params=self.params)
TypeError: only length-1 arrays can be converted to Python scalars
please help me or your suggestion
Thank you in advance!

ValueError: Tensor Tensor("Const:0", shape=(), dtype=float32) may not be fed with tf.placeholder

I'm trying to make speech recognition system with tensorflow.
Input data is an numpy array of size 50000 X 1.
Output data (mapping data) is an numpy array of size 400 X 1.
Input and mapping data is passed in batches of 2 in a list.
I've used this tutorial to design the neural network. Following is the code snippet:
For RNN:
input_data = tf.placeholder(tf.float32, [batch_size, sound_constants.MAX_ROW_SIZE_IN_DATA, sound_constants.MAX_COLUMN_SIZE_IN_DATA], name="train_input")
target = tf.placeholder(tf.float32, [batch_size, sound_constants.MAX_ROW_SIZE_IN_TXT, sound_constants.MAX_COLUMN_SIZE_IN_TXT], name="train_output")
fwd_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True, forget_bias=1.0)
# creating one backward cell
bkwd_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True, forget_bias=1.0)
# creating bidirectional RNN
val, _, _ = tf.nn.static_bidirectional_rnn(fwd_cell, bkwd_cell, tf.unstack(input_data), dtype=tf.float32)
For feeding data:
feed = {g['input_data'] : trb[0], g['target'] : trb[1], g['dropout'] : 0.6}
accuracy_, _ = sess.run([g['accuracy'], g['ts']], feed_dict=feed)
accuracy += accuracy_
When I ran the code, I got this error:
Traceback (most recent call last):
File "/home/wolborg/PycharmProjects/speech-to-text-rnn/src/rnn_train_1.py", line 205, in <module>
tr_losses, te_losses = train_network(g)
File "/home/wolborg/PycharmProjects/speech-to-text-rnn/src/rnn_train_1.py", line 177, in train_network
accuracy_, _ = sess.run([g['accuracy'], g['ts']], feed_dict=feed)
File "/home/wolborg/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/wolborg/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1102, in _run
raise ValueError('Tensor %s may not be fed.' % subfeed_t)
ValueError: Tensor Tensor("Const:0", shape=(), dtype=float32) may not be fed.
Process finished with exit code 1
Earlier, I was facing this issue with tf.sparse_placeholder, then after some browsing, I changed input type to tf.placeholder and made related changes. Now I'm clueless on where I'm making the error.
Please suggest something as how should I feed data.
Entire code:
import tensorflow as tf
# for taking MFCC and label input
import numpy as np
import rnn_input_data_1
import sound_constants
# input constants
# Training Parameters
num_input = 10 # mfcc data input
training_data_size = 8 # determines number of files in training and testing module
testing_data_size = num_input - training_data_size
# Network Parameters
learning_rate = 0.0001 # for large training set, it can be set 0.001
num_hidden = 200 # number of hidden layers
num_classes = 28 # total alphabet classes (a-z) + extra symbols (', ' ')
epoch = 1 # number of iterations
batch_size = 2 # number of batches
mfcc_coeffs, text_data = rnn_input_data_1.mfcc_and_text_encoding()
class DataGenerator:
def __init__(self, data_size):
self.ptr = 0
self.epochs = 0
self.data_size = data_size
def next_batch(self):
self.ptr += batch_size
if self.ptr > self.data_size:
self.epochs += 1
self.ptr = 0
return mfcc_coeffs[self.ptr-batch_size : self.ptr], text_data[self.ptr-batch_size : self.ptr]
def reset_graph():
if 'sess' in globals() and sess:
sess.close()
tf.reset_default_graph()
def struct_network():
print ('Inside struct network !!')
reset_graph()
input_data = tf.placeholder(tf.float32, [batch_size, sound_constants.MAX_ROW_SIZE_IN_DATA, sound_constants.MAX_COLUMN_SIZE_IN_DATA], name="train_input")
target = tf.placeholder(tf.float32, [batch_size, sound_constants.MAX_ROW_SIZE_IN_TXT, sound_constants.MAX_COLUMN_SIZE_IN_TXT], name="train_output")
keep_prob = tf.constant(1.0)
fwd_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True, forget_bias=1.0)
# creating one backward cell
bkwd_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True, forget_bias=1.0)
# creating bidirectional RNN
val, _, _ = tf.nn.static_bidirectional_rnn(fwd_cell, bkwd_cell, tf.unstack(input_data), dtype=tf.float32)
# adding dropouts
val = tf.nn.dropout(val, keep_prob)
val = tf.transpose(val, [1, 0, 2])
last = tf.gather(val, int(val.get_shape()[0]) - 1)
# creating bidirectional RNN
print ('BiRNN created !!')
print ('Last Size: ', last.get_shape())
weight = tf.Variable(tf.truncated_normal([num_hidden * 2, sound_constants.MAX_ROW_SIZE_IN_TXT]))
bias = tf.Variable(tf.constant(0.1, shape=[sound_constants.MAX_ROW_SIZE_IN_TXT]))
# mapping to 28 output classes
logits = tf.matmul(last, weight) + bias
prediction = tf.nn.softmax(logits)
prediction = tf.reshape(prediction, shape = [batch_size, sound_constants.MAX_ROW_SIZE_IN_TXT, sound_constants.MAX_COLUMN_SIZE_IN_TXT])
# getting probability distribution
mat1 = tf.cast(tf.argmax(prediction,1),tf.float32)
correct = tf.equal(prediction, target)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
logits = tf.reshape(logits, shape=[batch_size, sound_constants.MAX_ROW_SIZE_IN_TXT, sound_constants.MAX_COLUMN_SIZE_IN_TXT])
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=target))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
# returning components as dictionary elements
return {'input_data' : input_data,
'target' : target,
'dropout': keep_prob,
'loss': loss,
'ts': train_step,
'preds': prediction,
'accuracy': accuracy
}
def train_network(graph):
# initialize tensorflow session and all variables
# tf_gpu_config = tf.ConfigProto(allow_soft_placement = True, log_device_placement = True)
# tf_gpu_config.gpu_options.allow_growth = True
# with tf.Session(config = tf_gpu_config) as sess:
with tf.Session() as sess:
train_instance = DataGenerator(training_data_size)
test_instance = DataGenerator(testing_data_size)
print ('Training data size: ', train_instance.data_size)
print ('Testing data size: ', test_instance.data_size)
sess.run(tf.global_variables_initializer())
print ('Starting session...')
step, accuracy = 0, 0
tr_losses, te_losses = [], []
current_epoch = 0
while current_epoch < epoch:
step += 1
trb = train_instance.next_batch()
feed = {g['input_data'] : trb[0], g['target'] : trb[1], g['dropout'] : 0.6}
accuracy_, _ = sess.run([g['accuracy'], g['ts']], feed_dict=feed)
accuracy += accuracy_
if train_instance.epochs > current_epoch:
current_epoch += 1
tr_losses.append(accuracy / step)
step, accuracy = 0, 0
#eval test set
te_epoch = test_instance.epochs
while test_instance.epochs == te_epoch:
step += 1
print ('Testing round ', step)
trc = test_instance.next_batch()
feed = {g['input_data']: trc[0], g['target']: trc[1]}
accuracy_ = sess.run([g['accuracy']], feed_dict=feed)[0]
accuracy += accuracy_
te_losses.append(accuracy / step)
step, accuracy = 0,0
print("Accuracy after epoch", current_epoch, " - tr:", tr_losses[-1], "- te:", te_losses[-1])
return tr_losses, te_losses
g = struct_network()
tr_losses, te_losses = train_network(g)
You defined keep_prob as a tf.constant, but then trying to feed the value into it. Replace keep_prob = tf.constant(1.0) with keep_prob = tf.placeholder(tf.float32,[]) or keep_prob = tf.placeholder_with_default(1.0,[])

2 layer NN weights not updating

I have a fairly simple NN that has 1 hidden layer.
However, the weights don't seem to be updating. Or perhaps they are but the variable values don't change ?
Either way, my accuracy is 0.1 and it doesn't change no matter I change the learning rate or the activation function. Not sure what is wrong. Any ideas ?
I've posted the entire code correctly formatter so you guys can directly copy paste it and run it on your local machines.
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels. mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [784, None],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([25, 784],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([10, 25],stddev=1.0/math.sqrt(25)),name='W')
# bias units b1 = tf.Variable(tf.zeros([25,1]),name='b1')
b2 = tf.Variable(tf.zeros([25,1]),name='b2')
b3 = tf.Variable(tf.zeros([10,1]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(W1, x,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(W3, hidden1,name='y') + b3
y_ = tf.placeholder(tf.float32, [10, None],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(2).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)
init = tf.global_variables_initializer()
sess.run(init)
# Train
for i in range(1000):
batch_xs, batch_ys = mnist.train.next_batch(100)
summary =sess.run(train_step, feed_dict={x: np.transpose(batch_xs), y_: np.transpose(batch_ys)})
if summary is not None:
summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: np.transpose(mnist.test.images), y_: np.transpose(mnist.test.labels)}))
The reason why you are getting 0.1 accuracy consistently is mainly due to the order of dimensions of the input placeholder and the weights following it. Learning rate is another factor. If the learning rate is very high, the gradient would be oscillating and will not reach any minima.
Tensorflow takes the number of instances(batches) as the first index value of placeholder. So the code which declares input x
x = tf.placeholder(tf.float32, [784, None],name='x')
should be declared as
x = tf.placeholder(tf.float32, [None, 784],name='x')
Consequently, W1 should be declared as
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
and so on.. Even the bias variables should be declared in the transpose sense. (Thats how tensorflow takes it :) )
For example
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
I'm putting the corrected full code below for your reference. I achieved an accuracy of 0.9262 with this :D
from tensorflow.examples.tutorials.mnist import input_data
import math
import numpy as np
import tensorflow as tf
# one hot option returns binarized labels.
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
# model parameters
x = tf.placeholder(tf.float32, [None, 784],name='x')
# weights
W1 = tf.Variable(tf.truncated_normal([784, 25],stddev= 1.0/math.sqrt(784)),name='W')
W2 = tf.Variable(tf.truncated_normal([25, 25],stddev=1.0/math.sqrt(25)),name='W')
W3 = tf.Variable(tf.truncated_normal([25, 10],stddev=1.0/math.sqrt(25)),name='W')
# bias units
b1 = tf.Variable(tf.zeros([25]),name='b1')
b2 = tf.Variable(tf.zeros([25]),name='b2')
b3 = tf.Variable(tf.zeros([10]),name='b3')
# NN architecture
hidden1 = tf.nn.relu(tf.matmul(x, W1,name='hidden1')+b1, name='hidden1_out')
# hidden2 = tf.nn.sigmoid(tf.matmul(W2, hidden1, name='hidden2')+b2, name='hidden2_out')
y = tf.matmul(hidden1, W3,name='y') + b3
y_ = tf.placeholder(tf.float32, [None, 10],name='y_')
# Create the model
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter('log_simple_graph', sess.graph)
init = tf.initialize_all_variables()
sess.run(init)
for i in range(1000):
batch_xs, batch_ys = mnist.train.next_batch(100)
summary =sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
if summary is not None:
summary_writer.add_event(summary)
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

Plot in tensorboard is always closes and like a circle

I was trying to plot a loss curve, but is always abnormal (just like a circle, I really don't know how to describe it in English properly), I had found many topics about question like this and just can't solve, my tensorflow version is 0.10.0.
import tensorflow as tf
from tensorflow.core.util.event_pb2 import SessionLog
import os
# initialize variables/model parameters
# define the training loop operations
def inputs():
# read/generate input training data X and expected outputs Y
weight_age = [[84,46],[73,20],[65,52],[70,30],[76,57],[69,25],[63,28],[72,36],[79,57],[75,44],[27,24]
,[89,31],[65,52],[57,23],[59,60],[69,48],[60,34],[79,51],[75,50],[82,34],[59,46],[67,23],
[85,37],[55,40],[63,30]]
blodd_fat_content = [354,190,405,263,451,302,288,385,402,365,209,290,346,
254,395,434,220,374,308,220,311,181,274,303,244]
return tf.to_float(weight_age), tf.to_float(blodd_fat_content)
def inference(X):
# compute inference model over data X and return the result
return tf.matmul(X, W) + b
def loss(X, Y):
# compute loss over training data X and expected outputs Y
Y_predicted = inference(X)
return tf.reduce_sum(tf.squared_difference(Y, Y_predicted))
def train(total_loss):
# train / adjust model parameters according to computed total loss
learning_rate = 1e-7
return tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)
def evaluate(sess, X, Y):
# evaluate the resulting trained model
print (sess.run(inference([[80., 25.]])))
print (sess.run(inference([[60., 25.]])))
g1 = tf.Graph()
with tf.Session(graph=g1) as sess:
W = tf.Variable(tf.zeros([2,1]), name="weights")
b = tf.Variable(0., name="bias")
tf.initialize_all_variables().run()
X, Y = inputs()
print (sess.run(W))
total_loss = loss(X, Y)
train_op = train(total_loss)
tf.scalar_summary("loss", total_loss)
summaries = tf.merge_all_summaries()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
summary_writer = tf.train.SummaryWriter('linear', g1)
summary_writer.add_session_log(session_log= SessionLog(status=SessionLog.START), global_step=1)
# actual training loop
training_steps = 100
tolerance = 100
total_loss_last = 0
initial_step = 0
# Create a saver.
saver = tf.train.Saver()
# verify if we don't have a checkpoint saved already
ckpt = tf.train.get_checkpoint_state(os.path.dirname('my_model'))
if ckpt and ckpt.model_checkpoint_path:
# Restores from checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
initial_step = int(ckpt.model_checkpoint_path.rsplit('-', 1)[1])
# summary_writer.add_session_log(SessionLog(status=SessionLog.START), global_step=initial_step)
for step in range(initial_step, training_steps):
sess.run([train_op])
if step%20 == 0:
saver.save(sess, 'my-model', global_step=step)
gap = abs(sess.run(total_loss) - total_loss_last)
total_loss_last = sess.run(total_loss)
summary_writer.add_summary(sess.run(summaries), step)
# for debugging and learning purposes, see how the loss gets decremented thru training steps
if step % 10 == 0:
print ("loss: ", sess.run([total_loss]))
print("step: ", step)
if gap < tolerance:
break
# evaluation...
evaluate(sess, X, Y)
coord.request_stop()
coord.join(threads)
saver.save(sess, 'my-model', global_step=training_steps)
summary_writer.flush()
sess.close()