Tensorflow RNN training won't execute? - python-2.7

I am trying to train this RNN, but I keep running into errors that I am not able to decode.
The input to my RNN is digitally sampled audio files. Since the audio files can have different lengths, the vectors of sampled audio also have different lengths.
The target of the network is a 14-dimensional vector containing certain information about each audio file. I already know the targets, having calculated them manually, but I need to make this work with a neural network.
I am currently using TensorFlow as the framework.
My network setup looks like this:
def last_relevant(output):
    max_length = int(output.get_shape()[1])
    relevant = tf.reduce_sum(tf.mul(output, tf.expand_dims(tf.one_hot(length, max_length), -1)), 1)
    return relevant
def length(sequence): ## Zero padding to fit the max length... Question whether that is a good idea.
    used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
    length = tf.reduce_sum(used, reduction_indices=1)
    length = tf.cast(length, tf.int32)
    return length
def cost(output, target):
    # Compute cross entropy for each frame.
    cross_entropy = target * tf.log(output)
    cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
    mask = tf.sign(tf.reduce_max(tf.abs(target), reduction_indices=2))
    cross_entropy *= mask
    # Average over actual sequence lengths.
    cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
    cross_entropy /= tf.reduce_sum(mask, reduction_indices=1)
    return tf.reduce_mean(cross_entropy)
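The masking step in cost can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, assuming padding frames are all zeros; masked_mean and the sample numbers are hypothetical and only show the idea of averaging over real frames. Note that this per-frame form expects one target row per time step, whereas the target placeholder in the question has shape [None, 14].

```python
def masked_mean(per_frame_loss, frames):
    """Average per-frame losses over real frames only.

    A frame counts as padding when all of its values are zero.
    Raises ZeroDivisionError if every frame is padding.
    """
    mask = [1.0 if any(v != 0 for v in frame) else 0.0 for frame in frames]
    total = sum(loss * m for loss, m in zip(per_frame_loss, mask))
    return total / sum(mask)


# One sequence padded to 4 time steps; only the first 2 frames are real.
frames = [[0.2, 0.8], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
per_frame_loss = [0.5, 1.5, 9.0, 9.0]  # losses on padding frames must not count

mean_loss = masked_mean(per_frame_loss, frames)
print(mean_loss)
```

Without the mask, the two padding frames would dominate the average.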
### Tensorflow neural network setup
batch_size = None
sequence_length_max = max_length
data = tf.placeholder(tf.float32,[batch_size,sequence_length_max,input_dimension])
target = tf.placeholder(tf.float32,[None,14])
num_hidden = 24 ## Hidden layer
cell = tf.nn.rnn_cell.LSTMCell(num_hidden,state_is_tuple=True) ## Long short term memory
output, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32,sequence_length = length(data)) ## Creates the Rnn skeleton
last = last_relevant(output)  # tf.gather(val, int(val.get_shape()[0]) - 1) ## Appending as last
weight = tf.Variable(tf.truncated_normal([num_hidden, int(target.get_shape()[1])]))
bias = tf.Variable(tf.constant(0.1, shape=[target.get_shape()[1]]))
prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
cross_entropy = cost(output,target)# How far am I from correct value?
optimizer = tf.train.AdamOptimizer() ## TensorflowOptimizer
minimize = optimizer.minimize(cross_entropy)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(prediction, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
## Training ##
init_op = tf.initialize_all_variables()
sess = tf.Session()
batch_size = 1000
no_of_batches = int(len(train_data)/batch_size)
epoch = 5000
for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):
        inp, out = train_data[ptr:ptr+batch_size], train_output[ptr:ptr+batch_size]
        ptr += batch_size  # advance to the next batch; otherwise every step sees the same slice
        sess.run(minimize,{data: inp, target: out})
    print "Epoch - ",str(i)
incorrect = sess.run(error,{data: test_data, target: test_output})
print('Epoch {:2d} error {:3.1f}%'.format(i + 1, 100 * incorrect))
The error seems to be in the use of the function last_relevant, which should take the output and feed it back.
This is the error message:
TypeError: Expected binary or unicode string, got <function length at 0x7f846594dde8>
Any idea what could be wrong here?

I tried running your code locally.
There is a fundamental mistake in the code: you call tf.one_hot, but what you pass does not match what it expects.
From the documentation:
tf.one_hot(indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None)
However, as the first argument you are passing a function object ("length" is a function in your code; I recommend giving functions meaningful names that cannot be confused with ordinary variables) instead of a tensor of indices.
As a rough guide, put your indices in as the first parameter (in place of my placeholder empty list) and the error will go away:
relevant = tf.reduce_sum(
    tf.mul(output, tf.expand_dims(tf.one_hot([], max_length), -1)), 1)
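To make the masking idea concrete, here is a minimal plain-Python sketch (no TensorFlow) of what the one-hot trick is meant to compute. The helper names sequence_lengths, one_hot and last_relevant are illustrative, and the sketch assumes padding frames are all zeros. It uses index length - 1, because a one-hot vector for index max_length is all zeros and would select nothing for a full-length sequence. In the TensorFlow code, the analogous change would presumably be to pass the tensor returned by length(data), minus one, to tf.one_hot instead of the function itself; treat that as an assumption to check against your TensorFlow version.

```python
def sequence_lengths(batch):
    """Number of real frames per sequence (a frame is padding if all values are 0)."""
    return [sum(1 for frame in seq if any(v != 0 for v in frame)) for seq in batch]


def one_hot(index, depth):
    """Plain-Python stand-in for a one-hot row; an out-of-range index gives all zeros."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def last_relevant(outputs, lengths):
    """Pick the output at time step length - 1 by masking and summing over time."""
    max_length = len(outputs[0])
    picked = []
    for seq, n in zip(outputs, lengths):
        mask = one_hot(n - 1, max_length)  # 1.0 only at the last real frame
        width = len(seq[0])
        picked.append([sum(mask[t] * seq[t][d] for t in range(max_length))
                       for d in range(width)])
    return picked


# Two sequences padded to 3 time steps, 2 features per frame.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],  # real length 2
    [[5.0, 6.0], [7.0, 8.0], [9.0, 1.0]],  # real length 3
]
lengths = sequence_lengths(batch)
last = last_relevant(batch, lengths)
print(lengths, last)
```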


Use Chi-Squared statistic in pymc3

I am trying to use PyMC3 to fit a model to some observed data. This model is based on external code (interfaced via theano.ops.as_op), and depends on multiple parameters that should be fit by the MCMC process. Since the gradient of the external code cannot be determined, I use the Metropolis-Hastings sampler.
I have established Uniform priors for my inputs, and generate a model using my custom code. However, I want to compare the simulated data to my observations (a 3D np.ndarray) using the chi-squared statistic (the sum of (data - model)^2 / sigma^2) to obtain a log-likelihood. When the MCMC samples are drawn, this should lead to the trace converging on the best values of each parameter.
My model is explained in the following semi-pseudocode (if that's even a word):
import pymc3 as pm
# Some stuff setting up the data, preparing some functions etc.
# @theano.compile.ops.as_op(itypes=[input types], otypes=[output types])
def make_model(inputs):
    # Wrapper to external code to generate simulated data
    return simulated_data
model = pm.Model()
with model:
    # priors for 13 input parameters
    simData = make_model(inputs)
I now want to obtain the chi-squared logLikelihood for this model versus the data, which I think can be done using pm.ChiSquared, however I do not see how to combine the data, model and this distribution together to cause the sampler to perform correctly. I would guess it might look something like:
chiSq = pm.ChiSquared(nu=data.size, observed = (data-simData)**2/err**2)
trace = pm.sample(1000)
Is this correct? In running previous tests, I have found the samples appear to be simply drawn from the priors.
Thanks in advance.
Taking aloctavodia's advice, I was able to get parameter estimates for some toy exponential data using a pm.Normal likelihood. Using a pm.ChiSquared likelihood as the OP suggested, the model converged to correct values, but the posteriors on the parameters were roughly three times as broad. Here's the code for the model; I first generated data and then fit with PyMC3.
# Draw `nPoints` observed data points `y_obs` from the function
# 3. + 18. * numpy.exp(-.2 * x)
# with the points evaluated at `x_obs`
# x_obs = numpy.linspace(0, 100, nPoints)
# Add Normal(mu=0,sd=`cov`) noise to each point in `y_obs`
# Then instantiate PyMC3 model for fit:
def YModel(x, c, a, l):
    # exponential model expected to describe the data
    mu = c + a * pm.math.exp(-l * x)
    return mu
def logp(y_mod, y_obs):
    # Normal distribution likelihood
    return pm.Normal.dist(mu = y_mod, sd = cov).logp(y_obs)
    # Chi squared likelihood (to use, comment preceding line & uncomment next 2 lines)
    #chi2 = pm.math.sum( ((y_mod - y_obs)/cov)**2 )
    #return pm.ChiSquared.dist(nu = nPoints).logp(chi2)
with pm.Model() as model:
    c = pm.Uniform('constant', lower = 0., upper = 10., testval = 5.)
    a = pm.Uniform('amplitude', lower = 0., upper = 50., testval = 25.)
    l = pm.Uniform('lambda', lower = 0., upper = 10., testval = 5.)
    y_mod = YModel(x_obs, c, a, l)
    L = pm.DensityDist('L', logp, observed = {'y_mod': y_mod, 'y_obs': y_obs}, testval = {'y_mod': y_mod, 'y_obs': y_obs})
    step = pm.Metropolis([c, a, l])
    trace = pm.sample(draws = 10000, step = step)
The above model converged, but I found that success was sensitive to the bounds on the priors and the initial guesses on those parameters.
     mean       sd        mc_error  hpd_2.5    hpd_97.5   n_eff   Rhat
c    3.184397   0.111933  0.002563  2.958383   3.397741   1834.0  1.000260
a    18.276887  0.747706  0.019857  16.882025  19.762849  1343.0  1.000411
l    0.200201   0.013486  0.000361  0.174800   0.226480   1282.0  0.999991
(Edited: I had forgotten to sum the squares of the normalized residuals for chi2)
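For independent Gaussian errors with a known sigma, the log-likelihood equals -chi2/2 plus a constant that does not depend on the model parameters, so a Normal likelihood on the residuals is already a chi-squared fit. The sketch below checks this identity numerically in plain Python; the data values and the helper names gaussian_loglike and chi_squared are made up for illustration. By contrast, a chi-squared density with nu degrees of freedom evaluated at chi2 describes the distribution of the statistic itself and peaks near chi2 = nu - 2 rather than at the smallest chi2, which may be one reason the two likelihoods give different posterior widths.

```python
import math


def gaussian_loglike(y_obs, y_mod, sigma):
    """Sum of independent Normal log-densities with a common, known sigma."""
    return sum(
        -0.5 * ((yo - ym) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        for yo, ym in zip(y_obs, y_mod)
    )


def chi_squared(y_obs, y_mod, sigma):
    """Sum of squared normalized residuals."""
    return sum(((yo - ym) / sigma) ** 2 for yo, ym in zip(y_obs, y_mod))


# Hypothetical observations and model predictions.
y_obs = [20.9, 14.1, 9.6, 7.2]
y_mod = [21.0, 13.9, 9.7, 7.4]
sigma = 0.5

n = len(y_obs)
const = -n * (math.log(sigma) + 0.5 * math.log(2.0 * math.pi))  # parameter-independent
loglike = gaussian_loglike(y_obs, y_mod, sigma)
chi2 = chi_squared(y_obs, y_mod, sigma)

# The two sides agree up to floating-point rounding.
print(loglike, -0.5 * chi2 + const)
```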

TensorFlow CNN with 3D input is too slow to train

I am building a TensorFlow CNN in Python 2.7 to classify 40x40x40 3D images into one of two categories. The training data is stored in an HDF5 file as a (5000,40,40,40,1) array (5000 training images, 1 color channel). However, when I train the network, each iteration on a batch of 32 images takes about a minute to complete. In Activity Monitor I see that about 6 GB of data is written to disk in each iteration. The HDF5 file itself is only about 500 MB. What is happening?
This is the code I used to load data:
f = h5py.File(file_name, 'r')
images = f.get('key_name')
images = np.array(images)
images = images.astype(np.float32)
images = np.multiply(images, 1.0 / 100.0)
I also tried reading directly from the HDF5 dataset in each iteration, instead of loading all the data into memory at once. But the problem remains:
f = h5py.File(file_name, 'r')
images = np.array(f.get('key_name')[:batch_size])
CNN Structure
The CNN I built has two 3D convolution layers, one flatten layer, one fully connected layer and one output layer. Here is the complete code for the structure:
num_channels = 1
filter_size_conv1 = 5
filter_size_conv2 = 5
num_filters_conv1 = 32
num_filters_conv2 = 64
fc_layer_size = 64
x = tf.placeholder(tf.float32, shape=[None, img_size, img_size, img_size, num_channels], name='x')
y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
y_true_cls = tf.argmax(y_true, dimension=1)
def create_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def create_biases(size):
    return tf.Variable(tf.constant(0.05,shape=[size]))
def create_convolutional_layer(input,
    weights = create_weights(shape=[
        conv_filter_size, conv_filter_size, conv_filter_size, num_input_channels, num_filters])
    biases = create_biases(num_filters)
    layer = tf.nn.conv3d(input=input,
                         strides=[1, 1, 1, 1, 1],
    layer += biases
    layer = tf.nn.max_pool3d(input=layer,
                             ksize=[1, 4, 4, 4, 1],
                             strides=[1, 4, 4, 4, 1],
    layer = tf.nn.relu(layer)
    return layer
def create_flatten_layer(layer):
    layer_shape = layer.get_shape()
    num_features = layer_shape[1:5].num_elements()
    layer = tf.reshape(layer, [-1, num_features])
    return layer
def create_fc_layer(input,
    weights = create_weights(shape=[num_inputs, num_outputs])
    biases = create_biases(num_outputs)
    layer = tf.matmul(input, weights) + biases
    if use_relu:
        layer = tf.nn.relu(layer)
    return layer
layer_conv1 = create_convolutional_layer(input=x,
layer_conv2 = create_convolutional_layer(input=layer_conv1,
layer_flat = create_flatten_layer(layer_conv2)
layer_fc1 = create_fc_layer(input=layer_flat,
layer_fc2 = create_fc_layer(input=layer_fc1,
Any help would be appreciated. Thank you so much!
I actually ran into the same problem. I'm using conv3d and training takes forever to finish. It is probably related to how well conv3d is optimized.
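To put rough numbers on the memory involved, the sketch below computes the size of two arrays from the question as float32: the full training set, and the output of the first convolution for one batch of 32. It is back-of-the-envelope arithmetic, not a diagnosis. The helper num_bytes is hypothetical, and the conv1 shape assumes the spatial size is unchanged by the convolution (stride 1 as shown; the padding argument is not visible in the posted code).

```python
BYTES_PER_FLOAT32 = 4


def num_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for a dense array of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Whole training set held in memory as float32.
dataset_bytes = num_bytes((5000, 40, 40, 40, 1))

# Output of the first conv layer for one batch: 32 images, 32 filters,
# assuming the 40x40x40 spatial size is preserved before pooling.
conv1_bytes = num_bytes((32, 40, 40, 40, 32))

print("dataset: %.2f GB" % (dataset_bytes / 1e9))
print("conv1 activations per batch: %.0f MB" % (conv1_bytes / 1e6))
```

In the loading code, np.array(...), astype and np.multiply each return a new array, so loading can temporarily hold more than one full-size copy of the data set. During training, the pre-pooling activations and their gradients are each of the size computed above.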

How to deal with overfitting in Tensorflow?

I'm currently trying to train an image classification convolutional neural network. I'm using an architecture similar to the one in the TensorFlow tutorial. After training, I get quite a high training accuracy and a very low cross entropy, but the test accuracy is always only slightly higher than random guessing. The network seems to be overfitting. During training I applied stochastic gradient descent and dropout to try to avoid overfitting, but it doesn't seem to work.
Here is part of my code.
batch_image = np.ndarray(shape=(100,9216), dtype='float')
batch_class = np.ndarray(shape=(100,10), dtype='float')
# first convolutinal layer
w_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])
x_image = tf.reshape(x, [-1, 48, 64, 3])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
norm1 = tf.nn.lrn(tf.to_float(h_pool1, name='ToFloat'), 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
# second convolutional layer
w_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(norm1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
norm2 = tf.nn.lrn(tf.to_float(h_pool2, name='ToFloat'), 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
# densely connected layer
w_fc1 = weight_variable([12*16*64, 512])
b_fc1 = bias_variable([512])
h_pool2_flat = tf.reshape(norm2, [-1, 12*16*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
#densely connected layer
w_fc2 = weight_variable([512, 256])
b_fc2 = bias_variable([256])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1, w_fc2) + b_fc2)
# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc2, keep_prob)
# readout layer
w_fc3 = weight_variable([256, 10])
b_fc3 = bias_variable([10])
y_prob = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc3) + b_fc3)
# train and evaluate the model
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_prob + 0.000000001))
train_step = tf.train.GradientDescentOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_prob, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
for i in range(100):
    rand_idx = np.random.randint(17778, size=(100))
    k = 0
    for j in rand_idx:
        batch_image[k] = images[j]
        batch_class[k] = np.zeros(shape=(10))
        batch_class[k, classes[j, 0]] = 1.0
        k += 1  # move to the next row of the batch; without this only row 0 is ever filled
    train_step.run(feed_dict={x:batch_image, y_:batch_class, keep_prob:0.5})
    train_accuracy = accuracy.eval(feed_dict={x:batch_image, y_:batch_class, keep_prob:1.0})
    train_ce = cross_entropy.eval(feed_dict={x:batch_image, y_:batch_class, keep_prob:1.0})
I am wondering whether there is a mistake in my code, or whether I have to apply other strategies to get a better test accuracy.
Thank you!
You can try the strategies below to avoid overfitting.
Shuffle the input data.
Use early stopping on the loss with some patience level.
L1 and L2 regularization.
Add dropout.
Batch normalization.
If the pixels are not normalized, dividing the pixel values by 255 also helps.
Perform image data augmentation.
Possibly hyperparameter tuning with grid search.
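As an illustration of the early stopping item, here is a minimal framework-independent sketch. It assumes a validation loss is recorded after every epoch; the function name early_stopping_epoch and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Validation loss improves until epoch 2, then drifts upward (overfitting).
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.71, 0.75]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In practice one would also keep a copy of the weights from the best epoch (epoch 2 here) and restore them after stopping.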
Hope it helps! Happy Coding.
Thank You!

Data Augmentation Image Data Generator Keras Semantic Segmentation

I'm fitting a fully convolutional network on some image data for semantic segmentation using Keras. However, I'm having some problems with overfitting. I don't have that much data and I want to do data augmentation. However, as I want to do pixel-wise classification, I need any augmentations like flips, rotations, and shifts to apply to both the feature images and the label images. Ideally I'd like to use the Keras ImageDataGenerator for on-the-fly transformations. However, as far as I can tell, you cannot do equivalent transformations on both the feature and label data.
Does anyone know if this is the case and if not, does anyone have any ideas? Otherwise, I'll use other tools to create a larger dataset and just feed it in all at once.
Yes you can. Here's an example from the Keras docs. You zip together two generators seeded with the same seed and then pass the result to fit_generator.
# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)
image_generator = image_datagen.flow_from_directory(
mask_generator = mask_datagen.flow_from_directory(
# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)
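The reason identical seeds keep images and masks aligned can be shown with the standard library alone. The sketch below is an analogy, not Keras code: random_flips is a hypothetical generator that randomly reverses each item, and two instances created with the same seed make the same decision at every step.

```python
import random


def random_flips(items, seed):
    """Yield each item, reversed with probability 0.5, using a private seeded RNG."""
    rng = random.Random(seed)
    for item in items:
        flip = rng.random() < 0.5
        yield item[::-1] if flip else item


# Toy "images" and their pixel-wise "masks" (rows of pixels).
images = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
masks = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]

seed = 1
pairs = list(zip(random_flips(images, seed), random_flips(masks, seed)))

for image, mask in pairs:
    print(image, mask)
```

Each image is flipped exactly when its mask is flipped, because both generators draw the same random sequence. Under Python 2, zip tries to exhaust its arguments, so with endless generators the lazy itertools.izip is needed instead.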
There is ongoing work on extending ImageDataGenerator to be more flexible for exactly these types of cases (see this issue on GitHub for examples).
Additionally, as mentioned by Mikael Rousson in the comments, you can easily create your own version of ImageDataGenerator yourself, while leveraging many of its built-in functions to make it easier. Here is an example code I've used for an image denoising problem, where I use random crops + additive noise to generate clean and noisy image pairs on the fly. You could easily modify this to add other types of augmentations. After which, you can use Model.fit_generator to train using these methods.
import numpy as np
from keras.preprocessing.image import load_img, img_to_array, list_pictures
def random_crop(image, crop_size):
    # image is indexed as (channels, height, width)
    height, width = image.shape[1:]
    dy, dx = crop_size
    if width < dx or height < dy:
        return None
    x = np.random.randint(0, width - dx + 1)
    y = np.random.randint(0, height - dy + 1)
    return image[:, y:(y+dy), x:(x+dx)]
def image_generator(list_of_files, crop_size, to_grayscale=True, scale=1, shift=0):
    while True:
        filename = np.random.choice(list_of_files)
        img = img_to_array(load_img(filename, to_grayscale))
        cropped_img = random_crop(img, crop_size)
        if cropped_img is None:
            # image is smaller than the crop; pick another file
            continue
        yield scale * cropped_img - shift
def corrupted_training_pair(images, sigma):
    for img in images:
        target = img
        if sigma > 0:
            source = img + np.random.normal(0, sigma, img.shape)/255.0
        else:
            source = img
        yield (source, target)
def group_by_batch(dataset, batch_size):
    while True:
        sources, targets = zip(*[next(dataset) for i in range(batch_size)])
        batch = (np.stack(sources), np.stack(targets))
        yield batch
def load_dataset(directory, crop_size, sigma, batch_size):
    files = list_pictures(directory)
    generator = image_generator(files, crop_size, scale=1/255.0, shift=0.5)
    generator = corrupted_training_pair(generator, sigma)
    generator = group_by_batch(generator, batch_size)
    return generator
You can then use the above like so:
train_set = load_dataset('images/train', (patch_height, patch_width), noise_sigma, batch_size)
val_set = load_dataset('images/val', (patch_height, patch_width), noise_sigma, batch_size)
model.fit_generator(train_set, samples_per_epoch=batch_size * 1000, nb_epoch=nb_epoch, validation_data=val_set, nb_val_samples=1000)

Return progress status when drawing a large NetworkX graph

I have a large graph that I'm drawing, and it is taking a long time to draw.
Is it possible to return a status, current_node, or percentage of the current status of the drawing?
I'm not looking to draw the network incrementally, as all I'm doing is saving it to a high-dpi image.
Here's an example of the code I'm using:
path = nx.shortest_path(G, source=u'1234', target=u'98765')
path_edges = zip(path, path[1:])
pos = nx.spring_layout(G)
plt.savefig('prototype_map.png', dpi=1000)
I believe the only way to do it is to modify the source code of the draw function so that it prints something like 10%, 20% complete, and so on. But when I checked the source code of draw_networkx_nodes and draw_networkx, I realized that it is not a straightforward task: the draw function stores the positions (nodes and edges) in a numpy array and sends it to the ax.scatter function of matplotlib, which is a bit hard to manipulate without messing something up. The only thing I can think of is to change:
xy = numpy.asarray([pos[v] for v in nodelist]) # In draw_networkx_nodes function
to:
xy = []
count = 0
for v in nodelist:
    xy.append(pos[v])
    count += 1
    if count == len(nodelist) // 2:
        print '50% of nodes completed'
print '100% of nodes completed'
xy = numpy.asarray(xy)
Similarly, the same can be done when draw_networkx_edges is called, to indicate progress in drawing the edges. I am not sure how accurate this will be, because I do not know how much time is spent in the ax.scatter function. I also looked at the source code of the scatter function, but I could not pinpoint a loop or anything similar where an indication of progress could be printed.
Some layout functions accept a pos argument, which makes incremental work possible. We can use this to split the computation into chunks and draw a progress bar using tqdm:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from tqdm import tqdm
def plot_graph(g, iterations=50, pos=None, k_numerator=None, figsize=(10, 10)):
    if k_numerator is None:
        k = None
    else:
        k = k_numerator / np.sqrt(g.number_of_nodes())
    with tqdm(total=iterations) as pbar:
        step = 5
        iterations_done = 0
        while iterations_done < iterations:
            # continue the layout from the previous positions
            pos = nx.layout.fruchterman_reingold_layout(
                g, iterations=step, pos=pos, k=k
            )
            iterations_done += step
            pbar.update(step)
    fig = plt.figure(figsize=figsize, dpi=120)
    return fig, pos
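The chunking pattern itself does not depend on NetworkX or tqdm. Below is a minimal standard-library sketch of it, with hypothetical names: run_in_chunks repeatedly calls an advance function that continues from the previous state, and reports after each chunk. A Newton iteration for the square root of 2 stands in for the layout call; with NetworkX, the state would be pos and advance would call the layout function with iterations=n.

```python
def run_in_chunks(advance, state, total_iterations, chunk_size, report):
    """Run an iterative computation in chunks, reporting progress after each one."""
    done = 0
    while done < total_iterations:
        # The last chunk may be shorter so the total is never exceeded.
        n = min(chunk_size, total_iterations - done)
        state = advance(state, n)
        done += n
        report(done, total_iterations)
    return state


def advance(value, n):
    """Stand-in for a resumable computation: n Newton steps towards sqrt(2)."""
    for _ in range(n):
        value = 0.5 * (value + 2.0 / value)
    return value


progress = []
result = run_in_chunks(advance, 1.0, 12, 5,
                       lambda done, total: progress.append(done))
print(progress, result)
```

Clamping the last chunk matters when the total is not a multiple of the chunk size; a fixed step would overshoot the requested number of iterations.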