I'm fitting a fully convolutional network on some image data for semantic segmentation using Keras. However, I'm having some problems with overfitting. I don't have that much data, so I want to do data augmentation. Since I want to do pixel-wise classification, I need any augmentations like flips, rotations, and shifts to apply to both the feature images and the label images. Ideally I'd like to use the Keras ImageDataGenerator for on-the-fly transformations. However, as far as I can tell, you cannot do equivalent transformations on both the feature and label data.
Does anyone know if this is the case and if not, does anyone have any ideas? Otherwise, I'll use other tools to create a larger dataset and just feed it in all at once.

Yes you can. Here's an example from Keras's docs. You zip together two generators seeded with the same seed and then pass the result to fit_generator.
from keras.preprocessing.image import ImageDataGenerator
# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90.,
                     width_shift_range=0.1, height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)
image_generator = image_datagen.flow_from_directory('data/images', class_mode=None, seed=seed)
mask_generator = mask_datagen.flow_from_directory('data/masks', class_mode=None, seed=seed)
# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

There is ongoing work on extending ImageDataGenerator to be more flexible for exactly these types of cases (see this issue on GitHub for examples).
Additionally, as mentioned by Mikael Rousson in the comments, you can easily write your own version of ImageDataGenerator while leveraging many of its built-in functions. Here is example code I've used for an image denoising problem, where I use random crops + additive noise to generate clean and noisy image pairs on the fly. You could easily modify this to add other types of augmentations. After that, you can use Model.fit_generator to train with these generators.
import numpy as np
from keras.preprocessing.image import load_img, img_to_array, list_pictures
def random_crop(image, crop_size):
    # image is channels-first: (channels, height, width)
    height, width = image.shape[1:]
    dy, dx = crop_size
    if width < dx or height < dy:
        return None
    x = np.random.randint(0, width - dx + 1)
    y = np.random.randint(0, height - dy + 1)
    return image[:, y:(y+dy), x:(x+dx)]
def image_generator(list_of_files, crop_size, to_grayscale=True, scale=1, shift=0):
    while True:
        filename = np.random.choice(list_of_files)
        img = img_to_array(load_img(filename, to_grayscale))
        cropped_img = random_crop(img, crop_size)
        if cropped_img is None:
            # image is smaller than the crop size; pick another file
            continue
        yield scale * cropped_img - shift
def corrupted_training_pair(images, sigma):
    for img in images:
        target = img
        if sigma > 0:
            source = img + np.random.normal(0, sigma, img.shape)/255.0
        else:
            source = img
        yield (source, target)
def group_by_batch(dataset, batch_size):
    while True:
        sources, targets = zip(*[next(dataset) for i in range(batch_size)])
        batch = (np.stack(sources), np.stack(targets))
        yield batch
def load_dataset(directory, crop_size, sigma, batch_size):
    files = list_pictures(directory)
    generator = image_generator(files, crop_size, scale=1/255.0, shift=0.5)
    generator = corrupted_training_pair(generator, sigma)
    generator = group_by_batch(generator, batch_size)
    return generator
You can then use the above like so:
train_set = load_dataset('images/train', (patch_height, patch_width), noise_sigma, batch_size)
val_set = load_dataset('images/val', (patch_height, patch_width), noise_sigma, batch_size)
model.fit_generator(train_set, samples_per_epoch=batch_size * 1000, nb_epoch=nb_epoch, validation_data=val_set, nb_val_samples=1000)
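For the segmentation case in the question, the same hand-rolled approach can apply one random transform to an image and its mask together. The following is only a minimal NumPy sketch, not part of Keras: `paired_augment` and `paired_batches` are hypothetical names, the arrays are assumed to be channels-last (height, width, channels), and the images are assumed to be square so that rotated samples still stack into one batch.

```python
import numpy as np

def paired_augment(image, mask, rng):
    """Apply one random flip and one random 90-degree rotation to both arrays."""
    if rng.random() < 0.5:
        # horizontal flip: reverse the width axis of image and mask alike
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))  # number of quarter turns, shared by both
    return np.rot90(image, k), np.rot90(mask, k)

def paired_batches(images, masks, batch_size, seed=0):
    """Yield (image_batch, mask_batch) forever, augmented identically per sample."""
    rng = np.random.default_rng(seed)
    while True:
        idx = rng.integers(0, len(images), size=batch_size)
        pairs = [paired_augment(images[i], masks[i], rng) for i in idx]
        xs, ys = zip(*pairs)
        yield np.stack(xs), np.stack(ys)
```

Because each sample's image and mask go through the same draw from `rng`, the pixel-wise correspondence between them is preserved. Shifts or crops could be added in the same way, as long as both arrays receive the same random parameters.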


Finding random index with specific value in large numpy array

I have a very large 2D numpy array (~5e8 values). I have labeled that array using scipy.ndimage.label. I then want to find, for each label, a random index of the flattened array that contains that label. I can do this with:
import numpy as np
from scipy.ndimage import label
base_array = np.random.randint(0, 5, (100000, 5000))
labeled_array, nlabels = label(base_array)
for label_num in range(1, nlabels+1):
    indices = np.where(labeled_array.flat == label_num)[0]
    index = np.random.choice(indices)
But it is slow with an array this large. I have also tried replacing the np.where with:
indices = np.argwhere(labeled_array.flat == label_num).squeeze()
And found it to be slower. I have a suspicion that the boolean masking is the slow part. Is there any way to speed this up, or a better way to do this? I will say that in my real application the array is fairly sparse, with about 25% fill, though I have no experience with scipy's sparse array functions.
Your suspicion that masking separately for each label is expensive is correct, because no matter how you do it the masking will always be O(n).
We can circumvent this by argsorting by label and then randomly picking from each block of equal labels.
Since the labels are an integer range we can get the argsort cheaper than np.argsort by using some sparse matrix machinery available in scipy.
As my machine doesn't have an awful lot of RAM, I had to shrink your example a bit (by a factor of 4). It then runs in about 5 seconds.
import numpy as np
from scipy.ndimage import label
from scipy import sparse
def multi_randint(bins):
    """draw one random int from each range(bins[i], bins[i+1])"""
    high = np.diff(bins)
    n = high.size
    pick = np.random.randint(0, 1<<30, (n,))
    # reject picks from the incomplete last cycle so that pick % high stays uniform
    reject = np.flatnonzero(pick + (1<<30) % high >= (1<<30))
    while reject.size:
        npick = np.random.randint(0, 1<<30, (reject.size,))
        rejrej = npick + (1<<30) % high[reject] >= (1<<30)
        pick[reject] = npick
        reject = reject[rejrej]
    return bins[:-1] + pick % high
# build mock data, note that I had to shrink by 4x b/c memory
base_array = np.random.randint(0, 5, (50000, 2500), dtype=np.int8)
labeled_array, nlabels = label(base_array)
# build auxiliary sparse matrix
h = sparse.csr_matrix(
    (np.ones(labeled_array.size, bool), labeled_array.ravel(),
     np.arange(labeled_array.size+1, dtype=np.int32)),
    (labeled_array.size, nlabels+1))
# conversion to csc argsorts the labels (but cheaper than argsort)
h = h.tocsc()
# draw
result = h.indices[multi_randint(h.indptr)]
# check result
assert len(set(labeled_array.ravel()[result])) == nlabels+1
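For arrays that fit comfortably in memory, the "sort by label, then pick once per block" idea can also be sketched with a plain np.argsort, which is slower than the sparse-matrix trick above but easier to follow. This is only an illustrative sketch: `random_index_per_label` is a hypothetical helper, and it uses NumPy's newer Generator API rather than the np.random.randint calls above.

```python
import numpy as np

def random_index_per_label(labels, rng=None):
    """Return (distinct labels, one random flat index holding each label)."""
    rng = np.random.default_rng() if rng is None else rng
    flat = labels.ravel()
    # flat indices reordered so that equal labels sit in contiguous blocks
    order = np.argsort(flat, kind="stable")
    sorted_labels = flat[order]
    # position where each block of equal labels starts
    starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
    bounds = np.r_[starts, flat.size]
    # one uniform draw from each half-open range [bounds[i], bounds[i+1])
    picks = rng.integers(bounds[:-1], bounds[1:])
    return sorted_labels[starts], order[picks]

labels = np.array([[0, 1, 1],
                   [2, 2, 0],
                   [1, 0, 2]])
values, flat_indices = random_index_per_label(labels, np.random.default_rng(0))
```

The sort costs O(n log n) once, instead of one O(n) mask per label, which is the saving the answer describes.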

Use Chi-Squared statistic in pymc3

I am trying to use PyMC3 to fit a model to some observed data. This model is based on external code (interfaced via theano.ops.as_op), and depends on multiple parameters that should be fit by the MCMC process. Since the gradient of the external code cannot be determined, I use the Metropolis-Hastings sampler.
I have established Uniform priors for my inputs, and generate a model using my custom code. However, I want to compare the simulated data to my observations (a 3D np.ndarray) using the chi-squared statistic (the sum of ((data - model)/sigma)^2) to obtain a log-likelihood. When the MCMC samples are drawn, this should lead to the trace converging on the best values of each parameter.
My model is explained in the following semi-pseudocode (if that's even a word):
import pymc3 as pm
#Some stuff setting up the data, preparing some functions etc.
#theano.compile.ops.as_op(itypes=[input types],otypes = [output types])
def make_model(inputs):
    #Wrapper to external code to generate simulated data
    return simulated_data
model = pm.Model()
with model:
    #priors for 13 input parameters
    simData = make_model(inputs)
I now want to obtain the chi-squared logLikelihood for this model versus the data, which I think can be done using pm.ChiSquared, however I do not see how to combine the data, model and this distribution together to cause the sampler to perform correctly. I would guess it might look something like:
chiSq = pm.ChiSquared(nu=data.size, observed = (data-simData)**2/err**2)
trace = pm.sample(1000)
Is this correct? In running previous tests, I have found the samples appear to be simply drawn from the priors.
Thanks in advance.
Taking aloctavodia's advice, I was able to get parameter estimates for some toy exponential data using a pm.Normal likelihood. Using a pm.ChiSquared likelihood as the OP suggested, the model converged to correct values, but the posteriors on the parameters were roughly three times as broad. Here's the code for the model; I first generated data and then fit with PyMC3.
# Draw `nPoints` observed data points `y_obs` from the function
# 3. + 18. * numpy.exp(-.2 * x)
# with the points evaluated at `x_obs`
# x_obs = numpy.linspace(0, 100, nPoints)
# Add Normal(mu=0,sd=`cov`) noise to each point in `y_obs`
# Then instantiate PyMC3 model for fit:
def YModel(x, c, a, l):
    # exponential model expected to describe the data
    mu = c + a * pm.math.exp(-l * x)
    return mu
def logp(y_mod, y_obs):
    # Normal distribution likelihood
    return pm.Normal.dist(mu = y_mod, sd = cov).logp(y_obs)
    # Chi squared likelihood (to use, comment preceding line & uncomment next 2 lines)
    #chi2 = pm.math.sum( ((y_mod - y_obs)/cov)**2 )
    #return pm.ChiSquared.dist(nu = nPoints).logp(chi2)
with pm.Model() as model:
    c = pm.Uniform('constant', lower = 0., upper = 10., testval = 5.)
    a = pm.Uniform('amplitude', lower = 0., upper = 50., testval = 25.)
    l = pm.Uniform('lambda', lower = 0., upper = 10., testval = 5.)
    y_mod = YModel(x_obs, c, a, l)
    L = pm.DensityDist('L', logp, observed = {'y_mod': y_mod, 'y_obs': y_obs}, testval = {'y_mod': y_mod, 'y_obs': y_obs})
    step = pm.Metropolis([c, a, l])
    trace = pm.sample(draws = 10000, step = step)
The above model converged, but I found that success was sensitive to the bounds on the priors and the initial guesses on those parameters.
   mean       sd        mc_error  hpd_2.5    hpd_97.5   n_eff   Rhat
c  3.184397   0.111933  0.002563  2.958383   3.397741   1834.0  1.000260
a  18.276887  0.747706  0.019857  16.882025  19.762849  1343.0  1.000411
l  0.200201   0.013486  0.000361  0.174800   0.226480   1282.0  0.999991
(Edited: I had forgotten to sum the squares of the normalized residuals for chi2)
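One possible reason the pm.Normal likelihood is the more natural choice: for independent Gaussian errors with known sigma, the log-likelihood equals -chi2/2 plus a constant that does not depend on the parameters. The check below is a small NumPy/SciPy sketch of that identity on toy data shaped like the example above; `chi_squared` and `gaussian_loglike` are hypothetical helper names and the noise level is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

def chi_squared(y_obs, y_mod, sigma):
    """Sum of squared residuals, each normalised by sigma."""
    return np.sum(((y_obs - y_mod) / sigma) ** 2)

def gaussian_loglike(y_obs, y_mod, sigma):
    """Log-likelihood of independent Normal(y_mod, sigma) observations."""
    return np.sum(stats.norm.logpdf(y_obs, loc=y_mod, scale=sigma))

rng = np.random.default_rng(0)
x = np.linspace(0, 100, 50)
sigma = 0.5  # assumed noise level for this toy example
y_obs = 3.0 + 18.0 * np.exp(-0.2 * x) + rng.normal(0, sigma, x.size)

# parameter-independent normalisation term of the Gaussian log-likelihood
const = -x.size * np.log(sigma * np.sqrt(2 * np.pi))
for c in (2.5, 3.0, 3.5):
    y_mod = c + 18.0 * np.exp(-0.2 * x)
    lhs = gaussian_loglike(y_obs, y_mod, sigma)
    rhs = -0.5 * chi_squared(y_obs, y_mod, sigma) + const
    print(c, lhs, rhs)  # the two columns agree for every trial value of c
```

By contrast, pm.ChiSquared.dist(nu).logp(chi2) evaluates the density of the chi-squared statistic itself, which is a different function of the parameters. That may be related to the broader posteriors reported above, though this sketch does not demonstrate that.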

keras autoencoder vs PCA

I am playing with a toy example to understand PCA vs keras autoencoder
I have the following code for understanding PCA:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
pca = decomposition.PCA(n_components=3)
pca.fit(X)
pca.explained_variance_ratio_
array([ 0.92461621, 0.05301557, 0.01718514])
pca.components_
array([[ 0.36158968, -0.08226889, 0.85657211, 0.35884393],
       [ 0.65653988, 0.72971237, -0.1757674 , -0.07470647],
       [-0.58099728, 0.59641809, 0.07252408, 0.54906091]])
I have done some reading and played with some Keras code, including this one.
However, the reference code feels like too big a leap for my level of understanding.
Does someone have a short auto-encoder code which can show me
(1) how to pull the first 3 components from auto-encoder
(2) how to understand what amount of variance the auto-encoder captures
(3) how the auto-encoder components compare against PCA components
First of all, the aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. So, the target output of the autoencoder is the autoencoder input itself.
It is shown in [1] that if there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input in the span of the first k principal components of the data.
And in [2] you can see that if the hidden layer is nonlinear, the autoencoder behaves differently from PCA, with the ability to capture multi-modal aspects of the input distribution.
Autoencoders are data-specific, which means that they will only be able to compress data similar to what they have been trained on. So, the usefulness of the features learned by the hidden layers can be used to evaluate the efficacy of the method.
For this reason, one way to evaluate an autoencoder's efficacy in dimensionality reduction is to cut the network at the output of the middle hidden layer and compare the accuracy/performance of your desired algorithm on this reduced data against using the original data.
Generally, PCA is a linear method, while autoencoders are usually non-linear. Mathematically, it is hard to compare them directly, but to give some intuition, here is an example of dimensionality reduction on the MNIST dataset using an autoencoder. The code is here:
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils
import numpy as np
num_train = 60000
num_test = 10000
height, width, depth = 28, 28, 1 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(num_train, height * width)
X_test = X_test.reshape(num_test, height * width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range
Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels
input_img = Input(shape=(height * width,))
x = Dense(height * width, activation='relu')(input_img)
encoded = Dense(height * width//2, activation='relu')(x)
encoded = Dense(height * width//8, activation='relu')(encoded)
# bottleneck layer: 784 // 256 = 3 units, fed from the last encoder layer
y = Dense(height * width//256, activation='relu')(encoded)
decoded = Dense(height * width//8, activation='relu')(y)
decoded = Dense(height * width//2, activation='relu')(decoded)
z = Dense(height * width, activation='sigmoid')(decoded)
model = Model(input_img, z)
model.compile(optimizer='adadelta', loss='mse')
model.fit(X_train, X_train,
          validation_data=(X_test, X_test))
mid = Model(input_img, y)
reduced_representation = mid.predict(X_test)
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
# the loss and optimizer here are assumptions; only the accuracy metric is given
reduced.compile(loss='categorical_crossentropy', optimizer='adam',
                metrics=['accuracy'])
reduced.fit(X_train, Y_train,
            validation_data=(X_test, Y_test))
scores = reduced.evaluate(X_test, Y_test, verbose=1)
print("Accuracy: ", scores[1])
It produces a $y\in \mathbb{R}^{3}$ (almost like what you get by decomposition.PCA(n_components=3)). For example, here you see the outputs of layer y for a digit 5 instance in the dataset:
class  y_1    y_2   y_3
5      87.38  0.00  20.79
As you see in the above code, when we connect layer y to a softmax dense layer:
out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
the new model reduced gives us a good classification accuracy of about 95%. So, it would be reasonable to say that y is an efficiently extracted feature vector for the dataset.
[1]: Bourlard, Hervé, and Yves Kamp. "Auto-association by multilayer perceptrons and singular value decomposition." Biological cybernetics 59.4 (1988): 291-294.
[2]: Japkowicz, Nathalie, Stephen Jose Hanson, and Mark A. Gluck. "Nonlinear autoassociation is not equivalent to PCA." Neural computation 12.3 (2000): 531-545.
The earlier answer covers the whole thing; however, I am doing the analysis on the Iris data. My code is a slight modification of this post, which dives further into the topic. As requested, let's load the data
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
Let's do a regular PCA
from sklearn import decomposition
pca = decomposition.PCA()
pca_transformed = pca.fit_transform(X_scaled)
plot3clusters(pca_transformed[:,:2], 'PCA', 'PC')
Next, a very simple AE model with linear layers. As the earlier answer pointed out, citing its first reference: if there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input in the span of the first k principal components of the data.
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt
#create an AE and fit it with our data using encoding_dim neurons in the dense layer using keras' functional API
input_dim = X_scaled.shape[1]
encoding_dim = 2
input_img = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='linear')(input_img)
decoded = Dense(input_dim, activation='linear')(encoded)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
# validation_split is an assumed value; set epochs and batch_size to suit your data
history = autoencoder.fit(X_scaled, X_scaled,
                          validation_split=0.1,
                          verbose = 0)
# use our encoded layer to encode the training input
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))
encoded_data = encoder.predict(X_scaled)
plot3clusters(encoded_data[:,:2], 'Linear AE', 'AE')
You can look into the loss if you want
#plot our loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()
The function to plot the data
def plot3clusters(X, title, vtitle):
    import matplotlib.pyplot as plt
    plt.figure()
    colors = ['navy', 'turquoise', 'darkorange']
    lw = 2
    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(X[y == i, 0], X[y == i, 1], color=color, alpha=1., lw=lw, label=target_name)
    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title(title)
    plt.xlabel(vtitle + "1")
    plt.ylabel(vtitle + "2")
    plt.show()
Regarding explained variance: using a non-linear hidden activation leads to other approximations, similar to ICA, t-SNE and others, where the notion of explained variance does not apply. One can still look at the convergence of the loss.
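For the linear case, one way to put a number on "how much variance the auto-encoder captures" is to compare the reconstruction error with the total variance of the data. The sketch below is NumPy-only and hedged: `explained_variance_fraction` and `pca_reconstruct` are hypothetical helpers, the data is synthetic, and the rank-k PCA reconstruction merely stands in for what autoencoder.predict(X_scaled) would return.

```python
import numpy as np

def explained_variance_fraction(X, X_hat):
    """1 - (residual sum of squares / total sum of squares about the column means)."""
    total = np.sum((X - X.mean(axis=0)) ** 2)
    residual = np.sum((X - X_hat) ** 2)
    return 1.0 - residual / total

def pca_reconstruct(X, k):
    """Project X onto its first k principal components and map back."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k] + mean

rng = np.random.default_rng(0)
# synthetic stand-in for a (150, 4) data matrix such as the scaled Iris features
X = rng.normal(size=(150, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
for k in (1, 2, 3, 4):
    print(k, explained_variance_fraction(X, pca_reconstruct(X, k)))
```

For PCA this fraction equals the cumulative sum of explained_variance_ratio_ over the first k components, so applying the same function to an autoencoder's reconstruction gives a figure that can be compared directly. A well-trained linear autoencoder with k hidden units should approach the PCA value and not exceed it.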

Tensorflow RNN training won't execute?

I am currently trying to train this RNN network, but seem to be running into weird errors, which I am not able to decode.
The input to my RNN is digitally sampled audio files. As the audio files can be of different lengths, the vectors of sampled audio will also have different lengths.
The output, or target, of the neural network is a 14-dimensional vector containing certain information about the audio files. I already know the target, having calculated it manually, but need to make it work with a neural network.
I am currently using tensorflow as framework.
My network setup looks like this:
def last_relevant(output):
    max_length = int(output.get_shape()[1])
    relevant = tf.reduce_sum(tf.mul(output, tf.expand_dims(tf.one_hot(length, max_length), -1)), 1)
    return relevant
def length(sequence): ##Zero padding to fit the max length... Question whether that is a good idea.
    used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
    length = tf.reduce_sum(used, reduction_indices=1)
    length = tf.cast(length, tf.int32)
    return length
def cost(output, target):
    # Compute cross entropy for each frame.
    cross_entropy = target * tf.log(output)
    cross_entropy = -tf.reduce_sum(cross_entropy, reduction_indices=2)
    mask = tf.sign(tf.reduce_max(tf.abs(target), reduction_indices=2))
    cross_entropy *= mask
    # Average over actual sequence lengths.
    cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1)
    cross_entropy /= tf.reduce_sum(mask, reduction_indices=1)
    return tf.reduce_mean(cross_entropy)
### Tensorflow neural network setup
batch_size = None
sequence_length_max = max_length
data = tf.placeholder(tf.float32,[batch_size,sequence_length_max,input_dimension])
target = tf.placeholder(tf.float32,[None,14])
num_hidden = 24 ## Hidden layer
cell = tf.nn.rnn_cell.LSTMCell(num_hidden,state_is_tuple=True) ## Long short term memory
output, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32,sequence_length = length(data)) ## Creates the Rnn skeleton
last = last_relevant(output)#tf.gather(val, int(val.get_shape()[0]) - 1) ## Appedning as last
weight = tf.Variable(tf.truncated_normal([num_hidden, int(target.get_shape()[1])]))
bias = tf.Variable(tf.constant(0.1, shape=[target.get_shape()[1]]))
prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
cross_entropy = cost(output,target)# How far am I from correct value?
optimizer = tf.train.AdamOptimizer() ## TensorflowOptimizer
minimize = optimizer.minimize(cross_entropy)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(prediction, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
## Training ##
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
batch_size = 1000
no_of_batches = int(len(train_data)/batch_size)
epoch = 5000
for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):
        inp, out = train_data[ptr:ptr+batch_size], train_output[ptr:ptr+batch_size]
        ptr+=batch_size
        sess.run(minimize,{data: inp, target: out})
    print "Epoch - ",str(i)
incorrect = sess.run(error,{data: test_data, target: test_output})
print('Epoch {:2d} error {:3.1f}%'.format(i + 1, 100 * incorrect))
The error seems to be in the use of the function last_relevant, which should take the output and feed it back.
This is the error message:
TypeError: Expected binary or unicode string, got <function length at 0x7f846594dde8>
Any way to tell what could be wrong here?
I tried to build your code locally.
There is a fundamental mistake in the code: you call tf.one_hot, but what you pass doesn't fit what it expects.
See the documentation:
tf.one_hot(indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None)
However, you are passing a function object ("length" is a function in your code; I recommend giving your functions meaningful names and avoiding common words) as the first parameter instead of indices.
As a rough guide, you can put your indices as the first parameter (instead of my placeholder empty list) and it will be fixed:
relevant = tf.reduce_sum(
    tf.mul(output, tf.expand_dims(tf.one_hot([], max_length), -1)), 1)
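To make the intended computation concrete, here is a NumPy sketch of what last_relevant appears to be aiming for: selecting, for each sequence, the output at its last valid time step through a one-hot mask. It is an illustration only, not TensorFlow code; passing `lengths` in explicitly is an assumption about the intended design, and the `lengths - 1` reflects that one-hot indices are zero-based, so the last valid step of a sequence of length n sits at index n - 1.

```python
import numpy as np

def last_relevant(output, lengths):
    """output: (batch, max_length, hidden); lengths: (batch,) ints with 1 <= length <= max_length."""
    max_length = output.shape[1]
    # row i of the mask is 1 at time step lengths[i] - 1 and 0 elsewhere
    one_hot = np.eye(max_length)[lengths - 1]
    # zero out every other time step, then sum over the time axis
    return np.sum(output * one_hot[:, :, None], axis=1)

output = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
lengths = np.array([2, 3])
relevant = last_relevant(output, lengths)
```

In the TensorFlow version, the same idea would presumably mean passing the tensor returned by length(data) into last_relevant as an argument, rather than referring to the function named length from inside it.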

How to calculate dice coefficient for measuring accuracy of image segmentation in python

I have an image of land cover and I segmented it using K-means clustering. Now I want to calculate the accuracy of my segmentation algorithm. I read somewhere that dice co-efficient is the substantive evaluation measure. But I am not sure how to calculate it.
I use Python 2.7
Are there any other effective evaluation methods? Please give a summary or a link to a source. Thank You!
I used the following code for measuring the dice similarity for my original and the segmented image but it seems to take hours to calculate:
for i in xrange(0,7672320):
    for j in xrange(0,3):
        dice = np.sum([seg==gt])*2.0/(np.sum(seg)+np.sum(gt)) #seg is the segmented image and gt is the original image. Both are of same size
Please refer to Dice similarity coefficient at wiki
A sample code segment here for your reference. Please note that you need to replace k with your desired cluster since you are using k-means.
import numpy as np
k = 1  # replace with the cluster label you want to score
# segmentation
seg = np.zeros((100,100), dtype='int')
seg[30:70, 30:70] = k
# ground truth
gt = np.zeros((100,100), dtype='int')
gt[30:70, 40:80] = k
dice = np.sum(seg[gt==k])*2.0 / (np.sum(seg) + np.sum(gt))
print('Dice similarity score is {}'.format(dice))
If you are working with opencv you could use the following function:
import cv2
import numpy as np
#load images
y_pred = cv2.imread('predictions/image_001.png')
y_true = cv2.imread('ground_truth/image_001.png')
# Dice similarity function
def dice(pred, true, k = 1):
    intersection = np.sum(pred[true==k]) * 2.0
    dice = intersection / (np.sum(pred) + np.sum(true))
    return dice
dice_score = dice(y_pred, y_true, k = 255) #255 in my case, can be 1
print ("Dice Similarity: {}".format(dice_score))
In case you want to evaluate with this metric within a deep learning model using tensorflow you can use the following:
import tensorflow as tf
def dice_coef(y_true, y_pred):
    y_true_f = tf.reshape(tf.dtypes.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(tf.dtypes.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    # the +1 terms smooth the ratio and avoid division by zero on empty masks
    return (2. * intersection + 1.) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + 1.)
This is an important clarification if your data has more than 2 classes (that is, anything beyond a binary mask of 1s and 0s).
If you are using multiple classes, make sure to restrict both the prediction and the ground truth to the value you want. Otherwise you can end up getting DSC values greater than 1.
This is the extra ==k at the end of each [] statement:
import numpy as np
k = 1  # replace with the class label you want to score
# segmentation
seg = np.zeros((100,100), dtype='int')
seg[30:70, 30:70] = k
# ground truth
gt = np.zeros((100,100), dtype='int')
gt[30:70, 40:80] = k
dice = np.sum(seg[gt==k]==k)*2.0 / (np.sum(seg[seg==k]==k) + np.sum(gt[gt==k]==k))
print('Dice similarity score is {}'.format(dice))
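The same per-class idea can be wrapped in a small helper that works on boolean masks, which avoids the dependence on the numeric value of the label altogether. This is a sketch under the assumption that both arrays hold integer class labels of the same shape; `dice_per_class` is a hypothetical name, not a library function.

```python
import numpy as np

def dice_per_class(seg, gt, labels):
    """Return {label: Dice score} computed on the boolean masks seg == label and gt == label."""
    scores = {}
    for k in labels:
        seg_k = (seg == k)
        gt_k = (gt == k)
        denom = seg_k.sum() + gt_k.sum()
        # Dice is undefined when the label is absent from both arrays
        scores[k] = 2.0 * np.logical_and(seg_k, gt_k).sum() / denom if denom else float('nan')
    return scores

seg = np.zeros((100, 100), dtype=int)
seg[30:70, 30:70] = 2
gt = np.zeros((100, 100), dtype=int)
gt[30:70, 40:80] = 2
scores = dice_per_class(seg, gt, labels=[0, 2])
```

Because everything is computed with whole-array operations, this runs in a fraction of a second even for images with millions of pixels, unlike the nested Python loop in the question.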