output feature map dimension of VGG16 model - computer-vision

I saw the feature extraction example in the Keras docs and used the following code to extract features from an input image:
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
import numpy as np

input_shape = (224, 224, 3)
model = VGG16(weights='imagenet', input_shape=input_shape,
              pooling='max', include_top=False)
# img_path is the path to the input image
img = image.load_img(img_path, target_size=input_shape[:2])
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)  # add the batch dimension
img = preprocess_input(img)
feature = model.predict(img)
Then, when I print the shape of the feature variable, I find it is (1, 512). Why does it have this dimension?
model.summary() shows that the output of the last conv block after max pooling has shape (7, 7, 512), which is the dimension I expected feature to have.

Thanks to Yong Yuan for helping me figure this out. Since he had some problem answering questions on SO, I am putting his answer here in case other people have the same question.
Basically, it's because a global max pooling layer is specified in this model (see pooling = 'max' in the line model = VGG16(....., pooling = 'max', ....)). It takes the maximum over the 7*7 spatial cells of each of the 512 channels, leaving one value per channel. The Keras documentation also says:
pooling: Optional pooling mode for feature extraction when include_top is False.
And in the output of model.summary(), we can see that after the max pooling of the fifth convolution block there is actually a global_max_pooling2d_1 layer, so the final dimension becomes 512.
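To make the shape change concrete, here is a minimal NumPy sketch of what global max pooling does to a (1, 7, 7, 512) feature map. The random array only stands in for the real VGG16 output, and the variable names are illustrative, not part of Keras.

```python
import numpy as np

# Stand-in for the output of the last VGG16 pooling layer:
# (batch, height, width, channels) = (1, 7, 7, 512).
rng = np.random.default_rng(0)
feature_map = rng.random((1, 7, 7, 512))

# Global max pooling keeps, for every channel, the largest value
# over all 7 * 7 spatial positions.
pooled = feature_map.max(axis=(1, 2))

print(feature_map.shape)
print(pooled.shape)
```

If the full (7, 7, 512) map is what you want, leaving out the pooling argument (its documented default is None) should keep the spatial dimensions.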

Related

Tensor flow shuffle a tensor for batch gradient

To whom it may concern,
I am pretty new to TensorFlow. I am trying to solve the famous MNIST problem with a CNN, but I have run into difficulty when I have to reshuffle the x_training data (which has shape [40000, 28, 28, 1]).
My code is below:
x_train_final = tf.reshape(x_train_final, [-1, image_width, image_width, 1])
x_train_final = tf.cast(x_train_final, dtype=tf.float32)
perm = np.arange(num_training_example).astype(np.int32)
np.random.shuffle(perm)
x_train_final = x_train_final[perm]
The following error occurred:
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice_1371' (op: 'StridedSlice') with input shapes: [40000,28,28,1], [1,40000], [1,40000], [1].
Can anyone advise how I can work around this? Thanks.
I would suggest using scikit-learn's shuffle function.
from sklearn.utils import shuffle
x_train_final = shuffle(x_train_final)
You can also pass in multiple arrays, and shuffle will reorder all of them with the same permutation. That means you can pass in your label dataset as well.
Example:
X_train, y_train = shuffle(X_train, y_train)
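For reference, the same-order shuffling of several arrays can be sketched with plain NumPy: one permutation is drawn and used to index every array. The arrays below are small made-up stand-ins for the MNIST data. The sketch also assumes the shuffle is applied to NumPy arrays, before they are converted to tensors.

```python
import numpy as np

num_examples = 10
# Toy stand-ins: image i is filled with the value i, and its label is i,
# so it is easy to check that images and labels stay aligned.
x_train = np.arange(num_examples, dtype=np.float32).reshape(-1, 1, 1, 1)
x_train = np.tile(x_train, (1, 28, 28, 1))
y_train = np.arange(num_examples)

# Draw one permutation and apply it to both arrays.
rng = np.random.default_rng(0)
perm = rng.permutation(num_examples)

x_shuffled = x_train[perm]
y_shuffled = y_train[perm]
```

Because both arrays are indexed with the same perm, every image keeps its own label after shuffling.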

Separate Positive and Negative Samples for SVM Custom Object Detector

I am trying to train a Custom Object Detector by using the HOG+SVM method on OpenCV.
I have managed to extract HOG features from my positive and negative samples using the code below:
import cv2
import numpy as np

hog = cv2.HOGDescriptor()

# pyramid() and sliding_window() are helper functions not shown in the question
def poshoggify():
    for i in range(1, 20):
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/pos/" + str(i) + ".jpg")
        (winW, winH) = (500, 500)
        for resized in pyramid(image, scale=1.5):
            # loop over the sliding window for each layer of the pyramid
            for (x, y, window) in sliding_window(resized, stepSize=32, windowSize=(winW, winH)):
                # if the window does not meet our desired window size, ignore it
                if window.shape[0] != winH or window.shape[1] != winW:
                    continue
                # NOTE: this computes HOG on the full image, not on `window`
                img_pos = hog.compute(image)
                np.savetxt('posdata.txt', img_pos)
    return img_pos
And the equivalent function for the negative samples.
How do I format the data in such a way that the SVM knows which is positive and which is negative?
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
How do I format the data in such a way that the SVM knows which is positive and which is negative?
You would now create another list called labels which would store the class value associated with a corresponding image. For example, if you have a training set of features that looks like this:
features = [pos_features1, pos_features2, neg_features1, neg_features2, neg_features3, neg_features4]
you would have a corresponding labels list like
labels = [1, 1, 0, 0, 0, 0]
You would then feed this to a classifier like so:
from sklearn.svm import LinearSVC
clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(features, labels)
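As a rough sketch of how the features and labels lists described above could be built in practice, the snippet below stacks positive and negative descriptors into one matrix and creates the matching label vector. The random arrays are synthetic stand-ins for real HOG descriptors, and n_features is a hypothetical descriptor length. With real data, each descriptor returned by hog.compute may need to be flattened into a single row first.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_features = 36  # hypothetical HOG descriptor length

# Synthetic stand-ins for flattened HOG descriptors (one row per sample).
pos_features = rng.normal(loc=3.0, scale=1.0, size=(20, n_features))
neg_features = rng.normal(loc=-3.0, scale=1.0, size=(40, n_features))

# One feature matrix, and one label per row: 1 = positive, 0 = negative.
features = np.vstack([pos_features, neg_features])
labels = np.concatenate([np.ones(len(pos_features), dtype=int),
                         np.zeros(len(neg_features), dtype=int)])

clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(features, labels)
predictions = clf.predict(features)
```

The only thing the SVM needs is that row i of features and entry i of labels refer to the same sample.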
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
Before training, you should have split your labelled dataset (ground truth) into training and testing datasets. You can do this using scikit-learn's KFold class.
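A minimal sketch of such a split with KFold is shown below. The six toy feature vectors and the choice of three folds are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold

features = np.arange(12).reshape(6, 2)  # six toy feature vectors
labels = np.array([1, 1, 0, 0, 0, 0])

# Three folds: each sample is used for testing exactly once.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
splits = list(kfold.split(features))

for train_idx, test_idx in splits:
    X_train, X_test = features[train_idx], features[test_idx]
    y_train, y_test = labels[train_idx], labels[test_idx]
    print(len(train_idx), len(test_idx))
```

Each iteration gives one train/test partition, so a classifier can be fitted on X_train, y_train and scored on X_test, y_test.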

Can I use a neural network for linear regression using Keras? If yes, how?

I'm having difficulties setting up a NN in Keras. Please help me!
This is my code, and I get random values every time I predict.
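from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
layer1 = Dense(5, input_shape = (5,))
model.add(layer1)
model.add(Activation('relu'))
layer2 = Dense(1)
model.add(layer2)
# NOTE: a relu on the output layer means predictions can never be negative
model.add(Activation('relu'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(xtrain, ytrain, verbose=1)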
I have 5 input features and want to predict a single continuous value as output.
The problem was that I was getting random predictions for the same input. I have now found the solution: it happened because I was not normalising the features.
Thanks
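For readers with the same problem, feature normalisation can be sketched as follows. This assumes plain standardisation (zero mean, unit variance per feature), which is only one common choice. The data is made up, with xtrain and xtest standing in for the real arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 100.0, 0.1, 1000.0])
# Toy data: 5 features on very different scales.
xtrain = rng.normal(size=(100, 5)) * scales
xtest = rng.normal(size=(10, 5)) * scales

# Compute the statistics on the training set only...
mean = xtrain.mean(axis=0)
std = xtrain.std(axis=0)

# ...and apply the same transformation to training and test data.
xtrain_norm = (xtrain - mean) / std
xtest_norm = (xtest - mean) / std
```

The same mean and std would also have to be applied to any new input before calling model.predict.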
From my point of view, you are not giving your input shape correctly:
layer1 = Dense(5, input_shape = (5,))
What is your actual input shape?

python2.7 - average of multiple opencv histograms

I am using the Python 2.7 OpenCV library to calculate histograms of some images, all of exactly the same size (cv2.calcHist).
I need to do 2 things:
1. Calculate the average histogram of multiple images that represent a similar object, so that I have a "representative" histogram of that object for future comparisons (if you have a better idea, I am open to suggestions).
2. Store the histogram data in my MongoDB for future comparisons (cv2 correlation).
The only code I see as relevant to the question is my histogram_comparison code:
import cv2
import numpy as np

def histogram_comparison(real, fake):
    images = [real, fake]
    index = []
    for image in images:
        # decode the base64 string into a BGR image, then convert to RGB
        image = image.decode('base64')
        image = np.fromstring(image, dtype=np.uint8)
        image = cv2.imdecode(image, 1)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # 3D color histogram with 32 bins per channel
        hist = cv2.calcHist([image], [0, 1, 2], None, [32, 32, 32],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist).flatten()
        index.append(hist)
    result_dist = cv2.compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)
    return round(result_dist, 5)
taken from: http://www.pyimagesearch.com/2014/07/14/3-ways-compare-histograms-using-opencv-python/
I do realize that when using NumPy's (or was it SciPy's?) histograms there is an easy way to get the bins and average them, but then I am not really sure how to compare histograms, so I would rather stay with OpenCV.
Thanks in advance.
Since OpenCV (as of 2.2) natively uses NumPy arrays, and since len(images) is constant, you can average all your histograms and store the result in Mongo simply with:
h, b = np.histogram(images, bins=[0, 256])
db.histograms.insert({'hist': (h / float(len(images))).tolist(), 'bins': b.tolist()})
I do not know if it's exactly what you want, but I hope it helps! See ya!
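If you would rather keep the flattened OpenCV histograms from the question, averaging them can be sketched with NumPy alone, as below. The random arrays stand in for real cv2.calcHist outputs, and the document layout with a 'hist' key is just an assumed example, not a required schema.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for three flattened 32*32*32 histograms, each normalized to sum to 1.
hists = []
for _ in range(3):
    h = rng.random(32 * 32 * 32).astype(np.float32)
    hists.append(h / h.sum())

# Element-wise mean over the histograms gives one "representative" histogram.
mean_hist = np.mean(np.stack(hists), axis=0)

# NumPy arrays are not directly storable as documents, so convert to a list.
document = {'hist': mean_hist.tolist()}
```

When reading the document back, the list can be converted with np.array(document['hist'], dtype=np.float32) before comparing it, since cv2.compareHist is expected to want float32 arrays.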

How to get the dropout mask in Tensorflow

I have built a regression neural network (NN) with dropout in TensorFlow. I would like to know whether it is possible to find out, from the output, which hidden units of the previous layer were dropped. That way, we could reproduce the NN results in C++ or Matlab.
The following is an example of the TensorFlow model. There are three hidden layers and one output layer. After the 3rd sigmoid layer, there is a dropout with probability equal to 0.9. I would like to know whether it is possible to find out which hidden units in the 3rd sigmoid layer are dropped.
def multilayer_perceptron(_x, _weights, _biases):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(_x, _weights['h1']), _biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, _weights['h2']), _biases['b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, _weights['h3']), _biases['b3']))
    layer_d = tf.nn.dropout(layer_3, 0.9)
    return tf.matmul(layer_d, _weights['out']) + _biases['out']
Thank you very much!
There is a way to get the mask of 0s and 1s, of shape layer_3.get_shape(), produced by tf.nn.dropout().
The trick is to give a name to your dropout operation:
layer_d = tf.nn.dropout(layer_3, 0.9, name='my_dropout')
Then you can get the wanted mask through the TensorFlow graph:
graph = tf.get_default_graph()
mask = graph.get_tensor_by_name('my_dropout/Floor:0')
The tensor mask will be of the same shape and type as layer_d, and will only have values 0 or 1. 0 corresponds to the dropped neurons.
Simple and idiomatic solution (although possibly slightly slower than Oliver's):
# generate mask
mask = tf.nn.dropout(tf.ones_like(layer),rate)
# apply mask
dropped_layer = layer * mask
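One detail worth keeping in mind when porting to C++ or Matlab: dropout rescales the kept units, so a mask obtained by applying dropout to a tensor of ones holds 0 and 1/keep_prob rather than 0 and 1. The NumPy sketch below imitates this behaviour; it is not TensorFlow's actual implementation, and the names are illustrative. Also note that older TensorFlow versions take the keep probability as the second argument of tf.nn.dropout, while newer ones take the drop rate, so it is worth checking which one applies to your version.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.9
layer = rng.random((4, 8))  # stand-in for the layer_3 activations

# Binary mask: 1 = kept, 0 = dropped.
binary_mask = (rng.random(layer.shape) < keep_prob).astype(layer.dtype)

# Dropout scales the kept units by 1 / keep_prob so the expected sum is unchanged.
scaled_mask = binary_mask / keep_prob
dropped_layer = layer * scaled_mask

# The 0/1 mask can be recovered from the scaled one.
recovered_mask = (scaled_mask != 0).astype(layer.dtype)
```

Saving the binary mask together with keep_prob should be enough to reproduce the same forward pass outside TensorFlow.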