I am using OpenCV 3.4.1.
I am working on a video classification project and am trying to use KNearest to classify between 2 categories. I have 8 areas of interest in each video frame. To make a decision on each frame, a KNearest prediction is run on the pixel values of each area; the majority wins (with ties broken in favour of one category). So I have 8 sets of training data (one per area of interest).
Problem: The response generated from the knn model changed when I labelled the categories differently.
The training data sets are organized as rows of:
[category label], data0, data1, data2, ... (different dimensions for each training set)
where dataX = the pixel data of a frame (1 row = 1 frame)
Then, I build the model by:
Ptr<TrainData> tdata = TrainData::loadFromCSV(filename, 0, 0, -1, String("cat"));
Mat raw = tdata->getTrainSamples();
Mat res = tdata->getResponses();
PCA pca(raw, noArray(), PCA::DATA_AS_ROW, 0.99);
Mat knnIn = pca.project(raw);
Ptr<ml::KNearest> knn = ml::KNearest::create();
knn->train(knnIn, ml::ROW_SAMPLE, res);
After that, the testing data is passed through the same PCA and KNN to get a response.
To test it, I pass one set of test data to each of the 8 KNN models.
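For reference, a minimal sketch of this prediction-and-voting step (the code above is C++, but the cv2 Python bindings expose the same KNearest API; models and projected_rows are hypothetical names for the 8 trained models and the 8 PCA-projected test rows):

import numpy as np

votes = []
for knn, row in zip(models, projected_rows):
    # findNearest returns (retval, results, neighborResponses, dists)
    _, results, _, _ = knn.findNearest(row.reshape(1, -1).astype(np.float32), k=10)
    votes.append(int(results[0, 0]))

# majority vote across the 8 areas; a tie would be broken in favour of one chosen category
decision = max(set(votes), key=votes.count)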
If I use 0 & 1 as the [category label] in the training data sets, the 8 responses from KNN are 1,1,1,1,1,0,0,1.
If I change the labels to 2 & 1 instead (replace every '0' with '2' in the first column of the training data), the 8 responses become 1,1,1,1,1,1,1,1, while I am expecting 1,1,1,1,1,2,2,1.
Some observations:
There are no editing errors in the training data (only the category labels in the first column were changed).
The KNN model isClassifier()=true. DefaultK=10. AlgorithmType=1 (BRUTE_FORCE).
The results are reproducible across runs for the same training and testing data sets.
I don't see any pattern in the differing responses between the two label sets (even after trying different training and testing data sets).
Please shed some light. Thank you very much!
I have raster data for built-up areas around the globe at 40 m resolution as a VRT file (data downloaded from here), and I am trying to crop the data with a mask and then extract the color index value for each cell.
Note: two other files come with the data: vrt.clr and vrt.ovr
Here is a sample of the data:
[screenshot: view of the VRT data in ArcMap]
My question: why am I getting empty cell values when I crop by mask?
I have tried the following:
Extract by Mask using the ArcMap toolbox
using GDAL in Python 2.7:
import gdal
ds = gdal.Open('input.vrt')
ds = gdal.Translate('output.vrt', ds, projWin=[80.439, 5.341, 81.048, 4.686])
ds = None
I have also tried saving the data as a TIFF.
Also, is there any way to read the color index value at given coordinates (x,y) after masking the data?
The data appears to be in the Pseudo-Mercator projection (EPSG:3857). You should therefore either specify the projWin extent in that coordinate system, or add projWinSRS if you want to provide it in a different coordinate system.
Also, if you want gdal.Translate to output a VRT file, you should add format='VRT'; as written, your snippet outputs the default file format, which is GeoTIFF.
If I assume your coordinates are WGS84 (EPSG:4326), they define a small region over the ocean south of Sri Lanka, which doesn't make much sense given the nature of the data.
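For example, a minimal sketch combining both points (keeping the output as a VRT and giving the extent in lon/lat, re-using the ds opened in your snippet):

ds = gdal.Translate('output.vrt', ds, format='VRT',
                    projWin=[80.439, 5.341, 81.048, 4.686], projWinSRS='EPSG:4326')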
If you want to read the array given by your coordinates you could use:
invrt = 'GHS_BUILT_LDSMT_GLOBE_R2015B_3857_38_v1_0.vrt'
outfile = '/vsimem/tmpfile'
ds = gdal.Translate(outfile, invrt, projWin=[80.439, 5.341, 81.048, 4.686], projWinSRS='EPSG:4326')
data = ds.ReadAsArray()
ds = None
gdal.Unlink(outfile)
The plotted array looks like this: [plot image omitted]
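As for reading the colour-index value at a given (x, y) after cropping: a rough sketch, under the assumption that the cropped dataset is still in the source projection (EPSG:3857), so a lon/lat query point has to be transformed before it can be converted to a row/column index with the geotransform (run this before the dataset is closed):

from osgeo import osr

src = osr.SpatialReference()
src.ImportFromEPSG(4326)
dst = osr.SpatialReference()
dst.ImportFromEPSG(3857)
transform = osr.CoordinateTransformation(src, dst)

lon, lat = 80.7, 5.0                       # hypothetical query point inside projWin
x, y, _ = transform.TransformPoint(lon, lat)

gt = ds.GetGeoTransform()                  # (originX, pixelWidth, 0, originY, 0, pixelHeight)
col = int((x - gt[0]) / gt[1])
row = int((y - gt[3]) / gt[5])
value = ds.ReadAsArray()[row, col]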
I am trying to train a custom object detector using the HOG+SVM method in OpenCV.
I have managed to extract HOG features from my positive and negative samples using the code below:
import cv2
import numpy as np

hog = cv2.HOGDescriptor()

def poshoggify():
    # pyramid() and sliding_window() are helper functions defined elsewhere in my code
    for i in range(1, 20):
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/pos/" + str(i) + ".jpg")
        (winW, winH) = (500, 500)
        for resized in pyramid(image, scale=1.5):
            # loop over the sliding window for each layer of the pyramid
            for (x, y, window) in sliding_window(resized, stepSize=32, windowSize=(winW, winH)):
                # if the window does not meet our desired window size, ignore it
                if window.shape[0] != winH or window.shape[1] != winW:
                    continue
                img_pos = hog.compute(image)
                np.savetxt('posdata.txt', img_pos)
    return img_pos
And the equivalent function for the negative samples.
How do I format the data in such a way that the SVM knows which is positive and which is negative?
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
How do I format the data in such a way that the SVM knows which is positive and which is negative?
You would now create another list called labels which stores the class value associated with each corresponding image. For example, if you have a training set of features that looks like this:
features = [pos_features1, pos_features2, neg_features1, neg_features2, neg_features3, neg_features4]
you would have a corresponding labels list like
labels = [1, 1, 0, 0, 0, 0]
You would then feed this to a classifier like so:
from sklearn.svm import LinearSVC

clf = LinearSVC(C=1.0, class_weight='balanced')
clf.fit(features, labels)
Furthermore, how do I translate this training to the "test" of detecting the desired objects through my webcam?
Before training, you should have split your labelled dataset (ground truth) into training and testing sets. You can do this with scikit-learn's KFold module, for example as sketched below.
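A minimal sketch of that split using scikit-learn's KFold; features and labels here stand for your full arrays of HOG descriptors and class values (the tiny example above is too small to split meaningfully):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

features = np.asarray(features)
labels = np.asarray(labels)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(features):
    clf = LinearSVC(C=1.0, class_weight='balanced')
    clf.fit(features[train_idx], labels[train_idx])
    print(clf.score(features[test_idx], labels[test_idx]))  # accuracy on the held-out fold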
Assume that I have provided only one test image.
I extract images and labels using mnist.test.next_batch(100).
When I give 1 test image, I get 2 images (the same image duplicated).
When I give 2 test images, I get 4 (the 2 images duplicated).
The same problem exists for the training images.
I printed the length of the test and train images inside the read_data_sets method (...tensorflow/contrib/learn/python/learn/datasets/mnist.py).
It gives the correct length.
Here is my code.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, validation_size=0)
images, labels = mnist.test.next_batch(100)
print len(images) #double the actual length
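For reference, a small diagnostic sketch (assuming next_batch returns NumPy arrays): checking the array shapes directly, rather than len(), shows exactly how many samples came back.

images, labels = mnist.test.next_batch(100)
print(images.shape)   # expected: (100, 784)
print(labels.shape)   # expected: (100, 10) with one_hot=True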
I am a beginner in keras and I am trying to classify data with a neural network.
x_train = x_train.reshape(1,x_train.shape[0],window,5)
x_val = x_val.reshape(1,x_val.shape[0],window,5)
x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64,activation='relu',input_shape= (data_dim,window,5)))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2,activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
weights = model.get_weights()
model_info = model.fit(x_train, y_train,batch_size=batchsize, nb_epoch=15,verbose=1,validation_data=(x_val, y_val))
print x_train.shape
#(1,1600,45,5)
print y_train.shape
#(1600,2)
I always get this error with this script and I don't understand why:
ValueError: Error when checking target: expected dense_3 to have 4 dimensions, but got array with shape (16000, 2)
Your model's output (dense_3, so named because it is the third Dense layer) has four dimensions. However, the labels you are comparing it against (y_train) are only two-dimensional. You will need to alter your network's architecture so that the model's output shape matches the shape of the labels.
Keeping track of tensor shapes is difficult when you're just starting out, so I recommend calling plot_model(model, to_file='model.png', show_shapes=True) before calling model.fit. You can look at the resulting PNG to understand the effect each layer has on the shape of your data.
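For illustration, a minimal sketch of one such architecture change, assuming each of the 1600 windows is a single sample (so x_train is reshaped to (samples, window, 5) rather than given a leading dimension of 1) and a Flatten layer is added so the final Dense output is 2-D and matches y_train:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

x_train = x_train.reshape(-1, window, 5)   # e.g. (1600, 45, 5) instead of (1, 1600, 45, 5)

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(window, 5)))
model.add(Dropout(0.5))
model.add(Flatten())                       # collapse to (samples, features)
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))  # output shape (samples, 2) matches y_train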
I want to use Google's Tensorflow to return similar images to an input image.
I have installed Tensorflow from http://www.tensorflow.org (pip installation, Python 2.7) on Ubuntu 14.04 in a virtual machine (CPU only).
I have downloaded the trained Inception-V3 model (inception-2015-12-05.tgz) from http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz, which was trained for the ImageNet Large Scale Visual Recognition Challenge on the 2012 data, but I think it contains both the neural network and the classifier (as the task there was to predict the category). I have also downloaded the file classify_image.py, which classifies an image into one of the 1000 classes in the model.
So I have a random image, image.jpg, that I am running to test the model. When I run the command:
python /home/amit/classify_image.py --image_file=/home/amit/image.jpg
I get the output below (classification is done using softmax):
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 3
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 3
trench coat (score = 0.62218)
overskirt (score = 0.18911)
cloak (score = 0.07508)
velvet (score = 0.02383)
hoopskirt, crinoline (score = 0.01286)
Now, the task at hand is to find images similar to the input image (image.jpg) out of a database of 60,000 images (JPEG format, kept in a folder at /home/amit/images). I believe this can be done by removing the final classification layer from the Inception-V3 model and using the feature set of the input image to compute the cosine distance to the feature sets of all 60,000 images; we can then return the images with the smallest distance (cos 0 = 1).
Please suggest a way forward for this problem and how to do it using the Python API.
I think I found an answer to my question:
In the file classify_image.py, which classifies the image using the pre-trained model (NN + classifier), I made the changes below (statements marked with #ADDED):
def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = gfile.FastGFile(image, 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    feature_tensor = sess.graph.get_tensor_by_name('pool_3:0')  #ADDED
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)
    feature_set = sess.run(feature_tensor,
                           {'DecodeJpeg/contents:0': image_data})  #ADDED
    feature_set = np.squeeze(feature_set)  #ADDED
    print(feature_set)  #ADDED

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
I ran the pool_3:0 tensor by feeding the image_data to it. Please let me know if I am making a mistake. If this is correct, I believe we can use this tensor for further calculations, as sketched below.
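As a follow-up, a hedged sketch of those further calculations: comparing the 2048-dimensional pool_3 feature vectors of two images with cosine similarity (feature_set_a and feature_set_b are hypothetical outputs of the modified run_inference_on_image):

import numpy as np

def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# images whose similarity to the query is closest to 1 would be returned as the most similar
sim = cosine_similarity(feature_set_a, feature_set_b)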
Tensorflow now has a nice tutorial on how to get the activations before the final layer and retrain a new classification layer with different categories:
https://www.tensorflow.org/versions/master/how_tos/image_retraining/
The example code:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
In your case, yes, you can get the activations from pool_3, the layer below the softmax layer (the so-called bottlenecks), and feed them into other operations as input.
Finally, about finding similar images: I don't think ImageNet's bottleneck activations are a very pertinent representation for image search. You could consider using an autoencoder network with direct image inputs instead (see, for example, the autoencoder illustration at deeplearning4j.org).
Your problem sounds similar to this visual search project.