Why accuracy of nearest neighbor classifier using Euclidean distance is 1 - computer-vision

I am a super noob in computer vision and ML. I was watching this video tutorial https://www.youtube.com/watch?v=hAeos2TocJ8 and the professor said that the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean distance, is 0. Can someone please explain why? I really appreciate your help!

You have heard it wrong. In the video, at time 17:10, he actually says that the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean distance, is 100%, not 0. Since the training data itself is used for evaluation, the image under test is also present in the training set. Hence the minimum Euclidean distance between the test image and the training data is always 0: the nearest neighbor is the test image itself, so every training image is classified correctly.
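To see why, here is a minimal sketch (illustrative only, using OpenCV's cv::norm for the L2/Euclidean distance; all images are assumed to have the same size and type): for every training image, the nearest neighbor in the training set is the image itself at distance 0, so, barring duplicate images with conflicting labels, the 1-NN accuracy on the training set is 100%.

#include <opencv2/core.hpp>
#include <vector>
#include <limits>

// 1-NN prediction: return the label of the training image closest to the query
// under the Euclidean (L2) distance.
int predict1NN(const cv::Mat& query, const std::vector<cv::Mat>& trainImages,
               const std::vector<int>& trainLabels)
{
    double bestDist = std::numeric_limits<double>::max();
    int bestLabel = -1;
    for (size_t i = 0; i < trainImages.size(); ++i) {
        double d = cv::norm(query, trainImages[i], cv::NORM_L2); // Euclidean distance
        if (d < bestDist) { bestDist = d; bestLabel = trainLabels[i]; }
    }
    return bestLabel;
}

// Evaluating on the training set itself: each query is also in trainImages, so its
// minimum distance is 0 and it is matched to its own (correct) label.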

Related

Spatial pyramid matching (SPM) for SIFT then input to SVM in C++

I am trying to classify MRI images of brain tumors into benign and malignant using C++ and OpenCV. I am planning on using the bag-of-words (BoW) method after clustering SIFT descriptors with k-means. That is, I will represent each image as a histogram, with the whole "codebook"/dictionary on the x-axis and each word's occurrence count in the image on the y-axis. These histograms will then be the input to my SVM (with RBF kernel) classifier.
However, the disadvantage of BoW is that it ignores the spatial information of the descriptors in the image. Someone suggested using SPM instead. I read about it and came across this link giving the following steps:
1. Compute K visual words from the training set and map all local features to their visual words.
2. For each image, initialize K multi-resolution coordinate histograms to zero. Each coordinate histogram consists of L levels, and level i has 4^i cells that evenly partition the current image.
3. For each local feature in this image (say its visual word ID is k), pick out the k-th coordinate histogram and add one count to each of the L corresponding cells in this histogram, according to the coordinate of the local feature. The L cells are the cells that the local feature falls into at the L different resolutions.
4. Concatenate the K multi-resolution coordinate histograms to form a final "long" histogram of the image. When concatenating, the k-th histogram is weighted by the probability of the k-th visual word.
5. To compute the kernel value over two images, sum up all the cells of the intersection of their "long" histograms.
Now, I have the following questions:
What is a coordinate histogram? Doesn't a histogram just show the counts for each grouping on the x-axis? How will it provide information on the coordinates of a point?
How would I compute the probability of the k-th visual word?
What will be the use of the "kernel value" that I get? How will I use it as input to the SVM? If I understand it right, is the kernel value used only in the testing phase and not in the training phase? If so, how will I train my SVM?
Or do you think I don't need to burden myself with the spatial info and should just stick with normal BoW for my situation (benign and malignant tumors)?
Someone please help this poor little undergraduate. You'll have my eternal gratitude if you do. If you need any clarification, please don't hesitate to ask.
Here is the link to the actual paper, http://www.csd.uwo.ca/~olga/Courses/Fall2014/CS9840/Papers/lazebnikcvpr06b.pdf
MATLAB code is provided here http://web.engr.illinois.edu/~slazebni/research/SpatialPyramid.zip
A coordinate histogram (mentioned in your post) is just a histogram computed over a sub-region of the image. These slides explain it visually: http://web.engr.illinois.edu/~slazebni/slides/ima_poster.pdf.
You have multiple histograms here, one for each sub-region of the image. The probability of the k-th visual word (and the counts in each histogram) depends on the SIFT points that fall in that sub-region.
I think you need to define your pyramid kernel as mentioned in the slides.
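To make that concrete, here is a rough sketch of steps 2-5 from the question (illustrative only: the struct and function names are made up, feature coordinates are assumed normalized to [0,1), and the per-word probability weighting from step 4 is only indicated as a comment):

#include <vector>
#include <algorithm>

struct Feature { float x, y; int word; };   // normalized coords in [0,1), visual word ID

// Steps 2-4: build the concatenated ("long") multi-resolution histogram of one image.
std::vector<float> buildPyramidHistogram(const std::vector<Feature>& feats, int K, int L)
{
    int cellsPerWord = 0;                         // total cells over all levels: sum of 4^l
    for (int l = 0; l < L; ++l) cellsPerWord += 1 << (2 * l);
    std::vector<float> hist(K * cellsPerWord, 0.f);

    for (const Feature& f : feats) {
        int offset = 0;
        for (int l = 0; l < L; ++l) {
            int side = 1 << l;                             // 2^l cells per image side at level l
            int cx = std::min((int)(f.x * side), side - 1);
            int cy = std::min((int)(f.y * side), side - 1);
            hist[f.word * cellsPerWord + offset + cy * side + cx] += 1.f; // one count per level
            offset += side * side;
        }
    }
    // Step 4 also weights the k-th block by the probability of the k-th visual word
    // (its relative frequency in the training set); that scaling is omitted here.
    return hist;
}

// Step 5: histogram-intersection kernel between the "long" histograms of two images.
float intersectionKernel(const std::vector<float>& a, const std::vector<float>& b)
{
    float k = 0.f;
    for (size_t i = 0; i < a.size(); ++i) k += std::min(a[i], b[i]);
    return k;
}

One way to use this with an SVM is to precompute the kernel value for every pair of training images and feed that matrix to a package that accepts precomputed kernels (for example, libSVM with -t 4); OpenCV's CvSVM only offers its built-in kernels, so it cannot take a custom pyramid kernel directly.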
A Convolutional Neural Network may be better suited for your task if you have enough training samples. You can probably have a look at Torch or Caffe.

Ratio of positive to negative data to use when training a cascade classifier (opencv)

So I'm using OpenCV's LBP detector. The shapes I'm detecting are all roughly circular (differing mostly in aspect ratio), with some wide changes in brightness/contrast, and a little bit of occlusion.
OpenCV's guide on how to train the detector is here
My main question, to anyone with experience using it, is how numPos and numNeg should relate to each other. I have roughly 1000 positive samples (so ~900 being used per stage).
What I need to decide is how many negative samples to use per stage for training. I have about 20000 images from which to draw negative data, so redundancy isn't really an issue.
In general the rule I hear is 1:2, but that seems like under-utilization, given how much negative data I have at my disposal. On the flip side, what effects should I expect if I train my detector with 1:20? How should I determine the proper ratio?
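For reference, the ratio is simply what you pass as -numPos and -numNeg; with the 1:2 rule mentioned above and ~900 usable positives per stage, the call would look something like this (the 24x24 window, stage count, and file names are placeholders, not recommendations):

opencv_traincascade -data classifier -vec positives.vec -bg negatives.txt -featureType LBP -numPos 900 -numNeg 1800 -numStages 20 -w 24 -h 24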

Calculate body volume using kinect

I need an algorithm for calculating the body volume (in cubic meters) using Kinect.
I know I can extract the point cloud and the depth frame (isolating the body by using some methods of the skeleton NUI), but I don't know how to calculate the volume value from this matrix.
Would exporting a volume block be of any help?
If you need to compute the body volume precisely, you can use the algorithm for generating avatars from Kinect for monitoring obesity, as demonstrated in this video, which shows an example of computing the volume of pregnant women using Kinect. Watch demo video.
The algorithm is described in detail in the technical paper: A. Barmpoutis, 'Tensor Body: Real-time Reconstruction of the Human Body and Avatar Synthesis from RGB-D', IEEE Transactions on Cybernetics, Special Issue on Computer Vision for RGB-D Sensors: Kinect and Its Applications, October 2013, Vol. 43(5), pp. 1347-1356. Read PDF.
If you have the depth and can determine distance via the Kinect sensor, and hence real-world height, then you can convert each pixel into x, y dimensions in centimeters, and use the per-pixel depth readings to get a rough, point-cloud-based estimate of the body's thickness in centimeters along each ray: the sensor only sees the front of the body, so you take the visible "half depth" (z/2) per pixel/ray and multiply by 2. Keep in mind that the human anterior and posterior are asymmetrical, which is why this multiply-by-2 approximation is rough.
If you can formalize a model of the human form, you can create a fitting algorithm that gives a better volume approximation based on the given sensor information.
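As a very rough sketch of that idea (it assumes you already have a background-segmented depth map of the body in meters and the sensor's focal lengths in pixels; the doubled "half depth" is the crude part the answer above warns about):

#include <opencv2/core.hpp>
#include <algorithm>

// Rough body volume from a segmented depth map (CV_32F, meters, 0 = background).
// fx, fy are the depth camera's focal lengths in pixels.
double estimateBodyVolume(const cv::Mat& depth, double fx, double fy)
{
    double zRef = 0.0;                              // deepest body point: crude "back plane"
    for (int r = 0; r < depth.rows; ++r)
        for (int c = 0; c < depth.cols; ++c)
            zRef = std::max(zRef, (double)depth.at<float>(r, c));

    double volume = 0.0;                            // cubic meters
    for (int r = 0; r < depth.rows; ++r) {
        for (int c = 0; c < depth.cols; ++c) {
            double z = depth.at<float>(r, c);
            if (z <= 0.0) continue;                 // background pixel
            double pixelArea = (z / fx) * (z / fy); // real-world area this pixel covers, m^2
            double halfDepth = zRef - z;            // visible front "half" along this ray, m
            volume += pixelArea * 2.0 * halfDepth;  // x2 for the unseen back half (rough)
        }
    }
    return volume;
}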

cvSVM training produces poor results for HOGDescriptor

My objective is to train an SVM and get the support vectors, which I can plug into OpenCV's HOGDescriptor for object detection.
I have gathered ~4000 positives and ~15000 negatives, and I train using the SVM provided by OpenCV. The results give me too many false positives (up to 20 per image). I would clip out the false positives and add them to the pool of negatives to retrain, and at times I would end up with even more false positives! I have tried adjusting the L2HysThreshold of my HOGDescriptor upwards to 300 without significant improvement. Is my pool of positives and negatives large enough?
The SVM training is also much faster than expected. I have tried with feature vector sizes of 2916 and 12996, using grayscale images and color images on separate tries. SVM training has never taken longer than 20 minutes. I use auto_train. I am new to machine learning, but from what I hear, training with a dataset as large as mine should take at least a day, no?
I believe cvSVM is not doing much learning, and according to http://opencv-users.1802565.n2.nabble.com/training-a-HOG-descriptor-td6363437.html, it is not suited for this purpose. Does anyone with experience with cvSVM have more input on this?
I am considering using SVMLight http://svmlight.joachims.org/, but it looks like there isn't a way to visualize the SVM hyperplane. What are my options?
I use OpenCV 2.4.3 and have tried the following setups for the HOGDescriptor:
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(5,5);
hog.blockSize = cv::Size(10,10);
hog.blockStride = cv::Size(5,5); //12996 feature vector
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(10,10);
hog.blockSize = cv::Size(20,20);
hog.blockStride = cv::Size(10,10); //2916 feature vector
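For reference, those feature vector lengths follow directly from the parameters, assuming OpenCV's default of 9 orientation bins: the number of block positions per side is (winSize - blockSize)/blockStride + 1, and each block contributes (blockSize/cellSize)^2 * 9 values. First setup: ((100 - 10)/5 + 1)^2 = 19^2 = 361 blocks, each with (10/5)^2 * 9 = 36 values, giving 361 * 36 = 12996. Second setup: ((100 - 20)/10 + 1)^2 = 9^2 = 81 blocks, each with (20/10)^2 * 9 = 36 values, giving 81 * 36 = 2916.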
Your first descriptor dimension is far too large to be useful. To form any reliable SVM hyperplane, you need at least as many positive and negative samples as you have descriptor dimensions, because ideally you need separating information in every dimension of the hyperplane.
The number of positive and negative samples should be more or less the same unless you provide your SVM trainer with a bias parameter (which may not be available in cvSVM).
There is no guarantee that HOG is a good descriptor for the type of problem you are trying to solve. Can you visually confirm that the object you are trying to detect has a distinct shape with similar orientation in all samples? A single type of flower, for example, may have a unique shape; however, many types of flowers together don't share the same unique shape. Bamboo has a unique shape but may not be easily distinguishable from other objects, or may not have the same orientation in all sample images.
cvSVM is normally not the tool used to train SVMs for OpenCV's HOG. Use the binary form of SVMLight (not free for commercial purposes) or libSVM (OK for commercial purposes):
- Calculate HOGs for all samples using your C++/OpenCV code and write them to a text file in the correct input format for SVMLight/libSVM.
- Use either program to train a model with a linear kernel and the optimal C. Find the optimal C by searching for the best accuracy while varying C in a loop.
- Calculate the detector vector (an N+1 dimensional vector, where N is the dimension of your descriptor): take all the support vectors, multiply each by its alpha value, and sum them dimension by dimension to obtain an N-dimensional vector. As the last element, append -b, where b is the hyperplane bias (you can find it in the model file produced by SVMLight/libSVM training).
- Feed this N+1 dimensional detector to HOGDescriptor::setSVMDetector() and use HOGDescriptor::detect() or HOGDescriptor::detectMultiScale() for detection.
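As a sketch of the detector-vector step above (it assumes you have already parsed the support vectors, their alpha values, and the bias b out of the SVMLight/libSVM model file; the parsing itself is not shown, and in SVMLight's model format the alphas already carry the label sign):

#include <opencv2/objdetect.hpp>
#include <vector>

// Build the N+1 dimensional detector for HOGDescriptor from a trained linear SVM:
// w = sum_i alpha_i * sv_i, then append -b as the last element.
std::vector<float> buildHogDetector(const std::vector<std::vector<float> >& supportVectors,
                                    const std::vector<float>& alphas, float b)
{
    const size_t N = supportVectors[0].size();        // HOG descriptor dimension
    std::vector<float> detector(N + 1, 0.f);
    for (size_t i = 0; i < supportVectors.size(); ++i)
        for (size_t d = 0; d < N; ++d)
            detector[d] += alphas[i] * supportVectors[i][d];
    detector[N] = -b;                                  // hyperplane bias term
    // Depending on which class the trainer treated as positive, the whole vector
    // may need its sign flipped before it detects the intended objects.
    return detector;
}

// Usage (the HOGDescriptor parameters must match the ones used for the training descriptors):
// cv::HOGDescriptor hog;
// hog.winSize = cv::Size(100,100); /* cellSize, blockSize, blockStride likewise */
// hog.setSVMDetector(buildHogDetector(svs, alphas, b));
// std::vector<cv::Rect> found;
// hog.detectMultiScale(image, found);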
I have had successful results using SVMLight to learn SVM models on HOG features computed with OpenCV, but I haven't used cvSVM, so I can't compare.
The hogDraw function from http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html will visualise your descriptor.

Methods for calculating an optimal distance threshold for fisherfaces. Source code provided

I've been working on a face recognizer based on the Fisherfaces implementation (the pre-OpenCV 2.4 version) provided by bytefish. The actual Fisherfaces algorithm is the same; the differences are mostly convenience ones:
- Image/storage compression.
- Classification by strings as opposed to ints.
- Inclusive predictions (multiple results sorted by a percentage).
Note: The percentage is calculated with: percent = 1.0 - (lowdist/distthreshold)
where lowdist is the lowest Euclidean distance between a src matrix (the test face image) and a matrix in the projection set (the trained face images), and distthreshold is the maximum distance allowed.
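For example, with distthreshold = 2200.0 (the test value mentioned below) and lowdist = 550.0, this gives percent = 1.0 - 550.0/2200.0 = 0.75.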
The inclusive predictions are where I'm having trouble. I haven't found a decent way to calculate an optimal threshold to use. Currently I'm just choosing 2200.0 as an arbitrary value to test with. This of course produces a lot of flaky results, especially when the face images come from random sources with different lighting and resolutions.
So my question is: Is there a way to calculate an optimal distance threshold to use with fisherfaces?
I've provided the source code to the recognizer below.
Ignore the method FBaseLDARecognizer::calculateOptimalThreshold; it isn't finished. The goal was to add a group of faces to the recognizer, then test against an unknown set of faces with known classifications and get the maximum and minimum correct distances. That is as far as I got; I haven't thought of a useful way to use that data yet, so currently it always returns 0.0.
Note: This is not finished; there are a few performance issues I've yet to clean up. Also, this code isn't commented. If further explanation is needed, please let me know and I can comment and re-upload the files.
Source Files:
Header
Source