For my studies I need to train a deep NN to identify certain sounds and their delays. We have 1 x 25,000 sample points per recording (microphone output) and need to quantify the events and their intensities.
In order to simplify the model so it looks more like the MNIST training procedure, for now we use classification as a stand-in for quantification (if there are two events with intensities of 5 and 3, the output would be 8 plus the delays vector).
We tried feeding the data [trainNum, 25000] into a 3-layer NN with 250, 100 and 50 neurons and the Adam optimizer, with a three-class one-hot output (100 / 010 / 001, shape [trainNum, 3]). The cost does not drop below 400 and the accuracy is stuck at 30%.
I would appreciate any help and comments.
Additional information: 2700 samples, 270 batches, 10 epochs. I used the following tutorial and swapped the MNIST data for our sound data: https://pythonprogramming.net/tensorflow-neural-network-session-machine-learning-tutorial/
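In outline, the model is wired up roughly like this (a simplified sketch in the tutorial's session-based TensorFlow 1.x style; the layer calls and variable names here are mine for illustration, not the exact code we run):

import tensorflow as tf  # TensorFlow 1.x, session style as in the linked tutorial

n_input = 25000   # sample points per recording
n_classes = 3     # one-hot classes 100 / 010 / 001

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Three hidden layers of 250, 100 and 50 neurons, as described above
h1 = tf.layers.dense(x, 250, activation=tf.nn.relu)
h2 = tf.layers.dense(h1, 100, activation=tf.nn.relu)
h3 = tf.layers.dense(h2, 50, activation=tf.nn.relu)
logits = tf.layers.dense(h3, n_classes)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_step = tf.train.AdamOptimizer().minimize(cost)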
Thank you in advance
All the best,
AA
I have rendered multiple images from an application. Here are two sample images that look almost the same to the eye.
I try to compare them with the following ImageMagick command:
compare -metric AE img1.png img2.png diff.png
6384
This means 6384 pixels differ even though the images look similar.
The changes are minor: if a pattern moves one pixel to the right, for example, that alone produces a large count of differing pixels. Is there a good way of doing this kind of diff with ImageMagick? I have experimented with the fuzz parameter, but it really does not help me. Is ImageMagick's compare only suited to comparing photographic images? Are there better switches to ImageMagick that can recognize text that has moved a few pixels and report it as equal? Should I use another tool?
Edit:
Adding an example of an image that looks clearly different to a human and that illustrates the kind of difference I am trying to detect. In this image not many pixels have changed, but the visible pattern clearly has.
It's hard to give any detailed answer as I don't know what you are looking for or expecting. I guess you may need some sort of Perceptual Hash if you are looking for images that people would perceive as similar or dissimilar, or maybe a Scale/Rotation/Translation Invariant technique that identifies similar images independently of resizes, shifts and rotations.
You could look at the Perceptual Hash and Image Moments with ImageMagick like this:
identify -verbose -features 1 -moments 1.png
Image: 1.png
Format: PNG (Portable Network Graphics)
Mime type: image/png
Class: PseudoClass
Geometry: 103x115+0+0
Resolution: 37.79x37.79
Print size: 2.72559x3.04313
Units: PixelsPerCentimeter
Type: Grayscale
Base type: Grayscale
Endianess: Undefined
Colorspace: Gray
Depth: 8-bit
Channel depth:
gray: 8-bit
Channel statistics:
Pixels: 11845
Gray:
min: 62 (0.243137)
max: 255 (1)
mean: 202.99 (0.79604)
standard deviation: 85.6322 (0.335812)
kurtosis: -0.920271
skewness: -1.0391
entropy: 0.840719
Channel moments:
Gray:
Centroid: 51.6405,57.1281
Ellipse Semi-Major/Minor axis: 66.5375,60.336
Ellipse angle: 0.117192
Ellipse eccentricity: 0.305293
Ellipse intensity: 190.641 (0.747614)
I1: 0.000838838 (0.213904)
I2: 6.69266e-09 (0.00043519)
I3: 3.34956e-15 (5.55403e-08)
I4: 5.38335e-15 (8.92633e-08)
I5: 2.27572e-29 (6.25692e-15)
I6: -4.33202e-19 (-1.83169e-09)
I7: -2.16323e-30 (-5.94763e-16)
I8: 3.96612e-20 (1.67698e-10)
Channel perceptual hash:
Red, Hue:
PH1: 0.669868, 11
PH2: 3.35965, 11
PH3: 7.27735, 11
PH4: 7.05343, 11
PH5: 11, 11
PH6: 8.746, 11
PH7: 11, 11
Green, Chroma:
PH1: 0.669868, 11
PH2: 3.35965, 11
PH3: 7.27735, 11
PH4: 7.05343, 11
PH5: 11, 11
PH6: 8.746, 11
PH7: 11, 11
Blue, Luma:
PH1: 0.669868, 0.669868
PH2: 3.35965, 3.35965
PH3: 7.27735, 7.27735
PH4: 7.05343, 7.05343
PH5: 11, 11
PH6: 8.746, 8.746
PH7: 11, 11
Channel features (horizontal, vertical, left and right diagonals, average):
Gray:
Angular Second Moment:
0.364846, 0.615673, 0.372224, 0.372224, 0.431242
Contrast:
0.544246, 0.0023846, 0.546612, 0.546612, 0.409963
Correlation:
-0.406263, 0.993832, -0.439964, -0.439964, -0.07309
Sum of Squares Variance:
1.19418, 1.1939, 1.19101, 1.19101, 1.19253
Inverse Difference Moment:
0.737681, 1.00758, 0.745356, 0.745356, 0.808993
Sum Average:
1.63274, 0.546074, 1.63983, 1.63983, 1.36462
Sum Variance:
4.43991, 0.938019, 4.46048, 4.46048, 3.57472
Sum Entropy:
0.143792, 0.159713, 0.143388, 0.143388, 0.14757
Entropy:
0.462204, 0.258129, 0.461828, 0.461828, 0.410997
Difference Variance:
0.0645055, 0.189604, 0.0655494, 0.0655494, 0.0963021
Difference Entropy:
0.29837, 0.003471, 0.297282, 0.297282, 0.224101
Information Measure of Correlation 1:
-0.160631, -0.971422, -0.146024, -0.146024, -0.356026
Information Measure of Correlation 2:
0.294281, 0.625514, 0.29546, 0.29546, 0.377679
You could also go to Fred Weinhaus's excellent website (here) and download his script called moments, which calculates the Hu and Maitra moments, and see if those tell you what you want. Basically, you could run the script on each of your images like this:
./moments image1.png > 1.txt
./moments image2.png > 2.txt
and then use your favourite diff tool to see what has changed between the two images you wish to compare.
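If a single perceptual-distance number is enough, newer ImageMagick builds also expose the perceptual hash as a compare metric. A small sketch driving it from Python (assuming the compare binary is on your PATH and your build supports the PHASH metric):

import subprocess

# Perceptual-hash distance between two images. compare writes the metric to stderr,
# and "null:" discards the difference image. Smaller values mean more similar.
result = subprocess.run(
    ["compare", "-metric", "PHASH", "img1.png", "img2.png", "null:"],
    capture_output=True, text=True
)
print("PHASH distance:", result.stderr.strip())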
As I explained in my previous post here, I am trying to generate cascade.xml files to recognize euro coins for use in my iOS app. However, I am having a lot of difficulty understanding how to generate the .vec file to give as input to opencv_traincascade, because I have heard many conflicting views: some people told me that the vector file must include only positive images containing nothing but the object to recognize; others (and also the tutorials I read) said that the vector file must include "sample" images, in other words random backgrounds onto which the object to recognize has been added by opencv_createsamples. In other words, with:
opencv_createsamples -img positives/1.png -bg negatives.txt -info 1.txt -num 210 -maxxangle 0.0 -maxyangle 0.0 -maxzangle 0.9 -bgcolor 255 -bgthresh 8 -w 48 -h 48
which generated 12000 images.
Finally, I have created the .vec file with:
cat *.txt > positives.txt
opencv_createsamples -info positives.txt -bg negatives.txt -vec 2.vec -num 12600 -w 48 -h 48
So, I would like to ask which of the following two kinds of images are the correct ones to put in the vector file:
Moreover, what is the final command to launch the training? This is the one I have used up to now:
opencv_traincascade -data final -vec 2.vec -bg negatives.txt -numPos 12000 -numNeg 3000 -numStages 20 -featureType HAAR -precalcValBufSize 2048 -precalcIdxBufSize 2048 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -w 48 -h 48 -mode ALL
where the .vec file contains 12000 sample images (background + coin to recognize).
If the .vec file should contain only positive images (only coins), how do I tell opencv_traincascade to train using the sample images?
I really need to know how to do this correctly, because I have launched many training runs that led to no correct result, and since they take many hours or days to run, I can't waste any more time.
Thanks to all for your attention.
UPDATE
I managed to create a cascade.xml file with LBP. See what happens when I feed one of the images used as training samples to a simple OpenCV program:
while with an image like the following:
it does not work at all. I really don't know where I am making the mistake.
UPDATE
Maybe converting the positive images to grayscale first would help?
I've used the negative-sample database from the INRIA training set http://pascal.inrialpes.fr/data/human/
and this input (PNG with alpha transparency around the coin):
running it with this command:
opencv_createsamples -img pos_color.png -num 10 -bg neg.txt -info test.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 \
-bgcolor 0 -bgthresh 0
produces output like this:
so the background color obviously didn't work. Converting to grayscale at the start, however, gives me this input:
and the same command produces output like this:
I know this is no answer to all of your questions, but maybe it still helps.
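For reference, the grayscale conversion itself only needs a couple of lines of OpenCV in Python (the file names are placeholders):

import cv2

# Load the color positive and write a grayscale copy for opencv_createsamples.
# IMREAD_GRAYSCALE drops both the color channels and the alpha channel.
gray = cv2.imread("pos_color.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("pos_gray.png", gray)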
OpenCV cascades (Haar, LBP) are very good at detecting objects that have stable features; for example, all faces have the nose, eyes and mouth in the same places. The cascades are trained to look for features common to the required class of objects and to ignore features that change from object to object. The problem is that the cascade uses a rectangular search window while a coin is round, so an image of the coin will always contain some of the background.
So the training images of the coin must include all possible backgrounds so that the classifier can learn to ignore them (otherwise it will detect the coin only against that specific background).
So all training samples must have the same aspect ratio, size and position of the coin (square images with the coin in the center, coin diameter = 0.8-0.9 of the image width), but different backgrounds!
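If you want to see what such samples look like, or build them yourself rather than through opencv_createsamples, the idea is roughly the following in Python/OpenCV (the file names, the 48-pixel window and the 0.85 scale are only assumptions for illustration):

import glob
import os
import random

import cv2
import numpy as np

SIZE = 48                                   # square sample window
coin = cv2.imread("coin_gray.png", cv2.IMREAD_GRAYSCALE)
diam = int(SIZE * 0.85)                     # coin diameter ~0.85 of the window width
coin = cv2.resize(coin, (diam, diam))
off = (SIZE - diam) // 2                    # keeps the coin centered
mask = np.zeros((diam, diam), np.uint8)
cv2.circle(mask, (diam // 2, diam // 2), diam // 2, 255, -1)   # round paste mask

os.makedirs("samples", exist_ok=True)
for i, bg_path in enumerate(glob.glob("negatives/*.png")):
    bg = cv2.imread(bg_path, cv2.IMREAD_GRAYSCALE)
    if bg is None or bg.shape[0] < SIZE or bg.shape[1] < SIZE:
        continue                            # skip backgrounds smaller than the window
    y = random.randint(0, bg.shape[0] - SIZE)
    x = random.randint(0, bg.shape[1] - SIZE)
    sample = bg[y:y + SIZE, x:x + SIZE].copy()
    roi = sample[off:off + diam, off:off + diam]
    roi[mask > 0] = coin[mask > 0]          # paste only the round coin, keep the corners
    cv2.imwrite("samples/sample_%05d.png" % i, sample)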
We have been stuck on a Haar cascade training problem for a week now. We are following this tutorial for creating the cascade XML file: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
But on the last command
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt \
-numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 45 \
-numNeg 45 -w 90 -h 100 -mode ALL -precalcValBufSize 1024 \
-precalcIdxBufSize 1024
we get an error:
terminate called after throwing an instance of 'std::bad_alloc'
what():  std::bad_alloc
Aborted (core dumped)
Specifications for the images: 45 positive and 45 negative images (both with dimensions 90 x 100). I have made sure that samples.vec exists in the same folder and have also tried 2048 for the precalcValBufSize parameter. Please help us!
You've run out of memory. You have several options:
Use a 64-bit computer with more memory
Use smaller-size positive training images. 24x24 or 32x32 is typical. 64x64 is considered large.
Use LBP or HOG features instead of Haar. Haar features take orders of magnitude more memory than the others.
By the way, your negative images should not be the same size as your positive images. Negative images should be large scenes containing backgrounds typically associated with your objects of interest. opencv_traincascade automatically scans them for useful negative samples.
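To make the first two options concrete, a re-run with a 24x24 window and LBP features could look something like the sketch below (illustrative values only; the window size, feature type and counts are assumptions you would adjust, and samples.vec must first be regenerated at the new size):

import subprocess

# Illustrative re-run with a smaller 24x24 window and LBP features to reduce memory.
# samples.vec must be regenerated with -w 24 -h 24 before this will work.
subprocess.run([
    "opencv_traincascade",
    "-data", "classifier",
    "-vec", "samples.vec",
    "-bg", "negatives.txt",
    "-numStages", "20",
    "-minHitRate", "0.999",
    "-maxFalseAlarmRate", "0.5",
    "-numPos", "45",
    "-numNeg", "45",
    "-w", "24", "-h", "24",
    "-featureType", "LBP",
    "-precalcValBufSize", "1024",
    "-precalcIdxBufSize", "1024",
], check=True)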
I am attempting to train a cascade classifier to detect deer in images. The problem is that my classifier always returns exactly one positive hit, in the direct center of the input image. This is true for a test image, a training image from the positive set, and a training image from the negative set.
For my positive training set, I am using the deer image set from the CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html). This gives me 5000 32x32 color images of deer in various poses. For the negative training set, I am using the images from the Labelme12-50k dataset (http://www.ais.uni-bonn.de/download/datasets.html), which gives me 39000 random images. I resized all of these images to 32x32 to make the sizes consistent with the positive training set.
I then created the positive vector with the following command:
./opencv_createsamples -info posFiles.txt -w 32 -h 32 -num 5000 -vec posVector.vec
The vector appeared to be created successfully. Then, I trained my cascade classifier using the command:
./opencv_traincascade -data /home/mitchell/ece492/Deerinator_Software/Deerinator_Software/trunk/Haar/data -vec posVector_5000.vec -bg negFiles.txt -numPos 4000 -numNeg 39000 -w 32 -h 32 -featureType LBP -numStages 18
The cascade classifier takes about 5 hours to train and appears to have a negative rejection rate of 0.038. However, whenever I test the classifier on an image using the command:
./c-example-facedect --cascade=cascade.xml img.png
I always get the same result: a single hit in the center of the image. This happens for test images, images from the positive training set, and images from the negative training set. I'm not sure what to do at this point; I'm just using the OpenCV sample executables. I'm not sure whether the problem is with my training set or with my usage of the classifier. Does anyone have any suggestions?
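For reference, the detection step I am relying on boils down to something like this in Python (a rough equivalent of what the sample executable does, not its actual code):

import cv2

# Load the trained cascade and run it on a test image.
cascade = cv2.CascadeClassifier("cascade.xml")
img = cv2.imread("img.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
print(hits)  # (x, y, w, h) rectangles; I always get a single box in the image center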
I think this fails because the sample pictures are too small; they are only 32 by 32. How can those be used as positive samples? If I am wrong and the pictures are actually bigger, then tell me how to unpack them and I bet I can get this to run for you.