OpenCV: how to create .vec file to use with opencv_traincascade - c++

As I explained in my previous post here, I am trying to generate some cascade.xml files to recognize euro coins for use in my iOS app. However, I am finding it very difficult to understand how to generate a .vec file to give as input to opencv_traincascade, because I have heard many dissenting views: some people told me that the vector file must include only positive images containing nothing but the object to recognize; others (and also some of the tutorials I have read) said that the vector file must include "sample" images, in other words random backgrounds onto which the object to recognize has been added by opencv_createsamples. In other words, I ran:
opencv_createsamples -img positives/1.png -bg negatives.txt -info 1.txt -num 210 -maxxangle 0.0 -maxyangle 0.0 -maxzangle 0.9 -bgcolor 255 -bgthresh 8 -w 48 -h 48
which generated 12000 images.
Finally, I have created the .vec file with:
cat *.txt > positives.txt
opencv_createsamples -info positives.txt -bg negatives.txt -vec 2.vec -num 12600 -w 48 -h 48
So, I would like to ask which of the following two kinds of images should go into the vector file:
Moreover, what is the final command to launch the training? This is the one I have used up to now:
opencv_traincascade -data final -vec 2.vec -bg negatives.txt -numPos 12000 -numNeg 3000 -numStages 20 -featureType HAAR -precalcValBufSize 2048 -precalcIdxBufSize 2048 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -w 48 -h 48 -mode ALL
where the .vec file contains 12000 sample images (background + coin to recognize).
If the .vec file should contain only positive images (only coins), how do I tell opencv_traincascade to train using the sample images?
I really need to know how to do things correctly, because I have launched many trainings that led to no correct result, and since they take many hours or even days to run, I can't waste any more time.
Thanks to all for your attention.
UPDATE
I managed to create a cascade.xml file with LBP. See what happens when I give one of the images used as training samples to a simple OpenCV program:
while with an image like the following:
it does not work at all. I really don't know where I am making the mistake.
UPDATE
Maybe converting the positive images to grayscale first could help?

I've used the negative samples from the INRIA person training set (http://pascal.inrialpes.fr/data/human/)
and this input image (a PNG with alpha transparency around the coin):
Using it with this command:
opencv_createsamples -img pos_color.png -num 10 -bg neg.txt -info test.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0
produces output like this:
so the background color obviously didn't work. Converting to grayscale at the beginning, however, gives me this input:
and the same command produces output like this:
I know this is not an answer to all of your questions, but maybe it still helps.
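For reference, the grayscale conversion itself can be done with ImageMagick before running opencv_createsamples; a minimal sketch, assuming pos_color.png is the input image shown above (the output file name is just a placeholder):
convert pos_color.png -colorspace Gray pos_gray.png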

OpenCV cascades (HAAR, LBP) can detect objects with stable, permanent features very well. For example, all faces have a nose, eyes and a mouth in roughly the same places. OpenCV cascades are trained to look for features common to the required class of objects and to ignore features that change from object to object. The problem is that the cascade uses a rectangular search window while a coin has a round shape, so an image of the coin will always contain part of the background.
So the training images of the coin must include as many different backgrounds as possible, so that the classifier can learn to ignore them (otherwise it will detect the coin only against that specific background).
In short, all training samples must have the same aspect ratio, size and coin position (square images with the coin in the center, coin diameter = 0.8-0.9 of the image width), but different backgrounds!
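A minimal sketch of a pipeline following this advice, assuming coin.png is a square image with the coin centered and negatives.txt lists large background scenes (file names and counts here are placeholders, not your exact data):
# paste distorted copies of the coin onto the listed backgrounds and pack them into a .vec
opencv_createsamples -img coin.png -bg negatives.txt -vec coin.vec -num 2000 -maxxangle 0.1 -maxyangle 0.1 -maxzangle 0.9 -bgcolor 255 -bgthresh 8 -w 48 -h 48
# train; -numPos is kept noticeably smaller than the number of samples packed into the .vec
opencv_traincascade -data coin_cascade -vec coin.vec -bg negatives.txt -numPos 1800 -numNeg 1000 -numStages 20 -w 48 -h 48 -featureType LBP
Keeping -numPos below the total sample count leaves spare samples for the later stages; otherwise opencv_traincascade can abort complaining that it cannot get new positive samples.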

Related

ImageMagick Perspective Distortion performance

My requirement is to perspective-project a user-submitted image onto a pre-rendered background image (which is actually one image per frame of a video).
The easiest way was to use ImageMagick, and I wrote a very crude and simple bash script to achieve what I needed, as follows:
#!/bin/bash
# #author: neurosys
# #Description: Perspective transforms and projects an alpha image
# onto a background image.
if [ $# -ne 11 ]
then
echo 'Usage: ./map_image.sh background.jpg image.png output.jpg x1 y1 x2 y2 x3 y3 x4 y4';
exit;
fi
BG=$1
IMAGE=$2
DEST=$3
TEMP='temp.png'
BG_SIZE_W=$(convert $BG -print "%w\n" /dev/null)
BG_SIZE_H=$(convert $BG -print "%h\n" /dev/null)
IMAGE_W=$(convert $IMAGE -print "%w\n" /dev/null)
IMAGE_H=$(convert $IMAGE -print "%h\n" /dev/null)
X1=$4
Y1=$5
X2=$6
Y2=$7
X3=$8
Y3=$9
X4=${10}
Y4=${11}
OFFSET=15
TRANSFORM="$OFFSET,$OFFSET, $X1,$Y1 $(($IMAGE_W+$OFFSET)),$OFFSET $X2,$Y2 $OFFSET, $(($IMAGE_H+$OFFSET)) $X3,$Y3 $(($IMAGE_W+$OFFSET)), $(($IMAGE_H+$OFFSET)) $X4,$Y4"
echo "Transform matrix: $TRANSFORM"
convert $IMAGE -background transparent -extent $BG_SIZE_W\x$BG_SIZE_H-$OFFSET-$OFFSET $TEMP
convert $TEMP -background transparent -distort Perspective "$TRANSFORM" $TEMP
convert $BG $TEMP -composite $DEST
rm -f $TEMP
However, it takes about 4 seconds to produce the desired image on my computer, as follows:
[neuro#neuro-linux ~]$ time ./map_image.sh bg.png Hp-lovecraft.jpg output.jpg 494 108 579 120 494 183 576 196 && nomacs output.jpg
Transform matrix: 15,15, 494,108 195,15 579,120 15, 267 494,183 195, 267 576,196
real 0m3.852s
user 0m3.437s
sys 0m0.037s
[neuro#neuro-linux ~]$
The order of operations as well as the parameters I use in the above ImageMagick script might not be optimal. So, any opinions or alternatives to achieve what I need are greatly welcome.
The images used for the above example are,
Background
User submitted image
Output
I am wondering if there is a way to speed this up enough that I could generate the frames for a one-minute video (25 fps * 60 sec) in a few seconds?
As a matter of fact, in case this approach fails, I may resort to writing an OpenGL program specifically for this, which I believe will be much faster given the hardware acceleration.
Somewhat off-topic note: the background image is pre-rendered in animation software (3ds Max). If I resort to writing an OpenGL renderer, I can import the mesh and camera from 3ds Max for better perspective and lighting.
Thanks.
Edit:
With the help of the guys over at the ImageMagick forum, the bottleneck turned out to be the first convert call with -extent, which was unneeded.
I ended up combining all the commands into one:
convert image.png -background transparent +distort Perspective "1,1, 494,108 201,1 579,120 1, 201 494,183 201, 201 576,196" -compose DstOver -composite bg.png out.png
It runs in 0.6 seconds, but transparency somehow does not work, so the output ends up being only the distorted image with a black background all around.
Edit:
Someone on ImageMagick forums wrote a very fast and clean script that reduced it to 0.13 seconds.
Here is the link, in case anyone needs it:
https://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=29495&p=132182#p132141
Try using the MPC format for your $TEMP file instead of PNG. Encoding of MPC is much, much faster; it's designed for use as a temporary file with ImageMagick.
MPC actually creates two files, *.mpc and *.cache, so you need to remove both. In your script, set TEMP=temp.mpc and TEMPCACHE=temp.cache, and then at the end of the script run rm $TEMP $TEMPCACHE.
See the MPC entry on the ImageMagick Formats page.
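Applied to the script above, the change would look roughly like this (only the relevant lines are shown; a sketch, not tested):
TEMP='temp.mpc'
TEMPCACHE='temp.cache'
# ... the same three convert commands as before, writing to $TEMP ...
rm -f $TEMP $TEMPCACHE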
If I get the dimensions of an image using your technique, it takes around 0.4 seconds for width and another 0.4 seconds for height. I mean like this:
BG_SIZE_W=$(convert $BG -print "%w\n" /dev/null) # 0.48 sec
BG_SIZE_H=$(convert $BG -print "%h\n" /dev/null) # 0.48 sec
If you get both the width and the height in one go like this, it takes 0.006 seconds altogether on my machine:
read BG_SIZE_W BG_SIZE_H < <(identify -ping -format "%w %h" bg.png)
I am still looking at the rest of your code.
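Applied to your script, the four dimension-reading convert calls could then be replaced by two reads (a sketch):
read BG_SIZE_W BG_SIZE_H < <(identify -ping -format "%w %h" "$BG")
read IMAGE_W IMAGE_H < <(identify -ping -format "%w %h" "$IMAGE")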

Compare rendered images using imagemagick

I have rendered multiple images from an application. Here are sample images illustrating two renders that look almost the same to the eye.
I try to compare them with the following ImageMagick command:
compare -metric AE img1.png img2.png diff.png
6384
This means 6384 pixels differ, even though the images look similar.
I get results like this for minor changes: if a pattern is moved one pixel to the right, it gives me a large number of differing pixels. Is there a good way of doing this kind of diff with ImageMagick? I have experimented with the -fuzz parameter, but it does not really help me. Is ImageMagick's compare only suited to comparing photographic images? Are there better switches for ImageMagick that would recognize text that has moved a few pixels and report it as equal? Should I use another tool?
Edit:
Adding an example of an image that looks clearly different to a human, to illustrate the kind of difference I am trying to detect. In this image not many pixels have changed, but the visible pattern is clearly different.
It's hard to give any detailed answer as I don't know what you are looking for or expecting. I guess you may need some sort of Perceptual Hash if you are looking for images that people would perceive as similar or dissimilar, or maybe a Scale/Rotation/Translation Invariant technique that identifies similar images independently of resizes, shifts and rotations.
You could look at the Perceptual Hash and Image Moments with ImageMagick like this:
identify -verbose -features 1 -moments 1.png
Image: 1.png
Format: PNG (Portable Network Graphics)
Mime type: image/png
Class: PseudoClass
Geometry: 103x115+0+0
Resolution: 37.79x37.79
Print size: 2.72559x3.04313
Units: PixelsPerCentimeter
Type: Grayscale
Base type: Grayscale
Endianess: Undefined
Colorspace: Gray
Depth: 8-bit
Channel depth:
gray: 8-bit
Channel statistics:
Pixels: 11845
Gray:
min: 62 (0.243137)
max: 255 (1)
mean: 202.99 (0.79604)
standard deviation: 85.6322 (0.335812)
kurtosis: -0.920271
skewness: -1.0391
entropy: 0.840719
Channel moments:
Gray:
Centroid: 51.6405,57.1281
Ellipse Semi-Major/Minor axis: 66.5375,60.336
Ellipse angle: 0.117192
Ellipse eccentricity: 0.305293
Ellipse intensity: 190.641 (0.747614)
I1: 0.000838838 (0.213904)
I2: 6.69266e-09 (0.00043519)
I3: 3.34956e-15 (5.55403e-08)
I4: 5.38335e-15 (8.92633e-08)
I5: 2.27572e-29 (6.25692e-15)
I6: -4.33202e-19 (-1.83169e-09)
I7: -2.16323e-30 (-5.94763e-16)
I8: 3.96612e-20 (1.67698e-10)
Channel perceptual hash:
Red, Hue:
PH1: 0.669868, 11
PH2: 3.35965, 11
PH3: 7.27735, 11
PH4: 7.05343, 11
PH5: 11, 11
PH6: 8.746, 11
PH7: 11, 11
Green, Chroma:
PH1: 0.669868, 11
PH2: 3.35965, 11
PH3: 7.27735, 11
PH4: 7.05343, 11
PH5: 11, 11
PH6: 8.746, 11
PH7: 11, 11
Blue, Luma:
PH1: 0.669868, 0.669868
PH2: 3.35965, 3.35965
PH3: 7.27735, 7.27735
PH4: 7.05343, 7.05343
PH5: 11, 11
PH6: 8.746, 8.746
PH7: 11, 11
Channel features (horizontal, vertical, left and right diagonals, average):
Gray:
Angular Second Moment:
0.364846, 0.615673, 0.372224, 0.372224, 0.431242
Contrast:
0.544246, 0.0023846, 0.546612, 0.546612, 0.409963
Correlation:
-0.406263, 0.993832, -0.439964, -0.439964, -0.07309
Sum of Squares Variance:
1.19418, 1.1939, 1.19101, 1.19101, 1.19253
Inverse Difference Moment:
0.737681, 1.00758, 0.745356, 0.745356, 0.808993
Sum Average:
1.63274, 0.546074, 1.63983, 1.63983, 1.36462
Sum Variance:
4.43991, 0.938019, 4.46048, 4.46048, 3.57472
Sum Entropy:
0.143792, 0.159713, 0.143388, 0.143388, 0.14757
Entropy:
0.462204, 0.258129, 0.461828, 0.461828, 0.410997
Difference Variance:
0.0645055, 0.189604, 0.0655494, 0.0655494, 0.0963021
Difference Entropy:
0.29837, 0.003471, 0.297282, 0.297282, 0.224101
Information Measure of Correlation 1:
-0.160631, -0.971422, -0.146024, -0.146024, -0.356026
Information Measure of Correlation 2:
0.294281, 0.625514, 0.29546, 0.29546, 0.377679
You could also go to Fred Weinhaus's excellent website (here) and download his script called moments, which calculates the Hu and Maitra moments, and see if those tell you what you want. Basically, you would run the script on each of your images like this:
./moments image1.png > 1.txt
./moments image2.png > 2.txt
and then use your favourite diff tool to see what has changed between the two images you wish to compare.
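As a side note, ImageMagick also exposes a perceptual hash directly as a compare metric (assuming a reasonably recent ImageMagick build), which may be closer to the kind of "looks different to a human" measure you are after; lower values mean the images are perceptually more similar:
compare -metric PHASH img1.png img2.png null: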

Gender Recognition Haarcascade

I've been doing some research and have attempted to build a Haar cascade for identifying gender.
I read this article, which describes how they did it, and tried to do the same: http://www.ijcce.org/papers/301-E043.pdf
I used a library of 228 male faces and 350 female faces.
I ran opencv_createsamples on my positives.txt file, which contains the list of male faces. Using the .vec file it created, I ran the training with the following command:
opencv_traincascade -data classifier -vec positives.vec -bg negatives.txt -numStages 20 -minHitRate 0.99 -maxFalseAlarmRate 0.5 -numPos 228 -numNeg 350 -w 640 -h 480 -mode ALL
After running this a few times, I do not get a classifier .xml output file, so I'm unsure whether I am doing everything correctly.
But my question is: is it possible to train and use a Haar cascade for classifying gender, using male faces as positive samples and female faces as negative samples?
As already said in the comments, with one cascade classifier you can only detect a male/female face or no face at all.
But you could just train two classifiers, one for female and one for male faces, and then run them both.
For the training I would recommend using more training examples.
I used this tutorial. It is for Python, but it can easily be adapted to any other language; it might help you as well:
https://pythonprogramming.net/haar-cascade-object-detection-python-opencv-tutorial/
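A rough sketch of what the two trainings could look like; the 24x24 sample size is my assumption (640x480 is far too large for cascade training and is one likely reason no classifier .xml is produced), and all file names are placeholders:
opencv_createsamples -info male_faces.txt -vec male.vec -num 228 -w 24 -h 24
opencv_traincascade -data male_classifier -vec male.vec -bg negatives.txt -numPos 200 -numNeg 400 -numStages 20 -w 24 -h 24
opencv_createsamples -info female_faces.txt -vec female.vec -num 350 -w 24 -h 24
opencv_traincascade -data female_classifier -vec female.vec -bg negatives.txt -numPos 300 -numNeg 400 -numStages 20 -w 24 -h 24
At detection time you would run both cascades on the same face region and label it according to which one fires.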

bad_alloc() error while haar cascade training

We have been stuck on a problem with Haar cascade training for a week now. We are following this tutorial for creating the cascade xml file: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
But on the last command
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 45 -numNeg 45 -w 90 -h 100 -mode ALL -precalcValBufSize 1024 -precalcIdxBufSize 1024
we get an error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
Specifications for the images: 45 positive and 45 negative images (both with dimensions 90 x 100). I have made sure that samples.vec exists in the same folder and have also tried using 2048 for the precalc buffer parameters. Please help us!
You've run out of memory. You have several options:
Use a 64-bit computer with more memory
Use smaller-size positive training images. 24x24 or 32x32 is typical. 64x64 is considered large.
Use LBP or HOG features instead of Haar. Haar features take orders of magnitude more memory than the others.
By the way, your negative images should not be the same size as your positive images. Negative images should be large scenes containing backgrounds typically associated with your objects of interest. opencv_traincascade automatically scans them for useful negative samples.
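For example, regenerating the .vec at a smaller sample size and switching to LBP along these lines should need far less memory (the 24x24 size is just an illustration, -numPos is kept a little below the 45 packed samples, and positives.txt is assumed to be your annotation file):
opencv_createsamples -info positives.txt -vec samples.vec -num 45 -w 24 -h 24
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 40 -numNeg 45 -w 24 -h 24 -featureType LBP -precalcValBufSize 1024 -precalcIdxBufSize 1024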

OpenCV Traincascade returns garbage

I am attempting to train a cascade classifier to detect deer in images. The problem is that my classifier always returns exactly one positive hit, in the direct center of the input image. This is true for a test image, a training image from the positive set, and a training image from the negative set.
For my positive training set, I am using the deer image set from the CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html). This gives me 5000 32x32 color images of deer in various poses. For the negative training set, I am using the images from the Labelme12-50k dataset (http://www.ais.uni-bonn.de/download/datasets.html), which gives me 39000 random images. I resized all of these images to 32x32 to make the sizes consistent with the positive training set.
I then created the positive vector with the following command:
./opencv_createsamples -info posFiles.txt -w 32 -h 32 -num 5000 -vec posVector.vec
The vector appeared to be created successfully. Then, I trained my cascade classifier using the command:
./opencv_traincascade -data /home/mitchell/ece492/Deerinator_Software/Deerinator_Software/trunk/Haar/data -vec posVector_5000.vec -bg negFiles.txt -numPos 4000 -numNeg 39000 -w 32 -h 32 -featureType LBP -numStages 18
The cascade classifier takes about 5 hours to train, and appears to have a negative rejection rate of 0.038. However, whenever I test the classifier on an image using the command:
./c-example-facedect --cascade=cascade.xml img.png
I always get the same result: a single hit in the center of the image. This happens for test images, images from the positive training set, and images from the negative training set. I'm not sure what to do; at this point I'm just using the OpenCV sample executables. I'm not sure whether the problem is with my input training set or with my usage of the classifier. Does anyone have any suggestions?
I think this fails because the picture samples are too small. I think they are just 32 by 32. How can that be used for positive samples? If I am wrong and the pictures are actually bigger, then teach me how to unpack them and I bet I can get this to run for you.