I'm using OpenCV and C++. I want to check whether an image is part of another image, and I have already found a function called matchTemplate, which works. But what if the template image is a little bit different? Is there a function or approach like matchTemplate that checks whether a template is part of a source image, but with tolerance for differences in position, angle, size and maybe even deformation? Or do I need a completely different approach than template matching?
Here's my code so far, which finds a template image in a source image, but with (almost) no tolerance.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <iostream>
#include <stdio.h>
using namespace cv;
using namespace std;
/// Global Variables
Mat img; Mat templ; Mat result;
const char* image_window = "Source Image";
const char* result_window = "Result window";
int match_method;
int max_Trackbar = 5;
/// Function Headers
void MatchingMethod( int, void* );
/** #function main */
int main( int, char** argv )
{
  /// Load image and template
  img = imread( "a1.jpg", 1 );
  templ = imread( "a2.jpg", 1 );
  /// Create windows
  namedWindow( image_window, WINDOW_AUTOSIZE );
  namedWindow( result_window, WINDOW_AUTOSIZE );
  /// Create Trackbar
  const char* trackbar_label = "Method: \n 0: SQDIFF \n 1: SQDIFF NORMED \n 2: TM CCORR \n 3: TM CCORR NORMED \n 4: TM COEFF \n 5: TM COEFF NORMED";
  createTrackbar( trackbar_label, image_window, &match_method, max_Trackbar, MatchingMethod );
  MatchingMethod( 0, 0 );
  waitKey(0); // keep the windows open until a key is pressed
  return 0;
}
/**
 * #function MatchingMethod
 * #brief Trackbar callback
 */
void MatchingMethod( int, void* )
{
  /// Source image to display
  Mat img_display;
  img.copyTo( img_display );
  /// Create the result matrix (Mat::create takes rows first, then columns)
  int result_cols = img.cols - templ.cols + 1;
  int result_rows = img.rows - templ.rows + 1;
  result.create( result_rows, result_cols, CV_32FC1 );
  /// Do the Matching and Normalize
  matchTemplate( img, templ, result, match_method );
  normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() );
  /// Localizing the best match with minMaxLoc
  double minVal; double maxVal; Point minLoc; Point maxLoc;
  Point matchLoc;
  minMaxLoc( result, &minVal, &maxVal, &minLoc, &maxLoc, Mat() );
  /// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
  if( match_method == TM_SQDIFF || match_method == TM_SQDIFF_NORMED )
    { matchLoc = minLoc; }
  else
    { matchLoc = maxLoc; }
  /// Show me what you got
  rectangle( img_display, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
  rectangle( result, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
  imshow( image_window, img_display );
  imshow( result_window, result );
}
The images I'm using in my code:

You've identified the major limitation with template matching. It's very fragile to any deformation of the image. Template-matching works by sliding a template-sized box around the image, and checking the similarity between the template and the region inside the box. It checks similarity using a pixel-by-pixel comparison method, such as normalized cross-correlation. If you want to allow different sizes and rotations, you'll need to write a loop that scales the original template up or down, or rotates it. It gets really inefficient.
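To make the sliding-window idea concrete, here is a minimal library-free sketch of what a normalized cross-correlation search does. It is not OpenCV's implementation; the `Gray` and `Match` types and the `bestMatch` function are hypothetical names made up for this illustration, and the image is assumed to be a single-channel, row-major buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image: row-major pixel values.
struct Gray {
    int rows, cols;
    std::vector<double> px;
    double at(int r, int c) const { return px[static_cast<std::size_t>(r) * cols + c]; }
};

// Top-left corner of the best window and its correlation score.
struct Match { int row, col; double score; };

// Slide `templ` over `img` and return the window position with the highest
// normalized cross-correlation: sum(I*T) / sqrt(sum(I*I) * sum(T*T)).
Match bestMatch(const Gray& img, const Gray& templ)
{
    assert(templ.rows <= img.rows && templ.cols <= img.cols);
    Match best{0, 0, -1.0};
    for (int r = 0; r + templ.rows <= img.rows; ++r) {
        for (int c = 0; c + templ.cols <= img.cols; ++c) {
            double dot = 0.0, normI = 0.0, normT = 0.0;
            for (int y = 0; y < templ.rows; ++y) {
                for (int x = 0; x < templ.cols; ++x) {
                    const double i = img.at(r + y, c + x);
                    const double t = templ.at(y, x);
                    dot += i * t;
                    normI += i * i;
                    normT += t * t;
                }
            }
            const double denom = std::sqrt(normI * normT);
            const double score = denom > 0.0 ? dot / denom : 0.0;
            if (score > best.score) best = Match{r, c, score};
        }
    }
    return best;
}
```

A score of 1.0 means the window equals the template up to a uniform brightness scale. Every pixel position is compared against one fixed template, so a rotated or rescaled object needs its own full pass with a transformed template, which is why looping over scales and angles gets expensive.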
If you want to allow deformation, and also search more efficiently across scales and rotations, a standard approach is keypoint-based matching with a detector such as SURF. It's efficient, and quite accurate if your images have good resolution, which yours do. You can find tutorials and sample code online for finding objects using SURF. Basically, SURF identifies keypoints (distinctive image regions) in both the template and the image. Then you find the region in the image with the largest number of keypoints that match the template. (If you're already doing this, and it's what you meant by "feature matching," then I think you're on the right track.)


c++, opencv: HoughLines() doesn't receive the values passed by trackbars

I'm supposed to detect the two white lines of the road with the function HoughLines. I use three trackbars to find the parameters that detect ONLY the two white lines of the road. I have tried the code below, but even when I change the trackbar values the image doesn't update; it stays at the first values. I'm using OpenCV with C++.
Without trackbars it works, but it's almost impossible to find good values that way, because I don't know how to tune the parameters and the image is pretty complex.
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
using namespace cv;
using namespace std;
const int kernel_size = 3;
Mat src, src_gray;
Mat dst, detected_edges;
Mat cdst;
int slider_value_one;
int slider_value_two;
int slider_value_three;
vector<Vec2f> lines; // will hold the results of the detection
static void Hough_transform(int, void*)
{
    // Standard Hough Line Transform
    HoughLines(detected_edges, lines, 1, CV_PI/180, 130,slider_value_one,slider_value_two); // runs the actual detection
    // Draw the lines
    for( size_t i = 0; i < lines.size(); i++ )
    {
        float rho = lines[i][0], theta = lines[i][1];
        Point pt1, pt2;
        double a = cos(theta), b = sin(theta);
        double x0 = a*rho, y0 = b*rho;
        pt1.x = cvRound(x0 + 1000*(-b));
        pt1.y = cvRound(y0 + 1000*(a));
        pt2.x = cvRound(x0 - 1000*(-b));
        pt2.y = cvRound(y0 - 1000*(a));
        line( cdst, pt1, pt2, Scalar(0,0,255), 3, CV_AA);
    }
    imshow("standard Hough Line Transform", cdst);
}
int main(int argc, const char * argv[]) {
//-----Loads an image
src = imread("/Users/massimilianolorenzin/Documents/Progetti\ XCode/lab4/lab4/lab4/images/road2.png");
/// Convert the image to grayscale
cvtColor( src, src_gray, CV_BGR2GRAY);
/// Reduce noise with a kernel 3x3
blur( src_gray, detected_edges, Size(3,3) );
/// Canny detector
Canny( detected_edges, detected_edges, 150, 450, kernel_size );
// Copy edges to the images that will display the results in BGR
cvtColor(detected_edges, cdst, COLOR_GRAY2BGR);
namedWindow("standard Hough Line Transform"); // Create Window
//first TrackBar
createTrackbar( "First Par", "standard Hough Line Transform", &slider_value_one, 200, Hough_transform);
Hough_transform(slider_value_one,0 );
//second TrackBar
createTrackbar( "Second Par", "standard Hough Line Transform", &slider_value_two, 100, Hough_transform);
Hough_transform(slider_value_two, 0 );
//third TrackBar
createTrackbar( "Third Par", "standard Hough Line Transform", &slider_value_three, 100, Hough_transform);
Hough_transform( slider_value_three, 0 );
imshow("Input Image",src);
imshow( "edges", detected_edges );
return 0;
}
HoughLines() doesn't seem to respond to the values set with the trackbars. The windows appear normally, and slider_value_one, slider_value_two and slider_value_three hold the right values (I printed them to check), so I don't understand why HoughLines() doesn't use the values passed to it.
Above is the input image, and below is the final output. In this step I am only asked to draw the two lines at the sides of the road; coloring the area between the two lines (as in the photo) is required in the next step.
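For readers following the drawing loop in the listing: each detected line arrives as a (rho, theta) pair, and the loop turns it into two far-apart points so it can be drawn. The sketch below isolates that conversion without OpenCV. `Pt`, `Segment` and `polarToSegment` are hypothetical names for this illustration, and `std::lround` stands in for `cvRound`.

```cpp
#include <cassert>
#include <cmath>

struct Pt { long x, y; };      // stand-in for cv::Point
struct Segment { Pt a, b; };

// Convert a line in polar form (rho, theta) into two points lying `reach`
// pixels on either side of the point of the line closest to the origin.
Segment polarToSegment(double rho, double theta, double reach = 1000.0)
{
    assert(reach > 0.0);
    const double a = std::cos(theta), b = std::sin(theta);
    const double x0 = a * rho, y0 = b * rho;   // foot of the perpendicular from the origin
    Segment s;
    // (-b, a) is the direction of the line itself.
    s.a = Pt{std::lround(x0 + reach * (-b)), std::lround(y0 + reach * a)};
    s.b = Pt{std::lround(x0 - reach * (-b)), std::lround(y0 - reach * a)};
    return s;
}
```

With theta = 0 the line is vertical at x = rho, and with theta = pi/2 it is horizontal at y = rho, which is a quick way to sanity-check values printed from the callback.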

Does the StereoBM class in opencv do rectification of the input images or frames?

I am using the StereoBM class for stereo vision as part of my project. I am taking input frames from 2 webcams and running the stereo block matching computation on the grayscale frames without rectification. The output I am getting is far from the ground truth (very patchy). I want to know whether this is because I am not rectifying the input frames. The baseline I have chosen is 20 cm. I am using OpenCV 3.2.0 with C++.
The code I am running is given below.
#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>
#include </home/eswar/softwares/opencv_contrib-3.2.0/modules/contrib_world/include/opencv2/contrib_world.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <stdio.h>
#include <iostream>
using namespace std;
using namespace cv;
int main()
{
    //initialize and allocate memory to load the video stream from camera
    VideoCapture camera0(0);
    VideoCapture camera1(1);
    if( !camera0.isOpened() ) return 1;
    if( !camera1.isOpened() ) return 1;
    Mat frame0,frame1;
    Mat frame0gray,frame1gray;
    Mat dispbm,dispsgbm;
    Mat dispnorm_bm,dispnorm_sgbm;
    Mat falseColorsMap, sfalseColorsMap;
    int ndisparities = 16*5; /**< Range of disparity */
    int SADWindowSize = 21; /**< Size of the block window. Must be odd */
    Ptr<StereoBM> sbm = StereoBM::create( ndisparities, SADWindowSize );
    Ptr<StereoSGBM> sgbm = StereoSGBM::create(0, //int minDisparity
        96, //int numDisparities
        5, //int SADWindowSize
        600, //int P1 = 0
        2400, //int P2 = 0
        10, //int disp12MaxDiff = 0
        16, //int preFilterCap = 0
        2, //int uniquenessRatio = 0
        20, //int speckleWindowSize = 0
        30, //int speckleRange = 0
        true); //bool fullDP = false
    //-- Check its extreme values
    double minVal; double maxVal;
    while(true)
    {
        //grab and retrieve each frames of the video sequentially
        camera0 >> frame0;
        camera1 >> frame1;
        imshow("Video0", frame0);
        imshow("Video1", frame1);
        // grayscale conversion (assumed: this step is not visible in the listing as posted)
        cvtColor(frame0, frame0gray, COLOR_BGR2GRAY);
        cvtColor(frame1, frame1gray, COLOR_BGR2GRAY);
        sbm->compute( frame0gray, frame1gray, dispbm );
        minMaxLoc( dispbm, &minVal, &maxVal );
        dispbm.convertTo( dispnorm_bm, CV_8UC1, 255/(maxVal - minVal));
        sgbm->compute(frame0gray, frame1gray, dispsgbm);
        minMaxLoc( dispsgbm, &minVal, &maxVal );
        dispsgbm.convertTo( dispnorm_sgbm, CV_8UC1, 255/(maxVal - minVal));
        imshow( "BM", dispnorm_bm);
        imshow( "SGBM",dispnorm_sgbm);
        //wait for 40 milliseconds
        int c = cvWaitKey(40);
        //exit the loop if user press "Esc" key (ASCII value of "Esc" is 27)
        if(27 == char(c)) break;
    }
    return 0;
}
Although the code also uses block matching, please ignore it because it gives even worse output. I find that the SGBM output is closer to the ground truth, so I've decided to improve on it. However, any help on how the block matching results can be improved would be great, and I'd certainly appreciate it.
The output depth image for the SGBM technique looks like this.
No, StereoBM doesn't do rectification; it only does block matching plus some pre- and post-processing. However, OpenCV provides functions for camera calibration and rectification (such as stereoCalibrate and stereoRectify in the calib3d module).
There is also a ready-made example of this process among the OpenCV samples, so you don't have to write the code from scratch.
About the results: StereoBM is based on the SAD (sum of absolute differences) algorithm, a local stereo-matching method that is not very robust. You can try a WLS filter, which could improve your results significantly.
StereoSGBM is based on the SGM algorithm (actually it differs a little from the one introduced in the original paper). It is a semi-global algorithm that adds a global optimisation to the disparity map generation, which produces better disparity maps but is slower.
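To illustrate what "local SAD matching" means, here is a small library-free sketch for a single pixel on a single image row. It is only an illustration under simplifying assumptions (integer intensities, left image as reference, no pre- or post-filtering) and not how StereoBM is actually implemented; `sadDisparity` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>
#include <vector>

// For pixel x of the left row, try every shift d in [0, maxDisp] and return
// the one whose window (half-width `half`) in the right row has the smallest
// sum of absolute differences (SAD) with the window around x in the left row.
int sadDisparity(const std::vector<int>& left, const std::vector<int>& right,
                 int x, int half, int maxDisp)
{
    assert(left.size() == right.size());
    assert(x - half >= 0 && x + half < static_cast<int>(left.size()));
    int best = 0;
    long bestCost = std::numeric_limits<long>::max();
    // Stop as soon as the shifted window would leave the right row.
    for (int d = 0; d <= maxDisp && x - half - d >= 0; ++d) {
        long cost = 0;
        for (int k = -half; k <= half; ++k)
            cost += std::abs(left[x + k] - right[x + k - d]);
        if (cost < bestCost) { bestCost = cost; best = d; }
    }
    return best;
}
```

The search runs along one row only, so if the images are not rectified the true match may sit on a different row and never be tested, which tends to give patchy disparities. Pixels near the left border also cannot test every shift; block matchers typically cannot produce reliable values in that band.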
As indicated above I tried rectification of the frames. The code is below.
#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>
#include </home/eswar/softwares/opencv_contrib-3.2.0/modules/contrib_world/include/opencv2/contrib_world.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <stdio.h>
#include <iostream>
#include <opencv2/xfeatures2d/nonfree.hpp>
using namespace std;
using namespace cv;
using namespace cv::xfeatures2d;
int main()
{
    //initialize and allocate memory to load the video stream from camera
    VideoCapture camera0(0);
    VideoCapture camera1(1);
    int count=0;
    Mat loRes, hiRes;
    if( !camera0.isOpened() ) return 1;
    if( !camera1.isOpened() ) return 1;
    camera0.set(CV_CAP_PROP_FRAME_WIDTH, 400);
    camera0.set(CV_CAP_PROP_FRAME_HEIGHT, 400);
    camera1.set(CV_CAP_PROP_FRAME_WIDTH, 400);
    camera1.set(CV_CAP_PROP_FRAME_HEIGHT, 400);
    Mat frame0,frame1;
    Mat frame0gray,frame1gray;
    Mat dispbm,dispsgbm,disparity,disparity1;
    Mat dispnorm_bm,dispnorm_sgbm;
    Mat falseColorsMap, sfalseColorsMap,falsemap;
    Mat img_matches;
    Mat H1,H2;
    int ndisparities = 96; /**< Range of disparity */
    int SADWindowSize = 7;
    Ptr<StereoBM> sbm = StereoBM::create( ndisparities, SADWindowSize );
    Ptr<StereoSGBM> sgbm = StereoSGBM::create(-3, //int minDisparity
        96, //int numDisparities
        7, //int SADWindowSize
        60, //int P1 = 0
        2400, //int P2 = 0
        90, //int disp12MaxDiff = 0
        16, //int preFilterCap = 0
        1, //int uniquenessRatio = 0
        60, //int speckleWindowSize = 0
        20, //int speckleRange = 0
        true); //bool fullDP = false
    //-- Check its extreme values
    double minVal; double maxVal;
    double max_dist = 0;
    double min_dist = 100;
    int minHessian = 630;
    Ptr<Feature2D> f2d = SIFT::create();
    vector<KeyPoint> keypoints_1, keypoints_2;
    Ptr<Feature2D> fd = SIFT::create();
    Mat descriptors_1, descriptors_2;
    BFMatcher matcher(NORM_L2, true); //BFMatcher matcher(NORM_L2);
    vector< DMatch > matches;
    vector< DMatch > good_matches;
    vector<uchar> status;
    while(true)
    {
        //grab and retrieve each frames of the video sequentially
        camera0 >> frame0;
        camera1 >> frame1;
        imshow("Video0", frame0);
        imshow("Video1", frame1);
        // grayscale conversion (assumed: this step is not visible in the listing as posted)
        cvtColor(frame0, frame0gray, COLOR_BGR2GRAY);
        cvtColor(frame1, frame1gray, COLOR_BGR2GRAY);
        sbm->compute( frame0gray, frame1gray, dispbm );
        minMaxLoc( dispbm, &minVal, &maxVal );
        dispbm.convertTo( dispnorm_bm, CV_8UC1, 255/(maxVal - minVal));
        sgbm->compute(frame0gray, frame1gray, dispsgbm);
        minMaxLoc( dispsgbm, &minVal, &maxVal );
        dispsgbm.convertTo( dispnorm_sgbm, CV_8UC1, 255/(maxVal - minVal));
        applyColorMap(dispnorm_bm, falseColorsMap, cv::COLORMAP_JET);
        applyColorMap(dispnorm_sgbm, sfalseColorsMap, cv::COLORMAP_JET);
        //-- Step 1: Detect the keypoints
        f2d->detect( frame0gray, keypoints_1 );
        f2d->detect( frame1gray, keypoints_2 );
        //-- Step 2: Calculate descriptors (feature vectors)
        fd->compute( frame0gray, keypoints_1, descriptors_1 );
        fd->compute( frame1gray, keypoints_2, descriptors_2 );
        //-- Step 3: Matching descriptor vectors with a brute force matcher
        matcher.match( descriptors_1, descriptors_2, matches );
        drawMatches(frame0gray, keypoints_1, frame1gray, keypoints_2, matches, img_matches);
        imshow("matches", img_matches);
        //-- Quick calculation of max and min distances between keypoints
        for( int i = 0; i < matches.size(); i++ )
        { double dist = matches[i].distance;
          if( dist < min_dist ) min_dist = dist;
          if( dist > max_dist ) max_dist = dist;
        }
        for( int i = 0; i < matches.size(); i++ )
        {
          if( matches[i].distance <= max(4.5*min_dist, 0.02) ){
            good_matches.push_back( matches[i]);
          }
        }
        Mat F = findFundamentalMat(imgpts1, imgpts2, cv::FM_RANSAC, 3., 0.99, status); //FM_RANSAC
        stereoRectifyUncalibrated(imgpts1, imgpts1, F, frame0gray.size(), H1, H2);
        Mat rectified1(frame0gray.size(), frame0gray.type());
        warpPerspective(frame0gray, rectified1, H1, frame0gray.size());
        Mat rectified2(frame1gray.size(), frame1gray.type());
        warpPerspective(frame1gray, rectified2, H2, frame1gray.size());
        sgbm->compute(rectified1, rectified2, disparity);
        minMaxLoc( disparity, &minVal, &maxVal );
        disparity.convertTo( disparity1, CV_8UC1, 255/(maxVal - minVal));
        applyColorMap(disparity1, falsemap, cv::COLORMAP_JET);
        imshow("disparity_rectified_color", falsemap);
        imshow( "BM", falseColorsMap);
        imshow( "CSGBM",sfalseColorsMap);
        //wait for 40 milliseconds
        int c = cvWaitKey(40);
        //exit the loop if user press "Esc" key (ASCII value of "Esc" is 27)
        if(27 == char(c)) break;
    }
    return 0;
}
Now the output still isn't that good, but it has improved from last time. However, there seems to be one constant problem, which is also visible in the image above: the left side of the output image has a completely black region. It shouldn't look like this, right?
How can I solve this problem?
Any help appreciated.

ASL hand sign detection opencv

I am a bit new to opencv and could use some help. I want to detect ASL hand signs.
For detecting hands, I can use either detection by skin color or a haar classifier. I already detect hands, but the problem is detecting the hand shape.
I can get the current hand shape using the algorithm described here, so the problem is: how do I compare this shape to my database of shapes?
I tried comparing them using the algorithm described here, which detects features that images have in common. The problem is that this matches the shape against all the hands, since... well, it detects them all as hands. For instance, check this image: it should point only to V, but it detects features in W and R, too.
I want my final result to be like here, so how can I compare image shapes? Is my approach wrong?
I was thinking that detection by convex hull won't work, because most of the signs are closed fists. Check O, for instance: it has no open fingers, so I thought that comparing contours would be best. How do I compare them, though? FLANN doesn't seem to work. Or I'm doing it wrong.
Would a Haar cascade classifier work? Or would it detect two hands in different positions as hands as well?
Or is there another way to match shapes? That could solve my problem, but I couldn't find any example that does for custom shapes, only for ones like rectangles, circles and triangles.
Ok, I've been playing a bit with matchShapes, as berak suggested. Here's my code below (it's a bit messy as I'm still testing).
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace cv;
using namespace std;
Mat src; Mat src_gray;
int thresh = 10;
int max_thresh = 300;
/// Function header
void thresh_callback(int, void* );
/** #function main */
int main( int argc, char** argv )
{
  /// Load source image and convert it to gray
  src = imread( argv[1], 1 );
  /// Convert image to gray and blur it
  cvtColor( src, src_gray, CV_BGR2GRAY );
  blur( src_gray, src_gray, Size(3,3) );
  /// Create Window
  const char* source_window = "Source";
  namedWindow( source_window, CV_WINDOW_AUTOSIZE );
  imshow( source_window, src );
  createTrackbar( " Canny thresh:", "Source", &thresh, max_thresh, thresh_callback );
  thresh_callback( 0, 0 );
  waitKey(0); // keep the windows open until a key is pressed
  return 0;
}
/** #function thresh_callback */
void thresh_callback(int, void* )
{
  Mat canny_output;
  vector<vector<Point> > contours;
  vector<Vec4i> hierarchy;
  double largest_area=0;
  int largest_contour_index=0;
  Rect bounding_rect;
  /// Detect edges using canny
  Canny( src_gray, canny_output, thresh, thresh*2, 3 );
  /// Find contours
  findContours( canny_output, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0, 0) );
  /// Draw contours
  Mat drawing = Mat::zeros( canny_output.size(), CV_8UC3 );
  vector<vector<Point> >hull( contours.size() );
  for( int i = 0; i< contours.size(); i++ )
  { Scalar color = Scalar( 255,255,255 );
    convexHull( Mat(contours[i]), hull[i], false );
    // imshow("conturul"+to_string(i), drawing );
    double a=contourArea( hull[i],false); // Find the area of contour
    if( a > largest_area ){ // comparison assumed from the variable names; it is not visible in the listing as posted
      largest_area=a;
      largest_contour_index=i; //Store the index of largest contour
    }
  }
  cout<<"zaindex "<<largest_contour_index<<endl;
  Scalar color = Scalar( 255,255,255 );
  drawContours( drawing, hull, largest_contour_index, color, 2, 8, hierarchy, 0, Point() );
  /// Show in a window
  namedWindow( "maxim", CV_WINDOW_AUTOSIZE );
  imshow( "maxim", drawing );
  Mat rects=imread( "scene.png", 1 );
  rectangle(rects, bounding_rect, Scalar(0,255,0),1, 8,0);
  imshow( "maxim2", rects );
}
The problem with it is the definition of a contour. These hand 'contours' are actually made of multiple contours themselves, and the image that I showed earlier is actually made of these multiple contours overlapped with each other. matchShapes accepts arrays of Points as parameters, but the contours are arrays of arrays of Points.
So my question is: how can I combine the contours in my vector so I can pass the result to matchShapes? In other words, how can I make a single contour from multiple overlapping contours?
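As far as the container manipulation goes, turning an "array of arrays of Points" into one array is just a concatenation. The sketch below shows that step only, without OpenCV: `Pt` is a hypothetical stand-in for `cv::Point` and `flattenContours` is a made-up helper name. Whether the merged list is meaningful as a single outline is a separate question: the points are not ordered along one boundary, so a function that treats the vector as a polygon may give different results than it would for a true single contour. An order-independent step such as a convex hull over the merged points is not affected by that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { int x, y; };   // stand-in for cv::Point

// Concatenate all contours into a single point list, keeping their order.
std::vector<Pt> flattenContours(const std::vector<std::vector<Pt>>& contours)
{
    std::size_t total = 0;
    for (const auto& c : contours) total += c.size();

    std::vector<Pt> merged;
    merged.reserve(total);                      // one allocation for all points
    for (const auto& c : contours)
        merged.insert(merged.end(), c.begin(), c.end());

    assert(merged.size() == total);
    return merged;
}
```

The same pattern should carry over to `std::vector<std::vector<cv::Point>>`, since it only relies on standard vector operations.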

goodFeaturesToTrack OpenCV 2.4 Extremely Slow compared to Opencv1

I have this really strange issue and I think I might be doing something wrong. I have an opencv1 implementation of pyramidal Lucas-Kanade and an opencv2 implementation. The difference is that the opencv2 one takes MUCH longer to run (in particular the goodFeaturesToTrack function) than the opencv1 one. In addition, including the opencv2 libs and headers in the opencv1 implementation makes that one extremely slow as well (we're talking about 0.002 s per two images vs. 1 second per two images). Am I doing something wrong?
Windows 7, 64 bit. Here is the opencv2 code that runs really slowly, at about 1 frame per second. As I said, taking the opencv1 implementation and switching library version causes the same slowdown by a factor of 10 or more. I think this is very weird and Google came up with no information! THANKS!!!
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <cmath>
using namespace cv;
using namespace std;
int64 now, then;
double elapsed_seconds, tickspersecond=cvGetTickFrequency() * 1.0e6;
int main(int argc, char** argv)
{
    // Load two images and allocate other structures
    Mat imgA = imread("0000.png", CV_LOAD_IMAGE_GRAYSCALE);
    Mat imgB = imread("0001.png", CV_LOAD_IMAGE_GRAYSCALE);
    Size img_sz = imgA.size();
    Mat imgC(img_sz,1);
    int win_size = 15;
    int maxCorners = 100;
    double qualityLevel = 0.05;
    double minDistance = 2.0;
    int blockSize = 3;
    double k = 0.04;
    std::vector<cv::Point2f> cornersA;
    std::vector<cv::Point2f> cornersB;
    then = cvGetTickCount();
    goodFeaturesToTrack( imgA,cornersA,maxCorners,qualityLevel,minDistance,cv::Mat(),blockSize,true);
    goodFeaturesToTrack( imgB,cornersB,maxCorners,qualityLevel,minDistance,cv::Mat(),blockSize,true);
    now = cvGetTickCount();
    cout << (double)(now - then) / tickspersecond;
    cornerSubPix( imgA, cornersA, Size( win_size, win_size ), Size( -1, -1 ),
                  TermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, 0.03 ) );
    cornerSubPix( imgB, cornersB, Size( win_size, win_size ), Size( -1, -1 ),
                  TermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, 0.03 ) );
    // Call Lucas Kanade algorithm
    CvSize pyr_sz = Size( img_sz.width+8, img_sz.height/3 );
    std::vector<uchar> features_found;
    std::vector<float> feature_errors;
    calcOpticalFlowPyrLK( imgA, imgB, cornersA, cornersB, features_found, feature_errors ,
                          Size( win_size, win_size ), 5,
                          cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, 0.3 ), 0 );
    // Make an image of the results
    for( int i=0; i < features_found.size(); i++ ){
        // cout<<"Error is "<<feature_errors[i]<<endl;
        //cout<<"Got it"<<endl;
        Point p0( ceil( cornersA[i].x ), ceil( cornersA[i].y ) );
        Point p1( ceil( cornersB[i].x ), ceil( cornersB[i].y ) );
        line( imgC, p0, p1, CV_RGB(255,255,255), 2 );
    }
    namedWindow( "ImageA", 0 );
    namedWindow( "ImageB", 0 );
    namedWindow( "LKpyr_OpticalFlow", 0 );
    imshow( "ImageA", imgA );
    imshow( "ImageB", imgB );
    imshow( "LKpyr_OpticalFlow", imgC );
    return 0;
}
You're probably using the debug libraries (*d.lib) instead of the release ones. I had this same problem with ~1-2s per call for goodFeaturesToTrack() and switching to release solved it.
Why are you calling goodFeaturesToTrack twice?
Call it once to get cornersA, then use LK to find the same corners/features in imgB.

Using opencv to match an image from a group of images for purpose of identification in C++

EDIT: I've acquired enough reputation through this post to be able to edit it with more links, which will help me get my point across better
People playing The Binding of Isaac often come across important items on little pedestals.
The goal is to let a user who is confused about what an item is press a button, which then instructs them to "box" the item (think of drawing a selection box on the Windows desktop). The box gives us the region of interest (the actual item plus some background environment) to compare against what will be an entire grid of items.
Theoretical user boxed item
Theoretical grid of items (there are not many more; I just ripped this out of The Binding of Isaac wiki)
The grid location of the item identified as the one the user boxed would map to a link to the Binding of Isaac wiki page giving information on that item.
In the grid, the item is in the 1st column, 3rd row from the bottom. I use these two images in everything I tried below.
My goal is to create a program that can take a manual crop of an item from the game "The Binding of Isaac", identify the cropped item by comparing the image to an image of a table of items in the game, then display the proper wiki page.
This would be my first "real project" in the sense that it requires a huge amount of library learning to get what I want done. It's been a bit overwhelming.
I've messed with a few options just from googling around. (You can quickly find the tutorials I used by searching for the name of the method plus "opencv"; my account is heavily restricted from posting links for some reason.)
Using BruteForceMatcher:
#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"
using namespace cv;
void readme();
/** #function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { return -1; }
  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );
  if( !img_1.data || !img_2.data )
  { return -1; }
  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;
  SurfFeatureDetector detector( minHessian );
  std::vector<KeyPoint> keypoints_1, keypoints_2;
  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );
  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;
  Mat descriptors_1, descriptors_2;
  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );
  //-- Step 3: Matching descriptor vectors with a brute force matcher
  BruteForceMatcher< L2<float> > matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );
  //-- Draw matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2, matches, img_matches );
  //-- Show detected matches
  imshow("Matches", img_matches );
  waitKey(0); // keep the window open until a key is pressed
  return 0;
}
/** #function readme */
void readme()
{ std::cout << " Usage: ./SURF_descriptor <img1> <img2>" << std::endl; }
This results in not very useful-looking matches. Using FLANN gives cleaner but equally unreliable results:
#include <stdio.h>
#include <iostream>
#include "opencv2/core/core.hpp"
#include <opencv2/legacy/legacy.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include "opencv2/highgui/highgui.hpp"
using namespace cv;
void readme();
/** #function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { readme(); return -1; }
  Mat img_1 = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
  Mat img_2 = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );
  if( !img_1.data || !img_2.data )
  { std::cout<< " --(!) Error reading images " << std::endl; return -1; }
  //-- Step 1: Detect the keypoints using SURF Detector
  int minHessian = 400;
  SurfFeatureDetector detector( minHessian );
  std::vector<KeyPoint> keypoints_1, keypoints_2;
  detector.detect( img_1, keypoints_1 );
  detector.detect( img_2, keypoints_2 );
  //-- Step 2: Calculate descriptors (feature vectors)
  SurfDescriptorExtractor extractor;
  Mat descriptors_1, descriptors_2;
  extractor.compute( img_1, keypoints_1, descriptors_1 );
  extractor.compute( img_2, keypoints_2, descriptors_2 );
  //-- Step 3: Matching descriptor vectors using FLANN matcher
  FlannBasedMatcher matcher;
  std::vector< DMatch > matches;
  matcher.match( descriptors_1, descriptors_2, matches );
  double max_dist = 0; double min_dist = 100;
  //-- Quick calculation of max and min distances between keypoints
  for( int i = 0; i < descriptors_1.rows; i++ )
  { double dist = matches[i].distance;
    if( dist < min_dist ) min_dist = dist;
    if( dist > max_dist ) max_dist = dist;
  }
  printf("-- Max dist : %f \n", max_dist );
  printf("-- Min dist : %f \n", min_dist );
  //-- Draw only "good" matches (i.e. whose distance is less than 2*min_dist )
  //-- PS.- radiusMatch can also be used here.
  std::vector< DMatch > good_matches;
  for( int i = 0; i < descriptors_1.rows; i++ )
  { if( matches[i].distance < 2*min_dist )
    { good_matches.push_back( matches[i]); }
  }
  //-- Draw only "good" matches
  Mat img_matches;
  drawMatches( img_1, keypoints_1, img_2, keypoints_2,
               good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
               vector<char>(), DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS );
  //-- Show detected matches
  imshow( "Good Matches", img_matches );
  for( int i = 0; i < good_matches.size(); i++ )
  { printf( "-- Good Match [%d] Keypoint 1: %d -- Keypoint 2: %d \n", i, good_matches[i].queryIdx, good_matches[i].trainIdx ); }
  waitKey(0); // keep the window open until a key is pressed
  return 0;
}
/** #function readme */
void readme()
{ std::cout << " Usage: ./SURF_FlannMatcher <img1> <img2>" << std::endl; }
Template matching has been my best method so far. Across the 6 matching methods, though, it gets anywhere from 0 to 4 correct identifications.
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
using namespace std;
using namespace cv;
/// Global Variables
Mat img; Mat templ; Mat result;
const char* image_window = "Source Image";
const char* result_window = "Result window";
int match_method;
int max_Trackbar = 5;
/// Function Headers
void MatchingMethod( int, void* );
/** #function main */
int main( int argc, char** argv )
{
  /// Load image and template
  img = imread( argv[1], 1 );
  templ = imread( argv[2], 1 );
  /// Create windows
  namedWindow( image_window, CV_WINDOW_AUTOSIZE );
  namedWindow( result_window, CV_WINDOW_AUTOSIZE );
  /// Create Trackbar
  const char* trackbar_label = "Method: \n 0: SQDIFF \n 1: SQDIFF NORMED \n 2: TM CCORR \n 3: TM CCORR NORMED \n 4: TM COEFF \n 5: TM COEFF NORMED";
  createTrackbar( trackbar_label, image_window, &match_method, max_Trackbar, MatchingMethod );
  MatchingMethod( 0, 0 );
  waitKey(0); // keep the windows open until a key is pressed
  return 0;
}
/**
 * #function MatchingMethod
 * #brief Trackbar callback
 */
void MatchingMethod( int, void* )
{
  /// Source image to display
  Mat img_display;
  img.copyTo( img_display );
  /// Create the result matrix (Mat::create takes rows first, then columns)
  int result_cols = img.cols - templ.cols + 1;
  int result_rows = img.rows - templ.rows + 1;
  result.create( result_rows, result_cols, CV_32FC1 );
  /// Do the Matching and Normalize
  matchTemplate( img, templ, result, match_method );
  normalize( result, result, 0, 1, NORM_MINMAX, -1, Mat() );
  /// Localizing the best match with minMaxLoc
  double minVal; double maxVal; Point minLoc; Point maxLoc;
  Point matchLoc;
  minMaxLoc( result, &minVal, &maxVal, &minLoc, &maxLoc, Mat() );
  /// For SQDIFF and SQDIFF_NORMED, the best matches are lower values. For all the other methods, the higher the better
  if( match_method == CV_TM_SQDIFF || match_method == CV_TM_SQDIFF_NORMED )
    { matchLoc = minLoc; }
  else
    { matchLoc = maxLoc; }
  /// Show me what you got
  rectangle( img_display, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
  rectangle( result, matchLoc, Point( matchLoc.x + templ.cols , matchLoc.y + templ.rows ), Scalar::all(0), 2, 8, 0 );
  imshow( image_window, img_display );
  imshow( result_window, result );
}
of the 6
This was sort of a best case result though. The next item I tried was
and resulted in fail,fail,fail,fail,fail,fail
From item to item, each of these methods works well for some and terribly for others.
So I'll ask: is template matching my best bet, or is there a method I'm not considering that will be my holy grail?
How can I get a USER to create the crop manually? OpenCV's documentation on this is really bad, and the examples I find online are extremely old C++ or straight C.
Thanks for any help. This venture has been an interesting experience so far. I had to strip all of the links which would better portray how everything's been working out, but the site is saying I'm posting more than 10 links even when I'm not.
Some more examples of items throughout the game:
The rock is a rare item and one of the few that can be "anywhere" on the screen. Items like the rock are the reason why having the user crop the item is the best way of isolating it; otherwise, item positions are limited to a couple of specific places.
An item after a boss fight: lots of stuff everywhere, and transparency in the middle. I would imagine this being one of the harder ones to get working correctly.
Rare room. Simple background. No item transparency.
Here are the two tables containing all of the items in the game. I'll make them one image eventually, but for now they were taken directly from the Isaac wiki.
One important detail here is that you have a pure image of every item in your table. You know the color of the background and can separate the item from the rest of the picture. For example, in addition to the matrix representing the image itself, you may store a matrix of 1s and 0s of the same size, where ones correspond to the item area and zeros to the background. Let's call this matrix the "mask" and the pure image of the item the "pattern".
There are 2 ways to compare images: match the image against the pattern, or match the pattern against the image. What you have described is matching the image against the pattern: you have some cropped image and want to find a similar pattern. Instead, think about searching for the pattern in the image.
Let's first define a function match() that takes a pattern, a mask and an image of the same size and checks whether the area of the pattern under the mask is exactly the same as in the image (pseudocode):
def match(pattern, mask, image):
    # pattern, mask and image have the same size and are indexed [x, y]
    for x in range(pattern.width):
        for y in range(pattern.height):
            # compare only pixels that belong to the item, not the background
            if mask[x, y] == 1 and pattern[x, y] != image[x, y]:
                return False  # a masked pixel differs
    return True
But the sizes of the pattern and the cropped image may differ. The standard solution for this (used, for example, in cascade classifiers) is a sliding window: just move the pattern "window" across the image and check whether the pattern matches the selected region. This is pretty much how object detection works in OpenCV.
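The sliding-window idea can be sketched in plain C++ without any OpenCV types. This is only an illustration under a few assumptions: the `Image` struct, `matchAt()` and `findPattern()` are hypothetical names introduced here, pixels are single-channel integers stored row by row, and the mask uses 1 for item pixels and 0 for background.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal single-channel image, stored row by row.
struct Image {
    int width;
    int height;
    std::vector<int> pixels;  // size == width * height
    int at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// True if every pattern pixel under the mask equals the image pixel
// at the same position, with the pattern's top-left corner at (ox, oy).
bool matchAt(const Image& pattern, const Image& mask, const Image& image,
             int ox, int oy) {
    for (int y = 0; y < pattern.height; ++y) {
        for (int x = 0; x < pattern.width; ++x) {
            if (mask.at(x, y) == 1 &&
                pattern.at(x, y) != image.at(ox + x, oy + y)) {
                return false;  // a non-background pixel differs
            }
        }
    }
    return true;
}

// Slide the pattern over every offset where it fits entirely inside the
// image. On the first match, store the offset in (outX, outY) and return true.
bool findPattern(const Image& pattern, const Image& mask, const Image& image,
                 int& outX, int& outY) {
    for (int oy = 0; oy + pattern.height <= image.height; ++oy) {
        for (int ox = 0; ox + pattern.width <= image.width; ++ox) {
            if (matchAt(pattern, mask, image, ox, oy)) {
                outX = ox;
                outY = oy;
                return true;
            }
        }
    }
    return false;
}
```

Calling `findPattern(pattern, mask, image, x, y)` on a cropped screenshot would report where the item sits in the crop, if it is there at all. Background pixels of the pattern are ignored because of the mask, so whatever lies behind the item in the game does not affect the result.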
Of course, this solution is not very robust: cropping, resizing or any other image transformation may change some pixels, and in that case match() will always return false. To overcome this, instead of a boolean answer you can use the distance between the image and the pattern. In this case match() should return a similarity value, say, between 0 and 1, where 1 stands for "exactly the same" and 0 for "completely different". Then you either set a threshold for similarity (e.g. the image should be at least 85% similar to the pattern), or just select the pattern with the highest similarity.
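One possible way to turn the boolean check into a similarity score is to count the fraction of masked pixels that agree within some tolerance. The sketch below is an assumption-laden illustration, not a definitive measure: `similarity()` and `bestPattern()` are hypothetical names, the per-pixel `tolerance` parameter is an addition that is not part of the description above, and the buffers are flat single-channel arrays of equal size.

```cpp
#include <cstddef>
#include <vector>

// Fraction (0..1) of pixels under the mask whose values differ by at most
// `tolerance`. pattern, mask and region must have the same number of elements.
double similarity(const std::vector<int>& pattern,
                  const std::vector<int>& mask,
                  const std::vector<int>& region,
                  int tolerance) {
    std::size_t counted = 0;
    std::size_t agreeing = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (mask[i] != 1) continue;  // skip background pixels
        ++counted;
        int diff = pattern[i] - region[i];
        if (diff < 0) diff = -diff;
        if (diff <= tolerance) ++agreeing;
    }
    // An empty mask carries no evidence, so report "completely different".
    return counted == 0 ? 0.0 : static_cast<double>(agreeing) / counted;
}

// Index of the pattern with the highest similarity that also reaches
// `threshold`, or -1 if no pattern is similar enough.
int bestPattern(const std::vector<std::vector<int> >& patterns,
                const std::vector<std::vector<int> >& masks,
                const std::vector<int>& region,
                int tolerance, double threshold) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        double score = similarity(patterns[i], masks[i], region, tolerance);
        if (score >= threshold && (best == -1 || score > bestScore)) {
            best = static_cast<int>(i);
            bestScore = score;
        }
    }
    return best;
}
```

With a threshold of 0.85 this combines both options from the paragraph: patterns below 85% similarity are rejected, and among the rest the most similar one wins. Other distance measures (for example a normalized sum of squared differences) would fit the same structure.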
Since items in the game are artificial images and the variation in them is very small, this approach should be enough. However, for more complicated cases you will need features other than simply the pixels under the mask. As I already suggested in my comment, methods like Eigenfaces, a cascade classifier using Haar-like features or even Active Appearance Models may be more effective for these tasks. As for SURF, as far as I know it's better suited to tasks where the angle and size of the object vary, but not to different backgrounds and the like.
I came upon your question while trying to figure out my own template-matching issue, and now I'm back to share what I think might be your best bet based on my own experience. You've probably long since abandoned this, but hey, someone else might be in similar shoes one day.
None of the items that you shared are a solid rectangle, and since template matching in older OpenCV versions cannot work with a mask (newer releases of matchTemplate accept an optional mask argument for some methods), you'll always be comparing your reference image against what I must assume is at least several different backgrounds (not to mention the items that are found in varied locations on different backgrounds, making the template match even worse).
It will always be comparing the background pixels and confounding your match unless you can collect a crop of every single situation where the reference image can be found. If decals of blood etc. introduce yet more variability into the backgrounds around the items, then template matching probably won't get great results.
So there are two things I would try if I were you, depending on some details:
If possible, crop a reference template of every situation where the item is found (this will not be a good time), then compare the user-specified area against every template of every item. Take the best result from these comparisons and you will, if lucky, have a correct match.
The example screenshots you shared don't have any dark/black lines on the background, so the outlines of all of the items stand out. If this is consistent throughout the game, you can find edges within the user-specified area and detect the exterior contours. Ahead of time you would have processed the exterior contours of each reference item and stored those contours. Then you can compare the contour(s) in the user's crop against each contour in your database, taking the best match as the answer.
I'm confident either of those could work for you, depending on whether the game is well-represented by your screenshots.
Note: The contour matching will be much, much faster than the template matching. Perhaps fast enough to run in real time and remove the need for the user to crop anything.