Template Matching with Mask - c++

I want to perform template matching with a mask. In general, template matching can be made faster by converting the image from the spatial domain to the frequency domain. Is there a method I can apply if I want to do the same with a mask? I'm using OpenCV with C++. Is there a matching function for this task already in OpenCV?
My current approach:
Bitwise XOR image A and image B, then apply the mask.
Count the non-zero pixels.
Fill the result matrix with this count.
Search for the maxima.
A few parameters I'm guessing at for now:
Skip the tile position if the matches are less than 25%.
Skip the tile position if the previous tile has less than 50% matches.
My question: is there an existing algorithm for this kind of matching? Is there a mathematical operation that can speed up the process?
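A minimal sketch of the counting approach listed above, written in plain C++ without OpenCV so that it stands alone. The Gray8 struct and the function names are hypothetical and exist only for this illustration. Pixels are compared for equality under the mask (the same information as XOR followed by a count), and each position is scored by the fraction of masked pixels that agree, so the best position is the maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major 8-bit single-channel image; a hypothetical stand-in for cv::Mat.
struct Gray8 {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Fraction of masked template pixels that equal the image pixels when the
// template's top-left corner is placed at (ox, oy).
double maskedMatchFraction(const Gray8& image, const Gray8& templ,
                           const Gray8& mask, int ox, int oy)
{
    assert(mask.width == templ.width && mask.height == templ.height);
    assert(ox >= 0 && oy >= 0);
    assert(ox + templ.width <= image.width && oy + templ.height <= image.height);

    int considered = 0;
    int matched = 0;
    for (int y = 0; y < templ.height; ++y) {
        for (int x = 0; x < templ.width; ++x) {
            if (mask.at(x, y) == 0) continue;   // pixel is outside the mask
            ++considered;
            if (image.at(ox + x, oy + y) == templ.at(x, y)) ++matched;
        }
    }
    return considered > 0 ? static_cast<double>(matched) / considered : 0.0;
}

struct BestMatch { int x; int y; double score; };

// Exhaustive search over every placement; keeps the first position with the
// highest score.
BestMatch findBestMatch(const Gray8& image, const Gray8& templ, const Gray8& mask)
{
    BestMatch best{-1, -1, -1.0};
    for (int oy = 0; oy + templ.height <= image.height; ++oy) {
        for (int ox = 0; ox + templ.width <= image.width; ++ox) {
            const double score = maskedMatchFraction(image, templ, mask, ox, oy);
            if (score > best.score) best = BestMatch{ox, oy, score};
        }
    }
    return best;
}
```

The early-exit thresholds from the question (25% and 50%) could be added inside the inner loops, but they are left out here to keep the sketch short.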

With binary images, you can directly use Hu moments and the Mahalanobis distance to check whether image A is similar to image B. If the distance tends to 0, the images are the same.
Of course you can also use feature detectors to see what matches, but for pictures like these, Hu moments and feature detectors will give approximately the same results, and Hu moments are more efficient.
Using findContours, you can extract the black regions inside the white star and fill them, so that image A = image B.
Another approach: run findContours on your mask and apply the result to image A (extracting the region of interest). You can then extract what's inside the star and count how many black pixels you have (the mismatching ones).

I have the same requirement and tried almost the same approach. As in the image, I want to match the castle. The castle has a different shield image, a variable-length clan name and a grass background (the image comes from the game Clash of Clans). The normal OpenCV matchTemplate does not work, so I wrote my own.
I follow the way matchTemplate creates a result image, but with a different algorithm.
The core idea is to count the matched pixels under the mask. The code follows; it is simple.
This works fine, but the time cost is high. As you can see, it takes 457 ms.
Now I am working on the optimization.
The source and template images are both CV_8UC3, and the mask image is CV_8U. Matching a single channel is enough. It is faster, but the cost is still high.
// cv::Mat takes (rows, cols, type), in that order
Mat tmp(matTempl.rows, matTempl.cols, matTempl.type());
int matchCount = 0;
float maxVal = 0;
double areaInvert = 1.0 / countNonZero(matMask);
for (int j = 0; j < resultRows; j++)
{
    float* data = imgResult.ptr<float>(j);
    for (int i = 0; i < resultCols; i++)
    {
        Mat matROI(matSource, Rect(i, j, matTempl.cols, matTempl.rows));
        // non-zero wherever source and template differ, restricted to the mask
        bitwise_xor(matROI, matTempl, tmp);
        bitwise_and(tmp, matMask, tmp);
        // fraction of masked pixels that match
        data[i] = 1.0f - float(countNonZero(tmp) * areaInvert);
        if (data[i] > matchingDegree)
        {
            SRect rc;
            rc.left = i;
            rc.top = j;
            rc.right = i + imgTemplate.cols;
            rc.bottom = j + imgTemplate.rows;
            rcOuts.push_back(rc);
            if (data[i] > maxVal)
            {
                maxVal = data[i];
                maxIndex = rcOuts.size() - 1;
            }
            if (++matchCount == maxMatchs)
            {
                Log_Warn("Too many matches, stopped at: " << matchCount);
                return true;
            }
        }
    }
}
It says I do not have enough reputation to post images.
Newly added:
I succeeded in optimizing the algorithm by using key points. Calculating all the points is costly, but it is much faster to calculate only several key points. See the picture: the cost decreases greatly, and it is now about 7 ms.
I still cannot post images, so please visit: http://i.stack.imgur.com/ePcD9.png
Please give me reputation so I can post images. :)

The OpenCV documentation gives a formulation for template matching with a mask, and it works well. It can be used by calling cv::matchTemplate with its optional mask argument; the source code is also available under the Intel License.
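As far as I read the OpenCV documentation, the masked squared-difference score is the sum over template pixels of ((T - I) * M)^2, where T is the template, I the image patch under it and M the mask. The following plain C++ sketch evaluates that expression for a single placement. The function name and the flat float buffers are assumptions made for illustration; it is not OpenCV's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Masked sum of squared differences for one placement. All three buffers
// hold the same number of samples, in the same order.
double maskedSqDiff(const std::vector<float>& patch,
                    const std::vector<float>& templ,
                    const std::vector<float>& mask)
{
    assert(patch.size() == templ.size() && templ.size() == mask.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < templ.size(); ++i) {
        // the difference is weighted by the mask before squaring
        const double d = (static_cast<double>(templ[i]) - patch[i]) * mask[i];
        sum += d * d;
    }
    return sum;
}
```

A mask value of 0 removes a pixel from the score entirely, and a lower score means a better match.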


OpenCV homography - question about deringing lanczos interpolation

I'm attempting to improve performance of the OpenCV lanczos interpolation algorithm for applying homography transformations to astronomical images, as it is prone to ringing artefacts around stars in some images.
My approach is to apply homography twice, once using lanczos and once using bilinear filtering which is not susceptible to ringing, but doesn't perform as well at preserving detail. I then use the bilinear-interpolated output as a guide image, and clamp the lanczos-interpolated output to the guide if it undershoots the guide by more than a given percentage.
I have working code (below) but have 2 questions:
It doesn't seem optimal to iterate across elements in the Mat. Is there a better way of doing the compare and replace loop using OpenCV Mat methods?
My overall approach is computationally expensive: I'm applying the homography to the entire Mat twice. Is there a better overall approach to preventing ringing in lanczos interpolation? (Rewriting the entire algorithm plus all the various optimisations that OpenCV makes available is not an option for me.)
warpPerspective(in, out, H, Size(target_rx, target_ry), interpolation, BORDER_TRANSPARENT);
if (interpolation == OPENCV_LANCZOS4) {
    int count = 0;
    // factor sets how big an undershoot can be tolerated
    double factor = 0.75;
    // Create guide image
    warpPerspective(in, guide, H, Size(target_rx, target_ry), OPENCV_LINEAR, BORDER_TRANSPARENT);
    // Compare the two, replace out pixels with guide pixels if too far out
    for (int i = 0 ; i < out.rows ; i++) {
        const double* outi = out.ptr<double>(i);
        const double* guidei = guide.ptr<double>(i);
        for (int j = 0; j < out.cols ; j++) {
            if (outi[j] < guidei[j] * factor) {
                out.at<double>(i, j) = guidei[j];
            }
        }
    }
}
With a steer from Christoph Rackwitz, the answer was surprisingly simple:
compare(out, (guide * factor), mask, CMP_LT);
guide.copyTo(out, mask);
Thanks :)
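For readers without OpenCV at hand, the element-wise rule that the compare and copyTo calls express can be sketched in plain C++. The function name and the flat std::vector buffers are hypothetical; the sketch only shows the rule and makes no attempt to match OpenCV's speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces out[i] with guide[i] wherever out[i] undershoots guide[i] * factor.
// Returns how many samples were replaced.
std::size_t clampUndershoot(std::vector<double>& out,
                            const std::vector<double>& guide,
                            double factor)
{
    assert(out.size() == guide.size());
    std::size_t replaced = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] < guide[i] * factor) {   // same test as CMP_LT against guide * factor
            out[i] = guide[i];              // same effect as guide.copyTo(out, mask)
            ++replaced;
        }
    }
    return replaced;
}
```

With factor = 0.75 as in the question, a Lanczos sample is kept unless it falls below three quarters of the bilinear value at the same position.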

Finding repeated pattern in a series of numbers in C++

I am trying to implement an automatic grid detection system for electrocardiogram (ECG) paper; see the figure below. The idea is to sum the pixel values (only the red channel is considered) by going through the ECG image pixel by pixel, as shown in the code below.
QImage img("C:/Users/.../Desktop/ECGProject/electrocardiogram.jpg");
std::vector<int> pixelValues;
for (int y = 0; y < img.height(); y++)
{
    int rowSumR = 0, rowSumG = 0, rowSumB = 0;
    for (int x = 0; x < img.width(); x++)
    {
        QRgb rgb = img.pixel(x, y);
        rowSumR += qRed(rgb);
    }
    // average red value of this row
    rowSumR /= img.width();
    const int &value = rowSumR/4;
    pixelValues.push_back(value);
}
The vector pixelValues contains the summed values, which have a repeated pattern in the y direction. The goal is to detect that repeated pattern (for instance, the lines drawn in black on the ECG image are what I am looking to identify in the y direction). I also plotted the summed pixel values in the y direction using MATLAB (see the figure below); the red circles mark the pattern I am interested in. Any suggestion or algorithm for finding this repeated pattern would be appreciated.
If you need to identify the number of bold red grid lines and "cut off" the similar patterns associated with each "period", I would suggest using the pitch tracking algorithms used in speech processing. One such approach, which computes the so-called pitch track, is described in this work:
If you need help implementing that algorithm, I can do it for you if you provide me with the data.
I wrote the following program for you in MATLAB:
load data.txt
y = data(:,2);
yr = resample(y,10,1);
xhat = cceps(yr);
maxima = zeros(10000,1);
cnt = 1;
for i = 2:length(xhat)-1
    if xhat(i-1) < xhat(i) && xhat(i+1) < xhat(i)
        maxima(cnt) = i-1;
        cnt = cnt + 1;
    end
end
maxima(cnt:end) = [];
The cepstrum is a signal processing tool that allows detection of periodicity. It effectively deconvolves signals. Say that, in our case, we have an impulse train convolved with some pattern. Cepstral analysis 'decouples' the impulse train from the pattern. The period of the impulse train results in a maximum at the corresponding position in the cepstrum. If you run this program, you can see from the output that the fine-grained periodicity has a mean period of 3.5 pixels and the coarse periodicity (the one whose impulses you marked in red) has a mean period of 23.4 pixels (note the interpolation). Based on this observation, you can use correlation analysis to refine the local placement of the impulses, a technique known in speech processing as pitch analysis. This last step might be necessary since there are apparent irregularities in the peak placement. Let me know if you have further doubts.
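Since the question asks for C++, here is a minimal sketch of the correlation idea mentioned above. Note that it is plain autocorrelation, not the cepstrum used in the MATLAB program: it returns the lag, within a caller-chosen range, at which the mean-removed signal correlates best with a shifted copy of itself. The function name and the lag-range parameters are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Returns the lag in [minLag, maxLag] with the highest autocorrelation of
// the mean-removed signal. Ties go to the smallest lag.
std::size_t estimatePeriod(const std::vector<double>& signal,
                           std::size_t minLag, std::size_t maxLag)
{
    assert(minLag >= 1 && minLag <= maxLag && maxLag < signal.size());
    const double mean =
        std::accumulate(signal.begin(), signal.end(), 0.0) / signal.size();

    std::size_t bestLag = minLag;
    double bestScore = std::numeric_limits<double>::lowest();
    for (std::size_t lag = minLag; lag <= maxLag; ++lag) {
        double sum = 0.0;
        for (std::size_t i = 0; i + lag < signal.size(); ++i)
            sum += (signal[i] - mean) * (signal[i + lag] - mean);
        // average over the overlapping samples so long lags are not penalised
        const double score = sum / static_cast<double>(signal.size() - lag);
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}
```

For the row sums in pixelValues, minLag would have to be set above the fine grid spacing so that the coarse period is the one found. Multiples of the true period score about as well as the period itself, so maxLag should stay below twice the expected spacing.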

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,p1,q1,i,j;
//New image dimensions
for (y=0; y<image->height; y++){ // rows
    for (x=0; x<image->width; x++){ // columns
        for (z=0; z<3; z++){ // channels
            for (i=-3;i<=3;i++) {
                for (j=-3;j<=3;j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
The first point is to use pointers to your data. Never use indexes at every pixel. When you write: src(p1-i,q1-j,z) or pset(x, y, z, Calc) how much computation is being made? Use pointers to data and manipulate those.
Second: your algorithm is wrong. You don't want an average filter, but you want to make a grid on your source image and for every grid cell compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    // reset the per-row accumulators
    memset(paccum = accum.data(), 0, Xnew*4);
    memset(pcount = count.data(), 0, Xnew*4);
    // consume every source row that falls into destination row dr
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin++;
            *pcount += 1;
            // move to the next cell when the next source column belongs to it
            if ((sc + 1) * Xnew / w > dc) {
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        ++sr;
    }
    // sum / count gives the cell averages for this output row
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but later I changed the variables names in order to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
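The grid-cell averaging described above can also be written with plain indices, which is slower than the pointer version but easier to check. This is a minimal single-channel sketch; the function name and the flat std::vector layout are assumptions, and it only handles downscaling (destination no larger than source).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downscales a row-major 8-bit single-channel image by averaging every
// source pixel that falls into each destination cell.
std::vector<std::uint8_t> boxDownscale(const std::vector<std::uint8_t>& src,
                                       int srcW, int srcH, int dstW, int dstH)
{
    assert(dstW > 0 && dstH > 0 && dstW <= srcW && dstH <= srcH);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);

    std::vector<std::uint32_t> sum(static_cast<std::size_t>(dstW) * dstH, 0);
    std::vector<std::uint32_t> count(sum.size(), 0);

    for (int sy = 0; sy < srcH; ++sy) {
        const int dy = sy * dstH / srcH;        // destination row of this source row
        for (int sx = 0; sx < srcW; ++sx) {
            const int dx = sx * dstW / srcW;    // destination column of this source column
            sum[dy * dstW + dx] += src[sy * srcW + sx];
            count[dy * dstW + dx] += 1;
        }
    }

    std::vector<std::uint8_t> dst(sum.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        // every cell receives at least one pixel because dst <= src
        dst[i] = static_cast<std::uint8_t>(sum[i] / count[i]);
    }
    return dst;
}
```

Each source pixel is read exactly once regardless of the scale factor, so the cost depends on the source size rather than on a sampling radius. For RGB data the same function could be run once per channel.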
The first thing you should do is set up a performance evaluation system to measure how much any change affects performance.
As said previously, you should use pointers rather than indexes for a (probably) substantial speed-up, and you should not simply average, because basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels". A kernel is the matrix representing the weight of each pixel used. That way, you will be able to test different strategies and optimize quality.
Example of kernels:
Upsampling/downsampling kernel:
Note: from the code it seems you apply a 3x3 kernel, although it was initially done with a 7x7 kernel. The equivalent 3x3 kernel as posted would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
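A minimal sketch of the kernel idea, assuming a row-major image of doubles and a square kernel of odd size. The function names are hypothetical. Swapping boxKernel for another weight matrix changes the filter without touching the sampling loop, which is the point of working with kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum of the neighbourhood of (cx, cy). Coordinates that fall
// outside the image are clamped to the nearest edge pixel.
double applyKernelAt(const std::vector<double>& image, int width, int height,
                     const std::vector<double>& kernel, int ksize,
                     int cx, int cy)
{
    assert(ksize % 2 == 1);
    assert(kernel.size() == static_cast<std::size_t>(ksize) * ksize);
    assert(image.size() == static_cast<std::size_t>(width) * height);

    const int r = ksize / 2;
    double sum = 0.0;
    for (int ky = -r; ky <= r; ++ky) {
        const int y = std::min(std::max(cy + ky, 0), height - 1);
        for (int kx = -r; kx <= r; ++kx) {
            const int x = std::min(std::max(cx + kx, 0), width - 1);
            sum += image[y * width + x] * kernel[(ky + r) * ksize + (kx + r)];
        }
    }
    return sum;
}

// The averaging kernel shown above: every weight is 1 / (ksize * ksize).
std::vector<double> boxKernel(int ksize)
{
    return std::vector<double>(static_cast<std::size_t>(ksize) * ksize,
                               1.0 / (ksize * ksize));
}
```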

colorbalance in an image using c++ and opencv

I'm trying to score the color balance of an image using C++ and OpenCV.
The easiest way to do this is to count the number of pixels of each color and then see whether one of the colors is more prevalent.
I figured I should probably use calcHist, and with the split function I can split an image into R, G and B histograms. However, I am unsure what to do next. I could probably walk through all the bins and see how many pixels are in there, but this seems like a lot of work (I currently use 256 bins).
Is there a faster way to count the pixels in a color range? Also, I am not sure how it would work if white or black is the more prevalent color.
Automatic color balance algorithm is described in this link http://web.stanford.edu/~sujason/ColorBalancing/simplestcb.html
For C++ Code you can refer to this link : https://www.morethantechnical.com/2015/01/14/simplest-color-balance-with-opencv-wcode/
/// perform the Simplest Color Balancing algorithm
void SimplestCB(Mat& in, Mat& out, float percent) {
    assert(in.channels() == 3);
    assert(percent > 0 && percent < 100);
    float half_percent = percent / 200.0f;
    vector<Mat> tmpsplit; split(in,tmpsplit);
    for(int i=0;i<3;i++) {
        //find the low and high percentile values (based on the input percentile)
        Mat flat; tmpsplit[i].reshape(1,1).copyTo(flat);
        cv::sort(flat,flat,CV_SORT_EVERY_ROW + CV_SORT_ASCENDING);
        int lowval = flat.at<uchar>(cvFloor(((float)flat.cols) * half_percent));
        //clamp the index so that it cannot run past the last element
        int highval = flat.at<uchar>(std::min(flat.cols - 1, cvCeil(((float)flat.cols) * (1.0 - half_percent))));
        cout << lowval << " " << highval << endl;
        //saturate below the low percentile and above the high percentile
        tmpsplit[i].setTo(lowval,tmpsplit[i] < lowval);
        tmpsplit[i].setTo(highval,tmpsplit[i] > highval);
        //scale the channel
        normalize(tmpsplit[i],tmpsplit[i],0,255,NORM_MINMAX);
    }
    //recombine the balanced channels into the output image
    merge(tmpsplit,out);
}
// Usage example
int main() {
    Mat tmp,im = imread("lily.png");
    SimplestCB(im,tmp,1);
    return 0;
}
Colour balance is normally done by looking at a white (or gray) surface and checking the ratios of red and blue to green. A perfectly balanced system would have equal signal levels in red, green and blue for such a surface.
You can then simply work out the average red and blue levels relative to green from the test gray card image and apply the same scaling to your real image.
Doing it on a live image with no reference is trickier: you have to find areas that are probably white (i.e. bright and nearly r=g=b) and use them as the reference.
There's no definitive algorithm for colour balance, so anything you might implement, however good it is, will probably fail in some conditions.
One of the simplest algorithms is called Grey World, and assumes that statistically the average colour of a scene should be grey. And if it isn't, it means that it needs to be corrected to grey. So, very simply (in pseudo-python), if you have an image RGB:
# RGB is assumed to be an N x 3 float array, one row per pixel
cc = np.zeros(3)
cc[0] = np.mean(RGB[:,0]) # calculating channel-wise average
cc[1] = np.mean(RGB[:,1])
cc[2] = np.mean(RGB[:,2])
cc = cc / np.sqrt((cc**2).sum()) # normalise the light (you might want to
                                 # play with this a bit)
RGB /= cc # divide every pixel by the estimated light
Note that here I'm assuming that RGB is an array of floats with values between 0 and 1. Something else that helps is to exclude from the average pixels that contain values below and above certain thresholds (e.g., below 0.05 and above 0.95). This way you ignore pixels whose value is heavily influenced by noise (small values) and pixels that saturated the camera sensor and whose colour may not be reliable (large values).
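Because the question is about C++, here is a minimal sketch of the same Grey World steps in plain C++. The Rgb alias and the function name are hypothetical, pixel values are assumed to be floats in the range 0 to 1, and the threshold-based exclusion of dark and saturated pixels mentioned above is left out for brevity.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Rgb = std::array<double, 3>;

// Estimates the light as the channel-wise mean, normalises it to unit
// length, and divides every pixel by it. Returns the estimated light.
Rgb greyWorld(std::vector<Rgb>& pixels)
{
    assert(!pixels.empty());

    Rgb light{0.0, 0.0, 0.0};
    for (const Rgb& p : pixels)
        for (int c = 0; c < 3; ++c) light[c] += p[c];

    double norm = 0.0;
    for (int c = 0; c < 3; ++c) {
        light[c] /= static_cast<double>(pixels.size());
        norm += light[c] * light[c];
    }
    norm = std::sqrt(norm);
    assert(norm > 0.0);                     // an all-black image has no estimate
    for (double& v : light) v /= norm;

    for (Rgb& p : pixels)
        for (int c = 0; c < 3; ++c)
            if (light[c] > 0.0) p[c] /= light[c];   // leave an empty channel untouched
    return light;
}
```

The corrected values can exceed 1, so a clipping or rescaling step is usually needed before converting back to 8 bits.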

Recognizing an image from a list with OpenCV SIFT using the FLANN matching

The point of the application is to recognize an image from an already set list of images. The list of images have had their SIFT descriptors extracted and saved in files. Nothing interesting here:
std::vector<cv::KeyPoint> detectedKeypoints;
cv::Mat objectDescriptors;
// Extract data
cv::SIFT sift;
sift.detect(image, detectedKeypoints);
sift.compute(image, detectedKeypoints, objectDescriptors);
// Save the file
cv::FileStorage fs(file, cv::FileStorage::WRITE);
fs << "descriptors" << objectDescriptors;
fs << "keypoints" << detectedKeypoints;
Then the device takes a picture. SIFT descriptors are extracted in the same way. The idea now was to compare the descriptors to the ones from the files. I am doing that using the FLANN matcher from OpenCV. I am trying to quantify the similarity, image by image. After going through the whole list I should have the best match.
const cv::Ptr<cv::flann::IndexParams>& indexParams = new cv::flann::KDTreeIndexParams(1);
const cv::Ptr<cv::flann::SearchParams>& searchParams = new cv::flann::SearchParams(64);
// Match using Flann
cv::Mat indexMat;
cv::FlannBasedMatcher matcher(indexParams, searchParams);
std::vector< cv::DMatch > matches;
matcher.match(objectDescriptors, readDescriptors, matches);
After matching I understand that I get a list of the closest found distances between the feature vectors. I find the minimum distance and, using it I can count "good matches" and even get a list of the respective points:
// Count the number of matches where the distance is less than 2 * min_dist
int goodCount = 0;
for (int i = 0; i < objectDescriptors.rows; i++)
{
    if (matches[i].distance < 2 * min_dist)
    {
        ++goodCount;
        // Save the points for the homography calculation
    }
}
I'm showing only the easy parts of the code to make this easier to follow; I know some of it doesn't need to be here.
Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out to mostly just point me to the image with the most descriptors. What I tried after this was computing the homography. The aim was to compute it and see whether it's a valid homography or not. The hope was that a good match, and only a good match, would have a homography that is a good transformation. Creating the homography was done simply using cv::findHomography on obj and scene, which are std::vector<cv::Point2f>. I checked the validity of the homography using some code I found online:
bool niceHomography(cv::Mat H)
{
    std::cout << H << std::endl;
    const double det = H.at<double>(0, 0) * H.at<double>(1, 1) - H.at<double>(1, 0) * H.at<double>(0, 1);
    if (det < 0)
    {
        std::cout << "Homography: bad determinant" << std::endl;
        return false;
    }
    const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0) + H.at<double>(1, 0) * H.at<double>(1, 0));
    if (N1 > 4 || N1 < 0.1)
    {
        std::cout << "Homography: bad first column" << std::endl;
        return false;
    }
    const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1) + H.at<double>(1, 1) * H.at<double>(1, 1));
    if (N2 > 4 || N2 < 0.1)
    {
        std::cout << "Homography: bad second column" << std::endl;
        return false;
    }
    const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0) + H.at<double>(2, 1) * H.at<double>(2, 1));
    if (N3 > 0.002)
    {
        std::cout << "Homography: bad third row" << std::endl;
        return false;
    }
    return true;
}
I don't understand the math behind this so, while testing, I sometimes replaced this function with a simple check whether the determinant of the homography was positive. The problem is that I kept having issues here. The homographies were either all bad, or good when they shouldn't have been (when I was checking only the determinant).
I figured I should actually use the homography and for a number of points just compute their position in the destination image using their position in the source image. Then I would compare these average distances, and I would ideally get a very obvious smaller average distance in the case of the correct image. This did not work at all. All the distances were colossal. I thought I might have used the homography the other way around to calculate the right position, but switching obj and scene with each other gave similar results.
Other things I tried were SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, getting the n smallest distances for every image instead of a number depending on the minimum distance, or getting distances depending on a global maximum distance. None of these approaches gave me definite good results, and I feel stuck now.
My only next strategy would be to sharpen the images or even turn them to binary images using some local threshold or some algorithms used for segmentation. I am looking for any suggestions or mistake anyone can see in my work.
I don't know whether this is relevant, but I added some of the images I am testing this on. Many times in the test images, most of the SIFT vectors come from the frame (higher contrast) rather than the painting. This is why I'm thinking sharpening the images might work, but I don't want to go deeper in case something I did previously is wrong.
The gallery of images is here with the descriptions in the titles. The images are of quite high resolution, please view in case it might give some hints.
You can try to test if when matching, the lines between the source image and the target image are relatively parallel. If it's not a correct match, then you'd have a lot of noise and the lines won't be parallel.
See the attached image which shows a correct match (using SURF and BF) - all the lines are mostly parallel (though I should point out that this is an easy example).
You are going the right way.
First, use the second-nearest-neighbour ratio instead of your "good match by 2*min_dist" criterion: https://stackoverflow.com/a/23019889/1983544.
Second, use the homography differently. When you find a homography, you get not only the H matrix but also the number of correspondences consistent with it. Check whether that is a reasonable number, say >=15. If it is less, then the object is not matched.
Third, if you have a big viewpoint change, SIFT or SURF are unable to match the images. Try MODS instead (http://cmp.felk.cvut.cz/wbs/ has Windows and Linux binaries, as well as the paper describing the algorithm) or ASIFT (much slower and matches much worse, but open source): http://www.ipol.im/pub/art/2011/my-asift/
Or at least use the MSER or Hessian-Affine detector instead of SIFT (retaining SIFT as the descriptor).
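The first two suggestions can be sketched in plain C++ as follows. The struct and function names are hypothetical, and the ratio and pixel tolerance are values the caller has to choose. In OpenCV the two distances per query would typically come from knnMatch with k = 2, and findHomography with RANSAC can return an inlier mask directly, so this sketch only illustrates the logic.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Distances from one query descriptor to its two nearest neighbours.
struct TwoNearest { float best; float secondBest; };

// Keeps the indices of matches whose best distance is clearly smaller than
// the second best (the second-nearest-neighbour ratio test).
std::vector<std::size_t> ratioTest(const std::vector<TwoNearest>& candidates,
                                   float maxRatio)
{
    assert(maxRatio > 0.0f && maxRatio < 1.0f);
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (candidates[i].best < maxRatio * candidates[i].secondBest)
            kept.push_back(i);
    return kept;
}

struct Point2 { double x; double y; };
using Homography = std::array<double, 9>;   // row-major 3x3 matrix

// Maps p through H in homogeneous coordinates. Returns false when the
// point maps to infinity.
bool project(const Homography& H, const Point2& p, Point2& out)
{
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    if (std::fabs(w) < 1e-12) return false;
    out.x = (H[0] * p.x + H[1] * p.y + H[2]) / w;
    out.y = (H[3] * p.x + H[4] * p.y + H[5]) / w;
    return true;
}

// Counts the correspondences obj[i] -> scene[i] that H reproduces to within
// maxError pixels.
std::size_t countInliers(const Homography& H,
                         const std::vector<Point2>& obj,
                         const std::vector<Point2>& scene,
                         double maxError)
{
    assert(obj.size() == scene.size());
    std::size_t inliers = 0;
    for (std::size_t i = 0; i < obj.size(); ++i) {
        Point2 q{0.0, 0.0};
        if (!project(H, obj[i], q)) continue;
        if (std::hypot(q.x - scene[i].x, q.y - scene[i].y) <= maxError)
            ++inliers;
    }
    return inliers;
}
```

An image from the list would then be accepted only if countInliers returns at least the chosen minimum (15 in the suggestion above), which does not depend on how many descriptors the image has.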