Smoothing a contour with a lookup table / levels mapping (OpenCV) - c++

I'm trying to smooth jagged contours drawn by OpenCV's drawContours() method. I'm applying a Gaussian blur to the contour, then trying to use a lookup table to map the pixel intensities.
However, I don't know what values to use in my lookup table. Right now I'm just guessing at arbitrary numbers. I put together a small mockup: The first two images are results directly from OpenCV. The last image is achieved through Photoshop's levels feature. As you can see it's smoother.
How do I know what values to use in my look up table?
// uchar avoids overflow when storing 255 (a plain char may be signed)
std::vector<uchar> lut(256);
for (int i = 0; i <= 255; ++i) {
    if (i >= 75) lut[i] = 255;
    else if (i <= 25) lut[i] = 0;
    else lut[i] = static_cast<uchar>(i);
}
cv::LUT(contoursOverlay, lut, contoursOverlay);

Have you considered applying just a dilation filter (optionally followed by a blur)?
It could be a simpler and better solution. I doubt there is an established good practice for your case, so I'm not sure it is the right solution.
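On the original question of which table values to use: a levels adjustment is commonly described as a linear stretch between a black point and a white point, so one possible choice is to stretch the blurred edge ramp over the full range instead of passing the mid-tones through unchanged. The sketch below is plain C++ without OpenCV. The name `makeLevelsLut` is hypothetical, the 25/75 thresholds are simply borrowed from the question, and this is an assumption about how such a mapping could look, not Photoshop's actual implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds a 256-entry "levels"-style table.
// Inputs at or below `black` map to 0, inputs at or above `white` map to 255,
// and everything in between is stretched linearly over the full 0..255 range.
std::array<std::uint8_t, 256> makeLevelsLut(int black, int white) {
    assert(0 <= black && black < white && white <= 255);
    std::array<std::uint8_t, 256> lut{};
    for (int i = 0; i < 256; ++i) {
        const int clamped = std::min(std::max(i, black), white);
        lut[i] = static_cast<std::uint8_t>((clamped - black) * 255 / (white - black));
    }
    return lut;
}
```

The resulting table could then be wrapped as `cv::Mat(1, 256, CV_8U, lut.data())` and passed to `cv::LUT`. Moving the two thresholds closer together gives a harder edge; moving them apart keeps more of the blur.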


OpenCV homography - question about deringing lanczos interpolation

I'm attempting to improve performance of the OpenCV lanczos interpolation algorithm for applying homography transformations to astronomical images, as it is prone to ringing artefacts around stars in some images.
My approach is to apply homography twice, once using lanczos and once using bilinear filtering which is not susceptible to ringing, but doesn't perform as well at preserving detail. I then use the bilinear-interpolated output as a guide image, and clamp the lanczos-interpolated output to the guide if it undershoots the guide by more than a given percentage.
I have working code (below) but have 2 questions:
It doesn't seem optimal to iterate across elements in the Mat. Is there a better way of doing the compare and replace loop using OpenCV Mat methods?
My overall approach is computationally expensive: I'm applying the homography to the entire Mat twice. Is there a better overall approach to preventing ringing in lanczos interpolation? (Rewriting the entire algorithm plus all the various optimisations that OpenCV makes available is not an option for me.)
warpPerspective(in, out, H, Size(target_rx, target_ry), interpolation, BORDER_TRANSPARENT);
if (interpolation == OPENCV_LANCZOS4) {
    int count = 0;
    // factor sets how big an undershoot can be tolerated
    double factor = 0.75;
    // Create guide image
    warpPerspective(in, guide, H, Size(target_rx, target_ry), OPENCV_LINEAR, BORDER_TRANSPARENT);
    // Compare the two, replace out pixels with guide pixels if too far out
    for (int i = 0; i < out.rows; i++) {
        const double* outi = out.ptr<double>(i);
        const double* guidei = guide.ptr<double>(i);
        for (int j = 0; j < out.cols; j++) {
            if (outi[j] < guidei[j] * factor) {
                out.at<double>(i, j) = guidei[j];
            }
        }
    }
}
With a steer from Christoph Rackwitz, the answer was surprisingly simple:
compare(out, (guide * factor), mask, CMP_LT);
guide.copyTo(out, mask);
Thanks :)

Template Matching with Mask

I want to perform template matching with a mask. In general, template matching can be made faster by converting the image from the spatial domain into the frequency domain. But is there any method I can apply if I want to do the same with a mask? I'm using OpenCV with C++. Is there already a matching function in OpenCV for this task?
My current Approach:
Bitwise Xor Image A & Image B with Mask.
Count the Non-Zero Pixels.
Fill the Resultant matrix with this count.
Search for maxima.
A few parameters I'm currently guessing at:
Skip the tile position if the matches are less than 25%.
Skip the tile position if the previous tile has less than 50% matches.
My question: is there any algorithm to do this matching already? Is there any mathematical operation which can speed up this process?
With binary images, you can directly use Hu moments and the Mahalanobis distance to find out whether image A is similar to image B. If the distance tends to 0, then the images are the same.
Of course, you can also use feature detectors to see what matches, but for pictures like these, Hu moments and feature detectors will give approximately the same results, and Hu moments are more efficient.
Using findContours, you can extract the black regions inside the white star and fill them, in order to have image A = image B.
Other approach: using findContours on your mask and apply the result to Image A (extracting the Region of Interest), you can extract what's inside the star and count how many black pixels you have (the mismatching ones).
I have the same requirement and tried almost the same approach. As in the image, I want to match the castle. The castle has a different shield image, a variable-length clan name, and a grass background (the image comes from the game Clash of Clans). The normal OpenCV matchTemplate does not work for this, so I wrote my own.
I follow matchTemplate's approach of creating a result image, but with a different algorithm.
The core idea is to count the matched pixels under the mask. The code, which follows, is simple.
This works fine, but the time cost is high. As you can see, it takes 457 ms.
I am now working on optimizing it.
The source and template images are both CV_8UC3, and the mask image is CV_8U. Matching a single channel is acceptable and faster, but it is still expensive.
Mat tmp(matTempl.rows, matTempl.cols, matTempl.type());  // Mat takes (rows, cols, type)
int matchCount = 0;
float maxVal = 0;
double areaInvert = 1.0 / countNonZero(matMask);
for (int j = 0; j < resultRows; j++)
{
    float* data = imgResult.ptr<float>(j);
    for (int i = 0; i < resultCols; i++)
    {
        Mat matROI(matSource, Rect(i, j, matTempl.cols, matTempl.rows));
        bitwise_xor(matROI, matTempl, tmp);
        bitwise_and(tmp, matMask, tmp);
        // Score is the fraction of masked pixels that are identical
        data[i] = 1.0f - float(countNonZero(tmp) * areaInvert);
        if (data[i] > matchingDegree)
        {
            SRect rc;
            rc.left = i;
            rc.top = j;
            rc.right = i + imgTemplate.cols;
            rc.bottom = j + imgTemplate.rows;
            rcOuts.push_back(rc);  // assumed: maxIndex below indexes into this list
            if (data[i] > maxVal)
            {
                maxVal = data[i];
                maxIndex = rcOuts.size() - 1;
            }
            if (++matchCount == maxMatchs)
            {
                Log_Warn("Too many matches, stopped at: " << matchCount);
                return true;
            }
        }
    }
}
Update:
I managed to optimize the algorithm by using key points. Computing every point is costly; computing only several key points is much faster. See the picture: the cost drops greatly, to about 7 ms.
There is a technical formulation for template matching with mask in OpenCV Documentation, which works well. It can be used by calling cv::matchTemplate and its source code is also available under the Intel License.

Kinect for Windows v2 depth to color image misalignment

I am currently developing a tool for the Kinect for Windows v2 (similar to the one in the Xbox One). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB using OpenCV. However, I see that it duplicates my hand when doing the mapping, and I think this is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i=0; i < cDepthHeight; i++){
        for (int j=0; j < cDepthWidth; j++){
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z /100){
                double a = i * cDepthWidth + j;
                ColorSpacePoint colorPoint = m_pColorCoordinates[i*cDepthWidth+j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    // Copy the color pixel that this depth pixel maps to
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue of how to solve this? How to prevent this duplication?
Thanks in advance
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen, without a duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source in your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);
Array.Clear(_displayPixels, 0, _displayPixels.Length);
for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];
    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);
        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];
            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff; // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
I finally found some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then look at a possible answer.
We should start with how to get from a 3D point cloud, which has the depth camera as the origin of its coordinate system, to an image in the image plane of the RGB camera. For that, the pinhole camera model is enough:
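Written out in LaTeX, assuming the standard pinhole formulation (the symbol names s, f_x, f_y, c_x, c_y, r_ij and t_i are notation chosen here for illustration, not taken from the Kinect SDK):

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```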
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, also known as the intrinsics of the RGB camera. The following matrix is the rotation and translation of the extrinsics, or, better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, more than one point can project to the same pixel.
To put it in other words, and in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can map to the same pixel (usually, the larger the distance along the z axis between two neighboring points, the more easily this problem occurs).
Now, the big question: how can you avoid this? Well, I am not sure you can with the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z values will be quite similar and use those from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel. If a point is already mapped to that pixel, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is a rather naive way, and you can probably find better ones, since the problem is now clear :)
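A minimal sketch of that manual approach, in plain C++ and under stated assumptions: the points are taken to be already transformed into the RGB camera frame, and `Point3`, `projectWithZBuffer` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names, not part of the Kinect SDK. It only illustrates the "keep the closest point per pixel" idea.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point3 { double x, y, z; };  // point already expressed in the RGB camera frame

// Returns, for every RGB pixel, the index of the closest 3D point that projects
// onto it, or -1 if no point lands there. fx, fy, cx, cy are assumed intrinsics.
std::vector<int> projectWithZBuffer(const std::vector<Point3>& pts,
                                    int width, int height,
                                    double fx, double fy, double cx, double cy) {
    assert(width > 0 && height > 0);
    std::vector<int> owner(static_cast<std::size_t>(width) * height, -1);
    std::vector<double> zbuf(owner.size(), std::numeric_limits<double>::infinity());
    for (std::size_t k = 0; k < pts.size(); ++k) {
        const Point3& p = pts[k];
        if (p.z <= 0.0) continue;  // behind the camera or invalid depth
        const int u = static_cast<int>(std::lround(fx * p.x / p.z + cx));
        const int v = static_cast<int>(std::lround(fy * p.y / p.z + cy));
        if (u < 0 || u >= width || v < 0 || v >= height) continue;
        const std::size_t idx = static_cast<std::size_t>(v) * width + u;
        if (p.z < zbuf[idx]) {     // keep only the point nearest to the camera
            zbuf[idx] = p.z;
            owner[idx] = static_cast<int>(k);
        }
    }
    return owner;
}
```

With such a table, a depth pixel would only take the color of an RGB pixel it owns, so a background point hidden behind the hand would no longer pick up the hand's color.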
I hope it is clear enough.
I do not have a Kinect 2 at the moment, so I can't check whether there has been an update regarding this issue or whether the same thing still happens. I used the first released version (not the pre-release) of the SDK, so a lot of changes may have happened. If someone knows whether this was solved, just leave a comment :)

Recognizing an image from a list with OpenCV SIFT using the FLANN matching

The point of the application is to recognize an image from an already set list of images. The list of images have had their SIFT descriptors extracted and saved in files. Nothing interesting here:
std::vector<cv::KeyPoint> detectedKeypoints;
cv::Mat objectDescriptors;
// Extract data
cv::SIFT sift;
sift.detect(image, detectedKeypoints);
sift.compute(image, detectedKeypoints, objectDescriptors);
// Save the file
cv::FileStorage fs(file, cv::FileStorage::WRITE);
fs << "descriptors" << objectDescriptors;
fs << "keypoints" << detectedKeypoints;
Then the device takes a picture. SIFT descriptors are extracted in the same way. The idea now was to compare the descriptors to the ones from the files. I am doing that using the FLANN matcher from OpenCV. I am trying to quantify the similarity, image by image. After going through the whole list I should have the best match.
const cv::Ptr<cv::flann::IndexParams>& indexParams = new cv::flann::KDTreeIndexParams(1);
const cv::Ptr<cv::flann::SearchParams>& searchParams = new cv::flann::SearchParams(64);
// Match using Flann
cv::Mat indexMat;
cv::FlannBasedMatcher matcher(indexParams, searchParams);
std::vector< cv::DMatch > matches;
matcher.match(objectDescriptors, readDescriptors, matches);
After matching I understand that I get a list of the closest found distances between the feature vectors. I find the minimum distance and, using it I can count "good matches" and even get a list of the respective points:
// Count the number of matches where the distance is less than 2 * min_dist
int goodCount = 0;
for (int i = 0; i < objectDescriptors.rows; i++) {
    if (matches[i].distance < 2 * min_dist) {
        ++goodCount;
        // Save the points for the homography calculation
    }
}
I'm only showing the simple parts of the code to make this easier to follow; I know some of it doesn't need to be here.
Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out to mostly just point me to the image with the most descriptors. What I tried after this was computing the homography. The aim was to compute it and see whether it's a valid homography or not. The hope was that a good match, and only a good match, would have a homography that is a good transformation. Creating the homography was done simply using cv::findHomography on the obj and scene which are std::vector< cv::Point2f>. I checked the validity of the homography using some code I found online:
bool niceHomography(cv::Mat H)
{
    std::cout << H << std::endl;
    // Determinant of the upper-left 2x2 block; negative means the orientation is flipped
    const double det = H.at<double>(0, 0) * H.at<double>(1, 1) - H.at<double>(1, 0) * H.at<double>(0, 1);
    if (det < 0)
    {
        std::cout << "Homography: bad determinant" << std::endl;
        return false;
    }
    const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0) + H.at<double>(1, 0) * H.at<double>(1, 0));
    if (N1 > 4 || N1 < 0.1)
    {
        std::cout << "Homography: bad first column" << std::endl;
        return false;
    }
    const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1) + H.at<double>(1, 1) * H.at<double>(1, 1));
    if (N2 > 4 || N2 < 0.1)
    {
        std::cout << "Homography: bad second column" << std::endl;
        return false;
    }
    const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0) + H.at<double>(2, 1) * H.at<double>(2, 1));
    if (N3 > 0.002)
    {
        std::cout << "Homography: bad third row" << std::endl;
        return false;
    }
    return true;
}
I don't understand the math behind this so, while testing, I sometimes replaced this function with a simple check whether the determinant of the homography was positive. The problem is that I kept having issues here. The homographies were either all bad, or good when they shouldn't have been (when I was checking only the determinant).
I figured I should actually use the homography and for a number of points just compute their position in the destination image using their position in the source image. Then I would compare these average distances, and I would ideally get a very obvious smaller average distance in the case of the correct image. This did not work at all. All the distances were colossal. I thought I might have used the homography the other way around to calculate the right position, but switching obj and scene with each other gave similar results.
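For reference, a minimal sketch of the reprojection check described above, in plain C++ so that it does not depend on any OpenCV version. `Pt`, `applyHomography` and `reprojectionError` are hypothetical names, and the row-major 3x3 layout is an assumption. The one detail worth noting is the division by the third homogeneous coordinate; whether a missing divide is what caused the large distances here cannot be told from the post.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Applies a row-major 3x3 homography to a point, including the perspective divide.
Pt applyHomography(const std::array<double, 9>& H, Pt p) {
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    assert(std::abs(w) > 1e-12);  // otherwise the point maps to infinity
    return { (H[0] * p.x + H[1] * p.y + H[2]) / w,
             (H[3] * p.x + H[4] * p.y + H[5]) / w };
}

// Euclidean distance between the projected source point and its matched destination point.
double reprojectionError(const std::array<double, 9>& H, Pt src, Pt dst) {
    const Pt q = applyHomography(H, src);
    return std::hypot(q.x - dst.x, q.y - dst.y);
}
```

In OpenCV itself, `cv::perspectiveTransform` performs the same projection on a vector of points.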
Other things I tried were SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, getting the n smallest distances for every image instead of a number depending on the minimum distance, or getting distances depending on a global maximum distance. None of these approaches gave me definite good results, and I feel stuck now.
My only next strategy would be to sharpen the images or even turn them to binary images using some local threshold or some algorithms used for segmentation. I am looking for any suggestions or mistake anyone can see in my work.
I don't know whether this is relevant, but I added some of the images I am testing this on. Many times in the test images most of the SIFT vectors come from the frame (higher contrast) rather than from the painting. This is why I'm thinking sharpening the images might work, but I don't want to go deeper in case something I did previously is wrong.
The gallery of images is here with the descriptions in the titles. The images are of quite high resolution, please view in case it might give some hints.
You can try to test if when matching, the lines between the source image and the target image are relatively parallel. If it's not a correct match, then you'd have a lot of noise and the lines won't be parallel.
See the attached image which shows a correct match (using SURF and BF) - all the lines are mostly parallel (though I should point out that this is an easy example).
You are on the right track.
First, use the second-nearest-neighbour ratio test instead of your "good match by 2*min_dist" criterion.
Second, use the homography differently. When you find a homography, you get not only the matrix H but also the number of correspondences consistent with it. Check whether that is a reasonable number, say >= 15. If it is lower, then the object is not matched.
Third, if you have a big viewpoint change, SIFT and SURF are unable to match the images. Try MODS instead (there are Windows and Linux binaries, as well as a paper describing the algorithm) or ASIFT (much slower and matches much worse, but open source).
Or at least use an MSER or Hessian-Affine detector instead of SIFT (retaining SIFT as the descriptor).
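To illustrate the first point, here is a minimal sketch of the ratio test in plain C++. It assumes you already have, for each query descriptor, the distances to its nearest and second-nearest neighbours (in OpenCV these would come from `knnMatch` with k = 2). `TwoNearest` and `ratioTest` are hypothetical names, and the threshold is a tuning parameter; 0.75 in the usage below is only an assumed starting value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distances from one query descriptor to its nearest and second-nearest
// train descriptors (hypothetical struct; a 2-NN search would produce these).
struct TwoNearest { float best; float secondBest; };

// Returns the indices of queries whose best match is clearly better than the runner-up.
std::vector<std::size_t> ratioTest(const std::vector<TwoNearest>& knn, float ratio) {
    assert(ratio > 0.0f && ratio < 1.0f);
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < knn.size(); ++i) {
        if (knn[i].best < ratio * knn[i].secondBest) kept.push_back(i);
    }
    return kept;
}
```

Unlike the 2*min_dist rule, this criterion does not depend on a single global minimum, so one unusually close match cannot make every other match look bad. The surviving matches would then go to cv::findHomography for the consistency count described in the second point.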

What is the fastest way to access a pixel in QImage?

I would like to know what is the fastest way to modify a portion of a QImage.
I have this piece of code that has to be executed with a frequency of 30Hz. It displays an image through a sort of keyhole. It is not possible to see the entire image but only a portion inside a circle. The first for-loop erases the previous "keyhole portion displayed" and the second updates the position of the "displayed keyhole".
for (int i = (prev_y - r_y); i < (prev_y + r_y); i++){
    QRgb *line = (QRgb *)backgrd->scanLine(i);
    for(int j = (prev_x - r_x); j < (prev_x + r_x) ; j++){
        if((i >= 0 && i < this->backgrd->height()) && (j >= 0 && j < this->backgrd->width()))
            line[j] = qRgb(0,0,0);
    }
}
prev_x = new_x; prev_y = new_y;
for (int i = (new_y - r_y); i < (new_y + r_y); i++){
    QRgb *line = (QRgb *)backgrd->scanLine(i);
    QRgb *line2 = (QRgb *)this->picture->scanLine(i);
    for(int j = (new_x - r_x); j < (new_x + r_x) ; j++){
        if ((((new_x - j)*(new_x - j)/(r_x*r_x) + (new_y - i)*(new_y - i)/(r_y*r_y)) <= 1) && (i >= 0) && (i < this->picture->height())&& (j >= 0) && (j < this->picture->width()))
            line[j] = line2[j];
    }
}
this->current_img = this->backgrd;
this->update(); //Display QImage* this->current_img
If I analyse the timestamps of the program, I find a delay in the flow of execution every time this code is executed...
Is accessing a pixel in a QImage really that expensive? Am I doing something wrong?
Is there a better alternative to QImage for a Qt program?
How about prerendering your 'keyhole' in an array/qimage and doing a bitwise AND with the source?
Original pixel && black => black
Original pixel && white => original pixel
You have a lot of conditions in the innermost loop (some can be moved out though), but the circle radius calculation with the multiplies and divides looks costly. You can reuse the keyhole mask for every frame, so no calculations need be performed.
You could move some of the conditions at least to the outer loop, and maybe pre-compute some of the terms inside the conditions, though this may be optimized anyway.
Call update only for the rectangle(s) you modified
Where do you get the time stamp? Maybe you lose time somewhere else?
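A minimal sketch of the prerendered-mask idea, in plain C++ without Qt. It assumes 32-bit pixels, so the mask is all ones inside the ellipse and zero outside. `KeyholeMask`, `makeKeyholeMask` and `applyMaskRow` are hypothetical names; with a 32-bit QImage format the row pointers would presumably come from scanLine().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Precomputed elliptical mask of size (2*rx) x (2*ry): 0xFFFFFFFF inside, 0 outside.
struct KeyholeMask {
    int width, height;
    std::vector<std::uint32_t> bits;
};

// Built once; the per-pixel ellipse test is never repeated per frame.
KeyholeMask makeKeyholeMask(int rx, int ry) {
    assert(rx > 0 && ry > 0);
    KeyholeMask m{2 * rx, 2 * ry,
                  std::vector<std::uint32_t>(static_cast<std::size_t>(4) * rx * ry, 0u)};
    for (int y = 0; y < m.height; ++y) {
        for (int x = 0; x < m.width; ++x) {
            const double dx = (x - rx) / static_cast<double>(rx);
            const double dy = (y - ry) / static_cast<double>(ry);
            if (dx * dx + dy * dy <= 1.0)
                m.bits[static_cast<std::size_t>(y) * m.width + x] = 0xFFFFFFFFu;
        }
    }
    return m;
}

// ANDs one row of source pixels with the matching mask row and writes the result to dst.
void applyMaskRow(const std::uint32_t* src, const std::uint32_t* maskRow,
                  std::uint32_t* dst, int count) {
    for (int i = 0; i < count; ++i) dst[i] = src[i] & maskRow[i];
}
```

Per frame, only the rows of the bounding rectangle around the new position are processed, after clipping that rectangle to the image once outside the loops.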
Actually, I found that it wasn't the pixel access that was slow, but the rendering.
During my tests I used plain-color images, but these kinds of images are much faster to render than complex images loaded from a file. With further tests I realized it was the rendering that was slow.
The fastest way to render a QImage is first of all to transform it using
public: static QImage QGLWidget::convertToGLFormat(const QImage &img)
then the image can be manipulated quickly (it preserves the bits(), scanLine(), width() and height() functions)
and can be displayed very fast by OpenGL (no further conversions are necessary)
QPainter painter(this);
glDrawPixels(img.width(), img.height(), GL_RGBA, GL_UNSIGNED_BYTE, img.bits());
As far as I know, the fastest way to access the data of a QImage is to use QImage::bits(), which gives you direct access to the QImage's data.
For your problem, a better approach would be to do as Bgie suggested: use an array representing the keyhole and do only a bitwise AND operation.
It will also help to choose the correct format for your image: RGB32 and ARGB32_Premultiplied are the fastest. Don't use ARGB32 if you don't need it.