How to compute 2D log-chromaticity? - c++

My goal is to remove shadows from an image. I use C++ and OpenCV. I lack a strong math background, and not being a native English speaker makes everything harder to understand.
After reading about different approaches to removing shadows, I found a method that should work for me, but it relies on something called "2D chromaticity" and "2D log-chromaticity space", and even this term seems to be used inconsistently across sources. There are many papers on the topic; a few are listed here:
http://www.cs.cmu.edu/~efros/courses/LBMV09/Papers/finlayson-eccv-04.pdf
http://www2.cmp.uea.ac.uk/Research/compvis/Papers/DrewFinHor_ICCV03.pdf
http://www.cvc.uab.es/adas/publications/alvarez_2008.pdf
http://ivrgwww.epfl.ch/alumni/fredemba/papers/FFICPR06.pdf
I have torn Google to shreds searching for the right words and explanations. The best I found is "Illumination invariant image", which did not help me much.
I tried to reproduce the formulas log(G/R) and log(B/R) described on page 3 of the first paper, to get figures similar to Fig. 2b.
As input I used http://en.wikipedia.org/wiki/File:Gretag-Macbeth_ColorChecker.jpg
The output I get is:
My source code:
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
using namespace std;
using namespace cv;
int main( int argc, char** argv ) {
Mat src;
src = imread( argv[1], 1 );
if( !src.data )
{ return -1; }
Mat image( 600, 600, CV_8UC3, Scalar(127,127,127) );
int cn = src.channels();
uint8_t* pixelPtr = (uint8_t*)src.data;
for(int i=0 ; i< src.rows;i++) {
for(int j=0 ; j< src.cols;j++) {
Scalar_<uint8_t> bgrPixel;
bgrPixel.val[0] = pixelPtr[i*src.cols*cn + j*cn + 0]; // B
bgrPixel.val[1] = pixelPtr[i*src.cols*cn + j*cn + 1]; // G
bgrPixel.val[2] = pixelPtr[i*src.cols*cn + j*cn + 2]; // R
if(bgrPixel.val[2] !=0 ) { // avoid division by zero
float a= image.cols/2+50*(log((float)bgrPixel.val[0] / (float)bgrPixel.val[2])) ;
float b= image.rows/2+50*(log((float)bgrPixel.val[1] / (float)bgrPixel.val[2])) ;
if(!isinf(a) && !isinf(b))
image.at<Vec3b>(a,b)=Vec3b(255,2,3);
}
}
}
imshow("log-chroma", image );
imwrite("log-chroma.png", image );
waitKey(0);
}
What am I missing or misunderstanding?

From reading the paper Recovery of Chromaticity Image Free from Shadows via Illumination Invariance that you've posted, and your code, I guess the problem is that your coordinate axes (X/Y) are linear, while in the paper the coordinate system is log(R/G) by log(B/G).
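To illustrate that interpretation, here is a minimal sketch (not code from the paper) that plots log(R/G) against log(B/G) with G as the common divisor; the function name plotLogChroma and the 50-pixels-per-log-unit scale are arbitrary choices:

#include <opencv2/opencv.hpp>
#include <cmath>

cv::Mat plotLogChroma(const cv::Mat& src)   // src: 8-bit BGR image
{
    cv::Mat plot(600, 600, CV_8UC3, cv::Scalar(127, 127, 127));
    const float scale = 50.0f;              // pixels per log unit (arbitrary)

    for (int y = 0; y < src.rows; ++y)
    {
        const cv::Vec3b* row = src.ptr<cv::Vec3b>(y);
        for (int x = 0; x < src.cols; ++x)
        {
            float B = row[x][0], G = row[x][1], R = row[x][2];
            if (B <= 0.0f || G <= 0.0f || R <= 0.0f)
                continue;                   // log is undefined for zero channels

            float u = std::log(R / G);      // horizontal axis: log(R/G)
            float v = std::log(B / G);      // vertical axis:   log(B/G)

            int px = cvRound(plot.cols / 2 + scale * u);
            int py = cvRound(plot.rows / 2 - scale * v);   // flip so +v points up

            if (px >= 0 && px < plot.cols && py >= 0 && py < plot.rows)
                plot.at<cv::Vec3b>(py, px) = cv::Vec3b(255, 2, 3);
        }
    }
    return plot;
}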

This is the closest I can figure. Reading through this:
http://www2.cmp.uea.ac.uk/Research/compvis/Papers/DrewFinHor_ICCV03.pdf
I came across the sentence:
"Fig. 2(a) shows log-chromaticities for the 24 surfaces of a Macbeth ColorChecker Chart, (the six neutral patches all belong to the same
cluster). If we now vary the lighting and plot median values
for each patch, we see the curves in Fig. 2(b)."
If you look closely at the log-chromaticity plot, you see 19 blobs: one for each of the 18 colors in the Macbeth chart, plus one shared blob for all six grayscale targets in the bottom row:
Explanation of Log Chromaticities
With one picture, we can only get one point of each blob: we take the median value inside each target and plot it. To get the plot from the paper, we would have to create multiple images under different lighting. We might be able to do this by varying the color temperature of the image in an image editor.
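For reference, a minimal sketch of the per-patch median computation; the helper medianLogChroma and the idea of hand-picking each patch rectangle are my own assumptions, not code from the paper:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Median log-chromaticity (log(R/G), log(B/G)) inside one ColorChecker patch.
// The ROI rectangle is a placeholder; in practice you would click or
// hard-code the 24 patch locations for your particular photo.
cv::Point2f medianLogChroma(const cv::Mat& src, const cv::Rect& patch)
{
    std::vector<float> us, vs;
    cv::Mat roi = src(patch);
    for (int y = 0; y < roi.rows; ++y)
        for (int x = 0; x < roi.cols; ++x)
        {
            cv::Vec3b p = roi.at<cv::Vec3b>(y, x);
            if (p[0] == 0 || p[1] == 0 || p[2] == 0) continue;
            us.push_back(std::log((float)p[2] / p[1]));   // log(R/G)
            vs.push_back(std::log((float)p[0] / p[1]));   // log(B/G)
        }
    if (us.empty()) return cv::Point2f(0.f, 0.f);          // degenerate patch

    std::nth_element(us.begin(), us.begin() + us.size() / 2, us.end());
    std::nth_element(vs.begin(), vs.begin() + vs.size() / 2, vs.end());
    return cv::Point2f(us[us.size() / 2], vs[vs.size() / 2]);
}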
For now, I just looked at the color patches in the original image and plotted the points:
Input:
Color Patches Used
Output:
Log Chromaticity
The dots are not all in the same places as in the paper, but I figure it's fairly close. Would someone please check my work to see if this makes sense?

In that OpenCV code I got an "undefined identifier" error for the function isinf(), and I solved it by replacing it with _finite(). That might be an issue with the Visual Studio version.
if(!isinf(a) && !isinf(b)) ----> if(_finite(a) && _finite(b))
Include this header:
#include<float.h>
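On compilers with C++11 support, a portable alternative is std::isfinite from <cmath>, which rejects both NaN and infinity; a minimal sketch (the helper name plottable is just for illustration):

#include <cmath>

// Drop-in replacement for the isinf()/_finite() check in the loop above.
static bool plottable(float a, float b)
{
    return std::isfinite(a) && std::isfinite(b);
}
// Usage inside the loop:  if (plottable(a, b)) image.at<Vec3b>(a, b) = Vec3b(255, 2, 3);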

Related

Template Matching with Mask

I want to perform template matching with a mask. In general, template matching can be made faster by converting the image from the spatial domain into the frequency domain. But is there any method I can apply if I want to do the same with a mask? I'm using OpenCV with C++. Is there a matching function already in OpenCV for this task?
My current approach:
Bitwise XOR image A and image B with the mask.
Count the non-zero pixels.
Fill the result matrix with this count.
Search for maxima.
A few parameters I'm guessing at now are:
Skip the tile position if the matches are less than 25%.
Skip the tile position if the previous tile's matches are less than 50%.
My question: is there any algorithm to do this matching already? Is there any mathematical operation which can speed up this process?
With binary images, you can directly use Hu moments and the Mahalanobis distance to determine whether image A is similar to image B. If the distance tends to 0, the images are the same.
Of course you can also use feature detectors to see what matches, but for pictures like these, Hu moments and feature detectors will give approximately the same results, and Hu moments are more efficient.
Using findContours, you can extract the black regions inside the white star and fill them, so that image A = image B.
Another approach: use findContours on your mask and apply the result to image A (extracting the region of interest); you can then extract what's inside the star and count how many black pixels you have (the mismatching ones).
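If it helps, here is a rough sketch of the Hu-moment idea using cv::matchShapes (which compares the two shapes' Hu moments internally) instead of a hand-rolled Mahalanobis distance; it assumes binary CV_8U inputs and OpenCV 3-style constant names:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Returns a shape distance between the largest contour of each image;
// a value near 0 means the shapes are (nearly) the same.
double shapeDistance(const cv::Mat& imgA, const cv::Mat& imgB)
{
    std::vector<std::vector<cv::Point>> ca, cb;
    cv::findContours(imgA.clone(), ca, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    cv::findContours(imgB.clone(), cb, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (ca.empty() || cb.empty())
        return -1.0;                         // no shape found

    auto largest = [](const std::vector<std::vector<cv::Point>>& cs) {
        return *std::max_element(cs.begin(), cs.end(),
            [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b)
            { return cv::contourArea(a) < cv::contourArea(b); });
    };
    return cv::matchShapes(largest(ca), largest(cb), cv::CONTOURS_MATCH_I1, 0);
}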
I have the same requirement, and I have tried almost the same approach. As in the image, I want to match the castle. The castle has a different shield image, a variable-length clan name, and a grass background (the image comes from the game Clash of Clans). The normal OpenCV matchTemplate does not work, so I wrote my own.
I follow the approach of matchTemplate to create a result image, but with a different algorithm.
The core idea is to count the matched pixels under the mask. The code is below; it is simple.
This works fine, but the time cost is high. As you can see, it costs 457 ms.
Now I am working on the optimization.
The source and template images are both CV_8UC3, the mask image is CV_8U. Matching one channel is OK; it is faster, but still costly.
Mat tmp(matTempl.cols, matTempl.rows, matTempl.type());
int matchCount = 0;
float maxVal = 0;
double areaInvert = 1.0 / countNonZero(matMask);

for (int j = 0; j < resultRows; j++)
{
    float* data = imgResult.ptr<float>(j);
    for (int i = 0; i < resultCols; i++)
    {
        Mat matROI(matSource, Rect(i, j, matTempl.cols, matTempl.rows));
        tmp.setTo(Scalar(0));
        bitwise_xor(matROI, matTempl, tmp);
        bitwise_and(tmp, matMask, tmp);
        data[i] = 1.0f - float(countNonZero(tmp) * areaInvert);
        if (data[i] > matchingDegree)
        {
            SRect rc;
            rc.left = i;
            rc.top = j;
            rc.right = i + imgTemplate.cols;
            rc.bottom = j + imgTemplate.rows;
            rcOuts.push_back(rc);
            if (data[i] > maxVal)
            {
                maxVal = data[i];
                maxIndex = rcOuts.size() - 1;
            }
            if (++matchCount == maxMatchs)
            {
                Log_Warn("Too many matches, stopped at: " << matchCount);
                return true;
            }
        }
    }
}
It says I don't have enough reputation to post images, so here is the link:
http://i.stack.imgur.com/mJrqU.png
Update:
I succeeded in optimizing the algorithm by using key points. Calculating every position is costly, but calculating only several key points is much faster. See the picture; the cost decreases greatly, to about 7 ms now.
I still cannot post images; please visit: http://i.stack.imgur.com/ePcD9.png
There is a formulation for template matching with a mask in the OpenCV documentation, and it works well. It can be used by calling cv::matchTemplate, and its source code is also available under the Intel license.
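For completeness, a minimal usage sketch; the mask argument of cv::matchTemplate is supported for TM_SQDIFF and TM_CCORR_NORMED in recent OpenCV versions (3.x/4.x), and the file names are placeholders:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("scene.png");
    cv::Mat templ = cv::imread("castle.png");
    cv::Mat mask  = cv::imread("castle_mask.png", cv::IMREAD_GRAYSCALE);
    if (image.empty() || templ.empty() || mask.empty()) return -1;

    // Masked template matching with the built-in function.
    cv::Mat result;
    cv::matchTemplate(image, templ, result, cv::TM_CCORR_NORMED, mask);

    // Best match location for a correlation-based method is the maximum.
    double maxVal; cv::Point maxLoc;
    cv::minMaxLoc(result, nullptr, &maxVal, nullptr, &maxLoc);
    cv::rectangle(image, maxLoc,
                  maxLoc + cv::Point(templ.cols, templ.rows),
                  cv::Scalar(0, 255, 0), 2);
    cv::imwrite("match.png", image);
    return 0;
}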

Recognizing an image from a list with OpenCV SIFT using the FLANN matching

The point of the application is to recognize an image from a predefined list of images. The images in the list have had their SIFT descriptors extracted and saved in files. Nothing interesting here:
std::vector<cv::KeyPoint> detectedKeypoints;
cv::Mat objectDescriptors;
// Extract data
cv::SIFT sift;
sift.detect(image, detectedKeypoints);
sift.compute(image, detectedKeypoints, objectDescriptors);
// Save the file
cv::FileStorage fs(file, cv::FileStorage::WRITE);
fs << "descriptors" << objectDescriptors;
fs << "keypoints" << detectedKeypoints;
fs.release();
Then the device takes a picture. SIFT descriptors are extracted in the same way. The idea now was to compare the descriptors to the ones from the files. I am doing that using the FLANN matcher from OpenCV. I am trying to quantify the similarity, image by image. After going through the whole list I should have the best match.
const cv::Ptr<cv::flann::IndexParams>& indexParams = new cv::flann::KDTreeIndexParams(1);
const cv::Ptr<cv::flann::SearchParams>& searchParams = new cv::flann::SearchParams(64);
// Match using Flann
cv::Mat indexMat;
cv::FlannBasedMatcher matcher(indexParams, searchParams);
std::vector< cv::DMatch > matches;
matcher.match(objectDescriptors, readDescriptors, matches);
After matching, I understand that I get a list of the closest distances found between the feature vectors. I find the minimum distance and, using it, I can count "good matches" and even get a list of the respective points:
// Count the number of matches where the distance is less than 2 * min_dist
int goodCount = 0;
for (int i = 0; i < objectDescriptors.rows; i++)
{
    if (matches[i].distance < 2 * min_dist)
    {
        ++goodCount;
        // Save the points for the homography calculation
        obj.push_back(detectedKeypoints[matches[i].queryIdx].pt);
        scene.push_back(readKeypoints[matches[i].trainIdx].pt);
    }
}
I'm showing the easy parts of the code just to make this easier to follow; I know some of it doesn't need to be here.
Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out mostly to just point me to the image with the most descriptors. What I tried after this was computing the homography. The aim was to compute it and see whether it's a valid homography or not. The hope was that a good match, and only a good match, would have a homography that is a good transformation. Creating the homography was done simply using cv::findHomography on obj and scene, which are std::vector<cv::Point2f>. I checked the validity of the homography using some code I found online:
bool niceHomography(cv::Mat H)
{
    std::cout << H << std::endl;
    const double det = H.at<double>(0, 0) * H.at<double>(1, 1) - H.at<double>(1, 0) * H.at<double>(0, 1);
    if (det < 0)
    {
        std::cout << "Homography: bad determinant" << std::endl;
        return false;
    }
    const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0) + H.at<double>(1, 0) * H.at<double>(1, 0));
    if (N1 > 4 || N1 < 0.1)
    {
        std::cout << "Homography: bad first column" << std::endl;
        return false;
    }
    const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1) + H.at<double>(1, 1) * H.at<double>(1, 1));
    if (N2 > 4 || N2 < 0.1)
    {
        std::cout << "Homography: bad second column" << std::endl;
        return false;
    }
    const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0) + H.at<double>(2, 1) * H.at<double>(2, 1));
    if (N3 > 0.002)
    {
        std::cout << "Homography: bad third row" << std::endl;
        return false;
    }
    return true;
}
I don't understand the math behind this so, while testing, I sometimes replaced this function with a simple check whether the determinant of the homography was positive. The problem is that I kept having issues here. The homographies were either all bad, or good when they shouldn't have been (when I was checking only the determinant).
I figured I should actually use the homography and for a number of points just compute their position in the destination image using their position in the source image. Then I would compare these average distances, and I would ideally get a very obvious smaller average distance in the case of the correct image. This did not work at all. All the distances were colossal. I thought I might have used the homography the other way around to calculate the right position, but switching obj and scene with each other gave similar results.
Other things I tried were SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, getting the n smallest distances for every image instead of a number depending on the minimum distance, or getting distances depending on a global maximum distance. None of these approaches gave me definite good results, and I feel stuck now.
My only next strategy would be to sharpen the images or even turn them to binary images using some local threshold or some algorithms used for segmentation. I am looking for any suggestions or mistake anyone can see in my work.
I don't know whether this is relevant, but I added some of the images I am testing this on. Many times in the test images most of the SIFT vectors come from the frame (higher contrast) rather than from the painting. This is why I'm thinking sharpening the images might work, but I don't want to go deeper in case something I did previously is wrong.
The gallery of images is here with the descriptions in the titles. The images are of quite high resolution, please view in case it might give some hints.
You can test whether, when matching, the lines drawn between the source image and the target image are roughly parallel. If it's not a correct match, you'll get a lot of noise and the lines won't be parallel.
See the attached image which shows a correct match (using SURF and BF) - all the lines are mostly parallel (though I should point out that this is an easy example).
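A rough sketch of how that heuristic could be quantified (this is my own illustration, not the answer's code): lay the images side by side as cv::drawMatches does, compute the angle of each match line, and look at the spread; note that atan2 wrap-around near ±π makes this only a rough measure:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Standard deviation (in radians) of the match-line angles when the scene
// image is drawn to the right of the object image (offset by objWidth).
double matchAngleSpread(const std::vector<cv::KeyPoint>& objKp,
                        const std::vector<cv::KeyPoint>& sceneKp,
                        const std::vector<cv::DMatch>& matches,
                        int objWidth)
{
    std::vector<double> angles;
    for (const cv::DMatch& m : matches)
    {
        cv::Point2f a = objKp[m.queryIdx].pt;
        cv::Point2f b = sceneKp[m.trainIdx].pt + cv::Point2f((float)objWidth, 0.f);
        angles.push_back(std::atan2(b.y - a.y, b.x - a.x));
    }
    if (angles.size() < 2) return 0.0;

    double mean = 0.0;
    for (double a : angles) mean += a;
    mean /= angles.size();

    double var = 0.0;
    for (double a : angles) var += (a - mean) * (a - mean);
    return std::sqrt(var / angles.size());   // small value = mostly parallel lines
}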
You are going the correct way.
First, use the second-nearest-neighbour ratio test instead of your "good match if distance < 2 * min_dist" rule: https://stackoverflow.com/a/23019889/1983544.
Second, use the homography the other way. When you find a homography, you get not only the H matrix but also the number of correspondences consistent with it. Check that it is a reasonable number, say >= 15. If it is less, the object is not matched. (A sketch of these two points follows this list.)
Third, if you have a big viewpoint change, SIFT or SURF are unable to match the images. Try MODS instead (http://cmp.felk.cvut.cz/wbs/ has Windows and Linux binaries, as well as the paper describing the algorithm) or ASIFT (much slower and matches much worse, but open source): http://www.ipol.im/pub/art/2011/my-asift/
Or at least use an MSER or Hessian-Affine detector instead of SIFT (retaining SIFT as the descriptor).
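A minimal sketch of the first two points, assuming OpenCV 3-style names; the 0.8 ratio and the 15-inlier cut-off are the values suggested above, not universal constants:

#include <opencv2/opencv.hpp>
#include <vector>

bool objectMatched(const cv::Mat& objDesc, const cv::Mat& sceneDesc,
                   const std::vector<cv::KeyPoint>& objKp,
                   const std::vector<cv::KeyPoint>& sceneKp)
{
    // Lowe's ratio test: keep a match only if it is clearly better than
    // the second-nearest neighbour.
    cv::FlannBasedMatcher matcher;
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(objDesc, sceneDesc, knn, 2);

    std::vector<cv::Point2f> obj, scene;
    for (const auto& m : knn)
        if (m.size() == 2 && m[0].distance < 0.8f * m[1].distance)
        {
            obj.push_back(objKp[m[0].queryIdx].pt);
            scene.push_back(sceneKp[m[0].trainIdx].pt);
        }
    if (obj.size() < 4) return false;        // findHomography needs >= 4 points

    // RANSAC homography; the mask marks which correspondences are inliers.
    cv::Mat inlierMask;
    cv::Mat H = cv::findHomography(obj, scene, cv::RANSAC, 3.0, inlierMask);
    return !H.empty() && cv::countNonZero(inlierMask) >= 15;
}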

Motion detection using OpenCV/C++, threshold always become zero

I am working on C++ crowd-detection code with OpenCV that takes two frames and subtracts them, then compares the result with a threshold.
This is the first time I have dealt with OpenCV in C++ and I don't know much about it.
These are the steps of the code:
Take two video frames with α minutes in between.
Convert frames to black and white images.
Subtract the two frames.
Compare the difference with the threshold.
If the difference <= threshold, then a crowd is detected; otherwise there is no crowd.
C++ code:
#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int main (int argc, const char * argv[])
{
    // First frame.
    Mat current_frame = imread("image1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    // Second frame.
    Mat previous_frame = imread("image2.jpg", CV_LOAD_IMAGE_GRAYSCALE);

    // Subtract the previous frame from the current frame and store the result.
    Mat result = current_frame - previous_frame;

    // Compare the difference with the threshold: if the difference < 70 -> there is a crowd,
    // if it is > 70 there is no crowd.
    int threshold = cv::threshold(result, result, 0, 70, CV_THRESH_BINARY);
    if (threshold == 0) {
        cout << "crowd detected \n";
    }
    else {
        cout << " no crowd detected \n ";
    }
}
The problem is:
The threshold is always zero!
and the output is always:
crowd detected
even if there is no crowd.
We don't care about the output image because we won't use it; we just want to know the returned value of threshold.
My aim is to know how much difference there is between the two frames. I want to compare the difference with a threshold to detect a human crowd in a specific place.
I hope that one of you can help me
Thank you
There are a couple of flaws in your usage of the threshold function:
it just returns the threshold value (not what you expected);
'We don't care about the output image' - well, you'd better!
if you want to threshold against a value of 70, that should be your 3rd arg, not the 4th (the 4th arg is the value anything > thresh is set to).
What you probably wanted is:
cv::threshold(result,result,70,1, CV_THRESH_BINARY); // same as : result = result>70;
int on_pixels = countNonZero(result);
// or:
int on_pixels = sum(result)[0];
[Sidenote: if you really want to detect human crowds, you'll have to put much more sweat into this. Just diffing frames is prone to error with illumination changes; there are also birds, cars and traffic lights.]
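A self-contained sketch of that suggestion, deciding on the fraction of changed pixels; the threshold of 70 comes from the question, while the 20% fraction is an arbitrary placeholder that would need tuning:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat current  = cv::imread("image1.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat previous = cv::imread("image2.jpg", cv::IMREAD_GRAYSCALE);
    if (current.empty() || previous.empty()) return -1;

    // Absolute frame difference, then keep only strongly changed pixels.
    cv::Mat diff, changed;
    cv::absdiff(current, previous, diff);
    cv::threshold(diff, changed, 70, 255, cv::THRESH_BINARY);

    // Decide on the fraction of pixels that changed between the frames.
    double fraction = (double)cv::countNonZero(changed) / changed.total();
    std::cout << (fraction > 0.20 ? "crowd detected" : "no crowd detected")
              << " (changed fraction = " << fraction << ")\n";
    return 0;
}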

Pre-processing before digit recognition with KNN classifier

Right now I'm trying to create a digit recognition system using OpenCV. There are many articles and examples on the web (and even on Stack Overflow). I decided to use a KNN classifier because this solution seems to be the most popular. I found a database of handwritten digits with a training set of 60k examples and an error rate of less than 5%.
I used this tutorial as an example of how to work with this database using OpenCV. I'm using exactly the same technique, and on the test data (t10k-images.idx3-ubyte) I get a 4% error rate. But when I try to classify my own digits I get a much bigger error. For example:
is recognized as 7
and are recognized as 5
and are recognized as 1
is recognized as 8
And so on (I can upload all images if it's needed).
As you can see, all the digits are good quality and easily recognizable for a human.
So I decided to do some pre-processing before classifying. From the table on the MNIST database site I found that people use deskewing, noise removal, blurring, and pixel-shift techniques. Unfortunately, almost all the links to the articles are broken, so I decided to do such pre-processing myself, because I already know how to do it.
Right now, my algorithm is the following:
Erode the image (I think my original digits are too rough).
Remove small contours.
Threshold and blur image.
Center digit (instead of shifting).
I think deskewing is not needed in my situation because all the digits are normally oriented, and I have no idea how to find the right rotation angle anyway.
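For reference, the classic MNIST-style deskew does not search for an angle at all: it estimates the skew from the image moments mu11/mu02 and undoes it with an affine shear. A minimal sketch (based on the approach in OpenCV's digits sample, not part of the pipeline above); it assumes a grayscale digit on a dark background:

#include <opencv2/opencv.hpp>
#include <cmath>

cv::Mat deskew(const cv::Mat& digit)
{
    cv::Moments m = cv::moments(digit);
    if (std::abs(m.mu02) < 1e-2)
        return digit.clone();                // almost no vertical spread, nothing to do

    double skew = m.mu11 / m.mu02;           // shear estimate from central moments
    cv::Mat M = (cv::Mat_<double>(2, 3) <<
                 1, skew, -0.5 * digit.rows * skew,
                 0, 1,    0);

    cv::Mat out;
    cv::warpAffine(digit, out, M, digit.size(),
                   cv::WARP_INVERSE_MAP | cv::INTER_LINEAR);
    return out;
}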
So after this I've got these images:
is also 1
is 3 (not 5 as it used to be)
is 5 (not 8)
is 7 (profit!)
So, such pre-processing helped me a bit, but I need better results, because in my opinion such digits should be recognized without problems.
Can anyone give me any advice with pre-processing? Thanks for any help.
P.S. I can upload my source (c++) code.
I realized my mistake - it wasn't connected with pre-processing at all (thanks to #DavidBrown and #John). I used a dataset of handwritten digits instead of printed ones. I didn't find such a database on the web, so I decided to create it myself. I have uploaded my database to Google Drive.
And here's how you can use it (train and classify):
int digitSize = 16;

// Returns the list of files in a specific directory.
static vector<string> getListFiles(const string& dirPath)
{
    vector<string> result;
    DIR *dir;
    struct dirent *ent;
    if ((dir = opendir(dirPath.c_str())) != NULL)
    {
        while ((ent = readdir(dir)) != NULL)
        {
            if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            {
                result.push_back(ent->d_name);
            }
        }
        closedir(dir);
    }
    return result;
}

void DigitClassifier::train(const string& imagesPath)
{
    int num = 510;
    int size = digitSize * digitSize;
    Mat trainData = Mat(Size(size, num), CV_32FC1);
    Mat responces = Mat(Size(1, num), CV_32FC1);

    int counter = 0;
    for (int i = 1; i <= 9; i++)
    {
        char digit[4];                       // room for "9/" plus the terminating null
        sprintf(digit, "%d/", i);
        string digitPath(digit);
        digitPath = imagesPath + digitPath;
        vector<string> images = getListFiles(digitPath);
        for (int j = 0; j < images.size(); j++)
        {
            Mat mat = imread(digitPath + images[j], 0);
            resize(mat, mat, Size(digitSize, digitSize));
            mat.convertTo(mat, CV_32FC1);
            mat = mat.reshape(1, 1);
            for (int k = 0; k < size; k++)
            {
                trainData.at<float>(counter * size + k) = mat.at<float>(k);
            }
            responces.at<float>(counter) = i;
            counter++;
        }
    }
    knn.train(trainData, responces);
}

int DigitClassifier::classify(const Mat& img) const
{
    Mat tmp = img.clone();
    resize(tmp, tmp, Size(digitSize, digitSize));
    tmp.convertTo(tmp, CV_32FC1);
    return knn.find_nearest(tmp.reshape(1, 1), 5);
}
5 & 6 , 1 & 7, 9 & 8 are recognized as the same because central points of classes are too similar. What about this ?
Apply connected component labeling method to digits for getting real boundaries of digits and crop images over these boundaries. So, you will work on more correct area and central points are normalized.
Then divide digits into two parts as horizontally. (For example you will have two circles after dividing "8")
As a result, "9" and "8" are more recognizable as well as "5" and "6". Upper parts will be same but lower parts are different.
I cannot give you a better answer than your own, but I would like to contribute some advice. You could improve your digit recognition system in the following way:
Apply a skeletonization process to the black-and-white patches.
After that, apply a distance transform.
This way you can improve the results of the classifier when the digits are not exactly centered or are not exactly the same, morphologically speaking.
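A small sketch of the distance-transform step; skeletonization itself is not in core OpenCV (cv::ximgproc::thinning from opencv_contrib is one option), so only the distance transform on a binary digit is shown, and the normalization is my own addition:

#include <opencv2/opencv.hpp>

// Distance of each stroke pixel to the background, as a CV_32F map.
// Input is assumed to be a binary digit, white strokes on black.
cv::Mat strokeDistance(const cv::Mat& binaryDigit)
{
    cv::Mat dist;
    cv::distanceTransform(binaryDigit, dist, cv::DIST_L2, 3);
    // Normalize to [0, 1] so the feature is comparable between samples.
    cv::normalize(dist, dist, 0.0, 1.0, cv::NORM_MINMAX);
    return dist;
}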

How to randomly choose sample points that maximize space occupation?

I would like to generate sample points that randomly fill/cover a space (like in the attached image). I think there is a method called "quasi-random" that can generate such sample points, but it's a bit beyond my knowledge. Can someone make suggestions or help me find a library that can do this? Or suggest how to start writing such a program?
In the image, 256 sample points are placed at random positions over the given space so as to cover the whole of it.
Update:
I just tried some code from the Halton Quasi-random Sequence library and compared it with the pseudo-random result posted by a friend below. The result of Halton's method is better in my opinion. I would like to share some results below.
The code I wrote is:
#include "halton.hpp"
#include "opencv2/opencv.hpp"
int main()
{
int m_dim_num = 2;
int m_n = 50;
int m_seed[2], m_leap[2], m_base[2];
double m_r[100];
for (int i = 0; i < m_dim_num; i++)
{
m_seed[i] = 0;
m_leap[i] = 1;
m_base[i] = 2+i;
}
cv::Mat out(100, 100, CV_8UC1);
i4_to_halton_sequence( m_dim_num, m_n, 0, m_seed, m_leap, m_base, m_r);
int displaced = 100;
for (int i = 0; i < 100; i=i+2)
{
cv::circle(out, cv::Point2d((m_r[i])*displaced, (m_r[i+1])*displaced), 1, cv::Scalar(0, 255, 0), 1, 8, 0);
}
cv::imshow("test", out);
cv::waitKey(0);
return 0;
}
As I am a little familiar with OpenCV, I wrote this code to plot onto an OpenCV matrix (Mat). "i4_to_halton_sequence()" is the function from the library I mentioned above.
The result is not perfect, but it might be usable somehow for my work. Does anyone have another idea?
I am going to give an answer that will seem half-assed. However, this topic has been studied extensively in the literature, so I will just refer you to some summaries from Wikipedia and other places online.
What you want is also called low-discrepancy sequence (or quasi-random, as you pointed out). You can read more about it here: http://en.wikipedia.org/wiki/Low-discrepancy_sequence. It's useful for a number of things, which includes numerical integration and, more recently, simulating retinal ganglion mosaic.
There are many ways to generate low-discrepancy sequences (or pseudo quasi random sequences :p). Some of these are in ACM Collected Algorithms (http://www.netlib.org/toms/index.html).
The most common of these, I think, is the Sobol sequence (algorithm 659 in the ACM collection). You can get some details on it here: http://en.wikipedia.org/wiki/Sobol_sequence
For the most part, unless you are really into it, that stuff looks pretty scary. For quick results, I would use GNU's GSL (GNU Scientific Library): http://www.gnu.org/software/gsl/
This library includes code to generate quasi-random sequences (http://www.gnu.org/software/gsl/manual/html_node/Quasi_002dRandom-Sequences.html) including Sobol sequence (http://www.gnu.org/software/gsl/manual/html_node/Quasi_002drandom-number-generator-examples.html).
If you're still stuck, I can paste some code here, but you're better off digging into GSL.
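For what it's worth, a minimal GSL example (compile with something like g++ sobol.cpp -lgsl -lgslcblas); it prints 256 points of a 2-D Sobol sequence in [0,1):

#include <cstdio>
#include <gsl/gsl_qrng.h>

int main()
{
    gsl_qrng* q = gsl_qrng_alloc(gsl_qrng_sobol, 2);   // 2-dimensional sequence
    double v[2];
    for (int i = 0; i < 256; i++)
    {
        gsl_qrng_get(q, v);                 // v[0], v[1] are in [0, 1)
        std::printf("%.5f %.5f\n", v[0], v[1]);
    }
    gsl_qrng_free(q);
    return 0;
}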
Well, here's another way to do quasi-random sampling that covers the entire space.
Since you have 256 points to use, you can start by laying those points out as a 16x16 grid.
Then apply some function that gives a small random offset to each point (say 0 to ±2 on each point's x and y coordinates).
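A small sketch of that jittered-grid idea; the cell size and jitter range are placeholders you would adapt to your space:

#include <cstdlib>
#include <vector>

struct Pt { int x, y; };

// 'cells' x 'cells' grid points, each nudged by up to +/- 'jitter' pixels.
std::vector<Pt> jitteredGrid(int cells, int cellSize, int jitter)
{
    std::vector<Pt> pts;
    for (int gy = 0; gy < cells; ++gy)
        for (int gx = 0; gx < cells; ++gx)
        {
            int dx = rand() % (2 * jitter + 1) - jitter;   // -jitter .. +jitter
            int dy = rand() % (2 * jitter + 1) - jitter;
            pts.push_back({gx * cellSize + cellSize / 2 + dx,
                           gy * cellSize + cellSize / 2 + dy});
        }
    return pts;   // e.g. jitteredGrid(16, 6, 2) gives 256 points in roughly a 96x96 area
}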
You could create equidistant points (all points have the same distance to their neighbors) and then, in a second step, move each point randomly a bit so that they appear 'random'.
The second idea I have is (see the sketch after this list):
1. Start with one area.
2. Create a random point P near the 'middle' of your area.
3. Divide the area into 4 areas at that point. P is the upper-right corner of the lower-left subarea, the upper-left corner of the lower-right subarea, and so on.
4. Repeat steps 2-4 for all 4 subareas. Of course, not forever, but until you're satisfied.
This algorithm ensures that each 'hole' (i.e. each new subarea) is filled with a point.
Update: your initial area should be twice as large as your target area, because of step 2. This ensures you also get points at the edges and corners.
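A rough sketch of that recursive idea (my own reading of it); the jitter of up to a quarter of the side around the centre and the fixed recursion depth are arbitrary choices:

#include <cstdlib>
#include <vector>

struct PointF { double x, y; };

// Drop a point near the middle of the region, split the region into four
// at that point, and recurse to a fixed depth.
void subdivide(double x0, double y0, double x1, double y1,
               int depth, std::vector<PointF>& out)
{
    if (depth == 0) return;
    double w = x1 - x0, h = y1 - y0;

    // A point around the centre, offset by up to a quarter of the side length.
    double px = x0 + w * (0.5 + ((double)rand() / RAND_MAX - 0.5) * 0.5);
    double py = y0 + h * (0.5 + ((double)rand() / RAND_MAX - 0.5) * 0.5);
    out.push_back({px, py});

    subdivide(x0, y0, px, py, depth - 1, out);   // lower-left
    subdivide(px, y0, x1, py, depth - 1, out);   // lower-right
    subdivide(x0, py, px, y1, depth - 1, out);   // upper-left
    subdivide(px, py, x1, y1, depth - 1, out);   // upper-right
}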
This is called a "low discrepancy sequence". The linked Wikipage explains how you can generate them.
But I suspect you already knew this, as your image is very similar to the 2,3 Halton sequence example from Wikipedia
You just need the library rand() function:
#include <stdlib.h>
#include <time.h>

unsigned int N = 256;   // number of points
int RANGE_X = 100;      // x range to put sample points in
int RANGE_Y = 100;

void PutSamplePoint(int x, int y)
{
    // some code of yours putting a sample point on the field
}

int main()
{
    srand((unsigned)time(0)); // initialize random generator - uses current time as seed
    for (unsigned int i = 0; i < N; i++)
    {
        int x = rand() % RANGE_X; // returns random value in range [0, RANGE_X)
        int y = rand() % RANGE_Y;
        PutSamplePoint(x, y);
    }
    return 0;
}