OpenCV C++ Bag Of Words - c++

Everywhere online, you can find little tutorials on different segments of a BOW, but (from I've found anyway) nothing on what you do after:
bowDE.setVocabulary(dictionary);
...
bowDE.compute(image, keypoints, descriptors);
Once you've used the BOWImgDescriptorExtractor to compute, what do you then do?
How do you find out what is a good match, and what is not?
And then can you then utilize that information?
If so, how?

If you have both descriptor and extractor, you can use a matcher to find matches.
Here is a sample function:
void drawMatches(const Mat& Img1,const Mat& Img2,const vector<KeyPoint>& Keypoints1,
const vector<KeyPoint>& Keypoints2,const Mat& Descriptors1,const Mat& Descriptors2)
{
Ptr<DescriptorMatcher> descriptorMatcher = DescriptorMatcher::create( "BruteForce" ); //
vector<DMatch> matches;
descriptorMatcher->match( Descriptors1, Descriptors2, matches );
Mat matchImg;
drawMatches(Img1,Keypoints1,Img2,Keypoints2,matches,matchImg,Scalar::all(-1),CV_RGB(255,255,255),Mat(),4);
imshow("match",show);
}
Once you get those matches, you can determine which matches are "good" by inspecting their max distance, average distance, total match size and so on.
There is also an official tutorial about how to use those descriptors and keypoints to get matches
Features2D + Homography to find a known object
Although it uses a different feature detector from yours, you can still use the matching part of the article.
Update:
There is no way to make an accurate answer to whether a match is a "correct" match. But you can get the values of the matching pairs.
Here is an example of "wrong" matches and "right" matches, using SIFT feature detector and BruteForce matcher.
Part of the code:
size_t matches_size = matches.size();
for( unsigned i = 0; i < matches_size; i++ )
{
if( matches[i].distance < MY_GOOD_DISTANCE)//You can get the matching distance like this.
{
good_matches.push_back( matches[i]);
}
}
This is a right match.
After computing the matches, I listed the distance of the matches:
27.7669 43.715 45.2217 47.4552 53.1601 54.074 57.3672 58.2924 59.0593 63.3009
63.6475 64.1093 64.8922 67.0075 70.9718 73.4507 74.0878 76.6225 76.6551 80.075
81.2219 82.2192 83.6959 89.2412 90.7855 91.4604 95.3363 95.352 95.6033 98.209
98.3362 98.3412 99.4082 101.035 104.024 109.567 110.095 110.345 112.858 118.339
119.311 123.976 125.948 126.625 128.02 128.269 130.219 133.015 135.739 138.43
144.499 146.055 146.492 147.054 152.925 160.044 161.165 168.899 170.871 179.881
183.39 183.573 187.061 192.764 192.961 194.268 194.44 196.489 202.255 204.854
230.643 230.92 231.961 233.238 235.253 236.023 244.225 246.337 253.829 260.384
261.383 263.934 266.933 269.232 272.586 273.651 283.891 289.261 291.805 297.165
297.22 297.627 304.132 307.633 307.695 314.798 325.294 334.74 335.272 344.17
352.095 353.456 354.144 357.398 363.762 366.344 367.301 368.977 371.102 371.44
371.863 372.459 372.85 373.17 376.082 378.844 382.372 389.01 389.704 397.028
398.236 400.53 414.523 417.628 422.61 430.731 461.3
Min value: 27.76
Max value: 461.3
Average: 210.2526882
And here is a wrong match:
336.161 437.132 310.587 376.245 368.683 449.708 334.148 354.79 333.981 399.794 368.889
361.653 341.778 266.443 259.365 338.726 352.789 381.097 427.143 350.732 355.522 349.819
358.569 373.139 348.201 341.923 383.188 378.233 399.844 294.16 505.107 347.978 314.021
332.983 335.364 403.217 385.8 408.859 381.472 372.078 434.167 436.489 279.646 253.271
268.522 376.303 418.071 373.3 369.004 272.145 254.448 408.185 326.351 351.886 333.981
371.59 440.336 230.558 250.928 337.368 288.579 262.107 409.971 339.391 380.58 374.162
361.96 392.59 345.936 328.691 383.586 398.986 336.283 365.768 492.984 392.379 377.042
371.652 279.014 370.849 378.213 351.048 311.148 319.168 324.268 319.191 261.555 339.257
298.572 241.622 406.977 286.068 438.586
Min value: 230
Max value: 505
Average: 352.6009711
After you get the distance of all matches, you can easily see what is a "good" match and what is a "bad" match.
Here's the scoring part. A little bit tricky, highly related with data.
MY_AVG_DISTANCE, MY_LEAST_DISTANCE, MY_MAX_DISTANCE and MY_GOOD_DISTANCE are values you should carefully select. Check your own matching distances, and select some value for them.
int good_size = good_matches.size() > 30 ? 30 : good_matches.size(); //In case there are too many "good matches"
//...
//===========SCORE ============
double avg = 0; //Calculates the average of some of the matches.
int avgCount = 0;
int goodCount = 0 ;
for( unsigned i = 0; i < matches.size(); i++ )
{
double dist = matches[i].distance;
if( dist < MY_AVG_DISTANCE && dist > MY_LEAST_DISTANCE )
{
avg += dist;
avgCount++;
}
if(dist < MY_GOOD_DISTANCE && dist > MY_LEAST_DISTANCE ){
goodCount++;
}
}
if(avgCount > 6){
avg /= avgCount;
if(goodCount < 12){
avg = avg + (12-goodCount) * 4;
}
}else{
avg = MY_MAX_DISTANCE;
}
avg = avg > MY_AVG_DISTANCE ? MY_AVG_DISTANCE : avg;
avg = avg < MY_MIN_DISTANCE ? MY_MIN_DISTANCE : avg;
double score_avg = (MY_AVG_DISTANCE - avg) / ( MY_AVG_DISTANCE - MY_MIN_DISTANCE ) * 100;
if(formsHomography){ //Some bonus...not related with your matching method, but you can adopt something like this
score_avg += 40;
score_avg = score_avg > 100 ? 100 : score_avg;
}else{
score_avg -= 5;
score_avg = score_avg < 0 ? 0 : score_avg;
}
return score_avg;

You can find a simple implementation of bag of words in C++ here. So you don't need OpenCV to depend on.
class Statistics {
std::unordered_map<std::string, int64_t> _counts;
int64_t _totWords;
void process(std::string& token);
public:
explicit Statistics(const std::string& text);
double Dist(const Statistics& fellow) const;
bool IsEmpty() const { return _totWords == 0; }
};
namespace {
const std::string gPunctStr = ".,;:!?";
const std::unordered_set<char> gPunctSet(gPunctStr.begin(), gPunctStr.end());
}
Statistics::Statistics(const std::string& text) {
std::string lastToken;
for (size_t i = 0; i < text.size(); i++) {
int ch = static_cast<uint8_t>(text[i]);
if (!isspace(ch)) {
lastToken.push_back(tolower(ch));
continue;
}
process(lastToken);
}
process(lastToken);
}
void Statistics::process(std::string& token) {
do {
if (token.size() == 0) {
break;
}
if (gPunctSet.find(token.back()) != gPunctSet.end()) {
token.pop_back();
}
} while (false);
if (token.size() != 0) {
auto it = _counts.find(token);
if (it == _counts.end()) {
_counts.emplace(token, 1);
}
else {
it->second++;
}
_totWords++;
token.clear();
}
}
double Statistics::Dist(const Statistics& fellow) const {
double sum = 0;
for (const auto& wordInfo : _counts) {
const std::string wordText = wordInfo.first;
const double freq = double(wordInfo.second) / _totWords;
auto it = fellow._counts.find(wordText);
double fellowFreq;
if (it == fellow._counts.end()) {
fellowFreq = 0;
}
else {
fellowFreq = double(it->second) / fellow._totWords;
}
const double d = freq - fellowFreq;
sum += d * d;
}
return std::sqrt(sum);
}

Related

OpenCV::LMSolver getting a simple example to run

PROBLEM: The documentation for cv::LMSolver at opencv.org is very thin, to say the least. Finding some useful examples on the internet, also, was not possible.
APPROACH: So, I did write some simple code:
#include <opencv2/calib3d.hpp>
#include <iostream>
using namespace cv;
using namespace std;
struct Easy : public LMSolver::Callback {
Easy() = default;
virtual bool compute(InputArray f_param, OutputArray f_error, OutputArray f_jacobian) const override
{
Mat param = f_param.getMat();
if( f_error.empty() ) f_error.create(1, 1, CV_64F); // dim(error) = 1
Mat error = f_error.getMat();
vector<double> x{param.at<double>(0,0), param.at<double>(1,0)}; // dim(param) = 2
double error0 = calc(x);
error.at<double>(0,0) = error0;
if( ! f_jacobian.needed() ) return true;
else if( f_jacobian.empty() ) f_jacobian.create(1, 2, CV_64F);
Mat jacobian = f_jacobian.getMat();
double e = 1e-10; // estimate derivatives in epsilon environment
jacobian.at<double>(0, 0) = (calc({x[0] + e, x[1] }) - error0) / e; // d/dx0 (error)
jacobian.at<double>(0, 1) = (calc({x[0], x[1] + e}) - error0) / e; // d/dx1 (error)
return true;
}
double calc(const vector<double> x) const { return x[0]*x[0] + x[1]*x[1]; }
};
int main(int argc, char** argv)
{
Ptr<Easy> callback = makePtr<Easy>();
Ptr<LMSolver> solver = LMSolver::create(callback, 100000, 1e-37);
Mat parameters = (Mat_<double>(2,1) << 5, 100);
solver->run(parameters);
cout << parameters << endl;
}
QUESTIONS:
What does the return value of LMSolver::Callback::compute() report to the caller?
Currently, it finds the minimum at (-9e-07,4e-5), instead of the expected (0.0, 0.0). How can the precision be improved?
What does the return value of LMSolver::Callback::compute() report to the caller?
Thankfully, opencv is opensource, so we might be able to figure this out simply by checking out the code.
Looking at the source code on Github, I found that all of the calls to compute() look like:
if( !cb->compute(x, r, J) )
return -1;
Returning false simply causes the solver to bail out. So it seems that the return value of the callback's compute() is simply whether the generation of the jacobian was successful or not.
Currently, it finds the minimum at (-9e-07,4e-5). How can the precision be improved?
If anything, you should at least compare the return value of run() against your maximum iteration count to make sure that it did, in fact, converge as much as it could.
I suppose OP wants to minimize x^2 + y^2 with respect to [x, y].
Because Levenberg-Marquardt method solves a least square problem, the error should be defined as [x, y] so that it minimizes || [x, y] ||^2 = x^2 + y^2.
Another suggestion is that the Jacobian matrix should be provided analytically whenever possible, although this is not crucial in this particular case.
struct Easy : public LMSolver::Callback {
Easy() = default;
virtual bool compute(InputArray f_param, OutputArray f_error, OutputArray f_j
acobian) const override
{
Mat param = f_param.getMat();
if( f_error.empty() ) f_error.create(2, 1, CV_64F); // dim(error) = 2
Mat error = f_error.getMat();
vector<double> x{param.at<double>(0,0), param.at<double>(1,0)}; // dim(param) = 2
error.at<double>(0, 0) = x[0];
error.at<double>(1, 0) = x[1];
if( ! f_jacobian.needed() ) return true;
else if( f_jacobian.empty() ) f_jacobian.create(2, 2, CV_64F);
Mat jacobian = f_jacobian.getMat();
jacobian.at<double>(0, 0) = 1;
jacobian.at<double>(0, 1) = 0;
jacobian.at<double>(1, 0) = 0;
jacobian.at<double>(1, 1) = 1;
return true;
}
};

What is wrong with this MLP backpropogation implementation?

I am writing a MLP neural network in C++, and I am struggling with backpropogation. My implementation follows this article closely, but I've done something wrong and can't spot the problem. My Matrix class confirms that there are no mismatched dimensions in any of Matrix calculations, but the output seems to always approach zero or some variant of infinity. Is this "vanishing" or "exploding" gradients as mentioned here, or is there something else going wrong?
Here is my activation function and its derivative:
double sigmoid(double d) {
return 1/(1+exp(-d));
}
double dsigmoid(double d) {
return sigmoid(d) * (1 - sigmoid(d));
}
Here is my training algorithm:
void KNN::train(const Matrix& input, const Matrix& target) {
this->layer[0] = input;
for(uint i = 1; i <= this->num_depth+1; i++) {
this->layer[i] = Matrix::multiply(this->weights[i-1], this->layer[i-1]);
this->layer[i] = Matrix::function(this->layer[i], sigmoid);
}
this->deltas[this->num_depth+1] = Matrix::multiply(Matrix::subtract(this->layer[this->num_depth+1], target), Matrix::function(Matrix::multiply(this->weights[this->num_depth], this->layer[this->num_depth]), dsigmoid), true);
this->gradients[this->num_depth+1] = Matrix::multiply(this->deltas[this->num_depth+1], Matrix::transpose(this->layer[this->num_depth]));
this->weights[this->num_depth] = Matrix::subtract(this->weights[this->num_depth], Matrix::multiply(Matrix::multiply(this->weights[this->num_depth], this->learning_rate), this->gradients[this->num_depth+1], true));
for(int i = this->num_depth; i > 0; i--) {
this->deltas[i] = Matrix::multiply(Matrix::multiply(Matrix::transpose(this->weights[i]), this->deltas[i+1]), Matrix::function(Matrix::multiply(this->weights[i-1], this->layer[i-1]), dsigmoid), true);
this->gradients[i] = Matrix::multiply(this->deltas[i], Matrix::transpose(this->layer[i-1]));
this->weights[i-1] = Matrix::subtract(this->weights[i-1], Matrix::multiply(Matrix::multiply(this->weights[i-1], this->learning_rate), this->gradients[i], true));
}
}
The third argument in Matrix::multiply tells whether or not to use Hadamard product (default is false). this->num_depth is the number of hidden layers.
Adding biases seems to do... something, but output almost always tends towards zero.
void KNN::train(const Matrix& input, const Matrix& target) {
this->layer[0] = input;
for(uint i = 1; i <= this->num_depth+1; i++) {
this->layer[i] = Matrix::multiply(this->weights[i-1], this->layer[i-1]);
this->layer[i] = Matrix::add(this->layer[i], this->biases[i-1]);
this->layer[i] = Matrix::function(this->layer[i], this->activation);
}
this->deltas[this->num_depth+1] = Matrix::multiply(Matrix::subtract(this->layer[this->num_depth+1], target), Matrix::function(Matrix::multiply(this->weights[this->num_depth], this->layer[this->num_depth]), this->dactivation), true);
this->gradients[this->num_depth+1] = Matrix::multiply(this->deltas[this->num_depth+1], Matrix::transpose(this->layer[this->num_depth]));
this->weights[this->num_depth] = Matrix::subtract(this->weights[this->num_depth], Matrix::multiply(Matrix::multiply(this->weights[this->num_depth], this->learning_rate), this->gradients[this->num_depth+1], true));
this->biases[this->num_depth] = Matrix::subtract(this->biases[this->num_depth], Matrix::multiply(this->deltas[this->num_depth+1], this->learning_rate * .5));
for(uint i = this->num_depth+1 -1; i > 0; i--) {
this->deltas[i] = Matrix::multiply(Matrix::multiply(Matrix::transpose(this->weights[i+1 -1]), this->deltas[i+1]), Matrix::function(Matrix::multiply(this->weights[i-1], this->layer[i-1]), this->dactivation), true);
this->gradients[i] = Matrix::multiply(this->deltas[i], Matrix::transpose(this->layer[i-1]));
this->weights[i-1] = Matrix::subtract(this->weights[i-1], Matrix::multiply(Matrix::multiply(this->weights[i-1], this->learning_rate), this->gradients[i], true));
this->biases[i-1] = Matrix::subtract(this->biases[i-1], Matrix::multiply(this->deltas[i], this->learning_rate * .5));
}
}

unexpected results with word2vec algorithm

I implemented word2vec in c++.
I found the original syntax to be unclear, so I figured I'd re-implement it, using all the benefits of c++ (std::map, std::vector, etc)
This is the method that actually gets called every time a sample is trained (l1 denotes the index of the first word, l2 the index of the second word, label indicates whether it is a positive or negative sample, and neu1e acts as the accumulator for the gradient)
void train(int l1, int l2, double label, std::vector<double>& neu1e)
{
// Calculate the dot-product between the input words weights (in
// syn0) and the output word's weights (in syn1neg).
auto f = 0.0;
for (int c = 0; c < m__numberOfFeatures; c++)
f += syn0[l1][c] * syn1neg[l2][c];
// This block does two things:
// 1. Calculates the output of the network for this training
// pair, using the expTable to evaluate the output layer
// activation function.
// 2. Calculate the error at the output, stored in 'g', by
// subtracting the network output from the desired output,
// and finally multiply this by the learning rate.
auto z = 1.0 / (1.0 + exp(-f));
auto g = m_learningRate * (label - z);
// Multiply the error by the output layer weights.
// (I think this is the gradient calculation?)
// Accumulate these gradients over all of the negative samples.
for (int c = 0; c < m__numberOfFeatures; c++)
neu1e[c] += (g * syn1neg[l2][c]);
// Update the output layer weights by multiplying the output error
// by the hidden layer weights.
for (int c = 0; c < m__numberOfFeatures; c++)
syn1neg[l2][c] += g * syn0[l1][c];
}
This method gets called by
void train(const std::string& s0, const std::string& s1, bool isPositive, std::vector<double>& neu1e)
{
auto l1 = m_wordIDs.find(s0) != m_wordIDs.end() ? m_wordIDs[s0] : -1;
auto l2 = m_wordIDs.find(s1) != m_wordIDs.end() ? m_wordIDs[s1] : -1;
if(l1 == -1 || l2 == -1)
return;
train(l1, l2, isPositive ? 1 : 0, neu1e);
}
which in turn gets called by the main training method.
Full code can be found at
https://github.com/jorisschellekens/ml/tree/master/word2vec
With complete example at
https://github.com/jorisschellekens/ml/blob/master/main/example_8.hpp
When I run this algorithm, the top 10 words 'closest' to father are:
father
Khan
Shah
forgetful
Miami
rash
symptoms
Funeral
Indianapolis
impressed
This the method to calculate the nearest words:
std::vector<std::string> nearest(const std::string& s, int k) const
{
// calculate distance
std::vector<std::tuple<std::string, double>> tmp;
for(auto &t : m_unigramFrequency)
{
tmp.push_back(std::make_tuple(t.first, distance(t.first, s)));
}
// sort
std::sort(tmp.begin(), tmp.end(), [](const std::tuple<std::string, double>& t0, const std::tuple<std::string, double>& t1)
{
return std::get<1>(t0) < std::get<1>(t1);
});
// take top k
std::vector<std::string> out;
for(int i=0; i<k; i++)
{
out.push_back(std::get<0>(tmp[tmp.size() - 1 - i]));
}
// return
return out;
}
Which seems weird.
Is something wrong with my algorithm?
Are you sure, that you get "nearest" words (not farest)?
...
// take top k
std::vector<std::string> out;
for(int i=0; i<k; i++)
{
out.push_back(std::get<0>(tmp[tmp.size() - 1 - i]));
}
...

clustering image segments in opencv

I am working on motion detection with non-static camera using opencv.
I am using a pretty basic background subtraction and thresholding approach to get a broad sense of all that's moving in a sample video. After thresholding, I enlist all separable "patches" of white pixels, store them as independent components and color them randomly with red, green or blue. The image below shows this for a football video where all such components are visible.
I create rectangles over these detected components and I get this image:
So I can see the challenge here. I want to cluster all the "similar" and close-by components into a single entity so that the rectangles in the output image show a player moving as a whole (and not his independent limbs). I tried doing K-means clustering but since ideally I would not know the number of moving entities, I could not make any progress.
Please guide me on how I can do this. Thanks
this problem can be almost perfectly solved by dbscan clustering algorithm. Below, I provide the implementation and result image. Gray blob means outlier or noise according to dbscan. I simply used boxes as input data. Initially, box centers were used for distance function. However for boxes, it is insufficient to correctly characterize distance. So, the current distance function use the minimum distance of all 8 corners of two boxes.
#include "opencv2/opencv.hpp"
using namespace cv;
#include <map>
#include <sstream>
template <class T>
inline std::string to_string (const T& t)
{
std::stringstream ss;
ss << t;
return ss.str();
}
class DbScan
{
public:
std::map<int, int> labels;
vector<Rect>& data;
int C;
double eps;
int mnpts;
double* dp;
//memoization table in case of complex dist functions
#define DP(i,j) dp[(data.size()*i)+j]
DbScan(vector<Rect>& _data,double _eps,int _mnpts):data(_data)
{
C=-1;
for(int i=0;i<data.size();i++)
{
labels[i]=-99;
}
eps=_eps;
mnpts=_mnpts;
}
void run()
{
dp = new double[data.size()*data.size()];
for(int i=0;i<data.size();i++)
{
for(int j=0;j<data.size();j++)
{
if(i==j)
DP(i,j)=0;
else
DP(i,j)=-1;
}
}
for(int i=0;i<data.size();i++)
{
if(!isVisited(i))
{
vector<int> neighbours = regionQuery(i);
if(neighbours.size()<mnpts)
{
labels[i]=-1;//noise
}else
{
C++;
expandCluster(i,neighbours);
}
}
}
delete [] dp;
}
void expandCluster(int p,vector<int> neighbours)
{
labels[p]=C;
for(int i=0;i<neighbours.size();i++)
{
if(!isVisited(neighbours[i]))
{
labels[neighbours[i]]=C;
vector<int> neighbours_p = regionQuery(neighbours[i]);
if (neighbours_p.size() >= mnpts)
{
expandCluster(neighbours[i],neighbours_p);
}
}
}
}
bool isVisited(int i)
{
return labels[i]!=-99;
}
vector<int> regionQuery(int p)
{
vector<int> res;
for(int i=0;i<data.size();i++)
{
if(distanceFunc(p,i)<=eps)
{
res.push_back(i);
}
}
return res;
}
double dist2d(Point2d a,Point2d b)
{
return sqrt(pow(a.x-b.x,2) + pow(a.y-b.y,2));
}
double distanceFunc(int ai,int bi)
{
if(DP(ai,bi)!=-1)
return DP(ai,bi);
Rect a = data[ai];
Rect b = data[bi];
/*
Point2d cena= Point2d(a.x+a.width/2,
a.y+a.height/2);
Point2d cenb = Point2d(b.x+b.width/2,
b.y+b.height/2);
double dist = sqrt(pow(cena.x-cenb.x,2) + pow(cena.y-cenb.y,2));
DP(ai,bi)=dist;
DP(bi,ai)=dist;*/
Point2d tla =Point2d(a.x,a.y);
Point2d tra =Point2d(a.x+a.width,a.y);
Point2d bla =Point2d(a.x,a.y+a.height);
Point2d bra =Point2d(a.x+a.width,a.y+a.height);
Point2d tlb =Point2d(b.x,b.y);
Point2d trb =Point2d(b.x+b.width,b.y);
Point2d blb =Point2d(b.x,b.y+b.height);
Point2d brb =Point2d(b.x+b.width,b.y+b.height);
double minDist = 9999999;
minDist = min(minDist,dist2d(tla,tlb));
minDist = min(minDist,dist2d(tla,trb));
minDist = min(minDist,dist2d(tla,blb));
minDist = min(minDist,dist2d(tla,brb));
minDist = min(minDist,dist2d(tra,tlb));
minDist = min(minDist,dist2d(tra,trb));
minDist = min(minDist,dist2d(tra,blb));
minDist = min(minDist,dist2d(tra,brb));
minDist = min(minDist,dist2d(bla,tlb));
minDist = min(minDist,dist2d(bla,trb));
minDist = min(minDist,dist2d(bla,blb));
minDist = min(minDist,dist2d(bla,brb));
minDist = min(minDist,dist2d(bra,tlb));
minDist = min(minDist,dist2d(bra,trb));
minDist = min(minDist,dist2d(bra,blb));
minDist = min(minDist,dist2d(bra,brb));
DP(ai,bi)=minDist;
DP(bi,ai)=minDist;
return DP(ai,bi);
}
vector<vector<Rect> > getGroups()
{
vector<vector<Rect> > ret;
for(int i=0;i<=C;i++)
{
ret.push_back(vector<Rect>());
for(int j=0;j<data.size();j++)
{
if(labels[j]==i)
{
ret[ret.size()-1].push_back(data[j]);
}
}
}
return ret;
}
};
cv::Scalar HSVtoRGBcvScalar(int H, int S, int V) {
int bH = H; // H component
int bS = S; // S component
int bV = V; // V component
double fH, fS, fV;
double fR, fG, fB;
const double double_TO_BYTE = 255.0f;
const double BYTE_TO_double = 1.0f / double_TO_BYTE;
// Convert from 8-bit integers to doubles
fH = (double)bH * BYTE_TO_double;
fS = (double)bS * BYTE_TO_double;
fV = (double)bV * BYTE_TO_double;
// Convert from HSV to RGB, using double ranges 0.0 to 1.0
int iI;
double fI, fF, p, q, t;
if( bS == 0 ) {
// achromatic (grey)
fR = fG = fB = fV;
}
else {
// If Hue == 1.0, then wrap it around the circle to 0.0
if (fH>= 1.0f)
fH = 0.0f;
fH *= 6.0; // sector 0 to 5
fI = floor( fH ); // integer part of h (0,1,2,3,4,5 or 6)
iI = (int) fH; // " " " "
fF = fH - fI; // factorial part of h (0 to 1)
p = fV * ( 1.0f - fS );
q = fV * ( 1.0f - fS * fF );
t = fV * ( 1.0f - fS * ( 1.0f - fF ) );
switch( iI ) {
case 0:
fR = fV;
fG = t;
fB = p;
break;
case 1:
fR = q;
fG = fV;
fB = p;
break;
case 2:
fR = p;
fG = fV;
fB = t;
break;
case 3:
fR = p;
fG = q;
fB = fV;
break;
case 4:
fR = t;
fG = p;
fB = fV;
break;
default: // case 5 (or 6):
fR = fV;
fG = p;
fB = q;
break;
}
}
// Convert from doubles to 8-bit integers
int bR = (int)(fR * double_TO_BYTE);
int bG = (int)(fG * double_TO_BYTE);
int bB = (int)(fB * double_TO_BYTE);
// Clip the values to make sure it fits within the 8bits.
if (bR > 255)
bR = 255;
if (bR < 0)
bR = 0;
if (bG >255)
bG = 255;
if (bG < 0)
bG = 0;
if (bB > 255)
bB = 255;
if (bB < 0)
bB = 0;
// Set the RGB cvScalar with G B R, you can use this values as you want too..
return cv::Scalar(bB,bG,bR); // R component
}
int main(int argc,char** argv )
{
Mat im = imread("c:/data/football.png",0);
std::vector<std::vector<cv::Point> > contours;
std::vector<cv::Vec4i> hierarchy;
findContours(im.clone(), contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
vector<Rect> boxes;
for(size_t i = 0; i < contours.size(); i++)
{
Rect r = boundingRect(contours[i]);
boxes.push_back(r);
}
DbScan dbscan(boxes,20,2);
dbscan.run();
//done, perform display
Mat grouped = Mat::zeros(im.size(),CV_8UC3);
vector<Scalar> colors;
RNG rng(3);
for(int i=0;i<=dbscan.C;i++)
{
colors.push_back(HSVtoRGBcvScalar(rng(255),255,255));
}
for(int i=0;i<dbscan.data.size();i++)
{
Scalar color;
if(dbscan.labels[i]==-1)
{
color=Scalar(128,128,128);
}else
{
int label=dbscan.labels[i];
color=colors[label];
}
putText(grouped,to_string(dbscan.labels[i]),dbscan.data[i].tl(), FONT_HERSHEY_COMPLEX,.5,color,1);
drawContours(grouped,contours,i,color,-1);
}
imshow("grouped",grouped);
imwrite("c:/data/grouped.jpg",grouped);
waitKey(0);
}
I agree with Sebastian Schmitz: you probably shouldn't be looking for clustering.
Don't expect an uninformed method such as k-means to work magic for you. In particular one that is as crude a heuristic as k-means, and which lives in an idealized mathematical world, not in messy, real data.
You have a good understanding of what you want. Try to put this intuition into code. In your case, you seem to be looking for connected components.
Consider downsampling your image to a lower resolution, then rerunning the same process! Or running it on the lower resolution right away (to reduce compression artifacts, and improve performance). Or adding filters, such as blurring.
I'd expect best and fastest results by looking at connected components in the downsampled/filtered image.
I am not entirely sure if you are really looking for clustering (in the Data Mining sense).
Clustering is used to group similar objects according to a distance function. In your case the distance function would only use the spatial qualities. Besides, in k-means clustering you have to specify a k, that you probably don't know beforehand.
It seems to me you just want to merge all rectangles whose borders are closer together than some predetermined threshold. So as a first idea try to merge all rectangles that are touching or that are closer together than half a players height.
You probably want to include a size check to minimize the risk of merging two players into one.
Edit: If you really want to use a clustering algorithm use one that estimates the number of clusters for you.
I guess you can improve your original attempt by using morphological transformations. Take a look at http://docs.opencv.org/master/d9/d61/tutorial_py_morphological_ops.html#gsc.tab=0. Probably you can deal with a closed set for each entity after that, specially with separate players as you got in your original image.

OpenCV groupRectangles - getting grouped and ungrouped rectangles

I'm using OpenCV and want to group together rectangles that have significant overlap. I've tried using groupRectangles for this, which takes a group threshold argument. With a threshold of 0 it doesn't do any grouping at all, and with a threshold of 1 is only returns rectangles that were the result of at least 2 rectangles. For example, given the rectangles on the left in the image below you end up with the 2 rectangles on the right:
What I'd like to end up with is 3 rectangles. The 2 on the right in the image above, plus the rectangle in the top right of the image to the left that doesn't overlap with any other rectangles. What's the best way to achieve this?
The solution I ended up going with was to duplicate all of the initial rectangles before calling groupRectangles. That way every input rectangle is guaranteed to be grouped with at least one other rectangle, and will appear in the output:
int size = rects.size();
for( int i = 0; i < size; i++ )
{
rects.push_back(Rect(rects[i]));
}
groupRectangles(rects, 1, 0.2);
A little late to the party, however "duplicating" solution did not properly work for me. I also had another problem where merged rectangles would overlap and would need to be merged.
So I came up with an overkill solution (might require C++14 compiler). Here's usage example:
std::vector<cv::Rect> rectangles, test1, test2, test3;
rectangles.push_back(cv::Rect(cv::Point(5, 5), cv::Point(15, 15)));
rectangles.push_back(cv::Rect(cv::Point(14, 14), cv::Point(26, 26)));
rectangles.push_back(cv::Rect(cv::Point(24, 24), cv::Point(36, 36)));
rectangles.push_back(cv::Rect(cv::Point(37, 20), cv::Point(40, 40)));
rectangles.push_back(cv::Rect(cv::Point(20, 37), cv::Point(40, 40)));
test1 = rectangles;
test2 = rectangles;
test3 = rectangles;
//Output format: {Rect(x, y, width, height), ...}
//Merge once
mergeRectangles(test1);
//Output rectangles: test1 = {Rect(5, 5, 31, 31), Rect(20, 20, 20, 20)}
//Merge until there are no rectangles to merge
mergeRectangles(test2, true);
//Output rectangles: test2 = {Rect(5, 5, 35, 35)}
//Override default merge (intersection) function to merge all rectangles
mergeRectangles(test3, false, [](const cv::Rect& r1, const cv::Rect& r2) {
return true;
});
//Output rectangles: test3 = {Rect(5, 5, 35, 35)}
Function:
void mergeRectangles(std::vector<cv::Rect>& rectangles, bool recursiveMerge = false, std::function<bool(const cv::Rect& r1, const cv::Rect& r2)> mergeFn = nullptr) {
static auto defaultFn = [](const cv::Rect& r1, const cv::Rect& r2) {
return (r1.x < (r2.x + r2.width) && (r1.x + r1.width) > r2.x && r1.y < (r2.y + r2.height) && (r1.y + r1.height) > r2.y);
};
static auto innerMerger = [](std::vector<cv::Rect>& rectangles, std::function<bool(const cv::Rect& r1, const cv::Rect& r2)>& mergeFn) {
std::vector<std::vector<std::vector<cv::Rect>::const_iterator>> groups;
std::vector<cv::Rect> mergedRectangles;
bool merged = false;
static auto findIterator = [&](std::vector<cv::Rect>::const_iterator& iteratorToFind) {
for (auto groupIterator = groups.begin(); groupIterator != groups.end(); ++groupIterator) {
auto foundIterator = std::find(groupIterator->begin(), groupIterator->end(), iteratorToFind);
if (foundIterator != groupIterator->end()) {
return groupIterator;
}
}
return groups.end();
};
for (auto rect1_iterator = rectangles.begin(); rect1_iterator != rectangles.end(); ++rect1_iterator) {
auto groupIterator = findIterator(rect1_iterator);
if (groupIterator == groups.end()) {
groups.push_back({rect1_iterator});
groupIterator = groups.end() - 1;
}
for (auto rect2_iterator = rect1_iterator + 1; rect2_iterator != rectangles.end(); ++rect2_iterator) {
if (mergeFn(*rect1_iterator, *rect2_iterator)) {
groupIterator->push_back(rect2_iterator);
merged = true;
}
}
}
for (auto groupIterator = groups.begin(); groupIterator != groups.end(); ++groupIterator) {
auto groupElement = groupIterator->begin();
int x1 = (*groupElement)->x;
int x2 = (*groupElement)->x + (*groupElement)->width;
int y1 = (*groupElement)->y;
int y2 = (*groupElement)->y + (*groupElement)->height;
while (++groupElement != groupIterator->end()) {
if (x1 > (*groupElement)->x)
x1 = (*groupElement)->x;
if (x2 < (*groupElement)->x + (*groupElement)->width)
x2 = (*groupElement)->x + (*groupElement)->width;
if (y1 >(*groupElement)->y)
y1 = (*groupElement)->y;
if (y2 < (*groupElement)->y + (*groupElement)->height)
y2 = (*groupElement)->y + (*groupElement)->height;
}
mergedRectangles.push_back(cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2)));
}
rectangles = mergedRectangles;
return merged;
};
if (!mergeFn)
mergeFn = defaultFn;
while (innerMerger(rectangles, mergeFn) && recursiveMerge);
}
By checking out groupRectangles() in opencv-3.3.0 source code:
if( groupThreshold <= 0 || rectList.empty() )
{
// ......
return;
}
I saw that if groupThreshold is set to less than or equal to 0, the function would just return without doing any grouping.
On the other hand, the following code removed all rectangles which don't have more than groupThreshold similarities.
// filter out rectangles which don't have enough similar rectangles
if( n1 <= groupThreshold )
continue;
That explains why with groupThreshold=1 only rectangles with at least 2 overlappings are in the output.
One possible solution could be to modify the source code shown above (replacing n1 <= groupThreshold with n1 < groupThreshold) and re-compile OpenCV.