Linear regression: poor gradient descent performance - C++

I have implemented a simple linear regression example (univariate for now) in C++ to help me get my head around the concepts. I'm pretty sure the key algorithm is right, but my performance is terrible.
This is the method which actually performs the gradient descent:
void LinearRegression::BatchGradientDescent(std::vector<std::pair<int,int>>& data, float& theta1, float& theta2)
{
    float weight = 1.0f / static_cast<float>(data.size());
    float theta1Res = 0.0f;
    float theta2Res = 0.0f;
    for (auto p : data)
    {
        float cost = Hypothesis(p.first, theta1, theta2) - p.second;
        theta1Res += cost;
        theta2Res += cost * p.first;
    }
    theta1 = theta1 - (m_LearningRate * weight * theta1Res);
    theta2 = theta2 - (m_LearningRate * weight * theta2Res);
}
With the other key functions given as:
float LinearRegression::Hypothesis(float x, float theta1, float theta2) const
{
    return theta1 + x * theta2;
}

float LinearRegression::CostFunction(std::vector<std::pair<int,int>>& data,
                                     float theta1,
                                     float theta2) const
{
    float error = 0.0f;
    for (auto p : data)
    {
        float prediction = Hypothesis(p.first, theta1, theta2) - p.second;
        error += prediction * prediction;
    }
    error *= 1.0f / (data.size() * 2.0f);
    return error;
}
void LinearRegression::Regress(std::vector<std::pair<int,int>>& data)
{
    for (unsigned int itr = 0; itr < MAX_ITERATIONS; ++itr)
    {
        BatchGradientDescent(data, m_Theta1, m_Theta2);
        // Some visualisation code
    }
}
Now the issue is that if the learning rate is greater than around 0.000001, the value of the cost function after gradient descent is higher than it was before. That is to say, the algorithm is working in reverse. The line forms into a straight line through the origin pretty quickly, but then takes millions of iterations to actually reach a reasonably good fit.
With a learning rate of 0.01, after six iterations the output is: (where difference is costAfter-costBefore)
Cost before 102901.945312, cost after 517539430400.000000, difference 517539332096.000000
Cost before 517539430400.000000, cost after 3131945127824588800.000000, difference 3131944578068774912.000000
Cost before 3131945127824588800.000000, cost after 18953312418560698826620928.000000, difference 18953308959796185006080000.000000
Cost before 18953312418560698826620928.000000, cost after 114697949347691988409089177681920.000000, difference 114697930004878874575022382383104.000000
Cost before 114697949347691988409089177681920.000000, cost after inf, difference inf
Cost before inf, cost after inf, difference nan
In the example that actually converges, the thetas are set to zero, the learning rate to 0.000001, and there are 8,000,000 iterations! The visualisation code only updates the graph after every 100,000 iterations.
Function which creates the data points:
static void SetupRegressionData(std::vector<std::pair<int,int>>& data)
{
    srand(time(NULL));
    for (int x = 50; x < 750; x += 3)
    {
        data.push_back(std::pair<int,int>(x + (rand() % 100), 400 + (rand() % 100)));
    }
}
In short, if my learning rate is too high the gradient descent algorithm effectively runs backwards and tends to infinity, and if it is lowered to the point where it actually converges towards a minimum, the number of iterations required to do so is unacceptably high.
Have I missed anything/made a mistake in the core algorithm?

Looks like everything is behaving as expected, but you are having problems selecting a reasonable learning rate. That's not a totally trivial problem, and there are many approaches ranging from pre-defined schedules that progressively reduce the learning rate (see e.g. this paper) to adaptive methods such as AdaGrad or AdaDelta.
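The simplest pre-defined schedule is just a step decay; a minimal sketch (the function and the constants here are mine, purely illustrative):

#include <cmath>

// Halve the learning rate every 1000 iterations
// (integer division gives the number of completed steps).
float DecayedLearningRate(float initialRate, unsigned int iteration)
{
    return initialRate * std::pow(0.5f, static_cast<float>(iteration / 1000));
}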
For your vanilla implementation with fixed learning rate you should make your life easier by normalising the data to zero mean and unit standard deviation before you feed it into the gradient descent algorithm. That way you will be able to reason about the learning rate more easily. Then you can just rescale your prediction accordingly.
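For example, a minimal sketch of that normalisation, assuming the data is held as floats rather than the int pairs above (the struct and helper names here are mine, purely illustrative):

#include <cmath>
#include <utility>
#include <vector>

struct Stats { float mean; float stddev; };

// Shift the x values to zero mean and scale them to unit standard deviation,
// returning the statistics so that predictions can be rescaled for raw inputs.
static Stats NormaliseX(std::vector<std::pair<float,float>>& data)
{
    float mean = 0.0f;
    for (const auto& p : data) mean += p.first;
    mean /= data.size();

    float variance = 0.0f;
    for (const auto& p : data) variance += (p.first - mean) * (p.first - mean);
    float stddev = std::sqrt(variance / data.size());

    for (auto& p : data) p.first = (p.first - mean) / stddev;
    return { mean, stddev };
}

// To predict for a raw x after training on the normalised data:
//   y = Hypothesis((x - stats.mean) / stats.stddev, theta1, theta2);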

Pre-Collision Object Staging

I am making a billiards game. Currently, when one ball collides with another at high speed, the collision is not always calculated correctly. I know what the issue is, but I'm not 100% sure how to fix it.
Say two balls are traveling with these velocities:
More often than not, when the collision is detected, the balls will have some overlap between them that looks like this:
Currently, my physics engine will handle the collision at this moment in time. This will not give the desired result, since this is NOT where the balls collide in reality - balls don't pass through one another. So, we need to back up the balls to where they really collide. That would look like this:
I am looking for an efficient algorithm that would help me do this. Currently, I have a very naive and inefficient method - I move both balls to their locations just before the collision and take very small steps toward the moment of collision. Of course, this is very inefficient. Here is what it looks like:
void CBallCollision::StageCollision()
{
    double sumOfRadii = mBall1->GetRadius() + mBall2->GetRadius();
    mBall1->SetCenter(mBall1->GetLastLocationOnTable().first, mBall1->GetLastLocationOnTable().second);
    mBall2->SetCenter(mBall2->GetLastLocationOnTable().first, mBall2->GetLastLocationOnTable().second);
    double timeStep = 0.008;
    double tolerance = 0.1 * min(mBall1->GetRadius(), mBall2->GetRadius());
    int iter = 0;
    while (GetDistance() > sumOfRadii)
    {
        double xGoal1 = mBall1->GetX() + mBall1->GetVelocityX() * timeStep;
        double yGoal1 = mBall1->GetY() + mBall1->GetVelocityY() * timeStep;
        pair<double, double> newCoords1 = mBall1->LinearInterpolate(xGoal1, yGoal1);
        double xGoal2 = mBall2->GetX() + mBall2->GetVelocityX() * timeStep;
        double yGoal2 = mBall2->GetY() + mBall2->GetVelocityY() * timeStep;
        pair<double, double> newCoords2 = mBall2->LinearInterpolate(xGoal2, yGoal2);
        // Distance between the tentative new centres
        double dist = sqrt(pow(newCoords1.first - newCoords2.first, 2) + pow(newCoords1.second - newCoords2.second, 2));
        if (abs(dist - sumOfRadii) > tolerance)
        {
            timeStep *= 0.5;
        }
        else
        {
            mBall1->SetX(newCoords1.first);
            mBall1->SetY(newCoords1.second);
            mBall2->SetX(newCoords2.first);
            mBall2->SetY(newCoords2.second);
        }
        iter++;
        if (iter > 1000)
        {
            break;
        }
    }
}
If I don't put an upper bound on the number of iterations, the program crashes. I'm sure there is a much more efficient way of going about this. Any help is appreciated.
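For what it's worth, since the motion within a frame is linear, the exact moment of contact can be found in closed form instead of by stepping: writing dp for the relative position of the centres and dv for the relative velocity, the first time t at which |dp + t*dv| equals r1 + r2 is the smaller root of a quadratic. A sketch, assuming constant velocities over the frame (the function name and signature are mine):

#include <cmath>

// Solve |dp + t*dv|^2 = R^2 for the earliest t >= 0, where dp is the relative
// position, dv the relative velocity, and R the sum of the radii.
// Returns -1.0 if the balls never reach contact on this trajectory.
double TimeOfImpact(double dpx, double dpy, double dvx, double dvy, double R)
{
    const double a = dvx * dvx + dvy * dvy;          // |dv|^2
    const double b = 2.0 * (dpx * dvx + dpy * dvy);  // 2 * dp.dv
    const double c = dpx * dpx + dpy * dpy - R * R;  // |dp|^2 - R^2
    const double disc = b * b - 4.0 * a * c;
    if (a == 0.0 || disc < 0.0) return -1.0;         // no relative motion, or no contact
    const double t = (-b - std::sqrt(disc)) / (2.0 * a); // smaller root = first contact
    return (t >= 0.0) ? t : -1.0;
}

The balls would then be advanced from their pre-collision positions by exactly t, replacing the bisection loop entirely.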

How to implement batch backpropagation

I have successfully implemented stochastic backpropagation and am trying to improve its accuracy. I've noticed that batched backpropagation seems to be more popular, and I wanted to see whether it would improve the network's accuracy; however, I can't figure out how to implement it. By "batched backpropagation" I mean backpropagation where the weights and biases are only updated after the completion of a mini-batch or epoch, instead of after each input.
My understanding is that you sum up the changes needed for each weight and bias and apply the change at the end of the batch of training examples. I basically changed nothing from my original stochastic backprop code, except that instead of applying the change directly to the weights and biases I apply it to a buffer which is then used to update the weights and biases later. Or am I supposed to sum up the cost from each training example and then run backpropagation at the end of the batch? If that's the case, what do I use for the intermediate results (the output vectors of each layer) if the cost is a combination of the costs for a batch of inputs?
// Called after each calculation on a training example
void ML::NeuralNetwork::learnBatch(const Matrix& calc, const Matrix& real) const {
    ML::Matrix cost = 2 * (calc - real);
    for (int i = weights.size() - 1; i >= 0; --i) {
        // Each element in results is the column vector output for each layer
        // elementMultiply() returns the Hadamard product
        ML::Matrix dCdB = cost.elementMultiply(ML::sigDerivative(weights[i] * results[i] + biases[i]));
        ML::Matrix dCdW = dCdB * results[i].transpose();
        cost = weights[i].transpose() * dCdB;
        sumWeights[i] += learningRate * dCdW; // scalar multiplication
        sumBiases[i] += learningRate * dCdB;
        /* Original code:
         * weights[i] -= learningRate * dCdW;
         * biases[i] -= learningRate * dCdB;
         */
    }
}

// Called at the end of a batch
void ML::NeuralNetwork::update() {
    for (int i = 0; i < weights.size(); ++i) {
        weights[i] -= sumWeights[i];
        biases[i] -= sumBiases[i];
        // Set all elements in the matrices to 0
        sumWeights[i].zero();
        sumBiases[i].zero();
    }
}
Besides the addition of an update() function I really haven't changed much from my working stochastic backprop code. With my current batch backprop code the neural network never learns and consistently gets 0 correct outputs even after iterating over 200 batches. Is there something I'm not understanding?
All help will be greatly appreciated.
In batch backpropagation, you sum the contribution of the backpropagation of each sample. In other words, the resulting gradient is the sum of the per-sample gradients.
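As a worked update rule: if C_s is the cost of sample s and the batch has B samples, the batched update is

    W_i -= learningRate * sum over s of dC_s/dW_i
    b_i -= learningRate * sum over s of dC_s/db_i

which is what the buffer code above computes. One common variant (an assumption on my part, not something stated above) is to divide the accumulated sum by B, i.e. use the average gradient; with a plain sum the effective step size grows with the batch size, so a learning rate that worked for stochastic updates can easily be too large.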

Study of FFT - why isn't it fast?

I am not sure if this is more of a math question or a programming question. If it's math, please tell me.
I know there are a lot of free, ready-to-use FFT projects. But I'm trying to understand the FFT method, just for fun and for study. So I implemented both algorithms - DFT and FFT - to compare them.
But I have a problem with my FFT. There doesn't seem to be a big difference in efficiency: my FFT is only a little faster than my DFT (in some cases it's twice as fast, but that's the maximum speedup).
Most articles about the FFT mention bit reversal, but I don't see any reason to use bit reversing. That is probably the problem; I don't understand it. Please help me - what am I doing wrong?
This is my code (you can copy it here and see how it works - online compiler):
#include <complex>
#include <iostream>
#include <math.h>
#include <cmath>
#include <vector>
#include <chrono>
#include <ctime>

float _Pi = 3.14159265;
float sampleRate = 44100;
float resolution = 4;
float _SRrange = sampleRate / resolution; // I divide the sample rate to make the loop smaller,
                                          // just to perform tests faster
float bufferSize = 512;

// Clock class to measure the time taken to execute the whole loop:
class Clock
{
public:
    Clock() { start = std::chrono::high_resolution_clock::now(); }
    ~Clock() {}
    float secondsElapsed()
    {
        auto stop = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    }
    void reset() { start = std::chrono::high_resolution_clock::now(); }
private:
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
};

// Function to calculate the magnitude of a complex number:
float _mag_Hf(std::complex<float> sf);
// Function to calculate exp(-j*2*PI*n*k / sampleRate) - where "j" is the imaginary unit:
std::complex<float> _Wnk_Nc(float n, float k);
// Function to calculate exp(-j*2*PI*k / sampleRate):
std::complex<float> _Wk_Nc(float k);

int main() {
    float scaleFFT = 512; // divide and conquer - if it's "1" then the whole algorithm is simply a DFT
                          // I wonder what the maximum of that value is. I always thought it should equal
                          // the buffer size (number of samples), but above some value it starts to work slower than the DFT
    std::vector<float> inputSignal; // array of input signal
    inputSignal.resize(bufferSize); // how many samples we will use to calculate the Fourier transform
    std::vector<std::complex<float>> _Sf; // array to store the Fourier transform value for each measured frequency bin
    _Sf.resize(scaleFFT); // resize it to the size we need
    std::vector<std::complex<float>> _Hf_Db_vect; // array to store the magnitude (in logarithmic dB scale)
                                                  // for each measured frequency bin
    _Hf_Db_vect.resize(_SRrange); // resize it so it can store a value for each measured frequency
    std::complex<float> _Sf_I_half; // complex value for the first half of the frequency range,
                                    // from 1 to Nyquist (sampleRate/2)
    std::complex<float> _Sf_II_half; // complex value for the second half of the frequency range,
                                     // from Nyquist to sampleRate
    for(int i=0; i<(int)_Sf.size(); i++)
        inputSignal[i] = cosf((float)i/_Pi); // fill the input signal with some data, no matter what

    Clock _time; // start measuring time
    for(int freqBinK=0; freqBinK < _SRrange/2; freqBinK++) // calculate all frequencies (divide by 2 for the two halves)
    {
        for(int i=0; i<(int)_Sf.size(); i++) _Sf[i] = 0.0f; // zero all values before the next loop iteration
        for (int n=0; n<bufferSize/_Sf.size(); ++n) // here I take all samples in the buffer
        {
            std::complex<float> _W = _Wnk_Nc(_Sf.size()*(float)n, freqBinK);
            for(int i=0; i<(int)_Sf.size(); i++) // finally, here is my divide and conquer
                _Sf[i] += inputSignal[_Sf.size()*n +i] * _W; // and I see no reason to use any bit reversal - where should it be????
        }
        std::complex<float> _Wk = _Wk_Nc(freqBinK);
        _Sf_I_half = 0.0f;
        _Sf_II_half = 0.0f;
        for(int z=0; z<(int)_Sf.size()/2; z++) // here I calculate the Fourier transform for each frequency
        {
            _Sf_I_half += _Wk_Nc(2.0f * (float)z * freqBinK) * (_Sf[2*z] + _Wk * _Sf[2*z+1]); // first half - up to Nyquist
            _Sf_II_half += _Wk_Nc(2.0f * (float)z *freqBinK) * (_Sf[2*z] - _Wk * _Sf[2*z+1]); // second half - up to sampleRate
            // again, I don't see the need for bit reversal - where should it be??? :)
        }
        // Calculate the magnitude in dB scale
        _Hf_Db_vect[freqBinK] = _mag_Hf(_Sf_I_half); // first half
        _Hf_Db_vect[freqBinK + _SRrange/2] = _mag_Hf(_Sf_II_half); // second half
    }
    std::cout << _time.secondsElapsed() << std::endl; // time measured after execution of the whole loop
}

float _mag_Hf(std::complex<float> sf)
{
    float _Re_2;
    float _Im_2;
    _Re_2 = sf.real() * sf.real();
    _Im_2 = sf.imag() * sf.imag();
    return 20*log10(pow(_Re_2 + _Im_2, 0.5f)); // transform the magnitude to logarithmic dB scale
}

std::complex<float> _Wnk_Nc(float n, float k)
{
    std::complex<float> _Wnk_Ncomp;
    _Wnk_Ncomp.real(cosf(-2.0f * _Pi * (float)n * k / sampleRate));
    _Wnk_Ncomp.imag(sinf(-2.0f * _Pi * (float)n * k / sampleRate));
    return _Wnk_Ncomp;
}

std::complex<float> _Wk_Nc(float k)
{
    std::complex<float> _Wk_Ncomp;
    _Wk_Ncomp.real(cosf(-2.0f * _Pi * k / sampleRate));
    _Wk_Ncomp.imag(sinf(-2.0f * _Pi * k / sampleRate));
    return _Wk_Ncomp;
}
One huge mistake you are making is calculating the butterfly weights (which involves sin and cos) on the fly (in _Wnk_Nc()). sin and cos typically cost 10s to 100s of clock cycles, whereas the other butterfly operations are just mul and add, which only take a few cycles, hence the need to factor these out. All fast FFT implementations do this as part of an initialisation step (usually called "plan creation" or similar). See e.g. FFTW and KissFFT.
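For illustration, a minimal sketch of what that initialisation step might look like (the function name is mine; this is not code from FFTW or KissFFT): compute the N twiddle factors once, then index into the table inside the loops.

#include <complex>
#include <vector>
#include <math.h>

// Build a table of twiddle factors W_N^k = exp(-j*2*pi*k/N) once, so that the
// transform loops only perform table lookups, multiplies and adds.
std::vector<std::complex<float>> MakeTwiddleTable(int N)
{
    std::vector<std::complex<float>> table(N);
    for (int k = 0; k < N; ++k)
    {
        float angle = -2.0f * 3.14159265f * (float)k / (float)N;
        table[k] = std::complex<float>(cosf(angle), sinf(angle));
    }
    return table;
}

// Inside the transform, exp(-j*2*pi*(n*k)/N) is then table[(n * k) % N].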
Apart from the above-mentioned "pre-calculate the butterfly weights" optimisation, most FFT implementations also use SIMD instructions to vectorise the code.
// and I see no reason to use any bit reversal - where should it be?
The very first butterfly loop should be indexed in bit-reversed order. Those indexes are usually calculated inside the recursion, but for a loop-based solution calculating them on the fly is also costly, so it's better to pre-calculate them in the plan as well.
Combining those optimisation approaches results in approximately a 100x speedup.
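A sketch of that pre-calculation, for a power-of-two FFT size N = 2^bits (done once when the plan is created, never inside the transform):

#include <vector>

// For N = 2^bits, rev[i] is i with its low-order 'bits' bits reversed.
// An iterative radix-2 FFT first permutes the input as x[rev[i]] so that the
// in-place butterfly passes produce the output in natural order.
std::vector<unsigned> MakeBitReversalTable(unsigned bits)
{
    const unsigned N = 1u << bits;
    std::vector<unsigned> rev(N);
    for (unsigned i = 0; i < N; ++i)
    {
        unsigned r = 0, v = i;
        for (unsigned b = 0; b < bits; ++b)
        {
            r = (r << 1) | (v & 1u);
            v >>= 1;
        }
        rev[i] = r;
    }
    return rev;
}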
Most fast FFT implementations either use a lookup table of precomputed twiddle factors, or a simple recursion to rotate the twiddle factors on the fly, instead of calling trigonometric math library functions inside the FFT inner loop.
For large FFTs, using a trig recursion formula is less likely to thrash the data caches on contemporary processors.

Efficiency of Breadth First Search

What would be the most efficient way to compute the fewest hops it takes to get from x1, y1 to x2, y2 on an unbounded/infinite chess board? Assume that from x1, y1 we can always generate a set of legal moves.
This sounds tailor made for BFS and I have implemented one successfully. But its space and time complexity seem atrocious if x2, y2 is arbitrarily large.
I have been looking at various other algorithms like A*, Bidirectional search, iterative deepening DFS etc but so far I am clueless as to which approach would yield the most optimal (and complete) solution. Is there some insight I am missing?
If the set of legal moves is independent of the current square, then this seems ideal for formulation as an integer linear programming (ILP) problem. You'd basically solve for the number of each type of move such that the total number of moves is minimized. For instance, for a knight constrained to move only up and to the right (so that each move is either x+=1, y+=2 or x+=2, y+=1), you'd minimize a1+a2, subject to 2*a1+a2 == x2-x1, a1+2*a2 == y2-y1, a1 >= 0, a2 >= 0. While ILPs in general are NP-complete, I'd expect a standard hill-climbing algorithm to be able to solve it quite efficiently.
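For that toy system the answer can even be worked out in closed form, by solving the two equality constraints as simultaneous equations:

    a1 = (2*(x2-x1) - (y2-y1)) / 3
    a2 = (2*(y2-y1) - (x2-x1)) / 3

which is a feasible plan exactly when both values come out as non-negative integers; otherwise the integrality and non-negativity constraints are what the ILP solver has to wrestle with.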
I don't have a complete proof yet, but I believe that if x1,y1 and x2,y2 are far apart in both directions, then any optimal solution will contain many moves that head directly toward x2 and directly toward y2 (there are 2 possible L-shaped moves heading in this direction). If the current x is close to x2 but the current y is far from y2, for example, then alternate between the two moves that advance two squares toward y2; and similarly if y is close to y2 while x and x2 are far apart. Then, as soon as both the vertical and horizontal distances to x2,y2 are below some fairly small threshold (probably around 5 or 10), you solve the remaining problem with BFS or whatever to get the optimal solution, and the solution you get should be guaranteed to be optimal overall. I'll update my answer when I have a proof, but I am almost certain this is true. If so, it means that no matter how far apart x1,y1 and x2,y2 are, you essentially only have to solve a problem where the horizontal and vertical distances are around 5 or 10, which can be done quickly.
To expand on the discussion in the comments: an uninformed search like breadth-first search (BFS) will find the optimal solution (the shortest path). However, it only considers the cost so far, g(n), of a node n, and the cost of the search increases exponentially with the distance from source to target. To tame the cost of the search whilst still ensuring that it finds the optimal solution, you need to add some information to the search algorithm via a heuristic, h(n).
Your case is a good fit for A* search, where the heuristic is a measure of the distance from a node to the target (x2, y2). You could use the Euclidean distance "as the crow flies", but as you're considering a knight, Manhattan distance might be more appropriate. Whatever measure you choose, it has to be less than (or equal to) the actual distance from the node to the target for the search to find the optimal solution (in this case the heuristic is known as "admissible"). Note that you need to divide each distance by a constant in order to get it to underestimate the number of moves: divide by 3 for the Manhattan distance, and by sqrt(5) for the Euclidean distance (sqrt(5) is the length of the diagonal of a 2-by-1 square).
When you're running the algorithm, you estimate the total distance f(n) of any node n that you've reached as the distance so far plus the heuristic distance; i.e. f(n) = g(n) + h(n), where g(n) is the distance from (x1,y1) to node n and h(n) is the estimated heuristic distance from node n to (x2,y2). Of the nodes you've reached, you always expand the node n with the lowest f(n). I like the way you put it:
maintain a priority queue of nodes to be checked out ordered by g(n) + h(n).
If the heuristic is admissible then the algorithm finds the optimal solution because a suboptimal path can never be at the front of the priority queue: any fragment of the optimal path will always have a lower total distance (where, again, total distance is incurred distance plus heuristic distance).
The distance measure we've chosen here is monotonic (i.e. it increases as the path lengthens, rather than going up and down). In this case it's possible to show that the search is also efficient. As usual, see Wikipedia or other sources on the web for more details. The Colorado State University page is particularly good, with nice discussions on optimality and efficiency.
Taking an example of going from (-200,-100) to (0,0), which is equivalent to your example of (0,0) to (200,100), in my implementation what we see with a Manhattan heuristic is as follows
The implementation does too much searching because, with the heuristic h = Manhattan distance, steps of 1 across and 2 up look just as good as the optimal steps of 2 across and 1 up, i.e. the f() values don't distinguish the two. However the algorithm still finds the optimal solution of 100 moves. It takes 2118 steps, which is still a lot better than breadth-first search, which spreads out like an ink blot (I estimate it might take 20000 to 30000 steps).
How does it do if you choose h = Euclidean distance?
This is a lot better! It only takes 104 steps, and it does so well because it incorporates our intuition that you need to head in roughly the right direction. But before we congratulate ourselves, let's try another example, from (-200,0) to (0,0). Both heuristics find an optimal path of length 100. The Euclidean heuristic takes 12171 steps to find an optimal path, as shown below.
Whereas the Manhattan heuristic takes 16077 steps.
Leaving aside the fact that the Manhattan heuristic does worse here, again, I believe the real problem is that there are multiple optimal paths. This isn't so strange: a re-ordering of an optimal path is still optimal. This fact is automatically taken into account by recasting the problem in a mathematical form along the lines of @Sneftel's answer.
In summary, A* with an admissible heuristic produces an optimal solution more efficiently than BFS does, but it is likely that there are more efficient solutions out there. A* is a good default algorithm in cases where you can easily come up with a distance heuristic, and although in this case it isn't going to be the best solution, it's possible to learn a lot about the problem by implementing it.
Code below in C++ as you requested.
#include <memory>
using std::shared_ptr;
#include <vector>
using std::vector;
#include <queue>
using std::priority_queue;
#include <map>
using std::map;
using std::pair;
#include <math.h>
#include <iostream>
using std::cout;
#include <fstream>
using std::ofstream;

struct Point
{
    short x;
    short y;
    Point(short _x, short _y) { x = _x; y = _y; }
    bool IsOrigin() { return x == 0 && y == 0; }
    bool operator<(const Point& p) const {
        return pair<short, short>(x, y) < pair<short, short>(p.x, p.y);
    }
};

class Path
{
    Point m_end;
    shared_ptr<Path> m_prev;
    int m_length; // cached
public:
    Path(const Point& start)
        : m_end(start)
    { m_length = 0; }
    Path(const Point& start, shared_ptr<Path> prev)
        : m_end(start)
        , m_prev(prev)
    { m_length = m_prev->m_length + 1; }
    Point GetEnd() const { return m_end; }
    int GetLength() const { return m_length; }
    vector<Point> GetPoints() const
    {
        vector<Point> points;
        for (const Path* curr = this; curr; curr = curr->m_prev.get()) {
            points.push_back(curr->m_end);
        }
        return points;
    }
    double g() const { return m_length; }
    //double h() const { return (abs(m_end.x) + abs(m_end.y)) / 3.0; } // Manhattan
    double h() const { return sqrt((m_end.x*m_end.x + m_end.y*m_end.y)/5.0); } // Euclidean
    double f() const { return g() + h(); }
};

bool operator<(const shared_ptr<Path>& p1, const shared_ptr<Path>& p2)
{
    return 1/p1->f() < 1/p2->f(); // invert so the smallest f() is at the top of the priority_queue
}

int main()
{
    const Point source(-200, 0);
    const Point target(0, 0);
    priority_queue<shared_ptr<Path>> q;
    q.push(shared_ptr<Path>(new Path(source)));
    map<Point, short> endPath2PathLength;
    endPath2PathLength.insert(map<Point, short>::value_type(source, 0));
    int pointsExpanded = 0;
    shared_ptr<Path> path;
    while (!(path = q.top())->GetEnd().IsOrigin())
    {
        q.pop();
        const short newLength = path->GetLength() + 1;
        for (short dx = -2; dx <= 2; ++dx) {
            for (short dy = -2; dy <= 2; ++dy) {
                if (abs(dx) + abs(dy) == 3) { // the eight knight moves
                    const Point newEnd(path->GetEnd().x + dx, path->GetEnd().y + dy);
                    auto existingEndPath = endPath2PathLength.find(newEnd);
                    if (existingEndPath == endPath2PathLength.end() ||
                        existingEndPath->second > newLength) {
                        q.push(shared_ptr<Path>(new Path(newEnd, path)));
                        endPath2PathLength[newEnd] = newLength;
                    }
                }
            }
        }
        pointsExpanded++;
    }
    cout << "Path length " << path->GetLength()
         << " (points expanded = " << pointsExpanded << ")\n";
    ofstream fout("Points.csv");
    for (auto i : endPath2PathLength) {
        fout << i.first.x << "," << i.first.y << "," << i.second << "\n";
    }
    vector<Point> points = path->GetPoints();
    ofstream fout2("OptimalPoints.csv");
    for (auto i : points) {
        fout2 << i.x << "," << i.y << "\n";
    }
    return 0;
}
Note this isn't very well tested so there may well be bugs but I hope the general idea is clear.

Explain this algorithm (comparing points in the SURF algorithm)

I need to know if this algorithm is a known one:
void getMatches(IpVec& ipts1, IpVec& ipts2, IpPairVec& matches, float ratio) {
    float dist, d1, d2;
    Ipoint* match = nullptr;
    matches.clear();
    for (unsigned int i = 0; i < ipts1.size(); i++) {
        d1 = d2 = FLT_MAX;
        for (unsigned int j = 0; j < ipts2.size(); j++) {
            dist = ipts1[i] - ipts2[j];
            if (dist < d1) // if this feature matches better than the current best
            {
                d2 = d1;
                d1 = dist;
                match = &ipts2[j];
            } else if (dist < d2) // this feature matches better than the second best
            {
                d2 = dist;
            }
        }
        // If the best match has a d1:d2 ratio < 0.65, the ipoints are a match
        if (d1 / d2 < ratio) {
            // Store the change in position
            ipts1[i].dx = match->x - ipts1[i].x;
            ipts1[i].dy = match->y - ipts1[i].y;
            matches.push_back(std::make_pair(ipts1[i], *match));
        }
    }
}
class Ipoint {
public:
    //! Destructor
    ~Ipoint() {}

    //! Constructor
    Ipoint() : orientation(0) {}

    //! Gets the distance in descriptor space between Ipoints
    float operator-(const Ipoint& rhs) {
        float sum = 0.f;
        for (int i = 0; i < 64; ++i) {
            sum += (this->descriptor[i] - rhs.descriptor[i]) * (this->descriptor[i] - rhs.descriptor[i]);
        }
        return sqrt(sum);
    }

    //! Coordinates of the detected interest point
    float x, y;
    //! Detected scale
    float scale;
    //! Orientation measured anti-clockwise from +ve x-axis
    float orientation;
    //! Sign of the Laplacian, for fast matching purposes
    int laplacian;
    //! Vector of descriptor components
    float descriptor[64];
    //! Placeholders for point motion (can be used for frame-to-frame motion analysis)
    float dx, dy;
    //! Used to store the cluster index
    int clusterIndex;
};
This compares the results of the SURF algorithm.
Is this a nearest-neighbor algorithm? It looks like the function is searching for the nearest point to every point.
Could I do the same using a quadtree or a kd-tree?
Is there a better algorithm for comparing two images' points and determining whether they are the same or similar?
Preferably I want to store them in MySQL and build a kd-tree to compare one image against all images - is that possible?
Is RANSAC useful for anything in this task?
Is there any method for catching false positives?
You've asked a lot of questions and I don't think I can answer all of them, but here are answers to as much of your question as I can.
This is most certainly a nearest-neighbor algorithm where the goal is to find the two closest points to each point in the first vector and then check whether the ratio of their distances is less than some cutoff value.
You could do this with a quadtree or kd-tree, but because your points are all one-dimensional values you could do much better using a balanced binary search tree. Given such a tree, if you thread a linked list through the nodes, you can find the k nearest neighbors of some test point p by looking up the closest element to p in the binary search tree, then traversing (k + 1) steps in each direction and taking the k closest points of what you find. This runs in time O(lg n + k), where n is the number of points and k is as above. This is substantially more efficient than what you have now, which takes O(n) time per lookup.
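A minimal sketch of that lookup, assuming the one-dimensional values are floats and using std::set as the balanced binary search tree (its ordered iterators stand in for the threaded linked list; all names here are mine):

#include <cmath>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Find the k values in 'tree' closest to p: locate p with lower_bound (O(lg n)),
// then walk outwards in both directions, always taking the nearer neighbour.
std::vector<float> kNearest(const std::set<float>& tree, float p, std::size_t k)
{
    auto right = tree.lower_bound(p); // first element >= p
    auto left = right;                // walks towards smaller elements
    std::vector<float> result;
    while (result.size() < k && (left != tree.begin() || right != tree.end()))
    {
        bool takeRight = (left == tree.begin()) ||
            (right != tree.end() &&
             std::fabs(*right - p) < std::fabs(*std::prev(left) - p));
        if (takeRight) result.push_back(*right++);
        else           result.push_back(*--left);
    }
    return result;
}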
If the dimensionality of your feature vector is more than 1, but less than 20, then using kd-trees would be a very effective measure.
For higher dimensionalities, you might want to either reduce the number of dimensions with PCA before applying a kd-tree or use a more scalable ANN structure, such as locality-sensitive hashing.
SURF is best for scene and object detection. If you need to find out whether two images are the same, you would do better with global descriptor algorithms, such as GIST. The advantage of using a global descriptor is that you get a single vector for the whole image, and image comparison is performed with simple Euclidean distance.
You could definitely do this using MySQL because you don't need a kd-tree. A simple balanced binary tree should be sufficient.
RANSAC is a method of estimating model parameters that is robust against outliers. It is useful when using SURF features to combine multiple photographs into a 3D scene.
Checking for false positives is definitely a machine learning exercise and I'm not well-trained in that area. You could probably do this using a supervised learning algorithm (such as an SVM, boosted decision tree, or neural network), but I don't know enough to advise you on this.
Hope this helps!
I'll just answer the RANSAC question, since templatetypedef addressed the rest. RANSAC is a parameter-estimation method (kind of like finding the line of best fit to a set of data points), so it's not directly useful in nearest-neighbor searches.