How to fix the number of SIFT keypoints? - c++

I am trying to use SIFT descriptors that are directly used for image classification. The SIFT is defined by: Ptr<SIFT> sift = SIFT::create(100). Then I expect there would be 100 keypoints to be extracted. But the number of actually detected keypoints (sift->detect(img_resiz,keypoints)) is not always 100 (sometimes exceeding the preset value). How could that happen?
I want to have the fixed number of keypoints per image so as to produce the consistent length of descriptors (after being reshaped into a row vector) among different images (alternatively I may need more processing based on the bag-of-word to represent the sift descriptors into the same dimension).

There was an error in the function KeyPointsFilter::retainBest(std::vector<KeyPoint>& keypoints, int n_points) as you can see here: https://github.com/opencv/opencv/commit/3f3c8823ac22e34a37d74bc824e00a807535b91b.
I could reproduce the error with an older version of OpenCV (3.4.5) and sometimes you had 1 more KeyPoint than expected e.g. 101 instead of 100 because of that marked line.
If you don't want to switch to a newer OpenCV version you could do something like:
// Detect SIFT keypoints
std::vector<cv::KeyPoint> keypoints_sift, keypoints_sift_100;
cv::Ptr<cv::xfeatures2d::SiftFeatureDetector> sift = cv::xfeatures2d::SiftFeatureDetector::create(100);
sift->detect(img, keypoints_sift);
std::cout << keypoints_sift.size() << std::endl;
for (size_t i = 0; i < 100; ++i) {
keypoints_sift_100.push_back(keypoints_sift[i]);
}
So you would keep the 100 best keypoints after detection since they are ranked by their scores https://docs.opencv.org/4.1.0/d5/d3c/classcv_1_1xfeatures2d_1_1SIFT.html.

Related

Using custom distance function with FLANN or Knn in OpenCv?

I'm working on code that computes dense SIFT features from a set of images, based on SIFT flow: http://people.csail.mit.edu/celiu/SIFTflow/
I'd like to try building a FLANN index on these images by comparing the "energy" between each image in SIFT flow representation.
I have the code to compute the energy from here: http://richardt.name/publications/video-deanaglyph/
Is there a way to create my own distance function for the indexing?
RELATED NOTE:
I was finally able to get an alternate (but not custom) distance function working with flann::Index. The trick is you need to use flann::GenericIndex like so:
flann::GenericIndex<cvflann::ChiSquareDistance<int>> flannIndex(descriptors, cvflann::KDTreeIndexParams());
But you need to give it CV_32S descriptors.
And if you use knnSearch with custom distance function, you have to provide a CV_32S results Mat and CV_32F distances Mat.
Here's my full code in case it's helpful (not a lot of documentation out there):
Mat samples;
loadDescriptors(samples); // loading descriptors from .yml file
samples *= 100000; // scaling up my descriptors to be int
samples.convertTo(samples, CV_32S); // convert float to int
// create flann index
flann::GenericIndex<cvflann::ChiSquareDistance<int>> flannIndex(samples, cvflann::KDTreeIndexParams());
// NOTE lack of distance type in constructor parameters
// (unlike flann::index)
// now try knnSearch
int k=10; // find 10 nearest neighbors
Mat results(1,10,CV_32S), dists(1,10,CV_32F);
// (1,10) Mats for the output, types CV_32S and CV_32F
Mat responseHistogram;
responseHistogram = samples.row(60);
// choose a random row from the descriptors Mat
// to find nearest neighbors
flannIndex.knnSearch(responseHist, results, dists, k, cvflann::SearchParams(200) );
cout << results << endl;
cout << dists << endl;
flannIndex.save(ofToDataPath("indexChi2.txt"));
Using Chi Squared actually seems to work better for me than L2 distance. My feature vectors are BoW histograms in this case.

OCR & OpenCV: Difference between two frames on high resolution images

According to this post OCR: Difference between two frames, I now know how to find pixel differences between two images with OpenCV.
I would like to improve this solution and use it with high resolution images (from a video) with rich content. The example above is not applicable with big images because the process is to slow (too much differences found, the "findCountours method" fills the tab with 250k elements which takes a huge time to process).
My application uses a RLE decoder to decode the compressed frames of the video. Once the frame is decoded, I would like to compare the current frame with the previous one in order to store the differences between the two frames in a "Mat" tab for example.
The goal of all of this is to be able to perform an analysis on the different pixels and to check if there is any latin character. This allows me to reduce the amount of pixels to analyze and to save precious time.
If anyone has other ideas instead of this one to perform such operations, feel free to propose it please.
Thank you for your help.
EDIT 1:
Example of two high resolution images of a computer screen. These are for the moment the perfect example of what I'm trying to analyse. As we can see there is just a window as difference between the two big images and I would like to analyze just the new "Challenge" window for any character.
EDIT 2:
I'm trying to tune the algorithm depending on the data analyzed. Typically on the two following pictures I only get the green lines as differences and no text at all (which is what is the most interesting). I'm trying to understand better how things work for this.
1st image:
2nd image:
3rd image:
As you can see I only have those green lines and never the text (at the best I can have just ONE letter when decreasing the countours[i].size())
In addition to the post you mentioned, you need to:
When you binarize the mask, use a threshold higher then 0 to remove small differences.
Remove some noise. You can find all connected components, and remove smaller ones.
Find the area of the bigger connected components. You can use convexHull and fillConvexPoly to get the mask of the different objects on screen
Copy the second image to a new image, with the given mask.
The result will look like:
Code:
#include <opencv2/opencv.hpp>
#include <vector>
using namespace std;
using namespace cv;
int main()
{
Mat3b img1 = imread("path_to_image_1");
Mat3b img2 = imread("path_to_image_2");
Mat3b diff;
absdiff(img1, img2, diff);
// Split each channel
vector<Mat1b> masks;
split(diff, masks);
// Create a black mask
Mat1b mask(diff.rows, diff.cols, uchar(0));
// OR with each channel of the N channels mask
for (int i = 0; i < masks.size(); ++i)
{
mask |= masks[i];
}
// Binarize mask
mask = mask > 100;
// Results images
vector<Mat3b> difference_images;
// Remove small blobs
//Mat kernel = getStructuringElement(MORPH_RECT, Size(5,5));
//morphologyEx(mask, mask, MORPH_OPEN, kernel);
// Find connected components
vector<vector<Point>> contours;
findContours(mask.clone(), contours, CV_RETR_EXTERNAL, CHAIN_APPROX_NONE);
for (int i = 0; i < contours.size(); ++i)
{
if (contours[i].size() > 1000)
{
Mat1b mm(mask.rows, mask.cols, uchar(0));
vector<Point> hull;
convexHull(contours[i], hull);
fillConvexPoly(mm, hull, Scalar(255));
Mat3b difference_img(img2.rows, img2.cols, Vec3b(0,0,0));
img2.copyTo(difference_img, mm);
difference_images.push_back(difference_img.clone());
}
}
return 0;
}

C++ OpenCV SVM Predict Not Working

Trying to create a functional SVM. I have 114 training images, 60 Positive/54 Negative, and 386 testing images for the SVM to predict against.
I read in the training image features to float like this:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
And the same for the testing images too:
testDataFloat[i][0] = testFeatures.rows;
testDataFloat[i][2] = testFeatures.cols;
Then, using Micka's answer to this question, I turn the testDataFloat into a 1 Dimensional Array, and feed it to a Mat like this so to predict on the SVM:
float* testData1D = (float*)testDataFloat;
Mat testDataMat1D(height*width, 1, CV_32FC1, testData1D);
float testPredict = SVMmodel.predict(testDataMat1D);
Once this was all in place, there is the Debug Error of:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
Looking at this post I found (Thanks to berak) that:
"all images (used in training & prediction) have to be the same size"
So I included a re-size function that would re-size the images to be all square at whatever size you wished (100x100, 200x200, 1000, 1000 etc.)
Run it again with the images re-sized to a new directory that the program now loads the images in from, and I get the exact same error as before of:
Sizes of input arguments do not match (the sample size is different from what has been used for training) in cvPreparePredictData
I just have no idea anymore on what to do. Why is it still throwing that error?
EDIT
I changed
Mat testDataMat1D(TestDFheight*TestDFwidth, 1, CV_32FC1, testData1D);
to
Mat testDataMat1D(1, TestDFheight*TestDFwidth, CV_32FC1, testData1D);
and placed the .predict inside the loop that the features are given to the float so that each image is given to the .predict individually because of this question. With the to int swapped so that .cols = 1 and .rows = TestDFheight*TestDFwidth the program seems to actually run, but then stops on image 160 (.exe has stopped working)... So that's a new concern.
EDIT 2
Added a simple
std::cout << testPredict;
To view the determined output of the SVM, and it seems to be positively matching everything until Image 160, where it stops running:
Please check your training and test feature vector.
I'm assuming your feature data is some form of cv::Mat containing features on each row.
In which case you want your training matrix to be a concatenation of each feature matrix from each image.
These line doesn't look right:
trainingDataFloat[i][0] = trainFeatures.rows;
trainingDataFloat[i][1] = trainFeatures.cols;
This is setting an element of a 2d matrix to the number of rows and columns in trainFeatures. This has nothing to do with the actual data that is in the trainFeatures matrix.
What are you trying to detect? If each image is a positive and negative example, then are you trying to detect something in an image? What are your features?
If you're trying to detect an object in the image on a per image basis, then you need a feature vector describing the whole image in one vector. In which case you'd do something like this with your training data:
int N; // Set to number of images you plan on using for training
int feature_size; // Set to the number of features extracted in each image. Should be constant across all images.
cv::Mat X = cv::Mat::zeros(N, feature_size, CV_32F); // Feature matrix
cv::Mat Y = cv::Mat::zeros(N, 1, CV_32F); // Label vector
// Now use a for loop to copy data into X and Y, Y = +1 for positive examples and -1 for negative examples
for(int i = 0; i < trainImages.size(); ++i)
{
X.row(i) = trainImages[i].features; // Where features is a cv::Mat row vector of size N of the extracted features
Y.row(i) = trainImages[i].isPositive ? 1:-1;
}
// Now train your cv::SVM on X and Y.

How to detect object from video using SVM

This is my code for training the dataset of for example vehicles , when it train fully , i want it to predict the data(vehicle) from video(.avi) , how to predict trained data from video and how to add that part in it ? , i want that when the vehicle is shown in the video it count it as 1 and cout that the object is detected and if second vehicle come it increment the count as 2
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count=0;
for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
{
count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName, 0);
detector.detect(img2, keypoint1);
bowDE.compute(img2, keypoint1, bowDescriptor1);
trainingData.push_back(bowDescriptor1);
labels.push_back((float) j);
}
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);;
for(j = 1; j <= 4; j++)
for(i = 1; i <= 60; i++)
{
sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
const char* imageName = ch;
img2 = cvLoadImage(imageName,0);
detector.detect(img2, keypoint2);
bowDE.compute(img2, keypoint2, bowDescriptor2);
evalData.push_back(bowDescriptor2);
groundTruth.push_back((float) j);
float response = svm.predict(bowDescriptor2);
results.push_back(response);
}
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question isThis code is not predicting from video , i want to know how to predict it from the video , mean like i want to detect the vehicle from movie , like it should show 1 when it find the vehicle from movie
For those who didn't understand the question :
I want to play a movie in above code
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose i have a trained data which contain vehicle's , and "movie.avi" contain 5 vehicles , so it should detect that vehicles from the movie.avi and give me 5 as output
How to do this part in the above code
From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video it's hard to tell, if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even if the setup might be a little off, conceptually. You cannot retrieve the number of cars reliably with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px as well as your training data would. You might fixate the scale of the cars in your training data, but in real-world videos, the cars will be of different size. In order to find a car of different scale, let's say it's two-times as large as in the training data, the typical approach is to scale the image (say with a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales you end up with an algorithm that gives you for each pixel location and each scale the result of your classifier. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say 2) the image gets smaller (/larger) and the next scale is an image of different size (for eample half the size).
The pixel locations indicate the position of the car and the scale indicates the size of it. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then you would end up with each slot containing 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1 and output the count as the number of cars, you still have the problem that there could be multiple responses for the same car. Thus, it would be better if you had a car detector that gives continous responses between 0 and 1 and then you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.
The code you have for using images is written with OpenCV's C interface so it's probably easy to stick with that rather than use the C++ video interface.
In which case somthing along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while(img = cvQueryFrame(capture))
{
// Process image
...
}
You should implement a sliding window approach. In each window, you should apply the SVM to get candidates. Then, once you've done it for the whole image, you should merge the candidates (if you detected an object, then it is very likely that you'll detect it again in shift of a few pixels - that's the meaning of candidates).
Take a look at the V&J code at openCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the LatentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.
You need detector, not classifier. Take a look at Haar cascades, LBP cascades, latentSVM, as mentioned before or HOG detector.
I'll explain why. Detector usually scan image by sliding window, line by line. In several scales. In every window detector solve problem: "object/not object". It may give you rough results but very fast. Classifiers such as BOW, works very slow for this task. Then you should apply classifiers to regions, found by detector.

Matching thermographic / non-thermographic images with OpenCV feature detectors

I’m currently working on building software which can match infrared and non-infrared images taken from a fixed point using a thermographic camera.
The use case is the following: A picture is taken from using a tripod of a fixed point using an infrared thermographic camera and a standard camera. After taking the pictures, the photographer wants to match images from each camera. There will be some scenarios where an image is taken with only one camera as the other image type is unnecessary. Yes, it may be possible for the images to be matched using timestamps, but the end-user demands they be matched using computer vision.
I've looked at other image matching posts on StackOverflow -- they have often focused on using histogram matching and feature detectors. Histogram matching is not an option here, as we cannot match colors between the two image types. As a result, I've developed an application which does feature detection. In addition to standard feature detection, I’ve also added some logic which says that two keypoints cannot be matching if they are not within a certain margin of each other (a keypoint on the far left of the query image cannot match a keypoint on the far right of the candidate image) -- this process occurs in stage 3 of the code below.
To give you an idea of the current output, here is a valid and invalid match produced -- note the thermographic image is on the left. My objective is to improve the accuracy of the matching process.
Valid match:
Invalid match:
Here is the code:
// for each candidate image specified on the command line, compare it against the query image
Mat img1 = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE); // loading query image
for(int candidateImage = 0; candidateImage < (argc - 2); candidateImage++) {
Mat img2 = imread(argv[candidateImage + 2], CV_LOAD_IMAGE_GRAYSCALE); // loading candidate image
if(img1.empty() || img2.empty())
{
printf("Can't read one of the images\n");
return -1;
}
// detecting keypoints
SiftFeatureDetector detector;
vector<KeyPoint> keypoints1, keypoints2;
detector.detect(img1, keypoints1);
detector.detect(img2, keypoints2);
// computing descriptors
SiftDescriptorExtractor extractor;
Mat descriptors1, descriptors2;
extractor.compute(img1, keypoints1, descriptors1);
extractor.compute(img2, keypoints2, descriptors2);
// matching descriptors
BFMatcher matcher(NORM_L1);
vector< vector<DMatch> > matches_stage1;
matcher.knnMatch(descriptors1, descriptors2, matches_stage1, 2);
// use nndr to eliminate weak matches
float nndrRatio = 0.80f;
vector< DMatch > matches_stage2;
for (size_t i = 0; i < matches_stage1.size(); ++i)
{
if (matches_stage1[i].size() < 2)
continue;
const DMatch &m1 = matches_stage1[i][0];
const DMatch &m2 = matches_stage1[i][3];
if(m1.distance <= nndrRatio * m2.distance)
matches_stage2.push_back(m1);
}
// eliminate points which are too far away from each other
vector<DMatch> matches_stage3;
for(int i = 0; i < matches_stage2.size(); i++) {
Point queryPt = keypoints1.at(matches_stage2.at(i).queryIdx).pt;
Point trainPt = keypoints2.at(matches_stage2.at(i).trainIdx).pt;
// determine the lowest number here
int lowestXAxis;
int greaterXAxis;
if(queryPt.x <= trainPt.x) { lowestXAxis = queryPt.x; greaterXAxis = trainPt.x; }
else { lowestXAxis = trainPt.x; greaterXAxis = queryPt.x; }
int lowestYAxis;
int greaterYAxis;
if(queryPt.y <= trainPt.y) { lowestYAxis = queryPt.y; greaterYAxis = trainPt.y; }
else { lowestYAxis = trainPt.y; greaterYAxis = queryPt.y; }
// determine if these points are acceptable
bool acceptable = true;
if( (lowestXAxis + MARGIN) < greaterXAxis) { acceptable = false; }
if( (lowestYAxis + MARGIN) < greaterYAxis) { acceptable = false; }
if(acceptable == false) { continue; }
//// it's acceptable -- provide details, perform input
matches_stage3.push_back(matches_stage2.at(i));
}
// output how many individual matches were found for this training image
cout << "good matches found for candidate image # " << (candidateImage+1) << " = " << matches_stage3.size() << endl;
I used this sites code as an example. The problem I’m having is that the feature detection is not reliable, and I seem to be missing the purpose of the NNDR ratio. I understand that I am finding K possible matches for each point within the query image and that I have K = 2. But I don’t understand the purpose of this part within the example code:
vector< DMatch > matches_stage2;
for (size_t i = 0; i < matches_stage1.size(); ++i)
{
if (matches_stage1[i].size() < 2)
continue;
const DMatch &m1 = matches_stage1[i][0];
const DMatch &m2 = matches_stage1[i][1];
if(m1.distance <= nndrRatio * m2.distance)
matches_stage2.push_back(m1);
}
Any ideas on how I can improve this further? Any advice would be appreciated as always.
The validation you currently use
First stage
First of all, let's talk about the part of the code that you don't understand. The idea is to keep only "strong matches". Actually, your call of knnMatch finds, for each descriptor, the best two correspondences with respect to the Euclidean distance "L2"(*). This does not mean at all that these are good matches in reality, but only that those feature points are quite similar.
Let me try to explain your validation now, considering only one feature point in image A (it generalizes to all of them):
You match the descriptor of this point against image B
You get the two best correspondences with respect to the Euclidean distance (i.e. you get the two most similar points in image B)
If the distance from your point to the best correspondence is much smaller than the one from your point to the second-best correspondence, then you assume that it is a good match. In other words, there was only one point in image B that was really similar (i.e. had a small Euclidean distance) to the point in image A.
If both matches are too similar (i.e. !(m1.distance <= nndrRatio * m2.distance)), then you cannot really discriminate between them and you don't consider the match.
This validation has some major weaknesses, as you have probably observed:
Firstly, if the best matches you get from knnMatch are both terribly bad, then the best of those might be accepted anyway.
It does not take the geometry into account. Therefore, a point far on the left of the image might be similar to a point far on the right, even though in reality they clearly don't match.
* EDIT: Using SIFT, you describe each feature point in your image using a floating-point vector. By computing the Euclidean distance between two vectors, you know how similar they are. If both vectors are exactly the same, then the distance is zero. The smaller the distance, the more similar the points. But this is not geometric: a point on the left-hand side of your image might look similar to a point in the right-hand side. So you first find the pairs of points that look similar (i.e. "This point in A looks similar to this point in B because the Euclidean distance between their feature vectors is small") and then you need to verify that this match is coherent (i.e. "It is possible that those similar points are actually the same because they both are on the left-hand side of my image" or "They look similar, but that is incoherent because I know that they must lie on the same side of the image and they don't").
Second stage
What you do in your second stage is interesting since it considers the geometry: knowing that both images were taken from the same point (or almost the same point?), you eliminate matches that are not in the same region in both images.
The problem I see with this is that if both images weren't taken at the exact same position with the very same angle, then it won't work.
Proposition to improve your validation further
I would personally work on the second stage. Even though both images aren't necessarily exactly the same, they describe the same scene. And you can take advantage of the geometry of it.
The idea is that you should be able to find a transformation from the first image to the second one (i.e. the way in which a point moved from image A to image B is actually linked to the way all of the points moved). And in your situation, I would bet that a simple homography is adapted.
Here is my proposition:
Compute the matches using knnMatch, and keep stage 1 (you might want to try removing it later and observe the consequences)
Compute the best possible homography transform between those matches using cv::findHomography (choose the RANSAC algorithm).
findHomography has a mask output that will give you the inliers (i.e. the matches that were used to compute the homography transform).
The inliers will most probably be good matches since there will be coherent geometrically.
EDIT: I just found an example using findHomography here.
Haven't tried it with infrared/visible light photography, but mutual information metrics usually do a reasonable job when you have very different histograms for similar images.
Depending on how fast you need this to be and how many candidates there are, one way to leverage this would be to register the images using a mutual information metric and find the image pair where you end up with the lowest error. It would probably be a good idea to downsample the images to speed things up and reduce noise-sensitivity.
After extracting keypoints,forming descriptors and matching use some outlier removal algorithm like RANSAC. Opencv provide RANSAC with findHomography function.you can see the implementation.I have used this with SURF and it gave me reasonably good results.
Ideas:
a) use the super-resolution module to improve your input (OpenCV245).
b) use maximally stable local color regions as matching features (MSER).