My C++ code is not detecting objects correctly (YOLOv5, C++)

I have a YOLOv5 ONNX file that I trained on apples and bananas. I was using Python until today, but I decided to switch to C++ to gain some speed. I get correct results when I use YOLOv5's own ONNX files and image in the code below, but when I use my own ONNX file and my test image, I get wrong results. You can also find the attached image. What is the problem here?
// Include Libraries.
#include <opencv2/opencv.hpp>
#include <fstream>
// Namespaces.
using namespace cv;
using namespace std;
using namespace cv::dnn;
// Constants.
const float INPUT_WIDTH = 640.0;
const float INPUT_HEIGHT = 640.0;
const float SCORE_THRESHOLD = 0.3;
const float NMS_THRESHOLD = 0.4;
const float CONFIDENCE_THRESHOLD = 0.65;
// Text parameters.
const float FONT_SCALE = 0.7;
const int FONT_FACE = FONT_HERSHEY_SIMPLEX;
const int THICKNESS = 1;
// Colors.
Scalar BLACK = Scalar(0,0,0);
Scalar BLUE = Scalar(255, 178, 50);
Scalar YELLOW = Scalar(0, 255, 255);
Scalar RED = Scalar(0,0,255);
// Draw the predicted bounding box.
void draw_label(Mat& input_image, string label, int left, int top)
{
// Display the label at the top of the bounding box.
int baseLine;
Size label_size = getTextSize(label, FONT_FACE, FONT_SCALE, THICKNESS, &baseLine);
top = max(top, label_size.height);
// Top left corner.
Point tlc = Point(left, top);
// Bottom right corner.
Point brc = Point(left + label_size.width, top + label_size.height + baseLine);
// Draw black rectangle.
rectangle(input_image, tlc, brc, BLACK, FILLED);
// Put the label on the black rectangle.
putText(input_image, label, Point(left, top + label_size.height), FONT_FACE, FONT_SCALE, YELLOW, THICKNESS);
}
vector<Mat> pre_process(Mat &input_image, Net &net)
{
// Convert to blob.
Mat blob;
blobFromImage(input_image, blob, 1./255., Size(INPUT_WIDTH, INPUT_HEIGHT), Scalar(), true, false);
net.setInput(blob);
// Forward propagate.
vector<Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());
return outputs;
}
Mat post_process(Mat &input_image, vector<Mat> &outputs, const vector<string> &class_name)
{
// Initialize vectors to hold respective outputs while unwrapping detections.
vector<int> class_ids;
vector<float> confidences;
vector<Rect> boxes;
// Resizing factor.
float x_factor = input_image.cols / INPUT_WIDTH;
float y_factor = input_image.rows / INPUT_HEIGHT;
float *data = (float *)outputs[0].data;
const int dimensions = 85;
const int rows = 25200;
// Iterate through 25200 detections.
for (int i = 0; i < rows; ++i)
{
float confidence = data[4];
// Discard bad detections and continue.
if (confidence >= CONFIDENCE_THRESHOLD)
{
float * classes_scores = data + 5;
// Create a 1 x num_classes Mat holding the class scores.
Mat scores(1, class_name.size(), CV_32FC1, classes_scores);
// Perform minMaxLoc and acquire index of best class score.
Point class_id;
double max_class_score;
minMaxLoc(scores, 0, &max_class_score, 0, &class_id);
// Continue if the class score is above the threshold.
if (max_class_score > SCORE_THRESHOLD)
{
// Store class ID and confidence in the pre-defined respective vectors.
confidences.push_back(confidence);
class_ids.push_back(class_id.x);
// Center.
float cx = data[0];
float cy = data[1];
// Box dimension.
float w = data[2];
float h = data[3];
// Bounding box coordinates.
int left = int((cx - 0.5 * w) * x_factor);
int top = int((cy - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
// Store good detections in the boxes vector.
boxes.push_back(Rect(left, top, width, height));
}
}
// Jump to the next detection (85 values per row here).
data += 85;
}
// Perform Non Maximum Suppression and draw predictions.
vector<int> indices;
NMSBoxes(boxes, confidences, SCORE_THRESHOLD, NMS_THRESHOLD, indices);
for (int i = 0; i < indices.size(); i++)
{
int idx = indices[i];
Rect box = boxes[idx];
int left = box.x;
int top = box.y;
int width = box.width;
int height = box.height;
// Draw bounding box.
rectangle(input_image, Point(left, top), Point(left + width, top + height), BLUE, 3*THICKNESS);
// Get the label for the class name and its confidence.
string label = format("%.2f", confidences[idx]);
label = class_name[class_ids[idx]] + ":" + label;
// Draw class labels.
draw_label(input_image, label, left, top);
//cout<<"The Value is "<<label;
//cout<<endl;
}
return input_image;
}
int main()
{
vector<string> class_list;
ifstream ifs("/Users/admin/Documents/C++/First/obj.names");
string line;
while (getline(ifs, line))
{
class_list.push_back(line);
}
// Load image.
Mat frame;
frame = imread("/Users/admin/Documents/C++/First/test.jpg");
// Load model.
Net net;
net = readNet("/Users/admin/Documents/C++/First/my.onnx");
vector<Mat> detections;
detections = pre_process(frame, net);
Mat img = post_process(frame, detections, class_list);
//Mat img = post_process(frame.clone(), detections, class_list);
// Put efficiency information.
// The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)
vector<double> layersTimes;
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
string label = format("Inference time : %.2f ms", t);
putText(img, label, Point(20, 40), FONT_FACE, FONT_SCALE, RED);
imshow("Output", img);
waitKey(0);
return 0;
}
The photos I use are 640x480. I played around with the size of the photo, thinking it might be related, but the same problem persisted.

The YOLOv5 output format is xyxy, as can be seen here:
https://github.com/ultralytics/yolov5/blob/bfa1f23045c7c4136a9b8ced9d6be8249ed72692/detect.py#L161
not xywh, as you are assuming in your code.
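If your exported model really does emit corner coordinates, the box decoding inside post_process would change along these lines. This is only an untested sketch that replaces the centre/size math in the loop above under that assumption; everything else stays the same:

// Hypothetical decoding for an x1,y1,x2,y2 layout (sketch, not verified against your model).
float x1 = data[0];
float y1 = data[1];
float x2 = data[2];
float y2 = data[3];
// Scale the corners from the 640x640 network input back to the original image size.
int left = int(x1 * x_factor);
int top = int(y1 * y_factor);
int width = int((x2 - x1) * x_factor);
int height = int((y2 - y1) * y_factor);
boxes.push_back(Rect(left, top, width, height));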

Related

Slow motion in C++

I want to do slow motion. I've seen an implementation here: https://github.com/vaibhav06891/SlowMotion
I modified the code to generate only one frame.
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <fstream>
#include <string>
using namespace cv;
using namespace std;
#define CLAMP(x,min,max) ( ((x) < (min)) ? (min) : ( ((x) > (max)) ? (max) : (x) ) )
int main(int argc, char** argv)
{
Mat frame,prevframe;
prevframe = imread("img1.png");
frame = imread("img2.png");
Mat prevgray, gray;
Mat fflow,bflow;
Mat flowf(frame.rows,frame.cols ,CV_8UC3); // the forward co-ordinates for interpolation
flowf.setTo(Scalar(255,255,255));
Mat flowb(frame.rows,frame.cols ,CV_8UC3); // the backward co-ordinates for interpolation
flowb.setTo(Scalar(255,255,255));
Mat final(frame.rows,frame.cols ,CV_8UC3);
int fx,fy,bx,by;
cvtColor(prevframe,prevgray,COLOR_BGR2GRAY); // Convert to gray space for optical flow calculation
cvtColor(frame, gray, COLOR_BGR2GRAY);
calcOpticalFlowFarneback(prevgray, gray, fflow, 0.5, 3, 15, 3, 3, 1.2, 0); // forward optical flow
calcOpticalFlowFarneback(gray, prevgray, bflow, 0.5, 3, 15, 3, 3, 1.2, 0); //backward optical flow
for (int y=0; y<frame.rows; y++)
{
for (int x=0; x<frame.cols; x++)
{
const Point2f fxy = fflow.at<Point2f>(y,x);
fy = CLAMP(y+fxy.y*0.5,0,frame.rows);
fx = CLAMP(x+fxy.x*0.5,0,frame.cols);
flowf.at<Vec3b>(fy,fx) = prevframe.at<Vec3b>(y,x);
const Point2f bxy = bflow.at<Point2f>(y,x);
by = CLAMP(y+bxy.y*(1-0.5),0,frame.rows);
bx = CLAMP(x+bxy.x*(1-0.5),0,frame.cols);
flowb.at<Vec3b>(by,bx) = frame.at<Vec3b>(y,x);
}
}
final = flowf*(1-0.5) + flowb*0.5; // combination of the forward and backward matrices
cv::medianBlur(final,final,3);
imwrite( "output.png",final);
return 0;
}
But the result is not as expected.
For the images:
The result is:
Does anyone know what the problem is?
The optical flow algorithm won't work for your test images.
The first problem is that your test images have very little variation in neighbouring pixel values. Completely black lines and a single-colour square give the optical flow algorithm no clues about where the image areas moved, because the algorithm cannot process the whole image at once and computes optical flow within a small 15x15-pixel window (as you set it in calcOpticalFlowFarneback).
The second problem is that your test images differ too much: the distance between the positions of the brown square is too big, and again Farneback is not able to detect it.
Try the code with some real-life video frames, or edit your tests to be less monotonous (add some texture to the square, the background, and the rectangle lines) and bring the squares closer together in the two images (try a 2-10 px distance). You can also play with the calcOpticalFlowFarneback arguments (read here) to suit your conditions.
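For example, a larger winsize and more pyramid levels let Farneback tolerate larger displacements at the cost of fine detail. The values below are only guesses to start experimenting with, not tuned settings:

// Sketch: more pyramid levels and a larger window to cope with bigger motion (values are starting guesses).
calcOpticalFlowFarneback(prevgray, gray, fflow,
    0.5,  // pyr_scale: image scale between pyramid levels
    5,    // levels: more levels help with large displacements
    35,   // winsize: a larger window averages over a bigger area
    5,    // iterations per pyramid level
    7,    // poly_n: pixel neighborhood used for the polynomial expansion
    1.5,  // poly_sigma: Gaussian sigma for the polynomial expansion
    0);   // flags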
You can use this code to save the optical flow you get to an image for debugging:
Mat debugImage = Mat::zeros(fflow.size(), CV_8UC3);
float hsvHue, magnitude;
for (int x = 0; x < fflow.cols; x++)
{
for (int y = 0; y < fflow.rows; y++)
{
auto& item = fflow.at<Vec2f>(y, x);
magnitude = sqrtf(item[0] * item[0] + item[1] * item[1]);
hsvHue = atan2f(item[1], item[0]) / static_cast<float>(CV_PI)* 180.f;
// div 2 to fit 0..255 range
hsvHue = (hsvHue >= 0. ? hsvHue : (360.f + hsvHue)) / 2.f;
debugImage.at<Vec3b>(y, x)[0] = static_cast<uchar>(hsvHue);
debugImage.at<Vec3b>(y, x)[1] = 255;
debugImage.at<Vec3b>(y, x)[2] = static_cast<uchar>(255.f * magnitude);
}
}
cvtColor(debugImage, debugImage, COLOR_HSV2BGR);
imwrite("OpticalFlow.png", debugImage);
Here pixel flow direction will be represented with color (hue), and pixel move distance will be represented with brightness.
Try to use these images I created:
Also note that
for (int y = 0; y < frame.rows; y++)
{
for (int x = 0; x < frame.cols; x++)
{
const Point2f fxy = fflow.at<Point2f>(y, x);
fy = CLAMP(y + fxy.y*0.5, 0, frame.rows);
fx = CLAMP(x + fxy.x*0.5, 0, frame.cols);
flowf.at<Vec3b>(fy, fx) = prevframe.at<Vec3b>(y, x);
...
this code leaves some flowf pixels uncoloured when no source pixel maps onto them, and the optical flow algorithm can produce such situations. I would change it to:
for (int y = 0; y < frame.rows; y++)
{
for (int x = 0; x < frame.cols; x++)
{
const Point2f fxy = fflow.at<Point2f>(y, x);
fy = CLAMP(y - fxy.y*0.5, 0, frame.rows);
fx = CLAMP(x - fxy.x*0.5, 0, frame.cols);
flowf.at<Vec3b>(y, x) = prevframe.at<Vec3b>(fy, fx);
const Point2f bxy = bflow.at<Point2f>(y, x);
by = CLAMP(y - bxy.y*(1 - 0.5), 0, frame.rows);
bx = CLAMP(x - bxy.x*(1 - 0.5), 0, frame.cols);
flowb.at<Vec3b>(y, x) = frame.at<Vec3b>(by, bx);
}
}
With this changed code and my tests I get this output:

Calculate text size

Note: The question How to put text into a bounding box in OpenCV? is in some ways similar to this one, but it is not the same question. The OP of that question tried to spread text across the whole image, and the code in the accepted answer just resizes the text using a mask.
I'm using OpenCV with C++ to do some image detection and manipulation.
I want to align text of unknown length at a specific origin. The font scale should be calculated, because I'd like to specify a width factor for the maximum text width, as you can see in the image below:
This is the code I got so far:
int fontFace = cv::FONT_HERSHEY_DUPLEX,
fontScale = myTextString.size() / 10;
cv::Size textSize = getTextSize(image, fontFace, fontScale, 0, 0);
putText(image, myTextString, cv::Point( (origin.x + textSize.width)/2, (origin.y + textSize.height)/2 ), fontFace, fontScale, Scalar(255, 0, 0));
Something like this should do it. You can change how the margins are calculated to change the horizontal/vertical alignment of the font.
If the height doesn't matter, you can just set target.height to a large number.
void drawtorect(cv::Mat & mat, cv::Rect target, int face, int thickness, cv::Scalar color, const std::string & str)
{
cv::Size rect = cv::getTextSize(str, face, 1.0, thickness, 0);
double scalex = (double)target.width / (double)rect.width;
double scaley = (double)target.height / (double)rect.height;
double scale = std::min(scalex, scaley);
int marginx = scale == scalex ? 0 : (int)((double)target.width * (scalex - scale) / scalex * 0.5);
int marginy = scale == scaley ? 0 : (int)((double)target.height * (scaley - scale) / scaley * 0.5);
cv::putText(mat, str, cv::Point(target.x + marginx, target.y + target.height - marginy), face, scale, color, thickness, 8, false);
}
* edit *
// Sample code
int L = 80; // width and height per square
int M = 60;
cv::Mat m( 5*M, 7*L,CV_8UC3,cv::Scalar(0,0,0) );
// create checkerboard
for ( int y=0,ymax=m.rows-M;y<=ymax; y+=M)
{
int c = (y/M)%2 == 0 ? 0 : 1;
for ( int x=0,xmax=m.cols-L;x<=xmax;x+=L)
{
if ( (c++)%2!=0 )
continue; // skip odd squares
// convenient way to do this
m( cv::Rect(x,y,L,M) ).setTo( cv::Scalar(64,64,64) );
}
}
// fill checkerboard ROIs by some text
int64 id=1;
for ( int y=0,ymax=m.rows-M;y<=ymax; y+=M)
{
for ( int x=0,xmax=m.cols-L;x<=xmax;x+=L)
{
std::stringstream ss;
ss<<(id<<=1); // some increasing text input
drawtorect( m, cv::Rect(x,y,L,M), cv::FONT_HERSHEY_PLAIN,1,cv::Scalar(255,255,255),ss.str() );
}
}

Video Stabilization

I'm researching the field of video stabilization and implemented an application using OpenCV.
My progress so far:
Surf points extraction
Matching
estimateRigidTransform
warpAffine
But the resulting video is not stable. Can anyone help me with this problem or point me to some source code I can use to improve it?
Sample video: Hippo video
Here is my code [EDIT]
#include "stdafx.h"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/opencv.hpp>
const double smooth_level = 0.7;
using namespace cv;
using namespace std;
struct TransformParam
{
TransformParam() {}
TransformParam(double _dx, double _dy, double _da) {
dx = _dx;
dy = _dy;
da = _da;
}
double dx; // translation x
double dy; // translation y
double da; // angle
};
int main( int argc, char** argv )
{
VideoCapture cap ("test12.avi");
Mat cur, cur_grey;
Mat prev, prev_grey;
cap >> prev;
cvtColor(prev, prev_grey, COLOR_BGR2GRAY);
// Step 1 - Get previous to current frame transformation (dx, dy, da) for all frames
vector <TransformParam> prev_to_cur_transform; // previous to current
int k=1;
int max_frames = cap.get(CV_CAP_PROP_FRAME_COUNT);
VideoWriter writeVideo ("stable.avi",0,30,cvSize(prev.cols,prev.rows),true);
Mat last_T;
double avg_dx = 0, avg_dy = 0, avg_da = 0;
Mat smooth_T(2,3,CV_64F);
while(true) {
cap >> cur;
if(cur.data == NULL) {
break;
}
cvtColor(cur, cur_grey, COLOR_BGR2GRAY);
// vector from prev to cur
vector <Point2f> prev_corner, cur_corner;
vector <Point2f> prev_corner2, cur_corner2;
vector <uchar> status;
vector <float> err;
goodFeaturesToTrack(prev_grey, prev_corner, 200, 0.01, 30);
calcOpticalFlowPyrLK(prev_grey, cur_grey, prev_corner, cur_corner, status, err);
// weed out bad matches
for(size_t i=0; i < status.size(); i++) {
if(status[i]) {
prev_corner2.push_back(prev_corner[i]);
cur_corner2.push_back(cur_corner[i]);
}
}
// translation + rotation only
Mat T = estimateRigidTransform(prev_corner2, cur_corner2, false);
// in rare cases no transform is found. We'll just use the last known good transform.
if(T.data == NULL) {
last_T.copyTo(T);
}
T.copyTo(last_T);
// decompose T
double dx = T.at<double>(0,2);
double dy = T.at<double>(1,2);
double da = atan2(T.at<double>(1,0), T.at<double>(0,0));
prev_to_cur_transform.push_back(TransformParam(dx, dy, da));
avg_dx = (avg_dx * smooth_level) + (dx * (1- smooth_level));
avg_dy = (avg_dy * smooth_level) + (dy * (1- smooth_level));
avg_da = (avg_da * smooth_level) + (da * (1- smooth_level));
smooth_T.at<double>(0,0) = cos(avg_da);
smooth_T.at<double>(0,1) = -sin(avg_da);
smooth_T.at<double>(1,0) = sin(avg_da);
smooth_T.at<double>(1,1) = cos(avg_da);
smooth_T.at<double>(0,2) = avg_dx;
smooth_T.at<double>(1,2) = avg_dy;
Mat stable;
warpAffine(prev,stable,smooth_T,prev.size());
Mat canvas = Mat::zeros(cur.rows, cur.cols*2+10, cur.type());
prev.copyTo(canvas(Range::all(), Range(0, prev.cols)));
stable.copyTo(canvas(Range::all(), Range(prev.cols+10, prev.cols*2+10)));
imshow("before and after", canvas);
waitKey(20);
writeVideo.write(stable);
cur.copyTo(prev);
cur_grey.copyTo(prev_grey);
k++;
}
}
First, you can simply blur your image; it helps a bit. Second, you can smooth your transformation matrix with the simplest exponential smoothing, A_smoothed = a*A_smoothed + (1-a)*A_new, and play with the a value in the [0;1] range. Third, you can turn off some types of transformation, such as rotation or shift.
Here is code example:
t = estimateRigidTransform(cur, prev, 0); // false = restricted transform, not the full 6-parameter affine
if(!t.empty()){
// t(Range(0,2), Range(0,2)) = Mat::eye(2, 2, CV_64FC1); // turning off rotation
// t.at<double>(0,2) = 0; t.at<double>(1,2) = 0; // turning off shift dx and dy
tAvrg = tAvrg*a + t*(1-a); // a - smoothing level in the [0;1] range, play with it
warpAffine(cur, stable, tAvrg, Size(cur.cols, cur.rows)); // frames renamed to cur/prev because `new` is a C++ keyword
}

Finding HSV Thresholds Via Histograms with OpenCV

I'm trying to write a method that will find the proper threshold values in HSV space for an object placed at the center of the screen. These values are used by an object-tracking algorithm. I've tested that piece of code with hand-coded threshold values and it works well. The idea behind the method is that it should calculate the histogram for each of the channels and then return the 5th and 95th percentiles for each, to be used as the threshold values. (Credit: How to find RGB/HSV color parameters for color tracking?) The image being passed in is a picture of the object to be tracked (which is set by the user before the whole process begins). Here is the code:
std::vector<cv::Scalar> HSV_Threshold_Determiner::Get_Threshold_Values(const cv::Mat& image)
{
cv::Mat inputImage;
cv::cvtColor(image, inputImage, CV_BGR2HSV);
std::vector<cv::Mat> bgrPlanes;
cv::split(inputImage, bgrPlanes);
cv::Mat hHist, sHist, vHist;
int hMax = 180, svMax = 256;
float hRanges[] = { 0, (float)hMax };
const float* hRange = { hRanges };
float svRanges[] = { 0, (float)svMax };
const float* svRange = { svRanges };
//float sRanges[] = { 0, 256 };
cv::calcHist(&bgrPlanes[0], 1, 0, cv::Mat(), hHist, 1, &hMax, &hRange);
cv::calcHist(&bgrPlanes[1], 1, 0, cv::Mat(), sHist, 1, &svMax, &svRange);
cv::calcHist(&bgrPlanes[2], 1, 0, cv::Mat(), vHist, 1, &svMax, &svRange);
int totalEntries = image.cols * image.rows;
int fiveCutoff = (int)(totalEntries * .05);
int ninetyFiveCutoff = (int)(totalEntries * .95);
float hTotal = 0, sTotal = 0, vTotal = 0;
bool hMinFound = false, hMaxFound = false, sMinFound = false, sMaxFound = false,
vMinFound = false, vMaxFound = false;
cv::Scalar hThresholds;
cv::Scalar sThresholds;
cv::Scalar vThresholds;
for(int i = 0; i < vHist.rows; ++i)
{
if(i < hHist.rows)
{
hTotal += hHist.at<float>(i, 0);
if(hTotal >= fiveCutoff && !hMinFound)
{
hThresholds.val[0] = i;
hMinFound = true;
}
else if(hTotal>= ninetyFiveCutoff && !hMaxFound)
{
hThresholds.val[1] = i;
hMaxFound = true;
}
}
sTotal += sHist.at<float>(i, 0);
vTotal += vHist.at<float>(i, 0);
if(sTotal >= fiveCutoff && !sMinFound)
{
sThresholds.val[0] = i;
sMinFound = true;
}
else if(sTotal >= ninetyFiveCutoff && !sMaxFound)
{
sThresholds.val[1] = i;
sMaxFound = true;
}
if(vTotal >= fiveCutoff && !vMinFound)
{
vThresholds.val[0] = i;
vMinFound = true;
}
else if(vTotal >= ninetyFiveCutoff && !vMaxFound)
{
vThresholds.val[1] = i;
vMaxFound = true;
}
if(vMaxFound && sMaxFound && hMaxFound)
{
break;
}
}
std::vector<cv::Scalar> returnVect;
returnVect.push_back(hThresholds);
returnVect.push_back(sThresholds);
returnVect.push_back(vThresholds);
return returnVect;
}
What I am trying to do is sum up the number of entries in each bucket until I reach a count that is greater than or equal to five percent (and then ninety-five percent) of the total. Unfortunately, the numbers I get are never close to the ones I get when I do the thresholding by hand.
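In other words, what I want per channel is roughly the following (a sketch using the variable names from my code above, not the code I actually run):

// Sketch: walk one histogram, accumulate counts, and record the bins where the
// running total crosses the 5% and 95% cutoffs.
float running = 0.0f;
int lowBin = -1, highBin = -1;
for (int i = 0; i < hHist.rows; ++i)
{
    running += hHist.at<float>(i, 0);
    if (lowBin < 0 && running >= fiveCutoff)
        lowBin = i;
    if (highBin < 0 && running >= ninetyFiveCutoff)
    {
        highBin = i;
        break;
    }
}
// lowBin/highBin would then be the candidate min/max thresholds for that channel.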
Mat img = ... // from camera or some other source
// STEP 1: learning phase
Mat hsv, imgThreshed, processed, denoised;
cv::GaussianBlur(img, denoised, cv::Size(5,5), 2, 2); // remove noise
cv::cvtColor(denoised, hsv, CV_BGR2HSV);
// lets say we picked manually a region of 100x100 px with the interested color/object using mouse
cv::Mat roi = hsv(cv::Range(mousey-50, mousey+50), cv::Range(mousex-50, mousex+50)); // rows are the y range, cols are the x range
// must split all channels to get Hue only
std::vector<cv::Mat> hsvPlanes;
cv::split(roi, hsvPlanes);
// compute statistics for Hue value
cv::Scalar mean, stddev;
cv::meanStdDev(hsvPlanes[0], mean, stddev);
// ensure we get 95% of all valid Hue samples (statistics 3*sigma rule)
float minHue = mean[0] - stddev[0]*3;
float maxHue = mean[0] + stddev[0]*3;
// STEP 2: detection phase
cv::inRange(hsvPlanes[0], cv::Scalar(minHue), cv::Scalar(maxHue), imgThreshed);
imshow("thresholded", imgThreshed);
cv_erode(imgThreshed, processed, 5); // minimizes noise (cv_erode appears to be a custom wrapper around cv::erode)
cv_dilate(processed, processed, 20); // maximizes the remaining regions (cv_dilate appears to be a custom wrapper around cv::dilate)
imshow("final", processed);
//STEP 3: do some blob/contour detection on processed image & find maximum blob/region, etc ...
A much simpler solution: just calculate the mean and standard deviation for a region of interest, i.e. for the Hue values it contains.
Since Hue is the most stable component in the image, the other components (saturation and value) should be discarded, as they vary too much. However, you can still compute their means if needed.

Is there an easy way/algorithm to match 2 clouds of 2D points?

I am wondering if there is an easy way to match (register) 2 clouds of 2d points.
Let's say I have an object represented by points and a cluttered second image containing the object's points plus noise (noise in the sense of points that are useless).
Basically the object can be 2D rotated as well as translated and scaled.
I know there is the ICP algorithm, but I think it is not a good approach due to the high noise.
I hope you understand what I mean. Please ask if anything is unclear (I'm sure something is).
cheers
Here is a function that finds the translation and rotation. Generalizations to scaling, weighted points, and RANSAC are straightforward. I used the OpenCV library for visualization and SVD. The function below combines data generation, a unit test, and the actual solution.
// rotation and translation in 2D from point correspondences
void rigidTransform2D(const int N) {
// Algorithm: http://igl.ethz.ch/projects/ARAP/svd_rot.pdf
const bool debug = false; // print more debug info
const bool add_noise = true; // add noise to input and output
srand(time(NULL)); // randomize each time
/*********************************
* Create data with some noise
**********************************/
// Simulated transformation
Point2f T(1.0f, -2.0f);
float a = 30.0; // [-180, 180], see atan2(y, x)
float noise_level = 0.1f;
cout<<"True parameters: rot = "<<a<<"deg., T = "<<T<<
"; noise level = "<<noise_level<<endl;
// noise
vector<Point2f> noise_src(N), noise_dst(N);
for (int i=0; i<N; i++) {
noise_src[i] = Point2f(randf(noise_level), randf(noise_level));
noise_dst[i] = Point2f(randf(noise_level), randf(noise_level));
}
// create data with noise
vector<Point2f> src(N), dst(N);
float Rdata = 10.0f; // radius of data
float cosa = cos(a*DEG2RAD);
float sina = sin(a*DEG2RAD);
for (int i=0; i<N; i++) {
// src
float x1 = randf(Rdata);
float y1 = randf(Rdata);
src[i] = Point2f(x1,y1);
if (add_noise)
src[i] += noise_src[i];
// dst
float x2 = x1*cosa - y1*sina;
float y2 = x1*sina + y1*cosa;
dst[i] = Point2f(x2,y2) + T;
if (add_noise)
dst[i] += noise_dst[i];
if (debug)
cout<<i<<": "<<src[i]<<"---"<<dst[i]<<endl;
}
// Calculate data centroids
Scalar centroid_src = mean(src);
Scalar centroid_dst = mean(dst);
Point2f center_src(centroid_src[0], centroid_src[1]);
Point2f center_dst(centroid_dst[0], centroid_dst[1]);
if (debug)
cout<<"Centers: "<<center_src<<", "<<center_dst<<endl;
/*********************************
* Visualize data
**********************************/
// Visualization
namedWindow("data", 1);
float w = 400, h = 400;
Mat Mdata(w, h, CV_8UC3); Mdata = Scalar(0);
Point2f center_img(w/2, h/2);
float scl = 0.4*min(w/Rdata, h/Rdata); // compensate for noise
scl/=sqrt(2); // compensate for rotation effect
Point2f dT = (center_src+center_dst)*0.5; // compensate for translation
for (int i=0; i<N; i++) {
Point2f p1(scl*(src[i] - dT));
Point2f p2(scl*(dst[i] - dT));
// invert Y axis
p1.y = -p1.y; p2.y = -p2.y;
// add image center
p1+=center_img; p2+=center_img;
circle(Mdata, p1, 1, Scalar(0, 255, 0));
circle(Mdata, p2, 1, Scalar(0, 0, 255));
line(Mdata, p1, p2, Scalar(100, 100, 100));
}
/*********************************
* Get 2D rotation and translation
**********************************/
markTime();
// subtract centroids from data
for (int i=0; i<N; i++) {
src[i] -= center_src;
dst[i] -= center_dst;
}
// compute a covariance matrix
float Cxx = 0.0, Cxy = 0.0, Cyx = 0.0, Cyy = 0.0;
for (int i=0; i<N; i++) {
Cxx += src[i].x*dst[i].x;
Cxy += src[i].x*dst[i].y;
Cyx += src[i].y*dst[i].x;
Cyy += src[i].y*dst[i].y;
}
Mat Mcov = (Mat_<float>(2, 2)<<Cxx, Cxy, Cyx, Cyy);
if (debug)
cout<<"Covariance Matrix "<<Mcov<<endl;
// SVD
cv::SVD svd;
svd = SVD(Mcov, SVD::FULL_UV);
if (debug) {
cout<<"U = "<<svd.u<<endl;
cout<<"W = "<<svd.w<<endl;
cout<<"V transposed = "<<svd.vt<<endl;
}
// rotation = V*Ut
Mat V = svd.vt.t();
Mat Ut = svd.u.t();
float det_VUt = determinant(V*Ut);
Mat W = (Mat_<float>(2, 2)<<1.0, 0.0, 0.0, det_VUt);
float rot[4];
Mat R_est(2, 2, CV_32F, rot);
R_est = V*W*Ut;
if (debug)
cout<<"Rotation matrix: "<<R_est<<endl;
float cos_est = rot[0];
float sin_est = rot[2];
float ang = atan2(sin_est, cos_est);
// translation = mean_dst - R*mean_src
Point2f center_srcRot = Point2f(
cos_est*center_src.x - sin_est*center_src.y,
sin_est*center_src.x + cos_est*center_src.y);
Point2f T_est = center_dst - center_srcRot;
// RMSE
double RMSE = 0.0;
for (int i=0; i<N; i++) {
Point2f dst_est(
cos_est*src[i].x - sin_est*src[i].y,
sin_est*src[i].x + cos_est*src[i].y);
RMSE += SQR(dst[i].x - dst_est.x) + SQR(dst[i].y - dst_est.y);
}
if (N>0)
RMSE = sqrt(RMSE/N);
// Final estimate msg
cout<<"Estimate = "<<ang*RAD2DEG<<"deg., T = "<<T_est<<"; RMSE = "<<RMSE<<endl;
// show image
printTime(1);
imshow("data", Mdata);
waitKey(-1);
return;
} // rigidTransform2D()
// --------------------------- 3DOF
// calculates squared error from two point mapping; assumes rotation around Origin.
inline float sqErr_3Dof(Point2f p1, Point2f p2,
float cos_alpha, float sin_alpha, Point2f T) {
float x2_est = T.x + cos_alpha * p1.x - sin_alpha * p1.y;
float y2_est = T.y + sin_alpha * p1.x + cos_alpha * p1.y;
Point2f p2_est(x2_est, y2_est);
Point2f dp = p2_est-p2;
float sq_er = dp.dot(dp); // squared distance
//cout<<dp<<endl;
return sq_er;
}
// calculate RMSE for point-to-point metrics
float RMSE_3Dof(const vector<Point2f>& src, const vector<Point2f>& dst,
const float* param, const bool* inliers, const Point2f center) {
const bool all_inliers = (inliers==NULL); // handy when we run QUADRATIC with all inliers
unsigned int n = src.size();
assert(n>0 && n==dst.size());
float ang_rad = param[0];
Point2f T(param[1], param[2]);
float cos_alpha = cos(ang_rad);
float sin_alpha = sin(ang_rad);
double RMSE = 0.0;
int ninliers = 0;
for (unsigned int i=0; i<n; i++) {
if (all_inliers || inliers[i]) {
RMSE += sqErr_3Dof(src[i]-center, dst[i]-center, cos_alpha, sin_alpha, T);
ninliers++;
}
}
//cout<<"RMSE = "<<RMSE<<endl;
if (ninliers>0)
return sqrt(RMSE/ninliers);
else
return LARGE_NUMBER;
}
// Sets inliers and returns their count
inline int setInliers3Dof(const vector<Point2f>& src, const vector <Point2f>& dst,
bool* inliers,
const float* param,
const float max_er,
const Point2f center) {
float ang_rad = param[0];
Point2f T(param[1], param[2]);
// set inliers
unsigned int ninliers = 0;
unsigned int n = src.size();
assert(n>0 && n==dst.size());
float cos_ang = cos(ang_rad);
float sin_ang = sin(ang_rad);
float max_sqErr = max_er*max_er; // comparing squared values
if (inliers==NULL) {
// just get the number of inliers (e.g. after QUADRATIC fit only)
for (unsigned int i=0; i<n; i++) {
float sqErr = sqErr_3Dof(src[i]-center, dst[i]-center, cos_ang, sin_ang, T);
if ( sqErr < max_sqErr)
ninliers++;
}
} else {
// get the number of inliers and set them (e.g. for RANSAC)
for (unsigned int i=0; i<n; i++) {
float sqErr = sqErr_3Dof(src[i]-center, dst[i]-center, cos_ang, sin_ang, T);
if ( sqErr < max_sqErr) {
inliers[i] = 1;
ninliers++;
} else {
inliers[i] = 0;
}
}
}
return ninliers;
}
// fits 3DOF (rotation and translation in 2D) with least squares.
float fit3DofQUADRATICold(const vector<Point2f>& src, const vector<Point2f>& dst,
float* param, const bool* inliers, const Point2f center) {
const bool all_inliers = (inliers==NULL); // handy when we run QUADRATIC with all inliers
unsigned int n = src.size();
assert(dst.size() == n);
// count inliers
int ninliers;
if (all_inliers) {
ninliers = n;
} else {
ninliers = 0;
for (unsigned int i=0; i<n; i++){
if (inliers[i])
ninliers++;
}
}
// under-determined system
if (ninliers<2) {
// param[0] = 0.0f; // ?
// param[1] = 0.0f;
// param[2] = 0.0f;
return LARGE_NUMBER;
}
/*
* x1*cosx(a)-y1*sin(a) + Tx = X1
* x1*sin(a)+y1*cos(a) + Ty = Y1
*
* approximation for small angle a (radians) sin(a)=a, cos(a)=1;
*
* x1*1 - y1*a + Tx = X1
* x1*a + y1*1 + Ty = Y1
*
* in matrix form M1*h=M2
*
* 2n x 4 4 x 1 2n x 1
*
* -y1 1 0 x1 * a = X1
* x1 0 1 y1 Tx Y1
* Ty
* 1=Z
* ----------------------------
* src1 res src2
*/
// 4 x 1
float res_ar[4]; // alpha, Tx, Ty, 1
Mat res(4, 1, CV_32F, res_ar); // 4 x 1
// 2n x 4
Mat src1(2*ninliers, 4, CV_32F); // 2n x 4
// 2n x 1
Mat src2(2*ninliers, 1, CV_32F); // 2n x 1: [X1, Y1, X2, Y2, X3, Y3]'
for (unsigned int i=0, row_cnt = 0; i<n; i++) {
// use inliers only
if (all_inliers || inliers[i]) {
float x = src[i].x - center.x;
float y = src[i].y - center.y;
// first row
// src1
float* rowPtr = src1.ptr<float>(row_cnt);
rowPtr[0] = -y;
rowPtr[1] = 1.0f;
rowPtr[2] = 0.0f;
rowPtr[3] = x;
// src2
src2.at<float> (0, row_cnt) = dst[i].x - center.x;
// second row
row_cnt++;
// src1
rowPtr = src1.ptr<float>(row_cnt);
rowPtr[0] = x;
rowPtr[1] = 0.0f;
rowPtr[2] = 1.0f;
rowPtr[3] = y;
// src2
src2.at<float> (0, row_cnt) = dst[i].y - center.y;
}
}
cv::solve(src1, src2, res, DECOMP_SVD);
// estimators
float alpha_est;
Point2f T_est;
// original
alpha_est = res.at<float>(0, 0);
T_est = Point2f(res.at<float>(1, 0), res.at<float>(2, 0));
float Z = res.at<float>(3, 0);
if (abs(Z-1.0) > 0.1) {
//cout<<"Bad Z in fit3DOF(), Z should be close to 1.0 = "<<Z<<endl;
//return LARGE_NUMBER;
}
param[0] = alpha_est; // rad
param[1] = T_est.x;
param[2] = T_est.y;
// calculate RMSE
float RMSE = RMSE_3Dof(src, dst, param, inliers, center);
return RMSE;
} // fit3DofQUADRATICOLd()
// fits 3DOF (rotation and translation in 2D) with least squares.
float fit3DofQUADRATIC(const vector<Point2f>& src_, const vector<Point2f>& dst_,
float* param, const bool* inliers, const Point2f center) {
const bool debug = false; // print more debug info
const bool all_inliers = (inliers==NULL); // handy when we run QUADRATIC with all inliers
assert(dst_.size() == src_.size());
int N = src_.size();
// collect inliers
vector<Point2f> src, dst;
int ninliers;
if (all_inliers) {
ninliers = N;
src = src_; // copy constructor
dst = dst_;
} else {
ninliers = 0;
for (int i=0; i<N; i++){
if (inliers[i]) {
ninliers++;
src.push_back(src_[i]);
dst.push_back(dst_[i]);
}
}
}
if (ninliers<2) {
param[0] = 0.0f; // default return when there is not enough points
param[1] = 0.0f;
param[2] = 0.0f;
return LARGE_NUMBER;
}
/* Algorithm: Least-Square Rigid Motion Using SVD by Olga Sorkine
* http://igl.ethz.ch/projects/ARAP/svd_rot.pdf
*
* Subtract centroids, calculate SVD(cov),
* R = V[1, det(VU')]'U', T = mean_q-R*mean_p
*/
// Calculate data centroids
Scalar centroid_src = mean(src);
Scalar centroid_dst = mean(dst);
Point2f center_src(centroid_src[0], centroid_src[1]);
Point2f center_dst(centroid_dst[0], centroid_dst[1]);
if (debug)
cout<<"Centers: "<<center_src<<", "<<center_dst<<endl;
// subtract centroids from data
for (int i=0; i<ninliers; i++) {
src[i] -= center_src;
dst[i] -= center_dst;
}
// compute a covariance matrix
float Cxx = 0.0, Cxy = 0.0, Cyx = 0.0, Cyy = 0.0;
for (int i=0; i<ninliers; i++) {
Cxx += src[i].x*dst[i].x;
Cxy += src[i].x*dst[i].y;
Cyx += src[i].y*dst[i].x;
Cyy += src[i].y*dst[i].y;
}
Mat Mcov = (Mat_<float>(2, 2)<<Cxx, Cxy, Cyx, Cyy);
Mcov /= (ninliers-1);
if (debug)
cout<<"Covariance-like Matrix "<<Mcov<<endl;
// SVD of covariance
cv::SVD svd;
svd = SVD(Mcov, SVD::FULL_UV);
if (debug) {
cout<<"U = "<<svd.u<<endl;
cout<<"W = "<<svd.w<<endl;
cout<<"V transposed = "<<svd.vt<<endl;
}
// rotation (V*Ut)
Mat V = svd.vt.t();
Mat Ut = svd.u.t();
float det_VUt = determinant(V*Ut);
Mat W = (Mat_<float>(2, 2)<<1.0, 0.0, 0.0, det_VUt);
float rot[4];
Mat R_est(2, 2, CV_32F, rot);
R_est = V*W*Ut;
if (debug)
cout<<"Rotation matrix: "<<R_est<<endl;
float cos_est = rot[0];
float sin_est = rot[2];
float ang = atan2(sin_est, cos_est);
// translation (mean_dst - R*mean_src)
Point2f center_srcRot = Point2f(
cos_est*center_src.x - sin_est*center_src.y,
sin_est*center_src.x + cos_est*center_src.y);
Point2f T_est = center_dst - center_srcRot;
// Final estimate msg
if (debug)
cout<<"Estimate = "<<ang*RAD2DEG<<"deg., T = "<<T_est<<endl;
param[0] = ang; // rad
param[1] = T_est.x;
param[2] = T_est.y;
// calculate RMSE
float RMSE = RMSE_3Dof(src_, dst_, param, inliers, center);
return RMSE;
} // fit3DofQUADRATIC()
// RANSAC fit in 3DOF: 1D rot and 2D translation (maximizes the number of inliers)
// NOTE: no data normalization is currently performed
float fit3DofRANSAC(const vector<Point2f>& src, const vector<Point2f>& dst,
float* best_param, bool* inliers,
const Point2f center ,
const float inlierMaxEr,
const int niter) {
const int ITERATION_TO_SETTLE = 2; // iterations to settle inliers and param
const float INLIERS_RATIO_OK = 0.95f; // stopping criterion
// size of data vector
unsigned int N = src.size();
assert(N==dst.size());
// unrealistic case
if(N<2) {
best_param[0] = 0.0f; // ?
best_param[1] = 0.0f;
best_param[2] = 0.0f;
return LARGE_NUMBER;
}
unsigned int ninliers; // current number of inliers
unsigned int best_ninliers = 0; // number of inliers
float best_rmse = LARGE_NUMBER; // error
float cur_rmse; // current distance error
float param[3]; // rad, Tx, Ty
vector <Point2f> src_2pt(2), dst_2pt(2);// min set of 2 points (1 correspondence generates 2 equations)
srand (time(NULL));
// iterations
for (int iter = 0; iter<niter; iter++) {
#ifdef DEBUG_RANSAC
cout<<"iteration "<<iter<<": ";
#endif
// 1. Select a random set of 2 points (not obligatory inliers but valid)
int i1, i2;
i1 = rand() % N; // [0, N[
i2 = i1;
while (i2==i1) {
i2 = rand() % N;
}
src_2pt[0] = src[i1]; // corresponding points
src_2pt[1] = src[i2];
dst_2pt[0] = dst[i1];
dst_2pt[1] = dst[i2];
bool two_inliers[] = {true, true};
// 2. Quadratic fit for 2 points
cur_rmse = fit3DofQUADRATIC(src_2pt, dst_2pt, param, two_inliers, center);
// 3. Recalculate to settle params and inliers using a larger set
for (int iter2=0; iter2<ITERATION_TO_SETTLE; iter2++) {
ninliers = setInliers3Dof(src, dst, inliers, param, inlierMaxEr, center); // changes inliers
cur_rmse = fit3DofQUADRATIC(src, dst, param, inliers, center); // changes cur_param
}
// potential ill-condition or large error
if (ninliers<2) {
#ifdef DEBUG_RANSAC
cout<<" !!! less than 2 inliers "<<endl;
#endif
continue;
} else {
#ifdef DEBUG_RANSAC
cout<<" "<<ninliers<<" inliers; ";
#endif
}
#ifdef DEBUG_RANSAC
cout<<"; recalculate: RMSE = "<<cur_rmse<<", "<<ninliers <<" inliers";
#endif
// 4. found a better solution?
if (ninliers > best_ninliers) {
best_ninliers = ninliers;
best_param[0] = param[0];
best_param[1] = param[1];
best_param[2] = param[2];
best_rmse = cur_rmse;
#ifdef DEBUG_RANSAC
cout<<" --- Solution improved: "<<
best_param[0]<<", "<<best_param[1]<<", "<<param[2]<<endl;
#endif
// exit condition
float inlier_ratio = (float)best_ninliers/N;
if (inlier_ratio > INLIERS_RATIO_OK) {
#ifdef DEBUG_RANSAC
cout<<"Breaking early after "<< iter+1<<
" iterations; inlier ratio = "<<inlier_ratio<<endl;
#endif
break;
}
} else {
#ifdef DEBUG_RANSAC
cout<<endl;
#endif
}
} // iterations
// 5. recreate inliers for the best parameters
ninliers = setInliers3Dof(src, dst, inliers, best_param, inlierMaxEr, center);
return best_rmse;
} // fit3DofRANSAC()
Let me first make sure I'm interpreting your question correctly. You have two sets of 2D points, one of which contains all "good" points corresponding to some object of interest, and one of which contains those points under an affine transformation with noisy points added. Right?
If that's correct, then there is a fairly reliable and efficient way to both reject noisy points and determine the transformation between your points of interest. The algorithm that is usually used to reject noisy points ("outliers") is known as RANSAC, and the algorithm used to determine the transformation can take several forms, but the most current state of the art is known as the five-point algorithm and can be found here -- a MATLAB implementation can be found here.
Unfortunately I don't know of a mature implementation of both of those combined; you'll probably have to do some work of your own to implement RANSAC and integrate it with the five point algorithm.
Edit:
Actually, OpenCV has an implementation that is overkill for your task (meaning it will work but will take more time than necessary) but is ready to work out of the box. The function of interest is called cv::findFundamentalMat.
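A rough sketch of how it might be called; the point vectors and the threshold here are hypothetical placeholders, not values tuned for your data:

// Sketch: robustly relate two point sets and obtain an inlier mask via RANSAC.
std::vector<cv::Point2f> pts1, pts2;   // corresponding candidate points (hypothetical input)
std::vector<uchar> inlierMask;
cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC,
                                   3.0,   // max distance (px) to the epipolar line for an inlier
                                   0.99,  // confidence
                                   inlierMask);
// inlierMask[i] != 0 marks pts1[i]/pts2[i] as an inlier of the estimated model.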
I believe you are looking for something like David Lowe's SIFT (Scale-Invariant Feature Transform). Another option is SURF (SIFT is patent-protected). The OpenCV library provides a SURF implementation.
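For illustration, a minimal detection-and-matching sketch with OpenCV's feature framework. It assumes a build where cv::SIFT is available (OpenCV 4.4+); the image file names are placeholders:

// Sketch: detect keypoints in both images and keep only unambiguous matches (ratio test).
cv::Mat img1 = cv::imread("object.png", cv::IMREAD_GRAYSCALE);  // placeholder file names
cv::Mat img2 = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
std::vector<cv::KeyPoint> kp1, kp2;
cv::Mat desc1, desc2;
sift->detectAndCompute(img1, cv::noArray(), kp1, desc1);
sift->detectAndCompute(img2, cv::noArray(), kp2, desc2);
cv::BFMatcher matcher(cv::NORM_L2);
std::vector<std::vector<cv::DMatch>> knn;
matcher.knnMatch(desc1, desc2, knn, 2);
std::vector<cv::DMatch> good;
for (const auto& m : knn)
    if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
        good.push_back(m[0]);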
I would try to use distance geometry (http://en.wikipedia.org/wiki/Distance_geometry) for this.
Generate a scalar for each point by summing its distances to all neighbors within a certain radius. Though not perfect, this will be a good discriminator for each point.
Then put all the scalars in a map that allows a point (p) to be retrieved by its scalar (s) plus/minus some delta:
M(s+delta) = p (e.g. a k-d tree) (http://en.wikipedia.org/wiki/Kd-tree)
Put all the reference set of 2D points in the map
On the other (test) set of 2D points:
foreach test scaling S (especially if you have a good idea what typical scaling values are)
    scale each point by S
    recompute the scalars of the test set of points
    for each point P in the test set (or perhaps a sample, for a faster method)
        look up the point in the reference scalar map within some delta
        discard P if no mapping is found
        else, for each point P' found
            examine the neighbors of P and see if they have corresponding scalars in the reference map within some delta (i.e. the reference point has neighbors with approximately the same values)
            if all points tested have a mapping in the reference set, you have found a mapping of test point P onto reference point P' -> record the mapping of test point to reference point
    discard the scaling if no mappings were recorded
Note that this is trivially parallelizable in several different places.
This is off the top of my head, drawing from research I did years ago. It lacks fine details, but the general idea is clear: find points in the noisy (test) graph whose distances to their closest neighbors are roughly the same as in the reference set. Noisy graphs will have to measure the distances with a larger allowed error than less noisy graphs.
The algorithm works perfectly for graphs with no noise.
Edit: there is a refinement to the algorithm that doesn't require looking at different scalings. When computing the scalar for each point, use a relative distance measure instead. This will be invariant to the transform.
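For illustration, a minimal sketch of the per-point scalar described above (the sum of distances to all neighbors within a radius). The function name is made up, and the brute-force double loop is only for clarity; a k-d tree query would replace it in practice:

#include <cmath>
#include <vector>
#include <opencv2/core.hpp>

// Sketch: for each point, sum the distances to every other point that lies within `radius`.
std::vector<float> distanceScalars(const std::vector<cv::Point2f>& pts, float radius)
{
    std::vector<float> s(pts.size(), 0.0f);
    for (size_t i = 0; i < pts.size(); ++i)
        for (size_t j = 0; j < pts.size(); ++j)
        {
            if (i == j) continue;
            float dx = pts[i].x - pts[j].x;
            float dy = pts[i].y - pts[j].y;
            float d = std::sqrt(dx * dx + dy * dy);
            if (d <= radius)
                s[i] += d;
        }
    return s;
}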
From C++, you could use ITK to do the image registration. It includes many registration functions that will work in the presence of noise.
The KLT (Kanade-Lucas-Tomasi) feature tracker performs an affine consistency check on tracked features. The affine consistency check takes translation, rotation, and scaling into account. I don't know whether it helps you, because you can't use the function (which calculates the affine transformation of a rectangular region) directly. But maybe you can learn from the documentation and source code how the affine transformation is calculated and adapt it to your problem (clouds of points instead of a rectangular region).
You want the Denton-Beveridge point matching algorithm. The source code is at the bottom of the page linked below, and there is also a paper that explains the algorithm and why RANSAC is a bad choice for this problem.
http://jasondenton.me/pntmatch.html