What accuracy should I expect from basic opencv ortho-rectification algorithms? - c++

So, I'm taking over the work on an ortho-rectification algorithm that is intended to produce "accurate" results. I'm running into trouble trying to increase the accuracy and could use a little help.
Here is the basic approach.
Extract a calibration pattern from an image that was taken from a mobile phone.
Rectify the image based on the calibration pattern in the image.
Scale the image to get the real world size of the scene around the pattern.
The calibration pattern is held against a flat surface, like a wall, counter, table, or floor, and the user takes a picture. With that picture, we want to measure artifacts on the same surface as the calibration pattern. We have tried this with calibration patterns ranging from the size of a credit card to a sheet of paper (8.5" x 11").
Here is an example input picture
With this resulting output image
Right now our measurements are usually within 1-2% of what we expect. This is sufficient for small areas (less than 25 cm away from the calibration pattern). However, we'd like the algorithm to scale so that we can accurately measure a 2 x 2 meter area. At that size, the current error (2-4 cm) is too large.
Here is the algorithm we are following.
// convert original image to grayscale and perform morphological dilation to reduce false matches when finding circle grid
Mat imgGray;
cvtColor(imgOriginal, imgGray, CV_BGR2GRAY);
// find calibration pattern in original image
Size patternSize(4, 11);
vector <Point2f> circleCenters_OriginalImage;
if (!findCirclesGrid(imgGray, patternSize, circleCenters_OriginalImage, CALIB_CB_ASYMMETRIC_GRID))
{
return false;
}
Point2f inputQuad[4];
inputQuad[0] = Point2f(circleCenters_OriginalImage[0].x, circleCenters_OriginalImage[0].y);
inputQuad[1] = Point2f(circleCenters_OriginalImage[3].x, circleCenters_OriginalImage[3].y);
inputQuad[2] = Point2f(circleCenters_OriginalImage[43].x, circleCenters_OriginalImage[43].y);
inputQuad[3] = Point2f(circleCenters_OriginalImage[40].x, circleCenters_OriginalImage[40].y);
// create model points for calibration pattern
vector <Point2f> circleCenters_ObjectSpace = GeneratePatternPointsInObjectSpace(circleCenters_OriginalImage[0], Distance(circleCenters_OriginalImage[0], circleCenters_OriginalImage[1]) / 2.0f, ioData.marker_up);
Point2f outputQuad[4];
outputQuad[0] = Point2f(circleCenters_ObjectSpace[0].x, circleCenters_ObjectSpace[0].y);
outputQuad[1] = Point2f(circleCenters_ObjectSpace[3].x, circleCenters_ObjectSpace[3].y);
outputQuad[2] = Point2f(circleCenters_ObjectSpace[43].x, circleCenters_ObjectSpace[43].y);
outputQuad[3] = Point2f(circleCenters_ObjectSpace[40].x, circleCenters_ObjectSpace[40].y);
// compute the perspective transform mapping the four image-space corners to their object-space positions
Mat lambda = getPerspectiveTransform(inputQuad, outputQuad);
warpPerspective(imgOriginal, imgOrthorectified, lambda, imgOrthorectified.size());
...
My Questions:
Is it reasonable to shoot for error < 0.25%? Is there a different algorithm that would yield more accurate results? What are the most valuable sources of error to identify and resolve?
As I've worked on this, I've also looked at removing pincushion / barrel distortions, and trying homographies to find the perspective transform. The best approaches I have found so far remain in the 1-2% error range.
Any suggestions of where to go next would be really helpful.
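One concrete thing worth trying (a rough Python sketch of the idea, not the asker's code; the grid spacing, pattern size and file name are placeholders): fit the perspective transform to all 44 detected circle centres with a robust least-squares findHomography, instead of an exact getPerspectiveTransform through 4 hand-picked corner circles, which tends to average out per-centre detection noise.

import cv2
import numpy as np

def ideal_grid_points(pattern_size=(4, 11), spacing=20.0):
    # Ideal centre positions of the 4x11 asymmetric circle grid in real-world units;
    # 'spacing' is a placeholder for the known centre-to-centre distance.
    cols, rows = pattern_size
    pts = [((2 * j + i % 2) * spacing, i * spacing)
           for i in range(rows) for j in range(cols)]
    return np.array(pts, dtype=np.float32)

img = cv2.imread("input.jpg")                     # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, centers = cv2.findCirclesGrid(gray, (4, 11),
                                     flags=cv2.CALIB_CB_ASYMMETRIC_GRID)
if found:
    # Least-squares fit over all 44 correspondences; RANSAC discards badly detected blobs.
    H, _ = cv2.findHomography(centers.reshape(-1, 2), ideal_grid_points(),
                              cv2.RANSAC, 3.0)
    ortho = cv2.warpPerspective(img, H, (2000, 2000))  # output size is arbitrary here

Undistorting the image first (using intrinsics from a proper multi-view calibration of the phone) usually matters more than the 4-vs-44-point choice once you measure 1-2 m from the pattern, since lens distortion and any out-of-plane bending of the pattern tend to dominate the residual error at that range.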

Related

Rectangle detection / tracking using OpenCV

What I need
I'm currently working on an augmented reality kind of game. The controller that the game uses (I'm talking about the physical input device here) is a mono colored, rectangular piece of paper. I have to detect the position, rotation and size of that rectangle in the capture stream of the camera. The detection should be invariant to scale and invariant to rotation along the X and Y axes.
The scale invariance is needed in case that the user moves the paper away or towards the camera. I don't need to know the distance of the rectangle so scale invariance translates to size invariance.
The rotation invariance is needed in case the user tilts the rectangle along its local X and / or Y axis. Such a rotation changes the shape of the paper from rectangle to trapezoid. In this case, the object oriented bounding box can be used to measure the size of the paper.
What I've done
At the beginning there is a calibration step. A window shows the camera feed and the user has to click on the rectangle. On click, the color of the pixel the mouse is pointing at is taken as reference color. The frames are converted into HSV color space to improve color distinguishing. I have 6 sliders that adjust the upper and lower thresholds for each channel. These thresholds are used to binarize the image (using opencv's inRange function).
After that I'm eroding and dilating the binary image to remove noise and unite nearby chunks (using opencv's erode and dilate functions).
The next step is finding contours (using opencv's findContours function) in the binary image. These contours are used to detect the smallest oriented rectangles (using opencv's minAreaRect function). As final result I'm using the rectangle with the largest area.
A short conclusion of the procedure (a rough code sketch follows the list):
Grab a frame
Convert that frame to HSV
Binarize it (using the color that the user selected and the thresholds from the sliders)
Apply morph ops (erode and dilate)
Find contours
Get the smallest oriented bounding box of each contour
Take the largest of those bounding boxes as result
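Not part of the original post - a minimal Python sketch of the pipeline above, with placeholder HSV thresholds standing in for the calibration click and the six sliders (findContours uses the OpenCV 4 return signature):

import cv2
import numpy as np

frame = cv2.imread("frame.png")                        # stand-in for one grabbed frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# placeholder thresholds; in the real pipeline these come from the clicked
# reference color plus the six slider values
lower = np.array([100, 80, 80], dtype=np.uint8)
upper = np.array([130, 255, 255], dtype=np.uint8)
mask = cv2.inRange(hsv, lower, upper)

# morph ops: remove speckles and unite nearby chunks
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.erode(mask, kernel, iterations=2)
mask = cv2.dilate(mask, kernel, iterations=2)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)
    (cx, cy), (w, h), angle = cv2.minAreaRect(biggest)  # position, size and rotation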
As you may have noticed, I don't take advantage of the knowledge about the actual shape of the paper, simply because I don't know how to use this information properly.
I've also thought about using the tracking algorithms of opencv. But there were three reasons that prevented me from using them:
Scale invariance: as far as I have read, some of the algorithms don't support different scales of the object.
Movement prediction: some algorithms use movement prediction for better performance, but the object I'm tracking moves completely randomly and is therefore unpredictable.
Simplicity: I'm just looking for a mono colored rectangle in an image, nothing fancy like car or person tracking.
Here is a - relatively - good catch (binary image after erode and dilate)
and here is a bad one
The Question
How can I improve the detection in general and especially to be more resistant against lighting changes?
Update
Here are some raw images for testing.
Can't you just use thicker material?
Yes I can and I already do (unfortunately I can't access these pieces at the moment). However, the problem still remains, even if I use material like cardboard. It isn't bent as easily as paper, but one can still bend it.
How do you get the size, rotation and position of the rectangle?
The minAreaRect function of opencv returns a RotatedRect object. This object contains all the data I need.
Note
Because the rectangle is mono colored, there is no possibility to distinguish between top and bottom or left and right. This means that the rotation is always in the range [0, 180], which is perfectly fine for my purposes. The ratio of the two sides of the rect is always w:h > 2:1. If the rectangle were a square, the range of rotation would change to [0, 90], but this can be considered irrelevant here.
As suggested in the comments I will try histogram equalization to reduce brightness issues and take a look at ORB, SURF and SIFT.
I will update on progress.
The H channel in HSV space is the hue, and it is not sensitive to lighting changes. Red hue is roughly in the range [150, 180].
Based on that information, I do the following.
Change into the HSV space, split the H channel, threshold and normalize it.
Apply morph ops (open)
Find contours and filter them by some properties (width, height, area, aspect ratio and so on).
PS: I cannot fetch the image you uploaded to Dropbox because of network restrictions, so I just cropped the right side of your second image and used it as the input.
imgname = "src.png"
img = cv2.imread(imgname)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
## Split the H channel in HSV, and get the red range
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h,s,v = cv2.split(hsv)
h[h<150]=0
h[h>180]=0
## normalize, do the open-morp-op
normed = cv2.normalize(h, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1)
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3,3))
opened = cv2.morphologyEx(normed, cv2.MORPH_OPEN, kernel)
res = np.hstack((h, normed, opened))
cv2.imwrite("tmp1.png", res)
Now, we get the result as this (h, normed, opened):
Then find contours and filter them.
## findContours: [-2] picks the contour list in both the OpenCV 3 and OpenCV 4 return signatures
contours = cv2.findContours(opened, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
print(len(contours))

bboxes = []
rboxes = []
cnts = []
dst = img.copy()

for cnt in contours:
    ## Get the straight bounding rect
    bbox = cv2.boundingRect(cnt)
    x,y,w,h = bbox
    if w<30 or h < 30 or w*h < 2000 or w > 500:
        continue

    ## Draw rect
    cv2.rectangle(dst, (x,y), (x+w,y+h), (255,0,0), 1, 16)

    ## Get the rotated rect
    rbox = cv2.minAreaRect(cnt)
    (cx,cy), (w,h), rot_angle = rbox
    print("rot_angle:", rot_angle)

    ## backup
    bboxes.append(bbox)
    rboxes.append(rbox)
    cnts.append(cnt)
The result is like this:
rot_angle: -2.4540319442749023
rot_angle: -1.8476102352142334
Because of the blue rectangle tag in the source image, the card is split into two parts. A clean image will have no problem.
I know it's been a while since I asked the question. I recently continued on the topic and solved my problem (although not through rectangle detection).
Changes
Using wood to strengthen my controllers (the "rectangles") like below.
Placed 2 ArUco markers on each controller.
How it works
Convert the frame to grayscale,
downsample it (to increase performance during detection),
equalize the histogram using cv::equalizeHist,
find markers using cv::aruco::detectMarkers,
correlate markers (if multiple controllers),
analyze markers (position and rotation),
compute the result and apply some error correction (a rough sketch of the detection step follows this list).
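Not the author's code - a rough Python sketch of those detection steps using the classic cv2.aruco functions (OpenCV >= 4.7 wraps them in cv2.aruco.ArucoDetector instead); the dictionary choice and file name are assumptions:

import cv2

frame = cv2.imread("frame.png")                   # stand-in for one camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.pyrDown(gray)                          # downsample to speed up detection
gray = cv2.equalizeHist(gray)                     # flatten lighting differences

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # assumed dictionary
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if ids is not None:
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        center = marker_corners[0].mean(axis=0)   # marker position in the downsampled frame
        # rotation can be read from the order of the four corners, or via
        # cv2.aruco.estimatePoseSingleMarkers if camera intrinsics are available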
It turned out that the marker detection is very robust to lighting changes and different viewing angles which allows me to skip any calibration steps.
I placed 2 markers on each controller to increase the detection robustness even more. Both markers only have to be detected together once (to measure how they correlate). After that, it's sufficient to find only one marker per controller, as the other can be extrapolated from the previously computed correlation.
Here is a detection result in a bright environment:
in a darker environment:
and when hiding one of the markers (the blue point indicates the extrapolated marker position):
Failures
The initial shape detection that I implemented didn't perform well. It was very fragile to lighting changes. Furthermore, it required an initial calibration step.
After the shape detection approach I tried SIFT and ORB in combination with brute force and knn matchers to extract and locate features in the frames. It turned out that mono colored objects don't provide many keypoints (what a surprise). The performance of SIFT was terrible anyway (ca. 10 fps @ 540p).
I drew some lines and other shapes on the controller, which resulted in more keypoints being available. However, this didn't yield huge improvements.

How to align 2 images based on their content with OpenCV

I am totally new to OpenCV and I have started to dive into it. But I'd need a little bit of help.
So I want to combine these 2 images:
I would like the 2 images to match along their edges (ignoring the very right part of the image for now)
Can anyone please point me into the right direction? I have tried using the findTransformECC function. Here's my implementation:
cv::Mat im1 = [imageArray[1] CVMat3];
cv::Mat im2 = [imageArray[0] CVMat3];
// Convert images to gray scale;
cv::Mat im1_gray, im2_gray;
cvtColor(im1, im1_gray, CV_BGR2GRAY);
cvtColor(im2, im2_gray, CV_BGR2GRAY);
// Define the motion model
const int warp_mode = cv::MOTION_AFFINE;
// Set a 2x3 or 3x3 warp matrix depending on the motion model.
cv::Mat warp_matrix;
// Initialize the matrix to identity
if ( warp_mode == cv::MOTION_HOMOGRAPHY )
warp_matrix = cv::Mat::eye(3, 3, CV_32F);
else
warp_matrix = cv::Mat::eye(2, 3, CV_32F);
// Specify the number of iterations.
int number_of_iterations = 50;
// Specify the threshold of the increment
// in the correlation coefficient between two iterations
double termination_eps = 1e-10;
// Define termination criteria
cv::TermCriteria criteria (cv::TermCriteria::COUNT+cv::TermCriteria::EPS, number_of_iterations, termination_eps);
// Run the ECC algorithm. The results are stored in warp_matrix.
findTransformECC(
im1_gray,
im2_gray,
warp_matrix,
warp_mode,
criteria
);
// Storage for warped image.
cv::Mat im2_aligned;
if (warp_mode != cv::MOTION_HOMOGRAPHY)
// Use warpAffine for Translation, Euclidean and Affine
warpAffine(im2, im2_aligned, warp_matrix, im1.size(), cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);
else
// Use warpPerspective for Homography
warpPerspective (im2, im2_aligned, warp_matrix, im1.size(),cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);
UIImage* result = [UIImage imageWithCVMat:im2_aligned];
return result;
I have tried playing around with the termination_eps and number_of_iterations and increased/decreased those values, but they didn't really make a big difference.
So here's the result:
What can I do to improve my result?
EDIT: I have marked the problematic edges with red circles. The goal is to warp the bottom image and make it match with the lines from the image above:
I did a little bit of research and I'm afraid the findTransformECC function won't give me the result I'd like to have :-(
Something important to add:
I actually have an array of those image "stripes", 8 in this case, they all look similar to the images shown here and they all need to be processed to match the line. I have tried experimenting with the stitch function of OpenCV, but the results were horrible.
EDIT:
Here are the 3 source images:
The result should be something like this:
I transformed every image along the lines that should match. Lines that are too far away from each other can be ignored (the shadow and the piece of road on the right portion of the image)
Judging by your images, they seem to overlap. Since you said the stitch function didn't give you the desired results, you could implement your own stitching. I'm trying to do something close to that too. Here is a tutorial on how to implement it in c++: https://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
You can use the Hough algorithm with a high threshold on both images and then compare the vertical lines in each of them - most of them should be shifted a bit, but keep the same angle.
This is what I've got from running this algorithm on one of the pictures:
Filtering out horizontal lines should be easy (as they are represented as Vec4i), and then you can align the remaining lines together.
Here is the example of using it in OpenCV's documentation.
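A rough sketch of that suggestion (not from the original answer; the Canny and Hough thresholds are guesses) - detect strong segments with HoughLinesP and keep only the roughly vertical ones for matching:

import cv2
import numpy as np

img = cv2.imread("strip.png", cv2.IMREAD_GRAYSCALE)    # placeholder for one image strip
edges = cv2.Canny(img, 50, 150)

# high threshold so only strong, long lines survive
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=120,
                        minLineLength=80, maxLineGap=10)

vertical = []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:                 # each line is a Vec4i (x1, y1, x2, y2)
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(abs(angle) - 90) < 15:                  # keep roughly vertical segments only
            vertical.append((x1, y1, x2, y2))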
UPDATE: another thought. Aligning the lines together can be done with a concept similar to how a cross-correlation function works. It doesn't matter if picture 1 has 10 lines and picture 2 has 100 lines; the shift position where the most lines align (which is, essentially, the maximum of the CCF) should be pretty close to the answer, though this might require some tweaking - for example, giving a weight to every line based on its length, angle, etc. Computer vision never has a direct way, huh :)
UPDATE 2: I actually wonder if taking the bottom pixel row of the top image as array 1 and the top pixel row of the bottom image as array 2 and running a general CCF over them, then using its maximum as the shift, could work too... But I think it would be a known method if it worked well.
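To make the CCF idea concrete, here is a toy illustration (not from the original answer, with made-up line positions): each strip is reduced to a 1-D indicator of where its vertical lines sit, and the argmax of the cross-correlation gives the relative horizontal shift.

import numpy as np

width = 1000
sig_top = np.zeros(width)
sig_bottom = np.zeros(width)
sig_top[[120, 340, 610, 880]] = 1.0      # x positions of vertical lines in the top strip
sig_bottom[[135, 355, 625, 895]] = 1.0   # the same lines, 15 px further right in the bottom strip

corr = np.correlate(sig_top, sig_bottom, mode="full")
shift = np.argmax(corr) - (width - 1)
print("estimated shift:", shift)         # about -15: the bottom strip is shifted 15 px to the right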

Match object between different video frames

I am trying to use OpenCV to detect the shift between consecutive video frames when the camera is unstable and moving in real time, as shown in the picture. To compensate for the effect of shaking or a change in angle, I want to match some objects in the image (for example the clock) and, from the center of the same object in consecutive frames, detect the shift value and compensate for its effect. I don't know how to do this in real time or how many accurate ways are available to do this.
Thank you in advance and I hope my question is clear.
This is a fairly standard operation, as it's actively used in MPEG-4 compression. It's called "motion estimation" and you don't do it on objects (too hard, requires image segmentation). In OpenCV, it's covered under Video Stabilization.
If you want to try writing code yourself then one method is to first of all crop the frame to produce a sub image of your actual image slightly smaller than your actual image along each dimension. This will give you some room to move.
Next you want to be able to find and track shapes in OpenCV - an example of code is here - http://opencv-srf.blogspot.co.uk/2011/09/object-detection-tracking-using-contours.html - Play around until you get a few geometric primitive shapes coming up on each frame.
Next you want to build some vectors between the centres of each shape - these are what will determine the movement of the camera - if in the next frame most of the vectors are displaced but parallel that is a good indicator that the camera has moved.
The last step is to calculate the displacement, which should be a matter of measuring the distance between the detected parallel vectors. If this is smaller than your sub-image cropping then you can crop the original image to negate the displacement.
The pseudo code for each iteration would be something like -
//Variables
image wholeFrame1, wholeFrame2, subImage, shapesFrame1, shapesFrame2
vectorArray vectorsFrame1, vectorsFrame2; parallelVectorList
vector cameraDisplacement = [0,0]
//Display image
subImage = cropImage(wholeFrame1, cameraDisplacement)
display(subImage);
//Find shapes to track
shapesFrame1 = findShapes(wholeFrame1)
shapesFrame2 = findShapes(wholeFrame2)
//Store a list of parallel vectors
parallelVectorList = detectParallelVectors(shapesFrame1, shapesFrame2)
//Find the mean displacement of each pair of parallel vectors
cameraDisplacement = meanDisplacement(parallelVectorList)
//Crop the next image accounting for camera displacement
subImage = cropImage(wholeFrame1, cameraDisplacement)
There are better ways of doing it but this would be easy enough for someone doing their first attempt at image stabilisation with experience of OpenCV.

OpenCV 3.0: Calibration not fitting as expected

I'm getting results I don't expect when I use OpenCV 3.0 calibrateCamera. Here is my algorithm:
Load in 30 image points
Load in 30 corresponding world points (coplanar in this case)
Use points to calibrate the camera, just for un-distorting
Un-distort the image points, but don't use the intrinsics (coplanar world points, so intrinsics are dodgy)
Use the undistorted points to find a homography, transforming to world points (can do this because they are all coplanar)
Use the homography and perspective transform to map the undistorted points to the world space
Compare the original world points to the mapped points
The points I have are noisy and only a small section of the image. There are 30 coplanar points from a single view so I can't get camera intrinsics, but should be able to get distortion coefficients and a homography to create a fronto-parallel view.
As expected, the error varies depending on the calibration flags. However, it varies opposite to what I expected. If I allow all variables to adjust, I would expect error to come down. I am not saying I expect a better model; I actually expect over-fitting, but that should still reduce error. What I see though is that the fewer variables I use, the lower my error. The best result is with a straight homography.
I have two suspected causes, but they seem unlikely and I'd like to hear an unadulterated answer before I air them. I have pulled out the code to just do what I'm talking about. It's a bit long, but it includes loading the points.
The code doesn't appear to have bugs; I've used "better" points and it works perfectly. I want to emphasize that the solution here can't be to use better points or perform a better calibration; the whole point of the exercise is to see how the various calibration models respond to different qualities of calibration data.
Any ideas?
Added
To be clear, I know the results will be bad and I expect that. I also understand that I may learn bad distortion parameters, which leads to worse results when testing points that have not been used to train the model. What I don't understand is how the distortion model has more error when using the training set as the test set. That is, cv::calibrateCamera is supposed to choose parameters that reduce the error over the training set of points provided, yet it produces more error than if it had just selected 0s for K1, K2, ... K6, P1, P2. Bad data or not, it should at least do better on the training set. Before I can say the data is not appropriate for this model, I have to be sure I'm doing the best I can with the data available, and I can't say that at this stage.
Here an example image
The points with the green pins are marked. This is obviously just a test image.
Here is more example stuff
In the following the image is cropped from the big one above. The centre has not changed. This is what happens when I undistort with just the points marked manually from the green pins and allowing K1 (only K1) to vary from 0:
Before
After
I would put it down to a bug, but when I use a larger set of points that covers more of the screen, even from a single plane, it works reasonably well. This looks terrible. However, the error is not nearly as bad as you might think from looking at the picture.
// Load image points
std::vector<cv::Point2f> im_points;
im_points.push_back(cv::Point2f(1206, 1454));
im_points.push_back(cv::Point2f(1245, 1443));
im_points.push_back(cv::Point2f(1284, 1429));
im_points.push_back(cv::Point2f(1315, 1456));
im_points.push_back(cv::Point2f(1352, 1443));
im_points.push_back(cv::Point2f(1383, 1431));
im_points.push_back(cv::Point2f(1431, 1458));
im_points.push_back(cv::Point2f(1463, 1445));
im_points.push_back(cv::Point2f(1489, 1432));
im_points.push_back(cv::Point2f(1550, 1461));
im_points.push_back(cv::Point2f(1574, 1447));
im_points.push_back(cv::Point2f(1597, 1434));
im_points.push_back(cv::Point2f(1673, 1463));
im_points.push_back(cv::Point2f(1691, 1449));
im_points.push_back(cv::Point2f(1708, 1436));
im_points.push_back(cv::Point2f(1798, 1464));
im_points.push_back(cv::Point2f(1809, 1451));
im_points.push_back(cv::Point2f(1819, 1438));
im_points.push_back(cv::Point2f(1925, 1467));
im_points.push_back(cv::Point2f(1929, 1454));
im_points.push_back(cv::Point2f(1935, 1440));
im_points.push_back(cv::Point2f(2054, 1470));
im_points.push_back(cv::Point2f(2052, 1456));
im_points.push_back(cv::Point2f(2051, 1443));
im_points.push_back(cv::Point2f(2182, 1474));
im_points.push_back(cv::Point2f(2171, 1459));
im_points.push_back(cv::Point2f(2164, 1446));
im_points.push_back(cv::Point2f(2306, 1474));
im_points.push_back(cv::Point2f(2292, 1462));
im_points.push_back(cv::Point2f(2278, 1449));
// Create corresponding world / object points
std::vector<cv::Point3f> world_points;
for (int i = 0; i < 30; i++) {
world_points.push_back(cv::Point3f(5 * (i / 3), 4 * (i % 3), 0.0f));
}
// Perform calibration
// Flags are set out so they can be commented out and "freed" easily
int calibration_flags = 0
| cv::CALIB_FIX_K1
| cv::CALIB_FIX_K2
| cv::CALIB_FIX_K3
| cv::CALIB_FIX_K4
| cv::CALIB_FIX_K5
| cv::CALIB_FIX_K6
| cv::CALIB_ZERO_TANGENT_DIST
| 0;
// Initialise intrinsic matrix (CV_64F, so elements must be accessed as double)
cv::Mat intrinsic_matrix = cv::Mat::eye(3, 3, CV_64F);
cv::Mat distortion_coeffs = cv::Mat::zeros(5, 1, CV_64F);
// Rotation and translation vectors
std::vector<cv::Mat> undistort_rvecs;
std::vector<cv::Mat> undistort_tvecs;
// Wrap in an outer vector for calibration
std::vector<std::vector<cv::Point2f>>im_points_v(1, im_points);
std::vector<std::vector<cv::Point3f>>w_points_v(1, world_points);
// Calibrate; only 1 plane, so intrinsics can't be trusted
cv::Size image_size(4000, 3000);
calibrateCamera(w_points_v, im_points_v,
image_size, intrinsic_matrix, distortion_coeffs,
undistort_rvecs, undistort_tvecs, calibration_flags);
// Undistort im_points
std::vector<cv::Point2f> ud_points;
cv::undistortPoints(im_points, ud_points, intrinsic_matrix, distortion_coeffs);
// ud_points have been "unintrinsiced", but we don't know the intrinsics, so reverse that
double fx = intrinsic_matrix.at<double>(0, 0);
double fy = intrinsic_matrix.at<double>(1, 1);
double cx = intrinsic_matrix.at<double>(0, 2);
double cy = intrinsic_matrix.at<double>(1, 2);
for (std::vector<cv::Point2f>::iterator iter = ud_points.begin(); iter != ud_points.end(); iter++) {
iter->x = iter->x * fx + cx;
iter->y = iter->y * fy + cy;
}
// Find a homography mapping the undistorted points to the known world points, ground plane
cv::Mat homography = cv::findHomography(ud_points, world_points);
// Transform the undistorted image points to the world points (2d only, but z is constant)
std::vector<cv::Point2f> estimated_world_points;
std::cout << "homography" << homography << std::endl;
cv::perspectiveTransform(ud_points, estimated_world_points, homography);
// Work out error
double sum_sq_error = 0;
for (int i = 0; i < 30; i++) {
double err_x = estimated_world_points.at(i).x - world_points.at(i).x;
double err_y = estimated_world_points.at(i).y - world_points.at(i).y;
sum_sq_error += err_x*err_x + err_y*err_y;
}
std::cout << "Sum squared error is: " << sum_sq_error << std::endl;
I would take random samples of the 30 input points, compute the homography in each case along with the errors under the estimated homographies (a RANSAC scheme), and verify the consensus between error levels and homography parameters; this can serve as a verification of the global optimisation process. I know that might seem unnecessary, but it is just a sanity check for how sensitive the procedure is to the input (noise levels, location).
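For illustration (not part of the original answer): OpenCV's findHomography already implements such a RANSAC scheme, so the sanity check can be as small as the sketch below, which uses synthetic correspondences laid out like the question's 30 world points.

import cv2
import numpy as np

# Synthetic stand-ins for the question's correspondences: world points on the
# same 10x3 grid, pushed through an arbitrary homography plus pixel noise.
rng = np.random.default_rng(0)
world_pts = np.array([[5.0 * (i // 3), 4.0 * (i % 3)] for i in range(30)], dtype=np.float32)
H_true = np.array([[2.0, 0.1, 1200.0],
                   [0.05, 2.2, 1430.0],
                   [0.0, 0.0, 1.0]])
im_pts = cv2.perspectiveTransform(world_pts.reshape(-1, 1, 2), H_true).reshape(-1, 2)
im_pts += rng.normal(scale=1.5, size=im_pts.shape).astype(np.float32)

# RANSAC: repeatedly fit homographies to random 4-point subsets, keep the largest consensus set
H, inlier_mask = cv2.findHomography(im_pts, world_pts, cv2.RANSAC, ransacReprojThreshold=2.0)
proj = cv2.perspectiveTransform(im_pts.reshape(-1, 1, 2), H).reshape(-1, 2)
print("inliers:", int(inlier_mask.sum()), "of", len(im_pts))
print("RMS error:", float(np.sqrt(np.mean(np.sum((proj - world_pts) ** 2, axis=1)))))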
Also, it seems logical that fixing most of the variables gets you the least error, as there are fewer degrees of freedom in the minimization process. I would try fixing different ones to establish another consensus. At least this would let you know which variables are the most sensitive to the noise level of the input.
Hopefully, such a small section of the image would be close to the image centre as it will incur the least amount of lens distortion. Is using a different distortion model possible in your case? A more viable way is to adapt the number of distortion parameters given the position of the pattern with respect to the image centre.
Without knowing the constraints of the algorithm, I might have misunderstood the question; that is also a possibility, in which case I can roll back.
I would like to have this as a comment rather, but I do not have enough points.
OpenCV runs the Levenberg-Marquardt algorithm inside calibrateCamera.
https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm/
This algorithm works fine in problems with one minimum. In the case of a single image, points located close to each other, and a high-dimensional problem (n = number of coefficients), the algorithm may be unstable (especially with a wrong initial guess of the camera matrix). The convergence of the algorithm is well described here:
https://na.math.kit.edu/download/papers/levenberg.pdf/
As you wrote, the error depends on the calibration flags - the number of flags changes the dimension of the problem being optimized.
Camera calibration also calculates the pose of the camera, which will be bad in models with a wrong calibration matrix.
As a solution I suggest changing the approach. You don't need to calculate the camera matrix and pose in this step. Since you know that the points are located on a plane, you can use the 3D-2D plane projection equation to determine the distribution of the points. By distribution I mean that all points will be located evenly on some kind of trapezoid.
Then you can use cv::undistort with different distCoeffs on your test image and calculate the image point distribution and the distribution error.
The last step is to use these steps as a target function for some optimization algorithm, with the distortion coefficients being optimized.
This is not the easiest solution, but I hope it will help you.
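A rough sketch of that last step (not from the original answer): it assumes SciPy for the optimiser, a guessed camera matrix for a 4000x3000 image, and that the 30 observed points are loaded from a placeholder file; candidate distortion coefficients are scored by how well a single homography maps the undistorted points onto the known planar world points.

import cv2
import numpy as np
from scipy.optimize import minimize

# Placeholder data: the 30 observed pixel points and their coplanar world positions.
world_pts = np.array([[5.0 * (i // 3), 4.0 * (i % 3)] for i in range(30)])
im_pts = np.loadtxt("image_points.txt")            # assumed file with 30 rows of x, y

K = np.array([[2800.0, 0.0, 2000.0],               # rough guess: focal length in pixels,
              [0.0, 2800.0, 1500.0],               # principal point at the image centre
              [0.0, 0.0, 1.0]])

def planar_fit_error(dist):
    # Undistort with the candidate coefficients (k1, k2, p1, p2, k3), then measure how
    # well one homography maps the undistorted points onto the planar world points.
    und = cv2.undistortPoints(im_pts.reshape(-1, 1, 2), K, dist.reshape(-1, 1), P=K)
    und = und.reshape(-1, 2)
    H, _ = cv2.findHomography(und, world_pts, 0)
    if H is None:
        return 1e12
    proj = cv2.perspectiveTransform(und.reshape(-1, 1, 2), H).reshape(-1, 2)
    return float(np.sum((proj - world_pts) ** 2))

res = minimize(planar_fit_error, x0=np.zeros(5), method="Nelder-Mead")
print("optimised distortion coefficients:", res.x)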

Writing robust (color and size invariant) circle detection with OpenCV (based on Hough transform or other features)

I wrote the following very simple python code to find circles in an image:
import cv
import numpy as np

WAITKEY_DELAY_MS = 10
STOP_KEY = 'q'

cv.NamedWindow("image - press 'q' to quit", cv.CV_WINDOW_AUTOSIZE)
cv.NamedWindow("post-process", cv.CV_WINDOW_AUTOSIZE)

key_pressed = False
while key_pressed != STOP_KEY:
    # grab image
    orig = cv.LoadImage('circles3.jpg')

    # create tmp images
    grey_scale = cv.CreateImage(cv.GetSize(orig), 8, 1)
    processed = cv.CreateImage(cv.GetSize(orig), 8, 1)

    cv.Smooth(orig, orig, cv.CV_GAUSSIAN, 3, 3)
    cv.CvtColor(orig, grey_scale, cv.CV_RGB2GRAY)

    # do some processing on the grey scale image
    cv.Erode(grey_scale, processed, None, 10)
    cv.Dilate(processed, processed, None, 10)
    cv.Canny(processed, processed, 5, 70, 3)
    cv.Smooth(processed, processed, cv.CV_GAUSSIAN, 15, 15)

    storage = cv.CreateMat(orig.width, 1, cv.CV_32FC3)

    # these parameters need to be adjusted for every single image
    HIGH = 50
    LOW = 140

    try:
        # extract circles
        cv.HoughCircles(processed, storage, cv.CV_HOUGH_GRADIENT, 2, 32.0, HIGH, LOW)

        for i in range(0, len(np.asarray(storage))):
            print "circle #%d" %i
            Radius = int(np.asarray(storage)[i][0][2])
            x = int(np.asarray(storage)[i][0][0])
            y = int(np.asarray(storage)[i][0][1])
            center = (x, y)

            # green dot on center and red circle around
            cv.Circle(orig, center, 1, cv.CV_RGB(0, 255, 0), -1, 8, 0)
            cv.Circle(orig, center, Radius, cv.CV_RGB(255, 0, 0), 3, 8, 0)

            cv.Circle(processed, center, 1, cv.CV_RGB(0, 255, 0), -1, 8, 0)
            cv.Circle(processed, center, Radius, cv.CV_RGB(255, 0, 0), 3, 8, 0)

    except:
        print "nothing found"
        pass

    # show images
    cv.ShowImage("image - press 'q' to quit", orig)
    cv.ShowImage("post-process", processed)

    cv_key = cv.WaitKey(WAITKEY_DELAY_MS)
    key_pressed = chr(cv_key & 255)
As you can see from the following two examples, the 'circle finding quality' varies quite a lot:
CASE1:
CASE2:
Case1 and Case2 are basically the same image, but still the algorithm detects different circles. If I present the algorithm an image with differently sized circles, the circle detection might even fail completely. This is mostly due to the HIGH and LOW parameters which need to be adjusted individually for each new picture.
Therefore my question: What are the various possibilities of making this algorithm more robust? It should be size and color invariant so that different circles with different colors and in different sizes are detected. Maybe using the Hough transform is not the best way of doing things? Are there better approaches?
The following is based on my experience as a vision researcher. From your question you seem to be interested in possible algorithms and methods rather than only a working piece of code. First I give a quick and dirty Python script for your sample images, and some results are shown to prove it could possibly solve your problem. After getting these out of the way, I try to answer your questions regarding robust detection algorithms.
Quick Results
Some sample images (all the images apart from yours are downloaded from flickr.com and are CC licensed) with the detected circles (without changing/tuning any parameters, exactly the following code is used to extract the circles in all the images):
Code (based on the MSER Blob Detector)
And here is the code:
import cv2
import math
import numpy as np
d_red = cv2.cv.RGB(150, 55, 65)
l_red = cv2.cv.RGB(250, 200, 200)
orig = cv2.imread("c.jpg")
img = orig.copy()
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
detector = cv2.FeatureDetector_create('MSER')
fs = detector.detect(img2)
fs.sort(key = lambda x: -x.size)
def supress(x):
    for f in fs:
        distx = f.pt[0] - x.pt[0]
        disty = f.pt[1] - x.pt[1]
        dist = math.sqrt(distx*distx + disty*disty)
        if (f.size > x.size) and (dist < f.size/2):
            return True

sfs = [x for x in fs if not supress(x)]

for f in sfs:
    cv2.circle(img, (int(f.pt[0]), int(f.pt[1])), int(f.size/2), d_red, 2, cv2.CV_AA)
    cv2.circle(img, (int(f.pt[0]), int(f.pt[1])), int(f.size/2), l_red, 1, cv2.CV_AA)
h, w = orig.shape[:2]
vis = np.zeros((h, w*2+5), np.uint8)
vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)
vis[:h, :w] = orig
vis[:h, w+5:w*2+5] = img
cv2.imshow("image", vis)
cv2.imwrite("c_o.jpg", vis)
cv2.waitKey()
cv2.destroyAllWindows()
As you can see it's based on the MSER blob detector. The code doesn't preprocess the image apart from the simple mapping into grayscale. Thus missing those faint yellow blobs in your images is expected.
Theory
In short: you don't tell us what you know about the problem apart from giving only two sample images with no description of them. Here I explain why, in my humble opinion, it is important to have more information about the problem before asking which methods are efficient for attacking it.
Back to the main question: what is the best method for this problem?
Let's look at this as a search problem. To simplify the discussion assume we are looking for circles with a given size/radius. Thus, the problem boils down to finding the centers. Every pixel is a candidate center, therefore, the search space contains all the pixels.
P = {p1, ..., pn}
P: search space
p1...pn: pixels
To solve this search problem two other functions should be defined:
E(P) : enumerates the search space
V(p) : checks whether the item/pixel has the desirable properties, the items passing the check are added to the output list
Assuming the complexity of the algorithm doesn't matter, the exhaustive or brute-force search can be used in which E takes every pixel and passes to V. In real-time applications it's important to reduce the search space and optimize computational efficiency of V.
We are getting closer to the main question. How could we define V - to be more precise, what properties of the candidates should be measured, and how should we solve the dichotomy problem of splitting them into desirable and undesirable? The most common approach is to find some properties which can be used to define simple decision rules based on the measurement of the properties. This is what you're doing by trial and error. You're programming a classifier by learning from positive and negative examples. This is because the methods you're using have no idea what you want to do. You have to adjust / tune the parameters of the decision rule and/or preprocess the data such that the variation in the properties (of the desirable candidates) used by the method for the dichotomy problem is reduced. You could use a machine learning algorithm to find the optimal parameter values for a given set of examples. There's a whole host of learning algorithms from decision trees to genetic programming you can use for this problem. You could also use a learning algorithm to find the optimal parameter values for several circle detection algorithms and see which one gives a better accuracy. This places the main burden on the learning algorithm; you just need to collect sample images.
The other approach to improve robustness which is often overlooked is to utilize extra readily available information. If you know the color of the circles with virtually zero extra effort you could improve the accuracy of the detector significantly. If you knew the position of the circles on the plane and you wanted to detect the imaged circles, you should remember the transformation between these two sets of positions is described by a 2D homography. And the homography can be estimated using only four points. Then you could improve the robustness to have a rock solid method. The value of domain-specific knowledge is often underestimated. Look at it this way, in the first approach we try to approximate some decision rules based on a limited number of sample. In the second approach we know the decision rules and only need to find a way to effectively utilize them in an algorithm.
Summary
To summarize, there are two approaches to improve the accuracy / robustness of the solution:
Tool-based: finding an easier to use algorithm / with fewer number of parameters / tweaking the algorithm / automating this process by using machine learning algorithms
Information-based: are you using all the readily available information? In the question you don't mention what you know about the problem.
For these two images you have shared I would use a blob detector, not the HT method. For background subtraction I would suggest trying to estimate the color of the background, as in the two images it is not varying while the color of the circles varies. And most of the area is bare.
This is a great modelling problem. I have the following recommendations/ ideas:
Split the image to RGB then process.
pre-processing.
Dynamic parameter search.
Add constraints.
Be sure about what you are trying to detect.
In more detail:
1: As noted in other answers, converting straight to grayscale discards too much information - any circles with a similar brightness to the background will be lost. Much better to consider the colour channels either in isolation or in a different colour space. There are pretty much two ways to go here: perform HoughCircles on each pre-processed channel in isolation, then combine results, or, process the channels, then combine them, then operate HoughCircles. In my attempt below, I've tried the second method, splitting to RGB channels, processing, then combining. Be wary of over saturating the image when combining, I use cv.And to avoid this issue (at this stage my circles are always black rings/discs on white background).
2: Pre-processing is quite tricky, and something it's often best to play around with. I've made use of AdaptiveThreshold, which is a really powerful convolution method that can enhance edges in an image by thresholding pixels based on their local average (similar processes also occur in the early pathway of the mammalian visual system). This is also useful as it reduces some noise. I've used dilate/erode with only one pass. And I've kept the other parameters how you had them. It seems using Canny before HoughCircles does help a lot with finding 'filled circles', so it's probably best to keep it in. This pre-processing is quite heavy and can lead to false positives with somewhat more 'blobby circles', but in our case this is perhaps desirable?
3: As you've noted HoughCircles parameter param2 (your parameter LOW) needs to be adjusted for each image in order to get an optimal solution, in fact from the docs:
The smaller it is, the more false circles may be detected.
Trouble is, the sweet spot is going to be different for every image. I think the best approach here is to set a condition and do a search through different param2 values until this condition is met. Your images show non-overlapping circles, and when param2 is too low we typically get loads of overlapping circles. So I suggest searching for the:
maximum number of non-overlapping, and non-contained circles
So we keep calling HoughCircles with different values of param2 until this is met. I do this in my example below, just by incrementing param2 until it reaches the threshold assumption. It would be way faster (and fairly easy to do) if you performed a binary search to find when this is met, but you need to be careful with exception handling as opencv often throws errors for innocent-looking values of param2 (at least on my installation). A different condition that would be very useful to match against would be the number of circles.
4: Are there any more constraints we can add to the model? The more stuff we can tell our model, the easier the task of detecting circles becomes. For example, do we know:
The number of circles. - even an upper or lower bound is helpful.
Possible colours of the circles, or of the background, or of 'non-circles'.
Their sizes.
Where they can be in an image.
5: Some of the blobs in your images could only loosely be called circles! Consider the two 'non-circular blobs' in your second image; my code can't find them (good!), but... if I 'photoshop' them so they are more circular, my code can find them... Maybe if you want to detect things that are not circles, a different approach such as Tim Lukins' may be better.
Problems
By doing heavy pre-processing (AdaptiveThreshold and Canny) there can be a lot of distortion to features in an image, which may lead to false circle detection or incorrect radius reporting. For example, a large solid disc after processing can appear as a ring, so HoughCircles may find the inner ring. Furthermore, even the docs note that:
...usually the function detects the circles’ centers well, however it may fail to find the correct radii.
If you need more accurate radii detection, I suggest the following approach (not implemented here, though a rough sketch follows the list):
On the original image, ray-trace from reported centre of circle, in an expanding cross (4 rays: up/down/left/right)
Do this separately in each RGB channel
Combine this info for each channel for each ray in a sensible fashion (ie. flip, offset, scale, etc as necessary)
take the average for the first few pixels on each ray, use this to detect where a significant deviation on the ray occurs.
These 4 points are estimates of points on the circumference.
Use these four estimates to determine a more accurate radius, and centre position(!).
This could be generalised by using an expanding ring instead of four rays.
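A rough, unverified sketch of that ray-trace idea (the warm-up length and deviation test are guesses, and the per-channel combination here is simply "any channel deviates"):

import numpy as np

def refine_circle(img_bgr, cx, cy, max_len=200, warmup=5, k=3.0):
    # Walk right/left/down/up from the reported centre; stop a ray where the pixel
    # deviates strongly (in any colour channel) from the first few samples on that ray.
    h, w = img_bgr.shape[:2]
    hits = []
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        samples = []
        for step in range(1, max_len):
            x, y = int(cx + dx * step), int(cy + dy * step)
            if not (0 <= x < w and 0 <= y < h):
                break
            px = img_bgr[y, x].astype(np.float32)
            if step <= warmup:
                samples.append(px)
                continue
            mean = np.mean(samples, axis=0)
            std = np.std(samples, axis=0) + 1.0
            if np.any(np.abs(px - mean) > k * std):   # significant deviation: assume the rim
                hits.append((x, y))
                break
    if len(hits) < 3:
        return None                                   # not enough circumference estimates
    hits = np.array(hits, dtype=np.float32)
    centre = hits.mean(axis=0)                        # refined centre estimate
    radius = float(np.mean(np.linalg.norm(hits - centre, axis=1)))
    return (float(centre[0]), float(centre[1])), radius

Generalising the four rays into a dense fan (or the expanding ring mentioned above) gives more circumference samples and would let a least-squares circle fit replace the simple mean.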
Results
The code at the end does pretty well quite a lot of the time; these examples were done with the code as shown:
Detects all circles in your first image:
How the pre-processed image looks before canny filter is applied (different colour circles are highly visible):
Detects all but two (blobs) in second image:
Altered second image (blobs are circle-afied, and large oval made more circular, thus improving detection), all detected:
Does pretty well in detecting centres in this Kandinsky painting (I cannot find the concentric rings due to the boundary condition).
Code:
import cv
import numpy as np
output = cv.LoadImage('case1.jpg')
orig = cv.LoadImage('case1.jpg')
# create tmp images
rrr=cv.CreateImage((orig.width,orig.height), cv.IPL_DEPTH_8U, 1)
ggg=cv.CreateImage((orig.width,orig.height), cv.IPL_DEPTH_8U, 1)
bbb=cv.CreateImage((orig.width,orig.height), cv.IPL_DEPTH_8U, 1)
processed = cv.CreateImage((orig.width,orig.height), cv.IPL_DEPTH_8U, 1)
storage = cv.CreateMat(orig.width, 1, cv.CV_32FC3)
def channel_processing(channel):
    cv.AdaptiveThreshold(channel, channel, 255, adaptive_method=cv.CV_ADAPTIVE_THRESH_MEAN_C, thresholdType=cv.CV_THRESH_BINARY, blockSize=55, param1=7)
    #mop up the dirt
    cv.Dilate(channel, channel, None, 1)
    cv.Erode(channel, channel, None, 1)

def inter_centre_distance(x1,y1,x2,y2):
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

def colliding_circles(circles):
    for index1, circle1 in enumerate(circles):
        for circle2 in circles[index1+1:]:
            x1, y1, Radius1 = circle1[0]
            x2, y2, Radius2 = circle2[0]
            #collision or containment:
            if inter_centre_distance(x1,y1,x2,y2) < Radius1 + Radius2:
                return True

def find_circles(processed, storage, LOW):
    try:
        cv.HoughCircles(processed, storage, cv.CV_HOUGH_GRADIENT, 2, 32.0, 30, LOW) #, 0, 100) great to add circle constraint sizes.
    except:
        LOW += 1
        print 'try'
        find_circles(processed, storage, LOW)
    circles = np.asarray(storage)
    print 'number of circles:', len(circles)
    if colliding_circles(circles):
        LOW += 1
        storage = find_circles(processed, storage, LOW)
    print 'c', LOW
    return storage

def draw_circles(storage, output):
    circles = np.asarray(storage)
    print len(circles), 'circles found'
    for circle in circles:
        Radius, x, y = int(circle[0][2]), int(circle[0][0]), int(circle[0][1])
        cv.Circle(output, (x, y), 1, cv.CV_RGB(0, 255, 0), -1, 8, 0)
        cv.Circle(output, (x, y), Radius, cv.CV_RGB(255, 0, 0), 3, 8, 0)
#split image into RGB components
cv.Split(orig,rrr,ggg,bbb,None)
#process each component
channel_processing(rrr)
channel_processing(ggg)
channel_processing(bbb)
#combine images using logical 'And' to avoid saturation
cv.And(rrr, ggg, rrr)
cv.And(rrr, bbb, processed)
cv.ShowImage('before canny', processed)
# cv.SaveImage('case3_processed.jpg',processed)
#use canny, as HoughCircles seems to prefer ring like circles to filled ones.
cv.Canny(processed, processed, 5, 70, 3)
#smooth to reduce noise a bit more
cv.Smooth(processed, processed, cv.CV_GAUSSIAN, 7, 7)
cv.ShowImage('processed', processed)
#find circles, with parameter search
storage = find_circles(processed, storage, 100)
draw_circles(storage, output)
# show images
cv.ShowImage("original with circles", output)
cv.SaveImage('case1.jpg',output)
cv.WaitKey(0)
Ah, yes… the old colour/size invariants for circles problem (AKA the Hough transform is too specific and not robust)...
In the past I have relied much more on the structural and shape analysis functions of OpenCV instead. You can get a very good idea from the "samples" folder of what is possible - particularly fitellipse.py and squares.py.
For your elucidation, I present a hybrid version of these examples and based on your original source. The contours detected are in green and the fitted ellipses in red.
It's not quite there yet:
The pre-processing steps need a bit of tweaking to detect the more faint circles.
You could test the contour further to determine if it is a circle or not...
Good luck!
import cv
import numpy as np
# grab image
orig = cv.LoadImage('circles3.jpg')
# create tmp images
grey_scale = cv.CreateImage(cv.GetSize(orig), 8, 1)
processed = cv.CreateImage(cv.GetSize(orig), 8, 1)
cv.Smooth(orig, orig, cv.CV_GAUSSIAN, 3, 3)
cv.CvtColor(orig, grey_scale, cv.CV_RGB2GRAY)
# do some processing on the grey scale image
cv.Erode(grey_scale, processed, None, 10)
cv.Dilate(processed, processed, None, 10)
cv.Canny(processed, processed, 5, 70, 3)
cv.Smooth(processed, processed, cv.CV_GAUSSIAN, 15, 15)
#storage = cv.CreateMat(orig.width, 1, cv.CV_32FC3)
storage = cv.CreateMemStorage(0)
contours = cv.FindContours(processed, storage, cv.CV_RETR_EXTERNAL)
# N.B. 'processed' image is modified by this!
#contours = cv.ApproxPoly (contours, storage, cv.CV_POLY_APPROX_DP, 3, 1)
# If you wanted to reduce the number of points...
cv.DrawContours (orig, contours, cv.RGB(0,255,0), cv.RGB(255,0,0), 2, 3, cv.CV_AA, (0, 0))
def contour_iterator(contour):
    while contour:
        yield contour
        contour = contour.h_next()

for c in contour_iterator(contours):
    # Number of points must be more than or equal to 6 for cv.FitEllipse2
    if len(c) >= 6:
        # Copy the contour into an array of (x,y)s
        PointArray2D32f = cv.CreateMat(1, len(c), cv.CV_32FC2)
        for (i, (x, y)) in enumerate(c):
            PointArray2D32f[0, i] = (x, y)

        # Fits ellipse to current contour.
        (center, size, angle) = cv.FitEllipse2(PointArray2D32f)

        # Convert ellipse data from float to integer representation.
        center = (cv.Round(center[0]), cv.Round(center[1]))
        size = (cv.Round(size[0] * 0.5), cv.Round(size[1] * 0.5))

        # Draw ellipse
        cv.Ellipse(orig, center, size, angle, 0, 360, cv.RGB(255,0,0), 2, cv.CV_AA, 0)
# show images
cv.ShowImage("image - press 'q' to quit", orig)
#cv.ShowImage("post-process", processed)
cv.WaitKey(-1)
EDIT:
Just an update to say that I believe a major theme of all these answers is that there are a host of further assumptions and constraints that can be applied to what you seek to recognise as circular. My own answer makes no pretences at this - neither in the low-level pre-processing nor the high-level geometric fitting. The fact that many of the circles are not really that round, due to the way they are drawn or the non-affine/projective transforms of the image, together with the other properties of how they are rendered/captured (colour, noise, lighting, edge thickness), all results in any number of possible candidate circles within just one image.
There are much more sophisticated techniques. But they will cost you. Personally I like @fraxel's idea of using the adaptive threshold. That is fast, reliable and reasonably robust. You can then further test the final contours (e.g. use Hu moments) or fittings with a simple ratio test of the ellipse axes - e.g. if ((min(size)/max(size))>0.7).
As ever with computer vision there is the tension between pragmatism, principle, and parsimony. As I am fond of telling people who think that CV is easy, it is not - it is in fact famously an AI-complete problem. The best you can often hope for outside of this is something that works most of the time.
Looking through your code, I noticed the following:
Greyscale conversion. I understand why you're doing it, but realize that you're throwing away information there. As you see in the "post-process" images, your yellow circles are the same intensity as the background, just in a different color.
Edge detection after noise removal (erode/dilate). This shouldn't be necessary; Canny ought to take care of this.
Canny edge detection. Your "open" circles have two edges, an inner and outer edge. Since they're fairly close, the Canny gauss filter might add them together. If it doesn't, you'll have two edges close together. I.e. before Canny, you have open and filled circles. Afterwards, you have 0/2 and 1 edge, respectively. Since Hough calls Canny again, in the first case the two edges might be smoothed together (depending on the initial width), which is why the core Hough algorithm can treat open and filled circles the same.
So, my first recommendation would be to change the grayscale mapping. Don't use intensity, but use hue/saturation/value. Also, use a differential approach - you're looking for edges. So, compute a HSV transform, smooth a copy, and then take the difference between the original and smoothed copy. This will get you dH, dS, dV values (local variation in Hue, Saturation, Value) for each point. Square and add to get a one-dimensional image, with peaks near all edges (inner and outer).
My second recommendation would be local normalization, but I'm not sure if that's even necessary. The idea is that you don't care particularly much about the exact value of the edge signal you got out, it should really be binary anyway (edge or not). Therefore, you can normalize each value by dividing by a local average (where local is in the order of magnitude of your edge size).
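A small sketch of those two recommendations (not from the original answer): kernel sizes are arbitrary, and note that OpenCV's 8-bit hue wraps at 180, which this naive difference ignores.

import cv2
import numpy as np

img = cv2.imread("case1.jpg")                          # one of the question's test images
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)

# local variation per channel (dH, dS, dV): original minus a smoothed copy
blurred = cv2.GaussianBlur(hsv, (9, 9), 0)
diff = hsv - blurred
edge = np.sqrt(np.sum(diff * diff, axis=2))            # square and add -> one-channel edge signal

# local normalisation: divide by a local average so the result is closer to edge / no-edge
local_avg = cv2.GaussianBlur(edge, (31, 31), 0) + 1e-3
edge_norm = np.clip(edge / local_avg, 0, 4)

out = cv2.normalize(edge_norm, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("hsv_edges.png", out)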
The Hough transform uses a "model" to find certain features in a (typically) edge-detected image, as you may know. In the case of HoughCircles that model is a perfect circle. This means there probably doesn't exist a combination of parameters that will make it detect the more erratically and ellipse shaped circles in your picture without increasing the number of false positives. On the other hand, due to the underlying voting mechanism, a non-closed perfect circle or a perfect circle with a "dent" might consistently show up. So depending on your expected output you may or may not want to use this method.
That said, there are a few things I see which might help you on your way with this function:
HoughCircles calls Canny internally, so I guess you can leave that call out.
param1 (which you call HIGH) is typically initialised around a value of 200. It is used as a parameter to the internal call to Canny: cv.Canny(processed, cannied, HIGH, HIGH/2). It might help to run Canny yourself like this to see how setting HIGH affects the image being worked with by the Hough transform.
param2 (which you call LOW) is typically initialised around a value 100. It is the voting threshold for the Hough transform's accumulators. Setting it higher means more false negatives, lower more false positives. I believe this is the first one you want to start fiddling around with.
Ref: http://docs.opencv.org/3.0-beta/modules/imgproc/doc/feature_detection.html#houghcircles
Update re: filled circles: After you've found the circle shapes with the Hough transform you can test if they are filled by sampling the boundary colour and comparing it to one or more points inside the supposed circle. Alternatively you can compare one or more points inside the supposed circle to a given background colour. The circle is filled if the former comparison succeeds, or in the case of the alternative comparison if it fails.
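A minimal sketch of that filled-circle test (not from the original answer), comparing a sample near the centre against a sample on the boundary of a detected circle; the tolerance is arbitrary:

def looks_filled(gray, cx, cy, r, tol=20):
    # If the centre and a point on the rim have similar intensity, the circle is
    # probably filled; compare against a background sample instead for the alternative test.
    inside = int(gray[int(cy), int(cx)])
    on_rim = int(gray[int(cy), min(int(cx + r), gray.shape[1] - 1)])
    return abs(inside - on_rim) < tol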
OK, looking at the images, I suggest using Active Contours.
Active Contours
The good thing about active contours is that they almost perfectly fit any given shape, be it squares or triangles, and in your case they are the perfect candidates.
If you are able to extract the centres of the circles, that is great. Active contours always need a point to start from, which they can either grow or shrink from to fit. It is not necessary that the start points are exactly at the centres; a little offset will still be OK.
And in your case, if you let the contours grow from the centre outwards, they shall rest at the circle boundaries.
Note that active contours that grow or shrink use balloon energy which means you can set the direction of contours, inwards or outwards.
You would probably need to use the gradient image in grey scale. But still you can try in colour as well. If it works!
And if you do not provide centres, throw in lots of active contours and make them grow/shrink. Contours that settle down are kept; unsettled ones are thrown away. This is a brute-force approach. It will be CPU intensive, and it will require more careful work to make sure you keep the correct contours and throw out the bad ones.
I hope this way you can solve the problem.
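For reference (not from the original answer): modern OpenCV no longer ships the legacy cvSnakeImage, so one practical way to try this is scikit-image's morphological snakes, which expose the balloon force described above. The seed point, radius, iteration count and threshold below are placeholders.

import numpy as np
from skimage import color, io
from skimage.segmentation import inverse_gaussian_gradient, morphological_geodesic_active_contour

img = color.rgb2gray(io.imread("case1.jpg"))     # one of the question's test images
gimage = inverse_gaussian_gradient(img)          # edge-stopping function: small near edges

h, w = img.shape
cy, cx, r0 = h // 2, w // 2, 5                   # assumed seed near a circle centre
yy, xx = np.ogrid[:h, :w]
init = ((yy - cy) ** 2 + (xx - cx) ** 2 <= r0 ** 2).astype(np.int8)

# balloon > 0 pushes the contour outwards until it settles on the circle boundary
level_set = morphological_geodesic_active_contour(gimage, 200, init_level_set=init,
                                                  smoothing=1, balloon=1, threshold=0.7)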