This should be straightforward, but I am not very familiar with OpenCV.
Can someone suggest a method to measure the distance in pixels (red line) as shown in the image below? Preferably it would have some options, such as the width of the measurement (as demonstrated at the beginning and end of the red line). This kind of measurement is very common in software like ImageJ, so I imagine it should be fairly trivial to do in OpenCV.
I would also like to take several samples across the image width.
Greets
I am using openCV and learning about it
Your task is quite simple.
optional smoothing (Gauss filter) - you have to experiment with your data to see if it helps
edge detection (will transform image to lines representing edges) - for example cv::Canny
Hough transform to detect lines - OpenCV provides this, for example cv::HoughLines.
Find two maximum values (longest lines) in Hough transform
you will have two equations of straight lines, then you can use this information to calculate distance between them
Note that with this approach the image doesn't have to be straight. You will have line equations which you have to manipulate in a smart way. If the two lines are parallel, there is a simple formula for the distance between them. If they are not perfectly parallel, you have to take this into account and use information about the image area to get an average distance.
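As a rough sketch of that last step (not code from the answer above): assuming each detected line is given in the (rho, theta) normal form that cv::HoughLines returns, the perpendicular distance can be averaged over the image width. The function names and the `samples` parameter are made up for illustration, and only the Python standard library is used. For two lines with the same theta the result reduces to |rho1 - rho2|.

```python
import math


def point_to_line_distance(x, y, rho, theta):
    """Perpendicular distance from (x, y) to the line x*cos(theta) + y*sin(theta) = rho."""
    return abs(x * math.cos(theta) + y * math.sin(theta) - rho)


def mean_line_distance(line_a, line_b, width, samples=50):
    """Average distance from points on line_a to line_b, sampled across the image width.

    line_a and line_b are (rho, theta) tuples. line_a must not be vertical,
    because the sampling walks along x (an assumption of this sketch).
    """
    rho_a, theta_a = line_a
    rho_b, theta_b = line_b
    sin_a = math.sin(theta_a)
    if abs(sin_a) < 1e-9:
        raise ValueError("line_a is vertical; sample along y instead")
    if samples < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for k in range(samples):
        x = k * (width - 1) / (samples - 1)
        # Solve the normal form of line_a for y at this x.
        y = (rho_a - x * math.cos(theta_a)) / sin_a
        total += point_to_line_distance(x, y, rho_b, theta_b)
    return total / samples
```

Because the distance is measured from sampled points rather than from the rho values directly, it also copes with two nearly parallel lines whose theta values differ slightly.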
A simple way to find the width of the channel would be the following:
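distance = []
h = img.shape[0]
for j in range(img.shape[1]):  # scan every column of the image
    line_top = 0
    line_bottom = img.shape[0]
    found_top = False
    found_bottom = False
    for i in range(h):
        # first non-zero pixel seen when walking down from the top
        if img[i,j,0] > 0 and not found_top:
            line_top = i
            found_top = True
        # first non-zero pixel seen when walking up from the bottom
        # (stored as an exclusive index, one past that pixel)
        if img[h-i-1,j,0] > 0 and not found_bottom:
            line_bottom = h-i
            found_bottom = True
        if found_top and found_bottom:
            distance.append(line_bottom-line_top)
            break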
But this would cause the distance to take into account the very small white speckles.
To solve this there are several options:
Preprocess the image using an OpenCV morphological transformation.
Preprocess the image using an OpenCV Gaussian filter or similar.
Update the code to use a larger window.
Another solution would be to apply OpenCV's findContours.
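One possible reading of the "larger window" option, as a sketch only: accept an edge only where at least `min_run` consecutive pixels are non-zero, so isolated speckles are skipped. The helper names and the `min_run` parameter are made up for illustration, and a column is assumed to be a plain sequence of pixel values from top to bottom.

```python
def first_run_start(column, min_run):
    """Index where the first run of at least min_run non-zero pixels starts, or None."""
    run = 0
    for i, value in enumerate(column):
        run = run + 1 if value > 0 else 0
        if run == min_run:
            return i - min_run + 1
    return None


def channel_width(column, min_run=3):
    """Distance between the first solid edge from the top and from the bottom.

    Runs shorter than min_run are treated as speckles and ignored.
    Returns None if no sufficiently long run is found.
    """
    top = first_run_start(column, min_run)
    from_bottom = first_run_start(column[::-1], min_run)
    if top is None or from_bottom is None:
        return None
    bottom = len(column) - from_bottom  # exclusive index, as in the loop above
    return bottom - top
```

With `min_run=1` this behaves like the plain first-pixel scan; larger values trade sensitivity to thin edges for robustness against noise.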
There is a MATLAB example that matches two images and outputs the rotation and scale:
https://de.mathworks.com/help/vision/examples/find-image-rotation-and-scale-using-automated-feature-matching.html?requestedDomain=www.mathworks.com
My goal is to recreate this example using C++. I am using the same method of keypoint detection (Harris) and the keypoints seem to be mostly identical to the ones Matlab finds. So far so good.
cv::goodFeaturesToTrack(image_grayscale, corners, number_of_keypoints, 0.01, 5, mask, 3, true, 0.04);
for (int i = 0; i < corners.size(); i++) {
    keypoints.push_back(cv::KeyPoint(corners[i], 5));
}
BRISK is used to extract features from the keypoints.
int Threshl = 120;
int Octaves = 8;
float PatternScales = 1.0f;
cv::Ptr<cv::Feature2D> extractor = cv::BRISK::create(Threshl, Octaves, PatternScales);
extractor->compute(image, mykeypoints, descriptors);
These descriptors are then matched using cv::FlannBasedMatcher.
cv::FlannBasedMatcher matcher;
matcher.match(descriptors32A, descriptors32B, matches);
Now the problem is that about 80% of my matches are wrong and unusable. For the identical set of images Matlab returns only a couple of matches from which only ~20% are wrong. I have tried sorting the Matches in C++ based on their distance value with no success. The values range between 300 and 700 and even the matches with the lowest distance are almost entirely incorrect.
Now 20% of good matches are enough to calculate the offset but a lot of processing power is wasted on checking wrong matches. What would be a better way to sort the correct matches or is there something obvious I am doing wrong?
EDIT:
I have switched from Harris/BRISK to AKAZE, which seems to deliver much better features and matches that can easily be sorted by their distance value. The only downside is the much higher computation time. With two 1000px wide images AKAZE needs half a minute to find the keypoints (on a PC). I reduced this by scaling down the images, which brings it to an acceptable ~3-5 seconds.
The method you are using finds a nearest neighbour for every point, no matter how far away it is. Two strategies are common:
1. Match set A to set B and set B to set A, and keep only the matches which exist in both matchings.
2. Use knnMatch with k = 2 and perform a ratio check, i.e. keep only the matches where the 1st nearest neighbour is a lot closer than the 2nd, e.g. d1 < 0.8 * d2.
The MATLAB code uses SURF. OpenCV also provides SURF, SIFT and AKAZE; try one of these. SURF in particular would be interesting for a comparison.
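To make the two strategies concrete, here is a small library-free sketch. It assumes a precomputed distance table where `dist[i][j]` is the descriptor distance between keypoint i of set A and keypoint j of set B; the function names are made up for illustration. In OpenCV the counterparts would be the crossCheck option of cv::BFMatcher and knnMatch with k = 2.

```python
def cross_check(dist):
    """Keep (i, j) only if j is the best match for i AND i is the best match for j."""
    best_b_for_a = [min(range(len(row)), key=row.__getitem__) for row in dist]
    n_b = len(dist[0])
    best_a_for_b = [min(range(len(dist)), key=lambda i: dist[i][j]) for j in range(n_b)]
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]


def ratio_test(dist, ratio=0.8):
    """Keep (i, j) only if the nearest neighbour is clearly closer than the second nearest."""
    matches = []
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=row.__getitem__)
        if len(order) < 2:
            continue  # a ratio needs two candidates
        d1, d2 = row[order[0]], row[order[1]]
        if d1 < ratio * d2:
            matches.append((i, order[0]))
    return matches
```

Both filters discard ambiguous matches before any geometric verification, so less time is spent checking wrong pairs afterwards.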
I am trying to find the exact number of neighbour nodes in a big 3D point dataset. The goal is to retrieve, for each point of the dataset, all the possible neighbours within a given radius. FLANN is supposed to return the exact neighbours for low-dimensional data, but when I compare against a brute-force search this does not seem to be the case. The neighbours are essential for further calculations and therefore I need the exact number. I tested increasing the radius a little, but that does not seem to be the problem. Is anyone aware of how to calculate the exact neighbours with FLANN or another C++ library?
The code:
// All nodes to be tested for inclusion in support domain.
flann::Matrix<double> query_nodes = flann::Matrix<double>(&nodes_pos[0].x, nodes_pos.size(), 3);
// Set default search parameters
flann::SearchParams search_parameters = flann::SearchParams();
search_parameters.checks = -1;
search_parameters.sorted = false;
search_parameters.use_heap = flann::FLANN_True;
flann::KDTreeSingleIndexParams index_parameters = flann::KDTreeSingleIndexParams();
flann::KDTreeSingleIndex<flann::L2_3D<double> > index(query_nodes, index_parameters);
index.buildIndex();
// FLANN's L2 distances are squared, so the search radius is squared as well.
double l2_radius = (this->support_layer_*grid.spacing)*(this->support_layer_*grid.spacing);
double extension = l2_radius/10.;
l2_radius+= extension;
index.radiusSearch(query_nodes, indices, dists, l2_radius, search_parameters);
Try nanoflann. It is designed for low dimensional spaces and gives exact nearest neighbors. Furthermore, it is just one header file that you can either "install" or just copy to your project.
You should check page 6 onwards of the FLANN manual to fine-tune your search parameters, such as target_precision, which should be set to 1 for "maximum" accuracy.
That parameter often appears as epsilon (ε) in Approximate Nearest Neighbor Search (ANNS), which is used in high-dimensional spaces to try to beat the curse of dimensionality. As far as I can tell, FLANN is usually used in around 128 dimensions, not 3, which may explain the bad performance you are experiencing.
A C++ library that works well in 3 dimensions is CGAL. However, it is much larger than FLANN, because it is a library for computational geometry and so provides functionality for many problems, not just NNS.
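Whichever library ends up being used, a brute-force reference makes the comparison from the question reproducible on a small subset. The sketch below is only an illustration (the function name is made up, and it is O(n²), so not meant for the full dataset). It assumes the radius is inclusive and that a point does not count as its own neighbour; both choices may differ from what a given library does, which is worth checking when the counts disagree.

```python
def brute_force_radius_neighbours(points, radius):
    """For every 3D point, list the indices of all other points within `radius`.

    Compares squared distances, so no square root is needed.
    """
    r2 = radius * radius
    neighbours = [[] for _ in points]
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 <= r2:
                # The relation is symmetric, so record it for both points.
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours
```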
I'm getting results I don't expect when I use OpenCV 3.0 calibrateCamera. Here is my algorithm:
Load in 30 image points
Load in 30 corresponding world points (coplanar in this case)
Use points to calibrate the camera, just for un-distorting
Un-distort the image points, but don't use the intrinsics (coplanar world points, so intrinsics are dodgy)
Use the undistorted points to find a homography, transforming to world points (can do this because they are all coplanar)
Use the homography and perspective transform to map the undistorted points to the world space
Compare the original world points to the mapped points
The points I have are noisy and only a small section of the image. There are 30 coplanar points from a single view so I can't get camera intrinsics, but should be able to get distortion coefficients and a homography to create a fronto-parallel view.
As expected, the error varies depending on the calibration flags. However, it varies opposite to what I expected. If I allow all variables to adjust, I would expect error to come down. I am not saying I expect a better model; I actually expect over-fitting, but that should still reduce error. What I see though is that the fewer variables I use, the lower my error. The best result is with a straight homography.
I have two suspected causes, but they seem unlikely and I'd like to hear an unadulterated answer before I air them. I have pulled out the code to just do what I'm talking about. It's a bit long, but it includes loading the points.
The code doesn't appear to have bugs; I've used "better" points and it works perfectly. I want to emphasize that the solution here can't be to use better points or perform a better calibration; the whole point of the exercise is to see how the various calibration models respond to different qualities of calibration data.
Any ideas?
Added
To be clear, I know the results will be bad and I expect that. I also understand that I may learn bad distortion parameters, which leads to worse results when testing points that have not been used to train the model. What I don't understand is how the distortion model has more error when the training set is used as the test set. That is, cv::calibrateCamera is supposed to choose parameters that reduce the error over the training points provided, yet it produces more error than if it had just selected 0s for K1, K2, ... K6, P1, P2. Bad data or not, it should at least do better on the training set. Before I can say the data is not appropriate for this model, I have to be sure I'm doing the best I can with the data available, and I can't say that at this stage.
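For reference, this is a sketch of the radial/tangential model that OpenCV documents for the five coefficients (k1, k2, p1, p2, k3), applied to normalized image coordinates. The rational terms k4..k6 are left out and the function names are made up for illustration. With every coefficient at zero the mapping is the identity, which is the "straight homography" case. The inverse uses a simple fixed-point iteration, which is an assumption for illustration rather than a description of what cv::undistortPoints does internally.

```python
def distort_normalized(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to a normalized point."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def undistort_normalized(xd, yd, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0, iterations=20):
    """Invert distort_normalized by fixed-point iteration (fine for mild distortion)."""
    x, y = xd, yd  # start from the distorted point
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```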
Here is an example image
The points with the green pins are marked. This is obviously just a test image.
Here is more example stuff
In the following the image is cropped from the big one above. The centre has not changed. This is what happens when I undistort with just the points marked manually from the green pins and allowing K1 (only K1) to vary from 0:
Before
After
I would put it down to a bug, but when I use a larger set of points that covers more of the screen, even from a single plane, it works reasonably well. This looks terrible. However, the error is not nearly as bad as you might think from looking at the picture.
// Load image points
std::vector<cv::Point2f> im_points;
im_points.push_back(cv::Point2f(1206, 1454));
im_points.push_back(cv::Point2f(1245, 1443));
im_points.push_back(cv::Point2f(1284, 1429));
im_points.push_back(cv::Point2f(1315, 1456));
im_points.push_back(cv::Point2f(1352, 1443));
im_points.push_back(cv::Point2f(1383, 1431));
im_points.push_back(cv::Point2f(1431, 1458));
im_points.push_back(cv::Point2f(1463, 1445));
im_points.push_back(cv::Point2f(1489, 1432));
im_points.push_back(cv::Point2f(1550, 1461));
im_points.push_back(cv::Point2f(1574, 1447));
im_points.push_back(cv::Point2f(1597, 1434));
im_points.push_back(cv::Point2f(1673, 1463));
im_points.push_back(cv::Point2f(1691, 1449));
im_points.push_back(cv::Point2f(1708, 1436));
im_points.push_back(cv::Point2f(1798, 1464));
im_points.push_back(cv::Point2f(1809, 1451));
im_points.push_back(cv::Point2f(1819, 1438));
im_points.push_back(cv::Point2f(1925, 1467));
im_points.push_back(cv::Point2f(1929, 1454));
im_points.push_back(cv::Point2f(1935, 1440));
im_points.push_back(cv::Point2f(2054, 1470));
im_points.push_back(cv::Point2f(2052, 1456));
im_points.push_back(cv::Point2f(2051, 1443));
im_points.push_back(cv::Point2f(2182, 1474));
im_points.push_back(cv::Point2f(2171, 1459));
im_points.push_back(cv::Point2f(2164, 1446));
im_points.push_back(cv::Point2f(2306, 1474));
im_points.push_back(cv::Point2f(2292, 1462));
im_points.push_back(cv::Point2f(2278, 1449));
// Create corresponding world / object points
std::vector<cv::Point3f> world_points;
for (int i = 0; i < 30; i++) {
    world_points.push_back(cv::Point3f(5 * (i / 3), 4 * (i % 3), 0.0f));
}
// Perform calibration
// Flags are set out so they can be commented out and "freed" easily
int calibration_flags = 0
    | cv::CALIB_FIX_K1
    | cv::CALIB_FIX_K2
    | cv::CALIB_FIX_K3
    | cv::CALIB_FIX_K4
    | cv::CALIB_FIX_K5
    | cv::CALIB_FIX_K6
    | cv::CALIB_ZERO_TANGENT_DIST
    | 0;
// Initialise the intrinsic matrix as a 3x3 identity of doubles
cv::Mat intrinsic_matrix = cv::Mat::eye(3, 3, CV_64F);
cv::Mat distortion_coeffs = cv::Mat::zeros(5, 1, CV_64F);
// Rotation and translation vectors
std::vector<cv::Mat> undistort_rvecs;
std::vector<cv::Mat> undistort_tvecs;
// Wrap in an outer vector for calibration
std::vector<std::vector<cv::Point2f>>im_points_v(1, im_points);
std::vector<std::vector<cv::Point3f>>w_points_v(1, world_points);
// Calibrate; only 1 plane, so intrinsics can't be trusted
cv::Size image_size(4000, 3000);
calibrateCamera(w_points_v, im_points_v,
    image_size, intrinsic_matrix, distortion_coeffs,
    undistort_rvecs, undistort_tvecs, calibration_flags);
// Undistort im_points
std::vector<cv::Point2f> ud_points;
cv::undistortPoints(im_points, ud_points, intrinsic_matrix, distortion_coeffs);
// ud_points have been "unintrinsiced", but we don't know the intrinsics, so reverse that
double fx = intrinsic_matrix.at<double>(0, 0);
double fy = intrinsic_matrix.at<double>(1, 1);
double cx = intrinsic_matrix.at<double>(0, 2);
double cy = intrinsic_matrix.at<double>(1, 2);
for (std::vector<cv::Point2f>::iterator iter = ud_points.begin(); iter != ud_points.end(); iter++) {
    iter->x = iter->x * fx + cx;
    iter->y = iter->y * fy + cy;
}
// Find a homography mapping the undistorted points to the known world points, ground plane
cv::Mat homography = cv::findHomography(ud_points, world_points);
// Transform the undistorted image points to the world points (2d only, but z is constant)
std::vector<cv::Point2f> estimated_world_points;
std::cout << "homography" << homography << std::endl;
cv::perspectiveTransform(ud_points, estimated_world_points, homography);
// Work out error
double sum_sq_error = 0;
for (int i = 0; i < 30; i++) {
    double err_x = estimated_world_points.at(i).x - world_points.at(i).x;
    double err_y = estimated_world_points.at(i).y - world_points.at(i).y;
    sum_sq_error += err_x*err_x + err_y*err_y;
}
std::cout << "Sum squared error is: " << sum_sq_error << std::endl;
I would take random samples of the 30 input points and, in each case, compute the homography along with the errors under the estimated homography (a RANSAC-like scheme), then check for consensus between the error levels and the homography parameters. This can serve simply as a verification of the global optimisation process. I know that might seem unnecessary, but it is a sanity check for how sensitive the procedure is to the input (noise levels, location).
Also, it seems logical that fixing most of the variables gives you the lowest error, as the minimisation process has fewer degrees of freedom. I would try fixing different ones to establish another consensus. At least this would let you know which variables are the most sensitive to the noise levels of the input.
Hopefully, such a small section of the image is close to the image centre, where it incurs the least lens distortion. Is using a different distortion model possible in your case? A more viable way is to adapt the number of distortion parameters to the position of the pattern with respect to the image centre.
Without knowing the constraints of the algorithm, I might have misunderstood the question; that is an option too, in which case I can roll back.
I would like to have this as a comment rather, but I do not have enough points.
OpenCV runs the Levenberg-Marquardt algorithm inside calibrateCamera.
https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm/
This algorithm works fine for problems with a single minimum. With a single image, points located close to each other and a many-dimensional problem (n = number of coefficients), the algorithm may be unstable (especially with a wrong initial guess of the camera matrix). The convergence of the algorithm is well described here:
https://na.math.kit.edu/download/papers/levenberg.pdf/
As you wrote, the error depends on the calibration flags: the number of flags changes the dimension of the problem being optimised.
Camera calibration also calculates the pose of the camera, which will be bad in models with a wrong calibration matrix.
As a solution I suggest changing the approach. You don't need to calculate the camera matrix and pose in this step. Since you know that the points are located on a plane, you can use the 3D-2D plane projection equation to determine the distribution type of the points. By distribution I mean that all points will be spaced evenly on some kind of trapezoid.
Then you can use cv::undistort with different distCoeffs on your test image and calculate the image point distribution and the distribution error.
The last step is to use these steps as the target function for some optimisation algorithm, with the distortion coefficients as the variables being optimised.
This is not the easiest solution, but I hope it will help you.
I have a set of images of the same scene but shot with different exposures. These images have no EXIF data so there is no way to extract useful info like f-stop, shutter speed etc.
What I'm trying to do is to determine the difference in stops between the images i.e. Image1 is +1.3 stops of Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different numbers being used in the equation but generally it should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-average luminance doesn't seem to give much indication of the exposure difference between the images.
Any ideas on how to determine exposure difference?
edit: on c/c++
You have to generally solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might be (sufficiently) linearized already -> you may skip to part 2. If your content came from a camera and it's a JPEG, then this will most certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model iirc) is PFScalibrate.
Having said that, you may get away with simply applying an inverse gamma. A rough 'guesstimate' of the right gamma value might be found by doing this:
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) to both images
multiply all the short exposure images by 2.0 and subtract them from the respective long exposure images
pick the gamma that leads to the smallest overall difference
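The gamma search could be sketched like this. It is only an illustration: the function name is made up, pixel values are assumed to be on a 0-255 scale and passed as flat sequences of corresponding pixels, and saturated pixels are not filtered out here although in practice they should be.

```python
def estimate_gamma(long_exposure, short_exposure, exposure_ratio=2.0):
    """Pick the gamma in 1.8 .. 2.4 (0.1 steps) that best linearizes an exposure pair.

    long_exposure and short_exposure hold corresponding pixel values of the
    same static scene, shot with exposure times e and e / exposure_ratio.
    """
    candidates = [round(1.8 + 0.1 * k, 1) for k in range(7)]
    best_gamma, best_error = None, float("inf")
    for gamma in candidates:
        error = 0.0
        for v_long, v_short in zip(long_exposure, short_exposure):
            # Inverse gamma: map the encoded value back to (assumed) linear intensity.
            lin_long = (v_long / 255.0) ** gamma
            lin_short = (v_short / 255.0) ** gamma
            # In linear space the long exposure should be exposure_ratio times brighter.
            error += abs(lin_long - exposure_ratio * lin_short)
        if error < best_error:
            best_gamma, best_error = gamma, error
    return best_gamma
```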
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1,p2) from the two images:
    if p1 or p2 is close to 0 or 255:
        skip this pair
    sum1 += p1
    sum2 += p2
return log2(sum1 / sum2)
On large images this will certainly work just as well and a lot faster if you sub-sample the images.
If the camera was static but the scene was not (moving objects), this works less well. I produced acceptable results in this case by simply repeating the above procedure several times, using the output of the previous run as an estimate of the correct scale factor and discarding pixel pairs whose quotient is too far away from the current estimate. So basically replace the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations or if two consecutive estimates are sufficiently close to each other.
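Put together, the procedure might look like the sketch below. The function name, the `low`/`high` cut-offs, the tolerance and the iteration limit are illustrative assumptions, and the inputs are assumed to be already linearized pixel values on a 0-255 scale, passed as flat sequences of corresponding pixels.

```python
import math


def exposure_difference_stops(pixels1, pixels2, low=5, high=250,
                              max_deviation=0.5, max_iterations=10, tolerance=0.01):
    """Estimate log2(exposure of image 1 / exposure of image 2) in stops."""
    estimate = None
    for _ in range(max_iterations):
        sum1 = sum2 = 0.0
        for p1, p2 in zip(pixels1, pixels2):
            # Skip pairs where either value is close to black or saturated.
            if not (low < p1 < high and low < p2 < high):
                continue
            # From the second pass on, drop pairs that disagree with the estimate.
            if estimate is not None and abs(math.log2(p1 / p2) - estimate) > max_deviation:
                continue
            sum1 += p1
            sum2 += p2
        if sum1 == 0 or sum2 == 0:
            return estimate  # nothing usable left; keep the previous estimate
        new_estimate = math.log2(sum1 / sum2)
        if estimate is not None and abs(new_estimate - estimate) < tolerance:
            return new_estimate
        estimate = new_estimate
    return estimate
```

The first pass is the plain static-scene computation; later passes only tighten it, so for a truly static scene it stops after the second pass.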
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D mentioned already) and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no de-mosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighing R, G and B equally) or maybe only use the green channel.