Is camera pose estimation with SOLVEPNP_EPNP sensitive to outliers and can this be rectified?

I have to do an assignment in which I should compare the function solvePnP() used with SOLVEPNP_EPNP and solvePnPRansac() used with SOLVEPNP_ITERATIVE. The goal is to calculate a warped image from an input image.
To do this, I get an RGB input image, the same image as a 16-bit depth image, the camera intrinsics, and a list of feature match points between the given image and the wanted resulting warped image (which is the same scene from a different perspective).
This is how I went about this task so far:
Calculate a list of 3D object points from the depth image and the intrinsics, corresponding to the list of feature matches.
Use solvePnP() and solvePnPRansac() with the respective algorithms, where the calculated 3D object points and the feature match points of the resulting image are the inputs. As a result I get a rotation and a translation vector for both methods.
As a sanity check I calculate the average reprojection error, using projectPoints() on all feature match points and comparing the resulting projected points to the feature match points of the resulting image.
Finally I calculate 3D object points for each pixel of the input image and again project them using the rotation and translation vector from before. Each projected point gets the color from the corresponding pixel in the input image, resulting in the final warped image.
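For illustration, a minimal sketch of the solvePnP / solvePnPRansac and reprojection-error steps described above. objectPoints, imagePoints and cameraMatrix are illustrative names for data assumed to be prepared already, the distortion is assumed to be zero, and the RANSAC parameters are simply the OpenCV defaults written out; double precision is used throughout.

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    // Sketch of the PnP and sanity-check steps. objectPoints holds the 3D points
    // computed from the depth image, imagePoints the matched 2D points in the
    // target view, cameraMatrix the given intrinsics (all names illustrative).
    void estimatePoses(const std::vector<cv::Point3d>& objectPoints,
                       const std::vector<cv::Point2d>& imagePoints,
                       const cv::Mat& cameraMatrix)
    {
        cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);  // assuming no distortion

        // Plain solvePnP with the EPnP solver (no outlier handling).
        cv::Mat rvecEpnp, tvecEpnp;
        cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs,
                     rvecEpnp, tvecEpnp, false, cv::SOLVEPNP_EPNP);

        // RANSAC wrapper around the iterative solver (rejects mismatches).
        cv::Mat rvecRansac, tvecRansac;
        cv::solvePnPRansac(objectPoints, imagePoints, cameraMatrix, distCoeffs,
                           rvecRansac, tvecRansac, false, 100, 8.0, 0.99,
                           cv::noArray(), cv::SOLVEPNP_ITERATIVE);

        // Sanity check: reproject the 3D points and average the distance to the
        // 2D matches.
        std::vector<cv::Point2d> reprojected;
        cv::projectPoints(objectPoints, rvecRansac, tvecRansac,
                          cameraMatrix, distCoeffs, reprojected);
        double meanError = 0.0;
        for (size_t i = 0; i < reprojected.size(); ++i)
        {
            const cv::Point2d d = reprojected[i] - imagePoints[i];
            meanError += std::sqrt(d.x * d.x + d.y * d.y);
        }
        meanError /= static_cast<double>(reprojected.size());
    }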
These are my inputs:
Using steps described above I get the following output with the Ransac Method:
This looks pretty much like the reference solution I have, so this should be mostly correct.
However with the solvePnP() method using SOLVEPNP_EPNP the resulting rotation and translation vectors look like this, which doesn't make sense at all:
================ solvePnP using SOLVEPNP_EPNP results: ===============
Rotation: [-4.3160208e+08; -4.3160208e+08; -4.3160208e+08]
Translation: [-4.3160208e+08; -4.3160208e+08; -4.3160208e+08]
The assignment sheet states that the list of feature matches contains some mismatches, so basically outliers. As far as I know, RANSAC handles outliers better, but can this be the reason for these weird results with the other method? I was expecting some anomalies, but this is completely wrong, and the resulting image is completely black since no points fall inside the image area.
Maybe someone can point me into the right direction.

OK, I was able to solve the issue. I had used float for all calculations (3D object points, matches, ...) beforehand; changing everything to double did the trick.
The warped perspective is still off and I get a rather high reprojection error, but this should be due to the nature of the algorithm itself, which doesn't handle outliers very well.
The weird thing about this is that the OpenCV documentation on solvePnP() states that vector<Point3f> and vector<Point2f> can be passed as arguments for the object points and image points respectively.
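For illustration, a minimal sketch of the workaround: copy the float point lists (as originally used) into double-precision containers before the EPnP call. All variable names here are illustrative.

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <vector>

    // Sketch of the workaround: convert float point lists to double precision
    // before calling solvePnP with SOLVEPNP_EPNP (names are illustrative).
    void solveEpnpWithDoubles(const std::vector<cv::Point3f>& objectPointsF,
                              const std::vector<cv::Point2f>& imagePointsF,
                              const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                              cv::Mat& rvec, cv::Mat& tvec)
    {
        std::vector<cv::Point3d> objectPointsD(objectPointsF.begin(), objectPointsF.end());
        std::vector<cv::Point2d> imagePointsD(imagePointsF.begin(), imagePointsF.end());

        cv::solvePnP(objectPointsD, imagePointsD, cameraMatrix, distCoeffs,
                     rvec, tvec, false, cv::SOLVEPNP_EPNP);
    }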

Related

Built-in function for interpolating single pixels and small blobs

Problem
Is there a built-in function for interpolating single pixels?
Given a normal image as Mat and a Point, e.g. an anomaly of the sensor or an outlier, is there some function to repair this Point?
Furthermore, if I have more than one Point connected (let's say a blob with an area smaller than 10x10), is there a possibility to fix them too?
Attempts that are not really solutions
It seems that interpolation is implemented for the geometric transformations, including resizing images, and for extrapolating pixels outside of the image with borderInterpolate, but I haven't found an option for single pixels or small clusters of pixels.
A solution with medianBlur, as suggested here, does not seem appropriate as it changes the whole image.
Alternative
If there isn't a built-in function, my idea would be to look at all 8-connected surrounding pixels which are not part of the blob and calculate the mean or weighted mean. Doing this iteratively, all missing or erroneous pixels should be filled. But this method would depend on the order in which the pixels are corrected. Are there other suggestions?
Update
Here is an image to illustrate the problem. The left side shows the original image with a contour marking the pixels to fix; the right side shows the fixed pixels. I hope to find a more sophisticated algorithm to fix the pixels.
The built-in OpenCV function inpaint does the desired interpolation of chosen pixels. Simply create a mask marking all pixels to be repaired.
See the OpenCV 3.2 documentation for the inpaint function.
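A minimal sketch of that usage, assuming image is an 8-bit picture and badPixels is the list of defective pixel coordinates (both names are illustrative); the inpaint radius and method are just reasonable defaults.

    #include <opencv2/core.hpp>
    #include <opencv2/photo.hpp>   // cv::inpaint
    #include <vector>

    // Repair the listed pixels by inpainting from their surroundings.
    cv::Mat repairPixels(const cv::Mat& image, const std::vector<cv::Point>& badPixels)
    {
        // Build a mask that is non-zero exactly at the pixels to repair.
        cv::Mat mask = cv::Mat::zeros(image.size(), CV_8U);
        for (const cv::Point& p : badPixels)
            mask.at<uchar>(p) = 255;

        // Inpaint with a small radius; INPAINT_TELEA and INPAINT_NS are the two
        // available methods.
        cv::Mat repaired;
        cv::inpaint(image, mask, repaired, 3.0, cv::INPAINT_TELEA);
        return repaired;
    }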

3D reconstruction using stereo vision - theory

I am currently reading up on the topic of stereo vision, using the book by Hartley & Zisserman alongside some papers, as I am trying to develop an algorithm capable of creating elevation maps from two images.
I am trying to come up with the basic steps for such an algorithm. This is what I think I have to do:
If I have two images, I somehow have to find the fundamental matrix F in order to find the actual elevation values at all points from triangulation later on. If the cameras are calibrated this is straightforward; if not, it is slightly more complex (plenty of methods for this can be found in H&Z).
It is necessary to know F in order to obtain the epipolar lines. These are the lines along which a point x from the first image has to be searched for in the second image.
Now comes the part where it gets a bit confusing for me:
Now I would take an image point x_i in the first picture and try to find the corresponding point x_i' in the second picture, using some matching algorithm. Using triangulation it is now possible to compute the real-world point X and, from that, its elevation. This process is repeated for every pixel in the right image.
In a perfect world (no noise etc.) triangulation would be done based on
x_1 = P_1 X
x_2 = P_2 X
In the real world it is necessary to find a best fit instead.
Doing this for all pixels will lead to the complete elevation map as desired; some pixels will, however, be impossible to match and therefore can't be triangulated.
What confuses me most is that I have the feeling that Hartley & Zisserman skip the entire discussion on how to obtain the point correspondences (matching?), and that the papers I read in addition to the book talk a lot about disparity maps, which aren't mentioned in H&Z at all. However, I think I understood correctly that the disparity is simply the difference x1_i - x2_i?
Is this approach correct, and if not where did I make mistakes?
Your approach is in general correct.
You can think of a stereo camera system as two points in space whose relative orientation is known. These are the optical centers. In front of each optical center you have a coordinate system; these are the image planes. When you have found two corresponding pixels, you can then calculate a line for each pixel which goes through the pixel and the respective optical center. Where the two lines intersect, there is the object point in 3D. Because the world is not perfect, they will probably not intersect, and one may use the point where the lines are closest to each other.
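As a rough sketch of that triangulation step, assuming the two 3x4 projection matrices P1 and P2 and the matched pixel coordinates are already known (all names illustrative), OpenCV's triangulatePoints can be used; it finds the best fit in a least-squares sense rather than intersecting the two rays exactly.

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <vector>

    // Triangulate matched pixel pairs given the two camera projection matrices.
    std::vector<cv::Point3d> triangulate(const cv::Mat& P1, const cv::Mat& P2,
                                         const std::vector<cv::Point2f>& pts1,
                                         const std::vector<cv::Point2f>& pts2)
    {
        cv::Mat pointsHom;                                   // 4xN homogeneous points
        cv::triangulatePoints(P1, P2, pts1, pts2, pointsHom);
        pointsHom.convertTo(pointsHom, CV_64F);

        std::vector<cv::Point3d> points3d;
        for (int i = 0; i < pointsHom.cols; ++i)
        {
            cv::Mat col = pointsHom.col(i);
            col /= col.at<double>(3);                        // dehomogenize
            points3d.emplace_back(col.at<double>(0), col.at<double>(1),
                                  col.at<double>(2));
        }
        return points3d;
    }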
There exist several algorithms to detect which points correspond.
When using disparities, the two image planes need to be aligned (rectified) such that the images are parallel and each row in image 1 corresponds to the same row in image 2. Then correspondences only need to be searched on a per-row basis, and it is enough to know the difference along the x-axis between the corresponding points. This difference is the disparity.
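For illustration, a minimal sketch of that per-row search using OpenCV's block matcher, assuming leftGray and rightGray (illustrative names) are an already rectified, 8-bit grayscale pair; the parameter values are only placeholders that would need tuning.

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>

    // Compute a dense disparity map for a rectified stereo pair.
    cv::Mat computeDisparity(const cv::Mat& leftGray, const cv::Mat& rightGray)
    {
        // numDisparities must be a multiple of 16; blockSize must be odd.
        cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);

        cv::Mat disparity16;                 // fixed-point disparity (scaled by 16)
        bm->compute(leftGray, rightGray, disparity16);

        cv::Mat disparity;
        disparity16.convertTo(disparity, CV_32F, 1.0 / 16.0);
        return disparity;
    }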

Finding Circle Edges

Here are the two sample images that I have posted.
I need to find the edges of the circle.
Is it possible to develop one generic circle algorithm that could find all possible circles in all scenarios? For example:
1. The circle may be a different color (white, black, gray, red)
2. The background color may be different
3. The circle may differ in size
http://postimage.org/image/tddhvs8c5/
http://postimage.org/image/8kdxqiiyb/
Please suggest some ideas for an algorithm that would work on the circles above.
Sounds like a job for the Hough circle transform:
I have not used it myself so far, but it is included in OpenCV. Among other parameters, you can give it a minimum and maximum radius.
Here are links to documentation and a tutorial.
I'd imagine your second example picture will be very hard to detect, though.
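A minimal sketch of the Hough-circle route, assuming gray (an illustrative name) is an 8-bit single-channel version of the input; the radius range and thresholds are rough guesses that would need tuning per image.

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Detect circles in a grayscale image with the Hough circle transform.
    std::vector<cv::Vec3f> findCircles(const cv::Mat& gray)
    {
        cv::Mat blurred;
        cv::GaussianBlur(gray, blurred, cv::Size(9, 9), 2.0);   // suppress noise

        std::vector<cv::Vec3f> circles;   // each element is (centerX, centerY, radius)
        cv::HoughCircles(blurred, circles, cv::HOUGH_GRADIENT,
                         1,                // accumulator resolution (same as image)
                         blurred.rows / 8, // minimum distance between circle centers
                         100, 30,          // Canny high threshold, accumulator threshold
                         10, 200);         // minimum and maximum radius
        return circles;
    }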
You could apply an edge detection transformation to both images.
Here is what I did in Paint.NET using the outline effect:
You could test edge detect too but that requires more contrast in the images.
Another thing to take into consideration is what exactly you want to detect. In the first image, do you want to detect the white ring or the disc inside? In the second image, do you want to detect all the circles (there are many tiny ones) or just the big one(s)? These requirements will influence which transformation to use and how to initialize it.
After transforming the images into versions that 'highlight' the circles you'll need an algorithm to find them.
Again, there are more options than just one. Here is a paper describing an algorithm.
Searching the web for image processing circle recognition gives lots of results.
I think you will have to use a couple of different feature calculations that can be used for segmentation. In the first picture the circle is recognizable by intensity alone, so that one is easy. In the second picture it is mostly the texture that differentiates the circle edge; in that case a feature image based on some kind of texture filter will be needed. Calculating the local variance, for instance, will result in a scalar image that can segment out the circle. If there are other features that define the circle in other scenarios (different colors for background/foreground etc.) you might need other explicit filters that give a scalar difference for those cases.
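As a sketch of such a texture feature, a local-variance image can be computed with two box filters (E[x^2] - E[x]^2); gray and the window size are illustrative and the result would still need thresholding or normalization before further processing.

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Compute a per-pixel local-variance feature image over a small window.
    cv::Mat localVariance(const cv::Mat& gray, int windowSize = 7)
    {
        cv::Mat grayF;
        gray.convertTo(grayF, CV_32F);

        cv::Mat graySq = grayF.mul(grayF);

        cv::Mat mean, meanOfSquares;
        cv::boxFilter(grayF, mean, CV_32F, cv::Size(windowSize, windowSize));
        cv::boxFilter(graySq, meanOfSquares, CV_32F, cv::Size(windowSize, windowSize));

        cv::Mat variance = meanOfSquares - mean.mul(mean);
        return variance;   // high values where the local texture is strong
    }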
When you have scalar images where the circles stand out you can use the circular Hough transform to find the circle. Either run it for different circle sizes or modify it to detect a range of sizes.
If you know that there will be only one circle and you know the kind of noise that will be present (vertical/horizontal lines etc) an alternative approach is to design a more specific algorithm e.g. filter out the noise and find center of gravity etc.
Answer to comment:
The idea is to separate the algorithm into independent stages. I do not know how your specific algorithm works, but presumably it could take a binary or grayscale image where high values mean a pixel is part of the circle and low values mean it is not; the present algorithm also needs to give some kind of confidence value for the circle it finds. This present algorithm would then represent some stage(s) at the end of the complete algorithm. You will then have to add the first stage, which is to generate feature images for every kind of input you want to handle. For the two examples it should suffice with one intensity image (simply grayscale) and one image where each pixel represents the local variance. In the color case, do a color transform and use the hue value, perhaps? For every input, feed all feature images to the later stage and use the confidence value to select the most likely candidate. If you have other unknowns that your algorithm needs as input parameters (circle size etc.), just iterate over the possible values and make sure your later stages return confidence values.

Homography computation of a deformed square planar object

Suppose you have a square planar object (a piece of paper). You take a photo of it. Generally speaking, it will appear deformed. Suppose you process the image and compute the four corners of the planar object. Given the four points, you can compute a homography.
But now suppose that the object undergoes some type of deformation. All we can say about the nature of the deformation is:
it is "smooth" (the surface of the object will not form sharp angles)
the surface of the object will always be totally visible, even after the deformation.
For example: you stick the square paper on the surface of a cylindrical object.
The question is: given only the four pixel coordinates of the corners of the (deformed) planar object, can I compute the correct homography? That is, can I "remove" the effect of the deformation before computing the homography?
Even an "approximate" (read working ;) method would be really useful.
Thanks.
P.S.
I wish to add that I don't know, a priori, the content of the planar object. In fact, the algorithm I am writing computes the homography, unwarps the object and checks its content. It is a 2D barcode, so I have an id/crc pair of numbers. If the crc extracted from the object is equal to the crc computed on the id, then it is a valid barcode.
A homography is by definition a plane-plane transform. If the barcode is small enough you could probably assume that the object it is attached to is piecewise planar. After rectifying the image of the barcode you could estimate a barrel distortion model.
If you want to remove the deformation first then you would have to estimate the surface first and then flatten it. That would be a lot more difficult.
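A minimal sketch of the plane-to-plane part only, assuming the four corners are already detected and ordered consistently (e.g. top-left, top-right, bottom-right, bottom-left); the names and the 200x200 output size are illustrative, and any residual (e.g. cylindrical) deformation is not removed by this step.

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Map the four detected corners to a canonical square and unwarp the barcode.
    cv::Mat rectifyBarcode(const cv::Mat& image, const std::vector<cv::Point2f>& corners)
    {
        const float side = 200.0f;
        std::vector<cv::Point2f> square = {
            {0.0f, 0.0f}, {side, 0.0f}, {side, side}, {0.0f, side}
        };

        // Exactly four correspondences: getPerspectiveTransform gives the homography.
        cv::Mat H = cv::getPerspectiveTransform(corners, square);

        cv::Mat unwarped;
        cv::warpPerspective(image, unwarped, H, cv::Size((int)side, (int)side));
        return unwarped;
    }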

Finding image shift

How to find shift and rotation between same two images using programming languages vb.net or C++ or C#?
The problem you state is called motion detection (or motion compensation) and is one of the most important problems in image and video processing at the moment. No easy "here are ten lines of code that will do it" solution exists except for some really trivial cases.
Even your seemingly trivial case is quite a difficult one because a rotation by an unknown angle could cause slight pixel-by-pixel changes that can't be easily detected without specifically tailored algorithms used for motion detection.
If the images are very similar such that the camera is only slightly moved and rotated then the problem could be solved without using highly complex techniques.
What I would do, in that case, is use a motion tracking algorithm to get the optical flow of the image sequence, which is a "map" that approximates how a pixel has "moved" from image A to B. OpenCV, which is indeed a very good library, has functions that do this: CalcOpticalFlowLK and CalcOpticalFlowPyrLK.
The tricky bit is going from the optical flow to the total rotation of the image. I would start by heavily low-pass filtering the optical flow to get a smoother map to work with.
Then you need to use some logic to test if the image is only shifted or rotated. If it is only shifted then the entire map should be one "color", i.e. all flow vectors point in the same direction.
If there has been a rotation then the vectors will point in different directions depending on their position in the image.
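As a sketch of that flow-map idea, here is a version using dense Farneback flow (cv::calcOpticalFlowFarneback) rather than the LK functions named above, since it directly yields the per-pixel map; imgA and imgB are assumed to be 8-bit grayscale images (illustrative names) and the parameters are just typical values.

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/video.hpp>     // cv::calcOpticalFlowFarneback

    // Compute a dense, smoothed flow map between two grayscale images.
    cv::Mat flowBetween(const cv::Mat& imgA, const cv::Mat& imgB)
    {
        cv::Mat flow;   // CV_32FC2: per-pixel (dx, dy)
        cv::calcOpticalFlowFarneback(imgA, imgB, flow,
                                     0.5,  // pyramid scale
                                     3,    // pyramid levels
                                     15,   // averaging window size
                                     3,    // iterations per level
                                     5, 1.2, 0);

        // Heavy low-pass filtering as suggested above, to smooth the map before
        // deciding whether the vectors all point the same way (pure shift) or not.
        cv::GaussianBlur(flow, flow, cv::Size(31, 31), 0);
        return flow;
    }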
If the input images are not as nice as the above method requires, then I would look into feature descriptors to find how a specific object in the first image is located within the second. This will however be much harder.
There is no short answer. You could try to use free OpenCV library for finding relationship between two images.
The two operations, rotation and translation, can be determined in either order. It's far easier to first detect rotation, because you can then compensate for that. Once both images are oriented the same, the translation becomes a matter of simple correlation.
Finding the relative rotation of an image is best done by determining the local gradients. For every neighborhood (e.g. 3x3 pixels), treat the grey value as a function z(x,y), fit a plane through the 9 pixels, and determine the slope or gradient of that plane. Now average the gradient you found over the entire image, or at least the center of it. Your two images will produce different averages. Part of that is because for non-90-degree rotations the images won't overlap fully, but in general the difference in average gradients is the rotation between the two.
Once you've rotated back one image, you can determine a correlation. This is a fairly standard operation; you're essentially determining for each possible offset how well the two images overlap. This will give you an estimate for the shift.
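One standard way to implement that correlation step is phase correlation, e.g. with OpenCV's phaseCorrelate. The sketch below assumes both inputs are single-channel float images of the same size and already rotation-compensated (names illustrative).

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>   // cv::phaseCorrelate

    // Estimate the translation between two rotation-compensated float images.
    cv::Point2d estimateShift(const cv::Mat& fixedF, const cv::Mat& movedF)
    {
        // Returned point is the (x, y) offset of "movedF" relative to "fixedF".
        return cv::phaseCorrelate(fixedF, movedF);
    }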
Once you've got both, you can refine your rotation angle estimate by rotating back the translation, shifting the second image, and determining the average gradient only over the pixels common to both images.
If the images are exactly the same, it should be fairly easy to extract some feature points - for example using SIFT - and match the features of both images. You can then use any two of the matching features to find the rotation and translation. The translation is just the difference between two matching feature points. Then you compensate for the translation in one image and get the rotation angle as the angle formed by the three remaining points.
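As a sketch of that feature-based route, here is a version that uses ORB instead of SIFT (ORB ships with the core OpenCV modules) and estimateAffinePartial2D with RANSAC instead of the manual two-point computation, so mismatches are rejected; imgA and imgB are assumed to be 8-bit grayscale images (illustrative names).

    #include <opencv2/calib3d.hpp>   // cv::estimateAffinePartial2D
    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <vector>

    // Estimate rotation, uniform scale and translation from matched features.
    cv::Mat findShiftAndRotation(const cv::Mat& imgA, const cv::Mat& imgB)
    {
        cv::Ptr<cv::ORB> orb = cv::ORB::create();
        std::vector<cv::KeyPoint> kpA, kpB;
        cv::Mat descA, descB;
        orb->detectAndCompute(imgA, cv::noArray(), kpA, descA);
        orb->detectAndCompute(imgB, cv::noArray(), kpB, descB);

        cv::BFMatcher matcher(cv::NORM_HAMMING, true);   // cross-check matching
        std::vector<cv::DMatch> matches;
        matcher.match(descA, descB, matches);

        std::vector<cv::Point2f> ptsA, ptsB;
        for (const cv::DMatch& m : matches)
        {
            ptsA.push_back(kpA[m.queryIdx].pt);
            ptsB.push_back(kpB[m.trainIdx].pt);
        }

        // 2x3 matrix [R | t] restricted to rotation, uniform scale and translation.
        return cv::estimateAffinePartial2D(ptsA, ptsB, cv::noArray(), cv::RANSAC);
    }

The rotation angle can then be read from the returned 2x3 matrix as atan2(M(1,0), M(0,0)), and the translation from its last column.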