Cube detection using C++ and OpenCV

I am currently working on a robotics project: a robot must grab a cube using a Kinect camera that performs cube detection and computes its coordinates.
I am new to computer vision. I first worked on a static image of a square in order to get a basic understanding. Using C++ and OpenCV, I managed to get the corners (and their x, y pixel coordinates) of the square using smoothing (noise removal), edge detection (the Canny function), line detection (Hough transform) and line intersection (mathematical calculation) on a simplified picture (uniform background).
By adjusting some thresholds I can achieve corner detection, assuming that I have only one square and no line features in the background.
Now my question is: do you have any direction/recommendation/advice/literature about cube recognition algorithms?
What I have found so far involves shape detection combined with texture detection and/or a learning stage. Moreover, in their applications, they often use GPU/parallelised computing, which I don't have...
The Kinect also provides a depth camera which gives the distance of each pixel from the camera. Maybe I can use this to bypass "complicated" image processing?
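For example, a minimal sketch of that depth-based idea, assuming the depth map arrives as a 16-bit cv::Mat in millimetres (the threshold values are guesses to tune):

```cpp
// Hedged sketch: segment the closest object from a Kinect depth map.
// Assumes "depth" is a CV_16UC1 cv::Mat in millimetres, already acquired
// from the Kinect (the acquisition code is omitted here).
#include <opencv2/opencv.hpp>

cv::Mat segmentNearestObject(const cv::Mat& depth)
{
    // Keep only pixels closer than ~1 m (threshold is an assumption to tune).
    cv::Mat mask = (depth > 0) & (depth < 1000);

    // Clean up sensor noise with a median filter and a morphological opening.
    cv::medianBlur(mask, mask, 5);
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));
    return mask;   // binary mask of the nearest object (e.g. the cube)
}
```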
Thanks in advance.

OpenCV 3.0 with contrib includes the surface_matching module:
Cameras and similar devices with the capability of sensation of 3D structure are becoming more common. Thus, using depth and intensity information for matching 3D objects (or parts) is of crucial importance for computer vision. Applications range from industrial control to guiding everyday actions for visually impaired people. The task in recognition and pose estimation in range images aims to identify and localize a queried 3D free-form object by matching it to the acquired database.
http://docs.opencv.org/3.0.0/d9/d25/group__surface__matching.html
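A minimal usage sketch of that module (the PLY file names and sampling parameters below are placeholders; both point clouds are assumed to include normals):

```cpp
// Hedged sketch of the surface_matching (PPF) pipeline from the linked docs.
// "cube_model.ply" and "scene.ply" are placeholder file names; both clouds
// need normals (second argument of loadPLYSimple set to 1).
#include <opencv2/surface_matching.hpp>
#include <opencv2/surface_matching/ppf_helpers.hpp>

int main()
{
    using namespace cv::ppf_match_3d;

    cv::Mat model = loadPLYSimple("cube_model.ply", 1);  // Nx6: xyz + normals
    cv::Mat scene = loadPLYSimple("scene.ply", 1);

    PPF3DDetector detector(0.025, 0.05);   // sampling/distance steps: tune these
    detector.trainModel(model);

    std::vector<Pose3DPtr> results;
    detector.match(scene, results, 1.0 / 40.0, 0.05);

    if (!results.empty())
        results[0]->printPose();           // best pose hypothesis of the cube
    return 0;
}
```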

Related

Details about iPhone ARKit planeDetection

I would like to know, from a computer vision point of view, how plane surface detection works and why ARKit cannot detect vertical surfaces.
The way ground plane detection works is as follows. A sparse 3D reconstruction of the scene is performed using feature-based Visual-Inertial Odometry (which means estimating the camera pose from visual motion combined with information from the inertial sensors). Each point in the 3D reconstruction (also called a map) corresponds to a feature point detected in two or more camera images.

From this sparse reconstruction, a ground plane is established by finding all the reconstructed points which are approximately coplanar. This is most likely solved with RANSAC-based plane fitting. It works by randomly sampling a small set of feature points (typically 3 or 4), finding the equation of the plane which most closely fits these points, and then testing all other points for whether they lie close to the fitted plane. The process repeats many times (commonly hundreds) until a plane is found which fits a large number of feature points.

There is an assumption in this library that the plane is a ground plane (not a wall), so any detected plane with a strong inclination angle is rejected; this can be checked using the onboard gyroscope. The reason why only ground planes are supported is that they correspond to the most common AR use case (placing virtual objects on the ground), but other geometric surfaces will almost certainly be supported in the future.
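A rough sketch of that RANSAC plane fit (not ARKit's actual code; thresholds and iteration count are assumptions):

```cpp
// Rough sketch of RANSAC plane fitting over sparse 3D feature points.
// Returns plane coefficients (a, b, c, d) with a*x + b*y + c*z + d = 0.
#include <opencv2/core.hpp>
#include <cmath>
#include <random>
#include <vector>

cv::Vec4f fitPlaneRansac(const std::vector<cv::Point3f>& pts,
                         int iterations = 500, float inlierDist = 0.02f)
{
    std::mt19937 rng(42);
    std::uniform_int_distribution<size_t> pick(0, pts.size() - 1);
    cv::Vec4f best(0, 0, 0, 0);
    size_t bestInliers = 0;

    for (int it = 0; it < iterations; ++it) {
        const cv::Point3f &p0 = pts[pick(rng)], &p1 = pts[pick(rng)], &p2 = pts[pick(rng)];
        cv::Point3f n = (p1 - p0).cross(p2 - p0);       // candidate plane normal
        float len = std::sqrt(n.dot(n));
        if (len < 1e-6f) continue;                      // degenerate sample
        n *= 1.0f / len;
        float d = -n.dot(p0);

        size_t inliers = 0;                             // count points near the plane
        for (const auto& p : pts)
            if (std::abs(n.dot(p) + d) < inlierDist) ++inliers;

        if (inliers > bestInliers) { bestInliers = inliers; best = cv::Vec4f(n.x, n.y, n.z, d); }
    }
    // To keep only ground planes, compare the normal against the gravity
    // direction from the IMU and reject strongly inclined fits.
    return best;
}
```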

How can I segment an object using Kinect V1?

I'm working with SDK 1.8 and I'm getting the depth stream from the Kinect. Now, I want to hold an A4-sized sheet of paper in front of the camera and get the coordinates of the corners of that paper so I can project an image onto it.
How can I detect the corners of the paper and get their coordinates? Does Kinect SDK 1.8 provide that option?
Thanks
Kinect SDK 1.8 does not provide this feature itself (to my knowledge). Depending on the language you use, there most certainly are libraries which allow such an operation if you break it into steps.
OpenCV, for example, is quite useful for image processing. When I once worked with the Kinect for object recognition, I used AForge with C#.
I recommend tackling the challenge as follows:
Edge Detection:
You will apply an edge detection algorithm such as the Canny filter to the image. First you will probably - depending on the library - convert your depth picture into a greyscale picture. The resulting image will be greyscale as well, and the intensity of a pixel correlates with the probability of it belonging to an edge. Using a threshold, you binarize this picture to black/white.
Hough Transformation: used to get the position and parameters of lines within an image, which allows further calculation. The Hough transformation is VERY sensitive to its parameters, and you will spend a lot of time tuning them to get good results.
Calculation of corner points: assuming that your Hough transformation was successful, you can now calculate all intersections of the given lines, which will yield the points you are looking for (see the sketch after this list).
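A compact sketch of the three steps above (the depth-to-greyscale conversion assumes a 16-bit Kinect depth map; all thresholds are placeholders to tune):

```cpp
// Hedged sketch: smoothing, Canny, probabilistic Hough, and pairwise line
// intersections as corner candidates. Assumes a CV_16UC1 Kinect depth map.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

std::vector<cv::Point2f> findCornerCandidates(const cv::Mat& depth16u)
{
    // 1) Depth -> 8-bit greyscale, light smoothing against sensor noise.
    cv::Mat gray, edges;
    cv::normalize(depth16u, gray, 0, 255, cv::NORM_MINMAX, CV_8U);
    cv::medianBlur(gray, gray, 5);

    // 2) Edge detection (Canny), then line detection (probabilistic Hough).
    cv::Canny(gray, edges, 50, 150);
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 60, 40, 10);

    // 3) Pairwise line intersections as corner candidates.
    std::vector<cv::Point2f> corners;
    for (size_t i = 0; i < lines.size(); ++i)
        for (size_t j = i + 1; j < lines.size(); ++j) {
            cv::Point2f p(lines[i][0], lines[i][1]), r(lines[i][2] - lines[i][0], lines[i][3] - lines[i][1]);
            cv::Point2f q(lines[j][0], lines[j][1]), s(lines[j][2] - lines[j][0], lines[j][3] - lines[j][1]);
            float denom = r.x * s.y - r.y * s.x;
            if (std::abs(denom) < 1e-6f) continue;        // parallel lines
            float t = ((q - p).x * s.y - (q - p).y * s.x) / denom;
            corners.push_back(p + t * r);
        }
    return corners;
}
```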
All of these steps (especially Edge Detection and Hough Transformation) have been asked/answered/discussed in this forum.
If you provide code, intermediate results or further questions, you can get a more detailed answer.
p.s.
I remember that the Kinect was not that accurate and that noise was an issue. Therefore you might consider applying a filter before doing these operations.

OpenCV triangulatePoints varying distance

I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I am experiencing that this function gives me different distances to the same point depending on the angle of the camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. Info about the tracked point is displayed in the upper left corner. (YouTube dropped the quality; the video is normally much sharper, (2x1280) x 720.)
In the video, the left camera is the origin of the 3D coordinate system and it is looking in the positive Z direction. The left camera undergoes some translation, but not nearly as much as the triangulatePoints function leads to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that insufficiently precise calibration can cause this behaviour. I have run three independent calibrations using a chessboard pattern. The resulting parameters vary too much for my taste (approx. ±10% for the focal length estimate).
As you can see, the video is not highly distorted: straight lines appear pretty straight everywhere. So the optimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras have this effect? Or a wrong baseline length?
Of course, there is always the matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get sub-pixel precision, which can be off by... I don't know... ±0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames directly with OpenCV; instead, I have to use the special SDK I acquired when purchasing the camera. It has occurred to me that this SDK might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?
The SDK provided with the ZED camera performs undistortion and rectification of images. The geometric model is the same as OpenCV's:
-intrinsic parameters and distortion parameters for both the Left and Right cameras.
-extrinsic parameters for the rotation/translation between Right and Left.
Through one of the ZED tools (the ZED Settings App), you can enter your own intrinsic matrix for Left/Right, distortion coefficients, and Baseline/Convergence.
To get a precise 3D triangulation, you may need to adjust those parameters since they have a high impact on the disparity you will estimate before converting to depth.
OpenCV provides a good module to calibrate 3D cameras. It does:
-Mono calibration (calibrateCamera) for Left and Right, followed by a stereo calibration (cv::stereoCalibrate()). It will output the intrinsic parameters (focal length, optical center (very important)) and the extrinsics (Baseline = T[0], Convergence = R[1] if R is a 3x1 matrix). The RMS (return value of stereoCalibrate()) is a good way to see whether the calibration has been done correctly.
The important thing is that you need to do this calibration on raw images, not on images provided by the ZED SDK. Since the ZED is a standard UVC camera, you can use OpenCV to get the side-by-side raw images (cv::VideoCapture with the correct device number) and extract the native Left and Right images.
You can then enter those calibration parameters in the tool. The ZED SDK will then perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided by getParameters(). You need to use those values when you triangulate, since the images are corrected as if they were taken from this "ideal" camera.
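For reference, grabbing the raw side-by-side frames and collecting chessboard corners could look roughly like this (device index, board size and number of views are assumptions):

```cpp
// Hedged sketch: treat the ZED as a UVC camera, split the side-by-side frame,
// and collect chessboard corners to feed calibrateCamera / stereoCalibrate.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::VideoCapture cap(1);                         // ZED as a standard UVC device
    cap.set(cv::CAP_PROP_FRAME_WIDTH, 2560);         // 2 x 1280 side by side
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, 720);

    cv::Size board(9, 6);                            // inner chessboard corners
    std::vector<std::vector<cv::Point2f>> ptsL, ptsR;

    cv::Mat frame;
    while (cap.read(frame) && ptsL.size() < 20) {    // collect ~20 good views
        cv::Mat left  = frame(cv::Rect(0, 0, frame.cols / 2, frame.rows));
        cv::Mat right = frame(cv::Rect(frame.cols / 2, 0, frame.cols / 2, frame.rows));

        std::vector<cv::Point2f> cL, cR;
        if (cv::findChessboardCorners(left, board, cL) &&
            cv::findChessboardCorners(right, board, cR)) {
            ptsL.push_back(cL);
            ptsR.push_back(cR);
        }
    }
    // Next steps (omitted): build the 3D object points of the board, run
    // cv::calibrateCamera on ptsL and ptsR separately, then cv::stereoCalibrate
    // to get R, T (convergence/baseline) and check the returned RMS error.
    return 0;
}
```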
hope this helps.
/OB/
There are 3 points I can think of that can probably help you.
Probably the least important: from your description you have calibrated the cameras separately and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters can compensate for other "less accurate" parameters.
If the accuracy of the reconstruction is important to you, you need a systematic approach to reducing the error. Building an uncertainty model is easy thanks to the mathematical model, and you can write a few lines of code to build it for you. Say the 3D point is 2 metres away, at a particular angle to the camera system, and you have a specific uncertainty on the 2D projections of that point; it's easy to back-project the uncertainty into the 3D space around your 3D point. By then adding uncertainty to the other parameters of the system, you can see which ones matter most and need to have lower uncertainty.
This inaccuracy is inherent in the problem and the method you're using.
First, if you model the uncertainty you will see that reconstructed 3D points further away from the camera centers have a much higher uncertainty. The reason is that the angle <left-camera, 3d-point, right-camera> is narrower. I remember the MVG book had a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints you see that the pseudo-inverse method is implemented using SVD to construct the 3d point. That can lead to many issues, which you probably remember from linear algebra.
Update:
But I consistently get larger distance near the edges, and several times the magnitude of the uncertainty caused by the angle.
That's the result of using the pseudo-inverse, a numerical method. You can replace it with a geometric method. One easy approach is to back-project the 2D projections to get 2 rays in 3D space. Then you want to find where they intersect, which doesn't happen exactly due to the inaccuracies; instead you find the point where the 2 rays are closest to each other. Without considering the uncertainty, you will consistently favor a point from the set of feasible solutions. That's why with the pseudo-inverse you don't see random fluctuation but a gross error.
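A sketch of that ray-midpoint idea (the camera centres and ray directions here are inputs you would derive from your calibration, e.g. K.inv() * [u, v, 1] rotated into a common frame; they are assumptions about your setup):

```cpp
// Hedged sketch of the ray-midpoint alternative to triangulatePoints:
// find the closest points on the two back-projected rays and average them.
#include <opencv2/core.hpp>
#include <cmath>

cv::Point3d midpointOfRays(const cv::Point3d& c1, cv::Point3d d1,
                           const cv::Point3d& c2, cv::Point3d d2)
{
    d1 *= 1.0 / std::sqrt(d1.dot(d1));         // normalise ray directions
    d2 *= 1.0 / std::sqrt(d2.dot(d2));

    const cv::Point3d r = c2 - c1;
    const double a = d1.dot(d2);
    const double denom = 1.0 - a * a;          // ~0 when the rays are parallel
    if (denom < 1e-12) return c1;              // degenerate configuration

    // Parameters of the closest points on each ray.
    const double t1 = (r.dot(d1) - a * r.dot(d2)) / denom;
    const double t2 = (a * r.dot(d1) - r.dot(d2)) / denom;

    const cv::Point3d p1 = c1 + t1 * d1;       // closest point on ray 1
    const cv::Point3d p2 = c2 + t2 * d2;       // closest point on ray 2
    return 0.5 * (p1 + p2);                    // midpoint = triangulated 3D point
}
```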
Regarding the overall optimization: yes, you can run an iterative LM optimization over all the parameters. This is the method used in applications like SLAM for autonomous vehicles, where accuracy is very important. You can find some papers by googling "bundle adjustment SLAM".

Align profile face image with its frontal face image

I have a profile face:
and a frontal face image:
Output: aligned profile face with reference to frontal face.
Idea: I just need to know which 3 common points I can take, which will be visible on both faces, and then use affineTransform to display the aligned profile face,
OR any other simple method of doing so.
Development environment: C++ and OpenCV 2.4.2
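A minimal sketch of that affine idea (the point coordinates below are made-up placeholders; in practice they would come from a landmark detector or manual selection):

```cpp
// Hedged sketch of the 3-point affine alignment (OpenCV 2.4 API).
#include <opencv2/opencv.hpp>

cv::Mat alignProfileToFrontal(const cv::Mat& profile, const cv::Mat& frontal)
{
    // e.g. left eye corner, right eye corner, mouth corner (placeholder values)
    cv::Point2f src[3] = { cv::Point2f(120, 90), cv::Point2f(180, 95), cv::Point2f(150, 170) };  // profile
    cv::Point2f dst[3] = { cv::Point2f(100, 90), cv::Point2f(200, 90), cv::Point2f(150, 170) };  // frontal

    cv::Mat M = cv::getAffineTransform(src, dst);
    cv::Mat aligned;
    cv::warpAffine(profile, aligned, M, frontal.size());
    return aligned;
}
```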
What I tried:
Haar cascade feature detection (common detection point in both images = eye); it won't detect the ear in the frontal face.
OpenCV: Shift/Align face image relative to reference Image (Image Registration) (I get an error message)
As discussed here by @bytefish, finding the accurate position of the eyes in a given image is far from trivial. The Haar cascades for finding eyes in OpenCV produce too many false positives to be useful; moreover, this approach won't be robust to image rotation.
You'll need robust head pose estimation for aligning face images. Here are two of the most robust approaches (with code available):
Gary B. Huang, Vidit Jain, and Erik Learned-Miller. Unsupervised Joint Alignment of Complex Images. International Conference on Computer Vision (ICCV), 2007. (Project page), (PDF available online), (Source code)
X. Zhu, D. Ramanan. Face Detection, Pose Estimation and Landmark Localization in the Wild. Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, June 2012. (Project page), (PDF available online), (Source code)
For example, using the method described in the second paper, you will get more robust features like the ones shown in the following images. These robust features will, in turn, give more robust face alignment.
If you are looking for something really simple, you can treat your mouth as a line on a planar object and calculate the rotation from the amount of line foreshortening. You should never smile, though, neither when you take pictures nor when you write your code.
A much cooler approach though would be to map your face as a texture to a predefined 3D model and rotate it until it correlates best with your profile view.
Of course, the right way to do this is to use a set of different head rotations to train a binary classifier that does only pairwise intensity comparisons, as in Lepetit's paper.

Detecting a cross in an image with OpenCV

I'm trying to detect a shape (a cross) in my input video stream with the help of OpenCV. Currently I'm thresholding to get a binary image of my cross, which works pretty well. Unfortunately, my algorithm for deciding whether the extracted blob is a cross or not doesn't perform very well. As you can see in the image below, not all corners are detected from certain perspectives.
I'm using findContours() and approxPolyDP() to get an approximation of my contour. If I detect 12 corners/vertices in this approximated curve, the blob is assumed to be a cross.
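For reference, that check boils down to roughly this (the epsilon factor and area filter are assumptions to tune):

```cpp
// Rough sketch of the current approach: find contours in the binary image,
// approximate each as a polygon and count its vertices.
#include <opencv2/opencv.hpp>
#include <vector>

bool containsCross(const cv::Mat& binary)
{
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(binary.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours) {
        if (cv::contourArea(c) < 500) continue;          // ignore small blobs
        std::vector<cv::Point> approx;
        cv::approxPolyDP(c, approx, 0.02 * cv::arcLength(c, true), true);
        if (approx.size() == 12) return true;            // a cross has 12 vertices
    }
    return false;
}
```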
Is there any better way to solve this problem? I thought about SIFT, but the algorithm has to perform in real time, and I have read that SIFT is not really suitable for real-time use.
I have a couple of suggestions that might provide some interesting results although I am not certain about either.
If the cross is always near the center of your image and always lies on a planar surface, you could try to find a homography between the camera and the plane on which the cross lies. This would enable you to transform a sample image of the cross (at a selection of different in-plane rotations) into the coordinate system of the visualized cross. You could then generate templates to match against the image, using some simple pixel agreement tests to decide whether you have a match.
Alternatively, you could try to train a Haar-based classifier to recognize the cross. This type of classifier is often used in face detection: it detects oriented edges in images and classifies faces by the relative positions of several oriented edges. It has good classification accuracy on faces and is extremely fast. Although I cannot vouch for its accuracy in this particular situation, it might provide good results for simple shapes such as a cross.
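If you go that route, applying an already trained cascade is straightforward; the hard part is training it on cross/background samples with opencv_traincascade. A sketch (the XML file name is a placeholder):

```cpp
// Sketch of applying a trained cascade; "cross_cascade.xml" is a placeholder
// you would have to produce yourself with opencv_traincascade.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Rect> detectCrosses(const cv::Mat& gray)
{
    static cv::CascadeClassifier cascade("cross_cascade.xml");
    std::vector<cv::Rect> hits;
    cascade.detectMultiScale(gray, hits, 1.1, 3, 0, cv::Size(24, 24));
    return hits;
}
```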
Computing the convex hull and then taking advantage of the convexity defects might work.
All crosses should have four convexity defects, making up four sets of two points, i.e. four vectors. Furthermore, if your shape is a cross, these four vectors will have two pairs of supplementary angles.
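A sketch of that convexity-defect test (the defect-depth threshold is an assumption to tune):

```cpp
// Sketch of the convexity-defect idea: a cross-shaped contour should produce
// exactly four deep defects between the arms.
#include <opencv2/opencv.hpp>
#include <vector>

bool looksLikeCross(const std::vector<cv::Point>& contour)
{
    std::vector<int> hullIdx;
    cv::convexHull(contour, hullIdx, false, false);      // hull as indices

    std::vector<cv::Vec4i> defects;
    cv::convexityDefects(contour, hullIdx, defects);

    int deepDefects = 0;
    for (const cv::Vec4i& d : defects)
        if (d[3] / 256.0 > 10.0)                         // defect depth in pixels
            ++deepDefects;

    return deepDefects == 4;                             // a cross has 4 defects
}
```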