How to segment whiteboard region? - c++

This is a case of whiteboard e-learning. The video shows the instructor teaching using the whiteboard.
The student is asked to select four corners of the whiteboard. two of the corners may not be in the visible region. Can anyone suggest an algorithm which finds out the whiteboard area based of the four corner points selected?
I want to do something like what we see in camscanner app.

If you can mark 4 points in image coordinates, then you can map these points to another quadrangle, or more specifically, rectangle using a homography. You will also need the whiteboard aspect ratio so that your result will have the correct X and Y equal scales. With this homography you can warp the video to straighten the board.
Note that the warping will cause some blurring of more distant areas and anything outside the whiteboard plane, e.g. the instructor, will be unrealistically warped.
If some of these points are outside the image, you will have to approximate their position for a proper view.

Related

Finding order of objects with OpenCV 3

With a group of friend, we are trying to accomplish a computer vision task on Raspberry Pi, coding with C++ using OpenCV library.
Let me explain the task first.
There is a pattern consisting of 16 seperate squares with each square being red, yellow or blue colored. We are mounting rasperry pi on a quadcopter with its camera module and gathering video feed of the pattern.
We have to detect colors of squares which was easy to accomplish with a little research on web. Tricky part is we have to detect order of the squares as well in order to save the colors in an array in an order.
So far we have accomplished filtering desired colors (red, yellow, blue) to determine squares.
example pattern to recognize and our process so far
In the second image, we know the colors and center points of each square. What we need is a way to write them in an order to a file or on screen.
And to find the order, we tried several OpenCV methods that find corners. With corner points at hand, we compared each point and determined end points so we could draw a boundingrectangle and overcome little distortions.
But since quadcopter gets the video stream, there is always a chance of high distortion. That messes up our corner theory, resulting in wrong order of colors. For example it can capture an image like this:
highly distorted image
It is not right to find order of these squares by comparing their center points. It also won't work finding endpoints to draw a larger rectangle around them to flatten pattern. And then order...
What I ask for is algorithm suggestions. Are we totally going in the wrong direction trying to find corners? Is it possible to determine the order without taking distortion into consideration?
Thanks in advance.
Take the two centers that are the furthest apart and number them 1 and 16. Then find the two centers that are the furthest from the line 1-16, to the left (number 4) and to the right (number 13). Now you have the four corners.
Compute the affine transform that maps the coordinates of the corners 1, 4 and 13 to (0,0), (3,0) and (0,3). Apply this transform to the 16 centers and round to the nearest integers. If all goes well, you will obtain the "logical" coordinates of the squares, in range [0, 3] x [0, 3]. The mapping to the cell indexes is immediate.
Note that because of symmetry, a fourfold undeterminacy will remain, which you can probably lift by checking the color patterns.
This procedure will be very robust to deformations. If there is extreme perspective, you can even exploit the four corners to determine an homographic transform instead of affine. In your case, I doubt this will be useful. You can assess proper working by checking that all expected indexes have been assigned.

Detecting/correcting Photo Warping via Point Correspondences

I realize there are many cans of worms related to what I'm asking, but I have to start somewhere. Basically, what I'm asking is:
Given two photos of a scene, taken with unknown cameras, to what extent can I determine the (relative) warping between the photos?
Below are two images of the 1904 World's Fair. They were taken at different levels on the wireless telegraph tower, so the cameras are more or less vertically in line. My goal is to create a model of the area (in Blender, if it matters) from these and other photos. I'm not looking for a fully automated solution, e.g., I have no problem with manually picking points and features.
Over the past month, I've taught myself what I can about projective transformations and epipolar geometry. For some pairs of photos, I can do pretty well by finding the fundamental matrix F from point correspondences. But the two below are causing me problems. I suspect that there's some sort of warping - maybe just an aspect ratio change, maybe more than that.
My process is as follows:
I find correspondences between the two photos (the red jagged lines seen below).
I run the point pairs through Matlab (actually Octave) to find the epipoles. Currently, I'm using Peter Kovesi's
Peter's Functions for Computer Vision.
In Blender, I set up two cameras with the images overlaid. I orient the first camera based on the vanishing points. I also determine the focal lengths from the vanishing points. I orient the second camera relative to the first using the epipoles and one of the point pairs (below, the point at the top of the bandstand).
For each point pair, I project a ray from each camera through its sample point, and mark the closest covergence of the pair (in light yellow below). I realize that this leaves out information from the fundamental matrix - see below.
As you can see, the points don't converge very well. The ones from the left spread out the further you go horizontally from the bandstand point. I'm guessing that this shows differences in the camera intrinsics. Unfortunately, I can't find a way to find the intrinsics from an F derived from point correspondences.
In the end, I don't think I care about the individual intrinsics per se. What I really need is a way to apply the intrinsics to "correct" the images so that I can use them as overlays to manually refine the model.
Is this possible? Do I need other information? Obviously, I have little hope of finding anything about the camera intrinsics. There is some obvious structural info though, such as which features are orthogonal. I saw a hint somewhere that the vanishing points can be used to further refine or upgrade the transformations, but I couldn't find anything specific.
Update 1
I may have found a solution, but I'd like someone with some knowledge of the subject to weigh in before I post it as an answer. It turns out that Peter's Functions for Computer Vision has a function for doing a RANSAC estimate of the homography from the sample points. Using m2 = H*m1, I should be able to plot the mapping of m1 -> m2 over top of the actual m2 points on the second image.
The only problem is, I'm not sure I believe what I'm seeing. Even on an image pair that lines up pretty well using the epipoles from F, the mapping from the homography looks pretty bad.
I'll try to capture an understandable image, but is there anything wrong with my reasoning?
A couple answers and suggestions (in no particular order):
A homography will only correctly map between point correspondences when either (a) the camera undergoes a pure rotation (no translation) or (b) the corresponding points are all co-planar.
The fundamental matrix only relates uncalibrated cameras. The process of recovering a camera's calibration parameters (intrinsics) from unknown scenes, known as "auto-calibration" is a rather difficult problem. You'd need these parameters (focal length, principal point) to correctly reconstruct the scene.
If you have (many) more images of this scene, you could try using a system such as Visual SFM: http://ccwu.me/vsfm/ It will attempt to automatically solve the Structure From Motion problem, including point matching, auto-calibration and sparse 3D reconstruction.

How to make people with different distances to the camera appear to be the same sizes?

Now I am doing a project which is called crowd estimation. I am focused on estimating the crowd levels in canteens. My approach is based on subtracting the background and only keeping the people in the foreground. The people are represented by white pixels and the background is black. So I can estimate the crowd level by counting the white pixels in the foreground. However, this needs people with different distances to the camera appear to be the same sizes. Otherwise, the white pixels for the people sitting near to the camera may be 5 times another person who sits far away from the camera. This is not I want, I want their pixels to be almost the same. Below is a screenshot of the canteen:
The first way I have tried is by segmenting the scene into different regions, the near regions are assigned with less weights, the far regions are assigned with more weights. However, the segmentation and weights assigning are all done manually, and it's hard to judge the line to segment and the weights assigned to each region. I use a picture below to show how it is done:
Another way I have tried is the perceptive transform, choosing four points on the input image and mapping them to the 4 points on the output image. However it's hard to choose the 4 points and it's hard to decide whether the people's sizes are the same after the transformation. It's shown below:
Can anyone provide a good way to solve this people's sizes problem(due too different distances)? Your reply is greatly appreciated.

Detect ball/circle in OpenCV (C++)

I am trying to detect a ball in an filtered image.
In this image I've already removed the stuff that can't be part of the object.
Of course I tried the HoughCircle function, but I did not get the expected output.
Either it didn't find the ball or there were too many circles detected.
The problem is that the ball isn't completly round.
Screenshots:
I had the idea that it could work, if I identify single objects, calculate their center and check whether the radius is about the same in different directions.
But it would be nice if it detect the ball also if he isn't completely visible.
And with that method I can't detect semi-circles or something like that.
EDIT: These images are from a video stream (real time).
What other method could I try?
Looks like you've used difference imaging or something similar to obtain the images you have..? Instead of looking for circles, look for a more generic loop. Suggestions:
Separate all connected components.
For every connected component -
Walk around the contour and collect all contour pixels in a list
Suggestion 1: Use least squares to fit an ellipse to the contour points
Suggestion 2: Study the curvature of every contour pixel and check if it fits a circle or ellipse. This check may be done by computing a histogram of edge orientations for the contour pixels, or by checking the gradients of orienations from contour pixel to contour pixel. In the second case, for a circle or ellipse, the gradients should be almost uniform (ask me if this isn't very clear).
Apply constraints on perimeter, area, lengths of major and minor axes, etc. of the ellipse or loop. Collect these properties as features.
You can either use hard-coded heuristics/thresholds to classify a set of features as ball/non-ball, or use a machine learning algorithm. I would first keep it simple and simply use thresholds obtained after studying some images.
Hope this helps.

3D reconstruction using stereo vison - theory

I am currently reading into the topic of stereo vision, using the book of Hartley&Zimmerman alongside some papers, as I am trying to develop an algorithm capable of creating elevation maps from two images.
I am trying to come up with the basic steps for such an algorithm. This is what I think I have to do:
If I have two images I somehow have to find the fundamental matrix, F, in order to find the actual elevation values at all points from triangulation later on. If the cameras are calibrated this is straightforward if not it is slightly more complex (plenty of methods for this can be found in H&Z).
It is necessary to know F in order to obtain the epipolar lines. These are lines that are used in order to find image point x in the first image back in the second image.
Now comes the part were it gets a bit confusing for me:
Now I would start taking a image point x_i in the first picture and try to find the corresponding point x_i’ in the second picture, using some matching algorithm. Using triangulation it is now possible to compute the real world point X and from that it’s elevation. This process will be repeated for every pixel in the right image.
In the perfect world (no noise etc) triangulation will be done based on
x1=P1X
x2=P2X
In the real world it is necessary to find a best fit instead.
Doing this for all pixels will lead to the complete elevation map as desired, some pixels will however be impossible to match and therefore can't be triangulated.
What confuses me most is that I have the feeling that Hartley&Zimmerman skip the entire discussion on how to obtain your point correspondences (matching?) and that the papers I read in addition to the book talk a lot about disparity maps which aren’t mentioned in H&Z at all. However I think I understood correctly that the disparity is simply the difference x1_i- x2_i?
Is this approach correct, and if not where did I make mistakes?
Your approach is in general correct.
You can think of a stereo camera system as two points in space where their relative orientation is known. This are the optical centers. In front of each optical center, you have a coordinate system. These are the image planes. When you have found two corresponding pixels, you can then calculate a line for each pixel, wich goes throug the pixel and the respectively optical center. Where the two lines intersect, there is the object point in 3D. Because of the not perfect world, they will probably not intersect and one may use the point where the lines are closest to each other.
There exist several algorithms to detect which points correspond.
When using disparities, the two image planes need to be aligned such that the images are parallel and each row in image 1 corresponds to the same row in image 2. Then correspondences only need to be searched on a per row basis. Then it is also enough to know about the differences on x-axis of the single corresponding points. This is then the disparity.