Reconstruction of stereo image from single view images

Reconstruction of stereo image from single view images - c++

How can I reconstruct an image from the stereo image pairs using OpenCV?

This is not necessarily an easy-to-solve problem. The thing is that both images store almost the same information, but from a slightly different perspective (angle and distance). So you have a perspective for each 2 of the stereo-optics. The only way to restore this is if(a) you knew what this perspective would be, e.g. a relative position-vector between both perspectives and the angle for both, you could create a mapping for a pixel in one of the images to the other.
The color of this (mapped) pixel ought to be the same, but as older stereo-optic-systems mapped to blue and red, you might have different values and thus have gained information doing this. Still, without these perspectives, you will need to correlate both pictures to each other and do quite complex image processing. I would suggest using scholar.google.com, unfortunately I failed to find anything useful, if you also can't find it there, start a phd ;)
Anyone who does know an algorithm of method to somehow restore such images, please let me know :) I am very curious about this as well.

Related

Detect a 2 x 3 Matrix of white dots in an image

I want to locate a service robot via infrared landmarks. The idea is to detect two landmarks, get the distance to the landmarks and calculate the robots position from these informations (the position of the landmarks are known).
For this I have built an artificial 2x3 matrix of IR LEDs, which are visible in the robots infrared camera image (shown in the image below).
As the first step, I want to detect a single landmark in a picture and get it's x-y coordinates. I can use these coordinates in the future to get the distance from the depth-image provided.
My first approach was to convert the image to a black and white image. Then I tried to filter out different cluster of points (which i dilated and contoured in the first place). I couldn't succeed with this method.
Now I wonder if there are any pattern recognition/computer vision methods which can help me to quite "easily" detect the pattern.
I've added a picture of the infrared image with the landmark in it and a converted black/white image.
a) Which method can help me to solve this problem?
b) Should I use a 3x3 Matrix or any other geometric form instead of the 2x3 Matrix ?
IR-Image
Black-White Image

A direct answer:
1) find all small circles in the image; 2) look among these small circles for ones that are the same size and close together, and, say, form parallel lines.
The reason for this approach is that you have coded the robot with a specific pattern of small objects. Therefore, look for the objects and then look for the pattern. (If the orientation and size wouldn't change, then you could just look for a sub-image within the larger image, but because it can, you need to look for elements of the pattern that remain consistent with motion in the 3D space, that is, the parallel lines.)
This will work in the example images, but to know whether this will work more generally, we need to know more than you told us: It depends on whether the variation in the images of the matrix and the variations in the background will let this be enough to distinguish between them. If not, maybe you need a more clever algorithm or maybe a different pattern of lights. In the extreme case, it's obvious that if you had another 2x3 matric around, it's not enough. It all depends on the variation of the object to be identified and the variations within the background scene, and because you don't tell us either of these things, it's hard to say the best way, what's good enough, what's a better way, etc.
If you have the choice, and here it sound like you do, good data is better than clever analysis. For this problem, I'd call good data to be anything that clearly distinguishes the object from the background. You need to think of it this way, and look at what the background is, and all the different perspectives on the lights that are possible, and make sure these can never be confused.
For example, if you have a lot of control over this, and enough time, temporal variations are often the easiest. Turning the lights (or a subset of the lights) on and off, etc, and then looking for the expected temporal variation is often the surest way to distinguish signal from noise — but really, this again is just making an assumption about the background and foreground (ie, that the background won't vary with some particular time pattern).

Detecting/correcting Photo Warping via Point Correspondences

I realize there are many cans of worms related to what I'm asking, but I have to start somewhere. Basically, what I'm asking is:
Given two photos of a scene, taken with unknown cameras, to what extent can I determine the (relative) warping between the photos?
Below are two images of the 1904 World's Fair. They were taken at different levels on the wireless telegraph tower, so the cameras are more or less vertically in line. My goal is to create a model of the area (in Blender, if it matters) from these and other photos. I'm not looking for a fully automated solution, e.g., I have no problem with manually picking points and features.
Over the past month, I've taught myself what I can about projective transformations and epipolar geometry. For some pairs of photos, I can do pretty well by finding the fundamental matrix F from point correspondences. But the two below are causing me problems. I suspect that there's some sort of warping - maybe just an aspect ratio change, maybe more than that.
My process is as follows:
I find correspondences between the two photos (the red jagged lines seen below).
I run the point pairs through Matlab (actually Octave) to find the epipoles. Currently, I'm using Peter Kovesi's
Peter's Functions for Computer Vision.
In Blender, I set up two cameras with the images overlaid. I orient the first camera based on the vanishing points. I also determine the focal lengths from the vanishing points. I orient the second camera relative to the first using the epipoles and one of the point pairs (below, the point at the top of the bandstand).
For each point pair, I project a ray from each camera through its sample point, and mark the closest covergence of the pair (in light yellow below). I realize that this leaves out information from the fundamental matrix - see below.
As you can see, the points don't converge very well. The ones from the left spread out the further you go horizontally from the bandstand point. I'm guessing that this shows differences in the camera intrinsics. Unfortunately, I can't find a way to find the intrinsics from an F derived from point correspondences.
In the end, I don't think I care about the individual intrinsics per se. What I really need is a way to apply the intrinsics to "correct" the images so that I can use them as overlays to manually refine the model.
Is this possible? Do I need other information? Obviously, I have little hope of finding anything about the camera intrinsics. There is some obvious structural info though, such as which features are orthogonal. I saw a hint somewhere that the vanishing points can be used to further refine or upgrade the transformations, but I couldn't find anything specific.
Update 1
I may have found a solution, but I'd like someone with some knowledge of the subject to weigh in before I post it as an answer. It turns out that Peter's Functions for Computer Vision has a function for doing a RANSAC estimate of the homography from the sample points. Using m2 = H*m1, I should be able to plot the mapping of m1 -> m2 over top of the actual m2 points on the second image.
The only problem is, I'm not sure I believe what I'm seeing. Even on an image pair that lines up pretty well using the epipoles from F, the mapping from the homography looks pretty bad.
I'll try to capture an understandable image, but is there anything wrong with my reasoning?

A couple answers and suggestions (in no particular order):
A homography will only correctly map between point correspondences when either (a) the camera undergoes a pure rotation (no translation) or (b) the corresponding points are all co-planar.
The fundamental matrix only relates uncalibrated cameras. The process of recovering a camera's calibration parameters (intrinsics) from unknown scenes, known as "auto-calibration" is a rather difficult problem. You'd need these parameters (focal length, principal point) to correctly reconstruct the scene.
If you have (many) more images of this scene, you could try using a system such as Visual SFM: http://ccwu.me/vsfm/ It will attempt to automatically solve the Structure From Motion problem, including point matching, auto-calibration and sparse 3D reconstruction.

Pixel level image registration / alignment?

I'm trying to remove foreground from two images, here's a sample pair of images:
As you can see, the Budweiser bottle is removed from the scene before the second shot is taken.
These photos were captured from a pinhole camera (iPhone), and, the tricky part is I'm hand-holding the camera, so it cannot be guaranteed that the images are perfectly aligned pixel by pixel, so a simple minus-threshold method will not work.
Then, I've decided to perform image registration using findHomography and warpPerspective from OpenCV, here's the result image:
This image is warped with the matrix I've got from findHomography, it kind of improved the alignment quality, but still not that aligned so I can use a simple way to remove the foreground.
So, finally, I decided to implement a "fuzzy-minus" algorithm: for every pixel in image1, I'll look through a 7x7 neighbour in image2 (a 7 by 7 kernel?), using the minimal difference in grayscale as the result of minus, and threshold the result into binary image, here's what I've got:
And the result is still not good. Notice the white wholes in the bottle, this is produced due to similar grayscale value of foreground and background. So I'm not sure what to do now.
I can think of two ways to solve the problem, the first is to get a better aligned pair of images, and simply minus the pairs; the second is to use a more robust way to extract the foreground.
Can anyone give me some advice on how to deal with this kind of problem? I believe there should be some state-of-art algorithms or processing pipelines, but after googling around, I get nothing.
I'm using OpenCV with C++, it would be fantastic if you can tell me how to do it with these tools in hand.
Big big thanks in advance!

The problem is not in your algorithm. You are having problem because the two scenes were not taken from exactly the same angle, as shown in the animation below. This slight difference highlight the edges in the subtraction.
You need a static camera in order to apply this approach.

I suggest using mathematical morphology on the mask that you got to get rid of the artifacts.
Try applying both opening and closing to get rid of the black and the white small regions.
Mathematical Morphology
Mathematical Morphology in opencv
The difference between the two picture is pretty huge, so you will need to use a large structure element, but I don't think you will be able to get rid of the shadow.
For the two large strips in the background, you may try to use a horizontally shaped structure element as well.
Edit
Is it possible to produce a grayscale image instead of a binary image? if yes, you may try to experiment with the hat method for the shadow, but I am not sure about this point.
This is what I got using two different structure elements for closing THEN opening
Mat mask = imread("mask.jpg",CV_LOAD_IMAGE_GRAYSCALE);
morphologyEx(mask,mask,MORPH_CLOSE,getStructuringElement(CV_SHAPE_ELLIPSE,Size(50,10)));
morphologyEx(mask,mask,MORPH_OPEN,getStructuringElement(CV_SHAPE_ELLIPSE,Size(10,50)));
imshow("open",mask);
imwrite("maskopenclose.jpg",mask);

I would suggest optical flow for alignment and OpenCV's background subtraction algorithm:
http://docs.opencv.org/trunk/doc/tutorials/video/background_subtraction/background_subtraction.html

I suggest that instead of using findHomography try using some of openCV's stereo correspondence functions: http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
there is a sample code here: https://github.com/Itseez/opencv/blob/master/samples/cpp/stereo_calib.cpp

Target Detection - Algorithm suggestions

I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person from the scene. I've tried to use Correlation but the image is too noisy and therefore doesn't give correct/accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to move the noise around the images, so then I can use Correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?

In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for using data with tags for what the images are, using a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to 0 and increment the y of your window by jump_size.
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. It's also good because it can recognize images similar to ones it has seen before, but are slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.

A FFT does only make sense when you already have sort the image with kd-tree or a hierarchical tree. I would suggest to map the image 2d rgb values to a 1d curve and reducing some complexity before a frequency analysis.

I do not have an exact algorithm to propose because I have found that target detection method depend greatly on the specific situation. Instead, I have some tips and advices. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red image, you'll have a much better starting point. (Apply the same operation on both the image and the target.) This will not work, though, if the noise is color-dependent (ie: is different on each color).
You could then use correlation on the transformed images with better result. The negative point of correlation is that it will work only with an exact cut-out of the first image... Not very useful if you need to find the target to help you find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to some point.
My final advice: In my personal experience of working with noisy images, spectral analysis is usually a good thing because the noise tend to contaminate only one particular scale (which would hopefully be a different scale than Wally's!) In addition, correlation is mathematically equivalent to comparing the spectral characteristic of your image and the target.

How to approach this image processing problem - classification

I have this image:
What I would like to do is classify this image between the flowers and trees, so that I could find the region of the image that is covered by trees, and the region that is covered by those flowers.
I was thinking that this could be some kind of FFT problem, but I'm not exactly sure how it would work. The FFT of the individual flower is different that the trees, so I could compare magnitudes there or something, but I dont know if thats the exact right approach.
The reason I was leaning down this route is because I have, in the past, written an image classification algorithm that relied on magnitude data to distinguish different areas of an image, but I'm just not sure how to generate that, or if its the right approach.
Thanks for any tips

You might try texture-based approaches such as o co-occurence matrix. Reasonably close to your FFT approach (you look for patterns in local similarity), but not restricted to simple frequencies.

What if you try extracting color planes from the RGB image? The "greener" components (i.e. the trees) should lie all in the green plane in RGB color space, whereas flowers will share components between red, green and blue (thus if you average the three planes I expect you to see the flowers enhanced.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js