OpenCV image-based optical flow field - c++

I am looking for a simple algorithm to detect the optical flow of the entire input.
In OpenCV, the Lucas-Kanade point tracking functionality is really good, but it is very slow for more than a handful of points. I am looking for an image-based result, rather than point-based. The only information I can find is about LK tracking.
I can calculate the magnitude of motion based on simple frame differencing, but I want to know the direction too. I basically want to end up with an optical flow field texture that I can feed into a gpu fluid simulation.
There must be some simple algorithm based on elementary motion detectors or something. Something like a combination of frame differencing, scaling and blurring with 3 sequential frames.
Just to be super clear, I DON'T want information on the Lucas-Kanade method.

OpenCV has a BackgroundSubtractor class that does a form of frame differencing; I guess you'll have to do the blurring part yourself. This is, however, not strictly a calculation of optical flow.
Farneback has a method for dense optical flow, implemented in OpenCV as cv::calcOpticalFlowFarneback(..). It fills a matrix "flow" that stores, for every pixel, the displacement in x and y. Magnitude and direction can be derived from those two components, for example with cv::cartToPolar. The Horn-Schunck method is not a built-in in OpenCV.
PS: Lucas-Kanade is not very slow. It's probably the extraction of feature points that is slow. Try using the cv::FAST detector.
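To illustrate how a dense flow field turns into magnitude and direction, here is a minimal standard-library sketch. It assumes the flow has already been copied into a flat array of (dx, dy) pairs; the FlowVec and Polar types and the toPolar/flowToPolar helpers are hypothetical names for this example, not OpenCV API.

```cpp
#include <cmath>
#include <vector>

// One flow vector: per-pixel displacement in x and y, in pixels.
struct FlowVec { float dx; float dy; };

// Magnitude in pixels and direction in radians, wrapped to be non-negative.
struct Polar { float magnitude; float angle; };

// Convert a single displacement to polar form.
Polar toPolar(const FlowVec& f) {
    const float kTwoPi = 6.28318530718f;
    float mag = std::hypot(f.dx, f.dy);
    float ang = std::atan2(f.dy, f.dx);  // result lies in (-pi, pi]
    if (ang < 0.0f) ang += kTwoPi;       // shift negative angles up by 2*pi
    return Polar{mag, ang};
}

// Convert a whole flow field stored row by row.
std::vector<Polar> flowToPolar(const std::vector<FlowVec>& flow) {
    std::vector<Polar> out;
    out.reserve(flow.size());
    for (const FlowVec& f : flow) out.push_back(toPolar(f));
    return out;
}
```

For a fluid simulation the raw (dx, dy) pairs are often what you want as a velocity texture; the polar form is mostly useful for visualisation or thresholding.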

Related

Mono Slam scale consistency

When running mono SLAM, how is scale consistency achieved between frames? One short tutorial ran the 5-point algorithm repeatedly and used a motion model for consistency of scale between frames, but that is clearly not done in the general case.
I think I read something once about how the 5-point algorithm is used initially to estimate motion between the first two frames, and then features are tracked over time using reprojection loss and 3D projection.
How is it done?

Undistorting without calibration images

I'm currently trying to "undistort" fisheye imagery using OpenCV in C++. I know the exact lens and camera model, so I figured that I would be able to use this information to calculate some parameters and ultimately convert fisheye images to rectilinear images. However, all the tutorials I've found online encourage using auto-calibration with checkerboards. Is there a way to calibrate the fisheye camera by just using camera + lens parameters and some math? Or do I have to use the checkerboard calibration technique?
I am trying to avoid having to use the checkerboard calibration technique because I am just receiving some images to undistort, and it would be undesirable to have to ask for images of checkerboards if possible. The lens is assumed to retain a constant zoom/focal length for all images.
Thank you so much!
To undistort an image, you need to know the intrinsic parameters of the camera that describe the distortion.
You can't compute them from datasheet values, because they depend on how the lens is manufactured, and two lenses of the same vendor and model might have different distortion coefficients, especially if they are cheap ones.
Some raster graphics editors embed a lens database from which you can query distortion coefficients. But there is no magic: the database was built by measuring the lens distortion, possibly with some interpolation afterwards.
You can still use an empirical method to correct at least the barrel effect.
There are plenty of shaders that do so, and you can always do your own maths to build a distortion map.

In computer vision, what does MVS do that SFM can't?

I'm a dev with about a decade of enterprise software engineering under his belt, and my hobbyist interests have steered me into the vast and scary realm of computer vision (CV).
One thing that is not immediately clear to me is the division of labor between Structure from Motion (SFM) tools and Multi View Stereo (MVS) tools.
Specifically, CMVS appears to be the best-in-show MVS tool, and Bundler seems to be one of the better open source SFM tools out there.
Taken from CMVS's own homepage:
You should ALWAYS use CMVS after Bundler and before PMVS2
I'm wondering: why?!? My understanding of SFM tools is that they perform the 3D reconstruction for you, so why do we need MVS tools in the first place? What value/processing/features do they add that SFM tools like Bundler can't address? Why the proposed pipeline of:
Bundler -> CMVS -> PMVS2
?
Quickly put, Structure from Motion (SfM) and MultiView Stereo (MVS) techniques are complementary, as they do not work under the same assumptions. They also differ slightly in their inputs: MVS requires camera parameters to run, and those are estimated (output) by SfM. SfM only gives a coarse 3D output, whereas PMVS2 gives a denser output, and finally CMVS is there to circumvent some limitations of PMVS2.
The rest of the answer provides a high-level overview of how each method works, explaining why it is this way.
Structure from Motion
The first step of the 3D reconstruction pipeline you highlighted is an SfM algorithm, which could be run using Bundler, VisualSFM, OpenMVG or the like. This algorithm takes some images as input and outputs the camera parameters of each image (more on this later) as well as a coarse 3D shape of the scene, often called the sparse reconstruction.
Why does SfM output only a coarse 3D shape? Basically, SfM techniques begin by detecting 2D features in every input image and matching those features between pairs of images. The goal is, for example, to tell "this table corner is located at those pixel locations in those images." Those features are described by what we call descriptors (like SIFT or ORB). Descriptors are built to represent a small region (i.e., a bunch of neighboring pixels) in an image. They can reliably represent highly textured or rough geometries (e.g., edges), but these scene features need to be unique (in the sense of distinguishable) throughout the scene to be useful. For example (maybe oversimplified), a wall with repetitive patterns would not be very useful for the reconstruction, because even though it is highly textured, every region of the wall could potentially match pretty much everywhere else on the wall. Since SfM performs the 3D reconstruction using those features, the vertices of the 3D scene reconstruction will be located on those unique textures or edges, giving a coarse mesh as output. SfM won't typically produce a vertex in the middle of a surface without precise and distinguishing texture. But when many matches are found between two images, one can compute a 3D transformation matrix between them, effectively giving the relative 3D position between the two camera poses.
MultiView Stereo
Afterwards, the MVS algorithm is used to refine the mesh obtained by the SfM technique, resulting in what is called a dense reconstruction. This algorithm requires the camera parameters of each image to work, and these are output by the SfM algorithm. As it works on a more constrained problem (since it already has the camera parameters of every image, such as position, rotation, focal length, etc.), MVS will compute 3D vertices in regions which were not (or could not be) correctly detected by descriptors or matched. This is what PMVS2 does.
How can PMVS work on regions where 2D feature descriptors would be difficult to match? Since you know the camera parameters, you know that a given pixel in one image must lie somewhere along a known line (the epipolar line) in another image. This constraint comes from epipolar geometry. Whereas SfM had to search the entire 2D image for every descriptor to find a potential match, MVS works on a single 1D line to find matches, which simplifies the problem quite a bit. With the search this constrained, MVS can usually afford to take illumination and object materials into account in its optimization, which SfM does not.
There is one issue, though: PMVS2 performs quite a complex optimization that can be dreadfully slow or take an astronomical amount of memory on large image sequences. This is where CMVS comes into play, clustering the coarse 3D SfM output into regions. PMVS2 is then called (potentially in parallel) on each cluster, simplifying its execution. CMVS then merges the PMVS2 outputs into a unified detailed model.
Conclusion
Most of the information provided in this answer and many more can be found in this tutorial from Yasutaka Furukawa, author of CMVS and PMVS2:
http://www.cse.wustl.edu/~furukawa/papers/fnt_mvs.pdf
In essence, the two techniques emerge from two different approaches: SfM aims to perform a 3D reconstruction using a structured (but unknown) sequence of images, while MVS is a generalization of two-view stereo vision, based on human stereopsis.

Remove noise from the computed optical flow

I compute the optical flow on grayscale videos which contain true-white and noisy-black patches besides the useful information. I want to remove those patches because the corresponding optical flow is meaningless.
Those patches are on the edges of the image and their sizes vary from one video to another. My goal is to extract a bounding box describing the useful information in my video by means of the optical flow.
How can I compute this bounding box? Or at least, how can I remove the computed optical flow in those regions?
Edit: I saw your answers. I'll try them next weekend and then come back to discuss. Thank you!
Removing noise from optical flow can be a complicated task. A simple, naive way is to apply a threshold to the magnitude of the optical flow vectors.
But if you only need to find bounding boxes, why not just use a simple background/moving-object segmentation such as MOG or GMG? OpenCV has nice implementations of them; they work well and are quite fast. See this tutorial.
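The magnitude threshold mentioned above could look roughly like the following sketch, which keeps only pixels whose flow magnitude falls inside a plausible range and returns their bounding box. Using both a lower and an upper bound is an assumption of this example (the upper bound is meant to reject wildly large vectors from the noisy patches), and FlowVec, Box and plausibleFlowBounds are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel displacement in pixels.
struct FlowVec { float dx; float dy; };

// Inclusive pixel bounds; valid is false when no pixel passed the test.
struct Box { int x0; int y0; int x1; int y1; bool valid; };

// Bounding box of all pixels whose flow magnitude lies in [minMag, maxMag].
// The flow field is stored row by row and must hold width * height entries.
Box plausibleFlowBounds(const std::vector<FlowVec>& flow, int width, int height,
                        float minMag, float maxMag) {
    Box box{width, height, -1, -1, false};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const FlowVec& f = flow[static_cast<std::size_t>(y) * width + x];
            const float mag = std::hypot(f.dx, f.dy);
            if (mag < minMag || mag > maxMag) continue;  // reject this pixel
            if (x < box.x0) box.x0 = x;
            if (y < box.y0) box.y0 = y;
            if (x > box.x1) box.x1 = x;
            if (y > box.y1) box.y1 = y;
            box.valid = true;
        }
    }
    return box;
}
```

Accumulating the box over several frames, rather than a single one, would likely make it more stable, since useful regions may be motionless in any given frame.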
It's a little tough to understand what the problem is. If the noise consists of true-white and noisy-black patches in a grayscale image, as you have said, then I suggest you look at eroding and dilating. More information can be found here: Eroding and Dilating
If this is not what you are asking, do post some sample images with the patches and leave a comment so that I can get a clearer idea of what the problem is. Cheers.
If I understand correctly, you are getting noisy optical flow in patches which are grey/white or otherwise basically uniform. A simple approach would be to divide the image into small patches and compute the entropy of each patch. Patches with very low entropy do not contain much information, so they can be discarded by choosing an appropriate threshold.
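A minimal sketch of the per-patch entropy, assuming 8-bit grayscale pixels already copied out of the patch into a flat vector (the patchEntropy name is hypothetical). It computes the Shannon entropy of the intensity histogram in bits:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy (in bits) of the intensity histogram of one patch.
// Returns 0 for an empty or perfectly uniform patch, up to 8 for a patch
// that uses all 256 grey levels equally often.
double patchEntropy(const std::vector<std::uint8_t>& pixels) {
    if (pixels.empty()) return 0.0;
    std::array<std::size_t, 256> hist{};  // zero-initialised counts
    for (std::uint8_t p : pixels) ++hist[p];
    const double n = static_cast<double>(pixels.size());
    double entropy = 0.0;
    for (std::size_t count : hist) {
        if (count == 0) continue;         // 0 * log(0) is treated as 0
        const double prob = static_cast<double>(count) / n;
        entropy -= prob * std::log2(prob);
    }
    return entropy;
}
```

A true-white patch would score close to zero and be discarded. Note that a noisy-black patch may still have noticeable entropy, so the threshold (or an additional mean-intensity check) would need to be tuned on real footage.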

Traffic Motion Recognition

I'm trying to build a simple traffic motion monitor to estimate the average speed of moving vehicles, and I'm looking for guidance on how to do so using an open source package like OpenCV or others that you might recommend for this purpose. Are there any resources that are particularly good for this problem?
The setup I'm hoping for is to install a webcam on a high-rise building next to the road in question, and point the camera down onto moving traffic. Camera altitude would be anywhere between 20 ft and 100 ft, and the building would be anywhere between 20 ft and 500 ft away from the road.
Thanks for your input!
Generally speaking, you need a way to detect cars so you can get their 2D coordinates in the video frame. You might want to use a tracker to speed up the process and take advantage of the predictable motion of the vehicles. You also need a way to calibrate the camera so you can translate the 2D coordinates in the image into real-world positions and approximate speed.
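Once a vehicle's image position is known in two frames, the last step is simple arithmetic. The sketch below assumes the crudest possible calibration, a single constant metres-per-pixel scale along the road, which only holds approximately when the road is flat and perspective is mild; a proper setup would map image points to the ground plane first. The function name and parameters are hypothetical.

```cpp
#include <cmath>

// Approximate speed in metres per second of a point that moved from
// (x0, y0) to (x1, y1) in image pixels over framesElapsed frames.
// metersPerPixel is an assumed constant scale obtained from calibration.
// Returns 0 when the timing inputs are not positive.
double speedMetersPerSecond(double x0, double y0, double x1, double y1,
                            double metersPerPixel, double fps,
                            int framesElapsed) {
    if (fps <= 0.0 || framesElapsed <= 0) return 0.0;
    const double pixels = std::hypot(x1 - x0, y1 - y0);
    const double meters = pixels * metersPerPixel;
    const double seconds = framesElapsed / fps;
    return meters / seconds;
}
```

For example, a displacement of 50 pixels at 0.1 m per pixel over 15 frames of a 30 fps video is 5 m in 0.5 s, i.e. 10 m/s (36 km/h). Averaging over many frames per vehicle should reduce the effect of detection jitter.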
So as a first step, look at detectors such as the deformable parts model (DPM), and at tracking-by-detection methods. You'll probably need to port some code from Matlab (and if you do, please make it available :-) ). If that's too slow, maybe do some segmentation of foreground blobs, and track the colour histogram or HOG descriptors using a particle filter or a Kalman filter to predict motion.