Best algorithm for feature detection in urban environment - OpenCV - c++

I'm using the OpenCV library (C++) to extract features from two images coming from a video stream taken by an aerial camera, in order to then find the matching points in successive images. I'm wondering which algorithm is best for finding robust features in an urban environment?
P.S. I'm currently using SURF, but when the image changes a little (because the camera is translating very slowly), very few matches between the descriptors remain!

If you want to try different approaches, give RoboRealm a try. They have a trial version: you just plug in the algorithms and see the results. It's fine for testing purposes even if you will end up using OpenCV.

Related

Stitching images can't detect common feature points

I wish to stitch two or more images using OpenCV and C++. The images have regions of overlap, but these are not being detected. I tried using a homography detector. Can someone please suggest what other methods I should use? Also, I wish to use the ORB algorithm, not SIFT or SURF.
The images can be found at:
https://drive.google.com/open?id=133Nbo46bgwt7Q4IT2RDuPVR67TX9xG6F
This is a very common problem. Images like these do not actually have much in common: the overlap region is not rich in features. What you can do is dig into the OpenCV stitcher code, which uses a confidence factor for feature matching; you can play with that confidence factor to get matches in this case. But this will only work if your feature detector is able to detect some features in the overlapping region.
You can also look at this post:
Related Question
It might be helpful for you.
"OpenCV stitching code"
This is the full pipeline of the OpenCV stitching code. You can see that there are a lot of parameters you can change to get a good stitching result. I would also suggest using a small image (640x480) for the feature detection step; using small images works better than using very large images.
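One thing to keep in mind with the small-image suggestion: a homography estimated on downscaled copies is expressed in the small images' pixel coordinates, so it has to be rescaled before warping the full-resolution images. Here is a minimal sketch of that conversion in plain C++, assuming both images were shrunk by the same factor. `Mat3`, `Point2`, `rescaleHomography` and `applyHomography` are illustrative names, not OpenCV API.

```cpp
#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct Point2 { double x, y; };

// Convert a homography estimated on images downscaled by `scale`
// (small = scale * full, e.g. 0.25) into one valid for full-resolution
// pixels: H_full = S^-1 * H_small * S, with S = diag(scale, scale, 1).
Mat3 rescaleHomography(const Mat3& hSmall, double scale) {
    Mat3 hFull = hSmall;
    hFull[0][2] /= scale;  // translation terms grow by 1/scale
    hFull[1][2] /= scale;
    hFull[2][0] *= scale;  // perspective terms shrink by scale
    hFull[2][1] *= scale;
    return hFull;
}

// Map a point through a homography (with the perspective division).
Point2 applyHomography(const Mat3& h, Point2 p) {
    const double w = h[2][0] * p.x + h[2][1] * p.y + h[2][2];
    return Point2{(h[0][0] * p.x + h[0][1] * p.y + h[0][2]) / w,
                  (h[1][0] * p.x + h[1][1] * p.y + h[1][2]) / w};
}
```

The upper-left 2x2 block (rotation, scale, shear) is unchanged; only the translation and perspective entries depend on the pixel scale.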

Generate an image that can be most easily detected by Computer Vision algorithms

I'm working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be designed for the specific feature detector you use. Detectors are commonly designed to respond well to "corners", i.e. regions with high gradients in both x and y. You should also take the scale of the target into account.
The simplest detection can be performed with blobs. It can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
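To make the blob idea concrete, here is a minimal sketch in plain C++ that finds 4-connected blobs in an already thresholded image and reports their area and centroid. It assumes a rectangular binary image stored as nested vectors; `Blob` and `findBlobs` are illustrative names, not an OpenCV API, and a real pipeline would add thresholding and shape filtering (circularity, aspect ratio) on top.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Blob {
    std::size_t area;  // number of foreground pixels
    double cx, cy;     // centroid in pixel coordinates (x = column, y = row)
};

// Find 4-connected regions of non-zero pixels with an iterative flood fill.
// Blobs are returned in row-major order of their first pixel.
std::vector<Blob> findBlobs(const std::vector<std::vector<int>>& img) {
    std::vector<Blob> blobs;
    if (img.empty()) return blobs;
    const std::size_t rows = img.size(), cols = img[0].size();
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (img[r][c] == 0 || seen[r][c]) continue;

            std::vector<std::pair<std::size_t, std::size_t>> stack;
            // Mark pixels as seen when pushed so each is visited once.
            auto visit = [&](std::size_t y, std::size_t x) {
                if (img[y][x] != 0 && !seen[y][x]) {
                    seen[y][x] = true;
                    stack.emplace_back(y, x);
                }
            };
            visit(r, c);

            std::size_t area = 0;
            double sumX = 0.0, sumY = 0.0;
            while (!stack.empty()) {
                const std::pair<std::size_t, std::size_t> p = stack.back();
                stack.pop_back();
                ++area;
                sumY += static_cast<double>(p.first);
                sumX += static_cast<double>(p.second);
                if (p.first > 0) visit(p.first - 1, p.second);
                if (p.first + 1 < rows) visit(p.first + 1, p.second);
                if (p.second > 0) visit(p.first, p.second - 1);
                if (p.second + 1 < cols) visit(p.first, p.second + 1);
            }
            blobs.push_back(Blob{area, sumX / static_cast<double>(area),
                                 sumY / static_cast<double>(area)});
        }
    }
    return blobs;
}
```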
Depending on the distance you want to see your markers from, the viewing conditions and backgrounds you typically have, and the camera resolution and noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easy to track using a homography, as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, and so on. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to detect its corners with OpenCV (they can then be passed to OpenCV's PnP function, solvePnP, if you need the pose):
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
The detection and drawing can literally be done by calling just two functions:
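// corners is a std::vector<cv::Point2f>; found is true only if the whole board was detected
bool found = cv::findChessboardCorners(src, chessboardSize, corners, camFlags);
cv::drawChessboardCorners(dst, chessboardSize, corners, found);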
To sum up, your question is very broad, and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have, indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!

Face recognition using neural networks

I am doing a project on face recognition, for which I have already tried different methods such as eigenfaces, fisherfaces, LBP histograms and SURF. But these methods are not giving me accurate results. SURF gives good matches for exactly the same images, but I need to match one image against different poses of the same person (wearing glasses, side pose, partly covered face, etc.). LBP compares histograms of images, i.e., only color information, so when there is high variation in lighting conditions it does not show good results. I have heard about neural networks, but I don't know much about them. Is it possible to train the system very accurately using neural networks? If so, how can we do that?
According to this OpenCV page, there does seem to be some support for machine learning. That being said, the support does seem to be a bit limited.
What you could do, would be to:
Use OpenCV to extract the face of the person.
Change the image to grey scale.
Try to manipulate so that the face is always the same size.
All the above should be doable with OpenCV itself (could be wrong, haven't messed with OpenCV in a while) so that should save you some time.
Next, you take the image, as a bitmap maybe, and feed the bitmap as a vector to the neural network. Alternatively, as @MatthiasB recommended, you could feed the features instead of individual pixels. This would simplify the data being passed, thus making the network easier to train.
As for training, you manipulate these images as above, and then feed them to the network. If a person uses glasses occasionally, you could have cases of the same person with and without glasses, etc.
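The preprocessing steps above (grey scale, fixed size, flatten to a vector) could be sketched roughly like this in plain C++. It assumes the face has already been cropped out and is available as interleaved 8-bit RGB; `RgbImage` and `toInputVector` are illustrative names, the grey weights are the common 0.299/0.587/0.114 luma weights, and nearest-neighbour resampling is used only to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct RgbImage {
    std::size_t width, height;
    std::vector<std::uint8_t> data;  // interleaved R,G,B, row-major
};

// Convert a cropped face to grey scale, resample it to side x side with
// nearest-neighbour lookup, and scale the values to [0, 1] so the result
// can be fed to a network as a fixed-length input vector.
std::vector<float> toInputVector(const RgbImage& face, std::size_t side) {
    std::vector<float> out;
    out.reserve(side * side);
    for (std::size_t y = 0; y < side; ++y) {
        const std::size_t srcY = y * face.height / side;
        for (std::size_t x = 0; x < side; ++x) {
            const std::size_t srcX = x * face.width / side;
            const std::uint8_t* px = &face.data[3 * (srcY * face.width + srcX)];
            const float grey = 0.299f * px[0] + 0.587f * px[1] + 0.114f * px[2];
            out.push_back(grey / 255.0f);
        }
    }
    return out;
}
```

Every face then produces a vector of the same length (side * side), which is what a fixed-size input layer needs.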

FFT based image registration (optionally using OpenCV) in cpp?

I'm trying to align two images taken from a handheld camera.
At first, I was trying to use the OpenCV warpPerspective method based on SIFT/SURF feature points. The problem is that the feature extraction and matching process may be extremely slow when the image resolution is high (3000x4000). I tried scaling down the image before finding feature points, but the result was not as good as before. (The Mat generated by findHomography shouldn't be affected by scaling down the image, right?) And sometimes, due to a lack of good feature point matches, the result is quite strange.
After searching on this topic, it seems that solving the problem in Fourier domain will speed up the registration process. And I've found this question which leads me to the code here.
The only problem is that the code is written in Python with NumPy (not even using OpenCV), which makes it quite hard to rewrite in C++ using OpenCV. (In OpenCV I can only find dft; there's no fftshift or other fft stuff. I'm not very familiar with NumPy, and I'm not brave enough to simply ignore the missing methods.) So I'm wondering why there is no such Fourier-domain image registration implementation in C++?
Can you guys give me some suggestions on how to implement one, or give me a link to an existing C++ version? Or help me turn the Python code into C++ code?
Big thanks!
I'm fairly certain that the FFT method can only recover a similarity transform, that is, only a (2d) rotation, translation and scale. Your results might not be that great using a handheld camera.
This is not quite a direct answer to your question, but, as a suggestion for a speed improvement, have you tried using a faster feature detector and descriptor? In OpenCV SIFT/SURF are some of the slowest methods they have for feature extraction/matching. You could try testing some of their other methods first, they all work quite well and are faster than SIFT/SURF. Especially if you use their FLANN-based matcher.
I've had to do this in the past with similar sized imagery, and using the binary descriptors OpenCV has increases the speed significantly.
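Part of why binary descriptors are so much faster: comparing two of them is just XOR plus a bit count, instead of a floating-point L2 distance. A rough brute-force sketch in plain C++, assuming 32-byte (256-bit) descriptors as ORB produces; `Descriptor`, `hammingDistance` and `nearestNeighbour` are illustrative names, not OpenCV's matcher API.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = std::array<std::uint8_t, 32>;  // 256 bits per keypoint

// Number of differing bits between two binary descriptors.
int hammingDistance(const Descriptor& a, const Descriptor& b) {
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return distance;
}

// Index of the train descriptor closest to `query`, or -1 if `train` is empty.
int nearestNeighbour(const Descriptor& query, const std::vector<Descriptor>& train) {
    int best = -1;
    int bestDistance = 0;
    for (std::size_t i = 0; i < train.size(); ++i) {
        const int d = hammingDistance(query, train[i]);
        if (best < 0 || d < bestDistance) {
            best = static_cast<int>(i);
            bestDistance = d;
        }
    }
    return best;
}
```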
If you only need a shift (translation), you can use OpenCV's phaseCorrelate.
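To show what phase correlation does under the hood, here is a minimal 1D sketch in plain C++. It is not OpenCV's implementation: it uses a naive O(n^2) DFT instead of an FFT, works on 1D signals, and only recovers an integer circular shift. The names `dft` and `phaseCorrelateShift` are made up for this example. The idea is that a shift in the signal becomes a pure phase ramp in the spectrum, so the inverse transform of the normalised cross-power spectrum has a single peak at the shift.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>;
using Spectrum = std::vector<std::complex<double>>;

// Naive O(n^2) DFT. sign = -1 gives the forward transform,
// sign = +1 the (unnormalised) inverse transform.
Spectrum dft(const Spectrum& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    Spectrum out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>(k * t) /
                                 static_cast<double>(n);
            out[k] += in[t] * std::polar(1.0, angle);
        }
    }
    return out;
}

// Returns the circular shift s such that b[i] == a[(i - s) mod n].
std::size_t phaseCorrelateShift(const Signal& a, const Signal& b) {
    assert(a.size() == b.size() && !a.empty());
    const std::size_t n = a.size();
    const Spectrum fa = dft(Spectrum(a.begin(), a.end()), -1);
    const Spectrum fb = dft(Spectrum(b.begin(), b.end()), -1);

    // Normalised cross-power spectrum: keep only the phase difference.
    Spectrum cross(n);
    for (std::size_t k = 0; k < n; ++k) {
        const std::complex<double> c = fb[k] * std::conj(fa[k]);
        const double mag = std::abs(c);
        cross[k] = mag > 1e-12 ? c / mag : std::complex<double>(0.0, 0.0);
    }

    // The inverse transform peaks at the shift.
    const Spectrum corr = dft(cross, +1);
    std::size_t best = 0;
    for (std::size_t t = 1; t < n; ++t) {
        if (corr[t].real() > corr[best].real()) best = t;
    }
    return best;
}
```

For images the same steps apply in 2D, and the log-polar trick extends it to rotation and scale, which matches the similarity-transform limitation mentioned above.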

How to make rgbdemo working with non-kinect stereo cameras?

I was trying to get RGBDemo (mostly the reconstructor) working with 2 Logitech stereo cameras, but I could not figure out how to do it.
I noticed that there is an OpenCV grabber in the nestk library and its header file is included in reconstructor.cpp. Yet, when I try "rgbd-viewer --camera-id 0", it keeps looking for a Kinect.
My questions:
1. Does RGBDemo only work with Kinect so far?
2. If RGBDemo can work with non-Kinect stereo cameras, how do I do that?
3. If I need to write my own implementation for non-Kinect stereo cameras, any suggestions on how to start?
Thanks in advance.
If you want to do it with non-Kinect cameras, you don't even need stereo. There are algorithms now that are able to determine whether two images' viewpoints are sufficiently different that they can be used as if they were taken by a stereo camera. In fact, they use images from different cameras that are found on the internet and reconstruct 3D models of famous places. I can write you a tutorial on how to get it working. I've been meaning to do so. The software is called Bundler. Along with Bundler, people often also use CMVS and PMVS. CMVS preprocesses the images for PMVS. PMVS generates dense clouds.
BUT! I highly recommend that you don't go this route. There is so much less information in 2D images that it is very hard to reconstruct the 3D model, so it ends up making a lot of mistakes or not working at all. Although Bundler and PMVS are awesome compared to previous software, the stuff you can do with Kinect is on a whole other level.
Using Kinect will only cost you $80 for the Kinect on eBay or $99 on Amazon, and another $5 for the power adapter on Amazon. So I'd highly recommend this route. Kinect provides much more information for the algorithm to work with than 2D images do, making it much more effective, reliable and fast. In fact, it could take hours to process images with Bundler and PMVS, whereas with Kinect I made a model of my desk in just a few seconds! It truly rocks!