Image Edge Detection in C++

I am trying to find a way to determine the correctness of edge detection. I want it to have little markers showing where the program determines the edges to be with something like x's or dots or lines. I am looking for something that does this: http://en.wikipedia.org/wiki/File:Corner.png

OpenCV has an edge detector and is usable in C++. As it happens, the image you linked to is used in the article describing (one of) the built-in algorithms.
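For reference, a minimal sketch of running OpenCV's Canny edge detector from C++ (the input file name and the thresholds are placeholders):

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE); // assumed input image
    if (img.empty()) return 1;

    cv::Mat edges;
    cv::Canny(img, edges, 50, 150);   // low/high hysteresis thresholds
    cv::imwrite("edges.png", edges);  // white pixels mark the detected edges
    return 0;
}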

The image you link to isn't edge detection.
Edge detection is normally just finding abrupt brightness changes in a greyscale image - you do this with differentiation - e.g. the Sobel operator.
Finding corners specifically is done either with SIFT or with something like the Laplacian of Gaussian.
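As an illustration of the differentiation approach, here is a short sketch computing the Sobel gradient magnitude with OpenCV (the function name and parameter values are just for illustration):

#include <opencv2/opencv.hpp>

// Return an 8-bit image of the Sobel gradient magnitude of a grayscale input:
// bright pixels correspond to abrupt brightness changes, i.e. edges.
cv::Mat sobelEdges(const cv::Mat& gray)
{
    cv::Mat gx, gy, mag;
    cv::Sobel(gray, gx, CV_32F, 1, 0);   // horizontal derivative
    cv::Sobel(gray, gy, CV_32F, 0, 1);   // vertical derivative
    cv::magnitude(gx, gy, mag);          // gradient magnitude

    cv::normalize(mag, mag, 0, 255, cv::NORM_MINMAX);
    mag.convertTo(mag, CV_8U);
    return mag;
}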

That image is not the result of an edge detection operation! It's corner detection. They have entirely different purposes:
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image matching, tracking, image mosaicing, panorama stitching, 3D modelling and object recognition. Corner detection overlaps with the topic of interest point detection.
OpenCV has corner detection algorithms. The latter link includes a source code example for VS 2008. You can also check this link for another example. Google can provide much more.
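Since the original question asked for little markers at the detected points, here is a minimal sketch using OpenCV's goodFeaturesToTrack (Shi-Tomasi corners) that draws a dot at each detected corner; the file names and parameter values are illustrative:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat img = cv::imread("input.png");   // assumed input image
    if (img.empty()) return 1;

    cv::Mat gray;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);

    std::vector<cv::Point2f> corners;
    cv::goodFeaturesToTrack(gray, corners, 100, 0.01, 10); // up to 100 corners

    for (const auto& c : corners)
        cv::circle(img, c, 3, cv::Scalar(0, 0, 255), -1);  // red dot at each corner

    cv::imwrite("corners.png", img);
    return 0;
}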

Related

automatically getting edge detection for image alignment

I am trying to do image alignment like the one posted on Adrian's blog, like this image or in this link.
I want to do image alignment on this kind of image. The problem is that I want to automatically detect the four edge points, which are hard to detect in this kind of image with contour detection as in the tutorial.
Right now I can do the alignment just fine with manually entered edge coordinates. Some of my friends suggested detecting the edges with dlib landmark detection, but as far as I can see that mostly works on shapes for which dlib marks the landmarks automatically.
Am I missing something here? Or is there any tutorial, or even a basic guide, on how to do that?
Maybe you can try to detect edges on a Gaussian pyramid. You can find an explanation here: https://en.wikipedia.org/wiki/Pyramid_(image_processing). The basic idea is that by filtering with Gaussian filters of increasing size, the small objects are blurred away. Thus, at some scale, we get only the edges of the showcase (which may need further processing).
Here is the tutorial of opencv on image pyramid: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_pyramids/py_pyramids.html.
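A minimal sketch of that idea in code, assuming a grayscale input and two pyramid levels (both assumptions to adjust for your images):

#include <opencv2/opencv.hpp>

// Downsample the image `levels` times with a Gaussian pyramid, then run Canny.
// Each pyrDown blurs and halves the image, so small details vanish and only
// the edges of large structures (e.g. the showcase) stay strong.
cv::Mat pyramidEdges(const cv::Mat& gray, int levels = 2)
{
    cv::Mat level = gray.clone();
    for (int i = 0; i < levels; ++i) {
        cv::Mat down;
        cv::pyrDown(level, down);
        level = down;
    }

    cv::Mat edges;
    cv::Canny(level, edges, 50, 150);   // thresholds are illustrative
    return edges;
}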
I think a wavelet pyramid (applying the wavelet transform several times) may also work for your problem, since wavelets can reduce the detail in an image.

In computer vision, what does MVS do that SFM can't?

I'm a dev with about a decade of enterprise software engineering under his belt, and my hobbyist interests have steered me into the vast and scary realm of computer vision (CV).
One thing that is not immediately clear to me is the division of labor between Structure from Motion (SFM) tools and Multi View Stereo (MVS) tools.
Specifically, CMVS appears to be the best-in-show MVS tool, and Bundler seems to be one of the better open source SFM tools out there.
Taken from CMVS's own homepage:
You should ALWAYS use CMVS after Bundler and before PMVS2
I'm wondering: why?!? My understanding of SFM tools is that they perform the 3D reconstruction for you, so why do we need MVS tools in the first place? What value/processing/features do they add that SFM tools like Bundler can't address? Why the proposed pipeline of:
Bundler -> CMVS -> PMVS2
?
Quickly put, Structure from Motion (SfM) and MultiView Stereo (MVS) techniques are complementary, as they do not deal with the same assumptions. They also differ slightly in their inputs, MVS requiring camera parameters to run, which are estimated (output) by SfM. SfM only gives a coarse 3D output, whereas PMVS2 gives a denser output, and finally CMVS is there to circumvent some limitations of PMVS2.
The rest of the answer provides a high-level overview of how each method works and explains why it is this way.
Structure from Motion
The first step of the 3D reconstruction pipeline you highlighted is an SfM algorithm that could be done using Bundler, VisualSFM, OpenMVG or the like. This algorithm takes some images as input and outputs the camera parameters of each image (more on this later) as well as a coarse 3D shape of the scene, often called the sparse reconstruction.
Why does SfM output only a coarse 3D shape? Basically, SfM techniques begin by detecting 2D features in every input image and matching those features between pairs of images. The goal is, for example, to tell "this table corner is located at these pixel locations in those images." Those features are described by what we call descriptors (like SIFT or ORB). Those descriptors are built to represent a small region (i.e. a bunch of neighboring pixels) in an image. They can reliably represent highly textured or rough geometries (e.g., edges), but these scene features need to be unique (in the sense of being distinguishable) throughout the scene to be useful. For example (maybe oversimplified), a wall with a repetitive pattern would not be very useful for the reconstruction, because even though it is highly textured, every region of the wall could potentially match pretty much anywhere else on the wall. Since SfM performs the 3D reconstruction using those features, the vertices of the 3D scene reconstruction will be located on those unique textures or edges, giving a coarse mesh as output. SfM won't typically produce a vertex in the middle of a surface that lacks precise and distinguishing texture. But when many matches are found between the images, one can compute a 3D transformation matrix between the images, effectively giving the relative 3D position between the two camera poses.
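As a small illustration of this detect-and-match step (not what Bundler does internally, just the general idea, using OpenCV's ORB features; the function name and parameters are illustrative):

#include <opencv2/opencv.hpp>

// Detect ORB keypoints in two images and match their descriptors.
// Each returned match says "this pixel in img1 corresponds to that pixel in img2",
// which is the kind of correspondence SfM triangulates into sparse 3D points.
std::vector<cv::DMatch> matchFeatures(const cv::Mat& img1, const cv::Mat& img2,
                                      std::vector<cv::KeyPoint>& kp1,
                                      std::vector<cv::KeyPoint>& kp2)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);   // up to 2000 features per image
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Brute-force matching with Hamming distance (ORB descriptors are binary);
    // cross-checking keeps only mutually best matches.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    return matches;
}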
MultiView Stereo
Afterwards, the MVS algorithm is used to refine the mesh obtained by the SfM technique, resulting in what is called a dense reconstruction. This algorithm requires the camera parameters of each image to work, which are output by the SfM algorithm. As it works on a more constrained problem (since it already has the camera parameters of every image, such as position, rotation, focal length, etc.), MVS will compute 3D vertices in regions which were not (or could not be) correctly detected by descriptors or matched. This is what PMVS2 does.
How can PMVS work in regions where 2D feature descriptors would struggle to match? Since you know the camera parameters, you know that a given pixel in one image is the projection of a line in another image. This is called epipolar geometry. Whereas SfM had to search through the entire 2D image for every descriptor to find a potential match, MVS only has to search along a single 1D line to find matches, which simplifies the problem a great deal. In addition, MVS usually takes illumination and object materials into account in its optimization, which SfM does not.
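A small sketch of that epipolar constraint with OpenCV: given the fundamental matrix F relating two views (the values below are placeholders; in practice F comes from the estimated camera parameters or cv::findFundamentalMat), each pixel in image 1 maps to a line in image 2 on which its match must lie:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Fundamental matrix between image 1 and image 2 (placeholder values).
    cv::Mat F = (cv::Mat_<double>(3, 3) <<
         0.0,  -1e-5,  0.01,
         1e-5,  0.0,  -0.02,
        -0.01,  0.02,  1.0);

    // A pixel in image 1 whose correspondence we want to constrain in image 2.
    std::vector<cv::Point2f> pts1 = { cv::Point2f(320.f, 240.f) };

    // Each output line is (a, b, c) with a*x + b*y + c = 0 in image 2:
    // instead of searching the whole image, the match is sought along this line.
    std::vector<cv::Vec3f> lines;
    cv::computeCorrespondEpilines(pts1, 1, F, lines);

    std::cout << "Epipolar line in image 2: " << lines[0][0] << "x + "
              << lines[0][1] << "y + " << lines[0][2] << " = 0\n";
    return 0;
}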
There is one issue, though: PMVS2 performs a quite complex optimization that can be dreadfully slow or take an astronomical amount of memory on large image sequences. This is where CMVS comes into play: it clusters the coarse 3D SfM output into regions. PMVS2 can then be called (potentially in parallel) on each cluster, simplifying its execution. CMVS then merges each PMVS2 output into a unified detailed model.
Conclusion
Most of the information provided in this answer, and much more, can be found in this tutorial from Yasutaka Furukawa, author of CMVS and PMVS2:
http://www.cse.wustl.edu/~furukawa/papers/fnt_mvs.pdf
In essence, the two techniques emerge from different approaches: SfM aims to perform a 3D reconstruction from a structured (but unknown) sequence of images, while MVS is a generalization of two-view stereo vision, based on human stereopsis.

Generate an image that can be most easily detected by Computer Vision algorithms

I'm working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy - objects can be hard to find, especially if the features of the target object aren't distinctive.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thinking went into things like QR codes, but with the limitation that QR codes also need to be simple and small.
So my question for you: how would you generate an optimal image for later recognition by a camera? What if you already know that certain problems, like skew or partial occlusion, will occur?
Thanks very much
I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And here is a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be specific to the feature detector used. Common practice in detector design is to aim for a good response to "corners" or regions with high x,y gradients. You should also take the scaling of the target into account.
The simplest detection can be performed with blobs. It can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
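A sketch of that blob-based approach using OpenCV's SimpleBlobDetector, tuned here (as an example) for roughly circular blobs; the parameter values are illustrative, not tuned:

#include <opencv2/opencv.hpp>

// Detect roughly circular blobs in a grayscale image and return their keypoints
// (center and size).
std::vector<cv::KeyPoint> detectCircularBlobs(const cv::Mat& gray)
{
    cv::SimpleBlobDetector::Params params;
    params.filterByArea = true;
    params.minArea = 100;             // ignore tiny specks
    params.filterByCircularity = true;
    params.minCircularity = 0.8f;     // keep only roughly circular shapes

    cv::Ptr<cv::SimpleBlobDetector> detector = cv::SimpleBlobDetector::create(params);
    std::vector<cv::KeyPoint> blobs;
    detector->detect(gray, blobs);
    return blobs;
}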
Depending on the distance you want to see your markers from, the viewing conditions/backgrounds you typically have, and the camera resolution/noise, you should choose different images/targets. Under moderate perspective and from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easy to track using homography, as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
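As a sketch of the homography idea from that tutorial: given matched 2D points between a reference image of the flat target and the camera frame (obtained by feature matching, which is assumed to have been done already), the target's outline can be projected into the frame. The helper below is hypothetical, not part of OpenCV:

#include <opencv2/opencv.hpp>

// objPts/scenePts: matched points in the reference image and in the camera frame.
// objSize: size of the reference image of the flat target.
cv::Mat locateFlatObject(const std::vector<cv::Point2f>& objPts,
                         const std::vector<cv::Point2f>& scenePts,
                         const cv::Size& objSize,
                         std::vector<cv::Point2f>& sceneCorners)
{
    // Robustly estimate the homography mapping the reference image into the scene.
    cv::Mat H = cv::findHomography(objPts, scenePts, cv::RANSAC, 3.0);

    // Project the four corners of the reference image into the scene.
    std::vector<cv::Point2f> objCorners = {
        {0.f, 0.f},
        {(float)objSize.width, 0.f},
        {(float)objSize.width, (float)objSize.height},
        {0.f, (float)objSize.height}};
    cv::perspectiveTransform(objCorners, sceneCorners, H);
    return H;
}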
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device - a projector - generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the chessboard and PnP functions of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
std::vector<cv::Point2f> corners;
bool found = cv::findChessboardCorners(src, chessboardSize, corners);
cv::drawChessboardCorners(dst, chessboardSize, corners, found);
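If you also want the pose of the board (the PnP step mentioned above), here is a sketch under the assumption that the camera matrix, distortion coefficients and the board's square size are already known; the helper name is illustrative:

#include <opencv2/opencv.hpp>

// Estimate the chessboard pose once its corners have been found (see above).
bool estimateBoardPose(const std::vector<cv::Point2f>& corners, cv::Size boardSize,
                       float squareSize, const cv::Mat& cameraMatrix,
                       const cv::Mat& distCoeffs, cv::Mat& rvec, cv::Mat& tvec)
{
    // 3D coordinates of the board corners in the board's own coordinate frame.
    std::vector<cv::Point3f> objectPoints;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            objectPoints.emplace_back(x * squareSize, y * squareSize, 0.f);

    // Recover rotation (rvec) and translation (tvec) of the board w.r.t. the camera.
    return cv::solvePnP(objectPoints, corners, cameraMatrix, distCoeffs, rvec, tvec);
}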
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!

Is Face Detection needed before doing annotation?

I need to annotate frontal (or near-frontal) images using OpenCV. I'm currently going through the OpenCV manual and the book "Mastering OpenCV". This is the first time I'm using OpenCV, and because of that I'm a little confused about annotation and face detection.
I need to mark about 25 points on the human face. The required points are in the eyes, mouth, nose and ears. My question is:
Is it necessary to detect the face first, and then the eyes, eyebrows, mouth, nose and ears? Is it the case that only then can I proceed with annotation? The reason I'm asking is that I'll be doing the annotation manually, so obviously I can see where the face is, and then the eyes, nose, etc. I don't see the point of detecting the face first.
Can someone explain whether face detection is really needed in this case?
According to the book "Mastering OpenCV", I need to do the following step by step:
(1) Loading Haar Detector for face Detection
(2) Grayscale colour conversion
(3) Shrinking the image
(4) Histogram Equalization
(5) Detecting the face
(6) Face preprocessing to detect eyes, mouth, nose etc.
(7) Annotation
Face detection allows a computer algorithm to search an image much, much faster for features like the eyes and mouth.
If you are annotating the image yourself then it is of course much quicker just to annotate the wanted features and ignore unwanted ones.
No, you don't need to annotate landmarks for face detection. OpenCV provides functions to detect faces using already-trained models (Haar cascade classifiers), shipped with the OpenCV package as XML files; you just need to call them as explained here.
Annotation of images with predefined landmarks is used to detect facial expressions and some facial details, such as estimating the head pose in space; for these purposes AAM and ASM models are used.
Also, annotating images is a step used to train a model; for that you can use one of the many universal annotated databases available on the internet, whereas your test images don't need to be annotated.
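A minimal sketch of that cascade-based face detection (the cascade file path and input image are placeholders; the XML files ship with OpenCV):

#include <opencv2/opencv.hpp>

int main()
{
    cv::CascadeClassifier faceCascade("haarcascade_frontalface_alt.xml"); // pretrained cascade
    cv::Mat img = cv::imread("photo.jpg");                                // assumed input image
    if (faceCascade.empty() || img.empty()) return 1;

    // Grayscale conversion and histogram equalization before detection.
    cv::Mat gray;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(gray, faces, 1.1, 3);

    for (const auto& f : faces)
        cv::rectangle(img, f, cv::Scalar(0, 255, 0), 2); // draw a box around each face
    cv::imwrite("faces.png", img);
    return 0;
}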

Facial Feature Points Detection using OpenCV

I want to detect the points on a face as shown in the picture
I am using OpenCV CascadeClassifier::detectMultiScale.
I am using the haarcascade_frontalface_alt, haarcascade_eye, haarcascade_mcs_mouth xml files.
I am satisfied with the face detection but not with the facial feature points detection.
I want the feature point detection to work for images up to a distance of 8 feet.
I am looking for more accuracy and robustness with respect to pose (15 degrees) and mouth opening, without compromising speed.
I am looking for a speed of 25 fps on an i5 processor.
Can anyone suggest/refer me to any libraries or open-source code for my problem?
C++ platform.
Try flandmark: http://cmp.felk.cvut.cz/~uricamic/flandmark/
It extracts 7 feature points, but you will not be able to get feature points for the upper lip and the lower lip.
You can try using an ASM mesh to fit the face. There are several implementations that use ASM/AAM.
https://code.google.com/p/asmlib-opencv/ is an open source library which has a built-in dataset of face images. Do look into it.
Cheers