What's the best way for complex shape matching

What's the best way for complex shape matching - c++

I need to know what's the best way to match certain shape (template) in the image.
I know there is several ways, but some of them did not lead to a very good results and the another need a lot of process time, so anyone tried a good and fast way to do the matching with short process time.
For example this is the template...
And I have a sample and I want to compare the sample with the template and return true if the sample is similar to the template else return false.
Note: I tried contour matching, Cascade Classification, and SURF, but all of them is not very good or the process time is not so good.

Matching things with eachother can be a rather difficult task, mainly due to the fact that different techniques have very different characteristics and can yield almost perfect results on some categories and very bad results on others.
This said, I don't think you'll ever get an answer to your question, at least not one that says "Use xyx method from [cited paper], that will solve all your problems". I'll try to point out some examples for you hoping that it'll help.
Template matching operator: compare a template with a sliding window on your image, can achieve very good results if your template is very similar to the object you are looking for in the image, no matter how complex it is. Can be very fast, it's not invariant to basically anything, so if you plan to have rotations, significant changes in lighting or something else, this is probably not going to work for you. here you can find out some code. Watch out which color space are you using, different color spaces can achieve very different results if used right (e.g. for face analysis HSV can be better that RGB in some cases)
Keypoint matching like SIFT or SURF: I used this a lot with very good results. You'll need to decide what descriptor to use and what matcher. OpenCV has some nice examples,here you can find one. Not going to be the fastest way to match your object since these descriptors can take some time to be extracted, it's good if you don't know much about the conditions you'll be working in though: it's usually robust to scale, rotation and lightning changes as long that keypoints can be correctly found on both the template and the image.
Shape matching: I was rather surprised when, in an image classification competition i participated in, I had been able to use a simple HOG descriptor to obtain very discriminating information about my images. Histograms of Oriented Gradients are a rather powerful tool for describing the shape of an object, it uses edge orientation and magnitude to describe your image. They can be fast to compute (OpenCV has a a GPU implementation I think), configurable (you can decide how thick your grid can be and how many cells, resulting in very different information). HOGs are not invariant to rotation, seen the object from a different angle will likely produce a different histogram, but they are very robust to lighting changes due to the fact that doesn't use color.
HOGs are just an example, there are a lot of shape and contour descriptors but basically they offer pretty much the same I think.
Histogram matching: not my first choice, it can be useful if you know something about the object and the rest of image. For example, if you know you are looking for your pink flower in a jungle image where it's the only pink thing there, a simple color histogram matching will do just fine. Pick up a sliding window, run it over your image, compare your histograms and you'll be done. Very fast, very simple, it doesn't use the shape at all so no matter how complex your object is, you'll find it. Not using shape makes it robust to rotations, watch out for lighting changes though. A very big limitations of this method is that if there are other pink things in your jungle you won't be able to distinguish.
Hybrid approaches: here is where you can get the best out of the techniques cited above. As you have seen, most of them work well in a certain environment and quite bad in others. You can use a combination of the techniques you know and obtain something much better than the sum of the parts. I worked a lot with HOGs and head pose estimation and a real breakthrough came when we started extracting HOGs not in a dense way but around certain keypoints. You'll need to know your problem, find out what do you need and adapt a bunch of methods to it. In general, hybrid methods can work a lot better and a lot slower.
Hope this helps you a bit, I don't think that, given the information you gave us, I could give you a much better answer..(probably someone else can, that's why I'm still a student :) )

Related

Im trying to use this method to detect moving object. Can someone advise me for this?

I want to ask about what kind of problems there be if i use this method to extract foreground.
The condition before using this method is that it runs on fixed camera so there're not going to be any movement on camera position.
And what im trying to do is below.
read one frame from camera and set this frame as background image. This is done periodically.
periodically subtract frames that are read afterward to background image above. Then there will be only moving things colored differently from other area
that are same to background image.
then isolate moving object by using grayscale, binarization, thresholding.
iterate above 4 processes.
If i do this, would probability of successfully detect moving object be high? If not... could you tell me why?

If you consider illumination change(gradually or suddenly) in scene, you will see that your method does not work.
There are more robust solutions for these problems. One of these(maybe the best) is Gaussian Mixture Model applied for background subtraction.
You can use BackgroundSubtractorMOG2 (implementation of GMM) in OpenCV library.

Your scheme is quite adequate to cases where the camera is fix and the background is stationary. Indoor and man-controlled scenes are more appropriate to this approach than outdoor and natural scenes .I've contributed to a detection system that worked basically on the same principals you suggested. But of course the details are crucial. A few remarks based on my experience
Your initialization step can cause very slow convergence to a normal state. You set the background to the first frames, and then pieces of background coming behind moving objects will be considered as objects. A better approach is to take the median of N first frames.
Simple subtraction may not be enough in cases of changing light condition etc. You may find a similarity criterion better for your application.
simple thresholding on the difference image may not be enough. A simple approach is to dilate the foreground for the sake of not updating the background on pixels that where accidentally identified as such.
Your step 4 is unclear, I assumed that you mean that you update the foreground only on those places that are identified as background on the last frame. Note that with such a simple approach, pixels that are actually background may be stuck forever with a "foreground" labeling, as you don't update the background under them. There are many possible solutions to this.

There are many ways to solve this problem, and it will really depend on the input images as to which method will be the most appropriate. It may be worth doing some reading on the topic
The method you are suggesting may work, but it's a slightly non-standard approach to this problem. My main concern would be that subtracting several images from the background could lead to saturation and then you may lose some detail of the motion. It may be better to take difference between consecutive images, and then apply the binarization / thresholding to these images.
Another (more complex) approach which has worked for me in the past is to take subregions of the image and then cross-correlate with the new image. The peak in this correlation can be used to identify the direction of movement - it's a useful approach if more then one thing is moving.
It may also be possible to use a combination of the two approaches above for example.
Subtract second image from the first background.
Threshold etc to find the ROI where movement is occurring
Use a pattern matching approach to track subsequent movement focussed on the ROI detected above.
The best approach will depend on you application but there are lots of papers on this topic

image difference algorithm based on visual perception

I have tried searching google for this for quite some time and have had no luck find what I need. What I am trying to do is I have a few different versions of the same image these versions being derived from the original by dithering them to a palette. The image is split into 8x8 tiles that use one of n palettes. I want to pick which palette looks best. There are a few questions on stackoverflow relating to image difference however they suggest a simple algorithm
Σ(R1n-R2n)^2+(G1n-G2n)^2+(B1n-B2n)^2 and picking the lowest sum. Some suggest that to improve this a better delta algorithm can be used, I have tried CIEDE2000 still found the results unsatisfactory and in many instances I can manually pick a better tile. Note that I did do a bit of verification of my implementation of CIEDE2000 using precomputed data and used this library for http://www.getreuer.info/home/colorspace rgb to CIELab so I doubt that the reason I am a bit disappointed with the results is due to a bug and most of the time it does an okay job. What I want to do is implement a better algorithm than just the delta to pick the best tile. Does anyone know of a good algorithm for this (inclusive) or better terminology so that when I search for this I find more relevant information?

Generate an image that can be most easily detected by Computer Vision algorithms

Working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It lead me to an interesting question:
Using feature detection to find known objects in an image isn't always easy- objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much

I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And papeer about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf

If you plan to use feature detection, than marker should be specific to used feature detector. Common practice for detector design is good response to "corners" or regions with high x,y gradients. Also you should note the scaling of target.
The simplest detection can be performed with BLOBS. It can be faster and more robust than feature points. For example you can detect circular blobs or rectangular.

Depending on the distance you want to see your markers from and viewing conditions/backgrounds you typically use and camera resolution/noise you should choose different images/targets. Under moderate perspective from a longer distance a color target is pretty unique, see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
at close distances various bar/QR codes may be a good choice. Other than that any flat textured object will be easy to track using homography as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
then comes the whole field of hardware, structured light, synchronized markers, etc, etc. Kinect, for example, uses a predefined pattern projected on the surface to do stereo. This means it recognizes and matches million of micro patterns per second creating a depth map from the matched correspondences. Note that one camera sees the pattern and while another device - a projector generates it working as a virtual camera, see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use pNp function of open cv:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
this literally can be done by calling just two functions
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessCornersDots(dst, chessboardSize, corners, found);
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing condition, camera specs, backgrounds, distances, amount of motion and perspective you expect to have indoors vs outdoors, etc. There is no such a thing as a general average case in computer vision!

remove gradient of a image without a comparison image

currently i am having much difficulty thinking of a good method of removing the gradient from a image i received.
The image is a picture taken by a microscope camera that has a light glare in the middle. The image has a pattern that goes throughout the image. However i am supposed to remove the light glare on the image created by the camera light.
Unfortunately due to the nature of the camera it is not possible to take a picture on black background with the light to find the gradient distribution. Nor do i have a comparison image that is without the gradient. (note- the location of the light glare will always be consistant when the picture is taken)
In easier terms its like having a photo with a flash in it but i want to get rid of the flash. The only problem is i have no way to obtaining the image without flash to compare to or even obtaining a black image with just the flash on it.
My current thought is conduct edge detection and obtain samples in specific locations away from the edges (due to color difference) and use that to gauge the distribution of gradient since those areas are supposed to have relatively identical colors. However i was wondering if there was a easier and better way to do this.
If needed i will post a example of the image later.
At the moment i have a preferrence of solving this in c++ using opencv if that makes it easier.
thanks in advance for any possible ideas for this problem. If there is another link, tutorial, or post that may solve my problem i would greatly appreciate the post.
as you can tell there is a light thats being shinned on the img as you can tell from the white spot. and the top is lighter than the bottome due to the light the color inside the oval is actually different when the picture is taken in color. However the color between the box and the oval should be consistant. My original idea was to perhaps sample only those areas some how and build a profile that i can utilize to remove the light but i am unsure how effective that would be or if there is a better way
EDIT :
Well i tried out Roger's suggestion and the results were suprisngly good. Using 110 kernel gaussian blurr to find illumination and conducting CLAHE on top of that. (both done in opencv)
However my colleage told me that the image doesn't look perfectly uniform and pointed out that around the area where the light used to be is slightly brighter. He suggested trying a selective gaussian blur where the areas above certain threshold pixel values are not blurred while the rest of the image is blurred.
Does anyone have opinions regarding this and perhaps a link, tutorial, or an example of something like this being done? Most of the things i find tend to be selective blur for programs like photoshop and gimp
EDIT2 :
it is difficult to tell with just eyes but i believe i have achieved relatively close uniformization by using a simple plane fitting algorithm.((-A * x - B * y) / C) (x,y,z) where z is the pixel value. I think that this can be improved by utilizing perhaps a sine fitting function? i am unsure. But I am relatively happy with the results. Many thanks to Roger for the great ideas.
I believe using a bunch of pictures and getting the avg would've been another good method (suggested by roger) but Unofruntely i was not able to implement this since i was not supplied with various pictures and the machine is under modification so i was unable to use it.

I have done some work in this area previously and found that a large Gaussian blur kernel can produce a reasonable approximation to the background illumination. I will try to get something working on your example image but, in the meantime, here is an example of your image after Gaussian blur with radius 50 pixels, which may help you decide if it's worth progressing.
UPDATE
Just playing with this image, you can actually get a reasonable improvement using adaptive histogram equalisation (I used CLAHE) - see comparison below - any use?
I will update this answer with more details as I progress.

I would like to point you to this paper: http://www.cs.berkeley.edu/~ravir/dirtylens.pdf, but, in my opinion, without any sort of calibration/comparison image taken apriori, it is difficult to mine out the ground truth from the flared image.
However, if you are trying to just present the image minus the lens flare, disregarding the actual scientific data behind the flared part, then you switch into the domain of image inpainting. Criminsi's algorithm, as described in this paper: http://research.microsoft.com/pubs/67276/criminisi_tip2004.pdf and explained/simplified in these two links: http://cs.brown.edu/courses/csci1950-g/results/final/eboswort/ http://www.cc.gatech.edu/~sooraj/inpainting/, will do a very good job in restoring texture information to the flared up regions. (If you'd really like to pursue this approach, do mention that. More comprehensive help can be provided for this).
However, given the fact that we're dealing with microscopic data, I doubt if you'd like to lose the scientific data contained in a particular region of an image. In that case, I really think you need to find a workaround to determine the flare model of the flash/light source w.r.t the lens you're using.
I hope someone else can shed more light on this.

Possibility of creating a software that can recognize context of an image?

I raised this question due to curiousity while using Google Goggle and Google's "Search by Image".
If you try giving Google an image to search, it can show you some results. Identical images work best (of course), but taken photo of various objects could be difficult.
I guess Google Goggle has workaround a bit by using text recognition and image matching recognition. If text recognition found the text, for instance, "SONY", then things might get simpler. If a brand's image is detected, then things should be simpler as well. The same goes with other famous brand and famous landmark, such as an Eiffel Tower. Having text and brand's image could help recognize things easily.
But if we are to search for something more obscure (need a better wording here), for instance, take this ramen image.
If you put this image into Google, you will get images of various other images that have similar colors and sometimes similar shape. Heck, there are other ramen images in the result, but I think it would be better if these ramen images are up in the top, since we input a ramen image, and our context here is ramen.
So here is my question, will it be possible to create such a software that can understand the context of the image? How can we express the context in the software?

Man, you just pointet out the very reason why so much people work on computer vision.
Is is quite easy to mathematically describe objects. Color, shape, density, . . .
All those can be calculated easily.
But computer vision becomes very complex when talking about "real life objects".
Angle, luminosity, and simply non consistency make it really almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself : what makes the object I want to recognize unique ?
What descriptor can I use that no other object possess ?
Ask yourself the question for theses ramen. Let's say I simply want to detect ramens.
What if the color of the soup changes? What if the meat is bigger ?
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can find the solution to this kind of problems in a generic way, you can register for the nobel price I think :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it, even Google's image search algorithm sucks when you feed it with ramen.
It is pretty efficient with sudoku though, as he knows exactly what he is searching for.
All the difference is made in training, where you give a list of assumptions to help the algorithm.
So basically you got it. either you create a really nice computer vision system good at detecting one thing based on a lot of assumptions, or an "ok" but quite generic one :).
The choice mostly depends on your application

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js