Image Correspondence - Matching regions of images - computer-vision

I have two images with the same content, but they might differ in scale or rotation. The problem is that I have to find the regions of these images and match them with one another. For example, if I have a circle in image1, I have to find the corresponding circle in image2.
I'd just like to ask what the proper way of solving this is. I am looking at OpenCV's matchShapes. I believe this problem is image correspondence, but I really have no idea how to solve it!
Thanks in advance!
I have the following images:
Template Image => https://lh6.googleusercontent.com/-q5qeExXUlpc/T7SbL9yWmCI/AAAAAAAAByg/gV_vM1kyLnU/w348-h260-n-k/1.labeled.jpg
Sample Image => https://lh4.googleusercontent.com/-x0IWxV7JdbI/T7SbNjG5czI/AAAAAAAAByw/WSu-y5O7ee4/w348-h260-n-k/2.labeled.jpg
Note that the numbers on the images indicate the proper matching of regions. They are not present when comparing the images.

As usual with computer vision problems, you can never provide too much information or make too many assumptions about the data you intend to analyze. Solving the general problem is close to impossible, as we can't do human-level pattern recognition with computers. What does your problem set look like? A few examples would be very helpful in trying to provide good answers.
You mention that the images have the same content but different colors. If that means it's the same scene photographed under different lighting conditions and from possibly different angles, you might need to do a rigid image registration first, so that the feature points in the two images overlap. If the shapes in your images can be distorted in several ways relative to each other, you might be interested in non-rigid image registration.
If you already know the objects you are looking for, you can simply do a search for these objects in both images, for example with chamfer matching or any other matching algorithm.

Use the ORB feature detector from OpenCV. Once you have the descriptors, use BFMatcher with norm type NORM_HAMMING.
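A minimal sketch of that pipeline, assuming OpenCV 3+ (the file names are placeholders for your two images):

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Load the two images to compare (placeholder file names).
    cv::Mat img1 = cv::imread("template.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("sample.jpg", cv::IMREAD_GRAYSCALE);

    // Detect ORB keypoints and compute their binary descriptors.
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Brute-force matching with Hamming distance; cross-checking filters weak matches.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // Draw the matched keypoint pairs for inspection.
    cv::Mat vis;
    cv::drawMatches(img1, kp1, img2, kp2, matches, vis);
    cv::imwrite("matches.jpg", vis);
    return 0;
}

ORB is scale and rotation invariant to a useful degree, which matches the scale/rotation differences described in the question.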

Related

Object detection using a single almost identical reference image

I would like to understand what solutions there are to perform object detection using a single almost identical reference image on a picture or in an augmented reality setting.
To be more specific: I want to detect flat (i.e. 2-dimensional) and mostly rectangular objects. I have a database with "perfect" reference images (high quality, full frontal, exact colors, no alterations, etc.) of the objects to be detected but I may have only one reference for each object.
I am talking about things such as logos, famous paintings and playing cards so the reference will have exactly the same content, shape and proportions as the object. From my understanding, the only difference between the object and the reference could then be perspective and a difference in lighting conditions. Let's assume none of these are very extreme (e.g. no sharp angle or colored light).
I know that image recognition and object detection usually requires many training images but given these simplified conditions, is there a way to make it work with one or few images (or create several by transforming one)?
I looked here and elsewhere and the only thing I found so far was this example of the Vuforia SDK: https://www.youtube.com/watch?v=MtiUx_szKbI&t=1m10s. One image of a card in a card game is apparently enough to create an overlay so I assume there are ways. This is not my field of expertise so I hope you guys can help me out :)
If there were no perspective distortion, you could use simple normalized cross-correlation. But since there is, you probably want to use SURF. The basic algorithm to use SURF to find your reference image within a world image is:
find keypoints, such as corners, in both images.
describe the local texture of each keypoint.
use those descriptors to match keypoints between images. If there are a lot of matches, with consistent geometry, you've probably found your object.
Check out this tutorial, which walks you through doing exactly that: http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html#feature-homography
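A rough sketch of those three steps plus a geometric-consistency check, assuming OpenCV 3+ built with the xfeatures2d contrib module (SURF is patented and not always available; ORB or AKAZE drop into the same flow). File names are placeholders:

#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>   // SURF lives in the contrib modules
#include <vector>

int main() {
    cv::Mat ref = cv::imread("reference.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat scene = cv::imread("scene.jpg", cv::IMREAD_GRAYSCALE);

    // Steps 1 and 2: find keypoints and describe the local texture around each one.
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);
    std::vector<cv::KeyPoint> kpRef, kpScene;
    cv::Mat descRef, descScene;
    surf->detectAndCompute(ref, cv::noArray(), kpRef, descRef);
    surf->detectAndCompute(scene, cv::noArray(), kpScene, descScene);

    // Step 3: match descriptors (SURF descriptors are float vectors, so use L2 distance).
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(descRef, descScene, matches);

    // Geometric consistency: fit a homography with RANSAC and count the inliers.
    std::vector<cv::Point2f> ptsRef, ptsScene;
    for (const cv::DMatch& m : matches) {
        ptsRef.push_back(kpRef[m.queryIdx].pt);
        ptsScene.push_back(kpScene[m.trainIdx].pt);
    }
    cv::Mat inlierMask;
    cv::Mat H = cv::findHomography(ptsRef, ptsScene, cv::RANSAC, 3.0, inlierMask);
    // Many inliers in inlierMask means the reference object is probably present in the scene,
    // and H tells you where (warp the reference corners with cv::perspectiveTransform).
    return 0;
}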

Extracting the descriptors of a bunch of sample images and training an SVM in OpenCV

I have four types of musical note symbols, all the same color: whole note, half note, crotchet and quaver. I need to classify an image and tell whether it contains one of these symbols (just one for now) and which one. For example, if I have an image with just the musical staff (but nothing else in it), it should tell me that the image is empty, but if I have an image with a half note symbol in it, it should tell me something like "it is a half note".
Suppose I have 20 sample images for each possible symbol and 20 for the base case (nothing in them), and I want to train an SVM to classify any input image. I've read about how I could do it, but I still have certain doubts. I think the process is something like this (please correct me if I'm wrong):
extract the descriptors of all the sample images.
put those descriptors inside different Mat objects (one for each symbol).
feed those Mats to the SVM to train it.
Use the SVM to classify the images.
I have specific doubts about what I think the process is:
Is what I described the correct process for what I need to do?
Should I pre-process the sample images (say, extract the background and apply Canny edges) before I feed them to the descriptor extractor, or can I leave them as they are?
I have read about three methods of extracting the descriptors: HOG, BOW (Bag of Words) and SIFT. I think they all do what I need, but I don't know which one to use. I see that HOG is mostly (if not always) used for face and pedestrian detection, and I don't know if it could be used for my case. Any advice on which one I should use?
How many sample images should I have for each case?
I don't need specific implementation details, but I do need answers to these questions. Thank you in advance.
I'm not an expert on SIFT and BOW, but I know something about HOG and SVMs.
1. Is what I described the correct process for what I need to do?
If you are using OpenCV and HOG, no, that is not correct. Have a look at the HOG sample code in the OpenCV samples and you will find that, once extracted, the descriptors are fed directly to the SVM without filling a Mat element.
2. Should I pre-process the sample images (say, extract the background and apply Canny edges) before I feed them to the descriptor extractor, or can I leave them as they are?
This is not mandatory. Preprocessing has proven to be very useful, but for your simple case you won't need it. On the other hand, if your background presents drawings, stickers or anything else that can confuse the detector, then yes, it can be a good way to decrease the number of false positives.
3. I have read about three methods of extracting the descriptors: HOG, BOW (Bag of Words) and SIFT. I think they all do what I need, but I don't know which one to use. I see that HOG is mostly (if not always) used for face and pedestrian detection, and I don't know if it could be used for my case. Any advice on which one I should use?
I have direct knowledge only of HOG. You can easily implement your own detector with HOG without any problem; I'm currently using it for traffic signs. Pay attention to the detection window you want to use. You can leave all the other parameters as they are; it will work for simple cases.
4. How many sample images should I have for each case?
Once again, it depends on the situation. I would say that 200 images per class (try with fewer as well) will do the trick, but you can always increase the number by applying some transformations to the positives. Try flipping, saturating or blurring the images.
Some more considerations: I think you can work with greyscale images, since color is not important for distinguishing the notes (they are all the same color, right?). If you have problems with false positives, you can try using the HSV color space to filter out patches that you then use to detect the notes (it works really well with red!). The easiest way to train your SVM is to use a linear kernel and train one model per class.
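A rough sketch of that HOG + linear SVM training flow, assuming OpenCV 3+ and its ml module. The 64x64 window, the "samples/*.png" glob pattern, the label value and the model file name are placeholders you would adapt to your own note images:

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <vector>

int main() {
    // One HOG configuration shared by all samples; the 64x64 window is a placeholder,
    // choose one that matches the size of your note symbols.
    cv::HOGDescriptor hog(cv::Size(64, 64), cv::Size(16, 16),
                          cv::Size(8, 8), cv::Size(8, 8), 9);

    std::vector<cv::String> files;
    cv::glob("samples/*.png", files);      // placeholder path to the training images

    cv::Mat trainData;                     // one row of HOG features per sample
    std::vector<int> labels;               // one class id per sample (0 = empty staff, 1 = half note, ...)
    for (const cv::String& f : files) {
        cv::Mat img = cv::imread(f, cv::IMREAD_GRAYSCALE);
        cv::resize(img, img, cv::Size(64, 64));
        std::vector<float> desc;
        hog.compute(img, desc);
        trainData.push_back(cv::Mat(desc).reshape(1, 1));  // append as a single row
        labels.push_back(0);               // hypothetical label; derive the real one from the file name
    }

    // Linear SVM; OpenCV's C_SVC handles the multi-class case, or train one model per class.
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::LINEAR);
    svm->train(trainData, cv::ml::ROW_SAMPLE, cv::Mat(labels));
    svm->save("notes_svm.xml");
    return 0;
}

To classify a new image, compute its HOG descriptor the same way and call svm->predict on it.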

Generate an image that can be most easily detected by Computer Vision algorithms

I'm working on a small side project related to computer vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitation that they wanted QR codes to be simple and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems, like skew or partial occlusion, will occur?
Thanks very much
I think you need something like AR markers.
Take a look at the ArToolkit, ArToolkitPlus or Aruco libraries; they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
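If you go the ArUco route, generating and detecting a marker takes only a few calls with OpenCV's aruco contrib module; a minimal sketch, assuming the pre-4.7 API (drawMarker was later renamed generateImageMarker), with an arbitrary dictionary, marker id and sizes:

#include <opencv2/opencv.hpp>
#include <opencv2/aruco.hpp>   // contrib module
#include <vector>

int main() {
    cv::Ptr<cv::aruco::Dictionary> dict =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);

    // Generate a marker image you can print (id 23 and a 200 px side are arbitrary).
    cv::Mat marker;
    cv::aruco::drawMarker(dict, 23, 200, marker);
    cv::imwrite("marker23.png", marker);

    // Detect markers in a camera frame (placeholder file name).
    cv::Mat frame = cv::imread("frame.jpg");
    std::vector<int> ids;
    std::vector<std::vector<cv::Point2f>> corners;
    cv::aruco::detectMarkers(frame, dict, corners, ids);
    cv::aruco::drawDetectedMarkers(frame, corners, ids);
    cv::imwrite("detected.jpg", frame);
    return 0;
}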
If you plan to use feature detection, then the marker should be tailored to the feature detector you use. Common practice for marker design is a good response to "corners" or regions with high x,y gradients. You should also account for the scaling of the target.
The simplest detection can be performed with blobs. It can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
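For the blob route, a minimal sketch with OpenCV's SimpleBlobDetector, tuned here for roughly circular blobs (the area and circularity thresholds are placeholder values):

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("target.jpg", cv::IMREAD_GRAYSCALE);   // placeholder

    cv::SimpleBlobDetector::Params params;
    params.filterByArea = true;
    params.minArea = 100.0f;              // ignore tiny specks
    params.filterByCircularity = true;
    params.minCircularity = 0.8f;         // keep only roughly circular blobs

    cv::Ptr<cv::SimpleBlobDetector> detector = cv::SimpleBlobDetector::create(params);
    std::vector<cv::KeyPoint> blobs;
    detector->detect(img, blobs);          // each KeyPoint holds the blob center and size
    return 0;
}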
Depending on the distance you want to see your markers from, the viewing conditions/backgrounds you typically have, and the camera resolution/noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easier to track using a homography than a 3D object.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the PnP functionality of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
bool found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessboardCorners(dst, chessboardSize, corners, found);
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have indoors vs outdoors, etc. There is no such thing as a general average case in computer vision!

C++ OpenCV sky image stitching

Some background:
Hi all! I have a project which involves cloud imaging. I take pictures of the sky using a camera mounted on a rotating platform. I then need to compute the amount of cloud present based on some color threshold. I am able to do this individually for each picture. To completely achieve my goal, I need to do the computation on the whole image of the sky. So my problem lies with stitching several images (about 44-56 images). I've tried using the stitch function on all of them and on some subsets of the image set, but it returns an incomplete image (some images were not stitched). This could be because of a lack of overlap or something, I don't know. Also, the output image is weirdly distorted (I am actually expecting the output to be something similar to a picture taken with a fish-eye lens).
The actual problem:
So now I'm trying to figure out the opencv stitching pipeline. Here is a link:
http://docs.opencv.org/modules/stitching/doc/introduction.html
Based on what I have researched, I think this is what I want to do: map all the images to a circular shape, mainly because of the way my camera rotates, or something else that uses a fairly simple coordinate transformation. So I think I need to get some sort of fixed coordinate transform for the images. Is this what they call a homography? If so, does anyone have any idea how I can go about my problem? After this, I believe I need a mask for blending the images. Will I need a fixed mask, like the one I want for my homography?
Am I on a feasible path? I have some background in programming but almost none in image processing. I'm basically lost. T.T
"So I think I need get some sort of fixed coordinate transform thing for the images. Is this what they call the homography?"
Yes, the homography matrix is the transformation matrix between an original image and the ideal result. It warps an image in perspective so that it fits the other image when stitching.
"If so, does anyone have any idea how I can go about my problem?"
Not with the limited information you provided. It would ease the problem a lot if you knew the order of the pictures (which borders which: row and column position).
If you have no experience in image processing, I would recommend you use a tutorial covering stitching with more basic functions in detail. There is some important work behind the scenes, and it's not THAT much harder to actually do it yourself.
Start with this example. It stitches two pictures.
http://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
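Once you go beyond two pictures, OpenCV's high-level cv::Stitcher class runs the whole pipeline for you (feature matching, camera/homography estimation, warping, seam finding and blending). A minimal sketch, assuming OpenCV 3.2+ and placeholder file names:

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Load the sky images to stitch (placeholder names and count).
    std::vector<cv::Mat> images;
    for (int i = 0; i < 44; ++i) {
        cv::Mat img = cv::imread(cv::format("sky_%02d.jpg", i));
        if (!img.empty()) images.push_back(img);
    }

    // PANORAMA mode assumes the camera rotates about a single point,
    // which matches a camera on a rotating platform.
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(cv::Stitcher::PANORAMA);
    cv::Mat pano;
    cv::Stitcher::Status status = stitcher->stitch(images, pano);
    if (status != cv::Stitcher::OK) {
        // A failure here usually means too little overlap or too few matched features.
        return 1;
    }
    cv::imwrite("sky_pano.jpg", pano);
    return 0;
}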

Pattern Recognition in C++

I have a simple template grayscale image, with a white background and a black shape over it, and I have several similar test images. I want to compare these images and see if the template matches any of the test images. Can you please suggest a simple (easy-to-use) pattern recognition library for C++ that takes two images, compares them and shows the result?
Just do image1 - image2 for all pixels. Then sum up all the differences. The lower the result, the closer the images.
If your pattern can come in several sizes, then you have to resize it and check it at each position.
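A minimal sketch of that idea with OpenCV (placeholder file names); absdiff avoids the clipping you would otherwise get when subtracting unsigned images:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat img1 = cv::imread("template.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("test.png", cv::IMREAD_GRAYSCALE);
    cv::resize(img2, img2, img1.size());   // compare the images at the same size

    // Sum of absolute per-pixel differences: the lower, the more similar.
    cv::Mat diff;
    cv::absdiff(img1, img2, diff);
    double score = cv::sum(diff)[0];
    std::cout << "difference score: " << score << std::endl;
    return 0;
}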
Implement a neural network on the image. The inputs should be the greyscale values of your image. You should train your network on a training set, choose proper regularization parameters using a cross-validation set, and finally test your network on a test set.
http://www.codeproject.com/Articles/13582/Back-propagation-Neural-Net
(I have done this myself to train a network to recognise hand written digits - it works very well.)
How simple the library you need is depends on the specific parameters of your problem. OpenCV is a great image processing library that should be able to do what you need it to. Here is a tutorial on template matching in OpenCV. It makes it very easy to switch between matching metrics and choose the best one for your problem.
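For completeness, a minimal sketch of template matching with OpenCV, assuming OpenCV 3+ (placeholder file names; TM_CCOEFF_NORMED is just one of the metrics you can swap in):

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat scene = cv::imread("test.png", cv::IMREAD_GRAYSCALE);       // image to search in
    cv::Mat templ = cv::imread("template.png", cv::IMREAD_GRAYSCALE);   // pattern to look for

    // Slide the template over the scene and score every location.
    cv::Mat result;
    cv::matchTemplate(scene, templ, result, cv::TM_CCOEFF_NORMED);

    // For TM_CCOEFF_NORMED the best match is the maximum response.
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
    std::cout << "best score " << maxVal << " at " << maxLoc << std::endl;
    return 0;
}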