I would like to understand what solutions exist for detecting an object in a picture, or in an augmented reality setting, using a single, almost identical reference image.
To be more specific: I want to detect flat (i.e. 2-dimensional) and mostly rectangular objects. I have a database with "perfect" reference images (high quality, full frontal, exact colors, no alterations, etc.) of the objects to be detected but I may have only one reference for each object.
I am talking about things such as logos, famous paintings and playing cards so the reference will have exactly the same content, shape and proportions as the object. From my understanding, the only difference between the object and the reference could then be perspective and a difference in lighting conditions. Let's assume none of these are very extreme (e.g. no sharp angle or colored light).
I know that image recognition and object detection usually require many training images, but given these simplified conditions, is there a way to make it work with one or a few images (or to create several by transforming one)?
I looked here and elsewhere, and the only thing I found so far was this example of the Vuforia SDK: https://www.youtube.com/watch?v=MtiUx_szKbI&t=1m10s. One image of a card in a card game is apparently enough to create an overlay, so I assume there are ways. This is not my field of expertise so I hope you guys can help me out :)
If there were no perspective distortion, you could use simple normalized cross-correlation. But since there is, you probably want to use SURF. The basic algorithm to use SURF to find your reference image within a world image is:
1. Find keypoints, such as corners, in both images.
2. Describe the local texture around each keypoint.
3. Use those descriptors to match keypoints between the images. If there are a lot of matches with consistent geometry, you've probably found your object.
Check out this tutorial, which walks you through doing exactly that: http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html#feature-homography
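In rough OpenCV terms, the three steps look like the sketch below. SURF needs a non-free opencv-contrib build, so this uses SIFT instead, which ships with recent OpenCV versions and follows exactly the same pipeline; the filenames are made up.

import cv2
import numpy as np

# Sketch of the keypoint/descriptor/matching pipeline. SIFT stands in for SURF,
# which needs a non-free opencv-contrib build; the steps are identical.
ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)    # assumed filename
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)         # steps 1 and 2
kp_scene, des_scene = sift.detectAndCompute(scene, None)

# Step 3: match descriptors and keep only clearly-best matches (Lowe's ratio test).
matches = cv2.BFMatcher().knnMatch(des_ref, des_scene, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

if len(good) >= 10:
    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_scene[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Consistent geometry = a perspective homography supported by RANSAC inliers.
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("object found" if H is not None else "matches lack consistent geometry")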
I'm currently learning OpenCV for a project I recently started, and I need to detect 3D boxes (imagine the big plastic boxes, maybe 3ft x 2ft x 2ft) in an image. I've used the inRange method to create an image which contains just the boxes I'd like to detect, but I'm not sure where to go from there. I'd like to get a 3D representation of these boxes back from OpenCV, but I can't figure out how. I've found quite a few tutorials explaining how to do this with just one object (which I have done successfully), but I don't know how to make this work with multiple boxes in one image.
Thanks!
If you have established a method that works well with one object, you may just go with a divide-and-conquer approach: split your problem into several smaller ones by dividing your image with multiple boxes into several images with one object each.
1. Apply an object detector to your image. This tutorial on object detection may help you. A quick search for object detection with OpenCV also gave this.
2. Determine the bounding boxes of the objects (min/max of the x and y coordinates, maybe with some added border margin).
3. Crop the bounding boxes to get single-object images.
4. Apply your already working method to the set of single-object images.
In case of overlap, the cropped images may need some processing to isolate a "main" object; whether step 4 works then depends on how robust your method is to occlusions. A rough sketch of steps 1-3 follows below.
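This is a loose sketch assuming inRange already isolates the boxes; the filename and HSV range are placeholders, and it uses the OpenCV 4.x findContours signature (3.x returns an extra value).

import cv2

# Rough sketch of steps 1-3: isolate boxes with inRange, then crop one image per box.
img = cv2.imread("boxes.jpg")                          # assumed filename
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (100, 50, 50), (130, 255, 255))  # placeholder color range

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
margin = 10                                # small border around each box
crops = []
for c in contours:
    if cv2.contourArea(c) < 500:           # skip small noise blobs
        continue
    x, y, w, h = cv2.boundingRect(c)
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1, y1 = min(x + w + margin, img.shape[1]), min(y + h + margin, img.shape[0])
    crops.append(img[y0:y1, x0:x1])        # one single-object image per box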
I stumbled across your question while looking into object detection. It's been quite a while since you asked, but since this is a public knowledge base, a discussion of this topic might still be helpful for others.
I'm working on a small side project related to computer vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thinking went into things like QR codes, but with the limitation that QR codes needed to be simple and small.
So my question for you: how would you generate an optimal image for later recognition by a camera? What if you already know that certain problems, like skew or partial occlusion, will occur?
Thanks very much
I think you need something like AR markers.
Take a look at the ARToolKit, ARToolKitPlus, or ArUco libraries; they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be tailored to the feature detector you use. A common practice in marker design is a good response to "corners" or regions with high x/y gradients. You should also take the scaling of the target into account.
The simplest detection can be performed with blobs; it can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
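As a concrete starting point, here is a minimal marker generate-and-detect sketch with the cv2.aruco module from opencv-contrib-python. Note the API changed in OpenCV 4.7 (drawMarker became generateImageMarker, and detection moved to the ArucoDetector class); the frame filename is made up.

import cv2

# Pick a predefined dictionary of 4x4-bit markers (50 unique ids).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

# Generate a 200x200 px image of marker id 0 and save it for printing.
marker = cv2.aruco.drawMarker(dictionary, 0, 200)
cv2.imwrite("marker0.png", marker)

# Detect markers in a camera frame (assumed filename).
frame = cv2.imread("frame.jpg")
corners, ids, rejected = cv2.aruco.detectMarkers(frame, dictionary)
if ids is not None:
    cv2.aruco.drawDetectedMarkers(frame, corners, ids)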
Depending on the distance from which you want to see your markers, the viewing conditions and backgrounds you typically work with, and your camera's resolution and noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easy to track using a homography, as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the PnP (perspective-n-point) functionality of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessboardCorners(dst, chessboardSize, corners, found);
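For the pose itself, here is a minimal Python sketch around solvePnP. The intrinsics below are placeholders that would normally come from cv2.calibrateCamera, and the filename is made up.

import cv2
import numpy as np

pattern_size = (9, 6)                       # inner corners per row/column
square_size = 0.025                         # assumed square edge length in meters

# 3D corner coordinates in the board's own coordinate frame (z = 0).
obj_points = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
obj_points[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
obj_points *= square_size

camera_matrix = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float32)  # placeholder intrinsics
dist_coeffs = np.zeros(5)                   # assume negligible lens distortion

img = cv2.imread("board.jpg")               # assumed filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, pattern_size)
if found:
    cv2.drawChessboardCorners(img, pattern_size, corners, found)
    # rvec/tvec give the board's pose relative to the camera.
    ok, rvec, tvec = cv2.solvePnP(obj_points, corners, camera_matrix, dist_coeffs)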
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have, indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!
I am doing a project on face recognition from CCTV cameras, and I want to recognize each individual face. I think the eigenface method is best for face recognition, but are there any problems when using the eigenface method on moving faces? Can individuals be recognized reliably? Since the input is not still images, I am really confused about which method to select.
Please help me determine whether this method is OK; otherwise, suggest a better alternative.
Short answer: typically, the computer vision techniques used in image analysis can be used in video analysis, too. Videos just give you more information (especially temporal information). For example, you could do face recognition using multiple frames and do object tracking between frames. Associating multiple frames typically gives you higher accuracy.
IMO, the most difficult problems are viewing angle, calibration, and lighting conditions: you will need an accurate face detection technique, or more training data, in order to recognize faces under varying viewing angles and lighting conditions. The eigenface-based approach relies on accurate positions of the faces, eyes, and so on; otherwise you are likely to mix different features in the same vector. But again, this problem also exists in face recognition on still images.
To sum up, video content only gives you more information. If you don't really want to associate frames and consider temporal information, video is just a collection of still images :)
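To make the frame-by-frame idea concrete, here is a rough sketch assuming opencv-contrib-python (for the cv2.face module); the training files, labels, and video source are all made up, and the training crops are assumed to be aligned and equally sized.

import cv2
import numpy as np

# Made-up training set: (image path, person id) pairs of aligned grayscale crops.
train = [("alice_1.png", 0), ("alice_2.png", 0), ("bob_1.png", 1)]
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p, _ in train]
labels = np.array([l for _, l in train], dtype=np.int32)

recognizer = cv2.face.EigenFaceRecognizer_create()
recognizer.train(faces, labels)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("cctv.mp4")           # assumed video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        # Eigenfaces needs crops at the same size as the training images.
        face = cv2.resize(gray[y:y + h, x:x + w], faces[0].shape[::-1])
        label, distance = recognizer.predict(face)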
Some background:
Hi all! I have a project which involves cloud imaging. I take pictures of the sky using a camera mounted on a rotating platform. I then need to compute the amount of cloud present based on some color threshold. I am able to do this individually for each picture. To completely achieve my goal, I need to do the computation on the whole image of the sky. So my problem lies with stitching several images (about 44-56 images). I've tried using the stitch function on the whole image set and on some subsets, but it returns an incomplete image (some images were not stitched). This could be because of a lack of overlap or something, I don't know. Also, the output image is weirdly distorted (I am actually expecting the output to look similar to a picture taken with a fisheye lens).
The actual problem:
So now I'm trying to figure out the opencv stitching pipeline. Here is a link:
http://docs.opencv.org/modules/stitching/doc/introduction.html
Based on what I have researched, I think this is what I want to do. I want to map all the images to a circular shape, mainly because of the way my camera rotates, or to something else that uses a fairly simple coordinate transformation. So I think I need to get some sort of fixed coordinate transform for the images. Is this what they call the homography? If so, does anyone have any idea how I can go about my problem? After this, I believe I need to get a mask for blending the images. Will I need a fixed mask like the one I want for my homography?
Am I on a feasible path? I have some background in programming but almost none in image processing. I'm basically lost. T.T
"So I think I need get some sort of fixed coordinate transform thing for the images. Is this what they call the homography?"
Yes. The homography matrix is the transformation between an original image and the ideal result: it warps an image in perspective so that it lines up with the other image in the stitch.
"If so, does anyone have any idea how I can go about my problem?"
Not with the limited information you have provided. It would ease the problem a lot if you knew the order of the pictures (which borders which, i.e. their row/column positions).
If you have no experience in image processing, I would recommend a tutorial that covers stitching using more basic functions in detail. There is some important work going on behind the scenes, and it's not THAT hard to actually do it yourself.
Start with this example. It stitches two pictures.
http://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
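For a picture of what the pipeline does internally, here is a minimal two-image stitch built from basic functions; filenames are made up, and there is no blending, just a paste.

import cv2
import numpy as np

img1 = cv2.imread("left.jpg")                # assumed filenames
img2 = cv2.imread("right.jpg")
g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# Match features between the two overlapping images.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(g1, None)
kp2, des2 = orb.detectAndCompute(g2, None)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des2, des1), key=lambda m: m.distance)[:200]

# Fit a homography mapping img2's coordinates into img1's frame.
src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp img2 into img1's frame and paste img1 on top (no blending or masking).
pano = cv2.warpPerspective(img2, H, (img1.shape[1] + img2.shape[1], img1.shape[0]))
pano[0:img1.shape[0], 0:img1.shape[1]] = img1
cv2.imwrite("pano.jpg", pano)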
I have 2 images with the same content, but they might differ in scale or rotation. The problem is, I have to find the regions of these images and match them with one another. For example, if I have a circle in image1, I have to find the corresponding circle in image2.
I would just like to ask what the proper way of solving this is. I am looking at matchShapes in OpenCV. I believe this is an image correspondence problem, but I really have no idea how to solve it!
Thanks in advance!
I have the following images:
Template Image => https://lh6.googleusercontent.com/-q5qeExXUlpc/T7SbL9yWmCI/AAAAAAAAByg/gV_vM1kyLnU/w348-h260-n-k/1.labeled.jpg
Sample Image => https://lh4.googleusercontent.com/-x0IWxV7JdbI/T7SbNjG5czI/AAAAAAAAByw/WSu-y5O7ee4/w348-h260-n-k/2.labeled.jpg
Note that the numbers on the images correspond to the proper matching of the regions. These labels are not present when comparing the images.
As usual with computer vision problems, you can never provide too much information or make too many assumptions about the data you intend to analyze. Solving the general problem is close to impossible, as we can't do human-level pattern recognition with computers. What does your problem set look like? A few examples would be very helpful in trying to provide good answers.
You mention that the images have the same content, but with different colors. If that means it's the same scene photographed under different lighting conditions and from possibly different angles, you might need to do a rigid image registration first, so that the feature points in the two images overlap. If the shapes in your images might have multiple distortions compared to each other, you might be interested in non-rigid image registration.
If you already know the objects you are looking for, you can simply do a search for these objects in both images, for example with chamfer matching or any other matching algorithm.
Use the ORB feature detector from OpenCV. Once you have the descriptors, use BFMatcher with norm type NORM_HAMMING.
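A minimal sketch of that pipeline (made-up filenames):

import cv2

img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance fits ORB's binary descriptors; crossCheck keeps mutual best matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Visualize the 30 strongest correspondences between the two images.
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:30], None, flags=2)
cv2.imwrite("matches.jpg", vis)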