Possibility of creating a software that can recognize context of an image? - computer-vision

I raised this question due to curiousity while using Google Goggle and Google's "Search by Image".
If you try giving Google an image to search, it can show you some results. Identical images work best (of course), but taken photo of various objects could be difficult.
I guess Google Goggle has workaround a bit by using text recognition and image matching recognition. If text recognition found the text, for instance, "SONY", then things might get simpler. If a brand's image is detected, then things should be simpler as well. The same goes with other famous brand and famous landmark, such as an Eiffel Tower. Having text and brand's image could help recognize things easily.
But if we are to search for something more obscure (need a better wording here), for instance, take this ramen image.
If you put this image into Google, you will get images of various other images that have similar colors and sometimes similar shape. Heck, there are other ramen images in the result, but I think it would be better if these ramen images are up in the top, since we input a ramen image, and our context here is ramen.
So here is my question, will it be possible to create such a software that can understand the context of the image? How can we express the context in the software?

Man, you just pointet out the very reason why so much people work on computer vision.
Is is quite easy to mathematically describe objects. Color, shape, density, . . .
All those can be calculated easily.
But computer vision becomes very complex when talking about "real life objects".
Angle, luminosity, and simply non consistency make it really almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself : what makes the object I want to recognize unique ?
What descriptor can I use that no other object possess ?
Ask yourself the question for theses ramen. Let's say I simply want to detect ramens.
What if the color of the soup changes? What if the meat is bigger ?
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can find the solution to this kind of problems in a generic way, you can register for the nobel price I think :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it, even Google's image search algorithm sucks when you feed it with ramen.
It is pretty efficient with sudoku though, as he knows exactly what he is searching for.
All the difference is made in training, where you give a list of assumptions to help the algorithm.
So basically you got it. either you create a really nice computer vision system good at detecting one thing based on a lot of assumptions, or an "ok" but quite generic one :).
The choice mostly depends on your application

Related

What's the best way for complex shape matching

I need to know what's the best way to match certain shape (template) in the image.
I know there is several ways, but some of them did not lead to a very good results and the another need a lot of process time, so anyone tried a good and fast way to do the matching with short process time.
For example this is the template...
And I have a sample and I want to compare the sample with the template and return true if the sample is similar to the template else return false.
Note: I tried contour matching, Cascade Classification, and SURF, but all of them is not very good or the process time is not so good.
Matching things with eachother can be a rather difficult task, mainly due to the fact that different techniques have very different characteristics and can yield almost perfect results on some categories and very bad results on others.
This said, I don't think you'll ever get an answer to your question, at least not one that says "Use xyx method from [cited paper], that will solve all your problems". I'll try to point out some examples for you hoping that it'll help.
Template matching operator: compare a template with a sliding window on your image, can achieve very good results if your template is very similar to the object you are looking for in the image, no matter how complex it is. Can be very fast, it's not invariant to basically anything, so if you plan to have rotations, significant changes in lighting or something else, this is probably not going to work for you. here you can find out some code. Watch out which color space are you using, different color spaces can achieve very different results if used right (e.g. for face analysis HSV can be better that RGB in some cases)
Keypoint matching like SIFT or SURF: I used this a lot with very good results. You'll need to decide what descriptor to use and what matcher. OpenCV has some nice examples,here you can find one. Not going to be the fastest way to match your object since these descriptors can take some time to be extracted, it's good if you don't know much about the conditions you'll be working in though: it's usually robust to scale, rotation and lightning changes as long that keypoints can be correctly found on both the template and the image.
Shape matching: I was rather surprised when, in an image classification competition i participated in, I had been able to use a simple HOG descriptor to obtain very discriminating information about my images. Histograms of Oriented Gradients are a rather powerful tool for describing the shape of an object, it uses edge orientation and magnitude to describe your image. They can be fast to compute (OpenCV has a a GPU implementation I think), configurable (you can decide how thick your grid can be and how many cells, resulting in very different information). HOGs are not invariant to rotation, seen the object from a different angle will likely produce a different histogram, but they are very robust to lighting changes due to the fact that doesn't use color.
HOGs are just an example, there are a lot of shape and contour descriptors but basically they offer pretty much the same I think.
Histogram matching: not my first choice, it can be useful if you know something about the object and the rest of image. For example, if you know you are looking for your pink flower in a jungle image where it's the only pink thing there, a simple color histogram matching will do just fine. Pick up a sliding window, run it over your image, compare your histograms and you'll be done. Very fast, very simple, it doesn't use the shape at all so no matter how complex your object is, you'll find it. Not using shape makes it robust to rotations, watch out for lighting changes though. A very big limitations of this method is that if there are other pink things in your jungle you won't be able to distinguish.
Hybrid approaches: here is where you can get the best out of the techniques cited above. As you have seen, most of them work well in a certain environment and quite bad in others. You can use a combination of the techniques you know and obtain something much better than the sum of the parts. I worked a lot with HOGs and head pose estimation and a real breakthrough came when we started extracting HOGs not in a dense way but around certain keypoints. You'll need to know your problem, find out what do you need and adapt a bunch of methods to it. In general, hybrid methods can work a lot better and a lot slower.
Hope this helps you a bit, I don't think that, given the information you gave us, I could give you a much better answer..(probably someone else can, that's why I'm still a student :) )

Is there a way to compare two hand-drawn drawings?

For an experiment, we are looking for a way to automatically compare two hand (mouse-)drawn images. These images are, for instance, drawn on an HTML5 canvas element and we need some way to see whether the pictures roughly match.
So, if someone draws a house, we need to test whether the second drawing looks like the first house. It doesn't matter what exactly is in the image, but we only need to know whether the two images look alike. I.e., we want to know whether the person drawing the picture, can redraw roughly the same picture. The exact orientations of the lines, the size of the image or the position of the picture on the canvas shouldn't matter.
Is there, by any chance, a library or project that does this?
Yes absolutely
With API provided by Face.com and with imagemagick Both are providing
api's
http://www.imagemagick.org/api/compare.php
FaceAPI - Track Faces from a Webcam: http://faceapi.com
it might be useful for your requirement
Ultimately, we didn't find a library that does exactly the same, so we have modified the original project's goals and have crowdsourced the comparison that needed to be done.
You can probably also use Mechanical Turk or something similar for the time being, depending on how much you wish to pay for it.
I think researchers are currently be working on technical solutions to the issue described above.

Augmented Reality-PC

I recently saw the virtual mirror concept on you tube, I tried it out and researched about it. It seems that the creators have used augmented reality so that people can see the output on their screens. On researching I found out that we identify a pattern on which a 3D image is superimposed.
Question 1:How are they able to superimpose the jewellery and track the face of the person without identifying any pattern?
I also tried to check various libraries that I can use to make a program similar to the one they show. Seems to me that a lot of people are using Android phones and iPhones and making apps that use augmented reality.
Question 2:Is there any way that I can use c++ and try to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. Its a good experience. :D
I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (get the person's outline), or they simply track you based on the initial image (with your position/texture being determined by the outline being shown.
Following the video, it's easy to see that there's no real/advanced AR behind this. The images are simply overlayed or hidden (e.g. in case it's missing track of one ear due to you looking to the side) and they're not transformed (no perspective or resizing happening). They definitely seem to track the head (or features like ears, neck, etc.). depending on your background and surroundings that's actually a rather trivial task.
Question 2: Sure! There are lots of premade toolsets out there, but you could as well use some general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of pattern (e.g. a card or page with a known pattern) to determine the correct position and transformation for the contents to be added to the image. There are also approaches using the device's orientation and perspective changes in camera images to determine depth/position (I really like this demo).

Image processing Basics

I am planning to do a project on Image Processing, my knowledge in this subject in general is low. My Preferred Language is C++.
Can the members out here give me:
A Brief idea of What Image Processing is?
What books should i consult [ please keep in mind i am a beginner and am ONLY interested in making a college Project ]
What libraries can i use? [ I know about Boost/OpenCV etc. I would like to know what simplest and can get my project done quickly - its a minor project ]
Apart from the above 3 points, anything which i should know if be told to me will be of good help. Thanks in Advance.
I would suggest reading a good book. Image processing is not a programming field - it's an engineering field and it involves mathematics and signal processing knowledge and intuition. The Gonzalez and Woods Image Processing is quite good and doesn't require a vast knowledge of signal processing before you start reading it. The bottom line is you don't learn image processing like you learn a new programming language; you learn it like a completely new subject that just happens to involve coding. To break this up into answers to your questions,
Image processing is a discipline of digital signal processing which is itself at an intersection of computer science and applied mathematics. It involves pixel-based image operations for purposes of image enhancement (color and contrast correction, denoising, deblurring), visual effects (spatial distortion, morphing, color-substitution), artificial vision (feature extraction, texture segmentation, pattern identification, spatial perception). There are also many narrowly applied areas of image processing such as RADAR image processing, medical image processing, etc.
The book I mentioned above is really a great read. If it's a bit pricey for you, I always find it useful going to Amazon and searching for an inexpensive older edition used book on the subject with a five star rating. Never failed me yet. Beware of getting books that are too old though.
There are plenty of libraries for the task, Boost/CImg are some of them, and it really depends on the platform you're coding for. However, I would think that an image processing project would not involve any libraries, instead you would be writing image processing filters and other operators yourself -- that's the essence of it. You would very likely use algorithm libraries though for faster computation. A project in image processing is not a software project; rather, it's an engineering project and using a library would kill the purpose completely. That is in my humble opinion, of course.
Answer to 3.: CImg might be a good choice to start quickly.
Modifying the image data in such a way to get the desired effect (for example, change a colour image into black and white image).
Very broad question, and the answer depends on what you want to do.
Take a look into GraphickMagick or ImageMagick.
Image processing is a lot about math, and is particular matrix manipulations and in more advanced processing, Fourrier transformation.
image processing is at its basic definition, image manipulation, whatever the manipulations are (either color manipulation, feature extractions, enhancements, ... ). Image processing is different than computer graphics (2d and 3d)
I would assume visit your local college library, they should have existing reference for image processing, algorithms and all that jazz. You have to decide (with your college professor/adviser) what part of image processing you want to explore.
Have a look at the ImageMagick libraries (among others), it offer a good package to start learning about image processing; source code is available).
Max.
Altough old, I trink Digital Image Processing by K. Pratt is a good choice to start (to get a gist of common techniques), but imho you shouldn't learn with C++; a high level language with good image processing toolbox (like MATLAB) is far better to try algorithms (whic sometimes need heavy use of complex numerical methods).

Finding the right tools for programming a futuristic-styled UI project

I've always been inspired by dynamic, futuristic-like user interfaces. The best I can describe is a graphic interface such as in the latest Iron Man movies.
Although I wouldn't build a full blown application, I would like to make little snipplets of animations that I plan to make interactive. And maybe put them together someday to make something bigger. Admittedly, I will use for audio manipulation in the future but anyway, that's not the point since it's the animations/graphics I'm unsure of.
I know it's possible to make those kind of animations in Adobe After Effects. I'm just having a hard time thinking of the processes (artistically and programmability) to proceed.
While researching on this on my own I have acquired basic experience with OGRE 3D and Blender. I've imported and compiled meshes on OGRE, have been able to do basic things like move the meshes around which is about it.
I'm beginning to think I may be approaching this the wrong way and there are better tools or if 3D is overkill for those kind of animations when 2D would suffice and maybe provide a smoother experience.
I'm having trouble understanding the process and am wondering two things:
1.)The main thing I'm having trouble understanding is how to get still graphics to make animations? Do the meshes keep the timeline from a program like Blender then a graphics engine like OGRE reads the timeline and plays them?
Most importantly:
2.)Do I even need graphics (meshes)? Most of the interface are thin-border boxes, text and shapes of transparent LED-like colors that can move around dynamically to make that futuristic effect.
Please share your opinions, suggestions and anything you think might help me accomplish to develop those kinds of sexy eye candy! Thanks.
When you look at awesome futuristic UIs in movies, they are usually made of
basic primitives
desaturated colors, and/or one color tone
transparency
a cool font or two
high-tech text, graphs or similar
simple animations to make things look "alive", blinking lights/text and similar
a touch interface, of course
Maybe you can't do a lot about the touch interface, but the rest is really not hard graphics wise, it's a matter of carefully crafted artwork and combining simple elements in a cool way.
Also I would look into Adobe Photoshop and fancy texturing rather than Blender and fancy modelling, as you are looking for a fancy 2D UI, and detailed 3D models will not be that important. Playing around in photoshop (well, or GIMP if you want a free alternative) can help you develop your art skills, and help you get that high-tech, sci-fi look on a 2D surface.
You know, I would go as far as to suggest making some sci-fi wallpapers in the style you are after before trying to solve this problem in code. I think you will find that photo manipulation skills and an eye for art will help you here. And for gods sake, look at those movies (Iron Man, Minority Report etc.) that have those UIs you are aiming at, and analyze what exactly they are. Decompose them like I did in the list above.
As for the "which tools should I use?", I say the answer to that is fairly simple:
OpenGL
Photoshop (or GIMP if you are a starving student etc.)
A compiler & toolchain
A code editor/IDE
A cup
I see this is tagged C++, which is an excellent choice of programming language if I may say so.
Ogre is a full blown 3D engine, which is fine, but not exactly targeted at what you want to use it for. You might find that you struggle to get what you want done (disclaimer: I have not tried this in Ogre, and it might work well for this. Then again, when did you last see Ogre used in an audio manipulation program?). My advice is to learn good, simple OpenGL. That would give you complete power over your UI, not get in your way or limit you in any way. It is also cross platform, well documented, and used by tons of developers all over the world (also for audio manipulation applications). I can't see how you could possibly go wrong with it. The fun part is that it probably won't take you long to get advanced enough in it to start developing some pretty nice UIs. As I mentioned, it's more of an art problem than a coding problem.
The cup is for the coffee, by the way. :)
The easiest and most efficient way is to keep track of all your graphics data (meshes, animations, effects) in "media files" and load & play them in runtime. Though you'll be able to easily change your game without changing the code.
For example, you have a Diablo-like game and you wanna turn it to the future-style. You just need to rewrite some player and AI scripts and modify meshes/effects/sounds/animations. But if you've done those via code - it will be a new game from scratch.
I would suggest Ogre, but you already used that, so by my opinion, you are on the right track.
Look up 'billboards' in Ogre documentation, re: LED and 2D stuff.