For an experiment, we are looking for a way to automatically compare two hand (mouse-)drawn images. These images are, for instance, drawn on an HTML5 canvas element and we need some way to see whether the pictures roughly match.
So, if someone draws a house, we need to test whether the second drawing looks like the first house. What exactly is in the image doesn't matter; we only need to know whether the two images look alike. That is, we want to know whether the person drawing the picture can redraw roughly the same picture. The exact orientations of the lines, the size of the image or the position of the picture on the canvas shouldn't matter.
Is there, by any chance, a library or project that does this?
Yes, absolutely.
Face.com and ImageMagick both provide APIs:
http://www.imagemagick.org/api/compare.php
FaceAPI - Track Faces from a Webcam: http://faceapi.com
They might be useful for your requirement.
Ultimately, we didn't find a library that does exactly this, so we modified the original project's goals and crowdsourced the comparison that needed to be done.
You can probably also use Mechanical Turk or something similar for the time being, depending on how much you wish to pay for it.
I think researchers are currently working on technical solutions to the problem described above.
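For anyone who still wants an automatic baseline, here is a minimal sketch of the position and size invariance asked for in the question. It assumes each drawing is available as a list of (x, y) points sampled from the canvas strokes; the function names are hypothetical, the distance is a simple chamfer-style measure rather than any library's API, it does not handle rotation, and any "looks alike" threshold would have to be tuned on real drawings.

```python
import math


def normalize(points):
    """Translate points to their centroid and scale them to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Guard against a degenerate drawing where every point is identical.
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / rms, y / rms) for x, y in centered]


def mean_nearest_distance(a, b):
    """Average distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def drawing_distance(points_a, points_b):
    """Symmetric chamfer-style distance after removing position and size."""
    a, b = normalize(points_a), normalize(points_b)
    return (mean_nearest_distance(a, b) + mean_nearest_distance(b, a)) / 2
```

A small value means the two drawings overlap well once both are centred and scaled; the same square drawn larger and elsewhere on the canvas scores (almost exactly) zero.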
I am going to split this question into 3 parts.
First, I've been given this problem and I don't know where to start. If you have solved a related problem, could you give me some hints and keywords to help me do more research?
I have done some research on my own
Here are some 2D chest CT scans (sorry, due to the reputation rule I can't embed images directly).
All photos are taken from the same angle. So I think I can simply read each photo into a vector of pixels and do some thresholding to turn all black and near-black pixels into non-colored (empty) pixels. Next, I'll create a vector called vector_of_photo of those vectors. The index of each vector in vector_of_photo is then its Z-index.
Can I now render a 3D image from those vectors of pixels?
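The thresholding and stacking step described above can be sketched as follows. This assumes each slice has already been decoded into a 2D list of grey values (0-255); file loading is left out, build_volume is a hypothetical helper name, and empty voxels are represented by None.

```python
def build_volume(slices, threshold):
    """Stack 2D grey-level slices into a volume indexed as volume[z][y][x].

    The position of a slice in the input list becomes its Z index, like
    vector_of_photo in the question. Pixels at or below `threshold`
    (black / near-black) become None, i.e. empty voxels.
    """
    volume = []
    for z_slice in slices:
        volume.append([[v if v > threshold else None for v in row] for row in z_slice])
    return volume
```

Whether a plain threshold is enough depends on the scans; CT data usually needs some filtering first, as the answer below also notes.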
Second, I have trouble understanding the raycasting algorithm.
I think the idea is this: once I have a box of pixels, every time I rotate the box, straight lines (rays) are cast from the camera's angle into the box. Each ray that hits a colored pixel stops and renders that pixel (more specifically, copies the pixel to the corresponding location on the image plane).
Did I understand it correctly?
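The first-hit raycasting described above can be sketched in a few lines. Assumptions: the volume[z][y][x] layout with None for empty voxels, an orthographic camera that rotates only around the Y axis, and nearest-voxel sampling with a fixed step along each ray. raycast_first_hit is a hypothetical name; a real renderer (for example an OpenGL shader) would add perspective, interpolation and shading.

```python
import math


def raycast_first_hit(volume, angle, step=0.5):
    """Orthographic first-hit raycast of volume[z][y][x], rotated about Y by `angle` radians."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    cx, cz = (width - 1) / 2, (depth - 1) / 2
    # Ray direction in the volume's x-z plane for the rotated camera.
    dx, dz = math.sin(angle), math.cos(angle)
    # Image-plane axis, perpendicular to the ray direction.
    ux, uz = math.cos(angle), -math.sin(angle)
    radius = math.hypot(width, depth)  # long enough to cross the whole box
    image = []
    for y in range(height):
        row = []
        # The image has the volume's width, so rotated views may clip box corners.
        for u in range(width):
            offset = u - cx
            # Start in front of the box and march along the ray.
            x = cx + offset * ux - radius * dx
            z = cz + offset * uz - radius * dz
            hit = None
            for _ in range(int(2 * radius / step) + 1):
                xi, zi = round(x), round(z)
                if 0 <= xi < width and 0 <= zi < depth and volume[zi][y][xi] is not None:
                    hit = volume[zi][y][xi]  # first colored voxel stops the ray
                    break
                x += dx * step
                z += dz * step
            row.append(hit)
        image.append(row)
    return image
```

Looking at the same volume from the opposite side (angle = math.pi) shows the voxels that were hidden behind the front ones, mirrored left to right.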
Finally, OpenGL/C++ is just the option I think I'm going to use to solve this problem. I'm not sure whether it is a good idea, so please give me some more hints about the programming languages, libraries or modules I should take a look at.
I happen to be working on the same problem in my spare time. Haha :)
Here is one approach to your problem:
1. Load the images into your application, so that you get the 3D volumetric dataset you describe.
2. Remove all points that don't fall within some range of values (e.g. 0.4/1.0 to 0.6/1.0 brightness). You may need to apply preprocessing and filtering.
3. Fit a mesh to the resulting point cloud with open-source software. Here is a good blog post about that:
https://towardsdatascience.com/5-step-guide-to-generate-3d-meshes-from-point-clouds-with-python-36bad397d8ba
4. Take the resulting mesh (probably an STL file) and visualize it in any software you want (Blender 3D, Unity 3D, Cinema 4D, a custom OpenGL application), anything really.
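The range-filtering step above can be sketched as follows, assuming a volume stored as volume[z][y][x] with brightness already normalised to 0.0-1.0 and None for missing voxels. extract_point_cloud is a hypothetical name; the resulting (x, y, z) list could then be written out one point per line for meshing tools such as those in the linked post.

```python
def extract_point_cloud(volume, low=0.4, high=0.6):
    """Return (x, y, z) coordinates of voxels whose brightness lies in [low, high]."""
    points = []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                # Keep only voxels inside the brightness window.
                if value is not None and low <= value <= high:
                    points.append((x, y, z))
    return points
```

The window bounds are the example values from the step above; in practice they would be chosen for the tissue you want to extract.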
My own approach to this problem is very similar to the one you suggest in your question, and I have already made some headway. Therefore, I thought it would be good to suggest another route.
NOTE: Please be aware that what you are working on is not a trivial problem. It's a large project, and there are many commercial companies that have put years into doing just this. This is a great project for learning OpenGL, rendering, and other concepts. It's perfectly doable, but you may be looking at several months of work and lots of trial and error. Good luck!
It's not often that two people happen to work on the same problem, so if you want to discuss further, feel free to contact me on LinkedIn and/or post a comment below. www.linkedin.com/in/michael-sohnen-a2454b1b2
I need to know the best way to match a certain shape (template) in an image.
I know there are several ways, but some of them did not lead to very good results and others need a lot of processing time. Has anyone tried a good and fast way to do the matching with a short processing time?
For example, this is the template...
I have a sample, and I want to compare it with the template and return true if the sample is similar to the template, else return false.
Note: I tried contour matching, cascade classification, and SURF, but none of them is very good, or the processing time is too long.
Matching things with each other can be a rather difficult task, mainly because different techniques have very different characteristics and can yield almost perfect results on some categories and very bad results on others.
That said, I don't think you'll ever get a definitive answer to your question, at least not one that says "Use the xyz method from [cited paper], that will solve all your problems". I'll try to point out some examples for you, hoping that it'll help.
Template matching operator: compare a template with a sliding window over your image. It can achieve very good results if your template is very similar to the object you are looking for in the image, no matter how complex it is. It can be very fast, but it is not invariant to basically anything, so if you expect rotations, significant changes in lighting or something else, this is probably not going to work for you. Here you can find some code. Watch out which color space you are using: different color spaces can achieve very different results if used right (e.g. for face analysis, HSV can be better than RGB in some cases).
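The sliding-window comparison described above can be sketched with zero-mean normalised cross-correlation as the score, on plain 2D lists of grey values. match_template is a hypothetical name; OpenCV's cv2.matchTemplate with the TM_CCOEFF_NORMED method computes the same kind of score far faster, so this is only meant to show what happens under the hood.

```python
import math


def match_template(image, template):
    """Slide `template` over `image` and return (best_score, (row, col)).

    Scores are zero-mean normalised cross-correlation in [-1, 1], so a
    uniform brightness offset or contrast scaling does not change them.
    """
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    best = (-2.0, None)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            w_vals = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            if t_norm == 0 or w_norm == 0:
                continue  # flat patch: correlation is undefined
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

Note that nothing here handles rotation or scale: a rotated copy of the template simply scores lower, which is exactly the limitation mentioned above.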
Keypoint matching like SIFT or SURF: I used this a lot with very good results. You'll need to decide which descriptor and which matcher to use. OpenCV has some nice examples; here you can find one. It is not going to be the fastest way to match your object, since these descriptors can take some time to extract, but it's good if you don't know much about the conditions you'll be working in: it's usually robust to scale, rotation and lighting changes, as long as keypoints can be correctly found on both the template and the image.
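The matcher half of this can be sketched independently of the descriptor. Assuming the descriptors have already been extracted (for example by SIFT or SURF in OpenCV) as equal-length numeric vectors, a brute-force nearest-neighbour matcher with Lowe's ratio test looks like this; match_descriptors is a hypothetical name, and OpenCV's BFMatcher with knnMatch(k=2) is the usual optimised equivalent. The ratio test is one common filtering choice, not the only one.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force matching with Lowe's ratio test.

    Returns (index_in_a, index_in_b) pairs whose nearest neighbour is
    clearly closer than the second-nearest, which discards ambiguous matches.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs two candidates
    for i, d in enumerate(desc_a):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches
```

Counting how many good matches survive (or fitting a transform to them) is then one way to turn this into the true/false decision asked for in the question.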
Shape matching: I was rather surprised when, in an image classification competition I participated in, I was able to use a simple HOG descriptor to obtain very discriminative information about my images. Histograms of Oriented Gradients are a rather powerful tool for describing the shape of an object; they use edge orientation and magnitude to describe your image. They can be fast to compute (OpenCV has a GPU implementation, I think) and are configurable (you can decide how fine your grid is and how many cells it has, resulting in very different information). HOGs are not invariant to rotation, and seeing the object from a different angle will likely produce a different histogram, but they are very robust to lighting changes because they don't use color.
HOG is just one example; there are a lot of shape and contour descriptors, but I think they basically offer much the same.
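The core of the HOG idea is a per-cell histogram of gradient orientations weighted by gradient magnitude. A simplified sketch for one cell of grey values, assuming 9 unsigned orientation bins (0 to 180 degrees, a common choice) and central-difference gradients; cell_orientation_histogram is a hypothetical name, and a full HOG additionally interpolates between bins and normalises groups of cells (blocks), which is where much of its lighting robustness comes from.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Uses central differences on interior pixels only; no bin interpolation
    or block normalisation, unlike a full HOG implementation.
    """
    hist = [0.0] * bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # ignore edge sign
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist
```

Concatenating these histograms over a grid of cells gives the descriptor; the grid size and number of bins are the configuration knobs mentioned above.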
Histogram matching: not my first choice, but it can be useful if you know something about the object and the rest of the image. For example, if you are looking for your pink flower in a jungle image where it's the only pink thing, a simple color histogram match will do just fine. Pick a sliding window, run it over your image, compare the histograms and you'll be done. It's very fast and very simple, and it doesn't use the shape at all, so no matter how complex your object is, you'll find it. Not using shape makes it robust to rotations; watch out for lighting changes, though. A very big limitation of this method is that if there are other pink things in your jungle, you won't be able to tell them apart.
Hybrid approaches: here is where you can get the best out of the techniques cited above. As you have seen, most of them work well in a certain environment and quite badly in others. You can combine the techniques you know and obtain something much better than the sum of the parts. I worked a lot with HOGs and head pose estimation, and a real breakthrough came when we started extracting HOGs not densely but around certain keypoints. You'll need to know your problem, find out what you need and adapt a bunch of methods to it. In general, hybrid methods can work a lot better, but also a lot slower.
Hope this helps you a bit. Given the information you gave us, I don't think I could give you a much better answer (probably someone else can; that's why I'm still a student :) ).
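The sliding-window histogram comparison above can be sketched with a coarse joint RGB histogram and histogram intersection as the similarity measure. Intersection is my choice here (OpenCV's calcHist and compareHist offer it alongside measures such as correlation and chi-square); the function names are hypothetical and pixels are assumed to be (r, g, b) tuples in 0-255.

```python
def colour_histogram(pixels, bins=4):
    """Normalised joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]


def intersection(h1, h2):
    """1.0 for identical normalised histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def find_by_histogram(image, target_hist, win_h, win_w, bins=4):
    """Return (best_score, (row, col)) of the window whose histogram best matches."""
    best = (-1.0, None)
    for r in range(len(image) - win_h + 1):
        for c in range(len(image[0]) - win_w + 1):
            window = [image[r + i][c + j] for i in range(win_h) for j in range(win_w)]
            score = intersection(colour_histogram(window, bins), target_hist)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

Since only colour counts enter the score, a rotated flower still matches, but so would any other pink patch of the same size.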
I raised this question out of curiosity while using Google Goggles and Google's "Search by Image".
If you give Google an image to search, it can show you some results. Identical images work best (of course), but photos taken of various objects can be difficult.
I guess Google Goggles works around this a bit by using text recognition and image matching. If text recognition finds text, for instance "SONY", things might get simpler. If a brand's logo is detected, things should be simpler as well. The same goes for other famous brands and famous landmarks, such as the Eiffel Tower. Having text and a brand's logo could help recognize things easily.
But suppose we search for something more obscure (I need a better wording here), for instance this ramen image.
If you put this image into Google, you will get various other images that have similar colors and sometimes a similar shape. Heck, there are other ramen images in the results, but I think it would be better if these ramen images were at the top, since we input a ramen image and our context here is ramen.
So here is my question, will it be possible to create such a software that can understand the context of the image? How can we express the context in the software?
Man, you just pointed out the very reason why so many people work on computer vision.
It is quite easy to describe objects mathematically: color, shape, density, and so on.
All of those can be calculated easily.
But computer vision becomes very complex when talking about "real life objects".
Angle, luminosity, and simple inconsistency make it almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself: what makes the object I want to recognize unique?
What descriptor can I use that no other object possesses?
Ask yourself that question for this ramen. Let's say I simply want to detect ramen.
What if the color of the soup changes? What if the meat is bigger?
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can find the solution to this kind of problem in a generic way, you can sign up for the Nobel Prize, I think :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it: even Google's image search algorithm sucks when you feed it ramen.
It is pretty efficient with sudoku, though, as it knows exactly what it is searching for.
All the difference is made in training, where you give a list of assumptions to help the algorithm.
So basically, you've got it: either you create a really nice computer vision system that is good at detecting one thing based on a lot of assumptions, or an "ok" but quite generic one :).
The choice mostly depends on your application.
I'm developing an application that uses DirectShow with C++.
Its main goal is to capture users' faces.
I have reached the stage where I capture an image from my webcam.
The problem is that I need an intelligent renderer. In fact, I need the renderer to be able to detect a face and show it inside a rectangle.
I'm wondering if there is a filter that I can use for this purpose,
or if I need to create my own customized filter.
If so, please enlighten me.
It would look like this:
First of all, I need to understand how I can draw a rectangle in my renderer, because otherwise, even if I know the algorithm, I will not be able to apply it. This is my main goal now.
I have some ideas, but I don't know if they are correct. I think I need to grab each frame separately and modify some of its pixels, like what's drawn in the live render.
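Grabbing each frame and changing some of its pixels is indeed all it takes to draw a rectangle. A minimal sketch, assuming the frame is available as a list of rows of (r, g, b) tuples; draw_rectangle is a hypothetical name. In DirectShow the same loop would run over the sample buffer, for example inside a transform filter or a Sample Grabber callback, and the real buffer layout (pixel format, stride, possibly bottom-up row order) is something you would need to check against your media type.

```python
def draw_rectangle(frame, top, left, bottom, right, colour, thickness=1):
    """Draw a rectangle outline in place on `frame` (rows of RGB tuples).

    Coordinates are inclusive; the rectangle is clipped to the frame.
    """
    height, width = len(frame), len(frame[0])
    for y in range(max(top, 0), min(bottom, height - 1) + 1):
        for x in range(max(left, 0), min(right, width - 1) + 1):
            # A pixel is on the border if it is within `thickness` of any edge.
            on_border = (y - top < thickness or bottom - y < thickness or
                         x - left < thickness or right - x < thickness)
            if on_border:
                frame[y][x] = colour
    return frame
```

Once a face detector returns a bounding box, its coordinates are what you would pass in as top, left, bottom and right.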
Have a look at OpenCV.
After a quick look inside, I found this.
Making your own "filter" that works well is no easy job.
Are you talking about automatic detection of where there is something like a human face in the shot you have taken with the webcam? In this case object detection algorithms like Viola-Jones might be interesting for you.
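Two core ingredients of Viola-Jones are the integral image, which gives the sum of any rectangle in constant time, and Haar-like features built from such rectangle sums. A minimal sketch of both; the function names are hypothetical, and the full detector also needs boosted feature selection and a trained cascade, which OpenCV's CascadeClassifier provides together with pretrained frontal-face cascade files.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle using four integral-image lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)
```

Because every feature costs only a handful of lookups, the detector can evaluate thousands of them per window, which is what makes it fast enough for live webcam frames.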
If a commercial package is an option, you can use the Montivision Filter SDK which includes filters that should do the job out of the box. They offer a free eval which is perfect for experimentation.
I recently saw the virtual mirror concept on YouTube, tried it out and researched it. It seems that the creators used augmented reality so that people can see the output on their screens. While researching, I found out that usually a pattern is identified, on which a 3D image is superimposed.
Question 1: How are they able to superimpose the jewellery and track the person's face without identifying any pattern?
I also checked various libraries that I could use to make a program similar to the one they show. It seems to me that a lot of people are using Android phones and iPhones and making apps that use augmented reality.
Question 2: Is there any way that I can use C++ to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. It's a good experience. :D
I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (getting the person's outline) or simply track you based on the initial image (with your position/texture being determined by the outline being shown).
Following the video, it's easy to see that there's no real/advanced AR behind this. The images are simply overlaid or hidden (e.g. when it loses track of one ear because you look to the side), and they're not transformed (no perspective or resizing happening). They definitely seem to track the head (or features like ears, neck, etc.). Depending on your background and surroundings, that's actually a rather trivial task.
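The flat overlay described above (no perspective, no resizing) amounts to alpha-blending a sprite at the tracked position. A minimal sketch, assuming frames are rows of (r, g, b) tuples and the jewellery image is rows of (r, g, b, a) tuples with alpha in 0-255; overlay is a hypothetical name, and the tracking that produces top and left is not shown.

```python
def overlay(frame, sprite, top, left):
    """Alpha-blend `sprite` onto `frame` at (top, left), in place.

    No scaling or perspective is applied; sprite pixels that fall outside
    the frame are clipped.
    """
    for i, sprite_row in enumerate(sprite):
        y = top + i
        if not 0 <= y < len(frame):
            continue
        for j, (r, g, b, a) in enumerate(sprite_row):
            x = left + j
            if not 0 <= x < len(frame[0]):
                continue
            fr, fg, fb = frame[y][x]
            # Weighted mix of sprite and frame colour by the sprite's alpha.
            frame[y][x] = tuple(round((a * s + (255 - a) * f) / 255)
                                for s, f in ((r, fr), (g, fg), (b, fb)))
    return frame
```

Hiding an item when tracking of a feature is lost is then just a matter of skipping the call for that frame.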
Question 2: Sure! There are lots of premade toolsets out there, but you could just as well use some general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of pattern (e.g. a card or page with a known pattern) to determine the correct position and transformation for the contents to be added to the image. There are also approaches using the device's orientation and perspective changes in camera images to determine depth/position (I really like this demo).