Augmented Reality on PC - C++

I recently saw the virtual mirror concept on YouTube, tried it out, and researched it. It seems the creators used augmented reality so that people can see the output on their screens. From my research I learned that AR usually identifies a pattern onto which a 3D image is superimposed.
Question 1: How are they able to superimpose the jewellery and track the person's face without identifying any pattern?
I also looked into various libraries I could use to make a similar program. It seems that a lot of people are building augmented reality apps for Android phones and iPhones.
Question 2: Is there any way I can use C++ to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. It's a good experience. :D

I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (getting the person's outline), or they simply track you based on the initial image (with your position/texture determined by the outline being shown).
Judging from the video, there's no real/advanced AR behind this. The images are simply overlaid or hidden (e.g. when it loses track of one ear because you look to the side) and they're not transformed (no perspective correction or resizing happening). They definitely seem to track the head (or features like the ears, neck, etc.); depending on your background and surroundings, that's actually a rather trivial task.
Question 2: Sure! There are lots of premade toolkits out there, but you could just as well use a general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of marker (e.g. a card or page with a known pattern) to determine the correct position and transformation for the content added to the image. There are also approaches that use the device's orientation and perspective changes between camera images to determine depth/position (I really like this demo).
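As a rough sketch of the OpenCV route (not the Virtual Mirror's actual implementation; the cascade path and the overlay file are placeholders), face detection with a Haar cascade plus a naive image overlay already gives you the kind of markerless head tracking described above:

// Minimal sketch, not the Virtual Mirror's code: track a face with OpenCV's
// Haar cascade and paste an overlay image below it, roughly where a necklace
// would sit. The cascade XML ships with OpenCV; its path is install-specific.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::CascadeClassifier face;
    if (!face.load("haarcascade_frontalface_default.xml"))    // placeholder path
        return 1;

    cv::Mat overlay = cv::imread("necklace.png");              // placeholder overlay
    if (overlay.empty())
        return 1;

    cv::VideoCapture cam(0);
    cv::Mat frame, gray;
    while (cam.read(frame))
    {
        std::vector<cv::Rect> faces;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        face.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));

        for (const cv::Rect& f : faces)
        {
            // Place the overlay just below the detected face, scaled to its width.
            cv::Rect roi(f.x, f.y + f.height, f.width, f.width / 2);
            roi &= cv::Rect(0, 0, frame.cols, frame.rows);      // clip to the frame
            if (roi.area() > 0)
            {
                cv::Mat resized;
                cv::resize(overlay, resized, roi.size());
                resized.copyTo(frame(roi));
            }
        }
        cv::imshow("virtual mirror sketch", frame);
        if (cv::waitKey(1) == 27)                               // Esc quits
            break;
    }
    return 0;
}

A real jewellery overlay would at least need alpha blending and some temporal smoothing of the detected rectangle, but the tracking itself really can be that unsophisticated.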

Related

Possibility of creating software that can recognize the context of an image?

I raised this question out of curiosity while using Google Goggles and Google's "Search by Image".
If you give Google an image to search for, it can show you some results. Identical images work best (of course), but photos you have taken of arbitrary objects can be difficult.
I guess Google Goggles works around this a bit by using text recognition and image matching. If text recognition finds, for instance, the text "SONY", things get simpler. If a brand's logo is detected, things should be simpler as well. The same goes for other famous brands and famous landmarks, such as the Eiffel Tower. Having text or a brand's logo helps recognize things easily.
But what if we search for something less distinctive, for instance this ramen image?
If you put this image into Google, you get various other images with similar colors and sometimes a similar shape. Heck, there are other ramen images in the results, but I think those ramen images should be at the top, since we input a ramen image and our context here is ramen.
So here is my question: would it be possible to create software that can understand the context of an image? How can we express that context in software?
Man, you just pointed out the very reason why so many people work on computer vision.
It is quite easy to describe objects mathematically: color, shape, density, ...
All of those can be calculated easily.
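For instance, a coarse colour histogram is about the simplest such descriptor; a sketch with OpenCV (the file names are made up):

// Sketch of the kind of "easy" descriptor meant above: a coarse HSV colour
// histogram. Two ramen photos will often have similar histograms, but so will
// many unrelated orange-ish images, which is exactly the problem described below.
#include <opencv2/opencv.hpp>
#include <cstdio>

cv::Mat colourHistogram(const cv::Mat& bgr)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    int          histSize[] = {30, 32};                  // 30 hue bins, 32 saturation bins
    float        hueRange[] = {0, 180}, satRange[] = {0, 256};
    const float* ranges[]   = {hueRange, satRange};
    int          channels[] = {0, 1};

    cv::Mat hist;
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);     // make differently sized images comparable
    return hist;
}

int main()
{
    cv::Mat a = cv::imread("ramen1.jpg"), b = cv::imread("ramen2.jpg");   // made-up files
    double similarity = cv::compareHist(colourHistogram(a), colourHistogram(b),
                                        cv::HISTCMP_CORREL);
    std::printf("histogram correlation: %f\n", similarity);
    return 0;
}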
But computer vision becomes very complex when talking about "real life objects".
Viewing angle, lighting, and plain inconsistency make it almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself: what makes the object I want to recognize unique?
What descriptor can I use that no other object possesses?
Ask yourself that question for this ramen. Let's say I simply want to detect ramen.
What if the color of the soup changes? What if the meat is bigger?
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can solve this kind of problem in a generic way, I think you can register for the Nobel Prize :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it: even Google's image search algorithm sucks when you feed it ramen.
It is pretty efficient with sudoku though, as it knows exactly what it is searching for.
All the difference is made in training, where you give the algorithm a set of assumptions to help it.
So basically you've got it: either you create a computer vision system that is really good at detecting one thing based on a lot of assumptions, or an "ok" but quite generic one :).
The choice mostly depends on your application.

Is there a way to compare two hand-drawn drawings?

For an experiment, we are looking for a way to automatically compare two hand (mouse-)drawn images. These images are, for instance, drawn on an HTML5 canvas element and we need some way to see whether the pictures roughly match.
So, if someone draws a house, we need to test whether the second drawing looks like the first house. It doesn't matter what exactly is in the image; we only need to know whether the two images look alike. I.e., we want to know whether the person drawing the picture can redraw roughly the same picture. The exact orientation of the lines, the size of the image, or the position of the picture on the canvas shouldn't matter.
Is there, by any chance, a library or project that does this?
Yes, absolutely.
Both Face.com and ImageMagick provide APIs for this kind of thing:
http://www.imagemagick.org/api/compare.php
FaceAPI - Track Faces from a Webcam: http://faceapi.com
They might be useful for your requirement.
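Beyond those APIs, and only as a rough illustration of the kind of comparison asked for (tolerant to translation, scale and rotation), OpenCV's Hu-moment matching could look something like this; the file names and the threshold are made up:

// Sketch of one possible approach (not the APIs above): compare the largest
// contour of each drawing with Hu-moment matching (cv::matchShapes), which is
// invariant to translation, scale and rotation. Threshold and files are made up.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

static std::vector<cv::Point> largestContour(const cv::Mat& drawing)
{
    cv::Mat bin;
    cv::cvtColor(drawing, bin, cv::COLOR_BGR2GRAY);
    // Drawings are dark strokes on a light canvas, so invert while binarizing.
    cv::threshold(bin, bin, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return {};

    std::size_t best = 0;
    for (std::size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[best]))
            best = i;
    return contours[best];
}

int main()
{
    cv::Mat a = cv::imread("house1.png"), b = cv::imread("house2.png");   // made-up files
    double dissimilarity = cv::matchShapes(largestContour(a), largestContour(b),
                                           cv::CONTOURS_MATCH_I1, 0.0);
    // Smaller means more alike; the cut-off would have to be tuned on real drawings.
    std::printf("%s (score %f)\n",
                dissimilarity < 0.1 ? "roughly match" : "differ", dissimilarity);
    return 0;
}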
Ultimately, we didn't find a library that does exactly this, so we modified the original project's goals and crowdsourced the comparison that needed to be done.
You can probably also use Mechanical Turk or something similar for the time being, depending on how much you wish to pay for it.
I think researchers are currently working on technical solutions to the issue described above.

Debugging of image processing code

What kind of debugging is available for image processing/computer vision/computer graphics applications in C++? What do you use to track errors/partial results of your method?
What I have found so far is just one tool for online and one for offline debugging:
bmd: attaches to a running process and enables you to view a block of memory as an image
imdebug: enables printf-style debugging
Both are quite outdated and not really what I would expect.
What would seem useful for offline debugging is some kind of image logging: let's say a set of commands which let you write images together with text (probably in the form of HTML, maybe hierarchical), easy to switch off at both compile time and run time, and as unobtrusive as possible.
The output could look like this (output from our simple tool):
http://tsh.plankton.tk/htmldebug/d8egf100-RF-SVM-RBF_AC-LINEAR_DB.html
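To illustrate, a stripped-down sketch of the kind of interface I mean (a hypothetical ImageLog class; OpenCV is used here only for writing the image files, and the macro compiles away when disabled):

// Hypothetical sketch of an HTML image logger: each call dumps the image next
// to the HTML file and appends an <img> tag with a caption. IMGLOG compiles to
// a no-op unless IMGLOG_ENABLED is defined.
#include <opencv2/opencv.hpp>
#include <fstream>
#include <string>

class ImageLog
{
public:
    explicit ImageLog(const std::string& htmlPath)
        : html_(htmlPath), dir_(htmlPath.substr(0, htmlPath.find_last_of('/') + 1))
    {
        html_ << "<html><body>\n";
    }
    ~ImageLog() { html_ << "</body></html>\n"; }

    void log(const std::string& caption, const cv::Mat& img)
    {
        std::string name = "img" + std::to_string(counter_++) + ".png";
        cv::imwrite(dir_ + name, img);
        html_ << "<p>" << caption << "</p><img src=\"" << name << "\">\n";
    }

private:
    std::ofstream html_;
    std::string   dir_;
    int           counter_ = 0;
};

#ifdef IMGLOG_ENABLED
#  define IMGLOG(logger, caption, img) (logger).log((caption), (img))
#else
#  define IMGLOG(logger, caption, img) ((void)0)
#endif

Usage would then be something like IMGLOG(log, "after thresholding", binaryImage); scattered through the pipeline, but I would prefer an existing, more polished tool.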
Are you aware of some code that goes in this direction?
I would be grateful for any hints.
Coming from a ray tracing perspective, maybe some of those visual methods are also useful to you (it is one of my plans to write a short paper about such techniques):
Surface Normal Visualization. Helps to find surface discontinuities. (No image handy; the look is very much reminiscent of normal maps. A small self-contained sketch of the mapping follows at the end of this list.)
color <- rgb (normal.x+0.5, normal.y+0.5, normal.z+0.5)
Distance Visualization. Helps to find surface discontinuities and errors in finding a nearest point. (image taken from an abandoned ray tracer of mine)
color <- (intersection.z-min)/range, ...
Bounding Volume Traversal Visualization. Helps visualize a bounding volume hierarchy or other hierarchical structure (e.g. kd-trees), and shows the traversal hotspots, like a code profiler. (tbp of http://ompf.org/forum coined the term kd-vision.)
color <- number_of_traversal_steps/f
Bounding Box Visualization (image from picogen or so, some years ago). Helps to verify the partitioning.
color <- const
Stereo. Maybe useful in your case for checking the actual stereoscopic appearance. I must admit I never used this for debugging, but when I think about it, it could prove really useful when implementing new types of 3D primitives and trees (image from gladius, which was an attempt to unify realtime and non-realtime ray tracing).
You just render two images from slightly shifted positions, focused on the same point.
Hit-or-not visualization. May help to find epsilon errors. (image taken from metatrace)
if (hit) color = const_a;
else color = const_b
Some hybrid of several techniques.
Linear interpolation: lerp(debug_a, debug_b)
Interlacing: if(y%2==0) debug_a else debug_b
Any combination of ideas, for example the color-tone from Bounding Box Visualization, but with actual scene-intersection and lighting applied
You may find some more glitches and debugging imagery on http://phresnel.org , http://phresnel.deviantart.com , http://picogen.deviantart.com , and maybe http://greenhybrid.deviantart.com (an old account).
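For the surface normal visualization mentioned first, a tiny self-contained sketch (the scene is just a single made-up sphere; the point is only the normal-to-colour mapping):

// Illustration of the "color <- normal * 0.5 + 0.5" mapping above: ray-cast a
// single sphere orthographically and write the encoded normals to a PPM file.
#include <cmath>
#include <cstdio>

int main()
{
    const int W = 256, H = 256;
    std::FILE* f = std::fopen("normals.ppm", "wb");
    if (!f) return 1;
    std::fprintf(f, "P6\n%d %d\n255\n", W, H);

    for (int y = 0; y < H; ++y)
    for (int x = 0; x < W; ++x)
    {
        // Orthographic ray through (u,v), unit sphere centred at the origin.
        double u  = (x + 0.5) / W * 2.0 - 1.0;
        double v  = 1.0 - (y + 0.5) / H * 2.0;
        double r2 = u * u + v * v;

        unsigned char rgb[3] = {0, 0, 0};                   // background stays black
        if (r2 <= 1.0)
        {
            double n[3] = {u, v, std::sqrt(1.0 - r2)};      // surface normal at the hit
            for (int c = 0; c < 3; ++c)
                rgb[c] = (unsigned char)((n[c] * 0.5 + 0.5) * 255.0);
        }
        std::fwrite(rgb, 1, 3, f);
    }
    std::fclose(f);
    return 0;
}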
Generally, I prefer to dump the byte array of the currently processed image as raw RGB triplets and run ImageMagick to create a numbered PNG from it, e.g. img01.png. This way I can trace the algorithms very easily. ImageMagick is run from within the program via a system call, which makes it possible to debug without using any external libraries for image formats.
Another option, if you are using Qt, is to work with QImage and call img.save("img01.png") from time to time, the way printf is used for debugging.
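A sketch of the raw-triplet variant (the file names and the exact convert flags are just one way to do it; ImageMagick has to be on the PATH):

// Write the pixels as raw 8-bit RGB and let ImageMagick's `convert` turn them
// into a numbered PNG (img00.png, img01.png, ...). Paths are placeholders.
#include <cstdio>
#include <cstdlib>
#include <cstddef>

void dumpAsPng(const unsigned char* rgb, int width, int height, int index)
{
    std::FILE* f = std::fopen("frame.raw", "wb");
    if (!f) return;
    std::fwrite(rgb, 1, std::size_t(width) * height * 3, f);
    std::fclose(f);

    char cmd[256];
    std::snprintf(cmd, sizeof(cmd),
                  "convert -size %dx%d -depth 8 rgb:frame.raw img%02d.png",
                  width, height, index);
    std::system(cmd);                        // requires ImageMagick on the PATH
}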
It's a bit primitive compared to what you are looking for, but I have done what you suggested in your OP using standard logging and by writing image files. Typically, the logging and signal export processes and staging live in unit tests.
Signals are given identifiers (often the input filename), which may be augmented (often with the process name or stage).
For developing processors, it's quite handy.
Adding HTML for the messages would be simple. In that context, you could produce viewable HTML output easily: you would not need to generate any HTML yourself, just use HTML template files and insert the messages.
I would just do it myself (as I've done multiple times already for multiple signal types) if you get no good referrals.
In Qt Creator you can watch images being modified while stepping through the code in the normal C++ debugger; see e.g. http://labs.qt.nokia.com/2010/04/22/peek-and-poke-vol-3/

Detecting transparent glass in images

Are there any methods in the computer vision literature that allow for detecting transparent glass in images? For example, if I have an image of a car, can I detect the windows?
All methods I've found so far are active methods (i.e. they require calibration, control over the environment, or lasers). I need a passive method (i.e. all you have is an image, or multi-view images of the object, and that's it).
Here is some very recent work aimed at detecting transparent objects in a general setting.
http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf
http://videolectures.net/nips09_fritz_alfm/
I think what you are looking for is detection of translucent regions. There is very limited work here, since it is a very hard problem. Basically, it is a major chicken-and-egg problem: translucent regions cause almost all fundamental image processing tools (e.g. motion estimation, feature matching, tracking) to fail, yet you must use such tools to detect translucent regions. Anyway, to my knowledge this is the most recent piece of work in the area, and I doubt there is any other.
http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Misc.Icip2011/CVPR_new.pdf
It was published at CVPR, which is a top conference in computer vision.
Just a wild guess: if the camera is moving and you perform a 3D reconstruction of the scene, you could detect large discontinuities in the reconstruction at the reflective regions.
I think you should provide a clearer description of what you are trying to achieve.
The paper "Deriving intrinsic images from image sequences" shows some results with transparencies.
If you are close enough, you may be able to use the glass refraction (a la Snell's law) to detect the glass from multiple views.
I also think that reflections (specular regions) are a good indicator of curved glass.
Detecting it is one thing, but separating it is another. Separation is possible because it is like mixing two sounds, with one of them 180 degrees out of phase: if you manage to learn one of the sounds by itself, you get the other one automatically, and you could then learn that one too. I'm stuck at the point where I can only superimpose/subtract them if I learned them by themselves. So the real gain here would be somehow learning this mixture as two separate things, even though you never saw them apart.

CoreImage for Win32

For those not familiar with Core Image, here's a good description of it:
http://developer.apple.com/macosx/coreimage.html
Is there something equivalent to Apple's CoreImage/CoreVideo for Windows? I looked around and found the DirectX/Direct3D stuff, which has all the underlying pieces, but there doesn't appear to be any high level API to work with, unless you're willing to use .NET AND use WPF, neither of which really interest me.
The basic idea would be to create/load an image, attach any number of filters that can be chained together into a graph, and then render the image to an HDC, using the GPU to do most of the hard work. DirectX/Direct3D has these pieces, but you have to jump through a lot of hoops (or so it appears) to use them.
There are a variety of tools for working with shaders (such as RenderMonkey and FX-Composer), but no direct equivalent to CoreImage.
But stacking up fragment shaders on top of each other is not very hard, so if you don't mind learning OpenGL it would be quite doable to build a framework that applies shaders to an input image and draws the result to an HDC.
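A CPU-only sketch of that chained-filter idea (all names here are made up; in the real thing each Filter would be a fragment-shader pass over a texture, and render() would blit the final texture to the target HDC):

// Made-up illustration of the filter-chain structure described above. Filters
// are chained and each stage consumes the previous stage's output; a GPU
// version would run each stage as a fragment shader instead of on the CPU.
#include <cstdint>
#include <functional>
#include <vector>

struct Image
{
    int width = 0, height = 0;
    std::vector<std::uint8_t> rgba;             // width * height * 4 bytes
};

using Filter = std::function<Image(const Image&)>;

class FilterChain
{
public:
    FilterChain& then(Filter f) { filters_.push_back(std::move(f)); return *this; }

    Image render(Image img) const
    {
        for (const Filter& f : filters_)        // run the chain front to back
            img = f(img);
        return img;
    }

private:
    std::vector<Filter> filters_;
};

// Example filter: naive invert, standing in for what a shader would do per pixel.
Image invert(const Image& in)
{
    Image out = in;
    for (std::size_t i = 0; i + 3 < out.rgba.size(); i += 4)
        for (int c = 0; c < 3; ++c)
            out.rgba[i + c] = static_cast<std::uint8_t>(255 - out.rgba[i + c]);
    return out;
}

Something like FilterChain().then(invert).then(someBlur).render(source) then gives the CoreImage-style usage; the hard part is the GL plumbing rather than the chaining itself.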
Adobe's new Pixel Blender is the closest technology out there. It is cross-platform -- it's part of the Flash 10 runtime, as well as the key pixel-oriented CS4 apps, namely After Effects and (soon) Photoshop. It's unclear, however, how much of it is currently exposed for embedding in other applications. In the most extreme case it should be possible to embed it via a Flash view, but that is obviously more overhead than would be ideal.
There is also at least one smaller-scale 3rd party offering: Conduit Pixel Engine. It is commercial, with no licensing price clearly listed, however.
I've now got a solution to this. I've implemented an ImageContext class, a special Image class, and a Filter class that provide functionality similar to Apple's CoreImage. All three use OpenGL (I gave up trying to get this to work on DirectX due to image quality issues; if someone knows DirectX well, contact me, because I'd love to have a DX version) to render images to a context, with the filters applying their effects as GLSL fragment shaders. There's a brief write-up here:
ImageKit
with a screen shot of an example filter and some sample source code.