Detecting transparent glass in images - computer-vision

Are there any methods in the computer vision literature that allow for detecting transparent glass in images? For instance, if I have an image of a car, can I detect the windows, etc.?
All the methods I've found so far are active methods (i.e. they require calibration, control over the environment, or lasers). I need a passive method (i.e. all you have is an image, or multi-view images of the object, and that's it).

Here is some very recent work aimed at detecting transparent objects in a general setting.
http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf
http://videolectures.net/nips09_fritz_alfm/

I think what you're looking for is detection of translucent regions. There is very limited work here since it is a very hard problem; it is essentially a chicken-and-egg problem: translucent regions cause almost all fundamental image processing tools to fail (e.g. motion estimation, feature matching, tracking), yet you need such tools to detect translucent regions. Anyway, to my knowledge this is the most recent piece of work in this area, and I doubt there is any other.
http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Misc.Icip2011/CVPR_new.pdf
It was published at CVPR, which is a top conference in computer vision.

Just a wild guess: if the camera is moving and you perform a 3D reconstruction of the scene, you could detect large discontinuities in the reconstruction at the reflective regions.
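A minimal sketch of that idea, assuming you already have a dense depth map from the reconstruction stored as a single-channel float image; the file name and the 0.5 threshold are placeholders:
// Sketch: flag pixels where the reconstructed depth jumps sharply,
// which are candidate glass/reflection regions.
// Assumes "depth.exr" holds a single-channel float depth map.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat depth = cv::imread("depth.exr", cv::IMREAD_UNCHANGED); // CV_32F depth map
    cv::Mat dx, dy, mag;
    cv::Sobel(depth, dx, CV_32F, 1, 0);
    cv::Sobel(depth, dy, CV_32F, 0, 1);
    cv::magnitude(dx, dy, mag);
    cv::Mat discontinuities = mag > 0.5f; // threshold on the depth gradient (placeholder value)
    cv::imwrite("discontinuities.png", discontinuities);
    return 0;
}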

I think you should provide a clearer description of what you are trying to achieve.
The paper "Deriving intrinsic images from image sequences" shows some results with transparencies.
If you are close enough, you may be able to use the glass refraction (a la Snell's law) to detect the glass from multiple views.
I also think that reflections (specular regions) are a good indication of curved glass.
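For the specular cue, a rough sketch: bright, desaturated pixels are a crude but common proxy for specular highlights. The input file name and the thresholds below are only illustrative:
// Sketch: crude specular-highlight mask as a hint for glossy/glass surfaces.
// Saturation < 40 and value > 220 are placeholder thresholds to tune.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat bgr = cv::imread("car.jpg");
    cv::Mat hsv, specular;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 0, 220), cv::Scalar(180, 40, 255), specular);
    cv::imwrite("specular_mask.png", specular);
    return 0;
}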

Detecting it is one thing, but separating it is another. You can do the separation because it's like mixing two sounds where one of them is 180 degrees out of phase: if you manage to learn the phasing sound by itself, you have the other sound automatically, so you could then learn that one too. I'm stuck at the point where I can only superimpose/subtract them if I learned them by themselves. So the real gain here is somehow learning this mixture as two separate things, even though you never saw them apart.
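In image terms that superposition argument is just: observed = layer A + layer B, so if you somehow learned layer A, layer B falls out by subtraction. A trivial sketch of that last step (the file names are made up; learning the layers is the actual hard part):
// Sketch: if the observed image is (approximately) the sum of two layers
// and one layer is known, the other layer is recovered by subtraction.
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat observed = cv::imread("observed.png", cv::IMREAD_GRAYSCALE);
    cv::Mat layerA   = cv::imread("known_layer.png", cv::IMREAD_GRAYSCALE);
    cv::Mat layerB;
    cv::subtract(observed, layerA, layerB); // saturating per-pixel subtraction
    cv::imwrite("recovered_layer.png", layerB);
    return 0;
}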


Shouldn't there be some adjustments for google cardboard?

Shouldn't there be some adjustments for Google Cardboard? With all the different sizes of phones, and with everyone's eyes being a slightly different distance apart, I was looking for a way to reposition the two images closer together so that it looks better. I don't need to use all the pixels, and I'm thinking that if you allowed adjustments to the center placement of each view, this could be more usable. As it is, I have to hold the phone a bit further from me to see a good image.
The Cardboard is open "technology" and you are free to adjust it to your own personal needs - no one is going to do that for you. If you are on a bigger budget, there are cheap plastic headsets available from various manufacturers. I got my headset for around $35 with shipping.
I personally use a Color Cross, but there are many others. Just make sure to look for one with an open back, so you can plug in headphones, for example, or use the camera once that becomes a thing. An adjustable phone holder is a big plus, so be on the lookout for that too. Another important thing is adjustable IPD (inter-pupillary distance) for the lenses in the headset - some headsets with a fixed lens distance gave me a cross-eyed effect. Also, many headsets have an adjustable lens-to-phone distance, which can also be important.
Please note that all this is only enough for an okay-ish experience; for the very best one available, you should get a whole integrated headset, like the Sony Morpheus, Oculus Rift or SteamVR. Also bear in mind that this technology is still in the R&D phase and there are many problems to be solved.
For an interesting read on some of these problems, check this out:
http://media.steampowered.com/apps/valve/2013/MAbrashGDC2013.pdf

Generate an image that can be most easily detected by Computer Vision algorithms

I'm working on a small side project related to computer vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitation that they wanted QR codes to be simple and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at the ArToolkit, ArToolkitPlus or Aruco libraries; they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be tailored to the feature detector you use. Common practice for marker design is a good response to "corners", i.e. regions with high x and y gradients. You should also take the scaling of the target into account.
The simplest detection can be performed with blobs. It can be faster and more robust than feature points; for example, you can detect circular or rectangular blobs.
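A minimal sketch of blob-based detection with OpenCV's SimpleBlobDetector; the parameter values and the input file are only examples:
// Sketch: detect roughly circular blobs as a simple, fast alternative to feature points.
// The parameter values below are placeholders and need tuning for a real marker.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("marker.png", cv::IMREAD_GRAYSCALE);

    cv::SimpleBlobDetector::Params params;
    params.filterByCircularity = true;
    params.minCircularity = 0.8f;   // prefer circular blobs
    params.filterByArea = true;
    params.minArea = 100.0f;        // ignore tiny speckles

    cv::Ptr<cv::SimpleBlobDetector> detector = cv::SimpleBlobDetector::create(params);
    std::vector<cv::KeyPoint> blobs;
    detector->detect(img, blobs);

    cv::Mat out;
    cv::drawKeypoints(img, blobs, out, cv::Scalar(0, 0, 255),
                      cv::DrawMatchesFlags::DRAW_RICH_KEYPOINTS);
    cv::imwrite("blobs.png", out);
    return 0;
}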
Depending on the distance you want to see your markers from, the viewing conditions/backgrounds you typically have, and the camera resolution/noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty distinctive; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easy to track using a homography, as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
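A condensed sketch of the feature-plus-homography approach from that tutorial, using ORB features and brute-force matching; the file names are placeholders:
// Sketch: match ORB features between a flat reference image and a camera frame,
// then estimate a homography that localizes the object in the frame.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat ref   = cv::imread("reference.png", cv::IMREAD_GRAYSCALE);
    cv::Mat frame = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);

    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(ref, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(frame, cv::noArray(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING, true); // cross-check matching
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    if (matches.size() < 4) return 1; // need at least 4 correspondences

    std::vector<cv::Point2f> pts1, pts2;
    for (const cv::DMatch& m : matches) {
        pts1.push_back(kp1[m.queryIdx].pt);
        pts2.push_back(kp2[m.trainIdx].pt);
    }
    cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0); // robust homography
    std::cout << "H = " << H << std::endl;
    return 0;
}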
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. The Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the chessboard and PnP (solvePnP) functions of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
bool found = cv::findChessboardCorners(src, chessboardSize, corners, camFlags); // locate the inner corners
cv::drawChessboardCorners(dst, chessboardSize, corners, found);                 // draw the detected corners
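A slightly fuller, self-contained sketch that also estimates the board pose with solvePnP; the board size, square size and camera intrinsics below are placeholders you would replace with your own calibration:
// Sketch: detect a chessboard and estimate its pose with solvePnP.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat src = cv::imread("board.jpg");
    cv::Size boardSize(9, 6); // inner corners per row and column (placeholder)
    std::vector<cv::Point2f> corners;
    bool found = cv::findChessboardCorners(src, boardSize, corners);
    if (!found) return 1;
    cv::drawChessboardCorners(src, boardSize, corners, found);

    // 3D corner positions on the board plane (25 mm squares, z = 0)
    std::vector<cv::Point3f> objectPoints;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            objectPoints.push_back(cv::Point3f(25.0f * x, 25.0f * y, 0.0f));

    // Placeholder intrinsics; use a real calibration in practice
    cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320, 0, 800, 240, 0, 0, 1);
    cv::Mat dist = cv::Mat::zeros(4, 1, CV_64F);

    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, corners, K, dist, rvec, tvec);
    std::cout << "translation = " << tvec << std::endl;
    cv::imwrite("detected_board.jpg", src);
    return 0;
}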
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!

Possibility of creating a software that can recognize context of an image?

I raised this question out of curiosity while using Google Goggles and Google's "Search by Image".
If you try giving Google an image to search for, it can show you some results. Identical images work best (of course), but a photo you have taken of some arbitrary object can be difficult.
I guess Google Goggles works around this a bit by using text recognition and image matching. If text recognition finds the text "SONY", for instance, then things might get simpler, and if a brand's logo is detected, things should be simpler as well. The same goes for other famous brands and famous landmarks, such as the Eiffel Tower. Having text or a brand's logo can make things much easier to recognize.
But if we are to search for something more obscure (need a better wording here), for instance, take this ramen image.
If you put this image into Google, you will get various other images that have similar colors and sometimes similar shapes. Heck, there are other ramen images in the results, but I think it would be better if those ramen images were at the top, since we input a ramen image and our context here is ramen.
So here is my question, will it be possible to create such a software that can understand the context of the image? How can we express the context in the software?
Man, you just pointed out the very reason why so many people work on computer vision.
It is quite easy to mathematically describe objects: color, shape, density, ...
All of those can be calculated easily.
But computer vision becomes very complex when talking about "real-life objects".
Viewing angle, luminosity, and plain inconsistency make it almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself: what makes the object I want to recognize unique?
What descriptor can I use that no other object possesses?
Ask yourself that question for this ramen. Let's say I simply want to detect ramen.
What if the color of the soup changes? What if the meat is bigger?
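To make the descriptor question concrete, here is a rough sketch of about the simplest descriptor there is, a hue histogram comparison; it fails for exactly the reason above, since a different soup color produces a different histogram (the file names are made up):
// Sketch: compare two images with a hue histogram, the crudest kind of descriptor.
// It will happily confuse ramen with anything of similar color, which is the point.
#include <opencv2/opencv.hpp>
#include <iostream>

cv::Mat hueHistogram(const cv::Mat& bgr)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    int histSize = 30;                 // 30 hue bins
    float range[] = {0, 180};
    const float* ranges[] = {range};
    int channels[] = {0};              // hue channel only
    cv::Mat hist;
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);
    return hist;
}

int main()
{
    cv::Mat a = cv::imread("ramen1.jpg");
    cv::Mat b = cv::imread("ramen2.jpg");
    double similarity = cv::compareHist(hueHistogram(a), hueHistogram(b), cv::HISTCMP_CORREL);
    std::cout << "hue-histogram similarity: " << similarity << std::endl;
    return 0;
}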
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can find the solution to this kind of problem in a generic way, I think you can register for the Nobel Prize :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it: even Google's image search algorithm struggles when you feed it ramen.
It is pretty efficient with sudoku though, as it knows exactly what it is searching for.
All the difference is made in training, where you give a list of assumptions to help the algorithm.
So basically you've got it: either you create a computer vision system that is really good at detecting one thing, based on a lot of assumptions, or one that is "OK" but quite generic :).
The choice mostly depends on your application.

Augmented Reality-PC

I recently saw the virtual mirror concept on YouTube; I tried it out and researched it. It seems the creators have used augmented reality so that people can see the output on their screens. While researching I found out that usually a pattern is identified, onto which a 3D image is superimposed.
Question 1: How are they able to superimpose the jewellery and track the face of the person without identifying any pattern?
I also tried to check various libraries that I can use to make a program similar to the one they show. Seems to me that a lot of people are using Android phones and iPhones and making apps that use augmented reality.
Question 2: Is there any way that I can use C++ and try to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. It's a good experience. :D
I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (getting the person's outline), or they simply track you based on the initial image (with your position/texture being determined by the outline being shown).
Following the video, it's easy to see that there's no real/advanced AR behind this. The images are simply overlaid or hidden (e.g. when it loses track of one ear because you are looking to the side) and they're not transformed (no perspective change or resizing happening). They definitely seem to track the head (or features like the ears, neck, etc.). Depending on your background and surroundings, that's actually a rather trivial task.
Question 2: Sure! There are lots of premade toolsets out there, but you could as well use some general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of pattern (e.g. a card or page with a known pattern) to determine the correct position and transformation for the contents to be added to the image. There are also approaches using the device's orientation and perspective changes in camera images to determine depth/position (I really like this demo).
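As a sketch of that pattern-based idea: once the four corners of a known card have been located in the camera frame (by whatever detector you use), the added content can be warped onto them with a perspective transform. The corner coordinates and file names below are placeholders:
// Sketch: warp an overlay image onto the detected corners of a known card/marker.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat frame   = cv::imread("frame.jpg");
    cv::Mat overlay = cv::imread("overlay.png");

    // Corners of the overlay image and the corresponding detected marker corners in the frame
    std::vector<cv::Point2f> src = {
        {0, 0}, {(float)overlay.cols, 0},
        {(float)overlay.cols, (float)overlay.rows}, {0, (float)overlay.rows}};
    std::vector<cv::Point2f> dst = {
        {220, 180}, {420, 190}, {430, 390}, {210, 380}}; // placeholder detections

    cv::Mat H = cv::getPerspectiveTransform(src, dst);
    cv::Mat warped;
    cv::warpPerspective(overlay, warped, H, frame.size());

    // Paste the warped overlay wherever it is non-black
    cv::Mat mask;
    cv::cvtColor(warped, mask, cv::COLOR_BGR2GRAY);
    warped.copyTo(frame, mask > 0);
    cv::imwrite("augmented.jpg", frame);
    return 0;
}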

SolidWorks API - Electromagnetic Dynamics

Is it possible to simulate custom forces (in my case, electromagnetic) using the SolidWorks API for Animator/Motion Study/COSMOS/EMS?
I'm looking for any combination of APIs that would expose the required data to be able to simulate the dynamics of either electrical positive/negative or magnetic north/south forces.
The very basics of what I need to be able to do is:
Model two cubes
Mark a point on one as having positive charge and the point on the other as negative charge (or north/south magnetism)
Press "Go"
Watch them come together and stick
Once I can figure out how to do this, I can go through with the more complicated code that I'm trying to write (that's not the problem). I'm simply stuck on where to begin. I have searched and searched but cannot find a definitive answer; the documentation is sparse and hard to grasp.
If this is definitely not possible or not worth it to attempt in SolidWorks, then that's an acceptable answer. I never would have chosen SolidWorks if I was left free to pick the platform, but it was chosen for me.
EDIT
It seems COSMOSMotion API's IDDMActionReactionForce class is what I was looking for. Can anyone point me to an example of using it to define a custom force between two objects?
I can't speak about SolidWorks, so my answer may be irrelevant, BUT I have used ray-tracing software to model dynamic systems.
In my case, I was simulating the circumstances of lunar and solar eclipses. The ray-tracing software (POV-Ray) took care of generating an image of the scene, including the Sun, Earth and Moon, but I had to calculate the positions of the various bodies for each frame of the animation.
I suspect this may be the case with modelling electromagnetic dynamics: you will have to calculate the positions of the bodies involved at intervals yourself, so that SolidWorks can render the frames of an animation.
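As a minimal sketch of that "calculate the positions yourself" idea, here are two oppositely charged point masses stepped toward each other with Coulomb's law and explicit Euler integration; the per-frame positions are what you would then feed to SolidWorks, and all the constants are made up:
// Sketch: two oppositely charged point masses attracting along one axis.
// Each printed frame gives the positions you would pass to the animation.
#include <cstdio>

int main()
{
    const double k  = 8.9875e9;          // Coulomb constant [N m^2 / C^2]
    const double q1 = 1e-6, q2 = -1e-6;  // opposite charges [C] (placeholders)
    const double m  = 0.1;               // mass of each cube [kg]
    const double dt = 1e-3;              // time step [s]

    double x1 = 0.0, x2 = 1.0;           // positions along the line of approach [m]
    double v1 = 0.0, v2 = 0.0;           // velocities [m/s]

    for (int frame = 0; frame < 5000; ++frame) {
        double r = x2 - x1;
        if (r < 0.05) { std::printf("frame %d: stuck together\n", frame); break; }

        double F1 = k * q1 * q2 * (x1 - x2) / (r * r * r); // Coulomb force on body 1
        double F2 = -F1;                                    // Newton's third law

        v1 += (F1 / m) * dt;  v2 += (F2 / m) * dt;          // explicit Euler step
        x1 += v1 * dt;        x2 += v2 * dt;

        if (frame % 100 == 0)
            std::printf("frame %d: x1=%.4f  x2=%.4f\n", frame, x1, x2);
    }
    return 0;
}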
I may be all wrong about the capabilities of SolidWorks, so I wish you luck.
I was tempted to say that "it's impossible" because you said it would be "an acceptable answer", but that would be too easy.
After much trying, my conclusion is that SolidWorks is not the appropriate platform for this. It doesn't let you hook into its internal physics calculations, and the Force object I spoke of is far too inefficient for the problem I needed to model. In theory it will work to bring two cubes together alongside SolidWorks' built-in gravity/collision-detection simulation elements, but when confronted with an n-body problem it was apparent that it wasn't made for that.