Water (pool, puddle) segmentation algorithm

Water (pool, puddle) segmentation algorithm - computer-vision

Is there a generic computer vision technique that can be used to detect water (puddles, pools...) in a video? The video should be acquired from a camera attached to a drone, and this drone should not be too far above the water (10 to 30 meters above).
I'm specifying that the water should be in a pool or puddle, because the water should be standing, not moving in relation to its surroundings.

Well, it was quite an interesting task to verify multi-color segmentation can handle pools. Long story short, it definitely can, but pools are quite tricky anyway.
First of all, it will not be enough to have "water detection and color variation" paper. You'll also need "pools design and color variation" handbook.
Moreover, specific landscape, architecture, point of view, etc provide more colors to be used.
It might be easier to train a neural net to recognize some specific patterns, but blue color variations as a poor man solution might also do the trick.
Here is a complete 4K example

Related

Shouldn't there be some adjustments for google cardboard?

Shouldn't there be some adjustments for google cardboard? With all different sizes of phones and with everyone having a bit of differences in how far apart our eyes are I was looking for a way to re position the two images closer so that it looked better. I don't need to use all the pixels and I'm thinking if you allowed adjustments to the center placement of each view that this could be more usable. As is I have to hold the phone a bit further from me to see a good image.

The Cardboard is open "technology" and you are free to adjust it to your own personal needs - no one is going to do that for you. If you are on a bigger budget, there are cheap plastic headsets available from various manufacturers. I got my headset for around 35$ with shipping.
I personally use a Color Cross but there are many others. Just make sure to look for some with open back, so you can plug in headphones, for example, or use the camera once that becomes a thing. An adjustable phone holder is a big plus, so be on the lookout for that too. Another important thing is adjustable IPD (Inter Pupillary Distance) for the lenses in the headset - some headsets with fixed lense distance gave me the cross-eyed effect. Also, many headsets have adjustable lens-to-phone distance, which also can be important.
Please note that all this is necessary for an okay-ish experience, and for the very best one available, you should get a whole integrated headset, like the Sony Morpherus, Oculus Rift or SteamVR. Also bear in mind that this technology is still in the RnD phase and there are many problems to be solved.
For an interesting read on some of these problems, check this out:
http://media.steampowered.com/apps/valve/2013/MAbrashGDC2013.pdf

Generate an image that can be most easily detected by Computer Vision algorithms

Working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It lead me to an interesting question:
Using feature detection to find known objects in an image isn't always easy- objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much

I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And papeer about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf

If you plan to use feature detection, than marker should be specific to used feature detector. Common practice for detector design is good response to "corners" or regions with high x,y gradients. Also you should note the scaling of target.
The simplest detection can be performed with BLOBS. It can be faster and more robust than feature points. For example you can detect circular blobs or rectangular.

Depending on the distance you want to see your markers from and viewing conditions/backgrounds you typically use and camera resolution/noise you should choose different images/targets. Under moderate perspective from a longer distance a color target is pretty unique, see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
at close distances various bar/QR codes may be a good choice. Other than that any flat textured object will be easy to track using homography as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
then comes the whole field of hardware, structured light, synchronized markers, etc, etc. Kinect, for example, uses a predefined pattern projected on the surface to do stereo. This means it recognizes and matches million of micro patterns per second creating a depth map from the matched correspondences. Note that one camera sees the pattern and while another device - a projector generates it working as a virtual camera, see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use pNp function of open cv:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
this literally can be done by calling just two functions
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessCornersDots(dst, chessboardSize, corners, found);
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing condition, camera specs, backgrounds, distances, amount of motion and perspective you expect to have indoors vs outdoors, etc. There is no such a thing as a general average case in computer vision!

Looking to develop a visual odometer (distance traveled) APP for indoor use

Are there any open source code which will take a video taken indoors (from a smart phone for example of a home or office buildings, hallways) and superimpose that on a 2D picture showing the path traveled? This can be a handr drawn picture or a photo of a floor layout.
First I thought of doing this using the accelerometer and compass sensors but thought that perhaps one can get better accuracy with the visual odometer approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no gps) for superimposing that data on the path traveled (this is the real application of this project and we know how to do this part). The post processing of the video can be done later on a stand alone computer so speed and cpu power is not a issue.
Challenges -
The user will simply hand carry the smart phone so the video taker is moving (walking) and not fixed
limit the video rate to keep the file size small (5 frames/sec? is that ok?). Typically need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
any help or guidance is appreciated Thanks

I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision based navigation using just a cellphone camera is very difficult. Most of the literature with great results show ~1% distance traveled as state-of-the-art but is usually using stereo cameras. Stereo helps a great deal, particularly in indoor environments for coping with scale drift. I've worked on a system which achieves 0.5% distance traveled for stereo but only roughly 5% distance traveled for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
Multi-sensor is way to go. Though other sensors are worse than vision by themselves.
I've heard some good work with accelerometers mounted on the foot to do ZUPT (zero velocity updates) when the foot is briefly motionless on the ground while taking a step in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
Compass is interesting but will be distracted by the ton of metal within an office building. Translating few feet around a large metal cabinet might cause 50+ degrees of directional jump.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do wifi triangulation? Does it need to be an initial exploration? If you can go through the environment before hand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.

I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization" the problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The amount of images/second you need depends on the speed of the camera. More is better, but my guess is that 5/second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.

Detecting transparent glass in images

Are there any methods in the computer vision literature that allows for detecting transparent glass in images? Like if I have an image of a car, can I detect windows? etc...
All methods I've found so far are active methods (i.e. require calibration, control over the environment or lasers). I need a passive method (i.e. all you have is an image, or multi-view images of the object and thats it).

Here is some very recent work aimed at detecting transparent objects in a general setting.
http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf
http://videolectures.net/nips09_fritz_alfm/

I think what you looking for is detection of translucent regions. There is very limited work here since it is a very hard problem. Basically it is a major chicken and egg problem. Translucent regions cause almost all fundamental image processing tools to fail (e.g. motion estimation, feature matching, tracking, etc...). Yet you must use such tools to detect translucent regions. Anyway, up to my knowledge this is the most recent piece of work in this area and I doubt there is any other.
http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Misc.Icip2011/CVPR_new.pdf
It is published in CVPR which is a top conference in Computer Vision.

Just a wild guess: if the camera is moving and you perform a 3D reconstruction of the scene, you could detect large discontinuities of the reconstructions at the reflected regions.

I think you should provide a clearer description of what your are trying to achieve.
The paper "Deriving intrinsic images from image sequences" shows some results with transparencies.
If you are close enough, you may be able to use the glass refraction (a la Snell's law) to detect the glass from multiple views.
I also think that reflections (specular regions) are a good indication for curved glasses.

Detecting it is one thing, but separating is another. You can do separation because its like putting 2 sounds with 1 of the sounds 180 degree out of phase. If you manage to learn the phasing sound by itself, you have the other sound automatically, so you could then learn that one too. Im stuck at the point where I can only superimposesubtract them if I learnt them by themselves. So the real gain here is somehow learning this addup, as 2 separate things, even though you never saw them apart.

Is stereoscopy (3D stereo) making a come back?

I'm working on a stereoscopy application in C++ and OpenGL (for medical image visualization). From what I understand, the technology was quite big news about 10 years ago but it seems to have died down since. Now, many companies seem to be investing in the technology... Including nVidia it would seem.
Stereoscopy is also known as "3D Stereo", primarily by nVidia (I think).
Does anyone see stereoscopy as a major technology in terms of how we visualize things? I'm talking in both a recreational and professional capacity.

With nVidia's 3D kit you don't need to "make a stereoscopy application", drivers and video card take care of that. 10 years ago there was good quality stereoscopy with polarized glasses and extremely expensive monitors and low quality stereoscopy with red/cyan glasses. What you have now is both cheap and good quality. Right now all you need is 120Hz LCD, entry level graphics card and $100 shutter glasses.
So no doubt about it, it will be the next big thing. At least in entertainment.

One reason why it is probably coming back is due to the fact that we know have screens with high enough refreshrate so that 3D is possible. I think I read that you will need somewhere around 100Hz for 3D-TV. So, no need for bulky glasses anymore.
Edit: To reiterate: You no longer need glasses in order to have 3D TV. This article was posted in a swedish magazine a few weeks ago: http://www.nyteknik.se/nyheter/it_telekom/tv/article510136.ece.
What it says is basically that instead of glasses you use a technique with vertical lenses on the screen. Problem with CRT is that they are not flat. Our more modern flat screens obviously hasn't got this problem.
The second problem is that you need high frequency (at least 100 Hz as that makes the eye get 50 frames per second) and a lot of pixels, since each eye only gets half the pixels.
TV sets that support 3D without glasses have been sold by various companies since 2005.

Enthusiasm for stereo display seems to come and go in cycles of hype and disappointment (e.g cinema). I don't expect TV and PCs will be any different.
For medical visualisation, if it was that useful there would be armies of clinicians sitting in front of expensive displays wearing shutter glasses already. Big hint: there aren't. And that market doesn't need 3D display tech to reach "impulse purchase" pricing levels as an enabler.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js