Is stereoscopy (3D stereo) making a comeback? - c++

I'm working on a stereoscopy application in C++ and OpenGL (for medical image visualization). From what I understand, the technology was quite big news about 10 years ago, but it seems to have died down since. Now many companies seem to be investing in the technology again... including nVidia, it would seem.
Stereoscopy is also known as "3D Stereo", primarily by nVidia (I think).
Does anyone see stereoscopy as a major technology in terms of how we visualize things? I'm talking in both a recreational and professional capacity.

With nVidia's 3D kit you don't need to "make a stereoscopy application"; the drivers and the video card take care of that. Ten years ago you had a choice between good-quality stereoscopy with polarized glasses and extremely expensive monitors, or low-quality stereoscopy with red/cyan glasses. What you have now is both cheap and good quality: all you need is a 120 Hz LCD, an entry-level graphics card and $100 shutter glasses.
So no doubt about it, it will be the next big thing. At least in entertainment.
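That said, if you ever do need to drive the stereo pair yourself from C++/OpenGL (e.g. for a medical viewer that wants full control over the cameras), the classic route is quad-buffered stereo. A minimal sketch, assuming a stereo-capable pixel format and your own drawScene() and eye-separation value:

    // Minimal quad-buffered stereo loop (legacy OpenGL for brevity).
    // Assumes the GL context was created with a stereo pixel format;
    // drawScene() and eyeSep are application-specific.
    void renderStereoFrame(float eyeSep)
    {
        const float halfSep = eyeSep * 0.5f;

        // Left eye: shift the camera half the eye separation to one side.
        glDrawBuffer(GL_BACK_LEFT);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        glTranslatef(+halfSep, 0.0f, 0.0f);   // asymmetric frusta are the proper fix; omitted here
        drawScene();

        // Right eye: shift the other way.
        glDrawBuffer(GL_BACK_RIGHT);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glLoadIdentity();
        glTranslatef(-halfSep, 0.0f, 0.0f);
        drawScene();

        // Swap buffers once per frame from your windowing code (GLUT/Qt/etc.).
    }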

One reason why it is probably coming back is that we now have screens with a high enough refresh rate to make 3D possible. I think I read that you need somewhere around 100 Hz for 3D TV. So, no need for bulky glasses anymore.
Edit: To reiterate: you no longer need glasses to have 3D TV. This article was posted in a Swedish magazine a few weeks ago: http://www.nyteknik.se/nyheter/it_telekom/tv/article510136.ece.
What it says, basically, is that instead of glasses you use a technique with vertical lenses on the screen. The problem with CRTs is that they are not flat; our more modern flat screens obviously don't have this problem.
The second problem is that you need a high frequency (at least 100 Hz, so that each eye gets 50 frames per second) and a lot of pixels, since each eye only gets half of them.
TV sets that support 3D without glasses have been sold by various companies since 2005.

Enthusiasm for stereo display seems to come and go in cycles of hype and disappointment (e.g. cinema). I don't expect TV and PCs will be any different.
For medical visualisation, if it were that useful there would already be armies of clinicians sitting in front of expensive displays wearing shutter glasses. Big hint: there aren't. And that market doesn't need 3D display tech to reach "impulse purchase" pricing levels as an enabler.

Related

Water (pool, puddle) segmentation algorithm

Is there a generic computer vision technique that can be used to detect water (puddles, pools...) in a video? The video should be acquired from a camera attached to a drone, and this drone should not be too far above the water (10 to 30 meters above).
I'm specifying that the water should be in a pool or puddle, because the water should be standing, not moving in relation to its surroundings.
Well, it was quite an interesting task to verify that multi-color segmentation can handle pools. Long story short, it definitely can, but pools are quite tricky anyway.
First of all, it will not be enough to have a "water detection and color variation" paper. You'll also need a "pool design and color variation" handbook.
Moreover, the specific landscape, architecture, point of view, etc. provide even more colors to account for.
It might be easier to train a neural net to recognize some specific patterns, but blue color variations as a poor man's solution might also do the trick.
Here is a complete 4K example
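If you want to try the "poor man's" colour route first, a rough sketch with OpenCV in C++ could look like the following; the HSV thresholds are placeholder guesses, not values from any paper, and would have to be tuned per pool, lighting and camera:

    #include <opencv2/opencv.hpp>

    // Very rough blue-water segmentation by HSV thresholding.
    // The hue/saturation/value ranges below are placeholders and must be
    // tuned for your footage, lighting and pool colours.
    cv::Mat segmentWater(const cv::Mat& bgrFrame)
    {
        cv::Mat hsv, mask;
        cv::cvtColor(bgrFrame, hsv, cv::COLOR_BGR2HSV);

        // OpenCV hue runs 0-179; "blue-ish" is roughly 90-130.
        cv::inRange(hsv, cv::Scalar(90, 50, 50), cv::Scalar(130, 255, 255), mask);

        // Clean up speckles so only large standing-water blobs survive.
        cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(7, 7));
        cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);
        cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);
        return mask;
    }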

DirectX render video

Can someone explain how the intro company logo or other similar video sequences are presented in modern 3D games?
Is it a pre-rendered video sequence, or is it rendered with DirectX to preserve quality, resolution and aspect ratio?
I understand that it may vary from company to company, but I am looking for "best practice" since I can't find this in the documentation.
Pretty much always videos. Displaying a video at a different aspect ratio is almost always handled the same way as cutting off viewing angles on a rendered 3D scene.
Also:
1) they're not always high resolution (sometimes, when playing at 1080p, you can see that the video is actually 720p or much smaller);
2) to avoid quality issues, they can record at a high resolution and scale down for smaller resolutions (it's hard to tell the difference between a scaled-down video and a perfect fit).
On a side note, some games do use real-time 3D scenes for their intros, but that means possible lag (before the video settings are applied) and less complex visual effects and graphics quality, because rendering an animated movie offline is not the same as rendering a game in real time; we're talking a million times longer per frame in the most extreme cases.
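As a side illustration of the scale-to-fit point above, the letterboxing arithmetic is small enough to sketch here (names are illustrative only):

    #include <algorithm>

    // Compute a destination rectangle that fits a video of (videoW x videoH)
    // inside a screen of (screenW x screenH) without distorting it; the
    // remaining border becomes the letterbox/pillarbox.
    struct FitRect { int x, y, w, h; };

    FitRect letterbox(int videoW, int videoH, int screenW, int screenH)
    {
        const double scale = std::min(static_cast<double>(screenW) / videoW,
                                      static_cast<double>(screenH) / videoH);
        const int w = static_cast<int>(videoW * scale);
        const int h = static_cast<int>(videoH * scale);
        // Centre the scaled video on the screen.
        return { (screenW - w) / 2, (screenH - h) / 2, w, h };
    }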

What main factors/features explain the high price of most industrial computer vision hardware? [closed]

I am a student currently working on a computer science project that will soon require computer vision, and more specifically stereoscopy (for depth detection). I am now looking for a good camera to do the job, and I found several interesting options:
1- A custom built set of two cheap cameras (i.e. webcam);
2- The old, classic, economical and proven Kinect;
3- Specialized stereo sensors.
I found a couple months ago this sensor: https://duo3d.com/product/duo-mini-lv1
I thought it was interesting because it is small, stereoscopic and brand new (encouraging a fresh USA company). However, setting aside the additional APIs that come with it, I just don't understand why you would buy this when a Kinect (or "cheap" cameras) is at least 4-5 times less expensive and has comparable, if not better, specifications.
Same applies for this one: http://www.ptgrey.com/bumblebee2-firewire-stereo-vision-camera-systems
Can someone please explain why I would need such a device, or, if I don't, why others do?
The reason you want a "real" stereo camera as opposed to a pair of usb webcams is synchronization. Cameras like the bumblebee have hardware triggering, which takes the two images with virtually no delay in between. With the webcams you will always have a noticeable delay between the two shots. This may be ok if you are looking at a static scene or at a scene where things are moving slowly. But if you want to have a stereo camera on a mobile robot, you will need good synchronization.
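For comparison, the closest you can get with two plain USB webcams is software sync, e.g. OpenCV's grab()/retrieve() split so both frames are latched back to back; even then the residual skew is far larger than with a hardware trigger. A rough sketch (the device indices are assumptions):

    #include <opencv2/opencv.hpp>

    int main()
    {
        cv::VideoCapture left(0), right(1);   // device indices depend on your setup
        if (!left.isOpened() || !right.isOpened()) return 1;

        cv::Mat imgL, imgR;
        while (true)
        {
            // grab() only latches the frames, so calling it back to back on both
            // cameras keeps the two exposures as close as software sync allows.
            left.grab();
            right.grab();
            left.retrieve(imgL);
            right.retrieve(imgR);

            // ... feed imgL/imgR into your stereo pipeline ...
            cv::imshow("left", imgL);
            cv::imshow("right", imgR);
            if (cv::waitKey(1) == 27) break;  // Esc to quit
        }
        return 0;
    }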
Kinect is great. However, a good stereo pair of cameras has a couple of serious advantages. One is that Kinect will not work outdoors during the day. The bright sun will interfere with the infra-red camera that Kinect uses for depth estimation. Also Kinect has a fairly short range of a few meters. You can get 3D information at much larger distances with a stereo pair by increasing the baseline.
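The baseline/range trade-off comes straight from the rectified-stereo triangulation relation Z = f * B / d; a tiny illustrative sketch (the numbers are made up):

    // Depth from disparity for a rectified stereo pair: Z = f * B / d.
    // Doubling the baseline B doubles the range you get for the same
    // minimum measurable disparity d.
    double depthFromDisparity(double focalPx, double baselineM, double disparityPx)
    {
        return focalPx * baselineM / disparityPx;
    }
    // e.g. f = 700 px, B = 0.12 m, d = 1 px  ->  Z = 84 m
    // same f and d with B = 0.06 m           ->  Z = 42 m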
In computer vision we always want an ideal stereo camera: no pixel skew, perfectly matched, aligned, identical cameras, and so on. The cameras must also supply enough images per second, because some algorithms use temporal information and need a high frame rate to approximate continuous motion. The lenses are another important part and can be quite expensive. Additionally, the hardware suppliers generally provide an SDK. Creating an SDK adds extra value for them, because we want to spend our time on what matters to us, namely the algorithms; writing the software just to connect the cameras to a PC or some other board can waste a lot of it.
Kinect lets researchers get depth information easily at a really good price; however, I agree with Dima that it's only for indoor applications, and the Kinect depth data has holes that generally need to be filled.
In addition to what Dima already pointed out: the cameras that you have listed both only give you the raw image data. If you are interested in depth, you will have to compute it yourself, which can be very resource-intensive and hard to do in real time.
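To give an idea of what "computing it yourself" involves, here is a minimal block-matching sketch with OpenCV; it assumes an already calibrated and rectified pair, and the matcher parameters are placeholders you would have to tune:

    #include <opencv2/opencv.hpp>

    // Compute a disparity map from an already rectified grayscale stereo pair.
    // numDisparities and blockSize are placeholder values to tune per setup.
    cv::Mat computeDisparity(const cv::Mat& rectifiedLeft, const cv::Mat& rectifiedRight)
    {
        cv::Ptr<cv::StereoSGBM> matcher =
            cv::StereoSGBM::create(/*minDisparity=*/0,
                                   /*numDisparities=*/64,   // must be divisible by 16
                                   /*blockSize=*/9);
        cv::Mat disparity16S;
        matcher->compute(rectifiedLeft, rectifiedRight, disparity16S);

        // SGBM returns fixed-point disparities scaled by 16.
        cv::Mat disparity;
        disparity16S.convertTo(disparity, CV_32F, 1.0 / 16.0);
        return disparity;
    }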
I currently know of one hardware system which does that for you in real-time:
http://nerian.com/products/sp1-stereo-vision/
But I don't think this is cheap either, and you would have to buy it on top of the industrial cameras. So if you are looking for a cheap solution, you should go with the Kinect.

Shouldn't there be some adjustments for google cardboard?

Shouldn't there be some adjustments for Google Cardboard? With all the different sizes of phones, and with everyone's eyes being a slightly different distance apart, I was looking for a way to reposition the two images closer together so that it looks better. I don't need to use all the pixels, and I'm thinking that if you allowed adjusting the centre placement of each view, this could be more usable. As it is, I have to hold the phone a bit further from me to see a good image.
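For reference, on the software side this kind of adjustment boils down to nudging each eye's viewport towards the centre of a side-by-side render; a minimal OpenGL sketch (drawEye() and the offset parameter are illustrative, not part of the Cardboard SDK):

    // Side-by-side stereo with a user-adjustable horizontal offset (pixels)
    // that pulls the two eye images towards the centre of the screen.
    // screenW/screenH is the full phone resolution; drawEye() is your renderer.
    void renderSideBySide(int screenW, int screenH, int centerOffsetPx)
    {
        const int eyeW = screenW / 2;
        glEnable(GL_SCISSOR_TEST);

        // Left eye: clip to the left half, shift the image right by the offset.
        glScissor(0, 0, eyeW, screenH);
        glViewport(centerOffsetPx, 0, eyeW, screenH);
        drawEye(/*leftEye=*/true);

        // Right eye: clip to the right half, shift the image left by the offset.
        glScissor(eyeW, 0, eyeW, screenH);
        glViewport(eyeW - centerOffsetPx, 0, eyeW, screenH);
        drawEye(/*leftEye=*/false);

        glDisable(GL_SCISSOR_TEST);
    }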
Cardboard is open "technology" and you are free to adjust it to your own personal needs; no one is going to do that for you. If you are on a bigger budget, there are cheap plastic headsets available from various manufacturers. I got mine for around $35 with shipping.
I personally use a Color Cross, but there are many others. Just make sure to look for one with an open back, so you can plug in headphones, for example, or use the camera once that becomes a thing. An adjustable phone holder is a big plus, so be on the lookout for that too. Another important thing is adjustable IPD (inter-pupillary distance) for the lenses in the headset; some headsets with a fixed lens distance gave me a cross-eyed effect. Also, many headsets have adjustable lens-to-phone distance, which can be important as well.
Please note that all this is necessary just for an okay-ish experience; for the very best one available, you should get a fully integrated headset like Sony's Project Morpheus, the Oculus Rift or SteamVR. Also bear in mind that this technology is still in the R&D phase and there are many problems left to solve.
For an interesting read on some of these problems, check this out:
http://media.steampowered.com/apps/valve/2013/MAbrashGDC2013.pdf

Looking to develop a visual odometer (distance traveled) APP for indoor use

Is there any open source code which will take a video recorded indoors (from a smartphone, for example, of a home or office building's hallways) and superimpose the path traveled on a 2D picture? This can be a hand-drawn picture or a photo of a floor layout.
At first I thought of doing this with the accelerometer and compass sensors, but perhaps one can get better accuracy with a visual-odometry approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) to superimpose on the path traveled (this is the real application of the project, and we know how to do that part). The post-processing of the video can be done later on a stand-alone computer, so speed and CPU power are not an issue.
Challenges -
The user will simply carry the smartphone by hand, so the camera is moving (at walking pace), not fixed
I want to limit the video rate to keep the file size small (5 frames/sec? is that OK?). I typically need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks.
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision-based navigation using just a cellphone camera is very difficult. Most of the literature with great results reports ~1% of distance traveled as state of the art, and usually uses stereo cameras. Stereo helps a great deal, particularly in indoor environments, for coping with scale drift. I've worked on a system which achieves 0.5% of distance traveled with stereo but only roughly 5% with monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at a full 60 fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope (at ~1% drift, a 0.5-1 m budget is used up after 50-100 m), you can only navigate for 100 m or so. Is that enough?
Multi-sensor is the way to go, though the other sensors are worse than vision by themselves.
I've heard of some good work with accelerometers mounted on the foot to do ZUPTs (zero-velocity updates) when the foot is briefly motionless on the ground during each step, in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, which makes a vision approach largely useless.
A compass is interesting but will be thrown off by the tons of metal inside an office building. Translating a few feet around a large metal cabinet might cause a 50+ degree jump in heading.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do Wi-Fi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
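If fiducials are an option, detection is essentially off the shelf; a minimal sketch using OpenCV's aruco module (from opencv_contrib; the dictionary choice is arbitrary):

    #include <opencv2/opencv.hpp>
    #include <opencv2/aruco.hpp>
    #include <iostream>
    #include <vector>

    // Detect ArUco fiducial markers in a frame. Each detected id gives an
    // absolute position fix (you know where you stuck each marker), which
    // zeroes out accumulated odometry drift whenever one is in view.
    void detectFiducials(const cv::Mat& frame)
    {
        cv::Ptr<cv::aruco::Dictionary> dict =
            cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);

        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f>> corners;
        cv::aruco::detectMarkers(frame, dict, corners, ids);

        for (size_t i = 0; i < ids.size(); ++i)
            std::cout << "marker " << ids[i] << " at " << corners[i][0] << std::endl;
    }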
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "vision-based robot localization"; the problem you state is very similar to the problem robots with a camera face when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use that model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed, because you have no objective reference: you don't know whether the things you see moving in the video feed are 1 cm away and very small or 1 mile away and very big (see the sketch at the end of this answer).
If you know the camera matrix of the camera recording the images, you can try partial 3D scene-reconstruction techniques to take a stab at the speed. Note that you can do the 3D reconstruction without the camera matrix (this is the "uncalibrated" part you see in the titles of a lot of the Google results); the camera matrix is what lets you add real-world object sizes (and hence distances) to your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speed.
Using the extra sensors will help; the robot-localization articles probably discuss this as well.
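To tie the optical-flow and camera-matrix points together, here is a bare-bones two-frame sketch with OpenCV: track features, estimate the essential matrix, recover the relative pose. As noted above, the translation comes out only up to scale unless you add stereo, known object sizes or another sensor; K is assumed to come from a prior calibration of the phone camera:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Estimate the up-to-scale camera motion between two consecutive frames.
    // K is the camera matrix from a prior calibration; without it (or some
    // other metric reference) the translation has no absolute scale.
    bool estimateMotion(const cv::Mat& prevGray, const cv::Mat& currGray,
                        const cv::Mat& K, cv::Mat& R, cv::Mat& t)
    {
        // 1. Detect corners in the previous frame.
        std::vector<cv::Point2f> prevPts, currPts;
        cv::goodFeaturesToTrack(prevGray, prevPts, 1000, 0.01, 8);
        if (prevPts.size() < 8) return false;

        // 2. Track them into the current frame (sparse optical flow).
        std::vector<uchar> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

        std::vector<cv::Point2f> p0, p1;
        for (size_t i = 0; i < status.size(); ++i)
            if (status[i]) { p0.push_back(prevPts[i]); p1.push_back(currPts[i]); }
        if (p0.size() < 8) return false;

        // 3. Essential matrix + relative pose; |t| == 1, i.e. scale is unknown.
        cv::Mat mask;
        cv::Mat E = cv::findEssentialMat(p0, p1, K, cv::RANSAC, 0.999, 1.0, mask);
        if (E.empty()) return false;
        cv::recoverPose(E, p0, p1, K, R, t, mask);
        return true;
    }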