Creating VR scenarios for vision tests - computer-vision

What technical and equipment constraints will be faced when deploying traditional visual acuity testing and color vision testing into VR scenarios?
I have thought about several points. One is that the resolution of the human eye differs from the resolution of the device: we can simulate a realistic viewing distance in the headset, but is the resolution of current VR devices high enough to let a user recognize small letters at that simulated distance?
That's one of my ideas. I would appreciate hearing about other potential challenges, as well as suggestions for approaching such a project!
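To make the resolution concern concrete, here is a quick back-of-the-envelope check in Python. The headset numbers are illustrative assumptions (roughly Quest 2 class hardware), not the specs of any particular device:

    # Can a given headset render a 20/20 Snellen optotype?
    # Assumed (hypothetical) headset: 100-degree horizontal FOV,
    # 1832 horizontal pixels per eye.
    fov_deg = 100.0
    pixels_per_eye = 1832

    ppd = pixels_per_eye / fov_deg  # pixels per degree
    print(f"headset resolution: {ppd:.1f} pixels/degree")

    # A 20/20 optotype subtends 5 arcminutes with a 1-arcminute stroke,
    # i.e. 30 cycles/degree. Nyquist sampling therefore needs ~60 ppd.
    required_ppd = 60.0
    print(f"required: {required_ppd} ppd -> "
          f"{'OK' if ppd >= required_ppd else 'cannot render the 20/20 line'}")

At roughly 18 ppd, the smallest cleanly renderable optotype corresponds to something around the 20/60-20/70 acuity line, so simulating a 6 m viewing distance alone is not enough: the display's angular resolution is the binding constraint.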

Related

What main factors/features explain the high price of most industrial computer vision hardware? [closed]

I am a student currently working on a computer science project that will soon require computer vision, and more specifically stereoscopy (for depth detection). I am now looking for a good camera for the job, and I have found several interesting options:
1- A custom-built rig of two cheap cameras (e.g. webcams);
2- The old, classic, economical, and proven Kinect;
3- Specialized stereo sensors.
A couple of months ago I found this sensor: https://duo3d.com/product/duo-mini-lv1
I thought it was interesting because it is small, stereoscopic, and brand new (and buying it supports a young US company). However, setting aside the additional APIs that come with it, I just don't understand why you would buy it when a Kinect (or a pair of "cheap" cameras) is at least 4-5 times less expensive and has comparable if not better specifications.
The same applies to this one: http://www.ptgrey.com/bumblebee2-firewire-stereo-vision-camera-systems
Can someone please explain why I would need such a device, and if I don't, why others do?
The reason you want a "real" stereo camera as opposed to a pair of usb webcams is synchronization. Cameras like the bumblebee have hardware triggering, which takes the two images with virtually no delay in between. With the webcams you will always have a noticeable delay between the two shots. This may be ok if you are looking at a static scene or at a scene where things are moving slowly. But if you want to have a stereo camera on a mobile robot, you will need good synchronization.
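To see this for yourself, here is a minimal Python/OpenCV sketch that measures the software-trigger skew between two USB webcams; the device indices 0 and 1 are assumptions:

    import time
    import cv2

    # Open two unsynchronized USB webcams (device indices are assumptions).
    cam_left = cv2.VideoCapture(0)
    cam_right = cv2.VideoCapture(1)

    for _ in range(30):
        # grab() latches a frame with minimal work; calling it back-to-back
        # is the closest a software-only setup gets to a hardware trigger.
        cam_left.grab()
        t_left = time.perf_counter()
        cam_right.grab()
        t_right = time.perf_counter()

        # retrieve() decodes the latched frames afterwards.
        _, frame_left = cam_left.retrieve()
        _, frame_right = cam_right.retrieve()

        print(f"grab skew: {(t_right - t_left) * 1e3:.1f} ms")

    cam_left.release()
    cam_right.release()

Note this only measures the call skew; the sensors' exposure windows remain unsynchronized and can differ by up to a full frame period, which is exactly what hardware triggering eliminates.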
Kinect is great. However, a good stereo pair of cameras has a couple of serious advantages. One is that Kinect will not work outdoors during the day. The bright sun will interfere with the infra-red camera that Kinect uses for depth estimation. Also Kinect has a fairly short range of a few meters. You can get 3D information at much larger distances with a stereo pair by increasing the baseline.
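The baseline point follows directly from the stereo depth equation Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). A quick illustration with made-up numbers:

    # Illustrative numbers only, not the specs of any real camera.
    f_px = 700.0        # focal length in pixels
    baseline_m = 0.12   # 12 cm baseline

    for disparity_px in (20, 5, 1):
        z = f_px * baseline_m / disparity_px
        # With +/-0.5 px disparity noise, depth error grows with Z squared:
        dz = 0.5 * z ** 2 / (f_px * baseline_m)
        print(f"d = {disparity_px:2d} px -> Z = {z:6.2f} m (+/- {dz:.2f} m)")

Doubling the baseline doubles the disparity at any given depth and halves the relative depth error, which is why a wide stereo pair can range well beyond Kinect's fixed few-meter envelope.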
In computer vision, we always want an ideal stereo rig: no pixel skew, and two perfectly matched, aligned, identical cameras. The cameras must also supply enough frames per second, because some algorithms use temporal information and need a high frame rate to approximate continuous motion. The lens is another important component, and a good one can be very expensive. Additionally, hardware vendors generally provide an SDK. That SDK adds real value, because it lets us focus on what actually matters to us, the algorithms, instead of spending time writing the plumbing needed to connect the cameras to a PC or other board.
Kinect lets researchers obtain depth information easily at a very good price; however, I agree with Dima that it is only suitable for indoor applications, and Kinect depth maps contain holes that generally need to be filled.
In addition to what Dima already pointed out: the cameras you have listed both give you only the raw image data. If you are interested in depth data, you will have to compute it yourself, which can be very resource-intensive and hard to do in real time.
I currently know of one hardware system which does that for you in real-time:
http://nerian.com/products/sp1-stereo-vision/
But I don't think this would be cheap either, and you would have to buy it on top of the industrial cameras. So, if you are looking for a cheap solution, you should go with the Kinect.
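For reference, computing depth yourself is only a few lines with OpenCV's semi-global matcher; tuning it and hitting real-time rates is the hard part. A minimal sketch, assuming you already have a rectified image pair on disk (the filenames are placeholders):

    import cv2

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # must be a multiple of 16
        blockSize=9,
    )

    # compute() returns fixed-point disparities with 4 fractional bits.
    disparity = matcher.compute(left, right).astype("float32") / 16.0

    vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imwrite("disparity.png", vis.astype("uint8"))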

Suggestion for beginners of 3d measurement in images/videos?

I am facing an application that estimates the size (length, width, and height) of cars in surveillance videos. Where should I start learning? And what baseline accuracy (1%? 10%?) can the state of the art achieve?
This is not a light subject to pick up, but if you are so inclined, here is an excellent book to get you started: Multiple View Geometry in Computer Vision
A very recent review of automated traffic analysis is this one. A slightly older one is here.
In reality, this is a hard problem. Actual measurement of the vehicles in question will, at the least, require some form of camera calibration.
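As a starting point for that calibration step, here is the standard OpenCV checkerboard routine; the board size, square size, and filenames below are assumptions:

    import cv2
    import numpy as np

    pattern = (9, 6)        # inner corners of the checkerboard
    square_m = 0.025        # 2.5 cm squares

    # 3D corner coordinates in the board's own frame (Z = 0 plane).
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)
    objp *= square_m

    obj_pts, img_pts, size = [], [], None
    for i in range(20):     # 20 calibration shots (assumed filenames)
        img = cv2.imread(f"calib_{i:02d}.png", cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        size = img.shape[::-1]
        found, corners = cv2.findChessboardCorners(img, pattern)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)

    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, size, None, None)
    print("reprojection RMS (px):", rms)

With the intrinsics K and a known ground plane, pixel extents on that plane can be converted to metric lengths, which is the core of measuring vehicles in a fixed surveillance view.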

Looking to develop a visual odometer (distance traveled) APP for indoor use

Is there any open-source code that will take a video shot indoors (from a smartphone, for example, of a home or office building's hallways) and superimpose the path traveled on a 2D picture? This can be a hand-drawn picture or a photo of a floor layout.
At first I thought of doing this using the accelerometer and compass sensors, but perhaps one can get better accuracy with the visual odometry approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) to superimpose on the path traveled (this is the real application of the project, and we know how to do this part). The post-processing of the video can be done later on a standalone computer, so speed and CPU power are not an issue.
Challenges -
The user will simply hand-carry the smartphone, so the camera is moving (walking), not fixed
I want to limit the video frame rate to keep the file size small (5 frames/sec? Is that OK?). I will typically need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks!
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision-based navigation using just a cellphone camera is very difficult. Most of the literature with great results reports ~1% of distance traveled as state-of-the-art error, but usually uses stereo cameras. Stereo helps a great deal, particularly in indoor environments, for coping with scale drift. I've worked on a system which achieves 0.5% of distance traveled for stereo but only roughly 5% for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at a full 60 fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope (0.5-1 m accuracy at roughly 1% drift), you can only navigate for 100 m or so. Is that enough?
Multi-sensor is the way to go, even though the other sensors are worse than vision by themselves.
I've heard of good work with accelerometers mounted on the foot to do ZUPT (zero-velocity updates) when the foot is briefly motionless on the ground while taking a step, in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, which makes a vision approach largely useless.
A compass is interesting but will be thrown off by the tons of metal inside an office building. Translating a few feet around a large metal cabinet can cause a directional jump of 50+ degrees.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do WiFi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
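If fiducial markers are an option, detecting them is nearly free with OpenCV's ArUco module. A minimal sketch; it requires opencv-contrib, the functional API below matches OpenCV 4.6 and earlier (newer versions use cv2.aruco.ArucoDetector), and the filename is a placeholder:

    import cv2

    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

    frame = cv2.imread("hallway.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

    if ids is not None:
        # Each id is a known landmark: look up its surveyed position on
        # the floor plan to get a drift-free fix for the odometry track.
        for marker_id, quad in zip(ids.flatten(), corners):
            print(f"marker {marker_id} at pixel {quad[0].mean(axis=0)}")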
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization" the problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed, because you have no objective reference: you don't know whether the things you see moving in the video feed are 1 cm away and very small or 1 mile away and very big. (A minimal flow sketch follows this list.)
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
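As a minimal illustration of the optical-flow pointer above (direction comes out, absolute speed does not), here is a dense-flow sketch with OpenCV; the video filename is a placeholder:

    import cv2

    cap = cv2.VideoCapture("walk.mp4")
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense flow: one (dx, dy) vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        dx, dy = flow[..., 0].mean(), flow[..., 1].mean()
        # Units are pixels/frame; without scene scale there is no m/s.
        print(f"mean flow: ({dx:+.2f}, {dy:+.2f}) px/frame")
        prev_gray = gray

    cap.release()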

Design of virtual trial room

As part of my master's project I proposed to build a virtual trial room application intended for retail clothing stores. Currently it is meant to be used directly in stores, though it may be extended to online stores as well.
This application will show customers how a selected apparel would look on them by showing it on their 3D replica on screen.
It involves 3 steps
Sizing up the customer
Building a 3D humanoid replica of the customer
Applying simulated clothing to the model
My question is about the feasibility of the project and choice of framework.
Can this be achieved in real time using a normal Desktop computer? If yes what would be appropriate framework ( hardware, software, programming language etc ) for this purpose?
In the work I have done so far, I was planning to achieve the above steps in the following ways:
for step 1: option a) two cameras for front and side views, or
option b) one or two Kinects for complete 3D data
for step 2: either use the makehuman (http://www.makehuman.org/) code to build a customized 3D model from the above data, or build everything from scratch; I am unsure about the framework.
for step 3: I just need a few clothing samples, so I thought of building the simulated clothes in Blender.
Currently I have just a vague idea of the different pieces, and I am not sure how to develop the complete application.
Theoretically this can be achieved in real time. Many useful algorithms for video tracking, stereo vision, and 3D reconstruction are available in the OpenCV library, but it is very difficult to build a robust solution. For example, you will probably need to track the human body as it moves from frame to frame and perform pose estimation (OpenCV contains the POSIT algorithm), and it is not trivial to eliminate noise in the resulting object coordinates. For inspiration, see a nice work on video tracking.
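For the pose-estimation step, note that modern OpenCV exposes cv2.solvePnP rather than the old POSIT API. A minimal sketch with made-up model points and detections; in practice both would come from your body tracker:

    import cv2
    import numpy as np

    # Hypothetical 3D landmarks in the model frame (meters), e.g. torso corners.
    model_pts = np.array([[0.0, 0.0, 0.0],
                          [0.4, 0.0, 0.0],
                          [0.4, 0.6, 0.0],
                          [0.0, 0.6, 0.0]], dtype=np.float64)

    # Matching 2D detections in the image (pixels) -- placeholder values.
    image_pts = np.array([[320.0, 240.0],
                          [420.0, 242.0],
                          [418.0, 380.0],
                          [322.0, 378.0]], dtype=np.float64)

    # Assumed camera intrinsics (800 px focal length, centered principal point).
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])

    ok, rvec, tvec = cv2.solvePnP(model_pts, image_pts, K, None)
    print("rotation (Rodrigues):", rvec.ravel())
    print("translation (m):", tvec.ravel())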
You might want to choose another way: simplify some things, avoid the complicated stuff, make things less dynamic, and estimate only the clothing size and an approximate human location. In that case you will most likely create something useful and interesting.
I've lost the link to one online fitting room where hand and body detection was implemented. Using a Kinect solves many problems, but if for some reason you won't use it, then AR (augmented reality) can help you (yet another fitting room)

Detecting transparent glass in images

Are there any methods in the computer vision literature that allow for detecting transparent glass in images? For example, if I have an image of a car, can I detect its windows?
All the methods I've found so far are active methods (i.e., they require calibration, control over the environment, or lasers). I need a passive method (i.e., all you have is an image, or multi-view images, of the object and that's it).
Here is some very recent work aimed at detecting transparent objects in a general setting.
http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf
http://videolectures.net/nips09_fritz_alfm/
I think what you're looking for is detection of translucent regions. There is very limited work here, since it is a very hard problem: basically a major chicken-and-egg problem. Translucent regions cause almost all fundamental image-processing tools to fail (e.g., motion estimation, feature matching, tracking), yet you must use exactly such tools to detect translucent regions. Anyway, to my knowledge this is the most recent piece of work in this area, and I doubt there is any other.
http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Misc.Icip2011/CVPR_new.pdf
It is published in CVPR which is a top conference in Computer Vision.
Just a wild guess: if the camera is moving and you perform a 3D reconstruction of the scene, you could detect large discontinuities of the reconstructions at the reflected regions.
I think you should provide a clearer description of what you are trying to achieve.
The paper "Deriving intrinsic images from image sequences" shows some results with transparencies.
If you are close enough, you may be able to use the glass refraction (a la Snell's law) to detect the glass from multiple views.
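For intuition, Snell's law (n1 sin θ1 = n2 sin θ2) governs the bending that produces the apparent displacement behind the glass; a quick numeric check with assumed refractive indices:

    import math

    n_air, n_glass = 1.0, 1.5   # assumed refractive indices

    for incidence_deg in (10, 30, 60):
        theta1 = math.radians(incidence_deg)
        # Snell's law: n_air * sin(theta1) = n_glass * sin(theta2)
        theta2 = math.asin(n_air * math.sin(theta1) / n_glass)
        print(f"{incidence_deg:2d} deg in air -> "
              f"{math.degrees(theta2):5.2f} deg inside the glass")

Across multiple views, this bending shifts features seen through the glass in a way that is inconsistent with the rigid-scene geometry, which is the cue such a passive method would exploit.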
I also think that reflections (specular regions) are a good indicator of curved glass.
Detecting it is one thing, but separating it is another. Separation works like mixing two sounds with one of them 180 degrees out of phase: if you manage to learn one sound by itself, you automatically have the other, and could then learn that one too. I'm stuck at the point where I can only superimpose or subtract them if I learned them separately. So the real gain here would be somehow learning the combined signal as two separate things, even though you never saw them apart.