stereo depth map but with a single moving camera measured with sensors - computer-vision

I've just started learning about calculating depth from stereo images, and before committing to it I wanted to check whether it is a viable choice for a project I'm doing. I have a drone with a single RGB camera, plus sensors that can give the orientation and movement of the drone. Would it be possible to sample two frames, along with the distance and orientation differences between the samples, and use this to calculate depth? In most examples I've seen, the cameras are lined up horizontally. Is this necessary for stereo images, or can I use any reasonable angle and distance between the two sampled images? Would it be feasible to do this in real time? My overall goal is to do some sort of monocular SLAM so that this drone can navigate indoor areas. I know that ORB-SLAM exists, but I am mostly doing this as a learning experience and so would like to do things from scratch where possible.
Thank you.
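The geometry behind the question can be sketched as follows. If the relative rotation R and translation t between the two frames are known (here assumed to come from the drone's sensors, with t in metres), a matched pixel pair can be triangulated without the cameras being lined up horizontally. This is only a minimal linear (DLT) triangulation under those assumptions; the function names, the intrinsics K and the convention X_cam2 = R @ X_cam1 + t are illustrative, not taken from any particular library.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D point X (camera-1 frame) into a camera with pose (R, t)."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def triangulate(K, R, t, px1, px2):
    """Linear (DLT) triangulation of one point seen in two views.

    K      : 3x3 intrinsic matrix (assumed identical for both frames)
    R, t   : pose of frame 2 relative to frame 1, X_cam2 = R @ X_cam1 + t
    px1/2  : (u, v) pixel observations of the same point in each frame
    Returns the 3D point in the coordinate frame of the first camera.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    # Each observation contributes two linear constraints on the
    # homogeneous point X: u * (P[2] @ X) = P[0] @ X, and likewise for v.
    A = np.array([
        px1[0] * P1[2] - P1[0],
        px1[1] * P1[2] - P1[1],
        px2[0] * P2[2] - P2[0],
        px2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

The depth of the point is the z component of the returned vector. In practice the accuracy depends heavily on how well R and t are known and on having enough baseline relative to the scene depth; motion straight along the viewing direction gives very little parallax for points near the image centre.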

Related

Mono SLAM scale consistency

When running mono SLAM, how is scale consistency achieved between frames? One short tutorial ran the 5-point algorithm repeatedly and used a motion model to keep the scale consistent between frames, but that is clearly not what is done in the general case.
I think I once read that the 5-point algorithm is used initially to estimate the motion between the first two frames, and that features are then tracked over time using reprojection loss and 3D projection.
How is it done?

Undistorting without calibration images

I'm currently trying to "undistort" fisheye imagery using OpenCV in C++. I know the exact lens and camera model, so I figured that I would be able to use this information to calculate some parameters and ultimately convert fisheye images to rectilinear images. However, all the tutorials I've found online encourage using auto-calibration with checkerboards. Is there a way to calibrate the fisheye camera by just using camera + lens parameters and some math? Or do I have to use the checkerboard calibration technique?
I am trying to avoid having to use the checkerboard calibration technique because I am just receiving some images to undistort, and it would be undesirable to have to ask for images of checkerboards if possible. The lens is assumed to retain a constant zoom/focal length for all images.
Thank you so much!
To undistort an image, you need to know the intrinsic parameters of the camera that describe the distortion.
You can't compute them from datasheet values, because they depend on how the lens was manufactured, and two lenses of the same vendor and model might have different distortion coefficients, especially if they are cheap ones.
Some raster graphics editors embed a lens database from which you can query distortion coefficients. But there is no magic: the database was built by measuring the lens distortion and possibly interpolating the measurements afterwards.
You can still use an empirical method to correct at least the barrel effect.
There are plenty of shaders that do this, and you can always do your own maths to build a distortion map.
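A rough sketch of such a hand-built distortion map is shown below. It assumes a simple one-coefficient radial model, x_d = x_u * (1 + k1 * r^2), which is a simplification and not OpenCV's fisheye model; the focal length f (in pixels), the principal point (cx, cy) and k1 are values you would have to guess and tune by eye until straight lines look straight. The function names are illustrative.

```python
import numpy as np

def build_undistort_map(width, height, f, cx, cy, k1):
    """For every pixel of the corrected image, compute where to sample
    in the distorted source image (one-coefficient radial model)."""
    u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                       np.arange(height, dtype=np.float64))
    # Normalised coordinates of the ideal (undistorted) pixel grid.
    x = (u - cx) / f
    y = (v - cy) / f
    r2 = x * x + y * y
    # Apply the assumed distortion model to find the source location.
    scale = 1.0 + k1 * r2
    map_x = x * scale * f + cx
    map_y = y * scale * f + cy
    return map_x, map_y

def remap_nearest(img, map_x, map_y):
    """Nearest-neighbour resampling of img at the given map locations.
    Locations that fall outside the image are clamped to the border."""
    h, w = img.shape[:2]
    xi = np.clip(np.rint(map_x).astype(int), 0, w - 1)
    yi = np.clip(np.rint(map_y).astype(int), 0, h - 1)
    return img[yi, xi]
```

With k1 = 0 the map is the identity; a negative k1 corresponds to barrel distortion in this model. Maps like these can also be converted to float32 and passed to OpenCV's `cv2.remap` for interpolated resampling.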

Visual Odometry, Camera Parameters

I am studying visual odometry and have watched Prof. Dr. Cyrill Stachniss' video recordings, which are available as the YouTube 2015/16 playlist on Photogrammetry I & II.
First, if I want to create my own dataset (like the KITTI dataset for VO or the Oxford campus dataset), what properties should the images I take with a camera have?
Are they just images, or do they have some special properties? That is, how can I create my own dataset with a monocular or stereo camera?
Thank you.
To get the extrinsic and intrinsic parameters from images, you need a set of images of a known shape taken from varying views. It's not a trivial task to do on your own, but common CV libraries have built-in utilities for camera calibration (I have worked with the OpenCV library and the Matlab CV package, and they are generally the same). Usually it's done with a black-and-white checkerboard or another simple geometric pattern.
Then with known camera parameters you can manipulate your own dataset.
Matlab camera calibration reference
OpenCV camera calibration tutorials
If you want to benchmark some visual odometry algorithms with your dataset, you will definitely need the intrinsic parameters of your camera as well as its pose.
As said in @f4f's answer, the intrinsic calibration is typically done with some images of a checkerboard that you tilt and rotate (see OpenCV).
This will give you parameters such as focal length, optical center but also the distortion coefficients which can be important depending on your camera.
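To make the role of these parameters concrete, here is a small sketch of how they map a 3D point in the camera frame to a pixel. It assumes the common pinhole model with two radial distortion terms (k1, k2) and ignores tangential distortion; the function name and the example values are illustrative.

```python
import numpy as np

def project_point(X_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point given in the camera frame to pixel coordinates.

    fx, fy : focal lengths in pixels
    cx, cy : optical centre (principal point) in pixels
    k1, k2 : radial distortion coefficients (0 means an ideal pinhole)
    """
    # Perspective division gives normalised image coordinates.
    x = X_cam[0] / X_cam[2]
    y = X_cam[1] / X_cam[2]
    # Radial distortion scales points depending on distance from the centre.
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    u = fx * x * radial + cx
    v = fy * y * radial + cy
    return np.array([u, v])
```

For example, with fx = fy = 500 and an optical centre of (320, 240), a point on the optical axis lands exactly on the optical centre regardless of the distortion coefficients.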
Getting the pose of the camera (i.e extrinsic parameters) at each frame is probably trickier. Usually the ground-truth is obtained using information from additional sensors (tracking system, IMU, GPS, ...). You can have a look at : TUM RGB-D SLAM Dataset and the corresponding paper. They explain how they used a motion-capture system to get the ground-truth pose.
Recording the time of acquisition of the camera frames can also be interesting (one timestamp per frame).
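One reason the timestamps matter is that camera frames and ground-truth poses usually come from different sensors at different rates, so they have to be matched afterwards. A minimal sketch of such a matching step, assuming both lists are in seconds on a common clock and that the pose timestamps are sorted (the function name and the 20 ms tolerance are arbitrary choices):

```python
from bisect import bisect_left

def associate(frame_times, pose_times, max_dt=0.02):
    """Match each frame to the nearest ground-truth pose in time.

    frame_times : timestamps of the camera frames
    pose_times  : sorted timestamps of the ground-truth poses
    max_dt      : largest accepted time difference in seconds
    Returns a list of (frame_index, pose_index) pairs; frames without a
    pose close enough in time are dropped.
    """
    matches = []
    for i, t in enumerate(frame_times):
        j = bisect_left(pose_times, t)
        # The nearest pose is either just before or just after t.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(pose_times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(pose_times[c] - t))
        if abs(pose_times[best] - t) <= max_dt:
            matches.append((i, best))
    return matches
```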
Creating your own visual odometry dataset is not trivial. If you just want to create a dataset "for fun" or to do some experiments, and you only have a camera available, I would say you can just try some methods that are known to work well (like ORB-SLAM). This will give you a good approximation of the camera poses (you may have to manually fix the unknown scale).

How to turn any camera into a Depth Camera?

I want to build a depth camera that can determine the distance to objects in an image. I have already read the following links.
http://www.i-programmer.info/news/194-kinect/7641-microsoft-research-shows-how-to-turn-any-camera-into-a-depth-camera.html
https://jahya.net/blog/how-depth-sensor-works-in-5-minutes/
But I couldn't clearly understand what hardware is required and how to integrate it all together.
Thanks
Typically, a depth sensor relies on an IR sensor, as in the Kinect, the Asus Xtion, and other available cameras that provide a depth or range image. However, Microsoft came up with an approach based on machine learning and algorithmic modifications, which you can find here. Also, here is a video link which shows a mobile camera that has been modified to do depth rendering. Some hardware changes might be necessary if you turn a standalone 2D camera into such a device, so I would suggest you also look at the hardware design of existing devices on the market.
One way or another, you need to observe the same points from two angles to get depth. So search for depth sensors and examples, e.g. Kinect with ROS or OpenCV, or here.
You could also convert two camera streams into a point cloud, but that's another story.
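For the simplest case of two identical, parallel cameras separated sideways by a known baseline, the "two angles" idea reduces to the standard relation Z = f * B / d, where d is the horizontal pixel shift (disparity) of a point between the two images. A minimal sketch, with an illustrative function name and made-up example numbers:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point for a rectified stereo pair.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centres in metres
    disparity_px : horizontal shift of the point between the images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px
```

For instance, with a 700 px focal length and a 10 cm baseline, a disparity of 35 px corresponds to a depth of about 2 m. Because depth is inversely proportional to disparity, distant points (small disparities) are measured much less precisely than near ones.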
Here's what I know:
3D Cameras
RGB-D and stereoscopic cameras are popular for these applications but are not always practical or available. I've prototyped with Kinects (v1, v2) and Intel cameras (R200, D435). Those are certainly still preferred today.
2D Cameras
IF YOU WANT TO USE RGB DATA FOR DEPTH INFO then you need an algorithm that will do the math for each frame; try an RGB SLAM. A good algorithm will not process ALL the data every frame; it will process all the data once and then look for clues that support evidence of changes to your scene. A number of BIG companies have already done this (it's not that difficult if you have a big team with big money): think Google, Apple, Microsoft, etc.
Good luck out there, make something amazing!

Traffic Motion Recognition

I'm trying to build a simple traffic motion monitor to estimate the average speed of moving vehicles, and I'm looking for guidance on how to do so using an open source package like OpenCV or others that you might recommend for this purpose. Are there any resources that are particularly good for this problem?
The setup I'm hoping for is to install a webcam on a high-rise building next to the road in question and point the camera down onto moving traffic. The camera altitude would be anywhere between 20 ft and 100 ft, and the building would be anywhere between 20 ft and 500 ft away from the road.
Thanks for your input!
Generally speaking, you need a way to detect cars so you can get their 2D coordinates in the video frame. You might want to use a tracker to speed up the process and take advantage of the predictable motion of the vehicles. You also need a way to calibrate the camera so you can translate the 2D coordinates in the image into depth information and thereby approximate speed.
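One common way to do that calibration step for a fixed camera looking at a roughly flat road is a ground-plane homography. The sketch below assumes a 3x3 matrix H that maps image pixels to road-plane coordinates in metres has already been estimated, for example from four or more road points with known positions (OpenCV's `cv2.findHomography` can do this). The function names are illustrative.

```python
import numpy as np

def pixel_to_road(H, px):
    """Map an image pixel (u, v) to road-plane coordinates in metres."""
    p = H @ np.array([px[0], px[1], 1.0])
    return p[:2] / p[2]

def speed_mps(H, px_a, t_a, px_b, t_b):
    """Average speed in m/s of a vehicle observed at pixel px_a at time t_a
    and at pixel px_b at time t_b (times in seconds)."""
    distance = np.linalg.norm(pixel_to_road(H, px_b) - pixel_to_road(H, px_a))
    return distance / (t_b - t_a)
```

This only holds for points that actually lie on the road plane, so the tracked point should be near where the vehicle touches the ground rather than on its roof.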
So as a first step, look at detectors such as the deformable parts model (DPM) and at tracking-by-detection methods. You'll probably need to port some code from Matlab (and if you do, please make it available :-) ). If that's too slow, maybe do some segmentation of foreground blobs, and track the colour histogram or HOG descriptors using a Particle Filter or a Kalman Filter to predict motion.
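As a rough illustration of the Kalman Filter option, here is a one-dimensional constant-velocity filter that takes noisy position measurements of a vehicle along the road and estimates its position and speed. The noise parameters q and r are placeholder values that would need tuning, and the function name is illustrative.

```python
import numpy as np

def kalman_cv_step(x, P, z, dt, q=1.0, r=4.0):
    """One predict/update cycle of a constant-velocity Kalman filter.

    x  : state estimate [position, velocity]
    P  : 2x2 state covariance
    z  : measured position at this time step
    dt : time since the previous step in seconds
    q  : process noise intensity (acceleration variance), assumed value
    r  : measurement noise variance, assumed value
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])        # motion model
    Hm = np.array([[1.0, 0.0]])                   # only position is measured
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])
    R = np.array([[r]])
    # Predict where the vehicle should be now.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct the prediction with the new measurement.
    y = z - Hm @ x
    S = Hm @ P @ Hm.T + R
    K = P @ Hm.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ Hm) @ P
    return x, P
```

Calling this once per frame with the detected position gives a smoothed speed estimate in `x[1]`, and the predicted state can be used to decide where to look for the vehicle in the next frame.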