3D reconstruction from a single 2D image with an uncalibrated camera? - computer-vision

I'm very new to this space, so I'm not sure whether this question is obvious, but I thought I would ask anyway.
I have been doing research into 3D reconstruction from a single 2D image for class-specific anatomical objects (hands, faces, etc.). From the papers I have read, performing 3D reconstruction from a single 2D image requires building a 3D Morphable Model (3DMM). Once the 3DMM is built, you can match a 2D image to it by labeling landmarks on the 2D image.
Is it possible to perform this reconstruction with uncalibrated cameras?
The reason I am asking is that I would like to make an application that performs this type of 3D reconstruction on a mobile device (an iPhone, for example), but having to perform camera calibration before taking a photo is something I would rather avoid if at all possible.
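For reference, my rough understanding from reading around is that when the camera is uncalibrated, people often just assume approximate intrinsics (focal length roughly equal to the image width in pixels, principal point at the image centre) and then fit the model landmarks with a PnP solver. Something like this Python/OpenCV sketch is what I have in mind; the landmark arrays here are just placeholders:

```python
import cv2
import numpy as np

# Placeholder correspondences: 3D landmark positions on the mean model shape
# and the matching 2D landmarks labeled in the photo (68 points each).
model_landmarks_3d = np.random.rand(68, 3).astype(np.float32)   # placeholder
image_landmarks_2d = np.random.rand(68, 2).astype(np.float32)   # placeholder

h, w = 1080, 1920   # size of the uncalibrated photo
f = w               # crude guess: focal length ~ image width (pixels)
K = np.array([[f, 0, w / 2],
              [0, f, h / 2],
              [0, 0, 1]], dtype=np.float64)

# Solve for the camera pose under the assumed intrinsics (no lens distortion).
ok, rvec, tvec = cv2.solvePnP(model_landmarks_3d, image_landmarks_2d, K, None)
print("pose found:", ok)
print("rotation (Rodrigues):", rvec.ravel())
print("translation:", tvec.ravel())
```

Is this kind of approximation good enough in practice, or does the missing calibration break the reconstruction?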
Thank you in advance for any information provided. :)

Related

Undistorting without calibration images

I'm currently trying to "undistort" fisheye imagery using OpenCV in C++. I know the exact lens and camera model, so I figured that I would be able to use this information to calculate some parameters and ultimately convert fisheye images to rectilinear images. However, all the tutorials I've found online encourage using auto-calibration with checkerboards. Is there a way to calibrate the fisheye camera by just using camera + lens parameters and some math? Or do I have to use the checkerboard calibration technique?
I am trying to avoid having to use the checkerboard calibration technique because I am just receiving some images to undistort, and it would be undesirable to have to ask for images of checkerboards if possible. The lens is assumed to retain a constant zoom/focal length for all images.
Thank you so much!
To undistort an image, you need to know the intrinsic parameters of the camera, which describe the distortion.
You can't compute them from datasheet values, because they depend on how the lens is manufactured: two lenses of the same vendor & model might have different distortion coefficients, especially if they are cheap ones.
Some raster graphics editors embed a lens database from which you can query distortion coefficients. But there is no magic: those databases were built by measuring the lens distortion, possibly interpolating between measurements afterwards.
But you can still use an empirical method to correct at least the barrel effect.
There are plenty of shaders to do so, and you can always do your own math to build a distortion map.
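For example, here is a minimal Python/OpenCV sketch of such an empirical correction using a single radial coefficient `k1` that you tune by eye for your lens; the centre and focal-length guesses and the value of `k1` are assumptions, not calibrated values:

```python
import cv2
import numpy as np

def undistort_barrel(img, k1=-0.25):
    """Empirical single-coefficient radial correction (not a calibrated model)."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    f = float(max(w, h))  # crude focal-length guess; tune together with k1

    # Grid of destination (corrected) pixels, normalized around the centre.
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    xn = (xs - cx) / f
    yn = (ys - cy) / f
    r2 = xn * xn + yn * yn

    # Radial model: where each corrected pixel samples the distorted source.
    # Sign and magnitude of k1 are tuned visually per lens.
    factor = 1.0 + k1 * r2
    map_x = (xn * factor * f + cx).astype(np.float32)
    map_y = (yn * factor * f + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)

corrected = undistort_barrel(cv2.imread("fisheye.jpg"), k1=-0.25)
cv2.imwrite("corrected.jpg", corrected)
```

Note this only approximates the barrel term; a strongly fisheye lens really needs a proper fisheye model and calibration.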

Object Detection in Fisheye Images

I have a camera that uses a fish-eye lens and need to run an object detection network like YOLO or SSD with it.
Should I rectify/un-distort the incoming image first? Is that computationally expensive?
Or, should I try to train the network using fish-eye images?
Many thanks for the help.
If you are trying to use a model that was pretrained on rectilinear perspective images, you will probably get poor results either way. On one hand, objects in raw fisheye images look different from the same objects in perspective images, and many will be misdetected. On the other hand, you can't really "undistort" a fisheye image with a large field of view, and when you try, the result looks very different from real perspective pictures. Some researchers are doing neither and investigating other alternatives instead.
If you have a training set of fisheye images, you can train a model on the raw fisheye images. It is a harder task for the network to learn, because the same object changes appearance as it moves across the image while convolutional neural networks are shift-invariant; nevertheless it is possible and has been demonstrated in the literature.

Visual Odometry, Camera Parameters

I am studying visual odometry and watched Prof. Dr. Cyrill Stachniss' video recordings, which are available as the 2015/16 YouTube playlists on Photogrammetry I & II.
First, if I want to create my own dataset (like the KITTI dataset for VO, or the Oxford campus dataset), what should the properties of the images I take with a camera be?
Are they just images, or do they have some special properties? That is, how can I create my own dataset with a monocular or stereo camera?
Thank you.
To get the extrinsic and intrinsic parameters, you must have a set of images of a known shape taken from varying views. It's not a trivial task to do on your own, but common CV libraries have built-in utilities for camera calibration (I have dealt with the OpenCV library and the MATLAB CV package, and they are generally the same). Usually it's done with a black-and-white checkerboard or another simple geometric pattern, as in the sketch after the links below.
Then, with known camera parameters, you can work with your own dataset.
Matlab camera calibration reference
OpenCV camera calibration tutorials
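For what it's worth, here is a minimal OpenCV calibration sketch in Python; the pattern size, square size and the `calib/*.jpg` image folder are assumptions you would adapt to your setup:

```python
import glob

import cv2
import numpy as np

pattern = (9, 6)   # inner corners of the printed checkerboard (adapt to yours)
square = 0.025     # square size in metres; only scales the extrinsics

# 3D corner coordinates in the board's own frame (all on the z = 0 plane).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib/*.jpg"):        # folder of checkerboard shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        # Refine corner locations to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix K:\n", K)
print("Distortion coefficients:", dist.ravel())
```

Take 15-20 shots of the board, tilted and rotated, covering the whole field of view.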
If you want to benchmark some visual odometry algorithms with your dataset, you will definitely need the intrinsic parameters of your camera as well as its pose.
As said in #f4f's answer, the intrinsic calibration is typically done with images of a checkerboard that you tilt and rotate (see OpenCV).
This will give you parameters such as the focal length and optical center, but also the distortion coefficients, which can be important depending on your camera.
Getting the pose of the camera (i.e. the extrinsic parameters) at each frame is probably trickier. Usually the ground truth is obtained using information from additional sensors (tracking system, IMU, GPS, ...). You can have a look at the TUM RGB-D SLAM Dataset and the corresponding paper. They explain how they used a motion-capture system to get the ground-truth poses.
Recording the time of acquisition of the camera frames can also be interesting (one timestamp per frame).
Creating your own visual odometry dataset is not trivial. If you just want to create a dataset "for fun" or to do some experiments, and you have only a camera available, I would say you can just try some methods that are known to work well (like ORB-SLAM). This will give you a good approximation of the camera poses (you may have to manually fix the unknown scale).

Is there a way to construct and store a 3D Map from point cloud and depth data?

I am currently working on a SLAM algorithm, and I have succeeded in gathering the depth and RGB data in the form of a point cloud. However, I only display the frames that my Kinect 2.0 receives on the screen, and that is all.
I would like to accumulate those frames as I move the Kinect and construct a more elaborate map (either 2D or 3D) that will help me with localization or mapping.
My idea of the map construction is similar to creating a panorama image from many single snapshots.
Does anyone have a clue, an idea, or an algorithm to do this?
You can use rtabmap to create a 3D map and localize your device. It's very simple to use and supports different devices.
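If you would rather prototype the idea yourself before adopting a full SLAM package, the core step is registering each new cloud against the map built so far and merging it in. Here is a rough Python sketch with Open3D's point-to-point ICP; the file names, voxel size and distance threshold are placeholders, and it assumes small motion between consecutive frames:

```python
import numpy as np
import open3d as o3d

voxel = 0.02   # 2 cm downsampling; adjust to your scene scale
frames = ["frame_000.pcd", "frame_001.pcd", "frame_002.pcd"]   # placeholder files

world_map = o3d.io.read_point_cloud(frames[0]).voxel_down_sample(voxel)
pose = np.eye(4)   # pose of the previous frame in the map frame

for path in frames[1:]:
    cloud = o3d.io.read_point_cloud(path).voxel_down_sample(voxel)
    # Align the new frame against the accumulated map, seeded with the last pose.
    result = o3d.pipelines.registration.registration_icp(
        cloud, world_map, 5 * voxel, pose,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    pose = result.transformation
    # Move the frame into the map frame and merge it, then re-downsample.
    cloud.transform(pose)
    world_map += cloud
    world_map = world_map.voxel_down_sample(voxel)

o3d.io.write_point_cloud("map.pcd", world_map)
```

This simple frame-to-map ICP will drift over time; rtabmap adds loop closure and graph optimization, which is why it is the better choice beyond quick experiments.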

Algorithm for generating a triangular mesh from a cloud of points using kinect

I'm using the OpenNI libraries (Kinect) and OpenGL. I can capture the depth from the Kinect (using OpenNI and OpenCV) and convert it into a point cloud.
So I have 640*480 points in 3D space, and for viewing purposes I would like to generate a mesh composed of triangles.
I need an algorithm that is simple but capable of representing walls and obstacles of every kind. What can you suggest?
MeshLab provides some algorithms to do this: ball pivoting, Poisson reconstruction, and VCG. You can find implementations of them on the web.
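For example, if you prefer scripting this rather than using MeshLab's GUI, Open3D exposes both Poisson reconstruction and ball pivoting; a short Python sketch, where the input file and parameters are placeholders you would tune:

```python
import open3d as o3d

# Load the Kinect point cloud (placeholder file; it could also be built
# directly from the 640x480 depth image).
pcd = o3d.io.read_point_cloud("kinect_cloud.pcd")

# Both algorithms need oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(30)

# Option 1: Poisson reconstruction (smooth, good for walls and large surfaces).
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)

# Option 2: ball pivoting (keeps the original points as mesh vertices).
# radii = o3d.utility.DoubleVector([0.02, 0.04, 0.08])
# mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)

o3d.io.write_triangle_mesh("mesh.ply", mesh)
```

Since your points come from a 640*480 depth image, another simple option is to triangulate the pixel grid directly (two triangles per 2x2 neighbourhood, skipping invalid depths), which is fast enough to do per frame.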