How to find overlapping fields of view? - opengl

If I have 2 cameras and I'm given the positions and orientations of the cameras in the same coordinate system, is there any way I could detect overlapping fields of view? In other words, how could I tell if something that's displayed in the frame of 1 camera is also displayed in another? In addition, I'm also given the view and projection matrices of the 2 cameras.

To detect whether two fields of view overlap you'll want to do a collision check between the two viewing frustums (viewing volumes).
A frustum is a convex polyhedron, so you can use the separating axis theorem to do it.
However, if you just want to know whether an object that is displayed in the frame of one camera is also displayed in the frame of another, the simplest way is to transform the object's world-space coordinates into the viewport space of both cameras. If the resulting x lands within [0, width] and y within [0, height] for both cameras, and the depth falls between the near and far planes (i.e. the point is in front of each camera), then the object is in view of both.
This page has a great diagram of the 3D viewing/transformation pipeline if you want to read more about what view space and world space are.
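As a concrete illustration of the second approach, here is a minimal sketch assuming GLM for the matrix math; the view and projection matrices are the ones you are given, and the function names are just illustrative:

    #include <glm/glm.hpp>

    // Returns true if a world-space point projects inside the camera's viewport
    // and lies between the near and far planes.
    bool visibleToCamera(const glm::vec3& worldPos,
                         const glm::mat4& view,
                         const glm::mat4& proj)
    {
        glm::vec4 clip = proj * view * glm::vec4(worldPos, 1.0f);
        if (clip.w <= 0.0f)                       // behind the camera
            return false;
        glm::vec3 ndc = glm::vec3(clip) / clip.w; // normalized device coordinates
        return ndc.x >= -1.0f && ndc.x <= 1.0f && // maps to [0, width] in viewport space
               ndc.y >= -1.0f && ndc.y <= 1.0f && // maps to [0, height]
               ndc.z >= -1.0f && ndc.z <= 1.0f;   // between near and far planes
    }

    // A point is displayed by both cameras if it passes the test for each of them.
    bool seenByBoth(const glm::vec3& p,
                    const glm::mat4& view1, const glm::mat4& proj1,
                    const glm::mat4& view2, const glm::mat4& proj2)
    {
        return visibleToCamera(p, view1, proj1) && visibleToCamera(p, view2, proj2);
    }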

Related

How to get 3d coordinates from a 3d object file.

I am using 3 ArUco markers stuck on a 3D head phantom model to do pose estimation using OpenCV in C++. My algorithm for pose estimation gives me the translation with respect to the camera, but I now want to know the coordinates of the markers with respect to the model coordinate system. Therefore I have scanned the head model with a 3D scanner and have the object file and the texture file. My question is: what is the easiest or best way to get the coordinates of the markers with respect to the head model? Should I use OpenGL, Blender or some other software for it? Looking for some pointers or advice.
Sounds like you have the coordinates of the markers with the camera as the coordinate system, i.e. coordinates in "eye space" or camera space, which means the camera sits at the origin.
This article has a brilliant diagram which explains the different spaces and how to transform between them:
http://antongerdelan.net/opengl/raycasting.html
If you want these same coordinates in model space, you need the matrices that will get you into that space.
In this case you are going from eye/camera space -> model space, so you multiply those coordinates by the inverse view matrix and then by the inverse model matrix. Then your coordinate is in model space.
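A minimal sketch of that chain of transforms, assuming GLM and that you have the model and view matrices that were used to place the head; the function name is illustrative:

    #include <glm/glm.hpp>

    // Takes a marker position expressed in eye/camera space and returns it
    // in the model's own coordinate system.
    glm::vec3 eyeToModelSpace(const glm::vec3& markerEye,
                              const glm::mat4& view,
                              const glm::mat4& model)
    {
        glm::vec4 world = glm::inverse(view)  * glm::vec4(markerEye, 1.0f); // eye  -> world
        glm::vec4 local = glm::inverse(model) * world;                      // world -> model
        return glm::vec3(local);
    }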
But this is a lot more difficult when you are using a physical camera, as opposed to a software camera such as the one in OpenGL.
To do that you will need to use OpenCV to obtain your camera's intrinsic and extrinsic parameters.
See this tutorial for more details:
https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html
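If you go that route, here is a hedged sketch of the chessboard calibration that tutorial describes, using the OpenCV C++ API; the board size, number of shots and file names are placeholders:

    #include <opencv2/opencv.hpp>
    #include <string>
    #include <vector>

    int main()
    {
        cv::Size boardSize(9, 6);   // inner corners of the chessboard (placeholder)
        std::vector<std::vector<cv::Point3f>> objectPoints;
        std::vector<std::vector<cv::Point2f>> imagePoints;

        // Ideal 3D corner positions of the board (z = 0 plane), reused for every view.
        // Multiply by the square size in your units if you need metric translations.
        std::vector<cv::Point3f> board;
        for (int y = 0; y < boardSize.height; ++y)
            for (int x = 0; x < boardSize.width; ++x)
                board.emplace_back((float)x, (float)y, 0.0f);

        cv::Size imageSize;
        for (int i = 0; i < 20; ++i)   // however many calibration shots you took
        {
            cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
            if (img.empty()) continue;
            imageSize = img.size();

            std::vector<cv::Point2f> corners;
            if (cv::findChessboardCorners(img, boardSize, corners))
            {
                imagePoints.push_back(corners);
                objectPoints.push_back(board);
            }
        }

        cv::Mat cameraMatrix, distCoeffs;   // intrinsic parameters
        std::vector<cv::Mat> rvecs, tvecs;  // extrinsic parameters per view
        cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                            cameraMatrix, distCoeffs, rvecs, tvecs);
        return 0;
    }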

light field rendering from camera array

I'm trying to implement something similar to "Dynamically Reparameterized Light Fields" (Isaksen, McMillan, & Gortler), where the light field is a series of cameras placed on a plane:
In the paper, it discusses that we can find the corresponding camera and pixels by the following formulation: M^{F→D}_{s,t} = P_{s,t} ∘ T_F.
The dataset that I'm using doesn't contain any camera parameters or any information about the distance between the cameras; I just know that they are placed uniformly on a plane. I have a free-moving camera and I'm rendering onto a view quad, so I can get the 3D position on the focal surface, but I don't know how to get the (s,t,u,v) parameters from that. Once I have these parameters I can render correctly.
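For context, a minimal sketch of the usual two-plane parameterization: intersect the viewing ray from the free camera with the camera (s,t) plane and the focal (u,v) plane. The axis-aligned planes, their depths and the yet-unknown camera spacing are all assumptions here, since the dataset does not provide them:

    #include <glm/glm.hpp>

    // Intersects a ray (origin o, direction d) with the plane z = planeZ and
    // returns the (x, y) coordinates of the hit point on that plane.
    glm::vec2 intersectPlaneZ(const glm::vec3& o, const glm::vec3& d, float planeZ)
    {
        float t = (planeZ - o.z) / d.z;   // assumes d.z != 0
        glm::vec3 hit = o + t * d;
        return glm::vec2(hit.x, hit.y);
    }

    // Two-plane light field lookup: (s,t) on the camera plane, (u,v) on the focal plane.
    void rayToSTUV(const glm::vec3& eye, const glm::vec3& dir,
                   float cameraPlaneZ, float focalPlaneZ,
                   glm::vec2& st, glm::vec2& uv)
    {
        st = intersectPlaneZ(eye, dir, cameraPlaneZ);
        uv = intersectPlaneZ(eye, dir, focalPlaneZ);
        // To pick an actual data camera, st still has to be divided by the
        // (unknown) camera spacing and rounded to the nearest grid index.
    }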

Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I was going through a few websites, but generally they all suggest the flow for converting 3D object points to pixel points, whereas I am doing the completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u,v) from pixel coordinates (x,y) with the help of the intrinsic matrix. Can anyone shed some light on how I can recover the "Z" of the camera coordinate (X,Y,Z) from the film coordinates (u,v)?
Please guide me on how I can use OpenCV functions such as solvePnP, recoverPose, findFundamentalMat and findEssentialMat for this goal.
With a single camera and an object rotating on a fixed rotation platform I would implement something like this:
Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera's data sheet or measure them. From that and the perpendicular distance (z) you can convert any pixel position (x,y) into a 3D coordinate (x',y',z') relative to the camera. So first convert the pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
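A minimal sketch of that pixel-to-3D conversion in plain C++, with the FOV angles in radians and the same names as in the formulas above:

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // Converts a pixel (x, y) seen at a known perpendicular distance into a
    // 3D point relative to the camera, using the formulas above.
    Vec3 pixelTo3D(double x, double y,
                   double xs, double ys,       // camera resolution
                   double FOVx, double FOVy,   // field of view in radians
                   double distance)            // perpendicular distance of the point
    {
        double ax = (x - xs * 0.5) * FOVx / xs; // horizontal angle of the pixel
        double ay = (y - ys * 0.5) * FOVy / ys; // vertical angle of the pixel
        return { distance * std::tan(ax),
                 distance * std::tan(ay),
                 distance };
    }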
That is nice, but on a common image we do not know the distance. Luckily, with such a setup, if we turn our object then any convex edge will reach a maximum ax angle at the sides as it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at the known distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all of this is just simple geometry.
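Continuing the sketch above (same Vec3 struct), the un-rotation step could look roughly like this, where ang is the platform angle and center is the platform center, both relative to the camera:

    // Removes the platform rotation: translate so the platform center is the
    // origin, then rotate by -ang around the y axis.
    Vec3 unrotate(const Vec3& p, const Vec3& center, double ang)
    {
        double px = p.x - center.x;
        double pz = p.z - center.z;
        double c = std::cos(-ang), s = std::sin(-ang);
        return { c * px + s * pz,
                 p.y - center.y,
                -s * px + c * pz };
    }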
In a nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets list only FOVx, but if the pixels are square you can compute the FOVy from the resolution as
FOVx/FOVy = xs/ys
Beware: with multi-resolution camera modes the FOV can be different for each resolution!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease up the detection (a short sketch of this follows after this list)
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately) and, if a peak is detected, add its 3D position to your model. For example, assume a rotating rectangular box.
So inspect one horizontal line on all frames and find the maximal ax. To improve accuracy you can do a closed-loop regulation by turning the platform until the peak is found "exactly". Do this for each horizontal line separately.
btw. if you detect no ax change over a few frames, that means a circular shape with constant radius ... so you can handle each such frame as an ax maximum.
Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease the conversion to a mesh ... That angle can also be used as a texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough, you can use this same setup for stereoscopic 3D reconstruction, because each rotation behaves as a new (known) camera position.
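As referenced in the list above, a rough sketch of the background subtraction and per-scanline silhouette edge used for the ax peak search, assuming OpenCV, a static background frame and an arbitrary threshold:

    #include <opencv2/opencv.hpp>

    // Extracts the object silhouette by subtracting a static background frame.
    cv::Mat silhouette(const cv::Mat& frame, const cv::Mat& background)
    {
        cv::Mat diff, mask;
        cv::absdiff(frame, background, diff);
        cv::cvtColor(diff, diff, cv::COLOR_BGR2GRAY);
        cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY); // 30 is an arbitrary threshold
        return mask;
    }

    // For one horizontal line y, returns the rightmost silhouette pixel; its ax
    // angle is what you track across frames to detect the maximum.
    int rightmostSilhouetteX(const cv::Mat& mask, int y)
    {
        for (int x = mask.cols - 1; x >= 0; --x)
            if (mask.at<uchar>(y, x) > 0)
                return x;
        return -1;   // no silhouette pixel on this line
    }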
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that project to it; the technical term for this set is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one per pixel) intersects the set of rays of the other image. Which is to say, if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2 (the epipolar line). Selecting a particular pixel along that line in image #2 gives you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ about a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location via an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in that camera's image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel you selected in the first image and project it back into space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the two rays in 3D space, so find the point where the rays are closest to each other (this involves solving a quadratic polynomial, which is trivial).
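A small sketch of that last step, finding the point where two rays come closest to each other (GLM assumed; this is the closed-form solution of the quadratic minimization mentioned above):

    #include <glm/glm.hpp>

    // Returns the midpoint of the shortest segment between the rays
    // o1 + t1*d1 and o2 + t2*d2, i.e. their approximate "intersection".
    glm::vec3 closestPointBetweenRays(const glm::vec3& o1, const glm::vec3& d1,
                                      const glm::vec3& o2, const glm::vec3& d2)
    {
        glm::vec3 w = o1 - o2;
        float A = glm::dot(d1, d1), B = glm::dot(d1, d2), C = glm::dot(d2, d2);
        float D = glm::dot(d1, w),  E = glm::dot(d2, w);
        float denom = A * C - B * B;   // near zero means the rays are (almost) parallel
        float t1 = (B * E - C * D) / denom;
        float t2 = (A * E - B * D) / denom;
        return 0.5f * ((o1 + t1 * d1) + (o2 + t2 * d2));
    }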
To select which pixel you want to match between images you can use a feature/motion tracking algorithm, as used in video compression and similar. The basic idea is that for every pixel a correlation of its surroundings is computed against the same region in the previous image; the location where the correlation peaks is where the pixel most likely moved.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

How to create views from a 360 degree panorama (like Street View)

Given a spherical panorama image like the ones from Google Street View.
If I wanted to create 4 views (front, left, right and back), how do I do the transformations needed to straighten the image out, as if I were viewing it in Google Street View? Notice the green line I drew in: in the raw image it is bent, but in Street View it is straight. How can I do this?
The Street View image is a spherical map. The way Street View and Google Earth work is by rendering the scene as if you were standing at the center of a giant sphere. This sphere is textured with an image like the one in your question: the longitude on the sphere corresponds to the x coordinate of the texture and the latitude to the y coordinate.
One way to create the pictures you need would be to render the texture onto a sphere, like Google Earth does, and then take a screenshot of each of the sides.
A purely mathematical way to do it is to envision yourself at the center of a cube and a sphere at the same time. The images you are looking for are the sides of the cube. If you want to know how a specific pixel in the cube map relates to a pixel in the spherical map, make a vector that points from the center of the cube to that pixel, and then see where that same vector points on the sphere (latitude & longitude).
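A minimal sketch of that vector-to-latitude/longitude lookup for one cube face, in plain C++; only the front face is shown, and the equirectangular convention used here is an assumption (other faces differ only in how the direction vector is built):

    #include <cmath>

    // Maps a pixel (px, py) of a size x size front-face image to (u, v)
    // texture coordinates in the equirectangular (spherical) panorama.
    void frontFacePixelToSphericalUV(int px, int py, int size,
                                     double& u, double& v)
    {
        const double pi = std::acos(-1.0);

        // Direction from the cube center through this pixel of the front face (z = 1).
        double x = 2.0 * (px + 0.5) / size - 1.0;
        double y = 1.0 - 2.0 * (py + 0.5) / size;
        double z = 1.0;

        double len = std::sqrt(x * x + y * y + z * z);
        double lon = std::atan2(x, z);        // longitude in (-pi, pi]
        double lat = std::asin(y / len);      // latitude in [-pi/2, pi/2]

        // Equirectangular mapping: longitude -> u, latitude -> v, both in [0, 1].
        u = (lon + pi) / (2.0 * pi);
        v = (pi / 2.0 - lat) / pi;
    }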
I'm sure if you search the web for spherical map cube map conversion you will be able to find more examples and implementations. Good luck!

How to use a chessboard to find the rotation/translation between 2 cameras

I am using opencv with C, and I am trying to get the extrinsic parameters (Rotation and translation) between 2 cameras.
I'm told that a checkerboard pattern can be used to calibrate, but I can't find any good samples on this. How do I go about doing this?
edit
The suggestions given are for calibrating a single camera with a checkerboard. How would you find the rotation and translation between 2 cameras given the checkerboard images from both views?
I was using code from http://www.starlino.com/opencv_qt_stereovision.html. It has some useful information, and the author's code is pretty easy to understand and analyze. It covers both chessboard calibration and getting a depth image from stereo cameras. I think it's based on this OpenCV book.
The OpenCV library, and about 3 chapters of the OpenCV book.
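For the two-camera case specifically, OpenCV's stereoCalibrate returns the rotation and translation between the cameras directly. A rough sketch with the C++ API, assuming you have already detected matching chessboard corners in both views and calibrated each camera's intrinsics:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // objectPoints: the chessboard's 3D corner positions, one set per image pair.
    // imagePoints1/2: the corners detected by cv::findChessboardCorners in
    // camera 1 / camera 2 for the same image pairs.
    void extrinsicsBetweenCameras(
        const std::vector<std::vector<cv::Point3f>>& objectPoints,
        const std::vector<std::vector<cv::Point2f>>& imagePoints1,
        const std::vector<std::vector<cv::Point2f>>& imagePoints2,
        cv::Mat& cameraMatrix1, cv::Mat& distCoeffs1,  // intrinsics from single-camera calibration
        cv::Mat& cameraMatrix2, cv::Mat& distCoeffs2,
        cv::Size imageSize)
    {
        cv::Mat R, T, E, F;  // R, T: rotation and translation from camera 1 to camera 2
        cv::stereoCalibrate(objectPoints, imagePoints1, imagePoints2,
                            cameraMatrix1, distCoeffs1,
                            cameraMatrix2, distCoeffs2,
                            imageSize, R, T, E, F,
                            cv::CALIB_FIX_INTRINSIC);
        // E and F (essential and fundamental matrices) come out as a by-product.
    }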
A picture from a camera is just a projection of a bunch of color samples onto a plane. Assuming that the camera itself creates pictures with square pixels, the possible positions of the point seen at a given pixel lie along a vector from the camera's origin through that pixel on the plane it was projected onto. We'll refer to that plane as the picture plane.
One sample doesn't give you that much information. Two samples tell you a little bit more: the position of the camera relative to the plane created by three points, the two sample points and the camera position. And a third sample tells you the relative position of the camera in the world; this will be a single point in space.
If you take the same three samples and find them in another picture taken from a different camera, you will be able to determine the relative position of the cameras from the three samples (and their orientations, based on the right and up vectors of the picture plane). To get the correct distance, you need to know the distance between the actual sample points; in the case of a checkerboard, that is the physical dimensions of the checkerboard.