Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Consider the following diagram and equations representing a pinhole camera:
Suppose the image size is W times H pixels, and that there is no nonlinear distortion. To compute the field of view I proceed as in the picture below:
where \tilde{H} is the image height in the image plane, not in pixel coordinates, and s_y is the height of a pixel in image plane units.
In an exercise I'm told to account for the fact that the principal point might not be in the image center.
How could this happen, and how do we correct the FOV in this case?
Moreover, suppose the image was distorted as follows, before being projected on the pixel coordinates:
How do we account for the distortion in the FOV? How is it even defined?
The principal point may not be centered in the image for a variety of reasons, for example, the lens may be slightly decentered due to the mechanics of the mount, or the image may have been cropped.
To compute the FOV with a decentered principal point you just redo your computation separately for the angles to the left and right sides of the focal axis (for the horizontal FOV, above and below for the vertical), and add the angles up.
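As a minimal sketch of the two-sided computation described above, assuming the focal length is already expressed in pixels (f divided by the pixel size) and that the names `fov_degrees`, `size_px`, `focal_px` and `principal_px` are purely illustrative:

```python
import math

def fov_degrees(size_px, focal_px, principal_px):
    """Field of view along one image axis, in degrees.

    size_px      -- image extent along that axis (W or H), in pixels
    focal_px     -- focal length in pixel units along that axis (assumed known)
    principal_px -- principal point coordinate along that axis, in pixels
    """
    # Angle between the focal axis and the image edge at pixel coordinate 0.
    near_side = math.atan2(principal_px, focal_px)
    # Angle between the focal axis and the image edge at pixel coordinate size_px.
    far_side = math.atan2(size_px - principal_px, focal_px)
    # The two half-angles are generally unequal; the FOV is their sum.
    return math.degrees(near_side + far_side)

centered = fov_degrees(640, 500.0, 320.0)    # principal point in the middle
decentered = fov_degrees(640, 500.0, 300.0)  # principal point shifted by 20 px
```

With a centered principal point this reduces to the usual 2 * atan(W / (2 f)); shifting the principal point makes the two half-angles differ and slightly reduces their sum.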
The FOV is defined in exactly the same way, as the angle between the light rays that project to the left and right extrema of the image row containing the principal point. To compute it you first need to undistort those pixel coordinates. For ordinary photographic lenses, where the barrel term dominates the distortion, the result is a slightly larger FOV than what you compute ignoring the distortion. Note also that, due to the nonlinearity of the distortion, the horizontal, vertical and diagonal FOVs are not simply related through the image aspect ratio when the distortion is taken into account.
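A rough sketch of that procedure follows. The distortion model is an assumption, since the one in the question is not reproduced here: a single radial coefficient `k1` acting on normalized coordinates, x_d = x_u * (1 + k1 * r_u^2), with k1 < 0 giving barrel distortion. On the row through the principal point y = 0, so r is just |x|. The function names are illustrative.

```python
import math

def undistort_1d(x_d, k1, iterations=20):
    """Invert x_d = x_u * (1 + k1 * x_u**2) by fixed-point iteration.

    Assumes mild distortion, so that the iteration converges.
    """
    x_u = x_d
    for _ in range(iterations):
        x_u = x_d / (1.0 + k1 * x_u * x_u)
    return x_u

def horizontal_fov_degrees(width_px, focal_px, cx_px, k1):
    # Distorted, normalized coordinates of the two ends of the row
    # that contains the principal point.
    left_d = (0.0 - cx_px) / focal_px
    right_d = (width_px - cx_px) / focal_px
    # Undistort both ends, then measure the angle between the two rays.
    left_u = undistort_1d(left_d, k1)
    right_u = undistort_1d(right_d, k1)
    return math.degrees(math.atan(right_u) - math.atan(left_u))

no_distortion = horizontal_fov_degrees(640, 500.0, 320.0, 0.0)
barrel = horizontal_fov_degrees(640, 500.0, 320.0, -0.1)
```

Under this assumed model the barrel case yields the larger angle, in line with the remark above. A real calibration would use whatever distortion model and coefficients the calibration tool reports.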
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I am interested in generating a 360 degree rendition of a real terrain model (a triangulated terrain) using OpenGL so that I can extract accurate 3D information in the form of depth, orientation (azimuth) and angle of elevation. That is, for each pixel I want to end up with accurate information about the angle of elevation, azimuth and depth as measured from the camera position. The 360 degree view would be stitched together after the camera is rotated around. My question is: how accurate would the information be?
If I had a camera width of 100 pixels, a horizontal field of view of 45 degrees and rotated 8 times around, would each orientation (1/10th of a degree) have the right depth and angle of elevation?
If this is not accurate due to projection, is there a way to adjust for any deviations?
Just as an illustration, the figure below shows a panorama I created (not with OpenGL). The image has 3600 columns (one per 1/10th of a degree in azimuth, so each column covers the same angular unit) and stores depth (in meters) and elevation (not the angle of elevation). This was computed programmatically without OpenGL.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a 3D scene with perspective projection, and I can select an object in the scene. I need to draw axes for the selected object. The problem is that the axes don't keep their size under perspective projection: if the object is far from the eye (camera), the axes become small too.
How to draw axes with the same size regardless of the position of the eye (camera)?
There are two ways to achieve this:
Shader only approach
When looking at the perspective projection, the size change according to depth is caused by the perspective divide (dividing all components by w). If you want to prevent this from happening, you can multiply the x and y coordinates of the projected vertices by the w-coordinate, which cancels out the perspective divide. It's a bit tricky to do because the correction has to be applied before all other transformations, but something along these lines should work for the general case:
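// Clip-space position of the unscaled vertex; its w is the perspective divisor.
vec4 clip = MVP * vec4(pos, 1.0);
float sz = clip.w;
// Pre-scale x and y by w so that the later perspective divide roughly cancels.
gl_Position = MVP * vec4(pos.xy * sz, pos.z, 1.0);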
Drawback: Needs a specialized shader
CPU approach
The other option is to render the axes with an orthographic projection, while calculating on the CPU the location where they have to be placed.
This can, for example, be done by projecting the target location with the perspective projection and then performing the perspective divide. The resulting x,y components give the location in screen space where the axes have to be placed.
Now use this position to render the axes with an orthographic projection, which will maintain their size no matter how far away they are.
Drawbacks: With this approach depth values might not be compatible with the perspective projected part of the scene.
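The CPU-side placement could be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the target position is already in eye space, the projection matrix has the standard OpenGL perspective form (the values documented for gluPerspective), and the helper names are hypothetical.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """Row-major 4x4 perspective matrix in the standard OpenGL form."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def gizmo_screen_position(eye_space_pos, proj, width, height):
    """Pixel position (origin bottom-left) where the axes should be drawn."""
    clip = mat_vec(proj, list(eye_space_pos) + [1.0])
    # Perspective divide: clip space -> normalized device coordinates.
    ndc_x = clip[0] / clip[3]
    ndc_y = clip[1] / clip[3]
    # Map NDC [-1, 1] to pixel coordinates.
    return ((ndc_x * 0.5 + 0.5) * width, (ndc_y * 0.5 + 0.5) * height)

proj = perspective(90.0, 1.0, 0.1, 100.0)
center = gizmo_screen_position((0.0, 0.0, -5.0), proj, 800, 600)
offset = gizmo_screen_position((1.0, 0.0, -5.0), proj, 800, 600)
```

The returned pixel position can then serve as the origin of the axes under an orthographic projection set up in pixel units, so their on-screen length stays constant.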
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have two questions that have been very lacking in answers on Google.
My first question - generating planes
I am trying to calculate the 4 vertices for a finite plane based on a provided normal, a position, and a radius. How can I do this? An example of some pseudo-code or a description of an algorithm to produce the 4 vertices of a finite plane would be much appreciated.
Furthermore, it would be useful to know how to rotate a plane with an arbitrary normal to align with another plane, such that their normals are the same, and their vertices are aligned.
My second question - distance to points on a cube
How do I calculate the distance to a point on the surface of a cube, given a vector from the centre of the cube?
This is quite hard to explain, and so my google searches on this have been hard to phrase well.
Basically, I have a cube with side length s. I have a vector from the centre of the cube v, and I want to know the distance from the centre of the cube to the point on the surface that that vector points to. Is there a generalised formula that can tell me this distance?
An answer to either of these would be appreciated, but a solution to the cube distance problem is the one that would be more convenient at this moment.
Thanks.
Edit:
I say "finite plane", what I mean is a quad. Forgive me for bad terminology, but I prefer to call it a plane, because I am calculating the quad based on a plane. The quad's vertices are just 4 points on the surface of the plane.
Second Question:
Say your vector is v=(x,y,z)
Since s is the side length, each face lies at distance s/2 from the centre. So the point where the vector hits the cube surface is the point where the largest coordinate in absolute value equals s/2, or mathematically:
(x,y,z) * (s/(2m))
where
m = max{ |x| , |y| , |z| }
The distance is:
|| (x,y,z) * (s/(2m)) || = sqrt(x^2 + y^2 + z^2) * (s/(2 * max{ |x| , |y| , |z| }))
We can also formulate the answer in norms:
distance = (s/2) * ||v||_2 / ||v||_inf
(These are the l2 norm and the l-infinity norm)
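A small sketch of this computation, taking s to be the full side length of an axis-aligned cube so that each face lies s/2 from the centre (the function name is illustrative):

```python
import math

def distance_to_cube_surface(v, side):
    """Distance from the centre of an axis-aligned cube to its surface along v.

    v    -- direction vector (x, y, z) from the cube centre, non-zero
    side -- full side length s of the cube (faces lie at side / 2)
    """
    x, y, z = v
    linf = max(abs(x), abs(y), abs(z))       # l-infinity norm of v
    if linf == 0.0:
        raise ValueError("direction vector must be non-zero")
    l2 = math.sqrt(x * x + y * y + z * z)    # l2 norm of v
    return (side / 2.0) * l2 / linf

along_axis = distance_to_cube_surface((1.0, 0.0, 0.0), 2.0)
to_corner = distance_to_cube_surface((1.0, 1.0, 1.0), 2.0)
```

Along a coordinate axis the result is half the side length, and towards a corner it is half the space diagonal.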
I am trying to understand the basic principles of 3D reconstruction, and have chosen to play around with OpenMVG. However, I have seen evidence that the following concepts I'm asking about apply to all/most SfM/MVS tools, not just OpenMVG. As such, I suspect any Computer Vision engineer should be able to answer these questions, even if they have no direct OpenMVG experience.
I'm trying to fully understand intrinsic camera parameters, or as they seem to be called, "camera intrinsics" or "intrinsic parameters". According to OpenMVG's documentation, camera intrinsics depend on the type of camera used to take the pictures (i.e., the camera model), and OpenMVG supports five models:
Pinhole: 3 intrinsic parameters (focal, principal point x, principal point y)
Pinhole Radial 1: 4 intrinsic params (focal, principal point x, principal point y, one radial distortion factor)
Pinhole Radial 3: 6 params (focal, principal point x, principal point y, 3 radial distortion factors)
Pinhole Brown: 8 params (focal, principal point x, principal point y, 5 distortion factors (3 radial + 2 tangential))
Pinhole w/ Fish-Eye Distortion: 7 params (focal, principal point x, principal point y, 4 distortion factors)
This is all explained on their wiki page that explains their camera model, which is the subject of my question.
On that page there are several core concepts that I need clarification on:
focal plane: What is it and how does it differ from the image plane (as shown in the diagram at the top of that page)?
focal distance/length: What is it?
principal point: What is it, and why should it ideally be the center of the image?
scale factor: Is this just an estimate of how far the camera is from the image plane?
distortion: What is it and what's the difference between its various subtypes:
radial
tangential
fish-eye
Thanks in advance for any clarification/correction here!
I am unsure about the focal plane, so I will come back to it after I write about the other concepts you mention. Suppose you have a pinhole camera model with rectangular pixels, and let P=[X Y Z]^T be a point in camera space, with ^T denoting the transpose. In that case (assuming Z is the camera axis), this point can be projected as p=KP where K (the calibration matrix) is
f_x 0 c_x
0 f_y c_y
0 0 1
(of course, you will want to divide p by its third coordinate after that).
The focal length, which I will denote f, is the distance between the camera center and the image plane. The variables
f_x=s_x*f
f_y=s_y*f
in the matrix above express this value in units of pixel width and pixel height, respectively. The variables s_x and s_y are the scale factors that are mentioned on the page you cite. The scale factor is the number of pixels (along the width or the height) per unit of length in camera space. So, for example, if your pixel widths are half the size of the units you use on the x axis of camera space, you will have s_x=2.
I have seen people use the term principal point to refer to different things. While some people define it as the intersection between the camera axis and the image plane (Wikipedia seems to do this), others define it as the point given by [c_x c_y]^T. For clarity's sake, let's separate the whole projection process:
The two terms on the right-hand side of the equation do different things. The first one scales the point and puts it into the image plane. The second term (i.e. [c_x c_y 1]^T) shifts the result of the first term. So, relative to the point where the camera axis meets the image plane, the origin of the image's coordinate system lies at [-c_x, -c_y]^T.
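The equation itself is not reproduced here. One decomposition consistent with the description, offered as an assumption rather than the original formula, is p/Z = [f_x X/Z, f_y Y/Z, 0]^T + [c_x, c_y, 1]^T. The sketch below checks numerically that it agrees with p = KP followed by division by the third coordinate; the function names are illustrative.

```python
def project(point, fx, fy, cx, cy):
    """Apply p = K P, then divide by the third coordinate."""
    X, Y, Z = point
    p = (fx * X + cx * Z, fy * Y + cy * Z, Z)
    return (p[0] / p[2], p[1] / p[2])

def project_split(point, fx, fy, cx, cy):
    """Same projection written as 'scale into the image plane' plus 'shift'."""
    X, Y, Z = point
    scaled = (fx * X / Z, fy * Y / Z)   # first term: scaling by the focal lengths
    shift = (cx, cy)                    # second term: offset by the principal point
    return (scaled[0] + shift[0], scaled[1] + shift[1])

pixel = project((1.0, 2.0, 4.0), 500.0, 500.0, 320.0, 240.0)
pixel_split = project_split((1.0, 2.0, 4.0), 500.0, 500.0, 320.0, 240.0)
on_axis = project((0.0, 0.0, 2.0), 500.0, 500.0, 320.0, 240.0)
```

A point on the camera axis (X = Y = 0) lands exactly on [c_x c_y]^T, which is why the two definitions of the principal point mentioned above coincide in this model.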
As for the difference between tangential/radial distortion: usually when correcting distortion, we assume that the center of the image o remains undistorted. A pixel p will have "moved away" from its true position q under the effect of distortion. If that movement is along the vector q-o then the distortion is radial, but if that movement has a component in a different direction, it is said to (also) have tangential distortion.
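To make the distinction concrete, the displacement p - q can be split into a component along q - o and a component perpendicular to it. The helper below is only an illustrative sketch in 2D image coordinates, not part of OpenMVG or any other library:

```python
import math

def split_displacement(o, q, p):
    """Split the distortion displacement p - q into radial and tangential parts.

    o -- undistorted center of the image
    q -- true (undistorted) position of the pixel, different from o
    p -- observed (distorted) position of the pixel
    Returns (radial, tangential) as signed lengths.
    """
    rx, ry = q[0] - o[0], q[1] - o[1]
    norm = math.hypot(rx, ry)
    if norm == 0.0:
        raise ValueError("q must differ from o to define a radial direction")
    ux, uy = rx / norm, ry / norm          # unit vector along q - o
    dx, dy = p[0] - q[0], p[1] - q[1]      # displacement caused by distortion
    radial = dx * ux + dy * uy             # component along q - o
    tangential = -dx * uy + dy * ux        # component along the perpendicular (-uy, ux)
    return radial, tangential

radial, tangential = split_displacement((0.0, 0.0), (1.0, 0.0), (1.1, 0.05))
```

A purely radial distortion gives a zero tangential component for every pixel; any non-zero tangential component indicates tangential distortion as well.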
As I said I'm a bit unsure about what the focal plane they show in their figure means, but I think the term usually refers to the plane on which the upside-down image would form in a physical pinhole camera. A point P on the image plane (expressed in world coordinates) would just be -P on the focal plane.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I want to make a box follow the mouse position, but I don't know how to convert the position that I get from sf::Mouse::getPosition() to the coordinate in OpenGL.
If you can, try using the gluUnProject function from the GLU library. Otherwise, you will need to reimplement it by computing the inverse matrices of both modelview and projection, then applying them in reverse order (i.e. reverse projection, then reverse modelview) to your screen point. You may have to add an extra step to convert the window canvas coordinates back to the projection screen coordinates (that step depends on your projection setup).
I provided a sample programme using SDL and gluUnproject in that answer.
Note that:
the modelview inverse can be computed trivially by successively applying the opposite transformations in the reverse order.
For instance, if you set your modelview from identity first by a translation <x,y,z>, then a rotation <a,b,c>, all you need to do is to set it to the <-a,-b,-c> rotation, and then apply the <-x,-y,-z> translation to get the inverse modelview.
For the projection inverse matrix, the red book appendix F (pointer courtesy of that gamedev.net page, though the link is broken there) gives a solution.
This will only provide you the matrices to unproject a point from the homogeneous OpenGL projection space. You first need to pick a point from that space. That point may be chosen using the screen coordinates, first transformed back into the projection space. In my example, this involves flipping the coordinates with regard to the canvas dimensions (but things could perhaps be different with another projection setup) and then extending them to 3D by adding a carefully chosen z component.
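The flipping and extension to 3D could look roughly like this. It is a sketch that assumes window-relative mouse coordinates with the origin at the top-left corner (typical for windowing libraries), the default OpenGL depth range [0, 1], and a hypothetical helper name `window_to_ndc`:

```python
def window_to_ndc(mouse_x, mouse_y, width, height, depth=0.0):
    """Convert window pixel coordinates to homogeneous normalized device coordinates.

    depth -- window-space depth in [0, 1]; 0 is the near plane, 1 the far plane.
    """
    ndc_x = 2.0 * mouse_x / width - 1.0
    # Window y grows downwards, OpenGL y grows upwards: flip it.
    ndc_y = 1.0 - 2.0 * mouse_y / height
    # The chosen depth supplies the missing third component.
    ndc_z = 2.0 * depth - 1.0
    return (ndc_x, ndc_y, ndc_z, 1.0)

center_near = window_to_ndc(400, 300, 800, 600)
bottom_right_far = window_to_ndc(800, 600, 800, 600, depth=1.0)
```

Multiplying the result by the inverse projection and inverse modelview matrices, then dividing by the resulting w, gives the unprojected point; unprojecting at depth 0 and depth 1 gives two points that define a picking ray.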
That said, in the example programme of the other question, the goal was to cast a ray passing through the projected pixel into the scene, figure out the distance from that line to points in the scene, and pick the closest one. You might be able to avoid the whole unproject business by noticing that the mouse always moves in the camera projection plane. Hence the translation vector for the object will necessarily be composed of the X and Y unit vectors of the camera (I am assuming that Z is the axis perpendicular to the screen, as usual in OpenGL), both scaled by a factor depending on the distance of the object to the camera.
You will get something like that:
+--------+ object translation plane
|       /
|      /
|     /
|    /
+----+ screen plane
|   /
|  /
| /
|/
+ camera eye position
You can get the scaling factor from the Intercept theorem, and the X and Y camera vectors from the first and second columns of the modelview matrix.
The final translation vector should be something along the lines of:
T = f * (dx * X + dy * Y)
where f is the scaling factor, X and Y the camera vectors, and <dx,dy> the mouse coordinates delta vector in the projection space.
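As an illustration of the formula, here is a minimal sketch in which the scaling factor comes from the intercept theorem as the ratio of the object's distance to the screen plane's distance from the eye. The camera vectors are passed in directly, and all names are hypothetical:

```python
def drag_translation(dx, dy, cam_x, cam_y, object_distance, screen_distance):
    """World-space translation T = f * (dx * X + dy * Y) for a mouse move.

    dx, dy          -- mouse delta in projection-plane units
    cam_x, cam_y    -- camera X and Y unit vectors, as 3-tuples
    object_distance -- distance from the eye to the object's translation plane
    screen_distance -- distance from the eye to the screen (projection) plane
    """
    # Intercept theorem: lengths scale with the distance from the eye.
    f = object_distance / screen_distance
    return tuple(f * (dx * x + dy * y) for x, y in zip(cam_x, cam_y))

T = drag_translation(0.1, -0.2, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 10.0, 2.0)
```

Here the object is five times farther away than the screen plane, so a mouse move of (0.1, -0.2) on the screen plane moves the object by (0.5, -1.0) along the camera's X and Y axes.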
You know your window resolution, and the mouse position relative to the window. From there you can determine a normalized coordinate in [0,1]. From this coordinate, you can then project a ray into your scene, and using the inverse of your projection*view matrix, can turn this into a world-space ray.
Then it is up to you to intersect the world space ray against your scene objects (via collision detection) to determine the "clicked on" objects (note that there may be more than one due to depth; usually you want the closest hit). This all depends on how you have organized your scene's spatial information and this is all made faster if you have some spatial partitioning structures (e.g. octree or BSP) for quick culling and simplified bounding boxes (e.g. AABBs or spheres) on your "scene objects" for a fast broad phase.
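A compact sketch of the picking pipeline is given below. To stay short it sidesteps the general inverse of projection*view by assuming a camera at the origin looking down -Z with a symmetric perspective projection, and it uses bounding spheres as the "scene objects". All names are illustrative.

```python
import math

def mouse_ray(mouse_x, mouse_y, width, height, fovy_deg):
    """Ray through a pixel for a camera at the origin looking down -Z."""
    ndc_x = 2.0 * mouse_x / width - 1.0
    ndc_y = 1.0 - 2.0 * mouse_y / height          # window y points down
    half_h = math.tan(math.radians(fovy_deg) / 2.0)
    half_w = half_h * width / height
    d = (ndc_x * half_w, ndc_y * half_h, -1.0)
    n = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / n for c in d)

def ray_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    if t < 0.0:                                   # origin is inside the sphere
        t = -b + math.sqrt(disc)
    return t if t >= 0.0 else None

def pick(origin, direction, spheres):
    """Name of the closest sphere hit by the ray, or None."""
    hits = []
    for name, center, radius in spheres:
        t = ray_sphere(origin, direction, center, radius)
        if t is not None:
            hits.append((t, name))
    return min(hits)[1] if hits else None

scene = [("far", (0.0, 0.0, -10.0), 1.0), ("near", (0.0, 0.0, -5.0), 1.0)]
origin, direction = mouse_ray(400, 300, 800, 600, 60.0)
picked = pick(origin, direction, scene)
```

With a real view matrix, the ray origin and direction would additionally be transformed by the inverse view matrix, and a spatial partitioning structure would replace the linear loop over objects.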
I would say more, but "the coordinate in OpenGL" is highly underspecified. Usually, you are not only interested in the coordinate, but also the "scene object" it meaningfully belongs to, and a whole bunch of other properties.