OpenCV Image Coordinate System - c++

I understand that the origin of the coordinate system in OpenCV is in the top left corner of the image, but I have not yet found a definitive answer to the question of whether integer coordinates fall on the top-left border of a pixel or on its center.
I ran cv::aruco::interpolateCornersCharuco( ... ) and cv::findChessboardCorners( ... ) on the same ChArUco board and the results were inconsistent. The former suggests that the coordinate grid lies on the pixel borders, the latter that it lies on the pixel centers. The drawing functions (like cv::circle( ... )) also suggest the pixel centers. I'm all the more confused because cv::aruco::interpolateCornersCharuco( ... ) and cv::calibrateCamera( ... ) both seem to use cv::solvePnP( ... ) without adjusting the coordinates. Maybe I overlooked something, but this suggests to me that the functions would yield inconsistent camera parameters for the same input images.
For camera matrices the question translates to: is the principal point of a perfect pinhole camera ( img_width / 2, img_height / 2 ) or ( ( img_width - 1 ) / 2, ( img_height - 1 ) / 2 )? For the simplest case of a point on the optical axis, one can see that with the former the projection lies at the image center of the pixel-border coordinate system, while with the latter it lies at the image center of the pixel-center coordinate system.
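A quick way to probe this (just a minimal sketch; the focal length of 500 and the choice of cx, cy below are arbitrary assumptions for illustration) is to project a single point on the optical axis and look at where it lands:
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <iostream>
#include <vector>

int main()
{
    const int w = 640, h = 480;
    // Principal point under the pixel-center convention: ( (w-1)/2, (h-1)/2 ).
    cv::Mat K = (cv::Mat_<double>(3, 3) <<
        500.0, 0.0, (w - 1) / 2.0,
        0.0, 500.0, (h - 1) / 2.0,
        0.0, 0.0, 1.0);

    // One 3D point on the optical axis, 1 unit in front of the camera.
    std::vector<cv::Point3f> obj{ {0.f, 0.f, 1.f} };
    std::vector<cv::Point2f> img;
    cv::projectPoints(obj, cv::Vec3d(0, 0, 0), cv::Vec3d(0, 0, 0), K, cv::noArray(), img);

    // Prints (319.5, 239.5): the image center under the pixel-center convention.
    // With cx = w/2, cy = h/2 it would print (320, 240), the image center under
    // the pixel-border convention.
    std::cout << img[0] << std::endl;
    return 0;
}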

Related

Invalid camera calibration for a head-mounted Eye Tracking system

I'm working on an Eye Tracking system with two cameras mounted on some kind of glasses. There are optical lenses so that the screen is perceived at around 420 mm from the eye.
From a few dozen pupil samples, we compute two eye models (one for each camera), located in their respective camera coordinates system. This is based on the works here, but modified so that an estimation of the eye center is found using some kind of brute-force approach to minimize the ellipse projection error on the model given its center position in camera space.
Theoretically, an approximation of the camera parameters would be symmetrical about the Y axis of the lenses. So each camera should be at roughly (±17.5 mm, 0, 3.3) with respect to the lens coordinate system, with a rotation of around 42.5 degrees about the Y axis.
However, with these values, there is an offset in the result. See below:
The red point is the gaze center estimated by the left eye tracker, the white one by the right eye tracker, in screen coordinates.
The screen limits are represented by the white lines.
The green line is the gaze vector, in camera coordinates (projected in 2D for visualization)
The two camera centers found, projected in 2D, are in the middle of the eye (the blue circle).
The pupil samples and current pupils are represented by the ellipses with matching colors.
The offset on x isn't constant, which means the rotation about Y is not exact, and the positions of the cameras aren't precise either. In order to fix it, we used: this to calibrate and then this to get the rotation parameters from the rotation matrix.
We added a camera in the middle of the lenses (close to the theoretical (0,0,0) point?) to get the extrinsic and intrinsic parameters of the cameras relative to our lens center. However, with about 50 checkerboard captures from different positions, the results given by OpenCV don't seem correct.
For example, it gives for a camera a position of about (-14,0,10) in lens coordinates for the translation and something like (-2.38, 49, -2.83) as rotation angles in degrees.
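For reference, a rough sketch of how such numbers are typically obtained with OpenCV (the function and variable names below are illustrative assumptions, not the poster's actual code):
#include <opencv2/calib3d.hpp>
#include <vector>

// objectPoints / imagePoints: per-view checkerboard correspondences.
void poseFromCheckerboards(const std::vector<std::vector<cv::Point3f>>& objectPoints,
                           const std::vector<std::vector<cv::Point2f>>& imagePoints,
                           cv::Size imageSize)
{
    cv::Mat K, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, imageSize, K, distCoeffs, rvecs, tvecs);

    // Pose of the board in *camera* coordinates for the first view.
    cv::Mat R;
    cv::Rodrigues(rvecs[0], R);                            // rotation vector -> 3x3 matrix

    cv::Mat mtxR, mtxQ;
    cv::Vec3d eulerDeg = cv::RQDecomp3x3(R, mtxR, mtxQ);   // Euler angles in degrees
    (void)eulerDeg;                                        // tvecs[0] is the matching translation

    // To express one eye camera in the reference (lens-center) camera's frame,
    // these per-view poses still have to be chained:
    // T_ref<-cam = T_ref<-board * inverse(T_cam<-board).
}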
The previous screenshots were taken with these parameters. The theoretical ones are a bit further apart, but are more likely to reach the screen borders, unlike the OpenCV values.
This is probably because the test camera is in front of the optics, not behind, where our real (0,0,0) would be located (we just add the distance at which the screen is perceived on the Z axis afterwards, which is 420 mm).
However, we have no way to put the camera in (0, 0, 0).
As the system is compact (everything is captured within a few cm²), each degree or millimeter can change the result drastically, so without precise values for the cameras we're a bit stuck.
Our objective here is to find an accurate way to get the extrinsic and intrinsic parameters of each camera, so that we can compute a precise position of the center of the eye of the person wearing the glasses, without any other calibration procedure than looking around (so no fixation points).
Right now the system is precise enough that we get a global indication of where someone is looking on the screen, but there is a divergence between the right and left cameras; it's not precise enough. Any advice or hint that could help us is welcome :)

Relate texture areas of a cube to the current Oculus viewport

I'm creating a 360° image player using the Oculus Rift SDK.
The scene is composed of a cube, and the camera is placed at its center with just the ability to rotate around yaw, pitch and roll.
I've drawn the object using OpenGL, with a 2D texture for each cube face to create the 360° effect.
I would like to find the portion of the original texture that is actually shown in the Oculus viewport at a given instant.
Up to now, my approach has been to find an approximate pixel position of some significant points of the viewport (i.e. the central point and the corners) using the Euler angles, in order to identify the corresponding areas in the original textures.
Considering all the problems of using Euler angles, this does not seem the smartest way to do it.
Is there any better approach to accomplish it?
Edit
I did a small example that can be run in the render loop:
//Keep the Orientation from Oculus (Point 1)
OVR::Matrix4f rotation = Matrix4f(hmdState.HeadPose.ThePose);
//Find the vector with respect to a certain point in the viewport, in this case the center (Point 2)
FovPort fov_viewport = FovPort::CreateFromRadians(hmdDesc.CameraFrustumHFovInRadians, hmdDesc.CameraFrustumVFovInRadians);
Vector2f temp2f = fov_viewport.TanAngleToRendertargetNDC(Vector2f(0.0,0.0));// these values are the tangents at the center
Vector3f vector_view = Vector3f(temp2f.x, temp2f.y, -1.0);// just add the third component for the viewing direction
vector_view.Normalize();
//Apply the rotation (Point 3)
Vector3f final_vect = rotation.Transform(vector_view);//seems the right operation.
//An example to check if we are looking at the front face (Partial point 4)
if (std::abs(final_vect.z) > std::abs(final_vect.x) && std::abs(final_vect.z) > std::abs(final_vect.y) && final_vect.z < 0){ // std::abs so the float isn't truncated to int
system("pause");
}
Is it right to consider the entire viewport, or should this be done for each single eye?
How can a different point of the viewport, other than the center, be indicated? I don't really understand which values should be the input of TanAngleToRendertargetNDC().
You can get a full rotation matrix by passing the camera pose quaternion to the OVR::Matrix4 constructor.
You can take any 2D position in the eye viewport and convert it to its camera space 3D coordinate by using the fovPort tan angles. Normalize it and you get the direction vector in camera space for this pixel.
If you apply the rotation matrix obtained earlier to this direction vector, you get the actual direction of that ray.
Now you have to convert from this direction to your texture UV. The component with the highest absolute value in the direction vector will give you the face of the cube it's looking at. The remaining components can be used to find the actual 2D location on the texture. This depends on how your cube faces are oriented, if they are x-flipped, etc.
If you are at the rendering part of the viewer, you will want to do this in a shader. If this is to find where the user is looking in the original image, or the extent of their field of view, then only a handful of rays would suffice, as you wrote.
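As a concrete (hypothetical) example of the direction-to-face-and-UV step, a standard cubemap-style lookup would look roughly like this; the face ordering and any flipping are assumptions and have to match how the six textures are actually oriented:
#include <cmath>

struct FaceUV { int face; float u; float v; };  // face: 0=+X, 1=-X, 2=+Y, 3=-Y, 4=+Z, 5=-Z

FaceUV directionToCubeUV(float x, float y, float z)
{
    const float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    FaceUV r;
    float ma, uc, vc;
    if (ax >= ay && ax >= az) {                   // +/-X face
        ma = ax; uc = (x > 0) ? -z : z; vc = -y;  r.face = (x > 0) ? 0 : 1;
    } else if (ay >= az) {                        // +/-Y face
        ma = ay; uc = x; vc = (y > 0) ? z : -z;   r.face = (y > 0) ? 2 : 3;
    } else {                                      // +/-Z face
        ma = az; uc = (z > 0) ? x : -x; vc = -y;  r.face = (z > 0) ? 4 : 5;
    }
    // Map from [-1, 1] to [0, 1] texture coordinates.
    r.u = 0.5f * (uc / ma + 1.0f);
    r.v = 0.5f * (vc / ma + 1.0f);
    return r;
}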
edit
Here is a bit of code to go from tan angles to camera space coordinates.
float u = (x / eyeWidth) * (leftTan + rightTan) - leftTan;
float v = (y / eyeHeight) * (upTan + downTan) - upTan;
float w = 1.0f;
x and y are pixel coordinates, eyeWidth and eyeHeight are the eye buffer size, and the *Tan variables are the fovPort values. I first express the pixel coordinate in the [0..1] range, then scale that by the total tan angle for that direction, and then recenter.

OpenGL: Size of a 3D bounding box on screen

I need a simple and fast way to find out how big a 3D bounding box appears on screen (for LOD calculation) by using OpenGL Modelview and Projection matrices and the OpenGL Viewport dimensions.
My first intention is to project all 8 box corners onto the screen using gluProject() and calculate the area of the convex hull afterwards. This solution only works for bounding boxes that are fully within the view frustum.
But how can I get the covered area on screen for boxes that are not fully within the viewing volume? Imagine a box where 7 corners are behind the near plane and only one corner is in front of the near plane and thus within the view frustum.
I have found another very similar question Screen Projection and Culling united but it does not cover my problem.
What about using occlusion queries and getting the number of samples that pass rendering?
See http://www.opengl.org/wiki/Query_Object and GL_SAMPLES_PASSED;
that way you can measure how many fragments are rendered and compare that for proper LOD selection.
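A minimal sketch of that idea (it assumes an existing GL context and a drawBoundingBoxGeometry() helper of your own that renders just the box, ideally with color and depth writes disabled):
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_SAMPLES_PASSED, query);
drawBoundingBoxGeometry();                              // render only the box
glEndQuery(GL_SAMPLES_PASSED);

GLuint samples = 0;
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);  // note: blocks until the result is available
// 'samples' approximates the box's visible screen coverage in fragments,
// which can be compared against thresholds for LOD selection.
glDeleteQueries(1, &query);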
Why not just manually multiply the world-view-projection matrix with the vertex positions? This will give you the vertices in "normalized device coordinates", where -1 is the bottom left of the screen and +1 is the top right.
The only thing is, if the projection is perspective, you have to divide your vertices by their 4th component, i.e. if the final vertex is (x,y,z,w) you divide by w.
Take for example a position vector
v = {x, 0, -z, 1}
Given a vertical viewing angle 'a' and an aspect ratio 'r', the position of x' in normalized device coordinates (range -1 to 1) is this (the formula is taken directly from a graphics programming book):
x' = x * cot(a/2) / ( r * z )
So the perspective projection for the given parameters will be as follows (shown in row-major format):
cot(a/2) / r 0 0 0
0 cot(a/2) 0 0
0 0 z1 -1
0 0 z2 0
When you multiply your vector by the projection matrix (assuming the world and view matrices are identity in this example) you get the following (I'm only computing the new "x" and "w" values because only they matter in this example).
v' = { x * cot(a/2) / r, newY, newZ, z }
So finally when we divide the new vector by its fourth component we get
v' = { x * cot(a/2) / (r*z), newY/z, newZ/z, 1 }
So v'.x is now the screen-space coordinate of v.x. This is exactly what the graphics pipeline does to figure out where your vertex is on screen.
I've used this basic method before to figure out the size of geometry on screen. The nice part about it is that the math works regardless of whether the projection is perspective or orthographic, as long as you divide by the 4th component of the vector (for orthographic projections, the 4th component will be 1).
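Here is a small sketch of that approach (GLM is used for the vector math, which is an assumption; it also sidesteps the near-plane problem from the question by simply skipping corners with w <= 0, so a real implementation would clip the box against the near plane instead):
#include <glm/glm.hpp>
#include <algorithm>

float screenAreaOfBox(const glm::vec3 corners[8],
                      const glm::mat4& modelViewProjection,
                      float viewportWidth, float viewportHeight)
{
    float minX = 1e30f, minY = 1e30f, maxX = -1e30f, maxY = -1e30f;
    for (int i = 0; i < 8; ++i) {
        glm::vec4 clip = modelViewProjection * glm::vec4(corners[i], 1.0f);
        if (clip.w <= 0.0f) continue;              // behind the camera; needs real clipping
        glm::vec3 ndc = glm::vec3(clip) / clip.w;  // perspective divide -> [-1, 1]
        float sx = (ndc.x * 0.5f + 0.5f) * viewportWidth;
        float sy = (ndc.y * 0.5f + 0.5f) * viewportHeight;
        minX = std::min(minX, sx); maxX = std::max(maxX, sx);
        minY = std::min(minY, sy); maxY = std::max(maxY, sy);
    }
    if (maxX < minX || maxY < minY) return 0.0f;   // all corners rejected
    return (maxX - minX) * (maxY - minY);          // bounding rectangle area in pixels
}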

Camera calibration: aspect ratio different from image aspect ratio. How to correct?

I've been going back and forth between my OpenCV and OpenGL components and I'm not sure which of the two should correct for this.
Using OpenCV camera calibration yields fx, fy with an aspect ratio of approximately 1, which would correspond to images of square size. My calibration output:
...
image_width: 640
image_height: 480
board_width: 6
board_height: 9
square_size: 5.
flags: 0
camera_matrix: !!opencv-matrix
rows: 3
cols: 3
dt: d
data: [ 6.6244874649105122e+02, 0., 3.4060477796553954e+02,
0., 6.4821741696313484e+02, 2.5815418044786418e+02,
0., 0., 1. ]
distortion_coefficients: !!opencv-matrix
rows: 5
cols: 1
dt: d
data: [ -1.1832005538154940e-01, 1.2254891816651683e+00, 3.8133468645674677e-02, 1.3073747832019200e-02, -3.9497162757084490e+00 ]
However, my images are 640x480, as you can see in the calibration output.
When I detect the checkerboard in the frame and draw it with OpenGL, what OpenGL renders is OK in height but stretched in the width direction, so it doesn't fit the checkerboard where it really is. What certainly solves it is multiplying the fx component of the calibration by 480/640, but I don't know if that is the way to go.
How do I correct for this? Scale the calibration values with the image size, or do something in OpenGL to fix it?
Edit
There is a distinction between capturing and displaying. I capture images with a smartphone that was calibrated, and that smartphone produces images of 640x480.
For finding the chessboard I use the intrinsics as shown above, from that calibration.
Then, no matter which aspect ratio I give to OpenGL, be it fx/fy, 640/480 or fx/fy * 640/480, it is wrong: the chessboard that OpenGL projects back is stretched in the width direction.
The only way it looks exactly right in OpenGL is if I use fx=640, fy=480 for finding the chessboard. And that is wrong as well, because then I am totally ignoring the camera intrinsics...
Edit2
I mean no matter how I set the aspect ratio that I pass to gluPerspective, it doesn't come out right.
gluPerspective( /* field of view in degree */ m_fovy,
/* aspect ratio */ 1.0/m_aspect_ratio,
/* Z near */ 1.0, /* Z far */ 1000.0);
What I've tried for values of m_aspect_ratio:
Output aspect ratio of OpenCV's calibrationMatrixValues
fx/fy
640/480
fx/fy * 640/480
output of calibrationMatrixValues * 640/480
All seem to botch the width. Note that the origin of the chessboard is, in my screenshot, the topmost inner corner in the image: it is placed correctly, and so is the bottommost inner corner. It's a scaling problem.
Edit3
It was something really, really stupid.. I was setting the aspect ratio for OpenGL like so:
gluPerspective( /* field of view in degree */ m_fovy,
/* aspect ratio */ 1.0/m_aspect_ratio,
/* Z near */ 1.0, /* Z far */ 1000.0);
and setting
m_aspect_ratio = viewportpixelheight / viewportpixelwidth;
not realizing that viewportpixelheight and viewportpixelwidth are integers, so I was doing integer division, which resulted in either 1 or (when swapping them) 0.
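The fix is simply to force floating-point division, e.g.:
// cast before dividing so the ratio isn't truncated to 0 or 1
m_aspect_ratio = static_cast<double>(viewportpixelheight) /
                 static_cast<double>(viewportpixelwidth);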
The camera calibration matrix maps world coordinates to image coordinates. OpenGL maps image coordinates to screen coordinates. If your problem is in how the image is being displayed you should handle it in OpenGL.
I'm sorry for the confusion. What I was trying to do is make a distinction between capturing and displaying the image. You are correct that the aspect ratio of the physical camera can be calculated from the focal lengths. Those focal lengths, and that aspect ratio, are fixed by the hardware of the camera. Once you have captured the image, though, you are free to display it at any aspect ratio you choose with OpenGL. You can crop and stretch the image all you like to change the aspect ratio of what is displayed. It is not a given that the aspect ratio of the camera matches the aspect ratio of your screen. OpenCV calculates the camera calibration matrix from direct raw measurements of the physical camera. If we assume they are correct and constant (both of which seem reasonable if there is no zoom), then any further changes to the aspect ratio are the responsibility of OpenGL. When I said fx and fy do not determine the aspect ratio, I was referring to the displayed aspect ratio, which was not very clear at all, I'm sorry.
Also, I should mention the reason you can calculate the aspect ratio from the focal lengths: focal length is expressed in units of pixels, and those units can be different on the x and y axes. A brief explanation can be found here.
The best explanation I have found of focal lengths in the camera matrix is in the section on Camera intrinsics of Computer Vision Algorithms and Applications. In the pdf it starts on page 72, page 50 in the book.
An aspect ratio of 1.0 doesn't indicate a square image; it indicates square pixels (which explains why it's almost always 1.0; non-square pixels are relatively rare except in cameras that capture in anamorphic formats). The camera matrix contains no information about absolute image dimensions or physical camera dimensions.
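For the calibration above, for instance, the pixel aspect ratio is fy/fx = 648.217 / 662.449 ≈ 0.979, i.e. very nearly square pixels, even though the 640x480 image itself is not square.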
On the other hand, gluPerspective's aspect ratio does represent image dimensions. It's important not to confuse one aspect ratio for the other.
If you want to use gluPerspective, it is possible, but you should understand that gluPerspective doesn't let you model all of the intrinsic camera parameters (namely axis skew and principal point offset). I describe how to set the aspect ratio and fovy correctly in this article.
However, I strongly recommend using either glFrustum (which allows a non-zero principal point offset), or glLoadMatrix directly (which also allows non-zero axis skew). Both approaches are explained in this article.
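As a rough sketch of the glFrustum route (an adaptation of the general idea, not the article's code; it assumes the intrinsics fx, fy, cx, cy are in pixels with the origin at the top-left of a W x H image, flips to OpenGL's bottom-left convention, and ignores axis skew, which glFrustum cannot represent):
#include <GL/gl.h>

void loadProjectionFromIntrinsics(double fx, double fy, double cx, double cy,
                                  double W, double H, double zNear, double zFar)
{
    // Map the image plane onto the near plane of the frustum.
    const double left   = -cx * zNear / fx;
    const double right  =  (W - cx) * zNear / fx;
    const double bottom = -(H - cy) * zNear / fy;
    const double top    =  cy * zNear / fy;

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glFrustum(left, right, bottom, top, zNear, zFar);
}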

Problem with Multigradient brush implementation from scratch in C++ and GDI

I am trying to implement a gradient brush from scratch in C++ with GDI. I don't want to use GDI+ or any other graphics framework. I want the gradient to be of any direction (arbitrary angle).
My algorithm in pseudocode:
For each pixel in the x direction
    For each pixel in the y direction
        current position = current pixel - centre   //translate origin
        rotate this position according to the given angle
        scalingFactor = ( rotated position + centre ) / extentDistance   //translate origin back
        rgbColor = startColor + scalingFactor * ( endColor - startColor )
extentDistance is the length of the line passing through the centre of the rectangle with slope equal to the angle of the gradient.
OK, so far so good. I can draw this and it looks nice. BUT unfortunately, because of the rotation, the rectangle corners get the wrong color. The result is perfect only for angles that are multiples of 90 degrees. The problem appears to be that the scaling factor doesn't scale over the entire size of the rectangle.
I am not sure if you get my point, because it's really hard to explain my problem without a visualisation of it.
If anyone can help or redirect me to some helpful material I'd be grateful.
OK guys, fixed it. Apparently the problem was that when rotating the gradient fill (not the rectangle) I wasn't calculating the scaling factor correctly. The distance over which the gradient is scaled changes according to the gradient direction. What must be done is to find where the corner points of the rect end up after the rotation, and based on that you can find the distance over which the gradient should be scaled. So basically what needs to be corrected in my algorithm is extentDistance.
How to do it:
• Transform the coordinates of all four corners
• Find the smallest of all four x's and call it minX
• Find the largest of all four x's and call it maxX
• Do the same for the y's
• The distance between these two points (max and min) is the extentDistance
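A small sketch of that correction (the names below are assumptions for illustration; it assumes the gradient runs along the rotated x axis, so the extent of the rotated corners along that axis is what the gradient is scaled over):
#include <algorithm>
#include <cmath>

// Extent of a width x height rectangle along the gradient direction.
double gradientExtent(double width, double height, double angleRad)
{
    const double cx = width * 0.5, cy = height * 0.5;
    const double c = std::cos(angleRad), s = std::sin(angleRad);
    const double xs[4] = { 0.0, width, 0.0,    width  };
    const double ys[4] = { 0.0, 0.0,   height, height };

    double minX = 1e300, maxX = -1e300;
    for (int i = 0; i < 4; ++i) {
        // Translate the corner to the centre, then rotate it by the gradient angle.
        const double rx = (xs[i] - cx) * c - (ys[i] - cy) * s;
        minX = std::min(minX, rx);
        maxX = std::max(maxX, rx);
    }
    return maxX - minX;   // the corrected extentDistance
}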