I have a pallet that shown in the following image and I would like to segment it. I don´t have the ground plane, so that I can check the normals of the points if they are parallel to the ground plane normal then segment those points. but I dont have the ground plane. How would I segment those points.
[Edit by Spektre]
Here Red/Blue color encoded partial derivation of your depth image from your previous question:
The black areas has the same Z coordinate (poor Z resolution or plane parallel with screen projection plane) The Red/Blue lines are geometric edges in point cloud. As you can see it is far from the real plane.
Related
I am trying to understand the pinhole camera model and the geometry behind some computer vision and camera calibration stuff that I am looking at.
So, if I understand correctly, the pinhole camera model maps the pixel coordinates to 3D real world coordinates. So, the model looks as:
y = K [R|T]x
Here y is pixel coordinates in homogeneous coordinates, R|T represent the extrinsic transformation matrix and x is the 3D world coordinates also in homogeneous coordinates.
Now, I am looking at a presentation which says
project the center of the focus region onto the ground plane using [R|T]
Now the center of the focus region is just taken to be the center of the image. I am not sure how I can estimate the ground plane? Assuming, the point to be projected is in input space, the projection should that be computed by inverting the [R|T] matrix and multiplying that point by the inverted matrix?
EDIT
Source here on page 29: http://romilbhardwaj.github.io/static/BuildSys_v1.pdf
I would like to compute the volume of that cube in the figure without point cloud or depth map, I don't have access to them, but I have access to the corners of the cube in the screen space coordinates.
I know the ground mesh it's it a 0,0,0. Then I project a ray from origin 0,0,0 to all the points. I'm following the article to project a ray from camera to an image plane http://nghiaho.com/?page_id=363
My question is how would I know which points are candidates for the cube and which points are not candidates ?
I am using orthographic projection glOrtho for my scene. I implemented a virtual trackball to rotate an object beside that I also implemented a zoom in/out on the view matrix. Say I have a cube of size 100 unit and is located at the position of (0,-40000,0) far from the origin. If the center of rotation is located at the origin once the user rotate the cube and after zoom in or out, it could be position at some where (0,0,2500000) (this position is just an assumption and it is calculated after multiplied by the view matrix). Currently I define a very big range of near(-150000) and far(150000) plane, but some time the object still lie outside either the near or far plane and the object just turn invisible, if I define a larger near and far clipping plane say -1000000 and 1000000, it will produce an ungly z artifacts. So my question is how do I correctly calculate the near and far plane when user rotate the object in real time? Thanks in advance!
Update:
I have implemented a bounding sphere for the cube. I use the inverse of view matrix to calculate the camera position and calculate the distance of the camera position from the center of the bounding sphere (the center of the bounding sphere is transformed by the view matrix). But I couldn't get it to work. can you further explain what is the relationship between the camera position and the near plane?
A simple way is using the "bounding sphere". If you know the data bounding box, the maximum diagonal length is the diameter of the bounding sphere.
Let's say you calculate the distance 'dCC' from the camera position to the center of the sphere. Let 'r' the radius of that sphere. Then:
Near = dCC - r - smallMargin
Far = dCC + r + smallMargin
'smallMargin' is a value used just to avoid clipping points on the surface of the sphere due to numerical precision issues.
The center of the sphere should be the center of rotation. If not, the diameter should grow so as to cover all data.
I'm working on an Eye Tracking system with two cameras mounted on some kind of glasses. There are optical lenses so that the screen is perceived at around 420 mm from the eye.
From a few dozen pupil samples, we compute two eye models (one for each camera), located in their respective camera coordinates system. This is based on the works here, but modified so that an estimation of the eye center is found using some kind of brute-force approach to minimize the ellipse projection error on the model given its center position in camera space.
Theorically, an approximation of the cameras parameters would be symetrical to the lenses on the Y axis. So every camera should be at the coordinates (around 17.5mm or -17.5, 0, 3.3) with respect to the lenses coordinates system, a rotation of around 42.5 degrees on the Y axis.
With the However, with these values, there is an offset in the result. See below:
The red point is the gaze center estimated by the left eye tracker, the white one is the right eye tracker, in screen coordinates
The screen limits are represented by the white lines.
The green line is the gaze vector, in camera coordinates (projected in 2D for visualization)
The two camera centers found, projected in 2D, are in the middle of the eye (the blue circle).
The pupil samples and current pupils are represented by the ellipses with matching colors.
The offset on x isn't constant which mean the rotation on Y is not exact. and the position of the camera aren't precise too. In order to fix it, we used: this to calibrate and then this to get the rotation parameters from the rotation matrix.
We added a camera on the middle of the lenses (Close to the theorical 0,0,0 point ?) to get the extrinsics and intrinsic parameters of the cameras, relative to our lens center. However, with about 50 checkerboard captures from different positions, the results given by OpenCV doesn't seems correct.
For example, it gives for a camera a position of about (-14,0,10) in lens coordinates for the translation and something like (-2.38, 49, -2.83) as rotation angles in degrees.
The previous screenshots are taken with theses parameters. The theorical ones are a bit further apart, but are more likely to reach the screen borders, unlike the opencv value.
This is probably because the test camera is in front of the optic, not behind, where our real 0,0,0 would be located (we just add the distance at which the screen is perceived on the Z axis afterwards, which is 420mm).
However, we have no way to put the camera in (0, 0, 0).
As the system is compact (everything is captured within a few cm^2), each degree or millimeter can change the result drastically so without the precise value the cameras, we're a bit stuck.
Our objective here is to find an accurate way to get the extrinsic and intrisic parameters of each cameras, so that we can compute a precise position of the center of the eye of the person wearing the glasses, without other calibration procedure than looking around (so no fixation points)
Right now, the system is precise enough so that we get a global indication on where someone is looking on the screen,but there is a divergence between the right and left camera, it's not precise enough. Any advice or hint that could help us is welcome :)
I'm trying to implement tiled deferred rendering method and now I'm stuck. I'm computing min/max depth for each tile (32x32) and storing it in texture. Then I want to compute screen space bounding box (bounding square) represented by left down and top right coords of rectangle for every pointlight (sphere) in my scene (see pic from my app). This together with min/max depth will be used to check if light affects actual tile.
Problem is I have no idea how to do this. Any idea, source code or exact math?
Update
Screen-space is basically a 2D entity, so instead of a bounding box think of a bounding rectangle.
Here is a simple way to compute it:
Project 8 corner points of your world-space bounding box onto the screen using your ModelViewProjection matrix
Find a bounding rectangle of these points (which is just min/max X and Y coordinates of the points)
A more sophisticated way can be used to compute a screen-space bounding rect for a point light source. We calculate four planes that pass through the camera position and are tangent to the light’s sphere of illumination (the light radius). Intersections of each tangent plane with the image plane gives us 4 lines on the image plane. This lines define the resulting bounding rectangle.
Refer to this article for math details: http://www.altdevblogaday.com/2012/03/01/getting-the-projected-extent-of-a-sphere-to-the-near-plane/