Missing pixels when rotating an object - c++

So I created an ellipsoid, but when I try to rotate it, almost half of the voxels go missing. Whether I do it with the rotation matrix or with the change-of-basis matrix from one basis to another, I immediately start losing pixels. If the ellipsoid is at 0, 90, 180, or 270 degrees, it looks fine, but as it travels in between those angles, the background color starts peeking through in little holes all over my object. I assume this is some kind of float-to-int conversion issue, but I don't know how to go about fixing it. Anyone have any ideas?

The reason this happens is that when you draw your rotated shape, the rounding/truncation of coordinates makes multiple source pixels map to the same target pixel, which in turn leaves some target pixels without any value at all.
The correct way to do this is to iterate over the target pixels, apply the inverse transform to each one, and fetch the value that belongs in that target pixel from the source, potentially doing some interpolation.
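A minimal sketch of that inverse-mapping idea in C++ (assuming a grayscale image stored as a flat, row-major array; the function name, the nearest-neighbour sampling, and the zero background are illustrative choices, not part of the original question):

#include <cmath>
#include <vector>

// Rotate 'src' by 'angle' radians about the image centre using inverse mapping:
// for every destination pixel, look up where it came from in the source,
// so no destination pixel is ever left unset.
std::vector<unsigned char> rotateImage(const std::vector<unsigned char>& src,
                                       int width, int height, float angle)
{
    std::vector<unsigned char> dst(src.size(), 0);        // 0 = background colour
    const float cx = width * 0.5f, cy = height * 0.5f;
    const float c = std::cos(angle), s = std::sin(angle);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Inverse rotation: which source pixel lands on (x, y)?
            float sx =  c * (x - cx) + s * (y - cy) + cx;
            float sy = -s * (x - cx) + c * (y - cy) + cy;
            int ix = static_cast<int>(std::lround(sx));    // nearest neighbour;
            int iy = static_cast<int>(std::lround(sy));    // bilinear would look smoother
            if (ix >= 0 && ix < width && iy >= 0 && iy < height)
                dst[y * width + x] = src[iy * width + ix];
        }
    }
    return dst;
}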

Related

Is there a way to make a flat surface, that is perpendicular to the camera, visible in OpenGL?

As seen in the picture, we have a rectangle which is two-dimensional within a 3D space.
When I rotate this rectangle (slightly rotated) so that the normal of the plate is perpendicular to the camera, it vanishes (heavily rotated rectangle just before vanishing). Intuition tells me the plate should show up as a single line of pixels, but this is not the case (although that is certainly what I'm trying to achieve). This problem seems so trivial, but I can't find a solution to it.

Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I went through a few websites, but they generally all describe the flow for converting 3D object points to pixel points, whereas I am doing the completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u,v) from pixel coordinates (x,y) (with the help of the intrinsic matrix). Can anyone shed some light on how I can recover the "Z" of the camera coordinate (X,Y,Z) from the film coordinate (u,v)?
Please guide me on how I can use OpenCV functions such as solvePnP, recoverPose, findFundamentalMat, and findEssentialMat for this goal.
With a single camera and an object rotating on a fixed rotation platform I would implement something like this:
Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera's data sheet or measure it. From that and the perpendicular distance (z) you can convert any pixel position (x,y) to a 3D coordinate relative to the camera (x',y',z'). So first convert the pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute the Cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
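Put into code, the two steps above might look like this (a C++ sketch; xs, ys, FOVx, FOVy and distance are assumed to come from calibration, and angles are in radians):

#include <cmath>

struct Vec3 { float x, y, z; };

// Convert a pixel (x, y) at known perpendicular distance into a 3D point
// relative to the camera, using the resolution and field of view.
Vec3 pixelTo3D(float x, float y, float xs, float ys,
               float FOVx, float FOVy, float distance)
{
    float ax = (x - xs * 0.5f) * FOVx / xs;   // horizontal angle of the pixel
    float ay = (y - ys * 0.5f) * FOVy / ys;   // vertical angle of the pixel
    return { distance * std::tan(ax),         // x'
             distance * std::tan(ay),         // y'
             distance };                      // z'
}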
That is nice, but from a single image we do not know the distance. Luckily, with such a setup, if we turn the object, any convex edge will reach its maximum ax angle at the sides when it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all of this is just simple geometry.
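That un-rotation step could look roughly like this (a sketch only, reusing the Vec3 struct from the snippet above; center is the platform centre relative to the camera and ang is the platform angle, both assumed known):

// Undo the platform rotation: subtract the platform centre, then rotate the
// point by -ang around the Y axis. The result is in platform-centred,
// un-rotated coordinates.
Vec3 unrotateAroundY(Vec3 p, Vec3 center, float ang)
{
    float dx = p.x - center.x, dz = p.z - center.z;   // relative to platform centre
    float c = std::cos(-ang), s = std::sin(-ang);
    return { c * dx + s * dz,                         // rotated x
             p.y - center.y,                          // y is unaffected
             -s * dx + c * dz };                      // rotated z
}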
In a nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets list only FOVx, but if the pixels are square you can compute FOVy from the resolution as
FOVx/FOVy = xs/ys
Beware: with multi-resolution camera modes the FOV can be different for each resolution!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease up the detection
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately) and if a peak is detected, add its 3D position to your model. Let's assume a rotating rectangular box; some of its frames could look like this:
So inspect one horizontal line across all frames and find the maximal ax. To improve accuracy you can run a closed-loop regulation, turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no change in ax over a few frames, that means a circular shape with the same radius... so you can handle each such frame as an ax maximum.
Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease the conversion to a mesh... That angle can also be used as a texture coordinate...
But do not forget that you will lose some concave details that are hidden inside the silhouette!
If this approach is not enough, you can use the same setup for stereoscopic 3D reconstruction, because each rotation behaves as a new (known) camera position.
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined, and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that project to it. The technical term for that is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one for each pixel) intersects with the set of rays corresponding to the other image. Which is to say that if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ around a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. To do that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in that image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within the second image. Find the pixel along that line that matches the pixel in the first image you selected, and project it back into space as you did with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this involves solving a quadratic polynomial, which is trivial).
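Finding where two such rays come closest boils down to a tiny linear system; here is a sketch (all names are illustrative, rays are given as origin plus parameter times direction):

struct V3 { double x, y, z; };
static V3     sub(V3 a, V3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static V3     add(V3 a, V3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static V3     mul(V3 a, double k) { return { a.x * k, a.y * k, a.z * k }; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Midpoint of the shortest segment between ray p1 + s*d1 and ray p2 + t*d2.
V3 closestPointBetweenRays(V3 p1, V3 d1, V3 p2, V3 d2)
{
    V3 w0 = sub(p1, p2);
    double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    double d = dot(d1, w0), e = dot(d2, w0);
    double denom = a * c - b * b;          // near zero means (almost) parallel rays
    double s = (b * e - c * d) / denom;
    double t = (a * e - b * d) / denom;
    V3 q1 = add(p1, mul(d1, s));           // closest point on ray 1
    V3 q2 = add(p2, mul(d2, t));           // closest point on ray 2
    return mul(add(q1, q2), 0.5);          // halfway between the two
}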
To select which pixel you want to match between images, you can use a feature motion tracking algorithm, as used in video compression and the like. The basic idea is that for every pixel, a correlation of its surroundings is computed against the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.
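A very small block-matching sketch of that idea, using sum of absolute differences instead of a true correlation (prev and curr are grayscale images as flat arrays; all names and the patch/search sizes are illustrative, and bounds checking is omitted for brevity):

#include <cstdlib>
#include <limits>
#include <utility>
#include <vector>

// For the patch around (px, py) in the current image, find where it most likely
// came from in the previous image by searching +-radius pixels for the lowest SAD.
std::pair<int, int> trackPatch(const std::vector<unsigned char>& prev,
                               const std::vector<unsigned char>& curr,
                               int width, int px, int py,
                               int half = 3, int radius = 8)
{
    long best = std::numeric_limits<long>::max();
    std::pair<int, int> bestPos{ px, py };
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            long sad = 0;                              // sum of absolute differences
            for (int v = -half; v <= half; ++v)
                for (int u = -half; u <= half; ++u)
                    sad += std::abs(curr[(py + v) * width + (px + u)] -
                                    prev[(py + dy + v) * width + (px + dx + u)]);
            if (sad < best) { best = sad; bestPos = { px + dx, py + dy }; }
        }
    return bestPos;   // most likely source position of the patch in the previous image
}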

Opengl rotation at a point

I want to rotate a shape in OpenGL, but I want to rotate it around a point. Namely, I have a cylinder and I want to rotate it so it looks like it is spinning around its bottom, with the spin 'size' increasing until the object falls to the ground. How would I do this kind of rotation in OpenGL?
Translate to the origin
Rotate
Translate back
So, if you want to rotate around (a,b,c), you would translate (-a,-b,-c) in step 1, and (a,b,c) in step 3.
(Don't be afraid of the number of operations, by the way. Internally all you do is multiply the transform matrix three times, but the pipeline that transforms the vertices is agnostic of how many operations you did, it still only uses the one final matrix. The magic of using a matrix for transformation.)
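In legacy fixed-function OpenGL that could look like the sketch below (drawCylinder(), angle and the pivot (a,b,c) are placeholders; note the calls are written in the reverse of the order in which they are applied to the vertices):

glPushMatrix();
glTranslatef(a, b, c);                // step 3: move the pivot back to its place
glRotatef(angle, 0.0f, 0.0f, 1.0f);   // step 2: rotate (here around the Z axis)
glTranslatef(-a, -b, -c);             // step 1: move the pivot to the origin
drawCylinder();                       // hypothetical draw call
glPopMatrix();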

OpenGL: Understanding transformation

I was trying to understand lesson 9 of NeHe's tutorial, which is about bitmaps being moved in 3D space.
The most interesting thing here is moving a 2D bitmap texture on a simple quad through 3D space while keeping it facing the screen (the viewer) all the time. So the bitmap looks 3D but stays 2D, facing the viewer no matter where it is in the 3D space.
In lesson 9 a list of stars is generated, moving in a circle, which looks really nice. To avoid seeing a star from its side, the author does some tricky coding to keep each star facing the viewer all the time.
The code for this is as follows (it is called for each star in a loop):
glLoadIdentity();
glTranslatef(0.0f,0.0f,zoom);
glRotatef(tilt,1.0f,0.0f,0.0f);
glRotatef(star[loop].angle,0.0f,1.0f,0.0f);
glTranslatef(star[loop].dist,0.0f,0.0f);
glRotatef(-star[loop].angle,0.0f,1.0f,0.0f);
glRotatef(-tilt,1.0f,0.0f,0.0f);
After the lines above, the drawing of the star begins. If you check the last two lines, you see that the transformations from lines 3 and 4 are simply cancelled (like an undo). These two lines at the end are what make it possible to keep the star facing the viewer all the time. But I don't know why this works.
And I think this comes from my misunderstanding of how OpenGL really does transformations.
To me, the last two lines are just undoing what was done before, which doesn't make sense. But it works.
So when I call glTranslatef, I know that the current matrix gets multiplied by a translation matrix built from the values provided to glTranslatef.
In other words, "glTranslatef(0.0f,0.0f,zoom);" would move the place where I'm going to draw my stars into the scene if zoom is negative. OK.
But WHAT exactly is moved here? Is the viewer moved "away", or is there some sort of object coordinate system which gets moved into the scene with glTranslatef? What is happening here?
Then glRotatef: what is rotated here? Again a coordinate system, or the viewer itself?
In the real world I would place the star somewhere in 3D space, then rotate it in world space around the world's origin, then do the movement as the star moves towards the origin and starts at the edge again, and then rotate the star itself so it faces the viewer. And I guess this is what is done here. But how do I rotate first around the world's origin and then around the star itself? To me it looks like OpenGL is switching between a world coordinate system and an object coordinate system, which, as you can see, doesn't really happen.
I don't need to add the rest of the code, because it's pretty standard: simple GL initialization for 3D drawing, the rotation code, and then the simple drawing of QUADS with the star texture using blending. That's it.
Could somebody explain what I'm misunderstanding here?
Another way of thinking about the GL matrix stack is to walk up it backwards from your draw call. In your case, since your draw is the last line, let's step up through the code:
1) First, the star is rotated by -tilt around the X axis, with respect to the origin.
2) The star is rotated by -star[loop].angle around the Y axis, with respect to the origin.
3) The star is moved by star[loop].dist down the X axis.
4) The star is rotated by star[loop].angle around the Y axis, with respect to the origin. Since the star is not at the origin any more due to step 3, this rotation both moves the center of the star, AND rotates it locally.
5) The star is rotated by tilt around the X axis, with respect to the origin. (Same note as 4)
6) The star is moved down the Z axis by zoom units.
The trick here is difficult to type in text, but try and picture the sequence of moves. While steps 2 and 4 may seem like they invert each other, the move in between them changes the nature of the rotation. The key phrase is that the rotations are defined around the Origin. Moving the star changes the effect of the rotation.
This leads to a typical use of stacking matrices when you want to rotate something in-place. First you move it to the origin, then you rotate it, then you move it back. What you have here is pretty much the same concept.
I find that using two hands to visualize matrices is useful. Keep one hand to represent the origin, and the second (usually the right, if you're in a right-handed coordinate system like OpenGL) to represent the object. I splay my fingers like the XYZ axes so I can visualize the rotation locally as well as around the origin. Starting like this, the sequence of rotations around the origin and linear moves should be easier to picture.
The second question you asked pertains to how the camera matrix behaves in a typical OpenGL setup. First, understand the concept of screen-space coordinates (similarly, device coordinates). This is the space that is actually displayed: X and Y are the axes of your screen, and Z is depth. The space usually runs from -1 to 1. Moving an object down the Z axis effectively moves the object away.
The Camera (or Perspective Matrix) is typically responsible for converting 'World' space into this screen space. This matrix defines the 'viewer', but in the end it is just another matrix. The matrix is always applied 'last', so if you are reading the transforms upward as I described before, the camera is usually at the very top, just as you are seeing. In this case you could think of that last transform (translate by zoom) as a very simple camera matrix, that moves the camera back by zoom units.
Good luck. :)
The glTranslatef in the middle is affected by the rotations before it: it moves the star along the axis x', out to distance dist, and at that point x' has been rotated by the preceding tilt and angle rotations relative to the original x axis.
In OpenGL you have object coordinates which are multiplied by (a stack of) transformation matrices, so you are moving the objects. If you want to "move a camera", you have to multiply by the inverse of the matrix describing the camera's position and orientation:
ProjectedCoords = CameraMatrix^-1 . ObjectMatrix . ObjectCoord
I also found this very confusing but I just played around with some of the NeHe code to get a better understanding of glTranslatef() and glRotatef().
My current understanding is that glRotatef() actually rotates the coordinate system, such that glRotatef(90.0f, 0.0f, 0.0f, 1.0f) will cause the x-axis to be where the y-axis was previously. After this rotation, glTranslatef(1.0f, 0.0f, 0.0f) will move an object upwards on the screen.
Thus, glTranslatef() moves objects in accordance with the current rotation of the coordinate system. Therefore, the order of glTranslatef and glRotatef calls is important in tutorial 9.
In technical terms my description might not be perfect, but it works for me.
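To see that order dependence concretely, compare these two sketches (legacy OpenGL, drawQuad() is a placeholder):

// Rotate first: the translation happens in the rotated coordinate system,
// so an object drawn at the origin ends up one unit "up" on the screen.
glLoadIdentity();
glRotatef(90.0f, 0.0f, 0.0f, 1.0f);
glTranslatef(1.0f, 0.0f, 0.0f);
drawQuad();

// Translate first: the object moves one unit to the right and then
// spins in place around its own origin.
glLoadIdentity();
glTranslatef(1.0f, 0.0f, 0.0f);
glRotatef(90.0f, 0.0f, 0.0f, 1.0f);
drawQuad();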

Fast plane rotation algorithm?

I am working on an application that detects the most prominent rectangle in an image, then seeks to rotate it so that the bottom left of the rectangle rests at the origin, similar to how IUPR's OSCAR system works. However, once the most prominent rectangle is detected, I am unsure how to take into account the depth component or z-axis, as the rectangle won't always be "head-on". Any examples to further my understanding would be greatly appreciated. Seen below is an example from IUPR's OSCAR system.
(Example image: http://quito.informatik.uni-kl.de/oscar/oscar.php?serverimage=img_0324.jpg&montage=use)
You don't actually need to deal with the 3D information in this case; it's just a mapping function from one set of coordinates to another.
Look at affine transformations, they're capable of correcting simple skew and perspective effects. You should be able to find code somewhere that will calculate a transform from the 4 points at the corners of your rectangle.
Almost forgot - if "fast" is really important, you could simplify the system to only use simple shear transformations in combination, though that'll have a bad impact on image quality for highly-tilted subjects.
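As a concrete sketch of computing that transform from the four corners (using OpenCV, if it is available; note that mapping four arbitrary corners onto a rectangle is strictly a perspective/homography warp rather than a purely affine one, and the output size and corner ordering below are assumptions):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Warp the detected quadrilateral onto an axis-aligned rectangle.
// 'corners' must hold the detected corners in the order TL, TR, BR, BL.
cv::Mat rectify(const cv::Mat& image, const std::vector<cv::Point2f>& corners)
{
    const float w = 400.0f, h = 300.0f;                 // guessed output size / aspect ratio
    std::vector<cv::Point2f> target = {
        { 0.0f, 0.0f }, { w, 0.0f }, { w, h }, { 0.0f, h } };
    cv::Mat H = cv::getPerspectiveTransform(corners, target);
    cv::Mat out;
    cv::warpPerspective(image, out, H, cv::Size(static_cast<int>(w), static_cast<int>(h)));
    return out;
}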
Actually, I think you can get away with something much simpler than Mark's approach.
Once you have the 2D coordinates on the skewed image, re-purpose those coordinates as texture coordinates.
In a renderer, draw a simple rectangle where each corner vertex is texture-mapped to the corresponding corner found on the skewed 2D image (normalized and otherwise transformed to your rendering system's texture coordinate plane).
Now you can rely on hardware (using OpenGL or similar) to do the correction for you, or you can write your own texture mapper:
The aspect ratio will need to be guessed at since we are disposing of the actual 3D info. However, you can get away with just taking the max width and max height of your skewed rectangle.
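A minimal legacy-OpenGL sketch of that textured-quad trick (the uv values below are made-up placeholders for the normalized corner coordinates found on the skewed image, and the texture is assumed to be already bound):

// Corner texture coordinates taken from the skewed rectangle (placeholder values).
float u0 = 0.12f, v0 = 0.08f;   // bottom-left corner as found in the photo
float u1 = 0.91f, v1 = 0.15f;   // bottom-right
float u2 = 0.88f, v2 = 0.83f;   // top-right
float u3 = 0.10f, v3 = 0.78f;   // top-left

// Draw an axis-aligned rectangle and let the texture coordinates "undo" the skew.
glBegin(GL_QUADS);
    glTexCoord2f(u0, v0); glVertex2f(0.0f, 0.0f);
    glTexCoord2f(u1, v1); glVertex2f(1.0f, 0.0f);
    glTexCoord2f(u2, v2); glVertex2f(1.0f, 1.0f);
    glTexCoord2f(u3, v3); glVertex2f(0.0f, 1.0f);
glEnd();

// Note: plain glTexCoord2f interpolation is affine per triangle; fully correct
// perspective needs a projective (q) texture coordinate, which is what the
// article linked below discusses.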
Perspective Texture Mapping by Chris Hecker