OpenGL ES coordinates to screen pixels - c++

I am trying to make an advertising application in OpenGL ES 2.0.
To reduce the problem to an example: I created an animated rectangular cube with advertising images on its faces. The model and animation were created in 3ds Max and converted to .pod, and they display on the TV screen perfectly.
Now I want to know how much of the screen the cube covers in pixels when my projection is 1280x720. Scaling and translation are in the hands of the advertiser, who doesn't know OpenGL coordinates; the advertiser only speaks the language of pixels. So if he increases the X-axis scale in pixels, I need to convert that to OpenGL coordinates and also adjust the translation myself so that the cube does not go off screen.
In short, how can I get the number of pixels taken up by the cube on screen? Is there an easy way?

The rendering pipeline applies the MVP (model-view-projection) matrix to the OpenGL coordinates (vertices) to produce the screen coordinates.
So it's possible to use its inverse to compute vertices from screen positions.
The problem is that multiple combinations of vertices, view matrices, and projection matrices can give the same screen coordinates, i.e. the mapping from screen coordinates back to vertex positions is not unique.
So we have to reduce the unknowns in the equation to just x and y by fixing all the other variables (in case of translation) and probably to just z (in case of scaling).
For translation, for example, the code could be:
Point3D get3dPoint(Point2D point2D, int width,
int height, Matrix viewMatrix, Matrix projectionMatrix) {
// Convert pixel coordinates (origin at top-left) to normalized device coordinates in [-1, 1]
double x = 2.0 * point2D.x / width - 1;
double y = - 2.0 * point2D.y / height + 1;
Matrix viewProjectionInverse = inverse(projectionMatrix *
viewMatrix);
// Fix the depth so that the inverse mapping has a single solution
double fixedZ = 1.0;
Point3D point3D = new Point3D(x, y, fixedZ);
return viewProjectionInverse.multiply(point3D);
}
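For the original question (how many pixels the cube covers), it may be simpler to go in the forward direction: project the eight corners of the cube's bounding box through the MVP matrix and take the minimum and maximum of the resulting pixel positions. The following is a minimal sketch under stated assumptions, not the pipeline's own code: the names Mat4, PixelRect and projectedBounds are hypothetical, the matrix is assumed to be stored column-major as OpenGL ES expects, and every corner is assumed to lie in front of the camera.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>

// Column-major 4x4 matrix (hypothetical alias): element (row, col) is m[col * 4 + row].
using Mat4 = std::array<double, 16>;

struct PixelRect { double minX, minY, maxX, maxY; };

// Projects the eight corners of an axis-aligned box given in model space through
// the combined model-view-projection matrix and returns the covered pixel rectangle.
// Assumes every corner is in front of the camera (clip-space w > 0).
PixelRect projectedBounds(const Mat4& mvp,
                          const std::array<double, 3>& boxMin,
                          const std::array<double, 3>& boxMax,
                          int viewportWidth, int viewportHeight) {
    assert(viewportWidth > 0 && viewportHeight > 0);
    const double inf = std::numeric_limits<double>::infinity();
    PixelRect r{inf, inf, -inf, -inf};
    for (int corner = 0; corner < 8; ++corner) {
        const double x = (corner & 1) ? boxMax[0] : boxMin[0];
        const double y = (corner & 2) ? boxMax[1] : boxMin[1];
        const double z = (corner & 4) ? boxMax[2] : boxMin[2];
        // Clip-space position: mvp * (x, y, z, 1). Clip-space z is not needed here.
        const double cx = mvp[0] * x + mvp[4] * y + mvp[8] * z + mvp[12];
        const double cy = mvp[1] * x + mvp[5] * y + mvp[9] * z + mvp[13];
        const double cw = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
        // Perspective divide to NDC, then viewport transform to pixels
        // (origin at the bottom-left corner, as in OpenGL).
        const double px = (cx / cw * 0.5 + 0.5) * viewportWidth;
        const double py = (cy / cw * 0.5 + 0.5) * viewportHeight;
        r.minX = std::min(r.minX, px);
        r.maxX = std::max(r.maxX, px);
        r.minY = std::min(r.minY, py);
        r.maxY = std::max(r.maxY, py);
    }
    return r;
}
```

The covered width in pixels is then maxX - minX and the height is maxY - minY; comparing the rectangle against the viewport size shows whether a scale or translation requested in pixels would push the cube off screen.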

Related

OpenGL - Why is my ray picking not working?

I recently set up a project that uses OpenGL (via the C# wrapper library OpenTK) which should do the following:
Create a perspective projection camera - this camera will be used to make the user rotate,move etc. to look at my 3d models.
Draw some 3d objects.
Use 3d ray picking via unproject to let the user pick points/models in the 3d view.
The last step (ray picking) looks ok on my 3d preview (GLControl) but returns invalid results like Vector3d (1,86460186949617; -45,4086124979203; -45,0387025610247). I have no idea why this is the case!
I am using the following code to set up my viewport:
this.RenderingControl.MakeCurrent();
int w = RenderingControl.Width;
int h = RenderingControl.Height;
// Use all of the glControl painting area
GL.Viewport(0, 0, w, h);
GL.MatrixMode(MatrixMode.Projection);
GL.LoadIdentity();
Matrix4 p = Matrix4.CreatePerspectiveFieldOfView(MathHelper.PiOver4, w / (float)h, 0.1f, 64.0f);
GL.LoadMatrix(ref p);
I use this method for unprojecting:
/// <summary>
/// This method maps screen coordinates to viewport coordinates.
/// </summary>
/// <param name="screen"></param>
/// <param name="view"></param>
/// <param name="projection"></param>
/// <param name="view_port"></param>
/// <returns></returns>
private Vector3d UnProject(Vector3d screen, Matrix4d view, Matrix4d projection, int[] view_port)
{
Vector4d pos = new Vector4d();
// Map x and y from window coordinates, map to range -1 to 1
pos.X = (screen.X - (float)view_port[0]) / (float)view_port[2] * 2.0f - 1.0f;
pos.Y = (screen.Y - (float)view_port[1]) / (float)view_port[3] * 2.0f - 1.0f;
pos.Z = screen.Z * 2.0f - 1.0f;
pos.W = 1.0f;
Vector4d pos2 = Vector4d.Transform(pos, Matrix4d.Invert(Matrix4d.Mult(view, projection)));
Vector3d pos_out = new Vector3d(pos2.X, pos2.Y, pos2.Z);
return pos_out / pos2.W;
}
I use this code to position my camera (including rotation) and do the ray picking:
// Clear buffers
GL.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);
// Apply camera
GL.MatrixMode(MatrixMode.Modelview);
Matrix4d mv = Matrix4d.LookAt(EyePosition, Vector3d.Zero, Vector3d.UnitY);
GL.LoadMatrix(ref mv);
GL.Translate(0, 0, ZoomFactor);
// Rotation animation
if (RotationAnimationActive)
{
CameraRotX += 0.05f;
}
if (CameraRotX >= 360)
{
CameraRotX = 0;
}
GL.Rotate(CameraRotX, Vector3.UnitY);
GL.Rotate(CameraRotY, Vector3.UnitX);
// Apply useful rotation
GL.Rotate(50, 90, 30, 0f);
// Draw Axes
drawAxes();
// Draw vertices of my 3d objects ...
// Picking Test
int x = MouseX;
int y = MouseY;
int[] viewport = new int[4];
Matrix4d modelviewMatrix, projectionMatrix;
GL.GetDouble(GetPName.ModelviewMatrix, out modelviewMatrix);
GL.GetDouble(GetPName.ProjectionMatrix, out projectionMatrix);
GL.GetInteger(GetPName.Viewport, viewport);
// get depth of clicked pixel
float[] t = new float[1];
GL.ReadPixels(x, RenderingControl.Height - y, 1, 1, OpenTK.Graphics.OpenGL.PixelFormat.DepthComponent, PixelType.Float, t);
var res = UnProject(new Vector3d(x, viewport[3] - y, t[0]), modelviewMatrix, projectionMatrix, viewport);
GL.Begin(BeginMode.Lines);
GL.Color3(Color.Yellow);
GL.Vertex3(0, 0, 0);
GL.Vertex3(res);
Debug.WriteLine(res.ToString());
GL.End();
I get the following result from my ray picker:
Clicked Position = (1,86460186949617; -45,4086124979203;
-45,0387025610247)
This vector is shown as the yellow line on the attached screenshot.
Why are the Y and Z positions not in the range -1/+1? Where do values like -45 come from, and why is the ray rendered correctly on the screen?
If you only have a tip about what could be broken, I would also appreciate your reply!
Screenshot:
If you break down the transform from screen to world into individual matrices, print out the inverses of the M, V, and P matrices, and print out the intermediate result of each (matrix inverse) * (point) calculation from screen to world/model, then I think you'll see the problem. Or at least you'll see that there is a problem with using the inverse of the M-V-P matrix and then intuitively grasp the solution. Or maybe just read the list of steps below and see if that helps.
Here's the approach I've used:
1. Convert the 2D vector for mouse position in screen/control/widget coordinates to the 4D vector (mouse.x, mouse.y, 0, 1).
2. Transform the 4D vector from screen coordinates to Normalized Device Coordinates (NDC) space. That is, multiply the inverse of your NDC-to-screen matrix [or equivalent equations] by (mouse.x, mouse.y, 0, 1) to yield a 4D vector in NDC coordinate space: (nx, ny, 0, 1).
3. In NDC coordinates, define two 4D vectors: the source (near point) of the ray as (nx, ny, -1, 1) and a far point at (nx, ny, +1, 1).
4. Multiply each 4D vector by the inverse of the (perspective) projection matrix.
5. Convert each 4D vector to a 3D vector (i.e. divide through by the fourth component, often called "w"). *
6. Multiply the 3D vectors by the inverse of the view matrix.
7. Multiply the 3D vectors by the inverse of the model matrix (which may well be the identity matrix).
8. Subtract the 3D vectors to yield the ray.
9. Normalize the ray.
10. Yee-haw. Go back and justify each step with math, if desired, or save that review for later [if ever] and work frantically towards catching up on creating actual 3D graphics and interaction and whatnot.
11. Go back and refactor, if desired.
(* The framework I use allows multiplication of a 3D vector by a 4x4 matrix because it treats the 3D vector as a 4D vector. I can make this more clear later, if necessary, but I hope the point is reasonably clear.)
That worked for me. This set of steps also works for Ortho projections, though with Ortho you could cheat and write simpler code since the projection matrix isn't goofy.
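The first part of that recipe (window coordinates to NDC, unprojecting a near and a far point, the divide by w, then subtracting and normalizing) can be sketched as follows. This is only an illustration under assumptions: it replaces the general inverse projection matrix with its closed form for a symmetric perspective frustum (the kind gluPerspective-style helpers build), it stops in view space, and the names Vec3 and viewSpaceRay are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Builds a normalized picking-ray direction in view space for a mouse position.
// mouseX/mouseY are window pixels with the origin at the top-left corner (the
// usual UI convention); fovY is the vertical field of view in radians.
Vec3 viewSpaceRay(double mouseX, double mouseY, int width, int height,
                  double fovY, double zNear, double zFar) {
    assert(width > 0 && height > 0 && zNear > 0.0 && zFar > zNear);
    // Window coordinates -> NDC in [-1, 1]; y is flipped so that +y points up.
    const double nx = 2.0 * mouseX / width - 1.0;
    const double ny = 1.0 - 2.0 * mouseY / height;
    // Unprojecting (nx, ny, -1, 1) and (nx, ny, +1, 1) and dividing by w gives
    // points on the near and far planes. For a symmetric perspective projection
    // both lie along the same view-space direction, scaled by zNear and zFar.
    const double tanHalf = std::tan(fovY / 2.0);
    const double aspect = static_cast<double>(width) / height;
    const Vec3 unit{nx * aspect * tanHalf, ny * tanHalf, -1.0};
    const Vec3 nearPt{unit.x * zNear, unit.y * zNear, unit.z * zNear};
    const Vec3 farPt{unit.x * zFar, unit.y * zFar, unit.z * zFar};
    // Subtract the two points to get the ray, then normalize it.
    const Vec3 d{farPt.x - nearPt.x, farPt.y - nearPt.y, farPt.z - nearPt.z};
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return Vec3{d.x / len, d.y / len, d.z / len};
}
```

To reach world or model space, the near and far points would still have to be multiplied by the inverse view matrix (and the inverse model matrix, if any) before subtracting, as the remaining steps describe.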
It's late as I write this and I may have misinterpreted your problem. I may have also misinterpreted your code since I use a different UI framework. But I know how aggravating ray casting for OpenGL can be, so I'm posting in the hope that at least some of what I write is useful, and that I can thereby alleviate some human misery.
Postscript. Speaking of misery: I found numerous forum posts and blog pages that address ray casting for OpenGL, but most posts start with some variant of the following: "First, you have to know X" [where X is not necessary to know]; or "Go look at the unproject function [in library X in repository Y for which you'll need client app Z ...]"; or a particular favorite of mine: "Review a textbook on linear algebra."
Having to slog through yet another description of the OpenGL rendering pipeline or the OpenGL transformation conga line when you just need to debug ray casting--a common problem--is like having to listen to a lecture on hydraulics when you discover your brake pedal isn't working.

Wrong aspect ratio calculations for camera (simple ray-caster)

I am working on a really simple ray-tracer.
For now I am trying to make the perspective camera work properly.
I use this loop to render the scene (with just two hard-coded spheres; I cast a ray for each pixel from its center, no AA applied):
Camera * camera = new PerspectiveCamera({ 0.0f, 0.0f, 0.0f }/*pos*/,
{ 0.0f, 0.0f, 1.0f }/*direction*/, { 0.0f, 1.0f, 0.0f }/*up*/,
buffer->getSize() /*projectionPlaneSize*/);
Sphere * sphere1 = new Sphere({ 300.0f, 50.0f, 1000.0f }, 100.0f); //center, radius
Sphere * sphere2 = new Sphere({ 100.0f, 50.0f, 1000.0f }, 50.0f);
for(int i = 0; i < buffer->getSize().getX(); i++) {
for(int j = 0; j < buffer->getSize().getY(); j++) {
//for each pixel of buffer (image)
double centerX = i + 0.5;
double centerY = j + 0.5;
Geometries::Ray ray = camera->generateRay(centerX, centerY);
Collision * collision = ray.testCollision(sphere1, sphere2);
if(collision){
//output red
}else{
//output blue
}
}
}
The Camera::generateRay(float x, float y) is:
Camera::generateRay(float x, float y) {
//position = camera position, direction = camera direction etc.
Point2D xy = fromImageToPlaneSpace({ x, y });
Vector3D imagePoint = right * xy.getX() + up * xy.getY() + position + direction;
Vector3D rayDirection = imagePoint - position;
rayDirection.normalizeIt();
return Geometries::Ray(position, rayDirection);
}
Point2D fromImageToPlaneSpace(Point2D uv) {
float width = projectionPlaneSize.getX();
float height = projectionPlaneSize.getY();
float x = ((2 * uv.getX() - width) / width) * tan(fovX);
float y = ((2 * uv.getY() - height) / height) * tan(fovY);
return Point2D(x, y);
}
The fovs:
double fovX = 3.14159265359 / 4.0;
double fovY = projectionPlaneSize.getY() / projectionPlaneSize.getX() * fovX;
I get good result for 1:1 width:height aspect (e.g. 400x400):
But I get errors for e.g. 800x400:
Which is even slightly worse for bigger aspect ratios (like 1200x400):
What did I do wrong or which step did I omit?
Can it be a problem with precision or rather something with fromImageToPlaneSpace(...)?
Caveat: I spent 5 years at a video company, but I'm a little rusty.
Note: after writing this, I realized that pixel aspect ratio may not be your problem as the screen aspect ratio also appears to be wrong, so you can skip down a bit.
But, in video we were concerned with two different video sources: standard definition with a screen aspect ratio of 4:3 and high definition with a screen aspect ratio of 16:9.
But, there's also another variable/parameter: pixel aspect ratio. In standard definition, pixels are square and in hidef pixels are rectangular (or vice-versa--I can't remember).
Assuming your current calculations are correct for screen ratio, you may have to account for the pixel aspect ratio being different, either from camera source or the display you're using.
Both screen aspect ratio and pixel aspect ratio can be stored in a .mp4, .jpeg, etc.
I downloaded your 1200x400 jpeg. I used ImageMagick on it to change only the pixel aspect ratio:
convert orig.jpg -resize 125x100%\! new.jpg
This stretches the image horizontally (to 125% of its width) and leaves the height the same, which has the same effect as changing the pixel aspect ratio. The \! tells ImageMagick to ignore the original aspect ratio and resize to exactly the given geometry. The 125 is because I remember the rectangular pixel as 8x10, so the width needs to grow by 10/8, which is 1.25 or 125%.
Needless to say this gave me circles instead of ovals.
Actually, I was able to get the same effect with adjusting the screen aspect ratio.
So, somewhere in your calculations, you're introducing a distortion of that factor. Where are you applying the scaling? How are the function calls different?
Where do you set the screen size/ratio? I don't think that's shown (e.g. I don't see anything like 1200 or 400 anywhere).
If I had to hazard a guess, you must account for aspect ratio in fromImageToPlaneSpace. Either width/height needs to be prescaled or the x = and/or y = lines need scaling factors. AFAICT, what you've got will only work for square geometry at present. To test, using the 1200x400 case, multiply the x by 125% [a kludge] and I bet you get something.
From the images, it looks like you have incorrectly defined the mapping from pixel coordinates to world coordinates and are introducing some stretch in the Y axis.
Skimming your code it looks like you are defining the camera's view frustum from the dimensions of the frame buffer. Therefore if you have a non-1:1 aspect ratio frame buffer, you have a camera whose view frustum is not 1:1. You will want to separate the model of the camera's view frustum from the image space dimension of the final frame buffer.
In other words, the frame buffer is the portion of the plane projected by the camera that we are viewing. The camera defines how the 3D space of the world is projected onto the camera plane.
Any basic book on 3D graphics will discuss viewing and projection.
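One likely culprit in the posted code is that fovY is derived by scaling the angle (fovY = height / width * fovX) and then passed through tan(). Since tan is not linear, tan(fovY) / tan(fovX) only equals height / width when the image is square. A minimal sketch of an aspect-correct mapping is shown below. It reuses the question's function name but with a different, hypothetical signature, and it assumes fovX is the full horizontal field of view and that the image plane sits one unit in front of the camera.

```cpp
#include <cassert>
#include <cmath>

struct Point2D { double x, y; };

// Maps a pixel-space position to coordinates on an image plane one unit in
// front of the camera. fovX is the full horizontal field of view in radians.
Point2D fromImageToPlaneSpace(Point2D pixel, double width, double height, double fovX) {
    assert(width > 0.0 && height > 0.0);
    // Half-extents of the plane: the horizontal one comes from the field of
    // view, the vertical one from the aspect ratio, so pixels stay square.
    const double halfW = std::tan(fovX / 2.0);
    const double halfH = halfW * height / width;
    const double ndcX = (2.0 * pixel.x - width) / width;    // in [-1, 1]
    const double ndcY = (2.0 * pixel.y - height) / height;  // in [-1, 1]
    return Point2D{ndcX * halfW, ndcY * halfH};
}
```

With this mapping, one pixel step covers the same distance on the plane horizontally and vertically regardless of the buffer's aspect ratio, which is what keeps spheres from turning into ovals.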

fwidth(uv) giving strange results in glsl

I checked the result of the filter-width GLSL function by coloring it in red on a plane around the camera.
The result is a bizarre pattern. I thought that it would be a circular gradient on the plane extending around the camera relative to distance. The further pixels uniformly represent more distant UV coordinates between pixels at further distances.
Why isn't fwidth(UV) a simple gradient as a function of distance from the camera? I don't understand how it would work properly if it isn't, because I want to anti-alias pixels as a function of amplitude of the UV coordinates between them.
float width = fwidth(i.uv)*.2;
return float4(width,0,0,1)*(2*i.color);
UVs that are close = black, and far = red.
Result:
The above pattern from fwidth is axis-aligned and has one axis of symmetry. It couldn't anti-alias a two-axis checkerboard, an n-axis texture of Perlin noise, or a radial checkerboard:
float2 xy0 = float2(i.uv.x , i.uv.z) + float2(-0.5, -0.5);
float c0 = length(xy0); //sqrt of xx+yy, polar coordinate radius math
float r0 = atan2(i.uv.x-.5,i.uv.z-.5);//angle polar coordinate
float ww =round(sin(c0* freq) *sin(r0* 50)*.5+.5) ;
Axis independent aliasing pattern:
The mipmapping and filtering parameters are determined by the partial derivatives of the texture coordinates in screen space, not by the distance (in fact, as soon as the fragment stage kicks in, there's no such thing as distance anymore).
I suggest you replace the fwidth visualization with a procedurally generated checkerboard, i.e. (mod(uv.s * k, 1) > 0.5)*(mod(uv.t * k, 1) < 0.5), where k is a scaling parameter. You'll see that the "density" of the checkerboard (and the aliasing artifacts) is highest where you've got the most red in your picture.
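For reference, fwidth(p) is defined as abs(dFdx(p)) + abs(dFdy(p)): the sum of the absolute changes of p between horizontally and vertically neighbouring pixels. Because it adds the two screen-axis derivatives instead of taking a Euclidean length, its result follows the screen axes, which may be why the visualization is not a radial gradient. The sketch below emulates this on the CPU with finite differences; fwidthApprox and valueAtPixel are hypothetical names, and real GPUs evaluate the derivatives per 2x2 pixel quad rather than exactly like this.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Finite-difference stand-in for the shader built-in:
// fwidth(p) = |dFdx(p)| + |dFdy(p)|, with derivatives taken between
// neighbouring pixels on screen.
double fwidthApprox(const std::function<double(int, int)>& valueAtPixel, int px, int py) {
    const double here = valueAtPixel(px, py);
    const double ddx = valueAtPixel(px + 1, py) - here;  // change one pixel to the right
    const double ddy = valueAtPixel(px, py + 1) - here;  // change one pixel up
    return std::fabs(ddx) + std::fabs(ddy);
}
```

For a texture coordinate that changes by 0.25 per pixel horizontally and 0.5 per pixel vertically, the function returns their sum, 0.75, whichever direction the camera is facing.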

How to move a camera using in a ray-tracer?

I am currently working on ray-tracing techniques and I think I've done a pretty good job, but I haven't covered the camera yet.
Until now, I have used a plane fragment as the view plane, located between (-width/2, height/2, 200) and (width/2, -height/2, 200) [200 is just a fixed z value and can be changed].
In addition, I mostly place the camera at e(0, 0, 1000), and I use a perspective projection.
I send rays from point e to the pixels, and write the result to the image's corresponding pixel after calculating the pixel color.
Here is an image I created. Hopefully you can guess where the eye and view plane are by looking at the image.
My question starts here. It's time to move my camera around, but I don't know how to map 2D view plane coordinates to the canonical coordinates. Is there a transformation matrix for that?
The method I have in mind requires knowing the 3D coordinates of the pixels on the view plane. I am not sure it's the right method to use. So, what do you suggest?
There are a variety of ways to do it. Here's what I do:
Choose a point to represent the camera location (camera_position).
Choose a vector that indicates the direction the camera is looking (camera_direction). (If you know a point the camera is looking at, you can compute this direction vector by subtracting camera_position from that point.) You probably want to normalize (camera_direction), in which case it's also the normal vector of the image plane.
Choose another normalized vector that's (approximately) "up" from the camera's point of view (camera_up).
camera_right = Cross(camera_direction, camera_up)
camera_up = Cross(camera_right, camera_direction) (This corrects for any slop in the choice of "up".)
Visualize the "center" of the image plane at camera_position + camera_direction. The up and right vectors lie in the image plane.
You can choose a rectangular section of the image plane to correspond to your screen. The ratio of the width or height of this rectangular section to the length of camera_direction determines the field of view. To zoom in you can increase the length of camera_direction or decrease the width and height. Do the opposite to zoom out.
So given a pixel position (i, j), you want the (x, y, z) of that pixel on the image plane. From that you can subtract camera_position to get a ray vector (which then needs to be normalized).
Ray ComputeCameraRay(int i, int j) {
const float width = 512.0; // pixels across
const float height = 512.0; // pixels high
// Map the pixel indices to [-0.5, 0.5) across the image plane
double normalized_i = (i / width) - 0.5;
double normalized_j = (j / height) - 0.5;
Vector3 image_point = normalized_i * camera_right +
normalized_j * camera_up +
camera_position + camera_direction;
// Not normalized here; normalize it if your Ray type expects a unit direction
Vector3 ray_direction = image_point - camera_position;
return Ray(camera_position, ray_direction);
}
This is meant to be illustrative, so it is not optimized.
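The construction of camera_direction, camera_right and camera_up from the list above could look roughly like this. It is a sketch under assumptions: Vector3, Cross, Normalize, CameraBasis and MakeCameraBasis are hypothetical stand-ins for whatever vector library is in use, and camera_right is normalized as an extra step so that the basis stays unit-length even when the chosen "up" is not perpendicular to the view direction.

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { double x, y, z; };

Vector3 Cross(const Vector3& a, const Vector3& b) {
    return Vector3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vector3 Normalize(const Vector3& v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0);  // a zero vector has no direction
    return Vector3{v.x / len, v.y / len, v.z / len};
}

struct CameraBasis { Vector3 direction, right, up; };

// Builds an orthonormal camera basis from the camera position, a point the
// camera looks at, and an approximate up vector (not parallel to the view direction).
CameraBasis MakeCameraBasis(const Vector3& position, const Vector3& lookAt,
                            const Vector3& approxUp) {
    CameraBasis basis;
    basis.direction = Normalize(Vector3{lookAt.x - position.x,
                                        lookAt.y - position.y,
                                        lookAt.z - position.z});
    basis.right = Normalize(Cross(basis.direction, approxUp));
    // Recomputing up corrects for any slop in the initial choice of approxUp.
    basis.up = Cross(basis.right, basis.direction);
    return basis;
}
```

Moving the camera then amounts to changing position and lookAt and rebuilding the basis; the ray computation itself stays the same.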
For rasterising renderers, you tend to need a transformation matrix because that's how you map directly from 3D coordinates to screen 2D coordinates.
For ray tracing, it's not necessary because you're typically starting from a known pixel coordinate in 2D space.
Given the eye position, a point in 3-space that's in the center of the screen, and vectors for "up" and "right", it's quite easy to calculate the 3D "ray" that goes from the eye position and through the specified pixel.
I've previously posted some sample code from my own ray tracer at https://stackoverflow.com/a/12892966/6782

How to tell the size of font in pixels when rendered with openGL

I'm working on the editor for Bitfighter, where we use the default OpenGL stroked font. We generally render the text with a linewidth of 2, but this makes smaller fonts less readable. What I'd like to do is detect when the fontsize will fall below some threshold, and drop the linewidth to 1. The problem is, after all the transforms and such are applied, I don't know how to tell how tall (in pixels) a font of size <fontsize> will be rendered.
This is the actual inner rendering function:
if(---something--- < thresholdSizeInPixels)
glLineWidth(1);
float scaleFactor = fontsize / 120;
glPushMatrix();
glTranslatef(x, y + (fix ? 0 : size), 0);
glRotatef(angle * radiansToDegreesConversion, 0, 0, 1);
glScalef(scaleFactor, -scaleFactor, 1);
for(S32 i = 0; string[i]; i++)
OpenglUtils::drawCharacter(string[i]);
glPopMatrix();
Just before calling this, I want to check the height of the font, then drop the linewidth if necessary. What goes in the ---something--- spot?
Bitfighter is a pure old-school 2D game, so there are no fancy 3D transforms going on. All code is in C++.
My solution was to combine the first part of Christian Rau's solution with a fragment of the second. Basically, I can get the current scaling factor with this:
static float modelview[16];
glGetFloatv(GL_MODELVIEW_MATRIX, modelview); // Fills modelview[]
float scalefact = modelview[0];
Then, I multiply scalefact by the fontsize in pixels, and multiply that by the ratio of windowHeight / canvasHeight to get the height in pixels that my text will be rendered.
That is...
textheight = scalefact * fontsize * windowHeight / canvasHeight
I also liked the idea of scaling the line thickness rather than stepping from 2 to 1 when a threshold is crossed. It all works very nicely now.
"where we use the default OpenGL stroked font"
OpenGL doesn't do fonts. There is no default OpenGL stroked font.
Maybe you are referring to GLUT and its glutStrokeCharacter function. Then please take note that GLUT is not part of OpenGL. It's an independent library, focused on providing a simplistic framework for small OpenGL demos and tutorials.
To answer your question: GLUT stroke fonts are defined in terms of vertices, so the usual transformations apply. Since usually all transformations are linear, you can simply transform the vector (0, base_height, 0) through the modelview and projection matrices and then do the perspective divide (gluProject does all this for you; GLU is not part of OpenGL either). The resulting vector is what you're looking for; take the vector length for scaling the width.
This should be determinable rather easily. The font's size in pixels just depends on the modelview transformation (actually only the scaling part), the projection transformation (which is a simple orthographic projection, I suppose) and the viewport settings, and of course on the size of an individual character of the font in untransformed form (what goes into the glVertex calls).
So you just take the font's basic size (let's consider the height only and call it height) and first do the modelview transformation (assuming the scaling shown in the code is the only one):
height *= scaleFactor;
Next we do the projection transformation:
height /= (top-bottom);
with top and bottom being the values you used when specifying the orthographic transformation (e.g. using glOrtho). And last but not least we do the viewport transformation:
height *= viewportHeight;
with viewportHeight being, you guessed it, the height of the viewport specified in the glViewport call. The resulting height should be the height of your font in pixels. You can use this to somehow scale the line width (without an if), as the line width parameter is in floats anyway, let OpenGL do the discretization.
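Put together, the three steps above amount to a single expression. The helper below is a minimal sketch with hypothetical names (pixelHeight and its parameters); it assumes a uniform modelview scale and an orthographic projection, and it takes absolute values so that a flipped y axis (top smaller than bottom, or a negative scale) still yields a positive height.

```cpp
#include <cassert>
#include <cmath>

// Height in pixels of geometry that is baseHeight units tall before any
// transform, under a uniform modelview scale and an orthographic projection.
double pixelHeight(double baseHeight, double scaleFactor,
                   double orthoBottom, double orthoTop, int viewportHeight) {
    assert(orthoTop != orthoBottom && viewportHeight > 0);
    const double extent = std::fabs(orthoTop - orthoBottom);  // visible range in world units
    return std::fabs(baseHeight * scaleFactor) / extent * viewportHeight;
}
```

For example, a glyph 120 units tall scaled by 0.25 in a projection that spans 600 units vertically on a 1200-pixel-high viewport comes out 60 pixels tall; the result could be compared with the threshold or used to scale the line width directly.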
If your transformation pipeline is more complicated, you could use a more general approach using the complete transformation matrices, perhaps with the help of gluProject to transform an object-space point to a screen-space point:
double x0, x1, y0, y1, z;
double modelview[16], projection[16];
int viewport[4];
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
glGetIntegerv(GL_VIEWPORT, viewport);
gluProject(0.0, 0.0, 0.0, modelview, projection, viewport, &x0, &y0, &z);
gluProject(fontWidth, fontHeight, 0.0, modelview, projection, viewport, &x1, &y1, &z);
x1 -= x0;
y1 -= y0;
fontScreenSize = sqrt(x1*x1 + y1*y1);
Here I took the diagonal of the character rather than only the height, to better ignore rotations, and used the origin as the reference point to ignore translations.
You might also find the answers to this question interesting, which give some more insight into OpenGL's transformation pipeline.