OpenGL glVertex2f but pixels are integers - opengl

When I'm using this code:
glBegin(GL_POINTS);
glVertex2f(2.5, 2.5);
glVertex2f(3.2, 3.2);
glEnd();
If I understand correctly, the coordinates are relative to the bottom-left corner of the screen, but what exactly are those coordinates?
If they are in pixel units, aren't they supposed to be integers?
What is the point of using floating-point values when pixels are integer units?

If they are in pixel units, aren't they supposed to be integers?
They aren't in pixel units. The vertex data is transformed into the final window-space (= pixel) coordinates throughout the pipeline. The input coordinates you specify are in object space, which is a coordinate system you define as you see fit.
You should really make yourself familiar with the coordinate transformations.
Also, you should be aware that in OpenGL, you are not drawing pixels. You are drawing graphics primitives - points, lines, triangles - which are defined by a certain number of vertices each.
What is the point of using floating-point values when pixels are integer units?
Even in window space, floating-point coordinates are useful. Pixels are not discrete points, but represent some area. OpenGL (and other render APIs) define a pixel in window space to be a square with a side length of 1 unit. Vertices can fall at any (sub-pixel) position inside such a pixel square, and rasterization rules will be applied to generate the appropriate pixel-sized fragments for the primitives you are rendering.
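To make the relationship concrete, here is a small illustrative C sketch (not an API call, just the standard formula; the struct and function names are made up) of the viewport transform that maps normalized device coordinates to window space:

/* Illustrative sketch: NDC (each axis in [-1, 1]) to window-space coordinates. */
typedef struct { float x, y; } Vec2;

Vec2 ndc_to_window(Vec2 ndc, int vp_x, int vp_y, int vp_w, int vp_h)
{
    Vec2 win;
    /* Standard GL viewport transform: [-1, 1] maps to [vp_x, vp_x + vp_w]. */
    win.x = (ndc.x * 0.5f + 0.5f) * (float)vp_w + (float)vp_x;
    win.y = (ndc.y * 0.5f + 0.5f) * (float)vp_h + (float)vp_y;
    /* Pixel (i, j) covers the square [i, i+1) x [j, j+1), so its center sits
     * at (i + 0.5, j + 0.5); fractional window coordinates are meaningful. */
    return win;
}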

Confusion about MSAA

I was researching MSAA and how it works. I understand the concept and the idea behind it. Basically, if the center of the triangle covers the center of the pixel, the pixel is processed (in the non-MSAA case). However, if MSAA is involved, say 4x MSAA, then it will sample 4 other points as sub-samples. The pixel shader executes per pixel, but the occlusion and coverage tests are applied per sub-sample. The point I'm confused about is that I imagine a pixel as a little square on the screen, and I can't understand how the sub-sampling points are determined inside that sample rectangle. How is the computer aware of one pixel's sub-sample locations? And if there is only one square, how are the sub-sampled colors determined? (If there is one square, then there should be only one color.) Lastly, how can each sub-sample have a different depth value if it is basically the same pixel?
Thank you!
Basically, if the center of the triangle covers the center of the pixel, the pixel is processed (in the non-MSAA case).
No, that doesn't make sense. The center of a triangle is just a point, and that point falling onto a pixel center means nothing. The standard rasterization rule is: if the center of the pixel lies inside the triangle, a fragment is produced (with special rules for cases where the center of the pixel lies exactly on the boundary of the triangle).
The point I'm confused about is that I imagine a pixel as a little square on the screen, and I can't understand how the sub-sampling points are determined inside that sample rectangle.
No idea what you mean by "sample rectangle", but leaving that aside: if you use some coordinate frame of reference where a pixel is 1x1 units in area, then you can simply use fractional parts to describe locations within a pixel.
The default OpenGL window space uses a convention where (0,0) is the lower-left corner of the bottom-left pixel, (width,height) is the upper-right corner of the top-right pixel, and all pixel centers are at half-integer positions, i.e. at .5.
The rasterizer of a real GPU does work with fixed-point representations, and the D3D spec requires at least 8 bits of fractional precision for sub-pixel locations (GL leaves the exact precision up to the implementor).
Note that at this point, the pixel raster is not relevant at all. A coverage sample just tests whether some 2D point lies inside or outside of a 2D triangle, and a point is always a mathematically infinitely small entity with an area of 0. The conventions for the coordinate system in which this calculation is done can be defined arbitrarily.
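If you want to see the actual sub-sample locations your implementation uses, modern GL lets you query them. A minimal sketch, assuming a GL 3.2+ context with a multisampled framebuffer bound and a loader such as GLEW already initialized:

/* Print the sub-sample positions of the currently bound draw framebuffer.
 * Positions are reported in [0, 1], relative to the pixel's lower-left corner. */
#include <GL/glew.h>
#include <stdio.h>

void print_sample_positions(void)
{
    GLint samples = 0;
    glGetIntegerv(GL_SAMPLES, &samples);
    for (GLint i = 0; i < samples; ++i) {
        GLfloat pos[2];
        glGetMultisamplefv(GL_SAMPLE_POSITION, (GLuint)i, pos);
        printf("sample %d: (%.3f, %.3f)\n", i, pos[0], pos[1]);
    }
}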
And if there is only one square, how are the sub-sampled colors determined? (If there is one square, then there should be only one color.) Lastly, how can each sub-sample have a different depth value if it is basically the same pixel?
When you use multisampling, you always use a multisampled framebuffer, which means that for each pixel there is not a single color, depth, ... value, but n of them (your multisample count, typically between 2 and 16 inclusive). You will need an additional resolve pass to calculate the single per-pixel values needed for displaying the anti-aliased result (the graphics API might hide this from you when it is done on the default framebuffer, but when you work with custom render targets, you have to do it manually).
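For custom render targets, that resolve pass is typically just a blit from the multisampled framebuffer to a single-sampled one. A minimal sketch (the framebuffer names and sizes here are assumptions, not from the original post):

/* Resolve a multisampled FBO into a regular one; the blit averages the n
 * color samples of each pixel down to a single value. */
#include <GL/glew.h>

void resolve_msaa(GLuint msaaFBO, GLuint resolveFBO, int width, int height)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, msaaFBO);    /* n samples per pixel */
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, resolveFBO); /* 1 value per pixel   */
    glBlitFramebuffer(0, 0, width, height,
                      0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}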

What is, in simple terms, textureGrad()?

I read the Khronos wiki on this, but I don't really understand what it is saying. What exactly does textureGrad do?
I think it samples multiple mipmap levels and computes some color mixing using the explicit derivative vectors given to it, but I am not sure.
When you sample a texture, you need specific texture coordinates at which to sample the texture data. For the sake of simplicity, I'm going to assume a 2D texture, so the texture coordinates are a 2D vector (s,t). (The explanation is analogous for other dimensionalities.)
If you want to texture-map a triangle, one typically uses one of two strategies to get to the texture coordinates:
The texture coordinates are part of the model. Every vertex contains the 2D texture coordinates as a vertex attribute. During rasterization, those texture coordinates are interpolated across the primitive.
You specify a mathematical mapping. For example, you could define some function mapping the 3D object coordinates to some 2D texture coordinates. You can for example define some projection, and project the texture onto a surface, just like a real projector would project an image onto some real-world objects.
In either case, each fragment generated when rasterizing the primitive typically gets different texture coordinates, so each drawn pixel on the screen will get a different part of the texture.
The key point is this: each fragment has 2D pixel coordinates (x,y) as well as 2D texture coordinates (s,t), so we can basically interpret this relationship as a mathematical function:
(s,t) = T(x,y)
Since this is a vector function of the 2D pixel position vector (x,y), we can also build the partial derivatives along the x direction (to the right) and the y direction (upwards), which tell us the rate of change of the texture coordinates along those directions.
And the dTdx and dTdy in textureGrad are just that.
So what does the GPU need this for?
When you want to actually filter the texture (in contrast to simple point sampling), you need to know the pixel footprint in texture space. Each single fragment represents the area of one pixel on the screen, and you are going to use a single color value from the texture to represent the whole pixel (multisampling aside). The pixel footprint now represents the actual area the pixel would cover in texture space. We could calculate it by interpolating the texcoords not for the pixel center, but for the 4 pixel corners. The resulting texcoords would form a trapezoid in texture space.
When you minify the texture, several texels are mapped to the same pixel (so the pixel footprint is large in texture space). When you magnify it, each pixel covers only a fraction of the corresponding texel (so the footprint is quite small).
The texture footprint tells you:
if the texture is minified or magnified (GL has different filter settings for each case)
how many texels would be mapped to each pixel, and thus which mipmap level would be appropriate (a rough formula for this follows after this list)
how much anisotropy there is in the pixel footprint. Each pixel on the screen and each texel in texture space is basically a square, but the pixel footprint might deviate significantly from that, and can be much taller than wide or the other way around (especially in situations with high perspective distortion). Classic bilinear or trilinear texture filters always use a square filter footprint, but the anisotropic texture filter uses this information to generate a filter footprint which more closely matches the actual pixel footprint (to avoid mixing in texel data which shouldn't really belong to the pixel).
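As a rough sketch of that mipmap-level point (the function and parameter names here are made up; the formula is the standard one relating footprint size to level of detail):

/* Derive the mipmap level of detail (lambda) from the screen-space
 * derivatives of the texture coordinates. */
#include <math.h>

double lod_from_derivatives(double dsdx, double dtdx,   /* change of (s,t) one pixel to the right */
                            double dsdy, double dtdy,   /* change of (s,t) one pixel up */
                            double tex_w, double tex_h) /* texture size in texels */
{
    /* Convert from normalized texcoords to texel units. */
    double dudx = dsdx * tex_w, dvdx = dtdx * tex_h;
    double dudy = dsdy * tex_w, dvdy = dtdy * tex_h;

    /* Footprint size: the longer edge of the footprint parallelogram. */
    double rho = fmax(sqrt(dudx * dudx + dvdx * dvdx),
                      sqrt(dudy * dudy + dvdy * dvdy));

    /* rho texels map onto one pixel; each mip level halves the texture. */
    return log2(rho); /* > 0 means minification, <= 0 magnification */
}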
Instead of calculating the texture coordinates at all pixel corners, we are going to use the partial derivatives at the fragment center as an approximation for the pixel footprint.
The following diagram shows the geometric relationship:
It represents the footprint of four neighboring pixels (2x2) in texture space: the uniform grid is the texel grid, and the 4 trapezoids represent the 4 pixel footprints.
Now calculating the actual derivatives would imply that we have some more or less explicit formula T(x,y) as described above. GPUs usually use another approximation:
they just look at the actual texcoords of the neighboring fragments (which are going to be calculated anyway) in each 2x2 pixel block, and approximate the footprint by finite differencing - just subtracting the actual texcoords of neighboring fragments from each other.
The result is shown as the dotted parallelogram in the diagram.
In hardware, this is implemented so that 2x2 pixel quads are always shaded in parallel in the same warp/wavefront/SIMD group. The GLSL derivative functions like dFdx and dFdy simply work by subtracting the actual values of the neighboring fragments, and the standard texture function internally uses this mechanism on the texture coordinate argument. The textureGrad functions bypass that and allow you to specify your own values, which means you control what pixel footprint the GPU assumes when doing the actual filtering / mipmap level selection.
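As an illustration (a hedged sketch, not from the original answer; the uniform and variable names are made up), here is a fragment shader where texture() with its implicit derivatives and textureGrad() fed with dFdx/dFdy of the coordinates end up assuming the same footprint:

#version 330 core
uniform sampler2D tex;
in vec2 uv;
out vec4 fragColor;

void main() {
    vec2 dTdx = dFdx(uv);  // rate of change of (s,t) one pixel to the right
    vec2 dTdy = dFdy(uv);  // rate of change of (s,t) one pixel up
    vec4 a = texture(tex, uv);                  // implicit footprint
    vec4 b = textureGrad(tex, uv, dTdx, dTdy);  // same footprint, made explicit
    // Passing scaled-up derivatives instead would make the GPU assume a larger
    // footprint and pick a smaller (blurrier) mipmap level.
    fragColor = mix(a, b, 0.5);
}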

Why does OpenGL allow/use fractional values as the location of vertices?

As far as I understand, the location of a point/pixel cannot be a fraction, at least on a raster graphics system where the hardware uses pixels to display images.
Then, why and how does OpenGL use fractional values for plotting pixels?
For example, how is it possible: glVertex2f(0.15f, 0.51f); ?
This command does not plot any pixels. It merely defines the location of a point in 3D space (glVertex2f(x, y) is just shorthand for a 3D point with z = 0, while for a pixel on the screen you would only need 2 integer coordinates). This is the starting point for the OpenGL pipeline. This point then goes through a lot of transformations before it ends up on the screen.
Also, the coordinates are unitless. For example, you could say that your viewport spans from 0.0f to 1.0f, and then these coordinates make a lot of sense. Basically, you have to think of these points in terms of mathematics, not pixels.
I would suggest some reading on how OpenGL transformations work.
The vectors you pass into OpenGL are not viewport positions but arbitrary numbers in some vector space. Only after a chain of transformations are these numbers mapped to viewport pixel positions. With the old fixed-function pipeline this could be anything that can be represented by a vector-matrix multiplication.
These days, where everything is programmable (shaders), the mapping can be just about any kind of function you can think of. For example, the values you pass into glVertex (an immediate mode call, but available to shaders with OpenGL 2.1) may be interpreted as polar coordinates in the vertex shader:
This is a perfectly valid OpenGL 2.1 vertex shader that interprets the vertex position as being in polar coordinates. Note that due to triangles and lines being straight edges and polar coordinates being curvilinear, this gives good visual results only for points or highly tessellated primitives.
#version 110
void main() {
    gl_Position =
        gl_ModelViewProjectionMatrix
        * vec4(gl_Vertex.y * vec2(sin(gl_Vertex.x), cos(gl_Vertex.x)), 0.0, 1.0);
}
As you can see, the values passed to glVertex are actually arbitrary, unitless components of vectors in some vector space. Only by applying some transformation down to viewport space do these vectors gain meaning. Hence it makes no sense to impose a certain value range onto the values that go into the vertex attribute.
Vertex and pixel are very different things.
It's quite possible to have all your vertices within one pixel (although in this case you probably need help with LODing).
You might want to start here...
http://www.glprogramming.com/blue/ch01.html
Specifically...
Primitives are defined by a group of one or more vertices. A vertex defines a point, an endpoint of a line, or a corner of a polygon where two edges meet. Data (consisting of vertex coordinates, colors, normals, texture coordinates, and edge flags) is associated with a vertex, and each vertex and its associated data are processed independently, in order, and in the same way.
And...
Rasterization produces a series of frame buffer addresses and associated values using a two-dimensional description of a point, line segment, or polygon. Each fragment so produced is fed into the last stage, per-fragment operations, which performs the final operations on the data before it's stored as pixels in the frame buffer.
For your example, before glVertex2f(0.15f, 0.51f) ends up on the screen, there are many transforms to be done. Making a complex thing crudely simple: after moving your vertex to view space (applying the camera position and direction), the magic here is (1) the projection matrix, and (2) the viewport setting.
Internally, OpenGL "screen coordinates" (normalized device coordinates) live in a cube from (-1, -1, -1) to (1, 1, 1):
http://www.matrix44.net/cms/wp-content/uploads/2011/03/ogl_coord_object_space_cube.png
The projection matrix 'squeezes' the frustum into this cube (which you do in the vertex shader), assuming you have a perspective transform; if the projection is orthographic, the view volume is just a box, limited by the near and far values (and, in both cases, scaling factors):
http://www.songho.ca/opengl/files/gl_projectionmatrix01.png
EDIT: Maybe a better example is here:
http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#The_Projection_matrix
(EDIT: The Z coordinate is used as the depth value.) When fragments are finally mapped to pixels on the texture/framebuffer/screen, the coordinates are scaled according to the viewport settings:
https://www3.ntu.edu.sg/home/ehchua/programming/opengl/images/GL_2DViewportAspectRatio.png
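As a side note, if you want the values you pass to glVertex2f to be interpreted directly in pixel units, you can set the projection and viewport to match the window. A minimal legacy-GL sketch (width and height are the window size, assumed here):

/* Make glVertex2f coordinates map 1:1 to window pixels,
 * with (0, 0) at the bottom-left corner. */
void setup_pixel_projection(int width, int height)
{
    glViewport(0, 0, width, height);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, (double)width, 0.0, (double)height, -1.0, 1.0);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    /* Now glVertex2f(2.5f, 2.5f) lands exactly on the center of pixel (2, 2). */
}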
Hope this helps!

draw a triangle within a single pixel in opengl

Is it possible to draw a triangle within a single pixel?
For example, when I specify the coordinates of the vertices of the triangle as A(0, 1), B(0, 0) and C(1, 0), I don't see a triangle being rendered at all. I was expecting to see a small triangle fitting within the pixel.
Is there something I am missing?
A pixel is the smallest discrete unit your display can show. Pixels can only have one color.
Therefore, while OpenGL can attempt to render a triangle to half of a pixel, all you will see is either that pixel filled in or that pixel not filled in. Antialiasing can make the filled in color less strong, but the color from a pixel is solid across the entire pixel.
That's simply the nature of a discrete image.
A pixel is a single point; how does a triangle fit into a single point?
It is the absolute smallest unit of an image.
Why do you think you can render half a pixel diagonally? A pixel is either on or off; it can't be in any other state. What OpenGL specification do you base your assumption on? Most 3D libraries will decide whether to render a pixel based on how much of the sub-pixel information is filled in, but a pixel can't be partially painted: it is either on or off. A pixel is like a light bulb; you can't light up half of a light bulb.
Regardless, the 3D coordinate space represented doesn't map to the 2D space represented by the graphics plane of the camera drawn on the monitor.
Only with specific camera settings and drawing triangles in a 2D plane at a specific distance from the camera can you expect to try and map the 3D coordinates to 2D coordinates in a 1:1 manner, and even then it isn't precise in many cases.
Sub-pixel rendering doesn't mean what you think it means. It is a technique/algorithm to determine which RGB elements of a pixel to light up and what color to make them when there are lots of pixels to be lit up, especially in anti-aliasing situations, where the surrounding pixels are taken into consideration, on a 2D rasterized display. There is no way to partially illuminate a single pixel with a shape; sub-pixel rendering just varies the intensity of the color and brightness of a pixel in a more subtle manner. This only works on LCD displays. The Wikipedia article describes this very well.
You could never draw a triangle in a single pixel in that case either. A triangle will require at minimum 3 pixels to appear as something that might represent a triangle:
■
■ ■
and 6 pixels to represent a rasterized triangle with all three edges represented.
■
■ ■
■ ■ ■
Is it possible to draw a triangle within a single pixel?
No!
You could try to evaluate how much of the pixel is covered by the triangle, but there's no way to draw only part of a pixel. A pixel is the smallest unit of a rasterized display device, the smallest element there is. And the pixel density of a display device sets the physical limit on the representable resolution.
The mathematical theory behind this is called "sampling theory", and most importantly you need to know about the so-called Nyquist theorem.
Pixels being the ultimately smallest elements of a picture are also the reason why you can't zoom into a picture like they do in CSI:NY; it's simply not possible because there's no more information in the picture than there are pixels. (Well, if you have some additional source of information, for example by combining images taken over a longer period of time and estimating the movements, then it actually is possible to turn temporal information into spatial information, but that's a different story.)

labels in an opengl map application

Short Version
How can I draw short text labels in an OpenGL mapping application without having to manually recompute coordinates as the user zooms in and out?
Long Version
I have an OpenGL-based mapping application where I need to be able to draw data sets with up to about 250k points. Each point can have a short text label, usually about 4 or 5 characters long.
Currently, I do this using a single texture containing all the characters. For each point, I define a quad for each character in its label. So a point with the label "Fred" would have four quads associated with it, and each quad uses texture coordinates into that single texture to draw its corresponding character.
When I draw the map, I draw the map points themselves in map coordinates (e.g., longitude/latitude). Then I compute the position of each point in screen coordinates and update the four corner points for each of that point's label quads, again in screen coordinates. (For instance, if I determine the point is drawn at screen point 100, 150, I could set the quad for the first character in the point's label to be the rectangle starting with left-top point of 105, 155 and having a width of 6 pixels and a height of 12 pixels, as appropriate for the particular character. Then the second character might start at 120, 155, and so on.) Then once all these label character quads are positioned correctly, I draw them using an orthogonal screen projection.
The problem is that the process of updating all of those character quad coordinates is slow, taking about half a second for a particular test data set with 150k points (meaning that, since each label is about four characters long, there are about 150k * [4 characters per point] * [4 coordinate pairs per character] coordinate pairs that need to be set on each update).
If the map application didn't involve zooming, I would not need to recompute all these coordinates on each refresh. I could just compute the label coordinates once and then simply shift my viewing rectangle to show the right area. But with zooming, I can't see how to make it work without doing coordinate computation, because otherwise the characters will grow huge as you zoom in and tiny as you zoom out.
What I want (and what I understand OpenGL doesn't provide) is a way to tell OpenGL that a quad should be drawn in a fixed screen-coordinate rectangle, but that the top-left position of that rectangle should be a fixed distance from a given point in map coordinate space. So I want both a primitive hierarchy (a given map point is that parent of its label character quads) and the ability to mix two different coordinate systems within this hierarchy.
I'm trying to understand whether there is some magic transformation matrix I can set that will do all this for me, but I can't see how to do it.
The other alternative I've considered is using a shader on each point to handle computing the label character quad coordinates for that point. I haven't worked with shaders before, and I'm just trying to understand (a) whether it's possible to use shaders to do this, and (b) whether computing all those points in shader code actually buys me anything over computing them myself. (By the way, I have confirmed that the big bottleneck is computing the quad coordinates, not uploading the updated coordinates to the GPU. The latter takes a bit of time, but it's the computation, the sheer number of coordinates being updated, that takes up the bulk of that half second.)
(Of course, the other other alternative is to be smarter about which labels need to be drawn in a given view in the first place. But for now I'd like to concentrate on the solution assuming all labels need to be drawn.)
So the basic problem ("because otherwise the characters will grow huge as you zoom in and tiny as you zoom out") is that you are doing calculations in map coordinates rather than screen coordinates? And if you did it in screen coords, this would require more computations? Obviously, any rendering needs to translate from map coordinates to screen coordinates. The problem seems to be that you are translating from map to screen too late. Therefore, rather than doing a single map-to-screen for each point, and then working in screen coords, you are working mostly in map coords, and then translating per-character to screen coords at the very end. And the slow part is that you are working in screen coords, then having to manually translate back to map coords just to tell OpenGL the map coords, and it will convert those back to screen coords! Is that a fair assessment of your problem?
The solution therefore is to push that transformation earlier in your pipeline. However, I can see why it is tricky, because at first glance, OpenGL seems to want to do everything in "world coordinates" (for you, map coords), but not in screen coords.
Firstly, I am wondering why you are doing separate coordinate calculations for each character. What font rendering system are you using? Something like FreeType will automatically generate a bitmap image of an entire string, and doesn't require you to work per-character [edit: this isn't quite true; see comments]. You definitely shouldn't need to calculate the map coordinate (or even screen coordinate) for every character. Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
Now as for working in screen coords, it may be helpful to learn a bit about shaders. The more you learn about OpenGL, the more you learn that really it isn't a 3D rendering engine at all. It's just a 2D graphics library with some very fast matrix primitives built in. OpenGL actually works, at the lowest level, in screen coordinates (not pixel coordinates -- it works in normalized screen space, I think from memory from -1 to 1 in both the X and Y axes). The only reason it "feels" like you're working in world coordinates is because of these matrices you have set up.
So I think the reason why you are working in map coords all the way until the end is because it's easiest: OpenGL naturally does the map-to-screen transform for you (using the matrices). You have to change that, because you want to work in screen coords yourself, and therefore you need to make the transformation a long time before OpenGL gets its hands on your data. So when you go to draw a label, you should manually apply the map-to-screen transformation matrix on each point, as follows:
You have a particular point (which needs a label drawn) in map coords.
Apply the map-to-screen matrix to convert the point to screen coords. This probably means multiplying the point by the MODELVIEW and PROJECTION matrices, using the same algorithm that OpenGL does when it's rendering a vertex. So you could either glGet the GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX to extract OpenGL's current matrices, or you could manually keep around a copy of the matrix yourself.
Now that you have the map label in screen coords, compute the position of the label's text. This is simply adding 5 pixels in the X and Y axes, as you said above. However, remember that you aren't in pixel space but normalized screen space, so you are working in fractions of the screen (adding 0.05 units would add 5% of the screen space, for example). It's probably better not to think in pixels, because then your application will scale to match the resolution. But if you really want to think in pixels, you will have to calculate the pixels-to-units ratio based on the resolution.
Use glPushMatrix to save the current matrix, then glLoadIdentity to set the current matrix to the identity -- tell OpenGL not to transform your vertices. (I think you will have to do this for both the PROJECTION and MODELVIEW matrices.)
Draw your label, in screen coordinates.
So you don't really need to write a shader. You could certainly do this in a shader, and it would certainly make step 2 faster (no need to write your own software matrix multiply code; multiplying matrices on the GPU is extremely fast). But that would be a later optimisation, and a lot of work. I think the above steps will help you work in screen coordinates and avoid having to waste a lot of time just to give OpenGL map coordinates.
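A rough sketch of those steps in legacy OpenGL might look like the following (draw_label_quad and the 5-pixel offset are hypothetical placeholders, and the temporary projection here is set up directly in pixel units rather than normalized units):

#include <GL/glu.h>

void draw_label_quad(float x, float y); /* hypothetical: draws one label's textured quads */

void draw_label_at_map_point(double mapX, double mapY, double mapZ,
                             int winWidth, int winHeight)
{
    GLdouble model[16], proj[16];
    GLint viewport[4];
    GLdouble winX, winY, winZ;

    /* Steps 1-2: map coords -> window (pixel) coords, using GL's own matrices. */
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);
    gluProject(mapX, mapY, mapZ, model, proj, viewport, &winX, &winY, &winZ);

    /* Step 4: temporarily switch to a pixel-space projection. */
    glMatrixMode(GL_PROJECTION);
    glPushMatrix();
    glLoadIdentity();
    gluOrtho2D(0.0, (double)winWidth, 0.0, (double)winHeight);
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glLoadIdentity();

    /* Steps 3 and 5: offset by a few pixels and draw the label. */
    draw_label_quad((float)winX + 5.0f, (float)winY + 5.0f);

    glPopMatrix();               /* restore MODELVIEW */
    glMatrixMode(GL_PROJECTION);
    glPopMatrix();               /* restore PROJECTION */
    glMatrixMode(GL_MODELVIEW);
}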
Side comment on:
"""
generate a bitmap image of an entire string, and doesn't require you to work per-character
...
Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
"""
Freetype or no, you could certainly compute a bitmap image for each label, rather than each character, but that would require one of:
storing thousands of different textures, one for each label
It seems like a bad idea to store that many textures, but maybe it's not.
or
rendering each label, for each point, at each screen update.
this would certainly be too slow.
Just to follow up on the resolution:
I didn't really solve this problem, but I ended up being smarter about when I draw labels in the first place. I was able to quickly determine whether I was about to draw too many characters (i.e., so many characters that on a typical screen with a typical density of points the labels would be too close together to read in a useful way), and in that case I simply don't draw labels at all. When drawing up to about 5000 characters at a time, there isn't a noticeable slowdown recomputing the character coordinates as described above.