I have been researching MSAA and how it works, and I understand the basic idea behind it. Basically (in the non-MSAA case), if the center of the triangle covers the center of the pixel, the pixel is processed. However, if MSAA is involved, say 4x MSAA, then 4 other points are sampled as sub-samples. The pixel shader still executes per pixel, but the occlusion and coverage tests are applied for each sub-sample. The point I'm confused about is this: I imagine pixels as little squares on the screen, and I can't understand how the sub-sample points are determined inside the sample rectangle, or how the computer knows each pixel's sub-sample locations. Also, if there is only one square, how are the sub-sample colors determined? (If there is one square, there should be only one color.) Lastly, how can each sub-sample have a different depth value if it's basically the same pixel?
Thank you!
Basically (in the non-MSAA case), if the center of the triangle covers the center of the pixel, the pixel is processed.
No, that doesn't make sense. The center of a triangle is just a point, and that point falling onto a pixel center means nothing. The standard rasterization rule is: if the center of the pixel lies inside the triangle, a fragment is produced (with special rules for cases where the center of the pixel lies exactly on the boundary of the triangle).
The point I'm confused about is this: I imagine pixels as little squares on the screen, and I can't understand how the sub-sample points are determined inside the sample rectangle.
I have no idea what you mean by "sample rectangle", but setting that aside: if you use a frame of reference where a pixel is 1x1 units in area, then you can simply use fractional coordinates to describe locations within a pixel.
The default OpenGL window space uses a convention where (0,0) is the lower-left corner of the bottom-left pixel, (width,height) is the upper-right corner of the top-right pixel, and all pixel centers are at half-integer coordinates (x.5, y.5).
The rasterizer of a real GPU does work with fixed-point representations, and the D3D spec requires at least 8 bits of fractional precision for sub-pixel locations (GL leaves the exact precision up to the implementor).
Note that at this point, the pixel raster is not relevant at all. A coverage sample just tests whether some 2D point lies inside or outside a 2D triangle, and a point is always a mathematically infinitely small entity with an area of 0. The coordinate system in which this calculation is done can be defined arbitrarily.
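To make the fractional-coordinate idea concrete, here is a small C++ sketch that prints the window-space positions of one pixel's sub-samples. The 4x offsets used here are just one commonly cited pattern (the D3D "standard" 4x positions); real hardware patterns vary, and in OpenGL you can query the actual positions with glGetMultisamplefv(GL_SAMPLE_POSITION, ...).

```cpp
#include <cstdio>

int main() {
    // Pixel (px, py): its center sits at (px + 0.5, py + 0.5) in GL window space.
    int px = 100, py = 200;

    // Illustrative 4x sub-sample offsets from the pixel center, in 1/16-pixel
    // units (a commonly cited "standard" 4x pattern). Actual hardware
    // patterns are implementation-dependent.
    const int offsets[4][2] = { {-2, -6}, {6, -2}, {-6, 2}, {2, 6} };

    for (int i = 0; i < 4; ++i) {
        float sx = px + 0.5f + offsets[i][0] / 16.0f;
        float sy = py + 0.5f + offsets[i][1] / 16.0f;
        std::printf("sample %d at window coords (%.4f, %.4f)\n", i, sx, sy);
    }
    return 0;
}
```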
And if there is only one square, how are the sub-sample colors determined? (If there is one square, there should be only one color.) Lastly, how can each sub-sample have a different depth value if it's basically the same pixel?
When you use multisampling, you always use a multisampled framebuffer, which means that for each pixel there is not a single color, depth, ... value; there are n of them (your multisample count, typically between 2 and 16 inclusive). You need an additional pass to compute the single per-pixel values required for displaying the anti-aliased result (the graphics API might hide this from you when rendering to the default framebuffer, but when you work with custom render targets, you have to do it manually).
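As a rough illustration of that extra pass in OpenGL, the resolve can be done with glBlitFramebuffer, which averages each pixel's samples into a single color. This is just a sketch, assuming fboMS and fboResolve are already-created, complete framebuffer objects of the same size:

```cpp
#include <GL/gl.h>   // plus whatever loader provides GL 3.0+ entry points (glad, GLEW, ...)

// Resolve a multisampled FBO into a normal single-sample FBO so the averaged
// per-pixel colors can be displayed or sampled as a texture.
void resolveMultisampleFbo(GLuint fboMS, GLuint fboResolve, int width, int height)
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fboMS);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboResolve);
    // The blit collapses the n samples of each pixel into one color value.
    glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
```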
Both OpenGL and Direct3D use the pixel's center as the sample point during rasterization (without antialiasing).
For example, here is a quote from the D3D11 rasterization rules:
Any pixel center which falls inside a triangle is drawn
I tried to find out the reason for using (0.5, 0.5) instead of, say, (0.0, 0.0) or anything else in the range 0.0 - 1.0 for both x and y.
The result might be translated a little, but does it really matter? Does it produce visible artifacts? Maybe it makes some algorithms harder to implement? Or is it just a convention?
Again, I don't talk about multisampling here.
So what is the reason?
Maybe this is not the answer to your problem, but I'll try to answer your question from a ray tracing perspective.
In ray tracing, you can get the color of any point in the scene. But since we have a limited number of pixels, you need to downsample the image to your screen pixels.
In ray tracing, if you use 1 ray per pixel, we generally shoot the ray through the pixel's center point, which gives the most correct render results. In the image below, I try to show the difference between choosing a corner of the pixel and its center. The difference grows as your object gets farther from the rendering screen.
If you use more than one ray for each pixel, let's say 5 rays (4 corners + 1 center), and average the result, you will of course get a more realistic image (it handles aliasing problems much better). However, it will be slower, as you would guess.
So it is probably the same idea: OpenGL and DirectX take one sample for each pixel instead of multisampling and averaging (for performance reasons), and the center point probably gives the best result.
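For illustration, here is a minimal C++ sketch of per-pixel ray generation through either the pixel center or a corner, assuming a simple pinhole camera looking down -z with the image plane at z = -1 (all names are made up for the example, and aspect ratio is ignored for brevity):

```cpp
#include <cstdio>

struct Vec3 { float x, y, z; };

// Build a primary ray direction through a sample point inside pixel (px, py)
// of a width x height image. (ox, oy) is the offset within the pixel:
// (0.5, 0.5) is the pixel center, (0.0, 0.0) its lower-left corner.
Vec3 rayDirection(int px, int py, int width, int height, float ox, float oy)
{
    float u = (px + ox) / float(width)  * 2.0f - 1.0f;  // -1 .. 1
    float v = (py + oy) / float(height) * 2.0f - 1.0f;  // -1 .. 1
    return Vec3{u, v, -1.0f};                           // not normalized
}

int main()
{
    // Ray through the center of pixel (10, 20) vs. through its lower-left corner.
    Vec3 c = rayDirection(10, 20, 640, 480, 0.5f, 0.5f);
    Vec3 k = rayDirection(10, 20, 640, 480, 0.0f, 0.0f);
    std::printf("center: (%f, %f, %f)\n", c.x, c.y, c.z);
    std::printf("corner: (%f, %f, %f)\n", k.x, k.y, k.z);
    return 0;
}
```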
EDIT:
For area rasterization, the center of the pixel is used because if the center of the pixel lies inside the area, it is guaranteed (except near shape corners) that at least 50% of the pixel is inside the shape. That's why, since the covered proportion is greater than half, the pixel is colored.
For other corner choices there is no such general rule. Look at the example image below: the black point (bottom left) is outside the area and should not be drawn (and indeed, more than half of that pixel is outside). However, if you look at the blue point, 80% of its pixel is inside the area, but since the bottom-left corner is outside the area, it wouldn't be drawn.
This answer mainly focuses on the OP's comment on Cagkan Toptas' answer:
"Thanx for the answer, but my question is: why does it give better results? Does it at all? If yes, what is the explanation?"
It depends on how you define "better" results. From an image quality perspective, it does not change much, as long as the primitives are not specifically aligned (after the projection).
Using a single sample at (0,0) instead of (0.5, 0.5) would just shift the scene by half a pixel (along both axes, of course). In the general case of arbitrarily placed primitives, the average error should be the same.
However, if you want "pixel-exact" drawing (e.g. for text, UI, and full-screen post-processing effects), you would just have to take the convention of the underlying implementation into account, and both conventions would work.
One advantage of the "center at half integers" rule is that you can get the integer pixel coordinates (with respect to the sample locations) of the nearest pixel by a simple floor(floating_point_coords) operation, which is simpler than rounding to the nearest integer.
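A tiny sketch of that convenience: with centers at half-integers, truncating a window-space position immediately gives you the pixel that contains it.

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // With the "centers at .5" convention, pixel (3, 7) covers the window-space
    // area [3,4) x [7,8) and its center/sample location is (3.5, 7.5).
    float sampleX = 3.5f, sampleY = 7.5f;
    int px = (int)std::floor(sampleX);  // -> 3
    int py = (int)std::floor(sampleY);  // -> 7
    std::printf("sample (%.1f, %.1f) belongs to pixel (%d, %d)\n",
                sampleX, sampleY, px, py);
    // With centers at integer coordinates you would need round-to-nearest
    // instead, plus a tie-breaking rule for the .5 cases.
    return 0;
}
```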
I've been trying to utilize the techniques in Eric Penner's "Shader Amortization using Pixel Quad Message Passing" from GPU Pro 2, Chapter VI.2. The basic idea is that modern GPUs process fragment shaders in 2x2 fragment quads, and you can use ddx() and ddy() to get the value of some_var at all four fragments, as long as the following hold:
Your GPU supports high-quality derivatives
You know which fragment you're processing (top-left, top-right, bottom-left, bottom-right)
This opens up a lot of opportunities for fragment shader optimization (like distributing texture fetches over a 2x2 pixel quad) that you'd need Compute Shaders to beat.
My problem is this:
I can't deterministically detect which fragment I'm processing. Ideally, each fragment block would start at even-numbered output pixel coords like (0, 0), (2, 0), ... (1024, 1024), ..., so you'd just need to check whether the output pixel x and y coords are even or odd to know which fragment you're currently processing. The method Penner uses in the book assumes this works...but it seems to be going wrong for me.
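For reference, here is a minimal C++ illustration of that even/odd check (in a real fragment shader the integer pixel coordinates would come from gl_FragCoord / SV_Position; plain ints are used here just to show the parity logic):

```cpp
#include <cstdio>

int main()
{
    // Example window-space pixel coordinates of the fragment being shaded.
    int px = 5, py = 8;

    // Assuming 2x2 quads start on even pixel numbers, the low bit of each
    // coordinate tells you which corner of the quad this fragment is.
    bool right = (px & 1) != 0;  // odd x -> right column of the quad
    bool top   = (py & 1) != 0;  // odd y -> top row (origin at bottom-left)

    std::printf("fragment is the %s-%s corner of its 2x2 quad\n",
                top ? "top" : "bottom", right ? "right" : "left");
    return 0;
}
```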
Unfortunately, my 2x2 fragment quads appear to be starting in nondeterministic places: I've seen them start at (even, even), (even, odd), and (odd, even). I can't remember if I've seen (odd, odd) or not, but anyway, the arrangement seems to depend on a myriad of factors I don't understand, including the output resolution and shader specifics. (I'm testing on an 8800 GTS, in case anyone's wondering.)
Does anyone know what might be causing this nondeterminism or have any documentation on it? I understand there's virtually no official standardization in this area, but I'm more interested in how things work in practice on modern desktop-level GPUs, and I'm hoping there's a way to get this technique to work. If no one knows how to reason about the even/odd start behavior, does anyone know any other way of determining the current fragment's relative location in its 2x2 quad?
Thanks :)
As it turns out, the premise of my question was mostly wrong:
The 2x2 fragment quads DO almost always start on even pixel numbers...as long as the output resolution is even-numbered.
If the output resolution is odd-numbered (a possibility with the underlying program I'm working with), things can get more complicated, for obvious reasons. I don't expect there's any uniformity here across drivers/GPUs/etc. either, but my current tests (which themselves may still be buggy) appear to demonstrate 2x2 pixel quads starting at an odd pixel along the dimension with odd resolution, at least when the odd dimension is horizontal.
All of this weirdness helped obscure my bigger issue: The code I used to detect the fragment's location in the pixel quad was buggy. I tested by setting the texture coordinates equal within a pixel quad (set to the pixel quad center)...or so I thought. However, I calculated the screen coordinates based on a full-screen quad where the uv mapping has the +v axis pointing downward. The screenspace origin starts at the bottom-left, because it's based on the top-right quadrant of Cartesian coordinates, and I accidentally forgot to invert the v-coordinate of the uv offset I used to find the pixel quad center. Many of my nondeterministic observations came from failing to check my assumptions while debugging and misinterpreting things as a result, particularly in combination with odd resolutions.
This was an embarrassing mistake I should have caught a lot sooner, but I figured I'd detail it as a warning to others to always double-check the direction of your vertical axis when you're dealing with opposite-facing coordinate frames. ;)
UPDATE:
I ran across a situation where 2x2 pixel quads started on even pixel numbers even when the resolution was odd. Thanks to the nondeterminism under odd resolutions, I had to work out another solution:
If you're deriving your screen pixel numbers from the uv coords of a fullscreen quad (for post-processing), the fragment location derived from this is only useful for arranging/placing shared samples between fragments, etc., not for the quad-pixel communication itself. You'll need to have screen pixel numbers with respect to the screenspace origin for that. You can derive these from vertex positions, or you can use ddx().x and ddy().y on the uv-based pixel numbers to find out their screen direction and mirror the fragment position in the appropriate direction from there.
Calculate the fragment location based on your screen pixel numbers (with respect to the true screenspace origin) and the assumption 2x2 pixel quads start on even pixels. (If you used uv-based pixel numbers, now is the time to mirror things.)
Do a ddx().x and ddy().y on the fragment location, and if they're negative in either direction, you know the pixel quad starts at an odd pixel number in that direction...so mirror in that direction.
If you calculate two fragment positions, one based on a uv origin and one based on a screen origin, use the uv-based one for reasoning about uv-based sample placement, and use the screen-based one for actually obtaining the values of a variable at neighboring fragments.
Profit.
I'll post a link to my working MIT-licensed code once I release it on Github, along with usage examples (the speedup is unfortunately not what I expected, but whatever ;)). I'm just waiting to get done with a larger shader I'll be uploading along with it.
So I'm rendering this diagram each frame:
https://dl.dropbox.com/u/44766482/diagramm.png
Basically, each second it moves everything one pixel to the left and every frame it updates the rightmost pixel column with current data. So a lot of changes are made.
It is completely constructed from GL_LINES, always from bottom to top.
However, those black missing columns are not intentional at all; it's just the rasterizer not picking them up.
I'm using integers for positions and bytes for colors; the projection matrix is an exact 1:1 orthographic projection, so translating by 1 means moving 1 pixel.
So my problem is: how do I get rid of the black lines? I suppose I could write the data to a texture, but that seems expensive. Currently I use a VBO.
Render your columns as quads with a width of 1 pixel instead; the rasterization rules of OpenGL will make sure you have no holes that way.
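A rough sketch of what such a one-pixel-wide column quad could look like, assuming the 1:1 orthographic projection described in the question (immediate mode is used only for brevity; the same four vertices would go into the VBO):

```cpp
#include <GL/gl.h>

// Column at pixel column x, covering pixels yBottom (inclusive) to yTop
// (exclusive). The quad spans exactly one pixel horizontally: [x, x + 1).
void drawColumnQuad(int x, int yBottom, int yTop)
{
    glBegin(GL_QUADS);
    glVertex2i(x,     yBottom);
    glVertex2i(x + 1, yBottom);
    glVertex2i(x + 1, yTop);
    glVertex2i(x,     yTop);
    glEnd();
}
```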
I realize the question is already closed, but you can also get the effect you want by drawing your lines centered at 0.5. A pixel's center is at 0.5, and a line drawn there will always be picked up by the rasterizer in the right place.
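For example, a vertical column line could be issued with its x coordinate shifted onto the pixel centers (again just a sketch; the same offset applies to VBO data):

```cpp
#include <GL/gl.h>

// Vertical line for pixel column x, running from yBottom to yTop, with the
// x coordinate placed on the pixel centers so the rasterizer hits the column.
void drawColumnLine(int x, int yBottom, int yTop)
{
    glBegin(GL_LINES);
    glVertex2f(x + 0.5f, (float)yBottom);
    glVertex2f(x + 0.5f, (float)yTop);
    glEnd();
}
```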
Is it possible to draw a triangle within single pixel?
For example, when I specify the coordinates of the vertices of the triangle as A(0, 1), B(0, 0) and C(1, 0), I don't see a triangle being rendered at all. I was expecting to see a small triangle fitting within the pixel.
Is there something I am missing?
A pixel is the smallest discrete unit your display can show. Pixels can only have one color.
Therefore, while OpenGL can attempt to render a triangle to half of a pixel, all you will see is either that pixel filled in or that pixel not filled in. Antialiasing can make the filled in color less strong, but the color from a pixel is solid across the entire pixel.
That's simply the nature of a discrete image.
A pixel is a single point; how would a triangle fit into a single point?
It is the absolute smallest unit of an image.
Why do you think you can render half a pixel diagonally? A pixel is either on or off; it can't be in any other state. What OpenGL specification do you base your assumption on? Most 3D libraries decide whether to render a pixel based on how much of the sub-pixel information is filled in. But a pixel can't be partially painted; it is either on or off. A pixel is like a light bulb: you can't light up half of a light bulb.
Regardless, the 3D coordinate space represented doesn't map to the 2D space represented by the graphics plane of the camera drawn on the monitor.
Only with specific camera settings and drawing triangles in a 2D plane at a specific distance from the camera can you expect to try and map the 3D coordinates to 2D coordinates in a 1:1 manner, and even then it isn't precise in many cases.
Sub-pixel rendering doesn't mean what you think it means. It is a technique/algorithm for deciding which RGB elements of a pixel to light up and what color to make them, when there are lots of pixels to be lit up, especially in anti-aliasing situations, and the surrounding pixels are taken into consideration, on a 2D rasterized display. There is no way to partially illuminate a single pixel in a shape; sub-pixel rendering just varies the intensity of the color and brightness of a pixel in a more subtle manner. This only works on LCD displays. The Wikipedia article describes this very well.
You could never draw a triangle in a single pixel in that case either. A triangle will require at minimum 3 pixels to appear as something that might represent a triangle:
■
■ ■
and 6 pixels to represent a rasterized triangle with all three edges represented.
■
■ ■
■ ■ ■
Is it possible to draw a triangle within single pixel?
No!
You could try to evaluate how much of the pixel is covered by the triangle, but there's no way to draw only part of a pixel. A pixel is the smallest unit of a rasterized display device; it is the smallest element. And the pixel density of a display device sets the physical limit on the representable resolution.
The mathematical theory behind this is called "sampling theory", and most importantly you need to know about the so-called Nyquist theorem.
Pixels being the ultimately smallest elements of a picture are also the reason why you can't zoom into a picture like they do in CSI:NY; it's simply not possible, because there's no more information in the picture than there are pixels. (Well, if you have some additional source of information, for example by combining images taken over a longer period of time so you can estimate the movements, then it actually is possible to turn temporal information into spatial information, but that's a different story.)
Short Version
How can I draw short text labels in an OpenGL mapping application without having to manually recompute coordinates as the user zooms in and out?
Long Version
I have an OpenGL-based mapping application where I need to be able to draw data sets with up to about 250k points. Each point can have a short text label, usually about 4 or 5 characters long.
Currently, I do this using a single texture containing all the characters. For each point, I define a quad for each character in its label. So a point with the label "Fred" would have four quads associated with it, and each quad uses texture coordinates into that single texture to draw its corresponding character.
When I draw the map, I draw the map points themselves in map coordinates (e.g., longitude/latitude). Then I compute the position of each point in screen coordinates and update the four corner points for each of that point's label quads, again in screen coordinates. (For instance, if I determine the point is drawn at screen point 100, 150, I could set the quad for the first character in the point's label to be the rectangle starting with left-top point of 105, 155 and having a width of 6 pixels and a height of 12 pixels, as appropriate for the particular character. Then the second character might start at 120, 155, and so on.) Then once all these label character quads are positioned correctly, I draw them using an orthogonal screen projection.
The problem is that the process of updating all of those character quad coordinates is slow, taking about half a second for a particular test data set with 150k points (meaning that, since each label is about four characters long, there are about 150k * [4 characters per point] * [4 coordinate pairs per character] coordinate pairs that need to be set on each update).
If the map application didn't involve zooming, I would not need to recompute all these coordinates on each refresh. I could just compute the label coordinates once and then simply shift my viewing rectangle to show the right area. But with zooming, I can't see how to make it work without redoing the coordinate computation, because otherwise the characters would grow huge as you zoom in and tiny as you zoom out.
What I want (and what I understand OpenGL doesn't provide) is a way to tell OpenGL that a quad should be drawn in a fixed screen-coordinate rectangle, but that the top-left position of that rectangle should be a fixed distance from a given point in map-coordinate space. So I want both a primitive hierarchy (a given map point is the parent of its label character quads) and the ability to mix two different coordinate systems within this hierarchy.
I'm trying to understand whether there is some magic transformation matrix I can set that will do all this for me, but I can't see how to do it.
The other alternative I've considered is using a shader on each point to handle computing the label character quad coordinates for that point. I haven't worked with shaders before, and I'm just trying to understand (a) if it's possible to use shaders to do this, and (b) whether computing all those points in shader code actually buys me anything over computing them myself. (By the way, I have confirmed that the big bottleneck is computing the quad coordinates, not in uploading the updated coordinates to the GPU. The latter takes a bit of time, but it's the computation, the sheer number of coordinates being updated, that takes up the bulk of that half second.)
(Of course, the other other alternative is to be smarter about which labels need to be drawn in a given view in the first place. But for now I'd like to concentrate on the solution assuming all labels need to be drawn.)
So the basic problem ("because otherwise the characters will grow huge as you zoom in and tiny as you zoom out") is that you are doing calculations in map coordinates rather than screen coordinates? And if you did it in screen coords, this would require more computations? Obviously, any rendering needs to translate from map coordinates to screen coordinates. The problem seems to be that you are translating from map to screen too late. Therefore, rather than doing a single map-to-screen for each point, and then working in screen coords, you are working mostly in map coords, and then translating per-character to screen coords at the very end. And the slow part is that you are working in screen coords, then having to manually translate back to map coords just to tell OpenGL the map coords, and it will convert those back to screen coords! Is that a fair assessment of your problem?
The solution therefore is to push that transformation earlier in your pipeline. However, I can see why it is tricky, because at first glance, OpenGL seems to want to do everything in "world coordinates" (for you, map coords), but not in screen coords.
Firstly, I am wondering why you are doing separate coordinate calculations for each character. What font rendering system are you using? Something like FreeType will automatically generate a bitmap image of an entire string, and doesn't require you to work per-character [edit: this isn't quite true; see comments]. You definitely shouldn't need to calculate the map coordinate (or even screen coordinate) for every character. Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
Now as for working in screen coords, it may be helpful to learn a bit about shaders. The more you learn about OpenGL, the more you learn that it really isn't a 3D rendering engine at all. It's just a 2D graphics library with some very fast matrix primitives built in. OpenGL actually works, at the lowest level, in screen coordinates (not pixel coordinates -- it works in normalized screen space, I think from memory from -1 to 1 in both the X and Y axes). The only reason it "feels" like you're working in world coordinates is because of these matrices you have set up.
So I think the reason why you are working in map coords all the way until the end is because it's easiest: OpenGL naturally does the map-to-screen transform for you (using the matrices). You have to change that, because you want to work in screen coords yourself, and therefore you need to make the transformation a long time before OpenGL gets its hands on your data. So when you go to draw a label, you should manually apply the map-to-screen transformation matrix on each point, as follows:
You have a particular point (which needs a label drawn) in map coords.
Apply the map-to-screen matrix to convert the point to screen coords. This probably means multiplying the point by the MODELVIEW and PROJECTION matrices, using the same algorithm that OpenGL does when it's rendering a vertex. So you could either glGet the GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX to extract OpenGL's current matrices, or you could manually keep around a copy of the matrix yourself.
Now that you have the map label in screen coords, compute the position of the label's text. This is simply adding 5 pixels in the X and Y axes, as you said above. However, remember that you aren't in pixel space but in normalized screen space, so you are working in percentages (adding 0.05 units would add 5% of the screen space, for example). It's probably better not to think in pixels, because then your application will scale to match the resolution. But if you really want to think in pixels, you will have to calculate the pixels-to-units ratio based on the resolution.
Use glPushMatrix to save the current matrix, then glLoadIdentity to set the current matrix to the identity -- tell OpenGL not to transform your vertices. (I think you will have to do this for both the PROJECTION and MODELVIEW matrices.)
Draw your label, in screen coordinates (a rough sketch of these steps follows below).
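Putting those steps together, a rough legacy-OpenGL sketch might look like the following. It uses gluProject to do the matrix math of step 2, and, as a small deviation from the steps above, loads a pixel-space ortho projection instead of the raw identity so the 5-pixel offset can stay literal. drawLabelQuads is a hypothetical helper that emits the label's character quads:

```cpp
#include <GL/gl.h>
#include <GL/glu.h>

void drawLabelQuads(float x, float y);  // hypothetical: emits the character quads at pixel (x, y)

// Project one map point to window (pixel) coordinates, then draw its label
// in screen space without any further transformation of the map matrices.
void drawLabelAt(double mapX, double mapY)
{
    GLdouble model[16], proj[16];
    GLint viewport[4];
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);

    // Step 2: map coords -> window coords, the same math the GPU would do.
    GLdouble winX, winY, winZ;
    gluProject(mapX, mapY, 0.0, model, proj, viewport, &winX, &winY, &winZ);

    // Steps 4-5: switch to a plain pixel-space projection and draw the label
    // offset a fixed number of pixels from the projected point.
    glMatrixMode(GL_PROJECTION);
    glPushMatrix();
    glLoadIdentity();
    glOrtho(0, viewport[2], 0, viewport[3], -1, 1);   // pixel coordinates
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glLoadIdentity();

    drawLabelQuads((float)winX + 5.0f, (float)winY + 5.0f);

    glPopMatrix();                    // restore modelview
    glMatrixMode(GL_PROJECTION);
    glPopMatrix();                    // restore projection
    glMatrixMode(GL_MODELVIEW);
}
```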
So you don't really need to write a shader. You could certainly do this in a shader, and it would certainly make step 2 faster (no need to write your own software matrix multiply code; multiplying matrices on the GPU is extremely fast). But that would be a later optimisation, and a lot of work. I think the above steps will help you work in screen coordinates and avoid having to waste a lot of time just to give OpenGL map coordinates.
Side comment on:
"""
generate a bitmap image of an entire string, and doesn't require you to work per-character
...
Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
"""
Freetype or no, you could certainly compute a bitmap image for each label, rather than each character, but that would require one of:
storing thousands of different textures, one for each label
It seems like a bad idea to store that many textures, but maybe it's not.
or
rendering each label, for each point, at each screen update.
this would certainly be too slow.
Just to follow up on the resolution:
I didn't really solve this problem, but I ended up being smarter about when I draw labels in the first place. I was able to quickly determine whether I was about to draw too many characters (i.e., so many characters that, on a typical screen with a typical density of points, the labels would be too close together to read usefully), and in that case I simply don't label at all. When drawing up to about 5000 characters at a time, there isn't a noticeable slowdown from recomputing the character coordinates as described above.