In OpenGL, vertices are specified between -1.0 and 1.0 range in NDC and then are mapped to the actual screen. But isn't it possible that with very large screen resolution it becomes impossible to specify the exact pixel location on a screen with this limited floating point value range?
So, mathematically, how large would the screen resolution have to be for that to happen?
A standard (IEEE 754) 32-bit float has 24 bits of precision in the mantissa. 23 bits are stored, plus an implicit leading 1. Since we're looking at a range of -1.0 to 1.0 here, we can also include the sign bit when estimating the precision. So that gives 25 bits of precision.
25 bits of precision is enough to cover 2^25 values. 2^25 = 33,554,432. So with float precision, we could handle a resolution of about 33,554,432 x 33,554,432 pixels. I think we're safe for a while!
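A quick way to sanity-check that estimate is to look at the spacing between adjacent 32-bit floats just below 1.0, the coarsest part of the NDC range. A minimal, standalone C++ sketch (not from the question):

#include <cmath>
#include <cstdio>

int main() {
    // Gap between 1.0f and the next float below it: 2^-24, about 6e-8.
    float step = 1.0f - std::nextafterf(1.0f, 0.0f);
    std::printf("worst-case NDC step: %g\n", step);        // ~5.96e-08

    // Number of such steps across [-1, 1]: about 2^25, matching the
    // 33,554,432 figure above.
    std::printf("distinct positions:  %g\n", 2.0f / step); // ~3.36e+07
}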
Generally the coordinates used for rasterization are not floating-point at all.
They are fixed-point with a few bits reserved for subpixel accuracy (you absolutely need this since pixel coverage for things like triangles is based on distance from pixel center).
The amount of subpixel precision you are afforded really depends on the value of GL_MAX_VIEWPORT_DIMS. But if GL_MAX_VIEWPORT_DIMS did not exist, then for sure, it would make sense to use floating-point pixel coordinates since you would want to support a massive (potentially unknown) range of coordinates.
In the minimum OpenGL implementation, there must be at least 4 bits of sub-pixel precision (GL_SUBPIXEL_BITS), so if your GPU used 16 bits for raster coordinates, that would give you 12 bits (integer) + 4 bits (fractional) to spread across GL_MAX_VIEWPORT_DIMS (the value would probably be 4096 for 12.4 fixed-point). Such an implementation would limit the integer coordinates to the range [0, 4095] and would divide each of those integer coordinates into 16 sub-pixel positions.
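As an illustration, converting a floating-point window coordinate to such a 12.4 fixed-point raster coordinate might look like the following sketch; the function names are made up, and real implementations do this in hardware:

#include <cmath>
#include <cstdint>

// 12.4 fixed point: the high 12 bits are the integer pixel coordinate
// (0..4095), the low 4 bits pick one of 16 sub-pixel positions.
uint16_t to_fixed_12_4(float window_coord) {
    // Assumes window_coord is already clamped to [0, 4096).
    return (uint16_t)std::lround(window_coord * 16.0f);
}

float from_fixed_12_4(uint16_t fx) {
    return fx / 16.0f;   // e.g. 0x0018 -> 1.5 (pixel 1, sub-pixel 8 of 16)
}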
Encoding
As part of a graphical application I'm currently working on, I need to store three signed floats per each pixel of a 32 bit texture. At the moment, to reach this goal, I'm using the following C++ function:
void encode_full(float* rgba, unsigned char* c) {
    int range = 8;                              // values lie in (-4, 4)
    for (int i = 0; i < 3; i++) {
        rgba[i] += range / 2;                   // shift (-4, 4) to (0, 8)
        rgba[i] /= range;                       // normalize to (0, 1)
        rgba[i] *= 255.0f;                      // scale to (0, 255)
        c[i] = (unsigned char)floor(rgba[i]);   // quantize to 8 bits (needs <cmath>)
    }
    c[3] = 255;                                 // alpha channel is unused
}
Although this encoding function entails a considerable loss of precision, the impact is mitigated by the fact that the range of values under consideration is limited to the interval (-4, 4).
Nonetheless, even though the function yields decent results, I think I could do a considerably better job by exploiting the (currently unused) alpha channel to gain additional precision. In particular, I was thinking of using 11 bits for the first float, 11 bits for the second, and 10 bits for the last float, or 10 - 10 - 10 - 2 (unused). OpenGL has a similar format, called R11F_G11F_B10F.
However, I'm having some difficulties coming up with an encoding function for this particular format. Does anyone know how to write such a function in C++?
Decoding
On the decoding side, this is the function I'm using within my shader.
float3 decode(float4 color) {
    int range = 8;
    // Undo the encoding: expand the stored (0, 1) values back to (-4, 4).
    return color.xyz * range - range / 2;
}
Please note that the shader is written in Cg and used within the Unity engine. Also note that Unity's implementation of Cg shaders handles only a subset of the Cg language (for instance, pack/unpack functions are not supported).
If possible, along with the encoding function, a bit of help for the decoding function would be highly appreciated. Thanks!
Edit
I've mentioned the R11F_G11F_B10F only as a frame of reference for the way the bits are to be split among the color channels. I don't want a float representation, since this would actually imply a loss of precision for the given range, as pointed out in some of the answers.
"10 bits" translates to an integer between 0 and 1023, so the mapping from [-4.0,+4.0] trivially is floor((x+4.0) * (1023.0/8.0)). For 11 bits, substitute 2047.
Decoding is the other way around, (y*8.0/1023.0) - 4.0.
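Building on that mapping, a sketch of a C++ encoder could quantize the three components to 11/11/10 bits, pack them into a single 32-bit word, and split that word across the four bytes of the RGBA texel. This is only one possible layout; the function name, the bit positions, and the assumption that inputs lie strictly inside (-4, 4) are mine, not part of the answer above:

#include <cmath>
#include <cstdint>

void encode_11_11_10(const float* v, unsigned char* c) {
    // Quantize: (-4, 4) -> [0, 2047] for the two 11-bit components,
    //           (-4, 4) -> [0, 1023] for the 10-bit component.
    uint32_t a = (uint32_t)std::floor((v[0] + 4.0f) * (2047.0f / 8.0f));
    uint32_t b = (uint32_t)std::floor((v[1] + 4.0f) * (2047.0f / 8.0f));
    uint32_t d = (uint32_t)std::floor((v[2] + 4.0f) * (1023.0f / 8.0f));

    // Pack into one word: a in bits 31..21, b in bits 20..10, d in bits 9..0.
    uint32_t packed = (a << 21) | (b << 10) | d;

    // One byte per RGBA channel.
    c[0] = (packed >> 24) & 0xFF;
    c[1] = (packed >> 16) & 0xFF;
    c[2] = (packed >> 8)  & 0xFF;
    c[3] = packed & 0xFF;
}

Note that decoding then has to reassemble the 32-bit word from the four sampled channels before undoing the quantization, which is awkward in a shader without integer bit operations, so it is worth checking whether the Unity/Cg target supports them.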
I think using GL_R11F_G11F_B10F is not going to help in your case. As the format name suggests, the components here are 11-bit and 10-bit float numbers, meaning that they are stored as a mantissa and exponent. More specifically, from the spec:
An unsigned 11-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 6-bit mantissa (M).
An unsigned 10-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 5-bit mantissa (M).
In both cases, as is common for floating-point formats, there is an implicit leading 1 bit for the mantissa. So effectively, the mantissa has 7 bits of precision for the 11-bit case, and 6 bits for the 10-bit case.
This is less than the 8-bit precision you're currently using. Now, it's important to understand that the precision in the float case is non-uniform and relative to the magnitude of the number. So very small numbers would actually have better precision than an 8-bit fixed-point number, while numbers towards the top of the range would have worse precision. If you use the natural mapping of your [-4.0, 4.0] range to positive floats, for example by simply adding 4.0 before converting to the 11/10-bit unsigned float, you would get better precision for values close to -4.0, but worse precision for values close to 4.0.
The main advantage of float formats is really that they can store a much wider range of values, while still maintaining good relative precision.
As long as you want to keep memory use at 4 bytes/pixel, a much better choice for you would be a format like GL_RGB10, giving you an actual precision of 10 bits for each component. This is very similar to GL_RGB10_A2 (and its unsigned sibling GL_RGB10_A2UI), except that it does not expose the alpha component you are not using.
If you're willing to increase memory usage beyond 4 bytes/pixel, you have numerous options. For example, GL_RGBA16 will give you 16 bits of fixed point precision per component. GL_RGB16F gives you 16-bit floats (with 11 bits relative precision). Or you can go all out with GL_RGB32F, which gives you 32-bit float for each component.
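For reference, allocating such a 10-bit-per-channel texture in plain OpenGL (outside Unity) might look like the sketch below. It assumes headers or a loader that expose the GL 1.2+ tokens, and that each texel has already been packed into a single 32-bit word; the function name is made up:

#include <GL/gl.h>   // plus a loader/extension header for the tokens below
#include <cstdint>
#include <vector>

GLuint create_rgb10_texture(int width, int height,
                            const std::vector<uint32_t>& packed_texels) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // 10 bits per color channel, 2 (unused) alpha bits per texel.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB10_A2, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_INT_2_10_10_10_REV, packed_texels.data());
    // No mipmaps in this sketch, so pick a non-mipmapped filter.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    return tex;
}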
What's the problem with having floats of fewer than 10 bits? Why don't we have 8-bit floats? I can imagine how it would affect the outcome if GLfloat is used for colors, but I can't imagine how it affects vertices. Do problems start to occur when we zoom into objects?
Yes, the GPU treats many values as 32-bit values, and even the OpenGL Red Book suggests using half floats whenever possible, but how can I know the lower limit of precision I can get away with?
Why don't we have 8-bit floats? I can imagine how it would affect the outcome if GLfloat is used for colors, but I can't imagine how it affects vertices. Do problems start to occur when we zoom into objects?
Well, yes. Given that even low-resolution displays these days have at least 1024 pixels in one direction, you need at least 10 significant bits to accurately represent a position on the screen. So assuming that the whole transformation chain is performed without loss of precision (which is obviously not the case), this means that you need at least 11 significant bits in the original data.
In a floating-point value, the mantissa is what provides the significant digits. In a half-precision float (16 bits total) the mantissa is 11 bits long, which is the least amount of precision one needs to represent vertices in screen space without roundoff artifacts from the transformation operations becoming visible on a low-resolution screen.
8 bits would be just too little precision to be useful for anything.
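To make the 11-bit-significand argument concrete, here is a small sketch (the half_ulp helper is made up) that computes the spacing between representable half-precision values at a given magnitude:

#include <cmath>
#include <cstdio>

// Spacing (ulp) of a binary float with an 11-bit significand (half precision)
// at magnitude x.
float half_ulp(float x) {
    int exp;
    std::frexp(x, &exp);               // x = m * 2^exp, with m in [0.5, 1)
    return std::ldexp(1.0f, exp - 11); // one unit in the last of 11 bits
}

int main() {
    std::printf("%g\n", half_ulp(1023.0f)); // 0.5 -> sub-pixel steps still possible
    std::printf("%g\n", half_ulp(2047.0f)); // 1.0 -> whole-pixel steps only
    std::printf("%g\n", half_ulp(4095.0f)); // 2.0 -> coarser than one pixel
}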
What is the advantage of, or need for, integer and SNORM textures? When do we need texture values in the (-1, 0) range?
Are there any examples where we specifically need integer or SNORM textures?
According to https://www.opengl.org/registry/specs/EXT/texture_snorm.txt,
signed normalized integer texture formats are used to represent floating-point values in the range [-1.0, 1.0] with an exact representation of floating-point zero.
It is nice for storing noise, especially Perlin.
Better than float textures because no bits wasted on the exponent.
Unfortunately, as of GL 4.4, none of the snorm formats can be rendered to.
You can use the unorm formats and *2-1 for the same effect, although there may be issues with getting an exact 0.
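A CPU-side sketch of that unorm workaround, with made-up helper names, showing the exact-zero caveat:

#include <cmath>
#include <cstdio>

// Store a value from [-1, 1] in an 8-bit unorm channel...
unsigned char encode_unorm8(float x) {                 // [-1, 1] -> [0, 255]
    return (unsigned char)std::lround((x * 0.5f + 0.5f) * 255.0f);
}

// ...and expand it back with the * 2 - 1 trick on read.
float decode_unorm8(unsigned char c) {                 // [0, 255] -> [-1, 1]
    return (c / 255.0f) * 2.0f - 1.0f;
}

int main() {
    // 0.0 lands between two codes (127.5), so the round trip gives ~0.0039
    // instead of an exact zero -- the issue mentioned above.
    std::printf("%f\n", decode_unorm8(encode_unorm8(0.0f)));
}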
Maybe I don't understand the IEEE 754 standard that well, but given a set of floating-point values that are float or double, for example:
56.543f 3238.124124f 121.3f ...
you can convert them to values ranging from 0 to 1; that is, you normalize them by choosing an appropriate common factor based on the maximum and minimum values in the set.
Now my point is that in this transformation I need much higher precision for the destination set, which ranges from 0 to 1, than for the first one, especially if the values in the first set cover a wide numerical range (really big and really small values).
How can the float or double type (or the IEEE 754 standard, if you want) handle this situation while providing more precision for the second set of values, given that I will basically not need an integer part?
Or does it not handle this at all, so that I need fixed-point math with a totally different type?
Floating point numbers are stored in a format similar to scientific notation. Internally, they align the leading 1 of the binary representation to the top of the significand. Each value is carried with the same number of binary digits of precision relative to its own magnitude.
When you compress your set of floating point values to the range 0..1, the only precision loss you will get will be due to the rounding that occurs in the various steps of the process.
If you're merely compressing by scaling, you will lose only a small amount of precision near the LSBs of the mantissa (around 1 or 2 ulp, where ulp means "unit in the last place").
If you also need to shift your data, then things get trickier. If your data is all positive, then subtracting off the smallest number will not damage anything. But, if your data is a mixture of positive and negative data, then some of your values near zero may suffer a loss in precision.
If you do all the arithmetic at double precision, you'll carry 53 bits of precision through the calculation. If your precision needs fit within that (which likely they do), then you'll be fine. Otherwise, the exact numerical performance will depend on the distribution of your data.
Single and double IEEE floats have a format where the exponent and fraction parts have fixed bit-width. So this is not possible (i.e. you will always have unused bits if you only store values between 0 and 1). (See: http://en.wikipedia.org/wiki/Single-precision_floating-point_format)
Are you sure the 52-bit wide fraction part of a double is not precise enough?
Edit: If you use the whole range of the floating-point format, you will lose precision when normalizing the values. The rounding can be off, and small enough values will become 0. Unless you know that this is a problem, don't worry. Otherwise you have to look for some other solution, as mentioned in other answers.
Having binary floating point values (with an implicit leading one) expressed as
(1+fraction) * 2^exponent where fraction < 1
A division a/b is:
a/b = (1+fraction(a)) / (1+fraction(b)) * 2^(exponent(a) - exponent(b))
Hence division/multiplication has essentially no loss of precision.
A subtraction a-b is:
a-b = (1+fraction(a)) * 2^exponent(a) - (1+fraction(b)) * 2^exponent(b)
Hence a subtraction/addition might have a loss of precision (big - tiny == big) !
Clamping a value x in a range [min, max] to [0, 1]
(x - min) / (max - min)
will have precision issues if any subtraction has a loss of precision.
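A small demonstration of the "big - tiny == big" effect and why it matters for that formula:

#include <cstdio>

int main() {
    // At this magnitude adjacent floats are 8 apart, so subtracting 0.5
    // changes nothing: big - tiny == big.
    float big = 1.0e8f, tiny = 0.5f;
    std::printf("%s\n", (big - tiny == big) ? "difference lost" : "difference kept");

    // The same effect makes (x - min) / (max - min) lose the low-order
    // digits of x when x and min are large and close together.
}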
Answering your question:
Nothing handles it automatically; choose a suitable representation (floating point, fractions, multi-precision, ...) for your algorithms and expected data.
If you have a selection of doubles and you normalize them to between 0.0 and 1.0, there are a number of sources of precision loss. They are all, however, much smaller than you suspect.
First, you will lose some precision in the arithmetic operations required to normalize them as rounding occurs. This is relatively small -- a bit or so per operation -- and usually relatively random.
Second, the exponent field will no longer use its positive-exponent range.
Third, as all the values are positive, the sign bit will also be wasted.
Fourth, if the input space does not include +inf, -inf, NaN, or the like, those code points will also be wasted.
But, for the most part, you'll waste about 3 bits of information in a 64-bit double in your normalization, one of which is the kind of thing that is nearly unavoidable when you deal with finite-bit-width values.
Any 64 bit fixed point representation of the values from 0 to 1 will have far less "range" than doubles. A double can represent something on the order of 10^-300, while a 64 bit fixed point representation that includes 1.0 can only go as low as 10^-19 or so. (The 64 bit fixed point representation can represent 1 - 10^-19 as being distinct from 1, while the double cannot, but the 64 bit fixed point value can not represent anything smaller than 2^-64, while doubles can).
Some of the numbers above are approximate, and may depend on rounding/exact format.
For higher precision you can try http://www.boost.org/doc/libs/1_55_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats.html.
Note also that for the numerically critical operations + and -, there are special algorithms that minimize the numerical error they introduce:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
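For completeness, a minimal sketch of the Kahan (compensated) summation referenced above:

#include <vector>

// Compensated summation: a running correction term recovers the low-order
// bits that plain addition drops when a small addend meets a large sum.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0, c = 0.0;   // c holds the lost low-order part
    for (double x : xs) {
        double y = x - c;        // apply the previous correction
        double t = sum + y;      // low-order bits of y may be lost here
        c = (t - sum) - y;       // recover what was lost
        sum = t;
    }
    return sum;
}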
I'm using floats to represent a position in my game:
struct Position
{
    float x;
    float y;
};
I'm wondering if this is the best choice and what the consequences will be as the position values continue to grow larger. I took some time to brush up on how floats are stored and realized that I am a little confused.
(I'm using Microsoft Visual C++ compiler.)
In float.h, FLT_MAX is defined as follows:
#define FLT_MAX 3.402823466e+38F /* max value */
which is 340282346600000000000000000000000000000.
That value is much greater than UINT_MAX which is defined as:
#define UINT_MAX 0xffffffff
and corresponds to the value 4294967295.
Based on this, it seems like a float would be a good choice to store a very large number like a position. Even though FLT_MAX is very large, I'm wondering how the precision issues will come into play.
Based on my understanding, a float uses 1 bit to store the sign, 8 bits to store the exponent, and 23 bits to store the mantissa (a leading 1 is assumed):
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
That means FLT_MAX looks like:
0 11111110 11111111111111111111111
(an exponent field of all 1s is reserved for infinities and NaNs), which is the equivalent of:
1.11111111111111111111111 x 2^127
or, written out in binary, 24 ones followed by 104 zeros.
Even knowing this, I have trouble visualizing the loss of precision and I'm getting confused thinking about what will happen as the values continue to increase.
Is there any easier way to think about this? Are floats or doubles generally used to store very large numbers over something like an unsigned int?
A way of thinking about the precision of a float is to consider that it has roughly 5 significant digits of accuracy. So if your units are meters and you have something 1 km away, that's 1000 m; attempting to deal with that object at a resolution of 10 cm (0.1 m) or less may be problematic.
The usual approach in a game would be to use floats, but to divide the world up such that positions are relative to local co-ordinate systems (for example, divide the world into a grid, and for each grid square have a translation value). Everything will have enough precision until it gets transformed relative to the camera for rendering, at which point the imprecision for far away things is not a problem.
As an example, imagine a game set in the solar system. If the origin of your co-ordinate system is in the heart of the sun, then co-ordinates on the surface of planets will be impossible to represent accurately in a float. However if you instead have a co-ordinate system relative to the planet's surface, which in turn is relative to the center of the planet, and then you know where the planet is relative to the sun, you can operate on things in a local space with accuracy, and then transform into whatever space you want for rendering.
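A rough sketch of such grid-relative positions; all names and the cell size are illustrative, not taken from any particular engine:

#include <cstdint>

constexpr float kCellSize = 1000.0f;   // one grid square = 1 km

struct GridPosition {
    int32_t cell_x, cell_y;   // which grid square (exact integers)
    float   local_x, local_y; // offset within the square, in meters
};

// Position of p relative to the camera's grid square: the magnitudes stay
// small, so precision is preserved until the final transform for rendering.
void relative_to_camera(const GridPosition& p, const GridPosition& cam,
                        float& out_x, float& out_y) {
    out_x = (float)(p.cell_x - cam.cell_x) * kCellSize + (p.local_x - cam.local_x);
    out_y = (float)(p.cell_y - cam.cell_y) * kCellSize + (p.local_y - cam.local_y);
}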
No, they're not.
Let's say your position needs to increase by 10 cm for a certain frame since the game object moved.
Assuming a game world scaled in meters, this is 0.10. But if your float value is large enough it won't be able to represent a difference of 0.10 any more, and your attempt to increase the value will simply fail.
Do you need to store a value greater than about 16.7 million (2^24) with a fractional part? Then a float will be too small.
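You can see this directly with a few lines of C++:

#include <cstdio>

int main() {
    // From 2^24 (about 16.7 million) upward the gap to the next larger float
    // is 2.0, so a 10 cm step is rounded away entirely.
    float pos = 16777216.0f;   // 2^24
    float moved = pos + 0.1f;
    std::printf("%s\n", (moved == pos) ? "did not move" : "moved"); // "did not move"
}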
This series by Bruce Dawson may help.
If you really need to handle very large numbers, then consider using an arbitrary-precision arithmetic library. You will have to profile your code, because these libraries are slower than arithmetic on built-in types.
It is possible that you do not really need very large coordinate values. For example, you could wrap around the edges of your world, and use modulo arithmetic for handling positions.