I am currently working with GLSL 330 and came across some odd behavior of the mod() function.
I'm working under Windows 8 with a Radeon HD 6470M. I cannot recreate this behavior on my desktop PC, which runs Windows 7 with a GeForce GTX 260.
Here is my test code:
float testvalf = -126;
vec2 testval = vec2(-126, -126);
float modtest1 = mod(testvalf, 63.0); //returns 63
float modtest2 = mod(testval.x, 63.0); //returns 63
float modtest3 = mod(-126, 63.0); //returns 0
Edit:
Here are some more test results, done after IceCool's suggestion below.
int y = 63;
int inttestval = -126;
ivec2 intvectest = ivec2(-126, -126);
float floattestval = -125.9;
float modtest4 = mod(inttestval, 63); //returns 63
vec2 modtest5 = mod(intvectest, 63); //returns vec2(63.0, 63.0)
float modtest6 = mod(intvectest.x, 63); //returns 63
float modtest7 = mod(floor(floattestval), 63); //returns 63
float modtest8 = mod(inttestval, y); //returns 63
float modtest9 = mod(-126, y); //returns 63
I updated my drivers and tested again, same results. Once again, not reproducible on the desktop.
According to the GLSL docs on mod, the possible parameter combinations are (genType, float) and (genType, genType) (no double variants, since we're below 4.0). Also, the return type is forced to float, but that shouldn't matter for this problem.
I don't know if you did it intentionally, but -126 is an int, not a float, so the code might not be doing what you expect.
By the way about the modulo:
Notice that two different functions are called:
The first two lines call:
float mod(float, float);
The last line calls:
int mod(int, float);
If I'm right, mod is calculated as:
genType mod(genType x, float y) {
    return x - y * floor(x / y);
}
Now note that if x/y evaluates to exactly -2.0, mod returns 0, but if it evaluates to -2.00000001, floor yields -3 and 63.0 is returned. Such a difference between int/float and float/float division is not impossible.
So the reason might simply be that you are mixing ints and floats.
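As a quick illustration, here is a C++ sketch of the same formula (the GPU result can't be reproduced exactly on the CPU, and glsl_mod is just an illustrative name):
#include <cmath>
#include <cstdio>

// mod(x, y) as specified: x - y * floor(x / y)
float glsl_mod(float x, float y) { return x - y * std::floor(x / y); }

int main() {
    // -126 / 63 is exactly -2.0, so the result is 0:
    std::printf("%f\n", glsl_mod(-126.0f, 63.0f));
    // But if the division came out slightly below -2.0, floor()
    // would give -3 and the result would jump to 63:
    std::printf("%f\n", -126.0 - 63.0 * std::floor(-2.00000001));
    return 0;
}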
I think I have found the answer.
One thing I was wrong about: the genType keyword in the GLSL man pages doesn't mean a generic type, like a C++ template parameter.
genType is shorthand for float, vec2, vec3, and vec4 (see the link, Ctrl+F "genType").
By the way, the genType naming goes:
genType - floats
genDType - doubles
genIType - ints
genBType - bools
Which means that genType mod(genType, float) implies that there is no function like int mod(int, float).
All the code above has been calling float mod(float, float) (thankfully there is an implicit cast for function parameters, so mod(int, int) works too, but it is actually float mod(float, float) that gets called).
Just as a proof:
int x = mod(-126, 63);
Doesn't compile: error C7011: implicit cast from "float" to "int"
It fails only because the return type is float; written like this, it compiles:
float x = mod(-126, 63);
Therefore float mod(float, float) is called.
So we are back at the original problem:
float division is inaccurate
int to float cast is inaccurate
It shouldn't be a problem on most GPUs, as floats are considered equal if the difference between them is less than 10^-5 (this may vary with hardware, but it is the case for my GPU). So floor(-2.0000001) is -2. Highp floats are far more accurate than this.
Therefore either you are not using highp floats (precision highp float; should fix it then), or your GPU has a stricter limit for float equality, or some of the functions are returning a less accurate value.
If all else fails try:
#extension BlackMagic : enable
Maybe some driver setting is forcing default float precision to be mediump.
If this happens, all your defined variables will be mediump, however, numbers typed in the code will still remain highp.
Consider this code:
precision mediump float;
float x = 0.4121551, y = 0.4121552;
x == y; // true
0.4121551 == 0.4121552; // false, as highp they still differ.
So mod(-126, 63.0) could still be precise enough to return the correct value, since it works with highp floats; however, if you pass a variable (as in all the other cases), which will only be mediump, the function won't have enough precision to compute the correct value. Looking at your tests, this is exactly what's happening:
Every call that takes at least one variable is not precise enough.
The only call that takes two literal numbers returns the correct value.
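There is no mediump on the CPU, but the effect can be sketched in C++ by letting float stand in for mediump and double for highp (an analogy only, not actual GLSL behavior):
#include <cstdio>

int main() {
    // Two constants that differ in the 9th significant digit:
    double a = 1.00000001, b = 1.00000002;
    std::printf("as double: %s\n", a == b ? "equal" : "differ"); // differ

    // At lower precision the difference is rounded away, just like
    // two nearby highp literals collapsing in a mediump variable:
    float x = 1.00000001f, y = 1.00000002f;
    std::printf("as float:  %s\n", x == y ? "equal" : "differ"); // equal
    return 0;
}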
Related
If I am trying to multiply a float by a whole number, is it faster to multiply it by the whole number represented as an integer
int x;
...
float y = 0.5784f * x; //Where x contains a dynamically chosen whole number
or by another float (provided there is no loss in accuracy)
float x;
...
float y = 0.5784f * x; //Where x contains a dynamically chosen and floating point representable whole number
or does it vary greatly between hardware? Is there a common circuit (found in most floating-point units) that handles both float and integer multiplication, or is the general practice for the hardware to first convert the integer into a float and then use a circuit that performs float * float? What if the whole number is extremely small, such as a 0 or 1 determined dynamically and used to decide, without branching, whether the float is added to a sum?
int x;
...
float y = 0.5784f + 0.3412f * x; //Where x contains either 0 or 1 (determined dynamically).
Thanks for the help in advance.
Is it faster to multiply a float by an integer or another float
In general, float * float is faster, yet I suspect there will be little or no difference. The speed of a program is a result of the entire code, not just this line; faster here may cost more elsewhere.
Trust your compiler, or get a better one, to emit code that performs 0.5784f * some_int well.
In the 0.5784f * some_int case, the language obliges some_int to act as if it were converted to float first*1 before the multiplication. But a sharp compiler may know implementation-specific tricks to perform the multiplication better/faster directly, without a separate explicit conversion, as long as it gets an allowable result.
In the float y = 0.5784f + 0.3412f * x; case (where x contains either 0 or 1, determined dynamically), a compiler might see that too and take advantage of it to emit efficient code.
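For instance, a minimal sketch of both points (the values are illustrative, not from any particular compiler):
#include <cassert>

int main() {
    int x = 7; // dynamically chosen whole number
    // The language rule: x acts as if converted to float first,
    // so these two forms are required to agree:
    float a = 0.5784f * x;
    float b = 0.5784f * static_cast<float>(x);
    assert(a == b);

    int flag = 1; // 0 or 1, determined dynamically
    // The branchless pattern from the question: the term is included
    // when flag == 1 and contributes nothing when flag == 0.
    float y = 0.5784f + 0.3412f * flag;
    float expected = 0.5784f + 0.3412f;
    assert(y == expected);
    return 0;
}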
Only in select cases and with experience will you out-guess the compiler. Code for clarity first.
You could always profile different codes/compiler options and compare.
Tip: In my experience, I find more performance gains by taking a larger view of the code than the posted concern, which verges on micro-optimization.
*1 See FLT_EVAL_METHOD for other possibilities.
Problem: In a homework exercise (to be done on paper with a pen, so no coding) I must determine the type and value of an addition performed in C++.
1 + 0.5
What I've answered is:
Type float (because I thought that integer + float = float)
Value 1.5 (as far as I know, when two different data types are added, the result of the addition is converted to the data type that does not lose any information).
Solution says:
Type: double
Value: 1.5
My Question: Why is 0.5 a double and not a float? How can I distinguish between a float and a double? To me, 0.5 looks like it could be either.
First of all, yes. integer + float = float. You are correct about that part.
The issue is not with that, but rather with your assumption that 0.5 is a float. It is not. In C++, float literals are marked with an f suffix, meaning that 0.5f is a float. However, 0.5 is actually a double. That means that your expression is now:
integer + double = double
As you can see, this result is a double. That is why the correct answer to your question is that the resulting type is a double.
By the way, to set the record straight: technically what's going on here isn't integer + double = double. What happens is that the 1 undergoes implicit conversion: it is converted to a double, since the other operand is a double. That way, the computer adds two values of the same type, not different ones. So the actual addition taking place is really:
double + double = double
In C++, floating point literals without a type suffix are double by default. If you want it to be float, you need to specify the f suffix, like 0.5f.
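If you have a compiler handy, you can verify this with decltype (just a sanity check, not needed for the paper exercise):
#include <type_traits>

// 0.5 is a double literal, so int + double yields double:
static_assert(std::is_same<decltype(1 + 0.5), double>::value, "1 + 0.5 is a double");
// With the f suffix the literal is a float, and int + float yields float:
static_assert(std::is_same<decltype(1 + 0.5f), float>::value, "1 + 0.5f is a float");

int main() { return 0; }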
I used to replace const with #define, but in the example below it prints false.
#include <iostream>
#define x 3e+38
using namespace std;

int main() {
    float p = x;
    if (p == x)
        cout << "true" << endl;
    else
        cout << "false" << endl;
    return 0;
}
But if I replace
#define x 3e+38
with
const float x = 3e+38;
it works perfectly. The question is: why? (I know there are several discussions of #define vs. const, but I really didn't get this one; kindly enlighten me.)
In C++, floating-point literals without a suffix have double precision. In the first example, the number 3e+38 is first converted to float in the variable initialization and then back to double in the comparison. The conversions are not necessarily exact, so the numbers may differ. In the second example, the numbers stay float all the time. To fix it you can change p to double, write
#define x 3e+38f
(which defines a float literal), or change the comparison to
if (p == static_cast<float>(x))
which performs the same conversion as the variable initialization and then does the comparison in single precision.
Also, as noted in the comments, comparing floating-point numbers with == is usually not a good idea, as rounding errors yield unexpected results: for example, (x*y)*z might differ from x*(y*z).
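Putting it together, a minimal sketch of the fixed program (only the literal has changed from the code in the question):
#include <iostream>
#define x 3e+38f // float literal: no double-to-float round trip

int main() {
    float p = x;
    if (p == x) // float compared with float
        std::cout << "true" << std::endl; // now prints true
    else
        std::cout << "false" << std::endl;
    return 0;
}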
The literal 3e+38 is a double because it has no f suffix.
The assignment
float p = x;
causes 3e+38 to lose precision, and hence its exact value, when stored in p.
That's why the comparison
if (p == x)
results in false: p holds a (slightly) different value than the double 3e+38.
I am writing a protocol that uses RFC 7049 as its binary representation. The standard states that the protocol may use a 32-bit floating-point representation of a number if its numeric value is equivalent to the respective 64-bit number. The conversion must not lead to loss of precision.
Which 32-bit float numbers can be bigger than 64-bit integers and still be numerically equivalent to them?
Given float x; uint64_t y;, is comparing (float)x == (float)y enough to ensure that the values are equivalent? Will this comparison ever be true?
RFC 7049 §3.6. Numbers
For the purposes of this specification, all number representations
for the same numeric value are equivalent. This means that an
encoder can encode a floating-point value of 0.0 as the integer 0.
It, however, also means that an application that expects to find
integer values only might find floating-point values if the encoder
decides these are desirable, such as when the floating-point value is
more compact than a 64-bit integer.
There certainly are numbers for which this is true:
2^33 can be perfectly represented as a floating point number, but clearly cannot be represented as a 32-bit integer. The following code should work as expected:
bool representable_as_float(int64_t value) {
    float repr = value;
    return repr >= -0x1.0p63 && repr < 0x1.0p63 && (int64_t)repr == value;
}
It is important to notice, though, that we are basically doing (int64_t)(float)value and not the other way around: we are interested in whether the cast to float loses any precision.
The check that repr is smaller than the maximum value of int64_t is important, since we could otherwise invoke undefined behavior: the cast to float may round up to the next higher number, which could then be larger than the maximum value representable in int64_t. (Thanks to @tmyklebu for pointing this out.)
Two samples:
// powers of 2 can easily be represented
assert(representable_as_float(((int64_t)1) << 33));
// Other numbers not so much:
assert(!representable_as_float(std::numeric_limits<int64_t>::max()));
The following is based on Julia's method for comparing floats and integers. This does not require access to 80-bit long doubles or floating point exceptions, and should work under any rounding mode. I believe this should work for any C float type (IEEE754 or not), and not cause any undefined behaviour.
UPDATE: technically this assumes a binary float format and that the float exponent range is large enough to represent 2^64: this is certainly true for the standard IEEE754 binary32 (which you refer to in your question), but not, say, binary16.
#include <stdio.h>
#include <stdint.h>

int cmp_flt_uint64(float x, uint64_t y) {
    return (x == (float)y) && (x != 0x1p64f) && ((uint64_t)x == y);
}

int main() {
    float x = 0x1p64f;
    uint64_t y = 0xffffffffffffffff;
    if (cmp_flt_uint64(x, y))
        printf("true\n");
    else
        printf("false\n");
    return 0;
}
The logic here is as follows:
The first equality can be true only if x is a non-negative integer in the interval [0, 2^64].
The second checks that x (and hence (float)y) is not 2^64: if this is the case, then y cannot be represented exactly by a float, and so the comparison is false.
Any remaining values of x can be exactly converted to a uint64_t, and so we cast and compare.
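A couple of hypothetical spot checks (the function body is copied from above):
#include <assert.h>
#include <stdint.h>

int cmp_flt_uint64(float x, uint64_t y) {
    return (x == (float)y) && (x != 0x1p64f) && ((uint64_t)x == y);
}

int main() {
    uint64_t p63 = (uint64_t)1 << 63;
    // 2^63 is a power of two, so it is exactly representable as a float:
    assert(cmp_flt_uint64(0x1p63f, p63));
    // 2^63 + 1 is not: (float)y rounds back to 2^63,
    // so the final cast-back check catches the mismatch:
    assert(!cmp_flt_uint64(0x1p63f, p63 + 1));
    return 0;
}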
No, you need to compare (long double)x == (long double)y on an architecture where the mantissa of a long double can hold 63 bits. This is because some big long long ints will lose precision when you convert them to float, and compare as equal to a non-equivalent float, but if you convert to long double, it will not lose precision on that architecture.
The following program demonstrates this behavior when compiled with gcc -std=c99 -mssse3 -mfpmath=sse on x86, because these settings use wide-enough long doubles but prevent the implicit use of higher-precision types in calculations:
#include <assert.h>
#include <stdint.h>

// The float mantissa is not wide enough to store 63 bits of precision.
const int64_t x = (1ULL << 62) - 1ULL;
const float y = (float)(1ULL << 62);

int main(void)
{
    assert((float)x == (float)y);
    assert((long double)x != (long double)y);
    return 0;
}
Edit: If you don’t have wide enough long doubles, the following might work:
feclearexcept(FE_ALL_EXCEPT);     /* requires <fenv.h> */
(void)(x == y);                   /* the int-to-float conversion may raise FE_INEXACT */
if (fetestexcept(FE_INEXACT))
    ;                             /* inexact: the values cannot be equivalent */
I think, although I could be mistaken, that an implementation could round off x during the conversion in a way that loses precision.
Another strategy that could work is to compare
extern uint64_t x;
extern float y;
const float z = (float)x;
y == z && (uint64_t)z == x;
This should catch losses of precision due to round-off error, but it could conceivably cause undefined behavior if the conversion to z rounds up. It will work if the conversion is set to round toward zero when converting x to z.
I wrote code to calculate the error function, double erf(double x). It uses lots of constants in its calculations, which are doubles as well. However, the requirement is to write the code in float format, float erf(float). I have to maintain six decimal digits of accuracy (typical for float).
When I converted erf(x) into float erf(double x), the results were still the same and accurate. However, when I also convert x to float, i.e. float erf(float x), I get significant errors for small values of x.
Is there a way to convert x from float to double so that precision is still maintained within the code of erf(x)? My intuition tells me that my erf code is good only for double values.
You can't convert from float to double and expect the float to have the same precision as a double.
With double you get roughly twice the precision of a float (a 53-bit significand versus 24 bits).
Note that in C++ you have erf: http://en.cppreference.com/w/cpp/numeric/math/erf
Inside float erf(float x) you can cast the value of x to double at points where precision exceeding float is required.
float demoA(float x)
{
    return x*x*x - 1;
}

float demoB(float x)
{
    return static_cast<double>(x)*x*x - 1;
}
In this case, demoB will return a much better value than demoA if the parameter is close to one. Converting the first operand of the multiplication to double is enough, because it causes promotion of the other operands.
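For example (a hypothetical input, chosen so that x*x*x - 1 suffers cancellation near one):
#include <cstdio>

float demoA(float x) { return x*x*x - 1; }
float demoB(float x) { return static_cast<double>(x)*x*x - 1; }

int main() {
    float x = 1.0001f;
    // demoA rounds to float after every multiply, and the trailing "- 1"
    // cancels the leading digits, exposing those rounding errors.
    // demoB keeps double precision until the single final rounding.
    std::printf("demoA: %.9g\n", demoA(x));
    std::printf("demoB: %.9g\n", demoB(x));
    return 0;
}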