C/C++: 1.00000 <= 1.0f = False
Can someone explain why 1.000000 <= 1.0f is false?
The code:
#include <iostream>
#include <stdio.h>

using namespace std;

int main(int argc, char **argv)
{
    float step = 1.0f / 10;
    float t;
    for (t = 0; t <= 1.0f; t += step)
    {
        printf("t = %f\n", t);
        cout << "t = " << t << "\n";
        cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
    }
    printf("t = %f\n", t);
    cout << "t = " << t << "\n";
    cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
    cout << "\n(1.000000 <= 1.0f) = " << (1.000000 <= 1.0f) << "\n";
}
The result:
t = 0.000000
t = 0
(t <= 1.0f) = 1
t = 0.100000
t = 0.1
(t <= 1.0f) = 1
t = 0.200000
t = 0.2
(t <= 1.0f) = 1
t = 0.300000
t = 0.3
(t <= 1.0f) = 1
t = 0.400000
t = 0.4
(t <= 1.0f) = 1
t = 0.500000
t = 0.5
(t <= 1.0f) = 1
t = 0.600000
t = 0.6
(t <= 1.0f) = 1
t = 0.700000
t = 0.7
(t <= 1.0f) = 1
t = 0.800000
t = 0.8
(t <= 1.0f) = 1
t = 0.900000
t = 0.9
(t <= 1.0f) = 1
t = 1.000000
t = 1
(t <= 1.0f) = 0
(1.000000 <= 1.0f) = 1
As correctly pointed out in the comments, the value of t is not actually the 1.000000 that printf displays; the output is simply rounded to six decimal places.
Printing t with higher precision with std::setprecision(20) will reveal its actual value: 1.0000001192092895508.
The common way to avoid these kinds of issues is to compare not against 1, but against 1 + epsilon, where epsilon is a small number, perhaps one or two orders of magnitude greater than your floating point precision.
So you would write your for loop condition as
for(t = 0; t <= 1.000001f; t += step)
Note that in your case, epsilon should be at least ten times greater than the maximum possible floating point error of a single addition, as the step is added ten times.
As pointed out by Muepe and Alain, the reason for t != 1.0f is that 1/10 cannot be precisely represented in binary floating point numbers.
Floating point types in C++ (and most other languages) split the available bytes (for example 4 or 8) into the following 3 components:
Sign
Exponent
Mantissa
Let's have a look at a 32 bit (4 byte) type, which is often what you get in C++ for float.
The sign is a single bit being 1 or 0, where 0 means positive and 1 means negative. Leaving aside the existing standard, you could just as well define 0 -> negative and 1 -> positive; it is pure convention.
The exponent uses 8 bits. Unlike in everyday notation, this exponent is to base 2, not base 10. That means an exponent of 1 corresponds not to 10 but to 2, and an exponent of 2 means 4 (= 2^2), not 100 (= 10^2).
Another important point is that for floating point variables we also want negative exponents, like 2^-1 being 0.5, 2^-2 being 0.25, and so on. Thus we define a bias value that gets subtracted from the stored exponent to yield the real one; with 8 bits, the IEEE 754 bias is 127. There is an exception, though: the two extreme exponent values are reserved. An all-zero exponent field marks zero and subnormal numbers, and an all-ones field (255) marks infinity and NaN. The usable stored exponents therefore run from 1 to 254, giving real exponents from 2^-126 to 2^127.
The mantissa fills the remaining 23 bits. If we see the mantissa as a series of 0s and 1s, you can imagine its value to be 1.m, where m is that series of bits, in powers of 2 rather than powers of 10. So binary 1.1 would be 1 * 2^0 + 1 * 2^-1 = 1 + 0.5 = 1.5. As an example, let's have a look at the following (very short) mantissa:
m = 100101 -> 1.100101 in base 2 -> 1 * 2^0 + 1 * 2^-1 + 0 * 2^-2 + 0 * 2^-3 + 1 * 2^-4 + 0 * 2^-5 + 1 * 2^-6 = 1 + 0.5 + 1/16 + 1/64 = 1.578125
The value of the float is then calculated as:

(sign ? -1 : 1) * 1.m * 2^(e - bias)
What exactly is going wrong in your loop: your step is 0.1! And 0.1 is a very awkward number for base-2 floating point. Let's have a look at why:
sign -> 0 (as it's non-negative)
exponent -> The largest power of two not exceeding 0.1 is 2^-4 = 0.0625, so the real exponent is -4 and the stored exponent is -4 + 127 = 123.
mantissa -> For this we check how many times 2^-4 fits into 0.1 and then convert the remaining fraction to mantissa bits: 0.1 / 2^-4 = 0.1 / 0.0625 = 1.6. Since the mantissa encodes 1.m, m has to represent 0.6. So let's convert that to binary:
0.6 = 1 * 2^-1 + 0.1 -> m = 1
0.1 = 0 * 2^-2 + 0.1 -> m = 10
0.1 = 0 * 2^-3 + 0.1 -> m = 100
0.1 = 1 * 2^-4 + 0.0375 -> m = 1001
0.0375 = 1 * 2^-5 + 0.00625 -> m = 10011
0.00625 = 0 * 2^-6 + 0.00625 -> m = 100110
0.00625 = 0 * 2^-7 + 0.00625 -> m = 1001100
0.00625 = 1 * 2^-8 + 0.00234375 -> m = 10011001
We could continue like this until we have our 23 mantissa bits, but I can tell you that you get:
m = 10011001100110011001...
Therefore 0.1 in a binary floating point environment is what 1/3 is in a base 10 system: a periodic, infinite expansion. As the space in a float is limited, the mantissa simply has to be cut off after 23 bits, and since the discarded tail begins with a 1, the last kept bit gets rounded up. Not all the infinitely many digits fit into the float, and the stored value ends up a tiny bit greater than 0.1.
The reason is that 1.0/10.0 = 0.1 cannot be represented exactly in binary, just as 1.0/3.0 = 0.333... cannot be represented exactly in decimal.
If we use
float step = 1.0f / 8;
for example, the result is as expected.
To avoid such problems, use a small offset as shown in the answer of mic_e.
Related
Compute modulo between floating point numbers in C++
I have the following code to compute modulo between two floating point numbers:

auto mod(float x, float denom)
{
    return x >= 0 ? std::fmod(x, denom)
                  : denom + std::fmod(x + 1.0f, denom) - 1.0f;
}

It only works partially for negative x. x is in the left column, and the output is in the right column, produced like so:

for (size_t k = 0; k != 65; ++k)
{
    auto x = 0.25f * (static_cast<float>(k) - 32);
    printf("%.8g %.8g\n", x, mod(x, 4));
}

-8 0
-7.75 0.25
-7.5 0.5
-7.25 0.75
-7 1
-6.75 1.25
-6.5 1.5
-6.25 1.75
-6 2
-5.75 2.25
-5.5 2.5
-5.25 2.75
-5 3
-4.75 -0.75 <== should be 3.25
-4.5 -0.5 <== should be 3.5
-4.25 -0.25 <== should be 3.75
-4 0
-3.75 0.25
-3.5 0.5
-3.25 0.75
-3 1
-2.75 1.25
-2.5 1.5
-2.25 1.75
-2 2
-1.75 2.25
-1.5 2.5
-1.25 2.75
-1 3
-0.75 3.25
-0.5 3.5
-0.25 3.75
0 0

How do I fix it for negative x? denom is assumed to be an integer greater than 0. Note: fmod as provided by the standard library is broken for x < 0.0f.
Note: fmod as provided by the standard library is broken for x < 0.0f

I guess you want the result to always be a positive value [1]:

In mathematics, the result of the modulo operation is an equivalence class, and any member of the class may be chosen as representative; however, the usual representative is the least positive residue, the smallest non-negative integer that belongs to that class (i.e., the remainder of the Euclidean division).

The usual workaround was shown in Igor Tadetnik's comment, but that seems not enough:

@IgorTandetnik That worked. Pesky signed zero though, but I guess you cannot do anything about that.

Well, consider this [2][3]:

auto mod(double x, double denom)
{
    auto const r{ std::fmod(x, denom) };
    return std::copysign(r < 0 ? r + denom : r, 1);
}

1) https://en.wikipedia.org/wiki/Modulo
2) https://en.cppreference.com/w/cpp/numeric/math/copysign
3) https://godbolt.org/z/fdr9cbsYT
How do I reinterpret the exponents (2^31, ..., 2^0) of an integer?
I have a regular 32-bit integer where the bits represent b31*2^31, b30*2^30, ..., b0*2^0 respectively. Is there a way to decrease each exponent by 32, i.e. calculate b31*2^-1 + b30*2^-2 + ... + b0*2^-32 (similar to an IEEE float mantissa), without having to extract the bits (e.g. using shifts or mod)?

E.g. for an 8-bit integer, if the input is 0010 1111 = 47, then the output is

0 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 + 0 * 2^-4 + 1 * 2^-5 + 1 * 2^-6 + 1 * 2^-7 + 1 * 2^-8 = 0.18359375.
Why does this expression resolve to zero?
int a = 1/2 == 0.25 * 2;

I'm not sure why I'm not seeing this. Am I missing something with precedence?
Let's dig in:

int a = 1/2 == 0.25 * 2;

First, 1/2 == 0 (integer division, type int), and 0.25 * 2 == 0.5 (type double). So does 0 equal 0.5? No. So a receives the value 0 (false).
Get rounded numbers between 0 and 1
I have a UISlider that produces numbers between 0 and 1, e.g. 0.0590829, 0.0643739, ... I want to get the rounded number between them, like: 0.1 0.2 0.3 ... 1.0. I found this (in C):

float x = arc4random() % 11 * 0.1;

but it's not working in Swift:

var x = arc4random() % 11 * 0.1;
// error: binary operator '*' cannot be applied to operands of type 'UInt32' and 'Double'

Thanks
Multiply by 10 to get values between 0.0 and 10.0, round to remove the decimal, then divide by 10. Example:

let values = [0, 0.0643739, 0.590829, 0.72273, 1]
for value in values {
    print("\(value) -> \(round(value * 10) / 10)")
}
// 0.0 -> 0.0
// 0.0643739 -> 0.1
// 0.590829 -> 0.6
// 0.72273 -> 0.7
// 1.0 -> 1.0
In binary notation, what is the meaning of the digits after the radix point "."?
I have this example on how to convert from a base 10 number to IEEE 754 float representation:

Number: 45.25 (base 10) = 101101.01 (base 2)
Sign: 0
Normalized form N = 1.0110101 * 2^5
Exponent exp = 5, E = 5 + 127 = 132 (base 10) = 10000100 (base 2)
IEEE 754: 0 10000100 01101010000000000000000

This makes sense to me except one passage: 45.25 (base 10) = 101101.01 (base 2). 45 is 101101 in binary and that's okay... but how did they obtain the 0.25 as .01?
Simple place value. In base 10, you have these places:

... 10^3 10^2 10^1 10^0 . 10^-1 10^-2 10^-3 ...
... thousands, hundreds, tens, ones . tenths, hundredths, thousandths ...

Similarly, in binary (base 2) you have:

... 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 ...
... eights, fours, twos, ones . halves, quarters, eighths ...

So the second place after the "." in binary is units of 2^-2, well known to you as units of 1/4 (or alternately, 0.25).
You can convert the part after the decimal point to another base by repeatedly multiplying by the new base (in this case the new base is 2), like this:

0.25 * 2 = 0.5 -> The first binary digit is 0 (take the integral part, i.e. the part before the decimal point).

Continue multiplying with the part after the decimal point:

0.5 * 2 = 1.0 -> The second binary digit is 1 (again, take the integral part).

This is also where we stop, because the part after the decimal point is now zero, so there is nothing more to multiply. Therefore the final binary representation of the fractional part is 0.01 (base 2).

Edit: It might also be worth noting that quite often the binary representation is infinite even when starting with a finite fractional part in base 10. Example: converting 0.2 (base 10) to binary:

0.2 * 2 = 0.4 -> 0
0.4 * 2 = 0.8 -> 0
0.8 * 2 = 1.6 -> 1
0.6 * 2 = 1.2 -> 1
0.2 * 2 = ...

So we end up with 0.001100110011... (base 2). Using this method you see quite easily whether the binary representation ends up being infinite.
"Decimals" (fractional digits) in other bases are surprisingly unintuitive considering they work in exactly the same way as integers.

base 10
scinot   10^2   10^1   10^0 . 10^-1  10^-2  10^-3
weight   100.0  10.0   1.0  . 0.1    0.01   0.001
value    0      4      5    . 2      5      0

base 2
scinot   2^6  2^5  2^4  2^3  2^2  2^1  2^0 . 2^-1  2^-2  2^-3
weight   64   32   16   8    4    2    1   . 0.5   0.25  0.125
value    0    1    0    1    1    0    1   . 0     1     0

If we start with 45.25, that's greater than or equal to 32, so we add a binary 1 and subtract 32.
We're left with 13.25, which is smaller than 16, so we add a binary 0.
We're left with 13.25, which is greater than or equal to 8, so we add a binary 1 and subtract 8.
We're left with 05.25, which is greater than or equal to 4, so we add a binary 1 and subtract 4.
We're left with 01.25, which is smaller than 2, so we add a binary 0.
We're left with 01.25, which is greater than or equal to 1, so we add a binary 1 and subtract 1.
With integers, we'd have zero left, so we stop. But:
We're left with 00.25, which is smaller than 0.5, so we add a binary 0.
We're left with 00.25, which is greater than or equal to 0.25, so we add a binary 1 and subtract 0.25.
Now we have zero, so we stop (or not: you can keep going and calculating zeros forever if you want).

Note that not all "easy" numbers in decimal reach that zero stopping point. 0.1 (decimal) converted into base 2 is infinitely repeating: 0.0001100110011001100110011...

However, all "easy" numbers in binary will always convert nicely into base 10.

You can also do this same process with fractional (2.5), irrational (pi), or even imaginary (2i) bases, except the base cannot be between -1 and 1 inclusive.
2.000 (base 10) = 2^1  = 10.000 (base 2)
1.000 (base 10) = 2^0  = 01.000 (base 2)
0.500 (base 10) = 2^-1 = 00.100 (base 2)
0.250 (base 10) = 2^-2 = 00.010 (base 2)
0.125 (base 10) = 2^-3 = 00.001 (base 2)
The fractional places in base 2 are .1 = 1/2, .01 = 1/4, ...
Think of it this way: after the dot come the places 2^-1, 2^-2, 2^-3, etc., so .01 means

0/2 + 1/4 + 0/8 + 0/16 etc.

See http://floating-point-gui.de/formats/binary/
You can think of 0.25 as 1/4. Dividing by 2 in base 2 moves the point one step left, the same way dividing by 10 in base 10 moves the point one step left. Generally, dividing by M in base M moves the point one step left. So:

base 10                   base 2
--------------------------------------
1                      => 1
1/2 = 0.5              => 0.1
0.5/2 = 1/4 = 0.25     => 0.01
0.25/2 = 1/8 = 0.125   => 0.001
.
.
.
etc.