x64 rounding inconsistency in cvRound() (_mm_cvtsd_si32) - c++

On x64 Windows using MSVC2013, I am using OpenCV's cvRound function with the intention of rounding x.5 values up. I've come across an inconsistency where cvRound(17.5f) returns 18 (good!), but cvRound(20.5f) returns 20 and not 21 as expected.
cvRound is simply implemented as follows, so it seems to be a Microsoft inconsistency in _mm_cvtsd_si32():
int cvRound( double value )
{
__m128d t = _mm_set_sd( value );
return _mm_cvtsd_si32(t);
}
Can anyone suggest how/why this could be?
FWIW, cvRound(20.5f + 1e-3f) returns 21.

Small half-integers can be exactly represented by binary floating point -- 0.5 is a power of 2.
What is really going on is "rounding half to even." This is a way to remove a bias which occurs when half-integers are always rounded up.
http://en.wikipedia.org/wiki/Rounding#Round_half_to_even

The rounding behavior of the SSE instructions is configurable via the floating point environment (specifically, the MXCSR register). There are several IEEE rounding modes. The default rounding mode is round-to-nearest, ties-to-even, so if the value is exactly in the middle of two representable values, the result is rounded to the nearest even value.
Consider the following test program that demonstrates the different rounding modes in action:
#include <fenv.h>
#include <immintrin.h>
#include <stdio.h>
int main()
{
printf("Default: %d\n", _mm_cvtsd_si32(_mm_set_sd(20.5)));
fesetround(FE_DOWNWARD);
printf("FE_DOWNWARD: %d\n", _mm_cvtsd_si32(_mm_set_sd(20.5)));
fesetround(FE_UPWARD);
printf("FE_UPWARD: %d\n", _mm_cvtsd_si32(_mm_set_sd(20.5)));
fesetround(FE_TONEAREST);
printf("FE_TONEAREST: %d\n", _mm_cvtsd_si32(_mm_set_sd(20.5)));
fesetround(FE_TOWARDZERO);
printf("FE_TOWARDZERO: %d\n", _mm_cvtsd_si32(_mm_set_sd(20.5)));
}
Output:
Default: 20
FE_DOWNWARD: 20
FE_UPWARD: 21
FE_TONEAREST: 20
FE_TOWARDZERO: 20

The rounding works like that for the same reason that this code prints "equal" (tested with MSVC2012): 20.4999999 has no exact float representation, and the float nearest to it is exactly 20.5f.
float f1 = 20.4999999f;
float f2 = 20.5f;
if(f1==f2)
printf("equal\n");
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Related

How to round a floating point type to two decimals or more in C++? [duplicate]

How can I round a float value (such as 37.777779) to two decimal places (37.78) in C?
If you just want to round the number for output purposes, then the "%.2f" format string is indeed the correct answer. However, if you actually want to round the floating point value for further computation, something like the following works:
#include <math.h>
float val = 37.777779;
float rounded_down = floorf(val * 100) / 100; /* Result: 37.77 */
float nearest = roundf(val * 100) / 100; /* Result: 37.78 */
float rounded_up = ceilf(val * 100) / 100; /* Result: 37.78 */
Notice that there are three different rounding rules you might want to choose: round down (i.e., truncate after two decimal places), round to nearest, and round up. Usually, you want round to nearest.
As several others have pointed out, due to the quirks of floating point representation, these rounded values may not be exactly the "obvious" decimal values, but they will be very very close.
For much (much!) more information on rounding, and especially on tie-breaking rules for rounding to nearest, see the Wikipedia article on Rounding.
Use %.2f in printf. It prints only 2 decimal places.
Example:
printf("%.2f", 37.777779);
Output:
37.78
Assuming you're talking about rounding the value for printing, then Andrew Coleson's and AraK's answers are correct:
printf("%.2f", 37.777779);
But note that if you're aiming to round the number to exactly 37.78 for internal use (e.g. to compare against another value), then this isn't a good idea, due to the way floating point numbers work: you usually don't want to do equality comparisons for floating point; instead, use a target value +/- a sigma value. Or encode the number as a string with a known precision, and compare that.
See the link in Greg Hewgill's answer to a related question, which also covers why you shouldn't use floating point for financial calculations.
How about this (note that the + .5 trick only rounds correctly for non-negative values):
float value = 37.777779f;
float rounded = (int)(value * 100 + .5) / 100.0;
printf("%.2f", rounded);
If you want to write to a C string:
char number[24]; // make sure the buffer is large enough for your values
snprintf(number, sizeof number, "%.2f", 37.777779);
Always use the printf family of functions for this. Even if you want to get the value as a float, you're best off using snprintf to get the rounded value as a string and then parsing it back with atof:
#include <math.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
double dround(double val, int dp) {
int charsNeeded = 1 + snprintf(NULL, 0, "%.*f", dp, val);
char *buffer = malloc(charsNeeded);
snprintf(buffer, charsNeeded, "%.*f", dp, val);
double result = atof(buffer);
free(buffer);
return result;
}
I say this because the approach shown by the currently top-voted answer and several others here - multiplying by 100, rounding to the nearest integer, and then dividing by 100 again - is flawed in two ways:
For some values, it will round in the wrong direction because the multiplication by 100 changes the decimal digit determining the rounding direction from a 4 to a 5 or vice versa, due to the imprecision of floating point numbers
For some values, multiplying and then dividing by 100 doesn't round-trip, meaning that even if no rounding takes place the end result will be wrong
To illustrate the first kind of error - the rounding direction sometimes being wrong - try running this program:
int main(void) {
// This number is EXACTLY representable as a double
double x = 0.01499999999999999944488848768742172978818416595458984375;
printf("x: %.50f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.50f\n", res1);
printf("Rounded with round, then divided: %.50f\n", res2);
}
You'll see this output:
x: 0.01499999999999999944488848768742172978818416595459
Rounded with snprintf: 0.01000000000000000020816681711721685132943093776703
Rounded with round, then divided: 0.02000000000000000041633363423443370265886187553406
Note that the value we started with was less than 0.015, and so the mathematically correct answer when rounding it to 2 decimal places is 0.01. Of course, 0.01 is not exactly representable as a double, but we expect our result to be the double nearest to 0.01. Using snprintf gives us that result, but using round(100 * x) / 100 gives us 0.02, which is wrong. Why? Because 100 * x gives us exactly 1.5 as the result. Multiplying by 100 thus changes the correct direction to round in.
To illustrate the second kind of error - the result sometimes being wrong due to * 100 and / 100 not truly being inverses of each other - we can do a similar exercise with a very big number:
int main(void) {
double x = 8631192423766613.0;
printf("x: %.1f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.1f\n", res1);
printf("Rounded with round, then divided: %.1f\n", res2);
}
Our number now doesn't even have a fractional part; it's an integer value, just stored with type double. So the result after rounding it should be the same number we started with, right?
If you run the program above, you'll see:
x: 8631192423766613.0
Rounded with snprintf: 8631192423766613.0
Rounded with round, then divided: 8631192423766612.0
Oops. Our snprintf method returns the right result again, but the multiply-then-round-then-divide approach fails. That's because the mathematically correct value of 8631192423766613.0 * 100, 863119242376661300.0, is not exactly representable as a double; the closest value is 863119242376661248.0. When you divide that back by 100, you get 8631192423766612.0 - a different number to the one you started with.
Hopefully that's a sufficient demonstration that using roundf for rounding to a number of decimal places is broken, and that you should use snprintf instead. If that feels like a horrible hack to you, perhaps you'll be reassured by the knowledge that it's basically what CPython does.
Also, if you're using C++, you can just create a function like this:
#include <sstream>
#include <string>
std::string prd(const double x, const int decDigits) {
std::stringstream ss;
ss << std::fixed;
ss.precision(decDigits); // set # places after decimal
ss << x;
return ss.str();
}
You can then output any double myDouble with n places after the decimal point with code such as this:
std::cout << prd(myDouble,n);
There isn't a way to round a float to another float that is exactly the rounded decimal value, because that value may not be representable (a limitation of floating-point numbers). For instance, say you round 37.777779 to 37.78: the nearest representable float is 37.779998779296875.
However, you can "round" a float by using a format string function.
You can still use:
float ceilf(float x); // don't forget #include <math.h> and link with -lm.
example:
float valueToRound = 37.777779;
float roundedValue = ceilf(valueToRound * 100) / 100;
In C++ (or in C with C-style casts), you could create the function:
/* Function to control # of decimal places to be output for x */
double showDecimals(const double& x, const int& numDecimals) {
int y=x;
double z=x-y;
double m=pow(10,numDecimals);
double q=z*m;
double r=round(q);
return static_cast<double>(y)+(1.0/m)*r;
}
Then std::cout << showDecimals(37.777779,2); would produce: 37.78.
Obviously you don't really need to create all 5 variables in that function, but I leave them there so you can see the logic. There are probably simpler solutions, but this works well for me--especially since it allows me to adjust the number of digits after the decimal place as I need.
Use float roundf(float x).
"The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.5
#include <math.h>
float y = roundf(x * 100.0f) / 100.0f;
Depending on your float implementation, numbers that may appear to be half-way are not, as floating-point is typically base-2 oriented. Further, precisely rounding to the nearest 0.01 on all "half-way" cases is most challenging.
#include <math.h>
#include <stdio.h>
void r100(const char *s) {
float x, y;
sscanf(s, "%f", &x);
y = round(x*100.0)/100.0;
printf("%6s %.12e %.12e\n", s, x, y);
}
int main(void) {
r100("1.115");
r100("1.125");
r100("1.135");
return 0;
}
1.115 1.115000009537e+00 1.120000004768e+00
1.125 1.125000000000e+00 1.129999995232e+00
1.135 1.134999990463e+00 1.129999995232e+00
Although "1.115" is "half-way" between 1.11 and 1.12, when converted to float, the value is 1.115000009537... and is no longer "half-way", but closer to 1.12 and rounds to the closest float of 1.120000004768...
"1.125" is "half-way" between 1.12 and 1.13, and when converted to float, the value is exactly 1.125, so x*100.0 is exactly 112.5. It rounds toward 1.13 because round() rounds halfway cases away from zero, giving the closest float of 1.129999995232...
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13, and rounds to the closest float of 1.129999995232...
If code used
y = roundf(x*100.0f)/100.0f;
then, although "1.135" is "half-way" between 1.13 and 1.14, when converted to float the value is 1.134999990463..., which is closer to 1.13. However, because of the more limited precision of float vs. double, x*100.0f rounds to exactly 113.5f, which roundf() rounds up, so the result is incorrectly the float 1.139999985695... This incorrect value may be viewed as correct, depending on coding goals.
Code definition:
#define roundz(x,d) ((floor(((x)*pow(10,d))+.5))/pow(10,d))
Results:
a = 8.000000
sqrt(a) = r = 2.828427
roundz(r,2) = 2.830000
roundz(r,3) = 2.828000
roundz(r,5) = 2.828430
#include <stdio.h>
#include <stdlib.h>
double f_round(double dval, int n)
{
char l_buf[64];
/* print dval with n digits after the decimal point; printf does the rounding */
snprintf(l_buf, sizeof l_buf, "%.*f", n, dval);
/* parse the rounded text back into a double */
return strtod(l_buf, NULL);
}
Here n is the number of decimals
example:
double d = 100.23456;
printf("%f", f_round(d, 4)); // prints 100.234600
printf("%f", f_round(d, 2)); // prints 100.230000
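I made this macro for rounding float numbers. Note that it truncates toward zero rather than rounding to nearest, and the intermediate int can overflow for large values.
Add it in your header or at the beginning of your file:
#define ROUNDF(f, c) (((float)((int)((f) * (c))) / (c)))
Here is an example:
float x = ROUNDF(3.141592, 100);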
x equals 3.14 :)
Let me first attempt to justify my reason for adding yet another answer to this question. In an ideal world, rounding is not really a big deal. However, in real systems, you may need to contend with several issues that can result in rounding that may not be what you expect. For example, you may be performing financial calculations where final results are rounded and displayed to users as 2 decimal places; these same values are stored with fixed precision in a database that may include more than 2 decimal places (for various reasons; there is no optimal number of places to keep...depends on specific situations each system must support, e.g. tiny items whose prices are fractions of a penny per unit); and, floating point computations performed on values where the results are plus/minus epsilon. I have been confronting these issues and evolving my own strategy over the years. I won't claim that I have faced every scenario or have the best answer, but below is an example of my approach so far that overcomes these issues:
Suppose 6 decimal places is regarded as sufficient precision for calculations on floats/doubles (an arbitrary decision for the specific application), using the following rounding function/method:
double Round(double x, int p)
{
if (x != 0.0) {
return ((floor((fabs(x)*pow(double(10.0),p))+0.5))/pow(double(10.0),p))*(x/fabs(x));
} else {
return 0.0;
}
}
Rounding to 2 decimal places for presentation of a result can be performed as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,8),6),2));
For val = 6.825, result is 6.83 as expected.
For val = 6.824999, result is 6.82. Here the assumption is that the calculation resulted in exactly 6.824999 and the 7th decimal place is zero.
For val = 6.8249999, result is 6.83. The 7th decimal place being 9 in this case causes the Round(val,6) function to give the expected result. For this case, there could be any number of trailing 9s.
For val = 6.824999499999, result is 6.83. Rounding to the 8th decimal place as a first step, i.e. Round(val,8), takes care of the one nasty case whereby a calculated floating point result calculates to 6.8249995, but is internally represented as 6.824999499999....
Finally, the example from the question...val = 37.777779 results in 37.78.
This approach could be further generalized as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,N+2),N),2));
where N is precision to be maintained for all intermediate calculations on floats/doubles. This works on negative values as well. I do not know if this approach is mathematically correct for all possibilities.
...or you can do it the old-fashioned way without any libraries:
float a = 37.777779;
int b = a; // b = 37
float c = a - b; // c = 0.777779
c *= 100; // c = 77.777863
int d = c; // d = 77;
a = b + d / (float)100; // a = 37.770000;
That is, of course, if you want to truncate the extra digits rather than round them.
This function takes the number and precision and returns the rounded-off number:
float roundoff(float num,int precision)
{
int temp=(int )(num*pow(10,precision));
int num1=num*pow(10,precision+1);
temp*=10;
temp+=5;
if(num1>=temp)
num1+=10;
num1/=10;
num1*=10;
num=num1/pow(10,precision+1);
return num;
}
It converts the floating-point number into an int by shifting the decimal point (multiplying by a power of 10) and checking whether the next digit is 5 or greater.

Double rounding error, even when using DBL_DIG

I am trying to generate a random number between -10 and 10 with step 0.3 (though I want to have these be arbitrary values) and am having issues with double precision floating point accuracy. Float.h's DBL_DIG is meant to be the minimum accuracy at which no rounding error occurs [EDIT: This is false, see Eric Postpischil's comment for a true definition of DBL_DIG], yet when printing to this many digits, I still see rounding error.
#include <stdio.h>
#include <float.h>
#include <stdlib.h>
int main()
{
for (;;)
{
printf("%.*g\n", DBL_DIG, -10 + (rand() % (unsigned long)(20 / 0.3)) * 0.3);
}
}
When I run this, I get this output:
8.3
-7
1.7
-6.1
-3.1
1.1
-3.4
-8.2
-9.1
-9.7
-7.6
-7.9
1.4
-2.5
-1.3
-8.8
2.6
6.2
3.8
-3.4
9.5
-7.6
-1.9
-0.0999999999999996
-2.2
5
3.2
2.9
-2.5
2.9
9.5
-4.6
6.2
0.799999999999999
-1.3
-7.3
-7.9
Of course, a simple solution would be to just #define DBL_DIG 14 but I feel that is wasting accuracy. Why is this happening and how do I prevent this happening? This is not a duplicate of Is floating point math broken? since I am asking about DBL_DIG, and how to find the minimum accuracy at which no error occurs.
For the specific code in the question, we can avoid excess rounding errors by using integer values until the last moment:
printf("%.*g\n", DBL_DIG,
(-100 + rand() % (unsigned long)(20 / 0.3) * 3.) / 10.);
This was obtained by multiplying each term in the original expression by 10 (−10 becomes −100 and .3 becomes 3) and then dividing the whole expression by 10. So all values we care about in the numerator (see footnote 1) are integers, which floating-point represents exactly (within range of its precision).
Since the integer values will be computed exactly, there will be just a single rounding error, in the final division by 10, and the result will be the double closest to the desired value.
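The same idea generalizes to arbitrary bounds and step sizes as long as they can be written as integers over a common denominator. A sketch with a hypothetical helper grid_value(), assuming the numerators fit in a long:

```c
/* k-th point of the grid lo + k*step, with lo = lo_num/den and
   step = step_num/den. The integer arithmetic is exact, so the only
   rounding happens in the final division. */
double grid_value(long lo_num, long k, long step_num, long den)
{
    return (double)(lo_num + k * step_num) / (double)den;
}
```

For the question's grid the denominator is 10, so a random point is grid_value(-100, rand() % 66, 3, 10), matching the original rand() % (unsigned long)(20 / 0.3).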
How many digits should I print to in order to avoid rounding error in most circumstances? (not just in my example above)
Just using more digits is not a solution for general cases. One approach for avoiding error in most cases is to learn about floating-point formats and arithmetic in considerable detail and then write code thoughtfully and meticulously. This approach is generally good but not always successful as it is usually implemented by humans, who continue to make mistakes in spite of all efforts to the contrary.
Footnote
1 Considering (unsigned long)(20 / 0.3) is a longer discussion involving intent and generalization to other values and cases.
generate a random number between -10 and 10 with step 0.3
I would like the program to work with arbitrary values for the bounds and step size.
Why is this happening ....
The source of trouble is assuming that typical real numbers (such as the string "0.3") can be encoded exactly as a double.
A double can encode about 2^64 different values exactly. 0.3 is not one of them.
Instead the nearest double is used.
The exact value and 2 nearest are listed below:
0.29999999999999993338661852249060757458209991455078125
0.299999999999999988897769753748434595763683319091796875 (best 0.3)
0.3000000000000000444089209850062616169452667236328125
So OP's code is attempting "-10 and 10 with step 0.2999..." and printing out "-0.0999999999999996" and "0.799999999999999" is more correct than "-0.1" and "0.8".
.... how do I prevent this happening?
Print with a more limited precision.
// reduce the _bit_ output precision by about the root of steps
#define LOG10_2 0.30102999566398119521373889472449
int digits_less = lround(sqrt(20 / 0.3) * LOG10_2);
for (int i = 0; i < 100; i++) {
printf("%.*e\n", DBL_DIG - digits_less,
-10 + (rand() % (unsigned long) (20 / 0.3)) * 0.3);
}
9.5000000000000e+00
-3.7000000000000e+00
8.6000000000000e+00
5.9000000000000e+00
...
-1.0000000000000e-01
8.0000000000000e-01
OP's code is not really doing "steps", as that hints toward a loop with a step of 0.3. The above digits_less is based on repetitive "steps"; otherwise OP's equation warrants about 1 decimal digit of reduction. The best reduction in precision depends on estimating the potential cumulative error of all calculations: "0.3" conversion --> double 0.3 (1/2 bit), division (1/2 bit), multiplication (1/2 bit) and addition (more complicated).
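A related option is to round the computed value itself to fewer significant digits before printing or comparing it, using the same snprintf-then-parse round trip as the dround() function earlier on this page, but counting significant digits. A sketch with a hypothetical round_sig(); the digit count remains the caller's judgement, as discussed above:

```c
#include <stdio.h>
#include <stdlib.h>

/* Round x to sig significant decimal digits by formatting it with %.*g
   and parsing the text back (sig should be at most 17 for a double). */
double round_sig(double x, int sig)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", sig, x);
    return strtod(buf, NULL);
}
```

With 15 digits, -10 + 33 * 0.3 still comes back as -0.0999999999999996; with 14 it becomes the double nearest -0.1.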
Wait for the next version of C which may support decimal floating point.

C++ int64 * double == off by one

Below is the code I've tested in a 64-bit environment and 32-bit. The result is off by one precisely each time. The expected result is: 1180000000 with the actual result being 1179999999. I'm not sure exactly why and I was hoping someone could educate me:
#include <stdint.h>
#include <iostream>
using namespace std;
int main() {
double odds = 1.18;
int64_t st = 1000000000;
int64_t res = st * odds;
cout << "result: " << res << endl;
return 1;
}
I appreciate any feedback.
1.18, or 118 / 100, can't be exactly represented in binary; its binary expansion repeats forever. The same happens if you write 1 / 3 in decimal.
So let's go over a similar case in decimal, let's calculate (1 / 3) × 30000, which of course should be 10000:
odds = 1 / 3 and st = 30000
Since computers have only a limited precision we have to truncate this number to a limited number of decimals, let's say 6, so:
odds = 0.333333
0.333333 × 30000 = 9999.99. The cast (which in your program is implicit) will truncate this number to 9999.
There is no 100% reliable way to work around this. float and double just have only limited precision. Dealing with this is a hard problem.
Your program contains an implicit cast from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this. It can be the source of bugs of the type you are describing. This cast, which can be explicitly written as (int64_t) some_double, rounds the number towards zero.
An alternative is rounding to the nearest integer with round(some_double), which will, in this case, give the expected result.
First of all - 1.18 is not exactly representable in double. Mathematically the result of:
double odds = 1.18;
is 1.17999999999999993782751062099 (according to an online calculator).
So, mathematically, odds * st is 1179999999.99999993782751062099.
But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:
Do the computation in double precision
Do the computation in higher precision and then round the result to double
Apparently, doing the computation in double precision in IEEE754 results in exactly 1180000000.
However, doing it in long double precision produces something more like 1179999999.99999993782751062099
Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.
Then converting this next-lowest result to integer will truncate the fractional part.
There is an interesting blog post here where the author describes the behaviour of GCC:
It uses long double intermediate precision for x86 code (due to the x87 FPU's long double registers)
It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)
According to the C++11 standard you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean actual values, 2 would mean long double is being used.
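In C, the same macro comes from <float.h> (C99 and later). A minimal sketch that reports it; the function name is hypothetical and the descriptions paraphrase the values the standard defines:

```c
#include <float.h>

/* Describe how intermediate floating-point results are evaluated. */
const char *eval_method_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "0: each operation is evaluated in its own type";
    case 1:
        return "1: float and double operations are evaluated in double";
    case 2:
        return "2: all operations are evaluated in long double";
    case -1:
        return "-1: indeterminable";
    default:
        return "implementation-defined";
    }
}
```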

Unexpected Output when adding two float numbers

I wrote the following C++ code:
float a, b;
int c;
a = 8.6;
b = 1.4;
c = a + b;
printf("%d\n", c);
The output is 10.
But when I run the following code:
float a, b;
int c;
a = 8.7;
b = 1.3;
c = a + b;
printf("%d\n", c);
The output is 9.
What is the difference between the two, as they are giving different outputs?
There is no such number as 8.7 or 1.3 in floating point. There is a number 10, and a number -6.5, and a number 0.96044921875... but no 8.7 or 1.3.
At best, your computer can round 8.7 to the nearest floating point number, and round 1.3 to the nearest floating point number as well. The computer adds these rounded numbers to each other, and then rounds the result.
Do not use floating point numbers for money.
#include <stdio.h>
int main(int argc, char *argv[])
{
float a = 8.7, b = 1.3;
printf("Looks like: %.1f + %.1f = %.1f\n", a, b, a+b);
printf("The truth: %.20f + %.20f = %.20f\n", a, b, a+b);
return 0;
}
On an x86 GCC/Linux computer, I get the result:
Looks like: 8.7 + 1.3 = 10.0
The truth: 8.69999980926513671875 + 1.29999995231628417969 = 9.99999976158142089844
On a PPC GCC/OS X computer, I get the result:
Looks like: 8.7 + 1.3 = 10.0
The truth: 8.69999980926513671875 + 1.29999995231628417969 = 10.00000000000000000000
Notice how 8.7 and 1.3 are both rounded down in this particular case. If you chose numbers that get rounded up, you might see a number larger than 10 on the right hand side.
See What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg (link).
Floating point numbers are not the same as real numbers and their behavior is quite different.
Real numbers are infinite, while floating point numbers are finite and can only represent a small subset of all the possible real numbers.
Since not all real numbers can be represented as floating point, a floating point assignment or operation may give you slightly different results than the same done in the real number space.
See the wikipedia entry on floating point for an introduction. The section about floating point accuracy is particularly interesting and gives other examples similar to yours.
There's no real difference between the two. They both behave in ways that are unpredictable.
What you're doing is equivalent to flipping a coin twice and asking what you did differently to get heads one time and tails the other. It's not that you did anything different, it's that this is what happens when you flip coins.
If you ask a person to add one third and two thirds using 6 digit decimal precision and then round down to an integer, you might get 0 and you might get 1. It will depend on things like whether they represent 2/3 as "0.666666" or "0.6666667" and they're both acceptable. So both 0 and 1 are acceptable answers. If you're not prepared to accept either answer, don't ask that kind of question.

union consisting of float : completely insane output

#include <stdio.h>
union NumericType
{
float value;
int intvalue;
}Values;
int main()
{
Values.value = 1094795585.00;
printf("%f \n",Values.value);
return 0;
}
This program outputs as :
1094795648.000000
Can anybody explain Why is this happening? Why did the value of the float Values.value increase? Or am I missing something here?
First off, this has nothing whatsoever to do with the use of a union.
Now, suppose you write:
int x = 1.5;
printf("%d\n", x);
what will happen? 1.5 is not an integer value, so it gets converted to an integer (by truncation), so x actually gets the value 1, which is exactly what is printed.
The exact same thing is happening in your example.
float x = 1094795585.0;
printf("%f\n", x);
1094795585.0 is not representable as a single-precision floating-point number, so it gets converted to a representable value. This happens via rounding. The two closest float values, with your number between them, are:
1094795520 (0x41414100) -- closest `float` smaller than your number
1094795585 (0x41414141) -- your number
1094795648 (0x41414180) -- closest `float` larger than your number
Because your number is slightly closer to the larger value (this is somewhat easier to see if you look at the hexadecimal representation), it rounds to that value, so that is the value stored in x, and that is the value that is printed.
A float isn't as precise as you would like it to be. Its effective 24-bit significand provides only about 7-8 significant decimal digits. Your example requires 10 decimal digits of precision. A double has an effective 53-bit significand, which provides 15-16 digits of precision; that is enough for your purpose.
It's because your float type doesn't have the precision to display that number. Use a double.
Floats only have about 7 significant decimal digits of precision.
When I do this, I get the same results:
int _tmain(int argc, _TCHAR* argv[])
{
float f = 1094795585.00f;
// 1094795648.000000
printf("%f \n",f);
return 0;
}
I simply don't understand why people use floats - they are often no faster than doubles and may be slower. This code:
#include <stdio.h>
union NumericType
{
double value;
int intvalue;
}Values;
int main()
{
Values.value = 1094795585.00;
printf("%lf \n",Values.value);
return 0;
}
produces:
1094795585.000000
By default, printf with %f prints 6 digits after the decimal point. If you want 2 digits after the decimal point, use %.2f.
Even the code below gives the same result:
#include <stdio.h>
union NumericType
{
float value;
int intvalue;
}Values;
int main()
{
Values.value = 1094795585;
printf("%f \n",Values.value);
return 0;
}
Result
./a.out
1094795648.000000
It only complicates things to speak of decimal digits because this is binary arithmetic. To explain this we can begin by looking at the set of integers in the single precision format where all the integers are representable. Since the single precision format has 23+1=24 bits of precision that means that the range is
0 to 2^24-1
This is not good or detailed enough for explaining so I'll refine it further to
0 to 2^24-2^0 in steps of 2^0
The next higher set is
0 to 2^25-2^1 in steps of 2^1
The next lower set is
0 to 2^23-2^-1 in steps of 2^-1
Your number, 1094795585 (0x41414141 in hex), falls in the range that has a maximum of slightly less than 2^31 (2147483648). That range can be expressed in detail as 0 to 2^31-2^7 in steps of 2^7. This is logical because 2^31 is 7 powers of 2 greater than 2^24, so the increments must also be 7 powers of 2 greater.
Looking at the "next lower" and "next higher" values mentioned in another post, we see that the difference between them is 128, i.e. 2^7.
There's really nothing strange or weird or funny or even magic about this. It's actually absolutely clear and quite simple.