C++ int64 * double == off by one


Below is the code I've tested in a 64-bit environment and 32-bit. The result is off by one precisely each time. The expected result is: 1180000000 with the actual result being 1179999999. I'm not sure exactly why and I was hoping someone could educate me:
#include <stdint.h>
#include <iostream>
using namespace std;
int main() {
double odds = 1.18;
int64_t st = 1000000000;
int64_t res = st * odds;
cout << "result: " << res << endl;
return 1;
}
I appreciate any feedback.

1.18, or 118 / 100, can't be represented exactly in binary; its binary expansion repeats forever. The same happens if you write 1 / 3 in decimal.
So let's go over a similar case in decimal and calculate (1 / 3) × 30000, which of course should be 10000:
odds = 1 / 3 and st = 30000
Since computers have only limited precision, we have to truncate this number to a limited number of decimals, let's say 6, so:
odds = 0.333333
0.333333 × 30000 = 9999.99. The cast (which in your program is implicit) will truncate this number to 9999.
There is no 100% reliable way to work around this. float and double just have only limited precision. Dealing with this is a hard problem.
Your program contains an implicit cast from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this. It can be the source of bugs of the type you are describing. This cast, which can be explicitly written as (int64_t) some_double, rounds the number towards zero.
An alternative is rounding to the nearest integer with round(some_double);. That will—in this case—give the expected result.
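As a minimal sketch of the difference between the truncating conversion and rounding to nearest, the two helpers below can be compared side by side. The names scale_truncated and scale_rounded are made up for illustration; std::llround is the standard <cmath> function that rounds to the nearest integer and returns a long long.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: multiplies and converts the way the question's code does.
// The double-to-integer conversion discards the fractional part (rounds toward zero).
std::int64_t scale_truncated(std::int64_t st, double odds) {
    return static_cast<std::int64_t>(st * odds);
}

// Hypothetical helper: rounds the product to the nearest integer instead.
std::int64_t scale_rounded(std::int64_t st, double odds) {
    return std::llround(st * odds);
}
```

With st = 1000000000 and odds = 1.18, scale_rounded should give 1180000000 whether or not the product was computed with extra intermediate precision, whereas scale_truncated may give either 1179999999 or 1180000000 depending on the platform.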

First of all - 1.18 is not exactly representable in double. Mathematically the result of:
double odds = 1.18;
is 1.17999999999999993782751062099 (according to an online calculator).
So, mathematically, odds * st is 1179999999.99999993782751062099.
But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:
Do the computation in double precision
Do the computation in higher precision and then round the result to double
Apparently, doing the computation in double precision in IEEE754 results in exactly 1180000000.
However, doing it in long double precision produces something more like 1179999999.99999993782751062099
Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.
Then converting this next-lowest result to integer will truncate the fractional part.
There is an interesting blog post here where the author describes the behaviour of GCC:
It uses long double intermediate precision for x86 code (due to the x87 FPU's long double registers)
It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)
According to the C++11 standard you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean actual values, 2 would mean long double is being used.
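A small sketch of that check, assuming a C++11 compiler that defines FLT_EVAL_METHOD in <cfloat>. The function name eval_method_description is hypothetical, and the wording of the returned strings is mine rather than the standard's.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: describes the intermediate precision the compiler reports.
std::string eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "operations are evaluated in the type of their operands";
        case 1:  return "float and double operations are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}
```

Printing the returned string from a build of the question's program should show whether that build uses extended intermediate precision.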

Related

How to round a floating point type to two decimals or more in C++? [duplicate]

How can I round a float value (such as 37.777779) to two decimal places (37.78) in C?

If you just want to round the number for output purposes, then the "%.2f" format string is indeed the correct answer. However, if you actually want to round the floating point value for further computation, something like the following works:
#include <math.h>
float val = 37.777779;
float rounded_down = floorf(val * 100) / 100; /* Result: 37.77 */
float nearest = roundf(val * 100) / 100; /* Result: 37.78 */
float rounded_up = ceilf(val * 100) / 100; /* Result: 37.78 */
Notice that there are three different rounding rules you might want to choose: round down (ie, truncate after two decimal places), rounded to nearest, and round up. Usually, you want round to nearest.
As several others have pointed out, due to the quirks of floating point representation, these rounded values may not be exactly the "obvious" decimal values, but they will be very very close.
For much (much!) more information on rounding, and especially on tie-breaking rules for rounding to nearest, see the Wikipedia article on Rounding.

Use %.2f in printf. It prints only 2 digits after the decimal point, rounding the value.
Example:
printf("%.2f", 37.777779);
Output:
37.78

Assuming you're talking about rounding the value for printing, then Andrew Coleson's and AraK's answers are correct:
printf("%.2f", 37.777779);
But note that if you're aiming to round the number to exactly 37.78 for internal use (eg to compare against another value), then this isn't a good idea, due to the way floating point numbers work: you usually don't want to do equality comparisons for floating point, instead use a target value +/- a sigma value. Or encode the number as a string with a known precision, and compare that.
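The "target value +/- a tolerance" comparison can be sketched as follows. The name nearly_equal and the idea of passing the tolerance as a parameter are assumptions for illustration; a suitable tolerance depends entirely on the application.

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by at most `tolerance`.
// The tolerance is application-specific; it is an assumption, not a universal constant.
bool nearly_equal(double a, double b, double tolerance) {
    return std::fabs(a - b) <= tolerance;
}
```

For example, nearly_equal(37.777779, 37.78, 0.005) treats the two values as the same when only two decimal places matter.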
See the link in Greg Hewgill's answer to a related question, which also covers why you shouldn't use floating point for financial calculations.

How about this:
float value = 37.777779;
float rounded = ((int)(value * 100 + .5) / 100.0);

printf("%.2f", 37.777779);
If you want to write to C-string:
char number[24]; // dummy size, you should take care of the size!
sprintf(number, "%.2f", 37.777779);

Always use the printf family of functions for this. Even if you want to get the value as a float, you're best off using snprintf to get the rounded value as a string and then parsing it back with atof:
#include <math.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
double dround(double val, int dp) {
int charsNeeded = 1 + snprintf(NULL, 0, "%.*f", dp, val);
char *buffer = malloc(charsNeeded);
snprintf(buffer, charsNeeded, "%.*f", dp, val);
double result = atof(buffer);
free(buffer);
return result;
}
I say this because the approach shown by the currently top-voted answer and several others here - multiplying by 100, rounding to the nearest integer, and then dividing by 100 again - is flawed in two ways:
For some values, it will round in the wrong direction because the multiplication by 100 changes the decimal digit determining the rounding direction from a 4 to a 5 or vice versa, due to the imprecision of floating point numbers
For some values, multiplying and then dividing by 100 doesn't round-trip, meaning that even if no rounding takes place the end result will be wrong
To illustrate the first kind of error - the rounding direction sometimes being wrong - try running this program:
int main(void) {
// This number is EXACTLY representable as a double
double x = 0.01499999999999999944488848768742172978818416595458984375;
printf("x: %.50f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.50f\n", res1);
printf("Rounded with round, then divided: %.50f\n", res2);
}
You'll see this output:
x: 0.01499999999999999944488848768742172978818416595459
Rounded with snprintf: 0.01000000000000000020816681711721685132943093776703
Rounded with round, then divided: 0.02000000000000000041633363423443370265886187553406
Note that the value we started with was less than 0.015, and so the mathematically correct answer when rounding it to 2 decimal places is 0.01. Of course, 0.01 is not exactly representable as a double, but we expect our result to be the double nearest to 0.01. Using snprintf gives us that result, but using round(100 * x) / 100 gives us 0.02, which is wrong. Why? Because 100 * x gives us exactly 1.5 as the result. Multiplying by 100 thus changes the correct direction to round in.
To illustrate the second kind of error - the result sometimes being wrong due to * 100 and / 100 not truly being inverses of each other - we can do a similar exercise with a very big number:
int main(void) {
double x = 8631192423766613.0;
printf("x: %.1f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.1f\n", res1);
printf("Rounded with round, then divided: %.1f\n", res2);
}
Our number now doesn't even have a fractional part; it's an integer value, just stored with type double. So the result after rounding it should be the same number we started with, right?
If you run the program above, you'll see:
x: 8631192423766613.0
Rounded with snprintf: 8631192423766613.0
Rounded with round, then divided: 8631192423766612.0
Oops. Our snprintf method returns the right result again, but the multiply-then-round-then-divide approach fails. That's because the mathematically correct value of 8631192423766613.0 * 100, 863119242376661300.0, is not exactly representable as a double; the closest value is 863119242376661248.0. When you divide that back by 100, you get 8631192423766612.0 - a different number to the one you started with.
Hopefully that's a sufficient demonstration that using roundf for rounding to a number of decimal places is broken, and that you should use snprintf instead. If that feels like a horrible hack to you, perhaps you'll be reassured by the knowledge that it's basically what CPython does.

Also, if you're using C++, you can just create a function like this:
string prd(const double x, const int decDigits) {
stringstream ss;
ss << fixed;
ss.precision(decDigits); // set # places after decimal
ss << x;
return ss.str();
}
You can then output any double myDouble with n places after the decimal point with code such as this:
std::cout << prd(myDouble,n);

There isn't a way to round a float to another float because the rounded float may not be representable (a limitation of floating-point numbers). For instance, say you round 37.777779 to 37.78, but the nearest representable number is 37.781.
However, you can "round" a float by using a format string function.

You can still use:
float ceilf(float x); // don't forget #include <math.h> and link with -lm.
example:
float valueToRound = 37.777779;
float roundedValue = ceilf(valueToRound * 100) / 100;

In C++ (or in C with C-style casts), you could create the function:
/* Function to control # of decimal places to be output for x */
double showDecimals(const double& x, const int& numDecimals) {
int y=x;
double z=x-y;
double m=pow(10,numDecimals);
double q=z*m;
double r=round(q);
return static_cast<double>(y)+(1.0/m)*r;
}
Then std::cout << showDecimals(37.777779,2); would produce: 37.78.
Obviously you don't really need to create all 5 variables in that function, but I leave them there so you can see the logic. There are probably simpler solutions, but this works well for me--especially since it allows me to adjust the number of digits after the decimal place as I need.

Use float roundf(float x).
"The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.5
#include <math.h>
float y = roundf(x * 100.0f) / 100.0f;
Depending on your float implementation, numbers that may appear to be half-way are not, as floating-point is typically base-2 oriented. Further, precisely rounding to the nearest 0.01 on all "half-way" cases is most challenging.
void r100(const char *s) {
float x, y;
sscanf(s, "%f", &x);
y = round(x*100.0)/100.0;
printf("%6s %.12e %.12e\n", s, x, y);
}
int main(void) {
r100("1.115");
r100("1.125");
r100("1.135");
return 0;
}
1.115 1.115000009537e+00 1.120000004768e+00
1.125 1.125000000000e+00 1.129999995232e+00
1.135 1.134999990463e+00 1.139999985695e+00
Although "1.115" is "half-way" between 1.11 and 1.12, when converted to float, the value is 1.115000009537... and is no longer "half-way", but closer to 1.12 and rounds to the closest float of 1.120000004768...
"1.125" is "half-way" between 1.12 and 1.13, when converted to float, the value is exactly 1.125 and is "half-way". It rounds toward 1.13 because round() rounds halfway cases away from zero, and the result is the closest float to 1.13, 1.129999995232...
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 and rounds to the closest float of 1.129999995232...
If code used
y = roundf(x*100.0f)/100.0f;
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 but incorrectly rounds to float of 1.139999985695... due to the more limited precision of float vs. double. This incorrect value may be viewed as correct, depending on coding goals.

Code definition :
#define roundz(x,d) ((floor(((x)*pow(10,d))+.5))/pow(10,d))
Results :
a = 8.000000
sqrt(a) = r = 2.828427
roundz(r,2) = 2.830000
roundz(r,3) = 2.828000
roundz(r,5) = 2.828430

double f_round(double dval, int n)
{
char l_fmtp[32], l_buf[64];
char *p_str;
/* Build a format such as "%.4f", then print the value and parse it back. */
sprintf (l_fmtp, "%%.%df", n);
sprintf (l_buf, l_fmtp, dval);
return strtod(l_buf, &p_str);
}
Here n is the number of decimals
example:
double d = 100.23456;
printf("%f", f_round(d, 4));// result: 100.2346
printf("%f", f_round(d, 2));// result: 100.23

I made this macro for rounding float numbers.
Add it to your header / beginning of file
#define ROUNDF(f, c) (((float)((int)((f) * (c))) / (c)))
Here is an example:
float x = ROUNDF(3.141592, 100);
x equals 3.14 :)
Note that the (int) cast truncates toward zero, so ROUNDF(3.149, 100) also gives 3.14.

Let me first attempt to justify my reason for adding yet another answer to this question. In an ideal world, rounding is not really a big deal. However, in real systems, you may need to contend with several issues that can result in rounding that may not be what you expect. For example, you may be performing financial calculations where final results are rounded and displayed to users as 2 decimal places; these same values are stored with fixed precision in a database that may include more than 2 decimal places (for various reasons; there is no optimal number of places to keep...depends on specific situations each system must support, e.g. tiny items whose prices are fractions of a penny per unit); and, floating point computations performed on values where the results are plus/minus epsilon. I have been confronting these issues and evolving my own strategy over the years. I won't claim that I have faced every scenario or have the best answer, but below is an example of my approach so far that overcomes these issues:
Suppose 6 decimal places is regarded as sufficient precision for calculations on floats/doubles (an arbitrary decision for the specific application), using the following rounding function/method:
double Round(double x, int p)
{
if (x != 0.0) {
return ((floor((fabs(x)*pow(double(10.0),p))+0.5))/pow(double(10.0),p))*(x/fabs(x));
} else {
return 0.0;
}
}
Rounding to 2 decimal places for presentation of a result can be performed as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,8),6),2));
For val = 6.825, result is 6.83 as expected.
For val = 6.824999, result is 6.82. Here the assumption is that the calculation resulted in exactly 6.824999 and the 7th decimal place is zero.
For val = 6.8249999, result is 6.83. The 7th decimal place being 9 in this case causes the Round(val,6) function to give the expected result. For this case, there could be any number of trailing 9s.
For val = 6.824999499999, result is 6.83. Rounding to the 8th decimal place as a first step, i.e. Round(val,8), takes care of the one nasty case whereby a calculated floating point result calculates to 6.8249995, but is internally represented as 6.824999499999....
Finally, the example from the question...val = 37.777779 results in 37.78.
This approach could be further generalized as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,N+2),N),2));
where N is precision to be maintained for all intermediate calculations on floats/doubles. This works on negative values as well. I do not know if this approach is mathematically correct for all possibilities.

...or you can do it the old-fashioned way without any libraries:
float a = 37.777779;
int b = a; // b = 37
float c = a - b; // c = 0.777779
c *= 100; // c = 77.777863
int d = c; // d = 77;
a = b + d / (float)100; // a = 37.770000;
That of course if you want to remove the extra information from the number.

this function takes the number and precision and returns the rounded off number
float roundoff(float num,int precision)
{
int temp=(int )(num*pow(10,precision));
int num1=num*pow(10,precision+1);
temp*=10;
temp+=5;
if(num1>=temp)
num1+=10;
num1/=10;
num1*=10;
num=num1/pow(10,precision+1);
return num;
}
It converts the floating-point number to an int by shifting the decimal point to the right and checking whether the next digit is five or greater.

double to scientific notation conversion - precision error

I'm writing a piece of code to convert double values to scientific notation up to a precision of 15 in C++. I know I can use standard library functions like sprintf with the %e option to do this, but I need to come up with my own solution.
I'm trying something like this.
double norm = 68600000;
int exp = 0; // decimal exponent
if (norm)
{
while (norm >= 10.0)
{
norm /= 10.0;
exp++;
}
while (norm < 1.0)
{
norm *= 10.0;
exp--;
}
}
The result I get is
norm = 6.8599999999999994316;
exp = 7
The reason for losing this precision was clarified for me by this question.
Now I try to round the value to a precision of 15, which would result in
6.859 999 999 999 999
(it's evident that since the 16th decimal digit is less than 5 we get this result)
Expected answer: norm = 6.860 000 000 000 000, exp = 7
My question is: is there a better way to convert a double to scientific notation with a precision of 15 (without using the standard libs), so that I get exactly 6.86 when I round? As you may have noticed, the problem here is not with the rounding mechanism but with the double to scientific notation conversion, due to the precision loss related to machine epsilon.

You could declare norm as a long double for some more precision, although there are some compiler-specific issues to be aware of: some compilers make long double synonymous with double.
Another way to go about solving this precision problem is to work with numbers in the form of strings and implement custom arithmetic operations for strings that would not be subject to machine epsilon.
For example:
int getEXP(string norm){ return norm.length() - 1; };
string norm = "68600000";
int exp = getEXP(norm); // returns 7
The next step would be to implement functions to insert a decimal character into the appropriate place in the norm string, and add whatever level of precision you'd like. No machine epsilon to worry about.
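That next step could look roughly like the sketch below. It assumes the input is a string of decimal digits for a non-negative integer (no sign, no decimal point). The name to_scientific and its precision parameter are hypothetical, and digits beyond the requested precision are simply cut off rather than rounded.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits a string of decimal digits into a mantissa such as
// "6.86" and a decimal exponent such as 7. `precision` is the number of digits
// kept after the decimal point.
std::pair<std::string, int> to_scientific(const std::string& digits, std::size_t precision) {
    // Skip leading zeros; an all-zero (or empty) input is reported as zero.
    std::size_t first = digits.find_first_not_of('0');
    if (first == std::string::npos) {
        return std::make_pair("0." + std::string(precision, '0'), 0);
    }
    std::string significant = digits.substr(first);
    int exponent = static_cast<int>(significant.size()) - 1;
    // Everything after the first digit becomes the fraction.
    std::string fraction = significant.substr(1);
    fraction.resize(precision, '0');  // pads with '0' or truncates, without rounding
    return std::make_pair(significant.substr(0, 1) + "." + fraction, exponent);
}
```

Calling to_scientific("68600000", 15) yields a mantissa of 6.86 followed by thirteen zeros and an exponent of 7, with no floating-point arithmetic involved.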

Accurate percentage in C++

Given 2 numbers, where A <= B, say for example A = 9 and B = 10, I am trying to get the percentage of how much smaller A is compared to B. I need to have the percentage as an int, e.g. if the result is 10.00% the int should be 1000.
Here is my code:
int A = 9;
int B = 10;
int percentage = (((1 - (double)A/B) / 0.01)) * 100;
My code returns 999 instead of 1000. Some precision related to the usage of double is lost.
Is there a way to avoid losing precision in my case?

Seems the formula you're looking for is
int result = 10000 - (A*10000+B/2)/B;
The idea is to do all computations in integers and delay the division.
To do the rounding, half of the denominator is added before performing the division (otherwise the division truncates, which rounds the final result up because it is computed as 100% - x).
For example with A=9 and B=11 the percentage is 18.18181818... and rounding 18.18, the computation without the rounding would give 1819 instead of the expected result 1818.
Note that the computation is done all in integers so there is a risk of overflow for large values of A and B. For example if int is 32 bit then A can be up to around 200000 before risking an overflow when computing A*10000.
Using A*10000LL instead of A*10000 in the formula will trade in some speed to raise the limit to a much bigger value.
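Wrapped in a function, the integer-only formula with the 10000LL variant might look like this sketch. The name percent_smaller_x100 and the precondition check are assumptions added for illustration.

```cpp
#include <cassert>

// Hypothetical helper: how much smaller a is than b, in hundredths of a percent
// (10.00% -> 1000). Assumes 0 <= a <= b and b > 0.
int percent_smaller_x100(int a, int b) {
    assert(b > 0 && a >= 0 && a <= b);
    // 10000LL forces the arithmetic into long long so a * 10000 cannot overflow int.
    // Adding b / 2 before dividing rounds the a/b ratio to the nearest unit.
    long long scaled = (a * 10000LL + b / 2) / b;
    return static_cast<int>(10000 - scaled);
}
```

With A = 9 and B = 10 this gives 1000, and with A = 9 and B = 11 it gives 1818, matching the values discussed above.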

Of course there may be precision loss in floating-point numbers. Either use fixed-point arithmetic as 6502 answered, or add a bias to the result to get the intended answer.
You would do better with:
assert(B != 0);
int percentage = ((A<0) == (B<0) ? 0.5 : -0.5) + (((1 - (double)A/B) / 0.01)) * 100;
Because of precision loss, the result of (((1 - (double)A/B) / 0.01)) * 100 may be slightly less or more than intended. Adding 0.5 (or subtracting it for a negative result) before the conversion to int turns truncation into rounding to the nearest integer, so a result such as 999.9999 becomes 1000.
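The same effect can be had with the standard rounding function instead of a hand-written bias. This is only a sketch: the name percent_rounded is hypothetical, and std::lround from <cmath> stands in for the +/- 0.5 adjustment.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: same floating-point formula as the question, but the
// result is rounded to the nearest integer instead of being truncated.
int percent_rounded(int a, int b) {
    assert(b != 0);
    double value = (1.0 - static_cast<double>(a) / b) / 0.01 * 100.0;
    return static_cast<int>(std::lround(value));
}
```

For A = 9 and B = 10 the intermediate value is just below 1000, and rounding brings it to 1000.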

I tried
float floatpercent = (((1 - (double)A/B) / 0.01)) * 100;
int percentage = (int) floatpercent;
cout<< percentage;
displays 1000
I suspect a precision loss on automatic casting to int as the root problem to your code.

[I alluded to this in a comment to the original question, but I thought I'd post it as an answer.]
The core problem is that the form of expression you're using amplifies the unavoidable floating point loss of precision when representing simple fractions of 10.
Your expression (with casts stripped out for now, using standard precedence to also avoid some parens)
((1 - A/B) / 0.01) * 100
is quite a complicated way of representing what you want, although it's algebraically correct. Unfortunately, floating point numbers can only precisely represent numbers like 1/2, 1/4, 1/8, etc, their multiples, and sums of those. In particular, none of 9/10, 1/10, or 1/100 has a precise representation.
The above expression introduces these errors twice: first in the calculation of A/B, and then in the division by 0.01. These two imprecise values are then divided, which further amplifies the inherent error.
The most direct way to write what you meant (again without needed casts) is
((B-A) / B) * 10000
This produces the correct answer and is considerably easier to read, I would suggest, than the original. The fully correct C form is
((B - A) / (double)B) * 10000
I've tested this and it works reliably. As others have noted, it's generally better to work with doubles instead of floats, as their extra precision makes them less prone (but not immune) to this sort of difficulty.

Detecting precision loss when converting from double to float

I am writing a piece of code in which I have to convert from double to float values. I am using boost::numeric_cast to do this conversion, which will alert me of any overflow/underflow. However, I am also interested in knowing whether that conversion resulted in some precision loss.
For example
double source = 1988.1012;
float dest = numeric_cast<float>(source);
Produces dest which has value 1988.1
Is there any way in which I can detect this kind of precision loss/rounding?

You could cast the float back to a double and compare this double to the original - that should give you a fair indication as to whether there was a loss of precision.

float dest = numeric_cast<float>(source);
double residual = source - numeric_cast<double>(dest);
Hence, residual contains the "loss" you're looking for.
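Both suggestions (the round trip and the residual) can be sketched with plain static_cast, under the assumption that the value is finite and already known to fit in a float's range, which is the check boost::numeric_cast provides. The names loses_precision and conversion_residual are made up for this example.

```cpp
// Hypothetical helper: true when converting `source` to float and back does not
// reproduce the original double. Assumes `source` is finite and within float's range.
bool loses_precision(double source) {
    float narrowed = static_cast<float>(source);
    return static_cast<double>(narrowed) != source;
}

// Hypothetical helper: the signed amount lost by the conversion.
double conversion_residual(double source) {
    float narrowed = static_cast<float>(source);
    return source - static_cast<double>(narrowed);
}
```

A value such as 1988.1012 reports a loss, while values like 1988.5 or 0.25, which a float holds exactly, do not.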

Look at these articles for single precision and double precision floats. First of all, floats have 8 bits for the exponent vs. 11 for a double. So anything bigger than about 2^128 (roughly 3.4 × 10^38) in magnitude overflows a float, and anything smaller than 2^-126 (roughly 1.2 × 10^-38) falls below its normal range, which is the overflow/underflow you mentioned. For the float, you have 23 bits for the actual digits of the number, vs 52 bits for the double. So obviously, you have a lot more digits of precision for the double than float.
Say you have a number like: 1.1123. This number may not actually be encoded as 1.1123 because the digits in a floating point number are used to actually add up as fractions. For example, if your bits in the mantissa were 11001, then the value would be formed by 1 (implicit) + 1 * 1/2 + 1 * 1/4 + 0 * 1/8 + 0 * 1/16 + 1 * 1/32 + 0 * (1/64 + 1/128 + ...). So the exact value cannot be encoded unless you can add up these fractions in such a way that it's the exact number. This is rare. Therefore, there will almost always be a precision loss.
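The sum of fractions described here can be computed directly. The sketch below takes the mantissa bits as a string of '0' and '1' characters, which is a representation chosen for readability and not how the bits are actually stored; the name significand_value is hypothetical.

```cpp
#include <string>

// Hypothetical helper: value of a normalized binary significand given its
// fraction bits as a string of '0'/'1' characters; the leading 1 is implicit.
double significand_value(const std::string& fraction_bits) {
    double value = 1.0;   // implicit leading bit
    double weight = 0.5;  // weight of the first fraction bit: 1/2
    for (char bit : fraction_bits) {
        if (bit == '1') value += weight;
        weight /= 2.0;    // each following bit is worth half as much
    }
    return value;
}
```

For the bits 11001 this gives 1 + 1/2 + 1/4 + 1/32 = 1.78125.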

You're going to have a certain level of precision loss, as per Dave's answer. If, however, you want to focus on quantifying it and raising an exception when it exceeds a certain number, you will have to open up the floating point number itself and parse out the mantissa & exponent, then do some analysis to determine if you've exceeded your tolerance.
But, the good news: it's usually the standard IEEE floating-point float. :-)

Discrepancy between the values computed by Fortran and C++

I would have dared say that the numeric values computed by Fortran and C++ would be way more similar. However, from what I am experiencing, it turns out that the calculated numbers start to diverge after too few decimal digits. I have come across this problem during the process of porting some legacy code from the former language to the latter. The original Fortran 77 code...
INTEGER M, ROUND
DOUBLE PRECISION NUMERATOR, DENOMINATOR
M = 2
ROUND = 1
NUMERATOR=5./((M-1+(1.3**M))**1.8)
DENOMINATOR = 0.7714+0.2286*(ROUND**3.82)
WRITE (*, '(F20.15)') NUMERATOR/DENOMINATOR
STOP
... outputs 0.842201471328735, while its C++ equivalent...
int m = 2;
int round = 1;
long double numerator = 5.0 / pow((m-1)+pow(1.3, m), 1.8);
long double denominator = 0.7714 + 0.2286 * pow(round, 3.82);
std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(15)
<< numerator/denominator << std::endl;
exit(1);
... returns 0.842201286195064. That is, the computed values are equal only up to the sixth decimal. Although not particularly a Fortran advocate, I feel inclined to consider its results as the 'correct' ones, given its legitimate reputation as a number cruncher. However, I am intrigued about the cause of this difference between the computed values. Does anyone know what the reason for this discrepancy could be?

In Fortran, by default, floating point literals are single precision, whereas in C/C++ they are double precision.
Thus, in your Fortran code, the expression for calculating NUMERATOR is done in single precision; it is only converted to double precision when assigning the final result to the NUMERATOR variable.
And the same thing for the expression calculating the value that is assigned to the DENOMINATOR variable.
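To see the effect from the C++ side, one can evaluate the same expression once in double precision and once with every literal and intermediate forced to float. The second function is only an assumed imitation of what the Fortran 77 compiler does, and it is not guaranteed to reproduce the Fortran output digit for digit; the names ratio_double and ratio_single are made up for this sketch.

```cpp
#include <cmath>

// Double-precision evaluation, as in the question's C++ code.
double ratio_double(int m, int rnd) {
    double numerator = 5.0 / std::pow((m - 1) + std::pow(1.3, m), 1.8);
    double denominator = 0.7714 + 0.2286 * std::pow(rnd, 3.82);
    return numerator / denominator;
}

// Assumed imitation of the Fortran 77 expression: every literal and intermediate
// is kept in single precision, and only the two results are widened to double.
double ratio_single(int m, int rnd) {
    float fm = static_cast<float>(m);
    float fr = static_cast<float>(rnd);
    float numerator = 5.0f / std::pow((fm - 1.0f) + std::pow(1.3f, fm), 1.8f);
    float denominator = 0.7714f + 0.2286f * std::pow(fr, 3.82f);
    return static_cast<double>(numerator) / static_cast<double>(denominator);
}
```

Calling both with m = 2 and rnd = 1 should give two values that agree only to about six or seven significant digits, which is the kind of gap reported in the question. Writing the Fortran literals with a D0 suffix (for example 1.3D0) is the usual way to make them double precision.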
