This question already has answers here:
How do I print a double value with full precision using cout?
(17 answers)
Closed 1 year ago.
I have the following calculation:
double a = 141150, b = 141270, c = 141410;
double d = (a + b + c) / 3;
cout << d << endl;
The output shows d = 141277, whereas d should be 141276.666667. The calculation consists of double additions and a double division. Why am I getting a result that is rounded up? By the way, d = (a + b + c) / 3.0 doesn't help.
However in another similar calculation, the result is correct:
double u = 1, v = 2, x = 3, y = 4;
double z = (u + v + x + y) / 4;
z results in 2.5 as expected. These two calculations are essentially the same, so why do they behave differently?
Lastly, I know C++ automatically truncates numbers cast to a lower precision, but I've never heard of automatic rounding. Can someone shed some light?
Here you can find multiple answers on the same problem with code snippets as examples. It does include what the guys said in the comments
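For illustration, here is a minimal sketch of the formatting issue behind this question. The helper names format_default and format_fixed are made up for this example. The point it tries to show is that operator<< prints 6 significant digits by default, so the stored double is fine and only the printed text is rounded.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats d the way `cout << d` does by default,
// i.e. with 6 significant digits.
std::string format_default(double d) {
    std::ostringstream out;
    out << d;
    return out.str();
}

// Hypothetical helper: fixed notation with `digits` digits after the
// decimal point.
std::string format_fixed(double d, int digits) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(digits) << d;
    return out.str();
}
```

With d = (141150.0 + 141270.0 + 141410.0) / 3, format_default(d) yields "141277" while format_fixed(d, 6) yields "141276.666667". The value 2.5 needs fewer than 6 significant digits, which is why the second calculation appears to print correctly.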
This question already has answers here:
Why does changing 0.1f to 0 slow down performance by 10x?
(6 answers)
Closed 8 years ago.
I am a circuit designer, not a software engineer, so I have no idea how to track down this problem.
I am working with some IIR filter code and I am having problems with extremely slow execution times when I process extremely small values through the filter. To find the problem, I wrote this test code.
Normally, the loop will run in about 200 ms or so. (I didn't measure it.) But when TestCheckBox->Checked, it requires about 7 seconds to run. The problem lies with the reduction in size of A, B, C and D within the loop, which is exactly what happens to the values in an IIR filter after its input goes to zero.
I believe the problem lies with the fact that the variable's exponent value becomes less than -308. A simple fix is to declare the variables as long doubles, but that isn't an easy fix in the actual code, and it doesn't seem like I should have to do this.
Any ideas why this happens and what a simple fix might be?
In case it matters, I am using C++ Builder XE3.
int j;
double A, B, C, D, E, F, G, H;
//long double A, B, C, D, E, F, G, H; // a fix
A = (double)random(100000000)/10000000.0 - 5.0;
B = (double)random(100000000)/10000000.0 - 5.0;
C = (double)random(100000000)/10000000.0 - 5.0;
D = (double)random(100000000)/10000000.0 - 5.0;
if(TestCheckBox->Checked)
{
A *= 1.0E-300;
B *= 1.0E-300;
C *= 1.0E-300;
D *= 1.0E-300;
}
for(j=0; j<=1000000; j++)
{
A *= 0.9999;
B *= 0.9999;
C *= 0.9999;
D *= 0.9999;
E = A * B + C - D; // some exercise code
F = A - C * B + D;
G = A + B + C + D;
H = A * C - B + G;
E = A * B + C - D;
F = A - C * B + D;
G = A + B + C + D;
H = A * C - B + G;
E = A * B + C - D;
F = A - C * B + D;
G = A + B + C + D;
H = A * C - B + G;
}
EDIT:
As the answers said, the cause of this problem is denormal math, something I had never heard of. Wikipedia has a pretty nice description of it as does the MSDN article given by Sneftel.
http://en.wikipedia.org/wiki/Denormal_number
Having said this, I still can't get my code to flush denormals. The MSDN article says to do this:
_controlfp(_DN_FLUSH, _MCW_DN)
These definitions are not in the XE3 math libraries however, so I used
controlfp(0x01000000, 0x03000000)
per the article, but this is having no effect in XE3. Neither does the code suggested in the Wikipedia article.
Any suggestions?
You're running into denormal numbers (nonzero values whose magnitude is less than DBL_MIN, in which the most significant digit is treated as a zero). Denormals extend the range of the representable floating-point numbers, and are important to maintain certain useful error bounds in FP arithmetic, but operating on them is far slower than operating on normal FP numbers. They also have lower precision. So you should try to keep the magnitude of all your numbers (both intermediate and final quantities) greater than DBL_MIN.
In order to increase performance, you can force denormals to be flushed to zero by calling _controlfp(_DN_FLUSH, _MCW_DN) (or, depending on OS and compiler, a similar function). http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You've entered the realm of floating-point underflow, resulting in denormalized numbers - depending on the hardware you're likely trapping into software, which will be much much slower than hardware operations.
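When the floating-point control word cannot be changed (as seems to be the case in the XE3 edit above), one possible software workaround is to zero the filter state yourself once it becomes denormal. The sketch below is only an assumption about how that could look; flush_denormal is a hypothetical helper, not part of any library, and it has not been measured against the original filter code.

```cpp
#include <cmath>

// Hypothetical helper: returns 0.0 for subnormal (denormal) inputs and
// leaves every other value, including normal numbers, zero, infinities
// and NaN, unchanged.
double flush_denormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? 0.0 : x;
}
```

In the test loop this would be applied to the state variables after each update, for example A = flush_denormal(A * 0.9999);. The extra classification has a cost of its own, so it is worth timing both variants.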
This question already has answers here:
Why does division result in zero instead of a decimal?
(5 answers)
Integer division always zero [duplicate]
(1 answer)
Closed 9 years ago.
Why does kel come out as 0? Please help.
double fah, kel;
fah = std::atof(input.c_str()); //convert string input to a double & assign value to be fah degree
kel = (double)((fah + 459.67) * (5/9)); //calculate fah temp to kelvin
kel comes out as 0 when I add "5/9" to the calculation.
In C++, 5/9 == 0 because of integer division.
Use 5.0/9.
The problem is integer division. The result of this:
5/9
is 0. You should use a floating point type:
5/9.0 // 9.0 is a double.
The expression 5 / 9 has integral type, so its result is 0.
Change (5/9) at least to (5.0/9)
This is an integer division issue. Additionally, you can simplify the equation a bit:
Dk = C*Df + C*B ===> Dk = C*Df + A
where Dk is degrees Kelvin, Df is degrees Fahrenheit, C is the 5/9 constant, B is the Kelvin offset (459.67), and A is C*B (also constant). Which would make your code:
const double FtoKMultiplier = 5.0/9.0; // done at compile time
const double FtoKOffset = 459.67 * FtoKMultiplier; // also done at compile time
double fah = std::atof(input.c_str());
double kel = FtoKMultiplier * fah + FtoKOffset; // single multiplication and addition at runtime
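Putting the answers above together, a minimal self-contained sketch of the corrected conversion could look like this. The function names are made up for the example; the only essential change from the question's code is writing 5.0 / 9.0 so that the division is done in floating point.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts degrees Fahrenheit to Kelvin.
// 5.0 / 9.0 is a double division; 5 / 9 would be integer division (== 0).
double fahrenheit_to_kelvin(double fah) {
    return (fah + 459.67) * (5.0 / 9.0);
}

// Hypothetical helper: parses the text first, as in the question.
// std::atof returns 0.0 if the text cannot be parsed.
double fahrenheit_text_to_kelvin(const std::string& input) {
    return fahrenheit_to_kelvin(std::atof(input.c_str()));
}
```

For instance, fahrenheit_to_kelvin(32.0) comes out at approximately 273.15.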
This question already has answers here:
round() for float in C++
(23 answers)
Closed 3 years ago.
I have a double (call it x), meant to be 55 but in actuality stored as 54.999999999999943157 which I just realised.
So when I do
double x = 54.999999999999943157;
int y = (int) x;
y = 54 instead of 55!
This puzzled me for a long time. How do I get it to correctly round?
Add 0.5 before casting (if x > 0) or subtract 0.5 (if x < 0), because the cast always truncates toward zero.
double x = 54.999999999999943157;
x = x + 0.5 - (x<0); // x is now 55.499999...
int y = (int)x; // truncated to 55
C++11 also introduces std::round, which likely uses a similar logic of adding 0.5 to |x| under the hood (see the link if interested) but is obviously more robust.
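A follow-up question might be why the value isn't stored as exactly 55. (55 itself is exactly representable, so the error must come from the calculation that produced it.) For an explanation, see this Stack Overflow answer.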
Casting is not a mathematical operation and doesn't behave as such. Try
int y = (int)round(x);
Casting to an int truncates the value. Adding 0.5 first makes it round to the nearest integer (for non-negative values).
int y = (int)(x + 0.5);
It is worth noting that what you're doing isn't rounding, it's casting. Casting using (int) x truncates the fractional part of x. For example, if x = 3.9995, the .9995 gets truncated and the result is 3.
As proposed by many others, one solution is to add 0.5 to x, and then cast.
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
double x=54.999999999999943157;
int y=ceil(x);//The ceil() function returns the smallest integer no less than x
return 0;
}
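To compare the approaches mentioned in these answers, here is a small sketch built on std::lround from <cmath> (C++11). The wrapper name round_to_int is hypothetical. Unlike ceil, which always rounds up, and unlike a plain cast, which always truncates toward zero, std::lround rounds to the nearest integer, with halfway cases going away from zero.

```cpp
#include <cmath>

// Hypothetical helper: rounds x to the nearest integer (halfway cases
// away from zero) and returns it as an int. Assumes the result fits in int.
int round_to_int(double x) {
    return static_cast<int>(std::lround(x));
}
```

With x = 54.999999999999943157, round_to_int(x) gives 55 while (int)x gives 54. For x = 54.2, round_to_int(x) gives 54, whereas ceil(x) would give 55, so ceil is only a fix when the value is known to sit just below the intended integer.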
I have a problem with precision. I need my C++ code to have the same precision as MATLAB. In MATLAB I have a script that does some processing on numbers, and I have C++ code that does the same thing as that script. The output for the same input is different. I found that in my script, 104 >= 104 returns false. I tried format long, but it did not help me find out why it is false. Both numbers are of type double. I thought that maybe MATLAB stores the real value of 104 somewhere and it is actually something like 103.9999..., so I increased the precision in my C++ code. That did not help either, because where MATLAB returns a value of 50.000, in C++ I get 50.050 with high precision. Those two values result from a few calculations such as + or *. Is there any way to make my C++ code and MATLAB script have the same precision?
for i = 1:neighbors
y = spoints(i,1)+origy;
x = spoints(i,2)+origx;
% Calculate floors, ceils and rounds for the x and y.
fy = floor(y); cy = ceil(y); ry = round(y);
fx = floor(x); cx = ceil(x); rx = round(x);
% Check if interpolation is needed.
if (abs(x - rx) < 1e-6) && (abs(y - ry) < 1e-6)
% Interpolation is not needed, use original datatypes
N = image(ry:ry+dy,rx:rx+dx);
D = N >= C;
else
% Interpolation needed, use double type images
ty = y - fy;
tx = x - fx;
% Calculate the interpolation weights.
w1 = (1 - tx) * (1 - ty);
w2 = tx * (1 - ty);
w3 = (1 - tx) * ty ;
w4 = tx * ty ;
%Compute interpolated pixel values
N = w1*d_image(fy:fy+dy,fx:fx+dx) + w2*d_image(fy:fy+dy,cx:cx+dx) + ...
w3*d_image(cy:cy+dy,fx:fx+dx) + w4*d_image(cy:cy+dy,cx:cx+dx);
D = N >= d_C;
end
I have problems in the else branch, which starts at line 12. tx and ty equal 0.707106781186547 or 1 - 0.707106781186547. Values from d_image are in the range 0 to 255. N is the value (0..255) obtained by interpolating 4 pixels from the image. d_C is a value in 0..255. I still don't know why MATLAB shows the following: when N holds values like x x x 140.0000 140.0000 and d_C holds x x x 140 x, D gives me 0 in the 4th position, so 140.0000 != 140. I debugged it with more precision, but it still says the value is 140.00000000000000, and it is still not 140.
int Codes::Interpolation( Point_<int> point, Point_<int> center , Mat *mat)
{
int x = center.x-point.x;
int y = center.y-point.y;
Point_<double> my;
if(x<0)
{
if(y<0)
{
my.x=center.x+LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x+LEN;
my.y=center.y-LEN;
}
}
else
{
if(y<0)
{
my.x=center.x-LEN;
my.y=center.y+LEN;
}
else
{
my.x=center.x-LEN;
my.y=center.y-LEN;
}
}
int a=my.x;
int b=my.y;
double tx = my.x - a;
double ty = my.y - b;
double wage[4];
wage[0] = (1 - tx) * (1 - ty);
wage[1] = tx * (1 - ty);
wage[2] = (1 - tx) * ty ;
wage[3] = tx * ty ;
int values[4];
// store in the array the 4 pixels that take part in the interpolation
for(int i=0;i<4;i++)
{
int val = mat->at<uchar>(Point_<int>(a+help[i].x,a+help[i].y));
values[i]=val;
}
double moze = (wage[0]) * (values[0]) + (wage[1]) * (values[1]) + (wage[2]) * (values[2]) + (wage[3]) * (values[3]);
return moze;
}
LEN = 0.707106781186547. The values in the array values are exactly the same as the MATLAB values.
Matlab uses double precision. You can use C++'s double type. That should make most things similar, but not 100%.
As someone else noted, this is probably not the source of your problem. Either there is a difference in the algorithms, or it might be something like a library function defined differently in Matlab and in C++. For example, Matlab's std() divides by (n-1) and your code may divide by n.
First, as a rule of thumb, it is never a good idea to compare floating-point variables directly. For example, instead of if (nr >= 104) you should use if (nr >= 104-e), where e is a small number, like 0.00001.
However, there must be some more serious error somewhere in your script or code, because getting 50.050 instead of 50.000 is far outside the limits of ordinary floating-point imprecision. For example, a MATLAB double carries about 15 significant decimal digits!
I guess there are some casting problems in your code, for example
int i;
double d;
// ...
d = i/3 * d;
will give a very inaccurate result, because you have an integer division. d = (double)i/3 * d or d = i/3. * d would give a much more accurate result.
The above example would NOT cause any problems in Matlab, because there everything is already a floating-point number by default, so a similar problem might be behind the differences in the results of the c++ and Matlab code.
Seeing your calculations would help a lot in finding what went wrong.
EDIT:
In C and C++, if you compare a computed double with an integer it is supposed to equal, there is a good chance that they will not be equal. It's the same with two doubles, but you might get lucky if you perform the exact same computations on them. Even in MATLAB it's dangerous, and maybe you were just lucky that, as both are doubles, both got rounded the same way.
From your recent edit, it seems that the problem is where you evaluate your array. You should never use == or != when comparing floats or doubles in C++ (or in any language when you use floating-point variables). The proper way to do a comparison is to check whether they are within a small distance of each other.
An example: using == or != to compare two doubles is like comparing the weight of two objects by counting the number of atoms in them, and deciding that they are not equal even if there is one single atom difference between them.
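A tolerance-based comparison of the kind described here might be sketched as follows. The function names and the default tolerances are assumptions chosen for illustration, and suitable tolerances depend on the magnitude of the data (pixel values in 0..255 in this question).

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true if a and b differ by no more than an absolute
// tolerance or a tolerance relative to the larger magnitude.
bool approx_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Hypothetical helper: a tolerant version of a >= b.
bool approx_greater_equal(double a, double b) {
    return a > b || approx_equal(a, b);
}
```

With these, an interpolated value that is one rounding step below 140 still satisfies approx_greater_equal(value, 140.0), whereas value >= 140.0 would be false.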
MATLAB uses double precision unless you say otherwise. Any differences you see with an identical implementation in C++ will be due to floating-point errors.
This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Why pow(10,5) = 9,999 in C++
(8 answers)
Closed 4 years ago.
I've found an interesting floating point problem. I have to calculate several square roots in my code, and the expression is like this:
sqrt(1.0 - pow(pos,2))
where pos goes from -1.0 to 1.0 in a loop. The -1.0 is fine for pow, but when pos=1.0, I get an -nan. Doing some tests, using gcc 4.4.5 and icc 12.0, the output of
1.0 - pow(pos,2) = -1.33226763e-15
and
1.0 - pow(1.0,2) = 0
or
poss = 1.0
1.0 - pow(poss,2) = 0
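Clearly the first one is going to cause problems, being negative. Does anyone know why pow is returning a number that makes the result smaller than 0? The full offending code is below:
#include <cassert>
#include <cmath>
#include <iomanip>
#include <iostream>
using namespace std;
int main() {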
double n_max = 10;
double a = -1.0;
double b = 1.0;
int divisions = int(5 * n_max);
assert (!(b == a));
double interval = b - a;
double delta_theta = interval / divisions;
double delta_thetaover2 = delta_theta / 2.0;
double pos = a;
//for (int i = 0; i < divisions - 1; i++) {
for (int i = 0; i < divisions+1; i++) {
cout<<sqrt(1.0 - pow(pos, 2)) <<setw(20)<<pos<<endl;
if(isnan(sqrt(1.0 - pow(pos, 2)))){
cout<<"Danger Will Robinson!"<<endl;
cout<< sqrt(1.0 - pow(pos,2))<<endl;
cout<<"pos "<<setprecision(9)<<pos<<endl;
cout<<"pow(pos,2) "<<setprecision(9)<<pow(pos, 2)<<endl;
cout<<"delta_theta "<<delta_theta<<endl;
cout<<"1 - pow "<< 1.0 - pow(pos,2)<<endl;
double poss = 1.0;
cout<<"1- poss "<<1.0 - pow(poss,2)<<endl;
}
pos += delta_theta;
}
return 0;
}
When you keep incrementing pos in a loop, rounding errors accumulate, and in your case the final value ends up greater than 1.0. Instead, calculate pos by multiplication on each iteration so that it carries only a minimal amount of rounding error.
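As a sketch of that suggestion, the position can be derived from the loop index instead of being accumulated. The function name position_at is made up for this example; a, b and divisions have the same meaning as in the question's code.

```cpp
#include <cmath>

// Hypothetical helper: position of step i out of `divisions` equal steps
// from a to b, computed directly from i so that errors do not accumulate.
double position_at(int i, double a, double b, int divisions) {
    return a + (b - a) * i / divisions;
}
```

Inside the loop this replaces pos += delta_theta with pos = position_at(i, a, b, divisions). For a = -1.0, b = 1.0 and divisions = 50, the last step evaluates to exactly 1.0, so sqrt(1.0 - pow(pos, 2)) is 0 rather than NaN.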
The problem is that floating-point calculations are not exact, so 1 - pos^2 may give a small negative result when pos has drifted slightly above 1, yielding an invalid sqrt computation.
Consider capping your result:
double x = 1. - pow(pos, 2.);
result = sqrt(x < 0 ? 0 : x);
or
result = sqrt(fabs(x) < 1e-12 ? 0 : x);
setprecision(9) is going to cause rounding. Use a debugger to see what the value really is. Short of that, at least set the precision beyond the possible size of the type you're using.
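One way to print enough digits, sketched here under the assumption that the value is a double, is to use std::numeric_limits<double>::max_digits10 (17 for IEEE 754 doubles). The helper name to_full_precision is hypothetical.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: formats d with enough significant digits that
// distinct doubles always produce distinct text.
std::string to_full_precision(double d) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    return out.str();
}
```

A value slightly above 1.0, such as 1.0000000000000007, prints as 1 with setprecision(9) but as 1.0000000000000007 with this helper, which makes the drift in pos visible without a debugger.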
You will almost always have rounding errors when calculating with doubles, because the double type has only about 15 significant decimal digits (a 52-bit stored mantissa) and a lot of decimal numbers cannot be converted to binary floating-point numbers without rounding. The IEEE 754 standard goes to great lengths to keep those errors small, but in principle it cannot always succeed. For a thorough introduction see this document
In your case, you should calculate pos on each loop iteration and round to 14 or fewer digits. That should give you a clean 0 for the sqrt.
You can calc pos inside the loop as
pos = round(a + interval * i / divisions, 14);
with round defined as
double round(double r, int digits)
{
double multiplier = pow(10, digits); // 10^digits
return floor(r*multiplier + 0.5)/multiplier;
}