Can somebody explain to me why ColdFusion (tested on 2016, 2018 and 2021) is doing a wrong double-to-long conversion? I know it can mess things up for fractional values, but in this example it is clearly an integer value.
This is the code:
<cfoutput>
<cfset a = 69.35>
#getMetadata(a)# #a#<br/>
<cfset b = a * 100>
#getMetadata(b)# #b#<br/>
<cfset c = int(b)>
#getMetadata(c)# #c#<br/>
</cfoutput>
And this is the output:
class coldfusion.runtime.CFDouble 69.35
class java.lang.Double 6935
class java.lang.Long 6934
It is "sort-of" fixable by doing this:
<cfset d = int(javacast("string", b))>
#getMetadata(d)# #d#<br/>
Returning
class java.lang.Long 6935
But I'm not really satisfied by this "solution"...
Thanks!
EDIT:
Because ColdFusion runs on top of Java, I guess this is what's responsible for it:
public static void main(String[] args)
{
    double a = 69.35;
    double b = a * 100;
    System.out.println(b);
    long c = (int)b;
    System.out.println(c);
    long d = Math.round(b);
    System.out.println(d);
}
Output:
6934.999999999999
6934
6935
And most likely, ColdFusion is using int() and not round() to convert the double value to a long... This is one of the "nice" side effects of a typeless programming language, which internally makes a mess of it. Makes me think of JavaScript ;-)
EDIT 2:
As Cameron pointed out, there is a difference between #b# and #b.ToString()#. The former returns 6935, while the latter returns 6934.999999999999. This is confusing in my opinion, but I'll keep it in the back of my head in case I run into another strange problem with double/long values :-)
And to make it even a bit more confusing:
int(ToString(b)) is returning 6935 while int(b.ToString()) is returning 6934...
<cfset a = 69.35>
#getMetadata(a)# #a#<br/>
<cfset b = a * 100>
#getMetadata(b)# #b#<br/>
#b.toString()# #ToString(b)#<br/>
Is returning:
class java.lang.String 69.35
class java.lang.Double 6935
6934.999999999999 6935
So, don't assume that b.ToString() is the same as ToString(b) ...
As @SOS touches on in their comment (not sure why they did not make it an "answer"?), the issue is not the conversion. The issue is that ColdFusion is displaying 69.35 * 100 as equalling 6935, which it isn't. And even ColdFusion doesn't really think it is.
As far as most computing languages are concerned, 69.35 * 100 is 6934.999999999999 (check on JS, Python, Ruby etc if you like), due to issues with the inherent inaccuracy of representing decimal fractional values in a system that stores stuff in binary. I've written about this before: Floating point arithmetic with decimals.
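A quick C++ check of the same claim (any language with IEEE-754 doubles will show it):

#include <cstdio>

int main() {
    // 69.35 has no exact binary representation, so the product lands
    // just below 6935.
    double b = 69.35 * 100;
    std::printf("%.12f\n", b); // 6934.999999999999
}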
Internally ColdFusion is storing the result as 6934.999999999999:
<cfset f = 69.35 * 100>
<cfoutput>#f.toString()#</cfoutput>
This yields:
6934.999999999999
So when you use int to take the integer portion of 6934.999999999999, you get 6934. That part is actually doing the job correctly! ;-)
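The same behaviour in C++ terms, a minimal sketch assuming CF's int() truncates the way std::trunc does:

#include <cmath>
#include <cstdio>

int main() {
    double b = 69.35 * 100;               // stored as 6934.999999999999
    std::printf("%.0f\n", std::trunc(b)); // 6934 (what int() does)
    std::printf("%.0f\n", std::round(b)); // 6935 (what round() would do)
}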
I know I'm kind of late to the game with this answer, but here's what I've used in the past when encountering precision issues in ColdFusion with mathematical calculations on currency. To avoid precision errors, I've always wrapped my calculations in the precisionEvaluate() function. Using it with your sample code:
<cfoutput>
<cfset a = 69.35>
#getMetadata(a)# #a#<br/>
<cfset b = precisionEvaluate(a * 100)>
#getMetadata(b)# #b#<br/>
<cfset c = int(b)>
#getMetadata(c)# #c#<br/>
</cfoutput>
The resulting output looks like this. As you can see, precisionEvaluate() returns a BigDecimal instead of a Double, which avoids the precision issue.
class coldfusion.runtime.CFDouble 69.35
class java.math.BigDecimal 6935.00
class java.lang.Long 6935
Related
Firstly, I realise that most base 10 numbers cannot be precisely expressed in base 2, and so my question isn't really about the deficiencies of floating point arithmetic.
I am trying to write a function that will attempt to correct a double tainted by cumulative rounding error, by checking whether the last 6 meaningful digits are within some tolerance and, if so, changing it to the next representable double above some supposed exact value (only for display purposes, unless it is an integer or a power of two).
A component of my function that surprises me, though, is the output of exp10. As far as I'm aware, so long as the spacing between two doubles is less than 2, integer values stored as doubles should be exact - and though 10^14 is pushing it, it should be an exact integer (since 10^14 =~ 2^46.507 < 2^53). However, this is not what my testing shows.
An excerpt of my debugging efforts (nothing stands out as obvious) and output is as follows:
/* note: exp10 is a GNU extension (define _GNU_SOURCE before including <math.h>) */
double test = 0.000699;
double tmp = fabs(test);
double exp = 10.0 - floor(log10(tmp));
double powTen = exp10(10.0 - floor(log10(tmp)));
double powTen2 = exp10(exp);
double powTen3 = exp10((int)exp);
double powTen4 = exp10(exp);
double powTen5 = pow(10, exp);
printf("exp: %.16lf\n", exp);
printf("powTen: %.16lf\n", powTen);
printf("powTen2: %.16lf\n", powTen2);
printf("powTen3: %.16lf\n", powTen3);
printf("powTen4: %.16lf\n", powTen4);
//these two are exact
printf("10^14: %.16lf\n", exp10(14));
printf("powTen5: %.16lf\n", powTen5);
printf("exp == 14.0: %d\n", exp == 14.0);
output:
exp: 14.0000000000000000
powTen: 100000000000000.1250000000000000
powTen2: 100000000000000.1250000000000000
powTen3: 100000000000000.1250000000000000
powTen4: 100000000000000.1250000000000000
10^14: 100000000000000.0000000000000000
powTen5: 100000000000000.0000000000000000
exp == 14.0: 1
pow is getting the answer exact, as is exp10 with a hardcoded int. All the other cases come out 1/8 too high (the spacing between 10^14 and the next representable double is 1/64, so that is an error of 8 ulps).
The documentation says that exp10 should be equivalent to pow. Can anyone see something I'm missing?
Edit: with -O1, -O2 or -O3 optimisation I am getting the expected outputs - unless the data cannot be known until runtime, at which point exp10 is still misbehaving.
It is probable that your exp10 implementation is misbehaving. Note that the results it's returning are off by several ulps (0.125 is 8 ulps relative to your 10^14, where the spacing is 1/64).
This is a rather heinous error; you've got a case where the correct answer is representable as a double yet exp10 isn't doing so.
I'd echo Ben Voigt's comment that the compiler may sometimes evaluate things itself instead of passing them off to the math library. It probably does a better job, since it likely uses an arbitrary-precision library at compile time. You might experiment with the -fno-builtin option to see whether it changes anything.
Unfortunately, I don't think crlibm has implemented exp10. Otherwise I'd recommend you just use that and stop worrying.
EDIT: The copy of eglibc source I have seems to implement exp10 thus:
double
__ieee754_exp10 (double arg)
{
  /* This is a very stupid and inprecise implementation.  It'll get
     replaced sometime (soon?).  */
  return __ieee754_exp (M_LN10 * arg);
}
Don't expect this to work well.
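A sketch of why that construction loses accuracy (this emulates the idea in C++; it is not the glibc code itself):

#include <cmath>
#include <cstdio>

int main() {
    // Emulating exp10(x) as exp(ln(10) * x): ln(10) is already rounded
    // to a double and the product rounds again, so exp() receives a
    // slightly wrong argument and the result is typically a few ulps
    // away from the exact power of ten.
    double naive = std::exp(std::log(10.0) * 14.0);
    double exact = std::pow(10.0, 14.0); // a good libm gets this exact
    std::printf("naive: %.4f\nexact: %.4f\n", naive, exact);
}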
I'm trying to print the number 684.545007 with 2 digits of precision, in the sense that the number is truncated (not rounded) after 684.54.
When I use
double var = 684.545007;
printf("%.2f\n", var);
it outputs 684.55, but what I'd like to get is 684.54.
Does anyone know how I can correct this?
What you're looking for is truncation. This should work (at least for numbers that aren't terribly large):
printf(".2f", ((int)(100 * var)) / 100.0);
The conversion to integer truncates the fractional part.
In C++11 or C99, you can use the dedicated function trunc for this purpose (from the header <cmath> or <math.h>). This will avoid the restriction to values that fit into an integral type.
std::trunc(100 * var) / 100 // no need for casts
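For completeness, a self-contained version of that approach:

#include <cmath>
#include <cstdio>

int main() {
    // Truncate (not round) to two decimal places before printing.
    double var = 684.545007;
    std::printf("%.2f\n", std::trunc(100 * var) / 100); // 684.54
}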
Here is my approach. It seems ugly but works in most cases, e.g. var can be larger than int, can be zero, or can be a bizarre '-0'. It does not handle infinities and NaNs though.
double var = 684.545007; // or whatever
double var_trunc = var>=0. ? floor(var*100.)/100. : ceil(var*100.)/100.;
printf ("%g\n", var_trunc);
printf("%.2f\n", var - 0.005);
I'm often using the wrong literals in expressions, e.g. dividing a float by an int, like this:
float f = read_f();
float g = f / 2;
I believe that the compiler will in this case first convert the int literal (2) to float, and then apply the division operator. GCC and Clang have always let stuff like that pass, but Visual C++ warns about an implicit conversion. So I have to write it like this:
float f = read_f();
float g = f / 2.0f;
That got me wondering: Should I always use the appropriate literals for float, double, long etc.? I normally use int literals whenever I can get away with it, but I'm not sure if that's actually a good idea.
Is this a likely cause of subtle errors?
Is this only an issue for expressions or also for function parameters?
Are there warning levels for GCC or Clang that warn about such implicit conversions?
How about unsigned int, long int etc?
You should always explicitly indicate the type of literal that you intend to use. This will prevent problems when, for example, this sort of code:
float foo = 9.0f;
float bar = foo / 2;
changes to the following, truncating the result:
int foo = 9;
float bar = foo / 2;
It's a concern with function parameters as well when you have overloading and templates involved.
I know gcc has -Wconversion but I can't recall everything that it covers.
For integer values that fit in an int, I usually don't qualify them as long or unsigned, as there is usually much less chance of subtle bugs there.
There's pretty much never an absolutely correct answer to a "should" question. Who's going to use this code, and for what? That's relevant here. But also, particularly for anything to do with floats, it's good to get into the habit of specifying exactly the operations you require: float * float is done in single precision, while anything involving a double is done in double precision. With f / 2 the literal is converted to float, whereas with f / 2.0 the whole division is carried out in double, so you're specifying different operations.
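A small illustration of how the literal's type selects the operation (the exact digits are the usual IEEE-754 values, so your platform may vary in the last place):

#include <cstdio>

int main() {
    float f = 1.0f;
    // f / 3   : the int literal converts to float -> single-precision divide
    // f / 3.0 : f converts to double -> double-precision divide
    std::printf("%.17g\n", (double)(f / 3)); // ~0.3333333432674408
    std::printf("%.17g\n", f / 3.0);         // ~0.33333333333333331
}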
The best answer here is What Every Computer Scientist Should Know About Floating-Point Arithmetic. I'd say don't tl;dr it, there are no simple answers with floating point.
First, I understand that the double type in C++ has been discussed many times, but I wasn't able to answer my question after searching.
The simplified version of my question is: I got three different results (a=-0.926909, a=-0.926947 and a=-0.926862) when I computed a=b-c+d with three different approaches and the same values of b, c and d, and I don't know which one to trust.
The detailed version of my question is:
I was recently writing a program (in C++ on Ubuntu 10.10) to handle some data. One function looks like this:
void calc() {
    double a, b;
    ...
    a = b - c + d; // c, d are global variables of double
    ...
}
When I was using GDB to debug the above code, during a call to calc(), I recorded the values of b, c and d before the statement a = b - c + d as follows:
b = 54.7231
c = 55.4051
d = -0.244947
After the statement a = b - c + d executed, I found that a=-0.926909 instead of the -0.926947 calculated by a calculator. Well, so far it is not too confusing yet, as I guess this might just be a precision problem. Later on, I re-implemented another version of calc() for some reason. Let's call this new version calc_new(). calc_new() is almost the same as calc(), except for how and where b, c and d are calculated:
void calc_new() {
    double a, b;
    ...
    a = b - c + d; // c, d are global variables of double
    ...
}
This time when I was debugging, the values of b, c and d before the statement a = b - c + d were the same as when calc() was debugged: b = 54.7231, c = 55.4051, d = -0.244947. However, this time after the statement a = b - c + d executed, I got a=-0.926862. That is, I got three different values of a when I computed a = b - c + d with the same values of b, c and d. I think the differences between a=-0.926862, a=-0.926909 and a=-0.926947 are not small, but I cannot figure out the cause. And which one is correct?
With Many Thanks,
Tom
If you expect the answer to be accurate in the 5th and 6th decimal place, you need to know exactly what the inputs to the calculation are in those places. You are seeing inputs with only 4 decimal places; you need to display their 5th and 6th places as well. Then I think you would see a comprehensible situation that matches your calculator to 6 decimal places. Double has more than sufficient precision for this job; there would only be precision problems here if you were taking the difference of two very similar numbers (you're not).
Edit: Unsurprisingly, increasing the display precision would have also shown you that calc() and calc_new() were supplying different inputs to the calculation. Credit to Mike Seymour and Dietmar Kuhl in the comments who were the first to see your actual problem.
Let me try to answer the question I suspect that you meant to ask. If I have mistaken your intent, then you can disregard the answer.
Suppose that I have the numbers u = 500.1 and v = 5.001, each accurate to four significant digits. What then is w = u + v? Exactly, w = 505.101, but to four significant digits it's w = 505.1.
Now consider x = w - u = 5.000, which should equal v, but doesn't quite.
If I only change the order of operations however, I can get x to equal v exactly, not by x = w - u or by x = (u + v) - u, but by x = v + (u - u).
Is that trivial? Yes, in my example it is; but the same principle applies in your example, except that there they aren't decimal digits but bits of precision.
In general, to maintain precision, if you have some floating-point numbers to sum, you should try to add the small ones together first, and only bring the larger ones into the sum later.
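Here is the same effect with binary doubles, a minimal C++ sketch in which a small addend is lost or kept depending on grouping:

#include <cstdio>

int main() {
    // At 1e16 the spacing between adjacent doubles is 2, so adding 1.0
    // is rounded away in one grouping but survives in the other.
    double big = 1e16, small = 1.0;
    double x = (big + small) - big; // small is absorbed first: 0
    double y = small + (big - big); // small survives: 1
    std::printf("%g %g\n", x, y);   // prints: 0 1
}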
We're arguing about smoke here. If nothing changed in the environment, an expression like:
a = b + c + d
MUST ALWAYS RETURN THE SAME VALUE IF INPUTS AREN'T CHANGED.
No rounding errors. No esoteric pragmas, nothing at all.
If you check your bank account today and tomorrow (and nothing changed in that time) I suspect you'll go crazy if you see something different. We're speaking about programs, not random number generators!!!
The correct one is -0.926947.
The differences you see are far too large for rounding errors (even in single precision), as one can check with an IEEE-754 floating-point encoder.
When using the encoder, you need to enter them like this: -55.926909 (to account for the operation-ordering effects nicely described in previously submitted answers). Additionally, a difference in just the last significant bit may well be due to rounding effects, but you will not see any with your values.
When using the tool, the 64-bit format (Binary64) corresponds to your implementation's double type.
Rational numbers do not always have a terminating expansion in a given base. 1/3 cannot be expressed in a finite number of digits in base ten. In base 2, rational numbers with a denominator that is a power of two have a terminating expansion; the rest don't. So 1/2, 1/4, 3/8, 7/16... any number that looks like x/(2^n) can be represented accurately. That turns out to be a fairly sparse subset of the rational numbers. Everything else is subject to the errors introduced by trying to represent an infinite number of binary digits within a finite container.
But addition is commutative, right? Yes - and in floating point a + b still equals b + a exactly. What rounding breaks is associativity: the grouping of the additions matters. With a = b + c + d as an example, let's say that d cannot be expressed in a finite number of binary digits. Neither can c. So adding them together gives us some inaccurate value, which itself may also be incapable of being represented in a finite number of binary digits. So error on top of error. Then we add that value to b, which may also not be a terminating expansion in binary. So taking one inaccurate result and adding it to another inaccurate number results in another inaccurate number. And because we're throwing away precision at every step, a different grouping can throw away different bits and produce a different result.
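A minimal C++ demonstration of grouping changing the rounded result:

#include <cstdio>

int main() {
    // The same three addends, grouped differently.
    double x = (0.1 + 0.2) + 0.3; // 0.60000000000000009
    double y = 0.1 + (0.2 + 0.3); // 0.59999999999999998
    std::printf("%.17g\n%.17g\n", x, y);
}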
There's a post I made (Perl-related, but it's a universal topic): Re: Shocking Imprecision (PerlMonks), and of course the canonical What Every Computer Scientist Should Know About Floating-Point Arithmetic, both of which discuss the topic. The latter is far more detailed.
I have a function getSlope which takes 4 doubles as parameters and returns another double calculated from the given parameters in the following way:
double QSweep::getSlope(double a, double b, double c, double d) {
    double slope;
    slope = (d-b)/(c-a);
    return slope;
}
The problem is that when calling this function with arguments for example:
getSlope(2.71156, -1.64161, 2.70413, -1.72219);
the returned result is:
10.8557
and this is not a good result for my computations.
I have calculated the slope using Mathematica and the result for the slope for the same parameters is:
10.8452
or with more digits for precision:
10.845222072678331.
The result returned by my program is not good in my further computations.
Moreover, I do not understand how the program arrives at 10.8557 starting from 10.845222072678331 (supposing that this is the approximate result of the division).
How can I get the good result for my division?
thank you in advance,
madalina
I print the result using the command line:
std::cout << slope << std::endl;
It may be that my parameters are not good, as I read them from another program (which computes a graph; after reading these parameters from the graph, I just displayed them to see their values, but maybe the displayed values do not have the same internal precision as the calculated ones... I do not know, it is really strange. Some numerical errors appear...)
When the graph from which I am reading my parameters is computed, some numerical libraries written in C++ (with templates) are used. No OpenGL is used for this computation.
thank you,
madalina
I've tried with float instead of double and I get 10.845110 as a result. It still looks better than madalina's result.
EDIT:
I think I know why you get these results. If you get the a, b, c and d parameters from somewhere else and print them, you get rounded values. Then if you put those into Mathematica (or calc ;) ) it will give you a different result.
I tried changing one of your parameters a little bit. When I did:
double c = 2.7041304;
I get 10.845806. I only added 0.0000004 to c!
So I think your "errors" aren't errors. Print a, b, c and d with better precision and then put them into Mathematica.
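A sketch of printing the stored value exactly - 17 significant digits are enough to round-trip any double:

#include <cstdio>

int main() {
    // The "nice" 6-digit input reveals the value the double
    // actually stores when printed with full precision.
    double c = 2.70413;
    std::printf("%.17g\n", c); // e.g. 2.7041299999999998
}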
The following code:
#include <iostream>
using namespace std;

double getSlope(double a, double b, double c, double d) {
    double slope;
    slope = (d-b)/(c-a);
    return slope;
}

int main() {
    double s = getSlope(2.71156, -1.64161, 2.70413, -1.72219);
    cout << s << endl;
}
gives a result of 10.8452 with g++. How are you printing out the result in your code?
Could it be that you use DirectX or OpenGL in your project? If so, they can turn off double precision and you will get strange results.
You can check your precision settings with
std::sqrt(x) * std::sqrt(x)
The result has to be pretty close to x.
I met this problem a long time ago and spent a month checking all the formulas. But then I found
D3DCREATE_FPU_PRESERVE
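A minimal sketch of that sqrt round-trip check (the printed error is approximate and platform-dependent):

#include <cmath>
#include <cstdio>

int main() {
    // With normal double precision the round-trip error is ~1e-16;
    // if something has dropped the FPU to single precision it grows
    // to roughly 1e-7.
    double x = 2.0;
    std::printf("%.17g\n", std::sqrt(x) * std::sqrt(x) - x);
}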
The problem here is that (c-a) is small, so the rounding errors inherent in floating point operations are magnified in this example. A general solution is to rework your equation so that you're not dividing by a small number; I'm not sure how you would do it here, though.
EDIT:
Neil is right in his comment on this question; I computed the answer in VB using Doubles and got the same answer as Mathematica.
The results you are getting are consistent with 32-bit arithmetic. Without knowing more about your environment, it's not possible to advise what to do.
Assuming the code shown is what's running, i.e. you're not converting anything to strings or floats, then there isn't a fix within C++. It's outside of the code you've shown, and depends on the environment.
As Patrick McDonald and Treb both brought up the accuracy of your inputs and the error on (c-a), I thought I'd take a look at that. One technique for examining rounding errors is interval arithmetic, which makes explicit the upper and lower bounds on the value a number represents (they are implicit in floating point numbers, fixed by the precision of the representation). By treating each value as an upper and lower bound, and by extending the bounds by the error in the representation (approx. x * 2^-53 for a double value x), you get a result which gives the lower and upper bounds on the accuracy of a value, taking into account worst-case precision errors.
For example, if you have a value in the range [1.0, 2.0] and subtract from it a value in the range [0.0, 1.0], then the result must lie in the range [below(0.0),above(2.0)] as the minimum result is 1.0-1.0 and the maximum is 2.0-0.0. below and above are equivalent to floor and ceiling, but for the next representable value rather than for integers.
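In C++, the below/above operations correspond to std::nextafter - a minimal sketch:

#include <cmath>
#include <cstdio>

int main() {
    // The adjacent representable doubles on either side of 2.0.
    double x = 2.0;
    std::printf("%.17g\n", std::nextafter(x, -INFINITY)); // 1.9999999999999998
    std::printf("%.17g\n", std::nextafter(x, INFINITY));  // 2.0000000000000004
}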
Using intervals which represent worst-case double rounding:
getSlope(
a = [2.7115599999999995262:2.7115600000000004144],
b = [-1.6416099999999997916:-1.6416100000000002357],
c = [2.7041299999999997006:2.7041300000000005888],
d = [-1.7221899999999998876:-1.7221900000000003317])
(d-b) = [-0.080580000000000526206:-0.080579999999999665783]
(c-a) = [-0.0074300000000007129439:-0.0074299999999989383218]
to double precision [10.845222072677243474:10.845222072679954195]
So although c-a is small compared to c or a, it is still large compared to double rounding, so even with the worst imaginable double precision rounding you could trust that value to be precise to 12 figures - 10.8452220727. You've lost a few figures off double precision, but you're still working to more than your inputs' significance.
But if the inputs were only accurate to the number of significant figures given, then rather than being the double value 2.71156 +/- eps, the input range would be [2.711555, 2.711565], so you get the result:
getSlope(
a = [2.711555:2.711565],
b = [-1.641615:-1.641605],
c = [2.704125:2.704135],
d = [-1.722195:-1.722185])
(d-b) = [-0.08059:-0.08057]
(c-a) = [-0.00744:-0.00742]
to specified accuracy [10.82930108:10.86118598]
which is a much wider range.
But you would have to go out of your way to track the accuracy in the calculations, and the rounding errors inherent in floating point are not significant in this example - it's precise to 12 figures with the worst case double precision rounding.
On the other hand, if your inputs are only known to 6 figures, it doesn't actually matter whether you get 10.8557 or 10.8452. Both are within [10.82930108:10.86118598].
Better print out the arguments, too. When you are, as I guess, transferring parameters in decimal notation, you lose precision for each and every one of them. The problem is that 1/5 is an infinite series in binary, so e.g. 0.2 becomes 0.00110011... Also, decimals are chopped when converting a binary float to a textual representation in decimal.
Next to that, sometimes the compiler chooses speed over precision; this should be a documented compiler switch.
Patrick seems to be right about (c-a) being the main cause:
d-b = -1.72219 - (-1.64161) = -0.08058
c-a = 2.70413 - 2.71156 = -0.00743
S = (d-b)/(c-a) = -0.08058 / -0.00743 = 10.845222
You start out with six digits of precision; through the subtractions you get a reduction to three and four digits respectively. My best guess is that you lose additional precision because the number -0.00743 cannot be represented exactly in a double. Try using intermediate variables with a bigger precision, like this:
double QSweep::getSlope(double a, double b, double c, double d)
{
    double slope;
    long double temp1, temp2;
    temp1 = (d-b);
    temp2 = (c-a);
    slope = temp1/temp2;
    return slope;
}
While the academic discussion going on is great for learning about the limitations of programming languages, you may find the simplest solution to the problem is a data structure for arbitrary-precision arithmetic. This will have some overhead, but you should be able to find something with fairly guaranteed accuracy.