JavaCast float rounds generously - coldfusion

I calculate sums and write them into an Excel sheet that I generate with the POI library in ColdFusion. Since the Java library expects typed variables, I always call setCellValue( JavaCast( "float", myVar ) ). I was made aware of a rounding error of .03, which is way bigger than the differences typically expected after casting to float.
<cfset s = 601761.66>
<cfoutput>
#s#<br>
#JavaCast( "float", s )#<br>
#LSNumberFormat( JavaCast( "float", s ), ".0000000" )#<br><br>
</cfoutput>
The first line prints 601761.66
The second rounds to 601761.7
The third, however, prints 601761,6875000, which rounds to 601761,69 and is .03 larger than the value I put in.
I know LSNumberFormat returns a string; I called it just for comparison. POI seems to store the float value, and Excel eventually displays it the same way LSNumberFormat does.
How can I pass a value to setCellValue that is close enough to my value that at least the second digit after the decimal is rounded correctly?

Short answer:
Use type double instead of float, i.e. javacast("double", value)
Longer answer:
The Cell.setCellValue() method actually expects type Double (not Float). Double is also what CF uses for most numeric operations and functions. When you pass a Float into those methods, it is implicitly converted to a Double, and that conversion is (indirectly) what causes the unexpected result.
The reason is that both Float and Double are approximate types. However, Double has greater precision:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. ...
double: The double data type is a double-precision 64-bit IEEE 754 floating point. ...
So as this thread points out (emphasis mine):
It's not that you're actually getting extra precision - it's that the
float didn't accurately represent the number you were aiming for
originally. The double is representing the original float accurately;
toString is showing the "extra" data which was already present.
... [When converted to a double, it] will have exactly the same value, but when you convert it to a string it will "trust" that it's accurate to a higher precision, so won't round off as early, and you'll see the "extra digits" which were already there, but hidden from you
That is why both "601761.66" and "601761.6875" seem to be rounded to "601761.7" when cast as a float, but are displayed as expected when cast as a double.
<cfscript>
value1 = "601761.66";
value2 = "601761.6875";
WriteOutput("<br>[Float] "& value1 &" = "& javacast("float", value1));
WriteOutput("<br>[Float] "& value2 &" = "& javacast("float", value2));
WriteOutput("<br>[Float=>Double] "& value1 &" = "& javacast("double", javacast("float", value1)));
WriteOutput("<br>[Double] "& value1 &" = "& javacast("double", value1));
WriteOutput("<br>[Double] "& value2 &" = "& javacast("double", value2));
</cfscript>
Output:
[Float] 601761.66 = 601761.7
[Float] 601761.6875 = 601761.7
[Float=>Double] 601761.66 = 601761.6875
[Double] 601761.66 = 601761.66
[Double] 601761.6875 = 601761.6875
NB: CF uses Float.toString() and Double.toString() to display the values via cfoutput/writeOutput/cfdump.
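
The same effect can be reproduced outside ColdFusion. As a sanity check, here is a small C++ sketch (my own illustration, not part of the original answer) showing that the nearest float to 601761.66 really is 601761.6875, while the nearest double round-trips through text as the value you expect:

#include <iomanip>
#include <iostream>

int main() {
    float  f = 601761.66f;   // nearest float:  601761.6875
    double d = 601761.66;    // nearest double: close enough to print as 601761.66
    std::cout << std::setprecision(10) << f << "\n"   // 601761.6875
              << std::setprecision(10) << d << "\n";  // 601761.66
}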

Related

In C++ : how to print the digits after the decimal.

In C++ : how to print the digits after the decimal.
For example, I have this float number (12.54), and I want to print it like this (0.54).
Thank you all.
You can use the modf function from <cmath>:
double integral_part;
double fractional = modf(some_double, &integral_part); // both parts keep the sign of some_double
You can also cast it to an integer, but be warned that the value may overflow the integer; the result is then unpredictable (undefined behavior).
The simplest way
float f = 10.123;
float fract = f - (int)f;
std::cout << fract;
But for large inputs the (int) cast can overflow. In that case use
float fract = f - truncf(f);
Output
0.123
In C++ : how to print the digits after the decimal. For example, I have this float number (12.54), and I want to print it like this (0.54).
If you want to get the fractional part of a floating point number you have a choice of std::floor or std::trunc. Non-negative numbers are treated the same by either, but negative numbers are not.
std::floor returns the largest integral value not greater than its argument, while std::trunc rounds towards 0.
double f = 1.23;
f - floor(f); // yields .23
f - trunc(f); // also yields .23
However
double f=-1.23;
floor(f); // yields -2
trunc(f); // but yields -1
So use trunc to get the fractional part for both positive and negative f's:
double f=-1.23;
f - floor(f); // yields .77
f - trunc(f); // but yields -.23
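
For completeness, here is a small self-contained program (my own consolidation of the snippets above, not from the original answer) that prints both variants for a positive and a negative input:

#include <cmath>
#include <iostream>

int main() {
    for (double f : {1.23, -1.23}) {
        std::cout << "f = " << f
                  << ", f - floor(f) = " << f - std::floor(f)   // 0.23, then 0.77
                  << ", f - trunc(f) = " << f - std::trunc(f)   // 0.23, then -0.23
                  << "\n";
    }
}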

Convert float to string with precision and without tolerance

How can you convert float to string with specified precision without tolerance?
For example, with precision 6 get the following result.
40.432 -> 40.432000.
In a string the only value that I can get is 40.431999.
The problem is that you're using the float data type, which only has a precision of at most 7.22 total digits and sometimes as little as 6, while you're trying to display 8 total digits (2 before the decimal and 6 after). As noted in the comments, the closest possible binary float to 40.432 is 40.43199920654296875; the second closest would be 40.432003021240234375.
You can get more digits by converting to the larger double type. Once you've done that you can round to the nearest 6-digit number. Note that if the float was generated by a calculation, rounding may actually create a less accurate result.
If you always know your numbers will be between 10 and 100, this simple code will work. Otherwise you'll need a more complex process to determine the appropriate amount of rounding.
#include <cmath>    // std::round
#include <iomanip>  // std::setprecision
#include <iostream>

float f = 40.432f;
double d = f;
double r = std::round(d * 10000.0) / 10000.0; // 2 digits before the decimal, 4 after
std::cout << std::fixed << std::setprecision(6) << r;
Note that the last 2 digits will always be zero because of the rounding.
See it in action: http://coliru.stacked-crooked.com/a/f085e56c03ebeb73
How can you convert float to string with specified precision
You can use a string stream:
#include <iomanip>  // std::setprecision
#include <sstream>  // std::ostringstream

std::ostringstream strs;
strs << std::fixed << std::setprecision(6) << the_value;
std::string str = strs.str();
In a string the only value that I can get is 40.431999.
Your problem may be that there exists no exact representation for 40.432 in the floating point format that your system uses. Since you can never have a floating point value of exactly 40.432, you can never convert such a value to a string.
It just so happens that the closest representable value to 40.432 is closer to 40.431999 than it is to 40.432.
You need to:
Either accept that 40.432 ~~ 40.431999
Or use a floating point format that is precise enough to have a representation for 40.432 that is closer to 40.432 than it is to 40.431999, and is also precise enough for all other numbers for which you have a specific expected value. IEEE 754 double precision floating point happens to have a representable value closer to 40.432 than 40.431999.
Or stop using floating point. You won't have problems like this if you use fixed point or arbitrary precision data types.
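As a quick illustration of the second option above (a minimal sketch using the precision-6 formatting from earlier), the double representation is close enough that it rounds to the expected string, while the float representation is not:

#include <iomanip>
#include <iostream>
#include <sstream>

int main() {
    float  f = 40.432f;   // nearest float:  40.43199920654296875
    double d = 40.432;    // nearest double: much closer to 40.432
    std::ostringstream sf, sd;
    sf << std::fixed << std::setprecision(6) << f;
    sd << std::fixed << std::setprecision(6) << d;
    std::cout << sf.str() << "\n"   // 40.431999
              << sd.str() << "\n";  // 40.432000
}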

How are these double precision values accurate to 20 decimals?

I am testing some very simple equivalence errors when precision is an issue and was hoping to perform the operations in extended double precision (so that I knew what the answer would be in ~19 digits) and then perform the same operations in double precision (where there would be roundoff error in the 16th digit), but somehow my double precision arithmetic is maintaining 19 digits of accuracy.
When I perform the operations in extended double, then hardcode the numbers into another Fortran routine, I get the expected errors, but is there something strange going on when I assign an extended double precision variable to a double precision variable here?
program code_gen
  implicit none

  integer, parameter :: Edp = selected_real_kind(17)
  integer, parameter :: dp = selected_real_kind(8)

  real(kind=Edp) :: alpha10, x10, y10, z10
  real(kind=dp) :: alpha8, x8, y8, z8
  real(kind=dp) :: pi_dp = 3.1415926535897932384626433832795028841971693993751058209749445

  integer :: iter
  integer :: niters = 10

  print*, 'tiny(x10) = ', tiny(x10)
  print*, 'tiny(x8) = ', tiny(x8)
  print*, 'epsilon(x10) = ', epsilon(x10)
  print*, 'epsilon(x8) = ', epsilon(x8)

  do iter = 1,niters
    x10 = rand()
    y10 = rand()
    z10 = rand()
    alpha10 = x10*(y10+z10)

    x8 = x10
    x8 = x8 - pi_dp
    x8 = x8 + pi_dp
    y8 = y10
    y8 = y8 - pi_dp
    y8 = y8 + pi_dp
    z8 = z10
    z8 = z8 - pi_dp
    z8 = z8 + pi_dp
    alpha8 = alpha10

    write(*, '(a, es30.20)') 'alpha8 .... ', x8*(y8+z8)
    write(*, '(a, es30.20)') 'alpha10 ... ', alpha10

    if( alpha8 .gt. x8*(y8+z8) ) then
      write(*, '(a)') 'ERROR(.gt.)'
    elseif( alpha8 .lt. x8*(y8+z8) ) then
      write(*, '(a)') 'ERROR(.lt.)'
    endif
  enddo

end program code_gen
where rand() is the gfortran function found here.
If we are speaking about only one precision type (take, for example, double), then we can denote machine epsilon as E16 which is approximately 2.22E-16. If we take a simple addition of two Real numbers, x+y, then the resulting machine expressed number is (x+y)*(1+d1) where abs(d1) < E16. Likewise, if we then multiply that number by z, the resulting value is really (z*((x+y)*(1+d1))*(1+d2)) which is nearly (z*(x+y)*(1+d1+d2)) where abs(d1+d2) < 2*E16. If we now move to extended double precision, then the only thing that changes is that E16 turns to E20 and has a value of around 1.08E-19.
My hope was to perform the analysis in extended double precision so that I could compare two numbers which should be equal but show that, on occasion, roundoff error will cause comparisons to fail. By assigning x8=x10, I was hoping to create a double precision 'version' of the extended double precision value x10, where only the first ~16 digits of x8 conform to the values of x10, but upon printing out the values, it shows that all 20 digits are the same and the expected double precision roundoff error is not occurring as I would expect.
It should also be noted that before this attempt, I wrote a program which actually writes another program where the values of x, y, and z are 'hardcoded' to 20 decimal places. In this version of the program, the comparisons of .gt. and .lt. failed as expected, but I am not able to duplicate the same failures by casting an extended double precision value as a double precision variable.
In an attempt to further 'perturb' the double precision values and add roundoff error, I have subtracted and then added pi to my double precision variables, which should leave them with some double precision roundoff error, but I am still not seeing that in the final result.
As the gfortran documentation you link states, the function result of rand is a default real value (single precision). Such a value can be represented exactly by each of your other real types.
That is, x10=rand() assigns a single precision value to the extended precision variable x10. It does so exactly. This same value now stored in x10 is assigned to the double precision variable x8, but this remains exactly representable as double precision.
There is sufficient precision in the single-as-double that the calculations using double and extended types return the same value. [See the note at the end of this answer.]
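To see why the single-as-double claim holds, here is a small C++ sketch (my own illustration with a made-up value, not part of the original answer): a single precision value converts exactly to both wider types, so both print identical digits.

#include <cstdio>

int main() {
    float s = 0.84018772f;   // stand-in for a value produced by a single precision rand()
    double d = s;            // exact: every float is exactly representable as a double
    long double e = s;       // exact as well
    std::printf("%.20f\n%.20Lf\n", d, e);   // both lines show the same digits
}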
If you wish to see real effects of loss of precision, then start by using an extended or double precision value. For example, rather than using rand (returning a single precision value), use the intrinsic random_number
call random_number(x10)
(which has the advantage of being standard Fortran). Unlike a function, which in (nearly) all cases returns a value of a fixed type regardless of the end use of the value, this subroutine will give you a precision corresponding to the argument. You will (hopefully) see much the same as you did from your "hard-coded" experiment.
Alternatively, as agentp commented, it may be more intuitive to start with a double precision value
call random_number(x8); x10=x8 ! x8 and x10 have the precision of double precision
call random_number(y8); y10=y8
call random_number(z8); z10=z8
and perform the calculations from that starting point: those extra bits will then start to show.
In summary, when you do x8=x10 the leading bits of x8 do correspond to those of x10, but because the value originated as single precision, many of those bits, and all of those that follow in x10, are zero.
When it comes to your pi_dp perturbation, you are again assigning a single precision value (this time a literal constant) to a double precision variable. Just having all those digits doesn't make it anything other than a default real literal. You can specify a literal of a particular kind with a suffix such as _dp or _Edp, as described in other answers.
Finally, one also then has to worry about what the compiler does with regards to optimization.
My thesis is that starting from the single precision value, the calculations performed are representable exactly in both double and extended precision (with the same values). For other calculations, or from a starting point with more bits set, or representations (for example, on some systems or with other compilers the numeric type with kind selected_real_kind(17) may have quite different characteristics such as a different radix) that needn't be the case.
This was largely based on guessing and hoping it explained the observation. Fortunately, there are ways to test the idea. As we're talking about IEEE arithmetic we can consider the inexact flag: if that flag isn't raised during the computation, we can be happy.
With gfortran there is the compilation option -ffpe-trap=inexact, which makes the inexact flag signalling. With gfortran 5.0 the intrinsic module ieee_exceptions is supported, which can be used in a portable/standard manner.
You can consider this flag for further experimentation: if it is raised then you can expect to see differences between the two precisions.
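For readers more comfortable outside Fortran, the same idea can be sketched in C++ with <cfenv> (my own illustration, not equivalent to the gfortran option): clear the inexact flag, do a computation, and test whether it was raised.

#include <cfenv>
#include <cstdio>

int main() {
    std::feclearexcept(FE_INEXACT);
    volatile double x = 1.0, y = 3.0;
    volatile double q = x / y;             // 1/3 has no exact binary representation, so this rounds
    if (std::fetestexcept(FE_INEXACT))
        std::puts("inexact flag raised: the result was rounded");
    (void)q;
}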

Not getting expected results with pow() in C++

I am doing a homework assignment for my C++ class and am trying to implement what I consider a fairly simple math formula, but I'm getting stumped because it is not returning what I expect. Basically, it's just an APR problem and I am trying to convert from APR to monthly interest.
monthlyInterest = ((APR / 100) + 1)^(1/12)
So this is the code I am using. I separated yearlyInterest from the monthlyInterest calculation because I was making sure I wasn't making a simple mistake:
double balance, payment, APR;
cin >> balance >> payment >> APR;
const double yearlyInterest = (APR / 100) + 1;
const double monthlyInterest = pow(yearlyInterest, 1/12);
Using the inputs 1000, 100, and 19.9, the results I get in my physical calculator, and what I am expecting are:
yearlyInterest ~1.19
monthlyInterest ~1.015...
But the result my debugger is giving me is:
yearlyInterest ~1.19
monthlyInterest = 1
So basically, I am asking why my monthlyInterest is incorrect. I don't believe it is a type issue because pow() returns a double. Also, I don't think the value should overflow the double type; it's not that big, and I only need a few digits of accuracy anyway. If anyone can help me determine the mistake I made, I would appreciate it.
Sidenote: I have included the following headers. I use these rather than the alternatives because they are what I learned on. If this is a problem, I can change it.
#include <iostream>
#include <cmath>
In this line:
const double monthlyInterest = pow(yearlyInterest, 1/12)
1/12 gets truncated to 0 (because it's integer division) and the result is 1. Replace with:
const double monthlyInterest = pow(yearlyInterest, 1.0/12)
Mind the decimal point.
1/12 performs integer division, which yields 0.
If you explicitly use floating point literals you'll get your expected results.
Instead of 1/12 use 1/12. (notice the decimal point).
APR is already a double, so APR/100 is floating point division, but writing APR/100. (again with the decimal point) makes the intent explicit.
One thing is clear and that is that 1/12 == 0 (try 1./12).
Danger, Will Robinson! 1/12 is not the same as 1.0/12.0.
You need to make the expression 1/12 a floating point expression to force floating point division instead of integer division.
change:
const double monthlyInterest = pow(yearlyInterest, 1/12);
to:
const double monthlyInterest = pow(yearlyInterest, 1.0/12);
const double monthlyInterest = pow(yearlyInterest, 1/12);
integer division of 1/12 truncates the fractional result to 0. Thus the above line would be processed as
const double monthlyInterest = pow(yearlyInterest, 0);
and a number raised to the 0 power equals 1. Thus in your case monthlyInterest is assigned the value 1.
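Putting it together, a minimal compilable version of the corrected calculation might look like this (a sketch with the question's sample input hard-coded instead of read from cin):

#include <cmath>
#include <iostream>

int main() {
    double APR = 19.9;  // example input from the question
    const double yearlyInterest  = (APR / 100) + 1;                     // ~1.199
    const double monthlyInterest = std::pow(yearlyInterest, 1.0 / 12);  // ~1.0152, not 1
    std::cout << yearlyInterest << " " << monthlyInterest << "\n";
}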

Why do I get two different outputs here?

The following two pieces of code produce two different outputs.
//this one gives incorrect output
cpp_dec_float_50 x = log(2);
std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits)<< x << std::endl;
The output it gives is
0.69314718055994528622676398299518041312694549560547
which is only correct up to the 15th decimal place. Had x been a double, even then we'd have got the first 15 digits correct. It seems that the result is overflowing, though I don't see why it should. cpp_dec_float_50 is supposed to have 50 digits of precision.
//this one gives correct output
cpp_dec_float_50 x = 2;
std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits)<< log(x) << std::endl;
The output it gives is
0.69314718055994530941723212145817656807550013436026
which is correct according to WolframAlpha.
When you do log(2), you're using the implementation of log in the standard library, which takes a double and returns a double, so the computation is carried out to double precision.
Only after that's computed (to, as you noted, a mere 15 digits of precision) is the result converted to your 50-digit extended precision number.
When you do:
cpp_dec_float_50 x=2;
/* ... */ log(x);
You're passing an extended precision number to start with, so (apparently) an extended precision overload of log is being selected, so it computes the result to the 50 digit precision you (apparently) want.
This is really just a complex version of:
float a = 1 / 2;
Here, 1 / 2 is integer division because the parameters are integers. It's only converted to a float to be stored in a after the result is computed.
C++ rules for how to compute a result do not depend on what you do with that result. So the actual calculation of log(2) is the same whether you store it in an int, a float, or a cpp_dec_float_50.
Your second bit of code is the equivalent of:
float b = 1;
float c = 2;
float a = b / c;
Now, you're calling / on a float, so you get floating point division. C++'s rules do take into account the types of arguments and parameters. That's complex enough, and trying to also take into account what you do with the result would make C++'s already overly-complex rules incomprehensible to mere mortals.
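
To make the fix concrete, here is a short sketch (assuming Boost.Multiprecision is available; the variable names are my own) showing both calls side by side:

#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iomanip>
#include <iostream>
#include <limits>

using boost::multiprecision::cpp_dec_float_50;

int main() {
    cpp_dec_float_50 lossy = std::log(2);               // double-precision log, widened afterwards
    cpp_dec_float_50 exact = log(cpp_dec_float_50(2));  // found by ADL: the 50-digit overload
    std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits)
              << lossy << "\n" << exact << "\n";
}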