In my question about Analysis of float/double precision in 32 decimal digits, one answer said to take a look at __float128.
I used it and the compiler could find it, but I cannot print it, since the compiler cannot find the header quadmath.h.
So my questions are:
Is __float128 standard?
How do I print it?
Isn't quadmath.h standard?
These answers did not help:
Use extern C
Precision in C++
Printing
The ref also did not help.
Note that I do not want to use any non-standard library.
[EDIT]
It would also be useful if that question had an answer, even a negative one.
Work in GNU Fortran! It lets you run the same program in different precisions: single (32 bit), double (64 bit), extended (80 bit) and quad (128 bit). You don't have to change the program; you simply write 'real' for all floating-point variables. The size of the floating-point type is set by the compiler options -freal-4-real-8, -freal-4-real-10 and -freal-4-real-16.
Using the Boost library was the best answer for me:
#include <iostream>
#include <boost/multiprecision/float128.hpp>
using namespace boost::multiprecision;
int main()
{
    // 'q' suffix: GCC extension for a __float128 literal
    // (needs -std=gnu++11 or later, or -fext-numeric-literals)
    float128 su1 = 0.33333333333333333q;
    std::cout << "su1=" << su1 << std::endl;
}
Remember to link this library:
-lquadmath
No, it's not standard: neither the type nor the header. That's why the type name starts with a double underscore (a reserved name). Apparently, quadmath.h provides a quadmath_snprintf function. In C++ you would have used <<, of course.
Running the following
#include <iostream>
#include <complex>
int main()
{
std::complex<double> i (0,1);
std::complex<double> comp =pow(i, 2 );
std::cout<<comp<<std::endl;
return 0;
}
gives me the expected result (-1,0) without c++11. However, compiling with c++11 gives the highly annoying (-1,1.22461e-016).
What to do, and what is best practice?
Of course this can be fixed manually by flooring etc., but I would like to know the proper way of addressing the problem.
SYSTEM: Win8.1, using Desktop Qt 5.1.1 (Qt Creator) with MinGW 4.8 32 bit. Using c++11 by adding the flag QMAKE_CXXFLAGS += -std=c++11 in the Qt Creator .pro file.
In C++11 we have a few new overloads of pow(std::complex). GCC has two nonstandard overloads on top of that, one for raising to an int and one for raising to an unsigned int.
One of the new standard overloads (namely std::complex</*Promoted*/> pow(const std::complex<T> &, const U &)) causes an ambiguity when calling pow(i, 2) with the non-standard ones. Their solution is to #ifdef the non-standard ones out in the presence of C++11 and you go from calling the specialized function (which uses successive squaring) to the generic method (which uses pow(double,double) and std::polar).
You need to get into a different mode when you are using floating point numbers. Floating points are APPROXIMATIONS of real numbers.
1.22461e-016 is
0.000000000000000122461
An engineer would say that IS zero. You will always get such variations (unless you stick to operations on sums of powers of 2 within the same general range).
A value as simple as 0.1 cannot be represented exactly with floating-point numbers.
The general problem you present has two parts:
1. Dealing with floating-point numbers in processing
2. Displaying floating-point numbers.
For the processing, I would wager that doing:
comp = i * i;
would give you what you want.
pow(x, y) is, in effect, going to compute
exp(log(x) * y)
For output, switch to a fixed (F) format, such as std::fixed in C++ or %f in printf.
I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed to an 80 bit extended precision floating point register, at some point it has to be rounded to 64 bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that defines how excess precision should happen in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points related to the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines FP_CONTRACT in math.h that you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, striking from a completely different side (the modern fused-multiply-add instruction instead of the old 387 instruction set). This is a pragma to decide whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions (this has the effect that the multiplication is virtually computed at infinite precision, because this is how this instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions). This pragma has apparently not been incorporated in the C++ standard (as far as I can see).
The default state of this pragma is implementation-defined, and some people argue that the default should allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
In C, you should future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type when the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, which was held in an 80-bit floating-point register, to be spilled to a 64-bit slot on the stack.
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
double a = atof("1.2345678");
double b = a*a;
printf("%.20e\n", b - 1.52415765279683990130);
return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero. Otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases for gcc-4.8.2. For g++-4.8.2 I get an error for -fexcess-precision=standard if I try that, and without a flag I get the same behavior as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So now the suspicion already voiced by Pascal is official: g++ does not necessarily round everywhere it should.
Is there a way to use decimal data types such as decimal32, decimal64 or decimal128 in my C++ programs?
The classes from the Decimal TR are not implemented for all compilers. Some compilers, e.g., gcc, implement the C Decimal TR and provide the corresponding extensions in C++, too. In the past there was an open source implementation for the C++ Decimal TR available but I failed to locate it. If your compiler doesn't support the decimal types, your best option is probably to create a wrapper for IBM's decNumber library.
To improve the situation in the future of C++, I have created a plan to update the TR, and I'm going to turn the current TR into a complete proposal ready for the next C++ committee meeting (in April in Bristol), trying to get it adopted into the C++ standard, possibly into the revision planned for 2014. The implementation I have is part of my regular work, and it isn't up to me to decide whether it can be made available publicly, although there is some hope that it can be open-sourced at some point.
You can use an easy-to-use, header-only solution for C++ with templates:
https://github.com/vpiotr/decimal_for_cpp
Notice that this is not a *Big*Decimal class; it is limited to 64 bits' worth of "mantissa" digits.
[taken from link]
#include <iostream>
#include "decimal.h"
using namespace dec;
using std::cout;
using std::endl;

int main()
{
    // the following declares currency variable with 2 decimal points
    // initialized with integer value (can be also floating-point)
    decimal<2> value(143125);
    // to use non-decimal constants you need to convert them to decimal
    value = value / decimal_cast<2>(333.0);
    // output values
    cout << "Result is: " << value << endl;
    // this should display something like "429.80"
    // to mix decimals with different precision use decimal_cast
    decimal<6> exchangeRate(12.1234);
    value = decimal_cast<2>(decimal_cast<6>(value) * exchangeRate);
    cout << "Result 2 is: " << value << endl;
    // this should display something like "5210.64"
    cout << "Result 2<6> is: " << decimal_cast<6>(value) << endl;
    // this should display something like "5210.640000"
}
Use an int32 or int64 and (manually) shift the decimal point to where you want it to be. If you're measuring dollars, for example, just measure cents instead and display the value differently. Simple!
Boost has cpp_dec_float as well. That's probably the best solution until it's adopted into the standard.
https://www.boost.org/doc/libs/1_68_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/cpp_dec_float.html
EDIT: This library uses floating-point values in the implementation, so it is not a true decimal math library IMO.
GCC and Clang (usually) come with their own decimal floating-point implementations, provided your distro compiles them into the GCC/Clang version it ships (this is not the case for some ARM distros I tried). This is why you sometimes need a custom decimal type implementation. Try mine for ideas (tested on everything from i586 to aarch64).
My compiler says that the decimal.h library does not exist:
/tmp/TQDyfEvEXQ.cpp:2:10: fatal error: decimal.h: No such file or directory
2 | #include <decimal.h>
| ^~~~~~~~~~~
compilation terminated.
This code:
#include <iostream>
int main( int, char **argv )
{
std::cout << 1.23e45 << std::endl;
}
prints
1.23e+045
when compiled with MS Visual Studio 2003, and
1.23e+45
on my Linux machine.
How can I specify the width of the exponent field (and why is there a difference in the first place)?
I don't think this is possible with standard manipulators. (If it is, I'd love to be corrected and learn how.)
Your only remaining option is to create a streambuf yourself that intercepts every exponent going to the stream, reformats it by hand, and passes it on to the underlying stream.
That seems like a lot of work; while not rocket science, it is no trivial task either.
On the 'why' question: I know Linux prints the exponent with a minimum of two digits; I suppose Windows specifies a minimum of three?
// on Linux
std::cout << std::scientific << 1.23e4 << std::endl;
This also pads the exponent with a leading zero:
1.230000e+04
As a follow-up to @Pieter's answer, I've been looking inside operator<<(ostream&, double). Indeed, there is no field to specify the significance or width of the exponent. On Windows, operator<< forwards to sprintf, which has no exponent-size option either.
In its turn, the sprintf function (on Windows) calls into _cfltcvt_l, a function we have no source code for, but whose signature doesn't provide for an exponent precision.
I know nothing of the implementation on Linux.
Look into the <iomanip> header. It has a lot of functionality for controlling width, precision, etc.