Java-style printing of `double` in C++ - c++

Is there any combination of stream manipulators (or any other method in the standard C++) that would allow me to get the "right" number of digits when printing double in C++?
By the "right" number I mean the number of digits as defined here:
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
In a bit of a simplistic example, let's suppose that we have three double values: DD, D0 and D1. DD is the "middle", D1 has mantissa larger by 1, D0 smaller by 1.
When printed to some very large arbitrary precision, they produce the following values (the numbers in the example are completely off the wall):
D0 => 1.299999999701323987
DD => 1.300000000124034353
D1 => 1.300000000524034353
(EPSILON, the value of least significant bit of mantissa at 0 exponent, is ~ 0.0000000004)
In that case, the method above would produce
D0 => 1.2999999997
DD => 1.3
DD => 1.3000000005

If I understand you correctly, you want std::to_chars.
value is converted to a string as if by std::printf in the default ("C") locale. The conversion specifier is f or e (resolving in favor of f in case of a tie), chosen according to the requirement for a shortest representation: the string representation consists of the smallest number of characters such that there is at least one digit before the radix point (if present) and parsing the representation using the corresponding std::from_chars function recovers value exactly. If there are several such representations, one with the smallest difference to value is chosen, resolving any remaining ties using rounding according to std::round_to_nearest.
It's a low-level function, so printing the result requires some work:
run on gcc.godbolt.org
#include <charconv>
#include <iostream>
int main()
{
double val = 0.1234;
char buf[64];
*std::to_chars(buf, buf + sizeof buf, val).ptr = '\0';
std::cout << buf << '\n';
}
This function needs an up-to-date standard library: GCC (libstdc++) 11 or newer, or Clang (libc++) 14 or newer (currently trunk). MSVC also supports it.

Absent the C++17 library function that isn’t implemented everywhere yet, you can just try precisions (with whatever equivalent of %g) until one rounds back exactly. If you don’t care about subnormal numbers (where even two digits can be too many), start at 15 for IEEE 754 doubles; that never produces meaningless trailing digits. You’ll never need to go past 17.

Related

How to assign exact content of a string to a double in c++

I dont' have much experience with C++, I have the following problem.
With the following code :
double d = 0.0000000;
stringstream ss;
ss << std::fixed << std::setprecision( 2 ) << d;
ss >> d;
or
std::string content = ss.str();
d = atof( content.c_str() );
Either of the two ways while debuging in MS Visual Studio , I see the value of d is 0.0000000 not 0.00 as in the string content
How do I get exact content of string assigned to double d?
May be I should ask a broader question :
I am writing a method that returns a double with precision as needed. For example if I have 2.446343434 as value of d and precision is 2, how can I get my method return d as 2.45 ?
After reading the below answers : I came to know that it is not possible to do such thing. So the next question is :
Even if my above code tries to put 2.45 into double, the C++ runtime will append zeros ( how many ? ) to 2.45 and return right? Is there a way to control appending zeros to the double?
I see the value of d is 0.0000000 not 0.00 as in the string
But both of those numbers have exactly the same value. So, a number can have the value 0.0000000 if and only if it also has the value 0.00.
How do I get exact content of string assigned to double d?
You cannot. double represents a numeric value. It does not represent a character string.
Also, in more general, that is not possible because floating point numbers cannot represent all the numbers that can be represented by a character string. But that is not a problem with 0, which is indeed representable.
I am writing a method that returns a double with precision as needed. For example if I have 2.446343434 as value of d and precision is 2, how can I get my method return d as 2.45
You cannot get your method to return a double with the value 2.45 unless the format of the double can represent 2.45. The binary64 format specified by IEEE 754 can not represent 2.45. In such case, the best that you can do, is to return the representable number closest to the number with 2 significant fractional digits, which in the case of IEEE 754 would be 2.45000000000000017763568394003. The program in your question achieves that.
If that's not what you want, then floating point is not appropriate for your use case.
The human-readable value for d that you're seeing in the debugger, "0.0000000", is just a representation, and a fairly arbitrary one at that. The actual double object does not store this string, nor anything with a fixed number of decimal places.
Its actual identity, at the lowest level, is (a) in binary, (b) encoded according to the floating-point specification, and (c) irrelevant for your purposes. The value is zero; period.
The debugger has simply chosen to use seven decimal places when converting the number into something you can read with your eyes and brain. When using printf or std::cout to similarly output the number for reading, you can pick some other format if you like, including a format with two decimal places to match your original string input. That's just different ways of saying the same thing.
Do not confuse value with representation.
Also, your insistence on specifically two decimal places makes me suspicious: if you're planning on using double to store currency, just don't. Floating-point types are not appropriate for that.

printing very large floating point numbers

#include <stdio.h>
#include <float.h>
int main()
{
printf("%f\n", FLT_MAX);
}
Output from GNU:
340282346638528859811704183484516925440.000000
Output from Visual Studio:
340282346638528860000000000000000000000.000000
Do the C and C++ standards allow both results? Or do they mandate a specific result?
Note that FLT_MAX = 2^128-2^104 = 340282346638528859811704183484516925440.
I think the relevant part of the C99 standard is the "Recommended practice" from 7.19.6.1 p.13:
For e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most
DECIMAL_DIG, then the result should be correctly rounded. If the number of
significant decimal digits is more than DECIMAL_DIG but the source value is exactly
representable with DECIMAL_DIG digits, then the result should be an exact
representation with trailing zeros. Otherwise, the source value is bounded by two
adjacent decimal strings L < U, both having DECIMAL_DIG significant digits; the value
of the resultant decimal string D should satisfy L <= D <= U, with the extra stipulation that
the error should have a correct sign for the current rounding direction.
My impression is that this allows some leeway in what may be printed in this case; so my conclusion is that both VS and GCC are compliant here.
Both are allowed by the C standard (C++ just inports the C standard)
From a draft version in section 5.2.4.2.2 part 10
The values given in the following list shall be replaced by constant expressions with
implementation-defined values that are greater than or equal to those shown:
— maximum representable finite floating-point number, (1 − b −p)b emax
FLT_MAX 1E+37
and visual C++ 2012 has
#define FLT_MAX 3.402823466e+38F /* max value */
The code itself is flawed where it uses %f for a value larger than significance held in a float or double. By doing so you are asking to see "behind the curtain" as to the meaningless guard bits or other floating point noise generated in the conversion to decimal.
Clearly you should not expect any consistency in the metal filings generated after making an engine at Honda versus at Toyota. Nevermind any sensible expectation of such consistency.
The proper way to display such numbers is by using one of the "scientific" formats such as %g provided precision is not over-specified. On IEEE-754 implementations, 7 decimal figures are significant for float, 15-16 for double, about 19 for long double, and 34 for __float128. So, for the example you have given, %.15g would be proper, assuming it is on a IEEE-754 implementation.

C++ double operator+ [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Incorrect floating point math?
Float compile-time calculation not happening?
Strange stuff going on today, I'm about to lose it...
#include <iomanip>
#include <iostream>
using namespace std;
int main()
{
cout << setprecision(14);
cout << (1/9+1/9+4/9) << endl;
}
This code outputs 0 on MSVC 9.0 x64 and x86 and on GCC 4.4 x64 and x86 (default options and strict math...). And as far as I remember, 1/9+1/9+4/9 = 6/9 = 2/3 != 0
1/9 is zero, because 1 and 9 are integers and divided by integer division. The same applies to 4/9.
If you want to express floating-point division through arithmetic literals, you have to either use floating-point literals 1.0/9 + 1.0/9 + 4.0/9 (or 1/9. + 1/9. + 4/9. or 1.f/9 + 1.f/9 + 4.f/9) or explicitly cast one operand to the desired floating-point type (double) 1/9 + (double) 1/9 + (double) 4/9.
P.S. Finally my chance to answer this question :)
Use a decimal point in your calculations to force floating point math optionally along with one of these suffixes: f l F L on your numbers. A number alone without a decimal point and without one of those suffixes is not considered a floating point literal.
C++03 2.13.3-1 on Floating literals:
A floating literal consists of an
integer part, a decimal point, a
fraction part, an e or E, an
optionally signed integer exponent,
and an optional type suffix. The
integer and fraction parts both
consist of a sequence of decimal (base
ten) digits. Either the integer part
or the fraction part (not both) can be
omitted; either the decimal point or
the letter e (or E) and the exponent
(not both) can be omitted. The integer
part, the optional decimal point and
the optional fraction part form the
significant part of the floating
literal. The exponent, if present,
indicates the power of 10 by which the
significant part is to be scaled. If
the scaled value is in the range of
representable values for its type, the
result is the scaled value if
representable, else the larger or
smaller representable value nearest
the scaled value, chosen in an
implementation-defined manner. The
type of a floating literal is double
unless explicitly specified by a
suffix. The suffixes f and F specify
float, the suffixes l and L specify
long double. If the scaled value is
not in the range of representable
values for its type, the program is
ill-formed. 18
They are all integers. So 1/9 is 0. 4/9 is also 0. And 0 + 0 + 0 = 0. So the result is 0. If you want fractions, cast your fractions to floats.
1/9(=0)+1/9(=0)+4/9(=0) = 0
well, in C++ (and many other languages), 1/9+1/9+4/9 is zero, because it is integer arithmetic.
You probably want to write 1/9.0+1/9.0+4/9.0
Unless you specifically specify the decimal, the numbers C++ uses are integers, so 1/9 = 4/9 = 0 and 0 + 0 + 0 = 0.
You should simply add the decimal 1.0 etc...
By the C rules of types, you're doing all integer math there. 1/9 and 4/9 are both truncated to 0 (as integers). If you wrote 1.0/9.0 etc, it would use double precision math and do what you want.
You might make it a habit to use more parentheses. They cost little time, make clear what you intend, and ensure you get what you wanted. Well mostly... ;)

Fixed Length Float in C/C++?

I was wondering whether it is possible to limit the number of characters we enter in a float.
I couldn't seem to find any method. I have to read in data from an external interface which sends float data of the form xx.xx. As of now I am using conversion to char and vice-versa, which is a messy work-around. Can someone suggest inputs to improve the solution?
If you always have/want only 2 decimal places for your numbers, and absolute size is not such a big issue, why not work internally with integers instead, but having their meaning be "100th of the target unit". At the end you just need to convert them back to a float and divide by 100.0 and you're back to what you want.
This is a slight misunderstanding. You cannot think of a float or double as being a decimal number.
Most any attempt to use it as a fixed decimal number of precision, say, 2, will incur problems as some values will not be precisely equal to xxx.xx but only approximately so.
One solution that many apps use is to ensure that:
1) display of floating point numbers is well controlled using printf/sprintf to a certain number of significant digits,
2) one does not do exact comparison between floating point numbers, i.e. to compare to the 2nd decimal point of precision two numbers a, b : abs(a-b) <= epsilon should generally be used. Outright equality is dangerous as 0.01 might have multiple floating point values, e.g. 0.0101 and 0.0103 might result if you do arithmetic, but be indistinguishable to the user if values are truncated to 2 dp, and they may be logically equivalent to your application which is assuming 2dp precision.
Lastly, I would suggest you use double instead of float. These days there is no real overhead as we aren't doing floating point without a maths coprocessor any more! And a float under 32-bit architectures has 7 decimal points of precision, and a double has 15, and this is enough to be significant in many case.
Rounding a float (that is, binary floating-point number) to 2 decimal digits doesn't make much sense because you won't be able to round it exactly in some cases anyway, so you'll still get a small delta which will affect subsequent calculations. If you really need it to be precisely 2 places, then you need to use decimal arithmetic; for example, using IBM's decNumber++ library, which implements ISO C/C++ TR 24773 draft
You can limit the number of significant numbers to output:
http://www.cplusplus.com/reference/iostream/manipulators/setprecision/
but I don't think there is a function to actually lop off a certain number of digits. You could write a function using ftoa() (or stringstream), lop off a certain number of digits, and use atof() (or stringstream) and return that.
You should checks the string rather than the converted float. It will be easier to check the number of digits.
Why don't you just round the floats to the desired precision?
double round(double val, int decimalPlaces)
{
double power_of_10 = pow(10.0, static_cast<double>(decimalPlaces));
return floor(val * power_of_10 + 0.5) / power_of_10;
}
int main()
{
double d;
cin >> d;
// round d to 3 decimal places...
d = round(d, 3);
// do something with d
d *= 1.75;
cout << setprecision(3) << d; // now output to 3 decimal places
}
There exist no fixed point decimal datatype in C, but you can mimic pascal's decimal with a struct of two ints.
If the need is to take 5 digits [ including or excluding the decimal point ], you could simply write like below.
scanf( "%5f", &a );
where a is declared as float.
Fo eg:
If you enter 123.45, scanf will consider the first 5 characters i.e., 4 digits and the decimal point & will store 123.4
If entered 123456, the value of a will be 12345 [ ~ 12345.00 ]
With printf, we would be able to control how many characters can be printed after decimal as well.
printf( "%5.2f \n", a );
The value of 123.4 will be printed as 12.30 [ total 5, including the decimal & 2 digits after decimal ]
But this have a limitation, where if the digits in the value are more than 5, it will display the actual value.
eg: The value of 123456.7, will be displayed as 123456.70.
This [ specifying the no. of digits after the decimal, as mentioned for printf ] I heard can be used for scanf as well, I am not sure sure & the compiler I use doesn't support that format. Verify whether your compiler does.
Now, when it comes to taking data from an external interface, are you talking about serialization here, I mean transmission of data on netwrok.
Then, to my knowledge your approach is fine.
We generally tend to read in the form of char only, to make sure the application works for any format of data.
You can print a float use with printf("%.2f", float), or something similar.

double type digits in C++

The IEE754 (64 bits) floating point is supposed to correctly represent 15 significant digit although the internal representation has 17 ditigs. Is there a way to force the 16th and 17th digits to zero ??
Ref:
http://msdn.microsoft.com/en-us/library/system.double(VS.80).aspx :
.
.
Remember that a floating-point number can only approximate a decimal number, and that the precision of a floating-point number determines how accurately that number approximates a decimal number. By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally. The precision of a floating-point number has several consequences:
.
.
Example nos:
d1 = 97842111437.390091
d2 = 97842111437.390076
d1 and d2 differ in 16th and 17th decimal places that are not supposed to be significant. Looking for ways to force them to zero. ie
d1 = 97842111437.390000
d2 = 97842111437.390000
No. Counter-example: the two closest floating-point numbers to a rational
1.11111111111118
(which has 15 decimal digits) are
1.1111111111111799942818834097124636173248291015625
1.1111111111111802163264883347437717020511627197265625
In other words, there is not floating-point number that starts with 1.1111111111111800.
This question is a little malformed. The hardware stores the numbers
in binary, not decimal. So in the general case you can't do precise
math in base 10. Some decimal numbers (0.1 is one of them!) do not
even have a non-repeating representation in binary. If you have
precision requirements like this, where you care about the number
being of known precision to exactly 15 decimal digits, you will need
to pick another representation for your numbers.
No, but I wonder if this is relevant to any of your issues (GCC specific):
GCC Documentation
-ffloat-store Do not store floating point variables in registers, and
inhibit other options that might
change whether a floating point value
is taken from a register or memory.
This option prevents undesirable
excess precision on machines such as
the 68000 where the floating registers
(of the 68881) keep more precision
than a double is supposed to have.
Similarly for the x86 architecture.
For most programs, the excess
precision does only good, but a few
programs rely on the precise
definition of IEEE floating point. Use
-ffloat-store for such programs, after modifying them to store all pertinent
intermediate computations into
variables.
You should be able to directly modify the bits in your number by creating a union with a field for the floating point number and an integral type of the same size. Then you can access the bits you want and set them however you want. Here is in example where I whack the sign bit; you can choose any field you want, of course.
#include <stdio.h>
union double_int {
double fp;
unsigned long long integer;
};
int main(int argc, const char *argv[])
{
double my_double = 1325.34634;
union double_int *my_union = (union double_int *)&my_double;
/* print original numbers */
printf("Float %f\n", my_double);
printf("Integer %llx\n", my_union->integer);
/* whack the sign bit to 1 */
my_union->integer |= 1ULL << 63;
/* print modified numbers */
printf("Negative float %f\n", my_double);
printf("Negative integer %llx\n", my_union->integer);
return 0;
}
Generally speaking, people only care about something like this ("I only want the first x digits") when displaying the number. That's relatively easy with stringstreams or sprintf.
If you're concerned about comparing numbers with ==; you really can't do that with floating point numbers. Instead you want to see if the numbers are close enough (say, within an epsilon() of each other).
Playing with the bits of the number directly isn't a great idea.