C++ long long to string conversion fails [closed]

I have the following function which has to convert numbers of any size to strings:
string NumberToString(float number) {
    ostringstream NumberInStream;
    NumberInStream.str("");
    NumberInStream.clear();
    NumberInStream << setprecision(0);
    NumberInStream << fixed << number;
    return NumberInStream.str();
}
The function works fine for numbers of up to 9 digits.
But when I input a 10-digit number, e.g. 1234567890, it returns the wrong value.
Some examples:
1494978929 became 1494978944
1494979474 became 1494979456
1494979487 became 1494979456
1494979498 became 1494979456
1494979500 became 1494979456
1494979529 became 1494979584
1494979540 became 1494979584
However,
2 became 2
120 became 120
44567 became 44567
456.45 became 456 because of setprecision(0)

Welcome to floating-point precision. Try the same function with double in the prototype instead and you will see that you get the results you want. However, even that will fail once the input integers get long enough.
Just look at the output of this:
printf("%f\n", 1494978929.f);
and you'll see that you cannot represent that integer as a float with full precision. Call the same with .0 instead of .f at the end and you'll see a different result.
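A minimal compilable sketch of that comparison (my own example, assuming an ordinary IEEE-754 32-bit float):

#include <cstdio>

int main() {
    std::printf("%f\n", 1494978929.f);  // float literal:  prints 1494978944.000000
    std::printf("%f\n", 1494978929.0);  // double literal: prints 1494978929.000000
    return 0;
}

At this magnitude the spacing between representable floats is 128, so the float literal gets rounded to the nearest multiple of 128.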

This is a consequence of IEEE floating-point representation: https://en.wikipedia.org/wiki/IEEE_floating_point. Your question says you are converting a long long, but your function takes a float as an argument.
A float is stored in 32 bits of memory. Depending on the implementation, the bits are split into three fields: one bit for the sign (s), a group of bits for the significand (c), and the remaining bits for the exponent (q). The value encoded is (-1)^s * c * b^q, where b is the base (usually 2 or 10). How all of this is laid out depends on your compiler and on which IEEE format it follows. What this means is that every number a float holds has to fit this formula with a limited number of significand bits.
Relatively small integers fit exactly, but very large (or very finely grained) values do not, and get rounded to the nearest representable value. In your situation, the 10-digit integers in your examples need more significand bits than a float has. I recommend using a double or long double for these, or better, a long long as mentioned above, since you are converting integers and do not need a floating-point type at all.
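As an illustration (my own sketch, not from the original answer), here is code that pulls those three fields out of a float, using one of the values from the question; it assumes the common IEEE-754 binary32 layout:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = 1494978929.0f;                       // a 10-digit integer from the question
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);           // reinterpret the 32 bits

    std::uint32_t sign     = bits >> 31;           // 1 sign bit
    std::uint32_t exponent = (bits >> 23) & 0xFF;  // 8 exponent bits, biased by 127
    std::uint32_t mantissa = bits & 0x7FFFFF;      // 23 stored significand bits

    std::printf("sign=%u  exponent=%u (unbiased %d)  mantissa=0x%06X\n",
                (unsigned)sign, (unsigned)exponent, (int)exponent - 127, (unsigned)mantissa);
    std::printf("value actually stored: %.1f\n", f);  // prints 1494978944.0
    return 0;
}

With only 24 effective significand bits, the nearest value a float can store is 1494978944, which is exactly the wrong output reported in the question.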

Yep, float is 32 bits, and part of that is the exponent, so a float has less precision than a normal int; it buys its wide range by giving up some bits of precision to the exponent.
Either use double, which gives you more precision (but still not enough for a full long long), or make it a template:
template <typename T>
string NumberToString(T number) {
    ostringstream NumberInStream;
    NumberInStream.str("");
    NumberInStream.clear();
    NumberInStream << setprecision(0);
    NumberInStream << fixed << number;
    return NumberInStream.str();
}
That might still need some tweaking, but it won't lose precision from squeezing your value into a type like float or double that has fewer bits of precision than the number you started with.
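For completeness, a small compilable sketch of the template in use (the redundant str("")/clear() calls dropped, plus the headers it needs):

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

template <typename T>
string NumberToString(T number) {
    ostringstream NumberInStream;
    NumberInStream << setprecision(0) << fixed << number;
    return NumberInStream.str();
}

int main() {
    cout << NumberToString(1494978929LL) << "\n";  // "1494978929" - stays exact as a long long
    cout << NumberToString(456.45) << "\n";        // "456" - fixed/setprecision(0) drop the fraction
    return 0;
}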

Related

How to perform sum between double with bitwise operations [closed]

I'd like to know how floating-point addition works.
How can I sum two double (or float) numbers using bitwise operations?
Short answer: if you need to ask, you are not going to implement floating-point addition from bitwise operators. It is completely possible, but there are a number of subtle points that you would need to ask about first. You could start by implementing a double → float conversion function; it is simpler but would introduce you to many of the concepts. You could also do double → nearest integer as an exercise.
Nevertheless, here is the naive version of addition:
Use large arrays of bits for each of the two operands (254 + 23 for float, 2046 + 52 for double). Place the significand at the right place in the array according to the exponent. Assuming the arguments are both normalized, do not forget to place the implicit leading 1. Add the two arrays of bits with the usual rules of binary addition. Then convert the resulting array to floating-point format: first look for the leftmost 1; the position of this leftmost 1 determines the exponent. The significand of the result starts right after this leading 1 and is respectively 23- or 52-bit wide. The bits after that determine whether the value should be rounded up or down.
Although this is the naive version, it is already quite complicated.
The non-naive version does not use 2100-bit wide arrays, but takes advantage of a couple of “guard bits” instead (see section “on rounding” in this document).
The additional subtleties include:
The sign bits of the arguments can mean that the magnitudes should be subtracted for an addition, or added for a subtraction.
One of the arguments can be NaN. Then the result is NaN.
One of the arguments can be an infinity. If the other argument is finite or the same infinity, the result is the same infinity. Otherwise, the result is NaN.
One of the arguments can be a denormalized number. In this case there is no leading 1 when transferring the number to the array of bits for addition.
The result of the addition can be an infinity: depending on the details of the implementation, this would be recognized as an exponent too large to fit the format, or an overflow during the addition of the binary arrays (the overflow can also occur during the rounding step).
The result of the addition can be a denormalized number. This is recognized as the absence of a leading 1 in the first 2046 bits of the array of bits. In this case the last 52 bits of the array should be transferred to the significand of the result, and the exponent should be set to zero, to indicate a denormalized result.
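None of this is code from the answer above, but as a hedged starting point, here is a sketch of only the very first step it describes: splitting each operand into sign, exponent, and significand, including the denormal case from the list (it assumes the usual IEEE-754 binary64 layout):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Split a double into the fields the algorithm above manipulates.
static void split(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    std::uint64_t sign = bits >> 63;                 // 1 sign bit
    std::uint64_t exp  = (bits >> 52) & 0x7FF;       // 11 exponent bits, biased by 1023
    std::uint64_t sig  = bits & 0xFFFFFFFFFFFFFULL;  // 52 stored significand bits
    if (exp != 0)                                    // normalized number:
        sig |= 1ULL << 52;                           //   restore the implicit leading 1
    // a denormal (exp == 0) keeps no leading 1, exactly as noted in the list above
    std::printf("%g -> sign %llu, biased exponent %llu, significand 0x%llX\n",
                d, (unsigned long long)sign, (unsigned long long)exp,
                (unsigned long long)sig);
}

int main() {
    split(1.5);      // 1.1 (binary) * 2^0
    split(0.375);    // 1.1 (binary) * 2^-2
    split(5e-324);   // smallest denormal: exponent field 0, no implicit 1
    return 0;
}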

How would I find 100! accurately? [Programming challenge] [duplicate]

This question already has answers here:
Calculate the factorial of an arbitrarily large number, showing all the digits
(11 answers)
I tried using double, but it gives me answers in scientific notation like 3.2e+12. I need the exact answer. How would I do that?
My code so far:
#include <iostream>
using namespace std;

int main() {
    int n, x;
    double fact;
    cin >> n;                // number of test cases
    while (n--) {
        fact = 1;
        cin >> x;
        for (; x > 1; x--)
            fact *= x;       // loses precision once the product exceeds 2^53
        cout << fact << endl;
    }
}
First things first: floating-point formats such as double and float introduce rounding error once a value no longer fits in their precision. If you want to avoid that error for large integers, use long or long long; however, these cannot hold values as large as double or long double can (and note that behavior and support for long long and long double vary between compilers). You might want to look into BigNums such as a bigint or bigdouble class, though you will sacrifice performance.
That said, part of this issue is also formatting: the number is large enough that it is printed in scientific notation. To change this you can use
cout<<std::fixed;
possible duplicate of How to make C++ cout not use scientific notation
double is a fixed-size type, typically 64 bits, with 53 bits of precision; so it can't accurately represent any integer with more than about 16 digits. The largest standard integer type typically has 64 bits, and can represent integers up to about 19 digits. 100! is much larger than that, so can't be represented accurately by any built-in type.
You'll need a large integer type, representing a number as an array of smaller numbers. There's no standard type; you could use a library like Boost.Multiprecision or GMP or, since this is a programming challenge, implement it yourself. To calculate factorials, you'll need to implement multiplication; the easiest way to do that is with the "long multiplication" algorithm you learnt in school.
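To make that concrete, here is a hedged sketch of the approach (my own code, not from the answers): the big integer is kept as a vector of decimal digits, least significant first, and multiplied by each k in turn with school long multiplication. One digit per element is used for clarity; real libraries use much larger limbs.

#include <cstddef>
#include <cstdio>
#include <vector>

std::vector<int> factorial(int n) {
    std::vector<int> digits{1};                    // start from 0! = 1
    for (int k = 2; k <= n; ++k) {
        int carry = 0;
        for (std::size_t i = 0; i < digits.size(); ++i) {
            int prod = digits[i] * k + carry;      // long multiplication by a small int
            digits[i] = prod % 10;
            carry = prod / 10;
        }
        while (carry) {                            // append whatever carry is left
            digits.push_back(carry % 10);
            carry /= 10;
        }
    }
    return digits;
}

int main() {
    std::vector<int> d = factorial(100);
    for (std::size_t i = d.size(); i-- > 0; )      // print most significant digit first
        std::printf("%d", d[i]);
    std::printf("\n");                             // 100! has 158 digits
    return 0;
}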
There's no built-in data type that can store a number as big as 100!.
You would have to write something like a BigInteger class to calculate 100!;
usually such big numbers are stored as strings (or arrays of digits).

Why are the digits after the decimal point all zero?

I want to perform some calculations and I want the result correct up to some decimal places, say 12.
So I wrote a sample:
#define PI 3.1415926535897932384626433832795028841971693993751
double d, k, h;
k = 999999/(2*PI);
h = 999999;
d = PI*k*k*h;
printf("%.12f\n", d);
But it gives the output:
79577232813771760.000000000000
I even used setprecision(), but I get the same answer, just in exponential form.
cout<<setprecision(12)<<d<<endl;
prints
7.95772328138e+16
Used long double also, but in vain.
Now is there any way other than storing the integer part and the fractional part separately in long long int types?
If so, what can be done to get the answer precisely?
A double has only about 16 decimal digits of precision. Your result has 17 digits before the decimal point, so everything after the decimal point is meaningless. (In fact, the last digit or two before the point may not agree with an infinite-precision calculation.)
The precision of long double is not standardized, AFAIK. It may be that on your system it is the same as double, or no more precise. That would slightly surprise me, but it doesn't violate anything.
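One easy way to check what your own platform gives you (my own sketch, not part of the answer above):

#include <iostream>
#include <limits>

int main() {
    std::cout << "double      digits10 = " << std::numeric_limits<double>::digits10 << "\n";
    std::cout << "long double digits10 = " << std::numeric_limits<long double>::digits10 << "\n";
    // Typically prints 15 and 18 where long double is the 80-bit x87 format;
    // on compilers where long double is the same as double, both print 15.
    return 0;
}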
You need to read up on double-precision concepts again, more carefully.
A double gets its increased precision from using 64 bits, but that precision covers the whole number: digits before the decimal point consume significant digits just like digits after it.
So when you have a large integer part, the low-order digits are rounded away; this is what the other answers here describe as rounding off.
Update:
To increase precision, you'll need to use some library or change your language.
Check this other question: Best coding language for dealing with large numbers (50000+ digits)
Yet, I'll ask you to re-check your intent once more.
Do you really need 12 decimal places for numbers that have really high values
(over 10 digits in the integer part like in your example)?
Maybe you won't really have large integer parts
(in which case such code should work fine).
But if you are tracking a value like 10000000000.123456789,
I am really interested in exactly which application you are working on (astronomy?).
If the integer part of your values is some way under 10000, you should be fine here.
Update2:
If you must demonstrate the ability of a specific formula to work accurately within constrained error limits, the way to go is rearranging the processing of your formula so that the least error is introduced.
For example, if you want to compute (x * y) / z,
it is prudent to try something like max(x,y)/z * min(x,y)
rather than the original form, which may overflow after (x * y), losing precision if the product does not fit in the roughly 16 significant digits of a double.
If you had just 2-digit precision:

                 2-digit     regular precision
42 * 7           290         294
(42 * 7)/2       290/2       294/2
Result ==>       145         147

But ==> 42/2 = 21
        21 * 7 = 147
This is probably the intent of your contest.
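As a quick illustration of the overflow point (my own numbers, not from the answer), dividing the large factor first keeps the intermediate result in range:

#include <cstdio>

int main() {
    double x = 1e300, y = 4e10, z = 2e10;
    std::printf("(x * y) / z = %g\n", (x * y) / z);  // x*y overflows to inf, so the whole result is inf
    std::printf("(x / z) * y = %g\n", (x / z) * y);  // reordered as suggested: prints 2e+300
    return 0;
}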
The double-precision binary format used by most computers can only hold about 16 digits, after that you'll get rounding. See http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Floating-point values hold only a limited number of significant digits. Just because your "PI" value has six times as many digits as a double will support doesn't alter the way the hardware works.
A typical (IEEE754) double will produce approximately 15-16 decimal places. Whether that's 0.12345678901235, 1234567.8901235, 12345678901235 or 12345678901235000000000, or some other variation.
In other words, yes, if you did the calculation EXACTLY, you'd get lots of decimal places, because pi never ends. On a computer you get about 15-16 significant digits, no matter what input values you use; all that changes is where in that sequence the decimal point sits. To get more, you need "big number support", such as the GNU Multiple Precision (GMP) library.
You're looking for std::fixed. That tells the ostream not to use exponential form.
cout << setprecision(12) << std::fixed << d << endl;

Long double does not print as the constant I initialized it with [duplicate]

Possible Duplicate:
Floating point inaccuracy examples
I'm having a problem... When I compile the source, the variable shown isn't the same as the one I initialized. See:
#include <iostream>
using namespace std;

int main()
{
    long double mynum = 4.7;
    cout.setf(ios::fixed, ios::floatfield);
    cout.precision(20);
    cout << mynum << endl;
}
And then:
[fpointbin#fedora ~]$ ./a.out
4.70000000000000017764
How do I fix it? I want cout to show 4.700000...
Your variable is a long double, but the literal 4.7 is only a double. Since you're printing it as a long double, the implementation prints enough significant digits to distinguish it from neighbouring long double values, even though those neighbouring values are not values a double could take.
The internal representation of doubles does not allow for an 'exact' representation of 4.7. The 'closest' is 4.70000000000000017764. In reality there is no need to look at a precision of 20 when you have 64 bit doubles. The maximum effective precision is about 15. Try using 12 or so,
cout.precision( 12 );
and you should get what you want to see.
Most platforms, including yours, can only represent exactly those floating-point numbers which have a short, finite binary expansion, i.e. which are finite sums of powers of two. 4.7 is not such a number, so it cannot be represented precisely on your platform, and if you demand excessive precision (20 digits is too much, as your mantissa has 64 bits and log10(2^64) ≈ 19.27), then you will inevitably see small errors.
(However, as #Henning says, you are already losing precision when assigning from a (non-long) double; you should write your literal constant as a long double: 4.7L. Then you should only see an error in the 20th digit.)
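A hedged sketch of that fix; the long double output shown is only indicative, since the exact trailing digits depend on the platform's long double format:

#include <iostream>

int main() {
    long double from_double = 4.7;     // double literal, widened afterwards
    long double from_ldouble = 4.7L;   // long double literal
    std::cout.setf(std::ios::fixed, std::ios::floatfield);
    std::cout.precision(20);
    std::cout << from_double  << "\n"; // 4.70000000000000017764 - the double rounding error is kept
    std::cout << from_ldouble << "\n"; // much closer to 4.7; the error moves out to about the 20th digit
    return 0;
}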
floats and doubles are binary floating-point types, i.e. they store a mantissa and an exponent in base 2.
This means that any decimal number that cannot be represented exactly in the finite digits of the mantissa gets approximated; the problem you showed comes from this: 4.7 cannot be represented exactly in the mantissa of a double (the literal 4.7 is of type double, kudos #Henning Makholm for spotting it), so the nearest approximation is used.
To better visualize the problem: in base 3, 2/3 has a finite representation (0.2), while in base 10 it is a periodic number (0.6666666...); if you only have a finite space for digits, you have to approximate, giving 0.66666667. Exactly the same thing happens here, with the source base being 10 and the "target" base being 2.
If there is a special need to avoid this kind of approximation (e.g. when dealing with decimal amounts of money), special decimal types can be used that store the mantissa and exponent in base 10 (C++ does not provide such a type of its own, but there are many decimal classes available on the Net); still, for "normal"/scientific calculations, binary FP types are used because they are much faster and more space-efficient.
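To make the base-10 idea concrete, here is a toy sketch of my own (not a real decimal library, positive values only): an integer significand plus a base-10 exponent holds 4.7 exactly as 47 * 10^-1.

#include <cstdint>
#include <iostream>
#include <string>

struct Decimal {
    std::int64_t significand;   // decimal digits, stored exactly as an integer
    int exponent10;             // power of ten to scale by
};

std::string to_string(const Decimal& d) {
    std::string digits = std::to_string(d.significand);
    if (d.exponent10 >= 0)
        return digits + std::string(d.exponent10, '0');
    int frac = -d.exponent10;
    while ((int)digits.size() <= frac)
        digits.insert(0, "0");                     // pad so e.g. {5,-2} prints 0.05
    digits.insert(digits.size() - frac, ".");
    return digits;
}

int main() {
    Decimal x{47, -1};                  // 4.7, held exactly
    std::cout << to_string(x) << "\n";  // prints 4.7
    return 0;
}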
Certain numbers cannot be represented in base two. Apparently, 4.7 is one of them. What you're seeing is the closest representable number to 4.7.
There's nothing you can do about this, other than setting the precision to a lower number.

C++: How to Convert From Float to String Without Rounding, Truncation or Padding? [duplicate]

This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
(14 answers)
I am facing a problem and am unable to resolve it. Need help from gurus. Here is sample code:
float f = 0.01f;
printf("%f", f);
If we check the variable during debugging, f contains the value 0.0099999998, while the output of printf is 0.010000.
a. Is there any way to force the compiler to assign exactly the value written to a float variable?
b. I want to convert a float to a string/character array. How can I make sure that only exactly the same value is converted, with no zeros padded, no unwanted digits appended, and no changes in digits as in the example above?
It is impossible to accurately represent a base 10 decimal number using base 2 values, except for a very small number of values (such as 0.25). To get what you need, you have to switch from the float/double built-in types to some kind of decimal number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978. But I can't see when you really need it.
It is preferable to use boost::format to accurately format a float as a string. The following code shows how to do it:
float blah = 0.01;
string w = str( boost::format("%d") % blah ); // w contains exactly "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without rounding or truncation and with fuller precision: you are missing the precision specifier; the default precision for printf is 6 fractional digits.
Try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
In truth, using the floating-point unit (most are now integrated into the CPU) will never give exact decimal results, only a good approximation. For more accurate results you could consider defining a class, say "DecimalString", which uses nybbles as decimal characters and symbols and mimics base-10 arithmetic using strings. Depending on how long you make the strings, you could even do away with the exponent part altogether: a 256-character string can represent roughly 1x10^-254 up to 1x10^+255 in straight decimal using actual ASCII (shorter if you want a sign), but this may prove significantly slower. You can simplify the arithmetic by reversing the digit order, so that from left to right the digits read
units, tens, hundreds, thousands, ...
Simple example:
"0021" represents 1200
The strings would need "shifting" left and right to line up the decimal points before each operation. The best bet is to start with the ADD and SUB functions, since you then build on them for MUL and DIV (see the sketch below). If you are on a large machine, you could make the strings as long as your heart desires!
Equally, you could use the C library: sprintf (declared in stdio.h) and the ecvt and fcvt functions (in stdlib.h on platforms that provide them):
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string, for example
float f = 12.00f;
char buffer[32];
sprintf(buffer, "%4.2f", f);  // writes "12.00" and returns 5; a negative value signals an error
ecvt and fcvt return pointers to static char buffers containing the null-terminated decimal representation of the number, with no decimal point and the most significant digit first. The offset of the decimal point is stored in dec, the sign in "sign" (1 = negative, 0 = positive), and ndig is the number of significant digits to store. If dec < 0, you have to pad with -dec zeros before the decimal point. If you are unsure, and you are not on a Windows 7 system (which sometimes will not run old DOS 3 programs), look for Turbo C version 2 for DOS 3; there are still one or two downloads available. It is a relatively small Borland package with a DOS C/C++ editor/compiler and even comes with TASM, the 16-bit 386/486 assembler; these functions are covered in its help files, along with many other useful nuggets of information.
All three routines should be in the standard headers, though I have found that on Visual Studio 2010 they are anything but standard, often overloaded with functions dealing with WORD-sized characters and asking you to use its own specific functions instead... "so much for the standard library," I mutter to myself almost every time, "maybe they ought to get a better dictionary!"
You would need to consult your platform's standards to determine the correct format: you would display the value as a*b^C, where 'a' is the integral significand that carries the sign, 'b' is implementation-defined (likely fixed by a standard), and 'C' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
it IS possible to represent floats exactly and unambiguously as strings. However, this requires a hexadecimal representation. For instance, in hex 1/16 is 0.1 and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))
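In C99 and C++11 the standard library can already do that round trip, via the %a hexadecimal format and strtof; a small sketch:

#include <cstdio>
#include <cstdlib>

int main() {
    float f = 0.01f;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", f);   // exact hexadecimal significand/exponent form
    float back = std::strtof(buf, nullptr);    // parse the string back into a float
    std::printf("%s round-trips exactly: %s\n", buf, f == back ? "yes" : "no");
    return 0;
}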