Difference between float and double [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
First code
double pi=3.14159,a=100.64;
cin>>a;
double sum=(a*a)*pi;
cout <<fixed<<setprecision(4)<<"Value is="<<sum<<endl;
return 0;
the value is =31819.3103
second code
float pi=3.14159,a=100.64;
float sum=(a*a)*pi;
cout <<fixed<<setprecision(4)<<"Value="<<sum<<endl;
return 0;
the value is =31819.3105
Why is there a difference between the two values?

In both float and double (and every other floating-point type available in C++), values are stored in floating-point form: to store x = m * 2^p, the values m and p are written to memory.
Obviously, not all real numbers can be represented in this form, especially since the lengths of m and p are limited. Any number that cannot be represented exactly is rounded to one of its nearest representable neighbours. Since both 3.14159 and 100.64 are infinite fractions in binary, both are rounded, so when you write a = 3.14159, the value actually stored in a is slightly different.
Consequently, the result of a calculation on the rounded values is not exact either, and it depends on how the operands and intermediate results were rounded. float and double round to different precisions, which is why the two programs print different values.
The value obtained with double is probably the more precise one, since on most architectures and compilers double has more mantissa digits than float. For even more precision, consider long double, on implementations where it is wider than double.

Related

How can we clearly know the precision in double or float in C/C++? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Suppose we have a real number a which has infinite precision.
Now, we have floating type double or float in C/C++ and want to represent a using those types. Let's say "a_f" is the name of the variable for a.
I already understood how the values are represented, which consists of the following three parts: sign, fraction, and exponent.
Depending on what types are used, the number of bits assigned for fraction and exponent differ and that determines the "precision".
How is the precision defined in this sense?
Is that the upper bound of absolute difference between a and a_f (|a - a_f|), or is that anything else?
In the case of double, why is the "precision" bounded by 2^{-54}??
Thank you.
The precision of floating point types is normally defined in terms of the number of digits in the mantissa, which can be obtained using std::numeric_limits<T>::digits (where T is the floating point type of interest - float, double, etc).
The number of digits in the mantissa is defined in terms of the radix, obtained using std::numeric_limits<T>::radix.
Both the number of digits and radix of floating point types are implementation defined. I'm not aware of any real-world implementation that supports a floating point radix other than 2 (but the C++ standard doesn't require that).
If the radix is 2, std::numeric_limits<T>::digits is the number of bits (i.e. base two digits), and that defines the precision of the floating point type. For IEEE754 double precision types, that works out to 53 bits of precision (52 stored bits plus one implicit leading bit) - but the C++ standard does not require an implementation to use IEEE floating point representations.
When storing a real value a in a floating point variable, the actual value stored (what you're describing as a_f) is the nearest approximation that can be represented (assuming effects like overflow do not occur). The difference (or magnitude of the difference) between the two does not depend only on the mantissa - it also depends on the floating point exponent - so there is no fixed upper bound.
Practically (in very inaccurate terms) the possible difference between a value and its floating point approximation is related to the magnitude of the value. Floating point variables do not represent a uniformly distributed set of values between the minimum and maximum representable values - this is a trade-off of representation using a mantissa and exponent, which is necessary to be able to represent a larger range of values than an integral type of the same size.
The thing with floating-point numbers is that they get more inaccurate in absolute terms the larger their magnitude is, because the gap between neighbouring representable values grows. For example:
double x1 = 10;
double x2 = 20;
std::cout << std::boolalpha << (x1 == x2);
prints, as expected, false.
However, the following code:
#include <limits>
// the greatest number representable as double
double x1 = std::numeric_limits<double>::max();
double x2 = x1 - 10;
std::cout << std::boolalpha << (x1 == x2);
prints, unexpectedly, true, since the numbers are so big that you can't meaningfully represent x1 - 10. It gets rounded to x1.
One may then ask where the bounds are and what they are. Given these inconsistencies, we obviously need some tools to inspect them. <limits> and <cmath> are your friends.
std::nextafter:
std::nextafter takes two floats or doubles. The first argument is our starting point and the second one gives the direction in which we want to find the next representable value. For example, we can see that:
double x1 = 10;
double x2 = std::nextafter(x1, std::numeric_limits<double>::max());
std::cout << std::setprecision(std::numeric_limits<double>::digits) << x2;
x2 is slightly more than 10. On the other hand:
double x1 = std::numeric_limits<double>::max();
double x2 = std::nextafter(x1, std::numeric_limits<double>::lowest());
std::cout << std::setprecision(std::numeric_limits<double>::digits)
<< x1 << '\n' << x2;
Outputs on my machine:
1.79769313486231570814527423731704356798070567525845e+308
1.7976931348623155085612432838450624023434343715745934e+308
                 ^ difference
The difference appears only at the 16th decimal place. Considering that this number is multiplied by 10^308, you can see why subtracting 10 changed absolutely nothing.
It's tough to talk about specific values. One may estimate that doubles have 15 digits of precision (combined before and after the dot), and that's a decent estimate; however, if you want to be sure, use the tools designed for this specific task.
For instance, the number 123456789 may be represented as .12 * 10^9 or maybe .12345 * 10^9 or .1234567 * 10^9. None of these is an exact representation and some are better than the others. Which one you go with depends on how many bits you have for the fraction. More bits means more precision. The number of bits used to represent the fraction is called the "precision".

Efficient way to compute the next higher integer after a float? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Surprisingly I can't find an easy reference on this, I want to compute:
float x = /*...*/;
float next = nextint(x);
where next is strictly greater than x (ie if x is an integer, return the next higher integer). Ideally without branches.
You seem to want floor + 1:
float next = floorf(x) + 1; // or std::floor
Note that this gives you the mathematically next integer, rounded to nearest representable value, which for large x may be x itself. This does not produce a strictly larger representable integer in such case. You should consider whether this is what you intend.
There is a way to get the correct result even for large floating point numbers (where the next float may be more than 1 away).
#include <cmath>   // std::ceil, std::nextafter
#include <limits>  // std::numeric_limits
float nextint(float x)
{
constexpr float MAX_VALUE = std::numeric_limits<float>::max();
return std::ceil(std::nextafter(x, MAX_VALUE));
}
First, we move to the next representable floating point value (towards positive infinity). Then we round up to the nearest integer.
Proof of correctness:
We trivially satisfy the "strictly greater" criterion because nextafter strictly increases the number and ceil never lowers it.
We never advance by more than one representable integer (that is, we actually get the "next higher" one): Either nextafter(x) is already the next higher representable integer (in which case ceil leaves it unchanged), or it is a float between x and the next higher integer (in which case ceil takes us to the latter).

Can you add different number types together in c++? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm quite new to C++, and I'm wondering whether you can add different number types together, like this:
int num1=1;
float num2=1.0;
double num3=1.0;
Can you add these variables together? If you can, what type would
num1+num2+num3
be?
As already said, the answer will be double.
What the compiler will do for this (without optimization) is
Read literal 1 into num1
Read literal 1.0f into num2
Read literal 1.0 into num3
Convert integer num1 to float num1'
Add num1' and num2, result is float tmp
Convert float tmp to double tmp'
Add tmp' and num3 to get the final double result
You need to take some care with these conversions. Whilst you can convert float (and int) to double without any loss of precision, you can't always do the same with int to float.
float has 24 bits of precision, which means that it can exactly represent all integers up to about 16.8 million, while a signed int can go up to about 2 billion. See here for details.
[I'm assuming the LP64 model]
The answer is double. If you want to test it, you can try auto ret = num1+num2+num3
and see what type ret has.
Yes, of course. The result is a floating-point value:
1+1.0+1.0=3.0
and its type is double.

How to find digits after decimal? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I want to write a program in C++ that reads a decimal number with cin and prints the number of digits after the decimal point with cout,
for example 0.26547 -> 5.
I wrote this, but it does not work correctly:
int main()
{
int i=0,b;
float a ;
cin>>a ;
while(a!=0)
{
a*=10 ;
b=a ;
a-=b ;
i+=1 ;
}
cout<<i ;
}
For example, for 0.258 it returns 20 instead of 3.
Can someone explain what the problem with this code is?
Thank you.
C++ permits decimal representation of floating point numbers, but as far as I know all extant implementations use binary representation. Storing the user's decimal number specification as a floating point value therefore loses critical information about the decimal digits: they're simply not there any more. So to count the decimal digits in the specification, you have to store it as a string.
Pseudo-code:
input number specification as a string, e.g. using getline.
verify that it's a valid number specification, e.g. using stod.
scan for the first period from the right, call this position P.
scan for the maximum number of decimal digits from position P.
I am unsure why your code does not work (what compiler are you using?), but I think it might be related to
b=a;
Try explicitly casting your float to an int
b = int(a);
Alternatively, you could choose not to use an int, and round a float down using the function floor from math.h:
#include <math.h>
float a = 5.9f;
float b = floor(a);
a -= b;

Rounding double in C++ [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I apologize for asking yet another rounding question. However, all the search has not yielded a satisfactory solution to my problem. The only possible answer is that what I am looking for may not be possible at all. Just wanted to make sure if the experts think the same.
So, here is my sample code:
double Round(double dbVal, int nPlaces)
{
const double dbShift = pow(10.0, nPlaces);
return floor(dbVal * dbShift + 0.5) / dbShift;
}
int main()
{
string sNum = "1.29585";
double dNum = stod(sNum);
int iNumDecimals = 5;
double dRoundedNum = Round(dNum, iNumDecimals);
}
The number sNum is read as a string from a file. For example, the number in the file is 1.29585. I convert it to double using stod. dNum comes out to be 1.295849999999.... I would like to get back 1.29585 in double. Using a Round function as shown above does not help. The round function also returns 1.295849999999....
Is it possible to get back the exact 1.29585 at all? Any other possible solution?
Thanks in advance for any advice.
Your number is rounded to the closest representable double to the number you provided ( 1.29585 ). To 17 places, it is: 1.29584999999999995
The next largest representable double-precision number is 1 / 2^52 larger than that: 1.29585000000000017.
That's roughly 1 part in 5 quadrillion. An error of that magnitude, scaled to the circumference of the entire solar system, would only be about 8 centimeters.
So, in terms of rounding, the double you have is correctly rounded to the nearest representable binary value.
By default, floating point numbers are stored in binary. Just as you can't express 1/3 as an exact decimal fraction (you can approximate it with "0.33333333", extending out the 3s until you get sick of it), you can't express all round decimal values exactly in binary.
If you're curious what the above two values look like in binary: (You can refer to the diagram and description here to understand how to interpret that hexadecimal value if you are interested.)
1.29584999999999995 == 0x3FF4BBCD35A85879
1.29585000000000017 == 0x3FF4BBCD35A8587A
For your zillions of calculations, this approximate representation should cause no problem, unless you're computing a series of values that need to be rounded to an exact number of decimal places. Typically, that's only necessary if you're computing actual bank transactions or the like. Bankers want decimal rounding, so that their computations today match the way computations were done 100 years ago so that they have continuity between the pre- and post-computer eras, not because they're magically more accurate.
Double precision arithmetic carries 16 to 17 decimal positions of precision. The fact that it doesn't print as a nice round number of decimal digits doesn't mean it's inaccurate. If you compare the calculation the computer makes with double precision to the same calculation you'd do by hand (even with the aid of a standard calculator displaying 9 to 12 digits of precision), the computer's double precision arithmetic will generally come out ahead.
What you most likely want to do is to make sure to print out your final calculations to the appropriate number of decimal places. For example, you can use std::setprecision() from <iomanip> to control the precision of values printed via std::cout.
EDIT: If your application actually requires decimal arithmetic and decimal rounding, then you will need to look into decimal floating point support, either built into the compiler or in a 3rd party library. Some recent compilers do have support for this, and some processors even have hardware support for decimal floating point. GCC has decimal floating point extensions, for example, as does Intel's compiler. In contrast, Microsoft suggests finding a BCD library, it seems.
Try it this way. I don't remember the format for double right now; I use float.
float num;
sscanf("1.29585", "%f", &num);
std::cout << num << std::endl;
The "I whipped it up in Haskell and translated it" answer:
#include <iostream>
#include <cmath>
#include <iomanip>
using namespace std;
int main()
{
double d = 1.2958499999999;
cout << setprecision(15) << d << endl;
cout << setprecision(15) << double(round(d * 1e5) / 1e5) << endl;
return 0;
}
// outputs:
// 1.2958499999999
// 1.29585
It is hardly a general answer, but it is correct to the letter of the question. I highly recommend understanding the evil that is IEEE floating point using e.g. Jim Buck's reference rather than putting this hack to any great use.