Precision loss of 1 when converting double to float - c++

For the below program I am getting precision loss of 1 which I am unable to understand. Need help.
#include <iostream>
#include <limits>
using namespace std;
int main()
{
typedef std::numeric_limits< double > dbl;
cout.precision(dbl::digits10);
double x = -53686781.0;
float xFloat = (float) x;
cout << "x :: " << x << "\n";
cout << "xFloat :: " << xFloat << "\n";
}
Output:
x :: -53686781
xFloat :: -53686780

53686781 looks like this in binary: 11001100110011000111111101. That's 26 bits.
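Your float can only store 24 bits in its mantissa (significand), so you end up with 110011001100110001111111 stored in it. The last two binary digits, 01, are rounded away; because 01 is less than half a step, the value rounds down, which looks the same as truncation here.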
And 11001100110011000111111100 is 53686780.
As simple as that.

For normal floats I believe p=23 stored mantissa bits, which gives a precision of about one part in 2^23 (about 7 decimal digits, as already mentioned). Double has p=52, which gives about one part in 2^52 (about 15 decimal digits).
The wiki page is actually pretty good.

Related

How to set precision of a float?

For a number a = 1.263839, we can do -
float a = 1.263839;
cout << fixed << setprecision(2) << a << endl;
output :- 1.26
But what if I want to set the precision of a number and store it, for example,
convert 1.263839 to 1.26 without printing it?
"But what if I want to set the precision of a number and store it"
You can store the desired precision in a variable:
int precision = 2;
You can then later use this stored precision when converting the float to a string:
std::cout << std::setprecision(precision) << a;
I think OP wants to convert from 1.263839 to 1.26 without printing the number.
If this is your goal, then you first must realise, that 1.26 is not representable by most commonly used floating point representation. The closest representable 32 bit binary IEEE-754 value is 1.2599999904632568359375.
So, assuming such representation, the best that you can hope for is some value that is very close to 1.26. In best case the one I showed, but since we need to calculate the value, keep in mind that some tiny error may be involved beyond the inability to precisely represent the value (at least in theory; there is no error with your example input using the algorithm below, but the possibility of accuracy loss should always be considered with floating point math).
The calculation is as follows:
Let P be the number of digits after the decimal point that you want to round to (2 in this case).
Let D be 10^P (100 in this case).
Multiply the input by D.
Round to the nearest integer with std::round.
Divide by D.
P.S. Sometimes you might not want to round to the nearest, but instead want std::floor or std::ceil to the precision. This is slightly trickier. Simply std::floor(val * D) / D is wrong. For example 9.70 floored to two decimals that way would become 9.69, which would be undesirable.
What you can do in this case is multiply with one extra magnitude of precision, round to nearest, then divide out the extra magnitude and proceed:
Let P be the number of digits after the decimal point that you want to round to (2 in this case).
Let D be 10^P (100 in this case).
Multiply the input by D * 10.
Round to the nearest integer with std::round.
Divide by 10.
Apply std::floor or std::ceil.
Divide by D.
You would need to truncate it. Possibly the easiest way is to multiply it by a factor (in case of 2 decimal places, by a factor of 100), then truncate or round it, and lastly divide by the very same factor.
Now, mind you, that floating-point precision issues might occur, and that even after those operations your float might not be 1.26, but 1.26000000000003 instead.
If your goal is to store a number with a small, fixed number of digits of precision after the decimal point, you can do that by storing it as an integer with an implicit power-of-ten multiplier:
#include <stdio.h>
#include <math.h>
// Given a floating point value and the number of digits
// after the decimal-point that you want to preserve,
// returns an integer encoding of the value.
int ConvertFloatToFixedPrecision(float floatVal, int numDigitsAfterDecimalPoint)
{
return (int) roundf(floatVal*powf(10.0f, numDigitsAfterDecimalPoint));
}
// Given an integer encoding of your value (as returned
// by the above function), converts it back into a floating
// point value again.
float ConvertFixedPrecisionBackToFloat(int fixedPrecision, int numDigitsAfterDecimalPoint)
{
return ((float) fixedPrecision) / powf(10.0f, numDigitsAfterDecimalPoint);
}
int main(int argc, char ** arg)
{
const float val = 1.263839;
int fixedTwoDigits = ConvertFloatToFixedPrecision(val, 2);
printf("fixedTwoDigits=%i\n", fixedTwoDigits);
float backToFloat = ConvertFixedPrecisionBackToFloat(fixedTwoDigits, 2);
printf("backToFloat=%f\n", backToFloat);
return 0;
}
When run, the above program prints this output:
fixedTwoDigits=126
backToFloat=1.260000
If you're talking about storing exactly 1.26 in your variable, chances are you can't (there may be an off chance that exactly 1.26 works, but let's assume it doesn't for a moment) because floating point numbers don't work like that. There are always little inaccuracies because of the way computers handle floating point decimal numbers. Even if you could get 1.26 exactly, the inaccuracy can come back the moment you try to use it in a calculation.
That said, you can use some math and truncation tricks to get very close:
#include <iostream>
using namespace std;
int main()
{
// our float
float a = 1.263839;
// the precision we're trying to accomplish
int precision = 100; // 2 decimal places
// because we're an int, this will keep the 126 but lose everything else
int truncated = a * precision; // multiplying by the precision ensures we keep that many digits
// convert it back to a float
// Of course, we need to ensure we're doing floating point division
float b = static_cast<float>(truncated) / precision;
cout << "a: " << a << "\n";
cout << "b: " << b << "\n";
return 0;
}
Output:
a: 1.26384
b: 1.26
Note that this is not really 1.26 here. But it is very close.
This can be demonstrated by using setprecision():
cout << "a: " << std::setprecision(10) << a << "\n";
cout << "b: " << std::setprecision(10) << b << "\n";
Output:
a: 1.263839006
b: 1.25999999
So again, it's not exactly 1.26, but very close, and slightly closer than you were before.
Using a stringstream would be an easy way to achieve that:
#include <iostream>
#include <iomanip>
#include <sstream>
using namespace std;
int main() {
stringstream s("");
s << fixed << setprecision(2) << 1.263839;
float a;
s >> a;
cout << a; //Outputs 1.26
return 0;
}

Ceiling Operation Gives Wrong answer

When I run the following code it gives me the answer 16777216, but it is supposed to give 16777215. Why is this so?
int d=33554431;
d=d-ceil(d/(float)2);
cout<<d<<" ";
Well, my calculator says that 33,554,431 / 2 is actually 16,777,215.5, which means that ceil(16,777,215.5) = 16,777,216 is actually correct.
Ceil rounds up to the next bigger integer, if that was unclear.
Ok, at first I misunderstood your question; the title sounds like you are asking why the Ceiling (ceil function) isn't correct.
int d=33554431;
d=d-ceil(d/(float)2);
cout<<d<<" ";
In your second line, you cast the literal 2 to a float, so the compiler also converts d to a float when it calculates d/2. Because of their internal representation, floats (single-precision floating point) are limited in the values they can represent exactly. I typically assume no more than 7 digits of precision; if I need more than that, I use doubles. Anyway, if you look at this link (https://en.wikipedia.org/wiki/Single-precision_floating-point_format), integers in the range [16777217, 33554432] round to a multiple of 2. So when the compiler converts d to a float, it becomes 33554432. You can see that by running the following code:
int d1 = 33554431;
float f = d1;
int d2 = f;
cout << d1 << endl;
cout << f << endl;
cout << d2 << endl;
To fix your original code, try this:
int d=33554431;
d=d-ceil(d/(double)2);
cout<<d<<" ";
or
int d=33554431;
d=d-ceil(d/2.0);
cout<<d<<" ";

c++: a program to find the average of very high numbers?

So I'm trying to make a C++ program that can find the average of very high numbers (the range was < 10^19).
Here's my attempt:
#include <iostream>
int main()
{
long double a,b,result;
std::cin>>a;
std::cin>>b;
result=(a+b)/2;
std::cout<<result<<"\n";
}
But somehow I did not get the result I expected. My teacher said there was a "trick" and there was no need to even use double. I searched and researched but did not find the trick. Any help?
When using floating point numbers you have to consider their precision. In base 10 it is given by std::numeric_limits<T>::digits10, and the following program prints it for each type (the values may depend on your platform):
#include <iostream>
#include <limits>
int main() {
std::cout << "float: " << std::numeric_limits<float>::digits10 << "\n";
std::cout << "double: " << std::numeric_limits<double>::digits10 << "\n";
std::cout << "long double: " << std::numeric_limits<long double>::digits10 << "\n";
return 0;
}
On ideone I get:
float: 6
double: 15
long double: 18
Which is consistent with 32 bits, 64 bits and 80 bits floating point numbers (respectively).
Since 10^19 is above 18 digits (it has 20), the type you have chosen lacks the necessary precision to represent all numbers below it, and no amount of computation can recover the lost data.
Let's switch back to integers: while their range is more limited, they have a higher degree of precision for the same number of bits. A 64-bit signed integer has a maximum of 9,223,372,036,854,775,807 and the unsigned version goes up to 18,446,744,073,709,551,615. For comparison, 10^19 is 10,000,000,000,000,000,000.
A uint64_t (from <cstdint>) gives you the necessary building block; however, you'll be teetering on the edge of overflow: 2 times 10^19 is too much.
You now have to find a way to compute the average without adding the two numbers together.
Supposing two integers M, N such that M <= N, (M + N) / 2 = M + (N - M) / 2

Widening precision with cast results in how much precision?

I know that widening conversions are safe in that they result in no loss of data, but is there a real gain in precision or is it a longer representation with the same number of significant figures?
For example,
#include <iostream>
#include <iomanip>
int main()
{
float i = 0.012530f;
std::cout << std::setw(20) << std::setprecision(7) << i << std::endl;
double ii = (double)i;
std::cout << std::setw(20) << std::setprecision(15) << ii << std::endl;
double j = 0.012530;
std::cout << std::setw(20) << std::setprecision(15) << j << std::endl;
}
Produces the output
0.01253
0.012529999949039
0.01253
Looking at the variables in the debugger shows that j is rounded as floating point cannot represent the original number exactly, but it is still a more exact approximation of the original number than ii.
i = 0.012530000
ii = 0.012529999949038029
j = 0.012529999999999999
Why is it that the cast is less exact than the direct assignment? Can I only count on 8 digits of exactitude if I widen the precision of a float?
It seems like the answer to your question is obvious. Because double holds more precision than float, you get a more precise value if you assign directly to a double and lose precision if you go through a float.
When you do float i = 0.012530f; you get a float that's as close to 0.01253 as a float can get. To 7 digits, that looks like 0.012530.
When you do double j = 0.012530;, you get a double that's as close to 0.01253 as a double can get.
If you cast the float to a double, you get a double that's as close to 0.01253 as a float can get.
You can't really compare numbers output to different precisions to see which is closer. For example, say the correct number is 0.5, and you have two approximations, "0.5001" and "0.49". Clearly, the first is better. But if you display the first with 5 decimal digits "0.5001" and the second with only one decimal digit "0.5", the second looks closer. Your first output has this kind of false, apparent precision due to showing with few digits and lucky rounding.

How to set 3 digits after comma

Well, basically I was using setprecision(3), but that is rounding up the last number, for example if we do like this -
double x = 5;
x = (double) x / 3;
cout << fixed << setprecision(3) << x << endl;
It will show 1.667
But, if we do it with calculator, it will show - 1.666666666...67
So basically, what I mean is, is there any chance to output in file, just the first 3 digits after the comma, and not to round it up?
1.666666666...67 rounded to three decimal places is 1.667
If you just want to truncate the output, then send it to a string with a string stream (for example std::ostringstream), search the string for the position of "." and truncate the string 3 places beyond that.
Or if you simply want to always round down, multiply the result by 1000, use floor() to round down and then divide by 1000.0 again.
A cast to long truncates the fraction part :
#include <iostream>
using namespace std;
int main()
{
double x;
x= -100.666666666666666;
x = static_cast<double> ( static_cast<long>(x * 1000) )/1000;
cout << x << endl;
}
We could use floor(double) from <cmath>, which would otherwise be preferable, but it rounds negative numbers toward negative infinity rather than toward zero.
cout << fixed << setprecision(3) << double(int(x*1000))/1000 << endl;
We use int() to truncate the trailing digits.