I am new to C++ and have the following simple code snippet exploring C++ limitations:
#include <iostream>
using namespace std;

int main() {
    float x = 2147483000;
    int y = static_cast<int>(x);
    cout << "x: " << x << ", y: " << y;
}
Why does the output show different values for x and y, even though the float value is within the int limit of 2147483647?
Output:
x: 2.14748e+09, y: 2147483008
Why is it showing different values for x & y?
I have read your question carefully. There is a misconception here, not a mistake.
A float can hold only about 7 significant decimal digits (24 bits of significand). A value with more digits than that is rounded to the nearest representable float, so the digits beyond roughly the seventh are not preserved. That is why the stored value differs from the literal in your source.
Why is it showing different values for x & y?
With default formatting, a float in the range where an exponent is used for display is shown with six significant digits.
In the format commonly used for float, IEEE-754 binary32, the two representable values nearest 2,147,483,000 are 2,147,482,880 and 2,147,483,008. So, for float x = 2147483000;, a good C++ implementation will convert 2,147,483,000 to the closest float value, 2,147,483,008.
Then int y = static_cast<int>(x); sets y to this same value, 2,147,483,008.
When the float x is inserted into the cout stream with default formatting, six significant digits are used, producing 2.14748e+09 (that is, 2.14748 × 10^9).
When the int y is inserted into the stream, its full value is shown, 2,147,483,008.
You can see the full value by requesting more precision. std::cout << std::setprecision(99) << "x: " << x << ", y: " << y << '\n'; produces “x: 2147483008, y: 2147483008”.
When I run the following code it gives me 16777216 as the answer, but it is supposed to give 16777215. Why is this so?
int d=33554431;
d=d-ceil(d/(float)2);
cout<<d<<" ";
Well, my calculator says that 33,554,431 / 2 is actually 16,777,215.5, which means that ceil(16,777,215.5) = 16,777,216 is actually correct.
Ceil rounds up to the next bigger integer, if that was unclear.
Ok, at first I misunderstood your question; the title sounds like you are asking why the Ceiling (ceil function) isn't correct.
int d=33554431;
d=d-ceil(d/(float)2);
cout<<d<<" ";
In your second line, you cast the literal 2 to float, so d is also converted to float before d/2 is calculated. Because of its internal representation, float (single-precision floating point) is limited in the values it can represent exactly. I typically assume no more than 7 digits of precision; if I need more than that, I use double. If you look at https://en.wikipedia.org/wiki/Single-precision_floating-point_format, integers in the range [16777217, 33554432] round to a multiple of 2. So when d is converted to float, it becomes 33554432. You can see that by running the following code:
int d1 = 33554431;
float f = d1;
int d2 = f;
cout << d1 << endl;
cout << f << endl;
cout << d2 << endl;
To fix your original code, try this:
int d=33554431;
d=d-ceil(d/(double)2);
cout<<d<<" ";
or
int d=33554431;
d=d-ceil(d/2.0);
cout<<d<<" ";
I make some calculations and the result is
0.000137*0.000137= 0.000000018769
I save the answer in float y
but it seems to be saved as 1.88788682e-008
I want it to be saved as 0.000000018769
I tried the type double but got same answer
int main()
{
float y= 0.000137*0.000137;
return 0;
}
y appears in the watch while debugging as 0.000137*0.000137
You don't really have control over how floating point numbers are stored (which is mostly irrelevant anyway). You do have control over how they're printed though. If you want to print them out without the scientific notation, you can use std::fixed to get that:
#include <iomanip>
#include <iostream>

int main() {
    float y = 0.000137*0.000137;
    std::cout << std::fixed << std::setprecision(12) << y << "\n";
}
Result:
0.000000018769
The value is always stored the same way; the scientific and fixed notations are just two ways of writing the same number.
If you are printing with cout, see this reference page on cout formatting:
http://www.cplusplus.com/reference/ios/fixed/
I am trying to write a calculator in C++ that does the basic functions of /, *, -, or + and shows the answer to two decimal places (with 0.01 precision).
For example 100.1 * 100.1 should print the result as 10020.01 but instead I get -4e-171. From my understanding this is from overflow, but that's why I chose long double in the first place!
#include <iostream>
#include <iomanip>
using namespace std;
long double getUserInput()
{
cout << "Please enter a number: \n";
long double x;
cin >> x;
return x;
}
char getMathematicalOperation()
{
cout << "Please enter which operator you want "
"(add +, subtract -, multiply *, or divide /): \n";
char o;
cin >> o;
return o;
}
long double calculateResult(long double nX, char o, long double nY)
{
// note: we use the == operator to compare two values to see if they are equal
// we need to use if statements here because there's no direct way
// to convert chOperation into the appropriate operator
if (o == '+') // if user chose addition
return nX + nY; // execute this line
if (o == '-') // if user chose subtraction
return nX - nY; // execute this line
if (o == '*') // if user chose multiplication
return nX * nY; // execute this line
if (o == '/') // if user chose division
return nX / nY; // execute this line
return -1; // default "error" value in case user passed in an invalid chOperation
}
void printResult(long double x)
{
cout << "The answer is: " << setprecision(0.01) << x << "\n";
}
long double calc()
{
// Get first number from user
long double nInput1 = getUserInput();
// Get mathematical operations from user
char o = getMathematicalOperation();
// Get second number from user
long double nInput2 = getUserInput();
// Calculate result and store in temporary variable (for readability/debug-ability)
long double nResult = calculateResult(nInput1, o, nInput2);
// Print result
printResult(nResult);
return 0;
}
setprecision takes an int, so setprecision(0.01) is actually setprecision(0), because 0.01 gets truncated. With std::fixed, the precision is the number of digits after the decimal point, so in your case you want it set to 2. You should also use std::fixed, or you may get scientific notation.
void printResult(long double x)
{
cout << "The answer is: " << std::fixed << setprecision(2) << x << "\n";
}
The strange result is not due to overflow. Doubles can easily hold numbers in the range you are showing.
Try to print the result without setprecision.
EDIT:
After trying
long double x = 100.1;
cout << x << endl;
I see that it doesn't work on my Windows system.
So I searched a little and found:
print long double on windows
maybe that is the explanation.
So I tried
long double x = 100.1;
cout << (double)x << endl;
which worked fine.
2nd EDIT:
Also see this link provided by Raphael
http://oldwiki.mingw.org/index.php/long%20double
The default floating point presentation switches automatically between presentation like 314.15 and 3.1e2, depending on the size of the number and the maximum number of digits it can use. With this presentation the precision is the maximum number of digits. By default it's 6.
You can either increase the maximum number of digits so that your result can be presented like 314.15, or you can force such fixed point notation by using the std::fixed manipulator. With std::fixed the precision is the number of decimals.
However, with std::fixed very large and very small numbers may be pretty unreadable.
When combined with std::fixed, the setprecision() manipulator specifies the number of digits after the decimal point. So, if you want two decimal places, as in 10020.01, use setprecision(2).
When you use setprecision(0.01), the value 0.01 is converted to int, which gives 0.
It wouldn't have hurt if you had actually read the documentation for setprecision() - that clearly specifies an int argument, not a floating point one.
const double dBLEPTable_8_BLKHAR[4096] = {
0.00000000000000000000000000000000,
-0.00000000239150987901837200000000,
-0.00000000956897738824125100000000,
-0.00000002153888378764179400000000,
-0.00000003830892270073604800000000,
-0.00000005988800189093979000000000,
-0.00000008628624126316708500000000,
-0.00000011751498329992671000000000,
-0.00000015358678995269770000000000,
-0.00000019451544774895524000000000,
-0.00000024031597312124120000000000,
-0.00000029100459975062165000000000
}
If I change the double above to float, am I incurring conversion CPU cycles when I perform operations on the array contents? Or is the "conversion" sorted out at compile time?
Say, dBLEPTable_8_BLKHAR[1] + dBLEPTable_8_BLKHAR[2] , something simple like this?
On a related note, how many trailing decimal places should a float be able to store?
This is c++.
Any good compiler will convert the initializers during compile time. However, you also asked
am I incurring conversion cpu cycles when I perform operations on the array contents?
and that depends on the code performing the operations. If your expression combines array elements with variables of double type, then the operation will be performed at double precision, and the array elements will be promoted (converted) before the arithmetic takes place.
If you just combine array elements with variables of float type (including other array elements), then the operation is performed on floats and the language doesn't require any promotion. (If your hardware only implements double-precision operations, conversion might still be done, but such hardware surely makes the conversions very cheap.)
Ben Voigt's answer addresses most of your question.
But you also ask:
On a related note, how many trailing decimal places should a float be able to store
It depends on the value of the number you are trying to store. For large numbers there are no decimals at all; in fact the format can't even give you a precise value for the integer part. For instance:
float x = BIG_NUMBER;
float y = x + 1;
if (x == y)
{
// The code gets here if BIG_NUMBER is very high!
}
else
{
// The code gets here if BIG_NUMBER is not so high!
}
If BIG_NUMBER is 2^23 the next greater number would be (2^23 + 1).
If BIG_NUMBER is 2^24 the next greater number would be (2^24 + 2).
The value (2^24 + 1) can not be stored.
For very small numbers (i.e. close to zero), you will have a lot of decimal places.
Floating point must be used with care because its precision is limited.
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
For small numbers you can experiment with the program below.
Change the exp variable to set the starting point. The program will show you what the step size is for the range and the first four valid numbers.
#include <cstdint>
#include <cstring>
#include <iostream>
using namespace std;

static_assert(sizeof(float) == sizeof(uint32_t), "this program assumes a 32-bit float");

// Reinterpret a 32-bit pattern as a float (memcpy avoids the undefined
// behaviour of casting a float* to an unsigned int*)
static float bitsToFloat(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main()
{
    int exp = -27; // <--- !!!!!!!!!!!
    // Change this to set starting point for the range
    // Starting point will be 2 ^ exp
    cout.precision(100);
    // Calculate step size for this range
    uint32_t e = static_cast<uint32_t>((127 - 23) + exp) << 23;
    cout << "Step size = " << fixed << bitsToFloat(e) << endl;
    cout << "First 4 numbers in range:" << endl;
    // Calculate first four valid numbers in this range
    e = static_cast<uint32_t>(127 + exp) << 23;
    for (uint32_t i = 0; i < 4; ++i)
    {
        uint32_t bits = e | i;
        cout << hex << "0x" << bits << " = " << fixed << bitsToFloat(bits) << endl;
    }
    return 0;
}
For exp = -27 the output will be:
Step size = 0.0000000000000008881784197001252323389053344726562500000000000000000000000000000000000000000000000000
First 4 numbers in range:
0x32000000 = 0.0000000074505805969238281250000000000000000000000000000000000000000000000000000000000000000000000000
0x32000001 = 0.0000000074505814851022478251252323389053344726562500000000000000000000000000000000000000000000000000
0x32000002 = 0.0000000074505823732806675252504646778106689453125000000000000000000000000000000000000000000000000000
0x32000003 = 0.0000000074505832614590872253756970167160034179687500000000000000000000000000000000000000000000000000
const double dBLEPTable_8_BLKHAR[4096] = {
If you change the double in that line to float, then one of two things will happen:
At compile time, the compiler will convert each initializer, such as -0.00000000239150987901837200000000, to the float that best represents it, and will then store that data directly into the array.
At runtime, during the program initialization (before main() is called!) the runtime that the compiler generated will fill that array with data of type float.
Either way, once you get to main() and to code that you've written, all of that data will be stored as float variables.
For the below program I am getting precision loss of 1 which I am unable to understand. Need help.
#include <iostream>
#include <limits>
using namespace std;

int main()
{
    typedef std::numeric_limits< double > dbl;
    cout.precision(dbl::digits10);
    double x = -53686781.0;
    float xFloat = (float) x;
    cout << "x :: " << x << "\n";
    cout << "xFloat :: " << xFloat << "\n";
}
Output:
x :: -53686781
xFloat :: -53686780
53686781 looks like this in binary: 11001100110011000111111101. That's 26 bits.
Your float can only store 24 significant bits (23 stored plus one implicit leading bit), so only 110011001100110001111111 is kept. The last two binary digits, 01, are dropped; because they amount to less than half a step, the value rounds down.
And 11001100110011000111111100 is 53686780.
As simple as that.
For normal floats I believe p = 23 stored fraction bits (24 significant bits with the implicit leading bit), which gives about 7 decimal digits of precision, as already mentioned. Double has p = 52 stored bits (53 significant), which gives about 15 digits.
The wiki page is actually pretty good.