Am I doing double to float conversion here - c++

const double dBLEPTable_8_BLKHAR[4096] = {
0.00000000000000000000000000000000,
-0.00000000239150987901837200000000,
-0.00000000956897738824125100000000,
-0.00000002153888378764179400000000,
-0.00000003830892270073604800000000,
-0.00000005988800189093979000000000,
-0.00000008628624126316708500000000,
-0.00000011751498329992671000000000,
-0.00000015358678995269770000000000,
-0.00000019451544774895524000000000,
-0.00000024031597312124120000000000,
-0.00000029100459975062165000000000
};
If I change the double above to float, am I incurring conversion CPU cycles when I perform operations on the array contents? Or is the "conversion" sorted out at compile time?
Say, dBLEPTable_8_BLKHAR[1] + dBLEPTable_8_BLKHAR[2], something simple like this?
On a related note, how many trailing decimal places should a float be able to store?
This is c++.

Any good compiler will convert the initializers during compile time. However, you also asked
am I incurring conversion cpu cycles when I perform operations on the array contents?
and that depends on the code performing the operations. If your expression combines array elements with variables of double type, then the operation will be performed at double precision, and the array elements will be promoted (converted) before the arithmetic takes place.
If you combine array elements only with variables of float type (including other array elements), then the operation is performed on floats and the language doesn't require any promotion. (If your hardware only implements double-precision operations, a conversion might still be done, but such hardware surely makes the conversion very cheap.)
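The promotion rules described above can be checked at compile time. This is only a minimal sketch: kTable is a hypothetical four-entry stand-in for the real table, and sumFirstTwo is an illustrative helper, not part of the original code.

```cpp
#include <type_traits>

// Hypothetical stand-in for the lookup table, declared as float.
// The literals have type double; the compiler converts them to float
// when it builds the constant data.
constexpr float kTable[4] = {
    0.00000000000000000000000000000000,
    -0.00000000239150987901837200000000,
    -0.00000000956897738824125100000000,
    -0.00000002153888378764179400000000,
};

// float + float: the result type is float, no promotion is required.
static_assert(std::is_same<decltype(kTable[1] + kTable[2]), float>::value,
              "float + float should have type float");

// float + double: the float operand is promoted and the result is double.
static_assert(std::is_same<decltype(kTable[1] + 1.0), double>::value,
              "float + double should have type double");

// float + float literal (note the f suffix): stays float.
static_assert(std::is_same<decltype(kTable[1] + 1.0f), float>::value,
              "float + float literal should have type float");

// Illustrative helper: the simple expression from the question.
float sumFirstTwo()
{
    return kTable[1] + kTable[2];
}
```

Mixing in an unsuffixed literal such as 1.0 is an easy way to introduce a promotion to double by accident.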

Ben Voigt's answer addresses most of your question.
But you also ask:
On a related note, how many trailing decimal places should a float be able to store
It depends on the magnitude of the number you are trying to store. For large numbers there are no decimal places at all; in fact, the format can't even represent every integer exactly. For instance:
float x = BIG_NUMBER;
float y = x + 1;
if (x == y)
{
// The code gets here if BIG_NUMBER is very large!
}
else
{
// The code gets here if BIG_NUMBER is not so large!
}
If BIG_NUMBER is 2^23, the next greater float is (2^23 + 1).
If BIG_NUMBER is 2^24, the next greater float is (2^24 + 2).
The value (2^24 + 1) cannot be stored.
For very small numbers (i.e. close to zero), you will have a lot of decimal places.
Floating point must be used with care because its precision is limited: a float has a 24-bit significand, which is roughly 7 significant decimal digits, whatever the magnitude of the number.
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
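The claim about 2^23 and 2^24 can be checked with a small sketch. It assumes float is an IEEE 754 single-precision type; plusOneIsLost and gapAbove are hypothetical helper names chosen for illustration.

```cpp
#include <cmath>

// Returns true when adding 1 to x does not change the stored float value.
bool plusOneIsLost(float x)
{
    // volatile forces the sum to be rounded and stored as a float
    volatile float y = x + 1.0f;
    return y == x;
}

// Returns the distance from x to the next larger representable float.
float gapAbove(float x)
{
    return std::nextafter(x, INFINITY) - x;
}
```

Under that assumption, gapAbove(8388608.0f) (2^23) is 1 and gapAbove(16777216.0f) (2^24) is 2, so plusOneIsLost(16777216.0f) returns true.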
For small numbers you can experiment with the program below.
Change the exp variable to set the starting point. The program will show you what the step size is for the range and the first four valid numbers.
#include <cstdint>
#include <cstring>
#include <iostream>
using namespace std;
// Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing problems)
static float fromBits(uint32_t bits)
{
float f;
memcpy(&f, &bits, sizeof f);
return f;
}
int main()
{
int exp = -27; // <--- !!!!!!!!!!!
// Change this to set starting point for the range
// Starting point will be 2 ^ exp
cout.precision(100);
// Calculate step size for this range: 2 ^ (exp - 23)
uint32_t e = static_cast<uint32_t>((127 - 23) + exp) << 23;
cout << "Step size = " << fixed << fromBits(e) << endl;
cout << "First 4 numbers in range:" << endl;
// Calculate first four valid numbers in this range
e = static_cast<uint32_t>(127 + exp) << 23;
for (uint32_t i = 0; i < 4; ++i)
{
uint32_t bits = e | i; // Low mantissa bits select consecutive floats
cout << hex << "0x" << bits << " = " << fixed << fromBits(bits) << endl;
}
return 0;
}
For exp = -27 the output will be:
Step size = 0.0000000000000008881784197001252323389053344726562500000000000000000000000000000000000000000000000000
First 4 numbers in range:
0x32000000 = 0.0000000074505805969238281250000000000000000000000000000000000000000000000000000000000000000000000000
0x32000001 = 0.0000000074505814851022478251252323389053344726562500000000000000000000000000000000000000000000000000
0x32000002 = 0.0000000074505823732806675252504646778106689453125000000000000000000000000000000000000000000000000000
0x32000003 = 0.0000000074505832614590872253756970167160034179687500000000000000000000000000000000000000000000000000

const double dBLEPTable_8_BLKHAR[4096] = {
If you change the double in that line to float, then one of two things will happen:
At compile time, the compiler will convert each number (such as -0.00000000239150987901837200000000) to the float that best represents it, and will then store that data directly into the array.
At runtime, during the program initialization (before main() is called!) the runtime that the compiler generated will fill that array with data of type float.
Either way, once you get to main() and to code that you've written, all of that data will be stored as float variables.

Related

Iterating QList<double> changes list values

I'm trying to convert a double with 4 decimal places into a quint32, but when I iterate over the list, the values are different.
I added a breakpoint at the first iteration and inspected the variables. How can I make "i" be 112778?
EDIT:
This is the code:
QList<double> list;
list << 11.2778;
list << 11.3467;
list << 11.3926;
list << 11.4531;
list << 11.4451;
list << 11.4625;
list << 11.4579;
list << 11.4375;
list << 11.4167;
list << 11.6285;
list << 11.5625;
list << 11.4427;
list << 11.4278;
list << 11.4063;
list << 11.2500;
for(double value : list)
{
double v = value * 10000;
quint32 i = v;
qDebug() << v << i;
}
I was expecting the numbers to be converted to quint32 without the fractional part, but that's not the result.
This is just a question of floating point precision in C++, and there are a lot of existing SO questions on the topic. The problem, I think, arises from the fact that 11.2778 * 10000 might not be calculated as exactly 112778. It might come out as 112777.999999999, or similar. Converting to an integer type doesn't round to the nearest integer; it just truncates everything after the decimal point. That's how you can end up with 112777. To fix this, you can simply force it to round:
for(double value : list)
{
double v = value * 10000;
quint32 i = qRound(v); // Round the double to get the best int
qDebug() << value << v << i;
}
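For readers without Qt, the same idea can be sketched with the standard library. std::llround is used here as a stand-in for qRound; toScaled and toScaledTruncated are hypothetical names, and the sketch assumes non-negative inputs that fit in 32 bits after scaling.

```cpp
#include <cmath>
#include <cstdint>

// Scales a value with up to four decimal places to an integer,
// rounding to the nearest integer instead of truncating.
std::uint32_t toScaled(double value)
{
    return static_cast<std::uint32_t>(std::llround(value * 10000.0));
}

// Truncating variant, equivalent to the plain assignment in the question.
// It may land one below the expected value when the product is
// slightly less than the exact decimal result.
std::uint32_t toScaledTruncated(double value)
{
    return static_cast<std::uint32_t>(value * 10000.0);
}
```

With the rounding variant, toScaled(11.2778) gives 112778 even if the intermediate product is a hair below that value.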
I printed value too, as shown below.
qDebug() << value << v << i;
The output is below.
11.2778 112778 112777
11.3467 113467 113467
11.3926 113926 113926
11.4531 114531 114530
11.4451 114451 114451
11.4625 114625 114625
11.4579 114579 114579
11.4375 114375 114375
11.4167 114167 114167
11.6285 116285 116285
11.5625 115625 115625
11.4427 114427 114427
11.4278 114278 114278
11.4063 114063 114063
11.25 112500 112500
Do you mean that the last digit is different? If so, the last digit may differ because decimal numbers are not stored in memory digit by digit.

Multiplication of large numbers yields wrong value

I have the code
long long x = 200000 * 200000;
cout << x << endl;
it outputs 1345294336
I've tried converting to a string and outputting each digit, and it still outputs the same thing
Try
long long x = 200000LL * 200000LL;
std::cout << x << std::endl;
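Note the "LL" suffix. Without it, both literals have type int, so the multiplication is carried out in int and overflows before the result is converted to long long. To read more about using suffixes for numeric literals, visit the Integer Literals page on cppreference.com.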
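When the operands are variables rather than literals, a suffix is not available, so one operand can be widened with a cast instead. A minimal sketch, with squareWide as a hypothetical helper name:

```cpp
// Multiplies n by itself in long long arithmetic.
// Casting one operand is enough: the other is converted to match
// before the multiplication takes place.
long long squareWide(int n)
{
    return static_cast<long long>(n) * n;
}
```

The cast has to apply to an operand, not to the finished product: static_cast<long long>(n * n) would still multiply in int first.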

Calculate using int and output float?

//findSlope(twoPoints).exe
//finding the slope of line AB, using coordinates of point A and B.
#include <iostream>
int main()
{
int a, b, c, d;
float answer;
std::cout << "The X coordinate of A: ";
std::cin >> a;
std::cout << "\nThe Y coordinate of A: ";
std::cin >> b;
std::cout << "\nThe X coordinate of B: ";
std::cin >> c;
std::cout << "\nThe Y coordinate of B: ";
std::cin >> d;
std::cout << "\nThe slope of line AB = " << std::endl;
answer = (b-d)/(a-c);
std::cout.setf(std::ios::fixed);
std::cout.precision(3);
std::cout << answer << std::endl;
//alternative= std::cout << fixed << setprecision(#) << answer << std::endl;
std::cout.unsetf(std::ios::fixed);
return 0;
}
I am learning C++ and I tried to code a program that calculates the slope using the coordinates of two points.
I understand that if I use float for variables I declared for the coordinates, the result of the calculation would output as float with decimals. However, I wonder if I may still use int for user input so that I can ensure the inputs are integers.
Extra question: Would it be possible to convert a float presented in the form of "#.##" to "# #/#"? More like how we do mathematics IRL.
You can use implicit conversion to double:
answer = (b-d)/(a-c*1.0);
Or explicit cast:
answer = (b-d)/(a-(float)c);
Bonuses:
for the fraction part: Converting decimal to fraction c++
Why does integer division result in an integer?
You can use int for user input, but to precisely calculate anything that contains a division operator /, you'll need to cast to floating point types.
It's usually considered a good practice in C++ to use static_cast for that (although you still may use c-style (float) syntax).
For example:
answer = static_cast<float>(b - d) / (a - c);
Here, you convert (b - d) to float and then divide it by an integer, which results in a float.
Note that the following wouldn't work correctly:
answer = static_cast<float>((b - d) / (a - c));
The reason is that you first divide an int by another int and then convert the resulting int to a float.
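Put together as a function, the cast-before-divide approach might look like the sketch below. The name slope and its parameter names are hypothetical, and double is used for the result; the caller is assumed to make sure the two X coordinates differ.

```cpp
// Slope of the line through A(ax, ay) and B(bx, by).
// The inputs stay int; the numerator is converted to double
// before the division, so the division is done in floating point.
// Assumes ax != bx (a vertical line has no finite slope).
double slope(int ax, int ay, int bx, int by)
{
    return static_cast<double>(ay - by) / (ax - bx);
}
```

For A(1, 2) and B(3, 3) this gives 0.5, whereas the all-integer expression (2 - 3) / (1 - 3) evaluates to 0.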
P.S. float carries only about 7 significant decimal digits (double carries about 15 to 16), so I would advise using double instead of float in all cases except where you want to write faster code that does not depend on mathematical accuracy (even though I'm not sure it would be faster on modern processors) or need to maintain compatibility with an existing library that uses float for some of its functions.

How to express large numbers to two decimal places in C++ Calculator

I am trying to write a calculator in C++ that does the basic functions of /, *, -, or + and shows the answer to two decimal places (with 0.01 precision).
For example 100.1 * 100.1 should print the result as 10020.01 but instead I get -4e-171. From my understanding this is from overflow, but that's why I chose long double in the first place!
#include <iostream>
#include <iomanip>
using namespace std;
long double getUserInput()
{
cout << "Please enter a number: \n";
long double x;
cin >> x;
return x;
}
char getMathematicalOperation()
{
cout << "Please enter which operator you want "
"(add +, subtract -, multiply *, or divide /): \n";
char o;
cin >> o;
return o;
}
long double calculateResult(long double nX, char o, long double nY)
{
// note: we use the == operator to compare two values to see if they are equal
// we need to use if statements here because there's no direct way
// to convert chOperation into the appropriate operator
if (o == '+') // if user chose addition
return nX + nY; // execute this line
if (o == '-') // if user chose subtraction
return nX - nY; // execute this line
if (o == '*') // if user chose multiplication
return nX * nY; // execute this line
if (o == '/') // if user chose division
return nX / nY; // execute this line
return -1; // default "error" value in case user passed in an invalid chOperation
}
void printResult(long double x)
{
cout << "The answer is: " << setprecision(0.01) << x << "\n";
}
long double calc()
{
// Get first number from user
long double nInput1 = getUserInput();
// Get mathematical operations from user
char o = getMathematicalOperation();
// Get second number from user
long double nInput2 = getUserInput();
// Calculate result and store in temporary variable (for readability/debug-ability)
long double nResult = calculateResult(nInput1, o, nInput2);
// Print result
printResult(nResult);
return 0;
}
setprecision takes the number of digits as an int, so you're actually calling setprecision(0), since 0.01 gets truncated. In your case you want it set to 2. You should also use std::fixed, or you may get scientific notation.
void printResult(long double x)
{
cout << "The answer is: " << std::fixed << setprecision(2) << x << "\n";
}
working example
The strange result is not due to overflow. A double can easily hold numbers in the range you are showing.
Try to print the result without setprecision.
EDIT:
After trying
long double x = 100.1;
cout << x << endl;
I see that it doesn't work on my Windows system.
So I searched a little and found:
print long double on windows
maybe that is the explanation.
So I tried
long double x = 100.1;
cout << (double)x << endl;
which worked fine.
2nd EDIT:
Also see this link provided by Raphael
http://oldwiki.mingw.org/index.php/long%20double
The default floating point presentation switches automatically between presentation like 314.15 and 3.1e2, depending on the size of the number and the maximum number of digits it can use. With this presentation the precision is the maximum number of digits. By default it's 6.
You can either increase the maximum number of digits so that your result can be presented like 314.15, or you can force such fixed point notation by using the std::fixed manipulator. With std::fixed the precision is the number of decimals.
However, with std::fixed very large and very small numbers may be pretty unreadable.
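The difference between the two presentations can be seen by formatting into a string. This is a sketch only: formatDefault and formatFixed are hypothetical helper names, and double is used rather than long double to sidestep the platform issue mentioned above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats x with the stream's default presentation (precision 6,
// meaning at most 6 significant digits).
std::string formatDefault(double x)
{
    std::ostringstream out;
    out << x;
    return out.str();
}

// Formats x in fixed notation with the given number of decimals.
std::string formatFixed(double x, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << x;
    return out.str();
}
```

For example, formatDefault(314.15) yields "314.15", formatDefault(1234567.0) switches to "1.23457e+06", and formatFixed(100.1 * 100.1, 2) yields "10020.01".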
The setprecision() manipulator specifies the precision; when std::fixed is also in effect, that is the number of digits after the decimal point. So, if you want a value such as 100.01 to be printed, use std::fixed together with setprecision(2).
When you use setprecision(0.01), the value 0.01 is being converted to int, which will have a value of 0.
It wouldn't have hurt if you had actually read the documentation for setprecision() - that clearly specifies an int argument, not a floating point one.

print X number after the decimal point using the cout

I have this code:
double a = 7.456789;
cout.unsetf(ios::floatfield);
cout.precision(5);
cout << a;
and also this one:
double a = 798456.6;
cout.unsetf(ios::floatfield);
cout.precision(5);
cout << a;
The result of the first snippet is: 7.4568
which is almost what I want (what I want to receive is 7.4567).
The result of the second is: 7.9846e+05
which is not at all what I want (I want 798456.6).
I want to print the number with 4 digits after the decimal point.
How can I do that?
By using unsetf(), you are telling cout to use its default formatting for floating-point values. Since you want an exact number of digits after the decimal point, you should be using setf(std::ios::fixed, std::ios::floatfield) or the std::fixed manipulator instead, e.g.:
double a = ...;
std::cout.setf(std::ios::fixed, std::ios::floatfield);
std::cout.precision(5);
std::cout << a;
or:
double a = ...;
std::cout.precision(5);
std::cout << std::fixed << a;
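Since the question asks for four digits after the decimal point, and for 7.4567 rather than the rounded 7.4568, here is a hedged sketch of both behaviours. fixed4 and truncated4 are hypothetical helper names; the truncating variant is an assumption about what the asker wants and is not part of the answer above.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Fixed notation with four decimals; the last digit is rounded.
std::string fixed4(double x)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(4) << x;
    return out.str();
}

// Drops everything past the fourth decimal before printing.
// Values whose scaled product falls just below an integer
// may come out one unit lower than expected.
std::string truncated4(double x)
{
    return fixed4(std::trunc(x * 10000.0) / 10000.0);
}
```

With these helpers, fixed4(7.456789) gives "7.4568", fixed4(798456.6) gives "798456.6000", and truncated4(7.456789) gives "7.4567".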