C++ Cout floating point problem - c++

#include <iostream>
using namespace std;
int main()
{
float s;
s = 10 / 3;
cout << s << endl;
cout.precision(4);
cout << s << endl;
return 0;
}
Why does the output show only 3 instead of 3.333?

Because you are doing integer division with s = 10 / 3.
Try
s = 10.0f / 3.0f;

The correct way to do a constant float division is:
s = 10.f / 3.f; // one of the operands must be a float
Without the f suffix, the literals are double, so the division is done in double and assigning the result to a float may give a warning (conversion from double to float).
You can also cast one of the operands:
s = static_cast<float>(10) / 3; // use static_cast, not C-style casts
Resulting in the correct division.

10/3 is integer division. You need to use 10.0/3 or (float)10/3 or 10/3.0, etc.
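To make the difference concrete, here is a minimal sketch that formats both quotients with a precision of 4, as the question does. The helper names (format_with_precision, integer_quotient, float_quotient) are made up for this illustration and do not come from the original code.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a value with the given number of significant digits, the same
// way cout.precision(digits) affects later output on cout.
std::string format_with_precision(float value, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// 10 / 3 is evaluated in int first; only the result 3 is converted to float.
float integer_quotient() { return 10 / 3; }

// Making one operand a float switches the whole division to float.
float float_quotient() { return 10.0f / 3; }
```

With a precision of 4, the first quotient should print as 3 and the second as 3.333.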

Related

Float operations using double

I have a function that takes two strings (each holding a floating-point number), an operation, and the floating-point bit width:
EvaluateFloat(const string &str1, const string &str2, enum operation /* add, subtract, multiply, div */, unsigned int bit_width, string &output)
The inputs str1 and str2 could be float (32 bit) or double (64 bit).
Is it fine if I store the inputs in double and perform the operation in double irrespective of the bit width, then convert the result to float if the bit width was 32?
e.g.
double num1 = std::stod(str1);
double num2 = std::stod(str2);
double result = num1 operation num2; //! operation will be resolved using a switch
if (32 == bit_width)
{
    float f_result = static_cast<float>(result);
    output = std::to_string(f_result);
}
else
{
    output = std::to_string(result);
}
Can I safely assume that f_result will be exactly the same as if I had performed the operation in float, i.e.
float f_num1 = num1;
float f_num2 = num2;
float f_result = f_num1 operation f_num2;
PS:
We assume there won't be any cascaded operation such as out = a + b + c;
instead it will be transformed to: temp = a + b; out = temp + c;
I'm not concerned about inf and nan values.
I'm trying to avoid code redundancy; otherwise I have to write the same operation
twice, once for float and once for double.
C++ does not specify which formats are used for float or double. If IEEE-754 binary32 and binary64 are used, then double-rounding errors do not occur for +, -, *, /, or sqrt. Given float x and float y, the following hold (float arithmetic on the left, double on the right):
x+y = (float) ((double) x + (double) y).
x-y = (float) ((double) x - (double) y).
x*y = (float) ((double) x * (double) y).
x/y = (float) ((double) x / (double) y).
sqrt(x) = (float) sqrt((double) x).
This is per the dissertation A Rigorous Framework for Fully Supporting the IEEE Standard for Floating-Point Arithmetic in High-Level Programming Languages by Samuel A. Figueroa del Cid, January 2000, New York University. Essentially, double has so many digits (bits) beyond float that the rounding to double never conceals the information needed to round correctly to float for results of these operations. (This cannot hold for operations in general; it depends on properties of these operations.) On page 57, Figueroa del Cid gives a table showing that, if the float format has p bits, then, to avoid double rounding errors, double must have 2p+1 bits for addition or subtraction, 2p for multiplication and division, and 2p+2 for sqrt. Since binary32 has 24 bits in the significand and double has 53, these are satisfied. (See the paper for details. There are some caveats, such as that p must be at least 2 or 4 for the various operations.)
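The bit-count condition can be checked at compile time. The following is only a sketch: it tests the largest of the three requirements quoted above (2p+2) together with is_iec559, and the names double_rounding_is_harmless and divide_via_double are invented for the example.

```cpp
#include <limits>

// True when Narrow and Wide both report IEEE-754 conformance and Wide has at
// least 2p + 2 significand bits, where p is the significand width of Narrow.
template <class Narrow, class Wide>
constexpr bool double_rounding_is_harmless() {
    return std::numeric_limits<Narrow>::is_iec559 &&
           std::numeric_limits<Wide>::is_iec559 &&
           std::numeric_limits<Wide>::digits >=
               2 * std::numeric_limits<Narrow>::digits + 2;
}

// Division carried out in double and narrowed to float once at the end.
inline float divide_via_double(float x, float y) {
    return static_cast<float>(static_cast<double>(x) / static_cast<double>(y));
}
```

On a platform where float is binary32 and double is binary64, double_rounding_is_harmless<float, double>() should be true (53 >= 2 * 24 + 2), and divide_via_double(x, y) should then agree with x / y computed directly in float.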
According to the standard, a floating-point operation on doubles is equivalent to doing the operation in infinite precision and then rounding the result to double. If we then convert it to float, we have rounded twice. In general this is not equivalent to rounding to float in the first place. For example, 0.47 rounds to 0.5, which rounds to 1, but 0.47 rounds directly to 0. As mentioned by chtz, the product of two floats is always exactly representable as a double (using IEEE math, where double has more than twice the precision of float), so when we cast to float we have only lost precision once and the result should be the same. Likewise, addition and subtraction should not be a problem.
The result of a division cannot always be represented exactly in a double (not even 1/3), so we may think there is a problem with division. However, I have run the sample code below overnight, trying over 3 trillion cases, and have not found any case where doing the divide in double gives a different answer.
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
int main() {
    long long i = 0;
    while (true) {
        float x = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
        float y = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
        float f = x / y;
        double d = static_cast<double>(x) / static_cast<double>(y);
        // progress report every 10 million iterations
        if (++i % 10000000 == 0) { std::cout << i << "\t" << x << "," << y << std::endl; }
        if (static_cast<float>(d) != f) {
            // copy the raw bits out with memcpy; casting the pointer is undefined behaviour
            std::uint32_t xbits, ybits;
            std::memcpy(&xbits, &x, sizeof xbits);
            std::memcpy(&ybits, &y, sizeof ybits);
            std::cout << std::endl;
            std::cout << x << "," << y << std::endl;
            std::cout << std::hex << xbits << "," << ybits << std::endl;
            std::cout << static_cast<float>(d) - f << std::endl;
            return 1;
        }
    }
}

Why are the decimals absent in fractional number even after type casting from int to double?

From what I've gathered, assigning a fraction to a double won't work properly unless either the numerator or the denominator is a floating-point number (by "not working properly" I mean that the decimals get cut off; I know that numbers can't be stored as fractions, of course). However, I've tried casting ints to doubles before assigning them to another double variable, and it still doesn't work. It's not a big deal since I just had to do a minor workaround, but why is this the case?
Here is some code I wrote while testing.
#include <iostream>
using namespace std;
double convert(int v) {
return v;
}
int main() {
int a = 5;
int b = 2;
double n;
n = convert(a) / convert(b);
cout << n << endl; // Decimals are stored
a = static_cast<double> (a);
b = static_cast<double> (b);
n = a / b;
cout << n << endl; // Decimals are cut off
a = (double) a;
b = (double) b;
n = a / b;
cout << n << endl; // Decimals are cut off
double c = a;
double d = b;
n = c / d;
cout << n << endl; // Decimals are stored
return 0;
}
Output:
2.5
2
2
2.5
Because
a / b;
is integer division (because both operands are int), i.e. the result is an integer. Whether that result is then assigned to a double or anything else is irrelevant to how it is calculated.
Because of integer division.
n = a / b;
Here a and b are integers, so the result is also an integer. This is a rule of C++, so 5/2 == 2. The integer 2 then gets converted to a double, which prints as 2.
int a = 5;
a = static_cast<double> (a);
The first line creates an int variable named a and puts the value 5 in it. The second line explicitly converts the value of a to a double, then stores that converted value in a. However, a has type int, so there is an implicit conversion to int. That is, the second line is functionally equivalent to:
a = static_cast<int> ( static_cast<double> (a) );
So by the time you get to the division, you are back to integer arithmetic. To get the conversion to floating point to "stick" through your division, you need to avoid throwing it away. You could either assign the converted value to a new variable, as in
double aa = static_cast<double> (a);
or do the conversion in the same expression as the division
n = static_cast<double>(a) / b;
n = a / static_cast<double>(b);
n = static_cast<double>(a) / static_cast<double>(b);
Any of these three alternatives will trigger floating-point division.
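One way to see this is to ask the compiler for the type of each expression. This is a small sketch; truncated_ratio and exact_ratio are hypothetical names used only here.

```cpp
#include <type_traits>

// The type of a division is fixed by its operand types, not by the variable
// that later receives the result.
static_assert(std::is_same<decltype(5 / 2), int>::value,
              "int / int is int");
static_assert(std::is_same<decltype(static_cast<double>(5) / 2), double>::value,
              "double / int is double");

// The division happens in int; the truncated result is then widened.
double truncated_ratio(int a, int b) { return a / b; }

// Converting one operand inside the expression makes the division floating-point.
double exact_ratio(int a, int b) { return static_cast<double>(a) / b; }
```

For a = 5 and b = 2, truncated_ratio gives 2 and exact_ratio gives 2.5, matching the output in the question.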

Ensure float to be smaller than exact value

I want to calculate a sum of the following form in C++
float result = float(x1)/y1+float(x2)/y2+....+float(xn)/yn
xi,yi are all integers. The result will be an approximation of the actual value. It is crucial that this approximation is smaller or equal to the actual value. I can assume that all my values are finite and positive.
I tried using nextafterf(x, 0) as in this code snippet.
cout.precision( 15 );
float a = 1.0f / 3.0f * 10; //3 1/3
float b = 2.0f / 3.0f * 10; //6 2/3
float af = nextafterf( a , 0 );
float bf = nextafterf( b , 0 );
cout << a << endl;
cout << b << endl;
cout << af << endl;
cout << bf << endl;
float sumf = 0.0f;
for ( int i = 1; i <= 3; i++ )
{
sumf = sumf + bf;
}
sumf = sumf + af;
cout << sumf << endl;
As one can see, the correct result would be 3 * 6.666... + 3.333... = 23.333...
But as output I get:
3.33333349227905
6.66666698455811
3.33333325386047
6.66666650772095
23.3333339691162
Even though my summands are smaller than what they should represent, their sum is not. In this case applying nextafterf to sumf will give me 23.3333320617676 which is smaller. But does this always work? Is it possible that the rounding error gets so big that nextafterf still leaves me above the correct value?
I know that I could avoid this by implementing a class for fractions and calculating everything exactly. But I'm curious whether it is possible to achieve my goal with floats.
Try changing the float rounding mode to FE_TOWARDZERO.
See code example here:
Change floating point rounding mode
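A rough sketch of that idea using the standard <cfenv> functions follows. It assumes all inputs are positive and at most 2^24 in magnitude, so the int to float conversions are exact; a larger denominator would have to be rounded up instead. The function name sum_of_ratios_rounded_down is made up. Some compilers need extra options (for example -frounding-math on GCC, or #pragma STDC FENV_ACCESS ON where it is supported) before they fully respect a changed rounding mode, so treat this as an illustration rather than a guarantee.

```cpp
#include <cfenv>
#include <cstddef>

// Sums x[i] / y[i] in float with division and addition rounded toward zero.
// Assumption: inputs are positive and convert to float exactly.
float sum_of_ratios_rounded_down(const int* x, const int* y, std::size_t n) {
    const int previous_mode = std::fegetround();
    std::fesetround(FE_TOWARDZERO);

    // volatile discourages the compiler from folding the arithmetic or moving
    // it outside the region where the rounding mode is changed.
    volatile float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        volatile float numerator = static_cast<float>(x[i]);
        volatile float denominator = static_cast<float>(y[i]);
        volatile float quotient = numerator / denominator;
        sum = sum + quotient;
    }

    std::fesetround(previous_mode);  // restore the caller's rounding mode
    return sum;
}
```

For the values in the question (three times 20/3 plus 10/3), the returned sum should not exceed 70/3 when the rounding mode is honoured.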
My immediate reaction is that the approach you're taking is fundamentally flawed.
The problem is that with floating point numbers, the size of step that nextafter will take will depend on the magnitude of the numbers involved. Let's consider a somewhat extreme example:
#include <iostream>
#include <iomanip>
#include <cmath>
int main() {
float num = 1.0e-10f;
float denom = 1.0e10f;
std::cout << std::setprecision(7) << num - std::nextafterf(num, 0) << "\n";
std::cout << std::setprecision(7) << denom - std::nextafterf(denom, 0) << "\n";
}
Result:
6.938894e-018
1024
So, since the numerator is a lot smaller than the denominator, the increment is also much smaller.
The result seems fairly clear: instead of the result being slightly smaller than the input, the result should be quite a bit larger than the input.
If you want to ensure the result is smaller than the correct number, the obvious choice would be to round the numerator down, but the denominator up (i.e. nextafterf(denom, positive_infinity)). This way, you get a smaller numerator and a larger denominator, so the result is always smaller than the un-modified version would have been.
float result = float(x1)/y1+float(x2)/y2+....+float(xn)/yn has 3 places where rounding may occur:
Conversion of int to float, which is not always exact.
Division: floating point x / floating point y.
Addition: floating point quotient + floating point quotient.
By stepping to the next representable value with nextafterf after each of these (up or down, as the equation needs), the result will certainly be less than the exact mathematical value. This approach may not produce the float closest to the exact answer, but it will be close and certainly smaller.
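#include <cfloat>
#include <cmath>
#include <cstddef>
float foo(const int *x, const int *y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i++) { // assume x[0] is x1, x[1] is x2 ...
        // step the numerator down and the denominator up
        float fx = std::nextafterf(static_cast<float>(x[i]), 0.0f);
        float fy = std::nextafterf(static_cast<float>(y[i]), FLT_MAX);
        // divide slightly smaller by slightly larger, then step the quotient down
        float q = std::nextafterf(fx / fy, 0.0f);
        // step the running sum down after every addition
        sum = std::nextafterf(sum + q, 0.0f);
    }
    return sum;
}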

Rounding double values in C++ like MS Excel does it

I've searched all over the net, but I could not find a solution to my problem. I simply want a function that rounds double values like MS Excel does. Here is my code:
#include <iostream>
#include <cmath>   // floor, pow
#include <cstdlib> // system, EXIT_SUCCESS
using namespace std;
double Round(double value, int precision) {
return floor(((value * pow(10.0, precision)) + 0.5)) / pow(10.0, precision);
}
int main(int argc, char *argv[]) {
/* The way MS Excel does it:
1.27815 1.27840 -> 1.27828
1.27813 1.27840 -> 1.27827
1.27819 1.27843 -> 1.27831
1.27999 1.28024 -> 1.28012
1.27839 1.27866 -> 1.27853
*/
cout << Round((1.27815 + 1.27840)/2, 5) << "\n"; // *
cout << Round((1.27813 + 1.27840)/2, 5) << "\n";
cout << Round((1.27819 + 1.27843)/2, 5) << "\n";
cout << Round((1.27999 + 1.28024)/2, 5) << "\n"; // *
cout << Round((1.27839 + 1.27866)/2, 5) << "\n"; // *
if(Round((1.27815 + 1.27840)/2, 5) == 1.27828) {
cout << "Hurray...\n";
}
system("PAUSE");
return EXIT_SUCCESS;
}
I found the function here on Stack Overflow; the answer states that it works like Excel's built-in rounding routine, but it does not. Could you tell me what I'm missing?
In a sense what you are asking for is not possible:
Floating point values on most common platforms do not have a notion of a "number of decimal places". Numbers like 2.3 or 8.71 simply cannot be represented precisely. Therefore, it makes no sense to ask for any function that will return a floating point value with a given number of non-zero decimal places -- such numbers simply do not exist.
The only thing you can do with floating point types is to compute the nearest representable approximation, and then print the result with the desired precision, which will give you the textual form of the number that you desire. To compute the representation, you can do this:
double round(double x, int n)
{
    int e;
    double d;
    std::frexp(x, &e);
    // |x| >= 2^52 leaves no fractional bits (assuming a 53-bit significand):
    // the number is an integer, nothing to do
    if (e > 52) return x;
    double const f = std::pow(10.0, n);
    std::modf(x * f, &d); // d == integral part of 10^n * x
    return d / f;
}
(You can also use modf instead of frexp to determine whether x is already an integer. You should also check that n is non-negative, or otherwise define semantics for negative "precision".)
Alternatively to using floating point types, you could perform fixed point arithmetic. That is, you store everything as integers, but you treat them as units of, say, 1/1000. Then you could print such a number as follows:
std::cout << n / 1000 << "." << std::setw(3) << std::setfill('0') << n % 1000; // needs <iomanip>; pads e.g. 1005 to "1.005"
Addition works as expected, though you have to write your own multiplication function.
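A minimal sketch of such a fixed-point type, using units of 1/1000 as in the example above. The type Milli and the functions add, multiply and format_milli are hypothetical names; only non-negative values are handled, and multiplication simply truncates.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Fixed-point value stored as a whole number of thousandths.
struct Milli {
    std::int64_t units;  // 1 unit == 1/1000
};

Milli add(Milli a, Milli b) { return {a.units + b.units}; }

// The raw product is in millionths, so divide by 1000 to get back to
// thousandths; integer division truncates the extra digits.
Milli multiply(Milli a, Milli b) { return {a.units * b.units / 1000}; }

// Formats a non-negative value with exactly three zero-padded decimals.
std::string format_milli(Milli m) {
    std::ostringstream out;
    out << m.units / 1000 << '.'
        << std::setw(3) << std::setfill('0') << m.units % 1000;
    return out.str();
}
```

For instance, adding Milli{1250} and Milli{2005} should format as 3.255, and multiplying Milli{1500} by Milli{2000} as 3.000.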
To compare double values, you must specify a tolerance within which two values are considered equal. You could use a macro for that.
Here is one example of what you could use:
#define COMPARE( A, B, PRECISION ) ( ( (A) >= (B) - (PRECISION) ) && ( (A) <= (B) + (PRECISION) ) )
int main()
{
double a = 12.34567;
bool equal = COMPARE( a, 12.34567F, 0.0002 );
equal = COMPARE( a, 15.34567F, 0.0002 );
return 0;
}
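The same tolerance check can also be written as an ordinary function, which evaluates each argument once and is type-checked. This is a sketch of an alternative, not part of the answer above; nearly_equal is a name chosen for the example.

```cpp
#include <cmath>

// Returns true when a and b differ by at most `tolerance`.
inline bool nearly_equal(double a, double b, double tolerance) {
    return std::fabs(a - b) <= tolerance;
}
```

Called with the values from the macro example, nearly_equal(12.34567, 12.34567F, 0.0002) should be true and nearly_equal(12.34567, 15.34567F, 0.0002) false.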
Thank you all for your answers! After considering the possible solutions I changed the original Round() function in my code to adding 0.6 instead of 0.5 to the value.
The value "127827.5" (I do understand that this is not an exact representation!) becomes "127828.1" and finally through floor() and dividing it becomes "1.27828" (or something more like 1.2782800..001). Using COMPARE suggested by Renan Greinert with a correctly chosen precision I can safely compare the values now.
Here is the final version:
#include <iostream>
#include <cmath>   // floor, pow
#include <cstdlib> // system, EXIT_SUCCESS
#define COMPARE(A, B, PRECISION) (((A) >= (B) - (PRECISION)) && ((A) <= (B) + (PRECISION)))
using namespace std;
double Round(double value, int precision) {
return floor(value * pow(10.0, precision) + 0.6) / pow(10.0, precision);
}
int main(int argc, char *argv[]) {
/* The way MS Excel does it:
1.27815 1.27840 -> 1.27828
1.27813 1.27840 -> 1.27827
1.27819 1.27843 -> 1.27831
1.27999 1.28024 -> 1.28012
1.27839 1.27866 -> 1.27853
*/
cout << Round((1.27815 + 1.27840)/2, 5) << "\n";
cout << Round((1.27813 + 1.27840)/2, 5) << "\n";
cout << Round((1.27819 + 1.27843)/2, 5) << "\n";
cout << Round((1.27999 + 1.28024)/2, 5) << "\n";
cout << Round((1.27839 + 1.27866)/2, 5) << "\n";
//Comparing the rounded value against a fixed one
if(COMPARE(Round((1.27815 + 1.27840)/2, 5), 1.27828, 0.000001)) {
cout << "Hurray!\n";
}
//Comparing two rounded values
if(COMPARE(Round((1.27815 + 1.27840)/2, 5), Round((1.27814 + 1.27841)/2, 5), 0.000001)) {
cout << "Hurray!\n";
}
system("PAUSE");
return EXIT_SUCCESS;
}
I've tested it by rounding a hundred double values and then comparing the results to what Excel gives. They were all the same.
I'm afraid the answer is that Round cannot perform magic.
Since 1.27828 is not exactly representable as a double, you cannot compare some double with 1.27828 and hope it will match.
You need to do the maths without the decimal part to get those numbers, so something like this:
double dPow = pow(10.0, 5.0);
double a = 1.27815;
double b = 1.27840;
double a2 = a * dPow;
double b2 = b * dPow;
double c = (a2 + b2) / 2 + 0.5;
Using your function...
double c = (Round(a) + Round(b)) / 2 + 0.5;
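One way to take the "without the decimal part" idea further is to convert each input to an integer count of 10^-5 units first and average in integer arithmetic, where a half is exactly a half. This is only a sketch with made-up names (to_scaled, rounded_average); it handles non-negative values and reproduces the five sample values from the question, but it makes no claim about matching Excel in general.

```cpp
#include <cmath>

// Converts a non-negative decimal such as 1.27815 to an integer number of
// 10^-5 units. llround absorbs the tiny representation error of the double.
long long to_scaled(double value) {
    return std::llround(value * 100000.0);
}

// Average of two non-negative scaled values, with exact halves rounded up.
long long rounded_average(long long a, long long b) {
    return (a + b + 1) / 2;
}
```

For 1.27815 and 1.27840 this gives 127828, which can be printed as 1.27828 without ever comparing inexact doubles.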

C++ floating point to integer type conversions

What are the different techniques used to convert float type of data to integer in C++?
#include <iostream>
using namespace std;
struct database {
int id, age;
float salary;
};
int main() {
struct database employee;
employee.id = 1;
employee.age = 23;
employee.salary = 45678.90;
/*
How can i print this value as an integer
(with out changing the salary data type in the declaration part) ?
*/
cout << endl << employee.id << endl << employee.age << endl << employee.salary << endl;
return 0;
}
What you are looking for is type casting. Type casting (putting the type you want in parentheses) tells the compiler that you know what you are doing and accept the consequences. The old way, inherited from C, is as follows.
float var_a = 9.99;
int var_b = (int)var_a;
If you had only tried to write
int var_b = var_a;
you would probably have got a warning about implicitly (automatically) converting a float to an int, because the fractional part is lost.
This is referred to as the old way as C++ offers a superior alternative, 'static cast'; this provides a much safer way of converting from one type to another. The equivalent method would be (and the way you should do it)
float var_x = 9.99;
int var_y = static_cast<int>(var_x);
This method may look a bit more long winded, but it provides much better handling for situations such as accidentally requesting a 'static cast' on a type that cannot be converted. For more information on the why you should be using static cast, see this question.
Normal way is to:
float f = 3.4;
int n = static_cast<int>(f);
Size of some float types may exceed the size of int.
This example shows a safe conversion of any float type to int using the int safeFloatToInt(const FloatType &num); function:
#include <iostream>
#include <limits>
using namespace std;
template <class FloatType>
int safeFloatToInt(const FloatType &num) {
//check whether the float type can hold values outside the int range
if ( numeric_limits<int>::digits < numeric_limits<FloatType>::max_exponent) {
// check if float is smaller than max int
if( (num < static_cast<FloatType>( numeric_limits<int>::max())) &&
(num > static_cast<FloatType>( numeric_limits<int>::min())) ) {
return static_cast<int>(num); //safe to cast
} else {
cerr << "Unsafe conversion of value:" << num << endl;
//NaN is not defined for int return the largest int value
return numeric_limits<int>::max();
}
} else {
//It is safe to cast
return static_cast<int>(num);
}
}
int main(){
double a=2251799813685240.0;
float b=43.0;
double c=23333.0;
//unsafe cast
cout << safeFloatToInt(a) << endl;
cout << safeFloatToInt(b) << endl;
cout << safeFloatToInt(c) << endl;
return 0;
}
Result:
Unsafe conversion of value:2.2518e+15
2147483647
43
23333
In most cases std::lround and std::llround are enough (long for float, long long for double and long double):
long a{ std::lround(1.5f) }; //2l
long long b{ std::llround(std::floor(1.5)) }; //1ll
Check out the boost NumericConversion library. It will allow to explicitly control how you want to deal with issues like overflow handling and truncation.
I believe you can do this using a cast:
float f_val = 3.6f;
int i_val = (int) f_val;
The easiest technique is to just assign the float to an int, for example:
int i;
float f;
f = 34.0098f;
i = f;
This truncates everything after the decimal point; alternatively, you can round the float before converting.
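The difference between truncating and rounding can be shown with two small helpers. The names truncate_to_int and round_to_long are invented for this sketch; std::lround is the standard <cmath> function.

```cpp
#include <cmath>

// Conversion by cast or assignment discards the fractional part
// (rounds toward zero).
int truncate_to_int(float value) { return static_cast<int>(value); }

// std::lround rounds to the nearest integer, with halfway cases rounded
// away from zero.
long round_to_long(float value) { return std::lround(value); }
```

So truncate_to_int(34.0098f) is 34 and truncate_to_int(-3.7f) is -3, while round_to_long(3.6f) is 4 and round_to_long(-2.5f) is -3.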
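One thing I want to add: sometimes there is precision loss, so a value that should be a whole number is stored as slightly less (for example 2.9999999 instead of 3), and truncation then yields the wrong integer. Adding a small epsilon value before converting compensates for that.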
int someint = (somedouble+epsilon);
This is one way to convert an IEEE 754 float to a 32-bit integer if you can't use floating-point operations. It also has a scaling parameter (scaler) to include more digits in the result. Useful values for scaler are 1, 10 and 100.
#define EXPONENT_LENGTH 8
#define MANTISSA_LENGTH 23
// to convert float to int without floating point operations
int ownFloatToInt(int floatBits, int scaler) {
int sign = (floatBits >> (EXPONENT_LENGTH + MANTISSA_LENGTH)) & 1;
int exponent = (floatBits >> MANTISSA_LENGTH) & ((1 << EXPONENT_LENGTH) - 1);
int mantissa = (floatBits & ((1 << MANTISSA_LENGTH) - 1)) | (1 << MANTISSA_LENGTH);
int result = mantissa * scaler; // possible overflow
exponent -= ((1 << (EXPONENT_LENGTH - 1)) - 1); // exponent bias
exponent -= MANTISSA_LENGTH; // modify exponent for shifting the mantissa
if (exponent <= -(int)sizeof(result) * 8) {
return 0; // underflow
}
if (exponent > 0) {
result <<= exponent; // possible overflow
} else {
result >>= -exponent;
}
if (sign) result = -result; // handle sign
return result;
}
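To obtain the floatBits argument and see what the three fields contain, something like the following can be used. It is a sketch under the assumption that float is a 32-bit IEEE 754 binary32 value; float_bits, FloatFields and decode are hypothetical names, and decode is only meaningful for normal numbers (not zero, denormals, infinities or NaN).

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of a float into a 32-bit integer.
// std::memcpy avoids the undefined behaviour of casting the pointer.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

struct FloatFields {
    unsigned sign;              // 1 for negative values
    int exponent;               // unbiased exponent
    std::uint32_t significand;  // 24 bits, implicit leading 1 restored
};

// Splits a normal binary32 value into sign, exponent and significand.
FloatFields decode(float value) {
    const std::uint32_t bits = float_bits(value);
    FloatFields fields;
    fields.sign = bits >> 31;
    fields.exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;
    fields.significand = (bits & 0x7FFFFFu) | 0x800000u;
    return fields;
}
```

For 9.75f the bit pattern is 0x411C0000, the unbiased exponent is 3, and shifting the significand right by 23 - 3 bits leaves the integer part 9.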