Converting int to float? - c++

Whenever I try to compile I get
24 [Warning] converting to int from float
83 [Warning] converting to int from float
int a , b = 8;
float c = 5.2;
float d = 8E3;
a = static_cast<float>(b) * c; // 24
cout << a << "\n";
cout << d << "\n";
int x, y, answer;
x = 7;
y = 9;
answer = 5;
answer *= (x + y);
cout << answer << "\n";
answer *= x + y;
cout << answer << "\n";
float m = 33.97;
answer += (x + y + m); // 83
cout << answer << "\n";
Any suggestions as to what I'm doing wrong?

a = static_cast<float>(b) * c;
a is an int, and the right-hand side of the assignment multiplies two floats, producing an intermediate float value. That value is then implicitly converted to an int, which causes the warning you are seeing.
Also:
answer += (x + y + m);
answer is an int type, and so are x and y, but m is float, again causing the intermediate result of the right-hand side to be a float.
These conversions will cause truncation of the fractional values of the float results. You can get rid of the warnings by explicitly casting to an int:
a = static_cast<int>(static_cast<float>(b) * c);
answer += static_cast<int>(x + y + m);
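If truncation is not what you want, rounding is the alternative. The sketch below contrasts the two; the function names truncated_product and rounded_product are made up for illustration, and it assumes the result fits in an int.

```cpp
#include <cmath>

// Truncates toward zero, just like the implicit conversion did,
// but the explicit cast tells the compiler the truncation is intended.
int truncated_product(int b, float c) {
    return static_cast<int>(static_cast<float>(b) * c);
}

// Rounds to the nearest integer instead of dropping the fraction.
int rounded_product(int b, float c) {
    return static_cast<int>(std::lround(static_cast<float>(b) * c));
}
```

With the values from the question (b = 8, c = 5.2f) the product is about 41.6, so truncated_product gives 41 and rounded_product gives 42.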

Well you're just getting a warning since the compiler is changing a floating-point value to an integer, thus truncating the result.
int a;
float f = 3.2;
a = f; // a is 3, a truncated 3.2

Your question seems to be about conversion from float to int, not from int to float.
Basically you are doing nothing wrong. It is only a warning, as the value will be truncated (and maybe you don't expect that). To tell the compiler that you really want to get an int out of a float, you can make the cast explicit, like this:
a = static_cast<int>(static_cast<float>(b) * c);
Then it will not warn you anymore.

Related

How to correct this arithmetic operation without the need to use fmod?

In c++ this code below shows an error:
expression must have integral or unscoped enum type
illegal left operand has type 'double'
is it possible to correct it without the need to use fmod?
# include <iostream>
using namespace std;
int main()
{
int x = 5, y = 6, z = 4;
float w = 3.5, c;
c = (y + w - 0.5) % x * y; // here is the error
cout << "c = " << c << endl;
return 0;
}
You can use type casting to fix it :
c = ((int) (y + w - 0.5)) % x * y;
To address your response in the comments: changing c to type int still doesn't work, because the subexpression (y + w - 0.5) is evaluated as double, not int, and the % operator does not accept floating-point operands.
Full modified code :
#include <iostream>
using namespace std;
int main()
{
int x = 5, y = 6, z = 4;
float w = 3.5, c; // c can still stay a float
c = ((int) (y + w - 0.5)) % x * y; //swapped out here
cout << "c = " << c << endl;
}
Output : c = 24.
To be clear here, this is only a temporary fix for this case, when you know (y + w - 0.5) is going to have a clear integer value. If the value is something like 0.5 or 1.447, std::fmod is desirable.
Here's a post on type conversion rules in an expression regarding interaction between float/double and int/long long : Implicit type conversion rules in C++ operators
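To see where the cast and std::fmod part ways, here is a small sketch. The helper names remainder_by_cast and remainder_by_fmod are hypothetical, and the cast version assumes the value fits in an int.

```cpp
#include <cmath>

// Cast-based remainder: the fractional part is discarded before % is applied.
int remainder_by_cast(double value, int divisor) {
    return static_cast<int>(value) % divisor;
}

// std::fmod keeps the fractional part of the dividend.
double remainder_by_fmod(double value, int divisor) {
    return std::fmod(value, static_cast<double>(divisor));
}
```

For 9.0 and 5 both return 4, which is the case in the question. For 9.5 and 5 the cast version still returns 4, while std::fmod returns 4.5.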

Float operations using double

I have a function which takes two strings (each holding a floating-point number), an operation, and a floating-point bit width:
EvaluateFloat(const string &str1, const string &str2, enum operation/*add,subtract, multiply,div*/, unsigned int bit_width, string &output)
The inputs str1 and str2 could be float (32 bit) or double (64 bit).
Is it fine if I store the inputs in double and perform the operation in double irrespective of the bit width, and then, if the bit width is 32, typecast the result to float?
e.g.
double num1 = atof(str1.c_str());
double num2 = atof(str2.c_str());
double result = num1 operation num2; //! operation will be resolved using a switch
if(32 == bit_width)
{
float f_result = result;
output = std::to_string(f_result);
}
else
{
output = std::to_string(result);
}
Can I safely assume that f_result will be exactly the same as if I had performed the operation using the float type, i.e.
float f_num1 = num1;
float f_num2 = num2;
float f_result = f_num1 operation f_num2;
PS:
We assume there won't be any cascaded operation, i.e. out = a + b + c
will instead be transformed to: temp = a + b; out = temp + c
I'm not concerned about inf and nan values.
I'm trying to avoid code redundancy; otherwise I have to do the same operation
twice, once for float and once for double.
C++ does not specify which formats are used for float or double. If IEEE-754 binary32 and binary64 are used, then double-rounding errors do not occur for +, -, *, /, or sqrt. Given float x and float y, the following hold (float arithmetic on the left, double on the right):
x+y = (float) ((double) x + (double) y).
x-y = (float) ((double) x - (double) y).
x*y = (float) ((double) x * (double) y).
x/y = (float) ((double) x / (double) y).
sqrt(x) = (float) sqrt((double) x).
This is per the dissertation A Rigorous Framework for Fully Supporting the IEEE Standard for Floating-Point Arithmetic in High-Level Programming Languages by Samuel A. Figueroa del Cid, January 2000, New York University. Essentially, double has so many digits (bits) beyond float that the rounding to double never conceals the information needed to round correctly to float for results of these operations. (This cannot hold for operations in general; it depends on properties of these operations.) On page 57, Figueroa del Cid gives a table showing that, if the float format has p bits, then, to avoid double rounding errors, double must have 2p+1 bits for addition or subtraction, 2p for multiplication and division, and 2p+2 for sqrt. Since binary32 has 24 bits in the significand and double has 53, these are satisfied. (See the paper for details. There are some caveats, such as that p must be at least 2 or 4 for the various operations.)
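A minimal sketch of the single-implementation approach this allows, assuming float and double are IEEE-754 binary32 and binary64 and that the compiler evaluates float expressions in float precision (no fast-math style options). The names Op, apply, in_float and via_double are invented for this example and are not from the question.

```cpp
enum class Op { Add, Sub, Mul, Div };

// One arithmetic routine shared by both precisions.
template <typename T>
T apply(T a, T b, Op op) {
    switch (op) {
        case Op::Add: return a + b;
        case Op::Sub: return a - b;
        case Op::Mul: return a * b;
        case Op::Div: return a / b;
    }
    return a; // not reached for valid Op values
}

// Reference: the operation carried out directly in float.
float in_float(float x, float y, Op op) {
    return apply<float>(x, y, op);
}

// The operation carried out in double, then narrowed to float once.
float via_double(float x, float y, Op op) {
    return static_cast<float>(apply<double>(x, y, op));
}
```

Under those assumptions in_float and via_double return the same value for a single operation, for example for 1.0f / 3.0f. The guarantee covers one operation at a time, which matches the no-cascading condition in the question.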
According to the IEEE standard, a floating-point operation on double is equivalent to performing the operation in infinite precision and then rounding the result to double. If we then convert it to float, we have rounded twice. In general this is not equivalent to rounding to float directly. For example, 0.47 rounds to 0.5, which rounds to 1, but 0.47 rounded directly gives 0. As mentioned by chtz, the product of two floats is always exactly representable as a double (using IEEE math where double has more than twice the precision of float), so when we cast to a float we have still only lost precision once and the result should be the same. Likewise addition and subtraction should not be a problem.
The result of a division cannot always be represented exactly in a double (not even 1/3), so we may think there is a problem with division. However, I have run the sample code below overnight, trying over 3 trillion cases, and have not found any case where performing the original division in double gives a different answer.
#include <cstdlib> // rand, RAND_MAX
#include <iostream>
int main() {
long i=0;
while (1) {
float x = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
float y = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
float f = x / y;
double d = (double)x / (double)y;
if(++i % 10000000 == 0) { std::cout << i << "\t" << x << "," << y << std::endl; }
if ((float(d) != f)) {
std::cout << std::endl;
std::cout << x << "," << y << std::endl;
std::cout << std::hex << *(int*)&x << "," << std::hex << *(int*)&y << std::endl;
std::cout << float(d) - f << std::endl;
return 1;
}
}
}

C++ double ptr to long ptr conversion

Consider the below code fragment:
double * p = new double[16];
int value = 1;
void * q = p;
*((double *)q) = value;
int x = ((long *)p)[0];
cout << "Value of x for double * to long * = " << x << endl;
*((int *)q) = value ;
x = ((long *)p)[0];
cout << "Value of x for int * to long * = " << x << endl;
Here the outputs are 0 and 1 respectively. Can anyone explain to me why?
Also if I directly access the value at pointer...ie. p[0], the value is correctly shown as 1 in both case. Why?
Integers are stored in memory in straight binary, so you can convert between int and long with no problem. A double is stored in a floating-point binary format in which some of the bits hold the mantissa and the others hold the exponent (similar to scientific notation, i.e. 5e2 = 500).
If you try to use the data for a double as the data for a long, it will not convert correctly, because the two types encode the value differently.
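The bit pattern can be inspected without pointer casts by copying the bytes with std::memcpy. This is only a sketch, assuming a 64-bit IEEE-754 double; bits_of and low_32_bits are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation of a double into a 64-bit integer.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Keeps only the 32 least significant bits of that pattern.
std::uint32_t low_32_bits(double d) {
    return static_cast<std::uint32_t>(bits_of(d));
}
```

For 1.0 the pattern is 0x3FF0000000000000: every set bit sits in the exponent field at the top and the lower 32 bits are all zero, which is consistent with the 0 printed in the first case. Reading p[0] as a double interprets the same bits with the floating-point rules and therefore shows 1.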

C++ floating point to integer type conversions

What are the different techniques used to convert float type of data to integer in C++?
#include <iostream>
using namespace std;
struct database {
int id, age;
float salary;
};
int main() {
struct database employee;
employee.id = 1;
employee.age = 23;
employee.salary = 45678.90;
/*
How can i print this value as an integer
(with out changing the salary data type in the declaration part) ?
*/
cout << endl << employee.id << endl << employee.
age << endl << employee.salary << endl;
return 0;
}
What you are looking for is 'type casting'. Typecasting (putting the type you want in parentheses) tells the compiler that you know what you are doing and accept the consequences. The old way, inherited from C, is as follows.
float var_a = 9.99;
int var_b = (int)var_a;
If you had only tried to write
int var_b = var_a;
you would likely have got a warning that implicitly (automatically) converting a float to an int loses the fractional part.
This is referred to as the old way as C++ offers a superior alternative, 'static cast'; this provides a much safer way of converting from one type to another. The equivalent method would be (and the way you should do it)
float var_x = 9.99;
int var_y = static_cast<int>(var_x);
This method may look a bit more long-winded, but it provides much better handling for situations such as accidentally requesting a 'static cast' on a type that cannot be converted. For more information on why you should be using static cast, see this question.
Normal way is to:
float f = 3.4;
int n = static_cast<int>(f);
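Note that static_cast<int> always truncates toward zero. If a different rounding rule is wanted, the <cmath> functions can be applied first. The wrappers below are illustrative names only, and all of them assume the value fits in an int.

```cpp
#include <cmath>

// Drops the fractional part (rounds toward zero).
int toward_zero(float f) { return static_cast<int>(f); }

// Rounds toward negative infinity.
int round_down(float f) { return static_cast<int>(std::floor(f)); }

// Rounds toward positive infinity.
int round_up(float f) { return static_cast<int>(std::ceil(f)); }

// Rounds to the nearest integer, halfway cases away from zero.
int to_nearest(float f) { return static_cast<int>(std::lround(f)); }
```

For the salary from the question, toward_zero(45678.90f) gives 45678 and to_nearest(45678.90f) gives 45679. The difference between truncating and flooring shows up for negative values: toward_zero(-2.7f) is -2, while round_down(-2.7f) is -3.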
Floating-point types can hold values far outside the range of int.
This example shows a safe conversion of any float type to int using the int safeFloatToInt(const FloatType &num); function:
#include <iostream>
#include <limits>
using namespace std;
template <class FloatType>
int safeFloatToInt(const FloatType &num) {
// The range check is always needed: even a 32-bit float can hold
// values far outside the range of int
if( (num < static_cast<FloatType>( numeric_limits<int>::max())) &&
(num > static_cast<FloatType>( numeric_limits<int>::min())) ) {
return static_cast<int>(num); //safe to cast
} else {
cerr << "Unsafe conversion of value:" << num << endl;
//NaN is not defined for int, so return the largest int value
return numeric_limits<int>::max();
}
}
int main(){
double a=2251799813685240.0;
float b=43.0;
double c=23333.0;
//unsafe cast
cout << safeFloatToInt(a) << endl;
cout << safeFloatToInt(b) << endl;
cout << safeFloatToInt(c) << endl;
return 0;
}
Result:
Unsafe conversion of value:2.2518e+15
2147483647
43
23333
For most cases (long for floats, long long for double and long double):
long a{ std::lround(1.5f) }; //2l
long long b{ std::llround(std::floor(1.5)) }; //1ll
Check out the Boost NumericConversion library. It lets you explicitly control how you want to deal with issues like overflow handling and truncation.
I believe you can do this using a cast:
float f_val = 3.6f;
int i_val = (int) f_val;
The easiest technique is to just assign the float to an int, for example:
int i;
float f;
f = 34.0098;
i = f;
This truncates everything after the decimal point; alternatively, you can round the float first.
One thing I want to add: sometimes there can be precision loss, so you may want to add a small epsilon value before converting. Not sure why that works... but it works.
int someint = (somedouble+epsilon);
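A likely reason the epsilon helps: a computed double can land just below the integer you expect, and truncation then drops it to the next lower integer. The sketch below illustrates this under the assumption of IEEE-754 doubles; the function names and the epsilon of 1e-9 are arbitrary choices for the example.

```cpp
#include <cmath>

// Multiplies by 100, e.g. to turn a price into hundredths.
double hundredths(double x) { return x * 100.0; }

// Plain truncation toward zero.
int truncate_to_int(double d) { return static_cast<int>(d); }

// Truncation after nudging the value upward by a small epsilon.
int truncate_with_epsilon(double d, double epsilon) {
    return static_cast<int>(d + epsilon);
}

// Rounding to nearest, which needs no hand-picked epsilon.
int nearest_int(double d) { return static_cast<int>(std::lround(d)); }
```

With IEEE-754 doubles, hundredths(0.29) is 28.999999999999996 rather than 29, so truncate_to_int gives 28, while truncate_with_epsilon with 1e-9 and nearest_int both give 29. The epsilon trick as written only nudges upward, so rounding is usually the more robust choice when negative values are possible.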
This is one way to convert an IEEE 754 float to a 32-bit integer if you can't use floating-point operations. It also has a scaler parameter to include more digits in the result. Useful values for scaler are 1, 10 and 100.
#define EXPONENT_LENGTH 8
#define MANTISSA_LENGTH 23
// to convert float to int without floating point operations
int ownFloatToInt(int floatBits, int scaler) {
int sign = (floatBits >> (EXPONENT_LENGTH + MANTISSA_LENGTH)) & 1;
int exponent = (floatBits >> MANTISSA_LENGTH) & ((1 << EXPONENT_LENGTH) - 1);
int mantissa = (floatBits & ((1 << MANTISSA_LENGTH) - 1)) | (1 << MANTISSA_LENGTH);
int result = mantissa * scaler; // possible overflow
exponent -= ((1 << (EXPONENT_LENGTH - 1)) - 1); // exponent bias
exponent -= MANTISSA_LENGTH; // modify exponent for shifting the mantissa
if (exponent <= -(int)sizeof(result) * 8) {
return 0; // underflow
}
if (exponent > 0) {
result <<= exponent; // possible overflow
} else {
result >>= -exponent;
}
if (sign) result = -result; // handle sign
return result;
}

C++ Cout floating point problem

#include <iostream>
using namespace std;
int main()
{
float s;
s = 10 / 3;
cout << s << endl;
cout.precision(4);
cout << s << endl;
return 0;
}
Why does the output show only 3 instead of 3.333?
Because you are doing integer division with s = 10 / 3.
Try
s = 10.0f / 3.0f;
The correct way to do a constant float division is:
s = 10.f / 3.f; // one of the operands must be a float
Without the f suffix, you are doing double division, and assigning the result to a float gives a warning (conversion from double to float).
You can also cast one of the operands:
s = static_cast<float>(10) / 3; // use static_cast, not C-style casts
Either way, the result is the correct floating-point division.
10/3 is integer division. You need to use 10.0/3 or (float)10/3 or 10/3.0, etc.
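The difference can be checked with a short sketch. The helper format_with_precision is a hypothetical stand-in for the cout.precision(4) call in the question; it writes to a string stream so the formatted text can be inspected.

```cpp
#include <sstream>
#include <string>

// Formats a float the way cout would after cout.precision(precision).
std::string format_with_precision(float value, int precision) {
    std::ostringstream out;
    out.precision(precision);
    out << value;
    return out.str();
}

// Both operands are int, so the division truncates before the conversion to float.
float integer_quotient() { return 10 / 3; }

// Both operands are float, so the fractional part is kept.
float float_quotient() { return 10.0f / 3.0f; }
```

Here format_with_precision(integer_quotient(), 4) yields "3", while format_with_precision(float_quotient(), 4) yields "3.333".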