I just wanted to ask what happens numerically if I do not cast integers to float when storing the result in a float variable, like this:
int32 IntVar1 = 100;
int32 IntVar2 = 200;
float FloatVar = IntVar1/IntVar2;
Currently I am doing this:
int32 IntVar1 = 100;
int32 IntVar2 = 200;
float FloatVar = float(IntVar1)/float(IntVar2);
But with the amount of code I have, this looks really clumsy. I thought about changing my int variables to float, but I guess that would be a performance hit. And since the integer values are not supposed to hold any decimals, it feels like a complete waste.
So I wonder, is there any way option 1 could work? Or do I have to typecast OR change the variables to float? (All the typecasting pretty much makes the code unreadable.)
I wouldn't worry too much about premature optimization. If it makes more sense for your values to be expressed as float types, go for it. If your program doesn't run as fast as you need, and you've profiled it and know that the floating point operations are the problem, then start thinking about how to speed it up.
I'd value readability over all of the casting, which seems to be your instinct as well.
Also, since this question is tagged C++, I think it's (unfortunately?) more idiomatic to do:
float FloatVar = static_cast<float>(IntVar1)/IntVar2;
Behold the magic of functions:
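float float_div(int x, int y)  // not called div, to avoid clashing with the C library's div(int, int) from <cstdlib>
{
    return float(x) / float(y);
}
Now you can say:
int32 IntVar1 = 100;
int32 IntVar2 = 200;
float FloatVar = float_div(IntVar1, IntVar2);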
You need at least one of those operands to be float, otherwise the division will be truncated. I usually cast the first operand:
float FloatVar = (float)IntVar1/IntVar2;
which, elegance-wise, isn't that bad.
As per the ISO/IEC C++ standard, draft N3797, section 5.6:
For integral operands the / operator yields the algebraic quotient with any fractional part discarded; if the quotient a/b is representable in the type of the result, (a/b)*b + a%b is equal to a; otherwise, the behavior of both a/b and a%b is undefined.
The discarding of the fractional part is called truncation towards zero.
So it is no wonder that the fractional part is discarded in 22/7, which yields 3.
I'm writing a program that calculates a result from two numbers:
#include <iostream>
using namespace std;

int main()
{
    float a, b;
    cin >> a >> b;
    float result = b + a * a * 0.4;
    cout << result;
}
but I get a warning at a * a saying: Warning C26451 Arithmetic overflow: Using operator '*' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '*' to avoid overflow (io.2). Sorry if this is a newbie question; can anyone help me with this? Thank you!
In the C language as described in the first edition of K&R, all floating-point operations were performed by converting operands to a common type (specifically double), performing operations with that type, and then if necessary converting the result to whatever type was needed. On many platforms, that was the most convenient and space-efficient way of handling floating-point math. While the Standard still allows implementations to behave that way, it also allows them to perform floating-point operations on smaller types using those types directly.
As written, the subexpression a * a * 0.4 (taken here as a * a * 0.5 to keep the arithmetic simple) would be performed by multiplying a * a together using type float, then multiplying by 0.5, which is of type double. This latter multiplication would require converting the float result of a * a to double. If, e.g., a had been equal to 2E19f, then performing the multiplication using type float would yield a value too large to be represented in that type. Had the code instead performed the multiplication using type double, the result 4E38 would be representable in that type, and the result of multiplying that by 0.5 (i.e. 2E38) would be within the range representable by float.
Note that in this particular situation, the use of float for the intermediate computations would only affect the result if the magnitude of a were very large or very small. If instead of multiplying by 0.5 one had multiplied by other values, however, the choice of whether to use float or double for the first multiplication could affect the accuracy of rounding. Generally, using double for both multiplications would yield slightly more accurate results, but at the expense of increased execution time. Using float for both may yield better execution speed, but at the cost of reduced precision. If the floating-point constant had been something that isn't precisely representable in float, converting to double and multiplying by a double constant may yield slightly more accurate results than using float for everything, but in most cases where one would want that precision, one would also want the increased precision that would be achieved by using double for the first multiplication as well.
Let's look at the error message.
Using operator '*' on a 4 byte value
It is describing this code:
a * a
Your float is 4 bytes. The result of the multiplication is 4 bytes. And the result of a multiplication may overflow.
and then casting the result to a 8 byte value.
It is describing this code:
(result) * 0.4;
Your result is 4 bytes. 0.4 is a double, which is 8 bytes. C++ will promote your float result to a double before performing this multiplication.
So...
The compiler is observing that you are doing float math that could overflow and then immediately converting the result to a double, making the potential overflow unnecessary.
Change the code to this to remove the float-to-double-to-float conversions:
float result = b + a * a * 0.4f;
I read the question as "how to change the code to remove the warning?".
If you take the advice in the warning's text literally:
float result = b + (double)a * a * 0.4;
But this is nonsense: if an overflow happens, the value will probably not fit into float result anyway.
It looks like in your case overflow is not possible, and you feel perfectly fine doing all calculations with float. If so, just write 0.4f instead of 0.4 — then the constant will have float type (instead of double), and the result will also be float.
If you want to "fix" the problem with overflow:
double result = b + (double)a * a * 0.4;
But then you must also change the following code, which uses the result. And you don't remove the possibility of overflow, you just make it much less likely.
I'm trying to convert a float value to an integer, modify the int value, then convert it back to a float value. However, the decimal part gets lost, and I'm pretty sure I used static_cast<>() wrong in my code.
My code is a binary multiplier, which shifts the binary value f times to the left. For example, when I'm doing something like 1.2 x 2, I'm only getting 2 instead of 2.4.
int mantissa;
int f;
int exp;
float result = mantissa + 0x800000;
int resultInt = static_cast<int>(result);
int expF = log2(abs(f));
int expM = exp + expF;
int newExp = (127 + 23 - expM);
resultInt >>= newExp;
float result2 = resultInt;
Bit shifting will not work for floating-point values because their bits are laid out differently: a float stores a sign, an exponent and a mantissa, and the exponent is what lets the binary point "float". (In C++ the shift operators don't even accept floating-point operands.)
An integer, on the other hand, works well with bit shifting because each shift to the left multiplies it by 2, but it does not store a fractional part anywhere. Thus, when casting a float to an int, you lose that information.
In short, you can't multiply a floating-point value by shifting its bits the same way you can with an integer.
However, you can multiply the floating point by 10 until all digits are on the left side of the decimal, then cast to an integer. It may eat up performance depending on how it's implemented, but it's certainly possible to preserve all information this way. It's difficult to answer the question beyond that without understanding your intentions.
Complete newbie here. For my school homework, I was asked to write a program that displays:
s = 1 + 1/2 + 1/3 + 1/4 + ... + 1/n
Here's what I did:
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
int a;
float s=0, n;
cin>>a;
for(n=1;n<=a;n++)
{
s+=1/n;
}
cout<<s;
getch();
}
It displays exactly what it should. However, in the past I have only written programs that use the int data type. To my understanding, int does not hold any decimal places whereas float does, so I don't know much about float yet. Later that night, I was watching a YouTube video in which someone wrote the exact same program in a slightly different way. The video was in a foreign language, so I couldn't understand it. What he did was declare 'n' as an integer:
int a, n;
float s=0;
instead of
int a;
float s=0, n;
But this did not display the desired result. So he went ahead and showed two ways to correct it. He made changes in the for loop body:
s+=1.0f/n;
and
s+=1/(float)n;
To my understanding, he declared 'n' a float data type later in the program(Am I right?). So, my question is, both display the same result but is there any difference between the two? As we are declaring 'n' a float, why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. And in the second method, why we can't write 1(float)/n instead of 1/(float)n? As in the first method we have added float suffix with 1. Also, is there a difference between 1.f and 1.0f?
I tried to google my question but couldn't find any answer. Also, another confusion came to mind a few hours later: why are we even declaring 'n' a float? As per the program, the sum should come out as a real number, so shouldn't we declare only 's' a float? The more I think about it, the more confused I get. Please help!
Thank You.
The reason is that integer division behaves differently from floating-point division.
4 / 3 gives you the integer 1. 10 / 3 gives you the integer 3.
However, 4.0f / 3 gives you the float 1.3333..., 10.0f / 3 gives you the float 3.3333...
So if you have:
float f = 4 / 3;
4 / 3 will give you the integer 1, which will then be stored into the float f as 1.0f.
You instead have to make sure either the divisor or the dividend is a float:
float f = 4.0f / 3;
float f = 4 / 3.0f;
If you have two integer variables, then you have to convert one of them to a float first:
int a = ..., b = ...;
float f = (float)a / b;
float f = a / (float)b;
The first is equivalent to something like:
float tmp = a;
float f = tmp / b;
Since n will only ever have an integer value, it makes sense to define it as an int. However, doing so means that this won't work as you might expect:
s+=1/n;
In the division operation both operands are integer types, so it performs integer division which means it takes the integer part of the result and throws away any fractional component. So 1/2 would evaluate to 0 because dividing 1 by 2 results in 0.5, and throwing away the fraction results in 0.
This in contrast to floating point division which keeps the fractional component. C will perform floating point division if either operand is a floating point type.
In the case of the above expression, we can force floating point division by performing a typecast on either operand:
s += (float)1/n;
Or:
s += 1/(float)n;
You can also specify the constant 1 as a floating-point constant by giving it a decimal component:
s += 1.0/n;
Or by appending the f suffix:
s += 1.0f/n;
The f suffix (as well as the U, L, and LL suffixes) can only be applied to numerical constants, not variables.
What he is doing is something called casting. I'm sure your school will cover it in later lectures. Basically, n is declared as an integer for the entire program. But since integer and floating-point types are both numeric, C and C++ let you convert one into the other, as long as you tell the compiler which type you want. You do this by writing the type name in parentheses in front of the value, i.e.:
(float) n
he declared 'n' a float data type later in the program(Am I right?)
No, he defined (and thereby also declared) n as an int and later explicitly converted (cast) it to a float. The two are very different.
both display the same result but is there any difference between the two?
Nope. They're the same in this context. When an arithmetic operator has int and float operands, the former is implicitly converted to the latter, and thereby the result will also be a float. He's just shown you two ways to do it. When both operands are integers, you get an integer result, which may be incorrect when proper mathematical division would give a non-integer quotient. To avoid this, usually one of the operands is made into a floating-point number so that the actual result is closer to the expected result.
why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. [...] Also, is there a difference between 1.f and 1.0f?
This is because the language syntax is defined that way. A floating-point literal needs a decimal point (or an exponent), and the f suffix makes it a float instead of a double. So 5 is an int while 5.0f or 5.f is a float; there's no difference when you omit trailing 0s (5f on its own, however, is an error). And n.f is a syntax error since n is an identifier (variable name), not a numeric literal.
And in the second method, why we can't write 1(float)/n instead of 1/(float)n?
(float)n is a valid C-style cast of the int variable n, while 1(float) is just a syntax error.
s+=1.0f/n;
and
s+=1/(float)n;
... So, my question is, both display the same result but is there any difference between the two?
Yes.
In both C and C++, when a calculation involves expressions of different types, one or more of those expressions will be "promoted" to the type with greater precision or range. So if you have an expression with signed and unsigned operands, the signed operand will be "promoted" to unsigned. If you have an expression with float and double operands, the float operand will be promoted to double.
Remember that division with two integer operands gives an integer result - 1/2 yields 0, not 0.5. To get a floating point result, at least one of the operands must have a floating point type.
In the case of 1.0f/n, the expression 1.0f has type float (see note 1), so n will be "promoted" from type int to type float.
In the case of 1/(float) n, the expression n is being explicitly cast to type float, so the expression 1 is promoted from type int to float.
Nitpicks:
Unless your compiler documentation explicitly lists void main() as a legal signature for the main function, use int main() instead. From the online C++ standard:
3.6.1 Main function
...
2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a declared return type of type int, but otherwise its type is implementation-defined...
Secondly, please format your code - it makes it easier for others to read and debug. Whitespace and indentation are your friends - use them.
1. The constant expression 1.0 with no suffix has type double. The f suffix tells the compiler to treat it as float. 1.0/n would result in a value of type double.
I am doing something like this
int a = 3;
int b = 4;
float c = a/b ; //This returns 0 while it's supposed to return 0.75
I wanted to know why the above code doesn't work. I realize that 3 is an int and 4 is an int too. However, the result is being assigned to a float, yet I am getting 0 here. Any suggestions on what I might be doing wrong?
The division is evaluated first, and because both operands are integers, it evaluates to an integer, which only then gets assigned to a float.
This follows from C++'s usual arithmetic conversions: both operands are converted to a common type and the division is performed in that type. To force the result to be (at least) a particular type, at least one of the operands needs to be of that type (e.g., via static_cast<>).
Thus:
float c = a / static_cast<float>(b);
float c = a/b ;
a and b are integers, so it is integer division.
From the C++ standard:
5.6 Multiplicative operators [expr.mul]
For integral operands the / operator yields the algebraic quotient with any fractional part discarded.
Instead, try this:
float c = a / static_cast<float>(b);
(As @TrevorHickey suggested, static_cast<float> is better than an old-style (float) cast.)
You can't divide two ints and receive a float. You either have to cast to a float or declare the variables as float.
float a = 3;
float b = 4;
float c = a/b;
or
float c = (float)a/(float)b;
HINT: the result from integer division is integer. The result of the division is then assigned to a float. That is a/b results in an int. Cast that however you want, but you aren't gonna get 0.75 out of it.
If you are working in C++, you should prefer static_cast over a C-style cast.
It is checked at compile time and refuses conversions that a C-style cast would silently allow, such as casting away const.
float c = a/static_cast<float>(b);
Maybe it's a very simple question, but I couldn't find the answer. I've been searching for quite a while (now Google thinks I'm sending automated queries: http://twitter.com/michaelsync/status/17177278608).
int n = 4.35 *100;
cout << n;
Why does the output become "434" instead of "435"? 4.35 * 100 = 435, which is an integer value, so it should be assignable to the integer variable "n", right?
Or does the C++ compiler cast 4.35 to an integer before multiplying? I don't think it does. Why does the compiler automatically change 4.35 to 4.34, which is still a floating-point number?
Thanks.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
That's really just a starting point, sadly, as languages then introduce their own foibles as to when they do type conversions, etc. In this case you've merely created a situation where the constant 4.35 can't be represented precisely, and thus 4.35*100 is more like 434.9999999999, and the conversion to int truncates rather than rounds.
If you print 4.35 with enough digits:
cout << setprecision(20) << 4.35;
dollars to donuts you get something like 4.3499999999999996447, because 4.35 isn't exactly representable as a double (or as a float).
When the compiler casts a float to an int it truncates.
To get the behavior you expect, try:
int n = floor((4.35 * 100.0) + 0.5);
(The trickiness with floor is because C++ before C++11 had no round() function; since C++11, <cmath> provides std::round and std::lround.)
The internal representation of 4.35 ends up being 4.349999999 or similar. Multiplying by 100 shifts the decimal, and the .9999 is dropped off (truncated) when converting to int.
Floating point numbers don't work that way. Many (most, technically an infinite number of...) values cannot be stored or manipulated precisely as floating point. 4.35 would seem to be one of them. It's getting stored as something that's actually below 4.35, hence your result.
When a float is converted to an int the fractional part is truncated, the conversion doesn't take the nearest int to the float in value.
4.35 can't be exactly represented as a float; the nearest representable number is (we can deduce) very slightly less than 4.35, i.e. 4.34999..., so when multiplied by 100 you get 434.999...
If you want to convert a positive float to the nearest int you should add 0.5 before converting to int.
E.g.
int n = (4.35 * 100) + 0.5;
cout << n;