Does the presence of one floating-point data type (e.g. double) ensure that all +, -, *, /, %, etc math operations assume double operands?
If the story is more complicated than that, is there a resource that describes these rules? Should I not ask such questions and always explicitly cast int to double when the result of the equation is double. Here are some equations I'm thinking about. I purposefully did not compile and run then on my system, since this is the type of thing that could be compiler dependent.
int a(1), b(2), c(3);
double d(4.);
double result1 = a + b/d + c; // equal to 4 or to 4.5?
double result2 = (a + b)/d + c; // equal to 3 or to 3.75?
double result3 = a/b + d; // equal to 4 or to 4.5?
I purposefully did not compile and run then on my system, since this is the type of thing that could be compiler dependent.
This is not compiler dependent. C++ clearly defines the order of these operations and how they are converted.
How the conversion happens is dependent on the order of operations.
double result1 = a + b / d + c; // equal to 4 or to 4.5?
In this example, the division happens first. Because this is an int divided by a double, the compiler handles this by converting the int into a double. Thus, the result of b / d is a double.
The next thing that C++ does is add a to the result of b / d. This is an int added to a double, so it converts the int to a double and adds, resulting in a double. The same thing happens with c.
double result3 = a / b + d; // equal to 4 or to 4.5?
In this example, division is handled first. a and b are both ints, so no conversion is done. The result of a / b is of type int and is 0.
Then, the result of this is added to d. This is an int plus a double, so C++ converts the int to a double, and the result is a double.
Even though a double is present in this expression, a / b is evaluated first, and the double means nothing until execution reaches the double. Therefore, integer division occurs.
I find promotion and conversion rules pretty complex. Usually integer-like numbers (short, int, long) are promoted to floating-point equivalents (float, double). But things are complicated by size differences and sign.
See this question for specifics about conversion.
Does one double promote every int in the equation to double?
No. Only the result of a single operation (with respect to precedence).
double result1 = a + b/d + c; // equal to 4 or to 4.5?
4.5.
double result2 = (a + b)/d + c; // equal to 3 or to 3.75?
3.75.
double result3 = a/b + d; // equal to 4 or to 4.5?
4.
You must consider the precedence of every operator, you must think like a parser:
double result1 = a + b/d + c; // equal to 4 or to 4.5?
That's like a + (b/d) +c because the '/' operator has the biggest precedence.Then it doesn't matter what of these 2 operations is made for first, because the floating point operand is in the middle, and it "infects" other operands and make them be double.So it's 4.5.
double result2 = (a + b)/d + c; // equal to 3 or to 3.75?
Same here, it's like ((a+b)/d )+c, so a+b is 3, that 3 becomes a floating point number because gets promoted to double, because is the dividend of d, which is a double, so it's 0.75+3, that is 3.75.
double result3 = a/b + d; // equal to 4 or to 4.5?
It's like (a/b)+d, so a/b is zero and d is 4, so it's 4.
A parser makes all the operations in order of precedence, so you can exactly know what will be the result of the expression.
Generally, if one operand of a binary operator is floating point and the other is integer, the integer is converted to floating point, and the result is floating point.
In a compound expression, with multiple subexpressions, each operator is processed individually, using the precedence rules you probably know. Thus, in a*b + c*d, a*b is evaluated, and c*d is evaluated, and the results are added together. Whatever is in c*d has no effect in a*b and vice-versa.
C++ is complicated, of course, and user-defined operators may have other behaviors.
The authoritative resource that defines the rules is the C++ standard. The standard is quite large and technical. You might prefer to examine the C standard first. See this answer for links to the standards. Any good book on C or C++ should describe the default type conversions and expression evaluation.
Related
Suppose we have three integer (int, long, long long, unsigned int, etc) variables a, b, c. Normally, performing
c = a / b;
would result in truncate of fractions. However, is it possible for c to end up with an incorrect value?
I am not talking about a / b may be out of range for c's type. Rather, I am talking about how integer division is implemented in C. Does performing a / b first generate a float type intermediate result, and then the intermediate value is truncated?
If so, I wonder if loss of precision of the intermediate value can lead to an incorrect value of c. For example, suppose the precise value for a / b is 2, but somehow the intermediate result is 1.9999..., thus c will end up with an incorrect value of 1. Can such cases happen, or does integer division always result in a correct value if the expected value is in the range of c's type?
Does performing a / b first generate a float type intermediate result
As far as the language is concerned, there are no intermediate results.
does integer division always result in a correct value if the expected value is in the range of c's type?
Yes.
Section 6.5.5 of the C11 standards states
When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a;
Which means there's no way, mathematically, that you'll get wrong results.
Suppose we have three integer (int, long, long long, unsigned int, etc) variables a, b, c. Normally, performing
c = a / b;
would result in truncate of fractions. However, is it possible for c to end up with an incorrect value? I am not talking about a / b may be out of range for c's type.
It should not be possible that for example the last digit of division be wrong, if all rules were followed otherwise. C11 6.5.5p6:
When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded.
i.e. the result is not "close" to but exactly the same as a / b would be algebraically, just anything following the point discarded.
That does not mean there won't be any gotchas: it is possible that the division of a / b might be mathematically not out of range for c's type yet out of range for the type used in the division itself which can cause wrong values be set in c.
Consider this example:
#include <stdio.h>
#include <inttypes.h>
int main(void) {
int32_t a = INT32_MIN;
int32_t b = -1;
int64_t c = a / b;
printf("%" PRId64, c);
}
The result of division of INT32_MIN / -1 is representable in c, it is INT32_MAX + 1, which is positive. However on 32-bit platforms the arithmetic happens in 32 bits, and this division produces an integer overflow which causes the behaviour to be undefined. What happens on my computer is that if I compile without optimizations it aborts the program. If I compile with optimizations enabled (-O3), the compiler will resolve this calculation at compilation time, and handles the overflow in a peculiar way and produces the result -2147483648 which is negative.
Likewise, if you do this:
uint16_t a = 16;
int16_t b = -1;
int32_t result = a / b;
printf("%" PRId32 "\n", result);
the result on a 32-bit int machine is -16. If you change the type of a to uint32_t the math happens in unsigned:
uint32_t a = 16;
int16_t b = -1;
int32_t result = a / b;
printf("%" PRId32 "\n", result);
The result is of course 0. And you would get 0 from the former calculation too on a 16-bit machine.
a complete newbie here. For my school homework, I was given to write a program that displays -
s= 1 + 1/2 + 1/3 + 1/4 ..... + 1/n
Here's what I did -
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
int a;
float s=0, n;
cin>>a;
for(n=1;n<=a;n++)
{
s+=1/n;
}
cout<<s;
getch();
}
It perfectly displays what it should. However, in the past I have only written programs which uses int data type. To my understanding, int data type does not contain any decimal place whereas float does. So I don't know much about float yet. Later that night, I was watching some video on YouTube in which he was writing the exact same program but in a little different way. The video was in some foreign language so I couldn't understand it. What he did was declared 'n' as an integer.
int a, n;
float s=0;
instead of
int a
float s=0, n;
But this was not displaying the desired result. So he went ahead and showed two ways to correct it. He made changes in the for loop body -
s+=1.0f/n;
and
s+=1/(float)n;
To my understanding, he declared 'n' a float data type later in the program(Am I right?). So, my question is, both display the same result but is there any difference between the two? As we are declaring 'n' a float, why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. And in the second method, why we can't write 1(float)/n instead of 1/(float)n? As in the first method we have added float suffix with 1. Also, is there a difference between 1.f and 1.0f?
I tried to google my question but couldn't find any answer. Also, another confusion that came to my mind after a few hours is - Why are we even declaring 'n' a float? As per the program, the sum should come out as a real number. So, shouldn't we declare only 's' a float. The more I think the more I confuse my brain. Please help!
Thank You.
The reason is that integer division behaves different than floating point division.
4 / 3 gives you the integer 1. 10 / 3 gives you the integer 3.
However, 4.0f / 3 gives you the float 1.3333..., 10.0f / 3 gives you the float 3.3333...
So if you have:
float f = 4 / 3;
4 / 3 will give you the integer 1, which will then be stored into the float f as 1.0f.
You instead have to make sure either the divisor or the dividend is a float:
float f = 4.0f / 3;
float f = 4 / 3.0f;
If you have two integer variables, then you have to convert one of them to a float first:
int a = ..., b = ...;
float f = (float)a / b;
float f = a / (float)b;
The first is equivalent to something like:
float tmp = a;
float f = tmp / b;
Since n will only ever have an integer value, it makes sense to define it as as int. However doing so means that this won't work as you might expect:
s+=1/n;
In the division operation both operands are integer types, so it performs integer division which means it takes the integer part of the result and throws away any fractional component. So 1/2 would evaluate to 0 because dividing 1 by 2 results in 0.5, and throwing away the fraction results in 0.
This in contrast to floating point division which keeps the fractional component. C will perform floating point division if either operand is a floating point type.
In the case of the above expression, we can force floating point division by performing a typecast on either operand:
s += (float)1/n
Or:
s += 1/(float)n
You can also specify the constant 1 as a floating point constant by giving a decimal component:
s += 1.0/n
Or appending the f suffix:
s += 1.0f/n
The f suffix (as well as the U, L, and LL suffixes) can only be applied to numerical constants, not variables.
What he is doing is something called casting. I'm sure your school will mention it in new lectures. Basically n is set as an integer for the entire program. But since integer and double are similar (both are numbers), the c/c++ language allows you to use them as either as long as you tell the compiler what you want to use it as. You do this by adding parenthesis and the data type ie
(float) n
he declared 'n' a float data type later in the program(Am I right?)
No, he defined (thereby also declared) n an int and later he explicitly converted (casted) it into a float. Both are very different.
both display the same result but is there any difference between the two?
Nope. They're the same in this context. When an arithmetic operator has int and float operands, the former is implicitly converted into the latter and thereby the result will also be a float. He's just shown you two ways to do it. When both the operands are integers, you'd get an integer value as a result which may be incorrect, when proper mathematical division would give you a non-integer quotient. To avoid this, usually one of the operands are made into a floating-point number so that the actual result is closer to the expected result.
why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. [...] Also, is there a difference between 1.f and 1.0f?
This is because the language syntax is defined thus. When you're declaring a floating-point literal, the suffix is to use .f. So 5 would be an int while 5.0f or 5.f is a float; there's no difference when you omit any trailing 0s. However, n.f is syntax error since n is a identifier (variable) name and not a constant number literal.
And in the second method, why we can't write 1(float)/n instead of 1/(float)n?
(float)n is a valid, C-style casting of the int variable n, while 1(float) is just syntax error.
s+=1.0f/n;
and
s+=1/(float)n;
... So, my question is, both display the same result but is there any difference between the two?
Yes.
In both C and C++, when a calculation involves expressions of different types, one or more of those expressions will be "promoted" to the type with greater precision or range. So if you have an expression with signed and unsigned operands, the signed operand will be "promoted" to unsigned. If you have an expression with float and double operands, the float operand will be promoted to double.
Remember that division with two integer operands gives an integer result - 1/2 yields 0, not 0.5. To get a floating point result, at least one of the operands must have a floating point type.
In the case of 1.0f/n, the expression 1.0f has type float1, so the n will be "promoted" from type int to type float.
In the case of 1/(float) n, the expression n is being explicitly cast to type float, so the expression 1 is promoted from type int to float.
Nitpicks:
Unless your compiler documentation explicitly lists void main() as a legal signature for the main function, use int main() instead. From the online C++ standard:
3.6.1 Main function
...
2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a declared return type of type int, but otherwise its type is implementation-defined...
Secondly, please format your code - it makes it easier for others to read and debug. Whitespace and indentation are your friends - use them.
1. The constant expression 1.0 with no suffix has type double. The f suffix tells the compiler to treat it as float. 1.0/n would result in a value of type double.
I am doing something like this
int a = 3;
int b = 4;
float c = a/b ; //This returns 0 while its suppose to return 0.75
I wanted to know why the above code doesn't work ? I realize that 3 is an int and 4 is an int too. However the result is a float which is being assigned to float. However I am getting a 0 here. Any suggestions on what I might be doing wrong ?
The division is evaluated first, and because it is two integer operands, it evaluates to an integer... which then only get assigned to a float.
This is due to a predefined set of rules that decreases in type complexity. To force the result to be of a particular type (at least), at least one of the operands needs to be of that type. (via a static_cast< > )
Thus:
float c = a / static_cast<float>(b);
float c = a/b ;
a and b are integers, so it is integer division.
From the C++ standard:
5.6 Multiplicative operators [expr.mul]
For integral operands the / operator yields the algebraic quotient with any fractional part discarded.
Instaed, try this:
float c = a / static_cast<float>(b);
(As #TrevorHickey suggested, static_cast<float> is better than old-style (float) cast.)
You cant divide two ints and receive a float. You either have to cast to a float or have the types as a float.
float a = 3;
float b = 4;
float c = a/b;
or
float c = (float)a/(float)b;
HINT: the result from integer division is integer. The result of the division is then assigned to a float. That is a/b results in an int. Cast that however you want, but you aren't gonna get 0.75 out of it.
If you are working in C++, you should use the static_cast method over the implicit cast.
This will ensure that the type can be safely cast at compile time.
float c = a/static_cast<float>(b);
I see a lot of C++ code that has lines like:
float a = 2;
float x = a + 1.0f;
float b = 3.0f * a;
float c = 2.0f * (1.0f - a);
Are these .0f after these literals really necessary? Would you lose numeric accuracy if you omit these?
I thought you only need them if you have a line like this:
float a = 10;
float x = 1 / a;
where you should use 1.0f, right?
You would need to use it in the following case:
float x = 1/3;
either 1 or 3 needs to have a .0 or else x will always be 0.
If a is an int, these two lines are definitely not equivalent:
float b = 3.0f * a;
float b = 3 * a;
The second will silently overflow if a is too large, because the right-hand side is evaluated using integer arithmetic. But the first is perfectly safe.
But if a is a float, as in your examples, then the two expressions are equivalent. Which one you use is a question of personal preference; the first is probably more hygeinic.
It somewhat depends on what you are doing with the numbers. The type of a floating point literal with a f or F are of type float. The type of a floating point literal without a suffix is of type double. As a result, there may be subtle differences when using a f suffix compared to not using it.
As long as a subexpression involves at least one object of floating point type it probably doesn't matter much. It is more important to use suffixes with integers to be interpreted as floating points: If there is no floating point value involved in a subexpression integer arithmetic is used. This can have major effects because the result will be an integer.
float b = 3.0f * a;
Sometimes this is done because you want to make sure 3.0 is created as a float and not as double.
For example, when I'm dividing two ints and want a float returned, I superstitiously write something like this:
int a = 2, b = 3;
float c = (float)a / (float)b;
If I do not cast a and b to floats, it'll do integer division and return an int.
Similarly, if I want to multiply a signed 8-bit number with an unsigned 8-bit number, I will cast them to signed 16-bit numbers before multiplying for fear of overflow:
u8 a = 255;
s8 b = -127;
s16 = (s16)a * (s16)b;
How exactly does the compiler behave in these situations when not casting at all or when only casting one of the variables? Do I really need to explicitly cast all of the variables, or just the one on the left, or the one on the right?
Question 1: Float division
int a = 2, b = 3;
float c = static_cast<float>(a) / b; // need to convert 1 operand to a float
Question 2: How the compiler works
Five rules of thumb to remember:
Arithmetic operations are always performed on values of the same type.
The result type is the same as the operands (after promotion)
The smallest type arithmetic operations are performed on is int.
ANSCI C (and thus C++) use value preserving integer promotion.
Each operation is done in isolation.
The ANSI C rules are as follows:
Most of these rules also apply to C++ though not all types are officially supported (yet).
If either operand is a long double the other is converted to a long double.
If either operand is a double the other is converted to a double.
If either operand is a float the other is converted to a float.
If either operand is a unsigned long long the other is converted to unsigned long long.
If either operand is a long long the other is converted to long long.
If either operand is a unsigned long the other is converted to unsigned long.
If either operand is a long the other is converted to long.
If either operand is a unsigned int the other is converted to unsigned int.
Otherwise both operands are converted to int.
Overflow
Overflow is always a problem. Note. The type of the result is the same as the input operands so all the operations can overflow, so yes you do need to worry about it (though the language does not provide any explicit way to catch this happening.
As a side note:
Unsigned division can not overflow but signed division can.
std::numeric_limits<int>::max() / -1 // No Overflow
std::numeric_limits<int>::min() / -1 // Will Overflow
In general, if operands are of different types, the compiler will promote all to the largest or most precise type:
If one number is... And the other is... The compiler will promote to...
------------------- ------------------- -------------------------------
char int int
signed unsigned unsigned
char or int float float
float double double
Examples:
char + int ==> int
signed int + unsigned char ==> unsigned int
float + int ==> float
Beware, though, that promotion occurs only as required for each intermediate calculation, so:
4.0 + 5/3 = 4.0 + 1 = 5.0
This is because the integer division is performed first, then the result is promoted to float for the addition.
You can just cast one of them. It doesn't matter which one though.
Whenever the types don't match, the "smaller" type is automatically promoted to the "larger" type, with floating point being "larger" than integer types.
Division of integers: cast any one of the operands, no need to cast them both. If both operands are integers the division operation is an integer division, otherwise it is a floating-point division.
As for the overflow question, there is no need to explicitly cast, as the compiler implicitly does that for you:
#include <iostream>
#include <limits>
using namespace std;
int main()
{
signed int a = numeric_limits<signed int>::max();
unsigned int b = a + 1; // implicit cast, no overflow here
cout << a << ' ' << b << endl;
return 0;
}
In the case of the floating-point division, as long as one variable is of a floating-point datatype (float or double), then the other variable should be widened to a floating-point type, and floating-point division should occur; so there's no need to cast both to a float.
Having said that, I always cast both to a float, anyway.
I think as long as you are casting just one of the two variables the compiler will behave properly (At least on the compilers that I know).
So all of:
float c = (float)a / b;
float c = a / (float)b;
float c = (float)a / (float)b;
will have the same result.
Then there are older brain-damaged types like me who, having to use old-fashioned languages, just unthinkingly write stuff like
int a;
int b;
float z;
z = a*1.0*b;
Of course this isn't universal, good only for pretty much just this case.
Having worked on safety-critical systems, i tend to be paranoid and always cast both factors: float(a)/float(b) - just in case some subtle gotcha is planning to bite me later. No matter how good the compiler is said to be, no matter how well-defined the details are in the official language specs. Paranoia: a programmer's best friend!
Do you need to cast one or two sides? The answer isn't dictated by the compiler. It has to know the exact, precse rules. Instead, the answer should be dictated by the person who will read the code later. For that reason alone, cast both sides to the same type. Implicit truncation might be visible enough, so that cast could be redundant.
e.g. this cast float->int is obvious.
int a = float(foo()) * float(c);