I'm refreshing my memory about operator precedence, because I try to be a smart guy and avoid parentheses as much as possible. I'm refreshing on the following two links:
cpp reference
MS docs
One problem I have is that those two "reliable" docs don't say the same thing, and I no longer know whom to trust.
For one example, Cppreference says the throw keyword is in the same group as the conditional operator. Microsoft's docs say the conditional operator has higher precedence than throw. There are other differences.
Which site is correct, or are both sites wrong in different ways?
TL;DR: The Microsoft docs can be interpreted to be less correct, depending on how you look at them.
The first thing you have to understand is that C++ as a language does not have "operator precedence" rules. C++ has a grammar; it is the grammar that defines what a particular piece of C++ syntax means. It is the C++ grammar that tells you that 5 + 3 * 4 should be considered equivalent to 5 + (3 * 4) rather than (5 + 3) * 4.
Therefore, any "operator precedence" rules that you see are merely a textual, human-readable explanation of the C++ grammar around expression parsing. As such, one can imagine that two different ways of describing the behavior of the same grammar could exist.
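A quick compile-time check of the 5 + 3 * 4 example: the grammar nests 3 * 4 as a multiplicative-expression inside the additive-expression, so only the parenthesization that mirrors that nesting gives the same value. This is a minimal sketch; the function names are illustrative.

```cpp
// Compile-time check that 5 + 3 * 4 groups as 5 + (3 * 4).
constexpr int implicit_grouping() { return 5 + 3 * 4; }    // parsed as 5 + (3 * 4)
constexpr int explicit_inner()    { return 5 + (3 * 4); }  // same tree, written out
constexpr int explicit_outer()    { return (5 + 3) * 4; }  // a different tree

static_assert(implicit_grouping() == explicit_inner(), "* binds tighter than +");
static_assert(implicit_grouping() != explicit_outer(), "(5 + 3) * 4 is a different expression");
```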
Consider the specific example of throw vs. the ?: operator. The Microsoft site says that ?: has higher precedence than throw, while the Cppreference site says that they have the same precedence.
First, let's look at a hypothetical C++ expression:
throw val ? one : two
By Microsoft's rules, the ?: operator has higher precedence, so the expression would be parsed as throw (val ? one : two). By Cppreference's rules, the two operators have equal precedence. However, since they have right-to-left associativity, the ?: gets first dibs on the sub-expressions. So we have throw (val ? one : two).
So both of them resolve to the same result.
But what does the C++ grammar say? Well, here's a relevant fragment of the grammar:
throw-expression:
throw assignment-expression(opt)
assignment-expression:
conditional-expression
logical-or-expression assignment-operator initializer-clause
throw-expression
This is parsed as a throw-expression, which contains an assignment-expression, which contains a conditional-expression, which is where our ?: lies. In short, the parser parses it as throw (val ? one : two).
So both pages agree, and both of them are correct.
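The parse of throw val ? one : two can be observed at run time: whatever the conditional selects is what gets thrown. A minimal sketch; thrown_value and check_thrown_value are hypothetical names.

```cpp
#include <cassert>

// `throw val ? one : two` is parsed as `throw (val ? one : two)`,
// so the caught value is whichever operand the conditional selected.
int thrown_value(bool val) {
    const int one = 1, two = 2;
    try {
        throw val ? one : two;
    } catch (int caught) {
        return caught;  // the whole conditional was the operand of throw
    }
    return 0;           // not reached
}

void check_thrown_value() {
    assert(thrown_value(true) == 1);
    assert(thrown_value(false) == 2);
}
```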
Now consider:
val ? throw one : two
How does this get parsed? Well, the thing to remember is that ?: is a ternary operator; unlike most others, it has three terms. That is, the conditional-expression itself is not finished being specified until the : <something> gets parsed.
So the precedence of throw vs ?: is irrelevant in this case. The throw one is within the ternary operator because the expression is literally within the ternary operator. The two operators are not competing.
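A small sketch of val ? throw one : two: the throw-expression is simply the second operand, and when one operand of ?: is a throw-expression the conditional takes the type of the other operand (int here). The function names are hypothetical.

```cpp
#include <cassert>

// The throw-expression sits between ? and :, so it is just the
// second operand; no precedence question arises.
int middle_throw(bool val) {
    const int one = 1, two = 2;
    return val ? throw one : two;  // type comes from `two`
}

void check_middle_throw() {
    assert(middle_throw(false) == 2);
    bool threw_one = false;
    try {
        middle_throw(true);
    } catch (int e) {
        threw_one = (e == 1);
    }
    assert(threw_one);
}
```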
Lastly, how about:
val ? one : throw two
Microsoft gives ?: higher precedence. By Microsoft's documentation, precedence "specifies the order of operations in expressions that contain more than one operator". So the ?: happens first.
Here's the rub, though. throw by itself is actually a grammatically legal expression (a bare throw rethrows the exception currently being handled, and calls std::terminate if there is none, but the grammar allows it everywhere). As such, val ? one : throw could be a legitimate expression, which is what the Microsoft docs' rules would appear to say.
Of course, (val ? one : throw) two is not a legitimate expression, because () two isn't legal C++ grammar. So one could interpret Microsoft's rules to say that this should be a compile error.
But it's not. C++'s grammar states:
conditional-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression
throw two is the full assignment-expression used as the third operand of the given expression. So this should be parsed as val ? one : (throw two).
And what of Cppreference? Well, by giving them right-to-left associativity, the throw two is grouped with itself. So it should be considered val ? one : (throw two).
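The grouping val ? one : (throw two) can be confirmed with a compiler: the code below compiles, returns one when val is true, and throws two otherwise. A minimal sketch with hypothetical names.

```cpp
#include <cassert>

// `val ? one : throw two` parses as `val ? one : (throw two)`:
// the third operand of ?: is an assignment-expression, and a
// throw-expression is one.
int last_throw(bool val) {
    const int one = 1, two = 2;
    return val ? one : throw two;
}

void check_last_throw() {
    assert(last_throw(true) == 1);
    int caught = 0;
    try {
        last_throw(false);
    } catch (int e) {
        caught = e;
    }
    assert(caught == 2);
}
```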
Related
Exercise 4.20 of C++ Primer, 5e asks whether the expression iter++->empty(); is legal. Assume that iter is a vector<string>::iterator.
This expression is legal. I compiled it with gcc, and the answers to another question on Stack Overflow have addressed this much. However, I'm confused as to why it is legal.
This answer to a similar question gives the following as an equivalent pair of expressions:
iter->empty();
iter++;
The operator precedence table in my book lists -> as having higher precedence than the postfix ++ operator. This matches the explicit order of operations in the equivalent code above. However, I am used to seeing operators apply to whatever is right next to them. In the case of ->, I expected the compiler to apply it to ++ (by itself, without iter) and throw an error. In other words, I tried to parenthesize the original expression as iter(++->empty());, which is obviously illegal.
So, it seems like C++ requires compilers to parse expressions in a more complex way than just parenthesizing based on precedence and associativity. Is that right? If there is an easy way to explain how this actually happens, I would like to know about it.
Per cppreference, postfix ++ and -> have the same precedence and left-to-right associativity. That means that iter++ is evaluated first, and then ->empty() is applied to the result of iter++, which is a copy of iter from before the increment, since it is postfix increment.
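A short sketch of that grouping, (iter++)->empty(): the check looks at the element iter pointed to before the increment, and iter still ends up advanced. The helper names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// iter++->empty() groups as (iter++)->empty(): -> applies to the value
// of iter++, a copy of the iterator from before the increment.
bool first_empty_then_advance(std::vector<std::string>::iterator& iter) {
    return iter++->empty();
}

void check_postfix_chain() {
    std::vector<std::string> words{"", "x"};
    auto iter = words.begin();
    assert(first_empty_then_advance(iter));   // tested words[0], which is ""
    assert(iter == words.begin() + 1);        // and iter has moved on
    assert(!first_empty_then_advance(iter));  // words[1] is "x"
    assert(iter == words.end());
}
```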
There are lots of questions on concepts of precedence and order of evaluation but I failed to find one that refers to my special case.
Consider the following statement:
if(f(0) && g(0)) {};
Is it guaranteed that f(0) will be evaluated first? Notice that the operator is &&.
My confusion stems from what I've read in "The C++ Programming Language, (Stroustrup, 4ed, 2013)".
In section 10.3.2 of the book, it says:
The order of evaluation of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left-to-right. For example:
int x = f(2)+g(3); // undefined whether f() or g() is called first
This seems to apply to all operators including && operator, but in a following paragraph it says:
The operators , (comma), && (logical and), and || (logical or) guarantee that their left-hand operand is evaluated before their right-hand operand.
There is also another mention of this in section 11.1.1:
The && and || operators evaluate their second argument only if necessary, so they can be used to control evaluation order (§10.3.2). For example:
while (p && !whitespace(*p)) ++p;
Here, p is not dereferenced if it is the nullptr.
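The book's loop can be sketched as a complete function. This is an assumption-laden sketch: std::isspace stands in for the book's whitespace() helper, and the extra *p test, which stops at the end of a C string, is added here. Because && evaluates its left operand first, *p is only read after p has been checked.

```cpp
#include <cassert>
#include <cctype>

// Advance p to the first whitespace character (or the terminator).
// std::isspace replaces the book's whitespace() helper (assumption).
const char* skip_to_space(const char* p) {
    while (p && *p && !std::isspace(static_cast<unsigned char>(*p)))
        ++p;
    return p;
}

void check_skip_to_space() {
    const char* text = "ab cd";
    assert(skip_to_space(text) == text + 2);    // stops at the space
    assert(skip_to_space(nullptr) == nullptr);  // never dereferenced
    assert(*skip_to_space("abc") == '\0');      // stops at the terminator
}
```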
This last quote implies that && and || evaluate their first argument first, so it seems to reinforce my assumption that the operators mentioned in the second quote are exceptions to the first quote. But I cannot draw a definitive conclusion from this last example either, as the expression contains only one subexpression, as opposed to my example, which contains two.
The special sequencing behavior of &&, ||, and , is well-established in C and C++. The first sentence you quoted should say "The order of evaluation of subexpressions within an expression is generally unspecified" or "With a few specific exceptions, the order of evaluation of subexpressions within an expression is unspecified".
You asked about C++, but this question in the C FAQ list is pertinent.
Addendum: I just realized that "unspecified" is a better word in these rules than "undefined". Writing something like f() + g() doesn't give you undefined behavior. You just have no way of knowing whether f or g might be called first.
Yes, it is guaranteed that f(0) will be completely evaluated first.
This is to support behaviour known as short-circuiting, by which we don't need to call the second function at all if the first returns false.
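The guarantee can be made visible by logging calls. In this sketch f and g are hypothetical stand-ins for the functions in the question; each records its name in a global string and returns whether its argument is nonzero.

```cpp
#include <cassert>
#include <string>

// Records the order of calls so the sequencing of && is visible.
std::string call_log;

bool f(int x) { call_log += 'f'; return x != 0; }
bool g(int x) { call_log += 'g'; return x != 0; }

void check_and_sequencing() {
    call_log.clear();
    if (f(0) && g(0)) {}
    assert(call_log == "f");   // f returned false, so g was never called

    call_log.clear();
    if (f(1) && g(1)) {}
    assert(call_log == "fg");  // f is completely evaluated before g
}
```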
Symbols like &, *, etc. are used in both expressions and declarations, which are two distinct concepts.
In expressions, the symbols are operators, for which we have a well-defined table of precedence and associativity. When an expression is complex, we can decompose and analyze it using this table.
e.g.
(a + b) * ++c
Question:
In declarations, these symbols are not operators, and hence we cannot apply the table of precedence and associativity for operators. Is there a table of precedence and associativity for symbols in declarations?
Or, in other words, when a declaration gets complicated (try this one: int*& (*&f(int*))), is there a systematic way to decompose and analyze it?
A closely related follow-up question:
A book (a primer) taught us how to read a complex declaration using a typedef example:
typedef int (*tp_alias)[10]; // defines tp_alias as a pointer to an array of 10 int
Method taught by the book: use the alias name as the starting point of reading; tp_alias is the new type name. Looking to the left, it has a *, so it is a pointer. Then look outside the parentheses: to the right, [10] means it is an array of size 10; to the left, int means the element of the array is int.
Follow-up Question:
How do we read other type alias declarations, such as using, where the alias name is no longer inside the declarator? For example: using tp_alias = int (*)[10];
Maybe we should read from within the (), but what if there is more than one pair of ()? (I have not seen one, but it is a possibility.)
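One way to read the using form: it is the typedef's declarator with the name deleted, so you start reading where the name would have been, just after the *. The sketch below checks at compile time that both spellings name "pointer to array of 10 int"; tp_alias2 is a hypothetical second name used only for the comparison.

```cpp
#include <type_traits>

typedef int (*tp_alias)[10];    // the name sits inside the declarator
using tp_alias2 = int (*)[10];  // same declarator with the name removed

static_assert(std::is_same<tp_alias, tp_alias2>::value, "same type");
static_assert(std::is_pointer<tp_alias>::value, "outermost: a pointer");
static_assert(std::is_same<std::remove_pointer<tp_alias>::type, int[10]>::value,
              "pointee: array of 10 int");
```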
You use the spiral rule.
A good explanation of it is here.
There is no precedence table for expressions!
The precedence of operators is inferred from the grammar. For example, consider the definitions of logical-and-expression and logical-or-expression:
logical-and-expression:
inclusive-or-expression
logical-and-expression && inclusive-or-expression
logical-or-expression:
logical-and-expression
logical-or-expression || logical-and-expression
Now consider the expression a || b && c. It must be parsed as a logical-or-expression (where the left side is the logical-or-expression a and the right side is the logical-and-expression b && c). It can't be parsed as a logical-and-expression, because if it were, then the left side, a || b, would have to be a logical-and-expression too, and it isn't.
On the other hand, in (a || b) && c, you can't parse it as a logical-or-expression because then you'd have (a as the left side and b) && c as the right side, and neither is a valid expression. You can parse it as a logical-and-expression because (a || b), unlike a || b, is a valid logical-and-expression.
The compiler parses the code, and builds a syntax tree. If the root node is a logical-or-expression, then the logical OR operation is done last. If the root node is a logical-and-expression, then the logical AND operation is done last. And so on.
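Choosing values for which the two trees disagree makes the difference concrete: with a = true, b = false and c = false, a || b && c is true while (a || b) && c is false. A minimal compile-time sketch (compilers may warn and suggest parentheses around the &&).

```cpp
constexpr bool a = true, b = false, c = false;

// Root is a logical-or-expression: a || (b && c).
constexpr bool unparenthesized = a || b && c;
// Root is a logical-and-expression: (a || b) && c.
constexpr bool grouped_or = (a || b) && c;

static_assert(unparenthesized == true, "a || (b && c) is true");
static_assert(grouped_or == false, "(a || b) && c is false");
```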
OK, so that was the bad news. Now the good news.
Even though there's no precedence table for operators in expressions, most of the time you can just pretend there is (and of course it can be overridden by parentheses); that's how you parse expressions mentally. So, it turns out that the same is true for declarators. There is no precedence table, but you can mentally parse them as though there is.
In fact, the designers of C were wise enough to make declarations use the same "precedence rules" as expressions. This is the "declaration follows usage" rule. For example, in the expression
(*a)[10]
we have an array indexing expression containing an indirection expression, and not vice versa. In the same way, in
int (*a)[10];
what we have is an array declarator containing a pointer declarator, and not vice versa. (Therefore, the result is a pointer to array.) So if you remember from expressions that [] (array indexing) and () (function call) have higher precedence than * (indirection), you can apply that same ordering to understanding declarations too. (References are a special case; it's easy to remember that references have the same "precedence" in a declaration as pointers.)
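The "declaration follows usage" rule can be checked side by side: the declarator int (*pa)[10] matches the expression (*pa)[i], while int *ap[10] matches *ap[i]. A small sketch; grid, x, pa, ap and the helper functions are illustrative names.

```cpp
#include <cassert>
#include <type_traits>

int grid[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int x = 42;

int (*pa)[10] = &grid;  // (*pa)[i]: dereference, then index -> pointer to array
int *ap[10] = {&x};     // *ap[i]: index, then dereference -> array of pointers

static_assert(std::is_same<decltype(pa), int (*)[10]>::value, "pointer to array");
static_assert(std::is_same<decltype(ap), int *[10]>::value, "array of pointers");

// The expressions that use each name mirror their declarators.
int via_pointer_to_array(int i) { return (*pa)[i]; }
int via_array_of_pointers() { return *ap[0]; }

void check_declarators() {
    assert(via_pointer_to_array(3) == 3);
    assert(via_array_of_pointers() == 42);
}
```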
Almost all C/C++ operator precedence tables I have consulted list the ternary conditional operator as having higher precedence than the assignment operators. There are a few tables, however, such as the one on wikipedia, and the one at operator-precedence.com, that place them on the same precedence level. Which is it, higher or same?
In the C++ grammar,
assignment-expression:
conditional-expression
logical-or-expression assignment-operator initializer-clause
throw-expression
conditional-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression
initializer-clause:
assignment-expression
braced-init-list
could be combined into
assignment-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression
logical-or-expression assignment-operator assignment-expression
logical-or-expression assignment-operator braced-init-list
throw-expression
If only looking at = and ?:, and if ignoring the inner expression between ? and :, this clearly gives ?: and = the exact same precedence.
This is different from the C grammar, in which neither ?:'s left nor its right operand can have an assignment operator as its topmost operator.
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
So for C, it makes sense to give them different precedence levels.
That said, precedence levels are only an approximation of what the standard actually says. Whatever precedence levels you choose, there will be cases that show them to be misleading or just plain wrong. Depending on your interpretation, the inner expression of ?: may be one of them; it is for me.
The answer for C++ is that ?: and = have the same precedence. Yes, almost every C++ operator precedence table out there is wrong.
In C it doesn't matter whether ?: is higher than = or not, because in C the ?: operator is not allowed to evaluate to an lvalue, which is what it would have to do if precedence were to influence the behavior (given that they are already right-to-left associative). See the discussion under Luchian Grigore's answer for example.
Perhaps this error is so widespread because early C++ operator precedence tables may have been copied and extended from C tables. And perhaps the error has persisted because the only counterexamples, expressions of the form a?b:c=d, are rarely used. Perhaps.
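A sketch of that counterexample in C++: because the third operand of ?: is an assignment-expression, a ? b : c = 5 means a ? b : (c = 5). With a true, neither variable changes; if it meant (a ? b : c) = 5, b would become 5. The struct and function names are hypothetical, and compilers may warn that the selected value is unused.

```cpp
#include <cassert>

struct Result { int b; int c; };

// In C++, `a ? b : c = 5` is `a ? b : (c = 5)`, not `(a ? b : c) = 5`.
Result assign_through_conditional(bool a) {
    int b = 0, c = 0;
    a ? b : c = 5;
    return {b, c};
}

void check_conditional_assignment() {
    Result r = assign_through_conditional(true);
    assert(r.b == 0 && r.c == 0);  // with (a ? b : c) = 5, b would be 5
    r = assign_through_conditional(false);
    assert(r.b == 0 && r.c == 5);  // both groupings agree when a is false
}
```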
You'll find that, in the standard:
5 Expressions [expr]
58) The precedence of operators is not directly specified, but it can be derived from the syntax. (note)
This means precedence tables are inferred, not specified. As long as they behave the same, you can say that both are right. So, even if a precedence table places them as having the same precedence, or places the ternary above the assignment operator, in practice the same thing happens, because of the syntax.
Note that associativity plays a bigger role here (this is also derived from the syntax).
Even if you assume that they have the same precedence:
a = b ? c : d;
will be treated as a = (b ? c : d) because they are both right-to-left associative.
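A compile-time sketch of a = b ? c : d, checking that a receives the value of the whole conditional. It assumes C++14 or later, since the constexpr function modifies a local variable; assign_conditional is an illustrative name.

```cpp
// a = b ? c : d is grouped as a = (b ? c : d).
constexpr int assign_conditional(bool b, int c, int d) {
    int a = 0;
    a = b ? c : d;
    return a;
}

static_assert(assign_conditional(true, 1, 2) == 1, "a receives c");
static_assert(assign_conditional(false, 1, 2) == 2, "a receives d");
```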
Are operator precedence & associativity rules ever violated in any C/C++ expression?
If so, can you give an example?
Assume the claims of precedence and associativity rules are:
Each operator has a given precedence level, and each precedence level has a given associativity. If a sub-expression is seen by two operators where they expect an operand, it belongs to the one with higher precedence. Ties are broken by associativity.
Edit: Background
The standard defines C/C++ expressions as a CFG, which is much more flexible than a precedence-based parser. For example, it would have been possible to give binary operators asymmetrical "precedence", which would have rendered any precedence table incorrect. However, it appears to me that the design of the grammar was constrained to uphold simple precedence rules. Here are some alleged "counterexamples" that I have come across:
1) a?b,c:d is not interpreted as (a?b),(c:d)
Some claim that the ?: operator exhibits different precedence towards its middle operand than towards its other operands, because a?b,c:d, for example, is not interpreted as (a?b),(c:d). However, neither b nor c occupies a position in which it appears to ?: as its inner operand. By that reasoning a[b+c] should be interpreted as (a[b)+(c]), which is ludicrous.
2) sizeof(int)*a is interpreted as (sizeof(int))*a rather than sizeof((int)(*a))
... because C disallows an unparenthesized cast as sizeof's operand. However, both of these interpretations conform to precedence rules. The confusion comes from the * operator's ambiguity (is it the binary or the unary operator?). Precedence tables are not meant to resolve operator ambiguities. They are, after all, not operator-symbol-precedence tables. So the operator precedence rules themselves are intact.
3) a+b=c results in syntax error, not semantic error
a+b=c, according to the standard, is invalid C syntax. If C had a precedence-based parser, the error would only be caught at the semantic level. In C, it so happens that no expression other than a unary-expression can be an lvalue. These semantically doomed left-hand sides therefore do not need to be accommodated syntactically. It makes no difference to the language as a whole, and precedence tables needn't be in the business of predicting whether the error that results from an expression is syntactic or semantic.
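The first two alleged counterexamples can be checked directly with a C++ compiler. This compile-time sketch uses the illustrative names middle_comma and bytes_for; compilers may warn that the left operand of the comma has no effect.

```cpp
#include <cstddef>

// 1) The middle operand of ?: is a full expression, so a ? b, c : d
//    groups as a ? (b, c) : d, never as (a ? b), (c : d).
constexpr int middle_comma(bool a) {
    return a ? 1, 2 : 3;  // comma operator: discards 1, yields 2
}

// 2) sizeof(int) * a is (sizeof(int)) * a: the * here is binary
//    multiplication, not a unary * applied to a.
constexpr std::size_t bytes_for(std::size_t a) {
    return sizeof(int) * a;
}

static_assert(middle_comma(true) == 2, "middle operand is (1, 2)");
static_assert(middle_comma(false) == 3, "third operand is 3");
static_assert(bytes_for(3) == 3 * sizeof(int), "sizeof(int) times 3");
```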
For one example, the usual precedence table says that sizeof and cast expressions have the same precedence. Both the table and the standard say that they associate right-to-left.
This simplification is fine when you're looking at, say, *&foo, which means the same as *(&foo).
It might also suggest to you that sizeof (int) 1 is legal C++ and that it means the same thing as sizeof( (int) 1 ). But it's not legal, because in fact sizeof( type-id ) is a special thing of its own in the grammar. Its existence prevents sizeof (int) 1 from being a sizeof expression whose operand is a cast-expression whose operand is 1.
So I think you could say that the "sizeof ( type-id )" production in the C++ grammar is an exception to what the usual precedence/associativity tables say. They do accurately describe the "sizeof unary-expression" production.
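A compile-time sketch of both productions: *&foo behaves as the right-to-left table predicts, while sizeof (char) is taken as sizeof ( type-id ) before the + 1 that follows it is considered. Putting the cast inside its own parentheses yields a sizeof applied to a cast-expression. foo is an illustrative name; the ill-formed form is shown only as a comment.

```cpp
constexpr int foo = 7;

// Unary operators group right to left: *&foo is *(&foo).
static_assert(*&foo == 7, "*&foo means *(&foo)");

// sizeof applied to a parenthesized cast-expression.
static_assert(sizeof((int) 'a') == sizeof(int), "operand is the cast (int) 'a'");

// sizeof ( type-id ) is complete before the + 1 is considered.
static_assert(sizeof (char) + 1 == 2, "(sizeof(char)) + 1");

// sizeof (int) 1;  // ill-formed: nothing may follow sizeof (int) here
```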
It depends on whether the "rules" are correct. The language definition doesn't talk about precedence, so the precedence tables you see in various places may or may not reflect what the language definition actually requires.
Until someone can find a counterexample, I'm going to put forward this as the default answer:
No, C/C++ precedence and associativity rules are never violated.