Are C/C++ operator precedence & associativity rules ever violated? - c++

Are operator precedence & associativity rules ever violated in any C/C++ expression?
If so, can you give an example?
Assume the claims of precedence and associativity rules are:
Each operator has a given precedence level, and each precedence level has a given associativity. If a sub-expression is seen by two operators where they expect an operand, it belongs to the one with higher precedence. Ties are broken by associativity.
Edit: Background
The standard defines C/C++ expressions as a CFG, which is much more flexible than a precedence-based parser. For example, it would have been possible to give binary operators asymmetrical "precedence", which would have rendered any precedence table incorrect. However, it appears to me that the design of the grammar was constrained to uphold simple precedence rules. Here are some alleged "counterexamples" that I have come across:
1) a?b,c:d is not interpreted as (a?b),(c:d)
Some claim that the ?: operator exhibits different precedence towards its middle operand than towards its other operands, because a?b,c:d, for example, is not interpreted as (a?b),(c:d). However, neither b nor c occupies a position in which it appears to ?: as its inner operand. By that reasoning a[b+c] should be interpreted as (a[b)+(c]), which is ludicrous.
2) sizeof(int)*a is interpreted as (sizeof(int))*a rather than sizeof((int)(*a))
... because C disallows an unparenthesized cast as sizeof's operand. However, both of these interpretations conform to precedence rules. The confusion comes from the * operator's ambiguity (is it the binary or the unary operator?). Precedence tables are not meant to resolve operator ambiguities. They are, after all, not operator-symbol-precedence tables. So the operator precedence rules themselves are intact.
3) a+b=c results in syntax error, not semantic error
a+b=c, according to the standard, is invalid C syntax. If C had had a precedence-based parser, it would only have been caught at the semantic level. In C, it so happens that any expression that is not a unary-expression cannot be an lvalue. These semantically doomed LHS expressions therefore do not need to be accommodated syntactically. It makes no difference to the language as a whole, and precedence tables needn't be in the business of predicting whether the error that results from an expression is syntactic or semantic.
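A minimal C++ sketch of the three cases above, for anyone who wants to check them against a compiler. The names (calls, pick, scaled, assign_to_sum, check_counterexamples) are made up for illustration. Note that the third case is about C++ rather than C: the C++ grammar accepts a+b=c and groups it as (a+b)=c, which compiles for class types such as std::string even though C rejects the form syntactically.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

int calls = 0;  // counts how often the first comma operand is evaluated

// 1) a ? b, c : d groups as a ? (b, c) : d.
//    Everything between ? and : is a single operand.
int pick(bool a, int c, int d) {
    return a ? ++calls, c : d;
}

// 2) sizeof(int)*a groups as (sizeof(int)) * a, i.e. a multiplication.
std::size_t scaled(std::size_t a) {
    return sizeof(int)*a;
}

// 3) In C++ (unlike C), a + b = c is grammatical and groups as (a + b) = c.
//    For std::string this assigns c to the temporary a + b.
std::string assign_to_sum(const std::string& a,
                          const std::string& b,
                          const std::string& c) {
    return a + b = c;
}

void check_counterexamples() {
    assert(pick(true, 2, 3) == 2 && calls == 1);   // comma expression ran
    assert(pick(false, 2, 3) == 3 && calls == 1);  // comma expression skipped
    assert(scaled(3) == 3 * sizeof(int));
    assert(assign_to_sum("x", "y", "z") == "z");
}
```

Calling check_counterexamples() from main should pass silently if the groupings are as described.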

For one example, the usual precedence table says that sizeof and cast expressions have the same precedence. Both the table and the standard say that they associate right-to-left.
This simplification is fine when you're looking at, say, *&foo, which means the same as *(&foo).
It might also suggest to you that sizeof (int) 1 is legal C++ and that it means the same thing as sizeof( (int) 1 ). But it's not legal, because in fact sizeof( type-id ) is a special thing of its own in the grammar. Its existence prevents sizeof (int) 1 from being a sizeof expression whose operand is a cast-expression whose operand is 1.
So I think you could say that the "sizeof ( type-id )" production in the C++ grammar is an exception to what the usual precedence/associativity tables say. They do accurately describe the "sizeof unary-expression" production.
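Here is a small sketch of the two sizeof productions, assuming a C++11 compiler. The function names are hypothetical. The second static_assert is the interesting one: if sizeof took the cast (char)+1 as its operand, the result would be 1, not 2.

```cpp
#include <cassert>
#include <cstddef>

// sizeof unary-expression: the operand is the parenthesized cast (char)1.
static_assert(sizeof((char)1) == 1, "operand is a parenthesized cast-expression");

// sizeof ( type-id ): sizeof consumes "(char)" itself, then + 1 is added.
static_assert(sizeof (char) + 1 == 2, "groups as (sizeof(char)) + 1");

// sizeof (int) 1;   // ill-formed: not sizeof applied to the cast (int)1

constexpr std::size_t size_plus_one() {
    return sizeof (char) + 1;
}

void check_sizeof_forms() {
    assert(size_plus_one() == 2);
}
```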

It depends on whether the "rules" are correct. The language definition doesn't talk about precedence, so the precedence tables you see in various places may or may not reflect what the language definition actually requires.

Until someone can find a counterexample, I'm going to put forward this as the default answer:
No, C/C++ precedence and associativity rules are never violated.

Related

Operator precedence, inconsistent documentations

I'm refreshing my memory about operator precedence, because I try to be a smart guy and avoid parentheses as much as possible. I'm using the following 2 links:
cpp reference
MS docs
One problem I have is that those 2 "reliable" docs are not telling the same thing, so I no longer know which one to trust.
For one example, Cppreference says throw keyword is in same group as the conditional operator. Microsoft's docs say the conditional operator is higher than throw. There are other differences.
Which site is correct, or are both sites wrong in different ways?
TL;DR: The Microsoft docs can be interpreted to be less correct, depending on how you look at them.
The first thing you have to understand is that C++ as a language does not have "operator precedence" rules. C++ has a grammar; it is the grammar that defines what a particular piece of C++ syntax means. It is the C++ grammar that tells you that 5 + 3 * 4 should be considered equivalent to 5 + (3 * 4) rather than (5 + 3) * 4.
Therefore, any "operator precedence" rules that you see are merely a textual, human-readable explanation of the C++ grammar around expression parsing. As such, one can imagine that two different ways of describing the behavior of the same grammar could exist.
Consider the specific example of throw vs. the ?: operator. The Microsoft site says that ?: has higher precedence than throw, while the Cppreference site says that they have the same precedence.
First, let's look at a hypothetical C++ expression:
throw val ? one : two
By Microsoft's rules, the ?: operator has higher precedence, so this would be parsed as throw (val ? one : two). By Cppreference's rules, the two operators have equal precedence. However, since they have right-to-left associativity, the ?: gets first dibs on the sub-expressions. So we have throw (val ? one : two).
So both of them resolve to the same result.
But what does the C++ grammar say? Well, here's a relevant fragment of the grammar:
throw-expression:
    throw assignment-expression(opt)
assignment-expression:
    conditional-expression
    logical-or-expression assignment-operator initializer-clause
    throw-expression
This is parsed as a throw-expression, which contains an assignment-expression, which contains a conditional-expression, which is where our ?: lies. In short, the parser parses it as throw (val ? one : two).
So both pages are the same, and both of them are correct.
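A quick way to see this grouping, as a sketch with made-up names: throw the conditional and catch the result. The caught value is 1 or 2, so the operand of throw is the whole conditional expression.

```cpp
#include <cassert>

// Returns whatever value was thrown by "throw val ? 1 : 2".
int thrown_value(bool val) {
    int result = 0;
    try {
        throw val ? 1 : 2;  // groups as throw (val ? 1 : 2)
    } catch (int e) {
        result = e;
    }
    return result;
}

void check_throw_conditional() {
    assert(thrown_value(true) == 1);
    assert(thrown_value(false) == 2);
}
```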
Now consider:
val ? throw one : two
How does this get parsed? Well, the thing to remember is that ?: is a ternary operator; unlike most others, it has three terms. That is, the conditional-expression itself is not finished being specified until the : <something> gets parsed.
So the precedence of throw vs ?: is irrelevant in this case. The throw one is within the ternary operator because the expression is literally within the ternary operator. The two operators are not competing.
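As an illustrative sketch (function names are hypothetical), a throw in the middle operand compiles and only fires when the condition is true. The catch handler negates the value so the two outcomes are easy to tell apart.

```cpp
#include <cassert>

int throw_in_second_operand(bool val) {
    int result = 0;
    try {
        result = val ? throw 1 : 2;  // throw 1 sits between ? and :
    } catch (int e) {
        result = -e;                 // mark "came from the throw"
    }
    return result;
}

void check_second_operand() {
    assert(throw_in_second_operand(true) == -1);  // throw 1 was executed
    assert(throw_in_second_operand(false) == 2);  // third operand was used
}
```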
Lastly, how about:
val ? one : throw two
Microsoft gives ?: higher precedence. By Microsoft's documentation, precedence "specifies the order of operations in expressions that contain more than one operator". So the ?: happens first.
Here's the rub though. throw by itself is actually a grammatically legal expression (it's only valid C++ within a catch clause, but the grammar is legal everywhere). As such, val ? one : throw could be a legitimate expression, which is what the Microsoft docs' rules would appear to say.
Of course, (val ? one : throw) two is not a legitimate expression, because () two isn't legal C++ grammar. So one could interpret Microsoft's rules to say that this should be a compile error.
But it's not. C++'s grammar states:
conditional-expression:
    logical-or-expression
    logical-or-expression ? expression : assignment-expression
throw two is the full assignment-expression used as the third operand of the given expression. So this should be parsed as val ? one : (throw two).
And what of Cppreference? Well, by giving them right-to-left associativity, the throw two is grouped with itself. So it should be considered val ? one : (throw two).
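The same kind of sketch works for the third operand (names are again made up). It compiles, and the thrown value is 2, which is what you would expect if the expression groups as val ? 1 : (throw 2).

```cpp
#include <cassert>

int throw_in_third_operand(bool val) {
    int result = 0;
    try {
        result = val ? 1 : throw 2;  // groups as val ? 1 : (throw 2)
    } catch (int e) {
        result = -e;                 // mark "came from the throw"
    }
    return result;
}

void check_third_operand() {
    assert(throw_in_third_operand(true) == 1);    // second operand was used
    assert(throw_in_third_operand(false) == -2);  // throw 2 was executed
}
```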

How are (complex) declarations parsed in terms of precedence and associativity?

Symbols, like &, *, etc., are used in both expressions and declarations, which are two distinctive concepts.
In expressions, the symbols are operators, for which we have a well-defined table of precedence and associativity. When an expression is complex, we can decompose and analyze it using this table.
e.g.
(a + b) * ++c
Question:
In declarations, these symbols are not operators and hence we cannot apply the table of precedence and associativity for operators. Is there a table of precedence and associativity for symbols in declarations?
Or in other words, when a declaration gets complicated (try this one: int*& (*&f(int*))), is there a systematic way to decompose and analyze it?
A closely related follow-up question:
Some book (primer) taught us how to read complex declaration with an example of typedef:
typedef int (*tp_alias)[10]; // defines tp_alias as a pointer to an array of 10 int
Method taught by the book: use the alias name as the starting point of reading; tp_alias is the new type name. Looking to the left, it has a *, so it is a pointer. Then look outside the parentheses: to the right, [10] means it is an array of size 10; to the left, int means the element type of the array is int.
Follow-up Question:
How do we read other type alias declarations, such as using, where the alias name is no longer in that position? e.g. using tp_alias = int (*)[10]?
Maybe read from within the (), but what if there is more than one pair of ()? (I have not seen one, but it is a possibility.)
You use the spiral rule.
A good explanation of it is here.
There is no precedence table for expressions!
The precedence of operators is inferred from the grammar. For example, consider the definitions of logical-and-expression and logical-or-expression:
logical-and-expression:
    inclusive-or-expression
    logical-and-expression && inclusive-or-expression
logical-or-expression:
    logical-and-expression
    logical-or-expression || logical-and-expression
Now consider the expression a || b && c. It must be parsed as a logical-or-expression (where the left side is the logical-or-expression a and the right side is the logical-and-expression b && c). It can't be parsed as a logical-and-expression, because if it were, then the left side, a || b, would have to be a logical-and-expression too, and it isn't.
On the other hand, in (a || b) && c, you can't parse it as a logical-or-expression because then you'd have (a as the left side and b) && c as the right side, and neither is a valid expression. You can parse it as a logical-and-expression because (a || b), unlike a || b, is a valid logical-and-expression.
The compiler parses the code, and builds a syntax tree. If the root node is a logical-or-expression, then the logical OR operation is done last. If the root node is a logical-and-expression, then the logical AND operation is done last. And so on.
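To make the a || b && c case concrete, here is a small sketch (the function name is hypothetical). With a = true, b = false, c = false, the two possible groupings give different answers, so a compile-time check can tell them apart. Some compilers may suggest adding parentheses here; that is only a warning.

```cpp
#include <cassert>

// Deliberately unparenthesized; the grammar makes this a || (b && c).
constexpr bool or_and(bool a, bool b, bool c) {
    return a || b && c;
}

static_assert(or_and(true, false, false), "root is ||: a || (b && c)");
static_assert(!((true || false) && false), "the other grouping would be false");

void check_or_and() {
    assert(or_and(true, false, false));
    assert(!or_and(false, true, false));
}
```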
OK, so that was the bad news. Now the good news.
Even though there's no precedence table for operators in expressions, most of the time you can just pretend there is (and of course it can be overridden by parentheses); that's how you parse expressions mentally. So, it turns out that the same is true for declarators. There is no precedence table, but you can mentally parse them as though there is.
In fact, the designers of C were wise enough to make declarations use the same "precedence rules" as expressions. This is the "declaration follows usage" rule. For example, in the expression
(*a)[10]
we have an array indexing expression containing an indirection expression, and not vice versa. In the same way, in
int (*a)[10];
what we have is an array declarator containing a pointer declarator, and not vice versa. (Therefore, the result is a pointer to array.) So if you remember from expressions that [] (array indexing) and () (function call) have higher precedence than * (indirection), you can apply that same ordering to understanding declarations too. (References are a special case; it's easy to remember that references have the same "precedence" in a declaration as pointers.)
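A short sketch of "declaration follows usage", assuming C++11 for using and static_assert. The variable and function names are invented for the example. The using form reads the same way as the declarator once you imagine the name sitting right after the *.

```cpp
#include <cassert>
#include <type_traits>

using tp_alias = int (*)[10];  // pointer to an array of 10 int

int (*ptr_to_array)[10];       // parentheses: * applies first, then [10]
int *array_of_ptrs[10];        // no parentheses: [10] applies first, then *

static_assert(std::is_same<decltype(ptr_to_array), tp_alias>::value,
              "pointer to array of 10 int");
static_assert(std::is_same<decltype(array_of_ptrs), int*[10]>::value,
              "array of 10 pointers to int");

int element_via_pointer() {
    int arr[10] = {0, 10, 20, 30};
    int (*p)[10] = &arr;
    return (*p)[3];            // the use has the same shape as the declaration
}

void check_declarators() {
    assert(element_via_pointer() == 30);
}
```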

Does precedence & associativity group operators or operands?

By reading the book "C++ Primer" and Wikipedia, I noticed that both mention that "precedence and associativity define the grouping of OPERATORS". However, it appears to me that the examples they give show grouping of OPERANDS. Here I quote:
from "Defined terms # C++ Primer 5th edition":
associativity: Determines how OPERATORS with the same precedence are
grouped.
from Operator_associativity # Wikipedia:
Consider the expression a ~ b ~ c. If the operator ~ has left
associativity, this expression would be interpreted as (a ~ b) ~ c. If
the operator has right associativity, the expression would be
interpreted as a ~ (b ~ c).
But from what I see, the above explanation grouped two OPERANDS (not operators): a and b into (a ~ b), or b and c into (b ~ c). Because I see they parenthesized two operands, but not two operators.
Given that operator and operands are different concepts, does the precedence and associativity rule group operator or operands ?
Thanks in advance.
Precedence and associativity address how a language interprets parenthesis-free expressions involving three or more operands. I'll use the symbols @ and # to denote two operators. Consider the expressions
a @ b # c,
x # y @ z,
d @ e @ f, and
u # v # w.
Note that the first two expressions involve different operators. The C++ precedence rules determine whether a@b#c is interpreted as meaning (a@b)#c or a@(b#c), and whether x#y@z is interpreted as meaning x#(y@z) or (x#y)@z.
The latter two expressions involve the same operator. Precedence has no bearing here. It's associativity that determines whether d@e@f is interpreted as meaning (d@e)@f or d@(e@f), and whether u#v#w is interpreted as meaning (u#v)#w or u#(v#w).
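With real operators substituted in, the distinction can be checked directly. This is only a sketch; the operators - and * and = were picked for illustration, and chained_assign is a made-up name.

```cpp
#include <cassert>

// Different operators: precedence decides the grouping.
static_assert(10 - 2 * 3 == 4, "groups as 10 - (2 * 3), not (10 - 2) * 3");
static_assert(2 * 3 - 10 == -4, "groups as (2 * 3) - 10");

// Same operator: associativity decides the grouping.
static_assert(10 - 4 - 3 == 3, "left-to-right: (10 - 4) - 3, not 10 - (4 - 3)");

int chained_assign() {
    int x = 0, y = 0;
    x = y = 5;         // right-to-left: x = (y = 5)
    return x + y;      // 10; grouping as (x = y) = 5 would leave y at 0
}

void check_grouping() {
    assert(chained_assign() == 10);
}
```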
C++ has a large number of operators. There's an easy way to deal with the plethora of precedences: Use parentheses. My rule is "Everyone knows a*b+c means (a*b)+c. Nobody but a language lawyer knows whether a?b:c=d means (a?b:c)=d or a?b:(c=d). Use parentheses when in doubt."
Note: Apparently even Microsoft and Wikipedia don't know the correct answer to "What does a?b:c=d mean?", at least as of June 16, 2014. The precedence tables at Wikipedia and Microsoft have the ternary operator separate from the lower-precedence assignment operators, which is incorrect. That would mean a?b:c=d needs to be interpreted as (a?b:c)=d, which always assigns the value of d to b or c, depending on whether a is true or false. That is incorrect. The correct interpretation is a?b:(c=d). The precedence tables at cppreference.com and cplusplus.com correctly group the ternary operator with the assignment operators.
There's an even better solution to puzzling out what the standard says: Just use parentheses.
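For the curious, the a?b:c=d case can be checked with a few lines. This is a sketch with hypothetical names; the function packs b and c into one number so the outcome is easy to compare.

```cpp
#include <cassert>

// Returns b * 10 + c after evaluating a ? b : c = d with b=1, c=2, d=9.
int ternary_then_assign(bool a) {
    int b = 1, c = 2, d = 9;
    int r = a ? b : c = d;   // groups as a ? b : (c = d)
    (void)r;                 // only the effect on b and c matters here
    return b * 10 + c;
}

void check_ternary_assign() {
    // a true: nothing is assigned. (a ? b : c) = d would have set b to 9.
    assert(ternary_then_assign(true) == 12);
    // a false: c = d runs, so c becomes 9.
    assert(ternary_then_assign(false) == 19);
}
```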
Given that operator and operands are different concepts, does the precedence and associativity rule group operator or operands ?
Both. It groups operands and the corresponding operator(s). In:
(a ~ b) ~ c
you are grouping a and b, which are operands, by the ~ operator. Precedence and associativity are properties only of the operator, though.
The associativity rules of a language specify which operator is evaluated first when two operators with the same precedence are adjacent in an expression.
The precedence rules of a language specify which operator is evaluated first when two operators with different precedence are adjacent in an expression.
Wiki says:
[..] the associativity (or fixity) of an operator is a property that determines how operators of the same precedence are grouped in the absence of parentheses. If an operand is both preceded and followed by operators (for example, "^ 4 ^"), and those operators have equal precedence, then the operand may be used as input to two different operations (i.e. the two operations indicated by the two operators). The choice of which operations to apply the operand to, is determined by the "associativity" of the operators.
This means that operators are grouped among the operands and operands are grouped among the operators.

Difference between sequence points and operator precedence? 0_o

Let me present an example:
a = ++a;
The above statement is said to have undefined behavior (I already read the article on UB on SO),
but according to the precedence rules, the prefix ++ operator has higher precedence than the assignment operator =,
so a should be incremented first and then assigned back to a. So every evaluation is known; why is it UB?
The important thing to understand here is that operators can produce values and can also have side effects.
For example ++a produces (evaluates to) a + 1, but it also has the side effect of incrementing a. The same goes for a = 5 (evaluates to 5, also sets the value of a to 5).
So what you have here is two side effects which change the value of a, both happening between sequence points (the visible semicolon and the end of the previous statement).
It does not matter that due to operator precedence the order in which the two operators are evaluated is well-defined, because the order in which their side effects are processed is still undefined.
Hence the UB.
Precedence is a consequence of the grammar rules for parsing expressions. The fact that ++ has higher precedence than = only means that ++ binds to its operand "tighter" than =. In fact, in your example, there is only one way to parse the expression because of the order in which the operators appear. In an example such as a = b++ the grammar rules or precedence guarantee that this means the same as a = (b++) and not (a = b)++.
Precedence has very little to do with the order of evaluation of expression or the order in which the side-effects of expressions are applied. (Obviously, if an operator operates on another expression according to the grammar rules - or precedence - then the value of that expression has to be calculated before the operator can be applied but most independent sub-expressions can be calculated in any order and side-effects also processed in any order.)
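The difference between grouping and evaluation order shows up in a sketch like the following (f, g, h and call_log are made-up names). Precedence fixes the result at f() + (g() * h()), but the order in which the three calls happen is not fixed by the language, so the check below deliberately looks only at the value and at how many calls were logged.

```cpp
#include <cassert>
#include <string>

std::string call_log;  // records the order in which the functions ran

int f() { call_log += 'f'; return 1; }
int g() { call_log += 'g'; return 2; }
int h() { call_log += 'h'; return 3; }

int evaluate() {
    call_log.clear();
    return f() + g() * h();  // grouping is f() + (g() * h())
}

void check_evaluation_order() {
    assert(evaluate() == 7);       // grouping is fixed by the grammar
    assert(call_log.size() == 3);  // all three ran, in some order
}
```

Printing call_log on different compilers may show different orders, and none of them would be wrong.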
why it is UB ?
Because it is an attempt to change the variable a two times before one sequence point:
++a
operator=
Sequence point evaluation #6: At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;. from Wikipedia.
You're trying to change the same variable, a, twice. ++a changes it, and assignment (=) changes it. But the next sequence point isn't reached until the end of the assignment. So, while it makes complete sense to us, it's not guaranteed by the standard to give the right behavior, as the standard says not to change something more than once between sequence points (to put it simply).
It's kind of subtle, but it could be interpreted as one of the following (and the compiler doesn't know which):
a=(a+1);a++;
a++;a=a;
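This is not an ambiguity in the grammar (the expression parses only one way); it is because the order in which the two side effects are applied is left unspecified.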
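If the intent is simply "increment a once", either of the following forms modifies a exactly once per statement, so the question never arises. This is a sketch under that assumption, with made-up function names.

```cpp
#include <cassert>

int increment_once(int a) {
    ++a;          // one side effect on a in this statement
    return a;
}

int increment_by_assignment(int a) {
    a = a + 1;    // a is read, then written once
    return a;
}

void check_increment_forms() {
    assert(increment_once(5) == 6);
    assert(increment_by_assignment(5) == 6);
}
```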

difference between infix, infixr, infixl

I read a book that uses infix, infixr, and infixl in the sample programs. I'm wondering what the differences are. I'm guessing that infixr performs operation from right to left, and vice versa.
Yes, the r/l indicates the associativity. Without testing I'd assume that infix has normal left associativity.
infix defines the operator to be left-associative, infixr defines it to be right-associative. infixl does not exist.
It depends on the implementation. The SML '97 standard is a little different from SML/NJ and MLton. You get slightly different behaviour with each in terms of associativity rules and the way expressions are parenthesised depending on priority (the standard is a bit stricter than the implementations).