How are (complex) declarations parsed in terms of precedence and associativity? - c++

Symbols such as & and * are used in both expressions and declarations, which are two distinct concepts.
In expressions, the symbols are operators, for which we have a well-defined table of precedence and associativity. When an expression is complex, we can decompose and analyze it using this table.
e.g.
(a + b) * ++c
Question:
In declarations, these symbols are not operators, and hence we cannot apply the table of precedence and associativity for operators. Is there a table of precedence and associativity for symbols in declarations?
In other words, when a declaration gets complicated (try this one: int*& (*&f(int*))), is there a systematic way to decompose and analyze it?
A closely related follow-up question:
A book (Primer) taught us how to read complex declarations with a typedef example:
typedef int (*tp_alias)[10]; // defines tp_alias as a pointer to an array of 10 int
Method taught by the book: use the alias name as the starting point of reading. tp_alias is the new type name. Looking to the left, it has a *, so it is a pointer. Then look outside the parentheses: to the right, [10] means it is an array of size 10; to the left, int means the elements of the array are int.
Follow-up Question:
How do we read other type alias declarations, such as using, where the alias name is no longer inside the declarator? E.g. using tp_alias = int (*)[10];
Maybe we should start reading from within the (), but what if there is more than one pair of ()? (I have not seen one, but it is a possibility.)

You use the spiral rule.
A good explanation of it is here.

The C++ standard has no precedence table for expressions!
The precedence of operators is inferred from the grammar. For example, consider the definitions of logical-and-expression and logical-or-expression:
logical-and-expression:
    inclusive-or-expression
    logical-and-expression && inclusive-or-expression
logical-or-expression:
    logical-and-expression
    logical-or-expression || logical-and-expression
Now consider the expression a || b && c. It must be parsed as a logical-or-expression (where the left side is the logical-or-expression a and the right side is the logical-and-expression b && c). It can't be parsed as a logical-and-expression, because if it were, then the left side, a || b, would have to be a logical-and-expression too, and it isn't.
On the other hand, in (a || b) && c, you can't parse it as a logical-or-expression because then you'd have (a as the left side and b) && c as the right side, and neither is a valid expression. You can parse it as a logical-and-expression because (a || b), unlike a || b, is a valid logical-and-expression.
The compiler parses the code, and builds a syntax tree. If the root node is a logical-or-expression, then the logical OR operation is done last. If the root node is a logical-and-expression, then the logical AND operation is done last. And so on.
OK, so that was the bad news. Now the good news.
Even though there's no precedence table for operators in expressions, most of the time you can just pretend there is one (which can, of course, be overridden by parentheses); that's how you parse expressions mentally. It turns out the same is true for declarators: there is no precedence table, but you can mentally parse them as though there were.
In fact, the designers of C were wise enough to make declarations use the same "precedence rules" as expressions. This is the "declaration follows usage" rule. For example, in the expression
(*a)[10]
we have an array indexing expression containing an indirection expression, and not vice versa. In the same way, in
int (*a)[10];
what we have is an array declarator containing a pointer declarator, and not vice versa. (Therefore, the result is a pointer to array.) So if you remember from expressions that [] (array indexing) and () (function call) have higher precedence than * (indirection), you can apply that same ordering to understanding declarations too. (References are a special case; it's easy to remember that references have the same "precedence" in a declaration as pointers.)

Related

Can parentheses override an expression's order of evaluation? [duplicate]

This question already has answers here:
Operator Precedence vs Order of Evaluation
Closed 5 years ago.
Grouping (of operators and operands) and order of evaluation are two important concepts of expressions in C++.
Grouping
For an expression with multiple operators, how the operands are grouped with specific operators is decided by the precedence and associativity of the operators and may depend on the order of evaluation.
Order
Before C++17, only four operators had a specified order of evaluation (logical AND, logical OR, conditional and comma); C++17 added guarantees for a few more, such as <<, >>, [] and assignment. For the other operators, the evaluation order is unspecified.
Parentheses
Parentheses can override the precedence and associativity, and therefore specify the grouping of a compound expression.
However, the book by Peter Gottschling claims that parentheses can change the order of evaluation. I personally doubt it; I think it's an error! In the example from the quotation below, the parentheses do not tell which of x, y and z is evaluated first, second or last. They only group the expression y + z as the right operand of the * operator.
An expression surrounded by parentheses is an expression as well,
e.g., (x + y). As this grouping by parentheses precedes all operators,
we can change the order of evaluation to suit our needs: x * (y + z)
computes the addition first. (Discovering Modern C++, Chapter 1.4.1)
Question
Can parentheses override expressions' order of evaluation?
The quoted sentence is poorly worded. The author didn't mean that the order of evaluation is changed, or even specified; I think the word "order" was meant in terms of how a human might read the expression (i.e. precedence).
Of course, if the three variables are independent and reading them has no side-effects, the "as if" rule makes the unspecified order irrelevant, as it wouldn't change the value of the expression.

C++ Order of Evaluation of Subexpressions with Logical Operators

There are lots of questions on concepts of precedence and order of evaluation but I failed to find one that refers to my special case.
Consider the following statement:
if(f(0) && g(0)) {};
Is it guaranteed that f(0) will be evaluated first? Notice that the operator is &&.
My confusion stems from what I've read in "The C++ Programming Language, (Stroustrup, 4ed, 2013)".
In section 10.3.2 of the book, it says:
The order of evaluation of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left-to-right. For example:
int x = f(2)+g(3); // undefined whether f() or g() is called first
This seems to apply to all operators including && operator, but in a following paragraph it says:
The operators , (comma), && (logical and), and || (logical or) guarantee that their left-hand operand is evaluated before their right-hand operand.
There is also another mention of this in section 11.1.1:
The && and || operators evaluate their second argument only if necessary, so they can be used to control evaluation order (§10.3.2). For example:
while (p && !whitespace(*p)) ++p;
Here, p is not dereferenced if it is the nullptr.
This last quote implies that && and || evaluate their first argument first, so it seems to reinforce my assumption that the operators mentioned in the second quote are exceptions to the first quote. But I cannot draw a definitive conclusion from this last example either, as its expression contains only one subexpression, whereas my example contains two.
The special sequencing behavior of &&, ||, and , is well-established in C and C++. The first sentence you quoted should say "The order of evaluation of subexpressions within an expression is generally unspecified" or "With a few specific exceptions, the order of evaluation of subexpressions within an expression is unspecified".
You asked about C++, but this question in the C FAQ list is pertinent.
Addendum: I just realized that "unspecified" is a better word in these rules than "undefined". Writing something like f() + g() doesn't give you undefined behavior. You just have no way of knowing whether f or g might be called first.
Yes, it is guaranteed that f(0) will be completely evaluated first.
This is to support behaviour known as short-circuiting, by which we don't need to call the second function at all if the first returns false.

computation order of equal priority operands in C

What is the order of evaluation of operands with equal priority in C/C++?
For example in following piece of code:
if ( scanf("%c", &ch_variable) && (ch_variable == '\n') )
Can I be sure that the first expression inside the if statement is evaluated before the second (i.e. that the value of ch_variable being compared is the newly scanned one)?
Or is it decided by the compiler? And if so, how is this decision made?
BTW, I usually use the following flags for compilation:
-std=c99 -O0 -pedantic -Wall -ansi
Can I be sure that 1st expression inside the IF statement is performed before the 2nd (i.e. the value of ch_variable compared, is a newly scanned one)?
Yes - the first expression (the scanf call) is evaluated first, and what's more the second doesn't happen at all if the scanf call returns 0 - see below. That's short circuit evaluation.
Broader discussion.
Read about the operator precedence at cppreference.com
Summarily: operators are arranged in groups with well-defined relative precedence (e.g. * has higher precedence than +, as in mathematics) and left-to-right or right-to-left associativity (e.g. + is left-associative, so a + b + c is evaluated as (a + b) + c, whereas = is right-associative, so a = b = c is evaluated as a = (b = c)).
In your code:
if (scanf("%c", &ch_variable) && (ch_variable == '\n') )
The ( and ) work as you'd expect, overriding any implicit precedence between && and == (though here they are redundant, since == already binds more tightly than &&). && is therefore uncontested, and as a short-circuit operator it converts its left argument to boolean if necessary (so if scanf returns 0 it's deemed false, otherwise true). If and only if that's true does it evaluate the right-hand argument, and only if both are true does the if statement run the following statement or {} block.
This has nothing to do with "priority" (operator precedence), but with the order of evaluation of sub-expressions.
The && operator is a special case in C, as it guarantees order of evaluation from left to right. There is a sequence point between the evaluation of the left operand and the right operand, meaning that the left operation will always be executed/evaluated first.
Many C operators do not come with this nice guarantee, however. Imagine the code had been like this:
if ( (scanf("%c", &ch_variable)!=0) & (ch_variable == '\n') )
This is obfuscated code but it logically does the same thing as your original code. With one exception: the & operator behaves as most operators in C, meaning there are no guarantees that the left operand will get evaluated before the right one. So my example has the potential of evaluating ch_variable before it has been given a valid value, which is a severe bug.
The order of evaluation of such sub-expressions is unspecified behavior, meaning that the compiler is free to evaluate any side first. It doesn't need to document what it will do and it doesn't even need to pick the same side consistently between compilations, or even pick the same side consistently throughout the program.
The language was deliberately designed this way to allow compilers to optimize the code in the best possible way, from case to case.
Yes, absolutely: anything involving && and || strictly short-circuits (except if you overload operator&& or operator||, which is one of the main reasons NOT to overload these operators). In other words, once the overall outcome can be determined, the rest is not evaluated, and the order is strictly left to right, always, by the language standard. [Of course, if the compiler can be SURE it's completely safe, it may reorder things, but that is part of the "as-if" rule: the compiler only has to behave as if it were doing things the way the standard says.]
Beware that:
if(scanf("%c", &ch_variable) && scanf("%c", &second_variable))
{
...
}
else
{
...
}
may not have set "second_variable" at all in the else part, so it's unsafe to use it at this point.
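I would also use scanf("%c", &ch_variable) > 0 instead, because scanf can return EOF (a negative value, typically -1) at end of input. That value counts as true in your condition even though nothing was read, and scanf can go straight to it without ever returning 0.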
It's guaranteed that the first expression is evaluated before the second one.
See Is short-circuiting logical operators mandated? And evaluation order? for a citation of the standard.
Note that if you overload the && operator, the whole expression is equivalent to a function call. In that case both expressions will be evaluated unconditionally (i.e. even if the first expression would be some "falsy" value...).
The order that the operands are evaluated is defined in this case, and it is left-to-right.
This goes for both C and C++.
For a reference of this, see for example page 99 of the C standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf.
Hence, in terms of order-of-evaluation, your code will do what you want it to. But it does have some other problems; see the post comments for this.

Are C/C++ operator precedence & associativity rules ever violated?

Are operator precedence & associativity rules ever violated in any C/C++ expression?
If so, can you give an example?
Assume the claims of precedence and associativity rules are:
Each operator has a given precedence level, and each precedence level has a given associativity. If a sub-expression is seen by two operators where they expect an operand, it belongs to the one with higher precedence. Ties are broken by associativity.
Edit: Background
The standard defines C/C++ expressions as a CFG, which is much more flexible than a precedence-based parser. For example, it would have been possible to give binary operators asymmetrical "precedence", which would have rendered any precedence table incorrect. However, it appears to me that the design of the grammar was constrained to uphold simple precedence rules. Here are some alleged "counterexamples" that I have come across:
1) a?b,c:d is not interpreted as (a?b),(c:d)
Some claim that the ?: operator exhibits different precedence towards its middle operand than towards its other operands, because a?b,c:d, for example, is not interpreted as (a?b),(c:d). However, neither b nor c occupies a position in which it appears to ?: as its inner operand. By that reasoning a[b+c] should be interpreted as (a[b)+(c]), which is ludicrous.
2) sizeof(int)*a is interpreted as (sizeof(int))*a rather than sizeof((int)(*a))
... because C disallows an unparenthesized cast as sizeof's operand. However, both of these interpretations conform to precedence rules. The confusion comes from the ambiguity of the * operator (is it the binary or the unary operator?). Precedence tables are not meant to resolve operator ambiguities; they are, after all, not operator-symbol precedence tables. So the operator precedence rules themselves are intact.
3) a+b=c results in syntax error, not semantic error
a+b=c, according to the standard, is invalid C syntax. If C had a precedence-based parser, the error would only be caught at the semantic level. In C, it so happens that any expression that is not a unary-expression cannot be an lvalue, so these semantically doomed left-hand sides do not need to be accommodated syntactically. It makes no difference to the language as a whole, and precedence tables needn't be in the business of predicting whether the error resulting from an expression is syntactic or semantic.
For one example, the usual precedence table says that sizeof and cast expressions have the same precedence. Both the table and the standard say that they associate right-to-left.
This simplification is fine when you're looking at, say, *&foo, which means the same as *(&foo).
It might also suggest to you that sizeof (int) 1 is legal C++ and that it means the same thing as sizeof( (int) 1 ). But it's not legal, because in fact sizeof( type-id ) is a special thing of its own in the grammar. Its existence prevents sizeof (int) 1 from being a sizeof expression whose operand is a cast-expression whose operand is 1.
So I think you could say that the "sizeof ( type-id )" production in the C++ grammar is an exception to what the usual precedence/associativity tables say. They do accurately describe the "sizeof unary-expression" production.
It depends on whether the "rules" are correct. The language definition doesn't talk about precedence, so the precedence tables you see in various places may or may not reflect what the language definition actually requires.
Until someone can find a counterexample, I'm going to put forward this as the default answer:
No, C/C++ precedence and associativity rules are never violated.

Difference between sequence points and operator precedence? 0_o

Let me present an example:
a = ++a;
The above statement is said to have undefined behavior (I already read the article on UB on SO),
but according to the precedence rules, prefix ++ has higher precedence than the assignment operator =,
so a should be incremented first and then assigned back to a. Every evaluation seems to be known, so why is it UB?
The important thing to understand here is that operators can produce values and can also have side effects.
For example ++a produces (evaluates to) a + 1, but it also has the side effect of incrementing a. The same goes for a = 5 (evaluates to 5, also sets the value of a to 5).
So what you have here is two side effects which change the value of a, both happening between sequence points (the visible semicolon and the end of the previous statement).
It does not matter that operator precedence fixes how the two operators are grouped, because the order in which their side effects are applied is still unsequenced.
Hence the UB.
Precedence is a consequence of the grammar rules for parsing expressions. The fact that ++ has higher precedence than = only means that ++ binds to its operand "tighter" than =. In fact, in your example, there is only one way to parse the expression because of the order in which the operators appear. In an example such as a = b++ the grammar rules or precedence guarantee that this means the same as a = (b++) and not (a = b)++.
Precedence has very little to do with the order of evaluation of expression or the order in which the side-effects of expressions are applied. (Obviously, if an operator operates on another expression according to the grammar rules - or precedence - then the value of that expression has to be calculated before the operator can be applied but most independent sub-expressions can be calculated in any order and side-effects also processed in any order.)
why is it UB?
Because it attempts to modify the variable a twice between two sequence points:
++a
operator=
From Wikipedia, sequence point #6: "At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;"
You're trying to change the same variable, a, twice: ++a changes it, and the assignment (=) changes it. But the sequence point isn't reached until the end of the assignment. So while it makes complete sense to us, the standard doesn't guarantee the behavior, because it says not to modify an object more than once between sequence points (to put it simply).
It's kind of subtle, but it could be interpreted as either of the following (and the compiler doesn't know which):
a=(a+1);a++;
a++;a=a;
This is because the standard does not order the two side effects, not because of any ambiguity in the grammar.