C++ Order of Evaluation of Subexpressions with Logical Operators - c++

There are lots of questions on concepts of precedence and order of evaluation but I failed to find one that refers to my special case.
Consider the following statement:
if(f(0) && g(0)) {};
Is it guaranteed that f(0) will be evaluated first? Notice that the operator is &&.
My confusion stems from what I've read in "The C++ Programming Language, (Stroustrup, 4ed, 2013)".
In section 10.3.2 of the book, it says:
The order of evaluation of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left-to-right. For example:
int x = f(2)+g(3); // undefined whether f() or g() is called first
This seems to apply to all operators including && operator, but in a following paragraph it says:
The operators , (comma), && (logical and), and || (logical or) guarantee that their left-hand operand is evaluated before their right-hand operand.
There is also another mention of this in section 11.1.1:
The && and || operators evaluate their second argument only if necessary, so they can be used to control evaluation order (§10.3.2). For example:
while (p && !whitespace(p)) ++p;
Here, p is not dereferenced if it is the nullptr.
This last quote implies that && and || evaluate their 1st argument first, so it seems to reinforce my assumption that operators mentioned in 2nd quote are exceptions to 1st quote, but I cannot draw a definitive conclusion from this last example either, as the expression contains only one subexpression as opposed to my example, which contains two.

The special sequencing behavior of &&, ||, and , is well-established in C and C++. The first sentence you quoted should say "The order of evaluation of subexpressions within an expression is generally unspecified" or "With a few specific exceptions, the order of evaluation of subexpressions within an expression is unspecified".
You asked about C++, but this question in the C FAQ list is pertinent.
Addendum: I just realized that "unspecified" is a better word in these rules than "undefined". Writing something like f() + g() doesn't give you undefined behavior. You just have no way of knowing whether f or g might be called first.

Yes, it is guaranteed that f(0) will be completely evaluated first.
This is to support behaviour known as short-circuiting, by which we don't need to call the second function at all if the first returns false.

Related

C++ compiler optimizations and short-circuit evaluation [duplicate]

This question already has answers here:
Is short-circuiting logical operators mandated? And evaluation order?
(7 answers)
Closed 7 years ago.
Here is my code :
b = f() || b;
The function f() has side effect and it must be always executed. Normally, only the right operand can be short-circuited and this code should work. But I am afraid some compilators reverse the two operands, since it's more efficient to short-circuit a function evaluation rather than a simple variable evaluation. I know that g++ -O3 can break some specifications, but I don't know if this code can be affected.
So, is my code risk-free?
I knew Is short-circuiting logical operators mandated? And evaluation order? but my question was about compilers optimizations, I didn't know that they can't break the standards (even if this would be strange).
But I am afraid some compilators reverse the two operands
These expressions must be evaluated left-to-right. This is covered in the standard about the operators &&, ||, ?, and ,. They specifically mention the order, as well as enforced sequence points.
§5.14.1 (Logical AND)
The && operator groups left-to-right. The operands are both contextually converted to bool (Clause 4). The result is true if both operands are true and false otherwise. Unlike &, && guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is false.
§5.15.1 (Logical OR)
The || operator groups left-to-right. The operands are both contextually converted to bool (Clause 4). It returns true if either of its operands is true, and false otherwise. Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not evaluated if the first operand evaluates to true.
§5.16.1 (Conditional operator)
Conditional expressions group right-to-left. The first expression is contextually converted to bool (Clause 4). It is evaluated and if it is true, the result of the conditional expression is the value of the second expression,
otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value computation and side effect associated with the first expression is sequenced before every value computation
and side effect associated with the second or third expression.
§5.19.1 (Comma operator)
The comma operator groups left-to-right. A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded value
expression (Clause 5). Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand; the result is of the same value category
as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field. If the value of the right operand is a temporary (12.2), the result is that temporary.
Regarding your concern about optimizations violating this order, no compilers are not allowed to change the order. Compilers must first and foremost (try to) follow the standard. Then they can try to make your code faster. They may not violate the standard just for the sake of performance. That undermines the entire premise of having a standard.
It's explicitly stated by the standard that optimized code should behave "as-if" it was exactly the code that was written, as long as you only rely on standard behavior.
Since the standard requires the boolean statements to be evaluated from left to right, no (compliant) optimization can change the order of evaluation.

computation order of equal priority operands in C

What is the computation order of the equal priority operands in C / C++ ?
For example in following piece of code:
if ( scanf("%c", &ch_variable) && (ch_variable == '\n') )
Can I be sure that 1st expression inside the IF statement is performed before the 2nd (i.e. the value of ch_variable compared, is a newly scanned one)?
Or is it somehow decided by compiler? And if so, how this decision is being made?
BTW, I usually use the following flags for compilation:
-std=c99 -O0 -pedantic -Wall -ansi
Can I be sure that 1st expression inside the IF statement is performed before the 2nd (i.e. the value of ch_variable compared, is a newly scanned one)?
Yes - the first expression (the scanf call) is evaluated first, and what's more the second doesn't happen at all if the scanf call returns 0 - see below. That's short circuit evaluation.
Broader discussion.
Read about the operator precedence at cppreference.com
Summarily - operators are arranged in groups with well-defined relative precedence (e.g. '*' has higher precendence than +, as per usage in mathematics), and left-to-right or right-to-left associativity (e.g. a + b + c is left associative and evaluated as (a + b) + c, but a = b = c is right-associative and evaluated as a = (b = c)).
In your code:
if (scanf("%c", &ch_variable) && (ch_variable == '\n') )
The ( and ) work as you'd expect - overriding any implicit precedence between && and == (but in this case the precedence is the same). && is therefore uncontested, and as a short-circuit operator it ensures its left argument is converted - if necessary - to boolean (so if scanf returns 0 it's deemed false, otherwise true), then if and only if that's true does it evaluate the right-hand-side argument, and only if they're both true does the if statement run the following statement or {} statement block.
This has nothing to do with "priority" (operator precedence), but with the order of evaluation of sub-expressions.
The && operator is a special case in C, as it guarantees order of evaluation from left to right. There is a sequence point between the evaluation of the left operand and the right operand, meaning that the left operation will always be executed/evaluated first.
Many C operators do not come with this nice guarantee, however. Imagine the code had been like this:
if ( (scanf("%c", &ch_variable)!=0) & (ch_variable == '\n') )
This is obfuscated code but it logically does the same thing as your original code. With one exception: the & operator behaves as most operators in C, meaning there are no guarantees that the left operand will get evaluated before the right one. So my example has the potential of evaluating ch_variable before it has been given a valid value, which is a severe bug.
The order of evaluation of such sub-expressions is unspecified behavior, meaning that the compiler is free to evaluate any side first. It doesn't need to document what it will do and it doesn't even need to pick the same side consistently between compilations, or even pick the same side consistently throughout the program.
The language was deliberately designed this way to allow compilers to optimize the code in the best possible way, from case to case.
Yes, absolutely, anything involving && and || (except if you use operator&& or operator|| - which is one of the main reasons NOT to use these operators) is "strict short cutting" - in other words, if the overall outcome of the result can be determined, the rest is not evaluated, and the order is strictly left to right - always, by the language standard. [Of course, if the compiler can be SURE it's completely safe, it may reorder things, but that is part of the "as-if" definition of the standard - if the compiler behaves "as-if" it is doing it the way the standard says].
Beware that:
if(scanf("%c", &ch_variable) && scanf("%c", &second_variable))
{
...
}
else
{
...
}
may not have set "second_variable" at all in the else part, so it's unsafe to use it at this point.
I would aos use scanf("%c", &ch_variable) > 0 instead - as it could return -1 at EOF, which is true in your case, without an intermediate 0 return value...
It's guaranteed that the first expression is evaluated before the second one.
See Is short-circuiting logical operators mandated? And evaluation order? for a citation of the standard.
Note that if you overload the && operator, the whole expression is equivalent to a function call. In that case both expressions will be evaluated unconditionally (i.e. even if the first expression would be some "falsy" value...).
The order that the operands are evaluated is defined in this case, and it is left-to-right.
This goes for both C and C++.
For a reference of this, see for example page 99 of the C standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf.
Hence, in terms of order-of-evaluation, your code will do what you want it to. But it does have some other problems; see the post comments for this.

Do parentheses force order of evaluation and make an undefined expression defined?

I was just going though my text book when I came across this question:
What would be the value of a after the following expression? Assume the initial value of a = 5. Mention the steps.
a+=(a++)+(++a)
At first I thought this is undefined behaviour because a has been modified more than once. Then I read the question and it said "Mention the steps" so I probably thought this question is right.
Does applying parentheses make an undefined behaviour defined?
Is a sequence point created after evaluating a parentheses expression?
If it is defined,how do the parentheses matter since ++ and () have the same precedence?
No, applying parentheses doesn't make it a defined behaviour. It's still undefined. The C99 standard §6.5 ¶2 says
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored.
Putting a sub-expression in parentheses may force the order of evaluation of sub-expressions but it does not create a sequence point. Therefore, it does not guarantee when the side effects of the sub-expressions, if they produce any, will take place. Quoting the C99 standard again §5.1.2.3¶2
Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and no side
effects of subsequent evaluations shall have taken place.
For the sake of completeness, following are sequence points laid down by the C99 standard in Annex C.
The call to a function, after the arguments have been evaluated.
The end of the first operand of the following operators: logical AND &&; logical OR ||; conditional ?; comma ,.
The end of a full declarator.
The end of a full expression; the expression in an expression statement; the controlling expression of a selection statement (if
or switch); the controlling expression of a while or do
statement; each of the expressions of a for statement; the
expression in a return statement.
Immediately before a library function returns.
After the actions associated with each formatted input/output function conversion specifier.
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any
movement of the objects passed as arguments to that call.
Adding parenthesis does not create a sequence point and in the more modern standards it does not create a sequenced before relationship with respect to side effects which is the problem with the expression that you have unless noted the rest of this will be with respect to C++11. Parenthesis are a primary expression covered in section 5.1 Primary expressions, which has the following grammar (emphasis mine going forward):
primary-expression:
literal
this
( expression )
[...]
and in paragraph 6 it says:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. The presence of parentheses does not affect whether the expression is an lvalue. The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
The postfix ++ is problematic since we can not determine when the side effect of updating a will happen pre C++11 and in C this applies to both the postfix ++ and prefix ++ operations. With respect to how undefined behavior changed for prefix ++ in C++11 see Assignment operator sequencing in C11 expressions.
The += operation is problematic since:
[...]E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is
evaluated only once[...]
So in C++11 the following went from undefined to defined:
a = ++a + 1 ;
but this remains undefined:
a = a++ + 1 ;
and both of the above are undefined pre C++11 and in both C99 and C11.
From the draft C++11 standard section 1.9 Program execution paragraph 15 says:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Operator associativity and order of evaluation [duplicate]

The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:
Between evaluation of the left and right operands of the && (logical
AND), || (logical OR), and comma
operators. For example, in the
expression *p++ != 0 && *q++ != 0, all
side effects of the sub-expression
*p++ != 0 are completed before any attempt to access q.
Between the evaluation of the first operand of the ternary
"question-mark" operator and the
second or third operand. For example,
in the expression a = (*p++) ? (*p++)
: 0 there is a sequence point after
the first *p++, meaning it has already
been incremented by the time the
second instance is executed.
At the end of a full expression. This category includes expression
statements (such as the assignment
a=b;), return statements, the
controlling expressions of if, switch,
while, or do-while statements, and all
three expressions in a for statement.
Before a function is entered in a function call. The order in which
the arguments are evaluated is not
specified, but this sequence point
means that all of their side effects
are complete before the function is
entered. In the expression f(i++) + g(j++) + h(k++),
f is called with a
parameter of the original value of i,
but i is incremented before entering
the body of f. Similarly, j and k are
updated before entering g and h
respectively. However, it is not
specified in which order f(), g(), h()
are executed, nor in which order i, j,
k are incremented. The values of j and
k in the body of f are therefore
undefined.3 Note that a function
call f(a,b,c) is not a use of the
comma operator and the order of
evaluation for a, b, and c is
unspecified.
At a function return, after the return value is copied into the
calling context. (This sequence point
is only specified in the C++ standard;
it is present only implicitly in
C.)
At the end of an initializer; for example, after the evaluation of 5
in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behaviour is tightly coupled with Sequence Points.
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now its evident that Line 5 will make the variable result store 1.
Now the expression x<y<z in Line 5 can be evaluated as either:
x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know, when the Operator Precedence is Equal/Same - Associativity comes into play, hence, is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article:
It mentions "Expressions with higher-precedence operators are evaluated first."
It may sound incorrect. But, I think the article is not saying something wrong if we consider that () is also an operator x<y<z is same as (x<y)<z. My reasoning is if associativity does not come into play, then the complete expressions evaluation would become ambiguous since < is not a Sequence Point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So taking, the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since, the simplest form of an expression consists of a single literal constant or object. Hence the result of the expressions x, y, z would be there rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.
Now, doesn't Associativity come into play since now we have 2 expressions to be evaluated, either 10<1 or 1<2 and since the precedence of operator is same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );
Now in the above example, since the comma operator has same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In SO/IEC 9899:201x under J.1 Unspecified behavior it mentions:
The order in which subexpressions are evaluated and the order in which side effects
take place, except as specified for the function-call (), &&, ||, ?:, and comma
operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question.
The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Yes, the MSDN article is in error, at least with respect to standard C and C++1.
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:
push(z); // Evaluates its argument and pushes value on stack
push(y);
push(x);
test_less(); // compares TOS to TOS(1), pushes result on stack
test_less();
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:
push(z);
push(y);
push(x);
mul();
add();
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:
push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea is fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:
load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
load variable[i]
add_with_carry 0
store variable[i]
}
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
1 I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.
The only way precedence influences order of evaluation is that it
creates dependencies; otherwise the two are orthogonal. You've
carefully chosen trivial examples where the dependencies created by
precedence do end up fully defining order of evaluation, but this isn't
generally true. And don't forget, either, that many expressions have
two effects: they result in a value, and they have side effects. These
two are no required to occur together, so even when dependencies
force a specific order of evaluation, this is only the order of
evaluation of the values; it has no effect on side effects.
A good way to look at this is to take the expression tree.
If you have an expression, lets say x+y*z you can rewrite that into an expression tree:
Applying the priority and associativity rules:
x + ( y * z )
After applying the priority and associativity rules, you can safely forget about them.
In tree form:
x
+
y
*
z
Now the leaves of this expression are x, y and z. What this means is that you can evaluate x, y and z in any order you want, and also it means that you can evaluate the result of * and x in any order.
Now since these expressions don't have side effects you don't really care. But if they do, the ordering can change the result, and since the ordering can be anything the compiler decides, you have a problem.
Now, sequence points bring a bit of order into this chaos. They effectively cut the tree into sections.
x + y * z, z = 10, x + y * z
after priority and associativity
x + ( y * z ) , z = 10, x + ( y * z)
the tree:
x
+
y
*
z
, ------------
z
=
10
, ------------
x
+
y
*
z
The top part of the tree will be evaluated before the middle, and middle before bottom.
Precedence has nothing to do with order of evaluation and vice-versa.
Precedence rules describe how an underparenthesized expression should be parenthesized when the expression mixes different kinds of operators. For example, multiplication is of higher precedence than addition, so 2 + 3 x 4 is equivalent to 2 + (3 x 4), not (2 + 3) x 4.
Order of evaluation rules describe the order in which each operand in an expression is evaluated.
Take an example
y = ++x || --y;
By operator precedence rule, it will be parenthesize as (++/-- has higher precedence than || which has higher precedence than =):
y = ( (++x) || (--y) )
The order of evaluation of logical OR || states that (C11 6.5.14)
the || operator guarantees left-to-right evaluation.
This means that the left operand, i.e the sub-expression (x++) will be evaluated first. Due to short circuiting behavior; If the first operand compares unequal to 0, the second operand is not evaluated, right operand --y will not be evaluated although it is parenthesize prior than (++x) || (--y).
It mentions "Expressions with higher-precedence operators are evaluated first."
I am just going to repeat what I said here. As far as standard C and C++ are concerned that article is flawed. Precedence only affects which tokens are considered to be the operands of each operator, but it does not affect in any way the order of evaluation.
So, the link only explains how Microsoft implemented things, not how the language itself works.
I think it's only the
a++ + ++a
epxression problematic, because
a = a++ + ++a;
fits first in 3. but then in the 6. rule: complete evaluation before assignment.
So,
a++ + ++a
gets for a=1 fully evaluated to:
1 + 3 // left to right, or
2 + 2 // right to left
The result is the same = 4.
An
a++ * ++a // or
a++ == ++a
would have undefined results. Isn't it?

Difference between sequence points and operator precedence? 0_o

Let me present a example :
a = ++a;
The above statement is said to have undefined behaviors ( I already read the article on UB on SO)
but according precedence rule operator prefix ++ has higher precedence than assignment operator =
so a should be incremented first then assigned back to a. so every evaluation is known, so why it is UB ?
The important thing to understand here is that operators can produce values and can also have side effects.
For example ++a produces (evaluates to) a + 1, but it also has the side effect of incrementing a. The same goes for a = 5 (evaluates to 5, also sets the value of a to 5).
So what you have here is two side effects which change the value of a, both happening between sequence points (the visible semicolon and the end of the previous statement).
It does not matter that due to operator precedence the order in which the two operators are evaluated is well-defined, because the order in which their side effects are processed is still undefined.
Hence the UB.
Precedence is a consequence of the grammar rules for parsing expressions. The fact that ++ has higher precedence than = only means that ++ binds to its operand "tighter" than =. In fact, in your example, there is only one way to parse the expression because of the order in which the operators appear. In an example such as a = b++ the grammar rules or precedence guarantee that this means the same as a = (b++) and not (a = b)++.
Precedence has very little to do with the order of evaluation of expression or the order in which the side-effects of expressions are applied. (Obviously, if an operator operates on another expression according to the grammar rules - or precedence - then the value of that expression has to be calculated before the operator can be applied but most independent sub-expressions can be calculated in any order and side-effects also processed in any order.)
why it is UB ?
Because it is an attempt to change the variable a two times before one sequence point:
++a
operator=
Sequence point evaluation #6: At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;. from Wikipedia.
You're trying to change the same variable, a, twice. ++a changes it, and assignment (=) changes it. But the sequence point isn't complete until the end of the assignment. So, while it makes complete sense to us - it's not guaranteed by the standard to give the right behavior as the standard says not to change something more than once in a sequence point (to put it simply).
It's kind of subtle, but it could be interpreted as one of the following (and the compiler doesn't know which:
a=(a+1);a++;
a++;a=a;
This is because of some ambiguity in the grammar.