What's the general rules to placing parenthesis in OCaml? - ocaml

I always have troubles to place parenthesis in OCaml. Well, I always don't want to, but sometimes get error.
for example, let's say I have two functions:
let f_a x y = x+y and let f_b x = x+1.
If I do f_a 3 f_b 4, I can't and I should do f_a 3 (f_b 4).
But if I do f_a 3 * f_b 4, it is perfectly fine.
Another example, If I do f_a x y::[], it is fine too and I don't need to add parenthesis like this (f_a x y)::[].
Also I find that I don't need parenthesis for elements inside a tuple: (f_a 1 2, f_b 3) is fine.
So can anyone teach me the general rules to decide when using parenthesis and when not?

Here's a table explaining the precedence and associativity of certain expressions in OCaml.
As you can see from there, function application is left-associative, meaning that f_a 3 f_b 4 is interpreted as (((f_a) 3) f_b) 4. However, multiplication (*) has lower precedence than function application, which means that f_a 3 * f_b 4 is interpreted as (f_a 3) * (f_b 4) (first applying functions, afterwards multiplication).
Last, :: has a lower precedence than function application, so f_a x y::[] first applies the function, and afterwards concatenates to the empty list (i.e. "consumes" ::[]). This means that f_a x y::[] is seen as (f_a x y)::[].
Unfortunately, I could not deduce a simple rule of thumb, but I always remember that "function application has a quite high precedence and is left-associative". This works quite well for me.

Parentheses are simply to group things so they are evaluated as one whereas without parentheses they may or may not be due to operator precedence. Function application has higher precedence than all normal operators. You can see the OCaml precedence table by going here and then scrolling up a little.

Related

What is ACTUALLY happening with parenthesis '()' in Clojure?

I'm looking for the technical answer answer here. How is Clojure interpreting these symbols? My current working understanding is that the opening paren '(' is a kind of call that calls the succeeding operator on the operands while the closing paren ')' is a terminate that wraps up the previous evaluation and returns the final value generated (whether function or value).
Any and all details on the truth here would be appreciated. I'm looking to go deep here as well as seeing/knowing every level of abstraction along the way. It bugs me to know that I may have some imaginal thinking going on currently.
(foo x1 x2)
is the syntax for calling a special form or var (function or macro).
So the compiler will analyze the form (foo x1 x2) and will check if foo is a special form (if, try, let*, etc.) and if not, the symbol will be resolved to a var in the context of the current namespace. If that var is macro, then macroexpansion will happen, else the call will be treated as a normal function call.
To prevent treating (foo x1 x2) as a function call you can quote the expression: '(foo x1 x2) and then it will just remain a list of symbols.
More info:
https://clojure.org/reference/special_forms
https://clojure.org/reference/macros
You are trying to do too much at once when you say
'(' is a kind of call that calls the succeeding operator on the operands while the closing paren ')' is a terminate that wraps up the previous evaluation and returns the final value generated
The Clojure evaluation model does not assign semantics to characters directly. Instead, evaluation of a Clojure program goes through two broad phases:
First, read the characters in the source file according to the language's lexical rules, yielding a Clojure data structure
Second, evaluate that data structure, according to the language's evaluation rules, yielding a value
So when we write an expression like (+ (* 4 5) 2), what happens? The reader matches up parentheses to create lists, and yields as its result a list of three elements: the symbol +, another list (containing the symbol * and the numbers 4 and 5), and the number 2.
Next we move to evaluate that expression. Notice, crucially, that at this point there is no trace of parentheses. The textual source of the program is no longer material. We're evaluating a list. Of course, if we printed that list, conventionally we would surround it with parentheses, but that does not concern the evaluator. How do we evaluate this list? Well, the evaluation rule for lists is to, first, evaluate each of the components, and then invoke the first component as a function, passing the remaining components as arguments1. So our list of pending tasks is:
Evaluate +
Evaluate (* 4 5)
Evaluate 2
Invoke the result of (1), passing the results of (2) and (3) as arguments
(1), of course, evaluates to the addition function, (2) evaluates (in a similar manner) to the number 20, and (3) evaluates to the number 2 (since numbers evaluate to themselves). Thus, (4) becomes "Invoke the addition function, passing the numbers 20 and 2 as arguments". Of course, the final result is 22.
1 The rule is actually more complicated than this, because of macros, but for functions this suffices.
The other 2 answers are great. The simple summary is:
foo( x1, x2 ) // Java function call
(foo x1 x2) // Clojure function call
In both cases, the compiler will evaluate any nested function calls in x1 or x2 before calling foo on the resulting values.

The nonfix operator in SML

I read the grammar of SML and I found out that beside infix and infixr it also contains nonfix. I tried to find some basic examples but it seems that no one uses it. Also tried to find some previous threads about that operator but there are none.
What is the idea behind nonfix? Why it seems like no one uses it?
nonfix turns an infix operator into a "regular" function on tuples. For example * is a function of type int * int -> int, but can't be called as e.g. * (2,3). If for some reason you wanted to do that, you could do the following:
nonfix *
and then * (2,3) will evaluate to 6.
Unfortunately, as an annoying side effect, you can no longer use 2 * 3.
The reason why it doesn't seem to be heavily used is that if I wanted to use * as a regular function, I could just use op: For example, op * (2,3) evaluates to 6. The annoyance of not being able to later on use * as an infix operator outweighs the advantage of not having to type op.

Operator associativity and order of evaluation [duplicate]

The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:
Between evaluation of the left and right operands of the && (logical
AND), || (logical OR), and comma
operators. For example, in the
expression *p++ != 0 && *q++ != 0, all
side effects of the sub-expression
*p++ != 0 are completed before any attempt to access q.
Between the evaluation of the first operand of the ternary
"question-mark" operator and the
second or third operand. For example,
in the expression a = (*p++) ? (*p++)
: 0 there is a sequence point after
the first *p++, meaning it has already
been incremented by the time the
second instance is executed.
At the end of a full expression. This category includes expression
statements (such as the assignment
a=b;), return statements, the
controlling expressions of if, switch,
while, or do-while statements, and all
three expressions in a for statement.
Before a function is entered in a function call. The order in which
the arguments are evaluated is not
specified, but this sequence point
means that all of their side effects
are complete before the function is
entered. In the expression f(i++) + g(j++) + h(k++),
f is called with a
parameter of the original value of i,
but i is incremented before entering
the body of f. Similarly, j and k are
updated before entering g and h
respectively. However, it is not
specified in which order f(), g(), h()
are executed, nor in which order i, j,
k are incremented. The values of j and
k in the body of f are therefore
undefined.3 Note that a function
call f(a,b,c) is not a use of the
comma operator and the order of
evaluation for a, b, and c is
unspecified.
At a function return, after the return value is copied into the
calling context. (This sequence point
is only specified in the C++ standard;
it is present only implicitly in
C.)
At the end of an initializer; for example, after the evaluation of 5
in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behaviour is tightly coupled with Sequence Points.
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now its evident that Line 5 will make the variable result store 1.
Now the expression x<y<z in Line 5 can be evaluated as either:
x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know, when the Operator Precedence is Equal/Same - Associativity comes into play, hence, is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article:
It mentions "Expressions with higher-precedence operators are evaluated first."
It may sound incorrect. But, I think the article is not saying something wrong if we consider that () is also an operator x<y<z is same as (x<y)<z. My reasoning is if associativity does not come into play, then the complete expressions evaluation would become ambiguous since < is not a Sequence Point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So taking, the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since, the simplest form of an expression consists of a single literal constant or object. Hence the result of the expressions x, y, z would be there rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.
Now, doesn't Associativity come into play since now we have 2 expressions to be evaluated, either 10<1 or 1<2 and since the precedence of operator is same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );
Now in the above example, since the comma operator has same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In SO/IEC 9899:201x under J.1 Unspecified behavior it mentions:
The order in which subexpressions are evaluated and the order in which side effects
take place, except as specified for the function-call (), &&, ||, ?:, and comma
operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question.
The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Yes, the MSDN article is in error, at least with respect to standard C and C++1.
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:
push(z); // Evaluates its argument and pushes value on stack
push(y);
push(x);
test_less(); // compares TOS to TOS(1), pushes result on stack
test_less();
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:
push(z);
push(y);
push(x);
mul();
add();
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:
push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea is fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:
load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
load variable[i]
add_with_carry 0
store variable[i]
}
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
1 I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.
The only way precedence influences order of evaluation is that it
creates dependencies; otherwise the two are orthogonal. You've
carefully chosen trivial examples where the dependencies created by
precedence do end up fully defining order of evaluation, but this isn't
generally true. And don't forget, either, that many expressions have
two effects: they result in a value, and they have side effects. These
two are no required to occur together, so even when dependencies
force a specific order of evaluation, this is only the order of
evaluation of the values; it has no effect on side effects.
A good way to look at this is to take the expression tree.
If you have an expression, lets say x+y*z you can rewrite that into an expression tree:
Applying the priority and associativity rules:
x + ( y * z )
After applying the priority and associativity rules, you can safely forget about them.
In tree form:
x
+
y
*
z
Now the leaves of this expression are x, y and z. What this means is that you can evaluate x, y and z in any order you want, and also it means that you can evaluate the result of * and x in any order.
Now since these expressions don't have side effects you don't really care. But if they do, the ordering can change the result, and since the ordering can be anything the compiler decides, you have a problem.
Now, sequence points bring a bit of order into this chaos. They effectively cut the tree into sections.
x + y * z, z = 10, x + y * z
after priority and associativity
x + ( y * z ) , z = 10, x + ( y * z)
the tree:
x
+
y
*
z
, ------------
z
=
10
, ------------
x
+
y
*
z
The top part of the tree will be evaluated before the middle, and middle before bottom.
Precedence has nothing to do with order of evaluation and vice-versa.
Precedence rules describe how an underparenthesized expression should be parenthesized when the expression mixes different kinds of operators. For example, multiplication is of higher precedence than addition, so 2 + 3 x 4 is equivalent to 2 + (3 x 4), not (2 + 3) x 4.
Order of evaluation rules describe the order in which each operand in an expression is evaluated.
Take an example
y = ++x || --y;
By operator precedence rule, it will be parenthesize as (++/-- has higher precedence than || which has higher precedence than =):
y = ( (++x) || (--y) )
The order of evaluation of logical OR || states that (C11 6.5.14)
the || operator guarantees left-to-right evaluation.
This means that the left operand, i.e the sub-expression (x++) will be evaluated first. Due to short circuiting behavior; If the first operand compares unequal to 0, the second operand is not evaluated, right operand --y will not be evaluated although it is parenthesize prior than (++x) || (--y).
It mentions "Expressions with higher-precedence operators are evaluated first."
I am just going to repeat what I said here. As far as standard C and C++ are concerned that article is flawed. Precedence only affects which tokens are considered to be the operands of each operator, but it does not affect in any way the order of evaluation.
So, the link only explains how Microsoft implemented things, not how the language itself works.
I think it's only the
a++ + ++a
epxression problematic, because
a = a++ + ++a;
fits first in 3. but then in the 6. rule: complete evaluation before assignment.
So,
a++ + ++a
gets for a=1 fully evaluated to:
1 + 3 // left to right, or
2 + 2 // right to left
The result is the same = 4.
An
a++ * ++a // or
a++ == ++a
would have undefined results. Isn't it?

Explain the difference:

int i=1,2,3,4; // Compile error
// The value of i is 1
int i = (1,2,3,4,5);
// The value of i is 5
What is the difference between these definitions of i in C and how do they work?
Edit: The first one is a compiler error. How does the second work?
= takes precedence over ,1. So the first statement is a declaration and initialisation of i:
int i = 1;
… followed by lots of comma-separated expressions that do nothing.
The second code, on the other hand, consists of one declaration followed by one initialisation expression (the parentheses take precedence so the respective precedence of , and = are no longer relevant).
Then again, that’s purely academic since the first code isn’t valid, neither in C nor in C++. I don’t know which compiler you’re using that it accepts this code. Mine (rightly) complains
error: expected unqualified-id before numeric constant
1 Precedence rules in C++ apply regardless of how an operator is used. = and , in the code of OP do not refer to operator= or operator,. Nevertheless, they are operators as far as C++ is concerned (§2.13 of the standard), and the precedence of the tokens = and , does not depend on their usage – it so happens that , always has a lower precedence than =, regardless of semantics.
You have run into an interesting edge case of the comma operator (,).
Basically, it takes the result of the previous statement and discards it, replacing it with the next statement.
The problem with the first line of code is operator precedence. Because the = operator has greater precedence than the , operator, you get the result of the first statement in the comma chain (1).
Correction (thanks #jrok!) - the first line of code neither compiles, nor is it using the comma as an operator, but instead as an expression separator, which allows you to define multiple variable names of the same type at a time.
In the second one, all of the first values are discarded and you are given the final result in the chain of items (5).
Not sure about C++, but at least for C the first one is invalid syntax so you can't really talk about a declaration since it doesn't compile. The second one is just the comma operator misused, with the result 5.
So, bluntly, the difference is that the first isn't C while the second is.

Operator Precedence vs Order of Evaluation

The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:
Between evaluation of the left and right operands of the && (logical
AND), || (logical OR), and comma
operators. For example, in the
expression *p++ != 0 && *q++ != 0, all
side effects of the sub-expression
*p++ != 0 are completed before any attempt to access q.
Between the evaluation of the first operand of the ternary
"question-mark" operator and the
second or third operand. For example,
in the expression a = (*p++) ? (*p++)
: 0 there is a sequence point after
the first *p++, meaning it has already
been incremented by the time the
second instance is executed.
At the end of a full expression. This category includes expression
statements (such as the assignment
a=b;), return statements, the
controlling expressions of if, switch,
while, or do-while statements, and all
three expressions in a for statement.
Before a function is entered in a function call. The order in which
the arguments are evaluated is not
specified, but this sequence point
means that all of their side effects
are complete before the function is
entered. In the expression f(i++) + g(j++) + h(k++),
f is called with a
parameter of the original value of i,
but i is incremented before entering
the body of f. Similarly, j and k are
updated before entering g and h
respectively. However, it is not
specified in which order f(), g(), h()
are executed, nor in which order i, j,
k are incremented. The values of j and
k in the body of f are therefore
undefined.3 Note that a function
call f(a,b,c) is not a use of the
comma operator and the order of
evaluation for a, b, and c is
unspecified.
At a function return, after the return value is copied into the
calling context. (This sequence point
is only specified in the C++ standard;
it is present only implicitly in
C.)
At the end of an initializer; for example, after the evaluation of 5
in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behaviour is tightly coupled with Sequence Points.
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now its evident that Line 5 will make the variable result store 1.
Now the expression x<y<z in Line 5 can be evaluated as either:
x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know, when the Operator Precedence is Equal/Same - Associativity comes into play, hence, is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article:
It mentions "Expressions with higher-precedence operators are evaluated first."
It may sound incorrect. But, I think the article is not saying something wrong if we consider that () is also an operator x<y<z is same as (x<y)<z. My reasoning is if associativity does not come into play, then the complete expressions evaluation would become ambiguous since < is not a Sequence Point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So taking, the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since, the simplest form of an expression consists of a single literal constant or object. Hence the result of the expressions x, y, z would be there rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.
Now, doesn't Associativity come into play since now we have 2 expressions to be evaluated, either 10<1 or 1<2 and since the precedence of operator is same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );
Now in the above example, since the comma operator has same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In SO/IEC 9899:201x under J.1 Unspecified behavior it mentions:
The order in which subexpressions are evaluated and the order in which side effects
take place, except as specified for the function-call (), &&, ||, ?:, and comma
operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question.
The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Yes, the MSDN article is in error, at least with respect to standard C and C++1.
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:
push(z); // Evaluates its argument and pushes value on stack
push(y);
push(x);
test_less(); // compares TOS to TOS(1), pushes result on stack
test_less();
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:
push(z);
push(y);
push(x);
mul();
add();
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:
push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea is fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:
load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
load variable[i]
add_with_carry 0
store variable[i]
}
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
1 I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.
The only way precedence influences order of evaluation is that it
creates dependencies; otherwise the two are orthogonal. You've
carefully chosen trivial examples where the dependencies created by
precedence do end up fully defining order of evaluation, but this isn't
generally true. And don't forget, either, that many expressions have
two effects: they result in a value, and they have side effects. These
two are no required to occur together, so even when dependencies
force a specific order of evaluation, this is only the order of
evaluation of the values; it has no effect on side effects.
A good way to look at this is to take the expression tree.
If you have an expression, lets say x+y*z you can rewrite that into an expression tree:
Applying the priority and associativity rules:
x + ( y * z )
After applying the priority and associativity rules, you can safely forget about them.
In tree form:
x
+
y
*
z
Now the leaves of this expression are x, y and z. What this means is that you can evaluate x, y and z in any order you want, and also it means that you can evaluate the result of * and x in any order.
Now since these expressions don't have side effects you don't really care. But if they do, the ordering can change the result, and since the ordering can be anything the compiler decides, you have a problem.
Now, sequence points bring a bit of order into this chaos. They effectively cut the tree into sections.
x + y * z, z = 10, x + y * z
after priority and associativity
x + ( y * z ) , z = 10, x + ( y * z)
the tree:
x
+
y
*
z
, ------------
z
=
10
, ------------
x
+
y
*
z
The top part of the tree will be evaluated before the middle, and middle before bottom.
Precedence has nothing to do with order of evaluation and vice-versa.
Precedence rules describe how an underparenthesized expression should be parenthesized when the expression mixes different kinds of operators. For example, multiplication is of higher precedence than addition, so 2 + 3 x 4 is equivalent to 2 + (3 x 4), not (2 + 3) x 4.
Order of evaluation rules describe the order in which each operand in an expression is evaluated.
Take an example
y = ++x || --y;
By operator precedence rule, it will be parenthesize as (++/-- has higher precedence than || which has higher precedence than =):
y = ( (++x) || (--y) )
The order of evaluation of logical OR || states that (C11 6.5.14)
the || operator guarantees left-to-right evaluation.
This means that the left operand, i.e the sub-expression (x++) will be evaluated first. Due to short circuiting behavior; If the first operand compares unequal to 0, the second operand is not evaluated, right operand --y will not be evaluated although it is parenthesize prior than (++x) || (--y).
It mentions "Expressions with higher-precedence operators are evaluated first."
I am just going to repeat what I said here. As far as standard C and C++ are concerned that article is flawed. Precedence only affects which tokens are considered to be the operands of each operator, but it does not affect in any way the order of evaluation.
So, the link only explains how Microsoft implemented things, not how the language itself works.
I think it's only the
a++ + ++a
epxression problematic, because
a = a++ + ++a;
fits first in 3. but then in the 6. rule: complete evaluation before assignment.
So,
a++ + ++a
gets for a=1 fully evaluated to:
1 + 3 // left to right, or
2 + 2 // right to left
The result is the same = 4.
An
a++ * ++a // or
a++ == ++a
would have undefined results. Isn't it?