Is a C++11-compliant compiler allowed to optimize/transform this code from:
bool x = true; // *not* an atomic type, but suppose bool can be read/written atomically
/*...*/
{
while (x); // spins until another thread changes the value of x
}
to anything equivalent to an infinite loop:
{
while (true); // infinite loop
}
The above conversion is certainly valid from the point of view of a single-thread program, but this is not the general case.
Also, was that optimization allowed in pre-C++11?
Absolutely.
Since x is not marked as volatile and appears to be a local object with automatic storage duration and internal linkage, and the program does not modify it, the two programs are equivalent.
In both C++03 and C++11 this is by the as-if rule, since accessing a non-volatile object is not considered to be a "side effect" of the program:
[C++11: 1.9/12]: Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
C++11 does make room for a global object to have its value changed in one thread then that new value read in another:
[C++11: 1.10/3]: The value of an object visible to a thread T at a particular point is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
However, if you're doing this, since your object is not atomic:
[C++11: 1.10/21]: The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
And, when undefined behaviour is invoked, anything can happen.
Bootnote
[C++11: 1.10/25]: An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
Again, note that the object would have to be atomic (say, std::atomic<bool>) to obtain this guarantee.
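For comparison, here is a minimal sketch of the well-defined version of the spin loop, assuming the flag is declared as std::atomic<bool>. The function name spin_until_cleared is made up for illustration and is not from the question.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: spins on an atomic flag that another thread clears.
// Returns true once the loop has observed the cleared flag and exited.
bool spin_until_cleared() {
    std::atomic<bool> x{true};

    // The writer performs an atomic store, so there is no data race.
    std::thread writer([&x] { x.store(false); });

    // Every iteration performs an atomic load; the compiler may not
    // replace this with while (true).
    while (x.load()) {
    }

    writer.join();
    return !x.load();
}
```

With a plain bool in place of the atomic, the same function would contain a data race and the loop could legally be turned into an infinite one.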
The compiler is allowed to do anything to those two loops, including terminating the program, because infinite loops have undefined behaviour if they do not perform a synchronization-like operation (something that requires synchronization with another thread, or I/O), according to the C++ memory model:
Note that it means that a program with endless recursion or endless loop (whether implemented as a for-statement or by looping goto or otherwise) has undefined behavior.
Related
This is something you should never do, but https://timsong-cpp.github.io/cppwp/class.cdtor#4 states:
Member functions, including virtual functions ([class.virtual]), can be called during construction or destruction ([class.base.init]).
Does this hold if the functions are called in parallel? That is, ignoring the race condition, if the A is in the middle of construction, and frobme is called at some point AFTER the constructor is invoked (e.g. during construction), is that still defined behavior?
#include <new>
#include <thread>
struct A {
    void frobme() {}
};
int main() {
    alignas(A) char mem[sizeof(A)];
    // Capture by reference so that both threads use the same storage.
    auto t1 = std::thread([&mem]() { new(mem) A; });
    auto t2 = std::thread([&mem]() { reinterpret_cast<A*>(mem)->frobme(); });
    t1.join();
    t2.join();
}
As a separate scenario, it was also pointed out to me that it's possible for A's constructor to create multiple threads, where those threads may invoke a member function before A has finished construction, but the ordering of those operations would be more analyzable (you know no races will occur until AFTER the thread is generated in the constructor).
There are two issues here: your specific code and your general question.
In your specific code, even in the best-case scenario (where t2 executes after t1), you have a data race due to the lack of synchronization between creation and use. And that makes your code UB regardless of the order of execution.
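As a rough sketch of what synchronizing creation and use could look like, the constructing thread can publish the pointer returned by placement new through an atomic with release/acquire ordering. The names construct_then_use and published, and the value member, are invented for this illustration.

```cpp
#include <atomic>
#include <new>
#include <thread>

struct A {
    int value = 42;
    int frobme() const { return value; }
};

// Hypothetical demo: one thread constructs an A in raw storage and publishes
// the pointer; the other waits for the pointer before calling frobme().
int construct_then_use() {
    alignas(A) unsigned char mem[sizeof(A)];
    std::atomic<A*> published{nullptr};
    int result = 0;

    std::thread t1([&] {
        A* p = new (mem) A;                             // construction completes here
        published.store(p, std::memory_order_release);  // then the pointer is published
    });
    std::thread t2([&] {
        A* p = nullptr;
        while ((p = published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        result = p->frobme();  // happens after construction has finished
    });

    t1.join();
    t2.join();
    published.load()->~A();
    return result;
}
```

Because t2 uses the pointer produced by the new-expression rather than a reinterpret_cast of the buffer, the question of laundering does not come up in this version.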
In the general question, let's assume that the constructor of a type hands the this pointer off to some other thread, which then calls functions on it, and the hand-off itself is properly synchronized. Would some other thread invoking a member function be considered a data race?
Well, it certainly would be a data race if the other thread invokes a function that reads member values or other data written by the constructor subsequent to the point of the hand-off, or if the constructor accesses members or other data written by the member function being invoked. That is, it is a data race if the pieces of code being executed simultaneously conflict with each other.
Assuming that neither of those is the case, then everything should mostly be fine. (It's possible to define A in such a way that your reinterpret_cast doesn't return a usable pointer to the A you created in that storage; you'd need to launder it.) An object under construction/destruction can be accessed, but only in certain ways. Stick to those ways, and you should be fine... with one possible catch.
There's nothing in the standard about data races on the completion of an object's initialization, only on conflicts in memory locations. Once the object is fully constructed, the behavior of virtual functions could change, based on changing vtable pointers and such if the dynamic type is a class derived from the class given to the other thread. I don't believe there's a clear statement about this in the section on the object model.
Also, note that C++20 added a special rule to class.cdtor:
During the construction of an object, if the value of the object or any of its subobjects is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor's this pointer, the value of the object or subobject thus obtained is unspecified.
Besides the race condition (which you might be managing with mutexes or similar), you're subject to the usual limitations on an object whose lifetime has not yet started, namely:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways.
See [basic.life] for the full list of operations that are and are not allowed.
In particular, one of the restrictions is that
The program has undefined behavior if:
...
the glvalue is used to call a non-static member function of the object
which clearly forbids your example.
Also [class.cdtor] says:
For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior
and even if you do synchronize to some event triggered after construction begins, this rule will forbid that code:
During the construction of an object, if the value of the object or any of its subobjects is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor's this pointer, the value of the object or subobject thus obtained is unspecified
Consider this simple code:
void g();
void foo()
{
    volatile bool x = false;
    if (x)
        g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimizes out the potential call to g. This is correct in my understanding: the abstract machine is to assume that volatile variables may change at any moment (e.g. due to being hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.
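A sketch of that non-local variant, for reference. The names debug_enabled and g_calls are hypothetical, and g is given a trivial body only so the effect can be observed; whether a given compiler keeps the branch still depends on how it documents volatile accesses.

```cpp
// Hypothetical flag at namespace scope with external linkage, so it can be
// changed from a debugger or from another translation unit.
volatile bool debug_enabled = false;

int g_calls = 0;          // counts calls to g, for illustration only
void g() { ++g_calls; }

void foo() {
    // Each call performs a volatile read of debug_enabled.
    if (debug_enabled)
        g();
}
```

Flipping debug_enabled to true between two calls of foo makes the second call reach g.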
I think [intro.execution] (paragraph numbers vary) could be used to explain MSVC behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing the compiler to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime. (34)
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.
TL;DR The compiler can do whatever it wants on each volatile access, but the documentation has to tell you what that is: "The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines for a program permitted sequences of "volatile accesses" & other "observable behavior" (achieved via "side-effects") that an implementation must respect per "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract machine side effects & consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis a vis the abstract sequences of abstract machine side effects and/or consequent observable behaviors that some code (maybe) defines then say so. But if you are interested in what corresponding object code is generated then you must interpret that in the context of your compiler & compilation.
People chronically and wrongly believe that, for volatile accesses, an abstract machine evaluation/read causes an implemented read and an abstract machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/iff the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something (maybe the generation of certain object code).
I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only happen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.
In the following code snippet, an interrupt routine uses one of many arrays for its execution. The array used is selected synchronously, not asynchronously (it will never change while the ISR is executing). On a single core microcontroller (this question assumes an STM32L496 if the architecture is important), is the volatile specifier required in the declaration of foo?
int a[] = {1, 2, 3};
int b[] = {4, 5, 6};
int * foo; //int * volatile foo? int volatile * volatile foo?
int main(void){
    disable_interrupt();
    foo = a;
    enable_interrupt();
    ...
    disable_interrupt();
    foo = b;
    enable_interrupt();
}
void interrupt(void){
    //Use foo
}
My assumption is that the volatile specifier is not required because any caching of the value of foo will be correct.
EDIT:
To clarify, the final answer is that volatile or some other synchronisation is required because otherwise writes to foo can be omitted or reordered. Caching is not the only concern.
volatile stops the compiler from optimizing accesses to the variable, forcing the compiler:
To always read the memory, not a cached value from a register
To not move things before or after the volatile read/write
On a complex CPU (e.g. x86), it is still possible for the CPU itself to re-order operations before or after a volatile access.
It is typically used for memory-mapped I/O, where regions of memory are actually devices and can change (even on a single-core CPU) without visible cause.
The mechanism for C++11 is to use std::atomic to change a value which may occur on different threads of execution.
With a single core, the code will safely modify the value and store it. If you use volatile, then it will be written to the memory location before the interrupts are enabled.
If you don't use volatile, then the code may still hold the new value only in a register when the interrupt uses foo.
int * volatile foo;
This declares that foo itself can change, but the values it points to are stable.
int volatile * volatile foo
This declares that foo can change, and the things it points to can also change. I think you want int * volatile foo;
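A small sketch of the int * volatile foo variant, written as C++ with the interrupt routine replaced by an ordinary function so it can be run on a desktop. The names interrupt_sum and select_array are stand-ins, not part of the question, and whether volatile alone is sufficient on a particular toolchain depends on its documented volatile semantics.

```cpp
int a[] = {1, 2, 3};
int b[] = {4, 5, 6};

// The pointer itself is volatile; the ints it points to are not.
int* volatile foo = a;

// Stand-in for the interrupt routine (hypothetical; a real ISR would be
// registered with the hardware rather than called directly).
int interrupt_sum() {
    int* current = foo;  // exactly one volatile read of the pointer
    return current[0] + current[1] + current[2];
}

// Stand-in for the main-line code that switches arrays.
void select_array(int* next) {
    // disable_interrupt() / enable_interrupt() would bracket this store.
    foo = next;  // volatile write to the pointer
}
```

Reading foo once into a local inside the handler keeps the handler working on a single array even if the declaration is later changed.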
Update
For those who doubt that volatile is a compiler barrier.
From the standard n4296
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O
function, or calling a function that does any of those operations are all side effects, which are changes in the
state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes
both value computations (including determining the identity of an object for glvalue evaluation and fetching
a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call
to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered
complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile
access may not have completed yet.
and
From cppreference cv object
volatile object - an object whose type is volatile-qualified, or a subobject of a volatile object, or a mutable subobject of a const-volatile object. Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization (that is, within a single thread of execution, volatile accesses cannot be optimized out or reordered with another visible side effect that is sequenced-before or sequenced-after the volatile access. This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution, see std::memory_order). Any attempt to refer to a volatile object through a non-volatile glvalue (e.g. through a reference or pointer to non-volatile type) results in undefined behavior.
These seem to concur, that there is a compiler barrier, but some of the side effects of interacting with the volatile object may not have completed. For the single core processor, it appears to be a suitable mechanism if C++11 atomics are not available.
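The signal-handler use mentioned in the cppreference quote can be sketched as follows, assuming a hosted implementation where std::raise invokes the handler synchronously. The names got_signal, on_signal and raise_and_check are invented for the example.

```cpp
#include <csignal>

// Flag shared between normal code and the signal handler.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // the handler only touches a volatile sig_atomic_t
}

// Hypothetical demo: installs the handler, raises the signal, reads the flag.
int raise_and_check() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // the handler runs before raise returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal;
}
```

This covers only communication with a signal handler in the same thread; it says nothing about sharing the flag with another thread.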
From the C++ standard (n4296), we have:
Every value computation and side effect associated with a full-expression is sequenced before every value
computation and side effect associated with the next full-expression to be evaluated.
From this I understand, there is a happens-before relationship for any operation with a side-effect.
Access to volatile objects are evaluated strictly according to the rules of the abstract machine
From this I understand that there are rules (which may be opaque).
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O
function, or calling a function that does any of those operations are all side effects, which are changes in the
state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes
both value computations (including determining the identity of an object for glvalue evaluation and fetching
a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call
to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered
complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile
access may not have completed yet.
From this I understand that access to volatile (and a few other things), create a side effect, which stops the compiler from re-ordering statements near a volatile access.
Please see the following code valid in C and C++:
extern int output;
extern int input;
extern int error_flag;
void func(void)
{
    if (0 != error_flag)
    {
        output = -1;
    }
    else
    {
        output = input;
    }
}
Is the compiler allowed to compile the above code in the same way as if it looked like below?
extern int output;
extern int input;
extern int error_flag;
void func(void)
{
    output = -1;
    if (0 == error_flag)
    {
        output = input;
    }
}
In other words, is the compiler allowed to generate (from the first snippet) code that always makes a temporary assignment of -1 to output and then assign input value to output depending on error_flag status?
Would the compiler be allowed to do it if output would be declared as volatile?
Would the compiler be allowed to do it if output would be declared as atomic_int (stdatomic.h)?
Update after David Schwartz's comment:
If the compiler is free to add additional writes to a variable, it seems it is not possible to tell from the C code whether a data race exists or not. How to determine this?
Yes, the speculative assignment is possible. Modification of a non-volatile variable is not part of the observable behaviour of the program and thus a spurious write is allowed. (See below for the definition of "observable behaviour", which does not actually include all behaviour which you might observe.)
No. If output is volatile, speculative or spurious mutations are not permitted because the mutation is part of observable behaviour. (Writing to -- or reading from -- a hardware register may have consequences other than just storing a value. This is one of the primary use cases of volatile.)
(Edited) No, the speculative assignment is not possible with atomic output. Loads and stores of atomic variables are synchronized operations, so it should not be possible to load a value of such a variable which was not explicitly stored into the variable.
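To make the third case concrete, here is a rough C++ sketch (using std::atomic<int> rather than C's atomic_int) in which a second thread polls output while func runs with error_flag equal to 0. The helper reader_saw_minus_one and the initial values are assumptions made for the illustration.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> output{0};
int input = 7;
int error_flag = 0;

void func() {
    if (0 != error_flag) {
        output.store(-1);
    } else {
        output.store(input);
    }
}

// Hypothetical check: a concurrent reader polls output while func() runs
// and reports whether it ever saw the value -1.
bool reader_saw_minus_one() {
    std::atomic<bool> done{false};
    bool saw = false;
    std::thread reader([&] {
        while (!done.load()) {
            if (output.load() == -1) saw = true;
        }
        if (output.load() == -1) saw = true;
    });
    for (int i = 0; i < 1000; ++i) func();
    done.store(true);
    reader.join();  // after the join it is safe to read saw
    return saw;
}
```

Since -1 is never stored when error_flag is 0, the reader can only ever load 0 or 7 here; a speculative store of -1 would be observable and is therefore ruled out for the atomic.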
Observable behaviour
Although a program can do lots of obviously observable things (for example, abruptly terminating because of a segfault), the C and C++ standards only guarantee a limited set of results. Observable behaviour is defined in the C11 draft at §5.1.2.3p6 and in the current C++14 draft at §1.9p8 [intro.execution] with very similar wording:
The least requirements on a conforming implementation are:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program.
The above is taken from the C++ standard; the C standard differs in that in the second point it does not allow multiple possible results, and in the third point it explicitly references a relevant section of the standard library requirements. But details aside, the definitions are co-ordinated; for the purpose of this question, the relevant point is that only access to volatile variables is observable (up to the point that the value of a non-volatile variable is sent to an output device or file).
Data Races
This paragraph needs also to be read in the overall context of the C and C++ standards, which free the implementation from all requirements if the program engenders undefined behaviour. That's why the segfault is not considered in the definition of observable behaviour above: the segfault is a possible undefined behaviour but not a possible behaviour in a conformant program. So in the universe of only conformant programs and conformant implementations, there are no segfaults.
That's important because a program with a data race is not conformant. A data race has undefined behaviour, even if it seems innocuous. And since it is the responsibility of the programmer to avoid undefined behaviour, the implementation may optimize without regard to data races.
The exposition of the memory model in the C and C++ standards is dense and technical, and probably not suitable as an introduction to the concepts. (Browsing around the material on Hans Boehm's site will probably prove less difficult.) Extracting quotes from the standard is risky, because the details are important. But here is a small leap into the morass, from the current C++14 standard, §1.10 [intro.multithread]:
Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.
…
Two actions are potentially concurrent if
— they are performed by different threads, or
— they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The take-away here is that a read and a write of the same variable need to be synchronized; otherwise it is a data race and the result is undefined behaviour. Some programmers might object to the strictness of this prohibition, arguing that some data races are "benign". This is the topic of Hans Boehm's 2011 HotPar paper "How to miscompile programs with "benign" data races" (pdf) (author's summary: "There are no benign data races"), and he explains it all much better than I could.
Synchronization here includes the use of atomic types, so it is not a data race to concurrently read and modify an atomic variable. (The result of the read is unpredictable, but it must be either the value before the modification or the value afterwards.) This prevents the compiler from performing "piecemeal" modification of an atomic variable without some explicit synchronization.
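The "before or after, never in between" property can be probed with a small experiment. This is only a sketch under the assumption of a 64-bit std::atomic; the function name count_torn_reads and the two bit patterns are chosen for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical experiment: a writer alternates between two 64-bit patterns
// while a reader counts any loaded value that is neither of them.
int count_torn_reads() {
    constexpr std::uint64_t kZeros = 0;
    constexpr std::uint64_t kOnes = ~std::uint64_t{0};
    std::atomic<std::uint64_t> value{kZeros};
    std::atomic<bool> done{false};
    int torn = 0;

    std::thread reader([&] {
        while (!done.load()) {
            std::uint64_t v = value.load();
            if (v != kZeros && v != kOnes) ++torn;
        }
    });
    for (int i = 0; i < 100000; ++i) {
        value.store((i % 2 == 0) ? kOnes : kZeros);
    }
    done.store(true);
    reader.join();
    return torn;
}
```

With a plain std::uint64_t instead of the atomic, the program would contain a data race and no result could be relied upon.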
After some thought and more research, my conclusion is that the compiler cannot perform speculative writes to atomic variables either. Consequently, I modified the answer to question 3, which I had originally answered "yes".
Other useful references:
Bartosz Milewski: Dealing with Benign Data Races the C++ Way
Milewski deals with the precise issue of speculative writes to atomic variables, and concludes:
Can’t the compiler still do the same dirty trick, and momentarily store 42 in the owner variable? No, it can’t! Since the variable is declared atomic the compiler can no longer assume that the write can’t be observed by other threads.
Herb Sutter on Thread Safety and Synchronization
As usual, an accessible and well-written explanation.
Yes, the compiler is allowed to do that kind of optimization. In general, you can assume that the compiler (and the CPU too) can reorder your code assuming it is running in a single thread. If you have more than one thread, you need to synchronize. If you don't synchronize and your code writes to a memory location that is written to or read by another thread, your code contains a data race, which in C++ is undefined behavior.
volatile doesn't change the data race problem. However IIRC, the compiler is not allowed to reorder reads and writes to a volatile variable.
When using atomic_int, the compiler can still perform certain optimizations. I don't think that the compiler can invent writes though (that could break a multithreaded program). However, it can still reorder operations, so be careful.
Is it a standard term which is well defined, or just a term coined by developers to explain a concept (.. and what is the concept)? As I understand this has something to do with the all-confusing sequence points, but am not sure.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
For reference, some questions talking about side effects:
Is comma operator free from side effect?
Force compiler to not optimize side-effect-less statements
Side effects when passing objects to function in C++
A "side effect" is defined by the C++ standard in [intro.execution], by:
Reading an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
The term "side-effect" arises from the distinction between imperative languages and pure functional languages. A C++ expression can do three things:
1. compute a result (or compute "no result" in the case of a void expression),
2. raise an exception instead of evaluating to a result,
3. in addition to 1 or 2, otherwise alter the state of the abstract machine on which the program is nominally running.
(3) are side-effects, the "main effect" being to evaluate the result of the expression. Exceptions are a slightly awkward special case, in that altering the flow of control does change the state of the abstract machine (by changing the current point of execution), but isn't a side-effect. The code to construct, handle and destroy the exception may have its own side-effects, of course.
The same principles apply to functions, with the return value in place of the result of the expression.
So, int foo(int a, int b) { return a + b; } just computes a return value, it doesn't alter anything else. Therefore it has no side-effects, which sometimes is an interesting property of a function when it comes to reasoning about your program (e.g. to prove that it is correct, or by the compiler when it optimizes). int bar(int &a, int &b) { return ++a + b; } does have a side-effect, since modifying the caller's object a is an additional effect of the function beyond simply computing a return value. It would not be permitted in a pure functional language.
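Spelled out as compilable code, with one extra helper (bar_changes_caller_state, a name made up here) that shows the effect from the caller's side:

```cpp
// No side effects: only computes a return value.
int foo(int a, int b) { return a + b; }

// Side effect: modifies the caller's object a.
int bar(int& a, int& b) { return ++a + b; }

// Hypothetical helper: reports whether calling bar changed the caller's a.
bool bar_changes_caller_state() {
    int a = 1, b = 2;
    int before = a;
    bar(a, b);
    return a != before;
}
```

Calling foo(1, 2) any number of times leaves the program state as it was, whereas each call to bar leaves a one larger than before.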
The stuff in your quote about "has finished being evaluated" refers to the fact that the result of an expression (or return value of a function) can be a "temporary object", which is destroyed at the end of the full expression in which it occurs. So creating a temporary isn't a "side-effect" by that definition: other changes are.
What exactly is a 'side-effect' in C++? Is it a standard term which is well defined...
c++11 draft - 1.9.12: Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
The significance is that, as expressions are being evaluated they can modify the program state and/or perform I/O. Expressions are allowed in myriad places in C++: variable assignments, if/else/while conditions, for loop setup/test/modify steps, function parameters etc.... A couple examples: ++x and strcat(buffer, "append this").
In a C++ program, the Standard grants the optimiser the right to generate code representing the program operations, but requires that all the operations associated with steps before a sequence point appear before any operations related to steps after the sequence point.
The reason C++ programmers tend to have to care about sequence points and side effects is that there aren't as many sequence points as you might expect. For example: given x = 1; f(++x, ++x);, you may expect a call to f(2, 3) but it's actually undefined behaviour. This behaviour is left undefined so the compiler's optimiser has more freedom to arrange operations with side effects to run in the most efficient order possible - perhaps even in parallel. It also avoids burdening compiler writers with detecting such conditions.
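If f(2, 3) is what you want, one way to get it regardless of how the compiler orders argument evaluation is to put each increment in its own statement. The body of f below is invented purely so the arguments can be inspected.

```cpp
// Hypothetical f: encodes both arguments so the caller can see what it got.
int f(int first, int second) { return first * 10 + second; }

int call_with_sequenced_increments() {
    int x = 1;
    int first = ++x;   // x becomes 2; complete before the next statement
    int second = ++x;  // x becomes 3
    return f(first, second);  // always f(2, 3)
}
```

Each full-expression finishes, side effects included, before the next one starts, so the two increments can no longer interfere with each other.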
1.Is comma operator free from side effect?
Yes - a comma operator introduces a sequence point: the steps on the left must be complete before those on the right execute. There is a list of sequence points at http://en.wikipedia.org/wiki/Sequence_point - you should read this! (If you have to ask about side effects, then be careful in interpreting this answer - the "comma operator" is NOT invoked between function arguments, array initialisation elements etc. The comma operator is relatively rarely used and somewhat obscure. Do some reading if you're not sure what the comma operator really is.)
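A short sketch of the difference between the comma operator and the commas that merely separate function arguments. The function names are illustrative only.

```cpp
int sum3(int a, int b, int c) { return a + b + c; }

int comma_operator_demo() {
    int x = 0;
    // Parenthesised: this comma is the comma operator. The assignment on the
    // left completes before x + 1 on the right is evaluated.
    int y = (x = 5, x + 1);
    return y;
}

int argument_separator_demo() {
    // These commas only separate arguments; no comma operator is involved.
    return sum3(1, 2, 3);
}
```

In the first function the value of the whole parenthesised expression is the value of its right-hand operand.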
2.Force compiler to not optimize side-effect-less statements
I assume you mean "side-effect-ful" statements. Compilers are not obliged to support any such option. What behaviour would they exhibit if they tried? - the Standard doesn't define what they should do in such situations. Sometimes a majority of programmers might share an intuitive expectation, but other times it's really arbitrary.
3.Side effects when passing objects to function in C++
When calling a function, all the parameters must have been completely evaluated - and their side effects triggered - before the function call takes place. BUT, there are no restrictions on the compiler related to evaluating specific parameter expressions before any other. They can be overlapping, in parallel etc. So, in f(expr1, expr2) - some of the steps in evaluating expr2 might run before anything from expr1, but expr1 might still complete first - the order is unspecified.
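One way to see this is to have each argument expression record that it ran. In the sketch below (all names hypothetical) both recordings are guaranteed to be present when the call is made, but the program must not rely on which one comes first.

```cpp
#include <vector>

std::vector<int> order;  // records the order in which argument expressions ran

int note(int id) {
    order.push_back(id);
    return id;
}

int add(int a, int b) { return a + b; }

int evaluate_arguments() {
    order.clear();
    // note(1) and note(2) may run in either order, but both have finished
    // before the body of add starts executing.
    return add(note(1), note(2));
}
```

Different compilers, or the same compiler with different options, may legitimately produce either recording order.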
1.9.6
The observable behavior of the abstract machine is its sequence of
reads and writes to volatile data and calls to library I/O
functions.
A side-effect is anything that affects observable behavior.
Note that there are exceptions specified by the standard, where observable behavior doesn't have to conform to that of the abstract machine - see return value optimization, temporary copy elision.
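Copy elision can be observed with a copy constructor that has a side effect. This is a sketch with made-up names (Tracked, make, count_copies); the count it reports depends on the compiler, the language version and the optimisation settings.

```cpp
int copies = 0;  // incremented by the copy constructor (a side effect)

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
};

Tracked make() {
    Tracked t;
    return t;  // this copy may be elided (named return value optimization)
}

int count_copies() {
    copies = 0;
    Tracked result = make();
    (void)result;
    return copies;  // 0 when every copy is elided; may be larger otherwise
}
```

The standard permits the implementation to skip these copy constructor calls even though skipping them changes how often the counter is incremented.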