C++ switch statement expression evaluation guarantee - c++

Regarding switch the standard states the following. "When the switch statement is executed, its condition is evaluated and compared with each case constant."
Does it mean that the condition expression evaluated once and once only, and it is guaranteed by the standard for each compiler?
For example, when a function is used in the switch statement head, with a side effect.
int f() { ... }
switch (f())
{
case ...;
case ...;
}

I think it is guaranteed that f is only called once.
First we have
The condition shall be of integral type, enumeration type, or class type.
[6.4.2 (1)] (the non-integral stuff does not apply here), and
The value of a condition that is an expression is the value of the
expression
[6.4 (4)]. Furthermore,
The value of the condition will be referred to as simply “the condition” where the
usage is unambiguous.
[6.4 (4)] That means in our case, the "condition" is just a plain value of type int, not f. f is only used to find the value for the condition. Now when control reaches the switch statement
its condition is evaluated
[6.4.2 (5)], i.e. we use the value of the int that is returned by f as our "condition". Then finally the condition (which is a value of type int, not f), is
compared with each case constant
[6.4.2 (5)]. This will not trigger side effects from f again.
All quotes from N3797. (Also checked N4140, no difference)

Reading N4296
Page 10 para 14:
Every value computation and side effect associated with a full-expression is sequenced before every value
computation and side effect associated with the next full-expression to be evaluated.
When I read the first line of para. 10 (above that):
A full-expression is an expression that is not a sub-expression of
another expression.
I have to believe that the condition of a switch statement is a full-expression and each condition expression is a full expression (albeit trivial at execution).
A switch is a statement not an expression (see 6.4.2 and many other places).
So by that reading the evaluation of the switch must take place before the evaluation of the case constants.
As ever many points boil down to tortuous reading of the specification to come to an obvious conclusion.
If I peer reviewed that sentence I would propose the following amendment (in bold):
When the switch statement is executed, its condition is evaluated
once per execution of the switch statement and compared with each case constant.

Yes the expression is evaluated only once when the switch statement is executed:
§ 6.4 Selection statements
4 [...] The value of a condition that is an expression is the value of the
expression [...] The value of the condition will be referred to as simply “the condition” where the usage is unambiguous.
This means that the expression is evaluated and its value is considered the condition to be evaluated against each case statement.

Section 6.4.4:
...The value of a condition that is an expression is the value of the
expression, contextually converted to bool for statements other than
switch;...The value of the condition will be referred to as simply “the condition” where the
usage is unambiguous
In my understanding, the quote above is equivalent to the following pseudo-code:
switchCondition := evaluate(expression)
Now add your quote
...its condition is evaluated and compared with each case constant.
Which should be translated to:
foreach case in cases
if case.constant == switchCondition
goto case.block
So yeah, it looks like this is the case.

Does this code print hello once or twice?
int main() {
printf("hello\n");
}
Well, I think the answer is in the more general understanding of what the standard describes rather than in the specific switch statement wording.
As per Program execution [intro.execution] the standard describes the behaviour of some abstract machine that executes the program parsed according to the C++ grammar. It does not really define what 'abstract machine' or 'executes' mean, but they are assumed to mean their obvious computer science concepts, i.e. a computer that goes through the abstract syntax tree and evaluates every part of it according to the semantics described by the standard. This implies that if you wrote something once, then when the execution gets to that point, it is evaluated only once.
The more relevant question is "when the implementation may evaluate something not the way written in the program"? For this there is the as-if rule and a bunch of undefined behaviours which permit the implementation to deviate from this abstract interpretation.

This issue was clarified for C++ '20 making it clear that the condition is evaluated once:
When the switch statement is executed, its condition is evaluated. If one of the case constants has the same value as the condition, control is passed to the statement following the matched case label.
The commit message for the change acknowledges that it was potentially confusing before:
[stmt.switch] Clarify comparison for case labels

The expression is guaranteed that is evaluated only once by the flow of control. This is justified in the standard N4431 §6.4.2/6 The switch statement [stmt.switch] (Emphasis mine):
case and default labels in themselves do not alter the flow of
control, which continues unimpeded across such labels. To exit from a
switch, see break, 6.6.1. [ Note: Usually, the substatement that is
the subject of a switch is compound and case and default labels appear
on the top-level statements contained within the (compound)
substatement, but this is not required. Declarations can appear in the
substatement of a switch-statement. — end note ]

Related

What is this "for" syntax called? [duplicate]

I have seen some very weird for loops when reading other people's code. I have been trying to search for a full syntax explanation for the for loop in C but it is very hard because the word "for" appears in unrelated sentences making the search almost impossible to Google effectively.
This question came to my mind after reading this thread which made me curious again.
The for here:
for(p=0;p+=(a&1)*b,a!=1;a>>=1,b<<=1);
In the middle condition there is a comma separating the two pieces of code, what does this comma do? The comma on the right side I understand as it makes both a>>=1 and b<<=1.
But within a loop exit condition, what happens? Does it exit when p==0, when a==1 or when both happen?
It would be great if anyone could help me understand this and maybe point me in the direction of a full for loop syntax description.
The comma is not exclusive of for loops; it is the comma operator.
x = (a, b);
will do first a, then b, then set x to the value of b.
The for syntax is:
for (init; condition; increment)
...
Which is somewhat (ignoring continue and break for now) equivalent to:
init;
while (condition) {
...
increment;
}
So your for loop example is (again ignoring continue and break) equivalent to
p=0;
while (p+=(a&1)*b,a!=1) {
...
a>>=1,b<<=1;
}
Which acts as if it were (again ignoring continue and break):
p=0;
while (true) {
p+=(a&1)*b;
if (a == 1) break;
...
a>>=1;
b<<=1;
}
Two extra details of the for loop which were not in the simplified conversion to a while loop above:
If the condition is omitted, it is always true (resulting in an infinite loop unless a break, goto, or something else breaks the loop).
A continue acts as if it were a goto to a label just before the increment, unlike a continue in the while loop which would skip the increment.
Also, an important detail about the comma operator: it is a sequence point, like && and || (which is why I can split it in separate statements and keep its meaning intact).
Changes in C99
The C99 standard introduces a couple of nuances not mentioned earlier in this explanation (which is very good for C89/C90).
First, all loops are blocks in their own right. Effectively,
for (...) { ... }
is itself wrapped in a pair of braces
{
for (...) { ... }
}
The standard sayeth:
ISO/IEC 9899:1999 §6.8.5 Iteration statements
¶5 An iteration statement is a block whose scope is a strict subset of the scope of its
enclosing block. The loop body is also a block whose scope is a strict subset of the scope
of the iteration statement.
This is also described in the Rationale in terms of the extra set of braces.
Secondly, the init portion in C99 can be a (single) declaration, as in
for (int i = 0; i < sizeof(something); i++) { ... }
Now the 'block wrapped around the loop' comes into its own; it explains why the variable i cannot be accessed outside the loop. You can declare more than one variable, but they must all be of the same type:
for (int i = 0, j = sizeof(something); i < j; i++, j--) { ... }
The standard sayeth:
ISO/IEC 9899:1999 §6.8.5.3 The for statement
The statement
for ( clause-1 ; expression-2 ; expression-3 ) statement
behaves as follows: The expression expression-2 is the controlling expression that is
evaluated before each execution of the loop body. The expression expression-3 is
evaluated as a void expression after each execution of the loop body. If clause-1 is a
declaration, the scope of any variables it declares is the remainder of the declaration and
the entire loop, including the other two expressions; it is reached in the order of execution
before the first evaluation of the controlling expression. If clause-1 is an expression, it is
evaluated as a void expression before the first evaluation of the controlling expression.133)
Both clause-1 and expression-3 can be omitted. An omitted expression-2 is replaced by a
nonzero constant.
133) Thus, clause-1 specifies initialization for the loop, possibly declaring one or more variables for use in
the loop; the controlling expression, expression-2, specifies an evaluation made before each iteration,
such that execution of the loop continues until the expression compares equal to 0; and expression-3
specifies an operation (such as incrementing) that is performed after each iteration.
The comma simply separates two expressions and is valid anywhere in C where a normal expression is allowed. These are executed in order from left to right. The value of the rightmost expression is the value of the overall expression.
for loops consist of three parts, any of which may also be empty; one (the first) is executed at the beginning, and one (the third) at the end of each iteration. These parts usually initialize and increment a counter, respectively; but they may do anything.
The second part is a test that is executed at the beginning of each execution. If the test yields false, the loop is aborted. That's all there is to it.
The C style for loop consists of three expressions:
for (initializer; condition; counter) statement_or_statement_block;
The initializer runs once, when the loop starts.
The condition is checked before each iteration. The loop runs as long it evaluates to true.
The counter runs once after each iteration.
Each of these parts can be an expression valid in the language you write the loop in. That means they can be used more creatively. Anything you want to do beforehand can go into the initializer, anything you want to do in between can go into the condition or the counter, up to the point where the loop has no body anymore.
To achieve that, the comma operator comes in very handy. It allows you to chain expressions together to form a single new expression. Most of the time it is used that way in a for loop, the other implications of the comma operator (e.g. value assignment considerations) play a minor role.
Even though you can do clever things by using syntax creatively - I would stay clear of it until I find a really good reason to do so. Playing code golf with for loops makes code harder to read and understand (and maintain).
The wikipedia has a nice article on the for loop as well.
Everything is optional in a for loop. We can initialize more than one variable, we can check for more than one condition, we can iterate more than one variable using the comma operator.
The following for loop will take you into an infinite loop. Be careful by checking the condition.
for(;;)
Konrad mentioned the key point that I'd like to repeat: The value of the rightmost expression is the value of the overall expression.
A Gnu compiler stated this warning when I put two tests in the "condition" section of the for loop
warning: left-hand operand of comma expression has no effect
What I really intended for the "condition" was two tests with an "&&" between. Per Konrad's statement, only the test on to the right of the comma would affect the condition.
the for loop is execution for particular time for(;;)
the syntex for for loop
for(;;)
OR
for (initializer; condition; counter)
e.g (rmv=1;rmv<=15;rmv++)
execution to 15 times in for block
1.first initializ the value because start the value
(e.g)rmv=1 or rmv=2
2.second statement is test the condition is true or false ,the condition true no.of time execution the for loop and the condition is false terminate for block,
e.g i=5;i<=10 the condition is true
i=10;i<10 the condition is false terminate for block,
3.third is increment or decrement
(e.g)rmv++ or ++rmv

What does the parenthesis operator does on its own in C++

During writing some code I had a typo that lead to unexpected compilation results and caused me to play and test what would be acceptable by the compiler (VS 2010).
I wrote an expression consisting of only the parenthesis operator with a number in it (empty parenthesis give a compilation error):
(444);
When I ran the code in debug mode, it seems that the line is simply skipped by the program. What is the meaning of the parenthesis operator when it appears by itself?
If I can answer informally,
(444);
is a statement. It can be written wherever the language allows you to write a statement, such as in a function. It consists of an expression 444, enclosed in parentheses (which is also an expression) followed by the statement terminator ;.
Of course, any sane compiler operating in accordance with the as-if rule, will remove it during compilation.
One place where at least one statement is required is in a switch block (even if program control never reaches that point):
switch (1){
case 0:
; // Removing this statement causes a compilation error
}
(444); is a statement consisting of a parenthesized expression (444) and a statement terminator ;
(444) consists of parentheses () and a prvalue expression 444
A parenthesized expression (E) is a primary expression whose type, value, and value category are identical to those of E. The parenthesized expression can be used in exactly the same contexts as those where E can be used, and with the same meaning, except as otherwise indicated.
So in this particular case, parentheses have no additional significance,
so (444); becomes 444; which is then optimized out by the compiler.

Do parentheses force order of evaluation and make an undefined expression defined?

I was just going though my text book when I came across this question:
What would be the value of a after the following expression? Assume the initial value of a = 5. Mention the steps.
a+=(a++)+(++a)
At first I thought this is undefined behaviour because a has been modified more than once. Then I read the question and it said "Mention the steps" so I probably thought this question is right.
Does applying parentheses make an undefined behaviour defined?
Is a sequence point created after evaluating a parentheses expression?
If it is defined,how do the parentheses matter since ++ and () have the same precedence?
No, applying parentheses doesn't make it a defined behaviour. It's still undefined. The C99 standard §6.5 ¶2 says
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored.
Putting a sub-expression in parentheses may force the order of evaluation of sub-expressions but it does not create a sequence point. Therefore, it does not guarantee when the side effects of the sub-expressions, if they produce any, will take place. Quoting the C99 standard again §5.1.2.3¶2
Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and no side
effects of subsequent evaluations shall have taken place.
For the sake of completeness, following are sequence points laid down by the C99 standard in Annex C.
The call to a function, after the arguments have been evaluated.
The end of the first operand of the following operators: logical AND &&; logical OR ||; conditional ?; comma ,.
The end of a full declarator.
The end of a full expression; the expression in an expression statement; the controlling expression of a selection statement (if
or switch); the controlling expression of a while or do
statement; each of the expressions of a for statement; the
expression in a return statement.
Immediately before a library function returns.
After the actions associated with each formatted input/output function conversion specifier.
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any
movement of the objects passed as arguments to that call.
Adding parenthesis does not create a sequence point and in the more modern standards it does not create a sequenced before relationship with respect to side effects which is the problem with the expression that you have unless noted the rest of this will be with respect to C++11. Parenthesis are a primary expression covered in section 5.1 Primary expressions, which has the following grammar (emphasis mine going forward):
primary-expression:
literal
this
( expression )
[...]
and in paragraph 6 it says:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. The presence of parentheses does not affect whether the expression is an lvalue. The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
The postfix ++ is problematic since we can not determine when the side effect of updating a will happen pre C++11 and in C this applies to both the postfix ++ and prefix ++ operations. With respect to how undefined behavior changed for prefix ++ in C++11 see Assignment operator sequencing in C11 expressions.
The += operation is problematic since:
[...]E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is
evaluated only once[...]
So in C++11 the following went from undefined to defined:
a = ++a + 1 ;
but this remains undefined:
a = a++ + 1 ;
and both of the above are undefined pre C++11 and in both C99 and C11.
From the draft C++11 standard section 1.9 Program execution paragraph 15 says:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

C++ ternary operator execution conditions

I am unsure about the guarantees of execution for the C / C++ ternary operator.
For instance if I am given an address and a boolean that tells if that address is good for reading I can easily avoid bad reads using if/else:
int foo(const bool addressGood, const int* ptr) {
if (addressGood) { return ptr[0]; }
else { return 0; }
}
However can a ternary operator (?:) guarantee that ptr won't be accessed unless addressGood is true? Or could an optimizing compiler generate code that accesses ptr in any case (possibly crashing the program), stores the value in an intermediate register and use conditional assignment to implement the ternary operator?
int foo(const bool addressGood, const int* ptr) {
// Not sure about ptr access conditions here.
return (addressGood) ? ptr[0] : 0;
}
Thanks.
Yes, the standard guarantees that ptr is only accessed if addressGood is true. See this answer on the subject, which quotes the standard:
Conditional expressions group right-to-left. The first expression is contextually converted to bool (Clause 4). It is evaluated and if it is true, the result of the conditional expression is the value of the second expression, otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second or third expression.
(C++11 standard, paragraph 5.16/1)
I would say, in addition to the answer that "yes, it's guaranteed by the C++ standard":
Please use the first form. It's MUCH clearer what you are trying to achieve.
I can almost guarantee that any sane compiler (with minimal amount of optimisation) generates exactly the same code for both examples anyway.
So whilst it's useful to know that both of these forms achieve the same "protection", it is definitely preferred to use the form that is most readable.
It also means you don't need to write a comment explaining that it is safe because of paragraph such and such in the C++ standard, thus making both take up the same amount of code-space - because if you didn't know it before, then you can rely on someone else ALSO not knowing that this is safe, and then spending the next half hour finding the answer via google, and either running into this thread, or asking the question again!
The conditional (ternary) operator guarantees to only evaluate the second operand if the first operand compares unequal to 0, and only evaluate the third operand if the first operand compares equal to 0. This means that your code is safe.
There is also a sequence point after the evaluation of the first operand.
By the way, you don't need the parantheses - addressGood ? ptr[0] : 0 is fine too.
c++11/[expr.cond]/1
Conditional expressions group right-to-left. The first expression is
contextually converted to bool (Clause 4).
It is evaluated and if it is true, the result of the conditional expression is the value of the second expression,
otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value
computation and side effect associated with the first expression is sequenced before every value computation
and side effect associated with the second or third expression.

Why can the condition of a for-loop be left empty? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
No loop condition in for and while loop
Why is the condition in a for-loop allowed to be left empty? For example
for (;;)
compiles fine. Why does this empty expression evaluate to true but the following
if () {}
while () {}
will both fail? I want to know if/why the for-loop is an exception to this case.
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
These are the iteration statements
while ( expression ) statement
do statement while ( expression ) ;
for ( expression [opt] ; expression [opt] ; expression [opt] ) statement
for ( declaration expression [opt] ; expression [opt] ) statement
The while loop was designed to evaluate the controlling expression before each execution of the loop and the do loop was designed to evaluate after each execution.
The for loop was designed as a more sophisticated iteration statement.
6.8.5.3 The for statement
The statement
for ( clause-1 ; expression-2 ; expression-3 ) statement
behaves as follows: The
expression expression-2 is the controlling expression that is
evaluated before each execution of the loop body. The expression
expression-3 is evaluated as a void expression after each execution of
the loop body. If clause-1 is a declaration, the scope of any
identifiers it declares is the remainder of the declaration and the
entire loop, including the other two expressions; it is reached in the
order of execution before the first evaluation of the controlling
expression. If clause-1 is an expression, it is evaluated as a void
expression before the first evaluation of the controlling expression.
Both clause-1 and expression-3 can be omitted. An omitted
expression-2 is replaced by a nonzero constant.
The specification allows expression-2, the condition of the loop, to be omitted and is replaced by a nonzero constant. This means that the for loop will continue to execute indefinitely.
This is useful for allowing a simple syntax for iterating with no end.
for(int i = 0;;i++) { //do stuff with i }
That's much simpler to write and understand than writing a while(1) loop with a variable declared outside the loop and then incremented inside the loop.
The for loop specification then goes on to allow you to omit clause-1 so that you can declare or initialize variables elsewhere, and you can omit expression-3 so that you are not required to evaluate any expression upon completion of each loop.
The for loop is a special case. The while loop and the do loop are strict and require an expression, but the for loop is a flexible iteration statement.
I can't give a definitive answer, but my guess would be that it's because in the for loop case, there are three different expressions, each of which you may or may not need to use depending on the loop.
In the if case, it would make no sense if there were no expression; it would need to always act as if the expression were true or false, and thus be equivalent to just one of the clauses alone. In the while case, there would be a meaningful interpretation of while () { }, which would be to evaluate to while (1) { } (giving you a loop that you can break out of with break), but I guess that the benefits of leaving out that single character are not worth the trouble.
However, in the for loop case, there are three different expressions, each of which may or may not be needed. For instance, if you want to initialize and increment something on every loop, but will use break to break out of the loop, you can write for (i = 0; ; ++i), or if you just want the test and increment, because your variable is already initialized, you could write for (; i > 0; --i). Given that each of these expressions may not be necessary depending on your loop, making you fill in a placeholder for all that you do not use seems a bit cumbersome; and for consistency, rather than requiring a placeholder in one of them, all of them are optional, and the condition is considered to be a constant non-zero value if omitted.
Of course, it is sometimes easy to read too much intent into a design decision. Sometimes, the reason for a given standard is simply that that's how it was implemented in the first implementation, and everyone else just copied that. For an example, see Rob Pike's explanation of why files starting with . are hidden in Unix; it wasn't due to a deliberate design decision, but simply because someone took a shortcut when writing ls and didn't want to display the . and .. directory entries every time.