While writing some code, I made a typo that led to unexpected compilation results and prompted me to experiment with what the compiler (VS 2010) would accept.
I wrote an expression consisting of only the parenthesis operator with a number in it (empty parentheses give a compilation error):
(444);
When I ran the code in debug mode, it seems that the line is simply skipped by the program. What is the meaning of the parenthesis operator when it appears by itself?
If I can answer informally,
(444);
is a statement. It can be written wherever the language allows you to write a statement, such as in a function. It consists of an expression 444, enclosed in parentheses (which is also an expression) followed by the statement terminator ;.
Of course, any sane compiler operating in accordance with the as-if rule will remove it during compilation.
One place where at least one statement is required is in a switch block (even if program control never reaches that point):
switch (1){
case 0:
; // Removing this statement causes a compilation error
}
(444); is a statement consisting of a parenthesized expression (444) and a statement terminator ;
(444) consists of parentheses () and a prvalue expression 444
A parenthesized expression (E) is a primary expression whose type, value, and value category are identical to those of E. The parenthesized expression can be used in exactly the same contexts as those where E can be used, and with the same meaning, except as otherwise indicated.
So in this particular case, parentheses have no additional significance,
so (444); becomes 444; which is then optimized out by the compiler.
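A minimal sketch of the point above, assuming a C++11 compiler. The function name `run_statement` is hypothetical and only serves the illustration. The static assertions check that `(444)` has the same type and value as `444`, and the function shows that the statement leaves nothing observable behind.

```cpp
#include <cassert>
#include <type_traits>

// (444) has the same type and value as 444.
static_assert(std::is_same<decltype((444)), decltype(444)>::value,
              "parentheses do not change the type of a literal");
static_assert((444) == 444, "parentheses do not change the value");

// Hypothetical helper: the expression statement contributes nothing.
int run_statement() {
    int x = 1;
    (444);   // evaluated, result discarded; compilers typically warn here
    return x;
}

void demo_parenthesized() {
    assert(run_statement() == 1);
}
```

Note that the "except as otherwise indicated" clause matters for names: `decltype((x))` for a variable `x` is a reference type, which is why the sketch sticks to a literal.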
I was working on this answer. And I ran into a conundrum: scanf has an assignment suppressing '*':
If this option is present, the function does not assign the result of the conversion to any receiving argument
But when used in get_time the '*' gives a run-time error on Visual Studio, libc++, and libstdc++: str >> get_time(&tmbuf, "%T.%*Y") so I believe it's not supported.
As such I chose to ignore input by reading into tmbuf.tm_year twice:
str >> get_time(&tmbuf, "%H:%M:%S.%Y UTC %b %d %Y");
This works and seems to be my only option so far as get_time goes since the '*' isn't accepted. But as we all know, just because it works doesn't mean it's defined. Can someone confirm that:
It is defined to assign the same variable twice in get_time
The stream will always be read left-to-right, so the 1st occurrence of %Y will be stomped, not the 2nd
The standard specifies the exact algorithm of processing the format string of get_time in 22.4.5.1.1 time_get members. (time_get::get is what eventually gets called when you do str>>get_time(...)). I quote the important parts:
The function starts by evaluating err = ios_base::goodbit. It then enters a loop, reading zero or more characters from s at each iteration. Unless otherwise specified below, the loop terminates when the first of the following conditions holds:
(8.1) — The expression fmt == fmtend evaluates to true.
skip boring error-handling parts
(8.4) — The next element of fmt is equal to ’%’, optionally followed by a modifier character, followed by a conversion specifier character, format, together forming a conversion specification valid for the ISO/IEC 9945 function strptime. skip boring error-handling parts the function evaluates s = do_get(s, end, f, err, t, format, modifier) skip more boring error-handling parts the function increments fmt to point just past the end of the conversion specification and continues looping.
As can be seen from the description, the format string is processed strictly sequentially, left to right. There's no provision to handle repeated conversion specifications specially. So the answer must be yes: what you have done is well defined and perfectly legal.
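A small sketch of the repeated-%Y approach, for illustration only. The helper name `parse_timestamp` and the sample input string are assumptions, not taken from the question; the format string is the one quoted above.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: the field after the seconds is swallowed by a
// first %Y, and the trailing %Y then overwrites tm_year.
bool parse_timestamp(const std::string& text, std::tm& out) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());   // %b expects English month names
    out = std::tm();
    in >> std::get_time(&out, "%H:%M:%S.%Y UTC %b %d %Y");
    return !in.fail();
}

void demo_parse() {
    std::tm tm_buf;
    // Assumed sample input; "1234" stands in for the ignored field.
    bool ok = parse_timestamp("12:34:56.1234 UTC Jan 05 2015", tm_buf);
    assert(ok);
    assert(tm_buf.tm_hour == 12);
    assert(tm_buf.tm_year == 2015 - 1900);  // the second %Y wins
}
```

Because the format is consumed left to right, the value stored by the first %Y is simply replaced when the second one is processed.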
Regarding switch the standard states the following. "When the switch statement is executed, its condition is evaluated and compared with each case constant."
Does it mean that the condition expression is evaluated once and only once, and that this is guaranteed by the standard for every compiler?
For example, when a function is used in the switch statement head, with a side effect.
int f() { ... }
switch (f())
{
case ...:
case ...:
}
I think it is guaranteed that f is only called once.
First we have
The condition shall be of integral type, enumeration type, or class type.
[6.4.2 (1)] (the non-integral stuff does not apply here), and
The value of a condition that is an expression is the value of the
expression
[6.4 (4)]. Furthermore,
The value of the condition will be referred to as simply “the condition” where the
usage is unambiguous.
[6.4 (4)] That means in our case, the "condition" is just a plain value of type int, not f. f is only used to find the value for the condition. Now when control reaches the switch statement
its condition is evaluated
[6.4.2 (5)], i.e. we use the value of the int that is returned by f as our "condition". Then finally the condition (which is a value of type int, not f), is
compared with each case constant
[6.4.2 (5)]. This will not trigger side effects from f again.
All quotes from N3797. (Also checked N4140, no difference)
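The reasoning can be checked with a short sketch. The names `f`, `dispatch`, and `call_count` are hypothetical; the counter records how often the condition expression actually runs.

```cpp
#include <cassert>

static int call_count = 0;   // counts how often f() runs

// Hypothetical function with a side effect.
int f() {
    ++call_count;
    return 3;
}

// Returns a value identifying which case was selected.
int dispatch() {
    switch (f()) {
    case 1: return 10;
    case 2: return 20;
    case 3: return 30;
    default: return -1;
    }
}

void demo_single_evaluation() {
    call_count = 0;
    assert(dispatch() == 30);
    assert(call_count == 1);   // f() ran once despite three case labels
}
```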
Reading N4296
Page 10 para 14:
Every value computation and side effect associated with a full-expression is sequenced before every value
computation and side effect associated with the next full-expression to be evaluated.
When I read the first line of para. 10 (above that):
A full-expression is an expression that is not a sub-expression of
another expression.
I have to believe that the condition of a switch statement is a full-expression and each case constant expression is a full-expression (albeit trivial at execution).
A switch is a statement not an expression (see 6.4.2 and many other places).
So by that reading the evaluation of the switch must take place before the evaluation of the case constants.
As ever many points boil down to tortuous reading of the specification to come to an obvious conclusion.
If I were peer reviewing that sentence, I would propose the following amendment (the added words are "once per execution of the switch statement"):
When the switch statement is executed, its condition is evaluated
once per execution of the switch statement and compared with each case constant.
Yes, the expression is evaluated only once when the switch statement is executed:
§ 6.4 Selection statements
4 [...] The value of a condition that is an expression is the value of the
expression [...] The value of the condition will be referred to as simply “the condition” where the usage is unambiguous.
This means that the expression is evaluated and its value is considered the condition to be evaluated against each case statement.
Section 6.4, paragraph 4:
...The value of a condition that is an expression is the value of the
expression, contextually converted to bool for statements other than
switch;...The value of the condition will be referred to as simply “the condition” where the
usage is unambiguous
In my understanding, the quote above is equivalent to the following pseudo-code:
switchCondition := evaluate(expression)
Now add your quote
...its condition is evaluated and compared with each case constant.
Which should be translated to:
foreach case in cases
if case.constant == switchCondition
goto case.block
So yeah, it looks like this is the case.
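The pseudo-code above can be turned into a hand-written C++ equivalent. This is only a sketch of the idea, not how any compiler is required to implement a switch; `value_source`, `evaluations`, and the label names are hypothetical.

```cpp
#include <cassert>

int evaluations = 0;

// Hypothetical condition expression with a visible side effect.
int value_source() {
    ++evaluations;
    return 2;
}

// Hand-written stand-in for:
//   switch (value_source()) { case 1: return 100; case 2: return 200; }
int desugared_switch() {
    const int switch_condition = value_source();  // evaluated once, then stored
    if (switch_condition == 1) goto case_1;       // compare the stored value
    if (switch_condition == 2) goto case_2;
    goto done;
case_1:
    return 100;
case_2:
    return 200;
done:
    return 0;
}

void demo_desugared() {
    evaluations = 0;
    assert(desugared_switch() == 200);
    assert(evaluations == 1);
}
```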
Does this code print hello once or twice?
int main() {
printf("hello\n");
}
Well, I think the answer is in the more general understanding of what the standard describes rather than in the specific switch statement wording.
As per Program execution [intro.execution] the standard describes the behaviour of some abstract machine that executes the program parsed according to the C++ grammar. It does not really define what 'abstract machine' or 'executes' mean, but they are assumed to mean their obvious computer science concepts, i.e. a computer that goes through the abstract syntax tree and evaluates every part of it according to the semantics described by the standard. This implies that if you wrote something once, then when the execution gets to that point, it is evaluated only once.
The more relevant question is when the implementation may evaluate something differently from the way it is written in the program. For this there is the as-if rule and a bunch of undefined behaviours which permit the implementation to deviate from this abstract interpretation.
This issue was clarified for C++20, making it clear that the condition is evaluated once:
When the switch statement is executed, its condition is evaluated. If one of the case constants has the same value as the condition, control is passed to the statement following the matched case label.
The commit message for the change acknowledges that it was potentially confusing before:
[stmt.switch] Clarify comparison for case labels
The expression is guaranteed to be evaluated only once by the flow of control. This is justified in the standard, N4431 §6.4.2/6 The switch statement [stmt.switch] (emphasis mine):
case and default labels in themselves do not alter the flow of
control, which continues unimpeded across such labels. To exit from a
switch, see break, 6.6.1. [ Note: Usually, the substatement that is
the subject of a switch is compound and case and default labels appear
on the top-level statements contained within the (compound)
substatement, but this is not required. Declarations can appear in the
substatement of a switch-statement. — end note ]
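The quoted passage can be illustrated with a short sketch: once control has been transferred into the switch body, labels are just markers and execution flows across them until a break. The helper name `trace` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: records which statements run for a given input.
std::string trace(int n) {
    std::string visited;
    switch (n) {
    case 1:
        visited += "one ";
        // no break: control continues across the next label
    case 2:
        visited += "two ";
        break;               // leaves the switch
    case 3:
        visited += "three ";
    }
    return visited;
}

void demo_fallthrough() {
    assert(trace(1) == "one two ");
    assert(trace(2) == "two ");
    assert(trace(3) == "three ");
    assert(trace(4) == "");
}
```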
Maybe this is a silly question, but does anyone know whether a space is mandatory between the case keyword and its constant expression in a switch statement?
The standard does not seem to say anything about it...
Consider the following code:
switch(int_expression)
{
case1: /*anything*/ // no space before 1
caseZERO: /*anything*/ // no space before ZERO
// ZERO being defined as 0
// by the pre-processor
}
Both my reference compilers accept this code and once run it works nicely.
How does the preprocessor recognize that ZERO must be substituted?
Note also that if the control expression is of type char instead of int, the following also compiles, but this time it no longer works:
switch(char_expression)
{
case'a': /* anything */ // NO SPACES embedded
caseZERO: /* anything */ // NO SPACES again
// ZERO being defined as '0'
// by the pre-processor
}
THIS COMPILES but even if the value of the char_expression was '0'
no statement in caseZERO is executed.
Can anyone explain this?
These are general labels for use with goto and won't actually match the expression as you intend. The syntax is valid, but the semantics are different.
You don't specify whether this is a C compiler or a C++ compiler, so here's the documentation for goto in C as well.
Also, the grammar provided by the ISO C standard in Annex A.2.3 (6.8.1) defines a labeled-statement as follows:
(6.8.1) labeled-statement:
identifier : statement
case constant-expression : statement
default : statement
Note specifically that the case is a single token which must be followed by at least one token separating character in the context of the grammar. A numeral as in your example of case1 does not count as a token separating character, so your example falls into the first branch of the grammar which is an identifier not a case statement, and thus can't be used as the target of a switch statement.
To answer your other question regarding ZERO being defined in a preprocessor macro, the substitution is not happening as you believe. Again, the preprocessor operates on tokens, and thus caseZERO is a single token which does not match the ZERO macro and so will not be substituted at all. Again, this is just defining a labeled-statement using the identifier branch where the identifier is the entire token caseZERO and not case0 or case'0' as you believe. The preprocessor does have means of doing "token pasting" using the ## operator, but that would require you to use case ## ZERO. However, this still would not have the behavior that you probably intend.
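A minimal sketch of the tokenization point, using hypothetical variable names. The preprocessor replaces ZERO only where it appears as a token of its own, so an identifier that merely contains those letters is left alone.

```cpp
#include <cassert>

#define ZERO 0

int caseZERO = 7;       // a single identifier token; the ZERO inside it is untouched
int zero_value = ZERO;  // a standalone ZERO token is replaced by 0

void demo_tokens() {
    assert(caseZERO == 7);
    assert(zero_value == 0);
}
```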
The type of the case expression is not relevant. It's the syntactic form that matters.
switch (expr) {
caseZERO: /*...*/;
case0: /*...*/;
case'a': /*...*/;
}
caseZERO and case0 are both valid identifiers. In this context, they are labels, but not case labels. If you had a statement
goto caseZERO;
or
goto case0;
in the same function, it would branch to the corresponding labeled statement.
(This is one of the many cases in C where a typo results in code that's still syntactically valid, but with a substantially different meaning.)
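The difference between a plain label and a case label can be seen in a short sketch. The function names are hypothetical; compilers will usually warn about the unused label in the first function.

```cpp
#include <cassert>

// "case1" is an ordinary label here, while "case 2" is a real case label.
int classify(int x) {
    switch (x) {
    case1:              // goto label; never selected by the switch
        return 100;
    case 2:
        return 200;
    default:
        return -1;
    }
}

// The same spelling used as the target of a goto.
int jump_to_label() {
    goto case1;         // legal: case1 is a plain label in this function
case1:
    return 100;
}

void demo_labels() {
    assert(classify(1) == -1);   // 1 does not match the label "case1"
    assert(classify(2) == 200);
    assert(jump_to_label() == 100);
}
```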
case'a', on the other hand, is two tokens, because the 'a' is a character constant. (But it should still be written as case 'a': for the benefit of the human reader.)
The rules for splitting source code into tokens require white space between an identifier, keyword, or numeric literal and another identifier, keyword, or numeric literal, because otherwise it would be ambiguous. They do not require white space in other contexts. That's why, for example, you can write:
x=y+func(42);
rather than
x = y + func ( 42 ) ;
Adding some white space will make the code more legible to human readers:
x = y + func(42);
but the compiler doesn't care.
(Another case where whitespace is important is in the definition of a function-like macro. The ( must immediately follow the macro name; otherwise it's treated as the first token of the expansion rather than introducing the parameter list.)
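The macro case can be sketched as follows. All macro names are hypothetical; STR is a common two-step stringizing idiom, used here only to make the expansion of the second macro visible.

```cpp
#include <cassert>
#include <string>

#define STR_IMPL(x) #x
#define STR(x) STR_IMPL(x)      // expand x first, then stringize the result

#define TWICE(x) ((x) + (x))    // function-like: '(' directly follows the name
#define SPACED (x) ((x) + (x))  // object-like: "(x) ((x) + (x))" is the whole expansion

void demo_macros() {
    assert(TWICE(3) == 6);
    // SPACED takes no arguments; it simply expands to the text after the name.
    assert(std::string(STR(SPACED)) == "(x) ((x) + (x))");
}
```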
I am working in C++ (not C++/CLI) in Visual Studio 2012.
I don't understand why this code works. I would have expected it to fail at compilation time, but it doesn't even fail at runtime:
double MyClass::MyMethod() const
{
//some code here
return (10, 20, 30, 40);
}
I wrote this code by mistake, not on purpose, and noticed it when I was running my unit tests. I am surprised it works: when I run it, it returns 40, the last number in the list.
Can someone explain to me what this syntax means and why it works?
This is using the comma operator which will evaluate each expression from left to right but only return the last. If we look at the draft C++ standard section 5.18 Comma operator it says:
A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded value expression (Clause 5). Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression.
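The sequencing guarantee in the quote can be observed with operands that have side effects. A minimal sketch, with the hypothetical helper `last_operand`:

```cpp
#include <cassert>
#include <vector>

// Each operand appends to the log before the next operand is evaluated;
// the value of the whole expression is the last operand.
int last_operand(std::vector<int>& log) {
    return (log.push_back(1), log.push_back(2), log.push_back(3), 40);
}

void demo_sequencing() {
    std::vector<int> log;
    assert(last_operand(log) == 40);
    assert(log.size() == 3);
    assert(log[0] == 1 && log[1] == 2 && log[2] == 3);  // left-to-right order
}
```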
The linked article gives the most common use as:
allow multiple assignment statements without using a block statement, primarily in the initialization and the increment expressions of a for loop.
and this previous thread Uses of C comma operator has some really interesting examples of how people use the comma operator if you are really curious.
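The for-loop use mentioned above might look like the following sketch, where `reverse_in_place` is a hypothetical helper. Only the increment part uses the comma operator; the comma in the initialization separates two declarators.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Two indices move towards each other inside a single for statement.
void reverse_in_place(std::vector<int>& v) {
    if (v.empty()) return;
    for (std::size_t i = 0, j = v.size() - 1;  // declaration: comma separates declarators
         i < j;
         ++i, --j) {                           // comma operator: both updates run in order
        std::swap(v[i], v[j]);
    }
}

void demo_reverse() {
    std::vector<int> v = {1, 2, 3, 4};
    reverse_in_place(v);
    assert(v[0] == 4 && v[1] == 3 && v[2] == 2 && v[3] == 1);
}
```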
Enabling warnings, which is always a good idea, might have helped you out here. In gcc, using -Wall, I see the following warning:
warning: left operand of comma operator has no effect [-Wunused-value]
return (10, 20, 30, 40);
^
and then two more of those.
The comma operator is a 'sequence point' in C++, often used to initialise multiple variables in for loops.
So the code is evaluating a series of integers, one at a time, as single expressions. The last of these is the returned value, and the return statement as a whole is equivalent to simply return (40);
The expression (10, 20, 30, 40) is actually a series of 4 expressions separated by commas. You can use , to separate multiple expressions, and the result is the value of the last one.
You have used the comma operator (,).
return (expr); is valid for any valid expression expr,
and here expr is a chain of comma operators.
The comma operator yields its last operand, i.e. 40
int i=1,2,3,4; // Compile error
// The value of i is 1
int i = (1,2,3,4,5);
// The value of i is 5
What is the difference between these definitions of i in C and how do they work?
Edit: The first one is a compiler error. How does the second work?
= takes precedence over , [1]. So the first statement is a declaration and initialisation of i:
int i = 1;
… followed by lots of comma-separated expressions that do nothing.
The second code, on the other hand, consists of one declaration followed by one initialisation expression (the parentheses take precedence so the respective precedence of , and = are no longer relevant).
Then again, that’s purely academic since the first code isn’t valid, neither in C nor in C++. I don’t know which compiler you’re using that it accepts this code. Mine (rightly) complains
error: expected unqualified-id before numeric constant
[1] Precedence rules in C++ apply regardless of how an operator is used. = and , in the code of OP do not refer to operator= or operator,. Nevertheless, they are operators as far as C++ is concerned (§2.13 of the standard), and the precedence of the tokens = and , does not depend on their usage – it so happens that , always has a lower precedence than =, regardless of semantics.
You have run into an interesting edge case of the comma operator (,).
Basically, it evaluates the expression on its left, discards the result, and yields the value of the expression on its right.
The problem with the first line of code is operator precedence. Because the = operator has greater precedence than the , operator, you get the result of the first statement in the comma chain (1).
Correction (thanks #jrok!) - the first line of code does not compile, and it does not use the comma as an operator but as a separator between declarators, which allows you to define multiple variables of the same type at once.
In the second one, all of the first values are discarded and you are given the final result in the chain of items (5).
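The three roles of the comma discussed in this thread can be put side by side in a short sketch. The function name is hypothetical, and compilers will typically warn about the discarded operands.

```cpp
#include <cassert>

void demo_comma_forms() {
    int i = (1, 2, 3, 4, 5);   // comma operator inside parentheses: i is 5
    assert(i == 5);

    int j = 0;
    j = 1, 2, 3;               // parsed as (j = 1), 2, 3 because = binds tighter: j is 1
    assert(j == 1);

    int a = 1, b = 2;          // in a declaration the comma separates declarators
    assert(a == 1 && b == 2);
}
```

The assignment expression in the middle is the situation the precedence argument actually describes; the declaration `int i=1,2,3,4;` never gets that far because `2` is not a valid declarator.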
Not sure about C++, but at least for C the first one is invalid syntax so you can't really talk about a declaration since it doesn't compile. The second one is just the comma operator misused, with the result 5.
So, bluntly, the difference is that the first isn't C while the second is.