So I'm looking over C++ operator rules as I do when my programs start behaving wonkily. And I come across the comma operator. Now, I have known it was there for a while but never used it, so I began reading, and I come across this little gem:
if (int y = f(x), y > x)
{
// statements that use y
}
I had never thought about using commas' first arguments' side-effects to get locally-scoped variables without the need for bulky block-delimited code or repeated function calls. Naturally, this all excited me greatly, and I immediately ran off to try it.
test_comma.cpp: In function 'int main()':
test_comma.cpp:9:18: error: expected ')' before ',' token
if (int y = f(x), y > x) {
I tried this on both a C and C++ compiler, and neither of them liked it. I tried instead declaring y in the outer scope, and it compiled and ran just fine without the int in the if condition, but that defeats the purpose of the comma here. Is this just a GCC implementation quirk? The opinion of the Internet seems to be that this should be perfectly valid C (and ostensibly, to my eye, C++) code; there is no mention of this error on any GCC or C++ forum that I've seen.
EDIT: Some more information. I am using MinGW GCC 4.8.1-4 on Windows 7 64-bit (though obviously my binaries are 32-bit; I need to install mingw-w64 one of these days).
I also tried using this trick outside of a conditional statement, as below:
int y = (int z = 5, z);
This threw up two different errors:
test_comma.cpp: In function 'int main()':
test_comma.cpp:9:11: error: expected primary-expression before 'int'
int y = (int z = 5, z);
^
test_comma.cpp:9:11: error: expected ')' before 'int'
With creative use of parentheses in my if statement above, I managed to get the same errors there, too.
Contrary to what several other people have claimed, declarations inside the if conditional are perfectly valid. However, your code is not.
The first problem is that you're not actually using the comma operator, but [almost] attempting to declare multiple variables. That is not valid in an if conditional. And, even if it were possible, your second declaration would be entirely broken anyway since you try to redeclare y, and you do so with > instead of =. It all simply makes no sense.
The following code is sort of similar:
if (int y = (f(x), y > x))
Now at least it's half-valid, but you're using y uninitialised and yielding undefined behaviour.
Declarations and expressions are not the same thing, so the following is quite different code:
int y = 0;
if (y = f(x), y > x)
Now you don't have a problem with uninitialised variables, either (because I initialised y myself), and you're getting this "side-effect declaration" that doesn't change the resulting value of the if conditional. But it's about as clear as mud. Look how the precedence forms:
int y = 0;
if ((y = f(x)), (y > x))
That's not really very intuitive.
Hopefully this total catastrophe has been a lesson in avoiding this sort of cryptic code in entirety. :)
You cannot declare variable and apply operator , simultaneously either you are declaring variable (in case of if it would be only one 'cause result needs to be resolved to bool), either you are writing some statement (also resolving to bool) which may include operator , in it.
You need to declare y on the top of if condition:
int y;
if(y=f(x),y>x)
{
}
This will check the last condition defined in the if condition and rest others are executed as general statements.
Related
This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 6 years ago.
I have a friend who is getting different output than I do for the following program:
int main() {
int x = 20, y = 35;
x = y++ + x++;
y = ++y + ++x;
printf("%d%d", x, y);
return 0;
}
I am using Ubuntu, and have tried using gcc and clang. I get 5693 from both.
My friend is using Visual Studio 2015, and gets 5794.
The answer I get (5693) makes most sense to me, since:
the first line sets x = x + y (which is x = 20+35 = 55) (note: x was incremented, but assigned over top of, so doesn't matter)
y was incremented and is therefore 36
next line increments both, adds the result and sets it as y (which is y = 37 + 56 = 93)
which would be 56 and 93, so the output is 5693
I could see the VS answer making sense if the post-increment happened after the assignment. Is there some spec that makes one of these answers more right than the other? Is it just ambiguous? Should we fire anyone who writes code like this, making the ambiguity irrelevant?
Note: Initially, we only tried with gcc, however clang gives this warning:
coatedmoose#ubuntu:~/playground$ clang++ strange.cpp
strange.cpp:8:16: warning: multiple unsequenced modifications to 'x' [-Wunsequenced]
x = y++ + x++;
~ ^
1 warning generated.
The Clang warning is alluding to a clause in the standard, C++11 and later, that makes it undefined behaviour to execute two unsequenced modifications to the same scalar object. (In earlier versions of the standard the rule was different although similar in spirit.)
So the answer is that the spec makes all possible answers equally valid, including the program crashing; it is indeed inherently ambiguous.
By the way, Visual C++ actually does have somewhat consistent and logical behaviour in such cases, even though the standard does not require it: it performs all pre-increments first, then does arithmetic operations and assignments, and then finally performs all post-increments before moving on to the next statement. If you trace through the code you've given, you'll see that Visual C++'s answer is what you would expect from this procedure.
If have encountered this claim multiple times and can't figure out what it is supposed to mean. Since the resulting code is compiled using a regular C compiler it will end up being type checked just as much (or little) as any other code.
So why are macros not type safe? It seems to be one of the major reasons why they should be considered evil.
Consider the typical "max" macro, versus function:
#define MAX(a,b) a < b ? a : b
int max(int a, int b) {return a < b ? a : b;}
Here's what people mean when they say the macro is not type-safe in the way the function is:
If a caller of the function writes
char *foo = max("abc","def");
the compiler will warn.
Whereas, if a caller of the macro writes:
char *foo = MAX("abc", "def");
the preprocessor will replace that with:
char *foo = "abc" < "def" ? "abc" : "def";
which will compile with no problems, but almost certainly not give the result you wanted.
Additionally of course the side effects are different, consider the function case:
int x = 1, y = 2;
int a = max(x++,y++);
the max() function will operate on the original values of x and y and the post-increments will take effect after the function returns.
In the macro case:
int x = 1, y = 2;
int b = MAX(x++,y++);
that second line is preprocessed to give:
int b = x++ < y++ ? x++ : y++;
Again, no compiler warnings or errors but will not be the behaviour you expected.
Macros aren't type safe because they don't understand types.
You can't tell a macro to only take integers. The preprocessor recognises a macro usage and it replaces one sequence of tokens (the macro with its arguments) with another set of tokens. This is a powerful facility if used correctly, but it's easy to use incorrectly.
With a function you can define a function void f(int, int) and the compiler will flag if you try to use the return value of f or pass it strings.
With a macro - no chance. The only checks that get made are it is given the correct number of arguments. then it replaces the tokens appropriately and passes onto the compiler.
#define F(A, B)
will allow you to call F(1, 2), or F("A", 2) or F(1, (2, 3, 4)) or ...
You might get an error from the compiler, or you might not, if something within the macro requires some sort of type safety. But that's not down to the preprocessor.
You can get some very odd results when passing strings to macros that expect numbers, as the chances are you'll end up using string addresses as numbers without a squeak from the compiler.
Well they're not directly type-safe... I suppose in certain scenarios/usages you could argue they can be indirectly (i.e. resulting code) type-safe. But you could certainly create a macro intended for integers and pass it strings... the pre-processor handling the macros certainly doesn't care. The compiler may choke on it, depending on usage...
Since macros are handled by the preprocessor, and the preprocessor doesn't understand types, it will happily accept variables that are of the wrong type.
This is usually only a concern for function-like macros, and any type errors will often be caught by the compiler even if the preprocessor doesn't, but this isn't guaranteed.
An example
In the Windows API, if you wanted to show a balloon tip on an edit control, you'd use Edit_ShowBalloonTip. Edit_ShowBalloonTip is defined as taking two parameters: the handle to the edit control and a pointer to an EDITBALLOONTIP structure. However, Edit_ShowBalloonTip(hwnd, peditballoontip); is actually a macro that evaluates to
SendMessage(hwnd, EM_SHOWBALLOONTIP, 0, (LPARAM)(peditballoontip));
Since configuring controls is generally done by sending messages to them, Edit_ShowBalloonTip has to do a typecast in its implementation, but since it's a macro rather than an inline function, it can't do any type checking in its peditballoontip parameter.
A digression
Interestingly enough, sometimes C++ inline functions are a bit too type-safe. Consider the standard C MAX macro
#define MAX(a, b) ((a) > (b) ? (a) : (b))
and its C++ inline version
template<typename T>
inline T max(T a, T b) { return a > b ? a : b; }
MAX(1, 2u) will work as expected, but max(1, 2u) will not. (Since 1 and 2u are different types, max can't be instantiated on both of them.)
This isn't really an argument for using macros in most cases (they're still evil), but it's an interesting result of C and C++'s type safety.
There are situations where macros are even less type-safe than functions. E.g.
void printlog(int iter, double obj)
{
printf("%.3f at iteration %d\n", obj, iteration);
}
Calling this with the arguments reversed will cause truncation and erroneous results, but nothing dangerous. By contrast,
#define PRINTLOG(iter, obj) printf("%.3f at iteration %d\n", obj, iter)
causes undefined behavior. To be fair, GCC warns about the latter, but not about the former, but that's because it knows printf -- for other varargs functions, the results are potentially disastrous.
When the macro runs, it just does a text match through your source files. This is before any compilation, so it is not aware of the datatypes of anything it changes.
Macros aren't type safe, because they were never meant to be type safe.
The compiler does the type checking after macros had been expanded.
Macros and there expansion are meant as a helper to the ("lazy") author (in the sense of writer/reader) of C source code. That's all.
I guess the answer is "no", but from a compiler point of view, I don't understand why.
I made a very simple code which freaks out compiler diagnostics quite badly (both clang and gcc), but I would like to have confirmation that the code is not ill formatted before I report mis-diagnostics. I should point out that these are not compiler bugs, the output is correct in all cases, but I have doubts about the warnings.
Consider the following code:
#include <iostream>
int main(){
int b,a;
b = 3;
b == 3 ? a = 1 : b = 2;
b == 2 ? a = 2 : b = 1;
a = a;
std::cerr << a << std::endl;
}
The assignment of a is a tautology, meaning that a will be initialized after the two ternary statements, regardless of b. GCC is perfectly happy with this code. Clang is slighly more clever and spot something silly (warning: explicitly assigning a variable of type 'int' to itself [-Wself-assign]), but no big deal.
Now the same thing (semantically at least), but shorter syntax:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a);
std::cerr << a << std::endl;
}
Now the compilers give me completely different warnings. Clang doesn't report anything strange anymore (which is probably correct because of the parenthesis precedence). gcc is a bit more scary and says:
test.cpp: In function ‘int main()’:
test.cpp:7:15: warning: operation on ‘a’ may be undefined [-Wsequence-point]
But is that true? That sequence-point warning gives me a hint that coma separated statements are not handled in the same way in practice, but I don't know if they should or not.
And it gets weirder, changing the code to:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a+0); // <- i just changed this line
std::cerr << a << std::endl;
}
and then suddenly clang realized that there might be something fishy with a:
test.cpp:7:14: warning: variable 'a' is uninitialized when used within its own initialization [-Wuninitialized]
a+0);
^
But there was no problem with a before... For some reasons clang cannot spot the tautology in this case. Again, it might simply be because those are not full statements anymore.
The problems are:
is this code valid and well defined (in all versions)?
how is the list of comma separated statements handled? Should it be different from the first version of the code with explicit statements?
is GCC right to report undefined behavior and sequence point issues? (in this case clang is missing some important diagnostics) I am aware that it says may, but still...
is clang right to report that a might be uninitialized in the last case? (then it should have the same diagnostic for the previous case)
Edit and comments:
I am getting several (rightful) comments that this code is anything but simple. This is true, but the point is that the compilers mis-diagnose when they encounter comma-separated statements in initializers. This is a bad thing. I made my code more complete to avoid the "have you tried this syntax..." comments. A much more realistic and human readable version of the problem could be written, which would exhibit wrong diagnostics, but I think this version shows more information and is more complete.
in a compiler-torture test suite, this would be considered very understandable and readable, they do much much worse :) We need code like that to test and assess compilers. This would not look pretty in production code, but that is not the point here.
5 Expressions
10 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded
5.18 Comma operator [expr.comma]
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression (Clause 5).83 Every
value computation and side effect associated with the left expression
is sequenced before every value computation and side effect associated
with the right expression. The type and value of the result are the
type and value of the right operand; the result is of the same value
category as its right operand, and is a bit-field if its right operand
is a glvalue and a bit-field.
It sounds to me like there's nothing wrong with your statement.
Looking more closely at the g++ warning, may be undefined, which tells me that the parser isn't smart enough to see that a=1 is guaranteed to be evaluated.
I have following statement and it compiles:
static unsigned char CMD[5] = {0x10,0x03,0x04,0x05,0x06};
int Class::functionA(int *buflen)
{
...
int length = sizeof(CMD); + *buflen; // compiler should cry! why not?
...
}
Why I get no compiler error?
+ *buflen;
Is a valid application of the unary + operator on an int&, it's basically a noop. It's the same as if you wrote this:
int i = 5;
+i; // noop
See here for what the unary operator+ actually does to integers, and here what you can practically do with it.
Because it isn't wrong, just a statement with no effect.
If you compile (gcc/g++) with the flag -Wall you'll see.
I guess from this Question's title "After semicolon another command and it compiles" that you think that there can only be one command/statement per line?
As you noticed, this is false. C++ and C are free-form languages (which means that you can arrange the symbols in any way you see fit). The semicolon is just a statement terminator.
You may write foo();bar(); or
foo();
bar();
Both (and more) arrangements are totally fine. By the way, that's a feature, not a bug. Some languages (Python, early Fortran) don't have that property.
As others have correctly pointed out, your specific statement is a no-op, a statement without any effect. Some compilers might warn you about that - but no compiler will warn you about multiple statements on one line.
I've recently (only on SO actually) run into uses of the C/C++ comma operator. From what I can tell, it creates a sequence point on the line between the left and right hand side operators so that you have a predictable (defined) order of evaluation.
I'm a little confused about why this would be provided in the language as it seems like a patch that can be applied to code that shouldn't work in the first place. I find it hard to imagine a place it could be used that wasn't overly complex (and in need of refactoring).
Can someone explain the purpose of this language feature and where it may be used in real code (within reason), if ever?
It can be useful in the condition of while() loops:
while (update_thing(&foo), foo != 0) {
/* ... */
}
This avoids having to duplicate the update_thing() line while still maintaining the exit condition within the while() controlling expression, where you expect to find it. It also plays nicely with continue;.
It's also useful in writing complex macros that evaluate to a value.
The comma operator just separates expressions, so you can do multiple things instead of just one where only a single expression is required. It lets you do things like
(x) (y)
for (int i = 0, j = 0; ...; ++i, ++j)
Note that x is not the comma operator but y is.
You really don't have to think about it. It has some more arcane uses, but I don't believe they're ever absolutely necessary, so they're just curiosities.
Within for loop constructs it can make sense. Though I generally find them harder to read in this instance.
It's also really handy for angering your coworkers and people on SO.
bool guess() {
return true, false;
}
Playing Devil's Advocate, it might be reasonable to reverse the question:
Is it good practice to always use the semi-colon terminator?
Some points:
Replacing most semi-colons with commas would immediately make the structure of most C and C++ code clearer, and would eliminate some common errors.
This is more in the flavor of functional programming as opposed to imperative.
Javascript's 'automatic semicolon insertion' is one of its controversial syntactic features.
Whether this practice would increase 'common errors' is unknown, because nobody does this.
But of course if you did do this, you would likely annoy your fellow programmers, and become a pariah on SO.
Edit: See AndreyT's excellent 2009 answer to Uses of C comma operator. And Joel 2008 also talks a bit about the two parallel syntactic categories in C#/C/C++.
As a simple example, the structure of while (foo) a, b, c; is clear, but while (foo) a; b; c; is misleading in the absence of indentation or braces, or both.
Edit #2: As AndreyT states:
[The] C language (as well as C++) is historically a mix of two completely different programming styles, which one can refer to as "statement programming" and "expression programming".
But his assertion that "in practice statement programming produces much more readable code" [emphasis added] is patently false. Using his example, in your opinion, which of the following two lines is more readable?
a = rand(), ++a, b = rand(), c = a + b / 2, d = a < c - 5 ? a : b;
a = rand(); ++a; b = rand(); c = a + b / 2; if (a < c - 5) d = a; else d = b;
Answer: They are both unreadable. It is the white space which gives the readability--hurray for Python!. The first is shorter. But the semi-colon version does have more pixels of black space, or green space if you have a Hazeltine terminal--which may be the real issue here?
Everyone is saying that it is often used in a for loop, and that's true. However, I find it's more useful in the condition statement of the for loop. For example:
for (int x; x=get_x(), x!=sentinel; )
{
// use x
}
Rewriting this without the comma operator would require doing at least one of a few things that I'm not entirely comfortable with, such as declaring x outside the scope where it's used, or special casing the first call to get_x().
I'm also plotting ways I can utilize it with C++11 constexpr functions, since I guess they can only consist of single statements.
I think the only common example is the for loop:
for (int i = 0, j = 3; i < 10 ; ++i, ++j)
As mentioned in the c-faq:
Once in a while, you find yourself in a situation in which C expects a
single expression, but you have two things you want to say. The most
common (and in fact the only common) example is in a for loop,
specifically the first and third controlling expressions.
The only reasonable use I can think of is in the for construct
for (int count=0, bit=1; count<10; count=count+1, bit=bit<<1)
{
...
}
as it allows increment of multiple variables at the same time, still keeping the for construct structure (easy to read and understand for a trained eye).
In other cases I agree it's sort of a bad hack...
I also use the comma operator to glue together related operations:
void superclass::insert(item i) {
add(i), numInQ++, numLeft--;
}
The comma operator is useful for putting sequence in places where you can't insert a block of code. As pointed out this is handy in writing compact and readable loops. Additionally, it is useful in macro definitions. The following macro increments the number of warnings and if a boolean variable is set will also show the warning.
#define WARN if (++nwarnings, show_warnings) std::cerr
So that you may write (example 1):
if (warning_condition)
WARN << "some warning message.\n";
The comma operator is effectively a poor mans lambda function.
Though posted a few months after C++11 was ratified, I don't see any answers here pertaining to constexpr functions. This answer to a not-entirely-related question references a discussion on the comma operator and its usefulness in constant expressions, where the new constexpr keyword was mentioned specifically.
While C++14 did relax some of the restrictions on constexpr functions, it's still useful to note that the comma operator can grant you predictably ordered operations within a constexpr function, such as (from the aforementioned discussion):
template<typename T>
constexpr T my_array<T>::at(size_type n)
{
return (n < size() || throw "n too large"), (*this)[n];
}
Or even something like:
constexpr MyConstexprObject& operator+=(int value)
{
return (m_value += value), *this;
}
Whether this is useful is entirely up to the implementation, but these are just two quick examples of how the comma operator might be applied in a constexpr function.