The limitations of the comma operator - c++

I have read this question and I want to add to it that what are the things that can not be done using the comma operator. This has confused me a lot, as I can do this:
int arr[3];
arr[0]=1,arr[1]=2,arr[2]=3;
But when I do:
int arr[3],arr[0]=1,arr[1]=2,arr[2]=3;
It gives me a compiler error.
I want to ask that what are the limitations of the comma operator in real practice?

One thing to realize is that not all uses of a comma in C are instances of the comma operator. Changing your second example to be a syntactically declaration:
int a0=1,a1=2,a2=3;
the commas are not operators, they're just syntax required to separate instances of declarators in a list.
Also, the comma used in parameter/argument lists is not the comma operator.
In my opinion the use of the comma operator is almost always a bad idea - it just causes needless confusion. In most cases, what's done using a comma operator can be more clearly done using separate statements.
Two exceptions that come to mind easily are inside the control clauses of a for statement, and in macros that absolutely need to cram more than one 'thing' into a single expression, and even this should only be done when there's no other reasonable option).

You can use the comma operator most anywhere that an expression can appear. There are a few exceptions; notably, you cannot use the comma operator in a constant expression.
You also have to be careful when using the comma operator where the comma is also used as a separator, for example, when calling functions you must use parentheses to group the comma expression:
void f(int, bool);
f(42, 32, true); // wrong
f((42, 32), true); // right (if such a thing can be considered "right")
Your example is a declaration:
int arr[3],arr[0]=1,arr[1]=2,arr[2]=3;
In a declaration, you can declare multiple things by separating them with the comma, so here too the comma is used as a separator. Also, you can't just tack on an expression to the end of a declaration like this. (Note that you can get the desired result by using int arr[3] = { 1, 2, 3 };).

Related

why does C++ allow a declaration with no space between the type and a parenthesized variable name? [duplicate]

This question already has an answer here:
Is white space considered a token in C like languages?
(1 answer)
Closed 8 months ago.
A previous C++ question asked why int (x) = 0; is allowed. However, I noticed that even int(x) = 0; is allowed, i.e. without a space before the (x). I find the latter quite strange, because it causes things like this:
using Oit = std::ostream_iterator<int>;
Oit bar(std::cout);
*bar = 6; // * is optional
*Oit(bar) = 7; // * is NOT optional!
where the final line is because omitting the * makes the compiler think we are declaring bar again and initializing to 7.
Am I interpreting this correctly, that int(x) = 0; is indeed equivalent to int x = 0, and Oit(bar) = 7; is indeed equivalent to Oit bar = 7;? If yes, why specifically does C++ allow omitting the space before the parentheses in such a declaration + initialization?
(my guess is because the C++ compiler does not care about any space before a left paren, since it treats that parenthesized expression as it's own "token" [excuse me if I'm butchering the terminology], i.e. in all cases, qux(baz) is equivalent to qux (baz))
It is allowed in C++ because it is allowed in C and requiring the space would be an unnecessary C-compatibility breaking change. Even setting that aside, it would be surprising to have int (x) and int(x) behave differently, since generally (with few minor exceptions) C++ is agnostic to additional white-space as long as tokens are properly separated. And ( (outside a string/character literal) is always a token on its own. It can't be part of a token starting with int(.
In C int(x) has no other potential meaning for which it could be confused, so there is no reason to require white-space separation at all. C also is generally agnostic to white-space, so it would be surprising there as well to have different behavior with and without it.
One requirement when defining the syntax of a language is that elements of the language can be separated. According to the C++ syntax rules, a space separates things. But also according to the C++ syntax rules, parentheses also separate things.
When C++ is compiled, the first step is the parsing. And one of the first steps of the parsing is separating all the elements of the language. Often this step is called tokenizing or lexing. But this is just the technical background. The user does not have to know this. He or she only has to know that things in C++ must be clearly separted from each others, so that there is a sequence "*", "Oit", "(", "bar", ")", "=", "7", ";".
As explained, the rule that the parenthesis always separates is established on a very low level of the compiler. The compiler determines even before knowing what the purpose of the parenthesis is, that a parenthesis separates things. And therefore an extra space would be redundant.
When you ever use parser generators, you will see that most of them just ignore spaces. That means, when the lexer has produced the list of tokens, the spaces do not exist any more. See above in the list. There are no spaces any more. So you have no chance to specify something that explicitly requires a space.

Regular expression in C++ for mathematical expressions

I have this trouble: I must verify the correctness of many mathematical expressions especially check for consecutive operators + - * /.
For example:
6+(69-9)+3
is ok while
6++8-(52--*3)
no.
I am not using the library <regex> since it is only compatible with C++11.
Is there a alternative method to solve this problem? Thanks.
You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.
So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:
bool check_balanced(const char* expr, char open, char close) {
int parens = 0;
for (const char* p = expr; *p; ++p) {
if (*p == open) ++parens;
else if (*p == close && parens-- == 0) return false;
}
return parens == 0;
}
To get the regular expression, note that mathematical expressions without function calls can be summarized as:
BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
where:
BEFORE is sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).
AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.
BETWEEN is a sub-regex which matches a binary operator.
VALUE is a sub-regex which matches a value.
For example, for ordinary four-operator arithmetic on integers you would have:
BEFORE: [-+(]
AFTER: [)]
BETWEEN: [-+*/]
VALUE: [[:digit:]]+
and putting all that together you might end up with the regex:
^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$
If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.
You can loop over each charin your expression.
If you encounter a + you can check whether it is follow by another +, /, *...
Additionally you can group operators together to prevent code duplication.
int i = 0
while(!EOF) {
switch(expression[i]) {
case '+':
case '*': //Do your syntax checks here
}
i++;
}
Well, in general case, you can't solve this with regex. Arithmethic expressions "language" can't be described with regular grammar. It's context-free grammar. So if what you want is to check correctness of an arbitrary mathemathical expression then you'll have to write a parser.
However, if you only need to make sure that your string doesn't have consecutive +-*/ operators then regex is enough. You can write something like this [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from +-*/ set.
Or something like this ([-+*/]\s*){2,} if you also want to handle situations with spaces like 5+ - * 123
Well, you will have to define some rules if possible. It's not possible to completely parse mathamatical language with Regex, but given some lenience it may work.
The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:
5--3 can be 5-(-3)
So in this case, you have two choices:
Ensure that the input is parenthesized well enough that no two operators meet
If you find something like --, treat it as a special case and investigate it further
If the formulas are in fact in your favor (have well defined parenthesis), then you can just check for repeats. For instance:
--
+-
+*
-+
etc.
If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).
You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:
[+\-\*\\/][+\-\*\\/]
which will work for the following examples (and more):
6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)
An alternative, probably a better one, is just write a parser. it will step by step process the equation to check it's validity. A parser will, if well written, 100% accurate. A Regex approach leaves you to a lot of constraints.
There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.
A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:
Write a grammar for a mathematical expression. (These can be found online)
Tokenize the input into lexemes. (This will be done with a regex, typically).
Match the expressions based on the next lexeme you see.
Recurse based on your grammar
A quick Google search can provide many example recursive-descent parsers written in C++.

What's the difference between the comma operator and the comma separator? [duplicate]

This question already has answers here:
How does the compiler know that the comma in a function call is not a comma operator?
(6 answers)
Closed 8 years ago.
In C++, the comma token (i.e., ,) is either interpreted as a comma operator or as a comma separator.
However, while searching in the web I realized that it's not quite clear in which cases the , token is interpreted as the binary comma operator and where is interpreted as a separator between statements.
Moreover, considering multiple statements/expressions in one line separated by , (e.g., a = 1, b = 2, c = 3;), there's a turbidness on the order in which they are evaluated.
Questions:
In which cases a comma , token is interpreted as an operator and in which as a separator?
When we have one line multiple statements/expressions separated by comma what's the order of evaluation for either the case of the comma operator and the case of the comma separator?
When a separator is appropriate -- in arguments to a function call or macro, or separating values in an initializer list (thanks for the reminder, #haccks) -- comma will be taken as a separator. In other expressions, it is taken as an operator. For example,
my_function(a,b,c,d);
is a call passing four arguments to a function, whereas
result=(a,b,c,d);
will be understood as the comma operator. It is possible, through ugly, to intermix the two by writing something like
my_function(a,(b,c),d);
The comma operator is normally evaluated left-to-right.
The original use of this operation in C was to allow a macro to perform several operations before returning a value. Since a macro instantiation looks like a function call, users generally expect it to be usable anywhere a function call could be; having the macro expand to multiple statements would defeat that. Hence, C introduced the , operator to permit chaining several expressions together into a single expression while discarding the results of all but the last.
As #haccks pointed out, the exact rules for how the compiler determines which meaning of , was intended come out of the language grammar, and have previously been discussed at How does the compiler know that the comma in a function call is not a comma operator?
You cannot use comma to separate statements. The , in a = 1, b = 2; is the comma operator, whose arguments are two assignment expressions. The order of evaluation of the arguments of the comma operator is left-to-right, so it's clear what the evaluation order is in that case.
In the context of the arguments to a function-call, those arguments cannot be comma-expressions, so the top-level commas must be syntactic (i.e. separating arguments). In that case, the evaluation order is not specified. (Of course, the arguments might be parenthesized expressions, and the parenthesized expression might be a comma expression.)
This is expressed clearly in the grammar in the C++ standard. The relevant productions are expression, which can be:
assignment-expression
or
expression , assignment-expression
and expression-list, which is the same as an initializer-list, which is a ,-separated list of initializer-clause, where an initializer-clause is either:
assignment-expression
or
braced-init-list
The , in the second expression production is the comma-operator.

Explain the difference:

int i=1,2,3,4; // Compile error
// The value of i is 1
int i = (1,2,3,4,5);
// The value of i is 5
What is the difference between these definitions of i in C and how do they work?
Edit: The first one is a compiler error. How does the second work?
= takes precedence over ,1. So the first statement is a declaration and initialisation of i:
int i = 1;
… followed by lots of comma-separated expressions that do nothing.
The second code, on the other hand, consists of one declaration followed by one initialisation expression (the parentheses take precedence so the respective precedence of , and = are no longer relevant).
Then again, that’s purely academic since the first code isn’t valid, neither in C nor in C++. I don’t know which compiler you’re using that it accepts this code. Mine (rightly) complains
error: expected unqualified-id before numeric constant
1 Precedence rules in C++ apply regardless of how an operator is used. = and , in the code of OP do not refer to operator= or operator,. Nevertheless, they are operators as far as C++ is concerned (§2.13 of the standard), and the precedence of the tokens = and , does not depend on their usage – it so happens that , always has a lower precedence than =, regardless of semantics.
You have run into an interesting edge case of the comma operator (,).
Basically, it takes the result of the previous statement and discards it, replacing it with the next statement.
The problem with the first line of code is operator precedence. Because the = operator has greater precedence than the , operator, you get the result of the first statement in the comma chain (1).
Correction (thanks #jrok!) - the first line of code neither compiles, nor is it using the comma as an operator, but instead as an expression separator, which allows you to define multiple variable names of the same type at a time.
In the second one, all of the first values are discarded and you are given the final result in the chain of items (5).
Not sure about C++, but at least for C the first one is invalid syntax so you can't really talk about a declaration since it doesn't compile. The second one is just the comma operator misused, with the result 5.
So, bluntly, the difference is that the first isn't C while the second is.

Finding Elvis ?:

I have been tasked to find Elvis (using eclipse search). Is there any regex that I can use to find him?
The "Elvis Operator" (?:) is a shortening of Java's ternary operator.
I have tried \?[\s\S]*[:] but it doesn't match multiline.
Is there such a refactoring where I could change Elvis into an if-else block?
Edit
Sorry, I had posted a regex for the ternary operator, if your problem is multiline you could use this:
\?(\p{Z}|\r\n|\n)*:
You'll need to explicitly match line delimiters if you want to match across multiple lines. \R will match any of them(platform-independent), in Eclipse 3.4 anyway, or you can use the proper one for your file (\r, \n, \r\n). E.g. \?.*\R*.*: will work if there's only one line break. You can't use \R in a character class, though, so if you don't know how many lines the operator might span, you'd have to construct a character class with your line delimiter and any character that might appear in an operand. Something like ([-\r\n\w\s\[\](){}=!/%*+&^|."']*)\?([-\r\n\w\s\[\](){}=!/%*+&^|."']*):([-\r\n\w\s\[\](){}=!/%*+&^|."']*). I've included parentheses to capture the operands as groups so you could find and replace.
You've got a pretty big problem, though, if this is Java (and probably any other language). The ternary conditional ?: operator creates an expression, while an if statement is not an expression. Consider:
boolean even = true;
int foo = even ? 2 : 3;
int bar = if (even) 2 else 3;
The third line is syntactically incorrect; the two conditional constructs are not equivalent. (What you'd actually get from the second line if you used my regex to find and replace is if (int foo = even) 2 else 3; which has additional problems.)
So, you can find the ?: operators with the regex above (or something similar; I may have missed some characters you need to include in the class), but you won't necessarily be able to replace them with 'if' statements.