Roughly speaking, in C++ there are:
operators (+, -, *, [], new, ...)
identifiers (names of classes, variables, functions, ...)
literals (10, 2.5, "100", ...)
keywords (int, class, typename, mutable, ...)
brackets ({, }, <, >)
preprocessor operators (#, ##, ...).
But what is the semicolon?
The semicolon is a punctuator; see 2.13 §1 of the standard:
The lexical representation of C++ programs includes a number of preprocessing tokens which are used in
the syntax of the preprocessor or are converted into tokens for operators and punctuators
It is part of the syntax and therefore an element of several statements. In EBNF:
<do-statement>
::= 'do' <statement> 'while' '(' <expression> ')' ';'
<goto-statement>
::= 'goto' <label> ';'
<for-statement>
::= 'for' '(' <for-initialization> ';' <for-control> ';' <for-iteration> ')' <statement>
<expression-statement>
::= <expression> ';'
<return-statement>
::= 'return' <expression> ';'
This list is not complete.
The semicolon is a terminal, a token that terminates something. What exactly it terminates depends on the context.
Semicolon denotes sequential composition. It is also used to delineate declarations.
Semicolon is a statement terminator.
The semicolon isn't given a specific name in the C++ standard. It's simply a character that's used in certain grammar productions (and it just happens to be at the end of them quite often, so it 'terminates' those grammatical constructs). For example, a semicolon character is at the end of the following parts of the C++ grammar (not necessarily a complete list):
an expression-statement
a do/while iteration-statement
the various jump-statements
the simple-declaration
Note that in an expression-statement, the expression is optional. That's why a 'run' of semicolons, ;;;;, is valid in many (but not all) places where a single one is.
';'s are often used to delimit one bit of C++ source code, indicating it's intentionally separate from the following code. To see how it's useful, let's imagine we didn't use it:
For example:
#include <iostream>
int f() { std::cout << "f()\n"; return 0; }
int g() { std::cout << "g()\n"; return 0; }
int main(int argc, char*[])
{
    std::cout << "message"
    "\0\1\0\1\1"[argc] ? f() : g(); // final ';' needed to make this compile
                                    // but imagine it's not there in this new
                                    // semicolon-less C++ variant....
}
This (horrible) bit of code, called with no arguments such that argc is 1, prints:
ef()\n
Why not "messagef()\n"? That's what you might expect: first std::cout << "message", then "\0\1\0\1\1"[1] is '\1', which is true in a boolean sense, so f() is called and prints f()\n.
Because... (drumroll please)... in C++ adjacent string literals are concatenated, so the program's parsed like this:
std::cout << "message\0\1\0\1\1"[argc] ? f() : g();
What this does is:
take the character at index argc (1, i.e. the second character) of "message\0\1\0\1\1", which is the first 'e'
send that 'e' to std::cout (printing it)
the conditional operator ?: converts the std::ostream& returned by << to bool, which is true because the stream is still in a good state after the output, so f() is called...!
Given that this string-literal concatenation is incredibly useful for specifying long strings (and even shorter multi-line strings in a readable format), we certainly wouldn't want to assume that such strings shouldn't be concatenated. Consequently, if the semicolon's gone, the compiler must assume the concatenation is intended, even though the layout of the code above visually implies otherwise.
That's a convoluted example of how C++ code changes meaning with and without ';'s. I'm sure if I or other readers think about it for a few minutes we could come up with other, simpler examples.
Anyway, the ';' is necessary to inform the compiler that statement termination/separation is intended.
The semicolon lets the compiler know that it's reached the end of a command AFAIK.
The semicolon (;) is not a command itself in C++, but it marks the end of one: it tells the compiler that you're at the end of a statement.
If I recall correctly, Kernighan and Ritchie called it punctuation.
Technically, it's just a token (or terminal, in compiler-speak), which
can occur in specific places in the grammar, with a specific semantics
in the language. The distinction between operators and other punctuation
is somewhat artificial, but useful in the context of C or C++, since
some tokens (the comma, = and :) can be either operators or punctuation,
depending on context, e.g.:
f( a, b ); // comma is punctuation
f( (a, b) ); // comma is operator
a = b; // = is assignment operator
int a = b; // = is punctuation
x = c ? a : b; // colon is operator
label: // colon is punctuation
In the case of the first two (the comma and =), the distinction is important,
since a user-defined overload will only affect the operator, not the punctuation.
It represents the end of a C++ statement.
For example,
int i=0;
i++;
In the above code there are two statements. The first declares and initializes the variable, and the second increments its value by one.
Is there a difference between:
int main(){
return 0;
}
and
int main(){return 0;}
and
int main(){
return
0;
}
They will all likely compile to the same executable. How does the C/C++ compiler treat the extra spaces and newlines, and are newlines treated differently from spaces in C code?
Also, how about tabs? What's the significance of using tabs instead of spaces in code, if there is any?
Any sequence of one or more whitespace characters (space, line break, tab, ...) is equivalent to a single space.
Exceptions:
Whitespace is preserved in string literals. They can't contain line-breaks, except C++ raw literals (R"(...)"). The same applies to file names in #include.
Single-line comments (//) are terminated with line-breaks only.
Preprocessor directives (starting with #) are terminated with line-breaks only.
A \ immediately followed by a line break removes both, allowing multi-line // comments, preprocessor directives, and string literals.
Also, whitespace symbols are ignored if there is punctuation (anything except letters, numbers, and _) to the left and/or to the right of it. E.g. 1 + 2 and 1+2 are the same, but return a; and returna; are not.
Exceptions:
Whitespace is not ignored inside string literals, obviously. Nor in #include file names.
Operators consisting of more than one punctuation character can't be split by whitespace, e.g. cout < < 1 is illegal. The same applies to things like // and /* */.
A space between punctuation might be necessary to prevent it from coalescing into a single operator. Examples:
+ +a is different from ++a.
a+++b is equivalent to a++ +b, but not to a+ ++b.
Pre-C++11, closing two template argument lists in a row required a space: std::vector<std::vector<int> >.
When defining a function-like macro, the space is not allowed before the opening parenthesis (adding it turns it into an object-like macro). E.g. #define A() replaces A() with nothing, but #define A () replaces A with ().
I am printing a line like this
cout<<"Hello //stackoverflow";
And this produces the following output
Hello //stackoverflow
I want to know why it does not give me an error, since I seem to have commented out half of the statement, so there should be a
missing terminating " character
error.
The grammar of C++ (like that of most programming languages) is context-sensitive. Simply put, // does not start a comment if it is within a string literal.
For an in depth analysis of this, you'd have to refer to the language grammar, and the string literal production rules in particular.
Informally speaking, the fact that // appears in the quoted string literal means that it does not denote a comment block. The same applies to /* and */.
The converse applies to other constructs, where maximal munch forces the characters to be read as the token that starts a comment block; for example, a space is needed between / and the pointer dereference operator * in
#include <iostream>
using namespace std;
int main() {
    int n = 1;
    int* p = &n;
    cout << 1 / *p; // Without the space, "/*" would start a comment and compilation would fail.
}
In simple terms, this is because everything inside quotes is recognized as a string literal, so the compiler does not treat // as the start of a comment.
I have been tasked with a project that involves taking a grammar (in BNF form) and creating a lexical scanner (using lex) and a parser (using bison). I've never worked with any of these programs, and I think a good reference would be to see how these items are created from a grammar. I am looking for a grammar and its associated .l and .ypp files, preferably in C++. I've been able to find sample files or sample grammars, but not both together. I've spent some time searching and could not find anything. I figured I'd post here in the hope that someone has something for me, but I will continue searching in the meantime.
I am currently reading Tom Niemann's lex & yacc tutorial (http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf), which seems to be pretty well written and understandable.
Thanks
Edit: I am still searching, I am starting to think that what I am looking for does not exist. Google usually never fails me!
Edit 2: Maybe if I provide some of the grammar, you folks could show me what the appropriate .l and .ypp files would look like. This is just a snippet of the grammar, I just need a little 'taste' of how this works and I think I can take it from there.
Grammar:
Program ::= Compound
Statements ::= Compound | Assignment | ...
Assignment ::= Var ASSIGN Expression
Expression ::= Var | Operator Expression Expression | Number
Compound ::= START Statements END
Number ::= NUMBER
Descriptions:
ASSIGN is the assignment symbol ":="
Var is an identifier that begins with a lower case letter and is followed by lower case letters or digits
START is the "start" keyword
END is the "end" keyword
Operator is "+", "-", "*", "/"
Number is decimal digits which could potentially be negative (minus sign in front)
Most of this is fairly straightforward. One part, however, is decidedly problematic. You've defined a number to (potentially) include a leading -, and that's a problem.
The problem is pretty simple. Given an input like 321-123, it's essentially impossible for the lexer (which won't normally keep track of the current state) to guess whether that's supposed to be two tokens (321 and -123) or three (321, -, 123). In this case, the - is almost certainly intended to be separate from the 123, but if the input were 321 + -123 you'd apparently want -123 as a single token instead.
To deal with that, you probably want to change your grammar so the leading - isn't part of the number. Instead, you always want to treat the - as an operator, and the number itself is composed solely of the digits. Then it's up to the parser to sort out expressions where the - is unary vs. binary.
Taking that into account, the lexer file would look something like this:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%option noyywrap
%%
":="            { return ASSIGN; }
"start"         { return START; }
"end"           { return END; }
[+/*]           { return OPERATOR; }
"-"             { return MINUS; }
[0-9]+          { return NUMBER; }
[a-z][a-z0-9]*  { return VAR; }
[ \t\r\n]+      { /* skip whitespace */ }
%%
void yyerror(char const *s) { fprintf(stderr, "%s\n", s); }
The matching yacc file would look something like this:
%{
#include <stdio.h>
int yylex(void);
void yyerror(char const *s);
%}
%token ASSIGN START END OPERATOR MINUS NUMBER VAR
%left OPERATOR MINUS
%%
program : compound
        ;
statement : compound
          | assignment
          ;
assignment : VAR ASSIGN expression
           ;
statements : /* empty */
           | statements statement
           ;
expression : VAR
           | expression OPERATOR expression
           | expression MINUS expression
           | value
           ;
value : NUMBER
      | MINUS NUMBER
      ;
compound : START statements END
         ;
%%
int main(void) {
    yyparse();
    return 0;
}
Note: I've tested these only minimally: enough to verify that input I believe is grammatical, such as start a:=1 b:=2 end and start a:=1+3*3 b:=a+4 c:=b*3 end, is accepted (no error message printed), and that input I believe is ungrammatical, such as 9:=13 and a=13, does print a syntax error message. Since this does nothing with the expressions beyond recognizing which ones are or are not grammatical, that's about the best we can do.
I came across unexpected (to me at least) C++ behavior today, shown by the following snippet:
#include <iostream>
int main()
{
std::cout << ("1", "2") << std::endl;
return 0;
}
Output:
2
This works with any number of strings between the parentheses. Tested on the Visual Studio 2010 compiler as well as on codepad.
I'm wondering why this compiles in the first place, what is the use of this 'feature'?
Ahh, this is the comma operator. When you use a comma between two (or more) expressions, all of them are evaluated from left to right, and the result as a whole is the result of the last expression. That is why you get "2" as a result.
It's called the comma operator: in an expression x, y, the compiler
first evaluates x (including all side effects), then y; the results
of the expression are the results of y.
In the expression you cite, it has absolutely no use; the first string
is simply ignored. If the first expression has side effects, however,
it could be useful. (Mostly for obfuscation, in my opinion, and it's
best avoided.)
Note too that this only works when the comma is an operator. If it can
be anything else (e.g. punctuation separating the arguments of a
function), it is. So:
f( 1, 2 ); // Call f with two arguments, 1 and 2
f( (1, 2) ); // Call f with one argument, 2
(See. I told you it was good for obfuscation.)
Comma operator ( , )
The comma operator (,) is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the rightmost expression is considered.
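For example, the following code:
a = (b=3, b+2);
would first assign the value 3 to b, and then assign b+2 to a. So, at the end, a would contain the value 5 while b would contain 3.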
Ref: http://www.cplusplus.com/doc/tutorial/operators/
The result of the comma (",") is the right subexpression.
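I use it in loops over STL containers:
for( list<int>::iterator it = mylist.begin(), it_end = mylist.end(); it != it_end; ++it )
...
(Strictly speaking, the comma in that declaration separates two declarators rather than being the comma operator; the operator shows up when the increment part does two things, as in ++it, ++n.)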
The comma operator evaluates the expressions on both sides of the comma, but returns the result of the second.
#define PR ( A, B ) cout << ( A ) << ( B ) << endl ;
- error -> A was not declared in scope
- error -> B was not declared in scope
- error -> expected "," before "cout"
I thought C++ was a "space free" language (one where whitespace doesn't matter), but when I write the code above, I get these errors.
I am still wondering whether my console or the library is not working properly.
If I am not wrong, how can someone say "C++ is a space free language"?
There are numerous exceptions where whitespace matters; this is one of them. With the space after PR, how is the preprocessor supposed to know whether (A,B) is part of the macro expansion, or its arguments? It doesn't, and simply assumes that wherever it sees PR, it should substitute ( A, B ) cout << ( A ) << ( B ) << endl ;.
Another place where whitespace matters is in nested template arguments, e.g.:
std::vector<std::vector<int> >
Before C++11 that final space is mandatory; otherwise the compiler treats >> as the right-shift operator. C++11 (then called C++0x) fixed this, so std::vector<std::vector<int>> is accepted in modern C++.
Yet another example is:
a + +b;
The space in between the two + symbols is mandatory, for obvious reasons.
You can't have a space between the macro-function-name and the parenthesis starting the argument list.
#define PR(A, B) cout << ( A ) << ( B ) << endl
Whitespace in the form of a newline also matters, because a #define directive ends when the preprocessor reaches the end of the line.
Note that it's usually a bad idea to put semicolons at the end of function-like macro definitions; it makes uses of the macro look confusing when they are written without a trailing semicolon.
A #define is not C++ proper; it's a preprocessor directive. The rules of C++ aren't the same as the rules of the preprocessor.
To indicate a macro, you mustn't have a space between the name and the parenthesis.
#define PR(A, B) cout << ( A ) << ( B ) << endl;
You're asking for a defense of a claim I've never heard anyone bother to voice...?
The preprocessor stage doesn't follow the same rules as the later lexing etc. stages. There are other quirks too: the need (before C++11) for a space between the > characters closing nested templates, newline-delimited comments, string literals that can't embed actual newlines (as distinct from escape sequences for them), and spaces inside character and string literals being part of their value....
Still, there's a lot of freedom to indent and line-delimit the code in different ways, unlike in say Python.
You can think of C++ preprocessor directives as instructions to the preprocessor (part of the compiler), not exactly part of the "C++ space". So the rules are indeed different, although many concepts are shared between the two 'spaces'.