PHP, C++, etc. syntax explanation - if-statement

Why do most programming languages require a semicolon after statements, but not after things like if, elseif and else?
Do the compilers all look out for newlines? If that's true then why wouldn't they do that for all statements?
Am I missing something? It really doesn't make sense to me...

Usually the semicolon is required because the compilers ignore most whitespace. It's not needed after statements like if, elseif, else because it's not the end of the statement. Those are only complete statements when followed by a statement or block of statements.

Because compilers for those languages don't consider whitespace for statement termination. A statement has to be terminated somehow, and it's done with a semicolon. You can write all of your code on one line (although it would be a horrible, horrible idea) as long as you terminate statements correctly.

Some compilers ignore whitespace and use the semicolon to determine statements, like variable assignments vs. if {} code blocks.
Other languages, like Python, use whitespace to find statements and if blocks.

It relates to the difference between statements and expressions. Languages like C require a block to contain a series of statements. Control flow structures like if and while are statements all on their own:
void foo() {
if (bar) {
/* ... */
}
while (baz) {
/* ... */
}
}
There are no semicolons needed here because everything inside foo() is a statement. The question is, what if you want an expression like bar() in a place where a statement is expected? The C grammar says you can do that by adding a semicolon after the expression (in other words, an expression followed by ; is a statement).
So, this isn't valid:
void foo() {
1 + 2
}
Because 1 + 2 is an expression. You need to turn it into a statement:
void foo() {
1 + 2;
}
To fully understand what's going on, it's important to note that a block (something in curlies like { foo(); }) is also a statement, and that the grammar for if is something like:
if ( <condition> ) <statement> (else <statement>)?
This is why if statements can have blocks for bodies or single statements: a block is a single statement.
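As a minimal sketch of that grammar rule in C (the function names here are made up for illustration), the three functions below show a single-statement body, a block body, and what happens when a stray semicolon becomes the body. Some compilers may warn about the empty body in the last one, which is rather the point.

```c
#include <assert.h>

/* The body of each branch is one expression statement (expression + ';'). */
int classify_single(int x) {
    int result = 0;
    if (x > 0)
        result = 1;
    else
        result = -1;
    return result;
}

/* The body of each branch is a block; a block counts as one statement. */
int classify_block(int x) {
    int result = 0;
    if (x > 0) {
        result = 1;
    } else {
        result = -1;
    }                      /* no ';' needed: the block already ends the if */
    return result;
}

/* A lone ';' is an empty statement, so it becomes the whole body of the if. */
int empty_body_trap(int x) {
    int result = 0;
    if (x > 0);            /* the if statement ends at this ';' */
    {
        result = 1;        /* this separate block runs for every x */
    }
    return result;
}

/* Small self-check tying the three forms together. */
void check_if_forms(void) {
    assert(classify_single(5) == classify_block(5));
    assert(classify_single(-5) == classify_block(-5));
    assert(empty_body_trap(-5) == 1);
}
```

Calling check_if_forms() passes silently: the first two functions behave identically, while the third returns 1 even for negative input because the semicolon, not the block, is the body of the if.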

Why use ';' rather than '\n' to terminate a statement.
Very few languages use white space as something that affects the semantics of the language (a new line being white space). This is because it is very error-prone for the developer (as the white space is, for all intents and purposes, invisible).
Look at languages that do use white space to affect the semantics and you will see the effect on the programmers (special tools are usually built to help align things etc). Lots of bald developers (male and female) who have torn out their hair in frustration because they inserted a space instead of a tab and this caused the loop to be exited early. :-)
Also by not using white space to alter semantic meaning of the language you can now use white space solely for the human readers of the language to try and format the code into a form that makes it easy to read (for the human) (or you can abuse the white space to hide meaning).
Why do not all statements need a ';'
It's just that you need a way to detect the end of a statement.
For some things it is obvious where the end of the statement is, and you do not need an extra delimiter to allow the parser to detect it.

Source code is a set of statements. We have to delimit the statements, using delimiters. If we use the newline as the delimiter, we can't structure our code. Very long lines will only be readable by scrolling (to avoid scrolling, long lines are usually split). For example:
ParserUtils.RefreshProperty(m_cfg.PORTAL_ID, ParserUtils.CreateHashFromUrl(strDetailLinkUrl), Convert.ToInt32(m_cfg.Type), strPrice, strAddress, strStreet, strPostCode, strFeatures, strDescription, strImgFile, strBedrooms, strReception, strBath, strStatus, strLink, strPropType, strOutside, strTenure, strKeywords, strFullText, strContactInfo, m_ieBrowser.URL);
is very ugly and instead of this, we split this line to several lines to make this more readable:
ParserUtils.RefreshProperty(m_cfg.PORTAL_ID,
ParserUtils.CreateHashFromUrl(strDetailLinkUrl),
Convert.ToInt32(m_cfg.Type), strPrice,
strAddress, strStreet, strPostCode, strFeatures, strDescription, strImgFile,
strBedrooms, strReception, strBath, strStatus, strLink, strPropType,
strOutside, strTenure, strKeywords, strFullText, strContactInfo,
m_ieBrowser.URL);
This would be impossible if newline was the delimiter. Ifs, whiles and fors would be a total mess if newline was the delimiter. Consider this code:
for (int i = 0; i < n; i++)
{
    if (i % 2 == 0)
    {
        System.out.println("It can be divided by two");
    }
    else
    {
        System.out.println("It can't be divided by two");
    }
}
If newline was the delimiter instead of the semicolon, this source code would be very ugly:
for (int i = 0
i < n
i++) { if (i % 2 == 0) { System.out.println("It can be divided by two")
} else { System.out.println("It can't be divided by two")
} }
This code is difficult to read. The semicolon also makes logical sense as the delimiter. For example, my wife writes my tasks on a paper (an algorithm) like this:
Buy food(Bread, Meat, Butter),
Go and pay the taxes,
Call your mother, because she wants to talk to you
These tasks are separated by commas, but notice that parameters are separated by commas too. We have to distinguish between a comma as a parameter separator and a comma as a delimiter of tasks, because the computer is not as intelligent as human beings. In conclusion, the separator of tasks has to be a "bigger comma" than the separator of parameters. That's why the delimiter of statements is a semicolon and the delimiter of parameters is a comma.

Related

Why don't we add a semicolon (;) at the end of if/else?

In Rust, I have noticed that everything is an expression except 2 kinds of statements. Every expression that adds ; will become a statement. Rust's grammar wants statements to follow other statements.
So why don't we add ; at the end of an if / else "expression"? This is also an expression, so why don't we do this:
if true {
println!("true");
} else {
println!("false");
};
The most common answer in the discussion is that it looks more like what users coming from other languages expect, and there is no harm in allowing that syntax (since the result type is () thanks to semicolons in the branches).
I guess it is because if/else ends in a block: in other languages like Java or Nginx-Conf, a semicolon is only placed after statements and not after blocks.

Does the position of brackets and whitespace affect compiler times and/or run-time?

Often while looking at other people's code, I notice a variance in bracket placement for blocks.
For instance, some use:
int foo(){
...
}
Whereas others use:
int foo()
{
...
}
And a number of ways in between. Does this at all affect how fast the code is compiled? For instance, if I were to have a series of blocks such as:
int foo() { ... {... {... {... {...} } } } }
int bar()
{
...
{
...
{
...
{
...
{
...
}
}
}
}
}
Where foo() and bar() are identical except for whitespace and bracket placement. Would the functions take different times to compile? Would one be faster at runtime than the other?
Would this be any different if this were to be expanded to several hundred or thousand nested blocks? Does this change based on the compiler used? Would it change for different languages, such as C#, PHP, Perl, etc?
Sorry if this seems like a lot of general or open-ended questions, just something that's always interested me.
Would the functions take different times to compile? Would one be faster at runtime than the other? Would this be any different if this were to be expanded to several hundred or thousand nested blocks? Does this change based on the compiler used? Would it change for different languages, such as C#, PHP, Perl, etc?
No. No. No. No. No. Virtually all sane compilers strip out whitespace almost immediately in the lexing phase. The other phases don't even know that there was whitespace.
The only way in which this could make any difference is the most hideously incompetently written compiler ever, and even then I'd be amazed (also a bug of this magnitude would make it so buggy it would be completely unusable).
The first thing a compiler will do is perform lexical analysis to strip whitespace, comments, etc and transform the input into a series of tokens.
The full process, depending on the specific implementation, typically continues with parsing, semantic analysis, optimization and code generation.
Since the lexer passes a series of tokens to the parser any additional whitespace, bracket positions, etc could potentially slow down only the lexing phase. And even then the difference will not be noticeable unless you had an extreme case, like a GB of whitespace or something crazy like that.
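To make the lexing argument concrete, here is a toy sketch in C. It is not how any real compiler is written: tokenize and same_tokens are hypothetical helpers, identifiers and numbers are treated as runs of alphanumerics or underscores, and every other character is a one-character token. It only shows that two layouts of the same code collapse into the same token stream once whitespace is dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the tokens of `src` into `out`, separated by single spaces, and
 * drops all original whitespace. Returns the number of tokens, or -1 if
 * `out` is too small. */
int tokenize(const char *src, char *out, size_t out_size) {
    size_t n = 0;
    int count = 0;
    assert(src != NULL && out != NULL && out_size > 0);
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (count > 0) {                       /* separator between tokens */
            if (n + 1 >= out_size) return -1;
            out[n++] = ' ';
        }
        if (isalnum((unsigned char)*src) || *src == '_') {
            /* identifier, keyword or number: take the whole run */
            while (isalnum((unsigned char)*src) || *src == '_') {
                if (n + 1 >= out_size) return -1;
                out[n++] = *src++;
            }
        } else {
            /* punctuation: one character per token */
            if (n + 1 >= out_size) return -1;
            out[n++] = *src++;
        }
        count++;
    }
    out[n] = '\0';
    return count;
}

/* Returns 1 if both sources produce the same token stream, 0 otherwise. */
int same_tokens(const char *a, const char *b) {
    char ta[256], tb[256];
    if (tokenize(a, ta, sizeof ta) < 0 || tokenize(b, tb, sizeof tb) < 0)
        return 0;
    return strcmp(ta, tb) == 0;
}
```

With this sketch, same_tokens("int foo(){ return 1; }", "int foo()\n{\n\treturn 1;\n}\n") reports a match, so everything after the lexer would see identical input for both brace styles.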

Coding style advice/rationale(s) for placing spaces in control statements with C++ [closed]

Given the following two coding styles, please specify a rationale (some pros/cons) for why one would possibly be preferable over the other when writing C++ code.
( Please do not answer with "it is not important"; "just stick to one"; etc. The question is specifically about the possible pro/cons (if any) of the two spacing styles below. Thanks. )
// VARIANT A (NO space after control keyword / space before curly brace)
if(condition) {
// ...
}
else if(c2) {
// ...
}
else {
// ...
}
for(int i=0; i<e; ++i) {
// ...
}
...
// vs. VARIANT B (space after control keyword / NO space before curly brace)
if (condition){
// ...
}
else if (c2){
// ...
}
else{
// ...
}
for (int i=0; i<e; ++i){
// ...
}
...
Note: Apart from taste issues, I am asking this because I see both styles in our code-base and would try to get some arguments for which is to be preferred.
A lot of people get very obsessed about where to place the spaces, braces, parentheses, brackets, semicolons and so on whilst at the same time forgetting that the most important thing about source code is that it needs to be understood by another human being.
The compiler couldn't care less about formatting.
After many years in the programming profession, I've come to use this one simple rule:
Is it easy to read and understand what the code is doing?
It doesn't matter how you format the code, if the above condition isn't met, the code is of poor quality.
I have a personal preference to formatting, and I won't say what it is here as it really doesn't matter what it is.
I find it useful having different programmers code in different styles.
There is of course an exception to this rule: Documentation and tutorial examples should always be consistent - you need to get the reader to follow the important elements being shown and not get sidetracked by the formatting.
Since this is probably much about habit and taste, it might be hard to get any concrete arguments for or against, but here's what I think:
Both work reasonably well, but how would it look to have spaces in function calls? Like this:
len = strlen (somePtr);
That's at least one space too many in my opinion.
Having spaces in that example makes things look more like a control statement than a function call, and I think it is useful to make if/else/while/for stand out a bit.
But I realize this is a pretty subjective view. :)
I recently fixed some code where a single-line-if was modified and someone forgot to add some braces.
While reading the code, it seemed to me, that the style ...condition){ is harder to read than the style ...condition) { because the closing ) and opening { are easier to see when separated by a space. (When using Courier New on VS2005 - may be different with different fonts I guess.)
I would also argue that with if( the separation is pretty clear without any added whitespace, especially because in a modern editor the if will very likely be colored differently from the (, but { will likely have the same color.
Here's a short example:
if (pPointer && pPointer->condition(foobar)){
SendEvent(success_foobar);
Log(success_foobar);
}
if (pPointer && pPointer->condition(foo))
SendEvent(success_foo);
Log(success_foo);
if (pPointer && pPointer->condition(bar)){
SendEvent(success_bar);
Log(success_bar);
}
vs. this (which I think makes the missing brace a bit clearer):
if(pPointer && pPointer->condition(foobar)) {
SendEvent(success_foobar);
Log(success_foobar);
}
if(pPointer && pPointer->condition(foo))
SendEvent(success_foo);
Log(success_foo);
if(pPointer && pPointer->condition(bar)) {
SendEvent(success_bar);
Log(success_bar);
}
To sum it up, one may argue that the visual distinction in modern editors is much bigger nowadays, e.g. if and (, so these do not need spaces, while ( and { are often colored the same and are not as visually distinct and therefore a space may be in order.
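Whatever the spacing, the bug in the middle if of both variants is the same. The following C sketch reproduces it with plain counters standing in for SendEvent and Log (the struct and function names are invented for illustration). Compilers that check for misleading indentation may warn about the first function.

```c
#include <assert.h>

struct counters {
    int events;  /* stands in for calls to SendEvent(...) */
    int logs;    /* stands in for calls to Log(...) */
};

/* Braces forgotten: only the first statement belongs to the if. */
struct counters without_braces(int ok) {
    struct counters c = {0, 0};
    if (ok)
        c.events++;
        c.logs++;      /* indented like the body, but always executed */
    return c;
}

/* Braces present: both statements are guarded by the condition. */
struct counters with_braces(int ok) {
    struct counters c = {0, 0};
    if (ok) {
        c.events++;
        c.logs++;
    }
    return c;
}

/* Small self-check: the two versions only differ when the condition fails. */
void check_brace_demo(void) {
    assert(without_braces(1).logs == with_braces(1).logs);
    assert(without_braces(0).logs != with_braces(0).logs);
}
```

When the condition is false, without_braces still records a log entry, which is exactly the kind of mistake a clearly visible opening brace helps to catch during review.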
I personally prefer this style:
if( condition )
oneLiner();
else if( complicatedCondition(someVar, otherVar) )
{
more();
than();
oneLiner();
}
else if( otherCondition ) // I would prefer to put this in one if with '&&' between conditions...
{
if( nestedCondition )
oneLiner();
}
else
{
// ...
}
for( int i = 0; i<e; ++i )
{
// ... even for one-liner loops which occasionally might happen
}
Why?
Braces are aligned
No space wasted on oneliner ifs.
Loops are clearly noted with braces (always)
Only the outer level has spaces before or after the parentheses. Still a mess if there are more than three levels of function calling involved, like some( function(calling(another(function()))) )
a single space after a semicolon and comma for easy location of these types of delimiters.
Nested ifs need braces around them.
This is of course purely personal, and any comments on my style will be ignored flagrantly ;-)
I typically place spaces between all operators. It's just faster to parse visually. Perhaps that matters less if your eyesight is better than mine.
With the advent of modern IDEs and automatic code formatting, this really is neither here nor there. And I wouldn't advise a wholesale change of spaces (or any other formatting changes) for existing code in a VCS. Developers will tend to stick to the formatting they are familiar with, and as long as the code relays the intention well, where a particular space is inserted is totally redundant IMHO. What you will most likely do is annoy people who are used to doing it one way by enforcing something contrived on them...
Focus on the programming problems, not the formatting...

Trouble with printf in Bison Rule

I have tried something like this in my Bison file...
ReturnS: RETURN expression {printf(";");}
...but the semicolon gets printed AFTER the next token, past this rule, instead of right after the expression. This rule was made as we're required to convert the input file to a c-like form and the original language doesn't require a semicolon after the expression in the return statement, but C does, so I thought I'd add it manually to the output with printf. That doesn't seem to work, as the semicolon gets added but for some reason, it gets added after the next token is parsed (outside the ReturnS rule) instead of right when the expression rule returns to ReturnS.
This rule also causes the same result:
loop_for: FOR var_name COLONEQUALS expression TO {printf("%s<=", $<chartype>2);} expression STEP {printf("%s+=", $<chartype>2);} expression {printf(")\n");} Code ENDFOR
Besides the first two printf's not working right (I'll post another question regarding that), the last printf is actually called AFTER the first token/literal of the "Code" rule has been parsed, resulting in something like this:
for (i=0; i<=5; i+=1
a)
=a+1;
instead of
for (i=0; i<=5; i+=1)
a=a+1;
Any ideas what I'm doing wrong?
Probably because the grammar has to look-ahead one token to decide to reduce by the rule you show.
The action is executed when the rule is reduced, and it is very typical that the grammar has to read one more token before it knows that it can/should reduce the previous rule.
For example, if an expression can consist of an indefinite sequence of added terms, it has to read beyond the last term to know there isn't another '+' to continue the expression.
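The lookahead effect can be imitated without Bison. The sketch below is a hand-written C parser for a sum of single digits, not generated code, and parse_sum and its result struct are hypothetical names. It counts how many one-character tokens belong to the expression and how many had to be examined before the parser could tell that the expression was finished.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct parse_result {
    int ok;        /* 1 if a well-formed sum was found */
    int value;     /* value of the sum */
    int consumed;  /* tokens that belong to the expression */
    int examined;  /* tokens the parser had to look at, lookahead included */
};

/* Parses  DIGIT ('+' DIGIT)*  at the start of `tokens`; each character is
 * one token and the terminating '\0' plays the role of end-of-input. */
struct parse_result parse_sum(const char *tokens) {
    struct parse_result r = {0, 0, 0, 0};
    assert(tokens != NULL);

    r.examined = 1;
    if (!isdigit((unsigned char)tokens[0])) return r;
    r.value = tokens[0] - '0';
    r.consumed = 1;

    for (;;) {
        /* Peek at the next token: only now is it clear whether the sum goes on. */
        char lookahead = tokens[r.consumed];
        r.examined = r.consumed + 1;
        if (lookahead != '+') break;

        r.consumed++;                          /* accept the '+' */
        r.examined = r.consumed + 1;
        if (!isdigit((unsigned char)tokens[r.consumed])) return r;  /* ok stays 0 */
        r.value += tokens[r.consumed] - '0';
        r.consumed++;
    }

    /* An action attached to the end of the expression would run here,
     * i.e. after the token following the expression has been read. */
    r.ok = 1;
    return r;
}
```

For the input "1+2x" the expression is three tokens long, but four tokens have been examined by the time it is known to be complete. If the lexer prints each token as it reads it, as in the question, that fourth token is already on the output before any action at the end of the expression gets a chance to print.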
After seeing the Yacc/Bison grammar and Lex/Flex analyzer, some of the problems became obvious, and others took a little more sorting out.
Having the lexical analyzer do much of the printing meant that the grammar was not properly in control of what appeared when. The analyzer was doing too much.
The analyzer was also not doing enough work - making the grammar process strings and numbers one character at a time is possible, but unnecessarily hard work.
Handling comments is tricky if they need to be preserved. In a regular C compiler, the lexical analyzer throws the comments away; in this case, the comments had to be preserved. The rule handling this was moved from the grammar (where it was causing shift/reduce and reduce/reduce conflicts because of empty strings matching comments) to the lexical analyzer. This may not always be optimal, but it seemed to work OK in this context.
The lexical analyzer needed to ensure that it returned a suitable value for yylval when a value was needed.
The grammar needed to propagate suitable values in $$ to ensure that rules had the necessary information. Keywords for the most part did not need a value; things like variable names and numbers do.
The grammar had to do the printing in the appropriate places.
The prototype solution returned had a major memory leak because it used strdup() liberally and didn't use free() at all. Making sure that the leaks are fixed - possibly by using a char array rather than a char pointer for YYSTYPE - is left to the OP.
Comments aren't a good place to provide code samples, so I'm going to provide an example of code that works, after Jonathan (replied above) did some work on my code. All due credit goes to him, this isn't mine.
Instead of having FLEX print any recognized parts and letting BISON do the formatting afterwards, Jonathan suggested that FLEX print nothing and only return tokens to BISON, which should then handle all the printing itself.
So, instead of something like this...
FLEX
"FOR" {printf("for ("); return FOR;}
"TO" {printf("; "); return TO;}
"STEP" {printf("; "); return STEP;}
"ENDFOR" {printf("\n"); printf("}\n"); return ENDFOR;}
[a-zA-Z]+ {printf("%s",yytext); yylval.strV = yytext; return CHARACTERS;}
":=" {printf("="); lisnew=0; return COLONEQUALS;}
BISON
loop_for: FOR var_name {strcpy(myvar, $<strV>2);} COLONEQUALS expression TO {printf("%s<=", myvar);} expression STEP {printf("%s+=", myvar);} expression {printf(")\n");} Code ENDFOR
...he suggested this:
FLEX
[a-zA-Z][a-zA-Z0-9]* { yylval = strdup(yytext); return VARNAME;}
[1-9][0-9]*|0 { yylval = strdup(yytext); return NUMBER; }
BISON
loop_for: FOR var_name COLONEQUALS NUMBER TO NUMBER STEP NUMBER
{ printf("for (%s = %s; %s <= %s; %s += %s)\n", $2, $4, $2, $6, $2, $8); }
var_name: VARNAME

Adding indentation

I'm trying to make a parser for a made-up programming language. I'm now at the part of the exercise where we're required to make sure the parser's output is a conversion in C of the input.
So things like...
STARTMAIN a=b+2; return a ENDMAIN
...must become...
int main () { a=b+2; return a; }
So far so good, almost. The exercise also requires that, at the same time as we convert, we add proper indentation and (as I had to learn the hard way last year) newlines.
The obvious part is that each time a { opens, you increase a counter and then add the appropriate tabs on each new line. However, closing brackets ('}') are a different story, as you can't detect them beforehand, and once you've parsed them, you can't just move them a tab to the left by removing the last tab printed.
Is there a solution to this, and/or a consistent way of checking and adding indentation?
Well, you've now discovered one reason why people do not always bother to format generated output neatly; it is relatively hard to do so.
Indeed, one way to deal with the problem is to provide an official formatter for the language. Google's Go programming language comes with the 'gofmt' program to encourage the official format. C does not have such a standard, hence the religious wars over the placement of braces, but it does have programs such as indent which can in fact format the code neatly for you.
The trick is not to output anything on a line until you know how many tabs to output. So, on a line with a close brace, you decrement the indent counter (making sure it never goes negative) and only then do you output the leading tabs and the following brace.
Note that some parts of C require a semi-colon (or comma) after the close brace (think initializers and structure definitions); others do not (think statement blocks).