vim indentation braces inside parentheses - c++

In Vim (e.g. 7.3), how can I use or modify the cindent or smartindent options, or otherwise augment my .vimrc, so that curly braces inside open parentheses are automatically indented to align with the first "word" (defined below) directly preceding the opening (?
The fN option seems promising, but appears to be overridden by the (N option when inside open parentheses. From :help cinoptions-values:
fN Place the first opening brace of a function or other block in
column N. This applies only for an opening brace that is not
inside other braces and is at the start of the line. What comes
after the brace is put relative to this brace. (default 0).
cino=          cino=f.5s      cino=f1s
func()         func()         func()
{                {                {
    int foo;         int foo;         int foo;
Current behavior:
func (// no closing )
      // (N behavior, here N=0
      { // (N behavior overrides fN ?
        int foo; // >N behavior, here N=2
while I wish for:
func (// no closing )
      // (N behavior as before
{ // desired behavior
  int foo; // >N behavior still works
What I am asking for is different from fN because fN aligns to the prevailing indent, and I want to align to any C++ nested-name-specifier that directly precedes the opening (, like
code; f::g<T> (
      {
instead of
code; f::g<T> (
{
If there is no nested-name-specifier, I'd like it to match the ( itself. Perhaps matching a nested-name-specifier is too complicated, or maybe there is another part of the grammar that is more appropriate for this scenario. Anyway, for my typical use case, I think I'd be satisfied if the { aligned with the first non-whitespace character of the maximal sequence of characters to the left of the innermost unclosed (, inclusive, that does not contain any semicolons or curly braces.
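To make the simplified rule concrete, here is a small C++ sketch that computes the column where the { should go, given the text of the line containing the open (. It only illustrates the rule; an actual Vim solution would have to be written in Vimscript (for example as an 'indentexpr'). The function name desiredBraceColumn is hypothetical, it treats ;, { and } as delimiters, and it ignores parentheses inside string literals or comments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the line that contains an unclosed '(',
// return the column where a following '{' should be placed, or -1 if
// the line has no unclosed '('. Tabs count as one column here.
long desiredBraceColumn(const std::string& line) {
    // Find the innermost unclosed '(' with a simple stack of positions.
    std::vector<std::size_t> open;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '(') {
            open.push_back(i);
        } else if (line[i] == ')' && !open.empty()) {
            open.pop_back();
        }
    }
    if (open.empty()) return -1;
    const std::size_t paren = open.back();

    // Extend left from the '(' until a ';', '{' or '}' (or line start).
    std::size_t start = paren;
    while (start > 0) {
        const char c = line[start - 1];
        if (c == ';' || c == '{' || c == '}') break;
        --start;
    }

    // Skip leading whitespace to reach the first non-whitespace character.
    while (start < paren && (line[start] == ' ' || line[start] == '\t')) ++start;
    return static_cast<long>(start);
}
```

For example, desiredBraceColumn("code; f::g<T> (") returns 6 (the column of f), and desiredBraceColumn("func (") returns 0.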
By the way, I arrived at this while trying to autoindent various std::for_each(b,e,[]{}); constructs in Vim 7.3. Thanks for your help!

Not sure that any of the {auto,smart,c}indent features could be finagled to do what you want. I made up a mapping which might give some inspiration:
inoremap ({ <esc>T<space>y0A<space>(<cr><esc>pVr<space>A{
Downsides are that you may need to do something smarter than 'T' to get back to the beginning of the last identifier (you could use '?' with a regex), that it trashes your default register, and that if your identifier before the paren is at the start of the line you have to do '({' yourself. The notion is to jump back to just before the identifier, copy to the beginning of the line, paste that to the next line, and replace every character with a space.
Good luck!

Related

What happens with extra spaces and newlines in C/C++ code?

Is there a difference between
int main(){
return 0;
}
and
int main(){return 0;}
and
int main(){
return
0;
}
They will all likely compile to the same executable. How does the C/C++ compiler treat the extra spaces and newlines, and are newlines treated differently from spaces in C code?
Also, how about tabs? What's the significance of using tabs instead of spaces in code, if there is any?
Any sequence of one or more whitespace characters (space, line break, tab, ...) is equivalent to a single space.
Exceptions:
Whitespace is preserved in string literals. They can't contain line-breaks, except C++ raw literals (R"(...)"). The same applies to file names in #include.
Single-line comments (//) are terminated with line-breaks only.
Preprocessor directives (starting with #) are terminated with line-breaks only.
\ followed by a line break removes both, allowing multi-line // comments, preprocessor directives, and string literals.
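A small C++11 illustration of these exceptions (the variable and macro names are made up for the example): whitespace inside a string literal is kept, a raw string literal may contain a real line break, and a backslash at the very end of a line splices the next line onto it, both inside a string literal and in a multi-line #define.

```cpp
#include <cstring>
#include <string>

// Whitespace inside an ordinary string literal is preserved: 5 characters.
const char* spaced = "a   b";

// A C++11 raw string literal may contain a real line break,
// which becomes a '\n' character in the string (11 characters).
const char* raw = R"(line1
line2)";

// Backslash immediately followed by a line break removes both,
// so this is the same literal as "abcdef".
const char* spliced = "abc\
def";

// The same splicing lets a preprocessor directive span several lines.
#define SQUARE(x) \
    ((x) * (x))
```

Note that the continuation line of spliced must start in column 0; any leading spaces there would become part of the string.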
Also, whitespace symbols are ignored if there is punctuation (anything except letters, numbers, and _) to the left and/or to the right of it. E.g. 1 + 2 and 1+2 are the same, but return a; and returna; are not.
Exceptions:
Whitespace is not ignored inside string literals, obviously. Nor in #include file names.
Operators consisting of more than one punctuation character can't be separated, e.g. cout < < 1 is illegal. The same applies to tokens like // and /* */.
A space between punctuation might be necessary to prevent it from coalescing into a single operator. Examples:
+ +a is different from ++a.
a+++b is equivalent to a++ +b, but not to a+ ++b.
Pre-C++11, closing two template argument lists in a row required a space: std::vector<std::vector<int> >.
When defining a function-like macro, the space is not allowed before the opening parenthesis (adding it turns it into an object-like macro). E.g. #define A() replaces A() with nothing, but #define A () replaces A with ().
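These tokenization rules can be checked with a short C++ snippet (the function and macro names are invented for illustration). The lexer always takes the longest token it can ("maximal munch"), which is why a+++b splits as a++ + b.

```cpp
#include <vector>

// Lexed as (a++) + b: the lexer grabs "++" first, then "+".
int munch(int a, int b) {
    return a+++b;
}

// Two unary pluses; without the space, "++a" would be a pre-increment.
int doublePlus(int a) {
    return + +a;
}

int seven() { return 7; }

// Function-like macro: no space before '(' and an empty replacement list.
#define FN_LIKE() /* expands to nothing */
// Object-like macro: the space makes "()" its replacement text.
#define OBJ_LIKE ()

// Pre-C++11 compilers required the space in "> >"; C++11 also accepts ">>".
std::vector<std::vector<int> > grid(2, std::vector<int>(3, 0));
```

For instance, seven OBJ_LIKE expands to seven () and evaluates to 7, while FN_LIKE() disappears entirely.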

How can I use regular expressions to find all lines of source code that declare a function with default arguments?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a character that is not an equals sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
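The first pattern can be tried out with C++'s std::regex, whose default ECMAScript grammar supports \w and \s. This is a minimal sketch with a hypothetical function name. Because the pattern expects a type before each parameter name, untyped Python parameters such as def f(a, b=1) will not match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the line contains a parenthesized list with
// at least one "type name = value" parameter, per the pattern above.
bool hasDefaultArgument(const std::string& line) {
    static const std::regex pattern(R"re(\([^)]*\w+\s+\w+\s*=[^),][^)]*\))re");
    return std::regex_search(line, pattern);
}
```

The custom raw-string delimiter re avoids any clash between the pattern's parentheses and the literal's own )" terminator.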
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.
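Group 1 of the second pattern can be extracted in C++ with std::smatch. This is a sketch under the same assumptions (hypothetical function name). Note that this pattern does not itself require an =, so it also matches parameter lists without defaults, such as (int a, int b).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the parameter list captured as group 1,
// or an empty string when the pattern does not match.
std::string parameterList(const std::string& line) {
    static const std::regex pattern(R"re(\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\))re");
    std::smatch m;
    if (std::regex_search(line, m, pattern)) return m[1].str();
    return std::string();
}
```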

Comment pattern match in flex using states

I am trying to match a single-line comment pattern in flex. The comment could look like this:
//this is a single /(some random stuff) line comment
Or it could be like this:
// this is also a comment\
continuation of the comment from previous line
From the example it's obvious that I have to handle the multi-line case too.
Now my approach was using states. This is what I have so far:
"//" {
yymore();
BEGIN (SINGLE_COMMENT);
}
<SINGLE_COMMENT>([^{NEWLINE}]|\\[(.){NEWLINE}]) {
yymore();
}
<SINGLE_COMMENT>([^{NEWLINE}]|[^\\]{NEWLINE}) {
logout << "Line no " << line_count << ": TOKEN <COMMENT> Lexeme " << string(yytext) << "\nfound\n\n";
BEGIN (INITIAL);
}
NEWLINE is declared as:
NEWLINE \r?\n
My declaration unit:
%option noyywrap
%x SINGLE_COMMENT
int line_count = 1;
const int bucketSize = 10; // change if necessary
ofstream logout;
ofstream tokenout;
SymbolTable symbolTable(bucketSize);
Action of NEWLINE:
{NEWLINE} {
line_count++;
}
If I run it with the following input:
// hello\
int main
This is my log file:
Line no 1: TOKEN <COMMENT> Lexeme // hello\
found
Line no 1: TOKEN <INT> Lexeme int found
Line no 1: TOKEN <ID> Lexeme main found
ScopeTable # 1
6 --> < main , ID >
So it's not catching the multi-line comment. Also, line_count is not incremented; it stays the same. Can anybody help me figure out what I have done wrong?
In (f)lex, as in most regular expression engines, [ and ] enclose a character class description. A character class is a set of individual characters, and it always matches exactly one character which is a member of that set. There are also negated character classes which are written the same way except that they start with [^ and match exactly one character which is not a member of the set.
Character classes are not the same as sequences of characters:
ab matches an a followed by a b
[ab] matches either an a or a b
Since character classes are just sets of characters, it is meaningless for the individual characters in the class to be repeated or optional, etc. Consequently, almost no regular expression operators (*, +, ?, etc.) are meaningful inside a character class. If you put one of them in a character class expression, it is handled just like an ordinary character:
a* matches 0 or more a's
[a*] matches either an a or a *
One of the features flex provides which is not provided by most other regular expression systems is macro expansions, of the form {name}. Here the { and } indicate the expansion of a defined macro, whose name is contained between the braces. These characters are also not special inside a character class:
{identifier} matches whatever the expanded macro named identifier would match.
[{identifier}] matches a single character which is {, } or one of the letters definrt
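These character-class rules are not specific to flex. As a quick check, C++'s std::regex (ECMAScript grammar) likewise treats *, { and } as ordinary characters inside a bracket expression. It has no {name} definitions, so the last case only shows how the bracketed text is read. The helper name fullMatch is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
//   fullMatch("ab", "ab")              -> true   (sequence: 'a' then 'b')
//   fullMatch("a", "[ab]")             -> true   (class: one of 'a' or 'b')
//   fullMatch("aaa", "a*")             -> true   ('*' repeats 'a')
//   fullMatch("aaa", "[a*]")           -> false  (a class matches one char)
//   fullMatch("*", "[a*]")             -> true   ('*' is just a class member)
//   fullMatch("n", "[{identifier}]")   -> true   (braces and letters are members)
bool fullMatch(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```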
Macro definitions seem to be overused by beginners. My advice is always to avoid them, and thereby avoid the confusion which they create.
It's also worth noting that (f)lex does not have an operator which negates a subpattern. Only character classes can be negated; there is no easy way to write "match anything other than foo". However, you can generally rely on the first longest-match rule to effectively implement negations: if some pattern p executes, then there cannot be any pattern which would match more than p. Thus, it might not be necessary to explicitly write the negation.
For example, in your comment detector where the only real issue is dealing with carriage return (\r) characters which are not followed by newline characters, you could use (f)lex's pattern matching algorithm to your advantage:
<SINGLE_COMMENT>{
[^\\\r\n]+ ;
\\\r?\n { ++line_count; }
\\. ; /* only matches if the above rule doesn't */
\r?\n { ++line_count; BEGIN(INITIAL); }
\r ; /* only matches if the above rule doesn't */
}
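The same rules can be transliterated into a hand-written C++ scanner, which may make the order of the cases easier to follow. This is a hypothetical sketch, not flex output: it assumes the whole input is in a std::string and that pos points just past the opening //. Explicit lookahead here does the job that flex's longest-match rule does in the rules above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: scan a // comment in `src` starting at `pos` (just
// past the "//"). Returns the index just past the line break that ends the
// comment (or src.size() at end of input). Every line break consumed,
// including escaped ones, increments `lineCount`.
std::size_t scanLineComment(const std::string& src, std::size_t pos, int& lineCount) {
    while (pos < src.size()) {
        const char c = src[pos];
        if (c == '\\') {
            // Like \\\r?\n : backslash, optional CR, LF continues the comment.
            std::size_t next = pos + 1;
            if (next < src.size() && src[next] == '\r') ++next;
            if (next < src.size() && src[next] == '\n') {
                ++lineCount;
                pos = next + 1;
                continue;
            }
            // Like \\. : a backslash followed by any other character.
            pos += (pos + 1 < src.size()) ? 2 : 1;
        } else if (c == '\n') {
            // Like \r?\n (LF only): the comment ends here.
            ++lineCount;
            return pos + 1;
        } else if (c == '\r' && pos + 1 < src.size() && src[pos + 1] == '\n') {
            // Like \r?\n (CR LF): the comment ends here.
            ++lineCount;
            return pos + 2;
        } else {
            // Like [^\\\r\n]+ and the lone \r rule: ordinary comment text.
            ++pos;
        }
    }
    return pos;
}
```

With the question's input, scanning "// hello\\\nint main\n" from index 2 consumes both lines and counts two line breaks.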
By the way, it's usually much easier to provide %option yylineno than to try to track newlines manually.

Word 'if' interpreted as 'if()' function call. Parens not allowed

So I discovered that writing an if statement with parentheses in Perl 6 results in it throwing this error at me:
===SORRY!===
Word 'if' interpreted as 'if()' function call; please use whitespace instead of parens
at C:/test.p6:8
------> if<HERE>(True) {
Unexpected block in infix position (two terms in a row)
at C:/test.p6:8
------> if(True)<HERE> {
This makes me assume that there is some sort of if() function? However, creating and running a script with if(); in it produces the following compiler error:
===SORRY!===
Undeclared routine:
if used at line 15
So like what's the deal?
I read at https://en.wikibooks.org/wiki/Perl_6_Programming/Control_Structures#if.2Funless that parens are optional, but that seems not to be the case for me.
My if statements do work without parens; I'm just wondering why it would stop me from using them, or why it would think that if is a subroutine because of them.
EDIT: Well aren't I a bit daft... looks like I wasn't reading well enough at the link I linked which I assume is why you are confused. The link I linked points out the following which was basically what I was asking:
if($x > 5) { # Calls subroutine "if"
}
if ($x > 5) { # An if conditional
}
I've accepted the below answer as it does provide some insight.
Are you sure you created a sub named 'if'? If so (no pun intended), you get the keyword when you put a space after the literal 'if', and your pre-declared function when you put a paren right after it. That is, if your use of the term looks like a function call and you have declared such a function, it will call it:
user@localhost:~$ perl6
> sub if(Str $s) { say "if sub says: arg = $s" };
sub if (Str $s) { #`(Sub|95001528) ... }
> if "Hello World";
===SORRY!=== Error while compiling <unknown file>
Missing block
at <unknown file>:1
------> if "Hello World"⏏;
expecting any of:
block or pointy block
> if("Hello World");
if sub says: arg = Hello World
>
> if 12 < 16 { say "Excellent!" }
Excellent!
>
As you can see above, I've declared a function called 'if'.
if "Hello World"; errors because the space means I'm using the keyword, and the keyword requires a block, hence the syntax error.
if("Hello World") successfully calls the pre-declared function.
if 12 < 16 { say "Excellent!" } works correctly because the space means 'if' is interpreted as the keyword, and this time there is no syntax error.
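The rule described here (an identifier immediately followed by ( is parsed as a sub call, while whitespace after if gives the keyword) can be modelled with a toy C++ function that looks only at the character right after the word if. This is a hypothetical illustration of the decision, not how the Perl 6 compiler's parser is actually implemented.

```cpp
#include <cctype>
#include <string>

enum class IfParse { Keyword, SubCall, NotIf };

// Hypothetical toy model: decide how source text starting with "if" is read,
// using only the character that follows the word.
IfParse classifyIf(const std::string& src) {
    // Need "if" plus at least one following character to decide.
    if (src.size() < 3 || src.compare(0, 2, "if") != 0) return IfParse::NotIf;
    const unsigned char next = static_cast<unsigned char>(src[2]);
    if (next == '(') return IfParse::SubCall;         // if(...)  -> call sub "if"
    if (std::isspace(next)) return IfParse::Keyword;  // if (...) -> conditional
    return IfParse::NotIf;                            // e.g. "iffy": another identifier
}
```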
So again, are you sure you have (or better - can you paste here) your pre-declared 'if' function?
The reference for keywords and whitespace (which coincidentally uses the keyword 'if' as an example!) is here: S02 - Keywords and whitespace

Why is the flex regex being skipped?

I can't, for the life of me, figure out what's wrong with my regex's.
What I'd like to tokenize are two types of strings, both of which are contained on a single line. One string can be anything (other than a newline), and the other any alphanumeric (ASCII) character plus the literals '_', '/', '-', and '.'.
The snippet of flex code is:
nl \n|\r\n|\r|\f|\n\r
...
%%
...
\"[^\"]+{nl} { frx_parser_error("Label is missing trailing double quote."); }
\"[a-zA-Z0-9_\.\/\-]+\" {
if (yyleng > 1024) frx_parser_error("File name too long.");
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_FILENAME: %s\n", frx_parser_lval.str);
return (TOSP_FILENAME);
}
\"[^{nl}]+\" {
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_IDENTIFIER:\n%s\n", frx_parser_lval.str);
return (TOSP_IDENTIFIER);
}
And when I run the parser, the fprintf's spit this out:
TOSP_FILENAME: ModStar-Picture-Analysis.txt
TOSP_FILENAME: ModStar-Rubric.log.txt
TOSP_IDENTIFIER:
picture-A"
Progress (26,255) camera 'C' root("picture-C-
Syntax (line 34): syntax error
For whatever reason, the quote after picture-A is being ... missed. Why? I checked the ASCII values at the eight locations where the double-quote character appears, and they're all 0x22.
If I add some characters to the end of the "picture-A" it can work sometimes; adding ".par", ".pbr" doesn't work as expected, but ".pnr" does.
I've even added a specific non-regexy token:
\"picture-A\" { frx_parser_lval.str = strdup("picture-A"); return TOSP_FILENAME; }
to the lex file and it gets skipped.
I'm using flex 2.5.39, no flex libraries, one option (%option prefix=frx_parser_) in the lex file and the flex command line is:
flex -t script-lexer.l > script-lexer.c
What gives?
EDIT: I need to test this on the actual system, but unit tests show this tokenizer (based on rici's answer) to be much more robust:
nl \n|\r\n|\r|\f|\n\r
...
%%
...
["][^"]+{nl} { printf("Missing trailing quote.\n%s\n",yytext); }
["][[:alnum:]_./-]+["] { printf("File name:\n%s\n",yytext); }
["][^"]+["] { printf("String:\n%s\n",yytext); }
EDIT: The rule ["].+["] swallowed multiple consecutive strings as one big string, so it was changed to ["][^"]+["].
The problem is your pattern:
\"[^{nl}]+\"
You're attempting to expand a definition inside a character class, but that is not possible; inside a character class, { is always just a {, not a flex operator. See the flex manual:
Note that inside of a character class, all regular expression operators lose their special meaning except escape (‘\’) and the character class operators, ‘-’, ‘]]’, and, at the beginning of the class, ‘^’.
A definition is not a macro. Rather, a definition defines a new regular expression operator.
As a consequence of the above, you can write [^\"] as simply [^"] and \"[a-zA-Z0-9_\.\/\-]+\" as \"[a-zA-Z0-9_./-]+\" (The - needs to be either at the end or at the beginning.) Personally, I'd write the second pattern as:
["][[:alnum:]_./-]+["]
But everyone has their own style.
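The effect of the broken pattern can be reproduced with C++'s std::regex, which, like flex, treats { and } literally inside a bracket expression. This is a sketch with a hypothetical helper and sample input. std::regex_search returns the leftmost match (extended greedily) rather than flex's longest match across all rules, but for a single pattern on this input the result is the same: [^{nl}] excludes only {, n, l and }, so it runs past the closing quote and the line break. That also explains why appending ".pnr" helped: the n stops the run.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern`, or an empty string if there is no match.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) return m.str(0);
    return std::string();
}

// Sample input modelled on the question: a quoted name, a line break,
// then more text containing another quoted string.
const std::string sample = "\"picture-A\"\nProgress \"x\"";
```

With "\"[^{nl}]+\"" the match is the whole sample, while "\"[^\"\r\n]+\"" stops at "picture-A" with its quotes.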