Vim regex to match inline comments

Assuming the following sample inline comment:
/*
function newMethodName (int bar, String s) {
int i = 123;
}
s/\<foo\s*(/newMethodName (/g
*/
How would I match and replace so that the block becomes, essentially, uncommented? I got this far before giving up:
:%s/\/\*\(\_.\)*\*\//\1/
Solution
:%s/\/\*\(\_.*\)\*\//\1/

Your capture group \( \) captures a single character or newline on each repetition, so \1 ends up holding only one character. Put the * inside the group so that the \1 replacement gets the whole string.
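The same effect can be reproduced outside Vim. The sketch below uses Python's re module rather than Vim's regex dialect (an assumption made for illustration, with [\s\S] standing in for \_.), and the sample text is made up. It compares a repeated group with a group that contains the repetition.

```python
import re

text = "/*\nint i = 123;\n*/"

# Star outside the group: the group is overwritten on every repetition,
# so it holds a single character when the match finishes.
star_outside = re.search(r"/\*([\s\S])*\*/", text).group(1)

# Star inside the group: the group spans everything between the delimiters.
# The lazy *? keeps the match from running on to a later "*/".
star_inside = re.search(r"/\*([\s\S]*?)\*/", text).group(1)

print(repr(star_outside))
print(repr(star_inside))
```

In Vim itself the non-greedy counterpart of * is \{-}, which matters once a file holds more than one comment.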

Related

How can I use regular expressions to find all lines of source code that define default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
The regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
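A quick way to see what this pattern accepts is to run it over a few lines. The sketch below uses Python's re module; the pattern is copied from the answer, while the sample lines (including the Python-style one) are assumptions added for illustration.

```python
import re

# Pattern from the answer above, unchanged.
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")

samples = [
    "int sum(int a, int b=10, int c=20);",  # typed default arguments
    "if (x = 10) {",                        # assignment inside a condition
    "int x = 5;",                           # no parentheses at all
    "def total(a, b=10):",                  # Python-style, no type before the name
]

matches = [bool(DEFAULT_ARG.search(line)) for line in samples]
for line, hit in zip(samples, matches):
    print(hit, line)
```

Only the first sample matches. The Python-style line is skipped because the pattern insists on a type, whitespace, and then a name before the equals sign, so untyped parameters would need a looser variant.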
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.

recursive matching for string delimiter with regular expression

In Verilog, statements are enclosed in begin-end delimiters instead of braces.
always @(*) begin
if (condA) begin
a = c
end
else begin
b = d
end
end
I'd like to parse the outermost begin-end block, with its statements, in Python in order to check coding rules. Using a regular expression, I want a result like:
if (condA) begin
a = c
end
else begin
b = d
end
I found a similar answer for brace delimiters:
int funcA() {
if (condA) {
b = a
}
}
regular expression:
/({(?>[^{}]+|(?R))*})/g
However, I don't know how to modify the atomic group ([^{}]) for "begin-end":
/(begin(?>[??????]+|(?R))*end)/g
The point of the [??????]+ part is to match any text that contains no char that is, or that starts, one of the delimiters.
So, in your case, you need to match any char that does not start either a begin or an end substring:
/begin(?>(?!begin|end).|(?R))*end/gs
The . here will match any char including line break chars due to the s modifier. Note that the actual implementation might need adjustments (e.g. in PHP, the g modifier should not be used as there are specific functions/features for that).
Also, since you recurse the whole pattern, you need no outer parentheses.
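Python's built-in re module has no (?R) recursion, so the pattern above cannot be used there as written (the third-party regex package does support it). As a stand-in, and not the recursive technique itself, the sketch below scans for the begin and end keywords and tracks nesting depth to pull out the body of each outermost block. The function name outermost_blocks and the sample text are hypothetical.

```python
import re

def outermost_blocks(src):
    """Return the stripped bodies of the outermost begin...end blocks in src."""
    bodies = []
    depth = 0
    start = None
    for m in re.finditer(r"\b(begin|end)\b", src):
        if m.group(1) == "begin":
            if depth == 0:
                start = m.end()  # body starts right after the outer "begin"
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                # body ends right before the matching outer "end"
                bodies.append(src[start:m.start()].strip())
    return bodies

verilog = """always @(*) begin
  if (condA) begin
    a = c
  end
  else begin
    b = d
  end
end"""

blocks = outermost_blocks(verilog)
print(blocks[0])
```

The \b word boundaries keep identifiers such as "blend" or "beginner" from being counted as delimiters. Keywords inside comments or strings are not handled in this sketch.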

Comment pattern match in flex using states

I am trying to match single line comment pattern in flex. Patterns of the comment could be:
//this is a single /(some random stuff) line comment
Or it could be like this:
// this is also a comment\
continuation of the comment from previous line
From the example it's obvious that I have to handle the multi-line case too.
Now my approach was using states. This is what I have so far:
"//" {
yymore();
BEGIN (SINGLE_COMMENT);
}
<SINGLE_COMMENT>([^{NEWLINE}]|\\[(.){NEWLINE}]) {
yymore();
}
<SINGLE_COMMENT>([^{NEWLINE}]|[^\\]{NEWLINE}) {
logout << "Line no " << line_count << ": TOKEN <COMMENT> Lexeme " << string(yytext) << "\nfound\n\n";
BEGIN (INITIAL);
}
NEWLINE is declared as:
NEWLINE \r?\n
My declaration unit:
%option noyywrap
%x SINGLE_COMMENT
int line_count = 1;
const int bucketSize = 10; // change if necessary
ofstream logout;
ofstream tokenout;
SymbolTable symbolTable(bucketSize);
Action of NEWLINE:
{NEWLINE} {
line_count++;
}
If I run it with the following input:
// hello\
int main
This is my log file:
Line no 1: TOKEN <COMMENT> Lexeme // hello\
found
Line no 1: TOKEN <INT> Lexeme int found
Line no 1: TOKEN <ID> Lexeme main found
ScopeTable # 1
6 --> < main , ID >
So it's not catching the multi-line comment. Also, line_count is not incremented; it stays the same. Can anybody help me figure out what I have done wrong?
In (f)lex, as in most regular expression engines, [ and ] enclose a character class description. A character class is a set of individual characters, and it always matches exactly one character which is a member of that set. There are also negated character classes which are written the same way except that they start with [^ and match exactly one character which is not a member of the set.
Character classes are not the same as sequences of characters:
ab matches an a followed by a b
[ab] matches either an a or a b
Since character classes are just sets of characters, it is meaningless for the individual characters in the class to be repeated or optional, etc. Consequently, almost no regular expression operators (*, +, ?, etc.) are meaningful inside a character class. If you put one of them in a character class expression, it is handled just like an ordinary character:
a* matches 0 or more as
[a*] matches either an a or a *
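The difference between sequences and character classes can be checked quickly. The sketch below uses Python's re module rather than flex (an assumption made for illustration; the two agree on these particular patterns), and the dictionary name checks is hypothetical.

```python
import re

# fullmatch() succeeds only if the whole string matches the pattern.
checks = {
    "sequence ab vs 'ab'": bool(re.fullmatch(r"ab", "ab")),
    "class [ab] vs 'a'": bool(re.fullmatch(r"[ab]", "a")),
    "class [ab] vs 'ab'": bool(re.fullmatch(r"[ab]", "ab")),  # two chars: no
    "a* vs 'aaa'": bool(re.fullmatch(r"a*", "aaa")),
    "class [a*] vs '*'": bool(re.fullmatch(r"[a*]", "*")),    # literal star
    "class [a*] vs 'aaa'": bool(re.fullmatch(r"[a*]", "aaa")),  # one char only
}

for label, result in checks.items():
    print(result, label)
```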
One of the features flex provides which is not provided by most other regular expression systems is macro expansions, of the form {name}. Here the { and } indicate the expansion of a defined macro, whose name is contained between the braces. These characters are also not special inside a character class:
{identifier} matches whatever the expanded macro named identifier would match.
[{identifier}] matches a single character which is {, } or one of the letters definrt
Macro definitions seem to be overused by beginners. My advice is always to avoid them, and thereby avoid the confusion which they create.
It's also worth noting that (f)lex does not have an operator which negates a subpattern. Only character classes can be negated; there is no easy way to write "match anything other than foo". However, you can generally rely on the first longest-match rule to effectively implement negations: if some pattern p executes, then there cannot be any pattern which would match more than p. Thus, it might not be necessary to explicitly write the negation.
For example, in your comment detector where the only real issue is dealing with carriage return (\r) characters which are not followed by newline characters, you could use (f)lex's pattern matching algorithm to your advantage:
<SINGLE_COMMENT>{
  [^\\\r\n]+   ;
  \\\r?\n      { ++line_count; }
  \\.          ; /* only matches if the above rule doesn't */
  \r?\n        { ++line_count; BEGIN(INITIAL); }
  \r           ; /* only matches if the above rule doesn't */
}
By the way, it's usually much easier to provide %option yylineno than to try to track newlines manually.
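To see how the rules above carve up the input, the same logic can be approximated with a single pattern. This is a sketch in Python's re module, not flex, and the names COMMENT, source and lines_consumed are hypothetical. Each alternative mirrors one of the non-terminating rules, and the match stops at the first newline that is not preceded by a backslash.

```python
import re

# Alternatives, tried in this order at each position:
#   \\\r?\n     backslash-newline continuation
#   \\.         any other escaped character
#   \r(?!\n)    a lone carriage return
#   [^\\\r\n]   ordinary comment text
COMMENT = re.compile(r"//(?:\\\r?\n|\\.|\r(?!\n)|[^\\\r\n])*")

# The question's input: a comment continued onto the next line, then more code.
source = "// hello\\\nint main\nint x;\n"

m = COMMENT.match(source)
comment = m.group(0)
lines_consumed = comment.count("\n")  # continuation newlines inside the comment

print(repr(comment), lines_consumed)
```

With the question's input, the comment runs through "int main" and contains one newline, which is the increment that was missing from line_count.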

How to Modify all beginnings and endings of a function

I would like to replace all the functions of the following kind:
returnType functionName(parameters){
OLD_LOG; // Always the first line of the function
//stuff to do
return result; // may not be here in case of function returning void
} // The ending } is not always at the beginning of the line (but is always the first not white space of the line and has the same number of white space before than 'returnType' does)
by
returnType functionName(parameters){
NEW_LOG("functionName"); // the above function name
//stuff to do
END_LOG();
return result; //if any return (if possible, END_LOG() should appear just before any return, or at the end of the function if there is no return)
}
There are at least a hundred of these functions.
Therefore I would like to know whether this can be done with a find/replace in a text editor that supports regex, for example, or by any other means.
Thank you
Here is an attempt at this.
Regex
/(?<=\s)(\w+)(?=\()(.*\{\n.*)(OLD_LOG;)(.*)(\n\})/s
Test String
returnType functionName(parameters){
OLD_LOG;
//stuff to do
}
Replace string
\1 \2NEW_LOG("\1");\n\4\n END_LOG();\5
Result
returnType functionName (parameters){
NEW_LOG("functionName");
//stuff to do
END_LOG();
}
I have updated the regex to handle an optional return statement and optional spaces.
Regex
/(?<=\s)(\w+)(?=\()(.*\{\n.*)(OLD_LOG;)(.*?)(?=(?:\s*)return|(?:\n\s*\}))/s
Replace string
\1 \2NEW_LOG("\1");\n\4\n END_LOG();
See if this works for you.
Find
(\n([^\S\n]*)[^\s].*\s([^\s\(]+)\s*\(.*\)\s*\{\s*\n)(\s*)OLD\_LOG;((.*\s*\n)*?)(\s*return\s.*\r\n)?\2\}
Replace with
\1\4NEW\_LOG\(\"\3\"\);\5\4END_LOG\(\);\r\n\7\2\}
Notice that \n and \r\n are used. If your code file uses a different newline format, you need to modify accordingly.
This replacement relies on the following assumptions:
1) OLD_LOG; is just one line below the function name.
2) Function has return type (any non space character before the function name is okay).
3) Function name and { are at the same line.
4) The ending } is preceded by the same amount of whitespace as 'returnType', and there is no such } inside the function.
5) Last return is just one line above the ending }.
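Under the same assumptions as the list above, the replacement can also be scripted. The following is a minimal sketch in Python, not a translation of either regex above; the names FUNC and add_logging and the sample C-like source are hypothetical. It relies on the closing brace sharing the indentation of the return type and on the last return, if any, sitting directly above that brace.

```python
import re

FUNC = re.compile(
    # indentation, return type, function name, parameter list, opening brace
    r"^(?P<indent>[ \t]*)"
    r"(?P<head>\S[^\n]*?\b(?P<name>\w+)[ \t]*\([^)\n]*\)[ \t]*\{[ \t]*\n)"
    # OLD_LOG; on the first line, then the shortest possible run of body lines
    r"(?P<pad>[ \t]*)OLD_LOG;(?P<body>[^\n]*\n(?:[^\n]*\n)*?)"
    # optional return directly above a closing brace at the same indentation
    r"(?P<ret>[ \t]*return\b[^\n]*\n)?"
    r"(?P=indent)\}",
    re.MULTILINE,
)

def add_logging(match):
    """Rebuild one function with NEW_LOG at the top and END_LOG near the end."""
    return (
        match["indent"] + match["head"]
        + match["pad"] + 'NEW_LOG("' + match["name"] + '");' + match["body"]
        + match["pad"] + "END_LOG();\n"
        + (match["ret"] or "")
        + match["indent"] + "}"
    )

source = (
    "int add(int a, int b){\n"
    "    OLD_LOG; // first line\n"
    "    int r = a + b;\n"
    "    return r;\n"
    "}\n"
    "void hello(){\n"
    "    OLD_LOG;\n"
    "    puts(\"hi\");\n"
    "}\n"
)

rewritten = FUNC.sub(add_logging, source)
print(rewritten)
```

Only a return on the last line of the body is handled. Early returns inside the function would still need a separate pass or a real parser.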
It may be faster to use an editor with multiple carets support (e.g. Sublime Text, IntelliJ):
https://stackoverflow.com/a/18929134/802365
(Video) Multi-caret editing in Sublime Text

C++ TR1 regex - multiline option

I thought that $ indicates the end of string. However, the following piece of code gives "testbbbccc" as a result, which is quite astonishing to me... This means that $ actually matches end of line, not end of the whole string.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

int main()
{
    tr1::regex r("aaa([^]*?)(ogr|$)");
    string test("bbbaaatestbbbccc\nddd");
    vector<int> captures;
    captures.push_back(1); // iterate over capture group 1 only
    const tr1::sregex_token_iterator end;
    for (tr1::sregex_token_iterator iter(test.begin(), test.end(), r, captures); iter != end; ++iter)
    {
        // str() returns a temporary, so copy it instead of binding a non-const reference
        const string t1 = iter->str();
        cout << t1;
    }
}
I have been trying to find a "multiline" switch (which can easily be found in PCRE), but without success... Can someone point me in the right direction?
Regards,
R.P.
As Boost::Regex was selected for tr1, try the following:
From Boost::Regex
Anchors:
A '^' character shall match the start of a line when used as the first character of an expression, or the first character of a sub-expression.
A '$' character shall match the end of a line when used as the last character of an expression, or the last character of a sub-expression.
So the behavior you observed is correct.
From: Boost Regex as well:
\A Matches at the start of a buffer only (the same as \`).
\z Matches at the end of a buffer only (the same as \').
\Z Matches an optional sequence of newlines at the end of a buffer: equivalent to the regular expression \n*\z
I hope that helps.
There is no multiline switch in TR1 regexes. It's not exactly the same, but you can get equivalent behavior with a group that matches every character:
(.|\r|\n)*?
This matches non-greedily every character, including new line and carriage return.
Note: Remember to escape the backslashes '\' like this '\\' if your pattern is a C++ string in code.
Note 2: If you don't want to capture the matched contents, add '?:' right after the opening parenthesis:
(?:.|\r|\n)*?
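The difference between an end-of-line anchor and an end-of-buffer anchor is easy to observe. The sketch below uses Python's re module instead of TR1 (an assumption made for illustration; flag and anchor names differ between engines) with the test string from the question. [\s\S] plays the role of the match-everything group, re.MULTILINE makes $ behave as a line anchor, and \Z is Python's end-of-buffer anchor.

```python
import re

test = "bbbaaatestbbbccc\nddd"

# $ as an end-of-line anchor: the lazy group stops at the first line end.
up_to_line_end = re.search(r"aaa([\s\S]*?)(ogr|$)", test, re.MULTILINE).group(1)

# \Z matches only at the very end of the buffer, so the group crosses the newline.
up_to_buffer_end = re.search(r"aaa([\s\S]*?)(ogr|\Z)", test).group(1)

print(repr(up_to_line_end))
print(repr(up_to_buffer_end))
```

The first result reproduces the "testbbbccc" seen in the question. The second shows what a buffer-only anchor would give.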