How to Modify all beginnings and endings of a function - regex

I would like to modify all the functions of the following kind:
returnType functionName(parameters){
OLD_LOG; // Always the first line of the function
//stuff to do
return result; // may not be here in case of function returning void
} // The closing } is not always at the beginning of the line (but it is always the first non-whitespace character on its line and has the same amount of leading whitespace as 'returnType')
by
returnType functionName(parameters){
NEW_LOG("functionName"); // the above function name
//stuff to do
END_LOG();
return result; //if any return (if possible, END_LOG() should appear just before any return, or at the end of the function if there is no return)
}
There are at least a hundred of these functions.
Therefore I would like to know whether this can be done with a find/replace in a text editor that supports regex, for example, or by any other means.
Thank you

Here is an attempt at this:
Regex
/(?<=\s)(\w+)(?=\()(.*\{\n.*)(OLD_LOG;)(.*)(\n\})/s
Test String
returnType functionName(parameters){
OLD_LOG;
//stuff to do
}
Replace string
\1 \2NEW_LOG("\1");\n\4\n END_LOG();\5
Result
returnType functionName (parameters){
NEW_LOG("functionName");
//stuff to do
END_LOG();
}
live demo here
I have updated the regex to handle an optional return statement and optional spaces
Regex
/(?<=\s)(\w+)(?=\()(.*\{\n.*)(OLD_LOG;)(.*?)(?=(?:\s*)return|(?:\n\s*\}))/s
Replace string
\1 \2NEW_LOG("\1");\n\4\n END_LOG();
demo for return statement
demo for optional spaces
see if this works for you

Find
(\n([^\S\n]*)[^\s].*\s([^\s\(]+)\s*\(.*\)\s*\{\s*\n)(\s*)OLD\_LOG;((.*\s*\n)*?)(\s*return\s.*\r\n)?\2\}
Replace with
\1\4NEW\_LOG\(\"\3\"\);\5\4END_LOG\(\);\r\n\7\2\}
Notice that \n and \r\n are used. If your code file uses a different newline format, you need to modify accordingly.
Limitations of this replace are these assumptions:
1) OLD_LOG; is just one line below the function name.
2) Function has return type (any non space character before the function name is okay).
3) Function name and { are at the same line.
4) The closing } has the same amount of leading whitespace as 'returnType', and there is no other such } inside the function.
5) Last return is just one line above the ending }.
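Since the question also allows "anything else", a short script can be more forgiving than a single regex. The following is a minimal Python sketch, not a finished tool. It assumes the same layout as the limitations above: the header and { are on one line, OLD_LOG; is alone on the next line, and the closing } sits at the header's indentation. The names HEADER, leading_whitespace and rewrite_logs are made up for this illustration. It inserts END_LOG(); before every line that starts with return, and before the closing brace when the body does not end with a return.

```python
import re

# Assumed header shape: indentation, a return type, the function name,
# a parenthesised parameter list without nested parentheses, then "{".
HEADER = re.compile(r"^(\s*)\S.*?\b(\w+)\s*\([^()]*\)\s*\{\s*$")


def leading_whitespace(line: str) -> str:
    return line[: len(line) - len(line.lstrip())]


def rewrite_logs(source: str) -> str:
    lines = source.split("\n")
    out = []
    i = 0
    while i < len(lines):
        header = HEADER.match(lines[i])
        first_body_is_old_log = (
            i + 1 < len(lines) and lines[i + 1].strip() == "OLD_LOG;"
        )
        if not (header and first_body_is_old_log):
            out.append(lines[i])  # not a function we recognise: copy unchanged
            i += 1
            continue

        indent, name = header.group(1), header.group(2)
        body_indent = leading_whitespace(lines[i + 1])
        out.append(lines[i])
        out.append(f'{body_indent}NEW_LOG("{name}");')
        i += 2

        ends_with_return = False
        # Copy the body up to the closing brace at the header's indentation.
        while i < len(lines) and lines[i].rstrip() != indent + "}":
            stripped = lines[i].strip()
            if re.match(r"return\b", stripped):
                out.append(leading_whitespace(lines[i]) + "END_LOG();")
                ends_with_return = True
            elif stripped:
                ends_with_return = False
            out.append(lines[i])
            i += 1
        if not ends_with_return:
            out.append(body_indent + "END_LOG();")
        # The closing brace itself is copied by the next loop iteration.
    return "\n".join(out)
```

A return that is the sole statement of a braceless if would need braces added by hand, because the inserted END_LOG(); would otherwise become the if's only statement.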

It may be faster to use an editor with multiple carets support (e.g. Sublime Text, IntelliJ):
https://stackoverflow.com/a/18929134/802365
(Video) Multi-caret editing in Sublime Text

Related

How can I use regular expressions to find all lines of source code that define default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
See live demo.
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.
demo here
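Both suggested patterns can be checked quickly against the examples from the question. The sketch below uses Python's re module; the names DEFAULT_ARG, PARAM_LIST and samples, and the last two sample lines, are assumptions added for this check. Run this way, the second pattern also accepts a parameter list with no defaults, because the = is optional in it, and neither pattern matches an untyped Python parameter list, since both require a type word before the parameter name.

```python
import re

# First suggestion: type, name, optional spaces, then "=" inside parentheses.
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")
# Second suggestion: captures the whole parameter list as group 1.
PARAM_LIST = re.compile(r"\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)")

samples = [
    "int sum(int a, int b=10, int c=20);",  # typed parameters with defaults
    "int sum(int a, int b);",               # typed parameters, no defaults
    "if (x = 10) {",                        # condition clause from the question
    "def total(a, b=10):",                  # untyped Python default
]

for line in samples:
    first = DEFAULT_ARG.search(line) is not None
    second = PARAM_LIST.search(line) is not None
    print(f"{line!r}: first={first}, second={second}")
```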

Comment pattern match in flex using states

I am trying to match single line comment pattern in flex. Patterns of the comment could be:
//this is a single /(some random stuff) line comment
Or it could be like this:
// this is also a comment\
continuation of the comment from previous line
From the example it's obvious that I have to handle the multi-line case too.
Now my approach was using states. This is what I have so far:
"//" {
yymore();
BEGIN (SINGLE_COMMENT);
}
<SINGLE_COMMENT>([^{NEWLINE}]|\\[(.){NEWLINE}]) {
yymore();
}
<SINGLE_COMMENT>([^{NEWLINE}]|[^\\]{NEWLINE}) {
logout << "Line no " << line_count << ": TOKEN <COMMENT> Lexeme " << string(yytext) << "\nfound\n\n";
BEGIN (INITIAL);
}
NEWLINE is declared as:
NEWLINE \r?\n
My declaration unit:
%option noyywrap
%x SINGLE_COMMENT
int line_count = 1;
const int bucketSize = 10; // change if necessary
ofstream logout;
ofstream tokenout;
SymbolTable symbolTable(bucketSize);
Action of NEWLINE:
{NEWLINE} {
line_count++;
}
If I run it with the following input:
// hello\
int main
This is my log file:
Line no 1: TOKEN <COMMENT> Lexeme // hello\
found
Line no 1: TOKEN <INT> Lexeme int found
Line no 1: TOKEN <ID> Lexeme main found
ScopeTable # 1
6 --> < main , ID >
So, it's not catching the multi-line comment. Also, line_count is not incremented; it stays the same. Can anybody help me figure out what I have done wrong?
Link to code
In (f)lex, as in most regular expression engines, [ and ] enclose a character class description. A character class is a set of individual characters, and it always matches exactly one character which is a member of that set. There are also negated character classes which are written the same way except that they start with [^ and match exactly one character which is not a member of the set.
Character classes are not the same as sequences of characters:
ab matches an a followed by a b
[ab] matches either an a or a b
Since character classes are just sets of characters, it is meaningless for the individual characters in the class to be repeated or optional, etc. Consequently, almost no regular expression operators (*, +, ?, etc.) are meaningful inside a character class. If you put one of them in a character class expression, it is handled just like an ordinary character:
a* matches 0 or more as
[a*] matches either an a or a *
One of the features flex provides which is not provided by most other regular expression systems is macro expansions, of the form {name}. Here the { and } indicate the expansion of a defined macro, whose name is contained between the braces. These characters are also not special inside a character class:
{identifier} matches whatever the expanded macro named identifier would match.
[{identifier}] matches a single character which is {, } or one of the letters definrt
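The character-class behaviour described above is not specific to flex, so it can be tried in any regex engine. Here is a small sketch using Python's re module; the helper name full is made up for the example. Python has no {name} expansion at all, so only the "inside brackets everything is a literal character" part carries over.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# A sequence versus a character class.
print(full(r"ab", "ab"))      # sequence: a followed by b
print(full(r"[ab]", "ab"))    # class: exactly one character, so "ab" is too long
print(full(r"[ab]", "b"))

# Operators lose their meaning inside a class.
print(full(r"a*", "aaa"))     # repetition
print(full(r"[a*]", "*"))     # a literal star
print(full(r"[a*]", "aaa"))   # still only one character

# Braces inside a class are literal characters too.
print(full(r"[{identifier}]", "{"))
print(full(r"[{identifier}]", "d"))
print(full(r"[{identifier}]", "identifier"))
```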
Macro definitions seem to be overused by beginners. My advice is always to avoid them, and thereby avoid the confusion which they create.
It's also worth noting that (f)lex does not have an operator which negates a subpattern. Only character classes can be negated; there is no easy way to write "match anything other than foo". However, you can generally rely on the first longest-match rule to effectively implement negations: if some pattern p executes, then there cannot be any pattern which would match more than p. Thus, it might not be necessary to explicitly write the negation.
For example, in your comment detector where the only real issue is dealing with carriage return (\r) characters which are not followed by newline characters, you could use (f)lex's pattern matching algorithm to your advantage:
<SINGLE_COMMENT>{
[^\\\r\n]+ ;
\\\r?\n { ++line_count; }
\\. ; /* only matches if the above rule doesn't */
\r?\n { ++line_count; BEGIN(INITIAL); }
\r ; /* only matches if the above rule doesn't */
}
By the way, it's usually much easier to provide %option yylineno than to try to track newlines manually.
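For experimenting outside flex, the same cases can be folded into one regular expression. This is a rough Python sketch, not a translation that flex would accept; the names SINGLE_COMMENT and scan_comment are assumptions. Each alternative corresponds to one of the rules in the SINGLE_COMMENT block above, and the match stops just before the line ending that terminates the comment.

```python
import re

SINGLE_COMMENT = re.compile(
    r"//"
    r"(?:"
    r"[^\\\r\n]"   # ordinary character
    r"|\\\r?\n"    # backslash-newline: the comment continues on the next line
    r"|\\."        # backslash followed by any other character
    r"|\r(?!\n)"   # lone carriage return that is not part of a line ending
    r")*"
)


def scan_comment(text: str, pos: int = 0):
    """Return (lexeme, newlines inside the lexeme), or None if no comment starts at pos."""
    m = SINGLE_COMMENT.match(text, pos)
    if m is None:
        return None
    lexeme = m.group(0)
    return lexeme, lexeme.count("\n")


# The input from the question: a comment continued onto a second line.
print(scan_comment("// hello\\\nint main\nint x;"))
```

The second element of the result is the number of line breaks swallowed by continuations, which is what the ++line_count action in the continuation rule accounts for.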

Why does this Swift RegEx match "e1234"?

Here's the regular expression:
let legalStr = "(?:[eE][\\+\\-]?[0-9]{1,3})?$"
Here's the invocation:
if let match = sender.stringValue.rangeOfString(legalStr, options: .RegularExpressionSearch) {
print("\(sender.stringValue) is legal")
}
else {
print( "\(sender.stringValue) is not legal")
}
If I type garbage, like "abcd", it reports "not legal".
If I type something like "e123", it reports "legal".
(note that the empty string is also legal.)
However, if I type "e1234" it still returns "legal". I'd expect it to return "not legal". Am I missing something here? BTW, note the "$" at the end of the regular expression. The three digits should appear at the end of the string.
If it's not immediately clear, the source of the string is a text edit box.
Your pattern is only anchored at the end, and matches the empty string. So any string at all will match successfully by just matching your pattern as an empty string at the end.
Add a ^ to the front to anchor it on that side, too.
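The effect of the missing anchor is easy to reproduce in another engine. The sketch below uses Python's re module rather than NSRegularExpression, so it only illustrates the matching logic; the names UNANCHORED, ANCHORED and is_legal are made up for the example.

```python
import re

UNANCHORED = re.compile(r"(?:[eE][\+\-]?[0-9]{1,3})?$")   # pattern from the question
ANCHORED = re.compile(r"^(?:[eE][\+\-]?[0-9]{1,3})?$")    # with ^ added


def is_legal(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None


for text in ["", "e123", "E-12", "e1234", "abcd"]:
    print(repr(text), is_legal(UNANCHORED, text), is_legal(ANCHORED, text))

# Where the unanchored pattern matches inside "e1234": an empty span at the end.
print(UNANCHORED.search("e1234").span())
```

With the unanchored pattern every input matches, because the optional group can match nothing and $ succeeds at the end of any string. With ^ added, only the empty string and a well-formed exponent pass.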

Why is the flex regex being skipped?

I can't, for the life of me, figure out what's wrong with my regex's.
What I'd like to tokenize are two (2) types of strings, both of which must be contained on a single line. One string can be anything (other than a newline), and the other can contain any alphanumeric (ASCII) character and the literals '_', '/', '-', and '.'.
The snippet of flex code is:
nl \n|\r\n|\r|\f|\n\r
...
%%
...
\"[^\"]+{nl} { frx_parser_error("Label is missing trailing double quote."); }
\"[a-zA-Z0-9_\.\/\-]+\" {
if (yyleng > 1024) frx_parser_error("File name too long.");
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_FILENAME: %s\n", frx_parser_lval.str);
return (TOSP_FILENAME);
}
\"[^{nl}]+\" {
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_IDENTIFIER:\n%s\n", frx_parser_lval.str);
return (TOSP_IDENTIFIER);
}
And when I run the parser, the fprintf's spit this out:
TOSP_FILENAME: ModStar-Picture-Analysis.txt
TOSP_FILENAME: ModStar-Rubric.log.txt
TOSP_IDENTIFIER:
picture-A"
Progress (26,255) camera 'C' root("picture-C-
Syntax (line 34): syntax error
For whatever reason, the quote after picture-A is being ... missed. Why? I checked the ASCII values for the eight locations where the double quote character appears and they're all 0x22.
If I add some characters to the end of the "picture-A" it can work sometimes; adding ".par", ".pbr" doesn't work as expected, but ".pnr" does.
I've even added a specific non-regexy token:
\"picture-A\" { frx_parser_lval.str = strdup("picture-A"); return TOSP_FILENAME; }
to the lex file and it gets skipped.
I'm using flex 2.5.39, no flex libraries, one option (%option prefix=frx_parser_) in the lex file and the flex command line is:
flex -t script-lexer.l > script-lexer.c
What gives?
EDIT I need to test this on the actual system, but unit tests show this tokenizer to be much more robust (based on rici's answer):
nl \n|\r\n|\r|\f|\n\r
...
%%
...
["][^"]+{nl} { printf("Missing trailing quote.\n%s\n",yytext); }
["][[:alnum:]_./-]+["] { printf("File name:\n%s\n",yytext); }
["][^"]+["] { printf("String:\n%s\n",yytext); }
EDIT The rule ["].+["] swallows consecutive multiple strings as one big string. It was changed to ["][^"]+["]
The problem is your pattern:
\"[^{nl}]+\"
You're attempting to expand a definition inside a character class, but that is not possible; inside a character class, { is always just a {, not a flex operator. See the flex manual:
Note that inside of a character class, all regular expression operators lose their special meaning except escape (‘\’) and the character class operators, ‘-’, ‘]]’, and, at the beginning of the class, ‘^’.
A definition is not a macro. Rather, a definition defines a new regular expression operator.
As a consequence of the above, you can write [^\"] as simply [^"] and \"[a-zA-Z0-9_\.\/\-]+\" as \"[a-zA-Z0-9_./-]+\" (The - needs to be either at the end or at the beginning.) Personally, I'd write the second pattern as:
["][[:alnum:]_./-]+["]
But everyone has their own style.
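To see what [^{nl}] actually excludes, the pattern can be tried in any engine that treats braces inside a class as literals. The following is a sketch with Python's re module; the input strings are invented for the illustration and are not the asker's file, and the names BROKEN and FIXED are assumptions. Since the class only excludes {, n, l and }, the match runs across the newline and past the inner quotes until the last quote it can reach, and in flex the longest match wins over the shorter file-name rule.

```python
import re

BROKEN = re.compile(r'"[^{nl}]+"')   # excludes only the characters { n l }
FIXED = re.compile(r'"[^"]+"')       # stops at the next double quote

# Hypothetical input: two quoted strings on separate lines, no n or l anywhere.
text = '"picture-A"\nmore text "picture-B"'

print(repr(BROKEN.match(text).group(0)))  # swallows both strings and the newline
print(repr(FIXED.match(text).group(0)))   # just the first string

# An n before the closing quote stops the broken class early, so it cannot match.
print(BROKEN.match('"picture-A.pnr"\nmore text "picture-B"'))
```

This is consistent with the observation in the question that adding ".pnr" made the token come out right while ".par" did not.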

vim regex to match inline comments

Assuming the following sample inline comment:
/*
function newMethodName (int bar, String s) {
int i = 123;
}
s/\<foo\s*(/newMethodName (/g
*/
How would I match and replace such that it would, essentially, become uncommented? I got this far before giving up.
:%s/\/\*\(\_.\)*\*\//\1/
Solution
:%s/\/\*\(\_.*\)\*\//\1/
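Your capture group \( \) contains only \_. and so captures a single character (or newline) at a time. Put the following * inside the group so that the \1 replacement gets the whole string rather than just one character. Note that \_.* is greedy: if the file contains several comments, the match runs from the first /* to the last */. The non-greedy form is \_.\{-}.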
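The difference between repeating a group and repeating inside a group shows up in other engines as well. Here is a small sketch with Python's re module, where (.|\n) stands in for Vim's \_.; the variable names are made up for the example and Vim itself is not involved.

```python
import re

text = "/*\nint i = 123;\n*/"

# Quantifier outside the group: the group is re-captured on every repetition,
# so the replacement receives a single character.
outside = re.sub(r"/\*(.|\n)*\*/", r"\1", text)

# Quantifier inside the group: the group spans the whole comment body.
inside = re.sub(r"/\*((?:.|\n)*)\*/", r"\1", text)

print(repr(outside))
print(repr(inside))
```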