How to force rematch? - regex

I would like to force a rematch in the following scenario: I'm trying to inverse-match a qualifier after each element in a list. In other words, I have:
"int a, b, c" =~ m{
(?(DEFINE)
(?<qualifs>\s*(?<qualif>\bint\b|\bfloat\b)\s*+(?{print $+{qualif} . "\n"}))
(?<decl>\s*(?!(?&qualif))(?<ident>[_a-zA-Z][_a-zA-Z0-9]*+)\s*(?{print $+{ident} . "\n"}))
(?<qualifsfacet>\s*\bint\b\s*+)
(?<declfacet>[_a-zA-Z][_a-zA-Z0-9]*+)
)
^((?&qualifsfacet)*+(?!(?&decl))
|(?&qualifs)*+(?&declfacet)
|((?&qualifsfacet)
(?&declfacet)(?<negdecl>\g{lastnegdecl}(,(?&decl)))
|(?&qualifs)*+(?&declfacet)(?<lastnegdecl>\g{negdecl})
(?# Here how to force it to retry last with new lastnegdecl)))$
}xxs;
And would like to have:
a
int
b
int
c
int
As output. Currently it's only this:
a
int
int
I think this might work if there is a way to tell the regex machine to retrigger a match for the new lastnegdecl that is being captured.

Well after some trying I finally figured it out (besides the obvious whitespace issues I had in my original post):
"int a, b, c" =~ m{
(?(DEFINE)
(?<qualifs>\s*+(?<qualif>\bint\b|\bfloat\b)\s*+(?{print $+{qualif} . "\n"}))
(?<decl>\s*+(?!(?&qualif))(?<ident>[_a-zA-Z][_a-zA-Z0-9]*+)\s*(?{print $+{ident} . "\n"}))
(?<qualifsfacet>\s*+(\bint\b|\bfloat\b)\s*+)
(?<declfacet>\s*+[_a-zA-Z][_a-zA-Z0-9]*+\s*+)
)
^((?&qualifsfacet)(?!(?&decl))
|(?&qualifs)*+(?&declfacet)
|(?<restoutter>(?=(?&qualifsfacet)(?&declfacet)
(?<rest>(?(<rest>)\g{rest}),(?&decl)))
((?&qualifs)(?&declfacet)\g{rest}|(?&restoutter)))
|(?&qualifsfacet)(?&declfacet)(,(?&declfacet))*+)$
}xxs;
Basically, I use a positive lookahead in which decl is called with its code block but qualifs is not, while each decl is concatenated into rest. Then I attempt a partial match with qualifs followed by rest, and if that doesn't match, the pattern recurses into restoutter and does the same thing again. Maybe someone can explain it better, but it works. The output of the program above is:
a
int
b
int
c
int
And there is a full match.
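For comparison only, and not a way to force the regex engine to rematch: if ordinary code is acceptable, the same output can be produced by capturing the qualifier once and looping over the identifiers. A minimal Python sketch under that assumption (the function name ident_qualifier_pairs is made up for illustration, and only int and float are recognized, as in the pattern above):

```python
import re

def ident_qualifier_pairs(decl):
    """Pair every identifier in a declaration with its leading qualifier.

    Hypothetical helper: handles only a single 'int'/'float' qualifier
    followed by a comma-separated identifier list on one line.
    """
    m = re.fullmatch(r"\s*(int|float)\s+(.*)", decl)
    if m is None:
        return []
    qualifier, rest = m.groups()
    idents = re.findall(r"[_a-zA-Z][_a-zA-Z0-9]*", rest)
    return [(ident, qualifier) for ident in idents]

for ident, qualifier in ident_qualifier_pairs("int a, b, c"):
    print(ident)
    print(qualifier)
```

This prints each identifier followed by the qualifier, in the order asked for in the question.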

Related

perl regex for matching multiline calls to c function

I'm looking to have a regex to match all potentially multiline calls to a variadic c function. The end goal is to print the file, line number, and the fourth parameter of each call, but unfortunately I'm not there yet. So far, I have this:
perl -ne 'print if s/^.*?(func1\s*\(([^\)\(,]+||,|\((?2)\))*\)).*?$/$1/s' test.c
with test.c:
int main() {
func1( a, b, c, d);
func1( a, b,
c, d);
func1( func2(), b, c, d, e );
func1( func2(a), b, c, d, e );
return 1;
}
-- which does not match the second call. The reason it doesn't match is that the s at the end of the expression allows . to match newlines, but doesn't seem to allow [..] constructs to match newlines. I'm not sure how to get past this.
I'm also not sure how to reference the fourth parameter in this... the $2, $3 do not get populated in this (and even if they did I imagine I would get some issues due to the recursive nature of the regex).
This should catch your functions, with caveats
perl -0777 -wnE'@f = /(func1\s*\( [^;]* \))\s*;/xg; s/\s+/ /g, say for @f' tt.c
I use the fact that a statement must be terminated by ;. This assumes there is no accidental ; in a comment and that calls to this function are not nested inside another call. If either is possible then quite a bit more needs to be done to parse it.
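For readers following along in Python, here is a rough equivalent of the one-liner: read the whole source as one string so that a call may span lines, capture everything from func1( up to the ) before the terminating ;, then collapse whitespace. The helper name find_calls is an assumption for illustration, and the same caveats apply.

```python
import re

def find_calls(source, name="func1"):
    """Return every 'name(...)' statement, with whitespace runs collapsed."""
    pattern = re.compile(r"(" + re.escape(name) + r"\s*\([^;]*\))\s*;")
    return [re.sub(r"\s+", " ", call) for call in pattern.findall(source)]

source = """int main() {
func1( a, b, c, d);
func1( a, b,
c, d);
func1( func2(), b, c, d, e );
func1( func2(a), b, c, d, e );
return 1;
}
"""

calls = find_calls(source)
for call in calls:
    print(call)
```

Because [^;] also matches newlines, the second (multiline) call is captured like the others.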
However, further parsing the captured calls, presumably by commas, is complicated by the fact that a nested call may well, and realistically, contain commas. How about
func1( a, b, f2(a2, b2), c, f3(a3, b3), d );
This becomes a far more interesting little parsing problem. Or, how about macros?
Can you clarify what kinds of things one doesn't have to account for?
As the mentioned caveats may be possible to ignore, here is a way to parse the argument list, using Text::Balanced.
Since we need to extract whole function calls if they appear as an argument, like f(a, b), the most suitable function from the library is extract_tagged. With it we can make the opening tag be a word-left-parenthesis (\w+\() and the closing one a right-parenthesis \).
This function extracts only the first occurrence, so it is wrapped in extract_multiple:
use warnings;
use strict;
use feature 'say';
use Text::Balanced qw(extract_multiple extract_tagged);
use Path::Tiny; # path(). for slurp
my $file = shift // die "Usage: $0 file-to-parse\n";
my @functions = path($file)->slurp =~ /( func1\( [^;]* \) );/xg;
s/\s+/ /g for @functions;
for my $func (@functions) {
    my ($args) = $func =~ /func1\s*\(\s* (.*) \s*\)/x;
    say $args;
    my @parts = extract_multiple( $args, [ sub {
        extract_tagged($args, '\\w+\\(', '\\\)', '.*?(?=\w+\()')
    } ] );
    my @arguments = grep { /\S/ } map { /\(/ ? $_ : split /\s*,\s*/ } @parts;
    s/^\s*|\s*\z//g for @arguments;
    say "\t$_" for @arguments;
}
extract_multiple returns two kinds of parts: the (nested) function calls on their own (identifiable by having parens), which are arguments as they stand and are what we sought with all this; and strings of comma-separated groups of other arguments, which are then split into individual arguments.
Note the amount of escaping in extract_tagged (found by trial and error)! This is needed because those strings are twice double-quoted in a string-eval. That isn't documented at all, so see the module source.
Or directly produce escape-hungry characters (\x5C for \), which then need no escaping:
extract_tagged($_[0], "\x5C".'w+'."\x5C(", '\x5C)', '.*?(?=\w+\()')
I don't know which I'd call "clearer".
I tested on the file provided in the question, to which I added a function
func1( a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e );
For each function the program prints the string with the argument list to parse and the parsed arguments, and the most interesting part of the output is for the above (added) function
[ ... ]
a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e
a
b
f2(a2, f3(a3, b3), b2)
c
f4(a4, b4)
d
e
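The same split can also be done without a library, by walking the argument string and splitting only on commas seen at parenthesis depth zero. This is a different technique from Text::Balanced, sketched in Python. The name split_top_level is made up for illustration, and the sketch assumes no commas or parentheses inside string literals or comments.

```python
def split_top_level(args):
    """Split an argument string on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch == "," and depth == 0:
            # A comma outside all parentheses ends the current argument.
            parts.append("".join(current).strip())
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

args = "a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e"
arguments = split_top_level(args)
for arg in arguments:
    print(arg)
```

With the list in hand, the fourth parameter the question asks about is arguments[3].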
Not Perl but perhaps simpler:
$ cat >test2.c <<'EOD'
int main() {
func1( a, b, c, d1);
func1( a, b,
c, d2);
func1( func2(), "quotes\"),(", /*comments),(*/ g(b,
c), "d3", e );
func1( func2(a), b, c, d4(p,q,r), e );
func1( a, b, c, func2( func1(a,b,c,d5,e,f) ), g, h);
return 1;
}
EOD
$ cpp -D'func1(a,b,c,d,...)=SHOW(__FILE__,__LINE__,d,)' test2.c |
grep SHOW
SHOW("test2.c",2,d1);
SHOW("test2.c",3,d2)
SHOW("test2.c",5,"d3")
SHOW("test2.c",7,d4(p,q,r));
SHOW("test2.c",8,func2( SHOW("test2.c",8,d5) ));
$
As the final line shows, a bit more work is needed if the function can take itself as an argument.

How can I use regular expressions to find all lines of source code defining default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
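A quick way to check this pattern against the lines from the question is a small Python sketch (the variable names are only for illustration). Since the pattern expects a type word before the parameter name, an untyped Python signature such as def f(a, b=10): is not matched by it.

```python
import re

# The pattern from the answer above, unchanged.
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")

lines = [
    "int sum(int a, int b=10, int c=20);",
    "int x = 5;",
    "if (x = 10) {",
    "x = 7;",
    "def f(a, b=10):",
]

matches = [line for line in lines if DEFAULT_ARG.search(line)]
print(matches)
```

Only the sum declaration is reported. The if condition is skipped because a single word precedes the equals sign there.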
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.

recursive matching for string delimiter with regular expression

In the Verilog language, statements are enclosed in begin-end delimiters instead of braces.
always@ (*) begin
if (condA) begin
a = c
end
else begin
b = d
end
end
I'd like to parse the outermost begin-end block with its statements, in order to check coding rules in Python. Using a regular expression, I want a result like:
if (condA) begin
a = c
end
else begin
b = d
end
I found a similar answer for brace delimiters:
int funcA() {
if (condA) {
b = a
}
}
regular expression:
/({(?>[^{}]+|(?R))*})/g
However, I don't know how to modify the atomic group ([^{}]) for "begin-end":
/(begin(?>[??????]+|(?R))*end)/g
The point of the [??????]+ part is to match any text that contains no delimiter, i.e. no char that is a delimiter or starts one.
So, in your case, you need to match any char that does not start a begin or end substring:
/begin(?>(?!begin|end).|(?R))*end/gs
The . here will match any char including line break chars due to the s modifier. Note that the actual implementation might need adjustments (e.g. in PHP, the g modifier should not be used as there are specific functions/features for that).
Also, since you recurse the whole pattern, you need no outer parentheses.
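Since the question mentions Python: the built-in re module does not support (?R), although the third-party regex module does. A sketch that avoids recursion altogether is to scan for the begin and end keywords and track the nesting depth. The function name outermost_blocks is an assumption for illustration, and keywords inside comments or strings are not handled.

```python
import re

def outermost_blocks(src):
    """Return the text inside each outermost begin ... end pair."""
    blocks = []
    depth = 0
    start = None
    for m in re.finditer(r"\b(begin|end)\b", src):
        if m.group(1) == "begin":
            if depth == 0:
                start = m.end()  # body starts right after the outer 'begin'
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(src[start:m.start()].strip())
    return blocks

src = """always @(*) begin
  if (condA) begin
    a = c
  end
  else begin
    b = d
  end
end
"""

blocks = outermost_blocks(src)
print(blocks[0])
```

The word boundaries keep keywords such as endmodule from being counted as end.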

Character class has duplicated range

The regex is: [%{1,2}|\/\/]
This matches %, %% and //
FYI: this is a warning generated from flycheck.
The warning means that you have duplicates in your character class. When you put something between square brackets, it means "one of those": [abc] means any of a, b or c. [a|b] means any of a, | or b.
So when you do [\/\/], you mean "either /, or /" which is obviously a duplicate. For the same reason, [%{1,2}] means "either % or { or 1 or , or 2 or }" which is clearly not what you want.
Grouping is done with parentheses, not square brackets, so use this regex instead:
(%{1,2}|\/\/)
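The difference is easy to see by testing both patterns against a few strings. A small Python sketch (the variable names are only for illustration):

```python
import re

char_class = re.compile(r"[%{1,2}|\/\/]")  # one character out of a set
group = re.compile(r"(%{1,2}|\/\/)")       # '%', '%%' or '//'

results = {}
for text in ["%", "%%", "//", "1", "{"]:
    results[text] = (bool(char_class.fullmatch(text)), bool(group.fullmatch(text)))
    print(repr(text), results[text])
```

The character class accepts the single characters 1 and { but rejects %% and //, while the group does the opposite.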

vim regex to match inline comments

Assuming the following sample inline comment:
/*
function newMethodName (int bar, String s) {
int i = 123;
}
s/\<foo\s*(/newMethodName (/g
*/
How would I match and replace such that it would, essentially, become uncommented? I got this far before giving up:
:%s/\/\*\(\_.\)*\*\//\1/
Solution
:%s/\/\*\(\_.*\)\*\//\1/
Your capture group \( \) captures a single character or newline, and the * then repeats the group. Put the * inside the group so that \1 in the replacement holds the whole string rather than a single character.
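The same pitfall can be reproduced outside Vim. The sketch below uses Python's re module, where a repeated group keeps only its last iteration. It illustrates the idea and is not a statement about Vim's exact capture behaviour.

```python
import re

text = "/*\nint i = 123;\n*/"

# Quantifier outside the group: the group holds a single character.
one_char = re.search(r"/\*(.)*\*/", text, flags=re.DOTALL)
# Quantifier inside the group: the group holds the whole comment body.
whole = re.search(r"/\*(.*)\*/", text, flags=re.DOTALL)

print(repr(one_char.group(1)))
print(repr(whole.group(1)))

uncommented = re.sub(r"/\*(.*)\*/", r"\1", text, flags=re.DOTALL)
print(uncommented)
```

Note that .* is greedy, so with several comments in one buffer it would span from the first /* to the last */.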