I'm looking for a regex to match all potentially multiline calls to a variadic C function. The end goal is to print the file, line number, and the fourth parameter of each call, but unfortunately I'm not there yet. So far, I have this:
perl -ne 'print if s/^.*?(func1\s*\(([^\)\(,]+||,|\((?2)\))*\)).*?$/$1/s' test.c
with test.c:
int main() {
func1( a, b, c, d);
func1( a, b,
c, d);
func1( func2(), b, c, d, e );
func1( func2(a), b, c, d, e );
return 1;
}
-- which does not match the second call. The reason it doesn't match is that the s at the end of the expression allows . to match newlines, but doesn't seem to allow [..] constructs to match newlines. I'm not sure how to get past this.
I'm also not sure how to reference the fourth parameter in this... the $2, $3 do not get populated in this (and even if they did I imagine I would get some issues due to the recursive nature of the regex).
This should catch your functions, with caveats
perl -0777 -wnE'@f = /(func1\s*\( [^;]* \))\s*;/xg; s/\s+/ /g, say for @f' tt.c
I use the fact that a statement must be terminated by ;. This assumes that there is no stray ; inside the call (for example in a comment) and that calls to func1 are not nested inside another call. If either is possible then quite a bit more needs to be done to parse it.
However, further parsing the captured calls, presumably by commas, is complicated by the fact that a nested call may well, and realistically, contain commas. How about
func1( a, b, f2(a2, b2), c, f3(a3, b3), d );
This becomes a far more interesting little parsing problem. Or, how about macros?
Can you clarify what kinds of things one doesn't have to account for?
As the mentioned caveats may be possible to ignore, here is a way to parse the argument list, using Text::Balanced.
Since we need to extract whole function calls if they appear as an argument, like f(a, b), the most suitable function from the library is extract_tagged. With it we can make the opening tag be a word-left-parenthesis (\w+\() and the closing one a right-parenthesis \).
This function extracts only the first occurrence so it is wrapped in extract_multiple
use warnings;
use strict;
use feature 'say';
use Text::Balanced qw(extract_multiple extract_tagged);
use Path::Tiny; # path(). for slurp
my $file = shift // die "Usage: $0 file-to-parse\n";
my @functions = path($file)->slurp =~ /( func1\( [^;]* \) );/xg;
s/\s+/ /g for @functions;
for my $func (@functions) {
my ($args) = $func =~ /func1\s*\(\s* (.*) \s*\)/x;
say $args;
my @parts = extract_multiple( $args, [ sub {
extract_tagged($args, '\\w+\\(', '\\\)', '.*?(?=\w+\()')
} ] );
my @arguments = grep { /\S/ } map { /\(/ ? $_ : split /\s*,\s*/ } @parts;
s/^\s*|\s*\z//g for @arguments;
say "\t$_" for @arguments;
}
The extract_multiple returns parts with the (nested) function calls alone (identifiable by having parens), which are arguments as they stand and what we sought with all this, and parts which are strings with comma-separated groups of other arguments, that are split into individual arguments.
Note the amount of escaping in extract_tagged (found by trial and error)! This is needed because those strings are twice double-quoted in a string-eval. That isn't documented at all, so see the module's source.
Or directly produce escape-hungry characters (\x5C for \), which then need no escaping
extract_tagged($_[0], "\x5C".'w+'."\x5C(", '\x5C)', '.*?(?=\w+\()')
I don't know which I'd call "clearer"
I tested on the file provided in the question, to which I added a function
func1( a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e );
For each function the program prints the string with the argument list to parse and the parsed arguments, and the most interesting part of the output is for the above (added) function
[ ... ]
a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e
a
b
f2(a2, f3(a3, b3), b2)
c
f4(a4, b4)
d
e
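For comparison, the core step (splitting only on commas that sit outside any parentheses) can also be done with a simple depth counter. This is a minimal C++ sketch, not the Text::Balanced approach above; the names trim and split_top_level_args are made up for illustration, and it assumes the argument list contains no string literals or comments with parentheses or commas in them.

```cpp
#include <string>
#include <vector>

// Trim leading and trailing whitespace from a string.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split an argument list on commas that are not inside parentheses.
// Assumption: no string literals or comments containing parens or commas.
std::vector<std::string> split_top_level_args(const std::string& args) {
    std::vector<std::string> out;
    std::string current;
    int depth = 0;  // current parenthesis nesting level
    for (char ch : args) {
        if (ch == '(') ++depth;
        else if (ch == ')') --depth;
        if (ch == ',' && depth == 0) {
            out.push_back(trim(current));  // a top-level comma ends an argument
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!trim(current).empty()) out.push_back(trim(current));
    return out;
}
```

Called on the argument string shown above, this should yield the same seven arguments, with f2(a2, f3(a3, b3), b2) kept whole as the third one.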
Not Perl but perhaps simpler:
$ cat >test2.c <<'EOD'
int main() {
func1( a, b, c, d1);
func1( a, b,
c, d2);
func1( func2(), "quotes\"),(", /*comments),(*/ g(b,
c), "d3", e );
func1( func2(a), b, c, d4(p,q,r), e );
func1( a, b, c, func2( func1(a,b,c,d5,e,f) ), g, h);
return 1;
}
EOD
$ cpp -D'func1(a,b,c,d,...)=SHOW(__FILE__,__LINE__,d,)' test2.c |
grep SHOW
SHOW("test2.c",2,d1);
SHOW("test2.c",3,d2)
SHOW("test2.c",5,"d3")
SHOW("test2.c",7,d4(p,q,r));
SHOW("test2.c",8,func2( SHOW("test2.c",8,d5) ));
$
As the final line shows, a bit more work is needed if the function can take itself as an argument.
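The same preprocessor trick can be tried inside a source file. The sketch below uses a hypothetical macro, FOURTH_ARG, that is not part of the command above: it names the fourth parameter and turns it into a string with the # operator. Every call here passes five arguments, because older standards strictly require at least one argument for the ... part (common compilers tend to accept fewer).

```cpp
#include <string>

// Hypothetical macro for illustration: expands to the source text of the
// fourth argument as a string literal; any later arguments are discarded.
#define FOURTH_ARG(a, b, c, d, ...) #d

// The preprocessor only splits on commas outside parentheses, so a nested
// call in fourth position is captured whole.
const std::string fourth_plain  = FOURTH_ARG(a, b, c, d1, e);
const std::string fourth_nested = FOURTH_ARG(func2(a), b, c, d4(p,q,r), e);
```

The other arguments never reach the compiler, so the undeclared names a, b and func2 are harmless here.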
I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
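A quick way to check this pattern against the examples from the question is to run it through std::regex. This is only a sketch: the function name has_default_argument is made up, and it uses the default ECMAScript grammar, which may differ in small ways from whatever engine the pattern was written for.

```cpp
#include <regex>
#include <string>

// True if the line contains a parenthesized list in which some parameter
// looks like "type name = value".
bool has_default_argument(const std::string& line) {
    // The raw string literal uses the delimiter "re" so that the pattern's
    // own parentheses cannot end the literal early.
    static const std::regex pattern(R"re(\([^)]*\w+\s+\w+\s*=[^),][^)]*\))re");
    return std::regex_search(line, pattern);
}
```

With this, the declaration int sum(int a, int b=10, int c=20); matches and if (x = 10) { does not. Because the pattern requires a type word before the parameter name, an untyped Python default such as def f(a, b=10): does not match either.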
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.
I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you can try using regex:
#include <regex>
#include <string>
#include <iostream>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that makes a substitution, i.e. replaces all occurrences of a given substring with another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.
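If you'd rather not pull in boost, a replace-all helper is short to write with std::string::find and the position-based std::string::replace. The following is a minimal sketch (the name replace_all is reused here for a hand-written function; it is not the boost one):

```cpp
#include <string>

// Replace every occurrence of `from` in `text` with `to` and return the result.
std::string replace_all(std::string text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;  // avoid an infinite loop on an empty pattern
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.length(), to);
        pos += to.length();  // skip past the inserted text so it is not rescanned
    }
    return text;
}
```

For the path in the question, with the literal written correctly as "d:\\test\\text.txt", calling replace_all(str, "\\", "\\\\") should double each backslash. Note that the result is returned rather than modified in place.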
For example, in the command line this works (the 1st argument has quotes but the 2nd argument doesn't):
"test.bat" "a" b
i.e. it knows that "a" is the 1st argument and b is the second
but using system() it doesn't work:
system("test.bat" "a" b)
this also doesn't work:
system("test.bat" \"a\" b)
This is gonna be simplest if we use a raw string literal. A raw string literal is a way of writing a string in c++ where nothing gets escaped. Let's look at an example:
char const* myCommand = R"(test.bat "a" b)";
The R at the beginning indicates that it's a raw string literal, and if you call system(myCommand), it will be exactly equivalent to typing
$ test.bat "a" b
into the command line. Now, suppose you want to escape the quotes on the command line:
$ test.bat \"a\" b
With a raw string literal, this is simple:
char const* myCommand = R"(test.bat \"a\" b)";
system(myCommand);
Or, alternatively:
system(R"(test.bat \"a\" b)");
Hope this helps!
A bit more info on raw string literals: Raw string literals are a great feature, and they basically allow you to copy+paste any text directly into your program. They begin with R, followed by a quote and a parenthesis. Only the stuff inside the parenthesis gets included. Examples:
using std::string;
string a = R"(Hello)"; // a == "Hello"
Begin and end with "raw":
string b = R"raw(Hello)raw"; // b == "Hello"
Begin and end with "foo"
string c = R"foo(Hello)foo"; // c == "Hello"
Begin and end with "x"
string d = R"x(Hello)x"; // d == "Hello"
The important thing is that we begin and end the literal with the same string of letters (called the delimiter), followed by the parenthesis. This ensures we never have a reason to escape something inside the raw string literal, because we can always change the delimiter so that it's not something found inside the string.
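To see why the delimiter matters, here is a small sketch where the text itself contains the sequence )" and so could not be written with the plain R"( ... )" form. The command text and the delimiter cmd are arbitrary choices for illustration.

```cpp
#include <string>

// The content contains )" so a plain R"( ... )" literal would end too early;
// the delimiter "cmd" (an arbitrary choice) avoids that.
const std::string raw_command = R"cmd(test.bat "(a)" b)cmd";

// The same string written with ordinary escapes, for comparison.
const std::string escaped_command = "test.bat \"(a)\" b";
```

Both variables hold exactly the same characters; only the way the literal is written differs.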
I got it to work now:
system(R"(C:\"to erase\test.bat" "a")");
I found the answer: system("test.bat" ""a"" b);
or more precisely: system("\"test.bat\" ""a"" b");
So the answer is to escape the quotes with a double quote
Basically I'm trying to replace with whatever is returned from a function call from an object. But I need a return value from the regex search as the argument. It's a little tricky but the code should speak for itself:
while ( $token =~ s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", qw($1))/e ) { }
The error I'm getting is that $1 is not getting evaluated to anything (the argument literally becomes "$1") so it screws up my getValue() method.
Cheers
The qw() operator quotes "words", i.e. it splits a string at all whitespace characters and returns that list. It does not interpolate.
You can just use the variable "as is":
s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", $1)/e
The qw() function is very different from
q(abc) (<=> 'abc'),
qq(abc) (<=> "abc"), and
qx(abc) (<=> `abc`) or
qr(abc) (<=> m/abc/):
qw(a b c) <=> ('a', 'b', 'c')