shell script to replace all function calls with another call - regex

I have modified a function, let's say
foo(int a, char b); into foo(int a, char b, float new_var);
The old function foo is called from many places in the existing source code. I want to replace all occurrences of foo(some_a, some_b); with foo(some_a, some_b, 0); using a script (manual editing is laborious).
I tried to do the same with sed 's/old/new/g' file, but the problem is using a regex to match the parameters and insert them into the replacement text.
PS: a & b are either valid C variable names OR valid constant values for their types.

Since we are working with C code, we don't need to worry about function overloading.
Assuming that only simple variable names and constants are passed to the function, and that the code compiled before the prototype changed:
Option 1: search with foo(\([^,]*\),\([^,]*\)) and replace with foo(\1,\2,0).
There are a few caveats, though:
The char argument must not be ','. The call will not be replaced if it is.
The char argument must not be ')'. That will cause a bad replacement.
There must be no other function whose name ends with foo. The replacement may make a mess of your code if there is.
Option 2: search with \(foo([^)]+\) and replace with \1,0 (thanks to glenn jackman).
There are a few caveats, though:
Neither the int nor the char argument may contain ')'. That will cause a bad replacement.
There must be no other function whose name ends with foo. As above, the replacement may make a mess of your code.
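For comparison, here is a minimal C++ sketch of the same search-and-replace idea using std::regex. The helper name add_third_arg is made up for illustration, and the pattern is my own variant rather than either expression above: it adds a \b word boundary so names that merely end in foo are skipped, and it only matches calls with exactly two simple arguments (no commas or parentheses inside them).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: appends ", 0" to every two-argument call of foo.
// Assumes simple arguments (no commas or parentheses inside them).
std::string add_third_arg(const std::string& src) {
    // \b keeps names such as to_foo from matching; the two lazy groups
    // capture the arguments without surrounding whitespace.
    static const std::regex call(R"(\bfoo\(\s*([^,()]*?)\s*,\s*([^,()]*?)\s*\))");
    return std::regex_replace(src, call, "foo($1, $2, 0)");
}
```

Because a call that already has three arguments no longer matches, running the replacement twice should leave the code unchanged the second time.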

This should do it:
sed -i.bak -r "s/foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\)/foo(\1, \2, 0)/g" file
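Note that -i.bak means "edit in place" and keeps a backup of the original file as file.bak.
foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\) looks for foo( + spaces + digits + spaces + , + spaces + [ character or 'character' ] + spaces + ), and replaces it with foo( + digits + , + character + , 0 + ). The second argument is matched as a single character only, which is why foo(23, bye) is left unchanged in the example, and the pattern is not anchored at a word boundary, so to_foo(23, 'b') is rewritten as well.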
Example
$ cat a
hello foo(23, h);
this is another foo(23, bye);
this is another foo(23, 'b');
and we are calling to_foo(23, 'b'); here
foo(23, 'b'); here
$ sed -i.bak -r "s/foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\)/foo(\1, \2, 0)/g" a
$ cat a
hello foo(23, h, 0);
this is another foo(23, bye);
this is another foo(23, 'b', 0);
and we are calling to_foo(23, 'b', 0); here
foo(23, 'b', 0); here

Related

perl regex for matching multiline calls to c function

I'm looking to have a regex to match all potentially multiline calls to a variadic c function. The end goal is to print the file, line number, and the fourth parameter of each call, but unfortunately I'm not there yet. So far, I have this:
perl -ne 'print if s/^.*?(func1\s*\(([^\)\(,]+||,|\((?2)\))*\)).*?$/$1/s' test.c
with test.c:
int main() {
func1( a, b, c, d);
func1( a, b,
c, d);
func1( func2(), b, c, d, e );
func1( func2(a), b, c, d, e );
return 1;
}
-- which does not match the second call. The reason it doesn't match is that the s at the end of the expression allows . to match newlines, but doesn't seem to allow [..] constructs to match newlines. I'm not sure how to get past this.
I'm also not sure how to reference the fourth parameter in this... the $2, $3 do not get populated in this (and even if they did I imagine I would get some issues due to the recursive nature of the regex).
This should catch your functions, with caveats
perl -0777 -wnE'@f = /(func1\s*\( [^;]* \))\s*;/xg; s/\s+/ /g, say for @f' tt.c
I rely on the fact that a statement must be terminated by ;. This assumes there is no stray ; inside the call (for example in a comment), and it does not handle calls to func1 nested inside another call. If either is possible, then quite a bit more needs to be done to parse it.
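The same idea (read the whole file as one string, then match from func1( up to the terminating ;) can be sketched in C++ with std::regex. This is only an illustration under the same caveats; the function name find_func1_calls is hypothetical. A negated class such as [^;] matches newlines, so multiline calls are found once the input is no longer processed line by line.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every "func1( ... )" that is followed by ';',
// with runs of whitespace (including newlines) collapsed to one space.
// Assumes no ';' appears inside the call itself.
std::vector<std::string> find_func1_calls(const std::string& source) {
    static const std::regex call(R"((func1\s*\([^;]*\))\s*;)");
    static const std::regex spaces(R"(\s+)");
    std::vector<std::string> calls;
    for (std::sregex_iterator it(source.begin(), source.end(), call), end;
         it != end; ++it) {
        // Group 1 is the call without the trailing semicolon.
        calls.push_back(std::regex_replace((*it)[1].str(), spaces, " "));
    }
    return calls;
}
```

Given a source string holding the first two calls from test.c, this should return two entries, the second one joined onto a single line.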
However, further parsing the captured calls, presumably by commas, is complicated by the fact that a nested call may well, and realistically, contain commas. How about
func1( a, b, f2(a2, b2), c, f3(a3, b3), d );
This becomes a far more interesting little parsing problem. Or, how about macros?
Can you clarify what kinds of things one doesn't have to account for?
As the mentioned caveats may be possible to ignore, here is a way to parse the argument list, using Text::Balanced.
Since we need to extract whole function calls if they appear as an argument, like f(a, b), the most suitable function from the library is extract_tagged. With it we can make the opening tag be a word-left-parenthesis (\w+\() and the closing one a right-parenthesis \).
This function extracts only the first occurrence, so it is wrapped in extract_multiple:
use warnings;
use strict;
use feature 'say';
use Text::Balanced qw(extract_multiple extract_tagged);
use Path::Tiny; # path(). for slurp
my $file = shift // die "Usage: $0 file-to-parse\n";
my @functions = path($file)->slurp =~ /( func1\( [^;]* \) );/xg;
s/\s+/ /g for @functions;
for my $func (@functions) {
my ($args) = $func =~ /func1\s*\(\s* (.*) \s*\)/x;
say $args;
my @parts = extract_multiple( $args, [ sub {
extract_tagged($_[0], '\\w+\\(', '\\\)', '.*?(?=\w+\()') # $_[0] is the text handed in by extract_multiple
} ] );
my @arguments = grep { /\S/ } map { /\(/ ? $_ : split /\s*,\s*/ } @parts;
s/^\s*|\s*\z//g for @arguments;
say "\t$_" for @arguments;
}
The extract_multiple returns parts with the (nested) function calls alone (identifiable by having parens), which are arguments as they stand and what we sought with all this, and parts which are strings with comma-separated groups of other arguments, that are split into individual arguments.
Note the amount of escaping in extract_tagged (found by trial and error)! This is needed because those strings are twice double-quoted in a string-eval. That isn't documented at all, so see the source.
Or directly produce escape-hungry characters (\x5C for \), which then need no escaping
extract_tagged($_[0], "\x5C".'w+'."\x5C(", '\x5C)', '.*?(?=\w+\()')
I don't know which I'd call "clearer"
I tested on the file provided in the question, to which I added a function
func1( a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e );
For each function the program prints the string with the argument list to parse and the parsed arguments, and the most interesting part of the output is for the above (added) function
[ ... ]
a, b, f2(a2, f3(a3, b3), b2), c, f4(a4, b4), d, e
a
b
f2(a2, f3(a3, b3), b2)
c
f4(a4, b4)
d
e
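If pulling in a parsing library is not an option, the argument splitting itself can be sketched by hand: walk the argument string, track parenthesis depth, and split only on commas at depth zero. The C++ below is a rough illustration of that technique, not a port of the Perl above; the names trim and split_top_level_args are made up, and it assumes no parentheses or commas hide inside string literals or comments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: splits "a, f(b, c), d" into {"a", "f(b, c)", "d"}.
// Assumes balanced parentheses and no string literals or comments.
std::vector<std::string> split_top_level_args(const std::string& args) {
    std::vector<std::string> out;
    std::string cur;
    int depth = 0;
    for (char c : args) {
        if (c == '(') ++depth;
        else if (c == ')') --depth;
        if (c == ',' && depth == 0) {
            out.push_back(trim(cur));  // a top-level comma ends an argument
            cur.clear();
        } else {
            cur += c;                  // nested commas stay with their call
        }
    }
    if (!trim(cur).empty()) out.push_back(trim(cur));
    return out;
}
```

With the argument list from the added test function, index 3 of the result is the fourth parameter the question asks for.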
Not Perl but perhaps simpler:
$ cat >test2.c <<'EOD'
int main() {
func1( a, b, c, d1);
func1( a, b,
c, d2);
func1( func2(), "quotes\"),(", /*comments),(*/ g(b,
c), "d3", e );
func1( func2(a), b, c, d4(p,q,r), e );
func1( a, b, c, func2( func1(a,b,c,d5,e,f) ), g, h);
return 1;
}
EOD
$ cpp -D'func1(a,b,c,d,...)=SHOW(__FILE__,__LINE__,d)' test2.c |
grep SHOW
SHOW("test2.c",2,d1);
SHOW("test2.c",3,d2)
SHOW("test2.c",5,"d3")
SHOW("test2.c",7,d4(p,q,r));
SHOW("test2.c",8,func2( SHOW("test2.c",8,d5) ));
$
As the final line shows, a bit more work is needed if the function can take itself as an argument.

How can I use regular expressions to find all lines of source code defining default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
exactly one left parenthesis "("
one or more characters other than "="
exactly one equals sign "="
one character that is not an equals sign
one or more characters other than a right parenthesis ")"
a right parenthesis ")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
It looks for word characters (the param type), space(s), word characters (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
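To try the pattern against the lines from the question, a small C++ check could look like the sketch below. The function name has_default_argument is hypothetical, and std::regex with its default ECMAScript grammar stands in for whatever regex engine you actually use.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the line contains a parenthesised list in
// which a "type name =" sequence appears, per the pattern above.
bool has_default_argument(const std::string& line) {
    static const std::regex pattern(R"(\([^)]*\w+\s+\w+\s*=[^),][^)]*\))");
    return std::regex_search(line, pattern);
}
```

Called on int sum(int a, int b=10, int c=20); it reports a match, and on if (x = 10) { it does not. Since the pattern expects a type word before the parameter name, a Python-style def f(a, b=10): is not matched either.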
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.

Replace single backslash with double in a string c++

I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you can use regex:
#include <iostream>
#include <regex>
#include <string>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace, you can replace a substring with another substring based on the substring's position.
However, there is no version that performs a substitution, i.e. replaces all occurrences of a given substring with another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.
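If you would rather not depend on boost, a hand-written loop over std::string::find and std::string::replace does the job. The sketch below is one possible version; replace_all here is a hypothetical helper of my own, not the boost function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the standard library): returns a copy
// of s with every occurrence of `from` replaced by `to`.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return s;  // nothing to search for; avoids looping forever
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.length(), to);
        pos += to.length();      // continue after the inserted text
    }
    return s;
}
```

For the path in the question, replace_all("d:\\test\\text.txt", "\\", "\\\\") turns each single backslash into two. Advancing pos past the inserted text matters here, because the replacement itself contains the search string.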

How do you call a batch file with an argument that has quotes, using system()

For example, in the command line this works (the 1st argument has quotes but the 2nd argument doesn't):
"test.bat" "a" b
i.e. it knows that "a" is the 1st argument and b is the second
but using system() it doesn't work:
system("test.bat" "a" b)
this also doesn't work:
system("test.bat" \"a\" b)
This is gonna be simplest if we use a raw string literal. A raw string literal is a way of writing a string in C++ where nothing gets escaped. Let's look at an example:
char const* myCommand = R"(test.bat "a" b)";
The R at the beginning indicates that it's a raw string literal, and if you call system(myCommand), it will be exactly equivalent to typing
$ test.bat "a" b
into the command line. Now, suppose you want to escape the quotes on the command line:
$ test.bat \"a\" b
With a raw string literal, this is simple:
char const* myCommand = R"(test.bat \"a\" b)";
system(myCommand);
Or, alternatively:
system(R"(test.bat \"a\" b)");
Hope this helps!
A bit more info on raw string literals: Raw string literals are a great feature, and they basically allow you to copy+paste any text directly into your program. They begin with R, followed by a quote and an opening parenthesis. Only the stuff inside the parentheses gets included. Examples:
using std::string;
string a = R"(Hello)"; // a == "Hello"
Begin and end with "raw":
string b = R"raw(Hello)raw"; // b == "Hello"
Begin and end with "foo"
string c = R"foo(Hello)foo"; // c == "Hello"
Begin and end with "x"
string d = R"x(Hello)x"; // d == "Hello"
The important thing is that we begin and end the literal with the same string of letters (called the delimiter), followed by the parenthesis. This ensures we never have a reason to escape something inside the raw string literal, because we can always change the delimiter so that it's not something found inside the string.
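Here is a small sketch of a case where the delimiter actually matters. The command text and the function names raw_command and escaped_command are made up for illustration. The text contains the sequence )" which would end a plain R"(...)" literal early, so a custom delimiter (cmd) is used instead.

```cpp
#include <cassert>
#include <string>

// Raw literal with the delimiter "cmd": only )cmd" ends it, so the )"
// inside the text is kept as ordinary characters.
std::string raw_command() {
    return R"cmd(echo "(done)" > out.txt)cmd";
}

// The same text written as an ordinary literal with escaped quotes.
std::string escaped_command() {
    return "echo \"(done)\" > out.txt";
}
```

Both functions return the same string, so either spelling could be handed to system().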
I got it to work now:
system(R"(C:\"to erase\test.bat" "a")");
I found the answer: system("test.bat" ""a"" b);
or more precisely: system("\"test.bat\" ""a"" b");
So the answer is to escape the quotes with a double quote

Perl regex search and replace with $variable->function("args")

Basically I'm trying to replace with whatever is returned from a function call from an object. But I need a return value from the regex search as the argument. It's a little tricky but the code should speak for itself:
while ( $token =~ s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", qw($1))/e ) { }
The error I'm getting is that $1 is not getting evaluated to anything (the argument literally becomes "$1") so it screws up my getValue() method.
Cheers
The qw() operator quotes "words", i.e. it splits a string at whitespace and returns the resulting list. It does not interpolate, so qw($1) yields the literal string '$1'.
You can just use the variable "as is":
s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", $1)/e
The qw() function is very different from
q(abc) (<=> 'abc'),
qq(abc) (<=> "abc"), and
qx(abc) (<=> `abc`) or
qr(abc) (<=> m/abc/):
qw(a b c) <=> ('a', 'b', 'c')