Perl regex search and replace with $variable->function("args") - regex

Basically I'm trying to replace with whatever is returned from a function call from an object. But I need a return value from the regex search as the argument. It's a little tricky but the code should speak for itself:
while ( $token =~ s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", qw($1))/e ) { }
The error I'm getting is that $1 is not getting evaluated to anything (the argument literally becomes "$1") so it screws up my getValue() method.
Cheers

The qw() operator quotes "words": it splits a string on whitespace and returns the resulting list. It does not interpolate, so qw($1) yields the literal string '$1'.
You can just use the variable "as is":
s/\$P\(([a-z0-9A-Z_]+)\)/$db->getValue("params", $1)/e
The qw() function is very different from
q(abc) (<=> 'abc'),
qq(abc) (<=> "abc"), and
qx(abc) (<=> `abc`) or
qr(abc) (<=> m/abc/):
qw(a b c) <=> ('a', 'b', 'c')
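For comparison, here is a minimal sketch of the same idea in Python rather than Perl: the replacement is computed by a callback that receives the captured name as an ordinary value, much like using $1 "as is". The FakeDb class and its get_value method are hypothetical stand-ins for the $db object and getValue() from the question.

```python
import re


class FakeDb:
    """Hypothetical stand-in for the $db object in the question."""

    def __init__(self, params):
        self._params = params

    def get_value(self, table, key):
        # Stands in for getValue("params", $1); here just a dict lookup.
        return self._params[key]


def expand(token, db):
    # Analogue of s/\$P\(([a-z0-9A-Z_]+)\)/.../e: the callback receives the
    # match object and passes group 1 through unquoted.
    return re.sub(
        r"\$P\(([a-z0-9A-Z_]+)\)",
        lambda m: db.get_value("params", m.group(1)),
        token,
    )


db = FakeDb({"host": "example.org", "port": "8080"})
result = expand("$P(host):$P(port)", db)
```

The point carried over from the Perl answer is that the captured text is handed to the lookup as a plain variable, with no quoting operator around it.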

Related

Invalid regular expression - Invalid property name in character class

I am running a Fastify server with a TypeScript file that calls a function to make sure people don't send unwanted characters. Here is the function:
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Latin}\p{Zs}\p{M}\p{Nd}\-\'\s]/gu;
function secure(text:string) {
return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But when I try to launch my server, I get an error message:
"Invalid regular expression - Invalid property name in character class".
It used to work just fine with my previous regex :
const SAFE_STRING_REPLACE_REGEXP = /[^0-9a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð\-\s\']/g;
function secure(text:string) {
return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But I have been told it wasn't optimized enough. I have also been told that split/join performs better than regex/replace, but I don't know if I can use it in my case.
You need to use
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
// or
const SAFE_STRING_REPLACE_REGEXP = /[^\p{sc=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
In Unicode property escapes, script names must be prefixed with sc= or Script=, so \p{Latin} should be written as \p{Script=Latin}. See the ECMAScript reference.
Also, when you use the u flag, you cannot escape non-special characters, so do not escape ', and it is better to move the - to the end of the character class.
You can use split&join, too:
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/u;
console.log("Ącki-Łał русский!!!中国".split(SAFE_STRING_REPLACE_REGEXP).join(""))
Note that you don't need the g flag with split; splitting at every match is its default behavior.

How can I use regular expressions to find all lines of source code defining default arguments for a function?

I want to find lines of code which declare functions with default arguments, such as:
int sum(int a, int b=10, int c=20);
I was thinking I would look for:
The first part of the matched pattern is exactly one left-parenthesis "("
The second part of string is one or more of any character excluding "="
exactly one equals-sign "="
a non-equal-sign
one or more characters except right parenthesis ")"
")"
The following is my attempt:
([^=]+=[^=][^)]+)
I would like to avoid matching condition-clauses for if-statements and while-loops.
For example,
int x = 5;
if (x = 10) {
x = 7;
}
Our regex should find functions with default arguments in any one of Python, Java, or C++. Let us not assume that function declarations end with a semicolon or begin with a data type.
Try this:
\([^)]*\w+\s+\w+\s*=[^),][^)]*\)
See live demo.
It looks for word chars (the param type), space(s), word chars (the param name), optional space(s), then an equals sign.
Add ".*" to each end to match the whole line.
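A quick way to check how this pattern behaves is to run it over a few sample lines. The sketch below uses Python's re module with the pattern exactly as given; the sample lines are illustrative and only the first two come from the question.

```python
import re

# The pattern from the answer above, unchanged.
DEFAULT_ARG = re.compile(r"\([^)]*\w+\s+\w+\s*=[^),][^)]*\)")

lines = [
    "int sum(int a, int b=10, int c=20);",  # typed default arguments
    "if (x = 10) {",                        # assignment inside a condition
    "def f(a, b=10):",                      # Python: no type before the name
]

# True where the pattern finds a parameter list with a default value.
matches = [bool(DEFAULT_ARG.search(line)) for line in lines]
```

Because the pattern expects a type followed by a name before the equals sign, it skips the if condition, but it also skips an untyped Python signature such as the third line, so Python code would need a looser variant.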
Please check this one:
\(((?:\w+\s+[\w][\w\s=]*,*\s*){1,})\)
The above expression matches the parameter list and returns it as $1 (Group 1), in case it is needed for further processing.
demo here

How do you call a batch file with an argument that has quotes, using system()

For example, in the command line this works (the 1st argument has quotes but the 2nd argument doesn't):
"test.bat" "a" b
i.e. it knows that "a" is the 1st argument and b is the second
but using system() it doesn't work:
system("test.bat" "a" b)
this also doesn't work:
system("test.bat" \"a\" b)
This is gonna be simplest if we use a raw string literal. A raw string literal is a way of writing a string in C++ (since C++11) where nothing gets escaped. Let's look at an example:
char const* myCommand = R"(test.bat "a" b)";
The R at the beginning indicates that it's a raw string literal, and if you call system(myCommand), it will be exactly equivalent to typing
$ test.bat "a" b
into the command line. Now, suppose you want to escape the quotes on the command line:
$ test.bat \"a\" b
With a raw string literal, this is simple:
char const* myCommand = R"(test.bat \"a\" b)";
system(myCommand);
Or, alternatively:
system(R"(test.bat \"a\" b)");
Hope this helps!
A bit more info on raw string literals: Raw string literals are a great feature, and they basically allow you to copy+paste any text directly into your program. They begin with R, followed by a quote and an opening parenthesis. Only the stuff inside the parentheses gets included. Examples:
using std::string;
string a = R"(Hello)"; // a == "Hello"
Begin and end with "raw":
string b = R"raw(Hello)raw"; // b == "Hello"
Begin and end with "foo"
string c = R"foo(Hello)foo"; // c == "Hello"
Begin and end with "x"
string d = R"x(Hello)x"; // d == "Hello"
The important thing is that we begin and end the literal with the same string of letters (called the delimiter), followed by the parenthesis. This ensures we never have a reason to escape something inside the raw string literal, because we can always change the delimiter so that it's not something found inside the string.
I got it to work now:
system(R"(C:\"to erase\test.bat" "a")");
I found the answer: system("test.bat" ""a"" b);
or more precisely: system("\"test.bat\" ""a"" b");
So the answer is to escape the quotes with a double quote

Why is the flex regex being skipped?

I can't, for the life of me, figure out what's wrong with my regexes.
What I'd like to tokenize are two (2) types of strings, both of which must be contained on a single line. One string can be anything (other than a new line), and the other any alphanumeric (ASCII) character and literal '_', '/', '-', and '.'.
The snippet of flex code is:
nl \n|\r\n|\r|\f|\n\r
...
%%
...
\"[^\"]+{nl} { frx_parser_error("Label is missing trailing double quote."); }
\"[a-zA-Z0-9_\.\/\-]+\" {
if (yyleng > 1024) frx_parser_error("File name too long.");
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_FILENAME: %s\n", frx_parser_lval.str);
return (TOSP_FILENAME);
}
\"[^{nl}]+\" {
yytext[yyleng - 1] = '\0';
frx_parser_lval.str = strdup(yytext+1);
fprintf(stderr,"TOSP_IDENTIFIER:\n%s\n", frx_parser_lval.str);
return (TOSP_IDENTIFIER);
}
And when I run the parser, the fprintf's spit this out:
TOSP_FILENAME: ModStar-Picture-Analysis.txt
TOSP_FILENAME: ModStar-Rubric.log.txt
TOSP_IDENTIFIER:
picture-A"
Progress (26,255) camera 'C' root("picture-C-
Syntax (line 34): syntax error
For whatever reason, the quote after picture-A is being ... missed. Why? I checked the ASCII values for the eight locations where the double quotes appear and they're all 0x22.
If I add some characters to the end of the "picture-A" it can work sometimes; adding ".par", ".pbr" doesn't work as expected, but ".pnr" does.
I've even added a specific non-regexy token:
\"picture-A\" { frx_parser_lval.str = strdup("picture-A"); return TOSP_FILENAME; }
to the lex file and it gets skipped.
I'm using flex 2.5.39, no flex libraries, one option (%option prefix=frx_parser_) in the lex file and the flex command line is:
flex -t script-lexer.l > script-lexer.c
What gives?
EDIT I need to test this on the actual system, but unit tests show this tokenizer to be much more robust (based on rici's answer):
nl \n|\r\n|\r|\f|\n\r
...
%%
...
["][^"]+{nl} { printf("Missing trailing quote.\n%s\n",yytext); }
["][[:alnum:]_./-]+["] { printf("File name:\n%s\n",yytext); }
["][^"]+["] { printf("String:\n%s\n",yytext); }
EDIT The rule ["].+["] swallows consecutive multiple strings as one big string. It was changed to ["][^"]+["]
The problem is your pattern:
\"[^{nl}]+\"
You're attempting to expand a definition inside a character class, but that is not possible; inside a character class, { is always just a {, not a flex operator. See the flex manual:
Note that inside of a character class, all regular expression operators lose their special meaning except escape (‘\’) and the character class operators, ‘-’, ‘]]’, and, at the beginning of the class, ‘^’.
A definition is not a macro. Rather, a definition defines a new regular expression operator.
As a consequence of the above, you can write [^\"] as simply [^"] and \"[a-zA-Z0-9_\.\/\-]+\" as \"[a-zA-Z0-9_./-]+\" (The - needs to be either at the end or at the beginning.) Personally, I'd write the second pattern as:
["][[:alnum:]_./-]+["]
But everyone has their own style.
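The effect can be reproduced outside flex. The sketch below uses Python's re module, not flex, on the assumption that braces inside a character class are treated as literal characters in the same way; the sample text is made up to resemble the input in the question.

```python
import re

# Two quoted strings on separate lines; none of the text contains
# 'n', 'l', '{' or '}'.
text = '"picture-A"\nProgress ("picture-C"'

# Inside [...] the braces are literal, so this class excludes only
# '{', 'n', 'l' and '}'. Quotes and newlines are still allowed.
broken = re.match(r'"[^{nl}]+"', text)

# Excluding the quote and the newline explicitly stops at the first
# closing quote.
fixed = re.match(r'"[^"\n]+"', text)

broken_text = broken.group(0)
fixed_text = fixed.group(0)
```

Here the first pattern runs past the closing quote of "picture-A" and across the newline, while the second stops where expected. That would also be consistent with the observation that appending ".pnr" helped, since n is one of the excluded characters.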

shell script to replace all function calls with another call

I have modified a function, let's say
foo(int a, char b); into foo(int a, char b, float new_var);
The old function foo is called from many places in the existing source code. I want to replace all occurrences of foo(some_a, some_b); with foo(some_a, some_b, 0); using a script (manual editing is laborious).
I tried to do this with sed 's/old/new/g' file, but the problem is using a regex to match the parameters and insert them into the replacement text.
PS: a & b are either valid C variable names OR valid constant values for their types.
Since we are working with C code, we don't need to worry about function overloading.
Under the assumption that only simple variable names and constants are passed into the function call, and that the code compiled before the change in function prototype:
Search with foo(\([^,]*\),\([^,]*\)) and replace with foo(\1,\2,0).
There are a few caveats, though:
The char field must not contain ','. The call will not be replaced if it does.
The char field must not contain ). It will cause a bad replacement.
No other function name may end with foo. The replacement may make a mess of your code if one does.
Search with \(foo([^)]+\) and replace with \1,0 (Thanks to glenn jackman)
There are a few caveats, though:
Neither the int nor the char field may contain ). It will cause a bad replacement.
No other function name may end with foo. As above, the replacement may make a mess of your code if one does.
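The first search-and-replace can be sketched in Python to try it on sample input before touching real files. This is an illustration under the same assumptions as above (simple arguments, no nested parentheses); the \b word boundary is an addition not present in the patterns above, meant to address the caveat about names ending in foo.

```python
import re

# Assumption: both arguments are simple names or constants, with no nested
# parentheses and no commas inside character literals.
CALL = re.compile(r"\bfoo\(([^,()]*),([^,()]*)\)")


def add_default(source):
    # \b keeps to_foo(...) from matching; \1 and \2 re-insert the arguments.
    return CALL.sub(r"foo(\1,\2, 0)", source)


before = "foo(23, 'b'); to_foo(23, 'b'); foo(x, y, 0);"
after = add_default(before)
```

Calls that already have three arguments are left alone, so running it twice should be harmless. Note that it would also rewrite the prototype foo(int a, char b); itself, so declarations need to be excluded or fixed by hand.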
This should do it:
sed -i.bak -r "s/foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\)/foo(\1, \2, 0)/g" file
Note that -i.bak edits the file in place and creates a backup of the original file in file.bak.
foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\) looks for something like foo( + spaces + digits + spaces + , + spaces + [ 'character' or character ] + spaces + ). It replaces the match with foo( + digits + , + character + , 0 + ).
Example
$ cat a
hello foo(23, h);
this is another foo(23, bye);
this is another foo(23, 'b');
and we are calling to_foo(23, 'b'); here
foo(23, 'b'); here
$ sed -i.bak -r "s/foo\(\s*([0-9]*)\s*,\s*(([0-Z]|'[0-Z]'))\s*\)/foo(\1, \2, 0)/g" a
$ cat a
hello foo(23, h, 0);
this is another foo(23, bye);
this is another foo(23, 'b', 0);
and we are calling to_foo(23, 'b', 0); here
foo(23, 'b', 0); here