Keep track of matches and check against condition - regex

I have $entire_line = "if varC > 0: varB = varC + 2"
I would like my regex to find the following: varC, varB, varC in the $entire_line
These matches then need to be checked to see whether they exist in a HashMap. If so, a $ should be prepended to the match.
Hence the output should be:
"if $varC > 0: $varB = $varC + 2"
NOTE: 0 and 2 don't appear in the HashMap.
Currently, I have:
$entire_line =~ s/(\w+)/\$$1/g if (exists($variable_hash{$1}));
However, this does not work as intended as the $1 in exists($variable_hash{$1}) does not refer to the previous regex: $entire_line =~ s/(\w+)/\$$1/g
Is there a proper way to go about this?
Thanks for your help.

Use the /e modifier and put the code into the replacement part:
$entire_line =~ s/(\w+)/exists $variable_hash{$1} ? $variable_hash{$1} : $1/ge;
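If the goal is the output shown in the question (a $ prefix rather than the stored hash value), the same /e idea could be adapted roughly as follows. This is only a sketch: prefix_known is a hypothetical helper name, and the hash contents are assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix every word that is a key of %$known with '$'.
sub prefix_known {
    my ($line, $known) = @_;
    # With /e the replacement is evaluated as Perl code for each match,
    # so $1 always refers to the word that was just matched.
    $line =~ s/(\w+)/exists $known->{$1} ? "\$$1" : $1/ge;
    return $line;
}

# Assumed hash contents; 0 and 2 are not keys, so they stay untouched.
my %variable_hash = (varB => 1, varC => 1);
print prefix_known('if varC > 0: varB = varC + 2', \%variable_hash), "\n";
# prints: if $varC > 0: $varB = $varC + 2
```

Words that are not keys of the hash are put back unchanged by the else branch of the ternary.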

If I understood your question correctly and you don't need to substitute variable values (as in @choroba's answer) but only prefix known variables with a $ character, and if %variable_hash is not very large, how about joining all of its keys with a | character to get a regex that matches all known variables?
my %variable_hash = (
varA => 1,
# varB => 1, # commented out to check that it will not be replaced
varC => 1,
);
my $entire_line = "if varC > 0: varB = varC + 2;";
my $key_regex = join('|', map { quotemeta $_; } keys %variable_hash);
# $key_regex will contain "varA|varC"
$entire_line =~ s/\b($key_regex)\b/\$$1/g;
# prefix all matching substrings with $ character
print "$entire_line\n";
Also check my comment on @choroba's answer.

Related

How to remove an ID from a string

I have a string that looks like this, they are ids in a table:
1,2,3,4,5,6,7,8,9
If someone deletes something from the database, I need to update the string. I know that the code below removes the value, but not the comma. How can I check whether the id has a comma before and after it, so that my string doesn't break?
$new_values = $original_values[0];
$new_values =~ s/$car_id//;
Result: 1,2,,4,5,6,7,8,9 using the above sample (bad). It should be 1,2,4,5,6,7,8,9.
To remove the $car_id from the string:
my $car_id = 3;
my $new_values = q{1,2,3,4,5,6,7,8,9};
$new_values = join q{,}, grep { $_ != $car_id }
split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9
If you already removed the id(s), and you need to remove the extra commas, reformat the string like so:
my $new_values = q{,,1,2,,4,5,6,7,8,9,,,};
$new_values = join q{,}, grep { /\d/ } split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9
You can use
s/^$car_id,|,$car_id\b//
Details
^ - start of string
$car_id - variable value
, - comma
| - or
, - comma
$car_id - variable value
\b - word boundary.
If $car_id may contain regex metacharacters, quote it with \Q...\E:
s/^\Q$car_id\E,|,\Q$car_id\E\b//
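To see how the quoted pattern behaves at the start, middle and end of the list, a small sketch may help. remove_id is a hypothetical helper name and the sample lists are made up. Note that a list holding only the id itself (no commas) would be left unchanged by this pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one id from a comma-separated list.
sub remove_id {
    my ($list, $id) = @_;
    # Either the id at the very start followed by a comma,
    # or a comma followed by the id at a word boundary.
    $list =~ s/^\Q$id\E,|,\Q$id\E\b//;
    return $list;
}

print remove_id('1,2,3,4', 1), "\n";   # first id  -> 2,3,4
print remove_id('1,2,3,4', 3), "\n";   # middle id -> 1,2,4
print remove_id('1,2,3,4', 4), "\n";   # last id   -> 1,2,3
print remove_id('1,11,21', 1), "\n";   # 11 and 21 survive -> 11,21
```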
Another approach is to store an extra leading and trailing comma (,1,2,3,4,5,6,7,8,9,)
The main benefit is that it makes it easier to search for the id using SQL (since you can search for ,$car_id,). Same goes for editing it.
On the Perl side, you'd use
s/,\K\Q$car_id\E,// # To remove
substr($_, 1, -1) # To get actual string
Ugly way: use regex to remove the value, then simplify
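$new_values = $original_values[0];
$new_values =~ s/$car_id//;
$new_values =~ s/,+/,/g; # collapse each run of commas into one
$new_values =~ s/^,|,$//g; # drop a comma left at either end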
Nice way: split and merge
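$new_values = $original_values[0];
my @values = split(/,/, $new_values);
my $index = 0;
# stop at the end of the list if the id is not present
$index++ until $index > $#values or $values[$index] eq $car_id;
splice(@values, $index, 1) if $index <= $#values;
$new_values = join(',', @values);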

Perl hash substitution with special characters in keys

My current script will take an expression, ex:
my $expression = '( a || b || c )';
and go through each boolean combination of inputs using sub/replace, like so:
my $keys = join '|', keys %stimhash;
$expression =~ s/($keys)\b/$stimhash{$1}/g;
So for example expression may hold,
( 0 || 1 || 0 )
This works great.
However, I would like to allow the variables (also in %stimhash) to contain a tag, *.
my $expression = '( a* || b* || c* )';
Also, printing the keys of the stimhash returns:
a*|b*|c*
It is not properly substituting/replacing with the extra special character, *.
It gives this warning:
Use of uninitialized value within %stimhash in substitution iterator
I tried using quotemeta() but did not have good results so far.
It will drop the values. An example after the substitution looks like:
( * || * || * )
Any suggestions are appreciated,
John
Problem 1
You use the pattern a* thinking it will match only a*, but a* means "0 or more a". You can use quotemeta to convert text into a regex pattern that matches that text.
Replace
my $keys = join '|', keys %stimhash;
with
my $keys = join '|', map quotemeta, keys %stimhash;
Problem 2
\b
is basically
(?<!\w)(?=\w)|(?<=\w)(?!\w)
But * (like the space) isn't a word character. The solution might be to replace
s/($keys)\b/$stimhash{$1}/g
with
s/($keys)(?![\w*])/$stimhash{$1}/g
though the following makes more sense to me
s/(?<![\w*])($keys)(?![\w*])/$stimhash{$1}/g
Personally, I'd use
s{([\w*]+)}{ $stimhash{$1} // $1 }eg
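A runnable sketch of that last form, for anyone who wants to try it. The helper name substitute_stim and the hash values are assumptions for illustration; the // (defined-or) operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # for // (defined-or) and say

# Hypothetical helper: replace every token made of word characters
# and '*' with its value in %$stim, leaving unknown tokens as they are.
sub substitute_stim {
    my ($expression, $stim) = @_;
    $expression =~ s{([\w*]+)}{ $stim->{$1} // $1 }eg;
    return $expression;
}

# Assumed stimulus values for one boolean combination.
my %stimhash = ('a*' => 0, 'b*' => 1, 'c*' => 0);
say substitute_stim('( a* || b* || c* )', \%stimhash);
# prints: ( 0 || 1 || 0 )
```

Because the whole token is looked up as a hash key, no quotemeta or word-boundary handling is needed here.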

Perl Replace 26 characters with numeric

I would like to replace a string with the numerical correspondent.
For example (one-liner on Windows):
perl -e "$_ = \"abcdefghijklmnopqrstuvwxyz\"; tr\a-z\1-9\;"
The result is:
12345678999999999999999999
This works up to 9, but how can I assign the numeric correspondent after character i?
I would like to know how to map one character to a two-character value,
for example,
j -> 10, k -> 11, etc.
To keep the numerical values distinguishable, it would make sense to emit
"1-", "2-", ... "25-", "26".
perl -E"$_ = 'abcdefghijklmnopqrstuvwxyz'; s/([a-z])/ord($1)-96/ge; say;"
or if you have 5.14+
perl -E"say 'abcdefghijklmnopqrstuvwxyz' =~ s/([a-z])/ord($1)-96/ger;"
You can substitute any rule instead of ord($1) - 96.
I don't believe tr/// can do that unfortunately - it's a one-to-one character substitution. So you're going to have to go the long way round:
my %indicies = map { $_ => (ord($_) - ord('a')) + 1 } ('a' .. 'z');
my $result = join '', map { $indicies{$_} } split(//, $string);
Unfortunately that's not a one-liner.
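If the values should be separated as the question suggests ("1-", "2-", ... "26"), joining on '-' instead of the empty string is one way to do it. A minimal sketch, assuming the input contains only lowercase a to z; letters_to_numbers is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: map a..z to 1..26 and join the numbers with '-'.
# Assumes the input consists only of lowercase ASCII letters.
sub letters_to_numbers {
    my ($string) = @_;
    return join '-', map { ord($_) - ord('a') + 1 } split //, $string;
}

print letters_to_numbers('abc'), "\n";   # prints: 1-2-3
print letters_to_numbers('jkz'), "\n";   # prints: 10-11-26
```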

How to retrieve 1 (from the line M 1 COMPLD) using regexp in Tcl?

set sample "act-user:IMLI:nmss:1::***;
imli 2013-10-21 15:13:54
M 1 COMPLD
;
IMLI 2013-10-21 15:13:54
;
>"
How can I retrieve the 1 from the line M 1 COMPLD using regexp in Tcl?
You need to use a non-default matching mode — line-aware — to make that RE simple:
regexp -line {^M\s+(\d+)\s+COMPLD$} $sample -> value
puts "value = $value"
Alternatively, you can put the option inside the RE itself:
regexp {(?n)^M\s+(\d+)\s+COMPLD$} $sample -> value
puts "value = $value"
The behaviour is exactly equivalent.

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason that I want to do this is because I want to determine the number of capturing parentheses in a regular expression after a successful match. According to the perlop doc, a list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and a list (1), then I cannot tell whether the pattern has no parens or has one paren that matched a '1'. I can resolve that ambiguity if there is a difference between number 1 and string '1'.
You can tell how many capturing groups are in the last successful match by using the special @+ array. $#+ is the number of capturing groups. If that's 0, then there were no capturing parentheses.
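A small sketch of that check; group_count is a hypothetical helper name and the sample patterns are made up. Per perlvar, $#+ reflects the number of groups in the pattern, so an optional group that did not take part in the match is still counted.

```perl
use strict;
use warnings;

# Hypothetical helper: number of capture groups in $re after a
# successful match against $string, or an empty return if it fails.
sub group_count {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    return $#+;   # last index of @+; index 0 is the whole match
}

print scalar group_count('1', qr/1/), "\n";           # prints: 0
print scalar group_count('1', qr/(1)/), "\n";         # prints: 1
print scalar group_count('foo', qr/foo(1)?/), "\n";   # prints: 1
```

This separates "no parens" from "one paren that matched '1'" without looking at the returned list at all.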
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
#!/usr/bin/perl
($b) = ('1' =~ /(1)/);
print isstring($b) ? "string\n" : "int\n";
($b) = ('1' =~ /1/);
print isstring($b) ? "string\n" : "int\n";
sub isstring {
return ($_[0] & ~$_[0]);
}
isstring returns either 0 (as a result of a numeric bitwise op), which is false, or "\0" (as a result of bitwise string ops, see perldoc perlop), which is true as it is a non-empty string.
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem:
You can get the count by looking at the result of the list assignment, which returns the number of items on the right hand side of the list assignment:
my $count = my @array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my @array = $string =~ m/.../g;
my $count = @array;
You can also use the @+ or @- variables, using some of the tricks I show in the first pages of Mastering Perl. These arrays have the starting and ending positions of each of the capture buffers. The values in index 0 apply to the entire pattern, the values in index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = PV(0x30698cc) at 0x3073484
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x3079bb4 "1"\0
CUR = 1
LEN = 4
Note that the dump of $x has a value for the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or numification and populate the other slots. For example, if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 1
PV = 0x3079c5c "1"\0
CUR = 1
LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the captured substrings
Else the match was successful, but there were no captures
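The three cases above can be sketched like this. classify_match is a hypothetical helper name. One caveat to keep in mind: a pattern whose only group is optional and did not participate, such as /foo(1)?/ against "foo", would also be reported as having no captures, because $1 is undefined there too.

```perl
use strict;
use warnings;

# Hypothetical helper that follows the three cases listed above.
sub classify_match {
    my ($string, $re) = @_;
    my @list = $string =~ $re;          # list context: captures, or (1)
    return 'failed' unless @list;       # empty list: the match failed
    return defined $1 ? 'captures' : 'no captures';
}

print classify_match('x', qr/1/), "\n";     # prints: failed
print classify_match('1', qr/(1)/), "\n";   # prints: captures
print classify_match('1', qr/1/), "\n";     # prints: no captures
```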
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
@f = $a =~ /foo/;
and
$a = "foo1";
@f = $a =~ /foo(1)?/;
Since they both return the same thing regardless if a capture was made.
The answer is: Don't try and use the returned array. Check to see if $1 is not equal to ""