Perl regex replacement string special variable - regex

I'm aware of the match, prematch, and postmatch predefined variables. I'm wondering if there is something similar for the evaluated replacement part of the s/// operator.
This would be particularly useful in dynamic expressions so they don't have to be evaluated a 2nd time.
For example, I currently have %regexs which is a hash of various search and replace strings.
Here's a snippet:
while (<>) {
foreach my $key (keys %regexs) {
while (s/$regexs{$key}{'search'}/$regexs{$key}{'replace'}/ee) {
# Here I want to do something with just the replaced part
# without reevaluating.
}
}
print;
}
Is there a convenient way to do it? Perl seems to have so many convenient shortcuts, and it seems like a waste to have to evaluate twice (which appears to be the alternative).
EDIT: I just wanted to give an example: $regexs{$key}{'replace'} might be the string '"$2$1"' thus swapping the positions of some text in the string $regexs{$key}{'search'} which might be '(foo)(bar)' - thus resulting in "barfoo". The second evaluation that I'm trying to avoid is the output of $regexs{$key}{'replace'}.

Instead of using string eval (which I assume is what's going on with s///ee), you could define code references to do the work. Those code references can then return the value of the replacement text. For example:
use strict;
use warnings;
my %regex = (
digits => sub {
my $r;
return unless $_[0] =~ s/(\d)(\d)_/$r = $2.$1/e;
return $r;
},
);
while (<DATA>){
for my $k (keys %regex){
while ( my $replacement_text = $regex{$k}->($_) ){
print $replacement_text, "\n";
}
}
print;
}
__END__
12_ab_78_gh_
34_cd_78_yz_

I'm pretty sure there isn't any direct way to do what you're asking, but that doesn't mean it's impossible. How about this?
{
my $capture;
sub capture {
$capture = $_[0] if #_;
$capture;
}
}
while (s<$regexes{$key}{search}>
<"capture('" . $regexes{$key}{replace}) . "')">eeg) {
my $replacement = capture();
#...
}
Well, except to do it really properly you'd have to shoehorn a little more code in there to make the value in the hash safe inside a singlequotish string (backslash singlequotes and backslashes).

If you do the second eval manually you can store the result yourself.
my $store;
s{$search}{ $store = eval $replace }e;

why not assign to local vars before:
my $replace = $regexs{$key}{'replace'};
now your evaluating once.

Related

perl: refactor s/.../.../g -> while {}?

I've got a monstrous eval-substitution; here's a simplified version
$ perl -wpe 's#(for )(\w+)#$1 . "user " . qx/id $2/#ge'
which replaces e.g.
Stats for root are bad
Stats for user uid=0(root) gid=0(root) groups=0(root)
are bad
Is there an idiom to turn the s/.../.../g into a loop? Something like
while (m#(for )(\w+)#) {
# explicitly replace match with expression computed over several LOCs
}
Or maybe somehow use map()?
This idiom is to use s///eg. It undeniably better than the alternative you are seeking.
s{pat}{ repl }eg;
is equivalent to
my $out = '';
my $last_pos = 0;
while (m{pat}g) {
$out .= substr($_, $last_pos, $-[0] - $last_pos) . do { repl };
$last_pos = $+[0];
}
$_ = $out . substr($_, $last_pos);
Because you hinted that there would be more than one statements to be executed in the replacement expression, I'd write your code as follows:
s{for \K(\w+)}{
...
...
}eg;
The advantage of curlies is that they can be nested.

How can I match these function calls, and extract the nmae of the function and the first argument?

I am trying to parse an array of elements. Those who match a pattern like the following:
Jim("jjanson", Customer.SALES);
I want to create a hash table like Jim => "jjanson"
How can I do this?
I can not match the lines using:
if($line =~ /\s*[A-Za-z]*"(.*),Customer.*\s*/)
You're not matching either the '(' after the name, nor the ' ' after the comma, before "Customer.".
I can get 'jjanson"' using this expression:
/\s*[A-Za-z]\(*"(.*), Customer.*\s*/
But I assume you don't want jjanson", so we need to modify it like so. (I tend to include the negative character class when I'm looking for simply-delimited stuff. So, in this case I'll make the expression "[^"]*"
/\s*[A-Za-z]\(*"([^"]+)", Customer.*\s*/
Also, I try not to depend upon whitespace, presence or number, I'm going to replace the space with \s*. That you didn't notice that you skipped the whitespace is a good illustration of the need to say "ignore a bunch of whitespace".
/\s*[A-Za-z]\(*"([^"]+)",\s*Customer.*\s*/
Now it's only looking for the sequence ',' + 'Customer' in the significant characters. Functionally, the same, if more flexible.
But since you only do one capture, I can't see what you'd map to what. So I'll do my own mapping:
my %records;
while ( my $line = $source->()) { # simply feed for a source of lines.
my ( $first, $user, $tag )
= $line = m/\s*(\p{Alpha}+)\s*\(\s*"([^"]+)",\s*Customer\.(\S+?)\)\/
;
$records{ $user }
= { first => $first
, username => $user
, tag => $tag
};
}
This is much more than you would tend to need in a one-off, quick solution. But I like to store as much of my input as seems significant.
Note that Jim("jjanson", Customer.SALES); matches the syntax of a function call with two arguments. You can thus abuse string eval:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;
my $info = extract_first_arg(q{ Jim("jjanson", Customer.SALES);} );
print Dump $info;
sub extract_first_arg {
my $call = shift;
my ($name) = ($call =~ m{ \A \s* (\w+) }x);
unless ($name) {
warn "Failed to find function name in '$call'";
return;
}
my $username = eval sprintf(q{
package My::DangerZone;
no strict;
local *{ %s } = sub { $_[0] };
%s
}, $name, $call);
return { $name => $username };
}
Output:
---
Jim: jjanson
Or, you can abuse autoloading:
our $AUTOLOAD;
print Dump eval 'no strict;' . q{ Jim("jjanson", Customer.SALES); };
sub AUTOLOAD {
my ($fn) = ($AUTOLOAD =~ /::(\w+)\z/);
return { $fn => $_[0] };
}
I would not necessarily recommend using these methods, especially on input that is not in your control, and in a situation where this script has access to sensitive facilities.
On the other hand, I have, in the right circumstances, utilized this kind of thing in transforming one given set of information into something that can be used elsewhere.
Try this:
$line = 'Jim("jjanson", Customer.SALES)';
my %hashStore = (); #Jim("jjanson"
if($line=~m/^\s*([^\(\)]*)\(\"([^\"]*)\"/g) { $hashStore{$1} = $2; }
use Data::Dumper;
print Dumper \%hashStore;
Output:
$VAR1 = {
'Jim' => 'jjanson'
};

Breaking out of loop causes different results [duplicate]

This question already has answers here:
Why doesn't Perl's each() iterate through the entire hash the second time?
(2 answers)
Closed 7 years ago.
I have this code:
#!/usr/bin/perl
use strict;
use warnings;
my $judge_exes = {
"^A" => "foo",
"^B" => "bar",
"^C" => "baz",
};
sub get_judge {
my ($test_id) = #_;
my $exe = undef;
while (my ($regex, $judge) = each %$judge_exes) {
if ($test_id =~ /$regex/) {
$exe = $judge;
last;
}
}
if ($exe) {
return $exe;
} else {
return "Undefined!";
}
}
print get_judge("A1");
print get_judge("B2");
print get_judge("C3");
(ideone: http://ideone.com/slxebG)
I expect to get the output foobarbaz, but I end up getting fooUndefined!baz. However, when I comment out the last statement, I get the correct behavior:
#!/usr/bin/perl
use strict;
use warnings;
my $judge_exes = {
"^A" => "foo",
"^B" => "bar",
"^C" => "baz",
};
sub get_judge {
my ($test_id) = #_;
my $exe = undef;
while (my ($regex, $judge) = each %$judge_exes) {
if ($test_id =~ /$regex/) {
$exe = $judge;
# last;
}
}
if ($exe) {
return $exe;
} else {
return "Undefined!";
}
}
print get_judge("A1");
print get_judge("B2");
print get_judge("C3");
(ideone: http://ideone.com/QJpxbK)
Why is this happening? (I'm on Perl 5.16.2, but the issue is also present on 5.10.1 and whatever ideone is using.)
As far as I understand, last just breaks out of the while loop, which is what I want.
$exe doesn't seem to be an alternative falsey value causing me to hit the wrong if-branch. (I could return early, which would be better, but I still wouldn't understand the cause of this.)
I think I'm dereferencing the hash reference correctly.
I didn't think that regex matching would have any side effects relating to loop termination.
The regexes seem to be matching correctly, since I can actually get them to match appropriately under some circumstances, so I don't think it's an interpolation issue.
I'm not modifying the container that I'm iterating over.
Am I just making some silly non-Perl related mistake?
You have become confused. I am not sure where the confusion lies, but you must be aware that the each operator maintains its state between calls. That means your last inside the while loop won't terminate the iteration. Instead it will continue where it last left off when it is next encountered.
You should also be sparing with double-quotes, as they interpolate any scalar or array variables, or backslashed control characters.
Here is how I suggest that you write your algorithm
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $judge_exes = {
'^A' => 'foo',
'^B' => 'bar',
'^C' => 'baz',
};
say get_judge('A1', $judge_exes);
say get_judge('B2', $judge_exes);
say get_judge('C3', $judge_exes);
say get_judge('D4', $judge_exes);
say get_judge('E5', $judge_exes);
sub get_judge {
my ($test_id, $judges) = #_;
for my $re ( keys %$judges ) {
return 1 if $test_id =~ /$re/;
}
'Undefined!';
}
output
1
1
1
Undefined!
Undefined!
Oh, it looks like the behavior is explained here:
Using keys %hash in scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if you use last to exit a loop early so that when you re-enter it, the hash iterator has been reset.
Wasn't expecting that — I had assumed that each would reset the iterator.

Matching Regex in Perl

my program contains ascii.txt to match patterns from it.
my program is to implement sed command, just to try write perl code because I am studying perl.
#!/usr/bin/perl
# sed command implementation
use strict;
use warnings;
use subs qw(read_STDIN read_FILE usage);
use IO::File;
use constant {
SEARCH_PRINT => 0,
};
our $proj_name = $0;
main(#ARGV);
sub main
{
if(scalar #_ == 2) {
read_FILE #_;
}
else {
usage
}
}
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
search_print_lines($sed_script, $file_name);
}
}
sub parse_sed_script {
my $command = shift or return;
if($command =~ /^\/([^\/].)*\/$/) {
return SEARCH_PRINT;
}
}
sub search_print_lines {
my ($script, $file) = #_;
my $fh = IO::File->new($file, "r") or error("no file found $file");
while( $_ = $fh->getline ) {
print if $_ =~ $script
}
}
sub usage {
message("Usage: $proj_name sed-script [file]")
}
sub error
{
my $e = shift || 'unkown error';
print("$0: $e\n");
exit 0;
}
When I execute from the shell: sed.pl /Test/ ascii.txt
I found that print if $_ =~ $script, doesn't execute because of the REGEX is stored in scalar variable
the ascii.txt contains.
Test 1
REGEX TEST
When I use print $script in search_print_lines subroutine it prints the regex sent by the user
When you pass something in on the command line and use it in your script, the entire literal text is used. So if you pass in /Test/, it will see those slashes as literals, so the "real" regular expression it's looking at is something like \/Test\/ (escaping the slashes, because now it's looking for them. Try passing in the regex without the // surrounding it.
If your goal is to allow the // to show that it's a regular expression, I would remove them when the program starts.
One more edit: If you want to be able to pass in flags, you'd need to eval the input somehow.
$script = '/Test/i';
eval { "\$regex = $script" };
and then
"REGEX TEST" =~ $regex
should return true. Doing an eval like this is highly insecure, though.
edit: what happens in eval is that whatever's in the block is executed. So what happens in the eval above is that you're dynamically creating a regular expression and setting it to a variable. That allows you to use regular expression flags like i without having to do any special parsing of the command-line input. When the eval is executed, it will be as if you had typed in $regex = /Test/i. Then you can compare your text to $regex and it will work. I thought about this because your example would not work unless you had the i flag set to make the comparison case-insensitive.
You didn't remove the slashes from $sed_script variable. After I modified your read_FILE function, it started to work:
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
$sed_script =~ s/^\/(.*)\/$/$1/;
#you can also parse the regexp
#$sed_script = qr/$sed_script/;
search_print_lines($sed_script, $file_name);
}
}

Using Perl, how can I build a dynamic regexp by passing in an argument to a subroutine?

I would like to create subroutine with a dynamically created regxp. Here is what I have so far:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
&theSub($_);
}
sub theSub {
my $int = #_;
my $var2 = $var =~ m/(??{$int})/;
print "$var2\n";
}
It looks like it will work, but it seems that once the $int in the regex gets evaluated for the first time, it's there forever.
Is there anyway to do something similar to this, but have the regex pick up the new argument each time the sub is called?
The easiest way to fix your code is to add parentheses around my, and remove ??{. Here is the fixed program:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
theSub($_);
}
sub theSub {
my($int) = #_;
my($var2) = $var =~ m/($int)/;
print "$var2\n";
}
One of the problematic lines in your code was my $int = #_, which was equivalent to my $int = 1, because it evaluated #_ in scalar context, yielding the number of elements in #_. To get the first argument of your sub, use my($int) = #_;, which evaluates #_ in list context, or fetch the first element using my $int = $_[0];, or fetch+remove the first element using my $int = shift;
There was a similar problem in the my $var2 = line, you need the parentheses there as well to evaluate the regexp match in list context, yielding the list of ($1, $2, ...), and assigning $var2 = $1.
The construct (??{...}) you were trying to use had the opposite effect to what you wanted: (among doing other things) it compiled your regexp the first time it was used for matching. For regexps containing $ or #, but not containing ??{...}, Perl recompiles the regexp automatically for each match, unless you specify the o flag (e.g. m/$int/o).
The construct (??{...}) means: use Perl code ... to generate a regexp, and insert that regexp here. To get more information, search for ??{ on http://perldoc.perl.org/perlre.html . The reason why it didn't work in your example is that you would have needed an extra layer of parentheses to capture $1, but even with my ($var2) = $var =~ m/((??{$int}))/ it wouldn't have worked, because ??{ has an undocumented property: it forces the compilation of its argument the first time the regexp is used for matching, so my ($var2) = $var =~ m/((??{$int + 5}))/ would have always matched 6.
my $int = #_;
This will give you the count of parameters, always '1' in your case.
I think you want
my $int = shift;
To dynamically pass a regexp to a function, rather than dynamically build it in the function, use qr//.
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
&theSub(qr/$int/);
}
sub theSub {
my($regexp) = #_;
my($var2) = ($var =~ $regexp);
print "$var2\n";
}
qr// accepts the same trailing arguments that m// does: i, m, s, and x
my $int is the scalar context, he has ($int) for the list context and that puts $_[0] into $int. In the following only 10 is put into $int and the rest 11 to 99 are lost.
my ($int)=(10..99);
print $int;
10