Reverse regular expression search

I have a string which contains "foo"s followed by numbers. I'm looking to print the number following the last foo in the string. I've been told that the only way to accomplish this is to reverse the string itself. This isn't very elegant, and I'm surprised Perl doesn't have a better way to get the job done. Is there a better way to do it than this?
#!/usr/bin/perl
# works, but is ugly!!!
$string = "foo1 foo3 foo5 bar foo911 baz";
$string = scalar reverse($string);
$string =~ m/(\d+)oof/;
print scalar reverse("$1");

How about:
$string =~ /.*foo(\d+)/;
Clarification:
$string =~ /.* # Match any character zero or more times, greedily.
foo # Match 'foo'
(\d+) # Match and capture one or more digits.
/x;
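The greedy "any character" match first runs to the end of the string and then backtracks only as far as it must for foo(\d+) to match. It therefore swallows all the earlier "foo"s, and you're left matching just the last "foo" that is followed by digits.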
Example:
#!perl -w
use strict;
use 5.010;
my $string = "foo1 foo2 foo3";
$string =~ /.*foo(\d+)/;
say $1;
Output:
% perl regex.pl
3
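One caveat that applies to both the greedy regex and the reverse trick: if the match fails, $1 keeps its value from the previous successful match, so it is worth checking the match before reading $1. A minimal sketch, assuming undef is an acceptable "not found" value; the helper name last_foo_number is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: the number after the last "foo", or undef if there is none.
sub last_foo_number {
    my ($string) = @_;
    # Greedy .* runs to the end, then backtracks until foo(\d+) can match,
    # so the capture comes from the last "foo" followed by digits.
    return $string =~ /.*foo(\d+)/ ? $1 : undef;
}

my $last = last_foo_number("foo1 foo3 foo5 bar foo911 baz");
my $none = last_foo_number("bar baz");
print defined $last ? "$last\n" : "no match\n";
print defined $none ? "$none\n" : "no match\n";
```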

I know you already picked an answer but I thought I would add my $0.02.
Why not do a list context global pattern match and take the last element:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "foo1 foo2 foo3 bar";
my @result = $string =~ /foo(\d+)/g;
print pop(@result) . "\n";
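If you only need the last capture, a list slice can replace the temporary array. A minimal sketch using the same sample string; when nothing matches, slicing the empty list yields an empty list, so the variable is left undef:

```perl
use strict;
use warnings;

my $string = "foo1 foo2 foo3 bar";

# The match runs in list context inside the slice and returns every capture;
# (...)[-1] keeps only the last one.
my ($last) = ($string =~ /foo(\d+)/g)[-1];

# With no matches the slice is an empty list, so $missing stays undef.
my ($missing) = ("bar baz" =~ /foo(\d+)/g)[-1];

print defined $last ? "$last\n" : "no match\n";
```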

Here's another solution, inspired by ysth's comment (useful if the string is very long and the last foo is near the beginning, where the greedy .* would have to backtrack through most of the string): split the line on 'foo' and parse the last element for the numbers:
my @results = split /foo/, $string;
my ($digits) = ($results[-1] =~ m/^(\d+)/);
Again, I would always go with the simplest code until it looked like the code was taking too long (and this was a problem in the overall application), and then I'd benchmark a number of solutions against typical inputs to see which is best.
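Following that advice, a rough sketch of such a benchmark with the core Benchmark module's cmpthese. The test string, the subroutine labels and the one-CPU-second run time (-1) are illustrative assumptions. Note that the approaches are not strictly equivalent: on "foo1 food" both regexes return 1, but the split version returns undef because the last "foo" is not followed by digits.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed worst case for the greedy regex: the only "foo" is near the start
# of a long string, so .* must backtrack through almost all of it.
my $long = "foo42 " . ("x" x 100_000);

# The three approaches from the answers above, wrapped as subs.
my %approach = (
    greedy     => sub { $_[0] =~ /.*foo(\d+)/ ? $1 : undef },
    list_g     => sub { ($_[0] =~ /foo(\d+)/g)[-1] },
    split_last => sub {
        my @parts = split /foo/, $_[0];
        ($parts[-1] =~ /^(\d+)/)[0];
    },
);

# Sanity check before timing: all three should return 42 here.
for my $name (sort keys %approach) {
    my $got = $approach{$name}->($long);
    print "$name: ", (defined $got ? $got : 'undef'), "\n";
}

# Run each approach for at least one CPU second and print a comparison table.
my %timed = map { my $code = $approach{$_}; ($_ => sub { $code->($long) }) } keys %approach;
cmpthese(-1, \%timed);
```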

Related

Limit the translation to just one word in a phrase?

I'm new to Perl, coming from Python, and I wonder if there is a simple way to limit a translation (or a replacement) to just one word in a phrase.
In the example below, the first word is correctly translated to gazelle, but the second word, kind, also gets changed to lind. Is there a simple way to do the translation without resorting to a loop? Thanks.
my $string = 'gazekke is one kind of antelope';
my $count = ($string =~ tr/k/l/);
print "There are $count changes \n";
print $string; # gazelle is one lind of antelope <-- kind becomes lind too!

I don't know of an option for tr to stop translating after the first word.
But you can use a regex with capture groups for this.
use strict;
my $string = 'gazekke is one kind of antelope';
# Match first word in $1 and rest of sentence in $2.
$string =~ m/(\w+)(.*)/;
# Translate all k's to l's in the first word.
(my $translated = $1) =~ tr/k/l/;
# Concatenate the translated first word with the rest
$string = "$translated$2";
print $string;
Outputs: gazelle is one kind of antelope
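If you want to keep tr (and the change count the question prints), another option is to apply tr to just the first word through an lvalue substr, using the @- and @+ match-offset arrays to locate it. A minimal sketch; taking \w+ as the definition of "word" is an assumption:

```perl
use strict;
use warnings;

my $string = 'gazekke is one kind of antelope';
my $count  = 0;

# Find the first word; $-[1] and $+[1] are the start and end offsets of group 1.
if ($string =~ /(\w+)/) {
    my ($start, $len) = ($-[1], $+[1] - $-[1]);
    # substr() used this way is an lvalue, so tr/// changes $string in place
    # and returns how many characters it changed.
    $count = (substr($string, $start, $len) =~ tr/k/l/);
}
print "There are $count changes\n";
print "$string\n";
```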

Without /g, the regex matches only the first match (a word, in this case). Within that word, replace all the wanted characters by running code on the replacement side with /e:
$string =~ s{(\w+)}{ $1 =~ s/k/l/gr }e;
In the regex on the replacement side, the /r modifier makes s/// return the changed string instead of modifying the original. That is also what allows the substitution to run on $1, which is read-only and can't be modified in place.
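A small sketch of the read-only point, assuming Perl 5.14 or later for /r: modifying $1 in place dies, while the /r form from the answer works.

```perl
use strict;
use warnings;

my $string = 'gazekke is one kind of antelope';

# Editing $1 in place dies, because $1 is read-only.
my $error = '';
if ($string =~ /(\w+)/) {
    eval { $1 =~ s/k/l/g; 1 } or $error = $@;
}
print "In-place edit failed: $error" if $error;

# With /r the inner s/// returns a changed copy and never writes to $1.
(my $fixed = $string) =~ s{(\w+)}{ $1 =~ s/k/l/gr }e;
print "$fixed\n";
```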

tr is a character class transliterator. For anything else you would use regex.
$string =~ s/gazekke/gazelle/;
You can put a code block as the second half of s/// (with the /e modifier) to do more complicated replacements or transmogrifications:
$string =~ s{([A-Za-z]+)}{ $should_be_mangled{$1} ? mangler($1) : $1 }ge;
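For a concrete version of the same idea, here is a sketch where the "mangling" is a lookup in a word map. %fix is a hypothetical table; words not in it are put back unchanged:

```perl
use strict;
use warnings;

# Hypothetical map of words to fix; only listed words are changed.
my %fix = (gazekke => 'gazelle');

my $string = 'gazekke is one kind of antelope';

# /e evaluates the replacement as code; words not in %fix are returned as-is.
$string =~ s{(\w+)}{ exists $fix{$1} ? $fix{$1} : $1 }ge;

print "$string\n";
```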
Edit:
Here's how you would first locate a phrase and then work on it.
my $phrase_regex = qr/(?|(gazekke) is one kind of antelope|(etc))/;
$string =~ s{($phrase_regex)}{
    my $match = $1;    # the whole phrase
    my $word  = $2;    # the word captured inside the phrase
    $match =~ s{\Q$word\E}{
        my $new = $new_word_map{$word};
        additional_mangling($new);
        $new;
    }e;
    $match;
}ge;
Here's the Perl regex documentation.
https://perldoc.perl.org/perlre

Why can't I store a regexp in a variable?

Given the following code,
my $string = "foo";
my $regex = s/foo/bar/;
$string =~ $regex;
print $string, "\n";
I would have expected the output to be bar, however it is foo. Why is that the case, and how can I solve that problem?
Note that in my actual case, the regex is more complicated, and I actually want to store several of them in a hash (so I can write something like $string =~ $rules{$key}).

You're looking for a substitution, not just the regex part, so a compiled regex (qr//) alone is not what you need. You can store the whole substitution in a subroutine instead:
use strict;
use warnings;
my $string = "foo";
my $regex = sub { $_[0] =~ s/foo/bar/ };
$regex->($string);
print $string, "\n";
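Since the question ultimately wants a hash of rules, the same idea extends to a table of code refs. A minimal sketch; the rule names and patterns in %rules are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical rule table: each value is a code ref that edits its argument.
# $_[0] is an alias for the caller's variable, so s/// on it changes that variable.
my %rules = (
    foo_to_bar => sub { $_[0] =~ s/foo/bar/ },
    strip_ws   => sub { $_[0] =~ s/^\s+|\s+$//g },
);

my $string = "  foo  ";
$rules{$_}->($string) for qw(foo_to_bar strip_ws);
print "[$string]\n";
```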
Your statement
my $regex = s/foo/bar/
is equivalent to
my $regex = $_ =~ s/foo/bar/
s/// returns the number of substitutions made, or it returns false (specifically, the empty string). So $regex is now '' or 1 (it could be more if the /g modifier was in effect) and
$string =~ $regex
is doing 'foo' =~ // or 'foo' =~ /1/ depending on what $_ contained originally.

You can store a regex pattern in a variable, but in your example the regex is just foo, and there is a lot more going on than just that pattern.
The statement s/foo/bar/ is more complex than it seems. It is a fully fledged statement that applies a regex pattern to a target string and substitutes a replacement string if the pattern is found. In this case the target string is the default variable $_ and the replacement string is bar. You could think of it as a call to a subroutine
substitute($_, 'foo', 'bar')
where the regex pattern is only the second parameter.
What you can do is store a regex pattern. The regex part of that substitution is foo, and you can say
my $pattern = qr/foo/;
s/$pattern/bar/;
But you really should explain the problem you're trying to solve so that we can help you better.
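If the rules really are just "pattern plus fixed replacement", one possible layout, sketched under that assumption, stores a qr// object and a replacement string per key. The keys and patterns are hypothetical, and because the replacement is a plain string, something like '$1' stored in it would not be expanded; that case needs a code ref instead.

```perl
use strict;
use warnings;

# Hypothetical rule table: a compiled pattern and a literal replacement per key.
my %rules = (
    foo => [ qr/foo/, 'bar' ],
    num => [ qr/\d+/, 'N'   ],
);

my $string = "foo 42";
for my $key (qw(foo num)) {
    my ($pattern, $replacement) = @{ $rules{$key} };
    # $replacement is interpolated once as a plain string, not re-evaluated.
    $string =~ s/$pattern/$replacement/;
}
print "$string\n";
```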

In the assignment, you need to tell Perl not to evaluate the regular expression but just to keep it. This is what qr is for.
But you can't do this with whole substitutions, which is why Сухой27 suggests using a subroutine.

Match the nth longest possible string in Perl

The pattern matching quantifiers of a Perl regular expression are "greedy" (they match the longest possible string). To force the match to be "ungreedy", a ? can be appended to the pattern quantifier (*, +).
Here is an example:
#!/usr/bin/perl
$string="111s11111s";
#-- greedy match
$string =~ /^(.*)s/;
print "$1\n"; # prints 111s11111
#-- ungreedy match
$string =~ /^(.*?)s/;
print "$1\n"; # prints 111
But how can one find the second, third, and so on longest possible match in Perl? Feel free to use a simpler example of your own if you need one.

Utilize a conditional expression, a code expression, and backtracking control verbs.
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
The above uses greedy matching but intentionally makes the longest match fail. To get the third longest, set the number of skips to 2.
Demonstrated below:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "111s11111s11111s";
$string =~ /^(.*)s/;
print "Greedy match - $1\n";
$string =~ /^(.*?)s/;
print "Ungreedy match - $1\n";
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
print "2nd Greedy match - $1\n";
Outputs:
Greedy match - 111s11111s11111
Ungreedy match - 111
2nd Greedy match - 111s11111
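The skip count can be wrapped in a small helper. A sketch assuming Perl v5.18 or later (for lexicals inside (?{ })); nth_greedy is a hypothetical name, and n = 1 gives the ordinary greedy match:

```perl
use v5.18;      # also enables strict and say
use warnings;

# Hypothetical helper: the $n-th longest prefix of $string that is followed
# by an "s" (n = 1 is the ordinary greedy match), or undef if there is none.
sub nth_greedy {
    my ($string, $n) = @_;
    my $skips = $n - 1;
    # Each time a candidate is found while $skips is still positive,
    # (*FAIL) forces backtracking to the next shorter candidate.
    return $string =~ /^(.*)s(?(?{ $skips-- > 0 })(*FAIL))/ ? $1 : undef;
}

my $string = "111s11111s11111s";
for my $n (1 .. 4) {
    my $match = nth_greedy($string, $n);
    say "$n: ", $match // 'no match';
}
```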
When using such advanced features, it is important to have a full understanding of regular expressions to predict the results. This particular case works because the regex is anchored at one end with ^, so each subsequent match is shorter than the previous one. However, if both ends could shift, we could not necessarily predict the order.
If that were the case, you would find them all and then sort them:
use strict;
use warnings;
my $string = "111s11111s11111s";
my @seqs;
$string =~ /^(.*)s(?{push @seqs, $1})(*FAIL)/;
my @sorted = sort {length $b <=> length $a} @seqs;
use Data::Dump;
dd @sorted;
Outputs:
("111s11111s11111", "111s11111", 111)
Note for Perl versions prior to v5.18
Perl v5.18 heavily reworked /(?{})/ and /(??{})/, which made the scope of lexical variables work properly in code expressions like the ones used above. Before then, the code above would produce the following warnings when wrapped in a subroutine and run under v5.16.2:
Variable "$skips" will not stay shared at (re_eval 1) line 1.
Variable "@seqs" will not stay shared at (re_eval 2) line 1.
The fix for older implementations of RE code expressions is to declare the variables with our and, as good coding practice, to localize them when they are initialized, as below:
local our @seqs;
$string =~ /^(.*)s(?{push @seqs, $1})(*FAIL)/;
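When the pattern is not anchored, the same (?{ })(*FAIL) trick also varies the start position, which is exactly the case where the order can't be predicted and sorting is needed. A sketch on the string 111s11111s, assuming Perl v5.18 or later:

```perl
use v5.18;      # also enables strict and say
use warnings;

my $string = "111s11111s";
my @seqs;

# No ^ anchor: every start position is tried, and (*FAIL) forces
# backtracking after each hit, so both ends of the match vary.
$string =~ /(\d+)s(?{ push @seqs, $1 })(*FAIL)/;

# Longest first; equal-length matches are in no guaranteed order.
my @sorted = sort { length $b <=> length $a } @seqs;
say "@sorted";
```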

Start by getting all possible matches.
my $string = "111s1111s11111s";
local our @matches;
$string =~ /^(.*)s(?{ push @matches, $1 })(?!)/;
This finds
111s1111s11111
111s1111
111
Then, it's just a matter of finding out which one is the second longest and filtering out the others.
use List::MoreUtils qw( uniq );
my $target_length = ( sort { $b <=> $a } uniq map length, @matches )[1];
@matches = uniq grep { length($_) == $target_length } @matches
   if $target_length;

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my @matches = $string=~/$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my @matches = $string=~/$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my @matches = $string=~s/$pattern//g
Except that this will only return the number of substitutions, regardless of list context. I would also take, as a consolation prize, a way to use qr// where I could simply turn the quoted regex into a substitution regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).

Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my @matches;
$string =~ s/($pattern)/push @matches, $1; ''/ge;
print "New string: $string; Removed: @matches\n";
Output:
New string: I   am    a changed  string.; Removed: thistle thing thinking this Thistle thirsty
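The same single-pass approach can be wrapped so that the pattern is passed in as a qr// object, which touches on the question's "consolation prize". A sketch; extract_all is a hypothetical helper, and the trailing \s* (an assumption about what you want) also removes the space after each removed word so no runs of spaces are left:

```perl
use strict;
use warnings;

# Hypothetical helper: removes every match of $re (plus any whitespace right
# after it) from the string that $str_ref points to, in a single s///g pass,
# and returns the removed matches in order.
sub extract_all {
    my ($str_ref, $re) = @_;
    my @removed;
    $$str_ref =~ s/($re)\s*/push @removed, $1; ''/ge;
    return @removed;
}

my $string  = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my @removed = extract_all(\$string, qr/\b[Tt]hi\S+\b/);
print "New string: $string\nRemoved: @removed\n";
```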

Here is another way to do it without executing Perl code inside the substitution. The trick is that s/// without /g removes one match per call, leaving its capture in $1, and returns false once nothing matches, which ends the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my @matches;
push @matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd @matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)

Perl: How to replace only matched part of string?

I have a string foo_bar_not_needed_string_part_123. Now in this string I want to remove not_needed_string_part only when foo_ is followed by bar.
I used the below regex:
my $str = "foo_bar_not_needed_string_part_123";
say $str if $str =~ s/foo_(?=bar)bar_(.*?)_\d+//;
But it removed the whole string and just prints a newline.
So what I need is to remove only the part matched by (.*?), so that the output is:
foo_bar__123

There's another way, and it's quite simple:
my $str = "foo_bar_not_needed_string_part_123";
$str =~ s/(?<=foo_bar_)\D+//gi;
print $str;
The trick is to use a lookbehind as an anchor and remove all the non-digit characters that follow it (the anchor is a position, not a character). With this pattern you match only the characters to be removed, so there is no need for capturing groups. Note that \D+ also consumes the underscore right before the digits, so the result is foo_bar_123 rather than foo_bar__123.
As a side note, the (?=bar)bar construct in the original regex is redundant. The lookahead succeeds only if the current position is followed by 'bar', but that is exactly what the literal bar after it checks anyway.

You can capture the parts you do not want to remove:
my $str = "foo_bar_not_needed_string_part_123";
$str =~ s/(foo_bar_).*?(_\d+)/$1$2/;
print $str;

You can try this:
my $str = "foo_bar_not_needed_string_part_123";
say $str if $str =~ s/(foo_(?=bar)bar_).*?(_\d+)/$1$2/;
Outputs:
foo_bar__123
PS: I am new to Perl and regexes, so I'm interested in whether there is a way to directly replace only the matched part. What I have done is capture everything that is required and then replace the whole match with it.

How about dividing the string into three parts and deleting only the middle one?
$str =~ s/(foo_(?=bar)bar_)(.*?)(_\d+)/$1$3/;

Try this:
(?<=foo_bar_).*(?=_\d)
In this variant, the match is everything (.*) between foo_bar_ and an underscore followed by a digit.
In your regex, the match includes:
foo_
Then it checks for "bar" after "foo_":
(?=bar)
But that lookahead does not consume anything; "bar" is consumed in the next step:
bar_
And then the rest of the line is consumed by (.*?)_\d+.
So, in general, the match includes everything you typed except (?=bar), which only checks that "bar" comes next.
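To see what each regex actually matches, you can print the matched text using the @- and @+ offset arrays. A minimal sketch:

```perl
use strict;
use warnings;

my $str = "foo_bar_not_needed_string_part_123";
my @matched;

for my $re (qr/foo_(?=bar)bar_(.*?)_\d+/, qr/(?<=foo_bar_).*(?=_\d)/) {
    next unless $str =~ $re;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    push @matched, substr($str, $-[0], $+[0] - $-[0]);
    print "$re matched '$matched[-1]'\n";
}
```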

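Or, as a one-liner (like \D+ above, [^\d]+ also removes the underscore before the digits, giving foo_bar_123):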
echo "foo_bar_not_needed_string_part_123" | perl -pe 's/(?<=foo_bar_)[^\d]+//'

You can use look-behind/look-ahead in this case
$str =~ s/(?<=foo_bar_).*?(?=_\d+)//;
and the lookbehind can be replaced with \K (keep) to make it a little tidier:
$str =~ s/foo_bar_\K.*?(?=_\d+)//;
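A quick sketch comparing three of the substitutions suggested in these answers on the same input; the labels are arbitrary. Only the \D+ version also removes the underscore before the digits:

```perl
use strict;
use warnings;

# Three of the substitutions suggested above; the labels are arbitrary.
my %variant = (
    no_digits  => sub { $_[0] =~ s/(?<=foo_bar_)\D+// },
    lookaround => sub { $_[0] =~ s/(?<=foo_bar_).*?(?=_\d+)// },
    keep       => sub { $_[0] =~ s/foo_bar_\K.*?(?=_\d+)// },
);

my %result;
for my $name (sort keys %variant) {
    my $str = "foo_bar_not_needed_string_part_123";
    $variant{$name}->($str);    # $_[0] aliases $str, so $str is edited in place
    $result{$name} = $str;
    print "$name: $str\n";
}
```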
