Matching a variable in a string in Perl from the end - regex

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.

There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).

The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list

I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output

To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check if a variable contains a character "C" and ends with a number, in minor version. I have :
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like to put a regex to print some message if the variable has C in 4th place and ends with number. so, for str1,str2,str3 -> it should print "matches". I am trying below regexes, but none of them working, can you help correcting it.
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}
if ($str1 =~ /C[^.]*\d$/)
C matches the letter C.
[^.]* matches any number of characters that aren't .. This ensures that the match won't go across multiple fields of the version number, it will only match the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assums 3 . with characters between, and then after the third . it checks if the rest of the string contains a C.
EDIT:
If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, use:
^((.*\.){3}(.*C\d{1}))$
Lets look what regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.\0.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my #data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for #data;
say '--- Matching -----';
for( #data ) {
say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

How to refer to matched part in regex

I am using the following code to search for a substring and print it out with a few characters before and after it. Somehow Perl takes issue with me using $1 and complains about
Use of uninitialized value $1 in concatenation (.) or string.
I cannot figure out why...can you?
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
You have to capture $word into regex group $1 by using parentheses,
while ($lines =~ m/\b($word)\b/g)
When you use $1, you are asking the code to use the first captured group from the regex and since your regex doesn't have any, well, that variable won't exist.
You can either refer to the whole match with $& or you add a capture group to your regex and keep using $1.
i.e. Either:
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$&";
print substr ($lines, max(pos($lines)-length($&)-$context, 0), length($&)+$context); # check: am I possibly violating any boundaries here
}
Or
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/(\b$word\b)/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
Note: It doesn't matter whether you use (\b$word\b) or (\b$word)\b or \b($word\b) or \b($word)\b here because \b is a 'string' of 0 length.
When you want to address a matched part in regex, put it in parenthes. Than you'll be able to address this mathced part via $1 variable (for first pair of parenthes), $2 (for the second pair) and so on.
The values $1, $2 and so on hold the strings found by capture groups. When a match is performed all of these variables are set to undef. The code in the question does not have any capture groups and hence $1 is never given a value, it is undefined.
Running the code below shows the effect. Initially $1, $2 and $3 are not defined. The first match sets $1 and $2 but not $3. The second match sets only $1 but not that $2 is cleared to be undefined. The third match has no capture groups and all three are undefined.
use strict;
use warnings;
sub show
{
printf "\$1: %s\n", (defined $1 ? $1 : "-undef-");
printf "\$2: %s\n", (defined $2 ? $2 : "-undef-");
printf "\$3: %s\n", (defined $3 ? $3 : "-undef-");
print "\n";
}
my $text = "abcdefghij";
show();
$text =~ m/ab(cd)ef(gh)ij/; # First match
show();
$text =~ m/ab(cd)efghij/; # Second match
show();
$text =~ m/abcdefghij/; # Third match
show();
$1 will have no value unless you are actually capturing something.
You can adjust your boundary collection method to using lookahead and lookbehinds.
use strict;
use warnings;
my $lines = "this is just a test to find something out";
my $word = "test";
my $extra = 10;
while ($lines =~ m/(?:(?<=(.{$extra}))|(.{0,$extra}))\b(\Q$word\E)\b(?=(.{0,$extra}))/gs ) {
my $pre = $1 // $2;
my $word = $3;
my $post = $4;
print "'...$pre<$word>$post...'\n";
}
Outputs:
'...is just a <test> to find s...'

Matching words with exactly one vowel

I want to match only the strings that have exactly one vowel.
I tried this code, and it works but it also matches those strings that haven't any vowels (for example hshs, ksks, lslsl) and I need only the strings that have just one vowel
if ( $string !~ /\*w[aeiou]\w*[aeiou]\W*/ ) {
print $string;
}
You can use tr/// to count the occurrences of letters in a string.
Something like this perhaps
use strict;
use warnings;
for my $string ( qw/ a fare is paid for every cab /) {
if ( $string =~ tr/aeiuoAEIOU// == 1 ) {
print $string, "\n";
}
}
output
a
is
for
cab
Make it simple, at least one vowel:
if ($string =~ /[aeiou]/i) {
print $string;
}
exactly one vowel:
if ($string =~ /^[^aeiou]*[aeiou][^aeiou]*$/i) {
print $string;
}

Unable to replace a string via regex through a subroutine

I am trying to replace square brackets in a string with an empty string. which means if a string is [SECTION], I want to convert it to SECTION.
I have tried this and it works,
my $sectionName =~ tr/[]//d;
print "$sectionName\n";
However, when I tried to create a general subroutine for replacing strings, it didn't work out. Here's what I tried,
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
$target = quotemeta $target;
$replacement = quotemeta $replacement;
$string =~ tr/$target/$replacement/d;
return $string;
}
I am calling the sub like this,
# the string "[SECTION]" below is intended to be replaced by a variable
my $sectionName = strReplace("[SECTION]", "[]", "");
print "$sectionName\n";
However, instead of getting the replaced string, I am still getting the old one, i.e, [SECTION]. What am I doing wrong? (PS: Perl version 5.14.2)
Perl's tr/// operator does not support variables. You can find various strategies to work around this here: Perl's tr/// is not doing what I want
To summarize, you have two main options:
Wrap your tr/// in an eval.
Convert your tr/// into a substitution using s///.
If your main case for strReplace is actually just to remove characters, I'd write a less-general-purpose sub that does that. Otherwise, a s/// conversion that can both remove and replace looks like this:
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
my %replacement;
#replacement{ split //, $target } = split //, $replacement;
$string =~ s{ ([\Q$target\E]) }{ $replacement{$1} // '' }gxe;
return $string;
}
The substitution repeatedly (because of the /g flag) looks for [\Q$target\E] (a character in a class of any the characters in $target, any special characters automatically escaped if necessary by \Q...\E), and replaces it with the value found by looking in the hash, or just removes it if it wasn't found in the hash.

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

What I mean is:
For example, a{3,} will match 'a' at least three times greedly. It may find five times, 10 times, etc. I need this number. I need this number for the rest of the code.
I can do the rest less efficiently without knowing it, but I thought maybe Perl has some built-in variable to give this number or is there some trick to get it?
Just capture it and use length.
if (/(a{3,})/) {
print length($1), "\n";
}
Use #LAST_MATCH_END and #LAST_MATCH_START
my $str = 'jlkjmkaaaaaamlmk';
$str =~ /a{3,}/;
say $+[0]-$-[0];
Output:
6
NB: This will work only with a one-character pattern.
Here's an idea (maybe this is what you already had?) assuming the pattern you're interested in counting has multiple characters and variable length:
capture the substring which matches the pattern{3,} subpattern
then match the captured substring globally against pattern (note the absence of the quantifier), and force a list context on =~ to get the number of matches.
Here's a sample code to illustrate this (where $patt is the subpattern you're interested in counting)
my $str = "some catbratmatrattatblat thing";
my $patt = qr/b?.at/;
if ($str =~ /some ((?:$patt){3,}) thing/) {
my $count = () = $1 =~ /$patt/g;
print $count;
...
}
Another (admittedly somewhat trivial) example with 2 subpatterns
my $str = "some catbratmatrattatblat thing 11,33,446,70900,";
my $patt1 = qr/b?.at/;
my $patt2 = qr/\d+,/;
if ($str =~ /some ((?:$patt1){3,}) thing ((?:$patt2){2,})/) {
my ($substr1, $substr2) = ($1, $2);
my $count1 = () = $substr1 =~ /$patt1/g;
my $count2 = () = $substr2 =~ /$patt2/g;
say "count1: " . $count1;
say "count2: " . $count2;
}
Limitation(s) of this approach:
Fails miserably with lookarounds. See amon's example.
If you have a pattern of type /AB{n,}/ where A and B are complex patterns, we can split the regex into multiple pieces:
my $string = "ABABBBB";
my $n = 3;
my $count = 0;
TRY:
while ($string =~ /A/gc) {
my $pos = pos $string; # remember position for manual backtracking
$count++ while $string =~ /\GB/g;
if ($count < $n) {
$count = 0;
pos($string) = $pos; # restore previous position
} else {
last TRY;
}
}
say $count;
Output: 4
However, embedding code into the regex to do the counting may be more desirable, as it is more general:
my $string = "ABABBBB";
my $count;
$string =~ /A(?{ $count = 0 })(?:B(?{ $count++ })){3,}/ and say $count;
Output: 4.
The downside is that this code won't run on older perls. (Code was tested on v14 & v16).
Edit: The first solution will fail if the B pattern backtracks, e.g. $B = qr/BB?/. That pattern should match the ABABBBB string three times, but the strategy will only let it match two times. The solution using embedded code allows proper backtracking.