Regex does not work with input string containing + inside - regex

I have the following piece of code:
$url = "http://www.example.com/url.html";
$content=Encode::encode_utf8(get $url);
$nameaux = Encode::encode_utf8($DBfield);
if($content =~ />$nameaux<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class2">(.*?)<\/td>/ ||
$content =~ />$nameaux<\/a><\/td><td class="class3">(.*?)<\/td>/ ) {
... more code ...
}
This piece of code works great except when $DBfield is equal to a string containing a plus (ex. A+1) on it that exists on $content.
Could someone explain my how to handle this?

If $nameaux can contain regex characters (like +), you need to escape the field to a regex literal by wrapping with \Q ... \E.
$content =~ />\Q$nameaux\E<\/a><\/td><td class="class1">(.*?)<\/td>/ ||
So + will be just a plus sign and not mean "one or more of", which is why your regex doesn't match.

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check if a variable contains a character "C" and ends with a number, in minor version. I have :
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like to put a regex to print some message if the variable has C in 4th place and ends with number. so, for str1,str2,str3 -> it should print "matches". I am trying below regexes, but none of them working, can you help correcting it.
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}
if ($str1 =~ /C[^.]*\d$/)
C matches the letter C.
[^.]* matches any number of characters that aren't .. This ensures that the match won't go across multiple fields of the version number, it will only match the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assums 3 . with characters between, and then after the third . it checks if the rest of the string contains a C.
EDIT:
If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, use:
^((.*\.){3}(.*C\d{1}))$
Lets look what regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.\0.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my #data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for #data;
say '--- Matching -----';
for( #data ) {
say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Unable to replace a string via regex through a subroutine

I am trying to replace square brackets in a string with an empty string. which means if a string is [SECTION], I want to convert it to SECTION.
I have tried this and it works,
my $sectionName =~ tr/[]//d;
print "$sectionName\n";
However, when I tried to create a general subroutine for replacing strings, it didn't work out. Here's what I tried,
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
$target = quotemeta $target;
$replacement = quotemeta $replacement;
$string =~ tr/$target/$replacement/d;
return $string;
}
I am calling the sub like this,
# the string "[SECTION]" below is intended to be replaced by a variable
my $sectionName = strReplace("[SECTION]", "[]", "");
print "$sectionName\n";
However, instead of getting the replaced string, I am still getting the old one, i.e, [SECTION]. What am I doing wrong? (PS: Perl version 5.14.2)
Perl's tr/// operator does not support variables. You can find various strategies to work around this here: Perl's tr/// is not doing what I want
To summarize, you have two main options:
Wrap your tr/// in an eval.
Convert your tr/// into a substitution using s///.
If your main case for strReplace is actually just to remove characters, I'd write a less-general-purpose sub that does that. Otherwise, a s/// conversion that can both remove and replace looks like this:
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
my %replacement;
#replacement{ split //, $target } = split //, $replacement;
$string =~ s{ ([\Q$target\E]) }{ $replacement{$1} // '' }gxe;
return $string;
}
The substitution repeatedly (because of the /g flag) looks for [\Q$target\E] (a character in a class of any the characters in $target, any special characters automatically escaped if necessary by \Q...\E), and replaces it with the value found by looking in the hash, or just removes it if it wasn't found in the hash.

In regular expression matching of Perl, is it possible to know number of matches in a{n,}?

What I mean is:
For example, a{3,} will match 'a' at least three times greedly. It may find five times, 10 times, etc. I need this number. I need this number for the rest of the code.
I can do the rest less efficiently without knowing it, but I thought maybe Perl has some built-in variable to give this number or is there some trick to get it?
Just capture it and use length.
if (/(a{3,})/) {
print length($1), "\n";
}
Use #LAST_MATCH_END and #LAST_MATCH_START
my $str = 'jlkjmkaaaaaamlmk';
$str =~ /a{3,}/;
say $+[0]-$-[0];
Output:
6
NB: This will work only with a one-character pattern.
Here's an idea (maybe this is what you already had?) assuming the pattern you're interested in counting has multiple characters and variable length:
capture the substring which matches the pattern{3,} subpattern
then match the captured substring globally against pattern (note the absence of the quantifier), and force a list context on =~ to get the number of matches.
Here's a sample code to illustrate this (where $patt is the subpattern you're interested in counting)
my $str = "some catbratmatrattatblat thing";
my $patt = qr/b?.at/;
if ($str =~ /some ((?:$patt){3,}) thing/) {
my $count = () = $1 =~ /$patt/g;
print $count;
...
}
Another (admittedly somewhat trivial) example with 2 subpatterns
my $str = "some catbratmatrattatblat thing 11,33,446,70900,";
my $patt1 = qr/b?.at/;
my $patt2 = qr/\d+,/;
if ($str =~ /some ((?:$patt1){3,}) thing ((?:$patt2){2,})/) {
my ($substr1, $substr2) = ($1, $2);
my $count1 = () = $substr1 =~ /$patt1/g;
my $count2 = () = $substr2 =~ /$patt2/g;
say "count1: " . $count1;
say "count2: " . $count2;
}
Limitation(s) of this approach:
Fails miserably with lookarounds. See amon's example.
If you have a pattern of type /AB{n,}/ where A and B are complex patterns, we can split the regex into multiple pieces:
my $string = "ABABBBB";
my $n = 3;
my $count = 0;
TRY:
while ($string =~ /A/gc) {
my $pos = pos $string; # remember position for manual backtracking
$count++ while $string =~ /\GB/g;
if ($count < $n) {
$count = 0;
pos($string) = $pos; # restore previous position
} else {
last TRY;
}
}
say $count;
Output: 4
However, embedding code into the regex to do the counting may be more desirable, as it is more general:
my $string = "ABABBBB";
my $count;
$string =~ /A(?{ $count = 0 })(?:B(?{ $count++ })){3,}/ and say $count;
Output: 4.
The downside is that this code won't run on older perls. (Code was tested on v14 & v16).
Edit: The first solution will fail if the B pattern backtracks, e.g. $B = qr/BB?/. That pattern should match the ABABBBB string three times, but the strategy will only let it match two times. The solution using embedded code allows proper backtracking.

How can I escape meta-characters when I interpolate a variable in Perl's match operator?

Suppose I have a file containing lines I'm trying to match against:
foo
quux
bar
In my code, I have another array:
foo
baz
quux
Let's say we iterate through the file, calling each element $word, and the internal list we are checking against, #arr.
if( grep {$_ =~ m/^$word$/i} #arr)
This works correctly, but in the somewhat possible case where we have an test case of fo. in the file, the . operates as a wildcard operator in the regex, and fo. then matches foo, which is not acceptable.
This is of course because Perl is interpolating the variable into a regex.
The question:
How do I force Perl to use the variable literally?
Use \Q...\E to escape special symbols directly in perl string after variable value interpolation:
if( grep {$_ =~ m/^\Q$word\E$/i} #arr)
From perlfaq6's answer to How do I match a regular expression that's in a variable?:
We don't have to hard-code patterns into the match operator (or anything else that works with regular expressions). We can put the pattern in a variable for later use.
The match operator is a double quote context, so you can interpolate your variable just like a double quoted string. In this case, you read the regular expression as user input and store it in $regex. Once you have the pattern in $regex, you use that variable in the match operator.
chomp( my $regex = <STDIN> );
if( $string =~ m/$regex/ ) { ... }
Any regular expression special characters in $regex are still special, and the pattern still has to be valid or Perl will complain. For instance, in this pattern there is an unpaired parenthesis.
my $regex = "Unmatched ( paren";
"Two parens to bind them all" =~ m/$regex/;
When Perl compiles the regular expression, it treats the parenthesis as the start of a memory match. When it doesn't find the closing parenthesis, it complains:
Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
You can get around this in several ways depending on our situation. First, if you don't want any of the characters in the string to be special, you can escape them with quotemeta before you use the string.
chomp( my $regex = <STDIN> );
$regex = quotemeta( $regex );
if( $string =~ m/$regex/ ) { ... }
You can also do this directly in the match operator using the \Q and \E sequences. The \Q tells Perl where to start escaping special characters, and the \E tells it where to stop (see perlop for more details).
chomp( my $regex = <STDIN> );
if( $string =~ m/\Q$regex\E/ ) { ... }
Alternately, you can use qr//, the regular expression quote operator (see perlop for more details). It quotes and perhaps compiles the pattern, and you can apply regular expression flags to the pattern.
chomp( my $input = <STDIN> );
my $regex = qr/$input/is;
$string =~ m/$regex/ # same as m/$input/is;
You might also want to trap any errors by wrapping an eval block around the whole thing.
chomp( my $input = <STDIN> );
eval {
if( $string =~ m/\Q$input\E/ ) { ... }
};
warn $# if $#;
Or...
my $regex = eval { qr/$input/is };
if( defined $regex ) {
$string =~ m/$regex/;
}
else {
warn $#;
}
The correct answer is - don't use regexps. I'm not saying regexps are bad, but using them for (what equals to) simple equality check is overkill.
Use: grep { lc($_) eq lc($word) } #arr and be happy.
Quotemeta
Returns the value of EXPR with all non-"word" characters backslashed.
http://perldoc.perl.org/functions/quotemeta.html
I don't think you want a regex in this case since you aren't matching a pattern. You're looking for a literal sequence of characters that you already know. Build a hash with the values to match and use that to filter #arr:
open my $fh, '<', $filename or die "...";
my %hash = map { chomp; lc($_), 1 } <$fh>;
foreach my $item ( #arr )
{
next unless exists $hash{ lc($item) };
print "I matched [$item]\n";
}