Perl pattern match not working as expected

Perl pattern match not working as expected - regex

I'm trying to match values, which may be comma separated, using a regex. Basically, I want to return true if any value in the string does NOT have 3g or 3k starting in the 3rd position.
My test code is as follows:
my #a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (#a) {
print $_;
say $_ =~ /(?:^|,)\w{2}[^(?:3G|3K)]/i ? " true" : " false";
}
This returns
in3g123456,dh3k123456,dhec110101 true
dhec110101,dhec123456 true
in3g123456,dh3k123456 false
c3kasdf false <- whaaaaaaaat?
usdfusdufs3gsdf true
I don't understand why the 4th one is not true. Any help would be appreciated.

[^(?:3G|3K)] reads as "any character but (, ?, etc."
failed
v
c3 kasdf
/(?:^|,)\w{2}[^(?:3G|3K)]/i
Use this:
/(?:^|,)\w{2}(?!3G|3K)/i
Demo: https://regex101.com/r/P2XsgN/1.

How about /\b\w{2}(?!3g|3k)/i.
\b matches the empty string at the beginning or end of a word. Slightly simpler equivalent to (^|,) in this situation.
(?!foo) is a zero-width negative lookahead assertion. So, matches the empty string as long as it's not followed by a substring that matches foo.

You can also split the string first, instead of parsing everything with a regex. That is far more flexible and maintainable, and easier.
When processing the list of the extracted "values" you can match any character twice then your pattern, /^..$patt/. The module List::MoreUtils is useful (and fast) for list manipulations, and its notall function is tailor-made for your condition.
use warnings 'all';
use strict;
use List::MoreUtils qw(notall);
my $file = '...';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
my $res = notall { /^..(?:3k|3g)/ } split /,/;
print "$_: " . ($res ? 'true' : 'false'), "\n";
}
I presume that you read from a file. If not, replace while (<$fn>) with for (#strings).
The notall function returns true if any element of the list fails the condition.
The split by default uses $_ so we only need the pattern. Here it is simply , but the pattern takes a regex so one can match separators flexibly. For example, this /[,\s]+/ splits on any amount of , and/or whitespace. So ,, , in a string is matched as a separator, as well as , or space(s).
When applied to the array with your strings the above prints
in3g123456,dh3k123456,dhec110101: true
dhec110101,dhec123456: true
in3g123456,dh3k123456: false
c3kasdf: true
usdfusdufs3gsdf: true

You could use substr to get data at 3rd and 4th position and then compare it with (3g|3k).
substr $_,2,2
#!/usr/bin/perl
use strict;
use warnings;
my #a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (#a) {
my #inputs = split /,/,$_;
my $flag = 0;
foreach (#inputs){
$flag = 1 unless ((substr $_,2,2) =~ /(3g|3k)/);
}
$flag ? print "$_: True\n" : print "$_: False\n";
}
Output:
in3g123456,dh3k123456,dhec110101: True
dhec110101,dhec123456: True
in3g123456,dh3k123456: False
c3kasdf: True
usdfusdufs3gsdf: True
Demo

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check if a variable contains a character "C" and ends with a number, in minor version. I have :
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like to put a regex to print some message if the variable has C in 4th place and ends with number. so, for str1,str2,str3 -> it should print "matches". I am trying below regexes, but none of them working, can you help correcting it.
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}

if ($str1 =~ /C[^.]*\d$/)
C matches the letter C.
[^.]* matches any number of characters that aren't .. This ensures that the match won't go across multiple fields of the version number, it will only match the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.

I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assums 3 . with characters between, and then after the third . it checks if the rest of the string contains a C.
EDIT:
If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, use:
^((.*\.){3}(.*C\d{1}))$

Lets look what regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.\0.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my #data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for #data;
say '--- Matching -----';
for( #data ) {
say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

Save result from flip-flop in variable?

I have about 1kB of text from STDIN
my $f = join("", <STDIN>);
and I would like to get the content between open1 and close1, so /open1/../close1/ comes to mind.
I have only seen it been used in one liners and in scripts in while-loops and $_.
Question
How can I get the result from /open1/../close1/ in my script when everything is in $f?

Capturing all matches with a single regular expression
If you want to capture all the lines between open1 and start1 markers (excluding the markers), it is easily done with a single regular expression:
my $f = join("", <STDIN>);
my #matches = ( $f =~ m/\bopen1\b(.*?)\bclose1\b/gs );
for my $m (#matches) {
print "$m";
}
where
s modifier treats the string as a single line;
g modifier captures all the matches;
(.*?) matches a group of any characters using the lazy quantifier
Using the range operator
The range operator (so-called flip-flop) is not very convenient for this task if you want to avoid capturing the markers, because an expression like /open1/ .. /close1/ returns true for the lines matching the patterns.
The expression /^open1$/ .. /^close1$/ returns false until /^open1$/ is true. The left regular expression stops being evaluated once it matches the line, and keeps returning true until /^close1$/ becomes true. When the right expression matches, the cycle is restarted. Thus, the open1 and close1 markers are included into $matches.
It is even less convenient, if the input is stored in a variable, because you will need to read the contents of the variable line by line, e.g.:
my $matches = "";
my #lines = split /\n/, $f;
foreach my $line (#lines) {
if ($line =~ m/^open1$/ .. $line =~ m/^close1$/) {
$matches .= "$line\n";
}
}
Note, it is possible to use arbitrary Perl expressions as operands of the range operator. I wouldn't recommend this code, as it is not very efficient, and not very readable. At the same time it is easy to adapt the first example to the case where the open1 and close1 markers are included into the set of matches, e.g.:
my #matches = ( $f =~ m/\bopen1\b(.*?)\bclose1\b/gs );
for my $m (#matches) {
print "open1${m}close1\n";
}

You can rewrite how $f is generated so that it takes advantage of the flip-flop inside a while loop:
my ( $f, $matched );
while ( <> ) {
$f .= $_;
$matched .= $_ if /open1/ .. /close1/;
}

Another way is to create a new inputs stream out of the contents of $f.
open my $fh, '<', \$f;
while (<$fh>) {
if (/open1/ .. /close1/) {
...
}
}

You can also employ split. To get what is between the first pair of open1 and close1
my $open_to_close = (split /open1|close1/, $f)[1];
The delimiter can be either open1 or close1, so returned is a list of three elements: before open1, between them, and after close1. We take the second element.
If there are more open1/close1 pairs take all odd-indexed elements.
Either get the array as well
my #parts = split /open1|close1/, $f;
my #all_open_to_close = #parts[ grep { $_ & 1 } 0..$#parts ];
or get it directly from the list
my #all_open_to_close =
grep { CORE::state $i; ++$i % 2 == 0 } split /open1|close1/, $f;
The state is a feature
from v5.10. If you already use that you don't need CORE:: prefix.

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.

There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).

The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list

I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output

To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Regular expressions, matching operator using a string variable in Perl

I am using a regex but am getting some odd, unexpected "matches". "Names" are sent to a subroutine to be compared to an array called #ASlist, which contains multiple rows. The first element of each row is also a name, followed by 0 to several synonyms. The goal is to match the incoming "name" to any row in #ASlist that has a matching cell.
Sample input, from which $names is derived for the comparison against #ASlist:
13 1 13 chr7 7 70606019 74345818 Otud7a Klf13 E030018B13Rik Trpm1 Mir211 Mtmr10 Fan1 Mphosph10 Mcee Apba2 Fam189a1 Ndnl2 Tjp1 Tarsl2 Tm2d3 1810008I18Rik Pcsk6 Snrpa1 H47 Chsy1 Lrrk1 Aldh1a3 Asb7 Lins Lass3 Adamts17
Sample lines from #ASlist:
HSPA5 BIP FLJ26106 GRP78 MIF2
NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13
ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK
The code:
my ($name) = #_; ## this comes in from another loop elsewhere in code I did not include
chomp $name;
my #collectmatches = (); ## container to collect matches
foreach my $ASline ( #ASlist ){
my #synonyms = split("\t", $ASline );
for ( my $i = 0; $i < scalar #synonyms; $i++ ){
chomp $synonyms[ $i ];
#print "COMPARE $name TO $synonyms[ $i ]\n";
if ( $name =~m/$synonyms[$i]/ ){
print "\tname $name from block matches\n\t$synonyms[0]\n\tvia $synonyms[$i] from AS list\n";
push ( #collectmatches, $synonyms[0], $synonyms[$i] );
}
else {
# print "$name does not match $synonyms[$i]\n";
}
}
}
The script is working but also reports weird matches. Such as, when $name is "E030018B13Rik" it matches "NDUFA5" when it occurs in #ASlist. These two should not be matched up.
If I change the regex from ~m/$synonyms[$i]/ to ~m/^$synonyms[$i]$/, the "weird" matches go away, BUT the script misses the vast majority of matches.

The NDUFA5 record contains B13 as a pattern, which will match E030018<B13>Rik.
If you want to be more literal, then add boundary conditions to your regular expression /\b...\b/. Also should probably escape regular expression special characters using quotemeta.
if ( $name =~ m/\b\Q$synonyms[$i]\E\b/ ) {
Or if you want to test straight equality, then just use eq
if ( $name eq $synonyms[$i] ) {

Another, more Perlish way to test for string equality is to use a hash.
You don't show any real test data, but this short Perl program builds a hash from your array #ASlist of lines of match strings. After that, most of the work is done.
The subsequent for loop tests just E030018B13Rik to see if it is one of the keys of the new %ASlist and prints an appropriate message
use strict;
use warnings;
my #ASlist = (
'HSPA5 BIP FLJ26106 GRP78 MIF2',
'NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13',
'ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK',
);
my %ASlist = map { $_ => 1 } map /\S+/g, #ASlist;
for (qw/ E030018B13Rik /) {
printf "%s %s\n", $_, $ASlist{$_} ? 'matches' : 'doesn\'t match';
}
output
E030018B13Rik doesn't match

Since you only need to compare two strings, you can simply use eq:
if ( $name eq $synonyms[$i] ){

You are using B13 as the regular expression. As none of the characters has a special meaning, any string containing the substring B13 matches the expression.
E030018B13Rik
^^^
If you want the expression to match the whole string, use anchors:
if ($name =~m/^$synonyms[$i]$/) {
Or, use index or eq to detect substrings (or identical strings, respectively), as your input doesn't seem to use any features of regular expressions.

perl insert and or replace string variable into a string

the requirement I have is to check a string and based on particular set of chars either insert or replace with prefix string
$prefix = "DV1";
Following are my source $input strings:
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
the first chars before first . can be of any length
but after the first . the chars ABC remain constant followed by any one character - these four chars will always be there in my input string.
after these 4 chars, the i/p string may have two alphanumeric chars - T2 in this case.
what needs to be done is check if $input has "T2" (can be any two alphanum chars) and if it has then replace those 2 chars with D1 (any two chars from $prefix)
if $input does not have "T2", then insert $prefix

This can be done quite straightforwardly with a single substitution. This program demonstrates
The pattern looks for the sequence .ABC followed by any non-dot character. The \K protects that part of the pattern from being changed. Then there may be two optional non-dot characters, followed by a dot. The replacement string is D1 if the two optional characters were present, or the value of $prefix if not
use strict;
use warnings;
my $prefix = 'DV1';
while (<DATA>) {
s/\.ABC[^.]\K([^.]{2})?(?=\.)/$1 ? 'D1' : $prefix/e;
print;
}
__DATA__
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
output
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
9C5.ABCSDV1.MIM.EDR2.QR

Here's the code you can try..
I am assuming that, T2 can be a string of length 2 any alphanumeric characters.. It can be A4, or 5B...
#!/perl/bin
use v5.14;
use warnings;
my $str = "9C5.ABCS.MIM.EDR2.QR";
my $str1 = "SS7.ABCWT2.RSND.LTE1.QR";
my $prefix = "DV1";
my $file = 'D:\Programming\Perl\Learning Perl\chapter_1\demo.txt';
open my $fh, '<', $file or die $!;
foreach (<$fh>) {
if (m/(^.*\.ABC\w)\w{2}\./g) {
s/(^.*\.ABC\w)\w{2}\./$1D1\./;
} else {
s/(^.*\.ABC\w)\./$1$prefix\./;
}
say; # Takes current line as default($_). We don't need to specify it.
}
Input File: -
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCEX4.VCE2.QR
Y88.ABCN5W.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
Output: -
SS7.ABCWD1.RSND.LTE1.QR # Replace T2
IT4.ABCED1.VCE2.QR # Replace X4
Y88.ABCND1.MIM.EDR2.QR # Replace 5W
9C5.ABCSDV1.MIM.EDR2.QR # Does not contains T2. Add DV1

Try the following code, and tell me if it fits your needs :
#!/usr/bin/perl -l
use strict;
use warnings;
my $text =<<EOF;
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
EOF
my $prefix = "DV1";
for (split "\n", $text) {
s/^(\w+\.ABC\w)T2/$1D1/ || s/^/$prefix/;
print;
}
OUTPUT
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
DV19C5.ABCS.MIM.EDR2.QR

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Perl pattern match not working as expected - regex

[^(?:3G|3K)] reads as "any character but (, ?, etc." failed v c3 kasdf /(?:^|,)\w{2}[^(?:3G|3K)]/i Use this: /(?:^|,)\w{2}(?!3G|3K)/i Demo: https://regex101.com/r/P2XsgN/1.

How about /\b\w{2}(?!3g|3k)/i. \b matches the empty string at the beginning or end of a word. Slightly simpler equivalent to (^|,) in this situation. (?!foo) is a zero-width negative lookahead assertion. So, matches the empty string as long as it's not followed by a substring that matches foo.

Related

How to verify if a variable value contains a character and ends with a number using Perl

Save result from flip-flop in variable?

Matching a variable in a string in Perl from the end

Regular expressions, matching operator using a string variable in Perl

perl insert and or replace string variable into a string

Categories

Resources