perl match multiple numbers from a variable similar to egrep - regex

Want to match the number exactly from the variable which has multiple numbers seperated by pipe symbol similar to egrep.below is the code which i tried.
#!/usr/bin/perl
my $searchnum = $ARGV[0];
my $num = "148|1|0|256";
print $num;
if ($searchnum =~ /$num/)
{
print "found";
}
else
{
print "not-found";
}
Expected o/p
perl number_match.pl 1
found
perl number_match.pl 1432
not-found

The regex /148|1|0|256/ matches if the string that this regex is bound to contains a substring that is either 148, 1, 0 or 256. This means that the option 148 is superfluous, as this matches a subset of strings that match 1.
You probably want to test if the given string is equal to one of these options. If you want to use regexes, you have to anchor the regex at the start and the end of the string:
/^ (?:148|1|0|256) $/x
You could also use the grep builtin:
my $number = ...;
if (grep {$number eq $_} qw/148 1 0 256/) { say "found" }
else { say "not-found" }
The grep function takes a block that has to return a boolean value. It returns all elements from the list on the right where the condition returns true. If at least one element matches, then the whole expression evaluates to true.
You could also use a hash that contains all possible options:
my $number = ...;
my %options = map { $_ => undef } qw/148 1 0 256/;
if ( exists $options{$number} ) { say "found" }
else { say "not-found" }
This is more efficient than grep.

Use:
my $num = '^(148|1|0|256)$';
Here is a oneliner:
perl -e "$_=$ARGV[0]; exit if !/^\d+$/; print \"not-\" unless /^(14|156|0|89)$/;print \"found\n\";"

Related

Perl Grepping from an Array

I need to grep a value from an array.
For example i have a values
#a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl');
#Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now, i need to loop #a and find each value matches to #Array. For Example
It works for me with grep. You'd do it the exact same way as in the More::ListUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } #paths;
my #matches = grep { /$str/ } #paths;
This by default tests with /m against $_, each element of the list in turn. The $str and #paths are the same as below.
You can use the module More::ListUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, ie. whether there was a match in this case.
use warnings;
use strict;
use Most::ListUtils;
my $str = 'branches/Soft/a.txt';
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } #paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my #strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (#strings) {
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } #paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The slashes dot in $a will pose a problem so you either have to escape them it when doing regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my #matches = grep { /\Q$a\E/ } #array;
Simple comparison with "equals":
my #matches = grep { $_ eq $a } #array;
With your sample data both will give an empty array #matches because there is no match.
This Solved My Question. Thanks to all especially #zdim for the valuable time and support
my #SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (#SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (#paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need it as a character.
Likewise :
$a="branches\/Soft\/a\.txt"
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

Matching all characters in a string except one in any position

How to match (preferably in perl) all strings that match the query string except one character?
Query: TLAQLLLDK
Want to match: xLAQLLLDK, TxAQLLLDK, TLxQLLLDK, etc.
Where 'x' is any capital letter '[A-Z]'.
Use alternation operator.
^(?:[A-Z]LAQLLLDK|T[A-Z]AQLLLDK|TL[A-Z]QLLLDK|.....)$
Likewise fill all..
You can do that by writing a terrible regular expression, which will be horribly slow to build and even slower to execute, or you can just don't use regexes for things like these and write a function that just compares both strings character after character, allows for one "mistake" and returns True only if there was exactly one mistake.
How to match (preferably in perl) all strings that match the query string except one character?
Expanding the answer of #Avinash, by generating the required regular expression dynamically at run time:
my $query = 'TLAQLLLDK';
my $re_proto = '(' . join( '|', map { (my$x=$query)=~s/^(.{$_})./$1\[A-Za-z]/; $x; } (0 .. length($query)-1) ) . ')';
my $re = qr/^$re_proto$/;
my #input = qw(xLAQLLLDK TxAQLLLDK TLxQLLLDK);
my #matches = grep { /$re/ } #input;
print "#matches\n";
(I had to include the [a-z] too, since your example input uses the x as the marker.)
If you need to do that very often, I would advise to cache the generated regular expressions.
Is this what you are looking for?
#!/usr/bin/perl
use strict;
my #str = ("ULAQLLLDK","TAAQLLLDK","TLCQLLLDK","TLAQLLLDK");
while(<#str>){
if (/[A-S,U-Z]LAQLLLDK|T[A-K,M-Z]AQLLLDK|TL[B-Z]QLLLDK/ ){
print "$_\n";
}
}
output:
ULAQLLLDK
TAAQLLLDK
TLCQLLLDK
There are only 9 x 25 = 225 such strings, so you may as well generate them all and put them in a hash for comparison
use strict;
use warnings;
use 5.010;
my %matches;
my $s = 'TLAQLLLDK';
for my $i (0 .. length($s) - 1) {
my $c = substr $s, $i, 1;
for my $cc ('A' .. 'Z') {
substr(my $ss = $s, $i, 1) = $cc;
++$matches{$ss} unless $cc eq $c;
}
}
printf "%d matches found\n", scalar keys %matches;
for ( qw/ TLAQLLLDK TLAQLXLDK / ) {
printf "\$matches{%s} = %s\n", $_, $matches{$_} // 'undef';
}
output
225 matches found
$matches{TLAQLLLDK} = undef
$matches{TLAQLXLDK} = 1

Regular expressions, matching operator using a string variable in Perl

I am using a regex but am getting some odd, unexpected "matches". "Names" are sent to a subroutine to be compared to an array called #ASlist, which contains multiple rows. The first element of each row is also a name, followed by 0 to several synonyms. The goal is to match the incoming "name" to any row in #ASlist that has a matching cell.
Sample input, from which $names is derived for the comparison against #ASlist:
13 1 13 chr7 7 70606019 74345818 Otud7a Klf13 E030018B13Rik Trpm1 Mir211 Mtmr10 Fan1 Mphosph10 Mcee Apba2 Fam189a1 Ndnl2 Tjp1 Tarsl2 Tm2d3 1810008I18Rik Pcsk6 Snrpa1 H47 Chsy1 Lrrk1 Aldh1a3 Asb7 Lins Lass3 Adamts17
Sample lines from #ASlist:
HSPA5 BIP FLJ26106 GRP78 MIF2
NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13
ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK
The code:
my ($name) = #_; ## this comes in from another loop elsewhere in code I did not include
chomp $name;
my #collectmatches = (); ## container to collect matches
foreach my $ASline ( #ASlist ){
my #synonyms = split("\t", $ASline );
for ( my $i = 0; $i < scalar #synonyms; $i++ ){
chomp $synonyms[ $i ];
#print "COMPARE $name TO $synonyms[ $i ]\n";
if ( $name =~m/$synonyms[$i]/ ){
print "\tname $name from block matches\n\t$synonyms[0]\n\tvia $synonyms[$i] from AS list\n";
push ( #collectmatches, $synonyms[0], $synonyms[$i] );
}
else {
# print "$name does not match $synonyms[$i]\n";
}
}
}
The script is working but also reports weird matches. Such as, when $name is "E030018B13Rik" it matches "NDUFA5" when it occurs in #ASlist. These two should not be matched up.
If I change the regex from ~m/$synonyms[$i]/ to ~m/^$synonyms[$i]$/, the "weird" matches go away, BUT the script misses the vast majority of matches.
The NDUFA5 record contains B13 as a pattern, which will match E030018<B13>Rik.
If you want to be more literal, then add boundary conditions to your regular expression /\b...\b/. Also should probably escape regular expression special characters using quotemeta.
if ( $name =~ m/\b\Q$synonyms[$i]\E\b/ ) {
Or if you want to test straight equality, then just use eq
if ( $name eq $synonyms[$i] ) {
Another, more Perlish way to test for string equality is to use a hash.
You don't show any real test data, but this short Perl program builds a hash from your array #ASlist of lines of match strings. After that, most of the work is done.
The subsequent for loop tests just E030018B13Rik to see if it is one of the keys of the new %ASlist and prints an appropriate message
use strict;
use warnings;
my #ASlist = (
'HSPA5 BIP FLJ26106 GRP78 MIF2',
'NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13',
'ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK',
);
my %ASlist = map { $_ => 1 } map /\S+/g, #ASlist;
for (qw/ E030018B13Rik /) {
printf "%s %s\n", $_, $ASlist{$_} ? 'matches' : 'doesn\'t match';
}
output
E030018B13Rik doesn't match
Since you only need to compare two strings, you can simply use eq:
if ( $name eq $synonyms[$i] ){
You are using B13 as the regular expression. As none of the characters has a special meaning, any string containing the substring B13 matches the expression.
E030018B13Rik
^^^
If you want the expression to match the whole string, use anchors:
if ($name =~m/^$synonyms[$i]$/) {
Or, use index or eq to detect substrings (or identical strings, respectively), as your input doesn't seem to use any features of regular expressions.

Perl regex digit

Consider my regex in this code section:
use strict;
my #list = ("1", "2", "123");
&chk(#list);
sub chk {
my #num = split (" ", "#_");
foreach my $chk (#num) {
chomp $chk;
if ($chk =~ m/\d{1,2}?/) {
print "$chk\n";
}
}
}
The \d{4} will print nothing. The \d{3} will print only 123. But if I change to \d{1,2}? it will print all. I thought, according to all the sources I read so far, that {1,2} mean: one digit but no more than two. So it should have printed only 1 and 2, correct?
What do I need to extract items that contains only one to two digits?
\d{1,2} succeeds if it finds 1 or 2 digits anywhere in the string provided. Additional string content is does not cause the match to fail. If you want to match only when the string contains exactly 1 or 2 digits, do this: ^\d{1,2}$
You should anchor your regular expression for the desired effect. The built-in function grep suits better here since it is a selection from an array that is to be done:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = ( 1, 2, 123 );
print join "\n", grep /^\d{1,2}$/, #list;
It appears to be working perfectly!
Here's a hint: Use the Perl variables $`, $&, and $'. These variables are special regular expression variables that show the part of the string before the match, what was matched, and the post matched string.
Here's a sample program:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Scalar::Util;
my #list = ("1", "2", "123");
foreach my $string (#list) {
if ($string =~ /\d{1,2}?/) {
say qq(We have a match for "string"!);
say qq("$`" "$&" "$'");
}
else {
say "No match makes David Sad";
}
}
The output will be:
We have a match for "1"!
"" "1" ""
We have a match for "2"!
"" "2" ""
We have a match for "123"!
"" "1" "23"
What this does is divide up the string into three sections: The section of the string before the regular expression match, the section of the string that matches the regular expression, and the section of the string after the regular expression match.
In each case, there was no pre-match because the regular expression matches from the start of the string. We also see that \d{1,2}? matches a single digit in each case even through 123 could have matched two digits. Why? Because the question mark on the end of the match specifier tells the regular expression not to be greedy. In this case, we tell the regular expression to match either one or two characters. Fine, it matches on one. Remove the question mark, and the last line would have looked like this:
We have a match for "123"!
"" "12" "3"
If you want to match on one or two digits, but not three or more digits, you'll have to specify the part of your string before and after the one or two digits. Something like this:
/\D\d{1,2}\D/
This would match your string foo12bar, but not foo123bar. But what if the string is 12? In that case, we want to say that either we have the beginning of the string, or a non-digit before our one or two character match, and we either have a non-digit or the end of the string at the end of our one or two character match:
/(\D|^)\d{1,2}(/D|$)/
A quick explanation:
(\D|^): A non-digit or the beginning of the string (The ^ anchor)
d{1,2}: One or two digits
(\D|$): A non-digit or the end of the string (The $ anchor)
Now, this will match 12, but not 123, and it will match foo12 and foo12bar, but not foo123 or foo123bar.
Just looking for a one or two digit number, we can simply specify the anchors:
/^\d{1,2}$/;
Now, that will match 1, 12, but not foo12 or 123.
The main thing is to use the $`, $&, and $' variables in order to help see exactly what your regular expression is matching on and what's before and after your match.
No, because while the regex only matches two digits, $chk still contains 123. If you want to only print the part that is matched, use
if ($chk =~ m/(\d{1,2})/) {
print "$1\n";
}
Note the parentheses and the $1. This causes it to print only that which is in the parentheses.
Also, this code doesn't make much sense:
sub chk {
my #num = split (" ", "#_");
Because #_ already is an array it makes no sense to make it into a string and then split it. Simply do:
sub chk {
foreach my $chk (#_) {
You also do not need to use chomp for data that is not coming from user input, as it is intended to remove the trailing newline. There is no newline in any of this data.
#!/usr/bin/perl
use strict;
my #list = ("1", "2", "123");
&chk(\#list);
sub chk {
foreach my $chk (#{$_[0]}) {
print "$chk\n" if $chk =~ m/^\d{1,2}$/ ;
}
}
#!/usr/bin/perl
use strict;
use warnings;
my #list = ("1", "2", "123");
&chk(#list);
sub chk {
my #num = split (" ", "#_");
foreach my $chk (#num) {
chomp $chk;
if ($chk =~ m/\d{1,2}/ && length($chk) <= 2) {
print "$chk\n";
}
}
}

Perl regex: how to know number of matches

I'm looping through a series of regexes and matching it against lines in a file, like this:
for my $regex (#{$regexs_ref}) {
LINE: for (#rawfile) {
/#$regex/ && do {
# do something here
next LINE;
};
}
}
Is there a way for me to know how many matches I've got (so I can process it accordingly..)?
If not maybe this is the wrong approach..? Of course, instead of looping through every regex, I could just write one recipe for each regex. But I don't know what's the best practice?
If you do your matching in list context (i.e., basically assigning to a list), you get all of your matches and groupings in a list. Then you can just use that list in scalar context to get the number of matches.
Or am I misunderstanding the question?
Example:
my #list = /$my_regex/g;
if (#list)
{
# do stuff
print "Number of matches: " . scalar #list . "\n";
}
You will need to keep track of that yourself. Here is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my #regexes = (
qr/b/,
qr/a/,
qr/foo/,
qr/quux/,
);
my %matches = map { $_ => 0 } #regexes;
while (my $line = <DATA>) {
for my $regex (#regexes) {
next unless $line =~ /$regex/;
$matches{$regex}++;
}
}
for my $regex (#regexes) {
print "$regex matched $matches{$regex} times\n";
}
__DATA__
foo
bar
baz
In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special #- array (emphasis added):
#LAST_MATCH_START
#-
$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with
substr $_, $-[n], $+[n] - $-[n]
if $-[n] is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with #+.
This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.
After a match against some variable $var:
$` is the same as substr($var, 0, $-[0])
$& is the same as substr($var, $-[0], $+[0] - $-[0])
$' is the same as substr($var, $+[0])
$1 is the same as substr($var, $-[1], $+[1] - $-[1])
$2 is the same as substr($var, $-[2], $+[2] - $-[2])
$3 is the same as substr($var, $-[3], $+[3] - $-[3])
Example usage:
#! /usr/bin/perl
use warnings;
use strict;
my #patterns = (
qr/(foo(bar(baz)))/,
qr/(quux)/,
);
chomp(my #rawfile = <DATA>);
foreach my $pattern (#patterns) {
LINE: for (#rawfile) {
/$pattern/ && do {
my $captures = $#-;
my $s = $captures == 1 ? "" : "s";
print "$_: got $captures capture$s\n";
};
}
}
__DATA__
quux quux quux
foobarbaz
Output:
foobarbaz: got 3 captures
quux quux quux: got 1 capture
How about below code:
my $string = "12345yx67hjui89";
my $count = () = $string =~ /\d/g;
print "$count\n";
It prints 9 here as expected.