perl regex parenthesis matching

I have a variable $next containing strings that may include parentheses, e.g. trna(tgc). I want to write the match if ($data[$i][2]=~/$next/){ ..} but it always returns false, even when the strings really do match. I tried if ($data[$i][2]=~/trnA\(tgc\)/){ ..} and it works.
My question is: how do I insert a '\' in front of each parenthesis in the variable $next?

You need to quote meta-characters.
Try this.
print "match" if( $var1 =~ /\Q$var2\E/ );

I think you want quotemeta:
my $next = "trna(tgc)";
my $search = quotemeta($next);  # escapes every regex metacharacter: trna\(tgc\)
if ($data[$i][2]=~/$search/){
# ..
}
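As a minimal sketch of the two suggestions above (quotemeta and \Q...\E do the same escaping), the following compares a quoted and an unquoted match. The helper name contains_literal and the sample haystack are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally inside $haystack.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $next = 'trna(tgc)';
print quotemeta($next), "\n";                              # trna\(tgc\)
print contains_literal('gene trna(tgc) x', $next), "\n";   # 1
# Unquoted, the parentheses form a capture group, so the pattern means "trnatgc".
print(('gene trna(tgc) x' =~ /$next/ ? 1 : 0), "\n");      # 0
```

Quoting matters for any metacharacter in the variable, not only parentheses.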

Related

Perl pattern match not working as expected

I'm trying to match values, which may be comma separated, using a regex. Basically, I want to return true if any value in the string does NOT have 3g or 3k starting in the 3rd position.
My test code is as follows:
use feature 'say';
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
print $_;
say $_ =~ /(?:^|,)\w{2}[^(?:3G|3K)]/i ? " true" : " false";
}
This returns
in3g123456,dh3k123456,dhec110101 true
dhec110101,dhec123456 true
in3g123456,dh3k123456 false
c3kasdf false <- whaaaaaaaat?
usdfusdufs3gsdf true
I don't understand why the 4th one is not true. Any help would be appreciated.

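[^(?:3G|3K)] is a bracketed character class, not a group: it reads as "any single character except (, ?, :, 3, G, |, K or )". With /i the class also excludes g and k, so in c3kasdf the match fails at the k that follows c3:
failed
   v
c3 kasdf
/(?:^|,)\w{2}[^(?:3G|3K)]/i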
Use this:
/(?:^|,)\w{2}(?!3G|3K)/i
Demo: https://regex101.com/r/P2XsgN/1.

How about /\b\w{2}(?!3g|3k)/i?
\b matches the empty string at the beginning or end of a word. It is a slightly simpler equivalent of (^|,) in this situation.
(?!foo) is a zero-width negative lookahead assertion: it matches the empty string as long as it is not followed by a substring that matches foo.
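The lookahead fix can be checked against the five test strings from the question. This is only a sketch; the helper name has_non_3g3k is an assumption, not something from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any comma-separated value lacks 3g/3k at offset 2.
sub has_non_3g3k {
    my ($line) = @_;
    return $line =~ /(?:^|,)\w{2}(?!3G|3K)/i ? 1 : 0;
}

for my $line ('in3g123456,dh3k123456,dhec110101', 'dhec110101,dhec123456',
              'in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf') {
    printf "%s %s\n", $line, has_non_3g3k($line) ? 'true' : 'false';
}
```

Only the third string, where every value has 3g or 3k in positions 3 and 4, should print false.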

You can also split the string first, instead of parsing everything with a regex. That is far more flexible and maintainable, and easier.
When processing the list of the extracted "values" you can match any character twice then your pattern, /^..$patt/. The module List::MoreUtils is useful (and fast) for list manipulations, and its notall function is tailor-made for your condition.
use warnings 'all';
use strict;
use List::MoreUtils qw(notall);
my $file = '...';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
chomp;  # drop the newline so it is not printed in the middle of the output line
my $res = notall { /^..(?:3k|3g)/ } split /,/;
print "$_: " . ($res ? 'true' : 'false'), "\n";
}
I presume that you read from a file. If not, replace while (<$fh>) with for (@strings).
The notall function returns true if any element of the list fails the condition.
The split by default uses $_ so we only need the pattern. Here it is simply , but the pattern takes a regex so one can match separators flexibly. For example, this /[,\s]+/ splits on any amount of , and/or whitespace. So ,, , in a string is matched as a separator, as well as , or space(s).
When applied to the array with your strings the above prints
in3g123456,dh3k123456,dhec110101: true
dhec110101,dhec123456: true
in3g123456,dh3k123456: false
c3kasdf: true
usdfusdufs3gsdf: true

You could use substr to get data at 3rd and 4th position and then compare it with (3g|3k).
substr $_,2,2
#!/usr/bin/perl
use strict;
use warnings;
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
my @inputs = split /,/,$_;
my $flag = 0;
foreach (@inputs){
$flag = 1 unless ((substr $_,2,2) =~ /(3g|3k)/);
}
$flag ? print "$_: True\n" : print "$_: False\n";
}
Output:
in3g123456,dh3k123456,dhec110101: True
dhec110101,dhec123456: True
in3g123456,dh3k123456: False
c3kasdf: True
usdfusdufs3gsdf: True

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ # I want to match the char, searching from the end; $ doesn't work
print "success";
}
}
Thank you for your assistance.

There is no regex modifier that forces the Perl regex engine to parse the string from right to left. The most convenient way to match the last occurrence is therefore a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead fails the match if another $char is found after any zero or more characters other than line break characters (use the s modifier if you are running the regex against multiline input).

The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
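Both regex approaches above can be sketched as substitutions that delete the last occurrence of a character. The helper names are hypothetical, \Q...\E is added on the assumption that $char should be treated literally, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helpers; both remove the last occurrence of a literal $char.
sub remove_last_lookahead {
    my ($string, $char) = @_;
    $string =~ s/\Q$char\E(?!.*\Q$char\E)//s;   # no later occurrence may follow
    return $string;
}

sub remove_last_greedy {
    my ($string, $char) = @_;
    $string =~ s/.*\K\Q$char\E//s;              # greedy .* backtracks to the last one
    return $string;
}

print remove_last_lookahead('abcabc', 'b'), "\n";   # abcac
print remove_last_greedy('abcabc', 'b'), "\n";      # abcac
```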

I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string.
You can do that like this:
sub removeCharFromEnd {
my ( $string, $char ) = @_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place, then you should operate on $_[0], which is an alias for the caller's variable, and write
$_[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
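A minimal sketch of the in-place variant, relying on the fact that the elements of @_ are aliases for the caller's arguments. The subroutine name and the added \Q...\E quoting are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-place variant: modifies the caller's variable through $_[0].
sub remove_char_from_end_in_place {
    my $char = $_[1];
    return $_[0] =~ s/\Q$char\E\z// ? 1 : 0;   # 1 if something was removed
}

my $s = 'abc';
remove_char_from_end_in_place($s, 'c');
print "$s\n";   # ab
```

Copying the argument into a lexical first (my $string = shift) would break the alias, which is why the substitution is applied to $_[0] directly.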

To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift @_;
my $char = quotemeta reverse shift @_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Group of Regular expression matches, return nth match and assign to array

$content = "<p><strong>haha</p> .. .. .. <p><strong> hihi </p>";
my @eachTopic = ($content =~ /(<p><strong>)(.*?)(<\/p>)/g);
#I only want to capture the $2 and add it to array
print $_."\n" foreach(@eachTopic);
The resulting array should have 2 values $eachTopic[0] = "haha" and $eachTopic[1] = "hihi".
I know I could do this using a for loop and an if statement, but I'm just wondering whether there is anything in regular expressions that lets me do something like
@arr = ($content =~ /(x)(.*?)(x)/g$2/)

Instead of trying to do this with a match group, use zero-width assertions to match what comes before and after the part you want to match. That way you're matching only that part, and a global match will return an array of all occurrences.
my @eachTopic = ($content =~ /(?<=<p><strong>).*?(?=<\/p>)/g);
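For comparison, here is a sketch that runs the lookaround pattern next to a variant not suggested in the thread: keeping a single capture group, since a /g match in list context returns every captured substring. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $content = "<p><strong>haha</p> .. .. .. <p><strong> hihi </p>";

# Lookaround version from the answer above: the match itself is the wanted text.
my @by_lookaround = $content =~ /(?<=<p><strong>).*?(?=<\/p>)/g;

# Single capture group: in list context, /g returns every captured substring.
my @by_capture = $content =~ /<p><strong>(.*?)<\/p>/g;

print "[$_]\n" for @by_capture;   # [haha] then [ hihi ]
```

Note that the second value keeps its surrounding spaces, because they are inside the tags in the sample string.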

Have you tried split?
like so:
my #topics = map{ $_ =~ s/<\/p>$//; $_ }split( /<p><strong>/, $txt );
This of course fails if your string contains something other than <p><strong>...</p> at some point...

Regular expressions, matching operator using a string variable in Perl

I am using a regex but am getting some odd, unexpected "matches". "Names" are sent to a subroutine to be compared to an array called @ASlist, which contains multiple rows. The first element of each row is also a name, followed by 0 to several synonyms. The goal is to match the incoming "name" to any row in @ASlist that has a matching cell.
Sample input, from which $names is derived for the comparison against @ASlist:
13 1 13 chr7 7 70606019 74345818 Otud7a Klf13 E030018B13Rik Trpm1 Mir211 Mtmr10 Fan1 Mphosph10 Mcee Apba2 Fam189a1 Ndnl2 Tjp1 Tarsl2 Tm2d3 1810008I18Rik Pcsk6 Snrpa1 H47 Chsy1 Lrrk1 Aldh1a3 Asb7 Lins Lass3 Adamts17
Sample lines from @ASlist:
HSPA5 BIP FLJ26106 GRP78 MIF2
NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13
ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK
The code:
my ($name) = @_; ## this comes in from another loop elsewhere in code I did not include
chomp $name;
my @collectmatches = (); ## container to collect matches
foreach my $ASline ( @ASlist ){
my @synonyms = split("\t", $ASline );
for ( my $i = 0; $i < scalar @synonyms; $i++ ){
chomp $synonyms[ $i ];
#print "COMPARE $name TO $synonyms[ $i ]\n";
if ( $name =~m/$synonyms[$i]/ ){
print "\tname $name from block matches\n\t$synonyms[0]\n\tvia $synonyms[$i] from AS list\n";
push ( @collectmatches, $synonyms[0], $synonyms[$i] );
}
else {
# print "$name does not match $synonyms[$i]\n";
}
}
}
The script is working but also reports weird matches. For example, when $name is "E030018B13Rik" it matches "NDUFA5" when it occurs in @ASlist. These two should not be matched up.
If I change the regex from ~m/$synonyms[$i]/ to ~m/^$synonyms[$i]$/, the "weird" matches go away, BUT the script misses the vast majority of matches.

The NDUFA5 record contains B13 as a pattern, which will match E030018<B13>Rik.
If you want to be more literal, then add boundary conditions to your regular expression: /\b...\b/. You should also probably escape regular expression special characters using quotemeta (or \Q...\E).
if ( $name =~ m/\b\Q$synonyms[$i]\E\b/ ) {
Or if you want to test straight equality, then just use eq
if ( $name eq $synonyms[$i] ) {
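The difference between the three tests can be seen on the problem pair from the question. This is a small sketch with variable names chosen only for illustration.

```perl
use strict;
use warnings;

my $name = 'E030018B13Rik';
my $syn  = 'B13';

my $substring = $name =~ /\Q$syn\E/     ? 1 : 0;   # 1: B13 occurs inside the name
my $bounded   = $name =~ /\b\Q$syn\E\b/ ? 1 : 0;   # 0: B13 is not a separate word here
my $equal     = $name eq $syn           ? 1 : 0;   # 0: the strings differ

print "substring=$substring bounded=$bounded equal=$equal\n";
```

Only the unanchored pattern reports a match, which is the "weird" match described in the question.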

Another, more Perlish way to test for string equality is to use a hash.
You don't show any real test data, but this short Perl program builds a hash from your array @ASlist of lines of match strings. After that, most of the work is done.
The subsequent for loop tests just E030018B13Rik to see if it is one of the keys of the new %ASlist and prints an appropriate message
use strict;
use warnings;
my @ASlist = (
'HSPA5 BIP FLJ26106 GRP78 MIF2',
'NDUFA5 B13 CI-13KD-B DKFZp781K1356 FLJ12147 NUFM UQOR13',
'ACAN AGC1 AGCAN CSPG1 CSPGCP MSK16 SEDK',
);
my %ASlist = map { $_ => 1 } map /\S+/g, @ASlist;
for (qw/ E030018B13Rik /) {
printf "%s %s\n", $_, $ASlist{$_} ? 'matches' : 'doesn\'t match';
}
output
E030018B13Rik doesn't match

Since you only need to compare two strings, you can simply use eq:
if ( $name eq $synonyms[$i] ){

You are using B13 as the regular expression. As none of the characters has a special meaning, any string containing the substring B13 matches the expression.
E030018B13Rik
^^^
If you want the expression to match the whole string, use anchors:
if ($name =~m/^$synonyms[$i]$/) {
Or, use index or eq to detect substrings (or identical strings, respectively), as your input doesn't seem to use any features of regular expressions.

Break from regex loop in Perl

In Perl regex, how can I break from /ge loop..?
Let's say the code is:
s/\G(foo)(bar)(;|$)/{ break if $3 ne ';'; print "$1\n"; '' }/ge;
...break here doesn't work, but it should illustrate what I mean.

Generally, I would write this as a while statement:
while( s/(foo)(bar)/$1/ ) {
# my code to determine if I should stop
if(something) {
last;
}
}
The caveat with this method is that your search/replace will start at the beginning each time, which may matter depending on your regex.
If you really wanted to do it in the regex, you could write a function that returns an unmodified string if you reached your end point, such as a count in this case:
my $count=0;
sub myfunc {
my ($string, $a, $b) = @_;
$count++;
if($count > 3) {
return $string;
}
return $a;
}
$mystring = "foobar foobar, foobar + foobar and foobar";
$mystring =~ s/((foo)(bar))/myfunc($1,$2,$3)/ge;
# result: $mystring => "foo foo, foo + foobar and foobar"
If I knew your specific case, I could probably provide a more helpful example.

You can use some experimental features to emulate a break statement. The Perl documentation for some of these features warns that they may change in future versions of Perl.
my $str = "abcdef";
my $stop = 0;
$str =~ s/(?(?{ $stop })(?!))(.)/ $stop = 1 if $1 ge "c"; "X" /ge;
print "$str\n";
This will print XXXdef.
A piecewise explanation:
(?(condition)yes-pattern) if the condition is true then match yes-pattern, otherwise match nothing (the empty string).
(?{ code }) executes code; used as the condition, the yes-pattern is attempted if the code returns a true value.
(?!) always fails to match. Its meaning is something like "not followed by nothing", and since 'nothing' can be matched at any point in a string, it always fails.
So when $stop is true the pattern can never match, and when $stop is false it matches.
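If the experimental constructs are a concern, the same effect can be approximated with an ordinary while loop over a \G-anchored /gc match, where last works as usual. This is a sketch under that assumption, not the method from either answer; the helper name is hypothetical and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace characters with X, stopping after the first one ge 'c'.
sub replace_until_stop {
    my ($str) = @_;
    my $out = '';
    while ($str =~ /\G(.)/gcs) {
        $out .= 'X';
        last if $1 ge 'c';          # an ordinary loop, so last works
    }
    # Keep whatever was not consumed; pos() is undef if nothing ever matched.
    return $out . substr($str, pos($str) // 0);
}

print replace_until_stop('abcdef'), "\n";   # XXXdef
```

The /c flag keeps pos() in place when the final match attempt fails, so the trailing substr stays correct even if the loop never hits last.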
