How do I use operations inside a Perl regex?

For example let's say I have something like this:
$_ = 23;
$a = 2;
print /$a $a+1/x;
should print 1. Basically, is it possible to use functions inside the regex string?

Variable interpolation in regexes works pretty much the same as variable interpolation in strings. Given my $x = 2, the string "$x $x+1" would be "2 2+1". The variable is expanded, but code in the string is not executed.
One trick around this is to dereference a reference inside the string. This allows us to include arbitrary expressions, but the syntax is a bit cumbersome. Usually, we create an array reference with the value we want to include, [$x + 1], then immediately dereference it: @{[$x + 1]}. This is similar to Ruby's #{...} interpolation, or to Bash $(...) command substitution.
So the regex /$x @{[$x + 1]}/x would work.
But in most cases, it's going to be much clearer to perform all calculations outside of the regex:
my $x = 2;
my $y = $x + 1;
/$x $y/x;
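The same advice carries over to other languages. As a rough sketch in Python (the names value, x and y are illustrative and not from the question), the arithmetic is done first and only the finished numbers are interpolated into the pattern:

```python
import re

value = "23"   # the string to test, mirroring $_ = 23
x = 2
y = x + 1      # do the arithmetic before building the pattern

# re.VERBOSE plays the role of Perl's /x: whitespace in the pattern is ignored.
pattern = re.compile(rf"{x} {y}", re.VERBOSE)

matched = pattern.search(value) is not None
print(matched)
```

If the interpolated values could ever contain regex metacharacters, they would need escaping (re.escape in Python) before going into the pattern.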
Perl regexes also have syntax that can generate parts of the pattern dynamically. With variable interpolation as above, the variable contents are interpolated first, and then the regex is compiled. But advanced regexes may change the value of a variable during the pattern match. Such delayed subpatterns can be written with the (??{ ... }) syntax. Here: /$x (??{ $x + 1 })/x (because this pattern also interpolates $x, it needs use re 'eval'). However, this is a very advanced and error-prone regex feature, and it will also be slower than an ordinary regex.

There is an extended pattern that provides for code execution in the match operator m// or in the matching part of the substitution operator s///.
The variant that takes the code's return value and then treats it as a pattern is
/(??{ code })/
so in your case
$_ = 23;
my $x = 2;
my ($m) = /(2(??{ $x+1 }))/;
say $m;
or
RE_EVAL: {
use re 'eval';
my ($m) = /($x(??{ $x+1 }))/;
say $m;
}
matches and captures 23.
Here use re 'eval' specifically allows this: a code block in a pattern that also interpolates a variable is normally disallowed for security reasons.
This is a very involved capability which comes with complex warnings. Apart from its entry in perlre, also read the section on Embedded Code Execution Frequency that the entry points to.
Please don't use this complex tool for convenience, or to substitute for properly written code.

Related

How to construct this regular expression?

How can I ignore abc, def and 01 in the expression below using a regex? I tried "ignore me", but it doesn't work.
Abc-def-smdp-01
One way to go is to split the original string into prefix, portion of interest, and suffix, then remove the unwanted characters from the affixes:
$raw = "abc-def-smdp-01";
preg_match ( "/^(.*)(-smdp-)(.*)$/", $raw, $matches ); // separate original string into prefix, trunk, suffix
$matches[1] = preg_replace ( "/[^-]/", "", $matches[1] ); // non-'-' characters deleted in prefix
$matches[3] = preg_replace ( "/[^-]/", "", $matches[3] ); // non-'-' characters deleted in suffix
$result = $matches[1].$matches[2].$matches[3]; // composing target string
echo $result;
NB
Problems similar to this one can be tackled easily with some knowledge of the PHP function library, whose documentation in the online PHP manual comes with exact syntax, example code, and user comments.
In this case, look up:
preg_match
preg_replace
Of course, finding suitable candidates for the intended functionality assumes at least a perfunctory grasp of the available facilities, which makes 15 minutes of browsing the hierarchical index a judicious investment of time.
I would use this pattern:
preg_match("/\w+\-\w+\-(\w+)\-\d+/", "Abc-def-smdp-01", $output);
echo $output[1];
http://www.phpliveregex.com/p/ftB
EDIT: or do you need the dashes?
In that case you need to use this pattern: "/\w+(\-)\w+(\-)(\w+)(\-)\d+/"
And then output it as:
echo $output[1].$output[2].$output[3].$output[4];
or loop it and build a string with .=
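For comparison, the capture-group idea is not specific to PHP. A minimal sketch in Python, assuming the input always has the shape word-word-word-digits (the variable names are illustrative):

```python
import re

raw = "Abc-def-smdp-01"

# Capture each dash plus the segment of interest; the other parts are matched but not kept.
match = re.fullmatch(r"\w+(-)\w+(-)(\w+)(-)\d+", raw)

# Joining the captured groups mirrors $output[1].$output[2].$output[3].$output[4].
result = "".join(match.groups()) if match else ""
print(result)
```

If only the middle segment is needed, match.group(3) gives it on its own.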

regular expression url rewrite based on folder

I need to be able to take /calendar/MyCalendar.ics, where MyCalendar.ics could be just about anything with an ICS extension, and rewrite it to /feeds/ics/ics_classic.asp?MyCalendar.ics
Thanks
Regular expressions are meant for searching and matching text. Usually you use a regex to tell a text manipulation tool what to search for, and then use a tool-specific syntax to tell it what to replace the match with.
Regex syntax uses round brackets to define capture groups inside the overall search pattern. Many search-and-replace tools use capture groups to select which part of the match to reuse in the replacement.
We can take the Java Pattern and Matcher classes as an example. To complete your task with the Java Matcher you can use the following code:
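import java.util.regex.Matcher;
import java.util.regex.Pattern;

// (?i) makes the extension case-insensitive; note the doubled backslash needed in a Java string.
Pattern p = Pattern.compile("/calendar/(.*\\.(?i)ics)");
Matcher m = p.matcher(url);
String rewrittenUrl = "";
if (m.matches()) {
    // Group 1 is the file name captured by the parentheses.
    rewrittenUrl = "/feeds/ics/ics_classic.asp?" + m.group(1);
}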
This will find the requested pattern but will only take the first regex group for creating the new string.
Here is a link to regex replacement information in (imho) a very good regex information site: http://www.regular-expressions.info/refreplace.html
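To make the capture-group-plus-replacement idea concrete, here is a minimal sketch in Python. The function name rewrite and the choice to leave non-matching URLs unchanged are my assumptions, not part of the question:

```python
import re

# Capture the file name (no slashes) ending in .ics, case-insensitively.
CALENDAR = re.compile(r"^/calendar/([^/]+\.ics)$", re.IGNORECASE)

def rewrite(url):
    """Return the rewritten feed URL, or the input unchanged if it does not match."""
    # \1 in the replacement refers to the first capture group.
    return CALENDAR.sub(r"/feeds/ics/ics_classic.asp?\1", url)

print(rewrite("/calendar/MyCalendar.ics"))
```

Using [^/]+ rather than .* keeps the capture group from swallowing a path separator.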
C:\x>perl foo.pl
Before: a=/calendar/MyCalendar.ics
After: a=/feeds/ics/ics_classic.asp?MyCalendar.ics
...or how about this way?
(regex kind of seems like overkill for this problem)
b=/calendar/MyCalendar.ics
index=9
c=MyCalendar.ics (might want to add check for ending with '.ics')
d=/feeds/ics/ics_classic.asp?MyCalendar.ics
Here's the code:
C:\x>type foo.pl
my $a = "/calendar/MyCalendar.ics";
print "Before: a=$a\n";
my $match = (
$a =~ s|^.*/([^/]+)\.ics$|/feeds/ics/ics_classic.asp?$1.ics|i
);
if( ! $match ) {
die "Expected path/filename.ics instead of \"$a\"";
}
print "After: a=$a\n";
print "\n";
print "...or how about this way?\n";
print "(regex kind of seems like overkill for this problem)\n";
my $b = "/calendar/MyCalendar.ics";
my $index = rindex( $b, "/" ); #find last path delim.
my $c = substr( $b, $index+1 );
print "b=$b\n";
print "index=$index\n";
print "c=$c (might want to add check for ending with '.ics')\n";
my $d = "/feeds/ics/ics_classic.asp?" . $c;
print "d=$d\n";
C:\x>
General thoughts:
If you do solve this with a regex, a semi-tricky bit is making sure your capture group (the parens) excludes the path separator.
Some things to consider:
Are your paths separators always forward-slashes?
Regex seems like overkill for this; the simplest thing I can think of is to get the index of your last path separator and do simple string manipulation (second part of the sample program).
Libraries often have routines for parsing paths. In Java I'd look at the java.io.File object, for example, specifically
getName()
Returns the name of the file or directory denoted by
this abstract pathname. This is just the last name in
the pathname's name sequence
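The library-routine approach might look like this in Python. This is only a sketch: it assumes forward-slash separators, and the function name is made up for illustration:

```python
import posixpath

def rewrite_without_regex(url):
    """Build the feed URL from the last path component, with no regex involved."""
    name = posixpath.basename(url)  # text after the final '/'
    if not name.lower().endswith(".ics"):
        raise ValueError(f"expected path/filename.ics instead of {url!r}")
    return "/feeds/ics/ics_classic.asp?" + name

print(rewrite_without_regex("/calendar/MyCalendar.ics"))
```

posixpath is used rather than os.path so that the behaviour does not depend on the platform's own separator.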

how do you match two strings in two different variables using regular expressions?

$a='program';
$b='programming';
if ($b=~ /[$a]/){print "true";}
This is not working.
Thanks, everyone. I was a little confused.
The [] in a regex denotes a character class, which matches any one of the characters listed inside it.
Your regex is equivalent to:
$b=~ /[program]/
which returns true as character p is found in $b.
To see whether the match happens you are printing true; printing the bareword true will not show anything. Try printing something else.
But if you want to check whether one string is present inside another, drop the [..]:
if ($b=~ /$a/) { print 'true';}
If variable $a contains any regex metacharacter, the above match may misbehave. To fix that, place the variable between \Q and \E so that any metacharacters in it are escaped:
if ($b=~ /\Q$a\E/) { print 'true';}
Assuming either variable may come from external input, please quote the variable inside the regex:
if ($b=~ /\Q$a\E/){print "true";}
You then won't get burned when the pattern you are looking for contains "reserved characters" like any of -[]{}().
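To see why the quoting matters, here is a small sketch in Python, where re.escape plays roughly the role of Perl's \Q...\E. The sample strings are hypothetical:

```python
import re

needle = "a.c"     # hypothetical input containing a metacharacter
haystack = "abc"

# Unescaped: '.' matches any character, so "a.c" matches "abc".
unescaped_hit = re.search(needle, haystack) is not None

# Escaped: the '.' is now a literal period, so "abc" no longer matches.
escaped_hit = re.search(re.escape(needle), haystack) is not None

print(unescaped_hit, escaped_hit)
```

The unescaped search reports a match that a literal substring test would not, which is exactly the kind of surprise the quoting avoids.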
(Apart from the missing semicolons:) Why do you put $a in square brackets? This makes it a set of possible characters. Try:
$b =~ /\Q${a}\E/
Update
To answer your remarks regarding = and =~:
=~ is the matching operator, and specifies the variable to which you are applying the regex ($b) in your example above. If you omit =~, then Perl will automatically use an implied $_ =~.
In list context, a regex match returns a list containing the captured groups. You usually assign this to a list, as in ($match1, $match2) = $b =~ /.../;. If, on the other hand, you assign the result to a scalar, the scalar receives a true or false value (1 or the empty string) telling you whether the match succeeded.
So if you write $b = /\Q$a\E/, you'll end up with $b = $_ =~ /\Q$a\E/.
$a='program';
$b='programming';
if ( $b =~ /\Q$a\E/) {
print "match found\n";
}
If you're just looking for whether one string is contained within another and don't need to use any character classes, quantifiers, etc., then there's really no need to fire up the regex engine to do an exact literal match. Consider using index instead:
#!/usr/bin/env perl
use strict;
use warnings;
my $target = 'program';
my $string = 'programming';
if (index($string, $target) > -1) {
print "target is in string\n";
}

Shortening RegEx for Perl

I have created a regexp in Perl that is about 95 characters in length. I wish to shorten it to 78 characters but can't find a suitable method. Any advice is welcome. The regexp is similar to the code below; ideally there would be something similar to \ in C.
my ($foo, $bar, $etc) = $input_line =~
/^\d+: .... (\X+)\(\X(\d+.\d+|\d+)\/\X(\d+.\d+|\d+) (\X+)\)$/
There is a way to tell the regex engine to skip embedded whitespace and comments, so you can not only split the pattern across multiple lines but also comment it, format it into sections, etc. I think it's the 'x' flag, but I don't have documentation handy right now, so look it up in the man page.
So you'd change it to something like:
my ($foo, $bar, $etc) = $input_line =~ /
^\d+:\ .... # line number, then a four-character field
\ (\X+)\(
\X(\d+.\d+|\d+) # numerator
\/\X(\d+.\d+|\d+) # denominator
\ (\X+)\)$/x; # mind the escaped spaces!
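The same layout trick exists in other regex flavours. Below is a rough Python sketch using re.VERBOSE. Note the assumptions: Python's re module has no \X, so \S stands in for it purely for illustration, and the NUMBER piece and the sample line are made up for the demo:

```python
import re

# Reusable piece: an integer with an optional fractional part.
NUMBER = r"\d+(?:\.\d+)?"

# Assumption: \S stands in for Perl's \X, which Python's re module lacks.
LINE = re.compile(
    rf"""
    ^\d+:\ ....\        # line number, then a four-character field
    (\S+?)\(            # name before the opening parenthesis
    \S({NUMBER})        # numerator
    /\S({NUMBER})       # denominator
    \ (\S+)\)$          # trailing field; note the escaped space
    """,
    re.VERBOSE,
)

match = LINE.match("123: .... X(X1.1/X5 XXX)")
print(match.groups())
```

As in Perl's /x mode, every literal space has to be escaped, otherwise it is silently dropped from the pattern.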
It's also possible to construct pieces of a regular expression separately with the qr// operator and combine them using variable interpolation. Something like
my $num_re = qr/(\X+)\(\X(\d+.\d+|\d+)\/\X(\d+.\d+|\d+)/;
my ($foo, $bar, $etc) = $input_line =~ /^\d+: .... $num_re (\X+)\)$/;
I have not done this for long, so I am not sure whether any flags are needed.
Perl interpolates variables into regexes, so you could do something like this
my $input_line = '123: .... X(X1.1/X5 XXX)';
my $dOrI = '(\d+.\d+|\d+)';
my ($foo, $bar, $etc) = $input_line =~
/^\d+: .... (\X+)\(\X$dOrI\/\X$dOrI (\X+)\)$/;
print "$foo, $bar, $etc";
Output -
X, 1.1, 5
One thing I see in the regex is the period in '\d+.\d+'.
Note that '.' in a regex matches ANY character, not only an actual period character.
If you want to specify only an actual period character, you'll have to use '\.' instead.
The other thing is that you may be able to replace '\d+.\d+|\d+' with '\d+(?:\.\d+)?' (the shorter '\d+.?\d+' would not match a single digit such as 5).
[EDIT]
One more thing: if you use the interpolated regex more than once and don't change it between uses (say, in a loop), you can use the /o option so that Perl compiles the regex only once instead of every time.
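The point about the unescaped period is easy to check. A small sketch in Python, where the strict pattern is my suggested tightening and not something from the question:

```python
import re

loose = re.compile(r"\d+.\d+|\d+")      # '.' matches any character
strict = re.compile(r"\d+(?:\.\d+)?")   # '\.' matches only a period

# Each row: input, accepted by loose?, accepted by strict?
results = [
    (text, bool(loose.fullmatch(text)), bool(strict.fullmatch(text)))
    for text in ["1.1", "5", "1x1"]
]
for row in results:
    print(*row)
```

Both patterns accept 1.1 and 5, but only the loose one also accepts 1x1.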

Does a regular expression exist for enzymatic cleavage?

Does a regular expression exist for (theoretical) tryptic cleavage of protein sequences? The cleavage rule for trypsin is: after R or K, but not before P.
Example:
Cleavage of the sequence VGTKCCTKPESERMPCTEDYLSLILNR should result in these 3 sequences (peptides):
VGTK
CCTKPESER
MPCTEDYLSLILNR
Note that there is no cleavage after K in the second peptide (because P comes after K).
In Perl (it could just as well have been in C#, Python or Ruby):
my $seq = 'VGTRCCTKPESERMPCTEDYLSLILNR';
my @peptides = split /someRegularExpression/, $seq;
I have used this work-around (where a cut marker, =, is first inserted in the sequence and removed again if P is immediately after the cut marker):
my $seq = 'VGTRCCTKPESERMPCTEDYLSLILNR';
$seq =~ s/([RK])/$1=/g; #Main cut rule.
$seq =~ s/=P/P/g; #The exception.
my @peptides = split( /=/, $seq);
But this requires modification to a string that can potentially be very long and there can be millions of sequences. Is there a way where a regular expression can be used with split? If yes, what would the regular expression be?
Test platform: Windows XP 64 bit. ActivePerl 64 bit. From perl -v: v5.10.0 built for MSWin32-x64-multi-thread.
You indeed need to use the combination of a positive lookbehind and a negative lookahead. The correct (Perl) syntax is as follows:
my @peptides = split(/(?!P)(?<=[RK])/, $seq);
You could use look-around assertions to exclude those cases. Something like this should work:
split(/(?<=[RK](?!P))/, $seq)
You can use lookaheads and lookbehinds to match this stuff while still getting the correct position.
/(?<=[RK])(?!P)/
Should end up splitting on a point after an R or K that is not followed by a P.
In Python you can use the finditer method to return non-overlapping pattern matches including start and span information. You can then store the string offsets instead of rebuilding the string.
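A minimal sketch of that finditer idea, assuming the same cut rule as above (after R or K, not before P). The function name tryptic_spans is made up for illustration; it yields offsets only, so the sequence itself is never modified or copied:

```python
import re

# Zero-width cut point: after R or K, but not before P.
CUT = re.compile(r"(?<=[RK])(?!P)")

def tryptic_spans(seq):
    """Yield (start, end) offsets of each peptide in seq."""
    start = 0
    for m in CUT.finditer(seq):
        cut = m.start()
        if cut > start:
            yield (start, cut)
            start = cut
    if start < len(seq):
        yield (start, len(seq))   # trailing peptide with no cut after it

seq = "VGTKCCTKPESERMPCTEDYLSLILNR"
spans = list(tryptic_spans(seq))
print(spans)
print([seq[a:b] for a, b in spans])
```

Slicing is done only when a peptide is actually needed, which may help when there are millions of long sequences.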