Perl Ternary statement with regex as one of its results - regex

I currently have the following code:
my $class = $rs->{'CLS_ID'};
$class =~ s/^C\S{1,3}\s+// if ($transform);
This works fine, but I was wondering if those 2 statements could be combined into a single ternary expression?

The ternary conditional operator lets you specify a test and two values.
E.g.
my $value = somecondition ? value-if-true : value-if-false;
Now, you aren't doing that in your example: you're assigning a value, then running a substitution if your condition is true.
So I'd suggest that, whilst you possibly could, you'd be subverting the notion of what a ternary operator is for. And you'd still have $rs->{'CLS_ID'} on both 'sides' of the expression.
E.g:
my $class = $transform ? $rs->{'CLS_ID'} =~ s/^C\S{1,3}\s+//r : $rs->{'CLS_ID'};
The 'r' flag tells your regex to 'return' the modified value, without modifying the original. But I wouldn't do this, because it makes what you're doing less clear.
Note that the /r modifier is only available from Perl 5.14 onwards.
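For comparison, here is a minimal sketch of the two forms side by side. The sample value stored under 'CLS_ID' and the helper names class_two_step and class_ternary are made up for illustration; the /r form assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical sample record; only the key name comes from the question.
my $rs = { 'CLS_ID' => 'Cabc and this should be left' };

# Original two-statement form.
sub class_two_step {
    my ($transform) = @_;
    my $class = $rs->{'CLS_ID'};
    $class =~ s/^C\S{1,3}\s+// if $transform;
    return $class;
}

# Single-expression form; /r returns a modified copy (Perl 5.14+).
sub class_ternary {
    my ($transform) = @_;
    return $transform ? $rs->{'CLS_ID'} =~ s/^C\S{1,3}\s+//r : $rs->{'CLS_ID'};
}

for my $transform (0, 1) {
    print class_two_step($transform), " | ", class_ternary($transform), "\n";
}
```

Both helpers should agree for either value of $transform, and neither modifies the value held in the hash.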

Not with a conditional operator, but I think you're looking for this:
(my $class = 'initial-val') =~ s/something// if ($transform);
NOTE: Declaring a variable with my under a statement modifier is officially undefined behaviour, and has very weird side effects. I'll leave it here, though, as an example of what not to do.
Another way it could be accomplished (assuming $transform is either 0 or 1):
use strict;
my $rs = {'CLS_ID' => 'Cabc and this should be left'};
# True value
my $transform = 1;
my $class = (($transform.$rs->{'CLS_ID'}) =~ s/^(1C\S{1,3}\s+)|0//r);
print $class . "\n";
# False value
$transform = 0;
$class = (($transform.$rs->{'CLS_ID'}) =~ s/^(1C\S{1,3}\s+)|0//r);
print $class . "\n";
This prints:
and this should be left
Cabc and this should be left
But... please don't do this ;)

With the conditional operator, and no variable redundancy:
my ($class) = map { $transform ? s/^C\S{1,3}\s+//r : $_ } $rs->{'CLS_ID'};
For older Perls with no /r switch for s///:
$transform and s/^C\S{1,3}\s+// for my $class = $rs->{'CLS_ID'};

Related

Why is a regular expression containing the exact string not matching successfully?

This might be a beginner's mistake.
The regex never matches, although it clearly should.
#!/usr/bin/perl
# This will print "Hello, World"
print "Hello, world\n";
my $addr = "Hello";
#if($addr =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\)/ )
if (my $addr =~ /Hello/)
{
print("matched\n\n");
}else
{
print("Didnt Match\n\n");
}
The my makes the variable you match local and uninitialised.
So you should change to
if ($addr =~ /Hello/)
The my declares a new $addr that belongs to the if statement, i.e. a different variable from the other $addr, the one with the larger, outer scope.
Only the outer scope variable got initialised to something which would match your regex. The second, inner one is not initialised and (at least in your case) has no matching value.
Note: Comments by other authors have proposed a best practice for avoiding/detecting the cause of your problem in future programming.
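One way to catch this class of mistake is to enable warnings. The following is a minimal sketch, not part of the original answers: the @warnings array, the $result variable and the local $SIG{__WARN__} handler are additions for illustration, used to collect what Perl reports instead of letting it go to STDERR.

```perl
use strict;
use warnings;

my $addr = "Hello";
my @warnings;   # illustrative: collects warning messages
my $result;

{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # Deliberately repeats the bug: 'my' here creates a new, undefined $addr.
    if (my $addr =~ /Hello/) {
        $result = "matched";
    }
    else {
        $result = "didn't match";
    }
}

print "$result\n";
print "warning: $_" for @warnings;
```

With warnings enabled, the match against the undefined inner $addr should produce an "uninitialized value" warning, which points straight at the stray my.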
I know this has been answered already, but let me just expand a bit.
In general, in Perl we work in blocks, if we can call it that. If you set
my $string = 'String';
at the beginning of the script, outside any block, that variable stays visible and keeps its value throughout the script, unless you re-declare or re-assign it somewhere along the line.
my $string = 'string';
$string = 'Text';
That changes a bit if you work inside of a block, let's say in an if statement:
Scenario 1.
my $var = 'test';
if ($var =~ /test/) {
my $string = 'string';
}
print $string; # This will not work as $string only existed in the if block.
The following is the same scenario, but $var is re-declared in the if condition. The match is therefore tried against the new variable, which has no value, so $string is never set in this instance.
Scenario 2.
my $var = 'test';
if (my $var =~ /test/) {
my $string = 'string';
}
There is another declaration, though, which works differently from my, and that is our:
Scenario 3.
my $var = 'test';
if ($var =~ /test/) {
our $string = 'string';
}
print $string;
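The above scenario works a bit differently from scenario 1. our $string inside the if block declares a package (global) variable, $main::string, rather than a lexical one, so its value survives after the block ends. Without use strict, the print $string outside the block refers to that same package variable. Under use strict, the short name $string is only usable inside the block where our appears, so outside it you would write $main::string or repeat the our declaration.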
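A small sketch of how scenario 3 behaves with use strict enabled, assuming the script runs in the default package main. The fully qualified name $main::string is used outside the block because the short name introduced by our is lexically scoped.

```perl
use strict;
use warnings;

my $var = 'test';
if ($var =~ /test/) {
    our $string = 'string';    # sets the package variable $main::string
}

# The short name $string is out of scope here under strict,
# but the package variable itself still holds the value.
print "$main::string\n";
```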
You had declared another variable called $addr in the scope of the if, instead of using the variable which was initialised in the outer scope.
#!/usr/bin/perl
# This will print "Hello, World"
print "Hello, world\n";
my $addr = "Hello";
#if($addr =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\)/ )
if ($addr =~ /Hello/)
{
print("matched\n\n");
}else
{
print("Didnt Match\n\n");
}

perl regex parenthesis matching

I have a variable $next which contains strings that might contain parentheses, e.g. trna(tgc). I want to write the match if ($data[$i][2]=~/$next/){ ..} but it always returns false even when the string is really there. I tried if ($data[$i][2]=~/trnA\(tgc\)/){ ..} and it works.
My question is: how do I insert the '\' in front of each parenthesis in the variable $next?
You need to quote meta-characters.
Try this.
print "match" if( $var1 =~ /\Q$var2\E/ );
I think you want quotemeta:
$next = "trna(tgc)";
$search = quotemeta($next);
if ($data[$i][2]=~/$search/){
# ...
}
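The two suggestions can be compared with a short sketch. The $text value is a made-up input, and $plain, $inline and $quoted are illustrative names; only $next and $search come from the answers above.

```perl
use strict;
use warnings;

my $next = "trna(tgc)";
my $text = "gene trna(tgc) annotated";   # hypothetical input

# Unescaped: the parentheses form a capture group, so the pattern means "trnatgc".
my $plain  = ($text =~ /$next/)     ? 1 : 0;

# Escaped inline with \Q...\E.
my $inline = ($text =~ /\Q$next\E/) ? 1 : 0;

# Escaped ahead of time with quotemeta.
my $search = quotemeta($next);
my $quoted = ($text =~ /$search/)   ? 1 : 0;

print "plain=$plain inline=$inline quoted=$quoted search=$search\n";
```

Both escaping forms treat the parentheses as literal characters; quotemeta is convenient when the escaped pattern is reused, \Q...\E when it is used once.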

How can I count characters in Perl?

I have the following Perl script counting the number of Fs and Ts in a string:
my $str = "GGGFFEEIIEETTGGG";
my $ft_count = 0;
$ft_count++ while($str =~ m/[FT]/g);
print "$ft_count\n";
Is there a more concise way to get the count (in other words, to combine line 2 and 3)?
my $ft_count = $str =~ tr/FT//;
See perlop.
If the REPLACEMENTLIST is empty, the
SEARCHLIST is replicated. This latter is useful for counting
characters in a class …
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
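Applied to the string from the question, a minimal sketch looks like this. It also prints $str afterwards to show that counting with an empty replacement list leaves the string unchanged.

```perl
use strict;
use warnings;

my $str = "GGGFFEEIIEETTGGG";

# With an empty replacement list and no /d, tr counts without changing $str.
my $ft_count = ($str =~ tr/FT//);

print "$ft_count\n";
print "$str\n";
```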
Here's a benchmark:
use strict; use warnings;
use Benchmark qw( cmpthese );
my ($x, $y) = ("GGGFFEEIIEETTGGG" x 1000) x 2;
cmpthese -5, {
'tr' => sub {
my $cnt = $x =~ tr/FT//;
},
'm' => sub {
my $cnt = ()= $y =~ m/[FT]/g;
},
};
     Rate     m    tr
m   108/s    --  -99%
tr 8118/s 7440%    --
With ActiveState Perl 5.10.1.1006 on 32-bit Windows XP.
The difference seems to be starker with Strawberry Perl 5.12.1:
C:\Temp> c:\opt\strawberry-5.12.1\perl\bin\perl.exe t.pl
      Rate      m    tr
m   88.8/s     -- -100%
tr 25507/s 28631%    --
When the "m" operator has the /g flag AND is executed in list context, it returns a list of matching substrings. So another way to do this would be:
my @ft_matches = $str =~ m/[FT]/g;
my $ft_count = @ft_matches; # count elements of array
But that's still two lines. Another weirder trick that can make it shorter:
my $ft_count = () = $str =~ m/[FT]/g;
The "() =" forces the "m" to be in list context. Assigning a list with N elements to a list of zero variables doesn't actually do anything. But then when this assignment expression is used in a scalar context ($ft_count = ...), the right "=" operator returns the number of elements from its right-hand side - exactly what you want.
This is incredibly weird when first encountered, but the "=()=" idiom is a useful Perl trick to know, for "evaluate in list context, then get size of list".
Note: I have no data on which of these are more efficient when dealing with large strings. In fact, I suspect your original code might be best in that case.
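A short sketch putting the array form and the "=()=" form next to each other; the names $via_array and $via_countof are illustrative.

```perl
use strict;
use warnings;

my $str = "GGGFFEEIIEETTGGG";

# List context: m//g returns every match.
my @ft_matches = $str =~ m/[FT]/g;
my $via_array  = @ft_matches;              # array in scalar context gives its size

# "=()=" forces list context, then the outer scalar assignment counts the elements.
my $via_countof = () = $str =~ m/[FT]/g;

print "@ft_matches\n";
print "$via_array $via_countof\n";
```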
Yes, you can use the CountOf secret operator:
my $ft_count = ()= $str =~ m/[FT]/g;
You can combine lines 2, 3 and 4 into one like so (note that this also removes the matched characters from $str):
my $str = "GGGFFEEIIEETTGGG";
print $str =~ s/[FT]//g; # Output: 4

Perl regex replacement string special variable

I'm aware of the match, prematch, and postmatch predefined variables. I'm wondering if there is something similar for the evaluated replacement part of the s/// operator.
This would be particularly useful in dynamic expressions so they don't have to be evaluated a 2nd time.
For example, I currently have %regexs which is a hash of various search and replace strings.
Here's a snippet:
while (<>) {
foreach my $key (keys %regexs) {
while (s/$regexs{$key}{'search'}/$regexs{$key}{'replace'}/ee) {
# Here I want to do something with just the replaced part
# without reevaluating.
}
}
print;
}
Is there a convenient way to do it? Perl seems to have so many convenient shortcuts, and it seems like a waste to have to evaluate twice (which appears to be the alternative).
EDIT: I just wanted to give an example: $regexs{$key}{'replace'} might be the string '"$2$1"' thus swapping the positions of some text in the string $regexs{$key}{'search'} which might be '(foo)(bar)' - thus resulting in "barfoo". The second evaluation that I'm trying to avoid is the output of $regexs{$key}{'replace'}.
Instead of using string eval (which I assume is what's going on with s///ee), you could define code references to do the work. Those code references can then return the value of the replacement text. For example:
use strict;
use warnings;
my %regex = (
digits => sub {
my $r;
return unless $_[0] =~ s/(\d)(\d)_/$r = $2.$1/e;
return $r;
},
);
while (<DATA>){
for my $k (keys %regex){
while ( my $replacement_text = $regex{$k}->($_) ){
print $replacement_text, "\n";
}
}
print;
}
__END__
12_ab_78_gh_
34_cd_78_yz_
I'm pretty sure there isn't any direct way to do what you're asking, but that doesn't mean it's impossible. How about this?
{
my $capture;
sub capture {
$capture = $_[0] if @_;
$capture;
}
}
while (s<$regexes{$key}{search}>
<"capture('" . $regexes{$key}{replace} . "')">eeg) {
my $replacement = capture();
#...
}
Well, except to do it really properly you'd have to shoehorn a little more code in there to make the value in the hash safe inside a singlequotish string (backslash singlequotes and backslashes).
If you do the second eval manually you can store the result yourself.
my $store;
s{$search}{ $store = eval $replace }e;
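As a sketch of the manual second eval, using the sample search and replace strings from the question's EDIT. The $text value is made up for illustration.

```perl
use strict;
use warnings;

# Sample values taken from the question's EDIT; $text is hypothetical.
my $search  = '(foo)(bar)';
my $replace = '"$2$1"';
my $text    = "xx foobar yy";

my $store;
# The single /e runs the block; the explicit eval performs the second
# evaluation, and its result is kept in $store as well as substituted.
$text =~ s{$search}{ $store = eval $replace }e;

print "$store\n";
print "$text\n";
```

The replacement text is computed once and is available in $store after the substitution. As with s///ee, the string eval runs arbitrary code, so this only makes sense when the contents of $replace are trusted.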
Why not assign to a local variable beforehand:
my $replace = $regexs{$key}{'replace'};
Now you're evaluating once.

Using Perl, how can I build a dynamic regexp by passing in an argument to a subroutine?

I would like to create a subroutine with a dynamically created regexp. Here is what I have so far:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
&theSub($_);
}
sub theSub {
my $int = @_;
my $var2 = $var =~ m/(??{$int})/;
print "$var2\n";
}
It looks like it will work, but it seems that once the $int in the regex gets evaluated for the first time, it's there forever.
Is there any way to do something similar to this, but have the regex pick up the new argument each time the sub is called?
The easiest way to fix your code is to add parentheses after my (around the variable being declared), and remove ??{. Here is the fixed program:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
theSub($_);
}
sub theSub {
my($int) = @_;
my($var2) = $var =~ m/($int)/;
print "$var2\n";
}
One of the problematic lines in your code was my $int = @_, which was equivalent to my $int = 1, because it evaluated @_ in scalar context, yielding the number of elements in @_. To get the first argument of your sub, use my($int) = @_;, which evaluates @_ in list context, or fetch the first element using my $int = $_[0];, or fetch and remove the first element using my $int = shift;
There was a similar problem in the my $var2 = line; you need the parentheses there as well to evaluate the regexp match in list context, yielding the list of ($1, $2, ...), and assigning $var2 = $1.
The construct (??{...}) you were trying to use had the opposite effect to what you wanted: (among other things) it compiled your regexp the first time it was used for matching. For regexps containing $ or @, but not containing ??{...}, Perl recompiles the regexp automatically for each match, unless you specify the o flag (e.g. m/$int/o).
The construct (??{...}) means: use Perl code ... to generate a regexp, and insert that regexp here. To get more information, search for ??{ on http://perldoc.perl.org/perlre.html . The reason why it didn't work in your example is that you would have needed an extra layer of parentheses to capture $1, but even with my ($var2) = $var =~ m/((??{$int}))/ it wouldn't have worked, because ??{ has an undocumented property: it forces the compilation of its argument the first time the regexp is used for matching, so my ($var2) = $var =~ m/((??{$int + 5}))/ would have always matched 6.
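To illustrate that a plainly interpolated pattern picks up the new argument on every call, here is a minimal sketch. The sub name first_match and the @results array are made up; $var is the value from the question.

```perl
use strict;
use warnings;

my $var = 1234567890;

# Hypothetical helper: returns the text captured by the interpolated pattern.
sub first_match {
    my ($int) = @_;                       # list context: first argument
    my ($found) = $var =~ m/($int)/;      # list context: returns $1
    return $found;
}

my @results = map { first_match($_) } 1 .. 9;
print "@results\n";
```

Each call sees its own $int, so the nine calls return nine different matches.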
my $int = @_;
This will give you the count of parameters, always '1' in your case.
I think you want
my $int = shift;
To dynamically pass a regexp to a function, rather than dynamically build it in the function, use qr//.
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
theSub(qr/($_)/); # capture group so that the match is returned in list context
}
sub theSub {
my($regexp) = @_;
my($var2) = ($var =~ $regexp);
print "$var2\n";
}
qr// accepts the same trailing modifiers that m// does: i, m, s, and x
my $int = ... evaluates the right-hand side in scalar context, whereas my ($int) = ... evaluates it in list context, and that puts $_[0] into $int. In the following, only 10 is put into $int and the rest, 11 to 99, are lost.
my ($int)=(10..99);
print $int;
10
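The difference between the two forms of assignment can be seen in a short sketch. The sub name contexts is hypothetical and only returns what each form sees.

```perl
use strict;
use warnings;

# Hypothetical helper that returns what each form of assignment sees.
sub contexts {
    my $count   = @_;    # scalar context: number of arguments
    my ($first) = @_;    # list context: first argument
    return ($count, $first);
}

my ($count, $first) = contexts(10 .. 99);
print "count=$count first=$first\n";
```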