how to extract string with any operator between? - regex

I have an array contain #arr = { "a=b", "a>b", "a<b", "a!=b", "a-b" }. What is the best way to get a and b with any operator between. I can extract by
for($i=0; $i<=$#arr; $i++){
$str = $arr[$i];
if($str =~ m/^(.*?)(\s*=\s*)(.*)(;)/g){
my $d = $1;
my $e = $3;
}
Follow by all if statement with the possible operator like "!=", "<" etc. But this will make my code look messy. Any better solution for this?

You could try something like this one liner
perl -e '#a = ("a=b","a>b","a<b","a!=b","a-b"); for $l (#a) { $l =~ s/(.).*(.)/$1/; print "$1$2\n"};'
The key thing is the greedy match ie "(.*)" between the two single character matches ie "(.)". To really make sure that you start at the start and end of the strings you could use this
perl -e '#a = ("a=b","a>b","a<b","a!=b","a-b"); for $l (#a) { $l =~ s/^(.).*(.)$/$1/; print "$1$2\n"};'
A complete working example that demonstrates the whole thing would be
#!/usr/bin/perl
use strict;
use warnings;
my #expressions = ("a=b","a>b","a<b","a!=b","a-b");
for my $exp (#expressions) {
$exp =~ s/^(.).*(.)$/$1$2/;
print "$1$2 is the same as $exp\n";
};

A very simple regex might be
/^(\w+)\s*(\W+)\s*(\w+)$/
Or you enumerate possible operators
/^(\w+)\s*(=|!=|<|>|<=|>=|\+|-|\*|\/|==)\s*(\w+)$/
It depends whether the input can be trusted or not. If not, you might have to be more meticulous w.r.t. the identifiers, too. Here's a simpler loop, and no need to use m//g(lobal). Not sure about the semicolon - omitted it.
my #arr = ( "a=b", "a>b", "a<b", "a!=b", "a-b" );
for my $str (#arr){
if($str =~ /^(\w+)\s*(=|!=|<|>|<=|>=|\+|-|\*|\/|==)\s*(\w+)$/ ){
my $d = $1;
my $e = $3;
print "d=$d e=$e\n";
}
}
Later If you enumerate the operators, you can also add word symbols:
if($str =~ /^(\w+)\s*(=|!=|<|>|<=|>=|\+|-|\*|\/|==|x?or|and)\s*(\w+)$/ ){
...

if there always 'a' and 'b' at the beginning and the end you could try:
my $str = 'a<b';
my( $op ) = $str =~ /^a(.*)b$/;

Not a well thought out answer. Will reconsider the problem.

Related

Dynamically capture regular expression match in Perl

I'm trying to dynamically catch regex matching in Perl. I've known that eval will help me do this but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", #_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are useing eval with a BLOCK of code. That's like a try block. If it dies inside of that eval block, it will catch that error. It will not run your string like it was code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the pattern says _take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match. The =~ operator does that on other variables for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my #captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (#captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, #captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in #captured by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an # sigil) in #+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
Apologies! I realized both my question and sample code were both vague. But after reading your suggestions I came of with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my #matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < #matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluing the regexp in list context returns the matches. so in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my #a = ($str =~/$pattern/);
print Dumper(\#a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my #patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( #patterns ) {
my #captured = ($str =~ $pattern)
or next;
print "'$_'\n" for #captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my #tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern})
or next;
unless (#captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my #tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern});
ok #captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

Perl Grepping from an Array

I need to grep a value from an array.
For example i have a values
#a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl');
#Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now, i need to loop #a and find each value matches to #Array. For Example
It works for me with grep. You'd do it the exact same way as in the More::ListUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } #paths;
my #matches = grep { /$str/ } #paths;
This by default tests with /m against $_, each element of the list in turn. The $str and #paths are the same as below.
You can use the module More::ListUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, ie. whether there was a match in this case.
use warnings;
use strict;
use Most::ListUtils;
my $str = 'branches/Soft/a.txt';
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } #paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my #strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (#strings) {
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } #paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The slashes dot in $a will pose a problem so you either have to escape them it when doing regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my #matches = grep { /\Q$a\E/ } #array;
Simple comparison with "equals":
my #matches = grep { $_ eq $a } #array;
With your sample data both will give an empty array #matches because there is no match.
This Solved My Question. Thanks to all especially #zdim for the valuable time and support
my #SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (#SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (#paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need it as a character.
Likewise :
$a="branches\/Soft\/a\.txt"
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

perl match single occurence pattern in string

I have a list of names and I want to look for names containing two given letters asigned using variables.
$one = "A";
$two = "O";
Please note that I want those letters to be present anywhere in the checked names, so that I can get outputs like this:
Jason
Damon
Amo
Noma
Boam
...
But each letter must only be present once per name, meaning that this wouldn't work.
Alamo
I've tried this bit of code but it doesn't work.
foreach my $name (#list) {
if ($name =~ /$one/) {
if ($name =~ /$two/) {
print $name;
}}
else {next}; }
How about this?
for my $name (#list) {
my $ones = () = $name =~ /$one/gi;
my $twos = () = $name =~ /$two/gi;
if ($ones == 1 && $twos == 1) {
print $name;
}
}
#!/usr/bin/env perl
#
# test.pl is the name of this script
use warnings;
use strict;
my %char = map {$_ => 1} grep {/[a-z]/} map {lc($_)} split //, join '', #ARGV;
my #chars = sort keys %char; # the different characters appearing in the command line arguments
while (my $line = <STDIN>)
{
grep {$_ <=> 1} map {scalar(() = $line =~ /$_/ig )} #chars
or print $line;
}
Now:
echo hello world | test.pl fw will print nothing (w occurs exactly once in hello world, but f does not)
echo hello world | test.pl hw will print a line consisting of hello world (both h and w occur exactly once).
One way to get it all into a single regex is to use an expression within the regex pattern to search for the other letter (a or o) based on which one was found first:
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
while(<DATA>){
chomp;
say if m/^
[^ao]* # anything but a or o
([ao]) # an 'a' or 'o'
[^ao]* # anything but a or o
(??{($1 and lc($1) eq 'a') ? 'o' : 'a'}) # the other 'a' or 'o'
[^ao]* $/xi; # anything but a or o
}
__DATA__
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
See the perlre section on Extended Expressions for more info.
This is my solution. You don't make it clear whether there will always be just two single-character strings to match but I have assumed that there may be more
Unfortunately the classical way of counting occurrences of a character -- tr/// -- doesn't interpolate variables into its searchlist and doesn't have a case-independent modifier /i. But the pattern-match operator m// does, so that is what I have used
I thoroughly dislike the so-called goatse operator, but there isn't a neater way that I know of that allows you to count the number of times a global regex pattern matches
I could have used a grep for the inner loop, but I went for a regular for loop and a next with a label as I believe it's more readable this way
use strict;
use warnings;
use v5.10.1;
use autodie;
my #list = do {
open my $fh, '<', 'names.txt';
<$fh>;
};
chomp #list;
my ($one, $two) = qw/ A O /;
NAME:
for my $name ( #list ) {
for ( $one, $two) {
my $count = () = $name =~ /$_/gi;
next NAME unless $count == 1;
}
say $name;
}
output
Gallio
Tekoa
Achbor
Clopas
This is the input that I used
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
Tiras
Jehudi
Bildad
Shemidah
Meshillemoth
Tabeel
Achbor
Jesus
Osee
Elnaam
Rephah
Asaiah
Er
Clopas
Penuel
Shema
Marsena
Jaare
Joseph
Shamariah
Levi
Aphses

Combination of while, array, and regex

The loop fails. What is wrong with the array?
I would like the regex to return B when it parses the first string, and M when it parses the second string.
How is such an regex constructed?
#!/usr/bin/perl
use warnings;
use strict;
my $a = "0.0 B/s";
my $b = "12.0 MiB/s";
while (defined (my $s = shift ("$a", "$b"))) {
my $unit = $1 if ($a =~ m/.*([KMGT])i?B\/s$/);
print "$unit\n";
}
shift is meant to be used with arrays, not lists. If you want to use a while loop, you need to pre-declare an array containing $a and $b (which, by the way, are a bad choice for variable names).
Having said that, a for loop construct is the more natural choice here:
for my $s ( $var1, $var2 ) { ... }
And given that you're trying to extract the measurement unit, why not do things a slightly different way:
say for map { my ( $s ) = /$regex/; $s } $var1, $var2;
You need another substitution:
for ($a, $b) {
if (m!((?:[KMGT]i)?B)/s$!) {
my $unit = $1;
$unit =~ s/(.).*/$1/;
print "$unit\n" if $unit;
}
}
Your while has issues.
You are using variable $a inside loop, when you want to use $s.
I'd use it this way:
#!/usr/bin/perl
use warnings;
use strict;
my $a = "0.0 B/s";
my $b = "12.0 MiB/s";
foreach my $s($a, $b) {
print $1 if ($s =~ m/.*([KMGT])i?B\/s$/);
}

Regex and the characters case

Okay, I got a rather simple one (at least seems simple). I have a multi lined string and I am just playing around with replacing different words with something else. Let me show you...
#!/usr/bin/perl -w
use strict;
$_ = "That is my coat.\nCoats are very expensive.";
s/coat/Hat/igm;
print;
The output would be
That is my Hat
Hats are very expensive...
The "hat" on the first line shouldn't be capitalized. Are there any tricks that can make the casing compliant with how english is written? Thanks :)
see how-to-replace-string-and-preserve-its-uppercase-lowercase
For more detail go to How do I substitute case insensitively on the LHS while preserving case on the RHS?
You can use the e modifier to s/// to do the trick:
s/(coat)/ucfirst($1) eq $1 ? 'Hat' : 'hat'/igme;
For one, you should use \b (word boundary) to match only the whole word. For example s/hat/coat/ would change That to Tcoat without leading \b. Now for your question. With the flag /e you can use Perl code in the replacement part of the regex. So you can write a Perl function that checks the case of the match and then set the case of the replacement properly:
my $s = "That is my coat.\nCoats are very expensive.";
$s =~ s/(\bcoat)/&same_case($1, "hat")/igme;
print $s, "\n";
sub same_case {
my ($match, $replacement) = #_;
# if match starts with uppercase character, apply ucfirst to replacement
if($match =~ /^[A-Z]/) {
return ucfirst($replacement);
}
else {
return $replacement;
}
}
Prints:
That is my hat.
Hats are very expensive.
This may solve your problem:
#!/usr/bin/perl -w
use strict;
sub smartSubstitute {
my $target = shift;
my $pattern = shift;
my $replacement = shift;
$pattern = ucfirst $pattern;
$replacement = ucfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
$pattern = lcfirst $pattern;
$replacement = lcfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
return $target;
}
my $x = "That is my coat.\nCoats are very expansive.";
my $y = smartSubstitute($x, "coat", "Hat");
print $y, "\n";