pattern matching in regular expression (Perl)

Make a pattern that will match three consecutive copies of whatever is currently contained in $what. That is, if $what is fred, your pattern should match fredfredfred. If $what is fred|barney, your pattern should match fredfredbarney, barneyfredfred, barneybarneybarney, or many other variations. (Hint: You should set $what at the top of the pattern test program with a statement like my $what = 'fred|barney';)
But my solution to this is just too easy, so I'm assuming it's wrong. My solution is:
#!/usr/bin/perl
use warnings;
use strict;
while (<>){
chomp;
if (/fred|barney/ig) {
print "pattern found! \n";
}
}
It displays what I want, and I didn't even have to save the pattern in a variable. Can someone help me through this, or enlighten me if I'm understanding the problem wrong?

This example should clear up what was wrong with your solution:
my @tests = qw(xxxfooxx oofoobar bar bax rrrbarrrrr);
my $str = 'foo|bar';
for my $test (@tests) {
my $match = $test =~ /$str/ig ? 'match' : 'not match';
print "$test did $match\n";
}
OUTPUT
xxxfooxx did match
oofoobar did match
bar did match
bax did not match
rrrbarrrrr did match
SOLUTION
#!/usr/bin/perl
use warnings;
use strict;
# notice the example has the `|`. Meaning
# match "fred" or "barney" 3 times.
my $str = 'fred|barney';
my @tests = qw(fred fredfredfred barney barneybarneybarny barneyfredbarney);
for my $test (@tests) {
if( $test =~ /^($str){3}$/ ) {
print "$test matched!\n";
} else {
print "$test did not match!\n";
}
}
OUTPUT
$ ./test.pl
fred did not match!
fredfredfred matched!
barney did not match!
barneybarneybarny did not match!
barneyfredbarney matched!

use strict;
use warnings;
my $s="barney/fred";
my @ra=split("/", $s);
my $test="barneybarneyfred"; #etc, this will work on all permutations
if ($test =~ /^(?:$ra[0]|$ra[1]){3}$/)
{
print "Valid\n";
}
else
{
print "Invalid\n";
}
split breaks the string apart at each "/". (?:$ra[0]|$ra[1]) says group, but do not capture, "barney" or "fred"; {3} says exactly three copies. Add an i after the closing "/" of the regex if case doesn't matter. The ^ says "begins with," and the $ says "ends with."
EDIT:
If you need the format to be barney\fred, use this:
my $s="barney\\fred";
my @ra=split(/\\/, $s);
If you know that the matching will always be on fred and barney, then you can just replace $ra[0], $ra[1] with fred and barney.
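The grouping idea used in the answers above can be wrapped in a small helper. This is only a minimal sketch: the name three_copies is made up for illustration, it assumes $what is meant to be used as a regex fragment (so | keeps its alternation meaning), and it anchors the pattern the same way the answers above do.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text consists of exactly three
# consecutive copies of whatever $what matches.
sub three_copies {
    my ($what, $text) = @_;
    # (?:...) groups the alternation so that {3} applies to all of it;
    # without the group, fred|barney{3} would mean "fred" or "barneyyy".
    return $text =~ /^(?:$what){3}$/ ? 1 : 0;
}

my $what = 'fred|barney';
for my $test (qw(fredfredbarney barneybarneybarney fredfred fred|barney)) {
    printf "%-20s %s\n", $test,
        three_copies($what, $test) ? 'matched' : 'did not match';
}
```

Dropping the ^ and $ anchors would make the helper accept three copies anywhere inside a longer string, which may be closer to the wording of the exercise.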

Related

Perl Grepping from an Array

I need to grep a value from an array.
For example, I have these values:
@a = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
@Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl', 'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now I need to loop over @a and check whether each value matches an element of @Array. For example

It works for me with grep. You'd do it the exact same way as in the List::MoreUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } @paths;   # scalar context: number of matches
my @matches = grep { /$str/ } @paths;  # list context: the matching elements
This by default tests with m// against $_, each element of the list in turn. The $str and @paths are the same as below.
You can use the module List::MoreUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, i.e. whether there was a match in this case.
use warnings;
use strict;
use List::MoreUtils qw(any);
my $str = 'branches/Soft/a.txt';
my @paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } @paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (@paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my @strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my @paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (@strings) {
foreach my $p (@paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } @paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.

The dot in $a will pose a problem in a regex because it matches any character (and slashes would too if written literally between / delimiters), so you either have to escape them when doing a regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my @matches = grep { /\Q$a\E/ } @array;
Simple comparison with "equals":
my @matches = grep { $_ eq $a } @array;
With your sample data both will give an empty array @matches because there is no match.
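To see why the escaping matters, here is a small sketch with made-up data. The second path is hypothetical and was chosen only because an unescaped dot matches its "x"; the variable is called $wanted rather than $a to stay clear of the $a that sort uses.

```perl
use strict;
use warnings;

my $wanted = 'branches/Soft/a.txt';
# The second entry is not the wanted path, but an unescaped "." matches its "x".
my @array = ('branches/Soft/a.txt', 'branches/Soft/axtxt', 'branches/Docs/A1/b.txt');

my @unescaped = grep { /$wanted/ }     @array;  # "." matches any character
my @escaped   = grep { /\Q$wanted\E/ } @array;  # metacharacters taken literally
my @exact     = grep { $_ eq $wanted } @array;  # whole-string equality

print "unescaped: @unescaped\n";
print "escaped:   @escaped\n";
print "exact:     @exact\n";
```

Note that the escaped regex still matches substrings (it would also find the path inside a longer string), while eq only accepts an identical string.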

This solved my question. Thanks to all, especially @zdim, for the valuable time and support.
my @SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my @paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (@SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (@paths);
if ( $m == 0 ) {
print "Files Mismatch\n";
exit 1;
}
}

You should escape characters like '.' in a regex when you need them as literal characters (and '/' too when it is also the regex delimiter).
For example:
$a = 'branches\/Soft\/a\.txt';  # single quotes, so the backslashes survive
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

How to refer to matched part in regex

I am using the following code to search for a substring and print it out with a few characters before and after it. Somehow Perl takes issue with me using $1 and complains about
Use of uninitialized value $1 in concatenation (.) or string.
I cannot figure out why...can you?
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}

You have to capture $word into regex group $1 by using parentheses,
while ($lines =~ m/\b($word)\b/g)

When you use $1, you are asking the code to use the first captured group from the regex and since your regex doesn't have any, well, that variable won't exist.
You can either refer to the whole match with $& or you add a capture group to your regex and keep using $1.
i.e. Either:
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$&";
print substr ($lines, max(pos($lines)-length($&)-$context, 0), length($&)+$context); # check: am I possibly violating any boundaries here
}
Or
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/(\b$word\b)/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
Note: It doesn't matter whether you use (\b$word\b) or (\b$word)\b or \b($word\b) or \b($word)\b here because \b is a zero-width assertion: it matches a position, not any characters.

When you want to refer to a matched part of a regex, put it in parentheses. Then you'll be able to refer to this matched part via the $1 variable (for the first pair of parentheses), $2 (for the second pair) and so on.
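A short illustration of the numbering, using the sentence from the question. The pattern and the names $before and $after are chosen only for this sketch.

```perl
use strict;
use warnings;

my $line = "this is just a test to find something out";

# Two capture groups: the word before "test" and the word after it.
my ($before, $after);
if ($line =~ /(\w+)\s+test\s+(\w+)/) {
    ($before, $after) = ($1, $2);   # $1: first pair of parentheses, $2: second pair
    print "before: $before\n";
    print "after:  $after\n";
}
```

Copying $1 and $2 into named variables right after the match keeps the values safe from being reset by a later successful match.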

The variables $1, $2 and so on hold the strings found by capture groups. When a match succeeds, all of these variables are reset, and only those with a corresponding capture group receive a value. The code in the question does not have any capture groups and hence $1 is never given a value; it is undefined.
Running the code below shows the effect. Initially $1, $2 and $3 are not defined. The first match sets $1 and $2 but not $3. The second match sets only $1, but note that $2 is cleared to be undefined. The third match has no capture groups and all three are undefined.
use strict;
use warnings;
sub show
{
printf "\$1: %s\n", (defined $1 ? $1 : "-undef-");
printf "\$2: %s\n", (defined $2 ? $2 : "-undef-");
printf "\$3: %s\n", (defined $3 ? $3 : "-undef-");
print "\n";
}
my $text = "abcdefghij";
show();
$text =~ m/ab(cd)ef(gh)ij/; # First match
show();
$text =~ m/ab(cd)efghij/; # Second match
show();
$text =~ m/abcdefghij/; # Third match
show();

$1 will have no value unless you are actually capturing something.
You can adjust the way you collect the surrounding context to use lookaheads and lookbehinds.
use strict;
use warnings;
my $lines = "this is just a test to find something out";
my $word = "test";
my $extra = 10;
while ($lines =~ m/(?:(?<=(.{$extra}))|(.{0,$extra}))\b(\Q$word\E)\b(?=(.{0,$extra}))/gs ) {
my $pre = $1 // $2;
my $word = $3;
my $post = $4;
print "'...$pre<$word>$post...'\n";
}
Outputs:
'...is just a <test> to find s...'

Matching words with exactly one vowel

I want to match only the strings that have exactly one vowel.
I tried this code, and it works but it also matches those strings that haven't any vowels (for example hshs, ksks, lslsl) and I need only the strings that have just one vowel
if ( $string !~ /\*w[aeiou]\w*[aeiou]\W*/ ) {
print $string;
}

You can use tr/// to count the occurrences of letters in a string.
Something like this perhaps
use strict;
use warnings;
for my $string ( qw/ a fare is paid for every cab /) {
if ( $string =~ tr/aeiuoAEIOU// == 1 ) {
print $string, "\n";
}
}
output
a
is
for
cab

Make it simple, at least one vowel:
if ($string =~ /[aeiou]/i) {
print $string;
}
exactly one vowel:
if ($string =~ /^[^aeiou]*[aeiou][^aeiou]*$/i) {
print $string;
}
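The tr/// count and the anchored regex can be checked against each other. This is a sketch only; the helper names one_vowel_tr and one_vowel_regex are made up, and the test words include the vowel-free strings from the question.

```perl
use strict;
use warnings;

# Count vowels with tr/// (it returns the number of characters matched
# and, with an empty replacement list, leaves the string unchanged).
sub one_vowel_tr {
    my ($s) = @_;
    return ($s =~ tr/aeiouAEIOU//) == 1 ? 1 : 0;
}

# Anchored regex: any non-vowels, exactly one vowel, any non-vowels.
sub one_vowel_regex {
    my ($s) = @_;
    return $s =~ /^[^aeiou]*[aeiou][^aeiou]*$/i ? 1 : 0;
}

for my $string (qw(hshs cab fare is ksks)) {
    printf "%-5s tr=%d regex=%d\n",
        $string, one_vowel_tr($string), one_vowel_regex($string);
}
```

Both helpers reject hshs and ksks, which the negated match in the question let through, because each of them requires a vowel to be present rather than only forbidding a second one.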

regex pattern match and extraction

I am trying to write a Perl regex to extract words longer than 2 letters after the colon :. For example, if the pattern is subject:I am about to write a regex, I need to extract into my $variable only the words with more than 2 letters, i.e., $variable = "subject:about write regex".
Here is my program where the regex and pattern matching is done but when I print, my variable is empty. What am I doing wrong?
#!/usr/bin/perl
while (<STDIN>) {
foreach my $query_part (split(/\s+/, $_)) {
my($query_part_subject) = $query_part =~ /([^\w\#\.]+)?((?:\w{3,}|[\$\#()+.])+)(?::(\w{3,}.+))?/ ;
print "query_part : $query_part_subject \n";
}
}
exit(0);

Try doing this :
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
s/.*?://;   # drop everything up to and including the first colon
print join("\n", grep { length($_) > 2 } split), "\n";
}
__DATA__
subject:a bb ccc dddd fffff
OUTPUT
ccc
dddd
fffff
NOTE
from my understanding of your question : I display only the words length > 2 characters after the : character.
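If the prefix before the colon should be kept, as in the "subject:about write regex" result the question asks for, something along these lines might do. It is a sketch under that reading of the question, and keep_long_words is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the "label:" prefix and only the words
# longer than two characters that follow it.
sub keep_long_words {
    my ($input) = @_;
    my ($label, $rest) = split /:/, $input, 2;   # split on the first colon only
    return $input unless defined $rest;          # no colon: return unchanged
    my @long = grep { length($_) > 2 } split ' ', $rest;
    return "$label:" . join(' ', @long);
}

print keep_long_words("subject:I am about to write a regex"), "\n";
```

Splitting and filtering with length avoids packing the whole rule into one regex, which is where the attempt in the question went wrong.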

It isn't clear from your question. Is this what you are looking for?
$txt='I am about to create regex';
$x='(I)';
$y='.*?';
$z='(am)';
$re1=$x.$y.$z;
if ($txt =~ m/$re1/is)
{
$var1=$1;
$word1=$2;
print "($var1) ($word1) \n";
}

Regex and the characters case

Okay, I got a rather simple one (at least seems simple). I have a multi lined string and I am just playing around with replacing different words with something else. Let me show you...
#!/usr/bin/perl -w
use strict;
$_ = "That is my coat.\nCoats are very expensive.";
s/coat/Hat/igm;
print;
The output would be
That is my Hat
Hats are very expensive...
The "hat" on the first line shouldn't be capitalized. Are there any tricks that can make the casing comply with how English is written? Thanks :)

See "How to replace string and preserve its uppercase/lowercase".
For more detail, see the perlfaq6 entry "How do I substitute case insensitively on the LHS while preserving case on the RHS?"

You can use the e modifier to s/// to do the trick:
s/(coat)/ucfirst($1) eq $1 ? 'Hat' : 'hat'/igme;

For one, you should use \b (word boundary) to match only the whole word. For example s/hat/coat/ would change That to Tcoat without leading \b. Now for your question. With the flag /e you can use Perl code in the replacement part of the regex. So you can write a Perl function that checks the case of the match and then set the case of the replacement properly:
my $s = "That is my coat.\nCoats are very expensive.";
$s =~ s/(\bcoat)/&same_case($1, "hat")/igme;
print $s, "\n";
sub same_case {
my ($match, $replacement) = @_;
# if match starts with uppercase character, apply ucfirst to replacement
if($match =~ /^[A-Z]/) {
return ucfirst($replacement);
}
else {
return $replacement;
}
}
Prints:
That is my hat.
Hats are very expensive.

This may solve your problem:
#!/usr/bin/perl -w
use strict;
sub smartSubstitute {
my $target = shift;
my $pattern = shift;
my $replacement = shift;
$pattern = ucfirst $pattern;
$replacement = ucfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
$pattern = lcfirst $pattern;
$replacement = lcfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
return $target;
}
my $x = "That is my coat.\nCoats are very expensive.";
my $y = smartSubstitute($x, "coat", "Hat");
print $y, "\n";
