perl regex to grep for a character in a word - regex

I am trying to write a perl script to grep for a character in a string. All the strings are stored in an array. We iterate over the array and look if the particular word occurs, if so grep for a particular pattern.
my #array = ("Foo1", "Bar", "Baz", "Foo5", "Foo2", "Bak", "Foo3");
foreach my $ var (#array){
if ($var =~ /Foo/){
#Regex to grep for the number which is at the end of string Foo
}
}
Any leads are welcomed. Thanks for the help.
************Edits***********
Thanks for the comments.
if ($var =~ /Foo/){
/.Foo+([A-Z]+)/;
print $1, "\n";
}
The above is the code that I tried and it didn't print anything.

Matching without binding =~ matches against the $_ variable that you don't use. Furthermore, the dot before Foo means the second regex will match only if Foo is preceded by something (that's not a newline). As all your strings containing Foo start with it, the second regex can never match even if you specify $var =~.
Moreover, you can match the number directly in the condition.
And finally, [A-Z] doesn't match digits. Use [0-9] instead.
my #array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
foreach my $var (#array){
if ($var =~ /Foo([0-9]+)/){
print $1, "\n";
}
}

Try this for your second regex
[^\w]*([0-9])
Then you can use the first group to get the number.

my #array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
my #var = map(/Foo(\w+)/) #array;
print #var ;

Related

Perl Grepping from an Array

I need to grep a value from an array.
For example i have a values
#a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl');
#Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now, i need to loop #a and find each value matches to #Array. For Example
It works for me with grep. You'd do it the exact same way as in the More::ListUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } #paths;
my #matches = grep { /$str/ } #paths;
This by default tests with /m against $_, each element of the list in turn. The $str and #paths are the same as below.
You can use the module More::ListUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, ie. whether there was a match in this case.
use warnings;
use strict;
use Most::ListUtils;
my $str = 'branches/Soft/a.txt';
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } #paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my #strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (#strings) {
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } #paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The slashes dot in $a will pose a problem so you either have to escape them it when doing regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my #matches = grep { /\Q$a\E/ } #array;
Simple comparison with "equals":
my #matches = grep { $_ eq $a } #array;
With your sample data both will give an empty array #matches because there is no match.
This Solved My Question. Thanks to all especially #zdim for the valuable time and support
my #SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (#SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (#paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need it as a character.
Likewise :
$a="branches\/Soft\/a\.txt"
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

Perl idiom for quickly searching file with elements in array

what is the Perl idiom to search a string or a whole file for array elements occurrences? E.g.:
my #array = qw(word, test, ...);
my $string = ".......";
I want to search for word or test (can also be words, tester, etc.) inside $string and return whatever is found (i.e. group match).
I searched the docs, seems like map + grep is what I need but I just can’t come up with the code for it. Perl is such fun that I am totally clueless sometimes. :)
Using one example from map:
my #squares = map { $_ * $_ } grep { $_ > 5 } #numbers;
I suppose I can split the string into array and grep. Am I right?
grep { #array } #string; # something like grep {/(word|test)/} #string but I want to use array
my #word_roots = qw( word test );
my $pat = join '|', map quotemeta, #word_roots;
my $re = qr/\b(?:$pat)\w+\b/;
my #matches = $string =~ /($re)/g;
How about something like this from a re.pl session:
$ my #array = qw(word test)
$VAR1 = 'word';
$VAR2 = 'test';
$ my $string = ' the word is test, I said'
the word is test, I said
$ my #match_array = map { $string =~ /\b($_)\b/ } #array
$VAR1 = 'word';
$VAR2 = 'test';
The parenthesis around \b$_\b capture the match in the regex inside of map.
The \b ensures that we only match is the word is found on its own (like "test" or "word") and not words that contain the characters "test", or "word" in them like "coward" or "brightest". See http://www.regular-expressions.info/wordboundaries.html for more details on \b.

Perl regex to remove empty strings

I'm trying to write regex to remove empty strings on a line (and doesn't care about whitespace between list items), for example: baz foo, "","", bar, "" becomes baz foo, bar
So far I'm trying
$newLine =~ s/""\s*?,//g;
$newLine =~ s/,\s*?""//g;
but given baz "", foo, "" it is returning baz foo, "", but I want it to return baz foo.
Could anyone explain what's going wrong/how I can fix it?
Thanks
Your code works for me:
$string = 'baz "", foo, ""';
$string =~ s/""\s*?,//g;
$string =~ s/,\s*?""//g;
print "$string\n";
Returns
baz foo
for me.
Edit: As stated in the commentary below, it won't work for the string baz "", "". That's because the first regex consumes the , right before the second "", causing the second regex to not match.
An alternative for the regexes would be to use map.
$string = 'baz "", "", foo';
$string = join(" ", map { $_ =~ s/\s*""\s*//g; $_; } (split(/\s*,\s*/, $string)));
That will set $string to baz foo
It's easier to split the string, remove elements that don't contain anything apart from "" (and possibly surrounding spaces) and join those back.
The following might work for you:
#foo = grep { !/\s?""\s?/ } split /,/, $newLine;
$newLine = join(',', #foo);
Example:
$ cat mmm
$newLine = 'baz foo, "","", bar, ""';
#foo = grep { !/\s?""\s?/ } split /,/, $newLine;
$newLine = join(',', #foo);
print $newLine . "\n";
$ perl mmm
baz foo, bar

PCRE pattern to count number of character(s)

here is my input string:
bar somthing foo bar somthing foo
I would like to count number of a character (ex: 't') between bar & foo
bar somthing foo -> 1
bar somthing foo -> 1
I know we can use /bar(.*?)foo/ and then count number of character in matches[1] with a String function
Is there way to do this w/o string function to count?
A Perl solution:
$_ = 'bar test this thing foo';
my $count = /bar(.*?)foo/ && $1 =~ tr/t//;
print $count;
Output:
4
Just for fun, using a single expression with (?{ code }):
$_ = 'bar test this thing foo';
my $count = 0;
/bar ( (?:(?!bar)[^t])*+ ) (?:t (?{ ++$count; }) (?-1) )*+ foo/x or $count = 0;
print $count;
#Qtax: the subject says PCRE... so it's not exactly perl. Hence (?{ code }) would most probably be unsupported (let alone the full perl code).
Though both solutions are cool ;)
#tqwer: you can get the match and then replace [^t] with "" and check length..
though i am not sure what the logic behind counting with regex would be ;)

Perl regex: how to know number of matches

I'm looping through a series of regexes and matching it against lines in a file, like this:
for my $regex (#{$regexs_ref}) {
LINE: for (#rawfile) {
/#$regex/ && do {
# do something here
next LINE;
};
}
}
Is there a way for me to know how many matches I've got (so I can process it accordingly..)?
If not maybe this is the wrong approach..? Of course, instead of looping through every regex, I could just write one recipe for each regex. But I don't know what's the best practice?
If you do your matching in list context (i.e., basically assigning to a list), you get all of your matches and groupings in a list. Then you can just use that list in scalar context to get the number of matches.
Or am I misunderstanding the question?
Example:
my #list = /$my_regex/g;
if (#list)
{
# do stuff
print "Number of matches: " . scalar #list . "\n";
}
You will need to keep track of that yourself. Here is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my #regexes = (
qr/b/,
qr/a/,
qr/foo/,
qr/quux/,
);
my %matches = map { $_ => 0 } #regexes;
while (my $line = <DATA>) {
for my $regex (#regexes) {
next unless $line =~ /$regex/;
$matches{$regex}++;
}
}
for my $regex (#regexes) {
print "$regex matched $matches{$regex} times\n";
}
__DATA__
foo
bar
baz
In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special #- array (emphasis added):
#LAST_MATCH_START
#-
$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with
substr $_, $-[n], $+[n] - $-[n]
if $-[n] is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with #+.
This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.
After a match against some variable $var:
$` is the same as substr($var, 0, $-[0])
$& is the same as substr($var, $-[0], $+[0] - $-[0])
$' is the same as substr($var, $+[0])
$1 is the same as substr($var, $-[1], $+[1] - $-[1])
$2 is the same as substr($var, $-[2], $+[2] - $-[2])
$3 is the same as substr($var, $-[3], $+[3] - $-[3])
Example usage:
#! /usr/bin/perl
use warnings;
use strict;
my #patterns = (
qr/(foo(bar(baz)))/,
qr/(quux)/,
);
chomp(my #rawfile = <DATA>);
foreach my $pattern (#patterns) {
LINE: for (#rawfile) {
/$pattern/ && do {
my $captures = $#-;
my $s = $captures == 1 ? "" : "s";
print "$_: got $captures capture$s\n";
};
}
}
__DATA__
quux quux quux
foobarbaz
Output:
foobarbaz: got 3 captures
quux quux quux: got 1 capture
How about below code:
my $string = "12345yx67hjui89";
my $count = () = $string =~ /\d/g;
print "$count\n";
It prints 9 here as expected.