Matching simple keyword and keyword with spaces - regex

I'm currently working on a function that takes a list of keywords and a (very long) string as arguments, and I want it to return a list of every matched keyword. The problem is that a keyword can consist of two words.
For example: keyword1: foobar, keyword2: foo bar, keyword3: barfoo
string:
hi this is foobar, have you seen my foo bar, he is very fooBar ?
I want a list containing (foobar, foo bar).
So far I have:
my @matches = $string =~ m/\b(?:foobar|foo bar)\b/gi;
This works fine for single words, but not for keywords made of two words :/
Any ideas?
Thank you for your help.

sub myfunc {
    my ($str, @kw) = @_;
    # quotemeta escapes each keyword (including the space in "foo bar"),
    # so the alternation matches the keywords literally
    my $re = join "|", map { quotemeta } @kw;
    return $str =~ /\b($re)\b/gi;
}
my @kwords = ("foobar", "foo bar", "barfoo");
my @arr = myfunc("hi this is foobar, have you seen my foo bar, he is very fooBar ?", @kwords);

This returns the correct results:
sub match {
    my @keywords = @_;
    my $s = pop @keywords;    # the last argument is the string to search
    # \Q...\E quotes each keyword so it is matched literally
    return grep { $s =~ /\b\Q$_\E\b/i } @keywords;
}
my @matches = match('foobar', 'foo bar', 'barfoo', 'hi this is foobar, have you seen my foo bar, he is very fooBar?'); # returns ('foobar', 'foo bar')
By the way, your code @matches = $string =~ m/\b(?:foobar|foo bar)\b/gi; works too. If you remove the /i modifier, it returns (foobar, foo bar).
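The words of a two-word keyword might be separated by more than one space. That is an assumption, since the question only shows single spaces. In that case, each keyword can be turned into a pattern with \s+ between its words. Below is a minimal sketch using the hypothetical helper name matched_keywords. Like the grep answer above, it returns each keyword at most once and matches case-insensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: return each keyword that occurs in $text, in the order
# the keywords were given. Assumption: matching is case-insensitive and the
# words of a multi-word keyword may be separated by any run of whitespace.
sub matched_keywords {
    my ($text, @keywords) = @_;
    my @found;
    for my $kw (@keywords) {
        # quotemeta each word so regex metacharacters are taken literally,
        # then allow one or more whitespace characters between the words.
        my $pattern = join '\s+', map { quotemeta } split ' ', $kw;
        push @found, $kw if $text =~ /\b$pattern\b/i;
    }
    return @found;
}

my @kwords = ('foobar', 'foo bar', 'barfoo');
my $text   = 'hi this is foobar, have you seen my foo  bar, he is very fooBar ?';
my @hits   = matched_keywords($text, @kwords);
print join(', ', @hits), "\n";    # foobar, foo bar
```

Testing each keyword separately costs one regex per keyword. In exchange, the result is naturally deduplicated, so "fooBar" does not add a second "foobar" entry.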

Related

perl regex to grep for a character in a word

I am trying to write a Perl script to grep for a character in a string. All the strings are stored in an array. I iterate over the array and check whether a particular word occurs; if it does, I grep for a particular pattern.
my @array = ("Foo1", "Bar", "Baz", "Foo5", "Foo2", "Bak", "Foo3");
foreach my $var (@array){
    if ($var =~ /Foo/){
        #Regex to grep for the number which is at the end of string Foo
    }
}
Any leads are welcome. Thanks for the help.
Edit:
Thanks for the comments.
if ($var =~ /Foo/){
/.Foo+([A-Z]+)/;
print $1, "\n";
}
The above is the code that I tried and it didn't print anything.
A match without the =~ binding operator is applied to the $_ variable, which you don't use here. Furthermore, the dot before Foo means the second regex will match only if Foo is preceded by some character (other than a newline). All your strings containing Foo start with it, so the second regex can never match, even if you put $var =~ in front of it.
Moreover, you can match the number directly in the condition.
And finally, [A-Z] doesn't match digits. Use [0-9] instead.
my @array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
foreach my $var (@array){
    if ($var =~ /Foo([0-9]+)/){
        print $1, "\n";
    }
}
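To illustrate the point about $_, here is a minimal variant that relies on $_ deliberately instead of binding with =~. The ^ and $ anchors are an addition not present in the answer above. They assume the number makes up the whole rest of the string after Foo.

```perl
use strict;
use warnings;

my @array = qw(Foo1 Bar Baz Foo5 Foo2 Bak Foo3);

# A foreach loop without a named variable aliases $_ to each element in turn,
# so a bare /.../ (no =~) is matched against the current element.
my @numbers;
for (@array) {
    # Anchored: "Foo", then one or more digits, then end of string.
    push @numbers, $1 if /^Foo([0-9]+)$/;
}
print "@numbers\n";    # 1 5 2 3
```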
Try this for your second regex
[^\w]*([0-9])
Then you can use the first group to get the number.
my @array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
my @var = map { /Foo(\w+)/ } @array;    # elements that don't match contribute nothing
print "@var\n";

Perl regex to remove empty strings

I'm trying to write a regex that removes empty strings ("") from a line, ignoring any whitespace between list items. For example, baz foo, "","", bar, "" should become baz foo, bar.
So far I have:
$newLine =~ s/""\s*?,//g;
$newLine =~ s/,\s*?""//g;
But given baz "", foo, "" it returns baz foo, "" instead of baz foo.
Could anyone explain what's going wrong and how I can fix it?
Thanks
Your code works for me:
$string = 'baz "", foo, ""';
$string =~ s/""\s*?,//g;
$string =~ s/,\s*?""//g;
print "$string\n";
Returns
baz foo
for me.
Edit: As noted in the comments below, it won't work for the string baz "", "". That's because the first regex consumes the comma right before the second "", so the second regex no longer matches.
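An alternative to the regexes is to split the string and use map:
$string = 'baz "", "", foo';
# split on commas, strip "" from each item, drop items that end up empty, rejoin with spaces
$string = join(" ", grep { length } map { s/\s*""\s*//g; $_ } split(/\s*,\s*/, $string));
That will set $string to baz foo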
It's easier to split the string, remove the elements that contain nothing but "" (and possibly surrounding spaces), and join the rest back together.
The following might work for you:
@foo = grep { !/^\s*""\s*$/ } split /,/, $newLine;
$newLine = join(',', @foo);
Example:
$ cat mmm
$newLine = 'baz foo, "","", bar, ""';
@foo = grep { !/^\s*""\s*$/ } split /,/, $newLine;
$newLine = join(',', @foo);
print $newLine . "\n";
$ perl mmm
baz foo, bar
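The split approach above can be wrapped in a small sub that also handles the baz "", "" case from the comments. This is a sketch under an assumption of its own: commas are the item separators, and a "" token inside an item (as in baz "") is dropped while the rest of that item is kept. The name remove_empty_strings is hypothetical. Under this reading, baz "", foo, "" becomes baz, foo rather than baz foo.

```perl
use strict;
use warnings;

# Hypothetical helper: remove empty-string items ("") from a comma-separated line.
# Assumption: commas separate items; a "" token inside an item is removed and
# whatever remains of that item is kept.
sub remove_empty_strings {
    my ($line) = @_;
    my @items;
    for my $item (split /,/, $line) {
        $item =~ s/""//g;           # drop every "" token in this item
        $item =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @items, $item if length $item;
    }
    return join ', ', @items;
}

print remove_empty_strings('baz foo, "","", bar, ""'), "\n";    # baz foo, bar
print remove_empty_strings('baz "", ""'), "\n";                  # baz
```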

PCRE pattern to count number of character(s)

Here is my input string:
bar something foo bar something foo
I would like to count the occurrences of a character (e.g. 't') between each bar and the following foo:
bar something foo -> 1
bar something foo -> 1
I know I can use /bar(.*?)foo/ and then count the characters in matches[1] with a string function.
Is there a way to do the counting without a string function?
A Perl solution:
$_ = 'bar test this thing foo';
my $count = /bar(.*?)foo/ && $1 =~ tr/t//;
print $count;
Output:
4
Just for fun, using a single expression with (?{ code }):
$_ = 'bar test this thing foo';
my $count = 0;
/bar ( (?:(?!bar)[^t])*+ ) (?:t (?{ ++$count; }) (?-1) )*+ foo/x or $count = 0;
print $count;
@Qtax: the subject says PCRE, so this isn't exactly Perl. (?{ code }) is not supported in PCRE (let alone arbitrary Perl code inside it).
Though both solutions are cool ;)
@tqwer: you can get the match, then replace [^t] with "" and check the length.
Though I'm not sure what the point of counting with a regex would be ;)
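The last comment's idea (capture the text, delete everything that is not the target character, and take the length) can be sketched in Perl for every bar...foo pair. The sub name count_between is hypothetical. A Perl while loop with /g is used to visit each pair, which is not something PCRE itself provides.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of $char between each "bar" and the
# following "foo", by deleting every other character and taking the length.
sub count_between {
    my ($text, $char) = @_;
    my $not_char = qr/[^\Q$char\E]/;    # any character except $char
    my @counts;
    while ($text =~ /bar(.*?)foo/gs) {
        (my $kept = $1) =~ s/$not_char//g;    # copy the capture, then strip
        push @counts, length $kept;
    }
    return @counts;
}

my @counts = count_between('bar something foo bar something foo', 't');
print "@counts\n";    # 1 1
```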

Why does my Perl regex cause an infinite loop?

I have some code that grabs the text "between" markers:
specifically, between a foo $someword and the next foo $someword.
However, it gets stuck at the first "between": the internal string position somehow never advances.
The input data is a text file with newlines here and there: they are rather irrelevant, but make printing easier.
my $component = qr'foo (\w+?)\s*?{';
while($text =~ /$component/sg)
{
push @baz, $1; #grab the $someword
}
my $list = join( "|", @baz);
my $re = qr/$list/; #create a list of $somewords
#Try to grab everything between the foo $somewords;
# or if there's no $foo someword, grab what's left.
while($text=~/($re)(.+?)foo ($re|\z|\Z)/ms)
#if I take out s, it doesn't repeat, but nothing gets grabbed.
{
# print pos($text), "\n"; #this is undef...that's a clue I'm certain.
print $1, ":", $2; #prints the someword and what was grabbed.
print "\n", '-' x 20, "\n";
}
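The loop above never advances because the second match lacks the /g modifier. In scalar context, m// without /g always starts at the beginning of the string and does not update pos(). The while condition therefore succeeds on the same text forever, and pos($text) stays undef. The minimal sketch below uses its own sample string, not the question's input, and shows the /g form that resumes where the last match ended.

```perl
use strict;
use warnings;

my $text = "foo one { a } foo two { b }";
my @words;
# With /g, each scalar-context match starts at pos($text), the offset where
# the previous match ended, so the loop stops once nothing more matches.
# Without /g, every test would start again at offset 0 and match "foo one"
# forever.
while ($text =~ /foo (\w+)/g) {
    push @words, $1;
    print "matched '$1', pos is now ", pos($text), "\n";
}
print "@words\n";    # one two
```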
Update: One more update to deal with 'foo' occurring inside the text you want to extract:
use strict;
use warnings;
use File::Slurp;
my $text = read_file \*DATA;
my $marker = 'foo';
my $marker_re = qr/$marker\s+\w+\s*?\{/;
while ( $text =~ /$marker_re(.+?)($marker_re|\Z)/gs ) {
print "---\n$1\n";
pos $text -= length $2;
}
__DATA__
foo one {
one1
one2
one3
foo two
{ two1 two2
two3 two4 }
that was the second one
foo three { 3
foo 3 foo 3
foo 3
foo foo
foo four{}
Output:
---
one1
one2
one3
---
two1 two2
two3 two4 }
that was the second one
---
3
foo 3 foo 3
foo 3
foo foo
---
}
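A possible alternative to rewinding pos() is to put the closing marker inside a lookahead (?=...). The marker is then checked but not consumed, so the next iteration of the /g loop starts exactly at it. This is a sketch on a shortened, made-up input held in a string (to avoid needing __DATA__ or File::Slurp). It is not a drop-in replacement verified against the full data above.

```perl
use strict;
use warnings;

# The closing marker sits in a lookahead, so it is not consumed and the next
# /g match can start at it; no "pos $text -= length $2" adjustment is needed.
my $text      = "foo one {\none1\nfoo two {\ntwo1\nfoo three {}";
my $marker_re = qr/foo\s+\w+\s*?\{/;
my @chunks;
while ($text =~ /$marker_re(.+?)(?=$marker_re|\z)/gs) {
    push @chunks, $1;
}
print map { "---\n$_\n" } @chunks;
```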

Is there a Perl equivalent for String.scan found in Ruby?

I want to do this in Perl:
>> "foo bar baz".scan /(\w+)/
=> [["foo"], ["bar"], ["baz"]]
Any suggestions?
This does essentially the same thing.
my @elem = "foo bar baz" =~ /(\w+)/g;
You can also set the "default scalar" variable $_.
$_ = "foo bar baz";
my @elem = /(\w+)/g;
See perldoc perlre for more information.
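One difference from Ruby: m//g in list context returns a flat list ("foo", "bar", "baz"), whereas scan with groups returns one array per match. Below is a sketch that reproduces the nested shape. The name scan_groups is hypothetical. It assumes Perl 5.26 or later, which provides the @{^CAPTURE} array of the last match's capture groups. It only covers patterns that contain capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking Ruby's String#scan for patterns WITH capture
# groups: every match contributes one array reference holding its captures.
# Assumes Perl 5.26+ for @{^CAPTURE}.
sub scan_groups {
    my ($string, $re) = @_;
    my @result;
    while ($string =~ /$re/g) {
        push @result, [ @{^CAPTURE} ];
    }
    return @result;
}

my @pairs = scan_groups('a=1, b=2', qr/(\w+)=(\d+)/);
print join(' ', map { "[@$_]" } @pairs), "\n";    # [a 1] [b 2]
```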
If you only want to use that string as an array, you could use qw().
my @elem = qw"foo bar baz";
See perldoc perlop (Quote and Quote-like Operators).
Also, split, e.g.,
my $x = "foo bar baz";
my @elem = split(' ', $x);
OR
my @elem = split(/\W+/, $x);    # split on runs of non-word characters
etc.
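The key Perl concept is scalar vs. list context. Assigning an expression to an array forces list context, in which m//g returns every match at once. In scalar context, such as the condition of a while loop, m//g returns one match per test and resumes where the previous match ended (tracked by pos()).
Thus, the equivalent of the block form of String#scan is to use the regexp with the /g modifier in a while loop:
my $str = "foo bar baz";
while ($str =~ /(\w+)/g) {
    my $word = $1;
    do_something_with($word);
}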