Is there a Perl equivalent for String.scan found in Ruby? - regex

I want to do this in Perl:
>> "foo bar baz".scan /(\w+)/
=> [["foo"], ["bar"], ["baz"]]
Any suggestions?

This does essentially the same thing.
my @elem = "foo bar baz" =~ /(\w+)/g;
You can also set the "default scalar" variable $_.
$_ = "foo bar baz";
my @elem = /(\w+)/g;
See perldoc perlre for more information.
If you only want to use that string as an array, you could use qw().
my @elem = qw"foo bar baz";
See perldoc perlop (Quote and Quote-like Operators).

Also, split, e.g.,
my $x = "foo bar baz";
my @elem = split(' ', $x);
OR
my @elem = split(/\s+/, $x);
etc.

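The key Perl concept is scalar vs. list context. Assigning the match to an array forces list context, where /g returns every match at once; the condition of a while loop is scalar context, where /g returns one match per evaluation and resumes where the previous match ended.
Thus, the equivalent of the block form of String#scan is to use the regexp with the /g flag in a while loop:
my $str = "foo bar baz";
while ($str =~ /(\w+)/g) {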
my $word = $1;
do_something_with($word);
}
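To make the context distinction concrete, here is a minimal sketch showing the same /g match in list context and in scalar context. The variable names ($text, @all, @seen, @nested) are illustrative and not taken from the answers above.

```perl
use strict;
use warnings;

my $text = "foo bar baz";

# List context: every capture from every match is returned at once.
my @all = $text =~ /(\w+)/g;

# Scalar context: each evaluation resumes where the previous match ended.
my @seen;
while ($text =~ /(\w+)/g) {
    push @seen, $1;
}

# Ruby's scan returns one array per match; map can mimic that nesting.
my @nested = map { [$_] } @all;

print join(",", @all),  "\n";
print join(",", @seen), "\n";
```

Both print statements should show the same three words; @nested holds array references, which is the closest shape to Ruby's [["foo"], ["bar"], ["baz"]].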

Related

perl regex to grep for a character in a word

I am trying to write a perl script to grep for a character in a string. All the strings are stored in an array. We iterate over the array and look if the particular word occurs, if so grep for a particular pattern.
my @array = ("Foo1", "Bar", "Baz", "Foo5", "Foo2", "Bak", "Foo3");
foreach my $var (@array){
if ($var =~ /Foo/){
#Regex to grep for the number which is at the end of string Foo
}
}
Any leads are welcomed. Thanks for the help.
************Edits***********
Thanks for the comments.
if ($var =~ /Foo/){
/.Foo+([A-Z]+)/;
print $1, "\n";
}
The above is the code that I tried and it didn't print anything.
Matching without binding =~ matches against the $_ variable that you don't use. Furthermore, the dot before Foo means the second regex will match only if Foo is preceded by something (that's not a newline). As all your strings containing Foo start with it, the second regex can never match even if you specify $var =~.
Moreover, you can match the number directly in the condition.
And finally, [A-Z] doesn't match digits. Use [0-9] instead.
my @array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
foreach my $var (@array){
if ($var =~ /Foo([0-9]+)/){
print $1, "\n";
}
}
Try this for your second regex
[^\w]*([0-9])
Then you can use the first group to get the number.
my @array = qw( Foo1 Bar Baz Foo5 Foo2 Bak Foo3 );
my @var = map { /Foo(\w+)/ } @array;
print @var;

Perl idiom for quickly searching file with elements in array

what is the Perl idiom to search a string or a whole file for array elements occurrences? E.g.:
my @array = qw(word, test, ...);
my $string = ".......";
I want to search for word or test (can also be words, tester, etc.) inside $string and return whatever is found (i.e. group match).
I searched the docs, seems like map + grep is what I need but I just can’t come up with the code for it. Perl is such fun that I am totally clueless sometimes. :)
Using one example from map:
my @squares = map { $_ * $_ } grep { $_ > 5 } @numbers;
I suppose I can split the string into array and grep. Am I right?
grep { @array } @string; # something like grep {/(word|test)/} @string but I want to use array
my @word_roots = qw( word test );
my $pat = join '|', map quotemeta, @word_roots;
my $re = qr/\b(?:$pat)\w*\b/;
my @matches = $string =~ /($re)/g;
How about something like this from a re.pl session:
$ my @array = qw(word test)
$VAR1 = 'word';
$VAR2 = 'test';
$ my $string = ' the word is test, I said'
the word is test, I said
$ my @match_array = map { $string =~ /\b($_)\b/ } @array
$VAR1 = 'word';
$VAR2 = 'test';
The parentheses around $_ capture the match in the regex inside of map.
The \b ensures that we only match if the word is found on its own (like "test" or "word") and not words that merely contain the characters "test" or "word", like "coward" or "brightest". See http://www.regular-expressions.info/wordboundaries.html for more details on \b.
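The two approaches above can be combined into one pass over the string. The following is a minimal sketch, assuming the roots may carry suffixes ("words", "tester") but must start at a word boundary; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my @word_roots = qw( word test );
my $string     = 'the word is test, I said; the tester tests words, not cowards';

# Escape each root so any regex metacharacters in it are matched literally.
my $pat = join '|', map { quotemeta } @word_roots;

# \b anchors the root at the start of a word; \w* allows suffixes such as "s" or "er".
my $re = qr/\b(?:$pat)\w*\b/;

# List context with /g returns every captured match.
my @matches = $string =~ /($re)/g;

print join(", ", @matches), "\n";   # word, test, tester, tests, words
```

"cowards" is skipped because there is no word boundary before the embedded "word".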

Perl regex to remove empty strings

I'm trying to write a regex that removes empty strings from a line (ignoring whitespace between list items). For example, baz foo, "","", bar, "" becomes baz foo, bar
So far I'm trying
$newLine =~ s/""\s*?,//g;
$newLine =~ s/,\s*?""//g;
but given baz "", foo, "" it is returning baz foo, "", but I want it to return baz foo.
Could anyone explain what's going wrong/how I can fix it?
Thanks
Your code works for me:
$string = 'baz "", foo, ""';
$string =~ s/""\s*?,//g;
$string =~ s/,\s*?""//g;
print "$string\n";
Returns
baz foo
for me.
Edit: As stated in the commentary below, it won't work for the string baz "", "". That's because the first regex consumes the , right before the second "", causing the second regex to not match.
An alternative for the regexes would be to use map.
$string = 'baz "", "", foo';
$string = join(" ", map { $_ =~ s/\s*""\s*//g; $_; } (split(/\s*,\s*/, $string)));
That will set $string to baz foo
It's easier to split the string, remove elements that don't contain anything apart from "" (and possibly surrounding spaces) and join those back.
The following might work for you:
@foo = grep { !/\s?""\s?/ } split /,/, $newLine;
$newLine = join(',', @foo);
Example:
$ cat mmm
$newLine = 'baz foo, "","", bar, ""';
@foo = grep { !/\s?""\s?/ } split /,/, $newLine;
$newLine = join(',', @foo);
print $newLine . "\n";
$ perl mmm
baz foo, bar
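The split, filter, and join steps can be wrapped in a small helper. This is only a sketch: strip_empty_items is a hypothetical name, and it assumes items are comma-separated, that a "" embedded in an item should be removed too, and that the output should use ", " as the separator.

```perl
use strict;
use warnings;

# Hypothetical helper: drop "" placeholders and rejoin the remaining items.
sub strip_empty_items {
    my ($line) = @_;
    my @kept;
    for my $item (split /\s*,\s*/, $line) {
        $item =~ s/\s*""\s*/ /g;     # remove any "" inside the item
        $item =~ s/^\s+|\s+$//g;     # trim leftover whitespace
        push @kept, $item if length $item;
    }
    return join ', ', @kept;
}

print strip_empty_items('baz foo, "","", bar, ""'), "\n";   # baz foo, bar
print strip_empty_items('baz "", ""'), "\n";                # baz
```

Because each item is cleaned separately, the result does not depend on which neighbouring comma a substitution happened to consume, which is what tripped up the two-regex version.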

Matching simple keyword and keyword with spaces

I'm currently working on a function which takes a list of keywords and a string (a long string) as arguments, and I want it to return a list of each matched keyword. The problem is that a keyword can consist of two words.
For example: keyword1: foobar, keyword2: foo bar, keyword3: barfoo
string:
hi this is foobar, have you seen my foo bar, he is very fooBar ?
I want a list with (foobar, foo bar).
For the moment I have:
@matches = $string =~ m/\b(?:foobar|foo bar)\b/gi ;
This works fine for simple words, but not for composed words.
Any ideas?
Thank you for your help.
sub myfunc {
my ($str, @kw) = @_;
# quotemeta keeps the space in "foo bar" literal and escapes any metacharacters
my $alt = join "|", map quotemeta, @kw;
return $str =~ /\b($alt)\b/gi;
}
my @kwords = ("foobar", "foo bar", "barfoo");
my @arr = myfunc("hi this is foobar, have you seen my foo bar, he is very fooBar ?", @kwords);
This returns the correct results:
sub match {
my @keywords=@_;
my $s=pop @keywords;
return grep {$s=~/\b\Q$_\E\b/i} @keywords;
}
my @matches=match('foobar','foo bar','barfoo','hi this is foobar, have you seen my foo bar, he is very fooBar?'); #this returns (foobar, foo bar)
BTW, your code @matches = $string =~ m/\b(?:foobar|foo bar)\b/gi; works too; if you remove the /i modifier, it returns (foobar, foo bar).
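A variation on the grep approach, sketched under the assumption that the words of a multi-word keyword may be separated by any run of whitespace in the text. The name matched_keywords and the doubled space in the sample text are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the keywords that occur in $text as whole words.
sub matched_keywords {
    my ($text, @keywords) = @_;
    return grep {
        # Escape each word of the keyword and allow any whitespace run between words.
        my $kw = join '\s+', map { quotemeta } split ' ', $_;
        $text =~ /\b$kw\b/i;
    } @keywords;
}

my $text  = "hi this is foobar, have you seen my foo  bar, he is very fooBar ?";
my @found = matched_keywords($text, "foobar", "foo bar", "barfoo");

print join(", ", @found), "\n";   # foobar, foo bar
```

Unlike a single alternation with /g, this returns each keyword at most once, no matter how many times or in which letter case it appears in the text.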

Pattern binding operator on assignment

I am working through uncommented Perl code and came across a passage that looks too Perl-ish to me as a Perl beginner. This is a simplified adaptation:
my $foo;
my $bar = "x|y|z|";
$bar =~ s{\|$}{};
($foo = $bar) =~ s{ }{}gs;
I understand that $bar =~ s{\|$}{} applies the regular expression on the right to the string inside $bar.
But what does the expression ($foo = $bar) =~ s{ }{}gs; mean? I am not asking about the regular expression but about the expression it is applied to.
Just follow the precedence that the parentheses dictate and solve each statement one at the time:
($a = $b) =~ s{ }{}gs;
#^^^^^^^^--- executed first
($a = $b) # set $a to the value contained in $b
$a =~ s{ }{}gs; # perform the regex on $a
The /g global modifier causes the regex to match as many times as possible, the /s modifier makes the wildcard . match newline as well (so it now really matches everything). The /s modifier is redundant for this regex, since there are no wildcards . in it.
Note that $a and $b are predeclared variables which are used by sort, and you should avoid using them.
When in doubt, you can always print the variables and see how they change. For example:
use Data::Dumper;
my $x = 'foo bar';
(my $y = $x) =~ s{ }{}gs;
print Dumper $x, $y;
Output:
$VAR1 = 'foo bar';
$VAR2 = 'foobar';
A scalar assignment returns its left-hand-side operand. That means
$a = $b
assigns the value of $b to $a and returns $a. That means
($a = $b) =~ s{ }{}gs;
is short for
$a = $b; $a =~ s{ }{}gs;
and long for
$a = $b =~ s{ }{}gsr; # Requires 5.14+
But what does the expression ($a = $b) =~ s{ }{}gs; mean?
It is same as
$a = $b;
$a =~ s{ }{}gs;
s{ }{}gs is the substitution s/ //gs written with {} as delimiters.
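The equivalent forms can be compared side by side. This is a minimal sketch with made-up variable names; it also shows what happens when the parentheses are left out, which is presumably why the original author wrote them.

```perl
use strict;
use warnings;

my $orig = "x y z";

# 1. Assign, then substitute on the copy in two statements.
my $two_step = $orig;
$two_step =~ s{ }{}g;

# 2. The assignment yields the variable itself, so =~ binds to the copy.
(my $idiom = $orig) =~ s{ }{}g;

# 3. With /r (Perl 5.14+) the substitution returns the modified copy.
my $with_r = $orig =~ s{ }{}gr;

# Without parentheses (and without /r), =~ binds to the right-hand variable:
# that variable is modified and the substitution count is assigned.
my $victim = "x y z";
my $count  = $victim =~ s{ }{}g;

print "$orig -> $two_step, $idiom, $with_r\n";   # x y z -> xyz, xyz, xyz
print "$victim ($count substitutions)\n";        # xyz (2 substitutions)
```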