Grep an expression in an array of strings with Perl - regex

I'm trying to grep a special pattern in an array of strings.
My array of strings looks like this:
@list=("First phrase with \"blabla - Word\" and other things "
,"A phrase without... "
,"Second phrase with \"truc - Word\" and etc... "
,"Another phrase without... "
,"Another phrase with \"thing - Word\" and etc... ");
and I tried to grep the pattern "..... - Word" with this expression:
@keyw = grep { /\".* Word\"/ } (@list);
and I get the following result:
print (Dumper ( @keyw));
$VAR1 = 'First phrase with "blabla - Word" and other things ';
$VAR2 = 'Second phrase with "truc - Word" and etc... ';
$VAR3 = 'Another phrase with "thing - Word" and etc... ';
My grep call selects the right phrases, but I would like to extract just the matched pattern and get the following result:
$VAR1 = '"blabla - Word"';
$VAR2 = '"truc - Word"';
$VAR3 = '"thing - Word"';
Do you know how to get this result?

Use map instead of grep:
my @keyw = map { /\"[^"]*? Word\"/ ? $& : () } @list;
It returns $& (the whole match) if the pattern matches and () (the empty list) if it does not.
A small caveat: avoid $& on Perl older than 5.18.0, where using it anywhere in the program slows down every regex match.
Here's Casimir et Hippolyte's simpler solution:
my @keyw = map { /(\"[^"]*? Word\")/ } @list;
It works because m// in list context returns the list of captured groups, and an empty list when the match fails.
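A minimal sketch of the capture-group behaviour described above, using a shortened copy of the question's data. The variable names are only illustrative.

```perl
use strict;
use warnings;

my @list = (
    'First phrase with "blabla - Word" and other things ',
    'A phrase without... ',
    'Second phrase with "truc - Word" and etc... ',
);

# In list context, m// returns the captured groups on success and an
# empty list on failure, so non-matching elements simply disappear.
my @keyw = map { /("[^"]* Word")/ } @list;

print "$_\n" for @keyw;
```

Because the failed match contributes nothing to the list, no separate grep step is needed.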

Why can't we just use a normal loop?
use strict;
use warnings;
use Data::Dump;
my @phrases = (
"First phrase with \"blabla - Word\" and other things ",
"A phrase without... ",
"Second phrase with \"truc - Word\" and etc... ",
"Another phrase without... ",
"Another phrase with \"thing - Word\" and etc... ",
);
my @matches;
for (@phrases) {
    next unless /("[^"]+Word")/;
    push(@matches, $1);
}
# prints ["\"blabla - Word\"", "\"truc - Word\"", "\"thing - Word\""]
dd(\@matches);

Related

perl regex match using global switch

I am trying to match a word that starts with a letter and is followed by "at".
I use this regex for it:
use strict;
use warnings;
use Data::Dumper;
my $str = "fat 123 cat sat on the mat";
my @a = $str =~ /(\s?[a-z]{1,2}(at)\s?)/g;
print Dumper( @a );
The output I am getting is:
$ perl ~/playground/regex.pl
$VAR1 = 'fat ';
$VAR2 = 'at';
$VAR3 = ' cat ';
$VAR4 = 'at';
$VAR5 = 'sat ';
$VAR6 = 'at';
$VAR7 = ' mat';
$VAR8 = 'at';
Why does it match "at" as well, when I clearly say to match just 1 character before "at"?
Your optional spaces aren't a good way to delimit words, precisely because they are optional.
Use the word boundary construct \b for a rough match to the ends of words:
use strict;
use warnings;
use Data::Dumper;
my $str = "fat 123 cat sat on the mat";
my @aa = $str =~ /\b[a-z]+at\b/gi;
print Dumper \@aa;
output
$VAR1 = [
'fat',
'cat',
'sat',
'mat'
];
If you want to be stricter and make sure that the word found isn't preceded or followed by a non-space character, you can write this instead:
my @aa = $str =~ /(?<!\S)[a-z]+at(?!\S)/gi;
which produces the same result for the data you show.
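As for why "at" shows up in the question's output: with /g in list context, a match returns every capture group of every match, and the original pattern contains the inner group (at). A small sketch, reusing the question's string, compares it with a non-capturing (?:...) group. The array names are made up for the illustration.

```perl
use strict;
use warnings;

my $str = "fat 123 cat sat on the mat";

# Two capturing groups: each match contributes two list elements.
my @with_inner_group = $str =~ /(\s?[a-z]{1,2}(at)\s?)/g;

# Same pattern, but the inner group is non-capturing.
my @outer_only = $str =~ /(\s?[a-z]{1,2}(?:at)\s?)/g;

print scalar(@with_inner_group), " vs ", scalar(@outer_only), "\n";   # 8 vs 4
```

The four matches are the same in both cases; only the number of returned elements per match differs.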

Regex patterns with log header

I want to match the following string against my regex below. It doesn't seem to be working. Any suggestions?
String to compare: "User searched remoteexec.log for "player" @ 02:21:31"
This is my Perl code:
my $qu_re = q{(.?) searched (.?) for "(.?)" @ (\d+):(\d+):(\d+)};
You are missing some quantifiers (+):
my $qu_re = q{(.+?) searched (.+?) for "(.+?)" @ (\d+):(\d+):(\d+)};
                ^              ^          ^
Without them, each group would match 0 or 1 character only. This should suit you, but I would rather use something more restrictive:
my $qu_re = q{(\w+) searched (\w+\.log) for "(\w+)" @ (\d{2}):(\d{2}):(\d{2})};
Here I'm assuming that User, player and remoteexec consist only of alphanumeric characters and underscores. The {2} after \d means exactly 2 digits.
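The effect of the missing quantifier can be seen in isolation. This is only a sketch on a shortened, made-up input, not the full log pattern:

```perl
use strict;
use warnings;

my $text = 'User searched remoteexec.log';

# .? matches at most one character, so only the "r" right before
# " searched" can be captured.
my ($at_most_one) = $text =~ /(.?) searched/;

# .+? matches one or more characters (as few as possible), so the
# whole name fits into the group.
my ($one_or_more) = $text =~ /(.+?) searched/;

print "at most one: '$at_most_one'\n";   # 'r'
print "one or more: '$one_or_more'\n";   # 'User'
```

In the full pattern the same limitation makes the match fail outright, because "remoteexec.log" cannot fit into a group that holds at most one character.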
This is an example:
use strict;
use warnings;
my $str = qq{User searched remoteexec.log for "player" @ 02:21:31};
my $qu_re = qr{(.+) searched (.+) for "([^"]+)" @ (\d+):(\d+):(\d+)};
if( $str =~ m/$qu_re/ ) {
print "user: ", $1, "\n";
print "what: ", $2, "\n";
print "player: ", $3, "\n";
print "when: ", "$4:$5:$6" , "\n";
}
prints:
user: User
what: remoteexec.log
player: player
when: 02:21:31

perl regex matching failed

I want to match two different strings, with the results ending up in $1 and $2.
As I understand it, in this example, if $a is 'xy abc', then $1 should be 'xy abc' and $2 should be 'abc', but the 'abc' part ends up in $3.
Can you please help me write a regex in which $1 holds the whole string and $2 holds the second part?
I am using Perl 5.8.5.
my @data=('abc xy','xy abc');
foreach my $a ( @data) {
print "\nPattern= $a\n";
if($a=~/(abc (xy)|xy (abc))/) {
print "\nMatch: \$1>$1< \$2>$2< \$3>$3<\n";
}
}
Output:
perl test_reg.pl
Pattern= abc xy
Match: $1>abc xy< $2>xy< $3><
Pattern= xy abc
Match: $1>xy abc< $2>< $3>abc<
This can be done with a branch reset group, (?|...), which requires Perl 5.10 or later:
(?|(abc (xy))|(xy (abc)))
Why even bother with capturing the whole thing? You can use $& for that.
my @data = ('abc xy', 'xy abc');
for(@data) {
print "String: '$_'\n";
if(/(?|abc (xy)|xy (abc))/) {
print "Match: \$&='$&', \$1='$1'\n";
}
}
Because only one of the captures $2 and $3 can be defined, you can write
foreach my $item ( @data) {
print "\nPattern= $item\n";
if ($item=~/(abc (xy)|xy (abc))/) {
printf "Match: whole>%s< part>%s<\n", $1, $2 || $3;
}
}
which gives the output
Pattern= abc xy
Match: whole>abc xy< part>xy<
Pattern= xy abc
Match: whole>xy abc< part>abc<
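One caveat with $2 || $3: it tests truth, not definedness, so a captured "0" would be skipped. A sketch with made-up data (not from the question) that tests definedness instead, which also works on Perl 5.8:

```perl
use strict;
use warnings;

# Hypothetical data in which the captured part can be the string "0".
my @parts;
for my $item ('a 0', '0 a') {
    if ($item =~ /(a (0)|0 (a))/) {
        # "0" is false in Perl, so $2 || $3 would fall through to the
        # undefined $3 here; check definedness explicitly.
        my $part = defined $2 ? $2 : $3;
        push @parts, $part;
        print "whole: [$1], part: [$part]\n";
    }
}
```

For the question's data (xy and abc) both spellings give the same result; the difference only matters when a capture can be "0" or empty.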
If you can live with allowing more capture variables than $1 and $2, then use the substrings from the branch of the alternative that matched.
for ('abc xy', 'xy abc') {
print "[$_]:\n";
if (/(abc (xy))|(xy (abc))/) {
print " - match: ", defined $1 ? "1: [$1], 2: [$2]\n"
: "1: [$3], 2: [$4]\n";
}
else {
print " - no match\n";
}
}
Output:
[abc xy]:
- match: 1: [abc xy], 2: [xy]
[xy abc]:
- match: 1: [xy abc], 2: [abc]

In Perl, how can I remove all spaces that are not inside double quotes " "?

I'm trying to come up with a regex that will remove all space characters from a string, as long as they are not inside double quotes (").
Example string:
some string with "text in quotes"
Result:
somestringwith"text in quotes"
So far I've come up with something like this:
$str =~ /"[^"]+"|/g;
But it doesn't seem to be giving the intended result.
I'm honestly very new to Perl and haven't had much regex experience. So if anyone willing to answer would also be willing to provide some insight into the why and how, that would be great!
Thanks!
EDIT
String will not contain escaped "'s
It should actually always be formatted like this:
Some.String = "Some Value"
Result would be
Some.String="Some Value"
Here is a technique using split to separate the quoted strings. It relies on your data being consistent and will not work with loose quotes.
use strict;
use warnings;
my @line = split /("[^"]*")/;
for (@line) {
    unless (/^"/) {
        s/[ \t]+//g;
    }
}
print @line; # @line is altered
Basically, you split up the string in order to isolate the quoted strings. Once that is done, perform the substitution on all other strings. Since the array elements are aliased in the loop, substitutions are performed on the actual array.
You can run this script like so:
perl -n script.pl inputfile
To see the output. Or
perl -n -i.bak script.pl inputfile
To do in-place edit on inputfile, while saving backup in inputfile.bak.
With that said, I'm not sure what your edit means. Do you want to change
Some.String = "Some Value"
to
Some.String="Some Value"
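As a self-contained variant of the split technique above, one that does not depend on the -n switch, the logic could be wrapped in a function. The name strip_unquoted_spaces is hypothetical, and the sketch assumes balanced, unescaped double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: removes spaces and tabs outside double quotes.
sub strip_unquoted_spaces {
    my ($line) = @_;
    # The capturing group makes split return the quoted parts as well.
    my @parts = split /("[^"]*")/, $line;
    for (@parts) {
        s/[ \t]+//g unless /^"/;   # $_ aliases the array element
    }
    return join '', @parts;
}

print strip_unquoted_spaces('Some.String = "Some Value"'), "\n";
# prints Some.String="Some Value"
```

Applied to the example from the edit, this turns Some.String = "Some Value" into Some.String="Some Value".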
Text::ParseWords is tailor-made for this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
my @strings = (
    q{This.string = "Hello World"},
    q{That " string " and "another shoutout to my bytes"},
);
for my $s ( @strings ) {
    my @words = quotewords '\s+', 1, $s;
    print join('', @words), "\n";
}
Output:
This.string="Hello World"
That" string "and"another shoutout to my bytes"
Using Text::ParseWords means if you ever had to deal with quoted strings with escaped quotation marks in them, you'd be ready ;-)
Also, this sounds like you have a configuration file of some sort and you're trying to parse it. If that is the case, there are probably better solutions.
I suggest removing the quoted substrings using split and then recombining them with join after removing whitespace from the intermediate text.
Note that if the regex used for split contains captures then the captured values will also be included in the list returned.
Here's some sample code.
use strict;
use warnings;
my $source = <<END;
Some.String = "Some Value";
Other.String = "Other Value";
Last.String = "Last Value";
END
print join '', map { s/\s+//g unless /"/; $_ } split /("[^"]*")/, $source;
output
Some.String="Some Value";Other.String="Other Value";Last.String="Last Value";
I would simply loop through the string char by char. This way you can handle escaped strings too (just add an isEscaped variable).
my $text='lala "some thing with quotes " lala ... ';
my $quoteOpen = 0;
my $out;
foreach my $char (split //, $text) {
if ($char eq "\"" && $quoteOpen==0) {
$quoteOpen = 1;
$out .= $char;
} elsif ($char eq "\"" && $quoteOpen==1) {
$quoteOpen = 0;
$out .= $char;
} elsif ($char =~ /\s/ && $quoteOpen==1) {
$out .= $char;
} elsif ($char !~ /\s/) {
$out .= $char;
}
}
print "$out\n";
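A sketch of the escaped-quote handling mentioned above. It assumes that a backslash inside quotes escapes the next character, which the question does not specify; the function and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes a backslash escapes the next character
# inside a quoted section.
sub strip_spaces_outside_quotes {
    my ($text) = @_;
    my ($in_quotes, $is_escaped, $out) = (0, 0, '');
    for my $char (split //, $text) {
        if ($is_escaped) {                      # character after a backslash: copy as-is
            $out .= $char;
            $is_escaped = 0;
        }
        elsif ($in_quotes && $char eq '\\') {   # backslash inside quotes starts an escape
            $out .= $char;
            $is_escaped = 1;
        }
        elsif ($char eq '"') {                  # unescaped quote toggles the state
            $in_quotes = !$in_quotes;
            $out .= $char;
        }
        elsif ($in_quotes || $char !~ /\s/) {   # drop only whitespace outside quotes
            $out .= $char;
        }
    }
    return $out;
}

print strip_spaces_outside_quotes('a b "c \" d" e f'), "\n";
# prints ab"c \" d"ef
```

The escaped quote does not close the quoted section, so the spaces around it survive.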
Splitting on double quotes and removing spaces only from every other field (i.e. those outside quotes):
sub remove_spaces {
    my $string = shift;
    my @fields = split /"/, $string . ' '; # trailing space needed to keep final " in output
    my $flag = 1;
    return join '"', map { s/ +//g if $flag; $flag = ! $flag; $_} @fields;
}
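It can be done with a regex:
s/("[^"]*"|[^ "]*) */$1/g
The quoted alternative has to come first and the other one must exclude quotes; otherwise spaces inside the quotes are removed too. Note that this won't handle any kind of escapes inside the quotes.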

Need help with regular expression matching

Given:
$num = "3";
$num_list = "30 3 42 54";
How can I match the "3" and not the "30"? The number order will always be changing.
I tried:
if ($num_list =~ /\s?$num\s+/)
Unfortunately it matches the "3" in "30". Not sure how to fix it. I know it's because the ? means 0 or 1.
Your help is much appreciated!
Try using word boundaries:
/\b$num\b/
\b matches at any boundary between a word character and a non-word character (i.e. between [0-9a-zA-Z_] and not [0-9a-zA-Z_]), with the start and end of the string counting as non-word characters.
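One thing to keep in mind with \b: punctuation counts as a boundary too. The sketch below assumes the list could contain a token such as 3.5, which the question does not show, and compares \b with whitespace lookarounds. \Q...\E quotes any regex metacharacters in $num.

```perl
use strict;
use warnings;

my $num      = "3";
my $num_list = "30 3.5 42 54";   # assumed input: no standalone 3 here

# "." is a non-word character, so \b sees a boundary between "3" and ".".
my $by_boundary = $num_list =~ /\b\Q$num\E\b/ ? 1 : 0;

# Whitespace lookarounds require a space (or a string edge) on both sides.
my $by_spaces = $num_list =~ /(?<!\S)\Q$num\E(?!\S)/ ? 1 : 0;

print "word boundary: $by_boundary, whitespace-delimited: $by_spaces\n";
# prints word boundary: 1, whitespace-delimited: 0
```

For a list of plain integers separated by spaces, as in the question, both patterns give the same answer.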
A regex-based solution that's great if you're going to check whether a lot of numbers are in $num_list:
my $pat = join '|', map quotemeta, split " ", $num_list;
my $re = qr/^(?:$pat)\z/;
$num =~ $re
A hash-based solution for the same situation:
my %num_list = map { $_ => 1 } split " ", $num_list;
$num_list{$num}
A solution that doesn't require regexp (great for SQL):
index(" $num_list ", " $num ") >= 0
Simple solutions:
" $num_list " =~ / $num /
$num_list =~ /(?<!\S)$num(?!\S)/
$num_list =~ /\b$num\b/
grep { $_ == $num } split " ", $num_list
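A short usage sketch of the hash approach from the list above, checking several candidates against one lookup table. The candidate values and the name %seen are made up for the illustration.

```perl
use strict;
use warnings;

my $num_list = "30 3 42 54";

# Build the lookup table once; split " " splits on runs of whitespace.
my %seen = map { $_ => 1 } split " ", $num_list;

for my $candidate (qw(3 30 4 54)) {
    printf "%-2s %s\n", $candidate, $seen{$candidate} ? "found" : "missing";
}
```

Hash keys are compared as strings, so "03" would not be found here, whereas the numeric == in the grep version would treat it as equal to 3.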
How about not using regexps at all?
$num = 3;
@num_list = qw[30 3 42 54];
if (grep { $_ == $num } @num_list) {
...
}
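As far as I know, Perl uses much the same regex syntax as preg_match in PHP.
Then you can try the following:
/(?<=^|\s)($num)(?=$|\s)/
It requires the start of the string or whitespace before the 3, and the end of the string or whitespace after it. Note that older Perl versions reject (?<=^|\s) as a variable-length lookbehind; (?<!\S) is an equivalent that works there too.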
Maybe something like:
$num_list = "30 3 42 54";
$num = "3";
@arr = split ' ', $num_list;
if (scalar grep {$_ eq $num} @arr) {
    print "Zuko!\n";
}