I have a small problem. I have a Perl regexp with multiple capture groups, some of which have quantifiers (like '+'). Without a quantifier, the @- and @+ arrays are filled nicely with the matched positions of the capture groups, but with a quantifier only the last match is recorded. I would like to have all of them.
An example:
my $s = 'xx1a2b3cyy';
my $re = qr/^xx(\d\w)+/;
So I'd like to know that matches are '1a', '2b', '3c' at 2, 4, 6.
Simple matching gives:
if ($s =~ $re) {
print "Match @-, @+\n";
for (my $i = 0; $i < @-; ++$i) {
print 'i: ', $i, " - '", substr($s, $-[$i], $+[$i] - $-[$i]), "\n";
}
}
Gives:
Match 0 6, 8 8
i: 0 - 'xx1a2b3c
i: 1 - '3c
So only the last capture group match is remembered.
My next simple try was the following, which is not really what I want because the RE is different:
$re = qr/(\d\w)/;
my @s = ($s =~ /$re/g);
print "RE: '@s'\n";
while ($s =~ /$re/g) {
print "Match @-, @+\n";
for (my $i = 0; $i < @-; ++$i) {
print 'i: ', $i, " - '", substr($s, $-[$i], $+[$i] - $-[$i]), "\n";
}
}
gives:
RE: '1a 2b 3c'
Match 2 2, 4 4
i: 0 - '1a
i: 1 - '1a
Match 4 4, 6 6
i: 0 - '2b
i: 1 - '2b
Match 6 6, 8 8
i: 0 - '3c
i: 1 - '3c
But this is not what I want, as it would also match a string like 'ZZ1aAA2bBB3cZZ'.
So somehow I have to combine the two. The best I could get:
$re = '^xx(?:\d\w)*?\G(\d\w)';
pos($s) = 2;
while ($s =~ m($re)g) {
print "Match pos: ", pos($s), ', G: ', $1, ", '@-', '@+'\n"
}
gives:
Match pos: 4, G: 1a, '0 2', '4 4'
Match pos: 6, G: 2b, '0 4', '6 6'
Match pos: 8, G: 3c, '0 6', '8 8'
This is almost nice, but for this I need to know the position of the first possible match. If it is not set properly it will not match anything. I can only determine the first position if I remove the non greedy part:
$re = '^xx(\d\w)';
if ($s =~ m($re)) {
print "Match: '@-', '@+'\n";
}
which gives:
Match: '0 2', '4 4'
So $-[1] gives the first position, but for this I have to modify the RE "manually".
If I add code execution into the pattern I almost get what I need:
use re 'eval';
$re = '^xx(\d\w)+(??{print "Code: <@-> <@+>\n"})';
$s =~ m($re) and print "Match\n";
gives:
Code: <0 6> <8 8>
Code: <0 4> <6 6>
Code: <0 2> <4 4>
For this I need to add the (?{ code }) part.
Does anybody know a simpler method (one that does not require modifying the original RE) to get all the matches of a capture group that has a quantifier?
Thanks in advance!
There's no general solution; the regex engine simply doesn't store the necessary information. You're asking to use a regex as a parser, and that's a no-go.
sub extract {
for ($_[0]) {
/^ xx /xg
or return ();
my @matches;
push @matches, $1 while /\G (\d\w) /xg;
return @matches;
}
}
or
sub extract {
my ($pairs) = $_[0] =~ /^xx((?:\d\w)+)/
or return ();
return unpack('(a2)*', $pairs);
}
If you just want the positions, it's the same.
sub extract {
for ($_[0]) {
/^ xx /xg
or return ();
my @matches;
push @matches, $-[1] while /\G (\d\w) /xg;
return @matches;
}
}
or
sub extract {
$_[0] =~ /^xx((?:\d\w)+)/
or return ();
return map { $-[1] + ( $_ - 1 )*2 } 1..length($1)/2;
}
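For the sample string from the question, the \G approach can also return text and offset together. This is only a sketch under the same assumption as above (a fixed "xx" prefix followed by \d\w pairs); extract_with_pos is a made-up name, not something from the answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [text, offset] pairs for every \d\w pair
# that directly follows a leading "xx".
sub extract_with_pos {
    my ($str) = @_;
    $str =~ /^xx/g or return;           # leaves pos($str) just past "xx"
    my @hits;
    while ($str =~ /\G(\d\w)/gc) {      # \G continues where the last match ended
        push @hits, [ $1, $-[1] ];      # $-[1] is the start offset of group 1
    }
    return @hits;
}

my @hits = extract_with_pos('xx1a2b3cyy');
print "$_->[0] at $_->[1]\n" for @hits;
```

On 'xx1a2b3cyy' this should report '1a', '2b', '3c' at offsets 2, 4, 6, which is what the question asked for.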
Even a non-general purpose solution is extremely hard using regular expressions. Say you had the following pattern:
xx(\d\w)+yy(\d\w)+zz
The correct solution would be:
use Storable qw( dclone );
my $s = "xx1a2byy3c4dZZ...xx5a6byy7c8dzz";
local our $rv;
if (
$s =~ /
(?{ [] })
xx
(?: (\d\w) (?{ my $r = dclone($^R); push @{ $r->[0] }, $^N; $r }) )+
yy
(?: (\d\w) (?{ my $r = dclone($^R); push @{ $r->[1] }, $^N; $r }) )+
zz
(?{ $rv = $^R; })
/x
) {
say "\$1: @{ $rv->[0] }";
say "\$2: @{ $rv->[1] }";
}
Output:
$1: 5a 6b
$2: 7c 8d
And something like
(zz(\d\w)+)+
would need
use Storable qw( dclone );
my $s = "zz1a2bzz3c4d";
local our $rv;
if (
$s =~ /
(?{ [] })
(?:
(?{ my $r = dclone($^R); push @$r, []; $r })
zz
(?: (\d\w) (?{ my $r = dclone($^R); push @{ $r->[-1] }, $^N; $r }) )+
)+
(?{ $rv = $^R; })
/x
) {
say "\$1: @$_" for @$rv;
}
Output:
$1: 1a 2b
$1: 3c 4d
I think I can give some explanation for the behavior you see:
In the first example I can see only one capture group. The quantifier allows it to be used multiple times, but it's one capture group nonetheless. So every new occurrence of a matching subpattern overwrites the value previously captured there. Even if the RE engine has already advanced past it, backtracking may occur (e.g. in a more advanced pattern with branching and the like), and the capture group, once visited again, may change. And since @- and @+ hold the positions of the capture groups (as opposed to the individual subpattern occurrences), this explains why only the last occurrence of the subpattern is recorded.
You could even play around with named subpatterns and %+/%- and would experience the same thing. It becomes more obvious with the already used (?{ }), at least for debugging purposes. But use re 'debug' is fine for shorter regexes / strings to match.
So be aware of the effects of backtracking to capture groups while matching is still in progress!
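To make the backtracking caveat concrete, here is a small sketch. The pattern and the @seen variable are made up for illustration; the trailing literal "3c" is there only to force the engine to give back one iteration after its code block has already run:

```perl
use strict;
use warnings;

our @seen;   # package variable so the code block can reach it

# The group first consumes 1a, 2b and 3c. The literal "3c" then fails at
# the end of the string, so the engine backtracks and gives the third
# iteration back. The push that already happened is not undone.
my $matched = 'xx1a2b3c' =~ /^xx(?:(\d\w)(?{ push @seen, $^N }))+3c/;
my $last = $1;

print "iterations recorded: @seen\n";
print "final value of \$1:   $last\n";
```

The recorded list should contain three entries (1a 2b 3c) although the final match used only two iterations and $1 ends up as 2b. A plain push inside (?{ }) therefore over-reports whenever backtracking happens.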
But if you don't have to care about backtracking, I can think of kind of a recipe to handle a capture group with a quantifier:
If your capture group is (bla) and your quantifier {0,3}, transform it into
(?:(bla)(?{ print $-[$#-],$+[$#-]."\n" })){0,3}.
You practically put the subpattern into another (non-capturing) group. If the RE engine is done with it, execute code regarding the last capture group matched so far. The quantifier outside the surrounding group is then responsible for the correct number of execution of the code fragment.
So your example becomes this:
use Data::Dumper;
my $s = 'xx1a2b3cyy';
my @submatches;
sub getem { push @submatches, [$-[$#-],$+[$#-]]; }
$s =~ m/^xx(?:(\d\w)(?{ getem() }))+/;
print Dumper(\@submatches);
This also works with multiple capture groups transformed this way:
my $s = 'xx1a2b3cyy4de5fg6hihhh2';
$s =~ m/^xx(?:(\d\w)(?{ getem() }))+yy(?:(\d\w{2})(?{ getem() }))+hh/;
You have to adjust the index used if your capture group contains more capture groups. That's why I prefer named capture groups.
Hope this helps.
Related
I have tried a few methods to match a word that contains exactly three slashes, but none of them work. Below is an example:
@array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );
foreach my $string (@array){
if ( $string =~ /^\/{3}/ ){
print " yes, word with 3 / found !\n";
print "$string\n";
}
else {
print " no word contain 3 / found\n";
}
}
Here are a few of the matches I tried; none of them work:
$string =~ /^\/{3}/;
$string =~ /^(\w+\/\w+\/\w+\/\w+)/;
$string =~ /^(.*\/.*\/.*\/.*)/;
Any other way i can match this type of string and print the string?
Match a / globally and compare the number of matches with 3
if ( ( () = m{/}g ) == 3 ) { say "Matched 3 times" }
where the =()= operator is a play on context, forcing list context on its right side but returning the number of elements of that list when scalar context is provided on its left side.
If you are uncomfortable with such a syntax stretch then assign to an array
if ( ( my @m = m{/}g ) == 3 ) { say "Matched 3 times" }
where the subsequent comparison evaluates it in the scalar context.
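As a rough illustration of the counting idiom on the strings from the question (count_slashes is a hypothetical helper name, not part of the answer):

```perl
use strict;
use warnings;

# Hypothetical helper: counts the slashes in a string with the =()= idiom.
sub count_slashes {
    my ($string) = @_;
    # Assigning to the empty list forces list context on the match; taking
    # that list assignment in scalar context yields the number of matches.
    my $count = () = $string =~ m{/}g;
    return $count;
}

my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t );
for my $string (@array) {
    my $count = count_slashes($string);
    printf "%-16s %d%s\n", $string, $count,
        $count == 3 ? '  <- exactly three' : '';
}
```

Only the first string has exactly three slashes; the others have four, two and one.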
You are trying to match three consecutive / and your string doesn't have that.
The pattern you need (with whitespace added) is
^ [^/]* / [^/]* / [^/]* / [^/]* \z
or
^ [^/]* (?: / [^/]* ){3} \z
Your second attempt was close, but using ^ without \z made it so you checked for string starting with your pattern.
Solutions:
say for grep { m{^ [^/]* (?: / [^/]* ){3} \z}x } @array;
or
say for grep { ( () = m{/}g ) == 3 } @array;
or
say for grep { tr{/}{} == 3 } @array;
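To see why the \z anchor matters, the sketch below runs the same pattern with and without it and prints the tr count next to each result. The variable names are mine:

```perl
use strict;
use warnings;

my $prefix_only = qr{^ [^/]* (?: / [^/]* ){3}    }x;   # no end anchor
my $whole       = qr{^ [^/]* (?: / [^/]* ){3} \z }x;   # anchored at both ends

my @array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a );
for my $string (@array) {
    printf "%-16s prefix:%-3s whole:%-3s slashes:%d\n",
        $string,
        ( $string =~ $prefix_only ? 'yes' : 'no' ),
        ( $string =~ $whole       ? 'yes' : 'no' ),
        ( $string =~ tr{/}{} );    # tr with an empty replacement only counts
}
```

The string with four slashes passes the unanchored pattern, because its first three slashes form a valid prefix, but fails once \z is added.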
You need to match
a slash
surrounded by some non-slashes ([^\/]*)
repeating the match exactly three times
and enclosing the whole triple in start-of-line and end-of-line anchors:
$string =~ /^(?:[^\/]*\/[^\/]*){3}$/;
if ( $string =~ /\/.*\/.*\// and $string !~ /\/.*\/.*\/.*\// )
I want to print sentences from text file placed in () brackets deeper than one pair of brackets.
For example for this text file :
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
the output should be :
print me
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
hhhhh
This is what I've done so far:
#!/usr/bin/perl -w
open(FILE, "<", $ARGV[0]) or die "file open error";
if ( @ARGV ) #if there are args
{
if ( -f $ARGV[0] ) #if its regular file
{
while(<FILE>)
{
my @array = split('\)',$_);
foreach(@array)
{
if ($_ =~ /.*\((.*)/)
{
print "$1\n";
}
}
}
close(FILE);
}
else{
print "Arg is not a file\n";}
}
else{
print "no args\n";}
My code can't separate the sentences placed in deeper brackets.
Assuming brackets are balanced:
use strict;
use warnings;
my @a;
while (<DATA>) {
while (/\(([^()]*(?:\(((?1))\)[^()]*(?{push @a, $2}))*+)\)/g){}
}
print join "\n", @a;
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bb(xxxx)b))aa)
blabla (blabla(hhhhh))
It returns:
print me
cccc
xxxx
bbbb(cccc)bb(xxxx)b
eeee(bbbb(cccc)bb(xxxx)b)
hhhhh
The idea is to store the capture group 2 content after each recursion, using the (?{...}) construct to execute code in the pattern.
Note that the order of results isn't ideal since the innermost content appears first. Unfortunately, I didn't find a way to change the order of results.
Pattern details:
\( # opening bracket level 1
( # open capture group 1
[^()]* # all that is not a bracket
(?:
\( # opening bracket for level 2 (or more when a recursion occurs)
( # capture group 2: to store the result
(?1) # recursion
)
\) # closing bracket for level 2 (or more ...)
[^()]* #
(?{push @a, $2}) # store the capture group 2 content in @a
)*+ # repeat when needed
)
\) # closing bracket level 1
EDIT: This pattern assumes that brackets are balanced, but if it isn't the case, this may cause problems of unwanted results for certain strings. The reason is that results are stored before the whole pattern succeeds.
Example with the string 1234 ( 5678 (abcd(efgh)ijkl) where a closing bracket is missing:
1234 ( 5678 (abcd(efgh)ijkl)
# ^ ^---- second attempt succeeds, "efgh" is stored
# '---- first attempt fails, but "efgh", "abcd(efgh)ijkl" are stored
To solve the problem, you can choose between two default behaviours:
the strict behaviour that only accepts balanced brackets. All you need is to store the results in a temporary array and to reset this array in the while loop or when a closing bracket is missing. In this case the result will only be "efgh":
my @a;
my @b;
while (<DATA>) {
while (/\(([^()]*(?:\(((?1))\)[^()]*(?{push @b, $2}))*+)(?:\)|(?{undef @b})(*F))/g) {
push @a, @b;
undef @b;
}
}
a more tolerant behaviour that doesn't make the closing bracket mandatory. To do that you must replace each \) with (?:\)|$). In this case, the first attempt succeeds and consumes characters until the end of the string (in other words, there isn't a second attempt). The results are "efgh" and "abcd(efgh)ijkl".
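On the ordering caveat mentioned earlier (innermost content comes out first): if outer-first order matters, one alternative is to drop the recursive pattern and scan for brackets with an explicit stack. This is a different technique from the regex above, sketched under the assumption that unmatched brackets can simply be ignored; nested_contents is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of every bracket pair nested at
# depth 2 or more, ordered by the position of the opening bracket.
sub nested_contents {
    my ($line) = @_;
    my (@open, @found);
    while ($line =~ /([()])/g) {
        my $at = pos($line);             # offset just past the bracket
        if ($1 eq '(') {
            push @open, $at;             # remember where the content starts
        }
        elsif (@open) {
            my $start = pop @open;
            # @open now holds only enclosing brackets: non-empty means depth >= 2
            push @found, [ $start, substr($line, $start, $at - 1 - $start) ]
                if @open;
        }
    }
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @found;
}

print "$_\n" for map { nested_contents($_) } (
    'blabla(nothing(print me)) nanana (nanan)',
    'blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)',
    'blabla (blabla(hhhhh))',
);
```

For the three sample lines this should print the five strings from the question in the requested order. A closing bracket with no opener is skipped, and an opener that never closes contributes nothing.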
This is probably easiest, and the most maintainable with a two-pass solution.
The initial pass captures all first level parentheses. The second pass captures all enclosed parenthesis groups, only advancing a single character in order to match every level of embedded paren groups:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
my $data = do { local $/; <DATA> };
my $parens_content_re = qr{
\(
(
(?:
[^()]*+
|
\( (?1) \)
)*
)
\)
}x;
say for map {/(?=$parens_content_re)\(/g} map {/$parens_content_re/g} $data;
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
----(----(aaaa(123)bbbb(456)cccc)----)----
Outputs:
$ perl parens.pl
print me
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
hhhhh
aaaa(123)bbbb(456)cccc
123
456
This code works by capturing levels recursively, using a simple regex for ) and split-ing by ( for the opening paren. It first prepares by peeling off the two starting layers of nesting. It works for shown examples, and a few others. However, there are other ways to nest pairs, for which rules are not specified. Also, this is probably rough around the edges. There is no magic of any kind involved and adjusting code for new cases should be feasible.
use warnings;
use strict;
my ($lev, @el, @res, $rret);
while (my $str = <DATA>)
{
print "\nString: $str\n";
@res = ();
# Drop two layers to start: strip last two ), split by ( and drop 0,1
$str =~ s/ (.*) \) [^)]* \) [^)]* $/$1/x;
@el = split '\(', $str;
@el = @el[2..$#el];
# Edge case: may have one element and be done, but with extra )
if (@el > 1) { $lev = join '(', @el }
else { ($lev = $el[0]) =~ s|\)||g }
push @res, $lev;
# Get next level and join string back, recursively
while ( $rret = nest_one($lev) ) {
$lev = join '(', @$rret;
push @res, $lev;
last if @$rret == 1;
}
print "\t$_\n" for @res;
}
# Strip last ) and past it, split by ( and drop first element
sub nest_one {
(my $lev = $_[0]) =~ s/(.*) \) [^)]* $/$1/x;
my @el = split '\(', $lev;
shift @el;
return (@el) ? \@el : undef;
}
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
It prints
blabla(nothing(print me)) nanana (nanan)
print me
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
blabla (blabla(hhhhh))
hhhhh
Perl's regexp matching is left-greedy, so that the regexp
/\A (a+) (.+) \z/x
matching the string 'aaab', will set $1='aaa' and $2='b'.
(The \A and \z are just to force start and end of the string.)
You can also give non-greedy qualifiers, as
/\A (a+?) (.+?) \z/x
This will still match, but give $1='a' and $2='aab'.
But I would like to check all possible ways to generate the string, which are
$1='aaa' $2='b'
$1='aa' $2='ab'
$1='a' $2='aab'
The first way corresponds to the default left-greedy behaviour, and the third way corresponds to making the first match non-greedy, but there may be ways in between those extremes. Is there a regexp engine (whether Perl's, or some other such as PCRE or RE2) which can be made to try all possible ways that the regexp specified generates the given string?
Among other things, this would let you implement 'POSIX-compatible' regexp matching where the longest total match is picked. In my case I really would like to see every possibility.
(One way would be to munge the regexp itself, replacing the + modifier with {1,1} on the first attempt, then {1,2}, {1,3} and so on - for each combination of + and * modifiers in the regexp. That is very laborious and slow, and it's not obvious when to stop. I hope for something smarter.)
Background
To answer Jim G.'s question on what problem this might solve, consider a rule-based translation system between two languages, given by the rules
translate(any string of one or more 'a' . y) = 'M' . translate(y)
translate('ab') = 'U'
Then there is a possible result of translate('aaab'), namely 'MU'.
You might try to put these rules into Perl code based on regexps, as
our @m;
my @rules = (
[ qr/\A (a+) (.*) \z/x => sub { 'M' . translate($m[1]) } ],
[ qr/\A ab \z/x => sub { 'U' } ],
);
where translate runs over each of @rules and tries to apply them in turn:
sub translate {
my $in = shift;
foreach (@rules) {
my ($lhs, $rhs) = @$_;
$in =~ $lhs or next;
local @m = ($1, $2);
my $r = &$rhs;
next if index($r, 'fail') != -1;
return $r;
}
return 'fail';
}
However, calling translate('aaab') returns 'fail'. This is because
it tries to apply the first rule matching (a+)(.*) and the regexp
engine finds the match with the longest possible string of 'a'.
Using the answer suggested by ikegami, we can try all ways in which
the regular expression generates the string:
use re 'eval';
sub translate {
my $in = shift;
foreach (@rules) {
my ($lhs, $rhs) = @$_;
local our @matches;
$in =~ /$lhs (?{ push @matches, [ $1, $2 ] }) (*FAIL)/x;
foreach (@matches) {
local @m = @$_;
my $r = &$rhs;
next if index($r, 'fail') != -1;
return $r;
}
}
return 'fail';
}
Now translate('aaab') returns 'MU'.
local our @matches;
'aaab' =~ /^ (a+) (.+) \z (?{ push @matches, [ $1, $2 ] }) (*FAIL)/x;
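The trick is that the code block runs every time the engine reaches the end of the pattern, and (*FAIL) then rejects that attempt, so the engine backtracks into the next alternative until none are left. A self-contained version of the snippet, with a print loop added by me:

```perl
use strict;
use warnings;

our @matches;   # package variable so the (?{ }) block can reach it

# Every time the engine gets as far as the code block, the current $1/$2
# are recorded; (*FAIL) then forces it to backtrack and try the next way.
'aaab' =~ /^ (a+) (.+) \z (?{ push @matches, [ $1, $2 ] }) (*FAIL)/x;

print "\$1='$_->[0]' \$2='$_->[1]'\n" for @matches;
```

This should list the three splits from the question, greedy one first: aaa/b, aa/ab, a/aab. The overall match always fails, so only the side effect in @matches is useful.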
I'm looping through a series of regexes and matching it against lines in a file, like this:
for my $regex (@{$regexs_ref}) {
LINE: for (@rawfile) {
/@$regex/ && do {
# do something here
next LINE;
};
}
}
Is there a way for me to know how many matches I've got (so I can process it accordingly..)?
If not maybe this is the wrong approach..? Of course, instead of looping through every regex, I could just write one recipe for each regex. But I don't know what's the best practice?
If you do your matching in list context (i.e., basically assigning to a list), you get all of your matches and groupings in a list. Then you can just use that list in scalar context to get the number of matches.
Or am I misunderstanding the question?
Example:
my @list = /$my_regex/g;
if (@list)
{
# do stuff
print "Number of matches: " . scalar @list . "\n";
}
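One thing to watch with this approach: if the pattern contains capture groups, the list holds the captures of every match rather than one element per match, so the element count is matches times groups. A small sketch with made-up sample data:

```perl
use strict;
use warnings;

my $line = 'foo=1 bar=2 baz=3';   # made-up sample data

my @whole_matches = $line =~ /\w+=\d/g;       # no groups: one element per match
my @captures      = $line =~ /(\w+)=(\d)/g;   # two groups: two elements per match

print scalar(@whole_matches), " elements without groups\n";
print scalar(@captures), " elements with two groups, i.e. ",
      @captures / 2, " matches\n";
```

Here both patterns match three times, but the second list has six elements.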
You will need to keep track of that yourself. Here is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my @regexes = (
qr/b/,
qr/a/,
qr/foo/,
qr/quux/,
);
my %matches = map { $_ => 0 } @regexes;
while (my $line = <DATA>) {
for my $regex (@regexes) {
next unless $line =~ /$regex/;
$matches{$regex}++;
}
}
for my $regex (@regexes) {
print "$regex matched $matches{$regex} times\n";
}
__DATA__
foo
bar
baz
In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special @- array (emphasis added):
@LAST_MATCH_START
@-
$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with
substr $_, $-[n], $+[n] - $-[n]
if $-[n] is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with @+.
This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.
After a match against some variable $var:
$` is the same as substr($var, 0, $-[0])
$& is the same as substr($var, $-[0], $+[0] - $-[0])
$' is the same as substr($var, $+[0])
$1 is the same as substr($var, $-[1], $+[1] - $-[1])
$2 is the same as substr($var, $-[2], $+[2] - $-[2])
$3 is the same as substr($var, $-[3], $+[3] - $-[3])
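A quick way to check these equivalences yourself is to rebuild each group from the offsets. The sample string and pattern below are made up for the purpose:

```perl
use strict;
use warnings;

my $var = 'say foobarbaz now';   # made-up sample string

if ($var =~ /(foo)(bar)(baz)/) {
    for my $n (0 .. $#-) {       # $#- is the index of the last matched group
        # Rebuild group $n from its start (@-) and end (@+) offsets.
        my $text = substr($var, $-[$n], $+[$n] - $-[$n]);
        printf "group %d: '%s' spans offsets %d to %d\n",
            $n, $text, $-[$n], $+[$n];
    }
}
```

Group 0 is the whole match; groups 1 to 3 should come back as foo, bar and baz, the same values as $1, $2 and $3.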
Example usage:
#! /usr/bin/perl
use warnings;
use strict;
my @patterns = (
qr/(foo(bar(baz)))/,
qr/(quux)/,
);
chomp(my @rawfile = <DATA>);
foreach my $pattern (@patterns) {
LINE: for (@rawfile) {
/$pattern/ && do {
my $captures = $#-;
my $s = $captures == 1 ? "" : "s";
print "$_: got $captures capture$s\n";
};
}
}
__DATA__
quux quux quux
foobarbaz
Output:
foobarbaz: got 3 captures
quux quux quux: got 1 capture
How about below code:
my $string = "12345yx67hjui89";
my $count = () = $string =~ /\d/g;
print "$count\n";
It prints 9 here as expected.
Here's a problem I ran into recently. I have attributes strings of the form
"x=1 and y=abc and z=c4g and ..."
Some attributes have numeric values, some have alpha values, some have mixed, some have dates, etc.
Every string is supposed to have "x=someval and y=anotherval" at the beginning, but some don't. I have three things I need to do.
Validate the strings to be certain that they have x and y.
Actually parse the values for x and y.
Get the rest of the string.
Given the example at the top, this would result in the following variables:
$x = 1;
$y = "abc";
$remainder = "z=c4g and ..."
My question is: Is there a (reasonably) simple way to parse these and validate with a single regular expression? i.e.:
if ($str =~ /someexpression/)
{
$x = $1;
$y = $2;
$remainder = $3;
}
Note that the string may consist of only x and y attributes. This is a valid string.
I'll post my solution as an answer, but it doesn't meet my single-regex preference.
Assuming you also want to do something with the other name=value pairs this is how I would do it ( using Perl version 5.10 ):
use 5.10.0;
use strict;
use warnings;
my %hash;
while(
$string =~ m{
(?: ^ | \G ) # start of string or previous match
\s*
(?<key> \w+ ) # word characters
=
(?<value> \S+ ) # non spaces
\s* # get to the start of the next match
(?: and )?
}xgi
){
$hash{$+{key}} = $+{value};
}
# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};
On older Perls (at least Perl 5.6):
use strict;
use warnings;
my %hash;
while(
$string =~ m{
(?: ^ | \G ) # start of string or previous match
\s*
( \w+ ) = ( \S+ )
\s* # get to the start of the next match
(?: and )?
}xgi
){
$hash{$1} = $2;
}
# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};
These have the added benefit of continuing to work if you need to work with more data.
I'm not the best at regular expressions, but this seems pretty close to what you're looking for:
/x=(.+) and y=([^ ]+)( and (.*))?/
Except you use $1, $2, and $4. In use:
my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
"x=yes and y=no",
"z=nox and w=noy");
foreach (@strs) {
if ($_ =~ /x=(.+) and y=([^ ]+)( and (.*))?/) {
$x = $1;
$y = $2;
$remainder = $4;
print "x: $x; y: $y; remainder: $remainder\n";
} else {
print "Failed.\n";
}
}
Output:
x: 1; y: abc; remainder: z=c4g and w=v4l
x: yes; y: no; remainder:
Failed.
This of course leaves out plenty of error checking, and I don't know everything about your inputs, but this seems to work.
As a fairly simple modification to Rudd's version,
/^x=(.+) and y=([^ ]+)(?: and (.*))?/
will allow you to use $1, $2 and $3 (the ?: makes it a noncapturing group), and will ensure that the string starts with "x=" rather than allowing a "not_x=" to match
If you have better knowledge of what the x and y values will be, this should be used to tighten the regex further:
my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
"x=yes and y=no",
"z=nox and w=noy",
"not-x=nox and y=present",
"x=yes and w='there is no and y=something arg here'");
foreach (@strs) {
if ($_ =~ /^x=(.+) and y=([^ ]+)(?: and (.*))?/) {
$x = $1;
$y = $2;
$remainder = $3;
print "x: {$x}; y: {$y}; remainder: {$remainder}\n";
} else {
print "$_ Failed.\n";
}
}
Output:
x: {1}; y: {abc}; remainder: {z=c4g and w=v4l}
x: {yes}; y: {no}; remainder: {}
z=nox and w=noy Failed.
not-x=nox and y=present Failed.
x: {yes and w='there is no}; y: {something}; remainder: {}
Note that the missing part of the last test is due to the current version of the y test requiring no spaces, if the x test had the same restriction that string would have failed.
Rudd and Cebjyre have gotten you most of the way there but they both have certain problems:
Rudd suggested:
/x=(.+) and y=([^ ]+)( and (.*))?/
Cebjyre modified it to:
/^x=(.+) and y=([^ ]+)(?: and (.*))?/
The second version is better because it will not confuse "not_x=foo" with "x=foo" but will accept things such as "x=foo z=bar y=baz" and set $1 = "foo z=bar" which is undesirable.
This is probably what you are looking for:
/^x=(\w+) and y=(\w+)(?: and (.*))?/
This disallows anything between the x= and y= options, and allows an optional " and ..." tail, which will be in $3.
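A minimal sketch of how that pattern could be wrapped and tried out. parse_attrs is a hypothetical name, the test strings are taken from this thread, and turning a missing remainder into an empty string is my choice rather than a requirement:

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns (x, y, remainder), or an empty list when
# the string does not start with "x=... and y=...".
sub parse_attrs {
    my ($str) = @_;
    my ($x, $y, $rest) = $str =~ /^x=(\w+) and y=(\w+)(?: and (.*))?/
        or return;
    return ( $x, $y, defined $rest ? $rest : '' );
}

for my $str (
    'x=1 and y=abc and z=c4g and w=v4l',
    'x=yes and y=no',
    'x=foo z=bar y=baz',
) {
    if ( my ($x, $y, $rest) = parse_attrs($str) ) {
        print "x: {$x}; y: {$y}; remainder: {$rest}\n";
    }
    else {
        print "$str: failed\n";
    }
}
```

The first two strings should parse and the third should fail. Note that \w+ also rejects values containing anything other than word characters, which may or may not suit your data.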
Here's basically what I did to solve this:
($x_str, $y_str, $remainder) = split(/ and /, $str, 3);
if ($x_str !~ /x=(.*)/)
{
# error
}
$x = $1;
if ($y_str !~ /y=(.*)/)
{
# error
}
$y = $1;
I've omitted some additional validation and error handling. This technique works, but it's not as concise or pretty as I would have liked. I'm hoping someone will have a better suggestion for me.