Assign captured group to value, and if no match: assign string - regex

In Perl regex, the documentation says
... in scalar context, $time =~ /(\d\d):(\d\d):(\d\d)/ returns a true
or false value. In list context, however, it returns the list of
matched values ($1,$2,$3)
But how is it that when you provide an alternative option - when no match is found - TRUE or FALSE will be assigned even when in list context?
As an example, I want to assign the matched group to a variable and if not found, use the string value ALL.
my ($var) = $string =~ /myregex/ || 'ALL';
Is this possible? And what about multiple captured groups? E.g.
my ($var1, $var2) = $string =~ /(d.t)[^d]+(d.t)/ || 'dit', 'dat';
Where if the first group isn't matched, 'dit' is used, and if no match for the second is found 'dat'.

For the first requirement, you can use the ternary operator:
my $string = 'abded';
for my $f ('a' .. 'f') {
my ($v1) = $string =~ /($f)/ ? ($1) : ('ALL') ;
say "$f : $v1";
}
Output:
a : a
b : b
c : ALL
d : d
e : e
f : ALL
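The attempt with || in the question fails because || evaluates its left operand in scalar context, so the match yields true or false instead of the captures. For the two-group case, one possible approach is to collect the captures first and fall back to the defaults when nothing was captured. This is only a sketch: the helper name captures_or and the sample strings are made up for illustration, and it substitutes the whole default list when the pattern fails rather than defaulting each group separately.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the captures of $re in $string,
# or the supplied defaults when the pattern does not match.
sub captures_or {
    my ($string, $re, @defaults) = @_;
    my @captures = $string =~ $re;    # list context: returns the captures
    return @captures ? @captures : @defaults;
}

my $re = qr/(d.t)[^d]+(d.t)/;

my ($var1, $var2) = captures_or('dot--dut', $re, 'dit', 'dat');
say "$var1 $var2";    # dot dut

my ($var3, $var4) = captures_or('no match here', $re, 'dit', 'dat');
say "$var3 $var4";    # dit dat
```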

Related

Regex $1 into variable interferes with another variable

I have been struggling with a section of my code for a while now and can't figure it out. Seems to have something to do with how $1 is handled but I cannot find anything relevant.
The regex finds the 16640021 and assigns it to a position in the arrays.
my @one;
my @two;
my $articleregex = qr/\s*\d*\/\s*\d*\|\s*(.*?)\|/p; # $1 = article number
my $row = " 7/ 1| 16640021|Taats 3 IP10 |14-03-03| | | 1,0000|st | | 01| | N| 0|";
if ($row =~ /$articleregex/g) {
$one[0] = $1;
}
if ($row =~ /$articleregex/g) {
$two[0] = $1;
}
print $one[0];
print $two[0];
Which outputs
Use of uninitialized value in print at perltest3.pl line 13.
16640021
It appears that the assignment to $one[0] somehow interferes with the assignment to $two[0]. This seems strange to me, as the two variables and their assignments should not be interacting in any way.
It's because you used if (//g) instead of if (//).
//g in scalar context sets pos($_) [1] to where the match left off, or unsets pos($_) if the match was unsuccessful [2].
//g in scalar context starts matching at position pos($_).
For example,
$_ = "ab";
say /(.)/g ? $1 : "no match"; # a
say /(.)/g ? $1 : "no match"; # b
say /(.)/g ? $1 : "no match"; # no match
say /(.)/g ? $1 : "no match"; # a
This allows the following to iterate through the matches:
while (/(.)/g) {
say $1;
}
Don't use if (//g)! [3]
[1] $_ is being used to represent the variable being matched against.
[2] Unless /c is also used.
[3] Unless you're unrolling a while (//g), or unless you're using if (//gc) to tokenize.
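As a minimal sketch of the fix applied to the question's code, dropping /g makes both matches start at the beginning of the string, so both assignments succeed. The row is shortened here for readability and the /p flag is left out; otherwise the names are the question's own.

```perl
use strict;
use warnings;

my $articleregex = qr/\s*\d*\/\s*\d*\|\s*(.*?)\|/;    # $1 = article number
my $row = " 7/ 1| 16640021|Taats 3 IP10 |14-03-03| |";
my (@one, @two);

# No /g: pos($row) is not consulted, so each match starts at offset 0.
if ($row =~ $articleregex) {
    $one[0] = $1;
}
if ($row =~ $articleregex) {
    $two[0] = $1;
}
print "$one[0]\n";    # 16640021
print "$two[0]\n";    # 16640021
```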

Perl pattern match not working as expected

I'm trying to match values, which may be comma separated, using a regex. Basically, I want to return true if any value in the string does NOT have 3g or 3k starting in the 3rd position.
My test code is as follows:
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
print $_;
say $_ =~ /(?:^|,)\w{2}[^(?:3G|3K)]/i ? " true" : " false";
}
This returns
in3g123456,dh3k123456,dhec110101 true
dhec110101,dhec123456 true
in3g123456,dh3k123456 false
c3kasdf false <- whaaaaaaaat?
usdfusdufs3gsdf true
I don't understand why the 4th one is not true. Any help would be appreciated.
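[^(?:3G|3K)] is a character class, so it reads as "any single character except (, ?, :, 3, G, |, K or )". In c3kasdf, \w{2} matches c3, and the next character, k, is in the excluded set (the match is case-insensitive), so the match fails there:
              failed
              v
         c3   kasdf
/(?:^|,)\w{2}[^(?:3G|3K)]/i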
Use this:
/(?:^|,)\w{2}(?!3G|3K)/i
Demo: https://regex101.com/r/P2XsgN/1.
How about /\b\w{2}(?!3g|3k)/i.
\b matches the empty string at the beginning or end of a word. Slightly simpler equivalent to (^|,) in this situation.
(?!foo) is a zero-width negative lookahead assertion. So, matches the empty string as long as it's not followed by a substring that matches foo.
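To check the lookahead version against the question's test data, something like the following could be run. It is a sketch that reuses the question's array; the @results array is added only to collect the outcomes.

```perl
use strict;
use warnings;
use feature 'say';

my @a = ('in3g123456,dh3k123456,dhec110101', 'dhec110101,dhec123456',
         'in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf');

my @results;
for my $string (@a) {
    # True if some value does NOT have 3g/3k right after its first two characters.
    my $result = $string =~ /(?:^|,)\w{2}(?!3G|3K)/i ? 'true' : 'false';
    push @results, $result;
    say "$string $result";
}
```

With this pattern only the third string, in which every value has 3g or 3k at the 3rd position, reports false; c3kasdf now reports true.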
You can also split the string first, instead of parsing everything with a regex. That is far more flexible and maintainable, and easier.
When processing the list of extracted "values", you can match any two characters followed by your pattern, /^..$patt/. The module List::MoreUtils is useful (and fast) for list manipulations, and its notall function is tailor-made for your condition.
use warnings 'all';
use strict;
use List::MoreUtils qw(notall);
my $file = '...';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
chomp; # remove the trailing newline so the output stays on one line
my $res = notall { /^..(?:3k|3g)/ } split /,/;
print "$_: " . ($res ? 'true' : 'false'), "\n";
}
I presume that you read from a file. If not, replace while (<$fh>) with for (@strings).
The notall function returns true if any element of the list fails the condition.
The split by default uses $_ so we only need the pattern. Here it is simply , but the pattern takes a regex so one can match separators flexibly. For example, this /[,\s]+/ splits on any amount of , and/or whitespace. So ,, , in a string is matched as a separator, as well as , or space(s).
When applied to the array with your strings the above prints
in3g123456,dh3k123456,dhec110101: true
dhec110101,dhec123456: true
in3g123456,dh3k123456: false
c3kasdf: true
usdfusdufs3gsdf: true
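The flexible separator mentioned above can be tried on its own. This sketch uses a made-up input line with mixed commas and whitespace, and uses the core grep in place of notall so that it runs without installing List::MoreUtils.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input mixing ",, , " and a plain space as separators.
my $line = 'in3g123456,, , dh3k123456 dhec110101';

my @values = split /[,\s]+/, $line;
say for @values;

# grep in scalar context counts the values that lack 3k/3g at offset 2.
my $res = grep { !/^..(?:3k|3g)/ } @values;
say $res ? 'true' : 'false';    # true
```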
You could use substr to get data at 3rd and 4th position and then compare it with (3g|3k).
substr $_,2,2
#!/usr/bin/perl
use strict;
use warnings;
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
my @inputs = split /,/,$_;
my $flag = 0;
foreach (@inputs){
$flag = 1 unless ((substr $_,2,2) =~ /(3g|3k)/);
}
$flag ? print "$_: True\n" : print "$_: False\n";
}
Output:
in3g123456,dh3k123456,dhec110101: True
dhec110101,dhec123456: True
in3g123456,dh3k123456: False
c3kasdf: True
usdfusdufs3gsdf: True

Perl short form of regex capture

I would like to only get the first capture group into the same var. In fact, I am looking for a short form of:
$_ = $1 if m/$prefix($pattern)$suffix/;
Something like:
s/$prefix($pattern)$suffix/$1/a; ## Where a is the option I am looking for
Or even better:
k/$prefix($pattern)$suffix/; ## Where k is also an option I wish I can use...
This will avoid the need of matching all the text which leads to a more complicated line:
s/^.*$prefix($pattern)$suffix.*$/defined $1 ? $1 : ""/e;
Any clues?
This would be useful for this example:
push @array, {id => k/.*\s* = \s* '([^']+)'.*/};
instead of
/.*\s* = \s* '([^']+)'.*/;
my $id = '';
$id = $1 if $1;
push @array, {id => $id};
Edit:
I just found an interesting way, but if $1 is not defined I will get an error :(
$_ = (/$prefix($pattern)$suffix/)[0];
Use a Conditional operator
my $var = /$prefix($pattern)$suffix/ ? $1 : '';
You always want to make sure that your regex matches before using a capture group. By using a ternary you can either specify a default value or warn that a match wasn't found.
Alternatively, you can use the list form of capture groups inside an if statement, and let your else output the warning:
if (my ($var) = /$prefix($pattern)$suffix/) {
...;
} else {
warn "Unable to find a match";
}
You can use the /r switch to return the altered string instead of doing the substitution on the variable. There is no need to capture anything at all with that. Just get rid of the prefix and the suffix and add the result of that operation to your array.
use Data::Dump;
my @strings = qw( prefixcontent1suffix prefixcontent2suffix );
my @array = map { s/^prefix|suffix$//gr } @strings;
dd @array;
__END__
("content1", "content2")
If you want it to be configurable, how about this:
my $prefix = qr/.+\{\{/;
my $suffix = qr/\}\}.+/;
my @strings = ( '{fo}o-_#09{{content1}}bar42' );
my @array = map { s/^$prefix|$suffix$//gr } @strings;
dd @array;
__END__
"content1"
In list context, the m// operator returns the captures as a list. This means you can do this:
($_) = m/$prefix($pattern)$suffix/;
or this:
my ($key, $value) = $line =~ m/^([^=]+)=([^=]+)$/;
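The list-slice form from the question's edit can be combined with the defined-or operator to supply a default when nothing matches. The following is a sketch under assumptions: the two sample lines and the simplified pattern are made up for illustration, and // needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature 'say';

my @array;
for my $line ("id = 'abc123'", "no id on this line") {
    # The slice [0] picks the first capture; // supplies '' when the
    # match fails and the slice yields undef.
    my $id = ($line =~ /=\s*'([^']+)'/)[0] // '';
    push @array, { id => $id };
}
say "[$_->{id}]" for @array;
```

The first line yields abc123 and the second falls back to the empty string, without an "uninitialized value" warning.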

Perl context - why is return value not the same as $1?

I found that in some situations the return value from a regex match is true, and at other times it is the value captured in the ().
eg, I am finding all text up to a semi-colon:
my $blop = "some;different;fields";
if(some expression)
{
my $blip = $blop =~ /([^;]+)/;
}
I expected $blip eq "some", but it is 1 (TRUE), the return value from the check.
$1 contains my desired result, so I can write:
$blop =~ /([^;]+)/;
my $blip = $1;
But that's inefficient. I am sure that in other scenarios the return value is the bracketed result. What is different here?
If you give the regex list context, it returns the captures in the list:
use strict;
use warnings;
my $blop = "some;different;fields";
my($blip) = $blop =~ /([^;]+)/;
print "$blip\n";
print "$1\n";
That prints 'some' twice.
In a scalar context, a regex returns true if it matched and false otherwise (which is why you can use them in conditions). Add these two lines to the code above:
my $blup = $blop =~ /([^;]+);([^;]+)/;
print "$blup : $1 : $2\n";
and the extra code prints:
1 : some : different

Perl conditional regex extraction

This conditional must match either telco_imac_city or telco_hier_city. When it succeeds I need to extract up to the second underscore of the value that was matched.
I can make it work with this code
if ( ($value =~ /(telco_imac_)city/) || ($value =~ /(telco_hier_)city/) ) {
print "value is: \"$1\"\n";
}
But if possible I would rather use a single regex like this
$value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$1\"\n";
}
But if I pass the value telco_hier_city I get this output on testing the second value
Use of uninitialized value $1 in concatenation (.) or string at ./test.pl line 19.
value is: ""
What am I doing wrong?
while (<$input>){
chomp;
print "$1\n" if /(telco_hier|telco_imac)_city/;
}
Perl capture groups are numbered by the position of their opening parenthesis within a single pattern, not per alternative. Your input, telco_hier_city, matches the second alternative, so the text is captured by the second group of that single regex (/(telco_imac_)city|(telco_hier_)city/), meaning you'd need to use $2:
my $value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$2\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
Because there was no match in your first capture group ((telco_imac_)), $1 is uninitialized, as expected.
To fix your original code, use FlyingFrog's regex:
my $value = $ARGV[0];
if ( $value =~ /(telco_hier_|telco_imac_)city/ ) {
print "value is: \"$1\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
$> ./conditionalIfRegex.pl telco_imac_city
value is: "telco_imac_"
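If the two alternatives need to stay separate, another option (not part of the answers above) is a branch reset group, (?|...), available since Perl 5.10. Inside it the group numbering restarts at 1 in every alternative, so $1 is set whichever branch matches. A minimal sketch, with the @seen array added only to collect the captures:

```perl
use strict;
use warnings;
use feature 'say';

my @seen;
for my $value ('telco_imac_city', 'telco_hier_city') {
    # Branch reset: both alternatives capture into $1.
    if ($value =~ /(?|(telco_imac_)city|(telco_hier_)city)/) {
        push @seen, $1;
        say "value is: \"$1\"";
    }
}
```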