I'm assigning a series of regexes to variables. Some of the regex values will be the same, but each one must stay unique and identifiable by its variable name ($b and $c, for example).
#various regex
$a = "([\d]{1,2})"
$b = "([\d]{3})"
$c = $b #Note this has the same regex as $b
$d = "\s[-]\s"
$e = "[_]"
#select the pattern
$patternNum = 4
I then want to be able to concatenate the variables in different orders to create a larger regex.
Switch ($patternNum){
#create a pattern
1 { $pattern = ($a, $e, $b) }
2 { $pattern = ($a, $d, $b) }
3 { $pattern = ($a, $d, $a, $e, $b) }
4 { $pattern = ($a, $e, $b, $e, $c) }
}
This creates the expanded regex string I'm hoping for:
#so I can use the full regex pattern later
$selectedPattern = -join $pattern
But I want to be able to associate each element of $pattern with the original variable name, not with the literal string stored in that variable (since some strings will be the same).
#find the index of each var and assign it to another var, so it can be used later to identify the position within the match
$var1 = [array]::IndexOf($pattern, $a) # [0]
$var2 = [array]::IndexOf($pattern, $b) # [2]
$var3 = [array]::IndexOf($pattern, $c) # [2], but I want it to be [4]
The regex will be used for matching, and each match will be used in different strings and at different positions.
I thought I'd be able to use a scriptblock {} and then convert it back to a string, but that doesn't seem to work. Can anybody think of a way to get each variable's original name, or of a better way of doing this?
Using named captures
Use the (?<name>...) syntax to create named captures. Make each name the same as your variable name, e.g.:
$A = '(?<A>\d{3})'
$B = '(?<B>\D{3})'
$string = 'ABC123'
$regex = $B + $A
$string -match $regex
$Matches
Name Value
---- -----
A 123
B ABC
0 ABC123
Now you can correlate the variables to the position they matched in the string like this:
$string.IndexOf($Matches.A)
3
$string.IndexOf($Matches.B)
0
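The same idea carries over to other regex engines. As a hedged sketch, here is a Python analogue using named groups in the standard re module. The piece names a, b, c and e mirror the question's variables, and the subject string is made up. Asking the match object for a group's start offset also sidesteps a pitfall of IndexOf on the matched text: that search finds the text's first occurrence, which is wrong when two pieces capture the same value.

```python
import re

# Hypothetical building blocks mirroring $a, $b, $c and $e from the question.
# Every capturing piece is a named group whose name is the variable name,
# so b and c stay distinguishable even though their sub-patterns are equal.
PARTS = {
    "a": r"(?P<a>\d{1,2})",
    "b": r"(?P<b>\d{3})",
    "c": r"(?P<c>\d{3})",
    "e": r"_",
}


def build(order):
    """Concatenate the named pieces in the given order into one regex."""
    return "".join(PARTS[name] for name in order)


subject = "12_345_345"  # made-up input where b and c capture the same text
m = re.search(build(["a", "e", "b", "e", "c"]), subject)  # like $patternNum = 4

# m.start(name) is the offset at which that named piece matched
positions = {name: m.start(name) for name in ("a", "b", "c")}
print(positions)             # {'a': 0, 'b': 3, 'c': 7}
print(subject.find(m["c"]))  # 3: searching for the text finds b's copy instead
```

In .NET the equivalent information lives on each group's Index property, but the sketch above only claims the Python behaviour.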
Following your code, I'd do it like this; if you explain your real need, someone may be able to suggest another solution:
$a = "([\d]{1,2})"
$b = "([\d]{3})"
$c = $b
$d = "\s[-]\s"
$e = "[_]"
#select the pattern
$patternNum = 4
Switch ($patternNum){
#create a pattern
1 { $pattern = ('$a', '$e', '$b') }
2 { $pattern = ('$a', '$d', '$b') }
3 { $pattern = ('$a', '$d', '$a', '$e', '$b') }
4 { $pattern = ('$a', '$e', '$b', '$e', '$c') }
}
$selectedPattern = -join $pattern
$var1 = [array]::IndexOf($pattern, '$a') # [0]
$var2 = [array]::IndexOf($pattern, '$b') # [2]
$var3 = [array]::IndexOf($pattern, '$c') # [4]
#converting literal to your pattern
$regexpattern = $ExecutionContext.InvokeCommand.ExpandString( -JOIN $pattern )
$regexpattern
([\d]{1,2})[_]([\d]{3})[_]([\d]{3})
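The same "store the names, expand later" approach can be sketched in Python. The pattern is a list of variable names, so a plain list lookup finds each name's position (c is at 4, not 2), and the regex text is only produced when the names are joined. The REGEXES table is a hypothetical stand-in for the PowerShell variables.

```python
# Hypothetical name -> regex table standing in for $a .. $e
REGEXES = {
    "a": r"([\d]{1,2})",
    "b": r"([\d]{3})",
    "c": r"([\d]{3})",  # same text as b; the name keeps it distinct
    "d": r"\s[-]\s",
    "e": r"[_]",
}

pattern = ["a", "e", "b", "e", "c"]  # pattern 4, stored as names

# Position lookups work on names, so b and c no longer collide
var1 = pattern.index("a")
var2 = pattern.index("b")
var3 = pattern.index("c")

# Expand the names into the final regex only when it is needed
selected_pattern = "".join(REGEXES[name] for name in pattern)
print(var1, var2, var3)   # 0 2 4
print(selected_pattern)   # ([\d]{1,2})[_]([\d]{3})[_]([\d]{3})
```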
Suppose we have the following case:
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
I have a table "T1" in the database with the columns ("Name", "Address"), and I want to put the values "Value1, Value2, Value3" into the "Name" column and the values "Value4, Value5, Value6" into the "Address" column.
In this case we have:
my @matches = $str =~ /Name=(.*?)\nAdress=(.*?)\n/g;
How can we use $1 and $2 with @matches to get all occurrences of Name and Adress separately, so that we can insert them into table T1?
All captures of all matches are returned, so you'd have to group them up.
use List::Util 1.29 qw( pairs );
for ( pairs( $str =~ /Name=(.*) Adress=(.*)/g ) ) {
my @matches = @$_;
...
}
That said, it's far more common to grab the matches iteratively.
while ($str =~ /Name=(.*) Adress=(.*)/g) {
my @matches = ( $1, $2 );
...
}
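For comparison, here is a Python sketch of both approaches, assuming the one-record-per-line sample data with the question's "Adress" spelling. finditer walks the matches one at a time like the while loop, and the flat list of every capture can be regrouped into pairs much as List::Util's pairs does. The pattern uses \S* rather than .* so each capture stops at whitespace.

```python
import re

text = (
    "Name=Value1 Adress=Value4\n"
    "Name=Value2 Adress=Value5\n"
    "Name=Value3 Adress=Value6\n"
)
record = re.compile(r"Name=(\S*) Adress=(\S*)")

# Iterative: one match object per record, like while ($str =~ /.../g)
rows = [(m.group(1), m.group(2)) for m in record.finditer(text)]

# Flat: every capture of every match in one list, then regrouped in pairs
flat = [value for m in record.finditer(text) for value in m.groups()]
paired = list(zip(flat[0::2], flat[1::2]))

print(rows)  # [('Value1', 'Value4'), ('Value2', 'Value5'), ('Value3', 'Value6')]
```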
Regex is not always the right tool for the job. Your data looks a lot like it's just key/value pairs. Use split to break it up. No need for a pattern match here.
Your code and data don't match, so I've gone with what the code says.
use strict;
use warnings;
my $str = <<EO_STR;
Name=Value1
Adress=Value4
Name=Value2
Adress=Value5
Name=Value3
Adress=Value6
EO_STR
my $fields;
foreach my $pair (split /\n/, $str) {
my ($key, $value) = split /=/, $pair;
$key =~ s/^\s+//;
push @{ $fields->{$key} }, $value;
}
use Data::Dumper;
print Dumper $fields;
The code will create this data structure:
$VAR1 = {
'Name' => [
'Value1',
'Value2',
'Value3'
],
'Adress' => [
'Value4',
'Value5',
'Value6'
]
};
You can now access these two array references and use them to insert data into your table.
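Here is a Python sketch of the same split-based approach, using the code's one-pair-per-line data. collections.defaultdict(list) plays the role of Perl's autovivified array references, and zip then lines the two columns up into rows that could be fed to a parameterized INSERT. The database call itself is left out; only the table's column names come from the question.

```python
from collections import defaultdict

text = """Name=Value1
Adress=Value4
Name=Value2
Adress=Value5
Name=Value3
Adress=Value6
"""

fields = defaultdict(list)
for pair in text.splitlines():
    key, value = pair.split("=", 1)  # split only on the first '='
    fields[key.strip()].append(value)  # strip() mirrors s/^\s+//

# One (Name, Adress) tuple per table row
rows = list(zip(fields["Name"], fields["Adress"]))
print(dict(fields))
print(rows)
```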
I have done the following:
#!/usr/bin/env perl
use v5.28;
my $str = <<EO_STR;
Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
EO_STR
my @array;
for my $line (split(/\n/, $str)) {
my %res = $line =~ m/(\w+)=(\w+)/g;
push @array, \%res;
}
for my $res (@array) {
for my $key (sort keys %{$res}) {
say $key . '->' . $res->{$key};
}
}
It creates this structure:
@array = (
{
Name => 'Value1',
Adress => 'Value4'
},
...
);
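The per-line structure can be sketched in Python as well, again with the question's one-record-per-line sample. re.findall with two groups returns (key, value) tuples, and dict() turns each line's tuples into one dictionary, mirroring the list of hashes above.

```python
import re

text = """Name=Value1 Adress=Value4
Name=Value2 Adress=Value5
Name=Value3 Adress=Value6
"""

# One dict per line, like pushing \%res onto @array
records = [dict(re.findall(r"(\w+)=(\w+)", line)) for line in text.splitlines()]

for record in records:
    for key in sorted(record):
        print(f"{key}->{record[key]}")
```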
I am used to Perl, but I'm a Perl 6 newbie.
I want to store a regular expression in a string variable, as I would have done in Perl 5:
my $a = 'abababa';
my $b = '^aba';
if ($a =~ m/$b/) {
print "True\n";
} else {
print "False\n";
}
But if I do the same in Perl6 it doesn't work:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /^aba/; # True
say so $a ~~ /$b/; # False
I'm puzzled... What am I missing?
You need to take a closer look at Quoting Constructs.
In this case, wrap the variable in angle brackets, <$b>, or in <{ and }>, so that its contents are interpolated as regex syntax rather than matched as a literal string:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /<$b>/; # True, starts with aba
say so $a ~~ /<{$b}>/; # True, starts with aba
my $c = '<[0..5]>';
say so $a ~~ /<$c>/; # False, no digits 0 to 5 in $a
say so $a ~~ /<{$c}>/; # False, no digits 0 to 5 in $a
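Python's re draws the line the other way round, which may help explain the Raku behaviour. A string handed to re.search is always parsed as regex syntax, much like Raku's <$b>, and you have to call re.escape to get the literal-string matching that a plain /$b/ gives you in Raku. A minimal sketch:

```python
import re

a = "abababa"
b = "^aba"

as_regex = re.search(b, a) is not None               # like /<$b>/: anchor plus "aba"
as_literal = re.search(re.escape(b), a) is not None  # like /$b/: the text "^aba"
digits = re.search("[0-5]", a) is not None           # like /<$c>/ with <[0..5]>

print(as_regex, as_literal, digits)  # True False False
```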
It is a different story when you need to pass a variable into a ** quantifier. That is where you use only braces:
my $ok = "12345678";
my $not_ok = "1234567";
my $min = 8;
say so $ok ~~ / ^ \d ** {$min .. *} $ /; # True, the string consists of 8 or more digits
say so $not_ok ~~ / ^ \d ** {$min .. *} $ /; # False, there are 7 digits only
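In Python there is no quantifier block, so the usual equivalent is to build the bound into the pattern text. A small sketch with an f-string, where the doubled braces produce literal { and }:

```python
import re

ok = "12345678"
not_ok = "1234567"
min_digits = 8

# Builds ^\d{8,}$ : min_digits or more digits and nothing else
pattern = rf"^\d{{{min_digits},}}$"

print(pattern)                           # ^\d{8,}$
print(bool(re.search(pattern, ok)))      # True
print(bool(re.search(pattern, not_ok)))  # False
```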
Is there a reason why you don't use a regex object for this kind of thing?
my $a = 'abababa';
my $b = rx/^aba/;
say so $a ~~ /^aba/; # True
say so $a ~~ $b; # True
I'm trying to do regex matching dynamically in Perl. I know that eval can help me do this, but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", #_[3,1,2] },
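Python's re.sub accepts either a template string or a callable as the replacement in much the same way, so the dynamic substitution can be sketched like this. The pattern and input come from the question; the callback name to_iso is hypothetical.

```python
import re

s = "1/12/2016"

# Replacement as a template: \1\2\3 refer to the capture groups
templated = re.sub(r"(\d+)/(\d+)/(\d+)", r"\1\2\3", s)


def to_iso(m):
    """Callback replacement: reorder and zero-pad the captured fields."""
    first, second, year = (int(g) for g in m.groups())
    return "%d-%02d-%02d" % (year, first, second)


iso = re.sub(r"(\d+)/(\d+)/(\d+)", to_iso, s)
print(templated)  # 1122016
print(iso)        # 2016-01-12
```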
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
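For comparison, Python's Match.expand covers the "interpolate the match variables into a template" step. A minimal sketch, with a hypothetical dict standing in for %testHash and \1-style references in place of $1:

```python
import re

# Hypothetical Python stand-in for %testHash: pattern -> template
test_hash = {r"(\d+)/(\d+)/(\d+)": r"\1\2\3"}
s = "1/12/2016"

results = []
for pattern, template in test_hash.items():
    m = re.search(pattern, s)
    if m is None:
        raise SystemExit("No match!")
    # Match.expand fills \1, \2, ... from this match's groups
    results.append(m.expand(template))

print(results)  # ['1122016']
```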
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are using eval with a BLOCK of code. That's like a try block: if the code inside the block dies, eval catches the error. It will not run your string as if it were code; you need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf with numbered parameters. The %1$s syntax in the format string means "take the first argument (1$) and format it as a string (%s)". You don't need to localize or assign to $_ to do a match; the =~ operator binds the match to any other variable for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because {} is the delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my @captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (@captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, @captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check whether @captures has any elements by looking at it in scalar (boolean) context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check this by looking at the index of the last element ($# is the last-index sigil for arrays that usually have an @ sigil) of @+, which holds the offsets of the ends of the submatches of the last successful match in the currently active dynamic scope. Its first element is the end of the overall match, so if it has only one element, there are no capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order of the output is mixed up. That's because Perl hashes are unordered: if you iterate over a hash's keys without sort, the order is effectively random.
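A Python sketch of the same four cases may help by contrast. A compiled pattern reports its number of capture groups in its groups attribute, and str.format's positional fields ({2}, {1:0>2}) play the role of sprintf's %3$s-style argument indices. Python dicts keep insertion order, so unlike the Perl hash this runs in a stable order.

```python
import re

tests = {
    r"(\d+)/(\d+)/(\d+)": "{0}.{1}.{2}",
    r"(\d+)/(\d+)/(\d+) nomatch": "{0}.{1}.{2}",
    r"(\d+)/(\d+)/(\d\d\d\d)": "{2:>4}-{1:0>2}-{0:0>2}",  # reordered, zero-padded
    r"\d": "{0}",  # no capture group
}
s = "1/12/2016"

results = {}
for pattern, fmt in tests.items():
    regex = re.compile(pattern)
    if regex.groups == 0:  # number of capturing groups in the pattern
        results[pattern] = "no capture groups"
        continue
    m = regex.search(s)
    if m is None:
        results[pattern] = "no match"
        continue
    results[pattern] = fmt.format(*m.groups())

for pattern, outcome in results.items():
    print(f"{pattern}: {outcome}")
```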
Apologies! I realize that both my question and my sample code were vague. After reading your suggestions, I came up with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my @matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < @matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
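The $1 to $9 limit comes from replacing one placeholder at a time: s/\$1/.../ also hits the start of $10. One way around it, sketched here in Python, is to substitute every placeholder in a single pass with a callback, so that \$(\d+) always consumes the whole number. fill_template is a hypothetical helper name.

```python
import re


def fill_template(template, captures):
    """Replace every $N in template with captures[N-1] in a single pass."""
    return re.sub(r"\$(\d+)", lambda m: captures[int(m.group(1)) - 1], template)


m = re.search(r"(\d+)/(\d+)/(\d+)", "1/12/2016")
print(fill_template("$1$2$3", m.groups()))  # 1122016

# Ten captures: $10 is read as ten, not as $1 followed by "0"
ten = re.search("(.)" * 10, "abcdefghij").groups()
print(fill_template("$10-$1", ten))  # j-a
```

A side benefit of returning a new string is that the stored template is not overwritten, whereas the Perl loop above modifies $testHash{$key} in place.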
Evaluating the regexp in list context returns the captures, so in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my @a = ($str =~ /$pattern/);
print Dumper(\@a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my @patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( @patterns ) {
my @captured = ($str =~ $pattern)
or next;
print "'$_'\n" for @captured;
}
Output:
'1'
'12'
'2016'
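The Python counterpart is Match.groups(), which returns all captures of a match as a tuple. A sketch with a list of compiled patterns, mirroring @patterns:

```python
import re

patterns = [re.compile(p) for p in [r"(\d+)/(\d+)/(\d+)"]]
s = "1/12/2016"

captured_all = []
for pattern in patterns:
    m = pattern.search(s)
    if m is None:
        continue  # like "or next" on a failed match
    captured = list(m.groups())
    captured_all.append(captured)
    for value in captured:
        print(f"'{value}'")
```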
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my @tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern})
or next;
unless (@captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my @tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern});
ok @captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1
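In Python no trick match is needed, because a compiled pattern reports its group count directly in its groups attribute. A sketch of the same check, written with plain prints instead of a test framework:

```python
import re

s = "1/12/2016"
patterns = [r"(\d+)/(\d+)/(\d+)"]

tests = []
for p in patterns:
    regex = re.compile(p)
    # .groups counts capturing groups only; (?:...) is not counted
    tests.append({"pattern": regex, "ncaptures": regex.groups})

for n, test in enumerate(tests, start=1):
    m = test["pattern"].search(s)
    ok = m is not None and len(m.groups()) == test["ncaptures"]
    print("ok" if ok else "not ok", n)
```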
I read a file into an array, one line per element. Some lines in this file contain a duplicate value, identified by a regular expression, and when the regular expression matches I want to count the duplicates. The regular expression looks like this:
$b =~ /\/([^\/]+)##/. I want to compare the $1 values.
my @array = do
{
open my $FH, '<', 'abc.txt' or die "unable to open the file: $!\n";
<$FH>;
};
Below is how I do it; it finds the same line in my file. Thanks for any help.
foreach my $b (@array)
{
$conflictTemp = 0;
$b =~ /\/([^\/]+)##/;
$b = $1;
#print "$b\n";
foreach my $c (@array)
{
$c =~ /\/([^\/]+)##/;
$c = $1;
if($b eq $c)
{
$conflictTemp ++;
#print "$b , $c \n"
#if($conflictTemp > 1)
#{
# $conflict ++;
#}
}
}
}
Below is some sample data; two pairs of lines are duplicates:
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
my ( $val ) = /\/([^\/]+)##/;
$count_of{$val}++;
}
And then, to list the values that occur more than once (i.e. the duplicates):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
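The same counting idea can be sketched in Python with collections.Counter, here over a few of the sample lines. Lines where the pattern does not match are skipped rather than counted.

```python
import re
from collections import Counter

lines = [
    "/a/b/c/d/code/.cproject##/main/CSS5/10",
    "/a/b/c/d/code/.cdtproject##/main/place.09/0",
    "/a/b/c/d/code/.cdtproject##/main/place.09/0",
    "/a/b/c/d/code/.cdtbuild_initial##/main/2",
    "/a/b/c/d/code/.cdtbuild##/main/CSS5/2",
    "/a/b/c/d/code/.cdtbuild##/main/CSS5/2",
]
name_re = re.compile(r"/([^/]+)##")

count_of = Counter()
for line in lines:
    m = name_re.search(line)
    if m:
        count_of[m.group(1)] += 1

# Values seen more than once, in first-seen order
dupes = [name for name, n in count_of.items() if n > 1]
print("\n".join(dupes))
```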
Alternatively, if you just want to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
my ( $value ) = m/$match/ or next;
print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
The problem has been solved by the previous answer; I just want to offer an alternative flavour that:
Spells out the regex
Uses a hash (%line_num) to record the line on which each value first appears, to enable slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
\/ # A literal slash followed by
( # Capture to $1 ...
[^\/]+ # ... anything that's not a slash
) # close capture to $1
\#\# # Must be immediately followed by a literal ## (escaped, because # starts a comment here)
/x;
my %line_num ;
while (<>) {
next unless /$regex/ ;
my $pattern = $1 ;
if ( $line_num{ $pattern } ) {
say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
next ;
}
$line_num{ $pattern } = $. ; # Record the line number
}
# Run on the data above, this produces:
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11
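Python's re.VERBOSE follows the same rule as Perl's /x: an unescaped # starts a comment, so a literal ## has to be written as \#\#. A sketch of the first-seen-line bookkeeping over the twelve sample lines, numbering lines from 1 like Perl's $.:

```python
import re

regex = re.compile(
    r"""
    /           # a literal slash followed by
    ( [^/]+ )   # capture: anything that's not a slash
    \#\#        # immediately followed by a literal ##
    """,
    re.VERBOSE,
)

lines = [
    "/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2",
    "/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0",
    "/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4",
    "/a/b/c/d/code/.project_initial##/main/2",
    "/a/b/c/d/code/.project##/main/CSS5/5",
    "/a/b/c/d/code/.cproject##/main/CSS5/10",
    "/a/b/c/d/code/.cdtproject##/main/place.09/0",
    "/a/b/c/d/code/.cdtproject##/main/place.09/0",
    "/a/b/c/d/code/.cdtbuild_initial##/main/2",
    "/a/b/c/d/code/.cdtbuild##/main/CSS5/2",
    "/a/b/c/d/code/.cdtbuild##/main/CSS5/2",
    "/a/b/c/d/code/.ccsproject##/main/CSS5/3",
]

line_num = {}
reports = []
for num, line in enumerate(lines, start=1):
    m = regex.search(line)
    if not m:
        continue
    name = m.group(1)
    if name in line_num:
        reports.append(f"'{name}' appears on lines {line_num[name]} and {num}")
        continue
    line_num[name] = num  # record the first line number

print("\n".join(reports))
```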
I would like to assign only the first capture group back to the same variable. In fact, I am looking for a short form of:
$_ = $1 if m/$prefix($pattern)$suffix/;
Something like:
s/$prefix($pattern)$suffix/$1/a; ## Where a is the option I am looking for
Or even better:
k/$prefix($pattern)$suffix/; ## Where k is also an option I wish I can use...
This would avoid having to match the whole text, which leads to a more complicated line:
s/^.*$prefix($pattern)$suffix.*$/defined $1 ? $1 : ""/e;
Any clues?
This would be useful for this example:
push @array, {id => k/.*\s* = \s* '([^']+)'.*/};
instead of
/.*\s* = \s* '([^']+)'.*/;
my $id = '';
$id = $1 if $1;
push @array, {id => $id};
Edit:
I just found an interesting way, but if $1 is not defined I will get an error :(
$_ = (/$prefix($pattern)$suffix/)[0];
Use the conditional operator
my $var = /$prefix($pattern)$suffix/ ? $1 : '';
You always want to make sure that your regex matched before using a capture group. With the ternary you can either supply a default value or warn that no match was found.
Alternatively, you can use the list form of capture groups inside an if statement, and let your else output the warning:
if (my ($var) = /$prefix($pattern)$suffix/) {
...;
} else {
warn "Unable to find a match";
}
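The same "capture or default" pattern in Python is a conditional expression on the match object. A minimal sketch; first_capture is a hypothetical helper and the input lines are made up.

```python
import re


def first_capture(pattern, text, default=""):
    """Hypothetical helper: group 1 of the first match, or default if none."""
    m = re.search(pattern, text)
    return m.group(1) if m else default


# Made-up input lines; the second one has no id assignment at all
lines = ["id = 'abc123'", "no id here"]
array = [{"id": first_capture(r"\s*=\s*'([^']+)'", line)} for line in lines]
print(array)  # [{'id': 'abc123'}, {'id': ''}]
```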
You can use the /r modifier to return the altered string instead of modifying the variable in place. There is no need to capture anything at all with that: just strip the prefix and the suffix and add the result to your array.
use Data::Dump;
my @strings = qw( prefixcontent1suffix prefixcontent2suffix );
my @array = map { s/^prefix|suffix$//gr } @strings;
dd @array;
__END__
("content1", "content2")
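Python has no /r modifier, but re.sub always returns a new string and leaves its input untouched, so the same strip-instead-of-capture idea looks like this. The alternation ^prefix|suffix$ parses as (^prefix)|(suffix$) in both languages.

```python
import re

strings = ["prefixcontent1suffix", "prefixcontent2suffix"]

# re.sub returns a new string, much like s///r in Perl
stripped = [re.sub(r"^prefix|suffix$", "", s) for s in strings]
print(stripped)  # ['content1', 'content2']
```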
If you want it to be configurable, how about this:
my $prefix = qr/.+\{\{/;
my $suffix = qr/\}\}.+/;
my @strings = ( '{fo}o-_#09{{content1}}bar42' );
my @array = map { s/^$prefix|$suffix$//gr } @strings;
dd @array;
__END__
"content1"
In list context, the m// operator returns the captures as a list. This means you can do this:
($_) = m/$prefix($pattern)$suffix/;
or this:
my ($key, $value) = $line =~ m/^([^=]+)=([^=]+)$/;
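The Python analogue of list-context assignment is tuple unpacking from Match.groups(). A sketch with made-up inputs; the conditional supplies None values when there is no match, roughly as Perl leaves the variables undef.

```python
import re

line = "colour=blue"  # made-up input
m = re.search(r"^([^=]+)=([^=]+)$", line)
key, value = m.groups() if m else (None, None)

# Single capture: note the trailing comma to unpack a one-element tuple
m2 = re.search(r"prefix(\w+?)suffix", "xxprefixabcsuffixyy")
(captured,) = m2.groups() if m2 else (None,)

print(key, value, captured)  # colour blue abc
```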