So far I have a text config file that has variables in it like this:
foo=something
bar=${foo}
When processing it in Perl, I replace the variables with a regex like this:
s/(?<!\\)\$\{\s*([^\}\s]+(\s+[^\}\s]+)*)\s*\}/_replacement( $1 )/ge;
where _replacement() knows how to look up existing values.
Now I'd like to be able to extend the syntax of the text file so I can do this recursively, like this:
platform=x86
foo_x86=somepc
foo_arm=somedevice
bar=${foo_${platform}}
where when processing the last line, "${platform}" first gets replaced with "x86", and the resulting "${foo_x86}" then gets replaced with "somepc".
I'm having trouble with extending my regex: I can't quite seem to find a way to match the } with the correct {. (Greedy won't work because a string might be "${a}${b}", and non-greedy will mismatch.)
Ideas?
sub expand {
my ($vars, $s) = #_;
$s =~ s{
(
\$\{
(
(?> [^\$\}]
| \$ (?! \{ )
| (?1)
)++
)
\}
)
}{
my $expanded = expand($vars, $2);
$vars->{$expanded} // "\${$expanded}" # /
}xeg;
return $s;
}
my $vars = {};
while (<DATA>) {
chomp;
my ($key, $val) = split(/=/, $_, 2);
$vars->{$key} = expand($vars, $val);
}
print($vars->{bar}, "\n");
__DATA__
platform=x86
foo_x86=somepc
foo_arm=somedevice
bar=${foo_${platform}}
Your input file is very similar to the format supported by AppConfig which would also let's you extend the same syntax to your command line or other inputs.
use AppConfig;
# create a new AppConfig object
my $config = AppConfig->new();
# define a variable
$config->set( 'foo', 'bar' );
# read configuration file
my $filename = 'your.conf';
$config->file( $filename );
Related
I open a file by putting the line to an array. Inside this file based on the regular expression that contains a duplicate value. If the regular expression is a match I want to count it. The regular expression may look like this
$b =~ /\/([^\/]+)##/. I want to match $1 value.
my #array = do
{
open my $FH, '<', 'abc.txt' or die 'unable to open the file\n';
<$FH>;
};
Below is the way I do, it will get the same line in my file. Thank for help.
foreach my $b (#array)
{
$conflictTemp = 0;
$b =~ /\/([^\/]+)##/;
$b = $1;
#print "$b\n";
foreach my $c (#array)
{
$c =~ /\/([^\/]+)##/;
$c = $1;
if($b eq $c)
{
$conflictTemp ++;
#print "$b , $c \n"
#if($conflictTemp > 1)
#{
# $conflict ++;
#}
}
}
}
Below is the some sample data, two sentences are duplicates
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.**cdtbuild##**/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
my ( $val ) = /\/([^\/]+)##/;
$count_of{$val}++;
}
And then, for the variables that have more than one (e.g. there's a duplicate):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
Alternatively, if you're just wanting to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
my ( $value ) = m/$match/ or next;
print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
The problem has been solved by the previous answer - I just want to offer an alternate flavour that;
Spells out the regex
Uses the %seen hash to record the line the pattern first appears; to enable
slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
\/ # A literal slash followed by
( # Capture to $1 ...
[^\/]+ # ... anything that's not a slash
) # close capture to $1
## # Must be immdiately followed by literal ##
/x;
my %line_num ;
while (<>) {
next unless /$regex/ ;
my $pattern = $1 ;
if ( $line_num{ $pattern } ) {
say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
next ;
}
$line_num{ $pattern } = $. ; # Record the line number
}
# Ran on data above will produce;
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11
I would like to only get the first capture group into the same var. In fact, I am looking for a short form of:
$_ = $1 if m/$prefix($pattern)$suffix/;
Something like:
s/$prefix($pattern)$suffix/$1/a; ## Where a is the option I am looking for
Or even better:
k/$prefix($pattern)$suffix/; ## Where k is also an option I wish I can use...
This will avoid the need of matching all the text which leads to a more complicated line:
s/^.*$prefix($pattern)$suffix.*$/defined $1 ? $1 : ""/e;
Any clues?
This would be useful for this example:
push #array, {id => k/.*\s* = \s* '([^']+)'.*/};
instead of
/.*\s* = \s* '([^']+)'.*/;
my $id = '';
$id = $1 if $1;
push #array, {id => $id};
Edit:
I just found an interesting way, but if $1 is not defined I will get an error :(
$_ = (/$prefix($pattern)$suffix/)[0];
Use a Conditional operator
my $var = /$prefix($pattern)$suffix/ ? $1 : '';
You always want to make sure that you regex matches before using a capture group. By using a ternary you can either specify a default value or you can warn that a match wasn't found.
Alternatively, you can use the list form of capture groups inside an if statement, and let your else output the warning:
if (my ($var) = /$prefix($pattern)$suffix/) {
...;
} else {
warn "Unable to find a match";
}
You can use the /r switch to return the altered string instead of doing the substitution on the variable. There is no need to capture anything at all with that. Just get rid of the prefix and the suffix and add the result of that operation to your array.
use Data::Dump;
my #strings = qw( prefixcontent1suffix prefixcontent2suffix );
my #array = map { s/^prefix|suffix$//gr } #strings;
dd #array;
__END__
("content1", "content2")
If you want it to be configurable, how about this:
my $prefix = qr/.+\{\{/;
my $suffix = qr/\}\}.+/;
my #strings = ( '{fo}o-_#09{{content1}}bar42' );
my #array = map { s/^$prefix|$suffix$//gr } #strings;
dd #array;
__END__
"content1"
In list context, the m// operator returns the captures as a list. This means you can do this:
($_) = m/$prefix($pattern)$suffix/;
or this:
my ($key, $value) = $line =~ m/^([^=]+)=([^=]+)$/;
I'm parsing a CSV file with embedded commas, and obviously, using split() has a few limitations due to this.
One thing I should note is that the values with embedded commas are surrounded by parentheses, double quotes, or both...
for example:
(Date, Notional),
"Date, Notional",
"(Date, Notional)"
Also, I'm trying to do this without using any modules for certain reasons I don't want to go into right now...
Can anyone help me out with this?
This should do what you need. It works in a very similar way to the code in Text::CSV_PP, but doesn't allow for escaped characters within the field as you say you have none
use strict;
use warnings;
use 5.010;
my $re = qr/(?| "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = "$line," =~ /$re/g;
say "<$_>" for #fields;
output
<Date, Notional 1>
<Date, Notional 2>
<Date, Notional 3>
Update
Here's a version for older Perls (prior to version 10) that don't have the regex branch reset construct. It produces identical output to the above
use strict;
use warnings;
use 5.010;
my $re = qr/(?: "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = grep defined, "$line," =~ /$re/g;
say "<$_>" for #fields;
I know you already have a working solution with Borodin's answer, but for the record there is also a simple solution with split (see the results at the bottom of the online demo). This situation sounds very similar to regex match a pattern unless....
#!/usr/bin/perl
$regex = '(?:\([^\)]*\)|"[^"]*")(*SKIP)(*F)|\s*,\s*';
$subject = '(Date, Notional), "Date, Notional", "(Date, Notional)"';
#splits = split($regex, $subject);
print "\n*** Splits ***\n";
foreach(#splits) { print "$_\n"; }
How it Works
The left side of the alternation | matches complete (parentheses) and (quotes), then deliberately fails. The right side matches commas, and we know they are the right commas because they were not matched by the expression on the left.
Possible Refinements
If desired, the parenthess-matching portion could be made recursive to match (nested(parens))
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
I know that this is quite old question, but for completeness I would like to add solution from great book "Mastering Regular Expressions" by Jeffrey Friedl (page 271):
sub parse_csv {
my $text = shift; # record containing comma-separated values
my #fields = ( );
my $field;
chomp($text);
while ($text =~ m{\G(?:^|,)(?:"((?>[^"]*)(?:""[^"]*)*)"|([^",]*))}gx) {
if (defined $2) {
$field = $2;
} else {
$field = $1;
$field =~ s/""/"/g;
}
# print "[$field]";
push #fields, $field;
}
return #fields;
}
Try it against test row:
my $line = q(Ten Thousand,10000, 2710 ,,"10,000",,"It's ""10 Grand"", baby",10K);
my #fields = parse_csv($line);
my $i;
for ($i = 0; $i < #fields; $i++) {
print "$fields[$i],";
}
print "\n";
I'd like to have a regular expression to match a separated values with some protected values that can contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I already coded that feature with a character iteration as follow:
sub Parse
{
my #item;
my $curly;
my $string;
foreach(split //)
{
$_ eq "{" and ++$curly;
$_ eq "}" and --$curly;
if(!$curly && /[,:]/)
{
push #item, $string;
undef $string;
next;
}
$string .= $_;
}
push #item, $string;
return #item;
}
But it would definitively be so much nicer with a regexp.
A regex that supports nesting would look as follows:
my #items;
push #items, $1 while
/
(?: ^ | \G , )
(
(?: [^,{}]+
| (
\{
(?: [^{}]
| (?2)
)*
\}
)
| # Empty
)
)
/xg;
$ perl -E'$_ = shift; ... say for #items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)
Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G
$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
push (#res, $1);
}
print "\#res: #res\n"
Result:
#res: A B {C,D,E} F
Explanation : we try to match either the protected block \{[\{\}\w,]+\} or just a single character \w successively in a loop, deleting it from the original string if there is a match. Every time there is a match, we store it (meaning the $1) in the array, et voilĂ !
Here is a regex in bash:
chronos#localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F
Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
There is an assumption about on how the regex engine works here (I assume that it will try to match the first part \{.*?\} before the second part). I also assume that there are no nested curly bracket, and badly paired curly brackets.
$s = "A,B,{C,D,E},F";
#t = split /,(?=.*{)|,(?!.*})/, $s;
I'm currently extracting values from a table within a file via REGEX line matches against the table rows.
foreach my $line (split("\n", $file)) {
if ($line =~ /^(\S+)\s*(\S+)\s*(\S+)$/) {
my ($val1, $val2, $val3) = ($1, $2, $3);
# $val's used here
}
}
I purposely assign vals for clarity in the code. Some of my table rows contain 10+ vals (aka columns) - is there a more efficient method of assigning the vals instead of doing ... = ($1, $2, ..., $n)?
A match in list context yields a list of the capture groups. If it fails, it returns an empty list, which is false. You can therefore
if( my ( $val1, $val2, $val3 ) = $line =~ m/^(\S+)\s*(\S+)\s*(\S+)$/ ) {
...
}
However, a number of red flags are apparent in this code. That regexp capture looks very similar to a split:
if( my ( $val2, $val2, $val3 ) = split ' ', $line ) {
...
}
Secondly, why split $file by linefeeds; if you are reading the contents of a file, far nicer is to actually read a single line at once:
while( my $line = <$fh> ) {
...
}
I assume that this is not your actual code, because if so, it will not work:
foreach my $line (split("\n", $file)) {
if ($line =~ /^(\S+)\s*(\S+)\s*(\S+)$/) {
my ($val1, $val2, $val3) = ($1, $2, $3);
}
# all the $valX variables are now out of scope
}
You should also be aware that \s* will also match the empty string, and may cause subtle errors. For example:
"a bug" =~ /^(\S+)\s*(\S+)\s*(\S+)$/;
# the captures are now: $1 = "a"; $2 = "bu"; $3 = "g"
Even despite the fact that \S+ is greedy, the anchors ^ ... $ will force the regex to fit, hence allowing the empty strings to split the words.
If your intention is to capture all the words that are separated by whitespace, using split is your best option, as others have already mentioned.
open my $fh, "<", "file.txt" or die $!;
my #stored;
while (<$fh>) {
my #vals = split;
push(#stored, \#vals) if #vals; # ignore empty values
}
This will store any captured values into a two-dimensional array. Using the file handle directly and reading line-by-line is the preferred method, unless for some reason you actually need to have the entire file in memory.
Looks like you are just using a table with a space delimiter.You can use the split function:
#valuearray = split(" ", $line)
And then address the elements as:
#valuearray[0] ,#valuearray[1] etc..