my $str = "<SampleElement oldattribs=\"sa1 sa2 sa3\">";
$str =~ s#<SampleElement[^>]*oldattribs="([^"]*)"#
my $fulcnt=$&;
my $afids=$1;
my @affs = ();
if($afids =~ m/\s+/) {
@affs = split /\s/, $afids;
my $jnafs = join ",", map { $_=~s/[a-z]*//i, } @affs;
($fulcnt." newattribs=\"$jnafs\"");
}
else {
($fulcnt);
}
#eg;
My Output:
<SampleElement oldattribs="sa1 sa2 sa3" newattribs="1,1,1">
Expected Output:
<SampleElement oldattribs="sa1 sa2 sa3" newattribs="1,2,3">
Could someone point out where I am going wrong? Thanks in advance.
Where you're going wrong is earlier than you think - you're parsing XML using regular expressions. XML is contextual, and regex isn't, so it's NEVER going to be better than a dirty hack.
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> parse ( \*DATA );
my $sample_elt = $twig -> get_xpath('//SampleElement',0);
my @old_att = split ( ' ', $sample_elt -> att('oldattribs') );
$sample_elt -> set_att('newattribs', join ",", map { /(\d+)/ } @old_att);
$twig -> set_pretty_print ( 'indented_a' );
$twig -> print;
__DATA__
<XML>
<SampleElement oldattribs="sa1 sa2 sa3">
</SampleElement>
</XML>
But to answer the core of your problem - you're misusing map as an iterator here.
map { $_=~s/[a-z]*//i, } @affs;
That iterates over all the elements of @affs and modifies them in place, but map just returns the result of the expression. s/// returns the number of substitutions made, which is 1 each time because the substitution succeeded.
If you want to change @affs, you'd write:
s/[a-z]*//i for @affs;
But if you don't want to modify the array, the easy answer is to use the r regex flag (Perl 5.14 and later):
map { s/[a-z]*//ir } @affs;
Or as I've done in my example:
map { /(\d+)/ } @affs;
That matches and captures the numeric part of each string, and because a match in list context returns its captures, the captured text is what map returns.
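To make the difference concrete, here is a minimal sketch that runs the three map variants side by side. The array contents come from the question; the variable names @copy, @counts, @stripped and @digits are only illustrative.

```perl
use strict;
use warnings;

my @affs = qw(sa1 sa2 sa3);

# Plain s/// returns the number of substitutions made, so map collects 1s.
# It also edits the aliased elements, so run it on a copy.
my @copy   = @affs;
my @counts = map { s/[a-z]*//i } @copy;

# With /r (Perl 5.14+) the modified string is returned and $_ is left alone.
my @stripped = map { s/[a-z]*//ir } @affs;

# A match with a capture group returns the captured text in list context.
my @digits = map { /(\d+)/ } @affs;

print join(',', @counts),   "\n";   # 1,1,1
print join(',', @stripped), "\n";   # 1,2,3
print join(',', @digits),   "\n";   # 1,2,3
```

The first line of output reproduces the "1,1,1" from the question, while the other two give the expected "1,2,3".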
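Here is a simple way to build the shown output from the input $str.
Note: the input here is in single quotes, not double quotes, so each \" stays in the string as a literal backslash followed by a quote, and the regex below matches those characters directly.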
my $str = '<SampleElement oldattribs=\"sa1 sa2 sa3\">';
# Pull 'sa1 sa2 sa3' string out of it
my ($attrs) = $str =~ /=\\"([^\\]+)/; # " # (turn off bad syntax highlight)
# Build '1,2,3' string from it
my $indices = join ',', map { /(\d+)/ } split ' ', $attrs;
# Extract content between < > so to add to it, put it back together
my ($content) = $str =~ /<(.*)>/;
my $output = '<' . $content . " newattribs=\"$indices\"" . '>';
This gives the required output.
Some of these can be combined into single statements, if you are into that. For example
my $indices =
join ',', map { /(\d+)/ } split ' ', ($str =~ /"([^\\]+)/)[0]; # "
$str =~ s/<(.*)>/<$1 newattribs=\"$indices\">/;
All of this can be rolled into one regex, but it becomes just unwieldy and hard to maintain.
Above all – this appears to be XML or such ... please don't do it by hand, unless there is literally just a snippet or two. There are excellent parsers.
I found a solution by searching for the map function:
my $str = "<SampleElement oldattribs=\"sa1 sa2 sa3\">";
$str=~s#<SampleElement[^>]*oldattribs="([^"]*)"#my $fulcnt=$&; my $afids=$1;
my @affs = ();
if($afids=~m/\s+/)
{
@affs = split /\s/, $afids;
my @newas = join ",", map { (my $foo = $_) =~ s/[a-z]*//i; $foo; } @affs ;
($fulcnt." newattribs=\"@newas\"");
}
else
{
($fulcnt);
}
#eg;
I updated the following line in my code:
my @newas = join ",", map { (my $foo = $_) =~ s/[a-z]*//i; $foo; } @affs ;
Instead of
my $jnafs = join ",", map { $_=~s/[a-z]*//i, } @affs;
It's working now. Thanks to all.
I have this string:
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
The data is repeated.
I need to remove the []' characters from the data so it looks like this:
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
I'm also trying to split the data to assign it to variables as seen below:
my($currency, $strike, $tenor, $tenor2,$ado_symbol) = split /,/, $_;
This works for everything but the ['TEST'] section. Should I remove the []' characters first then keep my split the same or is there an easier way to do this?
Thanks
Something that's useful to know is that split takes a regex. (It will even let you capture, but the captured text is inserted into the returned list, which is why I've used (?: ... ) non-capturing groups.)
I observe that your data only has [' and '] right next to the delimiter, so how about:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
chomp;
my @fields = split /(?:\'])?,(?:\[\')?/;
print Dumper \@fields;
}
__DATA__
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
Output:
$VAR1 = [
'ABC',
'-0.5',
'10Y',
'10Y',
'TEST',
'ABC.1000145721ABC',
'-0.5',
'20Y',
'10Y',
'TEST',
'ABC.1000145722'
];
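As a small illustration of the remark about capturing groups in split, the sketch below splits the same short line twice. The sample $line and the array names are made up for this example.

```perl
use strict;
use warnings;

my $line = "a,['b'],c";

# Non-capturing groups: the optional '] and [' are consumed with the comma.
my @plain = split /(?:'\])?,(?:\[')?/, $line;

# Capturing groups: split also returns what each group matched
# (or undef when the group did not take part in the match).
my @captured = split /('\])?,(\[')?/, $line;

print scalar(@plain),    " fields: @plain\n";             # 3 fields: a b c
print scalar(@captured), " fields with capturing groups\n";  # 7
```

With capturing groups, every delimiter contributes two extra list elements, so the fields no longer line up with the variables they are assigned to.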
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
$str =~ s/\['|'\]//g;
print $str;
output is
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
Now you can split.
Clean up $ado_symbol after split:
$ado_symbol =~ s/^\['//;
$ado_symbol =~ s/'\]$//;
You can use a global regex match to find all substrings that contain no commas, single quotes, or square brackets, like this:
use strict;
use warnings 'all';
my $s = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};
my @data = $s =~ /[^,'\[\]]+/g;
my ( $currency, $strike, $tenor, $tenor2, $ado_symbol ) = @data;
print "\$currency = $currency\n";
print "\$strike = $strike\n";
print "\$tenor = $tenor\n";
print "\$tenor2 = $tenor2\n";
print "\$ado_symbol = $ado_symbol\n";
output
$currency = ABC
$strike = -0.5
$tenor = 10Y
$tenor2 = 10Y
$ado_symbol = TEST
Another alternative
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
my ($currency, $strike, $tenor, $tenor2,$ado_symbol) = map{ s/[^A-Z0-9\.-]//g; $_} split ',',$str;
print "$currency, $strike, $tenor, $tenor2, $ado_symbol",$/;
Output is:
ABC, -0.5, 10Y, 10Y, TEST
I'm just starting to learn Perl. I need to parse a JavaScript file, and I came up with the following subroutine to do it:
sub __settings {
my ($_s) = @_;
my $f = $config_directory . "/authentic-theme/settings.js";
if ( -r $f ) {
for (
split(
'\n',
$s = do {
local $/ = undef;
open my $fh, "<", $f;
<$fh>;
}
)
)
{
if ( index( $_, '//' ) == -1
&& ( my @m = $_ =~ /(?:$_s\s*=\s*(.*))/g ) )
{
my $m = join( '\n', @m );
$m =~ s/[\'\;]//g;
return $m;
}
}
}
}
I have the following regex, which removes ' and ; from the string:
s/[\'\;]//g;
It works, but if those characters (' and ;) also appear elsewhere in the string, they are removed as well. This is undesirable, and that's where I'm stuck: I'm not sure how to change the regex above so that it will only:
Remove only first ' in string
Remove only last ' in string
Remove only last ; in string if exists
Any help, please?
You can use the following to match:
^'|';?$|;$
And replace with '' (empty string)
Remove only first ' in string
Remove only last ' in string
^[^']*\K'|'(?=[^']*$)
Try this. See the demo:
https://regex101.com/r/oF9hR9/8
Remove only last ; in string if exists
;(?=[^;]*$)
Try this. See the demo:
https://regex101.com/r/oF9hR9/9
All three in one:
^[^']*\K'|'(?=[^']*$)|;(?=[^;]*$)
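For reference, here is a minimal sketch of how the patterns suggested so far behave in Perl. The sample $value is invented for illustration; the two substitutions are the anchored pattern and the lookahead pattern from the answers above.

```perl
use strict;
use warnings;

# Hypothetical settings value with quotes and a semicolon inside it.
my $value = "'string; 'inside' another';";

# Anchored version: a leading ', then a trailing ' with an optional ;.
(my $anchored = $value) =~ s/^'|';?$|;$//g;

# Lookahead version: first ', last ', and last ; wherever they occur.
(my $lookahead = $value) =~ s/^[^']*\K'|'(?=[^']*$)|;(?=[^;]*$)//g;

print "$anchored\n";    # string; 'inside' another
print "$lookahead\n";   # string; 'inside' another
```

The two agree on this input. They differ when the first or last quote is not at the very edge of the string: the anchored version then leaves it alone, while the lookahead version still removes it.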
You can use this code:
#!/usr/bin/perl
$str = "'string; 'inside' another;";
$str =~ s/^'|'?;?$//g;
print $str;
The main idea is to use anchors: ^ matches the beginning of the string, $ matches the end of the string, and ;? matches the ";" symbol at the end only if it is present (the ? quantifier makes the preceding pattern optional). EDIT: Also, ; will get removed even if there is no preceding '.
I suggest that your original code should look more like this. It is much more idiomatic Perl and I think more straightforward to follow
sub __settings {
my ($_s) = @_;
my $file = "$config_directory/authentic-theme/settings.js";
return unless -r $file;
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my @file = <$fh>;
chomp @file;
for ( @file ) {
next if m{//};
if ( my @matches = $_ =~ /(?:$_s\s*=\s*(.*))/g ) {
my $matches = join "\n", @matches;
$matches =~ tr/';//d;
return $matches;
}
}
}
I have a string:
<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"
So the format is
(<val>; rel="key")*
And I want to parse that to a hash with the following format:
next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5
In Java I would use a regex pattern to extract each key => value pair and put them into a map. The pattern would be something like:
<([^>]++)>;\s*rel="([^"]++)"
Which would give me the key in the second match group and the value in the first. Would the same approach be the best way to achieve this in Perl, or is there something snazzier I could do?
P.S. the reason I'm using Perl rather than Java is that the server doesn't have Java.
My first inclination was to split the string on commas and work with the three substrings, but it is probably better to use a global match in a while loop.
This should do what you want. (Perl is by far the better tool for text processing like this!)
Update: I've just realised that the markdown in your question discarded the angle brackets and newlines. Is this more appropriate? I assume it's a multi-line string?
use strict;
use warnings;
my $str = <<'END';
<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"
END
my %data;
while ($str =~ / < ([^<>]+) >; \s* rel="([^"]+)" (?:,\s*)? /xg) {
$data{$2} = $1;
}
use Data::Dump;
dd \%data;
output
{
first => "https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5",
last => "https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5",
next => "https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5",
}
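As a variation on the global match above, the same kind of pattern can fill the hash in a single statement when it is used in list context. This is only a sketch: the name %links is illustrative, and the input is the one-line form of the string from the question.

```perl
use strict;
use warnings;

# Sample header value copied from the question.
my $str = q(<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next", <https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first", <https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last");

# In list context, /g returns every capture as (url, rel, url, rel, ...).
# reverse flips that flat list so each rel comes before its url,
# which is the key/value order a hash assignment expects.
my %links = reverse $str =~ /<([^<>]+)>;\s*rel="([^"]+)"/g;

print "$_ => $links{$_}\n" for sort keys %links;
```

This relies on every match producing exactly two captures. The while loop is easier to extend if the entries ever carry more parameters than rel.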
You can split the string on a "," and then use a map to create the hash:
#!/usr/bin/env perl
use strict;
use warnings;
my $str = 'https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';
my %hash = map {
my ($v, $k) = $_ =~ /\s*([^;]+);\s*rel="([^"]+)".*/;
$k => $v;
} split ',', $str;
foreach my $key (keys %hash) {
print "$key => $hash{$key}\n"
}
output:
first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5
update
With the new string you could do:
$str = q(<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next", <https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first", <https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last");
my %hash = map {
my ($v, $k) = $_ =~ /<([^>]+)>;\s*rel="([^"]+)".*/;
$k => $v;
} split ',', $str;
to get the same result.
use strict;
use warnings;
my $string='https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';
my @array=split /,/, $string;
my %hash;
foreach(@array)
{
if($_=~/(.*?);\s*rel\=\s*"([^"]+)"/)
{
$hash{$2}=$1;
}
}
print "$_ => $hash{$_}\n" foreach(keys%hash);
I'm trying to create a Perl hash from an input string, but I'm having problems with the original 'split', as values may contain quotes. Below is an example input string, and my (desired) resulting hash:
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %hash =
(
CREATE => '',
USER => '',
TEL => '12345678',
MOB => '444001122',
Type => 'Whatever',
ATTRIBUTES => 'ID,0,MOB,123,KEY,VALUE',
TIME => '08:01:59',
FIN => '0',
);
The input string is of arbitrary length, and the number of keys is not set.
Thanks!
-hq
Use Text::CSV. It handles comma separated value files correctly.
Update
It seems the format of your input is not parsable by the standard module, even with sep_char and allow_loose_quotes. So, you have to do the heavy lifting yourself, but you can still use Text::CSV to parse each key-value pair:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Data::Dumper;
use Text::CSV;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my @fields = split /:/, $command;
my %hash;
my $csv = Text::CSV->new();
my $i = 0;
while ($i <= $#fields) {
if (1 == $fields[$i] =~ y/"//) {
my $j = $i;
$fields[$i] .= ':' . $fields[$j] until 1 == $fields[++$j] =~ y/"//;
$fields[$i] .= ':' . $fields[$j];
splice @fields, $i + 1, $j - $i, ();
}
$csv->parse($fields[$i]);
my ($key, $value) = $csv->fields;
$hash{$key} = "$value"; # quotes turn undef to q()
$i++;
}
print Dumper \%hash;
As far as I can see the most obvious candidate - Text::CSV - won't handle this format properly, so a home-grown regular expression solution is the only one.
use strict;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config;
for my $field ($command =~ /(?:"[^"]*"|[^:])+/g) {
my ($key, $val) = split /,/, $field, 2;
($config{$key} = $val // '') =~ s/"([^"]*)"/$1/;
}
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
output
%config = (
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
);
If you have Perl v5.10 or later then you have the convenient (?| ... ) regular expression group, which allows you to write this
use 5.010;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config = $command =~ /(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/gx;
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
which produces identical results to the code above.
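To show what the (?| ... ) branch-reset group changes, here is a small sketch comparing it with an ordinary non-capturing alternation on a single field. The sample $field and the array names are only illustrative.

```perl
use 5.010;
use strict;
use warnings;

my $field = 'TIME,"08:01:59"';

# Ordinary alternation: each branch has its own capture number,
# so the unused branch contributes an undef to the result list.
my @plain = $field =~ /(\w+)(?:,"([^"]*)"|,([^:"]*))/;

# Branch reset: both branches write to capture group 2,
# so every match yields exactly a key and a value.
my @reset = $field =~ /(\w+)(?|,"([^"]*)"|,([^:"]*))/;

say scalar(@plain), ' captures without branch reset';   # 3
say scalar(@reset), ' captures with branch reset';      # 2
say "$reset[0] => $reset[1]";                           # TIME => 08:01:59
```

Because each match returns exactly two values, the list from a global match can be assigned straight to a hash, which is what the one-line version above depends on.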
This looks like something Text::ParseWords could handle. The quotewords subroutine will split the input on the delimiter :, ignoring delimiters inside quotes. This will give us the basic list of items, seen first in the output as $VAR1. After that, it is a simple matter of parsing the comma separated items with a regex which will handle optional second capture to accommodate empty tags such as those for CREATE and USER.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
while (<DATA>) {
chomp;
my @list = quotewords(':', 0, $_);
my %hash = map { my ($k, $v) = /([^,]+),?(.*)/; $k => $v; } @list;
print Dumper \@list, \%hash;
}
__DATA__
CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0
Output:
$VAR1 = [
'CREATE',
'USER',
'TEL,12345678',
'MOB,444001122',
'Type,Whatever',
'ATTRIBUTES,ID,0,KEY,VALUE',
'TIME,08:01:59',
'FIN,0'
];
$VAR2 = {
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
};
my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
s/"([^"]*)"/$1/g
for grep defined, values %hash;