Extract String between { } in Perl - regex

I have all of the following stored in $data:
'Berry-Berry Belgian Waffles' => {
'calories' => '900',
'price' => '$8.95',
'description' => 'Light Belgian waffles covered with an assortment of fresh berries and whipped cream'
},
I need to extract the contents between the '{' and '}' using a regular expression, so the result should be:
'calories' => '900',
'price' => '$8.95',
'description' => 'Light Belgian waffles covered with an assortment of fresh berries and whipped cream'
How do I achieve this in a Perl script?
This is the script I have so far. It reads from an XML file, whether it's on the web or a local file.
use XML::Simple;
use LWP;
use Data::Dumper;
#request path
print "Enter path\n";
my $input = <STDIN>;
my $data;
chomp $input;
print "Path : $input\n";
if ($input =~ /http/)
{
print "This is a webpage\n";
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => $input );
my $res = $ua->request( $req );
print Dumper (XML::Simple->new()->XMLin( $res->content ));
}
else
{
print "This is a local path\n";
$xml = new XML::Simple;
$data = $xml ->XMLin($input);
print Dumper($data);
}
print "Type in keyword to search: \n";
my $inputsearch = <STDIN>;
chomp $inputsearch;
print "You typed --> $inputsearch\n";
Dumper($data) =~ m/$inputsearch/;
$after = "$'";
$result = $after =~ /{...}/;
print $result;

OK, seriously. Please don't use XML::Simple. Even XML::Simple says:
The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.
I'm going to make a guess at how your XML looks, and give you an idea how to extract information from it. I'll update if you can give a better example of the XML.
<root>
<item name="Berry-Berry Belgian Waffles">
<calories>900</calories>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
</item>
</root>
And you can process it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->parse( \*DATA );
foreach my $item ( $twig -> get_xpath ( '//item' ) ) {
print "Name: ", $item -> att('name'),"\n";
foreach my $element ( $item -> children ) {
print $element -> tag,": ", $element -> trimmed_text,"\n";
}
}
__DATA__
<root>
<item name="Berry-Berry Belgian Waffles">
<calories>900</calories>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
</item>
</root>
With XML::Twig you can access "attributes" via att, the element name via tag and the content via text or trimmed_text.
So the above will print:
Name: Berry-Berry Belgian Waffles
calories: 900
price: $8.95
description: Light Belgian waffles covered with an assortment of fresh berries and whipped cream
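If you still need the literal regex extraction from the question, for example on the Dumper-style text shown there, a non-greedy match with the /s modifier captures everything between the first { and the next }. This is a minimal sketch that assumes the braces are not nested. A nested hash inside the record would cut the match short, which is another reason to prefer a real parser:

```perl
use strict;
use warnings;

# Text in the shape shown in the question (single-quoted heredoc: no interpolation).
my $data = <<'END';
'Berry-Berry Belgian Waffles' => {
'calories' => '900',
'price' => '$8.95',
'description' => 'Light Belgian waffles covered with an assortment of fresh berries and whipped cream'
},
END

# \{ and \} are literal braces; (.*?) is non-greedy, so it stops at the
# first closing brace; /s lets '.' match newlines. Nested braces break this.
my ($inner) = $data =~ /\{\s*(.*?)\s*\}/s;
print "$inner\n" if defined $inner;
```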

Related

Perl deferred interpolation of string

I have a situation where a triage script takes in a message, compares it against a list of regexes, and the first one that matches sets the bucket. Some example code would look like this:
my $message = 'some message: I am bob';
my @buckets = (
{
regex => '^some message:(.*)',
bucket => '"remote report: $1"',
},
# more pairs
);
foreach my $e (@buckets) {
if ($message =~ /$e->{regex}/i) {
print eval "$e->{bucket}";
}
}
This code will give remote report: I am bob. I keep looking at this and feel like there has to be a better way to do this than the way it is done now, especially with the double quoting ('""') in the bucket. Is there a better way to handle this?
Perl performs the interpolation when the string expression is evaluated, so wrapping the string in a subroutine is enough to defer it; no eval is needed:
...
bucket => sub { "remote report: $1" },
...
print $e->{bucket}->();
Note that your regexes are also compiled from strings at run time. You can store pre-compiled regex objects in your hash instead, using the qr// operator (the /i flag then moves into the qr//):
...
regex => qr/^some message:(.*)/i,
...
if ($message =~ /$e->{regex}/) {
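Putting both suggestions together gives a self-contained sketch. The \s* in the pattern (so the report doesn't get a double space), the second message and the MESSAGE label are illustrative additions. Calling the closure inside the if block works because $1 is dynamically scoped and still holds the capture from the successful match:

```perl
use strict;
use warnings;

my @messages = ( 'some message: I am bob', 'unrelated text' );

my @buckets = (
    {
        regex  => qr/^some message:\s*(.*)/i,    # compiled once, flags included
        bucket => sub { "remote report: $1" },   # $1 is read when the sub is called
    },
    # more pairs
);

my @reports;
MESSAGE: foreach my $message (@messages) {
    foreach my $e (@buckets) {
        if ( $message =~ $e->{regex} ) {
            push @reports, $e->{bucket}->();
            next MESSAGE;    # the first matching bucket wins
        }
    }
}
print "$_\n" for @reports;
```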
You could use sprintf-style format strings:
use strict;
use warnings;
my $message = 'some message: I am bob';
my @buckets = (
{
regex => qr/^some message:(.*)/i, # put flags on the qr//; an outer /i would not apply to it
bucket => 'remote report: %s',
},
# more pairs
);
foreach my $e (@buckets) {
if (my @matches = ($message =~ /$e->{regex}/g)) {
printf($e->{bucket}, @matches);
}
}

Edit all files in directory tree with regular expression on Windows

I am looking for a program that can edit all files in a directory tree, the way Perl can on Unix systems. The tree contains XML files and subfolders.
The regex should delete all the content placed between the <loot></loot> tags.
For example, this file:
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
<immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
<item id="3"/>
<inside>
<item id="6"/>
</inside>
</item>
</loot>
should look like this after the edit:
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
<immunity fire="1"/>
</immunities>
<loot>
</loot>
I would shy away from anything regex-based: regular expressions can't reliably handle XML's nested structure.
But fortunately, Perl for Windows is readily available. And better yet, if you go with Strawberry perl, it comes bundled with both XML::Twig and XML::LibXML.
At which point the problem becomes inanely simple:
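#!/usr/bin/perl
use warnings;
use strict;
use File::Find::Rule;
use XML::Twig;
# Handler for each <loot> element: delete its children, then print it.
sub delete_loot {
my ( $twig, $loot ) = @_;
foreach my $loot_entry ( $loot -> children ) {
$loot_entry -> delete;
}
$loot -> print; # print the now-empty <loot> to the in-place output
$twig -> purge; # free it from memory
}
foreach my $file ( File::Find::Rule -> file()
-> name ( '*.xml' )
-> in ( 'C:\tmp' ) ) {
print "Processing $file\n";
# twig_roots builds only <loot> elements in memory, and
# twig_print_outside_roots copies everything else unchanged.
# (An '_all_' flush handler would print loot's children as each
# one closes, before delete_loot gets a chance to delete them.)
my $twig = XML::Twig -> new ( pretty_print => 'indented',
empty_tags => 'expand', # print <loot></loot> rather than <loot/>
twig_roots => { 'loot' => \&delete_loot },
twig_print_outside_roots => 1 );
$twig -> parsefile_inplace($file);
}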
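Of course, this also assumes that your XML is, in fact, XML, which your example isn't: it has several top-level elements instead of a single root, <health="10000"/> has no attribute name, and the </item> inside <loot> closes an element that was never opened. If that example is actually correct, then you should really hit whoever wrote it around the head with a rolled-up copy of the XML Spec whilst chanting 'don't make fake XML'.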
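If File::Find::Rule isn't installed, the core File::Find module can build the same list of files. A minimal sketch; find_xml_files is a hypothetical helper name, and matching .xml case-insensitively is an assumption:

```perl
use strict;
use warnings;
use File::Find;

# Collect every *.xml file below $dir, roughly what
# File::Find::Rule->file->name('*.xml')->in($dir) returns.
sub find_xml_files {
    my ($dir) = @_;
    my @files;
    find(
        sub {
            # Inside the callback, $_ is the basename and
            # $File::Find::name is the full path.
            push @files, $File::Find::name if -f $_ && /\.xml\z/i;
        },
        $dir
    );
    return sort @files;
}

my $dir = @ARGV ? $ARGV[0] : '.';
print "$_\n" for find_xml_files($dir);
```

Each path it returns can then be handed to parsefile_inplace exactly as in the loop above.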

CSR Subject with Special Characters extract

I need to extract the fields of $subj in the snippet below, but my regex isn't working as expected. This is similar to How to string manipulate/extract subject contents in a CSR using OpenSSL command + Perl?, but with a different subject entry in the CSR. I'm not sure my regex for %subjinfo is right.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $subj =
q(subject=/O=test~##$^()_+-=\{}|;':",./<>/OU=test~##$^()_+-=\{}|;':",./<>/emailAddress=test~##$^()_+-=\{}|;':",./<>/L=CDE/ST=ABC/C=AU/CN=test~##$^()_+ -=\{}|;':",./<>);
my %subjinfo = ( $subj =~ m,(\w+)=([^=]*)(?:/|$),g );
print Dumper \%subjinfo;
and so must give a result to this:
$VAR1 = {
'subject' => '',
'L' => 'NYC',
'C' => 'AMER',
'OU' => 'test~##$^()_+-=\{}|;':",./<>',
'emailAddress' => 'test~##$^()_+-=\{}|;':",./<>',
'ST' => 'AMER',
'CN' => 'test~##$^()_+-=\{}|;':",./<>',
'O' => 'test~##$^()_+-=\{}|;':",./<>'
};
Is that possible? Can you advise?
Splitting on a regex looks more natural than a regex-only solution:
use Data::Dumper;
my $subj = q(subject=/O=test~##$^()_+-=\{}|;':",./<>/OU=test~##$^()_+-=\{}|;':",./<>/emailAddress=test~##$^()_+-=\{}|;':",./<>/L=CDE/ST=ABC/C=AU/CN=test~##$^()_+ -=\{}|;':",./<>);
(undef, my %subjinfo) = split m|/?(\w+)=|, $subj;
print Dumper \%subjinfo;
output
$VAR1 = {
'emailAddress' => 'test~##$^()_+-=\\{}|;\':",./<>',
'CN' => 'test~##$^()_+ -=\\{}|;\':",./<>',
'OU' => 'test~##$^()_+-=\\{}|;\':",./<>',
'L' => 'CDE',
'C' => 'AU',
'ST' => 'ABC',
'subject' => '',
'O' => 'test~##$^()_+-=\\{}|;\':",./<>'
};
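The original pattern fails because [^=]* stops at the first = inside a value such as _+-=. If you do want a regex-only solution, one option is to let each value run lazily up to the next /KEY= or the end of the string, using a lookahead. A sketch, assuming no value itself contains a / followed by word characters and an =:

```perl
use strict;
use warnings;

my $subj = q(subject=/O=test~##$^()_+-=\{}|;':",./<>/OU=test~##$^()_+-=\{}|;':",./<>/emailAddress=test~##$^()_+-=\{}|;':",./<>/L=CDE/ST=ABC/C=AU/CN=test~##$^()_+ -=\{}|;':",./<>);

# Each value runs lazily up to the next "/KEY=" or the end of the string,
# so '=' characters inside a value no longer end it early.
my %subjinfo = $subj =~ m{(\w+)=(.*?)(?=/\w+=|\z)}g;

printf "%-12s => %s\n", $_, $subjinfo{$_} for sort keys %subjinfo;
```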

Perl - Parse blocks from text file

First, I apologize if you feel this is a duplicate. I looked around and found some very similar questions, but I either got lost or it wasn't quite what I think I need and therefore couldn't come up with a proper implementation.
QUESTION:
So I have a txt file that contains entries made by another script (I can edit the format for how these entries are generated if you can suggest a better way to format them):
SR4 Pool2
11/5/2012 13:45
----------
Beginning Wifi_Main().
SR4 Pool2
11/8/2012 8:45
----------
This message is a
multiline message.
SR4 Pool4
11/5/2012 14:45
----------
Beginning Wifi_Main().
SR5 Pool2
11/5/2012 13:48
----------
Beginning Wifi_Main().
And I made a perl script to parse the file:
#!C:\xampp-portable\perl\bin\perl.exe
use strict;
use warnings;
#use Dumper;
use CGI 'param','header';
use Template;
#use Config::Simple;
#Config::Simple->import_from('config.ini', \%cfg);
my $cgh = CGI->new;
my $logs = {};
my $key;
print "Content-type: text/html\n\n";
open LOG, "logs/Pool2.txt" or die $!;
while ( my $line = <LOG> ) {
chomp($line);
}
print $logs;
close LOG;
My goal is to have a hash in the end that looks like this:
$logs = {
SR4 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
{
time => '11/8/2012 8:45',
msg => 'This message is a multiline message.',
},
},
Pool4 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
SR5 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
};
What would be the best way of going about this? Should I change the formatting of the generated logs to make it easier on myself? If you need any more info, just ask. Thank you in advance. :)
The format makes no sense. You used a hash at the third level, but you didn't specify keys for the values. I'm assuming it should be an array.
my %logs;
{
local $/ = ""; # "Paragraph mode": records must be separated by blank lines
while (<>) {
my @lines = split /\n/;
my ($x, $y) = split ' ', $lines[0];
my $time = $lines[1];
my $msg = join ' ', @lines[3..$#lines];
push @{ $logs{$x}{$y} }, {
time => $time,
msg => $msg,
};
}
}
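Paragraph mode only works if the entries are separated by blank lines, which the sample in the question doesn't show (so this is an assumption about the real log). A self-contained run of the same logic against an in-memory copy of part of the sample, with blank lines added:

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed input: entries separated by a blank line, as paragraph mode needs.
my $text = <<'END';
SR4 Pool2
11/5/2012 13:45
----------
Beginning Wifi_Main().

SR4 Pool2
11/8/2012 8:45
----------
This message is a
multiline message.

SR5 Pool2
11/5/2012 13:48
----------
Beginning Wifi_Main().
END

my %logs;
{
    local $/ = "";    # paragraph mode: one record per blank-line-separated block
    open my $fh, '<', \$text or die $!;
    while ( my $record = <$fh> ) {
        my @lines = split /\n/, $record;    # trailing empty fields are dropped
        my ( $sr, $pool ) = split ' ', $lines[0];
        push @{ $logs{$sr}{$pool} }, {
            time => $lines[1],
            msg  => join( ' ', @lines[ 3 .. $#lines ] ),
        };
    }
    close $fh;
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%logs;
```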
Should I change the formatting of the generated logs
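Your time stamps appear to be ambiguous. In time zones that observe daylight saving time, one hour each year is repeated when the clocks go back, so two different entries can carry the same local time stamp.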
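One way to avoid the ambiguity is to write the time stamps in UTC. A minimal sketch using the core POSIX module; log_timestamp is a hypothetical helper, and the ISO 8601 layout is a suggestion rather than something the other script already produces:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time in UTC, ISO 8601 style. UTC has no daylight saving
# shifts, so no hour is ever repeated.
sub log_timestamp {
    my ($epoch) = @_;
    return strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime($epoch) );
}

print log_timestamp(time), "\n";
```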
If you can possibly output it as XML, reading it in would be embarrassingly easy with XML::Simple.
Although Karthik T's idea of using XML makes sense, and I would also consider it, I'm not sure it is the best route. The first problem is getting the output into XML format in the first place.
The second is that XML might not be so easily parsed. Sure, the XML::Simple module will read the whole thing in one swoop, but you then have to walk the resulting data structure yourself.
If you can set the output however you want, make it a format that's easy to parse. I like using prefix data identifiers. In the following example, each piece of data has its own identifier, and the ER: line tells me when I hit the end of a record:
DT: 11/5/2012 13:35
SR: SR4
PL: Pool2
MG: Beginning Wifi_Main().
ER:
DT: 1/8/2012 8:45
SR: SR4
PL: Pool2
MG: This message is a
MG: multiline message.
ER:
Parsing this output is straightforward:
my %log;
my %hash;
while ( my $line = <DATA> ) {
chomp $line;
if ( $line ne "ER:" ) {
my ($key, $value) = split /: /, $line, 2; # limit 2 keeps any ": " inside a message
$hash{$key} .= "$value "; #Note trailing space!
}
else {
clean_up_hash ( \%hash ); #Remove trailing space on all values
create_entry ( \%log, \%hash );
%hash = ();
}
}
I like using classes whenever I start getting complex data structures, and I would probably create a Local::Log class and subclasses to store each layer of the log. However, it's not an absolute necessity and wasn't part of your question. Still, I would use a create_entry subroutine to keep the logic of figuring out where in your log an entry belongs out of the main loop.
NOTE: I append a space after each piece of data. I did this to make the code simpler since some of your messages may take more than one line. There are other ways to handle this, but I was trying to keep the loop as clean as possible and with as few if statements as possible.
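The answer leaves clean_up_hash and create_entry unspecified. Hypothetical implementations that build the structure from the question (an array of records under each SR and pool) could look like this; the sample input is read from an in-memory string instead of __DATA__:

```perl
use strict;
use warnings;

# Sample input in the prefixed format from the answer.
my $text = <<'END';
DT: 11/5/2012 13:35
SR: SR4
PL: Pool2
MG: Beginning Wifi_Main().
ER:
DT: 1/8/2012 8:45
SR: SR4
PL: Pool2
MG: This message is a
MG: multiline message.
ER:
END

# Hypothetical helper: strip the trailing space added to every value.
# values() returns aliases, so s/// edits the hash in place.
sub clean_up_hash {
    my ($hash) = @_;
    s/\s+\z// for values %$hash;
}

# Hypothetical helper: file one record under $log->{SR}{PL} as an array entry.
sub create_entry {
    my ( $log, $hash ) = @_;
    push @{ $log->{ $hash->{SR} }{ $hash->{PL} } },
        { time => $hash->{DT}, msg => $hash->{MG} };
}

my ( %log, %hash );
open my $fh, '<', \$text or die $!;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line ne 'ER:' ) {
        my ( $key, $value ) = split /: /, $line, 2;
        $hash{$key} .= "$value ";
    }
    else {
        clean_up_hash( \%hash );
        create_entry( \%log, \%hash );
        %hash = ();
    }
}
close $fh;

for my $sr ( sort keys %log ) {
    for my $pool ( sort keys %{ $log{$sr} } ) {
        print "$sr $pool $_->{time}: $_->{msg}\n" for @{ $log{$sr}{$pool} };
    }
}
```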

Parsing XML file with perl - regex

I'm just a beginner in Perl, and I urgently need a small script that takes the first 3 items from an XML file and puts them in a new one.
Here's an example of an xml file:
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
What I'd like to do is get the first 3 items, along with all the tags in between, and put them into another file.
Thanks for all the help in advance
regards
peter
Never ever use regexes to parse markup languages.
The original version of this answer (see below) used XML::XPath. Grant McLean said in the comments:
XML::XPath is an old and unmaintained module. XML::LibXML is a modern, maintained module with an almost identical API and it's faster too.
so I made a new version that uses XML::LibXML (thanks, Grant):
use warnings;
use strict;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(location => 'articles.xml');
my $xp = XML::LibXML::XPathContext->new($doc->documentElement);
my $xpath = '/articles/article[position() < 4]';
foreach my $article ( $xp->findnodes($xpath) ) {
# now do something with $article
print $article.": ".$article->getName."\n";
}
For me this prints:
XML::LibXML::Element=SCALAR(0x346ef90): article
XML::LibXML::Element=SCALAR(0x346ef30): article
XML::LibXML::Element=SCALAR(0x346efa8): article
Links to the relevant documentation:
The type of $doc will be XML::LibXML::Document.
The type of $xp is XML::LibXML::XPathContext.
The return type of $xp->findnodes() is XML::LibXML::NodeList.
The type of $article is XML::LibXML::Element.
Original version of the answer, based on the XML::XPath package:
use warnings;
use strict;
use XML::XPath;
my $xp = XML::XPath->new(filename => 'articles.xml');
my $xpath = '/articles/article[position() < 4]';
foreach my $article ( $xp->findnodes($xpath)->get_nodelist ) {
# now do something with $article
print $article.": ".$article->getName ."\n";
}
which prints this for me:
XML::XPath::Node::Element=REF(0x38067b8): article
XML::XPath::Node::Element=REF(0x38097e8): article
XML::XPath::Node::Element=REF(0x3809ae8): article
The type of $xp is XML::XPath, obviously.
The return type of $xp->findnodes() is XML::XPath::NodeSet.
The type of $article will be XML::XPath::Node::Element in this case.
Have a look at the docs to find out what you can do with them.
Here:
open my $input, "<", "file.xml" or die $!;
open my $output, ">", "truncated-file.xml" or die $!;
my $n_articles = 0;
while (<$input>) {
print $output $_;
if (m:</article>:) {
$n_articles++;
if ($n_articles >= 3) {
last;
}
}
}
close $input or die $!;
close $output or die $!;
You really don't need an XML parser to do such a simple job.
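The line-based copy above leaves the output without a closing root tag, so the new file is not well-formed XML. A variant that appends one when it stops early, as a sketch that assumes the root element is named articles (as in the XPath answers) and that nothing else follows </article> on the same line; copy_first_articles is a hypothetical name:

```perl
use strict;
use warnings;

# Copy lines from $in to $out until the $max-th </article> has been seen,
# then close the assumed <articles> root so the output stays well-formed.
# If the input has fewer articles, it is copied whole (root tag included).
sub copy_first_articles {
    my ( $in, $out, $max ) = @_;
    my $seen = 0;
    while ( my $line = <$in> ) {
        print {$out} $line;
        if ( $line =~ m{</article>} && ++$seen >= $max ) {
            print {$out} "</articles>\n";
            last;
        }
    }
    return $seen;
}

# Usage: perl script.pl file.xml truncated-file.xml
if ( @ARGV == 2 ) {
    my ( $src, $dst ) = @ARGV;
    open my $in,  '<', $src or die "Can't read $src: $!";
    open my $out, '>', $dst or die "Can't write $dst: $!";
    copy_first_articles( $in, $out, 3 );
    close $out or die $!;
}
```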