How to print out matached string with perlre? - regex

Perlre (Perl Regular Expression) is used for searching / replacing complex XML structure, e.g.
perl -0777 -pe 's/<ac:structured-macro ac:macro-id="[a-z0-9\-]+" ac:name="gadget" ac:schema-version="1"><ac:parameter ac:name="preferences">.*?selectedIssueKey=([A-Z\-0-9]+).*?(<\/ac:parameter>)<ac:parameter ac:name="url">https:\/\/rcrs.rbinternational.corp\/issue\/rest\/gadgets\/1.0\/g\/com.pyxis.greenhopper.jira:greenhopper-card-view-gadget\/gadgets\/greenhopper-card-view.xml<\/ac:parameter>.*?(<\/ac:structured-macro>)/<ac:structured-macro ac:name="jira" ac:schema-version="1"><ac:parameter ac:name="server">RCRS Issue Tracking<\/ac:parameter><ac:parameter ac:name="columns">key,type,assignee,status,nwu model developer,nwu model reviewer,nwu model owner,nwu head rr\/b2\/cro,ho validation owner,ho pi\/micro country manager,ho responsible signee<\/ac:parameter><ac:parameter ac:name="maximumIssues">20<\/ac:parameter><ac:parameter ac:name="jqlQuery">key = \1 <\/ac:parameter><ac:parameter ac:name="serverId">d64129aa-b1e8-3584-8953-2bd89c3e515c<\/ac:parameter><\/ac:structured-macro>/igs' macro
Currently the search pattern does not match as expected. Obviously the search patterns spans more string than expected. Is there a possibility to print out each matched string for debugging purpose?

You can turn on debugging for regex with use re 'debug';. However that would be the wrong approach to take. Your problem here isn't your regex is wrong, it's that regex is fundamentally the wrong tool for XML parsing. (And leaving aside that - your line is just too long to be sensible to use inline like that!)
Given your example - it looks like you're trying to extract a single value (selectedIssueKey) and insert it into a new blob of XML.
This is done much easier by a parser, such as XML::Twig. I can't give you a precise example, because I would need to see your XML structure (or at least a subset without the wildcards).
But something like this can be used for extracting a value from some XML:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> parse ( \*DATA );
my $selectedIssueKey = $twig -> findnodes ( '//ac:parameter/pref', 0) -> att('selectedIssueKey');
print $selectedIssueKey;
Extracts the value of an attribute 'selectedIssueKey' from:
<ac:structured-macro ac:macro-id="test" ac:name="gadget" ac:schema-version="1">
<ac:parameter ac:name="preferences">
<pref selectedIssueKey="anothertest" />
</ac:parameter>
<ac:parameter ac:name="url">https://rcrs.rbinternational.corp/issue/rest/gadgets/1.0/g/com.pyxis.greenhopper.jira:greenhopper-card-view-gadget/gadgets/greenhopper-card-view.xml</ac:parameter>
</ac:structured-macro>
XML::Twig also lets you cut and paste, so you could do something like this:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new ( 'pretty_print' => 'indented_a' ) -> parsefile 'sample.xml' );
my $selectedIssueKey = $twig -> findnodes ( '//ac:parameter/pref', 0) -> att('selectedIssueKey');
print "Found key of: $selectedIssueKey\n";
my $ac_structured_macro = $twig -> findnodes ( '//ac:structured-macro',0 );
my $new_macro = $twig -> root -> insert_new_elt( 'last_child', 'ac:structured-macro', { "ac:name" => "jira", "ac:schema-version"=> "1" } );
$new_macro -> insert_new_elt('last_child', 'ac:parameter', { 'ac:name' => 'server' }, "RCRS ISSUE Tracking" );
$new_macro -> insert_new_elt('last_child', 'ac:parameter', { 'ac:name' => 'columns' }, "key,type,assignee,status,etc" );
$new_macro -> insert_new_elt('last_child', 'ac:parameter', { 'ac:name' => 'maximumIssues' }, "20" );
$new_macro -> insert_new_elt('last_child', 'ac:parameter', { 'ac:name' => 'jqlQuery' }, "key = $selectedIssueKey" );
$new_macro -> insert_new_elt('last_child', 'ac:parameter', { 'ac:name' => 'serverId' }, "d64129aa-b1e8-3584-8953-2bd89c3e515c" );
$ac_structured_macro -> delete;
$twig -> print;
(It's probably easier to use a whole XML snippet for this though, and just replace the bits yo uwant).

use re 'debug'
To debug you regex, read more about it here http://perldoc.perl.org/re.html#%27debug%27-mode

Use a CPAN module for this. May save you some headache.
But if you still want to use regex for the job, I'd suggest you expand the regex in a script and step through it using the debugger like mentioned above.

Related

Perl deferred interpolation of string

I have a situation where there is a triage script that takes in a message, compares it against a list of regex's and the first one that matches sets the bucket. Some example code would look like this.
my $message = 'some message: I am bob';
my #buckets = (
{
regex => '^some message:(.*)',
bucket => '"remote report: $1"',
},
# more pairs
);
foreach my $e (#buckets) {
if ($message =~ /$e->{regex}/i) {
print eval "$e->{bucket}";
}
}
This code will give remote report: I am bob. I keep looking at this and feel like there has to be a better way to do this then it is done now. especially with the double quoting ('""') in the bucket. Is there a better way for this to be handled?
Perl resolves the interpolation when that expression is evaluated. For that, it is sufficient to use a subroutine, no eval needed:
...
bucket => sub { "remote report: $1" },
...
print $e->{bucket}->();
Note that you effectively eval your regexes as well. You can use pre-compiled regex objects in your hash, with the qr// operator:
...
regex => qr/^some message:(.*)/i,
...
if ($message =~ /$e->{regex}/) {
You could use sprintf-style format strings:
use strict;
use warnings;
my $message = 'some message: I am bob';
my #buckets = (
{
regex => qr/^some message:(.*)/,
bucket => 'remote report: %s',
},
# more pairs
);
foreach my $e (#buckets) {
if (my #matches = ($message =~ /$e->{regex}/ig)) {
printf($e->{bucket}, #matches);
}
}

Twitch TMI, extract all users

I'm trying to extract all the user names from the source (https://tmi.twitch.tv/group/user/twitchpresents/chatters), but so far I'm only able to get like the first "name".
Goal is to get all the names into an array, and then just remove the "viewers", "admins", "staff", etc. names.
use strict;
use warnings;
my #listusers = userlist();
sub userlist {
my $url = "https://tmi.twitch.tv/group/user/twitchpresents/chatters";
my $array = get($url);
my #array2;
my $time = 0;
while ($time != 2){
my $mylist = (join "",grep(/"\s*(.*?)\s*"/, $array[$time])) =~ /"\s*(.*?)\s*"/;
print $1;
$time++;
}
return #array2;
}
print #listusers;
I assume that you get the page with get from LWP::Simple. Please always show relevant includes.
Since this is valid JSON, use a module for that
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
use LWP::Simple;
use JSON;
my $data_json = get($url);
my $data = JSON->new->decode($data_json);
#dd $data;
my #names = #{ $data->{chatters}{moderators} }; # get "moderators"
say "#names";
# my #all_names = map { #$_ } values %{$data->{chatters}}; # or get all names
This prints the line: cliccer cuda hnlbot nixi93 scorpy0 somppe
I print "names" of moderators as an example of getting one category out; the commented out line gets all names in one array. Once you have a hashref there are various ways to extract what you need.
Your code seems to be trying to parse that JSON string. It is very easy with a module.
One can view complex data structures with Data::Dumper or such. I use Data::Dump.
Here JSON is used, which delegates to JSON::XS if installed or to the "pure Perl" (and slower) JSON::PP otherwise. Another option is Cpanel::JSON::XS.
For convenience, the structure printed by dd $data is
{
_links => {},
chatter_count => 15,
chatters => {
admins => [],
global_mods => [],
moderators => [
"cliccer",
"cuda",
"hnlbot",
"joffy95",
"nixi93",
"scorpy0",
"somppe",
],
staff => [],
viewers => [
"coldblood94",
"coldbot",
"gabenator",
"gharokk",
"reconcrusadershadow",
"scrubnubslulz",
"shai_the_panda",
"sonadourge",
],
},
}
I recommend that you use
Mojo::UserAgent
for this, as it contains a JSON decoder as well as an HTTP user agent and makes the code much more concise
I assume you want the names of all the different categories of chatters so that's what the map statement does: simply flattening all of the categories—admins, global_mods, moderators, staff, and viewers—into a single list
If anything goes wrong with the HTTP transfer then the subroutine will simply return an empty list, so you may want to add some proper error handling
use strict;
use warnings;
use feature 'say';
use Mojo::UserAgent;
my #list_users = chatters();
say for #list_users;
sub chatters {
my $url = 'https://tmi.twitch.tv/group/user/twitchpresents/chatters';
return unless my $data = Mojo::UserAgent->new->get($url)->res->json;
map { #$_ } values %{ $data->{chatters} };
}
output
girlonduty
moobot
ravager
100tesports
123mickeypr
13eebo
13urnxcalibur
2dmoody
2l33t
2tony79
3nticed
...

Edit all files in directory tree with regular expression on Windows

I am looking for a program that can edit all files in directory tree like Perl on Unix systems. The files are xml's and another folders.
The regex should delete all the content placed in <loot></loot> brackets.
for example file
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
<immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
<item id="3"/>
<inside>
<item id="6"/>
</inside>
</item>
</loot>
the file should look after edit:
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
<immunity fire="1"/>
</immunities>
<loot>
</loot>
I would shy away from anything regex based - XML simply doesn't work with regular expressions.
But fortunately, Perl for Windows is readily available. And better yet, if you go with Strawberry perl, it comes bundled with both XML::Twig and XML::LibXML.
At which point the problem becomes inanely simple:
#!/usr/bin/perl
use warnings;
use strict;
use File::Find::Rule;
use XML::Twig;
sub delete_loot {
my ( $twig, $loot ) = #_;
foreach my $loot_entry ( $loot -> children ) {
$loot_entry -> delete;
}
$twig -> flush;
}
my $twig = XML::Twig -> new ( pretty_print => 'indented',
twig_handlers => { 'loot' => \&delete_loot ,
'_all_' => sub { $_ - > flush } } );
foreach my $file ( File::Find::Rule -> file()
-> name ( '*.xml.txt' )
-> in ( 'C:\tmp' ) ) {
print "Processing $file\n";
$twig -> parsefile_inplace($file);
}
Of course, this also assumes that your XML is, in fact, XML - which your example isn't. If that example is actually correct, then you should really hit whoever wrote it around the head with a rolled up copy of the XML Spec whilst chanting 'don't make fake XML'.

Perl - Parse blocks from text file

First, I apologize if you feel this is a duplicate. I looked around and found some very similar questions, but I either got lost or it wasn't quite what I think I need and therefore couldn't come up with a proper implementation.
QUESTION:
So I have a txt file that contains entries made by another script (I can edit the format for how these entries are generated if you can suggest a better way to format them):
SR4 Pool2
11/5/2012 13:45
----------
Beginning Wifi_Main().
SR4 Pool2
11/8/2012 8:45
----------
This message is a
multiline message.
SR4 Pool4
11/5/2012 14:45
----------
Beginning Wifi_Main().
SR5 Pool2
11/5/2012 13:48
----------
Beginning Wifi_Main().
And I made a perl script to parse the file:
#!C:\xampp-portable\perl\bin\perl.exe
use strict;
use warnings;
#use Dumper;
use CGI 'param','header';
use Template;
#use Config::Simple;
#Config::Simple->import_from('config.ini', \%cfg);
my $cgh = CGI->new;
my $logs = {};
my $key;
print "Content-type: text/html\n\n";
open LOG, "logs/Pool2.txt" or die $!;
while ( my $line = <LOG> ) {
chomp($line);
}
print $logs;
close LOG;
My goal is to have a hash in the end that looks like this:
$logs = {
SR4 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
{
time => '11/8/2012 8:45',
msg => 'This message is a multiline message.',
},
},
Pool4 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
SR5 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
};
What would be the best way of going about this? Should I change the formatting of the generated logs to make it easier on myself? If you need anymore info, just ask. Thank you in advanced. :)
The format makes no sense. You used a hash at the third level, but you didn't specify keys for the values. I'm assuming it should be an array.
my %logs;
{
local $/ = ""; # "Paragraph mode"
while (<>) {
my #lines = split /\n/;
my ($x, $y) = split ' ', $lines[0];
my $time = $lines[1];
my $msg = join ' ', #lines[3..$#lines];
push #{ $logs{$x}{$y} }, {
time => $time,
msg => $msg,
};
}
}
Should I change the formatting of the generated logs
Your time stamps appear to be ambiguous. In most time zones, an hour of the year is repeated.
If you can possibly output it as XML, reading it in would be embarrasingly easy with XML::Simple
Although Karthik T idea of using XML makes sense, and I would also consider it, I'm not sure if this is the best route. The first problem is putting it in XML format in the first place.
The second is that XML format might not be so easily parsed. Sure, the XML::Simple module will read the whole thing in one swoop, you then have to parse the XML data structure itself.
If you can set the output however you want, make it in a format that's easy to parse. I like using prefix data identifiers. In the following example, each piece of data has it's own identifier. The ER: tells me when I hit the end of record:
DT: 11/5/2012 13:35
SR: SR4
PL: Pool2
MG: Beginning Wifi_Main().
ER:
DT: 1/8/2012 8:45
SR: SR4
PL: Pool2
MG: This message is a
MG: multiline message.
ER:
Parsing this output is straight forward:
my %hash;
while ( $line = <DATA> ) {
chomp $line;
if ( not $line eq "ER:" ) {
my ($key, $value) = split ( ": ", $line );
$hash{$key} .= "$value "; #Note trailing space!
}
else {
clean_up_hash ( \%hash ); #Remove trailing space on all values
create_entry ( \%log, \%hash );
%hash = ();
}
}
I like using classes whenever I start getting complex data structures, and I would probably create a Local::Log class and subclasses to store each layer of the log. However, it's not an absolute necessity and wasn't part of your question. Still, I would use a create_entry subroutine just to keep the logic of figuring out where in your log that entry belongs inside your loop.
NOTE: I append a space after each piece of data. I did this to make the code simpler since some of your messages may take more than one line. There are other ways to handle this, but I was trying to keep the loop as clean as possible and with as few if statements as possible.

soap lite pass string argument

I have a problem passing a string argument using Perl. The following code
#!/usr/bin/perl -w
use SOAP::Lite;
my $service = SOAP::Lite->service('http://localhost:8080/greeting?wsdl');
print $service->greetClient('perl wooooo'), "\n";
Results in
Greeting null! Have a nice day...
A similar python code
from suds.client import Client
client = Client('http://localhost:8080/greeting?wsdl')
print client.service.greetClient('python wooooo')
works perfectly
Greeting python wooooo! Have a nice day...
I tried to set different encodings
print $service->encoding('utf-8')->greetClient("perl wooooo"), "\n";
with the same result.
A SOAP Monitor shows that there is no arg0 in a case of Perl
<greetClient xsi:nil="true" xsi:type="tns:greetClient" />
which is present in a case of Python
<ns0:greetClient>
<arg0>python wooooo</arg0>
</ns0:greetClient>
What can be a problem?
Why it's so complicated to implement a SOAP client with Perl compared to Python?
EDIT:
SOLUTION
Finally the following solution is working
#!/usr/bin/perl -w
use strict;
use warnings;
use XML::Compile::SOAP11;
use XML::Compile::WSDL11;
use XML::Compile::Transport::SOAPHTTP;
my $soap = XML::Compile::WSDL11->new('c:/temp/greeting.wsdl');
my $call = $soap->compileClient('greetClient');
print $call->(arg0 => 'perl wooooo'){'greetClientResponse'}{'return'}, "\n";
SOAP::Lite can be infuriatingly bad. You might give XML::Compile::SOAP a try:
use strict;
use warnings;
use XML::Compile::SOAP11;
use XML::Compile::WSDL11;
use XML::Compile::Transport::SOAPHTTP;
my $soap = XML::Compile::WSDL11->new(
'http://localhost:8080/greeting?wsdl',
schema_dirs => [
'c:/soft/Perl/site/lib/XML/Compile/SOAP11/xsd'
'c:/soft/Perl/site/lib/XML/Compile/XOP/xsd'
'c:/soft/Perl/site/lib/XML/Compile/xsd'
]
);
$soap->compileCalls;
my ( $response, $trace ) = $soap->call( 'greetClient', arg0 => 'perl wooooo' );
$trace->printResponse;
$response will be the call response converted to a hashref via XML::Simple, which may be all you need. The $trace object is handy to see what the raw XML response looks like.
Unfortunately, I can't see your WSDL.
But in regards to SOAP::Lite, I don't see you setting up neither a proxy (endpoint) nor an uri.
You're also probably going to have to change the on_action behavior as well. By default, SOAP::Lite wants to use the '#' concatenation.
So something along these lines might work.
$service->proxy( $uri_of_my_end_point );
$service->uri( $schema_namespace );
$service->on_action( sub {
my ( $uri, $method ) = #_;
my $slash = $uri =~ m{/$} ? '' : '/';
return qq{"$uri$slash$method"};
});