Replace lines in a multi-line string - regex

I have an Oracle WebLogic config.xml file read into a string. I'm looking to update a series of lines in it. I've verified that I'm reading the file, getting the lines set, and able to update the correct line with the parameters I'm looking for, but I can't seem to update the original string.
Here's the main loop:
while ( $lines =~ m{(<arguments>.*?</arguments>)}mgs ) {
my $nchunk = my $ochunk = $1;
print "#" . '=' x 70 . "\n";
my ($ms) = $ochunk =~ m{.*/(.*?)\.out.*};
my $nname = $monster->{$domain}->{$ms}->{nodeName};
my $tname = $monster->{$domain}->{$ms}->{tierName};
my $newentry = sprintf(" %s %s.nodeName=-Dappdynamics.agent.nodeName=%s",
$appdjar, $ms, $nname);
$newentry .= " $ms.appdynamics.tierName=-Dappdynamics.tierName=$tname";
$nchunk =~ s/(<\/arguments>)/$newentry\1/g;
print "$ochunk\n";
print "#" . '-' x 70 . "\n";
print "$nchunk\n";
# $lines =~ s!$ochunk!!msg;
# $lines =~ s!$ochunk!$nchunk!msg;
}
As written, that results in:
#======================================================================
<arguments>-Xms512m -Xmx512m -Dweblogic.system.BootIdentityFile=/opt/app/oracle/user_projects/domains/AccountingServices_Domain/boot.properties -Dweblogic.Stdout=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/AccountingCommon_MS1.out -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/dumps -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote.port=40124 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/opt/app/oracle/user_projects/logs/AccountingServices_Domain,maxage=10m,dumponexit=true,dumponexitpath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j.configuration=file:/opt/app/oracle/user_projects/applications/AccountingServices_Domain/log4j.xml</arguments>
#----------------------------------------------------------------------
<arguments>-Xms512m -Xmx512m -Dweblogic.system.BootIdentityFile=/opt/app/oracle/user_projects/domains/AccountingServices_Domain/boot.properties -Dweblogic.Stdout=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/AccountingCommon_MS1.out -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain/dumps -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote.port=40124 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/opt/app/oracle/user_projects/logs/AccountingServices_Domain,maxage=10m,dumponexit=true,dumponexitpath=/opt/app/oracle/user_projects/logs/AccountingServices_Domain -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dlog4j.configuration=file:/opt/app/oracle/user_projects/applications/AccountingServices_Domain/log4j.xml -javaagent:/opt/app/appdynamics/universal-agent/monitor/java/javaagent.jar AccountingCommon_MS1.nodeName=-Dappdynamics.agent.nodeName=AccountingCommon_2123 AccountingCommon_MS1.appdynamics.tierName=-Dappdynamics.tierName=AccountingCommon</arguments>
[[snip]]
I can't seem to 're-find' the source chunk as indicated by one of those commented $lines trying to replace $ochunk with nothing.

You're going about this in a very round-about way, which is why I couldn't fathom what you were trying to do for the longest time. What you're actually trying to do is
Insert an additional string after the existing text in an arguments element
All you need is a single substitution. I've left it global in case there really are multiple such elements in the XML. I've not been able to test it, but I do know that it compiles:
$lines =~ s{ (<arguments>) (.*?) (</arguments>) }{
my ($otag, $text, $ctag) = ($1, $2, $3);
my ($ms) = $text =~ m{.*/(.*?)\.out};
my $msdata = $monster->{$domain}{$ms};
my $node = $msdata->{nodeName};
my $tier = $msdata->{tierName};
my $newentry = " $appdjar $ms.nodeName=-Dappdynamics.agent.nodeName=$node";
$newentry .= " $ms.appdynamics.tierName=-Dappdynamics.tierName=$tier";
$otag . $text . $newentry . $ctag;
}segx;
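A self-contained sketch of the same s{...}{...}e technique follows. The %servers table, the add_agent_args name and the simplified replacement text are assumptions made for illustration; they stand in for $monster->{$domain} and the real AppDynamics arguments. As a side note, the commented-out s!$ochunk!$nchunk! attempt in the question probably fails because $ochunk is interpolated as a pattern and the JVM options contain regex metacharacters such as + and .; writing \Q$ochunk\E would quote them.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for $monster->{$domain}.
my %servers = (
    App_MS1 => { nodeName => 'App_2123', tierName => 'App' },
);

# Append agent settings just before each closing </arguments> tag.
# The server name is taken from the "<name>.out" log path inside the element;
# elements whose server is not in %servers are returned unchanged.
sub add_agent_args {
    my ($xml, $jar) = @_;
    $xml =~ s{ (<arguments>) (.*?) (</arguments>) }{
        my ($otag, $text, $ctag) = ($1, $2, $3);
        my ($ms) = $text =~ m{.*/(.*?)\.out};
        my $data = defined $ms ? $servers{$ms} : undef;
        $data
            ? "$otag$text $jar -Dappdynamics.agent.nodeName=$data->{nodeName}$ctag"
            : "$otag$text$ctag";
    }segx;
    return $xml;
}
```

The captures are copied into lexicals first because the inner m{...} match overwrites $1, $2 and $3.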

Related

Remove matching words from the string using Perl

I want to remove the words Z or ZN and LVT from the strings present in my file but I couldn't get it. Can someone check my code.
Input
abchsfk/jshflka/ZN (cellLVT)
asjkfsa/sfklfkshfsf/Z (mobLVT)
asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z (celLVT)
asjhdjs/jhskjds/ZN (abcLVT)
shdsjk/jhskd/ZN (xyzLVT)
Output
abchsfk/jshflka cell
asjkfsa/sfklfkshfsf mob
asjhfdjkfd/sjfdskjfhdk/hsakfshf cel
asjhdjs/jhskjds abc
shdsjk/jhskd xyz
CODE:
if ($line =~ /LVT/ && ($line =~ /ZN/ || $line =~ /Z/) )
#### matches the words LVT and ( Z or ZN)
{
my @names = split / /, $line; ##### splits the line
$names[2] =~ s/\/Z|/ZN//g; #### remove Z or ZN
$names[3] =~ s/\(|LVT\)//g ; #### remove LVT & braces
print OUT " $names[2] $names[3] \n"; #### print
}
The problem is the order of matching: s/\/Z|\/ZN//g (the second backslash is missing in your code!). You should match the longer string first, otherwise Z will match and N won't be deleted.
There's an even easier way, though: just use \/ZN?:
#!/usr/bin/perl
use warnings;
use strict;
while (my $line = <DATA>) {
if ($line =~ /LVT/ && $line =~ /ZN?/) {
my @names = split ' ', $line;
$names[0] =~ s/\/ZN?//g;
$names[1] =~ s/\(|LVT\)//g;
print "$names[0] $names[1]\n";
}
}
__DATA__
abchsfk/jshflka/ZN (cellLVT)
asjkfsa/sfklfkshfsf/Z (mobLVT)
asjhfdjkfd/sjfdskjfhdk/hsakfshf/Z (celLVT)
asjhdjs/jhskjds/ZN (abcLVT)
shdsjk/jhskd/ZN (xyzLVT)
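To see why the order of the alternatives matters, here is a small sketch comparing the three patterns discussed above. The sub names are made up for this illustration.

```perl
use strict;
use warnings;

# Shorter alternative first: "/Z" wins, so the "N" of "/ZN" is left behind.
sub strip_short_first { my ($s) = @_; $s =~ s{/Z|/ZN}{}g; return $s }

# Longer alternative first: "/ZN" is tried before "/Z".
sub strip_long_first  { my ($s) = @_; $s =~ s{/ZN|/Z}{}g; return $s }

# Optional "N": a single pattern that covers both cases.
sub strip_optional    { my ($s) = @_; $s =~ s{/ZN?}{}g;   return $s }

print strip_short_first('abchsfk/jshflka/ZN'), "\n";   # abchsfk/jshflkaN
print strip_long_first('abchsfk/jshflka/ZN'),  "\n";   # abchsfk/jshflka
print strip_optional('abchsfk/jshflka/ZN'),    "\n";   # abchsfk/jshflka
```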

regular expression preg replace omit a character

I use this function to replace relative links with absolute ones and pass them as parameters to the page, which streams them with file_get_contents. I think there is a problem in my regular expression: it omits a character.
Here is the function:
$pattern = "/<a([^>]*) " .
"href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
$replace = "<a\${1} href=\"?u=" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<a([^>]*) " .
"href='[^http|ftp|https|mailto]([^\']*)'/";
$replace = "<a\${1} href=\"?u=" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<img([^>]*) " .
"src=\"[^http|ftp|https]([^\"]*)\"/";
$replace = "<img\${1} src=\"" . $base . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
$pattern = "/<a([^>]*) " .
"href=\"([^\"]*)\"/";
$replace = "<a\${1} href=\"?u=" . "\${2}\"";
$text = preg_replace($pattern, $replace, $text);
So
"UsersList.aspx?dir=09"
with this $base URL:
http://www.some-url.com/Members/
should be replaced with
"?u=http://www.some-url.com/Members/UsersList.aspx?dir=09"
but I get
"?u=http://www.some-url.com/Members/sersList.aspx?dir=09"
I don't know what the problem in my regular expression is or how to fix it.
I guess your a tag looks something like <a href="UsersList.aspx?dir=09">, and it will not give the result you want with this pattern:
$pattern = "/<a([^>]*) " . "href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
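In that pattern, [^http|ftp|https|mailto] is a negated character class, not a negated list of words: it matches exactly one character that is not one of h, t, p, |, f, s, m, a, i, l or o. That one character is consumed outside the capture group, which is why the 'U' goes missing.
Try removing it: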
$pattern = "/<a([^>]*) " . "href=\"([^\"]*)\"/";
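If the intent was to skip links that are already absolute, a negative lookahead expresses that without consuming a character. The sketch below is in Perl rather than PHP (PCRE accepts the same (?!...) syntax); the absolutize_links name and the exact scheme list are assumptions for illustration, and it only handles double-quoted href values.

```perl
use strict;
use warnings;

# Prefix relative href values with "?u=$base".
# (?!https?:|ftp:|mailto:) asserts that the value does not start with one of
# those schemes, but matches zero characters, so nothing is dropped.
sub absolutize_links {
    my ($html, $base) = @_;
    $html =~ s{(<a\b[^>]*?\shref=")(?!https?:|ftp:|mailto:)([^"]*)"}{$1?u=$base$2"}gi;
    return $html;
}

print absolutize_links('<a href="UsersList.aspx?dir=09">users</a>',
                       'http://www.some-url.com/Members/'), "\n";
```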

extracting specific URL based on a condition from a log file in perl

I have this log file and I have to extract REQ-URL (Actually it is 2 lines long) of only Passed test cases. How do I check first if it is a Passed test case then extract it ? This list is 10 pages long. Please someone help me with this issue. I am stuck with this issue for a while now.
[Case MostPopular-BlogFlag]
[REQ-URL]: http://hostname:8080/guiderest?
customerId=cisco&code=news&guide=MostPopular&attrFilter=BlogFlag:true&v=1
***Passed!***
.
.
.
[Case MostPopular-BlogFlag]
[REQ-URL]: http://hostname:8080/guiderest?
customerId=cisco&code=news&guide=MostPopular&attrFilter=BlogFlag:true&v=1
***Failed!***
Thank you well in advance.
You can do it like this:
#!/usr/bin/perl
use strict;
my $string = '[Case MostPopular-BlogFlag1]
[REQ-URL]: http://hostname:8080/guiderest?
customerId=cisco&code=news&guide=MostPopular&attrFilter=BlogFlag:true&v=1
***Passed!***
.
.
.
[Case MostPopular-BlogFlag]
[REQ-URL]: http://hostname:8080/guiderest?
customerId=cisco&code=news&guide=MostPopular&attrFilter=BlogFlag:true&v=1
***Failed!***';
while($string =~ /\[Case\h+(?<case>[^]]+)]\s*\[REQ-URL]:\h+(?<url>\S+(\?\R\S+)?)\s*\*+Passed!\*+/g) {
print $+{case} . "\n" . $+{url} . "\n\n";
}
The pattern fails to match a case when ***Passed!*** does not follow the URL, so failed cases are skipped.
You have to implement a basic state machine.
if ( $line =~ /REQ-URL/ ) {
$maybe_line = $line ;
$append = 1 ;
} elsif ( $line =~ /\*\*\*(Passed|Failed)/ ) {
if ( $1 =~ /Passed/ ) {
output_line($maybe_line,$line) ;
}
$append = "" ;
$maybe_line = "" ;
} else {
if ( $append ) {
$maybe_line .= $line ;
}
}
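A runnable sketch of that state machine, assuming the log layout shown in the question. The passed_urls name is hypothetical, and instead of calling output_line it simply returns the joined URLs of the passed cases.

```perl
use strict;
use warnings;

# Walk the log line by line and collect the request URL of every passed case.
# $url is defined only while a URL is being collected.
sub passed_urls {
    my @lines = @_;
    my (@urls, $url);
    for my $line (@lines) {
        chomp( my $text = $line );
        if ( $text =~ /^\[REQ-URL\]:\s*(.*)/ ) {
            $url = $1;                      # first line of the URL
        }
        elsif ( $text =~ /^\*\*\*(Passed|Failed)!/ ) {
            push @urls, $url if defined $url && $1 eq 'Passed';
            undef $url;                     # verdict seen: reset the state
        }
        elsif ( defined $url && $text !~ /^\[Case / ) {
            $url .= $text;                  # continuation line of the URL
        }
    }
    return @urls;
}
```

It could be called as passed_urls(<$fh>) on an open log file handle.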
One option is to set Perl's record separator ($/) to "[Case MostPopular-BlogFlag]", so the log is read in 'chunks' separated by that string. Next, use a regex that includes "***Passed!" and if it's not there, get the next record. If found, remove any newlines from the captured URL, and then print it:
use strict;
use warnings;
local $/ = '[Case MostPopular-BlogFlag]';
while (<>) {
next unless my ($url) = /\[REQ-URL\]:\s+([^*]+)\*\*\*Passed!/;
$url =~ s/\n//g;
print "$url\n";
}
Usage: perl script.pl inFile [>outFile]
The last optional parameter directs output to a file.
Hope this helps!

$2 not working in Perl search and replace for not-so-special case

Dear stackoverflow community,
I am trying to add entries to a tsv file using webpage-based curl lookups per the following:
#!/usr/bin/env perl
my $file = "TfbG_peaks2.tsv";
open(INFO, $file) or die("Could not open file.");
#my $VNG = "VNG1649G";
my $query = "(\<title\>)([A-Za-z0-9\- ]*)";
foreach $line (<INFO>) {
$line =~ /(^VNG\w*)/;
$VNG = $1;
my $url = "http://www.ncbi.nlm.nih.gov/gene/?term=$VNG";
my $page = `curl $url`;
if ($page =~ /(\<title\>)(VNG)/) {
$name = "hypothetical protein";
$abbrev = " ";
$longname = $name;
}
elsif ($page =~ /$query/) {
$name = $2;
$name =~ /^(\w+)(\s+)(([A-Za-z0-9\-]+\s*)+)/;
$abbrev = $1;
$longname = $3;
}
my @values = split('\t', $line);
splice @values, 1, 0, $abbrev;
splice @values, 2, 0, $longname;
print join "\t", @values;
print "\n";
}
The input tsv data file has lines that look like
VNG1374G Chromosome 1022977 1023252 4.184852806 2.877295983 3.362660404 3.961922335 3.932399564
or
VNGt26 Chromosome 1153828 1154334 4.879550683 3.730707809 5.515198268 5.30410069 5.328461226
The only line in the page source of the webpages I am looking up that contains the <title> HTML tag is like
<title>trn26 [Halobacterium sp. NRC-1] - Gene - NCBI</title>
for entries with a trn name and like
<title>gspE1 type II secretion system protein [Halobacterium sp. NRC-1] - Gene - NCBI</title>
or like
<title>VNG1872C hypothetical protein [Halobacterium sp. NRC-1] - Gene - NCBI</title>
for entries with a non-trn name.
The code works for non-trn names, i.e. prints either something like
VNG0218G gspE1 type II secretion system protein Chromosome 186556 186979 4.072750978 2.233376793 2.684902216 3.714576271 3.52083442
or something like
VNG2556H hypothetical protein Chromosome 1917796 1918082 3.778968581 2.582944032 2.981130347 3.940093432 4.286983604
but for trn entries prints
VNGt26 <title> Chromosome 1153828 1154334 4.879550683 3.730707809 5.515198268 5.30410069 5.328461226
instead of the expected
VNGt26 trn26 Chromosome 1153828 1154334 4.879550683 3.730707809 5.515198268 5.30410069 5.328461226
Why should the trn case be any different? The webpage source lines seem formatted in the same manner for the trn and non-trn cases and I can't see why my regex would fail for this case.
Also, I'm new to Perl so any suggestions on organization or keeping code clean & concise are welcome :).
Many thanks,
Michael
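For the trn pages, $query captures just "trn26 " into $2, so $name contains a single word followed by a space. The pattern in $name =~ /^(\w+)(\s+)(([A-Za-z0-9\-]+\s*)+)/; needs at least one more word after the whitespace, so it doesn't match, and thus doesn't change $1, $2, etc. They still hold the values from the previous successful match, $page =~ /$query/, where $1 is "<title>" and $3 was never set. That is why $abbrev ends up as <title> and $longname is empty.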
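One way to avoid reading stale capture variables is to use $1 and friends only inside the branch where the match succeeded. A minimal sketch, with a hypothetical split_name helper and a simplified pattern:

```perl
use strict;
use warnings;

# Split a title fragment such as "gspE1 type II secretion system protein"
# into (abbreviation, long name). A single-word name such as "trn26" has no
# long name, so the match fails and the fallback branch is used instead of
# whatever $1 and $2 held before.
sub split_name {
    my ($name) = @_;
    $name =~ s/\s+\z//;                       # drop trailing whitespace
    if ( $name =~ /^(\w+)\s+(\S.*)$/ ) {
        return ($1, $2);
    }
    return ($name, '');
}

my ($abbrev, $longname) = split_name('trn26 ');
print "$abbrev\t$longname\n";
```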

How to extract the text between two patterns using REGEX perl

In the following lines how can I store the lines between "Description:" and "Tag:" in a variable using REGEX PERL and what would be a good datatype to use, string or list or something else?
(I am trying to write a program in Perl to extract the information of a text file with Debian package information and convert it into a RDF(OWL) file(ontology).)
Description: library for decoding ATSC A/52 streams (development)
liba52 is a free library for decoding ATSC A/52 streams. The A/52 standard is
used in a variety of applications, including digital television and DVD. It is
also known as AC-3.
This package contains the development files.
Homepage: http://liba52.sourceforge.net/
Tag: devel::library, role::devel-lib
The code I have written so far is:
#!/usr/bin/perl
open(DEB,"Packages");
open(ONT,">>debianmodelling.txt");
$i=0;
while(my $line = <DEB>)
{
if($line =~ /Package/)
{
$line =~ s/Package: //;
print ONT ' <package rdf:ID="instance'.$i.'">';
print ONT ' <name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</name>'."\n";
}
elsif($line =~ /Priority/)
{
$line =~ s/Priority: //;
print ONT ' <priority rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</priority>'."\n";
}
elsif($line =~ /Section/)
{
$line =~ s/Section: //;
print ONT ' <Section rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Section>'."\n";
}
elsif($line =~ /Maintainer/)
{
$line =~ s/Maintainer: //;
print ONT ' <maintainer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</maintainer>'."\n";
}
elsif($line =~ /Architecture/)
{
$line =~ s/Architecture: //;
print ONT ' <architecture rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</architecture>'."\n";
}
elsif($line =~ /Version/)
{
$line =~ s/Version: //;
print ONT ' <version rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</version>'."\n";
}
elsif($line =~ /Provides/)
{
$line =~ s/Provides: //;
print ONT ' <provides rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</provides>'."\n";
}
elsif($line =~ /Depends/)
{
$line =~ s/Depends: //;
print ONT ' <depends rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</depends>'."\n";
}
elsif($line =~ /Suggests/)
{
$line =~ s/Suggests: //;
print ONT ' <suggests rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</suggests>'."\n";
}
elsif($line =~ /Description/)
{
$line =~ s/Description: //;
print ONT ' <Description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Description>'."\n";
}
elsif($line =~ /Tag/)
{
$line =~ s/Tag: //;
print ONT ' <Tag rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Tag>'."\n";
print ONT ' </Package>'."\n\n";
}
$i=$i+1;
}
my $desc = "Description:";
my $tag = "Tag:";
$line =~ /$desc(.*?)$tag/;
my $matched = $1;
print $matched;
or
my $desc = "Description:";
my $tag = "Tag:";
my @matched = $line =~ /$desc(.*?)$tag/;
print $matched[0];
or
my $desc = "Description:";
my $tag = "Tag:";
(my $matched = $line) =~ s/$desc(.*?)$tag/$1/;
print $matched;
Additional
If your Description and Tag may be on separate lines, you may need to use the /s modifier, to treat it as a single line, so the \n won't wreck it. Example:
$_=qq{Description:foo
more description on
new line Tag: some
tag};
s/Description:(.*?)Tag:/$1/s; # note the trailing /s modifier
print;
Assuming:
my $example; # holds the example text above
You could:
(my $result=$example)=~s/^.*?\n(Description:)/$1/s; # strip up to first marker
$result=~s/(\nTag:[^\n]*\n).+$/$1/s; # strip everything after second marker line
Or
(my $result=$example)=~s/^.*?\n(Description:.+?Tag:[^\n]*\n).*$/$1/s;
Both assume the Tag: value is contained on a single line.
If this is not the case, you might try:
(my $result=$example)=~s/
( # start capture
Description: # literal 'Description:'
.+? # any chars (non-greedy) up to
Tag: # literal 'Tag:'
.+? # any chars up to
)
(?: # either
\n[A-Z][a-z]+\: # another tagged value name
| # or
$ # end of string
)
/$1/sx;
I believe that the problem is caused by using a line-reading loop for data structured by paragraphs. If you can slurp the file into memory and apply split with a captured delimiter, the processing will be much smoother:
#!/usr/bin/perl -w
use strict;
use diagnostics;
use warnings;
use English;
# simple sample sub
my $printhead = sub {
printf "%5s got the tag '%s ...'\n", '', substr( shift, 0, 30 );
};
# map keys/tags? to functions
my %tagsoups = (
'PackageName' => sub {printf "%5s got the name '%s'\n", '', shift;}
, 'Description' => sub {printf "%5s got the description:\n---------\n%s\n----------\n", '', shift;}
, 'Tag' => $printhead
);
# slurp Packages (fallback: parse using $INPUT_RECORD_SEPARATOR = "Package:")
open my $fh, "<", './Packages-00.txt' or die $!;
local $/; # enable localized slurp mode
my $all = <$fh>;
my @pks = split /^(Package):\s+/ms, $all;
close $fh;
# outer loop: Packages
for (my $p = 1, my $n = 0; $p < scalar @pks; $p +=2) {
my $blk = "PackageName: " . $pks[$p + 1];
my @inf = split /\s*^([\w-]+):\s+/ms, $blk;
printf "%3d %s named %s\n", ++$n, $pks[$p], $inf[ 2 ];
# inner loop: key-value-pairs (or whatever they are called)
for (my $x = 1; $x < scalar @inf; $x += 2) {
if (exists($tagsoups{$inf[ $x ]})) {
$tagsoups{$inf[ $x ]}($inf[$x + 1]);
}
}
}
output for a shortened Packages file from my Ubuntu Linux:
3 Package named abrowser-3.5-branding
got the PackageName:
---------
abrowser-3.5-branding
----------
got the Description:
---------
dummy upgrade package for firefox-3.5 -> firefox
This is a transitional package so firefox-3.5 users get firefox on
upgrades. It can be safely removed.
----------
4 Package named casper
got the PackageName:
---------
casper
----------
got the Description:
---------
Run a "live" preinstalled system from read-only media
----------
got the Tag:
---------
admin::boot, admin::filesystem, implemented-in::shell, protocol::smb, role::plugin, scope::utility, special::completely-tagged, works-with-format::iso9660
----------
Using a hash for the functions to apply to the extracted parts will keep the details of generating xml out of the parser loops.
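On the question of which datatype to use, a hash keyed by field name is one reasonable choice. The sketch below parses a single package stanza into such a hash. The parse_stanza name is hypothetical, and it assumes that continuation lines (such as the extra lines of a Description) begin with whitespace, as they do in Debian control files.

```perl
use strict;
use warnings;

# Parse one stanza of "Field: value" lines into a hash.
# Assumption: a line starting with whitespace continues the previous field.
sub parse_stanza {
    my ($stanza) = @_;
    my (%field, $last);
    for my $line ( split /\n/, $stanza ) {
        if ( $line =~ /^([\w-]+):\s*(.*)$/ ) {
            $last = $1;                       # a new field starts here
            $field{$last} = $2;
        }
        elsif ( defined $last && $line =~ /^\s+(.*)$/ ) {
            $field{$last} .= "\n$1";          # continuation of the last field
        }
    }
    return %field;
}
```

With the file read in paragraph mode (local $/ = "";), each record returned by <$fh> is one stanza, and the text between Description: and Tag: is then simply $field{Description}.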