Text Pattern Processing in paragraph with unix linux utilities


I have a file with the following pattern (please note this is a file generated using sed,
awk, grep etc processing). The part of file input is as follows.
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
I want the output like
filename1,a/b/c/d/e.properties,
filename2,f/g/h/i/j.properties,
filename3, k/l/m/n/o.properties,
I could not find a solution with sed, awk, or grep, so I am stuck. Please let me know if you know a solution using these unix utilities or any other language or platform.
Regards,
Suhaas

Assuming you generated the original file, and therefore it is safe to execute it as a script:
sed -e 's/^.*,/FILE=&/' \
-e 's/^.*=\$CONFIG/PROPFILE=$CONFIG/' \
-e 's/^EndOfFile.*/echo $FILE $PROPFILE/' < yourInputFile | sh
This converts each section of your file into the form:
FILE=filename1,
BASE=a/b/c
CONFIG=$BASE/d
PROPFILE=$CONFIG/e.properties
echo $FILE $PROPFILE
... and then sends it into a shell for processing.
Line-by-line explanation:
Line 1: Searches for the lines ending in a comma (the filenames), and sets FILE to the name.
Line 2: Searches for lines that set the properties file, and renames the variable to PROPFILE.
Line 3: Replaces the EndOfFile lines with a command to echo the file name and the properties file, then pipes it into a shell.
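If piping generated text into sh is not an option, the same variable expansion can be done without a shell. The following is only a Python sketch under the assumption that every block starts with a line ending in a comma, ends with an EndOfFile line, and defines propertiesfile1; the function name resolve_blocks is made up for illustration.

```python
import re

def resolve_blocks(text):
    """Return one 'name,path,' line per block without running a shell."""
    results = []
    name, variables = None, {}
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(","):              # "filename1," starts a new block
            name, variables = line[:-1], {}
        elif line.startswith("EndOfFile"):  # block is complete: emit the result
            results.append(f"{name},{variables['propertiesfile1']},")
        elif "=" in line:
            key, value = line.split("=", 1)
            # Replace each $VAR with the value recorded earlier in this block.
            variables[key] = re.sub(
                r"\$(\w+)", lambda m: variables[m.group(1)], value
            )
    return results

sample = """filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
"""

for row in resolve_blocks(sample):
    print(row)
```

A reference to a variable that was never set in the block raises KeyError here, which is roughly the opposite of the shell's silent empty expansion.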

This is an excellent use case for structural regular expressions, which have been implemented as a Python library, among other places. Here's an article which describes how to emulate SREs in Perl.
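The core idea (first select a whole block with one pattern, then apply a second pattern inside it) can be roughly approximated with Python's standard re module. This is not the library mentioned above, just a sketch; it assumes each variable only references the one defined directly above it, so the literal path segments can simply be concatenated. The names BLOCK, SEGMENT and join_segments are invented for the example.

```python
import re

# Outer pattern: one whole block, from "name," up to its EndOfFile line.
BLOCK = re.compile(r"^(?P<name>[^\n,]+),\n(?P<body>.*?)^EndOfFile", re.M | re.S)
# Inner pattern: the literal path segment on each assignment line,
# skipping an optional leading $VARIABLE reference.
SEGMENT = re.compile(r"^\w+=(?:\$\w+)?(/?[^\n$]+)$", re.M)

def join_segments(text):
    """Return 'name,full/path' for every block found in text."""
    return [
        m["name"] + "," + "".join(SEGMENT.findall(m["body"]))
        for m in BLOCK.finditer(text)
    ]

sample = """filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
"""

print("\n".join(join_segments(sample)))
```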

And here is an awk script to process that input and generate what you want:
BEGIN {
FS="="
state = 0;
base = "";
config = "";
prop = "";
filename = "";
dbg = 0;
}
/^BASE=/ {
if (dbg) {
print "BASE";
print $0;
}
if (state != 1) {
print "Error base!";
exit 1;
}
state++;
base = $2;
if (dbg > 1) printf ("BASE = %s\n", base);
}
/^CONFIG=/ {
if (dbg) {
print "CONFIG";
print $0;
}
if (state != 2) {
print "Error config!";
exit 1;
}
state++;
config = $2;
sub (/\$BASE/, base, config);
if (dbg > 1) printf ("CONFIG = %s\n", config);
}
/^propertiesfile1=/ {
if (dbg) {
print "PROP";
print $0;
}
if (state != 3) {
print "Error pF!";
exit 1;
}
state++;
prop = $2;
sub (/\$CONFIG/, config, prop);
}
/^EndOfFile/ {
if (dbg) {
print "EOF";
print $0;
}
if (state != 4) {
print "Error EOF!";
print state;
exit 1;
}
state = 0;
printf ("%s%s,\n", filename, prop);
}
/,$/{
if (dbg) {
print "FILENAME";
print $0;
}
if (state != 0) {
print "Error filename!";
print state;
exit 1;
}
state++;
filename = $1;
}

gawk
gawk -vRS= 'BEGIN{FS="BASE[=]?|CONFIG|\n"}
{
s=$1
for(i=1;i<=NF;i++){
if($i~/\// ){ s=s $i }
}
print s
s=""
}' file
output
$ more file
filename1,
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
EndOfFilefilename1
filename2,
BASE=f/g/h
CONFIG=$BASE/i
propertiesfile1=$CONFIG/j.properties
EndOfFilefilename2
filename3,
BASE=k/l/m
CONFIG=$BASE/n
propertiesfile1=$CONFIG/o.properties
EndOfFilefilename3
$ ./shell.sh
filename1,a/b/c/d/e.properties
filename2,f/g/h/i/j.properties
filename3,k/l/m/n/o.properties

A perl script that does what you want would be something like (note this is untested)
while (<>) {
$base = $1 if (m/BASE=(.+)/);
$config = $1 if (m/CONFIG=(.+)/);
if (m/propertiesfile1=(.+)/) {
$props = $1;
$props =~ s/\$CONFIG/$config/;
$props =~ s/\$BASE/$base/;
print $ARGV . ", " . $props . "\n";
}
}
you give the script the filenames as arguments.

Multi-steps but it works!
cat yourInputFile | egrep ',|\/' | \
sed -e "s/^.*=//g" -e "s/\$.*\(\/.*\)/\1/g" | \
awk '{if($0 ~ "properties") print $0; else printf $0}'
The egrep grabs the lines containing a "," or a "/" and so eliminates the last line:
BASE=a/b/c
CONFIG=$BASE/d
propertiesfile1=$CONFIG/e.properties
The sed reduces the output to:
filename1,
a/b/c
/d
/e.properties
The awk portion reassembles the line to:
filename1,a/b/c/d/e.properties

Related

Why is this switch deleting an extra line (PowerShell)

This code is supposed to find a line with a regular expression and replace the line with "test". It finds that line and replaces it with "test", but it also deletes the line under it, no matter what that line contains. I feel like I am just missing something about how a switch works in PowerShell.
Note: This is super boiled down code. There is a larger program this is part of.
$reg = '^HI\*BH'
$appendText = ''
$file = Get-ChildItem (join-path $PSScriptRoot "a.txt.BAK")
foreach ($f in $file){
switch -regex -file $f {
$reg
{
$appendText = "test"
}
default {
If ($appendText -eq '') {$appendText = $_}
$appendText
$appendText = ''
}
}
}
a.txt.BAK
HI*BH>00>D8>0*BH>00>D8>0*BH>A1>D8>0*BH>B1>D8>0000000~
HI*BE>02>>>0.00*BE>00>>>0.00~
NM1*71*1*TTT*NAME****XX*0000000~
PRV*AT*PXC*000V00000X~
Output:
test
NM1*71*1*TTT*NAME****XX*0000000~
PRV*AT*PXC*000V00000X~

The switch is not "deleting" anything - but you explicitly ask it to overwrite $appendText on match, and you only ever output (and reset the value of) $appendText when it doesn't match.
This code is supposed to find a line with a regular expression and replace the line with "test".
In that case I suggest you simplify your switch:
switch -regex -file $f {
$reg {
"test"
}
default {
$_
}
}
That's it - no fiddling around with variables - just output "test" on match, otherwise output the line as-is.
If you insist on using the intermediate variable, you'll need to output + reset the value in both cases:
switch -regex -file $f {
$reg {
$appendText = "test"
$appendText
$appendText = ''
}
default {
$appendText = $_
$appendText
$appendText = ''
}
}
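To see why the line after the match goes missing, it can help to replay the control flow outside PowerShell. Here is a small Python sketch of both variants; the function names are invented, and the input is the a.txt.BAK content from the question.

```python
import re

PATTERN = re.compile(r"^HI\*BH")

def original_logic(lines):
    """Mirror of the question's switch: the match branch only stores a value."""
    out, pending = [], ""
    for line in lines:
        if PATTERN.search(line):
            pending = "test"      # stored, but nothing is emitted here
        else:
            if pending == "":
                pending = line
            out.append(pending)   # emits "test" in place of the *next* line
            pending = ""
    return out

def fixed_logic(lines):
    """Emit exactly one output line per input line."""
    return ["test" if PATTERN.search(line) else line for line in lines]

data = [
    "HI*BH>00>D8>0*BH>00>D8>0*BH>A1>D8>0*BH>B1>D8>0000000~",
    "HI*BE>02>>>0.00*BE>00>>>0.00~",
    "NM1*71*1*TTT*NAME****XX*0000000~",
    "PRV*AT*PXC*000V00000X~",
]

print(original_logic(data))
print(fixed_logic(data))
```

The first call reproduces the three-line output from the question: "test" is only written when the following non-matching line is processed, so that line's own text is never emitted.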

How to match a regex pattern for multiple files under a directory in perl?

I wrote a script to match a pattern and return a statement for a file
#!/usr/bin/perl
use strict;
use warnings;
my $file = '/home/Sidtest/sid.txt';
open my $info , $file or die " Couldn't open the $file:$!";
while( my $line = <$info>) {
if ($line =~ m/^#LoadModule ssl_module/) {
print "FileName =",$file," Status = Failed \n";
}
elsif ($line =~ m/^LoadModule ssl_module/) {
print "FileName =",$file," Status = Passed \n";
}
}
close $info;
So now I am trying to modify this script to work for multiple files under the same directory. I haven't been able to do that successfully. Can anyone please help in how I can make it work for any number of files in a directory.

This will read every file in ./directory and foreach file, print out each line.
The print statement can be altered to print if /match/, or whatever you want:
my @dir = <directory/*>;
foreach my $file (@dir){
open my $input, '<', $file;
while (<$input>){
print "PASS: $_\n" if m/^#LoadModule ssl_module/;
[...]
}
}

The variable @ARGV contains a list of arguments sent to the script when started. Loop through @ARGV and call the script with the files you want to process:
#!/usr/bin/perl
use strict;
use warnings;
foreach my $file (@ARGV) {
open my $info , $file or die " Couldn't open the $file:$!";
while( my $line = <$info>) {
if ($line =~ m/^#LoadModule ssl_module/) {
print "FileName =",$file," Status = Failed \n";
}
elsif ($line =~ m/^LoadModule ssl_module/) {
print "FileName =",$file," Status = Passed \n";
}
}
close $info;
}
# process all files *.txt in your dir: ./myscript.pl /home/Sidtest/*.txt

Check perldoc perlrun, and look at the -p and -n parameters. Essentially, they treat your script as if it were the contents of a loop over stdin, where stdin is generated by iterating through the files supplied on the command line. The name of the file currently-being-processed can be accessed using the $ARGV variable.
So, you might go for an approach where your whole script looks more like this, using the -n param, where $_ contains the current line:
if ( m/^#LoadModule ssl_module/) {
print "FileName =",$ARGV," Status = Failed \n";
} elsif (m/^LoadModule ssl_module/) {
print "FileName =",$ARGV," Status = Passed \n";
}
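For comparison, Python's standard fileinput module follows the same model: it iterates over the lines of all given files and can report which file the current line came from. A minimal sketch, where the function name ssl_status and the two throwaway demo files are assumptions made for illustration:

```python
import fileinput
import os
import re
import tempfile

def ssl_status(paths):
    """Yield (filename, status) for every LoadModule ssl_module line found."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if re.match(r"#LoadModule ssl_module", line):
                yield stream.filename(), "Failed"
            elif re.match(r"LoadModule ssl_module", line):
                yield stream.filename(), "Passed"

# Demo on two temporary files standing in for the real configuration files.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.conf")
    second = os.path.join(tmp, "b.conf")
    with open(first, "w") as handle:
        handle.write("#LoadModule ssl_module modules/mod_ssl.so\n")
    with open(second, "w") as handle:
        handle.write("Listen 80\nLoadModule ssl_module modules/mod_ssl.so\n")
    results = [(os.path.basename(n), s) for n, s in ssl_status([first, second])]

for name, status in results:
    print(f"FileName = {name} Status = {status}")
```

Note that fileinput falls back to the command-line arguments (and then stdin) when the list of paths is empty, so an empty list should be handled separately if that is not wanted.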

perl - remove new line after comma within function definitions only

Regex command to remove new line after comma within function definitions only.
ChainCtrlUpdateHandler defaultUpdateHandlers[kVideoRouteNodeMax] =
{
ChainCtrlUpdateMonitorRoute,
ChainCtrlUpdateVideoOutRoute,
};
eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,
char* name,
int instance,
void* pOwner,
)
{
...
}
OUTPUT DESIRED
ChainCtrlUpdateHandler defaultUpdateHandlers[kVideoRouteNodeMax] =
{
ChainCtrlUpdateMonitorRoute,
ChainCtrlUpdateVideoOutRoute,
};
eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,char* name,int instance,void* pOwner)
{
....
}
NOTE
There are many function definitions in the .c file
MY CODE
open(my $FILE, "< chaincontroller.c") or die $!;
my @arr = <$FILE>;
foreach (@arr){
$_ =~ s/,\n/,/;
print $_;
}
It removes ',\n' everywhere, but I need it to be done for function definitions only.

Try doing this :
perl -00 -ple 's/,\s*\n/,/gms if /\(/ .. /\)/' filename.txt
With a script :
perl -MO=Deparse -00 -ple 's/,\s*\n/,/gms if /\(/ .. /\)/' filename.txt
code :
BEGIN { $/ = ""; $\ = "\n\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
s/,\s*\n/,/gms if /\(/ .. /\)/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
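A different way to restrict the substitution is to match each parenthesised group first and only rewrite inside it. The Python sketch below is an illustration of that idea, not a C parser: it assumes parameter lists contain no nested parentheses (so function-pointer parameters would be missed), it would also touch multi-line function calls, and join_parameter_lists is a made-up name.

```python
import re

def join_parameter_lists(source):
    """Collapse newlines after commas, but only inside (...) groups."""
    def collapse(match):
        inner = re.sub(r",\s*\n\s*", ",", match.group(0))
        # Drop a trailing comma left directly before the closing parenthesis.
        return re.sub(r",\s*\)$", ")", inner)
    # [^()]* keeps each match within one non-nested pair of parentheses.
    return re.sub(r"\([^()]*\)", collapse, source)

sample = """ChainCtrlUpdateHandler defaultUpdateHandlers[kVideoRouteNodeMax] =
{
ChainCtrlUpdateMonitorRoute,
ChainCtrlUpdateVideoOutRoute,
};
eErrorT ChainCtrlInitChains(ChainCtrlT* pChainCtrl,
char* name,
int instance,
void* pOwner,
)
{
}
"""

print(join_parameter_lists(sample))
```

The initializer list in braces is left alone because it never appears inside parentheses.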

Trying to compare web service response and expected xml from file

We're developing in Java for the most part, but we want to integration test (using https://github.com/scottmuc/Pester) our web-services with ms as well. To do this I'm writing PowerShell scripts that connect to a web-service and compare the response to XML that I've loaded from a file.
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = {$true}
$instance = New-WebServiceProxy -Uri "https://localhost:7002/service?WSDL" -Namespace "myspace"
$instance.Credentials = new-object System.Net.NetworkCredential("user", "pass")
...
$reply = $instance.fetchInformation($inputA, $inputB)
[xml]$expected = Get-Content ("expected.xml")
...
However, now I have a $reply that contains objects from the myspace namespace and an $expected that contains an XMLNode. I see two ways I can do this (there are probably many more):
Get the original XML response and compare that. However, I can't seem to find out how to get that.
Serialise the $expected XML into the myspace namespace objects. Is that possible?

You could serialize the response returned by the web service to XML and compare it with the contents of the expected.xml file as strings.
Here's an example:
$writer = New-Object System.IO.StringWriter
$serializer = New-Object System.Xml.Serialization.XmlSerializer($reply.GetType())
$serializer.Serialize($writer, $reply)
$replyAsXml = $writer.ToString()
$expectedReplyAsXml = Get-Content expected.xml
$replyAsXml -eq $expectedReplyAsXml
Note that in this example you need to make sure that XML contained in the expected.xml file matches the one returned by the XmlSerializer also in regard to spacing and indenting. In order to avoid that, you could strip all extra characters (such as spaces and newlines) from the two strings before comparing them.
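The whitespace problem can also be avoided by parsing both documents and discarding whitespace-only text before comparing. The following is a Python sketch of that normalisation step, not PowerShell code; the function name normalized and the two sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET

def normalized(xml_text):
    """Serialize XML with whitespace-only text and tails removed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode")

reply = "<reply><id>7</id><name>abc</name></reply>"
expected = """<reply>
  <id>7</id>
  <name>abc</name>
</reply>"""

print(normalized(reply) == normalized(expected))
```

Only indentation between elements is ignored this way; whitespace inside real text content, attribute order and namespace prefixes still have to match.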

I ended up with a completely different approach. The two XMLs were quite different from each other, so instead I created a custom comparator. This made it possible for me to simply write custom code to ignore uninteresting differences.
This led to a pile of crude code that does the job:
# Assume two arrays of equal length
Function Zip {
Param($a1, $a2)
$sum = New-Object object[] $a1.Count
For ($i = 0; $i -lt $a1.Count; ++$i) {
$sum[$i] = New-Object object[] 2
$sum[$i][0] = $a1[$i]
$sum[$i][1] = $a2[$i]
}
Return ,$sum
}
Function XmlChildNodes2List{
param($nodes)
$myArray = New-Object object[] 0
For ($i = 0; $i -lt $nodes.Count; ++$i) {
$node = $nodes.Item($i)
If ($node -ne $null) {
$myArray += $node
}
}
Return ,$myArray
}
Function ShowContext{
Param($ctx)
" at " + $ctx
}
Function CompareNode{
Param($o1, $o2, $ctx)
Try {
Switch ($o1.GetType().Name) {
"XmlDocument" {
CompareXml $o1.ChildNodes $o2.ChildNodes
}
"XmlChildNodes" {
$olist1 = XmlChildNodes2List $o1 | Sort
$olist2 = XmlChildNodes2List $o2 | Sort
If ($olist1.Count -ne $olist2.Count) {
$msg = "Unequal child node count " + ($olist1 -join ",") + " and " + ($olist2 -join ",") + (ShowContext $ctx)
throw $msg
} Else {
$list = Zip $olist1 $olist2
$value = $true
foreach ($item in $list) {
if ($value -eq $true) {
$value = CompareXml $item[0] $item[1] $ctx
}
}
$value
}
}
"XmlElement" {
If ($o1.LocalName -eq $o2.LocalName) {
If ($o1.LocalName -eq "uninterestingElement" -or $o1.LocalName -eq "uninterestingElement2") {
$true
} Else {
CompareXML $o1.ChildNodes $o2.ChildNodes ($ctx + "/" + $o1.LocalName)
}
} Else {
throw ("Element " + $o1.LocalName + " != " + $o2.LocalName + (ShowContext $ctx))
}
}
"XmlDeclaration" {
$true
}
"XmlText" {
$result = $o1.InnerText.Replace("`r`n","`n")
$expect = $o2.InnerText.Replace("`r`n","`n")
# TODO: Hack to remove timezone from expected dates in format 2005-09-01+02:00, the webservice side of the
# reply to xml-conversion loses them
If ($expect -match "^(\d{4}-\d\d-\d\d)\+\d\d:\d\d$") {
$expect = $Matches[1]
}
If ($result -eq $expect) {
$true
} Else {
throw ($o1.InnerText + " is not equal to " + $o2.InnerText + (ShowContext $ctx))
}
}
Default {
throw ("What to do with node " + $o1.GetType().Name + (ShowContext $ctx))
}
}
} Catch [Exception] {
throw $_
}
}
Function CompareXML{
Param($o1, $o2, $ctx)
If ($o1 -eq $null -and $o2 -eq $null) {
$true
} ElseIf ($o1 -eq $null -or $o2 -eq $null) {
throw ("Response or expected is null")
} ElseIf ($o1.GetType() -eq $o2.GetType()) {
CompareNode $o1 $o2 $ctx
} Else {
throw ($o1.GetType().Name + " is not " + $o2.GetType().Name + (ShowContext $ctx))
}
}
This can then be run on two XML's like this:
CompareXML $result $expected ""
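The shape of such a comparator (recurse over the children, skip elements that do not matter, carry a path for error messages) is easier to see in a compact form. This Python sketch is a loose analogue and not a port of the PowerShell above: it sorts children by tag name only, ignores attributes, and the element name in IGNORED is the same placeholder used in the answer.

```python
import xml.etree.ElementTree as ET

IGNORED = {"uninterestingElement"}   # placeholder name, as in the answer

def compare(a, b, ctx=""):
    """Raise ValueError naming the path of the first difference; else return True."""
    if a.tag != b.tag:
        raise ValueError(f"Element {a.tag} != {b.tag} at {ctx}")
    if a.tag in IGNORED:
        return True
    path = f"{ctx}/{a.tag}"
    if (a.text or "").strip() != (b.text or "").strip():
        raise ValueError(f"{a.text!r} is not equal to {b.text!r} at {path}")
    kids_a = sorted(a, key=lambda e: e.tag)
    kids_b = sorted(b, key=lambda e: e.tag)
    if len(kids_a) != len(kids_b):
        raise ValueError(f"Unequal child node count at {path}")
    return all(compare(x, y, path) for x, y in zip(kids_a, kids_b))

left = ET.fromstring("<r><a>1</a><uninterestingElement>x</uninterestingElement></r>")
right = ET.fromstring("<r><uninterestingElement>y</uninterestingElement><a> 1 </a></r>")
print(compare(left, right))
```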

Perl - Printing the next line

I am a noob Perl user trying to get my work done ASAP so I can go home on time today :)
Basically I need to print the next line of blank lines in a text file.
The following is what I have so far. It can locate blank lines perfectly fine. Now I just have to print the next line.
open (FOUT, '>>result.txt');
die "File is not available" unless (@ARGV ==1);
open (FIN, $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
@rawData=<FIN>;
$count = 0;
foreach $LineVar (@rawData)
{
if($_ = ~/^\s*$/)
{
print "blank line \n";
#I need something HERE!!
}
print "$count \n";
$count++;
}
close (FOUT);
close (FIN);
Thanks a bunch :)

open (FOUT, '>>result.txt');
die "File is not available" unless (@ARGV ==1);
open (FIN, $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
$count = 0;
while(<FIN>)
{
if($_ =~ /^\s*$/)
{
print "blank line \n";
$count++;
$_ = <FIN>;
print $_ if defined $_;
}
print "$count \n";
$count++;
}
close (FOUT);
close (FIN);
Not reading the entire file into @rawData saves memory, especially in the case of large files.
A bare <FIN>; statement reads a line but discards it; only the while(<FIN>) condition assigns to $_ implicitly, so the next line is read with an explicit $_ = <FIN>; (which yields undef at end of file, hence the defined check).
print; by itself is a synonym for print $_; (although I went for the more explicit variant this time).

Elaborating on Ron Savage's solution:
foreach $LineVar (@rawData)
{
if ( $lastLineWasBlank )
{
print $LineVar;
$lastLineWasBlank = 0;
}
if($LineVar =~ /^\s*$/)
{
print "blank line \n";
#I need something HERE!!
$lastLineWasBlank = 1;
}
print "$count \n";
$count++;
}

I'd go like this but there's probably other ways to do it:
for ( my $i = 0 ; $i < @rawData ; $i++ ){
if ( $rawData[$i] =~ /^\s*$/ ){
print $rawData[$i + 1] ; ## plus check this is not null
}
}
J.

sh> perl -ne 'if ($b) { print }; if ($b = !/\S/) { ++$c }; END { print $c,"\n" }'
Add input filename(s) to your liking.

Add a variable like $lastLineWasBlank, and set it at the end of each loop.
if ( $lastLineWasBlank )
{
print "blank line\n" . $LineVar;
}
something like that. :-)
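The flag-based approach from the answers above is language independent. As a cross-check, here is the same idea as a minimal Python sketch (the function name and the sample lines are made up):

```python
def lines_after_blank(lines):
    """Return the lines that directly follow a blank (whitespace-only) line."""
    found = []
    previous_blank = False
    for line in lines:
        if previous_blank:
            found.append(line)
        # Remember the state of this line for the next iteration.
        previous_blank = not line.strip()
    return found

sample = ["first\n", "\n", "second\n", "third\n", "   \n", "fourth\n"]
print(lines_after_blank(sample))
```

As with the Perl one-liner, two blank lines in a row mean the second blank line is itself reported as a "next line".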
