How to read from specific line to another line - Perl - regex

I'm doing a Perl script and I have a log file where I need to extract data from. I want to know how to read from a specific line to another line(not end of file).
I tried it this way by putting a last if if it reaches the line that I want to stop at but it doesn't work. The line that I want to start reading from is <TEST_HEAD TH 1> and stops at </TEST_HEAD TH 1>. I'm doing this because my regular expression captures data that I do not need, so I tried to read from a specific line to another line.
This is what I've done so far:
while(<$log_fh>)
{
if($. =~ /\<TEST_HEAD TH 1\>/)
{
if ( /Computer Name:\s*(\S+)(-\d+)/i )
{
$details{tester_name} = $1 . $2;
$details{tester_type} = $1;
push #{$details{tester_arr}}, $1 . $2;
}
elsif ( /Operating System:\s*(.*\S)/i )
{
$details{op_sys} = $1;
}
elsif ( /IG-XL Version:\s*([^;]*)/i )
{
$details{igxl_vn} = $1;
}
elsif ( /^([\d]+)\.\d\s+(\S+)\s+([\d-]*)\s+([\d|\w]*)(?=\s)/ )
{
push #{$details{slot}}, $1;
push #{$details{board_name}}, $2;
push #{$details{part_no}}, $3;
push #{$details{serial_no}}, $4;
}
last if $. == /\<\/TEST_HEAD TH 1\>/;
}
}
Just a modified sample of the raw data file:
<TEST_HEAD TH 1> #Start reading here
(Lines containing data to be captured)
</TEST_HEAD TH 1> #end reading here

Without going much into the nested matching logic you may want to change
if($. =~ /\<TEST_HEAD TH 1\>/)
into
if (/<TEST_HEAD TH 1>/ .. /<\/TEST_HEAD TH 1>/)
What you ask is actually XY problem and it would be better to process xml like document with xml parser. Parsing complex XML in Perl

Without knowing specifically how your data looks, I'd also offer another approach.
Set $/ to a record separator, and then you grab a chunk of text in one go. You can then apply a bunch of different regexes to it all at once.
E.g.:
local $/ = 'TEST_HEAD';
while (<$log_fh>) {
next unless m/^\s*TH/;
my ( $tester_name, $tester_id ) = (m/Computer Name:\s*(\S+)(-\d+)/i);
my ($op_sys) = (m/Operating System:\s*(.*\S)/i);
my ( $slot, $board, $part, $serial ) =
(m/^([\d]+)\.\d\s+(\S+)\s+([\d-]*)\s+([\d|\w]*)(?=\s)/m);
# etc.
# then validate and update your array:
$details{$tester_name} = $tester_name;
## etc.
}

Related

Data extract from MQ output issue in perl code

I'm having problem with my perl code, the code that I wrote is suppose to grab some information from MQ command dis ql(*) all,
below is one of the output example from above command,
AMQ8409: Display Queue details.
QUEUE(XXX.DATATYPE.NETSTATVM) TYPE(QLOCAL)
ACCTQ(QMGR) ALTDATE(2016-08-01)
ALTTIME(18.33.19) BOQNAME( )
BOTHRESH(0) CLUSNL( )
CLUSTER( ) CLCHNAME( )
CLWLPRTY(0) CLWLRANK(0)
CLWLUSEQ(QMGR) CRDATE(2016-08-01)
CRTIME(18.33.19) CURDEPTH(0)
CUSTOM( ) DEFBIND(OPEN)
DEFPRTY(0) DEFPSIST(YES)
DEFPRESP(SYNC) DEFREADA(NO)
DEFSOPT(SHARED) DEFTYPE(PREDEFINED)
DESCR(Queue for XXX.DataType.netstatvm)
DISTL(NO) GET(ENABLED)
HARDENBO INITQ( )
IPPROCS(1) MAXDEPTH(20000)
MAXMSGL(33554432) MONQ(QMGR)
MSGDLVSQ(PRIORITY) NOTRIGGER
NPMCLASS(NORMAL) OPPROCS(0)
PROCESS( ) PUT(ENABLED)
PROPCTL(COMPAT) QDEPTHHI(80)
QDEPTHLO(20) QDPHIEV(DISABLED)
QDPLOEV(DISABLED) QDPMAXEV(ENABLED)
QSVCIEV(NONE) QSVCINT(999999999)
RETINTVL(999999999) SCOPE(QMGR)
SHARE STATQ(QMGR)
TRIGDATA( ) TRIGDPTH(1)
TRIGMPRI(0) TRIGTYPE(FIRST)
USAGE(NORMAL)
Above output is grab from one of the queue in MQ instead of all queue which the command run.
From above, I want to extract the value from QUEUE, CURDEPTH and MAXDEPTH, as below:-
QUEUE(XXX.DATATYPE.NETSTATVM)
CURDEPTH(0)
MAXDEPTH(20000)
So, I wrote a perl code to obtain the value from QUEUE, CURDEPTH and MAXDEPTH, below is my code,
my $qm = XXX;
open (CHS_OUT, "echo 'dis ql(*) all'|runmqsc $qm|");
while (<CHS_OUT>) {
if ( /QUEUE\(/ ){
my $QueueName =~ /QUEUE/(/\S+)/g;
}
if ( /CURDEPTH\(/ ){
my $CurDepth =~ s/\D//g;
chomp $CurDepth;
print "$CurDepth \n";
}
if ( /MAXDEPTH\(/ ){
my $MaxDepth =~ s/\D//g;
chomp $MaxDepth;
print "$MaxDepth \n";
}
}
The output suppose to be as below,
XXX.DATATYPE.NETSTATVM
0
20000
However, I received a multiple error to extract all of this 3 information, one of the error as below,
Use of uninitialized value $MaxDepth in substitution (s///) at mq_test.pl line 26, line 7361.
Use of uninitialized value $MaxDepth in scalar chomp at mq_test.pl line 27, line 7361.
Use of uninitialized value $MaxDepth in concatenation (.) or string at mq_test.pl line 28, line 7361.
This make me confuse since I already do multiples changes of this code but still not success.
You could use the following regular Expression
(?:QUEUE|CURDEPTH|MAXDEPTH)\(\K[^()]+
See a demo on regex101.com.
That is
(?:QUEUE|CURDEPTH|MAXDEPTH) # one of the alternatives
\( # an opening bracket
\K # "forget" everything
[^()]+ # not (), at least once
In Perl this would be:
my #matches = $str =~ /(?:QUEUE|CURDEPTH|MAXDEPTH)\(\K[^()]+/g;
print "#matches\n";
# XXX.DATATYPE.NETSTATVM
# 0
# 20000
=~ is the binding operator. It binds the left hand side string to the match on the right hand side. But you have my $variable on the LHS - so the string is empty. What you want is to match against the implicit variable, and possibly store a part of the match. This is done by normal assignment in list context:
#!/usr/bin/perl
use warnings;
use strict;
while (<>) {
if ( /QUEUE\(/ ) {
my ($QueueName) = /QUEUE\((\S+)\)/;
print "QN: $QueueName\n";
}
if ( /CURDEPTH\(/ ) {
my ($CurDepth) = /CURDEPTH\((\d+)/;
print "CD: $CurDepth\n";
}
if ( /MAXDEPTH\(/ ) {
my ($MaxDepth) = /MAXDEPTH\((\d+)/;
print "MD: $MaxDepth\n";
}
}
You can combine all the regexes into one, too, and use a hash to store the values keyed by the word before the parenthesis:
#!/usr/bin/perl
use warnings;
use strict;
my %info;
while (<>) {
if (my ($key, $value)
= / ( QUEUE | CURDEPTH | MAXDEPTH ) \( ( [^)]+ ) /x
) {
$info{$key} = $value;
}
}
for my $key (keys %info) {
print "$key: $info{$key}\n";
}
I use the following to do something similar with awk:
echo "DIS QL(*) CURDEPTH MAXDEPTH"|runmqsc $qm | grep -o '^\w\+:\|\w\+[(][^)]\+[)]' | awk -F '[()]' -v OFS='\n' 'function printValues() { if ("QUEUE" in p) { print p["QUEUE"], p["CURDEPTH"], p["MAXDEPTH"], "" } } /^\w+:/ { printValues(); delete p; next } { p[$1] = $2 } END { printValues() }'
Output would look like this:
XXX.DATATYPE.NETSTATVM
0
20000
YYY.DATATYPE.NETSTATVM
50
10000

Sorting lines with regex in perl

I am trying to read a log file and write all the error logs to a new file. I must also keep track of how many errors there are and the number of messages in general. I must assume that the logs will be broken up onto multiple lines, so I have been using regex and series a variables to search for all possibilities and write to the appropriate file.
My file handles are: FILE, ERRORFILE, and SUCCESSFILE.
use strict;
use warnings;
my $totalcount = 0;
my $errorcount = 0;
my $log = "s"; # $log controls what what should be written where,
# incase it doesn't start with code.
# "s" = SuccessFile, "e" = ErrorFile
my $logStart = "y"; # used with m/^I/ or m/^E/ instead of full code
# incase the code is broken into pieces.
my $dash = 0;
while (<FILE>) {
$dash += () = $_ =~ m/-/g; # can't use tr/// because it counts at compile
if ( $dash lt 25 ) { next; } # this line skips "---Begin <Repository>---"
elsif ( m/[a-zA-Z <>]/ && $dash lt 25 ) { next; }
elsif ( $dash >= 26 ) { last; } #Ends loop at "---End <Repository>---"
if ( m/^I/ && $logStart eq "y" ) {
$log = "s";
$logStart = "n";
$totalcount++;
next;
} #Ignores nonerror logs
elsif ( m/^E/ && $logStart eq "y" ) {
chomp $_;
print ERRORFILE "$_";
$errorcount++;
$totalcount++;
$log = "e";
$logStart = "n";
}
elsif (m/ \.\n$/) { #End of log
if ( $log eq "s" ) { $logStart = "y"; next; }
print ERRORFILE "$_\n" if $log eq "e";
$logStart = "y";
}
else { #line doesn't start with code or end in " .\n"
chomp $_;
print ERRORFILE "$_" if $log eq "e";
next if $log eq "s";
}
}
print "\nThere are $errorcount error logs.\n";
print "There are $totalcount logs in the full log file.\n";
I know that the non-error logs start with I00020036 and the errors start with E03020039. Both end in " .\n"
---------- Begin <Load Repository> ---------------
I00020036: Loaded C:\Documents and Settings\dorja03\Desktop\DSMProduct\external\etpki\Linux_2.4_x86\redistrib\readme.txt into \DSM R11\external\etpki\Linux_2.4_x86\redistrib\readme.txt .
E03020039: Unable to load C:\Documents and Settings\dorja03\Desktop\DSMProduct\external\etpki\Linux_2.4_x86\redistrib\etpki_install_lib.sh into \DSM R11\external\etpki\Linux_2.4_x86\redistrib\etpki_install_lib.sh . Text file contains invalid characters .
---------- End <Load Repository> ---------------
I have been running a test sample with two lines. If the error comes up first, it will print it to the error file, along with the non-error log, and on the same line. If the non-error goes first, it doesn't recognize the error.
Is this because I'm using m// wrong or something else entirely?
Edit: Test input has been added. I also added the code to skip the header and footer.
Test output: If the non-error comes first, there are 0 errors and 1 log total.
If the non-error comes first, there is 1 error and 1 log total.
If this worked, it should have said there was 1 error and 2 logs. It also would have only printed the error to the ERRORFILE.
This won't answer why your code isn't working, but here's how I would approach the problem:
Since the logs can span over multiple lines, modify the default line-by-line behavior by tweaking $/.
Use appropriate data structures to filter the errors from non-errors. This will also allow you to defer printing till later.
The code would then look something like this:
use strict;
use warnings;
my %logs;
local $/ = " .\n";
while ( <> ) { # Now $_ is the full (multi-line) log
next if /--- Begin/; # Skip if /Begin/
last if /--- End/; # Stop processing if /End/
if ( m/^I/ ) {
push #{ $logs{nonerror} }, $_;
}
if ( m/^E/ ) {
push #{ $logs{error} }, $_;
}
}
printf "There are %d error logs\n.", scalar #{ $logs{error} // [] } ;
printf "There are %d logs in the full logfile.\n",
#{$logs{error} // []} + #{$logs{nonerror} // []};
Things I like about this approach:
Perl takes care of deciding when each log message ends (eliminates the $logStart variable altogether).
The logic is much easier to extend.
The while loop is dedicated to processing the log file (no need to ++ anything).
Use of sensibly-labeled data structures instead of temporary variables makes for easier code maintenance.
To make a formal answer, I scrapped this code and replaced it. I instead fed the file into var with a delimiter, then just split it into an array. It was much easier and cleaner. I don't however have the code anymore due to a lost flashdrive.

PERL: line by line text parsing script

for excercise and curiosity, anyone knows if the following script can be made more compact and expedite:
foreach(#list){
if ($_=~"givenName: ") {
$cname=$_;
$cname=~ s/givenName: //g;
}
if ($_=~"cn: ") {
$cn=$_;
$cn=~ s/cn: //g;
}
...
}
What is does:
- It looks for a string inside the line, to see if it contains that particular index
- It then strips off the string and read the rest of the line putting the content into the variable.
- This script reads line by line the result of another script and identify the fields of each line putting the value inside the proper variable
If every line in the list is guaranteed to be in the format 'variableName: someText' then you could do this instead:
foreach (#list) {
/^(\w+): (.*)/ && $vars{$1} = $2;
}
It's not exactly like your solution -- it puts the results into a %vars hash instead of into variables named $cname, $cn, etc. -- but it's more succinct and general.
how about something like this?
my $data = {}; #a hashref to store your data
foreach my $line(#list){
$line =~ s/(givenName|cn|more|names):\s//g and $data->{$1} = $line;
...
}
#EDIT: now you have all your data inside the hashref and can call each var accordingly
print $data->{givenName};
print $data->{cn};
my #list = ('name', 'givenName: ', 'noname');
foreach(#list){
s/givenName: //g if /givenName: /;
my $var = $_;
...
}

regular expression code

I need to find match between two tab delimited files files like this:
File 1:
ID1 1 65383896 65383896 G C PCNXL3
ID1 2 56788990 55678900 T A ACT1
ID1 1 56788990 55678900 T A PRO55
File 2
ID2 34 65383896 65383896 G C MET5
ID2 2 56788990 55678900 T A ACT1
ID2 2 56788990 55678900 T A HLA
what I would like to do is to retrive the matching line between the two file. What I would like to match is everyting after the gene ID
So far I have written this code but unfortunately perl keeps giving me the error:
use of "Use of uninitialized value in pattern match (m//)"
Could you please help me figure out where i am doing it wrong?
Thank you in advance!
use strict;
open (INA, $ARGV[0]) || die "cannot to open gene file";
open (INB, $ARGV[1]) || die "cannot to open coding_annotated.var files";
my #sample1 = <INA>;
my #sample2 = <INB>;
foreach my $line (#sample1) {
my #tab = split (/\t/, $line);
my $chr = $tab[1];
my $start = $tab[2];
my $end = $tab[3];
my $ref = $tab[4];
my $alt = $tab[5];
my $name = $tab[6];
foreach my $item (#sample2){
my #fields = split (/\t/,$item);
if ( $fields[1] =~ m/$chr(.*)/
&& $fields[2] =~ m/$start(.*)/
&& $fields[4] =~ m/$ref(.*)/
&& $fields[5] =~ m/$alt(.*)/
&& $fields[6] =~ m/$name(.*)/
) {
print $line, "\n", $item;
}
}
}
On its surface your code seems to be fine (although I didn't debug it). If you don't have an error I cannot spot, could be that the input data has RE special character, which will confuse the regular expression engine when you put it as is (e.g. if any of the variable has the '$' character). Could also be that instead of tab you have spaces some where, in which case you'll indeed get an error, because your split will fail.
In any case, you'll be better off composing just one regular expression that contains all the fields. My code below is a little bit more Perl Idiomatic. I like using the implicit $_ which in my opinion makes the code more readable. I just tested it with your input files and it does the job.
use strict;
open (INA, $ARGV[0]) or die "cannot open file 1";
open (INB, $ARGV[1]) or die "cannot open file 2";
my #sample1 = <INA>;
my #sample2 = <INB>;
foreach (#sample1) {
(my $id, my $chr, my $start, my $end, my $ref, my $alt, my $name) =
m/^(ID\d+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/;
my $rex = "^ID\\d+\\s+$chr\\s+$start\\s+$end\\s+$ref\\s+$alt\\s+$name\\s+";
#print "$rex\n";
foreach (#sample2) {
if( m/$rex/ ) {
print "$id - $_";
}
}
}
Also, how regular is the input data? Do you have exactly one tab between the fields? If that is the case, there is no point to split the lines into 7 different fields - you only need two: the ID portion of the line, and the rest. The first regex would be
(my $id, my $restOfLine) = m/^(ID\d+)\s+(.*)$/;
And you are searching $restOfLine within the second file in a similar technique as above.
If your files are huge and performance is an issue, you should consider putting the first regular expressions (or strings) in a map. That will give you O(n*log(m)) where n and m are the number of lines in each file.
Finally, I have a similar challenge when I need to compare logs. The logs are supposed to be identical, with the exception of a time mark at the beginning of each line. But more importantly: most lines are the same and in order. If this is what you have, and it make sense for you, you can:
First remove the IDxxx from each line: perl -pe "s/ID\d+ +//" file >cleanfile
Then use BeyondCompare or Windiff to compare the files.
I played a bit with your code. What you wrote there was actually three loops:
one over the lines of the first file,
one over the lines of the second file, and
one over all fields in these lines. You manually unrolled this loop.
The rest of this answer assumes that the files are strictly tab-seperated and that any other whitespace matters (even at the end of fields and lines).
Here is a condensed version of the code (assumes open filehandles $file1, $file2, and use strict):
my #sample2 = <$file2>;
SAMPLE_1:
foreach my $s1 (<$file1>) {
my (undef, #fields1) = split /\t/, $s1;
my #regexens = map qr{\Q$_\E(.*)}, #fields1;
SAMPLE_2:
foreach my $s2 (#sample2) {
my (undef, #fields2) = split /\t/, $s2;
for my $i (0 .. $#regexens) {
$fields2[$i] =~ $regexens[$i] or next SAMPLE_2;
}
# only gets here if all regexes matched
print $s1, $s2;
}
}
I did some optimisations: precompiling the various regexes and storing them in an array, quoting the contents of the fields etc. However, this algorithm is O(n²), which is bad.
Here is an elegant variant of that algorithm that knows that only the first field is different — the rest of the line has to be the same character for character:
my #sample2 = <$file2>;
foreach my $s1 (<$file1>) {
foreach my $s2 (#sample2) {
print $s1, $s2 if (split /\t/, $s1, 2)[1] eq (split /\t/, $s2, 2)[1];
}
}
I just test for string equality of the rest of the line. While this algorithm is still O(n²), it outperforms the first solution roughly by an order of magnitude simply by avoiding braindead regexes here.
Finally, here is an O(n) solution. It is a variant of the previous one, but executes the loops after each other, not inside each other, therefore finishing in linear time. We use hashes:
# first loop via map
my %seen = map {reverse(split /\t/, $_, 2)}
# map {/\S/ ? $_ : () } # uncomment this line to handle empty lines
<$file1>;
# 2nd loop
foreach my $line (<$file2>) {
my ($id2, $key) = split /\t/, $line, 2;
if (defined (my $id1 = $seen{$key})) {
print "$id1\t$key";
print "$id2\t$key";
}
}
%seen is a hash that has the rest of the line as a key and the first field as a value. In the second loop, we retrieve the rest of the line again. If this line was present in the first file, we reconstruct the whole line and print it out. This solution is better than the others and scales well up- and downwards, because of its linear complexity
How about:
#!/usr/bin/perl
use File::Slurp;
use strict;
my ($ina, $inb) = #ARGV;
my #lines_a = File::Slurp::read_file($ina);
my #lines_b = File::Slurp::read_file($inb);
my $table_b = {};
my $ln = 0;
# Store all lines in second file in a hash with every different value as a hash key
# If there are several identical ones we store them also, so the hash values are lists containing the id and line number
foreach (#lines_b) {
chomp; # strip newlines
$ln++; # count current line number
my ($id, $rest) = split(m{[\t\s]+}, $_, 2); # split on whitespaces, could be too many tabs or spaces instead
if (exists $table_b->{$rest}) {
push #{ $table_b->{$rest} }, [$id, $ln]; # push to existing list if we already found an entry that is the same
} else {
$table_b->{$rest} = [ [$id, $ln] ]; # create new entry if this is the first one
}
}
# Go thru first file and print out all matches we might have
$ln = 0;
foreach (#lines_a) {
chomp;
$ln++;
my ($id, $rest) = split(m{[\t\s]+}, $_, 2);
if (exists $table_b->{$rest}) { # if we have this entry print where it is found
print "$ina:$ln:\t\t'$id\t$rest'\n " . (join '\n ', map { "$inb:$_->[1]:\t\t'$_->[0]\t$rest'" } #{ $table_b->{$rest} }) . "\n";
}
}

How can I detect a blank line in Perl?

How do I check a line ($_ value) is a blank line in Perl? Or another
good method to check it instead of using $_?
I want to code like this
if ($_ eq '') # Check current line is a blank line (no any characters)
{
$x = 0;
}
I updated some code with a question solution below.
My test.txt for parsing:
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
constant fixup private GemConfigAlarms = <U1 0> /* my Comment */
vid = 1
name = "CONFIGALARMS"
units = ""
min = <U1 0>
max = <U1 2>
default = <U1 0>
My code is below.
That's why I need to initially set $x = 0. I am not sure if it is a normal
solution or not.
sub ConstantParseAndPrint
{
if (/^$/) // SOLUTION!
{
$x = 0;
}
if ($x == 0)
{
if (/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\6>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
{
$name1 = $1; # Constant
$name2 = $2; # Fixup
$name3 = $3; # Private
$name4 = $4;
$name5 = $5;
$name6 = $7;
$name7 = $8;
# start print
if (!$name7 eq '')
{
print DEST_XML_FILE "<!-- $name7-->\n";
}
print DEST_XML_FILE " <ECID";
print DEST_XML_FILE " logicalName=\"$name4\"";
print DEST_XML_FILE " valueType=\"$name5\"";
print DEST_XML_FILE " value=\"$name6\"";
$x = 1;
}
}
elsif ($x == 1)
{
if(/\s*vid\s*=\s*(.*?)(\s|\n|\r)/)
{
$nID = $1;
print DEST_XML_FILE " vid=\"$nID\"";
$x = 2;
}
}
elsif ($x == 2)
{
if(/\s*name\s*=\s*(.*?)(\s|\n|\r)/)
{
$nName = $1;
print DEST_XML_FILE " name=$nName";
$x = 3;
}
}
elsif ($x == 3)
{
if (/\s*units\s*=\s*(.*?)(\s|\n|\r)/)
{
$nUnits = $1;
print DEST_XML_FILE " units=$nUnits";
$x = 4;
}
}
elsif ($x == 4)
{
# \s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>\
if (/\s*min\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nMinName1 = $1;
$nMinName2 = $2; # Find the nMin Value
#$nMinName3 = $3;
#$nMinName4 = $4;
print DEST_XML_FILE " min=\"$nMinName2\"";
$x = 5;
}
else
{
print DEST_XML_FILE "></ECID>\n";
$x = 0; # There is no line 4 and line 5
}
}
elsif ($x == 5)
{
if (/\s*max\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nMaxName1 = $1;
$nMaxName2 = $2; # Find the nMax Value
#$nMaxName3 = $3;
#$nMaxName4 = $4;
print DEST_XML_FILE " max=\"$nMaxName2\"";
$x = 6;
}
}
elsif ($x == 6)
{
if (/\s*default\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nDefault1 = $1;
$nDefault2 = $2; # Find the default Value
#$nDefault3 = $3;
#$nDefault4 = $4;
print DEST_XML_FILE " default=\"$nDefault2\">";
print DEST_XML_FILE "</ECID>\n";
$x = 0;
}
}
}
if ($_ =~ /^\s*$/) {
# blank
}
checks for 0 or more whitespaces (\s*) bound by beginning(^)/end($) of line. That's checking for a blank line (i.e. may have whitespace). If you want an empty line check, just remove the \s*.
The check against $_ can be implicit, so you can reduce the above to if (/^\s*$/) for conciseness.
Against my better judgment I will try to help you again.
The issue is not how to find a blank line. The issue is not which regex to use. The fundamental issue is understanding how to analyze a problem and turn that analysis into code.
In this case the problem is "How do I parse this format?"
I've written a parser for you. I have also taken the time to write a detailed description of the process I used to write it.
WARNING: The parser is not carefully tested for all cases. It does not have enough error handling built in. For those features, you can request a rate card or write them yourself.
Here's the data sample you provided (I'm not sure which of your several questions I pulled this from):
constant fixup GemEstabCommDelay = <U2 20>
vid = 6
name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
units = "s"
min = <U2 0>
max = <U2 1800>
default = <U2 20>
constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
vid = 4
name = "" units = ""
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
Before you can write a parser for a data file, you need to have a description the structure of the file. If you are using a standard format (say XML) you can read the existing specification. If you are using some home-grown format, you get to write it yourself.
So, based on the sample data, we can see that:
data is broken into blocks.
each block starts with the word constant in column 0.
each block ends with a blank line.
a block consists of a start line, and zero or more additional lines.
The start line consists of the keyword constant followed by one or more whitespace delimited words, an '=' sign and an <> quoted data value.
The last keyword appears to be the name of the constant. Call it constant_name
The <>-quoted data appears to be a combined type/value specifier.
earlier keywords appear to specify additional metadata about the constant. Let's call those options.
The additional lines specify additional key value pairs. Let's call them attributes. Attributes may have a single value or they may have a type/value specifier.
One or more attributes may appear in a single line.
Okay, so now we have a rough spec. What do we do with it?
How is the format structured? Consider the logical units of organization from largest to smallest. These will determine the structure and flow of our code.
A FILE is made of BLOCKS.
BLOCKS are made of LINES.
So our parser should decompose a file into blocks, and then handle the blocks.
Now we rough out a parser in comments:
# Parse a constant spec file.
# Until file is done:
# Read in a whole block
# Parse the block and return key/value pairs for a hash.
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
# Return ref to big hash with all block data
Now we start to fill in some code:
# Parse a constant spec file.
sub parse_constant_spec {
my $fh = shift;
my %spec;
# Until file is done:
# Read in a whole block
while( my $block = read_block($fh) ) {
# Parse the and return key/value pairs for a hash.
my %constant = parse_block( $block );
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
$spec{ $constant{name} } = \%constant;
}
# Return ref to big hash with all block data
return \%spec;
}
But it won't work. The parse_block and read_block subs haven't been written yet. At this stage that's OK. The point is to rough in features in small, understandable chunks. Every once in a while, to keep things readable you need to gloss over the details drop in a subroutine--otherwise you wind up with monstrous 1000 line subs that are impossible to debug.
Now we know we need to write a couple of subs to finish up, et viola:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $fh = \*DATA;
print Dumper parse_constant_spec( $fh );
# Parse a constant spec file.
# Pass in a handle to process.
# As long as it acts like a file handle, it will work.
sub parse_constant_spec {
my $fh = shift;
my %spec;
# Until file is done:
# Read in a whole block
while( my $block = read_block($fh) ) {
# Parse the and return key/value pairs for a hash.
my %constant = parse_block( $block );
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
$spec{ $constant{const_name} } = \%constant;
}
# Return ref to big hash with all block data
return \%spec;
}
# Read a constant definition block from a file handle.
# void return when there is no data left in the file.
# Otherwise return an array ref containing lines to in the block.
sub read_block {
my $fh = shift;
my #lines;
my $block_started = 0;
while( my $line = <$fh> ) {
$block_started++ if $line =~ /^constant/;
if( $block_started ) {
last if $line =~ /^\s*$/;
push #lines, $line;
}
}
return \#lines if #lines;
return;
}
sub parse_block {
my $block = shift;
my ($start_line, #attribs) = #$block;
my %constant;
# Break down first line:
# First separate assignment from option list.
my ($start_head, $start_tail) = split /=/, $start_line;
# work on option list
my #options = split /\s+/, $start_head;
# Recover constant_name from options:
$constant{const_name} = pop #options;
$constant{options} = \#options;
# Now we parse the value/type specifier
#constant{'type', 'value' } = parse_type_value_specifier( $start_tail );
# Parse attribute lines.
# since we've already got multiple per line, get them all at once.
chomp #attribs;
my $attribs = join ' ', #attribs;
# we have one long line of mixed key = "value" or key = <TYPE VALUE>
#attribs = $attribs =~ /\s*(\w+\s+=\s+".*?"|\w+\s+=\s+<.*?>)\s*/g;
for my $attrib ( #attribs ) {
warn "$attrib\n";
my ($name, $value) = split /\s*=\s*/, $attrib;
if( $value =~ /^"/ ) {
$value =~ s/^"|"\s*$//g;
}
elsif( $value =~ /^</ ) {
$value = [ parse_type_value_specifier( $start_tail ) ];
}
else {
warn "Bad line";
}
$constant{ $name } = $value;
}
return %constant;
}
sub parse_type_value_specifier {
my $tvs = shift;
my ($type, $value) = $tvs =~ /<(\w+)\s+(.*?)>/;
return $type, $value;
}
__DATA__
constant fixup GemEstabCommDelay = <U2 20>
vid = 6
name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
units = "s"
min = <U2 0>
max = <U2 1800>
default = <U2 20>
constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
vid = 4
name = "" units = ""
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
The above code is far from perfect. IMO, parse_block is too long and ought to be broken into smaller subs. Also, there isn't nearly enough validation and enforcement of well-formed input. Variable names and descriptions could be clearer, but I don't really understand the semantics of your data format. Better names would more closely match the semantics of the data format.
Despite these issues, it does parse your format and produce a big handy data structure that can be stuffed into whatever output format you want.
If you use this format in many places, I recommend putting the parsing code into a module. See perldoc perlmod for more info.
Now, please stop using global variables and ignoring good advice. Please start reading the perldoc, read Learning Perl and Perl Best Practices, use strict, use warnings. While I am throwing reading lists around go read Global Variables are Bad and then wander around the wiki to read and learn. I learned more about writing software by reading c2 than I did in school.
If you have questions about how this code works, why it is laid out as it is, what other choices could have been made, speak up and ask. I am willing to help a willing student.
Your English is good, but it is clear you are not a native speaker. I may have used too many complex sentences. If you need parts of this written in simple sentences, I can try to help. I understand that working in a foreign language is very difficult.
The answer depends on what you mean by a blank line (whether it contains no characters apart from a newline or whether it contains only whitespace). An idiomatic way to deal with this is to use a negative match against \S which matches in both of these cases:
if ( ! /\S/ ) {
...
}
If you are only looking for the former than your own answer is fine.
You often see this technique used as a filter:
while (<>) {
next unless /\S/; # Ignore blank lines.
...
}
You can use:
if ($_ =~ /^$/)
or even just
if (/^$/)
since Perl assumes checking against $_
If you just want to check if the current value of $_ or $var is a blank (or at least all-whitespace) line, then something like
if (/^\s*$/) { ... }
if ($var =~ /^\s*$/){ ... }
as several others have already mentioned.
However, I find that I most commonly want to ignore blank lines while processing input in a loop. I do that like this:
while (<>) {
next if /^\s*$/;
...
}
If I want to allow the traditional shell-style comments, I usually add
s/\s*#.*$//;
just before the check for a blank line.
while (<>){
chomp;
if ($_ eq ""){
print "blank at $.\n";
}
}
The way you showed - if ( $_ eq '' ) is perfectly sane. Perhaps you should describe what is your problem with it?
if(/^\s*$/)
{
$x = 0;
}