Perl6: Confused about BagHash/Matching - regex

I'm attempting to count matches of a regex using a BagHash, and getting odd results.
my $fh = open "versions.txt";
my $versions = BagHash.new();
while (defined my $line = $fh.get) {
my $last = '';
if $line ~~ /(\d+)\.?(\d*)/ {
say 'match ' ~ $/[0];
if $last !eq $/[0] {
say 'not-same: ' ~ $/[0];
$versions{$/[0]}++
}
$last = $/[0];
}
else {
$last = '';
}
}
say 'count: ' ~ $versions.elems;
Output is:
match 234
not-same: 234
match 999
not-same 999
count: 1 # I expect 2 here.
The test case I'm working with is:
version history thingy
version=234.234
version=999
What am I missing?

You are resetting $last with each iteration. Also, don't trust say. It's meant to be used to avoid flooding a terminal or logfile with infinite lists. Use dd (Rakudo internal) or a module to dump debug output. If you would have used dd would would have seen that $/[0] contains a Match, a complex structure that is not suited to generate Hash keys.
# my #lines = slurp('version.txt');
my #lines = ('version=234.234', 'version=999');
my BagHash $versions.=new;
for #lines {
ENTER my $last = '';
if .Str ~~ /(\d+) '.'? (\d*)/ {
$versions{$0.Str}++ if $last ne $0.Str;
$last = $0.Str
}else{
$last = ''
}
};
dd $versions;
# OUTPUT«BagHash $versions = ("234"=>1,"999"=>1).BagHash␤»
The whole point of BagHash is that it's constructor will do the counting for you. If you supply lazy lists all the way down, this can be fairly efficient.
my #lines = ('version=234.234', 'version=999');
dd BagHash.new(#lines».split('=')».[1]);
# OUTPUT«("234.234"=>1,"999"=>1).BagHash␤»

Bug #1 is you almost certainly want your $last declaration outside of the loop so you don't keep resetting it to ''
Bug #2 you probably only want to update $last on the state where you found a version number not for all lines
Bug #3 you used the Match object as the key to the HashBag rather than the string value of the version. You can coerce a match to being the string it matched with ~$/[0] but just $0 is a shortcut for that too.
I cleaned up your code and got the below that works, but is really quite far from being idiomatic Perl 6:
my $fh = open "versions.txt";
my $versions = BagHash.new();
my $last = '';
for $fh.lines -> $line {
if $line ~~ /(\d+)\.?(\d*)/ {
say 'match ' ~ $/[0];
if $last ne $/[0] {
say 'not-same: ' ~ $/[0];
$versions{~$/[0]}++;
$last = $/[0];
}
}
else {
$last = '';
}
}
say $versions;
say 'count: ' ~ $versions.elems;
I would personally have written this as follows if it was throw away code:
my $versions = "versions.txt".IO.lines.comb(/(\d+)\.?(\d*)/).Bag;
say $versions.elems;
If you wanted the file later or to do more with each line or this is for production:
my %versions;
for "versions.txt".IO.lines -> $line {
if $line ~~ /((\d+)\.?(\d*))/ {
%versions{$0}++;
}
}
say %versions.elems;

Related

perl6 grep like program in parallel

I wrote a grep-like program in perl6, and now I made it into parallel processing. But I ran into some problem: even with the same command line the program sometimes succeeds, and sometimes fails. When it succeeds, things looks just normal to me. When it fails, I don't know why...
Here is the error message when it fails.
> grep6 perl *
An operation first awaited:
in sub MAIN at /Users/xxx/Dropbox/bin/grep6 line 28
in block <unit> at /Users/xxx/Dropbox/bin/grep6 line 30
Died with the exception:
Cannot find method 'Any' on object of type Match
in regex at /Users/xxx/Dropbox/bin/grep6 line 34
in sub do_something at /Users/xxx/Dropbox/bin/grep6 line 34
in block at /Users/xxx/Dropbox/bin/grep6 line 24
And the code is:
#!/usr/bin/env perl6
constant $color_red = "\e[31m";
constant $color_off = "\e[0m";
sub MAIN(Str $pattern, *#filenames){
my $channel = Channel.new();
$channel.send($_) for #filenames; # dir();
$channel.close;
my #workers;
for 1..3 -> $n {
push #workers, start {
while (my $file = $channel.poll) {
do_something($pattern, $file);
}
}
}
await(#workers);
}
sub do_something(Str $pattern, Str $filename) {
#say $filename;
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ (<$pattern>) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}
My question is why it fails sometimes?
Regards
Xin
This problem seems to be basically the same as a known rakudo issue for the race method.
I switched from:
if $temp ~~ s:g/ (<$pattern>) /$color_red$0$color_off/ {
to:
if $temp ~~ s:g/ ($pattern) /$color_red$0$color_off/ {
and the problem seemed to go away.
As later mentioned by Xin Cheng and also described in the same doc, the simpler interpolation matches literally as clarified by the doc examples. The issue ticket fixed the problem with something like:
my $reg = regex { <$pattern> };
'' ~~ $reg;
leading to an updated program with a similar workaround:
#!/usr/bin/env perl6
constant $color_red = "\e[31m";
constant $color_off = "\e[0m";
sub MAIN(Str $pattern, *#filenames){
my $channel = Channel.new();
$channel.send($_) for #filenames; # dir();
$channel.close;
my #workers;
# match seems required for pre-compilation
'' ~~ (my regex pat_regex { <$pattern> });
for 1..3 -> $n {
push #workers, start {
while (my $file = $channel.poll) {
do_something(&pat_regex, $file);
}
}
}
await(#workers);
}
sub do_something(Regex $pat_regex, Str $filename) {
# say $filename;
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ ($pat_regex) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}
My apologies for the earlier proposed explicit EVAL solution, about which the best I can say is that my description requested a better solution.
Did a bit of playing about the issue seems to be the anonymous regexp you're creating by doing :
s:g/ (<$pattern>) /$color_red$0$color_off/
If you instead precompile your regex (either in do_something or the MAIN routine then the errors stop.
Here's the updated do_something version :
sub do_something(Str $pattern, Str $filename) {
my $reg = regex { $pattern };
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ ($reg) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}

perl regex not start and end wi

I am trying to write a perl script that get all strings that is does not start and end with a single quote. And a string cannot be a part of comment # and each line in DATA is not necessary at the beginning of a line.
use warnings;
use strict;
my $file;
{
local $/ = undef;
$file = <DATA>;
};
my #strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",#strings);
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
I am getting no where with this regex.
The expected output is
"This is a string2"
"This is comment syntax #"
"This is string 4"
Obviously this is only an exercise, as there are been many students asking about this problem lately. Regex's will only ever get you part of the way there, as there will pretty much always be edge cases.
The following code is probably good enough for your purposes, but it doesn't even successfully parse itself because of quotes inside a qr{}. You'll have to figure out how to get strings that span lines to work on your own:
use strict;
use warnings;
my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;
my $data = do { local $/; <DATA> };
while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
my $match = $1;
if ($match =~ /^#/) {
print "Comment - $match\n";
} elsif ($match =~ /^"/) {
print "Double quote - $match\n";
} elsif ($match =~ /^'/) {
print "Single quote - $match\n";
} else {
die "Carp! something went wrong! <$match>";
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Do not know how to achieve that by using regular expression, so here is a simple hand-written lexer:
#!/usr/bin/perl
use strict;
use warnings;
sub extract_string {
my #buf = split //, shift;
while (my $peer = shift #buf) {
if ($peer eq '"') {
my $str = "$peer";
while ($peer = shift #buf) {
$str .= "$peer";
last if $peer eq '"';
}
if ($peer) {
return ($str, join '', #buf);
}
else {
return ("", "");
}
}
elsif ($peer eq '#') {
return ("", "");
}
}
}
my ($str, $buf);
while ($buf = <DATA>) {
chomp $buf;
while (1) {
($str, $buf) = extract_string $buf;
print "$str\n" if $str;
last unless $buf;
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Another option is using Perl module such as PPI.

Parsing input to get specific values

I have input like this:
"[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,D=14}|phone";
It appears as a continuous line, there are no line breaks. I need the
largest value out of the values between [ and the first occurrence of
|. In this case, for example, the largest value is 204. Once
that is obtained, I want to print the contents of that element
between []. In this case, it would be "204|0|{A=9,B=201,C=61,D=11}|Calculator".
I've tried something like this, but it is not going anywhere:
my #array1;
my $data = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=1
+7}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,
+D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C
+=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}
+|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,
+D=14}|phone";
my $high = 0;
my #values = split(/\[([^\]]+)\]/,$data) ;
print "Values is #values \n";
foreach (#values) {
# I want the value that preceeds the first occurence of | in each array
# element, i.e. 0,0,196,204, etc.
my ($conf,$rest)= split(/\|/,$_);
print "Conf is $conf \n";
print "Rest is $rest \n";
push(#array1, $conf);
push (#array2, $rest);
print "Array 1 is #array1 \n";
print "Array 2 is #array2 \n";
}
$conf = highest(#array1);
my $i=0;
# I want the index value of the element that contains the highest conf value,
# in this case 204.
for (#myarray1) { last if $conf eq $_; $i++; };
print "$conf=$i\n";
# I want to print the rest of the string that was split in the same index
# position.
$rest = #array2[$i];
print "Rest is $rest \n";
# To get the highest conf value
sub highest {
my #data = #_;
my $high = 0;
for(#data) {
$high = $_ if $_ > $high;
}
$high;
}
Maybe I should be using a different approach. Could someone help me, please?
One way of doing it:
#!/usr/bin/perl
use strict;
my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]";
my #parts = split(/\]/, $s);
my $max = 0;
my $data = "";
foreach my $part (#parts) {
if ($part =~ /\[(\d+)/) {
if ($1 > $max) {
$max = $1;
$data = substr($part, 1);
}
}
}
print $data."\n";
A couple of notes:
you can split your original string by \], so you get parts like [0|0|{A=145,B=2,C=12,D=18}|!
then you parse each part to get the integer after the initial [
the rest it's easy: keep track of the biggest integer and of the corresponding part, and output it at the end.
In shell script:
#!/bin/bash
MAXVAL=$(cat /tmp/data | tr [ "\\n" | cut -d"|" -f1 | sort -n | tail -1)
cat /tmp/data | tr [] "\\n" | grep ^$MAXVAL
The first line cuts your big mass of data into lines, extracts just the first field, sorts it and takes the max. The second line cuts the data into lines again and greps for that max val.
If you have a LOT of data, this could be slow, so you could put the "lined" data into a temp file or something.
split() is the Right Tool when you know what you want to throw away. Capturing or m//g is the Right Tool when you know what you want to keep. (paraphrased from a Randal Schwartz quote).
You want to specify what to keep (between square brackets) rather than what to throw away (nothing!).
Luckily, your data is "hash shaped" (ie. alternating keys and values), so load it into a hash, sort the keys, and output the value for the highest key:
my %data = $data =~ /\[
(\d+) # digits are the keys
([^]]+) # rest are the values
\]/gx;
my($highest) = sort {$b <=> $a} keys %data; # inefficent if $data is big
print $highest, $data{$highest}, "\n";
Another way of doing this :
#!/usr/bin/perl
use strict;
my $str = '[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|phone]0|0|{A=145,B=2,C=12,D=18}|!0|0|{A=167,B=2,C=67,D=17}|.1iit196|0|{A=244,B=6,C=67,D=12}|10:48AM204|0|{A=9,B=201,C=61,D=11}|Calculator66|0|{A=145,B=450,C=49,D=14}|phone';
my $maxval = 0;
my $pattern;
while ( $str =~ /(\[(\d+)\|.+?\])/g)
{
if ( $maxval < $2 ) {
$maxval = $2;
$pattern = $1;
}
}
print "Maximum value = $maxval and the associate pattern = $pattern \n";
# In this example $maxvalue = 204
# and $pattern = [204|0|{A=9,B=201,C=61,D=11}|Calculator]

(La)Tex math parsing for C/C++

I would like to convert parse (la)tex math expressions, and convert them to (any kind of!) scripting language expression, so I can evaluate expressions.
What libraries do you recommend ?
May be it will help - take a look at TeXmacs, especially at a way it interacts with computer algebra systems.
Here is a set of possible options from a similar question. https://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
I think that Perl would make a fine choice for something like this, acting on text is one of its fortes.
Here is some info on how to make an exclusive flip-flop test (to find the context between \begin{} and \end{} without keeping those lines), http://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
EDIT: So this problem has started me going. Here is a first attempt to create something here is my "math.pl" which takes a .tex file as an arguement (i.e. $./math.pl test.tex).
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;
my $re_num = qr/[+\-\dE\.]/;
my $file = shift;
open( my $fh, '<', $file);
#parsing this out for more than just the equation environment might be easier using Text::Balanced too.
my #equations;
my $current_equation = '';
while(<$fh>) {
my $test;
next unless ($test = /\\begin\{equation\}/ .. /\\end\{equation\}/);
if ($test !~ /(^1|E0)$/ ) {
chomp;
$current_equation .= $_;
} elsif ($test =~ /E0$/) {
#print $current_equation . "\n";
push #equations, {eq => $current_equation};
$current_equation = '';
}
}
foreach my $eq (#equations) {
print "Full Equation: " . $eq->{'eq'} . "\n";
solve($eq);
print "Result: " . $eq->{'value'} . "\n\n";
}
sub solve {
my $eq = shift;
print $eq->{'eq'} . "\n";
parse($eq);
compute($eq);
print "intermediate result: " . $eq->{'value'} . "\n";
}
sub parse {
my $eq = shift;
my ($command,#fields) = extract_multiple(
$eq->{'eq'}, [ sub { extract_bracketed(shift,'{}') } ]
);
$command =~ s/^\\//;
print "command: " . $command . "\n";
#fields = map { s/^\{\ *//; s/\ *\}$//; print "arg: $_\n"; {value => $_}; } #fields;
($eq->{'command'}, #{ $eq->{'args'} }) = ($command, #fields);
}
sub compute {
my ($eq) = #_;
#check arguements ...
foreach my $arg (#{$eq->{'args'}}) {
#if arguement is a number, continue
if ($arg->{'value'} =~ /^$re_num$/) {
next;
#if the arguement is a simple mathematical operation, do it and continue
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\ |\*|\\times)?\ *($re_num)$/) {
$arg->{'value'} = $1 * $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\+)?\ *($re_num)$/) {
$arg->{'value'} = $1 + $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\-)?\ *($re_num)$/) {
$arg->{'value'} = $1 - $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\/)?\ *($re_num)$/) {
$arg->{'value'} = $1 / $2;
} else {
#parse it and calc it as if it were its own equation.
$arg->{'eq'} = $arg->{'value'};
solve($arg);
}
}
my #args = #{$eq->{'args'}};
## add command processing here
# frac
if ($eq->{'command'} eq 'frac') {
$eq->{'value'} = $args[0]->{'value'} / $args[1]->{'value'};
return;
}
}
and here is a sample test.tex:
\documentclass{article}
\begin{document}
Hello World!
\begin{equation}
\frac{\frac{1}{3}}{2}
\end{equation}
\end{document}
Maybe using boost::spirit in order to tokenize the expression. You will need to define a huge grammar!
Use a parser generator to create an appropriate parser. Try ANTLR for this, as it includes an IDE for the Grammar, which is very helpful. Using tree rewrite rules, you can then convert the parse tree to an abstract syntax tree.
Start perhaps with the expression evaluator from ANTLR tutorial. I think this is reasonably close enough.

How can I detect a blank line in Perl?

How do I check a line ($_ value) is a blank line in Perl? Or another
good method to check it instead of using $_?
I want to code like this
if ($_ eq '') # Check current line is a blank line (no any characters)
{
$x = 0;
}
I updated some code with a question solution below.
My test.txt for parsing:
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
constant fixup private GemConfigAlarms = <U1 0> /* my Comment */
vid = 1
name = "CONFIGALARMS"
units = ""
min = <U1 0>
max = <U1 2>
default = <U1 0>
My code is below.
That's why I need to initially set $x = 0. I am not sure if it is a normal
solution or not.
sub ConstantParseAndPrint
{
if (/^$/) // SOLUTION!
{
$x = 0;
}
if ($x == 0)
{
if (/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\6>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
{
$name1 = $1; # Constant
$name2 = $2; # Fixup
$name3 = $3; # Private
$name4 = $4;
$name5 = $5;
$name6 = $7;
$name7 = $8;
# start print
if (!$name7 eq '')
{
print DEST_XML_FILE "<!-- $name7-->\n";
}
print DEST_XML_FILE " <ECID";
print DEST_XML_FILE " logicalName=\"$name4\"";
print DEST_XML_FILE " valueType=\"$name5\"";
print DEST_XML_FILE " value=\"$name6\"";
$x = 1;
}
}
elsif ($x == 1)
{
if(/\s*vid\s*=\s*(.*?)(\s|\n|\r)/)
{
$nID = $1;
print DEST_XML_FILE " vid=\"$nID\"";
$x = 2;
}
}
elsif ($x == 2)
{
if(/\s*name\s*=\s*(.*?)(\s|\n|\r)/)
{
$nName = $1;
print DEST_XML_FILE " name=$nName";
$x = 3;
}
}
elsif ($x == 3)
{
if (/\s*units\s*=\s*(.*?)(\s|\n|\r)/)
{
$nUnits = $1;
print DEST_XML_FILE " units=$nUnits";
$x = 4;
}
}
elsif ($x == 4)
{
# \s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>\
if (/\s*min\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nMinName1 = $1;
$nMinName2 = $2; # Find the nMin Value
#$nMinName3 = $3;
#$nMinName4 = $4;
print DEST_XML_FILE " min=\"$nMinName2\"";
$x = 5;
}
else
{
print DEST_XML_FILE "></ECID>\n";
$x = 0; # There is no line 4 and line 5
}
}
elsif ($x == 5)
{
if (/\s*max\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nMaxName1 = $1;
$nMaxName2 = $2; # Find the nMax Value
#$nMaxName3 = $3;
#$nMaxName4 = $4;
print DEST_XML_FILE " max=\"$nMaxName2\"";
$x = 6;
}
}
elsif ($x == 6)
{
if (/\s*default\s*=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)>(\s|\n|\r)/)
{
#$nDefault1 = $1;
$nDefault2 = $2; # Find the default Value
#$nDefault3 = $3;
#$nDefault4 = $4;
print DEST_XML_FILE " default=\"$nDefault2\">";
print DEST_XML_FILE "</ECID>\n";
$x = 0;
}
}
}
if ($_ =~ /^\s*$/) {
# blank
}
checks for 0 or more whitespaces (\s*) bound by beginning(^)/end($) of line. That's checking for a blank line (i.e. may have whitespace). If you want an empty line check, just remove the \s*.
The check against $_ can be implicit, so you can reduce the above to if (/^\s*$/) for conciseness.
Against my better judgment I will try to help you again.
The issue is not how to find a blank line. The issue is not which regex to use. The fundamental issue is understanding how to analyze a problem and turn that analysis into code.
In this case the problem is "How do I parse this format?"
I've written a parser for you. I have also taken the time to write a detailed description of the process I used to write it.
WARNING: The parser is not carefully tested for all cases. It does not have enough error handling built in. For those features, you can request a rate card or write them yourself.
Here's the data sample you provided (I'm not sure which of your several questions I pulled this from):
constant fixup GemEstabCommDelay = <U2 20>
vid = 6
name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
units = "s"
min = <U2 0>
max = <U2 1800>
default = <U2 20>
constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
vid = 4
name = "" units = ""
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
Before you can write a parser for a data file, you need to have a description the structure of the file. If you are using a standard format (say XML) you can read the existing specification. If you are using some home-grown format, you get to write it yourself.
So, based on the sample data, we can see that:
data is broken into blocks.
each block starts with the word constant in column 0.
each block ends with a blank line.
a block consists of a start line, and zero or more additional lines.
The start line consists of the keyword constant followed by one or more whitespace delimited words, an '=' sign and an <> quoted data value.
The last keyword appears to be the name of the constant. Call it constant_name
The <>-quoted data appears to be a combined type/value specifier.
earlier keywords appear to specify additional metadata about the constant. Let's call those options.
The additional lines specify additional key value pairs. Let's call them attributes. Attributes may have a single value or they may have a type/value specifier.
One or more attributes may appear in a single line.
Okay, so now we have a rough spec. What do we do with it?
How is the format structured? Consider the logical units of organization from largest to smallest. These will determine the structure and flow of our code.
A FILE is made of BLOCKS.
BLOCKS are made of LINES.
So our parser should decompose a file into blocks, and then handle the blocks.
Now we rough out a parser in comments:
# Parse a constant spec file.
# Until file is done:
# Read in a whole block
# Parse the block and return key/value pairs for a hash.
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
# Return ref to big hash with all block data
Now we start to fill in some code:
# Parse a constant spec file.
sub parse_constant_spec {
my $fh = shift;
my %spec;
# Until file is done:
# Read in a whole block
while( my $block = read_block($fh) ) {
# Parse the and return key/value pairs for a hash.
my %constant = parse_block( $block );
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
$spec{ $constant{name} } = \%constant;
}
# Return ref to big hash with all block data
return \%spec;
}
But it won't work. The parse_block and read_block subs haven't been written yet. At this stage that's OK. The point is to rough in features in small, understandable chunks. Every once in a while, to keep things readable you need to gloss over the details drop in a subroutine--otherwise you wind up with monstrous 1000 line subs that are impossible to debug.
Now we know we need to write a couple of subs to finish up, et viola:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $fh = \*DATA;
print Dumper parse_constant_spec( $fh );
# Parse a constant spec file.
# Pass in a handle to process.
# As long as it acts like a file handle, it will work.
sub parse_constant_spec {
my $fh = shift;
my %spec;
# Until file is done:
# Read in a whole block
while( my $block = read_block($fh) ) {
# Parse the and return key/value pairs for a hash.
my %constant = parse_block( $block );
# Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
$spec{ $constant{const_name} } = \%constant;
}
# Return ref to big hash with all block data
return \%spec;
}
# Read a constant definition block from a file handle.
# void return when there is no data left in the file.
# Otherwise return an array ref containing lines to in the block.
sub read_block {
my $fh = shift;
my #lines;
my $block_started = 0;
while( my $line = <$fh> ) {
$block_started++ if $line =~ /^constant/;
if( $block_started ) {
last if $line =~ /^\s*$/;
push #lines, $line;
}
}
return \#lines if #lines;
return;
}
sub parse_block {
my $block = shift;
my ($start_line, #attribs) = #$block;
my %constant;
# Break down first line:
# First separate assignment from option list.
my ($start_head, $start_tail) = split /=/, $start_line;
# work on option list
my #options = split /\s+/, $start_head;
# Recover constant_name from options:
$constant{const_name} = pop #options;
$constant{options} = \#options;
# Now we parse the value/type specifier
#constant{'type', 'value' } = parse_type_value_specifier( $start_tail );
# Parse attribute lines.
# since we've already got multiple per line, get them all at once.
chomp #attribs;
my $attribs = join ' ', #attribs;
# we have one long line of mixed key = "value" or key = <TYPE VALUE>
#attribs = $attribs =~ /\s*(\w+\s+=\s+".*?"|\w+\s+=\s+<.*?>)\s*/g;
for my $attrib ( #attribs ) {
warn "$attrib\n";
my ($name, $value) = split /\s*=\s*/, $attrib;
if( $value =~ /^"/ ) {
$value =~ s/^"|"\s*$//g;
}
elsif( $value =~ /^</ ) {
$value = [ parse_type_value_specifier( $start_tail ) ];
}
else {
warn "Bad line";
}
$constant{ $name } = $value;
}
return %constant;
}
sub parse_type_value_specifier {
my $tvs = shift;
my ($type, $value) = $tvs =~ /<(\w+)\s+(.*?)>/;
return $type, $value;
}
__DATA__
constant fixup GemEstabCommDelay = <U2 20>
vid = 6
name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
units = "s"
min = <U2 0>
max = <U2 1800>
default = <U2 20>
constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
vid = 4
name = "" units = ""
constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
vid = 0
name = ""
units = ""
The above code is far from perfect. IMO, parse_block is too long and ought to be broken into smaller subs. Also, there isn't nearly enough validation and enforcement of well-formed input. Variable names and descriptions could be clearer, but I don't really understand the semantics of your data format. Better names would more closely match the semantics of the data format.
Despite these issues, it does parse your format and produce a big handy data structure that can be stuffed into whatever output format you want.
If you use this format in many places, I recommend putting the parsing code into a module. See perldoc perlmod for more info.
Now, please stop using global variables and ignoring good advice. Please start reading the perldoc, read Learning Perl and Perl Best Practices, use strict, use warnings. While I am throwing reading lists around go read Global Variables are Bad and then wander around the wiki to read and learn. I learned more about writing software by reading c2 than I did in school.
If you have questions about how this code works, why it is laid out as it is, what other choices could have been made, speak up and ask. I am willing to help a willing student.
Your English is good, but it is clear you are not a native speaker. I may have used too many complex sentences. If you need parts of this written in simple sentences, I can try to help. I understand that working in a foreign language is very difficult.
The answer depends on what you mean by a blank line (whether it contains no characters apart from a newline or whether it contains only whitespace). An idiomatic way to deal with this is to use a negative match against \S which matches in both of these cases:
if ( ! /\S/ ) {
...
}
If you are only looking for the former than your own answer is fine.
You often see this technique used as a filter:
while (<>) {
next unless /\S/; # Ignore blank lines.
...
}
You can use:
if ($_ =~ /^$/)
or even just
if (/^$/)
since Perl assumes checking against $_
If you just want to check if the current value of $_ or $var is a blank (or at least all-whitespace) line, then something like
if (/^\s*$/) { ... }
if ($var =~ /^\s*$/){ ... }
as several others have already mentioned.
However, I find that I most commonly want to ignore blank lines while processing input in a loop. I do that like this:
while (<>) {
next if /^\s*$/;
...
}
If I want to allow the traditional shell-style comments, I usually add
s/\s*#.*$//;
just before the check for a blank line.
while (<>){
chomp;
if ($_ eq ""){
print "blank at $.\n";
}
}
The way you showed - if ( $_ eq '' ) is perfectly sane. Perhaps you should describe what is your problem with it?
if(/^\s*$/)
{
$x = 0;
}