I wrote a grep-like program in perl6, and now I made it into parallel processing. But I ran into some problem: even with the same command line the program sometimes succeeds, and sometimes fails. When it succeeds, things looks just normal to me. When it fails, I don't know why...
Here is the error message when it fails.
> grep6 perl *
An operation first awaited:
in sub MAIN at /Users/xxx/Dropbox/bin/grep6 line 28
in block <unit> at /Users/xxx/Dropbox/bin/grep6 line 30
Died with the exception:
Cannot find method 'Any' on object of type Match
in regex at /Users/xxx/Dropbox/bin/grep6 line 34
in sub do_something at /Users/xxx/Dropbox/bin/grep6 line 34
in block at /Users/xxx/Dropbox/bin/grep6 line 24
And the code is:
#!/usr/bin/env perl6
constant $color_red = "\e[31m";
constant $color_off = "\e[0m";
sub MAIN(Str $pattern, *#filenames){
my $channel = Channel.new();
$channel.send($_) for #filenames; # dir();
$channel.close;
my #workers;
for 1..3 -> $n {
push #workers, start {
while (my $file = $channel.poll) {
do_something($pattern, $file);
}
}
}
await(#workers);
}
sub do_something(Str $pattern, Str $filename) {
#say $filename;
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ (<$pattern>) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}
My question is why it fails sometimes?
Regards
Xin
This problem seems to be basically the same as a known rakudo issue for the race method.
I switched from:
if $temp ~~ s:g/ (<$pattern>) /$color_red$0$color_off/ {
to:
if $temp ~~ s:g/ ($pattern) /$color_red$0$color_off/ {
and the problem seemed to go away.
As later mentioned by Xin Cheng and also described in the same doc, the simpler interpolation matches literally as clarified by the doc examples. The issue ticket fixed the problem with something like:
my $reg = regex { <$pattern> };
'' ~~ $reg;
leading to an updated program with a similar workaround:
#!/usr/bin/env perl6
constant $color_red = "\e[31m";
constant $color_off = "\e[0m";
sub MAIN(Str $pattern, *#filenames){
my $channel = Channel.new();
$channel.send($_) for #filenames; # dir();
$channel.close;
my #workers;
# match seems required for pre-compilation
'' ~~ (my regex pat_regex { <$pattern> });
for 1..3 -> $n {
push #workers, start {
while (my $file = $channel.poll) {
do_something(&pat_regex, $file);
}
}
}
await(#workers);
}
sub do_something(Regex $pat_regex, Str $filename) {
# say $filename;
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ ($pat_regex) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}
My apologies for the earlier proposed explicit EVAL solution, about which the best I can say is that my description requested a better solution.
Did a bit of playing about the issue seems to be the anonymous regexp you're creating by doing :
s:g/ (<$pattern>) /$color_red$0$color_off/
If you instead precompile your regex (either in do_something or the MAIN routine then the errors stop.
Here's the updated do_something version :
sub do_something(Str $pattern, Str $filename) {
my $reg = regex { $pattern };
for $filename.IO.lines -> $line {
my Str $temp = $line;
if $temp ~~ s:g/ ($reg) /$color_red$0$color_off/ {
say $filename ~ ": " ~ $temp;
}
}
}
Related
I'm attempting to count matches of a regex using a BagHash, and getting odd results.
my $fh = open "versions.txt";
my $versions = BagHash.new();
while (defined my $line = $fh.get) {
my $last = '';
if $line ~~ /(\d+)\.?(\d*)/ {
say 'match ' ~ $/[0];
if $last !eq $/[0] {
say 'not-same: ' ~ $/[0];
$versions{$/[0]}++
}
$last = $/[0];
}
else {
$last = '';
}
}
say 'count: ' ~ $versions.elems;
Output is:
match 234
not-same: 234
match 999
not-same 999
count: 1 # I expect 2 here.
The test case I'm working with is:
version history thingy
version=234.234
version=999
What am I missing?
You are resetting $last with each iteration. Also, don't trust say. It's meant to be used to avoid flooding a terminal or logfile with infinite lists. Use dd (Rakudo internal) or a module to dump debug output. If you would have used dd would would have seen that $/[0] contains a Match, a complex structure that is not suited to generate Hash keys.
# my #lines = slurp('version.txt');
my #lines = ('version=234.234', 'version=999');
my BagHash $versions.=new;
for #lines {
ENTER my $last = '';
if .Str ~~ /(\d+) '.'? (\d*)/ {
$versions{$0.Str}++ if $last ne $0.Str;
$last = $0.Str
}else{
$last = ''
}
};
dd $versions;
# OUTPUT«BagHash $versions = ("234"=>1,"999"=>1).BagHash»
The whole point of BagHash is that it's constructor will do the counting for you. If you supply lazy lists all the way down, this can be fairly efficient.
my #lines = ('version=234.234', 'version=999');
dd BagHash.new(#lines».split('=')».[1]);
# OUTPUT«("234.234"=>1,"999"=>1).BagHash»
Bug #1 is you almost certainly want your $last declaration outside of the loop so you don't keep resetting it to ''
Bug #2 you probably only want to update $last on the state where you found a version number not for all lines
Bug #3 you used the Match object as the key to the HashBag rather than the string value of the version. You can coerce a match to being the string it matched with ~$/[0] but just $0 is a shortcut for that too.
I cleaned up your code and got the below that works, but is really quite far from being idiomatic Perl 6:
my $fh = open "versions.txt";
my $versions = BagHash.new();
my $last = '';
for $fh.lines -> $line {
if $line ~~ /(\d+)\.?(\d*)/ {
say 'match ' ~ $/[0];
if $last ne $/[0] {
say 'not-same: ' ~ $/[0];
$versions{~$/[0]}++;
$last = $/[0];
}
}
else {
$last = '';
}
}
say $versions;
say 'count: ' ~ $versions.elems;
I would personally have written this as follows if it was throw away code:
my $versions = "versions.txt".IO.lines.comb(/(\d+)\.?(\d*)/).Bag;
say $versions.elems;
If you wanted the file later or to do more with each line or this is for production:
my %versions;
for "versions.txt".IO.lines -> $line {
if $line ~~ /((\d+)\.?(\d*))/ {
%versions{$0}++;
}
}
say %versions.elems;
I'm just starting to learn Perl. I need to parse JavaScript file. I came up with the following subroutine, to do it:
sub __settings {
my ($_s) = #_;
my $f = $config_directory . "/authentic-theme/settings.js";
if ( -r $f ) {
for (
split(
'\n',
$s = do {
local $/ = undef;
open my $fh, "<", $f;
<$fh>;
}
)
)
{
if ( index( $_, '//' ) == -1
&& ( my #m = $_ =~ /(?:$_s\s*=\s*(.*))/g ) )
{
my $m = join( '\n', #m );
$m =~ s/[\'\;]//g;
return $m;
}
}
}
}
I have the following regex, that removes ' and ; from the string:
s/[\'\;]//g;
It works alright but if there is a mentioned chars (' and ;) in string - then they are also removed. This is undesirable and that's where I stuck as it gets a bit more complicated for me and I'm not sure how to change the regex above correctly to only:
Remove only first ' in string
Remove only last ' in string
Remove ont last ; in string if exists
Any help, please?
You can use the following to match:
^'|';?$|;$
And replace with '' (empty string)
See DEMO
Remove only first ' in string
Remove only last ' in string
^[^']*\K'|'(?=[^']*$)
Try this .See demo.
https://regex101.com/r/oF9hR9/8
Remove ont last ; in string if exists
;(?=[^;]*$)
Try this.See demo.
https://regex101.com/r/oF9hR9/9
All three in one
^[^']*\K'|'(?=[^']*$)|;(?=[^;]*$)
See Here
You can use this code:
#!/usr/bin/perl
$str = "'string; 'inside' another;";
$str =~ s/^'|'?;?$//g;
print $str;
IDEONE demo
The main idea is to use anchors: ^ beginning of string, $ end of string and ;? matches the ";" symbol at the end only if it is present (? quantifier is making the pattern preceding it optional).EDIT: Also, ; will get removed even if there is no preceding '.
I suggest that your original code should look more like this. It is much more idiomatic Perl and I think more straightforward to follow
sub __settings {
my ($_s) = #_;
my $file = "$config_directory/authentic-theme/settings.js";
return unless -r $file;
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my #file = <$fh>;
chomp #file;
for ( #file ) {
next if m{//};
if ( my #matches = $_ =~ /(?:$_s\s*=\s*(.*))/g ) {
my $matches = join "\n", #matches;
$matches =~ tr/';//d;
return $matches;
}
}
}
I am trying to write a perl script that get all strings that is does not start and end with a single quote. And a string cannot be a part of comment # and each line in DATA is not necessary at the beginning of a line.
use warnings;
use strict;
my $file;
{
local $/ = undef;
$file = <DATA>;
};
my #strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",#strings);
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
I am getting no where with this regex.
The expected output is
"This is a string2"
"This is comment syntax #"
"This is string 4"
Obviously this is only an exercise, as there are been many students asking about this problem lately. Regex's will only ever get you part of the way there, as there will pretty much always be edge cases.
The following code is probably good enough for your purposes, but it doesn't even successfully parse itself because of quotes inside a qr{}. You'll have to figure out how to get strings that span lines to work on your own:
use strict;
use warnings;
my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;
my $data = do { local $/; <DATA> };
while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
my $match = $1;
if ($match =~ /^#/) {
print "Comment - $match\n";
} elsif ($match =~ /^"/) {
print "Double quote - $match\n";
} elsif ($match =~ /^'/) {
print "Single quote - $match\n";
} else {
die "Carp! something went wrong! <$match>";
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Do not know how to achieve that by using regular expression, so here is a simple hand-written lexer:
#!/usr/bin/perl
use strict;
use warnings;
sub extract_string {
my #buf = split //, shift;
while (my $peer = shift #buf) {
if ($peer eq '"') {
my $str = "$peer";
while ($peer = shift #buf) {
$str .= "$peer";
last if $peer eq '"';
}
if ($peer) {
return ($str, join '', #buf);
}
else {
return ("", "");
}
}
elsif ($peer eq '#') {
return ("", "");
}
}
}
my ($str, $buf);
while ($buf = <DATA>) {
chomp $buf;
while (1) {
($str, $buf) = extract_string $buf;
print "$str\n" if $str;
last unless $buf;
}
}
__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
Another option is using Perl module such as PPI.
I have put together a Perl script to go through a directory and match various keys in the source and output the results to a text file. The match operation works well, however the end goal is to perform a replace operation. The Perl script is as follows:
#!/usr/bin/perl
#use strict;
use warnings;
#use File::Slurp;
#declare variables
my $file = '';
my $verbose = 0;
my $logfile;
my #files = grep {/[.](pas|cmm|ptd|pro)$/i} glob 'C:\users\perry_m\desktop\epic_test\pascal_code\*.*';
#iterate through the files in input directory
foreach $file (#files) {
print "$file\n";
#read the file into a single string
open FILEHANDLE, $file or die $!;
my $string = do { local $/; <FILEHANDLE> };
#perfrom REGEX on this string
########################################################
#fix the include formats to conform to normal PASCAL
$count = 0;
while ($string =~ m/%INCLUDE/g)
{
#%include
$count++;
}
if ($count > 0)
{
print " $count %INCLUDE\n";
}
$count = 0;
while ($string =~ m/INCLUDE/g)
{
#%INCLUDE;
$count++;
}
if ($count > 0)
{
print " $count INCLUDE\n";
}
$count = 0;
while ($string =~ m/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/g)
{
#$1$2;
$count++;
}
if ($count > 0)
{
print " $count XXXX:include \n";
}
}
This produces output as desired, an example is below:
C:\users\perry_m\desktop\epic_test\pascal_code\BRTINIT.PAS
1 INCLUDE
2 XXXX:include
39 external and readonly
However if I change the regex operations to try and implement a replace, using the replacement operation shown in the commented lines above, the scripts hangs and never returns. I imagine it is somehow related to memory, but I am new to Perl. I was also trying to avoid parsing the file by line if possible.
Example:
while ($string =~ s/%INCLUDE/%include/g)
{
#%include
$count++;
}
and
while ($string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g)
{
#$1$2;
$count++;
}
Edit: simplified the examples
The problem is with your while loops. A loop like
while ($string =~ m/INCLUDE/g) { ... }
will execute once for each ocurrence of INCLUDE in the target string, but a subtitution like
$string =~ s/INCLUDE/%INCLUDE;/
will make all of the replacement in one go and retuen the number of replacements made. So a loop
while ($string =~ s/INCLUDE/%INCLUDE;/g) { ... }
will endlessly add more and more percentage signs before and semicolons after every INCLUDE.
To find the number of replacements made, change all your loops like this to just
$count = $string =~ s/INCLUDE/%INCLUDE;/g
the pattern in s/INCLUDE/%INCLUDE/g will match the replacement also, so if you're running it in a while loop it will run forever (until you run out of memory).
s///g will replace all matches in a single shot so you very rarely will need to put it in a loop. Same goes for m//g, it will do the counting in a single step if you put it in list context.
I would like to convert parse (la)tex math expressions, and convert them to (any kind of!) scripting language expression, so I can evaluate expressions.
What libraries do you recommend ?
May be it will help - take a look at TeXmacs, especially at a way it interacts with computer algebra systems.
Here is a set of possible options from a similar question. https://tex.stackexchange.com/questions/4223/what-parsers-for-latex-mathematics-exist-outside-of-the-tex-engines
I think that Perl would make a fine choice for something like this, acting on text is one of its fortes.
Here is some info on how to make an exclusive flip-flop test (to find the context between \begin{} and \end{} without keeping those lines), http://www.effectiveperlprogramming.com/2010/11/make-exclusive-flip-flop-operators/
EDIT: So this problem has started me going. Here is a first attempt to create something here is my "math.pl" which takes a .tex file as an arguement (i.e. $./math.pl test.tex).
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;
my $re_num = qr/[+\-\dE\.]/;
my $file = shift;
open( my $fh, '<', $file);
#parsing this out for more than just the equation environment might be easier using Text::Balanced too.
my #equations;
my $current_equation = '';
while(<$fh>) {
my $test;
next unless ($test = /\\begin\{equation\}/ .. /\\end\{equation\}/);
if ($test !~ /(^1|E0)$/ ) {
chomp;
$current_equation .= $_;
} elsif ($test =~ /E0$/) {
#print $current_equation . "\n";
push #equations, {eq => $current_equation};
$current_equation = '';
}
}
foreach my $eq (#equations) {
print "Full Equation: " . $eq->{'eq'} . "\n";
solve($eq);
print "Result: " . $eq->{'value'} . "\n\n";
}
sub solve {
my $eq = shift;
print $eq->{'eq'} . "\n";
parse($eq);
compute($eq);
print "intermediate result: " . $eq->{'value'} . "\n";
}
sub parse {
my $eq = shift;
my ($command,#fields) = extract_multiple(
$eq->{'eq'}, [ sub { extract_bracketed(shift,'{}') } ]
);
$command =~ s/^\\//;
print "command: " . $command . "\n";
#fields = map { s/^\{\ *//; s/\ *\}$//; print "arg: $_\n"; {value => $_}; } #fields;
($eq->{'command'}, #{ $eq->{'args'} }) = ($command, #fields);
}
sub compute {
my ($eq) = #_;
#check arguements ...
foreach my $arg (#{$eq->{'args'}}) {
#if arguement is a number, continue
if ($arg->{'value'} =~ /^$re_num$/) {
next;
#if the arguement is a simple mathematical operation, do it and continue
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\ |\*|\\times)?\ *($re_num)$/) {
$arg->{'value'} = $1 * $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\+)?\ *($re_num)$/) {
$arg->{'value'} = $1 + $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\-)?\ *($re_num)$/) {
$arg->{'value'} = $1 - $2;
} elsif ($arg->{'value'} =~ /^($re_num)\ *(?:\/)?\ *($re_num)$/) {
$arg->{'value'} = $1 / $2;
} else {
#parse it and calc it as if it were its own equation.
$arg->{'eq'} = $arg->{'value'};
solve($arg);
}
}
my #args = #{$eq->{'args'}};
## add command processing here
# frac
if ($eq->{'command'} eq 'frac') {
$eq->{'value'} = $args[0]->{'value'} / $args[1]->{'value'};
return;
}
}
and here is a sample test.tex:
\documentclass{article}
\begin{document}
Hello World!
\begin{equation}
\frac{\frac{1}{3}}{2}
\end{equation}
\end{document}
Maybe using boost::spirit in order to tokenize the expression. You will need to define a huge grammar!
Use a parser generator to create an appropriate parser. Try ANTLR for this, as it includes an IDE for the Grammar, which is very helpful. Using tree rewrite rules, you can then convert the parse tree to an abstract syntax tree.
Start perhaps with the expression evaluator from ANTLR tutorial. I think this is reasonably close enough.