Why is division parsed as regular expression? - regex

This is part of my code:
my $suma = U::round $item->{ suma }; # line 36
$ts += $suma;
$tnds += U::round $suma /6;
}
return( $ts, $tnds );
}
sub create { #line 46
my( $c ) = shift;
my $info = $c->req->json;
my $header = @$info[0];
my $details = @$info[1];
my $agre = D::T Agreement => $header->{ agreement_id };
my( $total_suma, $total_nds ) = total( $details );
my $saldo = 0;
my $iid = @$details[0]->{ period };
my $interval = D::T Period => $iid //7; # line 58
# This is first Invoice if operator do not provide activation date
my $is_first = !$details->[0]{valid_from} && $iid && $interval;
When this module is loaded I get an error:
Can't load application from file "lib/MaitreD/Controller/ManualDocument.pm line 38, near "my $interval = D::T Period => $iid /"
Unknown regexp modifier "/6" at lib/MaitreD/Controller/ManualDocument.pm line 38, at end of line
Global symbol "$pkg" requires explicit package name (did you forget to declare "my $pkg"?) at lib/MaitreD/Controller/ManualDocument.pm line 41.
...
Is this indirect object call guilty?
Because when I add parentheses, as in U::round( $suma /6 ), there are no errors.

Here are some thoughts on this, and a plausible explanation. A simple reproduction
perl -wE'sub tt { say "@_" }; $v = 7; tt $v /3'
gives me
Search pattern not terminated at -e line 1.
So it tries to parse a regex in that subroutine call, as stated, and the question is: why?
With parentheses around the argument(s) it works as expected. With more arguments following it, it fails the same way, but with arguments preceding it, it works:
perl -wE'sub tt { say "@_" }; $v = 7; tt $v /3, 3' # fails the same way
perl -wE'sub tt { say "@_" }; $v = 7; tt 3, $v /3' # works
Equipping the tt sub with a prototype doesn't change any of this.
By the error it appears that the / triggers the search for the closing delimiter and once it's not found the whole thing fails. So why is this interpreted as a regex and not division?
It seems that tt $v are grouped in parsing, and interpreted as a sub and its arguments, since they're followed by a space; then /3 is taken separately and then that does look like a regex.† That would still fail as a syntax error but perhaps the regex parsing failure comes first.
Then the difference between other comma-separated terms coming before or after is clear -- with tt 3, ... the following $v /3 is a term for the next argument, and is parsed as division.
This still leaves another issue. All builtins that I tried don't have this problem, be they list or unary operators, with a variety of prototypes (push, chr, splice, etc) -- except for print, which does have the same looking problem. And which fails both with and without parens.
perl -wE'$v=110; say for unpack "A1A1", $v /2' #--> 5 5
perl -wE'$v=200; say chr $v /2' #--> d
perl -wE'$v=3; push @ary, $v /2; say "@ary"' #--> 1.5
perl -wE'$v = 7; say $v /3' # fails, the same way
perl -wE'$v = 7; say( $v /3 )' # fails as well, same way
A difference is that print obeys "special" parsing rules, which allow the first argument to be a filehandle. (Also, it has no prototype but that doesn't appear to matter.) The say used in the examples above is parsed the same way as print.
Then the expression print $v /3... can indeed be parsed as print filehandle EXPR, and the EXPR starting with / is parsed as a regex. The same happens with parentheses.‡
All this involves some guesswork as I don't know how the parser does it. But it is clearly a matter of details of how a subroutine call is parsed, which (accidentally?) includes print as well.
An obvious remedy of using parens on (user-defined) subroutines is reasonable in my view. The other fix is to be consistent with spaces around math operators, to either not have them on either side or to use them on both sides -- that is fine as well, even as it's itchy (spaces? really?).
I don't know what to say about there being a problem with say( $v /3 ) though.
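The two remedies can be sketched as follows. This is only an illustration: half is a hypothetical helper standing in for U::round from the question. Each of the three call forms below should compile and pass the quotient to the sub, unlike the half $v /2 form, which would hit the regex parse described above.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for U::round from the question.
sub half { return $_[0] / 2 }

my $v = 8;

# Remedy 1: parentheses around the arguments of a user-defined sub.
my $with_parens = half( $v /2 );

# Remedy 2: consistent spacing around the operator (both sides, or neither).
my $spaced  = half $v / 2;
my $compact = half $v/2;

print("$with_parens $spaced $compact\n");
```

All three variables end up holding the same value, since in each case the division happens first and its result is passed to half.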
A couple more comments on the question.
By the text of the error message in the question, Unknown regexp modifier "/6", it appears that there the / is taken as the closing delimiter, unlike in the example above. And there is more in that message, which is unclear. In the end, we do have a very similar parsing question.
As for
Is this indirect object call guilty?
I don't see an indirect object call there, only a normal subroutine call. Also, the example from this answer displays very similar behavior and rules out the indirect object syntax.
† Another possibility may be that $v /3 is parsed as a term, since it follows the (identifiable!) subroutine name tt. Then, the regex binding operator =~ binds more tightly than the division, and here it is implied by clearly attempting to bind to $_ by default.
I find this less likely, and it also can't explain the behavior of builtins, print in particular.
‡ Then one can infer that other builtins with an optional comma-less first argument (and so without a prototype) go the same way but I can't readily think of any.

Perl thinks that the symbol / is the start of a regular expression and not a division operator. You can check the perldoc for regular expressions: https://perldoc.perl.org/perlre
You can try adding a whitespace character before the 6, like so: $tnds += U::round $suma / 6;

Related

Error while compiling regex function, why am I getting this issue?

My RAKU Code:
sub comments {
if ($DEBUG) { say "<filtering comments>\n"; }
my @filteredtitles = ();
# This loops through each track
for @tracks -> $title {
##########################
# LAB 1 TASK 2 #
##########################
## Add regex substitutions to remove superflous comments and all that follows them
## Assign to $_ with smartmatcher (~~)
##########################
$_ = $title;
if ($_) ~~ s:g:mrx/ .*<?[\(^.*]> / {
# Repeat for the other symbols
########################## End Task 2
# Add the edited $title to the new array of titles
@filteredtitles.push: $_;
}
}
# Updates @tracks
return @filteredtitles;
}
Result when compiling:
Error Compiling! Placeholder variable '@_' may not be used here because the surrounding block doesn't take a signature.
Is there something obvious that I am missing? Any help is appreciated.
So, in contrast with @raiph's answer, here's what I have:
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
Just that. Nothing else. Let's dissect it, from the inside out:
This part: / <[\(^]> / is a regular expression that will match one character, as long as it is an open parenthesis (represented by the \() or a caret (^). When they go inside the angle brackets/square brackets combo, it means that is an Enumerated character class.
Then, the S introduces the non-destructive substitution, i.e., a quoting construct that makes regex-based substitutions over the topic variable $_ but does not modify it; it just returns its value with the requested modifications. In the code above, S:g brings the adverb :g or :global (see the global adverb in the adverbs section of the documentation) into play, meaning (in the case of the substitution) "please make as many of these substitutions as possible". The final / marks the end of the substitution text, and as it is adjacent to the second /, that means that
S:g / <[\(^]> //
means "please return the contents of $_, but modified in such a way that all its characters matching the regex <[\(^]> are deleted (substituted for the empty string)"
At this point, I should emphasize that regular expressions in Raku are really powerful, and that reading the entire page (and probably the best practices and gotchas page too) is a good idea.
Next, the .map method, documented here, can be applied to any Iterable (List, Array and the like) and will return a sequence based on each element of the Iterable, altered by a Code passed to it. So, something like:
@x.map({ S:g / foo /bar/ })
essentially means "please return a Sequence of every item in @x, modified by substituting any appearance of the substring foo with bar" (nothing will be altered in @x). A nice place to start to learn about sequences and iterables would be here.
Finally, my one-liner
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
can be translated as:
I have a List with three string elements
Foo
Ba(r
B^az
(This would be a placeholder for your "list of titles"). Take that list and generate a second one, that contains every element on it, but with all instances of the chars "open parenthesis" and "caret" removed.
Ah, and store the result in the variable @tracks (that has my scope)
Here's what I ended up with:
my @tracks = <Foo Ba(r B^az>;
sub comments {
my @filteredtitles;
for @tracks -> $_ is copy {
s:g / <[\(^]> //;
@filteredtitles.push: $_;
}
return @filteredtitles;
}
The is copy ensures the variable set up by the for loop is mutable.
The s:g/...//; is all that's needed to strip the unwanted characters.
One thing no one can help you with is the error you reported. I currently think you just got confused.
Here's an example of code that generates that error:
do { @_ }
But there is no way the code you've shared could generate that error because it requires that there is an @_ variable in your code, and there isn't one.
One way I can help in relation to future problems you may report on StackOverflow is to encourage you to read and apply the guidance in Minimal Reproducible Example.
While your code did not generate the error you reported, it will perhaps help you if you know about some of the other compile time and run time errors there were in the code you shared.
Compile-time errors:
You wrote s:g:mrx. That's invalid: Adverb mrx not allowed on substitution.
You missed out the third slash of the s///. That causes mayhem (see below).
There were several run-time errors, once I got past the compile-time errors. I'll discuss just one, the regex:
.*<?[...]> will match any sub-string with a final character that's one of the ones listed in the [...], and will then capture that sub-string except without the final character. In the context of an s:g/...// substitution this will strip ordinary characters (captured by the .*) but leave the special characters.
This makes no sense.
So I dropped the .*, and also the ? from the special character pattern, changing it from <?[...]> (which just tries to match against the character, but does not capture it if it succeeds) to just <[...]> (which also tries to match against the character, but, if it succeeds, does capture it as well).
A final comment is about an error you made that may well have seriously confused you.
In a nutshell, the s/// construct must have three slashes.
In your question you had code of the form s/.../ (or s:g/.../ etc), without the final slash. If you try to compile such code the parser gets utterly confused because it will think you're just writing a long replacement string.
For example, if you wrote this code:
if s/foo/ { say 'foo' }
if m/bar/ { say 'bar' }
it'd be as if you'd written:
if s/foo/ { say 'foo' }\nif m/...
which in turn would mean you'd get the compile-time error:
Missing block
------> if m/⏏bar/ { ... }
expecting any of:
block or pointy block
...
because Raku(do) would have interpreted the part between the second and third /s as the replacement double quoted string of what it interpreted as an s/.../.../ construct, leading it to barf when it encountered bar.
So, to recap, the s/// construct requires three slashes, not two.
(I'm ignoring syntactic variants of the construct such as, say, s [...] = '...'.)

In Ruby, how can the Regexp#~ unary operator be aliased?

Playing with the freedom that Ruby offers in its base features, I found it rather easy to alias most operators used in the language, but the Regexp#~ unary prefix operator is trickier.
A first naïve approach would be to alias it in the Regexp class itself:
class Regexp
alias hit ~@ # remember that @ stands for "prefix version"
# Note that a simple `alias_method :hit, :~@` will give the same result
end
As was pointed out in an answer below, this approach does work with the dot-notation calling form, like /needle/.hit. However, trying to execute hit /needle/ will raise undefined method `hit' for main:Object (NoMethodError).
So another naïve approach would be to define this very method in Object, something like:
class Object
def ~@(pattern)
pattern =~ $_
end
end
However, this won’t work, as the $_ global variable is in fact locally bound and won’t keep the value it has in the calling context; that is, $_ is always nil in the previous snippet.
So the question is: is it possible to have the expression hit /needle/ return the same result as ~ /needle/?
Works just fine for me:
class Regexp
alias_method :hit, :~ # both of them work
# alias hit ~ # both of them work
end
$_ = "input data"
/at/.hit #=> 7
~/at/ #=> 7
/at/.hit #=> 7
~/at/ #=> 7
So, as the completed question now makes clear, the main hindrance is the narrow scope of $_. That’s where trace_var can come to the rescue:
trace_var :$_, proc { |nub|
$last_explicitly_read_line = nub
#puts "$_ is now '#{nub}'"
}
def reach(pattern)
$last_explicitly_read_line =~ pattern
end
def first
$_ = "It’s needless to despair."
end
def second
first
p reach /needle/
$_ = 'What a needlework!'
p reach /needle/
end
p reach /needle/
second
p reach /needle/
$_ = nil
p reach /needle/
So the basic idea is to stash the value of $_, each time it is changed, in another variable that will be accessible in subsequent calling contexts. Here it was implemented with another global variable (not locally bound, unlike $_ of course), but the same result could be obtained with other implementations, like defining a class variable on Object.
One could also try to use something like binding_of_caller or binding_ninja, but my own attempt at doing so failed, and of course it comes with additional dependencies which have their own limitations.

How to fix 'Bareword found' issue in perl eval()

The following code returns "Bareword found where operator expected at (eval 1) line 1, near "*,out" (Missing operator before out?)"
$val = 0;
$name = "abc";
$myStr = '$val = ($name =~ in.*,out [)';
eval($myStr);
As per my understanding, I can resolve this issue by wrapping "in.*,out [" block with '//'s.
But that "in.*,out [" part can vary (e.g. it may be user input), and users may forget to add the '//'s. Is there any other way to handle this issue (e.g. return 0 if eval() hits that 'Bareword found where ...' error)?
The magic of (string) eval -- and the danger -- is that it turns a heap of dummy characters into code, compiles and runs it. So can one then use '$x = ,hi'? Well, no, of course, when that string is considered code then that's a loose comma operator there, a syntax error; and a "bareword" hi.† The string must yield valid code:
In a string eval, the value of the expression (which is itself determined within scalar context) is first parsed, and if there were no errors, executed as a block within the lexical context of the current Perl program.
So that string in the question as it stands would be just (badly) invalid code, which won't compile, period. If the in.*,out [ part of the string is in quotes of some sort, then that is legitimate and the =~ operator will take it as a pattern and you have a regex. But then of course why not use regex's normal pattern delimiters, like // (or m{}, etc).
And whichever way that string gets acquired it'll be in a variable, no? So you can have /$input/ in the eval and populate that $input beforehand.
But, above all, are you certain that there is no other way? There always is. The string-eval is complex and tricky and hard to use right and nigh impossible to justify -- and dangerous. It runs arbitrary code! That can break things badly even without any bad intent.
I'd strongly suggest to consider other solutions. Also, it is unclear why there'd be need for eval in the first place -- as you only need the regex pattern as user input (not code) you can have that very regex in normal code with a pattern in a variable, which is populated earlier when the user input is supplied. (Note that taking a pattern from the user may lead to trouble as well.)
† A problem if you're into warnings, and we all are.
The following isn't valid Perl code:
$val = ($name =~ in.*,out [)
You want the following:
$val = $name =~ /in.*,out \[/
(The parens weren't harmful, but didn't help either.)
If the pattern is user-supplied, you can use the following:
$val = $name =~ /$pattern/
(No eval EXPR needed!)
Note from the correction that the pattern in the question isn't correct. You can catch such errors using eval BLOCK
eval { $val = $name =~ /$pattern/ };
die("Bad pattern \"$pattern\" provided: $@") if $@;
A note about user-provided patterns: The above won't let the user execute arbitrary code, but it won't protect you from patterns that would take longer than the lifespan of the universe to complete.
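For the "return 0 on a bad pattern" behaviour asked about in the question, a minimal sketch could look like the following. The helper name safe_match and its return convention are assumptions made for illustration, not something from the question. The pattern is compiled inside eval BLOCK, so a pattern that fails to compile yields 0 instead of killing the program.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 on a match, 0 on no match or an invalid pattern.
sub safe_match {
    my ($text, $pattern) = @_;

    # qr// dies on an invalid pattern; eval BLOCK traps that and yields undef.
    my $re = eval { qr/$pattern/ };
    return 0 unless defined $re;

    return $text =~ $re ? 1 : 0;
}

# Invalid pattern from the question (unmatched "["), then the corrected one.
print( safe_match('abc',        'in.*,out ['),  "\n" );
print( safe_match('in a,out [', 'in.*,out \['), "\n" );
```

Compiling with qr// first also keeps the "is the pattern valid" check separate from the "does it match" check.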

Why does the match operator's "match-only-once" optimization only apply with the "?" delimiter?

From the docs (perldoc -f m)
If ? is the delimiter, then a match-only-once rule applies, described in m?*PATTERN*? below.
The "match-only-once rule" doesn't seem to be defined anywhere, but it seems to be a real optimization:
use Benchmark qw(:all) ;
use constant HAYSTACK => "this is a test string";
my $needle = "test";
cmpthese(-1, {
'questionmark' => sub { if ( HAYSTACK =~ m?$needle?n ) { 1 } },
'backslash' => sub { if ( HAYSTACK =~ m/$needle/n ) { 1 } },
});
With the results,
                   Rate    backslash questionmark
backslash     9267717/s           --         -57%
questionmark 21588328/s         133%           --
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior? Let's take for example the output
perl -E'say "FOOOOOO" =~ m/O/' # returns 1
If it's not even counting the O what does it do after the first match such that it's twice as slow?
The "match-only-once rule" doesn't seem to be defined anywhere, […]
"A match-only-once rule" is a description of the rule — it's a rule saying that m?PATTERN? matches only once — not an official name that you can use to search. The text that you quote is pulled from the perlop manpage, so when it says "described in m?*PATTERN*? below", it's referring to this part of that manpage:
m?PATTERN?msixpodualngc
This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only m?? patterns local to the current package are reset.
while (<>) {
if (m?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear m?? status for next file
}
Another example switched the first "latin1" encoding it finds to "utf8" in a pod file:
s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior?
Even in scalar context, m// or m?? may be called many times between resets, and if so then the two behave differently. (You can see this in the first snippet above. It's also the reason that your benchmarks give different performance results: the version with m?$needle?n only does a regex match the first time the function is called — it just returns 'no match' on all subsequent calls — whereas the version with m/$needle/n does a regex match every time.)
The confusion here is that "once" in "match-only-once" is in reference to the calling context of the m?? not in reference to matching once the needle inside the haystack, and ignoring subsequent matches of the needle inside the haystack. So if m?? is called many times without reset, only the first one that matches will return the match.
sub foo { return "foo" =~ m?o? };
say foo(); # 1
say foo(); # undef
reset();
say foo(); # 1
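The benchmark difference can be made visible by counting successful matches instead of timing them. A small sketch follows; the sub and counter names are made up for illustration.

```perl
use strict;
use warnings;

my $haystack = "this is a test string";

# Each sub contains exactly one match operator, so its state persists across calls.
sub match_once  { my $ok = $haystack =~ m?test?; return $ok ? 1 : 0 }
sub match_every { my $ok = $haystack =~ m/test/; return $ok ? 1 : 0 }

my ($once, $every) = (0, 0);
for (1 .. 5) {
    $once  += match_once();    # succeeds only on the first call
    $every += match_every();   # succeeds on every call
}
print "m?? matched $once time(s), m// matched $every time(s)\n";

reset;                         # re-arm m?? patterns in the current package
$once += match_once();         # succeeds again after the reset
print "after reset: m?? matched $once time(s)\n";
```

Four of the five m?? calls in the loop return immediately without running the regex engine, which is consistent with the higher rate seen in the benchmark.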

Perl switch/case Fails on Literal Regex String Containing Non-Capturing Group '?'

I have text files containing lines like:
2/17/2018 400000098627 =2,000.0 $2.0994 $4,387.75
3/7/2018 1)0000006043 2,000.0 $2.0731 $4,332.78
3/26/2018 4 )0000034242 2,000.0 $2.1729 $4,541.36
4/17/2018 2)0000008516 2,000.0 $2.219 $4,637.71
I am matching them with /^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/ But I also have some files with lines in a completely different format, which I match with a different regex. When I open a file I determine which format and assign $pat = '<regex-string>'; in a switch/case block:
$pat = '/^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/'
But the ? character that introduces the non-capturing group I use to match repeats after the date and before the first currency amount causes the Perl interpreter to fail to compile the script, reporting on abort:
syntax error at ./report-dates-amounts line 28, near "}continue "
If I delete the ? character, or replace ? with \? escaped character, or first assign $q = '?' then replace ? with $q inside a " string assignment (ie. $pat = "/^\s*(\S+)\s+($q:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/"; ) the script compiles and runs. If I assign the regex string outside the switch/case block that also works OK. Perl v5.26.1 .
My code also doesn't have any }continue in it, which as reported in the compilation failure is probably some kind of transformation of the switch/case code by Switch.pm into something native the compiler chokes on. Is this some kind of bug in Switch.pm? It fails even when I use given/when in exactly the same way.
#!/usr/local/bin/perl
use Switch;
# Edited for demo
switch($format)
{
# Format A eg:
# 2/17/2018 400000098627 =2,000.0 $2.0994 $4,387.75
# 3/7/2018 1)0000006043 2,000.0 $2.0731 $4,332.78
# 3/26/2018 4 )0000034242 2,000.0 $2.1729 $4,541.36
# 4/17/2018 2)0000008516 2,000.0 $2.219 $4,637.71
#
case /^(?:april|snow)$/i
{ # This is where the ? character breaks compilation:
$pat = '^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$';
# WORKS:
# $pat = '^\s*(\S+)\s+(' .$q. ':[0-9|\)| ]+)+\s+\D' .$q. '(\S+)\s+\$';
}
# Format B
case /^(?:umberto|petro)$/i
{
$pat = '^(\S+)\s+.*Think 1\s+(\S+)\s+';
}
}
Don't use Switch. As mentioned by @choroba in the comments, Switch uses a source filter, which leads to mysterious and hard-to-debug errors, as you found.
The module's documentation itself says:
In general, use given/when instead. It were introduced in perl 5.10.0. Perl 5.10.0 was released in 2007.
However, given/when is not necessarily a good option as it is experimental and likely to change in the future (it seems that this feature was almost removed from Perl v5.28; so you definitely don't want to start using it now if you can avoid it). A good alternative is to use for:
for ($format) {
if (/^(?:april|snow)$/i) {
...
}
elsif (/^(?:umberto|petro)$/i) {
...
}
}
It might look weird at first, but once you get used to it, it's actually reasonable in my opinion. Or, of course, you can use none of these options and just do:
sub pattern_from_format {
my $format = shift;
if ($format =~ /^(?:april|snow)$/i) {
return qr/^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$/;
}
elsif ($format =~ /^(?:umberto|petro)$/i) {
return qr/^(\S+)\s+.*Think 1\s+(\S+)\s+/;
}
# Some error handling here maybe
}
If, for some reason, you still want to use Switch: use m/.../ instead of /.../.
I have no idea why this bug is happening, however, the documentation says:
Also, the presence of regexes specified with raw ?...? delimiters may cause mysterious errors. The workaround is to use m?...? instead.
Which I misread at first, and therefore tried to use m/../ instead of /../, which fixed the issue.
Another option instead of an if/elsif chain would be to loop over a hash which maps your regular expressions to the values which should be assigned to $pat:
#!/usr/local/bin/perl
my %switch = (
'^(?:april|snow)$' => '^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$',
'^(?:umberto|petro)$' => '^(\S+)\s+.*Think 1\s+(\S+)\s+',
);
for my $re (keys %switch) {
if ($format =~ /$re/i) {
$pat = $switch{$re};
last;
}
}
For a more general case (i.e., if you're doing more than just assigning a string to a scalar) you could use the same general technique, but use coderefs as the values of your hash, thus allowing it to execute an arbitrary sub based on the match.
This approach can cover a pretty wide range of the functionality usually associated with switch/case constructs, but note that, because the conditions are pulled from the keys of a hash, they'll be evaluated in a random order. If you have data which could match more than one condition, you'll need to take extra precautions to handle that, such as having a parallel array with the conditions in the proper order or using Tie::IxHash instead of a regular hash.
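One way to keep the evaluation order deterministic is to store the conditions in an array of pairs rather than a hash, with coderefs as the handlers. A minimal sketch, where the name pattern_for and the handler bodies are placeholders chosen for illustration:

```perl
use strict;
use warnings;

# Ordered list of [ condition, handler ] pairs; earlier entries win.
my @dispatch = (
    [ qr/^(?:april|snow)$/i    => sub { return '^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$' } ],
    [ qr/^(?:umberto|petro)$/i => sub { return '^(\S+)\s+.*Think 1\s+(\S+)\s+' } ],
);

# Runs the handler of the first condition that matches $format.
sub pattern_for {
    my ($format) = @_;
    for my $entry (@dispatch) {
        my ($re, $handler) = @$entry;
        return $handler->($format) if $format =~ $re;
    }
    return;    # no condition matched
}

my $pat = pattern_for('Snow');
print( defined $pat ? "$pat\n" : "no match\n" );
```

Because each handler is a coderef, it can do arbitrary work before returning, and the array order replaces the extra bookkeeping a plain hash would need.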