Is this expected behavior for Perl smart match?

### Code Here ###
use 5.012;
use warnings;
my @a = (1, 'Ah');
say (@a ~~ /^1$/ ? 'TRUE' : 'FALSE');
say ('1' ~~ @a ? 'TRUE' : 'FALSE');
say (@a ~~ "Ah" ? 'TRUE' : 'FALSE');
say (@a ~~ /^Ah$/ ? 'TRUE' : 'FALSE');
### STDOUT ###
TRUE
TRUE
FALSE
TRUE
Shouldn't all of these pass?

Smartmatch's behavior is generally determined by the type of the right operand; it's not symmetrical. Keep the array on the right side and you should see the behavior you expect.
From perlop:
It is often best read aloud as "in", "inside of", or "is contained in", because the left operand is often looked for inside the right operand. That makes the order of the operands to the smartmatch operand often opposite that of the regular match operator. In other words, the "smaller" thing is usually placed in the left operand and the larger one in the right.
Giving this a read over again, honestly the behavior seems a bit bizarre and unpredictable, and I'd limit my use of it to either well-documented helper functions or extremely trivial cases.

Smartmatch is a subtle beast, and the Perl5 implementation is arguably buggy – it was demoted to experimental status in the 5.18 release.
We can look at the table of possible type combinations to determine which case is chosen.
The @a ~~ /^1$/ has type ARRAY ~~ Regexp which has the description “any ARRAY elements match Regexp. Like: grep { /Regexp/ } ARRAY”.
The '1' ~~ @a has type Any ~~ ARRAY, which has the description “smartmatch each ARRAY element. Like: grep { Any ~~ $_ } ARRAY”. The second level of smart matches should use the Any ~~ Num and Any ~~ Any cases.
The @a ~~ "Ah" probably has type Any ~~ Any, which does string comparison!
The @a ~~ /^Ah$/ is the above regex case again.
The smartmatch table is best understood by looking at the right argument. If it is a collection, the smartmatch is an "in" operator. If it is a regex or a coderef, smartmatch behaves like an application. If it is a simple scalar, then an ordinary comparison (either == or eq) is done.
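As a rough sketch of the cases above, the same four checks can be spelled out without smartmatch, using grep and eq explicitly. The variable names are only illustrative, and the third check approximates the Any ~~ Any case by comparing the stringified array reference, which is an assumption about how that comparison plays out rather than something quoted from the documentation.

```perl
use strict;
use warnings;

my @a = (1, 'Ah');

# ARRAY ~~ Regexp: does any element match the pattern?
my $any_is_1  = (grep { /^1$/ }  @a) ? 'TRUE' : 'FALSE';
my $any_is_ah = (grep { /^Ah$/ } @a) ? 'TRUE' : 'FALSE';

# Any ~~ ARRAY: is the scalar equal to any element?
my $one_in_a = (grep { $_ eq '1' } @a) ? 'TRUE' : 'FALSE';

# Array on the left, plain string on the right: a string comparison
# against the array as a whole (assumed here to be its stringified
# reference), not a search through its elements.
my $a_eq_ah = (\@a eq 'Ah') ? 'TRUE' : 'FALSE';

# Same order as the four lines in the question.
print "$any_is_1 $one_in_a $a_eq_ah $any_is_ah\n";
```

Written this way, it is visible at a glance which side is being searched and which comparison is applied.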

Related

How to fix 'Bareword found' issue in perl eval()

The following code returns "Bareword found where operator expected at (eval 1) line 1, near "*,out" (Missing operator before out?)"
$val = 0;
$name = "abc";
$myStr = '$val = ($name =~ in.*,out [)';
eval($myStr);
As I understand it, I can resolve this by wrapping the "in.*,out [" part in '//'s.
But "in.*,out [" can vary (e.g. it may come from user input), and users may forget to supply the '//'s. Is there any other way to handle this? (E.g. return 0 if eval() hits that 'Bareword found where ...' error.)
The magic of (string) eval -- and the danger -- is that it turns a heap of dummy characters into code, compiles and runs it. So can one then use '$x = ,hi'? Well, no, of course, when that string is considered code then that's a loose comma operator there, a syntax error; and a "bareword" hi.† The string must yield valid code.
In a string eval, the value of the expression (which is itself determined within scalar context) is first parsed, and if there were no errors, executed as a block within the lexical context of the current Perl program.
So that string in the question as it stands would be just (badly) invalid code, which won't compile, period. If the in.*,out [ part of the string is in quotes of some sort, then that is legitimate and the =~ operator will take it as a pattern and you have a regex. But then of course why not use regex's normal pattern delimiters, like // (or m{}, etc).
And whichever way that string gets acquired it'll be in a variable, no? So you can have /$input/ in the eval and populate that $input beforehand.
But, above all, are you certain that there is no other way? There always is. The string-eval is complex and tricky and hard to use right and nigh impossible to justify -- and dangerous. It runs arbitrary code! That can break things badly even without any bad intent.
I'd strongly suggest considering other solutions. Also, it is unclear why there'd be a need for eval in the first place -- as you only need the regex pattern as user input (not code) you can have that very regex in normal code with a pattern in a variable, which is populated earlier when the user input is supplied. (Note that taking a pattern from the user may lead to trouble as well.)
† A problem if you're into warnings, and we all are.
The following isn't valid Perl code:
$val = ($name =~ in.*,out [)
You want the following:
$val = $name =~ /in.*,out \[/
(The parens weren't harmful, but didn't help either.)
If the pattern is user-supplied, you can use the following:
$val = $name =~ /$pattern/
(No eval EXPR needed!)
Note from the correction that the pattern in the question isn't correct. You can catch such errors using eval BLOCK:
eval { $val = $name =~ /$pattern/ };
die("Bad pattern \"$pattern\" provided: $@") if $@;
A note about user-provided patterns: The above won't let the user execute arbitrary code, but it won't protect you from patterns that would take longer than the lifespan of the universe to complete.
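A minimal sketch of the "return 0 instead of dying" behaviour the question asks for, assuming the pattern arrives as a plain string. The helper name match_or_zero is hypothetical; it compiles the pattern with qr// inside eval BLOCK, so a malformed pattern shows up as a failed match rather than a fatal error.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 on a match, 0 on no match or a bad pattern.
sub match_or_zero {
    my ($name, $pattern) = @_;
    my $re = eval { qr/$pattern/ };   # a malformed pattern dies here
    return 0 unless defined $re;
    return $name =~ $re ? 1 : 0;
}

my $bad  = match_or_zero('abc', 'in.*,out [');          # unmatched '[' in the pattern
my $good = match_or_zero('in x,out [', 'in.*,out \[');  # valid pattern, matches
my $miss = match_or_zero('abc', 'in.*,out \[');         # valid pattern, no match
print "$bad $good $miss\n";
```

This does nothing about patterns that are valid but pathologically slow; that would need a separate safeguard.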

Why does the match operator's "match-only-once" optimization only apply with the "?" delimiter?

From the docs (perldoc -f m)
If ? is the delimiter, then a match-only-once rule applies, described in m?*PATTERN*? below.
The "match-only-once rule" doesn't seem to be defined anywhere, but it seems to be a real optimization:
use Benchmark qw(:all) ;
use constant HAYSTACK => "this is a test string";
my $needle = "test";
cmpthese(-1, {
'questionmark' => sub { if ( HAYSTACK =~ m?$needle?n ) { 1 } },
'backslash' => sub { if ( HAYSTACK =~ m/$needle/n ) { 1 } },
});
With the results,
                   Rate    backslash questionmark
backslash     9267717/s           --         -57%
questionmark 21588328/s         133%           --
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior? Let's take for example the output
perl -E'say "FOOOOOO" =~ m/O/' # returns 1
If it's not even counting the O what does it do after the first match such that it's twice as slow?
The "match-only-once rule" doesn't seem to be defined anywhere, […]
"A match-only-once rule" is a description of the rule — it's a rule saying that m?PATTERN? matches only once — not an official name that you can use to search. The text that you quote is pulled from the perlop manpage, so when it says "described in m?*PATTERN*? below", it's referring to this part of that manpage:
m?PATTERN?msixpodualngc
This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only m?? patterns local to the current package are reset.
while (<>) {
    if (m?^$?) {
        # blank line between header and body
    }
} continue {
    reset if eof; # clear m?? status for next file
}
Another example switched the first "latin1" encoding it finds to "utf8" in a pod file:
s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior?
Even in scalar context, m// or m?? may be called many times between resets, and if so then the two behave differently. (You can see this in the first snippet above. It's also the reason that your benchmarks give different performance results: the version with m?$needle?n only does a regex match the first time the function is called — it just returns 'no match' on all subsequent calls — whereas the version with m/$needle/n does a regex match every time.)
The confusion here is that "once" in "match-only-once" refers to calls of the m?? operator, not to occurrences of the needle inside the haystack. So if m?? is called many times without a reset, only the first call that matches will report a match.
sub foo { return "foo" =~ m?o? };
say foo(); # 1
say foo(); # (empty: no match)
reset();
say foo(); # 1
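To tie this back to the benchmark, here is a small sketch that counts how often each operator reports a match when it is evaluated repeatedly. The sub name count_matches and the loop count are made up for illustration.

```perl
use strict;
use warnings;

my $text = "this is a test string";

# Evaluates each operator five times and counts how often it reports a match.
sub count_matches {
    my ($once, $every) = (0, 0);
    for (1 .. 5) {
        $once++  if $text =~ m?test?;
        $every++ if $text =~ m/test/;
    }
    return "$once/$every";
}

my $first  = count_matches();   # m?? matches on its first evaluation only
my $second = count_matches();   # m?? is still "used up"
reset();                        # re-arm the m?? patterns in this package
my $third  = count_matches();
print "$first $second $third\n";
```

The m// count stays at five per call, while m?? reports a match once and then stays silent until reset() is called, which is why it looks so cheap in a benchmark that runs the same code over and over.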

Why is a Regexp object considered to be "falsy" in Ruby?

Ruby has a universal idea of "truthiness" and "falsiness".
Ruby does have two specific classes for Boolean objects, TrueClass and FalseClass, with singleton instances denoted by the special variables true and false, respectively.
However, truthiness and falsiness are not limited to instances of those two classes, the concept is universal and applies to every single object in Ruby. Every object is either truthy or falsy. The rules are very simple. In particular, only two objects are falsy:
nil, the singleton instance of NilClass and
false, the singleton instance of FalseClass
Every single other object is truthy. This includes even objects that are considered falsy in other programming languages, such as
the Integer 0,
the Float 0.0,
the empty String '',
the empty Array [],
the empty Hash {},
These rules are built into the language and are not user-definable. There is no to_bool implicit conversion or anything similar.
Here is a quote from the ISO Ruby Language Specification:
6.6 Boolean values
An object is classified into either a trueish object or a falseish object.
Only false and nil are falseish objects. false is the only instance of the class FalseClass (see 15.2.6), to which a false-expression evaluates (see 11.5.4.8.3). nil is the only instance of the class NilClass (see 15.2.4), to which a nil-expression evaluates (see 11.5.4.8.2).
Objects other than false and nil are classified into trueish objects. true is the only instance of the class TrueClass (see 15.2.5), to which a true-expression evaluates (see 11.5.4.8.3).
The executable Ruby/Spec seems to agree:
it "considers a non-nil and non-boolean object in expression result as true" do
  if mock('x')
    123
  else
    456
  end.should == 123
end
According to those two sources, I would assume that Regexps are also truthy, but according to my tests, they aren't:
if // then 'Regexps are truthy' else 'Regexps are falsy' end
#=> 'Regexps are falsy'
I tested this on YARV 2.7.0-preview1, TruffleRuby 19.2.0.1, and JRuby 9.2.8.0. All three implementations agree with each other and disagree with the ISO Ruby Language Specification and my interpretation of the Ruby/Spec.
More precisely, Regexp objects that are the result of evaluating Regexp literals are falsy, whereas Regexp objects that are the result of some other expression are truthy:
r = //
if r then 'Regexps are truthy' else 'Regexps are falsy' end
#=> 'Regexps are truthy'
Is this a bug, or desired behavior?
This isn’t a bug. What is happening is Ruby is rewriting the code so that
if /foo/
whatever
end
effectively becomes
if /foo/ =~ $_
whatever
end
If you are running this code in a normal script (and not using the -e option) then you should see a warning:
warning: regex literal in condition
This is probably somewhat confusing most of the time, which is why the warning is given, but it can be useful for one-liners using the -e option. For example, you can print all lines matching a given regexp from a file with
$ ruby -ne 'print if /foo/' filename
(The default argument for print is $_ as well.)
This is the result of (as far as I can tell) an undocumented feature of the ruby language, which is best explained by this spec:
it "matches against $_ (last input) in a conditional if no explicit matchee provided" do
  -> {
    eval <<-EOR
    $_ = nil
    (true if /foo/).should_not == true
    $_ = "foo"
    (true if /foo/).should == true
    EOR
  }.should complain(/regex literal in condition/)
end
You can generally think of $_ as the "last string read by gets".
To make matters even more confusing, $_ (along with $~) is not a global variable; it has local scope.
When a ruby script starts, $_ == nil.
So, the code:
// ? 'Regexps are truthy' : 'Regexps are falsey'
Is being interpreted like:
(// =~ nil) ? 'Regexps are truthy' : 'Regexps are falsey'
...Which returns falsey.
On the other hand, for a non-literal regexp (e.g. r = // or Regexp.new('')), this special interpretation does not apply.
// is truthy, just like every other object in Ruby besides nil and false.
Unless running a ruby script directly on the command line (i.e. with the -e flag), the ruby parser will display a warning against such usage:
warning: regex literal in condition
You could make use of this behaviour in a script, with something like:
puts "Do you want to play again?"
gets
# (user enters e.g. 'Yes' or 'No')
/y/i ? play_again : back_to_menu
...But it would be more normal to assign a local variable to the result of gets and perform the regex check against this value explicitly.
I'm not aware of any use case for performing this check with an empty regex, especially when defined as a literal value. The result you've highlighted would indeed catch most ruby developers off-guard.

value of binding operator expression in perl

I have some doubt about the outcome of a binding operator expression in perl. I mean expression like
string =~ /pattern/
I have done some simple test
$ss="a1b2c3";
say $ss=~/a/; # 1
say $ss=~/[a-z]/g; # abc
@aa=$ss=~/[a-z]/g;say @aa; # abc
$aa=@aa;say $aa; # 3
$aa=$ss=~/[a-z]/g;say $aa; # 1
Note that the comments above show the output of each line.
So here comes the question: what on earth is returned by $ss=~/[a-z]/g? It seems to return an array, judging by code lines 3, 4 and 5. But what about the last line: why does it give 1 instead of 3, the length of the array?
The return value of the match operator depends on the context: in list context it returns all captured matches, in scalar context true/false. The say imposes list context, but in the first example nothing is captured in the regex, so you only get the "success" value, 1.
Next, the behavior of /g modifier also differs across contexts. In list context, with it the string keeps being scanned with the given pattern until all matches are found, and a list with them is returned. These are your second and third examples.
But in scalar context its behavior is a bit specific: with it the search will continue from the position of the last match, the next time round. One typical use is in the loop condition:
while (/(\w+)/g) { ... }
This is a bit of a tokenizer: after the body of the loop runs the next word is found, etc.
Then the last example doesn't really make sense; you are getting the "normal" scalar-context matching success/fail, and /g doesn't do anything -- until you match on $ss the next time:
perl -wE'
$s=shift||q(abc);
for (1..2) { $m = $s=~/(.)/g; say "$m: $1"; }
'
prints the lines "1: a" and then "1: b".
Outside of iterative structures (like while condition) the /g in scalar context is usually an error, pointless at best or a quiet bug.
See "Global matching" under "Using regular expressions" in perlretut for /g.
See regex operators in perlop in general, and about /g as well. A useful tool to explore /g workings is pos.
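A short sketch, using the string from the question, that puts the three behaviours side by side and uses pos to show where the scalar-context match resumes. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $ss = "a1b2c3";

my @letters = $ss =~ /[a-z]/g;        # list context: every match
my $count   = () = $ss =~ /[a-z]/g;   # list assignment in scalar context: number of matches

# Scalar context: one match per evaluation, resuming at pos($ss).
my @ends;
while ($ss =~ /[a-z]/g) {
    push @ends, pos($ss);             # offset just past each match
}

print "@letters | $count | @ends\n";
```

The "() =" idiom in the second assignment is one way to get the 3 that the question expected from its last line.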

How do I use a Moose Type as a regex match expression?

I have the following type in my class file:
has 'cardNumber' => (is => 'ro', isa => 'Int', required => 1);
I am trying to do the following:
foreach $_ (@accountsInfo) {
    if ($_ =~ m/^$self->cardNumber()/) {
        $self->pushing(split(/,/, $_));
        $self->invokeAccount();
    }
}
But I can't get it to test properly. If I manually type in the number I am looking for in the regex slashes it works perfectly. Can you please help me to use the cardNumber Type?
Perl's interpolation rules state that arrays ("@foo") and scalars ("$bar"), as well as (a) lookups of values in hashes ("$baz{bar}") or arrays ("$foo[1]"), and (b) dereferences of the previous cases ("@$foo, $$bar, $baz->{bar}, $foo->[1]") are interpolated into double quoted strings.
Function calls, and by extension method calls, are not interpolated.
You can interpolate arbitrary code into strings by using a trick of dereferencing an anonymous reference. Usually, you want an arrayref:
"foo @{[ expressions; ]} bar"; # interpolating anon arrayref
but scalar refs work as well (they are 1 character longer).
"foo ${\( expressions; )}" # interpolating anon scalar ref
However, you should consider caching the value you want to interpolate in a scalar variable:
my $cardNumber = $self->cardNumber;
for (@accountsInfo) {
    if (/\A\Q$cardNumber\E/) {
        $self->pushing(split /,/);
        $self->invokeAccount();
    }
}
Additional note: I stripped out unnecessary parens and mentions of $_ from that code snippet. Also, I escaped the characters in $cardNumber so that they match literally, and aren't treated as a regex.
Print it to see what you're trying to match. You'll get output like SOMECLASS=HASH(0x6bbb48)->foo(). A solution:
/^@{[$self->cardNumber()]}/
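To see the difference without a Moose dependency, here is a sketch with a hand-rolled stand-in class. The Account package, the card number and the sample records are all made up for illustration; only the interpolation behaviour is the point.

```perl
use strict;
use warnings;

# Stand-in for the Moose class; only the accessor matters here.
package Account {
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub cardNumber { return $_[0]{cardNumber} }
}

my $self = Account->new(cardNumber => 1234);

my $naive  = "^$self->cardNumber()";          # only $self is interpolated
my $inline = "^@{[ $self->cardNumber() ]}";   # the method call runs inside @{[ ]}

# Caching the value in a scalar and quoting it with \Q...\E:
my $cardNumber = $self->cardNumber;
my @hits = grep { /\A\Q$cardNumber\E/ } ('1234,alice', '9912345,bob');

print "$naive\n$inline\n@hits\n";
```

The first printed line shows the stringified object followed by the literal text ->cardNumber(), which is what the regex in the question was really trying to match.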