Perl 6: Need help to understand more about proto regex/token/rule

The following code is taken from the Perl 6 documentation, and I am trying to learn more about it before more experimentation:
proto token command {*}
token command:sym<create> { <sym> }
token command:sym<retrieve> { <sym> }
token command:sym<update> { <sym> }
token command:sym<delete> { <sym> }
Is the * in the first line a whatever-star? Can it be something else, such as
proto token command { /give me an apple/ }
Can "sym" be something else, such as
command:eat<apple> { <eat> } ?

{*} tells the runtime to call the correct candidate.
Rather than force you to write {{*}} for the common case of just calling the correct candidate, the compiler allows you to shorten it to just {*}.
That is the case for all proto routines like sub, method, regex, token, and rule.
In the case of the regex proto routines, only a bare {*} is allowed.
The main reason is probably that no one has really come up with a good way to make it work sensibly in the regex sub-language.
So here is an example of a proto sub that does some things that are common to all of the candidates.
#! /usr/bin/env perl6
use v6.c;
for @*ARGS { $_ = '--stdin' when '-' }
# find out the number of bytes
proto sub MAIN (|) {
try {
# {*} calls the correct multi
# then we get the number of elems from its result
# and try to say it
say {*}.elems # <-------------
}
# if {*} returns a Failure note the error message to $*ERR
or note $!.message;
}
#| the number of bytes on the clipboard
multi sub MAIN () {
xclip
}
#| the number of bytes in a file
multi sub MAIN ( Str $filename ){
$filename.IO.slurp(:!chomp,:bin)
}
#| the number of bytes from stdin
multi sub MAIN ( Bool :stdin($)! ){
$*IN.slurp-rest(:bin)
}
sub xclip () {
run( «xclip -o», :out )
.out.slurp-rest( :bin, :close );
}

This answers your second question. Yes, it's late.
You have to distinguish two different syms (or eats). The one that's on the definition of the token as an "adverb" (or extended syntax identifier, whatever you want to call it), and the one that's on the token itself.
If you use <eat> in the token body, Perl 6 will simply not find it. You will get an error like
No such method 'eat' for invocant of type 'Foo'
Where Foo would be the name of the grammar. <sym> is a predefined token, which matches the value of the adverb (or pair value) in the token.
You could, in principle, use the extended syntax to define a multi token (or rule, or regex). However, if you try to define it in this way, you will get a different error:
Can only use <sym> token in a proto regex
So, the answer to your second question is no, and no.


Raku grammar action throwing "Cannot bind attributes in a Nil type object. Did you forget a '.new'?" error when using "make"

I have this method in a class that's throwing a Cannot bind attributes in a Nil type object. Did you forget a '.new'?
method parse() {
grammar FindHeaders {
token TOP { [<not-header> | <header>]+ $ }
token not-header { ^^ <![#]> \N* \n }
token header { ^^ '#'{ 1 .. 6 } <content> \n }
token content { \N+ }
}
class HeaderActions {
method content($match) {
return if $match ~~ m/^^\#\s+<[A..Z]>+e*s/ || $match !~~ m/<[a..z]>/;
return if $match ~~ m/\|/ && ( $match ~~ m:i/project/ || $match ~~ m:i/\+\w/ );
my $tc = Lingua::EN::Titlecase.new($match);
my $new_title = $tc.title;
make($new_title);
}
}
my $t = $!content;
FindHeaders.parse($t, actions => HeaderActions.new);
}
As far as I can tell, this code matches what's in the official documentation. So not sure why I'm getting this error. I have no idea what attribute or Nil object the compiler is referring to. If I comment out the line with the make method, everything works fine.
method content($match) {
There's a reason that action methods typically use $/ as the argument name: because the make function looks for $/ in order to associate the provided object with it. You can use $match, but then you need to call the make method on that instead:
$match.make($new_title);
The mention of Nil is because the failed match earlier in the action method resulted in $/ being set to Nil.
I guess you avoided the more idiomatic $/ as the parameter of the action method because it gets in the way of doing further matching in the action method. Doing further matching in action methods means that the text is being parsed twice (once in the grammar and again in the action), which is not so efficient, and usually best avoided (by moving the parsing work into the grammar).
As a final style point, declaring grammars and action classes in a method is neat encapsulation if they are only used there, but it would be wise to my-scope them (my grammar FindHeaders { ... }), otherwise they will end up installed in the nearest enclosing package anyway.
Err - bit of a guess here but looks like this error is generated during creation of a new object. That points to the line my $tc = Lingua::EN::Titlecase.new($match). I wonder if you want to pass a Str into this function call e.g. with "$match" or ~$match...

Custom validator to ban a specific wordlist

I need a custom validator to ban a specific list of banned words from a textarea field.
I need exactly this type of implementation, I know that it's not logically correct to let the user type part of a query but it's exactly what I need.
I tried with a regExp but it has a strange behaviour.
My RegExp
/(drop|update|truncate|delete|;|alter|insert)+./gi
my Validator
export function forbiddenWordsValidator(sqlRe: RegExp): ValidatorFn {
return (control: AbstractControl): { [key: string]: any } | null => {
const forbidden = sqlRe.test(control.value);
return forbidden ? { forbiddenSql: { value: control.value } } : null;
};
}
my formControl:
whereCondition: new FormControl("", [
Validators.required,
forbiddenWordsValidator(this.BAN_SQL_KEYWORDS)...
It works only in certain cases, and I don't understand why the same string works one time and doesn't work if I delete a character and retype it, or why the validator sometimes returns OK when I type a whitespace character.
There are several issues here:
The global g modifier leads to unexpected alternating results when used with RegExp#test and similar methods, because they advance the regex's lastIndex after a successful match. It must be removed.
The . at the end requires one more character (any character other than a line break) after the keyword, so a keyword at the very end of the input is not matched. It must be removed too.
Use
/drop|update|truncate|delete|;|alter|insert/i
Or, to match the words as whole words use
/\b(?:drop|update|truncate|delete|alter|insert)\b|;/i
This way, insert in insertion and drop in dropout won't get "caught" (=matched).
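The alternating results can be reproduced outside Angular. Below is a minimal TypeScript sketch, not the asker's actual validator: the input string and the helper name containsForbiddenSql are assumptions made for illustration. It runs the original pattern three times against the same input, then shows that the whole-word pattern without g gives stable answers.

```typescript
// The pattern from the question, including the problematic g flag.
const withGlobal = /(drop|update|truncate|delete|;|alter|insert)+./gi;
const input = "drop table"; // illustrative input, not from the question

// With g, test() stores lastIndex on the regex and resumes from there next time.
const firstCall = withGlobal.test(input);  // matches "drop " and advances lastIndex
const secondCall = withGlobal.test(input); // resumes inside "table", finds nothing
const thirdCall = withGlobal.test(input);  // lastIndex was reset to 0, so it matches again

// Hypothetical helper: the whole-word pattern without g keeps no state between calls.
const forbiddenSql = /\b(?:drop|update|truncate|delete|alter|insert)\b|;/i;
function containsForbiddenSql(text: string): boolean {
  return forbiddenSql.test(text);
}

console.log(firstCall, secondCall, thirdCall); // true false true
console.log(containsForbiddenSql("insert into t"), containsForbiddenSql("insertion")); // true false
```

Because the validator factory in the question receives a single RegExp instance and reuses it on every keystroke, any state stored on that instance carries over between validations, which would explain why retyping a character flips the result.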
it's not a great idea to give such power to the user

Does the Perl compiler need to be told not to optimize away function calls with ignored return values?

I am writing a new Perl 5 module, Class::Tiny::ConstrainedAccessor, to check type constraints when you touch object attributes, either by setting or by getting a default value. I am writing the unit tests and want to run the accessors for the latter case. However, I am concerned that Perl may optimize away my accessor-function call since the return value is discarded. Will it? If so, can I tell it not to? Is the corresponding behaviour documented? If the answer is as simple as "don't worry about it," that's good enough, but a reference to the docs would be appreciated :) .
The following MCVE succeeds when I run it on my Perl 5.26.2 x64 Cygwin. However, I don't know if that is guaranteed, or if it just happens to work now and may change someday.
use 5.006; use strict; use warnings; use Test::More; use Test::Exception;
dies_ok { # One I know works
my $obj = Klass->new; # Default value of "attribute" is invalid
diag $obj->accessor; # Dies, because the default is invalid
} 'Bad default dies';
dies_ok {
my $obj = Klass->new;
$obj->accessor; # <<< THE QUESTION --- Will this always run?
} 'Dies even without diag';
done_testing();
{ package Klass;
sub new { my $class = shift; bless {@_}, $class }
sub check { shift; die 'oops' if @_ and $_[0] eq 'bad' }
sub default { 'bad' }
sub accessor {
my $self = shift;
if(@_) { $self->check($_[0]); return $self->{attribute} = $_[0] } # W
elsif(exists $self->{attribute}) { return $self->{attribute} } # R
else {
# Request to read the attribute, but no value is assigned yet.
# Use the default.
$self->check($self->default); # <<<---- What I want to exercise
return $self->{attribute} = $self->default;
}
} #accessor()
} #Klass
This question deals with variables, but not functions. perlperf says that Perl will optimize away various things, but other than ()-prototyped functions, it's not clear to me what.
In JavaScript, I would say void obj.accessor();, and then I would know for sure it would run but the result would be discarded. However, I can't use undef $obj->accessor; for a similar effect; compilation legitimately fails with Can't modify non-lvalue subroutine call of &Klass::accessor.
Perl doesn't ever optimize away sub calls, and sub calls with side effects shouldn't be optimised away in any language.
undef $obj->accessor means something similar to $obj->accessor = undef

How to cache and use the cached regexes in perl6 grammar?

My code spends a lot of time on regex interpolation. As the patterns rarely change, I guess caching these generated regexes should speed up the code. But I cannot figure out a right way to cache and use the cached regexes.
The code is used to parse some arithmetic expressions. As the users are allowed to define new operators, the parser must be ready to add new operators to the grammar. So the parser uses a table to record these new operators and generates regexes from the table on the fly.
#! /usr/bin/env perl6
use v6.c;
# the parser may add new operators to this table on the fly.
my %operator-table = %(
1 => $['"+"', '"-"'],
2 => $['"*"', '"/"'],
# ...
);
# original code, runnable but slow.
grammar Operator {
token operator(Int $level) {
<{%operator-table{$level}.join('|')}>
}
# ...
}
# usage:
say Operator.parse(
'+',
rule => 'operator',
args => \(1)
);
# output:
# 「+」
Here are some experiments:
# try to cache the generated regexes, but it does not work.
grammar CachedOperator {
my %cache-table = %();
method operator(Int $level) {
if (! %cache-table{$level}) {
%cache-table.append(
$level => rx { <{%operator-table{$level}.join('|')}> }
)
}
%cache-table{$level}
}
}
# test:
say CachedOperator.parse(
'+',
rule => 'operator',
args => \(1)
);
# output:
# Nil
# one more try
grammar CachedOperator_ {
my %cache-table = %();
token operator(Int $level) {
<create-operator($level)>
}
method create-operator(Int $level) {
if (! %cache-table{$level}) {
%cache-table.append(
$level => rx { <{%operator-table{$level}.join('|')}> }
)
}
%cache-table{$level}
}
}
# test:
say CachedOperator_.parse(
'+',
rule => 'operator',
args => \(1)
);
# compile error:
# P6opaque: no such attribute '$!pos' on type Match in a Regex when trying to get a value
The following doesn't directly answer your question but may be of interest.
User defined operators
The following code declares an operator in P6:
sub prefix:<op> ($operand) { " $operand prefixed by op" }
Now one can use the new operator:
say op 42; # 42 prefixed by op
A wide range of operator positions and arities are covered, including choice of associativity and precedence, parentheses for grouping, etc. So maybe this is an appropriate way to implement what you're implementing.
Although it's slow, it might be fast enough. Additionally, as Larry said in 2017 ...
we know some some places in the parser that are slower than they should be, for instance ... various lexers relook at various characters in your Perl 6 program, it averages 5 or 6 times on every character, which is obviously deeply sub-optimal, and we know how to fix it
... and with luck Jonathan will work on the P6 grammar parser this year.
DSLs and Slangs
Even if you aren't interested in using the main language's ability to declare user defined operators, or can't for some reason, the underlying mechanisms that make it work might be of interest/use. Here are some references:
Brian Duggan's Informal DSLs presentation (video, slides).
Mouq's 2014 gist Slangs.
Larry Wall's speculation from way back when in Switching parsers and Slangs.

Matching and storing part of a string in a variable using JScript.NET

I am fiddling with a script for Fiddler, which uses JScript.NET. I have a string of the format:
{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...
I want to match and show "key2":"someothervalue" where someothervalue could be any value but the key is static.
Using good old sed and bash I can replace the part I am looking for with:
$ a='{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...'
$ echo $a | sed -r 's/"key2":"[^"]+"/replaced/g'
{"params":{"key1":"somevalue",replaced,"key3":"whatevervalue", ...
Now. Instead of replacing it, I want to extract that part into a variable using JScript.NET. How can that be done?
The most graceful way is to use a JSON parser. My personal preference is to import IE's JSON parser using the htmlfile COM object.
import System;
var str:String = '{"params":{"key1":"foo","key2":"bar","key3":"baz"}}',
htmlfile = new ActiveXObject('htmlfile');
// force htmlfile COM object into IE9 compatibility
htmlfile.IHTMLDocument2_write('<meta http-equiv="x-ua-compatible" content="IE=9" />');
// clone JSON object and methods into familiar syntax
var JSON = htmlfile.parentWindow.JSON,
// deserialize your JSON-formatted string
obj = JSON.parse(str);
// access JSON values as members of a hierarchical object
Console.WriteLine("params.key2 = " + obj.params.key2);
// beautify the JSON
Console.WriteLine(JSON.stringify(obj, null, '\t'));
Compiling, linking, and running results in the following console output:
params.key2 = bar
{
"params": {
"key1": "foo",
"key2": "bar",
"key3": "baz"
}
}
Alternatively, there are also at least a couple of .NET namespaces which provide methods to serialize objects into a JSON string, and to deserialize a JSON string into objects. Can't say I'm a fan, though. The ECMAScript notation of JSON.parse() and JSON.stringify() are certainly a lot easier and profoundly less alien than whatever neckbeard madness is going on at Microsoft.
And while I certainly don't recommend scraping JSON (or any other hierarchical markup if it can be helped) as complicated text, JScript.NET will handle a lot of familiar Javascript methods and objects, including regex objects and regex replacements on strings.
sed syntax:
echo $a | sed -r 's/("key2"):"[^"]*"/\1:"replaced"/g'
JScript.NET syntax:
print(a.replace(/("key2"):"[^"]*"/, '$1:"replaced"'));
JScript.NET, just like JScript and JavaScript, also allows for calling a lambda function for the replacement.
print(
a.replace(
/"(key2)":"([^"]*)"/,
// $0 = full match; $1 = (key2); $2 = ([^"]*)
function($0, $1, $2):String {
var replace:String = $2.toUpperCase();
return '"$1":"' + replace + '"';
}
)
);
... Or to extract the value of key2 using the RegExp object's exec() method:
var extracted:String = /"key2":"([^"]*)"/.exec(a)[1];
print(extracted);
Just be careful with that, though, as retrieving element [1] of the result of exec() will cause an index-out-of-range exception if there is no match. Might either want to if (/"key2":/.test(a)) or add a try...catch. Or better yet, just do what I said earlier and deserialize your JSON into an object.
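Both precautions can be sketched in TypeScript rather than JScript.NET, so treat this as an illustration of the idea and not as code to paste into a Fiddler script. The helper names extractKey2ByRegex and extractKey2ByParse are hypothetical. The first guards against exec() finding no match; the second parses the JSON and only succeeds on a complete document, which the truncated string in the question is not.

```typescript
// Hypothetical helper: returns the value of key2, or null when the pattern is absent.
function extractKey2ByRegex(text: string): string | null {
  const match = /"key2":"([^"]*)"/.exec(text); // exec() yields null when nothing matches
  return match?.[1] ?? null;
}

// Hypothetical helper: same lookup via JSON.parse, for complete JSON documents only.
function extractKey2ByParse(json: string): string | null {
  try {
    const parsed = JSON.parse(json) as { params?: { key2?: unknown } } | null;
    const value = parsed?.params?.key2;
    return typeof value === "string" ? value : null;
  } catch {
    return null; // malformed or truncated JSON
  }
}

const complete = '{"params":{"key1":"foo","key2":"bar","key3":"baz"}}';
const fragment = '{"params":{"key1":"somevalue","key2":"someothervalue",';

console.log(extractKey2ByRegex(complete), extractKey2ByParse(complete)); // bar bar
console.log(extractKey2ByRegex(fragment), extractKey2ByParse(fragment)); // someothervalue null
```

The regex version tolerates a partial payload, while the parser version is stricter but does not depend on how the keys happen to be laid out in the text.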