Can't overload the >> operator in Raku - overloading

I'm trying to overload the >> operator like this:
class A {}
multi sub infix:«>>»(A:D $a, Str() $b) is assoc<non> { dd $a; dd $b }
my $x = A.new;
$x >> 'output.txt';
But I get a compile error at line 5 that says:
Unsupported use of >> to do right shift. In Raku please use: +> or ~>.
What am I missing?

This is a case of Rakudo's compiler being (kind of) too smart for its own good. Because there are different types of shifting operations in Raku and neither use a double arrow, the grammar used by Rakudo has >> set to trigger an alert for people who are used to other languages. I guess no one thought at the time that someone would make a >> operator which makes sense because >> more or less implies there might be a <<, which could wreak all sorts of havoc given it use as a quoting circumfix and a meta operator.
You can see the code the grammar here:
https://github.com/rakudo/rakudo/blob/9d6d8dd7a72aed698e30b6fe4b8eea62642c62c6/src/Perl6/Grammar.nqp#L4104

Related

Regular expression in C++ for mathematical expressions

I have this trouble: I must verify the correctness of many mathematical expressions especially check for consecutive operators + - * /.
For example:
6+(69-9)+3
is ok while
6++8-(52--*3)
no.
I am not using the library <regex> since it is only compatible with C++11.
Is there a alternative method to solve this problem? Thanks.
You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.
So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:
bool check_balanced(const char* expr, char open, char close) {
int parens = 0;
for (const char* p = expr; *p; ++p) {
if (*p == open) ++parens;
else if (*p == close && parens-- == 0) return false;
}
return parens == 0;
}
To get the regular expression, note that mathematical expressions without function calls can be summarized as:
BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
where:
BEFORE is sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).
AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.
BETWEEN is a sub-regex which matches a binary operator.
VALUE is a sub-regex which matches a value.
For example, for ordinary four-operator arithmetic on integers you would have:
BEFORE: [-+(]
AFTER: [)]
BETWEEN: [-+*/]
VALUE: [[:digit:]]+
and putting all that together you might end up with the regex:
^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$
If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.
You can loop over each charin your expression.
If you encounter a + you can check whether it is follow by another +, /, *...
Additionally you can group operators together to prevent code duplication.
int i = 0
while(!EOF) {
switch(expression[i]) {
case '+':
case '*': //Do your syntax checks here
}
i++;
}
Well, in general case, you can't solve this with regex. Arithmethic expressions "language" can't be described with regular grammar. It's context-free grammar. So if what you want is to check correctness of an arbitrary mathemathical expression then you'll have to write a parser.
However, if you only need to make sure that your string doesn't have consecutive +-*/ operators then regex is enough. You can write something like this [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from +-*/ set.
Or something like this ([-+*/]\s*){2,} if you also want to handle situations with spaces like 5+ - * 123
Well, you will have to define some rules if possible. It's not possible to completely parse mathamatical language with Regex, but given some lenience it may work.
The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:
5--3 can be 5-(-3)
So in this case, you have two choices:
Ensure that the input is parenthesized well enough that no two operators meet
If you find something like --, treat it as a special case and investigate it further
If the formulas are in fact in your favor (have well defined parenthesis), then you can just check for repeats. For instance:
--
+-
+*
-+
etc.
If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).
You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:
[+\-\*\\/][+\-\*\\/]
which will work for the following examples (and more):
6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)
An alternative, probably a better one, is just write a parser. it will step by step process the equation to check it's validity. A parser will, if well written, 100% accurate. A Regex approach leaves you to a lot of constraints.
There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.
A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:
Write a grammar for a mathematical expression. (These can be found online)
Tokenize the input into lexemes. (This will be done with a regex, typically).
Match the expressions based on the next lexeme you see.
Recurse based on your grammar
A quick Google search can provide many example recursive-descent parsers written in C++.

When to use equality operator over binding operator

recently I have become friendly with regular expressions and used them to over come a number of tasks very efficiently. As with most perl TIMTOWTDI has clouded my judgement. There are times I can use equality operator or binding operator. However are there times where it is more appropriate to use one over the other?
Firstly the simplified case
my $name = 'Chris';
if ($name eq 'Chris') { print 'What a great name!'; }
if ($name =~/^Chris$/) { print 'Yip sure is a great name; }
So in this case this is the most simplified, where using the equality is less typing, however in this simplified example is there any benefit to one or the other.
In a slightly more complex example
my $name = 'Christopher';
if ($name eq 'Chris' || $name eq 'Christopher') { print 'What a great name!'; }
if ($name =~ /^Chris(?:topher)?$/) { print 'Yip sure is a great name; }
here the binding operator is less typing. However I am not sure of the benefit either may hold over the other.
So is the general rule if you are matching an entire string with a fixed value to use equality operator and if your matching a string with a pattern for example any 5 digits string /\d{5}/ then use binding operator.
Is it inappropriate to use binding operator in the above examples. I appreciate that these examples are just made up and may not reflect a real life problem. However they were the ones i thought of to try to explain my question.
however in this simplified example is there any benefit to one or the other.
Well, they're not equivalent. /^Chris$/ matches Chris and Chris followed by a newline.
If you had used an equivalent pattern (/^Chris\z/), the difference would have been performance. A single string comparison will be faster than a regex match. It's also clearer.
For more complicated comparisons, you generally want to go with what's simpler, clearer, and more maintainable. Address performance (by using profile and running benchmarks) when it becomes an issue.
I would expect slightly (if at all) better performance from the eq operator because the regular expression might require a compilation phase as well as analysis before coming up with its determination.
So in the case:
if ($name eq 'Chris') { print 'What a great name!'; }
if ($name =~/^Chris$/) { print 'Yip sure is a great name; }
... I would expect the first statement to be fastest.
In the second example, however, you have to consider the summed times of the failed cases where you've provided a logical OR:
if ($name eq 'Chris' || $name eq 'Christopher') { print 'What a great name!'; }
if ($name =~ /^Chris(?:topher)?$/) { print 'Yip sure is a great name; }
... here things are less cut and dried. Sure, eq may be faster, but are two eqs faster than a regular expression which doesn't have to backtrack (in this example)? I can't be so sure.
Usually you won't have to consider the performance benefits. So you can't argue one is "better" than the other - I'd usually encourage code clarity in this situation. But it's important to realise that eq is very unforgiving while regular expressions are very flexible - allowing for case-insensitive searches, anchoring to just the beginning, etc. When you do hit some code in which comparison speed is critical then ultimately you'll want to benchmark.
The power of regular expressions is realized in its variability.
When you give a regex engine a template, you "suggest" match outcomes to the engine.
Inernally, its the same C "strncmp()" and such as you would do as in Perl, ie: $str eq "asdf", both are templates.
However, you cannot describe variablilty very well with just a language, thats why regular expression engines exist.
There is an overhead to "eterring" the engine, ie: reset variables, state tracking etc..
But after that, the engine will outperform any combination of language constructs you can
concieve of. Not by a little, but by a huge, huge percentage.

Initiating Short circuit rule in Bison for && and || operations

I'm programming a simple calculator in Bison & Flex , using C/C++ (The logic is done in Bison , and the C/C++ part is responsible for the data structures , e.g. STL and more) .
I have the following problem :
In my calculator the dollar sign $ means i++ and ++i (both prefix and postfix) , e.g. :
int y = 3;
-> $y = 4
-> y$ = 4
When the user hits : int_expression1 && int_expression2 , if int_expression1 is evaluated to 0 (i.e. false) , then I don't wan't bison to evaluate int_expression2 !
For example :
int a = 0 ;
int x = 2 ;
and the user hits : int z = a&&x$ ...
So , the variable a is evaluated to 0 , hence , I don't want to evaluate x , however it still grows by 1 ... here is the code of the bison/c++ :
%union
{
int int_value;
double double_value;
char* string_value;
}
%type <int_value> int_expr
%type <double_value> double_expr
%type <double_value> cmp_expr
int_expr:
| int_expr '&&' int_expr { /* And operation between two integers */
if ($1 == 0)
$$ = 0;
else // calc
$$ = $1 && $3;
}
How can I tell bison to not evaluate the second expression , if the first one was already evaluated to false (i.e. 0) ?
Converting extensive commentary into an answer:
How can I tell Bison to not evaluate the second expression if the first one was already evaluated to false?
It's your code that does the evaluation, not Bison; put the 'blame' where it belongs.
You need to detect that you're dealing with an && rule before the RHS is evaluated. The chances are that you need to insert some code after the && and before the second int_expr that suspends evaluation if the first int_expr evaluates to 0. You'll also need to modify all the other evaluation code to check for and obey a 'do not evaluate' flag.
Alternatively, you have the Bison do the parsing and create a program that you execute when the parse is complete, rather than evaluating as you parse. That is a much bigger set of changes.
Are you sure regarding putting some code before the second int_expr ? I can't seem to find a plausible way to do that. It's a nice trick, but I can't find a way to actually tell Bison not to evaluate the second int_expr, without ruining the entire evaluation.
You have to write your code so that it does not evaluate when it is not supposed to evaluate. The Bison syntax is:
| int_expr '&&' {...code 1...} int_expr {...code 2...}
'Code 1' will check on $1 and arrange to stop evaluating (set a global variable or something similar). 'Code 2' will conditionally evaluate $4 (4 because 'code 1' is now $3). All evaluation code must obey the dictates of 'code 1' — it must not evaluate if 'code 1' says 'do not evaluate'. Or you can do what I suggested and aselle suggested; parse and evaluate separately.
I second aselle's suggestion about The UNIX Programming Environment. There's a whole chapter in there about developing a calculator (they call it hoc for higher-order calculator) which is worth reading. Be aware, though, that the book was published in 1984, and pre-dates the C standard by a good margin. There are no prototypes in the C code, and (by modern standards) it takes a few liberties. I do have hoc6 (the last version of hoc they describe; also versions 1-3) in modern C — contact me if you want it (see my profile).
That's the problem: I can't stop evaluating in the middle of the rule, since I cannot use return (I can, but of no use; it causes the program to exit). | intExpr '&&' { if ($1 == 0) {/* turn off a flag */ } } intExpr { /* code */} After I exit $3 the $4 is being evaluated automatically.
You can stop evaluating in the middle of a rule, but you have to code your expression evaluation code block to take the possibility into account. And when I said 'stop evaluating', I meant 'stop doing the calculations', not 'stop the parser in its tracks'. The parsing must continue; your code that calculates values must only evaluate when evaluation is required, not when no evaluation is required. This might be an (ugh!) global flag, or you may have some other mechanism.
It's probably best to convert your parser into a code generator and execute the code after you've parsed it. This sort of complication is why that is a good strategy.
#JonathanLeffler: You're indeed the king ! This should be an answer !!!
Now it is an answer.
You almost assuredly want to generate some other representation before evaluating in your calculator. A parse tree or ast are classic methods, but a simple stack machine is also popular. There are many great examples of how to do this, but my favorite is
http://www.amazon.com/Unix-Programming-Environment-Prentice-Hall-Software/dp/013937681X
That shows how to take a simple direct evaluation tool like you have made in yacc (old bison) and take it all the way to a programming language that is almost as powerful as BASIC. All in very few pages. It's a very old book but well worth the read.
You can also look at SeExpr http://www.disneyanimation.com/technology/seexpr.html
which is a simple expression language calculator for scalars and 3 vectors. If you look at https://github.com/wdas/SeExpr/blob/master/src/SeExpr/SeExprNode.cpp
on line 313 you will see the && implementation of he eval() function:
void
SeExprAndNode::eval(SeVec3d& result) const
{
// operands and result must be scalar
SeVec3d a, b;
child(0)->eval(a);
if (!a[0]) {
result[0] = 0;
} else {
child(1)->eval(b);
result[0] = (b[0] != 0.0);
}
}
That file contains all objects that represent operations in the parse tree. These objects are generated as the code is parsed (these are the actions in yacc). Hope this helps.

Bison, C++ GLR parsing: how to force shift\reduce conflict?

How can I force the shift\reduce conflict to be resolved by the GLR method?
Suppose I want the parser to resolve the conflict between the right shift operator and two closing angle brackets of template arguments for itself. I make the lexer pass the 2 consecutive ">" symbols, as separate tokens, without merging them into one single ">>" token. Then i put these rules to the grammar:
operator_name:
"operator" ">"
| "operator" ">" ">"
;
I want this to be a shift\reduce conflict. If I have the token declaration for ">" with left associativity, this will not be a conflict. So I have to remove the token precedence\associativity declaration, but this results in many other conflicts that I don't want to solve manually by specifying the contextual precedence for each conflicting rule. So, is there a way to force the shift\reduce conflict while having the token declared?
I believe that using context-dependent precedence on the rules for operator_name will work.
The C++ grammar as specified by the updated standard actually modifies the grammar to accept the >> token as closing two open template declarations. I'd recommend following it to get standard behaviour. For example, you must be careful that "x > > y" is not parsed as "x >> y", and you must also ensure that "foo<bar<2 >> 1>>" is invalid, while "foo<bar<(2 >> 1)>>" is valid.
I worked in Yacc (similar to Bison), with a similar scenario.
Standard grammars are, sometimes, called "parsing directed by syntax".
This case is, sometimes, called something like "parsing directed by semantics".
Example:
...
// shift operator example
if ((x >> 2) == 0)
...
// consecutive template closing tag example
List<String, List<String>> MyList =
...
Lets remember, our mind works like a compiler. The human mind can compile this, but the previous grammars, can't. Mhhh. Lets see how a human mind, would compile this code.
As you already know, the "x" before the consecutive ">" and ">" tokens indicates an expression or lvalue. The mind thinks "two consecutive greater-than symbols, after an expresion, should become a single shift operator token".
And for the "string" token: "two consecutive greater-than symbols, after a type identifier, should become two consecutive template closing tag tokens".
I think this case cannot be handled by the usual operator precedence, shift or reduce, or just grammars, but using ( "hacking" ) some functions provided by the parser itself.
I don't see an error in your example grammar rule. The "operator" symbol avoids confusing the two cases you mention. The parts that should be concern its the grammars where the shift operator its used, and the consecutive template closing tags are used.
operator_expr_example:
lvalue "<<" lvalue |
lvalue ">>" lvalue |
lvalue "&&" lvalue |
;
template_params:
identifier |
template_declaration_example |
array_declaration |
other_type_declaration
;
template_declaration_example:
identifier "<" template_params ">"
;
Cheers.

Processing conditional statements

I'm not looking for an implementation, just pseudo-code, or at least an algorithm to handle this effectively. I need to process statements like these:
(a) # if(a)
(a,b) # if(a || b)
(a+b) # if(a && b)
(a+b,c) # same as ((a+b),c) or if((a&&b) || c)
(a,b+c) # same as (a,(b|c)) or if(a || (b&&c))
So the + operator takes precedence over the , operator. (so my + is like mathematical multiplication with , being mathematical addition, but that is just confusing).
I think a recursive function would be best, so I can handle nested parentheses nice and easy by a recursive call. I'll also take care of error handling once the function returns, so no worries there. The problems I'm having:
I just don't know how to tackle the precedence thing. I could return true as soon as I see a , and the previous value was true. Otherwise, I'll rerun the same routine. A plus would effectively be a bool multiplication (ie true*true=true, true*false=false etc...).
Error detection: I've thought up several schemes to handle the input, but there are a lot of ugly bad things I want to detect and print an error to the user. None of the schemes I thought of handle errors in a unified (read: centralized) place in the code, which would be nice for maintainability and readability:
()
(,...
(+...
(a,,...
(a,+...
(a+,...
(a++...
Detecting these in my "routine" above should take care of bad input. Of course I'll check end-of-input each time I read a token.
Of course I'll have the problem of maybe having o read the full text file if there are unmatched parenthesis, but hey, people should avoid such tension.
EDIT: Ah, yes, I forgot the ! which should also be usable like the classic not operator:
(!a+b,c,!d)
Tiny update for those interested: I had an uninformed wild go at this, and wrote my own implementation from scratch. It may not be pretty enough for the die-hards, so hence this question on codereview.
The shunting-yard algorithm is easily implementable in a relatively short amount of code. It can be used to convert an infix expression like those in your examples into postfix expressions, and evaluation of a postfix expression is Easy-with-a-capital-E (you don't strictly need to complete the infix-to-postfix conversion; you can evaluate the postfix output of the shunting yard directly and just accumulate the result as you go along).
It handles operator precedence, parentheses, and both unary and binary operators (and with a little effort can be modified to handle infix ternary operators, like the conditional operator in many languages).
Write it in yacc (bison) it becomes trivial.
/* Yeacc Code */
%token IDENTIFIER
%token LITERAL
%%
Expression: OrExpression
OrExpression: AndExpression
| OrExpression ',' AndExpression
AndExpression: NotExpression
| AndExpression '+' NotExpression
NotExpression: PrimaryExpression
| '!' NotExpression
PrimaryExpression: Identifier
| Literal
| '(' Expression ')'
Literal: LITERAL
Identifier: IDENTIFIER
%%
There's probably a better (there's definitely a more concise) description of this, but I learned how to do this from this tutorial many years ago:
http://compilers.iecc.com/crenshaw/
It's a very easy read for non-programmers too (like me). You'll need only the first few chapters.