Scala Regex for less than equal to operator (<=) - regex

I am trying to parse an expression with (<, <=, >=, >). All but <= works just fine. Can someone help what could be the issue.
Code:
object MyTestParser extends RegexParsers {
override def skipWhitespace = true
private val expression: Parser[String] = """[a-zA-Z0-9\.]+""".r
val operation: Parser[Try[Boolean]] =
expression ~ ("<" | "<=" | ">=" | ">") ~ expression ^^ {
case v1 ~ op ~ v2 => for {
a <- Try(v1.toDouble)
b <- Try(v2.toDouble)
} yield op match {
case "<" => a < b
case "<=" => a <= b
case ">" => a > b
case ">=" => a >= b
}
}
}
Test:
"MyTestParser" should {
"successfully parse <= condition" in {
val parser = MyTestParser.parseAll(MyTestParser.operation, "10 <= 20")
val result = parser match {
case MyTestParser.Success(s, _) => s.get
case MyTestParser.Failure(e, _) =>
println(s"Parsing failed with error: $e")
false
case MyTestParser.Error(e, _) =>
println(s"Parsing error: $e")
false
}
result === true
}
"successfully parse >= condition" in {
val result = MyTestParser.parseAll(MyTestParser.operation, "50 >= 20").get
result === scala.util.Success(true)
}
}
Error for <= condition:
Parsing failed with error: string matching regex `[a-zA-Z0-9\.]+' expected but `=' found

You need to change the order of the alternatives so that the longest options could be checked first.
expression ~ ( "<=" | ">=" | ">" | "<") ~ expression ^^ {
If the shortest alternative matches first, others are not considered at all.
Also note that a period does not have to be escaped inside a character class, this will do:
"""[a-zA-Z0-9.]+""".r

Your problem is that "<" is matched by <=, so it moves on to trying the expression. If you change the order so that "<=" comes first, that will be matched instead, and you will get the desired result.

#Prateek: it does not work cause the regex engine works just like a boolean OR. It does not search further if one of the patterns in the or-chain is satisfied at a certain point.
So, when use | between patterns, if two or more patterns have substring in common, you have to place the longest first.
As a general rule: order the patterns starting from the longest to the shortest.
Change the relevant line like this make it works:
// It works as expected with '>= / >' also before for the same reason
expression ~ ("<=" | "<" | ">=" | ">") ~ expression ^^ {
Or you want to follow the general rule:
expression ~ ("<=" | ">=" | "<" | ">") ~ expression ^^ {

Related

Regex to match block of text inside square brackets that can be nested

I'm writing a parser in scala that reads a string composed by repetitions of '+', '-', '<', '>' and '.' characters. The string may also have '[' and ']' characters and inside them there is a repetition of the first group of characters.
I need a Regex that matches everything inside square brackets, the problem is that the brackets can be nested.
I've already tried with this regex: \[.*\] and many others that I've found on SO but none seems to be working.
The regex I'm looking for should work like this:
"[+++.]" matches "+++."
"[++[-]]" should match "++[-]"
edit (added a use case):
"[+++.] [++[-]]" should NOT match "+++.] [++[-]" but 2 matches of "+++." and "++[-]"
That would be pretty tough with a single regex, but with some post-processing you might get a bit closer.
def parse(s :String) :Array[String] =
"\\[(.*)\\]".r.unanchored
.findAllMatchIn(s)
.toArray
.flatMap(_.group(1).split(raw"][^\[\]]+\["))
usage:
parse("+++.]") //res0: Array[String] = Array()
parse("[+++.]") //res1: Array[String] = Array("+++.")
parse("[++[-]]") //res2: Array[String] = Array("++[-]")
parse("[+++.] [++[-]]") //res3: Array[String] = Array("+++.", "++[-]")
parse("[++[-]--] [+]") //res4: Array[String] = Array(++[-]--, +)
After some research I think I may have found the solution, however it is not usable in Scala. What is needed is a recursive regex that matches balanced constructs, in my case:
\[(?:[+-\[\]]|(?R))*\]
and as far as I know these kind are not supported in scala, so I'll just leave this here if someone needs it for other languages.
However I solved my problem by implementing the parser in another way, I just thought that having a regex like that would have been a simpler and smoother solution.
What I was implementing was a brainfuck language interpreter and here is my parser class:
class brainfuck(var pointer: Int, var array: Array[Int]) extends JavaTokenParsers {
def Program = rep(Statement) ^^ { _ => () }
def Statement: Parser[Unit] =
"+" ^^ { _ => array(pointer) = array(pointer) + 1 } |
"-" ^^ { _ => array(pointer) = array(pointer) - 1 } |
"." ^^ { _ => println("elem: " + array(pointer).toChar) } |
"," ^^ { _ => array(pointer) = readChar().toInt } |
">" ^^ { _ => pointer = pointer + 1 } |
"<" ^^ { _ => pointer = pointer - 1 } |
"[" ~> rep(block|squares) <~ "]" ^^ { items => while(array(pointer)!=0) { parseAll(Program,items.mkString) } }
def block =
"""[-+.,<>]""".r ^^ { b => b.toString() }
def squares: Parser[String] = "[" ~> rep(block|squares) <~ "]" ^^ { b => var res = "[" + b.mkString + "]"; res }
}

How can I avoid not-exhaustive-match warnings with Scala RegexParsers

I have something like this:
def numberBrace: Parser[Double] = "{" ~ number ~ "}" ^^ {
case "{" ~ num ~ "}" => num
case _ => 0.0
}
This compiles cleanly, but if I didn't have the "case _" line there I get a "match may not be exhaustive" warning on compile.
I've got a lot of these clause-parsers, so these little extra case _ additions litter my code. Are they really needed here, and if not, how can I avoid them?
This problem goes away if you use RegexParsers' ~> and <~ operators. a ~> b matches but doesn't capture a, then matches but does capture b. b <~ c matches/captures b and matches/doesn't-capture c. So the original question can be accomplished like this:
def numberBrace: Parser[Double] = "{" ~> number <~ "}" ^^ { num => num }
This matches but ignores the braces but captures the number, which is simply returned in the ^^ clause. Much cleaner!

Scala Parser Combinators :Extract List of a type

I am writing parser and grammer using scala.util.parsing.combinator .
My input is ":za >= 1 alok && :ft == 9"
case class Expression(op1:Operand,operator:Operator,op2:Operand)
def word: Parser[String] = """[a-z\\\s]+""".r ^^ { _.toString }
def colonWord:Parser[Operand]=":"~> word ^^{case variable => Operand(variable)}
def constant:Parser[Operand]="""^[a-zA-Z0-9\\\s-]*""".r ^^ {case constant => Operand(constant)}
def expression:Parser[Expression] = colonWord ~ operator ~ constant ^^{ case op1~operator~op2 => Expression(op1, operator, op2)}
def expressions = expression ~ opt(" && " ~> expression)*
but when I parse sample String , the result is not expected. The second expression after && is not parsed. Please note there can be multiple expression joined using &&.
When i execute:
val expr= ":za >= 1 alok && :ft == 9"
parse(expressions, expr) match {
case Success(matched, _) => println(matched)
case ..}
Output is :
List((Expression(za ,>= ,1 alok )~None))
I dont see the second expression being parsed. Can anyone help what have i missed here?
EDIT
-----------------------------------
The requirement is to get List[Expression].
when I say, incorporting the changes mentioned in Ans :
def expressions = expression ~ ("&&" ~> expression)*
The return type of expressions is not List[Expression]. For eg:
If I write another def :
case class Condition(exprs: List[Expression], st:Statement)
def condition = expressions ~","~statement ^^{
case exprs~commaa~statement => Condition(exprs,statement) //This is giving error.
the error is:
type mismatch; found : ~[Expression,Expression]] required: Expressions.
So how do i convert [Expression, Expression] to List[Expressions]?
Thanks
The correction needed was:
expression ~ opt("&&" ~> expression)*
Remove space across && and it should work. This is because you are already covering space in your constant parser.
Edit:
Based on edited question, is this what you want:
def expressions = expression ~ (("&&" ~> expression)*) ^^{
case x ~ y => x :: y
}
Now the return type of expressions is List[Expression]. Your condition will now compile

Scala Regex Parser throws weird error

I have a simple RegexParser that matches {key}={value} repeating for several times:
object CommandOptionsParser extends RegexParsers {
private val key: Parser[String] = "[^= ]+".r
private val value: Parser[String] = "[^ ]*".r
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=".r ~> value).?).? ^^ {
case None => None
case Some(k ~ v) => Some(k.trim -> v.map(_.trim))
}
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^ {
case v =>
Map(v.flatten: _*)
}
def apply(input: String): Map[String, Option[String]] = parseAll(pairs, input) match {
case Success(plan, _) => plan
case x => sys.error(x.toString)
}
}
However the matching of value seems to fail on more than 1 capturing groups (despite that the regex doesn't limit it). when I try to match against "token=abc again=abc", I have the following error:
[1.11] failure: string matching regex `\z' expected but `a' found
token=abc again=abc'
^
Why RegexParser has such strange behaviour?
The fix for your unexpected behavior is quite easy, just change the value of skipWhitespace:
object CommandOptionsParser extends RegexParsers {
override val skipWhitespace = false
From description of RegexParsers:
The parsing methods call the method skipWhitespace (defaults to
true) and, if true, skip any whitespace before each parser is
called.
So, what happened, your first pair was matched, then whiteSpace was skipped and then, as repsep couldn't find another whitespace separator, it just assumed that parsing is over, hence that "\z" expected.
Also, I can't help but note that the whole Parser approach for such simple task seems overcomplicated, simple regexps would suffice.
UPD: Also your parsers can be a bit simpler:
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=" ~> value).?).? ^^ (_.map {case (k ~ v) => k.trim -> v.map(_.trim)})
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^
{ l => Map(l.flatten: _*)}

Accept brackets correctly in bisonc++

I've tried to write a basic syntax checker using bisonc++
The rules are:
expression -> OPEN_BRACKET expression CLOSE_BRACKET
expression -> expression operator expression
operator -> PLUS
operator -> MINUS
If I try to run the compiled code, I get an error at this line:
(a+b)-(c+d)
The first rule is applied, the leftmost and the rightmost brackets are the OPEN_BRACKET and the CLOSE_BRACKET. The remaining expression is: a+b)-(c+d
How is it possible to prevent this behaviour? Is it possible to count the open and closed brackets?
Edit
The expression grammar:
expression:
OPEN_BRACKET expression CLOSE_BRACKET
{
//
}
| operator
{
//
}
| VARIABLE
{
//
}
;
operator:
expression PLUS expression
{
//
}
| expression MINUS expression
{
//
}
;
Edit2
The lexer
CHAR [a-z]
WS [ \t\n]
%%
{CHAR}+ return Parser::VARIABLE;
"+" return Parser::PLUS;
"-" return Parser::MINUS;
"(" return Parser::OPEN_BRACKET;
")" return Parser::CLOSE_BRACKET;
This is not a normal expression grammar. Try the normal one.
expression
: term
| expression '+' term
| expression '-' term
;
term
: factor
| term '*' factor
| term '/' factor
| term '%' factor
;
factor
: primary
| '-' factor // unary minus
| primary '^' factor // exponentiation, right-associative
;
primary
: identifier
| literal
| '(' expression ')'
;
Note also the above method of indenting and aligning, and that you only have to return yytext[0] from the lexer for single special characters: you don't need special token names, and it's more readable without them:
CHAR [a-zA-Z]
DIGIT [0-9]
WHITESPACE [ \t\r\n]
%%
{CHAR}+ { return Parser::VARIABLE; }
{DIGIT}+ { return Parser::LITERAL; }
{WHITESPACE}+ ;
. { return yytext[0]; }
Your operator rule does not look good.
Try experiment with:
expression:
OPEN_BRACKET expression CLOSE_BRACKET
{
//
}
|
expression operator expression
{
//
}
|
VARIABLE
{
//
}
;
operator:
PLUS
{
//
}
|
MINUS
{
//
}
;
As your pseudo code actually suggests...