How can I avoid not-exhaustive-match warnings with Scala RegexParsers - regex

I have something like this:
def numberBrace: Parser[Double] = "{" ~ number ~ "}" ^^ {
case "{" ~ num ~ "}" => num
case _ => 0.0
}
This compiles cleanly, but if I didn't have the "case _" line there I get a "match may not be exhaustive" warning on compile.
I've got a lot of these clause-parsers, so these little extra case _ additions litter my code. Are they really needed here, and if not, how can I avoid them?

This problem goes away if you use RegexParsers' ~> and <~ operators. a ~> b matches but doesn't capture a, then matches but does capture b. b <~ c matches/captures b and matches/doesn't-capture c. So the original question can be accomplished like this:
def numberBrace: Parser[Double] = "{" ~> number <~ "}" ^^ { num => num }
This matches but ignores the braces but captures the number, which is simply returned in the ^^ clause. Much cleaner!

Related

Regex to match block of text inside square brackets that can be nested

I'm writing a parser in scala that reads a string composed by repetitions of '+', '-', '<', '>' and '.' characters. The string may also have '[' and ']' characters and inside them there is a repetition of the first group of characters.
I need a Regex that matches everything inside square brackets, the problem is that the brackets can be nested.
I've already tried with this regex: \[.*\] and many others that I've found on SO but none seems to be working.
The regex I'm looking for should work like this:
"[+++.]" matches "+++."
"[++[-]]" should match "++[-]"
edit (added a use case):
"[+++.] [++[-]]" should NOT match "+++.] [++[-]" but 2 matches of "+++." and "++[-]"
That would be pretty tough with a single regex, but with some post-processing you might get a bit closer.
def parse(s :String) :Array[String] =
"\\[(.*)\\]".r.unanchored
.findAllMatchIn(s)
.toArray
.flatMap(_.group(1).split(raw"][^\[\]]+\["))
usage:
parse("+++.]") //res0: Array[String] = Array()
parse("[+++.]") //res1: Array[String] = Array("+++.")
parse("[++[-]]") //res2: Array[String] = Array("++[-]")
parse("[+++.] [++[-]]") //res3: Array[String] = Array("+++.", "++[-]")
parse("[++[-]--] [+]") //res4: Array[String] = Array(++[-]--, +)
After some research I think I may have found the solution, however it is not usable in Scala. What is needed is a recursive regex that matches balanced constructs, in my case:
\[(?:[+-\[\]]|(?R))*\]
and as far as I know these kind are not supported in scala, so I'll just leave this here if someone needs it for other languages.
However I solved my problem by implementing the parser in another way, I just thought that having a regex like that would have been a simpler and smoother solution.
What I was implementing was a brainfuck language interpreter and here is my parser class:
class brainfuck(var pointer: Int, var array: Array[Int]) extends JavaTokenParsers {
def Program = rep(Statement) ^^ { _ => () }
def Statement: Parser[Unit] =
"+" ^^ { _ => array(pointer) = array(pointer) + 1 } |
"-" ^^ { _ => array(pointer) = array(pointer) - 1 } |
"." ^^ { _ => println("elem: " + array(pointer).toChar) } |
"," ^^ { _ => array(pointer) = readChar().toInt } |
">" ^^ { _ => pointer = pointer + 1 } |
"<" ^^ { _ => pointer = pointer - 1 } |
"[" ~> rep(block|squares) <~ "]" ^^ { items => while(array(pointer)!=0) { parseAll(Program,items.mkString) } }
def block =
"""[-+.,<>]""".r ^^ { b => b.toString() }
def squares: Parser[String] = "[" ~> rep(block|squares) <~ "]" ^^ { b => var res = "[" + b.mkString + "]"; res }
}

Scala Regex Parser throws weird error

I have a simple RegexParser that matches {key}={value} repeating for several times:
object CommandOptionsParser extends RegexParsers {
private val key: Parser[String] = "[^= ]+".r
private val value: Parser[String] = "[^ ]*".r
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=".r ~> value).?).? ^^ {
case None => None
case Some(k ~ v) => Some(k.trim -> v.map(_.trim))
}
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^ {
case v =>
Map(v.flatten: _*)
}
def apply(input: String): Map[String, Option[String]] = parseAll(pairs, input) match {
case Success(plan, _) => plan
case x => sys.error(x.toString)
}
}
However the matching of value seems to fail on more than 1 capturing groups (despite that the regex doesn't limit it). when I try to match against "token=abc again=abc", I have the following error:
[1.11] failure: string matching regex `\z' expected but `a' found
token=abc again=abc'
^
Why RegexParser has such strange behaviour?
The fix for your unexpected behavior is quite easy, just change the value of skipWhitespace:
object CommandOptionsParser extends RegexParsers {
override val skipWhitespace = false
From description of RegexParsers:
The parsing methods call the method skipWhitespace (defaults to
true) and, if true, skip any whitespace before each parser is
called.
So, what happened, your first pair was matched, then whiteSpace was skipped and then, as repsep couldn't find another whitespace separator, it just assumed that parsing is over, hence that "\z" expected.
Also, I can't help but note that the whole Parser approach for such simple task seems overcomplicated, simple regexps would suffice.
UPD: Also your parsers can be a bit simpler:
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=" ~> value).?).? ^^ (_.map {case (k ~ v) => k.trim -> v.map(_.trim)})
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^
{ l => Map(l.flatten: _*)}

Scala Regex for less than equal to operator (<=)

I am trying to parse an expression with (<, <=, >=, >). All but <= works just fine. Can someone help what could be the issue.
Code:
object MyTestParser extends RegexParsers {
override def skipWhitespace = true
private val expression: Parser[String] = """[a-zA-Z0-9\.]+""".r
val operation: Parser[Try[Boolean]] =
expression ~ ("<" | "<=" | ">=" | ">") ~ expression ^^ {
case v1 ~ op ~ v2 => for {
a <- Try(v1.toDouble)
b <- Try(v2.toDouble)
} yield op match {
case "<" => a < b
case "<=" => a <= b
case ">" => a > b
case ">=" => a >= b
}
}
}
Test:
"MyTestParser" should {
"successfully parse <= condition" in {
val parser = MyTestParser.parseAll(MyTestParser.operation, "10 <= 20")
val result = parser match {
case MyTestParser.Success(s, _) => s.get
case MyTestParser.Failure(e, _) =>
println(s"Parsing failed with error: $e")
false
case MyTestParser.Error(e, _) =>
println(s"Parsing error: $e")
false
}
result === true
}
"successfully parse >= condition" in {
val result = MyTestParser.parseAll(MyTestParser.operation, "50 >= 20").get
result === scala.util.Success(true)
}
}
Error for <= condition:
Parsing failed with error: string matching regex `[a-zA-Z0-9\.]+' expected but `=' found
You need to change the order of the alternatives so that the longest options could be checked first.
expression ~ ( "<=" | ">=" | ">" | "<") ~ expression ^^ {
If the shortest alternative matches first, others are not considered at all.
Also note that a period does not have to be escaped inside a character class, this will do:
"""[a-zA-Z0-9.]+""".r
Your problem is that "<" is matched by <=, so it moves on to trying the expression. If you change the order so that "<=" comes first, that will be matched instead, and you will get the desired result.
#Prateek: it does not work cause the regex engine works just like a boolean OR. It does not search further if one of the patterns in the or-chain is satisfied at a certain point.
So, when use | between patterns, if two or more patterns have substring in common, you have to place the longest first.
As a general rule: order the patterns starting from the longest to the shortest.
Change the relevant line like this make it works:
// It works as expected with '>= / >' also before for the same reason
expression ~ ("<=" | "<" | ">=" | ">") ~ expression ^^ {
Or you want to follow the general rule:
expression ~ ("<=" | ">=" | "<" | ">") ~ expression ^^ {

How to create a parser from Regex in Scala to parse a path?

I am writing a parser in which I am trying to parse a path and do arithmetic calculations. since I cannot use RegexParsers with StandardTokenParsers I am trying to make my own. So I am using the following code for that which I picked a part of it from another discussion:
lexical.delimiters ++= List("+","-","*","/", "^","(",")",",")
import lexical.StringLit
def regexStringLit(r: Regex): Parser[String] =
acceptMatch( "string literal matching regex " + r,{ case StringLit(s) if r.unapplySeq(s).isDefined => s })
def pathIdent: Parser[String] =regexStringLit ("/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.w+))".r)
def value :Parser[Expr] = numericLit ^^ { s => Number(s) }
def variable:Parser[Expr] = pathIdent ^^ { s => Variable(s) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def argument:Parser[Expr] = expr <~ (","?)
def func:Parser[Expr] = ( pathIdent ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
//some other code
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
Then I use args(0) to send my input to the program which is :
"/hdfs://111.33.55.2:8888/folder1/p.a3d+1"
and this is the error I get :
[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected
/hdfs://111.33.55.2:8888/folder1/p.a3d
^
I tried simple path and also I commented the rest of the code and just left the path part there but it seems like the regexStringLit is not working for me. I think I am wrong in syntax part. I don't know!
There are a couple of mistakes in you regex:
/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.w+))
1) There are unnecessary parenthesis (or your forgot a +) - this is not a real mistake but makes it harder to read your regex and fix bugs.
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.w+
2) The last w+ is not escaped:
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.\w+
3) You only allow . but not + for the last part:
/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+
The above expression matches your test case, however, I do suspect, you actually want this expression:
/hdfs://\d+(\.\d+){3}:\d+(/(\w+([-+.*/]\w+)*))+
I solved it writing a trait and using JavaTokenParsers rather than StandardToken Parser.
trait pathIdentifier extends RegexParsers{
def pathIdent: Parser[String] ={
"""hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r
}
}
#Tilo Thanks for your help your solution is working as well but changing extended class to JavaTokenParser helped to solve the problem.

Using parser combinators to collate lines of text

I'm trying to parse a text file using parser combinators. I want to capture the index and text in a class called Example. Here's a test showing the form on an input file:
object Test extends ParsComb with App {
val input = """
0)
blah1
blah2
blah3
1)
blah4
blah5
END
"""
println(parseAll(examples, input))
}
And here's my attempt that doesn't work:
import scala.util.parsing.combinator.RegexParsers
case class Example(index: Int, text: String)
class ParsComb extends RegexParsers {
def examples: Parser[List[Example]] = rep(divider~example) ^^
{_ map {case d ~ e => Example(d,e)}}
def divider: Parser[Int] = "[0-9]+".r <~ ")" ^^ (_.toInt)
def example: Parser[String] = ".*".r <~ (divider | "END")
}
It fails with:
[4.1] failure: `END' expected but `b' found
blah2
^
I'm just starting out with these so I don't have much clue what I'm doing. I think the problem could be with the ".*".r regex not doing multi-line. How can I change this so that it parses correctly?
What does the error message mean?
According to your grammar definition, ".*".r <~ (divider | "END"), you told to the parser that, an example should followed either by a divider or a END. After parsing blah1, the parser tried to find divider and failed, then tried END, failed again, there're no other options available, so the END here was the last alternative of the production value, so from the parser's perspective, it expected END, but it soon found, the next input was blah2 from the 4th line.
How to fix it?
Try to be close to your implementation, the grammar in your case should be:
examples ::= {divider example}
divider ::= Integer")"
example ::= {literal ["END"]}
and I think parsing "example" into List[String] makes more sense, anyway, it's up to you.
The problem is your example parser, it should be a repeatable literal.
So ,
class ParsComb extends RegexParsers {
def examples: Parser[List[Example]] = rep(divider ~ example) ^^ { _ map { case d ~ e => Example(d, e) } }
def divider: Parser[Int] = "[0-9]+".r <~ ")" ^^ (_.toInt)
def example: Parser[List[String]] = rep("[\\w]*(?=[\\r\\n])".r <~ opt("END"))
}
the regex (?=[\\r\\n]) means it's a positive lookahead and would match characters that followed by \r or \n.
the parse result is:
[10.1] parsed: List(Example(0,List(blah1, blah2, blah3)),
Example(1,List(blah4, blah5)))
If you want to parse it into a String(instead of List[String]), just add a transform function for example: ^^ {_ mkString "\n"}
Your parser can't process newline character, your example parser eliminates next divider and your example regex matches divider and "END" string.
Try this:
object ParsComb extends RegexParsers {
def examples: Parser[List[Example]] = rep(divider~example) <~ """END\n?""".r ^^ {_ map {case d ~ e => Example(d,e)}}
def divider: Parser[Int] = "[0-9]+".r <~ ")\n" ^^ (_.toInt)
def example: Parser[String] = rep(str) ^^ {_.mkString}
def str: Parser[String] = """.*\n""".r ^? { case s if simpleLine(s) => s}
val div = """[0-9]+\)\n""".r
def simpleLine(s: String) = s match {
case div() => false
case "END\n" => false
case _ => true
}
def apply(s: String) = parseAll(examples, s)
}
Result:
scala> ParsComb(input)
res3: ParsComb.ParseResult[List[Example]] =
[10.1] parsed: List(Example(0,blah1
blah2
blah3
), Example(1,blah4
blah5
))
I think the problem could be with the ".*".r regex not doing
multi-line.
Exactly. Use the dotall modifier (strangely called "s"):
def example: Parser[String] = "(?s).*".r <~ (divider | "END")