"updateDet" shouldnt recognize as keyword "update" - regex

With this code
import scala.util.parsing.combinator.JavaTokenParsers
class TestKeywords extends JavaTokenParsers {
def keywords: Parser[String] = "update"
def identifier: Parser[String] = not(keywords) ~> """[a-zA-Z0-9_$#]+""".r
def script: Parser[Any] = repsep(identifier,",")
}
object TestKeywordsApp extends TestKeywords with App {
val cmd = """updateDet,update"""
parseAll(script,
cmd.stripMargin) match {
case Success(lup, _) => println(lup)
case x => println(x)
}
}
i get error
[1.1] failure: string matching regex \z' expected butu' found
updateDet,update
How to fix it? updateDet shouldnt recognize as keyword
scala 2.10.2

word boundaries perhaps
– Amit Joki
To expand, you've said that identifier is not(keywords) followed by some characters. But updateDet isn't that - it does start with a keyword. Perhaps you should declare that a keyword ends with a word boundary (regex \b)?
– lmm

Related

Scala Regex Parser throws weird error

I have a simple RegexParser that matches {key}={value} repeating for several times:
object CommandOptionsParser extends RegexParsers {
private val key: Parser[String] = "[^= ]+".r
private val value: Parser[String] = "[^ ]*".r
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=".r ~> value).?).? ^^ {
case None => None
case Some(k ~ v) => Some(k.trim -> v.map(_.trim))
}
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^ {
case v =>
Map(v.flatten: _*)
}
def apply(input: String): Map[String, Option[String]] = parseAll(pairs, input) match {
case Success(plan, _) => plan
case x => sys.error(x.toString)
}
}
However the matching of value seems to fail on more than 1 capturing groups (despite that the regex doesn't limit it). when I try to match against "token=abc again=abc", I have the following error:
[1.11] failure: string matching regex `\z' expected but `a' found
token=abc again=abc'
^
Why RegexParser has such strange behaviour?
The fix for your unexpected behavior is quite easy, just change the value of skipWhitespace:
object CommandOptionsParser extends RegexParsers {
override val skipWhitespace = false
From description of RegexParsers:
The parsing methods call the method skipWhitespace (defaults to
true) and, if true, skip any whitespace before each parser is
called.
So, what happened, your first pair was matched, then whiteSpace was skipped and then, as repsep couldn't find another whitespace separator, it just assumed that parsing is over, hence that "\z" expected.
Also, I can't help but note that the whole Parser approach for such simple task seems overcomplicated, simple regexps would suffice.
UPD: Also your parsers can be a bit simpler:
val pair: Parser[Option[(String, Option[String])]] =
(key ~ ("=" ~> value).?).? ^^ (_.map {case (k ~ v) => k.trim -> v.map(_.trim)})
val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^
{ l => Map(l.flatten: _*)}

How to create a parser from Regex in Scala to parse a path?

I am writing a parser in which I am trying to parse a path and do arithmetic calculations. since I cannot use RegexParsers with StandardTokenParsers I am trying to make my own. So I am using the following code for that which I picked a part of it from another discussion:
lexical.delimiters ++= List("+","-","*","/", "^","(",")",",")
import lexical.StringLit
def regexStringLit(r: Regex): Parser[String] =
acceptMatch( "string literal matching regex " + r,{ case StringLit(s) if r.unapplySeq(s).isDefined => s })
def pathIdent: Parser[String] =regexStringLit ("/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.w+))".r)
def value :Parser[Expr] = numericLit ^^ { s => Number(s) }
def variable:Parser[Expr] = pathIdent ^^ { s => Variable(s) }
def parens:Parser[Expr] = "(" ~> expr <~ ")"
def argument:Parser[Expr] = expr <~ (","?)
def func:Parser[Expr] = ( pathIdent ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
//some other code
def parse(s:String) = {
val tokens = new lexical.Scanner(s)
phrase(expr)(tokens)
}
Then I use args(0) to send my input to the program which is :
"/hdfs://111.33.55.2:8888/folder1/p.a3d+1"
and this is the error I get :
[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected
/hdfs://111.33.55.2:8888/folder1/p.a3d
^
I tried simple path and also I commented the rest of the code and just left the path part there but it seems like the regexStringLit is not working for me. I think I am wrong in syntax part. I don't know!
There are a couple of mistakes in you regex:
/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.w+))
1) There are unnecessary parenthesis (or your forgot a +) - this is not a real mistake but makes it harder to read your regex and fix bugs.
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.w+
2) The last w+ is not escaped:
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.\w+
3) You only allow . but not + for the last part:
/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+
The above expression matches your test case, however, I do suspect, you actually want this expression:
/hdfs://\d+(\.\d+){3}:\d+(/(\w+([-+.*/]\w+)*))+
I solved it writing a trait and using JavaTokenParsers rather than StandardToken Parser.
trait pathIdentifier extends RegexParsers{
def pathIdent: Parser[String] ={
"""hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r
}
}
#Tilo Thanks for your help your solution is working as well but changing extended class to JavaTokenParser helped to solve the problem.

non-greedy matching in Scala RegexParsers

Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:
class Arith extends RegexParsers {
def selectstatement: Parser[Any] = selectclause ~ fromclause
def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
def tokens: Parser[Any] = rep(token) //how to make this non-greedy?
def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
}
When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause from gobbling up the entire phrase due to the rep(token) in ~ tokens?
In other words, how do I specify non-greedy matching in Scala?
To clarify, I'm fully aware that I can use standard non-greedy syntax (*?) or (+?) within the String pattern itself, but I wondered if there's a way to specify it at a higher level inside def tokens. For example, if I had defined token like this:
def token: Parser[Any] = stringliteral | numericliteral | columnname
Then how can I specify non-greedy matching for the rep(token) inside def tokens?
Not easily, because a successful match is not retried. Consider, for example:
object X extends RegexParsers {
def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}
scala> X.parseAll(X.p, "aaaab")
res1: X.ParseResult[X.~[String,String]] =
[1.2] failure: `ab' expected but `a' found
aaaab
^
The first match was successful, in parser inside parenthesis, so it proceeded to the next one. That one failed, so p failed. If p was part of alternative matches, the alternative would be tried, so the trick is to produce something that can handle that sort of thing.
Let's say we have this:
def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
terminal(in) match {
case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
case _ =>
rep(in) match {
case Success(x, rest) => recurse(rest, x :: elems)
case ns: NoSuccess => ns
}
}
recurse(in, Nil)
}
You can then use it like this:
def p = nonGreedy("a", "ab")
By the way,I always found that looking at how other things are defined is helpful in trying to come up with stuff like nonGreedy above. In particular, look at how rep1 is defined, and how it was changed to avoid re-evaluating its repetition parameter -- the same thing would probably be useful on nonGreedy.
Here's a full solution, with a little change to avoid consuming the "terminal".
trait NonGreedy extends Parsers {
def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
terminal(in) match {
case _: Success[_] => Success(elems.reverse, in)
case _ =>
rep(in) match {
case Success(x, rest) => recurse(rest, x :: elems)
case ns: NoSuccess => ns
}
}
recurse(in, Nil)
}
}
class Arith extends RegexParsers with NonGreedy {
// Just to avoid recompiling the pattern each time
val select: Parser[String] = "(?i)SELECT".r
val from: Parser[String] = "(?i)FROM".r
val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
val eof: Parser[String] = """\z""".r
def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
def selectclause(terminal: Parser[Any]): Parser[Any] =
select ~ tokens(terminal)
def fromclause(terminal: Parser[Any]): Parser[Any] =
from ~ tokens(terminal)
def tokens(terminal: Parser[Any]): Parser[Any] =
nonGreedy(token, terminal)
}

Parametrized Regex for pattern matching

Is it possible to match regular expression pattern which is returned from a function? Can I do something like this?
def pattern(prefix: String) = (prefix + "_(\\w+)").r
val x = something match {
case pattern("a")(key) => "AAAA" + key
case pattern("b")(key) => "BBBB" + key
}
I cannot compile the above code. The following console snapshot shows an error I get. What am I doing wrong?
scala> def pattern(prefix: String) = (prefix + "_(\\w+)").r
pattern: (prefix: String)scala.util.matching.Regex
scala> def f(s:String) = s match {
| case pattern("a")(x) => s+x+"AAAAA"
<console>:2: error: '=>' expected but '(' found.
case pattern("a")(x) => s+x+"AAAAA"
^
This syntax is not supported by scala, you have to declare the extractor before you use it. See my earlier question on this topic.

scala dsl parser: rep, opt and regexps

Learning how to use scala DSL:s and quite a few examples work nice.
However I get stuck on a very simple thing:
I'm parsing a language, which has '--' as comment until end of line.
A single line works fine using:
def comment: Parser[Comment] = """--.*$""".r ^^ { case c => Comment(c) }
But when connecting multiple lines I get an error.
I've tried several varaints, but the following feel simple:
def commentblock: Parser[List[Comment]] = opt(rep(comment)) ^^ {
case Some(x) => { x }
case None => { List() }
}
When running a test with two consecutive commentlines I get an error.
Testcase:
--Test Comment
--Test Line 2
Error:
java.lang.AssertionError: Parse error: [1.1] failure: string matching regex `--.*$' expected but `-' found
Any ideas on how I should fix this?
Complete code below:
import scala.util.parsing.combinator._
abstract class A
case class Comment(comment:String) extends A
object TstParser extends JavaTokenParsers {
override def skipWhitespace = true;
def comment: Parser[Comment] = """--.*$""".r ^^ { case c => Comment(c) }
def commentblock: Parser[List[Comment]] = opt(rep(comment)) ^^ {
case Some(x) => { x }
case None => { List() }
}
def parse(text : String) = {
parseAll(commentblock, text)
}
}
class TestParser {
import org.junit._, Assert._
#Test def testComment() = {
val y = Asn1Parser.parseAll(Asn1Parser.comment, "--Test Comment")
assertTrue("Parse error: " + y, y.successful)
val y2 = Asn1Parser.parseAll(Asn1Parser.commentblock,
"""--Test Comment
--Test Line 2
""")
assertTrue("Parse error: " + y2, y2.successful)
}
}
Not familiar with Scala, but in Java, the regex --.*$ matches:
-- two hyphens;
.* followed by zero or more characters other than line breaks;
$ followed by the end of the input (not necessarily the end of the line!).
So you could try:
def comment: Parser[Comment] = """--.*""".r ^^ { case c => Comment(c) }
or even:
def comment: Parser[Comment] = """--[^\r\n]*""".r ^^ { case c => Comment(c) }
Note that in both cases, the line break is left in place and not "consumed" by your comment "rule".