Scala Map[Regex, String] collectFirst error - regex

I am trying to automatically convert a string to a Date based on regex matches. My code so far is as follows:
package be.folks.date
import java.util.Date
import scala.util.matching.Regex
import org.joda.time.format.DateTimeFormat
class StringToDate(underlying: String) {
  val regmap: Map[Regex, String] = Map(
    ("""\d\d-\d\d-\d\d\d\d""".r, "dd-MM-yyyy"),
    ("""\d\d-\w\w\w-\d\d\d\d""".r, "dd-MMM-yyyy")
  )
  def toDate(): Date = {
    DateTimeFormat.forPattern((regmap collectFirst { case (_(underlying), v) => v } get)).parseDateTime(underlying).toDate()
  }
}
object StringToDate {
  implicit def +(s: String) = new StringToDate(s)
}
However, I get a compile error at the "_": ')' expected but '(' found.
How do I correct this?

I'm not sure I understand the syntax you use to apply the Regex. Perhaps, in toDate, you wanted:
regmap collectFirst {
  case (pattern, v) if ((pattern findFirstIn underlying).nonEmpty) => v
}
I also would not use get to extract the string from the Option, because it throws an exception if no regex matches. Since I don't know how you want to handle that case in your code, I can't suggest a specific alternative.
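Below is a minimal sketch of the answer's approach that avoids `get`. It makes two assumptions that are not in the original. First, it uses `java.time` (`LocalDate`, `DateTimeFormatter`) instead of Joda-Time so it needs only the JDK. Second, it keeps the patterns in an ordered `Seq` rather than a `Map`. The names `StringToDateSketch`, `formatFor` and `toLocalDate` are hypothetical. Note that `LocalDate.parse` can still throw for a string that matches a regex but is not a real date, such as "01-Xyz-2020".

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.matching.Regex

object StringToDateSketch {
  // Ordered (pattern, format) pairs; a Seq makes the lookup order explicit
  val formats: Seq[(Regex, String)] = Seq(
    """\d\d-\d\d-\d\d\d\d""".r -> "dd-MM-yyyy",
    """\d\d-\w\w\w-\d\d\d\d""".r -> "dd-MMM-yyyy"
  )

  // First format whose regex matches the WHOLE input
  // (findFirstIn would also accept a match inside a longer string)
  def formatFor(s: String): Option[String] =
    formats.collectFirst { case (re, fmt) if re.pattern.matcher(s).matches() => fmt }

  // No .get: an unrecognised string yields None instead of an exception
  def toLocalDate(s: String): Option[LocalDate] =
    formatFor(s).map { fmt =>
      // Locale.ENGLISH so that month abbreviations like "Feb" parse regardless of the JVM default
      LocalDate.parse(s, DateTimeFormatter.ofPattern(fmt, Locale.ENGLISH))
    }
}
```

The caller then decides what `None` means, for example `StringToDateSketch.toLocalDate(input).getOrElse(defaultDate)`.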


Scala Regex Parser throws weird error

I have a simple RegexParsers-based parser that matches {key}={value} pairs repeated several times:
object CommandOptionsParser extends RegexParsers {
  private val key: Parser[String] = "[^= ]+".r
  private val value: Parser[String] = "[^ ]*".r
  val pair: Parser[Option[(String, Option[String])]] =
    (key ~ ("=".r ~> value).?).? ^^ {
      case None => None
      case Some(k ~ v) => Some(k.trim -> v.map(_.trim))
    }
  val pairs: Parser[Map[String, Option[String]]] = phrase(repsep(pair, whiteSpace)) ^^ {
    case v =>
      Map(v.flatten: _*)
  }
  def apply(input: String): Map[String, Option[String]] = parseAll(pairs, input) match {
    case Success(plan, _) => plan
    case x => sys.error(x.toString)
  }
}
However, matching seems to fail when there is more than one pair, even though the regex doesn't limit it. When I try to match against "token=abc again=abc", I get the following error:
[1.11] failure: string matching regex `\z' expected but `a' found
token=abc again=abc
          ^
Why does RegexParsers behave so strangely?
The fix for this unexpected behavior is easy: just override skipWhitespace:
object CommandOptionsParser extends RegexParsers {
  override val skipWhitespace = false
  // ... key, value, pair, pairs and apply as before
}
From the description of RegexParsers:
The parsing methods call the method skipWhitespace (defaults to
true) and, if true, skip any whitespace before each parser is
called.
So here is what happened. Your first pair was matched, and then the whitespace was skipped automatically, so repsep could not find a whitespace separator before the next pair. It therefore assumed the repetition was over, and phrase expected the end of input, hence the "\z" expected message.
Also, I can't help noting that the whole parser approach seems overcomplicated for such a simple task; plain regexes would suffice.
Update: your parsers can also be a bit simpler:
val pair: Parser[Option[(String, Option[String])]] =
  (key ~ ("=" ~> value).?).? ^^ (_.map { case k ~ v => k.trim -> v.map(_.trim) })
val pairs: Parser[Map[String, Option[String]]] =
  phrase(repsep(pair, whiteSpace)) ^^ { l => Map(l.flatten: _*) }
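To illustrate the remark that plain regexes would suffice, here is a minimal sketch that uses only scala.util.matching.Regex. The object name CommandOptionsRegex is hypothetical. Unlike the parser version, it does not report an error on malformed input; it simply collects whatever key/value matches it finds.

```scala
import scala.util.matching.Regex

object CommandOptionsRegex {
  // key = one or more chars other than '=' and ' '; optional "=value" containing no spaces
  private val Pair: Regex = """([^= ]+)(?:=([^ ]*))?""".r

  def apply(input: String): Map[String, Option[String]] =
    Pair.findAllMatchIn(input).map { m =>
      // group(2) is null when the "=value" part is absent, so wrap it in Option
      m.group(1) -> Option(m.group(2))
    }.toMap
}
```

Because the value pattern excludes spaces, no trimming is needed.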

How to create a parser from Regex in Scala to parse a path?

I am writing a parser that parses a path and performs arithmetic calculations. Since I cannot use RegexParsers with StandardTokenParsers, I am trying to write my own regex-based parser. I am using the following code, part of which I took from another discussion:
lexical.delimiters ++= List("+", "-", "*", "/", "^", "(", ")", ",")
import lexical.StringLit
def regexStringLit(r: Regex): Parser[String] =
  acceptMatch("string literal matching regex " + r, { case StringLit(s) if r.unapplySeq(s).isDefined => s })
def pathIdent: Parser[String] = regexStringLit("/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.w+))".r)
def value: Parser[Expr] = numericLit ^^ { s => Number(s) }
def variable: Parser[Expr] = pathIdent ^^ { s => Variable(s) }
def parens: Parser[Expr] = "(" ~> expr <~ ")"
def argument: Parser[Expr] = expr <~ (","?)
def func: Parser[Expr] = (pathIdent ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
//some other code
def parse(s: String) = {
  val tokens = new lexical.Scanner(s)
  phrase(expr)(tokens)
}
Then I pass my input to the program via args(0):
"/hdfs://111.33.55.2:8888/folder1/p.a3d+1"
and this is the error I get:
[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected
/hdfs://111.33.55.2:8888/folder1/p.a3d
^
I tried a simple path, and I also commented out the rest of the code, leaving only the path part, but regexStringLit still doesn't seem to work. I suspect I have the syntax wrong somewhere, but I don't know where.
There are a couple of mistakes in your regex:
/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.w+))
1) There are unnecessary parentheses (or you forgot a +). This is not a real mistake, but it makes your regex harder to read and debug.
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.w+
2) The last w+ is missing its backslash:
/hdfs://[\d.]+:\d+/[\w/]+/\w+\.\w+
3) You only allow . but not + for the last part:
/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+
The above expression matches your test case; however, I suspect you actually want this expression:
/hdfs://\d+(\.\d+){3}:\d+(/(\w+([-+.*/]\w+)*))+
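The three regexes above can be checked against the question's input with a small sketch. HdfsPathRegex and fullMatch are hypothetical names. fullMatch uses Regex.unapplySeq, as regexStringLit in the question does, so it succeeds only when the entire string matches. This tests the regexes alone, not the StandardTokenParsers lexer.

```scala
import scala.util.matching.Regex

object HdfsPathRegex {
  // The question's pattern: the final "w+" is missing its backslash
  val original: Regex = """/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.w+))""".r
  // After fixes 1) to 3): no extra groups, "\w" escaped, '.' or '+' allowed in the last part
  val fixed: Regex = """/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+""".r
  // The suggested alternative: four dot-separated numbers, a port, then path segments
  val suggested: Regex = """/hdfs://\d+(\.\d+){3}:\d+(/(\w+([-+.*/]\w+)*))+""".r

  // unapplySeq returns Some(groups) only if the WHOLE string matches
  def fullMatch(r: Regex, s: String): Boolean = r.unapplySeq(s).isDefined
}
```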
I solved it by writing a trait and using JavaTokenParsers rather than StandardTokenParsers:
trait pathIdentifier extends RegexParsers {
  def pathIdent: Parser[String] = {
    """hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r
  }
}
@Tilo Thanks for your help. Your solution works as well, but changing the extended class to JavaTokenParsers is what solved the problem.

"updateDet" shouldnt recognize as keyword "update"

With this code:
import scala.util.parsing.combinator.JavaTokenParsers
class TestKeywords extends JavaTokenParsers {
  def keywords: Parser[String] = "update"
  def identifier: Parser[String] = not(keywords) ~> """[a-zA-Z0-9_$#]+""".r
  def script: Parser[Any] = repsep(identifier, ",")
}
object TestKeywordsApp extends TestKeywords with App {
  val cmd = """updateDet,update"""
  parseAll(script, cmd.stripMargin) match {
    case Success(lup, _) => println(lup)
    case x => println(x)
  }
}
I get the error:
[1.1] failure: string matching regex `\z' expected but `u' found
updateDet,update
^
How do I fix it? updateDet shouldn't be recognized as a keyword.
Scala 2.10.2
Word boundaries, perhaps?
– Amit Joki
To expand: you've said that an identifier is not(keywords) followed by some characters. But updateDet isn't that, because it does start with a keyword. Perhaps you should declare that a keyword ends with a word boundary (regex \b)?
– lmm
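The word-boundary idea can be checked with plain scala.util.matching.Regex. This minimal sketch assumes the only keyword is "update" and reuses the question's identifier character class. It uses a negative lookahead instead of \b, because Java's \b treats $ and # as non-word characters even though the identifier pattern allows them. In the question's JavaTokenParsers class, the analogous (untested) change would be to make keywords a regex parser such as """update(?![a-zA-Z0-9_$#])""".r. KeywordCheck, isKeyword and isIdentifier are hypothetical names.

```scala
import scala.util.matching.Regex

object KeywordCheck {
  // "update" counts as a keyword only when NOT followed by another identifier character
  val Keyword: Regex = """update(?![a-zA-Z0-9_$#])""".r
  val Identifier: Regex = """[a-zA-Z0-9_$#]+""".r

  // findPrefixOf only looks for a match starting at the beginning of the string
  def isKeyword(s: String): Boolean = Keyword.findPrefixOf(s).isDefined

  // An identifier is any run of identifier characters that is not a keyword
  def isIdentifier(s: String): Boolean =
    Identifier.pattern.matcher(s).matches() && !isKeyword(s)
}
```

With this definition "updateDet" is an identifier and "update" is not. With a plain \b, "update$x" would be wrongly treated as starting with the keyword.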

How to encode a constraint on the format of String values

A name attribute, as I frequently see it and as I often implement it myself, is simply modeled as a String.
What if the name has to follow a certain syntax, i.e. format? In Java I would probably define a constructor that checks its arguments, something like:
public Name(String str) {
    if (str == null) throw new IllegalArgumentException("Str must not be null.");
    if (!str.matches("name format expressed as regex")) throw new IllegalArgumentException("Str must match 'regex' but was " + str);
    this.str = str;
}
In Scala I came up with the following solution:
import StdDef.Str
import StdDef.Bol
import StdDef.?
import scala.util.parsing.combinator.RegexParsers
final case class Name private (pfx: ?[Str] = None, sfx: Str) {
  override def toString = pfx.mkString + sfx
}
object Name extends RegexParsers {
  implicit def apply(str: Str): Name = parseAll(syntax, str) match {
    case Success(res, _) => Name(res._1, res._2)
    case rej: NoSuccess => sys.error(rej.toString)
  }
  lazy val syntax = (prefix ?) ~! suffix
  lazy val prefix = (("x" | "X") ~! hyph) ^^ { case a ~ b => a + b }
  lazy val suffix = alpha ~! (alpha | digit | hyph *) ^^ { case a ~ b => a + b.mkString }
  lazy val alpha: Parser[Str] = """\p{Alpha}""".r
  lazy val digit: Parser[Str] = """\p{Digit}""".r
  lazy val hyph: Parser[Str] = "-"
  override lazy val skipWhitespace = false
}
My intentions here are:
1) Compose a Name from its natural representation, i.e. a String value.
2) Check at construction time whether that natural representation forms a valid Name.
3) Disallow any construction other than through the factory method apply(str: Str): Name.
4) Make the construction from its natural representation implicit, e.g. val a: Name = "ISBN 978-0-9815316-4-9".
5) Decompose a Name into its parts according to its syntactical elements.
6) Have errors thrown with messages such as:
[1.3] error: string matching regex `\p{Alpha}' expected but end of source found
I would like to know what solutions you come up with.
After giving the topic some more thought, I am currently taking the following approach.
Token.scala:
import scala.util.matching.Regex
abstract class Token {
  val value: Str
}
object Token {
  def apply[A <: Token](ctor: Str => A, syntax: Regex) = (value: Str) => value match {
    case syntax() => ctor(value)
    case _ => sys.error("Value must match '" + syntax + "' but was '" + value + "'.")
  }
}
Tokens.scala:
final case class Group private (val value: Str) extends Token
final case class Name private (val value: Str) extends Token
trait Tokens {
  import foo.{ bar => outer }
  val Group = Token(outer.Group, """(?i)[a-z0-9-]++""".r)
  val Name = Token(outer.Name, """(?i)(?:x-)?+[a-z0-9-]++""".r)
}
Given that you'd be comfortable using a regex in Java, it seems like overkill to then try and solve the same problem with a parser in Scala.
Stick with what you know here, but add a Scala twist to clean up the solution a bit. Regexes in Scala also define extractors, allowing them to be used in a pattern match:
//triple-quote to make escaping easier, the .r makes it a regex
//Note how the value breaks normal naming conventions and starts in uppercase
//This is to avoid backticks when pattern matching
val TestRegex = """xxyyzz""".r
class Name(str: String) {
  str match {
    case null => throw new IllegalArgumentException("Str must not be null")
    case TestRegex() => //do nothing: the whole string matches the regex
    case _ => throw new IllegalArgumentException(
      "Str must match 'regex' but was " + str)
  }
}
Disclaimer: I didn't actually test this code; it may contain typos.
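Here is a runnable sketch of that idea; it is not the answerer's code. The regex Format mirrors the grammar from the question: an optional x-/X- prefix, then a letter followed by letters, digits or hyphens. The extractor is used with parentheses, as Format(), so that it tests the whole string. The names Name, Format and parse are assumptions. There is one difference from the parser: a regex backtracks instead of committing like ~!, so a string such as "x-" is accepted as a name without a prefix rather than rejected.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Hypothetical value class; the constructor is private, so the companion's apply is the only way in
final class Name private (val value: String) {
  override def toString: String = value
}

object Name {
  // Optional "x-"/"X-" prefix, then a letter, then any mix of letters, digits and hyphens
  private val Format: Regex = """(?:[xX]-)?\p{Alpha}[\p{Alpha}\p{Digit}-]*""".r

  def apply(str: String): Name = str match {
    case null     => throw new IllegalArgumentException("Str must not be null")
    case Format() => new Name(str) // Format() succeeds only if the WHOLE string matches
    case _        => throw new IllegalArgumentException(s"Str must match '$Format' but was '$str'")
  }

  // Non-throwing variant for callers that prefer an Either
  def parse(str: String): Either[String, Name] =
    Try(apply(str)).toEither.left.map(_.getMessage)
}
```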

Parametrized Regex for pattern matching

Is it possible to match against a regular expression pattern that is returned from a function? Can I do something like this?
def pattern(prefix: String) = (prefix + "_(\\w+)").r
val x = something match {
case pattern("a")(key) => "AAAA" + key
case pattern("b")(key) => "BBBB" + key
}
I cannot compile the above code. The following console snapshot shows the error I get. What am I doing wrong?
scala> def pattern(prefix: String) = (prefix + "_(\\w+)").r
pattern: (prefix: String)scala.util.matching.Regex
scala> def f(s:String) = s match {
| case pattern("a")(x) => s+x+"AAAAA"
<console>:2: error: '=>' expected but '(' found.
case pattern("a")(x) => s+x+"AAAAA"
^
This syntax is not supported by Scala; you have to declare the extractor (for example, bind it to a val) before you use it. See my earlier question on this topic.
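Here is a minimal sketch of that workaround, assuming the prefixes are known in advance. Each regex returned by pattern is bound to an uppercase val, which can then be used as an extractor in a case clause. PrefixMatch, PatternA and PatternB are hypothetical names. The final catch-all case is an addition so that non-matching input does not throw a MatchError.

```scala
import scala.util.matching.Regex

object PrefixMatch {
  def pattern(prefix: String): Regex = (prefix + "_(\\w+)").r

  // Extractors in patterns must be stable identifiers, so bind each regex to a val first
  private val PatternA: Regex = pattern("a")
  private val PatternB: Regex = pattern("b")

  def f(s: String): String = s match {
    case PatternA(key) => "AAAA" + key
    case PatternB(key) => "BBBB" + key
    case _             => "no match"
  }
}
```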