Scala pattern matching with undefined number of parameters - regex

I am developing a string parser in Scala. I am facing an issue where I do not always need to match the same number of parameters.
To be clearer, my code is as follows:
line match {
case regex(first, second, third, ...) => // sometimes 2 arguments, sometimes more
// do stuff
case _ =>
println("Wrong parsing")
}
As you can see, I need to define my arguments dynamically. Do you have an idea how to achieve this? I tried to use a list, but had no success.
PS : my regex is dynamically generated
UPDATE : thanks to sheunis' answer I found the solution.
line match {
case regex(args @ _*) =>
println(args(0))
println(args(1))
println(args(2))
... // as much as you have
case _ => println("Wrong parsing")
}

case class Regex(args: String*)
val test = Regex("a", "b", "c")
test match {
case Regex(args @ _*) => for (arg <- args) println(arg)
case _ => println("Wrong parsing")
}
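The same idea can be sketched end to end with a regex that is built at runtime. The names VarArgsDemo, wordsRegex and parse are made up for this illustration, and the sketch assumes the generated regex has one capturing group per field.

```scala
import scala.util.matching.Regex

object VarArgsDemo {
  // Hypothetical helper: n word groups separated by whitespace.
  def wordsRegex(n: Int): Regex =
    List.fill(n)("""(\w+)""").mkString("""\s+""").r

  // args @ _* binds all capture groups, however many there are, as a Seq[String].
  def parse(line: String, regex: Regex): List[String] = line match {
    case regex(args @ _*) => args.toList
    case _                => Nil // the whole line did not match the regex
  }
}
```

Calling VarArgsDemo.parse("a b c", VarArgsDemo.wordsRegex(3)) should give back the three fields as a List, which can then be traversed without knowing the group count in advance.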

Related

scala regex : extract from string

I am trying to extract a few values out of a big string and am having a hard time doing so. I have tried a couple of regex patterns, but they always give me no match. They seem to work on the online regex sites, but not in Scala. What I am trying to do is:
Input :
ESSSTOR\Disk&Ven_VendorName&Prod_MO_Might_MS_5.0&Rev_6.01\08765J54U3K4QVR0&0
Extract [output]:
Vendorname
MO_Might_MS_5.0&Rev_6.01
08765J54U3K4QVR0&0
I am trying to extract those three values from the input string, but am unable to do so.
Can someone please help me see what I am doing wrong?
Thanks in advance.
//Input value
val device:String= "ESSSTOR\\Disk&Ven_VendorName&Prod_MO_Might_MS_5.0&Rev_6.01\\08765J54U3K4QVR0&0"
// Regex build for product extraction
val proReg= """.*[Prod_]([^\\\\]*)""".r
// """.*Prod_([^\\\\]*)""".r -- no match as output
// """(?:Prod_)([^\\\\]*)""".r -- no match as output
println("Device: "+device)
// method -1:
device match{
case proReg(prVal) => println(s"$prVal is product val")
case _ => println("no match") }
// method-2 :
val proReg(g1) = "ESSSTOR\\Disk&Ven_VendorName&Prod_MO_Might_MS_5.0&Rev_6.01\\08765J54U3K4QVR0&0"
println(s"group1: $g1 ")
O/P:
Device: ESSSTOR\Disk&Ven_VendorName&Prod_MO_Might_MS_5.0&Rev_6.01\08765J54U3K4QVR0&0
//method-1
no match
// method-2
error
// Regex build for dev serial
val serReg = """(?:Prod_\\S*[\\\\])(.*)""".r
device match {
case serReg(srVal) => println(s"$srVal is product val")
case _ => println("no match")
}
o/p:
no match
// Regex for vendor
val venReg="""(?:Ven_)([^&]*)""".r
device match {
case venReg(vnVal) => println(s"$vnVal is vendor val")
case _ => println("no match")
}
o/p:
no match
See if this gets closer to what you want.
val pttrn = raw"Ven_([^&]+)&Prod_([^&]+)&Rev_6.01\\(.*)".r.unanchored
device match {
case pttrn(ven, prod, rev) =>
s"vendor: $ven\nproduct: $prod\nrevNum: $rev"
case _ => "does not match pattern"
}
explanation
Ven_([^&]+) --> Look for something that begins with Ven_. Capture everything that isn't an ampersand &.
&Prod_([^&]+) --> That should be followed by the Prod_ string. Capture everything that isn't an ampersand &.
&Rev_6.01\\(.*) --> That should be followed by the Rev_ string that ends with a single backslash \. Capture everything that follows.
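If the revision is not always 6.01, a variant that does not hard-code it may be closer to the three values listed in the question. This is only a sketch: it assumes the product field runs up to the next backslash, and DeviceDemo and parts are names invented here.

```scala
object DeviceDemo {
  // Unanchored, so the leading ESSSTOR\Disk& part does not have to be matched.
  // [^\\]+ means "one or more characters that are not a backslash".
  private val Pttrn = """Ven_([^&]+)&Prod_([^\\]+)\\(.*)""".r.unanchored

  def parts(device: String): Option[(String, String, String)] = device match {
    case Pttrn(ven, prod, serial) => Some((ven, prod, serial))
    case _                        => None
  }
}
```

The patterns in the question most likely failed for the reason the unanchored call addresses: a Regex used in a match expression has to match the whole input string, not just a part of it.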

Match case not behaving the same as Regex.findAll

I have a little helper method, which has to normalize some money values. Hence, I wrote some regular expressions, which should detect different ways of representing them. Strangely they only trigger if used with Regex.findAllIn(..), but not if used in a match case statement.
val result = extractAmount("23772.90")
def extractAmount(amountStr: String): BigDecimal = {
val Plain = """^\d+$""".r
val Dot = """^(\d+)\.(\d*)$""".r
val Comma = """^(\d+),(\d*)$""".r
val DotComma = """^(\d+)\.(\d+),(\d*)$""".r
val CommaDot = """^(\d+),(\d+)\.(\d*)$""".r
if (Dot.findAllIn(amountStr).hasNext)
println(Dot.findAllIn(amountStr).next())
amountStr match {
case Plain(value) => new java.math.BigDecimal(value)
case Dot(values) => new BigDecimal(s"${values(0)}.${values(1)}")
case Comma(values) => new BigDecimal(s"${values(0)}.${values(1)}")
case DotComma(values) => new BigDecimal(s"${values(0)}${values(1)}.${values(2)}")
case CommaDot(values) => new BigDecimal(s"${values(0)}${values(1)}.${values(2)}")
case _ => throw new RuntimeException(s"Money amount string -->${amountStr}<-- did not match any pattern.")
}
}
Debugger output hitting Regex.findAllIn(..):
Debugger output not hitting the match case for Dot(values):
Also interesting might be following error message in the debugger:
Using scala version 2.11.8.
I am puzzled, for sure overlooking something obvious. Thankful for a hint.
Instead of doing e.g.
case Dot(values) => new BigDecimal(s"${values(0)}.${values(1)}")
rewrite the usage of your Regex extractors like this:
case Dot(a, b) => new BigDecimal(s"$a.$b")
The number of arguments in each extractor must match the number of groups your regex contains (here: 2). Each argument is just a String holding the content of one group.
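Applied to the whole method, the rule could look roughly like the sketch below. Two things here are assumptions rather than part of the original code: Plain gets a capturing group so that Plain(v) has something to bind, and the ^ and $ anchors are dropped because a Regex in a match expression already has to match the entire string. AmountDemo is a made-up wrapper name.

```scala
object AmountDemo {
  private val Plain    = """(\d+)""".r
  private val Dot      = """(\d+)\.(\d*)""".r
  private val Comma    = """(\d+),(\d*)""".r
  private val DotComma = """(\d+)\.(\d+),(\d*)""".r
  private val CommaDot = """(\d+),(\d+)\.(\d*)""".r

  // One variable per capturing group in each extractor.
  def extractAmount(amountStr: String): BigDecimal = amountStr match {
    case Plain(v)          => BigDecimal(v)
    case Dot(a, b)         => BigDecimal(s"$a.$b")
    case Comma(a, b)       => BigDecimal(s"$a.$b")
    case DotComma(a, b, c) => BigDecimal(s"$a$b.$c")
    case CommaDot(a, b, c) => BigDecimal(s"$a$b.$c")
    case _ =>
      throw new RuntimeException(s"Money amount string -->$amountStr<-- did not match any pattern.")
  }
}
```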

Scala match statements with inline regexes

I am trying (if at all possible) to get a Scala match/case statement to perform an inline regex match for me.
Specifically, I have a method that will run a match, and if the input to the method starts with the string "fizz", then I would like the match statement to select the correct case:
def animalToSound(animal : String) : String = {
animal match {
case "duck" => "quack"
case "lion" => "roar"
case "dog" => "woof"
case matchesFizzRegex(animal) => "heyo!"
case _ => "meow"
}
}
def matchesFizzRegex(animal : String) : ??? = {
val fizzRegex = "fizz*".r
if(fizzRegex.match(animal)) {
???
} else {
???
}
}
So if I call animalToSound("fizzBuzz"), then the desired behavior is:
Does "fizzBuzz" equal "duck"? No. So try the next case.
Does "fizzBuzz" equal "lion"? No. So try the next case.
Does "fizzBuzz" equal "dog"? No. So try the next case.
Does "fizzBuzz" match the fizz regex (any string starting with 'fizz')? Yes, so return "heyo!"
Any ideas how I can get this working properly?
Simple and straightforward:
Use pattern matching with a guard and the matches method of String:
def animalToSound(animal : String) : String = animal match {
case "duck" => "quack"
case "lion" => "roar"
case "dog" => "woof"
case x if x matches "fizz.*" => "heyo!"
case _ => "meow"
}
You can match regex among other cases:
val reg = "fizz.*".r
animal match {
case "duck" => "quack"
case "lion" => "roar"
case "dog" => "woof"
case reg(_*) => "heyo!"
case _ => "meow"
}
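A regex with a capturing group can also sit among the literal cases, which makes the matched remainder available in the case body. A minimal sketch, with SoundDemo and the extra output format invented for illustration:

```scala
object SoundDemo {
  // The group captures whatever follows the "fizz" prefix.
  private val Fizz = "fizz(.*)".r

  def animalToSound(animal: String): String = animal match {
    case "duck"     => "quack"
    case "lion"     => "roar"
    case "dog"      => "woof"
    case Fizz(rest) => s"heyo! ($rest)" // only reached if the whole string matches
    case _          => "meow"
  }
}
```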

Do parenthesized groups work in Scala?

Parentheses in regular expressions don't seem to work in match/case statements. For example, the following code
val pat1 = """ab""".r
val pat2 = """(a)(b)""".r
val pat3 = """((a)(b))""".r
val pat4 = """((a)b)""".r
val pat5 = """(ab)""".r
"ab" match {
case pat1(x) => println("1 " + x)
case pat2(x) => println("2 " + x)
case pat3(x) => println("3 " + x)
case pat4(x) => println("4 " + x)
case pat5(x) => println("5 " + x)
case _ => println("None of the above")
}
prints "5 ab", but I would have expected any of the patterns to match. I'd like to use "(...)?" optional elements, but I can't. Related to this, I can't get (?m) to work. My patterns work okay outside of a match/case expression. Can someone explain to me how Scala handles regular expressions in match/case expressions?
I'm trying to write a tokenizer in Scala
Regex defines unapplySeq, not unapply, which means that you get each group in its own variable. Also, although lower-case matchers may work in some instances (i.e. with parameters), you really should use upper-case. So, what will work is:
val Pat1 = """ab""".r
val Pat2 = """(a)(b)""".r
val Pat3 = """((a)(b))""".r
val Pat4 = """((a)b)""".r
val Pat5 = """(ab)""".r
def no(): Unit = println("No match")
"ab" match { case Pat1() => println("Pat1"); case _ => no() }
"ab" match { case Pat2(x,y) => println("Pat2 "+x+" "+y); case _ => no() }
"ab" match { case Pat3(x,y,z) => println("Pat3 "+x+" "+y+" "+z); case _ => no() }
"ab" match { case Pat4(x,y) => println("Pat4 "+x+" "+y); case _ => no() }
"ab" match { case Pat5(x) => println("Pat5 "+x); case _ => no() }
(You will always get a match.)
If you want all matches, use @ _*
"ab" match { case Pat3(w @ _*) => println(w); case _ => no() }
I'm not sure what you mean by (?a) so I don't know what's wrong with it. Don't confuse (?a) with (?:a) (or with (a?) or with (a)?).
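Since the question mentions optional "(...)?" elements, here is a small sketch of how they behave in an extractor. An optional group still counts as a group, so it needs its own variable, and that variable is null when the group did not take part in the match. GroupDemo and its method names are hypothetical.

```scala
object GroupDemo {
  private val Opt    = """(a)(b)?""".r  // second group is optional
  private val NonCap = """(?:a)(b)""".r // (?:a) groups without capturing

  def optional(s: String): String = s match {
    case Opt(a, b) =>
      // b is null when the optional group did not participate in the match.
      val second = Option(b).getOrElse("-")
      a + "/" + second
    case _ => "no match"
  }

  def nonCapturing(s: String): String = s match {
    case NonCap(b) => b // only one variable: (?:a) is not a capture group
    case _         => "no match"
  }
}
```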
Here's an example of how you can access group(1) of each match:
val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
m => println(m.group(1))
}

How to pattern match using regular expression in Scala?

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:
case Process(word) =>
word.firstLetter match {
case([a-c][A-C]) =>
case _ =>
}
}
But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?
You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.
val Pattern = "([a-cA-C])".r
word.firstLetter match {
case Pattern(c) => // c is bound to the capture group here
case _ =>
}
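Since word.firstLetter is not an actual String method, a runnable version of the idea might use take(1), which gives back a String (empty for an empty word) that the extractor can match. FirstLetterDemo and startsWithAbc are names made up for this sketch.

```scala
object FirstLetterDemo {
  private val Pattern = "([a-cA-C])".r

  // take(1) yields the first character as a String, or "" for an empty word.
  def startsWithAbc(word: String): Boolean = word.take(1) match {
    case Pattern(_) => true
    case _          => false
  }
}
```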
Since version 2.10, one can use Scala's string interpolation feature:
implicit class RegexOps(sc: StringContext) {
def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}
scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
Even better one can bind regular expression groups:
scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123
scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25
It is also possible to set more detailed binding mechanisms:
scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler
scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20
scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive
scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:
object T {
class RegexpExtractor(params: List[String]) {
def unapplySeq(str: String) =
params.headOption flatMap (_.r unapplySeq str)
}
class StartsWithExtractor(params: List[String]) {
def unapply(str: String) =
params.headOption filter (str startsWith _) map (_ => str)
}
class MapExtractor(keys: List[String]) {
def unapplySeq[T](map: Map[String, T]) =
Some(keys.map(map get _))
}
import scala.language.dynamics
class ExtractorParams(params: List[String]) extends Dynamic {
val Map = new MapExtractor(params)
val StartsWith = new StartsWithExtractor(params)
val Regexp = new RegexpExtractor(params)
def selectDynamic(name: String) =
new ExtractorParams(params :+ name)
}
object p extends ExtractorParams(Nil)
Map("firstName" -> "John", "lastName" -> "Doe") match {
case p.firstName.lastName.Map(
Some(p.Jo.StartsWith(fn)),
Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
println(s"Match! $fn ...$lastChar")
case _ => println("nope")
}
}
As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:
word.matches("[a-cA-C].*")
You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:
val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
case Process("b", _) => println("first: 'b', some rest")
case Process(_, rest) => println("some first, rest: " + rest)
// etc.
}
String.matches is the way to do pattern matching in the regex sense.
But as a handy aside, word.firstLetter in real Scala code looks like:
word(0)
Scala treats a String as a sequence of Chars, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:
"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true
I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
EDIT:
To be clear, the way I would do this is, as others have said:
"Cat".matches("^[a-cA-C].*")
res14: Boolean = true
Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
First, note that a regular expression can be used on its own. Here is an example:
import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
case Some(v) => println(v)
case _ =>
} // output: Scala
Second, combining regular expressions with pattern matching is very powerful. Here is a simple example:
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
case date(year, month, day) => "hello"
} // output: hello
In fact, regular expressions are already very powerful on their own; Scala's pattern matching just makes them more convenient to use. More examples are in the Scala documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex
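The bound groups are plain Strings, so they can be converted in the case body. A small sketch along those lines, which also adds a default case so that a non-matching input does not throw a MatchError; DateDemo and parse are hypothetical names:

```scala
object DateDemo {
  private val Date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Returns (year, month, day) as Ints, or None if the input is not yyyy-mm-dd.
  def parse(s: String): Option[(Int, Int, Int)] = s match {
    case Date(year, month, day) => Some((year.toInt, month.toInt, day.toInt))
    case _                      => None
  }
}
```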
Note that the approach from @AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:
scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*
scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo
scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
And with no .* at the end:
scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)
scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match
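When a match anywhere in the string is what you want, Regex offers unanchored, and findFirstIn also searches rather than matching the whole input. A minimal sketch of the difference, with AnchorDemo and its method names invented for illustration:

```scala
object AnchorDemo {
  private val MyRe  = "(foo|bar)".r
  private val Loose = MyRe.unanchored // same pattern, but may match a substring

  // Anchored: the whole string must be exactly "foo" or "bar".
  def whole(s: String): String = s match {
    case MyRe(m) => m
    case _       => "No match"
  }

  // Unanchored: the first occurrence anywhere in the string is enough.
  def anywhere(s: String): String = s match {
    case Loose(m) => m
    case _        => "No match"
  }

  // findFirstIn searches as well and returns the matched text, if any.
  def first(s: String): Option[String] = MyRe.findFirstIn(s)
}
```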