Scala string pattern match regex a star multiple findallin - regex

I want to parse this string: "er1r2r3" with: """(e|w|n|s)(r[1-3])*""".r
val SideR = """(e|w|n|s)""".r
val PieceR = """(r)([1-3])""".r
def parseSidedPieces(str: String): (Char, List[Char]) = {
val side = str(0) match {
case SideR(s) => s
}
val pieces = parsePieces(str.tail)
(side, pieces)
}
def parsePieces(str: String): List[Char] = {
PieceR.findAllIn(str).toList map {
case PieceR(c, n) => n
}
}
But this throws on empty string "" because str(0).
Fix this, regex only.

I don't think this can be fixed 'regexes only' (whatever that is supposed to mean), because the code fails before the first regex is used.
It fails because you call apply(index: Int) on an empty String. So, either you do an isEmpty check before calling str(0) or even parseSidedPieces, or you change the code and match the whole String:
val PieceR = """(r)([1-3])""".r
val CombinedR = "(e|w|n|s)((?:r[1-3])*)".r
def parseSidedPieces(str: String): (Char, List[Char]) = {
str match {
case CombinedR(side, pieces) =>
(side(0), parsePieces(pieces))
case "" =>
// hmm, what kind of tuple would be a good return value here? maybe:
throw new IllegalArgumentException(s"Unexpected input: $str")
case _ =>
// handle unmatched strings however you like, I'd do:
throw new IllegalArgumentException(s"Unexpected input: $str")
}
}
def parsePieces(str: String): List[Char] = {
PieceR.findAllIn(str).toList map {
case PieceR(c, n) => n(0)
}
}
parseSidedPieces("er1r2r3") |-> res0: (Char, List[Char]) = (e,List(1, 2, 3))

Related

Scala - match on regex for non-empty string

In Scala - I need to validate if a given string is non-empty. Following snippet returns true. What is the issue with the regex along with match?
def isEmpty(input: String): String = {
val nonEmptyStringPattern = raw"""(\w+)""".r
input match {
case nonEmptyStringPattern => s"matched $input"
case _ => "n/a"
}
}
However, the same regex works are expected on matches method as below.
def isEmpty(input: String): String = {
val nonEmptyStringPattern = raw"""(\w+)""".r
input match {
case input if nonEmptyStringPattern matches( input) => s"matched $input"
case _ => "n/a" ```.
}
}
Does this mean match cannot have regex instances ?
Just as case x => ... creates a new variable x to match against, it's the same for case nonEmptyStringPattern => .... A new variable is created that shadows the existing nonEmptyStringPattern. And as it's an unencumbered variable, it will match anything and everything.
Also, you've created and compiled a regex pattern but you have to invoke it in order to pattern match against it.
def isEmpty(input: String): String = {
val nonEmptyStringPattern = "\\w+".r
input match {
case nonEmptyStringPattern() => s"matched $input"
case _ => "n/a"
}
}
This now works, except for the fact that not all String characters are \w word characters.
isEmpty("") //res0: String = n/a
isEmpty("abc") //res1: String = matched abc
isEmpty("#$#") //res2: String = n/a

Scala, regex matching ignore unnecessary words

My program is:
val pattern = "[*]prefix_([a-zA-Z]*)_[*]".r
val outputFieldMod = "TRASHprefix_target_TRASH"
var tar =
outputFieldMod match {
case pattern(target) => target
}
println(tar)
Basically, I try to get the "target" and ignore "TRASH" (I used *). But it has some error and I am not sure why..
Simple and straight forward standard library function (unanchored)
Use Unanchored
Solution one
Use unanchored on the pattern to match inside the string ignoring the trash
val pattern = "prefix_([a-zA-Z]*)_".r.unanchored
unanchored will only match the pattern ignoring all the trash (all the other words)
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """foo\((.*)\)""".r.unanchored
pattern: scala.util.matching.UnanchoredRegex = foo\((.*)\)
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res3: String = bar
Solution two
Pad your pattern from both sides with .*. .* matches any char other than a linebreak character.
val pattern = ".*prefix_([a-zA-Z]*)_.*".r
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """.*foo\((.*)\).*""".r
pattern: scala.util.matching.Regex = .*foo\((.*)\).*
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res4: String = bar
This will work, val pattern = ".*prefix_([a-z]+).*".r, but it distinguishes between target and trash via lower/upper-case letters. Whatever determines real target data from trash data will determine the real regex pattern.

Matching or regular expression in scala

I have a regular expression like ".*(A20).*|.*(B30).*|C".
I would like to write a function its returns A20 or B30 based on the match found.
val regx=".*(A20).*|.*(B30).*".r
"helloA20" match { case regx(b,_) => b; case _ => "" } // A20
"helloB30" match { case regx(b,_) => b; case _ => "" } // null
"C" match { case regx(b,_) => b case _ => "" }
It's returning null because I am not considering the second group. In my actual code, I have a lot of group like that. I would like to return the matched string. Please help me to find a solution.
Easy! It should be like this:
val regx="^(.*(B30|A20).*|(C))$".r
Demo: https://regex101.com/r/nA6dQ9/1
Then you get the second value in the array of every group.
That way you only have one group regardless of the many possibilities.
You're close:
def extract(s: String) = s match {
case regx(b, _) if b != null => b
case regx(_, b) if b != null => b
case _ => ""
}
extract("helloA20")
res3: String = A20
extract("helloB30")
res4: String = B30
extract("A30&B30")
res6: String = B30
If you have a lot of groups, it's reasonable to use for comprehension insted of pattern matching. This code will return first match or None:
val letters = ('A' to 'Z').toSeq
val regex = letters.map(_.toString).mkString("(", "|", ")").r
def extract(s: String) = {
for {
m <- regex.findFirstMatchIn(s)
} yield m.group(1)
}
extract("A=B=")
extract("dsfdsBA")
extract("C====b")
extract("a====E")
res0: Option[String] = Some(A)
res1: Option[String] = Some(B)
res2: Option[String] = Some(C)
res3: Option[String] = Some(E)

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will share also what I want to do and it will be nice if you can share your suggestions:
im doing automation for some website, and based on this string I need to do some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
as you can see the second field i need to choose is the name firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit
You can use pattern matching for this.
First create regex:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
then you can use it to pattern match:
# "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
if you don't need all numbers you can for example do this
"1.2.0" match { case pattern(a, _, _) => println(a) }
1
if you want to for example to take just first two numbers you can do
# val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
Can only add to #Lukasz's answer one more variant with the values extration:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
# val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following:
scala> "3.2.1".split(".")
res0: Array[java.lang.String] = Array(string1, string2, string3)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}

How to pattern match using regular expression in Scala?

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:
case Process(word) =>
word.firstLetter match {
case([a-c][A-C]) =>
case _ =>
}
}
But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?
You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.
val Pattern = "([a-cA-C])".r
word.firstLetter match {
case Pattern(c) => c bound to capture group here
case _ =>
}
Since version 2.10, one can use Scala's string interpolation feature:
implicit class RegexOps(sc: StringContext) {
def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}
scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
Even better one can bind regular expression groups:
scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123
scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25
It is also possible to set more detailed binding mechanisms:
scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler
scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20
scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive
scala> "10" match { case r"(\d\d)${d # isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:
object T {
class RegexpExtractor(params: List[String]) {
def unapplySeq(str: String) =
params.headOption flatMap (_.r unapplySeq str)
}
class StartsWithExtractor(params: List[String]) {
def unapply(str: String) =
params.headOption filter (str startsWith _) map (_ => str)
}
class MapExtractor(keys: List[String]) {
def unapplySeq[T](map: Map[String, T]) =
Some(keys.map(map get _))
}
import scala.language.dynamics
class ExtractorParams(params: List[String]) extends Dynamic {
val Map = new MapExtractor(params)
val StartsWith = new StartsWithExtractor(params)
val Regexp = new RegexpExtractor(params)
def selectDynamic(name: String) =
new ExtractorParams(params :+ name)
}
object p extends ExtractorParams(Nil)
Map("firstName" -> "John", "lastName" -> "Doe") match {
case p.firstName.lastName.Map(
Some(p.Jo.StartsWith(fn)),
Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
println(s"Match! $fn ...$lastChar")
case _ => println("nope")
}
}
As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:
word.matches("[a-cA-C].*")
You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:
val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
case Process("b", _) => println("first: 'a', some rest")
case Process(_, rest) => println("some first, rest: " + rest)
// etc.
}
String.matches is the way to do pattern matching in the regex sense.
But as a handy aside, word.firstLetter in real Scala code looks like:
word(0)
Scala treats Strings as a sequence of Char's, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:
"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true
I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
EDIT:
To be clear, the way I would do this is, as others have said:
"Cat".matches("^[a-cA-C].*")
res14: Boolean = true
Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
First we should know that regular expression can separately be used. Here is an example:
import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
case Some(v) => println(v)
case _ =>
} // output: Scala
Second we should notice that combining regular expression with pattern matching would be very powerful. Here is a simple example.
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
case date(year, month, day) => "hello"
} // output: hello
In fact, regular expression itself is already very powerful; the only thing we need to do is to make it more powerful by Scala. Here are more examples in Scala Document: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex
Note that the approach from #AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:
scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*
scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo
scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
And with no .* at the end:
scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)
scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match