Multi capturing groups in scala regex - regex

I'm trying to go from
val s: String = "sometextHere[a][b][c]"
to
val x = "sometextHere"
val y = List("a", "b", "c")
The number of "[...]" is 1+.
I've got something pretty hacky but I feel like there must be a better solution
val bracketMatcher = "\\[(\\w+)\\]".r
val listMatcher = s"^(\\w+)((?:$bracketMatcher)+)".r
listMatcher.findAllIn(s) match {
case matchIterator if matchIterator.hasNext =>
val matchData = matchIterator.matchData.next()
val indexesMatch = bracketMatcher.findAllIn(matchData.group(2)).matchData.flatMap(_.subgroups).toList
val a = matchData.group(1) // This is "sometextHere"
val b = indexesMatch // This is List("a", "b", "c")
case _ => ...
}

Regexes are easier to write in triple quotes. Also, you don't have to match the entire thing at once:
def allMatches(s: String): (String, List[String]) = {
val bracketMatcher = """\[(\w+)\]""".r
val startMatcher = """^(\w+)\[""".r
val first = startMatcher.findFirstMatchIn(s).get.group(1)
val matches = bracketMatcher.findAllMatchIn(s)
val indexes = matches.map(_.group(1)).toList
(first, indexes)
}
allMatches("sometextHere[a][b][c]")
Robert gave a good warning, though. Make sure your input data has no nesting, or you won't be able to handle it with regular expressions. If you have nesting, you'll have to use a proper parser.
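If you would rather get None than an exception when the input does not have the expected shape, the same two-step idea can be wrapped in a pattern match. This is only a sketch: the names BracketParser, Whole, Bracket and parse are made up for illustration and are not from the answer above.

```scala
object BracketParser {
  // Hypothetical names. Whole checks the overall shape; Bracket pulls out each index.
  private val Whole   = """(\w+)((?:\[\w+\])+)""".r
  private val Bracket = """\[(\w+)\]""".r

  // A Regex used as a pattern must match the entire input,
  // so malformed or nested input falls through to None.
  def parse(s: String): Option[(String, List[String])] = s match {
    case Whole(prefix, brackets) =>
      Some((prefix, Bracket.findAllMatchIn(brackets).map(_.group(1)).toList))
    case _ => None
  }
}
```

With this, BracketParser.parse("sometextHere[a][b][c]") gives back the prefix and the three indexes, while input without brackets or with nested brackets yields None instead of throwing.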

Related

Scala string pattern match regex a star multiple findallin

I want to parse this string: "er1r2r3" with: """(e|w|n|s)(r[1-3])*""".r
val SideR = """(e|w|n|s)""".r
val PieceR = """(r)([1-3])""".r
def parseSidedPieces(str: String): (Char, List[Char]) = {
val side = str(0) match {
case SideR(s) => s
}
val pieces = parsePieces(str.tail)
(side, pieces)
}
def parsePieces(str: String): List[Char] = {
PieceR.findAllIn(str).toList map {
case PieceR(c, n) => n
}
}
But this throws on the empty string "" because of str(0).
How can I fix this using regexes only?
I don't think this can be fixed 'regexes only' (whatever that is supposed to mean), because the code fails before the first regex is used.
It fails because you call apply(index: Int) on an empty String. So, either you do an isEmpty check before calling str(0) or even parseSidedPieces, or you change the code and match the whole String:
val PieceR = """(r)([1-3])""".r
val CombinedR = "(e|w|n|s)((?:r[1-3])*)".r
def parseSidedPieces(str: String): (Char, List[Char]) = {
str match {
case CombinedR(side, pieces) =>
(side(0), parsePieces(pieces))
case "" =>
// hmm, what kind of tuple would be a good return value here? maybe:
throw new IllegalArgumentException(s"Unexpected input: $str")
case _ =>
// handle unmatched strings however you like, I'd do:
throw new IllegalArgumentException(s"Unexpected input: $str")
}
}
def parsePieces(str: String): List[Char] = {
PieceR.findAllIn(str).toList map {
case PieceR(c, n) => n(0)
}
}
parseSidedPieces("er1r2r3") |-> res0: (Char, List[Char]) = (e,List(1, 2, 3))
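If throwing is not what you want for bad input, one alternative is to return an Option. This is a sketch under that assumption: the object name SidedPieces and the method name parse are hypothetical, and the regexes are slight simplifications of the ones above.

```scala
object SidedPieces {
  // Same idea as CombinedR/PieceR above: one side letter, then zero or more r1..r3 pieces.
  private val CombinedR = """([ewns])((?:r[1-3])*)""".r
  private val PieceR    = """r([1-3])""".r

  // None for the empty string and for anything else that does not match as a whole.
  def parse(str: String): Option[(Char, List[Char])] = str match {
    case CombinedR(side, pieces) =>
      Some((side.head, PieceR.findAllMatchIn(pieces).map(_.group(1).head).toList))
    case _ => None
  }
}
```

Here the empty string simply fails to match CombinedR, so no separate isEmpty check is needed.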

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will also share what I want to do, and it would be nice if you could share your suggestions:
I'm doing automation for a website, and based on this string I need to perform some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
As you can see, the second field I need to choose is named firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit.
You can use pattern matching for this.
First, create a regex:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
Then you can use it to pattern match:
@ "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
If you don't need all the numbers, you can ignore some of them:
@ "1.2.0" match { case pattern(a, _, _) => println(a) }
1
If you want to take just the first two numbers, you can do:
@ val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
I can only add one more variant to @Lukasz's answer, with value extraction:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
@ val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
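The escaping in split("\\.") matters because String.split(String) treats its argument as a regular expression, and an unescaped "." matches any character. A small sketch of the difference (the object name SplitDemo is made up for illustration):

```scala
object SplitDemo {
  // "." as a regex matches every character, so all pieces are empty and get dropped.
  val unescaped: Array[String] = "3.2.1".split(".")

  // Escaping the dot splits on the literal '.' character.
  val escaped: Array[String] = "3.2.1".split("\\.")

  // Scala also offers a Char overload, which involves no regex at all.
  val byChar: Array[String] = "3.2.1".split('.')
}
```

The Char overload is usually the least surprising choice when the delimiter is a single character.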
Try the following (note that the dot must be escaped, because split takes a regex):
scala> "3.2.1".split("\\.")
res0: Array[String] = Array(3, 2, 1)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}

How does regex capturing work in scala?

Here is an example:
object RegexTest {
def main (args: Array[String]): Unit = {
val input = "Enjoy this apple 3.14 times"
val pattern = """.* apple ([\d.]+) times""".r
val pattern(amountText) = input
val amount = amountText.toDouble
println(amount)
}
}
I understand what this does, but how does val pattern(amountText) = input actually work? It looks very weird to me.
What that line is doing is calling Regex.unapplySeq (which is also called an extractor) to deconstruct input into a list of captured groups, and then bind each group to a new variable. In this particular scenario, only one group is expected to be captured and bound to the value amountText.
Validation aside, this is kinda what's going on behind the scenes:
// unapplySeq returns an Option[List[String]]; .get stands in for the match check
val capturedGroups = pattern.unapplySeq(input).get
val amountText = capturedGroups(0)
// And this:
val pattern(a, b, c) = input
// Would be equivalent to this:
val capturedGroups = pattern.unapplySeq(input).get
val a = capturedGroups(0)
val b = capturedGroups(1)
val c = capturedGroups(2)
It is very similar in essence to extracting tuples:
val (a, b) = (2, 3)
Or even pattern matching:
(2,3) match {
case (a, b) =>
}
In both of these cases, Tuple2.unapply is being called.
I suggest you have a look at this page: http://docs.scala-lang.org/tutorials/tour/extractor-objects.html. It is the official tutorial on extractors, which is the pattern you are looking for.
I find that looking at the source makes it clear how it works : https://github.com/scala/scala/blob/2.11.x/src/library/scala/util/matching/Regex.scala#L243
Note that your code val pattern(amountText) = input works perfectly well, but only when the input is guaranteed to match the regex; if it does not match, a scala.MatchError is thrown.
Otherwise, I recommend writing it this way:
input match {
case pattern(amountText) => ...
case _ => ...
}
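To make the difference concrete, here is a small sketch that calls unapplySeq directly and also wraps the safer match form. The object name ExtractorDemo and the method names groups and amount are hypothetical; the regex is the one from the question.

```scala
import scala.util.Try

object ExtractorDemo {
  private val pattern = """.* apple ([\d.]+) times""".r

  // unapplySeq yields Some(list of captured groups) when the whole input matches, None otherwise.
  def groups(input: String): Option[List[String]] = pattern.unapplySeq(input)

  // The match form never throws: unmatched input, or text that is not a valid Double, gives None.
  def amount(input: String): Option[Double] = input match {
    case pattern(amountText) => Try(amountText.toDouble).toOption
    case _                   => None
  }
}
```

Try is used here because [\d.]+ also accepts text such as "3.1.4", which matches the regex but cannot be parsed as a Double.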

Scala: extracting part of a Strings using Regular Expressions

I have a very simple string like this one:
"Some(1234)"
I'd like to extract "1234" out from it. How can I do it?
val s = "Some(1234)"
//s: java.lang.String = Some(1234)
val Pattern = """Some\((\d+)\)""".r
//Pattern: scala.util.matching.Regex = Some\((\d+)\)
val Pattern(number) = s
//number: String = 1234
Switch out your regex for whatever you need. \d+ limits it to digits only.
scala> val s = "Some(1234)"
s: String = Some(1234)
scala> val nums = "[0-9]".r
nums: scala.util.matching.Regex = [0-9]
scala> nums.findAllIn(s).mkString
res0: String = 1234
Starting with Scala 2.13, it's possible to pattern match a String by unapplying a string interpolator:
val s"Some($number)" = "Some(1234)"
// number: String = 1234
Also note that if the idea is to extract an Option[Int] from its toString version, you can use the interpolation extraction with a match statement:
x match { case s"Some($number)" => number.toIntOption case _ => None }
// x = "Some(1234)" => Option[Int] = Some(1234)
// x = "Some(1S3R)" => Option[Int] = None
// x = "None" => Option[Int] = None
Just another way, playing with the regex. This one limits the match to 4 digits.
def getnumber(args: Array[String]): Unit = {
val str = "Some(1234)"
val nums = "\\d{4}".r
println (nums.findAllIn(str).mkString)
}

Tuples from string by regular expressions in Scala

I have string like {param1=foo}{param2=bar}hello world!
I need to extract an array of tuples (paramName, value) from this string and get something like [(param1, foo), (param2, bar)]
Is it possible in Scala to extract these tuples with only one regex? So far I have only managed to do it like this:
val str = "{param1=foo}{param2=bar}hello world!"
val param = """(?<=\{)(.+?)(?=\})""".r // extract everything between { and }
val keyValue = """(.+)=(.+)""".r // for extracting key and value
val parameters = for (keyValue(key,value) <- param.findAllIn(str).toArray)
yield (key,value)
That doesn't look very elegant.
I also tried
val param = """(?<=\{)(.+?)=(.+?)(?=\})""".r
but it returns param=value as one string.
Here's an expression that will find things like {A=B} where A and B do not contain {, }, or =.
scala> val Re = """\{([^{}=]+)=([^{}=]+)\}""".r
scala> val Re(a,b) = "{param1=foo}"
a: String = param1
b: String = foo
And if you want to find all matches in a string:
scala> val s = "{param1=foo}{param2=bar}hello world!"
scala> Re.findAllIn(s).matchData.map(_.subgroups).toList
res9: List[List[String]] = List(List(param1, foo), List(param2, bar))
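If you want actual tuples rather than lists of subgroups, you can map over findAllMatchIn with the same regex. A minimal sketch, where the names ParamParser and params are made up for illustration:

```scala
object ParamParser {
  // Same regex as above: {A=B} where A and B contain none of '{', '}' or '='.
  private val Re = """\{([^{}=]+)=([^{}=]+)\}""".r

  // One (key, value) tuple per match; text outside the braces is ignored.
  def params(s: String): List[(String, String)] =
    Re.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList
}
```

Calling ParamParser.params on the example string returns the two pairs in order of appearance, and a string without any {key=value} block returns an empty list.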
Without regex you can do:
scala> val str = "{param1=foo}{param2=bar}hello world!"
scala> str split '}' filter(x => x.head =='{' && x.contains('=')) map{x => val Array(key, value) = x.tail split '='; key -> value }
res9: Array[(java.lang.String, java.lang.String)] = Array((param1,foo), (param2,bar))
Or in a clearer way:
// We find different blocks
val str1 = str split '}'
// We remove invalid blocks (end of the String in your case)
val str2 = str1 filter(x => x.head == '{' && x.contains('='))
// We transform each String into a tuple, removing the leading '{'
val str3 = str2 map{x =>
val Array(key, value) = x.tail split '='
key -> value
}