Tuples from string by regular expressions in Scala - regex

I have string like {param1=foo}{param2=bar}hello world!
I need to extract array of tuples (paramName, value) from this string and get something like [(param1, foo), (param2, bar)]
Is it possible in Scala to extract this tuples by only one regex? Because I managed to do this only in way like
val str = "{param1=foo}{param2=bar}hello world!"
val param = """(?<=\{)(.+?)(?=\})""".r // extract everything between { and }
val keyValue = """(.+)=(.+)""".r // for extracting key and value
val parameters = for (keyValue(key,value) <- param.findAllIn(str).toArray)
yield (key,value)
And it doesn't look sweet.
Also I tried to use
val param = """(?<=\{)(.+?)=(.+?)(?=\})""".r
But it return param=value as one string

Here's an expression that will find things like {A=B} where A and B do not contain {, }, or =.
scala> val Re = """\{([^{}=]+)=([^{}=]+)\}""".r
scala> val Re(a,b) = "{param1=foo}"
a: String = param1
b: String = foo
And if you want to find all matches in a string:
scala> val s = "{param1=foo}{param2=bar}hello world!"
scala> Re.findAllIn(s).matchData.map(_.subgroups).toList
res9: List[List[String]] = List(List(param1, foo), List(param2, bar))

Without regex you can do:
scala> val str = "{param1=foo}{param2=bar}hello world!"
scala> str split '}' filter(x => x.head =='{' && x.contains('=')) map{x => val Array(key, value) = x.tail split '='; key -> value }
res9: Array[(java.lang.String, java.lang.String)] = Array((param1,foo), (param2,bar))
Or in a clearer way:
// We find different blocks
val str1 = str split '}'
// We remove invalid blocks (end of the String in your case)
val str2 = str1 filter(x => x.head == '{' && x.contains('='))
// We transform the String into a tupple, removing the head
val str3 = str2 map{x =>
val Array(key, value) = x.tail split '='
key -> value
}

Related

Scala, regex matching ignore unnecessary words

My program is:
val pattern = "[*]prefix_([a-zA-Z]*)_[*]".r
val outputFieldMod = "TRASHprefix_target_TRASH"
var tar =
outputFieldMod match {
case pattern(target) => target
}
println(tar)
Basically, I try to get the "target" and ignore "TRASH" (I used *). But it has some error and I am not sure why..
Simple and straight forward standard library function (unanchored)
Use Unanchored
Solution one
Use unanchored on the pattern to match inside the string ignoring the trash
val pattern = "prefix_([a-zA-Z]*)_".r.unanchored
unanchored will only match the pattern ignoring all the trash (all the other words)
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """foo\((.*)\)""".r.unanchored
pattern: scala.util.matching.UnanchoredRegex = foo\((.*)\)
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res3: String = bar
Solution two
Pad your pattern from both sides with .*. .* matches any char other than a linebreak character.
val pattern = ".*prefix_([a-zA-Z]*)_.*".r
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """.*foo\((.*)\).*""".r
pattern: scala.util.matching.Regex = .*foo\((.*)\).*
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res4: String = bar
This will work, val pattern = ".*prefix_([a-z]+).*".r, but it distinguishes between target and trash via lower/upper-case letters. Whatever determines real target data from trash data will determine the real regex pattern.

Multi capturing groups in scala regex

I'm trying to go from
val s: String = "sometextHere[a][b][c]"
to
val x = "sometextHere"
val y = List("a", "b", "c")
The number of "[...]" is 1+.
I've got something pretty hacky but I feel like there must be a better solution
val bracketMatcher = "\\[(\\w+)\\]".r
val listMatcher = s"^(\\w+)((?:$bracketMatcher)+)".r
listMatcher.findAllIn(chunk) match {
case matchIterator if matchIterator.hasNext =>
val matchData = matchIterator.matchData.next()
val indexesMatch = bracketMatcher.findAllIn(matchData.group(2)).matchData.flatMap(_.subgroups).toList
val a = matchData.group(1) // This is "sometextHere"
val b = indexesMatch // This is List("a", "b", "c")
case _ => ...
Regexes are easier to write in triple quotes. Also, you don't have to match the entire thing at once:
def allMatches(s: String): (String, List[String]) = {
val bracketMatcher = """\[(\w+)\]""".r
val startMatcher = """^(\w+)\[""".r
val first = startMatcher.findFirstMatchIn(s).get.group(1)
val matches = bracketMatcher.findAllMatchIn(s)
val indexes = matches.map(_.group(1)).toList
(first, indexes)
}
allMatches("sometextHere[a][b][c]")
Robert gave a good warning, though. Make sure your input data has no nesting, or you won't be able to handle it with regular expressions. If you have nesting, you'll have to use a proper parser.

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will share also what I want to do and it will be nice if you can share your suggestions:
im doing automation for some website, and based on this string I need to do some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
as you can see the second field i need to choose is the name firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit
You can use pattern matching for this.
First create regex:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
then you can use it to pattern match:
# "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
if you don't need all numbers you can for example do this
"1.2.0" match { case pattern(a, _, _) => println(a) }
1
if you want to for example to take just first two numbers you can do
# val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
Can only add to #Lukasz's answer one more variant with the values extration:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
# val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following:
scala> "3.2.1".split(".")
res0: Array[java.lang.String] = Array(string1, string2, string3)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}

Scala Replacement by Regex

I've a string like
val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo".
Now I split this string and define the pattern
val split = bar.split("-")
val pattern = ".*(A|K)\\d.*".r
and now I want to replace A9K9foo in the last entry of 'split'
val last = split.last
val suffix = last match {
case pattern(_) => last replaceFirst ("""(A\d)?(K\d)?.*""", "")
case _ => last
}
What I know is that replaceFirst is executed but it won't replace A9K9foo in my 'last' val
(replaceFirst should only executed if 'last' matches 'pattern'), the wanted result is M2.
Edit: It could happen that last is not M9A9K9foo but M9A9 or M9K9foo or maybe M9A9K9. All i want is to replace all content except the text before A\d or K\d but if there is no A\d or K\d nothing should happen.
Do you know why this replacement won't work?
You're using String.replaceFirst, and your pattern has a wildcard that consumes everything.
Maybe you want:
last replaceFirst ("""A\dK\d""", "")
where the A9K9 is not optional, and that's all you want to replace.
There are other formulations:
scala> val r = """(A\dK\d)""".r
r: scala.util.matching.UnanchoredRegex = (A\dK\d)
scala> val m = (r findAllMatchIn bar).toList.last
m: scala.util.matching.Regex.Match = A9K9
scala> s"${m.before}${m.after}"
res15: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9foo
That's not the most clever.
More:
scala> val r = """(A|K)\d""".r
r: scala.util.matching.Regex = (A|K)\d
scala> val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo"
bar: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo
scala> val last = (bar split "-").last
last: String = M9A9K9foo
scala> r findFirstMatchIn last map (_.before) getOrElse last
res0: CharSequence = M9
scala> val r = """(.*?)((A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)((A|K)\d.*)?
scala> last match { case r(prefix, _*) => prefix }
res1: String = M9
scala> "M9" match { case r(prefix, _*) => prefix }
res2: String = M9
scala> "M9K9foo" match { case r(prefix, _*) => prefix }
res3: String = M9
scala> val r = """(.*?)(?:(?:A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)(?:(?:A|K)\d.*)?
scala> last match { case r(prefix) => prefix }
res4: String = M9
The diagnosis is the same; there are different ways to pull the string apart.

Scala: extracting part of a Strings using Regular Expressions

I have a very simple string like this one:
"Some(1234)"
I'd like to extract "1234" out from it. How can I do it?
val s = "Some(1234)"
//s: java.lang.String = Some(1234)
val Pattern = """Some\((\d+)\)""".r
//Pattern: scala.util.matching.Regex = Some\((\d+)\)
val Pattern(number) = s
//number: String = 1234
Switch out your regex for whatever you need. \d+ limits it to digits only.
scala> val s = "Some(1234)"
s: String = Some(1234)
scala> val nums = "[0-9]".r
nums: scala.util.matching.Regex = [0-9]
scala> nums.findAllIn(s).mkString
res0: String = 1234
Starting Scala 2.13, it's possible to pattern match a Strings by unapplying a string interpolator:
val s"Some($number)" = "Some(1234)"
// number: String = 1234
Also note that if the idea is to extract an Option[Int] from its toString version, you can use the interpolation extraction with a match statement:
x match { case s"Some($number)" => number.toIntOption case _ => None }
// x = "Some(1234)" => Option[Int] = Some(1234)
// x = "Some(1S3R)" => Option[Int] = None
// x = "None" => Option[Int] = None
just another way, playing with the regex. Limit to 4 digits.
def getnumber (args: Array[String]) {
val str = "Some(1234)"
val nums = "\\d{4}".r
println (nums.findAllIn(str).mkString)
}