How to group similar characters in a string in scala? - regex

Lets assume I have a string as such:
val a = "aaaabbbcccss"
and I want to group only the a's and b's as such:
"a4b3cccss"
I have tries a.toList.groupBy(identity).mapValues(_.size) but that returns a map with no ordering so I cannot convert it into the form I want. I was wondering if there is a function in scala that can achieve what I want?

You may use
val a = "aaaabbbcccss"
val p = """([ab])\1*""".r
println(p replaceAllIn (a, m => s"${m.group(1)}${m.group(0).size}") )
See Scala demo
The regex matches:
([ab]) - Group 1: a or b
\1* - zero or more occurrences of the char captured into Group 1.
In the replacement part, m.group(1) is the char captured into Group 1 and m.group(0).size is the size of the whole match.

As an alternative, you might create a function which you can give your string and a list of characters and use a recursive approach where you could take consecutive characters from the list using takeWhile.
Then drop from the list using the length of the result from takewhile and add to the accumulator what you want to concatenate to the acc string which will be returned when the list will be empty.
def countSimilar(str: String, ch: List[Char]): String = {
def process(l: List[Char], acc: String = ""): String = {
l match {
case Nil => acc
case h :: _ =>
val tw = l.takeWhile(_ == h)
acc + process(
l.drop(tw.length),
if (ch.contains(h)) h + tw.length.toString else tw.mkString("")
)
}
}
process(str.toList)
}
println(countSimilar("aaaabbbcccss", List('a', 'b')))
println(countSimilar("aaaabbbcccssaaaabb", List('a', 'b', 'c')))
That will give you:
a4b3cccss
a4b3c3ssa4b2
See the Scala demo

Related

Find index locations by regex pattern and replace them with a list of indexes in Scala

I have strings in this format:
object[i].base.base_x[i] and I get lists like List(0,1).
I want to use regular expressions in scala to find the match [i] in the given string and replace the first occurance with 0 and the second with 1. Hence getting something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if(indexValues.nonEmpty){
indexValues.map(row => {
indexReplacePattern.replaceFirstIn(stringWithoutIndex , "[" + row + "]")
})
else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex resulting into an output like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You couldloop through the integers in indexValues using foldLeft and pass the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value of indexValues.
If you want to use a regex, you might use a positive lookahead (?=]) and a positive lookbehind (?<=\[) to assert the i is between opening and square brackets.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
s.replaceFirst(strRegex, row.toString)
}
See the regex demo | Scala demo
How about this:
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. No need for the if (indexValues.nonEmpty) check.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}

Scala: string pattern matching and splitting

I am new to Scala and want to create a function to split Hello123 or Hello 123 into two strings as follows:
val string1 = 123
val string2 = Hello
What is the best way to do it, I have attempted to use regex matching \\d and \\D but I am not sure how to write the function fully.
Regards
You may replace with 0+ whitespaces (\s*+) that are preceded with letters and followed with digits:
var str = "Hello123"
val res = str.split("(?<=[a-zA-Z])\\s*+(?=\\d)")
println(res.deep.mkString(", ")) // => Hello, 123
See the online Scala demo
Pattern details:
(?<=[a-zA-Z]) - a positive lookbehind that only checks (but does not consume the matched text) if there is an ASCII letter before the current position in the string
\\s*+ - matches (consumes) zero or more spaces possessively, i.e.
(?=\\d) - this check is performed only once after the whitespaces - if any - were matched, and it requires a digit to appear right after the current position in the string.
Based on the given string I assume you have to match a string and a number with any number of spaces in between
here is the regex for that
([a-zA-Z]+)\\s*(\\d+)
Now create a regex object using .r
"([a-zA-Z]+)\\s*(\\d+)".r
Scala REPL
scala> val regex = "([a-zA-Z]+)\\s*(\\d+)".r
scala> val regex(a, b) = "hello 123"
a: String = "hello"
b: String = "123"
scala> val regex(a, b) = "hello123"
a: String = "hello"
b: String = "123"
Function to handle pattern matching safely
pattern match with extractors
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
Here is the function which does Regex with Pattern matching
def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
Scala REPL
scala> def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
defined function matchStr
scala> matchStr("Hello123")
res41: Option[(String, Int)] = Some(("Hello", 123))
scala> matchStr("Hello 123")
res42: Option[(String, Int)] = Some(("Hello", 123))

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will share also what I want to do and it will be nice if you can share your suggestions:
im doing automation for some website, and based on this string I need to do some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
as you can see the second field i need to choose is the name firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit
You can use pattern matching for this.
First create regex:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
then you can use it to pattern match:
# "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
if you don't need all numbers you can for example do this
"1.2.0" match { case pattern(a, _, _) => println(a) }
1
if you want to for example to take just first two numbers you can do
# val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
Can only add to #Lukasz's answer one more variant with the values extration:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
# val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following:
scala> "3.2.1".split(".")
res0: Array[java.lang.String] = Array(string1, string2, string3)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}

Using regex in Scala to group and pattern match

I need to process phone numbers using regex and group them by (country code) (area code) (number). The input format:
country code: between 1-3 digits
, area code: between 1-3 digits
, number: between 4-10 digits
Examples:
1 877 2638277
91-011-23413627
And then I need to print out the groups like this:
CC=91,AC=011,Number=23413627
This is what I have so far:
String s = readLine
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val ret = pattern.findAllIn(s)
println("CC=" + ret.group(1) + "AC=" + ret.group(2) + "Number=" + ret.group(3));
The compiler said "empty iterator." I also tried:
val (cc,ac,n) = s
and that didn't work either. How to fix this?
The problem is with your pattern. I would recommend using some tool like RegexPal to test them. Put the pattern in the first text box and your provided examples in the second one. It will highlight the matched parts.
You added spaces between your groups and [ -] separators, and it was expecting spaces there. The correct pattern is:
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
Also if you want to explicitly get groups then you want to get a Match returned. For an example the findFirstMatchIn function returns the first optional Match or the findAllMatchIn returns a list of matches:
val allMatches = pattern.findAllMatchIn(s)
allMatches.foreach { m =>
println("CC=" + m.group(1) + "AC=" + m.group(2) + "Number=" + m.group(3))
}
val matched = pattern.findFirstMatchIn(s)
matched match {
case Some(m) =>
println("CC=" + m.group(1) + "AC=" + m.group(2) + "Number=" + m.group(3))
case None =>
println("There wasn't a match!")
}
I see you also tried extracting the string into variables. You have to use the Regex extractor in the following way:
val Pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val Pattern(cc, ac, n) = s
println(s"CC=${cc}AC=${ac}Number=$n")
And if you want to handle errors:
s match {
case Pattern(cc, ac, n) =>
println(s"CC=${cc}AC=${ac}Number=$n")
case _ =>
println("No match!")
}
Also you can also take a look at string interpolation to make your strings easier to understand: s"..."

Extract numbers from string with rich string magic

I want to extract a list of ID of a string pattern in the following:
{(2),(4),(5),(100)}
Note: no leading or trailing spaces.
The List can have up to 1000 IDs.
I want to use rich string pattern matching to do this. But I tried for 20 minutes with frustration.
Could anyone help me to come up with the correct pattern? Much appreciated!
Here's brute force string manipulation.
scala> "{(2),(4),(5),(100)}".replaceAll("\\(", "").replaceAll("\\)", "").replaceAll("\\{","").replaceAll("\\}","").split(",")
res0: Array[java.lang.String] = Array(2, 4, 5, 100)
Here's a regex as #pst noted in the comments. If you don't want the parentheses change the regular expression to """\d+""".r.
val num = """\(\d+\)""".r
"{(2),(4),(5),(100)}" findAllIn res0
res33: scala.util.matching.Regex.MatchIterator = non-empty iterator
scala> res33.toList
res34: List[String] = List((2), (4), (5), (100))
"{(2),(4),(5),(100)}".split ("[^0-9]").filter(_.length > 0).map (_.toInt)
Split, where char is not part of a number, and only convert non-empty results.
Might be modified to include dots or minus signs.
Use Extractor object:
object MyList {
def apply(l: List[String]): String =
if (l != Nil) "{(" + l.mkString("),(") + ")}"
else "{}"
def unapply(str: String): Some[List[String]] =
Some(
if (str.indexOf("(") > 0)
str.substring(str.indexOf("(") + 1, str.lastIndexOf(")")) split
"\\p{Space}*\\)\\p{Space}*,\\p{Space}*\\(\\p{Space}*" toList
else Nil
)
}
// test
"{(1),(2)}" match { case MyList(l) => l }
// res23: List[String] = List(1, 2)