Extract numbers from string with rich string magic - regex

I want to extract a list of IDs from a string with the following pattern:
{(2),(4),(5),(100)}
Note: no leading or trailing spaces.
The List can have up to 1000 IDs.
I want to use rich string pattern matching to do this, but I tried for 20 minutes without success.
Could anyone help me to come up with the correct pattern? Much appreciated!

Here's brute force string manipulation.
scala> "{(2),(4),(5),(100)}".replaceAll("\\(", "").replaceAll("\\)", "").replaceAll("\\{","").replaceAll("\\}","").split(",")
res0: Array[java.lang.String] = Array(2, 4, 5, 100)
Here's a regex, as @pst noted in the comments. If you don't want the parentheses, change the regular expression to """\d+""".r.
scala> val num = """\(\d+\)""".r
scala> num findAllIn "{(2),(4),(5),(100)}"
res33: scala.util.matching.Regex.MatchIterator = non-empty iterator
scala> res33.toList
res34: List[String] = List((2), (4), (5), (100))
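If you only need the numbers themselves, the `\d+` variant mentioned above can be wrapped in a small helper that also converts them to Int. This is a sketch that assumes the input contains only non-negative integer IDs; `IdExtractor` and `extractIds` are hypothetical names.

```scala
object IdExtractor {
  // Matches runs of digits, so braces, parentheses and commas are skipped
  private val Digits = """\d+""".r

  // Hypothetical helper: returns every ID in the string as an Int, in order
  def extractIds(s: String): List[Int] =
    Digits.findAllIn(s).map(_.toInt).toList
}
```

For example, `IdExtractor.extractIds("{(2),(4),(5),(100)}")` returns `List(2, 4, 5, 100)`, and an empty set `"{}"` yields `Nil`.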

"{(2),(4),(5),(100)}".split("[^0-9]").filter(_.length > 0).map(_.toInt)
Split on every character that is not part of a number, and convert only the non-empty pieces.
The character class could be extended to allow dots or minus signs.
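As a sketch of the minus-sign modification mentioned above, the character class can also exclude `-` so that negative IDs survive the split. This assumes well-formed input: a stray `-` on its own would make `toInt` throw a `NumberFormatException`. `SignedIds` and `splitSignedIds` are hypothetical names.

```scala
object SignedIds {
  // Split on every character that is neither a digit nor '-',
  // drop the empty pieces, and convert the rest to Int
  def splitSignedIds(s: String): Array[Int] =
    s.split("[^0-9-]").filter(_.nonEmpty).map(_.toInt)
}
```

For example, `SignedIds.splitSignedIds("{(-2),(4),(100)}").toList` gives `List(-2, 4, 100)`.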

Use an extractor object:
object MyList {
  // Build the "{(a),(b),...}" representation from a list of IDs
  def apply(l: List[String]): String =
    if (l.nonEmpty) "{(" + l.mkString("),(") + ")}"
    else "{}"

  // Take the text between the first "(" and the last ")" and split it on
  // ")" + optional whitespace + "," + optional whitespace + "("
  def unapply(str: String): Some[List[String]] =
    Some(
      if (str.indexOf("(") > 0)
        str.substring(str.indexOf("(") + 1, str.lastIndexOf(")"))
          .split("\\p{Space}*\\)\\p{Space}*,\\p{Space}*\\(\\p{Space}*")
          .toList
      else Nil
    )
}
// test
"{(1),(2)}" match { case MyList(l) => l }
// res23: List[String] = List(1, 2)

Related

Find index locations by regex pattern and replace them with a list of indexes in Scala

I have strings in this format:
object[i].base.base_x[i] and I get lists like List(0,1).
I want to use regular expressions in Scala to find the matches of [i] in the given string and replace the first occurrence with 0 and the second with 1, giving something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if (indexValues.nonEmpty) {
  indexValues.map(row => {
    indexReplacePattern.replaceFirstIn(stringWithoutIndex, "[" + row + "]")
  })
} else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex in place, so the result is a list like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You could loop through the integers in indexValues using foldLeft and pass the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value of indexValues.
If you want to use a regex, you might use a positive lookahead (?=]) and a positive lookbehind (?<=\[) to assert that the i is between opening and closing square brackets.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
s.replaceFirst(strRegex, row.toString)
}
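Put together as a self-contained sketch, the lookaround version looks like this (`IndexFiller` and `fill` are illustrative names, not from the question). Each step of `foldLeft` replaces the leftmost remaining `i` that sits between square brackets, so the index values are used in order.

```scala
object IndexFiller {
  // Matches an "i" that is preceded by "[" and followed by "]"
  private val strRegex = """(?<=\[)i(?=])"""

  // Replace the n-th "[i]" placeholder with the n-th value from indexValues
  def fill(template: String, indexValues: List[Int]): String =
    indexValues.foldLeft(template) { (s, row) =>
      s.replaceFirst(strRegex, row.toString)
    }
}
```

For example, `IndexFiller.fill("object[i].base.base_x[i]", List(0, 1))` returns `"object[0].base.base_x[1]"`.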
How about this:
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. No need for the if (indexValues.nonEmpty) check.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}
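Since the question also asks for non-regex suggestions, the same `foldLeft` can drive a plain `indexOf`/`substring` replacement. This is a sketch that is not taken from the answers; `PlaceholderFiller` and `fill` are hypothetical names. As with the regex version, an empty `indexValues` simply returns the start value, so no `nonEmpty` check is needed.

```scala
object PlaceholderFiller {
  private val Placeholder = "[i]"

  // Hypothetical non-regex variant: locate each literal "[i]" with indexOf
  def fill(template: String, indexValues: List[Int]): String =
    indexValues.foldLeft(template) { (s, row) =>
      val pos = s.indexOf(Placeholder)
      if (pos < 0) s // no placeholder left: keep the string unchanged
      else s.substring(0, pos) + "[" + row + "]" + s.substring(pos + Placeholder.length)
    }
}
```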

How to group similar characters in a string in scala?

Let's assume I have a string like this:
val a = "aaaabbbcccss"
and I want to group only the a's and b's as such:
"a4b3cccss"
I have tried a.toList.groupBy(identity).mapValues(_.size), but that returns a map with no ordering, so I cannot convert it into the form I want. Is there a function in Scala that can achieve this?
You may use
val a = "aaaabbbcccss"
val p = """([ab])\1*""".r
println(p replaceAllIn (a, m => s"${m.group(1)}${m.group(0).size}") )
The regex matches:
([ab]) - Group 1: a or b
\1* - zero or more occurrences of the char captured into Group 1.
In the replacement part, m.group(1) is the char captured into Group 1 and m.group(0).size is the size of the whole match.
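A self-contained version of the regex approach, with the set of characters to group passed in as a parameter, might look like this. It is a sketch: `RunGrouper` and `groupRuns` are hypothetical names, and the characters are assumed to be letters or digits, since they are placed inside a regex character class without escaping.

```scala
object RunGrouper {
  // Build a pattern like ([ab])\1* : one chosen char, then any repeats of it
  def groupRuns(s: String, chars: String): String = {
    val p = s"([$chars])\\1*".r
    // Replace each run with the char followed by the run length
    p.replaceAllIn(s, m => s"${m.group(1)}${m.group(0).length}")
  }
}
```

For example, `RunGrouper.groupRuns("aaaabbbcccss", "ab")` returns `"a4b3cccss"`.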
As an alternative, you could write a function that takes your string and a list of characters and processes it recursively: use takeWhile to take the run of consecutive identical characters at the head of the list, drop that many characters from the list, and append to the accumulator either the character followed by its count (if it is one of the chosen characters) or the run unchanged.
The accumulator is returned once the list is empty.
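import scala.annotation.tailrec

def countSimilar(str: String, ch: List[Char]): String = {
  @tailrec
  def process(l: List[Char], acc: String = ""): String = {
    l match {
      case Nil => acc
      case h :: _ =>
        // run of identical characters at the head of the list
        val tw = l.takeWhile(_ == h)
        // "<char><count>" for the chosen chars, otherwise the run unchanged
        val piece = if (ch.contains(h)) s"$h${tw.length}" else tw.mkString
        process(l.drop(tw.length), acc + piece)
    }
  }
  process(str.toList)
}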
println(countSimilar("aaaabbbcccss", List('a', 'b')))
println(countSimilar("aaaabbbcccssaaaabb", List('a', 'b', 'c')))
That will give you:
a4b3cccss
a4b3c3ssa4b2

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I'll also describe what I want to do; suggestions are welcome:
I'm doing automation for a website, and based on this string I need to perform some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
As you can see, the second field I need to choose is named firstdigit.seconddigit, and the third is firstdigit.seconddigit.thirddigit.
You can use pattern matching for this.
First create a regex:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
Then you can use it to pattern match:
@ "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
If you don't need all the numbers, you can, for example, do this:
@ "1.2.0" match { case pattern(a, _, _) => println(a) }
1
If you only want the first two numbers, you can do:
@ val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
I can add one more variant to @Lukasz's answer, extracting the values directly:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
@ val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be used as regular vals later in the code. Note that this definition throws a scala.MatchError if the string does not match the pattern.
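Applied to the question's actual goal, the same pattern can produce the three option values directly. A sketch: `VersionValues` and `optionValues` are hypothetical names, and the method returns `None` instead of throwing a `MatchError` when the input is not exactly three dot-separated numbers.

```scala
object VersionValues {
  // Same three-number pattern as above; a Regex used as an extractor
  // must match the entire input string
  private val Pattern = """(\d+)\.(\d+)\.(\d+)""".r

  // Returns (first, first.second, first.second.third), or None on no match
  def optionValues(str: String): Option[(String, String, String)] = str match {
    case Pattern(a, b, c) => Some((a, s"$a.$b", s"$a.$b.$c"))
    case _                => None
  }
}
```

For example, `VersionValues.optionValues("3.2.1")` returns `Some(("3", "3.2", "3.2.1"))`, while `"3.2"` yields `None`.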
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following (split takes a regular expression, so the dot has to be escaped; split(".") would return an empty array):
scala> "3.2.1".split("\\.")
res0: Array[String] = Array(3, 2, 1)
This one:
object Splitter {
  def splitAndAccumulate(string: String) = {
    val s = string.split("\\.")
    s.tail.scanLeft(s.head) { case (acc, elem) =>
      acc + "." + elem
    }
  }
}
passes this test:
test("Simple") {
  val t = Splitter.splitAndAccumulate("1.2.3")
  val answers = Seq("1", "1.2", "1.2.3")
  t.zip(answers).foreach { case (l, r) =>
    assert(l == r)
  }
}

Scala: How to always extract the same substring out from strings with different prefix and/or suffix

Given the following strings:
val s0 = "objects"
val s1 = "/objects"
val s2 = "/objects(0)"
val s3 = "/objects(1)"
I need to extract the substring objects, regardless of any possible prefix and suffix. If the string always started with a slash and ended with (N), then the easiest solution would be
scala> s3.substring(1).substring(0, s3.indexOf("(") - 1)
res1: String = objects
How do I always extract the string objects with a regex (I suppose this is the way to go)?
You could use the regex below and take the string you want from capturing group 1.
^\/?(.*?)(?=(?:\(\d*\))?$)
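In Scala, the lookahead regex above can be applied with `findFirstMatchIn`, reading capturing group 1 from the match. A sketch; `ObjectName` and `extract` are hypothetical names. `findFirstMatchIn` is used rather than a `case Name(x)` pattern because the lookahead does not consume a `(N)` suffix, so the regex does not match the whole input in that case, which an extractor pattern would require.

```scala
object ObjectName {
  // Optional leading slash, then a lazy capture up to an optional "(N)" suffix at the end
  private val Name = """^\/?(.*?)(?=(?:\(\d*\))?$)""".r

  // Returns the text captured by group 1, or None if there is no match
  def extract(s: String): Option[String] =
    Name.findFirstMatchIn(s).map(_.group(1))
}
```

For the four inputs in the question, `ObjectName.extract` returns `Some("objects")` every time.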
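Here is another way to do this. Note that this pattern only matches strings that contain a slash followed by a numeric suffix in parentheses, so the first two inputs fall through to the default case: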
val pattern = """.*/(objects)\(\d+\).*""".r
val data = Seq("objects", "/objects", "/objects(0)", "/objects(1)")
val results = data.map{
case pattern(obj) => obj
case _ => "-"
}
Scala REPL:
results: Seq[String] = List(-, -, objects, objects)
If you know the delimiting characters, you can use dropWhile and takeWhile; for
val in = Seq("objects", "/objects", "/objects(0)", "/objects(1)")
then
in.map(i => i.dropWhile(_ == '/').takeWhile(_ != '('))
List(objects, objects, objects, objects)
Otherwise, a regular expression with grouping, as already suggested, is more robust, scalable and general.

Scala extract from list based on condition

I have a list of words and would like to extract the words whose lengths are between 5 and 10. I am using the following code, but it doesn't seem to work. Also, I can only use val, not var.
val sentence = args(0)
val words = sentence.split(" ")
val fullsort = words.sortBy(w => w.length -> w)
val med = fullsort.map(x => if(x.length>3 && x.length<11) x)
val sentence = args(0)
val words = sentence.split(" ")
val results = words.filter(word => word.length >= 5 && word.length <= 10)
Try this
val sentence = args(0)
val words = sentence.split(" ")
val fullsort = words.sortBy(w => w.length -> w)
val med = fullsort collect {case x:String if (x.length >= 5 && x.length <= 10) => x}
Another alternative is to let a regex do more of the work for you:
val wordLimitRE = "\\b\\w{5,10}\\b".r
val wordIterator = wordLimitRE.findAllMatchIn(sentence).map {_.toString}
This particular regex starts with a word boundary pattern \b, then a range-limited match for a number of word characters \w{lower,upper}, and finally another word boundary pattern \b.
The method findAllMatchIn returns an Iterator[Regex.Match] over all matches; the word boundaries ensure that only whole words of 5 to 10 characters match, rather than a slice of a longer word. The map {_.toString} then turns it into an Iterator[String].
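To compare the filter and regex answers side by side, here is a sketch with both approaches in one object (`MediumWords`, `byFilter` and `byRegex` are hypothetical names). They agree on plain space-separated words but differ on punctuation: `split(" ")` keeps a trailing comma as part of the token, while `\w` excludes it.

```scala
object MediumWords {
  // Whole words of 5 to 10 word characters
  private val WordLimitRE = """\b\w{5,10}\b""".r

  // filter-based version: keeps space-separated tokens of length 5..10
  def byFilter(sentence: String): List[String] =
    sentence.split(" ").toList.filter(w => w.length >= 5 && w.length <= 10)

  // regex-based version: punctuation is not part of \w, so "hello," yields "hello"
  def byRegex(sentence: String): List[String] =
    WordLimitRE.findAllIn(sentence).toList
}
```

For example, both methods return `List("quick", "brownish", "foxes", "jumped")` for `"the quick brownish foxes jumped extraordinarily high"`.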