Replace occurrences in a String in Kotlin - list

I have two lists of Strings. Now I want to replace, in a sentence, every occurrence of the word at index i of the first list with the word at index i of the second list.
So if I have
list a=("am","I","my")
and
list b=("are","You","your")
I want the sentence "I am an amateur"
to become "You are an amateur"
What is the cleanest way to do that in Kotlin (without a for loop)?

First split the string to a list of its words and then map each word if it exists in list a to the corresponding word in list b. Finally rejoin the string:
val a = listOf("am", "I", "my")
val b = listOf("are", "You", "your")
val str = "I am an amateur"
val new = str
    .split("\\s+".toRegex())
    .map { val i = a.indexOf(it); if (i < 0) it else b[i] }
    .joinToString(" ")
Another way of doing the same thing is:
var new = " $str "
a.forEachIndexed { i, s -> new = new.replace(" $s ", " ${b[i]} ") }
new = new.trim()
although this is closer to a for loop.

I assume there is no punctuation, all whitespace characters are single spaces, and so on.
val m = a.zip(b).toMap()
return s.split(' ').joinToString(" ") { m[it] ?: it }
First you create a map m for more efficient... mapping. Then
Split the string to get a list of words
Map all words: if m contains the word, then return the value (i.e. the replacement), otherwise return the original word (since we shouldn't replace it).
Join all words, separate them by spaces.

You can use the regular expression \b\w+\b to match words in a sentence and then call the replace function with a lambda that provides a replacement string for each match:
val input = "I am an amateur, alas."
val wordsToReplace = listOf("I", "am", "my")
val wordsReplaceWith = listOf("You", "are", "your")
val wordRegex = """\b\w+\b""".toRegex()
val result = wordRegex.replace(input) { match ->
    val wordIndex = wordsToReplace.indexOf(match.value)
    if (wordIndex >= 0) wordsReplaceWith[wordIndex] else match.value
}
println(result)
If there are a lot of words in your lists, it makes sense to build a map of them to speed up lookups:
val replaceMap = (wordsToReplace zip wordsReplaceWith).toMap()
val result = wordRegex.replace(input) { match ->
    replaceMap[match.value] ?: match.value
}

I think the simplest way is to create a set of the regexes you want and replace within the string by iteration. Say you want to replace the word "am": your regex will be "\bam\b". You can use "(?i)\bam\b" if you want it to be case-insensitive. To turn "I am an amateur" into "You are an amateur":
var str = "I am an amateur" // must be a var, since it is reassigned below
val replacements = setOf("\\bam\\b" to "are",
    "\\bI\\b" to "You",
    "\\bmy\\b" to "your")
replacements.forEach {
    str = str.replace(Regex(it.first), it.second)
}

Related

Find index locations by regex pattern and replace them with a list of indexes in Scala

I have strings in this format:
object[i].base.base_x[i] and I get lists like List(0,1).
I want to use regular expressions in Scala to find the match [i] in the given string and replace the first occurrence with 0 and the second with 1, giving something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if (indexValues.nonEmpty) {
  indexValues.map(row => {
    indexReplacePattern.replaceFirstIn(stringWithoutIndex, "[" + row + "]")
  })
} else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex, so the output ends up like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You could loop through the integers in indexValues using foldLeft and pass the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value from indexValues.
If you want to use a regex, you might use a positive lookahead (?=]) and a positive lookbehind (?<=\[) to assert that the i is between an opening and a closing square bracket.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
  s.replaceFirst(strRegex, row.toString)
}
How about this:
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. No need for the if (indexValues.nonEmpty) check.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
  indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}
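As a variation on the foldLeft answers above, here is a minimal single-pass sketch that walks the matches once and pulls the next value from an iterator. The names IndexFill and fill are hypothetical, and keeping leftover placeholders unchanged when the values run out is an assumption, not something the question specifies.

```scala
import scala.util.matching.Regex

object IndexFill {
  private val IndexPlaceholder: Regex = raw"\[i\]".r

  // Replaces each "[i]" with the next value from `values`, left to right.
  // Assumption: placeholders left over once the values run out stay unchanged.
  def fill(template: String, values: Seq[Int]): String = {
    val remaining = values.iterator
    IndexPlaceholder.replaceAllIn(template, m =>
      // quoteReplacement keeps "$" and "\" in the replacement literal
      if (remaining.hasNext) Regex.quoteReplacement("[" + remaining.next() + "]")
      else Regex.quoteReplacement(m.matched)
    )
  }
}
```

With this sketch, IndexFill.fill("object[i].base.base_x[i]", List(0, 1)) yields object[0].base.base_x[1], and an empty list returns the input unchanged, so no separate nonEmpty check is needed.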

How to group similar characters in a string in scala?

Let's assume I have a string such as:
val a = "aaaabbbcccss"
and I want to group only the a's and b's as such:
"a4b3cccss"
I have tried a.toList.groupBy(identity).mapValues(_.size), but that returns a map with no ordering, so I cannot convert it into the form I want. Is there a function in Scala that can achieve what I want?
You may use
val a = "aaaabbbcccss"
val p = """([ab])\1*""".r
println(p replaceAllIn (a, m => s"${m.group(1)}${m.group(0).size}") )
The regex matches:
([ab]) - Group 1: a or b
\1* - zero or more occurrences of the char captured into Group 1.
In the replacement part, m.group(1) is the char captured into Group 1 and m.group(0).size is the size of the whole match.
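The same regex idea can be wrapped in a small function that takes the characters to collapse as a parameter. This is only a sketch: RunLength and compress are hypothetical names, and the pattern is assembled from the given characters rather than hard-coded as [ab].

```scala
import scala.util.matching.Regex

object RunLength {
  // Collapses each run of a chosen character into "<char><count>";
  // runs of any other character are left as they are.
  def compress(input: String, chars: Set[Char]): String =
    if (chars.isEmpty) input
    else {
      // Quote every character so regex metacharacters are matched literally.
      val alternatives = chars.map(c => Regex.quote(c.toString)).mkString("|")
      val runs = ("(" + alternatives + ")\\1*").r
      runs.replaceAllIn(input, m =>
        Regex.quoteReplacement(m.group(1) + m.matched.length))
    }
}
```

For example, RunLength.compress("aaaabbbcccss", Set('a', 'b')) gives a4b3cccss. Note that a run of length one also gets a count, so a lone a becomes a1.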
As an alternative, you might create a function that takes your string and a list of characters and uses a recursive approach, taking the consecutive equal characters from the front of the list with takeWhile.
Then drop as many elements from the list as takeWhile returned, and pass the piece to be concatenated on as the accumulator; the accumulated string is returned once the list is empty.
def countSimilar(str: String, ch: List[Char]): String = {
  def process(l: List[Char], acc: String = ""): String = {
    l match {
      case Nil => acc
      case h :: _ =>
        val tw = l.takeWhile(_ == h)
        acc + process(
          l.drop(tw.length),
          if (ch.contains(h)) h + tw.length.toString else tw.mkString("")
        )
    }
  }
  process(str.toList)
}
println(countSimilar("aaaabbbcccss", List('a', 'b')))
println(countSimilar("aaaabbbcccssaaaabb", List('a', 'b', 'c')))
That will give you:
a4b3cccss
a4b3c3ssa4b2

Regex Matching using Matcher and Pattern

I am trying to apply regexes to a number based on the conditions below; however, it's returning an empty string.
import java.util.regex.Matcher
import java.util.regex.Pattern
object clean extends App {
  val ALPHANUMERIC: Pattern = Pattern.compile("^[a-zA-Z0-9]*$")
  val SPECIALCHAR: Pattern = Pattern.compile("[a-zA-Z0-9\\-#\\.\\(\\)\\/%&\\s]")
  val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
  val TRAILINGZEROES: Pattern = Pattern.compile("\\.0*$|(\\.\\d*?)0+$")
  def evaluate(codes: String): String = {
    var str2: String = codes.toString
    var text: Matcher = LEADINGZEROES.matcher(str2)
    str2 = text.replaceAll("")
    text = ALPHANUMERIC.matcher(str2)
    str2 = text.replaceAll("")
    text = SPECIALCHAR.matcher(str2)
    str2 = text.replaceAll("")
    text = TRAILINGZEROES.matcher(str2)
    str2 = text.replaceAll("")
    str2 // return the cleaned value; ending on an assignment would not type-check as String
  }
}
The code is returning an empty string for the LEADINGZEROES match.
scala> println("cleaned value :" + evaluate("0001234"))
cleaned value :
What change should I make so that the code works as I expect? Basically, I am trying to remove leading/trailing zeroes, and if the number contains special characters or alphanumeric values, then the entire value should be returned as null.
Your LEADINGZEROES pattern is working correctly, as
val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
println(LEADINGZEROES.matcher("0001234").replaceAll(""))
gives
//1234
But then there is a pattern matching
text = ALPHANUMERIC.matcher(str2)
which replaces every alphanumeric match with "", and this leaves str2 empty ("").
So when you do
val ALPHANUMERIC: Pattern = Pattern.compile("^[a-zA-Z0-9]*$")
val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
println(ALPHANUMERIC.matcher(LEADINGZEROES.matcher("0001234").replaceAll("")).replaceAll(""))
it will print an empty string.
Updated
As you have commented
if there is a code that is alphanumeric i want to make that value NULL
but in case of leading or trailing zeroes its pure number, which should return me the value after removing zeroes
but its also returning null for trailing and leading zeroes matches
and also how can I skip a match , suppose i want the regex to not match the number 0999 for trimming leading zeroes
You can write your evaluate function and regexes as below
val LEADINGTRAILINGZEROES = """(0*)(\d{4})(0*)""".r
val ALPHANUMERIC = """[a-zA-Z]""".r
def evaluate(codes: String): String = {
  val LEADINGTRAILINGZEROES(first, second, third) = if(ALPHANUMERIC.findAllIn(codes).length != 0) "0010" else codes
  if(second.equalsIgnoreCase("0010")) "NULL" else second
}
which should give you
println("cleaned value : " + evaluate("000123400"))
// cleaned value : 1234
println("alphanumeric : " + evaluate("0001A234"))
// alphanumeric : NULL
println("skipping : " + evaluate("0999"))
// skipping : 0999
I hope the answer is helpful
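A different way to organise the same idea is to validate first and only then trim, so the cleaning patterns never see a rejected value. The following is a rough sketch under stated assumptions, not the approach from the answer above: CleanCode is a hypothetical name, None stands in for the "NULL" result, only plain digits with an optional decimal part count as valid, trailing zeroes are trimmed only after a decimal point, and the special case of leaving 0999 untouched is not handled.

```scala
object CleanCode {
  // Assumption: a valid code is digits, optionally followed by ".digits".
  private val Numeric = """\d+(\.\d+)?""".r

  def evaluate(code: String): Option[String] =
    if (!Numeric.pattern.matcher(code).matches()) None
    else {
      // Drop leading zeroes, but keep the last one if another digit does not follow.
      val noLeading = code.replaceFirst("^0+(?=\\d)", "")
      val cleaned =
        if (noLeading.contains("."))
          // Drop trailing zeroes of the fraction, then a dangling "." if nothing is left.
          noLeading.replaceFirst("0+$", "").replaceFirst("\\.$", "")
        else noLeading
      Some(cleaned)
    }
}
```

With these assumptions, CleanCode.evaluate("0001234") is Some("1234") and CleanCode.evaluate("0001A234") is None.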

Getting the index of a slice

I want to do some processing on a string in Scala. The first stage of that is finding the index of articles such as: "A ", " A ", "a ", " a ". I am trying to do that like this:
"A house is in front of us".indexOfSlice("\\s+[Aa] ")
I think this should return 0, as the substring is first matched in the first position of the string.
However, this returns -1.
Why does it return -1? Is the regex I am using incorrect?
The other answers as I type this are just missing the point. Your problem is that indexOfSlice doesn't take a regexp, but a sub-sequence to search for in the sequence. So fixing the regexp won't help at all.
Try this:
val pattern = "\\b[Aa]\\b".r.unanchored
for (mo <- pattern.findAllMatchIn("A house is in front of us, a house is in front of us all")) {
  println("pattern starts at " + mo.start)
}
//> pattern starts at 0
//| pattern starts at 27
(with fixed regex, too)
Edit: counter-example for the popular but wrong suggestion of "\\s*[Aa] "
val pattern2 = "\\s*[Aa] ".r.unanchored
for (mo <- pattern2.findAllMatchIn("The agenda is hidden")) {
  println("pattern starts at " + mo.start)
}
//> pattern starts at 9
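To make the difference concrete, here is a small sketch contrasting indexOfSlice, which searches for the literal characters it is given, with a regex search for the first match. ArticleIndex and its member names are hypothetical and exist only for illustration.

```scala
object ArticleIndex {
  val text = "A house is in front of us"

  // indexOfSlice looks for the literal character sequence, not a pattern.
  val literalHit: Int = text.indexOfSlice("A ")
  // The regex source is treated as plain text here, and that text is not in the string.
  val regexTextAsLiteral: Int = text.indexOfSlice("\\s+[Aa] ")

  // A regex search for a standalone "A" or "a"; None when nothing matches.
  val firstArticle: Option[Int] = "\\b[Aa]\\b".r.findFirstMatchIn(text).map(_.start)
}
```

Here literalHit is 0, regexTextAsLiteral is -1, and firstArticle is Some(0).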
I see a mistake in your regex. Your regex is searching for
at least one space (\s+)
a letter (either A or a)
but the string you are matching doesn't contain a space at the beginning. That's why it's not returning index 0 but -1.
You could write your regex as "^\\s*[Aa] "
Here is an example:
import java.util.regex.Pattern

val text = "A house is in front of us"
val matcher = Pattern.compile("^\\s*[Aa] ").matcher(text)
// -1 signals "not found", so a miss is not confused with a match at index 0
val idx = if (matcher.find()) matcher.start() else -1
println(idx)
It should return 0 as expected.

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will also share what I want to do, and it would be nice if you could share your suggestions:
I'm doing automation for a website, and based on this string I need to perform some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
As you can see, the second field I need to choose is named firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit.
You can use pattern matching for this.
First create a regex:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
Then you can use it to pattern match:
@ "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
If you don't need all the numbers, you can, for example, do this:
@ "1.2.0" match { case pattern(a, _, _) => println(a) }
1
If you want, for example, to take just the first two numbers, you can do:
@ val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
I can only add one more variant to @Lukasz's answer, with value extraction:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
@ val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following (the dot must be escaped, because split takes a regular expression and an unescaped "." matches any character):
scala> "3.2.1".split("\\.")
res0: Array[java.lang.String] = Array(3, 2, 1)
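Because String.split takes a regular expression, the way the dot is written matters. A small sketch of the difference follows; SplitDemo and its member names are hypothetical, and the Char overload is the one Scala adds to strings.

```scala
object SplitDemo {
  // Unescaped, "." matches every character, so only empty pieces remain
  // and the resulting array is empty.
  val unescaped: Array[String] = "3.2.1".split(".")

  // Escaped, the dot is matched literally.
  val escaped: Array[String] = "3.2.1".split("\\.")

  // Scala's Char overload splits on the literal character, no regex involved.
  val byChar: Array[String] = "3.2.1".split('.')
}
```

Here unescaped is empty, while escaped and byChar both contain "3", "2" and "1".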
This one:
object Splitter {
  def splitAndAccumulate(string: String) = {
    val s = string.split("\\.")
    s.tail.scanLeft(s.head){ case (acc, elem) =>
      acc + "." + elem
    }
  }
}
passes this test:
test("Simple"){
  val t = Splitter.splitAndAccumulate("1.2.3")
  val answers = Seq("1", "1.2", "1.2.3")
  t.zip(answers).foreach{ case (l, r) =>
    assert(l == r)
  }
}