Split list when predicate is true - list

Does Kotlin provide a mutation function to split a list when a specific predicate is true?
In the following example, the list should be split whenever an element is ".".
The result should be of the type List<List<String>>.
// input list
val list = listOf(
    "This is", "the", "first sentence", ".",
    "And", "now there is", "a second", "one", ".",
    "Nice", "."
)
// the following should be the result of the transformation
listOf(
    listOf("This is", "the", "first sentence"),
    listOf("And", "now there is", "a second", "one"),
    listOf("Nice")
)
I need something like list.splitWhen { it == "." }

The closest one I have heard of is partition(), however I don't think it will work in your case.
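To illustrate why partition() does not fit here, a minimal sketch (the shortened sample list is only an assumption for illustration): it separates the elements into exactly two lists by predicate, so the boundaries between sentences are lost.

```kotlin
fun main() {
    val list = listOf("This is", "the", ".", "And", "now", ".")

    // partition() returns a Pair: the elements matching the predicate, then the rest.
    val (delimiters, words) = list.partition { it == "." }

    println(delimiters) // [., .]
    println(words)      // [This is, the, And, now]
}
```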
I have written and briefly tested three higher-order extension functions, all of which give the expected output.
Solution 1: Straightforward approach
inline fun List<String>.splitWhen(predicate: (String) -> Boolean): List<List<String>> {
    val list = mutableListOf<MutableList<String>>()
    var needNewList = false
    forEach { string ->
        if (!predicate(string)) {
            if (needNewList || list.isEmpty()) {
                list.add(mutableListOf(string))
                needNewList = false
            } else {
                list.last().add(string)
            }
        } else {
            /* When a delimiter is found */
            needNewList = true
        }
    }
    return list
}
Solution 2: Pair based approach
inline fun List<String>.splitWhen(predicate: (String) -> Boolean): List<List<String>> {
    val list = mutableListOf<List<String>>()
    withIndex()
        // Keep only the delimiters with their index; also keep the first and last element so they are not lost when pairing
        .filter { indexedValue -> predicate(indexedValue.value) || indexedValue.index == 0 || indexedValue.index == size - 1 }
        // Pair each IndexedValue with the next one; example indices 0,1,2,5,7 -> (0,1),(1,2),(2,5),(5,7)
        .zipWithNext()
        // Drop pairs of adjacent indices, e.g. two consecutive delimiters (".", ".")
        .filter { pair -> pair.first.index + 1 != pair.second.index }
        .forEach { pair ->
            // Skip the delimiter itself at the start of the range
            val startIndex = if (predicate(pair.first.value)) pair.first.index + 1 else pair.first.index
            // subList() endIndex is exclusive; include the last element if it is not a delimiter
            val endIndex = if (!predicate(pair.second.value) && pair.second.index == size - 1) pair.second.index + 1 else pair.second.index
            list.add(subList(startIndex, endIndex)) // Add the relevant sub-list
        }
    return list
}
Solution 3: Check next value if delimiter found approach
inline fun List<String>.splitWhen(predicate: (String) -> Boolean): List<List<String>> =
    foldIndexed(mutableListOf<MutableList<String>>()) { index, list, string ->
        when {
            // Delimiter: start a new sub-list, unless the next element is also a delimiter or this is the last element
            predicate(string) -> if (index < size - 1 && !predicate(get(index + 1))) list.add(mutableListOf())
            // Not a delimiter: append to the most recently added sub-list
            list.isNotEmpty() -> list.last().add(string)
            // Happens for the first String
            else -> list.add(mutableListOf(string))
        }
        list
    }
Simply call list.splitWhen { it == "delimiter" }. Solution 3 is the most compact of the three. Beyond that, you can run a performance test to see which one performs best.
Note: I have run some brief tests, which you can look at via Kotlin Playground or via a GitHub gist.
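The three solutions above are tied to List<String>. As a minimal sketch (a hypothetical generic variant, not a standard-library function), the same idea can be written for any element type. It assumes, like Solution 1, that delimiters are dropped and that empty groups produced by consecutive delimiters are skipped.

```kotlin
// Hypothetical generic variant of splitWhen; not part of the Kotlin standard library.
fun <T> List<T>.splitWhen(predicate: (T) -> Boolean): List<List<T>> {
    val result = mutableListOf<List<T>>()
    var current = mutableListOf<T>()
    for (element in this) {
        if (predicate(element)) {
            // Delimiter: close the current group, skipping empty groups
            if (current.isNotEmpty()) {
                result.add(current)
                current = mutableListOf()
            }
        } else {
            current.add(element)
        }
    }
    // Keep a trailing group that was not closed by a delimiter
    if (current.isNotEmpty()) result.add(current)
    return result
}

fun main() {
    val list = listOf(
        "This is", "the", "first sentence", ".",
        "And", "now there is", "a second", "one", ".",
        "Nice", "."
    )
    println(list.splitWhen { it == "." })
    // [[This is, the, first sentence], [And, now there is, a second, one], [Nice]]
}
```

Because it is a plain loop, each element is visited once and the predicate is called once per element.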

Related

Compare of values from two lists by use of regular expressions in Kotlin

I have two lists. The first contains the original product data, as follows:
data class InputProductData (val optionFamilyInput: String?, val optionCodeInput: String?, val optionDescriptionInput: String?)
val inputProductData = mutableListOf(
InputProductData("AAA", "111","Chimney with red bricks"),
InputProductData(null,"222","Two wide windows in the main floor"),
InputProductData("CCCC",null,"Beautiful door in green color"),
InputProductData("DDDD",null,"House with area 120 square meters"),
InputProductData(null,"555","Old wood windows")
)
Second list consists of customizing data. The list can have many identical option ids (first column).
data class CustomizingProductOption(val id: Int, val optionName: String, val optionCategory: String, val optionFamily: String?, val optionCode: String?, val searchPattern: String?, val outputValue: String)
val customizingProductOptions = mutableListOf(
CustomizingProductOption(10001, "Chimney", "Additional options", "^AAA$", "", "^Chimney with", "Available"),
CustomizingProductOption(10002, "Windows", "Basic options", "", "^222$", "^Two wide windows", "Available"),
CustomizingProductOption(10002, "Windows", "Basic options", "", "^555$", "wood windows$", "Available"),
CustomizingProductOption(10003, "Door color", "Basic options", "^CCCC$", "", "door in green color$", "green"),
CustomizingProductOption(10004, "House area", "Basic options", "^DDD", "", "120 square meters", "120")
)
The goal is to check the product input data and identify the different product options. In the following loop this is done with some business logic. Two different constellations can occur:
Option family + regex within option description
Option code + regex within option description
data class IndicatedOptions(val id: Int, val output: String)
val indicatedOptions: MutableList<IndicatedOptions> = mutableListOf()
for (i in 0 until inputProductData.size) {
for (k in 0 until customizingProductOptions.size) {
if(inputProductData[i].optionFamilyInput.toString().contains(Regex(customizingProductOptions[k].optionFamily.toString())) == true &&
inputProductData[i].optionDescriptionInput.toString().contains(Regex(customizingProductOptions[k].searchPattern.toString())) == true ||
inputProductData[i].optionCodeInput.toString().contains(Regex(customizingProductOptions[k].optionCode.toString())) == true &&
inputProductData[i].optionDescriptionInput.toString().contains(Regex(customizingProductOptions[k].searchPattern.toString())) == true) {
indicatedOptions.add(IndicatedOptions(customizingProductOptions[k].id, customizingProductOptions[k].outputValue))
}
}
}
println("\n--- ALL INDICATED OPTIONS ---")
indicatedOptions.forEach { println(it) }
val indicatedOptionsUnique = indicatedOptions.distinct().sortedBy { it.id }
println("\n--- UNIQUE INDICATED OPTIONS ---")
indicatedOptionsUnique.forEach {println(it)}
QUESTION: Do you see any way to optimize this code to make it faster?
First, the "regex" code reads backwards. You test whether a String contains a Regex. That does compile in Kotlin (CharSequence.contains(Regex) delegates to Regex.containsMatchIn), but it is more conventional to ask the Regex whether it matches the target string.
Ideas for performance
Precompile your Regex in the constructor of CustomizingProductOption
Your if condition has the shape (A && B) || (C && D). Logical expressions are evaluated left to right and short-circuit, so within each && put the most selective test first (i.e. the one with the fewest matches).
Ideas for readability
Use collection operations such as map and filter, e.g. inputProductData.map { customizingProductOptions.filter { LOGIC } }...
Stop using unnecessary toString() on something that is already a String
Stop testing if a boolean expression ==true
Now with sample code:
// Use the Regex class here
data class CustomizingProductOption(
    val id: Int, val optionName: String, val optionCategory: String,
    val optionFamily: Regex?, val optionCode: Regex?, val searchPattern: String?,
    val outputValue: String,
)
// Instantiate like this:
CustomizingProductOption(
    10001, "Chimney", "Additional options", Regex("^AAA$"),
    null, "^Chimney with", "Available",
),
// Main code
val indicatedOptions: List<IndicatedOptions> = inputProductData.map { productData ->
    customizingProductOptions.filter { option -> // this filter only returns the options that match the product data
        productData.optionFamilyInput != null && option.optionFamily?.containsMatchIn(productData.optionFamilyInput) ?: false
        // && other conditions
    }
    .map { option -> // transform to your desired output
        IndicatedOptions(
            option.id,
            option.outputValue,
        )
    }
}.flatten() // you need this to flatten List<List<IndicatedOptions>>
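Putting the ideas above together, here is a self-contained sketch under stated assumptions: CompiledOption and matchesIn are hypothetical names, all three patterns are precompiled, and a null pattern is taken to mean "this branch does not apply" (unlike the empty-string patterns in the question, which match everything). The description test is shared by both branches, so it is checked once.

```kotlin
data class InputProductData(
    val optionFamilyInput: String?,
    val optionCodeInput: String?,
    val optionDescriptionInput: String?,
)

data class IndicatedOptions(val id: Int, val output: String)

// Hypothetical variant of CustomizingProductOption that holds precompiled patterns.
data class CompiledOption(
    val id: Int,
    val optionFamily: Regex?,
    val optionCode: Regex?,
    val searchPattern: Regex,
    val outputValue: String,
)

// Hypothetical helper: true only when both pattern and value are present and the pattern finds a match.
fun Regex?.matchesIn(value: String?): Boolean =
    this != null && value != null && this.containsMatchIn(value)

fun indicate(inputs: List<InputProductData>, options: List<CompiledOption>): List<IndicatedOptions> =
    inputs.flatMap { input ->
        options
            .filter { option ->
                // Description must match, and then either the family or the code must match.
                option.searchPattern.matchesIn(input.optionDescriptionInput) &&
                    (option.optionFamily.matchesIn(input.optionFamilyInput) ||
                        option.optionCode.matchesIn(input.optionCodeInput))
            }
            .map { IndicatedOptions(it.id, it.outputValue) }
    }.distinct().sortedBy { it.id }

fun main() {
    val inputs = listOf(
        InputProductData("AAA", "111", "Chimney with red bricks"),
        InputProductData(null, "555", "Old wood windows"),
    )
    val options = listOf(
        CompiledOption(10001, Regex("^AAA$"), null, Regex("^Chimney with"), "Available"),
        CompiledOption(10002, null, Regex("^555$"), Regex("wood windows$"), "Available"),
    )
    println(indicate(inputs, options))
    // [IndicatedOptions(id=10001, output=Available), IndicatedOptions(id=10002, output=Available)]
}
```

flatMap replaces the map followed by flatten, and the patterns are compiled once per option rather than once per comparison.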

Kotlin check List contain ignore case

equals has an ignore-case option:
if (bookType.equals(Type.BABY.name, true))
Is there a similar option for contains that ignores case?
val validTypes = listOf("Kids", "Baby")
if (validTypes.contains(bookType))
I see there is the option of doing:
if (bookType.equals(Type.BABY.name, true) || bookType.equals(Type.KIDS.name, true))
But I want a more elegant way.
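You could use a when expression directly on book rather than bookType. The is operator only applies if BABY, KIDS and CHILD are types (for example subclasses of a sealed class); if Type is an enum, as Type.BABY.name suggests, list the entries without is:
val safetyLevel = when (book) {
    Type.BABY, Type.KIDS -> "babies and kids"
    Type.CHILD -> "okay for children"
    else -> "danger!"
}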
See Type checks and casts.
Perhaps you could make the valid types list all uppercase (for example with validTypes.map { it.uppercase() }); then you can do the following:
validTypes.contains(bookType.uppercase())
Or you could use any (https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/any.html)
validTypes.any { bookType.uppercase() == it }
If you want to keep the casing of your original list you could do:
validTypes.any { bookType.replaceFirstChar { it.uppercaseChar() } == it }
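The any approach can also be combined with the ignore-case flag of equals from the question, which avoids converting the case of either side. A minimal sketch; containsIgnoreCase is a hypothetical helper name, not a standard-library function:

```kotlin
// Hypothetical helper: contains-style check that ignores case.
fun List<String>.containsIgnoreCase(value: String): Boolean =
    any { it.equals(value, ignoreCase = true) }

fun main() {
    val validTypes = listOf("Kids", "Baby")
    val bookType = "BABY" // assumed input value for illustration

    println(validTypes.containsIgnoreCase(bookType)) // true
    println(validTypes.containsIgnoreCase("Adult"))  // false
}
```

This keeps the original casing of the list and works regardless of how bookType is capitalized.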

Replace occurrence in a String in Kotlin

I have two lists of Strings. In a sentence, I want to replace every occurrence of the word at index i of the first list with the word at index i of the second list.
So if I have
list a=("am","I","my")
and
list b=("are","You","your")
I want the sentence "I am an amateur"
to become "You are an amateur"
What is the cleanest way to do that in Kotlin (without a for loop)?
First split the string to a list of its words and then map each word if it exists in list a to the corresponding word in list b. Finally rejoin the string:
val a= listOf("am","I","my")
val b= listOf("are","You","your")
val str = "I am an amateur"
val new = str
.split("\\s+".toRegex())
.map { val i = a.indexOf(it); if (i < 0) it else b[i] }
.joinToString(" ")
Another way of doing the same thing is:
var new = " $str "
a.forEachIndexed { i, s -> new = new.replace(" $s ", " ${b[i]} ") }
new = new.trim()
although this is closer to a for loop.
I assume there is no punctuation, all whitespaces are spaces and so on.
val m = a.zip(b).toMap()
return s.split(' ').joinToString(" ") { m[it] ?: it }
First, create a map m for more efficient lookups. Then:
Split the string to get a list of words
Map all words: if m contains the word, then return the value (i.e. the replacement), otherwise return the original word (since we shouldn't replace it).
Join all words, separate them by spaces.
You can use the regular expression \b\w+\b to match words in a sentence and then call replace function with the lambda that provides a replacement string for each match:
val input = "I am an amateur, alas."
val wordsToReplace = listOf("I", "am", "my")
val wordsReplaceWith = listOf("You", "are", "your")
val wordRegex = """\b\w+\b""".toRegex()
val result = wordRegex.replace(input) { match ->
val wordIndex = wordsToReplace.indexOf(match.value)
if (wordIndex >= 0) wordsReplaceWith[wordIndex] else match.value
}
println(result)
If there are a lot of words in your lists, it makes sense to build a map of them to speed up searches:
val replaceMap = (wordsToReplace zip wordsReplaceWith).toMap()
val result = wordRegex.replace(input) { match ->
replaceMap[match.value] ?: match.value
}
I think the simplest way is to create a set of regex-to-replacement pairs and apply them to the string one by one. Say you want to replace the word "am": the regex is "\bam\b". You can use "(?i)\bam\b" if it should not be case sensitive. To turn "I am an amateur" into "You are an amateur":
var str = "I am an amateur"
val replacements = setOf("\\bam\\b" to "are",
    "\\bI\\b" to "You",
    "\\bmy\\b" to "your")
replacements.forEach {
    str = str.replace(Regex(it.first), it.second)
}
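One thing to watch with replacing pair by pair is that a later pair can rewrite the output of an earlier one (for example "a" to "b" followed by "b" to "a"). A minimal single-pass sketch that avoids this, assuming whole-word, case-sensitive matching and a non-empty map; replaceWords is a hypothetical helper name:

```kotlin
// Hypothetical helper: replaces whole words in a single pass over the input.
fun replaceWords(input: String, replacements: Map<String, String>): String {
    // Build one alternation such as \b(?:am|I|my)\b, escaping each key so it is matched literally
    val alternatives = replacements.keys.joinToString("|") { Regex.escape(it) }
    val regex = Regex("\\b(?:$alternatives)\\b")
    // Each match is looked up once; already replaced text is never scanned again
    return regex.replace(input) { match -> replacements[match.value] ?: match.value }
}

fun main() {
    val replacements = mapOf("am" to "are", "I" to "You", "my" to "your")
    println(replaceWords("I am an amateur", replacements)) // You are an amateur
}
```

With this approach a swap such as mapOf("a" to "b", "b" to "a") turns "a b" into "b a", whereas applying the two replacements in sequence would not.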

Non-empty iterator over regex groups becomes empty array

I have this strange situation: when I print regex groups to the console, they show up. When I convert this iterator to an array, it's empty. The following code doesn't print anything:
val str = "buy--751-rates.data"
val expr = "--(.+)-rates.data".r
val target = Array[String]()
expr.findAllIn(str).matchData map(m => m group 1) copyToArray(target, 0, 4)
target foreach { println }
But this snippet works:
val str = "buy--751-rates.data"
val expr = "--(.+)-rates.data".r
println("Scala matches:")
expr.findAllIn(str).matchData foreach {
m => println(m group 1)
}
I guess I missed something simple
You didn't get anything because you were copying into a zero-length array. You don't actually need to do that, as the iterator has a toArray method that converts it to an array, and from that you can get the head value if you want. For example:
(expr.findAllIn(str).matchData).map(m => m group 1).toArray.head

Scala Map a list of items to a value

I have a list of bigrams for each sentence and a separate list of relevantbigrams. If any of the relevantbigrams is present in a sentence, I want to return that sentence. I was thinking of implementing it as follows: map each bigram in the list to the sentence it comes from, then search on the key and return the value.
example:
relevantbigrams = (This is, is not, not what)
bigrams List(list(This of, of no, no the),list(not what, what is))
So each list is a bigram of separate sentences. Here "not what" from the second sentence matches, so I would like to return the second sentence. I am planning to have a map of Map("This of" -> "This of no the", "of no" ->"This of no the", "not what"->"not what is"). etc. and return the sentences that match on relevant bigram, so here I return "not what is"
This is my code:
val bigram = usableTweets.map(x =>Tokenize(x).sliding(2).flatMap{case Vector(x,y) => List(x+" "+y)}.map(z => z, x))
for(i<- 0 to relevantbigram.length)
if(bigram.contains(relevantbigram(i)))) bigram.get(relevantbigram(i))
else useableTweets.head
You got the order of flatMap and map the wrong way around:
val bigramMap = usableTweets.flatMap { x =>
  x.split(" ").sliding(2).
    map(bg => bg.mkString(" ") -> x)
}.toMap
Then you can do your search like this:
relevantbigrams collect { case rb if bigramMap contains rb => bigramMap(rb) }
Or
val found =
  for {
    rb <- relevantbigrams
    sentence <- bigramMap get rb
  } yield sentence
Both should give you a list, but from your code it appears you want to default to the first sentence if your search found nothing:
found.headOption.getOrElse(usableTweets.head)