Scala Map a list of items to a value - list

I have a list of bigrams of a sentence and another original list of relevantbigrams, I want to check that if any of the relevantbigrams are present in the sentences then I want to return the sentence. I was thinking of implementing it as follows: map each of the bigrams in the list to the sentence they come from then do a search on the key an return the value.
example:
relevantbigrams = (This is, is not, not what)
bigrams List(list(This of, of no, no the),list(not what, what is))
So each list is a bigram of separate sentences. Here "not what" from the second sentence matches, so I would like to return the second sentence. I am planning to have a map of Map("This of" -> "This of no the", "of no" ->"This of no the", "not what"->"not what is"). etc. and return the sentences that match on relevant bigram, so here I return "not what is"
This is my code:
val bigram = usableTweets.map(x =>Tokenize(x).sliding(2).flatMap{case Vector(x,y) => List(x+" "+y)}.map(z => z, x))
for(i<- 0 to relevantbigram.length)
if(bigram.contains(relevantbigram(i)))) bigram.get(relevantbigram(i))
else useableTweets.head

You got the order or flatMap and map the wrong way around:
val bigramMap = usableTweets.flatMap { x =>
x.split(" ").sliding(2).
map(bg => bg.mkString(" ") -> x)
} toMap
Then you can do your search like this:
relevantbigrams collect { rb if theMap contains rb => bigramMap(rb) }
Or
val found =
for {
rb <- relevantbigrams
sentence <- theMap get rb
} yield sentence
Both should give you a list, but from your code it appears you want to default to the first sentence if your search found nothing:
found.headOption.getOrElse(usableTweets.head)

Related

Split list when predicate is true

Does Kotlin provide a mutation function to split a list when a specific predicate is true?
In the following example the list should be split when the element is a ..
The result should be of the type List<List<String>>.
// input list
val list = listOf(
"This is", "the", "first sentence", ".",
"And", "now there is", "a second", "one", ".",
"Nice", "."
)
// the following should be the result of the transformation
listOf(
listOf("This is", "the", "first sentence"),
listOf("And", "now there is", "a second", "one"),
listOf("Nice")
)
I need something like list.splitWhen { it == "." }
Does Kotlin provide a mutation function to split a list when a
specific predicate is true?
The closest one I have heard of is partition(), however I don't think it will work in your case.
I have made and have briefly tested 3 higher order extension functions, which gives the same expected output.
Solution 1: Straightforward approach
inline fun List<String>.splitWhen(predicate: (String)->Boolean):List<List<String>> {
val list = mutableListOf<MutableList<String>>()
var needNewList = false
forEach {
string->
if(!predicate(string)){
if(needNewList||list.isEmpty()){
list.add(mutableListOf(string))
needNewList= false
}
else {
list.last().add(string)
}
}
else {
/* When a delimiter is found */
needNewList = true
}
}
return list
}
Solution 2: Pair based approach
inline fun List<String>.splitWhen(predicate: (String)->Boolean):List<List<String>> {
val list = mutableListOf<List<String>>()
withIndex()
.filter { indexedValue -> predicate(indexedValue.value) || indexedValue.index==0 || indexedValue.index==size-1} // Just getting the delimiters with their index; Include 0 and last -- so to not ignore it while pairing later on
.zipWithNext() // zip the IndexValue with the adjacent one so to later remove continuous delimiters; Example: Indices : 0,1,2,5,7 -> (0,1),(1,2),(2,5),(5,7)
.filter { pair-> pair.first.index + 1 != pair.second.index } // Getting rid of continuous delimiters; Example: (".",".") will be removed, where "." is the delimiter
.forEach{pair->
val startIndex = if(predicate(pair.first.value)) pair.first.index+1 else pair.first.index // Trying to not consider delimiters
val endIndex = if(!predicate(pair.second.value) && pair.second.index==size-1) pair.second.index+1 else pair.second.index // subList() endIndex is exclusive
list.add(subList(startIndex,endIndex)) // Adding the relevant sub-list
}
return list
}
Solution 3: Check next value if delimiter found approach
inline fun List<String>.splitWhen(predicate: (String)-> Boolean):List<List<String>> =
foldIndexed(mutableListOf<MutableList<String>>(),{index, list, string->
when {
predicate(string) -> if(index<size-1 && !predicate(get(index+1))) list.add(mutableListOf()) // Adds a new List within the output List; To prevent continuous delimiters -- !predicate(get(index+1))
list.isNotEmpty() -> list.last().add(string) // Just adding it to lastly added sub-list, as the string is not a delimiter
else -> list.add(mutableListOf(string)) // Happens for the first String
}
list})
Simply call list.splitWhen{it=="delimiter"}. Solution 3 looks more syntactic sugar. Apart from it, you can do some performance test to check which one performs well.
Note: I have done some brief tests which you can have a look via Kotlin Playground or via Github gist.

Replace occurence in a String in Kotlin

I have two list of Strings. Now I want to replace every occurence of a word in the first list at index i with a word in the second list at index i of a sentence.
So if I have
list a=("am","I","my")
and
list b=("are","You","your")
I want the sentence "I am an amateur"
to become "You are an amateur"
What is cleanest way to do that in Kotlin (without for loop)?
First split the string to a list of its words and then map each word if it exists in list a to the corresponding word in list b. Finally rejoin the string:
val a= listOf("am","I","my")
val b= listOf("are","You","your")
val str = "I am an amateur"
val new = str
.split("\\s+".toRegex())
.map { val i = a.indexOf(it); if (i < 0) it else b[i] }
.joinToString(" ")
Another way of doing the same thing is:
var new = " $str "
a.forEachIndexed { i, s -> new = new.replace(" $s ", " ${b[i]} ") }
new = new.trim()
although this is closer to a for loop.
I assume there is no punctuation, all whitespaces are spaces and so on.
val m = a.zip(b).toMap()
return s.split(' ').joinToString(" ") { m[it] ?: it }
First you create a map m for more efficient... mapping. Then
Split the string to get a list of words
Map all words: if m contains the word, then return the value (i.e. the replacement), otherwise return the original word (since we shouldn't replace it).
Join all words, separate them by spaces.
You can use the regular expression \b\w+\b to match words in a sentence and then call replace function with the lambda that provides a replacement string for each match:
val input = "I am an amateur, alas."
val wordsToReplace = listOf("I", "am", "my")
val wordsReplaceWith = listOf("You", "are", "your")
val wordRegex = """\b\w+\b""".toRegex()
val result = wordRegex.replace(input) { match ->
val wordIndex = wordsToReplace.indexOf(match.value)
if (wordIndex >= 0) wordsReplaceWith[wordIndex] else match.value
}
println(result)
If there are a lot of word in your lists, it makes sense to build a map of them to speed up searches:
val replaceMap = (wordsToReplace zip wordsReplaceWith).toMap()
val result = wordRegex.replace(input) { match ->
replaceMap[match.value] ?: match.value
}
I think the simplest way is to create a set of regex you want and replace the string by iteration. Let's say you want to replace the word "am", your regex will be "\bam\b". You can use "(?i)\bam\b" if you want it not to be case sensitive. To make "I am an amateur" to "You are an amateur"
val replacements = setOf("\\bam\\b" to "are",
"\\bI\\b" to "You",
"\\bmy\\b" to "your")
replacements.forEach {
str = str.replace(Regex(it.first), it.second)
}

Scala Spark count regex matches in a file

I am learning Spark+Scala and I am stuck with this problem. I have one file that contains many sentences, and another file with a large number of regular expressions. Both files have one element per line.
What I want is to count how many times each regex has a match in the whole sentences file. For example if the sentences file (after becoming an array or list) was represented by ["hello world and hello life", "hello i m fine", "what is your name"], and the regex files by ["hello \\w+", "what \\w+ your", ...] then I would like the output to be something like: [("hello \\w+", 3),("what \\w+ your",1), ...]
My code is like this:
object PatternCount_v2 {
def main(args: Array[String]) {
// The text where we will find the patterns
val inputFile = args(0);
// The list of patterns
val inputPatterns = args(1)
val outputPath = args(2);
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
// Load the text file
val textFile = sc.textFile(inputFile).cache()
// Load the patterns
val patterns = Source.fromFile(inputPatterns).getLines.map(line => line.r).toList
val patternCounts = textFile.flatMap(line => {
println(line)
patterns.foreach(
pattern => {
println(pattern)
(pattern,pattern.findAllIn(line).length )
}
)
}
)
patternCounts.saveAsTextFile(outputPath)
}}
But the compiler complains:
If I change the flatMap to just map the code runs but returns a bunch of empty tuples () () () ()
Please help! This is driving me crazy.
Thanks,
As far as I can see, there are two issues here:
You should use map instead of foreach: foreach returns Unit, it performs an action with a potential side effect on each element of a collection, it doesn't return a new collection. map on the other hand transform a collection into a new one by applying the supplied function to each element
You're missing the part where you aggregate the results of flatMap to get the actual count per "key" (pattern). This can be done easily with reduceByKey
Altogether - this does what you need:
val patternCounts = textFile
.flatMap(line => patterns.map(pattern => (pattern, pattern.findAllIn(line).length)))
.reduceByKey(_ + _)

Regex parse a list of comma separated number ranges, and capture them in individual groups

I found the following regex for matching comma separated numbers or ranges of numbers:
val reg = """^(\d+(-\d+)?)(,\s*(\d+(-\d+)?))*$""".r
While this does match valid strings, I only get one String out of it, instead of a list of strings, each corresponding to one of the separated entries. E.g.
reg.findAllIn("1-2, 3").map(s => s""""$s"""").toList
Gives
List("1-2, 3")
But I want:
List("1-2", "3")
The following comes closer:
val list = "1-2, 3" match {
case Reg(groups # _*) => groups
case _ => Nil
}
list.map(s => s""""$s"""")
But it contains all sorts of 'garbage':
List("1-2", "-2", ", 3", "3", "null")
With findAllIn you should not try to match the entire string. It will split by the biggest continuos match it can find. Instead what you need is just a part of your regex:
val reg = """(\d+(-\d+)?)""".r
If you use this with findAllIn it will return what you need.
scala> val x = """(\d+(-\d+)?)""".r
x: scala.util.matching.Regex = (\d+(-\d+)?)
scala> x.findAllIn("1-2, 3").toList
res0: List[String] = List(1-2, 3)

ViM: how to put string from input dialog in a list

VIM: Does anyone know how to put a string from an input dialog in a list?
p.e.:
the string "3,5,12,15"
to:
list item[1] = 3
list item[2] = 5
list item[3] = 12
etc.
and how can I know how many list items there are?
From :h E714
:let l = len(list) " number of items in list
:let list = split("a b c") " create list from items in a string
In your case,
let string = "3,5,7,19"
let list = split(string, ",")
echo len(list)
Use split, len and empty functions:
let list=split(string, ',')
let list_length=len(list)
" If all you want is to check whether list is empty:
if empty(list)
throw "You must provide at least one value"
endif
Note that if you want to get a list of numbers out of the string, you will have to use map to transform list elements into numbers:
let list=map(split(string, ','), '+v:val')
Most of time you can expect strings be transformed into numbers, but sometimes such transformation is not done.