Scala: how to split using more than one delimiter - list

I would like to know how I can split a string using more than one delimiter with Scala.
For instance, if I have a list of delimiters:
List("Car", "Red", "Boo", "Foo")
And a string to harvest :
Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed
I would like to be able to output something like :
List( ("Car", " foerjfpoekrfopekf "),
("Red", " ezokdpzkdpoedkzopke dekpzodk "),
("Foo", " azdkpodkzed")
)

You can use the list to create a regular expression and use its split method:
val regex = List("Car", "Red", "Boo", "Foo").mkString("|").r
regex.split("Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed")
That, however, doesn't tell you which delimiter was used where. If you need that, I suggest you try Scala's parser combinator library.
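If you need to know which delimiter preceded each chunk, one option that stays with regular expressions is to look at the match positions instead of calling split. This is only a sketch: the names MultiSplit and splitKeepingDelimiters are made up for the example, and it assumes the delimiters should be matched literally (hence Regex.quote).

```scala
import scala.util.matching.Regex

object MultiSplit {
  // Hypothetical helper: pairs every delimiter occurrence with the text that follows it.
  def splitKeepingDelimiters(s: String, delimiters: List[String]): List[(String, String)] = {
    // Quote each delimiter so regex metacharacters are matched literally.
    val regex = delimiters.map(Regex.quote).mkString("|").r
    val matches = regex.findAllMatchIn(s).toList
    // Each chunk ends where the next delimiter starts, or at the end of the string.
    val ends = matches.drop(1).map(_.start) :+ s.length
    matches.zip(ends).map { case (m, end) => (m.matched, s.substring(m.end, end)) }
  }
}
```

Text that appears before the first delimiter is ignored by this sketch, and a delimiter that occurs several times produces one pair per occurrence.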
EDIT:
Or you can use regular expressions to extract one pair at a time like this:
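import scala.util.matching.Regex
def split(s: String, l: List[String]): List[(String, String)] = {
  // Quote each delimiter so regex metacharacters in it are matched literally
  val delimRegex = l.map(Regex.quote).mkString("|")
  // Groups: delimiter, lazily matched text, then optionally the rest starting at the next delimiter
  val R = ("(" + delimRegex + ")(.*?)((" + delimRegex + ").*)?").r
  s match {
    // rest is null when no further delimiter follows, so wrap it in Option
    case R(delim, text, rest, _) => (delim, text) :: split(Option(rest).getOrElse(""), l)
    case _ => Nil
  }
}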

a bit verbose, but it works:
DEPRECATED VERSION: (it has a bug, left it here because you already accepted the answer)
def f(s: String, l: List[String], g: (String, List[String]) => Int) = {
for {
t <- l
if (s.contains(t))
w = s.drop(s.indexOf(t) + t.length)
} yield (t, w.dropRight(w.length - g(w, l)))
}
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def g(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), g(s, xs))
}
val l = List("Car", "Red", "Boo", "Foo")
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed"
output:
f(s, l, g).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
f already returns a List[(String, String)], because the for comprehension iterates over the List l, so no toList conversion is needed.
EDIT:
I just noticed this code only works if each delimiter appears once in the string. If I had defined s as follows:
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed Car more..."
I'd still get the same result, instead of an additional pair ("Car", " more...")
EDIT#2: BUGLESS VERSION here's the fixed snippet:
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def multiSplit(str: String, delimiters: List[String]): List[(String, String)] = {
val del = nextDelimiter(str, delimiters)
del._1 match {
case None => Nil
case Some(x) => {
val tmp = str.drop(x.length)
val current = tmp.dropRight(tmp.length - nextDelIndex(tmp,delimiters))
(x, current) :: multiSplit(str.drop(x.length + current.length), delimiters)
}
}
}
def nextDelIndex(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), nextDelIndex(s, xs))
}
def nextDelimiter(str: String, delimiters: List[String]): (Option[String], Int) = delimiters match {
case Nil => (None, -1)
case x :: xs => {
val next = nextDelimiter(str, xs)
if (str.contains(x)) {
val i = str.indexOf(x)
next._1 match {
case None => (Some(x), i)
case _ => if (next._2 < i) next else (Some(x), i)
}
} else next
}
}
output:
multiSplit(s, l).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
> (Car, more...)
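and now it works :) Note that multiSplit assumes the string starts with one of the delimiters, because it drops x.length characters from the front without checking where the delimiter was found.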

Related

Get a unique string from string split

I want to get a List[String] from the input. Please help me to find an elegant way.
Desired output:
emp1,emp2
My code:
val ls = List("emp1.id1", "emp2.id2","emp2.id3","emp1.id4")
def myMethod(ls: List[String]): Unit = {
ls.foreach(i => print(i.split('.').head))
}
(myMethod(ls)). //set operation to make it unique ??
If you care about validation, you can consider using Regex:
val ls = List("emp1.id1", "emp2.id2","emp2.id3","emp1.id4","boom")
def myMethod(ls: List[String]) = {
val empIdRegex = "([\\w]+)\\.([\\w]+)".r
val employees = ls collect { case empIdRegex(emp, _) => emp }
employees.distinct
}
println(myMethod(ls))
Outputs:
List(emp1, emp2)
def myMethod(ls: List[String]) =
ls.map(_.takeWhile(_ != '.'))
myMethod(ls).distinct
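The two approaches above differ in how they treat malformed entries such as "boom". A small side-by-side sketch makes this visible; the object name UniquePrefixes and the method names validated and lenient are hypothetical, chosen only for this comparison.

```scala
object UniquePrefixes {
  // Same pattern as the regex answer: word characters, a dot, word characters.
  private val EmpId = """(\w+)\.(\w+)""".r

  // Keeps only entries that match the pattern, then removes duplicates.
  def validated(ls: List[String]): List[String] =
    ls.collect { case EmpId(emp, _) => emp }.distinct

  // Takes everything before the first dot; entries without a dot are kept whole.
  def lenient(ls: List[String]): List[String] =
    ls.map(_.takeWhile(_ != '.')).distinct
}
```

With the list that includes "boom", validated drops that entry while lenient keeps it, so which one fits depends on whether the input can be trusted.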
Since Scala 2.13, you can use List.unfold to do this:
List.unfold(ls) {
case Nil =>
None
case x :: xs =>
Some((x.takeWhile(_ != '.'), xs))
}.distinct
Please note that you want distinct values, so you can achieve the same result using Set.unfold:
Set.unfold(ls) {
case Nil =>
None
case x :: xs =>
Some((x.takeWhile(_ != '.'), xs))
}
Code run at Scastie.

Combining two lists of objects into one based on a business logic with Scala

In continuation of Scala learning curve
I have two lists of objects. I need to merge these lists into one list, while applying some logic to matching pairs.
So, for example, here are the two lists:
case class test(int: Int, str: String)
val obj1 = test(1, "one")
val obj2 = test(2, "two")
val list1 = List(obj1, obj2)
val obj3 = test(2, "Another two")
val obj4 = test(4, "four")
val list2 = List(obj1, obj2)
What I need is:
List(test(1, "one old"), test(2, "Another two updated"), test(4, "four new"))
Of course, I can iterate through all elements in an old-fashioned way and do all the conversions there, but that is not the "Scala way" (I guess).
I tried approaching it with foldLeft, but got stuck. Here is what I have that is not working:
list1.foldLeft(list2) { (a:test, b:test) =>
b.int match {
case a.int => {
//Apply logic and create new object
}
}
}
UPDATE
For now I did it in two steps:
var tasks : Seq[ChecklistSchema.Task] = left.tasks.map((task:ChecklistSchema.Task) =>
right.tasks.find(t => t.groupId == task.groupId) match {
case Some(t: ChecklistSchema.Task) => t
case _ => {
task.status match {
case TaskAndValueStatus.Active => task.copy(status = TaskAndValueStatus.Hidden)
case _ => task
}
}
}
)
tasks = tasks ++ right.tasks.filter((t:ChecklistSchema.Task) => !tasks.contains(t))
There has got to be a better approach!
Thanks,
*Assuming val list2 = List(obj3, obj4).
Here's my approach to this:
Apply "old" to all list1 entries
Create a map for list2 in order to efficiently check (in the next method) if a value came from list2. (breakOut here instructs the compiler to build the Map directly, without an intermediate collection; it is available up to Scala 2.12 and was removed in 2.13. More at https://stackoverflow.com/a/7404582/4402547)
applyLogic decides what to call a not-old test ("new" or "updated")
Put them together, groupBy on the index, applyLogic, and sort (optional).
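import scala.collection.breakOut // Scala 2.12 and earlier; removed in 2.13
case class Test(int: Int, str: String)
def merge(left: List[Test], right: List[Test]) = {
  val old = left.map(t => Test(t.int, t.str + " old"))
  // Index the right-hand list by key so applyLogic can tell where a lone entry came from
  val l2Map: Map[Int, Test] = right.map(t => t.int -> t)(breakOut)
  def applyLogic(idx: Int, tests: List[Test]): Test = {
    tests.size match {
      case 1 => {
        val test = tests.head
        if (l2Map.contains(test.int)) Test(test.int, test.str + " new") else test
      }
      case 2 => {
        // groupBy keeps element order within a group, so the second entry is the one from right
        val updated = tests(1)
        Test(updated.int, updated.str + " updated")
      }
    }
  }
  (old ++ right).groupBy(t => t.int).map(f => applyLogic(f._1, f._2)).toList.sortBy(t => t.int)
}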
val left = List(Test(1, "one"), Test(2, "two"))
val right = List(Test(2, "Another two"), Test(4, "four"))
val result = List(Test(1, "one old"), Test(2, "Another two updated"), Test(4, "four new"))
assert(merge(left, right) == result)
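Because breakOut no longer exists in Scala 2.13, here is a hedged sketch of the same merge rule that relies only on toMap and toSet. It is not the code from the answer: the object name MergeSketch is made up, and it assumes each key appears at most once per list.

```scala
object MergeSketch {
  final case class Test(int: Int, str: String)

  def merge(left: List[Test], right: List[Test]): List[Test] = {
    // Look-up structures so each membership check is a hash lookup.
    val rightByKey: Map[Int, Test] = right.map(t => t.int -> t).toMap
    val leftKeys: Set[Int] = left.map(_.int).toSet

    // Entries from left: replaced by the matching right entry ("updated") or labeled "old".
    val fromLeft = left.map { t =>
      rightByKey.get(t.int) match {
        case Some(r) => r.copy(str = r.str + " updated")
        case None    => t.copy(str = t.str + " old")
      }
    }
    // Entries that exist only in right are labeled "new".
    val onlyRight = right.filterNot(t => leftKeys(t.int)).map(t => t.copy(str = t.str + " new"))
    fromLeft ++ onlyRight
  }
}
```

The result keeps the order of left followed by the unmatched entries of right, so no final sort is needed for the example data.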
I don't know if this solution is the "Scala way", but it uses foldLeft.
case class Test(a: Int, b: String) {
def labeled(label: String) = copy(b = b + " " + label)
}
def merge(left: List[Test], right: List[Test]) = {
val (list, updated) = left.foldLeft((List[Test](), Set[Int]())) { case ((acc, founded), value) =>
right.find(_.a == value.a) match {
case Some(newValue) => (newValue.labeled("updated") :: acc, founded + value.a)
case None => (value.labeled("old") :: acc, founded)
}
}
list.reverse ::: right.filterNot(test => updated(test.a)).map(_.labeled("new"))
}
val left = List(Test(1, "one"), Test(2, "two"))
val right = List(Test(2, "Another two"), Test(4, "four"))
val result = List(Test(1, "one old"), Test(2, "Another two updated"), Test(4, "four new"))
assert(merge(left, right) == result)
(list1 ++ (list2.map(l => l.copy(str = l.str + " new")))).groupBy(_.int).map(
l =>
if (l._2.size >= 2) {
test(l._2(0).int, "Another two updated")
} else l._2(0)
)
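This uses map to label the values from list2 as new and groupBy to find the keys that occur in both lists. Note that the "updated" string is hardcoded in this snippet and that entries found only in list1 are not labeled "old".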

Function always return Nil

I am trying to solve some anagram assignments, and I can't figure out why my sentenceAnagrams function always returns List(). Any help?
type Word = String
type Sentence = List[Word]
type Occurrences = List[(Char, Int)]
def combinations(occurrences: Occurrences): List[Occurrences] = occurrences match {
case Nil => List(Nil)
case x :: xs => (for {z <- combinations(xs); i <- 1 to x._2} yield (x._1, i) :: z).union(combinations(xs))
}
def subtract(x: Occurrences, y: Occurrences): Occurrences = {
if (y.isEmpty) x
else {
val yMap = y.toMap withDefaultValue 0
x.foldLeft(x) { (z, i) => if (combinations(x).contains(y)) {
val diff = i._2 - yMap.apply(i._1)
if (diff > 0) z.toMap.updated(i._1, diff).toList else z.toMap.-(i._1).toList
} else z
}
}}
--
def sentenceAnagrams(sentence: Sentence): List[Sentence] = {
def sentenceAnag(occ: Occurrences): List[Sentence] =
if (occ.isEmpty) List(List())
else (for {
comb <- combinations(occ)
word <- (dictionaryByOccurrences withDefaultValue List()).apply(comb)
otherSentence <- sentenceAnag(subtract(occ, comb))
} yield word :: otherSentence).toList
sentenceAnag(sentenceOccurrences(sentence))
}
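The helpers dictionaryByOccurrences and sentenceOccurrences are not shown, so the cause can only be guessed at. One thing worth checking is that subtract round-trips through toMap and toList, which does not guarantee that the result stays sorted; if dictionaryByOccurrences is keyed by sorted occurrence lists, the lookups in sentenceAnag could then miss. Below is a minimal sketch of a subtract that keeps the order of x, assuming y is a subset of x (the object name OccurrencesSketch is hypothetical).

```scala
object OccurrencesSketch {
  type Occurrences = List[(Char, Int)]

  // Subtracts the counts in y from x and drops characters whose count reaches zero.
  // The order of x is preserved, so a sorted input gives a sorted output.
  def subtract(x: Occurrences, y: Occurrences): Occurrences = {
    val yMap = y.toMap.withDefaultValue(0)
    x.map { case (ch, n) => (ch, n - yMap(ch)) }
      .filter { case (_, n) => n > 0 }
  }
}
```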

Scala: List Operations

In Scala, define the function slice(from, until, xs) that selects an interval of elements from the (string) list xs such that for each element x in the interval the following invariant holds: from <= indexOf(x) < until.
from: the lowest index to include from this list.
until: the highest index to exclude from this list.
returns: a list containing the elements greater than or equal to index from extending up to (but not including) index until of this list.
example:
def test {
expect (Cons("b", Cons("c", Nil()))) {
slice(1, 3, Cons("a", Cons("b", Cons("c", Cons("d", Cons("e", Nil()))))))
}
}
another example:
def test {
expect (Cons("d", Cons("e", Nil()))) {
slice(3, 7, Cons("a", Cons("b", Cons("c", Cons("d", Cons("e", Nil()))))))
}
}
This is what I have so far, but it's not quite correct. Can someone help me with it?
abstract class StringList
case class Nil() extends StringList
case class Cons(h: String, t: StringList) extends StringList
object Solution {
// define function slice
def slice(from : Int, until : Int, xs : StringList) : StringList = (from, until, xs) match {
case (_,_,Nil()) => Nil()
case (n, m, _) if(n == m) => Nil()
case (n, m, _) if(n > m) => Nil()
case (n, m, _) if(n < 0) => Nil()
case (n, m, xs) if(n == 0)=> Cons(head(xs), slice(n+1,m,tail(xs)))
case (n, m, xs) => {
//Cons(head(xs), slice(n+1,m,tail(xs)))
if(n == from) {
print("\n")
print("n == m " + Cons(head(xs), slice(n+1,m,tail(xs))))
print("\n")
Cons(head(xs), slice(n+1,m,tail(xs)))
}
else slice(n+1,m,tail(xs))
}
}
def head(t : StringList) : String = t match {
case Nil() => throw new NoSuchElementException
case Cons(h, t) => h
}
def tail(t : StringList) : StringList = t match {
case Nil() => Nil()
case Cons(h, t) => t
}
/* def drop(n : Int, t : StringList) : StringList = (n, t) match {
case (0, t) => t
case (_, Nil()) => Nil()
case (n, t) => drop(n-1 , tail(t))
}*/
}//
This works: add a method to find the element at a given index.
trait StringList
case class Nil() extends StringList
case class Cons(h: String, t: StringList) extends StringList
object Solution {
def numberOfElements(str: StringList, count: Int = 0): Int = {
if (str == Nil()) count else numberOfElements(tail(str), count + 1)
}
def elemAtIndex(n: Int, str: StringList, count: Int = 0): String = {
if (str == Nil() || n == count) head(str) else elemAtIndex(n, tail(str), count + 1)
}
def head(str: StringList): String = str match {
case Nil() => throw new NoSuchElementException
case Cons(h, t) => h
}
def tail(str: StringList): StringList = str match {
case Nil() => Nil()
case Cons(h, t) => t
}
// define function slice
def slice(from: Int, until: Int, xs: StringList): StringList = (from, until, xs) match {
case (n, m: Int, _) if n == m || n > m || n < 0 => Nil()
case (n, m: Int, xs: StringList) =>
if (m > numberOfElements(xs)) {
slice(n, numberOfElements(xs), xs)
} else {
Cons(elemAtIndex(n, xs), slice(n + 1, m, xs))
}
}
}
scala> Solution.slice(1, 3, Cons("a", Cons("b", Cons("c", Cons("d", Cons("e", Nil()))))))
res0: StringList = Cons(b,Cons(c,Nil()))
scala> Solution.slice(3, 7, Cons("a", Cons("b", Cons("c", Cons("d", Cons("e", Nil()))))))
res1: StringList = Cons(d,Cons(e,Nil()))
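The version above recounts the list and walks to index n on every step. As an alternative, here is a sketch that walks the list once by decrementing both bounds; it is an assumption about what is wanted, not part of the original answer, and the object name SliceSketch is made up. Unlike the code above, a negative from is treated like 0, which is how the standard List.slice behaves.

```scala
object SliceSketch {
  sealed trait StringList
  case class Nil() extends StringList
  case class Cons(h: String, t: StringList) extends StringList

  def slice(from: Int, until: Int, xs: StringList): StringList = xs match {
    case Nil()                             => Nil()
    // Nothing left to take: the interval is empty or already passed.
    case _ if until <= 0 || from >= until  => Nil()
    // Still before the interval: skip the head and shift both bounds.
    case Cons(_, t) if from > 0            => slice(from - 1, until - 1, t)
    // Inside the interval: keep the head and continue with the tail.
    case Cons(h, t)                        => Cons(h, slice(0, until - 1, t))
  }
}
```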

Scala sort list based on second attribute and then first

I wish to sort a list containing (word, word.length) first based on length and then words alphabetically. So given: "I am a girl" the output should be a:1, I:1, am:2, girl:4
I have the following piece of code which works but not for all examples
val lengths = words.map(x => x.length)
val wordPairs = words.zip(lengths).toList
val mapwords = wordPairs.sort (_._2 < _._2).sortBy(_._1)
You can sort by tuple:
scala> val words = "I am a girl".split(" ")
words: Array[java.lang.String] = Array(I, am, a, girl)
scala> words.sortBy(w => w.length -> w)
res0: Array[java.lang.String] = Array(I, a, am, girl)
scala> words.sortBy(w => w.length -> w.toLowerCase)
res1: Array[java.lang.String] = Array(a, I, am, girl)
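The attempt in the question sorts by length first and by word second. Since sortBy is stable, the last sort decides the primary order, so the result ends up ordered by word. Sorting once by a (length, word) tuple avoids that. The sketch below produces the (word, length) pairs the question asks for; the object name SortWords and the method sortedPairs are hypothetical, and comparing words case-insensitively is an assumption based on the desired output a:1, I:1.

```scala
object SortWords {
  // Pairs each word with its length, ordered by length and then alphabetically.
  def sortedPairs(sentence: String): List[(String, Int)] =
    sentence.split(" ").toList
      .map(w => (w, w.length))
      .sortBy { case (w, len) => (len, w.toLowerCase) }

  // Renders the pairs in the word:length format used in the question.
  def render(pairs: List[(String, Int)]): String =
    pairs.map { case (w, len) => s"$w:$len" }.mkString(", ")
}
```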
You can do that in one line:
"I am a girl".toLowerCase.split(" ").map(x => (x, x.length)).sortWith { (x: (String, Int), y: (String, Int)) => x._2 < y._2 || (x._2 == y._2 && x._1 < y._1) }
or in two lines:
val wordPairs = "I am a girl".toLowerCase.split(" ").map(x => (x, x.length))
val result = wordPairs.sortWith { (x: (String, Int), y: (String, Int)) => x._2 < y._2 || (x._2 == y._2 && x._1 < y._1) }