Scala memory issue on List vs. Vector

I wrote a solution to Project Euler problem #59 in Scala, and I do not understand why switching from Vector to List introduces what looks like a memory leak.
Here is a working, brute force solution using Vectors.
val code = scala.io.Source.fromFile("e59.txt").getLines()
.flatMap(l => l.split(',')).map(_.toInt).toVector
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toVector
def decode(k: Int)(code: Vector[Int])(pswd: Vector[Int]): Vector[Int] = {
code.grouped(k).flatMap(cs => cs.toVector.zip(pswd).map(t => t._1 ^ t._2)).toVector
}
def scoreText(text: Vector[Int]): Int = {
if (text.contains((c: Int) => (c < 0 || c > 128))) -1
else {
val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
words.length - words.diff(commonWords).length
}
}
lazy val psswds = for {
a <- (97 to 122);
b <- (97 to 122);
c <- (97 to 122)
} yield Vector(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
println(ans)
I store original code (a collection of ordered ints), each password and some common English words as Vectors.
However, if I replace Vector with List, my program slows down with each checked password and eventually crashes:
val code = scala.io.Source.fromFile("e59.txt").getLines()
.flatMap(l => l.split(',')).map(_.toInt).toList
val commonWords = scala.io.Source.fromFile("common_words.txt").getLines().toList
def decode(k: Int)(code: List[Int])(pswd: List[Int]): List[Int] = {
println(pswd)
code.grouped(k).flatMap(cs => cs.toList.zip(pswd).map(t => t._1 ^ t._2)).toList
}
def scoreText(text: List[Int]): Int = {
if (text.contains((c: Int) => (c < 0 || c > 128))) -1
else {
val words = text.map(_.toChar).mkString.toLowerCase.split(' ')
words.length - words.diff(commonWords).length
}
}
lazy val psswds = for {
a <- (97 to 122);
b <- (97 to 122);
c <- (97 to 122)
} yield List(a, b, c)
val ans = psswds.toStream.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
println(ans)
Error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.valueOf(String.java:2861)
at java.lang.Character.toString(Character.java:4439)
at java.lang.String.valueOf(String.java:2847)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:349)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:342)
at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:308)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:310)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:312)
at scala.collection.AbstractTraversable.mkString(Traversable.scala:104)
at Main$$anon$1.Main$$anon$$scoreText(e59_list.scala:14)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at Main$$anon$1$$anonfun$5.apply(e59_list.scala:26)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1222)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1212)
at scala.collection.immutable.Stream.foreach(Stream.scala:595)
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:227)
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:104)
at Main$$anon$1.<init>(e59_list.scala:27)
at Main$.main(e59_list.scala:1)
at Main.main(e59_list.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)
Files used:
common_words.txt
a
able
about
across
after
all
almost
also
am
among
an
and
any
are
as
at
be
because
been
but
by
can
cannot
could
dear
did
do
does
either
else
ever
every
for
from
get
got
had
has
have
he
her
hers
him
his
how
however
i
if
in
into
is
it
its
just
least
let
like
likely
may
me
might
most
must
my
neither
no
nor
not
of
off
often
on
only
or
other
our
own
rather
said
say
says
she
should
since
so
some
than
that
the
their
them
then
there
these
they
this
tis
to
too
twas
us
wants
was
we
were
what
when
where
which
while
who
whom
why
will
with
would
yet
you
your
e59.txt
79,59,12,2,79,35,8,28,20,2,3,68,8,9,68,45,0,12,9,67,68,4,7,5,23,27,1,21,79,85,78,79,85,71,38,10,71,27,12,2,79,6,2,8,13,9,1,13,9,8,68,19,7,1,71,56,11,21,11,68,6,3,22,2,14,0,30,79,1,31,6,23,19,10,0,73,79,44,2,79,19,6,28,68,16,6,16,15,79,35,8,11,72,71,14,10,3,79,12,2,79,19,6,28,68,32,0,0,73,79,86,71,39,1,71,24,5,20,79,13,9,79,16,15,10,68,5,10,3,14,1,10,14,1,3,71,24,13,19,7,68,32,0,0,73,79,87,71,39,1,71,12,22,2,14,16,2,11,68,2,25,1,21,22,16,15,6,10,0,79,16,15,10,22,2,79,13,20,65,68,41,0,16,15,6,10,0,79,1,31,6,23,19,28,68,19,7,5,19,79,12,2,79,0,14,11,10,64,27,68,10,14,15,2,65,68,83,79,40,14,9,1,71,6,16,20,10,8,1,79,19,6,28,68,14,1,68,15,6,9,75,79,5,9,11,68,19,7,13,20,79,8,14,9,1,71,8,13,17,10,23,71,3,13,0,7,16,71,27,11,71,10,18,2,29,29,8,1,1,73,79,81,71,59,12,2,79,8,14,8,12,19,79,23,15,6,10,2,28,68,19,7,22,8,26,3,15,79,16,15,10,68,3,14,22,12,1,1,20,28,72,71,14,10,3,79,16,15,10,68,3,14,22,12,1,1,20,28,68,4,14,10,71,1,1,17,10,22,71,10,28,19,6,10,0,26,13,20,7,68,14,27,74,71,89,68,32,0,0,71,28,1,9,27,68,45,0,12,9,79,16,15,10,68,37,14,20,19,6,23,19,79,83,71,27,11,71,27,1,11,3,68,2,25,1,21,22,11,9,10,68,6,13,11,18,27,68,19,7,1,71,3,13,0,7,16,71,28,11,71,27,12,6,27,68,2,25,1,21,22,11,9,10,68,10,6,3,15,27,68,5,10,8,14,10,18,2,79,6,2,12,5,18,28,1,71,0,2,71,7,13,20,79,16,2,28,16,14,2,11,9,22,74,71,87,68,45,0,12,9,79,12,14,2,23,2,3,2,71,24,5,20,79,10,8,27,68,19,7,1,71,3,13,0,7,16,92,79,12,2,79,19,6,28,68,8,1,8,30,79,5,71,24,13,19,1,1,20,28,68,19,0,68,19,7,1,71,3,13,0,7,16,73,79,93,71,59,12,2,79,11,9,10,68,16,7,11,71,6,23,71,27,12,2,79,16,21,26,1,71,3,13,0,7,16,75,79,19,15,0,68,0,6,18,2,28,68,11,6,3,15,27,68,19,0,68,2,25,1,21,22,11,9,10,72,71,24,5,20,79,3,8,6,10,0,79,16,8,79,7,8,2,1,71,6,10,19,0,68,19,7,1,71,24,11,21,3,0,73,79,85,87,79,38,18,27,68,6,3,16,15,0,17,0,7,68,19,7,1,71,24,11,21,3,0,71,24,5,20,79,9,6,11,1,71,27,12,21,0,17,0,7,68,15,6,9,75,79,16,15,10,68,16,0,22,11,11,68,3,6,0,9,72,16,71,29,1,4,0,3,9,6,30,2,79,12,14,2,68,16,7,1,9,79,12,2,79,7,6,2,1,73,79,85,86,79,33,17,10,10
,71,6,10,71,7,13,20,79,11,16,1,68,11,14,10,3,79,5,9,11,68,6,2,11,9,8,68,15,6,23,71,0,19,9,79,20,2,0,20,11,10,72,71,7,1,71,24,5,20,79,10,8,27,68,6,12,7,2,31,16,2,11,74,71,94,86,71,45,17,19,79,16,8,79,5,11,3,68,16,7,11,71,13,1,11,6,1,17,10,0,71,7,13,10,79,5,9,11,68,6,12,7,2,31,16,2,11,68,15,6,9,75,79,12,2,79,3,6,25,1,71,27,12,2,79,22,14,8,12,19,79,16,8,79,6,2,12,11,10,10,68,4,7,13,11,11,22,2,1,68,8,9,68,32,0,0,73,79,85,84,79,48,15,10,29,71,14,22,2,79,22,2,13,11,21,1,69,71,59,12,14,28,68,14,28,68,9,0,16,71,14,68,23,7,29,20,6,7,6,3,68,5,6,22,19,7,68,21,10,23,18,3,16,14,1,3,71,9,22,8,2,68,15,26,9,6,1,68,23,14,23,20,6,11,9,79,11,21,79,20,11,14,10,75,79,16,15,6,23,71,29,1,5,6,22,19,7,68,4,0,9,2,28,68,1,29,11,10,79,35,8,11,74,86,91,68,52,0,68,19,7,1,71,56,11,21,11,68,5,10,7,6,2,1,71,7,17,10,14,10,71,14,10,3,79,8,14,25,1,3,79,12,2,29,1,71,0,10,71,10,5,21,27,12,71,14,9,8,1,3,71,26,23,73,79,44,2,79,19,6,28,68,1,26,8,11,79,11,1,79,17,9,9,5,14,3,13,9,8,68,11,0,18,2,79,5,9,11,68,1,14,13,19,7,2,18,3,10,2,28,23,73,79,37,9,11,68,16,10,68,15,14,18,2,79,23,2,10,10,71,7,13,20,79,3,11,0,22,30,67,68,19,7,1,71,8,8,8,29,29,71,0,2,71,27,12,2,79,11,9,3,29,71,60,11,9,79,11,1,79,16,15,10,68,33,14,16,15,10,22,73

A large number of Lists puts more load on the GC than the same number of Vectors. But your problem is not the choice of collection; it is the wrong use of Stream.
Scala's streams can be very memory-inefficient if used improperly. In your case, I assume you were trying to use Stream to avoid eagerly computing the transformed psswds collection, but you actually made things worse: a Stream not only memoizes its elements, it also adds the overhead of a Stream cell wrapping each of them.
All you had to do was replace toStream with view. That creates a collection wrapper which makes nearly all transformations lazy (basically what you were trying to achieve).
val ans = psswds.view.map(decode(3)(code))
.map(text => (text, scoreText(text)))
.maxBy(_._2)._1.sum
After this tiny fix your program runs fine even with -Xmx5m (I checked).
There are also many other things to optimize in your program (try to avoid creating excessive collections), but I'll leave it to you.
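To illustrate the point about memoization, here is a minimal sketch of the same search pattern driven by an Iterator, which, like a view, computes each candidate on demand and does not cache it. The names LazySearch, candidates and best, and the pluggable score function, are placeholders invented for this example; they are not the decode/scoreText pair from the question.

```scala
object LazySearch {
  // All three-letter lowercase keys as ASCII codes, produced one at a time.
  def candidates: Iterator[Vector[Int]] =
    for {
      a <- (97 to 122).iterator
      b <- 97 to 122
      c <- 97 to 122
    } yield Vector(a, b, c)

  // Scores each candidate on demand; nothing memoizes the scored pairs,
  // so candidates that have been compared can be garbage collected.
  def best(score: Vector[Int] => Int): Vector[Int] =
    candidates.map(p => (p, score(p))).maxBy(_._2)._1
}
```

With a real scoring function, LazySearch.best(p => scoreText(decode(3)(code)(p))) would play the role of the maxBy pipeline above, assuming decode and scoreText are in scope.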

Related

Scala functional programming concepts instead of multiple for loops

I am trying to learn functional programming in Scala. Right now I'm using the OOP way of having for loops to do a job. I have two lists userCurrentRole and entitlements over which I'm doing a double for loop:
for {
  curr <- userCurrentRole
  ent <- entitlements
} {
  if (ent.userEmail.split("@")(0) == curr.head) {
    if (ent.roleName != curr(1)) {
      grantOrRevoke += 1
      grantList += SomeCaseClass(curr.head, ent.roleName)
    }
  }
}
Is it possible to convert this double for loop into a logic that uses map or filter or both or any functional programming features of scala, but without a for loop?
EDIT 1: Added a list addition inside the double if..
The good news is: you are already using functional style! The for here is not a loop per se but a "for comprehension", which desugars into flatMap and map calls (or foreach, when there is no yield). It is just easier to read and write.
However, the thing you should avoid is mutable variables, like the grantOrRevoke thing you have there.
val revocations = for {
  curr <- userCurrentRole
  ent <- entitlements
  if ent.userEmail.split("@")(0) == curr.head
  if ent.roleName != curr(1)
} yield {
  1
}
revocations.size // same as revocations.sum
Note that the ifs inside the for block (usually) desugar to withFilter calls, which is often preferable to filter calls, since the latter builds up a new collection whereas the former avoids that.
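To make the desugaring concrete, here is a small sketch that writes the same comprehension twice: once with for/yield and once with the explicit flatMap, withFilter and map calls the compiler generates (up to minor details). The Entitlement case class, the sample data and the name@domain email format are assumptions made up for this illustration.

```scala
case class Entitlement(userEmail: String, roleName: String)

object Desugar {
  // Hypothetical sample data: each inner list is (user name, current role).
  val userCurrentRole = List(List("alice", "admin"), List("bob", "dev"))
  val entitlements = List(
    Entitlement("alice@example.com", "dev"),
    Entitlement("alice@example.com", "admin"),
    Entitlement("bob@example.com", "dev")
  )

  // For comprehension with two generators and two guards.
  val viaFor = for {
    curr <- userCurrentRole
    ent <- entitlements
    if ent.userEmail.split("@")(0) == curr.head
    if ent.roleName != curr(1)
  } yield (curr.head, ent.roleName)

  // Roughly what the comprehension above expands to.
  val viaCalls = userCurrentRole.flatMap { curr =>
    entitlements
      .withFilter(ent => ent.userEmail.split("@")(0) == curr.head)
      .withFilter(ent => ent.roleName != curr(1))
      .map(ent => (curr.head, ent.roleName))
  }
}
```

Both values hold the same list of (user, role) pairs, so the count the question needs is just the size of either one.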
You can write it like this:
val grantOrRevoke = userCurrentRole
.map(curr => entitlements
.filter(ent => ent.userEmail.split("@")(0) == curr.head && ent.roleName != curr(1))
.size)
.sum
Well, you are already using some higher-order functions; you just don't notice it, because you believe those are for loops, but they aren't loops. They are just syntactic sugar for calls to map & flatMap. In your case, though, they also desugar to foreach, and that, plus the mutability, is what keeps the code from being functional.
I would recommend taking a look at the Scaladoc; you will find that collections have a lot of useful methods.
For example, in this case, we may use count & sum.
val grantOrRevoke = userCurrentRole.iterator.map {
// Maybe it would be better to have a list of tuples instead of a list of lists.
case List(username, userRole) =>
entitlements.count { ent =>
(ent.userEmail.split("@", 2)(0) == username) && (ent.roleName != userRole)
}
}.sum

How to maintain an immutable list when you impact object linked to each other into this list

I'm trying to code Deb's fast non-dominated sorting algorithm (NDS), as used in NSGA-II, in an immutable way using Scala.
But the problem is more difficult than I thought, so I have simplified it here into an MWE.
Imagine a population Seq[A], where each element A is wrapped in a decoratedA together with a list of pointers to other elements of the population Seq[A].
A function evalA(a: decoratedA) takes the list of linked A elements it contains and decrements the value of each.
Next I take a subset decoratedAPopulation of the population and call evalA on each element. The problem is that, between iterations over this subset, I need to update my population of A with the new decoratedA and the updated linked A elements it contains...
More problematic, every element of the population needs its linked list updated to replace any linked element that has changed...
As you can see, it seems complicated to keep all the linked lists synchronized this way. I propose another solution below, which probably needs recursion to return, after each evalA, a new population with the elements replaced.
How can I do that correctly in an immutable way?
It's easy to code in a mutable way, but I can't find a good way to do it immutably. Do you have a pointer or an idea for how to do that?
object test extends App{
case class A(value:Int) {def decrement()= new A(value - 1)}
case class decoratedA(oneAdecorated:A, listOfLinkedA:Seq[A])
// We start algorithm loop with A element with value = 0
val population = Seq(new A(0), new A(0), new A(8), new A(1))
val decoratedApopulation = Seq(new decoratedA(population(1),Seq(population(2),population(3))),
new decoratedA(population(2),Seq(population(1),population(3))))
def evalA(a:decoratedA) = {
val newListOfLinked = a.listOfLinkedA.map { e => e.decrement() }
new decoratedA(a.oneAdecorated, newListOfLinked)
}
def run()= {
//decoratedApopulation.map{
// ?
//}
}
}
Update 1:
About the input / output of the initial algorithm.
The first part of Deb's algorithm (Step 1 to Step 3) analyses a list of individuals and computes for each A: (a) the domination count, i.e. the number of A elements that dominate it (the value attribute of A), and (b) the list of A elements it dominates (listOfLinkedA).
So it returns a fully initialized population of decoratedA, and as the input to Step 4 (my problem) I take the first non-dominated front, i.e. the subset of decoratedA elements whose A value = 0.
My problem starts here, with a list of decoratedA whose A value = 0; I look for the next front by processing the listOfLinkedA of each of these elements.
At each iteration from step 4 to step 6, I need to compute a new subset B of decoratedA with A value = 0. For each element, I first decrement the domination count of every element in its listOfLinkedA, then I filter to keep the elements that are now equal to 0. At the end of step 6, B is saved to a List[Seq[DecoratedA]]; then I restart at step 4 with B and compute a new C, and so on.
Something like this in my code: I call explore() for each element of B, with Q ending up equal to the new subset of decoratedA with value (fitness here) = 0:
case class PopulationElement(popElement:Seq[Double]){
implicit def poptodouble():Seq[Double] = {
popElement
}
}
class SolutionElement(values: PopulationElement, fitness:Double, dominates: Seq[SolutionElement]) {
def decrement()= if (fitness == 0) this else new SolutionElement(values,fitness - 1, dominates)
def explore(Q:Seq[SolutionElement]):(SolutionElement, Seq[SolutionElement])={
// return all dominates elements with fitness - 1
val newSolutionSet = dominates.map{_.decrement()}
val filteredSolution:Seq[SolutionElement] = newSolutionSet.filter{s => s.fitness == 0.0}.diff{Q}
filteredSolution
}
}
At the end of the algorithm, I have a final List[Seq[DecoratedA]] which contains all the computed fronts.
Update 2
A sample of value extracted from this example.
I take only the pareto front (red) and the {f,h,l} next front with dominated count = 1.
case class p(x: Double, y: Double)
val a = A(p(3.5, 1.0),0)
val b = A(p(3.0, 1.5),0)
val c = A(p(2.0, 2.0),0)
val d = A(p(1.0, 3.0),0)
val e = A(p(0.5, 4.0),0)
val f = A(p(0.5, 4.5),1)
val h = A(p(1.5, 3.5),1)
val l = A(p(4.5, 1.0),1)
case class A(XY:p, value:Int) {def decrement()= new A(XY, value - 1)}
case class ARoot(node:A, children:Seq[A])
val population = Seq(
ARoot(a,Seq(f,h,l)),
ARoot(b,Seq(f,h,l)),
ARoot(c,Seq(f,h,l)),
ARoot(d,Seq(f,h,l)),
ARoot(e,Seq(f,h,l)),
ARoot(f,Nil),
ARoot(h,Nil),
ARoot(l,Nil))
Algorithm return List(List(a,b,c,d,e), List(f,h,l))
Update 3
After 2 hours and some pattern-matching problems (ahem...), I'm coming back with a complete example which automatically computes the dominated counter and the children of each ARoot.
But I have the same problem: my children list computation is not totally correct, because each element A may be a shared member of another ARoot's children list, so I need to think about how to adapt your answer :/ For now I only compute the children list as Seq[p], and I need a Seq[A].
case class p(x: Double, y: Double){
def toSeq():Seq[Double] = Seq(x,y)
}
case class A(XY:p, dominatedCounter:Int) {def decrement()= new A(XY, dominatedCounter - 1)}
case class ARoot(node:A, children:Seq[A])
case class ARootRaw(node:A, children:Seq[p])
object test_stackoverflow extends App {
val a = new p(3.5, 1.0)
val b = new p(3.0, 1.5)
val c = new p(2.0, 2.0)
val d = new p(1.0, 3.0)
val e = new p(0.5, 4.0)
val f = new p(0.5, 4.5)
val g = new p(1.5, 4.5)
val h = new p(1.5, 3.5)
val i = new p(2.0, 3.5)
val j = new p(2.5, 3.0)
val k = new p(3.5, 2.0)
val l = new p(4.5, 1.0)
val m = new p(4.5, 2.5)
val n = new p(4.0, 4.0)
val o = new p(3.0, 4.0)
val p = new p(5.0, 4.5)
def isStriclyDominated(p1: p, p2: p): Boolean = {
(p1.toSeq zip p2.toSeq).exists { case (g1, g2) => g1 < g2 }
}
def sortedByRank(population: Seq[p]) = {
def paretoRanking(values: Set[p]) = {
//comment from @dk14: I suppose the order of values doesn't matter here, otherwise use SortedSet
values.map { v1 =>
val t = (values - v1).filter(isStriclyDominated(v1, _)).toSeq
val a = new A(v1, values.size - t.size - 1)
val root = new ARootRaw(a, t)
println("Root value ", root)
root
}
}
val listOfARootRaw = paretoRanking(population.toSet)
//From @dk14: Here is the conversion from Seq[p] to Seq[A]
val dominations: Map[p, Int] = listOfARootRaw.map(a => a.node.XY -> a.node.dominatedCounter).toMap //From @dk14: It's a map with dominatedCounter for each point
val listOfARoot = listOfARootRaw.map(raw => ARoot(raw.node, raw.children.map(p => A(p, dominations.getOrElse(p, 0)))))
listOfARoot.groupBy(_.node.dominatedCounter)
}
//Get the first front, a subset of ARoot, and start the step 4
println(sortedByRank(Seq(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p)).head)
}
Talking about your problem with distinguishing fronts (after update 2):
val (left, right) = population.partition(_.node.value == 0)
List(left, right.map(r => r.copy(node = r.node.copy(value = r.node.value - 1))))
No need for mutating anything here. copy will copy everything but fields you specified with new values. Talking about the code, the new copy will be linked to the same list of children, but new value = value - 1.
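As a small illustration of how copy behaves here, the sketch below uses simplified stand-ins for the question's classes (A carries a name instead of a point, and the step function is a name invented for this example). It shows that the lowered roots are new objects while their children sequences are the very same instances as before.

```scala
case class A(name: String, value: Int)
case class ARoot(node: A, children: Seq[A])

object Fronts {
  // Splits off the roots whose counter is already 0 and returns the remaining
  // roots with the counter lowered by one; the input is left untouched.
  def step(population: Seq[ARoot]): (Seq[ARoot], Seq[ARoot]) = {
    val (front, rest) = population.partition(_.node.value == 0)
    val lowered = rest.map(r => r.copy(node = r.node.copy(value = r.node.value - 1)))
    (front, lowered)
  }
}
```

Because only the node field is replaced, no children list is rebuilt; that also means the A values inside children still carry their old counters, which is the synchronization issue discussed further down.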
P.S. I have a feeling you may actually want to do something like this:
case class A(id: String, level: Int)
val a = A("a", 1)
val b = A("b", 2)
val c = A("c", 2)
val d = A("d", 3)
clusterize(List(a,b,c,d)) === List(List(a), List(b,c), List(d))
It's simple to implement:
def clusterize(list: List[A]) =
list.groupBy(_.level).toList.sortBy(_._1).map(_._2)
Test:
scala> clusterize(List(A("a", 1), A("b", 2), A("c", 2), A("d", 3)))
res2: List[List[A]] = List(List(A(a,1)), List(A(b,2), A(c,2)), List(A(d,3)))
P.S.2. Please consider better naming conventions, like here.
Talking about "mutating" elements in some complex structure:
The idea of "immutable mutating" some value shared between parts of a structure is to separate your "mutation" from the structure. Or, simply put, divide and conquer:
calculate changes in advance
apply them
The code:
case class A(v: Int)
case class AA(a: A, seq: Seq[A]) //decoratedA
def update(input: Seq[AA]) = {
//shows how to decrement each value wherever it is:
val stats = input.map(_.a).groupBy(identity).mapValues(_.size) //domination count for each A
def upd(a: A) = A(a.v - stats.getOrElse(a, 0)) //apply decrement
input.map(aa => aa.copy(a = upd(aa.a), seq = aa.seq.map(upd))) //traverse and "update" original structure
}
So, I've introduced new Map[A, Int] structure, that shows how to modify the original one. This approach is based on highly simplified version of Applicative Functor concept. In general case, it should be Map[A, A => A] or even Map[K, A => B] or even Map[K, Zipper[A] => B] as applicative functor (input <*> map). *Zipper (see 1, 2) actually could give you information about current element's context.
Notes:
I assumed that As with the same value are the same; that's the default behaviour for case classes, otherwise you need to provide some additional ids (or redefine hashCode/equals).
If you need more levels, like AA(AA(AA(...))), just make stats and upd recursive; if the decrement's weight depends on the nesting level, just add the nesting level as a parameter to your recursive function.
If the decrement depends on the parent node (like decrementing only the A(3)s which belong to A(3)), add the parent node(s) as part of the stats key and analyze it during upd.
If there is some dependency between stats calculation (how much to decrement) of let's say input(1) from input(0) - you should use foldLeft with partial stats as accumulator: val stats = input.foldLeft(Map[A, Int]())((partialStats, elem) => partialStats ++ analize(partialStats, elem))
Btw, it takes O(N) here (linear memory and cpu usage)
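The foldLeft variant mentioned in the notes could look roughly like the sketch below. The analyze function is purely hypothetical (the notes only name it); here it is assumed to derive the entry for the current element from whatever has been accumulated so far, which is the kind of dependency a plain groupBy cannot express.

```scala
case class A(v: Int)
case class AA(a: A, seq: Seq[A])

object DependentStats {
  // Hypothetical rule: the new entry for elem.a depends on the partial result.
  def analyze(partial: Map[A, Int], elem: AA): Map[A, Int] =
    Map(elem.a -> (partial.getOrElse(elem.a, 0) + 1))

  // Threads the partial statistics through the input from left to right.
  def stats(input: Seq[AA]): Map[A, Int] =
    input.foldLeft(Map.empty[A, Int])((partial, elem) => partial ++ analyze(partial, elem))
}
```

With this particular rule the result equals a simple count per A; a different analyze could, for instance, stop decrementing once a threshold is reached.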
Example:
scala> val population = Seq(A(3), A(6), A(8), A(3))
population: Seq[A] = List(A(3), A(6), A(8), A(3))
scala> val input = Seq(AA(population(1),Seq(population(2),population(3))), AA(population(2),Seq(population(1),population(3))))
input: Seq[AA] = List(AA(A(6),List(A(8), A(3))), AA(A(8),List(A(6), A(3))))
scala> update(input)
res34: Seq[AA] = List(AA(A(5),List(A(7), A(3))), AA(A(7),List(A(5), A(3))))

expression evaluator in scala (with maybe placeholders?)

I am reading something like this from my configuration file :
metric1.critical = "<2000 || >20000"
metric1.okay = "=1"
metric1.warning = "<=3000"
metric2.okay = ">0.9 && < 1.1 "
metric3.warning ="( >0.9 && <1.5) || (<500 &&>200)"
and I have a
metric1.value = //have some value
My aim is to basically evaluate
if(metric1.value<2000 || metric1.value > 20000)
metric1.setAlert("critical");
else if(metric1.value=1)
metric.setAlert("okay");
//and so on
I am not really good with regex so I am going to try not to use it. I am coding in Scala and wanted to know if any existing library can help with this. Maybe i need to put placeholders to fill in the blanks and then evaluate the expression? But how do I evaluate the expression most efficiently and with less overhead?
EDIT:
In java how we have expression evaluator Libraries i was hoping i could find something similar for my code . Maybe I can add placeholders in the config file like "?" these to substitute my metric1.value (read variables) and then use an evaluator?
OR
Can someone suggest a good regex for this?
Thanks in advance!
This sounds like you want to define your own syntax using a parser combinator library.
There is a parser combinator built into the scala class library. Since the scala library has been modularized, it is now a separate project that lives at https://github.com/scala/scala-parser-combinators.
Update: everybody looking for a parser combinator library that is conceptually similar to scala-parser-combinators should take a look at fastparse. It is very fast, and does not use macros. So it can serve as a drop-in replacement for scala-parser-combinators.
There are some examples on how to use it in Programming in Scala, Chapter 33, "Combinator Parsing".
Here is a little grammar, ast and evaluator to get you started. This is missing a lot of things such as whitespace handling, operator priority etc. You should also not use strings for encoding the different comparison operators. But I think with this and the chapter from Programming in Scala you should be able to come up with something that suits your needs.
import scala.util.parsing.combinator.{JavaTokenParsers, PackratParsers}
sealed abstract class AST
sealed abstract class BooleanExpression extends AST
case class BooleanOperation(op: String, lhs: BooleanExpression, rhs:BooleanExpression) extends BooleanExpression
case class Comparison(op:String, rhs:Constant) extends BooleanExpression
case class Constant(value: Double) extends AST
object ConditionParser extends JavaTokenParsers with PackratParsers {
val booleanOperator : PackratParser[String] = literal("||") | literal("&&")
val comparisonOperator : PackratParser[String] = literal("<=") | literal(">=") | literal("==") | literal("!=") | literal("<") | literal(">")
val constant : PackratParser[Constant] = floatingPointNumber.^^ { x => Constant(x.toDouble) }
val comparison : PackratParser[Comparison] = (comparisonOperator ~ constant) ^^ { case op ~ rhs => Comparison(op, rhs) }
lazy val p1 : PackratParser[BooleanExpression] = booleanOperation | comparison
val booleanOperation = (p1 ~ booleanOperator ~ p1) ^^ { case lhs ~ op ~ rhs => BooleanOperation(op, lhs, rhs) }
}
object Evaluator {
def evaluate(expression:BooleanExpression, value:Double) : Boolean = expression match {
case Comparison("<=", Constant(c)) => value <= c
case Comparison(">=", Constant(c)) => value >= c
case Comparison("==", Constant(c)) => value == c
case Comparison("!=", Constant(c)) => value != c
case Comparison("<", Constant(c)) => value < c
case Comparison(">", Constant(c)) => value > c
case BooleanOperation("||", a, b) => evaluate(a, value) || evaluate(b, value)
case BooleanOperation("&&", a, b) => evaluate(a, value) && evaluate(b, value)
}
}
object Test extends App {
def parse(text:String) : BooleanExpression = ConditionParser.parseAll(ConditionParser.p1, text).get
val texts = Seq(
"<2000",
"<2000||>20000",
"==1",
"<=3000",
">0.9&&<1.1")
val xs = Seq(0.0, 1.0, 100000.0)
for {
text <- texts
expression = parse(text)
x <- xs
result = Evaluator.evaluate(expression, x)
} {
println(s"$text $expression $x $result")
}
}
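Regarding the remark about not encoding comparison operators as strings, one possible alternative is a small sealed hierarchy, sketched below with the standard library only. The names CompOp, holds, CompOps and fromSymbol are made up for this illustration; a parser action could look the operator up once and store the object in the AST instead of the raw string.

```scala
sealed trait CompOp {
  // True when `value <op> constant` holds.
  def holds(value: Double, constant: Double): Boolean
}
case object Le extends CompOp { def holds(v: Double, c: Double): Boolean = v <= c }
case object Ge extends CompOp { def holds(v: Double, c: Double): Boolean = v >= c }
case object Eq extends CompOp { def holds(v: Double, c: Double): Boolean = v == c }
case object Ne extends CompOp { def holds(v: Double, c: Double): Boolean = v != c }
case object Lt extends CompOp { def holds(v: Double, c: Double): Boolean = v < c }
case object Gt extends CompOp { def holds(v: Double, c: Double): Boolean = v > c }

object CompOps {
  // Lookup from the operator's textual form to its typed representation.
  val fromSymbol: Map[String, CompOp] =
    Map("<=" -> Le, ">=" -> Ge, "==" -> Eq, "!=" -> Ne, "<" -> Lt, ">" -> Gt)
}
```

Since the trait is sealed, a match over CompOp is checked for exhaustiveness by the compiler, which the string-based evaluator cannot offer.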
Scala has built in Interpreter library which you can use. The library provides functionalities similar to eval() in many other languages. You can pass Scala code snippet as String to the .interpret method and it will evaluate it.
import scala.tools.nsc.{ Interpreter, Settings }
val settings = new Settings
settings.usejavacp.value = true
val in = new Interpreter(settings)
val lowerCritical = "<2000" // set the value from config
val value = 200
in.interpret(s"$value $lowerCritical") //> res0: Boolean = true
val value1 = 20000 //> value1 : Int = 20000
in.interpret(s"$value1 $lowerCritical") //> res1: Boolean = false
You want to use an actual parser for this.
Most answers are suggesting Scala's parser combinators, and that's a perfectly valid choice, if a bit out-of-date.
I'd suggest Parboiled2, another parser combinator implementation that has the distinct advantage of being written as Scala macros - without getting too technical, it means your parser is generated at compile time rather than runtime, which can yield significant performance improvements. Some benchmarks have Parboiled2 up to 200 times as fast as Scala's parser combinator.
And since parser combinators are now in a separate dependency (as of 2.11, I believe), there really is no good reason to prefer them to Parboiled2.
I recently faced the same problem and I ended up writing my own expression evaluation library scalexpr. It is a simple library but it can validate / evaluate expressions that are similar to the ones in the question. You can do things like:
val ctx = Map("id" -> 10L, "name" -> "sensor1")
val parser = ExpressionParser()
val expr = parser.parseBooleanExpression(""" id == 10L || name == "sensor1" """).get
println(expr.resolve(ctx)) // prints true
If you don't want to use the library, I recommend the fastparse parser... It is much faster than parser combinators, a little bit slower than parboiled, but much easier to use than both.

Insertion order of a list based on order of another list

I have a sorting problem in Scala that I could certainly solve with brute-force, but I'm hopeful there is a more clever/elegant solution available. Suppose I have a list of strings in no particular order:
val keys = List("john", "jill", "ganesh", "wei", "bruce", "123", "Pantera")
Then I receive the values for these keys at random (full disclosure: I'm experiencing this problem in an Akka actor, so events are not in order):
def receive:Receive = {
case Value(key, otherStuff) => // key is an element in keys ...
And I want to store these results in a List where the Value objects appear in the same order as their key fields in the keys list. For instance, I may have this list after receiving the first two Value messages:
List(Value("ganesh", stuff1), Value("bruce", stuff2))
ganesh appears before bruce merely because he appears earlier in the keys list. Once the third message is received, I should insert it into this list in the correct location per the ordering established by keys. For instance, on receiving wei I should insert him into the middle:
List(Value("ganesh", stuff1), Value("wei", stuff3), Value("bruce", stuff2))
At any point during this process, my list may be incomplete but in the expected order. Since the keys are redundant with my Value data, I throw them away once the list of values is complete.
Show me what you've got!
I assume you want no worse than O(n log n) performance. So:
val order = keys.zipWithIndex.toMap
var part = collection.immutable.TreeSet.empty[Value](
math.Ordering.by(v => order(v.key))
)
Then you just add your items.
scala> part = part + Value("ganesh", 0.1)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1))
scala> part = part + Value("bruce", 0.2)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1), Value(bruce,0.2))
scala> part = part + Value("wei", 0.3)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1), Value(wei,0.3), Value(bruce,0.2))
When you're done, you can .toList it. While you're building it, you probably don't want to, since updating a list in random order so that it is in a desired sorted order is an obligatory O(n^2) cost.
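Put together as a self-contained sketch, the approach could look like this. The Value case class with a Double payload and the names OrderedInsert and collect are assumptions for the example; the question does not show the real Value type.

```scala
import scala.collection.immutable.TreeSet

case class Value(key: String, payload: Double)

object OrderedInsert {
  val keys = List("john", "jill", "ganesh", "wei", "bruce", "123", "Pantera")

  // Position of each key in the reference list.
  val order: Map[String, Int] = keys.zipWithIndex.toMap

  // Values compare by the position of their key in `keys`.
  implicit val byKeyOrder: Ordering[Value] = Ordering.by((v: Value) => order(v.key))

  // Adds the arrivals one by one; the set stays sorted after every insertion.
  def collect(arrivals: Seq[Value]): List[Value] =
    arrivals.foldLeft(TreeSet.empty[Value])(_ + _).toList
}
```

One consequence of ordering by key position is that two values with the same key count as equal for the set, so only the first one is kept.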
Edit: with your example of seven items, my solution takes about 1/3 the time of Jean-Philippe's. For 25 items, it's 1/10th the time. 1/30th for 200 (which is the difference between 6 ms and 0.2 ms on my machine).
If you can use a ListMap instead of a list of tuples to store values while they're gathered, this could work. ListMap preserves insertion order.
import akka.actor.Actor
import scala.collection.immutable.ListMap

class MyActor(keys: List[String]) extends Actor {
  def initial(values: ListMap[String, Option[Value]]): Receive = {
    case v @ Value(key, otherStuff) =>
      val updated = values.updated(key, Some(v))
      if (updated.forall(_._2.isDefined)) // every key has now received its value
        context.become(valuesReceived(updated.collect { case (_, Some(value)) => value }.toSeq))
      else
        context.become(initial(updated))
  }
  def valuesReceived(values: Seq[Value]): Receive = PartialFunction.empty // whatever you need
  def receive = initial(ListMap(keys.map(k => k -> Option.empty[Value]): _*))
}
(warning: not compiled)

Scala spec unit tests

I've got the following class and I want to write some Spec test cases, but I am really new to it and I don't know how to start. My class looks like this:
class Board{
val array = Array.fill(7)(Array.fill(6)(None:Option[Coin]))
def move(x:Int, coin:Coin) {
val y = array(x).indexOf(None)
require(y >= 0)
array(x)(y) = Some(coin)
}
def apply(x: Int, y: Int):Option[Coin] =
if (0 <= x && x < 7 && 0 <= y && y < 6) array(x)(y)
else None
def winner: Option[Coin] = winner(Cross).orElse(winner(Naught))
private def winner(coin:Coin):Option[Coin] = {
val rows = (0 until 6).map(y => (0 until 7).map( x => apply(x,y)))
val cols = (0 until 7).map(x => (0 until 6).map( y => apply(x,y)))
val dia1 = (0 until 4).map(x => (0 until 6).map( y => apply(x+y,y)))
val dia2 = (3 until 7).map(x => (0 until 6).map( y => apply(x-y,y)))
val slice = List.fill(4)(Some(coin))
if((rows ++ cols ++ dia1 ++ dia2).exists(_.containsSlice(slice)))
Some(coin)
else None
}
override def toString = {
val string = new StringBuilder
for(y <- 5 to 0 by -1; x <- 0 to 6){
string.append(apply(x, y).getOrElse("_"))
if (x == 6) string.append ("\n")
else string.append("|")
}
string.append("0 1 2 3 4 5 6\n").toString
}
}
Thank you!
I can only second Daniel's suggestion, because you'll end up with a more practical API by using TDD.
I also think that your application could be nicely tested with a mix of specs2 and ScalaCheck. Here is a draft of a Specification to get you started:
import org.specs2._
import org.scalacheck.{Arbitrary, Gen}
class TestSpec extends Specification with ScalaCheck { def is =
"moving a coin in a column moves the coin to the nearest empty slot" ! e1^
"a coin wins if" ^
"a row contains 4 consecutive coins" ! e2^
"a column contains 4 consecutive coins" ! e3^
"a diagonal contains 4 consecutive coins" ! e4^
end
def e1 = check { (b: Board, x: Int, c: Coin) =>
try { b.move(x, c) } catch { case e => () }
// either there was a coin before somewhere in that column
// or there is now after the move
(0 until 6).exists(y => b(x, y).isDefined)
}
def e2 = pending
def e3 = pending
def e4 = pending
/**
* Random data for Coins, x position and Board
*/
implicit def arbitraryCoin: Arbitrary[Coin] = Arbitrary { Gen.oneOf(Cross, Naught) }
implicit def arbitraryXPosition: Arbitrary[Int] = Arbitrary { Gen.choose(0, 6) }
implicit def arbitraryBoardMove: Arbitrary[(Int, Coin)] = Arbitrary {
for {
coin <- arbitraryCoin.arbitrary
x <- arbitraryXPosition.arbitrary
} yield (x, coin)
}
implicit def arbitraryBoard: Arbitrary[Board] = Arbitrary {
for {
moves <- Gen.listOf1(arbitraryBoardMove.arbitrary)
} yield {
val board = new Board
moves.foreach { case (x, coin) =>
try { board.move(x, coin) } catch { case e => () }}
board
}
}
}
object Cross extends Coin {
override def toString = "x"
}
object Naught extends Coin {
override def toString = "o"
}
sealed trait Coin
The e1 property I've implemented is not the real thing because it doesn't really check that we moved the coin to the nearest empty slot, which is what your code and your API suggests. You will also want to change the generated data so that the Boards are generated with an alternation of x and o. That should be a great way to learn how to use ScalaCheck!
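As a starting point for the stricter check, here is a framework-free sketch of what "the coin lands in the nearest empty slot" could mean as a predicate. It carries a trimmed copy of Board (only move and apply) so that it stands alone; LowestSlotCheck, lowestEmpty and landsInLowestSlot are names invented for this example, and the predicate could be wrapped in a ScalaCheck property once the generators are in place.

```scala
import scala.util.Try

sealed trait Coin
case object Cross extends Coin
case object Naught extends Coin

// Trimmed copy of the Board from the question: 7 columns, 6 rows.
class Board {
  private val array = Array.fill(7)(Array.fill(6)(None: Option[Coin]))
  def move(x: Int, coin: Coin): Unit = {
    val y = array(x).indexOf(None)
    require(y >= 0)
    array(x)(y) = Some(coin)
  }
  def apply(x: Int, y: Int): Option[Coin] =
    if (0 <= x && x < 7 && 0 <= y && y < 6) array(x)(y) else None
}

object LowestSlotCheck {
  // Index of the lowest empty row in column x, if the column is not full.
  def lowestEmpty(b: Board, x: Int): Option[Int] =
    (0 until 6).find(y => b(x, y).isEmpty)

  // Performs the move and reports whether the coin ended up in the lowest
  // free slot, or whether the move was rejected when the column was full.
  def landsInLowestSlot(b: Board, x: Int, c: Coin): Boolean =
    lowestEmpty(b, x) match {
      case Some(y) =>
        b.move(x, c)
        b(x, y) == Some(c)
      case None =>
        Try(b.move(x, c)).isFailure
    }
}
```

Note that the predicate mutates the board it is given, so a property built on it should generate a fresh Board for every run.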
I suggest you throw all that code out -- well, save it somewhere, but start from zero using TDD.
The Specs2 site has plenty examples of how to write tests, but use TDD -- test driven design -- to do it. Adding tests after the fact is suboptimal, to say the least.
So, think of the most simple case you want to handle of the most simple feature, write a test for that, see it fail, write the code to fix it. Refactor if necessary, and repeat for the next most simple case.
If you want help with how to do TDD in general, I heartily endorse the videos about TDD available on Clean Coders. At the very least, watch the second part where Bob Martin writes a whole class TDD-style, from design to end.
If you know how to do testing in general but are confused about Scala or Specs, please be much more specific about what your questions are.