I have a (possibly infinite) regular language, which I describe with a regular expression. From this regular language I want to obtain all strings of length up to n, using Scala. Some quick googling tells me there are libraries out there that can help. Before using an external library, I want to know whether this is easy (as in something a decent programmer can implement in under 15 minutes) to do myself in Scala. If not, are there good libraries you can recommend?
To make this more concrete: suppose I have the language A*B* and n is 3; then I want the strings "", "A", "B", "AA", "AB", "BB", "AAA", "AAB", "ABB", "BBB".
Answer
Edits
26 Nov, 4:30pm - added an iterator-based version to reduce runtime and memory consumption. The Seq-based version of canonic is at the bottom under (1).
26 Nov, 2:45pm - added a working Seq-based version of canonic; the old, non-working version of canonic is at the bottom under (2).
Approach
Canonically generate all possible words over a given alphabet, up to length n.
Filter the generated words by a regular expression (your regular language, in this case).
Code
object SO {
import scala.collection.AbstractIterator
import scala.util.matching.Regex
def canonic(alphabet: Seq[Char], n: Int): Iterator[String] =
if (n < 0) Iterator.empty
else {
// one iterator per word length; starting at 0 includes the empty word ""
val r: IndexedSeq[Iterator[String]] = for (i <- 0 to n)
yield new CanonicItr(alphabet, i)
r.reduce(_ ++ _)
}
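// Enumerates all aSize^width words of exactly `width` characters by counting
// from 0 to aSize^width - 1 and writing each number in base aSize.
private class CanonicItr(alphabet: Seq[Char], width: Int) extends AbstractIterator[String] {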
val aSize = alphabet.size
val alph = alphabet.toVector
val total = aSizePower(width)
private var pos = 0L
private def aSizePower(r: Int): Long = scala.math.pow(aSize, r).toLong
def stringFor(id: Long): String = {
val r = for {
i <- (0 until width).reverse
// digit i of id in base aSize, illustrated in base 10 for id = 738:
// (738 / 10^0) % 10 = 8
// (738 / 10^1) % 10 = 3
// (738 / 10^2) % 10 = 7
charIdx = ((id / (aSizePower(i))) % aSize).toInt
} yield alph(charIdx)
r.mkString("")
}
override def hasNext: Boolean = pos < total
override def next(): String = {
val s = stringFor(pos)
pos = pos + 1
s
}
}
def main(args: Array[String]): Unit = {
// create all possible words with the given alphabet
val canonicWordSet = canonic(Seq('a', 'b', 'c'), 8)
// formal regular language definition
val languageDef: Regex = "a*b".r
// determine the words of the language by filtering the canonical set
val wordsOfLanguage = canonicWordSet.filter(word => languageDef.pattern.matcher(word).matches)
println(wordsOfLanguage.toList)
}
}
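For comparison, here is a shorter sketch of the same generate-then-filter approach. Instead of base-conversion arithmetic it builds each length from the previous one with Iterator.flatMap; WordsUpTo and wordsOf are hypothetical names, not from the answer above. The number of generated words still grows as |alphabet|^n, so this is only practical for small n.

```scala
import scala.util.matching.Regex

object WordsUpTo {
  // All words of exactly `len` characters, ordered like counting in the alphabet
  private def wordsOfLength(alphabet: Seq[Char], len: Int): Iterator[String] =
    if (len == 0) Iterator.single("")
    else wordsOfLength(alphabet, len - 1).flatMap(prefix => alphabet.iterator.map(c => prefix + c))

  // All words of length 0 to n, shortest first; empty when n < 0
  def canonic(alphabet: Seq[Char], n: Int): Iterator[String] =
    (0 to n).iterator.flatMap(len => wordsOfLength(alphabet, len))

  // Generate-then-filter: keep only the words the regex matches in full
  def wordsOf(language: Regex, alphabet: Seq[Char], n: Int): List[String] =
    canonic(alphabet, n).filter(w => language.pattern.matcher(w).matches()).toList
}
```

With the language and bound from the question, WordsUpTo.wordsOf("A*B*".r, Seq('A', 'B'), 3) returns List("", "A", "B", "AA", "AB", "BB", "AAA", "AAB", "ABB", "BBB").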
1) Working version of canonic but with high memory requirements
object SO {
import scala.util.matching.Regex
/**
* Given a sequence of characters (e.g. Seq('a', 'b', 'c') )
* generates all combinations up to length n (incl.).
*
* @param alphabet sequence of characters
* @param n is the max length
* @return all combinations of up to length n.
*/
def canonic(alphabet:Seq[Char], n: Int): Seq[String] = {
def combination( input: Seq[String], chars: Seq[Char]) = {
for {
i <- input
c <- chars
} yield (i+c)
}
@scala.annotation.tailrec
def rec(left: Int, current: Seq[String], accum: Seq[String] ) : Seq[String] = {
left match {
case 0 => accum
case _ => {
val next = combination( current, alphabet )
rec( left-1, next, accum ++ next )
}
}
}
rec(n, Seq(""), Seq(""))
}
def main(args: Array[String]) : Unit = {
// create all possible words with the given alphabet
val canonicWordSet= canonic( Seq('a', 'b', 'c'), 3)
// formal regular language definition
val languageDef: Regex = "a*b".r
// determine the words of the language by filtering the canonical set
val wordsOfLanguage = canonicWordSet.filter( word => languageDef.pattern.matcher(word).matches )
println( wordsOfLanguage.toList )
}
}
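2) Non-working version of canonic: combinations never repeats a character and never reorders the alphabet, so words such as "aa" or "ba" are never generated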
def canonic(alphabet:Seq[Char], n: Int): Iterator[String] = {
for {
i <- (0 to n).iterator
combi <- alphabet.combinations(i).map(cs => cs.mkString)
} yield combi
}
I'm not completely sure I understood what you mean; is this what you want?
scala> def generator(chars: Seq[Char], n: Int): Iterator[String] =
| (0 to n).iterator flatMap (i => (chars flatMap (_.toString*i) mkString) combinations i)
generator: (chars: Seq[Char], n: Int)Iterator[String]
scala>
scala> generator("AB", 3) toList
res0: List[String] = List("", A, B, AA, AB, BB, AAA, AAB, ABB, BBB)
scala> generator("ABC", 3) toList
res1: List[String] = List("", A, B, C, AA, AB, AC, BB, BC, CC, AAA, AAB, AAC, ABB, ABC, ACC, BBB, BBC, BCC, CCC)
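Note what generator actually produces: for each length i it lists the sorted multisets of i characters (combinations with repetition). For the alphabet "AB" these are exactly the words of A*B*, because every word of A*B* with length i is k A's followed by i - k B's. For other regular languages it will not match; for example (BA)* contains "BA", which generator("AB", 2) never yields. A minimal sketch of that observation for A*B* specifically (AStarBStar is a hypothetical name):

```scala
object AStarBStar {
  // Every word of A*B* with length i is "A" * k + "B" * (i - k) for some 0 <= k <= i.
  // Listing k from i down to 0 reproduces the order shown in the question.
  def words(n: Int): List[String] =
    (for {
      i <- 0 to n
      k <- i to 0 by -1
    } yield "A" * k + "B" * (i - k)).toList
}
```

This yields (n + 1)(n + 2) / 2 words and never generates strings that would be filtered out, but it only works because A*B* has such a simple shape.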
Related
I'm new to Scala and functional programming, and I want to understand whether my solution below fits the functional programming style. If someone can suggest a better approach, I would be grateful.
Problem statement: print each item of a list n times.
Solution:
import scala.collection.mutable.ListBuffer
object ListReplication extends App {
def printNTimes(items: List[Int], n: Int): ListBuffer[Int] = {
var outputList = new ListBuffer[Int]
def storeNTime(item: Int): Unit = {
for (_ <- 1 to n) outputList += item
}
for (item <- items) storeNTime(item)
outputList
}
val result = printNTimes(items = List(1,2,4), n = 3)
println(result)
}
It is generally better to work with immutable types, so I'll change the return type to List[Int]. Then you can simply do:
def printNTimes(items: List[Int], n: Int): List[Int] = {
items.flatMap(i => Vector.fill(n)(i))
}
or:
def printNTimes(items: List[Int], n: Int): List[Int] = {
items.flatMap(Vector.fill(n)(_))
}
Then running:
println(printNTimes(List(1,2,4), 3))
will output:
List(1, 1, 1, 2, 2, 2, 4, 4, 4)
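If you prefer to keep the shape of the original loop, a for comprehension works too: with a yield it desugars to items.flatMap(item => (1 to n).map(_ => item)), so it stays immutable. A small sketch (Replicate and replicate are hypothetical names):

```scala
object Replicate {
  // Desugars to items.flatMap(item => (1 to n).map(_ => item)); no mutable buffer needed
  def replicate[A](items: List[A], n: Int): List[A] =
    for {
      item <- items
      _ <- 1 to n
    } yield item
}
```

Replicate.replicate(List(1, 2, 4), 3) gives List(1, 1, 1, 2, 2, 2, 4, 4, 4), and any n <= 0 gives an empty list.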
This question was inspired by the "Extract numbers from String Array" question.
Consider we have a List of arbitrary alphabetic and numeric strings:
val ls = List("The", "first", "one", "is", "11", "the", "second", "is", "22")
The goal is to form a list of numbers extracted from the original list: val nums: List[Int] = List(11, 22)
There are two different approaches possible (AFAIK):
Using Try construct:
val nums = ls.flatMap(s => Try(s.toInt).toOption)
This solution looks concise, but it adds a huge overhead for handling exceptions.
Using matches method:
val nums = ls.filter(_.matches("\\d+")).map(_.toInt)
Here the most time-consuming part is regexp matching.
Which one performs better?
From my point of view, using the exception mechanism for such a simple operation is like "using a sledgehammer to crack a nut".
I highly recommend testing this yourself; you can learn a lot! Start the Scala REPL:
scala> import scala.util.Try
import scala.util.Try
< import printTime function from our repo >
scala> val list = List("The", "first", "one", "is", "11", "the", "second", "is", "22")
list: List[String] = List(The, first, one, is, 11, the, second, is, 22)
scala> var x: List[Int] = Nil
x: List[Int] = List()
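The printTime helper from the repo is not shown. If you want to follow along, a minimal hypothetical stand-in based on System.nanoTime could look like this; it is not the repo's version (it always prints milliseconds and does no JIT warm-up), and Timing is an invented name:

```scala
object Timing {
  // Runs `block` once, prints the elapsed wall-clock time, and returns the block's result
  def printTime[A](block: => A): A = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"time: $elapsedMs%.3fms")
    result
  }
}
```

After import Timing._ in the REPL, calls such as printTime(f1(list, 100000)) work as written below.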
OK, the environment is set up. Here's your first function (Try):
scala> def f1(l: List[String], n: Int) = {
var i = 0
while (i < n) {
x = l.flatMap(s => Try(s.toInt).toOption)
i += 1
}
}
f1: (l: List[String], n: Int)Unit
The second function (regex):
scala> def f2(l: List[String], n: Int) = {
var i = 0
while (i < n) {
x = l.filter(_.matches("\\d+")).map(_.toInt)
i += 1
}
}
f2: (l: List[String], n: Int)Unit
Timings:
scala> printTime(f1(list, 100000)) // Try
time: 4.152s
scala> printTime(f2(list, 100000)) // regex
time: 565.107ms
Well, we've learned that handling exceptions inside a flatMap is a very inefficient way to do things. This is largely because every failed s.toInt throws a NumberFormatException, and constructing an exception captures a stack trace, which is expensive; the flatMap over Options also does extra allocation and boxing. Regex is ~7x faster! But... is regex fast?
scala> def f3(l: List[String], n: Int) = {
var i = 0
while (i < n) {
x = l.filter(_.forall(_.isDigit)).map(_.toInt)
i += 1
}
}
f3: (l: List[String], n: Int)Unit
scala> printTime(f3(list, 100000)) // isDigit
time: 70.960ms
Replacing the regex with per-character isDigit checks gave us another ~8x improvement. The lesson here: avoid using exceptions for ordinary control flow in hot code, avoid regex when a simple character check will do, and don't be afraid to write performance comparisons!
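One caveat about f3: "".forall(_.isDigit) is true, and a long digit string such as "12345678901" passes the check but does not fit in an Int, so toInt would throw a NumberFormatException on inputs like those. A minimal sketch that keeps the exception-free fast path while guarding both cases, assuming only non-negative numbers of at most nine digits need to be extracted (SafeParse and extractInts are hypothetical names):

```scala
object SafeParse {
  // 1 to 9 ASCII digits always fit in an Int (max 999,999,999), so toInt cannot throw here
  private def isSmallNonNegativeInt(s: String): Boolean =
    s.nonEmpty && s.length <= 9 && s.forall(c => c >= '0' && c <= '9')

  def extractInts(strings: List[String]): List[Int] =
    strings.filter(isSmallNonNegativeInt).map(_.toInt)
}
```

The nine-digit bound is deliberately conservative, so ten-digit values such as 1000000000 are skipped. On Scala 2.13 and later, StringOps.toIntOption is another option that returns None instead of throwing: ls.flatMap(_.toIntOption).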
I am trying to match mathematical operations using a match expression in Scala. The function should accept any string like "5+2", "log10" or "10^5". However, the match keeps failing for each individual type of expression:
def isValid(expression:String):Boolean={
val number = """((\-|\+)?[0-9]+\.?[0-9])*"""
val operation = """([\+,\-,*,/,C,P])"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val powers = """\^"""+number
val arithmeticExpression = (number + operation + number).r
val functionExpression = (functions + number).r
val powerOperation = (number + powers).r
val stringToTest: Regex = ("""(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerOperation+""")""").r
expression match {
case arithmeticExpression(s) => true
case functionExpression(s) => true
case powerOperation(s)=>true
case _ => false
}
}
println(isValid("1+4").toString)
However, if I match against the combined expression, I get the expected output:
def isValid(expression:String):Boolean={
val number = """(\-|\+)?[0-9]+\.?[0-9]*"""
val operation = """[\+,\-,*,/,C,P]"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val power = """\^"""+number
val arithmeticExpression = number+operation+number
val functionExpression = functions+number
val powerExpression = number+power
val validExpression = """(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerExpression+""")"""
validExpression.r.findFirstIn(expression) match {
case Some(`expression`) => true
case None => false
}
}
You're not doing numbers correctly:
scala> arithmeticExpression.findFirstIn("1+4")
res2: Option[String] = Some(+)
scala> arithmeticExpression.unapplySeq("1+4")
res3: Option[List[String]] = None
scala> arithmeticExpression.unapplySeq("11+14")
res4: Option[List[String]] = Some(List(11, null, +, 14, null))
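That's because your number pattern requires at least two digits: [0-9]+ needs one digit and the trailing [0-9] (which, unlike \.?, is not optional) needs another, so "1" can never be a number while "11" can. Note also that arithmeticExpression has five capturing groups, so a one-variable pattern such as arithmeticExpression(s) cannot match even "11+14".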
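The capturing groups "()" in the number pattern were affecting the result: a Regex used as an extractor, such as arithmeticExpression(s), only matches when the pattern has exactly as many capturing groups as the case has variables. Also, \^ needed to be wrapped in () so that powerOperation has exactly one group. This ended up working for me: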
def isValid(expression:String):Boolean={
val number = """[\-,\+]?[0-9]+\.?[0-9]*"""
val operation = """([\+,\-,*,/,C,P])"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val powers = """(\^)"""+number
val arithmeticExpression = (number + operation + number).r
val functionExpression = (functions + number).r
val powerOperation = (number + powers).r
val stringToTest: Regex = ("""(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerOperation+""")""").r
expression match {
case arithmeticExpression(s) => {
println("Arithmetic Match")
true
}
case functionExpression(s) => {
println("Function Match")
true
}
case powerOperation(s)=>{
println("Power Match")
true
}
case _ => false
}
}
Thanks for the help!
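A Scala Regex used as an extractor in a case must match the whole input, and a pattern like arithmeticExpression(s) only succeeds when the regex has exactly one capturing group per variable. A sketch that sidesteps the group-count problem uses non-capturing groups (?:...) and the sequence wildcard r(_*), which matches whatever the group count is. ExprValidator is a hypothetical name, and the number pattern is an assumption: unlike the patterns above, it requires at least one digit after a decimal point and drops the commas that the original operator class would also accept.

```scala
object ExprValidator {
  // Non-capturing groups (?:...) keep every pattern free of capture groups
  private val number = """[-+]?[0-9]+(?:\.[0-9]+)?"""
  private val operation = """[-+*/CP]"""
  private val functions = """(?:log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""

  private val arithmetic = (number + operation + number).r
  private val function = (functions + number).r
  private val power = (number + """\^""" + number).r

  // r(_*) succeeds on any full match, regardless of how many groups r has
  def isValid(expression: String): Boolean = expression match {
    case arithmetic(_*) => true
    case function(_*)   => true
    case power(_*)      => true
    case _              => false
  }
}
```

With these definitions, "1+4", "log10" and "10^5" are accepted, while "1+" and "hello" are rejected.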
I would like to know how I can split a string using more than one delimiter with Scala.
For instance, if I have a list of delimiters:
List("Car", "Red", "Boo", "Foo")
and a string to harvest:
Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed
I would like to be able to output something like:
List( ("Car", " foerjfpoekrfopekf "),
("Red", " ezokdpzkdpoedkzopke dekpzodk "),
("Foo", " azdkpodkzed")
)
You can use the list to create a regular expression and use its split method:
val regex = List("Car", "Red", "Boo", "Foo").mkString("|").r
regex.split("Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed")
That, however, doesn't tell you which delimiter was used where. If you need that, I suggest you try Scala's parser combinator library.
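If you want to keep each delimiter with its piece without writing a parser, one option is to let the regex report where each delimiter occurs, using findAllMatchIn, and slice the text between consecutive matches. A minimal sketch (MultiSplit and splitWithDelimiters are hypothetical names; any text before the first delimiter is dropped):

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object MultiSplit {
  def splitWithDelimiters(s: String, delimiters: List[String]): List[(String, String)] = {
    // Quote each delimiter so characters such as '.' or '+' are matched literally
    val regex: Regex = delimiters.map(Pattern.quote).mkString("|").r
    val matches = regex.findAllMatchIn(s).toList
    // Each piece stops where the next delimiter starts, or at the end of the string
    val stops = matches.drop(1).map(m => m.start) :+ s.length
    matches.zip(stops).map { case (m, stop) => (m.matched, s.substring(m.end, stop)) }
  }
}
```

Because every occurrence is found, a repeated delimiter (for example a second "Car" near the end) produces its own pair.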
EDIT:
Or you can use regular expressions to extract one pair at a time like this:
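def split(s: String, l: List[String]): List[(String, String)] = {
// quote each delimiter so regex metacharacters in it are matched literally
val delimRegex = l.map(java.util.regex.Pattern.quote).mkString("|")
// groups: 1 = delimiter, 2 = text up to the next delimiter, 3 = the rest (optional), 4 = next delimiter
val r = "(" + delimRegex + ")(.*?)((" + delimRegex + ").*)?"
val R = r.r
s match {
// group 3 is null when no further delimiter follows
case R(delim, text, rest, _) => (delim, text) :: (if (rest == null) Nil else split(rest, l))
case _ => Nil
}
}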
A bit verbose, but it works:
DEPRECATED VERSION (it has a bug; I left it here because you already accepted the answer):
def f(s: String, l: List[String], g: (String, List[String]) => Int) = {
for {
t <- l
if (s.contains(t))
w = s.drop(s.indexOf(t) + t.length)
} yield (t, w.dropRight(w.length - g(w, l)))
}
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def g(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), g(s, xs))
}
val l = List("Car", "Red", "Boo", "Foo")
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed"
output:
f(s, l, g).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
It returns a List[(String, String)], because the for comprehension iterates over the List l, so no conversion is needed.
EDIT:
I just noticed this code only works if each delimiter appears at most once in the string. If I had defined s as follows:
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed Car more..."
I'd still get the same result instead of an additional pair ("Car", " more...").
EDIT #2: BUG-FREE VERSION. Here's the fixed snippet:
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def multiSplit(str: String, delimiters: List[String]): List[(String, String)] = {
val del = nextDelimiter(str, delimiters)
del match {
case (None, _) => Nil
case (Some(x), i) => {
// skip any text before the delimiter (at index i), then the delimiter itself
val tmp = str.drop(i + x.length)
val current = tmp.take(nextDelIndex(tmp, delimiters))
(x, current) :: multiSplit(tmp.drop(current.length), delimiters)
}
}
}
def nextDelIndex(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), nextDelIndex(s, xs))
}
def nextDelimiter(str: String, delimiters: List[String]): (Option[String], Int) = delimiters match {
case Nil => (None, -1)
case x :: xs => {
val next = nextDelimiter(str, xs)
if (str.contains(x)) {
val i = str.indexOf(x)
next._1 match {
case None => (Some(x), i)
case _ => if (next._2 < i) next else (Some(x), i)
}
} else next
}
}
output:
multiSplit(s, l).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
> (Car, more...)
and now it works :)
I'm looking for the best and most elegant way to write a GA crossover operator in functional Scala (no "for" loop, and only immutable types if possible), for example with these sequences:
val A = IndexedSeq (5,4,8)
val B = IndexedSeq (3,2,6)
I want to randomly swap the elements at each position of the two IndexedSeqs (one coin flip per position, with rng.nextBoolean for example), so that I finally get the two lists A' and B'.
Example of execution:
rng.nextBoolean <- (true,false,true)
A' = 3,4,6
B' = 5,2,8
Thanks.
def crossover[T](a: Seq[T], b: Seq[T], rs: Seq[Boolean]) =
(a, b, rs).zipped.map((x, y, z) => if (z) Seq(x, y) else Seq(y, x)).transpose
Use it with a sequence of Booleans as the third argument (note that here true keeps a pair in place and false swaps it):
scala> val Seq(a1, b1) = crossover(A, B, List(true, false, true))
a1: Seq[Int] = Vector(5, 2, 8)
b1: Seq[Int] = Vector(3, 4, 6)
If you want it with a default sequence of Booleans, you could provide a default argument like this:
def crossover[T](a: Seq[T], b: Seq[T], rs: Seq[Boolean] = {
val rng = new util.Random
Stream.continually(rng.nextBoolean) }) =
(a, b, rs).zipped.map((x, y, z) => if (z) Seq(x, y) else Seq(y, x)).transpose
Wow, where's all this code coming from? Here:
val (a1, b1) = A zip B map (t => if (util.Random.nextBoolean) t.swap else t) unzip
There, that's all.
If you already have a list of random booleans C (true meaning swap), you can do this:
val (a1, b1) = A zip B zip C map { case (t, flag) => if (flag) t.swap else t } unzip
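The same zip/unzip idea wrapped in a reusable function, as a sketch: the flags follow the question's convention (true swaps the genes at that position), and Crossover and randomCrossover are hypothetical names.

```scala
object Crossover {
  // flags(i) == true swaps the genes at position i; zip stops at the shortest input
  def crossover[T](a: Seq[T], b: Seq[T], flags: Seq[Boolean]): (Seq[T], Seq[T]) =
    a.zip(b).zip(flags).map { case (pair, doSwap) => if (doSwap) pair.swap else pair }.unzip

  // Same operator with one coin flip per position drawn from the given Random
  def randomCrossover[T](a: Seq[T], b: Seq[T], rng: scala.util.Random): (Seq[T], Seq[T]) =
    crossover(a, b, Seq.fill(math.min(a.size, b.size))(rng.nextBoolean()))
}
```

Crossover.crossover(A, B, List(true, false, true)) gives (3, 4, 6) and (5, 2, 8), matching the example in the question. Passing a seeded Random to randomCrossover makes runs reproducible.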
import scala.util.Random
val A = IndexedSeq(5,4,8)
val B = IndexedSeq(3,2,6)
def crossover[T](rng: Random)(a: Seq[T], b: Seq[T]): (Seq[T], Seq[T]) =
// stop as soon as either parent is exhausted (avoids calling .tail on an empty Seq)
if (a.isEmpty || b.isEmpty) (Nil, Nil)
else {
val (aTailCrossover, bTailCrossover) = crossover(rng)(a.tail, b.tail)
// on a coin flip, swap the heads of the two parents
if (rng.nextBoolean) (b.head +: aTailCrossover, a.head +: bTailCrossover)
else (a.head +: aTailCrossover, b.head +: bTailCrossover)
}
println(crossover(new Random)(A,B))
def rndCombi [T] (a: Seq[T], b: Seq[T]): Seq[T] = {
if (a.size != b.size) sys.error ("sizes don't match: a:" + a.size + " != b: " + b.size)
val rnd = util.Random
val max = (math.pow (2, a.size)).toInt
val r = rnd.nextInt (max)
def pick (a: Seq[T], b: Seq[T], r: Int) : List[T] = {
if (a.size == 0) Nil else
if (r % 2 == 0) a.head :: pick (a.tail , b.tail, r/2) else
b.head :: pick (a.tail , b.tail, r/2)
}
// print all combinations for testing:
// (0 until max).map (i => println (pick (a, b, i).mkString ("-")))
pick (a, b, r).toSeq
}
// I chose different values for easy testing:
val a = IndexedSeq (7, 8, 9)
val b = IndexedSeq (1, 2, 3)
println (rndCombi (a, b).mkString (" "))
// sizes differ, so this call throws a RuntimeException via sys.error
println (rndCombi (a, b.tail).mkString (" "))
Note that util.Random here refers to the shared singleton object, so nothing is re-initialized on each call; for reproducible runs in production code you would rather pass in a seeded Random instance.
If you don't restrict the input to 2 sequences, it gets more interesting. Here we go:
def rndCombi [T] (s: Seq[Seq[T]]): Seq[T] = {
val outer = s.size
val inner = s(0).size
val rnd = util.Random
val max = (math.pow (outer, inner)).toInt
val r = rnd.nextInt (max)
def pick (s: Seq[Seq[T]], r: Int, pos: Int = 0) : List[T] =
if (pos == inner) Nil
// r is read as a base-`outer` number: each digit picks the source sequence for one position
else s(r % outer)(pos) :: pick (s, r/outer, pos + 1)
// print all combinations for testing:
(0 until max).map (i => println (pick (s, i).mkString ("-")))
println ()
pick (s, r).toSeq
}
val a = IndexedSeq (1, 2, 3)
val b = IndexedSeq (4, 5, 6)
val c = IndexedSeq (7, 8, 9)
println (rndCombi (Seq (a, b, c)).mkString (" "))
The second solution can, of course, be used for 2 sequences as well.
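Encoding the whole choice in one number r in [0, outer^inner) stops working once outer^inner no longer fits in an Int (math.pow(...).toInt then clamps to Int.MaxValue, so some combinations can never be drawn). An alternative sketch with the same per-position uniform distribution draws one parent index per position instead; MultiCrossover is a hypothetical name:

```scala
import scala.util.Random

object MultiCrossover {
  // For each position, copy the gene from a parent chosen uniformly at random
  def rndCombi[T](parents: Seq[Seq[T]], rng: Random): Seq[T] = {
    require(parents.nonEmpty, "need at least one parent")
    require(parents.forall(_.size == parents.head.size), "all parents must have the same length")
    parents.head.indices.map(pos => parents(rng.nextInt(parents.size))(pos))
  }
}
```

This works for any number of parents and any genome length, and like the versions above it can be used for 2 sequences as well.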