Matching mathematical operations in Scala using regex

I am trying to match mathematical operations using a pattern match in Scala, so that the function accepts any string like "5+2", "log10" or "10^5". However, the match keeps failing for the individual types of expressions:
def isValid(expression:String):Boolean={
val number = """((\-|\+)?[0-9]+\.?[0-9])*"""
val operation = """([\+,\-,*,/,C,P])"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val powers = """\^"""+number
val arithmeticExpression = (number + operation + number).r
val functionExpression = (functions + number).r
val powerOperation = (number + powers).r
val stringToTest: Regex = ("""(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerOperation+""")""").r
expression match {
case arithmeticExpression(s) => true
case functionExpression(s) => true
case powerOperation(s)=>true
case _ => false
}
}
println(isValid("1+4").toString)
However if I match for a general expression I get the expected output:
def isValid(expression:String):Boolean={
val number = """(\-|\+)?[0-9]+\.?[0-9]*"""
val operation = """[\+,\-,*,/,C,P]"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val power = """\^"""+number
val arithmeticExpression = number+operation+number
val functionExpression = functions+number
val powerExpression = number+power
val validExpression = """(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerExpression+""")"""
validExpression.r.findFirstIn(expression) match {
case Some(`expression`) => true
// also covers a partial match, which would otherwise throw a MatchError
case _ => false
}
}

You're not doing numbers correctly:
scala> arithmeticExpression.findFirstIn("1+4")
res2: Option[String] = Some(+)
scala> arithmeticExpression.unapplySeq("1+4")
res3: Option[List[String]] = None
scala> arithmeticExpression.unapplySeq("11+14")
res4: Option[List[String]] = Some(List(11, null, +, 14, null))
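That is because the number pattern requires at least two digits ([0-9]+ followed by a mandatory [0-9]), so single-digit operands such as 1 and 4 do not match.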
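A minimal sketch of one way to repair the number pattern, assuming the intent is "optional sign, digits, optional fraction". The object name ArithmeticCheck and the method isArithmetic are made up for illustration, and the operator class drops the commas from the original character class on the assumption that they were meant as separators rather than operators. The number is wrapped in a non-capturing group so the regex keeps exactly one capture group, which is what a one-argument extractor pattern expects.

```scala
object ArithmeticCheck {
  // Optional sign, at least one digit, optional fraction; (?: ... ) does not capture.
  private val number = """(?:[-+]?[0-9]+\.?[0-9]*)"""
  // The only capturing group: the operator between the two operands.
  private val operation = """([-+*/CP])"""
  private val arithmeticExpression = (number + operation + number).r

  // The extractor pattern matches only if the whole string matches the regex
  // and the regex has exactly one capturing group.
  def isArithmetic(expression: String): Boolean = expression match {
    case arithmeticExpression(_) => true
    case _ => false
  }
}
```

With this pattern, single-digit operands such as "1+4" are accepted, while an incomplete expression such as "1+" is rejected.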

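The "()" in the regular expression for numbers was affecting the result: every pair of parentheses adds a capture group, and a pattern such as arithmeticExpression(s) only matches when the regex has exactly one group. Also, \^ needed to be wrapped in () so that the power regex has a group to bind. This ended up working for me: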
def isValid(expression:String):Boolean={
val number = """[\-,\+]?[0-9]+\.?[0-9]*"""
val operation = """([\+,\-,*,/,C,P])"""
val functions = """(log|ln|sin|cos|tan|arc sin|arc cos|arc tan|sec|csc|cot)"""
val powers = """(\^)"""+number
val arithmeticExpression = (number + operation + number).r
val functionExpression = (functions + number).r
val powerOperation = (number + powers).r
val stringToTest: Regex = ("""(""" +arithmeticExpression+"""|"""+functionExpression+"""|"""+powerOperation+""")""").r
expression match {
case arithmeticExpression(s) => {
println("Arithmetic Match")
true
}
case functionExpression(s) => {
println("Function Match")
true
}
case powerOperation(s)=>{
println("Power Match")
true
}
case _ => false
}
}
Thanks for the help!

Related

How to use scala.util.matching.Regex correctly?

This may look obvious, but I can't explain the "No match available" error. Below is the definition of a simple matching function I am using.
The same instructions run without an issue outside the function, but calling the function raises the error. Can you help me pinpoint the mistake?
import scala.util.matching.Regex
def regexParsing(inputRecord:String, inputRegex:String, listOfFields:Seq[String], fieldsToRemove:Seq[String]): scala.collection.Map[String,Any] = {
val logPattern = new Regex(inputRegex, listOfFields:_*)
val result = logPattern.findAllIn(inputRecord)
val resultMap = result.groupNames.map(a => Map(a.toString -> result.group(a))).reduce(_++_)
return resultMap
}
val inputRecord = """s2222f"""
val inputRegex = """(.*)"""
val listOfFields = Seq("field")
val fieldsToRemove = Seq("field1", "field2")
// working
val logPattern = new Regex(inputRegex, listOfFields:_*)
val result = logPattern.findAllIn(inputRecord)
val resultMap = result.groupNames.map(a => Map(a.toString -> result.group(a))).reduce(_++_)
// not working
regexParsing(inputRecord, inputRegex, listOfFields, fieldsToRemove)
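Try 2.12? Before 2.12, a MatchIterator had to be advanced (with hasNext or next) before group could be called; otherwise it threw "No match available". That restriction is a gotcha in the API that was finally addressed. Your top-level code most likely works only because the REPL prints result (the "non-empty iterator" line), which calls hasNext for you.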
$ scala
Welcome to Scala 2.12.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
Type in expressions for evaluation. Or try :help.
scala> :pa
// Entering paste mode (ctrl-D to finish)
import scala.util.matching.Regex
def regexParsing(inputRecord:String, inputRegex:String, listOfFields:Seq[String], fieldsToRemove:Seq[String]): scala.collection.Map[String,Any] = {
val logPattern = new Regex(inputRegex, listOfFields:_*)
val result = logPattern.findAllIn(inputRecord)
val resultMap = result.groupNames.map(a => Map(a.toString -> result.group(a))).reduce(_++_)
return resultMap
}
val inputRecord = """s2222f"""
val inputRegex = """(.*)"""
val listOfFields = Seq("field")
val fieldsToRemove = Seq("field1", "field2")
// working
val logPattern = new Regex(inputRegex, listOfFields:_*)
val result = logPattern.findAllIn(inputRecord)
val resultMap = result.groupNames.map(a => Map(a.toString -> result.group(a))).reduce(_++_)
// Exiting paste mode, now interpreting.
import scala.util.matching.Regex
regexParsing: (inputRecord: String, inputRegex: String, listOfFields: Seq[String], fieldsToRemove: Seq[String])scala.collection.Map[String,Any]
inputRecord: String = s2222f
inputRegex: String = (.*)
listOfFields: Seq[String] = List(field)
fieldsToRemove: Seq[String] = List(field1, field2)
logPattern: scala.util.matching.Regex = (.*)
result: scala.util.matching.Regex.MatchIterator = non-empty iterator
resultMap: scala.collection.immutable.Map[String,String] = Map(field -> s2222f)
scala> regexParsing(inputRecord, inputRegex, listOfFields, fieldsToRemove)
res0: scala.collection.Map[String,Any] = Map(field -> s2222f)
scala> :quit
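If upgrading is not an option, a version-independent sketch is to avoid the MatchIterator altogether and work with an explicit Match from findFirstMatchIn. The object name RegexParsing and the method regexParsingSafe are hypothetical names for this example, and the unused fieldsToRemove parameter from the question is left out.

```scala
import scala.util.matching.Regex

object RegexParsing {
  // Returns the named groups of the first match, or an empty map if nothing matches.
  def regexParsingSafe(inputRecord: String,
                       inputRegex: String,
                       listOfFields: Seq[String]): Map[String, String] = {
    val logPattern = new Regex(inputRegex, listOfFields: _*)
    logPattern.findFirstMatchIn(inputRecord) match {
      // A Match is already positioned on a successful match, so group is safe to call.
      case Some(m) => listOfFields.map(name => name -> m.group(name)).toMap
      case None    => Map.empty
    }
  }
}
```

Because there is no iterator state to advance, the result does not depend on whether anything happened to call hasNext beforehand.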

Scala: Find all strings of length up to n in regular language

I have a (possibly infinite) regular language which I describe with a regular expression. From this regular language I want to obtain all strings of length up to n, using scala. Some quick googling tells me there are some libraries out there that can help me. Before using an external library I want to know if this is something that is easy (as in something a decent programmer can implement in under 15 minutes) to do myself in Scala. If not, are there some good libraries that you can recommend for this?
To make what I want more concrete. Suppose I have the language A*B* and my n is 3, I then want the strings "", "A", "B", "AA", "AB", "BB", "AAA", "AAB", "ABB", "BBB".
Answer
Edits
26Nov, 4:30pm - added iterator-based version to reduce runtime and memory consumption. Seq-based version of canonic is at the bottom under (1)
26Nov, 2:45pm - added working seq-based version for canonic, non working old version of canonic is at the bottom (2)
Approach
Canonically generate all words possible for a given alphabet up to length n.
Filter the generated words by a regular expression (your regular language in that case)
Code
object SO {
import scala.annotation.tailrec
import scala.collection.{AbstractIterator, Iterator}
import scala.util.matching.Regex
def canonic(alphabet: Seq[Char], n: Int): Iterator[String] =
if (n < 0) Iterator.empty
else {
val r: IndexedSeq[Iterator[String]] = for (i <- 0 to n) // start at 0 so the empty word is included and n = 0 does not reduce an empty sequence
yield new CanonicItr(alphabet, i)
r.reduce(_ ++ _)
}
private class CanonicItr(alphabet: Seq[Char], width: Int) extends AbstractIterator[String] {
val aSize = alphabet.size
val alph = alphabet.toVector
val total = aSizePower(width)
println("total " + total)
private var pos = 0L
private def aSizePower(r: Int): Long = scala.math.pow(aSize, r).toLong
def stringFor(id: Long): String = {
val r = for {
i <- (0 until width).reverse
// (738 / 10^0) % 10 = 8
// (738 / 10^1) % 10 = 3
// (738 / 10^2) % 10 = 7
charIdx = ((id / (aSizePower(i))) % aSize).toInt
} yield alph(charIdx)
r.mkString("")
}
override def hasNext: Boolean = pos < total
override def next(): String = {
val s = stringFor(pos)
pos = pos + 1
s
}
}
def main(args: Array[String]): Unit = {
// create all possible words with the given alphabet
val canonicWordSet = canonic(Seq('a', 'b', 'c'), 8)
// formal regular language definition
val languageDef: Regex = "a*b".r
// determine words of language by filtering the canonical set.
val wordsOfLanguage = canonicWordSet.filter(word => languageDef.pattern.matcher(word).matches)
println(wordsOfLanguage.toList)
}
}
1) Working version of canonic but with high memory requirements
object SO {
import scala.annotation.tailrec
import scala.util.matching.Regex
/**
* Given a sequence of characters (e.g. Seq('a', 'b', 'c') )
* generates all combinations up to length of n (incl.).
*
* @param alphabet sequence of characters
* @param n is the max length
* @return all combinations of up to length n.
*/
def canonic(alphabet:Seq[Char], n: Int): Seq[String] = {
def combination( input: Seq[String], chars: Seq[Char]) = {
for {
i <- input
c <- chars
} yield (i+c)
}
@tailrec
def rec(left: Int, current: Seq[String], accum: Seq[String] ) : Seq[String] = {
left match {
case 0 => accum
case _ => {
val next = combination( current, alphabet )
rec( left-1, next, accum ++ next )
}
}
}
rec(n, Seq(""), Seq(""))
}
def main(args: Array[String]) : Unit = {
// create all possible words with the given alphabet
val canonicWordSet= canonic( Seq('a', 'b', 'c'), 3)
// formal regular language definition
val languageDef: Regex = "a*b".r
// determine words of language by filtering the canonical set.
val wordsOfLanguage = canonicWordSet.filter( word => languageDef.pattern.matcher(word).matches )
println( wordsOfLanguage.toList )
}
}
2) Old version of canonic that does not work correctly
def canonic(alphabet:Seq[Char], n: Int): Iterator[String] = {
for {
i <- (0 to n).iterator
combi <- alphabet.combinations(i).map(cs => cs.mkString)
} yield combi
}
I have not completely understood your meaning, is this OK?
scala> def generator(chars: Seq[Char], n: Int): Iterator[String] =
| (0 to n).iterator flatMap (i => (chars flatMap (_.toString*i) mkString) combinations i)
generator: (chars: Seq[Char], n: Int)Iterator[String]
scala>
scala> generator("AB", 3) toList
res0: List[String] = List("", A, B, AA, AB, BB, AAA, AAB, ABB, BBB)
scala> generator("ABC", 3) toList
res1: List[String] = List("", A, B, C, AA, AB, AC, BB, BC, CC, AAA, AAB, AAC, ABB, ABC, ACC, BBB, BBC, BCC, CCC)
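For comparison, the generate-and-filter approach from the first answer can be sketched in a few lines with lazy iterators. Note that combinations only yields letters in alphabet order, which happens to fit A*B* but not an arbitrary regular language. The names BoundedLanguage, wordsOfLength and wordsUpTo are made up for this example, and the sketch assumes the alphabet is known and small, since it enumerates every word of each length before filtering.

```scala
import scala.util.matching.Regex

object BoundedLanguage {
  // All words of exactly `len` characters over the alphabet, produced lazily.
  def wordsOfLength(alphabet: Seq[Char], len: Int): Iterator[String] =
    if (len <= 0) Iterator.single("")
    else wordsOfLength(alphabet, len - 1).flatMap(w => alphabet.iterator.map(c => w + c))

  // Words of length 0..n that the regex matches in full.
  def wordsUpTo(language: Regex, alphabet: Seq[Char], n: Int): Iterator[String] =
    (0 to n).iterator
      .flatMap(len => wordsOfLength(alphabet, len))
      .filter(w => language.pattern.matcher(w).matches())
}
```

Calling BoundedLanguage.wordsUpTo("A*B*".r, Seq('A', 'B'), 3).toList yields the ten strings listed in the question, ordered by length.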

How to fix the pattern-matching exhaustive warning?

Some scala code:
val list = List(Some("aaa"), Some("bbb"), None, ...)
list.filter(_!=None).map {
case Some(x) => x + "!!!"
// I don't want to handle `None` case since they are not possible
// case None
}
When I run it, the compiler complains:
<console>:9: warning: match may not be exhaustive.
It would fail on the following input: None
list.filter(_!=None).map {
^
res0: List[String] = List(aaa!!!, bbb!!!)
How to fix that warning without providing the case None line?
If you are using map after filter, you may want to use collect instead.
list collect { case Some(x) => x + "!!!" }
you can use flatten
scala> val list = List(Some("aaa"), Some("bbb"), None).flatten
list: List[String] = List(aaa, bbb)
scala> list.map {
| x => x + "!!!"
| }
res1: List[String] = List(aaa!!!, bbb!!!)
You could use the @unchecked annotation, although that requires some additional code:
list.filter(_!=None).map { x => (x: @unchecked) match {
case Some(x) => x + "!!!"
}
}
You can use get method instead of pattern matching.
Here is example code:
scala> val list = List(Some("aaa"), Some("bbb"), None)
list: List[Option[String]] = List(Some(aaa), Some(bbb), None)
scala> list.filter(_ != None).map(_.get + "!!!")
res0: List[String] = List(aaa!!!, bbb!!!)
some other way to solve this issue, without filter and pattern matching
scala> list.flatten map (_ + "!!!")
or
scala> list.flatMap (_ map (_ + "!!!"))

Scala: how to split using more than one delimiter

I would like to know how I can split a string using more than one delimiter with Scala.
For instance if I have a list of delimiters :
List("Car", "Red", "Boo", "Foo")
And a string to harvest :
Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed
I would like to be able to output something like :
List( ("Car", " foerjfpoekrfopekf "),
("Red", " ezokdpzkdpoedkzopke dekpzodk "),
("Foo", " azdkpodkzed")
)
You can use the list to create a regular expression and use its split method:
val regex = List("Car", "Red", "Boo", "Foo").mkString("|").r
regex.split("Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed")
That however doesn't tell you which delimiter was used where. If you need that, I suggest you try Scala's parser library.
EDIT:
Or you can use regular expressions to extract one pair at a time like this:
def split(s:String, l:List[String]):List[(String,String)] = {
val delimRegex = l.mkString("|")
val r = "("+delimRegex+")(.*?)(("+delimRegex+").*)?"
val R = r.r
s match {
case R(delim, text, rest, _) => (delim, text) :: split(rest, l)
case _ => Nil
}
}
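A non-recursive variant of the same idea, as a rough sketch: a lookahead lets findAllMatchIn walk the string one (delimiter, text) pair at a time. The names DelimiterSplit and splitWithDelimiters are hypothetical; quoting the delimiters with Pattern.quote is an added assumption so that delimiters containing regex metacharacters are treated literally.

```scala
import java.util.regex.Pattern

object DelimiterSplit {
  def splitWithDelimiters(s: String, delimiters: List[String]): List[(String, String)] =
    if (delimiters.isEmpty) Nil
    else {
      // Quote each delimiter so it is matched literally.
      val alternation = delimiters.map(d => Pattern.quote(d)).mkString("|")
      // Group 1: a delimiter. Group 2: the shortest text up to the next delimiter
      // or the end of input. (?s) lets '.' match line breaks as well.
      val segment = ("(?s)(" + alternation + ")(.*?)(?=" + alternation + "|\\z)").r
      segment.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList
    }
}
```

Text that appears before the first delimiter is skipped, and a delimiter that occurs several times produces one pair per occurrence.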
a bit verbose, but it works:
DEPRECATED VERSION: (it has a bug, left it here because you already accepted the answer)
def f(s: String, l: List[String], g: (String, List[String]) => Int) = {
for {
t <- l
if (s.contains(t))
w = s.drop(s.indexOf(t) + t.length)
} yield (t, w.dropRight(w.length - g(w, l)))
}
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def g(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), g(s, xs))
}
val l = List("Car", "Red", "Boo", "Foo")
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed"
output:
f(s, l, g).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
it returns Array[String] instead of list. but you can just as well do: f(s, l, g).toList
EDIT:
Just noticed that this code only works if each delimiter appears at most once in the string. If I had defined s as follows:
val s = "Car foerjfpoekrfopekf Red ezokdpzkdpoedkzopke dekpzodk Foo azdkpodkzed Car more..."
I'd still get the same result, instead of an additional pair ("Car"," more...").
EDIT#2: BUGLESS VERSION here's the fixed snippet:
def h(s: String, x: String) = if (s.contains(x)) s.indexOf(x) else s.length
def multiSplit(str: String, delimiters: List[String]): List[(String, String)] = {
val del = nextDelimiter(str, delimiters)
del._1 match {
case None => Nil
case Some(x) => {
val tmp = str.drop(x.length)
val current = tmp.dropRight(tmp.length - nextDelIndex(tmp,delimiters))
(x, current) :: multiSplit(str.drop(x.length + current.length), delimiters)
}
}
}
def nextDelIndex(s: String, l: List[String]): Int = l match {
case Nil => s.length
case x :: xs => math.min(h(s, x), nextDelIndex(s, xs))
}
def nextDelimiter(str: String, delimiters: List[String]): (Option[String], Int) = delimiters match {
case Nil => (None, -1)
case x :: xs => {
val next = nextDelimiter(str, xs)
if (str.contains(x)) {
val i = str.indexOf(x)
next._1 match {
case None => (Some(x), i)
case _ => if (next._2 < i) next else (Some(x), i)
}
} else next
}
}
output:
multiSplit(s, l).foreach(println)
> (Car, foerjfpoekrfopekf )
> (Red, ezokdpzkdpoedkzopke dekpzodk )
> (Foo, azdkpodkzed)
> (Car, more...)
and now it works :)

Scala: extracting a repeated value from a list

I often need to check whether many values are equal and, if so, extract the common value. That is, I need a function that works as follows:
extract(List()) // None
extract(List(1,2,3)) // None
extract(List(2,2,2)) // Some(2)
Assuming one has a pimp that will add tailOption to seqs (it is trivial to write one or there is one in scalaz), one implementation looks like
def extract[A](l: Seq[A]): Option[A] = {
def combine(s: A)(r: Seq[A]): Option[A] =
r.foldLeft(Some(s): Option[A]) { (acc, n) => acc flatMap { v =>
if (v == n) Some(v) else None
} }
for {
h <- l.headOption
t <- l.tailOption
res <- combine(h)(t)
} yield res
}
Is there something like that - possibly more general - already in Scalaz, or some simpler way to write it?
This seems like a really complicated way to write
def extract[A](l:Seq[A]):Option[A] = l.headOption.flatMap(h =>
if (l.tail.forall(_ == h)) Some(h) else None)
You don't need tailOption, since the anonymous function that gets passed as an argument to flatMap is only executed if l is not empty.
If you only want to delete duplicates toSet is enough:
def equalValue[A](xs: Seq[A]): Option[A] = {
val set = xs.toSet
if (set.size == 1) Some(set.head) else None
}
scala> equalValue(List())
res8: Option[Nothing] = None
scala> equalValue(List(1,2,3))
res9: Option[Int] = None
scala> equalValue(List(2,2,2))
res10: Option[Int] = Some(2)
This is a fluent solution
yourSeq.groupBy(x => x) match {case m if m.size==1 => Some(m.head._1); case _ => None}
You could use a map to count the number of occurrences of each element in the list and then return only those that occur more than once:
def extract[T](ts: Iterable[T]): Iterable[T] = {
var counter: Map[T, Int] = Map()
ts.foreach{t =>
val cnt = counter.get(t).getOrElse(0) + 1
counter = counter.updated(t, cnt)
}
counter.filter(_._2 > 1).map(_._1)
}
println(extract(List())) // List()
println(extract(List(1,2,3))) // List()
println(extract(List(2,2,2))) // List(2)
println(extract(List(2,3,2,0,2,3))) // List(2,3)
You can also use a foldLeft instead of foreach and use the empty map as the initial accumulator of foldLeft.
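A minimal sketch of that foldLeft variant, which removes the mutable var. The names RepeatedValues and repeated are made up for this example, and it returns a Set rather than an Iterable so that the result does not depend on map iteration order.

```scala
object RepeatedValues {
  // Elements that occur more than once in the input.
  def repeated[T](ts: Iterable[T]): Set[T] = {
    // Fold over the input, threading an immutable count map through each step.
    val counts = ts.foldLeft(Map.empty[T, Int]) { (acc, t) =>
      acc.updated(t, acc.getOrElse(t, 0) + 1)
    }
    counts.collect { case (t, n) if n > 1 => t }.toSet
  }
}
```

For example, RepeatedValues.repeated(List(2, 3, 2, 0, 2, 3)) gives Set(2, 3), and an input with no duplicates gives an empty set.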