I have a little helper method, which has to normalize some money values. Hence, I wrote some regular expressions, which should detect different ways of representing them. Strangely they only trigger if used with Regex.findAllIn(..), but not if used in a match case statement.
val result = extractAmount("23772.90")
def extractAmount(amountStr: String): BigDecimal = {
  val Plain = """^\d+$""".r
  val Dot = """^(\d+)\.(\d*)$""".r
  val Comma = """^(\d+),(\d*)$""".r
  val DotComma = """^(\d+)\.(\d+),(\d*)$""".r
  val CommaDot = """^(\d+),(\d+)\.(\d*)$""".r
  if (Dot.findAllIn(amountStr).hasNext)
    println(Dot.findAllIn(amountStr).next())
  amountStr match {
    case Plain(value) => new java.math.BigDecimal(value)
    case Dot(values) => new BigDecimal(s"${values(0)}.${values(1)}")
    case Comma(values) => new BigDecimal(s"${values(0)}.${values(1)}")
    case DotComma(values) => new BigDecimal(s"${values(0)}${values(1)}.${values(2)}")
    case CommaDot(values) => new BigDecimal(s"${values(0)}${values(1)}.${values(2)}")
    case _ => throw new RuntimeException(s"Money amount string -->${amountStr}<-- did not match any pattern.")
  }
}
Debugger output hitting Regex.findAllIn(..):
Debugger output not hitting the match case for Dot(values):
Also interesting might be following error message in the debugger:
Using scala version 2.11.8.
I am puzzled, for sure overlooking something obvious. Thankful for a hint.
Instead of doing e.g.
case Dot(values) => new BigDecimal(s"${values(0)}.${values(1)}")
rewrite the usage of your Regex extractors like this:
case Dot(a, b) => new BigDecimal(s"$a.$b")
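The number of arguments in each extractor must match the number of capture groups your regex contains (here: 2). Each argument is just a string that holds the content of a single group. The same applies to the other patterns: DotComma and CommaDot have three groups, and Plain has none as written, so case Plain(value) needs a group around \d+ (or use case Plain() and read amountStr directly).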
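Putting this together, the whole helper might look roughly like the sketch below. The wrapper object AmountParser is a made-up name for illustration, and two choices here are assumptions rather than part of the original code: the ^ and $ anchors are dropped because a Regex extractor already matches the entire input, and scala.math.BigDecimal is used throughout instead of mixing in java.math.BigDecimal.

```scala
object AmountParser {
  // A Regex used as an extractor must match the whole input,
  // so explicit ^ and $ anchors are not needed here.
  private val Plain    = """(\d+)""".r               // 123
  private val Dot      = """(\d+)\.(\d*)""".r        // 123.45
  private val Comma    = """(\d+),(\d*)""".r         // 123,45
  private val DotComma = """(\d+)\.(\d+),(\d*)""".r  // 1.234,56
  private val CommaDot = """(\d+),(\d+)\.(\d*)""".r  // 1,234.56

  // One pattern variable per capture group in each case.
  def extractAmount(amountStr: String): BigDecimal = amountStr match {
    case Plain(value)      => BigDecimal(value)
    case Dot(a, b)         => BigDecimal(s"$a.$b")
    case Comma(a, b)       => BigDecimal(s"$a.$b")
    case DotComma(a, b, c) => BigDecimal(s"$a$b.$c")
    case CommaDot(a, b, c) => BigDecimal(s"$a$b.$c")
    case _ =>
      throw new IllegalArgumentException(
        s"Money amount string -->$amountStr<-- did not match any pattern.")
  }
}
```

With this version, AmountParser.extractAmount("23772.90") goes through the Dot case, and "1.234,56" goes through DotComma.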
Related
I wrote some simple Scala code that checks whether an input is correctly formatted as HH:mm. I expect the code to result in an Array of valid strings. However, what I'm getting as a result is of type Any = Array(), which is problematic because when I try to print that result I get something like this:
[Ljava.lang.Object;@32a59591.
I guess it's a simple problem but being a Scala newbie I didn't manage to solve it even after a good few hours of googling and trial & error.
val scheduleHours = if (inputScheduleHours == "") {
  dbutils.notebook.exit(s"ERROR: Missing param value for schedule hours.")
}
else {
  val timePattern = """^((?:[0-30]?[0-9]|2[0-3]):[0-5][0-9])$""".r
  val inputScheduleHoursParsed = inputScheduleHours.split(";").map(_.trim)
  for (e <- inputScheduleHoursParsed) yield e match {
    case timePattern(e) => e.toString
    case _ => dbutils.notebook.exit(s"ERROR: Wrong param value for schedule hours: '${inputScheduleHours}'")
  }
}
The problem is that some branches return the result you want and others return dbutils.notebook.exit which (I think) returns Unit. Scala must pick a type for the result that is compatible with both Unit and Array[String], and Any is the only one that fits.
One solution is to add a compatible value after the calls to dbutils.notebook.exit, e.g.
val scheduleHours = if (inputScheduleHours == "") {
  dbutils.notebook.exit(s"ERROR: Missing param value for schedule hours.")
  Array.empty[String]
}
Then all the branches return Array[String] so that will be the type of the result.
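Another way to keep the result type precise is to separate validation from the exit call, so that no branch of the expression mixes a value with a side effect. The sketch below is only an illustration of that idea: the names ScheduleHours and parse are invented, it returns an Either instead of calling dbutils, and the hour pattern [01]?[0-9]|2[0-3] is an assumption about what the original [0-30]?[0-9] was meant to express.

```scala
object ScheduleHours {
  // Assumed intent: hours 0-23 with an optional leading zero, minutes 00-59.
  private val TimePattern = """(?:[01]?[0-9]|2[0-3]):[0-5][0-9]""".r

  // Left carries an error message, Right carries the validated entries.
  def parse(input: String): Either[String, Array[String]] =
    if (input.trim.isEmpty) Left("Missing param value for schedule hours.")
    else {
      val entries = input.split(";").map(_.trim)
      // matches() requires each entry to match the pattern in full.
      val allValid = entries.forall(e => TimePattern.pattern.matcher(e).matches())
      if (allValid) Right(entries)
      else Left(s"Wrong param value for schedule hours: '$input'")
    }
}
```

The caller can then pattern match on the result once, using the Array[String] in the Right case and passing the message to the notebook's exit call in the Left case.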
I am starting to learn Scala and want to use regular expressions to match a character from a string so I can populate a mutable map of characters and their value (String values, numbers etc) and then print the result.
I have looked at several answers on SO and gone over the Scala Docs but can't seem to get this right. I have a short Lexer class that currently looks like this:
class Lexer {
  private val tokens: mutable.Map[String, Any] = collection.mutable.Map()
  private def checkCharacter(char: Character): Unit = {
    val Operator = "[-+*/^%=()]".r
    val Digit = "[\\d]".r
    val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => tokens(c) = "Operator"
      case Digit(c) => tokens(c) = Integer.parseInt(c)
      case Other(c) => tokens(c) = "Other" // Temp value, write function for this
    }
  }
  def lex(input: String): Unit = {
    val inputArray = input.toArray
    for (s <- inputArray)
      checkCharacter(s)
    for((key, value) <- tokens)
      println(key + ": " + value)
  }
}
I'm pretty confused by the sort of strange method syntax, Operator(c), that I have seen being used to handle the value to match and am also unsure if this is the correct way to use regex in Scala. I think what I want this code to do is clear, I'd really appreciate some help understanding this. If more info is needed I will supply what I can
The official docs have lots of examples: https://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html. What might be confusing is the type of the regular expression and its use in pattern matching...
You can construct a regex from any string by using .r:
scala> val regex = "(something)".r
regex: scala.util.matching.Regex = (something)
Your regex becomes an object that has a few useful methods for finding matches, such as findAllIn.
In Scala it's idiomatic to use pattern matching for safe extraction of values, so the Regex class also has an unapplySeq method to support pattern matching. This makes it an extractor object. You can use it directly (not common):
scala> regex.unapplySeq("something")
res1: Option[List[String]] = Some(List(something))
or you can let Scala compiler call it for you when you do pattern matching:
scala> "something" match {
| case regex(x) => x
| case _ => ???
| }
res2: String = something
You might ask why exactly this return type on unapply/unapplySeq. The doc explains it very well:
The return type of an unapply should be chosen as follows:
If it is just a test, return a Boolean. For instance case even().
If it returns a single sub-value of type T, return an Option[T].
If you want to return several sub-values T1,...,Tn, group them in an optional tuple Option[(T1,...,Tn)].
Sometimes, the number of values to extract isn’t fixed and we would
like to return an arbitrary number of values, depending on the input.
For this use case, you can define extractors with an unapplySeq method
which returns an Option[Seq[T]]. Common examples of these patterns
include deconstructing a List using case List(x, y, z) => and
decomposing a String using a regular expression Regex, such as case
r(name, remainingFields @ _*) =>
In short, your regex might contain one or more groups, so the extractor has to return a list/Seq, and that has to be wrapped in an Option to comply with the extractor contract.
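As a small illustration of both styles, here is a sketch that binds each group by name and, alternatively, collects the trailing groups with a sequence wildcard. The object ExtractorDemo and its method names are invented for this example.

```scala
object ExtractorDemo {
  // Three capture groups: year, month, day.
  private val Date = """(\d{4})-(\d{2})-(\d{2})""".r

  // One sub-pattern per group; unwanted groups are discarded with _.
  def year(s: String): Option[String] = s match {
    case Date(y, _, _) => Some(y)
    case _             => None
  }

  // Skip the first group and collect all remaining groups as a sequence.
  def rest(s: String): Option[Seq[String]] = s match {
    case Date(_, others @ _*) => Some(others)
    case _                    => None
  }
}
```

For "2004-01-20", year yields Some("2004") and rest yields the two remaining groups, "01" and "20".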
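Using the regex in a pattern match is the right idea, but each pattern needs a capture group (the parentheses added below) so that case Operator(c) has something to bind to c. I would also map your function over the input array to avoid creating mutable maps. Perhaps something like this: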
class Lexer {
  private def getCharacterType(char: Character): Any = {
    val Operator = "([-+*/^%=()])".r
    val Digit = "([\\d])".r
    //val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => "Operator"
      case Digit(c) => Integer.parseInt(c)
      case _ => "Other" // Temp value, write function for this
    }
  }
  def lex(input: String): Unit = {
    val inputArray = input.toArray
    val tokens = inputArray.map(x => x -> getCharacterType(x))
    for((key, value) <- tokens)
      println(key + ": " + value)
  }
}
scala> val l = new Lexer()
l: Lexer = Lexer@60f662bd
scala> l.lex("a-1")
a: Other
-: Operator
1: 1
Struggling with my first (ever) Scala regex here. I need to see if a given String matches the regex: "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".
So, some examples:
animal<0,sega> => valid
animal<fizz,buzz> => valid
animAl<fizz,buzz> => illegal; animAl contains upper-case (and this is case-sensitive)
animal<fizz,3d> => valid
animal<,3d> => illegal; there needs to be something [a-zA-Z0-9]+ between '<' and ','
animal<fizz,> => illegal; there needs to be something [a-zA-Z0-9]+ between ',' and '>'
animal<fizz,%> => illegal; '%' doesn't match [a-zA-Z0-9]+
etc.
My best attempt so far:
val animalRegex = "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".r
animalRegex.findFirstIn("animal<fizz,buzz")
Unfortunately that's where I'm hitting a brick wall. findFirstIn and the other obvious methods available on animalRegex all return Option[String]. I was hoping to find something that returns a boolean, so something like:
val animalRegex = "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".r
if(animalRegex.matches("animal<fizz,buzz>")) {
val leftOperand : String = getLeftOperandSomehow(...)
val rightOperand : String = getRightOperandSomehow(...)
}
So I need the equivalent of Java's matches method, and then need a way to access the "left operand" (that is, the value of the first [a-zA-Z0-9]+ group, which in the current case is "fizz"), and then ditto for the right/second operand ("buzz"). Any ideas where I'm going awry?
To be able to extract the matched parts from your string, you'll need to add capture groups to your regex, like so (note the parentheses):
val animalRegex = "animal<([a-zA-Z0-9]+),([a-zA-Z0-9]+)>".r
Then, you can use Scala's pattern matching to check for a match and extract the operands from the string:
val str = "animal<fizz,3d>"
val result = str match {
  case animalRegex(op1,op2) => s"$op1, $op2"
  case _ => "Did not match"
}
In this example, result will contain "fizz, 3d"
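If you also want the boolean check from the question, one possible arrangement is to return the operands as an Option and derive the boolean from it. This is only a sketch: AnimalParser, operands and isValid are hypothetical names, not part of any library.

```scala
object AnimalParser {
  // Two capture groups: the left and right operand.
  private val AnimalRegex = "animal<([a-zA-Z0-9]+),([a-zA-Z0-9]+)>".r

  // Some((left, right)) when the whole string matches, None otherwise.
  def operands(s: String): Option[(String, String)] = s match {
    case AnimalRegex(left, right) => Some((left, right))
    case _                        => None
  }

  // Boolean equivalent of Java's String.matches for this pattern.
  def isValid(s: String): Boolean = operands(s).isDefined
}
```

Because the extractor has to match the entire string, inputs such as animal<,3d> or animal<fizz,%> are rejected without extra anchors.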
This has gotta be something stupid, but I'm wondering if someone can help me out here. The following regex pattern match within a case class match is not working as I would expect. Can someone provide some insight? Thanks.
object Confused {
  case class MyCaseClass(s: String)
  val WS = """\s*""".r
  def matcher(myCaseClass: MyCaseClass) = myCaseClass match {
    case MyCaseClass(WS(_)) => println("Found WS")
    case MyCaseClass(s) => println(s"Found >>$s<<")
  }
  def main(args: Array[String]): Unit = {
    val ws = " "
    matcher(MyCaseClass(ws))
  }
}
I would expect the first case in the pattern match to be the one that matches, but it is not.
This prints
Found >> <<
It should be:
val WS = """(\s*)""".r
You want to match a run of whitespace. As the Scala documentation puts it:
A regular expression is used to determine whether a string matches a pattern and, if it does, to extract or transform the parts that match.
To extract the matched parts in a pattern match, the regex needs capture groups, which means putting parentheses around the part of the pattern you want to bind. case WS(_) expects exactly one group, so a regex with no groups never matches it.
Example:
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2004-01-20" match {
  case date(year, month, day) => s"$year was a good year for PLs."
}
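Applied to the code from the question, the two workable variants can be sketched as follows: either add a group and keep the one-argument pattern, or keep the regex without groups and use a zero-argument pattern. The names WhitespaceDemo, Grouped and Ungrouped are made up for this illustration, and the methods return strings instead of printing so the result is easy to check.

```scala
object WhitespaceDemo {
  case class MyCaseClass(s: String)

  private val Grouped   = """(\s*)""".r // one capture group: match with Grouped(_)
  private val Ungrouped = """\s*""".r   // no capture groups: match with Ungrouped()

  def describe(c: MyCaseClass): String = c match {
    case MyCaseClass(Grouped(_)) => "Found WS"
    case MyCaseClass(s)          => s"Found >>$s<<"
  }

  def describeUngrouped(c: MyCaseClass): String = c match {
    case MyCaseClass(Ungrouped()) => "Found WS"
    case MyCaseClass(s)           => s"Found >>$s<<"
  }
}
```

Both functions return "Found WS" for MyCaseClass(" "), since in each case the number of sub-patterns equals the number of groups.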
val AlphabetPattern = "^([a-z]+)".r
def stringMatch(s: String) = s match {
  case AlphabetPattern() => println("found")
  case _ => println("not found")
}
If I try,
stringMatch("hello")
I get "not found", but I expected to get "found".
My understanding of the regex,
[a-z] = in the range of 'a' to 'z'
+ = one or more of the previous pattern
^ = starts with
So regex AlphabetPattern is "all strings that start with one or more alphabets in the range a-z"
Surely I am missing something, want to know what.
Replace case AlphabetPattern() with case AlphabetPattern(_) and it works. The extractor pattern takes a variable to which it binds the result. Here we discard it but you could use x or whatever.
edit: Further to Randall's comment below, if you check the docs for Regex you'll see that it has an unapplySeq rather than an unapply method, which means it takes multiple variables. If you have the wrong number, it won't match, rather like
list match { case List(a,b,c) => a + b + c }
won't match if list doesn't have exactly 3 elements.
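One more detail worth knowing, given the "starts with" reading in the question: a Regex used as an extractor has to match the entire string, not just a prefix. The sketch below shows the corrected match next to a prefix-only variant built with Regex's unanchored method; AlphabetDemo, Prefix and prefixMatch are hypothetical names for illustration.

```scala
object AlphabetDemo {
  private val AlphabetPattern = "^([a-z]+)".r
  // unanchored relaxes the extractor so the pattern only has to be found somewhere
  // in the input; the ^ in the regex still pins the match to the start.
  private val Prefix = AlphabetPattern.unanchored

  // Whole-string match: one sub-pattern for the one capture group.
  def stringMatch(s: String): String = s match {
    case AlphabetPattern(_) => "found"
    case _                  => "not found"
  }

  // Prefix match: returns the leading run of lower-case letters, if any.
  def prefixMatch(s: String): Option[String] = s match {
    case Prefix(letters) => Some(letters)
    case _               => None
  }
}
```

So stringMatch("hello") gives "found", stringMatch("hello123") gives "not found", and prefixMatch("hello123") gives Some("hello").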
There is an issue with the match statement. case AlphabetPattern() has no sub-patterns, while the regex "^([a-z]+)" has one capture group, so the first case never matches and the wildcard case _ is always taken. As an alternative to pattern matching, you can use one of the find methods in scala.util.matching.Regex to look for a match with the given Regex.
For example, using findFirstIn to find the first match of AlphabetPattern in a string.
scala> AlphabetPattern.findFirstIn("hello")
res0: Option[String] = Some(hello)
The stringMatch method using findFirstIn and a case statement:
scala> def stringMatch(s: String) = AlphabetPattern findFirstIn s match {
| case Some(s) => println("Found: " + s)
| case None => println("Not found")
| }
stringMatch: (s:String)Unit
scala> stringMatch("hello")
Found: hello