In SML, is it possible to have multiple patterns in one branch of a case expression?
For example, I have 4 arithmetic operators expressed as strings, "+", "-", "*", "/", and I want to print "PLUS MINUS" if it is "+" or "-" and "MULT DIV" if it is "*" or "/".
TL;DR: Is there some way I can simplify the following to use fewer cases?
case str of
"+" => print("PLUS MINUS")
| "-" => print("PLUS MINUS")
| "*" => print("MULT DIV")
| "/" => print("MULT DIV")
Given that you've tagged your question with the smlnj tag, then yes, SML/NJ supports this kind of pattern. They are called or-patterns and look like this:
case str of
("+" | "-") => print "PLUS MINUS"
| ("*" | "/") => print "MULT DIV"
Notice the parentheses.
The master branch of MLton supports it too, as part of their Successor ML effort, but you'll have to compile MLton yourself.
val str = "+"
val _ =
case str of
"+" | "-" => print "PLUS MINUS"
| "*" | "/" => print "MULT DIV"
Note that MLton does not require parentheses. Now compile it using this command (unlike SML/NJ, you have to enable this feature explicitly in MLton):
mlton -default-ann 'allowOrPats true' or-patterns.sml
In Standard ML, no. In other dialects of ML, such as OCaml, yes. You may in some cases consider splitting pattern matching up into separate cases/functions, or skip pattern matching in favor of a shorter catch-all expression, e.g.
if str = "+" orelse str = "-" then "PLUS MINUS" else
if str = "*" orelse str = "/" then "MULT DIV" else ...
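As a rough illustration of that catch-all style outside SML, here is a minimal Python sketch. The function name classify is made up for this example and is not part of the question's code; it groups the operators with a membership test instead of one branch per literal.

```python
def classify(op):
    """Map an operator string to its group label (hypothetical helper)."""
    # One membership test replaces two separate single-literal branches.
    if op in ("+", "-"):
        return "PLUS MINUS"
    if op in ("*", "/"):
        return "MULT DIV"
    # Anything else plays the role of the trailing "else ..." in the SML snippet.
    raise ValueError("unknown operator: " + repr(op))


for op in ["+", "-", "*", "/"]:
    print(op, "=>", classify(op))
```

The trade-off is the same as in the SML version: the grouping is shorter, but the compiler (or here, the reader) no longer gets an exhaustiveness check for free.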
Expanding upon Ionuț's example, you can even use datatype constructors that carry values, but the carried types (and the identifiers bound) must match:
datatype mytype = COST of int | QUANTITY of int | PERSON of string | PET of string;
case item of
(COST n|QUANTITY n) => print (Int.toString n)
|(PERSON name|PET name) => print name
If the types or names don't match, it will get rejected:
case item of
(COST n|PERSON n) => (* fails because COST is int and PERSON is string *)
(COST n|QUANTITY q) => (* fails because the identifiers are different *)
And these patterns work in function definitions as well:
fun myfun (COST n|QUANTITY n) = print (Int.toString n)
|myfun (PERSON name|PET name) = print name
;
I have a use case with this data:
1. "apple+case"
2. "apple+case+10+cover"
3. "apple+case+10++cover"
4. "+apple"
5. "iphone8+"
Currently, I am doing this to replace the + with space as follows:
def normalizer(value: String): String = {
if (value == null) {
null
} else {
value.replaceAll("\\+", BLANK_SPACE)
}
}
val testUDF = udf(normalizer(_: String): String)
df.withColumn("newCol", testUDF($"value"))
But this is replacing all "+". How do I replace "+" that comes between strings while also handling use cases like: "apple+case+10++cover" => "apple case 10+ cover"?
The output should be
1. "apple case"
2. "apple case 10 cover"
3. "apple case 10+ cover"
4. "apple"
5. "iphone8+"
You can use regexp_replace to do this instead of a udf; it should be much faster. For most of the cases, you can use a negative lookahead in the regexp, but for "+apple" you actually want to replace "+" with "" (and not a space). The easiest way is to simply use two regexps.
df.withColumn("newCol", regexp_replace($"value", "^\\+", ""))
.withColumn("newCol", regexp_replace($"newCol", "\\+(?!\\+|$)", " "))
This will give:
+--------------------+--------------------+
|value |newCol |
+--------------------+--------------------+
|apple+case |apple case |
|apple+case+10+cover |apple case 10 cover |
|apple+case+10++cover|apple case 10+ cover|
|+apple |apple |
|iphone8+ |iphone8+ |
+--------------------+--------------------+
To make this more modular and reusable, you can define it as a function:
def normalizer(c: String) = regexp_replace(regexp_replace(col(c), "^\\+", ""), "\\+(?!\\+|$)", " ")
df.withColumn("newCol", normalizer("value"))
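To see what the two patterns do without a Spark session, here is a small Python sketch that applies the same pair of regular expressions to the five sample strings from the question. The function name normalize is hypothetical, and Python's re module stands in for Spark's regexp_replace; both treat these particular lookaheads the same way.

```python
import re


def normalize(value):
    """Apply the two replacements from the answer (hypothetical helper)."""
    if value is None:
        return None
    # Step 1: drop a leading "+" entirely.
    value = re.sub(r"^\+", "", value)
    # Step 2: turn "+" into a space unless it is followed by another "+"
    # or sits at the end of the string.
    return re.sub(r"\+(?!\+|$)", " ", value)


samples = [
    "apple+case",
    "apple+case+10+cover",
    "apple+case+10++cover",
    "+apple",
    "iphone8+",
]
for s in samples:
    print(repr(s), "->", repr(normalize(s)))
```

In "apple+case+10++cover" the first of the two adjacent pluses survives because it is followed by another "+", while the second one is followed by a letter and becomes the space.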
You may try making two regex replacements:
df.withColumn("newCol", regexp_replace(
regexp_replace($"value", "(?<=\\d)\\+(?!\\+)", "+ "),
"(?<!\\d)\\+", " ")).show
The inner regex replacement would target the edge case of single plus preceded by a digit, which should be replaced by adding a space (but not deleting the plus). Example:
apple+case+10+cover --> apple+case+10+ cover
The outer regex replacement then targets all pluses which are not preceded by a digit, and replaces them with a space. Example, continuing from above:
apple+case+10+ cover --> apple case 10+ cover
I'm writing a parser in scala that reads a string composed by repetitions of '+', '-', '<', '>' and '.' characters. The string may also have '[' and ']' characters and inside them there is a repetition of the first group of characters.
I need a Regex that matches everything inside square brackets, the problem is that the brackets can be nested.
I've already tried with this regex: \[.*\] and many others that I've found on SO but none seems to be working.
The regex I'm looking for should work like this:
"[+++.]" matches "+++."
"[++[-]]" should match "++[-]"
edit (added a use case):
"[+++.] [++[-]]" should NOT match "+++.] [++[-]" but 2 matches of "+++." and "++[-]"
That would be pretty tough with a single regex, but with some post-processing you might get a bit closer.
def parse(s :String) :Array[String] =
"\\[(.*)\\]".r.unanchored
.findAllMatchIn(s)
.toArray
.flatMap(_.group(1).split(raw"][^\[\]]+\["))
usage:
parse("+++.]") //res0: Array[String] = Array()
parse("[+++.]") //res1: Array[String] = Array("+++.")
parse("[++[-]]") //res2: Array[String] = Array("++[-]")
parse("[+++.] [++[-]]") //res3: Array[String] = Array("+++.", "++[-]")
parse("[++[-]--] [+]") //res4: Array[String] = Array(++[-]--, +)
After some research I think I may have found the solution, however it is not usable in Scala. What is needed is a recursive regex that matches balanced constructs, in my case:
\[(?:[^\[\]]|(?R))*\]
and as far as I know this kind of regex is not supported in Scala, so I'll just leave this here in case someone needs it for other languages.
However I solved my problem by implementing the parser in another way, I just thought that having a regex like that would have been a simpler and smoother solution.
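For readers who only need the bracket contents and have no recursive regex available, one alternative is a plain depth counter. The following Python sketch is an assumption about what such a helper could look like (bracket_groups is an invented name, not the approach used in the parser class); it returns the text inside each top-level pair of brackets.

```python
def bracket_groups(text):
    """Return the contents of every top-level [...] pair (hypothetical helper)."""
    groups = []
    depth = 0      # how many "[" are currently open
    start = None   # index just after the "[" that opened the current group
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "]":
            if depth == 0:
                continue  # stray closing bracket: ignore it
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups


print(bracket_groups("[+++.]"))          # ['+++.']
print(bracket_groups("[++[-]]"))         # ['++[-]']
print(bracket_groups("[+++.] [++[-]]"))  # ['+++.', '++[-]']
```

Unclosed opening brackets are silently dropped in this sketch; a real parser would probably want to report them as errors.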
What I was implementing was a brainfuck language interpreter and here is my parser class:
class brainfuck(var pointer: Int, var array: Array[Int]) extends JavaTokenParsers {
def Program = rep(Statement) ^^ { _ => () }
def Statement: Parser[Unit] =
"+" ^^ { _ => array(pointer) = array(pointer) + 1 } |
"-" ^^ { _ => array(pointer) = array(pointer) - 1 } |
"." ^^ { _ => println("elem: " + array(pointer).toChar) } |
"," ^^ { _ => array(pointer) = readChar().toInt } |
">" ^^ { _ => pointer = pointer + 1 } |
"<" ^^ { _ => pointer = pointer - 1 } |
"[" ~> rep(block|squares) <~ "]" ^^ { items => while(array(pointer)!=0) { parseAll(Program,items.mkString) } }
def block =
"""[-+.,<>]""".r ^^ { b => b.toString() }
def squares: Parser[String] = "[" ~> rep(block|squares) <~ "]" ^^ { b => var res = "[" + b.mkString + "]"; res }
}
I am trying to parse an expression with one of the operators <, <=, >=, >. All but <= work just fine. Can someone help me figure out what the issue could be?
Code:
object MyTestParser extends RegexParsers {
override def skipWhitespace = true
private val expression: Parser[String] = """[a-zA-Z0-9\.]+""".r
val operation: Parser[Try[Boolean]] =
expression ~ ("<" | "<=" | ">=" | ">") ~ expression ^^ {
case v1 ~ op ~ v2 => for {
a <- Try(v1.toDouble)
b <- Try(v2.toDouble)
} yield op match {
case "<" => a < b
case "<=" => a <= b
case ">" => a > b
case ">=" => a >= b
}
}
}
Test:
"MyTestParser" should {
"successfully parse <= condition" in {
val parser = MyTestParser.parseAll(MyTestParser.operation, "10 <= 20")
val result = parser match {
case MyTestParser.Success(s, _) => s.get
case MyTestParser.Failure(e, _) =>
println(s"Parsing failed with error: $e")
false
case MyTestParser.Error(e, _) =>
println(s"Parsing error: $e")
false
}
result === true
}
"successfully parse >= condition" in {
val result = MyTestParser.parseAll(MyTestParser.operation, "50 >= 20").get
result === scala.util.Success(true)
}
}
Error for <= condition:
Parsing failed with error: string matching regex `[a-zA-Z0-9\.]+' expected but `=' found
You need to change the order of the alternatives so that the longest options are checked first.
expression ~ ( "<=" | ">=" | ">" | "<") ~ expression ^^ {
If the shortest alternative matches first, others are not considered at all.
Also note that a period does not have to be escaped inside a character class, this will do:
"""[a-zA-Z0-9.]+""".r
Your problem is that "<" matches the first character of "<=", so the parser accepts "<" and moves on to trying the expression, where it finds "=". If you change the order so that "<=" comes first, that will be matched instead, and you will get the desired result.
@Prateek: it does not work because the | alternation works just like a short-circuiting boolean OR. It does not search further once one of the patterns in the or-chain is satisfied at a certain point.
So, when you use | between patterns and two or more of them share a common prefix, you have to place the longest first.
As a general rule: order the patterns starting from the longest to the shortest.
Changing the relevant line like this makes it work:
// ">=" already worked as expected because it was listed before ">"
expression ~ ("<=" | "<" | ">=" | ">") ~ expression ^^ {
Or, if you want to follow the general rule:
expression ~ ("<=" | ">=" | "<" | ">") ~ expression ^^ {
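The effect of the ordering can be reproduced with a tiny ordered-choice matcher. This Python sketch is only an analogy for the | combinator, under the assumption that it tries the alternatives left to right and commits to the first one that fits; first_match is an invented name.

```python
def first_match(alternatives, text, pos=0):
    """Return the first literal in `alternatives` that matches at `pos`."""
    for literal in alternatives:
        # Ordered choice: commit to the first alternative that fits,
        # without checking whether a later one would match more input.
        if text.startswith(literal, pos):
            return literal
    return None


source = "<= 20"
print(first_match(["<", "<=", ">=", ">"], source))  # "<"  (leaves "=" behind)
print(first_match(["<=", ">=", "<", ">"], source))  # "<="
```

With the original order the matcher stops at "<", so the next parser sees "= 20", which is exactly the `=' found message in the error above.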
Why does ++ become -+-+-+-?
I'd like to clean doubled operator signs out of a string. How should I proceed?
String = "++"
print (String ) -- -> ++
String = string.gsub( String, "++", "+")
print (String ) -- -> + ok
String = string.gsub( String, "--", "+")
print (String ) -- -> +++ ?
String = string.gsub( String, "+-", "-")
print (String ) -- -> -+-+-+- ??
String = string.gsub( String, "-+", "-")
print (String ) -- -> -+-+-+- ??? ;-)
The core problem is that gsub operates on patterns (Lua's minimal regular expressions) and your string contains unescaped magic characters. However, even knowing that I found myself surprised by your results.
It's easier to see what gsub is doing if we change the replacement string:
string.gsub('+', '--', '|') => |+|
string.gsub('+++', '--', '|') => |+|+|+|
- means "0 or more occurrences of the preceding atom". Unlike * and +, it's non-greedy, matching the fewest characters possible.
I just tested it and apparently "fewest characters possible" mostly means 0 characters. For instance, my intuition about this:
string.gsub('aaa','a-', '|')
Is that the expression a- would match each a, replace them with '|', resulting in '|||'. In fact, it matches on the 0-length gaps before and after each character, resulting in: '|a|a|a|'
In fact, it doesn't matter what atom we precede with -, it always matches on the smallest length, 0:
string.gsub('aaa','x-', '|') => |a|a|a|
string.gsub('aaa','a-', '|') => |a|a|a|
string.gsub('aaa','?-', '|') => |a|a|a|
string.gsub('aaa','--', '|') => |a|a|a|
You can see that last one is your case and explains your results. Your next result is the exact same thing:
string.gsub('+++','+-','|') => |+|+|+|
Your final result is more straightforward:
string.gsub('-+-+-+-','-+','|') => |+|+|+|
In this case, you're matching "1 or more occurrences of the atom -", so you're just replacing the - characters, just as you'd expect.
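The general remedy is to make sure the sign characters are treated as literals rather than as pattern operators. As an illustration of the idea in Python rather than Lua (so none of this is Lua API), the sketch below escapes each search string with re.escape before substituting; collapse_signs and the rule list are assumptions made for the example.

```python
import re

# Each pair is (literal text to find, replacement), following the question.
RULES = [("++", "+"), ("--", "+"), ("+-", "-"), ("-+", "-")]


def collapse_signs(text):
    """Collapse doubled operator signs, treating them as plain text."""
    for literal, replacement in RULES:
        # re.escape neutralises characters such as "+" that would
        # otherwise be read as quantifiers.
        text = re.sub(re.escape(literal), replacement, text)
    return text


for s in ["++", "--", "+-", "-+", "1--2"]:
    print(repr(s), "->", repr(collapse_signs(s)))
```

Each rule is applied once, in order, so longer runs of signs such as "++++" are only partly reduced by a single call.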
Let's say I have a list of integers [blah;blah;blah;...] and I don't know the size of the list, and I want to pattern match on it and not print the first element. Is there any way to do this without using an if/else or getting a syntax error?
All I'm trying to do is parse a file path that looks like a/path/to/blah/blah/../file.c
and only print the path/to/blah/blah
For example, can it be done like this?
let out x = Printf.printf " %s \n" x
let _ = try
while true do
let line = input_line stdin in
...
let rec f (xpath: string list) : ( string list ) =
begin match Str.split (Str.regexp "/") xpath with
| _::rest -> out (String.concat "/" _::xpath);
| _ -> ()
end
But if I do this, I get a syntax error at the line with String.concat!
String.concat "/" _::xpath doesn't mean anything because _ is a pattern, not a value. _ can be used on the left-hand side of a pattern match but not on the right-hand side.
What you want to do is String.concat "/" rest.
Even if _::xpath were correct, String.concat "/" _::xpath would be interpreted as (String.concat "/" _)::xpath whereas you want it to be interpreted as String.concat "/" (_::xpath).
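The underlying idea, ignoring the head and giving the tail a name so it can be used on the right-hand side, can be sketched in Python as well. The helper name drop_first_component is invented for this example, and the starred assignment is only a loose counterpart of the _::rest pattern.

```python
def drop_first_component(path):
    """Return `path` without its first "/"-separated component."""
    # Loose counterpart of matching `_ :: rest`: the head is ignored,
    # the tail is bound to a name that can be used afterwards.
    _, *rest = path.split("/")
    return "/".join(rest)


print(drop_first_component("a/path/to/blah/blah"))  # path/to/blah/blah
```

A path with a single component yields an empty string here, which corresponds to the empty rest list in the OCaml match.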