I need to use a regex to match a pattern in Scala and I currently have a Regex that is
InputPattern: scala.util.matching.Regex = put (.*) in (.*)
When I run the follwing this happens:
scala> val InputPattern(verb, item, prep, obj) = "put a in b";
scala.MatchError: put a in b (of class java.lang.String)
... 33 elided
I would like it to end up with verb("put"), item("a"), prep("in"), and obj("b") for input "put a in b" and also verb("put"), item(""), prep("in"), and obj("") for input "put in".
Thanks
This works for your special cases :
scala> val InputPattern = "(put) (.*?) ?(in) ?(.*?)".r
InputPattern: scala.util.matching.Regex = (put) (.*) ?(in) ?(.*)
scala> val InputPattern(verb, item, prep, obj) = "put a in b"
verb: String = put
item: String = a
prep: String = in
obj: String = b
scala> val InputPattern(verb, item, prep, obj) = "put in"
verb: String = put
item: String = ""
prep: String = in
obj: String = ""
put and in here are also captured in groups to participate in pattern matching. I also used lazy regexps (.*?) to capture as less as possible, you may replace it with (\S*). ? gives you optional space to match
"put in" (with one space between put and in and no space at the end).
But be aware of this:
scala> val InputPattern(verb, item, prep, obj) = "put ainb"
verb: String = put
item: String = a
prep: String = in
obj: String = b
scala> val InputPattern(verb, item, prep, obj) = "put aininb"
verb: String = put
item: String = a
prep: String = in
obj: String = inb
scala> val InputPattern(verb, item, prep, obj) = "put ain"
verb: String = put
item: String = a
prep: String = in
obj: String = ""
If you have simple command interpreter it may be even good, otherwise you should match your special cases separately.
To process a simple (not natural) language, you may also consider StandardTokenParsers, as they are context-free (Chomsky type 2):
import scala.util.parsing.combinator.syntactical._
val p = new StandardTokenParsers {
lexical.reserved ++= List("put", "in")
def p = "put" ~ opt(ident) ~ "in" ~ opt(ident)
}
scala> p.p(new p.lexical.Scanner("put a in b"))
warning: there was one feature warning; re-run with -feature for details
res13 = [1.11] parsed: (((put~Some(a))~in)~Some(b))
scala> p.p(new p.lexical.Scanner("put in"))
warning: there was one feature warning; re-run with -feature for details
res14 = [1.7] parsed: (((put~None)~in)~None)
You can write one regexp for all cases, but I'm not sure it would be readable and maintainable. I prefer simple approach:
val pattern1 = "(put) (.*) (in) (.*)".r
val pattern2 = "(put) (in)".r
def parse(text: String) = text match {
case pattern1(verb, item, prep, obj) => (verb, item, prep, obj);
case pattern2(verb, prep) => (verb, "", prep, "")
}
scala> parse("put a in b")
res6: (String, String, String, String) = (put,a,in,b)
scala> parse("put in")
res7: (String, String, String, String) = (put,"",in,"")
And one extra notion: I hope you know what you are doing! RegEx is a Chomsky Type 3 grammar and natural language is much more complex. If you need natural language parser, you can use already available solution such as Stanford NLP parser.
Related
Exercise: given a string with a name, then space or newline, then email, then maybe newline and some text separated by newlines capture the name and the domain of email.
So I created the following:
val regexp = "^([a-zA-Z]+)(?:\\s|\\n)\\w+#(\\w+\\.\\w+)(?:.|\\r|\\n)*".r
def fun(str: String): String = {
val result = str match {
case regexp(name, domain) => name + ' ' + domain
case _ => "invalid"
}
result
}
And started testing:
scala> val input = "oleg oleg#email.com"
scala> fun(input)
res17: String = oleg email.com
scala> val input = "oleg\noleg#email.com"
scala> fun(input)
res18: String = oleg email.com
scala> val input = """oleg
| oleg#email.com
| 7bdaf0a1be3"""
scala> fun(input)
res19: String = oleg email.com
scala> val input = """oleg
| oleg#email.com
| 7bdaf0a1be3
| """
scala> fun(input)
res20: String = invalid
Why doesn't the regexp capture the string with the newline at the end?
This part (?:\\s|\\n) can be shortened to \s as it will also match a newline, and as there is still a space before the emails where you are using multiple lines it can be \s+ to repeat it 1 or more times.
Matching any character like this (?:.|\\r|\\n)* if very inefficient due to the alternation. You can use either [\S\s]* or use an inline modifier (?s) to make the dot match a newline.
But using your pattern to just get the name and the domain of the email you don't have to match what comes after it, as you are using the 2 capturing groups in the output.
^([a-zA-Z]+)\s+\w+#(\w+\.\w+)
Regex demo
If you do want to match all that follows, you can use:
val regexp = """(?s)^([a-zA-Z]+)\s+\w+#(\w+\.\w+).*""".r
def fun(str: String): String = {
val result = str match {
case regexp(name, domain) => name + ' ' + domain
case _ => "invalid"
}
result
}
Scala demo
Note that this pattern \w+#(\w+\.\w+) is very limited for matching an email
My program is:
val pattern = "[*]prefix_([a-zA-Z]*)_[*]".r
val outputFieldMod = "TRASHprefix_target_TRASH"
var tar =
outputFieldMod match {
case pattern(target) => target
}
println(tar)
Basically, I try to get the "target" and ignore "TRASH" (I used *). But it has some error and I am not sure why..
Simple and straight forward standard library function (unanchored)
Use Unanchored
Solution one
Use unanchored on the pattern to match inside the string ignoring the trash
val pattern = "prefix_([a-zA-Z]*)_".r.unanchored
unanchored will only match the pattern ignoring all the trash (all the other words)
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """foo\((.*)\)""".r.unanchored
pattern: scala.util.matching.UnanchoredRegex = foo\((.*)\)
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res3: String = bar
Solution two
Pad your pattern from both sides with .*. .* matches any char other than a linebreak character.
val pattern = ".*prefix_([a-zA-Z]*)_.*".r
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """.*foo\((.*)\).*""".r
pattern: scala.util.matching.Regex = .*foo\((.*)\).*
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res4: String = bar
This will work, val pattern = ".*prefix_([a-z]+).*".r, but it distinguishes between target and trash via lower/upper-case letters. Whatever determines real target data from trash data will determine the real regex pattern.
I am new to Scala and want to create a function to split Hello123 or Hello 123 into two strings as follows:
val string1 = 123
val string2 = Hello
What is the best way to do it, I have attempted to use regex matching \\d and \\D but I am not sure how to write the function fully.
Regards
You may replace with 0+ whitespaces (\s*+) that are preceded with letters and followed with digits:
var str = "Hello123"
val res = str.split("(?<=[a-zA-Z])\\s*+(?=\\d)")
println(res.deep.mkString(", ")) // => Hello, 123
See the online Scala demo
Pattern details:
(?<=[a-zA-Z]) - a positive lookbehind that only checks (but does not consume the matched text) if there is an ASCII letter before the current position in the string
\\s*+ - matches (consumes) zero or more spaces possessively, i.e.
(?=\\d) - this check is performed only once after the whitespaces - if any - were matched, and it requires a digit to appear right after the current position in the string.
Based on the given string I assume you have to match a string and a number with any number of spaces in between
here is the regex for that
([a-zA-Z]+)\\s*(\\d+)
Now create a regex object using .r
"([a-zA-Z]+)\\s*(\\d+)".r
Scala REPL
scala> val regex = "([a-zA-Z]+)\\s*(\\d+)".r
scala> val regex(a, b) = "hello 123"
a: String = "hello"
b: String = "123"
scala> val regex(a, b) = "hello123"
a: String = "hello"
b: String = "123"
Function to handle pattern matching safely
pattern match with extractors
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
Here is the function which does Regex with Pattern matching
def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
Scala REPL
scala> def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
defined function matchStr
scala> matchStr("Hello123")
res41: Option[(String, Int)] = Some(("Hello", 123))
scala> matchStr("Hello 123")
res42: Option[(String, Int)] = Some(("Hello", 123))
I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will share also what I want to do and it will be nice if you can share your suggestions:
im doing automation for some website, and based on this string I need to do some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
as you can see the second field i need to choose is the name firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit
You can use pattern matching for this.
First create regex:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
then you can use it to pattern match:
# "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
if you don't need all numbers you can for example do this
"1.2.0" match { case pattern(a, _, _) => println(a) }
1
if you want to for example to take just first two numbers you can do
# val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
Can only add to #Lukasz's answer one more variant with the values extration:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
# val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following:
scala> "3.2.1".split(".")
res0: Array[java.lang.String] = Array(string1, string2, string3)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}
Could you guys please tell me what I'm doing incorrectly trying to extract using regex pattern-matching? I have following code
val Pattern = "=".r
val Pattern(key, value) = "key=value"
And I get following exception in runtime
Exception in thread "main" scala.MatchError: key=value (of class java.lang.String)
That's more of a regular expression problem: your regex does not capture any groups, it just matches a single = character.
With
val Pattern = "([^=]*)=(.*)".r
you will get:
scala> val Pattern(key, value) = "key=value"
key: String = key
value: String = value
Edit:
Also, that won't match if the input string is empty. You can change the pattern to make it match, or (better) you can pattern match with the regex, like so:
"key=value" match {
case Pattern(k, v) => // do something
case _ => // wrong input, do nothing
}
If what you actually wanted was to split the input text with whatever the regex matches, that is also possible using Regex.split:
scala> val Pattern = "=".r
Pattern: scala.util.matching.Regex = =
scala> val Array(key, value) = Pattern.split("key=value")
key: String = key
value: String = value