Scala: string pattern matching and splitting - regex

I am new to Scala and want to create a function to split Hello123 or Hello 123 into two strings as follows:
val string1 = 123
val string2 = Hello
What is the best way to do it, I have attempted to use regex matching \\d and \\D but I am not sure how to write the function fully.
Regards

You may replace with 0+ whitespaces (\s*+) that are preceded with letters and followed with digits:
var str = "Hello123"
val res = str.split("(?<=[a-zA-Z])\\s*+(?=\\d)")
println(res.deep.mkString(", ")) // => Hello, 123
See the online Scala demo
Pattern details:
(?<=[a-zA-Z]) - a positive lookbehind that only checks (but does not consume the matched text) if there is an ASCII letter before the current position in the string
\\s*+ - matches (consumes) zero or more spaces possessively, i.e.
(?=\\d) - this check is performed only once after the whitespaces - if any - were matched, and it requires a digit to appear right after the current position in the string.

Based on the given string I assume you have to match a string and a number with any number of spaces in between
here is the regex for that
([a-zA-Z]+)\\s*(\\d+)
Now create a regex object using .r
"([a-zA-Z]+)\\s*(\\d+)".r
Scala REPL
scala> val regex = "([a-zA-Z]+)\\s*(\\d+)".r
scala> val regex(a, b) = "hello 123"
a: String = "hello"
b: String = "123"
scala> val regex(a, b) = "hello123"
a: String = "hello"
b: String = "123"
Function to handle pattern matching safely
pattern match with extractors
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
Here is the function which does Regex with Pattern matching
def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
Scala REPL
scala> def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
defined function matchStr
scala> matchStr("Hello123")
res41: Option[(String, Int)] = Some(("Hello", 123))
scala> matchStr("Hello 123")
res42: Option[(String, Int)] = Some(("Hello", 123))

Related

Regex doesn't work when newline is at the end of the string

Exercise: given a string with a name, then space or newline, then email, then maybe newline and some text separated by newlines capture the name and the domain of email.
So I created the following:
val regexp = "^([a-zA-Z]+)(?:\\s|\\n)\\w+#(\\w+\\.\\w+)(?:.|\\r|\\n)*".r
def fun(str: String): String = {
val result = str match {
case regexp(name, domain) => name + ' ' + domain
case _ => "invalid"
}
result
}
And started testing:
scala> val input = "oleg oleg#email.com"
scala> fun(input)
res17: String = oleg email.com
scala> val input = "oleg\noleg#email.com"
scala> fun(input)
res18: String = oleg email.com
scala> val input = """oleg
| oleg#email.com
| 7bdaf0a1be3"""
scala> fun(input)
res19: String = oleg email.com
scala> val input = """oleg
| oleg#email.com
| 7bdaf0a1be3
| """
scala> fun(input)
res20: String = invalid
Why doesn't the regexp capture the string with the newline at the end?
This part (?:\\s|\\n) can be shortened to \s as it will also match a newline, and as there is still a space before the emails where you are using multiple lines it can be \s+ to repeat it 1 or more times.
Matching any character like this (?:.|\\r|\\n)* if very inefficient due to the alternation. You can use either [\S\s]* or use an inline modifier (?s) to make the dot match a newline.
But using your pattern to just get the name and the domain of the email you don't have to match what comes after it, as you are using the 2 capturing groups in the output.
^([a-zA-Z]+)\s+\w+#(\w+\.\w+)
Regex demo
If you do want to match all that follows, you can use:
val regexp = """(?s)^([a-zA-Z]+)\s+\w+#(\w+\.\w+).*""".r
def fun(str: String): String = {
val result = str match {
case regexp(name, domain) => name + ' ' + domain
case _ => "invalid"
}
result
}
Scala demo
Note that this pattern \w+#(\w+\.\w+) is very limited for matching an email

How to pull string value in url using scala regex?

I have below urls in my applications, I want to take one of the value in urls.
For example:
rapidvie value 416
Input URL: http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
Output should be: 416
I've written the code in scala using import java.util.regex.{Matcher, Pattern}
val p: Pattern = Pattern.compile("[?&]rapidView=(\\d+)[?&]")**strong text**
val m:Matcher = p.matcher(url)
if(m.find())
println(m.group(1))
I am getting output, but i want to migrate this scala using scala.util.matching library.
How to implement this in simply?
This code is working with java utils.
In Scala, you may use an unanchored regex within a match block to get just the captured part:
val s = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
val pattern ="""[?&]rapidView=(\d+)""".r.unanchored
val res = s match {
case pattern(rapidView) => rapidView
case _ => ""
}
println(res)
// => 416
See the Scala demo
Details:
"""[?&]rapidView=(\d+)""".r.unanchored - the triple quoted string literal allows using single backslashes with regex escapes, and the .unanchored property makes the regex match partially, not the entire string
pattern(rapidView) gets the 1 or more digits part (captured with (\d+)) if a pattern finds a partial match
case _ => "" will return an empty string upon no match.
You can do this quite easily with Scala:
scala> val url = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
url: String = http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
scala> url.split("rapidView=").tail.head.split("&").head
res0: String = 416
You can also extend it by parameterize the search word:
scala> def searchParam(sp: String) = sp + "="
searchParam: (sp: String)String
scala> val sw = "rapidView"
sw: String = rapidView
And just search with the parameter name
scala> url.split(searchParam(sw)).tail.head.split("&").head
res1: String = 416
scala> val sw2 = "projectKey"
sw2: String = projectKey
scala> url.split(searchParam(sw2)).tail.head.split("&").head
res2: String = DSCI

Scala, regex matching ignore unnecessary words

My program is:
val pattern = "[*]prefix_([a-zA-Z]*)_[*]".r
val outputFieldMod = "TRASHprefix_target_TRASH"
var tar =
outputFieldMod match {
case pattern(target) => target
}
println(tar)
Basically, I try to get the "target" and ignore "TRASH" (I used *). But it has some error and I am not sure why..
Simple and straight forward standard library function (unanchored)
Use Unanchored
Solution one
Use unanchored on the pattern to match inside the string ignoring the trash
val pattern = "prefix_([a-zA-Z]*)_".r.unanchored
unanchored will only match the pattern ignoring all the trash (all the other words)
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """foo\((.*)\)""".r.unanchored
pattern: scala.util.matching.UnanchoredRegex = foo\((.*)\)
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res3: String = bar
Solution two
Pad your pattern from both sides with .*. .* matches any char other than a linebreak character.
val pattern = ".*prefix_([a-zA-Z]*)_.*".r
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """.*foo\((.*)\).*""".r
pattern: scala.util.matching.Regex = .*foo\((.*)\).*
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res4: String = bar
This will work, val pattern = ".*prefix_([a-z]+).*".r, but it distinguishes between target and trash via lower/upper-case letters. Whatever determines real target data from trash data will determine the real regex pattern.

Scala: how to construct a regex including a variable?

I want to test if a regex including a variable that I defined before matches a string.
For instance I would like to do :
val value = "abc"
regex = "[^a-z]".r + value //something like that
if(regex matches ".abc") print("ok, it works")
So my question is : how can I add construct a regex including a variable in scala?
("[^a-z]" + value).r
is all you need
You must quote the value string to protect against special regex syntax.
scala> val value = "*"
value: String = *
scala> val oops = """[^a-z]""" + value r
oops: scala.util.matching.Regex = [^a-z]*
scala> ".*" match { case oops() => }
scala> ".....*" match { case oops() => }
They added quoting to the Scala API:
scala> import util.matching._
import util.matching._
scala> val ok = """[^a-z]""" + Regex.quote(value) r
ok: scala.util.matching.Regex = [^a-z]\Q*\E
scala> ".*" match { case ok() => }
scala> ".....*" match { case ok() => }
scala.MatchError: .....* (of class java.lang.String)
... 33 elided
You could also generalize the pattern and do additional checks:
scala> val r = """\W(\w+)""".r
r: scala.util.matching.Regex = \W(\w+)
scala> ".abc" match { case r(s) if s == "abc" => }
Parsing and building the regex itself is relatively expensive, so usually it's desirable to do it once with a general pattern.

How to pattern match using regular expression in Scala?

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:
case Process(word) =>
word.firstLetter match {
case([a-c][A-C]) =>
case _ =>
}
}
But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?
You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.
val Pattern = "([a-cA-C])".r
word.firstLetter match {
case Pattern(c) => c bound to capture group here
case _ =>
}
Since version 2.10, one can use Scala's string interpolation feature:
implicit class RegexOps(sc: StringContext) {
def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}
scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
Even better one can bind regular expression groups:
scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123
scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25
It is also possible to set more detailed binding mechanisms:
scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler
scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20
scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive
scala> "10" match { case r"(\d\d)${d # isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:
object T {
class RegexpExtractor(params: List[String]) {
def unapplySeq(str: String) =
params.headOption flatMap (_.r unapplySeq str)
}
class StartsWithExtractor(params: List[String]) {
def unapply(str: String) =
params.headOption filter (str startsWith _) map (_ => str)
}
class MapExtractor(keys: List[String]) {
def unapplySeq[T](map: Map[String, T]) =
Some(keys.map(map get _))
}
import scala.language.dynamics
class ExtractorParams(params: List[String]) extends Dynamic {
val Map = new MapExtractor(params)
val StartsWith = new StartsWithExtractor(params)
val Regexp = new RegexpExtractor(params)
def selectDynamic(name: String) =
new ExtractorParams(params :+ name)
}
object p extends ExtractorParams(Nil)
Map("firstName" -> "John", "lastName" -> "Doe") match {
case p.firstName.lastName.Map(
Some(p.Jo.StartsWith(fn)),
Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
println(s"Match! $fn ...$lastChar")
case _ => println("nope")
}
}
As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:
word.matches("[a-cA-C].*")
You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:
val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
case Process("b", _) => println("first: 'a', some rest")
case Process(_, rest) => println("some first, rest: " + rest)
// etc.
}
String.matches is the way to do pattern matching in the regex sense.
But as a handy aside, word.firstLetter in real Scala code looks like:
word(0)
Scala treats Strings as a sequence of Char's, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:
"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true
I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
EDIT:
To be clear, the way I would do this is, as others have said:
"Cat".matches("^[a-cA-C].*")
res14: Boolean = true
Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
First we should know that regular expression can separately be used. Here is an example:
import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
case Some(v) => println(v)
case _ =>
} // output: Scala
Second we should notice that combining regular expression with pattern matching would be very powerful. Here is a simple example.
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
case date(year, month, day) => "hello"
} // output: hello
In fact, regular expression itself is already very powerful; the only thing we need to do is to make it more powerful by Scala. Here are more examples in Scala Document: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex
Note that the approach from #AndrewMyers's answer matches the entire string to the regular expression, with the effect of anchoring the regular expression at both ends of the string using ^ and $. Example:
scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*
scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo
scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
And with no .* at the end:
scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)
scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match