For a given string check if matches the pattern [scala] - regex

I am a beginner in Scala and I was wondering how I can build a function that checks whether a string matches a definite pattern or not.
For example:
def patternFound(s:String): Boolean = (s) match {
case s matches xyxy pattern => true //where x,y are two consecutive characters in the string
case s matches xxyy pattern => false //where x, y are two characters in that string
case (_) => false //default
}
//Here x and y are not fixed characters; the string s should contain a pattern
//in which two characters appear in alternating positions
patternFound("babab")//true because pattern of xyxy found in it
patternFound("baabba")//false because pattern of xxyy found in it
Can anyone show with an example how I can achieve this?
Looking for a solution which returns true for any occurrence of xyxyxy pattern in a string, but returns false when the pattern is xxyy in that string.
Example: The function should return true if the string is "babab" or
"ababa" (which has pattern xyxy in it), but returns false for "aabba"
or "bbaab" (which has the pattern xxyy in it)
Any help is appreciated! Thank you in advance.

For the two examples you've posted, these two Regex patterns will cover it.
def patternFound(s:String): Boolean = {
val ptrn1 = "(.)(.)\\1\\2".r
val ptrn2 = "(.)\\1(.)\\2".r
s match {
case ptrn1(_,_) => true
case ptrn2(_,_) => true
case _ => false
}
}
proof:
patternFound("rrgg") // res0: Boolean = true
patternFound("hqhq") // res1: Boolean = true
patternFound("cccx") // res2: Boolean = false
But I suspect that your requirements, as stated, are not specific enough to cover exactly what you're looking for.
UPDATE
Your 2nd requirement now makes no sense. Everything that doesn't match the 1st pattern will return false, so there's no point in testing for a specific pattern to return false.
def patternFound(s:String): Boolean = {
val ptrn = "(.)(.)\\1\\2".r.unanchored
s match {
case ptrn(_,_) => true
case _ => false
}
}
patternFound("babab") //true because pattern of xyxy found in it
patternFound("baabba") //false because it doesn't match the target pattern
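One caveat with the pattern above: (.)(.)\1\2 also accepts strings such as "aaaa", where x and y are the same character. If x and y are meant to differ, a negative lookahead can rule that out. The following is only a sketch under that assumption; the object name AlternatingPattern is made up for illustration.

```scala
object AlternatingPattern {
  // (.)      captures x
  // (?!\1)   asserts that the next character is not x
  // (.)      captures y
  // \1\2     requires x and y to repeat once more, giving xyxy
  private val Xyxy = """(.)(?!\1)(.)\1\2""".r

  // True if the xyxy pattern (with x != y) occurs anywhere in s.
  def patternFound(s: String): Boolean = Xyxy.findFirstIn(s).isDefined
}
```

findFirstIn searches anywhere in the string, so no call to unanchored is needed here.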

The syntax is not correct.
You need to remove "s matches" from the body of the function, it is already in the method definition line "(s) match".
See also https://docs.scala-lang.org/tour/pattern-matching.html

This might be possible with regular expressions and look arounds, but I just created a helper function:
/**
* Checks for recurring pattern in a String
* @param s The input String to check
* @param patternSize The size of the expected pattern. For example in the String "aabbaabbaabb" the pattern is "aabb" which is a length of 4
*/
def checkPattern(s: String, patternSize: Int): Boolean = {
val grouped = s.grouped(patternSize)
grouped.toSet.size == 1 // everything should be the same
}
Some example usage of that function:
checkPattern("abababab", 2) // true
checkPattern("aabbaabbaabb", 2) // false
checkPattern("aabbaabbaabb", 4) // true
checkPattern("abcabcabc", 3) // true
So for your code you could use it with some guard statements:
def patternFound(s: String): Boolean = s match {
case "" => false // empty Strings can't have patterns
case s if checkPattern(s, 2) => true
case s if checkPattern(s, 4) => true
case _ => false
}
patternFound("ababababab") // true
patternFound("aabbaabb") // true
patternFound("aabbzz") // false
Edit: I think the other answer is better for what you are looking for, but here is my updated answer for your updated question:
def patternFound(s: String): Boolean = s.nonEmpty && checkPattern(s, 2)
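Note that grouped leaves a shorter final chunk when the length of the string is not a multiple of patternSize, so checkPattern("babab", 2) compares "ba", "ba" and "b" and returns false. A possible variant that tolerates a trailing partial repetition is sketched below; checkPatternLoose and RepeatingPrefix are hypothetical names, and the requirement of at least two full repetitions is an assumption, not part of the original helper.

```scala
object RepeatingPrefix {
  // Hypothetical variant of checkPattern: every chunk must be a prefix of
  // the first chunk, so a shorter final chunk such as "b" in "babab" passes.
  def checkPatternLoose(s: String, patternSize: Int): Boolean = {
    require(patternSize > 0, "patternSize must be positive")
    val unit = s.take(patternSize)
    // Assumption: the unit has to occur at least twice in full.
    s.length >= 2 * patternSize &&
      s.grouped(patternSize).forall(chunk => unit.startsWith(chunk))
  }
}
```

With this variant, "babab" and "ababa" are accepted for a pattern size of 2, while "aabba" and "bbaab" are rejected.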

Related

How do i use the regex in scala to check the first 3 chars of filename

What is the Scala code to check whether the first 3 characters of a fileName are letters?
I want a Boolean to be returned: if the first 3 chars of a fileName are letters, then true needs to be returned, otherwise false.
val fileName = "ABC1234.dat"
val regex = "[A-Z]*".r
val result = fileName.substring(0,3) match {
case regex(fileName) => true
case _ => false
}
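You could use findFirstIn with a pattern anchored at the start of the string: ^[A-Za-z]{3} matches three ASCII letters, or use ^\\p{L}{3} to match any letter from any language. Then check for nonEmpty on the Option. Without the ^ anchor, findFirstIn would also accept three letters that appear later in the name.
val fileName = "ABC1234.dat"
val regex = "^[A-Za-z]{3}".r
regex.findFirstIn(fileName).nonEmpty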
Output
res0: Boolean = true
If you want to use substring with matches as in the comment, matches takes a string as the regex and has to match the whole string.
fileName.substring(0,3).matches("(?i)[a-z]{3}")
Note that substring will throw a StringIndexOutOfBoundsException if the string is shorter than the specified indices, whereas findFirstIn with the Option would return false.
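If the substring style is preferred, take(3) can stand in for substring(0, 3), because take returns whatever is available instead of throwing on short input. This is a small sketch; the names FileNamePrefix and startsWithThreeLetters are made up for illustration.

```scala
object FileNamePrefix {
  // take(3) returns at most three characters and never throws,
  // so names shorter than three characters simply fail the match.
  def startsWithThreeLetters(fileName: String): Boolean =
    fileName.take(3).matches("[A-Za-z]{3}")
}
```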

Scala regex "starts with lowercase alphabets" not working

val AlphabetPattern = "^([a-z]+)".r
def stringMatch(s: String) = s match {
case AlphabetPattern() => println("found")
case _ => println("not found")
}
If I try,
stringMatch("hello")
I get "not found", but I expected to get "found".
My understanding of the regex,
[a-z] = in the range of 'a' to 'z'
+ = one or more of the previous pattern
^ = starts with
So regex AlphabetPattern is "all strings that start with one or more alphabets in the range a-z"
Surely I am missing something, want to know what.
Replace case AlphabetPattern() with case AlphabetPattern(_) and it works. The extractor pattern takes a variable to which it binds the result. Here we discard it but you could use x or whatever.
edit: Further to Randall's comment below, if you check the docs for Regex you'll see that it has an unapplySeq rather than an unapply method, which means it takes multiple variables. If you have the wrong number, it won't match, rather like
list match { case List(a,b,c) => a + b + c }
won't match if list doesn't have exactly 3 elements.
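A related point that is easy to miss: a Regex used in a pattern match has to match the whole input, so even with the right number of binders, "hello1" would not match ^([a-z]+). The sketch below contrasts that with the unanchored view of the same regex; the object and method names are made up for illustration.

```scala
object StartsWithLowercase {
  private val AlphabetPattern = "^([a-z]+)".r
  // unanchored returns a Regex whose extractor searches instead of
  // requiring the entire string to match.
  private val Prefix = AlphabetPattern.unanchored

  // True only if the entire string consists of lowercase letters.
  def wholeMatch(s: String): Boolean = s match {
    case AlphabetPattern(_) => true // one binder for the one capture group
    case _                  => false
  }

  // True if the string merely starts with lowercase letters.
  def prefixMatch(s: String): Boolean = s match {
    case Prefix(_) => true
    case _         => false
  }
}
```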
There are some issues with the match statement. s match checks s against the extractor AlphabetPattern(), which has no binders while the regex has one capture group, so the first case never matches and the default _ case is taken. Alternatively, use one of the find methods in scala.util.matching.Regex to look for a match with the given Regex.
For example, using findFirstIn to find the first match of a string in AlphabetPattern.
scala> AlphabetPattern.findFirstIn("hello")
res0: Option[String] = Some(hello)
The stringMatch method using findFirstIn and a case statement:
scala> def stringMatch(s: String) = AlphabetPattern findFirstIn s match {
| case Some(s) => println("Found: " + s)
| case None => println("Not found")
| }
stringMatch: (s:String)Unit
scala> stringMatch("hello")
Found: hello

How to pattern match using regular expression in Scala?

I would like to be able to find a match between the first letter of a word, and one of the letters in a group such as "ABC". In pseudocode, this might look something like:
case Process(word) =>
word.firstLetter match {
case([a-c][A-C]) =>
case _ =>
}
}
But how do I grab the first letter in Scala instead of Java? How do I express the regular expression properly? Is it possible to do this within a case class?
You can do this because regular expressions define extractors but you need to define the regex pattern first. I don't have access to a Scala REPL to test this but something like this should work.
val Pattern = "([a-cA-C])".r
word.take(1) match { // take(1) yields the first letter as a String ("" if word is empty)
case Pattern(c) => println(c) // c is bound to the capture group here
case _ => // no match
}
Since version 2.10, one can use Scala's string interpolation feature:
implicit class RegexOps(sc: StringContext) {
def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}
scala> "123" match { case r"\d+" => true case _ => false }
res34: Boolean = true
Even better one can bind regular expression groups:
scala> "123" match { case r"(\d+)$d" => d.toInt case _ => 0 }
res36: Int = 123
scala> "10+15" match { case r"(\d\d)${first}\+(\d\d)${second}" => first.toInt+second.toInt case _ => 0 }
res38: Int = 25
It is also possible to set more detailed binding mechanisms:
scala> object Doubler { def unapply(s: String) = Some(s.toInt*2) }
defined module Doubler
scala> "10" match { case r"(\d\d)${Doubler(d)}" => d case _ => 0 }
res40: Int = 20
scala> object isPositive { def unapply(s: String) = s.toInt >= 0 }
defined module isPositive
scala> "10" match { case r"(\d\d)${d @ isPositive()}" => d.toInt case _ => 0 }
res56: Int = 10
An impressive example on what's possible with Dynamic is shown in the blog post Introduction to Type Dynamic:
object T {
class RegexpExtractor(params: List[String]) {
def unapplySeq(str: String) =
params.headOption flatMap (_.r unapplySeq str)
}
class StartsWithExtractor(params: List[String]) {
def unapply(str: String) =
params.headOption filter (str startsWith _) map (_ => str)
}
class MapExtractor(keys: List[String]) {
def unapplySeq[T](map: Map[String, T]) =
Some(keys.map(map get _))
}
import scala.language.dynamics
class ExtractorParams(params: List[String]) extends Dynamic {
val Map = new MapExtractor(params)
val StartsWith = new StartsWithExtractor(params)
val Regexp = new RegexpExtractor(params)
def selectDynamic(name: String) =
new ExtractorParams(params :+ name)
}
object p extends ExtractorParams(Nil)
Map("firstName" -> "John", "lastName" -> "Doe") match {
case p.firstName.lastName.Map(
Some(p.Jo.StartsWith(fn)),
Some(p.`.*(\\w)$`.Regexp(lastChar))) =>
println(s"Match! $fn ...$lastChar")
case _ => println("nope")
}
}
As delnan pointed out, the match keyword in Scala has nothing to do with regexes. To find out whether a string matches a regex, you can use the String.matches method. To find out whether a string starts with an a, b or c in lower or upper case, the regex would look like this:
word.matches("[a-cA-C].*")
You can read this regex as "one of the characters a, b, c, A, B or C followed by anything" (. means "any character" and * means "zero or more times", so ".*" is any string).
To expand a little on Andrew's answer: The fact that regular expressions define extractors can be used to decompose the substrings matched by the regex very nicely using Scala's pattern matching, e.g.:
val Process = """([a-cA-C])([^\s]+)""".r // define first, rest is non-space
for (p <- Process findAllIn "aha bah Cah dah") p match {
case Process("b", _) => println("first: 'b', some rest")
case Process(_, rest) => println("some first, rest: " + rest)
// etc.
}
String.matches is the way to do pattern matching in the regex sense.
But as a handy aside, word.firstLetter in real Scala code looks like:
word(0)
Scala treats Strings as a sequence of Chars, so if for some reason you wanted to explicitly get the first character of the String and match it, you could use something like this:
"Cat"(0).toString.matches("[a-cA-C]")
res10: Boolean = true
I'm not proposing this as the general way to do regex pattern matching, but it's in line with your proposed approach to first find the first character of a String and then match it against a regex.
EDIT:
To be clear, the way I would do this is, as others have said:
"Cat".matches("^[a-cA-C].*")
res14: Boolean = true
Just wanted to show an example as close as possible to your initial pseudocode. Cheers!
First, note that regular expressions can be used on their own. Here is an example:
import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
case Some(v) => println(v)
case _ =>
} // output: Scala
Second, combining regular expressions with pattern matching is very powerful. Here is a simple example.
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
case date(year, month, day) => "hello"
} // output: hello
Regular expressions are already powerful on their own; Scala's pattern matching makes them more convenient to use. More examples are in the Scala documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.util.matching.Regex
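One thing to watch in the date example: a match with a single case throws a MatchError on input that does not fit the pattern. A possible way to make it total is sketched below, returning an Option; the names DateParsing and parse are made up for illustration, and no range checks are done on month or day.

```scala
object DateParsing {
  private val Date = """(\d{4})-(\d{2})-(\d{2})""".r

  // Returns the (year, month, day) triple, or None when the input does not
  // have the yyyy-mm-dd shape.
  def parse(s: String): Option[(Int, Int, Int)] = s match {
    case Date(year, month, day) => Some((year.toInt, month.toInt, day.toInt))
    case _                      => None
  }
}
```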
Note that the approach from @AndrewMyers's answer matches the entire string against the regular expression, as if the regular expression were anchored at both ends of the string with ^ and $. Example:
scala> val MY_RE = "(foo|bar).*".r
MY_RE: scala.util.matching.Regex = (foo|bar).*
scala> val result = "foo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = foo
scala> val result = "baz123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
scala> val result = "abcfoo123" match { case MY_RE(m) => m; case _ => "No match" }
result: String = No match
And with no .* at the end:
scala> val MY_RE2 = "(foo|bar)".r
MY_RE2: scala.util.matching.Regex = (foo|bar)
scala> val result = "foo123" match { case MY_RE2(m) => m; case _ => "No match" }
result: String = No match
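If a match anywhere in the string is what is wanted, Regex offers unanchored, which lifts the whole-string requirement in pattern matches. A minimal sketch, with made-up object and method names:

```scala
object UnanchoredDemo {
  private val Anchored = "(foo|bar)".r
  // Same regex, but its extractor succeeds on the first occurrence
  // anywhere in the input.
  private val Unanchored = Anchored.unanchored

  def firstWord(s: String): String = s match {
    case Unanchored(m) => m
    case _             => "No match"
  }
}
```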

Scala capture group using regex

Let's say I have this code:
val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)
I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:
val pattern = """one.*two(\d+)three""".r
val pattern(aMatch) = string
println(aMatch) // prints 483
Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?
Here's an example of how you can access group(1) of each match:
val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
m => println(m.group(1))
}
This prints "483" (as seen on ideone.com).
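The same idea can be written with findAllMatchIn, which yields Match objects directly and makes it easy to collect the captured group of every occurrence. A small sketch, with made-up names GroupCapture and captured:

```scala
object GroupCapture {
  private val Pattern = """two(\d+)three""".r

  // Collects group(1) of every match instead of the whole matched text.
  def captured(s: String): List[String] =
    Pattern.findAllMatchIn(s).map(_.group(1)).toList
}
```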
The lookaround option
Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:
val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)
The above also prints "483" (as seen on ideone.com).
References
regular-expressions.info/Lookarounds
val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r
string match {
case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
case _ => // no match
}
Starting with Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:
"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"
Or even:
val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483
If you expect non-matching input, you can add a default case:
"one493deux483three" match {
case s"${x}two${y}three" => y
case _ => "no match"
}
// String = "no match"
You want to look at group(1), you're currently looking at group(0), which is "the entire matched string".
See this regex tutorial.
def extractFileNameFromHttpFilePathExpression(expr: String) = {
//define regex
val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r // \\. matches a literal dot before the extension
// findFirstMatchIn/findAllMatchIn returns Option[Match] and Match has methods to access capture groups.
regex.findFirstMatchIn(expr) match {
case Some(i) => i.group(1)
case None => "regex_error"
}
}
extractFileNameFromHttpFilePathExpression(
"http4://testing.bbmkl.com/document/sth1234.zip")

How to check whether a String fully matches a Regex in Scala?

Assume I have a Regex pattern I want to match many Strings to.
val Digit = """\d""".r
I just want to check whether a given String fully matches the Regex. What is a good and idiomatic way to do this in Scala?
I know that I can pattern match on Regexes, but this is syntactically not very pleasing in this case, because I have no groups to extract:
scala> "5" match { case Digit() => true case _ => false }
res4: Boolean = true
Or I could fall back to the underlying Java pattern:
scala> Digit.pattern.matcher("5").matches
res6: Boolean = true
which is not elegant, either.
Is there a better solution?
Answering my own question I'll use the "pimp my library pattern"
import scala.util.matching.Regex
object RegexUtils {
implicit class RichRegex(val underlying: Regex) extends AnyVal {
def matches(s: String) = underlying.pattern.matcher(s).matches
}
}
and use it like this
import RegexUtils._
val Digit = """\d""".r
if (Digit matches "5") println("match")
else println("no match")
unless someone comes up with a better (standard) solution.
Notes
I didn't pimp String to limit the scope of potential side effects.
unapplySeq does not read very well in that context.
I don't know Scala all that well, but it looks like you can just do:
"5".matches("\\d")
References
http://langref.org/scala/pattern-matching/matching
For the full match you may use unapplySeq. This method tries to match target (whole match) and returns the matches.
scala> val Digit = """\d""".r
Digit: scala.util.matching.Regex = \d
scala> Digit unapplySeq "1"
res9: Option[List[String]] = Some(List())
scala> Digit unapplySeq "123"
res10: Option[List[String]] = None
scala> Digit unapplySeq "string"
res11: Option[List[String]] = None
"""\d""".r.unapplySeq("5").isDefined //> res1: Boolean = true
"""\d""".r.unapplySeq("a").isDefined //> res2: Boolean = false
Using the standard Scala library with a pre-compiled regex and pattern matching (the idiomatic Scala approach):
val digit = """(\d)""".r
"2" match {
case digit( a) => println(a + " is Digit")
case _ => println("it is something else")
}
more to read: http://www.scala-lang.org/api/2.12.1/scala/util/matching/index.html
The answer is in the regex:
val Digit = """^\d$""".r
Then use one of the existing methods.
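For instance, with the anchors in place, findFirstIn can serve as a full-match check. This is only a sketch; the names FullMatch and isSingleDigit are made up for illustration.

```scala
object FullMatch {
  // ^ and $ pin the match to the start and end of the input, so a
  // search-style method behaves like a whole-string test.
  private val Digit = """^\d$""".r

  def isSingleDigit(s: String): Boolean = Digit.findFirstIn(s).isDefined
}
```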