Scala Regular Expressions (string delimited by double quotes) - regex

I am new to Scala. I am trying to match a string delimited by double quotes, and I am a bit puzzled by the following behavior:
If I do the following:
val stringRegex = """"([^"]*)"(.*$)"""
val regex = stringRegex.r
val tidyTokens = Array[String]("1", "\"test\"", "'c'", "-23.3")
tidyTokens.foreach {
token => if (token.matches (stringRegex)) println (token + " matches!")
}
I get
"test" matches!
otherwise, if I do the following:
tidyTokens.foreach {
token => token match {
case regex(token) => println (token + " matches!")
case _ => println ("No match for token " + token)
}
}
I get
No match for token 1
No match for token "test"
No match for token 'c'
No match for token -23.3
Why doesn't "test" match in the second case?

Take your regular expression:
"([^"]*)"(.*$)
When compiled with .r, this string yields a regex object which, if it matches its input string, must yield two captured strings: one for ([^"]*) and the other for (.*$). Your code
case regex(token) => ...
ought to reflect this, so maybe you want
case regex(token, otherStuff) => ...
Or just
case regex(token, _) => ...
Why? The case regex(matchedCaptures...) syntax works because regex is an
object with an unapplySeq method. case regex(token) => ... translates (roughly) to:
case List(token) => ...
Where List(token) is what regex.unapplySeq( inputString ) returns:
regex.unapplySeq("\"test\"") // Returns Some(List("test", ""))
Your regex does match the string "test" but in the case statement the regex extractor's unapplySeq method returns a list of 2 strings because that is what the regex says it captures. That's unfortunate, but the compiler can't help you here because regular expressions are compiled from strings at runtime.
One alternative would be to use a non-capturing group:
val stringRegex = """"([^"]*)"(?:.*$)"""
//                             ^^ (?: makes the trailing group non-capturing
Then your code would work, because regex will now be an extractor object whose
unapplySeq method returns only a single captured group:
tidyTokens foreach {
case regex(token) => println (token + " matches!")
case t => println ("No match for token " + t)
}
Have a look at the tutorial on Extractor Objects for a better understanding of
how apply, unapply and unapplySeq work.
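To see the group-count rule in isolation, here is a minimal sketch. The object name GroupCountDemo and the helper describe are made up for illustration; the two patterns are the ones discussed above.

```scala
import scala.util.matching.Regex

object GroupCountDemo {
  // Original pattern: two capturing groups.
  val twoGroups: Regex = """"([^"]*)"(.*$)""".r
  // Variant with a non-capturing tail: one capturing group.
  val oneGroup: Regex = """"([^"]*)"(?:.*$)""".r

  // Hypothetical helper: always matches with a one-variable pattern.
  def describe(re: Regex, token: String): String = token match {
    case re(inner) => s"matched: $inner"
    case _         => "no match"
  }

  def main(args: Array[String]): Unit = {
    val input = "\"test\""
    println(twoGroups.unapplySeq(input)) // what the extractor hands to the pattern
    println(describe(twoGroups, input))  // one variable against two groups
    println(describe(oneGroup, input))   // one variable against one group
  }
}
```

Only the last call should report a match, since the number of variables in the pattern has to equal the number of capturing groups.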

Related

Scala regex find matches in middle of string [duplicate]

I have written the following code in scala:
val regex_str = "([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
which returns "other", but if I take off the leading underscore:
val regex_str = "([a-z]+)(\\d+)".r
"abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
I get "found". How can I find any ([a-z]+)(\\d+) instead of just at the beginning? I am used to other regex languages where you use a ^ to specify beginning of the string, and the absence of that just gets all matches.
Scala regex patterns are "anchored" by default when used as extractors, i.e. bound to the beginning and end of the target string.
You'll get the expected match with this.
val regex_str = "([a-z]+)(\\d+)".r.unanchored
Hi, maybe you need something like this:
val regex_str = "[^>]([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => println(s"found $a $n")
case _ => println("other")
}
This consumes exactly one leading character (anything except '>'), so it only works when the junk prefix is a single character.
Hope this helps!
By default, the unapplySeq of a Regex tries to match the whole input (it treats the pattern as if it were wrapped in ^ and $).
There are two ways to capture inside the input:
use .* before and after the captures: val regex_str = ".*([a-z]+)(\\d+).*".r (note that the greedy leading .* leaves only the last letter before the digits for the first group; a reluctant .*? avoids that)
use .unanchored instead: val regex_str = "([a-z]+)(\\d+)".r.unanchored
Apart from this, Scala treats regular expression anchors the same way as other languages do; the extractor's whole-input behaviour is an exception made for semantic reasons.
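The two options do not capture exactly the same text, which a small sketch makes visible. The object name AnchoringDemo and the helper captures are hypothetical; the patterns are the ones from this thread.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  val anchored: Regex   = "([a-z]+)(\\d+)".r
  val greedy: Regex     = ".*([a-z]+)(\\d+).*".r
  val unanchored: Regex = "([a-z]+)(\\d+)".r.unanchored

  // Hypothetical helper: returns both captures if the extractor matches.
  def captures(re: Regex, s: String): Option[(String, String)] = s match {
    case re(a, n) => Some((a, n))
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    println(captures(anchored, "_abc123"))   // None: the whole input must match
    println(captures(greedy, "_abc123"))     // Some((c,123)): leading .* eats "_ab"
    println(captures(unanchored, "_abc123")) // Some((abc,123))
  }
}
```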
The regex extractor in Scala pattern matching attempts to match the entire string. If you want to skip some junk characters at the beginning and at the end, prepend .*? (a dot with a reluctant quantifier) and append .* to the regex:
val regex_str = ".*?([a-z]+)(\\d+).*".r
val result = "_!+<>__abc123_%$" match {
case regex_str(a, n) => s"found a = '$a', n = '$n'"
case _ => "no match"
}
println(result)
This outputs:
found a = 'abc', n = '123'
Otherwise, don't use the pattern match with the extractor; use "...".r.findAllIn to find all matches instead.
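As a rough sketch of that last suggestion, the following uses findAllMatchIn (a sibling of findAllIn that yields Match objects, so the groups are easy to read). The object name FindAllDemo, the helper allPairs and the sample input are made up for illustration.

```scala
object FindAllDemo {
  val pattern = "([a-z]+)(\\d+)".r

  // Collects every (letters, digits) pair found anywhere in the input.
  def allPairs(s: String): List[(String, String)] =
    pattern.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit =
    println(allPairs("_abc123--xy7!")) // List((abc,123), (xy,7))
}
```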

Scala equivalent for 'matches' regex method?

Struggling with my first (ever) Scala regex here. I need to see if a given String matches the regex: "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".
So, some examples:
animal<0,sega> => valid
animal<fizz,buzz> => valid
animAl<fizz,buzz> => illegal; animAl contains upper-case (and this is case-sensitive)
animal<fizz,3d> => valid
animal<,3d> => illegal; there needs to be something [a-zA-Z0-9]+ between '<' and ','
animal<fizz,> => illegal; there needs to be something [a-zA-Z0-9]+ between ',' and '>'
animal<fizz,%> => illegal; '%' doesn't match [a-zA-Z0-9]+
etc.
My best attempt so far:
val animalRegex = "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".r
animalRegex.findFirstIn("animal<fizz,buzz>")
Unfortunately that's where I'm hitting a brick wall. findFirstIn and all the other obvious methods available on animalRegex return Option[String]. I was hoping to find something that returns a Boolean, so something like:
val animalRegex = "animal<[a-zA-Z0-9]+,[a-zA-Z0-9]+>".r
if(animalRegex.matches("animal<fizz,buzz>")) {
val leftOperand : String = getLeftOperandSomehow(...)
val rightOperand : String = getRightOperandSomehow(...)
}
So I need the equivalent of Java's matches method, and then need a way to access the "left operand" (that is, the value of the first [a-zA-Z0-9]+ group, which in the current case is "fizz"), and then ditto for the right/second operand ("buzz"). Any ideas where I'm going awry?
To be able to extract the matched parts from your string, you'll need to add capture groups to your regex, like so (note the parentheses):
val animalRegex = "animal<([a-zA-Z0-9]+),([a-zA-Z0-9]+)>".r
Then, you can use Scala's pattern matching to check for a match and extract the operands from the string:
val str = "animal<fizz,3d>"
val result = str match {
case animalRegex(op1,op2) => s"$op1, $op2"
case _ => "Did not match"
}
In this example, result will contain "fizz, 3d"
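If you also want a plain Boolean check, one possible sketch goes through the underlying java.util.regex.Pattern, which mirrors Java's matches semantics. The object name AnimalDemo and the helpers isValid and operands are illustrative names, not part of any library.

```scala
object AnimalDemo {
  val animalRegex = "animal<([a-zA-Z0-9]+),([a-zA-Z0-9]+)>".r

  // Boolean check: true only if the whole input matches, like Java's matches.
  def isValid(s: String): Boolean =
    animalRegex.pattern.matcher(s).matches()

  // Returns both operands, or None when the input is not valid.
  def operands(s: String): Option[(String, String)] = s match {
    case animalRegex(left, right) => Some((left, right))
    case _                        => None
  }

  def main(args: Array[String]): Unit = {
    println(isValid("animal<fizz,buzz>")) // true
    println(isValid("animAl<fizz,buzz>")) // false
    println(operands("animal<fizz,3d>"))  // Some((fizz,3d))
  }
}
```

In practice operands alone is usually enough, since its Option result already tells you whether the input matched.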

Using regex in Scala to group and pattern match

I need to process phone numbers using regex and group them by (country code) (area code) (number). The input format:
country code: between 1-3 digits
, area code: between 1-3 digits
, number: between 4-10 digits
Examples:
1 877 2638277
91-011-23413627
And then I need to print out the groups like this:
CC=91,AC=011,Number=23413627
This is what I have so far:
val s = scala.io.StdIn.readLine()
val pattern = """([0-9]{1,3}) [ -] ([0-9]{1,3}) [ -] ([0-9]{4,10})""".r
val ret = pattern.findAllIn(s)
println("CC=" + ret.group(1) + "AC=" + ret.group(2) + "Number=" + ret.group(3));
The compiler said "empty iterator." I also tried:
val (cc,ac,n) = s
and that didn't work either. How to fix this?
The problem is with your pattern. I would recommend using some tool like RegexPal to test them. Put the pattern in the first text box and your provided examples in the second one. It will highlight the matched parts.
You added spaces between your groups and the [ -] separators, so the pattern expected literal spaces there in the input. The correct pattern is:
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
Also, if you want to access the groups explicitly, you need a Match object. For example, findFirstMatchIn returns the first match as an Option[Match], and findAllMatchIn returns an iterator over all matches:
val allMatches = pattern.findAllMatchIn(s)
allMatches.foreach { m =>
println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
}
val matched = pattern.findFirstMatchIn(s)
matched match {
case Some(m) =>
println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
case None =>
println("There wasn't a match!")
}
I see you also tried extracting the string into variables. You have to use the Regex extractor in the following way:
val Pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val Pattern(cc, ac, n) = s
println(s"CC=$cc,AC=$ac,Number=$n")
And if you want to handle errors:
s match {
case Pattern(cc, ac, n) =>
println(s"CC=$cc,AC=$ac,Number=$n")
case _ =>
println("No match!")
}
You can also take a look at string interpolation to make your strings easier to read: s"..."
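Putting the pieces together, a minimal sketch could look like this. The object name PhoneDemo and the helper format are hypothetical, and returning an Option is just one way to handle input that does not match.

```scala
object PhoneDemo {
  // Capitalised name so it reads as an extractor in patterns.
  val Phone = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r

  // Formats a phone number, or returns None if the input does not match.
  def format(s: String): Option[String] = s match {
    case Phone(cc, ac, n) => Some(s"CC=$cc,AC=$ac,Number=$n")
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    println(format("91-011-23413627")) // Some(CC=91,AC=011,Number=23413627)
    println(format("1 877 2638277"))   // Some(CC=1,AC=877,Number=2638277)
    println(format("12345"))           // None
  }
}
```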

Scala regex "starts with lowercase alphabets" not working

val AlphabetPattern = "^([a-z]+)".r
def stringMatch(s: String) = s match {
case AlphabetPattern() => println("found")
case _ => println("not found")
}
If I try,
stringMatch("hello")
I get "not found", but I expected to get "found".
My understanding of the regex,
[a-z] = in the range of 'a' to 'z'
+ = one or more of the previous pattern
^ = starts with
So regex AlphabetPattern is "all strings that start with one or more alphabets in the range a-z"
Surely I am missing something, want to know what.
Replace case AlphabetPattern() with case AlphabetPattern(_) and it works. The extractor pattern takes a variable to which it binds the result. Here we discard it but you could use x or whatever.
edit: Further to Randall's comment below, if you check the docs for Regex you'll see that it has an unapplySeq rather than an unapply method, which means it takes multiple variables. If you have the wrong number, it won't match, rather like
list match { case List(a,b,c) => a + b + c }
won't match if list doesn't have exactly 3 elements.
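A small sketch of that arity rule, comparing a regex with one capturing group against one with none. The object name ArityDemo and the helpers zeroVars and oneVar are made up for illustration.

```scala
import scala.util.matching.Regex

object ArityDemo {
  val withGroup: Regex    = "^([a-z]+)".r // one capturing group
  val withoutGroup: Regex = "^[a-z]+".r   // no capturing groups

  // Matches only when the regex matches and has zero capturing groups.
  def zeroVars(re: Regex, s: String): Boolean = s match {
    case re() => true
    case _    => false
  }

  // Matches only when the regex matches and has exactly one capturing group.
  def oneVar(re: Regex, s: String): Boolean = s match {
    case re(_) => true
    case _     => false
  }

  def main(args: Array[String]): Unit = {
    println(zeroVars(withGroup, "hello"))    // false: one group, zero variables
    println(oneVar(withGroup, "hello"))      // true
    println(zeroVars(withoutGroup, "hello")) // true
  }
}
```

So the original case AlphabetPattern() would also work if the parentheses were dropped from the regex itself.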
The match falls through to case _ because AlphabetPattern() supplies no variables while the regex has one capturing group, so the extractor pattern cannot match. As an alternative to the extractor, you can use one of the find methods in scala.util.matching.Regex to look for a match with the given Regex.
For example, use findFirstIn to find the first match of AlphabetPattern in a string:
scala> AlphabetPattern.findFirstIn("hello")
res0: Option[String] = Some(hello)
The stringMatch method using findFirstIn and a case statement:
scala> def stringMatch(s: String) = AlphabetPattern findFirstIn s match {
| case Some(s) => println("Found: " + s)
| case None => println("Not found")
| }
stringMatch: (s:String)Unit
scala> stringMatch("hello")
Found: hello

Parametrized Regex for pattern matching

Is it possible to match regular expression pattern which is returned from a function? Can I do something like this?
def pattern(prefix: String) = (prefix + "_(\\w+)").r
val x = something match {
case pattern("a")(key) => "AAAA" + key
case pattern("b")(key) => "BBBB" + key
}
I cannot compile the above code. The following console snapshot shows an error I get. What am I doing wrong?
scala> def pattern(prefix: String) = (prefix + "_(\\w+)").r
pattern: (prefix: String)scala.util.matching.Regex
scala> def f(s:String) = s match {
| case pattern("a")(x) => s+x+"AAAAA"
<console>:2: error: '=>' expected but '(' found.
case pattern("a")(x) => s+x+"AAAAA"
^
This syntax is not supported by Scala; you have to declare the extractor before you use it. See my earlier question on this topic.
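A minimal sketch of declaring the extractors first. The names PrefixDemo, A, B and expand are made up for illustration, and the fallback case is an addition so that the match is total.

```scala
import scala.util.matching.Regex

object PrefixDemo {
  def pattern(prefix: String): Regex = (prefix + "_(\\w+)").r

  // Bind each generated regex to a stable identifier before matching.
  val A: Regex = pattern("a")
  val B: Regex = pattern("b")

  def expand(something: String): String = something match {
    case A(key) => "AAAA" + key
    case B(key) => "BBBB" + key
    case other  => other // assumption: pass unmatched input through unchanged
  }

  def main(args: Array[String]): Unit = {
    println(expand("a_foo")) // AAAAfoo
    println(expand("b_bar")) // BBBBbar
    println(expand("c_baz")) // c_baz
  }
}
```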