Why does Scala regexp work differently in pattern matching

I have a simple regular expression val emailRegex = "\\w+#\\w+\\.\\w+".r that matches simple emails (not for production, of course:). When I run the following code:
println(email match {
case emailRegex(_) => "cool"
case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())
It prints not cool and true. Adding anchors doesn't help either: "^\\w+#\\w+\\.\\w+$".r gives the same result. But when I add parentheses "(\\w+#\\w+\\.\\w+)".r it prints cool and true.
Why does this happen?

The number of arguments to a regex pattern should match the number of capturing groups in the regex. Your regex does not have any capturing groups, so there should be zero arguments:
println(email match {
case emailRegex() => "cool"
case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())

Because pattern matching with a regex is about capturing regex groups:
val email = "foo#foo.com"
val slightlyDifferentEmailRegex = "(\\w+)#\\w+\\.\\w+".r // just add a group with a pair of parentheses
println(email match {
case slightlyDifferentEmailRegex(g) => "cool" + s" and here's the captured group: $g"
case _ => "not cool"
})
prints:
cool and here's the captured group: foo
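The rule from both answers can be sketched as follows: the extractor needs exactly as many arguments as the regex has capturing groups, and `_*` accepts any number of groups. The object and method names below are made up for illustration; only the regexes follow the ones discussed above.

```scala
object GroupArityDemo {
  // Same e-mail shape as in the question, with zero, one and two capturing groups.
  val noGroups  = "\\w+#\\w+\\.\\w+".r
  val oneGroup  = "(\\w+)#\\w+\\.\\w+".r
  val twoGroups = "(\\w+)#(\\w+\\.\\w+)".r

  // Zero groups: the pattern takes zero arguments.
  def matchesPlain(s: String): Boolean = s match {
    case noGroups() => true
    case _          => false
  }

  // One group: exactly one argument, bound to the captured text.
  def userOf(s: String): Option[String] = s match {
    case oneGroup(user) => Some(user)
    case _              => None
  }

  // Two groups: two arguments.
  def parts(s: String): Option[(String, String)] = s match {
    case twoGroups(user, host) => Some((user, host))
    case _                     => None
  }

  // `_*` ignores the group count, so the same pattern shape works for any regex.
  def matchesAnyArity(s: String): Boolean = s match {
    case twoGroups(_*) => true
    case _             => false
  }

  // The mismatch from the question: one argument against zero groups never matches.
  def mismatched(s: String): Boolean = s match {
    case noGroups(_) => true
    case _           => false
  }
}
```

For the input "foo#foo.com", `matchesPlain` and `matchesAnyArity` succeed, while `mismatched` falls through to the default case even though the regex itself matches the string.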

Related

Matching an IP address in Scala

I'm new to Scala. I've written this piece of code to match IP addresses, but it prints "No match."
val regex = """^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})$""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
case regex(ip) => println(ip)
case _ => println("No match.")
}
isValidIP(i)
Result: No match.
I have verified that the Regex pattern works as expected.
What am I missing here?
There are several issues:
An issue with your regex that does not match the full IP address. You can use a well-known IP address validation regex from regular-expressions.info.
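A match block requires the regex to match the full string.
A match block also requires the pattern to take as many arguments as the regex has capturing groups. If you do not want to capture anything, make the groups non-capturing and use case regex() => println(ip) to just check whether the regex matches the string.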
You can fix your code using
val regex = """(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
case regex() => println(ip)
case _ => println("No match.")
}
isValidIP(i)
See the Scala code demo.
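As a variation on the fix above, here is a minimal sketch that returns a Boolean instead of printing. The object name `IpCheck` and the helper `isValidIPViaMatcher` are assumptions added for illustration; the octet pattern is the one from the answer.

```scala
object IpCheck {
  // One octet, written with a non-capturing group so the regex has no capturing groups at all.
  private val octet = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

  // Three "octet." prefixes followed by a final octet.
  val ipRegex = ("(?:" + octet + "\\.){3}" + octet).r

  // Zero capturing groups, so the extractor takes zero arguments.
  def isValidIP(ip: String): Boolean = ip match {
    case ipRegex() => true
    case _         => false
  }

  // The same whole-string check without pattern matching.
  def isValidIPViaMatcher(ip: String): Boolean =
    ipRegex.pattern.matcher(ip).matches()
}
```

Both helpers require the whole string to match, so `IpCheck.isValidIP("10.20.30.40")` succeeds while a value such as "256.1.1.1" or "10.20.30" is rejected.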

Scala pattern match regex

In my scala program, I want to use a pattern match to test whether there is a valid .csv file in the input path.
path ="\DAP\TestData\test01.csv"
val regex=""".csv$""".r.unanchored
I tried the previous regex against the string and it worked, but when I use it in a match expression, it does not work.
path ="\DAP\TestData\test01.csv"
val regex="""\.csv$""".r.unanchored
path match {
case regex(type) =>println(s"$type matched")
case _ =>println("something else happened")
}
I need to successfully print information like ".csv matched".
Could anyone help me with this issue? I'm really confused by it.
Thanks
It's not clear which part of the path you want to capture and report. But in any case you'll probably want a capture group in the regex pattern.
val path = raw"\DAP\TestData\test01.csv"
val re = """(.*\.csv)$""".r.unanchored
path match {
case re(typ) => println(s"$typ matched") //"\DAP\TestData\test01.csv matched"
case _ => println("something else happened")
}
You can also use the capture group to capture any one of many different target patterns.
val re = ".*\\.((?i:json|xml|csv))$".r
raw"\root\test31.XML" match {
case re(ext) => println(s"$ext matched") //"XML matched"
case _ => println("something else happened")
}
You can try it like this:
val regex = """(\.csv)$""".r.unanchored
path match {
case regex(fileType) => println(s"$fileType matched")
case _ => println("something else happened")
}
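To see why `.unanchored` matters here, the following sketch runs the same pattern in both modes. The object and method names are hypothetical; the pattern and the sample path come from the thread.

```scala
object CsvCheck {
  // Unanchored: the pattern may start anywhere, and `$` still pins it to the end of the path.
  val looseSuffix = """(\.csv)$""".r.unanchored

  // Anchored by default: in a match block the pattern must cover the whole path.
  val strictSuffix = """(\.csv)$""".r

  def describeLoose(path: String): String = path match {
    case looseSuffix(fileType) => s"$fileType matched"
    case _                     => "something else happened"
  }

  def describeStrict(path: String): String = path match {
    case strictSuffix(fileType) => s"$fileType matched"
    case _                      => "something else happened"
  }
}
```

With the path `raw"\DAP\TestData\test01.csv"`, `describeLoose` reports ".csv matched", whereas `describeStrict` only succeeds when the entire input is ".csv".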

scala regex : extract file extension

val file_name="D:/folder1/folder2/filename.ext" //filename
val reg_ex = """(.*?).(\\\\w*$)""".r //regex pattern
file_name match {
case reg_ex(one , two) =>s"$two is extension"
case _ => println(" file_reg_ex none")
}
I want to extract the file extension, i.e. "ext", from the above using a Scala regex with match and case.
I am using the regex above, and it falls into the non-matching case.
Any pointers to regex tutorials will be helpful.
A few minor adjustments.
val reg_ex = """.*\.(\w+)""".r
file_name match {
case reg_ex(ext) =>s"$ext is extension"
case _ => println("file_reg_ex none"); ""
}
Only one capture group needed. Ignore everything before the final dot, \. (escaped so it's a dot and not an "any char") and capture the rest.
The default, case _, should do more than println. It should return the same type as the match.
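One way to give both branches the same type is to return an Option, sketched below. The names `FileExt`, `ExtPattern` and `extensionOf` are assumptions for illustration; the regex is the one from the answer.

```scala
object FileExt {
  // Greedy `.*` runs to the last dot; the single group captures the word characters after it.
  val ExtPattern = """.*\.(\w+)""".r

  // Both branches return Option[String], so the caller decides how to report a miss.
  def extensionOf(fileName: String): Option[String] = fileName match {
    case ExtPattern(ext) => Some(ext)
    case _               => None
  }
}
```

A caller could then write `FileExt.extensionOf(file_name).getOrElse("none")` instead of printing inside the match.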

Scala Anchored Regex acts as unanchored

So for some reason in Scala 2.11, my anchored regex patterns act as unanchored regex patterns.
scala> """something\.com""".r.anchored findFirstIn "app.something.com"
res66: Option[String] = Some(something.com)
scala> """^.something\.com$""".r.anchored findFirstIn "app.something.com"
res65: Option[String] = None
I thought the first expression would evaluate as None like the second (manually entered anchors) but it does not.
Any help would be appreciated.
The findFirstIn method un-anchors the regex automatically.
You can see that the example code also matches A only:
Example:
"""\w+""".r findFirstIn "A simple example." foreach println // prints "A"
BTW, once you create a regex like "pattern".r, it is anchored by default, but that only matters when you use the regex in a match block. Inside findAllIn or findFirstIn, this type of anchoring is simply ignored.
So, to make sure the regex matches the whole string, always add ^ and $ (or \A and \z) anchors if you are not sure where you are going to use the regexes.
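The behaviour described in this answer can be checked with a small sketch. The object and value names are made up for illustration; the pattern and input are the ones from the question.

```scala
object AnchoringDemo {
  val input = "app.something.com"

  // Anchored by default, but only when used as an extractor in a match block.
  val plain = """something\.com""".r

  // Here the anchors are part of the pattern itself, so every method honours them.
  val explicit = """^something\.com$""".r

  // findFirstIn searches for a substring and ignores the default anchoring.
  val foundPlain: Option[String] = plain.findFirstIn(input)

  // The explicit ^ prevents a match, because the input starts with "app.".
  val foundExplicit: Option[String] = explicit.findFirstIn(input)

  // In a match block the default anchoring applies: the whole string must match.
  def matchesWhole(s: String): Boolean = s match {
    case plain() => true
    case _       => false
  }

  // After .unanchored, a match block accepts a match anywhere in the string.
  def matchesPart(s: String): Boolean = {
    val loose = plain.unanchored
    s match {
      case loose() => true
      case _       => false
    }
  }
}
```

Here `foundPlain` holds the substring "something.com" and `foundExplicit` is empty, while `matchesWhole(input)` fails and `matchesPart(input)` succeeds.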
I think, it is only supposed to work with match:
val reg = "o".r.anchored
"foo" match {
case reg() => "Yes!"
case _ => "No!"
}
... returns "No!".
This doesn't seem very useful, because just "o".r is anchored by default anyway. The only use of this I can imagine is if you made one unanchored (by accident? :)) and then want to undo it, or if you want to match both cases, but separately:
val reg = "o".r.unanchored
"foo" match {
case reg.anchored() => "Anchored!"
case reg() => "Unanchored"
case _ => "I dunno"
}

scala matching optional set of characters

I am using a Scala regex to extract a token from a URL.
My URL is http://www.google.com?x=10&id=x10_23&y=2
Here I want to extract the value x10 that follows id=. Note that _23 is optional and may or may not appear, but if it appears it must be removed.
The regex I have written is
val regex = "^.*id=(.*)(\\_\\d+)?.*$".r
x match {
case regex(id) => print(id)
case _ => print("none")
}
this should work because (\\_\\d+)? should make the _23 optional as a whole.
So I don't understand why it prints none.
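Note that your pattern ^.*id=(.*)(\\_\\d+)?.*$ actually puts x10_23&y=2 into Group 1 because the greedy (.*) subpattern consumes everything after id=. Since (\\_\\d+)? is optional, the greedy subpattern does not have to yield any characters to that capture group. The pattern also contains two capturing groups, while case regex(id) supplies only one argument, so the match falls through to the default case.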
You can use
val regex = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r
val x = "http://www.google.com?x=10&id=x10_23&y=2"
x match {
case regex(id) => print(id)
case _ => print("none")
}
See the IDEONE demo (regex demo)
Note that there is no need to define ^ and $, since that pattern is anchored in Scala by default. (?s) ensures we match the full input string even if it contains newline symbols.
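To make the greedy behaviour visible, the sketch below matches the original pattern with two arguments and compares it with the corrected one. The object and member names are hypothetical; both regexes and the URL are taken from the thread.

```scala
object GreedyDemo {
  val url = "http://www.google.com?x=10&id=x10_23&y=2"

  // Pattern from the question: two capturing groups, the first one greedy.
  val original = "^.*id=(.*)(\\_\\d+)?.*$".r

  // With two arguments the match succeeds, which exposes what each group holds.
  // An optional group that did not participate is bound to null, hence Option(...).
  val captured: Option[(String, Option[String])] = url match {
    case original(id, suffix) => Some((id, Option(suffix)))
    case _                    => None
  }

  // Pattern from the answer: one lazy capturing group, everything else non-capturing.
  val fixed = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r

  def idOf(s: String): Option[String] = s match {
    case fixed(id) => Some(id)
    case _         => None
  }
}
```

In this sketch `captured` shows the first group holding "x10_23&y=2" with the second group unset, while `idOf(url)` yields only "x10".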
Another idea, instead of using a regular expression to extract tokens, would be to use the built-in java.net.URI class with its getQuery method. You can split the query by & and then check whether one of the pairs starts with id= and extract the value.
For instance (just as an example):
import java.net.URI
val x = "http://www.google.com?x=10&id=x10_23&y=2"
val uri = new URI(x)
uri.getQuery.split('&').find(_.startsWith("id=")) match {
case Some(param) => println(param.split('=')(1).replaceAll("_\\d+$", "")) // drop the optional _digits suffix
case None => println("None")
}
I find it simpler to maintain than the regular expression you have, but that's just my thought!