Matching an IP address in Scala - regex

I'm new to Scala. I've written this piece of code to match IP addresses, but it always prints "No match.".
val regex = """^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})$""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
  case regex(ip) => println(ip)
  case _ => println("No match.")
}
isValidIP(i)
Result: No match.
I have verified that the Regex pattern works as expected.
What am I missing here?

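There are several issues:
- Your regex is more complicated than it needs to be and contains many capturing groups. You can use a well-known IP address validation regex from regular-expressions.info instead.
- A regex used as an extractor in a match expression must match the whole string. The pattern is anchored by default, so ^ and $ are not needed.
- The extractor must have exactly one argument per capturing group. Your pattern has many groups, but case regex(ip) binds only one, so that case never matches. If the pattern has no capturing groups, use case regex() => println(ip) to just check whether the regex matches the string.
You can fix your code like this: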
val regex = """(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
  case regex() => println(ip)
  case _ => println("No match.")
}
isValidIP(i)
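A minimal sketch, assuming Scala 2.13 or 3 in script mode, of a validator that returns a Boolean instead of printing, plus a variant with one capturing group per octet. The names ipRegex, ipParts, isValidIP and octets are illustrative only; the point is that the number of binders in each case must equal the number of capturing groups in the pattern.

```scala
import scala.util.matching.Regex

// One octet (0-255) as a non-capturing group, so it adds no groups of its own.
val octet = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

// Zero capturing groups -> the extractor takes zero binders.
val ipRegex: Regex = s"$octet(?:\\.$octet){3}".r

// Regex extractors in a match are anchored, so this checks the whole string.
def isValidIP(ip: String): Boolean = ip match {
  case ipRegex() => true
  case _         => false
}

// Four capturing groups -> the extractor needs exactly four binders.
val ipParts: Regex = s"($octet)\\.($octet)\\.($octet)\\.($octet)".r

def octets(ip: String): Option[List[Int]] = ip match {
  case ipParts(a, b, c, d) => Some(List(a, b, c, d).map(_.toInt))
  case _                   => None
}

println(isValidIP("10.20.30.40"))  // true
println(isValidIP("256.1.1.1"))    // false
println(octets("10.20.30.40"))     // Some(List(10, 20, 30, 40))
```

Building both patterns from the shared octet string keeps them consistent; the (?:...) groups do not count towards the number of binders.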

Related

Scala pattern match regex

In my Scala program, I want to use pattern matching to test whether the input path refers to a .csv file.
path ="\DAP\TestData\test01.csv"
val regex=""".csv$""".r.unanchored
I tried the regex above on its own to match the string and it worked, but it doesn't work when I use it in a pattern match:
path ="\DAP\TestData\test01.csv"
val regex="""\.csv$""".r.unanchored
path match {
  case regex(type) => println(s"$type matched")
  case _ => println("something else happened")
}
I need it to print something like ".csv matched".
Could anyone help me with this? I'm really confused.
Thanks
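It's not clear which part of the path you want to capture and report, but in any case you need a capture group in the regex pattern: a regex extractor binds one variable per capturing group, so a pattern with no groups never matches a one-argument case such as case regex(x).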
val path = raw"\DAP\TestData\test01.csv"
val re = """(.*\.csv)$""".r.unanchored
path match {
  case re(typ) => println(s"$typ matched") //"\DAP\TestData\test01.csv matched"
  case _ => println("something else happened")
}
You can also use the capture group to report whichever of several alternatives matched:
val re = ".*\\.((?i:json|xml|csv))$".r
raw"\root\test31.XML" match {
  case re(ext) => println(s"$ext matched") //"XML matched"
  case _ => println("something else happened")
}
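A small sketch of the same capture-group-with-alternatives idea applied to a list of paths. The file names and the extRe name are made up for illustration; collect keeps only the inputs for which the case matches.

```scala
// Hypothetical inputs, for illustration only.
val files = List(raw"\root\test31.XML", raw"\data\b.csv", raw"\tmp\notes.txt")

// Anchored by default, so the leading .* is needed to skip the directory part.
// (?i:...) makes only the alternation case-insensitive.
val extRe = """.*\.((?i:json|xml|csv))""".r

// collect keeps the paths whose case matches and drops the rest.
val found: List[String] = files.collect { case extRe(ext) => ext.toLowerCase }

println(found)  // List(xml, csv)
```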
You can try it like this:
val regex = """(\.csv)$""".r.unanchored
path match {
  case regex(fileType) => println(s"$fileType matched")
  case _ => println("something else happened")
}
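Putting the pieces together, a self-contained sketch that prints exactly ".csv matched" for the path from the question. It assumes the path is written with a raw interpolator so the backslashes are kept literally; the describe helper is a hypothetical name.

```scala
import scala.util.matching.Regex

// raw"..." keeps each backslash literally; in a plain "..." literal the
// backslashes would be parsed as escape sequences.
val path = raw"\DAP\TestData\test01.csv"

// One capturing group, so exactly one binder in the case below.
// .unanchored lets the extractor find the match at the end of the path
// instead of requiring the whole string to match.
val csvExt: Regex = """(\.csv)$""".r.unanchored

// Hypothetical helper, for illustration only.
def describe(p: String): String = p match {
  case csvExt(ext) => s"$ext matched"
  case _           => "something else happened"
}

println(describe(path))                       // .csv matched
println(describe(raw"\DAP\TestData\a.json"))  // something else happened
```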

Why does Scala regexp work differently in pattern matching

I have a simple regular expression val emailRegex = "\\w+#\\w+\\.\\w+".r that matches simple emails (not for production, of course:). When I run the following code:
println(email match {
  case emailRegex(_) => "cool"
  case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())
It prints not cool and true. Adding anchors doesn't help either: "^\\w+#\\w+\\.\\w+$".r gives the same result. But when I add parentheses "(\\w+#\\w+\\.\\w+)".r it prints cool and true.
Why does this happen?
The number of arguments to a regex extractor pattern must equal the number of capturing groups in the regex. Your regex does not have any capturing groups, so there should be zero arguments:
println(email match {
  case emailRegex() => "cool"
  case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())
Pattern matching with a regex binds its capturing groups, so adding one group makes a one-argument pattern match:
val email = "foo#foo.com"
val slightlyDifferentEmailRegex = "(\\w+)#\\w+\\.\\w+".r // just add a group with a pair of parentheses
println(email match {
  case slightlyDifferentEmailRegex(g) => "cool" + s" and here's the captured group: $g"
  case _ => "not cool"
})
prints:
cool and here's the captured group: foo
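A short sketch, assuming Scala 2.13 or 3, that puts the rule side by side: zero groups need zero binders, two groups need two, and _* accepts any number. The '#' separator just mirrors the question's regex, and the variable names are illustrative.

```scala
val email = "foo#foo.com"

val noGroups  = """\w+#\w+\.\w+""".r      // 0 capturing groups
val twoGroups = """(\w+)#(\w+)\.\w+""".r  // 2 capturing groups

// Zero groups, zero binders: matches.
val zeroBinders = email match {
  case noGroups() => "cool"
  case _          => "not cool"
}

// One binder against zero groups: compiles, but never matches.
val wrongArity = email match {
  case noGroups(_) => "cool"
  case _           => "not cool"
}

// Two groups, two binders.
val twoBinders = email match {
  case twoGroups(user, domain) => s"user=$user, domain=$domain"
  case _                       => "not cool"
}

// _* accepts any number of groups when only success matters.
val anyArity = email match {
  case twoGroups(_*) => "cool"
  case _             => "not cool"
}

println(List(zeroBinders, wrongArity, twoBinders, anyArity))
```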

Scala Anchored Regex acts as unanchored

So for some reason in Scala 2.11, my anchored regex patterns act as unanchored regex patterns.
scala> """something\.com""".r.anchored findFirstIn "app.something.com"
res66: Option[String] = Some(something.com)
scala> """^.something\.com$""".r.anchored findFirstIn "app.something.com"
res65: Option[String] = None
I thought the first expression would evaluate as None like the second (manually entered anchors) but it does not.
Any help would be appreciated.
The findFirstIn method searches for a match anywhere in the input, so it effectively un-anchors the regex.
For example, the following code also matches only "A":
"""\w+""".r findFirstIn "A simple example." foreach println // prints "A"
BTW, once you create a regex like "pattern".r, it is anchored by default, but that only matters when you use the regex as an extractor in a match block. Methods such as findAllIn and findFirstIn ignore this kind of anchoring.
So, to make sure the regex matches the whole string, always add ^ and $ (or \A and \z) anchors if you are not sure where you are going to use the regexes.
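A sketch contrasting the different ways of applying the same pattern to "app.something.com", to illustrate the behaviour described above. It assumes a recent Scala version (2.13 or 3); the variable names are illustrative only.

```scala
val r = """something\.com""".r
val host = "app.something.com"

// findFirstIn searches anywhere in the input, so the default anchoring plays no role.
val found: Option[String] = r.findFirstIn(host)

// A regex extractor in a match requires the whole input to match.
val viaMatch: Boolean = host match {
  case r() => true
  case _   => false
}

// Explicit anchors make findFirstIn require a whole-string match too.
val withAnchors: Option[String] = """^something\.com$""".r.findFirstIn(host)

// java.util.regex can also be asked for a full match directly.
val fullMatch: Boolean = r.pattern.matcher(host).matches()

// An unanchored extractor searches, just like findFirstIn.
val u = r.unanchored
val viaUnanchored: Boolean = host match {
  case u() => true
  case _   => false
}

println(found)          // Some(something.com)
println(viaMatch)       // false
println(withAnchors)    // None
println(fullMatch)      // false
println(viaUnanchored)  // true
```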
I think it only has an effect in a match:
val reg = "o".r.anchored
"foo" match {
  case reg() => "Yes!"
  case _ => "No!"
}
... returns "No!".
This doesn't seem very useful, because "o".r is already anchored by default. The only uses I can imagine are undoing an unanchored regex (made by accident? :)), or matching both cases, but separately:
val reg = "o".r.unanchored
val anchoredReg = reg.anchored
"foo" match {
  case anchoredReg() => "Anchored!"
  case reg() => "Unanchored"
  case _ => "I dunno"
}
... returns "Unanchored".

scala matching optional set of characters

I am using a Scala regex to extract a token from a URL.
My URL is http://www.google.com?x=10&id=x10_23&y=2
Here I want to extract the value x10 that follows id=. Note that _23 is optional and may or may not appear, but if it appears it must be removed.
The regex I have written is:
val regex = "^.*id=(.*)(\\_\\d+)?.*$".r
x match {
  case regex(id) => print(id)
  case _ => print("none")
}
This should work because (\\_\\d+)? should make the _23 optional as a whole.
So I don't understand why it prints none.
Note that your pattern ^.*id=(.*)(\\_\\d+)?.*$ actually puts x10_23&y=2 into Group 1, because the greedy (.*) consumes the rest of the string. Since (_\d+)? is optional, the greedy group does not have to give back any characters to it. Moreover, (\\_\\d+)? is a second capturing group, so case regex(id), which binds only one group, never matches; that is why you get none.
You can use
val regex = "(?s).*[?&]id=([^\\W&]+?)(?:_\\d+)?(?:&.*)?".r
val x = "http://www.google.com?x=10&id=x10_23&y=2"
x match {
  case regex(id) => print(id)
  case _ => print("none")
}
There is no need to add ^ and $: a regex used as an extractor is anchored in Scala by default. (?s) lets . match newline characters, so the full input string is matched even if it contains line breaks.
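A sketch wrapping the regex from this answer in a hypothetical extractId helper that returns an Option, with a few extra inputs showing that the _digits suffix is optional and that id= must start a query parameter.

```scala
// Same pattern as above: one capturing group for the id value; the optional
// "_digits" suffix and the rest of the query are non-capturing.
val IdParam = """(?s).*[?&]id=([^\W&]+?)(?:_\d+)?(?:&.*)?""".r

// Hypothetical helper, for illustration only.
def extractId(url: String): Option[String] = url match {
  case IdParam(id) => Some(id)
  case _           => None
}

println(extractId("http://www.google.com?x=10&id=x10_23&y=2"))  // Some(x10)
println(extractId("http://www.google.com?x=10&id=x10&y=2"))     // Some(x10)
println(extractId("http://www.google.com?id=x10_23"))           // Some(x10)
println(extractId("http://www.google.com?x=10"))                // None
```

The reluctant +? makes the group stop as soon as the optional suffix and the rest of the query can take over, which is what keeps _23 out of the captured value.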
Another idea, instead of using a regular expression to extract tokens, is to use the built-in Java URI class with its getQuery() method. There you can split the query by &, find the pair that starts with id=, and extract the value.
For instance (just as an example):
import java.net.URI
val x = "http://www.google.com?x=10&id=x10_23&y=2"
val uri = new URI(x)
uri.getQuery.split('&').find(_.startsWith("id=")) match {
  case Some(param) => println(param.split('=')(1).replaceFirst("_\\d+$", "")) // drop an optional _digits suffix
  case None => println("None")
}
I find it simpler to maintain than the regular expression you have, but that's just my opinion!
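A slightly more general sketch of the URI approach, assuming simple queries without URL-encoded characters or repeated keys: it builds a Map of all parameters and strips an optional trailing _digits suffix from id. The queryParams and idValue helpers are hypothetical names.

```scala
import java.net.URI

// Hypothetical helper: parse the query string into key -> value pairs.
// Assumes no repeated keys; pairs without '=' are skipped.
def queryParams(url: String): Map[String, String] =
  Option(new URI(url).getQuery).toList // getQuery returns null when there is no query
    .flatMap(_.split('&'))
    .map(_.split("=", 2))
    .collect { case Array(k, v) => k -> v }
    .toMap

// Hypothetical helper: the id value with an optional trailing "_digits" removed.
def idValue(url: String): Option[String] =
  queryParams(url).get("id").map(_.replaceFirst("_\\d+$", ""))

println(idValue("http://www.google.com?x=10&id=x10_23&y=2"))  // Some(x10)
println(idValue("http://www.google.com?x=10"))                 // None
```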

Scala Regex Multiple Block Capturing

I'm trying to capture parts of a multi-lined string with a regex in Scala.
The input is of the form:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
I've tried several possibilities that should get me the text out of the begin { } blocks. One of them:
val Block = """(?s).*begin \{(.*)\}""".r
input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}
I get NO MATCH. If I drop the \}, the regex becomes (?s).*begin \{(.*) and it matches the last block, including the unwanted } and "some text". I checked my regex at rubular.com as /.*begin \{(.*)\}/m and it matches at least one block. I thought that once my Scala regex matched the same way, I could start using findAllIn to match all blocks. What am I doing wrong?
I had a look at Scala Regex enable Multiline option but I could not manage to capture all the occurrences of the text blocks in, for example, a Seq[String].
Any help is appreciated.
As Alex has said, when using pattern matching to extract fields from regular expressions, the pattern acts as if it were anchored (i.e., surrounded by ^ and $). The usual way to avoid this problem is to use findAllIn first, like this:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
val Block = """(?s)begin \{(.*)\}""".r
Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})
Otherwise, you can use .* at the beginning and end to get around that restriction:
val Block = """(?s).*begin \{(.*)\}.*""".r
input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}
By the way, you probably want a non-greedy (reluctant) quantifier, so that each match stops at the first closing brace:
val Block = """(?s)begin \{(.*?)\}""".r
Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})
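Since the question asks for the blocks as a Seq[String], here is a sketch using findAllMatchIn, which exposes each match's groups directly, so there is no need to re-match every hit with the extractor. Trimming the surrounding whitespace is an assumption about what the caller wants.

```scala
val input = """some text
  |begin {
  |  content to extract
  |  content to extract
  |}
  |some text
  |begin {
  |  other content to extract
  |}
  |some text""".stripMargin

// (?s) lets '.' span newlines; the reluctant (.*?) stops at the first '}'.
val Block = """(?s)begin \{(.*?)\}""".r

// findAllMatchIn searches the input (no implicit anchoring) and gives access
// to group(1) of every match; trim is an illustrative choice.
val blocks: List[String] = Block.findAllMatchIn(input).map(_.group(1).trim).toList

blocks.foreach(println)
```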
When pattern matching, I believe a full match is implicitly required. Your match is equivalent to:
val Block = """^(?s).*begin \{(.*)\}$""".r
It works if you add .* to the end:
val Block = """(?s).*begin \{(.*)\}.*""".r
I haven't been able to find any documentation on this, but I have encountered this same issue.
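Instead of padding the pattern with .* at both ends, another option is .unanchored, which makes the extractor search the input rather than require a full match. A minimal sketch with a shortened, made-up input; note that it only yields the first block:

```scala
// Shortened, hypothetical input in the same shape as the question's.
val input = "some text\nbegin {\n  first\n}\nsome text\nbegin {\n  second\n}\n"

// .unanchored: the extractor looks for the first match anywhere in the input.
val FirstBlock = """(?s)begin \{(.*?)\}""".r.unanchored

val first: Option[String] = input match {
  case FirstBlock(content) => Some(content.trim)
  case _                   => None
}

println(first)  // Some(first)
```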
As a complement to the other answers, I wanted to point out the existence of kantan.regex, which lets you write the following:
import kantan.regex.ops._
// The type parameter is the type as which to decode results,
// the value parameters are the regular expression to apply and the group to
// extract data from.
input.evalRegex[String]("""(?s)begin \{(.*?)\}""", 1).toList
This yields:
List(Success(
content to extract
content to extract
), Success(
other content to extract
))