Scala pattern match regex

In my Scala program, I want to use a pattern match to test whether the input path points to a .csv file.
val path = raw"\DAP\TestData\test01.csv"
val regex = """.csv$""".r.unanchored
I tried the regex above directly on the string and it matched, but when I use it in a pattern match, it does not work.
val path = raw"\DAP\TestData\test01.csv"
val regex = """\.csv$""".r.unanchored
path match {
case regex(ext) => println(s"$ext matched")
case _ => println("something else happened")
}
I want it to print something like ".csv matched".
Could anyone help me with this? I'm really confused.
Thanks

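It's not clear which part of the path you want to capture and report, but in any case you'll want a capture group in the regex pattern: a regex used as an extractor in a case clause binds one variable per capturing group, and your pattern has none.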
val path = raw"\DAP\TestData\test01.csv"
val re = """(.*\.csv)$""".r.unanchored
path match {
case re(typ) => println(s"$typ matched") //"\DAP\TestData\test01.csv matched"
case _ => println("something else happened")
}
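The arity rule can be checked directly. The following is a minimal sketch with hypothetical names (CsvMatch, wrongArity, withGroup, isCsv): it contrasts the question's one-binder pattern against a group-less regex with two variants that do succeed.

```scala
object CsvMatch {
  // No capturing group: the extractor yields an empty list of groups
  val noGroup = """\.csv$""".r.unanchored
  // One capturing group around the extension
  val oneGroup = """(\.csv)$""".r.unanchored

  // One binder against zero groups: this case can never succeed
  def wrongArity(path: String): String = path match {
    case noGroup(ext) => s"$ext matched"
    case _            => "something else happened"
  }

  // One binder against one group: ext is bound to ".csv"
  def withGroup(path: String): String = path match {
    case oneGroup(ext) => s"$ext matched"
    case _             => "something else happened"
  }

  // Zero binders against zero groups: a plain yes/no test
  def isCsv(path: String): Boolean = path match {
    case noGroup() => true
    case _         => false
  }
}
```

If you only need to know whether the path ends in .csv, the empty-argument form avoids the capture group entirely.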
You can also use a capture group to report which of several alternative patterns matched, for example one of several file extensions:
val re = ".*\\.((?i:json|xml|csv))$".r
raw"\root\test31.XML" match {
case re(ext) => println(s"$ext matched") //"XML matched"
case _ => println("something else happened")
}
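Wrapping the extension pattern in a function that returns an Option makes the behavior easy to test. This is a sketch with hypothetical names (ExtensionMatch, extOf). Because this regex is not unanchored, pattern matching requires it to cover the entire path, so the extension must be the last thing in the string.

```scala
object ExtensionMatch {
  // Anchored by default: .* consumes the path, \. the final dot, group 1 the extension.
  // (?i:...) makes only the alternation case-insensitive.
  val Ext = """.*\.((?i:json|xml|csv))$""".r

  def extOf(path: String): Option[String] = path match {
    case Ext(ext) => Some(ext)
    case _        => None
  }
}
```

Note that the captured text keeps its original case ("XML", not "xml"), since (?i:...) only affects how the pattern compares characters.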

You can try it like this:
val regex = """(\.csv)$""".r.unanchored
path match {
case regex(fileType) => println(s"$fileType matched")
case _ => println("something else happened")
}
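The .unanchored call matters in this answer: a plain Regex used in a case clause must match the whole input, so (\.csv)$ on its own cannot match a full path. A sketch with hypothetical names (AnchoringDemo, extract) compares the two forms on the same input:

```scala
object AnchoringDemo {
  // Default Regex: pattern matching uses Matcher.matches, so the whole input must match
  val anchoredCsv = """(\.csv)$""".r
  // UnanchoredRegex: pattern matching uses Matcher.find, so a match anywhere is enough
  val unanchoredCsv = """(\.csv)$""".r.unanchored

  // Returns (result with anchored regex, result with unanchored regex)
  def extract(path: String): (Option[String], Option[String]) = {
    val viaAnchored = path match {
      case anchoredCsv(ext) => Some(ext)
      case _                => None
    }
    val viaUnanchored = path match {
      case unanchoredCsv(ext) => Some(ext)
      case _                  => None
    }
    (viaAnchored, viaUnanchored)
  }
}
```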

Related

Matching an IP address in Scala

I'm new to Scala. I've written this code to match IP addresses, but it prints "No match.".
val regex = """^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})$""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
case regex(ip) => println(ip)
case _ => println("No match.")
}
isValidIP(i)
Result: No match.
I have verified that the Regex pattern works as expected.
What am I missing here?
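There are several issues:
Your regex is more complicated than it needs to be; a well-known IP address validation regex from regular-expressions.info is shorter and easier to verify.
A regex used in match must match the entire string (it is anchored by default), so explicit ^ and $ anchors are not needed there.
match also requires the number of variables in the pattern to equal the number of capturing groups in the regex. Your regex has many capturing groups, but case regex(ip) binds only one, so that case never matches. If you do not need any groups, make them non-capturing with (?:...) and use case regex() => println(ip) to just check whether the regex matches the string.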
You can fix your code using
val regex = """(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)""".r
val i = "10.20.30.40"
def isValidIP(ip: String) = ip match {
case regex() => println(ip)
case _ => println("No match.")
}
isValidIP(i)
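To isolate the group-count problem, the sketch below reuses the question's regex unchanged. The object and method names (IpCheck, oneBinder, anyGroups) are hypothetical. The sequence wildcard _* accepts any number of groups, so it is a handy yes/no test when you cannot or do not want to rewrite the groups as non-capturing.

```scala
object IpCheck {
  // The question's regex, unchanged: it contains many capturing groups
  val ipRegex = """^(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))((\.(([0-9])|([1-9][0-9])|(1([0-9]{2}))|(2[0-4][0-9])|(25[0-5]))){3})$""".r

  // One binder: only succeeds if the regex produces exactly one group
  def oneBinder(ip: String): Boolean = ip match {
    case ipRegex(_) => true
    case _          => false
  }

  // Sequence wildcard: succeeds whenever the regex matches, whatever the group count
  def anyGroups(ip: String): Boolean = ip match {
    case ipRegex(_*) => true
    case _           => false
  }
}
```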

Why does Scala regexp work differently in pattern matching

I have a simple regular expression val emailRegex = "\\w+#\\w+\\.\\w+".r that matches simple emails (not for production, of course:). When I run the following code:
println(email match {
case emailRegex(_) => "cool"
case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())
It prints not cool and true. Adding anchors doesn't help either: "^\\w+#\\w+\\.\\w+$".r gives the same result. But when I add parentheses "(\\w+#\\w+\\.\\w+)".r it prints cool and true.
Why does this happen?
The number of arguments to a regex pattern must match the number of capturing groups in the regex. Your regex does not have any capturing groups, so there should be zero arguments:
println(email match {
case emailRegex() => "cool"
case _ => "not cool"
})
println(emailRegex.pattern.matcher(email).matches())
Because pattern matching with a regex is about capturing regex groups:
val email = "foo#foo.com"
val slightlyDifferentEmailRegex = "(\\w+)#\\w+\\.\\w+".r // just add a group with two brackets
println(email match {
case slightlyDifferentEmailRegex(g) => s"cool and here's the captured group: $g"
case _ => "not cool"
})
prints:
cool and here's the captured group: foo
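Both answers come down to the same rule: one pattern variable per capturing group. A sketch with hypothetical names (EmailCheck, check, split) shows the zero-group form next to a three-group form that destructures the address:

```scala
object EmailCheck {
  // No capturing groups: use an empty argument list in the pattern
  val emailRegex = "\\w+#\\w+\\.\\w+".r
  // Three capturing groups: bind three variables
  val emailParts = "(\\w+)#(\\w+)\\.(\\w+)".r

  def check(email: String): String = email match {
    case emailRegex() => "cool"
    case _            => "not cool"
  }

  def split(email: String): Option[(String, String, String)] = email match {
    case emailParts(user, domain, tld) => Some((user, domain, tld))
    case _                             => None
  }
}
```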

Scala regex: extract file extension

val file_name="D:/folder1/folder2/filename.ext" //filename
val reg_ex = """(.*?).(\\\\w*$)""".r //regex pattern
file_name match {
case reg_ex(one , two) =>s"$two is extension"
case _ => println(" file_reg_ex none")
}
I want to extract the file extension, i.e. "ext", from the path above using a Scala regex with match and case.
I am using the regex above, but it always goes to the no-match case.
Any pointers to regex tutorials would be helpful.
A few minor adjustments.
val reg_ex = """.*\.(\w+)""".r
file_name match {
case reg_ex(ext) =>s"$ext is extension"
case _ => println("file_reg_ex none"); ""
}
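Only one capture group is needed. Ignore everything before the final dot, \. (escaped so it's a dot and not "any char"), and capture the rest. Also note that inside triple quotes a backslash is not an escape character, so \. and \w are written with a single backslash; the original \\\\w asked the regex to match two literal backslashes.
The default case, case _, should do more than println: it should return the same type as the other case (here, a String).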
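The escaping point is easy to get wrong, so here is a small sketch with hypothetical names (FileExt, extension). It shows that the triple-quoted pattern and the double-backslash ordinary-string pattern compile to the same regex, and that the question's pattern finds nothing in a path without backslashes.

```scala
object FileExt {
  // The question's pattern: inside triple quotes \\\\ stays as four backslashes,
  // which the regex engine reads as two escaped (literal) backslashes before w*
  val questionPattern = """(.*?).(\\\\w*$)""".r
  // Triple quotes: \. and \w reach the regex engine unchanged
  val tripleQuoted = """.*\.(\w+)""".r
  // Ordinary string literal: every regex backslash must be doubled
  val escaped = ".*\\.(\\w+)".r

  // Greedy .* backtracks to the last dot, so only the final extension is captured
  def extension(fileName: String): Option[String] = fileName match {
    case tripleQuoted(ext) => Some(ext)
    case _                 => None
  }
}
```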

Scala Anchored Regex acts as unanchored

So for some reason in Scala 2.11, my anchored regex patterns act as unanchored regex patterns.
scala> """something\.com""".r.anchored findFirstIn "app.something.com"
res66: Option[String] = Some(something.com)
scala> """^.something\.com$""".r.anchored findFirstIn "app.something.com"
res65: Option[String] = None
I thought the first expression would evaluate to None like the second (with manually entered anchors), but it does not.
Any help would be appreciated.
The findFirstIn method ignores the anchoring set by .anchored: it always searches for the first match anywhere in the input.
For example, this prints only "A", the first match:
"""\w+""".r findFirstIn "A simple example." foreach println // prints "A"
BTW, once you create a regex like "pattern".r, it is anchored by default, but that only matters when you use the regex in a match block. Inside findAllIn or findFirstIn, this kind of anchoring is simply ignored.
So, to make sure the regex matches the whole string, always add ^ and $ (or \A and \z) anchors if you are not sure where the regex is going to be used.
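The difference between the two kinds of anchoring can be checked side by side. In this sketch the names AnchorDemo, found and wholeMatch are hypothetical. Explicit ^ and $ are part of the pattern and therefore affect every method, while .anchored only affects pattern matching.

```scala
object AnchorDemo {
  // .anchored is already the default; it only affects pattern matching
  val re = """something\.com""".r.anchored
  // Explicit ^ and $ anchors are part of the pattern, so every method honors them
  val explicit = """^something\.com$""".r

  // findFirstIn always searches (Matcher.find), ignoring .anchored
  def found(s: String): Option[String] = re.findFirstIn(s)

  // In a case clause, an anchored Regex must match the whole input
  def wholeMatch(s: String): Boolean = s match {
    case re() => true
    case _    => false
  }
}
```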
I think anchored only has an effect when used with match:
val reg = "o".r.anchored
"foo" match {
case reg() => "Yes!"
case _ => "No!"
}
... returns "No!".
This doesn't seem very useful, because "o".r is anchored by default anyway. The only use I can imagine is if you made a regex unanchored (by accident? :)) and then want to undo it, or if you want to handle both cases, but separately:
val reg = "o".r.unanchored
val anchoredReg = reg.anchored // bind to a val: an extractor in a case clause must be a stable identifier
"foo" match {
case anchoredReg() => "Anchored!"
case reg() => "Unanchored"
case _ => "I dunno"
}

Scala Regex Multiple Block Capturing

I'm trying to capture parts of a multi-lined string with a regex in Scala.
The input is of the form:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
I've tried several possibilities that should get me the text out of the begin { } blocks. One of them:
val Block = """(?s).*begin \{(.*)\}""".r
input match {
case Block(content) => println(content)
case _ => println("NO MATCH")
}
I get NO MATCH. If I drop the \}, the regex becomes (?s).*begin \{(.*) and it matches the last block, including the unwanted } and "some text". I checked my regex at rubular.com as /.*begin \{(.*)\}/m and it matches at least one block. I thought that once my Scala regex matched the same way, I could use findAllIn to match all blocks. What am I doing wrong?
I had a look at Scala Regex enable Multiline option, but I could not manage to capture all occurrences of the text blocks in, for example, a Seq[String].
Any help is appreciated.
As Alex has said, when you use pattern matching to extract fields from a regular expression, the pattern acts as if it were bounded (i.e., as if it used ^ and $). The usual way to avoid this problem is to use findAllIn first, like this:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
val Block = """(?s)begin \{(.*)\}""".r
Block findAllIn input foreach (_ match {
case Block(content) => println(content)
case _ => println("NO MATCH")
})
Otherwise, you can use .* at the beginning and end to get around that restriction:
val Block = """(?s).*begin \{(.*)\}.*""".r
input match {
case Block(content) => println(content)
case _ => println("NO MATCH")
}
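Because the leading .* is greedy, it backtracks only as far as the last "begin {", so this wrapped pattern appears to capture just the final block. A sketch with hypothetical names (WrappedBlock, content) makes that visible:

```scala
object WrappedBlock {
  val input = """some text
    |begin {
    | content to extract
    | content to extract
    |}
    |some text
    |begin {
    | other content to extract
    |}
    |some text""".stripMargin

  // Leading and trailing .* let the anchored match cover the whole input;
  // the greedy leading .* pushes "begin {" to the last block
  val Block = """(?s).*begin \{(.*)\}.*""".r

  def content: Option[String] = input match {
    case Block(c) => Some(c)
    case _        => None
  }
}
```

This is one reason the findAllIn approach is preferable when the input holds several blocks.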
By the way, you probably want a non-greedy (lazy) quantifier, so that each match stops at the first closing brace:
val Block = """(?s)begin \{(.*?)\}""".r
Block findAllIn input foreach (_ match {
case Block(content) => println(content)
case _ => println("NO MATCH")
})
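To collect the block contents into a Seq[String], as the question asks, findAllMatchIn gives access to each match's group. This sketch uses hypothetical names (AllBlocks, greedyCount, contents) and a shortened input; it also counts matches to show why the greedy version finds only one.

```scala
object AllBlocks {
  val input = "some text\nbegin {\n a\n}\nsome text\nbegin {\n b\n}\nsome text"

  // (?s) lets . match newlines in both patterns
  val Greedy = """(?s)begin \{(.*)\}""".r
  val Lazy   = """(?s)begin \{(.*?)\}""".r

  // Greedy: one match running from the first "begin {" to the last "}"
  val greedyCount: Int = Greedy.findAllIn(input).size
  // Lazy: each match stops at the nearest "}", so every block is found
  val contents: List[String] = Lazy.findAllMatchIn(input).map(_.group(1).trim).toList
}
```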
When doing a match, I believe a full match is implicitly required. Your match is equivalent to:
val Block = """^(?s).*begin \{(.*)\}$""".r
It works if you add .* to the end:
val Block = """(?s).*begin \{(.*)\}.*""".r
I haven't been able to find any documentation on this, but I have encountered this same issue.
As a complement to the other answers, I wanted to point out the existence of kantan.regex, which lets you write the following:
import kantan.regex.ops._
// The type parameter is the type as which to decode results,
// the value parameters are the regular expression to apply and the group to
// extract data from.
input.evalRegex[String]("""(?s)begin \{(.*?)\}""", 1).toList
This yields:
List(Success(
content to extract
content to extract
), Success(
other content to extract
))