Scala regex get parameters in path - regex

regex noob here.
example path:
home://Joseph/age=20/race=human/height=170/etc
Using regex, how do I grab everything after the "=" between the /Joseph/ path and /etc? I'm trying to create a list like
[20, human, 170]
So far I have
val pattern = ("""(?<=Joseph/)[^/]*""").r
val matches = pattern.findAllIn(path)
The pattern lets me just get "age=20" but I thought findAllIn would let me find all of the "parameter=" matches. And after that, I'm not sure how I would use regex to just obtain the "20" in "age=20", etc.

Code
See regex in use here
(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)
Usage
See code in use here
object Main extends App {
val path = "home://Joseph/age=20/race=human/height=170/etc"
val pattern = ("""(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)""").r
pattern.findAllIn(path).matchData foreach {
m => println(m.group(1))
}
}
Results
Input
home://Joseph/age=20/race=human/height=170/etc
Output
20
human
170
Explanation
(?:(?<=/Joseph/)|\G(?!\A)/) Match the following
(?<=/Joseph/) Positive lookbehind ensuring what precedes matches /Joseph/ literally
\G(?!\A)/ Assert position at the end of the previous match and match / literally
[^=]+ Match one or more of any character except =
= Match this literally
([^=/]+) Capture one or more of any character except = and / into capture group 1
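If you also want the parameter names, the same idea extends to two capture groups. This is only a sketch under assumptions: `PathParams`, `Param` and `parse` are made-up names, and the key part uses `[^=/]+` instead of `[^=]+` so a key cannot span a `/`.

```scala
object PathParams {
  // Variation on the pattern above: group 1 is the key, group 2 is the value.
  // \G(?!\A)/ lets each match continue only where the previous one ended.
  private val Param = """(?:(?<=/Joseph/)|\G(?!\A)/)([^=/]+)=([^=/]+)""".r

  // Returns the key/value pairs in the order they appear in the path.
  def parse(path: String): List[(String, String)] =
    Param.findAllMatchIn(path).map(m => m.group(1) -> m.group(2)).toList
}
```

Calling `PathParams.parse(path).map(_._2)` would give just the values, and `.toMap` would give a lookup by name.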

Your pattern only matches the text directly after Joseph/, which is why only age=20 matched. You could instead match whatever follows each =:
val s = "home://Joseph/age=20/race=human/height=170/etc"
// s: String = home://Joseph/age=20/race=human/height=170/etc
val pattern = "(?<==)[^/]*".r
// pattern: scala.util.matching.Regex = (?<==)[^/]*
pattern.findAllIn(s).toList
// res3: List[String] = List(20, human, 170)
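If a regex is not a hard requirement, plain string splitting may be enough. Note this swaps the regex for `split`, and `SegmentValues` and `values` are hypothetical names. Like the lookbehind above, it picks up every segment containing `=`, not only those after Joseph/.

```scala
object SegmentValues {
  // Split the path on '/', keep segments that contain '=',
  // and return the text after the first '=' in each of them.
  def values(path: String): List[String] =
    path.split('/').toList.collect {
      case segment if segment.contains('=') =>
        segment.substring(segment.indexOf('=') + 1)
    }
}
```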

Related

How to get multiple regex on same string in scala

My requirement is to match multiple regex patterns in a given string:
"<a href=\"https://page1.google.com/ab-cd/ABCDEF\">Hello</a> hiiii <a href=\"https://page2.yahoo.com/gr\">page</a><img src=\"https://image01.google.com/gr/content/attachment/987654321\" alt=\"demo image\"></a><a href=\"https://page3.google.com/hr\">"
With this below code:
val p = Pattern.compile("href=\"(.*?)\"")
val m = p.matcher(str)
while(m.find()){
println(m.group(1))
}
I am getting output:
https://page1.google.com/ab-cd/ABCDEF
https://page2.yahoo.com/gr
https://page3.google.com/hr
With change in Pattern:
val p = Pattern.compile("img src=\"(.*?)\"")
I am getting output:
https://image01.google.com/gr/content/attachment/987654321
But with Pattern:
val p = Pattern.compile("href=\"(.*?)\"|img src=\"(.*?)\"")
I am getting output:
https://page1.google.com/ab-cd/ABCDEF
https://page2.yahoo.com/gr
null
https://page3.google.com/hr
Please let me know how to match multiple regex patterns, or is there any other easy way to do this?
Thanks
You may use
val rx = "(?:href|img src)=\"(.*?)\"".r
val results = rx.findAllMatchIn(s).map(_ group 1)
// println(results.mkString(", ")) prints:
// https://page1.google.com/ab-cd/ABCDEF,
// https://page2.yahoo.com/gr,
// https://image01.google.com/gr/content/attachment/987654321,
// https://page3.google.com/hr
See the Scala demo
Details
(?:href|img src)=\"(.*?)\" matches either href or img src, then a =", and then captures any 0+ chars other than line break chars as few as possible into Group 1, and then a " is matched
With .findAllMatchIn, you get all matches as Match objects, then .map(_ group 1) only fetches Group 1 values.
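As for why the original two-group alternation printed null: when the `img src` branch matches, Group 1 did not take part in the match, so `group(1)` is null and the URL sits in Group 2. If you prefer to keep two separate groups, a rough sketch is to take whichever group is set (`LinkExtractor` and `urls` are made-up names for illustration):

```scala
object LinkExtractor {
  // Same alternation as in the question: group 1 for href, group 2 for img src.
  private val TwoGroups = "href=\"(.*?)\"|img src=\"(.*?)\"".r

  // Option(null) is None, so each match falls back to group 2
  // whenever group 1 did not take part in it.
  def urls(html: String): List[String] =
    TwoGroups.findAllMatchIn(html)
      .flatMap(m => Option(m.group(1)).orElse(Option(m.group(2))))
      .toList
}
```

The single-group pattern above is still simpler; this variant only helps when the two alternatives cannot share one group.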

Scala regex : capture between group

With the regex below I need "test" as the output, but it returns the complete string that matches the regex. How can I capture only the part between the two delimiters?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
case pattern(i) => i
case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so a lookbehind is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
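If the input may contain several such placeholders, the capturing version can be combined with findAllMatchIn. A minimal sketch, assuming the same `{outer.name}` shape (`OuterFields` and `names` are hypothetical names):

```scala
object OuterFields {
  // Group 1 holds the text between "{outer." and the closing "}".
  private val Field = """\{outer\.([^{}]*)\}""".r

  // Collects group 1 of every match, in order of appearance.
  def names(text: String): List[String] =
    Field.findAllMatchIn(text).map(_.group(1)).toList
}
```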

Cannot retrieve a group from Scala Regex match

I am struggling with regexps in Scala (2.11.5). I have the following string to parse (example):
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
I want to extract third numeric value in the string above (it needs to be third after a slash because there can be other groups following), in order to do that I have the following regex pattern:
val pattern = """\/\d+,\d+,(\d+)""".r
I have been trying to retrieve the group for the third sequence of digits, but nothing seems to work for me.
val matchList = pattern.findAllMatchIn(string).foreach(println)
val matchListb = pattern.findAllIn(string).foreach(println)
I also tried using matching pattern.
string match {
case pattern(a) => println(a)
case _ => "What's going on?"
}
and got the same results. Either whole regexp is returned or nothing.
Is there an easy way to retrieve a group from a regexp pattern in Scala?
You can use group method of scala.util.matching.Regex.Match to get the result.
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """\/\d+,\d+,(\d+)""".r
val result = pattern.findAllMatchIn(string) // returns iterator of Match
.toArray
.headOption // returns None if match fails
.map(_.group(1)) // select first regex group
// or simply
val result = pattern.findFirstMatchIn(string).map(_.group(1))
// result = Some(14058913)
// result will be None if the string does not match the pattern.
// if you have more than one group, for instance:
// val pattern = """\/(\d+),\d+,(\d+)""".r
// result will be Some(56)
Pattern matching is usually the easiest way to do it, but it requires a match on the full string, so you'll have to prefix and suffix your regex pattern with .*:
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """.*\/\d+,\d+,(\d+).*""".r
val pattern(x) = string
// x: String = 14058913
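An alternative to padding the pattern with `.*` is `Regex.unanchored`, which makes the extractor succeed on a partial match. A sketch, with `ArticleId` and `extract` as made-up names, that also returns an Option so a non-matching string does not throw a MatchError:

```scala
object ArticleId {
  // .unanchored lets the extractor match anywhere inside the string.
  private val Third = """/\d+,\d+,(\d+)""".r.unanchored

  // Some(third number) if the pattern is found, None otherwise.
  def extract(url: String): Option[String] = url match {
    case Third(id) => Some(id)
    case _         => None
  }
}
```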

how do I extract substring (group) using regex without knowing if regex matches?

I want to use this
val r = """^myprefix:(.*)""".r
val r(suffix) = line
println(suffix)
But it gives an error when the string doesn't match. How do I use a similar construct where matching is optional?
Edit: To make it clear, I need the group (.*)
You can extract match groups via pattern matching.
val r = """^myprefix:(.*)""".r
line match {
case r(group) => group
case _ => ""
}
Another way using Option:
Option(line) collect { case r(group) => group }
"""^myprefix:(.*)""".r // Regex
.findFirstMatchIn(line) // Option[Match]
.map(_ group 1) // Option[String]
This has the advantage that you can write it as a one-liner without needing to assign the regex to an intermediate value r.
In case you're wondering, group 0 is the matched string while group 1 etc are the capture groups.
try
r.findFirstIn(line)
UPD:
scala> val rgx = """^myprefix:(.*)""".r
rgx: scala.util.matching.Regex = ^myprefix:(.*)
scala> val line = "myprefix:value"
line: java.lang.String = myprefix:value
scala> for (rgx(group) <- rgx.findFirstIn(line)) yield group
res0: Option[String] = Some(value)
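To make the difference between the two calls explicit: findFirstIn gives back the whole matched text, while findFirstMatchIn gives a Match from which the group can be read. A small sketch (`PrefixSuffix`, `wholeMatch` and `suffix` are hypothetical names):

```scala
object PrefixSuffix {
  private val Prefixed = """^myprefix:(.*)""".r

  // Whole matched text (group 0), if the line matches.
  def wholeMatch(line: String): Option[String] =
    Prefixed.findFirstIn(line)

  // Only the captured suffix (group 1), if the line matches.
  def suffix(line: String): Option[String] =
    Prefixed.findFirstMatchIn(line).map(_.group(1))
}
```

`PrefixSuffix.suffix(line).getOrElse("")` then behaves like the pattern match with the `case _ => ""` fallback.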

Scala capture group using regex

Let's say I have this code:
val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)
I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:
val pattern = """one.*two(\d+)three""".r
val pattern(aMatch) = string
println(aMatch) // prints 483
Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?
Here's an example of how you can access group(1) of each match:
val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
m => println(m.group(1))
}
This prints "483" (as seen on ideone.com).
The lookaround option
Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:
val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)
The above also prints "483" (as seen on ideone.com).
References
regular-expressions.info/Lookarounds
val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r
string match {
case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
case _ => // no match
}
Starting Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:
"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"
Or even:
val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483
If you expect non-matching input, you can add a default case:
"one493deux483three" match {
case s"${x}two${y}three" => y
case _ => "no match"
}
// String = "no match"
You want to look at group(1), you're currently looking at group(0), which is "the entire matched string".
See this regex tutorial.
def extractFileNameFromHttpFilePathExpression(expr: String) = {
// define regex (the dot before the extension is escaped so it only matches a literal ".")
val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r
// findFirstMatchIn returns Option[Match]; Match has methods to access capture groups.
regex.findFirstMatchIn(expr) match {
case Some(i) => i.group(1)
case None => "regex_error"
}
}
extractFileNameFromHttpFilePathExpression(
"http4://testing.bbmkl.com/document/sth1234.zip")
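A variant of the same function that returns an Option instead of the "regex_error" sentinel string might look like this. It is only a sketch: `FileNames` and `fromPath` are made-up names, and the extension list is copied from the function above.

```scala
object FileNames {
  // Group 1 is the file name including its extension; the dot is escaped
  // and the extension alternatives sit in a non-capturing group.
  private val FileName = """http4.*/(\w+\.(?:xlsx|xls|zip))$""".r

  // Some(file name) when the path ends in a supported file, None otherwise.
  def fromPath(expr: String): Option[String] =
    FileName.findFirstMatchIn(expr).map(_.group(1))
}
```

The caller then decides what a missing match means, for example with `getOrElse` or a pattern match on Some/None.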