Replacing the 1st regex-match group instead of the 0th

I was expecting this code:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ""))
to result in this:
hello, world
Instead, it prints this:
hello world
I see that the replace function replaces the whole match (group 0). Is there a way to replace only the 1st group instead of the 0th one?

Add the comma back in the replacement string:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ","))
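Or, since Kotlin's Regex supports lookahead (on the JVM it is backed by java.util.regex), match only the whitespace that is followed by a comma, so the comma is not part of the match: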
val string = "hello , world"
val regex = Regex("""\s+(?=,)""")
println(string.replace(regex, ""))

You can retrieve the range of the first capturing group through the groups property of the MatchResult (a MatchGroupCollection) and then pass that range to String.removeRange:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
// groups[1] is the first capturing group; only the first match is handled,
// and the !! operators throw if nothing matched
val result = string.removeRange(regex.find(string)!!.groups[1]!!.range)
println(result) // hello, world
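The removeRange approach only touches the first match. A Scala sketch of the same idea applied to every match uses the replacer-function overload of Regex.replaceAllIn together with the group offsets m.start(n) and m.end(n). The helper name replaceGroup and its parameters are hypothetical, not part of any library:

```scala
import scala.util.matching.Regex

object ReplaceGroup {
  // Hypothetical helper: in every match of `regex`, replace only the text of
  // capturing group `n` with `replacement` and keep the rest of the match.
  def replaceGroup(text: String, regex: Regex, n: Int, replacement: String): String =
    regex.replaceAllIn(text, (m: Regex.Match) =>
      if (m.group(n) == null) {
        // group n did not take part in this match: leave the match unchanged
        Regex.quoteReplacement(m.matched)
      } else {
        val before = text.substring(m.start, m.start(n)) // match text before group n
        val after  = text.substring(m.end(n), m.end)     // match text after group n
        // quoteReplacement stops "$" and "\" from being read as group references
        Regex.quoteReplacement(before + replacement + after)
      }
    )

  def main(args: Array[String]): Unit =
    println(replaceGroup("hello , world", """(\s+)[,]""".r, 1, ""))
}
```

Called as replaceGroup("hello , world", """(\s+)[,]""".r, 1, ""), it should yield hello, world. Rebuilding each match from its pieces avoids repeating the non-group part of the pattern (here the comma) in the replacement string.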


How to extract nth URL from string using regex?

I want to extract the second URL using a regex; I can't use anything else. So far I have managed to write a regex that matches URLs, but it only returns the first URL.
fun main() {
var text = "hello world https://www.google.com hello world https://www.stackoverflow.com hello world https://www.test.com"
var regex = """((http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?)"""
println(performRegex(text, regex))
}
private fun performRegex(text: String?, regex: String?): String? {
val regexPattern = Regex("""$regex""")
return regexPattern.find(text.toString())?.value
}
Current Output: https://www.google.com
Expected Output: https://www.stackoverflow.com
You can use findAll and pick the second match:
private fun performRegex(text: String?, regex: String?): String? {
val regexPattern = Regex("""$regex""")
val matchList = regexPattern.findAll(text.toString()).map{it.value}.toList()
return if (matchList.size >= 2) matchList[1] else null
}
fun main(args: Array<String>) {
var text = "hello world https://www.google.com hello world https://www.stackoverflow.com hello world https://www.test.com"
var regex = """(?:https?|ftp)://\S+"""
println(performRegex(text, regex))
}
The regex is (?:https?|ftp)://\S+. It matches http://, https:// or ftp:// followed by one or more non-whitespace characters.
The val matchList = regexPattern.findAll(text.toString()).map{it.value}.toList() part finds all matches, maps each one to its matched text and collects the results into a list of strings.
The return if (matchList.size >= 2) matchList[1] else null part returns the second match if there are at least two matches, and null otherwise.
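For comparison, a Scala sketch of the same "n-th match" idea, assuming the simpler URL pattern from this answer; the object and method names are hypothetical. It reads matches one by one from the iterator returned by findAllIn instead of building a list of all matches first:

```scala
import scala.util.matching.Regex

object NthMatch {
  // Same simplified URL pattern as the Kotlin answer.
  val url: Regex = """(?:https?|ftp)://\S+""".r

  // Hypothetical helper: the n-th match (0-based), or None if there are
  // fewer than n + 1 matches. Matches are consumed from the iterator
  // rather than collected into a full list.
  def nthMatch(regex: Regex, text: String, n: Int): Option[String] =
    regex.findAllIn(text).drop(n).take(1).toList.headOption

  def main(args: Array[String]): Unit = {
    val text = "hello world https://www.google.com hello world https://www.stackoverflow.com hello world https://www.test.com"
    println(nthMatch(url, text, 1))
  }
}
```

With n = 1 this should give Some(https://www.stackoverflow.com); asking for a match that does not exist gives None instead of throwing.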

Scala: how to replace strings using the original matched values

Is there a way to replace some string in a text using the original matched values?
For instance, I would like to replace all the integers with decimals, as in the following example:
"hello 45 hello 4 bye" --> "hello 45.0 hello 4.0 bye"
I could match all the numbers with findAllIn and then replace them, but I would like to know if there is a better solution.
With regular expressions, you can use $1 in the replacement string to refer to the text of the first capturing group (the part in parentheses):
val regex = "(\\d+)".r
val text = "hello 45 hello 4 bye"
val result = regex.replaceAllIn(text, "$1.0")
// result: String = hello 45.0 hello 4.0 bye
Alternatively, use the overload of replaceAllIn that takes a replacer function (Regex.Match => String):
http://www.scala-lang.org/api/current/index.html#scala.util.matching.Regex#replaceAllIn(target:CharSequence,replacer:scala.util.matching.Regex.Match=>String):String
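A minimal sketch of that overload, assuming the same input as the question; the object name DecimalReplace is just for illustration. The string returned by the function is itself interpreted as a replacement string, so Regex.quoteReplacement is used to keep any $ or \ characters literal:

```scala
import scala.util.matching.Regex

object DecimalReplace {
  private val integer: Regex = """\d+""".r

  // Appends ".0" to every run of digits, using the replacer-function
  // overload of replaceAllIn (Regex.Match => String).
  def toDecimals(text: String): String =
    integer.replaceAllIn(text, (m: Regex.Match) => Regex.quoteReplacement(m.matched + ".0"))

  def main(args: Array[String]): Unit =
    println(toDecimals("hello 45 hello 4 bye"))
}
```

The function form is handy when the replacement depends on the matched value in ways a $1 template cannot express, such as parsing or formatting the number.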

return first instance of unmatched regex scala

Is there a way to return the first instance of an unmatched string between 2 strings with Scala's Regex library?
For example:
val a = "some text abc123 some more text"
val b = "some text xyz some more text"
a.firstUnmatched(b) = "abc123"
Regex is good for matching & replacing in strings based on patterns.
But to look for the differences between strings? Not exactly.
However, the collection method diff can be used to find differences:
object Main extends App {
val a = "some text abc123 some more text 321abc"
val b = "some text xyz some more text zyx"
val firstdiff = (a.split(" ") diff b.split(" "))(0)
println(firstdiff)
}
prints "abc123"
Do you want a regex after all? Then note that the splits can be replaced by regex matching.
The regex pattern in this example looks for words:
val reg = "\\w+".r
val firstdiff = (reg.findAllIn(a).toList diff reg.findAllIn(b).toList)(0)
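Putting the regex variant into a reusable form, here is a sketch with a hypothetical firstUnmatched helper that returns an Option instead of throwing when there is no difference. Note that diff is a multiset difference: each word in b cancels at most one occurrence of the same word in a:

```scala
object FirstDiff {
  private val word = """\w+""".r

  // Hypothetical helper: the first word of `a` left over after removing the
  // words of `b` (multiset difference), or None if nothing is left.
  def firstUnmatched(a: String, b: String): Option[String] =
    (word.findAllIn(a).toList diff word.findAllIn(b).toList).headOption

  def main(args: Array[String]): Unit =
    println(firstUnmatched("some text abc123 some more text", "some text xyz some more text"))
}
```

For the question's strings this should print Some(abc123); identical inputs give None.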

Selectively uppercasing a string

I have a string with some XML tags in it, like:
"hello <b>world</b> and <i>everyone</i>"
Is there a good Scala/functional way of uppercasing the words, but not the tags, so that it looks like:
"HELLO <b>WORLD</b> AND <i>EVERYONE</i>"
We can use dustmouse's regex with Regex.replaceAllIn to replace every word outside the XML tags. The matched text is available through Regex.Match.matched, which can then be uppercased with toUpperCase.
val xmlText = """(?<!<|<\/)\b\w+(?!>)""".r
val string = "hello <b>world</b> and <i>everyone</i>"
xmlText.replaceAllIn(string, _.matched.toUpperCase)
// String = HELLO <b>WORLD</b> AND <i>EVERYONE</i>
val string2 = "<h1>>hello</h1> <span>world</span> and <span><i>everyone</i>"
xmlText.replaceAllIn(string2, _.matched.toUpperCase)
// String = <h1>>HELLO</h1> <span>WORLD</span> AND <span><i>EVERYONE</i>
Using dustmouse's updated regex:
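import scala.util.matching.Regex
val xmlText = """(?:<[^<>]+>\s*)(\w+)""".r
val string3 = """<h1>>hello</h1> <span id="test">world</span>"""
// quoteReplacement keeps any "$" or "\" inside a tag from being read as a group reference
xmlText.replaceAllIn(string3, m =>
  Regex.quoteReplacement(m.group(0).dropRight(m.group(1).length) + m.group(1).toUpperCase))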
// String = <h1>>hello</h1> <span id="test">WORLD</span>
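A different technique, not taken from the answers here: instead of finding words, split the string into tags and text runs and uppercase only the text runs. This sketch assumes that < and > never appear inside attribute values or comments; the object and method names are hypothetical:

```scala
import scala.util.matching.Regex

object UpperOutsideTags {
  // A piece is either a tag ("<...>"), a run of text without "<",
  // or a lone "<" that never gets closed (so no character is dropped).
  private val tagOrText: Regex = """<[^>]*>|[^<]+|<""".r

  def upperOutsideTags(s: String): String =
    tagOrText.findAllIn(s).map { piece =>
      // tags (and a stray "<") are copied unchanged; text runs are uppercased
      if (piece.startsWith("<")) piece else piece.toUpperCase
    }.mkString

  def main(args: Array[String]): Unit =
    println(upperOutsideTags("hello <b>world</b> and <i>everyone</i>"))
}
```

Because tags are copied unchanged, attributes such as id="test" stay intact, and a leading word like hello (which is not preceded by a tag) is uppercased too.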
Okay, how about this? It just prints the matched words and takes into account some of the scenarios others brought up. I'm not sure how to uppercase the output without mercilessly poaching from Peter's answer:
val string = "<h1 id=\"test\">hello</h1> <span>world</span> and <span><i>everyone</i></span>"
val pattern = """(?:<[^<>]+>\s*)(\w+)""".r
pattern.findAllIn(string).matchData foreach {
m => println(m.group(1))
}
The main thing here is that it is extracting the correct capture group.
Working example: http://ideone.com/2qlwoP
I also need to give credit to the answer here for getting capture groups in Scala: Scala capture group using regex

Cannot retrieve a group from Scala Regex match

I am struggling with regexes in Scala (2.11.5). I have the following string to parse (example):
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
I want to extract the third numeric value in the string above (it needs to be the third one after a slash, because other groups can follow). To do that, I have the following regex pattern:
val pattern = """\/\d+,\d+,(\d+)""".r
I have been trying to retrieve the group for the third sequence of digits, but nothing seems to work for me.
val matchList = pattern.findAllMatchIn(string).foreach(println)
val matchListb = pattern.findAllIn(string).foreach(println)
I also tried pattern matching:
string match {
case pattern(a) => println(a)
case _ => "What's going on?"
}
and got the same results: either the whole match is returned or nothing.
Is there an easy way to retrieve a group from a regex pattern in Scala?
You can use the group method of scala.util.matching.Regex.Match to get the result.
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """\/\d+,\d+,(\d+)""".r
val result = pattern.findAllMatchIn(string) // returns an Iterator[Match]
  .toArray
  .headOption      // None if the match fails
  .map(_.group(1)) // select the first capturing group
// or, more simply:
val result2 = pattern.findFirstMatchIn(string).map(_.group(1))
// result2 = Some(14058913)
// The result is None if the string does not match the pattern.
// With more than one group, for instance:
// val pattern = """\/(\d+),\d+,(\d+)""".r
// group(1) gives Some(56) and group(2) gives Some(14058913)
Pattern matching is usually the easiest way to do it, but it requires a match on the full string, so you'll have to prefix and suffix your regex pattern with .*:
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """.*\/\d+,\d+,(\d+).*""".r
val pattern(x) = string
// x: String = 14058913
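Alternatively, Regex.unanchored returns a regex whose extractor also matches a substring, so the .* padding is not needed. A sketch, with hypothetical object and method names, that also avoids the MatchError that val pattern(x) = string throws when there is no match:

```scala
object UnanchoredDemo {
  // .unanchored makes the extractor search for the pattern anywhere in the input
  private val pattern = """/\d+,\d+,(\d+)""".r.unanchored

  def thirdNumber(s: String): Option[String] = s match {
    case pattern(x) => Some(x)
    case _          => None
  }

  def main(args: Array[String]): Unit =
    println(thirdNumber("http://sth.com/sth/56,57597,14058913,Article_title,,5.html"))
}
```

For the question's URL this should print Some(14058913).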