Selectively uppercasing a string - regex

I have a string with some XML tags in it, like:
"hello <b>world</b> and <i>everyone</i>"
Is there a good Scala/functional way of uppercasing the words, but not the tags, so that it looks like:
"HELLO <b>WORLD</b> AND <i>EVERYONE</i>"

We can use dustmouse's regex with Regex.replaceAllIn to replace every word that is not part of a tag. Regex.Match.matched gives the matched text, which can then be uppercased with toUpperCase.
val xmlText = """(?<!<|<\/)\b\w+(?!>)""".r
val string = "hello <b>world</b> and <i>everyone</i>"
xmlText.replaceAllIn(string, _.matched.toUpperCase)
// String = HELLO <b>WORLD</b> AND <i>EVERYONE</i>
val string2 = "<h1>>hello</h1> <span>world</span> and <span><i>everyone</i>"
xmlText.replaceAllIn(string2, _.matched.toUpperCase)
// String = <h1>>HELLO</h1> <span>WORLD</span> AND <span><i>EVERYONE</i>
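Using dustmouse's updated regex (it only uppercases a word that follows a tag, optionally after whitespace):
val xmlText = """(?:<[^<>]+>\s*)(\w+)""".r
val string3 = """<h1>>hello</h1> <span id="test">world</span>"""
// Keep the tag part of the match and uppercase only the word in group 1.
// quoteReplacement stops a '$' or '\' inside the tag from being read as a group reference.
xmlText.replaceAllIn(string3, m =>
  scala.util.matching.Regex.quoteReplacement(
    m.group(0).dropRight(m.group(1).length) + m.group(1).toUpperCase))
// String = <h1>>hello</h1> <span id="test">WORLD</span>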
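A different way to get the same result, offered as an alternative sketch rather than either answerer's method: match either a whole tag or a run of text between tags with one alternation, and uppercase only the runs that are not tags. It assumes well-formed tags with no '<' or '>' inside attribute values, and it would also uppercase entity names such as &amp;. The helper name upperOutsideTags is hypothetical.

```scala
import scala.util.matching.Regex

// Either a complete tag, or a run of characters up to the next '<'.
val tagOrText: Regex = """<[^<>]*>|[^<]+""".r

// Hypothetical helper: uppercase all text outside tags, leave tags unchanged.
def upperOutsideTags(s: String): String =
  tagOrText.replaceAllIn(s, m => {
    val piece = m.matched
    // Tags start with '<'; every other matched piece is plain text.
    val out = if (piece.startsWith("<")) piece else piece.toUpperCase
    // quoteReplacement stops '$' or '\' in the text from being read as group references.
    Regex.quoteReplacement(out)
  })
```

Unlike the updated regex above, this also uppercases text that does not directly follow a tag, so `<h1>>hello</h1>` becomes `<h1>>HELLO</h1>`.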

Okay, how about this. It just prints the results, and takes into consideration some of the scenarios brought up by others. Not sure how to capitalize the output without mercilessly poaching from Peter's answer:
val string = "<h1 id=\"test\">hello</h1> <span>world</span> and <span><i>everyone</i></span>"
val pattern = """(?:<[^<>]+>\s*)(\w+)""".r
pattern.findAllIn(string).matchData foreach {
m => println(m.group(1))
}
The main thing here is that it is extracting the correct capture group.
Working example: http://ideone.com/2qlwoP
Also need to give credit to the answer here for getting capture groups in Scala: Scala capture group using regex
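To uppercase the captured words in place instead of printing them, one option (a sketch, not taken from Peter's answer) is to rebuild the string piece by piece from each match's group-1 start and end offsets. Like the pattern itself, it only touches a word that follows a tag, so leading text such as hello in `hello <b>world</b>` stays lowercase. upperGroup1 is a hypothetical name.

```scala
import scala.util.matching.Regex

// A tag, optional whitespace, then the word to uppercase in group 1.
val pattern: Regex = """(?:<[^<>]+>\s*)(\w+)""".r

// Copy the text between group-1 spans unchanged and uppercase each span.
def upperGroup1(s: String): String = {
  val (out, last) = pattern.findAllMatchIn(s).foldLeft((new StringBuilder, 0)) {
    case ((sb, from), m) =>
      sb.append(s.substring(from, m.start(1)))   // tag text before the word
        .append(m.group(1).toUpperCase)          // the captured word
      (sb, m.end(1))                             // continue after the word
  }
  out.append(s.substring(last)).toString         // remainder after the last match
}
```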

Related

transform string scala in an elegant way

I have the following input string: val s = "19860803 000000"
I want to convert it to 1986/08/03
I tried s.split(" ").head, but this is not complete.
Is there an elegant way in Scala, using a regex, to get the expected result?
You can use a date-like pattern with 3 capture groups, and then match the following space and the 6 digits.
In the replacement, join the 3 groups with forward slashes. The ^ and $ anchors make the pattern cover the whole string, so the time part is dropped from the result.
val s = "19860803 000000"
val result = s.replaceAll("^(\\d{4})(\\d{2})(\\d{2})\\h\\d{6}$", "$1/$2/$3")
Output
result: String = 1986/08/03
I haven't tested this, but I think the following will work:
val expr = raw"(\d{4})(\d{2})(\d{2}) (.*)".r
val formatted = "19860803 000000" match {
  case expr(year, month, day, _) => s"$year/$month/$day"
  case other => other // no match: return the input unchanged instead of throwing a MatchError
}
The Scala docs have a lot of good info:
https://www.scala-lang.org/api/2.13.6/scala/util/matching/Regex.html
An alternative, without a regular expression, by using slice and take.
val s = "19860803 000000"
val year = s.take(4)
val month = s.slice(4,6)
val day = s.slice(6,8)
val result = s"$year/$month/$day"
Or as a one-liner:
val result = Seq(s.take(4), s.slice(4,6), s.slice(6,8)).mkString("/")
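If the input really is a timestamp, another non-regex option is to parse it with java.time and format it again, which fails with a DateTimeParseException on input that does not fit the pattern or has an out-of-range field such as month 13. This sketch assumes the second part is always a six-digit HHmmss time; reformat is a hypothetical helper name.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Assumed input shape: 8 date digits, a space, 6 time digits.
val inFormat  = DateTimeFormatter.ofPattern("yyyyMMdd HHmmss")
val outFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

// Parse (validating the fields), then print only the date part with slashes.
def reformat(s: String): String =
  LocalDateTime.parse(s, inFormat).format(outFormat)

val result = reformat("19860803 000000")
```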

Scala regex get string before the first hyphen and the entire string

Given a string like abab/docId/example-doc1-2019-01-01, I want to use Regex to extract these values:
firstPart = example
fullString = example-doc1-2019-01-01
I have this:
import scala.util.matching.Regex
case class Read(theString: String) {
val stringFormat: Regex = """.*\/docId\/([A-Za-z0-9]+)-([A-Za-z0-9-]+)$""".r
val stringFormat(firstPart, fullString) = theString
}
But this separates it like this:
firstPart = example
fullString = doc1-2019-01-01
Is there a way to retain the fullString and do a regex on that to get the part before the first hyphen? I know I can do this using the String split method but is there a way do it using regex?
You may use
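val stringFormat: Regex = ".*/docId/(([A-Za-z0-9]+)-[A-Za-z0-9-]+)$".r
Group 1 is the outer pair of parentheses and spans everything after /docId/ (example-doc1-2019-01-01); Group 2 is nested inside it and captures only the [A-Za-z0-9]+ run before the first hyphen (example).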
See the regex demo.
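Note how the capturing parentheses are rearranged. Groups are numbered by the position of their opening parenthesis, so the outer group (the full string) is now Group 1 and the nested one (the first part) is Group 2. That is why you also need to swap the variables in the regex match call (fullString must come before firstPart), as in the demo below.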
See Scala demo:
val theString = "abab/docId/example-doc1-2019-01-01"
val stringFormat = ".*/docId/(([A-Za-z0-9]+)-[A-Za-z0-9-]+)".r
val stringFormat(fullString, firstPart) = theString
println(s"firstPart: '$firstPart'\nfullString: '$fullString'")
Output:
firstPart: 'example'
fullString: 'example-doc1-2019-01-01'
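A `val stringFormat(fullString, firstPart) = theString` binding throws a MatchError when the input does not match. If you would rather get an Option back, a sketch using the same corrected regex follows; parts and DocId are hypothetical names. As with the binding, a Regex extractor used in a match must match the whole string.

```scala
import scala.util.matching.Regex

// Group 1: everything after "/docId/"; group 2 (nested): the part before the first hyphen.
val DocId: Regex = """.*/docId/(([A-Za-z0-9]+)-[A-Za-z0-9-]+)""".r

// Returns (firstPart, fullString), or None instead of throwing on a mismatch.
def parts(s: String): Option[(String, String)] = s match {
  case DocId(fullString, firstPart) => Some((firstPart, fullString))
  case _                            => None
}
```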

replaceAllMapped matches with span elements

I want to replace every string matched by a RegExp with the same string wrapped in a span element. Is this possible?
I want to do something like that:
final text = message.replaceAllMapped(exp, (match) => '<span>exp, (match)</span>');
You may use String#replaceAllMapped like this:
final exp = new RegExp(r'\d+(?:\.\d+)?');
String message = 'test 40.40 test 20.20';
final text = message.replaceAllMapped(exp,
(Match m) => "<span>${m[0]}</span>");
print(text);
Output: test <span>40.40</span> test <span>20.20</span>
Here, m is the Match object that the regex engine finds and passes to the arrow function; m[0] is the whole match (group 0), and it is inserted between <span> and </span> inside an interpolated double-quoted string literal.
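For readers working in Scala rather than Dart, the counterpart of replaceAllMapped is Regex.replaceAllIn with a Match => String function; a plain replacement string using $0 (the whole match) works as well. A sketch with the same pattern and message:

```scala
import scala.util.matching.Regex

val exp: Regex = """\d+(?:\.\d+)?""".r
val message = "test 40.40 test 20.20"

// Function form: build each replacement from the Match, quoting it so '$' and '\' stay literal.
val text1 = exp.replaceAllIn(message, m => Regex.quoteReplacement(s"<span>${m.matched}</span>"))

// String form: $0 refers to the whole match (group 0).
val text2 = exp.replaceAllIn(message, "<span>$0</span>")
```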

Replacing the 1st regex-match group instead of the 0th

I was expecting this
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ""))
to result in this:
hello, world
Instead, it prints this:
hello world
I see that the replace function cares about the whole match. Is there a way to replace only the 1st group instead of the 0th one?
Add the comma in the replacement:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ","))
Or, since Kotlin's Regex supports lookahead, match only the whitespace that is followed by a comma, without consuming the comma:
val string = "hello , world"
val regex = Regex("""\s+(?=,)""")
println(string.replace(regex, ""))
You can get the range of the first group from the groups property of the MatchResult returned by find (a MatchGroupCollection), and then pass that range to String.removeRange. Note that this only removes the group from the first match:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
val result = string.removeRange(regex.find(string)!!.groups[1]!!.range)
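For readers working in Scala, a similar effect can be had for every match (not only the first, as with find) by rebuilding each match around the group's offsets inside Regex.replaceAllIn. replaceGroup is a hypothetical helper, and it assumes group g takes part in every match (otherwise start(g) is -1).

```scala
import scala.util.matching.Regex

// Hypothetical helper: in every match, replace only the text of group g,
// keeping the rest of the match (e.g. the comma) intact.
def replaceGroup(regex: Regex, s: String, g: Int, replacement: String): String =
  regex.replaceAllIn(s, m => {
    val from = m.start(g) - m.start // group offsets relative to the match
    val to   = m.end(g) - m.start
    Regex.quoteReplacement(m.matched.substring(0, from) + replacement + m.matched.substring(to))
  })
```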

return first instance of unmatched regex scala

Is there a way to return the first instance of an unmatched string between 2 strings with Scala's Regex library?
For example:
val a = "some text abc123 some more text"
val b = "some text xyz some more text"
a.firstUnmatched(b) = "abc123"
Regex is good for matching & replacing in strings based on patterns.
But to look for the differences between strings? Not exactly.
However, the collection method diff can be used to find differences:
object Main extends App {
val a = "some text abc123 some more text 321abc"
val b = "some text xyz some more text zyx"
val firstdiff = (a.split(" ") diff b.split(" "))(0)
println(firstdiff)
}
prints "abc123"
If a regex is desired after all, the splits can be replaced by regex matching.
The regex pattern in this example looks for words:
val reg = "\\w+".r
val firstdiff = (reg.findAllIn(a).toList diff reg.findAllIn(b).toList)(0)
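Note that diff is a multiset difference: it removes one occurrence of each element of the second list wherever it appears, so word order is ignored ("x y" against "y x" gives nothing). If "first unmatched" should mean the first position where the words differ, a zip-based sketch follows; firstUnmatched here is a hypothetical function, not a library method.

```scala
// Compare the words of a and b position by position and return the first
// word of a that differs from the word at the same position in b.
def firstUnmatched(a: String, b: String): Option[String] = {
  val word = "\\w+".r
  word.findAllIn(a).toList.zip(word.findAllIn(b).toList).collectFirst {
    case (x, y) if x != y => x
  }
}
```

zip stops at the shorter of the two word lists, so extra trailing words in a are not reported.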