Split with a multicharacter regex pattern and keep delimiters - regex

I have the following string and a regex for splitting it:
val str = "this is #[loc] sparta"
val regex = "((?<=( #\\[\\w{3,100}\\] ))|(?=( #\\[\\w{3,100}\\] )))"
print(str.split(Regex(regex)))
//print - [this is, #[loc] , sparta]
This works fine. But during development I did not realize that a #[***] block does not contain only text (\w): it also contains "-" and digits (a UUID), so my real blocks look like this:
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62]"
and in this case the regex does not work.
How should I change the "\w{3,100}" part to meet the new requirements?
I tried changing it to match any character, "\.{3,100}", but that does not work.

To fix your issue, you may replace your regex with
val regex = """((?<=( #\[[^\]\[]{3,100}] ))|(?=( #\[[^\]\[]{3,100}] )))"""
The \w can be replaced with [^\]\[] that matches any char but [ and ].
Note the use of a raw string literal, """...""", that allows the use of a single backslash as a regex escape.
See the Kotlin online demo.
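As a cross-check of the character-class change, here is a minimal JavaScript sketch of the same lookbehind/lookahead split (it is not the Kotlin code from the answer). It assumes an engine with lookbehind support, such as a recent Node.js, and the names block, splitter, input and parts are only illustrative. The groups are left out on purpose, because JavaScript's split would add captured groups to the result.

```javascript
// Same idea as the Kotlin pattern: split at the zero-width positions
// just before and just after " #[...] " (surrounding spaces included).
const block = String.raw` #\[[^\]\[]{3,100}\] `;
const splitter = new RegExp(`(?<=${block})|(?=${block})`);

const input = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta";
const parts = input.split(splitter);

console.log(parts);
// three items: "this is", " #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] ", "sparta"
```

The delimiter keeps its surrounding spaces because they are part of the lookaround pattern, just as in the original regex.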
Alternatively, you may use the following method to split and keep delimiters:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
    val res = mutableListOf<String>() // Declare the mutable list
    var start = 0 // Define var for substring start pos
    rx.findAll(s).forEach { // Looking for matches
        val substr_before = s.substring(start, it.range.first) // Substring before match start
        if (substr_before.length > 0 || keep_empty) {
            res.add(substr_before) // Adding substring before match start
        }
        res.add(it.value) // Adding match
        start = it.range.last + 1 // Updating start pos of next substring before match
    }
    if (start != s.length) res.add(s.substring(start)) // Adding text after last match if any
    return res
}
Then, just use it like
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta"
val regex = """#\[[^\]\[]+]""".toRegex()
print(splitKeepDelims(str, regex))
// => [this is , #[loc_75acca83-a39b-4df1-8c3c-b690df00db62],  sparta]
See the Kotlin demo.
The #\[[^\]\[]+] pattern matches
# - a # char
\[ - a [ char
[^\]\[]+ - 1+ chars other than [ and ]
] - a ] char.
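For readers who want to try the split-and-keep-delimiters idea outside Kotlin, below is a rough JavaScript port of the function. It is only a sketch: the parameter name keepEmpty and the variable names are my own, and the regex must carry the g flag because String.prototype.matchAll requires a global regex.

```javascript
// Hypothetical JavaScript port of splitKeepDelims.
function splitKeepDelims(s, rx, keepEmpty = true) {
  const res = [];
  let start = 0; // Start of the next non-matching chunk
  for (const m of s.matchAll(rx)) {
    const before = s.slice(start, m.index); // Text between the previous match and this one
    if (before.length > 0 || keepEmpty) res.push(before);
    res.push(m[0]); // The delimiter itself
    start = m.index + m[0].length;
  }
  if (start !== s.length) res.push(s.slice(start)); // Trailing text, if any
  return res;
}

const str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta";
const pieces = splitKeepDelims(str, /#\[[^\]\[]+\]/g);
console.log(pieces);
// three items: "this is ", "#[loc_75acca83-a39b-4df1-8c3c-b690df00db62]", " sparta"
```

Unlike the lookaround version, the spaces here stay with the surrounding text, since the pattern no longer includes them.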

Related

Scala regex : capture between group

With the regex below I need "test" as the output, but it returns the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so it is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
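The two variants, capturing group versus lookbehind, can be compared side by side in a small JavaScript sketch. This is not the Scala code above; the variable names are illustrative, and the lookbehind variant assumes an engine that supports it (for example a recent Node.js).

```javascript
const s = "try {outer.test}";

// Variant 1: capturing group; the wanted text is group 1.
const captured = s.match(/\{outer\.([^{}]*)\}/);
const viaGroup = captured ? captured[1] : "";

// Variant 2: lookbehind; the whole match is the wanted text.
const looked = s.match(/(?<=\{outer\.)[^{}]*/);
const viaLookbehind = looked ? looked[0] : "";

console.log(viaGroup, viaLookbehind);
// both are "test"
```

The capturing variant also checks for the closing }, while the lookbehind variant stops at the first { or } without requiring one.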

Kotlin .split() with multiple regex

Input: """aaaabb\\\\\cc"""
Pattern: ["""aaa""", """\\""", """\"""]
Output: [aaa, abb, \\, \\, \, cc]
How can I split Input to Output using patterns in Pattern in Kotlin?
I found that Regex("(?<=cha)|(?=cha)") keeps the patterns in the result after splitting, so I tried looping over the patterns, but some of them, like '\' and '[', require an escaping backslash, so I'm not able to use a loop for splitting.
EDIT:
val temp = mutableListOf<String>()
for (e in Input.split(Regex("(?<=\\)|(?=\\)"))) temp.add(e)
This is what I've been doing, but it does not work for multiple regexes, and it adds an extra "" at the end of temp if Input ends with "\".
You may use the function I wrote for some previous question that splits by a pattern keeping all matched and non-matched substrings:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
    val res = mutableListOf<String>() // Declare the mutable list
    var start = 0 // Define var for substring start pos
    rx.findAll(s).forEach { // Looking for matches
        val substr_before = s.substring(start, it.range.first) // Substring before match start
        if (substr_before.length > 0 || keep_empty) {
            res.add(substr_before) // Adding substring before match start
        }
        res.add(it.value) // Adding match
        start = it.range.last + 1 // Updating start pos of next substring before match
    }
    if (start != s.length) res.add(s.substring(start)) // Adding text after last match if any
    return res
}
You just need to build a dynamic pattern from your Pattern list items by joining them with |, the alternation operator, while remembering to escape all the items:
val Pattern = listOf("aaa", """\\""", "\\") // Define the list of literal patterns
val rx = Pattern.map{Regex.escape(it)}.joinToString("|").toRegex() // Build a pattern, \Qaaa\E|\Q\\\E|\Q\\E
val text = """aaaabb\\\\\cc"""
println(splitKeepDelims(text, rx, false))
// => [aaa, abb, \\, \\, \, cc]
See the Kotlin demo.
Note that between \Q and \E, all chars in the pattern are considered literal chars, not special regex metacharacters.
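The escape-then-join step can also be sketched in JavaScript. Two things here are assumptions rather than part of the answer: escapeRegExp is a hypothetical helper (it escapes each metacharacter with a backslash instead of using \Q...\E, which JavaScript regexes do not support), and the split relies on JavaScript's behavior of including captured groups in the result, which replaces the splitKeepDelims function.

```javascript
// Hypothetical helper: escapes regex metacharacters so each item matches literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const patterns = ["aaa", "\\\\", "\\"]; // aaa, two backslashes, one backslash
// One capturing group around the alternation, so split keeps the delimiters.
const rx = new RegExp("(" + patterns.map(escapeRegExp).join("|") + ")");

const text = String.raw`aaaabb\\\\\cc`; // five backslashes between "bb" and "cc"
const out = text.split(rx).filter((part) => part !== ""); // drop empty chunks

console.log(out);
// six items: aaa, abb, two backslashes, two backslashes, one backslash, cc
```

The order of the list matters: alternatives are tried left to right, so the two-backslash item must come before the single backslash to be matched as one piece.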

Regex for characters in specific location in string

Using Notepad++, how can I replace the dashes marked by the carets? The dashes I want to replace occur at every 7th character in the string.
11.871-2-2.737-2.00334-2
      ^       ^       ^
123456781234567812345678
It's pretty simple since it's only dashes:
(\S*?)-
( : begin capture group
\S* : any number of non-space chars
? : lazily (as few as possible)
) : end capture group
- : a hyphen, matched outside the capture group
Demo 1
var str = `11.871-2-2.737-2.00334-2`;
var sub = `$1`;
var rgx = /(\S*?)-/g;
var res = str.replace(rgx, sub);
console.log(res);
"There is a dash (right above 1) that I would like to preserve. This seems to get rid of all the dashes in the string"
The question clearly shows that there isn't a dash at the "1 position", but such a dash is possible given the pattern (n7). I don't have time to break it down, but I can refer you to a proper definition of the metacharacter \b.
Demo 2
var str = `-11.871-2-2.737-2.00334-2`;
var sub = `$1$2`;
var rgx = /\b[-]{1}(\S*?)-(\S*?)\b/g;
var res = str.replace(rgx, sub);
console.log(res);
Search for ([0-9\.-]{6,6})-
Replace with: $1MY_SEPARATOR
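To see what this search-and-replace does to the sample string, here is a small JavaScript sketch. It runs the same regex (written as {6} instead of {6,6}) and compares it with an index-based check that assumes, as the ruler in the question suggests, that the dashes to replace sit in the 7th slot of every 8-character group. The separator "|" is a stand-in for MY_SEPARATOR.

```javascript
const str = "11.871-2-2.737-2.00334-2";
const sep = "|"; // Hypothetical separator standing in for MY_SEPARATOR

// Regex approach: six allowed chars are kept, the dash after them is replaced.
const viaRegex = str.replace(/([0-9.-]{6})-/g, "$1" + sep);

// Index-based cross-check: replace a dash only in the 7th slot of each 8-char group.
const viaIndex = Array.from(str, (ch, i) => (i % 8 === 6 && ch === "-" ? sep : ch)).join("");

console.log(viaRegex); // 11.871|2-2.737|2.00334|2
console.log(viaIndex); // 11.871|2-2.737|2.00334|2
```

The dash above the ruler's "1" (the 9th character) survives in both versions, which is what the comment in the thread asked for.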

Scala: concatenating a string in a regex pattern string causing issue

If I do this, it works fine:
val string = "somestring;userid=someidT;otherstuffs"
var pattern = """[;?&]userid=([^;&]+)?(;|&|$)""".r
val result = pattern.findFirstMatchIn(string).get;
But I get an error when I do this:
val string = "somestring;userid=someidT;otherstuffs"
val id_name = "userid"
var pattern = """[;?&]""" + id_name + """=([^;&]+)?(;|&|$)""".r
val result = pattern.findFirstMatchIn(string).get;
This is the error:
error: value findFirstMatchIn is not a member of String
You may use an interpolated string literal and use a bit simpler regex:
val string = "somestring;userid=someidT;otherstuffs"
val id_name = "userid"
var pattern = s"[;?&]${id_name}=([^;&]*)".r
val result = pattern.findFirstMatchIn(string).get.group(1)
println(result)
// => someidT
See the Scala demo.
The [;?&]$id_name=([^;&]*) pattern finds ;, ? or &, then userid (since ${id_name} is interpolated), then =, and then captures any 0+ chars other than ; and & into Group 1, which is returned.
NOTE: if you want to use a $ as an end of string anchor in the interpolated string literal use $$.
Also, remember to Regex.quote("pattern") if the variable may contain special regex operators like (, ), [, etc. See Scala: regex, escape string.
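The same "build the full pattern string first, then compile it" idea looks like this in JavaScript. This is a sketch under assumptions: escapeRegExp is a hypothetical helper playing the role that Regex.quote plays in Scala, and idName mirrors id_name from the question.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal value.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const input = "somestring;userid=someidT;otherstuffs";
const idName = "userid";

// Concatenate the whole pattern string first, then compile it once.
const pattern = new RegExp("[;?&]" + escapeRegExp(idName) + "=([^;&]*)");
const m = input.match(pattern);
const result = m ? m[1] : null; // null when the key is absent

console.log(result); // someidT
```

Escaping the variable is harmless for a plain name like userid and prevents surprises if the key ever contains characters such as ( or [.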
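Add parentheses around the concatenation so that the regex is made after the string has been constructed instead of the other way around. Without them, .r applies only to the last string literal, and the + then produces a plain String: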
var pattern = ("[;?&]" + id_name + "=([^;&]+)?(;|&|$)").r
// pattern: scala.util.matching.Regex = [;?&]userid=([^;&]+)?(;|&|$)
val result = pattern.findFirstMatchIn(string).get;
// result: scala.util.matching.Regex.Match = ;userid=someidT;

Regular expression that matches string equals to one in a group

E.g. I want to match a string that has the same word at the end as at the beginning, so that the following strings match:
aaa dsfj gjroo gnfsdj riier aaa
sdf foiqjf skdfjqei adf sdf sdjfei sdf
rew123 jefqeoi03945 jq984rjfa;p94 ajefoj384 rew123
This one could do the job:
/^(\w+\b).*\b\1$/
explanation:
/ : regex delimiter
^ : start of string
( : start capture group 1
\w+ : one or more word character
\b : word boundary
) : end of group 1
.* : any number of any char
\b : word boundary
\1 : group 1
$ : end of string
/ : regex delimiter
M42's answer is OK except for degenerate cases: it will not match a string with only one word. To accept those within one regexp, use:
/^(?:(\w+\b).*\b\1|\w+)$/
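A quick way to check the combined regex is to run it over the sample lines from the question plus a few edge cases. The sketch below does that in JavaScript; the last three samples ("single", "aaa bbb", "aaa xaaa") are my own additions, not from the question.

```javascript
const sameEdgeWords = /^(?:(\w+\b).*\b\1|\w+)$/;

const samples = [
  "aaa dsfj gjroo gnfsdj riier aaa",
  "sdf foiqjf skdfjqei adf sdf sdjfei sdf",
  "rew123 jefqeoi03945 jq984rjfa;p94 ajefoj384 rew123",
  "single",   // one word: accepted by the \w+ alternative
  "aaa bbb",  // different edge words
  "aaa xaaa", // last word merely ends with the first one
];

const verdicts = samples.map((line) => sameEdgeWords.test(line));
console.log(verdicts); // [ true, true, true, true, false, false ]
```

The "aaa xaaa" case is rejected because the \b before \1 requires the repeated word to start at a word boundary.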
Also, matching only the necessary part may be significantly faster on very large strings. Here are my solutions in JavaScript:
RegExp:
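function areEdgeWordsTheSame(str) {
    var m = str.match(/^(\w+)\b/); // First word, if the string starts with one
    if (!m) return false; // No leading word to compare
    return (new RegExp('\\b' + m[1] + '$')).test(str); // \b rejects a longer last word that only ends with the first
}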
String:
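function areEdgeWordsTheSame(str) {
    var idx = str.indexOf(' ');
    if (idx < 0) return true; // A single word trivially matches itself
    return str.endsWith(' ' + str.slice(0, idx)); // The last word must equal the first, not just end with it
}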
I don't think a regular expression is the right choice here. Why not split the line into an array and compare the first and the last item:
In C#:
string[] words = line.Split(' ');
return words.Length >= 2 && words[0] == words[words.Length - 1];