Regexp help - is this possible at all with regexp?

I'm still struggling with regexp, wondering if this is at all possible.
I need to parse variable names from an expression, but skip the ones inside string literals and the ones that follow a dot.
So for an expression like:
'test' + (n + text.length)
I would like to get only n and text.
I'm using /([a-z_][a-z0-9_]*)/gi
but it gives me test,n,text,length
Thanks for help:)

If your input is not too complicated, here is a possible regex option:
var re = /'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|(?:^|[^.])\b(\w+)/g;
var str = '\'test\\\' this\' + "Missing \\\"here\\\"" + (n + text.length)';
document.body.innerHTML = "Testing string: <b>" + str + "</b><br/>";
var res = [], m; // m holds the current match; declared here to avoid an implicit global
while ((m = re.exec(str)) !== null) {
if (m[1]) { res.push(m[1]); }
}
document.body.innerHTML += JSON.stringify(res, 0, 4);
The regex details:
'[^'\\]*(?:\\.[^'\\]*)*' - single quoted string literals (supporting escaped sequences)
| - or
"[^"\\]*(?:\\.[^"\\]*)*" - double quoted string literals (supporting escaped sequences)
| - or
(?:^|[^.])\b(\w+) - 1+ word characters that are either right at the string start or after a non-dot and preceded with a word boundary (placed inside Group 1)
See the regex demo.
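For use outside the browser, the same pattern can be wrapped in a small function. This is only a sketch: the name extractVariables is made up for illustration, and the regex is the one from the answer above, so it shares its limits (for instance, \w+ also picks up bare numbers).

```javascript
// Hypothetical helper; the name is an assumption, not part of the answer.
function extractVariables(expr) {
  // Alternatives 1 and 2 consume whole string literals without capturing,
  // so only identifiers outside of strings ever land in Group 1.
  const re = /'[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|(?:^|[^.])\b(\w+)/g;
  const names = [];
  for (const m of expr.matchAll(re)) {
    if (m[1]) names.push(m[1]); // skip matches that were string literals
  }
  return names;
}

console.log(extractVariables("'test' + (n + text.length)")); // [ 'n', 'text' ]
```

String.prototype.matchAll needs a reasonably recent engine; the exec loop from the answer does the same job on older ones.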

Related

How to do a camel case to sentence case in dart

Something is wrong with my attempt:
String camelToSentence(String text) {
var result = text.replaceAll(RegExp(r'/([A-Z])/g'), r" $1");
var finalResult = result[0].toUpperCase() + result.substring(1);
return finalResult;
}
void main(){
print(camelToSentence("camelToSentence"));
}
It just prints "CamelToSentence" instead of "Camel To Sentence".
Looks like the problem is here r" $1"; but I don't know why.
You can use
String camelToSentence(String text) {
return text.replaceAllMapped(RegExp(r'^([a-z])|[A-Z]'),
(Match m) => m[1] == null ? " ${m[0]}" : m[1].toUpperCase());
}
Here,
^([a-z])|[A-Z] - matches and captures into Group 1 a lowercase ASCII letter at the start of string, or just matches an uppercase letter anywhere in the string
(Match m) => m[1] == null ? " ${m[0]}" : m[1].toUpperCase() returns as the replacement the uppercased Group 1 value (if Group 1 matched), or a space plus the matched value otherwise.
You should not use the / and /g in the pattern.
About the replaceAll method:
Notice that the replace string is not interpreted. If the replacement
depends on the match (for example on a RegExp's capture groups), use
the replaceAllMapped method instead.
As the pattern does not match, replaceAll leaves the text unchanged: result[0] returns c and result.substring(1) contains amelToSentence, so you are concatenating an uppercased c with amelToSentence, giving CamelToSentence.
You can also use lookarounds
(?<!^)(?=[A-Z])
(?<!^) Assert not the start of the string
(?=[A-Z]) Assert an uppercase char A-Z to the right
Dart demo
For example
String camelToSentence(String text) {
var result = text.replaceAll(RegExp(r'(?<!^)(?=[A-Z])'), r" ");
var finalResult = result[0].toUpperCase() + result.substring(1);
return finalResult;
}
void main() {
print(camelToSentence("camelToSentence"));
}
Output
Camel To Sentence
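The lookaround approach is not specific to Dart. As a rough illustration, here is the same idea in JavaScript, assuming an engine that supports lookbehind; the function simply mirrors the Dart one above and is not taken from either answer.

```javascript
// Sketch: insert a space before every uppercase letter except at the start.
function camelToSentence(text) {
  if (text.length === 0) return text; // guard: nothing to capitalize
  // (?<!^) - not at the start of the string
  // (?=[A-Z]) - an uppercase ASCII letter follows
  const spaced = text.replace(/(?<!^)(?=[A-Z])/g, " ");
  return spaced[0].toUpperCase() + spaced.slice(1);
}

console.log(camelToSentence("camelToSentence")); // Camel To Sentence
```

Because the pattern matches a zero-width position rather than a character, the replacement string needs no reference to the match, which sidesteps the original " $1" problem.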

Kotlin .split() with multiple regex

Input: """aaaabb\\\\\cc"""
Pattern: ["""aaa""", """\\""", """\"""]
Output: [aaa, abb, \\, \\, \, cc]
How can I split Input to Output using patterns in Pattern in Kotlin?
I found that Regex("(?<=cha)|(?=cha)") keeps the delimiters in the result after splitting, so I tried looping over the patterns, but some of them, like '\' and '[', need to be escaped with a backslash, so I can't simply loop over them to split.
EDIT:
val temp = mutableListOf<String>()
for (e in Input.split(Regex("(?<=\\)|(?=\\)"))) temp.add(e)
This is what I've been doing, but it does not work for multiple patterns, and it adds an extra "" at the end of temp if Input ends with "\".
You may use the function I wrote for some previous question that splits by a pattern keeping all matched and non-matched substrings:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
var res = mutableListOf<String>() // Declare the mutable list var
var start = 0 // Define var for substring start pos
rx.findAll(s).forEach { // Looking for matches
val substr_before = s.substring(start, it.range.first()) // Substring before match start
if (substr_before.length > 0 || keep_empty) {
res.add(substr_before) // Adding substring before match start
}
res.add(it.value) // Adding match
start = it.range.last()+1 // Updating start pos of next substring before match
}
if ( start != s.length ) res.add(s.substring(start)) // Adding text after last match if any
return res
}
You just need to build a dynamic pattern from your Pattern list items by joining them with |, the alternation operator, while remembering to escape all the items:
val Pattern = listOf("aaa", """\\""", "\\") // Define the list of literal patterns
val rx = Pattern.map{Regex.escape(it)}.joinToString("|").toRegex() // Build a pattern, \Qaaa\E|\Q\\\E|\Q\\E
val text = """aaaabb\\\\\cc"""
println(splitKeepDelims(text, rx, false))
// => [aaa, abb, \\, \\, \, cc]
See the Kotlin demo
Note that between \Q and \E, all chars in the pattern are considered literal chars, not special regex metacharacters.
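The same split-and-keep idea can be sketched in JavaScript. JavaScript regexes have no \Q...\E quoting, so this version assumes a small escapeRegExp helper that backslash-escapes each metacharacter instead; both function names are illustrative, not a library API. As in the Kotlin version, the longer literal (two backslashes) must come before the shorter one in the alternation so that it is tried first.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split s by rx (must have the g flag), keeping matched and non-matched parts.
function splitKeepDelims(s, rx, keepEmpty = true) {
  const res = [];
  let start = 0; // start of the next non-matched substring
  for (const m of s.matchAll(rx)) {
    const before = s.slice(start, m.index);
    if (before.length > 0 || keepEmpty) res.push(before);
    res.push(m[0]); // the delimiter itself
    start = m.index + m[0].length;
  }
  if (start !== s.length) res.push(s.slice(start)); // text after the last match
  return res;
}

const literals = ["aaa", "\\\\", "\\"]; // aaa, two backslashes, one backslash
const rx = new RegExp(literals.map(escapeRegExp).join("|"), "g");
const text = "aaaabb" + "\\".repeat(5) + "cc"; // aaaabb, five backslashes, cc
console.log(splitKeepDelims(text, rx, false));
```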

Split with a multicharacter regex pattern and keep delimiters

I have next string and regex for splitting it:
val str = "this is #[loc] sparta"
val regex = "((?<=( #\\[\\w{3,100}\\] ))|(?=( #\\[\\w{3,100}\\] )))"
print(str.split(Regex(regex)))
//print - [this is, #[loc] , sparta]
Works fine. But during development I did not realize that a #[***] block may contain more than word characters (\w): it can also contain "-" and numbers (a UUID). A correct block looks like this:
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62]"
In this case the regex doesn't work.
How do I change the "\w{3,100}" part for the new requirements?
I tried changing it to match any character, "\.{3,100}", but that did not work.
To fix your issue, you may replace your regex with
val regex = """((?<=( #\[[^\]\[]{3,100}] ))|(?=( #\[[^\]\[]{3,100}] )))"""
The \w can be replaced with [^\]\[] that matches any char but [ and ].
Note the use of a raw string literal, """...""", that allows the use of a single backslash as a regex escape.
See the Kotlin online demo.
Alternatively, you may use the following method to split and keep delimiters:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
var res = mutableListOf<String>() // Declare the mutable list var
var start = 0 // Define var for substring start pos
rx.findAll(s).forEach { // Looking for matches
val substr_before = s.substring(start, it.range.first()) // Substring before match start
if (substr_before.length > 0 || keep_empty) {
res.add(substr_before) // Adding substring before match start
}
res.add(it.value) // Adding match
start = it.range.last()+1 // Updating start pos of next substring before match
}
if ( start != s.length ) res.add(s.substring(start)) // Adding text after last match if any
return res
}
Then, just use it like
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta"
val regex = """#\[[^\]\[]+]""".toRegex()
print(splitKeepDelims(str, regex))
// => [this is , #[loc_75acca83-a39b-4df1-8c3c-b690df00db62],  sparta]
See the Kotlin demo.
The #\[[^\]\[]+] pattern matches
# - a # char
\[ - a [ char
[^\]\[]+ - 1+ chars other than [ and ]
] - a ] char.
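For comparison, some regex flavors can keep delimiters without a helper function. The sketch below uses JavaScript, where split() includes the text of any capturing group in its result; this is a different technique from the Kotlin function above, shown only to illustrate the same pattern at work.

```javascript
const str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta";

// Wrapping the whole delimiter in a capturing group makes split() keep it.
const parts = str.split(/(#\[[^\]\[]+\])/);

console.log(parts);
// [ 'this is ', '#[loc_75acca83-a39b-4df1-8c3c-b690df00db62]', ' sparta' ]
```

If the input starts or ends with a delimiter, split() produces an empty string at that end, so a filter such as parts.filter(p => p.length > 0) may be needed.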

Can this be done with regex?

I have a string with different length sub-strings split by symbol '_' and some sub-strings have to be split in multiple sub-sub-strings...
Example:
"_foo-2_bar-12_un[3;1]iver[3]se[3-7]"
should be split in groups like this:
"foo-2", "2", "bar-12", "12", "un[3;1]", "3;1", "iv", "er[3]", "3", "se[3-7]", "3-7"
I've come up with something like this:
/(?:((?:(?:\[([a-z0-9;-]+)\])|(?<=_)(?:[a-z0-9]+)|-([0-9]+))+))/ig
The problem I encounter is with the last part. And after finicking around I started to think whether or not this is even possible. Is it?
Any kind of a guidance is appreciated.
You can use the following regex:
/[^\W_]+(?:\[([^\][]*)]|-([^_]+))/g
See the regex demo
The pattern matches any 1+ char alphanumeric sequence ([^\W_]+) followed either with [...] substrings having no [ and ] inside (with \[([^\][]*)] - note it captures what is inside [...] into Group 1) OR a hyphen followed with 1+ characters other than _ (and this part after - is captured into Group 2).
var re = /[^\W_]+(?:\[([^\][]*)]|-([^_]+))/g;
var str = '_foo-2_bar-12_un[3;1]iver[3]se[3-7]';
var res = [], m; // m holds the current match; declared here to avoid an implicit global
while ((m = re.exec(str)) !== null) {
res.push(m[0]);
if (m[1]) {
res.push(m[1]);
} else {
res.push(m[2]);
}
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
In this code, the match object is analyzed at each iteration: the 0th group (the whole match) is added to the final array, and then if Group 1 matched, Group 1 is added; otherwise, Group 2 is added to the resulting array.

String Replacing in Regex

I am trying to replace text in a string using regex. I accomplished it in C# using the same pattern, but in Swift it is not working as needed.
Here is my code:
var pattern = "\\d(\\()*[x]"
let oldString = "2x + 3 + x2 +2(x)"
let newString = oldString.stringByReplacingOccurrencesOfString(pattern, withString:"*" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
print(newString)
What I want after replacement is :
"2*x + 3 + x2 +2*(x)"
What I am getting is :
"* + 3 + x2 +*)"
Try this:
(?<=\d)(?=x)|(?<=\d)(?=\()
This pattern does not match any characters in the given string; it matches zero-width positions between characters.
For example, (?<=\d)(?=x) matches the position between a digit and 'x'.
(?<= is a lookbehind assertion and (?= is a lookahead assertion.
(?<=\d)(?=\() This matches the position between a digit and '('
So the pattern before escaping:
(?<=\d)(?=x)|(?<=\d)(?=\()
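Pattern as it should appear inside a Swift string literal. Only the backslashes are doubled; the parentheses must stay unescaped, because \( would start string interpolation in Swift:
(?<=\\d)(?=x)|(?<=\\d)(?=\\()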
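To see the zero-width replacement in action, here is a small sketch in JavaScript (chosen only for illustration; the question itself is about Swift). The function name is made up, and the two alternatives are merged into one character class, [x(], which is equivalent to the alternation above.

```javascript
// Hypothetical helper: insert "*" between a digit and a following "x" or "(".
function insertMultiplication(expr) {
  // (?<=\d) - a digit is immediately to the left
  // (?=[x(]) - an "x" or "(" is immediately to the right
  return expr.replace(/(?<=\d)(?=[x(])/g, "*");
}

console.log(insertMultiplication("2x + 3 + x2 +2(x)")); // 2*x + 3 + x2 +2*(x)
```

Because nothing is consumed, the digit and the x or ( survive the replacement, which is what went wrong with the original pattern.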