Regular expressions string splits - regex

I'm trying to create a pattern that enables me split a string on comas but ignoring expressions within curly brackets.
my existing code works great if only one group of curly bracket expressions exist in the string.
Dim expression As New Regex(",(?=(?:[^\{]*\{[^\{]*\})*(?![^\}]*\}))")
Try
parts = expression.Split(sortString)
For Each Item In parts
If Not Item Is Nothing Then
result.Add(Item)
End If
Next
Return result
If I pass the string
{IIF(Hemo.Site = "LV",1,IIF(Hemo.Site = "SVC",2,IIF(Hemo_Pressures.Site = "AO",3,4)))},Site DESC,Pressure1 ASC
It works, the curly bracket grouping is ignored and each string after is broken out with the coma split.
Problem is. If I need to accommodate multiple groupings of curly bracket expressions in my string and it begins to fail.
This fails:
{IIF(Hemo.Site = "LV",1,IIF(Hemo.Site = "SVC",2,IIF(Hemo_Pressures.Site = "AO",3,4)))},Site DESC,{IIF(Hemo.Site = "LV",1,IIF(Hemo.Site = "SVC",2,IIF(Hemo.Site = "AO",3,4)))}, Pressure1 ASC
one of the grouping is ignored as it should be, but the other grouping of curly brackets is not. Resulting in a dirty collection.
I would appreciate a second pair of eyes on this.

The pattern ,(?![^{]*\}) seems to be working for me, as long as there are no lose { or } floating around in the text.
Broken down, it evaluates as
,
zero-width negative lookahead
Any character not in "{"
* (zero or more times)
}
End Capture
You can use *,(?![^{]*\}) * to trim the spaces immediately before and after commas if required.
On your test string, it produces the follow splits:
{IIF(Hemo.Site = "LV",1,IIF(Hemo.Site =
"SVC",2,IIF(Hemo_Pressures.Site = "AO",3,4)))}
Site DESC
{IIF(Hemo.Site = "LV",1,IIF(Hemo.Site = "SVC",2,IIF(Hemo.Site =
"AO",3,4)))}
Pressure1 ASC
However, it will fail to properly split strings like
Apple, Ban}ana, Carrot {1,2,3}, Frog, Cat, 1,2,3
due to the lose } in "banana"

Related

Capturing a delimiter that isn't in between single quotes

Like the question says, is it possible to use a single Regex string to get a delimiter that isn't in between some quotes?
For example, I want to split this string with the delimiter &:
"example=3&testing='f&tmp'"
should produce
["example=3", "testing='f&tmp'"]
Essentially, things inside single quotes (' ') should remain untouched.
I found out how to get things within quotes with expression: (?:'.*?')
The closest I could get to a tangible solution was: (.[^']&[^'])
It is not an easy task for a String#split, but is quite a feasible task for Matcher#find if you use
[^&\s=]+=(?:'[^']*'|[^\s&]*)
(see this regex demo) and this Java code:
String text = "example=3&testing='f&tmp'";
Pattern p = Pattern.compile("[^&\\s=]+=(?:'[^']*'|[^\\s&]*)");
Matcher m = p.matcher(text);
List<String> res = new ArrayList<>();
while(m.find()) {
res.add(m.group());
}
System.out.println(res);
// => [example=3, testing='f&tmp']
Details
[^&\s=]+ - one or more chars other than &, = and whitespace
= - a = char
(?:'[^']*'|[^\s&]*) - a non-capturing group matching either ', zero or more chars other than ' and then a ', or zero or more chars other than whitespace and &.

Regular expresion with a specific character and without another

I'm trying to implement the escape character functionality in a macro generator I'm writing in Dart. For example, I would like the program to grab all the occurrences of '&param' in my string and replace it with 'John', unless the '&' character is preceded with the escape character '\'. Example: "My name is &param and my parameter is called \&param." -> "My name is John and my parameter is called &param". What would be the regular expression to catch all the substrings that contain the '&', then my parameter's name, and without the preceding '\'?
It's possible to match that, even avoiding escapes of backslashes, as:
var re = RegExp(r"(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+");
This uses negative lookbehind to find a & followed by word-characters, and not preceded by an odd number of backslashes.
More likely, you want to also recognize double-backslashes and convert them to single-backslashes. That's actually easier if you try to find all matches, because then you know all preceding double-backslashes are part of an earlier match:
var re = RegExp(r"\\\\|(?<!\\)&\w+");
This, when used as re.allMatches will find all occurrences of \\ and &word where the latter is not preceded by an odd number of backslashes.
var _re = RegExp(r"\\\\|(?<!\\)&(\w+)");
String template(String input, Map<String, String> values) {
return input.replaceAllMapped(_re, (m) {
var match = m[0]!;
if (match == r"\\") return r"\";
var replacement = values[m[1]!];
if (replacement != null) return replacement;
// do nothing for undefined words.
return match;
});
}
(You might also want to allow something like &{foo} if parameters can occur next to other characters, like &{amount)USD).
To keep the character before &param when it matches a non-backslash character you need to use so called capturing groups. These are are subexpressions of a regular expression inside parentheses. To use capturing groups in Dard you need to use the method replaceAllMapped. We also have the case when the template starts with &param and in this case we match at the beginning of the string instead.
Try this:
void main() {
final template = 'My name is &param and my parameter is called \\&param.';
final populatedTemplate = template.replaceAllMapped(RegExp(r'(^|[^\\])&param\b'), (match) {
return '${match.group(1)}John';
});
final result = populatedTemplate.replaceAll(RegExp(r'\\&param\b'), 'John');
print(result);
}

Regex - change commas only in a portion of a string

I make a lot of changes on a original csv string. there is a lot of comma delimiter. I have to replace by a ";" either only the commas inside the expression || ....|| or only the commas outside this expression. i need to do this change in order to have different delimiter in the expression ||....|| compare to the rest of the string.
Example:
(.*)(?:\|\|)(?:.*)(,)(?:.*)\|\|
After I use
var regex = /myregex/g;
var str = str.replace(regex, ',')
thanks
You can use
const string = "aba,bjlj,alj,ljlj||name1,name2,name3||jflkj,glfgjlf,jflg,fjlfd||name1,name2||fd,sdfsfd,dfs||name1,name2,name3,name4,name5||";
console.log( string.replace(/\|{2}[\w\W]*?\|{2}/g, (x) => x.replace(/,/g, ';')) );
The regex is
/\|{2}.*?\|{2}/gs // matches any text between two double pipes
/\|{2}[\w\W]*?\|{2}/g // matches any text between two double pipes
/\|{2}.*?\|{2}/g // matches any text but line breaks between two double pipes
Note the . does not match line breaks without the s modifier flag.
The regex matches double pipe, then any zero or more chars, as few as possible up to the next double pipe.
Then, x, the whole match value, is passed as an argument to the anonymous callback function used as a replacement argument, and all commas are replaced with ; only inside the matches.
The "contrary" solution is to match and capture the strings between double pipes and only match commas in all other contexts so that you could keep the captures and replace those commas:
const string = "aba,bjlj,alj,ljlj||name1,name2,name3||jflkj,glfgjlf,jflg,fjlfd||name1,name2||fd,sdfsfd,dfs||name1,name2,name3,name4,name5||";
console.log( string.replace(/(\|{2}[\w\W]*?\|{2})|,/g, (x,y) => y || ';') );
Big Thanks.
I also find
var newStr = str.replace(/\|{2}.*?\|{2}/g, function(match) {
return match.replace(/,/g,";");
});
Do you think is it possible to do the contrary and change all the comma outside the occurence ||...|| ?

DART Conditional find and replace using Regex

I have a string that sometimes contains a certain substring at the end and sometimes does not. When the string is present I want to update its value. When it is absent I want to add it at the end of the existing string.
For example:
int _newCount = 7;
_myString = 'The count is: COUNT=1;'
_myString2 = 'The count is: '
_rRuleString.replaceAllMapped(RegExp('COUNT=(.*?)\;'), (match) {
//if there is a match (like in _myString) update the count to value of _newCount
//if there is no match (like in _myString2) add COUNT=1; to the string
}
I have tried using a return of:
return "${match.group(1).isEmpty ? _myString + ;COUNT=1;' : 'COUNT=$_newCount;'}";
But it is not working.
Note that replaceAllMatched will only perform a replacement if there is a match, else, there will be no replacement (insertion is still a replacement of an empty string with some string).
Your expected matches are always at the end of the string, and you may leverage this in your current code. You need a regex that optionally matches COUNT= and then some text up to the first ; including the char and then checks if the current position is the end of string.
Then, just follow the logic: if Group 1 is matched, set the new count value, else, add the COUNT=1; string:
The regex is
(COUNT=[^;]*;)?$
See the regex demo.
Details
(COUNT=[^;]*;)? - an optional group 1: COUNT=, any 0 or more chars other than ; and then a ;
$ - end of string.
Dart code:
_myString.replaceFirstMapped(RegExp(r'(COUNT=[^;]*;)?$'), (match) {
return match.group(0).isEmpty ? "COUNT=1;" : "COUNT=$_newCount;" ; }
)
Note the use of replaceFirstMatched, you need to replace only the first match.

Matlab: using regexp to get a string that has a whitespace in between

I want to use Regex to acquire some ID's in a cellstring array, the array looks like this:
myString = '(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])';
My pattern for regex is as follows:
pattern = '[A-Za-z0-9.^_]+';
newArr = regexp(myString, pattern,'match');
I'd like to get the ID called 'HON WI', but with my current pattern, its splitting it into two because my pattern can't deal with the whitespace properly. I would like to get the whole "HON WI", as well as my other strings, everything that's in '', these might have special characters like ^, . or _, but I don't know how to add the whitespace.
I already tried stuff like this, without success:
pattern = '[A-Za-z0-9.^_\s]+';
My new array should have, in each cell, the strings/ID's contained in myString (US04650Y1001, US90274P3029, HON WI and US41165F1012) with dimensions 1x4.
Another approach that seems to work but not entirely sure:
myString = strrep(myString,'([','');
myString = strrep(myString,'])','');
myString = regexp(myString,',','split');
myString = strrep(myString,'''','');
This seems to get me what I want, but I would like to know how can I alter the regex on my first approach.
Many thanks in advance.
You may use a mere '([^']+)' regex and use 'tokens' to get the captures:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
pattern = '''([^'']+)''';
newArr = regexp(myString, pattern,'match', 'tokens');
The newArr will look like
{
[1,1] = 'US04650Y1001'
[1,2] = 'US90274P3029'
[1,3] = 'HON WI'
[1,4] = 'US41165F1012'
}
You may option is to use lookaround assertions. The following will match any string made of alphanumeric character or underscore (\w), space (' ') or characters . or ^, that is located between quotes. This will specifically exclude the blank space next to the comma, in the separation between tokens, i.e. ', ' does not give a match.
Note that \s will match any blank space character (including tab, newline), this is why a space is preferred here:
pattern2='(?<='')[\w.^ ]+(?='')';
pattern2 =
(?<=')[\w.^ ]+(?=')
newArr = regexp(myString, pattern2,'match');
newArr'
ans =
'US04650Y1001'
'US90274P3029'
'HON WI'
'US41165F1012'