Regex divide string by commas ignoring function syntax [duplicate]

This question already has answers here:
Split string delimited by comma without respect to commas in brackets
(3 answers)
Closed 4 years ago.
I need a regex that replaces the commas in a string, but only the commas that are not inside function-call parentheses.
For example the string:
str1 = "a,b,12,func(a,b),8,bob,func(1,2))"
should be transformed as follows:
str1_transformed = "a;b;12;func(a,b);8;bob;func(1,2))"
I cannot simply replace every "," with a ";" because the result would be:
str1_wrong = "a;b;12;func(a;b);8;bob;func(1;2))"
How can I deal with it?
I looked at the following threads without success:
How can I Split(',') a string while ignore commas in between quotes?
Regular Expression for Comma Based Splitting Ignoring Commas inside Quotes

If you know that the input has no nested, unbalanced, or escaped brackets, the regex below works well:
,(?![^()]*\))
Breakdown:
, Match a comma
(?! Start of negative lookahead
[^()]*\) Any non-bracket characters, then a closing bracket; that is, the matched comma must not be followed by a closing bracket without an opening bracket before it
) End of lookahead
C# code:
// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex(@",(?![^()]*\))");
string result = regex.Replace(@"a,b,12,func(a,b),8,bob,func(1,2))", @";");
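For comparison, here is a minimal Python sketch of the same lookahead, together with a depth-counting fallback for nested calls such as f(a,g(b,c)), which the lookahead cannot handle. The helper name replace_top_level_commas is hypothetical and not part of the answer above.

```python
import re

# Same pattern as the answer: a comma that is not followed by
# non-bracket characters and then a closing bracket.
TOP_LEVEL_COMMA = re.compile(r",(?![^()]*\))")


def replace_top_level_commas(text, repl=";"):
    """Hypothetical helper: replace only commas at parenthesis depth 0."""
    depth = 0
    out = []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray closing bracket
        out.append(repl if ch == "," and depth == 0 else ch)
    return "".join(out)


str1 = "a,b,12,func(a,b),8,bob,func(1,2))"
print(TOP_LEVEL_COMMA.sub(";", str1))
print(replace_top_level_commas("f(a,g(b,c)),d"))
```

The regex is enough for flat calls like the ones in the question; the counter is only worth the extra lines if function calls can be nested.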

Related

How to exclude a substring in a regular expression? [duplicate]

This question already has answers here:
What is the difference between .*? and .* regular expressions?
(3 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 5 months ago.
There is a line of text:
Lorem ~Ipsum~ is simply ~dummy~ text ~of~ the printing...
To find all the words enclosed in ~ I use
re.search(r'~([^~]*)~', text)
Now suppose the delimiter has to be ~~ instead of ~.
([^\~]+) excludes the single ~ character from the text between the delimiters.
How do I write a regular expression that excludes a sequence of characters instead of just one character?
That is, ~~Lor~em~~ should return Lor~em.
The newline character must not be excluded, and the matched string cannot be empty.
Use a non-greedy quantifier instead of a negated character set.
re.search(r'~~(.*?)~~', text, flags=re.DOTALL)
re.DOTALL makes . match newline characters. To require a non-empty match, use .+? instead of .*?.
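A short sketch of the lazy quantifier applied to a whole text, assuming re.findall is wanted (to collect every match rather than the first) and using .+? so that empty matches are rejected. The sample text is made up for illustration.

```python
import re

text = "Lorem ~~Ipsum~~ is ~~Lor~em~~ text ~~two\nlines~~ and ~~~~"

# .+? is lazy and needs at least one character between the delimiters;
# re.DOTALL lets . match across the newline.
matches = re.findall(r"~~(.+?)~~", text, flags=re.DOTALL)
print(matches)  # ['Ipsum', 'Lor~em', 'two\nlines']
```

The trailing ~~~~ produces no match because nothing non-empty sits between two delimiter pairs.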

Parsing regex with escaped pipe delimiter [duplicate]

This question already has answers here:
regular expression to match pipe separated strings with pipe escaping
(4 answers)
Closed 3 years ago.
I'm trying to parse
|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|
with a regex (https://regex101.com/r/KLzIOa/1/)
I currently have
[^|]++
This parses everything correctly except \|Team\| vs \|Team\|
I would expect this to be parsed as |Team| vs |Team|
If I change the regex to
[^\\|]++
it parses the teams separately instead of together with the escaped pipes.
Basically, I want to parse the fields between the pipes; however, if a field contains escaped pipes, I would like to capture them as part of that field. So for my example I would expect
["123", "create", "item", "1497359166334", "Sport", "Some League", "|Team| vs |Team|", "1497359216693"]
You can alternate between:
\\. - A literal backslash followed by anything, or
[^|\\]+ - Anything but a pipe or backslash
(?:\\.|[^|\\]+)+
https://regex101.com/r/KLzIOa/2
Note that there's no need for the possessive quantifier, because no backtracking will occur.
If you also want to replace \|s with |s, then do that afterwards: match \\\| and replace with |.
To handle escaping, you should match a backslash and the character after it as a single "item".
(?:\\.|[^|])++
This conveniently also works for escaping the backslashes themselves!
To then remove the backslashes from the results, use a simple replacement:
Replace: \\(.)
With: $1
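The match-then-unescape approach from the answers above can be sketched in Python as follows. The helper name parse_fields is hypothetical. The possessive quantifier is left out, since a plain + is sufficient here and older Python versions do not accept ++.

```python
import re

line = r"|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|"

# A field is a run of escaped characters (a backslash plus any character)
# or characters that are neither a pipe nor a backslash.
FIELD = re.compile(r"(?:\\.|[^|\\])+")


def parse_fields(text):
    """Hypothetical helper: extract fields, then drop the escaping backslashes."""
    return [re.sub(r"\\(.)", r"\1", raw) for raw in FIELD.findall(text)]


print(parse_fields(line))
```

Note that this sketch skips empty fields (two adjacent pipes), because the pattern requires at least one character per field.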
Use:
(?:\\\||[^|])+

Regular Expressions: Remove x-leading spaces from lines [duplicate]

This question already has answers here:
Notepad++ Regex - Issue with ^ anchor and repeating patterns
(2 answers)
Closed 5 years ago.
To remove, e.g., exactly 2 leading spaces from each line, I've tried to replace
"^  "
with
""
I tried this with our own text editor and with Notepad++. Both behave the same way: each search starts at the position where the last find/replace happened, so the replacement actually removes 2n spaces from each line (n >= 0). Is this the expected behavior? Is my regular expression wrong for this task, or do our own text editor and Notepad++ behave incorrectly?
The issue here is that Notepad++ will keep replacing a pattern so long as it keeps finding matches. This means that replacing "^  " (two leading spaces) will keep stripping whitespace from the start of the line, so long as there are two or more leading spaces available.
Try this as a workaround:
Find:
^  (.*)$
Replace:
$1
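For comparison, this is how an engine that scans the text in a single pass treats the anchored pattern. The sketch uses Python's re.sub and a made-up sample text; it says nothing about Notepad++ itself and only shows that the pattern is sound when the anchor is evaluated against the original text.

```python
import re

text = "    four\n  two\n one\nnone"

# With re.MULTILINE, ^ matches at the start of every line. The scan resumes
# after each match, where ^ no longer matches, so exactly two spaces go.
result = re.sub(r"^ {2}", "", text, flags=re.MULTILINE)
print(repr(result))  # '  four\ntwo\n one\nnone'
```

Writing the pattern as ^ {2} instead of two literal spaces also makes the count visible when whitespace is hard to read.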

Regex: replace multiple occurrences of a pattern within the same line, using one single regular expression [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I'm trying to capture the %20's in a URL and replace them with +'es, as well as strip away some other stuff, all preferably using a single regular expression.
Specifically, I'd like something like this
a%20sentence%20divided%20by%20spaces_123456.html
to be turned into something like this
a+sentence+divided+by+spaces
Edit: for clarity, it's crucial that both the %20's AND the trailing _123456.html are targeted, preferably using one single expression.
The source can be targeted with
^([\w]+%20)+.*\.html$ (multiple occurrences of [\w]+%20, followed by any character, followed by .html)
but I'm confused about how to replace both the multiple occurrences of %20 and the trailing '_123456.html'. I'd guess this would be a shot in the right direction
^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
$1 being each occurrence of ([\w]+)%20, $2 being each occurrence of [\w]+ within the first match, and $3 being [\w]+, but I'm not getting the result I'm looking for (using Sublime Text for this):
string: a%20sentence%20divided%20by%20spaces_123456.html
search: ^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
replace: $2+$3
expected result: a+sentence+divided+by+spaces
actual result: by+spaces
Any ideas where my line of thought goes awry?
You can use two regular expressions (there may be better solutions, though):
var string = "a%20sentence%20divided%20by%20spaces_123456.html";
// Replace every %20 with +
var re1 = /%20/g;
string = string.replace(re1, '+');
// Strip the trailing _123456:
//   ([^_]+)   everything up to the underscore, captured in group 1
//   _         the underscore
//   ([^.]+)   everything up to the dot, captured in group 2
//   (\.html)  the file extension (html in this case), captured in group 3
var re2 = /([^_]+)_([^.]+)(\.html)$/;
// Keep groups 1 and 3; use '$1' alone to drop the .html extension as well
string = string.replace(re2, '$1$3');
alert(string); // a+sentence+divided+by+spaces.html
Replacing parts of a string with different strings, depending on what has been captured, isn't easily done with a single regex.
It is very easy with two regular expressions. However, if you really want to do this with only one regex, here is a solution.
Solution with one regular expression:
original_string = 'a%20sentence%20divided%20by%20spaces_123456.html'
searched_string = original_string + "+"
regex : '%20(?=[^\+]*(\+))|_[^_]*$'
replace : '$1'
result : a+sentence+divided+by+spaces
Explanation:
The regex matches either a "%20" that is followed, later in the string, by a "+" (which the lookahead captures), or everything from the last "_" to the end of the string (capturing nothing).
Each match is then replaced by the captured text: a "+" if "%20" was matched, and nothing if the trailing part was matched.
For this to work, the string must contain a "+".
That is why you need to append one to the end of your string (it is removed again together with the trailing part).
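The sentinel trick above can be sketched in Python as follows. The variable names are illustrative. The second call is a different technique, not taken from the answer: a replacement callback, which avoids the appended "+" altogether.

```python
import re

original = "a%20sentence%20divided%20by%20spaces_123456.html"

# Sentinel trick: append "+" so the lookahead has something to capture.
# Group 1 is "+" for a %20 match and unset for the trailing "_..." match.
pattern = re.compile(r"%20(?=[^+]*(\+))|_[^_]*$")
with_sentinel = pattern.sub(lambda m: m.group(1) or "", original + "+")

# Alternative without a sentinel: a callback picks the replacement.
with_callback = re.sub(
    r"%20|_[^_]*$",
    lambda m: "+" if m.group() == "%20" else "",
    original,
)

print(with_sentinel)
print(with_callback)
```

Both calls still use a single regular expression; the callback version is only available where the replacement can be a function, which is not the case in a plain editor search-and-replace such as the one in the question.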

Split on spaces except those within quotes but also allow escaped quotes [duplicate]

This question already has answers here:
Split a string by whitespace, keeping quoted segments, allowing escaped quotes
(4 answers)
Closed 8 years ago.
If my input string is:
Word "Two Words" Quo\"te "Quo\"ted Words"
The output should be:
Word Two Words Quo"te Quo"ted Words.
I can't seem to work in the escaped quotes in my regex.
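Split using this PCRE pattern. The (*SKIP)(*F) verbs discard each quoted segment, so only whitespace outside quotes acts as a separator:
(?<!\\)".*?(?<!\\)"(*SKIP)(*F)|\s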
Demo
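Python's built-in re module does not support (*SKIP)(*F), so the sketch below swaps in a different technique: instead of splitting on whitespace, it matches the tokens themselves and then removes the escaping backslashes. The helper name split_quoted is hypothetical.

```python
import re

# A token is either a double-quoted segment (group 1) or a bare run of
# non-space characters (group 2); a backslash escapes the next character.
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|((?:\\.|[^\s"\\])+)')


def split_quoted(text):
    """Hypothetical helper: split on spaces, keeping quoted segments together."""
    tokens = []
    for quoted, bare in TOKEN.findall(text):
        raw = quoted or bare
        tokens.append(re.sub(r"\\(.)", r"\1", raw))  # drop escaping backslashes
    return tokens


print(split_quoted(r'Word "Two Words" Quo\"te "Quo\"ted Words"'))
```

For the input from the question this yields the four tokens Word, Two Words, Quo"te and Quo"ted Words.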