Regex for Removing Everything Before Certain Comma Position

I'm trying to remove and replace everything before the 13th comma in an array like so:
{1,1,0,0,0,4,0,0,0,0,20,4099,4241,706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},
to where it becomes:
{706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},
Reference: I'm using regex in Notepad++.
I found this regex string to match everything after a certain comma to the end of the line:
,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*$
But how do I turn it around to start from the beginning?
I appreciate your time and help, thank you.

Whereas $ matches the end of the subject string, ^ matches the beginning. So if you want to match up to and including the 13th comma:
^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,
Replace with "{".
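A minimal sketch of the same anchored pattern, assuming Python's `re` module as a stand-in for Notepad++'s engine (both treat `^` and `[^,]` the same way here). The 13 repetitions are generated in code rather than typed out; the variable names are illustrative only.

```python
import re

line = "{1,1,0,0,0,4,0,0,0,0,20,4099,4241,706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},"

# "^" anchors at the start of the line; each "[^,]*," consumes one field
# plus its comma, so 13 copies consume everything through the 13th comma.
pattern = "^" + "[^,]*," * 13

# The first field swallowed the opening "{", so the replacement puts it back.
result = re.sub(pattern, "{", line)
print(result)  # {706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},
```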

You may use
{(?:[^,}]*,){13}
Replace with a mere {. This version works correctly even if you have {...} substrings on several lines and some of them contain fewer than 13 items.
Details
{ - a {
(?:[^,}]*,){13} - 13 consecutive occurrences of
[^,}]* - 0+ chars other than , and } (the } is important to avoid overflowing from one {...} substring into another)
, - a comma
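To illustrate why the `}` in the character class matters, here is a small Python sketch on a hypothetical two-line input (the data is made up for illustration, and Python's `re` stands in for Notepad++). The `{` is escaped in the Python pattern, which is harmless.

```python
import re

# Hypothetical input: the first brace group has only 3 items,
# the second has more than 13.
text = "{1,2,3},\n{1,1,0,0,0,4,0,0,0,0,20,4099,4241,706,706,714},"

# [^,}] stops each item at "}", so the short group cannot borrow
# commas from the next group and is left alone.
safe = re.sub(r"\{(?:[^,}]*,){13}", "{", text)

# Without "}" in the class, the match starts at the first group and
# runs across the newline into the second one.
naive = re.sub(r"\{(?:[^,]*,){13}", "{", text)

print(repr(safe))   # '{1,2,3},\n{706,706,714},'
print(repr(naive))  # '{20,4099,4241,706,706,714},'
```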
You may also use
{\K(?:[^,}]*,){13}
And replace with an empty string. You do not need to replace with {, because \K omits the first { from the match, so it is kept in the final text.
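Python's built-in `re` module does not support `\K` (the third-party `regex` module does), so this sketch swaps in a lookbehind `(?<=\{)`, which likewise keeps the `{` out of the match. It is an equivalent technique, not the exact pattern above.

```python
import re

line = "{1,1,0,0,0,4,0,0,0,0,20,4099,4241,706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},"

# (?<=\{) requires a "{" right before the match without consuming it,
# playing the role of \K: the brace is never part of the match, so it
# survives when the match is replaced with an empty string.
pattern = r"(?<=\{)(?:[^,}]*,){13}"

result = re.sub(pattern, "", line)
print(result)  # {706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},
```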

Try the following find and replacement:
Find:
\{(?:[^,]*,){13}(.*)
Replace:
{$1
The above pattern could be slightly adjusted depending on what your expectations are for where this bracketed string might appear, edge cases you want to cover/avoid, etc.
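The same find/replace expressed in Python, as a sketch. One difference worth noting: Notepad++ writes the backreference in the replacement as `$1`, while Python's `re.sub` uses `\1` (or `\g<1>`).

```python
import re

line = "{1,1,0,0,0,4,0,0,0,0,20,4099,4241,706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},"

# Group 1 captures everything after the 13th comma, up to the end of the line.
pattern = r"\{(?:[^,]*,){13}(.*)"

# Rebuild the line as "{" followed by the captured remainder.
result = re.sub(pattern, r"{\1", line)
print(result)  # {706,706,714,714,817,824,824,824,2,2,2,2,1,1,1,1},
```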

Related

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one, like aa, ##, ???, ~~~, ?~, etc.
I need a regex to find whether any of these words contains only ? or only ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and they work; now I am trying to combine them.
^[\s?||~]*$ doesn't work, as it also accepts ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
You need to use a backreference to achieve the desired result.
If you want only ~ or ? use
^([~?])\1+$
If you want any repetitive pattern, use
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ matches that same character one or more times (a backreference).
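A quick Python check of both backreference patterns against the words from the question, assuming Python's `re` as the engine.

```python
import re

words = ["aa", "##", "???", "~~~", "?~"]

# ([~?]) captures the first character; \1+ then requires that same
# character to repeat one or more times up to the end ($).
only_tilde_or_question = re.compile(r"^([~?])\1+$")
# (.) accepts any first character, so any run of one repeated character matches.
any_repeat = re.compile(r"^(.)\1+$")

matches = [w for w in words if only_tilde_or_question.match(w)]
repeats = [w for w in words if any_repeat.match(w)]

print(matches)  # ['???', '~~~']
print(repeats)  # ['aa', '##', '???', '~~~']
```

Because `\1+` needs at least one repetition, a single `?` or `~` does not match; `\1*` would accept it.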
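You want to match lines that consist entirely of tildes or question marks. In basic regex syntax (e.g. GNU grep or Vim) that would be ^\(~\|?\)*$; the parentheses that make the group and the vertical bar that does the 'or' need to be backslash-escaped. Note, however, that each repetition can pick either alternative, so this also matches mixed strings such as ?~ (and empty lines); to require a single repeated character, use the backreference approach instead.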
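A Python sketch contrasting the alternation with the backreference. It assumes the basic-syntax pattern translates to `^(?:~|\?)*$` in Python, where the group and `|` are unescaped and a literal `?` must be escaped.

```python
import re

words = ["aa", "##", "???", "~~~", "?~", ""]

# Alternation: every repetition chooses "~" or "?" independently.
alternation = re.compile(r"^(?:~|\?)*$")
# Backreference: every character must equal the first one.
backreference = re.compile(r"^([~?])\1+$")

alt = [w for w in words if alternation.match(w)]
back = [w for w in words if backreference.match(w)]

print(alt)   # ['???', '~~~', '?~', '']
print(back)  # ['???', '~~~']
```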

Trying to combine two Regex

I'm trying to combine two working regex patterns into one. Please let me know the correct syntax and if this can be better written.
Pattern 1: (?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|
Pattern 2: (?P<path>[^\/]+(?=\-[^\/-]*$))
Sample line:
06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3
The first expression matches the start of the string, the second matches the end, you can combine them by putting a non-greedy .*? between them, like this:
(?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|.*?(?P<path>[^\/]+(?=\-[^\/-]*$))
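A minimal Python check that the combined pattern extracts the expected groups from the sample line. It assumes Python's `re` module, whose `(?P<name>...)` syntax the question already uses; it verifies only the captured values, not the step counts.

```python
import re

line = ("06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|"
        "path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3")

# Pattern 1 and Pattern 2 joined by a lazy .*? (adjacent raw strings
# are concatenated into a single pattern).
combined = re.compile(
    r"(?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|"
    r".*?(?P<path>[^\/]+(?=\-[^\/-]*$))"
)

m = combined.search(line)
print(m.groupdict())
```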
This expression works, but it takes 1660 steps to match the string. This is because each .* between the | delimiters first captures the whole rest of the string and then steps back character by character to find the match.
If you use the non-greedy modifier here (.*?), the regex engine initially matches an empty string and then moves forward character by character until it finds the matching |. This reduces the number of steps to 1183.
To remove this backtracking (or forward-tracking) altogether, you can skip as many non-| characters as possible in one go with [^|]*. The other .* patterns in the regex can be replaced in a similar way. The resulting regex finds a match in just 47 steps, more than 30 times fewer than the original regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*
Update 2020-03-09
If you want the path group to include the last slash, you can use this regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|.*?(?P<path>\/[^\/]*)-[^\/]*
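A Python sketch running both optimized patterns on the sample line, assuming Python's `re` behaves like the tester used in this answer for these constructs. It shows which `path` value each variant captures.

```python
import re

line = ("06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|"
        "path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3")

# [^|]* jumps straight to the next "|" instead of overshooting and backtracking;
# (?:[^\/\n]*\/)* skips the directory components.
fast = re.compile(
    r"(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|"
    r"(?:[^\/\n]*\/)*(?P<path>.*)-.*"
)
# Update 2020-03-09 variant: the path group starts at the last slash.
with_slash = re.compile(
    r"(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|"
    r".*?(?P<path>\/[^\/]*)-[^\/]*"
)

print(fast.search(line).group("path"))
print(with_slash.search(line).group("path"))
```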

Regex is matching the second occurrence. I need it to match the first occurrence

This is my regex code:
.*(X.*)\s(.*?)\$
This is my data string:
1247.P1.06.Z01.0020N.X396X111.Y008 1247.P1.06.Z01.0020N$M234477$
This is properly grabbing the second item that ends with the first $ sign:
1247.P1.06.Z01.0020N
But for the first string, I want it to grab:
X396X111.Y008
Instead it is grabbing:
X111.Y008
So I want it to get the first X and everything up to the space. But the second X is triggering the match.
The string starting with "X" is always 13 characters long, so I tried specifying the length, but the match still started at the second X.
I am fine with either pattern:
Start with the first X and end with the space.
Start with the first X and grab 13 characters.
Thank you.
Get rid of .* at the beginning of the regular expression. It's greedy, so it's skipping over the longest possible prefix that allows the rest of the regular expression to match. That forces the rest to get the last occurrence instead of the first.
In general, it's not necessary to put .* at the beginning or end of a regular expression. The engine looks for the pattern anywhere in the input, so any text around the match is simply ignored.
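A Python sketch of the difference, assuming Python's `re.search` (which, like most engines, returns the leftmost match).

```python
import re

data = "1247.P1.06.Z01.0020N.X396X111.Y008 1247.P1.06.Z01.0020N$M234477$"

# Original: the leading greedy .* eats as much as it can, so (X.*) lands on the LAST X.
original = re.search(r".*(X.*)\s(.*?)\$", data)
# Without the leading .*, search() returns the leftmost match, starting at the FIRST X.
fixed = re.search(r"(X.*)\s(.*?)\$", data)

print(original.groups())  # ('X111.Y008', '1247.P1.06.Z01.0020N')
print(fixed.groups())     # ('X396X111.Y008', '1247.P1.06.Z01.0020N')
```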
Your match is too loose. A stricter regex could be:
X\S+\s
which matches an X, then every non-whitespace character up to a whitespace character.
Demo: https://regex101.com/r/Jl2BJS/2/
If the ID is always 13 characters long (including the X), you can use:
X.{12}
Demo: https://regex101.com/r/Jl2BJS/3/
Alternatively, removing the leading .*, or making it non-greedy with ? or with the U (ungreedy) modifier, would also work.
Demo: https://regex101.com/r/Jl2BJS/4/ or https://regex101.com/r/Jl2BJS/5/
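A Python sketch of the stricter alternatives. Note that `X.{n}` matches the X plus n *more* characters, so a 13-character ID corresponds to `X.{12}`. Python has no `U` flag, so the lazy `.*?` stands in for it here.

```python
import re

data = "1247.P1.06.Z01.0020N.X396X111.Y008 1247.P1.06.Z01.0020N$M234477$"

# X, then every non-whitespace character, then the whitespace itself.
strict = re.search(r"X\S+\s", data).group()
# X followed by exactly 12 more characters: 13 characters in total.
fixed_length = re.search(r"X.{12}", data).group()
# A lazy leading .*? stops at the first X instead of the last one.
lazy = re.search(r".*?(X.*)\s", data).group(1)

print(repr(strict))        # 'X396X111.Y008 '
print(repr(fixed_length))  # 'X396X111.Y008'
print(repr(lazy))          # 'X396X111.Y008'
```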

Regex: Find multiple matching strings in all lines

I'm trying to match multiple strings in a single line using regex in Sublime Text 3.
I want to match all values and replace them with null.
Part of the string that I'm matching against:
"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false
List of strings that it should match:
"MyName"
50
192
200
false
Result after replacing:
"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null
Is there a way to do this without using sed or any other substitution method, but just by matching the wanted pattern in regex?
You can use this find pattern:
:(.*?)(,|$)
And this replace pattern:
:null\2
The first group matches any character (.) zero or more times (*), and the trailing ? makes that quantifier lazy, so it matches as little as possible. The second group matches either a comma or the end of the string. In the replacement pattern, the first group is replaced with null and the character matched by the second group is put back unchanged.
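The same replacement as a Python sketch, assuming Python's `re.sub` (which also writes backreferences as `\2`). Like the original, it assumes no value itself contains a comma.

```python
import re

s = '"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false'

# Group 1: the value (lazy, so it stops at the first possible comma).
# Group 2: the comma, or the empty string at the end of the text.
result = re.sub(r":(.*?)(,|$)", r":null\2", s)
print(result)
```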
Here is an alternative to amaurs' answer that does not need to capture and re-insert the comma:
:\K(.*?)(?=,|$)
And this replacement pattern:
null
This works like amaurs' answer but starts matching after the colon (using \K to reset the match starting point) and matches up to a comma or the end of the line (using a positive lookahead).
I have tested this in Sublime Text 2, so it should also work in Sublime Text 3.
Another slightly better alternative to this is:
(?<=:).+?(?=,|$)
which uses a positive lookbehind instead of resetting the regex starting point
Another good alternative (so far the most efficient here):
:\K[^,]*
This may help.
Find: (?<=:)[^,]*
Replace: null
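A Python sketch of the lookbehind-based variants. Python's built-in `re` has no `\K`, so only the lookbehind forms are shown; the expected string is the one from the question.

```python
import re

s = '"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false'
expected = '"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null'

# Lookbehind: start right after a colon without consuming it.
# Lazy .+? plus a lookahead: stop before a comma or the end of the text.
lookaround = re.sub(r"(?<=:).+?(?=,|$)", "null", s)
# Simpler: after a colon, take every non-comma character.
negated_class = re.sub(r"(?<=:)[^,]*", "null", s)

print(lookaround == expected, negated_class == expected)  # True True
```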

regex to start matching from last letter in previous match

I have following regex
(\{\w*\}\s*[^{}]+\s*)\{?
and I am testing it on this string
this {match} is cool{match} but {match} this one is more cool
Currently I am able to capture 2 groups, {match} is cool and {match} this one is more cool, so as you can see, the group {match} but is missing.
The reason is that the last matched character is {, so the next match attempt starts after that {, skips it, and cannot match again until the next { occurrence.
Does anyone know how to force the middle group to match as well?
Debugging: http://regex101.com/r/hM5xE6/2
You can probably just remove the \{? (and the \s* too); you also don't need the capturing parentheses:
\{\w*\}[^{}]+
Test it live on regex101.com.
If you want to enforce the match to end before a { or at the end of the string, you can use a positive lookahead assertion for that:
\{\w*\}[^{}]+(?=\{|$)
But you would only need that if you wanted to avoid a match completely if there are nested braces, like in {{match} whatever}, where the first regex would find {match} whatever.
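A Python sketch comparing the original pattern with the fixed ones, assuming Python's `re.findall` (which returns group 1 when the pattern has one group, and the whole match otherwise).

```python
import re

text = "this {match} is cool{match} but {match} this one is more cool"

# Original: the trailing \{? consumes the next "{", so the following
# match attempt starts after it and "{match} but " is skipped.
original = re.findall(r"(\{\w*\}\s*[^{}]+\s*)\{?", text)
# Fixed: stop before the next "{" without consuming it.
fixed = re.findall(r"\{\w*\}[^{}]+", text)
# With the lookahead, a match must end right before "{" or at the end.
strict = re.findall(r"\{\w*\}[^{}]+(?=\{|$)", text)

print(original)  # ['{match} is cool', '{match} this one is more cool']
print(fixed)     # ['{match} is cool', '{match} but ', '{match} this one is more cool']

# Nested braces: only the version without the lookahead finds a match.
nested_plain = re.findall(r"\{\w*\}[^{}]+", "{{match} whatever}")
nested_strict = re.findall(r"\{\w*\}[^{}]+(?=\{|$)", "{{match} whatever}")
print(nested_plain, nested_strict)  # ['{match} whatever'] []
```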
You can use the following regular expression to start matching from last letter in previous match
\{\w*\}[^{}]+(?=\{|$)
You are consuming the next { in the regex (it is part of the match, though not of the capture group), so the next match begins on the character after it, skipping that { and not matching until it reaches the following one.
There's no need for lookaheads or anything like that.
If you remove the trailing \{?, you get all 3 matches (you can also remove the } from the brackets and the last \s*):
(\{\w*\}\s*[^{]+)
(http://regex101.com/r/hM5xE6/7)
You can also use the following regex, depending on how specific you need the capture to be:
(\{\w*\}[\w\s]*)
http://regex101.com/r/hM5xE6/5
(\{\w*\}\s*[^{}]+\s*)(?=\{|$)
Try this. It uses a lookahead, a zero-width assertion, so the next { is checked but not consumed. See demo.
http://regex101.com/r/qC9cH4/18