Find and replace a Regex pattern occurring more than once [duplicate] - regex

I'm trying to find and replace instances where consecutive commas appear in a string, replacing them with something like ",N/A,". I was using a very simple /,,/g pattern, which works on things like ",,abc" and ",,,,abc" (even numbers of commas). However, it doesn't catch things like ",,,abc": the first two commas are consumed as a match, and the third comma is then treated as part of a new ",abc" string. Is there a way to handle this with a regex pattern or options? Otherwise, I'm going to need to perform multiple searches.
FWIW - I'm working in JavaScript, but I'm guessing this is just a general RegEx question/answer.

/,,/g matches only once in a run of three commas because a global search resumes right after the last consumed character. You need a way to check for ,, without consuming the second comma.
If your language supports it, use a positive lookahead. A positive lookahead lets a regex require that certain characters follow, without consuming them as part of the match.
/,(?=,)/g
In English, this means:
,      # match a comma, then
(?=    # start a lookahead: what follows must match, but is not consumed
  ,    # a comma
)      # end of the lookahead
See more about this here: https://www.regular-expressions.info/lookaround.html
JavaScript supports positive lookahead. :)
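As a minimal sketch of the lookahead approach in JavaScript: the helper names fillEmptyFields and fillEmptyFieldsNaive, and the ",N/A" replacement text, are assumptions made for illustration. Because only the first comma of each pair is consumed, the second comma is still available to start the next match, so odd-length runs are handled too.

```javascript
// Hypothetical helper: put "N/A" into every empty field between commas.
// The lookahead (?=,) checks for a following comma without consuming it,
// so that comma can start the next match.
function fillEmptyFields(line) {
  return line.replace(/,(?=,)/g, ",N/A");
}

// The original pattern, for comparison: it consumes both commas.
function fillEmptyFieldsNaive(line) {
  return line.replace(/,,/g, ",N/A,");
}

console.log(fillEmptyFields(",,,abc"));      // ",N/A,N/A,abc"
console.log(fillEmptyFieldsNaive(",,,abc")); // ",N/A,,abc"
```

Note that the replacement text changes from ",N/A," to ",N/A", since the trailing comma is no longer part of the match.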

Related

Regex - match number within a text that does not start with a certain string

I've searched through multiple answers on SO now, but most of them consider the beginning of the line as the whole string being looked upon, which doesn't serve my case, I think (at least all the answers I tried didn't work).
So, I want to match all codes within a text that are 7 digits long, start with 1 or 2, and are not prefixed by "TC-" or its lowercase variants.
I came up with the /(!?TC-){0}(1|2)\d{6}/g expression, but it still matches the codes that are prefixed with "TC-", and I don't know how to prevent those from being selected. Is there a way to do that?
I've created an example pattern on Regexr: regexr.com/6p70c.
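You can assert that TC- is not directly to the left using a negative lookbehind, (?<!...). Drop the (!?TC-){0} part: (!?TC-) is an ordinary group (an optional ! followed by TC-), not a lookaround, and the {0} quantifier repeats it zero times, so it has no effect: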
(?<!\bTC-)\b[12]\d{6}\b
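A short sketch of the lookbehind pattern in JavaScript, assuming an engine with ES2018 lookbehind support. The sample text is made up, and the i flag is an addition here so that lowercase "tc-" prefixes are rejected as well, as the question asks.

```javascript
// Negative lookbehind: a 7-digit code starting with 1 or 2 that is not
// directly preceded by "TC-". The i flag (an addition here) also rejects "tc-".
const codePattern = /(?<!\bTC-)\b[12]\d{6}\b/gi;

// Made-up sample text for illustration.
const sample = "Order 1234567, ref TC-2345678, tc-1111111 and 2999999.";
const codes = sample.match(codePattern);

console.log(codes); // [ "1234567", "2999999" ]
```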

regex ${something} [duplicate]

How do I use regex to get what is inside of a ${} enclosed value such as:
Dont care about this. ${This is the thing I want regex to return} Not this either.
Should return:
This is the thing I want regex to return
I've tried \${ }$
which is the best I got messing around on regex101.com.
I'll be honest, I have no idea what I'm doing as far as regex goes.
I'm using this in C++, but would also (if possible) like to use it in the Geany text editor.
I suggest \${[^}]*}. Note that $ has a special meaning in regular expressions and needs to be escaped with a \ to be read literally.
I use [^}]* instead of .* between the braces to avoid one long match spanning the entire value of:
${Another} match, more then one ${on the same line}
[^}] means anything but }
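To illustrate the difference, here is a small JavaScript sketch run against the sample line above. The braces are escaped in both patterns, which is an adjustment for illustration since some engines reject an unescaped {; the variable names are likewise assumptions.

```javascript
const line = "${Another} match, more then one ${on the same line}";

// Negated class: each match stops at the first closing brace.
const perPlaceholder = line.match(/\$\{[^}]*\}/g);

// Greedy .* runs on to the last closing brace on the line.
const greedy = line.match(/\$\{.*\}/g);

console.log(perPlaceholder); // [ "${Another}", "${on the same line}" ]
console.log(greedy);         // [ "${Another} match, more then one ${on the same line}" ]
```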
What you want is to match the starting ${ and the ending } with any number of characters in between: \$\{.*\}. The special part here is the .*: . means any character and * means the thing in front of it can be matched 0 or more times.
Since you want the matched results, you might also want to wrap it in (): (\$\{.*\}). The parentheses make the regex engine remember the stuff inside for later use.
See this Stack Overflow question on how to get the results back:
How to match multiple results using std::regex
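As a rough JavaScript illustration of reading results back from a capture group (the question itself targets C++, where the linked question covers the equivalent iteration): the helper name extractPlaceholders is hypothetical. The group is placed around the inner text only, and [^}]* is used rather than .*, so these are variations on the patterns above rather than the exact expressions suggested.

```javascript
// Hypothetical helper: return the text inside every ${...} placeholder.
// The capture group wraps only the inner text, so m[1] excludes "${" and "}".
function extractPlaceholders(text) {
  return Array.from(text.matchAll(/\$\{([^}]*)\}/g), (m) => m[1]);
}

const input =
  "Dont care about this. ${This is the thing I want regex to return} Not this either.";
console.log(extractPlaceholders(input));
// [ "This is the thing I want regex to return" ]
```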

Regex - How to exclude matches without look-behind? [duplicate]

I'm trying to scan all attributes from a database, searching for specific patterns and ignoring similar ones that I know should not match, but I'm having some problems, as in the example below:
Let's say I'm trying to find Customer Registration Numbers and one of my patterns is this:
.*CRN.*
Then I ignore everything that is not a CRN (like currency and country name) like this:
(CRN)(?!CY|AME)
So far everything works fine, as lookahead is supported in JavaScript.
The next step is to exclude things like SCRN (screen), for example, but the lookbehind in (?<!S)(CRN)(?!CY|AME) doesn't work.
Is there any alternative?
Example inputs:
CREDIT_CARD
DISCARD
CARDINALITY
CARDNO
My regex: (?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*
CARDINALITY was removed, but DISCARD is still being matched :(
The regex that you want is:
(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)
It is important to test the negative lookahead against the entire word, which is why we match (\b\w*CARD\w*\b) rather than just CARD. The problem with the following regex:
(?!(?:CARDINALITY|DISCARD))CARD
is that, in the case of DISCARD, by the time the scan reaches the position where CARD begins, we are already past DIS, and you would need a negative lookbehind to eliminate DISCARD from consideration. When we match the complete word, as in the regex I propose, we are still at the start of the word when the negative lookahead is applied.
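The proposed pattern can be checked against the example inputs with a short JavaScript sketch. The variable names are assumptions, and each attribute name is tested on its own rather than as one multi-line string.

```javascript
// Whole-word match for names containing CARD, with a lookahead-only
// exclusion list applied at the start of the word (no lookbehind needed).
const cardPattern = /(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)/;

const attributeNames = ["CREDIT_CARD", "DISCARD", "CARDINALITY", "CARDNO"];
const matching = attributeNames.filter((name) => cardPattern.test(name));

console.log(matching); // [ "CREDIT_CARD", "CARDNO" ]
```

The pattern has no g flag, so repeated test calls do not carry state from one name to the next.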

Regex: split number into optional first group of up to three then last group of up to three

I have two 1-6 digit numbers separated by a slash. I want these split up into groups of at most 3 digits, taking from the right.
For example:
0/1 -> [,0,,1]
1234/3 -> [1,234,,3]
12345/1234 -> [12,345,1,234]
123456/789123 -> [123,456,789,123]
I need to use a regular expression to do this because I want to use it for a location in NGINX. It's possible to do this with application logic, but for performance reasons that is not what I'm asking about.
Similar question which solves part of this was here using a negative lookahead: Regular expression to match last number in a string
What regex can achieve this split?
UPDATE:
This regex comes close to what I want (https://regex101.com/r/bQtNdK/3):
(?<prefix1>\d{0,3}?)(?<threes1>\d{0,3})\/(?<prefix2>\d{0,3}?)(?=\d)(?<threes2>\d{0,3})
It fails to match if the second number, after the slash, is more than 3 digits long.
UPDATE2:
Now this regex works for most combinations (https://regex101.com/r/bQtNdK/5):
(?<prefix1>\d{0,3}?)(?<threes1>\d{1,3})\/(?<prefix2>\d{0,3})(?<threes2>\d{3})
I don't understand why this starts to fail if I use the same regex for prefix2/threes2 like prefix1/threes1 (i.e. make prefix2 also lazy). Any ideas how to solve this? So close...
I don't know that it's possible unless the regex engine can remember every intermediate match of a group that repeats an arbitrary number of times (.NET can do this; I'm not sure about others). PCRE will apparently only remember the last match for each group; otherwise you could use something like this: (?<prefix1>\d{0,2})(?:(?<threes1>\d{3})*)\/(?<prefix2>\d{0,2})(?<threes2>\d{3})*\s
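The "only the last match is remembered" behaviour can be seen in a tiny JavaScript sketch. JavaScript is used here purely for illustration (NGINX itself uses PCRE), and the variable name is an assumption.

```javascript
// A repeated capture group keeps only its final iteration.
const repeated = "123456".match(/^(\d{3})*$/);

console.log(repeated[0]); // "123456" (the whole match)
console.log(repeated[1]); // "456"    (the first iteration, "123", is gone)
```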
This regex seems to be correct now (regex101):
(?<prefix1>\d{0,3}?)(?<suffix1>\d{1,3})\/(?<prefix2>\d{0,3}?)(?<suffix2>\d{1,3})\/
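A sketch of this last pattern in JavaScript, again only as an illustration since the target is an NGINX location; splitGroups is a hypothetical helper name. A plausible reading of why a lazy prefix2 works here but not in the earlier attempt: the trailing \/ forces the second number to be consumed completely, so the lazy prefix has to grow until suffix2 ends right at the slash. Without something after suffix2, a lazy prefix2 can stay empty and the match simply ends after the first three digits.

```javascript
// Pattern from above; note that it expects a slash after the second number.
const splitPattern =
  /(?<prefix1>\d{0,3}?)(?<suffix1>\d{1,3})\/(?<prefix2>\d{0,3}?)(?<suffix2>\d{1,3})\//;

// Hypothetical helper: return the four groups, or null when nothing matches.
function splitGroups(path) {
  const m = splitPattern.exec(path);
  if (m === null) return null;
  const { prefix1, suffix1, prefix2, suffix2 } = m.groups;
  return [prefix1, suffix1, prefix2, suffix2];
}

console.log(splitGroups("0/1/"));           // [ "", "0", "", "1" ]
console.log(splitGroups("1234/3/"));        // [ "1", "234", "", "3" ]
console.log(splitGroups("12345/1234/"));    // [ "12", "345", "1", "234" ]
console.log(splitGroups("123456/789123/")); // [ "123", "456", "789", "123" ]
```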

Regexp to start matching after a specific character [duplicate]

In someXstring it's easy to find everything after and including 'X'.
What I need is to find everything after, but EXCLUDING 'X'.
... i.e. to match just string in it.
Try using a lookbehind assertion.
(?<=X)\w+
If your regex engine doesn't support lookbehind assertions, you can work around that using a capturing group.
X(\w+)
In the above regex, string would be available as capture group 1 (\1).
NOTE: this uses \w to capture word characters. If you literally mean that you want to capture everything then use the dot, ., metacharacter instead...
(?<=X).+$
You can use a lookbehind if available:
(?<=X).*$
If not, you can use a capturing group and grab group 1:
X(.*$)
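Both approaches can be compared in a short JavaScript sketch, assuming an engine with lookbehind support for the first variant. The variable names and the second sample string are made up for illustration.

```javascript
const input = "someXstring";

// Lookbehind: X must precede the match but is not part of it.
const viaLookbehind = input.match(/(?<=X)\w+/)[0];

// Capturing group: X is part of the match; group 1 holds the text after it.
const viaGroup = input.match(/X(\w+)/)[1];

// With the dot, everything up to the end of the line is taken, spaces included.
const restOfLine = "a Xb c".match(/(?<=X).+$/)[0];

console.log(viaLookbehind); // "string"
console.log(viaGroup);      // "string"
console.log(restOfLine);    // "b c"
```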