How can I fix this negative lookahead to make it work - regex

I have a string for example as follows:
ABCD17; ABC18; ABCEF19; XYZ19; ABCDE
Within the MusicBee application, I'm attempting to use a Regex replace function to swap MATCHED items for blanks and thus transform the above string into
ABCEF19; XYZ19
i.e. ONLY retain the items ending in "19"
The elements can be any length and they may or may not end in a number.
The following expression correctly matches the items Ending in 19
[^|;].*(?=19).{3}
However, I obviously need the opposite of this (since the matched items are then replaced with empty strings) which is NOT (surprisingly to me)
[^|;].*(?!19).{3}

If you only want to keep items that end in 19, one option is to use word boundaries \b and start by matching 1+ uppercase chars A-Z.
Then optionally match the digits at the end, but only when they are not 19, using the negative lookahead (?!19\b)
\b[A-Z]+(?!19\b)\d*\b;?
\b Word boundary
[A-Z]+ Match 1+ uppercase chars A-Z (or use [^\W\d] to match word chars without a digit)
(?!19\b) Negative lookahead, assert what is directly on the right is not 19
\d* Match 0+ digits
\b;? Word boundary and optionally match ;
Regex demo
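To check the replace outside MusicBee, here is a minimal Python sketch. It assumes Python's re module handles this pattern the same way as MusicBee's regex engine, which I have not verified; the variable names are only illustrative.

```python
import re

# Pattern from the answer above: matches every item that does NOT end in 19.
PATTERN = re.compile(r"\b[A-Z]+(?!19\b)\d*\b;?")

source = "ABCD17; ABC18; ABCEF19; XYZ19; ABCDE"

# Replace each matched item with an empty string.
replaced = PATTERN.sub("", source)

# The replace leaves the separating spaces and a trailing "; " behind,
# so trim spaces and semicolons from both ends.
cleaned = replaced.strip(" ;")
print(cleaned)
```

Note that the pattern removes the unwanted items but not the whitespace between them, so some trimming afterwards may still be needed.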

Related

How do I include characters, space and maximum two consecutive character in dart?

Does anyone know how to write a regular expression in Dart that allows a maximum of two consecutive identical characters?
Example :
aabc //allow
aaabc // not allow
aabc aabc // not allow same string
aabcde //allow
abs cdd ert fgg fgy df //allow
I tried this but it won't work:
([A-Za-z])?!.*\1
([a-zA-Z])-?\1-?\1-?\1-?\1
^(?:([A-Za-z])(?!.\1))$
If you want to match chars A-Za-z, you can make use of word boundaries and use 2 negative lookaheads.
The first lookahead excludes matching 3 of the same chars in a row, the second lookahead excludes 2 times the same "word" where a word is identified by the word boundaries.
^(?!.*([A-Za-z])\1\1)(?!.*\b([A-Za-z]+)\b.*\2).+
Explanation
^ Start of string
(?!.*([A-Za-z])\1\1) Negative lookahead, assert that to the right is not a char A-Za-z directly followed by 2 times the same char using a backreference
(?!.*\b([A-Za-z]+)\b.*\2) Negative lookahead, assert that to the right is not 1+ chars A-Za-z surrounded by a word boundary, and then find that same "word" again
.+ Match 1+ chars
See a regex demo.
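The pattern can be tried against the examples from the question. The sketch below uses Python only for illustration, on the assumption that Dart's RegExp accepts the same lookahead and backreference syntax; is_allowed is a hypothetical helper name.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(?!.*([A-Za-z])\1\1)(?!.*\b([A-Za-z]+)\b.*\2).+")


def is_allowed(text: str) -> bool:
    """Return True when the whole string passes both negative lookaheads."""
    return PATTERN.fullmatch(text) is not None


samples = [
    "aabc",
    "aaabc",
    "aabc aabc",
    "aabcde",
    "abs cdd ert fgg fgy df",
]

for sample in samples:
    print(f"{sample!r}: {'allow' if is_allowed(sample) else 'not allow'}")
```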

Python regex for sequence containing at least two digits/letters

Using the Python module re, I would like to detect sequences that contain at least two letters (A-Z) and at least two digits (0-9) from a text, e.g., from the text
"N03FZ467 other text N03671"
precisely the sub-string "N03FZ467" shall be matched.
The best I have got so far is
(?=[A-Z]*\d)[A-Z0-9]{4,}
which detects sequences of length at least 4 that contain only letters A-Z and digits 0-9, and at least one digit and one letter.
How can I make sure I respectively get at least two?
If you want to match full words, start matching at word boundaries \b.
Check the first condition (two upper) by a lookahead: (?=(?:\d*[A-Z]){2})
If this succeeds, match the second requirement, two digits: (?:[A-Z]*\d){2}
Finally match any remaining [A-Z\d]* until another \b.
Putting it together:
\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b
See this demo at regex101 or a Python demo at tio.run
Note that a lookahead is a zero-length assertion; it does not consume characters. If you don't specify a starting point, e.g. \b, the lookahead will be tried at every position, which is less efficient.
Also note that the minimum length of four is already guaranteed by the two requirements.
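A minimal sketch of the combined pattern with re.findall, using the text from the question. The second call uses a few extra tokens that I made up to show that the letters and digits do not have to be adjacent.

```python
import re

# Pattern from the answer above: word boundary, lookahead for two letters,
# then consume two digits and any remaining letters or digits.
TOKEN = re.compile(r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")

text = "N03FZ467 other text N03671"
matches = TOKEN.findall(text)
print(matches)

# Hypothetical extra tokens: only the first has two letters and two digits.
extra = TOKEN.findall("A1B2 AB1 12A")
print(extra)
```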
Use look aheads, one for each requirement:
^(?=(.*\d){2})(?=(.*[A-Z]){2}).*
See live demo.
Regex breakdown:
(?=(.*\d){2}) is "2 digits somewhere ahead"
(?=(.*[A-Z]){2}) is "2 letters somewhere ahead"
The more efficient version:
^(?=(?:.*?\d){2})(?=(?:.*?[A-Z]){2}).*
It's more efficient because it doesn't capture (uses non-capturing groups (?:...)) and it uses the reluctant quantifier .*? which matches as early as possible in the input, whereas .* will scan ahead to the end then backtrack to find a match.
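Because this pattern is anchored with ^ and ends in .*, it validates a whole string rather than picking a token out of a longer text. The sketch below shows both behaviours; splitting the text on whitespace first is my own assumption about how it could be applied to the question's input.

```python
import re

# "More efficient" pattern from the answer above.
VALID = re.compile(r"^(?=(?:.*?\d){2})(?=(?:.*?[A-Z]){2}).*")

text = "N03FZ467 other text N03671"

# Applied to the full text, the pattern matches the whole line,
# because the line as a whole contains two digits and two letters.
whole = VALID.match(text)
print(whole.group() if whole else None)

# Applied per whitespace-separated token, only the wanted token passes.
valid_tokens = [tok for tok in text.split() if VALID.match(tok)]
print(valid_tokens)
```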
If you only want to match chars A-Z and 0-9, you can use a single lookahead (if supported) to make sure there are 2 digits present, and then match 2 times A-Z when matching the string. Note that, as written, this pattern expects the 2 digits to be adjacent (\d\d) and the 2 letters to be adjacent ([A-Z]{2}).
As you have asserted 2 digits and match 2 letters, the length is automatically at least 4 chars.
\b(?=[A-Z\d]*\d\d)[A-Z\d]*[A-Z]{2}[A-Z\d]*\b
Explanation
\b A word boundary to prevent a partial word match
(?=[A-Z\d]*\d\d) Positive lookahead, assert 2 consecutive digits to the right
[A-Z\d]* Match optional chars A-Z or digits
[A-Z]{2} Match 2 consecutive uppercase chars A-Z
[A-Z\d]* Match optional chars A-Z or digits
\b A word boundary
See a regex demo.
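A short sketch of this variant. The token A1B2 is my own addition to the question's text; it has two letters and two digits, but neither pair is adjacent, so this pattern skips it.

```python
import re

# Pattern from the answer above: needs \d\d and [A-Z]{2} somewhere in the word.
ADJACENT = re.compile(r"\b(?=[A-Z\d]*\d\d)[A-Z\d]*[A-Z]{2}[A-Z\d]*\b")

# "A1B2" is a hypothetical extra token, not part of the original question.
text = "N03FZ467 other text N03671 A1B2"

found = ADJACENT.findall(text)
print(found)
```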
I would enhance the given answer and do this:
(?=\b(?:\D+\d+){2}\b)(?=\b(?:[^a-z]+[a-z]+){2}\b)\S+
Regex demo
This contains two lookaheads, each validating one rule:
(?=\b(?:\D+\d+){2}\b) - a lookahead that asserts that what follows is a word boundary \b, then non-digits followed by digits, \D+\d+, repeated to make sure we have at least two such groups. Then a word boundary again, to be sure we are within one "word".
The other lookahead is the same, but instead of digits and non-digits we have letters [a-z] and non-letters [^a-z] - (?=\b(?:[^a-z]+[a-z]+){2}\b)
At the end, we just match the whole "word" with \S+, which simply matches all non-whitespace characters (since we asserted our "word" earlier, this is sufficient).

RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that are overriden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert that the first 5 chars are digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
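The pattern can be checked against every example from the question (N=5). This is a minimal Python sketch; the list names are illustrative.

```python
import re

# Pattern from the answer above, for a region of length 5.
REGION = re.compile(r"^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$")

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_not_match = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_not_match:
    print(f"{s!r}: {bool(REGION.match(s))}")
```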
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert that the first 5 chars are - or a digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word boundary succeeds at a position between two word characters (from \w, i.e. [A-Za-z0-9_]) or between two non-word characters (from \W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the limit of the string.
With it, each \B\d always follows a digit and can't follow a dash.
demo
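The effect of \B can be seen in isolation before applying the full pattern. A minimal Python sketch, with the example strings taken from the question:

```python
import re

# \B\d only succeeds when the digit directly follows another word character.
after_digit = re.search(r"\B\d", "12")  # finds the "2", which follows "1"
after_dash = re.search(r"\B\d", "-1")   # no match: "1" follows a dash
print(after_digit, after_dash)

# Full pattern from the answer above, for a region of length 5.
REGION = re.compile(r"^[-\d](?:-|\B\d){4}[A-Z\d-]*$")

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_not_match = ["1-2-3-A", "----1AB", "--1--1A"]

for s in should_match + should_not_match:
    print(f"{s!r}: {bool(REGION.match(s))}")
```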
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo
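To see where the lookbehind ends the region, the sketch below wraps the two halves of the pattern in capture groups. The groups are my addition for illustration and are not part of the answer's pattern; split_region is a hypothetical helper name.

```python
import re

# Same pattern as above, with capture groups added around the region and
# the remainder (the groups are an assumption made for this illustration).
REGION = re.compile(r"^(\d*-*)(?<=^.{5})([A-Z\d-]*)$")


def split_region(text: str):
    """Return (region, rest) when the string is valid, otherwise None."""
    m = REGION.match(text)
    return m.groups() if m else None


for s in ["12345123", "1234--1", "1----1AB", "----1AB", "1-2-3-A"]:
    print(f"{s!r}: {split_region(s)}")
```

The greedy \d*-* first runs past position 5 where it can, and the lookbehind then forces it to backtrack until exactly 5 characters have been consumed.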

Regex Capturing alternating letters and numbers only when it does not begin the string

I'm trying to capture alternating letters and numbers (letters come first) and ultimately remove them, unless they start the string.
So in the below example, yellow is what I'm trying to capture:
While I'm identifying the correct rows, I'm having a hard time capturing just the yellow-highlighted part...
^(?!([A-Z]+\d+\w*))(?:(.+))[A-Z]+\d+\w*
https://regexr.com/673hl
Any help greatly appreciated.
You can use
(?!^)\b[A-Z]+\d+\w*
See the regex demo. Details:
(?!^) - a negative lookahead that matches a position that is NOT at the start of string
\b - match a word boundary, the preceding char must be a non-word char (or start of string, but the lookahead above already ruled that position out)
[A-Z]+ - one or more uppercase ASCII letters
\d+ - one or more digits
\w* - zero or more letters, digits or underscores.
If you want to match any kind of alphanumeric strings add an alternative:
(?!^)\b(?:[A-Z]+\d|\d+[A-Z])\w*
And to make it case insensitive:
(?!^)\b(?:[A-Za-z]+\d|\d+[A-Za-z])\w*
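A minimal Python sketch of the removal with the first pattern. The original example rows are not available here, so the sample rows below are hypothetical, and each row is processed as its own string so that ^ means the start of that row.

```python
import re

# First pattern from the answer above.
PATTERN = re.compile(r"(?!^)\b[A-Z]+\d+\w*")

# Hypothetical rows, made up for this illustration.
rows = [
    "AB12 foo CD34 bar",  # AB12 starts the row and is kept, CD34 is removed
    "foo XY9z",           # XY9z is not at the start, so it is removed
    "ZZ7",                # the only token starts the row, so it is kept
]

cleaned_rows = [PATTERN.sub("", row) for row in rows]
for before, after in zip(rows, cleaned_rows):
    print(f"{before!r} -> {after!r}")
```

The surrounding spaces are left in place, so a follow-up whitespace cleanup may be wanted.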

Regex for extracting digits in a string not in a word and not separated by a symbol?

I want to extract an ID from a search query but I don't know the length of the ID.
From this input I want to get the numbers that are not in the words and the numbers that are not separated by symbols.
12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1
Here I want to return
12 11231390 1391389 12331 83919 1
So far I've tried /\b[^\D]\d*[^\D]\b/gm but I get the numbers in between the symbols and I don't get the 1 at the end.
You could repeatedly match digits between whitespace boundaries. Using a word boundary \b would give you partial matches.
Note that [^\D] is the same as \d and would expect at least a single character.
Your pattern can be written as \b\d\d*\d\b and you can see that you don't get the 1 at the end as your pattern matches at least 2 digits.
(?<!\S)\d+(?:\s+\d+)*(?!\S)
The pattern matches:
(?<!\S) Negative lookbehind, assert a whitespace boundary to the left
\d+(?:\s+\d+)* Match 1+ digits and optionally repeat matching 1+ whitespace chars and 1+ digits.
(?!\S) Negative lookahead, assert a whitespace boundary to the right
Regex demo
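A minimal Python sketch of the lookaround pattern on the input from the question. Since one match can span several whitespace-separated numbers, the sketch splits each match afterwards; that splitting step is my own addition.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(r"(?<!\S)\d+(?:\s+\d+)*(?!\S)")

query = (
    "12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 "
    "test12.0why 12+12 12*6 2d1139013 09`29 83919 1"
)

# Each match is a run of one or more standalone numbers.
runs = PATTERN.findall(query)
print(runs)

# Split the runs to get the individual IDs.
ids = [number for run in runs for number in run.split()]
print(" ".join(ids))
```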
If lookarounds are not supported, you could use a match with a capture group
(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)
Regex demo
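With the capture group variant, the wanted text is in group 1, because the surrounding whitespace is part of the full match. A minimal Python sketch using re.finditer on the same input:

```python
import re

# Capture group pattern from the answer above.
PATTERN = re.compile(r"(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)")

query = (
    "12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 "
    "test12.0why 12+12 12*6 2d1139013 09`29 83919 1"
)

# Read group 1 of every match; the full match may include whitespace.
captured = [m.group(1) for m in PATTERN.finditer(query)]
print(captured)
```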