Regex Negative Lookbehind to Negate a Full Capture - regex

I want to detect numbers that don't have letters before them.
example:
ignore: covid19
accept: 19
I have this regex: (?<![a-z])(\d+) that uses a negative lookbehind to check if there are letters before the numbers, and if so, doesn't capture.
The problem is that if I type covid1 then it is ignored as expected, but if I type covid19 or covid1+[any more numbers] it doesn't get ignored.
How do I do a negative lookbehind that negates the whole capture following it?

Your regex was very close. A few changes are needed if you want to match only numbers that have no letters attached to them.
(?<![\S])(\b\d+\b)
The changes are [\S] in the lookbehind and the two \b word boundaries around \d+.
Explanation of the above regex:
\b - Represents a word-boundary.
\d+ - Matches digit[0-9] one or more times.
\S - Matches any non-whitespace character.
(?<![\S]) - Negative look-behind that asserts there is no non-whitespace character immediately before the 1st capturing group. The \b after \d+ requires a word boundary after the digits.
You can find the demo in here.
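To see why the original pattern lets the trailing digit through, here is a minimal sketch in Python. Python's re module is an assumption (the question does not name a regex flavor), and the names original and fixed are only illustrative.

```python
import re

# Pattern from the question: only the character right before the match is checked.
original = re.compile(r"(?<![a-z])(\d+)")
# Pattern from the answer: no non-whitespace before, word boundaries around the digits.
fixed = re.compile(r"(?<![\S])(\b\d+\b)")

# For "covid19" the engine rejects a match starting at "1" (preceded by "d"),
# but then matches "9", because "9" is preceded by the digit "1", not a letter.
print(original.findall("covid1"))       # []
print(original.findall("covid19"))      # ['9']

print(fixed.findall("covid19"))         # []
print(fixed.findall("covid19 and 19"))  # ['19']
```

The lookbehind only inspects one character, so it cannot reject "the whole number"; the fix works by also forbidding a digit (or any other non-whitespace character) before the match.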

Related

How to use a negative lookahead to prevent my regular expression from matching?

I'm using this regular expression: ^(\d+(?:\.\d+)?) that will match any decimal or integer numeric value that is followed by any character and will capture only the numeric part of it. For example this regex will match the following values and capture the numeric part of them:
10.5
10.5 Inches
10 Inches
However, it seems like my regex will also match the following value: 6" + 1.5". I want to update my regex so that it doesn't match these types of values. So it shouldn't match if there are multiple numeric values.
I tried doing a negative lookahead like this ^(\d+(?:\.\d+)?)(?!\d), but it doesn't seem to be working.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this regex:
^(\d+(?:\.\d+)?)\b(?!.*\d)
RegEx Demo
RegEx Breakdown:
^: Line start
(: Start a capture group
\d+: Match 1+ digits
(?:\.\d+)?: Optionally match dot and 1+ digits
): End capture group
\b: Word boundary
(?!.*\d): Negative lookahead to assert that there is no digit ahead after this match
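The breakdown above can be checked with a short Python sketch. Python's re module is an assumption here, and leading_number is a hypothetical helper name, not something from the question.

```python
import re

# Pattern from the answer: leading number, word boundary, then no digit anywhere ahead.
pattern = re.compile(r"^(\d+(?:\.\d+)?)\b(?!.*\d)")

def leading_number(text):
    """Return the captured number, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None

for sample in ["10.5", "10.5 Inches", "10 Inches", '6" + 1.5"', "123 4"]:
    print(repr(sample), "->", leading_number(sample))
```

The \b matters for inputs like "123 4": without it the engine could backtrack \d+ to a shorter prefix, whereas with it the match has to end where the number ends, and the lookahead then rejects the line.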

Regular expression for SSN without all consecutive numbers

I'm working on a regular expression for SSN with the rules below. I have successfully applied all matching rules except #7. Can someone help alter this expression to include the last rule, #7:
^((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$|(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}$)
1. Hyphens should be optional (this is handled above by using 2 expressions with an OR)
2. Cannot begin with 000
3. Cannot begin with 666
4. Cannot begin with 900-999
5. Middle digits cannot be 00
6. Last four digits cannot be 0000
7. Cannot be all the same number, e.g. 111-11-1111 or 111111111
Add the following negative look ahead anchored to start:
^(?!(.)(\1|-)+$)
See live demo.
This captures the first character then asserts the rest of the input is not made of that captured char or hyphen.
The whole regex can be shortened to:
^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$
See live demo.
The main trick to avoid repeating the regex both with and without hyphens is to capture the optional hyphen (as group 3), then use the backreference \3 in the next hyphen position, so the hyphens are either both present or both absent.
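As a rough check of the shortened pattern against the rules from the question, here is a small Python sketch. Python's re module is an assumption, and is_valid_ssn plus the sample values are made up for illustration.

```python
import re

# Shortened pattern from the answer; group 3 is the optional hyphen reused via \3.
SSN = re.compile(
    r"^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$"
)

def is_valid_ssn(text):
    """True when the whole string satisfies the pattern."""
    return SSN.match(text) is not None

accepted = ["123-45-6789", "123456789"]
rejected = [
    "000-12-3456",  # begins with 000
    "666-12-3456",  # begins with 666
    "900-12-3456",  # begins with 900-999
    "123-00-4567",  # middle digits are 00
    "123-45-0000",  # last four digits are 0000
    "111-11-1111",  # all the same digit, with hyphens
    "111111111",    # all the same digit, without hyphens
    "123-456789",   # only one of the two hyphens present
]

for value in accepted + rejected:
    print(value, is_valid_ssn(value))
```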
First, let's shorten the pattern, as it contains two nearly identical alternatives, one matching SSNs with hyphens and the other matching SSNs without hyphens. Instead of a ^(x-y-z$|xyz$) pattern, you can use a ^x(-?)y\1z$ pattern, so your regex can be reduced to ^(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$, see this regex demo here.
To make a pattern never match a string that contains only identical digits, you may add the following negative lookahead right after ^:
(?!\D*(\d)(?:\D*\1)*\D*$)
It fails the match if there are
\D* - zero or more non-digits
(\d) - a digit (captured in Group 1)
(?:\D*\1)* - zero or more occurrences of zero or more non-digits followed by the same digit as in Group 1, and then
\D*$ - zero or more non-digits till the end of string.
Now, since I suggested shortening the regex to the pattern with backreference(s), you will have to adjust the backreferences after adding this lookahead.
So, your solution looks like
^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$
^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$
Note the \1 in the pattern without the lookahead turned into \2 as (-?) became Group 2.
See the regex demo.
Note also that in some regex flavors \d is not equal to [0-9], because it also matches other Unicode decimal digits.
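A quick Python sketch of this variant follows. Python's re module is an assumption; the name SSN2 and the sample values are illustrative. In Python 3, \d on str patterns matches any Unicode decimal digit unless re.ASCII is passed, which is why the flag is used here.

```python
import re

# Pattern from the answer. Group 1 is the digit inside the lookahead,
# group 2 is the optional hyphen, hence the \2 backreference.
SSN2 = re.compile(
    r"^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$",
    re.ASCII,  # restrict \d to [0-9] and \D to [^0-9]
)

for value in ["123-45-6789", "123456789", "111-11-1111", "111111111",
              "900-12-3456", "123-456789"]:
    print(value, bool(SSN2.match(value)))

# \d versus [0-9]: U+0663 (ARABIC-INDIC DIGIT THREE) is a Unicode decimal digit.
print(bool(re.fullmatch(r"\d", "\u0663")))            # True
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False
```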

Ungreedy with look behind

I have this kind of text:
other text opt1 opt2 opt3 I_want_only_this_text because_of_this
And am using this regex:
(?<=opt1|opt2|opt3).*?(?=because_of_this)
Which returns me:
opt2 opt3 I_want_only_this_text
However, I want to match only "I_want_only_this_text".
What is the best way to achieve this?
I don't know in what order the opt's will appear and they are only examples. Actual words will be different and there will be more of them.
Actual data:
regex
(?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
text
こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります
what I want to get:
冷凍で
You can use
(?<=※|を|備考|町|品は|。)(?:(?!※|を|備考|町|品は|。).)*?(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
See the regex demo. The scheme is the same as in (?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this) (see demo).
The tempered greedy token solution allows you to match multiple occurrences of the same pattern in a longer string.
Details
(?<=※|を|備考|町|品は|。) - a positive lookbehind that matches a location that is immediately preceded with one of the alternatives listed in the lookbehind
(?:(?!※|を|備考|町|品は|。).)*? - any char other than a line break char, zero or more but as few as possible occurrences, that is not a starting point of any of the alternative patterns in the negative lookahead
(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) - a positive lookahead that requires one of the alternative patterns to appear immediately to the right of the current location.
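Here is a minimal Python sketch of the tempered greedy token on the opt1/opt2/opt3 example. Python's re module is an assumption. It only accepts fixed-width lookbehinds, so the Japanese pattern, whose lookbehind alternatives differ in length, would not compile there; the opt version works because every alternative is four characters long.

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# Original attempt: the lookbehind is satisfied right after "opt1", so the
# lazy .*? starts there and has to run all the way to "because_of_this".
naive = re.search(r"(?<=opt1|opt2|opt3).*?(?=because_of_this)", text)

# Tempered version: each consumed character must not start another "optN",
# so attempts beginning after "opt1" or "opt2" fail and the engine moves on.
tempered = re.search(
    r"(?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this)", text
)

print(repr(naive.group()))     # ' opt2 opt3 I_want_only_this_text '
print(repr(tempered.group()))  # ' I_want_only_this_text '
```

The match keeps the surrounding spaces; call .strip() on the result if they are not wanted.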
You could add a negative lookahead (?!\s*opt\d) to assert that there is no opt and a digit to the right. You can use a character class to list the digits 1, 2 and 3 instead of using the alternation with |.
(?<=\bopt[123]\s(?!\s*opt\d)).*?(?=\s*\bbecause_of_this\b)
Regex demo
It might be a bit more efficient to use a match with a capture group:
\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b
Regex demo
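The capture-group variant can be tried with a few lines of Python (the re module is an assumption; the variable names are illustrative):

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# Match "optN" plus one whitespace char, refuse if another "optN" follows,
# then capture lazily up to the optional whitespace before "because_of_this".
capture_pattern = re.compile(r"\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b")

m = capture_pattern.search(text)
print(m.group(1))  # I_want_only_this_text
```

Because the wanted text sits in group 1, no lookbehind is needed, and the whitespace around it is excluded from the capture.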
What about:
.*\bopt[123]\b\s*(.*?)\s*because_of_this\b
See the online demo.
.* - A greedy match of any character other than newline up to the last occurrence of:
\bopt[123]\b - A word boundary followed by literally "opt" with a trailing number 1, 2 or 3 and another word boundary.
\s* - 0+ whitespace characters.
(.*?) - A 1st capture group with a lazy match of 0+ characters up to:
\s* - 0+ whitespace characters.
because_of_this\b - Literally "because_of_this" followed by a word-boundary.
If you need to have this written out in alternations:
.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b
See that demo.
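Both forms of this greedy-prefix pattern can be compared in a short Python sketch (the re module is an assumption; short_form and spelled_out are illustrative names):

```python
import re

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"

# The leading greedy .* runs to the end of the line and backtracks, so the
# "opt" word that finally matches is the last one before the captured text.
short_form = re.compile(r".*\bopt[123]\b\s*(.*?)\s*because_of_this\b")
spelled_out = re.compile(r".*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b")

print(short_form.search(text).group(1))   # I_want_only_this_text
print(spelled_out.search(text).group(1))  # I_want_only_this_text
```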

Negative Lookahead not match suffix

I have an expression that is matching something, but am trying to get this not to match if it's followed by the suffix: one or more spaces, three dashes, one or more spaces, one or more digits, a slash, and finally one or more digits. Here is the expression:
(?<=(^|\s+))[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!(\s+\-\-\-\s+[0-9]+/[0-9]+))
And here is the text:
January 10.5/13.5 --- 22/26 ---
It's matching January 10.5/13, but I don't want it to match anything.
As lookarounds are supported, you can change the positive lookbehind at the start to a negative lookbehind asserting a whitespace boundary to the left: (?<!\S).
You can add .* to the negative lookahead so that it scans the rest of the line, instead of starting it with 1 or more whitespace chars \s+.
The negative lookahead (?!.*\s-{3}\s+[0-9]+/[0-9]) asserts that the suffix does not occur anywhere to the right.
You can omit the quantifier + after the last character class, as it does not matter whether 1 or more digits follow the slash, as long as there is at least one digit.
Note that in the current pattern, the decimal part is an optional capturing group 2. If you only want the whole value in group 1, you can make the decimal part an optional non-capturing group (?:...)?.
(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!.*\s-{3}\s+[0-9]+/[0-9])
Regex demo
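The difference between checking the suffix only at the current position and scanning the rest of the line can be sketched in Python. Several things here are assumptions: Python's re module as the flavor, case-insensitive matching (the question's [A-Z]+ has to cover "January"), and the at_position pattern, which pairs the question's lookahead with (?<!\S) because Python's re rejects the variable-width lookbehind (?<=(^|\s+)).

```python
import re

text = "January 10.5/13.5 --- 22/26 ---"
flags = re.IGNORECASE  # assumption: the original match was case-insensitive

# Lookahead as written in the question: only checked right after the match,
# so the engine backtracks to "13" where the suffix no longer follows directly.
at_position = re.compile(
    r"(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)"
    r"(?!(\s+\-\-\-\s+[0-9]+/[0-9]+))",
    flags,
)

# Pattern from the answer: .* lets the lookahead find the suffix anywhere ahead.
whole_line = re.compile(
    r"(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)"
    r"(?!.*\s-{3}\s+[0-9]+/[0-9])",
    flags,
)

print(at_position.search(text).group())                # January 10.5/13
print(whole_line.search(text))                         # None
print(whole_line.search("January 10.5/13.5").group())  # January 10.5/13.5
```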

Regex to match character if not between digits

I need to match a character to split a big string, let's say -, but not if it's between two digits
In a-b it should match -
In a-4 it should match -
In 3-a it should match -
In 3-4 it should not match
I've tried negative lookahead and lookbehind, but I've only been able to come up with this (?<=\D)-(?=\D)|(?<=\d)-(?=\D)|(?<=\D)-(?=\d)
Is there a simpler way to specify this pattern?
Edit: using regex conditionals I think I can use (?(?<=\D)-|-(?=\D))
The following will work for this scenario. Be sure that your Regex flavor of choice has conditionals, otherwise this will not work:
-(?(?=\d)(?<=\D-))
- // match a dash
(? // If
(?=\d) // the next character is a digit
(?<= // then start a lookbehind (assert preceding characters are)
\D- // a non-digit then the dash we matched
) // end lookbehind
) // end conditional
Use nothing as the substitution, as the dash is the only character matched.
Another option is to use an alternation to match a - when on the left is not a digit or match a - when on the right is not a digit:
(?<!\d)-|-(?!\d)
(?<!\d)- Negative lookbehind, assert what is on the left is not a digit and match -
| or
-(?!\d) Match - and assert what is on the right is not a digit using a negative lookahead
Regex demo
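The alternation can be tried against the four cases from the question with Python's re.split. Python is an assumption here; its re module only supports conditionals on a group id or name, so the lookaround-conditional pattern is not shown.

```python
import re

# Alternation from the answer: a dash with no digit on its left,
# or a dash with no digit on its right.
dash = re.compile(r"(?<!\d)-|-(?!\d)")

for sample in ["a-b", "a-4", "3-a", "3-4", "a-b-3-4-c"]:
    print(sample, "->", dash.split(sample))
```

Only a dash with digits on both sides fails both alternatives, so "3-4" stays in one piece while every other dash splits the string.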