Regular Expression Problem - Matching A Single Letter, Excluding Consecutive Letters

I'm having trouble creating a regular expression that gives me what I want, and I need your help! The text we are using is:
S 1SS 1S
"S" and "1S" should match; "1SS" should not. I would like something a little more specific than just excluding anything with three characters, though that may be a solution.
Any other ideas on how to exclude "1SS"? I can't figure it out!
Thank you,
Mark S.

You can use a negative lookahead pattern to avoid matching a consecutive letter S:
\b\d*S(?!S)
Demo: https://regex101.com/r/sv467b/2
Explanation: \b matches a word boundary, which ensures that the pattern won't match the second S in two consecutive Ses. \d* matches zero or more digits to allow optional preceding numbers. S is followed by (?!S), a negative lookahead that ensures the S is not followed by another S.
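A quick way to sanity-check the pattern is Python's re module (an assumption: the answer doesn't name a flavor, but any engine with lookahead support should behave the same here). re.findall returns every non-overlapping match in the sample text:

```python
import re

# \b     : start at a word boundary, so the second S of "SS" can't start a match
# \d*    : optional leading digits
# S(?!S) : an S that is not followed by another S
PATTERN = re.compile(r"\b\d*S(?!S)")

text = "S 1SS 1S"
matches = PATTERN.findall(text)
print(matches)  # ['S', '1S']
```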

A regexp with more general applications is something like:
\b(?:(.)(?!\1))+\b
\b is for word boundaries.
(.) captures a single character.
(?:) is a non-capturing group.
(?!) is a negative lookahead group.
\1 is the backreference to the captured character.
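A small Python sketch for trying the general pattern on the question's text. Because the pattern has a capturing group, re.findall would return the captured character instead of the whole match, so the sketch collects match.group(0) from re.finditer. One caveat: `.` also matches a space, so on this input the repeated group can run across a space and then backtrack to the nearest word boundary, which leaves stray spaces in the matches. The \w variant below is a swapped-in variation, not the answer's exact pattern:

```python
import re

text = "S 1SS 1S"

# As posted: (.) captures one character and (?!\1) forbids an immediate repeat.
as_posted = re.compile(r"\b(?:(.)(?!\1))+\b")
# Variation (assumption): \w instead of . so a match can't span whitespace.
word_only = re.compile(r"\b(?:(\w)(?!\1))+\b")

# Collect whole matches (group 0), not the last capture of group 1.
posted_matches = [m.group(0) for m in as_posted.finditer(text)]
word_matches = [m.group(0) for m in word_only.finditer(text)]

print(posted_matches)  # ['S ', ' 1S']
print(word_matches)    # ['S', '1S']
```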

Related

Regular expression for SSN without all consecutive numbers

I'm working on a regular expression for SSNs with the rules below. I have successfully applied all the matching rules except #7. Can someone help alter this expression to include the last rule, #7?
^((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$|(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}$)
1. Hyphens should be optional (this is handled above by using two expressions joined with an OR)
2. Cannot begin with 000
3. Cannot begin with 666
4. Cannot begin with 900-999
5. Middle digits cannot be 00
6. Last four digits cannot be 0000
7. Cannot be all the same digit, e.g. 111-11-1111 or 111111111
Add the following negative lookahead, anchored to the start:
^(?!(.)(\1|-)+$)
This captures the first character, then asserts that the rest of the input is not made up only of that captured character or hyphens.
The whole regex can be shortened to:
^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$
The main trick to avoid repeating the regex both with and without hyphens is to capture the optional hyphen (as group 3), then use a backreference \3 to that capture in the next separator position, so the hyphens are either both present or both absent.
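A sketch that runs the compact pattern through Python's re module against a few hand-picked strings (the sample values are made up for illustration, roughly one per rule):

```python
import re

# Compact form from the answer above: group 3 is the optional hyphen, and \3
# forces the second separator to be the same as the first.
SSN = re.compile(r"^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$")

samples = {
    "123-45-6789": True,
    "123456789": True,
    "111-11-1111": False,  # all the same digit (rule 7)
    "111111111": False,    # all the same digit, no hyphens
    "123-456789": False,   # hyphens must be both present or both absent
    "666-12-3456": False,  # begins with 666
    "900-12-3456": False,  # begins with 9xx
    "123-00-4567": False,  # middle digits are 00
    "123-45-0000": False,  # last four digits are 0000
}

results = {s: SSN.match(s) is not None for s in samples}
print(results == samples)  # True
```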
First, let's shorten the pattern, as it contains two nearly identical alternatives: one matches SSNs with hyphens, and the other matches SSNs without hyphens. Instead of a ^(x-y-z$|xyz$) pattern, you can use a ^x(-?)y\1z$ pattern, so your regex can be reduced to ^(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$.
To make a pattern never match a string that contains only identical digits, you may add the following negative lookahead right after ^:
(?!\D*(\d)(?:\D*\1)*\D*$)
It fails the match if the whole string consists of:
\D* - zero or more non-digits,
(\d) - a digit (captured in Group 1),
(?:\D*\1)* - zero or more occurrences of zero or more non-digits followed by the same digit as in Group 1, and then
\D*$ - zero or more non-digits up to the end of the string.
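To see the lookahead on its own, here is a small Python check (Python's re is an assumption; the answer doesn't name a flavor). Anchored with ^, it rejects strings made of a single repeated digit, whatever non-digits are interleaved; `passes_guard` is a hypothetical helper name:

```python
import re

# Negative lookahead from the answer, anchored at the start. It fails when the
# string is: non-digits, one digit (group 1), then only repeats of that digit
# separated by non-digits, up to the end.
ALL_SAME_DIGIT_GUARD = re.compile(r"^(?!\D*(\d)(?:\D*\1)*\D*$)")

def passes_guard(s: str) -> bool:
    # An empty match still counts: only the lookahead decides the outcome.
    return ALL_SAME_DIGIT_GUARD.match(s) is not None

print(passes_guard("111-11-1111"))  # False
print(passes_guard("111111111"))    # False
print(passes_guard("123-45-6789"))  # True
```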
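Since I suggested shortening the regex to a pattern with a backreference, you will have to adjust the backreference numbers after adding this lookahead, because the lookahead introduces a new capturing group.
So, your solution looks like this (the second version uses [0-9] instead of \d):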
^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$
^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$
Note the \1 in the pattern without the lookahead turned into \2 as (-?) became Group 2.
Note also that in some regex flavors \d is not equal to [0-9].
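Python 3 is one flavor where the difference shows: on str patterns, \d matches any Unicode decimal digit unless re.ASCII is set. A sketch using U+0663 (ARABIC-INDIC DIGIT THREE) as the example character, run against both final patterns from this answer:

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit (category Nd), not ASCII

# \d on a str pattern accepts it unless re.ASCII is set; [0-9] never does.
unicode_d = re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None
ascii_d = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is not None
ascii_class = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE) is not None

# The two final patterns from the answer.
SSN_D = re.compile(r"^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$")
SSN_09 = re.compile(r"^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$")

mixed = "1" + ARABIC_INDIC_THREE * 2 + "-45-6789"
print(unicode_d, ascii_d, ascii_class)  # True False False
print(SSN_D.match(mixed) is not None)   # True: \d{2} accepts the two non-ASCII digits
print(SSN_09.match(mixed) is not None)  # False
```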

Regex Negative Lookbehind to Negate a Full Capture

I want to detect numbers that don't have letters before them.
Example:
ignore: covid19
accept: 19
I have this regex: (?<![a-z])(\d+), which uses a negative lookbehind to check whether there are letters before the numbers and, if so, not capture them.
The problem is that if I type covid1 it is ignored as expected, but if I type covid19 (or covid1 followed by any more digits), it doesn't get ignored.
How do I do a negative lookbehind that negates the whole capture following it?
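Your regex was very close. The lookbehind only checks the single character right before where the match starts, so in covid19 the engine skips the 1 (preceded by d) and matches the 9, which is preceded by a digit rather than a letter. If you want to match only numbers that have no letters attached, make a few changes:
(?<![\S])(\b\d+\b)
The changes are [\S] in the lookbehind (instead of [a-z]) and the \b word boundaries around \d+.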
Explanation of the above regex:
\b - Represents a word-boundary.
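\d+ - Matches a digit ([0-9]) one or more times.
\S - Matches any non-whitespace character.
(?<![\S]) - Negative lookbehind asserting that the 1st capturing group is not preceded by a non-whitespace character, so the number must be preceded by whitespace or sit at the start of the string. The trailing \b requires that the digits are not followed by another word character.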
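A Python sketch comparing the question's pattern with the answer's on one made-up test string (Python's re is an assumption; both patterns use a fixed-width lookbehind, which it supports):

```python
import re

text = "covid19 19 covid1"

# Original attempt: the lookbehind only inspects the character right before
# the match start, so in "covid19" the engine skips "1" and matches "9".
original = re.findall(r"(?<![a-z])(\d+)", text)

# Answer's pattern: no non-whitespace character may precede the number, and
# the trailing \b stops it from ending in front of another word character.
fixed = re.findall(r"(?<![\S])(\b\d+\b)", text)

print(original)  # ['9', '19']
print(fixed)     # ['19']
```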

How to consume lookaround in regex?

I want to match
abc_def_ghi,
abc_abc_ghi,
abc_a2a_ghi,
abc_999_ghi
but not abc_xxx_ghi (with xxx in center).
I came up with manually consuming the lookahead (abc_(?!xxx)..._ghi), but I wonder whether there is another way that doesn't require specifying the number of characters to skip.
The original question was about numbers; I've updated it for the string case.
If you don't want to specify exactly how many characters to skip, you could use a quantifier like + in the negative lookahead, and a negated character class to match anything except an underscore.
\babc_(?!x+_)[^_]+_ghi\b
Explanation
\babc_ Word boundary, match abc_
(?! Negative lookahead, assert what is directly on the right is not
x+_ Match 1+ times x followed by an underscore
) Close lookahead
[^_]+_ Negated character class, match 1+ times any char except _
ghi\b Match ghi and word boundary
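A minimal Python check of this pattern against the question's examples (Python's re is an assumption). Note that (?!x+_) rejects any middle part made only of x characters, so abc_x_ghi and abc_xxxx_ghi are excluded as well:

```python
import re

PATTERN = re.compile(r"\babc_(?!x+_)[^_]+_ghi\b")

SAMPLES = ["abc_def_ghi", "abc_abc_ghi", "abc_a2a_ghi", "abc_999_ghi", "abc_xxx_ghi"]

# search() because the pattern carries its own \b anchors.
results = {s: PATTERN.search(s) is not None for s in SAMPLES}
for s, ok in results.items():
    print(s, ok)
```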
For the original numeric version of the question, you can use this:
123_(?:(?!000)\d){3}_789
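A Python sketch of this tempered-token approach, with made-up sample strings (Python's re is an assumption):

```python
import re

# (?!000) is tested before each of the three digits; only the check before the
# first digit can actually see "000", which is enough to reject that value.
PATTERN = re.compile(r"123_(?:(?!000)\d){3}_789")

samples = ["123_100_789", "123_999_789", "123_000_789"]
results = {s: PATTERN.fullmatch(s) is not None for s in samples}
print(results)  # {'123_100_789': True, '123_999_789': True, '123_000_789': False}
```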
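If you don't wish to use lookarounds, this expression might be an option. The unwanted abc_xxx_ghi is consumed by the first, non-capturing alternative, so you keep only the matches where group 1 is set: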
(?:abc_xxx_ghi)|(abc_.{3}_ghi)
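In Python (an assumption; any flavor with alternation and groups works the same), the filtering step could look like this sketch:

```python
import re

# The unwanted string is matched by the first alternative but never captured;
# keep only matches where group 1 participated.
PATTERN = re.compile(r"(?:abc_xxx_ghi)|(abc_.{3}_ghi)")

text = "abc_def_ghi abc_xxx_ghi abc_999_ghi"
wanted = [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]
print(wanted)  # ['abc_def_ghi', 'abc_999_ghi']
```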
Other than that I can't think of anything else.

Regex to match this pattern (numbers must be between brackets, separated by a minus)

I want to match a pattern with regex, the pattern is:
A-Za-z1-9[0-9-0-9]
so for example:
test1[1-50]
Can you help me?
Solution update:
^[A-Za-z0-9]+\[[0-9]+-[0-9]+]$
Use this regex: [A-Za-z]+[1-9]\[[0-9]+-[0-9]+\]. You might also want to add \b at the start of the regex so that it only matches after a non-word character.
[A-Za-z]+ matches things like test: only letters are accepted, one or more times
[1-9] matches any digit except 0
\[[0-9]+-[0-9]+\] matches two runs of one or more digits separated by -, all enclosed in square brackets. (The brackets are escaped with \ because they are metacharacters.)
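A quick Python check of this regex (Python's re is an assumption), using re.fullmatch so the whole string must fit; this plays the role that ^ and $ play in the asker's updated version:

```python
import re

# Letters, one digit 1-9, then "[digits-digits]" with escaped brackets.
PATTERN = re.compile(r"[A-Za-z]+[1-9]\[[0-9]+-[0-9]+\]")

print(PATTERN.fullmatch("test1[1-50]") is not None)  # True
print(PATTERN.fullmatch("test0[1-50]") is not None)  # False: [1-9] excludes 0
print(PATTERN.fullmatch("test1[1,50]") is not None)  # False: needs "-"
```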

Regex: how to use lookahead/lookbehind on the result of a pattern?

I'm trying to learn more about regex today.
I'm simply trying to match an order number not surrounded by brackets (#1234 but not [#1234]) but my question is more in general about using lookahead assertions on an arbitrary pattern.
On my first attempts I noticed that with the negative lookahead \d+(?!\]), the \d+ would backtrack, giving up digits until what it had matched wasn't followed by a ]. I need the digits to match only if the entire run isn't followed by a ].
My current solution kills the match at the first digit by looking ahead to see if there's a ] in the digit chain.
Is this a standard way to go about this? I'm just repeating the match pattern in the lookahead. If this were a more complex regex, would I approach it the same? Repeat the valid match followed by the invalid match and have the regex engine repeat itself for every letter?
For valid matches, it would have to match itself as many times as the characters in the match.
(?<!\[)       # not preceded by [
\#\d+         # the order number; # is escaped because in free-spacing mode it starts a comment
(?!\d*\])     # not followed by zero or more digits and ]
# or (?!\d|\])  # not followed by a digit or ]
I'd appreciate any feedback!
You can achieve what you want by using a possessive quantifier along with lookarounds like this
(?<!\[)#\d++(?!\])
The problem in your case is that \d+ allows backtracking: when the full run of digits is followed by ], the engine gives back digits and ends up with a partial match such as #123. With a possessive quantifier it will not backtrack, so it matches only if the whole sequence of digits is not preceded/followed by brackets.
Edit
If possessive quantifiers are not supported, you can use this one instead:
#\d(?<!\[#\d)(?!\d*\])\d*
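Python's re module only supports possessive quantifiers from version 3.11, so this sketch sticks to the fallback pattern and compares it with the plain greedy version on a made-up input, reproducing the partial #123 match described above:

```python
import re

text = "#1234 [#1234] #1234]"

# Plain greedy \d+: for "#1234]" the engine backtracks to "#123", which is
# followed by "4", not "]", so the lookahead passes on a partial number.
greedy = re.findall(r"(?<!\[)#\d+(?!\])", text)

# Fallback from the answer: consume "#" and one digit, reject a "[#digit"
# prefix with the lookbehind, reject a digit run ending in "]" with the
# lookahead, then take the remaining digits.
fallback = re.findall(r"#\d(?<!\[#\d)(?!\d*\])\d*", text)

print(greedy)    # ['#1234', '#123']
print(fallback)  # ['#1234']
```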