Negative Lookahead not match suffix - regex

I have an expression that is matching something, but am trying to get this not to match if it's followed by the suffix: one or more spaces, three dashes, one or more spaces, one or more digits, a slash, and finally one or more digits. Here is the expression:
(?<=(^|\s+))[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!(\s+\-\-\-\s+[0-9]+/[0-9]+))
And here is the text:
January 10.5/13.5 --- 22/26 ---
It's matching January 10.5/13, but I don't want it to match anything.

As lookarounds are supported, you can change the positive lookbehind at the start to a negative lookbehind asserting a whitespace boundary to the left (?<!\S)
You can use .* to it to scan the whole line, instead of starting with 1+ more whitespace chars \s+
The negative lookahead (?!.*\s-{3}\s+[0-9]+/[0-9] asserts that what is on the right is not the suffix.
You can omit the quantifier + after the last character class, as it does not matter if there are 1 or more digits following...as long as it is not a digit.
Note that in the current pattern, the decimal part is an optional capturing group 2. If you want that whole value in group 1, you can make it an optional group.
(?<!\S)[A-Z]+[ ]+([0-9]+(\.[0-9]{1,3})?)/([0-9]+(\.[0-9]{1,3})?)(?!.*\s-{3}\s+[0-9]+/[0-9])
Regex demo

Related

Regex - All before an underscore, and all between second underscore and the last period?

How do I get everything before the first underscore, and everything between the last underscore and the period in the file extension?
So far, I have everything before the first underscore, not sure what to do after that.
.+?(?=_)
EXAMPLES:
111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf
222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf
DESIRED RESULTS:
111111_END TLD 6-01-20 THR LEWISHS
222222_G URS TO 7.25 2-28-19 SA COOPSHS
You can match the following regular expression that contains no capture groups.
^[^_]*|(?!.*_).*(?=\.)
Demo
This expression can be broken down as follows.
^ # match the beginning of the string
[^_]* # match zero or more characters other than an underscore
| # or
(?! # begin negative lookahead
.*_ # match zero or more characters followed by an underscore
) # end negative lookahead
.* # match zero or more characters greedily
(?= # begin positive lookahead
\. # match a period
) # end positive lookahead
.*_ means to match zero or more characters greedily, followed by an underscore. To match greedily (the default) means to match as many characters as possible. Here that includes all underscores (if there are any) before the last one. Similarly, .* followed by (?=\.) means to match zero or more characters, possibly including periods, up to the last period.
Had I written .*?_ (incorrectly) it would match zero or more characters lazily, followed by an underscore. That means it would match as few characters as possible before matching an underscore; that is, it would match zero or more characters up to, but not including, the first underscore.
If instead of capturing the two parts of the string of interest you wanted to remove the two parts of the string you don't want (as suggested by the desired results of your example), you could substitute matches of the following regular expression with empty strings.
_.*_|\.[^.]*$
Demo
This regular expression reads, "Match an underscore followed by zero of more characters followed by an underscore, or match a period followed by zero or more characters that are not periods, followed by the end of the string".
You could use 2 capture groups:
^([^_\n]+_).*\b([^\s_]*_.*)(?=\.)
^ Start of string
([^_\n]+_) Capture group 1, match any char except _ or a newline followed by matching a _
.*\b Match the rest of the line and match a word boundary
([^\s_]*_.*) Capture group 2, optionally match any char except _ or a whitespace char, then match _ and the rest of the line
(?=\.) Positive lookahead, assert a . to the right
See a regex demo.
Another option could be using a non greedy version to get to the first _ and make sure that there are no following underscores and then match the last dot:
^([^_\n]+_).*?(\S*_[^_\n]+)\.[^.\n]+$
See another regex demo.
Looks like you're very close. You could eliminate the names between the underscores by finding this
(_.+?_)
and replacing the returned value with a single underscore.
I am assuming that you did not intend your second result to include the name MIKE.

Regex for extracting digits in a string not in a word and not separated by a symbol?

I want to extract an ID from a search query but I don't know the length of the ID.
From this input I want to get the numbers that are not in the words and the numbers that are not separated by symbols.
12 11231390 good123e41 12he12o1 1391389 dajue1290a 12331 12-10 1.2 test12.0why 12+12 12*6 2d1139013 09`29 83919 1
Here I want to return
12 11231390 1391389 12331 83919 1
So far I've tried /\b[^\D]\d*[^\D]\b/gm but I get the numbers in between the symbols and I don't get the 1 at the end.
You could repeatedly match digits between whitespace boundaries. Using a word boundary \b would give you partial matches.
Note that [^\D] is the same as \d and would expect at least a single character.
Your pattern can be written as \b\d\d*\d\b and you can see that you don't get the 1 at the end as your pattern matches at least 2 digits.
(?<!\S)\d+(?:\s+\d+)*(?!\S)
The pattern matches:
(?<!\S) Negateive lookbehind, assert a whitespace boundary to the left
\d+(?:\s+\d+)* Match 1+ digits and optionally repeat matching 1+ whitespace chars and 1+ digits.
(?!\S) Negative lookahead, assert a whitspace boundary to the right
Regex demo
If lookarounds are not supported, you could use a match with a capture group
(?:^|\s)(\d+(?:\s+\d+)*)(?:$|\s)
Regex demo

Regex to match character if not between digits

I need to match a character to split a big string, let's say -, but not if it's between two digits
In a-b it should match -
In a-4 it should match -
In 3-a it should match -
In 3-4 it should not match
I've tried negative lookahead and lookbehind, but I've only been able to come up with this (?<=\D)-(?=\D)|(?<=\d)-(?=\D)|(?<=\D)-(?=\d)
Is there a simpler way to specify this pattern?
Edit: using regex conditionals I think I can use (?(?<=\D)-|-(?=\D))
The following will work for this scenario. Be sure that your Regex flavor of choice has conditionals, otherwise this will not work:
-(?(?=\d)(?<=\D-))
- // match a dash
(? // If
(?=\d) // the next character is a digit
(?<= // then start a lookbehind (assert preceding characters are)
\D- // a non-digit then the dash we matched
) // end lookbehind
) // end conditional
With nothing as the substitution, as the dash is the only character captured.
Another option is to use an alternation to match a - when on the left is not a digit or match a - when on the right is not a digit:
(?<!\d)-|-(?!\d)
(?<!\d)- Negative lookbehind, assert what is on the left is not a digit and match -
| or
-(?!\d) Match - and assert what is on the right is not a digit using a negative lookahead
Regex demo

Custom Credit Card regex

I'm working with a custom credit card validator which has following conditions:
It must start with a 4,5 or 6.
It must contain exactly 16 digits.
It must only consist of digits (0-9).
It may have digits in groups of 4, separated by one hyphen -.
It must NOT use any other separator like ' ' , '_', etc.
It must NOT have 4 or more consecutive repeated digits.
I'm not able to find a regex fulfilling the last condition.
I have following regex for other conditions:
r'[456][0-9]{3}-?[0-9]{4}-?[0-9]{4}-?[0-9]{4}':
For the last condition you could use a negative lookahead that asserts that what follows is not a digit followed by 3 or more times an optional hypen and a digit.
Explanation for the added part (?![-\d]*([0-9])(?:-?\1){3,})
(?!) Negative lookahead
[-\d]* Match zero or more times a digit or a hyphen
([0-9]) Capture a digit in a group
(?:-?\1){3,} Repeat 3 or more times in a non capturing group an optional hyphen -? followed by a backreference to the digit that is captured in group 1
) Close negative lookahead
Your regex with added anchors to assert the start ^ and the end $ of the line could look like:
^(?![-\d]*([0-9])(?:-?\1){3,})[456][0-9]{3}-?[0-9]{4}-?[0-9]{4}-?[0-9]{4}$

Finding words in a string that start with number (Regex)

I need to find words in a string that start with number(i.e digit)
In following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
But this regex returns the words not starting with alphabets but can't handle the cases when word starts with special characters.
For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
See the regex demo
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
See this regex demo.
NOTE: In case the lookbehind is not supported, replace (?<!\S) with (?:^|\s) and also wrap the rest of the pattern with a capturing group to access the latter later:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
To get word which is starting with number/digit and ending with th/st/nd/rd you can try this.
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) detects the word's starting position
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.
You can check it here