Regex negative match query - regex

I've got a regex issue, I'm trying to ignore just the number '41', I want 4, 1, 14 etc to all match.
I've got this [^\b41\b] which is effectively what I want but this also ignores all single iterations of the values 1 and 4.
As an example, this matches "41", but I want it to NOT match:
\b41\b

Try something like:
\b(?!41\b)(\d+)
The (?!...) construct is a negative lookahead so this means: find a word boundary that is not followed by "41" and capture a sequence of digits after it.

You could use a negative look-ahead assertion to exclude 41:
/\b(?!41\b)\d+\b/
This regular expression is to be interpreted as: At any word boundary \b, if it is not followed by 41\b ((?!41\b)), match one or more digits that are followed by a word boundary.
Or the same with a negative look-behind assertion:
/\b\d+\b(?<!\b41)/
This regular expression is to be interpreted as: Match one or more digits that are surrounded by word boundaries, but only if the substring at the end of the match is not preceded by \b41 ((?<!\b41)).
Or can even use just basic syntax:
/\b(\d|[0-35-9]\d|\d[02-9]|\d{3,})\b/
This matches only sequences of digits surrounded by word boundaries of either:
one single digit
two digits that do not have a 4 at the first position or not a 1 at the second position
three or more digits

This is similar to the question "Regular expression that doesn’t contain certain string", so I'll repeat my answer from there:
^((?!41).)*$
This will work for an arbitrary string, not just 41. See my response there for an explanation.

Related

Find certain colons in string using Regex

I'm trying to search for colons in a given string so as to split the string at the colon for preprocessing based on the following conditions
Preceeded or followed by a word e.g A Book: Chapter 1 or A Book :Chapter 1
Do not match if it is part of emoticons i.e :( or ): or :/ or :-) etc
Do not match if it is part of a given time i.e 16:00 etc
I've come up with a regex as such
(\:)(?=\w)|(?<=\w)(\:)
which satisfies conditions 2 & 3 but still fails on condition 3 as it matches the colon present in the string representation of time. How do I fix this?
edit: it has to be in a single regex statement if possible
You can use
(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)
See the regex demo. Details:
(:\b|\b:) - Group 1: a : that is either preceded or followed with a word char
(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - there should be no one or two digits right after : (followed with a word boundary) if the : is preceded with a single or two digits (preceded with a word boundary).
Note :\b is equal to :(?=\w) and \b: is equal to (?<=\w):.
If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
More flexible solution
Note that excluding matches can be done with a simpler pattern that matches and captures what you need and just matches what you do not need. This is called "best regex trick ever". So, you may use a regex like
8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)
that will match 8:, :P, :D, one or more digits and then one or more sequences of : and one or more digits, or will match and capture into Group 1 a : char that is either preceded or followed with a word char. All you need to do is to check if Group 1 matched, and implement required extraction/replacement logic in the code.
Word characters \w include numbers [a-zA-Z0-9_]
So just use [a-ZA-Z] instead
(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)
Test Here

Is there a way to use Regex to capture numbers out of a string based on a specific leading letters?

I need to extract any number between 4-10 digits that following directly after 'PO#' OR 'PO# ' (with a whitespace). I do not want to include the PO# with the actual value that is extracted, however I do need it as criteria to target the value within a string. If the digits are less than 4 or greater than 10, I do not wish to capture the value and would like to otherwise ignore it.
A sample string would look like this:
PO#12445 for Vendor Enterprise
or
Invoice# 21412556 for Vendor Enterprise for PO# 12445
My current RegEX expression captures PO# with '#' and I use additional logic after the fact to remove the '#', however my expression is also capturing Invoice# and Inv# which I don't want it to do. I'd like it to only target PO#.
Current Expression: [P][O][#]\s*[0-9]{3,9}\d+\w
Any help would be greatly appreciated!
If you need only the digits, you can use \b(?<=PO#)\s?(\d{4,10})\b, with:
(?<=PO#): positivive lookbehind, be sure that this pattern is present before the needed pattern (PO followed by #)
\s?: 0 or 1 whitespace
(\d{4,10}): between 4 and 10 digits
\b: word boundaries to avoid ie. the 10 first digits of a 11 digits pattern match or 'SPO#' to match
Edit: Alexander Mashin is right about the lookbehind having to be fixed width, so \b(?<=PO#)\s?(\d{4,10})\b is better https://regex101.com/r/1KBQd1/5
Edit: added word boundaries
You can use a capturing group and repeat matching the digits 4-10 times using [0-9]{4,10}.
Note that [P][O][#] is the same as PO#
\bPO#\s*([0-9]{4,10})\b
\bPO#\s* Match PO# preceded by a word boundary and match 0+ whitespace chars
( Capture group 1
[0-9]{4,10} Match 4 - 10 digits
)\b Close group followed by a word boundary to prevent the match being part of a larger word
Regex demo
If PCRE is available, how about:
PO#\s*\K\d{4,10}(?=\D|$)
PO#\s* matches the leading substring "PO#" followed by 0 or more whitespaces.
\K resets the starting position of the match and works as a positive (zero length) lookbehind.
\d{4,10} matches a sequence of digits of 4 <= length <= 10.
(?=\D|$) is the positive lookahead to match a non-digit character or the end of the string.

Regex match an entire number with lookbehind and look ahead logic(without word boundaries)

I am trying to detect if a string has a number based on few conditions. For example I don't want to match the number if it's surrounded by parentheses and I'm using the lookahead and lookbehind to do this but I'm running into issues when the number contains multiple digits. Also, the number can be between text without any space separators.
My regex:
(?https://regex101.com/r/RnTSMJ/1
Sample examples:
{2}: Should NOT Match. //My regex Works
{34: Should NOT Match. //My regex matches 4 in {34
45}: Should NOT Match. //My regex matches 4 in {45
{123}: Should NOT Match. //My regex matches 2 in {123}
I looked at Regex.Match whole words but this approach doesn't work for me. If I use word boundaries, the above cases work as expected but then cases like the below don't where numbers are surrounded with text. I also want to add some additional logic like don't match specific strings like 1st, 2nd, etc or #1, #2, etc
updated regex:
(?<!\[|\{|\(|#)(\b\d+\b)(?!\]|\}|\|st|nd|rd|th)
See here https://regex101.com/r/DhE3K4/4
123abd //should match 123
abc345 //should match 234
ab2123cd // should match 2123
Is this possible with pure regex or do I need something more comprehensive?
You could match 1 or more digits asserting what is on the left and right is not {, } or a digit or any of the other options to the right
(?<![{\d#])\d+(?![\d}]|st|nd|rd|th)
Explanation
(?<![{\d#]) Negative lookbehind, assert what is on the left is not {, # or a digit
\d+ Match 1+ digits
(?! Negative lookahead, assert what is on the right is not
[\d}]|st|nd|rd|th Match a digit, } or any of the alternatives
) Close lookahead
Regex demo
The following regex is giving the expected result.
(?<![#\d{])\d+(?!\w*(?:}|(?:st|th|rd|nd)\b))
Regex Link

A special Regular Expression

I want to have a restriction a string which can accept alphanumeric values and hiphen.
I am providing 3 examples to have a clear idea.
1) AS15JKM-125TR-325AMOR
2) ITEW32-DE432OI
3) 09IURE765EDR
There is no specific pattern, There may b 0 to 3 hiphens in a string.
I just want to restrict it in such a way that it should accept only alphanumeric value and
only Hiphen, no other special character.
plz help me on this.
Option 1: No Lookahead
^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$
Note that if you only want uppercase letters, you need to remove a-z
Explanation
The ^ anchor asserts that we are at the beginning of the string
The non-capturing group (?:[A-Za-z0-9]*-) matches zero or more letters or digit, then a hyphen
This is repeated zero to three times, enforcing your limit on hyphens
[A-Za-z0-9]+ matches one or more letters or digit
The $ anchor asserts that we are at the end of the string
Option 2: With Lookahead
This does not present any benefit over the first version, I am just showing it for completion.
^(?=(?:[^-]*-){0,3}[^-]*$)[A-Za-z0-9]+$
Explanation
The lookahead (?=(?:[^-]*-){0,3}[^-]*$) asserts that what follows is
(?:[^-]*-) any number of non-hyphens, followed by a hyphen
{0,3} zero to three times
then [^-]*$ any number of non-hyphens and the end of the string
Option 3: With Negative Lookahead
Courtesy of #Jerry:
^(?!(?:[^-]*-){4})[A-Za-z0-9]+$
Explanation
The negative lookahead (?!(?:[^-]*-){4}) asserts that it is not possible to find a non-hyphen followed by a hyphen four times.
Assuming you do not want to count the hyphens, something like so should work: ^[A-Z0-9 -]+$.
An example of the regex is available here.

regex: find one-digit number

I need to find the text of all the one-digit number.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly, just wanted to ask it is correct to use spaces?
Maybe there is some other way to distinguish one-digit number.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
Use word boundaries. Note that the range quantifier {1} (a single \d will only match one digit) and the character class [] is redundant because it only consists of one character.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
Demo
The regular expression reads, match a digit (\d) that is neither preceded nor followed by digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.