I am new to RegEx and I am having a difficult time trying to detect a pattern.
I want to identify a number between 4000 and 4999 that must NOT be preceded or followed by another number, with an optional space or hyphen "-" between them.
For example:
4567 (match)
I have 4999 roses (match)
1234567 days are gone (no match)
My water supply account is 123 4567 89 (no match)
Howdy, my cell number is 123-4567-89 (no match)
I tried the pattern below:
(?<!(\d))\b4\d{3}\b(?!(\d))
but it still gives me a match for 4567 in 123 4567. I guess there is something special about \b?
Any advice will be highly appreciated.
Thanks,
Eric
You may use
(?<!\d[\s-]|\d)4\d{3}(?![\s-]?\d)
In .NET, ECMAScript 2018-compliant JavaScript environments, or the PyPI regex module, where lookbehind patterns can contain ?, *, + and {min,} quantifiers, you may shorten it to
(?<!\d[\s-]?)4\d{3}(?![\s-]?\d)
Or, if lookbehind alternatives of different lengths are not supported (as in Boost or Python's re), use
(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)
See the regex demo and regex demo 2 (and a .NET regex demo).
Details
(?<!\d[\s-]|\d) / (?<!\d[\s-]?) / (?<!\d[\s-])(?<!\d) - neither a digit followed by a whitespace/-, nor a bare digit, is allowed immediately to the left of the current position
4\d{3} - 4 and any 3 digits
(?![\s-]?\d) - immediately to the right, an optional whitespace/- followed by a digit is not allowed.
NOTE The solutions above do not rely on word boundaries, so they may also match between underscores or when glued to words. If you want to avoid that, add word boundaries, e.g. (?<!\d[\s-]|\d)\b4\d{3}\b(?![\s-]?\d).
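To check the behaviour against the question's samples, here is a minimal sketch using Python's built-in re module. It uses the two-lookbehind variant because, as noted above, re does not accept lookbehinds of varying width; the names PATTERN and samples are illustrative only.

```python
import re

# Python's re only accepts fixed-width lookbehinds, so the "digit + separator"
# check and the "bare digit" check are written as two separate lookbehinds.
PATTERN = re.compile(r"(?<!\d[\s-])(?<!\d)4\d{3}(?![\s-]?\d)")

samples = [
    "4567",
    "I have 4999 roses",
    "1234567 days are gone",
    "My water supply account is 123 4567 89",
    "Howdy, my cell number is 123-4567-89",
]

for text in samples:
    # findall returns every non-overlapping match in the sentence
    print(f"{text!r} -> {PATTERN.findall(text)}")
```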
How about using a positive lookahead and a positive lookbehind along with a negated character class ([^\d])? I think it can get you the desired results.
Pattern:
(?<=^|[^\d]{2})4[0-9]{3}(?=$|[^\d]{2})
Example: https://regex101.com/r/PYPeCk/2/
I'm using proprietary software to look in the body of an email for an SSN using this regex: ((?!666|000)[0-8][0-9\_]{2}[.-]?(?!00)[0-9\_]{2}[.-]?(?!0000)[0-9\_]{4})
It's a pretty common SSN regex and works great.
My issue is that this only matches the SSN when it is the ONLY thing in the body. To get around that, I'm adding .* to the beginning and the end, which works great.
Now my issue is that it also matches 10-digit numbers, which are a different kind of number: our account number. Finally, the question: is there any way to make this regex look only for 9 digits? I'm thinking \d{9}, but I'm not sure how to append it to the end.
If you are using proprietary software where you have to use .* to get a full match, you could use word boundaries \b and an alternation to match either your pattern or 9 digits.
Note that you don't have to escape the underscore in the character class.
.*\b((?:(?!666|000)[0-8][0-9_]{2}[.-]?(?!00)[0-9_]{2}[.-]?(?!0000)[0-9_]{4}|[0-9]{9}))\b.*
Regex demo
If lookarounds are supported, you might also assert that the characters on the left and on the right are not non-whitespace characters (that is, there is whitespace or a string boundary on each side) using (?<!\S) and (?!\S)
.*(?<!\S)((?:(?!666|000)[0-8][0-9_]{2}[.-]?(?!00)[0-9_]{2}[.-]?(?!0000)[0-9_]{4}|[0-9]{9}))(?!\S).*
Regex demo
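A minimal Python sketch of both variants, assuming the email body is available as a plain string. The helper find_ssn and the sample bodies are hypothetical, and Python's re is only a stand-in for the proprietary tool's engine, which may differ in details.

```python
import re

# The SSN pattern from the answer, OR'ed with a plain 9-digit run.
SSN = r"(?:(?!666|000)[0-8][0-9_]{2}[.-]?(?!00)[0-9_]{2}[.-]?(?!0000)[0-9_]{4}|[0-9]{9})"

# Variant 1: word boundaries around the alternation.
WORD_BOUNDARY = re.compile(r".*\b(" + SSN + r")\b.*")
# Variant 2: whitespace (or the string edge) required on both sides.
WHITESPACE = re.compile(r".*(?<!\S)(" + SSN + r")(?!\S).*")


def find_ssn(pattern, body):
    """Return the captured SSN-like number (group 1), or None if nothing matches."""
    m = pattern.search(body)
    return m.group(1) if m else None


for body in ["My SSN is 123-45-6789 thanks", "Account 1234567890 here"]:
    print(body, "->", find_ssn(WORD_BOUNDARY, body), find_ssn(WHITESPACE, body))
```

A 10-digit account number fails both variants because the boundary (or whitespace) check after the ninth digit sees another digit.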
While this may not look very fun at all, it will validate any valid Social Security Number as per the constraints (and exceptions) listed by the Social Security Administration.
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$
^ # Start of expression
(?!219-09-9999|078-05-1120) # Don't allow "219-09-9999" or "078-05-1120" explicitly
(?!666|000|9\d{2})\d{3} # Don't allow the SSN to begin with 666, 000 or anything between 900-999
- # Explicit dash (separating Area and Group numbers)
(?!00)\d{2} # Don't allow the Group Number to be "00"
- # Another dash (separating Group and Serial numbers)
(?!0{4})\d{4} # Don't allow last four digits to be "0000"
$ # End of expression
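Python's re.VERBOSE flag ignores whitespace and treats # as the start of a comment, so the commented breakdown above can be run nearly verbatim. This sketch assumes the dashed format the comments describe (the one-line pattern earlier has no dashes); the name SSN_DASHED is illustrative.

```python
import re

# Verbose-mode version of the commented pattern (dashed SSN format).
SSN_DASHED = re.compile(
    r"""
    ^                              # Start of expression
    (?!219-09-9999|078-05-1120)    # Reject these two specific numbers explicitly
    (?!666|000|9\d{2})\d{3}        # Area number: not 666, 000 or 900-999
    -                              # Dash between Area and Group numbers
    (?!00)\d{2}                    # Group number: not 00
    -                              # Dash between Group and Serial numbers
    (?!0{4})\d{4}                  # Serial number: not 0000
    $                              # End of expression
    """,
    re.VERBOSE,
)

for ssn in ["123-45-6789", "078-05-1120", "666-12-3456", "123-00-4567", "123-45-0000"]:
    print(ssn, bool(SSN_DASHED.match(ssn)))
```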
Follow the instructions found here: http://rion.io/2013/09/10/validating-social-security-numbers-through-regular-expressions-2/
Hope this helps!
I've got the following text:
instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"
instance=hostname1, topic="AB_CD_EF_1235_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"
instance=hostname1, topic="AB_CD_EF_35678_ZY_XW_001_00001"
instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"
I would like to capture numbers from the sample above. I've tried to do so with the regular expressions below and they work well as separate queries:
Regex: .*topic="AB_CD_EF_([^_]+).*
Matches: 12345 1345 1235
Regex: .*topic="AB_CD_EF_GH_([^_]+).*
Matches: 4567 35678 56789
But I need a regex that can give me all the numbers, i.e.:
12345 1345 1235 4567 35678 56789
Make GH_ optional:
.*topic="AB_CD_EF_(GH_)?([^_]+).*
which matches all your target numbers.
See live demo.
You could be more general by allowing any number of "letter letter underscore" sequences using:
.*topic="(?:[A-Z]{2}_)+([^_]+).*
See live demo.
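A minimal Python sketch applying both patterns to the sample lines, one line at a time. With the optional GH_ group the number lands in group 2; with the generic prefix version it is group 1. The variable names are illustrative.

```python
import re

TEXT = """\
instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"
instance=hostname1, topic="AB_CD_EF_1235_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"
instance=hostname1, topic="AB_CD_EF_35678_ZY_XW_001_00001"
instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"
"""

# GH_ is an optional group, so the number is captured by group 2.
optional_gh = re.compile(r'.*topic="AB_CD_EF_(GH_)?([^_]+).*')
# Any run of "two capitals + underscore" prefixes; the number is group 1.
any_prefix = re.compile(r'.*topic="(?:[A-Z]{2}_)+([^_]+).*')

lines = TEXT.splitlines()
print([optional_gh.match(line).group(2) for line in lines])
print([any_prefix.match(line).group(1) for line in lines])
```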
Another option would be an expression similar to:
topic=".*?[A-Z]_([0-9]+)_.*?"
and our desired digits are in this capturing group ([0-9]+).
Please see the demo for additional explanation.
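The same idea as a minimal Python sketch, using a subset of the sample lines (names are illustrative). Because . does not match a newline, findall can scan the whole text at once and returns only the capturing group for each line.

```python
import re

TEXT = """\
instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_1345_ZY_XW_001_00001"
instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"
"""

# Lazily skip to the first "capital letter + underscore" that is directly
# followed by digits; those digits are the only capturing group.
pattern = re.compile(r'topic=".*?[A-Z]_([0-9]+)_.*?"')
print(pattern.findall(TEXT))
```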
From the examples and conditions you've given I think you're going to need a very restrictive regex, but this may depend on how you want to adapt it. Take a look at the following regex and read the breakdown for more information on what it does. Use the first group (there is only one in this regex) as a substitution to retrieve the numbers you are looking for.
Regex
^instance\=hostname[0-9]+\,\s*topic\=\"[A-Z_]+([0-9]+)_[A-Z_]+[0-9_]+\"$
Try it out in this DEMO.
Breakdown
^ # Asserts position at start of the line
hostname[0-9]+ # Matches any and all hostname numbers
\s* # Matches whitespace characters (between 0 and unlimited times)
[A-Z_]+ # Matches any upper-case letter or underscore (between 1 and unlimited times)
([0-9]+) # This captures the number you want
$ # Asserts position at end of the line
Although this does answer the question you asked, I fear it might not be exactly what you're looking for, but without further information this is the best I can give you. In any case, after you've studied the breakdown and played around with the demo a bit, it should prove to be of some help.
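A minimal Python sketch of this pattern (with straight double quotes) on a few of the sample lines. re.MULTILINE makes ^ and $ anchor at every line, which stands in for matching line by line; the variable names are illustrative.

```python
import re

TEXT = """\
instance=hostname1, topic="AB_CD_EF_12345_ZY_XW_001_000001"
instance=hostname2, topic="AB_CD_EF_GH_4567_ZY_XW_01_000001"
instance=hostname2, topic="AB_CD_EF_56789_ZY_XW_001_000001"
"""

# findall returns the single capturing group ([0-9]+) for every matching line.
pattern = re.compile(
    r'^instance\=hostname[0-9]+\,\s*topic\=\"[A-Z_]+([0-9]+)_[A-Z_]+[0-9_]+\"$',
    re.MULTILINE,
)
print(pattern.findall(TEXT))
```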
This regex worked for me:
/.*topic="(?:[AB_CD_EF_(GH_)]{2,3}_)+([^_]+).*/
I'm using regex in PowerShell 5.1.
I need it to detect groups of numbers, but ignore groups followed or preceded by /, so from this it should detect only 9876.
[regex]::matches('9876 1234/56','(?<!/)([0-9]{1,}(?!(\/[0-9])))').value
As it is now, the result is:
9876
123
6
More examples: "13 17 10/20" should only match 13 and 17.
Tried using something like (?!(\/([0-9]{1,}))), but it did not help.
You may use
\b(?<!/)[0-9]+\b(?!/[0-9])
See the regex demo
Alternatively, if the numbers can be glued to text:
(?<![/0-9])[0-9]+(?!/?[0-9])
See this regex demo.
The first pattern is based on word boundaries \b that make sure there are no letters, digits and _ right before and after an expected match. The second one just makes sure there are no digits and / on both ends of the match.
Details
(?<![/0-9]) - a negative lookbehind making sure there is no digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?!/?[0-9]) - a negative lookahead making sure there is no optional / followed by a digit immediately to the right of the current location.
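Python's re module is used below purely as a stand-in for the .NET engine behind PowerShell's [regex]::Matches; both patterns use only fixed-width lookbehinds, which both engines accept. A minimal sketch with illustrative names:

```python
import re

# Numbers must stand alone as words (no letters, digits or _ glued on).
word_bounded = re.compile(r"\b(?<!/)[0-9]+\b(?!/[0-9])")
# Numbers may be glued to letters; only digits and / are excluded at the edges.
glued = re.compile(r"(?<![/0-9])[0-9]+(?!/?[0-9])")

for text in ["9876 1234/56", "13 17 10/20", "abc123 45/6"]:
    print(text, word_bounded.findall(text), glued.findall(text))
```

The last sample shows the difference: only the second pattern finds 123 when it is attached to letters.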
I'm trying to apply a data validation formula to a column, checking if the content is a valid international telephone number. The problem is I can't have +1 or +some dial code because it's interpreted as an operator. So I'm looking for a regex that accepts all these, with the dial code in parentheses:
(+1)-234-567-8901
(+61)-234-567-89-01
(+46)-234 5678901
(+1) (234) 56 89 901
(+1) (234) 56-89 901
(+46).234.567.8901
(+1)/234/567/8901
A starting regex could be this one (which is also where I took the examples from).
This regex matches all the examples you gave us (tested with https://fr.functions-online.com/preg_match_all.html)
/^\(\+\d+\)[\/\. \-]\(?\d{3}\)?[\/\. \-][\d\- \.\/]{7,11}$/m
^ Match the beginning of the string or new line.
To match (+1) and (+61): \(\+\d+\): The plus sign and the parentheses have to be escaped since they have special meaning in regex. \d stands for any digit character and the + means one or more (the + could be replaced by {1,2})
[\/\. \-] This matches a dot, space, slash or hyphen exactly once.
\(?\d{3}\)?: The question marks make the parentheses optional (? = 0 or 1 time). It expects three digits.
[\/\. \-] Same as step 3
[\d\- \.\/]{7,11}: Expects digits, hyphens, spaces, dots or slashes, between 7 and 11 times.
$ Match the end of the line or the end of the string
The m modifier makes the caret (^) and dollar sign ($) also match at line breaks, i.e. at the start and end of each line. Remove it if you want those symbols to match only at the beginning and end of the string.
Slashes are used as delimiters for this regex (other characters can be used as well).
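A minimal Python sketch of this first pattern. The PHP delimiters are dropped, and so is the /m flag, since each number is checked as a separate string. The list holds the question's examples plus one hypothetical invalid input.

```python
import re

# The answer's pattern without PHP delimiters; ^ and $ anchor the whole string.
PHONE = re.compile(r"^\(\+\d+\)[\/\. \-]\(?\d{3}\)?[\/\. \-][\d\- \.\/]{7,11}$")

examples = [
    "(+1)-234-567-8901",
    "(+61)-234-567-89-01",
    "(+46)-234 5678901",
    "(+1) (234) 56 89 901",
    "(+1) (234) 56-89 901",
    "(+46).234.567.8901",
    "(+1)/234/567/8901",
    "+1-234-567-8901",  # hypothetical invalid input: dial code not in parentheses
]

for number in examples:
    print(number, bool(PHONE.match(number)))
```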
I must admit I don't like the last part of the regex, as it does not ensure that you have at least 7 digits.
It would probably be better to remove all the separators (for example with the PHP function str_replace) and deal only with the parentheses and numbers, using this regex
/(\(\+\d+\))(\(?\d{3}\)?)(\d{3})(\d{4})/m
Notice that in this last regex I used 4 capturing groups to match the four sections of the phone number. This regex keeps the parentheses and the plus sign of the first group and the optional parentheses of the second group. To keep only the digit groups, you can use this regex:
/\(\+(\d+)\)\(?(\d{3})\)?(\d{3})(\d{4})/m
Note: The groups are for formatting the phone number after validating it. It is probably better to keep all the phone numbers in your database in the same format.
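A minimal Python sketch of the strip-then-capture approach with the digits-only regex. re.sub stands in for PHP's str_replace here, and the set of separators removed (-, ., / and space) is an assumption based on the examples; split_phone is a hypothetical helper name.

```python
import re

# Dial code, area code, then the 3-digit and 4-digit blocks.
GROUPS = re.compile(r"\(\+(\d+)\)\(?(\d{3})\)?(\d{3})(\d{4})")


def split_phone(number):
    """Strip the assumed separators, then return the four digit groups (or None)."""
    compact = re.sub(r"[-./ ]", "", number)
    m = GROUPS.search(compact)
    return m.groups() if m else None


for number in ["(+1)-234-567-8901", "(+61)-234-567-89-01", "(+1) (234) 56 89 901"]:
    print(number, split_phone(number))
```

Once the separators are gone, the digits are regrouped purely by position, which is what makes this form convenient for storing every number in one format.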
Well, here are different possibilities you can use.
Note: These regexes should be compatible with all regex engines, but it is good practice to specify which language you are working in, because regex engines don't all handle advanced features the same way.
For example, lookbehind was not supported in JavaScript before ECMAScript 2018, and .NET allows more powerful lookbehind than PHP.
Let me know if you need more information.
I want to filter out all .+[0-9]. (correct way?) patterns to avoid duplicate decimal points within a numeral: (e.g., .12345.); but allow non-numerals to include duplicate decimal points: (e.g. .12345*.) where * is any NON-NUMERAL.
How do I include a non-numeral negation value into the regexp pattern? Again,
.12345. <-- error: erroneous numeral.
.12345(. or .12345*. <-- Good.
I think you are looking for
^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$
Here is a demo
Remember to escape the regex properly in Swift:
let rx = "^\\d*(?:\\.\\d+)?(?:(?<=\\d)[^.\\d\\n]+\\.)?$"
REGEX EXPLANATION:
^ - Start of string
\d* - Match zero or more digits
(?:\.\d+)? - Match decimal part, 0 or 1 time (due to ?)
(?:(?<=\d)[^.\d\n]+\.)? - Optionally (due to ? at the end) matches 1 or more symbols preceded with a digit (due to (?<=\d) lookbehind) other than a digit ([^\d]), a full stop ([^.]) or a linebreak ([^\n]) (this one is more for demo purposes) and then followed by a full stop (\.).
$ - End of string
I am using non-capturing groups (?:...) for better performance and usability.
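In Python a raw string avoids the double escaping Swift needs, so the pattern can be used exactly as written. A minimal sketch with an illustrative name:

```python
import re

# ^ and $ anchor the whole string; the optional tail must start right after a
# digit and contain no dots, digits or newlines before the final dot.
VALID = re.compile(r"^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$")

for s in [".12345.", ".12345*.", ".12345(.", "12.5"]:
    print(repr(s), bool(VALID.match(s)))
```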
UPDATE:
If you prefer an opposite approach, that is, matching the invalid strings, you can use a much simpler regex:
\.[0-9]+\.
In Swift, let rx = "\\.[0-9]+\\.". It matches any substrings starting with a dot, then 1 or more digits from 0 to 9 range, and then again a dot.
See another regex demo
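The inverse check as a minimal Python sketch; has_duplicate_point is a hypothetical helper name. A string is reported as invalid when a run of digits is directly enclosed by two dots:

```python
import re

INVALID = re.compile(r"\.[0-9]+\.")


def has_duplicate_point(s):
    """True if some digits sit directly between two dots, e.g. '.12345.'."""
    return INVALID.search(s) is not None


for s in [".12345.", ".12345*.", "1.2.3"]:
    print(repr(s), has_duplicate_point(s))
```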
The regex shorthand class for a non-numeral (non-digit) character is \D. Conversely, if you're looking for only numerals, \d would work.
Without further context of what you're trying to achieve it's hard to suggest how to build a regex for it, though based on your example, (I think) this should work: .+\d+\D+