Capturing exact amount of numbers without Whitespaces - regex

I have a series of 8-digit numbers that I need to capture via RegEx.
Single whitespaces can occur before, after and in some cases between the digits. In some cases, other chars follow. Here are the most common variations, each of which I want to capture as 12345678:
123456789
12345678
1234567 89S
12345 678 9
123 456789
123456 789
Is this possible?

I think a regex like:
(( )?\d){8}
Would suffice to capture the digits - I'd then remove the whitespace (before further processing) as a separate step.
I'm not sure how strictly to interpret the OP's "single whitespaces" requirement, but it's why I've structured my RegEx to accept 8 digits, each of which is optionally prefixed by a single space character.
If it should only match if there are single spaces, and not any more, the above works whereas the "strip whitespace first" or "strip non-digits first" approaches will not.
If more spaces are allowed, it's easy to change the ? to a * or any fixed upper limit.
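A minimal Python sketch of this match-then-strip approach. The function name `first_eight_digits` is made up for illustration, and the inner group is written as non-capturing, which does not change what is matched:

```python
import re

# One digit, optionally preceded by a single space, repeated 8 times.
PATTERN = re.compile(r"(?: ?\d){8}")

def first_eight_digits(text):
    """Return the first run of 8 digits with spaces removed, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Second step: drop the spaces that were allowed between digits.
    return m.group(0).replace(" ", "")

for sample in ["123456789", "1234567 89S", "12345 678 9", "12  345678"]:
    print(repr(sample), "->", first_eight_digits(sample))
```

The last sample contains a double space, so it is rejected, which is the strict "single whitespace" reading described above.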

This is not possible in a single "regex" step. I can go into more detail if you like, but basically regex cannot "count" (it can only match a specified match size, such as "8 numbers", but not "an unknown number of characters, 8 of which are numbers").
You need to do this in two stages -
first remove whitespace.
then perform a regex match.
For instance, in ruby:
thingtomatch = " 12 3456 7899X"
temp = thingtomatch.delete(' ') # removes every space => temp="1234567899X"
matched_digits = temp.match(/(\d{8}).*/)[1]
(Or, as other answers have suggested, you could perform a regex match and then remove whitespace from the result.)
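For comparison, here is the same strip-first idea sketched in Python. The helper name `leading_eight_digits` is hypothetical. Note that this version accepts any amount of whitespace between digits, because all of it is removed before matching:

```python
import re

def leading_eight_digits(text):
    """Remove all whitespace, then return the first 8 digits at the start, or None."""
    stripped = re.sub(r"\s+", "", text)   # stage 1: remove whitespace
    m = re.match(r"\d{8}", stripped)      # stage 2: match 8 leading digits
    return m.group(0) if m else None

print(leading_eight_digits(" 12 3456 7899X"))
```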

You can do it, but in two steps:
First, remove non-digits:
s/[^\d]//g
Second, match your digits:
m/^(\d{8})$/

Regex Giftcard number pattern

I am trying to come up with a regex for a giftcard number pattern in an application. I have this so far and it works fine:
(?:5049\d{12}|6219\d{12}) = 5049123456789012
What I need to account for though is numbers that are separated by dashes or spaces like so:
5049-1234-5678-9012
5049 1234 5678 9012
Can I chain these patterns together or do I need to make a separate pattern for each type?
The simplest regex could be:
(?:(5049|6219)([ -]?\d{4}){3})
Explanation:
(5049|6219) - Will check for the '5049' or '6219' start
(x){3} - Will repeat the (x) 3 times
[ -]? - Will look for " " or "-", ? accepts it once or 0 times
\d{4} - Will look for a digit 4 times
A more detailed explanation and example can be found here: https://regex101.com/r/A46GJp/1/
Use (?:5049|6219)(?:[ -]?\d{4}){3}
First, match one of the two leads. Then match 3 groups of 4 digits each, each group optionally preceded by space or dash.
See regex101 for a demo, which also explains the pattern in more detail.
The above regex will also match if separators are mixed, e.g. 5049 1234-5678 9012. If you don't want that, use
(?:5049|6219)([ -]?)\d{4}(?:\1\d{4}){2} regex101
This captures the first separator, if any, and specifies that the following 2 groups must use that same separator.
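A short Python sketch comparing the two patterns from this answer. The names `MIXED_OK`, `SAME_SEP` and `is_card` are chosen here for illustration, and `fullmatch` is used on the assumption that the whole input should be a card number:

```python
import re

# Any separator (or none) before each group of 4 digits.
MIXED_OK = re.compile(r"(?:5049|6219)(?:[ -]?\d{4}){3}")
# The first separator is captured and must be reused via the backreference \1.
SAME_SEP = re.compile(r"(?:5049|6219)([ -]?)\d{4}(?:\1\d{4}){2}")

def is_card(pattern, text):
    """True if the whole string matches the given card pattern."""
    return pattern.fullmatch(text) is not None

for s in ["5049123456789012", "5049-1234-5678-9012",
          "5049 1234 5678 9012", "5049 1234-5678 9012"]:
    print(s, is_card(MIXED_OK, s), is_card(SAME_SEP, s))
```

Only the last sample, which mixes a space and a dash, is treated differently by the two patterns.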
Try this :
(?:(504|621)9(\d{12}|(\-\d{4}){3}|(\s\d{4}){3}))
https://regex101.com/r/SyjaT5/6

Regex - matching while ignoring some characters

I am trying to write a regex to match a sequence of numbers that is 5 digits long or over, but I ignore any spaces, dashes, parens, or hashes when doing that analysis. Here's what I have so far.
(\d|\(|\)|\s|#|-){5,}
The problem with this is that it will match any sequence of 5 characters, including the characters I want to ignore, so something like "#123 " would match. While I do want to ignore the # and space characters, I still need the number itself to be 5 digits or more in order to qualify as a match.
To be clear, these would match:
1-2-3-4-5
123 45
2(134) 5
Bonus points if the matching begins and ends with a number rather than with one of those "special characters" I am excluding.
Any tips for doing this kind of matching?
If I understood requirements right you can use:
^\d(?:[()\s#-]*\d){4,}$
RegEx Demo
It always matches a digit at start. Then it is followed by 4 or more of a non-capturing group i.e. (?:[()\s#-]*\d) which means 0 or more of any listed special character followed by a digit.
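A quick way to check this pattern against the examples from the question, sketched in Python (the function name `has_five_digits` is an illustrative choice):

```python
import re

# A digit, then at least 4 more digits, each optionally preceded by
# any run of the ignorable characters ( ) whitespace # -
PATTERN = re.compile(r"^\d(?:[()\s#-]*\d){4,}$")

def has_five_digits(text):
    """True if text starts and ends with a digit and holds 5+ digits overall."""
    return PATTERN.match(text) is not None

for s in ["1-2-3-4-5", "123 45", "2(134) 5", "#123 ", "1234"]:
    print(repr(s), has_five_digits(s))
```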
So just repeat a digit, followed by any other sequence of allowed characters 5 or more times:
^(\d[()\s#-]*){5,}$
You can ensure it ends on a digit if you subtract one of the repetitions and add an explicit digit at the end:
^(\d[()\s#-]*){4,}\d$
You can match non-digits with \D, so it would be something like:
(\d\D*){5,}
Here is a guide.

Reg Exp: match specific number of characters or digits

My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1JONESx
Where x can be anything of any length but the SMITH or JONES part of the input string is checked for 5 upper case characters only
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
Should not pass because SMITH now contains 3 additional characters, ABC, and ME is only 2 characters.
I only pass if after 1.1 there are 5 characters only
Question Part 2
How do I match on specific number of digits ?
Not bothered what they are, it's the number of them that I can't get working
if I use ^\d{1}$ I'd have thought it'll only pass if one digit is present
It will pass 5 but it also passes 67
It should fail 67 as it's two digits in length.
The RegExp should pass only if 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This is an unnecessary qualifier since regex will default to looking for one character (/abc/ matches abc not aaabbbccc).
To fix the issue, we just need to replace your final .*. This says match 0+ characters of anything. If we make this "dot-match-all" more specific (i.e. [^A-Z]), you won't match SMITHABC.
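Here is a small Python check of that pattern against the four sample strings from the question. The helper name `passes` is hypothetical:

```python
import re

# digit, dot, digit, exactly 5 uppercase letters, then no uppercase until the end
PATTERN = re.compile(r"^.*\d\.\d[A-Z]{5}[^A-Z]*$")

def passes(text):
    """True if the text contains d.d followed by exactly 5 uppercase letters."""
    return PATTERN.match(text) is not None

samples = [
    "some preamble 1.1SMITH some more characters 123",
    "xyz1.1JONES some more characters 123",
    "another bit of string1.1SMITHABC some more characters 123",
    "xyz1.1ME some more characters 123",
]
for s in samples:
    print(passes(s), s)
```

One side effect worth knowing: because of `[^A-Z]*$`, an uppercase letter anywhere later in the string (for example "x1.1SMITH Bob") also makes the match fail.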
I came up with a number of solutions, but I like these the most. If your RegEx engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.
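Python's `re` module supports both kinds of lookaround, so the two patterns can be tried directly. This is only a sketch; the constant names are made up here:

```python
import re

# Exactly 5 uppercase letters: no uppercase letter directly before or after.
FIVE_UPPER = re.compile(r"(?<![A-Z])[A-Z]{5}(?![A-Z])")
# Exactly 1 digit: no digit directly before or after.
ONE_DIGIT = re.compile(r"(?<!\d)\d(?!\d)")

print(FIVE_UPPER.search("xyz1.1JONES some more characters 123"))
print(FIVE_UPPER.search("another bit of string1.1SMITHABC some more"))
print(ONE_DIGIT.search("5"))
print(ONE_DIGIT.search("67"))
```

The second and fourth calls return `None`, because SMITHABC is a run of 8 uppercase letters and 67 is a run of 2 digits.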

tweak regex to accept several of the pattern

I validate my input if it passes both regexes below.
How can I tweak both regexes so that they accept a list (as in input2 and input3)? Right now my regexes only work on input1.
2 or higher:
^\d{2}\d*$
non 0:
^[1-9]\d*$
input1: 123
input2: 123, 456
input3: 123, 456, 789
First of all, the pattern you posted is equivalent to ^\d{2,}$, which requires the number to have two or more digits. The regex for an integer greater than or equal to 2 is more like ^[+-]?0*([2-9]|[1-9]\d+)$. From your description, it's not clear which of these you intended.
Either way, what you want to use is something like this:
^(<pattern>(,\s|$))+$
So for your scenario, it would be something like:
^(\d{2,}(,\s|$))+$ #2 or more digits
^(0*([2-9]|[1-9]\d+)(,\s|$))+$ #positive integers >= 2
^(0*[1-9]\d*(,\s|$))+$ #positive integers > 0
I'm not sure what flavor of regex you're using, but if your engine balks at the redundant use of $ in the patterns above, you could try something like
^(<pattern>,\s)*<pattern>$
instead. Example:
^(\d{2,},\s)*\d{2,}$ #2 or more digits, simplified
Bear in mind that a better way to do this is usually to split the string on the comma + whitespace separator, which will give you an array of strings you can try to parse as integers.
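As a rough illustration of the split-and-parse alternative, here is a Python sketch. The function name `all_at_least` and its `minimum` parameter are invented for this example, and it assumes items are plain non-negative integers:

```python
import re

def all_at_least(text, minimum):
    """True if every comma-separated item is an integer >= minimum."""
    for item in text.split(","):
        item = item.strip()
        # Reject empty items and anything that is not plain ASCII digits.
        if not re.fullmatch(r"[0-9]+", item):
            return False
        if int(item) < minimum:
            return False
    return True

print(all_at_least("123, 456, 789", 2))
print(all_at_least("123, 1", 2))
```

Working on parsed integers makes a rule such as "2 or higher" a simple numeric comparison instead of a digit-pattern puzzle.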
First of all, here's what I'd recommend for your base regexes:
Two or more digits:
\d{2,}
One or more digits:
\d+
Now if you want either of them to match a comma-and-space separated list, you could use:
(?:\d{2,}(?:\s*,\s*)?)+
and
(?:\d+(?:\s*,\s*)?)+
respectively
Try this:
^[1-9]\d*(?:\s*,\s*[1-9]\d*)*$
[1-9]\d* matches one or more digits, the first of which cannot be zero. If there are any more characters after the first number, they must comprise a comma optionally surrounded by whitespace -- \s*,\s* -- followed by another number. And that repeats as many times as necessary.
You can be more strict about the format if you like. For example, if the comma must follow immediately after the number and there must be exactly one space after the comma, you can use this:
^[1-9]\d*(?:, [1-9]\d*)*$
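Both variants can be exercised against the three inputs from the question. A Python sketch, with the names `LOOSE` and `STRICT` chosen here for illustration:

```python
import re

# Comma optionally surrounded by whitespace.
LOOSE = re.compile(r"^[1-9]\d*(?:\s*,\s*[1-9]\d*)*$")
# Comma immediately after the number, then exactly one space.
STRICT = re.compile(r"^[1-9]\d*(?:, [1-9]\d*)*$")

for s in ["123", "123, 456", "123, 456, 789", "123 ,456", "123, 045", "123,"]:
    print(repr(s), bool(LOOSE.match(s)), bool(STRICT.match(s)))
```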

How do I write a regex that won't match a certain amount of whitespace?

I'm trying to write a regex that won't match a certain number of white spaces, but it's not going the way I expected.
I have these strings:
123      99999  # has 6 white spaces
321      99999  # same
123   8888      # has 3 white spaces  \
321   8888      # same                 | - These are the lines I
1237777                                |   want to match
3217777                                /
I want to match the last four lines, i.e. starts with 123 or 321 followed by anything but 6 whitespace characters:
^(123|321)[^\ ]{6}.*
This doesn't seem to do the trick - this matches only the two last ones. What am I missing?
"   888"
If you line this up against your pattern, it does not match [^\ ]{6}, which is saying
[not a space][not a space][not a space][not a space][not a space][not a space]
In this case, the first 3 characters are spaces, so the pattern cannot match.
You can use a negative lookahead: ^(123|321)(?!\s{6}). What I prefer, because it is more readable, is to write the regular expression to match what you don't want, then negate the result (i.e., not, !, etc.). I don't know enough about your data, but I would use \s{6}, then negate it.
Try this:
^(123|321)(?!\s{6}).*
(uses a negative lookahead to check that the prefix is not immediately followed by 6 whitespace characters)
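A small Python check of this pattern against the six sample lines. The spacing in the samples is reconstructed from the "6 white spaces" and "3 white spaces" comments in the question, so treat it as an assumption:

```python
import re

# 123 or 321 at the start, not immediately followed by 6 whitespace characters.
PATTERN = re.compile(r"^(123|321)(?!\s{6}).*")

lines = [
    "123      99999",  # 6 spaces: should be rejected
    "321      99999",  # 6 spaces: should be rejected
    "123   8888",      # 3 spaces
    "321   8888",      # 3 spaces
    "1237777",
    "3217777",
]
matched = [line for line in lines if PATTERN.match(line)]
print(matched)
```

Note that the lookahead also rejects lines with more than 6 spaces after the prefix, since their first 6 characters are still whitespace.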
What language are you doing this in? If in Perl or something that supports PCREs, you can simply use a negative lookahead assertion:
^(123|321)(?!\ {6}).*
You need to first say that it may have 3 whitespaces and then deny the existence of the three more whitespaces, like this:
^([0-9]+)(\s{0,3})([^ ]{3})([0-9]*)$
^([0-9]+) = Accepts one or more digits at the beginning of your string.
(\s{0,3}) = Accepts zero to three whitespace characters.
([^ ]{3}) = Requires the next 3 characters after the allowed spaces to be non-spaces.
([0-9]*) = Accepts any digits after that, up to the end of your string.
Or:
^([0-9]+)(\s{0,3})(?!\s+)([0-9]*)$
The only change here is that after the three allowed spaces it won't accept any more spaces (I particularly like this second option more because it's more readable).
Hope it helps.