Social Security Number Validation That Accepts Dashes, Spaces or No Spaces - regex

Social Security numbers that I want to accept are:
xxx-xx-xxxx (ex. 123-45-6789)
xxxxxxxxx (ex. 123456789)
xxx xx xxxx (ex. 123 45 6789)
I am not a regex expert, but I wrote this (it's kind of ugly)
^(\d{3}-\d{2}-\d{4})|(\d{3}\d{2}\d{4})|(\d{3}\s{1}\d{2}\s{1}\d{4})$
However this social security number passes, when it should actually fail since there is only one space
12345 6789
So I need an updated regex that rejects things like
12345 6789
123 456789
To make things more complex, it seems that SSNs cannot start with 000 or 666, and the first group can only go up to 899. The second and third groups of numbers also cannot be all zeros.
I came up with this
^(?!000|666)[0-8][0-9]{2}[ \-](?!00)[0-9]{2}[ \-](?!0000)[0-9]{4}$
Which validates with spaces or dashes, but it fails if the number is like so
123456789
Ideally this set of SSNs should pass
123456789
123 45 6789
123-45-6789
899-45-6789
001-23-4567
And these should fail
12345 6789
123 456789
123x45x6789
ABCDEEEEE
1234567890123
000-45-6789
123-00-6789
123-45-0000
666-45-6789

More complete validation rules are available on CodeProject at http://www.codeproject.com/Articles/651609/Validating-Social-Security-Numbers-through-Regular. I am copying the information here in case the link goes away, and also expanding on the CodeProject answer a bit.
A Social Security number CANNOT :
Contain all zeroes in any specific group (i.e. 000-##-####, ###-00-####, or ###-##-0000)
Begin with '666'
Begin with any value from '900-999'
Be '078-05-1120' (due to the Woolworth's Wallet Fiasco)
Be '219-09-9999' (appeared in an advertisement for the Social Security Administration)
This RegEx taken from the referenced CodeProject article will validate all Social Security numbers according to all the rules - requires dashes as separators.
^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
Same with spaces, instead of dashes
^(?!219 09 9999|078 05 1120)(?!666|000|9\d{2})\d{3} (?!00)\d{2} (?!0{4})\d{4}$
Finally, this will validate numbers without spaces or dashes
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$
Combining the three cases above (with the alternatives wrapped in one group so that ^ and $ apply to every branch, not just the first and last), we get the
Answer
^(?:((?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4})|((?!219 09 9999|078 05 1120)(?!666|000|9\d{2})\d{3} (?!00)\d{2} (?!0{4})\d{4})|((?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}))$
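As a quick check of the combined pattern, here is a minimal Python sketch. It assumes Python's re module and builds the three branches from one helper; ssn_branch and is_valid_ssn are names made up for this example. It uses fullmatch so that the whole string must match one branch, which sidesteps the pitfall where ^ binds only to the first alternative and $ only to the last.

```python
import re

def ssn_branch(sep: str) -> str:
    """Build one alternative for a given separator ('-', ' ' or '').

    Hypothetical helper, not part of the CodeProject article.
    """
    s = re.escape(sep)
    return (
        rf"(?!219{s}09{s}9999|078{s}05{s}1120)"  # the two blacklisted numbers
        rf"(?!666|000|9\d{{2}})\d{{3}}{s}"        # first group: not 000, 666 or 900-999
        rf"(?!00)\d{{2}}{s}"                      # second group: not 00
        rf"(?!0{{4}})\d{{4}}"                     # third group: not 0000
    )

# One branch per allowed separator style: dashes, spaces, nothing.
SSN_RE = re.compile("|".join(ssn_branch(sep) for sep in ("-", " ", "")))

def is_valid_ssn(text: str) -> bool:
    # fullmatch anchors the whole alternation at both ends of the string.
    return SSN_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for candidate in ["123-45-6789", "123 45 6789", "123456789",
                      "12345 6789", "078-05-1120", "123-45-6789xyz"]:
        print(candidate, is_valid_ssn(candidate))
```

Building the branches from one template keeps the three variants from drifting apart when a rule changes.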

To require that the separators (dashes, spaces, or nothing) be consistent, you can use a backreference. Make the first separator an optional capture group, ([ -]?). You can then reference it with \1 to make sure the second separator is the same as the first one:
^(?!000|666)[0-9]{3}([ -]?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$
See it here (thanks #Tushar)
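A minimal Python sketch of the backreference pattern, run against the pass and fail lists from the question. The function name check is invented for this example, and fullmatch is used in place of the ^ and $ anchors.

```python
import re

# ([ -]?) captures the first separator (possibly empty); \1 must repeat it.
SSN_RE = re.compile(r"(?!000|666)[0-9]{3}([ -]?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}")

def check(text: str) -> bool:
    """Return True when the whole string is an acceptable SSN (hypothetical helper)."""
    return SSN_RE.fullmatch(text) is not None

should_pass = ["123456789", "123 45 6789", "123-45-6789", "899-45-6789", "001-23-4567"]
should_fail = ["12345 6789", "123 456789", "123x45x6789", "ABCDEEEEE",
               "1234567890123", "000-45-6789", "123-00-6789", "123-45-0000",
               "666-45-6789"]

if __name__ == "__main__":
    for s in should_pass + should_fail:
        print(f"{s!r}: {check(s)}")
```

Because the group may match the empty string, the same pattern also covers the no-separator form: \1 then has to match nothing as well.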

I had a requirement to validate SSNs. This regex validates an SSN against the following rules:
Matches dashes, spaces or no spaces
Exactly 9 digits, no letters
Exclude groups that are all zeros
Exclude numbers beginning with 666, 000, 900 or 999, and the numbers 123456789, 111111111, 222222222, 333333333, 444444444, 555555555, 666666666, 777777777, 888888888, 999999999
Exclude numbers ending in 0000
^(?!123([ -]?)45([ -]?)6789)(?!\b(\d)\3+\b)(?!000|666|900|999)[0-9]{3}([ -]?)(?!00)[0-9]{2}\4(?!0000)[0-9]{4}$
Explanation
^ - Beginning of string
(?!123([ -]?)45([ -]?)6789) - Don't match 123456789, 123-45-6789, 123 45 6789
(?!\b(\d)\3+\b) - Don't match a run of one repeated digit, such as 000000000, 111111111 ... 999999999. '\3' is a backreference to (\d), the third capture group. With space or dash separators, \b matches right after the first group, so this also rejects any number whose first three digits are identical (e.g. 111-11-1111).
(?!000|666|900|999) - Don't match an SSN that begins with 000, 666, 900 or 999.
([ -]?) - Capture an optional space or dash. '?' means 0 or 1 occurrence of the preceding token.
(?!00) - The 4th and 5th digits cannot be 00.
\4 - Backreference to the fourth capture group: the separator after the 5th digit must be the same as the first separator.
(?!0000) - The last 4 digits cannot be all zeros.
$ - End of string
A backreference matches the same text as a previously captured group (). Groups are numbered sequentially by their opening parenthesis: 1, 2, 3 and so on.
See here for more explanation and examples
https://regex101.com/r/rA2xA2/3
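To make the group numbering concrete, here is a small Python sketch of the same pattern, split across lines with comments. It only illustrates how the groups are counted and is not a claim about how the linked demo is set up.

```python
import re

SSN_RE = re.compile(
    r"^(?!123([ -]?)45([ -]?)6789)"   # groups 1 and 2 sit inside this lookahead
    r"(?!\b(\d)\3+\b)"                # group 3; \3 refers back to it
    r"(?!000|666|900|999)[0-9]{3}"
    r"([ -]?)"                        # group 4: the first separator
    r"(?!00)[0-9]{2}\4"               # \4 must repeat whatever group 4 matched
    r"(?!0000)[0-9]{4}$"
)

group_count = SSN_RE.groups                 # number of capture groups in the pattern
same_sep = SSN_RE.match("234-56-7890")      # both separators are dashes
mixed_sep = SSN_RE.match("234-56 7890")     # dash, then space

if __name__ == "__main__":
    print(group_count)
    print(same_sep.group(4) if same_sep else None)
    print(mixed_sep)
```

Groups inside lookaheads still count toward the numbering, which is why the separator ends up as group 4 and not group 1.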

Related

Can I use a regular expression to help format this data to separate name, age, and address?

I am working on an assignment for class, and we need to format this data. I was thinking that regular expressions would be a very elegant way of formatting the data, but I ran into some trouble. This is my first time doing this, and I do not know how to properly split the data. I want everything from the beginning up to the first digit to be the first section, from the first digit to the next whitespace to be the second section, and from there to the end of the line to be the third section. Here is my data:
Amber-Rose Bowen 53 123 Machinery Rd.
Joyce Kirkland 19 234 Cylinder Dr.
Seb Dotson 32 3456 Surgery Ln.
Dominique Hough 58 654 Election Rd.
Yasemin Mcleod 29 555 Cabinet Ave.
Nancy Lord 80 232 Highway Rd.
Tracy Mckenzie 72 101 Device Ave.
Alistair Salter 25 109 Guitar Ln.
Adeel Sears 42 222 Solitare Rd.
I have been using https://regex101.com/ to test my ideas. ([a-zA-Z]+)([0-9]+) this is my start, but I do not know how to go from the start to the first digit. (or any other part of this)
You can use
^(.*?)[^\S\r\n]+(\d+)[^\S\r\n]+(\S.*)
See the regex demo. This regex can also be used with a multiline flag to extract data from a multiline string.
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
[^\S\r\n]+ - one or more horizontal whitespaces (in some regex flavors, you can use \h+ or [\p{Zs}\t]+ instead)
(\d+) - Group 2: one or more digits
[^\S\r\n]+ - one or more horizontal whitespaces
(\S.*) - Group 3: a non-whitespace char and then the rest of the line.
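A minimal sketch of this pattern in Python, assuming the records arrive as one multiline string. The re.MULTILINE flag lets ^ match at the start of every line; the variable names are illustrative.

```python
import re

# Group 1: name, Group 2: age, Group 3: address (rest of the line).
LINE_RE = re.compile(r"^(.*?)[^\S\r\n]+(\d+)[^\S\r\n]+(\S.*)", re.MULTILINE)

data = """Amber-Rose Bowen 53 123 Machinery Rd.
Joyce Kirkland 19 234 Cylinder Dr.
Seb Dotson 32 3456 Surgery Ln."""

# One (name, age, address) tuple per line that matches.
records = [m.groups() for m in LINE_RE.finditer(data)]

if __name__ == "__main__":
    for name, age, address in records:
        print(f"name={name!r} age={age!r} address={address!r}")
```

The lazy (.*?) stops at the first run of whitespace that is followed by digits and more whitespace, so a hyphenated or multi-word name stays in group 1.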
If you merely wish to separate the string into full name, age and street address you may split the string on matches of the regular expression
(?i)(?<=[a-z]|\d) +(?=\d)
For example:
Amber-Rose Bowen 53 123 Machinery Rd.
                ^  ^
The regular expression reads: "match one or more spaces preceded by a letter or digit and followed by a digit". (?i) makes the letter match case-insensitive. (?<=[a-z]|\d) is a positive lookbehind; (?=\d) is a positive lookahead.
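A short Python sketch of the split approach. It assumes Python's re module, which accepts this lookbehind because both alternatives have the same width (one character).

```python
import re

# Split on spaces that sit between (letter or digit) and a digit.
SPLIT_RE = re.compile(r"(?i)(?<=[a-z]|\d) +(?=\d)")

line = "Amber-Rose Bowen 53 123 Machinery Rd."
parts = SPLIT_RE.split(line)

if __name__ == "__main__":
    name, age, address = parts
    print(name, age, address, sep=" | ")
```

The space inside the name and the space before the street name are left alone, since neither is followed by a digit.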
You may use the following regular expression if you wish to extract first name, last name, age, street number and street name.
^(?<first_name>\S+) +(?<last_name>\S+) +(?<age>\d+) +(?<street_nbr>\d+) +(?<street_name>.*)
For example:
Amber-Rose Bowen 53 123 Machinery Rd.
^^^^^^^^^^ ^^^^^ ^^ ^^^ ^^^^^^^^^^^^^
1          2     3  4   5
1: first_name
2: last_name
3: age
4: street_nbr
5: street_name
I've used the PCRE regex engine with named capture groups. The expression would be similar for other regex engines, though some do not support named groups, in which case you would have to use numbered groups (group 1, group 2, and so forth.)
Note that this only works because of the consistent structure of your data. In real life some strings may contain such things as middle names or apartment numbers, which would complicate the parsing of the strings.
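For comparison, a minimal sketch of the named-group version in Python, where named groups are written (?P<name>...) rather than (?<name>...). The variable names are illustrative.

```python
import re

PERSON_RE = re.compile(
    r"^(?P<first_name>\S+) +(?P<last_name>\S+) +(?P<age>\d+)"
    r" +(?P<street_nbr>\d+) +(?P<street_name>.*)"
)

m = PERSON_RE.match("Amber-Rose Bowen 53 123 Machinery Rd.")
# groupdict() maps each group name to the text it captured.
person = m.groupdict() if m else None

if __name__ == "__main__":
    print(person)
```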

Regular expression for Invoice Number

I am new to Stack Overflow and I need your help matching a payment invoice number, so that users can't enter a wrong one. It should match this pattern: 612 (fixed), then one of 10/20/30/40/50, then 001-064, then 0000 (fixed), then 01-64, then 00 (fixed), and finally 0001-9999.
For example, the invoice number 612 30 005 0000 55 00 1234 would be written without any spaces, like this: 61230005000055001234
I can't figure out how to do this. Please help me if you can.
^612\s?[1-5]0\s?0(?:[0-5]\d|6[0-4])\s?0000\s?(?:[0-5]\d|6[0-4])\s?00\s?\d{4}$
This should do the job for you, assuming that spaces are optional but, when present, are single and in fixed positions.
^ is an anchor for the beginning of the string
612\s? matches 612 literally, followed by an optional space
[1-5]0\s? matches 1/2/3/4/5 followed by 0 and an optional space
0(?:[0-5]\d|6[0-4])\s? means 0 followed by either 0-5 and any digit, or 6 and 0-4, followed by an optional space
0000\s? matches 0000 literally, followed by an optional space
(?:[0-5]\d|6[0-4])\s? is either 0-5 and any digit, or 6 and 0-4, followed by an optional space
00\s? matches 00 literally, followed by an optional space
\d{4} means any 4 digits
$ is an anchor for the end of the string
https://regex101.com/r/iU5jY5/3
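If the numeric ranges need to be enforced exactly as stated in the question (001-064, 01-64 and 0001-9999, so no all-zero fields), one option is to let the regex check only the layout and do the range checks on the captured fields. This is a different technique from the pure-regex answer above; the sketch below is in Python, and is_valid_invoice is a hypothetical name.

```python
import re

# Layout only: fixed parts are literal, variable parts are captured as digits.
INVOICE_RE = re.compile(
    r"612\s?([1-5]0)\s?(\d{3})\s?0000\s?(\d{2})\s?00\s?(\d{4})"
)

def is_valid_invoice(text: str) -> bool:
    """Check layout with the regex, then check each range numerically.

    Ranges are taken from the question: 001-064, 01-64, 0001-9999.
    """
    m = INVOICE_RE.fullmatch(text)
    if m is None:
        return False
    _tens, three_digit, two_digit, serial = m.groups()
    return (
        1 <= int(three_digit) <= 64
        and 1 <= int(two_digit) <= 64
        and 1 <= int(serial) <= 9999
    )

if __name__ == "__main__":
    for candidate in ["61230005000055001234", "612 30 005 0000 55 00 1234",
                      "61260005000055001234", "61230000000055001234"]:
        print(candidate, is_valid_invoice(candidate))
```

Keeping the range logic in plain comparisons makes the bounds easy to read and change, at the cost of no longer being a single regex.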
612[1-5]00(?:[0-5][0-9]|6[0-4])0000(?:0[0-9]|[1-5][0-9]|6[0-4])00[0-9]{4}
See a demo here.

Regular Expression find space delimited numbers

I have a string that comes from user input through a messaging system. It can contain a series of 4-digit numbers, but as users are likely to type things in wrong, it needs to be a little bit flexible.
Therefore I want to allow them to type in the numbers, or pepper their message with any string of characters and then just take the numbers that match the formats
=nnnn or nnnn
For this I have the Regular Expression:
(^|=|\s)\d{4}(\s|$)
Which almost works, however as it says that each group of 4 digits must start with an =, a space, or the start of the string it misses every other set of numbers
I tried this:
(^|=|\s*)\d{4}(\s|$)
But that means that any four digits followed by a space get matched - which is incorrect.
How can I match groups of numbers when a single space serves as both the end of one group and the beginning of the next? To clarify, this string:
Ack 9876 3456 3467 4578 4567
Should produce the matches:
9876
3456
3467
4578
4567
Here you need to use lookarounds which won't consume any characters.
(?:^|[=\s])\K\d{4}(?=\s|$)
OR
(?:^|[=\s])(\d{4})(?=\s|$)
Your regex (^|=|\s)\d{4}(\s|$) fails because it first matches <space>9876<space>, and then looks for another space, equals sign or start of the line. The space before 3456 was already consumed by the first match, so 3456 is skipped and the next match is found at <space>3467<space>. To let adjacent matches share a delimiter, you need to put the delimiter inside a lookaround. When you put the last pattern (\s|$) inside a lookahead, it won't consume the space; it just asserts that the match must be followed by a space or the end of the line.
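The difference is easy to see side by side. Here is a minimal Python sketch using the capture-group variant (Python's re module does not support \K); the names CONSUMING_RE and LOOKAHEAD_RE are made up for the example.

```python
import re

# Original idea: the trailing delimiter is consumed by each match.
CONSUMING_RE = re.compile(r"(?:^|=|\s)(\d{4})(?:\s|$)")
# Fixed version: the trailing delimiter is only asserted, not consumed.
LOOKAHEAD_RE = re.compile(r"(?:^|[=\s])(\d{4})(?=\s|$)")

text = "Ack 9876 3456 3467 4578 4567"

consuming = CONSUMING_RE.findall(text)   # skips every other number
lookahead = LOOKAHEAD_RE.findall(text)   # finds all five

if __name__ == "__main__":
    print(consuming)
    print(lookahead)
```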
\b\d+\b
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W). It is a 0-width anchor, much like ^ and $. It doesn't consume any characters.
or
(?:^|(?<=[=\s]))\d{4}\b

Regex match and take N symbols when they are mixed with symbols which you don't have to take

Here's a couple of examples:
NUM12345678OTHERSTR
NUM 123 45 678 OTHERSTR
NUM123 45-678 OTHERSTR
NUM 123 456 789 1011
I need to get 12345678
So I need to select the number that follows a certain marker, NUM, and may consist of digits mixed with spaces and dashes. That's not a problem; I'm able to create a pattern for that. But I also need to limit this number either by another marker (OTHERSTR) or by its length: for example, at least 4 and at most 8 digits. I thought about {4,8}, but couldn't figure out how to apply it only to the digits and not to the spaces etc. Could somebody help me with that?
To delimit with a marker string:
(?<=NUM).*(?=OTHERSTR)
To delimit by the number of digits:
^\D*((?:\d\D*){4,8})
Demo here: http://regex101.com/r/aL6tE7
For NUM 123 456 789 1011 you will get 123 456 78 in the first capture group. To get your desired output, do a regex substitution of \D+ with nothing on that result.
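Both steps together, as a minimal Python sketch: capture 4 to 8 digits with any non-digits between them, then strip the non-digits. The function name extract_number is hypothetical.

```python
import re
from typing import Optional

# Skip leading non-digits, then take 4 to 8 digits, each optionally
# followed by non-digit characters.
DIGITS_RE = re.compile(r"^\D*((?:\d\D*){4,8})")

def extract_number(text: str) -> Optional[str]:
    """Return the first 4-8 digits of text with separators removed, or None."""
    m = DIGITS_RE.match(text)
    if m is None:
        return None
    # Second step: drop everything that is not a digit.
    return re.sub(r"\D+", "", m.group(1))

if __name__ == "__main__":
    for sample in ["NUM12345678OTHERSTR", "NUM 123 45 678 OTHERSTR",
                   "NUM123 45-678 OTHERSTR", "NUM 123 456 789 1011"]:
        print(sample, "->", extract_number(sample))
```

The quantifier {4,8} applies to the group (?:\d\D*), so it counts digits only; the separators ride along inside each repetition.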

Capturing exact amount of numbers without Whitespaces

I have a series of 8-digit-numbers that I need to capture via RegEx.
Single whitespaces can occur before, after and in some cases between the digits. In some cases, other chars follow. Here's the most common variations, each of which I want to capture as 12345678:
123456789
12345678
1234567 89S
12345 678 9
123 456789
123456 789
Is this possible?
I think a regex like:
(( )?\d){8}
Would suffice to capture the digits - I'd then remove the whitespace (before further processing) as a separate step.
I'm not sure how strictly to interpret the OP's "single whitespaces" requirement, but it's why I've structured my RegEx to accept 8 digits, each of which is optionally prefixed by a single space character.
If it should only match if there are single spaces, and not any more, the above works whereas the "strip whitespace first" or "strip non-digits first" approaches will not.
If more spaces are allowed, it's easy to change the ? to a * or any fixed upper limit.
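A minimal Python sketch of this approach, applied to the variations from the question: match 8 digits, each optionally preceded by one space, then remove the spaces from the match. The function name first_eight_digits is invented for the example.

```python
import re
from typing import Optional

# Eight digits, each optionally prefixed by a single space.
EIGHT_DIGITS_RE = re.compile(r"(( )?\d){8}")

def first_eight_digits(text: str) -> Optional[str]:
    """Return the first run of 8 single-space-separated digits, or None."""
    m = EIGHT_DIGITS_RE.search(text)
    if m is None:
        return None
    # Separate step: drop the spaces that were allowed between digits.
    return m.group(0).replace(" ", "")

if __name__ == "__main__":
    for sample in ["123456789", " 12345678", "1234567 89S",
                   "12345 678 9", "123 456789", "123456 789"]:
        print(repr(sample), "->", first_eight_digits(sample))
```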
This is not possible in a single "regex" step. I can go into more detail if you like, but basically regex cannot "count" (it can only match a specified match size, such as "8 numbers", but not "an unknown number of characters, 8 of which are numbers").
You need to do this in two stages -
first remove whitespace.
then perform a regex match.
For instance, in ruby:
thingtomatch = " 12 3456 7899X"
temp = thingtomatch.gsub(/\s+/, '') # remove all whitespace => temp="1234567899X"
matched_digits = temp.match(/(\d{8}).*/)[1] # => "12345678"
(Or, as other answers have suggested, you could perform a regex match and then remove whitespace from the result.)
You can do it, but in two steps:
First, remove non-digits:
s/[^\d]//g
Second, match your digits:
m/^(\d{8})/