Regex for valid SSN or other ID

I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999

I have also modified your first regex a bit; below is a demo program based on my understanding of the problem. Let me know if any modification is needed.
use strict;
use warnings;
use feature 'say';

my @ids = (
    '123-45-6789',
    '123456789',
    '123 45 6789',
    '1234567893434', # invalid
    '123456789wwsd', # invalid
    'aCe8999',
    'aCe8999asa'     # invalid
);
for (@ids) {
    say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x ;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999

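Your first regex has some problems. Most importantly, it accepts {{{{}}}}}, which means you have built the wrong character class: [\d{9}] matches one digit, {, or } (the 9 is redundant), not nine digits. Because that branch of the alternation is not anchored at the start and * allows zero repetitions, it can even match the empty string at the end of any input. It also matches 123-45 6789 (notice the mixture of space and dash).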
To express OR in a regular expression you use the pipe |, and remember that each anchor belongs to the side of the alternation it sits on. For example, ^1|2$ matches strings that begin with 1 or end with 2, not just the two strings 1 and 2.
To require an exact match, use ^1$|^2$ or ^(1|2)$.
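The precedence rule is easy to check directly. This minimal Python sketch (Python's re is just a convenient test bed here, and the sample strings are made up) compares the loose pattern with the two anchored forms:

```python
import re

# ^1|2$ parses as (^1)|(2$): "starts with 1" OR "ends with 2".
loose = re.compile(r"^1|2$")
# Both fixed forms require the whole string to be exactly "1" or "2".
strict_a = re.compile(r"^1$|^2$")
strict_b = re.compile(r"^(1|2)$")

for s in ["1", "2", "12", "1abc", "abc2", "3"]:
    print(f"{s!r:8} loose={bool(loose.search(s))} "
          f"strict_a={bool(strict_a.search(s))} strict_b={bool(strict_b.search(s))}")
```

Only "1" and "2" pass the anchored forms, while the loose pattern also accepts "12", "1abc" and "abc2".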
With the second regex, ^[a-zA-Z0-9]{7}$, you are not requiring an ID that mixes letters and digits; you are accepting any 7 characters that are digits, letters, or a mix of both. So it matches 1234567 too. If this is not a problem, the following regex eliminates the issues above:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
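A quick way to confirm the combined pattern is to run it over the examples from the question plus the problem cases above. This is a sketch in Python's re; `is_valid` is a hypothetical helper, and `MIXED_ID` is an assumption (not part of the question) showing one way to require at least one letter and one digit in the 7-character ID, using lookaheads:

```python
import re

# Combined pattern from the answer: SSN with one consistent separator
# (space, dash, or none) OR any 7-character alphanumeric ID.
SSN_OR_ID = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

# Assumed stricter rule: the 7-character ID needs a letter AND a digit.
MIXED_ID = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]{7}$")

def is_valid(value: str) -> bool:
    return SSN_OR_ID.match(value) is not None

for value in ["123-45-6789", "123456789", "123 45 6789", "aCe8999",
              "123-45 6789", "{{{{}}}}}", "1234567"]:
    print(f"{value!r:15} {is_valid(value)}")
```

The backreference \1 forces the second separator to equal the first, so 123-45 6789 is rejected; 1234567 is still accepted by the ID branch unless you switch to something like `MIXED_ID`.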

Related

Suite 400 - 100 ABCDEF (Capture values from 100)

I need a regular expression that finds 100 ABCDEF in the input string Suite 400 - 100 ABCDEF. I wrote the regex below, but it picks up the value after Suite (400) instead.
[^-\s]\d.+
Just put $ at the end of your regex. $ means "end of string" (or "end of line" in multiline mode).
Also, replace the dot with [^-], so it will match only non-hyphens:
[^-\s]?\d[^-]+$
Fiddle: http://refiddle.com/refiddles/5b9a88ef75622d4ca9590000
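To see the difference, here is a small check in Python (the question doesn't name a language, so Python's re is an assumption) comparing the original pattern with the fixed one:

```python
import re

text = "Suite 400 - 100 ABCDEF"

# Original attempt: starts at the first "non-space, digit" pair, i.e. 400.
print(re.search(r"[^-\s]\d.+", text).group())       # 400 - 100 ABCDEF
# Fix: no hyphen allowed after the digits, and the match must reach the end.
print(re.search(r"[^-\s]?\d[^-]+$", text).group())  # 100 ABCDEF
```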
Since you're trying to match a US street address, you should try matching a number followed by one or more words instead:
\d+(?:\s+[A-Za-z.]+)+
Demo: https://regex101.com/r/y6n5jD/1
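The same "number followed by words" pattern, sketched in Python; the second input is a made-up address used only for illustration:

```python
import re

# A run of digits followed by one or more words made of letters or dots.
ADDRESS = re.compile(r"\d+(?:\s+[A-Za-z.]+)+")

print(ADDRESS.search("Suite 400 - 100 ABCDEF").group())  # 100 ABCDEF
print(ADDRESS.search("Suite 12 - 55 Main St.").group())  # 55 Main St.
```

400 is skipped because the next word after it is a hyphen, which [A-Za-z.] does not allow.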

SSN issue in SIEBEL

I am working on Siebel CRM and have problems with spaces in my regex.
I have SSNs in these formats:
123 456 789
123-456-789
123 45 6789
I need to display my SSN like XXX-XX-4567. My regex looks like this:
([\s.:])(?!000)(?!666)(?!9[0-9][0-9])\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4})([\s.:]) |
([\s.:])(?!000)(?!666)(?!9[0-9][0-9])\d{3}[- ]?(?!00)\d{3}[- ]?(?!00)\d{3})([\s.:]).
How can I remove all the blank spaces with the above expression and display the SSN in the format I mentioned?
It looks like there are syntax errors in your regex: there are a couple of unmatched brackets. For example, in (?!0000)\d{4}) in the first section, the last closing parenthesis is unmatched.
I think I've managed to write the regex you're looking for, and it is a bit shorter than the one you were using:
([\s.:])((?!000)(?!666)(?!9[0-9]{2})\d{3})[- ]?((?!00)\d{2,3})[- ]?((?!00)\d{3,4})([\s.:])
This will match the following strings:
123-12-1234
123 456 789
123-456-789
123 45 6789
But will not match the following:
666-45-1234
abc-12-1232
123-00-1233
123-224-0011
123 224 0000
There are five capture groups here:
1. Matches a single whitespace character, period, or colon before the SSN (you may want to change this).
2. Matches the first, three-digit number.
3. Matches the second, two- or three-digit number.
4. Matches the third, three- or four-digit number.
5. Matches a single whitespace character, period, or colon after the SSN (you may want to change this).
You should be able to reconstruct the SSN in the format you need with the result of this RegEx.
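As a sketch of that reconstruction, the following Python code (an assumption: it runs the answer's pattern in Python's re, not in Siebel's own engine) joins groups 2 to 4 and keeps only the last four digits, giving the XXX-XX-4567 shape from the question. `mask_ssns` is a hypothetical helper name:

```python
import re

# The answer's pattern, unchanged. Groups 1 and 5 are the delimiters,
# groups 2-4 are the three digit runs.
SSN_RE = re.compile(
    r"([\s.:])((?!000)(?!666)(?!9[0-9]{2})\d{3})[- ]?"
    r"((?!00)\d{2,3})[- ]?((?!00)\d{3,4})([\s.:])"
)

def mask_ssns(text: str) -> str:
    """Replace each matched SSN with XXX-XX-<last four digits>."""
    def repl(m: re.Match) -> str:
        digits = m.group(2) + m.group(3) + m.group(4)  # separators dropped
        return f"{m.group(1)}XXX-XX-{digits[-4:]}{m.group(5)}"
    return SSN_RE.sub(repl, text)

print(mask_ssns("SSN: 123 45 6789."))  # SSN: XXX-XX-6789.
print(mask_ssns(" 123-456-789 "))      #  XXX-XX-6789
```

For the 3-3-3 formats the last four digits span groups 3 and 4, which is why the digits are concatenated before slicing. Strings the pattern rejects (such as 666-45-1234) are left untouched.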

Regex mask with varying ignored characters

I have a series of strings which look something like this:
foobar | ABC Some text 123
barfoo | DEF Some te 456
And I want to mask it such that I get the results
ABC123
DEF456
respectively. The text in between will always be a substring of Some text, which could potentially contain numbers (e.g. S0m3 t3xt or S0m3 t3). It will always be a substring starting from the left, so never something like me te.
So clearly I need to start the Regex with something like
(?<=| )[A-Z]{3}
which gets me ABC and DEF, but I am at a loss as to how to concatenate the numbers at the end of the string.
Is there any way to do this with a single expression?
See http://regexr.com?375u8
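Note that the pipe must be escaped inside the lookbehind; an unescaped (?<=| ) means "preceded by nothing or a space", which always succeeds:
(?<=\| )([A-Z]{3}).*(\d{3})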
This will give you the three characters in the range A-Z and the three digits in two capturing groups, which you can then concatenate into your desired output: $1$2
This will even work if your Some text contains three digits in between, because the greedy .* makes (\d{3}) capture the last three digits.
In case you want to replace everything with both of your capturing groups, add .* in front of the regex:
.*(?<=\| )([A-Z]{3}).*?(\d{3})
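The capture-and-concatenate idea, sketched in Python. Python's re refuses the unescaped (?<=| ) because its two branches have different widths, so the sketch uses the escaped (?<=\| ). The function name `mask` and the third sample line are made up for illustration:

```python
import re

# Three capital letters right after "| ", then (greedily) the last three digits.
MASK = re.compile(r"(?<=\| )([A-Z]{3}).*(\d{3})")

def mask(line: str) -> str:
    m = MASK.search(line)
    return m.group(1) + m.group(2) if m else line

for line in ["foobar | ABC Some text 123",
             "barfoo | DEF Some te 456",
             "bazqux | GHI S0m3 t3xt 789"]:
    print(mask(line))  # ABC123, DEF456, GHI789
```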
Another JavaScript version:
[
'foobar | ABC Some text 123',
'barfoo | DEF Some te 456'
].map(function(v) {
return v.replace(/^.*\| ([A-Z]{3}) .* (\d{3})$/, '$1$2');
})
Gives
["ABC123", "DEF456"]

What is wrong with this Regular Expression?

I am a beginner and have some problems with regular expressions.
Input text is: something idUser=123654; nick="Tom" something
I need to extract the value of idUser: 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but in the output I get idUser=123654; I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
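The group-0 versus group-1 distinction looks the same in any flavor. This Python sketch mirrors the answer's pattern (in .NET the captured id would be read as match.Groups[1].Value); note that Python spells named groups (?P<name>...), whereas .NET uses (?<name>...):

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# Alternation from the answer: group 1 is the id, group 2 is the nick.
pattern = re.compile(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"')

for m in pattern.finditer(text):
    # group(0) is the whole match; the captured value is in group 1 or 2.
    print(repr(m.group(0)), m.group(1), m.group(2))

# If nick always follows idUser, match both at once with named groups.
both = re.search(r'\bidUser=(?P<id>\d{3,8})\b.*?\bnick="(?P<nick>\w+)"', text)
print(both.group("id"), both.group("nick"))
```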
Use lookarounds:
(?<=idUser=)\d{1,8}(?=(;|$))
To fix the number of digits at 6, use (?<=idUser=)\d{6}(?=($|;))
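Because lookbehind and lookahead are zero-width, the whole match is just the digits, so there is no group to pick out. A quick Python check of both patterns (the second input is a made-up string ending right after the id):

```python
import re

text = 'something idUser=123654; nick="Tom" something'

print(re.search(r"(?<=idUser=)\d{1,8}(?=(;|$))", text).group())           # 123654
# Exactly six digits, also accepted at the very end of the string.
print(re.search(r"(?<=idUser=)\d{6}(?=($|;))", "idUser=123654").group())  # 123654
```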

trim phone number with regex

Probably an easy regex question.
How do I remove all non-digits except a leading + from a phone number?
E.g.
012-3456 => 0123456
+1 (234) 56789 => +123456789
s/(?<!^)\+|[^\d+]+//g
This removes all non-digits and leaves a leading + alone. Note that leading whitespace will cause the "leave + alone" part to fail. In .NET languages, which support variable-length lookbehind, this can be worked into the regex; in other flavors you should strip the whitespace before passing the string to this regex.
Explanation:
(?<!^)\+: Match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace).
| or
[^\d+]+: match any run of characters that are neither numbers nor +.
Before (using (?<!^\s*)\+|[^\d+]+):
+49 (123) 234 5678
+1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666
After:
+491232345678
+15552345678
+72345678910
01233455678666
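For a flavor without variable-length lookbehind, here is a Python sketch of the same approach: Python's re rejects (?<!^\s*), so the input is left-stripped first, as the answer suggests. `tidy` is a hypothetical helper name:

```python
import re

# (?<!^)\+  : a "+" that is not at the start of the string
# [^\d+]+   : any run of characters that are neither digits nor "+"
NON_DIGITS = re.compile(r"(?<!^)\+|[^\d+]+")

def tidy(phone: str) -> str:
    # Strip leading whitespace so a leading "+" really is at position 0.
    return NON_DIGITS.sub("", phone.lstrip())

for raw in ["+49 (123) 234 5678", "+1 (555) 234-5678",
            "+7 (23) 45/6789+10", "(0123) 345/5678, ext. 666", "  +1 (234) 56789"]:
    print(tidy(raw))
```

The first four results match the "After" list above; the last shows the leading-whitespace case.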
In Java, you can do
public static String trimmed(String phoneNumber) {
return phoneNumber.replaceAll("[^+\\d]", "");
}
This will keep all +, even if it's in the middle of phoneNumber. If you want to remove any + in the middle, then do something like this:
return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");
(?<=.) is a lookbehind to see if there was a preceding character before the +.
System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"
How do I remove all non-digits except leading + from a phone number?
Removing ( and ) and spaces from +44 (0) 20 3000 9000 results in the non-valid number +4402030009000. It should be +442030009000.
The tidying routine needs several steps to deal with country code (with or without access code or +) and/or trunk code and/or punctuation either singly or in any combination.
It is certainly possible to do all of that in one regex, but I prefer simpler regexes that deal correctly with the leading plus and with leading and trailing whitespace:
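#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    print "DATA Read: \$_=$_";    # \n already there...
    s/^\s+|\s+$//g;               # trim leading and trailing whitespace
    my $s = s/^\+// ? '+' : '';   # remember (and remove) a leading +
    s/\D//g;                      # drop every remaining non-digit
    print "Formatted: $s$_\n====\n";
}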
__DATA__
012-3456
+1 (234) 56789
 +1 (234) 56789
1234-56789 |
+12345+6789
Output:
DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_= +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789 |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789
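The routine above still keeps the (0) from the +44 (0) 20 3000 9000 example, producing the invalid +4402030009000. One possible extra step, sketched in Python under the assumption that the trunk prefix is always written literally as "(0)" right after an international "+" number, is to drop it before removing the punctuation. `tidy_with_trunk` is a hypothetical name, and real trunk-code rules vary by country:

```python
import re

def tidy_with_trunk(phone: str) -> str:
    phone = phone.strip()
    # Assumption: an international number writes its trunk prefix as "(0)".
    if phone.startswith("+"):
        phone = re.sub(r"\(0\)", "", phone, count=1)
    plus = "+" if phone.startswith("+") else ""
    return plus + re.sub(r"\D", "", phone)  # keep digits only after the "+"

print(tidy_with_trunk("+44 (0) 20 3000 9000"))  # +442030009000
```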
If global regular expressions are supported you could simply replace all characters that are not a digit or plus symbol:
s/[^0-9+]//g
If global regular expressions are not supported you could match as many possible number groups as might be valid in your given phone number format:
s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/
Using Perl:
my $number = '+1 (234) 56789';    # set it equal to the phone number
$number =~ s/[^\d+]//g;
This will still allow a plus sign anywhere; if you want to allow it only at the beginning, I will leave that part up to you. You can't just have the entire answer given to you, or else you won't learn.
Essentially, it replaces anything in $number that is not a digit or a plus sign with an empty string.
Just replace everything except digits and + with '':
/[^\d+]/
In Python,
>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
>>>
You cannot simply remove the '+' symbol. It has to be treated like '00' and belongs to the country code. '+xx' is the same as '00xx'.
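Treating "+" as "00" can be sketched as a normalization step. This Python example assumes 00 is the international access prefix you want to store; that holds in many countries but not everywhere (the North American Numbering Plan uses 011, for example). `to_00_form` is a hypothetical helper name:

```python
import re

def to_00_form(phone: str) -> str:
    """Rewrite a leading "+" as the "00" international prefix, keep digits only."""
    phone = phone.strip()
    if phone.startswith("+"):
        phone = "00" + phone[1:]
    return re.sub(r"\D", "", phone)

print(to_00_form("+1 (234) 56789"))  # 00123456789
```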
Anyway, handling phone numbers with regex is like parsing HTML with regex: nearly impossible, because there are so many (correct) ways of writing them.
My advice would be to write a custom class for handling phone numbers rather than using regex.