trim phone number with regex - regex

Probably an easy regex question.
How do I remove all non-digits except a leading + from a phone number?
i.e.
012-3456 => 0123456
+1 (234) 56789 => +123456789

s/(?<!^)\+|[^\d+]+//g
will remove all non-numbers and leave a leading + alone. Note that leading whitespace will cause the "leave + alone" bit to fail. In .NET languages, this can be worked into the regex, in others you should strip whitespace first before passing the string to this regex.
Explanation:
(?<!^)\+: Match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace).
| or
[^\d+]+: match any run of characters that are neither numbers nor +.
Before (using (?<!^\s*)\+|[^\d+]+):
+49 (123) 234 5678
+1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666
After:
+491232345678
+15552345678
+72345678910
01233455678666
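For readers who want to try this outside .NET, here is a minimal Python sketch of the same pattern. The function name strip_phone is just an illustrative choice, and the call to strip() is the "strip whitespace first" step mentioned above, since Python's re module does not accept the variable-width lookbehind.

```python
import re

# Same pattern as in the answer: a "+" that is not at the start of the
# string, or any run of characters that are neither digits nor "+".
NON_PHONE_CHARS = re.compile(r"(?<!^)\+|[^\d+]+")

def strip_phone(raw: str) -> str:
    """Remove everything except digits and a single leading '+'."""
    # Strip surrounding whitespace first so a leading "+" sits at position 0.
    return NON_PHONE_CHARS.sub("", raw.strip())

for sample in ["+49 (123) 234 5678", "+7 (23) 45/6789+10", "(0123) 345/5678, ext. 666"]:
    print(strip_phone(sample))
```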

In Java, you can do
public static String trimmed(String phoneNumber) {
return phoneNumber.replaceAll("[^+\\d]", "");
}
This will keep all +, even if it's in the middle of phoneNumber. If you want to remove any + in the middle, then do something like this:
return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");
(?<=.) is a lookbehind to see if there was a preceding character before the +.
System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"

How do I remove all non-digits except leading + from a phone number?
Removing ( and ) and spaces from +44 (0) 20 3000 9000 results in the non-valid number +4402030009000. It should be +442030009000.
The tidying routine needs several steps to deal with country code (with or without access code or +) and/or trunk code and/or punctuation either singly or in any combination.
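As a rough sketch of what "several steps" could look like, the Python function below handles only the single case named above: a parenthesised (0) trunk digit inside a number that starts with +. The function name and the decision to leave every other trunk or access-code convention untouched are assumptions made for illustration, not a complete tidying routine.

```python
import re

def tidy_international(raw: str) -> str:
    """Strip punctuation and drop a parenthesised '(0)' trunk digit.

    Assumption: the '(0)' rule applies only to numbers written with a
    leading '+'; other trunk and access-code conventions are not handled.
    """
    number = raw.strip()
    if number.startswith("+"):
        # Step 1: remove the optional trunk digit written as "(0)".
        number = re.sub(r"\(\s*0\s*\)", "", number)
    # Step 2: remove everything that is not a digit or "+".
    number = re.sub(r"[^\d+]", "", number)
    # Step 3: keep a "+" only in the first position.
    return number[:1] + number[1:].replace("+", "")

print(tidy_international("+44 (0) 20 3000 9000"))
```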

It is certainly possible to do all of that in one regex, but I prefer simpler regexes that deal correctly with the leading plus and with leading and trailing whitespace:
#!/usr/bin/perl
while (<DATA>) {
print "DATA Read: \$_=$_"; #\n already there...
s/\s*(.*)\s*/$1/g;
$s=s/(^\+){0,1}//?$1:'';
s/[^\d]//g;
print "Formatted: $s$_\n====\n";
}
__DATA__
012-3456
+1 (234) 56789
 +1 (234) 56789
1234-56789 |
+12345+6789
Output:
DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_= +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789 |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789

If global regular expressions are supported you could simply replace all characters that are not a digit or plus symbol:
s/[^0-9+]//g
If global regular expressions are not supported you could match as many possible number groups as might be valid in your given phone number format:
s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/

Use Perl:
my $number = '+1 (234) 56789'; # set this to the phone number
$number =~ s/[^\d+]//g;
This still allows a plus sign anywhere. If you want to allow a plus sign only at the beginning, I will leave that part up to you; you can't just have the entire answer given to you, or else you won't learn.
Essentially, this replaces anything in $number that is not a digit or a plus sign with an empty string.

Just replace everything except digits and + with ''
/[^\d+]/
In Python,
>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
>>>

You cannot simply remove the '+' symbol. It has to be treated like '00' and belongs to the country code. '+xx' is the same as '00xx'.
Anyway, handling phone numbers with regex is like parsing html with regex...nearly impossible because there are so many (correct) spelling formats.
My advice would be to write a custom class for handling phone numbers and not to use regex.
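To make the "+xx is the same as 00xx" point concrete, here is a small Python sketch that rewrites a leading international access code as +. The function name to_plus_form and the access_code parameter are hypothetical, and treating 00 as the access code is an assumption: it is common, but not universal (some countries use other prefixes).

```python
import re

def to_plus_form(raw: str, access_code: str = "00") -> str:
    """Rewrite a leading international access code as '+'.

    Assumption: `access_code` is the caller's local international prefix;
    "00" is only a default for illustration.
    """
    digits = re.sub(r"[^\d+]", "", raw.strip())
    if digits.startswith("+"):
        # Already in "+" form: keep the first "+" and drop any others.
        return "+" + digits[1:].replace("+", "")
    digits = digits.replace("+", "")
    if digits.startswith(access_code):
        return "+" + digits[len(access_code):]
    # No country code detected: return the local number unchanged.
    return digits

print(to_plus_form("0049 123 4567"))
print(to_plus_form("+49 123 4567"))
```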

Related

Regex for valid SSN or other ID

I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999
I have also modified your first regex a bit; below is a demo program. This is based on my understanding of the problem. Let me know if any modification is needed.
use feature 'say';
my @ids = (
'123-45-6789',
'123456789',
'123 45 6789',
'1234567893434', # invalid
'123456789wwsd', # invalid
'aCe8999',
'aCe8999asa' # invalid
);
for (@ids) {
say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x ;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999
Your first regex has some problems. Most importantly, it accepts {{{{}}}}}, which means you have built a wrong character class: [\d{9}] matches a single character that is a digit, { or }. It also matches 123-45 6789 (notice the mixture of space and dash).
To express OR in a regular expression you use the pipe |, but remember that every token belongs to the side of the pipe on which it sits. So, for example, ^1|2$ matches strings that begin with 1 or end with 2, not just the two individual inputs 1 and 2.
To apply the exact match you need to do ^1$|^2$ or ^(1|2)$.
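The difference is easy to see by running both forms side by side. This is only an illustrative Python sketch; the names loose and strict are made up for the demo.

```python
import re

loose = re.compile(r"^1|2$")     # "starts with 1" OR "ends with 2"
strict = re.compile(r"^(1|2)$")  # exactly "1" or exactly "2"

for text in ["1", "2", "1abc", "abc2", "3"]:
    # Print whether each pattern finds a match in the text.
    print(repr(text), bool(loose.search(text)), bool(strict.search(text)))
```

Only the grouped form rejects 1abc and abc2.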
With the second regex, ^[a-zA-Z0-9]{7}$, you are not requiring a mixed alphanumeric ID of 7 characters; you are accepting any 7 characters that are digits, letters, or a mix of both. So it matches 1234567 too. If that is not a problem, the following regex fixes the issues described above:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
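A quick way to check this pattern against the inputs from the question is a small Python harness like the one below. The helper name is_ssn_or_id is hypothetical; the pattern itself is copied unchanged from the line above.

```python
import re

# Group 1 captures the separator (space, dash, or nothing) and \1 forces
# the second separator to be the same one.
SSN_OR_ID = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

def is_ssn_or_id(text: str) -> bool:
    return SSN_OR_ID.search(text) is not None

for candidate in ["123-45-6789", "123456789", "123 45 6789", "aCe8999",
                  "123-45 6789", "1234567893434", "aCe8999asa"]:
    print(candidate, is_ssn_or_id(candidate))
```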

IBAN Regex design [duplicate]

Please help me design a regex that matches IBANs written with any of the usual whitespace. I found the one below, but it does not work with whitespace.
[a-zA-Z]{2}[0-9]{2}[a-zA-Z0-9]{4}[0-9]{7}([a-zA-Z0-9]?){0,16}
I need to match at least these formats:
DE89 3704 0044 0532 0130 00
AT61 1904 3002 3457 3201
FR14 2004 1010 0505 0001 3
To just find the example IBANs from those countries in a text:
Start with 2 letters, then 2 digits.
Then allow an optional space before each group of 4 digits, optionally ending with 1 or 2 digits:
\b[A-Z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4}(?!(?:[ ]?[0-9]){3})(?:[ ]?[0-9]{1,2})?\b
Note that if the intention is to validate a complete string, the regex can be simplified,
since the negative look-ahead (?!...) is no longer needed.
The word boundaries \b can also be replaced by the start ^ and end $ of the line.
^[A-Z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4}(?:[ ]?[0-9]{1,2})?$
Also, it can be simplified even more if having the 4 groups of 4 connected digits doesn't really matter.
^[A-Z]{2}(?:[ ]?[0-9]){18,20}$
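To see how the two whole-string patterns differ, here is a small Python sketch that runs both against the three formats from the question plus one oddly spaced variant. The names GROUPED and LOOSE are made up for this demo.

```python
import re

# Four groups of four digits, each optionally preceded by a space.
GROUPED = re.compile(r"^[A-Z]{2}[0-9]{2}(?:[ ]?[0-9]{4}){4}(?:[ ]?[0-9]{1,2})?$")
# Any 18 to 20 digits, each optionally preceded by a space.
LOOSE = re.compile(r"^[A-Z]{2}(?:[ ]?[0-9]){18,20}$")

samples = [
    "DE89 3704 0044 0532 0130 00",
    "AT61 1904 3002 3457 3201",
    "FR14 2004 1010 0505 0001 3",
    "DE89 37 04 00 44 05 32 01 30 00",  # digits grouped in pairs
]
for s in samples:
    print(s, bool(GROUPED.search(s)), bool(LOOSE.search(s)))
```

The pair-grouped variant is accepted only by the looser pattern.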
Extra
What if you need to match IBANs from across the world?
Then the BBAN part of the IBAN is allowed to have up to 30 digits or uppercase letters.
It can also be written with spaces, dashes, or nothing in between.
For example: CC12-XXXX-12XX-1234-5678-9012-3456-7890-123
So the regex pattern to match a complete string with a long IBAN becomes a bit longer.
^([A-Z]{2}[ \-]?[0-9]{2})(?=(?:[ \-]?[A-Z0-9]){9,30}$)((?:[ \-]?[A-Z0-9]{3,5}){2,7})([ \-]?[A-Z0-9]{1,3})?$
Also note that a pure regex solution can't do calculations.
So actually validating an IBAN requires extra code.
Example JavaScript snippet:
function smellsLikeIban(str){
return /^([A-Z]{2}[ \-]?[0-9]{2})(?=(?:[ \-]?[A-Z0-9]){9,30}$)((?:[ \-]?[A-Z0-9]{3,5}){2,7})([ \-]?[A-Z0-9]{1,3})?$/.test(str);
}
function validateIbanChecksum(iban) {
const ibanStripped = iban.replace(/[^A-Z0-9]+/gi,'') //keep numbers and letters only
.toUpperCase(); //calculation expects upper-case
const m = ibanStripped.match(/^([A-Z]{2})([0-9]{2})([A-Z0-9]{9,30})$/);
if(!m) return false;
const numbericed = (m[3] + m[1] + m[2]).replace(/[A-Z]/g,function(ch){
//replace upper-case characters by numbers 10 to 35
return (ch.charCodeAt(0)-55);
});
//The resulting number would be too long for JavaScript to handle without losing precision.
//So the trick is to chop the string up in smaller parts.
const mod97 = numbericed.match(/\d{1,7}/g)
.reduce(function(total, curr){ return Number(total + curr)%97},'');
return (mod97 === 1);
};
var arr = [
'DE89 3704 0044 0532 0130 00', // ok
'AT61 1904 3002 3457 3201', // ok
'FR14 2004 1010 0505 0001 3', // wrong checksum
'GB82-WEST-1234-5698-7654-32', // ok
'NL20INGB0001234567', // ok
'XX00 1234 5678 9012 3456 7890 1234 5678 90', // only smells ok
'YY00123456789012345678901234567890', // only smells ok
'NL20-ING-B0-00-12-34-567', // stinks, but still a valid checksum
'XX22YYY1234567890123', // wrong checksum again
'droid#i.ban' // This Is Not The IBAN You Are Looking For
];
arr.forEach(function (str) {
console.log('['+ str +'] Smells Like IBAN: '+ smellsLikeIban(str));
console.log('['+ str +'] Valid IBAN Checksum: '+ validateIbanChecksum(str))
});
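For comparison, here is a hedged Python sketch of the same mod-97 check. Python integers have arbitrary precision, so the chunking trick is not needed there. The function name iban_checksum_ok is hypothetical, and, like the snippet above, it checks only the checksum, not country-specific lengths or formats.

```python
import re

def iban_checksum_ok(iban: str) -> bool:
    """Return True if the IBAN's mod-97 checksum equals 1."""
    # Keep letters and digits only; the calculation expects upper case.
    stripped = re.sub(r"[^A-Za-z0-9]+", "", iban).upper()
    m = re.fullmatch(r"([A-Z]{2})([0-9]{2})([A-Z0-9]{9,30})", stripped)
    if not m:
        return False
    # Move country code and check digits to the end.
    rearranged = m.group(3) + m.group(1) + m.group(2)
    # Replace letters by the numbers 10 to 35 (A=10 ... Z=35); digits stay as-is.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

print(iban_checksum_ok("DE89 3704 0044 0532 0130 00"))
print(iban_checksum_ok("GB82-WEST-1234-5698-7654-32"))
```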
Here is a suggestion that may work for the patterns you provided:
[A-Z]{2}\d{2} ?\d{4} ?\d{4} ?\d{4} ?\d{4} ?[\d]{0,2}
Explanation
[A-Z]{2}\d{2} ? 2 capital letters followed by 2 digits (optional space)
\d{4} ? 4 digits, repeated 4 times (optional space)
[\d]{0,2} 0 to 2 digits
You can use a regex like this:
^[A-Z]{2}\d{2} (?:\d{4} ){3}\d{4}(?: \d\d?)?$
This will match only those string formats
It's probably best to look up the specifications for a correct IBAN number. But if you want to have a regex similar to your existing one, but with spaces, you can use the following one:
^[a-zA-Z]{2}[0-9]{2}\s?[a-zA-Z0-9]{4}\s?[0-9]{4}\s?[0-9]{3}([a-zA-Z0-9]\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,4}\s?[a-zA-Z0-9]{0,3})?$
Here is a live example: https://regex101.com/r/ZyIPLD/1

Match phone numbers with lengths between 8-16 digits, ignoring ()+-

Consider the following:
+12 34 456 432
(12) 34 567 124
1234 56 78 90
(1234) 567 890
1234-567-890
1234 - 567 - 890
12 34 56 78
12-34-56-78
Assume these are all valid phone number structures
Can a regex be used to express: find at least 8 digits, but not more than 16, and ignore spaces, round brackets, the plus symbol (once) and the minus?
My current working sample is a mess:
^([\+|\(]{1,2})?+(\d{2,4})+([ |-|\)]{1,2})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})?$
Even if validating phone numbers is generally advised against, is there not a simpler regex syntax for these things?
To just count the digits and ignore -, ), ( or spaces (allowing a + at the beginning), you can use the following regex:
^\+?(?:[ ()-]*\d){8,16}$
It matches
^ - start of string
\+? - one or zero +
(?:[ ()-]*\d){8,16} - 8 to 16 sequences of...
[ ()-]* - 0 or more -, ), ( or a space characters
\d - a digit
$ - end of string
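The breakdown above can be checked against the sample numbers from the question with a short Python sketch. The helper name has_valid_digit_count is an assumption for illustration; the pattern is the one given in the answer.

```python
import re

# Optional leading "+", then 8 to 16 digits, each optionally preceded by
# spaces, parentheses or dashes.
PHONE_SHAPE = re.compile(r"^\+?(?:[ ()-]*\d){8,16}$")

def has_valid_digit_count(text: str) -> bool:
    return PHONE_SHAPE.search(text) is not None

samples = ["+12 34 456 432", "(12) 34 567 124", "1234 56 78 90", "(1234) 567 890",
           "1234-567-890", "1234 - 567 - 890", "12 34 56 78", "12-34-56-78"]
for s in samples:
    print(s, has_valid_digit_count(s))
```

Note that the pattern requires the string to end on a digit, so trailing punctuation is rejected.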
This may ease your task.
First, remove everything that is not a number:
myString = myString.replace(/\D/g,'');
You'll get this:
1234456432
1234567124
1234567890
1234567890
1234567890
1234567890
12345678
12345678
Then just check for length:
if(myString.length >= 8 && myString.length <= 16)
// Do stuff
Use preg_replace to keep only the digits, then check for a valid length:
<?php
$ph = "(12) 34 567 124";
$len = strlen(preg_replace('/[^0-9]+/', '', $ph));
if($len >=8 && $len <=16)
echo "Valid";
else
echo "Invalid";
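The same strip-then-count idea, sketched in Python for readers who want to test it quickly. The function name digit_count_ok and its low and high parameters are illustrative assumptions; the bounds 8 and 16 come from the question.

```python
import re

def digit_count_ok(text: str, low: int = 8, high: int = 16) -> bool:
    """Drop every non-digit, then check how many digits remain."""
    digits = re.sub(r"\D", "", text)
    return low <= len(digits) <= high

print(digit_count_ok("(12) 34 567 124"))
print(digit_count_ok("12 34 56"))
```

Unlike the single-regex answer, this accepts any punctuation at all between the digits, which may or may not be what you want.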
Don't even think about it. Phone numbers are complicated. They are hugely complicated. Google has a decent library for handling phone numbers, named libphonenumber.
And excuse me, but ignoring the "+" makes whatever you are doing totally, absolutely wrong. A plus is followed by the country code of some country, followed by a local phone number within that country (which needs to be parsed according to the rules of that country, and there are about 200). Without the "+", you have a phone number according to the local rules, and you need to find out which local rules apply. Which means your number can start with a code for dialing a foreign exchange instead of the "+", otherwise it is formatted according to local rules.
As a result, a number may be valid with the "+" and invalid without it or vice versa, and most likely refers to a different actual phone in totally different countries with or without the "+".

regular expression for bulgarian mobile phone numbers

Hello, I need to come up with this regular expression:
The telephone number should begin with 087 OR 088 OR 089, and then it should be followed by 7 digits.
This is what I made, but it doesn't work correctly: it accepts only numbers which begin with 089
(087)|(088)|(089)[0-9]{7}";
/08[789]\d{7}/
that will match 087xxxxxxx, 088xxxxxxx, 089xxxxxxx numbers.
Maybe /08[7-9][0-9]{7}/ is what you're searching for?
Autopsy:
08 - a literal 08
[7-9] - matches the numbers from 7-9 once
[0-9]{7} - matches the numbers from 0-9 repeated exactly 7 times
That said, you might prefer /^08[7-9][0-9]{7}$/ if your string is only the phone number. (^ means "the string MUST start here" and $ means "the string MUST end here").
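A minimal Python check of the anchored pattern, assuming the input string holds nothing but the phone number. The helper name is_bg_mobile is made up for the example.

```python
import re

# 08, then 7, 8 or 9, then exactly seven more digits, and nothing else.
BG_MOBILE = re.compile(r"^08[7-9][0-9]{7}$")

def is_bg_mobile(text: str) -> bool:
    return BG_MOBILE.search(text) is not None

for number in ["0871234567", "0881234567", "0891234567", "0861234567", "08812345678"]:
    print(number, is_bg_mobile(number))
```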
Actually, this would be a far better regex for Bulgarian phone numbers:
/(\+)?(359|0)8[789]\d{1}(|-| )\d{3}(|-| )\d{3}/
It checks:
Phones that start with country code(+359) or 0 instead;
if the phone number use delimiters like - or space.
I tried it on https://regex101.com and it did not work against my test set, so I tweaked it a little. Note that the alternation has to sit inside the group, otherwise ^ and $ each apply to only one side of the |:
^([+]?359|0?)(|-| )8[789]\d{1}(|-| )\d{3}(|-| )\d{3}$

How to find numbers and exclude any in parentheses using regex

I'm trying to write a regex pattern that finds numbers with two leading zeros in a string and replaces those zeros with a single 0. The problem is that I want to ignore numbers in parentheses, and I can't figure out how to do this.
For example, with the string:
Somewhere 001 (2009)
I want to return:
Somewhere 01 (2009)
I can search by using [00] to find the first 00, and replace with 0 but the problem is that (2009) becomes (209) which I don't want. I thought of just doing a replace on (209) with (2009) but the strings I'm trying to fix could have a valid (209) in it already.
Any help would be appreciated!
Search one non digit (or start of line) followed by two zeros followed by one or more digits.
([^0-9]|^)00[0-9]+
What if the number has three leading zeros? How many zeros do you want it to have after the replacement? If you want to catch all leading zeros and replace them with just one:
([^0-9]|^)00+[0-9]+
Ideally, you'd use negative look behind, but your regex engine may not support it. Here is what I would do in JavaScript:
string.replace(/(^|[^(\d])00+/g,"$10");
That will replace any string of zeros that is not preceded by parenthesis or another digit. Change the character class to [^(\d.] if you're also working with decimal numbers.
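In an engine that does support negative lookbehind, the capture group is not needed. Here is a sketch in Python, whose re module accepts this fixed-width lookbehind; the function name collapse_leading_zeros is hypothetical.

```python
import re

def collapse_leading_zeros(text: str) -> str:
    """Replace a run of two or more zeros with one zero, unless the run
    is preceded by "(" or by another digit."""
    return re.sub(r"(?<![(\d])00+", "0", text)

print(collapse_leading_zeros("Somewhere 001 (2009)"))
```

Because the lookbehind consumes nothing, the replacement is just "0" and there is no group reference to get wrong.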
?Regex.Replace("Somewhere 001 (2009)", " 00([0-9]+) ", " 0$1 ")
"Somewhere 01 (2009)"