Checking phone numbers for equivalence - regex

What's the best way to check phone numbers in different formats for equivalence?
Different formats:
"(708) 399 7222"
"7083997222"
"708-399-7222"
"708399-7222"
"+1 (708) 399-7222"
"+1 (708)399-7222"
Additional Difficulty: what if the phone number isn't prefaced by the country code?

You can't use a single regular expression. To get a canonical representation that can be compared:
Replace an initial + with your international call prefix. In many countries this is 00.
If the number doesn't start with the prefix, add the prefix and the country code for your country.
Remove all non-digits.
This will be sufficient if you only have to deal with calls made from a single country, for example if you are developing something for internal use at a phone company. If you have to accept a wide range of inputs from different countries with various prefixes I suggest finding a well tested library to do this.
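The three steps above can be sketched in Python as follows. This is only a minimal sketch: the function name `canonical_phone` and its `intl_prefix` and `country_code` parameters are hypothetical, the defaults (`00` and `1`) are assumptions to be replaced with the values for your own country, and national trunk prefixes (such as a leading 0) are not handled.

```python
import re

def canonical_phone(raw, intl_prefix="00", country_code="1"):
    """Return a digits-only form of `raw` that can be compared for equality.

    intl_prefix and country_code are assumed defaults, not universal values.
    """
    s = raw.strip()
    if s.startswith("+"):
        # Step 1: replace an initial + with the international call prefix.
        s = intl_prefix + s[1:]
    elif not s.startswith(intl_prefix):
        # Step 2: no prefix present, so add the prefix and the country code.
        s = intl_prefix + country_code + s
    # Step 3: remove all non-digits.
    return re.sub(r"\D", "", s)

samples = [
    "(708) 399 7222",
    "7083997222",
    "708-399-7222",
    "708399-7222",
    "+1 (708) 399-7222",
    "+1 (708)399-7222",
]
print({canonical_phone(s) for s in samples})  # {'0017083997222'}
```

With these assumed defaults, all six formats from the question collapse to the same string, so a plain string comparison is enough.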

You can implement a PhoneEqualityComparer class. If you deal only with US numbers, the following code will work:
using System.Collections.Generic;
using System.Text.RegularExpressions;

sealed class PhoneEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(NormalizePhone(x), NormalizePhone(y));
    }

    public int GetHashCode(string obj)
    {
        return NormalizePhone(obj).GetHashCode();
    }

    // Drop a leading "+1" country code, then strip every non-digit character.
    private static string NormalizePhone(string phone)
    {
        if (phone.StartsWith("+1"))
            phone = phone.Substring(2);
        return Regex.Replace(phone, @"\D", string.Empty);
    }
}
Sample usage:
string[] phones = new[] {
"(708) 399 7222",
"7083997222",
"708-399-7222",
"708399-7222",
"+1 (708) 399-7222",
"+1 (708)399-7222"
};
string[] uniquePhones = phones.Distinct(new PhoneEqualityComparer()).ToArray();

Try this regex
\(?\d{3}\)?-? *\d{3}-? *-?\d{4}
Or:
^\+?(\d[\d-]+)?(\([\d-.]+\))?[\d-]+\d$

This is a pretty neat regex that will replace:
any plus sign followed by one or more digits
any character that is not a digit or a line break character
with a blank string
string text = Regex.Replace(inputString, @"\+\d+|[^0-9\r\n]", "", RegexOptions.None | RegexOptions.Multiline);
Input:
"(708) 399 7222"
"7083997222"
"708-399-7222"
"708399-7222"
"+1 (708) 399-7222"
"+1 (708)399-7222"
Output:
7083997222
7083997222
7083997222
7083997222
7083997222
7083997222
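One caveat with the pattern above: `\+\d+` consumes every digit that directly follows the plus sign, so an input written without a separator after the country code, such as `+17083997222`, loses all of its digits. A hedged alternative, sketched here in Python and assuming North American numbers only (ten digits with an optional leading country code 1), strips non-digits first and then drops the country code. The name `normalize_us` is made up for illustration.

```python
import re

ORIGINAL = r"\+\d+|[^0-9\r\n]"  # the pattern from the answer above

def normalize_us(raw):
    """Return the ten-digit national number, assuming a North American input."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code 1
    return digits

print(re.sub(ORIGINAL, "", "+1 (708) 399-7222"))   # 7083997222
print(repr(re.sub(ORIGINAL, "", "+17083997222")))  # '' (every digit was removed)
print(normalize_us("+17083997222"))                # 7083997222
```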


Using regex to capture phone numbers with spaces inserted at differing points

I want to be able to extract a complete phone number from text, irrespective of how many spaces interrupt the number.
For example in the passage:
I think Emily was her name, and that her number was either 0421032614 or 0423 032 615 or 04321 98 564
I would like to extract:
0421032614
0423032615
0432198564
I can extract the first two using
(\d{4}[\s]?)(\d{3}[\s]?)+
But this is contingent on me knowing ahead of time how the ten numbers will be grouped (i.e. where the spaces will be). Is there any way to capture the ten numbers with a more flexible pattern?
You need to remove all white space then run a for loop and iterate through the groups:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumbers {
    public static void main(String[] args) {
        String reg = "(\\d{10})";
        String word = "I think Emily was her name, and that her number was either 0421032614 or 0423 032 615 or 04321 98 564";
        word = word.replaceAll("\\s+", ""); // replace all the whitespace with nothing
        Pattern pat = Pattern.compile(reg);
        Matcher mat = pat.matcher(word);
        while (mat.find()) {
            // print every capture group of the current match (here only group 1)
            for (int i = 1; i <= mat.groupCount(); i++) {
                System.out.println(mat.group(i));
            }
        }
    }
}
output is
0421032614
0423032615
0432198564
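Stripping all whitespace from the whole text first can also fuse unrelated digit runs that happen to sit next to each other. An alternative sketch in Python matches ten digits that may each be separated by at most one whitespace character, and removes the whitespace only inside each match. The pattern and the helper name `extract_numbers` are illustrative assumptions, not part of the answer above.

```python
import re

# One digit, then nine more digits, each optionally preceded by a single
# whitespace character. The lookarounds stop a match from starting or
# ending in the middle of a longer run of digits.
NUMBER = re.compile(r"(?<!\d)\d(?:\s?\d){9}(?!\d)")

def extract_numbers(text):
    """Return every ten-digit number in `text` with inner whitespace removed."""
    return [re.sub(r"\s", "", m.group()) for m in NUMBER.finditer(text)]

passage = ("I think Emily was her name, and that her number was either "
           "0421032614 or 0423 032 615 or 04321 98 564")
print(extract_numbers(passage))  # ['0421032614', '0423032615', '0432198564']
```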

Angular 2 Pattern Validator reporting invalid incorrectly...? [duplicate]

This question already has answers here:
Why do regex constructors need to be double escaped?
I'm working with a form for US phone numbers:
let phoneRegEx = '^\([1-9]{3}\)\s{1}[0-9]{3}-[0-9]{4}$';
class PhoneFormComponent {
...
buildForm(): void {
this.form = this.formBuilder.group({
number: [
this.phone ? this.phoneNumberPipe.transform(this.phone.number) : '', [
Validators.required,
Validators.pattern(phoneRegEx),
]
]
});
}
}
I also have a Pipe that formats the phone number to my liking:
@Pipe({name: 'phoneNumber'})
export class PhoneNumberPipe implements PipeTransform {
transform(value: string): string {
if (!value) return value;
let areaCode = value.substring(0, 3);
let prefix = value.substring(3, 6);
let number = value.substring(6);
return `(${areaCode}) ${prefix}-${number}`;
}
}
Phone numbers are stored as a string with no spaces or formatting in the API that I am required to access, so I'm leveraging the pipe to format the phone number as provided by the service when populating the form, and then matching the same pattern for input. I strip out the formatting when supplying the value back to the API.
According to regex101.com, my regex should be working as expected: https://regex101.com/r/9TRysJ/1, e.g. (111) 111-1111 should be valid.
However, my Angular form always reports that the value for the phone field is invalid. What am I doing wrong?
I don't know Angular, but this regex should work:
\((\d{3})\)\s(\d{3})\-(\d{4})
Result:
Full match 0-14 `(123) 456-7890`
Group 1. 1-4 `123`
Group 2. 6-9 `456`
Group 3. 10-14 `7890`
This captures each part of the number in its own group.
You can then use the replacement:
($1) $2-$3
Explanation:
$1 = 123
$2 = 456
$3 = 7890
Note: $0 => Full Match (123) 456-7890

Regex for custom parsing

Regex isn't my strongest point. Let's say I need a custom parser for strings which strips the string of any letters and multiple decimal points and alphabets.
For example, input string is "--1-2.3-gf5.47", the parser would return
"-12.3547".
I could only come up with variations of this :
string.replaceAll("[^(\\-?)(\\.?)(\\d+)]", "")
which removes the alphabets but retains everything else. Any pointers?
More examples:
Input: -34.le.78-90
Output: -34.7890
Input: df56hfp.78
Output: 56.78
Some rules:
Consider only the first negative sign before the first number, everything else can be ignored.
I'm trying to do this using Java.
Assume the -ve sign, if there is one, will always occur before the
decimal point.
Just tested this on ideone and it seemed to work. The comments should explain the code well enough. You can copy/paste this into Ideone.com and test it if you'd like.
It might be possible to write a single regex pattern for it, but you're probably better off implementing something simpler/more readable like below.
The three examples you gave print out:
--1-2.3-gf5.47 -> -12.3547
-34.le.78-90 -> -34.7890
df56hfp.78 -> 56.78
import java.util.*;
import java.lang.*;
import java.io.*;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
System.out.println(strip_and_parse("--1-2.3-gf5.47"));
System.out.println(strip_and_parse("-34.le.78-90"));
System.out.println(strip_and_parse("df56hfp.78"));
}
public static String strip_and_parse(String input)
{
//remove anything not a period or digit (including hyphens) for output string
String output = input.replaceAll("[^\\.\\d]", "");
//add a hyphen to the beginning of 'output' if the original string started with one
if (input.startsWith("-"))
{
output = "-" + output;
}
//if the string contains a decimal point, remove all but the first one by splitting
//the output string into two strings and removing all the decimal points from the
//second half
if (output.indexOf(".") != -1)
{
output = output.substring(0, output.indexOf(".") + 1)
+ output.substring(output.indexOf(".") + 1, output.length()).replaceAll("[^\\d]", "");
}
return output;
}
}
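The code above keeps the minus sign only when the input itself starts with one. The rule in the question ("the first negative sign before the first number") could be read a little more broadly, so the following Python sketch keeps a sign whenever a hyphen appears anywhere before the first digit. The function name `strip_and_parse_py` is hypothetical, and this reading of the rule is an assumption.

```python
import re

def strip_and_parse_py(text):
    """Keep digits, at most one leading '-', and only the first '.'."""
    first_digit = re.search(r"\d", text)
    # A sign counts only if a hyphen occurs before the first digit.
    sign = "-" if first_digit and "-" in text[:first_digit.start()] else ""
    cleaned = re.sub(r"[^\d.]", "", text)  # drop everything but digits and dots
    head, dot, tail = cleaned.partition(".")  # split at the first dot
    return sign + head + dot + tail.replace(".", "")  # remove later dots

for example in ("--1-2.3-gf5.47", "-34.le.78-90", "df56hfp.78"):
    print(example, "->", strip_and_parse_py(example))
```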
In terms of regex, the secondary, tertiary, etc., decimals seem tough to remove. However, this one should remove the additional dashes and alphas: (?<=.)-|[a-zA-Z]. (Hopefully the syntax is the same in Java; this is a Python regex but my understanding is that the language is relatively uniform).
That being said, it seems like you could just run a pretty short "finite state machine"-type piece of code to scan the string and rebuild the reduced string yourself like this:
a = "--1-2.3-gf5.47"
new_a = ""
dash = False
dot = False
nums = '0123456789'
for char in a:
    if char in nums:
        new_a = new_a + char  # record a match to nums
        dash = True  # since we saw a number first, turn on the dash flag, we won't use any dashes from now on
    elif char == '-' and not dash:
        new_a = new_a + char  # if we see a dash and haven't seen anything else yet, we append it
        dash = True  # activate the flag
    elif char == '.' and not dot:
        new_a = new_a + char  # take the first dot
        dot = True  # put up the dot flag
(Again, sorry for the syntax; I think you need some curly brackets around the statements in Java, versus Python's indentation-only style.)

regex how can I split this word?

I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a way to regex in a way that will be more dynamic? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol - make sure this is fine with the way you use this type of thing
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source.substring(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
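For comparison, the same idea can be sketched in Python using zero-width boundaries instead of captured letters. This is an illustrative sketch only: the names `BOUNDARY` and `split_camel` are made up, and the pattern assumes plain ASCII letters.

```python
import re

# Zero-width positions where a space belongs:
#   1. between a lowercase letter and an uppercase letter, or
#   2. inside a run of capitals, just before a capital that starts a new
#      word (a capital followed by a lowercase letter).
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def split_camel(phrase):
    """Insert spaces at word boundaries and capitalize the first letter."""
    spaced = BOUNDARY.sub(" ", phrase)
    return spaced[:1].upper() + spaced[1:]

print(split_camel("thisIsAnExampleSentance"))  # This Is An Example Sentance
print(split_camel("givenProductUPCSymbol"))    # Given Product UPC Symbol
```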
For the space problem it's easy if your language supports zero-width-look-behind
var result = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support them
var result2 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that supports $1...$n in the replacement string.
But you can't do the lowercase-to-uppercase conversion directly in the regex. You can match the first character with a pattern like ^[a-z], but you can't convert it.
For example in C# you could do
var result3 = Regex.Replace(result, "^([a-z])", m =>
{
    return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(@"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
    if (m.Groups[1].Success)
    {
        return m.ToString().ToUpperInvariant();
    }
    else
    {
        return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
    }
});
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;

Regular Expression to find numbers with same digits in different order

I have been looking for a regular expression with Google for an hour or so now and can't seem to work this one out :(
If I have a number, say:
2345
and I want to find any other number with the same digits but in a different order, like this:
2345
For example, I match
3245 or 5432 (same digits but different order)
How would I write a regular expression for this?
There is an "elegant" way to do it with a single regex:
^(?:2()|3()|4()|5()){4}\1\2\3\4$
will match the digits 2, 3, 4 and 5 in any order. All four are required.
Explanation:
(?:2()|3()|4()|5()) matches one of the numbers 2, 3, 4, or 5. The trick is now that the capturing parentheses match an empty string after matching a number (which always succeeds).
{4} requires that this happens four times.
\1\2\3\4 then requires that all four backreferences have participated in the match - which they do if and only if each number has occurred once. Since \1\2\3\4 matches an empty string, it will always match as long as the previous condition is true.
For five digits, you'd need
^(?:2()|3()|4()|5()|6()){5}\1\2\3\4\5$
etc...
This will work in nearly any regex flavor except JavaScript.
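Since the pattern grows mechanically with the number of digits, it can be generated. The Python sketch below builds it for any string of distinct digits; the function name `permutation_regex` is hypothetical, and the construction assumes that no digit repeats, because each empty group can only record that its digit was seen at least once.

```python
import re

def permutation_regex(digits):
    """Build the empty-group pattern for a string of distinct digits."""
    if len(set(digits)) != len(digits):
        raise ValueError("this construction assumes all digits are distinct")
    # One alternative per digit, each followed by an empty capturing group.
    alternatives = "|".join(f"{re.escape(d)}()" for d in digits)
    # One backreference per group; each fails if its group never took part.
    backrefs = "".join(f"\\{i}" for i in range(1, len(digits) + 1))
    return re.compile(f"^(?:{alternatives}){{{len(digits)}}}{backrefs}$")

pattern = permutation_regex("2345")
print(pattern.pattern)
for candidate in ("3245", "5432", "2245", "23455"):
    print(candidate, bool(pattern.match(candidate)))
```

For the input "2345" the generated pattern text is the same as the hand-written one above.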
I don't think a regex is appropriate. So here is an idea that is faster than a regex for this situation:
check string lengths, if they are different, return false
make a hash from the character (digits in your case) to integers for counting
loop through the characters of your first string:
increment the counter for that character: hash[character]++
loop through the characters of the second string:
decrement the counter for that character: hash[character]--
break if any count is negative (or nonexistent)
loop through the entries, making sure each is 0:
if all are 0, return true
else return false
EDIT: Java Code (I'm using Character for this example, not exactly Unicode friendly, but it's the idea that matters now):
import java.util.*;
public class Test
{
public boolean isSimilar(String first, String second)
{
if(first.length() != second.length())
return false;
HashMap<Character, Integer> hash = new HashMap<Character, Integer>();
for(char c : first.toCharArray())
{
if(hash.get(c) != null)
{
int count = hash.get(c);
count++;
hash.put(c, count);
}
else
{
hash.put(c, 1);
}
}
for(char c : second.toCharArray())
{
if(hash.get(c) != null)
{
int count = hash.get(c);
count--;
if(count < 0)
return false;
hash.put(c, count);
}
else
{
return false;
}
}
for(Integer i : hash.values())
{
if(i.intValue()!=0)
return false;
}
return true;
}
public static void main(String ... args)
{
//tested to print false
System.out.println(new Test().isSimilar("23445", "5432"));
//tested to print true
System.out.println(new Test().isSimilar("2345", "5432"));
}
}
This will also work for comparing letters or other character sequences, like "god" and "dog".
Put the digits of each number in two arrays, sort the arrays, find out if they hold the same digits at the same indices.
RegExes are not the right tool for this task.
You could do something like this to ensure the right characters and length
[2345]{4}
Ensuring each digit occurs exactly once is trickier, which is why this task is not well suited to regexes. With anchors and one lookahead per digit:
^(?=.*2)(?=.*3)(?=.*4)(?=.*5)[2345]{4}$
The simplest regular expression is just all 24 permutations added up via the or operator:
/2345|3245|5432|.../;
That said, you don't want to solve this with a regex if you can get away with it. A single pass through the two numbers as strings is probably better:
1. Check the string length of both strings - if they're different you're done.
2. Build a hash of all the digits from the number you're matching against.
3. Run through the digits in the number you're checking. If you hit a match in the hash, mark it as used. Keep going until you don't get an unused match in the hash or run out of items.
I think it's very simple to achieve if you're OK with matching a number that doesn't use all of the digits. E.g. if you have a number 1234 and you accept a match with the number of 1111 to return TRUE;
Let me use PHP for an example as you haven't specified what language you use.
$my_num = 1245;
$my_pattern = '/[' . $my_num . ']{4}/'; // this resolves to pattern: /[1245]{4}/
$my_pattern2 = '/[' . $my_num . ']+/'; // as above but numbers can be of any length
$number1 = 4521;
$match = preg_match($my_pattern, $number1); // will return TRUE
$number2 = 2222444111;
$match2 = preg_match($my_pattern2, $number2); // will return TRUE
$number3 = 888;
$match3 = preg_match($my_pattern, $number3); // will return FALSE
$match4 = preg_match($my_pattern2, $number3); // will return FALSE
Something similar will work in Perl as well.
Regular expressions are not appropriate for this purpose. Here is a Perl script:
#!/usr/bin/perl
use strict;
use warnings;
my $src = '2345';
my @test = qw( 3245 5432 5542 1234 12345 );
my $canonical = canonicalize( $src );
for my $candidate ( @test ) {
    next unless $canonical eq canonicalize( $candidate );
    print "$src and $candidate consist of the same digits\n";
}
sub canonicalize { join '', sort split //, $_[0] }
Output:
C:\Temp> ks
2345 and 3245 consist of the same digits
2345 and 5432 consist of the same digits