Regular expression for dividing country calling codes

I have a list of calling codes (the phone number prefixes) for all countries, and I would like to split each entry into the country name and the actual code so I can put them into an XML file.
I have tried back and forth but cannot get a regexp going that takes all cases into account.
I think it is fairly simple for someone with a bit of experience.
The codes have these formats:
Afghanistan 93
Anguilla 1 264
Antarctica 6721
Antigua and Barbuda 1 268
Bosnia and Herzegovina 387
Canada 1
Congo, Republic of the 242
Cote d'Ivoire 225
Ireland (Eire) 353
United States of America 1
There are around 235 of them in total, but these are the regulars and the exceptions.
^[a-zA-Z\s,'()]+ for the name (one or more words), and then [0-9\s]{1,5}$ for the number, which has one of these formats (X = digit):
X
XX
XXX
XXXX
X XXX
Expressed as a sentence: "from the beginning of a line, take all characters (1), including space, comma, apostrophe and parentheses, until you encounter digits; then take all of those, including spaces (2), until you encounter a line break."
I am using TextMate, and the docs say:
TextMate uses the Oniguruma regular
expression library by K. Kosako.
I would appreciate any help given:)
Thank you.

This POSIX regex should be sufficient: ^[a-zA-Z ]+[0-9 ]+$
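The answer's character class has no comma, apostrophe or parentheses, so entries such as "Congo, Republic of the 242", "Cote d'Ivoire 225" and "Ireland (Eire) 353" would not match it, and it has no groups to separate the name from the code. A minimal JavaScript sketch, assuming names use only ASCII letters, spaces and those punctuation marks, and using a made-up `<country name="..." code="..."/>` element for the XML (`splitCallingCode`, `escapeXml` and `toXml` are hypothetical helpers):

```javascript
// Name: letters, spaces, commas, apostrophes, parentheses (lazy, so the space
// before the code goes to \s+). Code: a digit, then digits and spaces.
const LINE_RE = /^([A-Za-z ,'()]+?)\s+(\d[\d ]*)$/;

function splitCallingCode(line) {
  const m = LINE_RE.exec(line.trim());
  if (m === null) return null; // line does not fit the expected format
  return { name: m[1], code: m[2] };
}

// Escape the characters that are special inside an XML attribute value.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical output format: one <country> element per input line.
function toXml(lines) {
  const rows = lines
    .map(splitCallingCode)
    .filter((r) => r !== null)
    .map((r) => `  <country name="${escapeXml(r.name)}" code="${r.code}"/>`);
  return ["<countries>", ...rows, "</countries>"].join("\n");
}

console.log(toXml(["Anguilla 1 264", "Cote d'Ivoire 225"]));
```

Oniguruma also supports the lazy `+?` quantifier, so the pattern itself should carry over to TextMate's search, with `$1` and `$2` holding the name and the code.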

Related

Regular expression for this problem (extracting string between strings)

I'm working on a project, and unfortunately the data extracted from another program needs more formatting. Take a look at this line:
Instructor : 95371 XXX XXX XXX Associate Professor Course Name EE 311 Microprocessors lecture 834 1 32 3 3 1 08:00 AM - 08:50 AM 1 09:00 AM - 09:50 AM 3 10:00 AM - 10:50 AM 21 Total : 3 Section Position : Serial Campus Hrs Weekly Activity Semester: Time Schedule Type : 411 Reg. Regular First Semester 41/42 Rank : Course
Each line must start with Instructor, followed by : and an ID. The name may not be available. After that, the teacher's rank is one of the following:
Associate Professor
Assistant Professor
Lecturer
Teacher
Teaching Assistant
After the word lecture, exercise or practical there are six numbers; I need to extract the first one from the right.
Could you please suggest a regular expression to start with? Solutions using the Qt library are welcome.
This regex will match your text and extract the value as group 1:
Instructor :\s*\d+\s+(?:\w+(?: \w+)*)\s+(?:Associate Professor|Assistant Professor|Lecturer|Teacher|Teaching Assistant)\s+Course Name\s+\w+ \d+\s+\w+(?: \w+)*\s+(?:lecture|exercise|practical)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+\d{2}:\d{2} (?:AM|PM) - \d{2}:\d{2} (?:AM|PM)\s+\d\s+\d{2}:\d{2} (?:AM|PM) - \d{2}:\d{2} (?:AM|PM)\s+\d\s+\d{2}:\d{2} (?:AM|PM) - \d{2}:\d{2} (?:AM|PM)\s+\d+\s+Total : \d\s+Section Position : \s+Serial\s+Campus\s+Hrs Weekly\s+Activity\s+Semester:\s+Time\s+Schedule Type : \d+ Reg\.\s+Regular First Semester \d{2}\/\d{2}\s+Rank :\s+Course\s+
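Rather than spelling out the whole line, the three pieces the question asks for can be pulled out independently, which also tolerates a missing name. A JavaScript sketch under that assumption (`parseInstructorLine` is a hypothetical helper, and the sample is a shortened copy of the line in the question); the same patterns should also work with Qt's PCRE-based `QRegularExpression`, with backslashes doubled inside C++ string literals:

```javascript
// Instructor ID right after "Instructor :".
const ID_RE = /^Instructor\s*:\s*(\d+)/;
// One of the ranks listed in the question.
const RANK_RE = /\b(Associate Professor|Assistant Professor|Lecturer|Teacher|Teaching Assistant)\b/;
// Skip five numbers after the activity word and capture the sixth,
// i.e. the first one counted from the right.
const ACTIVITY_RE = /\b(?:lecture|exercise|practical)\s+(?:\d+\s+){5}(\d+)\b/;

function parseInstructorLine(line) {
  const id = ID_RE.exec(line);
  const activity = ACTIVITY_RE.exec(line);
  if (!id || !activity) return null; // not a line in the expected format
  const rank = RANK_RE.exec(line);
  return {
    id: id[1],
    rank: rank ? rank[1] : null,
    lastNumber: activity[1],
  };
}

const sample =
  "Instructor : 95371 XXX XXX XXX Associate Professor Course Name EE 311 " +
  "Microprocessors lecture 834 1 32 3 3 1 08:00 AM - 08:50 AM";
console.log(parseInstructorLine(sample));
```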

Regex (Posix) to get first word only, not including numbers

New to regex (which was recently added to SQL in DB2 for i). I don't know anything about the different engines, but research indicates that it is "based on POSIX extended regular expressions".
I would like to get the street name (first non-numeric word) from an address.
e.g.
101 Main Street = Main
2/b Pleasant Ave = Pleasant
5H Unpleasant Crescent = Unpleasant
I'm sorry I don't have a non-working pattern to show, as the forum software suggests. I don't even know where to start. I tried a few things I found by searching, but they either yielded nothing or the first "word", i.e. the number (101, 2/b, 5H).
Thanks
Edit: Although it's looking as if IBM's implementation of regex on the DB2 family of databases may be too alien for many of the resident experts, I'll press ahead with some more detail in case it helps.
A plain English statement of the requirement would be:
Basic/acceptable: Find the first word/unbroken string that contains no numbers or special characters
Advanced/ideal: Find the first word that contains three or more characters, being only letters and zero or one embedded dash/hyphen, but no numbers or other characters.
Additional examples (original ones at top are still valid)
190 - 192 Tweety-bird avenue = Tweety-bird
190-192 Tweety-bird avenue = Tweety-bird
Charles Bronson Place = Charles
190H Charles-Bronson Place = Charles-Bronson
190 to 192 Charles Bronson Place = Charles
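The "advanced/ideal" rule above can be pinned down as a single pattern. The JavaScript sketch below is one reading of it (a whitespace-delimited token of at least three characters, letters only, with at most one embedded hyphen); it relies on lookahead, which is not part of POSIX ERE, so whether it carries over to `REGEXP_SUBSTR` depends on the engine DB2 actually uses. `streetName` is a hypothetical helper:

```javascript
// (?:^|\s)                  start of string, or the space before the token
// (?=\S{3})                 the token is at least three characters long
// [A-Za-z]+(?:-[A-Za-z]+)?  letters, with at most one embedded hyphen
// (?=\s|$)                  the token ends at a space or the end of the string
const STREET_RE = /(?:^|\s)(?=\S{3})([A-Za-z]+(?:-[A-Za-z]+)?)(?=\s|$)/;

function streetName(address) {
  const m = STREET_RE.exec(address);
  return m ? m[1] : null; // group 1 excludes the leading space
}

const examples = [
  "101 Main Street",
  "2/b Pleasant Ave",
  "190 - 192 Tweety-bird avenue",
  "190 to 192 Charles Bronson Place",
];
for (const a of examples) console.log(`${a} = ${streetName(a)}`);
```

The length check is what skips "to" in "190 to 192 Charles Bronson Place"; without it the basic rule would return "to".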
Second Edit:
Mooching around on the internet and trying every vaguely connected expression that I could find, I stumbled on this one:
[a-zA-Z]+(?:[\s-][a-zA-Z]+)*
which actually works pretty well - it gives the street name and street type, which on reflection would actually suit my purpose as well as the street name alone (I can easily expand common abbreviations - e.g. RD to ROAD - on the fly).
Sample SQL:
select HAD1,
regexp_substr(HAD1, '[a-zA-Z]+(?:[\s-][a-zA-Z]+)*')
from ECH
where HEDTE > 20190601
Sample output:
Ship To Address Line 1 | REGEXP_SUBSTR
32 CHRISTOPHER STREET | CHRISTOPHER STREET
250 - 270 FEATHERSTON STREET | FEATHERSTON STREET
118 MONTREAL STREET | MONTREAL STREET
7 BIRMINGHAM STREET | BIRMINGHAM STREET
59 MORRISON DRIVE | MORRISON DRIVE
118 MONTREAL STREET | MONTREAL STREET
MASON ROAD | MASON ROAD
I know this wasn't exactly the question I asked, so apologies to anyone who could have done this but was following the original request faithfully.
Not sure if this is POSIX compliant, but something like this could work: ^[\w\/]+?\s((\w+\s)+?)\s*\w+?$
The regex assumes that the first chunk is the building number, the second chunk is the street name, and the last chunk is Road/Ave/Blvd/etc.
This should also cater for street names which have white spaces in them.
The following regex matches your examples:
(?<=[^ ]+ )[^ ]*[ ]

phone number RegEx not working for some strings

I want to recognize phone numbers as 9 consecutive digits, which can be separated by white spaces, non-breaking spaces, etc., using the regex "(\s*\d\s*){9}".
I run a VBA macro (JS-style RegEx), and here are example strings that work fine with the above regex:
ul. 27 Grudnia 16, tel. 21 287 31 61, fax 61 286 69 60 –
ul. Wrzosowa 110/120/222, kom. 692 601 428
And here is an example where the phone number is not detected in VBA, but is detected by online JS regex tools:
al. Mazowieckiego 63, kom. 622 769 694 –
Strings that are detected and those that are not have the same structure, so I have no idea why VBA doesn't detect the phone number in some of them.
It turned out that VBA had changed some of the strings being searched: a regular space, chr(32), had been replaced with a non-breaking space, chr(160).
Removing chr(160) from the string being searched solves the problem.
I will also try to find a regex that allows non-breaking spaces, because \s* doesn't match them, at least in VBA.
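A JavaScript sketch of the second option, allowing non-breaking spaces in the pattern itself. `findPhone` is a hypothetical helper; U+00A0 is the character that chr(160) produces here. In JavaScript `\s` already matches U+00A0, so listing it explicitly only matters for engines where it does not, as observed in VBA:

```javascript
// Nine digits, each optionally surrounded by whitespace or U+00A0
// (the non-breaking space that chr(160) produces).
const PHONE_RE = /(?:[\s\u00A0]*\d[\s\u00A0]*){9}/;

function findPhone(text) {
  const m = PHONE_RE.exec(text);
  // Keep only the digits of the first match, or return null if none.
  return m ? m[0].replace(/\D/g, "") : null;
}

console.log(findPhone("al. Mazowieckiego 63, kom. 622\u00A0769\u00A0694 \u2013")); // 622769694
```

For VBA itself the same idea applies: either add the character to the class in that engine's syntax or replace chr(160) before matching, as described above.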

Italian phone 10-digit number regex issue

I'm trying to use the regex from this site
/^([+]39)?((38[{8,9}|0])|(34[{7-9}|0])|(36[6|8|0])|(33[{3-9}|0])|(32[{8,9}]))([\d]{7})$/
for Italian mobile phone numbers, but a simple number such as 3491234567 is reported as invalid.
(I don't care about spaces, as I'll trim them.)
Should pass:
349 1234567
+39 349 1234567
TODO: 0039 349 1234567
TODO: (+39) 349 1234567
TODO: (0039) 349 1234567
regex101 and regexr both pass the validation. What's wrong?
UPDATE:
To clarify:
The regex should match any number that starts with either
388/389/380 (38[{8,9}|0])|
or
347/348/349/340 (34[{7-9}|0])|
or
366/368/360 (36[6|8|0])|
or
333/334/335/336/337/338/339/330 (33[{3-9}|0])|
328/329 (32[{8,9}])
plus 7 digits ([\d]{7})
and the +39 at the start optionally ([+]39)?
The following regex appears to fulfill your requirements. I took out the syntax errors and guessed a bit, and added the missing parts to cover your TODO comments.
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[7-90]|36[680]|33[3-90]|32[89])\d{7}$
Demo: https://regex101.com/r/yF7bZ0/1
Your test cases fail to cover many of the variations captured by the regex; perhaps you'll want to beef up the test set to make sure it does what you want.
The beginning allows for an optional international prefix with or without the parentheses. The basic pattern is (00|\+)39 and it is repeated with or without parentheses around it. (Perhaps a better overall approach would be to trim parentheses and punctuation as well as whitespace before processing begins; you'll want to keep the plus as significant, of course.)
Updated with information from Edoardo's answer; wrapped for legibility, with comments added (this layout needs an engine's free-spacing/extended mode, such as the x flag):
^ # beginning of line
(\((00|\+)39\)|(00|\+)39)? # optional +39 or 0039 prefix, with or without parentheses
( # followed by one of the following
32[89]| # 328 or 329
33[013-9]| # 33x where x != 2
34[04-9]| # 34x where x not in 1,2,3
35[01]| # 350 or 351
36[068]| # 360 or 366 or 368
37[019]| # 370 or 371 or 379
38[089]) # 380 or 388 or 389
\d{6,7} # ... followed by 6 or 7 digits
$ # and end of line
There are obvious accidental gaps which will probably also get filled over time. Generalizing this further is likely to improve resilience toward future changes, but of course may at the same time increase the risk of false positives. Make up your mind about which is worse.
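Following the answer's own suggestion to strip parentheses and punctuation before matching while keeping the plus, here is a JavaScript sketch. JavaScript has no free-spacing flag, so the commented prefix list is kept as an array and joined; the list is a snapshot of the commented pattern above and, as noted, will need updating as new ranges are assigned. `normalize` and `isItalianMobile` are hypothetical helpers:

```javascript
// Mobile prefixes from the commented pattern, one alternative per entry.
const PREFIXES = [
  "32[89]",    // 328, 329
  "33[013-9]", // 33x, x != 2
  "34[04-9]",  // 34x, x not in 1, 2, 3
  "35[01]",    // 350, 351
  "36[068]",   // 360, 366, 368
  "37[019]",   // 370, 371, 379
  "38[089]",   // 380, 388, 389
];
// Optional +39 or 0039, then a prefix, then 6 or 7 more digits.
const ITALIAN_MOBILE = new RegExp(
  "^(?:\\+39|0039)?(?:" + PREFIXES.join("|") + ")\\d{6,7}$"
);

// Drop spaces, parentheses, dots and hyphens, but keep a leading "+".
function normalize(raw) {
  return raw.replace(/[\s().-]/g, "");
}

function isItalianMobile(raw) {
  return ITALIAN_MOBILE.test(normalize(raw));
}

for (const n of ["349 1234567", "(+39) 349 1234567", "(0039) 349 1234567", "321 1234567"]) {
  console.log(n, isItalianMobile(n));
}
```

Because parentheses are stripped before matching, unbalanced input such as "(+39 349 1234567" is accepted too; check balance separately if that matters.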
I found this and updated it with new operators and MVNO prefixes (Iliad, ho.):
^(\((00|\+)39\)|(00|\+)39)?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])\d{6,7}$
I improved the regex by adding cases that handle spaces between the numbers:
^(\((00|\+)39\)|(00|\+)39)?\s?(38[890]|34[4-90]|36[680]|33[13-90]|32[89]|35[01]|37[019])(\s?\d{3}\s?\d{3,4}|\d{6,7})$
so, for example, I can match phone numbers like (0039) 349 123 4567 or 349 123 4567.
According to this document:
https://it.qaz.wiki/wiki/Telephone_numbers_in_Italy
A simple regex for Italian MOBILE numbers without special characters is:
/^3[0-9]{8,9}$/
It matches a string that starts with the digit '3' followed by 8 or 9 digits, e.g.:
3345678103
You can then add the Italian prefix, such as '+39 ' or '0039 ':
/^\+39 3[0-9]{8,9}$/ --- match --> +39 3345678103
/^0039 3[0-9]{8,9}$/ --- match --> 0039 3345678103

Phone validation regex

I'm using this pattern to check the validation of a phone number
^[0-9\-\+]{9,15}$
It works for 0771234567 and +0771234567,
but I want it to work for 077-1234567, +077-1234567, +077-1-23-45-67 and +077-123-45-6-7.
What should I change in the pattern?
Please refer to this SO post: example of a regular expression in jQuery for phone numbers
/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/
(123) 456 7899
(123).456.7899
(123)-456-7899
123-456-7899
123 456 7899
1234567899
are supported
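The key detail in this pattern is the backreference `\2`: whatever separator group 2 captured between the area code and the exchange must appear again before the last four digits. A JavaScript sketch with `^`/`$` anchors added (the answer's pattern is unanchored, so it would also find a number inside a longer string); `parseUsPhone` is a hypothetical helper. Note that `\(?` and `\)?` are independent, so an unbalanced "(123 456 7899" is still accepted:

```javascript
// The answer's pattern, anchored so the whole string must match.
// Group 2 holds the first separator (space, dot, hyphen or nothing);
// \2 requires the second separator to be the same.
const US_PHONE = /^\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})$/;

function parseUsPhone(s) {
  const m = US_PHONE.exec(s);
  return m ? { area: m[1], exchange: m[3], line: m[4] } : null;
}

console.log(parseUsPhone("(123).456.7899")); // area "123", exchange "456", line "7899"
console.log(parseUsPhone("123-456.7899"));   // null: the two separators differ
```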
This solution actually validates the numbers and the format. For example: 123-456-7890 is a valid format but is NOT a valid US number and this answer bears that out where others here do not.
If you do not want the extension capability, remove the following, including the parentheses:
(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+)\s*)?
Edit (addendum): I needed this in a client-side-only application, so I converted it. Here it is for the JavaScript folks:
var phoneVar = "800 555 1212"; // the value to validate
var myPhoneRegex = /(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})\s*(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+)\s*)?$/i;
if (myPhoneRegex.test(phoneVar)) {
// Successful match
} else {
// Match attempt failed
}
hth.
end edit
This allows extensions or not and works with .NET
(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$
To validate with or without trailing spaces (perhaps when using .NET validators and trimming server side), use this slightly different regex:
(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})\s*(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+)\s*)?$
All valid:
1 800 5551212
800 555 1212
8005551212
18005551212
+1800 555 1212 extension65432
800 5551212 ext3333
Invalid #s
234-911-5678
314-159-2653
123-234-5678
EDIT: Based on Felipe's comment, I have updated this for international numbers.
Based on what I could find out from here and here regarding valid global numbers
This is meant as a first line of defense, of course. An overarching element of the international number is that it is no longer than 15 digits. I did not write code that strips all the non-digits and counts the result; it should be done for completeness. Also, you may notice that I have not combined the North America regex with this one. The reason is that this international regex will match North American numbers, but it will also accept known invalid numbers such as +1 234-911-5678. For more accurate results, you should keep them separate as well.
Pauses and other dialing instructions are not covered by E.164 and are therefore treated as invalid.
\(?\+[0-9]{1,3}\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})?
With 1-10 letter word for extension and 1-6 digit extension:
\(?\+[0-9]{1,3}\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})? ?(\w{1,10}\s?\d{1,6})?
Valid international numbers (the country name is for reference only; it is not part of the match):
+55 11 99999-5555 Brazil
+593 7 282-3889 Ecuador
(+44) 0848 9123 456 UK
+1 284 852 5500 BVI
+1 345 9490088 Grand Cayman
+32 2 702-9200 Belgium
+65 6511 9266 Asia Pacific
+86 21 2230 1000 Shanghai
+9124 4723300 India
+821012345678 South Korea
And for your extension pleasure
+55 11 99999-5555 ramal 123 Brazil
+55 11 99999-5555 foo786544 Brazil
Enjoy
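The answer leaves out the digit-count step ("It should be done for completeness"). A JavaScript sketch of that step, wrapped around the basic international pattern with `^` and `$` added so the whole input must match (`isPlausibleInternational` is a hypothetical name):

```javascript
// The answer's basic international pattern, anchored here so that the whole
// input must match (the original is unanchored).
const INTL_RE = /^\(?\+[0-9]{1,3}\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})?$/;

// The missing step: strip every non-digit and allow at most 15 digits,
// the E.164 maximum. The pattern alone admits up to 18.
function isPlausibleInternational(s) {
  const digitCount = s.replace(/\D/g, "").length;
  return INTL_RE.test(s) && digitCount <= 15;
}

console.log(isPlausibleInternational("+55 11 99999-5555"));       // true
console.log(isPlausibleInternational("+123 123 12345 1234 123")); // false: 18 digits
```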
I have a more generic regex to allow the user to enter only numbers, +, -, whitespace and (). It respects the parenthesis balance and there is always a number after a symbol.
^([+]?[\s0-9]+)?(\d{3}|[(]?[0-9]+[)])?([-]?[\s]?[0-9])+$
false, ""
false, "+48 504 203 260##"
false, "+48.504.203.260"
false, "+55(123) 456-78-90-"
false, "+55(123) - 456-78-90"
false, "504.203.260"
false, " "
false, "-"
false, "()"
false, "() + ()"
false, "(21 7777"
false, "+48 (21)"
false, "+"
true , " 1"
true , "1"
true, "555-5555-555"
true, "+48 504 203 260"
true, "+48 (12) 504 203 260"
true, "+48 (12) 504-203-260"
true, "+48(12)504203260"
true, "+4812504203260"
true, "4812504203260"
Consider:
^\+?[0-9]{3}-?[0-9]{6,12}$
This only allows + at the beginning; it requires 3 digits, followed by an optional dash, followed by 6-12 more digits.
Note that the original regex allows 'phone numbers' such as 70+12---12+92, which is a bit more liberal than you probably had in mind.
The question was amended to add:
+077-1-23-45-67 and +077-123-45-6-7
You now probably need to be using a regex system that supports alternatives:
^\+?[0-9]{3}-?([0-9]{7}|[0-9]-[0-9]{2}-[0-9]{2}-[0-9]{2}|[0-9]{3}-[0-9]{2}-[0-9]-[0-9])$
The first alternative is seven digits; the second is 1-23-45-67; the third is 123-45-6-7. These all share the optional plus + followed by 3 digits and an optional dash - prefix.
The comment below mentions another pattern:
+077-12-34-567
It is not at all clear what the general pattern should be - maybe one or more digits separated by dashes; digits at front and back?
^\+?[0-9]{3}-?[0-9]+(-[0-9]+)*$
This will allow the '+077-' prefix, followed by runs of digits separated by single dashes, with at least one digit between each dash and no dash at the end.
/^[0-9\+]{1,}[0-9\-]{3,15}$/
so the string starts with one or more digits or + signs, followed by 3 to 15 digits or hyphens
First test the length of the string to see if it is between 9 and 15.
Then use this regex to validate:
^\+?\d+(-\d+)*$
This is yet another variation of the normal* (special normal*)* pattern, with normal being \d and special being -.
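The two steps can be combined into one small JavaScript function; `isValidPhone` is a hypothetical name and the 9-15 limits are the question's:

```javascript
// Step 1: overall length between 9 and 15 characters.
// Step 2: optional leading "+", then digit runs separated by single hyphens
// (normal* (special normal*)* with normal = \d and special = "-").
const SHAPE_RE = /^\+?\d+(-\d+)*$/;

function isValidPhone(s) {
  return s.length >= 9 && s.length <= 15 && SHAPE_RE.test(s);
}

for (const s of ["+077-1-23-45-67", "077-123-45-6-7", "70+12---12+92", "077-"]) {
  console.log(s, isValidPhone(s));
}
```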
I tried:
^(1[ \-\+]{0,3}|\+1[ -\+]{0,3}|\+1|\+)?((\(\+?1-[2-9][0-9]{1,2}\))|(\(\+?[2-8][0-9][0-9]\))|(\(\+?[1-9][0-9]\))|(\(\+?[17]\))|(\([2-9][2-9]\))|([ \-\.]{0,3}[0-9]{2,4}))?([ \-\.][0-9])?([ \-\.]{0,3}[0-9]{2,4}){2,3}$
I took care of special country codes like 1-97... as well. Here are the numbers I tested against (from Puneet Lamba and MCattle):
***** PASS *****
18005551234
1 800 555 1234
+1 800 555-1234
+86 800 555 1234
1-800-555-1234
1.800.555.1234
+1.800.555.1234
1 (800) 555-1234
(800)555-1234
(800) 555-1234
(800)5551234
800-555-1234
800.555.1234
(+230) 5 911 4450
123345678
(1) 345 654 67
+1 245436
1-976 33567
(1-734) 5465654
+(230) 2 345 6568
***** CORRECTLY FAILING *****
(003) 555-1212
(103) 555-1212
(911) 555-1212
1-800-555-1234p
800x555x1234
+1 800 555x1234
***** FALSE POSITIVES *****
180055512345
1 800 5555 1234
+867 800 555 1234
1 (800) 555-1234
86 800 555 1212
Originally posted here: Regular expression to match standard 10 digit phone number
Here is the regex for Ethiopian phone numbers (EthioTelecom and Safaricom). For my fellow Ethiopian developers ;)
phoneExp = /^(^\+251|^251|^0)?(9|7)\d{8}$/;
It matches the following (any unwanted characters at the start or end are rejected):
+251912345678
251912345678
0912345678
912345678
+251712345678
251712345678
0712345678
712345678
You can test it on regexr.
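If the matched numbers also need to be stored in one canonical form, the same pattern can capture the subscriber part and re-prefix it. A JavaScript sketch, assuming +251 followed by the nine-digit subscriber number is the desired output (`toEthiopianE164` is a hypothetical helper); the inner `^` anchors are dropped because the outer one already covers them:

```javascript
// Optional +251 / 251 / 0 prefix, then the nine-digit subscriber number
// starting with 9 or 7, captured as group 2.
const ET_PHONE = /^(\+251|251|0)?([97]\d{8})$/;

// Returns the number as +251XXXXXXXXX, or null if it does not match.
function toEthiopianE164(input) {
  const m = ET_PHONE.exec(input.trim());
  return m ? "+251" + m[2] : null;
}

console.log(toEthiopianE164("0912345678")); // +251912345678
```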
^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$
Matches the following cases:
123-456-7890
(123) 456-7890
123 456 7890
123.456.7890
+91 (123) 456-7890
Try this
\+?\(?([0-9]{3})\)?[-.]?\(?([0-9]{3})\)?[-.]?\(?([0-9]{4})\)?
It matches the following cases
+123-(456)-(7890)
+123.(456).(7890)
+(123).(456).(7890)
+(123)-(456)-(7890)
+123(456)(7890)
+(123)(456)(7890)
123-(456)-(7890)
123.(456).(7890)
(123).(456).(7890)
(123)-(456)-(7890)
123(456)(7890)
(123)(456)(7890)
The following regex matches a '+' followed by any number of digits (zero or more):
var mobileNumber = "+18005551212";
var regex = new RegExp("^\\+[0-9]*$");
var OK = regex.test(mobileNumber);
if (OK) {
console.log("is a phone number");
} else {
console.log("is NOT a phone number");
}
^\+?\d{3}-?\d{2}-?\d{2}-?\d{3}$
You may try this.
How about this one? Hope this helps.
^(\+?)\d{3,3}-?\d{2,2}-?\d{2,2}-?\d{3,3}$
^[0-9\-\+]{9,15}$
would match 0+0+0+0+0+0, or 000000000, etc.
(\-?[0-9]){7}
would match a specific number of digits with optional hyphens in any position among them.
What is this +077 format supposed to be?
It's not a valid format. No country codes begin with 0.
The digits after the + should usually be a country code, 1 to 3 digits long.
Allowing for "+" then country code CC, then optional hyphen, then "0" plus two digits, then hyphens and digits for next seven digits, try:
^\+CC\-?0[1-9][0-9](\-?[0-9]){7}$
Oh, and {3,3} is redundant; it simplifies to {3}.
This regex matches numbers in the common format 1-(999)-999-9999 and variations of it. The parentheses around the area code are optional, and the parts can be separated by a period, space or dash: "^([01][- .])?(\(\d{3}\)|\d{3})[- .]?\d{3}[- .]\d{4}$"
Adding to Joe Johnston's answer, this will also accept:
+16444444444,,241119933
(Required for Apple's special character support for dial-ins - https://support.apple.com/kb/PH18551?locale=en_US)
\(?\+[0-9]{1,3}\)? ?-?[0-9]{1,3} ?-?[0-9]{3,5} ?-?[0-9]{4}( ?-?[0-9]{3})? ?([\w\,\#\^]{1,10}\s?\d{1,10})?
Note: Accepts up to 10 digits for the extension code
/^(([+]{0,1}\d{2})|\d?)[\s-]?[0-9]{2}[\s-]?[0-9]{3}[\s-]?[0-9]{4}$/gm
https://regexr.com/4n3c4
Tested for
+94 77 531 2412
+94775312412
077 531 2412
0775312412
77 531 2412
// Not matching
77-53-12412
+94-77-53-12412
077 123 12345
77123 12345
JS code:
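function checkIfValidPhoneNumber(input){
"use strict";
// Optional country/region prefix, then digits with at most one delimiter (. - or space)
// between any two of them, and no more than 15 digits in total (E.164).
return /^((\+?\d{1,3})?[\(\- ]?\d{3,5}[\)\- ]?)?(\d[.\- ]?)*\d$/.test(input) && input.replace(/\D/g,"").length<=15;
}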
It may be primitive as phone-number checks go, but it checks that the input text is compliant with the E.164 recommendation:
The maximum phone length is 15 digits
The country code consists of 1 to 3 digits and may be preceded by a plus sign (the country code can be omitted)
The region (network) code consists of 3 to 5 digits (it can be omitted, but only if the country code is omitted)
Some delimiters are allowed: . - or space between digits, and ( ) - or space around the region code
For example:
+7(918)000-12-34
911
1-23456-789.10.11.12
all are compliant with E.164 and validated
For all phone number formats:
/^\+?([87](?!95[5-7]|99[08]|907|94[^09]|336)([348]\d|9[0-6789]|7[01247])\d{8}|[1246]\d{9,13}|68\d{7}|5[1-46-9]\d{8,12}|55[1-9]\d{9}|55[138]\d{10}|55[1256][14679]9\d{8}|554399\d{7}|500[56]\d{4}|5016\d{6}|5068\d{7}|502[345]\d{7}|5037\d{7}|50[4567]\d{8}|50855\d{4}|509[34]\d{7}|376\d{6}|855\d{8,9}|856\d{10}|85[0-4789]\d{8,10}|8[68]\d{10,11}|8[14]\d{10}|82\d{9,10}|852\d{8}|90\d{10}|96(0[79]|17[0189]|181|13)\d{6}|96[23]\d{9}|964\d{10}|96(5[569]|89)\d{7}|96(65|77)\d{8}|92[023]\d{9}|91[1879]\d{9}|9[34]7\d{8}|959\d{7,9}|989\d{9}|971\d{8,9}|97[02-9]\d{7,11}|99[^4568]\d{7,11}|994\d{9}|9955\d{8}|996[2579]\d{8}|998[3789]\d{8}|380[345679]\d{8}|381\d{9}|38[57]\d{8,9}|375[234]\d{8}|372\d{7,8}|37[0-4]\d{8}|37[6-9]\d{7,11}|30[69]\d{9}|34[679]\d{8}|3459\d{11}|3[12359]\d{8,12}|36\d{9}|38[169]\d{8}|382\d{8,9}|46719\d{10})$/