Regular expression to catch UK phone numbers - regex

I'm trying to create a regular expression to catch the following conditions, but I'm totally failing to get my head around it (Friday) and need a bit of help, please.
I'm trying to capture UK phone numbers with or without an area code, but excluding mobiles.
Example: 01316691234 or 6691234, but not any number starting with 07.
I've got this so far: ^[0-9]1?(\d{6,11}) but I'm struggling to exclude the 07 numbers.

This is based on the supposition that UK area codes:
start with 0 and are followed by 1 (usual) or 2 (London);
run to 3-5 digits
are followed by a phone number 6-7 digits long
Whilst this seems sound to me, I'm no telecoms anorak so you'll need to modify accordingly if any part of this supposition is wrong:
/^(0[12345689]\d{1,3} ?)?\d{6,7}$/
Either way, it's a bit of a can of worms. Postcodes and phone numbers don't lend themselves well to regex; the more tightly you refine the pattern, the more at risk you are from new rules being added tomorrow - e.g. if they launched a new area code starting 03.
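A quick way to try the pattern above against the numbers from the question is a small Python sketch. The helper name `looks_like_uk_landline` is made up for illustration, and the pattern is copied as-is, so it inherits the supposition about area codes stated in this answer.

```python
import re

# Pattern from the answer above: an optional area code (0, then any digit
# except 0 or 7, then 1-3 more digits and an optional space), followed by
# a 6-7 digit local number.
UK_LANDLINE = re.compile(r"^(0[12345689]\d{1,3} ?)?\d{6,7}$")


def looks_like_uk_landline(number: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return UK_LANDLINE.match(number) is not None


if __name__ == "__main__":
    for sample in ["01316691234", "0131 6691234", "6691234", "07712345678"]:
        print(sample, looks_like_uk_landline(sample))
```

Numbers starting with 07 are rejected here only because 7 is left out of the character class after the leading 0.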

Use a negative lookahead to prevent numbers starting with "07" from matching:
^(?!07)([0-9]1)?(\d{6,11})
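As a rough check of the lookahead approach, here is a minimal Python sketch using the pattern exactly as written above. The function name `is_not_mobile` is hypothetical.

```python
import re

# (?!07) is tried at the very start of the string: if the first two
# characters are "07", the match is abandoned before anything is consumed.
NOT_MOBILE = re.compile(r"^(?!07)([0-9]1)?(\d{6,11})")


def is_not_mobile(number: str) -> bool:
    """Return True if the pattern matches at the start of the string."""
    return NOT_MOBILE.match(number) is not None


if __name__ == "__main__":
    for sample in ["01316691234", "6691234", "07123456789"]:
        print(sample, is_not_mobile(sample))
```

Note that the pattern has no `$` anchor, so it only constrains the start of the input; appending `$` is one way to also reject trailing characters.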

Related

Regex improvements for international, common and RFC 3966 phone number validation?

Context
Hi, earlier I was browsing the web to find a quick answer about validating telephone numbers with a single regex: emergency, short, international, French, Spanish and North American numbers (normal, fancy and extended versions).
Strangely, I couldn't find anything better than "A comprehensive regex for phone number formula", which seems to be the best topic about this, unless I missed something, which is entirely possible.
So I'm new to the site and writing this very first question (yeah!), since that other thread is currently on hold of some sort: it seems the author didn't get what he and I were seeking.
That makes at least three of us who would like a good solution, counting my pal, who first asked me to find one for use in simple integrations like his Google Forms.
Hence my current question(s) and my own answer to begin with, since I took some night time to build my own based on advice and test patterns from the best replies on the other thread. If you're interested in the topic, there are some interesting elements.
Questions
What is the best way to optimize and improve this regex (without resorting to coding), which is dedicated to validating international and most national phone numbers (following the recommendations of RFC 3966 at least)?
I'm not sure if I can add a related question as well (it still serves to improve the usefulness of the regex pattern), but there's no harm in asking, I guess.
Are there other commonly-used formats that this regex should match (and not)?
If you can add them (or a link) here for me to update my test bundles, I would be thankful. Equally useful would be phone numbers that should definitely not be validated (the unwanted).
My initial solution
My current regex solution (version 4) on Regular Expressions 101
An earlier version matched results despite leading and trailing whitespace, which was not that useful (a bit too fancy for the execution time).
The latest version at the time of writing takes into consideration the other posts on the subject, RFC 3966 (from the IETF standards) and the Wikipedia article "National conventions for writing telephone numbers".
Another potential side dish is to isolate matching groups for country code, area code and extension... and things work relatively well up to a point: it only works when there are separators (or parentheses) to distinguish those groups of digits.
Matching goals
Emergency and short numbers : 112 or 911
Spanish international : +34 987 654 321
French extended : +33 (0)1 23 45 67 89
French national : 01 23 45 67 89
American extended : 001-(123)-456-7890 ext-4321
German (Microsoft style) : +49 (1234) 567890
Mexican national : (01 55) 1234 5678
Hypothetical international number (max length?) : 00321-(4321)-567.89 ext-4321
Another matching goal is to have a regex that does not under-perform too much; I'm not really picky, since it is not to be used in critical parts of code.
Still, how could we optimize the best regex(es) people will find or propose without changing their results?
Goals from the main thread
+1(234)/567.8901 x1234 and the like (with different permutations of separators: ., /, - and horizontal whitespace).
2345678901 : the same US number as dialed within the States, I guess.
Not sure how it should work, since I thought that + (or its equivalent, the double zero 00) was required in front of any international number... I've always done it that way. The other thread had a list of positive matches without it.
Could someone confirm that + or 00 is not mandatory for US numbers? Thank you again.
Best of unwanted formats
12(34567890 and 123)456789012345 : unmatched parentheses.
)123(34567890 : parentheses are wrongly matched.
++34123456789 : double + is a typo.
+9-123/456.7890 x12345 : an extension has 4 digits at most.
1-234-567-8901 : missing 00 or + at the beginning of an international number.
1234 to 12345678 : not a short number, yet not a normal one (between 9 and 12 digits, as far as I know).
1234567890123 : over max length (since without international features).
0012312345678901 : over max length (as international number).
Regex101.com was a big plus for rewriting and testing the regex to this point; I couldn't have progressed so far without its help. Yet I'm no expert, so I can only scratch the surface here, and I need your help to improve this.
Thank you for reading. It was very educational to write the question (but not something I would do every day; it's very time-consuming at my pace). I hope it will find its answers as well. Have a nice day (or night... ;) ).
Before I forget, here's the latest version of the regex I put together:
^(?=(?:\+|0{2})?(?:(?:[\(\-\)\.\/ \t\f]*\d){7,10})?(?:[\-\.\/ \t\f]?\d{2,3})(?:[\-\s]?[ext]{1,3}[\-\.\/ \t\f]?\d{1,4})?$)((?:\+|0{2})\d{0,3})?(?:[\-\.\/ \t\f]?)(\(0\d[ ]?\d{0,4}\)|\(\d{0,4}\)|\d{0,4})(?:[\-\.\/ \t\f]{0,2}\d){3,8}(?:[\-\s]?(?:x|ext)[\-\t\f ]?(\d{1,4}))?$
As far as I know, it passes the tests I put in the question and some more that I added on that Regex101.com page. You can even fork it, a very useful feature indeed; I'm a new fan. :)
The code seems to work, as is, with PHP (PCRE), Python and JavaScript (but not Golang), with varying performance that is not awesome but good enough for our purpose.
For instance, I wanted to use \h for horizontal whitespace (instead of \t, \f and space), but it is less compatible across the different platforms.
It still needs a lot of improvements, and I'm eager to see what you will cook up to answer this little problem of ours, but I'm spent... it's already a sunny morning here. Good night, folks.
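For anyone who wants to run the pattern outside Regex101, a small Python harness might look like the sketch below. The pattern is copied unchanged from the post and only split across lines for readability; the function name `parse` and the choice of samples are illustrative assumptions, and no claim is made about inputs other than the ones printed.

```python
import re

# Version 4 of the pattern from the post, split into adjacent raw strings.
PHONE = re.compile(
    r"^(?=(?:\+|0{2})?(?:(?:[\(\-\)\.\/ \t\f]*\d){7,10})?"
    r"(?:[\-\.\/ \t\f]?\d{2,3})"
    r"(?:[\-\s]?[ext]{1,3}[\-\.\/ \t\f]?\d{1,4})?$)"
    r"((?:\+|0{2})\d{0,3})?"
    r"(?:[\-\.\/ \t\f]?)"
    r"(\(0\d[ ]?\d{0,4}\)|\(\d{0,4}\)|\d{0,4})"
    r"(?:[\-\.\/ \t\f]{0,2}\d){3,8}"
    r"(?:[\-\s]?(?:x|ext)[\-\t\f ]?(\d{1,4}))?$"
)


def parse(number: str):
    """Return the three captured groups (international prefix, leading
    group, extension) or None when the pattern rejects the input."""
    match = PHONE.match(number)
    return match.groups() if match else None


if __name__ == "__main__":
    samples = [
        "112",
        "+34 987 654 321",
        "+33 (0)1 23 45 67 89",
        "001-(123)-456-7890 ext-4321",
        "++34123456789",
    ]
    for sample in samples:
        print(repr(sample), "->", parse(sample))
```

Printing the groups rather than a plain yes/no makes it easier to see where the country code and extension end up when separators are present.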

Excluding % from a Regex number search

I'm attempting to create a regex that finds only 2-digit integers or numbers with 2 decimal places.
In the example string at the bottom, I want to find only the following:
21 and 10.50
Using this expression, 100% is getting captured, in addition to the strings I desire to capture:
(\d){1,2}(\.?)([0-9]?[0-9]?){1,2}
I know I need to use ^% somewhere, but I can't figure out where it goes. Any suggestions are greatly appreciated.
Here's my sample string:
Earn Up to $21 Per Hour - Deliver Food with !!
Delivery Drivers work when they want and make great money when they do.
All orders are prepaid, just pick them up and deliver them to hungry diners. No waiting in line or fumbling with receipts and prepaid cards.
It's fast and easy to start working. Get started today.
Apply Now
Why choose ?
More orders than any other takeout platform
100% of our restaurants are official partners
Competitive pay: Per order fee + mileage + tips
We guarantee an hourly minimum of $10.50/hour*
Create your own schedule & work the hours you want
Word boundaries in your regular expression will grant you a bit more control.
Since word boundaries are a bit strict, we need to introduce an OR condition to address both cases which will satisfy your regex.
(\b[\d]{2}\.[\d]{2}\b)|(\b[\d]{2}\b)
Edit: Try this one,
\b[\d]{2}\b(\.[\d]{2})?
The first example has a chance to fail as it is order dependent due to the way it short-circuits. This I believe should address multiple cases properly.
I think this should work:
(?<!\d)((\d+\.\d\d)|(\d\d))(?!%|\d)
Demo (and explanation)
EDIT:
Improved version:
(?<!\d)(\d{1,2}(?:\.\d{1,2})?)(?!%|\d)
Demo (and explanation)
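To see the improved pattern at work without the demo link, here is a minimal Python sketch. The sample text is a shortened version of the string from the question, and `find_amounts` is a made-up helper name.

```python
import re

# (?<!\d) and (?!%|\d) ensure the number is not part of a longer run of
# digits and is not immediately followed by a percent sign.
AMOUNT = re.compile(r"(?<!\d)(\d{1,2}(?:\.\d{1,2})?)(?!%|\d)")

sample = (
    "Earn Up to $21 Per Hour. "
    "100% of our restaurants are official partners. "
    "We guarantee an hourly minimum of $10.50/hour*"
)


def find_amounts(text: str) -> list:
    """Return every captured number as a string."""
    return AMOUNT.findall(text)


if __name__ == "__main__":
    print(find_amounts(sample))  # ['21', '10.50']
```

The "100" in "100%" is skipped because every way of taking one or two of its digits leaves either a digit or a % immediately after, or a digit immediately before.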
You can try this variant: (\d{1,}|[\d.])\b(?!%)
It uses a negative lookahead (?!%) to exclude digits followed by a % sign.
Details at regex101

Using RegEx to Find a Block of Text

I'm attempting to block a long string of unnecessary text that's on every page of a document.
Ex: "36075 This is another page and this is the date March 4 2013"
I know this must be very simple, but I'm hoping there is a way to block text verbatim. Is the only way to block this text to use a lot of /d/s/w+/+ etc., or is there a way to say, "match 36075 This is another page and this is the date March 4 2013"?
This would be SO HELPFUL to know. Thank you for helping!
From what you wrote, I assume you need to get the leading numbers from the string. To do that, you just need this pattern: ^\d+ which, from this input:
36075 This is another page and this is the date March 4 2013
will return this:
36075
In future, when asking such questions, please provide an example string and the expected output, as well as what you have tried.
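Both readings of the question can be sketched in a few lines of Python: pulling out the leading digits with ^\d+, and matching the whole line verbatim. The helper `strip_footer` is a hypothetical name, and `re.escape` is used here on the assumption that the repeated text should be treated as a literal rather than as a pattern.

```python
import re

line = "36075 This is another page and this is the date March 4 2013"

# Leading digits, as suggested in the answer above.
leading = re.match(r"^\d+", line)

# Verbatim matching: re.escape neutralises any characters that would
# otherwise be interpreted as regex syntax.
verbatim = re.compile(re.escape(line))


def strip_footer(page_text: str) -> str:
    """Remove every verbatim occurrence of the repeated line."""
    return verbatim.sub("", page_text)


if __name__ == "__main__":
    print(leading.group())  # 36075
    print(strip_footer("Intro. " + line + " Body."))
```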
I realized the issue I was having. I didn't need to use RegEx. The program I was using has the functionality to match specific words or groups of words and pronounce them differently. What I discovered is that it will not match the words unless the word groups are input exactly the way the program typically reads them.
Ergo --> The channel saw
the end of the British hold over
Would have to be listed as one group for, "The channel saw" and a second group for "the end of the British hold over"
In addition, there were some numbers --> 11960_30_o_ho_
and if the program naturally read 119 and then 60_3 and then _o_ho_ then three strings would need to be input for each section.
A few frustrating hours later, problem solved :) Thank you for your assistance.

SQL Server Regular Expression Workaround in T-SQL?

I have some SQLCLR code for working with regular expressions. But now that it is being migrated to Azure, which does not allow SQLCLR, that's out. I need to find a way to do regex in pure T-SQL.
Master Data Services are not available because the dev edition of MSSQL we have is not R2.
All ideas appreciated, thanks.
Regular expression match samples that need handling
(culled from regexlib and other places over the past few years)
email address
^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$
dollars
^(\$)?(([1-9]\d{0,2}(\,\d{3})*)|([1-9]\d*)|(0))(\.\d{2})?$
uri
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$
one numeric digit
^\d$
percentage
^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$
height notation
^\d?\d'(\d|1[01])"$
numbers between 1 and 1000
^([1-9]|[1-9]\d|1000)$
credit card numbers
^((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}$
list of years
^([1-9]{1}[0-9]{3}[,]?)*([1-9]{1}[0-9]{3})$
days of the week
^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T((ue?)|(hu?r?))\.?$
time on 12 hour clock
(?<Time>^(?:0?[1-9]:[0-5]|1(?=[012])\d:[0-5])\d(?:[ap]m)?)
time on 24 hour clock
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
usa phone numbers
^\(?[\d]{3}\)?[\s-]?[\d]{3}[\s-]?[\d]{4}$
Unfortunately, you will not be able to move your CLR function(s) to SQL Azure. You will need to either use the normal string functions (PATINDEX, CHARINDEX, LIKE, and so on) or perform these operations outside of the database.
EDIT Adding some information for the examples added to the question.
Email address
This one is always controversial because people disagree about which version of the RFC they want to support. The original didn't support apostrophes, for example (or at least people insist that it didn't - I haven't dug it up from the archives and read it myself, admittedly), and it has to be expanded quite often for new TLDs (once for 4-letter TLDs like .info, then again for 6-letter TLDs like .museum). I've often heard quite knowledgeable people state that perfect e-mail validation is impossible, and having previously worked for an e-mail service provider, I can tell you that it was a constantly moving target. But for the simplest approaches, see the question TSQL Email Validation (without regex).
One numeric digit
Probably the easiest one of the bunch:
WHERE @s LIKE '[0-9]';
Credit card numbers
Assuming you strip out dashes and spaces, which you should do in any case. Note that this isn't an actual check of the credit card number algorithm to ensure that the number itself is actually valid, just that it conforms to the general format (AmEx = 15 digits starting with a 3, the rest are 16 digits - Visa starts with a 4, MasterCard starts with a 5, Discover starts with 6 and I think there's one that starts with a 7 (though that may just be gift cards of some kind)):
WHERE @s + ' ' LIKE '[3-7]'+ REPLICATE('[0-9]', 14) + '[0-9 ]';
If you want to be a little more precise at the cost of being long-winded, you can say:
WHERE (LEN(@s) = 15 AND @s LIKE '3' + REPLICATE('[0-9]', 14))
OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15));
USA phone numbers
Again, assuming you're going to strip out parentheses, dashes and spaces first. Pretty sure a US area code can't start with a 1; if there are other rules, I am not aware of them.
WHERE @s LIKE '[2-9]' + REPLICATE('[0-9]', 9);
-----
I'm not going to go further, because a lot of the other expressions you've defined can be extrapolated from the above. Hopefully this gives you a start. You should be able to Google for some of the others to see how other people have replicated the patterns with T-SQL. Some of them (like days of the week) can probably just be checked against a table - it seems overkill to do invasive pattern matching for a set of 7 possible values. Similarly with a list of 1000 numbers or years, these are things that will be much easier (and probably more efficient) to check if the numeric value is in a table rather than convert it to a string and see if it matches some pattern.
I'll state again that a lot of this will be much better if you can cleanse and validate the data before it gets into the database in the first place. You should strive to do this wherever possible, because without CLR, you just can't do powerful RegEx inside SQL Server.
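As one illustration of validating before the data reaches the database, the three checks shown above with LIKE could be mirrored in application code. This is only a sketch in Python under the same assumptions as the T-SQL (separators stripped first, format checks only, no checksum); the function names are invented for the example.

```python
import re


def clean(value: str) -> str:
    """Strip spaces, dashes and parentheses before checking the format."""
    return re.sub(r"[\s()\-]", "", value)


def is_single_digit(value: str) -> bool:
    return re.fullmatch(r"[0-9]", value) is not None


def is_card_format(value: str) -> bool:
    # 15 digits starting with 3, or 16 digits starting with 4-7.
    digits = clean(value)
    return re.fullmatch(r"3[0-9]{14}|[4-7][0-9]{15}", digits) is not None


def is_us_phone_format(value: str) -> bool:
    # Ten digits, the first of which is 2-9.
    return re.fullmatch(r"[2-9][0-9]{9}", clean(value)) is not None


if __name__ == "__main__":
    print(is_single_digit("7"))
    print(is_card_format("4111-1111-1111-1111"))
    print(is_us_phone_format("(800) 432-4500"))
```

Rows that fail these checks can be rejected or flagged in the application, leaving only simple LIKE or lookup-table checks for the database itself.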
Ken Henderson wrote about ways to replicate RegEx without CLR, but they require sp_OA* procedures, which are even less likely to ever see the light of day in Azure than CLR. Most of the other articles you'll find online use an approach similar to Ken's or make complex use of built-in string functions.
Which portions of RegEx specifically are you trying to replicate? Can you show an example of the input/output of one of your functions? Perhaps it will be easy to convert to get similar results using the built-in string functions like PATINDEX.

telephone number regex

I am currently trying to validate UK telephone numbers:
The format I'm looking for is: 01234 567891 or 01234567891 - so I need the number to have 5 digits, then a space, then 6 digits, or simply 11 digits.
The number must start with a 0.
I've had a look at a couple of examples:
/^[0-9]{10,11} - to check that the chars are all numbers
/^0[0-9]{9,10}$/ - to check that the first number is a 0
I'm just unsure how to put all these together and check if there is a space or not.
Could someone help me with this regex?
Thanks
Try this regex:
/^0\d{4}\s?\d{6}$/
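A short Python sketch of this pattern, with the two formats from the question as inputs. The name `is_valid` is an assumption for illustration.

```python
import re

# Leading 0, four more digits, an optional single whitespace character,
# then exactly six digits.
UK_11_DIGITS = re.compile(r"^0\d{4}\s?\d{6}$")


def is_valid(number: str) -> bool:
    return UK_11_DIGITS.match(number) is not None


if __name__ == "__main__":
    for sample in ["01234 567891", "01234567891", "1234567891", "01234  567891"]:
        print(repr(sample), is_valid(sample))
```

Because \s? allows at most one whitespace character, the last sample (two spaces) is rejected along with the one missing its leading 0.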
Many people try to do input validation and formatting in a single step.
It is better to separate these processes.
Match UK telephone number in any format
^(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?$
The above pattern allows the user to enter the number in any format they are comfortable with. Don't constrain the user into entering specific formats.
Extract NSN, prefix and extension
^(\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)(44)\)?[\s-]?)?\(?0?(?:\)[\s-]?)?([1-9]\d{1,4}\)?[\d\s-]+)((?:x|ext\.?|\#)\d{3,4})?$
Next, extract the various elements.
$2 will be '44' if international format was used, otherwise assume national format with leading '0'.
$4 contains the extension number if present.
$3 contains the NSN part.
Validation and formatting
Use further RegEx patterns to check the NSN has the right number of digits for this number range. Finally, store the number in E.164 format or display it in E.123 format.
There's a very detailed list of validation and display formatting RegEx patterns for UK numbers at:
http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers
It's too long to reproduce here and it would be difficult to maintain multiple copies of this document.
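The extract-then-format idea can be sketched in Python as follows. The pattern's final character class is written flat as `[\d\s-]`, since Python's `re` module does not support nested classes. The function name `to_e164` is hypothetical, the sketch assumes every input is a UK number, and it deliberately skips the per-range digit-count checks described above.

```python
import re

# Groups: 1 = whole international prefix, 2 = "44" if present,
# 3 = NSN (still containing separators), 4 = extension if present.
UK_PARTS = re.compile(
    r"^(\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)(44)\)?[\s-]?)?"
    r"\(?0?(?:\)[\s-]?)?"
    r"([1-9]\d{1,4}\)?[\d\s-]+)"
    r"((?:x|ext\.?|\#)\d{3,4})?$"
)


def to_e164(raw: str):
    """Return (e164_number, extension) or None if the input is rejected."""
    match = UK_PARTS.match(raw)
    if match is None:
        return None
    nsn = re.sub(r"\D", "", match.group(3))  # keep only the NSN digits
    return "+44" + nsn, match.group(4)


if __name__ == "__main__":
    for sample in ["01234 567891", "+44 1234 567891", "020 7123 4567",
                   "01234 567891 x1234"]:
        print(repr(sample), "->", to_e164(sample))
```

Storing only the normalised value means the display format can be chosen later without re-parsing whatever the user originally typed.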
If you are looking for all UK numbers, I'd look for a bit more than just that format; some are written as 020 7123 4567, etc.
^\s*\(?(020[7,8]{1}\)?[ ]?[1-9]{1}[0-9]{2}[ ]?[0-9]{4})|(0[1-8]{1}[0-9]{3}\)?[ ]?[1-9]{1}[0-9]{2}[ ]?[0-9]{3})\s*$
/^[\d()+\-]*$/
A simple telephone regex that allows +, ( ) and - anywhere, as well as digits.
I think ^0[\d]{4}\s?[\d]{5,6}$ will work for you. I have used [\d] instead of [0-9].
I find that RegExr is a useful online tool to check and try your regular expressions. It also has a nice library of examples to help point you in the right direction
you should just count the number of digits and check that it's 10,
Some UK numbers have only 9 digits, not 10 (not including the leading 0).
These include 40 of the 01 area codes (using "4+5" format), the 016977 area code (using "5+4" format), all 0500 numbers and some 0800 numbers.
There's a list at: http://www.aa-asterisk.org.uk/index.php/01_numbers
This pattern for US numbers also accepts the following phone numbers:
800-432-4500, Opt: 9, Ext: 100316
800-432-4500, Opt: 9, Ext: X100316
800-432-4500, Option #3
(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4}),?(?:\s*(?:#|x\.?|opt(\.|:|\.:)?|option)\s*#?(\d+))?,?(?:\s*(?:#|x\.?|ext(\.|:|\.:)?|extension)\s*(\d+))?
(I used this answer in another topic as a starting point.)