Regex for UK phone number - regex

I need to validate UK phone numbers.
Below are samples of the numbers:
01457 341235
0229 111111
+1213 3133143
An optional plus should be allowed, at the first position only.
I am using this regex, but it is not working:
^(?:\W*\d){11}\W*$

An actual UK phone number will start with 0 or +44 (the latter being the UK country code), or possibly just 44, followed by nine or ten digits. A regex to capture that would look something like:
^(?:0|\+?44)(?:\d\s?){9,10}$
In this regex, I have allowed the digits to be separated by spaces in any way, because there isn't a single standardized way of breaking down the numbers. You could further narrow this down to certain allowed groupings, if you like, but it would greatly increase the complexity of the regex.
Your question implies you might want something broader or different, as some of your examples aren't valid UK numbers (+1213 3133143, 12345 123456).
You could use something like this to simply match between 10 and 12 digits, with arbitrary spacing, possibly preceded by a +:
^\+?(?:\d\s?){10,12}$
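As a rough illustration, the two patterns above can be compared side by side in Python. The function names are hypothetical and chosen only for this sketch; the patterns are copied unchanged from the answer.

```python
import re

# Strict pattern: leading 0, 44 or +44, then 9 to 10 digits, each optionally followed by a space.
UK_NUMBER = re.compile(r"^(?:0|\+?44)(?:\d\s?){9,10}$")

# Loose pattern: optional +, then 10 to 12 digits with arbitrary single spaces.
LOOSE_NUMBER = re.compile(r"^\+?(?:\d\s?){10,12}$")


def is_uk_number(text: str) -> bool:
    """Return True if text matches the strict UK pattern."""
    return UK_NUMBER.match(text) is not None


def is_loose_number(text: str) -> bool:
    """Return True if text matches the loose 10-12 digit pattern."""
    return LOOSE_NUMBER.match(text) is not None


for sample in ["01457 341235", "0229 111111", "+441457 341235", "+1213 3133143"]:
    print(f"{sample!r}: strict={is_uk_number(sample)} loose={is_loose_number(sample)}")
```

One thing the sketch makes visible: as written, the strict pattern does not accept a space directly after the country code, so "+44 1457 341235" is rejected while "+441457 341235" passes.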

Related

Width-insensitive regex matching

I know how to make a regex case-insensitive. This is not about case but about character width. I'm looking for something that is width-insensitive. In Japanese, you have half-width and full-width characters (consider 0123 vs ０１２３ or ABCD vs ＡＢＣＤ). You can make SQL Server databases width-insensitive with _WI (or width-sensitive with _WS). I was hoping there would be something similar for a regex.
I'm trying to find birth dates where the numbers can be half and full width. Here's an illustration of the problem
For a more specific date matching problem, here's another illustration:
So while \d{4} finds instances of 4 digits, it will not find 4 full-width digits, etc. The workaround I've found is to do something like [０１２３４５６７８９|\d]{4} like so:
But that feels really dirty. Is there a better way to do this?
To match ASCII or full-width digits you can use
[０-９\d]
Or, [\uFF10-\uFF19\d] if you need to abstract from using Unicode text in your source.
Note that in ECMAScript 2018 and later, you can use \p{N} or \p{Nd} (together with the u flag) to match all Unicode digits.
The current \p{N} class covers the Number, Decimal Digit (Nd), Number, Letter (Nl) and Number, Other (No) categories and matches 1,791 code points. Expanded into explicit ranges, it is
(?:[0-9\xB2\xB3\xB9\xBC-\xBE\u0660-\u0669\u06F0-\u06F9\u07C0-\u07C9\u0966-\u096F\u09E6-\u09EF\u09F4-\u09F9\u0A66-\u0A6F\u0AE6-\u0AEF\u0B66-\u0B6F\u0B72-\u0B77\u0BE6-\u0BF2\u0C66-\u0C6F\u0C78-\u0C7E\u0CE6-\u0CEF\u0D58-\u0D5E\u0D66-\u0D78\u0DE6-\u0DEF\u0E50-\u0E59\u0ED0-\u0ED9\u0F20-\u0F33\u1040-\u1049\u1090-\u1099\u1369-\u137C\u16EE-\u16F0\u17E0-\u17E9\u17F0-\u17F9\u1810-\u1819\u1946-\u194F\u19D0-\u19DA\u1A80-\u1A89\u1A90-\u1A99\u1B50-\u1B59\u1BB0-\u1BB9\u1C40-\u1C49\u1C50-\u1C59\u2070\u2074-\u2079\u2080-\u2089\u2150-\u2182\u2185-\u2189\u2460-\u249B\u24EA-\u24FF\u2776-\u2793\u2CFD\u3007\u3021-\u3029\u3038-\u303A\u3192-\u3195\u3220-\u3229\u3248-\u324F\u3251-\u325F\u3280-\u3289\u32B1-\u32BF\uA620-\uA629\uA6E6-\uA6EF\uA830-\uA835\uA8D0-\uA8D9\uA900-\uA909\uA9D0-\uA9D9\uA9F0-\uA9F9\uAA50-\uAA59\uABF0-\uABF9\uFF10-\uFF19]|\uD800[\uDD07-\uDD33\uDD40-\uDD78\uDD8A\uDD8B\uDEE1-\uDEFB\uDF20-\uDF23\uDF41\uDF4A\uDFD1-\uDFD5]|\uD801[\uDCA0-\uDCA9]|\uD802[\uDC58-\uDC5F\uDC79-\uDC7F\uDCA7-\uDCAF\uDCFB-\uDCFF\uDD16-\uDD1B\uDDBC\uDDBD\uDDC0-\uDDCF\uDDD2-\uDDFF\uDE40-\uDE48\uDE7D\uDE7E\uDE9D-\uDE9F\uDEEB-\uDEEF\uDF58-\uDF5F\uDF78-\uDF7F\uDFA9-\uDFAF]|\uD803[\uDCFA-\uDCFF\uDD30-\uDD39\uDE60-\uDE7E\uDF1D-\uDF26\uDF51-\uDF54\uDFC5-\uDFCB]|\uD804[\uDC52-\uDC6F\uDCF0-\uDCF9\uDD36-\uDD3F\uDDD0-\uDDD9\uDDE1-\uDDF4\uDEF0-\uDEF9]|\uD805[\uDC50-\uDC59\uDCD0-\uDCD9\uDE50-\uDE59\uDEC0-\uDEC9\uDF30-\uDF3B]|\uD806[\uDCE0-\uDCF2\uDD50-\uDD59]|\uD807[\uDC50-\uDC6C\uDD50-\uDD59\uDDA0-\uDDA9\uDFC0-\uDFD4]|\uD809[\uDC00-\uDC6E]|\uD81A[\uDE60-\uDE69\uDEC0-\uDEC9\uDF50-\uDF59\uDF5B-\uDF61]|\uD81B[\uDE80-\uDE96]|\uD834[\uDEE0-\uDEF3\uDF60-\uDF78]|\uD835[\uDFCE-\uDFFF]|\uD838[\uDD40-\uDD49\uDEF0-\uDEF9]|\uD83A[\uDCC7-\uDCCF\uDD50-\uDD59]|\uD83B[\uDC71-\uDCAB\uDCAD-\uDCAF\uDCB1-\uDCB4\uDD01-\uDD2D\uDD2F-\uDD3D]|\uD83C[\uDD00-\uDD0C]|\uD83E[\uDFF0-\uDFF9])
and \p{Nd} (660 code points) expands to
(?:[0-9\u0660-\u0669\u06F0-\u06F9\u07C0-\u07C9\u0966-\u096F\u09E6-\u09EF\u0A66-\u0A6F\u0AE6-\u0AEF\u0B66-\u0B6F\u0BE6-\u0BEF\u0C66-\u0C6F\u0CE6-\u0CEF\u0D66-\u0D6F\u0DE6-\u0DEF\u0E50-\u0E59\u0ED0-\u0ED9\u0F20-\u0F29\u1040-\u1049\u1090-\u1099\u17E0-\u17E9\u1810-\u1819\u1946-\u194F\u19D0-\u19D9\u1A80-\u1A89\u1A90-\u1A99\u1B50-\u1B59\u1BB0-\u1BB9\u1C40-\u1C49\u1C50-\u1C59\uA620-\uA629\uA8D0-\uA8D9\uA900-\uA909\uA9D0-\uA9D9\uA9F0-\uA9F9\uAA50-\uAA59\uABF0-\uABF9\uFF10-\uFF19]|\uD801[\uDCA0-\uDCA9]|\uD803[\uDD30-\uDD39]|\uD804[\uDC66-\uDC6F\uDCF0-\uDCF9\uDD36-\uDD3F\uDDD0-\uDDD9\uDEF0-\uDEF9]|\uD805[\uDC50-\uDC59\uDCD0-\uDCD9\uDE50-\uDE59\uDEC0-\uDEC9\uDF30-\uDF39]|\uD806[\uDCE0-\uDCE9\uDD50-\uDD59]|\uD807[\uDC50-\uDC59\uDD50-\uDD59\uDDA0-\uDDA9]|\uD81A[\uDE60-\uDE69\uDEC0-\uDEC9\uDF50-\uDF59]|\uD835[\uDFCE-\uDFFF]|\uD838[\uDD40-\uDD49\uDEF0-\uDEF9]|\uD83A[\uDD50-\uDD59]|\uD83E[\uDFF0-\uDFF9])
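The answer above targets JavaScript. As a side note in a different language, and not something the original answer proposes, Python offers two related options: for str patterns its \d already matches every Unicode decimal digit unless re.ASCII is set, and unicodedata.normalize with the NFKC form folds full-width characters to their ASCII equivalents before matching. A minimal sketch (fold_width is a hypothetical helper name):

```python
import re
import unicodedata


def fold_width(text: str) -> str:
    """Fold full-width forms to their ASCII equivalents via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Full-width digits 1, 9, 8, 5 written with escapes so the source stays ASCII.
full_width_year = "\uFF11\uFF19\uFF18\uFF15"

# Unicode-aware \d (the default for str patterns) accepts full-width digits.
print(bool(re.fullmatch(r"\d{4}", full_width_year)))

# With re.ASCII, \d is restricted to [0-9] and the same input is rejected.
print(bool(re.fullmatch(r"\d{4}", full_width_year, re.ASCII)))

# After folding, an ASCII-only pattern works again.
print(bool(re.fullmatch(r"[0-9]{4}", fold_width(full_width_year))))
```

NFKC also rewrites other compatibility characters (ligatures, half-width katakana and so on), so normalizing is only appropriate when that broader folding is acceptable for the data.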

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting at the beginning of the string, reject any number whose integer part is a single digit, or two digits with the first being 1, 2, 3 or 4, followed by a comma.
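A quick way to see the lookahead at work is to run the pattern against the examples from the question. This is only a sketch in Python; the constant and function names are made up for the illustration.

```python
import re

# Pattern from the answer: the negative lookahead rejects 0-49 before the comma.
AT_LEAST_50 = re.compile(r"^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b")


def is_at_least_50(amount: str) -> bool:
    """Return True if amount is in continental format and is 50 or more."""
    return AT_LEAST_50.match(amount) is not None


for amount in ["100.000,00", "50.000,00", "50,00", "34,34", "5,00"]:
    print(amount, is_at_least_50(amount))
```

Because the pattern ends in \b rather than $, a trailing unit such as "50,00 EUR" is still accepted.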
Check on regex101.com
Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there are 3 characters or more before the ,
or there are 2 digits before the , and the first is between 5 and 9
and then a , and 2 digits
I don't know if it answers your question, as it will also return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or more times
\d{3}: ddd, three digits
These 3 parts combined match any number equal to or greater than 1000, like 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d): a digit between 5 and 9 followed by another digit. Matches anything between 50 and 99.
So it's either one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
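To check that this stricter pattern rejects malformed input as well as small amounts, it can be exercised in Python. A small sketch, with hypothetical names:

```python
import re

# Pattern from EDIT 3: validates the continental format and requires a value >= 50.
STRICT_AT_LEAST_50 = re.compile(
    r"^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$"
)


def is_valid_and_at_least_50(amount: str) -> bool:
    """Return True only for well-formed continental amounts of 50 or more."""
    return STRICT_AT_LEAST_50.match(amount) is not None


# The last three inputs are malformed and should all be rejected.
for amount in ["1.000,00", "999,99", "50,00", "49,99", "aa50,00", "1sdf,54", "1000,00"]:
    print(amount, is_valid_and_at_least_50(amount))
```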
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If the integer part never exceeds 999.999 and the fractional part always has 2 digits, it becomes a bit simpler:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3}),)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d,)
You could make it more compact and efficient by merging some of the cases:
(\d{3}|[5-9]\d)($|,)
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However, unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into a number, and compare it with 50.
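The parse-and-compare approach could look roughly like this in Python. The function names are hypothetical, and the sketch assumes the input is already well formed (dots as thousands separators, a comma as the decimal separator); it does no format validation of its own.

```python
from decimal import Decimal


def parse_continental(text: str) -> Decimal:
    """Convert a string such as '1.000,00' to Decimal('1000.00').

    Assumes well-formed input: dots separate thousands, a comma
    separates the fractional part.
    """
    return Decimal(text.replace(".", "").replace(",", "."))


def is_at_least_50(text: str) -> bool:
    """Return True if the parsed amount is 50 or more."""
    return parse_continental(text) >= 50


for amount in ["100.000,00", "50.000,00", "50,00", "34,34"]:
    print(amount, is_at_least_50(amount))
```

Decimal is used instead of float so that prices are compared exactly.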

Regex to validate only US and India retail number

I am trying to get a regex which serves below requirements:
Validates US and India retail phone numbers.
Excludes special-purpose/business-purpose phone numbers in both countries, i.e. those starting with 800, 888, 877, 866 or 900, and requires at least 10 digits for the US. There may be more guidelines; the above is just an example.
It should accept special characters such as (, ), +, 1, 0 if they are included; if the number satisfies all these points, then it should be a valid phone number.
If it is preceded by STD or ISD, consider it valid.
Landline and mobile numbers should both be valid.
I looked whether some came across the same requirements, but the solutions I am getting serve different requirements and not exactly the one I am looking for.
Without a definitive exclusion/inclusion list of the phone numbers you want to match, here is a "template" regular expression that you could use to match US numbers:
(?:^|\b)(\+?1[ .\/-]?)?\(?(?!37|950|958|959|96|976)[2-9]([0-8])(?!\2)\d(?:\) ?|[ .\/-])?[2-9](?!11)\d\d[ .-]?\d{4}(?:$|\b)
A break-down:
(?:^|\b): Start of string or word boundary. This prevents, for example, a match from starting in the middle of a longer series of digits;
(\+?1[ .\/-]?)?: this matches an optional prefix of the US country code (i.e. 1), and accepts values like +1, 1/ or a bare 1, optionally followed by a separator;
\(?: an optional opening bracket for the region code;
(?!37|950|958|959|96|976): exclusion list of region codes. When only 2 digits are given, any region code starting with those is rejected -- you'll need to extend this list to identify other "special business" phone numbers you want to exclude;
[2-9]: first digit of region code; cannot be 0 or 1;
([0-8]): second digit of region code; cannot be 9;
(?!\2)\d: third digit of region code; cannot be the same as the second digit (\2 refers to the second match group);
(?:\) ?|[ .\/-])?: optional separator: ), -, ., / or space. If ), it can optionally be followed by a space;
[2-9]: first digit of exchange code; cannot be 0 or 1;
(?!11): exclusion for second and third digits of exchange code -- they cannot both be 1 at the same time;
\d\d: second and third digit of exchange code; no further limitations;
[ .-]?: optional separator; can be -, . or space;
\d{4}: four digit customer number; no restrictions.
(?:$|\b): End of string or word boundary. This prevents, for example, a match from stopping in the middle of a longer series of digits;
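The break-down above can be tried out in Python with a handful of made-up numbers. This is a sketch rather than a vetted validator: the names are hypothetical, and the hyphen is placed last in each separator class so that it is read literally rather than as a range.

```python
import re

# Template pattern from the answer, split over two raw strings for readability.
US_NUMBER = re.compile(
    r"(?:^|\b)(\+?1[ .\/-]?)?\(?(?!37|950|958|959|96|976)[2-9]([0-8])(?!\2)\d"
    r"(?:\) ?|[ .\/-])?[2-9](?!11)\d\d[ .-]?\d{4}(?:$|\b)"
)


def find_us_number(text: str):
    """Return the first matching US number in text, or None if there is none."""
    match = US_NUMBER.search(text)
    return match.group(0) if match else None


for sample in ["(212) 555-0123", "+1 415-555-2671", "800-555-0123", "212-911-1234"]:
    print(f"{sample!r}: {find_us_number(sample)!r}")
```

Note that 800, 888, 877, 866 and 900 are already rejected by the rule that the third digit of the region code must differ from the second, without appearing in the exclusion list.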
Here is an online regex test.
I suppose with the above as inspiration, you could fine-tune it to your expectations, and add the Indian formats in the same manner. You can use the | operator to separate the two sub-regular expressions you will have, like (US|IND), where you need to replace those two arguments by real expressions of course.
To capture also the prefix STD or ISD, you can insert the following in the above regex, right after the break test:
(?:STD\b\s*|ISD\b\s*|)
...which matches these optional words followed by optional spaces.
However, the complexity of the final regex will increase the more precise you want to match and exclude invalid numbers. For example, if you would want to validate against the All India STD Code List, then your regular expression would get very long and hard to manage.

Regex Matching Phone Number

I'm trying to create a regex which will match anything which looks like a phone number. If there's more than one number in a string, match both of them. A phone number is defined as:
10+ characters
Does not end in N, but can end in other letters/words
So I'd like to match these:
07158245215
01244356356
07158245215Y
01244356356Y
07158245215P
01244356356P
07158245215X
01244356356X
07158245215 work
01244 356356 work
work 07158 245215 / home 07158 245215 // might be a difficult one
work 01244356356
And disallow these:
071582 45215N
01244356356N
01244356356 N
I've toyed with negative lookahead/lookbehind but I can't get anything intelligible out. Is this even possible, or shall I do it in a higher-level language like .NET?
(?:\d\s*){10,}(?![\d\s]*N)
will match a 10+ digit phone number within a longer string, as long as that number is not followed by N. It allows any number of spaces between each digit.
If all your phone numbers always start with 0 as in your example, you can explicitly code that into the regex:
\b0\s*(?:\d\s*){9,}(?![\d\s]*N)
See it on RegExr.
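For a string containing several numbers, the first pattern can be applied repeatedly. A minimal Python sketch, assuming trailing whitespace captured by \s* should be trimmed from each hit (find_numbers is a hypothetical helper):

```python
import re

# Pattern from the answer: 10+ digits with optional spaces, not followed by N.
PHONE = re.compile(r"(?:\d\s*){10,}(?![\d\s]*N)")


def find_numbers(text: str) -> list:
    """Return every phone-like number in text, with surrounding whitespace removed."""
    return [match.group(0).strip() for match in PHONE.finditer(text)]


samples = [
    "07158245215Y",
    "01244 356356 work",
    "work 07158 245215 / home 07158 245215",
    "01244356356N",
    "01244356356 N",
    "071582 45215N",
]
for sample in samples:
    print(f"{sample!r}: {find_numbers(sample)}")
```

The [\d\s]* inside the lookahead matters: without it, the engine could backtrack by one digit and still report a match for a number that ends in N.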

Custom RegEx expression for validating different possibilities of phone number entries?

I'm looking for a custom RegEx expression (that works!) that will validate common phone number entries with area code (no country code) such as:
111-111-1111
(111) 111-1111
(111)111-1111
111 111 1111
111.111.1111
1111111111
And combinations of these / anything else I may have forgotten.
Also, is it possible to have the RegEx expression itself reformat the entry? So take the 1111111111 and put it in 111-111-1111 format. The regex will most likely be entered in a Joomla / some type of CMS module, so I can't really add code to it aside from the expression itself.
\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})
will match all your examples; after a match, backreference 1 will contain the area code, backreference 2 and 3 will contain the phone number.
I hope you don't need to handle international phone numbers, too.
If the phone number is in a string by itself, you could also use
^\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*$
allowing for leading/trailing whitespace and nothing else.
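Where code is available, the three backreferences are enough to reformat the entry. The following Python sketch uses the anchored pattern; the reformat function is a hypothetical name, and a CMS field that only accepts a bare expression would not be able to do this step.

```python
import re

# Anchored pattern from the answer; groups 1-3 hold area code, prefix and line number.
US_LOCAL = re.compile(r"^\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*$")


def reformat(text: str):
    """Return the number as 111-111-1111, or None if text does not match."""
    match = US_LOCAL.match(text)
    if match is None:
        return None
    return "-".join(match.groups())


entries = [
    "111-111-1111",
    "(111) 111-1111",
    "(111)111-1111",
    "111 111 1111",
    "111.111.1111",
    "1111111111",
]
for entry in entries:
    print(f"{entry!r} -> {reformat(entry)!r}")
```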
Why not just remove spaces, parentheses, dashes, and periods, then check that what remains is a 10-digit number?
Depending on the language in question, you might be better off using a replace-like statement to replace non-numeric characters: ()-/. with nothing, and then just check if what is left is a 10-digit number.
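A strip-then-check version might look like the following in Python. It is a sketch under the assumption that whitespace and the characters ( ) - / . are the only separators to ignore; the function name is hypothetical.

```python
import re


def is_ten_digit_number(text: str) -> bool:
    """Drop common separators, then require exactly ten ASCII digits."""
    stripped = re.sub(r"[\s()./-]", "", text)
    return re.fullmatch(r"[0-9]{10}", stripped) is not None


for entry in ["(111) 111-1111", "111.111.1111", "111/111/1111", "111-1111", "111-111-111a"]:
    print(f"{entry!r}: {is_ten_digit_number(entry)}")
```

This is more permissive than the regex answers: it also accepts odd groupings such as "1-1-1-1111111", which may or may not be acceptable.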