Regex for minimum and range of number - regex

I am trying to create a regex for a exactly five digit number which should be in the range between 90000 – 96163.
I created a regex for exactly 5 number
#"^\d{5}$"
Now how do I make sure that it is between the range of 90000 – 96163?
Anything smaller than 90001 and over 96162 should not work.
Thanks

This is most easily achieved using a regular numeric comparison (using < and > operators) in your language.
You can do a range check using regular expressions, but it's tedious to implement and all but nicely readable. For the sake of completeness, here's a possible pattern:
9([0-5][0-9]{3}|6(0[0-9]{2}|1([0-5][0-9]|6[0-3])))
Broken up, the pattern reads as follows:
9 # The first digit must be a 9
(
[0-5][0-9]{3} # Covering the range 90000-95999
|
6 # Matching 96xxx
(
0[0-9]{2} # Covering the range 96000-96099
|
1 # Matching 961xx
(
[0-5][0-9] # Covering the range 96100-96159
|
6[0-3] # Covering the range 96160-96163
)
)
)
Please don't do this if it can be avoided. Just consider what happens when the range boundaries change: Imagine you have to check whether a value is between 7243 and 132843 — not fun.

Digit by digit:
/(?!^90000$)(^9([0-5]\d{3}|6(0\d{2}|1([0-5]\d|6[0-2]))))$/

(9[0-5][0-9]{3}|960[0-9]{2}|961[0-5][0-9]|9616[0-3])
http://gamon.webfactional.com/regexnumericrangegenerator/
This would be enough to find a regex to find a range. No need to ask about range problems

Related

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there 3 chars or more before the ,
there are 2 numbers before the , and the first is between 5 and 9
and then a , and 2 numbers
Donno if it answer your question as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or x time
\d{3}: ddd 3 digit
These 3 parts combined match any numbers equals or greater than 1000 like: 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d)): a number between 5 and 9 followed by a number. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If you always have the integer part under 999.999 and fractal part always 2 digits, it will be a bit more simple:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d)
You could make it more compact and efficient by merging some of the cases:
(\d{3}[$,]|[5-9]\d[$,])
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.

Regular Expression (RegEx) For Hours with Increments

I need to only accept input that meets these rules...
0.25-24
Increments of .25 (.00, .25, .50, .75)
First digit doesn't have to be required.
Would like trailing zeros to be optional.
Examples of some valid entries:
0.25
.50
.5
1
1.0
5.50
23.75
24 (max allowed)
UPDATE: nothing at all, null/blank, should also be accepted as valid
Example of some invalid entries:
0
.0
.00
0.0
0.00
24.25
-1
I understand that RegEx is a pattern matching language therefore it's not great for ranges, less-than, and great-than checking. So to check if it's less than or equal to 24 means I'd have to find a pattern, right? So there are 24 possible patters which would make this a long RegEx, am I understanding this correctly? I could use ColdFusion to do the check to make sure it's in the 0-24 range. It's not the end of the world if I have use ColdFusion for this part, but it'd be nice to get it all into the RegEx if it doesn't cause it to be too long. This is what I have so far:
^\d{0,2}((\.(0|00|25|5|50|75))?)$
http://regex101.com/r/iS7zM3
This handles pretty much all of it except for the 0-24 range check or the check for just a zero. I'll keep plugging away at it but any help would be appreciated. Thanks!
Change \d{0,2} to (?:1[0-9]?|2[0-4]?|[3-9])? and it'll match from 1 to 24 (or nothing).
You can also simplify the second part to (?:\.(?:00?|25|50?|75))? - you could go further to (?:\.(?:[05]0?|[27]5))? but that might obfuscate the intent a bit too far.
To exclude 24.25 you could perhaps use a negative lookahead (?!24\.[^0]) to prevent anything other than 24.0 or 24.00, but it's probably simpler to just exclude 24 from the main pattern and include a specific check for 24/24.0/24.00 at the start:
(?x)
# checks for 24
^24$|^24\.00?$
|
# integer part
^
(?:1[0-9]?|2[0-3]?|[3-9]|0(?=\.[^0])|(?=\.[^0]))
# decimal part
(?:\.(?:00?|25|50?|75))?
$
That also includes a check for 0(?=\.[^0]) which uses a positive lookahead to only allow an initial 0 if the next char is a . followed by a non-zero (so 0.0 and 0.00 isn't allowed).
The (?x) flag allows whitespace to be ignored, allowing readable regex in your code - obviously preferable to squashing it all onto a single line - and also enables the use of # to start line comments to explain parts of a pattern. (Literal whitespaces and hashes can be escaped with backslash, or encoded via e.g. \x23 for hash.)
For comparison, here's a pure-CFML way of doing it:
IsNumeric(Num)
AND Num GT 0
AND Num LTE 24
AND NOT find('.',Num*4)
Now, are you really sure it's better as a regex...
You could try this regex (broken down):
^
(?:
(?:[1-9]|1\d|2[0-3])(?:\.(?:[05]0?|[27]5))? # Non-zeros with optional decimal
|
0?(?:\.(?:50?|[27]5)) # Decimals under 1
|
24(?:\.00?)? # The maximum
)
$
In one line:
^(?:(?:[1-9]|1\d|2[0-3])(?:\.(?:[05]0?|[27]5))?|0?(?:\.(?:50?|[27]5))|24(?:\.00?)?)$
regex101 demo
^([0-1]?[0-9]|2[0-4])((\.(0|00|25|5|50|75))?)$
This means the one's place can be 0-9 if the tens place is missing, a 0, or 1.
If the tens place is a 2, then the ones place can be 0-4.
The second part is great, it's simple and readable too. It has an extra set of parens though that can be removed, reducing it to this:
^([0-1]?[0-9]|2[0-4])(\.(0|00|25|5|50|75))?$

Is there a regex to match a numeric range, and also ensure that it's a valid range?

Is there a regular expression to match a numeric range, e.g. 1 - 20?
If so, is it possible to ensure that the left value is always less than the right value? It wouldn't make sense to have a range e.g. 20 - 1 or 15 - 5
As the commentors note: if this is possible in a regex it will be very hard. There is no direct support to perform arithmetical comparisons (including greater than) in a regex.
Better to use a regex to validate the format and capture the two numbers. If the regex matches then use the host language to convert the captures into numbers and compare.
Yes you can do that. You can ensure that a given number is smaller than another number in the same text.
This regex tests whether the first number is smaller than the next number: Format: XXX,YYY
XXX < YYY:
\b(?:[1-9](?<open>\B\d)+\d*,(?<close-open>\d)+(?(open)(?!))\b
|
(?<prefix>\d*)(?:(?<g0>0)|(?<g1>1)|(?<g2>2)|(?<g3>3)|(?<g4>4)|(?<g5>5)|(?<g6>6)|(?<g7>7)|(?<g8>8)|(?<g9>9))(?<suffix>\d)*,\k<prefix>(?(g0)(?!)|(?(g1)0|(?(g2)[01]|(?(g3)[0-2]|(?(g4)[0-3]|(?(g5)[0-4]|(?(g6)[0-5]|(?(g7)[0-6]|(?(g8)[0-7]|(?(g9)[0-8]))))))))))(?<suffix2-suffix>\d)*(?(suffix)(?!)))\b
XXX > YYY:
(?<open>\B\d|\b[1-9])+,[1-9](?<close-open>\d)+(?(open)(?!))\d*\b
|
(?<prefix>\d*)(?:(?<g0>0)|(?<g1>1)|(?<g2>2)|(?<g3>3)|(?<g4>4)|(?<g5>5)|(?<g6>6)|(?<g7>7)|(?<g8>8)|(?<g9>9))(?<suffix>\d)*,\k<prefix>(?(g0)[1-9]|(?(g1)[2-9]|(?(g2)[3-9]|(?(g3)[4-9]|(?(g4)[5-9]|(?(g5)[6-9]|(?(g6)[7-9]|(?(g7)[89]|(?(g8)9|(?(g9)(?!)))))))))))(?<suffix2-suffix>\d)*(?(suffix)(?!))
If you want to use - as separator you only have to replace the , with - in this regex. This regex was created and tested using C# regex.

Regexes for integer constants and for binary numbers

I have tried 2 questions, could you tell me whether I am right or not?
Regular expression of nonnegative integer constants in C, where numbers beginning with 0 are octal constants and other numbers are decimal constants.
I tried 0([1-7][0-7]*)?|[1-9][0-9]*, is it right? And what string could I match? Do you think 034567 will match and 000083 match?
What is a regular expression for binary numbers x such that hx + ix = jx?
I tried (0|1){32}|1|(10)).. do you think a string like 10 will match and 11 won’t match?
Please tell me whether I am right or not.
You can always use http://www.spaweditor.com/scripts/regex/ for a quick test on whether a particular regex works as you intend it to. This along with google can help you nail the regex you want.
0([1-7][0-7])?|[1-9][0-9] is wrong because there's no repetition - it will only match 1 or 2-character strings. What you need is something like 0[0-7]*|[1-9][0-9]*, though that doesn't take hexadecimal into account (as per spec).
This one is not clear. Could you rephrase that or give some more examples?
Your regex for integer constants will not match base-10 numbers longer than two digits and octal numbers longer than three digits (2 if you don't count the leading zero). Since this is a homework, I leave it up to you to figure out what's wrong with it.
Hint: Google for "regular expression repetition quantifiers".
Question 1:
Octal numbers:
A string that start with a [0] , then can be followed by any digit 1, 2, .. 7 [1-7](assuming no leading zeroes) but can also contain zeroes after the first actual digit, so [0-7]* (* is for repetition, zero or more times).
So we get the following RegEx for this part: 0 [1-7][0-7]*
Decimal numbers:
Decimal numbers must not have a leading zero, hence start with all digits from 1 to 9 [1-9], but zeroes are allowed in all other positions as well hence we need to concatenate [0-9]*
So we get the following RegEx for this part: [1-9][0-9]*
Since we have two options (octal and decimal numbers) and either one is possible we can use the Alternation property '|' :
L = 0[1-7][0-7]* | [1-9][0-9]*
Question 2:
Quickly looking at Fermat's Last Theorem:
In number theory, Fermat's Last Theorem (sometimes called Fermat's conjecture, especially in older texts) states that no three positive integers a, b, and c can satisfy the equation an + bn = cn for any integer value of n greater than two.
(http://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem)
Hence the following sets where n<=2 satisfy the equation: {0,1,2}base10 = {0,1,10}base2
If any of those elements satisfy the equation, we use the Alternation | (or)
So the regular expression can be: L = 0 | 1 | 10 but can also be L = 00 | 01 | 10 or even be L = 0 | 1 | 10 | 00 | 01
Or can be generalized into:
{0} we can have infinite number of zeroes: 0*
{1} we can have infinite number of zeroes followed by a 1: 0*1
{10} we can have infinite number of zeroes followed by 10: 0*10
So L = 0* | 0*1 | 0*10
max answered the first question.
the second appears to be the unsolvable diophantine equation of fermat's last theorem. if h,i,j are non-zero integers, x can only be 1 or 2, so you're looking for
^0*10?$
does that help?
There are several tool available to test regular expressions, such as The Regulator.
If you search for "regular expression test" you will find numerous links to online testers.

Can I optimize this phone-regex?

Ok, so I have this regex:
( |^|>)(((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{2})(-)?( )?)?)([0-9]{7}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{3})(-)?( )?)?)([0-9]{6}))|((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{1})(-)?( )?)?)([0-9]{8})))( |$|<)
It formats Dutch and Belgian phone numbers (I only want those hence the 31 and 32 as country code).
Its not much fun to decipher but as you can see it also has a lot duplicated. but now it does handles it very accurately
All the following European formatted phone numbers are accepted
0031201234567
0031223234567
0031612345678
+31(0)20-1234567
+31(0)223-234567
+31(0)6-12345678
020-1234567
0223-234567
06-12345678
0201234567
0223234567
0612345678
and the following false formatted ones are not
06-1234567 (mobile phone number in the Netherlands should have 8 numbers after 06 )
0223-1234567 (area code with home phone)
as opposed to this which is good.
020-1234567 (area code with 3 numbers has 7 numbers for the phone as opposed to a 4 number area code which can only have 6 numbers for phone number)
As you can see it's the '-' character that makes it a little difficult but I need it in there because it's a part of the formatting usually used by people, and I want to be able to parse them all.
Now is my question... do you see a way to simplify this regex (or even improve it if you see a fault in it), while keeping the same rules?
You can test it at regextester.com
(The '( |^|>)' is to check if it is at the start of a word with the possibility it being preceded by either a new line or a '>'. I search for the phone numbers in HTML pages.)
First observation: reading the regex is a nightmare. It cries out for Perl's /x mode.
Second observation: there are lots, and lots, and lots of capturing parentheses in the expression (42 if I count correctly; and 42 is, of course, "The Answer to Life, the Universe, and Everything" -- see Douglas Adams "Hitchiker's Guide to the Galaxy" if you need that explained).
Bill the Lizard notes that you use '(-)?( )?' several times. There's no obvious advantage to that compared with '-? ?' or possibly '[- ]?', unless you are really intent on capturing the actual punctuation separately (but there are so many capturing parentheses working out which '$n' items to use would be hard).
So, let's try editing a copy of your one-liner:
( |^|>)
(
((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{2})(-)?( )?)?)([0-9]{7})) |
((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{3})(-)?( )?)?)([0-9]{6})) |
((((((\+|00)(31|32)( )?(\(0\))?)|0)([0-9]{1})(-)?( )?)?)([0-9]{8}))
)
( |$|<)
OK - now we can see the regular structure of your regular expression.
There's much more analysis possible from here. Yes, there can be vast improvements to the regular expression. The first, obvious, one is to extract the international prefix part, and apply that once (optionally, or require the leading zero) and then apply the national rules.
( |^|>)
(
(((\+|00)(31|32)( )?(\(0\))?)|0)
(((([0-9]{2})(-)?( )?)?)([0-9]{7})) |
(((([0-9]{3})(-)?( )?)?)([0-9]{6})) |
(((([0-9]{1})(-)?( )?)?)([0-9]{8}))
)
( |$|<)
Then we can simplify the punctuation as noted before, and remove some plausibly redundant parentheses, and improve the country code recognizer:
( |^|>)
(
(((\+|00)3[12] ?(\(0\))?)|0)
(((([0-9]{2})-? ?)?)[0-9]{7}) |
(((([0-9]{3})-? ?)?)[0-9]{6}) |
(((([0-9]{1})-? ?)?)[0-9]{8})
)
( |$|<)
We can observe that the regex does not enforce the rules on mobile phone codes (so it does not insist that '06' is followed by 8 digits, for example). It also seems to allow the 1, 2 or 3 digit 'exchange' code to be optional, even with an international prefix - probably not what you had in mind, and fixing that removes some more parentheses. We can remove still more parentheses after that, leading to:
( |^|>)
(
(((\+|00)3[12] ?(\(0\))?)|0) # International prefix or leading zero
([0-9]{2}-? ?[0-9]{7}) | # xx-xxxxxxx
([0-9]{3}-? ?[0-9]{6}) | # xxx-xxxxxx
([0-9]{1}-? ?[0-9]{8}) # x-xxxxxxxx
)
( |$|<)
And you can work out further optimizations from here, I'd hope.
Good Lord Almighty, what a mess! :) If you have high-level semantic or business rules (such as the ones you describe talking about European numbers, numbers in the Netherlands, etc.) you'd probably be better served breaking that single regexp test into several individual regexp tests, one for each of your high level rules.
if number =~ /...../ # Dutch mobiles
# ...
elsif number =~ /..../ # Belgian landlines
# ...
# etc.
end
It'll be quite a bit easier to read and maintain and change that way.
Split it into multiple expressions. For example (pseudo-code)...
phone_no_patterns = [
/[0-9]{13}/, # 0031201234567
/+(31|32)\(0\)\d{2}-\d{7}/ # +31(0)20-1234567
# ..etc..
]
def check_number(num):
for pattern in phone_no_patterns:
if num matches pattern:
return match.groups
Then you just loop over each pattern, checking if each one matches..
Splitting the patterns up makes its easy to fix specific numbers that are causing problems (which would be horrible with the single monolithic regex)
(31|32) looks bad. When matching 32, the regex engine will first try to match 31 (2 chars), fail, and backtrack two characters to match 31. It's more efficient to first match 3 (one character), try 1 (fail), backtrack one character and match 2.
Of course, your regex fails on 0800- numbers; they're not 10 digits.
It's not an optimization, but you use
(-)?( )?
three times in your regex. This will cause you to match on phone numbers like these
+31(0)6-12345678
+31(0)6 12345678
but will also match numbers containing a dash followed by a space, like
+31(0)6- 12345678
You can replace
(-)?( )?
with
(-| )?
to match either a dash or a space.