I'm pretty sure this hasn't actually been answered yet on this site. Once and for all: what is the shortest regex that matches a numeric string within the range of a 32-bit signed integer, i.e. -2147483648 to 2147483647?
I must use regex for validation - that is the only option available to me.
I have tried
\d{1,10}
but I can't figure out how to restrict it to the valid number range.
To aid developing in regex, it should match:
-2147483648
-2099999999
-999999999
-1
0
1
999999999
2099999999
2147483647
It should not match:
-2147483649
-2200000000
-11111111111
2147483648
2200000000
11111111111
I have set up an on-line live demo (on rubular) that has my attempt and the test cases above.
Note: The shortest regex that works will be accepted. Efficiency of regex will not be considered (unless there's a tie for shortest length).
I really hope this is just a puzzle and that no one will use a regex for this problem in the real world. The proper solution is to convert the string to a numeric type such as BigInteger, which lets us check the range with proper methods or operators such as compareTo, > and <.
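As a minimal sketch of that approach in Python (whose int is arbitrary-precision, so it plays the role BigInteger would in Java), the check could look like this. The name fits_int32 and the strict "optional minus, then digits" format are assumptions made for illustration, not part of the question.

```python
import re

INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647

# Format check only: optional minus sign followed by ASCII digits.
_INTEGER = re.compile(r"-?[0-9]+")

def fits_int32(text: str) -> bool:
    """Return True if text is a decimal integer within the 32-bit signed range."""
    if not _INTEGER.fullmatch(text):
        return False
    # Python ints never overflow, so the comparison is safe for any length.
    return INT32_MIN <= int(text) <= INT32_MAX
```

Here the regex only decides whether the text looks like an integer; the range decision is left to ordinary comparison.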
To make life easier, you can use this page (dead link) to generate a regex for a range. A regex for the range 0 - 2147483647 can look like
\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))\b
(friendlier way)
\b(
[0-9]{1,9}|
1[0-9]{9}|
2(0[0-9]{8}|
1([0-3][0-9]{7}|
4([0-6][0-9]{6}|
7([0-3][0-9]{5}|
4([0-7][0-9]{4}|
8([0-2][0-9]{3}|
3([0-5][0-9]{2}|
6([0-3][0-9]|
4[0-7]
)))))))))\b
and range 0 - 2147483648
\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))\b
So we can just combine these ranges and write it as
range of 0-2147483647 OR "-" range of 0-2147483648
which will give us
\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))\b|-\b([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))\b
[edit]
As Bohemian noted in his comment, the final regex can take the form -?regex1|-2147483648, so here is a slightly shorter version (with [0-9] also changed to \d)
^-?(\d{1,9}|1\d{9}|2(0\d{8}|1([0-3]\d{7}|4([0-6]\d{6}|7([0-3]\d{5}|4([0-7]\d{4}|8([0-2]\d{3}|3([0-5]\d{2}|6([0-3]\d|4[0-7])))))))))$|^-2147483648$
If you use it with Java's String#matches(regex) method on each line, you can also drop the ^ and $ parts, since matches() only succeeds when the entire string matches the regex.
I know this regex is very ugly, but it just shows why regex is not a good tool for range validation.
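To check a pattern like this against the test cases from the question, a small harness helps. The sketch below uses Python's re module; using re.fullmatch (which makes the ^ and $ anchors redundant) and re.ASCII (which restricts \d to 0-9) are choices made here, not something the answer specifies.

```python
import re

# The shortened pattern from above, split across lines only for readability.
INT32_PATTERN = re.compile(
    r"^-?(\d{1,9}|1\d{9}|2(0\d{8}|1([0-3]\d{7}|4([0-6]\d{6}|7([0-3]\d{5}"
    r"|4([0-7]\d{4}|8([0-2]\d{3}|3([0-5]\d{2}|6([0-3]\d|4[0-7])))))))))$"
    r"|^-2147483648$",
    re.ASCII,
)

SHOULD_MATCH = ["-2147483648", "-2099999999", "-999999999", "-1", "0", "1",
                "999999999", "2099999999", "2147483647"]
SHOULD_NOT_MATCH = ["-2147483649", "-2200000000", "-11111111111",
                    "2147483648", "2200000000", "11111111111"]

def is_int32(text: str) -> bool:
    return INT32_PATTERN.fullmatch(text) is not None

# Collect every test case the pattern gets wrong.
failures = [s for s in SHOULD_MATCH if not is_int32(s)]
failures += [s for s in SHOULD_NOT_MATCH if is_int32(s)]
print("failures:", failures)
```

An empty failures list means the pattern agrees with every case listed in the question.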
Edit:
This is the shortest regex you can get and the best way to do it:
We check every digit starting from the left; if a digit reaches its limit and all the previous ones did too, we constrain the next one.
For the range -2147483647 to 2147483647 the - sign is optional. For -2147483648 the - sign is required.
So finally we get this:
^-?([0-9]{1,9}|[0-1][0-9]{9}|20[0-9]{8}|21[0-3][0-9]{7}|214[0-6][0-9]{6}|2147[0-3][0-9]{5}|21474[0-7][0-9]{4}|214748[0-2][0-9]{3}|2147483[0-5][0-9]{2}|21474836[0-3][0-9]|214748364[0-7])$|^(-2147483648)$
And this is a Live Demo
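The digit-by-digit rule above is mechanical enough to generate. The following Python sketch builds the same kind of flat alternation for any non-negative upper bound; the function name upto_regex is hypothetical, and like the regex above it tolerates leading zeros.

```python
import re

def upto_regex(limit: int) -> str:
    """Return an alternation matching the decimal strings for 0..limit."""
    digits = str(limit)
    n = len(digits)
    alternatives = []
    if n > 1:
        # Any number with fewer digits than the limit is in range.
        alternatives.append("[0-9]{1,%d}" % (n - 1))
    for i, ch in enumerate(digits):
        is_last = i == n - 1
        # Keep the limit's prefix, then allow a smaller digit at position i
        # (or an equal one at the last position, so the limit itself matches).
        highest = int(ch) if is_last else int(ch) - 1
        if highest < 0:
            continue  # the limit has a 0 here; nothing smaller exists
        remaining = n - 1 - i
        tail = "[0-9]{%d}" % remaining if remaining else ""
        alternatives.append("%s[0-%d]%s" % (digits[:i], highest, tail))
    return "|".join(alternatives)

# Signed 32-bit range: optional minus for 0..2147483647, plus the one extra negative.
INT32 = re.compile("-?(?:%s)|-2147483648" % upto_regex(2**31 - 1))
```

Used with INT32.fullmatch(text), this behaves like the hand-written pattern; generating it mainly guards against typos in the long alternation.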
^(429496729[0-6]|42949672[0-8]\d|4294967[01]\d{2}|429496[0-6]\d{3}|42949[0-5]\d{4}|4294[0-8]\d{5}|429[0-3]\d{6}|42[0-8]\d{7}|4[01]\d{8}|[1-3]\d{9}|[1-9]\d{8}|[1-9]\d{7}|[1-9]\d{6}|[1-9]\d{5}|[1-9]\d{4}|[1-9]\d{3}|[1-9]\d{2}|[1-9]\d|\d)$
Kindly try this; I tested it randomly, not thoroughly.
It only covers numbers from zero upward (and, as written, its upper bound is 4294967296 rather than 2147483647). Add '-' and adjust the last number pattern for negative numbers.
(^\d{1,9}$|^1\d{9}$|^20\d{8}$|^21[0-3]\d{7}$|^214[0-6]\d{6}$|^2147[0-3]\d{5}$|^21474[0-7]\d{4}$|^214748[0-2]\d{3}$|^2147483[0-5]\d{2}$|^21474836[0-3]\d$|^214748364[0-7]$)
One should never use regex for this type of work.
Related
The following gives me 9090, but I want to get -9090:
regexp_replace('abcd-9090','[^0-9]','')
If I use regexp_replace('abcd-9090','[^0-9-]','')
then it gives -9090,
but when the string is abcd9090- it gives me 9090-.
There could be many more cases, I guess; for example, abc-abcd-9090 would give me --9090. But it's safe to assume that this will not happen and that there will be only a single - before the numeric value.
Since there could be many cases, I am just going to assume the best and replace the flawed code with a more correct pattern that produces an integer almost always.
For instance, it is probably okay to assume that only a single - can appear, at the beginning of the digits in the string.
Any help is appreciated.
I guess you can try to use regexp_extract instead:
regexp_extract('abcd-9090','.*(-[0-9]+)',1)
UPD from a comment: the author needs to handle one more corner case:
regexp_extract(regexp_replace('-ab2cd9090','[^\\d-]+',''),'(-?\\d+)',1)
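The same two-step idea (drop everything except digits and minus signs, then take the first optionally signed run of digits) can be tried outside Hive. This is a Python analogue using re, offered as a sketch of the logic rather than a statement about Hive's own behaviour; the function name extract_signed_int and the empty-string result for "no digits" are made up for the example.

```python
import re

def extract_signed_int(text: str) -> str:
    """Strip non-digit, non-minus characters, then return the first signed digit run."""
    cleaned = re.sub(r"[^0-9-]+", "", text)      # keep only digits and '-'
    match = re.search(r"-?[0-9]+", cleaned)      # first optionally signed digit run
    return match.group(0) if match else ""

for sample in ["abcd-9090", "abcd9090-", "-ab2cd9090", "abc-abcd-9090"]:
    print(sample, "->", extract_signed_int(sample))
```

Note that a trailing or doubled - is simply skipped, because the second pattern only accepts a minus sign that is immediately followed by a digit.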
I checked around but didn't find a regular expression that was suitable. I'm trying to match on only numbers (8-32) and tried a few combinations that were unsuccessful including (Regex regex = new Regex("[8-9]|[10-29]\\d",RegexOptions.IgnoreCase | RegexOptions.Singleline);). This only got me up to 8-29 and then I got lost.
I know there is a better and easier way if I just create an if statement, but I'll never learn anything doing it that way. :-)
Any help would be greatly appreciated.
Using a regex for checking whether a number is in a range is a bad idea. Regex only cares about what characters are in the string, not what the value of each character represents. The regex engine doesn't know that 2 in 23 actually means 20. To it, it's the same as any other 2.
You might be able to write a highly complex regex to do that, but don't.
Assuming you are using C#, just convert the string to an integer like this
var integer = Convert.ToInt32(yourString);
then check if it is in range with an if statement:
if (integer >= 8 && integer <= 32) {
}
If your number is a part of a larger string, then you can use regex to extract the number out, convert it to an int, and check it with an if.
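For that last case, where the number sits inside a larger string, the flow might look like the following sketch. It is written in Python rather than C#, and the function name plus the choice to take the first run of digits (ignoring any sign) are assumptions.

```python
import re

def first_number_in_range(text: str, low: int = 8, high: int = 32) -> bool:
    """Extract the first run of digits from text and test low <= value <= high."""
    match = re.search(r"[0-9]+", text)   # the regex only locates the digits
    if match is None:
        return False
    value = int(match.group(0))          # conversion gives them numeric meaning
    return low <= value <= high
```

For example, first_number_in_range("room 12") is True, while first_number_in_range("item 7") is False.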
As a reference for regex testing with explanations, I would suggest https://regexr.com/
For your need (8-32), you will want a pattern like
[8-9]|[1-2][0-9]|3[0-2]
which matches 8 or 9, any number from 10 to 29, or 30 to 32.
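One caveat worth checking: without anchors, the pattern can match part of a longer number (the 8 in 48, say). Here is a quick Python sketch that tests it against whole strings only, using re.fullmatch as the anchoring mechanism; how the pattern will actually be applied is an assumption.

```python
import re

PATTERN = re.compile(r"[8-9]|[1-2][0-9]|3[0-2]")

def in_8_to_32(text: str) -> bool:
    # fullmatch requires the whole string to match, like wrapping in ^(?:...)$
    return PATTERN.fullmatch(text) is not None

accepted = [n for n in range(0, 100) if in_8_to_32(str(n))]
print(accepted == list(range(8, 33)))      # True
print(PATTERN.search("48") is not None)    # True: unanchored, it finds the 8
```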
I'm trying to find a regex that validates for a number being greater or less than 0.
It must allow numbers such as 1.20, -2, 0.0000001, etc. It simply can't be 0 and it must be a number, which also means it can't be 0.00 or 0.0.
^(?=.*[1-9])(?:[1-9]\d*\.?|0?\.)\d*$
I tried that, but it does not allow negative numbers.
I don't think a regex is the appropriate tool for that problem.
Why not use a simple condition?
long number = ...;
if (number != 0)
{
// ...
}
Why use a bazooka to kill a fly?
also tried something:
-?[0-9]*([1-9][0-9]*(\.[0-9]*)?|\.[0-9]*[1-9][0-9]*)
demo: http://regex101.com/r/bZ8fE5
Just tried something:
[+-]?(?:\d*[1-9]\d*(?:\.\d+)?|0+\.\d*[1-9]\d*)
Online demo
Take a typical regex for a number, say
^[+-]?[0-9]*(\.[0-9]*)?$
and then require that there be a non-zero digit either before or after the decimal. Based on your examples, you're not expecting leading zeros before the decimal, so a simple regex might be
^[+-]?([1-9][0-9]*(\.[0-9]*)?|[0-9]*\.[0-9]*[1-9][0-9]*)$
Then decide if you still want to use a regex for this.
Try to negate the regex like this
!^[0\.]+$
If you feel the need to use regex just because the value is stored as a String, you could instead use Double.parseDouble() to convert the string into a numeric type. This has the added advantage of checking whether the string is a valid number at all (by catching NumberFormatException).
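A rough Python equivalent of that idea uses float() in place of Double.parseDouble() and ValueError in place of NumberFormatException. The function name is illustrative. Note that float() accepts more spellings than the regexes above (exponents, surrounding whitespace, 'inf', 'nan'), so the sketch filters out non-finite values explicitly.

```python
import math

def is_nonzero_number(text: str) -> bool:
    """Return True if text parses as a finite number other than zero."""
    try:
        value = float(text)
    except ValueError:
        return False          # not a number at all
    return math.isfinite(value) and value != 0
```

With this, "1.20", "-2" and "0.0000001" pass, while "0", "0.0" and "0.00" are rejected.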
I'm currently using ([1-9]|1[0-2]) to represent inputs from 1 to 12. (Leading zeros not allowed.)
However it seems rather hacky, and on some days it looks outright dirty.
☞ Is there a proper in-built way to do it?
☞ What are some other ways to represent number ranges?
I tend to go with forms like [2-9]|1[0-2]? which avoids backtracking, though it makes little difference here. I've been conditioned by XML Schema to avoid such "ambiguities", even though regex can handle them fine.
Yes, the correct one:
[1-9]|1[0-2]
Otherwise you don't get the 10.
Here is a better answer, with an exact match from 1 to 12.
(^0?[1-9]$)|(^1[0-2]$)
The previous answers don't really work well with HTML input regex validation, where values like '1111' or '1212' are still treated as valid input.
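The difference is easy to see in a quick experiment. The Python sketch below contrasts an unanchored search with the anchored pattern above; the sample inputs are arbitrary.

```python
import re

unanchored = re.compile(r"[1-9]|1[0-2]")
anchored = re.compile(r"(^0?[1-9]$)|(^1[0-2]$)")

for sample in ["7", "07", "12", "13", "1111", "1212", "0"]:
    print(sample,
          bool(unanchored.search(sample)),   # True whenever any digit 1-9 appears
          bool(anchored.search(sample)))     # True only for 1..12 (or 01..09)
```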
You can use:
[1-9]|1[012]
How about:
^([1-9]|10|11|12)$
Matches 1-9 or 10 or 11 or 12. That's it, nothing else is matched. (The group is needed so that ^ and $ apply to every alternative.)
You can try this:
^[1-9]$|^[1][0-2]$
Use the pattern (0?[1-9]|1[0-2]), which matches values from 1 to 12 (January to December) even when a single-digit value starts with 0 (01, 02, 03, ..., 09, 10, 11, 12).
The correct pattern to validate numbers from 1 to 12 is the following:
(^1[0-2]$)|(^[1-9]$)
The above expression is useful when you have an input with type number and you need to validate a month, for example. This is because an input of type number ignores a 0 in front of any number, e.g. for 01 it returns 1.
You can see it in action here: https://regexr.com/5hk0s
If you need to validate numbers held as strings, that is, when you use an input with type text but expect numbers (e.g. a card expiration month), the expression below can be useful:
((^0[1-9]$)|(^1[0-2]$))
You can see it in action here https://regexr.com/5hkae
I hope this helps a lot because it is very tricky.
Regards.
In Python this matches any number between 1 and 12:
12|11|10|9|8|7|6|5|4|3|2|1
The descending order matters. In ascending order, 10, 11 and 12 would match as 1 instead, because the regex engine usually picks the first alternative that matches.
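This can be confirmed with re.match, which anchors only at the start. A short Python sketch (variable names are arbitrary); it also shows that re.fullmatch sidesteps the ordering issue, since an alternative that leaves characters unconsumed is rejected and the next one is tried.

```python
import re

ascending = "|".join(str(n) for n in range(1, 13))        # 1|2|...|12
descending = "|".join(str(n) for n in range(12, 0, -1))   # 12|11|...|1

print(re.match(ascending, "12").group(0))      # 1
print(re.match(descending, "12").group(0))     # 12
print(re.fullmatch(ascending, "12").group(0))  # 12
```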
I would like to write a regular expression to validate an input field against the following rules:
field is required (cannot be empty)
field must not be a negative number
field must be a valid decimal number to two decimals (e.g. 1 or 1.3 or 1.23)
field can be any valid number between 0 and 100 or an 'e'
Regular expressions find great use in checking format, but you're wishing to use it to do a subset of floating point number parsing and bounds checking. Be kind to yourself and the person who will maintain your code after you're gone: check if it's an 'e', else read it into a float and check the bounds.
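A minimal Python sketch of that split, where a small regex handles only the format and ordinary comparison handles the bound. The function name is_valid_field is hypothetical, and treating "two decimals" as "at most two digits after the dot" is an assumption drawn from the examples in the question.

```python
import re

# Format only: digits, optionally followed by a dot and one or two decimals.
_DECIMAL = re.compile(r"[0-9]+(?:\.[0-9]{1,2})?")

def is_valid_field(text: str) -> bool:
    """Accept 'e', or a non-negative number up to 100 with at most two decimals."""
    if text == "e":
        return True
    if not _DECIMAL.fullmatch(text):
        return False          # empty, negative, malformed, or too many decimals
    return float(text) <= 100.0
```

The format pattern cannot match a minus sign, so the lower bound of 0 needs no separate check.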
You can use: ^(100|\d{1,2}(\.\d{1,2})?|e)$
However, it would be simpler and more readable to use your language's float parsing/casting functions.
EDIT: Some variations based on the comments:
Allowing 100.0 and 100.00: ^(100(\.0{1,2})?|\d{1,2}(\.\d{1,2})?|e)$
Disallowing leading zeroes: ^(100(\.0{1,2})?|[1-9]?\d(\.\d{1,2})?|e)$
^(?:100|\d{1,2}(?:\.\d{1,2})?|e)$
Hmm, does this work for you?
^((100|[0-9]{1,2})(\.[0-9]{1,2})?|e)$
What environment is this for? Is there any particular regex standard it must adhere to?
Constraints on numeric values (such as "> 100", or "<= 5.3") can make regexes rather complicated. These types of constraints are better checked in application logic. Then you can have a simpler (and easier to understand) pattern:
^(([0-9]{1,3})(\.[0-9]{1,2})?|(e))$
And then extract the capture group for the first 3 digits and validate that separately.
Edit:
OK, I think this one should do it (last one, because my eyes are getting tired):
^(100(\.0{1,2})?|[0-9]{1,2}(\.[0-9]{1,2})?|e)$
This will also allow 100.00 or 100.0.