Retrieving certain numbers with regex - regex

I need help getting certain numbers with a regex.
As I dont know much about regex I have only managed to see if the first two characters match with 95 - 99. ^([0-9][5-9]{4})
I have the numbers 00000 through 99999.
I want to exclude all the numbers that start with 95 and up.
So 00000 - 94999 is ok, 95000 - 99999 is not ok.

You could match the range of numbers 00000 - 94999 including leading zeroes you might try it like this:
^ From the beginning of the string
(?=\d{5}$) Start with a positive lookahead that makes sure that the number is not longer than 5 digits until the end of the string
0* Preprend with zero or more zeroes
(?:9[0-4][0-9]{3}|[1-8][0-9]{4}|[1-9][0-9]{1,3}|[0-9]) Match the range of numbers
$ The end of the string
Your regex could look like:
^(?=\d{5}$)0*(?:9[0-4][0-9]{3}|[1-8][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$

A regex is for validating if a string matches the set pattern. It is not for comparing numbers to see if they are within range. Convert the text (if that is what you are starting from) to a number and then use comparison operators in an if statement.

Regex is not well suited to performing numeric comparisons, from both a readability and performance standpoint. It would be much more sensible to extract the number and perform a numeric compare afterwards.
You've not mentioned a language, so I'll demonstrate with Python3.
# input data
lines = [
"my line #1 :94995: message #1",
"my line #2 :95005: message #2"
]
for i, line in enumerate(x):
# extract a 5-digit number wrapped in colons
match = re.search(':([0-9]{5}):', line)
if match is None:
continue
# convert to a number, and verify
num = int(match.group(1))
if num >= 95000:
continue
# print any lines that meet our criteria
print("line %d meets our criteria! (%d)" % ( i, num ))
Will output:
line 0 meets our criteria! (94995)

Related

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there 3 chars or more before the ,
there are 2 numbers before the , and the first is between 5 and 9
and then a , and 2 numbers
Donno if it answer your question as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or x time
\d{3}: ddd 3 digit
These 3 parts combined match any numbers equals or greater than 1000 like: 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d)): a number between 5 and 9 followed by a number. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If you always have the integer part under 999.999 and fractal part always 2 digits, it will be a bit more simple:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d)
You could make it more compact and efficient by merging some of the cases:
(\d{3}[$,]|[5-9]\d[$,])
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.

Matching numbers greater than 40

I'm trying to match numbers greater than 40. The good point is that all of them have 2 decimal places, so all of them are like: 3.25, 5.89, 999.75 and they don't use any leading zeros (except on the decimal part that always have 2 digits)...
At first I tried the following code but then I realized this wouldn't match numbers like 100, 1000... even if they are greater than 40.
[4-9][0-9]\.
I don't have to match the decimal part, so don't worry about matching that, just help me to find how to match numbers greater than 40 (up to 9999 would be fine).
Thanks for your help.
This should do the job:
([4-9][0-9]|\d{3,})\.
Check it here:
http://www.regexr.com/3a5v9
Don't use regular expressions for number comparison. If, for example, you're using Javascript:
var aNumber = parseFloat("50");
if (aNumber > 40) {
// yay!
}
If your regex flavour can use negative lookbehind to match the numbers from 41 to 9999 without decimal:
\b(?:[1-9][0-9]{2,3}|[5-9][0-9]|4[1-9])(?<!\.\d{1,2})\b
(40\.(?!0[^\d]|00)\d{1,2}|(((4[1-9](?!\d)|[5-9][0-9])(?![\d])|\d*[1-9]\d{2,})(\.\d{1,2})?))
This prevents false positives from leading 0s.
This worked for me.
It tries to match 40 followed by 1 or two decimals that are not 00.
It then tries to match 4 followed by 1-9, decimal optional.
If it can't match that it matches 5-9 followed by 0-9, decimal optional.
It then triese to match any digit, any number of times, followed by 1-9, followed by 1 or 2 digits, decimal optional.
If you want to require the decimal, just remove the last question mark.
This will do it:
([4-9][0-9]+|\d{3,})
This it will get all the numbers of two digits having the first one greater than 4 or any number with three digits.
As an example http://www.regexr.com/3a5v0
You can use brackets to indicate a minimum and, if desired, maximum number of characters to match. So,
([4-9][0-9]|[1-9][0-9]{2,})\.
matches 4-9 followed by one or more digits. Presumably there's a boundary of some sort at the beginning of this, but it sounds like you have that part worked out. This uses an OR to allow for two possible groups of first digits.
(Most of the other answer are perfect for me -- This is paranoia and a bad idea :)
for use with grep -Po or Perl we could use:
'\b(\d{3,}|[4-9]\d)\.\d\d'
but this would get 40.00 (not greater than 40)
'\b(\d{3,}|[5-9]\d|4[1-9])\.\d\d|\b40\.\d?[1-9]\d?'
Corresponding to:
DDD.DD
| [5-9]D.DD
| 4[1-9].DD
| 40.D[1-9]
| 40.[1-9]D
In flex(1) you have this code to parse strings and get numbers greater than 40:
pru.l:
%option noyywrap
%%
\+?(0*[4-9][0-9]|0*[1-9][0-9][0-9][0-9]*)(\.[0-9]*)? { printf("Greater than 40: %s\n", yytext); }
\-?[0-9]*(\.[0-9]*)? { printf("Lesser than 40: %s\n", yytext); }
\n |
. ;
%%
int main()
{ yylex(); }
Install flex and compile this file it with
make pru
Then run it as:
pru <filein >fileout
or just
pru
This code constructs a deterministic finite automaton from the regular expressions listed and prints the commands listed on the right when recognizes a value greater than 40. It allows a leading optional sign and leading zeros, and an optional fractional part composed of any number of digits. And it does this with only one asignment and one decision for each character read. You have access to the automaton state table generated by flex (it writes C code for you)
the regex that recognizes numbers greater than 40 (with decimals and leading sign and zeros) is:
\+?(0*[4-9][0-9]|0*[1-9][0-9][0-9][0-9]*)(\.[0-9]*)?
and can be abreviated as:
\+?(0*[4-9][0-9]|0*[1-9][0-9]{3,})(\.[0-9]*)?
explanation:
\+? matches an optional plus sign.
(...|...) two options:
0* optional arbitrary number of leadin zeros.
[4-9][0-9] the numbers 40 to 99
[1-9][0-9]{3,} the numbers 100 and up.
(.[0-9]*)? optional decimal point followed by an arbitrary number of digits.

Reg Ex for even number of 0s and 1s

I am trying to create a regular expression that determines if a string (of any length) matches a regex pattern such that the number of 0s in the string is even, and the number of 1s in the string is even. Can anyone help me determine a regex statement that I could try and use to check the string for this pattern?
So completely reformulated my answer to reflect all the changes:
This regex would match all strings with only zeros and ones and only equal amounts of those
^(?=1*(?:01*01*)*$)(?=0*(?:10*10*)*$).*$
See it here on Regexr
I am working here with positive lookahead assertions. The big advantage here of a lookahead assertion is, that it checks the complete string, but without matching it, so both lookaheads start to check the string from the start, but for different assertions.
(?=1*(?:01*01*)*$) does check for an equal amount of 0 (including 0)
(?=0*(?:10*10*)*$) does check for an equal amount of 1 (including 0)
.* does then actually match the string
Those lookaheads checks:
(?=
1* # match 0 or more 1
(?: # open a non capturing group
0 # match one 0
1* # match 0 or more 1
0 # match one 0
1* # match 0 or more 1
)
* # repeat this pattern at least once
$ # till the end of the string
)
So, I have come up with a solution to the problem:
(11+00+(10+01)(11+00)\*(10+01))\*
For even sets of 0s, you can use the following regex to ensure that the number of 0s is even.
^(1*01*01*)*$
However, I believe that the question is to have both an even number of 0s and also an even number of 1s. Since it is possible to construct a non-deterministic finite automaton (NFA) for this problem, the solution is regular and can be represented using a regex expression. The NFA is represented via the machine below, S1 is the start/exit state.
S1 ---1----->S2
|^ <--1----- |^
|| ||
00 00
|| ||
v| v|
S3----1----->S4
<---1------
From there, there's a way to convert NFAs to regex expressions but it's been a while since my computation course. There's some notes below that seem to be helpful in explaining the steps required to convert a NFA to a regex.
http://www.cs.uiuc.edu/class/sp09/cs373/lectures/lect_08.pdf
RE-UPDATED
Try this : [ check out this demo : http://regexr.com?30m7c ]
^(00|11|0011|0110|1100|1001)+$
Hint :
Even numbers are divisible by 2, thus - in binary - they always end in zero (0)
Not a regular expression (which is likely to be impossible, although I can't prove it: the proof by contradiction via the pumping lemma fails), but the "correct" solution is avoiding a complicated and inefficient regular expression all together and using something like (in Python):
def even01(string):
return string.count("1") % 2 == 0 and string.count("0") % 2 == 0
Or if the string has to consist only of 1s and 0s:
import re
def even01(string):
return not re.search("[^01]",string) and \
string.count("1") % 2 == 0 and string.count("0") % 2 == 0
^(0((1(00)*1)*0|1(11|00)*01)|1((0(11)*0)*1|0(11|00)*10))*$
If I haven't overlooked anything, this matches any bit string where the number of 0s is even and the number of 1s is even, using only rudimentary regex operators (*, ^, $). It's slightly easier to see how it works if written like this:
^(0((1(00)*1)*0
|1(11|00)*01)
|1((0(11)*0)*1
|0(11|00)*10))*$
The following test code should illustrate the correctness - we compare the result of the pattern match against a function that tells us if a string has an even number of 0s and 1s. All bit strings of length 16 are tested.
import re
balanced = lambda s: s.count('0') % 2 == 0 and s.count('1') % 2 == 0
pat = re.compile('^(0((1(00)*1)*0|1(11|00)*01)|1((0(11)*0)*1|0(11|00)*10))*$')
size = 16
num = 2**size
for i in xrange(num):
binstr = bin(i)[2:].zfill(size)
b, m = balanced(binstr), bool(pat.match(binstr))
if b != m:
print "balanced('%s') = %d, pat.match('%s') = %d" % (binstr, b, binstr, m)
break
elif i != 0 and i % (num / 10) == 0:
# Python 2's `/` operator performs integer division
print "%d percent done..." % (100 * i / num + 1)
If you try to solve within the same sentence (starting with ^ and ending with $), you are in deep trouble. :-)
You can make sure that you have an even number of 0s (with ^(1*01*01*)*$, as stated by #david-z) OR you can make sure that you have an even number of 1s:
^(1*01*01*)*$|^(0*10*10*)*$
It works for strings with small lengths as well, such as "00" or "101", both valid strings.
I have also been working on lookaheads and lookbacks in my spare time, and using lookahead the problem can be solved while taking also account for the single 1s and/or the single 0s. So, the expression should also work for 11,1111,111111,... and also for 00,0000,000000,....
^(((?=(?:1*01*01*)*$)(?=(?:0*10*10*)*$).*)|([1]{2})*|([0]{2})*)$
Works for all cases.
So, if the string consists of only 1s or only 0s:
([1]{2})*|([0]{2})*
If it contains a mix of 0s and 1s, the positive lookahead will take care of that.
((?=(?:1*01*01*)*$)(?=(?:0*10*10*)*$).*
Combining both of them, it takes into account all string with even number of 0s and 1s.

Regex that matches integers in between whitespace or start/end of string only

I'm currently using the pattern: \b\d+\b, testing it with these entries:
numb3r
2
3454
3.214
test
I only want it to catch 2, and 3454. It works great for catching number words, except that the boundary flags (\b) include "." as consideration as a separate word. I tried excluding the period, but had troubles writing the pattern.
Basically I want to remove integer words, and just them alone.
All you want is the below regex:
^\d+$
Similar to manojlds but includes the optional negative/positive numbers:
var regex = /^[-+]?\d+$/;
EDIT
If you don't want to allow zeros in the front (023 becomes invalid), you could write it this way:
var regex = /^[-+]?[1-9]\d*$/;
EDIT 2
As #DmitriyLezhnev pointed out, if you want to allow the number 0 to be valid by itself but still invalid when in front of other numbers (example: 0 is valid, but 023 is invalid). Then you could use
var regex = /^([+-]?[1-9]\d*|0)$/
You could use lookaround instead if all you want to match is whitespace:
(?<=\s|^)\d+(?=\s|$)
This just allow positive integers.
^[0-9]*[1-9][0-9]*$
I would add this as a comment to the other good answers, but I need more reputation to do so. Be sure to allow for scientific notation if necessary, i.e. 3e4 = 30000. This is default behavior in many languages. I found the following regex to work:
/^[-+]?\d+([Ee][+-]?\d+)?$/;
// ^^ If 'e' is present to denote exp notation, get it
// ^^^^^ along with optional sign of exponent
// ^^^ and the exponent itself
// ^ ^^ The entire exponent expression is optional
This solution matches integers:
Negative integers are matched (-1,-2,etc)
Single zeroes are matched (0)
Negative zeroes are not (-0, -01, -02)
Empty spaces are not matched ('')
/^(0|-*[1-9]+[0-9]*)$/
^([+-]?[0-9]\d*|0)$
will accept numbers with leading "+", leading "-" and leadings "0"
Try /^(?:-?[1-9]\d*$)|(?:^0)$/.
It matches positive, negative numbers as well as zeros.
It doesn't match input like 00, -0, +0, -00, +00, 01.
Online testing available at http://rubular.com/r/FlnXVL6SOq
^(-+)?[1-9][0-9]*$
starts with a - or + for 0 or 1 times, then you want a non zero number (because there is not such a thing -0 or +0) and then it continues with any number from 0 to 9
This worked in my case where I needed positive and negative integers that should NOT include zero-starting numbers like 01258 but should of course include 0
^(-?[1-9]+\d*)$|^0$
Example of valid values:
"3",
"-3",
"0",
"-555",
"945465464654"
Example of not valid values:
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"+100",
"--1",
"-.1",
"-0",
"00099",
"099"

Regex - Validation of numeric with up to 4 decimal places

I am having a bit of difficulty with the following:
I need to allow any positive numeric value up to four decimal places. Here are some examples.
Allowed:
123
12345.4
1212.56
8778787.567
123.5678
Not allowed:
-1
12.12345
-12.1234
I have tried the following:
^[0-9]{0,2}(\.[0-9]{1,4})?$|^(100)(\.[0]{1,4})?$
However this doesn't seem to work, e.g. 1000 is not allowed when it should be.
Any ideas would be greatly appreciated.
Thanks
To explain why your attempt is not working for a value of 1000, I'll break down the expression a little:
^[0-9]{0,2} # Match 0, 1, or 2 digits (can start with a zero)...
(\.[0-9]{1,4})?$ # ... optionally followed by (a decimal, then 1-4 digits)
| # -OR-
^(100) # Capture 100...
(\.[0]{1,4})?$ # ... optionally followed by (a decimal, then 1-4 ZEROS)
There is no room for 4 digits of any sort, much less 1000 (theres only room for a 0-2 digit number or the number 100)
^\d* # Match any number of digits (can start with a zero)
(\.\d{1,4})?$ # ...optionally followed by (a decimal and 1-4 digits)
This expression will pass any of the allowed examples and reject all of the Not Allowed examples as well, because you (and I) use the beginning-of-string assertion ^.
It will also pass these numbers:
.2378
1234567890
12374610237856987612364017826350947816290385
000000000000000000000.0
0
... as well as a completely blank line - which might or might not be desired
to make it reject something that starts with a zero, use this:
^(?!0\d)\d* # Match any number of digits (cannot "START" with a zero)
(\.\d{1,4})?$ # ...optionally followed by (a decimal and 1-4 digits)
This expression (which uses a negative lookahead) has these evaluations:
REJECTED Allowed
--------- -------
0000.1234 0.1234
0000 0
010 0.0
You could also test for a completely blank line in other ways, but if you wanted to reject it with the regex, use this:
^(?!0\d|$)\d*(\.\d{1,4})?$
Try this:
^[0-9]*(?:\.[0-9]{0,4})?$
Explanation: match only if starting with a digit (excluding negative numbers), optionally followed by (non-capturing group) a dot and 0-4 digits.
Edit: With this pattern .2134 would also be matched. To only allow 0 < x < 1 of format 0.2134, replace the first * with a + above.
This regex would do the trick:
^\d+(?:\.\d{1,4})?$
From the beginning of the string search for one or more digits. If there's a . it must be followed with atleast one digit but a maximum of 4.
^(?<!-)\+?\d+(\.?\d{0,4})?$
The will match something with doesn't start with -, maybe has a + followed by an integer part with at least one number and an optional floating part of maximum 4 numbers.
Note: Regex does not support scientific notation. If you want that too let me know in a comment.
Well asked!!
You can try this:
^([0-9]+[\.]?[0-9]?[0-9]?[0-9]?[0-9]?|[0-9]+)$
If you have a double value but it goes to more decimal format and you want to shorter it to 4 then !
double value = 12.3457652133
value =Double.parseDouble(new DecimalFormat("##.####").format(value));