I am trying to create a regular expression that determines if a string (of any length) matches a regex pattern such that the number of 0s in the string is even, and the number of 1s in the string is even. Can anyone help me determine a regex statement that I could try and use to check the string for this pattern?
So completely reformulated my answer to reflect all the changes:
This regex matches all strings that consist only of zeros and ones and contain an even number of each
^(?=1*(?:01*01*)*$)(?=0*(?:10*10*)*$).*$
See it here on Regexr
I am working here with positive lookahead assertions. The big advantage of a lookahead assertion is that it checks the complete string without consuming it, so both lookaheads check the string from the start, each for a different condition.
(?=1*(?:01*01*)*$) checks for an even number of 0s (including none)
(?=0*(?:10*10*)*$) checks for an even number of 1s (including none)
.* does then actually match the string
Each lookahead works like this (the first one is shown):
(?=
    1*      # match 0 or more 1s
    (?:     # open a non-capturing group
        0   # match one 0
        1*  # match 0 or more 1s
        0   # match one 0
        1*  # match 0 or more 1s
    )
    *       # repeat this group zero or more times
    $       # till the end of the string
)
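To sanity-check the lookahead pattern, here is a minimal Python sketch (not part of the original answer) that compares it against a plain parity count for every bit string up to a chosen length. The helper names and the length limit are arbitrary choices made for illustration.

```python
import re
from itertools import product

# The lookahead-based pattern from the answer above.
PATTERN = re.compile(r"^(?=1*(?:01*01*)*$)(?=0*(?:10*10*)*$).*$")

def even_counts(s):
    """Reference check: even number of 0s and even number of 1s."""
    return s.count("0") % 2 == 0 and s.count("1") % 2 == 0

def mismatches(max_len):
    """Collect every bit string up to max_len where regex and reference disagree."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if bool(PATTERN.match(s)) != even_counts(s):
                bad.append(s)
    return bad

# An empty list means the two checks agree on every string tested.
print(mismatches(10))
```

An exhaustive comparison like this is not a proof, but it catches most mistakes in patterns of this kind quickly.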
So, I have come up with a solution to the problem (written in formal notation, where + means alternation, i.e. | in most regex engines):
(11+00+(10+01)(11+00)*(10+01))*
For even sets of 0s, you can use the following regex to ensure that the number of 0s is even.
^(1*01*01*)*$
However, I believe that the question is to have both an even number of 0s and also an even number of 1s. Since it is possible to construct a non-deterministic finite automaton (NFA) for this problem, the language is regular and can be represented by a regular expression. The NFA is shown below; S1 is both the start and the accepting state.
S1 ----1----> S2
|^ <---1----- |^
||            ||
00            00
||            ||
v|            v|
S3 ----1----> S4
   <---1-----
From there, there's a way to convert NFAs to regular expressions, but it's been a while since my computation course. There are some notes below that seem helpful in explaining the steps required to convert an NFA to a regex.
http://www.cs.uiuc.edu/class/sp09/cs373/lectures/lect_08.pdf
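The machine above can also be simulated directly, without converting it to a regex first. The following is a minimal Python sketch, assuming the transitions as read from the diagram (a 1 moves between S1/S2 and between S3/S4, a 0 moves between S1/S3 and between S2/S4); the table and function names are made up for illustration.

```python
# Transition table read off the diagram: a 1 flips S1<->S2 and S3<->S4,
# a 0 flips S1<->S3 and S2<->S4. S1 is both the start and accepting state.
TRANSITIONS = {
    ("S1", "1"): "S2", ("S2", "1"): "S1",
    ("S3", "1"): "S4", ("S4", "1"): "S3",
    ("S1", "0"): "S3", ("S3", "0"): "S1",
    ("S2", "0"): "S4", ("S4", "0"): "S2",
}

def accepts(bits):
    """Run the automaton and accept only if it ends back in S1."""
    state = "S1"
    for ch in bits:
        if (state, ch) not in TRANSITIONS:
            return False  # anything other than 0 or 1 is rejected
        state = TRANSITIONS[(state, ch)]
    return state == "S1"

print(accepts("0110"), accepts("0101"), accepts("011"))
```

Each state simply records the parity of the 0s and 1s seen so far, which is why four states are enough.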
RE-UPDATED
Try this : [ check out this demo : http://regexr.com?30m7c ]
^(00|11|0011|0110|1100|1001)+$
Hint :
Even numbers are divisible by 2, thus - in binary - they always end in zero (0)
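A quick check (a sketch added for illustration, not part of the answer) suggests this pattern is too strict: strings such as 0101 and 1010 contain two 0s and two 1s but cannot be split into the listed blocks.

```python
import re

pattern = re.compile(r"^(00|11|0011|0110|1100|1001)+$")

# Each of these has an even number of 0s and an even number of 1s.
samples = ["0011", "0110", "0101", "1010"]

# Collect the samples that the pattern fails to match.
rejected = [s for s in samples if not pattern.match(s)]
print(rejected)
```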
Not a regular expression (which is likely to be impossible, although I can't prove it: the proof by contradiction via the pumping lemma fails), but the "correct" solution is avoiding a complicated and inefficient regular expression altogether and using something like (in Python):
def even01(string):
    return string.count("1") % 2 == 0 and string.count("0") % 2 == 0
Or if the string has to consist only of 1s and 0s:
import re
def even01(string):
    return not re.search("[^01]", string) and \
        string.count("1") % 2 == 0 and string.count("0") % 2 == 0
^(0((1(00)*1)*0|1(11|00)*01)|1((0(11)*0)*1|0(11|00)*10))*$
If I haven't overlooked anything, this matches any bit string where the number of 0s is even and the number of 1s is even, using only rudimentary regex operators (*, |, grouping, and the anchors ^ and $). It's slightly easier to see how it works if written like this:
^(0((1(00)*1)*0
|1(11|00)*01)
|1((0(11)*0)*1
|0(11|00)*10))*$
The following test code should illustrate the correctness - we compare the result of the pattern match against a function that tells us if a string has an even number of 0s and 1s. All bit strings of length 16 are tested.
import re

# reference check: even number of 0s and even number of 1s
balanced = lambda s: s.count('0') % 2 == 0 and s.count('1') % 2 == 0
pat = re.compile('^(0((1(00)*1)*0|1(11|00)*01)|1((0(11)*0)*1|0(11|00)*10))*$')

size = 16
num = 2**size
for i in range(num):
    binstr = bin(i)[2:].zfill(size)
    b, m = balanced(binstr), bool(pat.match(binstr))
    if b != m:
        print("balanced('%s') = %d, pat.match('%s') = %d" % (binstr, b, binstr, m))
        break
    elif i != 0 and i % (num // 10) == 0:
        # `//` performs integer division in Python 3
        print("%d percent done..." % (100 * i // num + 1))
If you try to solve within the same sentence (starting with ^ and ending with $), you are in deep trouble. :-)
You can make sure that you have an even number of 0s (with ^(1*01*01*)*$, as stated by #david-z) OR you can make sure that you have an even number of 1s:
^(1*01*01*)*$|^(0*10*10*)*$
It works for strings with small lengths as well, such as "00" or "101", both valid strings.
I have also been working on lookaheads and lookbehinds in my spare time, and using lookahead the problem can be solved while also accounting for strings of only 1s and/or only 0s. So, the expression should also work for 11, 1111, 111111, ... and also for 00, 0000, 000000, ....
^(((?=(?:1*01*01*)*$)(?=(?:0*10*10*)*$).*)|([1]{2})*|([0]{2})*)$
Works for all cases.
So, if the string consists of only 1s or only 0s:
([1]{2})*|([0]{2})*
If it contains a mix of 0s and 1s, the positive lookahead will take care of that.
((?=(?:1*01*01*)*$)(?=(?:0*10*10*)*$).*)
Combining both of them, it takes into account all strings with an even number of 0s and 1s.
I need help getting certain numbers with a regex.
As I don't know much about regex I have only managed to see if the first two characters match with 95 - 99. ^([0-9][5-9]{4})
I have the numbers 00000 through 99999.
I want to exclude all the numbers that start with 95 and up.
So 00000 - 94999 is ok, 95000 - 99999 is not ok.
To match the range of numbers 00000 - 94999, including leading zeroes, you might try it like this:
^ From the beginning of the string
(?=\d{5}$) A positive lookahead that makes sure the string is exactly 5 digits long
0* Prepend zero or more zeroes
(?:9[0-4][0-9]{3}|[1-8][0-9]{4}|[1-9][0-9]{1,3}|[0-9]) Match the range of numbers
$ The end of the string
Your regex could look like:
^(?=\d{5}$)0*(?:9[0-4][0-9]{3}|[1-8][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$
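The pattern can be checked exhaustively, since there are only 100000 candidate strings. Below is a sketch in Python (an assumption, as the question names no language) that runs it against every zero-padded five-digit string and collects the values it accepts.

```python
import re

pattern = re.compile(
    r"^(?=\d{5}$)0*(?:9[0-4][0-9]{3}|[1-8][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$"
)

# Try every zero-padded five-digit string from 00000 to 99999
# and keep the numbers whose string form the pattern accepts.
accepted = [n for n in range(100000) if pattern.match("%05d" % n)]

# Smallest accepted value, largest accepted value, and how many were accepted.
print(accepted[0], accepted[-1], len(accepted))
```

If the pattern is right, the accepted values run from 0 to 94999 with no gaps.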
A regex is for validating if a string matches the set pattern. It is not for comparing numbers to see if they are within range. Convert the text (if that is what you are starting from) to a number and then use comparison operators in an if statement.
Regex is not well suited to performing numeric comparisons, from both a readability and performance standpoint. It would be much more sensible to extract the number and perform a numeric compare afterwards.
You've not mentioned a language, so I'll demonstrate with Python3.
import re

# input data
lines = [
    "my line #1 :94995: message #1",
    "my line #2 :95005: message #2"
]
for i, line in enumerate(lines):
    # extract a 5-digit number wrapped in colons
    match = re.search(':([0-9]{5}):', line)
    if match is None:
        continue
    # convert to a number, and verify
    num = int(match.group(1))
    if num >= 95000:
        continue
    # print any lines that meet our criteria
    print("line %d meets our criteria! (%d)" % (i, num))
Will output:
line 0 meets our criteria! (94995)
I am looking for a Regex for non decimal integer considering exponents and honestly I have tried a lot before asking here.
The regex should
match 1.23E4,1.2334576E34, 122E3,123,456 etc.
not match 1.234E2 (since it expands to 123.4).
should not match 1.22 and so on.
My try was
^[+-]?([0-9]*\\.?[0-9]+|[0-9]+\\.?[0-9]*)([eE][+]?[0-9]+)?$
However as you can see I am not calculating the exponent so that after expansion I should be able to tell that a value X after expanding does not contain a decimal.
Is there any way to extract the number of digits after the decimal . and compare it with exponent so that I can be sure that after expanding it will not contain a decimal.
For the info only a regex that can work in runtime will work for me.
Please help me guys...
ok, so this is only if you really need this for some weird regexp-only validation. it's written in python 3 and it makes no attempt to be compact (there's no limitation except available memory in the size of a regexp in python).
def over(n):
    '''make a regexp for an exponent of n or more'''
    assert n < 100
    return r'([1-9]\d{2,}|%s)' % '|'.join(str(i) for i in range(n, 100))

def make_decimal(n_digits, n_decimal):
    '''make a regexp for a number with an "E" with the given number of significant digits and decimal places'''
    assert n_decimal < n_digits
    assert 100 > n_decimal >= 0
    if n_decimal:
        # escape the dot so it only matches a literal decimal point
        return r'\d{%d}\.\d{%d}E%s' % (n_digits-n_decimal, n_decimal, over(n_decimal))
    else:
        return r'\d{%d}E\d+' % n_digits

def make_e(n_digits):
    '''make a regexp for an integer with an "E" with the given number of significant digits'''
    return '|'.join(make_decimal(n_digits, i) for i in range(n_digits))

def make_regexp(max_digits):
    '''make a regexp for a decimal integer with up to the given number of significant digits'''
    assert max_digits < 100
    # start at 1 digit: zero digits would add an empty alternative
    return r'(\d+|%s)' % '|'.join(make_e(i) for i in range(1, max_digits+1))
here's some test code.
from re import compile
rx = make_regexp(8)
m = compile('^%s$' % rx)
for n in ['1.23E4', '1.2334576E34', '122E3', '123', '456']:
    assert m.match(n), n
for n in ['1.234E2', '1.22']:
    assert not m.match(n), n
for up to 8 significant digits (to the left of E), which seems a reasonable limit, the regexp generated is 8774 characters long. you could reduce this significantly (for example, see https://stackoverflow.com/a/17840228/181772), but what's the need (the regular expression engine is capable of generating a much smaller internal automaton from this)?
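For comparison, when a pure regexp is not a hard requirement, the same property can be checked arithmetically. The following is a minimal sketch using Python's decimal module; the function name is hypothetical, and it is offered as an alternative rather than as part of the regexp-generating answer above.

```python
from decimal import Decimal, InvalidOperation

def expands_to_integer(text):
    """True if text parses as a finite decimal with no fractional part once expanded."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False  # not a number at all
    if not value.is_finite():
        return False  # reject NaN and Infinity
    # Compare the value with itself rounded to an integer.
    return value == value.to_integral_value()

for s in ["1.23E4", "1.2334576E34", "122E3", "123", "1.234E2", "1.22"]:
    print(s, expands_to_integer(s))
```

Decimal keeps the digits and the exponent exactly, so the comparison does not suffer from binary floating-point rounding.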
Description
It's not impossible, but rather difficult and the expression will really start to get out of hand. Take this 2831 character monster which:
validates a number with exponent will expand to an integer
requires a number to be in a format like 123.456e7890 or 1234.678e1,234,567
if the exponent contains commas they must appear in the correct comma delimited three digit groupings
supports only numbers up to 99 places after the decimal point
As written here it does require the use of the x option, which ignores white space and comments. The expression could be shortened to about 2041 characters by replacing [eE] with e and using the i option, and [0-9] with \d; however, this will slightly reduce performance because the \d class matches all Unicode digits and not just 0-9.
^
(?=.*?[eE][0-9]{1,3}(?:,[0-9]{3})*|[0-9]*$) # validate commas are in the correct order
(?=[0-9]+\. # match the integer portion of a real number
(?=
[0-9]{1,99}[eE][1-9](?:,?[0-9]){2,}
|[0-9]{1,9}[eE][1-9],?[0-9]
|[0-9]{10,19}[eE][2-9],?[0-9]
|[0-9]{20,29}[eE][3-9],?[0-9]
|[0-9]{30,39}[eE][4-9],?[0-9]
|[0-9]{40,49}[eE][5-9],?[0-9]
|[0-9]{50,59}[eE][6-9],?[0-9]
|[0-9]{60,69}[eE][7-9],?[0-9]
|[0-9]{70,79}[eE][89],?[0-9]
|[0-9]{80,89}[eE][9],?[0-9]
|[0-9]{90,99}[eE][1-9],?[0-9]
|(?=[0-9]{90}(?=.*?[eE]9)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{80}(?=.*?[eE]8)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{70}(?=.*?[eE]7)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{60}(?=.*?[eE]6)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{50}(?=.*?[eE]5)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{40}(?=.*?[eE]4)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{30}(?=.*?[eE]3)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{20}(?=.*?[eE]2)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?=[0-9]{10}(?=.*?[eE]1)(?:[eE].,?[0-9]|[0-9]{1}[eE].,?[1-9]|[0-9]{2}[eE].,?[2-9]|[0-9]{3}[eE].,?[3-9]|[0-9]{4}[eE].,?[4-9]|[0-9]{5}[eE].,?[5-9]|[0-9]{6}[eE].,?[6-9]|[0-9]{7}[eE].,?[7-9]|[0-9]{8}[eE].,?[89]|[0-9]{9}[eE].,?9))
|(?:[eE][0-9]|[0-9]{1}[eE][1-9]|[0-9]{2}[eE][2-9]|[0-9]{3}[eE][3-9]|[0-9]{4}[eE][4-9]|[0-9]{5}[eE][5-9]|[0-9]{6}[eE][6-9]|[0-9]{7}[eE][7-9]|[0-9]{8}[eE][89]|[0-9]{9}[eE]9)
)
|(?=[0-9]+[eE]) # integers
)
[+-]?
([0-9]*\.?[0-9]+|[0-9]+\.?[0-9]*)
[eE][+]?((?:,?[0-9]+)+)
As written here the expression uses the x option which ignores white space
Example
Sample Text
1.2334576E34
1.23E4
1.2334576E34
122E3,123,456
1.234
1.234E2
Matches
[0] => 1.2334576E34
[1] => 1.23E4
[2] => 1.2334576E34
[3] => 122E3,123,456
I am having a bit of difficulty with the following:
I need to allow any positive numeric value up to four decimal places. Here are some examples.
Allowed:
123
12345.4
1212.56
8778787.567
123.5678
Not allowed:
-1
12.12345
-12.1234
I have tried the following:
^[0-9]{0,2}(\.[0-9]{1,4})?$|^(100)(\.[0]{1,4})?$
However this doesn't seem to work, e.g. 1000 is not allowed when it should be.
Any ideas would be greatly appreciated.
Thanks
To explain why your attempt is not working for a value of 1000, I'll break down the expression a little:
^[0-9]{0,2} # Match 0, 1, or 2 digits (can start with a zero)...
(\.[0-9]{1,4})?$ # ... optionally followed by (a decimal, then 1-4 digits)
| # -OR-
^(100) # Capture 100...
(\.[0]{1,4})?$ # ... optionally followed by (a decimal, then 1-4 ZEROS)
There is no room for 4 digits of any sort, much less 1000 (there's only room for a 0-2 digit number or the number 100). Use this instead:
^\d* # Match any number of digits (can start with a zero)
(\.\d{1,4})?$ # ...optionally followed by (a decimal and 1-4 digits)
This expression will pass any of the allowed examples and reject all of the Not Allowed examples as well, because you (and I) use the beginning-of-string assertion ^.
It will also pass these numbers:
.2378
1234567890
12374610237856987612364017826350947816290385
000000000000000000000.0
0
... as well as a completely blank line - which might or might not be desired
To make it reject a number that starts with a redundant leading zero, use this:
^(?!0\d)\d* # Match any number of digits (cannot "START" with a zero)
(\.\d{1,4})?$ # ...optionally followed by (a decimal and 1-4 digits)
This expression (which uses a negative lookahead) has these evaluations:
REJECTED     Allowed
---------    -------
0000.1234    0.1234
0000         0
010          0.0
You could also test for a completely blank line in other ways, but if you wanted to reject it with the regex, use this:
^(?!0\d|$)\d*(\.\d{1,4})?$
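As a rough check of this last pattern, here is a short Python sketch (the language is an assumption; the question does not name one) that runs it against the examples from the question plus a few of the edge cases discussed above.

```python
import re

pattern = re.compile(r"^(?!0\d|$)\d*(\.\d{1,4})?$")

# Examples the question wants accepted, plus 1000 from the follow-up.
allowed = ["123", "12345.4", "1212.56", "8778787.567", "123.5678", "1000"]
# Examples the question wants rejected, plus the blank line and a padded zero.
not_allowed = ["-1", "12.12345", "-12.1234", "", "0000.1234"]

print(all(pattern.match(s) for s in allowed))
print(any(pattern.match(s) for s in not_allowed))
```

The first line should report that every allowed sample matches, the second that none of the rejected samples does.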
Try this:
^[0-9]*(?:\.[0-9]{0,4})?$
Explanation: match only if starting with a digit (excluding negative numbers), optionally followed by (non-capturing group) a dot and 0-4 digits.
Edit: With this pattern .2134 would also be matched. To only allow 0 < x < 1 of format 0.2134, replace the first * with a + above.
This regex would do the trick:
^\d+(?:\.\d{1,4})?$
From the beginning of the string, search for one or more digits. If there's a . it must be followed by at least one digit and at most 4.
^(?<!-)\+?\d+(\.?\d{0,4})?$
This will match something which doesn't start with -, maybe has a +, followed by an integer part with at least one digit and an optional fractional part of at most 4 digits.
Note: Regex does not support scientific notation. If you want that too let me know in a comment.
Well asked!!
You can try this:
^([0-9]+[\.]?[0-9]?[0-9]?[0-9]?[0-9]?|[0-9]+)$
If you have a double value with more decimal places than you want and need to shorten it to 4, then (in Java):
double value = 12.3457652133;
value = Double.parseDouble(new DecimalFormat("##.####").format(value));
Let L= { w in (0+1)* | w has even number of 1s}, i.e. L is the set of all bit strings with even number of 1s. Which one of the regular expressions below represents L?
A) (0*10*1)*
B) 0*(10*10*)*
C) 0*(10*1)* 0*
D) 0*1(10*1)* 10*
In my opinion option D cannot be correct because it does not match a bit string with zero 1s. But what about the other options? We are only concerned with whether the number of 1s is even; the number of 0s doesn't matter.
Then which is the correct option and why?
A is false. It doesn't match 0110 (or any zeros-only non-empty string)
B is OK. I won't bother proving it here since the page margins are too small.
C doesn't match 010101010 (the zero in the middle is not matched)
D, as you said, doesn't match 00 or any other string with no ones.
So only B
To solve such a problem you should
Supply counterexample patterns to all "incorrect" regexps. This will be either a string in L that is not matched, or a matched string out of L.
To prove the remaining "correct" pattern, you should answer two questions:
Does every string that matches the pattern belong to L? This can be done by devising properties each of matched strings should satisfy--for example, number of occurrences of some character...
Is every string in L matched by the regexp? This is done by dividing L into easily analyzable subclasses, and showing that each of them matches pattern in its own way.
(No concrete answers due to [homework]).
Examining the pattern B:
^0*(10*10*)*$
^ # match beginning of string
0* # match zero or more '0'
( # start group 1
10* # match '1' followed by zero or more '0'
10* # match '1' followed by zero or more '0'
)* # end group 1 - match zero or more times
$ # end of string
It's pretty obvious that this pattern will only match strings that have 0, 2, 4, ... 1s.
Look for examples that should match but don't. 0, 11011, and 1100 should all match, but each one fails for one of those four
C is incorrect because it does not allow any 0s between the second 1 of one group and the first 1 of the next group.
This answer would be best for this language
(0*10*10*)
a quick python script actually eliminated all the possibilities:
import re

# Note: re.match anchors only at the start of the string, and the spaces
# inside patterns c and d are matched literally.
a = re.compile("(0*10*1)*")
b = re.compile("0*(10*10*)*")
c = re.compile("0*(10*1)* 0*")
d = re.compile("0*1(10*1)* 10*")

candidates = [('a', a), ('b', b), ('c', c), ('d', d)]

# strings that should match
tests = ['0110', '1100', '0011', '11011']
for test in tests:
    # iterate over a copy so that removing from the list is safe
    for candidate in list(candidates):
        if not candidate[1].match(test):
            candidates.remove(candidate)
            print("removed %s because it failed on %s" % (candidate[0], test))

# strings that should not match
ntests = ['1', '10', '01', '010', '10101']
for test in ntests:
    for candidate in list(candidates):
        if candidate[1].match(test):
            candidates.remove(candidate)
            print("removed %s because it matched on %s" % (candidate[0], test))
the output:
removed c because it failed on 0110
removed d because it failed on 0110
removed a because it matched on 1
removed b because it matched on 1
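The script eliminates every option largely because re.match does not require the whole string to match and because the spaces in options C and D are taken literally. As a hedged alternative, here is a sketch that strips those spaces, uses re.fullmatch, and searches for the shortest string on which each option disagrees with the definition of L. The names and the length limit are assumptions made for illustration.

```python
import re
from itertools import product

# The four options from the question, with the typographic spaces removed.
OPTIONS = {
    "A": r"(0*10*1)*",
    "B": r"0*(10*10*)*",
    "C": r"0*(10*1)*0*",
    "D": r"0*1(10*1)*10*",
}

def in_language(s):
    """Membership in L: the string has an even number of 1s."""
    return s.count("1") % 2 == 0

def first_counterexample(pattern, max_len=8):
    """Shortest bit string where the pattern and L disagree, or None if none is found."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if bool(rx.fullmatch(s)) != in_language(s):
                return s
    return None

results = {name: first_counterexample(p) for name, p in OPTIONS.items()}
print(results)
```

Under these assumptions only option B survives the search, which agrees with the other answers; a search up to a fixed length is evidence rather than a proof.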
I have tried 2 questions, could you tell me whether I am right or not?
Regular expression of nonnegative integer constants in C, where numbers beginning with 0 are octal constants and other numbers are decimal constants.
I tried 0([1-7][0-7]*)?|[1-9][0-9]*, is it right? And what string could I match? Do you think 034567 will match and 000083 match?
What is a regular expression for binary numbers x such that h^x + i^x = j^x?
I tried (0|1){32}|1|(10)).. do you think a string like 10 will match and 11 won’t match?
Please tell me whether I am right or not.
You can always use http://www.spaweditor.com/scripts/regex/ for a quick test on whether a particular regex works as you intend it to. This along with google can help you nail the regex you want.
0([1-7][0-7])?|[1-9][0-9] is wrong because there's no repetition - it will only match 1 or 2-character strings. What you need is something like 0[0-7]*|[1-9][0-9]*, though that doesn't take hexadecimal into account (as per spec).
This one is not clear. Could you rephrase that or give some more examples?
Your regex for integer constants will not match base-10 numbers longer than two digits or octal numbers longer than three digits (2 if you don't count the leading zero). Since this is homework, I'll leave it up to you to figure out what's wrong with it.
Hint: Google for "regular expression repetition quantifiers".
Question 1:
Octal numbers:
A string that start with a [0] , then can be followed by any digit 1, 2, .. 7 [1-7](assuming no leading zeroes) but can also contain zeroes after the first actual digit, so [0-7]* (* is for repetition, zero or more times).
So we get the following RegEx for this part: 0 [1-7][0-7]*
Decimal numbers:
Decimal numbers must not have a leading zero, hence start with all digits from 1 to 9 [1-9], but zeroes are allowed in all other positions as well hence we need to concatenate [0-9]*
So we get the following RegEx for this part: [1-9][0-9]*
Since we have two options (octal and decimal numbers) and either one is possible we can use the Alternation property '|' :
L = 0[1-7][0-7]* | [1-9][0-9]*
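To see how this expression behaves on the strings from the question, here is a small Python sketch (the language is an assumption) that compares it with the 0[0-7]*|[1-9][0-9]* variant suggested in an earlier answer. The variable names are made up for illustration.

```python
import re

# Pattern derived in the walkthrough above (whitespace removed).
walkthrough = re.compile(r"0[1-7][0-7]*|[1-9][0-9]*")
# Variant from an earlier answer, which lets zeroes follow the leading 0.
variant = re.compile(r"0[0-7]*|[1-9][0-9]*")

samples = ["0", "034567", "000083", "123", "08"]
for s in samples:
    # fullmatch requires the whole string to fit the pattern
    print(s, bool(walkthrough.fullmatch(s)), bool(variant.fullmatch(s)))
```

The two patterns differ on a lone 0, which is itself an octal constant in C: the walkthrough pattern rejects it because it demands a nonzero digit after the leading 0.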
Question 2:
Quickly looking at Fermat's Last Theorem:
In number theory, Fermat's Last Theorem (sometimes called Fermat's conjecture, especially in older texts) states that no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two.
(http://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem)
Hence the following sets where n<=2 satisfy the equation: {0,1,2}base10 = {0,1,10}base2
If any of those elements satisfy the equation, we use the Alternation | (or)
So the regular expression can be: L = 0 | 1 | 10 but can also be L = 00 | 01 | 10 or even be L = 0 | 1 | 10 | 00 | 01
Or can be generalized into:
{0} we can have infinite number of zeroes: 0*
{1} we can have infinite number of zeroes followed by a 1: 0*1
{10} we can have infinite number of zeroes followed by 10: 0*10
So L = 0* | 0*1 | 0*10
max answered the first question.
the second appears to be the unsolvable diophantine equation of fermat's last theorem. if h,i,j are non-zero integers, x can only be 1 or 2, so you're looking for
^0*10?$
does that help?
There are several tools available to test regular expressions, such as The Regulator.
If you search for "regular expression test" you will find numerous links to online testers.