regex - At most two pair of consecutives - regex

I'm taking a computation course which also teaches about regular expressions. There is a difficult question that I cannot answer.
Find a regular expression for the language of words that contain at most two pairs of consecutive 0's. The alphabet consists of 0 and 1.
First, I made an NFA of the language, but I cannot convert it to a GNFA (which would later be converted to a regex). How can I find this regular expression, with or without converting it to a GNFA?

(Since this is a homework problem, I'm assuming that you just want enough help to get started, and not a full worked solution?)
Your mileage may vary, but I don't really recommend trying to convert an NFA into a regular expression. The two are theoretically equivalent, and either can be converted into the other algorithmically, but in my opinion, it's not the most intuitive way to construct either one.
Instead, one approach is to start by enumerating various possibilities:
No pairs of consecutive zeroes at all; that is, every zero, except at the end of the string, must be followed by a one. So, the string consists of a mixed sequence of 1 and 01, optionally followed by 0:
(1|01)*(0|ε)
Exactly one pair of consecutive zeroes, at the end of the string. This is very similar to the previous:
(1|01)*00
Exactly one pair of consecutive zeroes, not at the end of the string — and, therefore, necessarily followed by a one. This is also very similar to the first one:
(1|01)*001(1|01)*(0|ε)
To continue that approach, you would then extend the above to support two pairs of consecutive zeroes; and lastly, you would merge all of these into a single regular expression.
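One way to sanity-check the building blocks above is to brute-force every short binary string in Python. This is only a sketch under a few assumptions: ε is written as an empty alternative, a "pair" is counted as every (possibly overlapping) occurrence of 00, so 000 contains two pairs, and the helper names are made up for illustration. It checks that the three cases together accept exactly the strings with at most one pair.

```python
import re
from itertools import product

# The three cases from the answer, with ε written as an empty alternative.
NO_PAIR = r"(1|01)*(0|)"
ONE_PAIR_AT_END = r"(1|01)*00"
ONE_PAIR_INSIDE = r"(1|01)*001(1|01)*(0|)"

# Union of the three cases: "at most one pair of consecutive zeroes".
AT_MOST_ONE = re.compile(f"(?:{NO_PAIR}|{ONE_PAIR_AT_END}|{ONE_PAIR_INSIDE})")


def count_pairs(word: str) -> int:
    """Count occurrences of 00, overlapping ones included (assumption)."""
    return sum(1 for i in range(len(word) - 1) if word[i:i + 2] == "00")


def mismatches(max_len: int) -> list[str]:
    """Return every word up to max_len where the regex and the count disagree."""
    bad = []
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            expected = count_pairs(word) <= 1
            actual = AT_MOST_ONE.fullmatch(word) is not None
            if expected != actual:
                bad.append(word)
    return bad


print(mismatches(10))
```

The same harness can be reused for the two-pair extension: change the bound in `expected` to 2 and swap in the candidate expression.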

(0+1)*00(0+1)*00(0+1)* + (0+1)*000(0+1)*

contains at most two pair of consecutive 0's
(1|01)*(00|ε)(1|10)*(00|ε)(1|10)*

Related

Can I write a regular expression that checks two lengths are equal?

I want to match strings with two numbers of equal length, like: 42-42, 0-2, 12345-54321.
I don't want to match strings where the two numbers have different lengths, like: 42-1, 000-0000.
The two parts (separated by the hyphen) must have the same length.
I wonder if it is possible to write a regexp like [0-9]{n}-[0-9]{n} with n variable but equal?
If there is no clean way to do that in one pattern (I must put it in the pattern attribute of an HTML form input), I will do something like /\d-\d|\d{2}-\d{2}|\d{3}-\d{3}|<etc>/ up to the maximum length (16 in my case).
This is not possible with plain regular expressions, because the language is not regular (type-3 in the Chomsky hierarchy): the pattern would have to count an arbitrary number of digits before the hyphen and then require the same count after it. The language is context-free (type-2), generated for example by S → dSd | -, so only a regex flavor that supports recursion can match it, and the pattern attribute of an HTML input uses JavaScript regular expressions, which do not.
The higher grammar levels (type-1 and type-0) need more powerful machines still, up to a Turing machine (or something equivalent, like your programming language).
More about this can be found here:
https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy
Using a programming language, you need to count the first sequence of digits, check for the minus, and then check that the same number of digits follows.
In a flavor with recursion, such as PCRE, the check could be written as a recursive regular expression like this: ^(\d(?1)\d|-)$
So you need to write your own, non-regular-expression check code.
You should probably split the String around the separator and compare the length of both parts.
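A minimal Python sketch of that split-and-compare check follows. The function name is hypothetical, and it assumes the input must be ASCII digits around a single hyphen; in a web form the same logic would live in the page's own validation code rather than in the pattern attribute.

```python
import re


def same_length_numbers(text: str) -> bool:
    """Return True if text is <digits>-<digits> with parts of equal length."""
    # The regex only checks the shape; the length comparison is done in code.
    match = re.fullmatch(r"([0-9]+)-([0-9]+)", text)
    return match is not None and len(match.group(1)) == len(match.group(2))


for sample in ["42-42", "0-2", "12345-54321", "42-1", "000-0000"]:
    print(sample, same_length_numbers(sample))
```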
The regex tool for specifying "the same thing as before" is the back-reference. However, a back-reference refers to the matched text rather than to the pattern that matched it: there is no way to use a back-reference to .{3} to mean "any 3 characters".
However, if you only need to validate a finite number of lengths, it can be (painfully) done with alternation:
\d-\d will match up to 1 character on both sides of the separator
\d-\d|\d{2}-\d{2} will match up to 2 characters on both sides of the separator
...
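Rather than typing the alternation by hand, it could be generated. The sketch below is one way to do that in Python; the function name is made up, and the limit of 16 comes from the question. It assumes the resulting string is pasted into the pattern attribute, which matches against the whole value, so the Python check uses fullmatch to mimic that.

```python
import re


def equal_length_pattern(max_len: int) -> str:
    r"""Build \d{1}-\d{1}|\d{2}-\d{2}|... up to max_len digits per side."""
    branches = [rf"\d{{{n}}}-\d{{{n}}}" for n in range(1, max_len + 1)]
    return "|".join(branches)


PATTERN = equal_length_pattern(16)

# Wrap the alternation in a group and restrict \d to ASCII digits for testing.
CHECKER = re.compile(f"(?:{PATTERN})", re.ASCII)

print(PATTERN[:40] + "...")
print(CHECKER.fullmatch("42-42") is not None)
print(CHECKER.fullmatch("42-1") is not None)
```

The generated pattern is long but purely mechanical, so changing the maximum length only means changing one argument.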

How do I convert language set notation to regular expressions?

I have the following question about regular expressions, and I just can't get my head around this kind of problem.
L1 = { 0^n 1^m | n ≥ 3 ∧ m is odd }
How would I write a regular expression for this sort of problem when the alphabet is {0,1}?
What's the answer?
The regular expression for your example is:
000+1(11)*
So what does this do?
The first two characters, 00, are literal zeros. This is going to be important for the next point
The second two characters, 0+, mean "at least one zero, no upper bound". These first four characters satisfy the first condition, which is that we have at least three zeros.
The next character, 1, is a literal one. Since we need to have an odd number of ones, this is the smallest number we're allowed to have
The last-but-one characters, (11), represent a logical grouping of two literal ones, and the ending * says to match this grouping zero or more times. Since we always have at least one 1, we'll always match an odd number. So we're done.
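The breakdown above can be checked against the set definition by brute force. This is a sketch, not a proof: it takes the expression as 000+1(11)*, uses Python's fullmatch so the whole word has to match, and the helper names are invented for illustration.

```python
import re
from itertools import product

PATTERN = re.compile(r"000+1(11)*")


def in_l1(word: str) -> bool:
    """Direct test of L1: n zeros then m ones, with n >= 3 and m odd."""
    zeros = len(word) - len(word.lstrip("0"))
    ones = word[zeros:]
    return zeros >= 3 and set(ones) <= {"1"} and len(ones) % 2 == 1


def disagreements(max_len: int) -> list[str]:
    """Words up to max_len on which the regex and the definition differ."""
    bad = []
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            if (PATTERN.fullmatch(word) is not None) != in_l1(word):
                bad.append(word)
    return bad


print(disagreements(10))
```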
How'd I get that?
The key is knowing regular expression syntax. I happen to have quite a bit of experience in it, but this website helped me to verify.
Once you know the basic building blocks of regex, you need to break down your problem into what you can represent.
For example, regex lets us specify a lower AND upper bound for matching (the {x,y} syntax), and {x} matches exactly x times. Most flavors also accept {x,} for a lower bound only, so 0{3,} would work too, but here I built the lower bound out of the basic operators. I knew I would have to use either + or * on the zeros, as those are the basic quantifiers that permit an unbounded number of matches. I also knew that it didn't make sense to apply them to a group; the restriction that we must have at least 3 zeroes doesn't imply that we must have a multiple of three, for example, so (000)+ was out. I had to apply the quantifier to only one character, which meant I had to match a few literals first. 000 matches exactly three 0s, and appending 0* (giving 0000*) allows any number of extra zeros, which is exactly what I want; I then condensed that to the equivalent 000+.
For the second condition, I had to think about what an odd number is. By definition, an odd number can be expressed by 2*k + 1, where k is an integer. So I had to match one 1 (Hence the literal 1), and some number of the substring 11. That led me to the group, and then the *. On a slightly different problem, you could write 1(11)+ to match any odd number of ones, and at least 3.
Note: A colleague of mine pointed out to me that the + operator isn't technically part of the formal definition of regular expressions. If this is an academic question rather than a programming one, you might find the 0000* version more helpful. In that case, the final expression would be 0000*1(11)*

Regular Expression (consecutive 1s and 0s)

Hey I'm supposed to develop a regular expression for a binary string that has no consecutive 0s and no consecutive 1s. However this question is proving quite tricky. I'm not quite sure how to approach it as is.
If anyone could help that'd be great! This is new to me.
You're basically looking for alternating digits, the string:
...01010101010101...
but one that doesn't go infinitely in either direction.
That would be an optional 0 followed by any number of 10 sets followed by an optional 1:
^0?(10)*1?$
The (10)* (group) gives you as many of the alternating digits as you need and the optional edge characters allow you to start/stop with a half-group.
Keep in mind that also allows an empty string which may not be what you want, though you could argue that's still a binary string with no consecutive identical digits. If you need it to have a length of at least one, you can do that with a more complicated "or" regex like:
^(0(10)*1?|1(01)*0?)$
which makes the first digit (either 1 or 0) non-optional and adjusts the following sequences accordingly for the two cases.
But a simpler solution may be better if it's allowed - just ensure it has a length greater than zero before doing the regex check.
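Both variants can be compared against a direct "no two neighbours are equal" test. The following Python sketch makes a few assumptions: fullmatch replaces the ^ and $ anchors, the two non-empty cases sit inside one group so that in an anchored form both anchors would apply to both branches, and the helper names are illustrative only.

```python
import re
from itertools import product

ALTERNATING = re.compile(r"0?(10)*1?")
ALTERNATING_NONEMPTY = re.compile(r"(0(10)*1?|1(01)*0?)")


def has_no_repeats(word: str) -> bool:
    """True if no two adjacent characters are equal."""
    return all(a != b for a, b in zip(word, word[1:]))


def disagreements(max_len: int) -> list[str]:
    """Words up to max_len where either regex disagrees with the direct test."""
    bad = []
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            word = "".join(bits)
            ok = has_no_repeats(word)
            if (ALTERNATING.fullmatch(word) is not None) != ok:
                bad.append(word)
            if (ALTERNATING_NONEMPTY.fullmatch(word) is not None) != (ok and word != ""):
                bad.append(word)
    return bad


print(disagreements(10))
```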

find Reg. Expr. over {0,1,2} so last symbol of string is the sum of the symbols so far on the string mod 3.

I'm teaching myself formal languages (Aho, Hopcroft), but I'm having a hard time with regular expressions.
I've been able to tackle simple tasks, but this one has posed a challenge, at least for me. How do you solve this when you can't count? I'm not used to this type of computation.
There must be some property that lets me generalize the answer enough to write it as a regular expression.
So far I've worked out that there are three cases:
sums mod3=0 if sum=3k
sums mod3=1 if sum=3k+1
sums mod3=2 if sum=3k+2.
But I've come to realize that a given sum can arise from many combinations, so I can't find the pattern the regular expression must follow.
For example, the string {122211}0 (braces added for readability) has the zero at the end because the preceding sum is 9 = 3k. If the sum is 10, as in {1222111}1, the case is sum = 3k+1, so the one has to be at the end, and so on.
This may or may not be the right way to tackle the problem, but I'm open to any suggestions. Any help is much appreciated.
Here's a hint: think of what distinct final states you can possibly be in. You certainly have at least 3 states, since the sum so far can take three different values mod 3. Also, you need to have a distinct start state, since the empty string cannot be accepted. Do you need more states?
Hint2: I think you can easily do this with a DFA using a start state and nine other states, of which exactly three will be accepting.
EDIT: Once you have a DFA, you can use Kleene's Theorem to construct an equivalent regular expression. If you'd rather go straight for a regular expression, here's another hint: to any string whose symbols sum to 3k you can append 0; to any string whose sum is 3k+1 you can append 1; to any string whose sum is 3k+2 you can append 2. So if you can write regular expressions for the strings with sums 3k, 3k+1, and 3k+2, you're practically done.
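The DFA from the second hint can be simulated in a few lines of Python. This is a sketch of one possible state layout, not necessarily the one the hint had in mind: each non-start state is assumed to be a pair (sum of everything before the last symbol mod 3, last symbol), which gives nine states, three of them accepting. The function names are hypothetical.

```python
from itertools import product


def accepts(word: str) -> bool:
    """Simulate a DFA whose states are (prefix sum mod 3, last symbol)."""
    state = None  # start state: nothing read yet, never accepting
    for symbol in word:
        digit = int(symbol)
        if state is None:
            state = (0, digit)  # empty prefix has sum 0
        else:
            prefix_sum, last = state
            # The old last symbol joins the prefix; the new symbol becomes last.
            state = ((prefix_sum + last) % 3, digit)
    # Accepting states are (0, 0), (1, 1) and (2, 2).
    return state is not None and state[0] == state[1]


def by_definition(word: str) -> bool:
    """Last symbol equals the sum of all earlier symbols mod 3."""
    if not word:
        return False
    digits = [int(c) for c in word]
    return digits[-1] == sum(digits[:-1]) % 3


words = ("".join(w) for n in range(8) for w in product("012", repeat=n))
print(all(accepts(w) == by_definition(w) for w in words))
```

Turning this automaton into a regular expression is then the state-elimination exercise the answer refers to.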

RegEx Numeric Check in Range?

I'm new to StackOverflow, so please let me know if there is a better way to ask the following question.
I need to create a regular expression that detects whether a field in the database is numeric and, if it is numeric, whether it falls within a valid range (e.g. 1-50). I've tried [1-50], which works except for the instances where a single-digit number is preceded by a 0 (e.g. 06). 06 should still be considered a valid number, since I can later convert that to a number.
I really appreciate your help! I'm trying to learn more about regular expressions, and have been learning all I can from: www.regular-expressions.info. If you guys have recommendations of other sites to bone up on this stuff I would appreciate it!
Try this
^(0?[1-9]|[1-4][0-9]|50)$
The idea of this regex is to break the problem down into cases
0?[1-9] takes care of the single-digit case, allowing for an optional preceding 0
[1-4][0-9] takes care of all numbers from 10 to 49
50 takes care of 50
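The case split can be checked exhaustively in Python. A sketch, with assumptions: re.fullmatch stands in for the ^ and $ anchors, the three cases are grouped in a single alternation, and the names are illustrative. The last lines also show why [1-50] does not express a range: inside brackets it is a character class containing the digits 1 to 5 and 0.

```python
import re

RANGE_1_50 = re.compile(r"(0?[1-9]|[1-4][0-9]|50)")

# Every number 0..99 as written normally, plus the zero-padded forms 00..09.
CANDIDATES = [str(n) for n in range(100)] + [f"{n:02d}" for n in range(10)]


def accepted(candidates: list[str]) -> list[str]:
    """Return the candidates that the range pattern matches in full."""
    return [c for c in candidates if RANGE_1_50.fullmatch(c)]


EXPECTED = {str(n) for n in range(1, 51)} | {f"0{n}" for n in range(1, 10)}
print(set(accepted(CANDIDATES)) == EXPECTED)

# [1-50] is a character class, not a numeric range.
print(re.fullmatch(r"[1-50]", "7"))
print(re.fullmatch(r"[1-50]", "0"))
```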
Regular expressions work on characters (in this case digits), not numbers. You need to have a separate pattern for each number of digits in your pattern, and combine them with | (the OR operator) like the other answers have suggested. However, consider just checking if the text is numeric with a regular expression (like [0-9]+) and then converting to an integer and checking the integer is within range.
You can't easily do range checking with regular expressions. You can -- with some work -- develop a pattern that recognizes a numeric range, but it's usually quite complex, and difficult to modify for a slightly different range.
You're better off breaking this into two parts.
Recognize the number pattern (^\d+$).
Check the range of that number in an application program.
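The two steps could look like this in Python. It is a minimal sketch assuming the field arrives as a string and that the bounds 1 and 50 are inclusive; the function name and parameters are made up for illustration.

```python
import re


def is_number_in_range(text: str, low: int = 1, high: int = 50) -> bool:
    """Step 1: check the text is all ASCII digits. Step 2: check the range."""
    if re.fullmatch(r"[0-9]+", text) is None:
        return False
    return low <= int(text) <= high


for sample in ["06", "50", "51", "0", "abc", ""]:
    print(repr(sample), is_number_in_range(sample))
```

Changing the valid range then only means changing two integers, not rewriting a pattern.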
^0?[1-50]{1,2}$