Regular expression to define some binary sequence - regex

How would you write a regular expression to define all strings of 0's and 1's that, as a binary number, represent an integer that is a multiple of 3?
Some valid binary numbers would be:
11
110
1001
1100
1111

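Using a DFA whose states A, B and C correspond to the remainders 0, 1 and 2 modulo 3, we can derive a regular expression as follows. A is both the start state and the accepting state, and each "Eliminate recursion" step applies Arden's rule: X = rX + s gives X = r*s (for A, s is the empty string because A is accepting).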
A = 1B + 0A
B = 1A + 0C
C = 1C + 0B
C = 1*0B // Eliminate recursion
B = 1A + 0(1*0B)
B = 01*0B + 1A
B = (01*0)*1A // Eliminate recursion
A = 1(01*0)*1A + 0A
A = (1(01*0)*1 + 0)A
A = (1(01*0)*1 + 0)* // Eliminate recursion
Resulting in a PCRE regex like:
/^(1(01*0)*1|0)+$/
Perl test/example:
use strict;
for(qw(
11
110
1001
1100
1111
0
1
10
111
)){
print "$_ (", eval "0b$_", ") ";
print /^(1(01*0)*1|0)+$/? "matched": "didnt match";
print "\n";
}
Outputs:
11 (3) matched
110 (6) matched
1001 (9) matched
1100 (12) matched
1111 (15) matched
0 (0) matched
1 (1) didnt match
10 (2) didnt match
111 (7) didnt match
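The Perl test above checks nine inputs. As a sketch (assuming Python 3 and its standard `re` module, not part of the original answer), the same pattern can be checked exhaustively against `int(s, 2) % 3` for every binary string up to a chosen length; `re.fullmatch` plays the role of the `^...$` anchors. The name `check_up_to` is made up for this illustration.

```python
import re
from itertools import product

# The pattern from the answer above; fullmatch() supplies the ^...$ anchoring.
MULTIPLE_OF_3 = re.compile(r"(1(01*0)*1|0)+")

def check_up_to(max_len):
    """Return the binary strings (length 1..max_len) where regex and arithmetic disagree."""
    mismatches = []
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            by_regex = MULTIPLE_OF_3.fullmatch(s) is not None
            by_math = int(s, 2) % 3 == 0
            if by_regex != by_math:
                mismatches.append(s)
    return mismatches

print(check_up_to(12))  # an empty list means no counterexample up to length 12
```

Leading zeros are accepted (for example 011 is read as 3), which is consistent with treating the string purely as a binary number.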

When you divide a number by three, there are only three possible remainders (0, 1 and 2). What you're aiming at is to ensure the remainder is 0, hence a multiple of three.
This can be done by an automaton with the three states:
ST0, multiple of 3 (0, 3, 6, 9, ....).
ST1, multiple of 3 plus 1 (1, 4, 7, 10, ...).
ST2, multiple of 3 plus 2 (2, 5, 8, 11, ...).
Now think of any non-negative number (that's our domain) and multiply it by two (tack a binary zero onto the end). The transitions for that are:
ST0 -> ST0 (3n * 2 = 3 * 2n, still a multiple of three).
ST1 -> ST2 ((3n+1) * 2 = 3*2n + 2, a multiple of three, plus 2).
ST2 -> ST1 ((3n+2) * 2 = 3*2n + 4 = 3*(2n+1) + 1, a multiple of three, plus 1).
Also think of any non-negative number, multiply it by two, then add one (tack a binary one onto the end). The transitions for that are:
ST0 -> ST1 (3n * 2 + 1 = 3*2n + 1, a multiple of three, plus 1).
ST1 -> ST0 ((3n+1) * 2 + 1 = 3*2n + 2 + 1 = 3*(2n+1), a multiple of three).
ST2 -> ST2 ((3n+2) * 2 + 1 = 3*2n + 4 + 1 = 3*(2n+1) + 2, a multiple of three, plus 2).
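Both transition lists amount to the rule new_state = (2 * state + bit) mod 3. A minimal Python sketch of that automaton follows; the table and function names are illustrative, not from the answer.

```python
# Transition table built from the two lists above:
# TRANSITIONS[state][bit] is the state after appending that bit.
TRANSITIONS = {
    0: {"0": 0, "1": 1},  # ST0: 3n*2 -> ST0, 3n*2+1 -> ST1
    1: {"0": 2, "1": 0},  # ST1: (3n+1)*2 -> ST2, (3n+1)*2+1 -> ST0
    2: {"0": 1, "1": 2},  # ST2: (3n+2)*2 -> ST1, (3n+2)*2+1 -> ST2
}

def remainder_mod_3(binary):
    """Run the automaton from ST0; the final state is the remainder mod 3."""
    state = 0
    for bit in binary:
        state = TRANSITIONS[state][bit]
    return state

def is_multiple_of_3(binary):
    return remainder_mod_3(binary) == 0

for s in ["11", "110", "1001", "1100", "1111", "1", "10", "111"]:
    print(s, is_multiple_of_3(s))
```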
The idea is that, at the end, you need to finish up in state ST0. However, given that there can be an arbitrary number of sub-expressions (and sub-sub-expressions), this does not reduce easily to a regular expression.
What you have to do is allow for any of the transition sequences that lead from ST0 back to ST0, and then just repeat them. These boil down to two RE sequences (the bit read on each transition is shown in brackets):
ST0 --[0]--> ST0 : 0+
ST0 --[1]--> ST1 ( --[0]--> ST2 ( --[1]--> ST2 )* --[0]--> ST1 )* --[1]--> ST0 : 1(01*0)*1
Repeating either of these gives the regex:
(0+|1(01*0)*1)+
This captures the multiples of three, or at least the first ten that I tested. You can try as many as you like; they'll all work. That's the beauty of mathematical analysis rather than anecdotal evidence.

Another answer is (1(01*0)*10*)*. It also works for 110011 (51 = 3 * 17), as do the patterns above; note that it matches the empty string and does not accept leading zeros.
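To compare the three patterns quoted in this thread, here is a small Python 3 sketch (an illustration, not from any of the answers; the dictionary labels and `disagreements` are made-up names). It runs each pattern with `re.fullmatch` on every binary string of length 1 to 12 that has no leading zero and reports where a pattern disagrees with `int(s, 2) % 3 == 0`.

```python
import re
from itertools import product

PATTERNS = {
    "DFA-derived": r"(1(01*0)*1|0)+",
    "state-sequence": r"(0+|1(01*0)*1)+",
    "short form": r"(1(01*0)*10*)*",
}

def disagreements(pattern, max_len=12):
    regex = re.compile(pattern)
    bad = []
    for length in range(1, max_len + 1):
        for rest in product("01", repeat=length - 1):
            s = "1" + "".join(rest)  # no leading zero
            if (regex.fullmatch(s) is not None) != (int(s, 2) % 3 == 0):
                bad.append(s)
    return bad

for name, pattern in PATTERNS.items():
    print(name, pattern, disagreements(pattern))  # [] means it agreed on every tested string
```

The empty string is deliberately skipped, since only the third pattern accepts it.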

I don't think you would. I can't believe that, in any language, using a regular expression could ever be the best way to do this.

Related

Counting ways of breaking up a string of digits into numbers under 26

Given a string of digits, I wish to find the number of ways of breaking up the string into individual numbers so that each number is under 26.
For example, "8888888" can only be broken up as "8 8 8 8 8 8 8", whereas "1234567" can be broken up as "1 2 3 4 5 6 7", "12 3 4 5 6 7" and "1 23 4 5 6 7".
I'd like both a recurrence relation for the solution, and some code that uses dynamic programming.
This is what I've got so far. It only covers the base cases: an empty string should return 1, a string of one digit should return 1, and a string in which all digits are larger than 2 should return 1.
#include <vector>
using std::vector;

int countPerms(vector<int> number, int currentPermCount)
{
    vector< vector<int> > permsOfNumber;
    vector<int> working;
    int totalPerms = 0, size = number.size();
    bool areAllOverTwo = true, forLoop = true;
    if (number.size() <= 1)
    {
        //TODO: print out permutations
        return 1;
    }
    // minus one here because we don't care what the last digit is: if all of the
    // digits before it are over 2, then there is only one way to decode them
    for (std::size_t i = 0; i < number.size() - 1; i++)
    {
        if (number.at(i) <= 2)
        {
            areAllOverTwo = false;
        }
    }
    if (areAllOverTwo) // if all the numbers are over 2 there is only one possible combination: 3456676546 has only one
    {
        permsOfNumber.push_back(number);
        //TODO: write function to print out the permutations
        return 1;
    }
    do
    {
        //TODO: find all the permutations here
        forLoop = false; // placeholder so the unfinished loop terminates
    } while (forLoop);
    return totalPerms;
}
Assuming you either don't have zeros or you disallow numbers with leading zeros, the recurrence relations are:
N(1aS) = N(S) + N(aS)
N(2aS) = N(S) + N(aS) if a < 6
N(a) = 1, and N(empty string) = 1
N(aS) = N(S) otherwise
Here, a refers to a single digit, and S to a (possibly empty) string of digits. The first line of the recurrence relation says that if your string starts with a 1, then you can either have it on its own or join it with the next digit. The second line says that if you start with a 2, you can either have it on its own or join it with the next digit, assuming that gives a number less than 26. The third line is the termination condition: when you're down to 1 digit (or to none, after joining the last two digits), the result is 1. The final line says that if none of the previous rules matches, the first digit can't be joined to the second, so it must stand on its own.
The recurrence relations can be implemented fairly directly as an iterative dynamic programming solution. Here's code in Python, but it's easy to translate into other languages.
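def N(S):
    # Scanning right to left: a1 = ways for the suffix after i, a2 = ways for the suffix after i + 1
    a1, a2 = 1, 1
    for i in range(len(S) - 2, -1, -1):
        if S[i] == '1' or (S[i] == '2' and S[i+1] < '6'):
            a1, a2 = a1 + a2, a1  # S[i] alone, or S[i] joined with S[i+1]
        else:
            a1, a2 = a1, a1  # S[i] must stand alone
    return a1

print(N('88888888'))
print(N('12345678'))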
Output:
1
3
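The question also asked for the recurrence itself. As a sketch, the four rules can be transcribed almost literally into a memoized recursive function (Python 3; the name `N_rec` is hypothetical), counting the empty string as 1 way so that N(1a) = N(empty) + N(a) works out. It should agree with the iterative version, which avoids Python's recursion limit on very long strings.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def N_rec(S):
    """Ways to split digit string S into numbers under 26 (no zeros assumed)."""
    if len(S) <= 1:
        return 1  # N(empty) = 1 and N(a) = 1
    first, second, rest = S[0], S[1], S[2:]
    if first == "1" or (first == "2" and second < "6"):
        return N_rec(rest) + N_rec(S[1:])  # joined with the next digit, or alone
    return N_rec(S[1:])  # first digit must stand alone

print(N_rec("88888888"), N_rec("12345678"))
```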
An interesting observation is that N('1' * n) is the (n+1)th Fibonacci number:
for i in range(1, 10):
    print(i, N('1' * i))
Output:
1 1
2 2
3 3
4 5
5 8
6 13
7 21
8 34
9 55
If I understand correctly, there are only 25 possibilities. My first crack at this would be to initialize an array of 25 ints all to zero and when I find a number less than 25, set that index to 1. Then I would count up all the 1's in the array when I was finished looking at the string.
What do you mean by recurrence? If you're looking for a recursive function, you would need to find a good way to break the string of numbers down recursively, and I'm not sure that's the best approach here. I would just go through the string digit by digit and, as you said, if the digit is 2 or less, store it and test appending the next digit, i.e. 10*digit + next. I hope that helps! Good luck.
Another way to think about it: start from the single possibility in which every digit stands alone. Then, for every run of n digits in which each adjacent pair could be joined (e.g., 111 or 12223), multiply the result by:
1 + sum over i = 1 to floor(n/2) of C(n-i, i)
where C(n-i, i) counts the ways to place i non-overlapping pairs. For example, with the run 11111 (n = 5) we can have:
i=1, e.g. 1 1 1 11: C(5-1, 1) = C(4, 1) = 4 possibilities with one pair
i=2, e.g. 1 11 11: C(5-2, 2) = C(3, 2) = 3 possibilities with two pairs
so the total is 1 + 4 + 3 = 8.
This seems directly related to Wikipedia's description of Fibonacci numbers' "Use in Mathematics," for example, in counting "the number of compositions of 1s and 2s that sum to a given total n" (http://en.wikipedia.org/wiki/Fibonacci_number).
Using the combinatorial method (or another fast way of computing Fibonacci numbers) could be suitable for strings with very long runs.
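A sketch of this run-based method in Python (3.8+ for `math.comb`; the function names are made up for illustration and, like the recurrence answer, it assumes no zeros): split the string into maximal runs where every adjacent pair could form a number below 26, then multiply the per-run counts.

```python
from math import comb

def ways_in_run(n):
    """Splits of a run of n digits whose adjacent pairs are all joinable:
    1 (no pairs) plus C(n - i, i) for each number of pairs i >= 1."""
    return sum(comb(n - i, i) for i in range(n // 2 + 1))  # the i = 0 term is the leading 1

def count_by_runs(S):
    """Multiply ways_in_run over the maximal runs of joinable digits."""
    if not S:
        return 1
    result, run = 1, 1
    for prev, cur in zip(S, S[1:]):
        if prev == "1" or (prev == "2" and cur < "6"):
            run += 1  # prev+cur is 10..25, so the run continues
        else:
            result *= ways_in_run(run)  # the run ends at this boundary
            run = 1
    return result * ways_in_run(run)

print(count_by_runs("11111"), count_by_runs("12223"), count_by_runs("1234567"))
```

Runs are independent because no number can straddle a boundary between them, which is why their counts multiply.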

Regex Counting By 3s

I'm teaching myself regular expressions, and found a quizzing site that has been helping me find more applications for them and has been helping me expand my knowledge of how they work.
I found a question asking me to form a regex to match 10 digit numbers that are multiples of 3. The only way I can think of doing this is by having the regex recognise numbers' values and manipulate them mathematically. How is this possible?
In other words, what regex would match
0003
0006
0351
1749
but not match
0005
0011
0361
4372
First, start with the rule that a number is divisible by three if and only if the sum of its digits is divisible by three (proving this takes a little number theory, but it helps to see that 9, 99, 999, etc. are all multiples of three, so 1, 10, 100, 1000, etc. all leave remainder 1 when divided by three, and each digit therefore contributes its own value to the remainder wherever it sits).
Then, notice that there are three kinds of digits:
The multiples of three: 0, 3, 6, and 9. These are equivalent to 0 (mod 3).
One more than multiples of three: 1, 4, and 7. These are equivalent to 1 (mod 3).
Two more than multiples of three: 2, 5, and 8. These are equivalent to -1 (mod 3). Conventionally this class is named 2, but -1 is more useful to us. Because 1 + 2 = 0 (mod 3), -1 is a legitimate name for 2.
Then, the numbers 0, 3, 6, and 9 are multiples of three. If we add any number of digits from class 0 anywhere, the number remains a multiple of three (so 33, 999, and 963 are all multiples of three). If we add a digit from class 1 anywhere, we need to either add another digit of class -1, or add two more digits of class 1 to bring the remainder back to 0. Likewise if we add a digit from class -1 anywhere, we either need to add another digit of class 1, or add two more digits of class -1 to bring the remainder back to 0.
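The bookkeeping just described can be sketched in Python (an illustration only; `DIGIT_CLASS`, `normalize` and `residue` are made-up names). It tracks the running remainder using the names 0, 1 and -1 from this answer; because 10 leaves remainder 1 mod 3, adding each digit's class is enough.

```python
# Digit classes from the answer: 0 for 0,3,6,9; 1 for 1,4,7; -1 for 2,5,8.
DIGIT_CLASS = {d: (0 if d in "0369" else 1 if d in "147" else -1) for d in "0123456789"}

def normalize(r):
    """Map a remainder mod 3 onto the names 0, 1, -1 used in the answer."""
    r %= 3
    return -1 if r == 2 else r

def residue(digits):
    state = 0
    for d in digits:
        state = normalize(state + DIGIT_CLASS[d])
    return state

for s in ["0003", "0006", "0351", "1749", "0005", "0011", "0361", "4372"]:
    print(s, residue(s) == 0)
```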
Here's wcp's answer, formatted as a Perl /x regex for readability (anchored with ^ and $ so that the whole string must match):
/^
  ( [0369]                     # 0
  | [147] [0369]* [258]        # 1 + 0 + -1 = 0
  | ( [258]                    # -1
    | [147] [0369]* [147]      # 1 + 0 + 1 = -1
    )                          # -1
    ( [0369]                   # 0
    | [258] [0369]* [147]      # -1 + 0 + 1 = 0
    )*                         # 0
    ( [147]                    # 1
    | [258] [0369]* [258]      # -1 + 0 + -1 = 1
    )                          # 1 ... -1 + 0 + 1 = 0
  )+
$/x
The regex matches groups of digits having remainder of 0. The first branch matches only digits of class 0; the second branch matches groups where the remainder goes up to 1 and then back to 0; and the third branch matches groups where the remainder goes down to -1 and then back to 0. There's some cleverness in how it's constructed (a less-golfed regex would have five major branches instead of three, I think), but the comments should be enough to let you follow it.
As Jerry said in a comment, you can use this shorter single-line form (anchored with ^ and $ so that it must match the whole string):
^([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))+$
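As a quick sanity check (a Python 3 sketch, not part of either answer; `first_disagreement` is a made-up name), the compact pattern can be tested with `re.fullmatch`, which anchors it to the whole string, against `n % 3` for every four-digit string from 0000 to 9999:

```python
import re

# The compact pattern from the answer; fullmatch() requires the whole string to match.
PATTERN = re.compile(
    r"([0369]|[147][0369]*[258]|([258]|[147][0369]*[147])"
    r"([0369]|[258][0369]*[147])*([147]|[258][0369]*[258]))+"
)

def first_disagreement():
    """Return the first four-digit string where regex and arithmetic differ, or None."""
    for n in range(10000):
        s = f"{n:04d}"
        if (PATTERN.fullmatch(s) is not None) != (n % 3 == 0):
            return s
    return None

print(first_disagreement())
```

With `re.search` instead of `re.fullmatch`, a string such as 0005 would be accepted because its first 0 alone matches, which is why the pattern must be anchored.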

How to find the number of sequences of zeros and ones without "111" [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a problem:
I have an integer N (N <= 40), the length of a sequence of zeros and ones. How can I find the number of such sequences in which no three 1s appear consecutively?
Example:
N = 3, answer = 7
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
Here's a solution using a recursive function (PHP code here, but it's really simple):
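<?php
$seq = '';

// $node: digits still to place; $flag: number of consecutive 1s at the end of $seq
function tree($node, $flag, $seq)
{
    if ($flag == 3) { return 0; }                  // "111" appeared: abandon this branch
    if ($node == 0) { echo $seq, ' '; return 0; }  // leaf: print a complete sequence
    $seq1 = $seq . '1';
    $seq2 = $seq . '0';
    tree($node - 1, $flag + 1, $seq1);  // append 1: one more 1 in the run
    tree($node - 1, 0, $seq2);          // append 0: the run of 1s is reset
}

tree(8, 0, $seq);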
I use a tree to go through all the possible sequences, and a flag to count how many 1s are in a row.
If there are three 1s in a row, the flag reaches 3 and the function stops exploring this branch.
If we reach a leaf of the tree (i.e. $node == 0), the sequence is displayed and the function ends.
Otherwise, the function explores the two sub-trees starting from the current node.
#include <cstdio>
#include <string>

// The same recursion in C++
void tree(int node, int flag, std::string seq)
{
    if (flag == 3) { return; }                                // "111" appeared
    if (node == 0) { printf("%s\n", seq.c_str()); return; }  // complete sequence
    std::string seq1 = seq + '1';
    std::string seq2 = seq + '0';
    tree(node - 1, flag + 1, seq1);
    tree(node - 1, 0, seq2);
}
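The tree walk above prints every valid sequence, but for N = 40 there are more than 4 * 10^10 of them, so enumeration is impractical when only the count is needed. A sketch in Python 3 (the name `count_sequences` is hypothetical) that keeps the same two parameters, the remaining length and the current run of 1s, but memoizes counts instead of printing:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_sequences(remaining, ones_in_a_row=0):
    """Count binary strings of length `remaining` that can follow a prefix
    ending in `ones_in_a_row` consecutive 1s without ever creating 111."""
    if ones_in_a_row == 3:
        return 0  # same cut-off as the PHP/C++ flag check
    if remaining == 0:
        return 1  # a complete, valid sequence (a leaf of the tree)
    return (count_sequences(remaining - 1, ones_in_a_row + 1)  # append '1'
            + count_sequences(remaining - 1, 0))               # append '0'

print(count_sequences(3), count_sequences(40))
```

Only about 4 * 41 distinct (remaining, run) pairs exist, so the memoized version does a tiny fraction of the tree's work.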
You can write a grammar for the (non-empty) strings of this language. It's designed so that each string appears exactly once.
S := 0 | 1 | 11 | 10 | 110 | 0S | 10S | 110S
Let a_i be the total number of strings of length i in S.
First, look at the number of strings of length 1 on both sides of the grammar rule. On the left-hand side there are a_1 of them by definition; on the right-hand side, only the rules 0 and 1 produce strings of length 1, so:
a_1 = 2
For a_2, on the right-hand-side we immediately get two strings of length 2 (11 and 10), plus another two from the 0S rule (00 and 01). This gives us:
a_2 = 2 + a_1 = 4
Similarly, for a_3 we get one string directly (110), plus a_2 strings from the 0S rule and a_1 from the 10S rule:
a_3 = 1 + a_2 + a_1 = 7
(So far so good, we've got the right solution 7 for the case where the strings are length three).
For i > 3, no rule produces a string of length i directly, so counting strings of length i on both sides gives a_{i-1} from 0S, a_{i-2} from 10S and a_{i-3} from 110S:
a_i = a_{i-1} + a_{i-2} + a_{i-3}
Now we've got a recurrence we can use. A quick check for a_4...
a_4 = a_1 + a_2 + a_3 = 2 + 4 + 7 = 13.
There are 16 strings of length 4, and three of them contain 111: 1110, 0111 and 1111. So 13 looks right!
Here's some code in Python for the general case, using this recurrence.
def strings_without_111(n):
    if n == 0:
        return 1  # only the empty string
    a = [2, 4, 7]  # a_1, a_2, a_3
    for _ in range(n - 1):
        a = [a[1], a[2], a[0] + a[1] + a[2]]  # slide the window forward by one length
    return a[0]
This is a DP problem. I will explain the solution in a way that makes it easy to modify to count the sequences that do not contain some other pattern a0a1a2 (where each ai is an arbitrary binary value).
I will use 4 helper variables, each counting the valid sequences of a given length that end with 00, 01, 10 and 11 respectively. Call them c00, c01, c10 and c11. For length N = 2, those numbers are clearly all 1:
int c00 = 1;
int c01 = 1;
int c10 = 1;
int c11 = 1;
Now, assuming we have counted the sequences of a given length k, we count the sequences in the four groups for length k + 1 as follows:
int new_c00 = c10 + c00;
int new_c01 = c10 + c00;
int new_c10 = c01 + c11;
int new_c11 = c01;
The logic is simple: if we append a 0 to a sequence of length k ending in 0 0 or in 1 0, we get a sequence of length k + 1 ending in 0 0, and likewise for the other equations.
Note that c11 is not added to the count of sequences of length k + 1 ending in 1 1. That is because appending a 1 to a sequence ending in 1 1 gives an invalid sequence (ending in 1 1 1).
Here is a complete solution for your case:
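#include <iostream>

int main()
{
    int n;
    std::cin >> n;
    // long long, because for n >= 36 the count no longer fits in a 32-bit int
    long long c00 = 1;
    long long c01 = 1;
    long long c10 = 1;
    long long c11 = 1;
    for (int i = 0; i < n - 2; ++i) {
        long long new_c00 = c10 + c00;
        long long new_c01 = c10 + c00;
        long long new_c10 = c01 + c11;
        long long new_c11 = c01;
        c00 = new_c00;
        c01 = new_c01;
        c10 = new_c10;
        c11 = new_c11;
    }
    // total valid sequences of length n
    long long result = c00 + c01 + c10 + c11;
    std::cout << result << std::endl;
}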
Also you will have to take special care for the case when N < 2, because the above solution does not handle that correctly.
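The answer notes that this scheme adapts easily to any forbidden three-bit pattern a0a1a2. A sketch of that generalization in Python 3 (the function name and the `pattern` parameter are illustrative): keep one counter per two-bit ending, and when appending a bit, skip exactly the transition that would complete the pattern. Lengths below 3 simply return 2^n, since such strings are too short to contain the pattern.

```python
from itertools import product

def count_avoiding(n, pattern="111"):
    """Count binary strings of length n that never contain the 3-bit `pattern`."""
    if n < 3:
        return 2 ** n  # too short to contain a 3-bit pattern
    # counts["ab"] = number of valid strings of the current length ending in bits a, b
    counts = {a + b: 1 for a, b in product("01", repeat=2)}  # length 2: all four are valid
    for _ in range(n - 2):
        new_counts = dict.fromkeys(counts, 0)
        for ending, c in counts.items():
            for bit in "01":
                if ending + bit != pattern:  # skip the transition that completes the pattern
                    new_counts[ending[1] + bit] += c
        counts = new_counts
    return sum(counts.values())

print([count_avoiding(n) for n in range(1, 9)])
```

With pattern "111" the only skipped transition is appending 1 to an ending of 11, which reproduces the four C++ update equations.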
Finding the number of all possible sequences of N bits is easy: it is 2^N.
Finding the number of sequences that contain 111 is a bit harder.
Assume N=3 then Count = 1
111
Assume N=4 then Count = 3
0111
1110
1111
Assume N=5 then Count = 8
11100
11101
11110
11111
01110
01111
00111
10111
If you write a simple simulation program, it yields 1 3 8 20 47 107 ... for N = 3, 4, 5, ...
Subtracting, 2^n - count(n) for n = 1, 2, 3, ... gives 2 4 7 13 24 44 81 149 ...
Searching for this sequence in the OEIS shows that it is a shifted run of the tribonacci numbers, which satisfy the simple recurrence:
a(n) = a(n - 1) + a(n - 2) + a(n - 3)
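A sketch of the "simple simulation program" mentioned above (Python 3, written for illustration; both function names are made up): it brute-forces the number of n-bit strings containing 111, prints the complement 2^n - count(n), and compares it with the recurrence seeded with 2, 4, 7 for n = 1, 2, 3.

```python
def count_containing_111(n):
    """Brute force: how many n-bit strings contain the substring 111."""
    return sum(1 for k in range(2 ** n) if "111" in format(k, "b").zfill(n))

def by_recurrence(n):
    """a(n) = a(n-1) + a(n-2) + a(n-3), with a(1), a(2), a(3) = 2, 4, 7."""
    a = [2, 4, 7]
    while len(a) < n:
        a.append(a[-1] + a[-2] + a[-3])
    return a[n - 1]

for n in range(3, 13):
    containing = count_containing_111(n)
    print(n, containing, 2 ** n - containing, by_recurrence(n))
```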

0 or 1 combinations such that we do not have two 1's immediately in sequence

I need code that finds the number of strings of X digits, using only the digits 0 and 1, where X may vary from 1 to 1000, such that two 1s are never adjacent (adjacent 0s are allowed).
For example, for an input of 4 digits we have:
1010 1000 0000 0101 0001 0010 0100 1001
I am not sure which algorithm to use to generate such combinations of 0s and 1s.
The answer is given by the Fibonacci sequence.
f(n) = f(n-1) + f(n-2)
Here are the first few results:
length number of combinations
1 2 (0, 1)
2 3 (00, 01, 10)
3 5 (000, 001, 010, 100, 101)
4 8 (0000, 0001, 0010, 0100, 0101, 1000, 1001, 1010)
You can see why there is a relationship to the Fibonacci sequence if you consider strings starting with "0" or "10" separately:
number of sequences of n digits
= number of sequences starting with 0, followed by n-1 more digits
+ number of sequences starting with 10, followed by n-2 more digits
Sequences starting with "11" are disallowed.
The Fibonacci numbers can be calculated very quickly if an appropriate technique is used, but be aware that the answer grows very quickly as the length increases. If you want an exact answer, you will need a library that can work with arbitrarily large integers.
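For lengths up to 1000, a plain loop over f(n) = f(n-1) + f(n-2) with f(1) = 2 and f(2) = 3 is already fast. A sketch in Python 3, whose built-in integers are arbitrary-precision, so no extra library is needed there; the function names are illustrative, and `brute_force` is only a cross-check for small lengths.

```python
def count_no_adjacent_ones(n):
    """Number of binary strings of length n with no two adjacent 1s."""
    if n == 0:
        return 1  # the empty string
    prev, cur = 1, 2  # f(0) = 1, f(1) = 2
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur

def brute_force(n):
    return sum(1 for k in range(2 ** n) if "11" not in format(k, "b"))

print([count_no_adjacent_ones(n) for n in range(1, 5)])  # [2, 3, 5, 8], matching the table above
print(len(str(count_no_adjacent_ones(1000))), "decimal digits for length 1000")
```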
One idea is to build the complete string by using the words 10 and 0 (and 1, but only at the very end).
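def build(sofar, maxlen, found):
    if len(sofar) > maxlen:  # overshot the target length
        return
    if len(sofar) == maxlen:
        found(sofar)  # e.g. found=print, or a list's append method
        return
    if len(sofar) == maxlen - 1:  # a lone "1" is only allowed as the last digit
        build(sofar + "1", maxlen, found)
    build(sofar + "10", maxlen, found)
    build(sofar + "0", maxlen, found)

build("", 4, print)  # prints the 8 valid strings of length 4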
The proof that this algorithm generates only valid sequences is left to you, as is the proof that it generates all valid sequences.
How about having a function that generates these values into arrays, and another function that just checks if the current index to a value in the array is a '1' and checks if the next value is a '1' or not? If true, then discard; else, valid.

Ternary Numbers, regex

I'm looking for some regex/automata help. I'm limited to + or the Kleene star. Parsing a string that represents a ternary number (like binary, but base 3), I need to know whether its value is one less than a multiple of 4.
For example, 120 = 0*1 + 2*3 + 1*9 = 6 + 9 = 15 = 16 - 1, which has the form 4n - 1 (with n = 4).
Even a pointer to the pattern would be really helpful!
You can generate a series of values to do some observation with bc in bash:
for n in {1..40}; do v=$((4*n-1)); echo -en $v"\t"; echo "ibase=10;obase=3;$v" | bc ; done
3 10
7 21
11 102
15 120
19 201
23 212
27 1000
31 1011
...
Notice that each digit's value (in decimal) is either 1 more or 1 less than something divisible by 4, alternately. So the 1 (lsb) digit is one more than 0, the 3 (2nd) digit is one less than 4, the 9 (3rd) digit is 1 more than 8, the 27 (4th) digit is one less than 28, etc.
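If you sum up all the even-placed digits and all the odd-placed digits (counting positions from 1 at the least significant digit), then add 1 to the odd-placed sum, the two sums should be equal modulo 4. Exact equality is not required: 11 is 102 in base 3, where the odd-placed sum is 2 + 1 + 1 = 4 and the even-placed sum is 0.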
In your example: odd: (0+1)+1, even: (2). So they are equal, and so the number is of the form 4n-1.
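Since the question asks about automata, here is a sketch of the corresponding four-state DFA in Python 3 (an illustration, not taken from either answer; the function names are made up). Reading base-3 digits most significant first, the state is the value so far modulo 4, updated as state = (3 * state + digit) mod 4, and the string is accepted when the final state is 3, i.e. the number has the form 4n - 1. A regular expression could then be derived from this DFA in the same way as the binary multiple-of-3 derivation at the top of this page.

```python
def ternary_mod_4(digits):
    """DFA over base-3 digits (most significant first); the state is the value mod 4."""
    state = 0
    for ch in digits:
        state = (3 * state + int(ch)) % 4
    return state

def is_one_less_than_multiple_of_4(digits):
    return ternary_mod_4(digits) == 3  # 4n - 1 leaves remainder 3

def to_ternary(v):
    """Base-3 representation of a non-negative integer (used for testing)."""
    out = ""
    while True:
        v, r = divmod(v, 3)
        out = str(r) + out
        if v == 0:
            return out

print([to_ternary(4 * n - 1) for n in range(1, 9)])  # same values as the bc listing
print(is_one_less_than_multiple_of_4("120"))
```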