Constructing Strings using Regular Expressions and Boolean logic || - regex

How do I construct strings with exactly one occurrence of 111 from a set E* consisting of all possible combinations of elements in the set {0,1}?

You can generate the set of strings based on following steps:
Some chucks of numbers and their legal position are enumerated:
Start: 110
Must have one, anywhere: 111
Anywhere: 0, 010, 0110
End: 011
Depend on the length of target string (the length should be bigger than 3)
Condition 1: Length = 3 : {111}
Condition 2: 6 > Length > 3 : (Length-3) = 1x + 3y + 4z
For example, if length is 5: answer is (2,1,0) and (1,0,1)
(2,1,0) -> two '0' and one '010' -> ^0^010^ or ^010^0^ (111 can be placed in any one place marked as ^)
(1,0,1) -> one '0' and one '0110' ...
Condition 3: If 9 > Length > 6, you should consider the solution of two formulas:
Comments:
length – 3 : the length exclude 111
x: the times 0 occurred
y: the times 010 occurred
z: the times 0110 occurred
Finding all solutions {(x,y,z) | 1x + 3y + 4z = (Length - 3)} ----(1)
For each solution, you can generate one or more qualified string. For example, if you want to generate strings of length 10. One solution of (x,y,z) is (0,2,1), that means '010' should occurred twice and '0110' should occurred once. Based on this solution, the following strings can be generated:
0: x0 times
010: x 2 times
0110: x1 times
111: x1 times (must have)
Finding the permutations of elements above.
010-0110-010-111 or 111-010-010-0110 …
(Length - 6) = 1x + 3y + 4z ---(2)
Similar as above case, find all permutations to form an intermediate string.
Finally, for each intermediate string Istr, Istr + 011 or 110 + Istr are both qualified.
For example, (10-6) = 1*0 + 3*0 + 4*1 or = 1*1 + 3*1 + 4*0
The intermediate string can be composed by one '0110' for answer(0,0,1):
Then ^0110^011 and 110^0110^ are qualified strings (111 can be placed in any one place marked as ^)
Or the intermediate string can also be composed by one '0' and one '010' for answer (1,1,0)
The intermediate string can be 0 010 or 010 0
Then ^0010^011 and 110^0100^ are qualified strings (111 can be placed in any one place marked as ^)
Condition 4: If Length > 9, an addition formula should be consider:
(Length – 9) = 1x + 3y + 4z
Similar as above case, find all permutations to form an intermediate string.
Finally, for each intermediate string Istr, 110 + Istr + 011 is qualified.
Explaination:
The logic I use is based on Combinatorial Mathematics. A target string is viewed as a combination of one or more substrings. To fulfill the constraint ('111' appears exactly one time in target string), we should set criteria on substrings. '111' is definitely one substring, and it can only be used one time. Other substrings should prevent to violate the '111'-one-time constraint and also general enough to generate all possible target string.
Except the only-one-111, other substrings should not have more than two adjacent '1'. (Because if other substring have more than two adjacent 1, such as '111', '1111', '11111,' the substring will contain unnecessary '111'). Except the only-one-111, other substrings should not have more than two consecutive '1'. Because if other substring have more than two consecutive 1, such as '111', '1111', '11111,' the substring will contain unnecessary '111' . However, substrings '1' and '11' cannot ensure the only-one-111 constraint. For example, '1'+'11,' '11'+'11' or '1'+'1'+'1' all contain unnecessary '111'. To prevent unnecessary '111,' we should add '0' to stop more adjacent '1'. That results in three qualified substring '0', '010' and '0110'. Any combined string made from three qualified substring will contain zero times of '111'.
Above three qualified substring can be placeed anywhere in the target string, since they 100% ensure no additional '111' in target string.
If the substring's position is in the start or end, they can use only one '0' to prevent '111'.
In start:
10xxxxxxxxxxxxxxxxxxxxxxxxxxxx
110xxxxxxxxxxxxxxxxxxxxxxxxxxx
In end:
xxxxxxxxxxxxxxxxxxxxxxxxxxx011
xxxxxxxxxxxxxxxxxxxxxxxxxxxx01
These two cases can also ensure no additional '111'.
Based on logics mentioned above. We can generate any length of target string with exactly one '111'.

Your question could be clearer.
For one thing, does "1111" contain one occurrence of "111" or two?
If so, you want all strings that contain "111" but do not contain either "1111" or "111.*111". If not, omit the test for "1111".
If I understand you correctly, you're trying to construct an infinite subset of the infinite set of sequences of 0s and 1s. How you do that is probably going to depend on the language you're using (most languages don't have a way of representing infinite sets).
My best guess is that you want to generate a sequence of all sequences of 0s and 1s (which shouldn't be too hard) and select the ones that meet your criteria.

Related

Regex exact number of exact characters in random position

For example, I have a string:
R-G-B-G
I want to know is this string contains at least 2 'G', 1 'R' and 1 'B'.
String can change it size from 1 to 11 symbols, letters can appear at random position separated by '-'.
3 'R' -> R-R-G-R-R-B -> match
2 'B' -> ... -> not match
1 'R' 1 'G' 1 'B' -> ... -> match
Edit:
Moving forward i need to check some more cases.
My strings consist of Sockets, Links and Colors, ex:
R R-R B-R-R has 6 Sockets, one 2-link, one 3-link, 5R, 1B and 0G colors.
User sends me a number of sockets, length of the link, and desired colors in that link, and i must tell if it has all that or not. I've built expressions to match each individual case, but i cant figure out how to put them together.
6 sockets, Link with size 2, 2 R colors
All of that combined must return true because R R-R B-R-R, on the other hand, string: R R R B-R-R must be false because 2 R link is part of a 3 link B-R-R.
Or should i just run 3 separate expressions with output of previous sent as an input to the next.
You can chain look-aheads (?=...) after the start ^ for each required letter.
F.e. for at least two 'R' and two 'G' and one 'B'.
And for a string that's 1 to 11 long.
^(?=(?:.*?R){2})(?=(?:.*?G){2})(?=(?:.*?B){1}).{1,11}$
Test here on regex101
Note that the lazy searches *? used in those look-aheads is just a speed optimalization.
It won't matter much in this case, since it's such a short string.

Linear time construction of a data structure to answer in constant time if two strings have 2 common letters

Problem statement:
Given a string which contains words separated by blanks(spaces) it's required to construct a data structure in linear time (O(n) where n is the number of characters in the string) that can answer in constant time (O(1)) if two words in that string have two common characters.
Any ideas on how that data structure would look like?
Thanks.
Your question is ambiguous.
What is the definition of a word?
Exactly two characters in common or at least two characters in common?
Do you just need to answer if a) there exists two words in the string which have two common characters or b) you will be given some pairs of words and need to answer if those words have two common characters for each pair?
In case of (b) of point 3, what is the format of input? Will you be given whole word as input or just the indices of the words in the string?
I'm assuming a word consists of only alphabetical character (a-zA-Z) and you will be given arbitrary pair of words by their index in the input string.
Define array f such that, (f[a],...,f[z]) = (0,...,25) and (f[A],...,f[Z]) = (26,...,51).
Let input string S consists of words W[0], W[1], ..., W[n-1]. Define a map from a word to integer by F(W) = OR(1 << f[c]) for all characters c in word W
If we have to find out if two words W1, W2 has (assuming at least) two characters in common, we just need to find out if B = F(W1) & F(W2) has at least two bits. You can either loop over all bits of B to find if it has at least two bits, which is still O(1), or you can check B && (B & (B-1)) is true. (Explanation: B & (B-1) unset the lowest set bit of B, so if that is non-zero B must have at least two bits)
Now you can precompute F(w) for each word in O(|S|) and then output for each query in O(1) by comparing the F-values.

Regex Verification of String in Correct Order with Delimiters in PHP

I'm trying to make a expression to verify that the string supplied is a valid format, but it seems that if I don't use regex in a few months, I forget everything I learned and have to relearn it.
My expression is supposed to match a format like this: 010L0404FFCCAANFFCC00M000000XXXXXX
The four delimiters are (L, N, K, M) which arent in the 0-9A-F hexidecimal range to indicate uniqueness must be in that order or not in the list. Each delimiter can only exist once!
It breaks down to this:
Starts off with a 3 digit numbers, which is simply ^([0-9]{3}) and is always required
Second set begins with L, and must be 2 digits + 2 digits + 6 hexdecimal and is optional
Third set begins with N and must be a 6 digit hexdecimal and is optional
The fourth set K is simply any amount of numbers and is optional
The fifth set is M and can be any 6 hexdecimals or XXXXXX to indicate nothing, it must be in multiples of 6 excluding 0, like 336699 (6) or 336699XXXXXXFFCC00 (18) and is optional
The hardest part I cant figure out making it require it in that order, and in multiples, like the L delimiter must come before and K always if it's there (the reason so I don't get variations of the same string which means the same thing with delimiters swapped). I can already parse it, I just want to verify the string is the correct format.
Any help would be appreciated, thanks.
Requiring the order isn't too bad. Just make each set optional. The regex will still match in order, so if the L section, for example, isn't there and the next character is N, it won't let L occur later since it won't match any of the rest of the regex.
I believe a direct translation of your requirements would be:
^([0-9]{3})(L[0-9]{4}[0-9A-F]{6})?(N[0-9A-F]{6})?(K[0-9]+)?(M([0-9A-F]{6}|X{6})+)?$
No real tricks, just making each group optional except for the first three digits, and adding an internal alternative for the two patterns of six digits in the M block.
^([0-9]{3})(L[0-9]{4}[0-9A-F]{6})?(N[0-9A-F]{6})?(K[0-9]+)?(M([0-9A-F]{6})+|MX{6})$

Adding one digit (0-9) to the sequence/string creates new 4 digits number

I'm trying to find an algorithm which "breaks the safe" by typing the keys 0-9. The code is 4 digits long. The safe will be open where it identifies the code as substring of the typing. meaning, if the code is "3456" so the next typing will open the safe: "123456". (It just means that the safe is not restarting every 4 keys input).
Is there an algorithm which every time it add one digit to the sequence, it creates new 4 digits number (new combinations of the last 4 digits of the sequence\string)?
thanks, km.
Editing (I post it years ago):
The question is how to make sure that every time I set an input (one digit) to the safe, I generate a new 4 digit code that was not generated before. For example, if the safe gets binary code with 3 digits long then this should be my input sequence:
0001011100
Because for every input I get a new code (3 digit long) that was not generated before:
000 -> 000
1 -> 001
0 -> 010
1 -> 101
1 -> 011
1 -> 111
0 -> 110
0 -> 100
I found a reduction to your problem:
Lets define directed graph G = (V,E) in the following way:
V = {all possible combinations of the code}.
E = {< u,v > | v can be obtained from u by adding 1 digit (at the end), and delete the first digit}.
|V| = 10^4.
Din and Dout of every vertex equal to 10 => |E| = 10^5.
You need to prove that there is Hamilton cycle in G - if you do, you can prove the existence of a solution.
EDIT1:
The algorithm:
Construct directed graph G as mentioned above.
Calculate Hamilton cycle - {v1,v2,..,vn-1,v1}.
Press every number in v1.
X <- v1.
while the safe isn't open:
5.1 X <- next vertex in the Hamilton path after X.
5.2 press the last digit in X.
We can see that because we use Hamilton cycle, we never repeat the same substring. (The last 4 presses).
EDIT2:
Of course Hamilton path is sufficient.
Here in summary is the problem I think you are trying to solve and some explanation on how i might approach solving it. http://www.cs.swan.ac.uk/~csharold/cpp/SPAEcpp.pdf
You have to do some finessing to make it fit into the chinese post man problem however...
Imagine solving this problem for the binary digits, three digits strings. Assume you have the first two digits, and ask your self what are my options to move to? (In regards to the next two digit string?)
You are left with a Directed Graph.
/-\
/ V
\- 00 ----> 01
^ / ^|
\/ ||
/\ ||
V \ |V
/-- 11 ---> 10
\ ^
\-/
Solve the Chinese Postman, you will have all combinations and will form one string
The question is now, is the Chinese postman solvable? There are algorithms which determine weather or not a DAG is solvable for the CPP, but i don't know if this particular graph is necessarily solvable based on the problem alone. That would be a good thing to determine. You do however know you could find out algorithmically weather it is solvable and if it is you could solve it using algorithms available in that paper (I think) and online.
Every vertex here has 2 incoming edges and 2 outgoing edges.
There are 4 (2^2) vertexes.
In the full sized problem there are 19683( 3 ^ 9 ) vertexs and every vertex has 512 ( 2 ^ 9 ) out going and incoming vertexes. There would be a total of
19683( 3 ^ 9 ) x 512 (2 ^ 9) = 10077696 edges in your graph.
Approach to solution:
1.) Create list of all 3 digit numbers 000 to 999.
2.) Create edges for all numbers such that last two digits of first number match first
two digits of next number.
ie 123 -> 238 (valid edge) 123 -> 128 (invalid edge)
3.) use Chinese Postman solving algorithmic approaches to discover if solvable and
solve
I would create an array of subsequences which needs to be updates upon any insertion of a new digit. So in your example, it will start as:
array = {1}
then
array = {1,2,12}
then
array = {1,2,12,3,13,23,123}
then
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234}
and when you have a sequence that is already at the length=4 you don't need to continue the concatenation, just remove the 1st digit of the sequence and insert the new digit at the end, for example, use the last item 1234, when we add 5 it will become 2345 as follows:
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234,5,15,25,125,35,135,235,1235,45,145,245,1245,345,1345,2345,2345}
I believe that this is not a very complicated way of going over all the sub-sequences of a given sequence.

Regular expression for bit strings with even number of 1s

Let L= { w in (0+1)* | w has even number of 1s}, i.e. L is the set of all bit strings with even number of 1s. Which one of the regular expressions below represents L?
A) (0*10*1)*
B) 0*(10*10*)*
C) 0*(10*1)* 0*
D) 0*1(10*1)* 10*
According to me option D is never correct because it does not represent the bit string with zero 1s. But what about the other options? We are concerned about the number of 1s(even or not) not the number of zeros doesn't matter.
Then which is the correct option and why?
A if false. It doesn't get matched by 0110 (or any zeros-only non-empty string)
B represents OK. I won't bother proving it here since the page margins are too small.
C doesn't get matched by 010101010 (zero in the middle is not matched)
D as you said doesn't get matched by 00 or any other # with no ones.
So only B
To solve such a problem you should
Supply counterexample patterns to all "incorrect" regexps. This will be either a string in L that is not matched, or a matched string out of L.
To prove the remaining "correct" pattern, you should answer two questions:
Does every string that matches the pattern belong to L? This can be done by devising properties each of matched strings should satisfy--for example, number of occurrences of some character...
Is every string in L matched by the regexp? This is done by dividing L into easily analyzable subclasses, and showing that each of them matches pattern in its own way.
(No concrete answers due to [homework]).
Examining the pattern B:
^0*(10*10*)*$
^ # match beginning of string
0* # match zero or more '0'
( # start group 1
10* # match '1' followed by zero or more '0'
10* # match '1' followed by zero or more '0'
)* # end group 1 - match zero or more times
$ # end of string
Its pretty obvious that this pattern will only match strings who have 0,2,4,... 1's.
Look for examples that should match but don't. 0, 11011, and 1100 should all match, but each one fails for one of those four
C is incorrect because it does not allow any 0s between the second 1 of one group and the first 1 of the next group.
This answer would be best for this language
(0*10*10*)
a quick python script actually eliminated all the possibilities:
import re
a = re.compile("(0*10*1)*")
b = re.compile("0*(10*10*)*")
c = re.compile("0*(10*1)* 0*")
d = re.compile("0*1(10*1)* 10*")
candidates = [('a',a),('b',b),('c',c),('d',d)]
tests = ['0110', '1100', '0011', '11011']
for test in tests:
for candidate in candidates:
if not candidate[1].match(test):
candidates.remove(candidate)
print "removed %s because it failed on %s" % (candidate[0], test)
ntests = ['1', '10', '01', '010', '10101']
for test in ntests:
for candidate in candidates:
if candidate[1].match(test):
candidates.remove(candidate)
print "removed %s because it matched on %s" % (candidate[0], test)
the output:
removed c because it failed on 0110
removed d because it failed on 0110
removed a because it matched on 1
removed b because it matched on 10