The Art of Computer Programming exercise question: Chapter 1, Question 8 - knuth

I'm doing the exercises to TAOCP Volume 1 Edition 3 and have trouble understanding the syntax used in the answer to the following exercise.
Chapter 1 Exercise 8
Computing the greatest common divisor of positive integers m & n by specifying Tj,sj,aj,bj
Let your input be represented by the string ambn (m a's followed by n b's)
Answer:
Let A = {a,b,c}, N=5. The algorithm will terminate with the string agcd(m,n)
j Tj sj bj aj
0 ab (empty) 1 2 Remove one a and one b, or go to 2.
1 (empty) c 0 0 Add c at extreme left, go back to 0.
2 a b 2 3 Change all a's to b's
3 c a 3 4 Change all c's to a's
4 b b 0 5 if b's remain, repeat
The part that I have trouble understanding is simply how to interpret this table.
Also, when Knuth says this will terminate with the string agcd(m,n) -- why the superscript for gcd(m,n) ?
Thanks for any help!
Edited with more questions:
What is Tj -- note that T = Theta
What is sj -- note that s = phi
How do you interpret columns bj and aj?
Why does Knuth switch a new notation in the solution to an example that he doesn't explain in the text? Just frustrating. Thanks!!!

Here's an implementation of that exercise answer. Perhaps it helps.
By the way, the table seems to describe a Markov algorithm.
As far as I understand so far, you start with the first command set, j = 0. Replace any occurencies of Tj with sj and jump to the next command line depending on if you replaced anything (in that case jump to bj, if nothing has been replaced, jump to aj).
EDIT: New answers:
A = {a,b,c} seems to be the character set you can operate with. c comes in during the algorithm (added to the left and later replaced by a's again).
Theta and phi could be some greek character you usually use for something like "original" and "replacement", although I wouldn't know they are.
bj and aj are the table lines to be next executed. This matches with the human-readable descriptions in the last column.
The only thing I can't answer is why Knuth uses this notation without any explanations. I browsed the first chapters and the solutions in the book again and he doesn't mention it anywhere.
EDIT2: Example for gdc(2,2) = 2
Input string: aabb
Line 0: Remove one a and one b, or go to 2.
=> ab => go to 1
Line 1: Add c at extreme left, go back to 0.
=> cab => go to 0
Line 0: Remove one a and one b, or go to 2.
=> c => go to 1
Line 1: Add c at extreme left, go back to 0.
=> cc => go to 0
Line 0: Remove one a and one b, or go to 2.
No ab found, so go to 2
Line 2: Change all a's to b's
No a's found, so go to 3
Line 3: Change all c's to a's
=> aa
Line 4: if b's remain, repeat
No b's found, so go to 5 (end).
=> Answer is "aa" => gdc(2,2) = 2
By the way, I think description to line 1 should be "Remove one "ab", or go to 2." This makes things a bit clearer.

The superscript for gcd(m,n) is due to how numbers are being represented in this table.
For example: m => a^m
n => b^n
gcd(m,n) => a^gcd(m,n)
It looks to be like Euclids algorithm is being implemented.
i.e.
gcd(m,n):
if n==0:
return m
return gcd(n,m%n)
The numbers are represented as powers so as to be able to do the modulo operation m%n.
For example, 4 % 3, will be computed as follows:
4 'a's (a^4) mod 3 'b's (b^3), which will leave 1 'a' (a^1).

the notion of am is probably a notion of input string in the state machine context.
Such notion is used to refer to m instances of consecutive a, i.e.:
a4 = aaaa
b7 = bbbbbbb
a4b7a3 = aaaabbbbbbbaaa
And what agcd(m,n) means is that after running the (solution) state machine, the resulting string should be gcd(m,n) instances of a
In other words, the number of a's in the result should be equal to the result of gcd(m,n)
And I agree with #schnaader in that it's probably a table describing Markov algorithm usages.

Related

Regular Expression for Binary Numbers Divisible by 5

I want to write a regular expression for Binary Numbers Divisible by 5.
I have already done the regular expressions for Binary Numbers Divisible by 2 and for 3 but I couldn't find one for 5.
Any suggestions?
(0|1(10)*(0|11)(01*01|01*00(10)*(0|11))*1)*
Add ^$ to test it with regexp. See it working here.
You can build a DFA and convert it to regular expression. The DFA was already built in another answer. You can read it, it is very well explained.
The general idea is to remove nodes, adding edges.
Becomes:
Using this transformation and the DFA from the answer I linked, here are the steps to get the regular expression:
(EDIT: Note that the labels "Q3" and "Q4" have been mistakenly swapped in the diagram. These states represent the remainder after modulus 5.)
2^0 = 1 = 1 mod 5
2^1 = 2 = 2 mod 5
2^2 = 4 = -1 mod 5
2^3 = 8 = -2 mod 5
2^4 = 16 = 1 mod 5
2^5 = 32 = 2 mod 5
... -1 mod 5
... -2 mod 5
So we have a 1, 2, -1, -2 pattern. There are two subpatterns where only the sign of the number alternates: Let n is the digit number and the number of the least significant digit is 0; odd pattern is
(-1)^(n)
and even pattern is
2x((-1)^(n))
So, how to use this?
Let the original number be 100011, divide the numbers digits into two parts, even and odd. Sum each parts digits separately. Multiply sum of the odd digits by 2. Now, if the result is divisible by sum of the even digits, then the original number is divisible by 5, else it is not divisible. Example:
100011
1_0_1_ 1+0+1 = 2
_0_0_1 0+0+1 = 1; 1x2 = 2
2 mod(2) equals 0? Yes. Therefore, original number is divisible.
How to apply it within a regex? Using callout functions within a regex it can be applied. Callouts provide a means of temporarily passing control to the script in the middle of regular expression pattern matching.
However, ndn's answer is more appropriate and easier, therefore I recommend to use his answer.
However, "^(0|1(10)*(0|11)(01*01|01*00(10)*(0|11))1)$" matches empty string.

How to create a column that is a repetition of a column's contents by a specified number of times mentioned in another column?

I want to use columns 'A' and 'B' to create column 'Result' which is content of A repeated B number of times
A B Result
z 3 zzz
az 2 azaz
Tried using Result=repeat(A,B) which didn't work out. Is there something I missed while using the repeat statement?
The REPEAT function returns a character value consisting of the first argument repeated n times, Thus, the first argument appears n + 1 times in the result.
So, you have to subtract 1 from B to get the result that you want.
Try
Result=repeat(A,int(B)-1)
It is simple in R! . Sorry I did not look for the tag, but here is how R does it
Try the function makeNstr() from package Hmisc
>require(Hmisc)
>df <- data.frame(A = c("a","az"), B = c(3,2))
>Result <- makeNstr(df$A,df$B)
>df <- cbind(df,Result)
>df
A B Result
1 a 3 aaa
2 az 2 azaz
Hope you find it useful

Adding one digit (0-9) to the sequence/string creates new 4 digits number

I'm trying to find an algorithm which "breaks the safe" by typing the keys 0-9. The code is 4 digits long. The safe will be open where it identifies the code as substring of the typing. meaning, if the code is "3456" so the next typing will open the safe: "123456". (It just means that the safe is not restarting every 4 keys input).
Is there an algorithm which every time it add one digit to the sequence, it creates new 4 digits number (new combinations of the last 4 digits of the sequence\string)?
thanks, km.
Editing (I post it years ago):
The question is how to make sure that every time I set an input (one digit) to the safe, I generate a new 4 digit code that was not generated before. For example, if the safe gets binary code with 3 digits long then this should be my input sequence:
0001011100
Because for every input I get a new code (3 digit long) that was not generated before:
000 -> 000
1 -> 001
0 -> 010
1 -> 101
1 -> 011
1 -> 111
0 -> 110
0 -> 100
I found a reduction to your problem:
Lets define directed graph G = (V,E) in the following way:
V = {all possible combinations of the code}.
E = {< u,v > | v can be obtained from u by adding 1 digit (at the end), and delete the first digit}.
|V| = 10^4.
Din and Dout of every vertex equal to 10 => |E| = 10^5.
You need to prove that there is Hamilton cycle in G - if you do, you can prove the existence of a solution.
EDIT1:
The algorithm:
Construct directed graph G as mentioned above.
Calculate Hamilton cycle - {v1,v2,..,vn-1,v1}.
Press every number in v1.
X <- v1.
while the safe isn't open:
5.1 X <- next vertex in the Hamilton path after X.
5.2 press the last digit in X.
We can see that because we use Hamilton cycle, we never repeat the same substring. (The last 4 presses).
EDIT2:
Of course Hamilton path is sufficient.
Here in summary is the problem I think you are trying to solve and some explanation on how i might approach solving it. http://www.cs.swan.ac.uk/~csharold/cpp/SPAEcpp.pdf
You have to do some finessing to make it fit into the chinese post man problem however...
Imagine solving this problem for the binary digits, three digits strings. Assume you have the first two digits, and ask your self what are my options to move to? (In regards to the next two digit string?)
You are left with a Directed Graph.
/-\
/ V
\- 00 ----> 01
^ / ^|
\/ ||
/\ ||
V \ |V
/-- 11 ---> 10
\ ^
\-/
Solve the Chinese Postman, you will have all combinations and will form one string
The question is now, is the Chinese postman solvable? There are algorithms which determine weather or not a DAG is solvable for the CPP, but i don't know if this particular graph is necessarily solvable based on the problem alone. That would be a good thing to determine. You do however know you could find out algorithmically weather it is solvable and if it is you could solve it using algorithms available in that paper (I think) and online.
Every vertex here has 2 incoming edges and 2 outgoing edges.
There are 4 (2^2) vertexes.
In the full sized problem there are 19683( 3 ^ 9 ) vertexs and every vertex has 512 ( 2 ^ 9 ) out going and incoming vertexes. There would be a total of
19683( 3 ^ 9 ) x 512 (2 ^ 9) = 10077696 edges in your graph.
Approach to solution:
1.) Create list of all 3 digit numbers 000 to 999.
2.) Create edges for all numbers such that last two digits of first number match first
two digits of next number.
ie 123 -> 238 (valid edge) 123 -> 128 (invalid edge)
3.) use Chinese Postman solving algorithmic approaches to discover if solvable and
solve
I would create an array of subsequences which needs to be updates upon any insertion of a new digit. So in your example, it will start as:
array = {1}
then
array = {1,2,12}
then
array = {1,2,12,3,13,23,123}
then
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234}
and when you have a sequence that is already at the length=4 you don't need to continue the concatenation, just remove the 1st digit of the sequence and insert the new digit at the end, for example, use the last item 1234, when we add 5 it will become 2345 as follows:
array = {1,2,12,3,13,23,123,4,14,24,124,34,134,234,1234,5,15,25,125,35,135,235,1235,45,145,245,1245,345,1345,2345,2345}
I believe that this is not a very complicated way of going over all the sub-sequences of a given sequence.

Constructing Strings using Regular Expressions and Boolean logic ||

How do I construct strings with exactly one occurrence of 111 from a set E* consisting of all possible combinations of elements in the set {0,1}?
You can generate the set of strings based on following steps:
Some chucks of numbers and their legal position are enumerated:
Start: 110
Must have one, anywhere: 111
Anywhere: 0, 010, 0110
End: 011
Depend on the length of target string (the length should be bigger than 3)
Condition 1: Length = 3 : {111}
Condition 2: 6 > Length > 3 : (Length-3) = 1x + 3y + 4z
For example, if length is 5: answer is (2,1,0) and (1,0,1)
(2,1,0) -> two '0' and one '010' -> ^0^010^ or ^010^0^ (111 can be placed in any one place marked as ^)
(1,0,1) -> one '0' and one '0110' ...
Condition 3: If 9 > Length > 6, you should consider the solution of two formulas:
Comments:
length – 3 : the length exclude 111
x: the times 0 occurred
y: the times 010 occurred
z: the times 0110 occurred
Finding all solutions {(x,y,z) | 1x + 3y + 4z = (Length - 3)} ----(1)
For each solution, you can generate one or more qualified string. For example, if you want to generate strings of length 10. One solution of (x,y,z) is (0,2,1), that means '010' should occurred twice and '0110' should occurred once. Based on this solution, the following strings can be generated:
0: x0 times
010: x 2 times
0110: x1 times
111: x1 times (must have)
Finding the permutations of elements above.
010-0110-010-111 or 111-010-010-0110 …
(Length - 6) = 1x + 3y + 4z ---(2)
Similar as above case, find all permutations to form an intermediate string.
Finally, for each intermediate string Istr, Istr + 011 or 110 + Istr are both qualified.
For example, (10-6) = 1*0 + 3*0 + 4*1 or = 1*1 + 3*1 + 4*0
The intermediate string can be composed by one '0110' for answer(0,0,1):
Then ^0110^011 and 110^0110^ are qualified strings (111 can be placed in any one place marked as ^)
Or the intermediate string can also be composed by one '0' and one '010' for answer (1,1,0)
The intermediate string can be 0 010 or 010 0
Then ^0010^011 and 110^0100^ are qualified strings (111 can be placed in any one place marked as ^)
Condition 4: If Length > 9, an addition formula should be consider:
(Length – 9) = 1x + 3y + 4z
Similar as above case, find all permutations to form an intermediate string.
Finally, for each intermediate string Istr, 110 + Istr + 011 is qualified.
Explaination:
The logic I use is based on Combinatorial Mathematics. A target string is viewed as a combination of one or more substrings. To fulfill the constraint ('111' appears exactly one time in target string), we should set criteria on substrings. '111' is definitely one substring, and it can only be used one time. Other substrings should prevent to violate the '111'-one-time constraint and also general enough to generate all possible target string.
Except the only-one-111, other substrings should not have more than two adjacent '1'. (Because if other substring have more than two adjacent 1, such as '111', '1111', '11111,' the substring will contain unnecessary '111'). Except the only-one-111, other substrings should not have more than two consecutive '1'. Because if other substring have more than two consecutive 1, such as '111', '1111', '11111,' the substring will contain unnecessary '111' . However, substrings '1' and '11' cannot ensure the only-one-111 constraint. For example, '1'+'11,' '11'+'11' or '1'+'1'+'1' all contain unnecessary '111'. To prevent unnecessary '111,' we should add '0' to stop more adjacent '1'. That results in three qualified substring '0', '010' and '0110'. Any combined string made from three qualified substring will contain zero times of '111'.
Above three qualified substring can be placeed anywhere in the target string, since they 100% ensure no additional '111' in target string.
If the substring's position is in the start or end, they can use only one '0' to prevent '111'.
In start:
10xxxxxxxxxxxxxxxxxxxxxxxxxxxx
110xxxxxxxxxxxxxxxxxxxxxxxxxxx
In end:
xxxxxxxxxxxxxxxxxxxxxxxxxxx011
xxxxxxxxxxxxxxxxxxxxxxxxxxxx01
These two cases can also ensure no additional '111'.
Based on logics mentioned above. We can generate any length of target string with exactly one '111'.
Your question could be clearer.
For one thing, does "1111" contain one occurrence of "111" or two?
If so, you want all strings that contain "111" but do not contain either "1111" or "111.*111". If not, omit the test for "1111".
If I understand you correctly, you're trying to construct an infinite subset of the infinite set of sequences of 0s and 1s. How you do that is probably going to depend on the language you're using (most languages don't have a way of representing infinite sets).
My best guess is that you want to generate a sequence of all sequences of 0s and 1s (which shouldn't be too hard) and select the ones that meet your criteria.

simulate a deterministic pushdown automaton (PDA) in c++

I was reading an exercise of UVA, which I need to simulate a deterministic pushdown automaton, to see
if certain strings are accepted or not by PDA on a given entry in the following format:
The first line of input will be an integer C, which indicates the number of test cases. The first line of each test case contains five integers E, T, F, S and C, where E represents the number of states in the automaton, T the number of transitions, F represents the number of final states, S the initial state and C the number of test strings respectively. The next line will contain F integers, which represent the final states of the automaton. Then come T lines, each with 2 integers I and J and 3 strings, L, T and A, where I and J (0 ≤ I, J < E) represent the state of origin and destination of a transition state respectively. L represents the character read from the tape into the transition, T represents the symbol found at the top of the stack and A the action to perform with the top of the stack at the end of this transition (the character used to represent the bottom of the pile is always Z. to represent the end of the string, or unstack the action of not taking into account the top of the stack for the transition character is used <alt+156> £). The alphabet of the stack will be capital letters. For chain A, the symbols are stacked from right to left (in the same way that the program JFlap, ie, the new top of the stack will be the character that is to the left). Then come C lines, each with an input string. The input strings may contain lowercase letters and numbers (not necessarily present in any transition).
The output in the first line of each test case must display the following string "Case G:", where G represents the number of test case (starting at 1). Then C lines on which to print the word "OK" if the automaton accepts the string or "Reject" otherwise.
For example:
Input:
2
3 5 1 0 5
2
0 0 1 Z XZ
0 0 1 X XX
0 1 0 X X
1 1 1 X £
1 2 £ Z Z
111101111
110111
011111
1010101
11011
4 6 1 0 5
3
1 2 b A £
0 0 a Z AZ
0 1 a A AAA
1 0 a A AA
2 3 £ Z Z
2 2 b A £
aabbb
aaaabbbbbb
c1bbb
abbb
aaaaaabbbbbbbbb
this is the output:
Output:
Case 1:
Accepted
Rejected
Rejected
Rejected
Accepted
Case 2:
Accepted
Accepted
Rejected
Rejected
Accepted
I need some help, or any idea how I can simulate this PDA, I am not asking me a code that solves the problem because I want to make my own code (The idea is to learn right??), But I need some help (Some idea or pseudocode) to begin implementation.
You first need a data structure to keep transitions. You can use a vector with a transition struct that contains transition quintuples. But you can use fact that states are integer and create a vector which keeps at index 0, transitions from state 0; at index 1 transitions from state 1 like that. This way you can reduce searching time for finding correct transition.
You can easily use the stack in stl library for the stack. You also need search function it could chnage depending on your implementation if you use first method you can use a function which is like:
int findIndex(vector<quintuple> v)//which finds the index of correct transition otherwise returns -1
then use the return value to get newstate and newstack symbol.
Or you can use a for loop over the vector and bool flag which represents transition is found or not.
On second method you can use a function which takes references to new state and new stack symbol and set them if you find a appropriate transition.
For inputs you can use something like vector or vector depends on personal taste. You can implement your main method with for loops but if you want extra difficulties you can implement a recursive function. May it be easy.