Minimize the number of consecutive equal extractions in a deque of map<string, string> - c++

I hope this place is the best for this kind of question.
I have the following problem (I think it is more complex than it appears).
I'm using a double-ended queue (deque) data structure of strings.
deque < string > extractions;
The deque contains only N different strings, each repeated M times in random order, so that the length of the deque is N*M. For example, suppose M=4, N=2, string1="A", string2="B":
extractions[1] = "A"
extractions[2] = "A"
extractions[3] = "B"
extractions[4] = "B"
extractions[5] = "A"
extractions[6] = "B"
extractions[7] = "B"
extractions[8] = "A"
I'm looking for an algorithm that finds an "interesting" configuration in which no two consecutive elements are equal. In this case there should be only two solutions: "A","B","A","B","A","B","A","B" and "B","A","B","A","B","A","B","A".
By "interesting" I mean a configuration that is not simply produced by N nested loops.
A very naive solution I've implemented is to randomly shuffle the deque with std::random_shuffle until no occurrence of consecutive equal elements is found, but this is slow and crude; it is more like a bogosort...
Clearly maximizing the edit distance between the strings should be better.
Any hints?

Start with a trivial configuration, e.g. for N=4 and M=4, start from
A B C D A B C D A B C D A B C D
and then run a standard shuffling algorithm, but observe the constraint that you never bring two equal elements next to each other, i.e.
for i = 0 .. N*M - 2
let j = random(N*M - 2 - i) + i + 1
if ((i == 0 || array[i - 1] != array[j])
&& (array[i + 1] != array[j])
&& (array[i] != array[j - 1])
&& (j == N*M - 1 || array[i] != array[j + 1]))
swap (array[i], array[j])
This should leave you very quickly with a random configuration that fulfills your requirement of not having two consecutive equal elements.
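As a rough illustration, the constrained shuffle above could be sketched in C++ as follows. The names constrainedShuffle and hasAdjacentEqual are hypothetical, the symbols are assumed to be distinct, and std::mt19937 is just one possible random source; this is a sketch of the idea, not a tuned implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if any two neighbouring elements are equal.
bool hasAdjacentEqual(const std::vector<std::string>& a) {
    return std::adjacent_find(a.begin(), a.end()) != a.end();
}

// Hypothetical helper: builds A B C ... repeated m times, then applies random
// swaps, rejecting any swap that would put two equal symbols side by side.
// Assumes the symbols are distinct and there are at least two of them.
std::vector<std::string> constrainedShuffle(const std::vector<std::string>& symbols,
                                            int m, std::mt19937& rng) {
    assert(symbols.size() >= 2 && m >= 1);
    std::vector<std::string> a;
    for (int r = 0; r < m; ++r)
        a.insert(a.end(), symbols.begin(), symbols.end());

    const int total = static_cast<int>(a.size());
    for (int i = 0; i + 1 < total; ++i) {
        // Pick j uniformly from the positions after i.
        std::uniform_int_distribution<int> pick(i + 1, total - 1);
        const int j = pick(rng);
        // a[j] must fit between the neighbours of i, and a[i] between those of j.
        const bool ok = (i == 0 || a[i - 1] != a[j])
                     && a[i + 1] != a[j]
                     && a[i] != a[j - 1]
                     && (j == total - 1 || a[i] != a[j + 1]);
        if (ok) std::swap(a[i], a[j]);
    }
    return a;
}
```

Calling constrainedShuffle({"A", "B", "C", "D"}, 4, rng) with a seeded std::mt19937 returns 16 strings with no two equal neighbours. As in the pseudocode, a swap of two adjacent positions is always rejected by the a[i + 1] != a[j] test.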

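int total = m * n;
for (int i = 1; i < total - 1; i++)
{
// Search from the end for an element that differs from queue[i - 1]
int j = total - 1;
while ((j > i) && (queue[i - 1] == queue[j]))
j--;
// Move that element to position i so that queue[i - 1] != queue[i]
if (queue[i - 1] != queue[j])
{
String aux = queue[i];
queue[i] = queue[j];
queue[j] = aux;
}
}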
This code is not tested, but you get the idea.

I would do it with recursion:
The example is in C#; I find it more expressive than nested loops:
public List<String> GetPermutations(string s, IEnumerable<String> parts, string lastPart, int targetLength)
{
List<String> possibilities = new List<string>();
if (s.Length >= targetLength)
{
possibilities.Add(s);
return possibilities;
}
foreach (String part in parts)
{
if (!part.Equals(lastPart))
{
possibilities.AddRange(
GetPermutations(s + part, parts, part, targetLength));
}
}
return possibilities;
}
usage:
List<String> parts = new List<String>() { "A", "B", "C"};
int numOccurences = 4;
List<String> results =
GetPermutations("", parts, "", numOccurences * parts.Count );
If you only want a single valid solution (which is of course much faster to compute), the following produces a random, non-trivial one such as CACBCBCABABACAB (for A, B, C):
public String GetRandomValidPermutation(
string s,
List<String> parts,
string lastPart,
int targetLength)
{
if (s.Length >= targetLength)
{
return s;
}
String next = String.Empty;
while(
(next = parts[new Random().Next(0, parts.Count)])
.Equals(lastPart)
){}
return GetRandomValidPermutation(s + next, parts, next, targetLength);
}
call:
String validString =
GetRandomValidPermutation("", parts, "", numOccurences * parts.Count);

Related

Substrings of equal length comparison using hashing

On an assignment that I have, for a string S, I need to compare two substrings of equal lengths. Output should be "Yes" if they are equal, "No" if they are not equal. I am given the starting indexes of two substrings (a and b), and the length of the substrings L.
For example, for S = "Hello", a = 1, b = 3, L = 2, the substrings are:
substring1 = "el" and substring2 = "lo", which aren't equal, so answer will be "No".
I think hashing each substring of the main string S and writing them all to memory would be a good approach to take. Here is the code I have written for this (I have tried to implement what I learned about this from the Coursera course that I was taking):
This function takes any string, and values for p and x used for hashing, and performs a polynomial hash on the given string.
long long PolyHash(string str, long long p, int x){
long long res = 0;
for(int i = str.length() - 1; i > -1; i--){
res = (res * x + (str[i] - 'a' + 1)) % p;
}
return res;
}
The function below just precomputes all hashes, and fills up an array called ah, which is initialized in the main function. The array ah consists of n = string length rows, and n = string length columns (half of which gets wasted because I couldn't find how to properly make it work as a triangle, so I had to go for a full rectangular array). Assuming n = 7, then ah[0]-ah[6] are hash values for string[0]-string[6] (meaning all substrings of length 1). ah[7]-ah[12] are hash values for string[0-1]-string[5-6] (meaning all substrings of length 2), and etc. until the end.
void PreComputeAllHashes(string str, int len, long long p, int x, long long* ah){
int n = str.length();
string S = str.substr(n - len, len);
ah[len * n + n - len] = PolyHash(S, p, x);
long long y = 1;
for(int _ = 0; _ < len; _++){
y = (y * x) % p;
}
for(int i = n - len - 1; i > -1; i--){
ah[n * len + i] = (x * ah[n * len + i + 1] + (str[i] - 'a' + 1) - y * (str[i + len] - 'a' + 1)) % p;
}
}
And below is the main function. I took p equal to some large prime number, and x to be some manually picked, somewhat "random" prime number.
I take the text as input, initialize hash array, fill the hash array, and then take queries as input, to answer all queries from my array.
int main(){
long long p = 1e9 + 9;
int x = 78623;
string text;
cin >> text;
long long* allhashes = new long long[text.length() * text.length()];
for(int i = 1; i <= text.length(); i++){
PreComputeAllHashes(text, i, p, x, allhashes);
}
int queries;
cin >> queries;
int a, b, l;
for(int _ = 0; _ < queries; _++){
cin >> a >> b >> l;
if(a == b){
cout << "Yes" << endl;
}else{
cout << ((allhashes[l * text.length() + a] == allhashes[l * text.length() + b]) ? "Yes" : "No") << endl;
}
}
return 0;
}
However, one of the test cases for this assignment on Coursera is throwing an error like this:
Failed case #7/14: unknown signal 6 (Time used: 0.00/1.00, memory used: 29396992/536870912.)
Which, I have looked up online, and means the following:
Unknown signal 6 (or 7, or 8, or 11, or some other). This happens when your program crashes. It can be
because of division by zero, accessing memory outside of the array bounds, using uninitialized
variables, too deep recursion that triggers stack overflow, sorting with contradictory comparator,
removing elements from an empty data structure, trying to allocate too much memory, and many other
reasons. Look at your code and think about all those possibilities.
And I've been looking at my code the entire day, and still haven't been able to come up with a solution to this error. Any help to fix this would be appreciated.
Edit: The assignment states that the length of the input string can be up to 500000 characters long, and the number of queries can be up to 100000. This task also has 1 second time limit, which is pretty small for going over characters one by one for each string.
So, I did some research as to how I can reduce the complexity of this algorithm that I have implemented, and finally found it! Turns out there is a super-simple way (well, not if you count the theory involved behind it) to get hash value of any substring, given the prefix hashes of the initial string!
You can read more about it here, but I will try to explain it briefly.
So what do we do - We precalculate all the hash values for prefix-substrings.
Prefix substrings for a string "hello" would be the following:
h
he
hel
hell
hello
Once we have hash values of all these prefix substrings, we can collect them in a vector such that:
h[str] = str[0] + str[1] * P + str[2] * P^2 + str[3] * P^3 + ... + str[N] * P^N
where P is any prime number (I chose p = 263)
Then we need a large modulus to reduce everything by, just to keep the values from growing too large. I will choose m = 10^9 + 9.
First I am creating a vector to hold the precalculated powers of P:
vector<long long> p_pow (s.length());
p_pow[0] = 1;
for(size_t i=1; i<p_pow.size(); ++i){
p_pow[i] = (m + (p_pow[i-1] * p) % m) % m;
}
Then I calculate the vector of hash values for prefix substrings:
vector<long long> h (s.length());
for (size_t i=0; i<s.length(); ++i){
h[i] = (m + (s[i] - 'a' + 1) * p_pow[i] % m) % m;
if(i){
h[i] = (m + (h[i] + h[i-1]) % m) % m;
}
}
Suppose I have q queries, each of which consist of 3 integers: a, b, and L.
To check equality for substrings s1 = str[a...a+l-1] and s2 = str[b...b+l-1], I can compare the hash values of these substrings. To get the hash value of a substring from the hash values of the prefix substrings that we just created, we need to use the following formula:
H[I..J] * P[I] = H[0..J] - H[0..I-1]
Again, you can read about the proof of this in the link.
So, to address each query, I would do the following:
cin >> a >> b >> len;
if(a == b){ // just avoid extra calculation, saves little time
cout << "Yes" << endl;
}else{
long long h1 = h[a+len-1] % m;
if(a){
h1 = (m + (h1 - h[a-1]) % m) % m;
}
long long h2 = h[b+len-1] % m;
if(b){
h2 = (m + (h2 - h[b-1]) % m) % m;
}
if ((a < b && h1 * p_pow[b-a] % m == h2 % m) || (a > b && h1 % m == h2 * p_pow[a-b] % m)){
cout << "Yes" << endl;
}else{
cout << "No" << endl;
}
}
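The prefix-hash steps above can also be collected into one small, self-contained sketch. The class name PrefixHasher and its members are hypothetical; the base 263 and modulus 10^9 + 9 are the values chosen above. One assumption that differs from the snippets above: the sketch hashes the raw byte value plus one rather than s[i] - 'a' + 1, so it does not require lowercase input. Equal hashes only indicate equality with high probability, not with certainty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: prefix_[i] = sum over k < i of s[k] * P^k (mod M).
class PrefixHasher {
public:
    explicit PrefixHasher(const std::string& s)
        : pow_(s.size() + 1, 1), prefix_(s.size() + 1, 0) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            pow_[i + 1] = pow_[i] * kBase % kMod;
            // Assumption: byte value + 1, so no character hashes to zero.
            const long long c = static_cast<unsigned char>(s[i]) + 1;
            prefix_[i + 1] = (prefix_[i] + c * pow_[i]) % kMod;
        }
    }

    // True when s[a..a+len-1] and s[b..b+len-1] have equal hashes.
    bool equalSubstrings(std::size_t a, std::size_t b, std::size_t len) const {
        assert(a + len < prefix_.size() && b + len < prefix_.size());
        if (a > b) std::swap(a, b);
        // shifted(a) carries a factor P^a and shifted(b) a factor P^b,
        // so scale the earlier one by P^(b-a) before comparing.
        return shifted(a, len) * pow_[b - a] % kMod == shifted(b, len);
    }

private:
    static constexpr long long kBase = 263;
    static constexpr long long kMod = 1000000009LL;

    // Hash of s[a..a+len-1] multiplied by P^a: H[0..a+len-1] - H[0..a-1].
    long long shifted(std::size_t a, std::size_t len) const {
        return (prefix_[a + len] - prefix_[a] + kMod) % kMod;
    }

    std::vector<long long> pow_;
    std::vector<long long> prefix_;
};
```

Building the hasher costs O(n) time and memory, and each query is O(1), which is what makes 100000 queries on a 500000-character string feasible. For the example in the question, PrefixHasher("hello").equalSubstrings(1, 3, 2) compares "el" with "lo".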
Your approach is very complex for such a simple task. Assuming that you only need to do this operation once, you can compare the substrings directly with a for loop, with no need for hashing. Take a look at this code:
for(int i = a, j = b, counter = 0 ; counter < L ; counter++, i++, j++){
if(S[i] != S[j]){
cout << "Not the same" << endl;
return 0;
}
}
cout << "They are the same" << endl;

count distinct slices in an array

I was trying to solve this problem.
An integer M and a non-empty zero-indexed array A consisting of N
non-negative integers are given. All integers in array A are less than
or equal to M.
A pair of integers (P, Q), such that 0 ≤ P ≤ Q < N, is called a slice
of array A. The slice consists of the elements A[P], A[P + 1], ...,
A[Q]. A distinct slice is a slice consisting of only unique numbers.
That is, no individual number occurs more than once in the slice.
For example, consider integer M = 6 and array A such that:
A[0] = 3
A[1] = 4
A[2] = 5
A[3] = 5
A[4] = 2
There are exactly nine distinct slices: (0, 0), (0, 1), (0, 2), (1,
1), (1,2), (2, 2), (3, 3), (3, 4) and (4, 4).
The goal is to calculate the number of distinct slices.
Thanks in advance.
#include <algorithm>
#include <cstring>
#include <cmath>
#define MAX 100002
// you can write to stdout for debugging purposes, e.g.
// cout << "this is a debug message" << endl;
using namespace std;
bool check[MAX];
int solution(int M, vector<int> &A) {
memset(check, false, sizeof(check));
int base = 0;
int fibot = 0;
int sum = 0;
while(fibot < A.size()){
if(check[A[fibot]]){
base = fibot;
}
check[A[fibot]] = true;
sum += fibot - base + 1;
fibot += 1;
}
return min(sum, 1000000000);
}
The solution is not correct because your algorithm is wrong.
First of all, let me show you a counterexample. Let A = {2, 1, 2}. The first iteration: base = 0, fibot = 0, sum += 1. That's right. The second one: base = 0, fibot = 1, sum += 2. That's correct, too. The last step: fibot = 2, check[A[fibot]] is true, thus base = 2. But it should be 1. So your code returns 1 + 2 + 1 = 4, while the right answer is 1 + 2 + 2 = 5.
The right way to do it could be like this: start with L = 0. For each R from 0 to n - 1, keep moving L to the right until the subarray contains only distinct values (you can maintain the number of occurrences of each value in an array and use the fact that A[R] is the only element that can occur more than once).
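A minimal C++ sketch of the L/R approach just described might look like this. The function name countDistinctSlices is hypothetical, and the cap of 1,000,000,000 is taken from the question's code; a 64-bit counter is used so the sum cannot overflow before the cap is applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts slices with all-distinct values, capped at 1e9.
// Assumes every value in A lies in the range [0, M].
int countDistinctSlices(int M, const std::vector<int>& A) {
    const long long kLimit = 1000000000LL;
    std::vector<int> occurrences(M + 1, 0);
    long long total = 0;
    int left = 0;
    const int n = static_cast<int>(A.size());
    for (int right = 0; right < n; ++right) {
        assert(0 <= A[right] && A[right] <= M);
        ++occurrences[A[right]];
        // A[right] is the only value that can now occur twice, so move
        // left forward until that duplicate has been dropped.
        while (occurrences[A[right]] > 1) {
            --occurrences[A[left]];
            ++left;
        }
        // Every slice ending at right and starting in [left, right] is distinct.
        total += right - left + 1;
        if (total >= kLimit) return static_cast<int>(kLimit);
    }
    return static_cast<int>(total);
}
```

Each index enters and leaves the window at most once, so the loop is O(N) overall with O(M) extra memory.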
There is one more issue with your code: the sum variable may overflow if int is 32-bit type on the testing platform (for instance, if all elements of A are distinct).
As for the question WHY your algorithm is incorrect, I have no idea why it should be correct in the first place. Can you prove it? The base = fibot assignment looks quite arbitrary to me.
I would like to share the explanation of the algorithm that I have implemented in C++ followed by the actual implementation.
Notice that the minimum amount of distinct slices is N because each element is a distinct one-item slice.
Start the back index from the first element.
Start the front index from the first element.
Advance the front until we find a duplicate in the sequence.
In each iteration, increment the counter with the necessary amount, this is the difference between front and back.
If we reach the maximum counts at any iteration, just return immediately for slight optimisation.
In each iteration of the sequence, record the elements that have occurred.
Once we have found a duplicate, advance the back index one ahead of the duplicate.
While we advance the back index, clear all the occurred elements since we start a new slice beyond those elements.
The runtime complexity of this solution is O(N) since we go through each element.
The space complexity of this solution is O(M) because we have a hash to store the occurred elements in the sequences. The maximum element of this hash is M.
int solution(int M, vector<int> &A)
{
int N = A.size();
int distinct_slices = N;
vector<bool> seq_hash(M + 1, false);
for (int back = 0, front = 0; front < N; ++back) {
while (front < N and !seq_hash[A[front]]) {
distinct_slices += front - back;
if (distinct_slices > 1000000000) return 1000000000;
seq_hash[A[front++]] = true;
}
while (front < N and back < N and A[back] != A[front]) seq_hash[A[back++]] = false;
seq_hash[A[back]] = false;
}
return distinct_slices;
}
100% python solution that helped me, thanks to https://www.martinkysel.com/codility-countdistinctslices-solution/
def solution(M, A):
    the_sum = 0
    front = back = 0
    seen = [False] * (M+1)
    while (front < len(A) and back < len(A)):
        while (front < len(A) and seen[A[front]] != True):
            the_sum += (front-back+1)
            seen[A[front]] = True
            front += 1
        else:
            while front < len(A) and back < len(A) and A[back] != A[front]:
                seen[A[back]] = False
                back += 1
            seen[A[back]] = False
            back += 1
    return min(the_sum, 1000000000)
Solution with 100% using Ruby
LIMIT = 1_000_000_000
def solution(_m, a)
a.each_with_index.inject([0, {}]) do |(result, slice), (back, i)|
return LIMIT if result >= LIMIT
slice[back] = true
a[(i + slice.size)..-1].each do |front|
break if slice[front]
slice[front] = true
end
slice.delete back
[result + slice.size, slice]
end.first + a.size
end
Using the caterpillar algorithm and the formula S(n+1) = S(n) + n + 1, where S(n) is the count of slices for an n-element array, a Java solution could be:
public int solution(int top, int[] numbers) {
int len = numbers.length;
long count = 0;
if (len == 1) return 1;
int front = 0;
int[] counter = new int[top + 1];
for (int i = 0; i < len; i++) {
while(front < len && counter[numbers[front]] == 0 ) {
count += front - i + 1;
counter[numbers[front++]] = 1;
}
while(front < len && numbers[i] != numbers[front] && i < front) {
counter[numbers[i++]] = 0;
}
counter[numbers[i]] = 0;
if (count > 1_000_000_000) {
return 1_000_000_000;
}
}
return (int) count; // count is a long; it is at most 1_000_000_000 here, so the cast is safe
}

Find all ordered sequences with maximum length in strings

I have the following problem to solve:
There are two strings of arbitrary length with arbitrary content. I need to find all ordered sequences of maximum length that appear in both strings.
Example 1:
input: "a1b2c3" "1a2b3c"
output: "123" "12c" "1b3" "1bc" "a23" "a2c" "ab3" "abc"
Example 2:
input: "cadb" "abcd"
output: "ab" "ad" "cd"
I wrote it in a straightforward way with two loops and recursion, then removed duplicates and results that are part of a larger result (for instance, the "abc" sequence contains the "ab", "ac" and "bc" sequences, so I filter those out).
// "match" argument here used as temporary buffer
void match_recursive(set<string> &matches, string &match, const string &a_str1, const string &a_str2, size_t a_pos1, size_t a_pos2)
{
bool added = false;
for(size_t i = a_pos1; i < a_str1.length(); ++i)
{
for(size_t j = a_pos2; j < a_str2.length(); ++j)
{
if(a_str1[i] == a_str2[j])
{
match.push_back(a_str1[i]);
if(i < a_str1.length() - 1 && j < a_str2.length() - 1)
match_recursive(matches, match, a_str1, a_str2, i + 1, j + 1);
else
matches.emplace(match);
added = true;
match.pop_back();
}
}
}
if(!added)
matches.emplace(match);
}
This function solves the problem, but its complexity is unacceptable. For instance, the solution for "0q0e0t0c0a0d0a0d0i0e0o0p0z0" "0w0r0y0d0s0a0b0w0k0f0.0k0x0" takes 28 seconds on my machine (debug target, but this is still extremely slow). I think there should be some simple algorithm for this problem, but somehow I can't find any on the net.
Can you point me in the right direction?
Look up the "longest common subsequence (LCS)" problem, e.g. http://en.wikipedia.org/wiki/Longest_common_subsequence_problem, and see how the dynamic programming solution finds an LCS of two sequences. It builds up the solution efficiently, starting with the trivial LCS for the first character of each sequence and then extending it to longer and longer pairs of prefixes of the two sequences. The only modification you need to make is this: when you derive the LCS for the current prefix pair from the previously computed LCS solutions for earlier prefix pairs, you need to have stored ALL LCS strings for those earlier prefix pairs, and then combine these sets of LCS strings (possibly with an added character) into the overall set of LCS strings you store for the current prefix pair. This will solve your problem efficiently.
You can make it a bit more efficient still by first computing a single LCS to get the overall LCS length, then finding all earlier prefix pairs that lie on computational paths reaching that length, and finally repeating the dynamic programming iterations just for those prefix pairs, this time keeping track of all possible LCS strings as described above.
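To make the "store all LCS strings per prefix pair" idea concrete, here is a minimal C++ sketch of the basic version (without the pruning refinement). The function name allLcs is hypothetical. It keeps one set of strings per table cell, so it is only suitable for short inputs, since the number of distinct LCS strings can grow very quickly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every distinct longest common subsequence.
std::set<std::string> allLcs(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    // len[i][j]: LCS length of the prefixes a[0..i) and b[0..j).
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    // sets[i][j]: all LCS strings of those two prefixes.
    std::vector<std::vector<std::set<std::string>>> sets(
        n + 1, std::vector<std::set<std::string>>(m + 1));

    // An empty prefix has exactly one LCS: the empty string.
    for (std::size_t i = 0; i <= n; ++i) sets[i][0].insert("");
    for (std::size_t j = 0; j <= m; ++j) sets[0][j].insert("");

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                // Matching characters: extend every LCS of the shorter prefixes.
                len[i][j] = len[i - 1][j - 1] + 1;
                for (const std::string& s : sets[i - 1][j - 1])
                    sets[i][j].insert(s + a[i - 1]);
            } else {
                // Otherwise inherit from whichever neighbours reach the maximum.
                len[i][j] = std::max(len[i - 1][j], len[i][j - 1]);
                if (len[i - 1][j] == len[i][j])
                    sets[i][j].insert(sets[i - 1][j].begin(), sets[i - 1][j].end());
                if (len[i][j - 1] == len[i][j])
                    sets[i][j].insert(sets[i][j - 1].begin(), sets[i][j - 1].end());
            }
        }
    }
    return sets[n][m];
}
```

For the second example in the question, allLcs("cadb", "abcd") yields the set {"ab", "ad", "cd"}. Using std::set removes duplicates automatically, which replaces the separate filtering pass in the recursive version.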
Here is the code for the dynamic programming solution. I tested it with the examples you gave. I have solved the LCS problem before, but this is the first time I have printed all of the solutions.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <set>
using namespace std;
#define MAX_LENGTH 100
int lcs(const char* a, const char* b)
{
int row = strlen(a)+ 1;
int column = strlen(b) + 1;
//Memoization lower the function's time cost in exchange for space cost.
int **matrix = (int**)malloc(sizeof(int*) * row);
int i, j;
for(i = 0; i < row; ++i)
matrix[i] = (int*)calloc(sizeof(int), column);
typedef set<string> lcs_set;
lcs_set s_matrix[MAX_LENGTH][MAX_LENGTH];
//initiate
for(i = 0; i < MAX_LENGTH ; ++i)
s_matrix[0][i].insert("");
for(i = 0; i < MAX_LENGTH ; ++i)
s_matrix[i][0].insert("");
//Bottom up calculation
for(i = 1; i < row; ++i)
{
for(j = 1; j < column; ++j)
{
if(a[i - 1] == b[j - 1])
{
matrix[i][j] = matrix[i -1][j - 1] + 1;
// if your compiler support c++ 11, you can simplify this code.
for(lcs_set::iterator it = s_matrix[i - 1][j - 1].begin(); it != s_matrix[i - 1][j - 1].end(); ++it)
s_matrix[i][j].insert(*it + a[i - 1]);
}
else
{
if(matrix[i][j - 1] > matrix[i - 1][j])
{
matrix[i][j] = matrix[i][j - 1];
for(lcs_set::iterator it = s_matrix[i][j - 1].begin(); it != s_matrix[i][j - 1].end(); ++it)
s_matrix[i][j].insert(*it);
}
else if(matrix[i][j - 1] == matrix[i - 1][j])
{
matrix[i][j] = matrix[i][j - 1];
for(lcs_set::iterator it = s_matrix[i][j - 1].begin(); it != s_matrix[i][j - 1].end(); ++it)
s_matrix[i][j].insert(*it);
for(lcs_set::iterator it = s_matrix[i - 1][j].begin(); it != s_matrix[i - 1][j].end(); ++it)
s_matrix[i][j].insert(*it);
}
else
{
matrix[i][j] = matrix[i - 1][j];
for(lcs_set::iterator it = s_matrix[i - 1][j].begin(); it != s_matrix[i - 1][j].end(); ++it)
s_matrix[i][j].insert(*it);
}
}
}
}
int lcs_length = matrix[row - 1][column -1];
// all ordered sequences with maximum length are here.
lcs_set result_set;
int m, n;
for(m = 1; m < row; ++m)
{
for(n = 1; n < column; ++n)
{
if(matrix[m][n] == lcs_length)
{
for(lcs_set::iterator it = s_matrix[m][n].begin(); it != s_matrix[m][n].end(); ++it)
result_set.insert(*it);
}
}
}
//comment it
for(lcs_set::iterator it = result_set.begin(); it != result_set.end(); ++it)
printf("%s\t", it->c_str());
printf("\n");
for(i = 0; i < row; ++i)
free(matrix[i]);
free(matrix);
return lcs_length;
}
int main()
{
char buf1[MAX_LENGTH], buf2[MAX_LENGTH];
while(scanf("%99s %99s", buf1, buf2) == 2) // bound reads to the buffer size; stop on EOF or a failed read
{
printf("length is: %d\n", lcs(buf1, buf2) );
}
return 0;
}
It sounds like you are trying to find similarities between two strings. I found this code somewhere on the web many years ago (sorry, I can no longer cite the source), modified it slightly, and use it often. It works very quickly (for strings, anyway). You may need to change it for your purpose. Sorry that it's in VB.
Private Shared piScore As Integer
''' <summary>
''' Compares two not-empty strings regardless of case.
''' Returns a numeric indication of their similarity
''' (0 = not at all similar, 100 = identical)
''' </summary>
''' <param name="psStr1">String to compare</param>
''' <param name="psStr2">String to compare</param>
''' <returns>0-100 (0 = not at all similar, 100 = identical)</returns>
''' <remarks></remarks>
Public Shared Function Similar(ByVal psStr1 As String, ByVal psStr2 As String) As Integer
If psStr1 Is Nothing Or psStr2 Is Nothing Then Return 0
' Convert each string to simplest form (letters
' and digits only, all upper case)
psStr1 = ReplaceSpecial(psStr1.ToUpper)
psStr2 = ReplaceSpecial(psStr2.ToUpper)
If psStr1.Trim = "" Or psStr2.Trim = "" Then
' One or both of the strings is now empty
Return 0
End If
If psStr1 = psStr2 Then
' Strings are identical
Return 100
End If
' Initialize cumulative score (this will be the
' total length of all the common substrings)
piScore = 0
' Find all common sub-strings
FindCommon(psStr1, psStr2)
' We now have the cumulative score. Return this
' as a percent of the maximum score. The maximum
' score is the average length of the two strings.
Return piScore * 200 / (Len(psStr1) + Len(psStr2))
End Function
''' <summary>USED BY SIMILAR FUNCTION</summary>
Private Shared Sub FindCommon(ByVal psS1 As String, ByVal psS2 As String)
' Finds longest common substring (other than single
' characters) in psS1 and psS2, then recursively
' finds longest common substring in left-hand
' portion and right-hand portion. Updates the
' cumulative score.
Dim iLongest As Integer = 0, iStartPos1 As Integer = 0, iStartPos2 As Integer = 0, iJ As Integer = 0
Dim sHoldStr As String = "", sTestStr As String = "", sLeftStr1 As String = "", sLeftStr2 As String = ""
Dim sRightStr1 As String = "", sRightStr2 As String = ""
sHoldStr = psS2
Do While Len(sHoldStr) > iLongest
sTestStr = sHoldStr
Do While Len(sTestStr) > 1
iJ = InStr(psS1, sTestStr)
If iJ > 0 Then
' Test string is sub-set of the other string
If Len(sTestStr) > iLongest Then
' Test string is longer than previous
' longest. Store its length and position.
iLongest = Len(sTestStr)
iStartPos1 = iJ
iStartPos2 = InStr(psS2, sTestStr)
End If
' No point in going further with this string
Exit Do
Else
' Test string is not a sub-set of the other
' string. Discard final character of test
' string and try again.
sTestStr = Left(sTestStr, Len(sTestStr) - 1)
End If
Loop
' Now discard first char of test string and
' repeat the process.
sHoldStr = Right(sHoldStr, Len(sHoldStr) - 1)
Loop
' Update the cumulative score with the length of
' the common sub-string.
piScore = piScore + iLongest
' We now have the longest common sub-string, so we
' can isolate the sub-strings to the left and right
' of it.
If iStartPos1 > 3 And iStartPos2 > 3 Then
sLeftStr1 = Left(psS1, iStartPos1 - 1)
sLeftStr2 = Left(psS2, iStartPos2 - 1)
If sLeftStr1.Trim <> "" And sLeftStr2.Trim <> "" Then
' Get longest common substring from left strings
FindCommon(sLeftStr1, sLeftStr2)
End If
Else
sLeftStr1 = ""
sLeftStr2 = ""
End If
If iLongest > 0 Then
sRightStr1 = Mid(psS1, iStartPos1 + iLongest)
sRightStr2 = Mid(psS2, iStartPos2 + iLongest)
If sRightStr1.Trim <> "" And sRightStr2.Trim <> "" Then
' Get longest common substring from right strings
FindCommon(sRightStr1, sRightStr2)
End If
Else
sRightStr1 = ""
sRightStr2 = ""
End If
End Sub
''' <summary>USED BY SIMILAR FUNCTION</summary>
Private Shared Function ReplaceSpecial(ByVal sString As String) As String
Dim iPos As Integer
Dim sReturn As String = ""
Dim iAsc As Integer
For iPos = 1 To sString.Length
iAsc = Asc(Mid(sString, iPos, 1))
If (iAsc >= 48 And iAsc <= 57) Or (iAsc >= 65 And iAsc <= 90) Then
sReturn &= Chr(iAsc)
End If
Next
Return sReturn
End Function
Just call the Similar function and you get a result between 0 and 100.
Hope this helps

What Ruzzle board contains the most unique words?

For smart phones, there is this game called Ruzzle.
It's a word finding game.
Quick Explanation:
The game board is a 4x4 grid of letters.
You start from any cell and try to spell a word by dragging up, down, left, right, or diagonal.
The board doesn't wrap, and you can't reuse letters you've already selected.
On average, my friend and I find about 40 words, and at the end of the round, the game informs you of how many possible words you could have gotten. This number is usually about 250 - 350.
We are wondering what board would yield the highest number of possible words.
How would I go about finding the optimal board?
I've written a program in C that takes 16 characters and outputs all the appropriate words.
Testing over 80,000 words, it takes about a second to process.
The Problem:
The number of game board permutations is 26^16.
That's 43608742899428874059776 (43 sextillion).
I need some kind of heuristic.
Should I skip all boards that have either z, q, x, etc because they are expected to not have as many words? I wouldn't want to exclude a letter without being certain.
There are also 4 duplicates of every board, because rotating the board will still give the same results.
But even with these restrictions, I don't think I have enough time in my life to find the answer.
Maybe board generation isn't the answer.
Is there a quicker way to find the answer looking at the list of words?
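The duplicate boards mentioned above can be collapsed by mapping each board to a canonical representative. The following is only a sketch with hypothetical names (rotate90, mirrorRows, canonicalBoard), assuming a board is stored as 16 letters in row-major order. It also treats mirror images as duplicates, since reflecting a board preserves which cells are adjacent.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: rotates a 4x4 row-major board 90 degrees clockwise.
std::string rotate90(const std::string& board) {
    assert(board.size() == 16);
    std::string out(16, ' ');
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c * 4 + (3 - r)] = board[r * 4 + c];  // (r, c) moves to (c, 3 - r)
    return out;
}

// Hypothetical helper: mirrors the board left to right by reversing each row.
std::string mirrorRows(const std::string& board) {
    assert(board.size() == 16);
    std::string out = board;
    for (int r = 0; r < 4; ++r)
        std::reverse(out.begin() + r * 4, out.begin() + r * 4 + 4);
    return out;
}

// Hypothetical helper: the lexicographically smallest of the 8 symmetric
// variants, so equivalent boards share one key and are scored only once.
std::string canonicalBoard(const std::string& board) {
    std::string best = board;
    std::string current = board;
    for (int k = 0; k < 4; ++k) {
        current = rotate90(current);
        best = std::min(best, current);
        best = std::min(best, mirrorRows(current));
    }
    return best;
}
```

Storing canonicalBoard(board) in a set of already-scored boards skips equivalent boards, but it reduces the search space by at most a factor of 8, so it does not make exhaustive enumeration feasible on its own.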
tldr;
S E R O
P I T S
L A N E
S E R G
or any of its reflections.
This board contains 1212 words (and as it turns out, you can exclude 'z', 'q' and 'x').
First things first: it turns out you're using the wrong dictionary. After not getting exact matches with Ruzzle's word count, I looked into it, and it seems Ruzzle uses a dictionary called TWL06, which has around 180,000 words. Don't ask me what it stands for, but it's freely available as a txt file.
I also wrote code to find all possible words given a 16 character board, as follows. It builds the dictionary into a tree structure, and then pretty much just goes around recursively while there are words to be found. It prints them in order of length. Uniqueness is maintained by the STL set structure.
#include <cstdlib>
#include <ctime>
#include <map>
#include <string>
#include <set>
#include <algorithm>
#include <fstream>
#include <iostream>
using namespace std;
struct TreeDict {
bool existing;
map<char, TreeDict> sub;
TreeDict() {
existing = false;
}
TreeDict& operator=(TreeDict &a) {
existing = a.existing;
sub = a.sub;
return *this;
}
void insert(string s) {
if(s.size() == 0) {
existing = true;
return;
}
sub[s[0]].insert(s.substr(1));
}
bool exists(string s = "") {
if(s.size() == 0)
return existing;
if(sub.find(s[0]) == sub.end())
return false;
return sub[s[0]].exists(s.substr(1));
}
TreeDict* operator[](char alpha) {
if(sub.find(alpha) == sub.end())
return NULL;
return &sub[alpha];
}
};
TreeDict DICTIONARY;
set<string> boggle_h(const string board, string word, int index, int mask, TreeDict *dict) {
if(index < 0 || index >= 16 || (mask & (1 << index)))
return set<string>();
word += board[index];
mask |= 1 << index;
dict = (*dict)[board[index]];
if(dict == NULL)
return set<string>();
set<string> rt;
if((*dict).exists())
rt.insert(word);
if((*dict).sub.empty())
return rt;
if(index % 4 != 0) {
set<string> a = boggle_h(board, word, index - 4 - 1, mask, dict);
set<string> b = boggle_h(board, word, index - 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 - 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
if(index % 4 != 3) {
set<string> a = boggle_h(board, word, index - 4 + 1, mask, dict);
set<string> b = boggle_h(board, word, index + 1, mask, dict);
set<string> c = boggle_h(board, word, index + 4 + 1, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
rt.insert(c.begin(), c.end());
}
set<string> a = boggle_h(board, word, index + 4, mask, dict);
set<string> b = boggle_h(board, word, index - 4, mask, dict);
rt.insert(a.begin(), a.end());
rt.insert(b.begin(), b.end());
return rt;
}
set<string> boggle(string board) {
set<string> words;
for(int i = 0; i < 16; i++) {
set<string> a = boggle_h(board, "", i, 0, &DICTIONARY);
words.insert(a.begin(), a.end());
}
return words;
}
void buildDict(string file, TreeDict &dict = DICTIONARY) {
ifstream fstr(file.c_str());
string s;
if(fstr.is_open()) {
while(fstr >> s) { // stop as soon as a read fails, so the last word is not inserted twice
dict.insert(s);
}
fstr.close();
}
}
struct lencmp {
bool operator()(const string &a, const string &b) const {
if(a.size() != b.size())
return a.size() > b.size();
return a < b;
}
};
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
set<string> a = boggle("SEROPITSLANESERG");
set<string, lencmp> words;
words.insert(a.begin(), a.end());
set<string>::iterator it;
for(it = words.begin(); it != words.end(); it++)
cout << *it << endl;
cout << words.size() << " words." << endl;
}
Randomly generating boards and testing them was, as expected, not very effective. I didn't really bother running that for long, but I'd be surprised if such boards crossed 200 words. Instead I changed the board generation to distribute letters in proportion to their frequency in TWL06, using a quick cumulative frequency table (the frequencies were reduced by a factor of 100), below.
string randomBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += (char)('A' + rand() % 26);
return board;
}
char distLetter() {
int x = rand() % 15833;
if(x < 1209) return 'A';
if(x < 1510) return 'B';
if(x < 2151) return 'C';
if(x < 2699) return 'D';
if(x < 4526) return 'E';
if(x < 4726) return 'F';
if(x < 5161) return 'G';
if(x < 5528) return 'H';
if(x < 6931) return 'I';
if(x < 6957) return 'J';
if(x < 7101) return 'K';
if(x < 7947) return 'L';
if(x < 8395) return 'M';
if(x < 9462) return 'N';
if(x < 10496) return 'O';
if(x < 10962) return 'P';
if(x < 10987) return 'Q';
if(x < 12111) return 'R';
if(x < 13613) return 'S';
if(x < 14653) return 'T';
if(x < 15174) return 'U';
if(x < 15328) return 'V';
if(x < 15452) return 'W';
if(x < 15499) return 'X';
if(x < 15757) return 'Y';
return 'Z'; // x < 15833 always holds here; returning unconditionally gives every path a return value
}
string distBoard() {
string board = "";
for(int i = 0; i < 16; i++)
board += distLetter();
return board;
}
This was significantly more effective, very easily achieving 400+ word boards. I left it running (for longer than I intended), and after checking over a million boards, the highest found was around 650 words. This was still essentially random generation, and that has its limits.
Instead, I opted for a greedy maximisation strategy, wherein I'd take a board and make a small change to it, and then commit the change only if it increased the word count.
string changeLetter(string x) {
int y = rand() % 16;
x[y] = distLetter();
return x;
}
string swapLetter(string x) {
int y = rand() % 16;
int z = rand() % 16;
char w = x[y];
x[y] = x[z];
x[z] = w;
return x;
}
string change(string x) {
if(rand() % 2)
return changeLetter(x);
return swapLetter(x);
}
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
string board = "SEROPITSLANESERG";
int locmax = boggle(board).size();
for(int j = 0; j < 5000; j++) {
int changes = 1;
string board2 = board;
for(int k = 0; k < changes; k++)
board2 = change(board2); // mutate the candidate so that several changes can accumulate
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
}
This very rapidly got me 1000+ word boards, with generally similar letter patterns, despite randomised starting points. What leads me to believe that the board given is the best possible board is how it, or one of its various reflections, turned up repeatedly, within the first 100 odd attempts at maximising a random board.
The biggest reason for skepticism is the greediness of this algorithm, which could somehow cause it to miss better boards. The small changes made are quite flexible in their outcomes; that is, they have the power to completely transform a grid from its (randomised) start position. The number of possible changes, 26*16 = 416 for the fresh letter and 16*15 = 240 for the letter swap, are both significantly less than 5000, the number of consecutive discarded changes allowed.
The fact that the program was able to repeat this board output within the first 100 odd times implies that the number of local maxima is relatively small, and that the probability of an undiscovered maximum is low.
Although the greedy approach seemed intuitively right (it shouldn't really be less possible to reach a given grid with the delta changes from a random board), and the two possible changes, a swap and a fresh letter, do seem to encapsulate all possible improvements, I changed the program to allow it to make more changes before checking for an increase. This again returned the same board, repeatedly.
int main() {
srand(time(NULL));
buildDict("/Users/XXX/Desktop/TWL06.txt");
int glomax = 0;
int i = 0;
while(true) {
string board = distBoard();
int locmax = boggle(board).size();
for(int j = 0; j < 500; j++) {
string board2 = board;
for(int k = 0; k < 2; k++)
board2 = change(board2); // apply both changes to the candidate, not just the last one
int loc = boggle(board2).size();
if(loc >= locmax && board != board2) {
j = 0;
board = board2;
locmax = loc;
}
}
if(glomax <= locmax) {
glomax = locmax;
cout << board << " " << glomax << " words." << endl;
}
if(++i % 10 == 0)
cout << i << endl;
}
}
Having iterated over this loop around 1000 times, with this particular board configuration showing up ~10 times, I'm pretty confident that this is for now the Ruzzle board with the most unique words, until the English language changes.
Interesting problem. I see (at least, but mainly) two approaches:
one is to try the hard way to pack in as many word-forming letters (in all directions) as possible, based on a dictionary. As you said, there are many possible combinations, and that route requires an elaborate and complex algorithm to reach something tangible.
there is another "loose" solution based on probabilities that I like more. You suggested removing some low-appearance letters to maximize the board yield. An extension of this could be to use more of the letters that appear often in the dictionary.
A further step could be:
based on the 80k dictionary D, you find out for each l1 letter of our L ensemble of 26 letters the probability that letter l2 precedes or follows l1. This is a L x L probabilities array, and is pretty small, so you could even extend to L x L x L, i.e. considering l1 and l2 what probability has l3 to fit. This is a bit more complex if the algorithm wants to estimate accurate probabilities, as the probas sum depends on the relative position of the 3 letters, for instance in a 'triangle' configuration (eg positions (3,3), (3,4) and (3,5)) the result is probably less yielding than when the letters are aligned [just a supposition]. Why not going up to L x L x L x L, which will require some optimizations...
then you distribute a few high-appearance letters (say 4~6) randomly on the board (having each at least 1 blank cell around in at least 5 of the 8 possible directions) and then use your L x L [xL] probas arrays to complete - meaning based on the existing letter, the next cell is filled with a letter which proba is high given the configuration [again, letters sorted by proba descending, and use randomness if two letters are in a close tie].
For instance, taking only the horizontal configuration, having the following letters in place, and we want to find the best 2 in between ER and TO
...ER??TO...
Using L x L, a loop like the one below finds the single best pair (l1 and l2 are our two missing letters); bestchoice and bestproba could instead be arrays that keep, say, the 10 best choices.
Note: there is no need to keep the proba in the range [0,1] in this case; we can simply sum the probas (the sum is not a probability, but its magnitude is what matters. A mathematical proba could be something like p = ( p(l0,l1) + p(l2,l3) ) / 2, where l0 and l3 are the R and T in our L x L example)
bestproba = 0
bestchoice = (none, none)
for letter l1 in L
for letter l2 in L
p = proba('R',l1) + proba(l2,'T')
if ( p > bestproba )
bestproba = p
bestchoice = (l1, l2)
fi
rof
rof
the algorithm can take more factors into account, and needs to take the vertical and diagonals into account as well. With L x L x L, more letters in more directions are taken into account, like ER?,R??,??T,?TO - this requires to think more through the algorithm - maybe starting with L x L can give an idea about the relevancy of this algorithm.
Note that a lot of this may be pre-calculated, and the L x L array is of course one of them.
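As a rough illustration of the L x L idea, the table can be built from raw bigram counts and the double loop from the pseudocode applied to it. This is a sketch under assumptions: words are taken to be uppercase A to Z only, counts stand in for probabilities, and BigramTable, buildBigrams and bestPair are names invented here. Like the pseudocode, it ignores how well l1 and l2 fit each other.

```cpp
#include <array>
#include <string>
#include <utility>
#include <vector>

// counts[x][y] = how often letter y directly follows letter x in the word list.
using BigramTable = std::array<std::array<int, 26>, 26>;

BigramTable buildBigrams(const std::vector<std::string>& words) {
    BigramTable counts{};  // value-initialised to all zeros
    for (const std::string& w : words)
        for (std::size_t i = 0; i + 1 < w.size(); ++i)
            ++counts[w[i] - 'A'][w[i + 1] - 'A'];
    return counts;
}

// Best pair (l1, l2) for the gap in "left l1 l2 right", scored as
// counts[left][l1] + counts[l2][right]. Ties keep the first pair found.
std::pair<char, char> bestPair(const BigramTable& counts, char left, char right) {
    int bestScore = -1;
    std::pair<char, char> best('A', 'A');
    for (char l1 = 'A'; l1 <= 'Z'; ++l1) {
        for (char l2 = 'A'; l2 <= 'Z'; ++l2) {
            int s = counts[left - 'A'][l1 - 'A'] + counts[l2 - 'A'][right - 'A'];
            if (s > bestScore) {
                bestScore = s;
                best = std::make_pair(l1, l2);
            }
        }
    }
    return best;
}
```

For the ...ER??TO... example, the call would be bestPair(table, 'R', 'T'), with the table built once from the dictionary.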

Determining the individual letters of Fibonacci strings?

The Fibonacci strings are defined as follows:
The first Fibonacci string is "a"
The second Fibonacci string is "bc"
The (n + 2)nd Fibonacci string is the concatenation of the two previous Fibonacci strings.
For example, the first few Fibonacci strings are
a
bc
abc
bcabc
abcbcabc
The goal is, given a row and an offset, to determine what character is at that offset. More formally:
Input: Two integers separated by a space, K and P (0 < K ≤ 10^9, 0 < P ≤ 10^9), where K is the line number of the Fibonacci string and P is the position number in that row.
Output: The desired character for the relevant test: "a", "b" or "c". If P is greater than the length of the Kth row (K ≤ 10^9), it is necessary to output «No solution»
Example:
input: 18 58
output: a
I wrote this code to solve the problem:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
int k, p;
string s1 = "a";
string s2 = "bc";
vector < int >fib_numb;
fib_numb.push_back(1);
fib_numb.push_back(2);
cin >> k >> p;
k -= 1;
p -= 1;
while (fib_numb.back() < p) {
fib_numb.push_back(fib_numb[fib_numb.size() - 1] + fib_numb[fib_numb.size() - 2]);
}
if (fib_numb[k] <= p) {
cout << "No solution";
return 0;
}
if ((k - fib_numb.size()) % 2 == 1)
k = fib_numb.size() + 1;
else
k = fib_numb.size();
while (k > 1) {
if (fib_numb[k - 2] > p)
k -= 2;
else {
p -= fib_numb[k - 2];
k -= 1;
}
}
if (k == 1)
cout << s2[p];
else
cout << s1[0];
return 0;
}
Is it correct? How would you have done it?
You can solve this problem without explicitly computing any of the strings, and this is probably the best way to solve the problem. After all, if you're asked to compute the 50th Fibonacci string, you're almost certain to run out of memory; F(50) is 12,586,269,025, so you'd need over 12 GB of memory just to hold it!
The intuition behind the solution is that because each line of the Fibonacci strings is composed of the characters of the previous lines, you can convert a (row, offset) pair into a different (row', offset') pair where the new row is always for a smaller Fibonacci string than the one you started with. If you repeat this enough times, eventually you will arrive back at the Fibonacci strings for either row 1 or row 2, in which case the answer can immediately be read off.
In order to make this algorithm work, we need to establish a few facts. First, let's define the Fibonacci series to be zero-indexed; that is, the sequence is
F(0) = 0
F(1) = 1
F(n+2) = F(n) + F(n + 1)
Given this, we know that the nth row (one-indexed) of the Fibonacci strings has a total of F(n + 1) characters in it. You can see this quickly by induction:
Row 1 has length 1 = F(2) = F(1 + 1)
Row 2 has length 2 = F(3) = F(2 + 1).
For some row n + 2, the length of that row is given by Size(n) + Size(n + 1) = F(n + 1) + F(n + 2) = F(n + 3) = F((n + 2) + 1)
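The length claim is easy to check numerically. A small sketch (rowLength is a name made up here) that follows the same recurrence as the strings themselves:

```cpp
#include <cstdint>

// Length of the n-th Fibonacci string (1-indexed), built from the same
// recurrence as the strings: Size(n + 2) = Size(n) + Size(n + 1).
// 64-bit arithmetic keeps this exact for n up to about 90.
std::int64_t rowLength(int n) {
    std::int64_t a = 1, b = 2;  // Size(1), Size(2)
    for (int i = 1; i < n; ++i) {
        std::int64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

rowLength(7) gives 21 = F(8), and rowLength(44) gives 1,134,903,170 = F(45), the first length above 10^9.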
Using this knowledge, let's suppose that we want to find the seventh character of the seventh row of the Fibonacci strings. We know that row seven is composed of the concatenation of rows five and six, so the string looks like this:
R(7) = R(5) R(6)
Row five has F(5 + 1) = F(6) = 8 characters in it, which means that the first eight characters of row seven come from R(5). Since we want the seventh character out of this row, and since 7 ≤ 8, we know that we now need to look at the seventh character of row 5 to get this value. Well, row 5 looks like the concatenation of rows 3 and 4:
R(5) = R(3) R(4)
We want to find the seventh character of this row. Now, R(3) has F(4) = 3 characters in it, which means that if we are looking for the seventh character of R(5), it's going to be in the R(4) part, not the R(3) part. Since we're looking for the seventh character of this row, it means that we're looking for the 7 - F(4) = 7 - 3 = 4th character of R(4), so now we look there. Again, R(4) is defined as
R(4) = R(2) R(3)
R(2) has F(3) = 2 characters in it, so we don't want to look in it to find the fourth character of the row; that's going to be contained in R(3). The fourth character of the line must be the second character of R(3). Let's look there. R(3) is defined as
R(3) = R(1) R(2)
R(1) has one character in it, so the second character of this line must be the first character of R(2), so we look there. We know, however, that
R(2) = bc
So the first character of this string is b, which is our answer. Let's see if this is right. The first seven rows of the Fibonacci strings are
1 a
2 bc
3 abc
4 bcabc
5 abcbcabc
6 bcabcabcbcabc
7 abcbcabcbcabcabcbcabc
Sure enough, if you look at the seventh character of the seventh string, you'll see that it is indeed a b. Looks like this works!
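For small rows the walkthrough can be double-checked by brute force. A sketch that builds the rows explicitly (fibonacciRow is a hypothetical helper, only sensible for small n because the length grows exponentially):

```cpp
#include <string>

// Builds the n-th Fibonacci string (1-indexed) by direct concatenation.
std::string fibonacciRow(int n) {
    std::string prev = "a", curr = "bc";  // rows 1 and 2
    if (n == 1) return prev;
    for (int i = 2; i < n; ++i) {
        std::string next = prev + curr;   // R(i + 1) = R(i - 1) R(i)
        prev = curr;
        curr = next;
    }
    return curr;
}
```

The seventh character of row seven is then fibonacciRow(7)[6], since std::string is zero-indexed.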
More formally, the recurrence relation we are interested in looks like this:
char NthChar(int row, int index) {
    if (row == 1) return 'a';
    if (row == 2 && index == 1) return 'b';
    if (row == 2 && index == 2) return 'c';
    /* Row n is R(n - 2) R(n - 1), and R(n - 2) has Fibonacci(n - 1) characters. */
    if (index <= Fibonacci(row - 1)) return NthChar(row - 2, index);
    return NthChar(row - 1, index - Fibonacci(row - 1));
}
Now, of course, there's a problem with the implementation as written here. Because the row index can range up to 10^9, we can't possibly compute Fibonacci(row) in all cases; the one billionth Fibonacci number is far too large to represent!
Fortunately, we can get around this. If you look at a table of Fibonacci numbers, you'll find that F(45) = 1,134,903,170, which is greater than 10^9 (and no smaller Fibonacci number is greater than this). Moreover, since we know that the index we care about must also be no greater than one billion, if we're in row 46 or greater, we will always take the branch where we look in the first half of the Fibonacci string. This means that we can rewrite the code as
char NthChar(int row, int index) {
    if (row == 1) return 'a';
    if (row == 2 && index == 1) return 'b';
    if (row == 2 && index == 2) return 'c';
    /* Avoid integer overflow! */
    if (row >= 46) return NthChar(row - 2, index);
    if (index <= Fibonacci(row - 1)) return NthChar(row - 2, index);
    return NthChar(row - 1, index - Fibonacci(row - 1));
}
At this point we're getting very close to a solution. There are still a few problems to address. First, the above code will almost certainly blow out the stack unless the compiler is good enough to use tail recursion to eliminate all the stack frames. While some compilers (gcc, for example) can detect this, it's probably not a good idea to rely on it, and so we probably should rewrite this recursive function iteratively. Here's one possible implementation:
char NthChar(int row, int index) {
    while (true) {
        if (row == 1) return 'a';
        if (row == 2 && index == 1) return 'b';
        if (row == 2 && index == 2) return 'c';
        /* Avoid integer overflow! */
        if (row >= 46 || index <= Fibonacci(row - 1)) {
            row -= 2;
        } else {
            index -= Fibonacci(row - 1);
            row--;
        }
    }
}
But of course we can still do much better than this. In particular, if you're given a row number that's staggeringly huge (say, one billion), it's really silly to keep looping over and over again subtracting two from the row until it becomes less than 46. It makes a lot more sense to just determine what value it's ultimately going to become after we do all the subtraction. But we can do this quite easily. If we have an even row that's at least 46, we'll end up subtracting out 2 until it becomes 44. If we have an odd row that's at least 46, we'll end up subtracting out 2 until it becomes 45. Consequently, we can rewrite the above code to explicitly handle this case:
char NthChar(int row, int index) {
    /* Preprocess the row to make it a small value. */
    if (row >= 46) {
        if (row % 2 == 0)
            row = 44;   /* even rows step down by 2 to 44 */
        else
            row = 45;   /* odd rows step down by 2 to 45 */
    }
    while (true) {
        if (row == 1) return 'a';
        if (row == 2 && index == 1) return 'b';
        if (row == 2 && index == 2) return 'c';
        if (index <= Fibonacci(row - 1)) {
            row -= 2;
        } else {
            index -= Fibonacci(row - 1);
            row--;
        }
    }
}
There's one last thing to handle, which is what happens if there isn't a solution to the problem because the character is out of range. But we can easily fix this up:
string NthChar(int row, int index) {
    /* Preprocess the row to make it a small value. */
    if (row >= 46) {
        if (row % 2 == 0)
            row = 44;   /* even rows step down by 2 to 44 */
        else
            row = 45;   /* odd rows step down by 2 to 45 */
    }
    while (true) {
        if (row == 1 && index == 1) return "a";
        if (row == 2 && index == 1) return "b";
        if (row == 2 && index == 2) return "c";
        /* Bounds-checking. */
        if (row == 1) return "no solution";
        if (row == 2) return "no solution";
        if (index <= Fibonacci(row - 1)) {
            row -= 2;
        } else {
            index -= Fibonacci(row - 1);
            row--;
        }
    }
}
And we've got a working solution.
One further optimization you might do is precomputing all of the Fibonacci numbers that you'll need and storing them in a giant array. You only need Fibonacci values for F(2) through F(44), so you could do something like this:
const int kFibonacciNumbers[45] = {
0, 1, 1, 2, 3, 5,
8, 13, 21, 34, 55, 89,
144, 233, 377, 610,
987, 1597, 2584, 4181,
6765, 10946, 17711, 28657,
46368, 75025, 121393, 196418,
317811, 514229, 832040,
1346269, 2178309, 3524578,
5702887, 9227465, 14930352,
24157817, 39088169, 63245986,
102334155, 165580141, 267914296,
433494437, 701408733
};
With this precomputed array, the final version of the code would look like this:
string NthChar(int row, int index) {
    /* Preprocess the row to make it a small value. */
    if (row >= 46) {
        if (row % 2 == 0)
            row = 44;   /* even rows step down by 2 to 44 */
        else
            row = 45;   /* odd rows step down by 2 to 45 */
    }
    while (true) {
        if (row == 1 && index == 1) return "a";
        if (row == 2 && index == 1) return "b";
        if (row == 2 && index == 2) return "c";
        /* Bounds-checking. */
        if (row == 1) return "no solution";
        if (row == 2) return "no solution";
        if (index <= kFibonacciNumbers[row - 1]) {
            row -= 2;
        } else {
            index -= kFibonacciNumbers[row - 1];
            row--;
        }
    }
}
I have not yet tested this; to paraphrase Don Knuth, I've merely proved it correct. :-) But I hope this helps answer your question. I really loved this problem!
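Since the answer is explicitly untested, one cheap way to gain confidence is to compare the descent against brute force on small rows. The sketch below is self-contained and uses its own hypothetical names (nthCharSlow, nthCharFast, implementationsAgree); it assumes 1-based rows and indices, rows no larger than 45, and the comparison "index <= length of row - 2", which matches the "7 ≤ 8" step of the walkthrough.

```cpp
#include <string>

// Reference implementation: build the row explicitly (small rows only).
// Returns 0 when the 1-based index is out of range.
char nthCharSlow(int row, int index) {
    std::string prev = "a", curr = "bc";
    for (int i = 2; i < row; ++i) {
        std::string next = prev + curr;
        prev = curr;
        curr = next;
    }
    const std::string& s = (row == 1) ? prev : curr;
    if (index < 1 || index > static_cast<int>(s.size())) return 0;
    return s[index - 1];
}

// Descent over (row, index) pairs without building any string.
// Assumes 1 <= row <= 45. Returns 0 when the index is out of range.
char nthCharFast(int row, long long index) {
    long long len[47];                 // len[n] = length of row n
    len[1] = 1;
    len[2] = 2;
    for (int i = 3; i <= 46; ++i) len[i] = len[i - 2] + len[i - 1];
    if (index < 1 || index > len[row]) return 0;
    while (row > 2) {
        if (index <= len[row - 2]) {
            row -= 2;                  // character lies in the R(row - 2) part
        } else {
            index -= len[row - 2];     // skip R(row - 2), continue in R(row - 1)
            row -= 1;
        }
    }
    return row == 1 ? 'a' : "bc"[index - 1];
}

// True if both versions agree on every position of rows 1..maxRow,
// including one position past the end of each row.
bool implementationsAgree(int maxRow) {
    for (int row = 1; row <= maxRow; ++row) {
        for (int index = 1; ; ++index) {
            char slow = nthCharSlow(row, index);
            if (slow != nthCharFast(row, index)) return false;
            if (slow == 0) break;      // just walked past the end of the row
        }
    }
    return true;
}
```

Rows up to about 15 are enough to exercise both branches of the descent many times while keeping the brute-force strings tiny.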
I guess your general idea should be OK, but I don't see how your code is going to deal with larger values of K, because the numbers will get enormous quickly, and even with large integer libraries it might take virtually forever to compute fibonacci(10^9) exactly.
Fortunately, you are only asked about the first 10^9 characters. The string will reach that many characters already on the 44th line (f(44) = 1134903170).
And if I'm not mistaken, from there on the first 10^9 characters will be simply alternating between the prefixes of line 44 and 45, and therefore in pseudocode:
def solution(K, P):
if K > 45:
if K % 2 == 0:
return solution(44, P)
else:
return solution(45, P)
#solution for smaller values of K here
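The same reduction in C++, as a sketch (reduceRow is a made-up name). It relies on the fact that each row starts with the row two lines above it, so for positions up to 10^9 every row beyond 45 behaves like row 44 or row 45, depending on parity:

```cpp
// Maps a huge row number to an equivalent small one for positions P <= 10^9:
// rows beyond 45 share their first 10^9 characters with row 44 (even rows)
// or row 45 (odd rows). Smaller rows are returned unchanged.
int reduceRow(int k) {
    if (k > 45) return (k % 2 == 0) ? 44 : 45;
    return k;
}
```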
I found this. I did not do a pre-check (get the size of the k-th fibo string to test p against it) because if the check is successful you'll have to compute it anyway. Of course as soon as k becomes big, you may have an overflow issue (the length of the fibo string is an exponential function of the index n...).
#include <iostream>
#include <string>
using namespace std;
string fibo(unsigned int n)
{
if (n == 0)
return "a";
else if (n == 1)
return "bc";
else
return fibo(n - 2) + fibo(n - 1);
}
int main()
{
unsigned int k, p;
cin >> k >> p;
--k;
--p;
string fiboK = fibo(k);
if (p >= fiboK.size()) // p is zero-based here, so p == size() is already out of range
cout << "No solution" << endl;
else
cout << fiboK[p] << endl;
return 0;
}
EDIT: ok, I now see your point, i.e. checking in which part of the k-th string p resides (i.e. in string k - 2 or k - 1, and updating p if needed). Of course this is the good way to do it, since as I was saying above my naive solution will explode quite too quickly.
Your way looks correct to me from an algorithm point of view (saves memory and complexity).
I would have computed the K-th Fibonacci String, and then retrieve the P-th character of it. Something like that:
#include <iostream>
#include <string>
#include <vector>
std::string FibonacciString(unsigned int k)
{
std::vector<char> buffer;
buffer.push_back('a');
buffer.push_back('b');
buffer.push_back('c');
unsigned int curr = 1;
unsigned int next = 2;
while (k --)
{
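// Append a copy: inserting a vector's own range into itself is undefined behaviour.
std::vector<char> copy(buffer);
buffer.insert(
buffer.end(),
copy.begin(),
copy.end());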
buffer.erase(
buffer.begin(),
buffer.begin() + curr);
unsigned int prev = curr;
curr = next;
next = prev + next;
}
return std::string(
buffer.begin(),
buffer.begin() + curr);
}
int main(int argc, char** argv)
{
unsigned int k, p;
std::cin >> k >> p;
-- p;
-- k;
std::string fiboK = FibonacciString(k);
if (p >= fiboK.size()) // p is zero-based here, so p == size() is already out of range
std::cout << "No solution";
else
std::cout << fiboK[p];
std::cout << std::endl;
return 0;
}
It does use more memory than your version since it needs to store both the N-th and the (N+1)-th Fibonacci string at every instant. However, since it is really close to the definition, it does work for every value.
Your algorithm seems to have an issue when k is large while p is small. The test fib_numb[k] <= p will access an item outside the range of the vector with k = 30 and p = 1, won't it?
I made another example where each number of the Fibonacci series corresponds to a letter of the alphabet. So 1 is a, 2 is b, 3 is c, 5 is e, etc.:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string a = "abcdefghijklmnopqrstuvwxyz"; //the alphabet
string a1 = a.substr(0,0); string a2 = a.substr(1,1); string nexT = a.substr(0,0);
nexT = a1 + a2;
while(nexT.length() <= a.length())
{
//cout << nexT.length() << ", "; //show me the Fibonacci numbers
cout << a.substr(nexT.length()-1,1) << ", "; //show me the Fibonacci letters
a1 = a2;
a2 = nexT;
nexT = a1 + a2;
}
return 0;
}
Output: a, b, c, e, h, m, u,
Quote from Wikipedia, Fibonacci_word:
The nth digit of the word is 2 + ⌊nφ⌋ − ⌊(n+1)φ⌋ where φ is the golden ratio ...
(The only characters used in the Wikipedia page are 1 and 0.)
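To make the quoted formula concrete, here is a small sketch that evaluates it directly, reading the brackets as the floor function. The names fibonacciWordDigit and fibonacciWordPrefix are invented for this example, and double precision limits it to moderate n.

```cpp
#include <cmath>
#include <string>

// n-th digit (n >= 1) of the infinite Fibonacci word over {0, 1}:
// 2 + floor(n * phi) - floor((n + 1) * phi), with phi the golden ratio.
int fibonacciWordDigit(long long n) {
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    return static_cast<int>(2 + std::floor(n * phi) - std::floor((n + 1) * phi));
}

// First `length` digits of the word, as a string of '0' and '1'.
std::string fibonacciWordPrefix(int length) {
    std::string out;
    for (int n = 1; n <= length; ++n)
        out += static_cast<char>('0' + fibonacciWordDigit(n));
    return out;
}
```

Each digit costs constant time, which is the advantage over generating the word symbol by symbol.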
But note that the strings in the Wikipedia page, and in Knuth's Fundamental Algorithms, are built up in the opposite order of the above shown strings; there it becomes clear when the strings are listed, with ever repeating leading part, that there is only one infinitely long Fibonacci string. It is less clear when generated in the above used order, for the ever repeating part is the string's trailing part, but it is no less true. Therefore the term "the word" in the quotation, and, except for the question "is n too great for this row?", the row is not important.
Unhappily, though, it is too hard to apply this formula to the poster's problem, because in this formula the original strings are of the same length, and poster began with "a" and "bc".
This J(ava)Script script generates the Fibonacci string over the characters the poster chose, but in the opposite order. (It contains the Microsoft object WScript used for fetching command-line argument and outputting to the standard output.)
var u, v /*Fibonacci numbers*/, g, i, k, R;
v = 2;
u = 1;
k = 0;
g = +WScript.arguments.item(0); /*command-line argument for desired length of string*/
/*Two consecutive Fibonacci numbers, with the greater no less than the
Fibonacci string's length*/
while (v < g)
{ v += u;
u = v - u;
k = 1 - k;
}
i = u - k;
while (g-- > 0)
{ /*In this operation, i += u with i -= v when i >= v (carry),
since the Fibonacci numbers are relatively prime, i takes on
every value from 0 up to v. Furthermore, there are u carries,
and, therefore, u instances of character 'cb', and v-u instances
of 'a' (no-carry). The characters are spread as evenly as can be.*/
if ((i += u) < v)
{ R = 'a'; // WScript.StdOut.write('a'); /* no-carry */
} else
{ i -= v; /* carry */
R = 'cb'; // WScript.StdOut.write('cb')
}
}
/*result is in R*/ // WScript.StdOut.writeLine()
I suggest it because actually outputting the string is not required. One can simply stop at the desired length, and show the last thing about to be outputted. (The code for output is commented out with '//'). Of course, using this to find the character at position n has cost proportional to n. The formula at the top costs much less.