Porting Python algorithm to C++ - different solution - c++

Thank you all for helping. Below this post I put the corrected versions of both scripts, which now produce identical output.
Hello,
I have written a little brute-force string generation script in Python to generate all possible combinations of an alphabet within a given length. It works quite nicely, but because I want it to be faster I am trying to port it to C++.
The problem is that my C++ code is creating far too many combinations for one word.
Here's my example in Python:
./test.py
gives me
aaa
aab
aac
aad
aa
aba
....
while ./test (the C++ program) gives me
aaa
aaa
aaa
aaa
aa
Here I also get all possible combinations, but I get them twice or more.
Here is the code for both programs:
#!/usr/bin/env python
import sys
#Brute String Generator
#Start it with ./brutestringer.py 4 6 "abcdefghijklmnopqrstuvwxyz1234567890" ""
#will produce all strings with length 4 to 6 and chars from a to z and numbers 0 to 9
def rec(w, p, baseString):
    for c in "abcd":
        if (p<w - 1):
            rec(w, p + 1, baseString + "%c" % c)
        print baseString
for b in range(3,4):
    rec(b, 0, "")
And here the C++ Code
#include <iostream>
using namespace std;
string chars="abcd";
void rec(int w,int b,string p){
unsigned int i;
for(i=0;i<chars.size();i++){
if(b < (w-1)){
rec(w, (b+1), p+chars[i]);
}
cout << p << "\n";
}
}
int main ()
{
int a=3, b=0;
rec (a+1,b, "");
return 0;
}
Does anybody see my mistake? I don't have much experience with C++.
Thanks indeed
Here the corrected version:
C++
#include <iostream>
using namespace std;
string chars="abcd";
void rec(int w,int b,string p){
unsigned int i;
for(i=0;i<chars.size();i++){
if(b < (w)){
rec(w, (b+1), p+chars[i]);
}
}
cout << p << "\n";
}
int main ()
{
rec (3,0, "");
return 0;
}
Python
#!/usr/bin/env python
import sys
def rec(w, b, p):
    for c in "abcd":
        if (b < w - 1):
            rec(w, b + 1, p + "%c" % c)
    print p
rec(4, 0, "")
Equal Output:
$ ./test > 1
$ ./test.py 3 3 "abcd" "" > 2
$ diff 1 2
$

I think the Python code is also broken, but maybe you don't notice because the print is indented by one space too many (hey, now I've seen a Python program with an off-by-one error!)
Shouldn't the output only happen in the else case? And the reason why the output happens more often is that you call print/cout 4 times. I suggest to change the code:
def rec(w, p, baseString):
    if w == p:
        print baseString
    else:
        for ...

Just out of curiosity, is this fast enough?
import itertools, string
alphabet = string.lowercase + string.digits
for numchars in (3, 4):
    for x in itertools.product(alphabet, repeat=numchars):
        print ''.join(x)
(And make sure you're redirecting output to a file; scrolling huge amounts of text up the screen can be surprisingly slow).

In rec the string p gets printed in every iteration of the loop:
for(i=0;i<chars.size();i++){
// ...
cout << p << "\n";
}
The Python code you posted seems to do the same, but maybe there is something mixed up with the indentation there? Did you maybe mix tabs and spaces in the Python file, leading to surprising results?

You say...:
./test.py
gives me
aaa
aab
(etc), but that's not true of the code you posted: what you get instead is
aa
aa
aa
aa
a
with four repetitions of the leading aa, etc etc. Of course you do: you have the print baseString statement inside the loop of for c in "abcd":, so necessarily it's executed four times. I imagine you want that print out of the loop -- and similarly for the C++ code, where you've also put the output statement smack inside the loop, so it gets repeated.
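To make the point of the answers above concrete, here is a minimal C++ sketch in which each word is emitted exactly once, in the base case, rather than inside the loop. The names generate and allWords are hypothetical, and unlike the corrected programs in the question this version only produces words of exactly the requested length, which is an assumption about what is wanted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends every string of exactly `length` characters drawn from `alphabet`
// to `out`. `prefix` holds the characters chosen so far.
void generate(const std::string& alphabet, std::size_t length,
              std::string& prefix, std::vector<std::string>& out) {
    if (prefix.size() == length) {  // base case: the word is complete
        out.push_back(prefix);      // emitted once, outside any loop
        return;
    }
    for (char c : alphabet) {
        prefix.push_back(c);        // choose the next character
        generate(alphabet, length, prefix, out);
        prefix.pop_back();          // undo the choice before trying the next one
    }
}

// Convenience wrapper (hypothetical name) that returns all words of one length.
std::vector<std::string> allWords(const std::string& alphabet, std::size_t length) {
    std::vector<std::string> out;
    std::string prefix;
    generate(alphabet, length, prefix, out);
    return out;
}
```

Calling allWords("abcd", 3) yields the 64 words from aaa to ddd. For large alphabets it would be better to print each word in the base case instead of collecting them all in memory.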

Related

How to make my CodeChef solution code faster?

I am a beginner currently in first semester. I have been practising on Code Chef and am stuck at this problem. They are asking to reduce the execution time of my code. The problem goes as follows:
Meliodas and Ban are fighting over chocolates. Meliodas has X chocolates, while Ban has Y. Whoever has lesser number of chocolates eats as many chocolates as he has from the other's collection. This eatfest war continues till either they have the same number of chocolates, or at least one of them is left with no chocolates.
Can you help Elizabeth predict the total no of chocolates they'll be left with at the end of their war?
Input:
First line will contain T, number of testcases. Then the testcases follow.
Each testcase contains of a single line of input, which contains two integers X,Y, the no of chocolates Meliodas and Ban have, respectively.
Output:
For each testcase, output in a single line the no of chocolates that remain after Ban and Meliodas stop fighting.
Sample Input:
3
5 3
10 10
4 8
Sample Output:
2
20
8
My code is as follows:
#include <iostream>
using namespace std;
int main()
{
unsigned int t,B,M;
cin>>t;
while(t--)
{
cin>>M>>B;
if(B==M)
{
cout<<B+M<<endl;
}
else
{
for(int i=1;B!=M;i++)
{
if(B>M)
B=B-M;
else
M=M-B;
}
cout<<M+B<<endl;
}
}
return 0;
}
Assuming that B and M are both different from 0, this algorithm corresponds to one version of the Euclidean algorithm. Therefore, you can simply write (std::gcd is declared in <numeric> since C++17):
std::cout << 2 * std::gcd(B, M) << "\n";
If at least one of the quantities is equal to 0, then just print B + M.
After realizing that your code was correct, I wondered where there could be any algorithmic improvement. And I realized that eating as many chocolates from the peer as one has is in fact close to a modulo operation. If both numbers are close, a subtraction could be slightly faster than a modulo, but if one number is high while the other is 1, the modulo gives the result immediately instead of looping a great number of times...
The key to preventing stupid errors is to realize that if a modulo is 0, the higher number is a multiple of the lower one, and we must stop immediately and write twice the lower value.
And care should be taken that if one of the initial counts is 0, the total number will never change.
So the outer loop should become:
if(B==M || B == 0 || M == 0)
{
cout<<B+M<<'\n';
}
else {
for (;;) {
if (M < B) {
B = B % M;
if (B == 0) {
cout << M * 2 << '\n';
break;
}
}
else {
M = M % B;
if (M == 0) {
cout << B * 2 << '\n';
break;
}
}
}
}
...
Note: no infinite loop is possible here, because a modulo ensures that if, for example, M > B > 0, then after M = M % B you will have B > M >= 0, and as the case == 0 is explicitly handled, the number of iterations cannot be higher than the lower number.
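The two answers above can be combined into one small function. This is only a sketch: the name chocolatesLeft is hypothetical, the 64-bit type is an assumption about the input range, and std::gcd requires C++17.

```cpp
#include <numeric>  // std::gcd (C++17)

// Total number of chocolates left once the fight stops.
unsigned long long chocolatesLeft(unsigned long long x, unsigned long long y) {
    if (x == 0 || y == 0) {
        return x + y;            // nobody can eat anything, the total never changes
    }
    return 2 * std::gcd(x, y);   // both end up holding gcd(x, y) chocolates
}
```

For the sample input, chocolatesLeft(5, 3), chocolatesLeft(10, 10) and chocolatesLeft(4, 8) give 2, 20 and 8, matching the sample output.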

Doing a scenario problem in C++ and am unsure of how to proceed with boolean and if-loops

This is a translation, very hastily written. If you need any clarification, just comment.
The question given is:
We live in a world with too much garbage. We have found a way to compress the garbage, but it can only be done in a specific way, or else the garbage will explode. The garbage has to be laid out in a line, and it can only be compressed with its neighbor, and only if its neighbor has the same value as it.
The first input is int N, and it represents the amount of garbage in the row. The second input is t, and it must have an input of as many characters as the value in N. If the whole thing is able to be compressed until there's only 1 garbage (t) left, then the output will be "YES".
We've figured out that as long as either N == 1, or all inputs in t (all the characters) are the same, the output will be YES.
Example inputs/outputs:
Input:
2
1 1
Output:
YES
Or
Input:
3
1 2 1
Output:
NO
Or
Input:
1
5
Output:
YES
Here's what we've got so far:
#include <iostream>
#include <string>
using namespace std;
int N;
string t;
bool allCharactersSame(string s)
{
int n = s.length();
for (int i = 1; i < n; i++) {
if (s[i] != s[0])
return false;
}
return true;
}
int main()
{
cin>>N;
cin >> t;
if (N == 1)
{
cout << "YES";
}
else if (allCharactersSame(t))
{
cout <<"YES";
}
else
{
cout<<"NO";
}
}
The problem with this is that it outputs YES no matter what, and we think it's because it takes the whitespace of the input into consideration. If we don't include spaces, it works fine. BUT the question dictates that we Have To have spaces separating our inputs. So, we're stumped. Any suggestions?
(I can't comment, therefore I write this as an answer.)
The problem is something other than what you think, because the code in the question works as it should. When I gave it the input "5 11111" it said "YES"; when I gave it "5 12345" it said "NO".
Kai's first comment is slightly weird: when determining whether all characters in a string are the same, it is sufficient to compare each of them to the first one, just as you do in your allCharactersSame() function.
I'd suggest you add some checks on the provided input; the program should probably notice if the given N doesn't match the given string's length, and it should probably notice when the given string doesn't consist of digits. As it is now, e.g. input "3 a" says "YES".
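One thing worth checking with space-separated input: cin >> t stops at the first whitespace, so for the input "1 2 1" the string t holds only "1" and the comparison loop never runs. A minimal sketch that reads the N items one by one is shown here. The names canCompress and canCompressText are hypothetical, the YES/NO rule is the one stated in the question, and treating malformed input as "NO" is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads N followed by N whitespace-separated items from `in` and returns
// "YES" if all items are equal (which includes N == 1), otherwise "NO".
std::string canCompress(std::istream& in) {
    int n = 0;
    if (!(in >> n) || n < 1) return "NO";  // assumption: malformed input counts as "NO"
    std::string first, item;
    if (!(in >> first)) return "NO";
    bool same = true;
    for (int i = 1; i < n; ++i) {
        if (!(in >> item)) return "NO";    // fewer items than announced
        if (item != first) same = false;
    }
    return same ? "YES" : "NO";
}

// Helper for trying the function on a literal input string.
std::string canCompressText(const std::string& text) {
    std::istringstream in(text);
    return canCompress(in);
}
```

In a real program one would call canCompress(std::cin) and print the result.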

Converting letters to numbers in C++

PROBLEM SOLVED: thanks everyone!
I am almost entirely new to C++ so I apologise in advance if the question seems trivial.
I am trying to convert a string of letters to a set of 2 digit numbers where a = 10, b = 11, ..., Y = 34, Z = 35 so that (for example) "abc def" goes to "101112131415". How would I go about doing this? Any help would really be appreciated. Also, I don't mind whether capitalization results in the same number or a different number. Thank you very much in advance. I probably won't need it for a few days but if anyone is feeling particularly nice how would I go about reversing this process? i.e. "101112131415" --> "abcdef" Thanks.
EDIT: This isn't homework, I'm entirely self taught. I have completed this project before in a different language and decided to try C++ to compare the differences and try to learn C++ in the process :)
EDIT: I have roughly what I want, I just need a little bit of help converting this so that it applies to strings, thanks guys.
#include <iostream>
#include <sstream>
#include <string>
int returnVal (char x)
{
return (int) x - 87;
}
int main()
{
char x = 'g';
std::cout << returnVal(x);
}
A portable method is to use a table lookup:
const unsigned int letter_to_value[] =
{10, 11, 12, /*...*/ 35};
// ...
letter = toupper(letter);
const unsigned int index = letter - 'A';
value = letter_to_value[index];
cout << value;
Each character has its ASCII value. Try converting your characters into ASCII and then manipulate the difference.
Example:
int x = 'a';
cout << x;
will print 97; and
int x = 'a';
cout << x - 87;
will print 10.
Hence, you could write a function like this:
int returnVal(char x)
{
return (int)x - 87;
}
to get the required output.
And your main program could look like:
int main()
{
string s = "abcdef";
for (unsigned int i = 0; i < s.length(); i++)
{
cout << returnVal(s[i]);
}
return 0;
}
This is a simple way to do it, if not messy.
map<char, int> vals; // maps a character to an integer
int g = 1; // if a needs to be 10 then set g = 10
string alphabet = "abcdefghijklmnopqrstuvwxyz";
for(char c : alphabet) { // kooky krazy for loop
vals[c] = g;
g++;
}
What Daniel said, try it out for yourself.
As a starting point though, casting:
int i = (int)string[0] + offset;
will get you your number from a character; stringstream will be useful too.
How would I go about doing this?
By trying to do something first, and looking for help only if you feel you cannot advance.
That being said, the most obvious solution that comes to mind is based on the fact that characters (i.e. 'a', 'G') are really numbers. Suppose you have the following:
char c = 'a';
You can get the number associated with c by doing:
int n = static_cast<int>(c);
Then, add some offset to 'n':
n += 10;
...and cast it back to a char:
c = static_cast<char>(n);
Note: The above assumes that characters are consecutive, i.e. the number corresponding to 'a' is equal to the one corresponding to 'z' minus the amount of letters between the two. This usually holds, though.
This can work
int Number = 123; // number to be converted to a string
string Result; // string which will contain the result
ostringstream convert; // stream used for the conversion
convert << Number; // insert the textual representation of 'Number' in the characters in the stream
Result = convert.str(); // set 'Result' to the contents of the stream
You should add these headers:
#include <sstream>
#include <string>
Many answers will tell you that characters are encoded in ASCII and that you can convert a letter to an index by subtracting 'a'.
This is not proper C++. It is acceptable when your program requirements include a specification that ASCII is in use. However, the C++ standard alone does not require this. There are C++ implementations with other character sets.
In the absence of knowledge that ASCII is in use, you can use translation tables:
#include <limits.h>
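// Define a table to translate from characters to desired codes.
// The [index] = value initializers are C99 syntax; standard C++ has no array
// designators, so C++ compilers accept them only as an extension.
// The table needs UCHAR_MAX + 1 entries to cover every unsigned char value.
static unsigned int Translate[UCHAR_MAX + 1] =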
{
['a'] = 10,
['b'] = 11,
…
};
Then you may translate characters to numbers by looking them up in the table:
unsigned char x = something;
int result = Translate[x];
Once you have the translation, you could print it as two digits using printf("%02d", result);.
Translating in the other direction requires reading two characters, converting them to a number (interpreting them as decimal), and performing a similar translation. You might have a different translation table set up for this reverse translation.
Just do this:
(s[i] - 'A' + 1)
Basically, we convert a char to a number by subtracting 'A' from it and then adding 1, so that 'A' maps to 1, 'B' to 2, and so on. (For the mapping in the question, where the first letter is 10, add 10 instead of 1.)
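Putting the answers above together, here is a rough sketch of both directions. It looks letters up in an explicit alphabet string, so it does not depend on the letters being contiguous in the character set. The names kAlphabet, encodeLetters and decodeLetters are hypothetical, and skipping non-letters (such as the space in "abc def") is an assumption based on the example in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

const std::string kAlphabet = "abcdefghijklmnopqrstuvwxyz";

// "abc def" -> "101112131415": each letter becomes 10 + its position in
// kAlphabet; upper case is folded to lower case, other characters are skipped.
std::string encodeLetters(const std::string& text) {
    std::string out;
    for (char ch : text) {
        char lower = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        std::size_t pos = kAlphabet.find(lower);
        if (pos != std::string::npos) {
            out += std::to_string(10 + pos);  // always two digits: 10..35
        }
    }
    return out;
}

// "101112131415" -> "abcdef": reads two digits at a time.
// Pairs outside 10..35 are ignored (assumption).
std::string decodeLetters(const std::string& digits) {
    std::string out;
    for (std::size_t i = 0; i + 1 < digits.size(); i += 2) {
        int value = (digits[i] - '0') * 10 + (digits[i + 1] - '0');
        if (value >= 10 && value <= 35) {
            out += kAlphabet[value - 10];
        }
    }
    return out;
}
```

Because every code has exactly two digits, decoding needs no separators. The digit arithmetic with '0' is safe, since the decimal digits are guaranteed to be contiguous.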

Very large execution time differences for virtually same C++ and Python code

I was trying to write a solution for Problem 12 (Project Euler) in Python. The solution was just too slow, so I tried checking up other people's solution on the internet. I found this code written in C++ which does virtually the same exact thing as my python code, with just a few insignificant differences.
Python:
def find_number_of_divisiors(n):
    if n == 1:
        return 1
    div = 2 # 1 and the number itself
    for i in range(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div
def tri_nums():
    n = 1
    t = 1
    while 1:
        yield t
        n += 1
        t += n
t = tri_nums()
m = 0
for n in t:
    d = find_number_of_divisiors(n)
    if m < d:
        print n, ' has ', d, ' divisors.'
        m = d
    if m == 320:
        exit(0)
C++:
#include <cstdlib>
#include <iostream>
int main(int argc, char *argv[])
{
unsigned int iteration = 1;
unsigned int triangle_number = 0;
unsigned int divisor_count = 0;
unsigned int current_max_divisor_count = 0;
while (true) {
triangle_number += iteration;
divisor_count = 0;
for (int x = 2; x <= triangle_number / 2; x ++) {
if (triangle_number % x == 0) {
divisor_count++;
}
}
if (divisor_count > current_max_divisor_count) {
current_max_divisor_count = divisor_count;
std::cout << triangle_number << " has " << divisor_count
<< " divisors." << std::endl;
}
if (divisor_count == 318) {
exit(0);
}
iteration++;
}
return 0;
}
The Python code takes 1 minute and 25.83 seconds to execute on my machine, while the C++ code takes around 4.628 seconds. It's about 18x faster. I had expected the C++ code to be faster, but not by this great a margin, and certainly not for a simple solution that consists of just 2 loops and a bunch of increments and mods.
Although I would appreciate answers on how to solve this problem, the main question I want to ask is: why is the C++ code so much faster? Am I doing something wrong in Python?
Replacing range with xrange:
After replacing range with xrange the python code takes around 1 minute 11.48 seconds to execute. (Around 1.2x faster)
This is exactly the kind of code where C++ is going to shine compared to Python: a single fairly tight loop doing arithmetic ops. (I'm going to ignore algorithmic speedups here, because your C++ code uses the same algorithm, and it seems you're explicitly not asking for that...)
C++ compiles this kind of code down to relatively few processor instructions (and everything it does probably fits in the super-fast levels of CPU cache), while Python goes through many levels of indirection for each operation. For example, every time you increase a number, it checks whether the number just overflowed and needs to be moved into a bigger data type.
That said, all is not necessarily lost! This is also the kind of code that a just-in-time compiler system like PyPy will do well at, since once it's gone through the loop a few times it compiles the code to something similar to what the C++ code starts at. On my laptop:
$ time python2.7 euler.py >/dev/null
python euler.py 72.23s user 0.10s system 97% cpu 1:13.86 total
$ time pypy euler.py >/dev/null
pypy euler.py > /dev/null 13.21s user 0.03s system 99% cpu 13.251 total
$ clang++ -o euler euler.cpp && time ./euler >/dev/null
./euler > /dev/null 2.71s user 0.00s system 99% cpu 2.717 total
using the version of the Python code with xrange instead of range. Optimization levels don't make a difference for me with the C++ code, and neither does using GCC instead of Clang.
While we're at it, this is also a case where Cython can do very well: it compiles almost-Python code to C code that uses the Python APIs, but uses raw C when possible. If we change your code just a little bit by adding some type declarations and removing the iterator (since I don't know how to handle those efficiently in Cython), we get
cdef int find_number_of_divisiors(int n):
    cdef int i, div
    if n == 1:
        return 1
    div = 2 # 1 and the number itself
    for i in xrange(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div
cdef int m, n, t, d
m = 0
n = 1
t = 1
while True:
    n += 1
    t += n
    d = find_number_of_divisiors(t)
    if m < d:
        print t, ' has ', d, ' divisors.'  # t is the triangular number, n only its index
        m = d
    if m == 320:
        exit(0)
then on my laptop I get
$ time python -c 'import euler_cy' >/dev/null
python -c 'import euler_cy' > /dev/null 4.82s user 0.02s system 98% cpu 4.941 total
(within a factor of 2 of the C++ code).
Rewriting the divisor counting algorithm to use the divisor function reduces the run time to less than 1 second. It is still possible to make it faster, but not really necessary.
This is to show that before you try any optimization trick with language features and compilers, you should check whether your algorithm is the bottleneck. The compiler/interpreter trick is indeed quite powerful, as shown in Dougal's answer, where the gap between Python and C++ is closed for equivalent code. However, as you can see, the change in algorithm immediately gives a huge performance boost and lowers the run time to around the level of the algorithmically inefficient C++ code (I didn't test the C++ version, but on my 6-year-old computer, the code below finishes running in ~0.6s).
The code below is written and tested with Python 3.2.3.
import math
def find_number_of_divisiors(n):
    if n == 1:
        return 1
    num = 1
    count = 1
    div = 2
    while (n % div == 0):
        n //= div
        count += 1
    num *= count
    div = 3
    while (div <= pow(n, 0.5)):
        count = 1
        while n % div == 0:
            n //= div
            count += 1
        num *= count
        div += 2
    if n > 1:
        num *= 2
    return num
Here's my own variant built on nhahtdh's factor-counting optimization plus my own prime factorization code:
def prime_factors(x):
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x /= factor
            factors.append(factor)
        return x, factors
    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors
def product(series):
    from operator import mul
    return reduce(mul, series, 1)
def factor_count(n):
    from collections import Counter
    c = Counter(prime_factors(n))
    return product([cc + 1 for cc in c.values()])
def tri_nums():
    n, t = 1, 1
    while 1:
        yield t
        n += 1
        t += n
if __name__ == '__main__':
    m = 0
    for n in tri_nums():
        d = factor_count(n)
        if m < d:
            print n, ' has ', d, ' divisors.'
            m = d
        if m == 320:
            break
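For comparison with the brute-force C++ program in the question, the factorization idea from the last two answers can be sketched in C++ as well. If n = p1^a1 * ... * pk^ak, the number of divisors is (a1 + 1) * ... * (ak + 1). The function names are hypothetical and this is plain trial division, not a tuned implementation.

```cpp
#include <cstdint>

// Number of divisors of n, computed from its prime factorization.
std::uint64_t countDivisors(std::uint64_t n) {
    if (n == 0) return 0;  // assumption: treat 0 as having no divisors here
    std::uint64_t result = 1;
    for (std::uint64_t p = 2; p <= n / p; ++p) {  // p * p <= n without overflow
        std::uint64_t exponent = 0;
        while (n % p == 0) {  // strip out every factor p
            n /= p;
            ++exponent;
        }
        result *= exponent + 1;
    }
    if (n > 1) result *= 2;  // a single prime factor above the square root remains
    return result;
}

// First triangular number with more than `limit` divisors.
std::uint64_t firstTriangleWithMoreDivisorsThan(std::uint64_t limit) {
    std::uint64_t triangle = 0;
    for (std::uint64_t i = 1;; ++i) {
        triangle += i;  // 1, 3, 6, 10, ...
        if (countDivisors(triangle) > limit) return triangle;
    }
}
```

The inner loop now runs up to roughly the square root of the number instead of half of it, which is where the speedup in the Python versions above comes from.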

Wildcard String Search Algorithm

In my program I need to search in a quite big string (~1 mb) for a relatively small substring (< 1 kb).
The problem is that the string contains simple wildcards in the sense of "a?c", which means I want to search for strings like "abc" or "apc", ... (I am only interested in the first occurrence).
Until now I use the trivial approach (here in pseudocode)
algorithm "search", input: haystack(string), needle(string)
for(i = 0, i < length(haystack), ++i)
    if(!CompareMemory(haystack+i,needle,length(needle)))
        return i;
return -1; (Not found)
Where "CompareMemory" returns 0 iff the first and second argument are identical (also concerning wildcards) only regarding the amount of bytes the third argument gives.
My question is now whether there is a fast algorithm for this (you don't have to give it, but if you do I would prefer C++, C or pseudocode). I started here, but I think most of the fast algorithms don't allow wildcards (by the way they exploit the nature of strings).
I hope the format of the question is ok because I am new here, thank you in advance!
A fast way, which is kind of the same thing as using a regexp, (which I would recommend anyway), is to find something that is fixed in needle, "a", but not "?", and search for it, then see if you've got a complete match.
j = firstNonWildcardPos(needle)
for(i = j, i <= length(haystack)-length(needle)+j, ++i)
    if(haystack[i] == needle[j])
        if(!CompareMemory(haystack+i-j,needle,length(needle)))
            return i-j;
return -1; (Not found)
A regexp would generate code similar to this (I believe).
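The anchor-character idea in this answer can be sketched in C++ roughly as follows. The names matchesAt and findWildcard are hypothetical, '?' as the wildcard follows the question, and letting std::string::find jump between occurrences of the fixed character is a design choice of this sketch rather than part of the answer.

```cpp
#include <cstddef>
#include <string>

// True if `needle` matches `haystack` at `start`; `wild` in the needle matches
// any single character. The caller guarantees that the range fits.
bool matchesAt(const std::string& haystack, std::size_t start,
               const std::string& needle, char wild) {
    for (std::size_t k = 0; k < needle.size(); ++k) {
        if (needle[k] != wild && needle[k] != haystack[start + k]) return false;
    }
    return true;
}

// Index of the first match of `needle` in `haystack`, or std::string::npos.
std::size_t findWildcard(const std::string& haystack, const std::string& needle,
                         char wild = '?') {
    if (needle.size() > haystack.size()) return std::string::npos;
    const std::size_t lastStart = haystack.size() - needle.size();
    const std::size_t j = needle.find_first_not_of(wild);
    if (j == std::string::npos) return 0;  // empty or all-wildcard needle
    // Visit only positions where the fixed character needle[j] occurs.
    for (std::size_t hit = haystack.find(needle[j], j);
         hit != std::string::npos && hit - j <= lastStart;
         hit = haystack.find(needle[j], hit + 1)) {
        if (matchesAt(haystack, hit - j, needle, wild)) return hit - j;
    }
    return std::string::npos;
}
```

The return value is the start of the match (the anchor position minus j), not the position of the anchor character itself. Picking a rare fixed character instead of the first one would reduce the number of full comparisons.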
Among strings over an alphabet of c characters, let S have length s and let T_1 ... T_k have average length b. S will be searched for each of the k target strings. (The problem statement doesn't mention multiple searches of a given string; I mention it below because in that paradigm my program does well.)
The program uses O(s+c) time and space for setup, and (if S and the T_i are random strings) O(k*u*s/c) + O(k*b + k*b*s/c^u) total time for searching, with u=3 in program as shown. For longer targets, u should be increased, and rare, widely-separated key characters chosen.
In step 1, the program creates an array L of s+TsizMax integers (in program, TsizMax = allowed target length) and uses it for c lists of locations of next occurrences of characters, with list heads in H[] and tails in T[]. This is the O(s+c) time and space step.
In step 2, the program repeatedly reads and processes target strings. Step 2A chooses u = 3 different non-wild key characters (in current target). As shown, the program just uses the first three such characters; with a tiny bit more work, it could instead use the rarest characters in the target, to improve performance. Note, it doesn't cope with targets with fewer than three such characters.
The line "L[T[r]] = L[g+i] = g+i;" within Step 2A sets up a guard cell in L with proper delta offset so that Step 2G will automatically execute at end of search, without needing any extra testing during the search. T[r] indexes the tail cell of the list for character r, so cell L[g+i] becomes a new, self-referencing, end-of-list for character r. (This technique allows the loops to run with a minimum of extraneous condition testing.)
Step 2B sets vars a,b,c to head-of-list locations, and sets deltas dab, dac, and dbc corresponding to distances between the chosen key characters in target.
Step 2C checks if key characters appear in S. This step is necessary because otherwise a while loop in Step 2E will hang. We don't want more checks within those while loops because they are the inner loops of search.
Step 2D does steps 2E to 2I until var c points to after end of S, at which point it is impossible to make any more matches.
Step 2E consists of u = 3 while loops, that "enforce delta distances", that is, crawl indexes a,b,c along over each other as long as they are not pattern-compatible. The while loops are fairly fast, each being in essence (with ++si instrumentation removed) "while (v+d < w) v = L[v]" for various v, d, w. Replicating the three while loops a few times may increase performance a little and will not change net results.
In Step 2G, we know that the u key characters match, so we do a complete compare of target to match point, with wild-character handling. Step 2H reports result of compare. Program as given also reports non-matches in this section; remove that in production.
Step 2I advances all the key-character indexes, because none of the currently-indexed characters can be the key part of another match.
You can run the program to see a few operation-count statistics. For example, the output
Target 5=<de?ga>
012345678901234567890123456789012345678901
abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg
# 17, de?ga and de3ga match
# 24, de?ga and defg4 differ
# 31, de?ga and defga match
Advances: 'd' 0+3 'e' 3+3 'g' 3+3 = 6+9 = 15
shows that Step 2G was entered 3 times (ie, the key characters matched 3 times); the full compare succeeded twice; step 2E while loops advanced indexes 6 times; step 2I advanced indexes 9 times; there were 15 advances in all, to search the 42-character string for the de?ga target.
/* jiw
$Id: stringsearch.c,v 1.2 2011/08/19 08:53:44 j-waldby Exp j-waldby $
Re: Concept-code for searching a long string for short targets,
where targets may contain wildcard characters.
The user can enter any number of targets as command line parameters.
This code has 2 long strings available for testing; if the first
character of the first parameter is '1' the jay[42] string is used,
else kay[321].
Eg, for tests with *hay = jay use command like
./stringsearch 1e?g a?cd bc?e?g c?efg de?ga ddee? ddee?f
or with *hay = kay,
./stringsearch bc?e? jih? pa?j ?av??j
to exercise program.
Copyright 2011 James Waldby. Offered without warranty
under GPL v3 terms as at http://www.gnu.org/licenses/gpl.html
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
//================================================
int main(int argc, char *argv[]) {
char jay[]="abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg";
char kay[]="ludehkhtdiokihtmaihitoia1htkjkkchajajavpajkihtijkhijhipaja"
"etpajamhkajajacpajihiatokajavtoia2pkjpajjhiifakacpajjhiatkpajfojii"
"etkajamhpajajakpajihiatoiakavtoia3pakpajjhiifakacpajjhkatvpajfojii"
"ihiifojjjjhijpjkhtfdoiajadijpkoia4jihtfjavpapakjhiifjpajihiifkjach"
"ihikfkjjjjhijpjkhtfdoiajakijptoik4jihtfjakpapajjkiifjpajkhiifajkch";
char *hay = (argc>1 && argv[1][0]=='1')? jay:kay;
enum { chars=1<<CHAR_BIT, TsizMax=40, Lsiz=TsizMax+sizeof kay, L1, L2 };
int L[L2], H[chars], T[chars], g, k, par;
// Step 1. Make arrays L, H, T.
for (k=0; k<chars; ++k) H[k] = T[k] = L1; // Init H and T
for (g=0; hay[g]; ++g) { // Make linked character lists for hay.
k = hay[g]; // In same loop, could count char freqs.
if (T[k]==L1) H[k] = T[k] = g;
T[k] = L[T[k]] = g;
}
// Step 2. Read and process target strings.
for (par=1; par<argc; ++par) {
int alpha[3], at[3], a=g, b=g, c=g, da, dab, dbc, dac, i, j, r;
char * targ = argv[par];
enum { wild = '?' };
int sa=0, sb=0, sc=0, ta=0, tb=0, tc=0;
printf ("Target %d=<%s>\n", par, targ);
// Step 2A. Choose 3 non-wild characters to follow.
// As is, chooses first 3 non-wilds for a,b,c.
// Could instead choose 3 rarest characters.
for (j=0; j<3; ++j) alpha[j] = -j;
for (i=j=0; targ[i] && j<3; ++i)
if (targ[i] != wild) {
r = alpha[j] = targ[i];
if (alpha[0]==alpha[1] || alpha[1]==alpha[2]
|| alpha[0]==alpha[2]) continue;
at[j] = i;
L[T[r]] = L[g+i] = g+i;
++j;
}
if (j != 3) {
printf (" Too few target chars\n");
continue;
}
// Step 2B. Set a,b,c to head-of-list locations, set deltas.
da = at[0];
a = H[alpha[0]]; dab = at[1]-at[0];
b = H[alpha[1]]; dbc = at[2]-at[1];
c = H[alpha[2]]; dac = at[2]-at[0];
// Step 2C. See if key characters appear in haystack
if (a >= g || b >= g || c >= g) {
printf (" No match on some character\n");
continue;
}
for (g=0; hay[g]; ++g) printf ("%d", g%10);
printf ("\n%s\n", hay); // Show haystack, for user aid
// Step 2D. Search for match
while (c < g) {
// Step 2E. Enforce delta distances
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
// Step 2F. See if delta distances were met
if (a+dab==b && b+dbc==c && c<g) {
// Step 2G. Yes, so we have 3-letter-match and need to test whole match.
r = a-da;
for (k=0; targ[k]; ++k)
if ((hay[r+k] != targ[k]) && (targ[k] != wild))
break;
printf ("# %3d, %s and ", r, targ);
for (i=0; targ[i]; ++i) putchar(hay[r++]);
// Step 2H. Report match, if found
puts (targ[k]? " differ" : " match");
// Step 2I. Advance all of a,b,c, to go on looking
a = L[a]; ++ta;
b = L[b]; ++tb;
c = L[c]; ++tc;
}
}
printf ("Advances: '%c' %d+%d '%c' %d+%d '%c' %d+%d = %d+%d = %d\n",
alpha[0], sa,ta, alpha[1], sb,tb, alpha[2], sc,tc,
sa+sb+sc, ta+tb+tc, sa+sb+sc+ta+tb+tc);
}
return 0;
}
Note, if you like this answer better than current preferred answer, unmark that one and mark this one. :)
Regular expressions usually use a finite-state-automaton-based search, I think. Try implementing that.