Preassign decision variables of a tuple set (tuples)

I am trying to solve a flexible job-shop problem and want to add some additional precedence constraints. To do this, I want to preassign/constrain some decision variables indexed over a tuple set.
While the generic formulation of constraints and accessing the tuple data with the forall operator do not seem to be a problem (as in (1)), I am struggling to formulate a constraint on specific decision variables, as in (2), which fails with "the type boolean can not be used for <>".
(1) forall (t in DATA) // if the operation is not assigned to resource k, start and end time = 0
StartingTime[t] + CompletionTime[t] <= x[t]*L;
// 84 constraints, 136 variables (45 binary, 91 other)
(2) forall (t in DATA)
jobCompletionTime[t.job == 1] <= StartingTime[t.job == 10];
My model without the constraints:
tuple Columns { // declaration of the tuple type
int operation; // operation
int job; // job
int pos; // position of the operation within the job
string posID;
string ressource; // resource
float prozesszeit; // processing time
};
{Columns} DATA= ...; // declaration of the tuple set
//Columns TupelTeilmenge = item(DATA, i); --> tuple subset (i-th tuple in the table)
//int praezedenz[jobs][jobs]=...; // precedence matrix
// decision variables
dvar boolean x[DATA]; // 1 if operation ij is processed on resource k (selection of the respective tuple); 0 otherwise
dvar float+ StartingTime[DATA]; // start time of the tuple
dvar float+ CompletionTime[DATA]; // end time of the tuple
dvar boolean Y[DATA][DATA]; // 1 if operation ij (tuple t) is a successor of operation i'j' (tuple t2) on resource k; 0 otherwise
dvar float+ jobCompletionTime[DATA]; // caution: jobs are assigned multiple times to individual operations
dvar float+ maxCompletionTime;
minimize (maxCompletionTime);
subject to {

In the OPL examples, you should have a look at
The flexible job-shop scheduling
This problem is an extension of the classical Job-Shop Scheduling
problem which allows an operation to be processed by any machine from
a given set. The problem is to assign each operation to a machine and
to order the operations on the machines such that the maximal
completion time (makespan) of all operations is minimized.
To access this example, go to: examples/opl/sched_jobshopflex.
The precedence constraint there is:
forall (j in Jobs, o1 in Ops, o2 in Ops: o1.jobId==j && o2.jobId==j && o2.pos==1+o1.pos)
endBeforeStart(ops[o1],ops[o2]);
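The key point for (2) in the question is that OPL expects the condition in the filter of the forall (after the colon), while the array index itself must be a tuple; t.job == 1 inside the brackets is a boolean, which is presumably why OPL reports that the type boolean cannot be used there. To make the filter concrete, here is a small C++ illustration (not OPL; the struct Op and the function precedencePairs are hypothetical stand-ins for the Ops tuple set) of which (o1, o2) pairs the example's filter selects, each of which would receive one endBeforeStart constraint:

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for one tuple of the Ops set in the example.
struct Op {
    int jobId;  // job the operation belongs to
    int pos;    // position of the operation within its job
};

// Mirrors the filtered forall: keeps every (o1, o2) pair of the same job
// where o2 comes directly after o1 (o2.pos == 1 + o1.pos).
std::vector<std::pair<int, int>> precedencePairs(const std::vector<Op>& ops) {
    std::vector<std::pair<int, int>> pairs;
    const int count = static_cast<int>(ops.size());
    for (int i = 0; i < count; ++i)        // candidate o1
        for (int k = 0; k < count; ++k)    // candidate o2
            if (ops[i].jobId == ops[k].jobId && ops[k].pos == 1 + ops[i].pos)
                pairs.emplace_back(i, k);
    return pairs;
}
```

For two jobs with three and two operations, the result is (0,1), (1,2) and (3,4). Note that in the question's DATA an operation appears once per eligible resource, so a filter like this would pair every alternative with every alternative; those constraints would presumably need to be relaxed (for example via x[t] and the constant L) for tuples that are not selected.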


Why is there a loop in this division as multiplication code?

I got the JS code below from an archive of Hacker's Delight (view the source).
The code takes in a value (such as 7) and spits out a magic number to multiply with. Then you bit-shift to get the result. I don't remember assembly or much math, so I'm sure I'm wrong, but I can't find the reason why.
From my understanding, you could get a magic number by writing ceil(1/divide * 1<<32) (or <<64 for 64-bit values, but you'd need bigger ints). If you multiply an integer with imul, you get the high half of the product in one register and the low half in another. The high-half register magically holds the correct result of the division when using the magic number from my formula.
I wrote some C++ code to show what I mean. However, I only tested it with the values below, and it seems correct. The JS code has a loop and more, and I was wondering why. Am I missing something? What values would give an incorrect result with my code that the JS code would get right? I'm not very good at math, so I didn't understand any of the comments.
#include <cstdio>
#include <cassert>
int main(int argc, char *argv[])
{
auto test_divisor = 7;
auto test_value = 43;
auto a = test_value*test_divisor;
auto b = a-1; //One less test
auto magic = (1ULL<<32)/test_divisor;
if (((1ULL<<32)%test_divisor) != 0) {
magic++; //Round up
}
auto answer1 = (a*magic) >> 32;
auto answer2 = (b*magic) >> 32;
assert(answer1 == test_value);
assert(answer2 == test_value-1);
printf("%llu %llu\n", answer1, answer2);
}
JS code from Hacker's Delight:
var two31 = 0x80000000
var two32 = 0x100000000
function magic_signed(d) { with(Math) {
if (d >= two31) d = d - two32// Treat large positive as short for negative.
var ad = abs(d)
var t = two31 + (d >>> 31)
var anc = t - 1 - t%ad // Absolute value of nc.
var p = 31 // Init p.
var q1 = floor(two31/anc) // Init q1 = 2**p/|nc|.
var r1 = two31 - q1*anc // Init r1 = rem(2**p, |nc|).
var q2 = floor(two31/ad) // Init q2 = 2**p/|d|.
var r2 = two31 - q2*ad // Init r2 = rem(2**p, |d|).
do {
p = p + 1;
q1 = 2*q1; // Update q1 = 2**p/|nc|.
r1 = 2*r1; // Update r1 = rem(2**p, |nc|.
if (r1 >= anc) { // (Must be an unsigned
q1 = q1 + 1; // comparison here).
r1 = r1 - anc;}
q2 = 2*q2; // Update q2 = 2**p/|d|.
r2 = 2*r2; // Update r2 = rem(2**p, |d|.
if (r2 >= ad) { // (Must be an unsigned
q2 = q2 + 1; // comparison here).
r2 = r2 - ad;}
var delta = ad - r2;
} while (q1 < delta || (q1 == delta && r1 == 0))
var mag = q2 + 1
if (d < 0) mag = two32 - mag // Magic number and
shift = p - 32 // shift amount to return.
return mag
}}
In the C code:
auto magic = (1ULL<<32)/test_divisor;
We get an integer value in magic because both (1ULL<<32) and test_divisor are integers.
The algorithm requires incrementing magic under certain conditions, which is what the next conditional statement does.
Now, the multiplication also gives integers:
auto answer1 = (a*magic) >> 32;
auto answer2 = (b*magic) >> 32;
The C code is done.
In the JS code:
All variables are declared with var; there are no data types.
There is no integer division and no integer multiplication.
Bitwise operations are not easy to use and are not suitable for this algorithm.
Numeric data is represented by Number and BigInt, which are not like a C int or a C unsigned long long.
Hence the algorithm uses loops to iteratively add and compare, checking whether the division and multiplication have been carried out to within the nearest integer.
Both versions try to implement the same algorithm, and both "should" give the same answer, but the JS version is "buggy" and non-standard.
While there are many issues with the JS version, I will highlight only three:
(1) In the loop, while trying to find the best power of 2, we have these two statements:
p = p + 1;
q1 = 2*q1; // Update q1 = 2**p/|nc|.
This is basically incrementing a counter and multiplying a number by 2, which is a left shift in C++.
The C++ version does not require this rigmarole.
(2) The while condition has two equality comparisons on the right-hand side of ||:
while (q1 < delta || (q1 == delta && r1 == 0))
But both of these will be false in floating-point calculations (e.g. check Math.sqrt(2)*Math.sqrt(0.5) == 1: mathematically it is true, but in double precision it evaluates to false), hence the while condition is effectively just the left-hand side of ||, because the right-hand side will always be false.
(3) The JS version returns only one variable, mag, but the user is also supposed to get (and use) the variable shift, which is only available through global variable access. That is inconsistent and bad practice.
Comparing the two, the C version is more standard, but the point is not to use auto; use int64_t instead, whose number of bits is known.
First, I think ceil(1/divide * 1<<32) can, depending on the divisor, give results that are off by one for some inputs. So you don't need a loop, but sometimes you need a corrective factor.
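To make the off-by-one concrete, here is a sketch (my own illustration; ceilMagicDiv and exactMagicDiv are hypothetical names, and the fix is a variant of the well-known round-up method for unsigned division, not a port of the signed JS code). With m = ceil(2^32/d), the quotient is floor(n/d + n*e/(d*2^32)) where e = m*d - 2^32, so the error term grows with n; for d = 7 (e = 3) the smallest n that gives a wrong result is 1431655770. Using a 33-bit magic number with a shift of 32 + l, where l = ceil(log2 d), keeps n*e below 2^(32+l) for every 32-bit n:

```cpp
#include <cstdint>
using std::uint32_t;
using std::uint64_t;

// The question's formula: m = ceil(2^32 / d), quotient = (n * m) >> 32.
uint32_t ceilMagicDiv(uint32_t n, uint32_t d) {
    const uint64_t two32 = uint64_t{1} << 32;
    const uint64_t m = two32 / d + (two32 % d != 0 ? 1 : 0);  // ceil(2^32 / d)
    return static_cast<uint32_t>((uint64_t{n} * m) >> 32);
}

// Corrected variant: m = ceil(2^(32+l) / d) with 2^l >= d, which needs 33 bits.
// Writing m = 2^32 + mLow gives (n * m) >> (32 + l) == (n + ((n * mLow) >> 32)) >> l,
// and every intermediate value fits in 64 bits. Exact for all 32-bit n and all d >= 1.
uint32_t exactMagicDiv(uint32_t n, uint32_t d) {
    unsigned l = 0;
    while ((uint64_t{1} << l) < d) ++l;                       // l = ceil(log2(d))
    const uint64_t num = ((uint64_t{1} << l) - d) << 32;      // 2^32 * (2^l - d)
    const uint64_t mLow = num / d + (num % d != 0 ? 1 : 0);   // m - 2^32
    const uint64_t t = (uint64_t{n} * mLow) >> 32;            // high half of n * mLow
    return static_cast<uint32_t>((n + t) >> l);
}
```

The question's test values (301 and 300 divided by 7) still give 43 and 42 with both functions, but at n = 1431655770 ceilMagicDiv returns 204522253 while exactMagicDiv returns the correct 204522252.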
Second, the JS code seems to allow shifts other than 32: shift = p - 32 // shift amount to return. But it never returns that value, so I'm not sure what is going on there.
Why not implement the JS code in C++ as well and then run a loop over all int32_t values to see whether they give the same result? That shouldn't take too long.
And when you find a d where they differ, you can then test a / d for all int32_t a using both magic numbers and compare a / d, a * m_ceil and a * m_js.
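Following that suggestion, a C++ port of the JS magic_signed could look like the sketch below (the names SignedMagic, magicSigned and divideByMagic are mine). It keeps the JS loop but uses uint32_t so the comparisons are unsigned, as the JS comments require, and it returns the shift together with the magic number instead of through a global. divideByMagic applies the usual Hacker's Delight sequence (multiply-high, add or subtract n, shift, then add 1 if the quotient is negative), computed in 64 bits to avoid signed overflow. It assumes 2 <= |d| <= 2^31 - 1 and arithmetic right shifts of negative values (guaranteed since C++20, and what mainstream compilers do anyway).

```cpp
#include <cstdint>
using std::int32_t;
using std::int64_t;
using std::uint32_t;

struct SignedMagic {
    int32_t M;  // magic multiplier (the JS return value mag)
    int s;      // extra shift (the JS global shift)
};

// Port of the JS magic_signed loop; requires 2 <= |d| <= 2^31 - 1.
SignedMagic magicSigned(int32_t d) {
    const uint32_t two31 = 0x80000000u;
    const uint32_t ad = d < 0 ? 0u - static_cast<uint32_t>(d) : static_cast<uint32_t>(d);
    const uint32_t t = two31 + (static_cast<uint32_t>(d) >> 31);
    const uint32_t anc = t - 1 - t % ad;                  // |nc|
    int p = 31;
    uint32_t q1 = two31 / anc, r1 = two31 - q1 * anc;     // 2^p / |nc| and remainder
    uint32_t q2 = two31 / ad, r2 = two31 - q2 * ad;       // 2^p / |d| and remainder
    uint32_t delta;
    do {
        ++p;
        q1 *= 2; r1 *= 2;
        if (r1 >= anc) { ++q1; r1 -= anc; }               // unsigned comparison
        q2 *= 2; r2 *= 2;
        if (r2 >= ad) { ++q2; r2 -= ad; }
        delta = ad - r2;
    } while (q1 < delta || (q1 == delta && r1 == 0));
    uint32_t mag = q2 + 1;
    if (d < 0) mag = 0u - mag;                            // JS: two32 - mag
    return {static_cast<int32_t>(mag), p - 32};
}

// n / d (truncating like C++ '/') using a magic number from magicSigned(d).
int32_t divideByMagic(int32_t n, int32_t d, SignedMagic mg) {
    int64_t q = (static_cast<int64_t>(mg.M) * n) >> 32;   // high half of the signed product
    if (d > 0 && mg.M < 0) q += n;
    if (d < 0 && mg.M > 0) q -= n;
    q >>= mg.s;
    if (q < 0) q += 1;                                    // round toward zero
    return static_cast<int32_t>(q);
}
```

For d = 7 the port yields M = 0x92492493 with s = 2 (the JS shift), and divideByMagic(301, 7, magicSigned(7)) gives 43. Comparing magicSigned's results with the ceil formula for each d is then a direct way to run the experiment described above.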

How can I use an if condition in AMPL?

I am wondering whether I can use an if operator in AMPL. I have a set of variables x_{1},...,x_{n} and some constraints. Some of these constraints are only valid under certain circumstances. For example, if x_{1}+...+x_{n}=kn+1, where k is an integer, then constraint A is valid.
Is there any way I can write this in AMPL?
In other words, I want to search the feasible region layer by layer. The layer is the dot product between a point x=(x1,...,xn) and the vector 1=(1,1,1,...,1).
So:
if <x,1> = 1, then x has to satisfy the constraint A<1,
if <x,1> = 2, then x has to satisfy the constraint B<2,
.
.
.
This is what I found on the AMPL website, but it does not work (n is the dimension of x and k an arbitrary integer):
subject to Time {if < x,1 > =kn+1}:
s.t. S1: A<1;
I'm not clear whether your example means "constraint A requires that x_[1]+...+x_[n]=4m+1 where m is an integer", or "if x_[1]+...+x_[n]=4m+1 where m is an integer, then constraint A requires some other condition to be met".
The former is trivial to code:
var m integer;
s.t. c1: sum{i in 1..n} x_[i] = 4*m+1;
It does require a solver with MIP capability. From your tags I assume you're using CPLEX, which should be fine.
For the latter: AMPL does have some support for logical constraints, documented here. Depending on your problem, it's also sometimes possible to code logical constraints as linear integer constraints.
For example, if the x[i] variables in your example are also integers, you can set things up like so:
var m integer;
var r1 integer in 0..1;
var r2 integer in 0..2;
s.t. c1: r2 <= 2*r1; # i.e. r2 can only be non-zero if r1 = 1
s.t. c2: sum{i in 1..n} x_[i] = 4*m+r1+r2;
var remainder_is_1 binary;
s.t. c3: remainder_is_1 >= r1-r2;
s.t. c4: remainder_is_1 <= 1-r2/2;
s.t. c5: remainder_is_1 <= r1; # forces remainder_is_1 = 0 when the remainder is 0
Taken together, these constraints ensure that remainder_is_1 equals 1 if and only if sum{i in 1..n} x_[i] = 4m+1 for some integer m. You can then use this variable in other constraints. This sort of trick can be handy if you only have a few logical constraints to deal with, but if you have many, it'll be more efficient to use the logical constraint options if they're available to you.
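The case analysis behind that claim is easy to check by brute force. Because x and m are integers, r1 + r2 is the remainder of the sum modulo 4, and c1 leaves only (r1, r2) = (0,0), (1,0), (1,1) or (1,2). The C++ sketch below (the function name is mine; it is not AMPL) enumerates those cases and every value of the binary. It shows that with c3 and c4 alone, remainder_is_1 can still be 1 when the remainder is 0, because r1 = r2 = 0 satisfies both bounds for either value; adding the constraint remainder_is_1 <= r1 closes that gap.

```cpp
// Brute-force check of the remainder_is_1 linearization over every feasible
// (r1, r2) pair and every value of the binary flag.
// withR1Bound adds the extra constraint  remainder_is_1 <= r1.
bool linearizationIsExact(bool withR1Bound) {
    for (int r1 = 0; r1 <= 1; ++r1) {
        for (int r2 = 0; r2 <= 2; ++r2) {
            if (r2 > 2 * r1) continue;              // c1: r2 <= 2*r1
            const int remainder = r1 + r2;          // c2: sum = 4*m + r1 + r2
            int feasibleFlags = 0;
            for (int flag = 0; flag <= 1; ++flag) {
                bool ok = flag >= r1 - r2           // c3
                       && 2 * flag <= 2 - r2;       // c4, multiplied by 2 to stay in integers
                if (withR1Bound) ok = ok && flag <= r1;
                if (!ok) continue;
                ++feasibleFlags;
                if ((flag == 1) != (remainder == 1)) return false;  // flag disagrees with remainder
            }
            if (feasibleFlags == 0) return false;   // over-constrained: no valid flag value
        }
    }
    return true;
}
```

linearizationIsExact(false) returns false (the (0,0) case admits remainder_is_1 = 1), and linearizationIsExact(true) returns true.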

Intersection of two BDDs using CUDD

I would like to find the intersection of two BDDs for the following two Boolean functions:
F=A'B'C'D'=1
G=A XOR B XOR C XOR D=1
Here is my code:
int main (int argc, char *argv[])
{
char filename[30];
DdManager *gbm; /* Global BDD manager. */
gbm = Cudd_Init(0,0,CUDD_UNIQUE_SLOTS,CUDD_CACHE_SLOTS,0); /* Initialize a new BDD manager. */
DdNode *bdd, *var, *tmp_neg, *tmp,*f,*g;
int i;
bdd = Cudd_ReadOne(gbm); /*Returns the logic one constant of the manager*/
Cudd_Ref(bdd); /*Increases the reference count of a node*/
for (i = 3; i >= 0; i--) {
var = Cudd_bddIthVar(gbm,i); /*Create a new BDD variable*/
tmp_neg = Cudd_Not(var); /*Perform NOT Boolean operation*/
tmp = Cudd_bddAnd(gbm, tmp_neg, bdd); /*Perform AND Boolean operation*/
Cudd_Ref(tmp);
Cudd_RecursiveDeref(gbm,bdd);
f = tmp;
}
for (i = 3; i >= 0; i--) {
var = Cudd_bddIthVar(gbm,i); /*Create a new BDD variable*/
tmp = Cudd_bddXor(gbm, var, bdd); /*Perform XOR Boolean operation*/
Cudd_Ref(tmp);
Cudd_RecursiveDeref(gbm,bdd);
g = tmp;
}
bdd= Cudd_bddIntersect(gbm,f,g);/*Intersection between F and G */
bdd = Cudd_BddToAdd(gbm, bdd); /*Convert BDD to ADD for display purpose*/
print_dd (gbm, bdd, 2,4); /*Print the dd to standard output*/
sprintf(filename, "./bdd/graph.dot"); /*Write .dot filename to a string*/
write_dd(gbm, bdd, filename); /*Write the resulting cascade dd to a file*/
Cudd_Quit(gbm);
return 0;
}
And here is the result I got:
DdManager nodes: 7 | DdManager vars: 4 | DdManager reorderings: 0 | DdManager memory: 8949888
: 3 nodes 2 leaves 2 minterms
ID = 0xaa40f index = 0 T = 0 E = 1
0--- 1
As you can see, the intersection gives A=0 and don't-cares for B, C and D. I was expecting values of A, B, C and D that satisfy both F and G. But clearly A=0 is not a solution for both F and G: for example, someone can choose A=0, B=1, which gives 0 for function F. What is wrong here?
This reply comes awfully late, but just to close the issue, the problem is that the last operand to both Cudd_bddAnd and Cudd_bddXor is bdd instead of f or g. Of course, both f and g should be properly initialized (the way bdd is currently initialized). Fixing the code this way will also take care of the multiple dereferences of bdd, which are going to cause grief should garbage collection kick in.
Also, Cudd_bddIntersect does not compute the AND of two BDDs, but a function that implies the AND. It's used when one wants a witness to the nonemptiness of the conjunction of two BDDs without computing the whole result (and then possibly extracting a witness from it).
Finally, bdd is used as both operand to Cudd_BddToAdd and as destination for the return value. This is guaranteed to "leak" BDD nodes.
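One more observation, checked here independently of CUDD with a plain C++ truth table (not CUDD code; the function name is mine): F is 1 only for A=B=C=D=0, and there G is 0 because the XOR of four zeros is 0. So F AND G has no satisfying assignment at all, and once the code is fixed as described above, the result should be the constant-0 function rather than a set of values for A, B, C and D.

```cpp
#include <vector>

// Returns every assignment (bit 3 = A, bit 2 = B, bit 1 = C, bit 0 = D)
// for which F AND G is 1.
std::vector<unsigned> satisfyingAssignments() {
    std::vector<unsigned> result;
    for (unsigned v = 0; v < 16; ++v) {
        bool a = v & 8, b = v & 4, c = v & 2, d = v & 1;
        bool f = !a && !b && !c && !d;   // F = A'B'C'D'
        bool g = a ^ b ^ c ^ d;          // G = A XOR B XOR C XOR D
        if (f && g) result.push_back(v);
    }
    return result;
}
```

satisfyingAssignments() returns an empty vector.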

Why is My Program not Working [closed]

I am a noob programmer who just started in C++. I wrote a program to answer a question. When I try to run it from cmd.exe, Windows tells me "a problem has caused this program to stop working, we'll close the program and notify you when a solution is available".
I have included a link to the well-documented source code. Please take a look at the code and help me out.
link: http://mibpaste.com/ZRevGf
I believe that figuring out the error in my code may help several other noob programmers out there who may use methods similar to mine.
Code from link:
//This is the source code for a puzzle (well, kind of) that I saw on the internet. I will include the puzzle's question below.
//Well, I commented it so I hope you understand.
//ALAFIN OLUWATOBI 100L DEPARTMENT OF COMPUTER SCIENCE BABCOCK UNIVERSITY.
//Future CEO of VERI Technologies inc.
/*
* In a corridor, there are 100 doors. All the doors are initially closed.
* You walk along the corridor back and forth. As you walk along the corridor, you reverse the state of each door.
* I.e if the door is open, you close it, and if it is closed, you open it.
* You walk along the corridor a total of 200 times.
* On your nth trip, You stop at every nth door, that you come across.
* I.e on your first trip, you stop at every door. On your second trip, every second door, on your third trip every third door and so on and so forth
* Write a program to display, the final states of the doors.
*/
#include <iostream>
#include <cstdlib>
#include <cmath>
using namespace std;
inline void inverse(bool args[]); //The prototype of the function. I made the function inline in the declaration, to increase efficiency and speed of execution.
bool doors [200]; //Declaring a global array, for the doors.
int main ()
{
inverse(doors); //A call to the inverse function
cout << "This is the state of the 100 doors...\n";
for (int i = 0 ; i<200 ; i++) //Loop to display the final states of the doors.
{
cout << "DOOR " << (i+1) << "\t|" << doors[i] << endl;
}
cout << "Thank you, for using this program designed by VERI Technologies. :)"; //VERI Technologies, is the name of the I.T company that I hope to establish.
return 0;
}
void inverse(bool args [])
{
for (int n = 1 ; n<= 200 ; n++) //This loop, is for the control of every nth trip. It executes 100 times
{
if (n%2 != 0) //This is to control the reversal of the doors going forward, I.e on odd numbers
{
for (int b = n, a = 1 ; b<=200 ;b = n*++a) //This is the control loop, for every odd trip, going forwards. It executes 100 times
args [b] = !args[b] ; //The reversal operation. It reverses the boolean value of the door.
}
/*
* The two variables, are declared. They will be used in controlling the program. b represents the number of the door to be operated on.
* a is a variable, which we shall use to control the value of b.
* n remains constant for the duration, of the loop, as does (200-n)
* the pre increment of a {++a} multiplied by n or (200-n) is used to calculate the value of b in the update.
* Thus, we have the scenario, of b increasing in multiples of n. Achieving what is desired for the program. Through this construct, only every nth door is considered.
*/
else if((n%2) == 0) //This is to control the reversal of the doors going backwards, I.e on even numbers
{
for (int b = (200-n), a = 1 ; b>=1 ; b = (200-n)*++a) //This is the control loop for every even trip, going backwards. It executes 100 times.
args [b] = !args[b] ; //The reversal operation. It reverses the boolean value of the door.
}
}
}
I believe the exception is due to the line:
for (int b = (200 - n), a = 1; b >= 1; b = (200 - n)*++a)
When the exception occurs, the following values are assigned to the variables:
b = 3366
n = 2
a = 17
From what I can see, b is calculated as (200 - n) * a.
Substituting the values above gives 198 * 17.
This is 3366, which is far beyond the last index of doors (199), and it causes the exception when the line
args[b] = !args[b];
is executed.
I have created the following solution that should provide the desired results if you wish to use it.
void inverse(bool args[])
{
//n represents what trip you are taking down the hallway
//i.e. n = 1 is the first trip, n = 2 the second, and so on
for (int n = 1; n <= 200; n++){
//We are on trip n, so now we must change the state of all the doors for the trip
//The current door is represented by i
//i.e. i = 1 is the first door, i = 2 the second, and so on
for (int i = 1; i <= 200; i++){
//If the current door mod the trip is 0 then we must change the state of the door
//Only the nth door will be changed which occurs when i mod n equals 0
//We modify the state of doors[i - 1] as the array of doors is 0 - 199 but we are counting doors from 1 to 200
//So door 1 mod trip 1 will equal 0 so we must change the state of door 1, which is really doors[0]
if (i % n == 0){
args[i - 1] = !args[i - 1];
}
}
}
}
EUREKA!!!!!!
I finally came up with a working solution. No more errors. I'm calling it version 2.0.0
I've uploaded it online, and here's the link
[version 2.0.0] http://mibpaste.com/3NADgl
All that's left is to go to Excel, derive the final states of the doors, and make sure that it's working perfectly. Please take a look at my solution and comment on any errors I may have made, or any way you think I could optimize the code. Thank you for your help; it allowed me to redesign a working solution. I'm starting to think that an out-of-bounds error might have caused my version 1 to crash, but the logic was flawed anyway, so I'm scrapping it.
This is the code:
/**********************************************************************************************
200 DOOR PROGRAM
Version 2.0.0
Author: Alafin OluwaTobi Department of Computer Science, Babcock University
New Additions: I redrew the algorithm to generate a more logically viable solution.
I additionally expanded the size of the array to prevent a potential out-of-bounds error.
**********************************************************************************************/
//Hello. This is a program I've written to solve a fun mental problem.
//I'll include a full explanation of the problem, below.
/**********************************************************************************************
*You are in a Hallway, filled with 200 doors .
*ALL the doors are initially closed .
*You walk along the corridor, *BACK* and *FORTH* reversing the state of every door which you stop at .
*I.e if it is open, you close it .
*If it is closed, you open it .
*On every nth trip, you stop at every nth door .
*I.e on your first trip, you stop at every door. On your second trip every second door, On your third trip every third door, etc .
*Write a program to display the final state of the doors .
**********************************************************************************************/
/**********************************************************************************************
SOLUTION
*NOTE: on even trips, you're coming back, while on odd trips you're going forwards .
*2 Imaginary doors, door 0 and 201, delimit the corridor .
*On odd trips, the doors stopped at will be (0+n) doors .
*I.e you will be counting forward, in (0+n) e.g say, n = 5: 5, 10, 15, 20, 25
*On even trips, the doors stopped at will be (201-n) doors.
*I.e you will be counting backwards in (201-n) say n = 4: 197, 193, 189, 185, 181
**********************************************************************************************/
#include <iostream>
#include <cstdlib> //Including the basic libraries
bool HALLWAY [202] ;
/*
*Declaring the array, for the Hallway, as global in order to initialise all the elements at zero.
*In addition,the size is set at 202 to make provision for the delimiting imaginary doors,
*This also serves to prevent potential out-of-bounds errors that may occur in the use of the for loop later on.
*/
inline void inverse (bool args []) ;
/*
*Prototyping the function, which will be used to reverse the states of the door.
*The function, has been declared as inline in order to allow faster compilation, and generate a faster executable program.
*/
using namespace std ; //Using the standard namespace
int main ()
{
inverse (HALLWAY) ; //Calling the inverse function, to act on the Hallway, reversing the doors.
cout << "\t\t\t\t\t\t\t\t\t\t200 DOOR TABLE\n" ;
for(int i = 1 ; i <= 200 ; i++ )
//A loop to display the states of the doors.
{
if (HALLWAY [i] == 0)
//The if construct allows us to print out the state of the door as closed, when the corresponding element of the Array has a value of zero.
{
cout << "DOOR " << i << " is\tCLOSED" << endl ;
for (int z = 0 ; z <= 300 ; z++)
cout << "_" ;
cout << "\n" ;
}
else if (HALLWAY [i] == 1)
//The else if construct allows us to print out the state of the door as open, when the corresponding element of the Array has a value of one.
{
cout << "DOOR " << i << " is\tOPEN" << endl ;
for (int z = 0 ; z <= 300 ; z++)
cout << "_" ;
cout << "\n" ;
}
}
return 0 ; //Returns the value of zero, to show that the program executed properly
}
void inverse (bool args[])
{
for ( int n = 1; n <= 200 ; n++)
//This loop, is to control the individual trips, i.e trip 1, 2, 3, etc..
{
if (n%2 == 0)
//This if construct is to ensure that on even numbers (i.e. n%2 == 0), you are coming down the hallway and counting backwards
{
for (int b = (201-n) ; b <= 200 && b >= 1 ; b -= n)
/*
*This loop, is for the doors that you stop at on your nth trip.
*The door is represented by the variable b.
*Because you are coming back, b will be reducing proportionally, in n.
*The Starting value for b on your nth trip, will be (201-n)
* {b -= n} takes care of this. On the second turn for example. First value of b will be 199, 197, 195, 193, ..., 1
*/
args [b] = !(args [b]) ;
//This is the actual reversal operation, which reverses the state of the door.
}
else if (n%2 != 0)
//This else if construct, is to ensure that on odd numbers(i.e n%2 != 0), that you are going up the hallway and counting forwards
{
for (int b = n ; b <= 200 && b >= 1 ; b += n)
/*
*This loop, is for the doors that you stop at on your nth trip.
*The door is represented by the variable b.
*Because you are going forwards, b will be increasing proportionally, in n.
*The starting value of b will be (0+n), which is equal to n
* {b += n} takes care of this. On the third turn for example. First value of b will be 3, 6, 9, 12, ...., 198
*/
args [b] = !(args [b]) ;
//This is the actual reversal operation, which reverses the state of the door
}
}
}
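The answer's fix above toggles doors n, 2n, 3n, ... on every trip, i.e. it treats every trip as walking forward; with that reading, the doors left open are exactly the perfect squares (a door is toggled once per divisor, and only squares have an odd number of divisors). Version 2.0.0 instead follows the back-and-forth wording of the puzzle. A compact sketch of both readings (the function names are mine), using a vector with one extra slot so that door numbers 1..count can be used directly as indices and nothing is accessed out of bounds:

```cpp
#include <vector>

// Forward-only reading: on trip n, toggle doors n, 2n, 3n, ...
std::vector<bool> forwardOnly(int count) {
    std::vector<bool> open(count + 1, false);  // index 0 unused; doors are 1..count
    for (int n = 1; n <= count; ++n)
        for (int door = n; door <= count; door += n)
            open[door] = !open[door];
    return open;
}

// Back-and-forth reading (as in version 2.0.0): odd trips walk forward
// (doors n, 2n, ...), even trips walk back from the far end
// (doors count+1-n, count+1-2n, ...).
std::vector<bool> backAndForth(int count) {
    std::vector<bool> open(count + 1, false);
    for (int n = 1; n <= count; ++n) {
        if (n % 2 != 0) {
            for (int door = n; door <= count; door += n) open[door] = !open[door];
        } else {
            for (int door = count + 1 - n; door >= 1; door -= n) open[door] = !open[door];
        }
    }
    return open;
}
```

With backAndForth(200), door d ends up open when the number of odd n dividing d plus the number of even n dividing 201 - d is odd, which is an easy way to cross-check the Excel results mentioned above.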

Wildcard String Search Algorithm

In my program I need to search a fairly big string (~1 MB) for a relatively small substring (< 1 kB).
The problem is that the substring contains simple wildcards, in the sense that "a?c" means I want to search for strings like "abc" or "apc", ... (I am only interested in the first occurrence).
Until now I have used the trivial approach (here in pseudocode):
algorithm "search", input: haystack(string), needle(string)
for(i = 0, i <= length(haystack) - length(needle), ++i)
    if(!CompareMemory(haystack+i, needle, length(needle)))
        return i;
return -1; (Not found)
Where "CompareMemory" returns 0 iff the first and second arguments are identical (taking wildcards into account) over the number of bytes given by the third argument.
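For reference, the trivial approach could be written in C++ as follows (a sketch; the names matchesAt and naiveWildcardSearch are mine, and '?' is assumed to be the only wildcard). It stops at the last position where the whole needle still fits, so the comparison never reads past the end of the haystack:

```cpp
#include <cstddef>
#include <string_view>

// True if needle matches hay starting at pos; a '?' in the needle matches any byte.
bool matchesAt(std::string_view hay, std::size_t pos, std::string_view needle, char wild = '?') {
    for (std::size_t k = 0; k < needle.size(); ++k)
        if (needle[k] != wild && needle[k] != hay[pos + k]) return false;
    return true;
}

// O(|hay| * |needle|) search: index of the first match, or -1 if there is none.
long naiveWildcardSearch(std::string_view hay, std::string_view needle, char wild = '?') {
    if (needle.size() > hay.size()) return -1;
    for (std::size_t i = 0; i + needle.size() <= hay.size(); ++i)
        if (matchesAt(hay, i, needle, wild)) return static_cast<long>(i);
    return -1;
}
```

For example, naiveWildcardSearch("xxapc", "a?c") returns 2.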
My question now is whether there is a fast algorithm for this (you don't have to give it, but if you do, I would prefer C++, C or pseudocode). I started here, but I think most of the fast algorithms don't allow wildcards (because of the way they exploit the nature of strings).
I hope the format of the question is OK, since I am new here. Thank you in advance!
A fast way, which is roughly what a regexp engine would do (and I would recommend a regexp anyway), is to find a character that is fixed in the needle ("a", but not "?"), search for it, and then check whether you have a complete match.
j = firstNonWildcardPos(needle)
for(i = j, i <= length(haystack)-length(needle)+j, ++i)
    if(haystack[i] == needle[j])
        if(!CompareMemory(haystack+i-j, needle, length(needle)))
            return i-j;
return -1; (Not found)
A regexp would generate code similar to this (I believe).
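A C++ sketch of this answer's idea (the name anchoredWildcardSearch is mine), using std::memchr to jump between occurrences of the first non-wildcard character of the needle. Note that it returns the start of the match, i - j, rather than the position i of the anchor character:

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

long anchoredWildcardSearch(std::string_view hay, std::string_view needle, char wild = '?') {
    const std::size_t n = needle.size();
    if (n > hay.size()) return -1;
    const std::size_t j = needle.find_first_not_of(wild);
    if (j == std::string_view::npos) return 0;           // only wildcards: matches at 0
    const std::size_t last = hay.size() - n + j;          // last index the anchor can occupy
    std::size_t i = j;
    while (i <= last) {
        // Jump straight to the next occurrence of the anchor character.
        const void* hit = std::memchr(hay.data() + i, needle[j], last - i + 1);
        if (hit == nullptr) return -1;
        i = static_cast<std::size_t>(static_cast<const char*>(hit) - hay.data());
        const std::size_t start = i - j;
        bool ok = true;                                   // full compare, honoring wildcards
        for (std::size_t k = 0; k < n && ok; ++k)
            ok = needle[k] == wild || needle[k] == hay[start + k];
        if (ok) return static_cast<long>(start);
        ++i;
    }
    return -1;
}
```

The speed-up depends on how rare the anchor character is; picking the rarest non-wildcard character of the needle instead of the first one is a natural refinement.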
Among strings over an alphabet of c characters, let S have length s and let T_1 ... T_k have average length b. S will be searched for each of the k target strings. (The problem statement doesn't mention multiple searches of a given string; I mention it below because in that paradigm my program does well.)
The program uses O(s+c) time and space for setup, and (if S and the T_i are random strings) O(k*u*s/c) + O(k*b + k*b*s/c^u) total time for searching, with u=3 in program as shown. For longer targets, u should be increased, and rare, widely-separated key characters chosen.
In step 1, the program creates an array L of s+TsizMax integers (in program, TsizMax = allowed target length) and uses it for c lists of locations of next occurrences of characters, with list heads in H[] and tails in T[]. This is the O(s+c) time and space step.
In step 2, the program repeatedly reads and processes target strings. Step 2A chooses u = 3 different non-wild key characters (in the current target). As shown, the program just uses the first three such characters; with a tiny bit more work, it could instead use the rarest characters in the target to improve performance. Note that it does not handle targets with fewer than three such characters.
The line "L[T[r]] = L[g+i] = g+i;" within Step 2A sets up a guard cell in L with proper delta offset so that Step 2G will automatically execute at end of search, without needing any extra testing during the search. T[r] indexes the tail cell of the list for character r, so cell L[g+i] becomes a new, self-referencing, end-of-list for character r. (This technique allows the loops to run with a minimum of extraneous condition testing.)
Step 2B sets vars a,b,c to head-of-list locations, and sets deltas dab, dac, and dbc corresponding to distances between the chosen key characters in target.
Step 2C checks if key characters appear in S. This step is necessary because otherwise a while loop in Step 2E will hang. We don't want more checks within those while loops because they are the inner loops of search.
Step 2D does steps 2E to 2I until var c points past the end of S, at which point it is impossible to make any more matches.
Step 2E consists of u = 3 while loops, that "enforce delta distances", that is, crawl indexes a,b,c along over each other as long as they are not pattern-compatible. The while loops are fairly fast, each being in essence (with ++si instrumentation removed) "while (v+d < w) v = L[v]" for various v, d, w. Replicating the three while loops a few times may increase performance a little and will not change net results.
In Step 2G, we know that the u key characters match, so we do a complete compare of target to match point, with wild-character handling. Step 2H reports result of compare. Program as given also reports non-matches in this section; remove that in production.
Step 2I advances all the key-character indexes, because none of the currently-indexed characters can be the key part of another match.
You can run the program to see a few operation-count statistics. For example, the output
Target 5=<de?ga>
012345678901234567890123456789012345678901
abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg
# 17, de?ga and de3ga match
# 24, de?ga and defg4 differ
# 31, de?ga and defga match
Advances: 'd' 0+3 'e' 3+3 'g' 3+3 = 6+9 = 15
shows that Step 2G was entered 3 times (ie, the key characters matched 3 times); the full compare succeeded twice; step 2E while loops advanced indexes 6 times; step 2I advanced indexes 9 times; there were 15 advances in all, to search the 42-character string for the de?ga target.
/* jiw
$Id: stringsearch.c,v 1.2 2011/08/19 08:53:44 j-waldby Exp j-waldby $
Re: Concept-code for searching a long string for short targets,
where targets may contain wildcard characters.
The user can enter any number of targets as command line parameters.
This code has 2 long strings available for testing; if the first
character of the first parameter is '1' the jay[42] string is used,
else kay[321].
Eg, for tests with *hay = jay use command like
./stringsearch 1e?g a?cd bc?e?g c?efg de?ga ddee? ddee?f
or with *hay = kay,
./stringsearch bc?e? jih? pa?j ?av??j
to exercise program.
Copyright 2011 James Waldby. Offered without warranty
under GPL v3 terms as at http://www.gnu.org/licenses/gpl.html
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
//================================================
int main(int argc, char *argv[]) {
char jay[]="abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg";
char kay[]="ludehkhtdiokihtmaihitoia1htkjkkchajajavpajkihtijkhijhipaja"
"etpajamhkajajacpajihiatokajavtoia2pkjpajjhiifakacpajjhiatkpajfojii"
"etkajamhpajajakpajihiatoiakavtoia3pakpajjhiifakacpajjhkatvpajfojii"
"ihiifojjjjhijpjkhtfdoiajadijpkoia4jihtfjavpapakjhiifjpajihiifkjach"
"ihikfkjjjjhijpjkhtfdoiajakijptoik4jihtfjakpapajjkiifjpajkhiifajkch";
char *hay = (argc>1 && argv[1][0]=='1')? jay:kay;
enum { chars=1<<CHAR_BIT, TsizMax=40, Lsiz=TsizMax+sizeof kay, L1, L2 };
int L[L2], H[chars], T[chars], g, k, par;
// Step 1. Make arrays L, H, T.
for (k=0; k<chars; ++k) H[k] = T[k] = L1; // Init H and T
for (g=0; hay[g]; ++g) { // Make linked character lists for hay.
k = hay[g]; // In same loop, could count char freqs.
if (T[k]==L1) H[k] = T[k] = g;
T[k] = L[T[k]] = g;
}
// Step 2. Read and process target strings.
for (par=1; par<argc; ++par) {
int alpha[3], at[3], a=g, b=g, c=g, da, dab, dbc, dac, i, j, r;
char * targ = argv[par];
enum { wild = '?' };
int sa=0, sb=0, sc=0, ta=0, tb=0, tc=0;
printf ("Target %d=<%s>\n", par, targ);
// Step 2A. Choose 3 non-wild characters to follow.
// As is, chooses first 3 non-wilds for a,b,c.
// Could instead choose 3 rarest characters.
for (j=0; j<3; ++j) alpha[j] = -j;
for (i=j=0; targ[i] && j<3; ++i)
if (targ[i] != wild) {
r = alpha[j] = targ[i];
if (alpha[0]==alpha[1] || alpha[1]==alpha[2]
|| alpha[0]==alpha[2]) continue;
at[j] = i;
L[T[r]] = L[g+i] = g+i;
++j;
}
if (j != 3) {
printf (" Too few target chars\n");
continue;
}
// Step 2B. Set a,b,c to head-of-list locations, set deltas.
da = at[0];
a = H[alpha[0]]; dab = at[1]-at[0];
b = H[alpha[1]]; dbc = at[2]-at[1];
c = H[alpha[2]]; dac = at[2]-at[0];
// Step 2C. See if key characters appear in haystack
if (a >= g || b >= g || c >= g) {
printf (" No match on some character\n");
continue;
}
for (g=0; hay[g]; ++g) printf ("%d", g%10);
printf ("\n%s\n", hay); // Show haystack, for user aid
// Step 2D. Search for match
while (c < g) {
// Step 2E. Enforce delta distances
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
// Step 2F. See if delta distances were met
if (a+dab==b && b+dbc==c && c<g) {
// Step 2G. Yes, so we have 3-letter-match and need to test whole match.
r = a-da;
for (k=0; targ[k]; ++k)
if ((hay[r+k] != targ[k]) && (targ[k] != wild))
break;
printf ("# %3d, %s and ", r, targ);
for (i=0; targ[i]; ++i) putchar(hay[r++]);
// Step 2H. Report match, if found
puts (targ[k]? " differ" : " match");
// Step 2I. Advance all of a,b,c, to go on looking
a = L[a]; ++ta;
b = L[b]; ++tb;
c = L[c]; ++tc;
}
}
printf ("Advances: '%c' %d+%d '%c' %d+%d '%c' %d+%d = %d+%d = %d\n",
alpha[0], sa,ta, alpha[1], sb,tb, alpha[2], sc,tc,
sa+sb+sc, ta+tb+tc, sa+sb+sc+ta+tb+tc);
}
return 0;
}
Regular expressions are usually implemented with a finite-automaton-based search, I think. Try implementing that.
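As a sketch of that idea (my choice of technique; the answer does not name one), the bit-parallel Shift-And algorithm simulates the pattern's nondeterministic automaton with one bit per needle position, and a '?' is handled simply by allowing every character at that position. The name shiftAndWildcardSearch and the 1024-position limit (taken from the question's "< 1 kB") are assumptions; each haystack byte costs one shift and one AND over kMaxNeedle bits.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

constexpr std::size_t kMaxNeedle = 1024;  // assumption: needles are < 1 kB
using State = std::bitset<kMaxNeedle>;

// Bit i of `active` is set when needle[0..i] matches the text ending at the
// current position. A '?' position accepts every byte, so its bit is set in
// all 256 character masks.
long shiftAndWildcardSearch(std::string_view hay, std::string_view needle, char wild = '?') {
    const std::size_t m = needle.size();
    if (m == 0) return 0;
    if (m > kMaxNeedle) throw std::length_error("needle longer than kMaxNeedle");
    std::vector<State> mask(256);
    for (std::size_t i = 0; i < m; ++i) {
        if (needle[i] == wild) {
            for (State& bits : mask) bits.set(i);
        } else {
            mask[static_cast<unsigned char>(needle[i])].set(i);
        }
    }
    State active;
    for (std::size_t j = 0; j < hay.size(); ++j) {
        active <<= 1;
        active.set(0);                                       // a match may start here
        active &= mask[static_cast<unsigned char>(hay[j])];  // keep prefixes extended by hay[j]
        if (active.test(m - 1)) return static_cast<long>(j + 1 - m);
    }
    return -1;
}
```

On the 42-character example string from the answer above, shiftAndWildcardSearch(..., "de?ga") returns 17, the same first match that program reports. Unlike the anchor-character approaches, the cost per byte does not depend on how common the needle's characters are.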