Implementing the Backward Nondeterministic Dawg Matching algorithm - c++

I'm trying to implement the BNDM algorithm in my code, in order to perform a fast pattern search.
I found some code online and tried to adjust it for my use case:
I think that I did something wrong while changing the values, since the algorithm takes a few minutes to finish (I was expecting it to be faster).
Using std::search takes me 30 seconds (with wildcards).
This takes me around 4-5 minutes (without wildcards).
The reason I'm casting everything to (unsigned char) is because the program crashes otherwise, since both my data and pattern hold hex values.
What I'd like to know is, where did I go wrong with this implementation (why is it running so slow)? and how can I include the ability to search for a pattern that contains wildcards?
EDIT*
The issue with speed has been solved by switching build from debug to release.
Also changing the size of the B array to 256 made it even faster.
The only issue I currently have now is how to implement a way to use wildcards using this algorithm.
Current code:
vector<unsigned int> get_matches(const vector<char> & data, const string & pattern) {
vector<unsigned int> matches;
//vector<char>::const_iterator walk = data.begin();
std::array<std::uint32_t, 256> B{ 0 };
int m = pattern.size();
int n = data.size();
int i, j, s, d, last;
//if (m > WORD_SIZE)
// error("BNDM");
// Pre processing
//memset(B, 0, ASIZE * sizeof(int));
s = 1;
for (i = m - 1; i >= 0; i--) {
B[(unsigned char)pattern[i]] |= s;
s <<= 1;
}
// Searching phase
j = 0;
while (j <= n - m) {
i = m - 1; last = m;
d = ~0;
while (i >= 0 && d != 0) {
d &= B[(unsigned char)data[j + i]];
i--;
if (d != 0) {
if (i >= 0)
last = i + 1;
else
matches.emplace_back(j);
}
d <<= 1;
}
j += last;
}
return matches;
}

B is not big enough -- it is indexed by the bytes in the pattern so it must have 256 elements (assuming an 8-bit byte architecture.) But you define it as having pattern.size() elements, which is a much smaller number.
As a consequence, you are using memory outside of B's allocation, which is Undefined Behaviour.
I suggest you use std::array<std::uint32_t, 256>, since you don't ever need to resize B. (Or even better, std::array<std::uint32_t, std::numeric_limits<unsigned char>::max()+1>).
I'm not an expert on this particular search algorithm, but the preprocessing step appears to set bit p in element c of B if the character c matches pattern element p. Since a wildcard pattern element can match any character, it seems reasonable that every element of B should have the bits corresponding to wildcard characters set. In other words, instead of initialising every element of B to 0, initialise them to the mask of wildcard positions in the pattern.
I don't know if that is sufficient to get the algorithm to work with wildcards, but it could be worth a try.

Related

Fastest way to find smallest missing integer from list of integers

I have a list of 100 random integers. Each random integer has a value from 0 to 99. Duplicates are allowed, so the list could be something like
56, 1, 1, 1, 1, 0, 2, 6, 99...
I need to find the smallest integer (>= 0) is that is not contained in the list.
My initial solution is this:
vector<int> integerList(100); //list of random integers
...
vector<bool> listedIntegers(101, false);
for (int theInt : integerList)
{
listedIntegers[theInt] = true;
}
int smallestInt;
for (int j = 0; j < 101; j++)
{
if (!listedIntegers[j])
{
smallestInt = j;
break;
}
}
But that requires a secondary array for book-keeping and a second (potentially full) list iteration. I need to perform this task millions of times (the actual application is in a greedy graph coloring algorithm, where I need to find the smallest unused color value with a vertex adjacency list), so I'm wondering if there's a clever way to get the same result without so much overhead?
It's been a year, but ...
One idea that comes to mind is to keep track of the interval(s) of unused values as you iterate the list. To allow efficient lookup, you could keep intervals as tuples in a binary search tree, for example.
So, using your sample data:
56, 1, 1, 1, 1, 0, 2, 6, 99...
You would initially have the unused interval [0..99], and then, as each input value is processed:
56: [0..55][57..99]
1: [0..0][2..55][57..99]
1: no change
1: no change
1: no change
0: [2..55][57..99]
2: [3..55][57..99]
6: [3..5][7..55][57..99]
99: [3..5][7..55][57..98]
Result (lowest value in lowest remaining interval): 3
I believe there is no faster way to do it. What you can do in your case is to reuse vector<bool>, you need to have just one such vector per thread.
Though the better approach might be to reconsider the whole algorithm to eliminate this step at all. Maybe you can update least unused color on every step of the algorithm?
Since you have to scan the whole list no matter what, the algorithm you have is already pretty good. The only improvement I can suggest without measuring (that will surely speed things up) is to get rid of your vector<bool>, and replace it with a stack-allocated array of 4 32-bit integers or 2 64-bit integers.
Then you won't have to pay the cost of allocating an array on the heap every time, and you can get the first unused number (the position of the first 0 bit) much faster. To find the word that contains the first 0 bit, you only need to find the first one that isn't the maximum value, and there are bit twiddling hacks you can use to get the first 0 bit in that word very quickly.
You program is already very efficient, in O(n). Only marginal gain can be found.
One possibility is to divide the number of possible values in blocks of size block, and to register
not in an array of bool but in an array of int, in this case memorizing the value modulo block.
In practice, we replace a loop of size N by a loop of size N/block plus a loop of size block.
Theoretically, we could select block = sqrt(N) = 12 in order to minimize the quantity N/block + block.
In the program hereafter, block of size 8 are selected, assuming that dividing integers by 8 and calculating values modulo 8 should be fast.
However, it is clear that a gain, if any, can be obtained only for a minimum value rather large!
constexpr int N = 100;
int find_min1 (const std::vector<int> &IntegerList) {
constexpr int Size = 13; //N / block
constexpr int block = 8;
constexpr int Vmax = 255; // 2^block - 1
int listedBlocks[Size] = {0};
for (int theInt : IntegerList) {
listedBlocks[theInt / block] |= 1 << (theInt % block);
}
for (int j = 0; j < Size; j++) {
if (listedBlocks[j] == Vmax) continue;
int &k = listedBlocks[j];
for (int b = 0; b < block; b++) {
if ((k%2) == 0) return block * j + b;
k /= 2;
}
}
return -1;
}
Potentially you can reduce the last step to O(1) by using some bit manipulation, in your case __int128, set the corresponding bits in loop one and call something like __builtin_clz or use the appropriate bit hack
The best solution I could find for finding smallest integer from a set is https://codereview.stackexchange.com/a/179042/31480
Here are c++ version.
int solution(std::vector<int>& A)
{
for (std::vector<int>::size_type i = 0; i != A.size(); i++)
{
while (0 < A[i] && A[i] - 1 < A.size()
&& A[i] != i + 1
&& A[i] != A[A[i] - 1])
{
int j = A[i] - 1;
auto tmp = A[i];
A[i] = A[j];
A[j] = tmp;
}
}
for (std::vector<int>::size_type i = 0; i != A.size(); i++)
{
if (A[i] != i+1)
{
return i + 1;
}
}
return A.size() + 1;
}

how to find distinct substrings?

Given a string, and a fixed length l, how can I count the number of distinct substrings whose length is l?
The size of character set is also known. (denote it as s)
For example, given a string "PccjcjcZ", s = 4, l = 3,
then there are 5 distinct substrings:
“Pcc”; “ccj”; “cjc”; “jcj”; “jcZ”
I try to use hash table, but the speed is still slow.
In fact I don't know how to use the character size.
I have done things like this
int diffPatterns(const string& src, int len, int setSize) {
int cnt = 0;
node* table[1 << 15];
int tableSize = 1 << 15;
for (int i = 0; i < tableSize; ++i) {
table[i] = NULL;
}
unsigned int hashValue = 0;
int end = (int)src.size() - len;
for (int i = 0; i <= end; ++i) {
hashValue = hashF(src, i, len);
if (table[hashValue] == NULL) {
table[hashValue] = new node(i);
cnt ++;
} else {
if (!compList(src, i, table[hashValue], len)) {
cnt ++;
};
}
}
for (int i = 0; i < tableSize; ++i) {
deleteList(table[i]);
}
return cnt;
}
Hastables are fine and practical, but keep in mind that if the length of substrings is L, and the whole string length is N, then the algorithm is Theta((N+1-L)*L) which is Theta(NL) for most L. Remember, just computing the hash takes Theta(L) time. Plus there might be collisions.
Suffix trees can be used, and provide a guaranteed O(N) time algorithm (count number of paths at depth L or greater), but the implementation is complicated. Saving grace is you can probably find off the shelf implementations in the language of your choice.
The idea of using a hashtable is good. It should work well.
The idea of implementing your own hashtable as an array of length 2^15 is bad. See Hashtable in C++? instead.
You can use an unorder_set and insert the strings into the set and then get the size of the set. Since the values in a set are unique it will take care of not including substrings that are the same as ones previously found. This should give you close to O(StringSize - SubstringSize) complexity
#include <iostream>
#include <string>
#include <unordered_set>
int main()
{
std::string test = "PccjcjcZ";
std::unordered_set<std::string> counter;
size_t substringSize = 3;
for (size_t i = 0; i < test.size() - substringSize + 1; ++i)
{
counter.insert(test.substr(i, substringSize));
}
std::cout << counter.size();
std::cin.get();
return 0;
}
Veronica Kham answered good to the question, but we can improve this method to expected O(n) and still use a simple hash table rather than suffix tree or any other advanced data structure.
Hash function
Let X and Y are two adjacent substrings of length L, more precisely:
X = A[i, i + L - 1]
Y = B[i + 1, i + 1 + L - 1]
Let assign to each letter of our alphabet a single non negative integer, for example a := 1, b := 2 and so on.
Let's define a hash function h now:
h(A[i, j]) := (P^(L-1) * A[i] + P^(L-2) * A[i + 1] + ... + A[j]) % M
where P is a prime number ideally greater than the alphabet size and M is a very big number denoting the number of different possible hashes, for example you can set M to maximum available unsigned long long int in your system.
Algorithm
The crucial observation is the following:
If you have a hash computed for X, you can compute a hash for Y in
O(1) time.
Let assume that we have computed h(X), which can be done in O(L) time obviously. We want to compute h(Y). Notice that since X and Y differ by only 2 characters, and we can do that easily using addition and multiplication:
h(Y) = ((h(X) - P^L * A[i]) * P) + A[j + 1]) % M
Basically, we are subtracting letter A[i] multiplied by its coefficient in h(X), multiplying the result by P in order to get proper coefficients for the rest of letters and at the end, we are adding the last letter A[j + 1].
Notice that we can precompute powers of P at the beginning and we can do it modulo M.
Since our hashing functions returns integers, we can use any hash table to store them. Remember to make all computations modulo M and avoid integer overflow.
Collisions
Of course, there might occur a collision, but since P is prime and M is really huge, it is a rare situation.
If you want to lower the probability of a collision, you can use two different hashing functions, for example by using different modulo in each of them. If probability of a collision is p using one such function, then for two functions it is p^2 and we can make it arbitrary small by this trick.
Use Rolling hashes.
This will make the runtime expected O(n).
This might be repeating pkacprzak's answer, except, it gives a name for easier remembrance etc.
Suffix Automaton also can finish it in O(N).
It's easy to code, but hard to understand.
Here are papers about it http://dl.acm.org/citation.cfm?doid=375360.375365
http://www.sciencedirect.com/science/article/pii/S0304397509002370

Understanding the Baeza-Yates Régnier algorithm (multiple string matching, extended from Boyer-Moore)

First of all, excuse me if I write a lot, I tried to summarize my research so that everyone can understand.
R. Baeza-Yates and M. Regnier published in 1990 a new algorithm for searching a two dimensional mm pattern in a two dimensional nn text. The publication is very well written and quite understandable for a novice like me, the algorithm is described in pseudocode and I was able to implements it successfully.
One part of the BYR algorithm requires the Aho-Corasick algorithm. This allows to search occurences of multiple keywords in a string text. However, they also say that this part of their algorithm can be greatly improved by using Aho-Corasick not, but Commentz-Walter algorithm (based on Boyer-Moore rather than Knuth-Morris-Pratt algorithm). They evoke an alternative to the Commentz-Walter algorithm, alternative that they themselves developed. This is described and explained in their previous publication (see 4th chapter).
This is where my problem lies. As I said, the algorithm goes through the text and check if it contains a word from the set of keywords. The words are arranged upside down and placed in a tree. To be efficient, it will sometimes be necessary to skip a number of letters, when he knows that there is no match found.
To determine the number of characters that can be skipped, two tables d and dd have to be computed. Then, the algorithm is very simple:
The algorithm works as follows:
We align the root of the trie with position m in the text, and we start matching the text from right to left following the corresponding
path in the trie.
If a match is found (final node), we output the index of the corresponding string.
After a match or mismatch, we move the trie further in the text using the maximum of the shift associated to the current node (means dd), and the
value of d[x], where x is the character in the text corresponding to
the root of the trie.
Start matching the trie again from right to left in the new position.
My problem is that I do not know how to compute the dd function. In their publication, R. Baeza-Yates and M. Regnier propose a formal definition of it:
pi is a word among the set of keyword, j is the index of a letter in this word, so pi[j] is like a node in the previous trie I showed. Number in the node represented dd(node). L is the number of words, and mi is the number of letters in the word pi.
They give no indication concerning the construction of this function. They only recommend to watch the work of W. Rytter. This document builds a function similar to that expected, the difference being that in this case, there is only one keyword and not a set.
The definiton of dd (called D here), is as follow:
It may be noted similarities with the previous definition, but I do not understand everything.
The pseudocode for the construction of this function is given in the paper, I have implemented it, here in C++:
int pattern[] = { 1, 2, 3, 1 }; /* I use int instead of char, simpler */
const int n = sizeof(pattern) / 4;
int D[n];
int f[n];
int j = n;
int t = n + 1;
for (int k = 1; k <= n; k++){
D[k-1] = 2 * n - k;
}
while (j > 0) {
f[j-1] = t;
while (t <= n) {
if (pattern[j-1] != pattern[t-1]) {
D[t-1] = min(D[t-1], n - j);
t = f[t-1];
}
else {
break;
}
}
t = t - 1;
j = j - 1;
}
int f1[n];
int q = t;
t = n + 1 - q;
int q1 = 1;
int j1 = 1;
int t1 = 0;
while (j1 <= t) {
f1[j1 - 1] = t1;
while (t1 >= 1) {
if (pattern[j1 - 1] != pattern[t1 - 1]) {
t1 = f1[t1 - 1];
}
else {
break;
}
}
t1 = t1 + 1;
j1 = j1 + 1;
}
while (q < n) {
for (int k = q1; k <= q; k++) {
D[k - 1] = min(D[k - 1], n + q - k);
}
q1 = q + 1;
q = q + t - f1[t - 1];
t = f1[t - 1];
}
for (int i = 0; i < n; i++)
{
cout << D[i] << " ";
}
It works, but I do not know how to expand it for several words, I do not know how to coincide with the formal definition of dd given by Baeza-Yates and Régnier. I said that the two definitions was similar, but I do not know to what extent.
I did not find any other information about their algorithm, it is impossible for me to know how to implement the construction of dd, but I am looking for someone who could perhaps understand and show me how to get there, explaining me the link between the definitions of D and dd.
I think d[x] corresponds to the bad character rule in http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm and D corresponds to the Good Suffix rule in the same article. This would mean that x in d[x] is not the character in the root of the tree, but the value of the first character in the text being searched that fails to match a child of the current node.
I think the idea is the same as Boyer-Moore. You move along the tree as long as you have a match, and when you have a mismatch you know two things: the character causing the mismatch, and the substring you have matched so far. Taking each of these things independently, you may be able to work out that if you shifted along the text being searched 1,2,..k positions you still wouldn't have a match, because at these offsets the character that caused a mismatch would still cause a mismatch, or the portion of the text that previously matched would not match at this shifted offset. So you can skip on to the first offset not ruled out by either value.
Actually, this suggests a variant scheme, in which d and DD provide not numbers but bit-masks, and you and together the two bitmaps and shift according to the position of the first bit that is still set. Presumably this doesn't save you enough to be worth the extra set-up time.

Finding repeating signed integers with O(n) in time and O(1) in space

(This is a generalization of: Finding duplicates in O(n) time and O(1) space)
Problem: Write a C++ or C function with time and space complexities of O(n) and O(1) respectively that finds the repeating integers in a given array without altering it.
Example: Given {1, 0, -2, 4, 4, 1, 3, 1, -2} function must print 1, -2, and 4 once (in any order).
EDIT: The following solution requires a duo-bit (to represent 0, 1, and 2) for each integer in the range of the minimum to the maximum of the array. The number of necessary bytes (regardless of array size) never exceeds (INT_MAX – INT_MIN)/4 + 1.
#include <stdio.h>
void set_min_max(int a[], long long unsigned size,\
int* min_addr, int* max_addr)
{
long long unsigned i;
if(!size) return;
*min_addr = *max_addr = a[0];
for(i = 1; i < size; ++i)
{
if(a[i] < *min_addr) *min_addr = a[i];
if(a[i] > *max_addr) *max_addr = a[i];
}
}
void print_repeats(int a[], long long unsigned size)
{
long long unsigned i;
int min, max = min;
long long diff, q, r;
char* duos;
set_min_max(a, size, &min, &max);
diff = (long long)max - (long long)min;
duos = calloc(diff / 4 + 1, 1);
for(i = 0; i < size; ++i)
{
diff = (long long)a[i] - (long long)min; /* index of duo-bit
corresponding to a[i]
in sequence of duo-bits */
q = diff / 4; /* index of byte containing duo-bit in "duos" */
r = diff % 4; /* offset of duo-bit */
switch( (duos[q] >> (6 - 2*r )) & 3 )
{
case 0: duos[q] += (1 << (6 - 2*r));
break;
case 1: duos[q] += (1 << (6 - 2*r));
printf("%d ", a[i]);
}
}
putchar('\n');
free(duos);
}
void main()
{
int a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
print_repeats(a, sizeof(a)/sizeof(int));
}
The definition of big-O notation is that its argument is a function (f(x)) that, as the variable in the function (x) tends to infinity, there exists a constant K such that the objective cost function will be smaller than Kf(x). Typically f is chosen to be the smallest such simple function such that the condition is satisfied. (It's pretty obvious how to lift the above to multiple variables.)
This matters because that K — which you aren't required to specify — allows a whole multitude of complex behavior to be hidden out of sight. For example, if the core of the algorithm is O(n2), it allows all sorts of other O(1), O(logn), O(n), O(nlogn), O(n3/2), etc. supporting bits to be hidden, even if for realistic input data those parts are what actually dominate. That's right, it can be completely misleading! (Some of the fancier bignum algorithms have this property for real. Lying with mathematics is a wonderful thing.)
So where is this going? Well, you can assume that int is a fixed size easily enough (e.g., 32-bit) and use that information to skip a lot of trouble and allocate fixed size arrays of flag bits to hold all the information that you really need. Indeed, by using two bits per potential value (one bit to say whether you've seen the value at all, another to say whether you've printed it) then you can handle the code with fixed chunk of memory of 1GB in size. That will then give you enough flag information to cope with as many 32-bit integers as you might ever wish to handle. (Heck that's even practical on 64-bit machines.) Yes, it's going to take some time to set that memory block up, but it's constant so it's formally O(1) and so drops out of the analysis. Given that, you then have constant (but whopping) memory consumption and linear time (you've got to look at each value to see whether it's new, seen once, etc.) which is exactly what was asked for.
It's a dirty trick though. You could also try scanning the input list to work out the range allowing less memory to be used in the normal case; again, that adds only linear time and you can strictly bound the memory required as above so that's constant. Yet more trickiness, but formally legal.
[EDIT] Sample C code (this is not C++, but I'm not good at C++; the main difference would be in how the flag arrays are allocated and managed):
#include <stdio.h>
#include <stdlib.h>
// Bit fiddling magic
int is(int *ary, unsigned int value) {
return ary[value>>5] & (1<<(value&31));
}
void set(int *ary, unsigned int value) {
ary[value>>5] |= 1<<(value&31);
}
// Main loop
void print_repeats(int a[], unsigned size) {
int *seen, *done;
unsigned i;
seen = calloc(134217728, sizeof(int));
done = calloc(134217728, sizeof(int));
for (i=0; i<size; i++) {
if (is(done, (unsigned) a[i]))
continue;
if (is(seen, (unsigned) a[i])) {
set(done, (unsigned) a[i]);
printf("%d ", a[i]);
} else
set(seen, (unsigned) a[i]);
}
printf("\n");
free(done);
free(seen);
}
void main() {
int a[] = {1,0,-2,4,4,1,3,1,-2};
print_repeats(a,sizeof(a)/sizeof(int));
}
Since you have an array of integers you can use the straightforward solution with sorting the array (you didn't say it can't be modified) and printing duplicates. Integer arrays can be sorted with O(n) and O(1) time and space complexities using Radix sort. Although, in general it might require O(n) space, the in-place binary MSD radix sort can be trivially implemented using O(1) space (look here for more details).
The O(1) space constraint is intractable.
The very fact of printing the array itself requires O(N) storage, by definition.
Now, feeling generous, I'll give you that you can have O(1) storage for a buffer within your program and consider that the space taken outside the program is of no concern to you, and thus that the output is not an issue...
Still, the O(1) space constraint feels intractable, because of the immutability constraint on the input array. It might not be, but it feels so.
And your solution overflows, because you try to memorize an O(N) information in a finite datatype.
There is a tricky problem with definitions here. What does O(n) mean?
Konstantin's answer claims that the radix sort time complexity is O(n). In fact it is O(n log M), where the base of the logarithm is the radix chosen, and M is the range of values that the array elements can have. So, for instance, a binary radix sort of 32-bit integers will have log M = 32.
So this is still, in a sense, O(n), because log M is a constant independent of n. But if we allow this, then there is a much simpler solution: for each integer in the range (all 4294967296 of them), go through the array to see if it occurs more than once. This is also, in a sense, O(n), because 4294967296 is also a constant independent of n.
I don't think my simple solution would count as an answer. But if not, then we shouldn't allow the radix sort, either.
I doubt this is possible. Assuming there is a solution, let's see how it works. I'll try to be as general as I can and show that it can't work... So, how does it work?
Without losing generality we could say we process the array k times, where k is fixed. The solution should also work when there are m duplicates, with m >> k. Thus, in at least one of the passes, we should be able to output x duplicates, where x grows when m grows. To do so, some useful information has been computed in a previous pass and stored in the O(1) storage. (The array itself can't be used, this would give O(n) storage.)
The problem: we have O(1) of information, when we walk over the array we have to identify x numbers(to output them). We need a O(1) storage than can tell us in O(1) time, if an element is in it. Or said in a different way, we need a data structure to store n booleans (of wich x are true) that uses O(1) space, and takes O(1) time to query.
Does this data structure exists? If not, then we can't find all duplicates in an array with O(n) time and O(1) space (or there is some fancy algorithm that works in a completely different manner???).
I really don't see how you can have only O(1) space and not modify the initial array. My guess is that you need an additional data structure. For example, what is the range of the integers? If it's 0..N like in the other question you linked, you can have an additinal count array of size N. Then in O(N) traverse the original array and increment the counter at the position of the current element. Then traverse the other array and print the numbers with count >= 2. Something like:
int* counts = new int[N];
for(int i = 0; i < N; i++) {
counts[input[i]]++;
}
for(int i = 0; i < N; i++) {
if(counts[i] >= 2) cout << i << " ";
}
delete [] counts;
Say you can use the fact you are not using all the space you have. You only need one more bit per possible value and you have lots of unused bit in your 32-bit int values.
This has serious limitations, but works in this case. Numbers have to be between -n/2 and n/2 and if they repeat m times, they will be printed m/2 times.
void print_repeats(long a[], unsigned size) {
long i, val, pos, topbit = 1 << 31, mask = ~topbit;
for (i = 0; i < size; i++)
a[i] &= mask;
for (i = 0; i < size; i++) {
val = a[i] & mask;
if (val <= mask/2) {
pos = val;
} else {
val += topbit;
pos = size + val;
}
if (a[pos] < 0) {
printf("%d\n", val);
a[pos] &= mask;
} else {
a[pos] |= topbit;
}
}
}
void main() {
long a[] = {1, 0, -2, 4, 4, 1, 3, 1, -2};
print_repeats(a, sizeof (a) / sizeof (long));
}
prints
4
1
-2

How to find if 3 numbers in a set of size N exactly sum up to M

I want to know how I can implement a better solution than O(N^3). Its similar to the knapsack and subset problems. In my question N<=8000, so i started computing sums of pairs of numbers and stored them in an array. Then I would binary search in the sorted set for each (M-sum[i]) value but the problem arises how will I keep track of the indices which summed up to sum[i]. I know I could declare extra space but my Sums array already has a size of 64 million, and hence I couldn't complete my O(N^2) solution. Please advice if I can do some optimization or if I need some totally different technique.
You could benefit from some generic tricks to improve the performance of your algorithm.
1) Don't store what you use only once
It is a common error to store more than you really need. Whenever your memory requirement seem to blow up the first question to ask yourself is Do I really need to store that stuff ? Here it turns out that you do not (as Steve explained in comments), compute the sum of two numbers (in a triangular fashion to avoid repeating yourself) and then check for the presence of the third one.
We drop the O(N**2) memory complexity! Now expected memory is O(N).
2) Know your data structures, and in particular: the hash table
Perfect hash tables are rarely (if ever) implemented, but it is (in theory) possible to craft hash tables with O(1) insertion, check and deletion characteristics, and in practice you do approach those complexities (tough it generally comes at the cost of a high constant factor that will make you prefer so-called suboptimal approaches).
Therefore, unless you need ordering (for some reason), membership is better tested through a hash table in general.
We drop the 'log N' term in the speed complexity.
With those two recommendations you easily get what you were asking for:
Build a simple hash table: the number is the key, the index the satellite data associated
Iterate in triangle fashion over your data set: for i in [0..N-1]; for j in [i+1..N-1]
At each iteration, check if K = M - set[i] - set[j] is in the hash table, if it is, extract k = table[K] and if k != i and k != j store the triple (i,j,k) in your result.
If a single result is sufficient, you can stop iterating as soon as you get the first result, otherwise you just store all the triples.
There is a simple O(n^2) solution to this that uses only O(1)* memory if you only want to find the 3 numbers (O(n) memory if you want the indices of the numbers and the set is not already sorted).
First, sort the set.
Then for each element in the set, see if there are two (other) numbers that sum to it. This is a common interview question and can be done in O(n) on a sorted set.
The idea is that you start a pointer at the beginning and one at the end, if your current sum is not the target, if it is greater than the target, decrement the end pointer, else increment the start pointer.
So for each of the n numbers we do an O(n) search and we get an O(n^2) algorithm.
*Note that this requires a sort that uses O(1) memory. Hell, since the sort need only be O(n^2) you could use bubble sort. Heapsort is O(n log n) and uses O(1) memory.
Create a "bitset" of all the numbers which makes it constant time to check if a number is there. That is a start.
The solution will then be at most O(N^2) to make all combinations of 2 numbers.
The only tricky bit here is when the solution contains a repeat, but it doesn't really matter, you can discard repeats unless it is the same number 3 times because you will hit the "repeat" case when you pair up the 2 identical numbers and see if the unique one is present.
The 3 times one is simply a matter of checking if M is divisible by 3 and whether M/3 appears 3 times as you create the bitset.
This solution does require creating extra storage, up to MAX/8 where MAX is the highest number in your set. You could use a hash table though if this number exceeds a certain point: still O(1) lookup.
This appears to work for me...
#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int main(void)
{
set<long long> keys;
// By default this set is sorted
set<short> N;
N.insert(4);
N.insert(8);
N.insert(19);
N.insert(5);
N.insert(12);
N.insert(35);
N.insert(6);
N.insert(1);
typedef set<short>::iterator iterator;
const short M = 18;
for(iterator i(N.begin()); i != N.end() && *i < M; ++i)
{
short d1 = M - *i; // subtract the value at this location
// if there is more to "consume"
if (d1 > 0)
{
// ignore below i as we will have already scanned it...
for(iterator j(i); j != N.end() && *j < M; ++j)
{
short d2 = d1 - *j; // again "consume" as much as we can
// now the remainder must eixst in our set N
if (N.find(d2) != N.end())
{
// means that the three numbers we've found, *i (from first loop), *j (from second loop) and d2 exist in our set of N
// now to generate the unique combination, we need to generate some form of key for our keys set
// here we take advantage of the fact that all the numbers fit into a short, we can construct such a key with a long long (8 bytes)
// the 8 byte key is made up of 2 bytes for i, 2 bytes for j and 2 bytes for d2
// and is formed in sorted order
long long key = *i; // first index is easy
// second index slightly trickier, if it's less than j, then this short must be "after" i
if (*i < *j)
key = (key << 16) | *j;
else
key |= (static_cast<int>(*j) << 16); // else it's before i
// now the key is either: i | j, or j | i (where i & j are two bytes each, and the key is currently 4 bytes)
// third index is a bugger, we have to scan the key in two byte chunks to insert our third short
if ((key & 0xFFFF) < d2)
key = (key << 16) | d2; // simple, it's the largest of the three
else if (((key >> 16) & 0xFFFF) < d2)
key = (((key << 16) | (key & 0xFFFF)) & 0xFFFF0000FFFFLL) | (d2 << 16); // its less than j but greater i
else
key |= (static_cast<long long>(d2) << 32); // it's less than i
// Now if this unique key already exists in the hash, this won't insert an entry for it
keys.insert(key);
}
// else don't care...
}
}
}
// tells us how many unique combinations there are
cout << "size: " << keys.size() << endl;
// prints out the 6 bytes for representing the three numbers
for(set<long long>::iterator it (keys.begin()), end(keys.end()); it != end; ++it)
cout << hex << *it << endl;
return 0;
}
Okay, here is attempt two: this generates the output:
start: 19
size: 4
10005000c
400060008
500050008
600060006
As you can see from there, the first "key" is the three shorts (in hex), 0x0001, 0x0005, 0x000C (which is 1, 5, 12 = 18), etc.
Okay, cleaned up the code some more, realised that the reverse iteration is pointless..
My Big O notation is not the best (never studied computer science), however I think the above is something like, O(N) for outer and O(NlogN) for inner, reason for log N is that std::set::find() is logarithmic - however if you replace this with a hashed set, the inner loop could be as good as O(N) - please someone correct me if this is crap...
I combined the suggestions by #Matthieu M. and #Chris Hopman, and (after much trial and error) I came up with this algorithm that should be O(n log n + log (n-k)! + k) in time and O(log(n-k)) in space (the stack). That should be O(n log n) overall. It's in Python, but it doesn't use any Python-specific features.
import bisect
def binsearch(r, q, i, j): # O(log (j-i))
return bisect.bisect_left(q, r, i, j)
def binfind(q, m, i, j):
while i + 1 < j:
r = m - (q[i] + q[j])
if r < q[i]:
j -= 1
elif r > q[j]:
i += 1
else:
k = binsearch(r, q, i + 1, j - 1) # O(log (j-i))
if not (i < k < j):
return None
elif q[k] == r:
return (i, k, j)
else:
return (
binfind(q, m, i + 1, j)
or
binfind(q, m, i, j - 1)
)
def find_sumof3(q, m):
return binfind(sorted(q), m, 0, len(q) - 1)
Not trying to boast about my programming skills or add redundant stuff here.
Just wanted to provide beginners with an implementation in C++.
Implementation based on the pseudocode provided by Charles Ma at Given an array of numbers, find out if 3 of them add up to 0.
I hope the comments help.
#include <iostream>
using namespace std;
void merge(int originalArray[], int low, int high, int sizeOfOriginalArray){
// Step 4: Merge sorted halves into an auxiliary array
int aux[sizeOfOriginalArray];
int auxArrayIndex, left, right, mid;
auxArrayIndex = low;
mid = (low + high)/2;
right = mid + 1;
left = low;
// choose the smaller of the two values "pointed to" by left, right
// copy that value into auxArray[auxArrayIndex]
// increment either left or right as appropriate
// increment auxArrayIndex
while ((left <= mid) && (right <= high)) {
if (originalArray[left] <= originalArray[right]) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}else{
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
}
// here when one of the two sorted halves has "run out" of values, but
// there are still some in the other half; copy all the remaining values
// to auxArray
// Note: only 1 of the next 2 loops will actually execute
while (left <= mid) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}
while (right <= high) {
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
// all values are in auxArray; copy them back into originalArray
int index = low;
while (index <= high) {
originalArray[index] = aux[index];
index++;
}
}
void mergeSortArray(int originalArray[], int low, int high){
int sizeOfOriginalArray = high + 1;
// base case
if (low >= high) {
return;
}
// Step 1: Find the middle of the array (conceptually, divide it in half)
int mid = (low + high)/2;
// Steps 2 and 3: Recursively sort the 2 halves of origianlArray and then merge those
mergeSortArray(originalArray, low, mid);
mergeSortArray(originalArray, mid + 1, high);
merge(originalArray, low, high, sizeOfOriginalArray);
}
//O(n^2) solution without hash tables
//Basically using a sorted array, for each number in an array, you use two pointers, one starting from the number and one starting from the end of the array, check if the sum of the three elements pointed to by the pointers (and the current number) is >, < or == to the targetSum, and advance the pointers accordingly or return true if the targetSum is found.
bool is3SumPossible(int originalArray[], int targetSum, int sizeOfOriginalArray){
int high = sizeOfOriginalArray - 1;
mergeSortArray(originalArray, 0, high);
int temp;
for (int k = 0; k < sizeOfOriginalArray; k++) {
for (int i = k, j = sizeOfOriginalArray-1; i <= j; ) {
temp = originalArray[k] + originalArray[i] + originalArray[j];
if (temp == targetSum) {
return true;
}else if (temp < targetSum){
i++;
}else if (temp > targetSum){
j--;
}
}
}
return false;
}
int main()
{
int arr[] = {2, -5, 10, 9, 8, 7, 3};
int size = sizeof(arr)/sizeof(int);
int targetSum = 5;
//3Sum possible?
bool ans = is3SumPossible(arr, targetSum, size); //size of the array passed as a function parameter because the array itself is passed as a pointer. Hence, it is cummbersome to calculate the size of the array inside is3SumPossible()
if (ans) {
cout<<"Possible";
}else{
cout<<"Not possible";
}
return 0;
}