How to optimize my Langford Sequence function? - c++

This is my code for building a Langford sequence from an array of pairs of numbers (112233 -> 312132). I wanted to write a recursive function as a self-improvement exercise with algorithms, because I wasn't able to find one anywhere online. My question is: how do I optimize it? Is there a way to apply dynamic programming and get better time/space complexity, with the emphasis on time? I believe my current runtime complexity is O(n^2) and my space complexity is O(n). Any help in writing cleaner code is also appreciated. Also, is this a P or an NP problem? Thanks.
#include <iostream>
using namespace std;
const int arrLen = 8;
const int seqLen = 8;
bool langfordSequence(int *arr, int indx, int *seq, int pos);
int main() {
    int arr[] = {1, 1, 2, 2, 3, 3, 4, 4};
    int seq[] = {0, 0, 0, 0, 0, 0, 0, 0};
    bool test = langfordSequence(arr, 0, seq, 0);
    if (test)
        cout << "Langford Sequence Successful: " << endl;
    else
        cout << "Langford Sequence Failed: " << endl;
    for (int i = 0; i < seqLen; i++)
    {
        cout << seq[i] << " ";
    }
    return 0;
}
bool langfordSequence(int *arr, int indx, int *seq, int pos)
{
    if (indx >= arrLen - 1) // this means we've reached the end of the array
        return true;
    if (pos + arr[indx] + 1 >= seqLen) // if the second part of the number is off the array
        return false;
    if (seq[pos] == 0 && seq[pos + arr[indx] + 1] == 0)
    {
        seq[pos] = arr[indx];
        seq[pos + arr[indx] + 1] = arr[indx];
        if (langfordSequence(arr, indx + 2, seq, 0)) // the current pair is good, go to the next one, start from the beginning
            return true;
        // the pair did not lead to a solution: undo it before trying the next position
        seq[pos] = 0;
        seq[pos + arr[indx] + 1] = 0;
    }
    // current position is no good, try next position
    // (returning the result directly also covers the failure path, which previously fell off the end without a return value)
    return langfordSequence(arr, indx, seq, pos + 1);
}

Here’s pseudocode for the idea I was referring to in my comments. I haven’t searched to see who else has done something like this yet (because I like to solve things myself first) but someone else probably has priority.
Algorithm LANGFORD
Parameters: N (largest element in the top-level, final sequence), M (largest element of the intermediate, hooked sequence). At the top level, M = N.
Returns: A list of all sequences of length 2N such that each element j in 1..M appears exactly twice separated by exactly j elements and the position of the second M is less than N + M/2 + 1. All other elements of the sequence are set to 0.
If M == 1 (base case)
    Let S' := []
    For i := 0 to N-2
        Let s' be the length 2N sequence containing the subsequence "101" starting at position i (counting from 0), and zero everywhere else.
        Insert s' into S'
    Return S'
Otherwise: (inductive case)
    Let S' := []
    Let S := LANGFORD(N,M-1)
    For each s in S
        Let r := reverse(s)
        For i := 0 to floor(N - M/2 + 1)
            If s[i] == s[i+M+1] == 0
                Let s' be s with s'[i] and s'[i+M+1] replaced by M
                Insert s' into S'
            If r != s and r[i] == r[i+M+1] == 0
                Let r' be r with r'[i] and r'[i+M+1] replaced by M
                Insert r' into S'
    Return S'
Running this algorithm for N = 4, we have initially M = 4 and recurse until N = 4, M = 1. This step gives us the list [[10100000],[01010000],[00101000]]. We pass this back up to the M=2 step, which finds the hooked sequences [[12102000],[10120020],[20020101],[02002101],[00201210],[01210200],[20021010],[00201210],[20121000],[02012100]]. Passing these up to the M=3 step, we get [[30023121],[13120320],[13102302],[31213200],[23021310],[23121300],[03121320]]. Finally, we return to the top-level function and find the sequence [[41312432]], which also represents its symmetric dual 23421314.
Essentially, we're trying to fit each puzzle piece like "30003" into each potential solution, keeping in mind that the mirror image of any solution is a solution. The time and space complexity are dominated by the combinatorial explosion of potential solutions for values of M around N/2. It might be fast to store the sequences as byte arrays aligned to use vector instructions, and the lists as array lists (vector in C++, [sequence] in Haskell, etc.).
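The "fit the puzzle piece" step can be sketched in C++ as follows. This is only an illustration under stated assumptions: placePair is a hypothetical helper name, sequences are stored as std::vector<int> with 0 marking a free slot, and the bound on i and the mirror-image handling from the pseudocode are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the answer): return every copy of `s` in which
// the pair m, m has been placed into two free slots with exactly m cells between
// them. A value of 0 marks a free slot. No symmetry bound is applied here.
std::vector<std::vector<int>> placePair(const std::vector<int>& s, int m) {
    std::vector<std::vector<int>> result;
    const std::size_t gap = static_cast<std::size_t>(m) + 1;  // index distance between the two copies
    for (std::size_t i = 0; i + gap < s.size(); ++i) {
        if (s[i] == 0 && s[i + gap] == 0) {
            std::vector<int> next = s;  // copy the partial sequence, then fill both slots
            next[i] = m;
            next[i + gap] = m;
            result.push_back(next);
        }
    }
    return result;
}
```

For s = 10100000 and m = 2 this yields 12102000, 10120020 and 10102002. The first two appear in the M=2 list above; the third appears there only in mirrored form (20020101), which is the kind of duplicate the bound on i is meant to avoid.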

How to Write Recursive Majority Element Algorithm [duplicate]

An array is said to have a majority element if more than half of its elements are the same. Is there a divide-and-conquer algorithm for determining if an array has a majority element?
I normally do the following, but it is not using divide-and-conquer. I do not want to use the Boyer-Moore algorithm.
int find(int[] arr, int size) {
    int count = 0, i, mElement;
    for (i = 0; i < size; i++) {
        if (count == 0) mElement = arr[i];
        if (arr[i] == mElement) count++;
        else count--;
    }
    count = 0;
    for (i = 0; i < size; i++) {
        if (arr[i] == mElement) count++;
    }
    if (count > size / 2) return mElement;
    return -1;
}
I can see at least one divide and conquer method.
Start by finding the median, such as with Hoare's Select algorithm. If one value forms a majority of the elements, the median must have that value, so we've just found the value we're looking for.
From there, find (for example) the 25th and 75th percentile items. Again, if there's a majority element, at least one of those would need to have the same value as the median.
Assuming you haven't ruled out there being a majority element yet, you can continue the search. For example, let's assume the 75th percentile was equal to the median, but the 25th percentile wasn't.
We then continue searching for the item halfway between the 25th percentile and the median, as well as the one halfway between the 75th percentile and the end.
Continue finding the median of each partition that must contain the end of the elements with the same value as the median until you've either confirmed or denied the existence of a majority element.
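As an aside: the Boyer-Moore string-search algorithm is a way of finding a substring in a string, so it does not apply to this task. The question presumably refers to the Boyer-Moore majority vote algorithm, which is what the posted code implements.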
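The median idea above can be sketched in C++. This is a simplified variant, not the percentile search described in the answer: it assumes the elements are ints with an ordering, uses std::nth_element in place of a hand-written Hoare's Select, and confirms the candidate with a single counting pass. The name majorityViaMedian is made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: stores the majority element in `out` and returns true,
// or returns false when no value fills more than half of `v`.
// `v` is taken by value because std::nth_element reorders its input.
bool majorityViaMedian(std::vector<int> v, int& out) {
    if (v.empty()) return false;
    std::vector<int>::iterator mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());  // selection step: *mid is now the median
    const int candidate = *mid;                 // a majority value, if any, must sit here
    const std::size_t count =
        static_cast<std::size_t>(std::count(v.begin(), v.end(), candidate));
    if (2 * count > v.size()) {                 // verify: strictly more than half
        out = candidate;
        return true;
    }
    return false;
}
```

The counting pass is linear, so it costs no more than the selection itself; the percentile refinement in the answer would be an alternative way to do the same verification.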
There is, and it does not require the elements to have an order.
To be formal, we're dealing with multisets (also called bags.) In the following, for a multiset S, let:
v(e;S) be the multiplicity of an element e in S, i.e. the number of times it occurs (the multiplicity is zero if e is not a member of S at all.)
#S be the cardinality of S, i.e. the number of elements in S counting multiplicity.
⊕ be the multiset sum: if S = L ⊕ R then S contains all the elements of L and R counting multiplicity, i.e. v(e;S) = v(e;L) + v(e;R) for any element e. (This also shows that the multiplicity can be calculated by 'divide-and-conquer'.)
[x] be the largest integer less than or equal to x.
The majority element m of S, if it exists, is that element such that 2 v(m;S) > #S.
Let's call L and R a splitting of S if L ⊕ R = S and an even splitting if |#L - #R| ≤ 1. That is, if n=#S is even, L and R have exactly half the elements of S, and if n is odd, then one has cardinality [n/2] and the other has cardinality [n/2]+1.
For an arbitrary split of S into L and R, two observations:
If neither L nor R has a majority element, then S cannot: for any element e, 2 v(e;S) = 2 v(e;L) + 2 v(e;R) ≤ #L + #R = #S.
If one of L and R has a majority element m with multiplicity k, then it is the majority element of S only if it has multiplicity r in the other half, with 2(k+r) > #S.
The algorithm majority(S) below returns either a pair (m,k), indicating that m is the majority element with k occurrences, or none:
If S is empty, return none; if S has just one element m, then return (m,1). Otherwise, with n = #S:
1. Make an even split of S into two halves L and R.
2. Let (m,k) = majority(L), if not none:
   a. Let k' = k + v(m;R).
   b. Return (m,k') if 2 k' > n.
3. Otherwise let (m,k) = majority(R), if not none:
   a. Let k' = k + v(m;L).
   b. Return (m,k') if 2 k' > n.
4. Otherwise return none.
Note that the algorithm is still correct even if the split is not an even one. Splitting evenly though is likely to perform better in practice.
Addendum
Made the terminal case explicit in the algorithm description above. Some sample C++ code:
struct majority_t {
    int m; // majority element
    size_t k; // multiplicity of m; zero => no majority element
    constexpr majority_t(): m(0), k(0) {}
    constexpr majority_t(int m_,size_t k_): m(m_), k(k_) {}
    explicit operator bool() const { return k>0; }
};
static constexpr majority_t no_majority;
size_t multiplicity(int x,const int *arr,size_t n) {
    if (n==0) return 0;
    else if (n==1) return arr[0]==x?1:0;
    size_t r=n/2;
    return multiplicity(x,arr,r)+multiplicity(x,arr+r,n-r);
}
majority_t majority(const int *arr,size_t n) {
    if (n==0) return no_majority;
    else if (n==1) return majority_t(arr[0],1);
    size_t r=n/2;
    majority_t left=majority(arr,r);
    if (left) {
        left.k+=multiplicity(left.m,arr+r,n-r);
        if (left.k>r) return left;
    }
    majority_t right=majority(arr+r,n-r);
    if (right) {
        right.k+=multiplicity(right.m,arr,r);
        if (right.k>r) return right;
    }
    return no_majority;
}
A simpler divide-and-conquer algorithm is intended for the case where more than half of the elements are the same and there are n = 2^k elements for some integer k.
FindMost(A, startIndex, endIndex)
{ // input array A
    if (startIndex == endIndex) // base case
        return A[startIndex];
    x = FindMost(A, startIndex, (startIndex + endIndex - 1)/2);
    y = FindMost(A, (startIndex + endIndex - 1)/2 + 1, endIndex);
    if (x == null && y == null)
        return null;
    else if (x == null && y != null)
        return y;
    else if (x != null && y == null)
        return x;
    else if (x != y)
        return null;
    else return x
}
This algorithm could be modified so that it works for n that is not a power of 2, but boundary cases must be handled carefully. Note that the result is only a candidate: the x != y branch discards both halves without counting, so a true majority can be lost. For example, on 2, 2, 2, 1, 1, 1, 1, 1 the left half returns 2, the right half returns 1, and the call returns null even though 1 occurs 5 times out of 8.
Let's say the array is 1, 2, 1, 1, 3, 1, 4, 1, 6, 1.
If more than half of the elements of an array are the same and the array has even length, there must be a position where two consecutive elements are the same. (For odd length this can fail: in 1, 2, 1 the majority element 1 never appears twice in a row.)
In the above example, 1 is repeated more than half the time, and indexes 2 and 3 (counting from 0) hold the same element.

Array-Sum Operation

I have written this code using a vector. Some test cases pass, but others fail with a timeout termination error.
The problem statement is:-
You have an identity permutation of N integers as an array initially. An identity permutation of N integers is [1,2,3,...N-1,N]. In this task, you have to perform M operations on the array and report the sum of the elements of the array after each operation.
The ith operation consists of an integer opi.
If the array contains opi, swap the first and last elements in the array.
Else, remove the last element of the array and push opi to the end of the array.
Input Format
The first line contains two space-separated integers N and M.
Then, M lines follow denoting the operations opi.
Constraints :
2<=N,M <= 10^5
1 <= op <= 5*10^5
Output Format
Print M lines, each containing a single integer denoting the answer to each of the M operations.
Sample Input 0
3 2
4
2
Sample Output 0
7
7
Explanation 0
Initially, the array is [1,2,3].
After the 1st operation, the array becomes[1,2,4] as opi = 4, as 4 is not present in the current array, we remove 3 and push 4 to the end of the array and hence, sum=7 .
After 2nd operation the array becomes [4,2,1] as opi = 2, as 2 is present in the current array, we swap 1 and 4 and hence, sum=7.
Here is my code:
#include <bits/stdc++.h>
using namespace std;
int main()
{
long int N,M,op,i,t=0;
vector<long int > g1;
cin>>N>>M;
if(N>=2 && M>=2) {
g1.reserve(N);
for(i = 1;i<=N;i++) {
g1.push_back(i);
}
while(M--) {
cin>>op;
auto it = find(g1.begin(), g1.end(), op);
if(it != (g1.end())) {
t = g1.front();
g1.front() = g1.back();
g1.back() = t;
cout<<accumulate(g1.begin(), g1.end(), 0);
cout<<endl;
}
else {
g1.back() = op;
cout<<accumulate(g1.begin(), g1.end(), 0);
cout<<endl;
}
}
}
return 0;
}
Please Suggest changes.
Looking carefully at the question, you will find that the operations only ever touch the first and last elements. So there is no need to keep a whole vector, much less recompute the sum each time. The sum of all elements except the first and last is (n+1)(n-2)/2, and we only need to track the first and last elements. The search also shrinks to a constant-time check: op is in the array exactly when 1 < op < n, or op == first element, or op == last element.
P.S. I am not sure it will work completely, but it is certainly faster.
My guess: take N = 3 and op = [4, 2], so the array is [1,2,3].
sum = ((N-2) * (N+1)) / 2 leaves out the first and last elements and gives the sum of the numbers between them.
We then only need to track the first and last elements, so each operation is O(1) and the whole run is linear in the number of operations.
function performOperations(N, op) {
let out = [];
let first = 1, last = N;
let sum = Math.ceil( ((N-2) * (N+1)) / 2);
for(let i =0;i<op.length;i++){
let not_between = !(op[i] >= 2 && op[i] <= N-1);
if( first!= op[i] && last != op[i] && not_between) {
last = op[i];
}else {
let t = first;
first = last;
last = t;
}
out.push(sum + first +last)
}
return out;
}
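Since the question is in C++, the same idea can be sketched there too. This is a minimal sketch, assuming the reasoning above is right that the middle elements 2..N-1 never change; the function name performOperations mirrors the JavaScript version, and long long is used because the sum can exceed the range of a 32-bit int for N near 10^5.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sketch: track only the first and last elements; the middle 2..n-1 is fixed.
std::vector<long long> performOperations(long long n, const std::vector<long long>& ops) {
    long long first = 1, last = n;
    const long long middle = (n - 2) * (n + 1) / 2;  // sum of 2..n-1 (0 when n == 2)
    std::vector<long long> out;
    out.reserve(ops.size());
    for (long long op : ops) {
        // op is present if it is the first, the last, or one of the untouched middle values
        const bool present = op == first || op == last || (op >= 2 && op <= n - 1);
        if (present) {
            std::swap(first, last);  // the swap does not change the sum
        } else {
            last = op;               // replace the last element
        }
        out.push_back(middle + first + last);
    }
    return out;
}
```

With N = 3 and the operations 4, 2 from the sample, this returns 7 and 7. Reading the input and printing each value with '\n' rather than endl keeps the I/O cost low as well.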

Divide array into smaller consecutive parts such that NEO value is maximal

At this year's Bubble Cup (now finished) there was the problem NEO (which I couldn't solve), which asks:
Given an array with n integer elements, we divide it into several parts (possibly just 1), each part being a run of consecutive elements. The NEO value of a division is the sum of the values of its parts, where the value of a part is the sum of all elements in the part multiplied by its length.
Example: We have array: [ 2 3 -2 1 ]. If we divide it like: [2 3] [-2 1]. Then NEO = (2 + 3) * 2 + (-2 + 1) * 2 = 10 - 2 = 8.
The number of elements in the array is smaller than 10^5 and the numbers are integers between -10^6 and 10^6.
I've tried something like divide and conquer: keep splitting the array into two parts as long as that increases the NEO value, otherwise return the NEO of the whole array. Unfortunately the algorithm has worst-case O(N^2) complexity (my implementation is below), so I'm wondering whether there is a better solution.
EDIT: My algorithm (greedy) doesn't work. For example, for [1,2,-6,2,1] my algorithm returns the whole array, while the maximal NEO value comes from the parts [1,2],[-6],[2,1], which gives (1+2)*2+(-6)+(1+2)*2=6.
#include <iostream>
int maxInterval(long long int suma[],int first,int N)
{
long long int max = -1000000000000000000LL;
long long int curr;
if(first==N) return 0;
int k;
for(int i=first;i<N;i++)
{
if(first>0) curr = (suma[i]-suma[first-1])*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Split the array into elements from [first..i] and [i+1..N-1] store the corresponding NEO value
else curr = suma[i]*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Same excpet that here first = 0 so suma[first-1] doesn't exist
if(curr > max) max = curr,k=i; // find the maximal NEO value for splitting into two parts
}
if(k==N-1) return max; // If the max when we take the whole array then return the NEO value of the whole array
else
{
return maxInterval(suma,first,k+1)+maxInterval(suma,k+1,N); // Split the 2 parts further if needed and return it's sum
}
}
int main() {
int T;
std::cin >> T;
for(int j=0;j<T;j++) // Iterate over all the test cases
{
int N;
long long int NEO[100010]; // Values, could be long int but just to be safe
long long int suma[100010]; // sum[i] = sum of NEO values from NEO[0] to NEO[i]
long long int sum=0;
int k;
std::cin >> N;
for(int i=0;i<N;i++)
{
std::cin >> NEO[i];
sum+=NEO[i];
suma[i] = sum;
}
std::cout << maxInterval(suma,0,N) << std::endl;
}
return 0;
}
This is not a complete solution but should provide some helpful direction.
1. Combining two groups that each have a positive sum (or one of the sums is non-negative) would always yield a bigger NEO than leaving them separate:
   m * a + n * b < (m + n) * (a + b) where a, b > 0 (or a > 0, b >= 0); m and n are subarray lengths
2. Combining a group with a negative sum with an entire group of non-negative numbers always yields a greater NEO than combining it with only part of the non-negative group. But excluding the group with the negative sum could yield an even greater NEO:
   [1, 1, 1, 1] [-2] => m * a + 1 * (-b)
   Now, imagine we gradually move the dividing line to the left, increasing the sum b is combined with. While the expression on the right is negative, the NEO for the left group keeps decreasing. But if the expression on the right gets positive, relying on our first assertion (see 1.), combining the two groups would always be greater than not.
3. Combining negative numbers alone in sequence will always yield a smaller NEO than leaving them separate:
   -a - b - c ... = -1 * (a + b + c ...)
   l * (-a - b - c ...) = -l * (a + b + c ...)
   -l * (a + b + c ...) < -1 * (a + b + c ...) where l > 1; a, b, c ... > 0
O(n^2) time, O(n) space JavaScript code:
function f(A){
A.unshift(0);
let negatives = [];
let prefixes = new Array(A.length).fill(0);
let m = new Array(A.length).fill(0);
for (let i=1; i<A.length; i++){
if (A[i] < 0)
negatives.push(i);
prefixes[i] = A[i] + prefixes[i - 1];
m[i] = i * (A[i] + prefixes[i - 1]);
for (let j=negatives.length-1; j>=0; j--){
let negative = prefixes[negatives[j]] - prefixes[negatives[j] - 1];
let prefix = (i - negatives[j]) * (prefixes[i] - prefixes[negatives[j]]);
m[i] = Math.max(m[i], prefix + negative + m[negatives[j] - 1]);
}
}
return m[m.length - 1];
}
console.log(f([1, 2, -5, 2, 1, 3, -4, 1, 2]));
console.log(f([1, 2, -4, 1]));
console.log(f([2, 3, -2, 1]));
console.log(f([-2, -3, -2, -1]));
Update
This blog shows that we can transform the dp queries from
dp_i = sum_i*i + max(for j < i) of ((dp_j + sum_j*j) + (-j*sum_i) + (-i*sum_j))
to
dp_i = sum_i*i + max(for j < i) of (dp_j + sum_j*j, -j, -sum_j) ⋅ (1, sum_i, i)
which means that at each iteration we look, among the vectors already seen, for the one that generates the largest dot product with the current (1, sum_i, i). The math alluded to involves a convex hull and farthest-point queries, which are beyond my reach to implement at this point, but I will make a study of them.
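For reference, the untransformed recurrence can be evaluated directly. The sketch below is the plain O(n^2) dynamic program, not the convex-hull optimization; it assumes dp_0 = 0 and sum_0 = 0 for the empty prefix, and maxNeo is a name made up for the illustration. Expanding dp_j + (i - j) * (sum_i - sum_j) gives exactly the terms in the first formula above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: dp[i] = best NEO value for the first i elements, where the last part
// covers elements j+1..i for some j < i.
long long maxNeo(const std::vector<long long>& a) {
    const std::size_t n = a.size();
    std::vector<long long> sum(n + 1, 0);  // sum[i] = a[0] + ... + a[i-1]
    for (std::size_t i = 1; i <= n; ++i) sum[i] = sum[i - 1] + a[i - 1];
    std::vector<long long> dp(n + 1, 0);   // dp[0] = 0 for the empty prefix
    for (std::size_t i = 1; i <= n; ++i) {
        long long best = static_cast<long long>(i) * sum[i];  // j = 0: one part for the whole prefix
        for (std::size_t j = 1; j < i; ++j) {
            const long long candidate =
                dp[j] + static_cast<long long>(i - j) * (sum[i] - sum[j]);
            best = std::max(best, candidate);
        }
        dp[i] = best;
    }
    return dp[n];
}
```

On [1, 2, -6, 2, 1] this returns 6, matching the split [1,2],[-6],[2,1] from the question, and on [2, 3, -2, 1] it returns 16 (the whole array as one part). With n up to 10^5 the quadratic loop is likely too slow for the contest limits, which is why the dot-product reformulation matters.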

How to find if 3 numbers in a set of size N exactly sum up to M

I want to know how I can implement a better solution than O(N^3). It's similar to the knapsack and subset-sum problems. In my case N <= 8000, so I started by computing the sums of all pairs of numbers and storing them in an array. Then I would binary search in the sorted set for each (M - sum[i]) value, but the problem is how to keep track of the indices that summed up to sum[i]. I know I could declare extra space, but my sums array already has 64 million entries, so I couldn't complete my O(N^2) solution. Please advise whether I can do some optimization or need a totally different technique.
You could benefit from some generic tricks to improve the performance of your algorithm.
1) Don't store what you use only once
It is a common error to store more than you really need. Whenever your memory requirements seem to blow up, the first question to ask yourself is: do I really need to store that stuff? Here it turns out that you do not (as Steve explained in the comments): compute the sum of two numbers (in a triangular fashion to avoid repeating yourself) and then check for the presence of the third one.
We drop the O(N**2) memory complexity! Now expected memory is O(N).
2) Know your data structures, and in particular: the hash table
Perfect hash tables are rarely (if ever) implemented, but it is (in theory) possible to craft hash tables with O(1) insertion, check and deletion characteristics, and in practice you do approach those complexities (though it generally comes at the cost of a high constant factor that can make you prefer so-called suboptimal approaches).
Therefore, unless you need ordering (for some reason), membership is better tested through a hash table in general.
We drop the 'log N' term in the speed complexity.
With those two recommendations you easily get what you were asking for:
Build a simple hash table: the number is the key, the index the satellite data associated
Iterate in triangle fashion over your data set: for i in [0..N-1]; for j in [i+1..N-1]
At each iteration, check if K = M - set[i] - set[j] is in the hash table, if it is, extract k = table[K] and if k != i and k != j store the triple (i,j,k) in your result.
If a single result is sufficient, you can stop iterating as soon as you get the first result, otherwise you just store all the triples.
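The steps above can be sketched in C++ with std::unordered_map. This is a minimal sketch under stated assumptions: the values are distinct (as in a set, so one index per value is enough), only the first triple found is reported, and findTriple with its output parameters is a hypothetical interface chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: finds distinct indices i < j and k with
// a[i] + a[j] + a[k] == m. Assumes the values in `a` are distinct.
bool findTriple(const std::vector<long long>& a, long long m,
                std::size_t& iOut, std::size_t& jOut, std::size_t& kOut) {
    std::unordered_map<long long, std::size_t> indexOf;  // number -> index (satellite data)
    for (std::size_t i = 0; i < a.size(); ++i) indexOf[a[i]] = i;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = i + 1; j < a.size(); ++j) {  // triangular iteration
            const auto it = indexOf.find(m - a[i] - a[j]);
            if (it != indexOf.end() && it->second != i && it->second != j) {
                iOut = i;
                jOut = j;
                kOut = it->second;
                return true;  // a single result is sufficient here
            }
        }
    }
    return false;
}
```

Memory stays at O(N) for the table and the expected running time is O(N^2). To collect every triple instead, the early return would be replaced by pushing (i, j, k) into a result container.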
There is a simple O(n^2) solution to this that uses only O(1)* memory if you only want to find the 3 numbers (O(n) memory if you want the indices of the numbers and the set is not already sorted).
First, sort the set.
Then for each element x in the set, see if there are two other numbers that sum to M - x. Finding two numbers with a given sum is a common interview question and can be done in O(n) on a sorted set.
The idea is to start one pointer at the beginning and one at the end. If the current sum equals the target, you are done; if it is greater than the target, decrement the end pointer; otherwise increment the start pointer.
So for each of the n numbers we do an O(n) search and we get an O(n^2) algorithm.
*Note that this requires a sort that uses O(1) memory. Hell, since the sort need only be O(n^2) you could use bubble sort. Heapsort is O(n log n) and uses O(1) memory.
Create a "bitset" of all the numbers which makes it constant time to check if a number is there. That is a start.
The solution will then be at most O(N^2) to make all combinations of 2 numbers.
The only tricky bit here is when the solution contains a repeat, but it doesn't really matter, you can discard repeats unless it is the same number 3 times because you will hit the "repeat" case when you pair up the 2 identical numbers and see if the unique one is present.
The 3 times one is simply a matter of checking if M is divisible by 3 and whether M/3 appears 3 times as you create the bitset.
This solution does require creating extra storage, up to MAX/8 where MAX is the highest number in your set. You could use a hash table though if this number exceeds a certain point: still O(1) lookup.
This appears to work for me...
#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int main(void)
{
set<long long> keys;
// By default this set is sorted
set<short> N;
N.insert(4);
N.insert(8);
N.insert(19);
N.insert(5);
N.insert(12);
N.insert(35);
N.insert(6);
N.insert(1);
typedef set<short>::iterator iterator;
const short M = 18;
for(iterator i(N.begin()); i != N.end() && *i < M; ++i)
{
short d1 = M - *i; // subtract the value at this location
// if there is more to "consume"
if (d1 > 0)
{
// ignore below i as we will have already scanned it...
for(iterator j(i); j != N.end() && *j < M; ++j)
{
short d2 = d1 - *j; // again "consume" as much as we can
// now the remainder must exist in our set N
if (N.find(d2) != N.end())
{
// means that the three numbers we've found, *i (from first loop), *j (from second loop) and d2 exist in our set of N
// now to generate the unique combination, we need to generate some form of key for our keys set
// here we take advantage of the fact that all the numbers fit into a short, we can construct such a key with a long long (8 bytes)
// the 8 byte key is made up of 2 bytes for i, 2 bytes for j and 2 bytes for d2
// and is formed in sorted order
long long key = *i; // first index is easy
// second index slightly trickier, if it's less than j, then this short must be "after" i
if (*i < *j)
key = (key << 16) | *j;
else
key |= (static_cast<int>(*j) << 16); // else it's before i
// now the key is either: i | j, or j | i (where i & j are two bytes each, and the key is currently 4 bytes)
// third index is a bugger, we have to scan the key in two byte chunks to insert our third short
if ((key & 0xFFFF) < d2)
key = (key << 16) | d2; // simple, it's the largest of the three
else if (((key >> 16) & 0xFFFF) < d2)
key = (((key << 16) | (key & 0xFFFF)) & 0xFFFF0000FFFFLL) | (d2 << 16); // its less than j but greater i
else
key |= (static_cast<long long>(d2) << 32); // it's less than i
// Now if this unique key already exists in the hash, this won't insert an entry for it
keys.insert(key);
}
// else don't care...
}
}
}
// tells us how many unique combinations there are
cout << "size: " << keys.size() << endl;
// prints out the 6 bytes for representing the three numbers
for(set<long long>::iterator it (keys.begin()), end(keys.end()); it != end; ++it)
cout << hex << *it << endl;
return 0;
}
Okay, here is attempt two: this generates the output:
start: 19
size: 4
10005000c
400060008
500050008
600060006
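As you can see, the first "key" is the three shorts (in hex), 0x0001, 0x0005, 0x000C (which is 1, 5, 12 = 18), etc. Note that keys such as 500050008 and 600060006 use the same element more than once, because the loops do not exclude j == i or a remainder d2 equal to *i or *j.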
Okay, cleaned up the code some more, realised that the reverse iteration is pointless..
My Big O notation is not the best (I never studied computer science), but I think the above is O(N) for the outer loop and O(N log N) for the inner loop, because std::set::find() is logarithmic, which makes O(N^2 log N) overall. If you replace the set with a hashed set, the inner loop could be as good as O(N), giving O(N^2) in total. Please correct me if this is wrong.
I combined the suggestions by @Matthieu M. and @Chris Hopman, and (after much trial and error) I came up with this algorithm that should be O(n log n + log (n-k)! + k) in time and O(log(n-k)) in space (the stack). That should be O(n log n) overall (this bound is an estimate; the two recursive calls in the final branch can revisit overlapping ranges). It's in Python, but it doesn't use any Python-specific features.
import bisect
def binsearch(r, q, i, j): # O(log (j-i))
    return bisect.bisect_left(q, r, i, j)
def binfind(q, m, i, j):
    while i + 1 < j:
        r = m - (q[i] + q[j])
        if r < q[i]:
            j -= 1
        elif r > q[j]:
            i += 1
        else:
            k = binsearch(r, q, i + 1, j - 1) # O(log (j-i))
            if not (i < k < j):
                return None
            elif q[k] == r:
                return (i, k, j)
            else:
                return (
                    binfind(q, m, i + 1, j)
                    or
                    binfind(q, m, i, j - 1)
                )
def find_sumof3(q, m):
    return binfind(sorted(q), m, 0, len(q) - 1)
Not trying to boast about my programming skills or add redundant stuff here.
Just wanted to provide beginners with an implementation in C++.
Implementation based on the pseudocode provided by Charles Ma at Given an array of numbers, find out if 3 of them add up to 0.
I hope the comments help.
#include <iostream>
using namespace std;
void merge(int originalArray[], int low, int high, int sizeOfOriginalArray){
// Step 4: Merge sorted halves into an auxiliary array
int aux[sizeOfOriginalArray];
int auxArrayIndex, left, right, mid;
auxArrayIndex = low;
mid = (low + high)/2;
right = mid + 1;
left = low;
// choose the smaller of the two values "pointed to" by left, right
// copy that value into auxArray[auxArrayIndex]
// increment either left or right as appropriate
// increment auxArrayIndex
while ((left <= mid) && (right <= high)) {
if (originalArray[left] <= originalArray[right]) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}else{
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
}
// here when one of the two sorted halves has "run out" of values, but
// there are still some in the other half; copy all the remaining values
// to auxArray
// Note: only 1 of the next 2 loops will actually execute
while (left <= mid) {
aux[auxArrayIndex] = originalArray[left];
left++;
auxArrayIndex++;
}
while (right <= high) {
aux[auxArrayIndex] = originalArray[right];
right++;
auxArrayIndex++;
}
// all values are in auxArray; copy them back into originalArray
int index = low;
while (index <= high) {
originalArray[index] = aux[index];
index++;
}
}
void mergeSortArray(int originalArray[], int low, int high){
int sizeOfOriginalArray = high + 1;
// base case
if (low >= high) {
return;
}
// Step 1: Find the middle of the array (conceptually, divide it in half)
int mid = (low + high)/2;
// Steps 2 and 3: Recursively sort the 2 halves of originalArray and then merge those
mergeSortArray(originalArray, low, mid);
mergeSortArray(originalArray, mid + 1, high);
merge(originalArray, low, high, sizeOfOriginalArray);
}
//O(n^2) solution without hash tables
//Basically using a sorted array, for each number in an array, you use two pointers, one starting from the number and one starting from the end of the array, check if the sum of the three elements pointed to by the pointers (and the current number) is >, < or == to the targetSum, and advance the pointers accordingly or return true if the targetSum is found.
bool is3SumPossible(int originalArray[], int targetSum, int sizeOfOriginalArray){
int high = sizeOfOriginalArray - 1;
mergeSortArray(originalArray, 0, high);
int temp;
for (int k = 0; k < sizeOfOriginalArray; k++) {
for (int i = k + 1, j = sizeOfOriginalArray-1; i < j; ) { // i starts after k and stops before j, so no element is counted twice
temp = originalArray[k] + originalArray[i] + originalArray[j];
if (temp == targetSum) {
return true;
}else if (temp < targetSum){
i++;
}else if (temp > targetSum){
j--;
}
}
}
return false;
}
int main()
{
int arr[] = {2, -5, 10, 9, 8, 7, 3};
int size = sizeof(arr)/sizeof(int);
int targetSum = 5;
//3Sum possible?
bool ans = is3SumPossible(arr, targetSum, size); // the size is passed explicitly because the array decays to a pointer, so it cannot be computed inside is3SumPossible()
if (ans) {
cout<<"Possible";
}else{
cout<<"Not possible";
}
return 0;
}