Sorting Optimization - C++

I'm currently taking an algorithms class, so I decided it would be good practice to implement a few of the sorting algorithms and compare them.
I implemented merge sort and quick sort and then compared their run times, along with std::sort's.
My computer isn't the fastest, but for 1000000 elements I get the following averages over 200 runs:
std::sort -> 0.620342 seconds
quickSort -> 2.2692 seconds
mergeSort -> 2.19048 seconds
I would appreciate comments on how to improve and optimize my implementations.
void quickSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    if(s >= e)
        return;
    int pivot;
    int a = s + (rand() % (e-s));
    int b = s + (rand() % (e-s));
    int c = s + (rand() % (e-s));
    //find median of the 3 random pivots
    int min = std::min(std::min(nums[a],nums[b]),nums[c]);
    int max = std::max(std::max(nums[a],nums[b]),nums[c]);
    if(nums[a] < max && nums[a] > min)
        pivot = a;
    else if(nums[b] < max && nums[b] > min)
        pivot = b;
    else
        pivot = c;
    int temp = nums[s];
    nums[s] = nums[pivot];
    nums[pivot] = temp;
    //partition
    int i = s + 1, j = s + 1;
    for(; j < e; j++){
        if(comparator(nums[j], nums[s])){
            temp = nums[i];
            nums[i++] = nums[j];
            nums[j] = temp;
        }
    }
    temp = nums[i-1];
    nums[i-1] = nums[s];
    nums[s] = temp;
    //sort left and right of partition
    quickSort(nums,s,i-1,comparator);
    quickSort(nums,i,e,comparator);
}
Here s is the index of the first element, e the index of the element after the last. defaultComparator is just the following lambda function:
auto defaultComparator = [](int a, int b){ return a <= b; };
std::vector<int> mergeSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    std::vector<int> sorted(e-s);
    if(s == e)
        return sorted;
    int mid = (s+e)/2;
    if(s == mid){
        sorted[0] = nums[s];
        return sorted;
    }
    std::vector<int> left = mergeSort(nums, s, mid, comparator);   // pass the comparator down
    std::vector<int> right = mergeSort(nums, mid, e, comparator);
    unsigned int i = 0, j = 0;
    unsigned int c = 0;
    while(i < left.size() || j < right.size()){
        if(i == left.size()){
            sorted[c++] = right[j++];
        }
        else if(j == right.size()){
            sorted[c++] = left[i++];
        }
        else{
            if(comparator(left[i],right[j]))
                sorted[c++] = left[i++];
            else
                sorted[c++] = right[j++];
        }
    }
    return sorted;
}
Thank you all

The first thing I see: you're passing a std::function<>, which involves a virtual-call-like indirection, one of the most expensive calling strategies. Give it a try with simply a template parameter T (which might be a function object) - the result will be direct function calls.
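A minimal sketch of that change (the template parameter name Compare is mine, not from the original code):
#include <vector>

template <typename Compare>
void quickSort(std::vector<int>& nums, int s, int e, Compare comparator) {
    // same body as above; comparator(x, y) is now a direct call that the
    // compiler can inline, instead of an indirect call through std::function
}

// callers can still pass a lambda, e.g.:
// quickSort(nums, 0, (int)nums.size(), [](int a, int b){ return a <= b; });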
Second thing: never do this result-in-local-container (vector<int> sorted;) when optimizing and an in-place variant exists. Do an in-place sort. The client should be aware that you are sorting their vector; if they wish, they can make a copy in advance. You take a non-const reference for a reason. [1]
Third, there's a cost associated with rand(), and it's far from negligible. Unless you're sure you need the randomized variant of quicksort (and its benefit of avoiding consistently bad input sequences), just use the first element as the pivot. Or the middle one.
Use std::swap() to swap two elements. Chances are it compiles down to a few register moves (or an xchg on x86/x64), which is hard to beat. Whether the optimizer identifies your intent to swap at these places without being explicit can be verified from the assembly output.
The way you found the median of the 3 elements is full of conditional moves/branches. It's simply nums[a] + nums[b] + nums[c] - max - min; but getting nums[...], min and max at the same time could also be optimized further.
Avoid i++ when aiming at speed. While most optimizers will usually create good code, there's a small chance that it's suboptimal. Be explicit when optimizing (++i after the swap), but only when optimizing.
But the most important one: valgrind/callgrind/kcachegrind. Profile, profile, profile. Only optimize what's really slow.
[1] There's an exception to this rule: const containers that you build from non-const ones. These are usually in-house types and are shared across multiple threads, hence it's better to keep them const & copy when modification is needed. In this case, you'll allocate a new container (either const or not) in your function, but you'll probably keep const one for users' convenience on API.

For quick sort, use a Hoare-like partition scheme.
http://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Median of 3 only needs 3 if / swap statements (effectively a bubble sort). No need for a min or max check.
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
if(nums[b] > nums[c])
    std::swap(nums[b], nums[c]);
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
// use nums[b] as pivot value
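For reference, a minimal sketch of a Hoare-style partition, adapted to the half-open [s, e) convention used in the question (my sketch, not the poster's code):
#include <utility>
#include <vector>

int hoarePartition(std::vector<int>& nums, int s, int e) { // e is one past the last element
    int pivotValue = nums[s + (e - s) / 2];
    int i = s - 1, j = e;
    while (true) {
        do { ++i; } while (nums[i] < pivotValue);   // both loops stop at the pivot value,
        do { --j; } while (nums[j] > pivotValue);   // so they cannot run off the range
        if (i >= j)
            return j;                               // recurse on [s, j + 1) and [j + 1, e)
        std::swap(nums[i], nums[j]);
    }
}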
For merge sort, use an entry function that does a one-time creation of a working vector, then pass that vector by reference to the actual merge sort function. For top down merge sort, the indices determine the start, middle, and end of each sub-vector.
If using top down merge sort, the code can avoid copying data by alternating the direction of merge depending on the level of recursion. This can be done using two mutually recursive functions, the first one where the result ends up in the original vector, the second one where the result ends up in the working vector. The first one calls the second one twice, then merges from the working vector back to the original vector, and vice versa for the second one. For the second one, if the size == 1, then it needs to copy 1 element from the original vector to the working vector. An alternative to two functions is to pass a boolean for which direction to merge.
If using bottom up merge sort (which will be a bit faster), then each pass swaps vectors. The number of passes needed is determined up front and in the case of an odd number of passes, the first pass swaps in place, so that the data ends up in the original vector after all merge passes are done.
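For illustration, here is a minimal sketch of the top-down variant with a one-time working vector and a boolean merge direction, as described above (the names mergeRuns, splitMerge and mergeSortTD are mine):
#include <vector>

static void mergeRuns(std::vector<int>& from, std::vector<int>& to, int s, int mid, int e) {
    int i = s, j = mid;
    for (int k = s; k < e; ++k) {
        if (i < mid && (j >= e || from[i] <= from[j]))
            to[k] = from[i++];
        else
            to[k] = from[j++];
    }
}

// toWork == true: the merged result lands in work; false: it lands in nums.
static void splitMerge(std::vector<int>& nums, std::vector<int>& work, int s, int e, bool toWork) {
    if (e - s < 2) {
        if (toWork && e - s == 1)
            work[s] = nums[s];                    // a run of size 1 must be copied across
        return;
    }
    int mid = s + (e - s) / 2;
    splitMerge(nums, work, s, mid, !toWork);      // children merge into the other buffer
    splitMerge(nums, work, mid, e, !toWork);
    if (toWork)
        mergeRuns(nums, work, s, mid, e);
    else
        mergeRuns(work, nums, s, mid, e);
}

void mergeSortTD(std::vector<int>& nums) {
    std::vector<int> work(nums.size());           // the one-time working vector
    splitMerge(nums, work, 0, (int)nums.size(), false); // false: the result ends up in nums
}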

Related

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slowly (even when compiled using GCC with -O3 optimization, and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
    //calculate size of each array
    int len1 = vec1.size();
    int len2 = vec2.size();
    int len3 = vec3.size();
    int i = 0; int j = 0; int k = 0; int res_i, res_j, res_k;
    double diff = DBL_MAX; // double, not int: the arrays hold doubles (DBL_MAX from <cfloat>)
    int iter = 0; int iter_bound = min(min(len1,len2),len3);
    while(iter < iter_bound){
        i = 0; j = 0; k = 0; // rescan from the start after each removal
        diff = DBL_MAX;
        while(i < len1 && j < len2 && k < len3){
            double minimum = min(min(vec1[i], vec2[j]), vec3[k]);
            double maximum = max(max(vec1[i], vec2[j]), vec3[k]);
            //if new difference less than previous difference, update difference, store
            //resultants
            if(maximum - minimum < diff){ diff = maximum - minimum; res_i = i; res_j = j; res_k = k; }
            //increment minimum value
            if(vec1[i] == minimum) ++i;
            else if(vec2[j] == minimum) ++j;
            else ++k;
        }
        //"remove" triplet
        vec1.erase(vec1.begin() + res_i);
        vec2.erase(vec2.begin() + res_j);
        vec3.erase(vec3.begin() + res_k);
        --len1; --len2; --len3;
        ++iter;
    }
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue of candidate triples, ordered by max - min, then by how close the median is to their average.
Make a pass through all 3 arrays, putting a reasonable triple for every element into the priority queue.
While the priority queue is not empty:
    Pull a triple out
    If none of the triple's three elements are used:
        Add the triple to the output
        Mark the triple's elements used
    else:
        If you can construct reasonable triples from the unused elements:
            Add them to the queue
Now for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that at first is easy, just keep an array of bools where you mark off the indexes of the used values. But once a lot have been taken off, your search gets long.
The trick for that is to have a vector of bools for individual elements, a second for whether both in a pair have been used, a third for where all 4 in a quadruple have been used and so on. When you use an element just mark the individual bool, then go up the hierarchy, marking off the next level if the one you're paired with is marked off, else stopping. This additional data structure of size 2n will require an average of marking 2 bools per element used, but allows you to find the next unused index in either direction in at most O(log(n)) steps.
The resulting algorithm will be O(n log(n)).
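A hedged sketch of that hierarchy of bools (all names are mine; leaves live at indices [n, 2n), an internal node is true once every leaf below it is used, and the backward search is symmetric to the forward one shown):
#include <vector>

struct UsedTracker {
    int n;                                   // leaf count, padded up to a power of two
    std::vector<bool> full;                  // full[v]: every leaf under node v is used
    explicit UsedTracker(int count) : n(1) {
        while (n < count) n *= 2;
        full.assign(2 * n, false);
        for (int i = count; i < n; ++i) markUsed(i); // padding counts as used
    }
    void markUsed(int i) {                   // marks ~2 bools per call on average
        int v = n + i;
        full[v] = true;
        while (v > 1 && full[v ^ 1]) { v /= 2; full[v] = true; }
    }
    int nextUnused(int i) const {            // smallest unused index >= i, or -1; O(log n)
        int v = n + i;
        if (!full[v]) return i;
        while (v > 1 && ((v & 1) || full[v + 1])) v /= 2; // climb to a left child with a free right sibling
        if (v == 1) return -1;               // nothing free at or after i
        for (v = v + 1; v < n; )             // descend to the leftmost free leaf
            v = !full[2 * v] ? 2 * v : 2 * v + 1;
        return v - n;
    }
};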

What is the fastest way to search for a sequence of numbers in a 2d vector?

Given a 2d array (the array can be larger than 10k*10k) with integer values, what is the fastest way to search for a given sequence of numbers in the array?
Assume the 2d array in the file is read into a big 1d vector and is accessed as big_matrix[row*width + x].
There are 3 types of searches I would like to do on the same 2d array: Search Ordered, Search Unordered, and Search Best Match. Here's my approach to each of the search functions.
Search Ordered: This function finds all the rows in which the given number sequence (order of numbers matters) is present. Here's the KMP method to find the given number sequence I implemented:
void searchPattern(std::vector<int> const &pattern, std::vector<int> const &big_matrix, int begin, int finish,
                   int width, std::vector<int> &searchResult) {
    auto M = (int) pattern.size();
    auto N = width; // size of one row
    while (begin < finish) {
        int i = 0;
        int j = 0;
        while (i < N) {
            if (pattern[j] == big_matrix[(begin * width) + i]) {
                j++;
                i++;
            }
            if (j == M) {
                searchResult[begin] = begin;
                begin++;
                break;
            } else if (i < N && pattern[j] != big_matrix[(begin * width) + i]) {
                if (j != 0)
                    j = lps[j - 1]; // lps: the precomputed KMP failure table for the pattern
                else
                    i = i + 1;
            }
        }
        if (j != M) {
            searchResult[begin] = -1;
            begin++;
        }
    }
}
Complexity: O(m*n); m is the number of rows, n is the number of cols
Search Unordered/Search Best Match: This function finds all the rows in which the given number sequence is present (order of numbers doesn't matter).
Here I sort the large array up front, and during the search I only sort the input array.
void SearchUnordered(std::vector<int> const &match, std::vector<int> const &big_matrix_sorted, int begin, int finish,
                     int width, std::vector<int> &searchResult) {
    std::vector<int>::iterator it;
    while (begin < finish) {
        std::vector<int> v(match.size() + width); // fresh buffer each row; set_intersection needs the full capacity
        it = std::set_intersection(match.begin(), match.end(), big_matrix_sorted.begin() + begin * width,
                                   big_matrix_sorted.begin() + begin * width + width, v.begin());
        v.resize(it - v.begin());
        if (v.size() == match.size())
            searchResult[begin] = begin;
        else
            searchResult[begin] = -1;
        begin++;
        /* For SearchBestMatch the last few lines change as follows:
           searchResult[begin] = (int) v.size();
           begin++; and the largest value in searchResult will be the result */
    }
}
Complexity: O(m*(l + n)); l - the length of the pattern, m is the number of rows, n is the number of cols.
Preprocessing of big_matrix (constructing the lookup table, storing a sorted version of it; you're allowed to do any pre-processing) is not taken into consideration. How can I improve the complexity of these search functions (to O(log(m*n)))?
If you want to do it faster overall, you already have the right algorithm; you may get some performance by just optimising the code (memory allocations, removing duplicate operations if the compiler didn't, etc.). For example, there may be a gain from hoisting the two big_matrix[(row * width) + i] lookups into a local variable. Be careful to profile and measure realistic cases.
For bigger gains, threads can be an option. You can process one row at a time here, so the speedup should be roughly linear with the number of cores. C++11 has std::async, which can handle some of the work of launching threads and getting results, rather than dealing with std::thread yourself or platform-specific mechanisms. There are some other newer things in later versions of C++ that may be useful as well.
void searchPatternRow(std::vector<int> const &pattern, std::vector<int> const &big_matrix, int row, int width, std::vector<int> &searchResult);

void searchPattern(std::vector<int> const &pattern, std::vector<int> const &big_matrix, int begin, int finish, int width, std::vector<int> &searchResult)
{
    std::vector<std::future<void>> futures;
    for (int row = begin; row < finish; ++row)
        futures.push_back(std::async(std::launch::async, [&, row]() { searchPatternRow(pattern, big_matrix, row, width, searchResult); }));
    for (auto &future : futures) future.wait(); // Note: waiting is also implicit when the future from async gets destructed
}
To improve threading efficiency you may want to batch the work and search, say, 10 rows per task. There are also some considerations around threads writing to the same cache line of searchResult (false sharing).
When searching for an exact match, you can do this quite efficiently by use of what I will call a "moving hash".
When you search, you calculate a hash of your search string, and at the same time you keep calculating a moving hash of the data you are searching. When comparing, you first compare the hashes, and only if they match do you go on and compare the actual data.
Now the trick is to choose a hash algorithm that can easily be updated each time you move one spot, instead of recalculating everything. An example of such a hash is the sum of all the digits.
If I have the following array: 012345678901234567890 and I want to find 34567 in this array, I could define the hash as the sum of all the digits in the search string. This would give a hash of 25 (3+4+5+6+7). I would then search through the array and keep updating a running hash on the array. The first hash in the array would be 10 (0+1+2+3+4) and the second would be 15 (1+2+3+4+5). But instead of recalculating the second hash, I can just update the previous hash by adding 5 (the new digit) and subtracting 0 (the old digit).
As updating the "running hash" is O(1), you can speed up the process considerably if you have a good hash algorithm that doesn't give many false hits. The simple sum I use here is probably too simple, but other methods also allow this kind of update, e.g. XOR.
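A minimal sketch of the idea on one row of integers, using the simple sum hash from the example (my code, for illustration; a production version would use a stronger rolling hash such as Rabin-Karp's):
#include <algorithm>
#include <vector>

// Returns the first index where pattern occurs in row, or -1 if it doesn't.
int movingHashSearch(const std::vector<int>& row, const std::vector<int>& pattern) {
    const size_t m = pattern.size();
    if (m == 0 || row.size() < m) return -1;
    long long target = 0, rolling = 0;
    for (size_t i = 0; i < m; ++i) { target += pattern[i]; rolling += row[i]; }
    for (size_t i = 0; ; ++i) {
        if (rolling == target && std::equal(pattern.begin(), pattern.end(), row.begin() + i))
            return (int)i;                  // hashes matched: verify the actual data
        if (i + m == row.size()) return -1;
        rolling += row[i + m] - row[i];     // O(1) update: add the new value, drop the old
    }
}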

Iterative Merge Sort works at the same speed as Bubblesort

I have tried to implement an iterative merge sort using nested loops. Although this algorithm does sort correctly (as in, after sorting, things are in the correct order), I know there is something wrong with this implementation: I tried to sort larger collections with it and compare timings with slower sorts, and I end up getting slow times for this iterative implementation. For example, sorting 500 items gives a time of 31 milliseconds with this implementation, just like bubble sort does.
int main()
{
    int size;
    cin >> size;
    //assume vector is already initialized with values & size
    vector<int> items(size);
    IterativeMergeSort(items, 0, size - 1);
}

void IterativeMergeSort(vector<int> &items, int start, int end)
{
    vector<int> temp(items.size());
    int left, middle, right;
    for(int outer = 1; outer < 2; outer *= 2)
    {
        for(int inner = start; inner < end; inner = inner * outer + 1)
        {
            left = outer - 1;
            middle = inner;
            right = inner + 1;
            ItMerge(items, left, middle, right, temp);
        }
    }
}
void ItMerge(vector<int> &items, int start, int mid, int end, vector<int> &temp)
{
    int first1 = start;
    int last1 = mid;
    int first2 = mid + 1;
    int last2 = end;
    int index = first1;
    while(first1 <= last1 && first2 <= last2)
    {
        if(items[first1] <= items[first2])
        {
            temp[index] = items[first1];
            first1++;
        }
        else
        {
            temp[index] = items[first2];
            first2++;
        }
        index++;
    }
    while(first1 <= last1)
    {
        temp[index] = items[first1];
        first1++;
        index++;
    }
    while(first2 <= last2)
    {
        temp[index] = items[first2];
        first2++;
        index++;
    }
    for(index = start; index <= end; index++)
    {
        items[index] = temp[index];
    }
}
Your algorithm isn't merge sort. It tries to be, but it isn't.
As I understand it, what is supposed to happen is that the inner loop steps over subsequences and merges them, while the outer loop controls the inner loop's sequence length, starting with 1 and doubling on every iteration until there are just two subsequences and they get merged.
But that's not what your algorithm is doing. The outer loop's condition is broken, so the outer loop will run exactly once. And the inner loop doesn't take roughly-equal subsequences in pairs. Instead, the right subsequence is exactly one element (mid is inner, right is inner+1) and the left subsequence is always everything used so far (left is outer-1, and outer is constant 1). So the algorithm will repeatedly merge the already-sorted left subsequence with a single-element right subsequence.
This means that in effect, your algorithm is insertion sort, except that you don't insert in place, but instead copy the sorted sequence to a buffer, inserting the new element at the right moment, then copy the result back. So it's a very inefficient insertion sort.
Below is a link to somewhat optimized examples of top down and bottom up merge sort. The bottom up merge sort is a bit faster because it skips the recursion used to repeatedly generate sub-pairs of indices until a sub-pair represents a run of size 1. Most of the time is spent merging, so bottom up isn't that much faster. The first pass of the bottom up merge sort could be optimized by swapping pairs in place rather than copying them. The bottom up merge sort ends up with the sorted data in either the temp or the original array. If the original array is wanted, then a pass count can be calculated, and if the count is odd, the first pass swaps in place.
Both versions can sort 4 million 64 bit unsigned integers in less than a second on my system (Intel Core i7 2600k 3.4ghz).
merge_sort using vectors works well with less than 9 inputs
For a vector or array of integers, a counting / radix sort would be faster still.
I've finally figured it out.
In pseudocode:
for (outer = 1; outer < length; outer *= 2)
    for (inner = 0; inner < length; inner = inner + (outer * 2))
        left = inner
        middle = (inner + outer) - 1
        right = (inner + (outer * 2)) - 1
        merge(items, left, middle, right, temp)
After rethinking how the iterative merge sort is supposed to work and looking at a couple of implementations, all I needed in the merge method was to check whether the middle and right indexes passed in were greater than or equal to the vector size (that way we handle any values that could go out of bounds), then merge as usual. Also, looking at this helped greatly with understanding it; also this. Just to be sure that it works as well as a recursive merge sort, I did timings on both, and the recursive and iterative implementations produced identical times for 500, 1000, 5000, and 10K values to sort (in some cases the iterative solution produced a faster time).
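For reference, one way to turn that pseudocode into C++, reusing the original ItMerge() and the same using namespace std context (a sketch of the fix, not necessarily the poster's final code; middle and right are clamped so the last, possibly shorter run is still merged):
void IterativeMergeSort(vector<int> &items, int start, int end)
{
    vector<int> temp(items.size());
    int length = end - start + 1;
    for (int outer = 1; outer < length; outer *= 2)
    {
        // merge runs of size `outer` in pairs; a trailing run with no partner is skipped
        for (int inner = start; inner + outer <= end; inner += outer * 2)
        {
            int left = inner;
            int middle = inner + outer - 1;
            int right = min(inner + outer * 2 - 1, end);
            ItMerge(items, left, middle, right, temp);
        }
    }
}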

Heap corruption while freeing memory in a recursive function

I'm implementing an algorithm to select the Kth smallest element of an array. So far, when I try to free heap memory, I get this error: CRT detected that the application wrote to memory after end of heap buffer.
int SEQUENTIAL_SELECT(int *S, int k, int n)
{
    if(n<=Q) // sort S and return the kth element directly
    {
        qsort(S,n,sizeof(int),compare);
        return S[k];
    }
    // subdivide S into n/Q subsequences of Q elements each
    int countSets = ceil((float)n/(float)Q);
    //sort each subsequence and determine its median
    int *medians = new int[countSets];
    for(int i=0;i<countSets;i++)
    {
        if(i==countSets-1)
        {
            int size = Q - (n%Q);
            qsort(&S[Q*i],size,sizeof(int),compare);
            medians[i] = S[i*Q+size/2];
            continue;
        }
        qsort(&S[Q*i],Q,sizeof(int),compare);
        medians[i] = S[i*Q+Q/2];
    }
    // call SEQUENTIAL_SELECT recursively to find median of medians
    int m = SEQUENTIAL_SELECT(medians,countSets/2,countSets);
    delete[] medians;

    int size = (3*n)/4;
    int* s1 = new int[size]; // contains values less than m
    int* s3 = new int[size]; // contains values greater than m
    for(int i=0;i<size;i++)
    {
        s1[i] = INT_MAX;
        s3[i] = INT_MAX;
    }
    int i1=0;
    int i2=0;
    int i3=0;
    for(int i=0;i<n;i++)
    {
        if(S[i]>m)
            s3[i3++] = S[i];
        else if(S[i]<m)
            s1[i1++] = S[i];
        else
            i2++; // count number of values equal to m
    }
    if( i1>=k )
        m = SEQUENTIAL_SELECT(s1,k,i1);
    else if( i1+i2+i3 >= k)
        m = SEQUENTIAL_SELECT(s3,k-i1-i2,i3);
    delete[] s3;
    delete[] s1;
    return m;
}
@Dcoder is certainly correct that Q - n%Q is incorrect. It should be n%Q. In addition, the computation size = (3*n)/4 is not reliable; try it with n = 6 (assuming, as seems certain, that Q is actually 5) given the vector [1, 2, 3, 4, 5, 0].
You could have avoided needing a lot of eyes on your code by simply checking the values of the indexes at every array subscript assignment (although that wouldn't have caught the assignments inside qsort, but more on that below).
It must surely have occurred to you that you are using an awful lot of memory to perform a simple operation that could in fact be done in place. Normally the reason to avoid an in-place operation would be that you need to preserve the original vector, but you're computing medians with qsort, which sorts in place, so the original vector is already modified. If that's acceptable, then there is no reason not to do the rest of the median-of-medians algorithm in place. [1]
By the way, although I'm certainly not one of those who fear floating-point computations, there is no reason at all for countSets = ceil((float)n/(float)Q). (n + Q - 1)/Q will work just fine. That idiom could usefully have been used in the computation of size as well, although I'm not at all sure where you got the 3n/4 computation from in the first place.
[Note 1] Hint: instead of grouping consecutively, divide the vector into five regions and find the median of the ith element of each region. Once you've found it, swap it with the ith element of the first region; once that is done, your first region - the first fifth of the vector - contains the medians, and you can recurse on that subvector. That means actually writing out the median code as a series of comparisons (see the sketch below), which is tedious but a lot faster than calling qsort. That also avoids the degenerate case I mentioned above, where the median-of-medians computation incorrectly returns the smallest element in the vector.
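To illustrate "median as a series of comparisons", here is a hedged sketch: an unrolled 9-comparator sorting network for five values, after which the middle value is the median (my code, not the answerer's):
#include <utility>

static inline void cswap(int& x, int& y) { if (y < x) std::swap(x, y); } // compare-exchange

int medianOf5(int v0, int v1, int v2, int v3, int v4)
{
    // a standard 9-comparator sorting network for 5 inputs
    cswap(v0, v1); cswap(v3, v4); cswap(v2, v4);
    cswap(v2, v3); cswap(v0, v3); cswap(v0, v2);
    cswap(v1, v4); cswap(v1, v3); cswap(v1, v2);
    return v2; // the five values are now sorted; the middle one is the median
}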

How to find if 3 numbers in a set of size N exactly sum up to M

I want to know how I can implement a better solution than O(N^3). It's similar to the knapsack and subset-sum problems. In my question N <= 8000, so I started computing sums of pairs of numbers and stored them in an array. Then I would binary search the sorted set for each (M - sum[i]) value, but the problem arises of how to keep track of the indices which summed up to sum[i]. I know I could declare extra space, but my sums array already has a size of 64 million, and hence I couldn't complete my O(N^2) solution. Please advise whether I can do some optimization or need a totally different technique.
You could benefit from some generic tricks to improve the performance of your algorithm.
1) Don't store what you use only once
It is a common error to store more than you really need. Whenever your memory requirements seem to blow up, the first question to ask yourself is: do I really need to store that stuff? Here it turns out that you do not (as Steve explained in the comments): compute the sum of two numbers (in a triangular fashion to avoid repeating yourself) and then check for the presence of the third one.
We drop the O(N^2) memory complexity! Now the expected memory is O(N).
2) Know your data structures, and in particular: the hash table
Perfect hash tables are rarely (if ever) implemented, but it is (in theory) possible to craft hash tables with O(1) insertion, lookup and deletion characteristics, and in practice you do approach those complexities (though it generally comes at the cost of a high constant factor that will make you prefer so-called suboptimal approaches).
Therefore, unless you need ordering (for some reason), membership is better tested through a hash table in general.
We drop the 'log N' term in the speed complexity.
With those two recommendations you easily get what you were asking for:
Build a simple hash table: the number is the key, the index is the associated satellite data
Iterate in triangle fashion over your data set: for i in [0..N-1]; for j in [i+1..N-1]
At each iteration, check whether K = M - set[i] - set[j] is in the hash table; if it is, extract k = table[K], and if k != i and k != j, store the triple (i,j,k) in your result.
If a single result is sufficient, you can stop iterating as soon as you get the first result, otherwise you just store all the triples.
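A hedged C++ sketch of that recipe (a value-to-index hash table plus a triangular scan; the names are mine):
#include <tuple>
#include <unordered_map>
#include <vector>

std::vector<std::tuple<int,int,int>> findTriples(const std::vector<int>& set, long long M) {
    std::unordered_map<long long, int> table;            // number -> one index holding it
    for (int i = 0; i < (int)set.size(); ++i)
        table[set[i]] = i;
    std::vector<std::tuple<int,int,int>> result;
    for (int i = 0; i < (int)set.size(); ++i)            // triangular iteration
        for (int j = i + 1; j < (int)set.size(); ++j) {
            auto it = table.find(M - set[i] - set[j]);
            if (it != table.end() && it->second != i && it->second != j)
                result.emplace_back(i, j, it->second);   // store the triple (i,j,k)
        }
    return result;
}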
There is a simple O(n^2) solution to this that uses only O(1)* memory if you only want to find the 3 numbers (O(n) memory if you want the indices of the numbers and the set is not already sorted).
First, sort the set.
Then for each element in the set, see if there are two (other) numbers that sum to it. This is a common interview question and can be done in O(n) on a sorted set.
The idea is that you start one pointer at the beginning and one at the end; while your current sum is not the target, decrement the end pointer if the sum is greater than the target, else increment the start pointer.
So for each of the n numbers we do an O(n) search and we get an O(n^2) algorithm.
*Note that this requires a sort that uses O(1) memory. Hell, since the sort need only be O(n^2) you could use bubble sort. Heapsort is O(n log n) and uses O(1) memory.
Create a "bitset" of all the numbers which makes it constant time to check if a number is there. That is a start.
The solution will then be at most O(N^2) to make all combinations of 2 numbers.
The only tricky bit here is when the solution contains a repeat, but it doesn't really matter: you can discard repeats unless it is the same number 3 times, because you will hit the "repeat" case when you pair up the 2 identical numbers and check whether the remaining one is present.
The same-number-3-times case is simply a matter of checking whether M is divisible by 3 and whether M/3 appears 3 times as you create the bitset.
This solution does require creating extra storage, up to MAX/8 bytes, where MAX is the highest number in your set. You could switch to a hash table if this number exceeds a certain point: still O(1) lookup.
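A hedged sketch of that idea (my code; it assumes non-negative values bounded by maxValue, and the same-number-3-times case is handled separately as described above):
#include <vector>

bool has3Sum(const std::vector<int>& nums, int M, int maxValue) {
    std::vector<bool> present(maxValue + 1, false);      // the "bitset": O(1) membership test
    for (int v : nums) present[v] = true;
    for (size_t i = 0; i < nums.size(); ++i)
        for (size_t j = i + 1; j < nums.size(); ++j) {
            long long r = (long long)M - nums[i] - nums[j];
            // discard triples that would reuse nums[i] or nums[j]; a genuine
            // (v, v, w) triple is still found when the two copies of v pair up
            if (r >= 0 && r <= maxValue && r != nums[i] && r != nums[j] && present[(size_t)r])
                return true;
        }
    return false;
}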
This appears to work for me...
#include <iostream>
#include <set>
#include <algorithm>
using namespace std;

int main(void)
{
    set<long long> keys;

    // By default this set is sorted
    set<short> N;
    N.insert(4);
    N.insert(8);
    N.insert(19);
    N.insert(5);
    N.insert(12);
    N.insert(35);
    N.insert(6);
    N.insert(1);

    typedef set<short>::iterator iterator;

    const short M = 18;
    for(iterator i(N.begin()); i != N.end() && *i < M; ++i)
    {
        short d1 = M - *i; // subtract the value at this location
        // if there is more to "consume"
        if (d1 > 0)
        {
            // ignore below i as we will have already scanned it...
            for(iterator j(i); j != N.end() && *j < M; ++j)
            {
                short d2 = d1 - *j; // again "consume" as much as we can
                // now the remainder must exist in our set N
                if (N.find(d2) != N.end())
                {
                    // means that the three numbers we've found, *i (from the first loop), *j (from the second loop) and d2, exist in our set of N
                    // now to generate the unique combination, we need to generate some form of key for our keys set
                    // here we take advantage of the fact that all the numbers fit into a short: we can construct such a key with a long long (8 bytes)
                    // the 8 byte key is made up of 2 bytes for i, 2 bytes for j and 2 bytes for d2
                    // and is formed in sorted order
                    long long key = *i; // first index is easy
                    // second index slightly trickier, if it's less than j, then this short must be "after" i
                    if (*i < *j)
                        key = (key << 16) | *j;
                    else
                        key |= (static_cast<int>(*j) << 16); // else it's before i
                    // now the key is either: i | j, or j | i (where i & j are two bytes each, and the key is currently 4 bytes)
                    // third index is a bugger, we have to scan the key in two byte chunks to insert our third short
                    if ((key & 0xFFFF) < d2)
                        key = (key << 16) | d2; // simple, it's the largest of the three
                    else if (((key >> 16) & 0xFFFF) < d2)
                        key = (((key << 16) | (key & 0xFFFF)) & 0xFFFF0000FFFFLL) | (d2 << 16); // it's less than j but greater than i
                    else
                        key |= (static_cast<long long>(d2) << 32); // it's less than i
                    // Now if this unique key already exists in the hash, this won't insert an entry for it
                    keys.insert(key);
                }
                // else don't care...
            }
        }
    }

    // tells us how many unique combinations there are
    cout << "size: " << keys.size() << endl;

    // prints out the 6 bytes for representing the three numbers
    for(set<long long>::iterator it (keys.begin()), end(keys.end()); it != end; ++it)
        cout << hex << *it << endl;

    return 0;
}
Okay, here is attempt two: this generates the output:
start: 19
size: 4
10005000c
400060008
500050008
600060006
As you can see from there, the first "key" is the three shorts (in hex), 0x0001, 0x0005, 0x000C (which is 1, 5, 12 = 18), etc.
Okay, cleaned up the code some more, and realised that the reverse iteration is pointless..
My Big O notation is not the best (I never studied computer science), however I think the above is something like O(N) for the outer loop and O(N log N) for the inner; the reason for the log N is that std::set::find() is logarithmic - however, if you replace it with a hashed set, the inner loop could be as good as O(N) - please someone correct me if this is crap...
I combined the suggestions by @Matthieu M. and @Chris Hopman, and (after much trial and error) I came up with this algorithm that should be O(n log n + log (n-k)! + k) in time and O(log(n-k)) in space (the stack). That should be O(n log n) overall. It's in Python, but it doesn't use any Python-specific features.
import bisect

def binsearch(r, q, i, j):  # O(log (j-i))
    return bisect.bisect_left(q, r, i, j)

def binfind(q, m, i, j):
    while i + 1 < j:
        r = m - (q[i] + q[j])
        if r < q[i]:
            j -= 1
        elif r > q[j]:
            i += 1
        else:
            k = binsearch(r, q, i + 1, j - 1)  # O(log (j-i))
            if not (i < k < j):
                return None
            elif q[k] == r:
                return (i, k, j)
            else:
                return (
                    binfind(q, m, i + 1, j)
                    or
                    binfind(q, m, i, j - 1)
                )

def find_sumof3(q, m):
    return binfind(sorted(q), m, 0, len(q) - 1)
Not trying to boast about my programming skills or add redundant stuff here.
Just wanted to provide beginners with an implementation in C++.
Implementation based on the pseudocode provided by Charles Ma at Given an array of numbers, find out if 3 of them add up to 0.
I hope the comments help.
#include <iostream>
#include <vector>
using namespace std;

void merge(int originalArray[], int low, int high, int sizeOfOriginalArray){
    // Step 4: Merge sorted halves into an auxiliary array
    vector<int> aux(sizeOfOriginalArray); // a VLA (int aux[n]) is not standard C++
    int auxArrayIndex, left, right, mid;

    auxArrayIndex = low;
    mid = (low + high)/2;
    right = mid + 1;
    left = low;

    // choose the smaller of the two values "pointed to" by left, right
    // copy that value into auxArray[auxArrayIndex]
    // increment either left or right as appropriate
    // increment auxArrayIndex
    while ((left <= mid) && (right <= high)) {
        if (originalArray[left] <= originalArray[right]) {
            aux[auxArrayIndex] = originalArray[left];
            left++;
            auxArrayIndex++;
        } else {
            aux[auxArrayIndex] = originalArray[right];
            right++;
            auxArrayIndex++;
        }
    }

    // here when one of the two sorted halves has "run out" of values, but
    // there are still some in the other half; copy all the remaining values
    // to auxArray
    // Note: only 1 of the next 2 loops will actually execute
    while (left <= mid) {
        aux[auxArrayIndex] = originalArray[left];
        left++;
        auxArrayIndex++;
    }
    while (right <= high) {
        aux[auxArrayIndex] = originalArray[right];
        right++;
        auxArrayIndex++;
    }

    // all values are in auxArray; copy them back into originalArray
    int index = low;
    while (index <= high) {
        originalArray[index] = aux[index];
        index++;
    }
}

void mergeSortArray(int originalArray[], int low, int high){
    int sizeOfOriginalArray = high + 1;
    // base case
    if (low >= high) {
        return;
    }
    // Step 1: Find the middle of the array (conceptually, divide it in half)
    int mid = (low + high)/2;
    // Steps 2 and 3: Recursively sort the 2 halves of originalArray and then merge those
    mergeSortArray(originalArray, low, mid);
    mergeSortArray(originalArray, mid + 1, high);
    merge(originalArray, low, high, sizeOfOriginalArray);
}
//O(n^2) solution without hash tables
//Basically, using a sorted array, for each number in the array you use two pointers, one
//starting just after the number and one starting from the end of the array; check whether
//the sum of the three elements pointed to (including the current number) is >, < or == to
//the targetSum, and advance the pointers accordingly, or return true if the targetSum is found.
bool is3SumPossible(int originalArray[], int targetSum, int sizeOfOriginalArray){
    int high = sizeOfOriginalArray - 1;
    mergeSortArray(originalArray, 0, high);

    int temp;
    for (int k = 0; k < sizeOfOriginalArray; k++) {
        for (int i = k + 1, j = sizeOfOriginalArray - 1; i < j; ) { // i = k+1 and i < j: three distinct elements
            temp = originalArray[k] + originalArray[i] + originalArray[j];
            if (temp == targetSum) {
                return true;
            } else if (temp < targetSum) {
                i++;
            } else {
                j--;
            }
        }
    }
    return false;
}
int main()
{
    int arr[] = {2, -5, 10, 9, 8, 7, 3};
    int size = sizeof(arr)/sizeof(int);
    int targetSum = 5;

    //3Sum possible?
    bool ans = is3SumPossible(arr, targetSum, size); // size is passed as a parameter because the array
                                                     // decays to a pointer, so it is cumbersome to
                                                     // calculate the size inside is3SumPossible()
    if (ans) {
        cout << "Possible";
    } else {
        cout << "Not possible";
    }
    return 0;
}