Efficient algorithm to produce closest triplet from 3 arrays? - c++

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double); however, it runs excruciatingly slowly, even when compiled using GCC with -O3 optimization and other combinations of optimizations. The dataset (and, therefore, each array) potentially has tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>
using namespace std;

void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
    //vectors are taken by value because elements are erased as triplets are produced
    while(!vec1.empty() && !vec2.empty() && !vec3.empty()){
        //calculate size of each array
        size_t len1 = vec1.size();
        size_t len2 = vec2.size();
        size_t len3 = vec3.size();
        size_t i = 0, j = 0, k = 0, res_i = 0, res_j = 0, res_k = 0;
        double diff = numeric_limits<double>::infinity(); //values are doubles, not ints
        while(i < len1 && j < len2 && k < len3){
            double minimum = min({vec1[i], vec2[j], vec3[k]});
            double maximum = max({vec1[i], vec2[j], vec3[k]});
            //if new difference less than previous difference, update difference, store
            //resultants
            if(maximum - minimum < diff){ diff = maximum - minimum; res_i = i; res_j = j; res_k = k; }
            //increment minimum value
            if(vec1[i] == minimum) ++i;
            else if(vec2[j] == minimum) ++j;
            else ++k;
        }
        //(vec1[res_i], vec2[res_j], vec3[res_k]) is the next output triplet
        //"remove" triplet; each erase shifts the rest of the vector, costing O(n)
        vec1.erase(vec1.begin() + res_i);
        vec2.erase(vec2.begin() + res_j);
        vec3.erase(vec3.begin() + res_k);
    }
}

OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue for possible triples, ordered by max - min, then by how close the median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element into the priority queue.
While the priority queue is not empty:
    Pull a triple out
    If none of the three elements of the triple is used:
        Add the triple to the output
        Mark its three elements used
    else:
        If you can construct reasonable triples for the unused elements:
            Add them to the queue
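A minimal sketch of the queue ordering from the first step, assuming the values are doubles as in the question. Triple, TripleKey, key_of and WorseTriple are hypothetical names; generating the "reasonable" candidate triples and the used/unused bookkeeping are not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical triple type: one value (plus its index) taken from each of the three arrays.
struct Triple {
    double a, b, c;
    std::size_t ia, ib, ic;
};

// Primary key: max - min. Secondary key: |median - average|.
struct TripleKey {
    double spread;
    double median_gap;
};

TripleKey key_of(const Triple& t) {
    double lo = std::min({t.a, t.b, t.c});
    double hi = std::max({t.a, t.b, t.c});
    double median = t.a + t.b + t.c - lo - hi;  // the value that is neither min nor max
    double average = (t.a + t.b + t.c) / 3.0;
    return {hi - lo, std::fabs(median - average)};
}

// std::priority_queue keeps the "greatest" element on top, so report x as less than y
// when x is the WORSE candidate; the best triple then ends up on top.
struct WorseTriple {
    bool operator()(const Triple& x, const Triple& y) const {
        TripleKey kx = key_of(x), ky = key_of(y);
        if (kx.spread != ky.spread) return kx.spread > ky.spread;
        return kx.median_gap > ky.median_gap;
    }
};

using TripleQueue = std::priority_queue<Triple, std::vector<Triple>, WorseTriple>;
```

With this comparator, two triples with equal spread are popped in order of how central their median is, so {4, 5, 6} comes out before {3, 3, 5}.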
Now, for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that at first is easy: just keep an array of bools where you mark off the indexes of the used values. But once a lot have been taken off, your search gets long.
The trick is to have a vector of bools for individual elements, a second for whether both elements of each pair have been used, a third for whether all 4 in each quadruple have been used, and so on. When you use an element, mark its individual bool, then go up the hierarchy, marking off the next level if the block you're paired with is also marked off, else stopping. This additional data structure of size 2n requires marking an average of 2 bools per element used, but it allows you to find the next unused index in either direction in at most O(log(n)) steps.
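A minimal sketch of that hierarchy, assuming the element count is padded up to a power of two (padding slots are pre-marked as used, so the structure can take up to about 4n bools rather than exactly 2n). UsedTracker is a hypothetical name, and only the rightward search is shown; a leftward search mirrors it.

```cpp
#include <cstddef>
#include <vector>

// levels_[0][i]: element i is used; levels_[l][i]: all 2^l elements in block i are used.
class UsedTracker {
public:
    explicit UsedTracker(std::size_t n) : n_(n) {
        std::size_t size = 1;
        while (size < n) size *= 2;                           // pad to a power of two
        for (std::size_t s = size; ; s /= 2) {
            levels_.emplace_back(s, false);
            if (s == 1) break;
        }
        for (std::size_t i = n; i < size; ++i) mark_used(i);  // padding never counts as free
    }

    // Mark element i used, then propagate upward while the sibling block is also full.
    void mark_used(std::size_t i) {
        levels_[0][i] = true;
        for (std::size_t l = 0; l + 1 < levels_.size(); ++l) {
            if (!levels_[l][i ^ 1]) break;                    // sibling not full: stop climbing
            i /= 2;
            levels_[l + 1][i] = true;
        }
    }

    // Smallest unused index >= pos, or n if there is none; O(log n) steps.
    std::size_t next_unused(std::size_t pos) const {
        if (pos >= n_) return n_;
        if (!levels_[0][pos]) return pos;
        std::size_t l = 0, i = pos;
        // Climb until a block to the right of pos is not full.
        for (;;) {
            if (l + 1 == levels_.size()) return n_;           // reached the root: all full
            if (i % 2 == 0 && !levels_[l][i + 1]) { ++i; break; }
            i /= 2;
            ++l;
        }
        // Descend, always taking the leftmost child that is not full.
        while (l > 0) {
            --l;
            i *= 2;
            if (levels_[l][i]) ++i;
        }
        return i;
    }

private:
    std::size_t n_;
    std::vector<std::vector<bool>> levels_;
};
```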
The resulting algorithm will be O(n log(n)).

Related

Sort array of n elements which has k sorted sections

What is the best way to sort a section-wise sorted array as depicted in the second image?
The task is to perform a quicksort using the Message Passing Interface (MPI). The approach is to run quicksort on array sections obtained with MPI_Scatter() and then join the sorted pieces with MPI_Gather().
The problem is that the array as a whole is unsorted, but sections of it are sorted.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays are already within a single array, other sorting algorithms may prove better.
The inputs for a sort function would be the array, its length and the number of equally sized sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter value is greater than 0, a multiple of sections and smaller than 2^32.
This is what I am currently using:
int* merge(int* input, int length, int sections)
{
    int* sub_sections_indices = new int[sections](); // read position in each section, zero-initialised
    int* result = new int[length];
    int section_size = length / sections;
    for (int i = 0; i < length; i++) //merging
    {
        int min_index = -1;
        int min_value = 0; // overwritten by the first candidate found
        for (int j = 0; j < sections; j++)
        {
            if (sub_sections_indices[j] < section_size)
            {
                int current_index = j * section_size + sub_sections_indices[j];
                // min_index == -1 also accepts a first candidate equal to INT_MAX
                if (min_index == -1 || input[current_index] < min_value)
                {
                    min_value = input[current_index];
                    min_index = j;
                }
            }
        }
        sub_sections_indices[min_index]++;
        result[i] = min_value;
    }
    delete[] sub_sections_indices; // free the per-section cursors
    return result; // caller must delete[] result
}
Optimizing for performance
I think this answer, which maintains a min-heap of the smallest item of each sub-array, is the best way to handle arbitrary input. However, for small values of k (think somewhere between 10 and 100), it might be faster to implement the more naive solutions given in the question you linked to: while maintaining the min-heap costs only O(log k) per step, it might have a higher overhead for small values of k than the simple linear scan of the naive solutions.
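A minimal sketch of the min-heap approach applied to the question's layout, assuming the same parameters as the question's merge (a flat array, its length, and the number of equally sized sorted sections, at least 1). merge_sections is a hypothetical name, and it returns a std::vector instead of a raw pointer.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Min-heap k-way merge for an array made of `sections` equally sized sorted runs.
// The heap holds (value, section) for the current head of every run: O(length * log(sections)).
std::vector<int> merge_sections(const int* input, int length, int sections)
{
    const int section_size = length / sections;        // assumes sections >= 1
    using Head = std::pair<int, int>;                    // (value, section index)
    std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap; // smallest on top
    std::vector<int> offset(sections, 0);                // offset of each run's current head

    if (section_size > 0)
        for (int s = 0; s < sections; ++s)
            heap.push({input[s * section_size], s});

    std::vector<int> result;
    result.reserve(length);
    while (!heap.empty()) {
        Head top = heap.top();
        heap.pop();
        result.push_back(top.first);
        int s = top.second;
        if (++offset[s] < section_size)                  // run s still has elements: push its next head
            heap.push({input[s * section_size + offset[s]], s});
    }
    return result;
}
```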
All these solutions create a copy of the input, and they maintain O(k) state.
Optimizing for space
The only way to save space I see is to sort in-place. This will be a problem for the algorithms mentioned above. An in-place algorithm will have to swap elements, but any swap will likely destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped to, which will result in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.

Efficiently find an integer not in a set of size 40, 400, or 4000

Related to the classic problem "find an integer not among four billion given ones", but not exactly the same.
To clarify, by integers I really mean only a subset of the mathematical definition. That is, assume there is only a finite number of integers. Say, in C++, they are the int values in the range [INT_MIN, INT_MAX].
Now, given a std::vector<int> (no duplicates) or a std::unordered_set<int> whose size can be 40, 400, 4000 or so, but not too large, how can one efficiently generate a number that is guaranteed not to be among the given ones?
If there were no worry about overflow, I could multiply all the nonzero ones together and add 1 to the product. But there is. The adversarial test cases could deliberately contain INT_MAX.
I am more in favor of simple, non-random approaches. Is there any?
Thank you!
Update: to clear up ambiguity, let's say an unsorted std::vector<int> which is guaranteed to have no duplicates. So I am asking if there is anything better than O(n log(n)). Also please note that test cases may contain both INT_MIN and INT_MAX.
You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.
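#include <cassert>
#include <vector>

template <typename Container> // e.g. std::vector<int> or std::unordered_set<int>
int find_not_contained(Container const& data)
{
    const int N = static_cast<int>(data.size());
    std::vector<char> known(N + 1, 0); // one more candidate than data
    for (int value : data)
        if (value >= 0 && value <= N)
            known[value] = 1;
    for (int i = 0; i <= N; ++i)
        if (!known[i])
            return i;
    assert(false); // should never be reached: N values cannot cover N+1 candidates
    return -1;
}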
Random methods can be more space efficient, but may require more passes over the data in the worst case.
Random methods are indeed very efficient here.
If we want to use a deterministic method and can assume that the size n is not too large (4000, for example), then we can create a vector x of size m = n + 1 (or a little larger, 4096 for example, to simplify the calculation), initialised with 0.
For each index i of the array, we set x[array[i] modulo m] = 1.
Then a simple O(n) search in x finds an index r with x[r] == 0; no element of the array is congruent to r modulo m, so r itself is not in the array.
Note: the modulo operation is not exactly the "%" operation: in C++, % yields a negative result for a negative array[i], so a non-negative modulo is needed.
Edit: I mentioned that calculations are made easier by selecting a size of 4096. To be more concrete, with a power-of-two size the modulo operation can be performed with a simple & operation (array[i] & 4095).
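A minimal sketch of the 4096-slot variant, assuming the input has fewer than 4096 elements; find_unused_mod4096 is a hypothetical name. Converting to unsigned first makes the & mask a well-defined non-negative modulo for negative inputs as well.

```cpp
#include <vector>

// Assumes data.size() < 4096. Marks residues modulo 4096; any unmarked residue r
// is not the residue of any element, so r itself is not in data.
int find_unused_mod4096(const std::vector<int>& data)
{
    const unsigned m = 4096;                      // power of two, so "mod m" is a bit mask
    std::vector<char> seen(m, 0);
    for (int v : data)
        // int -> unsigned conversion is well defined (mod 2^N) and preserves v's residue
        // mod 4096, so & (m - 1) is a true non-negative modulo even for negative v
        seen[static_cast<unsigned>(v) & (m - 1)] = 1;
    for (unsigned r = 0; r < m; ++r)
        if (!seen[r])
            return static_cast<int>(r);
    return -1;                                    // unreachable when data.size() < 4096
}
```

For example, 0, 4096, -4096 and INT_MIN all land in slot 0, and INT_MAX lands in slot 4095.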
You can find the smallest unused integer in O(N) time using O(1) auxiliary space if you are allowed to reorder the input vector, using the following algorithm. [Note 1] (The algorithm also works if the vector contains repeated data.)
size_t smallest_unused(std::vector<unsigned>& data) {
    size_t N = data.size(), scan = 0;
    while (scan < N) {
        auto other = data[scan];
        if (other < scan && data[other] != other) {
            data[scan] = data[other];
            data[other] = other;
        }
        else
            ++scan;
    }
    for (scan = 0; scan < N && data[scan] == scan; ++scan) { }
    return scan;
}
The first pass guarantees that if some k in the range [0, N) was found after position k, then it is now present at position k. This rearrangement is done by swapping in order to avoid losing data. Once that scan is complete, the first entry whose value is not the same as its index is not referenced anywhere in the array.
That assertion may not be 100% obvious, since an entry could be referenced from an earlier index. However, in that case the entry could not be the first entry unequal to its index, since the earlier entry would meet that criterion.
To see that this algorithm is O(N), observe that the swap (the two assignments inside the if) can only happen if the target entry is not equal to its index, and that after the swap the target entry is equal to its index and is never changed again. So at most N swaps can be performed, and the if condition will be true at most N times. On the other hand, if the if condition is false, scan is incremented, which can also only happen N times. So the if condition is evaluated at most 2N times (which is O(N)).
Notes:
I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX + 1, INT_MAX - INT_MIN]. (The arithmetic is mathematical, not according to C semantics, which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.
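A minimal sketch of the signed adjustment, assuming the goal is the smallest non-negative int that is missing. Converting an int to unsigned keeps non-negative values unchanged and sends negative ones to 2^31 and above (the two's-complement mapping described in the note); because the answer is at most N, those large values can never be it. smallest_unused_nonnegative is a hypothetical wrapper, and smallest_unused is repeated so the block compiles on its own.

```cpp
#include <cstddef>
#include <vector>

// Same reordering algorithm as in the answer above.
std::size_t smallest_unused(std::vector<unsigned>& data) {
    std::size_t N = data.size(), scan = 0;
    while (scan < N) {
        auto other = data[scan];
        if (other < scan && data[other] != other) {
            data[scan] = data[other];   // swap so that value `other` sits at index `other`
            data[other] = other;
        } else {
            ++scan;
        }
    }
    for (scan = 0; scan < N && data[scan] == scan; ++scan) { }
    return scan;
}

// Hypothetical wrapper for signed input: works on a copy because the algorithm reorders it.
int smallest_unused_nonnegative(const std::vector<int>& data) {
    std::vector<unsigned> copy(data.begin(), data.end()); // negatives become >= 2^31
    return static_cast<int>(smallest_unused(copy));       // result <= data.size()
}
```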
Make a random x (in INT_MIN..INT_MAX) and test it against all elements. On failure (a very rare case for 40/400/4000 elements), test x + 1, wrapping from INT_MAX to INT_MIN instead of overflowing, and so on.
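A minimal sketch of the random approach, assuming std::mt19937 seeded from std::random_device is random enough here; random_not_contained is a hypothetical name. Each probe is a linear scan, matching "test it against all".

```cpp
#include <algorithm>
#include <climits>
#include <random>
#include <vector>

// Draw x uniformly from [INT_MIN, INT_MAX], test it against every element, and on a hit
// move to the next integer, wrapping from INT_MAX to INT_MIN (signed overflow is undefined).
int random_not_contained(const std::vector<int>& data)
{
    static std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(INT_MIN, INT_MAX);
    int x = dist(gen);
    // Terminates because data has far fewer than 2^32 elements.
    while (std::find(data.begin(), data.end(), x) != data.end())
        x = (x == INT_MAX) ? INT_MIN : x + 1;
    return x;
}
```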
Step 1: Sort the vector.
That can be done in O(n log(n)); you can find a few different algorithms online, so use the one you like the most.
Step 2: Find the first int not in the vector.
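Iterate from INT_MIN towards INT_MIN + 40/400/4000, checking whether the sorted vector contains the current int:
In C++:
#include <climits>
#include <vector>
// array is the sorted, duplicate-free vector from step 1
int first_missing(const std::vector<int>& array) {
    for (std::size_t i = 0; i < array.size(); i++)
        if (array[i] != INT_MIN + static_cast<int>(i))
            return INT_MIN + static_cast<int>(i);
    return INT_MIN + static_cast<int>(array.size()); // INT_MIN..INT_MIN+size-1 all present
}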
The solution is O(n log(n) + n), which is O(n log(n)).
Edit: just read your edit asking for something better than O(n log(n)), sorry.
For the case in which the integers are provided in a std::unordered_set<int> (as opposed to a std::vector<int>), you could simply traverse the range of integer values until you come across an integer value that is not present in the unordered_set<int>. Searching for the presence of an integer in a std::unordered_set<int> is straightforward, since std::unordered_set provides the find() member function.
The space complexity of this approach would be O(1).
If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:
#include <limits>
#include <unordered_set>
int find_lowest_not_contained(const std::unordered_set<int>& set) {
    for (auto i = std::numeric_limits<int>::min(); ; ++i) {
        auto it = set.find(i); // search in set
        if (it == set.end())   // integer not in set?
            return i;          // return the candidate itself; *it would dereference end()
    }
}
Analogously, if you start traversing at the greatest possible value for an int (i.e., std::numeric_limits<int>::max()), you will obtain the greatest int not contained in the std::unordered_set<int>:
int find_greatest_not_contained(const std::unordered_set<int>& set) {
    for (auto i = std::numeric_limits<int>::max(); ; --i) {
        auto it = set.find(i); // search in set
        if (it == set.end())   // integer not in set?
            return i;          // return the candidate itself; *it would dereference end()
    }
}
Assuming that the ints are uniformly mapped by the hash function into the unordered_set<int>'s buckets, a search operation on the unordered_set<int> takes constant time on average. The run-time complexity would then be O(M), where M is the number of candidate values examined before a non-contained value is found. M is at most the size of the unordered_set<int> plus one (i.e., in your case M <= 4001).
Indeed, with this approach, scanning any integer range whose size is greater than the size of the unordered_set is guaranteed to come across an integer value that is not present in the unordered_set<int>.

Merging K Sorted Arrays/Vectors Complexity

While looking into the problem of merging k sorted contiguous arrays/vectors and how it differs in implementation from merging k sorted linked lists I found two relatively easy naive solutions for merging k contiguous arrays and a nice optimized method based off of pairwise-merging that simulates how mergeSort() works. The two naive solutions I implemented seem to have the same complexity, but in a big randomized test I ran it seems one is way more inefficient than the other.
Naive merging
My naive merging method works as follows. We create an output vector<int> and set it to the first of k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes in two vectors and returns one is asymptotically linear in both space and time to the number of elements in both vectors the total complexity will be O(n + 2n + 3n + ... + kn) where n is the average number of elements in each list. Since we're adding 1n + 2n + 3n + ... + kn I believe the total complexity is O(n*k^2). Consider the following code:
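// mergeLists is not shown in the question; assumed here to be a std::merge (<algorithm>) two-way merge
vector<int> mergeLists(const vector<int>& a, const vector<int>& b) {
    vector<int> out(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), out.begin());
    return out;
}
vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
    if (multiList.empty()) return {};
    vector<int> finalList = multiList[0];
    for (size_t j = 1; j < multiList.size(); ++j) {
        finalList = mergeLists(multiList[j], finalList);
    }
    return finalList;
}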
Naive selection
My second naive solution works as follows:
/**
* The logic behind this algorithm is fairly simple and inefficient.
* Basically we want to start with the first values of each of the k
* vectors, pick the smallest value and push it to our finalList vector.
* We then need to be looking at the next value of the vector we took the
* value from so we don't keep taking the same value. A vector of vector
* iterators is used to hold our position in each vector. While all iterators
* are not at the .end() of their corresponding vector, we maintain a minValue
* variable initialized to INT_MAX, and a minValueIndex variable and iterate over
* each of the k vector iterators and if the current iterator is not an end position
* we check to see if it is smaller than our minValue. If it is, we update our minValue
* and set our minValue index (this is so we later know which iterator to increment after
* we iterate through all of them). We do a check after our iteration to see if minValue
* still equals INT_MAX. If it has, all iterators are at the .end() position, and we have
* exhausted every vector and can stop iterating over all k of them. Regarding the complexity
* of this method, we are iterating over `k` vectors so long as at least one value has not been
* accounted for. Since there are `nk` values where `n` is the average number of elements in each
* list, the time complexity = O(nk^2) like our other naive method.
*/
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
    vector<int> finalList;
    vector<vector<int>::const_iterator> iterators(multiList.size());
    // Set all iterators to the beginning of their corresponding vectors in multiList
    for (size_t i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();
    int minValue, minValueIndex = 0;
    while (1) {
        minValue = INT_MAX;
        for (size_t i = 0; i < iterators.size(); ++i){
            if (iterators[i] == multiList[i].end()) continue;
            if (*iterators[i] < minValue) {
                minValue = *iterators[i];
                minValueIndex = i;
            }
        }
        // check for exhaustion BEFORE advancing: incrementing an end() iterator is undefined behaviour
        // (as in the design above, an element equal to INT_MAX is treated as "exhausted")
        if (minValue == INT_MAX) break;
        iterators[minValueIndex]++;
        finalList.push_back(minValue);
    }
    return finalList;
}
Random simulation
Long story short, I built a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts with 2 vectors each of size 2, and ends up with 600 vectors each of size 600. Each vector is sorted, and the sizes of the larger container and each child vector increase by two elements every iteration. I time how long it takes for each algorithm to perform like this:
clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();
clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
I then built the following plot:
My calculations say the two naive solutions (merging and selecting) both have the same time complexity but the above plot shows them as very different. At first I rationalized this by saying there may be more overhead in one vs the other, but then realized that the overhead should be a constant factor and not produce a plot like the following. What is the explanation for this? I assume my complexity analysis is wrong?
Even if two algorithms have the same complexity (O(nk^2) in your case) they may end up having enormously different running times depending upon your size of input and the 'constant' factors involved.
For example, if an algorithm runs in n/1000 time and another algorithm runs in 1000n time, they both have the same asymptotic complexity but they shall have very different running times for 'reasonable' choices of n.
Moreover, effects such as caching and compiler optimizations can change the running time significantly.
In your case, although your complexity calculations seem to be correct, the actual running time of the first method is about (nk^2 + nk)/2, whereas that of the second is about nk^2. Notice that the division by 2 may be significant, because as k increases the nk term becomes negligible.
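A short worked version of those counts, assuming every vector holds exactly n elements and that merging or scanning costs one unit per element touched (the j-th merge produces a list of jn elements):

```latex
\begin{aligned}
T_{\text{merge}}  &= \sum_{j=2}^{k} j\,n = n\left(\frac{k(k+1)}{2} - 1\right) \approx \frac{nk^2 + nk}{2} \\
T_{\text{select}} &= \underbrace{nk}_{\text{output elements}} \cdot \underbrace{k}_{\text{heads scanned}} = nk^2 \\
\frac{T_{\text{select}}}{T_{\text{merge}}} &\approx \frac{2k}{k+1} \xrightarrow{\;k \to \infty\;} 2
\end{aligned}
```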
For a third algorithm, you can modify the naive selection by maintaining a heap of k elements containing the current first elements of all the k vectors. Then each selection step takes O(log k) time, and the complexity drops to O(nk log k).
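A minimal sketch of that heap-based selection, using the question's vector<vector<int>> input; mergeWithHeap is a hypothetical name, and std::priority_queue with std::greater serves as the min-heap.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// The heap holds (value, list index, position) for the current head of every non-empty
// list, so each output element costs O(log k) instead of a scan over all k heads.
std::vector<int> mergeWithHeap(const std::vector<std::vector<int>>& multiList) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap; // min-heap
    std::size_t total = 0;
    for (std::size_t i = 0; i < multiList.size(); ++i) {
        total += multiList[i].size();
        if (!multiList[i].empty()) heap.emplace(multiList[i][0], i, std::size_t{0});
    }
    std::vector<int> finalList;
    finalList.reserve(total);
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        std::size_t list = std::get<1>(top);
        std::size_t next = std::get<2>(top) + 1;
        finalList.push_back(std::get<0>(top));
        if (next < multiList[list].size())        // refill from the list we just consumed
            heap.emplace(multiList[list][next], list, next);
    }
    return finalList;
}
```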

Whats the efficient way to sum up the elements of an array in following way?

Suppose you are given an n-sized array A and an integer k.
Now you have to evaluate this function:
long long sum(int k)
{
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += min(A[i], k);
    }
    return sum;
}
What is the most efficient way to compute sum?
EDIT: If I am given m (<= 100000) queries, each with a different k, this becomes very time-consuming.
If the array itself changes with each k, then you can't do better than O(n) per query. Your only options for optimizing are to use multiple threads (each thread sums one region of the array), or at least to ensure that your loop is properly vectorized by the compiler (or to write a vectorized version manually using intrinsics).
But if the array is fixed and only k changes, then each query can be answered in O(log n) using the following optimization.
Preprocess array. This is done only once for all ks:
Sort elements
Make another array of the same length which contains partial sums
For example:
inputArray: 5 1 3 8 7
sortedArray: 1 3 5 7 8
partialSums: 1 4 9 16 24
Now, when new k is given, you need to perform following steps:
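Binary search for the given k in sortedArray; this returns the index i of the maximal element <= k
The result is partialSums[i] + (partialSums.length - i - 1) * k (for example, k = 6 gives i = 2 and 9 + 2 * 6 = 21); if no element is <= k, the result is simply partialSums.length * k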
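A minimal sketch of the preprocessing and query steps, assuming the array fits in memory as a std::vector<int>; MinSumQuery is a hypothetical name. std::upper_bound directly yields the count of elements <= k, which sidesteps the "no element <= k" special case.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// After one O(n log n) preprocessing step, each query sum(k) = sum of min(A[i], k)
// costs one binary search.
struct MinSumQuery {
    std::vector<int> sorted;
    std::vector<long long> prefix; // prefix[i] = sum of the i smallest elements, prefix[0] = 0

    explicit MinSumQuery(std::vector<int> a) : sorted(std::move(a)) {
        std::sort(sorted.begin(), sorted.end());
        prefix.assign(sorted.size() + 1, 0);
        for (std::size_t i = 0; i < sorted.size(); ++i)
            prefix[i + 1] = prefix[i] + sorted[i];
    }

    long long sum(int k) const {
        // count = number of elements <= k; each contributes its own value
        std::size_t count = static_cast<std::size_t>(
            std::upper_bound(sorted.begin(), sorted.end(), k) - sorted.begin());
        // every remaining element is > k and contributes exactly k
        return prefix[count] + static_cast<long long>(sorted.size() - count) * k;
    }
};
```

With the example array 5 1 3 8 7, sum(6) evaluates to 9 + 2 * 6 = 21.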
You can do way better than that if you can sort the array A[i] and have a secondary array prepared once.
The idea is:
Count how many items are not less than k: each of them contributes exactly k, so their share of the sum is count*k
Prepare a helper array that gives you the sum of the items below k directly
Preparation
Step 1: sort the array
std::sort(begin(A), end(A));
Step 2: prepare an helper array
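std::vector<long long> p_sums(A.size() + 1, 0); // p_sums[i] = sum of the i smallest items
std::copy(begin(A), end(A), begin(p_sums) + 1);
std::partial_sum(begin(p_sums), end(p_sums), begin(p_sums)); // accumulate in long long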
Query
long long query(int k) {
    // first skip all items whose value is strictly below k; they contribute their own value
    auto it = std::lower_bound(begin(A), end(A), k);
    // compute the distance (number of items skipped)
    auto index = std::distance(begin(A), it);
    // skipped items sum to p_sums[index]; each remaining item contributes k
    long long result = p_sums[index] + static_cast<long long>(A.size() - index) * k;
    return result;
}
The complexity of the query is: O(log(N)) where N is the length of the array A.
The complexity of the preparation is: O(N*log(N)). We could go down to O(N) with a radix sort but I don't think it is useful in your case.
References
std::sort()
std::partial_sum()
std::lower_bound()
What you do seems absolutely fine, unless this is really time-critical (that is, customers complain that your app is too slow, you measured it, and this function is the problem), in which case you can try some non-portable vector instructions, for example.
Often you can do things more efficiently by looking at them from a higher level. For example, if I write
for (int i = 0; i < 1000000; ++i)
    printf("%lld\n", sum(100));
then this will take an awfully long time (half a trillion additions) when it can be done a lot quicker. The same applies if you change one element of the array A at a time and recalculate sum each time.
Suppose x elements of array A are no larger than k, and set B contains the elements of A that are larger than k (so B has n - x elements).
Then the result of function sum(k) equals
sum_a + k * (n - x),
where sum_a is the sum of the x elements no larger than k; each element of B contributes exactly k.
You can first sort the array A (indexed 1..n here) and calculate the array pre_A, where
pre_A[i] = pre_A[i - 1] + A[i] for i > 0, and pre_A[0] = 0.
Then, for each query k, use binary search on A to find the largest element u which is no larger than k. Let index_u be the index of u (0 if there is no such element); then sum(k) equals
pre_A[index_u] + k * (n - index_u).
The time complexity of each query is O(log(n)).
If array A may change dynamically, you can use a balanced BST to handle it.

Long array performance issue

I have an array of char pointers of length 175,000. Each pointer points to a c-string array of length 100; each character is either '1' or '0'. I need to compare the difference between the strings.
char* arr[175000];
So far, I have two nested for loops where I compare every string with every other string. The comparison function takes two c-strings and returns an integer, the number of positions in which they differ.
This is taking really long on my 4-core machine. Last time I left it to run for 45min and it never finished executing. Please advise of a faster solution or some optimizations.
Example:
000010
000001
have a difference of 2 since the last two bits do not match.
After I calculate the difference, I store the value in another array:
int holder;
for(int x = 0; x < UsedTableSpace; x++){
    int min = 10000000;
    for(int y = 0; y < UsedTableSpace; y++){
        if(x != y){
            //compr calculates difference between two c-string arrays
            int tempDiff = compr(similarity[x]->matrix, similarity[y]->matrix);
            if(tempDiff < min){
                min = tempDiff;
                holder = y;
            }
        }
    }
    similarity[holder]->inbound++;
}
With more information, we could probably give you better advice, but based on what I understand of the question, here are some ideas:
Since you're using each character to represent a 1 or a 0, you're using several times more memory than you need to use, which creates a big performance impact when it comes to caching and such. Instead, represent your data using numeric values that you can think of in terms of a series of bits.
Once you've packed the bits this way, you can grab an entire integer or long at a time and do a bitwise XOR to get a number that has a 1 in every position where the two inputs differ. Then you can use some of the tricks mentioned here to count these bits quickly.
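A minimal sketch of packing plus XOR-and-count using std::bitset, assuming each row holds only '0'/'1' characters (a char* row from the question could be wrapped as std::string(arr[i], 100)). pack and hamming are hypothetical names; std::bitset::count does the bit counting.

```cpp
#include <bitset>
#include <string>

using Bits = std::bitset<100>;

// Pack a '0'/'1' string into 100 bits once, up front.
Bits pack(const std::string& s) {
    return Bits(s); // throws std::invalid_argument on characters other than '0'/'1'
}

// One XOR plus a population count replaces 100 character comparisons.
int hamming(const Bits& a, const Bits& b) {
    return static_cast<int>((a ^ b).count()); // 1 bits mark the positions that differ
}
```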
Work on "unrolling" your loops somewhat to avoid the number of jumps necessary. For example, the following code:
total = total + array[i];
total = total + array[i + 1];
total = total + array[i + 2];
... will work faster than just looping over total = total + array[i] three times. Jumps are expensive and interfere with the processor's pipelining. Update: I should mention that your compiler may already be doing some of this for you; you can check the compiled code to see.
Break your overall data set into chunks that will allow you to take full advantage of caching. Think of your problem as a "square" with the i index on one axis and the j index on the other. If you start with one i and iterate across all 175000 j values, the first j values you visit will be gone from the cache by the time you get to the end of the line. On the other hand, if you take the top left corner and go from j=0 to 256, most of the values on the j axis will still be in a low-level cache as you loop around to compare them with i=0, 1, 2, etc.
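A minimal sketch of that blocking, assuming the rows have already been packed into std::bitset<100> and a block size of 256 (a tuning parameter to measure, not a known optimum). nearest_blocked is a hypothetical name; it returns, for each row, the index of its nearest other row, keeping the first one on ties like the question's loop.

```cpp
#include <algorithm>
#include <bitset>
#include <climits>
#include <cstddef>
#include <vector>

using Bits = std::bitset<100>;

// Walk the x/y "square" tile by tile so the y rows of a tile stay in cache
// while every x of the tile is compared against them.
std::vector<std::size_t> nearest_blocked(const std::vector<Bits>& rows, std::size_t block = 256)
{
    const std::size_t n = rows.size();
    std::vector<int> best(n, INT_MAX);
    std::vector<std::size_t> nearest(n, n); // n means "no other row"
    for (std::size_t x0 = 0; x0 < n; x0 += block)
        for (std::size_t y0 = 0; y0 < n; y0 += block)
            for (std::size_t x = x0; x < std::min(x0 + block, n); ++x)
                for (std::size_t y = y0; y < std::min(y0 + block, n); ++y) {
                    if (x == y) continue;
                    int d = static_cast<int>((rows[x] ^ rows[y]).count());
                    if (d < best[x]) { best[x] = d; nearest[x] = y; } // y visited in increasing order
                }
    return nearest;
}
```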
Lastly, although this should go without saying, I guess it's worth mentioning: Make sure your compiler is set to optimize!
One simple optimization is to compare each pair of strings only once. If the difference between A and B is 12, the difference between B and A is also 12, so computing it once and updating the best match of both strings cuts the number of comparisons almost in half.
In code:
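int compr(const char* a, const char* b) {
    int d = 0;
    for (int i = 0; i < 100; ++i)
        if (a[i] != b[i]) ++d;
    return d;
}
void main_function() {
    // best[x]: smallest difference found so far for x; holder[x]: index achieving it
    std::vector<int> best(UsedTableSpace, 10000000);
    std::vector<int> holder(UsedTableSpace, -1);
    for(int x = 0; x < UsedTableSpace; x++){
        for(int y = x + 1; y < UsedTableSpace; y++){
            //compr calculates difference between two c-string arrays; it is symmetric
            int tempDiff = compr(similarity[x]->matrix, similarity[y]->matrix);
            if(tempDiff < best[x]){ best[x] = tempDiff; holder[x] = y; }
            if(tempDiff < best[y]){ best[y] = tempDiff; holder[y] = x; }
        }
    }
    for(int x = 0; x < UsedTableSpace; x++)
        if(holder[x] != -1) similarity[holder[x]]->inbound++;
}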
Notice that the inner for loop now starts at x + 1, and each computed difference updates the best match of both x and y.
Another optimization is to split the work across separate threads to take advantage of your 4 cores.
What is your goal, i.e. what do you want to do with the Hamming Distances (which is what they are) after you've got them? For example, if you are looking for the closest pair, or most distant pair, you probably can get an O(n ln n) algorithm instead of the O(n^2) methods suggested so far. (At n=175000, n^2 is 15000 times larger than n ln n.)
For example, you could characterize each 100-bit number m by 8 4-bit numbers, being the number of bits set in 8 segments of m, and sort the resulting 32-bit signatures into ascending order. Signatures of the closest pair are likely to be nearby in the sorted list. It is easy to lower-bound the distance between two numbers if their signatures differ, giving an effective branch-and-bound process as less-distant numbers are found.
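A minimal sketch of the signature and its lower bound, assuming the 100 bits are split into four 13-bit and four 12-bit segments (the answer does not fix the split); signature and distance_lower_bound are hypothetical names. Sorting the signatures and the branch-and-bound search itself are not shown.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using Bits = std::bitset<100>;

// 8 segments: 4 of 13 bits, then 4 of 12 bits. Each popcount is at most 13,
// so it fits in 4 bits; 8 of them form a 32-bit signature.
std::uint32_t signature(const Bits& m) {
    std::uint32_t sig = 0;
    std::size_t bit = 0;
    for (int seg = 0; seg < 8; ++seg) {
        std::size_t len = seg < 4 ? 13 : 12;
        std::uint32_t count = 0;
        for (std::size_t i = 0; i < len; ++i, ++bit) count += m[bit];
        sig = (sig << 4) | count; // first segment ends up in the most significant nibble
    }
    return sig;
}

// Within a segment, two numbers must differ in at least |count_a - count_b| positions,
// so the sum over segments is a lower bound on the Hamming distance.
int distance_lower_bound(std::uint32_t sa, std::uint32_t sb) {
    int bound = 0;
    for (int seg = 0; seg < 8; ++seg) {
        int ca = static_cast<int>((sa >> (4 * seg)) & 0xF);
        int cb = static_cast<int>((sb >> (4 * seg)) & 0xF);
        bound += std::abs(ca - cb);
    }
    return bound;
}
```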