All possible combination. Faster way - c++

I have a vector of numbers between 1 and 100(this is not important) which can take sizes between 3 and 1.000.000 values.
If anyone can help me getting 3 value unique* combinations from that vector.
*Unique
Example: I have in the array the following values: 1[0] 5[1] 7[2] 8[3] 7[4] (the [x] is the index)
In this case 1[0] 5[1] 7[2] and 1[3] 5[1] 7[4] are different, but 1[0] 5[1] 7[2] and 7[2] 1[0] 5[1] are the same(duplicate)
My algorithm is a little slow when i work with a lot of values(example 1.000.000). So what i want is a faster way to do it.
for(unsigned int x = 0;x<vect.size()-2;x++){
for(unsigned int y = x+1;y<vect.size()-1;y++){
for(unsigned int z = y+1;z<vect.size();z++)
{
// do thing with vect[x],vect[y],vect[z]
}
}
}

In fact it is very very important that your values are between 1 and 100! Because with a vector of size 1,000,000 you have a lot of numbers that are equal and you don't need to inspect all of them! What you can do is the following:
Note: the following code is just an outline! It may lack sufficient error checking and is just here to give you the idea, not for copy paste!
Note2: When I wrote the answer, I assumed the numbers to be in the range [0, 99]. Then I read that they are actually in [1, 100]. Obviously this is not a problem and you can either -1 all the numbers or even better, change all the 100s to 101s.
bool exists[100] = {0}; // exists[i] means whether i exists in your vector
for (unsigned int i = 0, size = vect.size(); i < size; ++i)
exists[vect[i]] = true;
Then, you do similar to what you did before:
for(unsigned int x = 0; x < 98; x++)
if (exists[x])
for(unsigned int y = x+1; y < 99; y++)
if (exists[y])
for(unsigned int z = y+1; z < 100; z++)
if (exists[z])
{
// {x, y, z} is an answer
}
Another thing you can do is spend more time in preparation to have less time generating the pairs. For example:
int nums[100]; // from 0 to count are the numbers you have
int count = 0;
for (unsigned int i = 0, size = vect.size(); i < size; ++i)
{
bool exists = false;
for (int j = 0; j < count; ++j)
if (vect[i] == nums[j])
{
exists = true;
break;
}
if (!exists)
nums[count++] = vect[i];
}
Then
for(unsigned int x = 0; x < count-2; x++)
for(unsigned int y = x+1; y < count-1; y++)
for(unsigned int z = y+1; z < count; z++)
{
// {nums[x], nums[y], nums[z]} is an answer
}
Let us consider 100 to be a variable, so let's call it k, and the actual numbers present in the array as m (which is smaller than or equal to k).
With the first method, you have O(n) preparation and O(m^2*k) operations to search for the value which is quite fast.
In the second method, you have O(nm) preparation and O(m^3) for generation of the values. Given your values for n and m, the preparation takes too long.
You could actually merge the two methods to get the best of both worlds, so something like this:
int nums[100]; // from 0 to count are the numbers you have
int count = 0;
bool exists[100] = {0}; // exists[i] means whether i exists in your vector
for (unsigned int i = 0, size = vect.size(); i < size; ++i)
{
if (!exists[vect[i]])
nums[count++] = vect[i];
exists[vect[i]] = true;
}
Then:
for(unsigned int x = 0; x < count-2; x++)
for(unsigned int y = x+1; y < count-1; y++)
for(unsigned int z = y+1; z < count; z++)
{
// {nums[x], nums[y], nums[z]} is an answer
}
This method has O(n) preparation and O(m^3) cost to find the unique triplets.
Edit: It turned out that for the OP, the same number in different locations are considered different values. If that is really the case, then I'm sorry, there is no faster solution. The reason is that all the possible combinations themselves are C(n, m) (That's a combination) that although you are generating each one of them in O(1), it is still too big for you.

There's really nothing that can be done to speed up the loop body you have there. Consider that with 1M vector size, you are making one trillion loop iterations.
Producing all combinations like that is an exponential problem, which means that you won't be able to practically solve it when the input size becomes large enough. Your only option would be to leverage specific knowledge of your application (what you need the results for, and how exactly they will be used) to "work around" the issue if possible.

Possibly you can sort your input, make it unique, and pick x[a], x[b] and x[c] when a < b < c. The sort will be O(n log n) and picking the combination will be O(n³). Still you will have less triplets to iterate over:
std::vector<int> x = original_vector;
std::sort(x.begin(), x.end());
std::erase(std::unique(x.begin(), x.end()), x.end());
for(a = 0; a < x.size() - 2; ++a)
for(b=a+1; b < x.size() - 1; ++b)
for(c=b+1; c< x.size(); ++c
issue triplet(x[a],x[b],x[c]);

Depending on your actual data, you may be able to speed it up significantly by first making a vector that has at most three entries with each value and iterate over that instead.

As r15habh pointed out, I think the fact that the values in the array are between 1-100 is in fact important.
Here's what you can do: make one pass through the array, reading values into a unique set. This by itself is O(n) time complexity. The set will have no more than 100 elements, which means O(1) space complexity.
Now since you need to generate all 3-item permutations, you'll still need 3 nested loops, but instead of operating on the potentially huge array, you'll be operating on a set that has at most 100 elements.
Overall time complexity depends on your original data set. For a small data set, time complexity will be O(n^3). For a large data set, it will approach O(n).

If understand your application correctly then you can use a tuple instead, and store in either a set or hash table depending on your requirements. If the normal of the tri matters, then make sure that you shift the tri so that lets say the largest element is first, if normal shouldn't matter, then just sort the tuple. A version using boost & integers:
#include <set>
#include <algorithm>
#include "boost/tuple/tuple.hpp"
#include "boost/tuple/tuple_comparison.hpp"
int main()
{
typedef boost::tuple< int, int, int > Tri;
typedef std::set< Tri > TriSet;
TriSet storage;
// 1 duplicate
int exampleData[4][3] = { { 1, 2, 3 }, { 2, 3, 6 }, { 5, 3, 2 }, { 2, 1, 3 } };
for( unsigned int i = 0; i < sizeof( exampleData ) / sizeof( exampleData[0] ); ++i )
{
std::sort( exampleData[i], exampleData[i] + ( sizeof( exampleData[i] ) / sizeof( exampleData[i][0] ) ) );
if( !storage.insert( boost::make_tuple( exampleData[i][0], exampleData[i][1], exampleData[i][2] ) ).second )
std::cout << "Duplicate!" << std::endl;
else
std::cout << "Not duplicate!" << std::endl;
}
}

Related

Determine duplicates/pairs in an array in C++

I have been doing this problem for 2 days now, and I still can't figure out how to do this properly.
In this program, I have to input the number of sticks available (let's say 5). Then, the user will be asked to input the lengths of each stick (space-separated integer). Let's say the lengths of each stick respectively are [4, 4, 3, 3, 4]. Now, I have to determine if there are pairs (2 sticks of same length). In this case, we have 2 (4,4 and 3,3). Since there are 2 pairs, we can create a canvas (a canvas has a total of 2 pairs of sticks as the frame). Now, I don't know exactly how to determine how many "pairs" there are in an array. I would like to ask for your help and guidance. Just note that I am a beginner. I might not understand complex processes. So, if there is a simple (or something that a beginner can understand) way to do it, it would be great. It's just that I don't want to put something in my code that I don't fully comprehend. Thank you!
Attached here is the link to the problem itself.
https://codeforces.com/problemset/problem/127/B
Here is my code (without the process that determines the number of pairs)
#include<iostream>
#include<cmath>
#define MAX 100
int lookForPairs(int numberOfSticks);
int main(void){
int numberOfSticks = 0, maxNumOfFrames = 0;
std::cin >> numberOfSticks;
maxNumOfFrames = lookForPairs(numberOfSticks);
std::cout << maxNumOfFrames << std::endl;
return 0;
}
int lookForPairs(int numberOfSticks){
int lengths[MAX], pairs = 0, count = 0, canvas = 0;
for(int i=0; i<numberOfSticks; i++){
std::cin >> lengths[i];
}
pairs = floor(count/2);
canvas = floor(pairs/2);
return count;
}
I tried doing it like this, but it was flawed. It wouldn't work when there were 3 or more integers of the same number (for ex. [4, 4, 3, 4, 2] or [5. 5. 5. 5. 6]). On the first array, the count would be 6 when it should only be 3 since there are only three 4s.
for(int i=0; i<numberOfSticks; i++){
for (int j=0; j<numberOfSticks; j++){
if (lengths[i] == lengths[j] && i!=j)
count++;
}
}
Instead of storing all the lengths and then comparing them, count how many there are of each length directly.
These values are known to be positive and at most 100, so you can use an int[100] array for this as well:
int counts[MAX] = {}; // Initialize array to all zeros.
for(int i = 0; i < numberOfSticks; i++) {
int length = 0;
std::cin >> length;
counts[length-1] += 1; // Adjust for zero-based indexing.
}
Then count them:
int pairs = 0;
for(int i = 0; i < MAX; i++) {
pairs += counts[i] / 2;
}
and then you have the answer:
return pairs;
Just an extension to molbdnilo's answer: You can even count all pairs in one single iteration:
for(int i = 0; i < numberOfSticks; ++i)
{
if(std::cin >> length) // catch invalid input!
{
pairs += flags[length] == 1; // add a pair if there is already a stick
flags[length] ^= 1; // toggle between 0 and 1...
}
else
{
// some appropriate error handling
}
}
Note that I skipped subtracting 1 from the length – which requires the array being one larger in length (but now it can be of smallest type available, i.e. char), while index 0 just serves as an unused sentinel. This variant would even allow to use bitmaps for storing the flags, though questionable if, with a maximum length that small, all this bit fiddling would be worth it…
You can count the number of occurrences using a map. It seems that you are not allowed to use a standard map. Since the size of a stick is limited to 100, according to the link you provided, you can use an array, m of 101 items (stick's minimum size is 1, maximum size is 100). The element index is the size of the stick. The element value is the number of sticks. That is, m[a[i]] is the number of sticks of size a[i]. Demo.
#define MAX 100
int n = 7;
int a[MAX] = { 1,2,3,4,1,2,3 };
int m[MAX + 1]; // maps stick len to number of sticks
void count()
{
for (int i = 0; i < n; ++i)
m[a[i]]++;
}
int main()
{
count();
for (int i = 1; i < MAX + 1; ++i)
if (m[i])
std::cout << i << "->" << m[i] << std::endl;
}
Your inner loop is counting forward from the very beginning each time, making you overcount the items in your array. Count forward from i , not zero.
for(int i=0; i<numberOfSticks; i++)
{
for (int j=i; j<numberOfSticks; j++) { // count forward from i (not zero)
if (lengths[i] == lengths[j] && i!=j)
{ // enclosing your blocks in curly braces , even if only one line, is easier to read
count++; // you'll want to store this value somewhere along with the 'length'. perhaps a map?
}
}
}

C++ Part of brute-force knapsack

reader,
Well, I think I just got brainfucked a bit.
I'm implementing knapsack, and I thought about I implemented brute-force algorithm like 1 or 2 times ever. So I decided to make another one.
And here's what I chocked in.
Let us decide W is maximum weight, and w(min) is minimal-weighted element we can put in knapsack like k=W/w(min) times. I'm explaining this because you, reader, are better know why I need to ask my question.
Now. If we imagine that we have like 3 types of things we can put in knapsack, and our knapsack can store like 15 units of mass, let's count each unit weight as its number respectively. so we can put like 15 things of 1st type, or 7 things of 2nd type and 1 thing of 1st type. but, combinations like 22222221[7ed] and 12222222[7ed] will mean the same for us. and counting them is a waste of any type of resources we pay for decision. (it's a joke, 'cause bf is a waste if we have a cheaper algorithm, but I'm very interested)
As I guess the type of selections we need to go through all possible combinations is called "Combinations with repetitions". The number of C'(n,k) counts as (n+k-1)!/(n-1)!k!.
(while I typing my message I just spotted a hole in my theory. we will probably need to add an empty, zero-weighted-zero-priced item to hold free space it's probably just increases n by 1)
so, what's the matter.
https://rosettacode.org/wiki/Combinations_with_repetitions
as this problem is well-described up here^ I don't really want to use stack this way, I want to generate variations in single cycle, which is going from i=0 to i<C'(n,k).
so, If I can make it, how it works?
we have
int prices[n]; //appear mystically
int weights[n]; // same as previous and I guess we place (0,0) in both of them.
int W, k; // W initialized by our lord and savior
k = W/min(weights);
int road[k], finalroad[k]; //all 0
int curP = curW = maxP = maxW = 0;
for (int i = 0; i < rCombNumber(n, k); i ++) {
/*guys please help me to know how to generate this mask which is consists of indices from 0 to n (meaning of each element) and k is size of mask.*/
curW = 0;
for (int j = 0; j < k; j ++)
curW += weights[road[j]];
if (curW < W) {
curP = 0;
for (int l = 0; l < k; l ++)
curP += prices[road[l]];
if (curP > maxP) {
maxP = curP;
maxW = curW;
finalroad = road;
}
}
}
mask, road -- is an array of indices, each can be equal from 0 to n; and have to be generated as C'(n,k) (link about it above) from { 0, 1, 2, ... , n } by k elements in each selection (combination with repetitions where order is unimportant)
that's it. prove me wrong or help me. Much thanks in advance _
and yes, of course algorithm will take the hell much time, but it looks like it should work. and I'm very interesting in it.
UPDATE:
what do I miss?
http://pastexen.com/code.php?file=EMcn3F9ceC.txt
The answer was provided by Minoru here https://gist.github.com/Minoru/745a7c19c7fa77702332cf4bd3f80f9e ,
it's enough to increment only the first element, then we count all of the carries, set where we did a carry and count reset value as the maximum of elements to reset and reset with it.
here's my code:
#include <iostream>
using namespace std;
static long FactNaive(int n)
{
long r = 1;
for (int i = 2; i <= n; ++i)
r *= i;
return r;
}
static long long CrNK (long n, long k)
{
long long u, l;
u = FactNaive(n+k-1);
l = FactNaive(k)*FactNaive(n-1);
return u/l;
}
int main()
{
int numberOFchoices=7,kountOfElementsInCombination=4;
int arrayOfSingleCombination[kountOfElementsInCombination] = {0,0,0,0};
int leftmostResetPos = kountOfElementsInCombination;
int resetValue=1;
for (long long iterationCounter = 0; iterationCounter<CrNK(numberOFchoices,kountOfElementsInCombination); iterationCounter++)
{
leftmostResetPos = kountOfElementsInCombination;
if (iterationCounter!=0)
{
arrayOfSingleCombination[kountOfElementsInCombination-1]++;
for (int anotherIterationCounter=kountOfElementsInCombination-1; anotherIterationCounter>0; anotherIterationCounter--)
{
if(arrayOfSingleCombination[anotherIterationCounter]==numberOFchoices)
{
leftmostResetPos = anotherIterationCounter;
arrayOfSingleCombination[anotherIterationCounter-1]++;
}
}
}
if (leftmostResetPos != kountOfElementsInCombination)
{
resetValue = 1;
for (int j = 0; j < leftmostResetPos; j++)
{
if (arrayOfSingleCombination[j] > resetValue)
{
resetValue = arrayOfSingleCombination[j];
}
}
for (int j = leftmostResetPos; j != kountOfElementsInCombination; j++)
{
arrayOfSingleCombination[j] = resetValue;
}
}
for (int j = 0; j<kountOfElementsInCombination; j++)
{
cout<<arrayOfSingleCombination[j]<<" ";
}
cout<<"\n";
}
return 0;
}
thanks a lot, Minoru

Almost same code running much slower

I am trying to solve this problem:
Given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share common letters. You may assume that each word will contain only lower case letters. If no such two words exist, return 0.
https://leetcode.com/problems/maximum-product-of-word-lengths/
You can create a bitmap of char for each word to check if they share chars in common and then calc the max product.
I have two method almost equal but the first pass checks, while the second is too slow, can you understand why?
class Solution {
public:
int maxProduct2(vector<string>& words) {
int len = words.size();
int *num = new int[len];
// compute the bit O(n)
for (int i = 0; i < len; i ++) {
int k = 0;
for (int j = 0; j < words[i].length(); j ++) {
k = k | (1 <<(char)(words[i].at(j)));
}
num[i] = k;
}
int c = 0;
// O(n^2)
for (int i = 0; i < len - 1; i ++) {
for (int j = i + 1; j < len; j ++) {
if ((num[i] & num[j]) == 0) { // if no common letters
int x = words[i].length() * words[j].length();
if (x > c) {
c = x;
}
}
}
}
delete []num;
return c;
}
int maxProduct(vector<string>& words) {
vector<int> bitmap(words.size());
for(int i=0;i<words.size();++i) {
int k = 0;
for(int j=0;j<words[i].length();++j) {
k |= 1 << (char)(words[i][j]);
}
bitmap[i] = k;
}
int maxProd = 0;
for(int i=0;i<words.size()-1;++i) {
for(int j=i+1;j<words.size();++j) {
if ( !(bitmap[i] & bitmap[j])) {
int x = words[i].length() * words[j].length();
if ( x > maxProd )
maxProd = x;
}
}
}
return maxProd;
}
};
Why the second function (maxProduct) is too slow for leetcode?
Solution
The second method does repetitive call to words.size(). If you save that in a var than it working fine
Since my comment turned out to be correct I'll turn my comment into an answer and try to explain what I think is happening.
I wrote some simple code to benchmark on my own machine with two solutions of two loops each. The only difference is the call to words.size() is inside the loop versus outside the loop. The first solution is approximately 13.87 seconds versus 16.65 seconds for the second solution. This isn't huge, but it's about 20% slower.
Even though vector.size() is a constant time operation that doesn't mean it's as fast as just checking against a variable that's already in a register. Constant time can still have large variances. When inside nested loops that adds up.
The other thing that could be happening (someone much smarter than me will probably chime in and let us know) is that you're hurting your CPU optimizations like branching and pipelining. Every time it gets to the end of the the loop it has to stop, wait for the call to size() to return, and then check the loop variable against that return value. If the cpu can look ahead and guess that j is still going to be less than len because it hasn't seen len change (len isn't even inside the loop!) it can make a good branch prediction each time and not have to wait.

C++ Checking for identical values in 2 arrays

I have 2 arrays called xVal, and yVal.
I'm using these arrays as coords. What I want to do is to make sure that the array doesn't contain 2 identical sets of coords.
Lets say my arrays looks like this:
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
Here I want to find the match between xVal[0] yVal[0] and xVal[1] yVal[1] as 2 identical sets of coords called 1,1.
I have tried some different things with a forLoop, but I cant make it work as intended.
You can write an explicit loop using an O(n^2) approach (see answer from x77aBs) or you can trade in some memory for performance. For example using std::set
bool unique(std::vector<int>& x, std::vector<int>& y)
{
std::set< std::pair<int, int> > seen;
for (int i=0,n=x.size(); i<n; i++)
{
if (seen.insert(std::make_pair(x[i], y[i])).second == false)
return false;
}
return true;
}
You can do it with two for loops:
int MAX=4; //number of elements in array
for (int i=0; i<MAX; i++)
{
for (int j=i+1; j<MAX; j++)
{
if (xVal[i]==xVal[j] && yVal[i]==yVal[j])
{
//DUPLICATE ELEMENT at xVal[j], yVal[j]. Here you implement what
//you want (maybe just set them to -1, or delete them and move everything
//one position back)
}
}
}
Small explanation: first variable i get value 0. Than you loop j over all possible numbers. That way you compare xVal[0] and yVal[0] with all other values. j starts at i+1 because you don't need to compare values before i (they have already been compared).
Edit - you should consider writing small class that will represent a point, or at least structure, and using std::vector instead of arrays (it's easier to delete an element in the middle). That should make your life easier :)
int identicalValueNum = 0;
int identicalIndices[4]; // 4 is the max. possible number of identical values
for (int i = 0; i < 4; i++)
{
if (xVal[i] == yVal[i])
{
identicalIndices[identicalValueNum++] = i;
}
}
for (int i = 0; i < identicalValueNum; i++)
{
printf(
"The %ith value in both arrays is the same and is: %i.\n",
identicalIndices[i], xVal[i]);
}
For
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
the output of printf would be:
The 0th value in both arrays is the same and is: 1.
The 1th value in both arrays is the same and is: 1.
The 3th value in both arrays is the same and is: 4.

Optimized way to find M largest elements in an NxN array using C++

I need a blazing fast way to find the 2D positions and values of the M largest elements in an NxN array.
right now I'm doing this:
struct SourcePoint {
Point point;
float value;
}
SourcePoint* maxValues = new SourcePoint[ M ];
maxCoefficients = new SourcePoint*[
for (int j = 0; j < rows; j++) {
for (int i = 0; i < cols; i++) {
float sample = arr[i][j];
if (sample > maxValues[0].value) {
int q = 1;
while ( sample > maxValues[q].value && q < M ) {
maxValues[q-1] = maxValues[q]; // shuffle the values back
q++;
}
maxValues[q-1].value = sample;
maxValues[q-1].point = Point(i,j);
}
}
}
A Point struct is just two ints - x and y.
This code basically does an insertion sort of the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encoutered so far. This gives us a quick and easy bailout if sample <= maxValues, we don't do anything. The issue I'm having is the shuffling every time a new better value is found. It works its way all the way down maxValues until it finds it's spot, shuffling all the elements in maxValues to make room for itself.
I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm since this is called many many times and accounts for 60-80% of my overall cost.
I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is this operation is called repeatedly, and it can't be precomputed.
I just had a thought to pre-load the first M elements of maxValues which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.
Any help from optimization gurus would be much appreciated :)
A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010 but test it yourself to see whether any of them help in your case. Some of these changes may have no or even a negative effect depending on the actual usage/data and compiler optimizations.
Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
Changing g_Source[i][j] to g_Source[j][i] gains you a very little bit (not as much as I'd thought there would be).
Using the structure SourcePoint1 listed at the bottom gets me another few percent.
The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array which it can't do if you use a local variable.
Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
If possible try to keep the source data in arr[][] sorted, even if only partially. Ideally you'd want to generate maxValues[] at the same time the source data is created.
Look at how the data is created/stored/organized may give you patterns or information to reduce the amount of time to generate your maxValues[] array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.
Code for above:
struct SourcePoint1 {
int x;
int y;
float value;
int test; //Play with manual/compiler padding if needed
};
If you want to go into micro-optimizations at this point, the a simple first step should be to get rid of the Points and just stuff both dimensions into a single int. That reduces the amount of data you need to shift around, and gets SourcePoint down to being a power of two long, which simplifies indexing into it.
Also, are you sure that keeping the list sorted is better than simply recomputing which element is the new lowest after each time you shift the old lowest out?
(Updated 22:37 UTC 2011-08-20)
I propose a binary min-heap of fixed size holding the M largest elements (but still in min-heap order!). It probably won't be faster in practice, as I think OPs insertion sort probably has decent real world performance (at least when the recommendations of the other posteres in this thread are taken into account).
Look-up in the case of failure should be constant time: If the current element is less than the minimum element of the heap (containing the max M elements) we can reject it outright.
If it turns out that we have an element bigger than the current minimum of the heap (the Mth biggest element) we extract (discard) the previous min and insert the new element.
If the elements are needed in sorted order the heap can be sorted afterwards.
First attempt at a minimal C++ implementation:
template<unsigned size, typename T>
class m_heap {
private:
T nodes[size];
static const unsigned last = size - 1;
static unsigned parent(unsigned i) { return (i - 1) / 2; }
static unsigned left(unsigned i) { return i * 2; }
static unsigned right(unsigned i) { return i * 2 + 1; }
void bubble_down(unsigned int i) {
for (;;) {
unsigned j = i;
if (left(i) < size && nodes[left(i)] < nodes[i])
j = left(i);
if (right(i) < size && nodes[right(i)] < nodes[j])
j = right(i);
if (i != j) {
swap(nodes[i], nodes[j]);
i = j;
} else {
break;
}
}
}
void bubble_up(unsigned i) {
while (i > 0 && nodes[i] < nodes[parent(i)]) {
swap(nodes[parent(i)], nodes[i]);
i = parent(i);
}
}
public:
m_heap() {
for (unsigned i = 0; i < size; i++) {
nodes[i] = numeric_limits<T>::min();
}
}
void add(const T& x) {
if (x < nodes[0]) {
// reject outright
return;
}
nodes[0] = x;
swap(nodes[0], nodes[last]);
bubble_down(0);
}
};
Small test/usage case:
#include <iostream>
#include <limits>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <assert.h>
#include <math.h>
using namespace std;
// INCLUDE TEMPLATED CLASS FROM ABOVE
typedef vector<float> vf;
bool compare(float a, float b) { return a > b; }
int main()
{
int N = 2000;
vf v;
for (int i = 0; i < N; i++) v.push_back( rand()*1e6 / RAND_MAX);
static const int M = 50;
m_heap<M, float> h;
for (int i = 0; i < N; i++) h.add( v[i] );
sort(v.begin(), v.end(), compare);
vf heap(h.get(), h.get() + M); // assume public in m_heap: T* get() { return nodes; }
sort(heap.begin(), heap.end(), compare);
cout << "Real\tFake" << endl;
for (int i = 0; i < M; i++) {
cout << v[i] << "\t" << heap[i] << endl;
if (fabs(v[i] - heap[i]) > 1e-5) abort();
}
}
You're looking for a priority queue:
template < class T, class Container = vector<T>,
class Compare = less<typename Container::value_type> >
class priority_queue;
You'll need to figure out the best underlying container to use, and probably define a Compare function to deal with your Point type.
If you want to optimize it, you could run a queue on each row of your matrix in its own worker thread, then run an algorithm to pick the largest item of the queue fronts until you have your M elements.
A quick optimization would be to add a sentinel value to yourmaxValues array. If you have maxValues[M].value equal to std::numeric_limits<float>::max() then you can eliminate the q < M test in your while loop condition.
One idea would be to use the std::partial_sort algorithm on a plain one-dimensional sequence of references into your NxN array. You could probably also cache this sequence of references for subsequent calls. I don't know how well it performs, but it's worth a try - if it works good enough, you don't have as much "magic". In particular, you don't resort to micro optimizations.
Consider this showcase:
#include <algorithm>
#include <iostream>
#include <vector>
#include <stddef.h>
static const int M = 15;
static const int N = 20;
// Represents a reference to a sample of some two-dimensional array
class Sample
{
public:
Sample( float *arr, size_t row, size_t col )
: m_arr( arr ),
m_row( row ),
m_col( col )
{
}
inline operator float() const {
return m_arr[m_row * N + m_col];
}
bool operator<( const Sample &rhs ) const {
return (float)other < (float)*this;
}
int row() const {
return m_row;
}
int col() const {
return m_col;
}
private:
float *m_arr;
size_t m_row;
size_t m_col;
};
int main()
{
// Setup a demo array
float arr[N][N];
memset( arr, 0, sizeof( arr ) );
// Put in some sample values
arr[2][1] = 5.0;
arr[9][11] = 2.0;
arr[5][4] = 4.0;
arr[15][7] = 3.0;
arr[12][19] = 1.0;
// Setup the sequence of references into this array; you could keep
// a copy of this sequence around to reuse it later, I think.
std::vector<Sample> samples;
samples.reserve( N * N );
for ( size_t row = 0; row < N; ++row ) {
for ( size_t col = 0; col < N; ++col ) {
samples.push_back( Sample( (float *)arr, row, col ) );
}
}
// Let partial_sort find the M largest entry
std::partial_sort( samples.begin(), samples.begin() + M, samples.end() );
// Print out the row/column of the M largest entries.
for ( std::vector<Sample>::size_type i = 0; i < M; ++i ) {
std::cout << "#" << (i + 1) << " is " << (float)samples[i] << " at " << samples[i].row() << "/" << samples[i].col() << std::endl;
}
}
First of all, you are marching through the array in the wrong order!
You always, always, always want to scan through memory linearly. That means the last index of your array needs to be changing fastest. So instead of this:
for (int j = 0; j < rows; j++) {
for (int i = 0; i < cols; i++) {
float sample = arr[i][j];
Try this:
for (int i = 0; i < cols; i++) {
for (int j = 0; j < rows; j++) {
float sample = arr[i][j];
I predict this will make a bigger difference than any other single change.
Next, I would use a heap instead of a sorted array. The standard <algorithm> header already has push_heap and pop_heap functions to use a vector as a heap. (This will probably not help all that much, though, unless M is fairly large. For small M and a randomized array, you do not wind up doing all that many insertions on average... Something like O(log N) I believe.)
Next after that is to use SSE2. But that is peanuts compared to marching through memory in the right order.
You should be able to get nearly linear speedup with parallel processing.
With N CPUs, you can process a band of rows/N rows (and all columns) with each CPU, finding the top M entries in each band. And then do a selection sort to find the overall top M.
You could probably do that with SIMD as well (but here you'd divide up the task by interleaving columns instead of banding the rows). Don't try to make SIMD do your insertion sort faster, make it do more insertion sorts at once, which you combine at the end using a single very fast step.
Naturally you could do both multi-threading and SIMD, but on a problem which is only 30x30, that's not likely to be worthwhile.
I tried replacing float by double, and interestingly that gave me a speed improvement of about 20% (using VC++ 2008). That's a bit counterintuitive, but it seems modern processors or compilers are optimized for double value processing.
Use a linked list to store the best yet M values. You'll still have to iterate over it to find the right spot, but the insertion is O(1). It would probably even be better than binary search and insertion O(N)+O(1) vs O(lg(n))+O(N).
Interchange the fors, so you're not accessing every N element in memory and trashing the cache.
LE: Throwing another idea that might work for uniformly distributed values.
Find the min, max in 3/2*O(N^2) comparisons.
Create anywhere from N to N^2 uniformly distributed buckets, preferably closer to N^2 than N.
For every element in the NxN matrix place it in bucket[(int)(value-min)/range], range=max-min.
Finally create a set starting from the highest bucket to the lowest, add elements from other buckets to it while |current set| + |next bucket| <=M.
If you get M elements you're done.
You'll likely get less elements than M, let's say P.
Apply your algorithm for the remaining bucket and get biggest M-P elements out of it.
If elements are uniform and you use N^2 buckets it's complexity is about 3.5*(N^2) vs your current solution which is about O(N^2)*ln(M).