The most efficient way to find the mode in an array? - C++

Recently I have been trying to find the mode of a set of numbers using C.
My code works well when the set is small.
Here is my code:
#include <stdio.h>

int frequency[10001]; // frequency[n] counts occurrences of n, for n between 0 and 10000
int main()
{
    int x[10] = {1,6,5,99,1,12,50,50,244,50};
    int highest = 0;
    int i;
    for (i = 0; i < 10; i++)
    {
        frequency[x[i]]++;
        if (frequency[x[i]] > highest)
            highest = frequency[x[i]];
    }
    printf("The mode in the array : ");
    for (i = 0; i <= 10000; i++) /* frequency has valid indices 0..10000 */
        if (frequency[i] == highest)
            printf("%d ", i);
    return 0;
}
Later, I found out that my method becomes extremely slow on a large set of numbers. Also, my program will not work if there are numbers smaller than 0 or greater than 10000, unless I increase the size of the "frequency" array.
Therefore, I would like to know whether there is any way to find the mode in the array more efficiently. Thanks.

Use a hash table (std::unordered_map is typically implemented as one).
You tagged your question as C++, so you're going to get some sample code in C++. You're on your own for implementing a hash table in C. It's not a bad learning exercise.
#include <iostream>
#include <unordered_map>

int x[10] = {1,6,5,99,1,12,50,50,244,50};
std::unordered_map<int, int> table; // maps each item in "x" to the number of times observed
for (int i = 0; i < 10; i++)
{
    table[x[i]]++;
}
int mode = 0;
int mode_freq = 0;
for (auto itor = table.begin(); itor != table.end(); itor++)
{
    if (itor->second > mode_freq)
    {
        mode = itor->first;
        mode_freq = itor->second;
    }
}
std::cout << "The mode in the array is " << mode << std::endl;

You could simply sort your array (man qsort), then search for the longest run of the same number.
The question is: what do you do when two numbers appear with the same, highest frequency in the array?
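A rough sketch of that approach (using std::sort rather than qsort; when several values tie for the highest frequency, this version simply keeps the first one):
#include <algorithm>
#include <vector>

// Sort-then-scan: after sorting, equal values are adjacent, so the mode
// is the value with the longest run. Assumes v is non-empty.
int mode_by_sorting(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    int best = v[0];
    size_t best_len = 0;
    for (size_t i = 0; i < v.size(); ) {
        size_t j = i;
        while (j < v.size() && v[j] == v[i]) ++j; // find the end of this run
        if (j - i > best_len) { best_len = j - i; best = v[i]; }
        i = j;
    }
    return best;
}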

I think that your question is too general to get a definite answer:
the "most efficient" is a pretty big requirement; I guess you would be interested in any "more" efficient solution :).
More efficient in what way? Faster execution time? Less memory usage? Better code?
First of all I would have written this little piece like this:
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

using std::vector;

static const size_t NUM_FREQ = 1000;
int main()
{
    vector< unsigned int > frequency(NUM_FREQ);
    vector< unsigned int > samples = {1,6,5,99,1,12,50,50,244,50};
    unsigned int highest = 0;
    for ( size_t i = 0; i < samples.size(); i++ )
    {
        assert( samples[i] < NUM_FREQ && "value in samples is bigger than expected" );
        frequency[ samples[ i ] ]++;
        if ( frequency[ samples[ i ] ] > highest )
            highest = frequency[ samples[ i ] ];
    }
    printf("The mode in the array : ");
    for ( size_t i = 0; i < frequency.size(); i++ )
        if ( frequency[ i ] == highest )
            printf("%zu ", i);
    return EXIT_SUCCESS;
}
Out of all the bad practices that I changed, the one you should be most careful with is relying on the implicit initialization of plain types.
Now, there are a number of things that may or may not be wrong with this:
The most obvious is that you do not need to loop over it twice: just use an extra variable to remember the value with the highest frequency, and get rid of the second loop completely (see the sketch after these remarks).
In your example there are very few samples, and such a big frequency array is a waste of space. If the number of samples is less than NUM_FREQ, I would simply use a vector of pairs. I assume that your real application uses a sample array that is bigger than the frequency array.
Finally sorting or hashing could accelerate things but it very much depends on how the frequency data is used in the rest of your application (but you haven't shown anything but this simple code).
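As a sketch of the first remark (a single pass that remembers the mode while counting; unlike the two-loop version it reports one mode, ties going to whichever value reached the highest count first):
#include <vector>

// Single-pass variant: track the value with the highest count while
// counting, so the second loop over the frequency table disappears.
int mode_one_pass(const std::vector<unsigned int>& samples, size_t num_freq)
{
    std::vector<unsigned int> frequency(num_freq);
    unsigned int highest = 0;
    unsigned int mode = 0;
    for (size_t i = 0; i < samples.size(); i++) {
        if (++frequency[samples[i]] > highest) {
            highest = frequency[samples[i]];
            mode = samples[i];
        }
    }
    return (int)mode;
}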

With a plain frequency array you cannot count occurrences of a negative number, since an array index cannot be negative. Instead of using an array of frequency[10001], use a map in C++.
Now let me modify your code by using a map instead of an array.
#include <iostream>
#include <map>
using namespace std;
int main()
{
    int x[10] = {1,6,5,99,1,12,50,50,244,50};
    map<int, int> freq; // using a map here instead of a frequency array
    int highest = 0;    // highest frequency seen so far
    int mode = 0;       // the value with that frequency
    for (int i = 0; i < 10; i++)
    {
        freq[x[i]] += 1; // count this value
    }
    for (auto& p : freq) // iterate over the map's (value, count) pairs
    {
        if (p.second > highest) // find the most frequently occurring number
        {
            highest = p.second;
            mode = p.first;
        }
    }
    cout << mode << endl; // print the most frequently occurring number
}

Related

How to deal with large data such as arrays that cause a stack overflow in C++?

It's my first time dealing with large numbers or arrays, and I can't avoid overflowing the stack. I tried using long long to avoid it, but the error points at the int main line:
CODE:
#include <iostream>
using namespace std;
int main()
{
    long long n=0, city[100000], min[100000] = {10^9}, max[100000] = { 0 };
    cin >> n;
    for (int i = 0; i < n; i++) {
        cin >> city[i];
    }
    for (int i = 0; i < n; i++)
    { //min
        for (int s = 0; s < n; s++)
        {
            if (city[i] != city[s])
            {
                if (min[i] >= abs(city[i] - city[s]))
                {
                    min[i] = abs(city[i] - city[s]);
                }
            }
        }
    }
    for (int i = 0; i < n; i++)
    { //max
        for (int s = 0; s < n; s++)
        {
            if (city[i] != city[s])
            {
                if (max[i] <= abs(city[i] - city[s]))
                {
                    max[i] = abs(city[i] - city[s]);
                }
            }
        }
    }
    for (int i = 0; i < n; i++) {
        cout << min[i] << " " << max[i] << endl;
    }
}
ERROR:
Severity Code Description Project File Line Suppression State
Warning C6262 Function uses '2400032' bytes of stack: exceeds /analyze:stacksize '16384'. Consider moving some data to heap.
then it opens chkstk.asm and shows the error at:
test dword ptr [eax],eax ; probe page.
Small optimistic remark:
100,000 is not a large number for your computer! (you're also not dealing with that many arrays, but arrays of that size)
Error message describes what goes wrong pretty well:
You're creating arrays on your current function's "scratchpad" (the stack). That has very limited size!
This is C++, so you really should do things the (modern-ish) C++ way and avoid manually handling large data objects when you can.
So, replace
long long n=0, city[100000], min[100000] = {10^9}, max[100000] = { 0 };
with the following (I don't see any case where you'd want to use long long; presumably you want a 64-bit variable? Note also that 10^9 is "10 XOR 9", not "10 to the power of 9"):
constexpr size_t size = 100000;
constexpr int64_t default_min = 1'000'000'000;
uint64_t n = 0;
std::vector<int64_t> city(size);
std::vector<int64_t> min_value(size, default_min);
std::vector<int64_t> max_value(size, 0);
Additional remarks:
Notice how I took your 100000 and your 10⁹ and made them constexpr constants? Do that! Whenever some non-zero "magic constant" appears in your code, it's a good time to ask yourself "will I ever need that value somewhere else, too?" and "Would it make sense to give this number a name explaining what it is?". And if you answer one of them with "yes": make a new constexpr constant, even just directly above where you use it! The compiler will just deal with that as if you had the literal number where you use it, it's not any extra memory, or CPU cycles, that this will cost.
As a matter of fact, that pre-allocation is even bad! Allocating not-really-large-but-still-unnecessarily-large arrays is just a bad idea. Instead, read n first, then use that n to make std::vectors of that size.
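A minimal sketch of that, reusing default_min from the snippet above:
// Read n first, then size the vectors from it instead of hard-coding 100000.
size_t n = 0;
std::cin >> n;
std::vector<int64_t> city(n);
std::vector<int64_t> min_value(n, default_min);
std::vector<int64_t> max_value(n, 0);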
Don't use using namespace std;, for multiple reasons, chief among them that your min and max variables would now shadow std::min and std::max, so whenever you call something you never know whether you're actually calling what you mean to, or just the function of the same name from the std:: namespace. Instead, using std::cout; using std::cin; would do for you here!
This might be beyond your current learning level (that's fine!), but
for (int i = 0; i < n; i++) {
cin >> city[i];
}
is inelegant; with the std::vector approach, if you make your std::vector really have length n, it can be written nicely as:
for (auto &value: city) {
cin >> value;
}
This will also make sure you're not accidentally reading more values than you mean to when changing the length of that city storage one day.
It looks as if you're trying to find the minimum and maximum absolute distance between city values. But you do it in an incredibly inefficient way, needing multiple loops over 10⁵·10⁵=10¹⁰ iterations.
Start with the maximum distance: assume your city vector, array (whatever!) were sorted. What are the two elements with the greatest absolute distance?
If you had a sorted array/vector: how would you find the two elements with the smallest distance?
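To sketch where those hints lead (continuing from the vector rewrite above; needs <algorithm> and <limits>): once city is sorted, each element's farthest neighbour is one of the two ends, and its nearest neighbour is adjacent, so one O(n log n) sort plus an O(n) pass replaces the 10¹⁰-iteration double loops. Caveats: this prints results in sorted order and, unlike the original loops, does not skip equal values.
std::sort(city.begin(), city.end());
for (size_t i = 0; i < city.size(); ++i) {
    // the farthest element from city[i] is either the smallest or the largest element
    int64_t max_d = std::max(city[i] - city.front(), city.back() - city[i]);
    // the nearest element is one of the two sorted neighbours
    int64_t min_d = std::numeric_limits<int64_t>::max();
    if (i > 0)               min_d = std::min(min_d, city[i] - city[i - 1]);
    if (i + 1 < city.size()) min_d = std::min(min_d, city[i + 1] - city[i]);
    std::cout << min_d << " " << max_d << "\n";
}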

C++ Sort number using bitshifting operations

I wanted to ask how it is possible to sort an integer's digits by size using bitshifting operations.
Here is an example:
Input : 12823745
Output : 87543221
Basically, sorting the digits from the high digits to the small digits.
I heard it is possible without using the Bubblesort/Quicksort algorithms, but rather by using some bitshifting operations.
Does someone know how that can be achieved?
Quick sort and bubble sort are general purpose algorithms. As such, they do not make any assumptions about the data to be sorted. However, whenever we have additional information about the data, we can use it to get something different (I do not say better/faster or anything like this, because it is really hard to beat something as simple and powerful as quick/bubble sort, and what you need really depends on the specific situation).
If there is only a limited number of elements to be sorted (only 10 different digits) one could use something like this:
#include <iostream>
#include <vector>
using namespace std;
typedef std::vector<int> ivec;
void sort(std::vector<int>& vec){
    ivec count(10, 0); // one counter per digit 0..9
    for (size_t i = 0; i < vec.size(); ++i) { count[vec[i]]++; }
    ivec out;
    for (int i = 9; i > -1; --i){
        for (int j = 0; j < count[i]; j++){
            out.push_back(i);
        }
    }
    vec = out;
}
void print(const ivec& vec){
    for (size_t i = 0; i < vec.size(); ++i) { std::cout << vec[i]; }
    std::cout << std::endl;
}
int main() {
    ivec vec {1,2,8,2,3,7,4,5};
    sort(vec); // the counting sort defined above
    print(vec);
    return 0;
}
Note that this has complexity O(N). Further, this always works when the set of possible elements has a finite size (so not only for digits, but not for floats). Unfortunately it is only practical for really small set sizes.
Sometimes it is not sufficient to just count the elements. They might have some identity besides the value being sorted. However, the above can be modified easily to work in this case too (it needs quite a few copies, but is still O(n)).
Actually I have no idea how your problem could be solved by using bitshift operations. However, I just wanted to point out that there is always a way not to use a general purpose algorithm when your data has nice properties (and sometimes it can be even more efficient).
Here is a solution - implement bubble sort with loops and bitwise operations.
#include <iostream>
#include <string>

std::string unsorted = "37980965";
for (size_t i = 1; i < unsorted.size(); ++i)
    for (size_t j = 0; j < i; ++j) {
        auto &a = unsorted[i];
        auto &b = unsorted[j];
        // ((a) <= (b)) keeps the larger digit at the smaller index (descending order);
        // otherwise the comma expression XOR-swaps a and b without a temporary
        (((a) <= (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b))));
    }
std::cout << unsorted;
Notice that the comparison and swap happen without a temporary variable or arithmetic operations; only comparisons and bitwise operations are used (though the || short-circuit still implies a conditional at the machine level).
How about this one?
#include <iostream>
int main()
{
    int digit[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    unsigned int input = 12823745;
    unsigned int output = 0;
    while (input > 0) {
        digit[input % 10]++; // count each digit
        input /= 10;
    }
    for (int i = 9; i >= 0; --i) {
        while (digit[i] > 0) { // emit the digits from largest to smallest
            output = output * 10 + i;
            digit[i]--;
        }
    }
    std::cout << output;
}

Time-efficient way to count number of distinct numbers

get_number() returns an integer. I'm going to call it 30 times and count the number of distinct integers returned. My plan is to put these numbers into an std::array<int,30>, sort it and then use std::unique.
Is that a good solution? Is there a better one? This piece of code will be the bottleneck of my program.
I'm thinking there should be a hash-based solution, but maybe its overhead would be too much when I've only got 30 elements?
Edit: I changed unique to distinct. Example:
{1,1,1,1} => 1
{1,2,3,4} => 4
{1,3,3,1} => 2
I would use std::set<int> as it's simpler:
std::set<int> s;
for (int i = 0; i < 30; ++i) // loop 30 times
{
    s.insert(get_number());
}
std::cout << s.size() << std::endl; // the count of distinct numbers
If you want to count how many times each distinct number was returned, I'd suggest a map:
std::map<int, int> s;
for (int i = 0; i < 30; i++)
{
    s[get_number()]++;
}
std::cout << s.size() << std::endl; // total count of distinct numbers returned
for (auto it : s)
{
    std::cout << it.first << " " << it.second << std::endl; // each number and its count
}
The simplest solution would be to use a std::map:
std::map<int, size_t> counters;
for (size_t i = 0; i != 30; ++i) {
    counters[get_number()] += 1;
}
std::vector<int> uniques;
for (auto const& pair : counters) {
    if (pair.second == 1) { uniques.push_back(pair.first); }
}
// uniques now contains the items that appeared only once
Using a std::map, std::set or the std::sort algorithm will give you O(n*log(n)) complexity. For a small to large number of elements it is perfectly correct. But you use a known integer range, and this opens the door to lots of optimizations.
As you say (in a comment) that the range of your integers is known and short ([0..99]), I would recommend implementing a modified counting sort. See: http://en.wikipedia.org/wiki/Counting_sort
You can count the number of distinct items while doing the sort itself, removing the need for the std::unique call. The whole complexity would be O(n). Another advantage is that the memory needed is independent of the number of input items. If you had 30,000,000,000 integers to sort, it would not need a single extra byte to count the distinct items.
Even if the range of allowed integer values is large, say [0..10,000,000], the memory consumed would be quite low. Indeed, an optimized version could consume as little as 1 bit per allowed integer value. That is less than 2 MB of memory, or 1/1000th of a laptop's RAM.
Here is a short example program:
#include <cstdlib>
#include <algorithm>
#include <iostream>
#include <vector>

// A function returning an integer between [0..99]
int get_number()
{
    return rand() % 100;
}

int main(int argc, char* argv[])
{
    // reserve one bucket for each possible integer, initialized to 0
    std::vector<int> cnt_buckets(100, 0);
    int nb_distincts = 0;
    // Get 30 numbers and count distinct values
    for (int i = 0; i < 30; ++i)
    {
        int number = get_number();
        std::cout << number << std::endl;
        if (0 == cnt_buckets[number])
            ++nb_distincts;
        // We could optimize by doing this only the first time
        ++cnt_buckets[number];
    }
    std::cerr << "Total distinct numbers: " << nb_distincts << std::endl;
}
You can see it working:
$ ./main | sort | uniq | wc -l
Total distinct numbers: 26
26
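And a sketch of the "1 bit per allowed value" variant mentioned above (a bool flag per value is enough to count distinct values, though it no longer tells you how many times each one appeared):
#include <vector>

// One flag per allowed value instead of a full counter.
int count_distinct_1bit()
{
    std::vector<bool> seen(100, false); // packs to roughly 1 bit per value
    int nb_distincts = 0;
    for (int i = 0; i < 30; ++i) {
        int number = get_number();
        if (!seen[number]) {
            seen[number] = true;
            ++nb_distincts;
        }
    }
    return nb_distincts;
}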
The simplest way is just to use std::set.
std::set<int> s;
int uniqueCount = 0;
for( int i = 0; i < 30; ++i )
{
    int n = get_number();
    if( s.find(n) != s.end() ) // already seen, not a new distinct value
        continue;
    s.insert( n );
    ++uniqueCount; // count each value the first time it is seen
}
// now s contains the distinct numbers
// and uniqueCount == s.size() is the number of distinct integers returned
Using an array and sort seems good, but unique may be a bit of overkill if you just need to count distinct values. The following function should return the number of distinct values in a sorted range.
template<typename ForwardIterator>
size_t distinct(ForwardIterator begin, ForwardIterator end) {
    if (begin == end) return 0;
    size_t count = 1;
    ForwardIterator prior = begin;
    while (++begin != end)
    {
        if (*prior != *begin)
            ++count;
        prior = begin;
    }
    return count;
}
In contrast to the set- or map-based approaches this one does not need any heap allocation and elements are stored continuously in memory, therefore it should be much faster. Asymptotic time complexity is O(N log N) which is the same as when using an associative container. I bet that even your original solution of using std::sort followed by std::unique would be much faster than using std::set.
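A minimal usage sketch, assuming get_number() as in the question:
#include <algorithm>
#include <array>

// Fill a fixed-size array, sort it so equal values become adjacent,
// then let distinct() count the value changes.
size_t count_distinct()
{
    std::array<int, 30> values;
    for (int& v : values) v = get_number();
    std::sort(values.begin(), values.end()); // distinct() requires a sorted range
    return distinct(values.begin(), values.end());
}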
Try a set, try an unordered set, try sort and unique, try something else that seems fun.
Then MEASURE each one. If you want the fastest implementation, there is no substitute for trying out real code and seeing what it really does.
Your particular platform and compiler and other particulars will surely matter, so test in an environment as close as possible to where it will be running in production.

Need advice on improving my code: Search Algorithm

I'm pretty new at C++ and would need some advice on this.
Here I have a code that I wrote to measure the number of times an arbitrary integer x occurs in an array and to output the comparisons made.
However, I've read that by using multi-way branching ("divide and conquer!") techniques, I could make the algorithm run faster.
Could anyone point me in the right direction how should I go about doing it?
Here is my working code for the other method I did:
#include <iostream>
#include <cstdlib>
#include <vector>
using namespace std;
vector<int> integers;
int function(int vectorsize, int count);
int x;
double input;
int main()
{
    cout << "Enter 20 integers" << endl;
    cout << "Type 0.5 to end" << endl;
    while (true)
    {
        cin >> input;
        if (input == 0.5)
            break;
        integers.push_back(input);
    }
    cout << "Enter the integer x" << endl;
    cin >> x;
    function((integers.size() - 1), 0);
    system("pause");
}
int function(int vectorsize, int count)
{
    if (vectorsize < 0) // termination condition
    {
        cout << "The number of times " << x << " appears is " << count << endl;
        return 0;
    }
    if (integers[vectorsize] > x)
    {
        cout << integers[vectorsize] << " > " << x << endl;
    }
    if (integers[vectorsize] < x)
    {
        cout << integers[vectorsize] << " < " << x << endl;
    }
    if (integers[vectorsize] == x)
    {
        cout << integers[vectorsize] << " = " << x << endl;
        count = count + 1;
    }
    return (function(vectorsize - 1, count));
}
Thanks!
If the array is unsorted, just use a single loop to compare each element to x. Unless there's something you're forgetting to tell us, I don't see any need for anything more complicated.
If the array is sorted, there are algorithms (e.g. binary search) that would have better asymptotic complexity. However, for a 20-element array a simple linear search should still be the preferred strategy.
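For the sorted case, a minimal sketch of the binary-search approach: std::equal_range does two binary searches and returns the subrange equal to x, so the count is just the length of that subrange, O(log N) in total.
#include <algorithm>
#include <vector>

int count_occurrences_sorted(const std::vector<int>& sorted, int x)
{
    auto range = std::equal_range(sorted.begin(), sorted.end(), x);
    return (int)(range.second - range.first);
}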
If your array is a sorted one, you can use a divide and conquer strategy:
Efficient way to count occurrences of a key in a sorted array
A divide and conquer algorithm is only beneficial if you can either eliminate some work with it, or if you can parallelize the divided work parts across several computation units. In your case, the first option is possible with an already sorted dataset; other answers may have addressed that problem.
For the second option, the algorithm is called map reduce: it splits the dataset into several subsets, distributes the subsets to as many threads or processes as available, and gathers the results to "compile" them (the term is actually "reduce") into a meaningful result. In your setting, it means that each thread will scan its own slice of the array to count the items, and return its result to the "reduce" thread, which will add them up to produce the final result. This solution is only interesting for large datasets, though.
There are questions dealing with mapreduce and c++ on SO, but I'll try to give you a sample implementation here:
#include <utility>
#include <thread>
#include <vector>
#include <boost/thread/barrier.hpp>

constexpr int MAP_COUNT = 4;

int mresults[MAP_COUNT];
boost::barrier endmap(MAP_COUNT + 1);

void mfunction(int start, int end, int rank) {
    int count = 0;
    for (int i = start; i < end; i++)
        if (integers[i] == x) count++;
    mresults[rank] = count;
    endmap.wait();
}

int rfunction() {
    int count = 0;
    for (int i : mresults) {
        count += i;
    }
    return count;
}

int mapreduce() {
    std::vector<std::thread> mthreads;
    int range = integers.size() / MAP_COUNT;
    for (int i = 0; i < MAP_COUNT; i++)
        mthreads.push_back(std::thread(mfunction, i * range, (i + 1) * range, i));
    endmap.wait();
    for (auto& t : mthreads) t.join(); // join the workers before returning
    return rfunction();
}
Once the integers vector has been populated, you call the mapreduce function defined above, which should return the expected result. As you can see, the implementation is very specialized:
the map and reduce functions are specific to your problem,
the number of threads used for map is static,
I followed your style and used global variables,
for convenience, I used a boost::barrier for synchronization
However this should give you an idea of the algorithm, and how you could apply it to similar problems.
caveat: code untested.

Optimized way to find M largest elements in an NxN array using C++

I need a blazing fast way to find the 2D positions and values of the M largest elements in an NxN array.
Right now I'm doing this:
struct SourcePoint {
    Point point;
    float value;
};

SourcePoint* maxValues = new SourcePoint[M];

for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
        if (sample > maxValues[0].value) {
            int q = 1;
            while (q < M && sample > maxValues[q].value) { // check q < M first to avoid reading past the end
                maxValues[q - 1] = maxValues[q]; // shuffle the values back
                q++;
            }
            maxValues[q - 1].value = sample;
            maxValues[q - 1].point = Point(i, j);
        }
    }
}
A Point struct is just two ints - x and y.
This code basically does an insertion sort on the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encountered so far. This gives us a quick and easy bailout: if sample <= maxValues[0].value, we don't do anything. The issue I'm having is the shuffling every time a new better value is found. It works its way all the way down maxValues until it finds its spot, shuffling all the elements in maxValues to make room for itself.
I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm since this is called many many times and accounts for 60-80% of my overall cost.
I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is this operation is called repeatedly, and it can't be precomputed.
I just had a thought to pre-load the first M elements of maxValues which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.
Any help from optimization gurus would be much appreciated :)
A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010 but test it yourself to see whether any of them help in your case. Some of these changes may have no or even a negative effect depending on the actual usage/data and compiler optimizations.
Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
Changing g_Source[i][j] to g_Source[j][i] gains you a very little bit (not as much as I'd thought there would be).
Using the structure SourcePoint1 listed at the bottom gets me another few percent.
The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array which it can't do if you use a local variable.
Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
If possible try to keep the source data in arr[][] sorted, even if only partially. Ideally you'd want to generate maxValues[] at the same time the source data is created.
Look at how the data is created/stored/organized may give you patterns or information to reduce the amount of time to generate your maxValues[] array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.
Code for above:
struct SourcePoint1 {
int x;
int y;
float value;
int test; //Play with manual/compiler padding if needed
};
If you want to go into micro-optimizations at this point, a simple first step would be to get rid of the Points and just stuff both dimensions into a single int. That reduces the amount of data you need to shift around, and gets SourcePoint down to a power of two in size, which simplifies indexing into it.
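For illustration, a hypothetical packing scheme, assuming both coordinates fit in 16 bits:
#include <cstdint>

// Pack row/col into one 32-bit value; the float value can then sit next
// to it in an 8-byte struct.
inline int32_t pack(int x, int y)  { return (y << 16) | (x & 0xFFFF); }
inline int unpack_x(int32_t p)     { return p & 0xFFFF; }
inline int unpack_y(int32_t p)     { return (p >> 16) & 0xFFFF; }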
Also, are you sure that keeping the list sorted is better than simply recomputing which element is the new lowest after each time you shift the old lowest out?
(Updated 22:37 UTC 2011-08-20)
I propose a binary min-heap of fixed size holding the M largest elements (but still in min-heap order!). It probably won't be faster in practice, as I think the OP's insertion sort probably has decent real-world performance (at least when the recommendations of the other posters in this thread are taken into account).
Look-up in the case of failure should be constant time: If the current element is less than the minimum element of the heap (containing the max M elements) we can reject it outright.
If it turns out that we have an element bigger than the current minimum of the heap (the Mth biggest element) we extract (discard) the previous min and insert the new element.
If the elements are needed in sorted order the heap can be sorted afterwards.
First attempt at a minimal C++ implementation:
#include <algorithm>
#include <limits>

template<unsigned size, typename T>
class m_heap {
private:
    T nodes[size];

    static unsigned parent(unsigned i) { return (i - 1) / 2; }
    static unsigned left(unsigned i) { return i * 2 + 1; }  // 0-based heap: children at 2i+1 ...
    static unsigned right(unsigned i) { return i * 2 + 2; } // ... and 2i+2

    void bubble_down(unsigned i) {
        for (;;) {
            unsigned j = i;
            if (left(i) < size && nodes[left(i)] < nodes[i])
                j = left(i);
            if (right(i) < size && nodes[right(i)] < nodes[j])
                j = right(i);
            if (i != j) {
                std::swap(nodes[i], nodes[j]);
                i = j;
            } else {
                break;
            }
        }
    }

    void bubble_up(unsigned i) {
        while (i > 0 && nodes[i] < nodes[parent(i)]) {
            std::swap(nodes[parent(i)], nodes[i]);
            i = parent(i);
        }
    }

public:
    m_heap() {
        for (unsigned i = 0; i < size; i++) {
            // lowest(): the most negative value of T (for floating point
            // types, min() is the smallest positive value instead)
            nodes[i] = std::numeric_limits<T>::lowest();
        }
    }

    void add(const T& x) {
        if (x < nodes[0]) {
            // smaller than the minimum of the M largest: reject outright
            return;
        }
        nodes[0] = x;   // replace the old minimum
        bubble_down(0); // and restore the min-heap property
    }

    T* get() { return nodes; } // raw access to the M kept values
};
Small test/usage case:
#include <iostream>
#include <limits>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <assert.h>
#include <math.h>
using namespace std;

// INCLUDE TEMPLATED CLASS FROM ABOVE

typedef vector<float> vf;
bool compare(float a, float b) { return a > b; }

int main()
{
    int N = 2000;
    vf v;
    for (int i = 0; i < N; i++) v.push_back( rand()*1e6 / RAND_MAX );
    static const int M = 50;
    m_heap<M, float> h;
    for (int i = 0; i < N; i++) h.add( v[i] );
    sort(v.begin(), v.end(), compare);
    vf heap(h.get(), h.get() + M); // copy out the heap's raw node array
    sort(heap.begin(), heap.end(), compare);
    cout << "Real\tFake" << endl;
    for (int i = 0; i < M; i++) {
        cout << v[i] << "\t" << heap[i] << endl;
        if (fabs(v[i] - heap[i]) > 1e-5) abort();
    }
}
You're looking for a priority queue:
template < class T, class Container = vector<T>,
class Compare = less<typename Container::value_type> >
class priority_queue;
You'll need to figure out the best underlying container to use, and probably define a Compare function to deal with your Point type.
If you want to optimize it, you could run a queue on each row of your matrix in its own worker thread, then run an algorithm to pick the largest item of the queue fronts until you have your M elements.
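As a hedged sketch of the single-queue case (illustrative names, not the asker's code): a min-heap priority_queue of at most M elements keeps the smallest retained value on top, so most samples are rejected with one comparison.
#include <functional>
#include <queue>
#include <vector>

void keep_top_m(std::priority_queue<float, std::vector<float>,
                                    std::greater<float> >& heap,
                float sample, std::size_t M)
{
    if (heap.size() < M) {
        heap.push(sample);  // still filling up
    } else if (sample > heap.top()) {
        heap.pop();         // drop the smallest kept value
        heap.push(sample);
    }
}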
A quick optimization would be to add a sentinel value to your maxValues array. If maxValues[M].value is equal to std::numeric_limits<float>::max(), then you can eliminate the q < M test in your while loop condition.
One idea would be to use the std::partial_sort algorithm on a plain one-dimensional sequence of references into your NxN array. You could probably also cache this sequence of references for subsequent calls. I don't know how well it performs, but it's worth a try - if it works good enough, you don't have as much "magic". In particular, you don't resort to micro optimizations.
Consider this showcase:
#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>
#include <stddef.h>

static const int M = 15;
static const int N = 20;

// Represents a reference to a sample of some two-dimensional array
class Sample
{
public:
    Sample( float *arr, size_t row, size_t col )
        : m_arr( arr ),
          m_row( row ),
          m_col( col )
    {
    }

    inline operator float() const {
        return m_arr[m_row * N + m_col];
    }

    bool operator<( const Sample &rhs ) const {
        // reversed comparison, so partial_sort puts the largest values first
        return (float)rhs < (float)*this;
    }

    int row() const {
        return m_row;
    }

    int col() const {
        return m_col;
    }

private:
    float *m_arr;
    size_t m_row;
    size_t m_col;
};

int main()
{
    // Set up a demo array
    float arr[N][N];
    memset( arr, 0, sizeof( arr ) );

    // Put in some sample values
    arr[2][1] = 5.0;
    arr[9][11] = 2.0;
    arr[5][4] = 4.0;
    arr[15][7] = 3.0;
    arr[12][19] = 1.0;

    // Set up the sequence of references into this array; you could keep
    // a copy of this sequence around to reuse it later, I think.
    std::vector<Sample> samples;
    samples.reserve( N * N );
    for ( size_t row = 0; row < N; ++row ) {
        for ( size_t col = 0; col < N; ++col ) {
            samples.push_back( Sample( (float *)arr, row, col ) );
        }
    }

    // Let partial_sort move the M largest entries to the front
    std::partial_sort( samples.begin(), samples.begin() + M, samples.end() );

    // Print out the row/column of the M largest entries.
    for ( std::vector<Sample>::size_type i = 0; i < M; ++i ) {
        std::cout << "#" << (i + 1) << " is " << (float)samples[i] << " at " << samples[i].row() << "/" << samples[i].col() << std::endl;
    }
}
First of all, you are marching through the array in the wrong order!
You always, always, always want to scan through memory linearly. That means the last index of your array needs to be changing fastest. So instead of this:
for (int j = 0; j < rows; j++) {
for (int i = 0; i < cols; i++) {
float sample = arr[i][j];
Try this:
for (int i = 0; i < cols; i++) {
for (int j = 0; j < rows; j++) {
float sample = arr[i][j];
I predict this will make a bigger difference than any other single change.
Next, I would use a heap instead of a sorted array. The standard <algorithm> header already has push_heap and pop_heap functions to use a vector as a heap. (This will probably not help all that much, though, unless M is fairly large. For small M and a randomized array, you do not wind up doing all that many insertions on average... Something like O(log N) I believe.)
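A hedged sketch of that vector-as-heap idea (illustrative; std::greater makes it a min-heap, so the smallest kept value sits at v.front()):
#include <algorithm>
#include <functional>
#include <vector>

void add_sample(std::vector<float>& v, float sample, std::size_t M)
{
    if (v.size() < M) {
        v.push_back(sample);
        std::push_heap(v.begin(), v.end(), std::greater<float>());
    } else if (sample > v.front()) {
        std::pop_heap(v.begin(), v.end(), std::greater<float>()); // min moves to the back
        v.back() = sample;                                        // overwrite it
        std::push_heap(v.begin(), v.end(), std::greater<float>());
    }
}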
Next after that is to use SSE2. But that is peanuts compared to marching through memory in the right order.
You should be able to get nearly linear speedup with parallel processing.
With N CPUs, you can process a band of rows/N rows (and all columns) with each CPU, finding the top M entries in each band. And then do a selection sort to find the overall top M.
You could probably do that with SIMD as well (but here you'd divide up the task by interleaving columns instead of banding the rows). Don't try to make SIMD do your insertion sort faster, make it do more insertion sorts at once, which you combine at the end using a single very fast step.
Naturally you could do both multi-threading and SIMD, but on a problem which is only 30x30, that's not likely to be worthwhile.
I tried replacing float by double, and interestingly that gave me a speed improvement of about 20% (using VC++ 2008). That's a bit counterintuitive, but it seems modern processors or compilers are optimized for double value processing.
Use a linked list to store the best M values so far. You'll still have to iterate over it to find the right spot, but the insertion is O(1). It would probably even be better than binary search plus insertion: O(N) search + O(1) insert vs. O(lg N) search + O(N) insert.
Interchange the fors, so you're not accessing every Nth element in memory and thrashing the cache.
LE: Throwing in another idea that might work for uniformly distributed values; a sketch of it follows below.
Find the min and max in 3/2*O(N^2) comparisons.
Create anywhere from N to N^2 uniformly distributed buckets, preferably closer to N^2 than N.
For every element in the NxN matrix, place it in bucket[(int)((value - min) / range * num_buckets)], where range = max - min.
Finally, build a set starting from the highest bucket down to the lowest, adding whole buckets to it while |current set| + |next bucket| <= M.
If you get M elements, you're done.
More likely you'll get fewer elements than M, say P.
Apply your algorithm to the remaining bucket and take the biggest M-P elements out of it.
If the elements are uniform and you use N^2 buckets, its complexity is about 3.5*(N^2), versus your current solution which is about O(N^2)*ln(M).
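A sketch of that bucket approach (the function name and the flattening of the NxN matrix into one vector are my assumptions; it also assumes values.size() >= M and roughly uniform data):
#include <algorithm>
#include <functional>
#include <vector>

std::vector<float> top_m_bucketed(const std::vector<float>& values, size_t M)
{
    auto mm = std::minmax_element(values.begin(), values.end());
    float min = *mm.first, max = *mm.second, range = max - min;
    if (range == 0.0f) // all values equal: any M of them will do
        return std::vector<float>(values.begin(), values.begin() + M);

    size_t num_buckets = values.size(); // anywhere from N to N^2 buckets
    std::vector<std::vector<float> > buckets(num_buckets);
    for (float v : values) {
        size_t b = std::min(num_buckets - 1,
                            (size_t)((v - min) / range * num_buckets));
        buckets[b].push_back(v);
    }
    std::vector<float> result;
    for (size_t b = num_buckets; b-- > 0 && result.size() < M; ) {
        std::vector<float>& bucket = buckets[b];
        if (result.size() + bucket.size() <= M) {
            result.insert(result.end(), bucket.begin(), bucket.end());
        } else {
            // only the bucket straddling the M boundary needs sorting
            std::sort(bucket.begin(), bucket.end(), std::greater<float>());
            result.insert(result.end(), bucket.begin(),
                          bucket.begin() + (M - result.size()));
        }
    }
    return result;
}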