I have a 2d array houses[5][2] = {{1,1},{1,1},{1,1},{1,1},{1,1}}
What is the fastest way to check if all the elements inside that array are equal?
Here is what I have tried so far:
```
for(int j=0;j<5;j++){
    for(int k=0;k<6;k++){
        if(houses[j][k] == houses[j+1][k+1] && j+1 != 5 && k + 1 != 6)
            equal = true;
        else{
            equal = false;
            break;
        }
    }
}
```
This won't compare all the elements, though. I know how to compare all of them, but it seems to be a very long loop. Is there a faster way to do that?
Your current code will fail because break will only take you out of one loop. You must exit both, which requires a second check, like so:
```
auto the_value = houses[0][0];
bool equal = true;
for(int j=0;j<5;j++){
    for(int k=0;k<6;k++){
        if(houses[j][k]!=the_value){
            equal = false;
            break;           // only exits the inner loop...
        }
    }
    if(!equal)
        break;               // ...so the outer loop needs its own check
}
```
(Storing the first element in a variable and then looping over all of the elements to check to see if they are equal to that variable obviates the mess you invoke by checking adjacent elements.)
Breaking out of both loops simultaneously requires the Dark Arts (goto), but may be more readable/maintainable if you are disciplined and may be slightly faster, depending on your compiler:
```
auto the_value = houses[0][0];
bool equal = true;
for(int j=0;j<5;j++)
    for(int k=0;k<6;k++)
        if(houses[j][k]!=the_value){
            equal = false;
            goto done; //Danger, Will Robinson!
        }
done:
//More stuff
```
You may find a flat array to be faster:
```
// Assumes houses is now declared flat, e.g. int houses[5*6];
auto the_value = houses[0];
bool equal = true;
for(int i=0;i<5*6;i++)
    if(houses[i]!=the_value){
        equal = false;
        break;
    }
```
The 2D array is stored as one contiguous block in memory. Flat addressing touches exactly the same memory locations, but makes the index arithmetic explicit instead of leaving it to the compiler-generated 2D indexing. For highly performant code you may wish to consider using flat arrays by default.
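To make that layout claim concrete, here is a minimal sketch (my own example, names are illustrative) showing that 2D indexing and flat indexing reach the same element:

```
#include <cassert>

int main() {
    int houses2d[5][6] = {};
    int* flat = &houses2d[0][0];   // same bytes, viewed as one contiguous run
    for (int j = 0; j < 5; j++)
        for (int k = 0; k < 6; k++)
            assert(&houses2d[j][k] == &flat[j*6 + k]); // identical locations
}
```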
Since you might use a function such as this a number of times or have it embedded in otherwise complex code, perhaps you'd like to abstract it:
```
template<class T>
bool AllEqual(const T* arr, size_t N){
    T the_value = arr[0];
    for(size_t i=0;i<N;i++)
        if(arr[i]!=the_value)
            return false;
    return true;
}

AllEqual(houses, 5*6); // with houses flat; otherwise pass &houses[0][0]
```
Since you're coding in C++, you probably don't want to be using raw arrays anyway. Let's rewrite your code using the STL, assuming flat arrays:
```
template<class T>
bool AllEqual(const std::vector<T>& arr){
    return std::all_of(arr.begin(), arr.end(), [&](const T& x){ return x==arr[0]; });
}

std::vector<int> houses = {}; //Replace with appropriate initialization
if(AllEqual(houses))
    //Do stuff
```
(Also: as another answerer mentioned, the way you are adding data to your array seems to imply that it should be 2x6/6x2 array instead of 5x6/6x5.)
First, do you understand what your array looks like? You have six pairs of ones, but you declared houses[5][6]. That is 5 rows and 6 columns. You should have gotten an error for that:
```
main.cpp:5:55: error: excess elements in array initializer
int houses[5][6] = {{1,1},{1,1},{1,1},{1,1},{1,1},{1,1}};
                                                  ^~~~~
```
What you really wanted was 6 rows and 2 columns.
As for checking whether all elements of a 2D array are equal, I would follow a simple approach: store the first element of your array in a variable, e.g. named v, and check that value against all the other elements. If even one element is not equal to it, that is enough to decide that not all elements are equal, as in the following example:
```
#include <iostream>

bool allEqual(int arr[][2], int rows)
{
    int v = arr[0][0];
    for(int i = 0; i < rows; ++i)
        for(int j = 0; j < 2; ++j)
            if(v != arr[i][j])
                return false;
    return true;
}

int main(void)
{
    int houses[6][2] = {{1,1},{1,1},{1,1},{1,1},{1,1},{1,1}};
    allEqual(houses, 6) ? std::cout << "All " : std::cout << "Not all ";
    std::cout << "elements are equal\n";
    return 0;
}
```
If I emulate a 2D array with a 1D one, will it be faster?
I doubt it. The idea is that the memory locations should be contiguous, but that is already mostly the case for a 2D array, especially given that there are more rows than columns.
Here is my experiment:
```
Georgioss-MacBook-Pro:~ gsamaras$ g++ -Wall -std=c++0x -O3 -o 2d 2d.cpp
Georgioss-MacBook-Pro:~ gsamaras$ ./2d
2D array took 1.48e-10 seconds.
Georgioss-MacBook-Pro:~ gsamaras$ g++ -Wall -std=c++0x -O3 -o 1d 1d.cpp
Georgioss-MacBook-Pro:~ gsamaras$ ./1d
Emulating 2D array with 1D array took 1.5e-10 seconds.
```
and my code, based on my Time measurements (C++):
```
#include <iostream>
#include <ctime>
#include <ratio>
#include <chrono>

#define ROWS 10000
#define COLS 20
#define REPEAT 1000

bool allEqual(int* arr, const int size)
{
    int v = arr[0];
    for(int i = 0; i < size; ++i)
        if(v != arr[i])
            return false;
    return true;
}

void fill(int* arr, const int size)
{
    for(int i = 0; i < size; ++i)
        arr[i] = 1;
}

int main(void)
{
    const int size = ROWS * COLS;
    int houses[size];
    fill(houses, size);
    bool equal;

    using namespace std::chrono;
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for(int i = 0; i < REPEAT; ++i)
        equal = allEqual(houses, size);
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    duration<double> time_span = duration_cast<duration<double>>(t2 - t1);

    std::cout << "Emulating 2D array with 1D array took " << time_span.count()/(double)REPEAT << " seconds.\n";
    return 0;
}
```
where 2d.cpp is the straightforward 2D version.
Using the allEqual method for a 2D array provided earlier in this answer, the timings reported are similar.
Moreover, there is std::equal, which is comparable in terms of performance to my code above, reporting a time of:
```
std::equal with 2D array took 1.63e-10 seconds.
```
Its complexity is: "Up to linear in the distance between first1 and last1: Compares elements until a mismatch is found."
Summary:
std::equal does OK and requires the least effort from the programmer, so use it.
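For reference, here is a minimal sketch (my own, not the benchmark code) of the std::equal trick on a flat array: every element is compared against its predecessor, so the range is all-equal exactly when the range shifted by one matches the original.

```
#include <algorithm>
#include <iostream>

int main() {
    int houses[12] = {1,1,1,1,1,1,1,1,1,1,1,1}; // 6x2, flattened
    bool allEqual = std::equal(houses + 1, houses + 12, houses);
    std::cout << (allEqual ? "All " : "Not all ") << "elements are equal\n";
}
```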
Multiple things:
First, as others have pointed out, the line:
```
int houses[5][6] = {{1,1},{1,1},{1,1},{1,1},{1,1},{1,1}};
```
is wrong: the left-hand side declares an array with 5 rows and 6 columns, but the right-hand side initializer constitutes an array of 6 rows and 2 columns.
In the general case, checking that all elements of a 2D array (or even a 1D array) are equal is O(n), since every element must be examined at least once. You can optimize it a little, but it will still be an O(n) algorithm. In the most general case:
A[n][m] is an array of n rows and m columns:
```
for(int i=0; i<n*m; i++)
{
    if(A[0][0] != A[i/m][i%m])
        return false;
}
return true;
```
This may seem a little confusing, so let me explain: a 2D array has n*m elements, and an easy way to visit all of them in a single loop is to use [i/m] for the row (if i < m it's the first row, if m <= i < 2m it's the second row, and so on) and [i%m], the remainder, for the column. This way we can iterate over the entire array in a single loop.
Since we want all elements to be the same, if the first element is equal to all the others then they are all the same; if at least one is different then they are not all the same.
The fastest way:
```
int houses[6][2] = {{1,1},{1,1},{1,1},{1,1},{1,1},{1,2}};

int equals()
{
    int *p = (int *)houses;
    int *end = p + 6*2;
    int v = *p++;
    for(; p < end; p++)
        if (*p != v)
            return 0;
    return 1;
}
```
I wrote it for fun, don't use that in production.
Instead, iterate through them all:
```
int equals() {
    int v = houses[0][0];
    for(int j=0;j<6;j++)
        for(int k=0;k<2;k++)
            if (houses[j][k] != v)
                return false;
    return true;
}
```
We can check simply whether all the elements inside the array are equal or not: just assign the first element (first row, first column) to a variable, then compare each element with it. If one is not equal, return false.
Code snippet:
```
// Note: int** expects a pointer-to-pointer (e.g. a dynamically allocated
// 2D array); a built-in array like houses[6][2] needs a different signature.
bool Equal(int **arr, int row, int col)
{
    int v = arr[0][0];
    for(int i=0; i<row; i++)
    {
        for(int k=0; k<col; k++)
        {
            if(arr[i][k]!=v) return false;
        }
    }
    return true;
}
```
Related
I am learning DSA and while practising my LeetCode questions I came across a question-( https://leetcode.com/problems/find-pivot-index/).
Whenever I use vector prefix(size), I am greeted with errors, but when I do not add the size, the program runs fine.
Below is the code with the size:
```
class Solution {
public:
    int pivotIndex(vector<int>& nums) {
        //prefix[] stores the prefix sum of nums[]
        vector<int> prefix(nums.size());
        int sum2=0;
        int l=nums.size();
        //Prefix sum of nums in prefix:
        for(int i=0;i<l;i++){
            sum2=sum2+nums[i];
            prefix.push_back(sum2);
        }
        //Total stores the total sum of the vector given
        int total=prefix[l-1];
        for(int i=0; i<l;i++)
        {
            if((prefix[i]-nums[i])==(total-prefix[i]))
            {
                return i;
            }
        }
        return -1;
    }
};
```
I would really appreciate if someone could explain this to me.
Thanks!
You create prefix to be the same size as nums and then you push_back the same number of elements. prefix will therefore be twice the size of nums after the first loop. The second loop never reads the elements you pushed back, so the algorithm is broken.
I suggest that you simplify your algorithm. Keep a running sum for the left and the right side. Add to the left and remove from the right as you loop.
Example:
```
#include <numeric>
#include <vector>

int pivotIndex(const std::vector<int>& nums) {
    int lsum = 0;
    int rsum = std::accumulate(nums.begin(), nums.end(), 0);
    for(int idx = 0; idx < nums.size(); ++idx) {
        rsum -= nums[idx]; // remove from the right
        if(lsum == rsum) return idx;
        lsum += nums[idx]; // add to the left
    }
    return -1;
}
```
If you use the vector constructor with an integer parameter, you get a vector of nums.size() elements initialized to the default value (0 for int). You should then use indexing to set the elements:
```
...
for(int i = 0; i < l; ++i){
    sum2 = sum2 + nums[i];
    prefix[i] = sum2;
}
...
```
If you want to use the push_back method, you should create a zero-size vector (use the constructor without parameters). You can call the reserve method to allocate memory before adding new elements to the vector. A sketch of that option follows.
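An illustrative sketch of the reserve-then-push_back variant (my own code, not from the question):

```
#include <vector>

std::vector<int> prefixSums(const std::vector<int>& nums) {
    std::vector<int> prefix;          // starts empty
    prefix.reserve(nums.size());      // memory allocated once, size stays 0
    int sum = 0;
    for (int x : nums) {
        sum += x;
        prefix.push_back(sum);        // grows size by one, no reallocation
    }
    return prefix;
}
```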
I recently started learning C++ and ran into problems with this task:
I am given 4 arrays of different lengths with different values.
```
vector<int> A = {1,2,3,4};
vector<int> B = {1,3,44};
vector<int> C = {1,23};
vector<int> D = {0,2,5,4};
```
I need to implement a function that goes through all possible combinations of the elements of these vectors and checks whether there are values a from array A, b from array B, c from array C, and d from array D such that their sum is 0 (a+b+c+d=0).
I wrote such a program, but it outputs 1, although the desired combination does not exist.
```
using namespace std;

vector<int> test;

int sum (vector<int> v){
    int sum_of_elements = 0;
    for (int i = 0; i < v.size(); i++){
        sum_of_elements += v[i];
    }
    return sum_of_elements;
}

bool abcd0(vector<int> A,vector<int> B,vector<int> C,vector<int> D){
    for ( int ai = 0; ai <= A.size(); ai++){
        test[0] = A[ai];
        for ( int bi = 0; bi <= B.size(); bi++){
            test[1] = B[bi];
            for ( int ci = 0; ci <= C.size(); ci++){
                test[2] = C[ci];
                for ( int di = 0; di <= D.size(); di++){
                    test[3] = D[di];
                    if (sum (test) == 0){
                        return true;
                    }
                }
            }
        }
    }
}
```
I would be happy if you could explain what the problem is
Vectors don't increase their size by themselves. You either need to construct them with the right size, resize them, or push_back elements (you can also insert, but vectors aren't very efficient at that). In your code you never add any element to test, so accessing any element, e.g. test[0] = A[ai];, causes undefined behavior.
Further, valid indices are [0, size()) (i.e. size() is excluded; it is not a valid index). Hence your loops are accessing the input vectors out-of-bounds, causing undefined behavior again. The loop conditions should be for ( int ai = 0; ai < A.size(); ai++){.
Not returning something from a non-void function is again undefined behavior. When your abcd0 does not find a combination that adds up to 0 it does not return anything.
After fixing those issues your code does produce the expected output: https://godbolt.org/z/KvW1nePMh.
However, I suggest you to...
not use global variables. It makes the code difficult to reason about. For example we need to see all your code to know if you actually do resize test. If test was local to abcd0 we would only need to consider that function to know what happens to test.
read about Why is “using namespace std;” considered bad practice?
not pass parameters by value when you can pass them by const reference to avoid unnecessary copies.
using range based for loops helps to avoid making mistakes with the bounds.
Trying to change not more than necessary, your code could look like this:
```
#include <vector>
#include <iostream>

int sum (const std::vector<int>& v){
    int sum_of_elements = 0;
    for (int i = 0; i < v.size(); i++){
        sum_of_elements += v[i];
    }
    return sum_of_elements;
}

bool abcd0(const std::vector<int>& A,
           const std::vector<int>& B,
           const std::vector<int>& C,
           const std::vector<int>& D){
    for (const auto& a : A){
        for (const auto& b : B){
            for (const auto& c : C){
                for (const auto& d : D){
                    if (sum ({a,b,c,d}) == 0){
                        return true;
                    }
                }
            }
        }
    }
    return false;
}

int main() {
    std::vector<int> A = {1,2,3,4};
    std::vector<int> B = {1,3,44};
    std::vector<int> C = {1,23};
    std::vector<int> D = {0,2,5,4};
    std::cout << abcd0(A,B,C,D);
}
```
Note that I removed the vector test completely. You don't need to construct it explicitly, but you can pass a temporary to sum. sum could use std::accumulate, or you could simply add the four numbers directly in abcd0. I suppose this is for exercise, so let's leave it at that.
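For illustration, here is what the std::accumulate variant of sum mentioned above might look like (a sketch, not code from the question):

```
#include <numeric>
#include <vector>

int sum(const std::vector<int>& v){
    // std::accumulate folds operator+ over the range, starting from 0
    return std::accumulate(v.begin(), v.end(), 0);
}
```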
Edit : The answer written by #463035818_is_not_a_number is the answer you should refer to.
As mentioned in the comments by @Alan Birtles, there's nothing in that code that adds elements to test. Also, as mentioned in the comments by @PaulMcKenzie, the loop conditions should be modified: currently they loop all the way up to the size of the vector, which is invalid (valid indices run from 0 to size-1). To implement the algorithm you have in mind (as I inferred from your code), you can declare and initialize the vector all the way down in the 4th loop.
Here's the modified code,
```
int sum (vector<int> v){
    int sum_of_elements = 0;
    for (int i = 0; i < v.size(); i++){
        sum_of_elements += v[i];
    }
    return sum_of_elements;
}

bool abcd0(vector<int> A,vector<int> B,vector<int> C,vector<int> D){
    for ( int ai = 0; ai < A.size(); ai++){
        for ( int bi = 0; bi < B.size(); bi++){
            for ( int ci = 0; ci < C.size(); ci++){
                for ( int di = 0; di < D.size(); di++){
                    vector<int> test = {A[ai], B[bi], C[ci], D[di]};
                    if (sum (test) == 0){
                        return true;
                    }
                }
            }
        }
    }
    return false;
}
```
The algorithm is inefficient, though. You can try sorting the vectors first, loop over the first two of them, and use the two-pointer technique to check whether the desired sum is available from the remaining two vectors; a sketch of that step follows.
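A sketch of that two-pointer step (my own illustration): with C and D sorted ascending and target = -(a+b) for the currently fixed a and b, scan C from the left and D from the right.

```
#include <vector>

// Returns true if some c in C plus some d in D equals target.
// Assumes both vectors are sorted in ascending order.
bool pairWithSum(const std::vector<int>& C, const std::vector<int>& D, int target) {
    std::size_t lo = 0, hi = D.size();
    while (lo < C.size() && hi > 0) {
        int s = C[lo] + D[hi - 1];
        if (s == target) return true;
        if (s < target) ++lo;   // sum too small: advance in C
        else --hi;              // sum too large: retreat in D
    }
    return false;
}
```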
It looks to me like you're calling the function every time you want to check an array, and within the function you're initializing int sum_of_elements = 0;.
So on the first run, you start with sum_of_elements = 0.
It finds the element and increases sum_of_elements to 1.
On the second run you call the function and it initializes sum_of_elements = 0 again.
This is repeated every time you check the next array for the element.
Let me know if I understood that correctly (didn't run it, just skimmed a bit).
Let's say you have a number of unsorted arrays containing integers. Your job is to make sums of the arrays. The sums have to contain exactly one value from each array, i.e. (for 3 arrays)
```
sum = array1[2]+array2[12]+array3[4];
```
Goal: You should output the 20 combinations that generate the lowest possible sums.
The solution below is off-limits as the algorithm needs to be able to handle 10 arrays that can contain a huge number of integers. The following solution is way too slow for larger number of arrays:
```
//You already have int array1, array2 and array3
int top[20];
for(int i=0; i<20; i++)
    top[i] = INT_MAX;                  // 1e99 doesn't fit in an int; INT_MAX (<climits>) does

int sum = 0;
for(int i=0; i<array1.size(); i++)     //One for loop per array is trouble for
    for(int j=0; j<array2.size(); j++) //increasing numbers of arrays
        for(int k=0; k<array3.size(); k++)
        {
            sum = array1[i] + array2[j] + array3[k];
            if (sum < top[19])
                swapFunction(sum, top); //Function that adds sum to top
                                        //and sorts top in increasing order
        }

printResults(top); // Outputs top 20 lowest sums in increasing order
```
What would you do to achieve correct results more efficiently (with a lower Big O notation)?
The answer can be found by considering how to find the absolute lowest sum, and how to find the 2nd lowest sum and so on.
As you only need 20 sums at most, you only need the lowest 20 values from each array at most. I would recommend using std::partial_sort for this.
The rest should be able to be accomplished with a priority_queue in which each element contains the current sum and the indices into the arrays for this sum. Simply take each index in turn, increase it by one, calculate the new sum, and add that to the priority queue. The topmost item of the queue should always be the one with the lowest sum. Remove the lowest sum, generate the next possibilities, and then repeat until you have enough answers.
Assuming that the number of answers needed is much smaller than the array sizes, the Big O should be dominated by the partial_sort step: (N + k*log(k)) per array, times the number of arrays.
Here's some basic code to demonstrate the idea. There are very likely ways of improving on it. For example, I'm sure that with some work, you could avoid adding the same set of indices multiple times, and thereby eliminate the need for the do-while pop.
```
// Assumes in scope: vector<vector<int>> arrays and int numAnswers,
// plus <vector>, <queue>, <algorithm>, <iostream> and using namespace std.
for (size_t i = 0; i < arrays.size(); i++)
{
    auto b = arrays[i].begin();
    partial_sort(b, b + numAnswers, arrays[i].end());
}

struct answer
{
    answer(int s, vector<int> i)
        : sum(s), indices(i)
    {
    }

    int sum;
    vector<int> indices;

    bool operator <(const answer &o) const
    {
        return sum > o.sum;
    }
};

auto getSum = [&arrays](const vector<int> &indices) {
    auto retval = 0;
    for (size_t i = 0; i < arrays.size(); i++)
    {
        retval += arrays[i][indices[i]];
    }
    return retval;
};

vector<int> initialIndices(arrays.size());
priority_queue<answer> q;
q.emplace(getSum(initialIndices), initialIndices);

for (auto i = 0; i < numAnswers; i++)
{
    auto ans = q.top();
    cout << ans.sum << endl;
    do
    {
        q.pop();
    } while (!q.empty() && q.top().indices == ans.indices);
    for (size_t i = 0; i < ans.indices.size(); i++)
    {
        auto nextIndices = ans.indices;
        nextIndices[i]++;
        q.emplace(getSum(nextIndices), nextIndices);
    }
}
```
I have 2 arrays called xVal, and yVal.
I'm using these arrays as coords. What I want to do is to make sure that the array doesn't contain 2 identical sets of coords.
Let's say my arrays look like this:
```
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
```
Here I want to detect that xVal[0],yVal[0] and xVal[1],yVal[1] are two identical sets of coords, namely (1,1).
I have tried some different things with a for loop, but I can't make it work as intended.
You can write an explicit loop using an O(n^2) approach (see answer from x77aBs) or you can trade in some memory for performance. For example using std::set
```
bool unique(std::vector<int>& x, std::vector<int>& y)
{
    std::set< std::pair<int, int> > seen;
    for (int i=0,n=x.size(); i<n; i++)
    {
        if (seen.insert(std::make_pair(x[i], y[i])).second == false)
            return false;
    }
    return true;
}
```
You can do it with two for loops:
```
int MAX=4; //number of elements in array
for (int i=0; i<MAX; i++)
{
    for (int j=i+1; j<MAX; j++)
    {
        if (xVal[i]==xVal[j] && yVal[i]==yVal[j])
        {
            //DUPLICATE ELEMENT at xVal[j], yVal[j]. Here you implement what
            //you want (maybe just set them to -1, or delete them and move everything
            //one position back)
        }
    }
}
```
Small explanation: first, variable i gets the value 0. Then you loop j over all following indices. That way you compare xVal[0], yVal[0] with all other pairs. j starts at i+1 because you don't need to compare pairs before index i (they have already been compared).
Edit - you should consider writing a small class that represents a point (or at least a struct), and using std::vector instead of arrays (it's easier to delete an element in the middle). That should make your life easier :) A sketch follows.
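A minimal sketch of that suggestion (names and values are illustrative, not from the question):

```
#include <vector>

// A tiny point type makes "same coords" explicit, and a vector makes
// removing a duplicate in the middle straightforward.
struct Point {
    int x, y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

int main() {
    std::vector<Point> coords = { {1,1}, {1,1}, {3,5}, {4,4} };
    for (std::size_t i = 0; i < coords.size(); ++i)
        for (std::size_t j = i + 1; j < coords.size(); )
            if (coords[i] == coords[j])
                coords.erase(coords.begin() + j);  // drop the duplicate
            else
                ++j;
}
```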
```
int identicalValueNum = 0;
int identicalIndices[4]; // 4 is the max. possible number of identical values
for (int i = 0; i < 4; i++)
{
    if (xVal[i] == yVal[i])
    {
        identicalIndices[identicalValueNum++] = i;
    }
}

for (int i = 0; i < identicalValueNum; i++)
{
    printf(
        "The %ith value in both arrays is the same and is: %i.\n",
        identicalIndices[i], xVal[identicalIndices[i]]);
}
```
For
```
int xVal[4] = {1,1,3,4};
int yVal[4] = {1,1,5,4};
```
the output of printf would be:
```
The 0th value in both arrays is the same and is: 1.
The 1th value in both arrays is the same and is: 1.
The 3th value in both arrays is the same and is: 4.
```
I need a blazing fast way to find the 2D positions and values of the M largest elements in an NxN array.
Right now I'm doing this:
```
struct SourcePoint {
    Point point;
    float value;
};

SourcePoint* maxValues = new SourcePoint[ M ];

for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
        if (sample > maxValues[0].value) {
            int q = 1;
            while ( q < M && sample > maxValues[q].value ) {
                maxValues[q-1] = maxValues[q]; // shuffle the values back
                q++;
            }
            maxValues[q-1].value = sample;
            maxValues[q-1].point = Point(i,j);
        }
    }
}
```
A Point struct is just two ints - x and y.
This code basically does an insertion sort of the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encountered so far. This gives us a quick and easy bail-out: if sample <= maxValues[0].value, we don't do anything. The issue I'm having is the shuffling every time a new better value is found. It works its way down maxValues until it finds its spot, shuffling all the elements in maxValues to make room for itself.
I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm since this is called many many times and accounts for 60-80% of my overall cost.
I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is this operation is called repeatedly, and it can't be precomputed.
I just had a thought to pre-load the first M elements of maxValues which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.
Any help from optimization gurus would be much appreciated :)
A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010 but test it yourself to see whether any of them help in your case. Some of these changes may have no or even a negative effect depending on the actual usage/data and compiler optimizations.
Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
Changing g_Source[i][j] to g_Source[j][i] gains you a very little bit (not as much as I'd thought there would be).
Using the structure SourcePoint1 listed at the bottom gets me another few percent.
The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array which it can't do if you use a local variable.
Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
If possible try to keep the source data in arr[][] sorted, even if only partially. Ideally you'd want to generate maxValues[] at the same time the source data is created.
Look at how the data is created/stored/organized may give you patterns or information to reduce the amount of time to generate your maxValues[] array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.
Code for above:
```
struct SourcePoint1 {
    int x;
    int y;
    float value;
    int test; //Play with manual/compiler padding if needed
};
```
If you want to go into micro-optimizations at this point, a simple first step would be to get rid of the Points and just pack both dimensions into a single int. That reduces the amount of data you need to shift around, and gets SourcePoint down to a power-of-two size, which simplifies indexing into it.
Also, are you sure that keeping the list sorted is better than simply recomputing which element is the new lowest after each time you shift the old lowest out? (A sketch of that alternative follows.)
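That alternative might look like this sketch (my own code, not the OP's): keep the top M unsorted, remember where the minimum sits, and rescan the small buffer only when the minimum is evicted.

```
#include <cstddef>

// Unsorted top-M buffer: O(1) rejection, O(M) rescan only on replacement.
struct TopM {
    static const std::size_t MaxM = 20;   // mirrors the question's typical M
    float vals[MaxM];
    std::size_t m, minIdx;

    explicit TopM(std::size_t m_) : m(m_), minIdx(0) {
        for (std::size_t i = 0; i < m; ++i) vals[i] = -1e30f; // "very low" start
    }

    void add(float x) {
        if (x <= vals[minIdx]) return;        // cheap bail-out, as before
        vals[minIdx] = x;                     // overwrite the old minimum
        for (std::size_t i = 0; i < m; ++i)   // rescan for the new minimum
            if (vals[i] < vals[minIdx]) minIdx = i;
    }
};
```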
(Updated 22:37 UTC 2011-08-20)
I propose a binary min-heap of fixed size holding the M largest elements (but still in min-heap order!). It probably won't be faster in practice, as I think the OP's insertion sort probably has decent real-world performance (at least when the recommendations of the other posters in this thread are taken into account).
Look-up in the case of failure should be constant time: If the current element is less than the minimum element of the heap (containing the max M elements) we can reject it outright.
If it turns out that we have an element bigger than the current minimum of the heap (the Mth biggest element) we extract (discard) the previous min and insert the new element.
If the elements are needed in sorted order the heap can be sorted afterwards.
First attempt at a minimal C++ implementation:
```
template<unsigned size, typename T>
class m_heap {
private:
    T nodes[size];

    static unsigned parent(unsigned i) { return (i - 1) / 2; }
    static unsigned left(unsigned i)   { return i * 2 + 1; }  // 0-based children
    static unsigned right(unsigned i)  { return i * 2 + 2; }

    void bubble_down(unsigned i) {
        for (;;) {
            unsigned j = i;
            if (left(i) < size && nodes[left(i)] < nodes[i])
                j = left(i);
            if (right(i) < size && nodes[right(i)] < nodes[j])
                j = right(i);
            if (i != j) {
                swap(nodes[i], nodes[j]);
                i = j;
            } else {
                break;
            }
        }
    }

    void bubble_up(unsigned i) {
        while (i > 0 && nodes[i] < nodes[parent(i)]) {
            swap(nodes[parent(i)], nodes[i]);
            i = parent(i);
        }
    }

public:
    m_heap() {
        for (unsigned i = 0; i < size; i++) {
            nodes[i] = numeric_limits<T>::lowest(); // min() would be wrong for floating point
        }
    }

    void add(const T& x) {
        if (x < nodes[0]) {
            // reject outright
            return;
        }
        nodes[0] = x;       // replace the old minimum...
        bubble_down(0);     // ...and restore the heap property
    }
};
```
Small test/usage case:
```
#include <iostream>
#include <limits>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <assert.h>
#include <math.h>

using namespace std;

// INCLUDE TEMPLATED CLASS FROM ABOVE

typedef vector<float> vf;

bool compare(float a, float b) { return a > b; }

int main()
{
    int N = 2000;
    vf v;
    for (int i = 0; i < N; i++) v.push_back( rand()*1e6 / RAND_MAX);

    static const int M = 50;
    m_heap<M, float> h;
    for (int i = 0; i < N; i++) h.add( v[i] );

    sort(v.begin(), v.end(), compare);
    vf heap(h.get(), h.get() + M); // assume public in m_heap: T* get() { return nodes; }
    sort(heap.begin(), heap.end(), compare);

    cout << "Real\tFake" << endl;
    for (int i = 0; i < M; i++) {
        cout << v[i] << "\t" << heap[i] << endl;
        if (fabs(v[i] - heap[i]) > 1e-5) abort();
    }
}
```
You're looking for a priority queue:
```
template < class T, class Container = vector<T>,
           class Compare = less<typename Container::value_type> >
class priority_queue;
```
You'll need to figure out the best underlying container to use, and probably define a Compare function to deal with your Point type.
If you want to optimize it, you could run a queue on each row of your matrix in its own worker thread, then run an algorithm to pick the largest item of the queue fronts until you have your M elements.
A quick optimization would be to add a sentinel value to your maxValues array. If maxValues[M].value is equal to std::numeric_limits<float>::max(), then you can eliminate the q < M test from your while loop condition. A sketch follows.
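A sketch of that sentinel setup, reusing the struct names from the question (the surrounding declarations and the init function are mine):

```
#include <limits>

struct Point { int x, y; };
struct SourcePoint { Point point; float value; };

static const int M = 20;
SourcePoint maxValues[M + 1];   // one extra slot for the sentinel

void initSentinel() {
    // The shuffle loop `while (sample > maxValues[q].value)` is now
    // guaranteed to stop at q == M, so the `q < M` test can be dropped.
    maxValues[M].value = std::numeric_limits<float>::max();
}
```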
One idea would be to use the std::partial_sort algorithm on a plain one-dimensional sequence of references into your NxN array. You could probably also cache this sequence of references for subsequent calls. I don't know how well it performs, but it's worth a try - if it works well enough, you don't have as much "magic". In particular, you don't resort to micro-optimizations.
Consider this showcase:
```
#include <algorithm>
#include <iostream>
#include <vector>
#include <stddef.h>
#include <string.h>   // for memset

static const int M = 15;
static const int N = 20;

// Represents a reference to a sample of some two-dimensional array
class Sample
{
public:
    Sample( float *arr, size_t row, size_t col )
        : m_arr( arr ),
          m_row( row ),
          m_col( col )
    {
    }

    inline operator float() const {
        return m_arr[m_row * N + m_col];
    }

    bool operator<( const Sample &rhs ) const {
        return (float)rhs < (float)*this;   // sort descending, largest first
    }

    int row() const {
        return m_row;
    }

    int col() const {
        return m_col;
    }

private:
    float *m_arr;
    size_t m_row;
    size_t m_col;
};

int main()
{
    // Setup a demo array
    float arr[N][N];
    memset( arr, 0, sizeof( arr ) );

    // Put in some sample values
    arr[2][1] = 5.0;
    arr[9][11] = 2.0;
    arr[5][4] = 4.0;
    arr[15][7] = 3.0;
    arr[12][19] = 1.0;

    // Setup the sequence of references into this array; you could keep
    // a copy of this sequence around to reuse it later, I think.
    std::vector<Sample> samples;
    samples.reserve( N * N );
    for ( size_t row = 0; row < N; ++row ) {
        for ( size_t col = 0; col < N; ++col ) {
            samples.push_back( Sample( (float *)arr, row, col ) );
        }
    }

    // Let partial_sort find the M largest entries
    std::partial_sort( samples.begin(), samples.begin() + M, samples.end() );

    // Print out the row/column of the M largest entries.
    for ( std::vector<Sample>::size_type i = 0; i < M; ++i ) {
        std::cout << "#" << (i + 1) << " is " << (float)samples[i] << " at " << samples[i].row() << "/" << samples[i].col() << std::endl;
    }
}
```
First of all, you are marching through the array in the wrong order!
You always, always, always want to scan through memory linearly. That means the last index of your array needs to be changing fastest. So instead of this:
```
for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
```
Try this:
```
for (int i = 0; i < cols; i++) {
    for (int j = 0; j < rows; j++) {
        float sample = arr[i][j];
```
I predict this will make a bigger difference than any other single change.
Next, I would use a heap instead of a sorted array. The standard <algorithm> header already has push_heap and pop_heap functions to use a vector as a heap. (This will probably not help all that much, though, unless M is fairly large. For small M and a randomized array, you do not wind up doing all that many insertions on average... Something like O(log N) I believe.)
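For reference, a minimal sketch of that vector-as-heap pattern (illustrative values; std::greater makes it a min-heap, matching the "lowest of the top M" bookkeeping):

```
#include <algorithm>
#include <functional>
#include <vector>

int main() {
    std::vector<float> heap = {3.0f, 1.0f, 4.0f};
    std::make_heap(heap.begin(), heap.end(), std::greater<float>()); // min-heap

    // Insert: push_back, then push_heap sifts the new element into place.
    heap.push_back(0.5f);
    std::push_heap(heap.begin(), heap.end(), std::greater<float>());

    // Remove the minimum: pop_heap moves it to the back, then pop_back.
    std::pop_heap(heap.begin(), heap.end(), std::greater<float>());
    heap.pop_back();
}
```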
Next after that is to use SSE2. But that is peanuts compared to marching through memory in the right order.
You should be able to get nearly linear speedup with parallel processing.
With N CPUs, you can process a band of rows/N rows (and all columns) with each CPU, finding the top M entries in each band. And then do a selection sort to find the overall top M.
You could probably do that with SIMD as well (but here you'd divide up the task by interleaving columns instead of banding the rows). Don't try to make SIMD do your insertion sort faster, make it do more insertion sorts at once, which you combine at the end using a single very fast step.
Naturally you could do both multi-threading and SIMD, but on a problem which is only 30x30, that's not likely to be worthwhile.
I tried replacing float by double, and interestingly that gave me a speed improvement of about 20% (using VC++ 2008). That's a bit counterintuitive, but it seems modern processors or compilers are optimized for double value processing.
Use a linked list to store the best M values so far. You'll still have to iterate over it to find the right spot, but the insertion is O(1). It would probably even beat binary search plus insertion: O(N)+O(1) vs. O(lg(N))+O(N).
Interchange the fors, so you're not striding across N elements in memory at a time and thrashing the cache.
LE: Throwing in another idea that might work for uniformly distributed values (a sketch follows below).
Find the min and max in 3/2*O(N^2) comparisons.
Create anywhere from N to N^2 uniformly distributed buckets, preferably closer to N^2 than N.
For every element in the NxN matrix, place it in bucket[(int)((value-min)/range * nBuckets)], where range = max-min.
Finally, build a set starting from the highest bucket down to the lowest, adding elements from the next bucket while |current set| + |next bucket| <= M.
If you get M elements, you're done.
You'll likely get fewer elements than M, let's say P.
Apply the algorithm again to the remaining bucket and get the biggest M-P elements out of it.
If the elements are uniform and you use N^2 buckets, its complexity is about 3.5*(N^2), versus your current solution which is about O(N^2)*ln(M).
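Here is a rough sketch of the bucket idea (my own code; it assumes a non-empty input with at least M elements, and it simply sorts and trims the final partial bucket instead of recursing into it):

```
#include <algorithm>
#include <functional>
#include <vector>

std::vector<float> topM(const std::vector<float>& v, std::size_t M) {
    auto mm = std::minmax_element(v.begin(), v.end());
    float mn = *mm.first, range = *mm.second - mn;
    if (range == 0.0f)                      // all values identical
        return std::vector<float>(v.begin(), v.begin() + M);

    std::size_t nB = v.size();              // ~N^2 buckets for an NxN matrix
    std::vector<std::vector<float>> buckets(nB + 1);
    for (float x : v)
        buckets[(std::size_t)((x - mn) / range * nB)].push_back(x);

    std::vector<float> out;
    for (std::size_t b = nB + 1; b-- > 0 && out.size() < M; )
        out.insert(out.end(), buckets[b].begin(), buckets[b].end());

    std::sort(out.begin(), out.end(), std::greater<float>());
    out.resize(std::min(out.size(), M));    // trim the last, partial bucket
    return out;
}
```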