Fast Algorithm for finding largest values in 2d array - c++

I have a 2D array (an image actually) that is size N x N. I need to find the indices of the M largest values in the array ( M << N x N) . Linearized index or the 2D coords are both fine. The array must remain intact (since it's an image). I can make a copy for scratch, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (i.e. O(N^2) is fine). Does anyone have a good algorithm for doing this as efficiently as possible?

Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less well known than sorting algorithms, but they are nonetheless useful.
You can't do better than O(N^2) (in N) here, since nothing rules out having to visit every element of the array.
A good approach is to keep a priority queue of the M largest elements seen so far. This makes the whole thing O(N x N x log M).
You traverse the array, enqueuing (element, index) pairs as you go. The queue keeps its elements sorted by the first component.
Once the queue has M elements, instead of enqueuing you now:
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
If M is large (comparable to N x N), sorting a copy of the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the direction of data locality, i.e. make sure that you read memory contiguously.
Also, this is trivially parallelizable; the only non-parallel part is merging the per-thread queues when the sub-processes join.
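A minimal sketch of this approach (the float pixel type, the row-major layout and the name findLargestM are my own assumptions, not from the question):

#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the linearized indices (row * N + col) of the M largest values.
// A min-heap of size M is kept while scanning, so the pass is O(N*N*log M).
std::vector<std::size_t> findLargestM(const std::vector<float>& image,
                                      std::size_t N, std::size_t M)
{
    using Entry = std::pair<float, std::size_t>;           // (value, linear index)
    std::priority_queue<Entry, std::vector<Entry>,
                        std::greater<Entry>> heap;         // top() is the smallest kept value

    for (std::size_t i = 0; i < N * N; ++i) {              // row-major scan: contiguous reads
        if (heap.size() < M) {
            heap.push({image[i], i});
        } else if (image[i] > heap.top().first) {          // beats the current minimum of the M kept
            heap.pop();
            heap.push({image[i], i});
        }
    }

    std::vector<std::size_t> indices;
    while (!heap.empty()) {
        indices.push_back(heap.top().second);
        heap.pop();
    }
    return indices;                                        // ordered from smallest to largest value
}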

You could copy the array into a single-dimensional array of tuples (value, original X, original Y) and build a basic heap out of it in O(n) time, provided you implement the heap as an array.
You could then retrieve the M largest tuples in O(M lg n) time and reference their original x and y from the tuple.
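For example (a sketch; the int pixel type and the Cell struct are my own choices):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Cell { int value; int x; int y; };

// Build a max-heap over every pixel in O(n), then pop the M largest in O(M log n).
std::vector<Cell> largestM(const std::vector<std::vector<int>>& img, std::size_t M)
{
    std::vector<Cell> heap;
    for (int y = 0; y < (int)img.size(); ++y)
        for (int x = 0; x < (int)img[y].size(); ++x)
            heap.push_back({img[y][x], x, y});

    auto byValue = [](const Cell& a, const Cell& b) { return a.value < b.value; };
    std::make_heap(heap.begin(), heap.end(), byValue);    // O(n)

    std::vector<Cell> result;
    for (std::size_t i = 0; i < M && !heap.empty(); ++i) {
        std::pop_heap(heap.begin(), heap.end(), byValue); // moves the current max to the back
        result.push_back(heap.back());
        heap.pop_back();
    }
    return result;                                        // the M largest, in descending order
}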

If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is how big is your M? If it is small, you can store results (i.e. structs with 2D indexes and values) in a simple array or a vector. That'll minimize heap operations but when you find a larger value than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure like a binary tree (std::set) or a sorted std::deque. std::set will reduce the number of times elements must be shifted in memory, while a std::deque will do some shifting but will significantly reduce the number of times you have to go to the heap, which may give you better performance.

Your problem doesn't use the 2 dimensions in any interesting way, so it is easier to consider the equivalent problem on a 1D array.
There are 2 main ways to solve this problem:
Maintain a set of the M largest elements, and iterate through the array. (Using a heap allows you to do this efficiently.)
This is simple and is probably better in your case (M << N)
Use selection, (the following algorithm is an adaptation of quicksort):
Create an auxiliary array, containing the indexes [1..N].
Choose an arbitrary index (and the corresponding value), and partition the index array so that indexes of smaller elements go to the left and indexes of larger elements go to the right.
Repeat the process, binary-search style, until you have narrowed it down to the M largest elements.
This is good for cases with large M. If you want to avoid the worst-case issues (the same ones quicksort has), look at more advanced algorithms, like median-of-medians selection.
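In practice the standard library already provides the selection step; here is a sketch using std::nth_element on an index array (it assumes a linearized image and M <= image.size(); the names are mine):

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of the M largest values, found by selection on an auxiliary index array.
// The image itself is never modified; expected time is linear in the image size.
std::vector<std::size_t> selectLargestM(const std::vector<double>& image, std::size_t M)
{
    std::vector<std::size_t> idx(image.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});        // 0, 1, 2, ...

    auto larger = [&](std::size_t a, std::size_t b) { return image[a] > image[b]; };
    std::nth_element(idx.begin(), idx.begin() + M, idx.end(), larger);

    idx.resize(M);          // the first M indices now refer to the M largest values
    return idx;
}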

How many times do you search for the largest value from the array?
If you only search once, then just scan through it, keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).

Related

Optimal data structure (in C++) for random access and looping through elements

I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand elements, let's say between 500 and 3000 elements). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, and assign 1 to elements with property X and zero otherwise, this array of N elements will have n ones and N-n zeros (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls that is somewhat stable, so the total number "n" fluctuates a bit, but is reasonably stable in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, indices of the array where value of the array is 1), to perform some task on them.
One simple and slow way to achieve this is to simply loop through the array and, if index k has value 1, perform the task; but this is kinda slow because the vast majority of the elements have value 0. The solution would be to put all the 1s into a different structure (with n elements) and then loop through that structure, instead of looping through all N elements. The question is what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no order there of any sort (the time when an element flipped from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread), I do not need to loop in any particular order (in other words, I just need to get them all, but it's not relevant in which order).
Any suggestions what would be the optimal structure for this? "std::map" comes to mind, but since the keys of std::map are sorted (and I don't need that feature), the question is whether there is anything faster?
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with "N" elements, there is a continuously changing subset "s" of "n" elements (with n much smaller than N), and I need to loop through that set "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through them. So while suggestions like having 2 arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements to an array would be prohibitively slow. It sounds like some hash-based approach like std::unordered_set would work reasonably fast on both the iteration and the addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower the iteration through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases, I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest way.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range as iterators are normally used, you should reverse the order of the operations above: swap, then increment/decrement. If you keep an index instead of an iterator then the index does double duty as the count of the number of X.
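A single-threaded sketch of that scheme (the element type and names are mine; locating an element's current position and all locking are left out):

#include <cstddef>
#include <utility>
#include <vector>

// Elements with property X are kept in items[0 .. xCount); the rest follow.
struct FlaggedPrefix {
    std::vector<int> items;
    std::size_t xCount = 0;          // doubles as the number of X elements

    // pos is the element's current position in items.
    void setX(std::size_t pos) {
        if (pos >= xCount) {                      // not yet in the X range
            std::swap(items[pos], items[xCount]); // swap, then grow the range
            ++xCount;
        }
    }
    void unsetX(std::size_t pos) {
        if (pos < xCount) {                       // currently in the X range
            --xCount;                             // shrink the range, then swap
            std::swap(items[pos], items[xCount]);
        }
    }
};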
N=3000 isn't really much. If you use a single bit for each of them, you have a structure smaller than 400 bytes. You can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: if you just allocate a pointer for each element on a 64-bit architecture, you'll use at least 8*50 = 400 bytes, more than the whole bitset.
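A minimal illustration of the bitset idea (3000 matches the upper end of the question's range):

#include <bitset>
#include <cstddef>

std::bitset<3000> hasX;                   // one bit per element, under 400 bytes in total

void runTasks() {
    hasX.set(42);                         // element 42 gains property X
    hasX.reset(17);                       // element 17 loses it
    for (std::size_t i = 0; i < hasX.size(); ++i)
        if (hasX.test(i)) {
            // perform the task on element i
        }
}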
@geza: perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality, I don't care about the others, so I really need one array. Adding an element is obviously simple if I can just add it to the end of the array; now, correct me if I'm wrong here, but finding an element in that array is an O(n) operation (since the array is unsorted), and then removing it from the array again requires shifting all the following elements by one place, so this on average requires n/2 operations. If I use a linked list instead of a vector, then deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest at adding/deleting elements, since it's O(1) to find an element, but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iteration through the keys of std::unordered_map is slower than iteration through the keys of std::map, but it's not quantified in any way just how slow "slower" is, and how fast "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution, I'm interested in one for small "n". So if for example I have two solutions, one that's k_1*log(n) and a second that's k_2*n^2, the first one might be faster in principle (and for large n), but if k_1 >> k_2 (let's say for example k_1 = 1000, k_2 = 2 and n = 20), the second one can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2). So even if addition/deletion in std::unordered_map might be done in constant O(1) time, for small "n" it still matters whether that constant time is 1 nanosecond or 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worth it only if the number of elements increases at least tenfold) might be keeping a double index:
#include <algorithm>
#include <cstddef>
#include <vector>

class didx {
    // v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;
    std::vector<ptrdiff_t> flagged;
public:
    didx(size_t size) : indexes(size) {}

    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end() { return flagged.end(); }

    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }

    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // in "flagged" we swap the last element with the element at the index to be removed,
            // and update "indexes" accordingly
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }

    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};

Merging Two Sorted Arrays with O(log(n+m)) Worst Case

What kind of algorithm can I use to merge two sorted arrays into one sorted array with worst-case time complexity of O(log(m+n)) where n, m are the length of the arrays? I have very little experience with algorithms, but I checked out merge-sort and it seems that the time-complexity for the merging step is O(n). Is there a different approach to merge in O(log(n))?
Edit: I hadn't considered initially, but maybe it's not possible to merge two sorted arrays in O(log(n))? The actual goal is to find the median of two sorted arrays. Is there a way to do this without merging them?
The only idea I've had was that I read that merging two binomial heaps is O(log(n)), but turning an array into a binomial heap is O(n), I think, so that won't work.
Edit2: I'm going to post a new question because I've realized that merging will never work fast enough. I think instead I need to perform a binary search on each array to find the median in log(n).
I don't think there is an algorithm that would merge two arrays in O(log(n+m)) time.
And it makes sense when you think about it. If you're trying to create a new sorted array of n+m elements, you will need to do at least n+m copies. There is no way around that.
I think the best way would be to iterate through both arrays simultaneously and, at each step, compare the two current elements. Whichever is smaller (for ascending order) gets copied into the output array, and you increment the indexing pointer for that array. If the two elements are equal, you can add them both to the output array and increment both pointers.
Continue until one of the pointers reaches the end of its array, then copy in the rest of the other array.
That should be O(m+n)
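In code, that two-pointer merge looks roughly like this (ascending order, int elements assumed; the standard library's std::merge does the same thing):

#include <cstddef>
#include <vector>

// Merge two ascending arrays into one ascending array in O(n + m).
std::vector<int> mergeSorted(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);   // copy the smaller element, advance that pointer
    while (i < a.size()) out.push_back(a[i++]);          // copy whatever tail remains
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}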
Regarding your edit, there is a way to find the median of two separate arrays in log(n + m) time.
You can first find the median of the two sorted arrays (the middle element) and compare them. If they are equal, then that is the median. If the first's median is greater than the second's you know the median has to be in either the first half of the first array or the second half of the second array and vice versa if the first's median is less than the second's.
This method cuts your search space in half each iteration and is thus O(log(n + m)).
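A sketch of that idea, reformulated as "find the k-th smallest element of the union" and then taking the middle element(s); the int element type and the function names are my own choices, and it assumes the arrays are not both empty:

#include <algorithm>
#include <vector>

// k-th smallest element (1-based) of the union of two sorted vectors.
// Each step discards about k/2 candidates, so the search is O(log(n + m)).
int kthSmallest(const std::vector<int>& a, const std::vector<int>& b, int k)
{
    int i = 0, j = 0;                                     // current offsets into a and b
    while (true) {
        if (i == (int)a.size()) return b[j + k - 1];      // a exhausted
        if (j == (int)b.size()) return a[i + k - 1];      // b exhausted
        if (k == 1) return std::min(a[i], b[j]);
        int half = k / 2;
        int ia = std::min(i + half, (int)a.size()) - 1;   // candidate position in a
        int ib = std::min(j + half, (int)b.size()) - 1;   // candidate position in b
        if (a[ia] <= b[ib]) {
            k -= ia - i + 1;                              // a[i..ia] cannot hold the k-th element
            i = ia + 1;
        } else {
            k -= ib - j + 1;                              // b[j..ib] cannot hold the k-th element
            j = ib + 1;
        }
    }
}

// Median of the two arrays together, without merging them (total size must be > 0).
double medianOfTwo(const std::vector<int>& a, const std::vector<int>& b)
{
    int total = (int)(a.size() + b.size());
    if (total % 2 == 1)
        return kthSmallest(a, b, total / 2 + 1);
    return (kthSmallest(a, b, total / 2) + kthSmallest(a, b, total / 2 + 1)) / 2.0;
}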
You're probably thinking of The Selection Algorithm.
For a sorted data structure, finding the median is O(1). For an unsorted data structure (or a data structure where the data is sorted into two logical partitions) the runtime is O(n).
You could probably pull it off with a massively parallel reduction algorithm, but I think that's cheating in Runtime Analysis terms.
So I don't believe there's an algorithm that reduces it below O(n) (or, in your case, O(n+m))
You need to merge the arrays, so, no matter what, you need to traverse the 2 arrays at least once; the complexity can't be less than O(m+n).

how to select least N elements with limited space?

The problem:
A function f returns elements one at a time in an unknown order. I want to select the least N elements. Function f is called many times (I'm searching through a very complex search space) and I don't have enough memory to store every output element for the future sorting.
The obvious solution:
Keep a vector of N elements in memory and on each f() search for the minimum and maximum and possibly replace something. This would probably work well for very small N. I'm looking for a more general solution, though.
My solution so far:
I thought about using a priority_queue to store, let's say, 2N values, and removing the upper half after every 2N steps.
Pseudocode:
while (search goes on)
    for (i = 0..2N)
        el = f()
        push el to the priority queue
    remove the N greatest elements from the priority queue
select the N least elements from the priority queue
I think this should work; however, I don't find it elegant at all. Maybe there is already some data structure that handles this problem. It would be really nice just to modify the priority_queue so that it throws away the elements that don't fit into the saved range.
Could you recommend me an existing std data structure for C++ or encourage me to implement the solution I suggested above? Or maybe there is some great and elegant trick that I can't think of.
You want to find the least n elements out of K total elements obtained by calling a function. Each time you call f() you get one element, and you want to keep the least n elements seen so far without storing all K elements, since K is too big.
You can use a heap or priority_queue to store the least n found so far. Just add the item returned by f() to the pq and pop the greatest element whenever its size reaches n+1.
Total complexity would be O(K*log(n)) and the space needed would be O(n) (ignoring some extra space required by the pq).
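A minimal sketch of that (the double element type and the name keepLeast are my own choices):

#include <cstddef>
#include <queue>

// Max-heap that always holds the n least values seen so far.
// Feed every value returned by f() through keepLeast().
void keepLeast(std::priority_queue<double>& maxHeap, std::size_t n, double value)
{
    if (maxHeap.size() < n) {
        maxHeap.push(value);
    } else if (value < maxHeap.top()) {      // smaller than the worst element we are keeping
        maxHeap.pop();                       // drop the current greatest
        maxHeap.push(value);
    }
}

When the search ends, popping the heap yields the n least elements, largest first.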
An alternate option would be to use an array. Depending on the maximum number of elements allowed compared to N, there are two options I can think of:
Make the array as big as possible and unsorted, periodically retrieve the smallest elements.
Have an array of size N, sorted with max elements on the end.
Option 1 would have you sort the array with O(n log n) time every time you fill up the array. That would happen for each n - N elements (except the first time), yielding (k - n) / (n - N) sorts, resulting in O((k - n) / (n - N) n log n) time complexity for k total elements, n elements in the array, N elements to be selected. So for n = 2N, you get O(2*(k - 2N) log 2N) time complexity if I'm not mistaken.
Option 2 would have you keep the array (sized N) sorted with maximum elements at the end. Each time you get an element, you can quickly (O(1)) see if it is smaller than the last one. Using binary search, you can find the right spot for the element in O(log N) time. However, you now need to move all the elements after the new element one place right. That takes O(N) time. So you end up with theoretical O(k*N) time complexity. Given that computers like working with homogenous data accesses however (caches and stuff), this might be faster than heap, even if it is array-backed.
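A sketch of option 2, keeping the N least values in an ascending std::vector (the double element type and the name offer are my choices):

#include <algorithm>
#include <cstddef>
#include <vector>

// Keep the N least values seen so far in an ascending vector of size at most N.
void offer(std::vector<double>& best, std::size_t N, double value)
{
    if (best.size() == N && value >= best.back()) return;            // O(1) rejection of most inputs
    auto pos = std::upper_bound(best.begin(), best.end(), value);    // O(log N) search
    best.insert(pos, value);                                         // O(N) shift of the tail
    if (best.size() > N) best.pop_back();                            // drop the current maximum
}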
If your elements are big, you might be better off storing a structure of { comparison_value; actual_element_pointer }, even if you are using a heap (unless it is list-backed).

What is the fastest data structure to search and update list of integer values?

I have to maintain a list of unordered integers, where the number of integers is unknown; it may increase or decrease over time. I need to update this list of integers frequently. I have tried using a vector, but it is really slow. An array appears to be faster, but since the length of the list is not fixed, it takes a significant amount of time to resize it. Please suggest any other option.
Use a hash table, if the order of the values is unimportant. Time is O(1). I'm pretty sure you'll find an implementation in the standard library.
Failing that, a splay tree is extremely fast, especially if you want to keep the list ordered: amortized cost of O(log n) per operation, with a very low constant factor. I think the C++ stdlib map is something like this.
Know thy data structures.
If you are interested in dynamic growth of the array's size, you can do this:

int current = 0;
int **x = (int**)malloc(1024 * sizeof(int*));              // room for up to 1024 chunks
x[current] = (int*)malloc(RequiredLength * sizeof(int));   // first chunk

So add elements to the array, and when the elements in x[current] are filled,
you can add space for more elements by doing

x[++current] = (int*)malloc(RequiredLength * sizeof(int));

Doing this you can accommodate RequiredLength more elements. You can repeat this up to
1024 times, which means 1024*RequiredLength elements can be accommodated; this gives you
the chance to grow the array whenever you want.
You can always access the n-th element as x[n / RequiredLength][n % RequiredLength];
Considering your comments, it looks like std::set or std::unordered_set fits your needs better than std::vector.
If sequential data structures fail to meet your requirements, you could try looking at trees (binary, AVL, m-way, red-black, etc.). I would suggest you try an AVL tree, since it yields a balanced or near-balanced binary search tree, which keeps your operations fast. For more on AVL trees: http://en.wikipedia.org/wiki/AVL_tree
Well, a deque has no resize cost, but if it's unordered its search time is linear, and its delete and insert operations in the middle are even worse than a vector's.
If you don't need to search by the value of the number, a hash map or map may be your choice: no resize cost. Set the key of the map to the number's index and the value to the number's value; search and insert are then better than linear.
std::list is definitely created for such problems: adding and deleting elements in a list does not necessitate memory re-allocations as in a vector. However, due to the non-contiguous memory allocation of the list, searching for elements may prove to be a painful experience, of course; but if you do not search its entries frequently, it can be used.

How to efficiently *nearly* sort a list?

I have a list of items; I want to sort them, but I want a small element of randomness so they are not strictly in order, only on average ordered.
How can I do this most efficiently?
I don't mind if the quality of the randomness is not especially good, e.g. if it is simply based on the chance ordering of the input, e.g. an early-terminated incomplete sort.
The context is implementing a nearly-greedy search by introducing a very slight element of inexactness; this is in a tight loop, so the speed of sorting and of calling random() both matter.
My current code is to do a std::sort (this being C++) and then do a very short shuffle just in the early part of the array:
for (int i = 0; i < 3; i++) // I know I have more than 6 elements
    std::swap(order[i], order[i + rand() % 3]);
Use the first two passes of JSort: build the heap twice, but do not perform the insertion sort. If the element of randomness is not small enough, repeat.
There is an approach that (unlike incomplete JSort) allows finer control over the resulting randomness and has a time complexity that depends on the randomness (the more random the result needs to be, the lower the time complexity): use heapsort with a soft heap. For a detailed description of the soft heap, see pdf 1 or pdf 2.
You could use a standard sort algorithm (is a standard library available?) and pass a predicate that "knows", given two elements, which is less than the other, or whether they are equal (returning -1, 0 or 1). In the predicate, then introduce a rare (configurable) case where the answer is random, by using a random number:
pseudocode:
if random(1000) == 0 then
    return random(2) - 1   <-- -1, 0, 1 randomly chosen
Here we have a 1/1000 chance to "scramble" two elements, but that number strictly depends on the size of the container you are sorting.
Another thing to add, in that rare case, could be to exclude the "right" answer, because returning it would not scramble the result!
Edit:
if random(100 * container_size) == 0 then   <-- here I consider the container size
{
    if element_1 < element_2
        return random(1);               <-- do not return the "correct" value of -1
    else if element_1 > element_2
        return random(1) - 1;           <-- do not return the "correct" value of 1
    else
        return random(1) == 0 ? -1 : 1; <-- do not return 0
}
In my pseudocode:
random(x) = y where 0 <= y <= x
One possibility that requires a bit more space, but guarantees that existing sort algorithms can be used without modification, is to create a copy of the sort value(s), modify those copies in some fashion prior to sorting, and then use the modified value(s) for the sort.
For example, if the data to be sorted is a simple character field Name[N] then add a field (assuming data is in a structure or class) called NameMod[N]. Fill in the NameMod with a copy of Name but add some randomization. Then 3% of the time (or some appropriate amount) change the first character of the name (e.g., change it by +/- one or two characters). And then 10% of the time change the second character +/- a few characters.
Then run it through whatever sort algorithm you prefer. The benefit is that you could easily change those percentages and randomness. And the sort algorithm will still work (e.g., it would not have problems with the compare function returning inconsistent results).
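A rough sketch of that trick with a numeric key instead of a character field (all names and the uniform jitter are my own choices): copy the keys, perturb the copies, sort by the perturbed copy; the original data is never touched and the comparison stays consistent.

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

struct Item { double key; int payload; };

// Sort items by a perturbed copy of their key; the original keys are left untouched.
void nearlySort(std::vector<Item>& items, double noise)
{
    std::vector<std::pair<double, std::size_t>> tmp;   // (perturbed key, original position)
    tmp.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        double jitter = noise * (2.0 * std::rand() / RAND_MAX - 1.0);  // roughly uniform in [-noise, +noise]
        tmp.push_back({items[i].key + jitter, i});
    }
    std::sort(tmp.begin(), tmp.end());                 // ordinary sort, consistent comparisons

    std::vector<Item> reordered;
    reordered.reserve(items.size());
    for (const auto& p : tmp) reordered.push_back(items[p.second]);
    items.swap(reordered);
}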
If you are sure that each element is at most k positions away from where it should be, you can reduce quicksort's N log(N) sorting time complexity down to N log(k)....
edit
More specifically, you would split the list into N/k buckets of k consecutive elements each.
Quicksorting one bucket takes O(k log k), and there are N/k buckets, so sorting all of them takes O(N log k) in total.
This can be useful because you can run sorting for each bucket in parallel, reducing total running time.
This works if you are sure that any element in the list is at most k indices away from its correct position after the sorting,
but I do not think you meant any such restriction.
Split the list into two equally-sized parts. Sort each part separately, using any usual algorithm. Then merge these parts. Perform some merge iterations as usual, comparing the merged elements. For other merge iterations, do not compare the elements, but instead select the element from the same part as in the previous step. It is not necessary to use an RNG to decide how to treat each element; just ignore the sorting order for every N-th element.
Another variant of this approach nearly sorts an array nearly in-place. Split the array into two parts with odd/even indexes. Sort them. (It is even possible to use a standard C++ algorithm with an appropriately modified iterator, like boost::permutation_iterator.) Reserve some limited space at the end of the array. Merge the parts, starting from the end. If the merged part is about to overwrite one of the non-merged elements, just select that element; otherwise select the element in sorted order. The level of randomness is determined by the amount of reserved space.
Assuming you want the array sorted in ascending order, I would do the following:
for M iterations
    pick a random index i
    pick a random index k
    if (i < k) != (array[i] < array[k]) then swap(array[i], array[k])
M controls the "sortedness" of the array - as M increases the array becomes more and more sorted. I would say a reasonable value for M is n^2 where n is the length of the array. If it is too slow to pick random elements then you can precompute their indices beforehand. If the method is still too slow then you can always decrease M at the cost of getting a poorer sort.
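Written out in C++, that loop might look like this (rand() is used only to match the question's own code; the function name is mine):

#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// The larger M is, the closer the array gets to fully sorted (ascending) order.
void roughSort(std::vector<int>& array, long M)
{
    const std::size_t n = array.size();
    if (n < 2) return;
    for (long it = 0; it < M; ++it) {
        std::size_t i = std::rand() % n;
        std::size_t k = std::rand() % n;
        if ((i < k) != (array[i] < array[k]))   // the pair is out of order relative to its positions
            std::swap(array[i], array[k]);
    }
}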
Take a small random subset of the data and sort it. You can use this as a map to provide an estimate of where every element should appear in the final nearly-sorted list. You can scan through the full list now and move/swap elements that are not in a good position.
This is basically O(n), assuming the small initial sorting of the subset doesn't take a long time. Hopefully you can build the map such that the estimate can be extracted quickly.
Bubblesort to the rescue!
For an unsorted array, you could pick a few random elements and bubble them up or down (maybe by rotation, which is a bit more efficient). It will be hard to control the amount of (dis)order: even if you pick all N elements, you are not sure that the whole array will be sorted, because elements are moved and you cannot ensure that you touched every element only once.
BTW: this kind of problem tends to occur in game-playing engines, where the list of candidate moves is kept more-or-less sorted (because of weighted sampling), sorting after each iteration is too expensive, and only one or a few elements are expected to move.