Find out in linear time whether there is a pair in a sorted vector that adds up to a certain value - C++

Given an std::vector of distinct elements sorted in ascending order, I want to develop an algorithm that determines whether there are two elements in the collection whose sum is a certain value, sum.
I've tried two different approaches with their respective trade-offs:
I can scan the whole vector and, for each element, apply binary search (std::lower_bound) on the vector to find an element equal to the difference between sum and the current element. This is an O(n log n) time solution that requires no additional space.
I can traverse the whole vector and populate an std::unordered_set. Then, I scan the vector and, for each element, I look up in the std::unordered_set for the difference between sum and the current element. Since searching on a hash table runs in constant time on average, this solution runs in linear time. However, this solution requires additional linear space because of the std::unordered_set data structure.
Nevertheless, I'm looking for a solution that runs in linear time and requires no additional linear space. Any ideas? It seems that I'm forced to trade speed for space.

As the std::vector is already sorted and you can calculate the sum of a pair on the fly, you can achieve a linear time solution in the size of the vector with O(1) space.
The following is an STL-like implementation that requires no additional space and runs in linear time:
template<typename BidirIt, typename T>
bool has_pair_sum(BidirIt first, BidirIt last, T sum) {
    if (first == last)
        return false; // empty range
    for (--last; first != last;) {
        if ((*first + *last) == sum)
            return true; // pair found
        if ((*first + *last) > sum)
            --last; // decrease pair sum
        else // (*first + *last) < sum (trichotomy)
            ++first; // increase pair sum
    }
    return false;
}
The idea is to traverse the vector from both ends – front and back – in opposite directions at the same time, computing the sum of the current pair of elements as we go.
At the very beginning, the pair consists of the elements with the lowest and the highest values. If the resulting sum is lower than sum, then advance first – the iterator pointing at the left end. Otherwise, move last – the iterator pointing at the right end – backward. This way, the resulting sum progressively approaches sum. If both iterators end up pointing at the same element and no pair whose sum is equal to sum has been found, then there is no such pair.
#include <iostream>
#include <vector>

auto main() -> int {
    std::vector<int> vec{1, 3, 4, 7, 11, 13, 17};
    std::cout << has_pair_sum(vec.begin(), vec.end(), 2) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 7) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 19) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 30) << '\n';
}
The output is:
0 1 0 1
Thanks to the generic nature of the function template has_pair_sum() and since it just requires bidirectional iterators, this solution works with std::list as well:
std::list<int> lst{1, 3, 4, 7, 11, 13, 17};
has_pair_sum(lst.begin(), lst.end(), 2);

I had the same idea as the one in the answer of 眠りネロク, but with a somewhat more readable implementation.
bool has_pair_sum(std::vector<int> v, int sum){
    if(v.empty())
        return false;
    std::vector<int>::iterator p1 = v.begin();
    std::vector<int>::iterator p2 = v.end(); // points one past the last element
    p2--; // now it points to the last element
    while(p1 != p2){
        if(*p1 + *p2 == sum)
            return true;
        else if(*p1 + *p2 < sum){
            p1++;
        }else{
            p2--;
        }
    }
    return false;
}

Since we are already given a sorted array, we can use the two-pointer approach. Keep a left pointer at the start of the array and a right pointer at the end. In each iteration, check whether the values at the left and right indices add up to the given sum; if so, return. Otherwise, decide how to shrink the window. If the temporary sum is greater than the given sum, decrement the right pointer: moving the left pointer up could only keep the temporary sum the same or increase it, never decrease it. Similarly, if the temporary sum is less than the given sum, there is no point in moving the right pointer down, since that can only keep the sum the same or decrease it further, so increment the left pointer instead. Repeat this process until we find the required sum or the two pointers cross.
Below is the code for demonstration; let me know if something is not clear.
bool pairSumExists(std::vector<int>& a, int sum){
    if(a.empty())
        return false;
    int left_pointer = 0, right_pointer = static_cast<int>(a.size()) - 1;
    while(left_pointer < right_pointer){
        if(a[left_pointer] + a[right_pointer] == sum){
            return true;
        }
        if(a[left_pointer] + a[right_pointer] > sum){
            --right_pointer;
        }
        else { // a[left_pointer] + a[right_pointer] < sum
            ++left_pointer;
        }
    }
    return false;
}


Why do these two variations on the "quick sorting" algorithm differ so much in performance?

I initially thought up some sorting algorithm to code in C++ for practice. People told me it's very inefficient (indeed, sorting a few hundred numbers took ~10 seconds). The algorithm was to remember the first element ("pivot") in a vector, then parse through every other element, moving each element to the left of the pivot if it is smaller, or not do anything otherwise. This would split the list into two smaller lists to sort; the rest is done through recursion.
So now I know that dividing the list into two and doing recursions like this is essentially what quicksorting does (although there are a lot of variations on how to do the partitioning). I didn't understand why my original code was so inefficient, so I wrote up a new one. Someone had mentioned that it is because of the insert() and erase() functions, so I made sure to not use those, but instead used swap().
Old (slow):
template <typename T>
void sort(vector<T>& vec){
    int size = vec.size();
    if (size <= 1){ //this is the most basic case
        return;
    }
    T pivot = vec[0];
    int index = 0; //to help split the list later
    for (int i = 1; i < size; ++i){ //moving (or not moving) the elements
        if (vec[i] < pivot){
            vec.insert(vec.begin(), vec[i]);
            vec.erase(vec.begin() + i + 1);
            ++index;
        }
    }
    if (index == 0){ //in case the 0th element is the smallest
        vec.erase(vec.begin());
        sort(vec);
        vec.insert(vec.begin(), pivot);
    }
    else if(index == size - 1){ //in case the 0th element is the largest
        vec.pop_back();
        sort(vec);
        vec.push_back(pivot);
    }
    //here is the main recursive portion
    vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
    sort(left);
    vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
    sort(right);
    //concatenating the sorted lists together
    left.push_back(pivot);
    left.insert(left.end(), right.begin(), right.end());
    vec = left;
}
New (fast):
template <typename T>
void quickSort(vector<T>& vec, const int& left, const int& right){
    if (left >= right){ //basic case
        return;
    }
    T pivot = vec[left];
    int j = left; //j will be the final index of the pivot before the next iteration
    for (int i = left + 1; i <= right; ++i){
        if (vec[i] < pivot){
            swap(vec[i], vec[j]); //swapping the pivot and lesser element
            ++j;
            swap(vec[i], vec[j]); //sending the pivot next to its original spot so it doesn't go to the right of any greater element
        }
    }
    //recursion
    quickSort(vec, left, j - 1);
    quickSort(vec, j + 1, right);
}
The difference in performance is insane; the newer version can sort through tens of thousands of numbers in less than a second, while the first one can't do that with 100 numbers. What are erase() and insert() doing to slow it down, exactly? Is it really the erase() and insert() causing the bottleneck, or is there something else I am missing?
First of all, yes, insert() and erase() will be much slower than swap().
insert() will, in the best case, require every element after the spot where you're inserting into the vector to be moved to the next spot in the vector. Think about what happens if you shove yourself into the middle of a crowded line of people - everyone behind you will have to take one step back to make room for you. In the worst case, because inserting into the vector increases the vector's size, the vector may run out of space in its current memory location, leading to the entire vector (element by element) being copied into a new space where it has room to accommodate the newly inserted item. When an element in the middle of a vector is erase()'d, every element after it must be copied and moved up one space; just like how everyone behind you in a line would take one step up if you left said line. In comparison, swap() only moves the two elements being swapped.
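If you want to see the difference concretely, here is a minimal timing sketch (not a rigorous benchmark; the numbers will vary by machine, compiler, and flags):
#include <chrono>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    const int n = 50000;
    std::vector<int> v;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        v.insert(v.begin(), i);    // O(n) each: shifts every existing element
    auto t1 = std::chrono::steady_clock::now();

    for (int i = 0; i + 1 < n; ++i)
        std::swap(v[i], v[i + 1]); // O(1) each: touches exactly two elements
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::cout << "insert at front: "
              << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
              << "adjacent swaps:  "
              << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
}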
In addition to that, I also noticed another major efficiency improvement between the two code samples:
In the first code sample, you have:
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
which uses the range constructor of C++ vectors. Every time the code reaches this point, when it creates left and right, it is traversing the entirety of vec and copying each element one-by-one into the two new vectors.
In the newer, faster code, none of the elements are ever copied into a new vector; the entire algorithm takes place in the exact memory space in which the original numbers existed.
Vectors are backed by arrays, so inserting or deleting elements anywhere other than the end position is done by relocating all the elements that come after that position.

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for(int i = 0; i < n; ++i)
    {
        for(int j = i + 1; j < n; ++j) // j > i, so an element isn't paired with itself
        {
            if(arr[i] == -arr[j])
                return true;
        }
    }
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) Sort the array (an algorithm with worst-case complexity n log n).
2) Create two new arrays, filled with the negative and positive numbers from the original array (so far we've got n log n + n + n = O(n log n)).
3) ... compare the two new arrays somehow to determine whether they contain opposite numbers.
I'm not quite sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
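In code, that sort + two-pointer check might look like this (a sketch; whether a single 0 counts as its own opposite is left open, as in the question):
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n log n) for the sort, O(n) for the scan, O(1) extra space.
bool hasOppositePair(std::vector<int>& a) {
    std::sort(a.begin(), a.end());
    if (a.empty()) return false;
    std::size_t lo = 0, hi = a.size() - 1;
    while (lo < hi) {
        const long long s = static_cast<long long>(a[lo]) + a[hi];
        if (s == 0) return true; // a[lo] == -a[hi]
        if (s > 0) --hi;         // sum too large: shrink from the right
        else ++lo;               // sum too small: shrink from the left
    }
    return false; // note: a lone 0 is not reported as its own opposite
}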
I would use an std::unordered_set and check whether the opposite of the number already exists in the set. If not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if(res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists
Note that you can simply add all of the elements to the unordered_set and, when adding x, check whether -x is already in the set. The complexity of this solution is O(n) on average. (As #Hurkyl said, thanks.)
UPDATE: A second idea: sort the elements, and then for each element check (using binary search) whether the opposite element exists.
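A sketch of that second idea (names are mine; the zero case is ambiguous in the problem statement, as noted in the comment):
#include <algorithm>
#include <vector>

// O(n log n) overall: one sort plus n binary searches.
bool hasOppositePairBinary(std::vector<int>& a) {
    std::sort(a.begin(), a.end());
    for (int x : a)
        if (x != 0 && std::binary_search(a.begin(), a.end(), -x))
            return true;
    // (Whether 0 counts as its own opposite, or requires two zeroes,
    // is up to the problem statement; handle it separately if needed.)
    return false;
}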
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]:
    if (-e) is in t:
        return true
    insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Maintain a sorted array in O(1)?

We have a sorted array and we would like to increase the value of one index by only 1 unit (array[i]++), such that the resulting array is still sorted. Is this possible in O(1)?
It is fine to use any data structure possible in STL and C++.
In a more specific case, if the array is initialised by all 0 values, and it is always incrementally constructed only by increasing a value of an index by one, is there an O(1) solution?
I haven't worked this out completely, but I think the general idea might help for integers at least. At the cost of more memory, you can maintain a separate data structure that maintains the ending index of each run of repeated values (since you want to swap your incremented value with the element at the ending index of its run). This is because it's with repeated values that you run into the worst-case O(n) runtime: let's say you have [0, 0, 0, 0] and you increment the value at location 0. Then it is O(n) to find out the last location (3).
But let's say that you maintain the data structure I mentioned (a map works, because it has O(1) average lookup). In that case you would have something like this:
0 -> 3
So you have a run of 0 values that ends at location 3. When you increment a value, say at location i, you check whether the new value is greater than the value at i + 1. If it is not, you are fine. But if it is, you look for an entry for that next value in the secondary data structure. If there isn't one, you can simply swap with the next location. If there is an entry, you look up the ending index of that run and swap with the value at that location. You then make any changes you need to the secondary data structure to reflect the new state of the array.
A more thorough example:
[0, 2, 3, 3, 3, 4, 4, 5, 5, 5, 7]
The secondary data-structure is:
3 -> 4
4 -> 6
5 -> 9
Let's say you increment the value at location 2. So you have incremented 3, to 4. The array now looks like this:
[0, 2, 4, 3, 3, 4, 4, 5, 5, 5, 7]
You look at the next element, which is 3. You then look up the entry for that element in the secondary data structure. The entry is 4, which means that there is a run of 3's that ends at index 4. This means that you can swap the value from the current location with the value at index 4:
[0, 2, 3, 3, 4, 4, 4, 5, 5, 5, 7]
Now you will also need to update the secondary data structure. Specifically, the run of 3's now ends one index earlier, so you need to decrement that value:
3 -> 3
4 -> 6
5 -> 9
Another check you will need to do is to see if the value is still repeated at all. You can check that by looking at the (i - 1)th and the (i + 1)th locations to see if they are the same as the value in question. If neither is equal, then you can remove the entry for this value from the map.
Again, this is just a general idea. I will have to code it out to see if it works out the way I thought about it.
Please feel free to poke holes.
UPDATE
I have an implementation of this algorithm here in JavaScript. I used JavaScript just so I could do it quickly. Also, because I coded it up pretty quickly it can probably be cleaned up. I do have comments though. I'm not doing anything esoteric either, so this should be easily portable to C++.
There are essentially two parts to the algorithm: the incrementing and swapping (if necessary), and book-keeping done on the map that keeps track of our ending indices for runs of repeated values.
The code contains a testing harness that starts with an array of zeroes and increments random locations. At the end of every iteration, there is a test to ensure that the array is sorted.
var array = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
var endingIndices = {0: 9};
var increments = 10000;
for(var i = 0; i < increments; i++) {
    var index = Math.floor(Math.random() * array.length);
    var oldValue = array[index];
    var newValue = ++array[index];
    var endingIndex; //where the incremented value ends up after any swap
    if(index == (array.length - 1)) {
        //Incremented element is the last element.
        //We don't need to swap, but we need to see if we modified a run (if one exists)
        if(endingIndices[oldValue]) {
            endingIndices[oldValue]--;
        }
    } else if(index >= 0) {
        //Incremented element is not the last element; it is in the middle of
        //the array, possibly even the first element
        var nextIndexValue = array[index + 1];
        if(newValue === nextIndexValue) {
            //If the new value is the same as the next value, we don't need to swap anything. But
            //we are doing some book-keeping later with the endingIndices map. That code requires
            //the ending index (i.e., where we moved the incremented value to). Since we didn't
            //move it anywhere, the endingIndex is simply the index of the incremented element.
            endingIndex = index;
        } else if(newValue > nextIndexValue) {
            //If the new value is greater than the next value, we will have to swap it
            var swapIndex = -1;
            if(!endingIndices[nextIndexValue]) {
                //If the next value doesn't have a run, then the location we have to swap with
                //is just the next index
                swapIndex = index + 1;
            } else {
                //If the next value has a run, we get the swap index from the map
                swapIndex = endingIndices[nextIndexValue];
            }
            array[index] = nextIndexValue;
            array[swapIndex] = newValue;
            endingIndex = swapIndex;
        } else {
            //If the next value is already greater, there is nothing we need to swap but we do
            //need to do some book-keeping with the endingIndices map later, because it is
            //possible that we modified a run (the value might be the same as the value that
            //came before it). Since we don't have anything to swap, the endingIndex is
            //effectively the index that we are incrementing.
            endingIndex = index;
        }
        //Moving the new value to its new position may have created a new run, so we need to
        //check for that. This will only happen if the new position is not at the end of
        //the array, and the new value does not have an entry in the map, and the value
        //at the position after the new position is the same as the new value
        if(endingIndex < (array.length - 1) &&
           !endingIndices[newValue] &&
           array[endingIndex + 1] == newValue) {
            endingIndices[newValue] = endingIndex + 1;
        }
        //We also need to check to see if the old value had an entry in the
        //map because now that run has been shortened by one.
        if(endingIndices[oldValue]) {
            var newEndingIndex = --endingIndices[oldValue];
            if(newEndingIndex == 0 ||
               (newEndingIndex > 0 && array[newEndingIndex - 1] != oldValue)) {
                //In this case we check to see if the old value only has one entry, in
                //which case there is no run of values and so we will need to remove
                //its entry from the map. This happens when the new ending-index for this
                //value is the first location (0) or if the location before the new
                //ending-index doesn't contain the old value.
                delete endingIndices[oldValue];
            }
        }
    }
    //Make sure that the array is sorted
    for(var j = 0; j < array.length - 1; j++) {
        if(array[j] > array[j + 1]) {
            throw "Array not sorted; Value at location " + j + " (" + array[j] + ") is greater than value at location " + (j + 1) + " (" + array[j + 1] + ")";
        }
    }
}
In a more specific case, if the array is initialised by all 0 values, and it is always incrementally constructed only by increasing a value of an index by one, is there an O(1) solution?
No. Given an array of all 0's: [0, 0, 0, 0, 0]. If you increment the first value, giving [1, 0, 0, 0, 0], then you have to get that 1 past the other four zeroes – find the end of the run and swap – to keep the array sorted, and with nothing but the array, that takes more than constant time in the worst case.
Given a sorted array with no duplicates, then the answer is yes. But after the first operation (i.e. the first time you increment), then you could potentially have duplicates. The more increments you do, the higher the likelihood is that you'll have duplicates, and the more likely it'll take O(n) to keep that array sorted.
If all you have is the array, it's impossible to guarantee less than O(n) time per increment. If what you're looking for is a data structure that supports sorted order and lookup by index, then you probably want an order statistic tree.
If the values are small, counting sort will work. Represent the array [0,0,0,0] as {4}. Incrementing any zero gives {3,1} : 3 zeroes and a one. In general, to increment any value x, deduct one from the count of x and increment the count of {x+1}. The space efficiency is O(N), though, where N is the highest value.
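A minimal sketch of that representation (illustrative names; assumes values never exceed a known maximum):
#include <vector>

// counts[v] is how many elements currently equal v. "Incrementing an element"
// of value x just moves one unit of count from x to x + 1, which is O(1).
// The sorted array itself is implicit in the counts.
struct CountedArray {
    std::vector<long long> counts;             // index = value, entry = multiplicity
    CountedArray(int n, int maxValue)          // n elements, all initially 0
        : counts(maxValue + 1, 0) { counts[0] = n; }

    void incrementAnyElementWithValue(int x) { // assumes counts[x] > 0 and x < maxValue
        --counts[x];
        ++counts[x + 1];
    }
};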
It depends on how many items can have the same value. If more items can have the same value, then it is not possible to have O(1) with ordinary arrays.
Let's do an example: suppose array[5] = 21, and you want to do array[5]++:
Increment the item:
array[5]++
(which is O(1) because it is an array).
So, now array[5] = 22.
Check the next item (i.e., array[6]):
If array[6] == 21, then you have to keep checking new items (i.e., array[7] and so on) until you find a value higher than 21. At that point you can swap the values. This search is not O(1) because potentially you have to scan the whole array.
Instead, if items cannot have the same value, then you have:
Increment the item:
array[5]++
(which is O(1) because it is an array).
So, now array[5] = 22.
The next item cannot be 21 (because two items cannot have the same value), so it must have a value > 21 and the array is already sorted.
So you take the sorted array and a hashtable. You go over the array to figure out the 'flat' areas – runs where elements have the same value. For every flat area you have to figure out three things: 1) where it starts (the index of the first element), 2) what its value is, and 3) what the value of the next (bigger) element is. Then put this tuple into the hashtable, keyed by the element value. This is a prerequisite step and its complexity doesn't really matter.
Then, when you increase some element (index i), you look up in the table where the run of the next bigger value starts (call that index j), and swap element i with the element at position j - 1, the last element of its own run. Then 1) add a new entry to the hashtable and 2) update the existing entry for its previous value.
With a perfect hashtable (or a limited range of possible values) it will be almost O(1). The downside: it will not be stable.
Here is some code:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

struct Range {
    int start, value, next;
};

void print_ht(std::unordered_map<int, Range>& ht)
{
    for (auto i = ht.begin(); i != ht.end(); i++) {
        Range& r = (*i).second;
        std::cout << '(' << r.start << ", " << r.value << ", " << r.next << ") ";
    }
    std::cout << std::endl;
}

void increment_el(int i, std::vector<int>& array, std::unordered_map<int, Range>& ht)
{
    int val = array[i];
    array[i]++;
    //Pick the next bigger element
    Range& r = ht[val];
    //Do the swapping, so the last element of this run comes first
    std::swap(array[i], array[ht[r.next].start - 1]);
    //Update the hashtable
    ht[r.next].start--;
}

int main(int argc, const char * argv[])
{
    std::vector<int> array = {1, 1, 1, 2, 2, 3};
    std::unordered_map<int, Range> ht;
    int start = 0;
    int value = array[0];
    //Build the indexing hashtable
    for (int i = 0; i <= (int)array.size(); i++) {
        int cur_value = i < (int)array.size() ? array[i] : -1;
        if (cur_value > value || i == (int)array.size()) {
            ht[value] = {start, value, cur_value};
            start = i;
            value = cur_value;
        }
    }
    print_ht(ht);
    //Now let's increment the first element
    increment_el(0, array, ht);
    print_ht(ht);
    increment_el(3, array, ht);
    print_ht(ht);
    for (auto i = array.begin(); i != array.end(); i++)
        std::cout << *i << " ";
    return 0;
}
Yes and no.
Yes if the list contains only unique integers, as that means you only need to check the next value. No in any other situation. If the values are not unique, incrementing the first of N duplicate values means that it must move N positions. If the values are floating-point, you may have thousands of values between x and x+1.
It's important to be very clear about the requirements; the simplest way is to express the problem as an ADT (Abstract Datatype), listing the required operations and complexities.
Here's what I think you are looking for: a datatype which provides the following operations:
Construct(n): Create a new object of size n all of whose values are 0.
Value(i): Return the value at index i.
Increment(i): Increment the value at index i.
Least(): Return the index of the element with least value (or one such element if there are several).
Next(i): Return the index of the next element after element i in a sorted traversal starting at Least(), such that the traversal will return every element.
Aside from the Constructor, we want every one of the above operations to have complexity O(1). We also want the object to occupy O(n) space.
The implementation uses a list of buckets; each bucket has a value and a list of elements. Each element has an index and a pointer to the bucket it is part of. Finally, we have an array of pointers to elements. (In C++, I'd probably use iterators rather than pointers; in another language, I'd probably use intrusive lists.) The invariants are that no bucket is ever empty, and the values of the buckets are strictly monotonically increasing.
We start with a single bucket with value 0 which has a list of n elements.
Value(i) is implemented by returning the value of the bucket of the element referenced by the iterator at element i of the array. Least() is the index of the first element in the first bucket. Next(i) is the index of the next element after the one referenced by the iterator at element i, unless that iterator is already pointing at the end of its bucket's list, in which case it is the first element in the next bucket; if the element's bucket is the last bucket, we're at the end of the element list.
The only interface of interest is Increment(i), which is as follows:
If element i is the only element in its bucket (i.e. its bucket's element list contains exactly one entry):
Increment the value of the associated bucket.
If the next bucket now has the same value, append the next bucket's element list to this bucket's element list (this is O(1), regardless of the list's size, because it is just a pointer swap), and then delete the next bucket.
If element i is not the only element in its bucket, then:
Remove it from its bucket's list.
If the next bucket has the next sequential value, push element i onto the next bucket's list.
Otherwise (the next bucket's value is larger), create a new bucket with the next sequential value containing only element i, and insert it between this bucket and the next one. (A sketch of this layout follows.)
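Here is a sketch of that layout in C++ (identifiers are mine, not from the answer; Increment() is omitted since it follows the steps above):
#include <iterator>
#include <list>
#include <vector>

struct Element;

struct Bucket {
    int value;                 // strictly increasing along the bucket list
    std::list<Element*> elems; // invariant: never empty
};

struct Element {
    int index;                           // position handed out to the user
    std::list<Bucket>::iterator bucket;  // bucket this element currently belongs to
    std::list<Element*>::iterator self;  // this element's slot in bucket->elems
};

struct SortedCounter {
    std::list<Bucket> buckets;      // no empty buckets, values strictly increasing
    std::vector<Element> elements;  // elements[i] tracks index i
    explicit SortedCounter(int n) : elements(n) {
        buckets.push_back(Bucket{0, {}}); // a single bucket of value 0 holds everyone
        for (int i = 0; i < n; ++i) {
            elements[i].index = i;
            elements[i].bucket = buckets.begin();
            buckets.front().elems.push_back(&elements[i]);
            elements[i].self = std::prev(buckets.front().elems.end());
        }
    }
    int value(int i) const { return elements[i].bucket->value; }
};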
Just iterate along the array from the modified element until you find the correct place, then swap. Average-case complexity is O(N), where N is the average number of duplicates. Worst case is O(n), where n is the length of the array. As long as N isn't large and doesn't scale badly with n, you're fine and can probably pretend it's O(1) for practical purposes.
If duplicates are the norm and/or scale strongly with n, then there are better solutions, see other responses.
I think that it is possible without using a hashtable. I have an implementation here:
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cassert>

// This code is a solution for http://stackoverflow.com/questions/19957753/maintain-a-sorted-array-in-o1
//
// """We have a sorted array and we would like to increase the value of one index by only 1 unit
// (array[i]++), such that the resulting array is still sorted. Is this possible in O(1)?"""

// The obvious implementation, which has O(n) worst case increment.
class LinearIncrementor
{
public:
    LinearIncrementor(int numElems);
    int valueAt(int index) const;
    void incrementAt(int index);
private:
    std::vector<int> m_values;
};

// Free list to store runs of same values
class RunList
{
public:
    struct Run
    {
        int m_end;   // end index of run, inclusive, or next object in free list
        int m_value; // value at this run
    };
    RunList();
    int allocateRun(int endIndex, int value);
    void freeRun(int index);
    Run& runAt(int index);
    const Run& runAt(int index) const;
private:
    std::vector<Run> m_runs;
    int m_firstFree;
};

// More optimal implementation, which increments in O(1) time
class ConstantIncrementor
{
public:
    ConstantIncrementor(int numElems);
    int valueAt(int index) const;
    void incrementAt(int index);
private:
    std::vector<int> m_runIndices;
    RunList m_runs;
};

LinearIncrementor::LinearIncrementor(int numElems)
    : m_values(numElems, 0)
{
}

int LinearIncrementor::valueAt(int index) const
{
    return m_values[index];
}

void LinearIncrementor::incrementAt(int index)
{
    const int n = static_cast<int>(m_values.size());
    const int value = m_values[index];
    while (index+1 < n && value == m_values[index+1])
        ++index;
    ++m_values[index];
}

RunList::RunList() : m_firstFree(-1)
{
}

int RunList::allocateRun(int endIndex, int value)
{
    int runIndex = -1;
    if (m_firstFree == -1)
    {
        runIndex = static_cast<int>(m_runs.size());
        m_runs.resize(runIndex + 1);
    }
    else
    {
        runIndex = m_firstFree;
        m_firstFree = m_runs[runIndex].m_end;
    }
    Run& run = m_runs[runIndex];
    run.m_end = endIndex;
    run.m_value = value;
    return runIndex;
}

void RunList::freeRun(int index)
{
    m_runs[index].m_end = m_firstFree;
    m_firstFree = index;
}

RunList::Run& RunList::runAt(int index)
{
    return m_runs[index];
}

const RunList::Run& RunList::runAt(int index) const
{
    return m_runs[index];
}

ConstantIncrementor::ConstantIncrementor(int numElems) : m_runIndices(numElems, 0)
{
    const int runIndex = m_runs.allocateRun(numElems-1, 0);
    assert(runIndex == 0);
}

int ConstantIncrementor::valueAt(int index) const
{
    return m_runs.runAt(m_runIndices[index]).m_value;
}

void ConstantIncrementor::incrementAt(int index)
{
    const int numElems = static_cast<int>(m_runIndices.size());
    const int curRunIndex = m_runIndices[index];
    RunList::Run& curRun = m_runs.runAt(curRunIndex);
    index = curRun.m_end;
    const bool freeCurRun = index == 0 || m_runIndices[index-1] != curRunIndex;
    RunList::Run* runToMerge = NULL;
    int runToMergeIndex = -1;
    if (curRun.m_end+1 < numElems)
    {
        const int nextRunIndex = m_runIndices[curRun.m_end+1];
        RunList::Run& nextRun = m_runs.runAt(nextRunIndex);
        if (curRun.m_value+1 == nextRun.m_value)
        {
            runToMerge = &nextRun;
            runToMergeIndex = nextRunIndex;
        }
    }
    if (freeCurRun && !runToMerge) // then free and allocate at the same time
    {
        ++curRun.m_value;
    }
    else
    {
        if (freeCurRun)
        {
            m_runs.freeRun(curRunIndex);
        }
        else
        {
            --curRun.m_end;
        }
        if (runToMerge)
        {
            m_runIndices[index] = runToMergeIndex;
        }
        else
        {
            m_runIndices[index] = m_runs.allocateRun(index, curRun.m_value+1);
        }
    }
}

int main(int argc, char* argv[])
{
    const int numElems = 100;
    const int numInc = 1000000;
    LinearIncrementor linearInc(numElems);
    ConstantIncrementor constInc(numElems);
    srand(1);
    for (int i = 0; i < numInc; ++i)
    {
        const int index = rand() % numElems;
        linearInc.incrementAt(index);
        constInc.incrementAt(index);
        for (int j = 0; j < numElems; ++j)
        {
            if (linearInc.valueAt(j) != constInc.valueAt(j))
            {
                printf("Error: differing values at increment step %d, value at index %d\n", i, j);
            }
        }
    }
    return 0;
}
As a complement to the other answers: if you can only have the array, then you cannot indeed guarantee the operation will be constant-time; but because the array is sorted, you can find the end of a run of identical numbers in log n operations, not in n operations. This is simply a binary search.
If we expect most runs of numbers to be short, we should use galloping search, a variant where we first find the bounds by looking at positions +1, +2, +4, +8, +16, etc., and then do a binary search inside them. You get a time that is often constant (and extremely fast if the item is unique) but can grow up to log n. Unless for some reason long runs of identical numbers remain common even after many updates, this might outperform any solution that requires keeping additional data.
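A sketch of galloping search for the end of a run (my naming, not from the answer):
#include <algorithm>
#include <cstddef>
#include <vector>

// Find the last index whose value equals a[i] in a sorted array: probe
// i+1, i+2, i+4, ... then binary-search inside the last interval.
// Cost is O(log r), where r is the length of the run.
std::size_t endOfRun(const std::vector<int>& a, std::size_t i) {
    const int v = a[i];
    std::size_t step = 1, hi = i;
    while (hi + step < a.size() && a[hi + step] == v) { // gallop forward
        hi += step;
        step *= 2;
    }
    std::size_t lo = hi;                                // a[lo] == v is known
    hi = std::min(hi + step, a.size() - 1);
    while (lo < hi) {                                   // binary search in (lo, hi]
        std::size_t mid = (lo + hi + 1) / 2;
        if (a[mid] == v) lo = mid; else hi = mid - 1;
    }
    return lo; // index of the last element equal to a[i]
}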

Solving the array sum problem using iterators and testing for equality only

While getting ready for interviews, I decided to code the classic "Find if there are two elements in an array that sum up to a given number" question using iterator logic, so that it can be generalized to other containers than vector.
Here's my function so far
// Search given container for two elements with given sum.
// If two such elements exist, return true and the iterators
// pointing to the elements.
bool hasElementSum( int sum, const vector<int>& v,
                    vector<int>::const_iterator& el1, vector<int>::const_iterator& el2 )
{
    el1 = v.begin();
    el2 = v.end() - 1; // note: undefined behaviour if v is empty
    while ( el1 != el2 ) {
        if ( *el1 + *el2 == sum ) return true;
        ++el1; --el2; // flawed: both iterators move each step, so they can cross without ever comparing equal
    }
    return false;
}
This, of course, doesn't work, but I couldn't figure out a way to do it without using a condition like while ( el1 >= el2 ). Various sources I've read advise using only equality checks on iterators, to be able to generalize to all types of containers that support iterators.
Thanks!
First of all, your algorithm is wrong unless you've somehow determined ahead of time that you only need to look at sums where one item is in the first half of the collection, and the other is in the second half of the collection.
If the input's not sorted, then #sbi's answer is about as good as it gets.
With a sorted, random-access input, you can start with the first element, and do a binary search (or interpolation search, etc.) to see if you can find the value that would have to go with that to produce the desired sum. Then you can try the second element, but when you do the binary search (or whatever) use the result from the previous search as the upper limit. Since your first element is larger than the previous one, the matching value to produce the correct sum must be less than or equal to what you found the last time around.
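A sketch of that shrinking-upper-bound scan on a sorted vector (my names, not the poster's code):
#include <algorithm>
#include <vector>

// For each element, binary-search for (sum - element) in (current, limit);
// each search result becomes the new upper limit for the next iteration,
// because a larger current element needs a smaller partner.
bool hasSumSorted(const std::vector<int>& v, int sum) {
    auto limit = v.end();
    for (auto it = v.begin(); it != v.end() && it != limit; ++it) {
        auto lo = std::lower_bound(it + 1, limit, sum - *it);
        if (lo != limit && *lo == sum - *it)
            return true;
        limit = lo; // never look past here again
    }
    return false;
}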
foreach element1 in array
    foreach element2 in array, starting after element1
        if( element1 + element2 == sum )
            return true
return false
This is O(N^2), since you have to add each element to each of the other elements.
Isn't this question usually asked with a sorted array ?
If not, it has to be O(n^2), since you will have to check all possible pairs.
I propose the following method, though I did not analyze its complexity.
Construct a binary search tree with all the elements of the vector. Then, for each element, walk the tree looking for sum - element:
foreach(element = vec.begin to vec.end)
{
    if element == node.data, skip to the next node
    if element + node.data == sum, return true
    if element + node.data > sum, go to the left child
    if element + node.data < sum, go to the right child
}
Not a perfect solution/algorithm, but something of this kind.
Sorry, I screwed this one up. What I meant to write was a sort followed by a linear pass, which is the typical answer given to this question, as ltsik pointed out in his comment to Jerry, i.e. something like:
bool hasElementSum( int sum, vector<int>& v, int* ind1, int* ind2 )
{
    *ind1 = 0; *ind2 = v.size() - 1;
    std::sort( v.begin(), v.end() );
    while ( *ind1 < *ind2 ) {
        int s = v[*ind1] + v[*ind2];
        if ( s < sum ) (*ind1)++;
        else if ( s > sum ) (*ind2)--;
        else return true;
    }
    return false;
}
My question was how to write this using iterators without saying while (iter1 <= iter2 ) in order to be general, but I now see that doesn't make sense because this algorithm needs random access iterators anyway. Also, returning the indexes is meaningless since they refer to the sorted array and not the original one.

Find largest and second largest element in a range

How do I find the above without removing the largest element and searching again? Is there a more efficient way to do this? It does not matter if these elements are duplicates.
for (e : all elements) {
    if (e > largest) {
        second = largest;
        largest = e;
    } else if (e > second) {
        second = e;
    }
}
You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check if the list has at least two items)
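For instance, a sketch that seeds from the first two items (a hypothetical helper; it assumes at least two elements):
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {largest, second largest}; requires v.size() >= 2.
std::pair<int, int> largestTwo(const std::vector<int>& v) {
    int largest = std::max(v[0], v[1]);
    int second  = std::min(v[0], v[1]);
    for (std::size_t i = 2; i < v.size(); ++i) {
        if (v[i] > largest) {
            second = largest;
            largest = v[i];
        } else if (v[i] > second) {
            second = v[i];
        }
    }
    return {largest, second};
}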
using partial_sort ?
std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), Functor);
An Example:
std::vector<int> aTest;
aTest.push_back(3);
aTest.push_back(2);
aTest.push_back(4);
aTest.push_back(1);
std::partial_sort(aTest.begin(), aTest.begin()+2,aTest.end(), std::greater<int>());
int Max = aTest[0];
int SecMax = aTest[1];
nth_element(begin, begin+n,end,Compare) places the element that would be nth (where "first" is "0th") if the range [begin, end) were sorted at position begin+n and makes sure that everything from [begin,begin+n) would appear before the nth element in the sorted list. So the code you want is:
nth_element(container.begin(),
            container.begin() + 1,
            container.end(),
            appropriateCompare);
This will work well in your case, since you're only looking for the two largest. Assuming your appropriateCompare sorts things from largest to smallest, the second largest element will be at position 1 and the largest will be at position 0.
Let's assume you mean to find the two largest unique values in the list.
If the list is already sorted, then just look at the second last element (or rather, iterate from the end looking for the second last value).
If the list is unsorted, then don't bother to sort it. Sorting is at best O(n lg n). Simple linear iteration is O(n), so just loop over the elements keeping track:
v::value_type second_best = 0, best = 0;
for(v::const_iterator i = v.begin(); i != v.end(); ++i)
    if(*i > best) {
        second_best = best;
        best = *i;
    } else if(*i > second_best) {
        second_best = *i;
    }
There are of course other criteria, and these could all be put into the test inside the loop. However, if you mean that two elements that both have the same largest value should be found, you have to consider what happens if three or more elements all have this largest value, or if two or more elements have the second largest.
The optimal algorithm shouldn't need more than 1.5 * N - 2 comparisons. (Once we've decided that it's O(N), what's the coefficient in front of N? The naive 2 * N comparisons is worse than optimal.)
So, first determine the "winner" and the "loser" in each pair - that's 0.5 * N comparisons.
Then determine the largest element by comparing winners - that's another 0.5 * N - 1 comparisons.
Then determine the second-largest element by comparing the loser of the pair where the largest element came from against the winners of all other pairs - another 0.5 * N - 1 comparisons.
Total comparisons = 1.5 N - 2.
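A sketch of this pairing scheme (illustrative; it assumes an even number of elements for brevity, and for counting purposes each pair costs one comparison even though std::max/std::min are spelled out separately here):
#include <cstddef>
#include <utility>
#include <vector>

// Returns {largest, second largest}; requires an even a.size() >= 2.
std::pair<int, int> largestTwoPairing(const std::vector<int>& a) {
    std::vector<int> winners, losers;
    for (std::size_t i = 0; i + 1 < a.size(); i += 2) { // n/2 pair comparisons
        winners.push_back(std::max(a[i], a[i + 1]));
        losers.push_back(std::min(a[i], a[i + 1]));
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < winners.size(); ++i)    // n/2 - 1 comparisons
        if (winners[i] > winners[best]) best = i;
    int second = losers[best]; // this loser lost only to the overall largest
    for (std::size_t i = 0; i < winners.size(); ++i)    // n/2 - 1 comparisons
        if (i != best && winners[i] > second) second = winners[i];
    return {winners[best], second};
}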
The answer depends on whether you just want the values, or also iterators pointing at the values.
Minor modification of @will's answer.
v::value_type second_best = 0, best = 0;
for(v::const_iterator i = v.begin(); i != v.end(); ++i)
{
    if(*i > best)
    {
        second_best = best;
        best = *i;
    }
    else if (*i > second_best)
    {
        second_best = *i;
    }
}
Create a sublist from n..m, sort it descending, and grab the first two elements. Delete these elements from the original list.
You can scan the list in one pass, saving the 1st and 2nd largest values as you go; that is O(n), while sorting is O(n log n).
EDIT:
I think that partial sort is O(n log k)
Untested but fun:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <list>

template <typename T, int n>
class top_n_functor
{
public:
    void operator() (const T& x) {
        auto f = std::lower_bound(values_.begin(), values_.end(), x);
        if(values_.size() < static_cast<std::size_t>(n)) {
            values_.insert(f, x);
            return;
        }
        if(values_.begin() == f)
            return; // x is no better than the current smallest
        // Recycle the node holding the smallest value: rotate it to x's
        // insertion point and overwrite it in place.
        auto removed = values_.begin();
        values_.splice(removed, values_, std::next(removed), f);
        *removed = x;
    }
    std::list<T> values() {
        return values_; // kept in ascending order
    }
private:
    std::list<T> values_;
};

int main()
{
    int A[] = {1, 4, 2, 8, 5, 7};
    const int N = sizeof(A) / sizeof(int);
    auto vals = std::for_each(A, A + N, top_n_functor<int,2>()).values();
    std::cout << "The top is " << vals.back()
              << " with second place being " << vals.front() << std::endl;
}
If the largest is the first element, search for the second largest in [largest+1, end). Otherwise search in [begin, largest) and [largest+1, end) and take the maximum of the two. Of course, this takes roughly 2n comparisons, so it's not optimal.
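For example, with std::max_element (a sketch assuming at least two elements):
#include <algorithm>
#include <vector>

// Requires v.size() >= 2.
int secondLargest(const std::vector<int>& v) {
    auto largest = std::max_element(v.begin(), v.end());
    if (largest == v.begin())
        return *std::max_element(largest + 1, v.end());
    if (largest + 1 == v.end())
        return *std::max_element(v.begin(), largest);
    return std::max(*std::max_element(v.begin(), largest),
                    *std::max_element(largest + 1, v.end()));
}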
If you have random-access iterators, you could do as quick sort does and use the ever-elegant recursion:
template< typename T >
std::pair<T,T> find_two_largest(const std::pair<T,T>& lhs, const std::pair<T,T>& rhs)
{
    // implementation finding the two largest of the four values left as an exercise :)
}

template< typename RAIter >
std::pair< typename std::iterator_traits<RAIter>::value_type
         , typename std::iterator_traits<RAIter>::value_type >
find_two_largest(RAIter begin, RAIter end)
{
    const std::ptrdiff_t diff = end - begin;
    if( diff < 2 )
        return std::make_pair(*begin, *begin);
    if( diff < 3 )
        return std::make_pair(*begin, *(begin+1));
    const RAIter middle = begin + diff/2;
    typedef std::pair< typename std::iterator_traits<RAIter>::value_type
                     , typename std::iterator_traits<RAIter>::value_type >
        result_t;
    const result_t left = find_two_largest(begin, middle);
    const result_t right = find_two_largest(middle, end);
    return find_two_largest(left, right);
}
This has O(n) and shouldn't make more comparisons than NomeN's implementation.
Top k is usually a bit better than n log k:
#include <cstddef>
#include <set>

template <class t, class ordering>
class TopK {
public:
    // A pool allocator for the multiset nodes helps a lot here; see the notes below.
    typedef std::multiset<t, ordering> BEST_t;
    BEST_t best;
    const size_t K;

    TopK(const size_t k)
        : K(k) {
    }

    const BEST_t& insert(const t& item){
        if(best.size() < K){
            best.insert(item);
            return best;
        }
        // K items in the multiset now,
        // and here is why it's better - because if the distribution is random then
        // this comparison (and the size check above) is usually all the work done:
        if(best.key_comp()(*best.begin(), item)){ // item better than the worst
            best.erase(best.begin());             // drop the worst
            best.insert(item);                    // log(K-1) average, as only K-1 items remain
        }
        return best;
    }

    template <class it>
    const BEST_t& insert(it i, const it last){
        for(; i != last; ++i){
            insert(*i);
        }
        return best;
    }
};
Of course the special allocator can in essence be just an array of K multiset value_types plus a free list of those nodes (which typically has nothing on it, as the other K nodes are in use in the multiset until it's time to put a new one in, and we erase and then immediately reuse the node). Having this matters, because otherwise the memory allocation/freeing in std::multiset and the cache-line misses kill you. It's a (very) tiny bit of work to give it static state without violating the STL allocator rules.
Not as good as a specialized algorithm for exactly 2, but for fixed k << n, I would GUESS about (2n + delta*n) comparisons, where delta is small - my copy of Knuth's TAOCP Vol. 3 (Sorting and Searching) is packed away, and an estimate of delta is a bit more work than I want to do.
The average worst case is, I would guess, n(log(k-1) + 2) when the input is in opposite order and all values are distinct.
The best case is 2n + k log k, when the k best elements come first.
I think you could implement a custom array class and overload its indexed get/set methods. Then, on every set call, compare the new value against two fields holding the running maximums. While this makes the setter slower, it benefits from caching or even registers, and getting the result becomes a no-op. This is faster if you populate the array only once per search for the maximums, but slower if the array is modified frequently.
If the array is used in vectorized loops, it gets harder to implement, as you would have to use AVX/SSE-optimized max operations inside the setter.