How to obtain the index of the median using STL? - c++

Calculating the median of a numeric array has been discussed before; see, for example, "What is the right approach when using STL container for median calculation?". I have a different question: how can you get the index of the median in the original STL container? To illustrate, here is an example:
vector<int> myarray;
myarray.push_back(3);
myarray.push_back(1);
myarray.push_back(100);
myarray.push_back( 20);
myarray.push_back(200);
int n = myarray.size()/2;
nth_element(myarray.begin(), myarray.begin()+n, myarray.end());
int median = myarray[n];
With the code above I can get the median value, but I cannot get its index in the original vector (the value 20 at zero-based index 3, i.e. the 4th element). Any ideas? Thanks!

I think there is no straightforward way to do that.
The vector you passed to nth_element has been reordered, so searching it for the median will always return n.
You need to save a copy of your original vector, and search in that. Keep in mind that if the original vector contained duplicates, you will not know exactly which of them was actually put to position n (if this is of any relevance for you).
As an alternative, you could have a look at the implementation of nth_element, and implement your own version that also reports the original position of the found n-th element.

If it is acceptable to search for the element,
vector<int>::iterator itOfMedian = std::find(myarray.begin(), myarray.end(), median);
int index = itOfMedian - myarray.begin();
should do the trick.
EDIT
It seems you have a point here: nth_element reorders the vector passed to it. Therefore:
vector<int> myArrayCopy = myarray;
// find median in myArrayCopy
vector<int>::iterator itOfMedian = std::find(myarray.begin(), myarray.end(), median);
int index = itOfMedian - myarray.begin();

You can use std::nth_element to find an iterator to the median element. However, this does a partial sorting of the vector, so you would need to use a copy:
std::vector<int> dataCopy = myarray;
// we will use iterator middle later
std::vector<int>::iterator middle = dataCopy.begin() + (dataCopy.size() / 2);
// this sets iterator middle to the median element
std::nth_element(dataCopy.begin(), middle, dataCopy.end());
int nthValue = *middle;
Now it gets complicated. You have a value corresponding to the median. You can search the original vector for it, and use std::distance to get the index:
std::vector<int>::iterator it = std::find(myarray.begin(), myarray.end(), nthValue);
std::vector<int>::size_type pos = std::distance(myarray.begin(), it);
However, this only identifies the position unambiguously if there are no duplicates of nthValue in myarray; with duplicates, std::find returns the first occurrence.

Sorry to dig up an old topic, but here's a nice way to do it. Exploit the fact that std::pair compares lexicographically, so nth_element orders pairs by their first element (ties are broken by the second). With this in mind, create a vector of pairs where the first part of each pair is the value that takes part in the median calculation and the second is its index. Modifying your example:
vector<pair<unsigned int, size_t>> myarray;
myarray.push_back(pair<unsigned int, size_t>( 3, 0));
myarray.push_back(pair<unsigned int, size_t>( 1, 1));
myarray.push_back(pair<unsigned int, size_t>(100, 2));
myarray.push_back(pair<unsigned int, size_t>( 20, 3));
myarray.push_back(pair<unsigned int, size_t>(200, 4));
size_t n = myarray.size()/2;
nth_element(myarray.begin(), myarray.begin()+n, myarray.end());
unsigned int median = myarray[n].first;
size_t medianindex = myarray[n].second;
Of course myarray has been rearranged, so myarray[medianindex] is not the median. If you had made a copy before calling nth_element, medianindex would be the desired index into that copy.
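A related variant, sketched here under the assumption that the input is non-empty, is to leave the values untouched and run nth_element over a vector of indices, comparing through the original data. The helper name median_index is hypothetical and not part of any answer above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the index (into `values`) of a median element.
// Assumes `values` is non-empty; `values` itself is never reordered.
std::size_t median_index(const std::vector<int>& values) {
    // idx = {0, 1, 2, ...}: one entry per element of `values`.
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    auto mid = idx.begin() + static_cast<std::ptrdiff_t>(idx.size() / 2);
    // Partially sort the indices by the values they refer to.
    std::nth_element(idx.begin(), mid, idx.end(),
                     [&values](std::size_t a, std::size_t b) {
                         return values[a] < values[b];
                     });
    return *mid;
}
```

For the vector {3, 1, 100, 20, 200} from the question this yields 3, the position of 20. With duplicate median values, which of the equal elements is reported is unspecified.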

Related

Nearest permutation to given array

Question
I have two arrays of integers A[] and B[]. Array B[] is fixed; I need to find the permutation of A[] that is lexicographically smaller than B[] and nearest to B[]. What I mean is:
for each i with 0 <= i < n,
abs(B[i]-A[i]) is minimal, and A[] should be lexicographically smaller than B[].
For Example:
A[]={1,3,5,6,7}
B[]={7,3,2,4,6}
So a possible nearest permutation of A[] to B[] is
A[]={7,3,1,6,5}
My Approach
Try all permutations of A[] and compare each with B[]. But the time complexity would be O(n! * n).
So is there any way to optimize this?
EDIT
n can be as large as 10^5
For better understanding
First, build an ordered map of the counts of the distinct elements of A.
Then, iterate forward through array indices (0 to n−1), "withdrawing" elements from this map. At each point, there are three possibilities:
If i < n-1, and it's possible to choose A[i] == B[i], do so and continue iterating forward.
Otherwise, if it's possible to choose A[i] < B[i], choose the greatest possible value for A[i] < B[i]. Then proceed by choosing the largest available values for all subsequent array indices. (At this point you no longer need to worry about maintaining A[i] <= B[i], because we're already after an index where A[i] < B[i].) Return the result.
Otherwise, we need to backtrack to the last index where it was possible to choose A[i] < B[i], then use the approach in the previous bullet-point.
Note that, despite the need for backtracking, the very worst case here is three passes: one forward pass using the logic in the first bullet-point, one backward pass in backtracking to find the last index where A[i] < B[i] was possible, and then a final forward pass using the logic in the second bullet-point.
Because of the overhead of maintaining the ordered map, this requires O(n log m) time and O(m) extra space, where n is the total number of elements of A and m is the number of distinct elements. (Since m ≤ n, we can also express this as O(n log n) time and O(n) extra space.)
Note that if there's no solution, then the backtracking step will reach all the way to i == -1. You'll probably want to raise an exception if that happens.
Edited to add (2019-02-01):
In a now-deleted answer, גלעד ברקן summarizes the goal this way:
To be lexicographically smaller, the array must have an initial optional section from left to right where A[i] = B[i] that ends with an element A[j] < B[j]. To be closest to B, we want to maximise the length of that section, and then maximise the remaining part of the array.
So, with that summary in mind, another approach is to do two separate loops, where the first loop determines the length of the initial section, and the second loop actually populates A. This is equivalent to the above approach, but may make for cleaner code. So:
1. Build an ordered map of the counts of the distinct elements of A.
2. Initialize initial_section_length := -1.
3. Iterate through the array indices 0 to n−1, "withdrawing" elements from this map. For each index:
   - If it's possible to choose an as-yet-unused element of A that's less than the current element of B, set initial_section_length equal to the current array index. (Otherwise, don't.)
   - If it's not possible to choose an as-yet-unused element of A that's equal to the current element of B, break out of this loop. (Otherwise, continue looping.)
4. If initial_section_length == -1, then there's no solution; raise an exception.
5. Repeat step #1: re-build the ordered map.
6. Iterate through the array indices from 0 to initial_section_length-1, "withdrawing" elements from the map. For each index, choose an as-yet-unused element of A that's equal to the current element of B. (The existence of such an element is ensured by the first loop.)
7. For array index initial_section_length, choose the greatest as-yet-unused element of A that's less than the current element of B (and "withdraw" it from the map). (The existence of such an element is ensured by the first loop.)
8. Iterate through the array indices from initial_section_length+1 to n−1, continuing to "withdraw" elements from the map. For each index, choose the greatest element of A that hasn't been used yet.
This approach has the same time and space complexities as the backtracking-based approach.
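The two-loop procedure could be sketched in C++ roughly as follows. This is only an illustration: the names nearest_smaller_permutation, withdraw and Counts are hypothetical, std::map plays the role of the ordered map, and the code assumes A and B have the same length.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <stdexcept>
#include <vector>

using Counts = std::map<int, int>;  // value -> number of unused copies

// Hypothetical helper: removes one copy of the value that `it` points to.
static void withdraw(Counts& counts, Counts::iterator it) {
    if (--(it->second) == 0) counts.erase(it);
}

// Hypothetical name; assumes A.size() == B.size().
// Throws std::runtime_error if no permutation of A is smaller than B.
std::vector<int> nearest_smaller_permutation(const std::vector<int>& A,
                                             const std::vector<int>& B) {
    const std::size_t n = A.size();
    Counts counts;
    for (int a : A) ++counts[a];

    // First loop: remember the last index at which an unused element < B[i]
    // exists while every earlier position could be matched exactly.
    std::ptrdiff_t initial_section_length = -1;
    for (std::size_t i = 0; i < n; ++i) {
        // The map is non-empty here: only i of the n elements were withdrawn.
        if (counts.begin()->first < B[i])
            initial_section_length = static_cast<std::ptrdiff_t>(i);
        auto equal = counts.find(B[i]);
        if (equal == counts.end()) break;
        withdraw(counts, equal);
    }
    if (initial_section_length == -1)
        throw std::runtime_error("no lexicographically smaller permutation");

    // Second loop: rebuild the map and populate the result.
    counts.clear();
    for (int a : A) ++counts[a];
    const std::size_t k = static_cast<std::size_t>(initial_section_length);
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < k; ++i) {       // exact matches
        withdraw(counts, counts.find(B[i]));
        result.push_back(B[i]);
    }
    auto below = std::prev(counts.lower_bound(B[k]));  // greatest key < B[k]
    result.push_back(below->first);
    withdraw(counts, below);
    while (!counts.empty()) {                   // largest remaining first
        auto largest = std::prev(counts.end());
        result.push_back(largest->first);
        withdraw(counts, largest);
    }
    return result;
}
```

For A = {1,3,5,6,7} and B = {7,3,2,4,6} the sketch returns {7,3,1,6,5}, matching the example in the question.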
There are n! permutations of A[n] (fewer if there are repeating elements).
Use binary search over the range 0..n!-1 to determine the k-th lexicographic permutation of A[] that is the closest one below B[].
Perhaps in C++ you can exploit std::lower_bound
Based on the discussion in the comment section of your question, you seek an array made up entirely of elements of the vector A that is, in lexicographic ordering, closest to the vector B.
For this scenario, the algorithm becomes quite straightforward. The idea is the same as the one already mentioned in the answer of @ruakh (although that answer refers to an earlier and more complicated version of your question, which is still displayed in the OP, and is therefore more complicated):
Sort A
Loop over B and select the largest remaining element of A that is less than or equal to B[i]. Remove that element from the list.
If no element in A is less than or equal to B[i], pick the largest element.
Here is the basic implementation:
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>
auto get_closest_array(std::vector<int> A, std::vector<int> const& B)
{
    // sort in descending order, so the first element <= i is the largest such element
    std::sort(std::begin(A), std::end(A), std::greater<>{});
    auto select_closest_and_remove = [&](int i)
    {
        auto it = std::find_if(std::begin(A), std::end(A), [&](auto x) { return x<=i;});
        if(it==std::end(A))
        {
            // nothing is <= i: fall back to the largest remaining element
            it = std::max_element(std::begin(A), std::end(A));
        }
        auto ret = *it;
        A.erase(it);
        return ret;
    };
    std::vector<int> ret(B.size());
    for(int i=0;i<(int)B.size();++i)
    {
        ret[i] = select_closest_and_remove(B[i]);
    }
    return ret;
}
Applied to the problem in the OP one gets:
int main()
{
    std::vector<int> A ={1,3,5,6,7};
    std::vector<int> B ={7,3,2,4,6};
    auto C = get_closest_array(A, B);
    for(auto i : C)
    {
        std::cout<<i<<" ";
    }
    std::cout<<std::endl;
}
and it displays
7 3 1 6 5
which seems to be the desired result.

How to make lower bound binary search if we have vector of pairs

I'm trying to use the lower_bound function in my C++ program, but there is a problem: it works fine with a vector of plain values, but it fails when I search over a vector of pairs.
I have a vector of pairs and I want to search by the first member of the pair; if several pairs share that first value, I want the one with the smallest second value. For example:
Let's say we have the following vector of pairs
v = {(1,1),(2,1),(2,2),(2,3),(3,4),(5,6)};
Let's say we are searching for the value K = 2. I want to get position 1 (if the array is 0-indexed), because the second value of that pair is 1, and 1 is the smallest.
What is the easiest way to implement this? I tried, but my code fails to compile. Here it is:
vector<pair<int,int> >a,b;
void solve() {
sort(b.begin(), b.end());
sort(a.begin(), a.end());
vector<int>::iterator it;
for(int i=0;i<a.size();i++) {
ll zero=0;
int to_search=max(zero, k-a[i].first);
it=lower_bound(b.begin(), b.end(), to_search);
int position=it-b.begin();
if(position==b.size()) continue;
answer=min(answer, a[i].second+b[position].second);
}
}
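In other words, I'm searching by the first value, but if that value occurs more than once, I want the pair with the smallest second element.
Thanks in advance.
operator< is defined for std::pair (it compares first, then second), so you can call std::lower_bound directly with a pair whose second member is the smallest possible int: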
std::lower_bound(v.begin(), v.end(), std::make_pair(2, std::numeric_limits<int>::min()));
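As a minimal sketch of that call in context (the wrapper name first_position_of is hypothetical), the function below returns the position of the first pair whose first member is not less than the key, assuming the vector is sorted with the default pair ordering:

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical wrapper: position of the first pair p with p.first >= key,
// or v.size() if there is none. Assumes v is sorted with operator<.
std::size_t first_position_of(const std::vector<std::pair<int, int>>& v, int key) {
    // Pairing the key with the smallest int makes the probe compare
    // less than or equal to every pair that has the same first member.
    auto it = std::lower_bound(
        v.begin(), v.end(),
        std::make_pair(key, std::numeric_limits<int>::min()));
    return static_cast<std::size_t>(it - v.begin());
}
```

With v = {(1,1),(2,1),(2,2),(2,3),(3,4),(5,6)} and key 2 this gives position 1. Note that the pair found may have a first member greater than the key when the key itself is absent, so a caller that needs an exact match still has to check.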

Find elements in a vector which lie within specified ranges

I have a vector of integer elements in sorted order. An example is given below:
vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
Now given the following "start" and "end" ranges I want to find out which elements of vector A fall into the start and end ranges.
int startA=4; int endA=8;
int startB=20; int endB=99;
int startC=120; int endC=195;
For example,
elements lying in the range startA to endA are: {4,5}
elements lying in the range startB to endB are: {20,71,89,92}
elements lying in the range startC to endC are: {121,172,189,194}
One way to do this is to iterate over all elements of "A" and check whether they lie between the specified ranges. Is there some other, more efficient way to find the elements of the vector that fall within a given range?
If the vector is sorted, as you have shown it to be, you can use binary search to locate the first element that is not less than the lower bound of the range and the first element that is greater than the upper bound.
That makes locating the range O(log(N)).
You can use std::lower_bound and std::upper_bound, which require the range to be sorted (more precisely, partitioned with respect to the searched value), which is true in your case.
If the vector is not sorted, linear iteration is the best you can do.
If the vector is sorted, all you need to do is use the dedicated functions std::lower_bound and std::upper_bound to find the iterators to the start and the end of your range. For example:
#include <vector>
#include <algorithm>
#include <iostream>
int main() {
    std::vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
    auto start = std::lower_bound(A.begin(), A.end(), 4);
    auto end = std::upper_bound(A.begin(), A.end(), 8);
    for (auto it = start; it != end; it++) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;
}
//or the std::copy version (works in VS2015u3; also needs #include <iterator>)
int main() {
    std::vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
    std::copy(std::lower_bound(A.begin(), A.end(), 4),
              std::upper_bound(A.begin(), A.end(), 8),
              std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}
This, however, works only if startX <= endX, so you may want to check that condition before running it with arbitrary numbers.
Finding the bounding iterators with std::lower_bound and std::upper_bound costs O(log(N)); iterating over the elements in the range, however, is linear in the number of elements found, and the range may contain all N elements of your vector.
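The bounds lookup and the startX <= endX check can be combined in one small function. This is only a sketch; the name elements_in_range is hypothetical and both bounds are treated as inclusive, as in the example above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: copies every element x of a sorted vector
// with lo <= x <= hi. Returns an empty vector when lo > hi.
std::vector<int> elements_in_range(const std::vector<int>& sorted, int lo, int hi) {
    if (lo > hi) return {};
    // First element that is not less than lo.
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo);
    // First element greater than hi; the search can start at `first`.
    auto last = std::upper_bound(first, sorted.end(), hi);
    return std::vector<int>(first, last);
}
```

Called with the vector A from the question, the ranges (4, 8), (20, 99) and (120, 195) yield {4,5}, {20,71,89,92} and {121,172,189,194} respectively.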
The best way I can think of is to apply a modified binary search twice to find two indices in the array arr, and then print all items between them. Finding the two indices takes O(log n) time.
A modified form of binary search looks like this (it is written for arrays, but also applies to a vector):
// Returns the index of the first element greater than key in arr[start..end]
// (both bounds inclusive), or end+1 if no element is greater than key.
// For an inclusive range [lo, hi], search for lo-1 and for hi.
int binary_search(const int *arr,int start,int end,int key)
{
    while(start<=end)
    {
        int mid=start+(end-start)/2;
        if(arr[mid]>key)end=mid-1;   // answer is at mid or to its left
        else start=mid+1;            // answer is to the right of mid
    }
    return start;
}
If the range of the integers in vector A is not wide, a bitmap is worth considering.
Let's assume all integers of A are non-negative and less than 1024; the bitmap can then be built with:
#include <bitset>
// ...
// If fixed size is not an option
// consider vector<bool> or boost::dynamic_bitset
std::bitset<1024> bitmap;
for(auto i : A)
    bitmap.set(i);
That takes N iterations to set the bits and about R/8 bytes to store them, where R is the size of the value range (1024 here). With the bitmap, one can match elements as follows:
std::vector<int> result;
for(auto i = startA; i <= endA; ++i) { // inclusive range, as in the question
    if (bitmap[i]) result.emplace_back(i);
}
Hence the speed of the matching depends on the size of the range rather than on N. This solution should be attractive when you have many narrow ranges to match.

Find n largest values in a vector

I currently have a vector and need to find the n largest numbers in it. For example, if a user enters 5, I have to run through it and output the 5 largest. The problem is, I cannot sort this vector due to other constraints. What's the best way to go about this?
Thanks!
Based on your description of not modifying the original vector and my assumption that you want the order to matter, I suggest std::partial_sort_copy:
//assume vector<int> as source
std::vector<int> dest(n); //receives the largest n numbers
std::partial_sort_copy(
    std::begin(source), std::end(source), //.begin/.end in C++98/C++03
    std::begin(dest), std::end(dest),
    std::greater<int>() //remove "int" in C++14
);
//output dest however you want, e.g., std::copy
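Wrapped into a complete function, the same call might look like the sketch below. The name largest_n_copy is hypothetical, and clamping the destination size to source.size() is an added assumption so that no default-initialized slots remain when n exceeds the number of elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the n largest values of `source`
// in descending order, leaving `source` unmodified.
std::vector<int> largest_n_copy(const std::vector<int>& source, std::size_t n) {
    // Never allocate more slots than there are elements to copy.
    std::vector<int> dest(std::min(n, source.size()));
    std::partial_sort_copy(source.begin(), source.end(),
                           dest.begin(), dest.end(),
                           std::greater<int>());
    return dest;
}
```

For a source of {4, 9, 1, 7, 3, 8} and n = 3 this produces {9, 8, 7}.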
Is copying and sorting an option? I mean if your application is not that performance critical, this is the simplest (and asymptotically not too bad) way to go!
Something like this (A is the incoming vector, N is how many of the largest values you want to find, and v becomes the result vector):
vector<T> v(N, 0); // use the lowest value of T instead of 0 if A may hold negative numbers
for each element in A:
    if (element > v[N-1])
        for(i = N-1; i > 0 && v[i-1] < element; i--)
            v[i] = v[i-1];
        v[i] = element;
This is some sort of "pseudo-C++", not exactly C++, but hopefully describes how you'd do this.
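A runnable sketch of the same insertion idea follows. The name largest_n_scan is hypothetical, and instead of pre-filling the result with a sentinel value it lets the result grow up to N elements, which is a small deviation that also handles negative inputs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: single pass over A, keeping the N largest values
// seen so far in `v`, sorted in descending order. A is not modified.
std::vector<int> largest_n_scan(const std::vector<int>& A, std::size_t N) {
    std::vector<int> v;
    if (N == 0) return v;
    v.reserve(N);
    for (int element : A) {
        if (v.size() < N) {
            v.push_back(element);          // still room: just append
        } else if (element > v.back()) {
            v.back() = element;            // replace the current smallest
        } else {
            continue;                      // not among the N largest
        }
        // Move the new value forward until the order is descending again.
        for (std::size_t i = v.size() - 1; i > 0 && v[i - 1] < v[i]; --i)
            std::swap(v[i - 1], v[i]);
    }
    return v;
}
```

Each element costs at most N comparisons, so the whole scan is O(|A| * N); for large N a heap or std::partial_sort_copy is likely the better fit.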

Adding to middle of std::vector

Is there a way to add values to the middle of a vector in C++? Say I have:
vector <string> a;
// a gets filled up with "abcd", "wertyu", "dvcea", "eafdefef", "aeefr", etc
and I want to break up one of the strings and put all of the pieces back into the vector. How would I do that? The string I break can be anywhere: index = 0, somewhere in the middle, or index = a.size() - 1.
You can insert into a vector at position i by writing
v.insert(v.begin() + i, valueToInsert);
However, this isn't very efficient; it runs in time proportional to the number of elements after the element being inserted. If you're planning on splitting up the strings and adding them back in, you are much better off using a std::list, which supports O(1) insertion and deletion everywhere.
You can do that, but it will be really slow:
int split = 3; // where to split
a.insert(a.begin()+index, a[index].substr(0, split));
a[index+1] = a[index+1].substr(split);
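The same split can be written as a small function that avoids copying the tail. This is a sketch only: the name split_in_place is hypothetical, and it assumes index < a.size() and split <= a[index].size() (substr throws std::out_of_range otherwise).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: replaces a[index] by its two pieces,
// a[index][0..split) followed by the remainder.
void split_in_place(std::vector<std::string>& a, std::size_t index, std::size_t split) {
    std::string head = a[index].substr(0, split);  // first piece
    a[index].erase(0, split);                      // a[index] keeps the tail
    // Inserting before `index` shifts the tail to position index + 1.
    a.insert(a.begin() + static_cast<std::ptrdiff_t>(index), std::move(head));
}
```

Splitting "wertyu" at position 3 in {"abcd", "wertyu", "dvcea"} gives {"abcd", "wer", "tyu", "dvcea"}. The insert still shifts every later element, so the cost remains linear in the number of elements after index.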
This example finds the middle of the vector dynamically and inserts a new element there.
std::vector <std::string> friends;
friends.push_back("Ali");
friends.push_back("Kemal");
friends.push_back("Akin");
friends.push_back("Veli");
friends.push_back("Hakan");
// finding middle using size() / 2
int middleIndexRef = friends.size() / 2;
friends.insert(friends.begin() + middleIndexRef, "Bob");