How to find median of `std::set` [duplicate]

This question already has answers here:
Efficient way to get middle (median) of an std::set?
(6 answers)
Closed 2 years ago.
I'm trying to find the median of a std::set. Since std::set already sorts everything, I just have to pick the middle element. My idea is to advance to the half: std::advance(e, rtspUrls.size() / 2);, but I'm not sure how it'll behave. What about numbers like 1.5? Will it advance to something?
I'm using a try catch to try to not advance into something undefined. Is this safe?
According to http://www.cplusplus.com/reference/algorithm/min_element/?kw=min_element, std::advance throws if the iterator throws. I'm not sure if the iterator for std::set throws when we try to ++ it (https://en.cppreference.com/w/cpp/named_req/BidirectionalIterator does not say anything).
std::set<RTSPUrl, decltype(compare_rtsp_url)*> rtspUrls(compare_rtsp_url);
std::set<RTSPUrl, decltype(compare_rtsp_url)*>::iterator e = rtspUrls.begin();
for (const RTSPUrl &rtspUrl : stream.rtsp_urls())
{
    if (rtspUrl.has_resolution())
    {
        rtspUrls.insert(rtspUrl); // std::set has insert(), not push_back()
    }
}
try
{
    std::advance(e, rtspUrls.size() / 2);
    return *e;
}
catch (std::exception &e)
{
    return std::nullopt;
}

I just have to pick the middle element. My idea is to advance to the half: std::advance(e, rtspUrls.size() / 2);, but I'm not sure how it'll behave. What about numbers like 1.5? Will it advance to something?
rtspUrls.size() returns an unsigned integer (size_t), so rtspUrls.size() / 2 is an integer division and never produces 1.5: a size of 3 gives 1.
I'm not sure if the iterator for std::set throws when we try to ++
No it will not, but advancing beyond end() is undefined.
A true median for a set with an even number of elements would take the average of the two middle elements, but that requires that the type you store in your std::set supports both + and /. Example:
std::set<double> foo{1., 2., 3., 10.};
if(foo.empty()) throw std::runtime_error("no elements in set");
double median;
if(foo.size() % 2 == 0) { // even number of elements
    auto lo = std::next(foo.begin(), foo.size() / 2 - 1);
    auto hi = std::next(lo);
    median = (*lo + *hi) / 2.;
} else { // odd number of elements
    median = *std::next(foo.begin(), foo.size() / 2);
}
std::cout << median << '\n'; // prints 2.5
In your case, RTSPUrl does not look like it supports + and /, so you cannot average two RTSPUrls when the set has an even number of elements; you should probably just pick one of the two middle elements. Either return an iterator (so the caller can then check whether it's rtspUrls.end()):
return std::next(rtspUrls.begin(), rtspUrls.size() / 2);
Or by returning a reference to, or copy of, the element:
if(rtspUrls.empty()) throw std::runtime_error("no elements in set");
return *std::next(rtspUrls.begin(), rtspUrls.size() / 2);

With std::set you are limited to using iterators: iterate to the middle element (for an odd number of entries in your set), or iterate to middle-1 and middle and take the average (for an even number of entries), to determine the median.
A simple loop and a counter is about as straight-forward as it gets. A short example would be:
#include <iostream>
#include <set>
int main (void) {
#ifdef ODD
    std::set<std::pair<char,int>> s {{'a',1}, {'b',2}, {'c',3}, {'d',4}, {'e',5}};
#else
    std::set<std::pair<char,int>> s {{'a',1}, {'b',2}, {'c',3}, {'d',4}, {'e',5}, {'f',6}};
#endif
    double median = 0.;
    size_t n = 0;
    for (auto iter = s.begin(); iter != s.end(); iter++, n++) {
        if (n == s.size() / 2 - 1 && s.size() % 2 == 0) {
            median += iter->second;
            std::cout << iter->first << " " << iter->second << '\n';
        }
        if (n == s.size() / 2) {
            median += iter->second;
            if (s.size() % 2 == 0)
                median /= 2.;
            std::cout << iter->first << " " << iter->second
                      << "\n\nmedian " << median << '\n';
            break;
        }
    }
}
(of course you will have to adjust the types to meet your data)
Example Use/Output
Compiled with ODD defined:
$ ./bin/set_median
c 3
median 3
Compiled without additional definition for the EVEN case:
$ ./bin/set_median
c 3
d 4
median 3.5
std::next
You can use std::next to advance to the nth iterator after the current. You must assign the result:
median = 0.;
auto iter = s.begin();
if (s.size() % 2 == 0) {
    iter = std::next(iter, s.size() / 2 - 1);
    median += iter->second;
    iter = std::next(iter);
    median += iter->second;
    median /= 2.;
}
else {
    iter = std::next(iter, s.size() / 2);
    median += iter->second;
}
std::cout << "\nmedian " << median << '\n';
std::advance
std::advance advances the iterator provided as a parameter to the nth iterator after the current:
median = 0.;
iter = s.begin();
if (s.size() % 2 == 0) {
    std::advance(iter, s.size() / 2 - 1);
    median += iter->second;
    std::advance(iter, 1);
    median += iter->second;
    median /= 2.;
}
else {
    std::advance(iter, s.size() / 2);
    median += iter->second;
}
std::cout << "\nmedian " << median << '\n';
(the output for median is the same as with the loop above)
Look things over and let me know if you have further questions.

I just have to pick the middle element
Only when the set contains an odd number of elements. Otherwise, when the size is even, the median is defined as the mean of the two middle values, sometimes called upper and lower median.
What about numbers like 1.5?
You will never get that since rtspUrls.size() / 2 is an integer division that truncates any decimal places.
I think passing a float or double as the second parameter, like std::advance(e, 1.5), is not something you should rely on, even where it compiles.
As far as I can see, the reference does not specify the type of the second parameter. However, the "possible implementations" section always uses the difference type specific to the first parameter, which is usually a signed integral type, and that seems reasonable.
I'm using a try catch to try to not advance into something undefined. Is this safe?
No. Dereferencing or incrementing an invalid iterator is undefined behaviour and is not required to throw any exception. Many implementations provide extensive error checking in debug builds and are kind enough to report an error when UB occurs, but you cannot rely on that. Advancing by half the set's size won't be a problem, though; just make sure the set is not empty before dereferencing the result.

Related

different result between c++ set and unordered_set

I am working on the LeetCode Frog Jump question and found a weird result when I use unordered_set instead of set for the following test case. The unordered_set and the set both have size 4, but it looks like the unordered_set doesn't loop through all elements.
[0,1,2,3,4,5,6,7,8,9,10,11]
output :
set size 4
1
2
3
4
unordered set size: 4
1
I have been struggling for hours but can't find any reason. Any tips would be really appreciated.
bool canCross(vector<int>& stones) {
    unordered_map<int, set<int>> dp;
    unordered_map<int, unordered_set<int>> dp1;
    unordered_set<int> s(stones.begin(), stones.end());
    dp[0].insert(0);
    dp1[0].insert(0);
    for (int i = 0; i < stones.size(); ++i) {
        if (i == 10) cout << "set size " << dp[stones[i]].size() << endl;
        for (auto a: dp[stones[i]]) {
            if (i == 10) cout << a << "\t" << endl;
            int b = stones[i];
            if (s.count(b + a - 1)) {
                dp[b + a - 1].insert(a - 1);
            }
            if (s.count(b + a)) {
                dp[b + a].insert(a);
            }
            if (s.count(b + a + 1)) {
                dp[b + a + 1].insert(a + 1);
            }
        }
        if (i == 10) cout << "unordered set size: " << dp1[stones[i]].size() << endl;
        for (auto a: dp1[stones[i]]) {
            if (i == 10) cout << a << "\t" << endl;
            int b = stones[i];
            if (s.count(b + a - 1)) {
                dp1[b + a - 1].insert(a - 1);
            }
            if (s.count(b + a)) {
                dp1[b + a].insert(a);
            }
            if (s.count(b + a + 1)) {
                dp1[b + a + 1].insert(a + 1);
            }
        }
    }
    return !dp[stones.back()].empty();
}

It happens because some of your insertions modify the same container that you are currently iterating over with a for loop. Not surprisingly, insertions into set and into unordered_set might end up in different positions in the linear sequence of container elements. In one container the new element ends up in front of the current position and is later visited by the loop. In the other container the new element ends up behind the current position and is never seen by the loop.
It is generally not a good idea to modify a container that you are currently iterating over with a range-based for loop. It might not produce any undefined behavior in your case (if you are using associative containers with stable iterators), but still... in my opinion range-based for should be reserved for iterating over non-changing containers.
In your case insertion of a new element into an std::unordered_set may trigger rehashing and invalidate all iterators of that unordered_set. It means that if that unordered_set is currently being iterated over by a range-based for, you end up with undefined behavior.

Reversing sequence of integers

I'm learning C++ and recently got into this problem:
Reverse a sequence of positive integers read from std::cin; the sequence ends when -1 is encountered. This -1 should not be part of the sequence. Print out the reversed sequence, which must also end with -1.
So, I've written pretty straightforward code that does exactly that, it might be not the best in terms of performance, as if I counted right, overall O(N^2 / 2).
int n = 0;
vector<int> numbers; //placeholder vector for input
while (n != -1) {
    cin >> n;
    numbers.push_back(n);
}
numbers.erase(numbers.end() - 1); // -1 should not be the part of the vector, so I erase it
n = numbers.size() - 1;
for (int i = 0; i < n / 2; ++i) { //swapping
    int tmp = numbers[i];
    numbers[i] = numbers[n - i];
    numbers[n - i] = tmp;
}
for (auto a : numbers) //printing out
    cout << a << " "; //each integer in input and output is separated by spacebar
cout << -1; //last element should be '-1'
Unfortunately, this code passes 4/10 test cases which was quite shocking for me.
I would be very grateful if anyone could give me a hint about what is wrong with my code, or any general advice about performance.

Your algorithm is linear; there are no performance issues with it. It looks like you've got a problem with computing the indexes while swapping array elements.
You would be better off not adding -1 to the vector in the first place. Moreover, reversing should be done using std::reverse. You also should pay attention to premature end of the input to ensure that your program does not hang if -1 is never entered:
std::vector<int> numbers;
int n;
while (std::cin >> n) {
    if (n == -1) break;
    numbers.push_back(n);
}
std::reverse(numbers.begin(), numbers.end());
Your output section looks good, although you should add std::endl or '\n' to the end of your output:
std::cout << -1 << std::endl;
You can also write the entire vector to std::cout using std::copy:
std::copy(numbers.begin(), numbers.end(), std::ostream_iterator<int>(std::cout, " "));
Edit: It's for studying purpose, so I can't use std::reverse
Then you should rewrite your loop with iterators and std::iter_swap:
auto first = numbers.begin();
auto last = numbers.end();
while ((first != last) && (first != --last)) {
    std::iter_swap(first++, last);
}
In general, you want to avoid indexes in favor of iterators in order for your code to be idiomatic C++, and avoid potential off-by-one problems.

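Your problem shows up when the length of the vector is even.
For example, if you have 4 elements, n = 4 - 1 = 3.
Your loop will then run only one step, as n / 2 = 3 / 2 = 1, so the two middle elements are never swapped.
To fix this, just change your loop to for (int i = 0; i <= n / 2; ++i). Note that this still assumes at least one number was entered: for an empty vector n ends up as -1 and the loop would index out of bounds.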

Calculating Prime Numbers using Sets, C++

I am trying to calculate the prime numbers using a set but when I do the calculation my iterator is jumping randomly.
I am trying to implement this method for a value of N=10.
Choose an integer n. This function will compute all prime numbers up
to n. First insert all numbers from 1 to n into a set. Then erase all
multiples of 2 (except 2); that is, 4, 6, 8, 10, 12, .... Erase all
multiples of 3, that is, 6, 9, 12, 15, ... . Go up to sqrt(n) . The
remaining numbers are all primes.
When I run my code, it erases 1 and then pos jumps to 4? I am not sure why this happens instead of it going to the value 2 which is the 2nd value in the set?
Also what happens after I erase a value that the iterator is pointing to, what does the iterator point to then and if I advance it where does it advance?
Here is the code:
set<int> sieveofEratosthenes(int n){ //n = 10
    set<int> a;
    set<int>::iterator pos = a.begin();
    //generate set of values 1-10
    for (int i = 1; i <= n; i++) {
        a.insert(i);
        if(pos != a.end())
            pos++;
    }
    pos = a.begin();
    //remove prime numbers
    while (pos != a.end())
    {
        cout << "\nNew Iteration \n\n";
        for (int i = 1; i < sqrt(n); i++) {
            int val = *pos%i;
            cout << "Pos = " << *pos << "\n";
            cout << "I = " << i << "\n";
            cout << *pos << "/" << i << "=" << val << "\n\n";
            if (val == 0) {
                a.erase(i);
            }
        }
        pos++;
    }
    return a;
}

Your implementation is incorrect in that it is trying to combine the sieve algorithm with the straightforward algorithm of trying out divisors, and it does so unsuccessfully. You do not need to test divisibility to implement the sieve - in fact, that's a major contributor to the beauty of the algorithm! You do not even need multiplication.
a.erase(1);
pos = a.begin();
while (pos != a.end()) {
    int current = *pos++;
    // "remove" is the number to remove.
    // Start it at twice the current number
    int remove = current + current;
    while (remove <= n) {
        a.erase(remove);
        // Add the current number to get the next item to remove
        remove += current;
    }
}

When erasing elements inside a loop you have to be careful with the indices. For example, when you erase the element at position 0, then the next element is now at position 0. Thus the loop should look something like this:
for (int i = 1; i < sqrt(n); /*no increment*/) {
    /* ... */
    if (val == 0) {
        a.erase(i);
    } else {
        i++;
    }
}
Actually, you also have to take care that the size of the set shrinks while you erase elements. Thus you had better use iterators, and continue from the iterator that erase returns, because the erased one is invalidated:
for (auto it = a.begin(); it != a.end(); /*no increment*/) {
    /* ... */
    if (val == 0) {
        it = a.erase(it);
    } else {
        it++;
    }
}
PS: the above is not exactly what you need for the sieve, but it should be sufficient to demonstrate how to erase elements (I hope so).

lower_bound for more than one value in c++

Suppose I have a sorted vector of numbers from 0 to 1. I want to know the indices, where values become larger than multiples of 0.1 (i.e. the deciles. in the future maybe also percentiles).
A simple solution I have in mind is using std::lower_bound:
std::vector<float> v;
/// something which fills the vector here
std::sort(v.begin(),v.end());
std::vector<float>::iterator i = v.begin();
for (float k = 0.1 ; k < 0.99 ; k+= 0.1) {
    i = std::lower_bound (v.begin(), v.end(), k);
    std::cout << "reached " << k << " at position " << (i-v.begin()) << std::endl;
    std::cout << " going from " << *(i-1) << " to " << *i << std::endl;
    // for simplicity of the example, I don't check if i is the first item of the vector
}
Since the vector can be long, I was wondering if this can be made faster. A first optimisation is to not search the part of the vector below the previous decile:
i = std::lower_bound (i, v.end(), k);
But, assuming lower_bound performs a binary search, this still scans the entire upper part of the vector for each decile over and over again and doesn't use the intermediate results from the previous binary search.
So ideally I would like to use a search function to which I can pass multiple search items, somehow like:
float searchvalues[9];
for (int k = 1; k <= 9 ; ++k) {
    searchvalues[k - 1] = ((float)k)/10.; // indices 0..8 hold 0.1 .. 0.9
}
int deciles[9] = FANCY_SEARCH(v.begin(),v.end(),searchvalues,9);
Is there anything like this already available in the standard library, Boost, or other libraries?

To be in O(log n), you may use the following:
void fill_range(
    std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>& ranges,
    const std::vector<float>& v,
    std::size_t b,
    std::size_t e)
{
    if (b == e) {
        return;
    }
    int decile_b = v[b] / 0.1f;
    int decile_e = v[e - 1] / 0.1f;
    if (decile_b == decile_e) {
        auto& range = ranges[decile_b];
        if (range) {
            range->first = std::min(range->first, b);
            range->second = std::max(range->second, e);
        } else {
            range = std::make_pair(b, e);
        }
    } else {
        std::size_t mid = (b + e + 1) / 2;
        fill_range(ranges, v, b, mid);
        fill_range(ranges, v, mid, e);
    }
}
std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>
decile_ranges(const std::vector<float>& v)
{
    // assume sorted `v` with value x: 0 <= x < 1
    std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u> res;
    fill_range(res, v, 0, v.size());
    return res;
}
But a linear search seems simpler:
auto last = v.begin();
for (int i = 0; i != 10; ++i) {
    // start at `last`: everything before it belongs to earlier deciles
    const auto it = std::find_if(last, v.end(),
                                 [i](float f) {return f >= (i + 1) * 0.1f;});
    // ith decile ranges from `last` to `it`;
    last = it;
}

There isn't anything in Boost or the C++ Standard Library. Two choices for an algorithm, bearing in mind that both vectors are sorted:
(1) O(N): trundle through the sorted vector, considering the elements of your quantile vector as you go.
(2) O(Log N * Log M): Start with the middle quantile. Call lower_bound. The result of this becomes the higher iterator in a subsequent lower_bound call on the set of quantiles below that pivot and the lower iterator in a subsequent lower_bound call on the set of quantiles above that pivot. Repeat the process for both halves.
For percentiles, my feeling is that (1) will be the faster choice, and is considerably simpler to implement.

Find First Missing Element in a vector

This question has been asked before but I cannot find it for C++.
If I have a vector and a starting number, does the <algorithm> header provide a way to find the next highest missing number?
I can obviously write this in a nested loop; I just can't shake the feeling that I'm reinventing the wheel.
For example, given: vector<int> foo{13,8,3,6,10,1,7,0};
The starting number 0 should find 2.
The starting number 6 should find 9.
The starting number -2 should find -1.
EDIT:
Thus far all the solutions require sorting. This may in fact be required, but a temporary sorted vector would have to be created to accommodate this, as foo must remain unchanged.

At least as far as I know, there's no standard algorithm that directly implements exactly what you're asking for.
If you wanted to do it with something like O(N log N) complexity, you could start by sorting the input. Then use std::upper_bound to find the first element greater than the number you've asked for (that is, the position just past its last instance, if present). From there, you'd scan for a difference greater than 1 between consecutive numbers in the collection; the missing number is one more than the element just before that gap.
One way to do this in real code would be something like this:
#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <iterator>
int find_missing(std::vector<int> x, int number) {
    std::sort(x.begin(), x.end());
    auto pos = std::upper_bound(x.begin(), x.end(), number);
    // nothing larger than number, or a gap directly after it
    if (pos == x.end() || *pos - number > 1)
        return number + 1;
    else {
        std::vector<int> diffs;
        std::adjacent_difference(pos, x.end(), std::back_inserter(diffs));
        auto pos2 = std::find_if(diffs.begin() + 1, diffs.end(), [](int x) { return x > 1; });
        return *(pos + (pos2 - diffs.begin() - 1)) + 1;
    }
}
int main() {
    std::vector<int> x{ 13, 8, 3, 6, 10, 1,7, 0};
    std::cout << find_missing(x, 0) << "\n";
    std::cout << find_missing(x, 6) << "\n";
}
This is somewhat less than what you'd normally think of as optimal to provide the external appearance of a vector that can/does remain un-sorted (and unmodified in any way). I've done that by creating a copy of the vector, and sorting the copy inside the find_missing function. Thus, the original vector remains unmodified. The disadvantage is obvious: if the vector is large, copying it can/will be expensive. Furthermore, this ends up sorting the vector for every query instead of sorting once, then carrying out as many queries as desired on it.

So I thought I'd post an answer. I don't know of anything in <algorithm> that accomplishes this directly, but in combination with vector<bool> you can do this in O(N) (two linear passes).
template <typename T>
T find_missing(const vector<T>& v, T elem){
    vector<bool> range(v.size());
    elem++;
    for_each(v.begin(), v.end(), [&](const T& i){ if (i >= elem && i - elem < range.size()) range[i - elem] = true; });
    auto result = distance(range.begin(), find(range.begin(), range.end(), false));
    return result + elem;
}

First you need to sort the vector. Use std::sort for that.
std::lower_bound finds the first element that is greater than or equal to a given value (the range has to be at least partitioned with respect to that value).
From there you iterate while you have consecutive elements.
Dealing with duplicates: One way is the way I went: consider consecutive and equal elements when iterating. Another approach is to add a prerequisite that the vector / range contains unique elements. I chose the former because it avoids erasing elements.
Here is how you eliminate duplicates from a sorted vector:
v.erase(std::unique(v.begin(), v.end()), v.end());
My implementation:
// finds the first missing element in the vector v
// prerequisite: v must be sorted
auto firstMissing(std::vector<int> const &v, int elem) -> int {
    auto low = std::lower_bound(std::begin(v), std::end(v), elem);
    if (low == std::end(v) || *low != elem) {
        return elem;
    }
    while (low + 1 != std::end(v) &&
           (*low == *(low + 1) || *low + 1 == *(low + 1))) {
        ++low;
    }
    return *low + 1;
}
And a generalized version:
// finds the first missing element in the range [first, last)
// prerequisite: the range must be sorted
template <class It, class T = decltype(*std::declval<It>())>
auto firstMissing(It first, It last, T elem) -> T {
    auto low = std::lower_bound(first, last, elem);
    if (low == last || *low != elem) {
        return elem;
    }
    while (std::next(low) != last &&
           (*low == *std::next(low) || *low + 1 == *std::next(low))) {
        std::advance(low, 1);
    }
    return *low + 1;
}
Test case:
int main() {
    auto v = std::vector<int>{13, 8, 3, 6, 10, 1, 7, 7, 7, 0};
    std::sort(v.begin(), v.end());
    for (auto n : {-2, 0, 5, 6, 20}) {
        std::cout << n << ": " << firstMissing(v, n) << std::endl;
    }
    return 0;
}
Result:
-2: -2
0: 2
5: 5
6: 9
20: 20
A note about sorting: From the OP's comments he was searching for a solution that wouldn't modify the vector.
You have to sort the vector for an efficient solution. If modifying the vector is not an option you could create a copy and work on it.
If you are hell-bent on not sorting, there is a brute force solution (very very inefficient - O(n^2)):
auto max = std::max_element(std::begin(v), std::end(v));
if (elem > *max) {
    return elem;
}
auto i = elem;
while (std::find(std::begin(v), std::end(v), i) != std::end(v)) {
    ++i;
}
return i;

First solution:
Sort the vector. Find the starting number and see what number is next.
This will take O(NlogN) where N is the size of vector.
Second solution:
If the range of numbers is small e.g. (0,M) you can create boolean vector of size M. For each number of initial vector make the boolean of that index true. Later you can see next missing number by checking the boolean vector. This will take O(N) time and O(M) auxiliary memory.
