What's the difference between std::merge and std::set_union?

The question is clear, my google- and cplusplus.com/reference-fu is failing me.

std::set_union will contain those elements that are present in both sets only once. std::merge will contain them twice.
For example, with A = {1, 2, 5}; B = {2, 3, 4}:
union will give C = {1, 2, 3, 4, 5}
merge will give D = {1, 2, 2, 3, 4, 5}
Both work on sorted ranges, and return a sorted result.
Short example:
#include <algorithm>
#include <iostream>
#include <iterator> // std::back_inserter
#include <set>
#include <vector>
int main()
{
std::set<int> A = {1, 2, 5};
std::set<int> B = {2, 3, 4};
std::vector<int> out;
std::set_union(std::begin(A), std::end(A), std::begin(B), std::end(B),
std::back_inserter(out));
for (auto i : out)
{
std::cout << i << " ";
}
std::cout << '\n';
out.clear();
std::merge(std::begin(A), std::end(A), std::begin(B), std::end(B),
std::back_inserter(out));
for (auto i : out)
{
std::cout << i << " ";
}
std::cout << '\n';
}
Output:
1 2 3 4 5
1 2 2 3 4 5

std::merge keeps all elements from both ranges, with equivalent elements from the first range preceding equivalent elements from the second range in the output. Where equivalent elements appear in both ranges, std::set_union takes only the element from the first range; otherwise each element is merged in order, as with std::merge.
References: ISO/IEC 14882:2003 25.3.4 [lib.alg.merge] and 25.3.5.2 [lib.set.union].
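To make the "element from the first range" rule visible, here is a minimal sketch. The Tagged type, the by_key comparator and the two helper functions are made up for illustration; only std::set_union and std::merge come from the standard library. Records are compared by key only, so two records with the same key but different tags are equivalent without being equal.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: an integer key plus a tag naming the range it came from.
using Tagged = std::pair<int, std::string>;

// Compare by key only, so records with equal keys are "equivalent".
inline bool by_key(const Tagged& x, const Tagged& y) { return x.first < y.first; }

// Union of two key-sorted ranges; for keys present in both, the record from `a` is kept.
std::vector<Tagged> union_by_key(const std::vector<Tagged>& a, const std::vector<Tagged>& b)
{
    std::vector<Tagged> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out), by_key);
    return out;
}

// Merge of two key-sorted ranges; equivalent records from `a` come before those from `b`.
std::vector<Tagged> merge_by_key(const std::vector<Tagged>& a, const std::vector<Tagged>& b)
{
    std::vector<Tagged> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out), by_key);
    return out;
}
```

With a = {(1,"A"), (2,"A")} and b = {(2,"B"), (3,"B")}, union_by_key should yield (1,"A"), (2,"A"), (3,"B"), while merge_by_key should yield (1,"A"), (2,"A"), (2,"B"), (3,"B").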

This is the verification I suggested in my comment on the accepted answer: if an element is present N times in one of the input ranges (and no more often in the other), it will appear N times in the output of set_union. So set_union does not remove duplicate equivalent items in the way we would 'naturally' or 'mathematically' expect. Only when both input ranges contain a common item exactly once does set_union appear to remove the duplicate.
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>
using namespace std;
void printer(int i) { cout << i << ", "; }
int main() {
int mynumbers1[] = { 0, 1, 2, 3, 3, 4 }; // this is sorted, 3 is dupe
int mynumbers2[] = { 5 }; // this is sorted
vector<int> union_result(10);
set_union(mynumbers1, mynumbers1 + sizeof(mynumbers1)/sizeof(int),
mynumbers2, mynumbers2 + sizeof(mynumbers2)/sizeof(int),
union_result.begin());
for_each(union_result.begin(), union_result.end(), printer);
return 0;
}
This will print: 0, 1, 2, 3, 3, 4, 5, 0, 0, 0,
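The same point can be checked when the repeated value occurs in both ranges. The sketch below uses two hypothetical helpers, sorted_union and sorted_merge, which only wrap the standard algorithms: a value that occurs m times in the first range and n times in the second should occur max(m, n) times in the union and m + n times in the merge.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sorted union of two sorted vectors: a value occurring m times in `a`
// and n times in `b` occurs max(m, n) times in the result.
std::vector<int> sorted_union(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Sorted merge of two sorted vectors: the same value occurs m + n times.
std::vector<int> sorted_merge(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

For a = {3, 3} and b = {3, 3, 3, 7}, sorted_union gives {3, 3, 3, 7} and sorted_merge gives {3, 3, 3, 3, 3, 7}.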

std::merge merges all elements, without eliminating duplicates, while std::set_union outputs an element that appears in both ranges only once. That is, the latter applies the rule of the union operation of set theory.

To add to the previous answers: beware that std::set_union may perform about twice as many comparisons as std::merge (at most 2*(N1+N2)-1 versus N1+N2-1, where N1 and N2 are the lengths of the input ranges). In practice, this means the comparator in std::set_union may be applied to an element after it has been dereferenced, while with std::merge this is never the case.
Why may this be important? Consider something like:
std::vector<Foo> lhs, rhs;
And you want to produce a union of lhs and rhs:
std::vector<Foo> result;
std::set_union(std::cbegin(lhs), std::cend(lhs),
std::cbegin(rhs), std::cend(rhs),
std::back_inserter(result));
But now suppose Foo is not copyable, or is very expensive to copy, and you don't need the originals. You may think to use:
std::set_union(std::make_move_iterator(std::begin(lhs)),
std::make_move_iterator(std::end(lhs)),
std::make_move_iterator(std::begin(rhs)),
std::make_move_iterator(std::end(rhs)),
std::back_inserter(result));
But this is undefined behaviour, as there is a possibility of a moved-from Foo being compared! The correct solution is therefore:
std::merge(std::make_move_iterator(std::begin(lhs)),
std::make_move_iterator(std::end(lhs)),
std::make_move_iterator(std::begin(rhs)),
std::make_move_iterator(std::end(rhs)),
std::back_inserter(result));
result.erase(std::unique(std::begin(result), std::end(result)), std::end(result));
Which has the same complexity as std::set_union.
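As a self-contained illustration of the merge-then-unique approach, here is a sketch that uses std::string in place of Foo (an assumption, chosen only because it is cheap to move and comparable). The function name move_union is hypothetical. Note that std::unique also collapses duplicates that were already present within a single input, which std::set_union would not do.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Consumes two sorted vectors and returns their sorted, duplicate-free
// combination, moving the strings instead of copying them.
std::vector<std::string> move_union(std::vector<std::string>& lhs,
                                    std::vector<std::string>& rhs)
{
    std::vector<std::string> result;
    result.reserve(lhs.size() + rhs.size());

    // std::merge compares two elements and only then moves one of them out,
    // so an element is not compared again after it has been moved.
    std::merge(std::make_move_iterator(lhs.begin()), std::make_move_iterator(lhs.end()),
               std::make_move_iterator(rhs.begin()), std::make_move_iterator(rhs.end()),
               std::back_inserter(result));

    // Equal elements are adjacent after the merge; drop the repeats.
    result.erase(std::unique(result.begin(), result.end()), result.end());

    // The inputs now hold moved-from strings; clear them to avoid accidental use.
    lhs.clear();
    rhs.clear();
    return result;
}
```

Calling move_union on {"a", "c"} and {"b", "c"} should return {"a", "b", "c"} and leave both inputs empty.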

Related

STL function to get all dimensions of a vector in c++

I wish to know if there is any STL function in c++, to get the dimensions of a vector.
For example,
vec = [[1, 2, 3],
[4, 5, 6]]
The dimensions are (2, 3)
I am aware of size() function. But the function does not return dimensions.
In the above example, vec.size() would have returned 2.
To get second dimension, I would have to use vec[0].size(), which would be 3

In C++, a std::vector is, by definition, a one-dimensional sequence of size() elements, and that size can change at runtime.
You can define a vector of vectors (e.g., std::vector<std::vector<int>>), but that doesn't have a constraint that the 'inner' dimensions are the same. E.g., {{1, 2, 3}, {1, 2}} is valid.
Therefore, inner dimensions are ambiguous. What you can do, if you maintain it to be the same and if you're sure that you've got elements, is to query v[0].size() as well, and so on.
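If you want to verify the "all inner vectors have the same size" assumption rather than trust it, something along these lines could work. This is only a sketch: rectangular_dims is a made-up helper, not a standard function, and reporting an empty outer vector as 0 x 0 is an arbitrary choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Writes the dimensions to `rows` and `cols` and returns true if every inner
// vector has the same length; returns false (outputs untouched) otherwise.
// An empty outer vector is reported as 0 x 0.
template <typename T>
bool rectangular_dims(const std::vector<std::vector<T>>& v,
                      std::size_t& rows, std::size_t& cols)
{
    const std::size_t width = v.empty() ? 0 : v.front().size();
    const bool same_width = std::all_of(v.begin(), v.end(),
        [width](const std::vector<T>& row) { return row.size() == width; });
    if (!same_width) {
        return false;
    }
    rows = v.size();
    cols = width;
    return true;
}
```

For {{1, 2, 3}, {4, 5, 6}} this reports (2, 3); for the ragged {{1, 2, 3}, {1, 2}} it returns false.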

As said in lorro's answer, you likely want to find the dimensions of a std::vector<std::vector<int>>.
Finding the outer dimension is easy, since all you have to do is vec.size(). But the inner vectors can be of any length, and don't have to be the same length. Assuming you want the minimum, this is doable with STL functions.
We can use std::transform to fill a vector with the sizes of the inner vectors, and then use std::min_element to find the smallest of those.
#include <vector>
#include <algorithm>
#include <iostream>
int main() {
std::vector<std::vector<int>> vec = {{1, 2}, {3, 4, 5}};
std::vector<std::size_t> dim(vec.size());
std::transform(vec.cbegin(), vec.cend(), dim.begin(),
[](auto &v) { return v.size(); });
std::size_t min = *std::min_element(dim.cbegin(), dim.cend());
std::cout << "(" << vec.size() << ", " << min << ")\n";
return 0;
}
Output:
(2, 2)

How can I group a million numbers faster than using qsort in C++? [duplicate]

How can I group a series of integers, e.g. [4, 2, 3, 3, 2, 4, 1, 2, 4], to become [4, 4, 4, 2, 2, 2, 3, 3, 1] without using any sorting algorithm?
Note that I don't need the result to be in any sorted order, but I do need the suggested algorithm to group a million numbers faster than qsort.

This should work if you don't care too much about using extra space. It first stores the number of occurrences of each number in an unordered_map and then creates a vector that contains each value in the map, repeated the number of times it was seen in the original vector. See the documentation for insert for how this works. The [] operator for an unordered_map works in O(1) on average. So creating the unordered_map takes O(N) time. Iterating through the map and populating the return vector again takes O(N) time, so this whole thing should run in O(N). Note that this creates two extra copies of the data.
In the worst case, the [] operator takes O(N) time, so the only way to really know if this is faster than qsort would be to measure it.
#include <vector>
#include <unordered_map>
#include <iostream>
std::vector<int> groupNumbers(const std::vector<int> &input)
{
std::vector<int> grouped;
std::unordered_map<int, int> counts;
for (auto &x: input)
{
++counts[x];
}
for (auto &x: counts)
{
grouped.insert(grouped.end(), x.second, x.first);
}
return grouped;
}
// example
int main()
{
std::vector<int> test{1,2,3,4,3,2,3,2,3,4,1,2,3,2,3,4,3,2};
std::vector<int> result(groupNumbers(test));
for (auto &x: result)
{
std::cout << x << std::endl;
}
return 0;
}
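The version above emits the groups in whatever order the unordered_map iterates. If the groups should come out in order of first appearance, as in the question's example, one possible variation is to record each value the first time it is seen. This is a sketch; groupInFirstSeenOrder is a hypothetical name, and like the original it has not been benchmarked against qsort.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Groups equal numbers together, keeping the groups in the order in which
// each value first appears in the input. Average-case O(N).
std::vector<int> groupInFirstSeenOrder(const std::vector<int>& input)
{
    std::vector<int> order;                        // distinct values, first-seen order
    std::unordered_map<int, std::size_t> counts;   // value -> number of occurrences

    for (int x : input) {
        // operator[] value-initializes a missing count to 0, so a count of 0
        // before the increment means this is the first time we see x.
        if (counts[x]++ == 0) {
            order.push_back(x);
        }
    }

    std::vector<int> grouped;
    grouped.reserve(input.size());
    for (int value : order) {
        grouped.insert(grouped.end(), counts[value], value);  // count copies of value
    }
    return grouped;
}
```

For the input [4, 2, 3, 3, 2, 4, 1, 2, 4] this produces [4, 4, 4, 2, 2, 2, 3, 3, 1].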

How to get a minimum value from a vector with (value > 0)

I'm trying to reduce by some value all the elements of a vector that are not 0 or less than 0.
I haven't really tried anything because I can't get my head around this, but I need, for example, from a vector {1, 2, 3, 0, -1}, the value vector[0].
I don't want to sort or remove any value, I need the vector to keep its "structure".
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> R;
//I'm sorry I can't provide any code, I can't get around with this
}
I expect for example: from vectors
A = {1, 2, 3, 4, 0, -1}
B = {53, 36, 205, -8, -45, -93}
to get:
A[0]
B[1]
(Those are just random vectors to make some examples)

You can use a custom accumulation like this:
#include <algorithm>
#include <iostream>
#include <vector>
#include <limits> //std::numeric_limits
#include <numeric> //std::accumulate
#include <iterator> //std::begin, std::end
int main()
{
std::vector<int> R{1, 2, 3, 4, 0, -1};
std::cout << std::accumulate(std::begin(R), std::end(R), std::numeric_limits<int>::max(),
[](int a, int b){if (b > 0) return std::min(a, b); return a;});
}
It returns the maximum value of int (std::numeric_limits<int>::max()) if there is no strictly positive element in the vector.
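The question asks for the position (A[0], B[1]) rather than the value. A plain loop that tracks the index is one way to get that without sorting or modifying the vector. This is a sketch, with index_of_min_positive as a made-up name and v.size() used as the "not found" result by assumption.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the smallest element that is > 0,
// or v.size() if the vector has no strictly positive element.
std::size_t index_of_min_positive(const std::vector<int>& v)
{
    std::size_t best = v.size();  // v.size() means "nothing found yet"
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 0 && (best == v.size() || v[i] < v[best])) {
            best = i;
        }
    }
    return best;
}
```

For A = {1, 2, 3, 4, 0, -1} this returns 0, and for B = {53, 36, 205, -8, -45, -93} it returns 1.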

This is a use case for the rather unknown std::lower_bound:
int smallest_positive(std::vector<int> v)
{
std::sort(begin(v), end(v));
return *std::lower_bound(begin(v), end(v), 1);
}
std::sort(begin(v), end(v));
This sorts a copy of the input vector. For simple cases, this is the best effort/perf you can get ;)
[std::lower_bound] returns an iterator pointing to the first element in the range [first, last) that is not less than (i.e. greater or equal to) value, or last if no such element is found.
std::lower_bound(begin(v), end(v), 1);
This searches the sorted v for the first element that is positive (not less than 1) and returns an iterator to it. Beware: it returns end(v), which must not be dereferenced, if no element of v is positive. I'll let you fix that ;)
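One possible way to handle the "no positive element" case is to check the iterator before dereferencing it. The sketch below swaps in std::upper_bound with 0, which finds the first element greater than 0 and is equivalent to std::lower_bound with 1 for integers. The name smallest_positive_checked and the bool-plus-output-parameter interface are assumptions made for illustration.

```cpp
#include <algorithm>
#include <vector>

// Stores the smallest element > 0 in `out` and returns true,
// or returns false (leaving `out` untouched) if there is none.
// Takes the vector by value so the caller's data is not reordered.
bool smallest_positive_checked(std::vector<int> v, int& out)
{
    std::sort(v.begin(), v.end());
    // First element strictly greater than 0 in the sorted copy.
    auto it = std::upper_bound(v.begin(), v.end(), 0);
    if (it == v.end()) {
        return false;  // no positive element: do not dereference
    }
    out = *it;
    return true;
}
```

For {1, 2, 3, 0, -1} it stores 1 and returns true; for {0, -1} it returns false.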

Calculate the union of an ordered set in C++

I would like to combine three variants of runlength encoding schemes (the runlengths are cumulated, hence the variant).
Let's start with two of them:
The first one contains a list of booleans, the second a list of counters. Let's say that the first looks as follows: (value:position of that value):
[(true:6), (false:10), (true:14), (false:20)]
// From 1 to 6, the value is true
// From 7 to 10, the value is false
// From 11 to 14, the value is true
// From 15 to 20, the value is false
The second looks as follows (again (value:position of that value)):
[(1:4), (2:8), (4:16), (0:20)]
// From 1 to 4, the value is 1
// From 5 to 8, the value is 2
// From 9 to 16, the value is 4
// From 17 to 20, the value is 0
As you can see, the positions are slightly different in both cases:
Case 1 : [6, 10, 14, 20]
Case 2 : [4, 8, 16, 20]
I would like to combine those "position arrays", by calculating their union:
[4, 6, 8, 10, 14, 16, 20]
Once I have this, I would derive from there the new schemes:
[(true:4), (true:6), (false:8), (false:10), (true:14), (false:16), (false:20)]
[(1:4), (2:6), (2:8), (4:10), (4:14), (4:16), (0:20)]
I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
Thanks
Dominique

You'll want to use std::set_union from <algorithm>.
I use a std::vector<int> here, but the element type can be anything that supports operator< (or you can pass a custom comparator).
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
std::vector<int> c;
std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(c));
for(auto e: c) {
std::cout << e << ' ';
}
std::cout << '\n';
}
If you'd like to maintain only two std::vectors without introducing c, you could simply append b to a, sort the array, then call std::unique on a. There may be a clever way to do this in O(n), but here's the naïve approach:
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
a.insert(a.end(), b.begin(), b.end());
std::sort(a.begin(), a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
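Finally, you can use std::inplace_merge instead of std::sort, since both halves of a are already sorted. If enough additional memory is available it performs exactly N-1 comparisons, i.e. O(n); otherwise it falls back to O(n log n), like std::sort. Quite an increase in performance: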
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
auto a_size = a.size();
a.insert(a.end(), b.begin(), b.end());
// merge point is where `a` and `b` meet: at the end of original `a`.
std::inplace_merge(a.begin(), a.begin() + a_size, a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
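Once the union of positions is available, the last step from the question (re-expressing each run-length scheme on the combined positions) could look roughly like this. It is a sketch under assumptions: Runs and resample are made-up names, a scheme is stored as (value, cumulative end position) pairs, and the new positions are sorted and contain every end position of the original scheme.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One run is (value, cumulative end position): the value holds up to and
// including that position.
template <typename V>
using Runs = std::vector<std::pair<V, int>>;

// Re-expresses `runs` on the finer, sorted grid `positions`. Each new position
// takes the value of the first original run whose end position is >= it.
template <typename V>
Runs<V> resample(const Runs<V>& runs, const std::vector<int>& positions)
{
    Runs<V> out;
    out.reserve(positions.size());
    std::size_t r = 0;  // index of the run that covers the current position
    for (int p : positions) {
        while (r < runs.size() && runs[r].second < p) {
            ++r;  // this run ends before p, move on to the next one
        }
        if (r == runs.size()) {
            break;  // p lies beyond the last run; nothing to assign
        }
        out.emplace_back(runs[r].first, p);
    }
    return out;
}
```

With the positions [4, 6, 8, 10, 14, 16, 20], the boolean scheme from the question becomes [(true:4), (true:6), (false:8), (false:10), (true:14), (false:16), (false:20)] and the counter scheme becomes [(1:4), (2:6), (2:8), (4:10), (4:14), (4:16), (0:20)], matching the expected result stated there.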

I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
I guess you didn't do much research before asking this question. There's a class template that manages an ordered set, called set. If you add all the elements of two sets into a single set, you will have the union.
std::set<int> s1{6, 10, 14, 20};
std::set<int> s2{4, 8, 16, 20};
std::set<int> result = s1;
result.insert(s2.begin(), s2.end());

As hinted at by erip, there is an algorithm that only requires you to iterate both vectors once. As a precondition, both of them have to be sorted at the start. You can use that fact to always check which one is smaller, and only append a value from that vector to the result. It also allows you to remove duplicates, because if you want to add a value, that value will only be a duplicate if it is the last value added to the result vector.
I have whipped up some code; I haven't run extensive tests on it, so it may still be a little buggy, but here you go:
// Assume a and b are the input vectors, and they are sorted.
std::vector<int> result;
// We know how many elements we will get at most, so prevent reallocations
result.reserve(a.size() + b.size());
auto aIt = a.cbegin();
auto bIt = b.cbegin();
// Loop until we have reached the end of both vectors
while(aIt != a.cend() || bIt != b.cend())
{
// We pick the next value in a if it is smaller than the next value in b.
// Of course we cannot do this if we are at the end of a.
// If b has no more items, we also take the value from a.
if(aIt != a.end() && (bIt == b.end() || *aIt < *bIt))
{
// Skip this value if it equals the last added value
// (of course, for result.back() we need it to be nonempty)
if(result.size() == 0 || *aIt != result.back())
{
result.push_back(*aIt);
}
++aIt;
}
// We take the value from b if a has no more items,
// or if the next item in a was greater than the next item in b
else
{
// If we get here, then either aIt == a.end(), in which case bIt != b.end() (see loop condition)
// or bIt != b.end() and *aIt >= *bIt.
// So in either case we can safely dereference bIt here.
if(result.size() == 0 || *bIt != result.back())
{
result.push_back(*bIt);
}
++bIt;
}
}
It allows some optimizations in both style and performance but I think it works overall.
Of course, if you want the result back in a, you can modify this algorithm to insert directly into a, but it's probably faster to keep it like this and just call a.swap(result) at the end.

Find max element between two vectors

I thought the following would work but it just outputs zero. Ideas?
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6 };
int max = *std::max(std::max(a.begin(), a.end()), std::max(b.begin(), b.end()));
std::cout << max;

You're using std::max, which compares its arguments. That is, it's returning the greater of the two iterators.
What you want for the inner invocation is std::max_element, which finds the maximum element in a range:
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6 };
int max = std::max(*std::max_element(a.begin(), a.end()), *std::max_element(b.begin(), b.end()));
std::cout << max;
As @MikeSeymour correctly pointed out in the comments, the above code assumes the ranges are not empty, as it unconditionally dereferences the iterators returned from std::max_element. If one of the ranges were empty, the returned iterator would be the past-the-end one, which cannot be dereferenced.

Here's a way that behaves sensibly with empty ranges. If either range is empty, you still get the maximum from the other range. If both ranges are empty, you get INT_MIN.
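// A lambda is used because passing std::max<int> directly can be ambiguous
// between the overloads of std::max.
auto max_of = [](int x, int y) { return std::max(x, y); };
int m = std::accumulate(begin(b), end(b),
std::accumulate(begin(a), end(a), INT_MIN, max_of),
max_of);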
std::accumulate is better here, since you want a value, not an iterator, as the result.

int m = std::max(*std::max_element(a.begin(), a.end()), *std::max_element(b.begin(), b.end()));
This finds the maximum of the maximums of the individual vectors. For example, for the first vector, { 1, 2, 3 }, the max value is 3, and for the second vector, { 4, 5, 6 }, the max value is 6; the max of 3 and 6 is 6.
