Calculate the union of an ordered set in C++


I would like to combine three variants of run-length encoding schemes (the run lengths are cumulative, hence "variants").
Let's start with two of them:
The first one contains a list of booleans, the second a list of counters. Let's say that the first looks as follows: (value:position of that value):
[(true:6), (false:10), (true:14), (false:20)]
// From 1 to 6, the value is true
// From 7 to 10, the value is false
// From 11 to 14, the value is true
// From 15 to 20, the value is false
The second looks as follows (again (value:position of that value)):
[(1:4), (2:8), (4:16), (0:20)]
// From 1 to 4, the value is 1
// From 5 to 8, the value is 2
// From 9 to 16, the value is 4
// From 17 to 20, the value is 0
As you can see, the positions are slightly different in both cases:
Case 1 : [6, 10, 14, 20]
Case 2 : [4, 8, 16, 20]
I would like to combine those "position arrays", by calculating their union:
[4, 6, 8, 10, 14, 16, 20]
Once I have this, I would derive from there the new schemes:
[(true:4), (true:6), (false:8), (false:10), (true:14), (false:16), (false:20)]
[(1:4), (2:6), (2:8), (4:10), (4:14), (4:16), (0:20)]
I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
Thanks
Dominique

You'll want to use std::set_union from <algorithm>.
I use a std::vector<int> here, but any container holding sorted elements will do.
#include <algorithm> // std::set_union
#include <iostream>
#include <iterator> // std::back_inserter
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
std::vector<int> c;
std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(c));
for(auto e: c) {
std::cout << e << ' ';
}
std::cout << '\n';
}
If you'd like to keep only the two std::vectors without introducing c, you can append b to a, sort a, and then remove the duplicates with std::unique followed by erase. There may be a clever way to do this in O(n), but here's the naïve O(n log n) approach:
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
a.insert(a.end(), b.begin(), b.end());
std::sort(a.begin(), a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
Finally, because both halves are already sorted, you can use std::inplace_merge instead of std::sort. It needs only O(n) comparisons when it can allocate a temporary buffer and O(n log n) otherwise, so it is never worse than std::sort here and is usually faster:
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
auto a_size = a.size();
a.insert(a.end(), b.begin(), b.end());
// merge point is where `a` and `b` meet: at the end of original `a`.
std::inplace_merge(a.begin(), a.begin() + a_size, a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
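To connect this back to the question: once the union of the positions is known, each scheme can be re-expressed at those positions. The following is only a sketch under assumptions, not part of the original answer. The names union_positions and resample are made up for illustration, and each scheme is assumed to be stored as a std::vector of (value, end position) pairs sorted by position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: sorted union of two sorted position lists.
std::vector<int> union_positions(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// Hypothetical helper: re-express a cumulative run-length scheme so that it
// has one (value, position) entry for every position in `positions`.
// Assumes `runs` and `positions` are sorted by position and that the last
// run ends at or after the last requested position.
template <typename T>
std::vector<std::pair<T, int>> resample(
        const std::vector<std::pair<T, int>>& runs,
        const std::vector<int>& positions) {
    std::vector<std::pair<T, int>> out;
    out.reserve(positions.size());
    std::size_t r = 0;
    for (int pos : positions) {
        // Advance to the first run whose end position covers `pos`.
        while (r < runs.size() && runs[r].second < pos) ++r;
        assert(r < runs.size()); // `pos` must be covered by some run
        out.emplace_back(runs[r].first, pos);
    }
    return out;
}
```

With the question's data, resampling the boolean scheme at [4, 6, 8, 10, 14, 16, 20] yields [(true:4), (true:6), (false:8), (false:10), (true:14), (false:16), (false:20)], and the counter scheme yields [(1:4), (2:6), (2:8), (4:10), (4:14), (4:16), (0:20)].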

I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
I guess you didn't do much research before asking this question. There's a class template that manages an ordered set, called std::set (from <set>). If you insert all the elements of two sets into a single set, you get their union.
std::set<int> s1{6, 10, 14, 20};
std::set<int> s2{4, 8, 16, 20};
// `union` is a reserved keyword in C++, so the result needs another name
std::set<int> result = s1;
result.insert(s2.begin(), s2.end());

As hinted at by erip, there is an algorithm that only requires you to iterate both vectors once. As a precondition, both of them have to be sorted at the start. You can use that fact to always check which one is smaller, and only append a value from that vector to the result. It also allows you to remove duplicates, because if you want to add a value, that value will only be a duplicate if it is the last value added to the result vector.
I have whipped up some code; I haven't run extensive tests on it, so it may still be a little buggy, but here you go:
// Assume a and b are the input vectors, and they are sorted.
std::vector<int> result;
// We know how many elements we will get at most, so prevent reallocations
result.reserve(a.size() + b.size());
auto aIt = a.cbegin();
auto bIt = b.cbegin();
// Loop until we have reached the end of both vectors
while(aIt != a.cend() || bIt != b.cend())
{
// We pick the next value in a if it is smaller than the next value in b.
// Of course we cannot do this if we are at the end of a.
// If b has no more items, we also take the value from a.
if(aIt != a.cend() && (bIt == b.cend() || *aIt < *bIt))
{
// Skip this value if it equals the last added value
// (of course, for result.back() we need it to be nonempty)
if(result.size() == 0 || *aIt != result.back())
{
result.push_back(*aIt);
}
++aIt;
}
// We take the value from b if a has no more items,
// or if the next item in a was greater than the next item in b
else
{
// If we get here, then either aIt == a.end(), in which case bIt != b.end() (see loop condition)
// or bIt != b.end() and *aIt >= *bIt.
// So in either case we can safely dereference bIt here.
if(result.size() == 0 || *bIt != result.back())
{
result.push_back(*bIt);
}
++bIt;
}
}
It allows some optimizations in both style and performance but I think it works overall.
Of course, if you want the result back in a, you could modify this algorithm to insert directly into a, but it's probably faster to keep it like this and just a.swap(result) at the end.

Related

Copying non-sequential columns from an array into another array C++ and removing duplicates based on 1 column

I want to copy columns from a std::vector<std::vector<double> > into another std::vector<std::vector<double> > in C++. This question answers that but only deals with the case where all the columns are in a sequence. In my case, the inner std::vector has 8 elements {C1, C2, C3, C4, C5, C6, C7, C8}. The new object needs to contain {C4, C5, C6, C8} and all the rows. Is there a way to do it directly?
After this step, I will be manipulating this to remove the duplicate rows and write it into a file. Also, please suggest which activity to do first (deleting "columns" or duplicates).
Just to put things in perspective - the outer std::vector has ~2 billion elements and after removing duplicates, I will end up with ~50 elements. So, a method that is faster and memory efficient is highly preferred.

I would use std::transform.
It could look like this:
#include <algorithm> // transform
#include <vector>
#include <iostream>
#include <iterator> // back_inserter
int main() {
std::vector<std::vector<double>> orig{
{1,2,3,4,5,6,7,8},
{11,12,13,14,15,16,17,18},
};
std::vector<std::vector<double>> result;
result.reserve(orig.size());
std::transform(orig.begin(), orig.end(), std::back_inserter(result),
[](auto& iv) -> std::vector<double> {
return {iv[3], iv[4], iv[5], iv[7]};
});
// print the result:
for(auto& inner : result) {
for(auto val : inner) std::cout << val << ' ';
std::cout << '\n';
}
}
Output:
4 5 6 8
14 15 16 18
Note: If any of the inner vector<double>s in orig has fewer than 8 elements, the transformation will access that vector out of bounds (with undefined behavior as a result), so make sure they all have the required number of elements.
Or using C++20 ranges to create the resulting vector from a transformation view:
#include <iostream>
#include <ranges> // views::transform
#include <vector>
int main() {
std::vector<std::vector<double>> orig{
{1, 2, 3, 4, 5, 6, 7, 8},
{11, 12, 13, 14, 15, 16, 17, 18},
};
auto trans = [](auto&& iv) -> std::vector<double> {
return {iv[3], iv[4], iv[5], iv[7]};
};
auto tview = orig | std::views::transform(trans);
std::vector<std::vector<double>> result(tview.begin(), tview.end());
// print the result:
for (auto& inner : result) {
for (auto val : inner) std::cout << val << ' ';
std::cout << '\n';
}
}
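On the question of which step to do first: with roughly 2 billion rows and only about 50 distinct results expected, one option is to select the columns and drop duplicates in the same pass, so the reduced copy is never materialized. The following is a minimal sketch, assuming exact equality of the doubles is the right notion of "duplicate". Row4 and unique_selected_columns are hypothetical names, not something from the answer above.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical fixed-size row type holding the four selected columns.
using Row4 = std::array<double, 4>;

// Selects columns C4, C5, C6 and C8 (indices 3, 4, 5, 7) from every row and
// keeps each distinct combination once. std::array compares
// lexicographically, so it can be used as a std::set key directly.
std::set<Row4> unique_selected_columns(
        const std::vector<std::vector<double>>& rows) {
    std::set<Row4> seen;
    for (const auto& row : rows) {
        assert(row.size() >= 8); // every row must have all eight columns
        seen.insert(Row4{row[3], row[4], row[5], row[7]});
    }
    return seen;
}
```

The set only ever holds the distinct rows, so its memory use depends on the number of unique results rather than on the size of the input.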

How can I improve the following lambda function for finding the first 4 elements in a vector

For practice, I am trying to copy the first 4 entries different than 2 from a vector of integers using copy_if.
This seems to work but if there is a better way of writing this lambda then I'd like to learn the proper way. Cheers.
vector<int> first_vector = {2,8,50,2,4,5,9,12};
vector<int> second_vector (first_vector.size());
int count_elem=0;
auto it = copy_if(first_vector.begin(),first_vector.end(),second_vector.begin(),
[&count_elem]
(int i){
if(i!=2 && count_elem!=4)
{
count_elem++;
return 1;
}
return 0;});

Since you are not copying all of the values from first_vector to second_vector, you should not initialize second_vector to hold the same number of elements as first_vector. You are creating more elements than you want, where the extra elements are value-initialized to 0.
I would suggest reserve()'ing the size of second_vector instead and then use std::back_inserter as the destination iterator to copy to. That way, second_vector ends up with only the values you want pushed and nothing else.
That would also eliminate the need for the count_elem variable. You can use second_vector.size() to know how many values have been pushed into the vector.
std::vector<int> first_vector = {2, 8, 50, 2, 4, 5, 9, 12};
std::vector<int> second_vector;
second_vector.reserve(4);
std::copy_if(
first_vector.begin(), first_vector.end(),
std::back_inserter(second_vector),
[&](int i){
return ((i != 2) && (second_vector.size() < 4));
}
);
Do note, however, that this use of std::copy_if() will iterate through the entire first_vector; it will not stop iterating once 4 values have been pushed to second_vector. It would be more efficient to simply run your own loop instead so you can break out of it as soon as possible:
std::vector<int> first_vector = {2, 8, 50, 2, 4, 5, 9, 12};
std::vector<int> second_vector;
second_vector.reserve(4);
for(int i : first_vector) {
if (i != 2) {
second_vector.push_back(i);
if (second_vector.size() == 4)
break;
}
}

How to get iterator to a certain object in a container?

I'm currently trying to copy a part of a std::vector, starting with the first value until a sequence of values has been "encountered". I’m using mainly STL algorithms and especially std::find_if() (I know there are other ways to accomplish the goal stated in the first sentence, but I'm mainly doing this to understand the STL, so using them would be defeating the underlying purpose).
As an example, let's say a vector holding integer elements (originalvec in the code) is to be copied until first a 6 and then in direct succession a 7 is encountered. I know how to compare for the 6, and then I would like to compare in the same call of the lambda if behind the 6 there is a 7. I think (not sure) for that, I would need to get an iterator to the 6, then use either std::advance() or just operator++ on the iterator and compare the dereferenced value to 7. However, I do not know how to get an iterator to the 6/the number currently compared?
#include <algorithm>
#include <vector>
using namespace std;
int main() {
vector <int> originalvec = { 4, 8, 7, 6, 55, 2, 6, 7, 8 };
vector <int> newvec;
copy(originalvec.begin(),
find_if(originalvec.begin(), originalvec.end(), [](int curnum) {
return (curnum == 6);
}),
back_inserter(newvec));
//why does newvec.begin() (instead of back_inserter(newvec)) not work?
//current result: newvec = {4, 8, 7}
//wanted result : newvec = {4, 8, 7, 6, 55, 2}
/*wanted function is roughly in this style:
copy(originalvec.begin(),
find_if(originalvec.begin(), originalvec.end(), [](int curnum) {
return (curnum == 6 && [curnum* +1] == 7);
}),
back_inserter(newvec));
*/
return 0;
}

You can use std::adjacent_find in this case:
auto it = std::adjacent_find( originalvec.begin(), originalvec.end(), []( int i1, int i2 ) {
return i1 == 6 and i2 == 7;
} );
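To show how the iterator returned by std::adjacent_find could feed the copy from the question, here is a small sketch. The function name copy_until_pair is made up for illustration, and it assumes that when the pair is not found, copying the whole vector is acceptable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: copies the elements of `in` that come before the
// first place where `first_val` is directly followed by `second_val`.
// If no such pair exists, the whole vector is copied.
std::vector<int> copy_until_pair(const std::vector<int>& in,
                                 int first_val, int second_val) {
    auto it = std::adjacent_find(
        in.begin(), in.end(),
        [=](int lhs, int rhs) { return lhs == first_val && rhs == second_val; });
    // adjacent_find returns an iterator to the first element of the pair.
    assert(it == in.end() || *it == first_val);
    return std::vector<int>(in.begin(), it);
}
```

For originalvec = { 4, 8, 7, 6, 55, 2, 6, 7, 8 } and the pair (6, 7), this gives { 4, 8, 7, 6, 55, 2 }, which is the result the question asks for.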

You can use a custom find function, for example (not tested)
template <typename It, typename T>
It find_succ(It begin, It end, T v1, T v2)
{
if (begin == end)
return end;
It next = begin;
++next;
while (next != end) {
if (*begin == v1 && *next == v2)
return begin;
++begin, ++next;
}
return end;
}

What's the difference between std::merge and std::set_union?

The question is clear, my google- and cplusplus.com/reference-fu is failing me.

std::set_union will contain those elements that are present in both sets only once. std::merge will contain them twice.
For example, with A = {1, 2, 5}; B = {2, 3, 4}:
union will give C = {1, 2, 3, 4, 5}
merge will give D = {1, 2, 2, 3, 4, 5}
Both work on sorted ranges, and return a sorted result.
Short example:
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>
int main()
{
std::set<int> A = {1, 2, 5};
std::set<int> B = {2, 3, 4};
std::vector<int> out;
std::set_union(std::begin(A), std::end(A), std::begin(B), std::end(B),
std::back_inserter(out));
for (auto i : out)
{
std::cout << i << " ";
}
std::cout << '\n';
out.clear();
std::merge(std::begin(A), std::end(A), std::begin(B), std::end(B),
std::back_inserter(out));
for (auto i : out)
{
std::cout << i << " ";
}
std::cout << '\n';
}
Output:
1 2 3 4 5
1 2 2 3 4 5

std::merge keeps all elements from both ranges, with equivalent elements from the first range preceding equivalent elements from the second range in the output. Where equivalent elements appear in both ranges, std::set_union takes only the element from the first range; otherwise each element is merged in order, as with std::merge.
References: ISO/IEC 14882:2003 25.3.4 [lib.alg.merge] and 25.3.5.2 [lib.set.union].

This is the verification I suggested in a comment on the accepted answer: if an element is present N times in one of the input ranges, it appears N times in the output of set_union. So set_union does not remove duplicate equivalent items in the way we would 'naturally' or 'mathematically' expect. Only when both input ranges contain a common item exactly once does set_union appear to remove the duplicate.
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>
using namespace std;
void printer(int i) { cout << i << ", "; }
int main() {
int mynumbers1[] = { 0, 1, 2, 3, 3, 4 }; // this is sorted, 3 is dupe
int mynumbers2[] = { 5 }; // this is sorted
vector<int> union_result(10);
set_union(mynumbers1, mynumbers1 + sizeof(mynumbers1)/sizeof(int),
mynumbers2, mynumbers2 + sizeof(mynumbers2)/sizeof(int),
union_result.begin());
for_each(union_result.begin(), union_result.end(), printer);
return 0;
}
This will print: 0, 1, 2, 3, 3, 4, 5, 0, 0, 0,
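The same point can be made when the duplicated element occurs in both ranges. As far as I know, the rule is that an element occurring m times in the first range and n times in the second occurs max(m, n) times in the output of std::set_union and m + n times in the output of std::merge. A short sketch to check this, with union_of and merge_of as illustrative wrapper names:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Illustrative wrapper around std::set_union for sorted vectors.
std::vector<int> union_of(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// Illustrative wrapper around std::merge for sorted vectors.
std::vector<int> merge_of(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out));
    return out;
}
```

With a = {1, 2, 2, 2} and b = {2, 2, 3}, union_of gives {1, 2, 2, 2, 3} (three copies of 2) and merge_of gives {1, 2, 2, 2, 2, 2, 3} (five copies).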

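std::merge keeps every element from both ranges, duplicates included, while std::set_union keeps an element that appears in both ranges only once. That is, the latter applies the rule of the union operation of set theory. Duplicates within a single range are still preserved: an element that appears m times in one range and n times in the other appears max(m, n) times in the result.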

To add to the previous answers: beware that std::set_union may perform up to twice as many comparisons as std::merge. In practice, this means the comparator in std::set_union may be applied to an element after it has been dereferenced, while with std::merge this is never the case.
Why may this be important? Consider something like:
std::vector<Foo> lhs, rhs;
std::vector<Foo> result; // `union` is a keyword, so it cannot be used as a variable name
And you want to produce a union of lhs and rhs:
std::set_union(std::cbegin(lhs), std::cend(lhs),
std::cbegin(rhs), std::cend(rhs),
std::back_inserter(result));
But now suppose Foo is not copyable, or is very expensive to copy and you don't need the originals. You may think to use:
std::set_union(std::make_move_iterator(std::begin(lhs)),
std::make_move_iterator(std::end(lhs)),
std::make_move_iterator(std::begin(rhs)),
std::make_move_iterator(std::end(rhs)),
std::back_inserter(result));
But this is unsafe, because a moved-from Foo may end up being compared, and a moved-from object is generally left in an unspecified state! The correct solution is therefore:
std::merge(std::make_move_iterator(std::begin(lhs)),
std::make_move_iterator(std::end(lhs)),
std::make_move_iterator(std::begin(rhs)),
std::make_move_iterator(std::end(rhs)),
std::back_inserter(result));
result.erase(std::unique(std::begin(result), std::end(result)), std::end(result));
Which has the same complexity as std::set_union.

Inplace union sorted vectors

I'd like an efficient method for doing the inplace union of a sorted vector with another sorted vector. By inplace, I mean that the algorithm shouldn't create a whole new vector or other storage to store the union, even temporarily. Instead, the first vector should simply grow by exactly the number of new elements.
Something like:
void inplace_union(vector & A, const vector & B);
Where, afterwards, A contains all of the elements of A union B and is sorted.
std::set_union in <algorithm> won't work because it overwrites its destination, which would be A.
Also, can this be done with just one pass over the two vectors?
Edit: elements that are in both A and B should only appear once in A.

I believe you can use the algorithm std::inplace_merge. Here is the sample code:
void inplace_union(std::vector<int>& a, const std::vector<int>& b)
{
const std::size_t mid = a.size(); //Store the end of first sorted range
//First copy the second sorted range into the destination vector
std::copy(b.begin(), b.end(), std::back_inserter(a));
//Then perform the in place merge on the two sub-sorted ranges.
std::inplace_merge(a.begin(), a.begin() + mid, a.end());
//Remove duplicate elements from the sorted vector
a.erase(std::unique(a.begin(), a.end()), a.end());
}

Yes, this can be done in-place, and in O(n) time, assuming both inputs are sorted, and with one pass over both vectors. Here's how:
Extend A (the destination vector) by B.size() - make room for our new elements.
Iterate backwards over the two vectors, starting from the end of B and the original end of A. If the vectors are sorted small → big (big at the end), then take the iterator pointing at the larger number, and stick it in the true end of A. Keep going until B's iterator hits the beginning of B. Reverse iterators should prove especially nice here.
Example:
A: [ 1, 3, 4, 9 ]
B: [ 3, 7, 11 ]
* = iterator, ^ = where we're inserting, _ = uninitialized
A: [ 1, 3, 4, 9*, _, _, _^ ] B: [ 3, 7, 11* ]
A: [ 1, 3, 4, 9*, _, _^, 11 ] B: [ 3, 7*, 11 ]
A: [ 1, 3, 4*, 9, _^, 9, 11 ] B: [ 3, 7*, 11 ]
A: [ 1, 3, 4*, 9^, 7, 9, 11 ] B: [ 3*, 7, 11 ]
A: [ 1, 3*, 4^, 4, 7, 9, 11 ] B: [ 3*, 7, 11 ]
A: [ 1, 3*^, 3, 4, 7, 9, 11 ] B: [ 3, 7, 11 ]
Super edit: Have you considered std::inplace_merge? (which I may have just re-invented?)
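The backward pass described above could be sketched roughly as follows. This is an illustration rather than the answerer's code: the function names are made up and int elements are assumed. Because the pass as described is a merge, the duplicates the question wants removed are erased afterwards with std::unique, so it is one pass for the merge plus one for the clean-up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Merges sorted `b` into sorted `a` in place by filling `a` from the back.
// Duplicates are kept, exactly as in the trace above.
void inplace_merge_backward(std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::size_t i = a.size();       // one past the last unread element of the original a
    std::size_t j = b.size();       // one past the last unread element of b
    a.resize(a.size() + b.size());  // make room for b's elements
    std::size_t k = a.size();       // one past the next slot to write
    // Invariant: k == i + j, so while j > 0 the write slot k - 1 is never
    // below index i and no unread element of a gets overwritten.
    while (j > 0) {
        if (i > 0 && a[i - 1] > b[j - 1]) {
            a[--k] = a[--i];
        } else {
            a[--k] = b[--j];
        }
    }
    // Once b is exhausted, the remaining elements of a are already in place.
}

// Union variant: merge backward, then drop adjacent duplicates.
void inplace_union_backward(std::vector<int>& a, const std::vector<int>& b) {
    inplace_merge_backward(a, b);
    a.erase(std::unique(a.begin(), a.end()), a.end());
}
```

For A = [1, 3, 4, 9] and B = [3, 7, 11], the merge leaves A as [1, 3, 3, 4, 7, 9, 11], and the union variant reduces that to [1, 3, 4, 7, 9, 11]. Note that resize may still reallocate A if its capacity is too small.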

The set_difference idea is good, but the disadvantage is we don't know how much we need to grow the vector in advance.
This is my solution which does the set_difference twice, once to count the number of extra slots we'll need, and once again to do the actual copy.
Note: that means we will iterate over the source twice.
#include <algorithm>
#include <boost/function_output_iterator.hpp>
// for the test
#include <vector>
#include <iostream>
struct output_counter
{
output_counter(size_t & r) : result(r) {}
template <class T> void operator()(const T & x) const { ++result; }
private:
size_t & result;
};
// Target is assumed to work the same as a vector
// Both target and source must be sorted
template <class Target, class It>
void inplace_union( Target & target, It src_begin, It src_end )
{
const size_t mid = target.size(); // Store the end of first sorted range
// first, count how many items we will copy
size_t extra = 0;
std::set_difference(
src_begin, src_end,
target.begin(), target.end(),
boost::make_function_output_iterator(output_counter(extra)));
if (extra > 0) // don't waste time if nothing to do
{
// reserve the exact memory we will require
target.reserve( target.size() + extra );
// Copy the items from the source that are missing in the destination
std::set_difference(
src_begin, src_end,
target.begin(), target.end(),
std::back_inserter(target) );
// Then perform the in place merge on the two sub-sorted ranges.
std::inplace_merge( target.begin(), target.begin() + mid, target.end() );
}
}
int main()
{
std::vector<int> a(3), b(3);
a[0] = 1;
a[1] = 3;
a[2] = 5;
b[0] = 4;
b[1] = 5;
b[2] = 6;
inplace_union(a, b.begin(), b.end());
for (size_t i = 0; i != a.size(); ++i)
std::cout << a[i] << ", ";
std::cout << std::endl;
return 0;
}
Compiled with the boost headers, the result is:
$ ./test
1, 3, 4, 5, 6,
