Inplace union sorted vectors

Inplace union sorted vectors - c++

I'd like an efficient method for doing the inplace union of a sorted vector with another sorted vector. By inplace, I mean that the algorithm shouldn't create a whole new vector or other storage to store the union, even temporarily. Instead, the first vector should simply grow by exactly the number of new elements.
Something like:
void inplace_union(vector & A, const vector & B);
Where, afterwards, A contains all of the elements of A union B and is sorted.
std::set_union in <algorithm> won't work because it overwrites its destination, which would be A.
Also, can this be done with just one pass over the two vectors?
Edit: elements that are in both A and B should only appear once in A.

I believe you can use the algorithm std::inplace_merge. Here is the sample code:
void inplace_union(std::vector<int>& a, const std::vector<int>& b)
{
int mid = a.size(); //Store the end of first sorted range
//First copy the second sorted range into the destination vector
std::copy(b.begin(), b.end(), std::back_inserter(a));
//Then perform the in place merge on the two sub-sorted ranges.
std::inplace_merge(a.begin(), a.begin() + mid, a.end());
//Remove duplicate elements from the sorted vector
a.erase(std::unique(a.begin(), a.end()), a.end());
}

Yes, this can be done in-place, and in O(n) time, assuming both inputs are sorted, and with one pass over both vectors. Here's how:
Extend A (the destination vector) by B.size() - make room for our new elements.
Iterate backwards over the two vectors, starting from the end of B and the original end of A. If the vectors are sorted small → big (big at the end), then compare the two elements the iterators point at, copy the larger one into the last free slot of A, and move that iterator back by one. Keep going until B's iterator hits the beginning of B. Reverse iterators should prove especially nice here.
Example:
A: [ 1, 2, 4, 9 ]
B: [ 3, 7, 11 ]
* = iterator, ^ = where we're inserting, _ = uninitialized
A: [ 1, 2, 4, 9*, _, _, _^ ] B: [ 3, 7, 11* ]
A: [ 1, 2, 4, 9*, _, _^, 11 ] B: [ 3, 7*, 11 ]
A: [ 1, 2, 4*, 9, _^, 9, 11 ] B: [ 3, 7*, 11 ]
A: [ 1, 2, 4*, 9^, 7, 9, 11 ] B: [ 3*, 7, 11 ]
A: [ 1, 2*, 4^, 4, 7, 9, 11 ] B: [ 3*, 7, 11 ]
A: [ 1, 2*^, 3, 4, 7, 9, 11 ] B: [ 3, 7, 11 ]
Super edit: Have you considered std::inplace_merge? (which I may have just re-invented?)
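A minimal sketch of the backward pass described above, extended so that values present in both vectors end up in A only once, as the question's edit asks. It assumes each input is sorted in strictly increasing order (no repeats within A or within B), and it makes one extra counting pass over both vectors so that A grows by exactly the number of new elements. The name inplace_union_backward is made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not from the thread.
// Assumption: a and b are each sorted in strictly increasing order.
void inplace_union_backward(std::vector<int>& a, const std::vector<int>& b)
{
    // Pass 1: count the elements of b that are missing from a.
    std::size_t extra = 0;
    for (std::size_t i = 0, j = 0; j < b.size(); ) {
        if (i == a.size() || b[j] < a[i]) { ++extra; ++j; } // only in b
        else if (a[i] < b[j])             { ++i; }          // only in a
        else                              { ++i; ++j; }     // in both
    }
    if (extra == 0) return; // nothing new to add

    std::size_t i = a.size();   // one past the last unread element of a
    std::size_t j = b.size();   // one past the last unread element of b
    a.resize(a.size() + extra); // grow by exactly the number of new elements
    std::size_t out = a.size(); // one past the next slot to write

    // Pass 2: fill a from the back, always writing the larger value.
    while (j > 0) {
        if (i > 0 && b[j - 1] < a[i - 1]) {
            --out; --i;
            a[out] = a[i];      // shift an element of a towards the end
        } else if (i > 0 && !(a[i - 1] < b[j - 1])) {
            --j;                // equal values: skip b's copy, keep a's
        } else {
            --out; --j;
            a[out] = b[j];      // a new element coming from b
        }
    }
    // Whatever is left of a (indices below i) is already in place.
}
```

For A = [1, 2, 4, 9] and B = [3, 7, 11] this leaves A as [1, 2, 3, 4, 7, 9, 11], matching the trace above.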

The set_difference idea is good, but the disadvantage is we don't know how much we need to grow the vector in advance.
This is my solution which does the set_difference twice, once to count the number of extra slots we'll need, and once again to do the actual copy.
Note: that means we will iterate over the source twice.
#include <algorithm>
#include <boost/function_output_iterator.hpp>
// for the test
#include <vector>
#include <iostream>
struct output_counter
{
output_counter(size_t & r) : result(r) {}
template <class T> void operator()(const T & x) const { ++result; }
private:
size_t & result;
};
// Target is assumed to work the same as a vector
// Both target and source must be sorted
template <class Target, class It>
void inplace_union( Target & target, It src_begin, It src_end )
{
const size_t mid = target.size(); // Store the end of first sorted range
// first, count how many items we will copy
size_t extra = 0;
std::set_difference(
src_begin, src_end,
target.begin(), target.end(),
boost::make_function_output_iterator(output_counter(extra)));
if (extra > 0) // don't waste time if nothing to do
{
// reserve the exact memory we will require
target.reserve( target.size() + extra );
// Copy the items from the source that are missing in the destination
std::set_difference(
src_begin, src_end,
target.begin(), target.end(),
std::back_inserter(target) );
// Then perform the in place merge on the two sub-sorted ranges.
std::inplace_merge( target.begin(), target.begin() + mid, target.end() );
}
}
int main()
{
std::vector<int> a(3), b(3);
a[0] = 1;
a[1] = 3;
a[2] = 5;
b[0] = 4;
b[1] = 5;
b[2] = 6;
inplace_union(a, b.begin(), b.end());
for (size_t i = 0; i != a.size(); ++i)
std::cout << a[i] << ", ";
std::cout << std::endl;
return 0;
}
Compiled with the boost headers, the result is:
$ ./test
1, 3, 4, 5, 6,

Related

c++ Algorithm to Compare various length vectors and isolate "unique", sort of

I have a complex problem and have been trying to identify what needs to be a very, very efficient algorithm. I'm hoping I can get some ideas from you helpful folks. Here is the situation.
I have a vector of vectors. These nested vectors are of various length, all storing integers in a random order, such as (pseudocode):
vector_list = {
{ 1, 4, 2, 3 },
{ 5, 9, 2, 1, 3, 3 },
{ 2, 4, 2 },
...,
100 more,
{ 8, 2, 2, 4 }
}
and so on, up to over 100 different vectors at a time inside vector_list. Note that the same integer can appear in each vector more than once. I need to remove from this vector_list any vectors that are duplicates of another vector. A vector is a duplicate of another vector if:
It has the same integers as the other vector (regardless of order). So if we have
vec1 = { 1, 2, 3 }
vec2 = { 2, 3, 1 }
These are duplicates and I need to remove one of them; it doesn't matter which one.
A vector contains all of the other integers of the other vector. So if we have
vec1 = { 3, 2, 2 }
vec2 = { 4, 2, 3, 2, 5 }
vec2 has all of the ints of vec1 and is bigger, so I need to delete vec1 in favor of vec2.
The problem is as I mentioned the list of vectors can be very big, over 100, and the algorithm may need to run as many as 1000 times on a button click, with a different group of 100+ vectors over 1000 times. Hence the need for efficiency. I have considered the following:
Sorting the vectors may make life easier, but as I said, this has to be efficient, and I'd rather not sort if I don't have to.
It's more complicated by the fact that the vectors aren't in any order with respect to their size. For example, if the vectors in the list were ordered by size:
vector_list = {
{ },
{ },
{ },
{ },
{ },
...
{ },
{ }
}
It might make life easier, but that seems like it would take a lot of effort and I'm not sure about the gain.
The best effort I've had so far to try and solve this problem is:
// list of vectors, just 4 for illustration, but in reality more like 100, with lengths from 5 to 15 integers long
std::vector<std::vector<int>> vector_list;
vector_list.push_back({9});
vector_list.push_back({3, 4, 2, 8, 1});
vector_list.push_back({4, 2});
vector_list.push_back({1, 3, 2, 4});
std::vector<int>::iterator it;
int i;
int j;
int k;
// to test if a smaller vector is a duplicate of a larger vector, i copy the smaller vector, then
// loop through ints in the larger vector, seeing if i can find them in the copy of the smaller. if i can,
// i remove the item from the smaller copy, and if the size of the smaller copy reaches 0, then the smaller vector
// was a duplicate of the larger vector and can be removed.
std::vector<int> copy;
// flag for breaking a for loop below
bool erased_i;
// loop through vector list
for ( i = 0; i < vector_list.size(); i++ )
{
// loop again, so we can compare every vector to every other vector
for ( j = 0; j < vector_list.size(); j++ )
{
// don't want to compare a vector to itself
if ( i != j )
{
// if the vector in i loop is at least as big as the vector in j loop
if ( vector_list[i].size() >= vector_list[j].size() )
{
// copy the smaller j vector
copy = vector_list[j];
// loop through each item in the larger i vector
for ( k = 0; k < vector_list[i].size(); k++ ) {
// if the item in the larger i vector is in the smaller vector,
// remove it from the smaller vector
it = std::find(copy.begin(), copy.end(), vector_list[i][k]);
if (it != copy.end())
{
// erase
copy.erase(it);
// if the smaller vector has reached size 0, then it must have been a smaller duplicate that
// we can delete
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + j);
j--;
}
}
}
}
else
{
// otherwise vector j must be bigger than vector i, so we do the same thing
// in reverse, trying to erase vector i
copy = vector_list[i];
erased_i = false;
for ( k = 0; k < vector_list[j].size(); k++ ) {
it = std::find(copy.begin(), copy.end(), vector_list[j][k]);
if (it != copy.end()) {
copy.erase(it);
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + i);
// put an extra flag so we break out of the j loop as well as the k loop
erased_i = true;
break;
}
}
}
if ( erased_i ) {
// break the j loop because we have to start over with whatever
// vector is now in position i
break;
}
}
}
}
}
std::cout << "ENDING VECTORS\n";
// TERMINAL OUTPUT:
vector_list[0]
[9]
vector_list[1]
[3, 4, 2, 8, 1]
So this function gives me the right results, as these are the 2 unique vectors. It also gives me the correct results if I push the initial 4 vectors in reverse order, so that the smallest one comes last, for example. But it feels so inefficient comparing every vector to every other vector. Plus I have to create these "copies" and try to reduce them to size 0 with every comparison I make, which is very inefficient.
Anyways, any ideas on how I could make this speedier would be much appreciated. Maybe some kind of organization by vector length, I dunno.... It seems wasteful to compare them all to each other.
Thanks!

Loop through the vectors and for each vector, map the count of unique values occurring in it. unordered_map<int, int> would suffice for this, let's call it M.
Also maintain a set<unordered_map<int, int>>, say S, ordered by the size of unordered_map<int, int> in decreasing order.
Now we will have to compare contents of M with the contents of unordered_maps in S. Let's call M', the current unordered_map in S being compared with M. M will be a subset of M' only when the count of all the elements in M is less than or equal to the count of their respective elements in M'. If that's the case then it's a duplicate and we'll not insert. For any other case, we'll insert. Also notice that if the size of M is greater than the size of M', M can't be a subset of M'. That means we can insert M in S. This can be used as a pre-condition to speed things up. Maintain the indices of vectors which weren't inserted in S, these are the duplicates and have to be deleted from vector_list in the end.
Time Complexity: O(N*M) + O(N^2*D) + O(N*log(N)) = O(N^2*D) where N is the number of vectors in vector_list, M is the average size of the vectors in vector_list and D is the average size of unordered_map's in S. This is for the worst case when there aren't any duplicates. For average case, when there are duplicates, the second complexity will come down.
Edit: The above procedure will create a problem. To fix that, we'll need to make unordered_maps of all vectors, store them in a vector V, and sort that vector in decreasing order of the size of unordered_map. Then, we'll start from the biggest in this vector and apply the above procedure on it. This is necessary because, a subset, say M1 of a set M2, can be inserted into S before M2 if the respective vector of M1 comes before the respective vector of M2 in vector_list. So now we don't really need S, we can compare them within V itself. Complexity won't change.
Edit 2: The same problem will occur again if sizes of two unordered_maps are the same in V when sorting V. To fix that, we'll need to keep the contents of unordered_maps in some order too. So just replace unordered_map with map and in the comparator function, if the size of two maps is the same, compare element by element and whenever the keys are not the same for the very first time or are same but the M[key] is not the same, put the bigger element before the other in V.
Edit 3: New Time Complexity: O(N*M*log(D)) + O(N*D*log(N)) + O(N^2*D*log(D)) = O(N^2*D*log(D)). Also you might want to pair the maps with the index of the respective vectors in vector_list so as to know which vector you must delete from vector_list when you find a duplicate in V.
IMPORTANT: In sorted V, we must start checking from the end just to be safe (in case we choose to delete a duplicate from vector_list as well as V whenever we encounter it). So for the last map in V compare it with the rest of the maps before it to check if it is a duplicate.
Example:
vector_list = {
{1, 2, 3},
{2, 3, 1},
{3, 2, 2},
{4, 2, 3, 2, 5},
{1, 2, 3, 4, 6, 2},
{2, 3, 4, 5, 6},
{1, 5}
}
Creating maps of respective vectors:
V = {
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{2->2, 3->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{1->1, 5->1}
}
After sorting:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{1->1, 5->1},
{2->2, 3->1}
}
After deleting duplicates:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 5->1}
}
Edit 4: I tried coding it up. Running it 1000 times on a list of 100 vectors, with the size of each vector in the range [1-250] and the elements in the range [0-50], and assuming the input is available for all 1000 runs, it takes around 2 minutes on my machine. It goes without saying that there is room for improvement in my code (and my machine).
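A minimal sketch of the count-map idea, not the author's code. It makes one simplifying assumption that differs from the edits above: candidates are ordered by vector length (total element count) rather than by map size, since a vector contained in another can never be longer than it, and two equally long vectors can only contain each other if they hold the same values. The names Counts, is_sub_multiset and remove_contained are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Counts = std::map<int, int>;               // value -> number of occurrences
using Entry  = std::pair<Counts, std::size_t>;   // count map + index in vector_list

// True when every value in `small` occurs at least as often in `big`.
bool is_sub_multiset(const Counts& small, const Counts& big)
{
    if (small.size() > big.size()) return false; // cheap pre-check
    for (const auto& kv : small) {
        auto it = big.find(kv.first);
        if (it == big.end() || it->second < kv.second) return false;
    }
    return true;
}

// Hypothetical helper: returns the vectors that are not contained in another one.
std::vector<std::vector<int>>
remove_contained(const std::vector<std::vector<int>>& vector_list)
{
    std::vector<Entry> v;
    for (std::size_t i = 0; i < vector_list.size(); ++i) {
        Counts m;
        for (int x : vector_list[i]) ++m[x];
        v.emplace_back(std::move(m), i);
    }
    // Assumption of this sketch: longest source vectors first.
    std::stable_sort(v.begin(), v.end(), [&](const Entry& l, const Entry& r) {
        return vector_list[l.second].size() > vector_list[r.second].size();
    });

    std::vector<std::size_t> kept; // positions in v that survived so far
    for (std::size_t i = 0; i < v.size(); ++i) {
        bool duplicate = false;
        for (std::size_t k : kept) {
            if (is_sub_multiset(v[i].first, v[k].first)) { duplicate = true; break; }
        }
        if (!duplicate) kept.push_back(i);
    }

    std::vector<std::vector<int>> result;
    for (std::size_t k : kept) result.push_back(vector_list[v[k].second]);
    return result;
}
```

On the seven-vector example above this keeps the same four vectors, returned longest first.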

My approach is to copy the vectors that pass the test to an empty vector.
May be inefficient.
May have bugs.
HTH :)
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main(int, char **) {
using namespace std;
using vector_of_integers = vector<int>;
using vector_of_vectors = vector<vector_of_integers>;
vector_of_vectors in = {
{ 1, 4, 2, 3 }, // unique
{ 5, 9, 2, 1, 3, 3 }, // unique
{ 3, 2, 1 }, // exists
{ 2, 4, 2 }, // exists
{ 8, 2, 2, 4 }, // unique
{ 1, 1, 1 }, // exists
{ 1, 2, 2 }, // exists
{ 5, 8, 2 }, // unique
};
vector_of_vectors out;
// doesnt_contain_vector returns true when no entry in out is a superset of the passed vector
auto doesnt_contain_vector = [&out](const vector_of_integers &in_vector) {
// is_subset returns true when out_vector contains all the integers of in_vector (multiplicity is ignored)
auto is_subset = [&in_vector](const vector_of_integers &out_vector) {
// contained returns true when the vector contains the passed integer
auto contained = [&out_vector](int i) {
return find(out_vector.cbegin(), out_vector.cend(), i) != out_vector.cend();
};
return all_of(in_vector.cbegin(), in_vector.cend(), contained);
};
return find_if(out.cbegin(), out.cend(), is_subset) == out.cend();
};
copy_if(in.cbegin(), in.cend(), back_insert_iterator<vector_of_vectors>(out), doesnt_contain_vector);
// show results
for (auto &vi: out) {
copy(vi.cbegin(), vi.cend(), std::ostream_iterator<int>(std::cout, ", "));
cout << "\n";
}
}

You could try something like this. I use std::sort and std::includes. Perhaps this is not the most effective solution.
// sort all nested vectors
std::for_each(vlist.begin(), vlist.end(), [](auto& v)
{
std::sort(v.begin(), v.end());
});
// sort vector of vectors by length of items
std::sort(vlist.begin(), vlist.end(), [](const vector<int>& a, const vector<int>& b)
{
return a.size() < b.size();
});
// exclude all duplicates
auto i = std::begin(vlist);
while (i != std::end(vlist)) {
if (any_of(i+1, std::end(vlist), [&](const vector<int>& a){
return std::includes(std::begin(a), std::end(a), std::begin(*i), std::end(*i));
}))
i = vlist.erase(i);
else
++i;
}

Calculate the union of an ordered set in C++

I would like to combine three variants of runlength encoding schemes (the runlengths are cumulated, hence the variant).
Let's start with two of them:
The first one contains a list of booleans, the second a list of counters. Let's say that the first looks as follows: (value:position of that value):
[(true:6), (false:10), (true:14), (false:20)]
// From 1 to 6, the value is true
// From 7 to 10, the value is false
// From 11 to 14, the value is true
// From 15 to 20, the value is false
The second looks as follows (again (value:position of that value)):
[(1:4), (2:8), (4:16), (0:20)]
// From 1 to 4, the value is 1
// From 5 to 8, the value is 2
// From 9 to 16, the value is 4
// From 17 to 20, the value is 0
As you can see, the positions are slightly different in both cases:
Case 1 : [6, 10, 14, 20]
Case 2 : [4, 8, 16, 20]
I would like to combine those "position arrays", by calculating their union:
[4, 6, 8, 10, 14, 16, 20]
Once I have this, I would derive from there the new schemes:
[(true:4), (true:6), (false:8), (false:10), (true:14), (false:16), (false:20)]
[(1:4), (2:6), (2:8), (4:10), (4:14), (4:16), (0:20)]
I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
Thanks
Dominique

You'll want to use std::set_union from <algorithm>.
I use a std::vector<int> here, but any element type that can be compared with operator< will work.
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
std::vector<int> c;
std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(c));
for(auto e: c) {
std::cout << e << ' ';
}
std::cout << '\n';
}
If you'd like to maintain only two std::vectors without introducing c, you could simply append b to a, sort the array, then call std::unique on a. There may be a clever way to do this in O(n), but here's the naïve approach:
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
a.insert(a.end(), b.begin(), b.end());
std::sort(a.begin(), a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
Finally, you can use std::inplace_merge instead of std::sort. Both halves are already sorted, so the merge needs only O(n) comparisons when enough temporary memory is available and O(n log n) otherwise, whereas std::sort is always O(n log n):
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> a{6, 10, 14, 20};
std::vector<int> b{4, 8, 16, 20};
auto a_size = a.size();
a.insert(a.end(), b.begin(), b.end());
// merge point is where `a` and `b` meet: at the end of original `a`.
std::inplace_merge(a.begin(), a.begin() + a_size, a.end());
auto last = std::unique(a.begin(), a.end());
a.erase(last, a.end());
for(auto e: a) {
std::cout << e << ' ';
}
std::cout << '\n';
}
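For the last step in the question, deriving the refined schemes once the union of positions is known, one possible sketch follows. It assumes each scheme is stored as a vector of (value, last position covered) pairs sorted by position, and that the position list is sorted as well; the helper name resample is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: re-express a cumulative run-length scheme on a finer,
// sorted list of positions. Each run is (value, last position it covers).
template <class Value>
std::vector<std::pair<Value, int>>
resample(const std::vector<std::pair<Value, int>>& runs,
         const std::vector<int>& positions)
{
    std::vector<std::pair<Value, int>> out;
    out.reserve(positions.size());
    std::size_t r = 0;
    for (int p : positions) {
        // Advance to the first run whose end position covers p.
        while (r < runs.size() && runs[r].second < p) ++r;
        if (r == runs.size()) break; // p lies beyond the last run
        out.emplace_back(runs[r].first, p);
    }
    return out;
}
```

Applied to [(true:6), (false:10), (true:14), (false:20)] and the positions [4, 6, 8, 10, 14, 16, 20], this yields the first refined scheme shown in the question.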

I would like to know: is there any C++ standard type/class which can contain the "arrays" [6, 10, 14, 20] and [4, 8, 16, 20], calculate their union and sort it?
I guess you didn't do much research before asking this question. There's a class template that manages an ordered set, called set. If you add all the elements of two sets into a single set, you will have the union.
std::set<int> s1{6, 10, 14, 20};
std::set<int> s2{4, 8, 16, 20};
std::set<int> result = s1; // `union` is a reserved keyword in C++, so use another name
result.insert(s2.begin(), s2.end());

As hinted at by erip, there is an algorithm that only requires you to iterate both vectors once. As a precondition, both of them have to be sorted at the start. You can use that fact to always check which one is smaller, and only append a value from that vector to the result. It also allows you to remove duplicates, because if you want to add a value, that value will only be a duplicate if it is the last value added to the result vector.
I have whipped up some code; I haven't run extensive tests on it, so it may still be a little buggy, but here you go:
// Assume a and b are the input vectors, and they are sorted.
std::vector<int> result;
// We know how many elements we will get at most, so prevent reallocations
result.reserve(a.size() + b.size());
auto aIt = a.cbegin();
auto bIt = b.cbegin();
// Loop until we have reached the end of both vectors
// (|| rather than &&, otherwise the tail of the longer input is dropped)
while(aIt != a.cend() || bIt != b.cend())
{
// We pick the next value in a if it is smaller than the next value in b.
// Of course we cannot do this if we are at the end of a.
// If b has no more items, we also take the value from a.
if(aIt != a.end() && (bIt == b.end() || *aIt < *bIt))
{
// Skip this value if it equals the last added value
// (of course, for result.back() we need it to be nonempty)
if(result.size() == 0 || *aIt != result.back())
{
result.push_back(*aIt);
}
++aIt;
}
// We take the value from b if a has no more items,
// or if the next item in a was greater than the next item in b
else
{
// If we get here, then either aIt == a.end(), in which case bIt != b.end() (see loop condition)
// or bIt != b.end() and *aIt >= *bIt.
// So in either case we can safely dereference bIt here.
if(result.size() == 0 || *bIt != result.back())
{
result.push_back(*bIt);
}
++bIt;
}
}
It allows some optimizations in both style and performance but I think it works overall.
Of course, if you want the result back in a, you can modify this algorithm to insert directly into a, but it is probably faster to keep it like this and just call a.swap(result) at the end.

How to count unique integers in unordered_set?

A question that might appear trivial, but I am wondering if there's a way of obtaining the count of integers made unique after I transform an array containing repeated integers into an unordered_set. To be clear, I start with some array, turn into an unordered set, and suddenly, the unordered_set only contains unique integers, and I am simply after the repeat number of the integers in the unordered_set.
Is this possible at all? (something like unordered_set.count(index) ?)

A question that might appear trivial, but I am wondering if there's a way of obtaining the count of integers made unique after I transform an array containing repeated integers into an unordered_set.
If the container is contiguous, like an array, then I believe you can count the unique values with a little iterator arithmetic. I'm not sure about non-contiguous containers, though.
Since you start with an array:
Sort the array, then call std::unique on it (std::unique only collapses adjacent duplicates)
std::unique returns an iterator to the new logical end of the range
Calculate a ptrdiff_t count from the begin iterator and that returned end
Note that the calculation in step 3 needs no sizeof adjustment: subtracting two pointers or iterators already yields a number of elements, not bytes.
But to paraphrase Beta, some containers lend themselves to this, and others do not. If you have an unordered set (or a map or a tree), then the information will not be readily available.
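As a rough sketch of those steps, assuming the data lives in a std::vector<int> and that a sorted copy is acceptable (count_distinct is a made-up name):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of distinct values in `values`.
// Takes the vector by value so the caller's data is left untouched.
std::ptrdiff_t count_distinct(std::vector<int> values)
{
    std::sort(values.begin(), values.end());                  // make equal values adjacent
    auto new_end = std::unique(values.begin(), values.end()); // collapse adjacent repeats
    return std::distance(values.begin(), new_end);            // element count, not bytes
}
```

This gives the number of distinct values only; the per-value repeat counts need a map or a second counting pass.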

According to your answer to the user2357112's question I will write a solution.
So, let's assume that instead of unordered_set we will use a vector and our vector has values like this:
{1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
So, we want to get numbers (in different vector I think) of how many times particular value appears in the vector, right? And in this specific case result would be: 1 appears 5 times, 3 appears one time, 4 appears 3 times and 5 appears 2 times.
To get this done, one possible solution can be like this:
Get unique entries from source vector and store them in different vector, so this vector will contain: 1, 3, 4, 5
Iterate through whole unique vector and count these elements from source vector.
Print result
The code from point 1, can be like this:
template <typename Type>
vector<Type> unique_entries (vector<Type> vec) {
for (auto iter = vec.begin (); iter != vec.end (); ++iter) {
auto f = find_if (iter+1, vec.end (), [&] (const Type& val) {
return *iter == val;
});
if (f != vec.end ()) {
vec.erase (remove (iter+1, vec.end (), *iter), vec.end ());
}
}
return vec;
}
The code from point 2, can be like this:
template <typename Type>
struct Properties {
Type key;
long int count;
};
template <typename Type>
vector<Properties<Type>> get_properties (const vector<Type>& vec) {
vector<Properties<Type>> ret {};
auto unique_vec = unique_entries (vec);
for (const auto& uv : unique_vec) {
auto c = count (vec.begin (), vec.end (), uv); // (X)
ret.push_back ({uv, c});
}
return ret;
}
Of course we do not need Properties class to store key and count value, you can return just a vector of int (with count of elements), but as I said, it is one of the possible solutions. So, by using unique_entries we get a vector with unique entries ( :) ), then we can iterate through the whole vector vec (get_properties, using std::count marked as (X)), and push_back Properties object to the vector ret.
The code from point 3, can be like this:
template <typename Type>
void show (const vector<Properties<Type>>& vec) {
for (const auto& v : vec) {
cout << v.key << " " << v.count << endl;
}
}
// usage below
vector<int> vec {1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
auto properties = get_properties (vec);
show (properties);
And the result looks like this:
1 5
3 1
4 3
5 2
It is worth noting that this example is written with templates, so the element type of the vector can be chosen freely. If you want to store values of type long, long long, short, etc. instead of int, all you have to do is change the definition of the source vector, for example:
vector<unsigned long long> vec2 {1, 3, 2, 3, 4, 4, 4, 4, 3, 3, 2, 3, 1, 7, 2, 2, 2, 1, 6, 5};
show (get_properties (vec2));
will produce:
1 3
3 5
2 5
4 4
7 1
6 1
5 1
which is desired result.
One more note, you can do this with vector of string as well.
vector<string> vec_str {"Thomas", "Rick", "Martin", "Martin", "Carol", "Thomas", "Martin", "Josh", "Jacob", "Jacob", "Rick"};
show (get_properties (vec_str));
And result is:
Thomas 2
Rick 2
Martin 3
Carol 1
Josh 1
Jacob 2

I assume you're trying to get a list of unique values AND the number of their occurrences. If that's the case, then std::map provides the cleanest and simplest solution:
//Always prefer std::vector (or at least std::array) over raw arrays if you can
std::vector<int> myInts {2,2,7,8,3,7,2,3,46,7,2,1};
std::map<int, unsigned> uniqueValues;
//Get unique values and their count
for (int val : myInts)
++uniqueValues[val];
//Output:
for (const auto & val : uniqueValues)
std::cout << val.first << " occurs " << val.second << " times." << std::endl;
In this case it doesn't have to be std::unordered_set.

Find max element between two vectors

I thought the following would work but it just outputs zero. Ideas?
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6 };
int max = *std::max(std::max(a.begin(), a.end()), std::max(b.begin(), b.end()));
std::cout << max;

You're using std::max, which compares its arguments. That is, it's returning the greater of the two iterators.
What you want for the inner invocation is std::max_element, which finds the maximum element in a range:
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6 };
int max = std::max(*std::max_element(a.begin(), a.end()), *std::max_element(b.begin(), b.end()));
std::cout << max;
As @MikeSeymour correctly pointed out in comments, the above code assumes the ranges are not empty, as it unconditionally dereferences the iterators returned from std::max_element. If one of the ranges was empty, the returned iterator would be the past-the-end one, which cannot be dereferenced.

Here's a way that behaves sensibly with empty ranges. If either range is empty, you still get the maximum from the other range. If both ranges are empty, you get INT_MIN.
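// std::max<int> names an overload set in C++11 and later, so it cannot be passed directly; wrap it in a lambda
auto max_of = [](int x, int y) { return std::max(x, y); };
int m = std::accumulate(begin(b), end(b),
std::accumulate(begin(a), end(a), INT_MIN, max_of),
max_of);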
std::accumulate is better here, since you want a value, not an iterator, as the result.

int m = std::max(*std::max_element(a.begin(), a.end()), *std::max_element(b.begin(), b.end()));
This finds the maximum of the maximums of the individual vectors. For example, for the first vector, { 1, 2, 3 }, the max value is 3, and for the second vector, { 4, 5, 6 }, it is 6; the max of 3 and 6 is 6. Both vectors must be non-empty, because the iterators returned by std::max_element are dereferenced.

Swap the elements of two sequences, such that the difference of the element-sums gets minimal.

An interview question:
Given two non-ordered integer sequences a and b, their size is n, all
numbers are randomly chosen: Exchange the elements of a and b, such that the sum of the elements of a minus the sum of the elements of b is minimal.
Given the example:
a = [ 5 1 3 ]
b = [ 2 4 9 ]
The result is (1 + 2 + 3) - (4 + 5 + 9) = -12.
My algorithm: Sort them together, then put the n smallest ints in a and the rest in b. It is O(n lg n) in time and O(n) in space. I do not know how to improve it to an algorithm with O(n) in time and O(1) in space. O(1) means that we do not need any extra space beyond the two sequences themselves.
Any ideas ?
An alternative question would be: What if we need to minimize the absolute value of the differences (minimize |sum(a) - sum(b)|)?
A python or C++ thinking is preferred.

Revised solution:
Merge both lists x = merge(a,b).
Calculate median of x (complexity O(n) See http://en.wikipedia.org/wiki/Selection_algorithm )
Using this median swap elements between a and b. That is, find an element in a that is less than median, find one in b that is more than median and swap them
Final complexity: O(n)
Minimizing the absolute difference is NP-hard, since it is a variant of the partition problem, which is closely related to the knapsack problem.
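The selection-based steps can be sketched with std::nth_element, which partitions a range around its k-th smallest element in linear time on average. This sketch only computes the minimal difference and copies both sequences into a temporary vector, so it uses O(n) extra space rather than O(1); min_difference is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: smallest possible sum(a) - sum(b) when elements may be
// exchanged freely between a and b (the sizes of a and b stay the same).
long long min_difference(const std::vector<int>& a, const std::vector<int>& b)
{
    // x = a followed by b
    std::vector<int> x(a);
    x.insert(x.end(), b.begin(), b.end());

    // After nth_element, the first a.size() elements are the smallest ones
    // (in unspecified order); everything from `mid` on is at least as large.
    auto mid = x.begin() + static_cast<std::ptrdiff_t>(a.size());
    std::nth_element(x.begin(), mid, x.end());

    long long low  = std::accumulate(x.begin(), mid, 0LL);
    long long high = std::accumulate(mid, x.end(), 0LL);
    return low - high;
}
```

For a = [5, 1, 3] and b = [2, 4, 9] this returns -12, the value given in the question.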

What comes to mind is the following algorithm outline:
C = A concatenated with B
Partially sort C so that its first #A elements (#A = number of elements in A) are the smallest ones
Subtract the sum of the last #B elements of C from the sum of the first #A elements of C.
Note that you don't need to sort all elements; it is enough to find the #A smallest ones. For your example:
C = {5, 1, 3, 2, 4, 9}
C = {1, 2, 3, 5, 4, 9}
(1 + 2 + 3) - (5 + 4 + 9) = -12
A C++ solution:
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
// Initialize 'a' and 'b'
int ai[] = { 5, 1, 3 };
int bi[] = { 2, 4, 9 };
std::vector<int> a(ai, ai + 3);
std::vector<int> b(bi, bi + 3);
// 'c' = 'a' merged with 'b'
std::vector<int> c;
c.insert(c.end(), a.begin(), a.end());
c.insert(c.end(), b.begin(), b.end());
// partially sort 'c': its first a.size() elements become the smallest ones, in order
std::partial_sort(c.begin(), c.begin() + a.size(), c.end());
// build the difference
int result = 0;
for (auto cit = c.begin(); cit != c.end(); ++cit)
result += (cit < c.begin() + a.size()) ? (*cit) : -(*cit);
// print result (and it's -12)
std::cout << result << std::endl;
}
