Union with map? - c++

I am trying to find the union of two sets using a map. I have two sets and would like to combine them into a third one. I get an error at the push_back in this code. Is there a way to do this?
map<char, vector<char> > numbers;
map<char, vector<char> >::iterator it;
numbers['E'].push_back('a');//set1
numbers['E'].push_back('b');
numbers['E'].push_back('c');
numbers['G'].push_back('d');//set2
numbers['G'].push_back('e');
void Create::Union(char set1, char set2, char set3)
{
for (it = numbers.begin(); it != numbers.end(); ++it)
{
numbers[set3].push_back(it->second);
}
}

numbers is a load of vectors, keyed by character. So it->second is a vector. You can't push_back a vector into a vector of char.
You should be iterating over numbers[set1] and numbers[set2], not iterating over numbers. Or as bdonlan says, you could insert a range, although he's taking a union of everything in numbers, not just set1 and set2.
Also: where's item defined? Do you mean it?
Also, note that push_back doesn't check whether the value is in the vector already. So once you get the details of this general approach sorted out, your example case will work and the union of 'E' and 'G' will be a vector containing 'a','b','c','d','e'. But if you took the union of 'a','b','c' with 'c','d','e' you'd get 'a','b','c','c','d','e', which probably isn't what you want from a union.
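A minimal sketch of the fixed approach makes the duplicate problem concrete. It iterates over numbers[set1] and numbers[set2] and copies their elements, not the vectors. concat_sets is a hypothetical free function standing in for Create::Union, and the map type is taken from the question.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for Create::Union: copies the elements of
// numbers[set1] and numbers[set2] (not the vectors themselves) into
// numbers[set3]. Nothing checks for repeats, so overlapping sets
// produce duplicated values.
void concat_sets(std::map<char, std::vector<char> >& numbers,
                 char set1, char set2, char set3)
{
    std::vector<char> result(numbers[set1]);                    // start with set1's elements
    const std::vector<char>& second = numbers[set2];
    result.insert(result.end(), second.begin(), second.end());  // append set2's elements
    numbers[set3].swap(result);                                 // store the result as set3
}
```

With the question's data ('a','b','c' and 'd','e') the result is a, b, c, d, e. With overlapping inputs such as 'a','b','c' and 'c','d','e' it is a, b, c, c, d, e.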
Assuming your vectors are always going to be sorted, you could instead use the standard algorithm set_union:
#include <algorithm>
#include <iterator>
...
numbers[set3].clear();
std::set_union(numbers[set1].begin(), numbers[set1].end(),
numbers[set2].begin(), numbers[set2].end(),
std::back_inserter(numbers[set3]));
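Wrapped up as a complete function, the set_union approach might look like the sketch below. union_sets is a hypothetical name. Like the snippet above, it assumes both input vectors are sorted. It builds the result in a temporary vector first, so set3 can safely be the same key as set1 or set2. The clear()-then-set_union version would empty one of its own inputs in that case.

```cpp
#include <algorithm>  // std::set_union
#include <iterator>   // std::back_inserter
#include <map>
#include <vector>

// numbers[set3] becomes the union of numbers[set1] and numbers[set2].
// Assumes both inputs are sorted; the output is then sorted as well, and a
// value present once in each input appears once in the result.
void union_sets(std::map<char, std::vector<char> >& numbers,
                char set1, char set2, char set3)
{
    const std::vector<char>& a = numbers[set1];
    const std::vector<char>& b = numbers[set2];
    std::vector<char> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(result));
    // build in a temporary first so that set3 may equal set1 or set2
    numbers[set3].swap(result);
}
```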
If you want to take the union of everything in numbers, I would probably go with either:
vector<char> sofar;
map<char, vector<char> >::iterator it;
for (it = numbers.begin(); it != numbers.end(); ++it) {
// new, empty vector
vector<char> target;
// merge everything so far with the next item from the map,
// putting the results in target
set_union(sofar.begin(), sofar.end(),
it->second.begin(), it->second.end(),
back_inserter(target));
// the result is the new "everything so far"
// note that this operation is very fast. It doesn't have to
// copy any of the contents of the vector, just exchange some pointers.
swap(target, sofar);
}
// replace numbers[set3] with the final result
swap(numbers[set3], sofar);
Or:
set<char> sofar;
map<char, vector<char> >::iterator it;
for (it = numbers.begin(); it != numbers.end(); ++it) {
// let std::set remove the duplicates for us
sofar.insert(it->second.begin(), it->second.end());
}
// replace numbers[set3] with the final result
numbers[set3].clear();
numbers[set3].insert(numbers[set3].end(), sofar.begin(), sofar.end());
This is less code and might be faster, or might thrash the memory allocator too much. Not sure which is better, and for small collections performance almost certainly doesn't matter at all.
The version with set also doesn't require the vectors to be sorted, although it's faster if they are.
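Here is a self-contained sketch of the std::set variant. union_all is a hypothetical name. Unlike the loop above, it skips the target key itself, so whatever numbers[set3] already held is not folded back into the result. The input vectors in the usage check are deliberately unsorted, because std::set sorts for us.

```cpp
#include <map>
#include <set>
#include <vector>

// Writes the union of every vector in numbers (except numbers[set3] itself)
// to numbers[set3]. std::set sorts and removes duplicates, so the input
// vectors do not need to be sorted.
void union_all(std::map<char, std::vector<char> >& numbers, char set3)
{
    std::set<char> sofar;
    for (std::map<char, std::vector<char> >::const_iterator it = numbers.begin();
         it != numbers.end(); ++it)
    {
        if (it->first == set3)
            continue;  // don't feed the target's old contents back in
        sofar.insert(it->second.begin(), it->second.end());
    }
    // replace numbers[set3] with the sorted, duplicate-free result
    numbers[set3].assign(sofar.begin(), sofar.end());
}
```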

I think you might want:
void Create::Union(char set1, char set2, char set3)
{
vector<char> &target = numbers[set3];
for (it = numbers.begin(); it != numbers.end(); ++it)
{
if (&it->second == &target)
continue; // Don't insert into ourselves
target.insert(target.end(), it->second.begin(), it->second.end());
}
}
push_back was trying to add the it->second vector itself to the target vector; this version explicitly copies only its contents.

Related

Remove duplicates without using any STL containers

I was asked the following question in a 30-minute interview:
Given an array of integers, remove the duplicates without using any STL containers. For example:
For the input array [1,2,3,4,5,3,3,5,4] the output should be:
[1,2,3,4,5]
Note that the first 3, 4 and 5 have been included, but the subsequent ones have been removed since we have already included them once in the output array. How can we do this without using an extra STL container?
In the interview, I assumed that we only have positive integers and suggested using a bit array to mark off every element present in the input (assume every element in the input array as an index of the bit array and update it to 1). Finally, we could iterate over this bit vector, populating (or displaying) the unique elements. However, he was not satisfied with this approach. Any other methods that I could have used?
Thanks.
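The bit-array idea described in the question can be sketched without any STL container. The sketch assumes all values lie in [0, MAX_VALUE], where MAX_VALUE is a hypothetical bound. It writes the first occurrence of each value to the front of the array in its original order and returns the new length. Its extra memory grows with the value range rather than with the array size.

```cpp
#include <cstddef>

// Hypothetical upper bound on the values; the sketch assumes 0 <= v <= MAX_VALUE.
const int MAX_VALUE = 1023;

// Compacts the first occurrence of each value to the front of arr, keeping
// the original order, and returns the new logical length.
std::size_t dedup_with_bitmap(int* arr, std::size_t n)
{
    unsigned char seen[MAX_VALUE / 8 + 1] = {0};  // one bit per possible value
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        int v = arr[i];
        unsigned char mask = static_cast<unsigned char>(1u << (v % 8));
        if (!(seen[v / 8] & mask)) {  // first time v is seen
            seen[v / 8] |= mask;
            arr[out++] = v;
        }
    }
    return out;
}
```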
Just use std::sort() and std::unique():
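#include <algorithm> // std::sort, std::unique
#include <iterator>  // std::begin, std::end
int arr[] = { 1,2,3,4,5,3,3,5,4 };
std::sort( std::begin(arr), std::end(arr) );
// the unique values now occupy [std::begin(arr), end); the rest is leftover
auto end = std::unique( std::begin(arr), std::end(arr) );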
We can first sort the array, then copy each element that differs from the previous one into a second array that is two elements larger than the first.
Initialize the second array with a value that the first array will never contain (any number outside the given range); suppose 0 for simplicity. Then:
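#include <algorithm> // std::sort
#include <iterator>  // std::begin, std::end
int arr1[] = { 1,2,3,4,5,3,3,5,4 };
int arr2[] = { 0,0,0,0,0,0,0,0,0,0,0 }; // two spare slots; 0 is the sentinel
std::sort( std::begin(arr1), std::end(arr1) );
int position=1;
arr2[0] = arr1[0];
// copy every element that differs from its predecessor
for(int* i=std::begin(arr1)+1;i!=std::end(arr1);i++){
    if((*i)!=(*(i-1))){
        arr2[position] = (*i);
        position++;
    }
}
// count entries up to the sentinel padding; stopping one short of the end
// keeps *(i+1) inside arr2 (size ends up equal to position)
int size = 0;
for(int* i=std::begin(arr2);i+1!=std::end(arr2);i++){
    if((*i)!=(*(i+1))){
        size++;
    }
    else{
        break;
    }
}
// int ans[size] would be a variable-length array, which standard C++ does not allow
int* ans = new int[size];
for(int i=0;i<size;i++){
    ans[i]=arr2[i];
}
// ... use ans[0] .. ans[size-1], then:
delete[] ans;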
Easy algorithm in O(n^2):
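#include <algorithm> // std::remove
#include <iterator>  // std::begin, std::end

template <typename Vec> // e.g. std::vector<int>
void remove_duplicates(Vec& v) {
    // range end
    auto it_end = std::end(v);
    for (auto it = std::begin(v); it != it_end; ++it) {
        // remove elements matching *it
        it_end = std::remove(it + 1, it_end, *it);
    }
    // erase now-unused elements
    v.erase(it_end, std::end(v));
}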
See also erase-remove idiom
Edit: This assumes the input is a std::vector, but the approach works with C-style arrays too; you would just have to implement the erasure yourself.
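For a C-style array, "implementing the erasure yourself" can simply mean returning the new logical length, as in this sketch. remove_duplicates_array is a hypothetical name. Like the vector version, it keeps the first occurrence of each value in its original order, which matches the output the question asks for, and it uses no container.

```cpp
#include <algorithm>  // std::remove
#include <cstddef>    // std::size_t

// O(n^2) in-place removal of later duplicates from arr[0..n). Returns the
// new logical length; elements past it are leftovers to be ignored.
std::size_t remove_duplicates_array(int* arr, std::size_t n)
{
    int* last = arr + n;
    for (int* it = arr; it != last; ++it) {
        // shift everything after *it that is not equal to *it forward
        last = std::remove(it + 1, last, *it);
    }
    return static_cast<std::size_t>(last - arr);
}
```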

How to merge sorted vectors into a single vector in C++

I have 10,000 vector<pair<unsigned,unsigned>> and I want to merge them into a single vector such that it is lexicographically sorted and does not contain duplicates. In order to do so I wrote the following code. However, to my surprise the below code is taking a lot of time. Can someone please suggest as to how can I reduce the running time of my code?
using obj = pair<unsigned, unsigned>;
vector< vector<obj> > vecOfVec; // 10,000 vector<obj>, each sorted with size()=10M
vector<obj> result;
for(auto it=vecOfVec.begin(), l=vecOfVec.end(); it!=l; ++it)
{
// append vectors
result.insert(result.end(),it->begin(),it->end());
// sort result
std::sort(result.begin(), result.end());
// remove duplicates from result
result.erase(std::unique(result.begin(), result.end()), result.end());
}
I think you should use the fact that the vectors in vecOfVec are sorted.
So: find the minimum among the front values of the individual vectors, push_back() it into result, and remove from the front of every vector all the values equal to that minimum (this avoids duplicates in result).
If you can afford to destroy vecOfVec, something like the following (caution: untested, just to give an idea):
while ( vecOfVec.size() )
{
// detect the minimal front value
auto itc = vecOfVec.cbegin();
auto lc = vecOfVec.cend();
auto valMin = itc->front();
while ( ++itc != lc )
valMin = std::min(valMin, itc->front());
// push_back() the minimal front value in result
result.push_back(valMin);
for ( auto it = vecOfVec.begin() ; it != vecOfVec.end() ; )
{
// remove all the front values equals to valMin (this remove the
// duplicates from result)
while ( (false == it->empty()) && (valMin == it->front()) )
it->erase(it->begin());
// when a vector is empty is removed
it = ( it->empty() ? vecOfVec.erase(it) : ++it );
}
}
If you can, I suggest switching vecOfVec from a vector< vector<obj> > to something that permits efficient removal from the front of the individual containers (a std::deque?) and efficient removal of whole containers (a list?).
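An alternative to changing container types is to leave vecOfVec untouched and keep one read position per vector. That way nothing is ever erased from the front of a std::vector, which would shift every remaining element. The sketch below is a variant of the same min-of-fronts idea and has not been tested at the question's scale. merge_unique is a hypothetical name, and obj is the question's pair<unsigned, unsigned>. Each output element still costs a scan over all the vectors, so with 10,000 inputs a heap of front values scales better.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<unsigned, unsigned> obj;

// Merges sorted vectors into one sorted, duplicate-free vector without
// modifying the inputs: pos[k] is the next unread index of vecOfVec[k].
std::vector<obj> merge_unique(const std::vector<std::vector<obj> >& vecOfVec)
{
    std::vector<std::size_t> pos(vecOfVec.size(), 0);
    std::vector<obj> result;
    for (;;) {
        // find the smallest front value among the non-exhausted vectors
        bool found = false;
        obj valMin;
        for (std::size_t k = 0; k < vecOfVec.size(); ++k) {
            if (pos[k] < vecOfVec[k].size() &&
                (!found || vecOfVec[k][pos[k]] < valMin)) {
                valMin = vecOfVec[k][pos[k]];
                found = true;
            }
        }
        if (!found)
            break;  // every vector is exhausted
        result.push_back(valMin);
        // skip every copy of valMin at the front of each vector
        for (std::size_t k = 0; k < vecOfVec.size(); ++k)
            while (pos[k] < vecOfVec[k].size() && vecOfVec[k][pos[k]] == valMin)
                ++pos[k];
    }
    return result;
}
```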
If there are a lot of duplicates, you should use a set rather than a vector for your result, as a set is the most natural way to store elements without duplicates:
set< pair<unsigned,unsigned> > resultSet;
for (auto it=vecOfVec.begin(); it!=vecOfVec.end(); ++it)
resultSet.insert(it->begin(), it->end());
If you need to turn it into a vector, you can write
vector< pair<unsigned,unsigned> > resultVec(resultSet.begin(), resultSet.end());
Note that since your code processes 10,000 × 10M = 100 billion elements (roughly 800 GB at 8 bytes per pair), it would still take a lot of time no matter what: at least hours, if not days.
Other ideas are:
recursively merge vectors (10000 -> 5000 -> 2500 -> ... -> 1)
to merge 10000 vectors, store 10000 iterators in a heap structure
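The first idea, pairwise merging, could be sketched like this. It assumes each input vector is sorted and has no internal duplicates, so std::set_union keeps every intermediate result sorted and duplicate-free. merge_pairwise is a hypothetical name. Each element is copied roughly 14 times (once per level for 10,000 inputs) instead of being re-sorted at every step as in the question's loop.

```cpp
#include <algorithm>  // std::set_union
#include <cstddef>
#include <iterator>   // std::back_inserter
#include <utility>    // std::pair, std::move
#include <vector>

typedef std::pair<unsigned, unsigned> obj;

// Merges neighbours pairwise (10000 -> 5000 -> 2500 -> ... -> 1).
std::vector<obj> merge_pairwise(std::vector<std::vector<obj> > vecs)
{
    if (vecs.empty())
        return std::vector<obj>();
    while (vecs.size() > 1) {
        std::vector<std::vector<obj> > next;
        for (std::size_t i = 0; i + 1 < vecs.size(); i += 2) {
            std::vector<obj> merged;
            merged.reserve(vecs[i].size() + vecs[i + 1].size());
            std::set_union(vecs[i].begin(), vecs[i].end(),
                           vecs[i + 1].begin(), vecs[i + 1].end(),
                           std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        if (vecs.size() % 2 == 1)
            next.push_back(std::move(vecs.back()));  // odd one out moves up unchanged
        vecs.swap(next);
    }
    return vecs[0];
}
```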
One problem with your code is the excessive use of std::sort: you re-sort the whole accumulated result after every append. Unfortunately, quicksort (which is usually the workhorse behind std::sort) is not particularly faster when it encounters an already sorted array.
Moreover, you're not exploiting the fact that your initial vectors are already sorted. This can be exploited by keeping a heap of their next values, so that you never need to call sort again. This can be coded as follows (code tested using obj=int), though perhaps it can be made more concise.
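#include <algorithm>   // std::make_heap, std::pop_heap
#include <cstddef>     // size_t
#include <functional>  // std::greater
#include <vector>
// represents the next unused entry in one vector<obj>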
template<typename obj>
struct feed
{
typename std::vector<obj>::const_iterator current, end;
feed(std::vector<obj> const&v)
: current(v.begin()), end(v.end()) {}
friend bool operator> (feed const&l, feed const&r)
{ return *(l.current) > *(r.current); }
};
// - returns the smallest element
// - set corresponding feeder to next and re-establish the heap
template<typename obj>
obj get_next(std::vector<feed<obj>>&heap)
{
auto&f = heap[0];
auto x = *(f.current++);
if(f.current == f.end) {
std::pop_heap(heap.begin(),heap.end(),std::greater<feed<obj>>{});
heap.pop_back();
} else
std::make_heap(heap.begin(),heap.end(),std::greater<feed<obj>>{});
return x;
}
template<typename obj>
std::vector<obj> merge(std::vector<std::vector<obj>>const&vecOfvec)
{
// create min heap of feed<obj> and count total number of objects
std::vector<feed<obj>> input;
input.reserve(vecOfvec.size());
size_t num_total = 0;
for(auto const&v:vecOfvec)
if(v.size()) {
num_total += v.size();
input.emplace_back(v);
}
std::make_heap(input.begin(),input.end(),std::greater<feed<obj>>{});
// append values in ascending order, avoiding duplicates
std::vector<obj> result;
result.reserve(num_total);
while(!input.empty()) {
auto x = get_next(input);
result.push_back(x);
while(!input.empty() &&
!(*(input[0].current) > x)) // remove duplicates
get_next(input);
}
return result;
}

How to reach elements in a std::set two by two in C++

I have a list of integers (currently stored in a std::vector, but to increase efficiency I need to convert it to a set). In the current version, I use it as follows (I'm using C++98, not C++11):
int res=0;
vector<vector<int> >costMatrix;
vector<int>partialSolution;
for(vector<int>::size_type i = 0; i + 1 < partialSolution.size(); i++){
res+=costMatrix[partialSolution[i]][partialSolution[i+1]];
}
So I need to do the same thing with the set data structure, but I don't know how to get two elements from the set at a time. I can get partialSolution[i] with the code below, but not partialSolution[i+1]. Can anyone help me modify the code below?
// this time set<int> partialSolution
int res=0;
std::set<int>::iterator it;
for (it = partialSolution.begin(); it != partialSolution.end(); ++it)
{
res+=costMatrix[*it][];
}
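This could work: iterate from begin() up to, but not including, the last element, and use std::next (or ++ on a copy of the iterator) to reach the element after the current one. Both versions below assume the set is not empty, since std::prev(end()) and --end() are undefined on an empty set.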
In C++11:
for (it = partialSolution.begin(); it != std::prev(partialSolution.end()); ++it)
{
res+=costMatrix[*it][*(std::next(it))];
}
In C++98:
std::set<int>::iterator last = partialSolution.end();
--last;
for (it = partialSolution.begin(); it != last; ++it)
{
// not optimal but I'm trying to make it easy to understand...
std::set<int>::iterator next = it;
++next;
res+=costMatrix[*it][*next];
}
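Packaged as a function, the two-iterator walk looks roughly like this. consecutive_cost is a hypothetical name. The function is C++98-compatible and returns 0 for sets with fewer than two elements. Keep in mind that a std::set visits its elements in sorted order and drops duplicates. The pairs summed are therefore neighbours in sorted order, not in the order the vector originally held them.

```cpp
#include <set>
#include <vector>

// Sums costMatrix[a][b] over each pair of neighbouring elements (a, b) of
// the set, walking two iterators in lockstep.
int consecutive_cost(const std::set<int>& partialSolution,
                     const std::vector<std::vector<int> >& costMatrix)
{
    int res = 0;
    if (partialSolution.empty())
        return res;
    std::set<int>::const_iterator prev = partialSolution.begin();
    std::set<int>::const_iterator cur = prev;
    for (++cur; cur != partialSolution.end(); ++prev, ++cur)
        res += costMatrix[*prev][*cur];
    return res;
}
```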

Erase duplicate element from a vector

I create a vector with several elements in C++, and I want to remove the elements that have the same values. Basically, I want to drop every index of the vector at which a duplicate element is found. My vector is called person. I am trying to do something like:
for(int i=0; i < person.size(); i++){
if(i>0 && person.at(i) == person.at(0:i-1)) { // matlab operator
continue;
}
writeToFile( person.at(i) );
}
How is it possible to create the operator 0:i-1 to check all possible combinations of indexes?
Edit: I am trying GarMan's solution, but I have issues with the for-each loop:
set<string> myset;
vector<string> outputvector;
for (string element:person)
{
if (myset.find(element) != myset.end())
{
myset.insert(element);
outputvector.emplace_back(element);
}
}
Here is an "in-place" version (no second vector required) that should work with older compilers:
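#include <algorithm> // std::swap (C++98)
#include <set>
#include <string>
#include <vector>
std::set<std::string> seen_so_far;
for (std::vector<std::string>::size_type i = 0; i < person.size();)
{
    bool was_inserted = seen_so_far.insert(person[i]).second;
    if (was_inserted)
    {
        ++i;
    }
    else
    {
        // duplicate: move the last element into slot i (it is checked on the
        // next pass) and drop the back; an index stays valid after pop_back
        std::swap(person[i], person.back());
        person.pop_back();
    }
}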
Let me know if this works for you. Note that the order of elements is not guaranteed to stay the same.
Something like this will work
unordered_set<same_type_as_vector> myset;
vector<same_type_as_vector> outputvector;
for (auto&& element: myvector)
{
if (myset.find(element) == myset.end()) // element not seen yet
{
myset.insert(element);
outputvector.emplace_back(element);
}
}
myvector.swap(outputvector);
Code written into reply box, so might need tweaking.
If you can sort your vector, you can simply call std::unique.
#include <algorithm>
std::sort(person.begin(), person.end());
person.erase(std::unique(person.begin(), person.end()), person.end());
If you cannot sort, you can use a hash table instead: scan the vector and update the hash table as you go. At the same time, you can check whether an element is already present in O(1) on average (O(n) in total), rather than comparing each element against all the others, which costs O(n^2).
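The hash-table approach might be sketched like this. It makes one pass with an average O(1) lookup per element, keeps the first occurrence of each name in its original order, and compacts the vector in place. remove_duplicates_hashed is a hypothetical name, and std::unordered_set requires C++11.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Removes later duplicates from person, preserving the order of first occurrences.
void remove_duplicates_hashed(std::vector<std::string>& person)
{
    std::unordered_set<std::string> seen;
    std::size_t out = 0;  // next slot for a kept element
    for (std::size_t i = 0; i < person.size(); ++i) {
        // insert(...).second is true only the first time a value is seen
        if (seen.insert(person[i]).second) {
            if (out != i)
                person[out] = std::move(person[i]);
            ++out;
        }
    }
    person.resize(out);  // drop the leftover tail
}
```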

STL Multimap Remove/Erase Values

I have an STL multimap, and I want to remove the entries that have a specific value. I do not want to remove the entire key, as that key may map to other values which are still required.
any help please.
If I understand correctly these values can appear under any key. If that is the case you'll have to iterate over your multimap and erase specific values.
typedef std::multimap<std::string, int> Multimap;
Multimap data;
for (Multimap::iterator iter = data.begin(); iter != data.end();)
{
// advance first: erase() invalidates the iterator it erases
Multimap::iterator erase_iter = iter++;
// removes all even values
if (erase_iter->second % 2 == 0)
data.erase(erase_iter);
}
Since C++11, std::multimap::erase returns an iterator following the last removed element.
So you can rewrite Nikola's answer slightly more cleanly without needing to introduce the local erase_iter variable:
typedef std::multimap<std::string, int> Multimap;
Multimap data;
for (Multimap::iterator iter = data.begin(); iter != data.end();)
{
// removes all even values
if (iter->second % 2 == 0)
iter = data.erase(iter);
else
++iter;
}
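If the key is known as well as the value, the scan can be limited to that key with equal_range. The sketch below removes only the matching (key, value) entries and leaves the key's other values in place. erase_key_value is a hypothetical name, and the function relies on the C++11 erase return value.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, int> Multimap;

// Erases every entry whose key is `key` and whose value is `value`.
void erase_key_value(Multimap& data, const std::string& key, int value)
{
    std::pair<Multimap::iterator, Multimap::iterator> range = data.equal_range(key);
    // range.second is not invalidated by erasing entries before it
    for (Multimap::iterator iter = range.first; iter != range.second;)
    {
        if (iter->second == value)
            iter = data.erase(iter);  // returns the iterator after the erased one
        else
            ++iter;
    }
}
```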