A reduce function (for many set unions) in C++ - c++

What I am trying to do:
I have a simple set union function in C++ using STL, and I'm trying to wrap it in a function that will let me perform the union of arbitrarily many sets contained in STL data structures (e.g. std::list, std::vector, std::forward_list, ...).
How I tried to do it:
To start, my simple set union:
#include <algorithm>
#include <iterator> // std::inserter
template <typename set_type>
set_type sunion(const set_type & lhs, const set_type & rhs)
{
set_type result;
std::set_union( lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), std::inserter(result, result.end()) );
return result;
}
where set_type defines some STL std::set<T>, e.g. std::set<int>.
I have noticed several times that I end up needing to perform several unions over a range of sets (in Python this would be a reduce of my sunion function over some iterable object of set_types). For instance, I might have
std::vector<std::set<int> > all_sets;
or
std::list<std::set<int> > all_sets;
etc., and I want to get the union of all sets in all_sets. I am trying to implement a simple reduce for this, which essentially does a (faster, more elegant, non-copying) version of:
sunion(... sunion( sunion( all_sets.begin(), all_sets.begin()+1 ), all_sets.begin()+2 ) , ... )
Essentially, to do this quickly, I just want to declare a set_type result and then iterate through all_sets and insert every value of every set in all_sets into the result object:
template <typename set_type>
set_type sunion_over_iterator_range(const std::iterator<std::forward_iterator_tag, set_type> & begin, const std::iterator<std::forward_iterator_tag, set_type> & end)
{
set_type result;
for (std::iterator<std::forward_iterator_tag, set_type> iter = begin; iter != end; iter++)
{
insert_all(result, *iter);
}
return result;
}
where insert_all is defined:
// |= operator; faster than making a copy and performing union
template <typename set_type>
void insert_all(set_type & lhs, const set_type & rhs)
{
for (typename set_type::iterator iter = rhs.begin(); iter != rhs.end(); iter++)
{
lhs.insert(*iter);
}
}
How it didn't work:
Unfortunately, my sunion_over_iterator_range(...) doesn't work with arguments std::vector<set_type>::begin(), std::vector<set_type>::end(), which are of type std::vector<set_type>::iterator. I thought std::vector<T>::iterator returns an iterator<random_access_iterator_tag, T>.
After compilation failed because of type incompatibility of the iterators, I looked at the stl vector source (located in /usr/include/c++/4.6/bits/stl_vector.h for g++ 4.6 & Ubuntu 11.10), and was surprised to see the typedef for vector<T>::iterator to be typedef __gnu_cxx::__normal_iterator<pointer, vector> iterator;. I had thought that a ForwardIterator was a subtype of RandomAccessIterator, and so should be fine, but clearly I was incorrect, or I would not be here.
How I am grateful and ashamed of inciting your frustration due to my inexperience:
Apologies if I'm showing my ignorance-- I am trying to learn to be a better object oriented programmer (in the past I have simply hacked everything out in C-style code).
I'm doing my best, coach! Please help me out and spare the world from bad code that I would produce without your code ninja insight...

Here's a very naive approach:
std::set<T> result;
std::vector<std::set<T>> all_sets;
for (std::set<T> & s : all_sets)
{
result.insert(std::make_move_iterator(s.begin()),
std::make_move_iterator(s.end()));
}
Note that the elements of a std::set are const, so the move iterators yield const rvalues: in practice the elements are copied rather than moved, the element nodes are not moved over either, and the source sets are left intact. Plain s.begin() and s.end() do the same job, so you can just remove the make_move_iterator.
Before C++17 there's no interface for std::set that lets you "splice" two sets in a way that doesn't reallocate the internal tree nodes, so this is more or less as good as you can get.
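For readers who can assume a C++17 compiler, std::set::merge transfers tree nodes between sets without copying the elements. The following is only a sketch under that assumption; the helper name merge_all and the example function are hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: unites all sets in all_sets into one result (C++17).
// The sources are non-const because merge() removes transferred nodes from them.
template <typename T>
std::set<T> merge_all(std::vector<std::set<T>>& all_sets)
{
    std::set<T> result;
    for (std::set<T>& s : all_sets)
        result.merge(s);  // relinks nodes; duplicates stay behind in s
    return result;
}

// Small usage check.
inline void merge_all_example()
{
    std::vector<std::set<int>> all_sets{{1, 2}, {2, 3}, {5}};
    std::set<int> result = merge_all(all_sets);
    assert((result == std::set<int>{1, 2, 3, 5}));
    assert((all_sets[1] == std::set<int>{2}));  // 2 was already in result
}
```

The trade-off is that the source sets are modified, so this only fits when they are no longer needed afterwards.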
Here's a variadic template approach:
// named unite because union is a reserved word in C++
template <typename RSet> void unite(RSet &) { }
template <typename RSet, typename ASet, typename ...Rest>
void unite(RSet & result, ASet const & a, Rest const &... r)
{
result.insert(a.begin(), a.end());
unite(result, r...);
}
Usage:
std::set<T> result;
unite(result, s1, s2, s3, s4);
(Similar move-optimizations are feasible here; you can even add some branching that will copy from immutables but move from mutables, or from rvalues only, if you like.)
Here's a version using std::accumulate:
std::set<T> result =
std::accumulate(all_sets.begin(), all_sets.end(), std::set<T>(),
[](std::set<T> s, std::set<T> const & t) // by value: also compiles in C++20, where the accumulator is passed as an rvalue
{ s.insert(t.begin(), t.end()); return s; } );
This version seems to rely on return value optimisation a lot, though, so you might like to compare it to this hacked up and rather ugly version:
std::set<T> result;
std::accumulate(all_sets.begin(), all_sets.end(), 0,
[&result](int, std::set<T> const & t)
{ result.insert(t.begin(), t.end()); return 0; } );

Usually, when using iterators we don't care about the actual category. Just let the implementation sort it out. That means, just change the function to accept any type:
template <typename T>
typename std::iterator_traits<T>::value_type sunion_over_iterator_range(T begin, T end)
{
typename std::iterator_traits<T>::value_type result;
for (T iter = begin; iter != end; ++ iter)
{
insert_all(result, *iter);
}
return result;
}
Note that I have used typename std::iterator_traits<T>::value_type, which is the type of *iter.
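To make the point concrete, here is a self-contained sketch that combines the function above with the question's insert_all and calls it on both a std::vector and a std::list. The insert_all body uses the range overload of insert instead of the question's explicit loop, and the example function is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <set>
#include <vector>

// insert_all from the question, written with the range overload of insert.
template <typename set_type>
void insert_all(set_type& lhs, const set_type& rhs)
{
    lhs.insert(rhs.begin(), rhs.end());
}

// The iterator type is deduced; the set type comes from iterator_traits.
template <typename Iter>
typename std::iterator_traits<Iter>::value_type
sunion_over_iterator_range(Iter begin, Iter end)
{
    typename std::iterator_traits<Iter>::value_type result;
    for (Iter iter = begin; iter != end; ++iter)
        insert_all(result, *iter);
    return result;
}

// The same function works for random-access and bidirectional iterators.
inline void sunion_example()
{
    std::vector<std::set<int>> in_vector{{1, 2}, {2, 3}};
    std::list<std::set<int>> in_list{{4}, {4, 5}};
    assert((sunion_over_iterator_range(in_vector.begin(), in_vector.end())
            == std::set<int>{1, 2, 3}));
    assert((sunion_over_iterator_range(in_list.begin(), in_list.end())
            == std::set<int>{4, 5}));
}
```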
BTW, the iterator pattern is not related to OOP. (That doesn't mean it's a bad thing).

Related

Template to check if vector and map contains value

I'm a beginner in C++. I was searching for templates that could check whether a vector or map contains a given value, independent of its data type, and I have found these:
template <typename Container, typename Value>
bool vector_contains(const Container& c, const Value& v)
{
return std::find(std::begin(c), std::end(c), v) != std::begin(c);
}
template< typename container, typename key >
auto map_contains(container const& c, key const& k)
-> decltype(c.find(k) != c.end())
{
return c.find(k) != c.end();
}
My doubt is, does using templates to do this kind of verification impact performance somehow?
I have found these
Ok, but do analyze them. They are sub optimal and/or plain wrong.
template <typename Container, typename Value>
bool vector_contains(const Container& c, const Value& v)
{
return std::find(std::begin(c), std::end(c), v) != std::begin(c);
}
This will return true as long as v is not the first value found. It'll also return true if v is not found at all.
A vector, without any other information, is unsorted, which means that contains will have to search from the first element to the last if the value is not found. Such searches are considered expensive.
If you on the other hand std::sort the vector and use the same Comparator when using std::binary_search, it'll have a quicker lookup. Sorting takes time too, though.
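A corrected version compares the result of std::find against the end of the range. The sketch below also shows the sorted variant mentioned above; sorted_contains is a hypothetical name, and it assumes the range is already sorted with operator<.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Linear search: true only if std::find stops before the end of the range.
template <typename Container, typename Value>
bool vector_contains(const Container& c, const Value& v)
{
    return std::find(std::begin(c), std::end(c), v) != std::end(c);
}

// Hypothetical helper for a range that is already sorted with operator<.
template <typename Container, typename Value>
bool sorted_contains(const Container& c, const Value& v)
{
    return std::binary_search(std::begin(c), std::end(c), v);
}

inline void contains_example()
{
    std::vector<int> values{4, 1, 3};
    assert(vector_contains(values, 4));   // first element is found too
    assert(!vector_contains(values, 7));  // missing value is reported as absent

    std::sort(values.begin(), values.end());
    assert(sorted_contains(values, 3));
    assert(!sorted_contains(values, 2));
}
```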
template< typename container, typename key >
auto map_contains(container const& c, key const& k) -> decltype(c.find(k) != c.end())
{
return c.find(k) != c.end();
}
This looks like it may work for types matching the function template. If it's meant to be used with maps, it should use the member function contains (std::map::contains, available since C++20) instead.
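As a rough sketch of that suggestion, the helper below calls the member contains when the compiler reports C++20 and falls back to find otherwise. The use of the __cplusplus check is an assumption about how one might support both modes; it is not from the original answer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Membership test for associative containers such as std::map.
template <typename Map, typename Key>
bool map_contains(const Map& m, const Key& k)
{
#if __cplusplus >= 202002L
    return m.contains(k);         // C++20 member function
#else
    return m.find(k) != m.end();  // works with earlier standards
#endif
}

inline void map_contains_example()
{
    std::map<std::string, int> ages;
    ages[std::string("ann")] = 31;
    assert(map_contains(ages, std::string("ann")));
    assert(!map_contains(ages, std::string("bob")));
}
```

Being a template has no run-time cost here: the call is resolved at compile time and the lookup itself stays O(log n) for std::map.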

How to elegantly avoid duplication of code when < changes to >?

Here is a simplified version of my code:
template<typename TIterator>
TIterator findMaximalPosition(TIterator begin, TIterator end)
{
TIterator result(begin);
for (TIterator it = begin + 1; it != end; ++it)
{
if ((*it)->value > (*result)->value) // Here I just need to change to "<"
result = it; // to get a findMinimalPosition
}
return result;
}
template<typename TIterator>
TIterator findMinimalPosition(TIterator begin, TIterator end)
{
// almost the same
}
This is just a simplified example. My code is full of places where two functions are the same except for a < or > sign or whether ++ or -- should be used.
My question is:
Is there a way to reduce this duplication in code without destroying the readability or decreasing the performance?
I was thinking of using a pointer to an operator (either < or >) as a template parameter. This should not decrease performance, since the pointer would be a compile time constant. Is there some better or generally used way?
EDIT:
So what I did based on the answers was to implement:
template <typename TIterator, typename TComparison>
TIterator findExtremalPosition(TIterator begin, TIterator end,
TComparison comparison);
and then just call:
return findExtremalPosition(begin, end, std::less<double>());
and
return findExtremalPosition(begin, end, std::greater<double>());
I hope this is what you meant. I suppose that after some struggling similar solution can be done for ++ and -- operators.
I would make a general function that takes a predicate and use std::greater and std::less as argument to that function for the given type to implement findMaximalPosition and findMinimalPosition respectively.
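A minimal sketch of that idea follows. It assumes the elements themselves are comparable (the question compares (*it)->value instead), and it adds an empty-range check and std::next so that non-random-access iterators also work; both are additions, not part of the original code.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Returns the position of the element that comes first under comparison,
// or end for an empty range.
template <typename TIterator, typename TComparison>
TIterator findExtremalPosition(TIterator begin, TIterator end,
                               TComparison comparison)
{
    if (begin == end)
        return end;
    TIterator result = begin;
    for (TIterator it = std::next(begin); it != end; ++it)
    {
        if (comparison(*it, *result))
            result = it;
    }
    return result;
}

template <typename TIterator>
TIterator findMaximalPosition(TIterator begin, TIterator end)
{
    typedef typename std::iterator_traits<TIterator>::value_type value_type;
    return findExtremalPosition(begin, end, std::greater<value_type>());
}

template <typename TIterator>
TIterator findMinimalPosition(TIterator begin, TIterator end)
{
    typedef typename std::iterator_traits<TIterator>::value_type value_type;
    return findExtremalPosition(begin, end, std::less<value_type>());
}

inline void extremal_example()
{
    std::vector<double> v{3.0, 1.0, 4.0, 1.0, 5.0};
    assert(findMaximalPosition(v.begin(), v.end()) - v.begin() == 4);
    assert(findMinimalPosition(v.begin(), v.end()) - v.begin() == 1);
}
```

Because the comparison is a template parameter, the compiler can inline it, so this should cost no more than the hand-written duplicates.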
As suggested by Ivaylo Strandjev, one possible solution is to use predicates.
So, if you change your function to work with predicates...
typedef std::vector<int> vec;
template<typename TIterator, bool (*Predicate)(const TIterator &, const TIterator &)>
TIterator findPosition(TIterator begin, TIterator end)
{
TIterator result(begin);
for (TIterator it = begin + 1; it != end; ++it)
{
if (Predicate(it, result))
result = it;
}
return result;
}
... and then, you define some predicates that helps you to achieve your goal...
bool lesser(const vec::iterator &a, const vec::iterator &b)
{
return (*a) < (*b);
}
bool greater(const vec::iterator &a, const vec::iterator &b)
{
return (*a) > (*b);
}
... then you would be able to do this:
vec::iterator min = findPosition<typename vec::iterator, lesser>(v.begin(), v.end());
vec::iterator max = findPosition<typename vec::iterator, greater>(v.begin(), v.end());
The advantage is that you can supply any function you find useful, not only ones that check whether an element is greater or smaller than another:
bool weird(const vec::iterator &a, const vec::iterator &b)
{
return ((*a) | (*b)) & 0x4;
}
vec::iterator weird_pos = findPosition<typename vec::iterator, weird>(v.begin(), v.end());
But before making this effort, check whether the Algorithms library has already done the job.
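For the minimum/maximum case specifically, the standard library already provides std::min_element and std::max_element, both of which accept a comparator. The sketch below assumes a hypothetical Item type and a container of pointers, mirroring the (*it)->value access in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Item { double value; };  // hypothetical element type

// Orders pointers by the value they point to.
inline bool lessByValue(const Item* a, const Item* b)
{
    return a->value < b->value;
}

inline void min_max_example()
{
    Item a = {3.0}, b = {1.0}, c = {4.0};
    std::vector<Item*> items;
    items.push_back(&a);
    items.push_back(&b);
    items.push_back(&c);

    // One comparator serves both directions.
    std::vector<Item*>::iterator lo =
        std::min_element(items.begin(), items.end(), lessByValue);
    std::vector<Item*>::iterator hi =
        std::max_element(items.begin(), items.end(), lessByValue);
    assert(*lo == &b);
    assert(*hi == &c);
}
```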
I think that it looks pretty neat and simple.
Hope it helps.

Combining C++ standard algorithms by looping only once

I currently have this code up and running:
string word="test,";
string::iterator it = word.begin();
for (; it != word.end(); it++)
{
if (!isalpha(*it)) {
break;
}
else {
*it = toupper(*it);
}
}
word.erase(it, word.end());
// word should now be: TEST
I would like to make it more compact and readable by:
Composing existing standard C++ algorithms (*)
Perform the loop only once
(*) I'm assuming that combining existing algorithms makes my code more readable...
An alternative solution
In addition to defining a custom transform_until algorithm, as suggested by jrok, it might be possible to define a custom iterator adaptor that would iterate using the underlying iterator but redefine operator*() by modifying the underlying reference before returning it.
Something like that:
template <typename Iterator, typename UnaryFunction = typename Iterator::value_type (*)(typename Iterator::value_type)>
class sidefx_iterator: public std::iterator<
typename std::forward_iterator_tag,
typename std::iterator_traits<Iterator>::value_type,
typename std::iterator_traits<Iterator>::difference_type,
typename std::iterator_traits<Iterator>::pointer,
typename std::iterator_traits<Iterator>::reference >
{
public:
explicit sidefx_iterator(Iterator x, UnaryFunction fx) : current_(x), fx_(fx) {}
typename Iterator::reference operator*() const { *current_ = fx_(*current_); return *current_; }
typename Iterator::pointer operator->() const { return current_.operator->(); }
sidefx_iterator& operator++() { ++current_; return *this; }
sidefx_iterator operator++(int) { sidefx_iterator tmp(*this); ++current_; return tmp; }
bool operator==(const sidefx_iterator<Iterator>& other) const { return current_ == other.current_; }
bool operator==(const Iterator& other) const { return current_ == other; }
bool operator!=(const sidefx_iterator<Iterator>& other) const { return current_ != other.current_; }
bool operator!=(const Iterator& other) const { return current_ != other; }
operator Iterator() const { return current_; }
private:
Iterator current_;
UnaryFunction fx_;
};
Of course this is still very raw, but it should give the idea.
With the above adaptor, I could then write the following:
word.erase(std::find_if(it, it_end, std::not1(std::ref(::isalpha))), word.end());
with the following defined in advance (which could be simplified by some template-magic):
using TransformIterator = sidefx_iterator<typename std::string::iterator>;
TransformIterator it(word.begin(), reinterpret_cast<typename std::string::value_type(*)(typename std::string::value_type)>(static_cast<int(*)(int)>(std::toupper)));
TransformIterator it_end(word.end(), nullptr);
If the standard would include such an adaptor I would use it, because it would mean that it was flawless, but since this is not the case I'll probably keep my loop as it is.
Such an adaptor would allow reusing existing algorithms and mixing them in ways that are not possible today, but it might have downsides as well, which I'm likely overlooking at the moment...
I don't think there's a clean way to do this with a single standard algorithm. None that I know of takes a predicate (you need one to decide when to break early) and also allows modifying the elements of the source sequence.
You can write your own generic algorithm if you really want to do it "standard" way. Let's call it, hmm, transform_until:
#include <cctype>
#include <string>
#include <iostream>
template<typename InputIt, typename OutputIt,
typename UnaryPredicate, typename UnaryOperation>
OutputIt transform_until(InputIt first, InputIt last, OutputIt out,
UnaryPredicate p, UnaryOperation op)
{
while (first != last && !p(*first)) {
*out = op(*first);
++first;
++out;
}
return out; // one past the last element written, like std::transform
}
int main()
{
std::string word = "test,";
auto it =
transform_until(word.begin(), word.end(), word.begin(),
[](char c) { return !::isalpha(static_cast<unsigned char>(c)); },
[](char c) { return ::toupper(static_cast<unsigned char>(c)); });
word.erase(it, word.end());
std::cout << word << '.';
}
It's debatable whether this is any better than what you have :) Sometimes a plain for loop is best.
After better understanding your question, I have got an idea that might work, but requires Boost.
You could use a transform_iterator which calls toupper on all characters and use that as the input iterator to find_if or remove_if. I am not familiar enough with Boost to provide an example though.
As @jrok points out, the transform_iterator will only transform the value during iteration and not actually modify the original container. To get around this, instead of operating on the same sequence, you would want to copy to a new one, using something like remove_copy_if. This copies as long as the predicate is NOT true, so std::not1 would be needed. This would replace the remove_if case.
Use std::copy to copy until the iterator returned by std::find_if to get the other case to work.
Finally, if your output string is empty, it will need a std::inserter type of iterator for the output.
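The copy-into-a-new-string idea can be sketched with standard algorithms only. Note two deviations from the text above: std::transform is used in place of std::copy so that the upper-casing happens during the copy, and the input is traversed twice (once by find_if, once by transform), so this does not satisfy the question's single-loop goal. The function name upper_alpha_prefix is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: upper-cased copy of the leading alphabetic run of word.
inline std::string upper_alpha_prefix(const std::string& word)
{
    // First pass: locate the first non-alphabetic character.
    std::string::const_iterator stop = std::find_if(
        word.begin(), word.end(),
        [](char c) { return !std::isalpha(static_cast<unsigned char>(c)); });

    // Second pass: transform the prefix into an initially empty string,
    // which is why an inserting output iterator is needed.
    std::string result;
    std::transform(
        word.begin(), stop, std::back_inserter(result),
        [](char c) {
            return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        });
    return result;
}

inline void prefix_example()
{
    assert(upper_alpha_prefix("test,") == "TEST");
}
```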

What is a C++ container with a "contains" operation?

I want to use a structure in which I insert integers, and then can ask
if (container.contains(3)) { /**/ }
There has to be something like this.
You can use std::vector.
std::vector<int> myVec;
myVec.push_back(3);
if (std::find(myVec.begin(), myVec.end(), 3) != myVec.end())
{
// do your stuff
}
You can even make a little helper function:
template <class T>
bool contains(const std::vector<T> &vec, const T &value)
{
return std::find(vec.begin(), vec.end(), value) != vec.end();
}
Here is how you would use it:
if (contains(myVec, 3)) { /*...*/ }
Simple algorithm:
template <typename Container>
bool contains(Container const& c, typename Container::const_reference v) {
return std::find(c.begin(), c.end(), v) != c.end();
}
You can customize it for more efficient search on some known containers:
template <typename Key, typename Cmp, typename Alloc>
bool contains(std::set<Key,Cmp,Alloc> const& s, Key const& k) {
return s.find(k) != s.end();
}
template <typename Key, typename Value, typename Cmp, typename Alloc>
bool contains(std::map<Key,Value,Cmp,Alloc> const& m, Key const& k) {
return m.find(k) != m.end();
}
And this way you obtain a single algorithm that performs the search on any container type, and is special cased to be faster on those containers which are ordered.
find on an unsorted vector is O(n).
std::set supports O(log n) insertions and lookups and is a good choice.
std::unordered_set (std::tr1::unordered_set before C++11) provides a similar interface but supports near-constant-time lookups. It is the best choice if you have C++11 (or TR1) and do not need to enumerate the elements in order.
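A short sketch of the set-based options, assuming a C++11 compiler so that std::unordered_set is available. The helper name has_member is hypothetical; it relies only on the find member that both containers provide.

```cpp
#include <cassert>
#include <set>
#include <unordered_set>

// Works for any container with a member find(), such as std::set or
// std::unordered_set.
template <typename SetLike>
bool has_member(const SetLike& s, const typename SetLike::key_type& key)
{
    return s.find(key) != s.end();
}

inline void membership_example()
{
    std::set<int> ordered;            // O(log n) lookup, sorted iteration
    std::unordered_set<int> hashed;   // average O(1) lookup, no ordering
    ordered.insert(3);
    hashed.insert(3);
    assert(has_member(ordered, 3));
    assert(has_member(hashed, 3));
    assert(!has_member(ordered, 4));
}
```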
What you want is the find_first_of method from the algorithms library. (or binary search, or anything along those lines)
http://www.cplusplus.com/reference/algorithm/find_first_of/
If you want to use a C++ standard container, due to its design, the containers themselves do not necessarily have "contains", but you can always use the find algorithm.
You should choose your container according to the characteristics of your dataset and the access "workload".
For a good reference on the containers and algorithms of the C++ standard library, check the Containers and Algorithms sections of http://www.cplusplus.com
If as it seems, your data is made of unique items, for which you want to associate a value, you probably will be well served by the map container. If all you care about is "membership", then set is a better choice.

How can i find a value in a map using binders only

To search by the second value of a map, I use something like the following:
typedef std::map<int, int> CMyList;
static CMyList myList;
template<class t> struct second_equal
{
typedef typename t::mapped_type mapped_type;
typedef typename t::value_type value_type;
second_equal(mapped_type f) : v(f) {}
bool operator()(const value_type &a) const { return a.second == v; }
mapped_type v;
};
...
int i = 7;
CMyList::iterator it = std::find_if(myList.begin(), myList.end(),
second_equal<CMyList>(i));
Question: How can I do such a find in a single line without supplying a self-written template?
Use a selector to select the first or the second element from the value_type that you get from the map.
Use a binder to bind the value (i) to one of the arguments of the std::equal_to function.
Use a composer to use the output of the selector as the other argument of the equal_to function.
//stl version (compose1 and select2nd are SGI STL extensions, not standard C++)
CMyList::iterator it = std::find_if(
myList.begin(),
myList.end(),
std::compose1(
std::bind2nd(std::equal_to<CMyList::mapped_type>(), i),
std::select2nd<CMyList::value_type>())) ;
//Boost.Lambda or Boost.Bind version
CMyList::iterator it = std::find_if(
myList.begin(),
myList.end(),
bind( &CMyList::value_type::second, _1)==i);
I am going to be off, voluntarily. The problem with lambdas is that (apart from C++0x) you cannot actually use something like _.second at the moment.
Personally, I thus use:
template <class Second>
class CompareSecond
{
public:
CompareSecond(Second const& t) : m_ref(t) {} // actual impl use Boost.callparams
template <class First>
bool operator()(std::pair<First,Second> const& p) const { return p.second == m_ref; }
private:
Second const& m_ref;
};
Which I combine with:
template <class Second>
CompareSecond<Second> compare_second(Second const& t)
{
return CompareSecond<Second>(t);
}
In order to get automatic type deduction.
And this way I can just write
CMyList::iterator it = std::find_if(myList.begin(), myList.end(), compare_second(i));
True, it does not use binders.
But at least, mine is readable and easily understandable, which beats the crap out of clever trickery in my opinion.
Note:
actually I went as far as wrapping STL algorithms to take full containers, so it would be:
CMyList::iterator it = toolbox::find_if(myList, compare_second(i));
which (imho) is clearly as readable as you can get without the auto keyword for type inference.
You can use Boost Lambda
CMyList::iterator it = std::find_if(
myList.begin(), myList.end(),
boost::lambda::bind(&CMyList::value_type::second, boost::lambda::_1) == i);
You can turn this problem around and just write your own algorithm and use it instead. This way you are not stuck with writing lots of little functors.
template <typename Iter, typename T>
Iter find_second(Iter first, Iter last, T value) {
while (first != last) {
if (first->second == value) {
return first;
}
++first;
}
return first;
}
Note this isn't tested or even compiled.
It seems to me that solving this with binders is just asking for lots of ugly code. What you are really asking for is a new algorithm so just add the algorithm. With that said, I would probably end up implementing something like Matthieu M. came up with.
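For completeness, and assuming a C++11 compiler (which the answers above mostly could not), a lambda gives a single-expression find_if with neither binders nor a hand-written functor. The wrapper function find_by_value is hypothetical and only there to make the sketch testable.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

typedef std::map<int, int> CMyList;

// Returns the first entry whose mapped value equals wanted, or m.end().
inline CMyList::const_iterator find_by_value(const CMyList& m, int wanted)
{
    return std::find_if(
        m.begin(), m.end(),
        [wanted](const CMyList::value_type& entry) {
            return entry.second == wanted;
        });
}

inline void find_by_value_example()
{
    CMyList myList;
    myList[1] = 7;
    myList[2] = 9;
    myList[3] = 7;
    assert(find_by_value(myList, 7)->first == 1);   // first match in key order
    assert(find_by_value(myList, 5) == myList.end());
}
```

This is still a linear scan over the map, exactly like the binder and functor versions.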