How to find most commonly occurring non-unique keys in Boost MultiIndex?


Boost MultiIndex Container, when defined to have hashed_non_unique keys, can group equivalent keys together and return them all against an equal_range query, as mentioned here. But I see no way of querying the largest range (or n largest ranges) in a set. Without comparing between the range sizes of distinct hashes, which can become computationally very expensive, is there a way to query the largest equal ranges?
If we consider a simple example, such as this one, I would like to query by frequency and get Tom as the first result, and then Jack and Leo in no particular order.

Ok, if you're using non-unique hashed indices, turns out equal_range does not invoke equality comparison for all the elements in the returned range (unlike common implementations of std::unordered_multimap, BTW), so the following can be very efficient:
template<typename HashIndex>
std::multimap<
std::size_t,
std::reference_wrapper<const typename HashIndex::value_type>,
std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
decltype(group_sizes(i)) res;
for(auto it=i.begin(),end=i.end();it!=end;){
auto next=i.equal_range(*it).second;
res.emplace((std::size_t)std::distance(it,next),*it);
it=next;
}
return res;
}
To check how efficient this actually is, let's try instrumenting the element type:
Live Coliru Demo
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <cstring>
#include <functional>
#include <iostream>
#include <string>
#include <tuple>
#include <map>
template<typename HashIndex>
std::multimap<
std::size_t,
std::reference_wrapper<const typename HashIndex::value_type>,
std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
decltype(group_sizes(i)) res;
for(auto it=i.begin(),end=i.end();it!=end;){
auto next=i.equal_range(*it).second;
res.emplace((std::size_t)std::distance(it,next),*it);
it=next;
}
return res;
}
struct instrumented_string:std::string
{
using std::string::string;
static void reset_nums()
{
num_hashes=0;
num_eqs=0;
}
static std::size_t num_hashes,num_eqs;
};
std::size_t instrumented_string::num_hashes=0;
std::size_t instrumented_string::num_eqs=0;
bool operator==(const instrumented_string& x,const instrumented_string& y)
{
++instrumented_string::num_eqs;
return static_cast<std::string>(x)==y;
}
std::size_t hash_value(const instrumented_string& x)
{
++instrumented_string::num_hashes;
return boost::hash<std::string>{}(x);
}
using namespace boost::multi_index;
using container=multi_index_container<
instrumented_string,
indexed_by<
hashed_non_unique<identity<instrumented_string>>
>
>;
int main()
{
auto values={"Tom","Jack","Leo","Bjarne","Subhamoy"};
container c;
for(auto& v:values){
for(auto i=100*std::strlen(v);i--;)c.insert(v);
}
instrumented_string::reset_nums();
auto gs=group_sizes(c);
for(const auto& g:gs){
std::cout<<g.first<<": "<<g.second.get()<<"\n";
}
std::cout<<"# hashes: "<<instrumented_string::num_hashes<<"\n";
std::cout<<"# eqs: "<<instrumented_string::num_eqs<<"\n";
}
Output
800: Subhamoy
600: Bjarne
400: Jack
300: Tom
300: Leo
# hashes: 5
# eqs: 5
So, for a container with 2,400 elements, invoking group_sizes has resulted in just 5 hash calculations and 5 equality comparisons (plus ~2,400 iterator increments, of course).
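For comparison, the same run-walking idea can be sketched with only the standard library. This is a minimal sketch, not part of the answer above: it assumes a std::unordered_multiset<std::string> in place of the Boost index, and top_groups is a hypothetical helper name. It relies on the standard guarantee that equivalent elements are adjacent in iteration order. As noted above, common standard library implementations may perform more equality comparisons per equal_range call than Boost.MultiIndex does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: returns up to n (count, key) pairs, largest count first.
std::vector<std::pair<std::size_t, std::string>>
top_groups(const std::unordered_multiset<std::string>& s, std::size_t n)
{
    std::vector<std::pair<std::size_t, std::string>> groups;
    for (auto it = s.begin(); it != s.end();) {
        // Equivalent elements are adjacent, so this is the end of the current run.
        auto next = s.equal_range(*it).second;
        assert(next != it);  // a run always contains at least *it itself
        groups.emplace_back(static_cast<std::size_t>(std::distance(it, next)), *it);
        it = next;
    }
    n = std::min(n, groups.size());
    // Only the n largest groups need to be ordered.
    std::partial_sort(groups.begin(),
                      groups.begin() + static_cast<std::ptrdiff_t>(n),
                      groups.end(),
                      [](const auto& a, const auto& b) { return a.first > b.first; });
    groups.resize(n);
    return groups;
}
```

Groups with equal counts come back in no particular order, which matches the "Jack and Leo in no particular order" requirement from the question.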
If you really want to get rid of hashes, the following can do:
Live Coliru Demo
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <cstring>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <map>
template<typename HashIndex>
struct internal_reference
{
const HashIndex& i;
const typename HashIndex::value_type& r;
std::size_t buc;
};
template<typename HashIndex>
struct internal_reference_equal_to
{
bool operator()(
const typename HashIndex::value_type& x,
const internal_reference<HashIndex>& y)const
{
return
std::addressof(x)==std::addressof(y.r)||
y.i.key_eq()(y.i.key_extractor()(x),y.i.key_extractor()(y.r));
}
bool operator()(
const internal_reference<HashIndex>& x,
const typename HashIndex::value_type& y)const
{
return (*this)(y,x);
}
};
template<typename HashIndex>
struct internal_reference_hash
{
std::size_t operator()(const internal_reference<HashIndex>& x)const
{
return x.buc;
}
};
template<typename HashIndex>
std::multimap<
std::size_t,
std::reference_wrapper<const typename HashIndex::value_type>,
std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
decltype(group_sizes(i)) res;
for(std::size_t buc=0,buc_count=i.bucket_count();buc<buc_count;++buc){
for(auto it=i.begin(buc),end=i.end(buc);it!=end;){
auto p=i.equal_range(
internal_reference<HashIndex>{i,*it,buc},
internal_reference_hash<HashIndex>{},
internal_reference_equal_to<HashIndex>{});
std::size_t dist=0;
auto next=it;
while(p.first!=p.second){
++p.first;
++dist;
++next;
}
res.emplace(dist,*it);
it=next;
}
}
return res;
}
struct instrumented_string:std::string
{
using std::string::string;
static void reset_nums()
{
num_hashes=0;
num_eqs=0;
}
static std::size_t num_hashes,num_eqs;
};
std::size_t instrumented_string::num_hashes=0;
std::size_t instrumented_string::num_eqs=0;
bool operator==(const instrumented_string& x,const instrumented_string& y)
{
++instrumented_string::num_eqs;
return static_cast<std::string>(x)==y;
}
std::size_t hash_value(const instrumented_string& x)
{
++instrumented_string::num_hashes;
return boost::hash<std::string>{}(x);
}
using namespace boost::multi_index;
using container=multi_index_container<
instrumented_string,
indexed_by<
hashed_non_unique<identity<instrumented_string>>
>
>;
int main()
{
auto values={"Tom","Jack","Leo","Bjarne","Subhamoy"};
container c;
for(auto& v:values){
for(auto i=100*std::strlen(v);i--;)c.insert(v);
}
instrumented_string::reset_nums();
auto gs=group_sizes(c);
for(const auto& g:gs){
std::cout<<g.first<<": "<<g.second.get()<<"\n";
}
std::cout<<"# hashes: "<<instrumented_string::num_hashes<<"\n";
std::cout<<"# eqs: "<<instrumented_string::num_eqs<<"\n";
}
Output
800: Subhamoy
600: Bjarne
400: Jack
300: Tom
300: Leo
# hashes: 0
# eqs: 0
But please bear in mind this version of group_sizes exploits the undocumented fact that elements with hash value h get placed in the bucket h%bucket_count() (or, put another way, internal_reference<HashIndex> hashing is technically not a conformant compatible extension of the index hash function).

It seems like you might be better served with a std::map<K, std::vector<V>>-like interface here.
You would still always have to do the counting.
To have the counting done "magically" you might consider making the "bucket key" a refcounting type.
This would be more magical than I'd be comfortable with for my code-bases. In particular, copied elements could easily cause overcounting.
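A minimal sketch of that std::map<K, std::vector<V>> idea, assuming a hypothetical Person type and a hypothetical helper named by_frequency. The groups are built on insertion, but ranking them by size still takes an explicit pass over the group sizes:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Person { int id; std::string name; };  // hypothetical element type

using Groups = std::map<std::string, std::vector<Person>>;

// Returns pointers to the map entries, ordered by descending group size.
// Ties keep the map's key order because the sort is stable.
std::vector<const Groups::value_type*> by_frequency(const Groups& g)
{
    std::vector<const Groups::value_type*> order;
    order.reserve(g.size());
    for (const auto& entry : g) order.push_back(&entry);
    std::stable_sort(order.begin(), order.end(),
        [](auto* a, auto* b) { return a->second.size() > b->second.size(); });
    assert(order.size() == g.size());  // one pointer per group
    return order;
}
```

Inserting is then g[p.name].push_back(p). No elements are copied for the ranking itself, since the result only points into the map.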
Approach 1: BMI + RangeV3 for syntactic sugar
Warning: I consider this "advanced", as in the learning curve might be steepish. However, when you wield Ranges with ease, this can become a great productivity boost.
Note also, this does not in any way promise to increase performance. But you should note that no elements are copied, the vector (groups) merely contains subranges, which are iterator ranges into the multi-index container.
Live On Compiler Explorer
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index_container.hpp>
#include <iostream>
#include <iomanip>
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace bmi = boost::multi_index;
namespace vw = ranges::views;
namespace act = ranges::actions;
struct Person {
int m_id;
std::string m_name;
friend std::ostream& operator<<(std::ostream& os, Person const& p) {
return os << "[" << p.m_id << ", " << std::quoted(p.m_name) << "]";
}
};
typedef bmi::multi_index_container<
Person,
bmi::indexed_by<
bmi::ordered_unique<bmi::member<Person, int, &Person::m_id>>,
bmi::ordered_unique<
bmi::tag<struct by_name_id>,
bmi::composite_key<Person,
bmi::member<Person, std::string, &Person::m_name>,
bmi::member<Person, int, &Person::m_id>>
>
> >
Roster;
template <typename Index, typename KeyExtractor>
std::size_t distinct(const Index& i, KeyExtractor key) {
std::size_t res = 0;
for (auto it = i.begin(), it_end = i.end(); it != it_end;) {
++res;
it = i.upper_bound(key(*it));
}
return res;
}
int main() {
Roster const r {
{1, "Tom"},
{2, "Jack"},
{3, "Tom"},
{4, "Leo"}
};
fmt::print("Roster: {}\n", r);
static constexpr auto eq_ = std::equal_to<>{};
static constexpr auto name_ = std::mem_fn(&Person::m_name);
static constexpr auto size_ = [](auto const& r) constexpr { return std::distance(begin(r), end(r)); };
auto& idx = r.get<by_name_id>();
fmt::print("Distinct: {}, Index: {}\n", distinct(idx, name_), idx);
auto by_name_ = vw::group_by([](auto const&... arg) { return eq_(name_(arg)...); });
auto by_size_ = [](auto const&... subrange) { return (size_(subrange) > ...); };
auto groups = idx | by_name_ | ranges::to_vector;
for (auto&& x : groups |= act::sort(by_size_)) {
fmt::print("#{} persons in group {}: {}\n",
size_(x),
name_(ranges::front(x)),
x);
}
}
Prints:
Roster: {[1, "Tom"], [2, "Jack"], [3, "Tom"], [4, "Leo"]}
Distinct: 3, Index: {[2, "Jack"], [4, "Leo"], [1, "Tom"], [3, "Tom"]}
#2 persons in group Tom: {[1, "Tom"], [3, "Tom"]}
#1 persons in group Jack: {[2, "Jack"]}
#1 persons in group Leo: {[4, "Leo"]}
Note, I merely kept the distinct() function from the original link. You could drop it to remove some noise.
Approach 2: The same, but w/o Boost
Multi-index seems to be supplying nothing more than the ordered container now, so let's simplify:
Live On Compiler Explorer
#include <set>
#include <iostream>
#include <iomanip>
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace vw = ranges::views;
namespace act = ranges::actions;
struct Person {
int m_id;
std::string m_name;
friend std::ostream& operator<<(std::ostream& os, Person const& p) {
return os << "[" << p.m_id << ", " << std::quoted(p.m_name) << "]";
}
bool operator<(Person const& o) const { return m_name < o.m_name; }
};
int main() {
std::multiset<Person> const r {
{1, "Tom"},
{2, "Jack"},
{3, "Tom"},
{4, "Leo"}
};
fmt::print("Roster: {}\n", r);
static constexpr auto eq_ = std::equal_to<>{};
static constexpr auto name_ = std::mem_fn(&Person::m_name);
static constexpr auto size_ = [](auto const& r) constexpr { return std::distance(begin(r), end(r)); };
auto by_name_ = vw::group_by([](auto const&... arg) { return eq_(name_(arg)...); });
auto by_size_ = [](auto const&... subrange) { return (size_(subrange) > ...); };
auto groups = r | by_name_ | ranges::to_vector;
for (auto&& x : groups |= act::sort(by_size_)) {
fmt::print("#{} persons in group {}: {}\n",
size_(x),
name_(ranges::front(x)),
x);
}
}
Prints
Roster: {[2, "Jack"], [4, "Leo"], [1, "Tom"], [3, "Tom"]}
#2 persons in group Tom: {[1, "Tom"], [3, "Tom"]}
#1 persons in group Jack: {[2, "Jack"]}
#1 persons in group Leo: {[4, "Leo"]}
Bonus: Slightly more simplified assuming equality operator on Person suffices: https://godbolt.org/z/58xsTK

Related

How can I convert std::vector<T> to a vector of pairs std::vector<std::pair<T,T>> using an STL algorithm?

I have a vector of integers:
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
You may assume that values.size() is always even.
I simply want to convert the adjacent elements into a pair, like this:
std::vector<std::pair<int,int>> values = { {1,2}, {3,4} , {5,6}, {7,8} ,{9,10} };
I.e., the two adjacent elements are joined into a pair.
What STL algorithm can I use to easily achieve this? Is it possible to achieve this through some standard algorithms?
Of course, I can easily write an old-school indexed for loop to achieve that. But I want to know what the simplest solution could look like using range-based for loops or any other STL algorithm, like std::transform, etc.

Once we have C++23's extension to <ranges>, you can get most of the way there with std::ranges::views::chunk, although that produces subranges, not pairs.
#include <iostream>
#include <ranges>
#include <vector>
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto chunk_to_pair = [](auto chunk)
{
return std::pair(*chunk.begin(), *std::next(chunk.begin()));
};
for (auto [first, second] : values | std::ranges::views::chunk(2) | std::ranges::views::transform(chunk_to_pair))
{
std::cout << first << second << std::endl;
}
}
Alternatively, you could achieve a similar result by zipping a pair of strided views:
#include <iostream>
#include <ranges>
#include <vector>
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto odds = values | std::ranges::views::drop(0) | std::ranges::views::stride(2);
auto evens = values | std::ranges::views::drop(1) | std::ranges::views::stride(2);
for (auto [first, second] : std::ranges::views::zip(odds, evens))
{
std::cout << first << second << std::endl;
}
}
That last one can be generalised to n-tuples
#include <cstddef>
#include <utility>
#include <range/v3/all.hpp>
template <std::size_t N>
struct tuple_chunk_t
{
template <typename R, std::size_t... Is>
static auto impl(R && r, std::index_sequence<Is...>)
{
// One strided view per offset 0..N-1, zipped into N-tuples.
namespace vw = ranges::views;
return vw::zip((r | vw::drop(Is) | vw::stride(N))...);
}
template <typename R>
auto operator()(R && r) const
{
return impl(std::forward<R>(r), std::make_index_sequence<N>{});
}
template <typename R>
friend auto operator|(R && r, tuple_chunk_t)
{
return impl(std::forward<R>(r), std::make_index_sequence<N>{});
}
};
template <std::size_t N>
constexpr tuple_chunk_t<N> tuple_chunk{};

I'm not sure why you would require a standard algorithm when writing it yourself is roughly 5 lines of code (plus boilerplate):
template<class T>
std::vector<std::pair<T, T>> group_pairs(const std::vector<T>& values)
{
assert(values.size() % 2 == 0);
auto output = std::vector<std::pair<T, T>>();
output.reserve(values.size()/2);
for(size_t i = 0; i < values.size(); i+=2)
output.emplace_back(values[i], values[i+1]);
return output;
}
And call it like so:
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto result = group_pairs(values);
Live Demo

I am not aware of a standard algorithm that does what you want directly (though I am not very familiar with C++20 and beyond). You can always write a loop and most loops can be expressed via std::for_each which is a standard algorithm.
As you are accumulating elements in pairs, I would give std::accumulate a try:
#include <vector>
#include <numeric>
#include <iostream>
struct pair_accumulator {
std::vector<std::pair<int,int>> result;
int temp = 0;
bool set = false;
pair_accumulator& operator+(int x){
if (set) {
result.push_back({temp,x});
set = false;
} else {
temp = x;
set = true;
}
return *this;
}
};
int main() {
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto x = std::accumulate(values.begin(),values.end(),pair_accumulator{}).result;
for (const auto& e : x) {
std::cout << e.first << " " << e.second << "\n";
}
}
Whether this is simpler than writing a plain loop is questionable admittedly.
If possible, I would try not to transform the vector. Instead of accessing result[i].first you can just as well use values[i*2], and similarly values[i*2+1] for second. If this is not feasible, the next option is to populate a std::vector<std::pair<int,int>> from the start so you don't have to do the transformation. For the first option, depending on what you need in detail, the following might be a start:
#include <vector>
#include <iostream>
struct view_as_pairs {
std::vector<int>& values;
struct proxy {
std::vector<int>::iterator it;
int& first() { return *it;}
int& second() { return *(it +1); }
};
proxy operator[](size_t index){
return proxy{values.begin() + index*2};
}
size_t size() { return values.size() / 2;}
};
int main() {
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
view_as_pairs v{values};
for (size_t i=0; i < v.size(); ++i){
std::cout << v[i].first() << " " << v[i].second() << "\n";
}
}
TL;DR: Consider if you can avoid the transformation. If you cannot avoid it, it is probably cleanest to write a loop. Standard algorithms help often but not always.

OK, I hinted in the comments about using std::adjacent_find, so here is how you would do this.
And yes, many (myself included) consider this a hack, where we are using a tool meant for something else to make short work of solving a seemingly unrelated problem:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
int main()
{
//Test data
std::vector<int> v = {1,2,3,4,5,6,7,8,9,10};
// results
std::vector<std::pair<int,int>> result;
// save flag
bool save_it = true;
// Use std::adjacent_find
std::adjacent_find(v.begin(), v.end(), [&](int n1, int n2)
{ if (save_it) result.push_back({n1,n2}); save_it = !save_it; return false; });
for (auto& pr : result)
std::cout << pr.first << " " << pr.second << "\n";
}
Output:
1 2
3 4
5 6
7 8
9 10
The way it works is we ignore the second, fourth, sixth, etc. pairs, and only save the first, third, fifth, etc. pairs. That's controlled by a boolean flag variable, save_it.
Note that since we want to process all pairs, the std::adjacent_find predicate always returns false. That's the hackish part of this solution.

The solutions so far try to use the std::vector iterators as input to the algorithms directly. How about defining a custom iterator that returns a std::pair and has strides of 2? Creating the vector of pairs is then a one-liner that uses std::copy. The iterator effectively provides a "view" onto the original vector in terms of pairs. This also allows the use of many of the standard algorithms. The following example could also be generalized quite a bit to work with most container iterators, i.e. you do the difficult work of defining such an iterator once and then you can apply it to all sorts of containers and algorithms. Live example: https://godbolt.org/z/ceEsvKhzd
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>
struct pair_iterator {
using difference_type = std::vector<int>::const_iterator::difference_type;
using value_type = std::pair<int, int>;
using pointer = value_type*;
using reference = value_type; // Not a pair&, but that is ok for LegacyIterator
// Can't be forward_iterator_tag because "reference" is not a pair&
using iterator_category = std::input_iterator_tag;
reference operator*()const { return {*base_iter, *(base_iter + 1)}; }
pair_iterator & operator++() { base_iter += 2; return *this; }
pair_iterator operator++(int) { auto ret = *this; ++(*this); return ret; }
friend bool operator==(pair_iterator lhs, pair_iterator rhs){
return lhs.base_iter == rhs.base_iter;
}
friend bool operator!=(pair_iterator lhs, pair_iterator rhs){
return lhs.base_iter != rhs.base_iter;
}
std::vector<int>::const_iterator base_iter{};
};
auto pair_begin(std::vector<int> const & v){ assert(v.size()%2==0); return pair_iterator{v.begin()}; }
auto pair_end(std::vector<int> const & v){ assert(v.size()%2==0); return pair_iterator{v.end()}; }
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
std::vector<std::pair<int, int>> pair_values;
std::copy(pair_begin(values), pair_end(values), std::back_inserter(pair_values));
for (auto const & pair : pair_values) {
std::cout << "{" << pair.first << "," << pair.second << "} ";
}
std::cout << std::endl;
}

Is there an elegant possibility to include several sets into a multimap?

I have several sets of the same type:
std::set< TDate> spieltagDatum;
I would like to include all of them into a multimap of this type:
std::multimap<TDate, int> ereignis;
Is there an elegant possibility (perhaps with a lambda related function?) to include all members of ONE set into the multimap above not using the iterator mechanism? (The multimap pairs should be enriched with the INT parameter during insert).

Instead of explicit iterators, I suggest a range-based for loop with auto, as below.
I used an integer TDate just as an example; instead of 123 you can call any function that produces the values for the multimap.
Try it online!
#include <map>
#include <set>
int main() {
using TDate = int;
std::set<TDate> spieltagDatum = {3, 5, 7};
std::multimap<TDate, int> ereignis;
for (auto & e: spieltagDatum)
ereignis.emplace(e, 123);
}
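Since the question mentions several sets, the loop can be wrapped in a small function and called once per set. This is only a sketch under that assumption; add_all is a hypothetical name, and the tag values passed in stand in for whatever int each set should be enriched with:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Hypothetical helper: inserts every key of `keys` into `out`, paired with `tag`.
template <class Key, class Value>
void add_all(std::multimap<Key, Value>& out, const std::set<Key>& keys, const Value& tag)
{
    const std::size_t before = out.size();
    for (const auto& k : keys)
        out.emplace(k, tag);
    // A multimap never rejects duplicates, so every key was inserted.
    assert(out.size() == before + keys.size());
}
```

A call such as add_all(ereignis, spieltagDatum, 1) followed by a second call with another set and another tag then fills the multimap without any visible iterator handling at the call site.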

What do you mean by "not using the iterator mechanism"? (Don't use iterators at your own peril)
As you describe it, what you do is 1) transform (by enriching) and 2) insert, so the answer is std::transform + std::inserter.
#include <algorithm> // transform
#include <cassert>
#include <map>
#include <set>
int main() {
using TDate = int;
std::set<TDate> spieltagDatum = {3, 5, 7};
std::set<TDate> ...;
std::multimap<TDate, int> ereignis;
auto enrich = [](auto e){return std::make_pair(e, 123);};
std::transform(
begin(spieltagDatum), end(spieltagDatum),
std::inserter(ereignis, end(ereignis)),
enrich
);
... // repeat for other sets if necessary
assert( ereignis.find(5) != ereignis.end() );
assert( ereignis.find(5)->second == 123 );
}
https://godbolt.org/z/zzYbKK83d
A more declarative option using libraries, based on @prehistoricpenguin's answer:
(IMO this is mainly worthwhile in C++17 and later, where class template argument deduction makes many of the template parameters unnecessary.)
#include <cassert>
#include <map>
#include <set>
#include <boost/iterator/transform_iterator.hpp>
int main() {
using TDate = int;
std::set<TDate> spieltagDatum = {3, 5, 7};
auto enriched = [](auto it){
return boost::transform_iterator(it, [](auto e){return std::pair(e, 123);});
};
std::multimap ereignis(
enriched(begin(spieltagDatum)),
enriched(end (spieltagDatum))
);
assert( ereignis.find(5) != ereignis.end() );
assert( ereignis.find(5)->second == 123 );
}
https://godbolt.org/z/6ajssjjjP

One possible answer is to write a converting iterator class and then use that iterator to construct the multimap instance.
#include <iostream>
#include <iterator>
#include <map>
#include <set>
template <typename KeyT, typename ValT>
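class ConvertIter {
using SetIter = typename std::set<KeyT>::iterator;
public:
// std::iterator is deprecated since C++17, so the member types are declared directly.
// operator* returns by value, so this is only an input iterator.
using iterator_category = std::input_iterator_tag;
using value_type = std::pair<KeyT, ValT>;
using difference_type = typename std::set<KeyT>::difference_type;
using pointer = const value_type*;
using reference = value_type;
ConvertIter(SetIter itr, ValT v = ValT{}) : _itr(itr), _val(v) {}
bool operator==(const ConvertIter& other) const { return other._itr == _itr; }
bool operator!=(const ConvertIter& other) const { return other._itr != _itr; }
std::pair<KeyT, ValT> operator*() const {
return {*_itr, _val};
}
ConvertIter& operator++() {
++_itr;
return *this;
}
// Postfix increment returns the iterator as it was before the increment.
ConvertIter operator++(int) {
ConvertIter old = *this;
++_itr;
return old;
}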
private:
SetIter _itr;
ValT _val;
};
int main() {
using TDate = int;
std::set<TDate> spieltagDatum = {3, 5, 7};
std::multimap<TDate, int> ereignis(
ConvertIter<TDate, int>(spieltagDatum.begin(), 123),
ConvertIter<TDate, int>(spieltagDatum.end()));
for (auto [date, val] : ereignis) {
std::cout << "[" << date << "," << val << "]" << std::endl;
}
return 0;
}
Demo:
https://godbolt.org/z/cr98f15jq

How to find the indices of matching elements of sorted containers?

I'm trying to get the indices of one container where the elements match. Both containers are sorted in ascending order. Is there an algorithm or combo of algorithms that would place the indices of matching elements of sorted containers into another container?
I've coded an algorithm already, but was wondering if this has been coded before in the stl in some way that I didn't think of?
I would like the algorithm to have a running complexity comparable to the one I suggested, which I believe is O(min(m, n)).
#include <iterator>
#include <iostream>
template <typename It, typename Index_it>
void get_indices(It selected_it, It selected_it_end, It subitems_it, It subitems_it_end, Index_it indices_it)
{
auto reference_it = selected_it;
while (selected_it != selected_it_end && subitems_it != subitems_it_end) {
if (*selected_it == *subitems_it) {
*indices_it++ = std::distance(reference_it, selected_it);
++selected_it;
++subitems_it;
}
else if (*selected_it < *subitems_it) {
++selected_it;
}
else {
++subitems_it;
}
}
}
int main()
{
int items[] = { 1, 3, 6, 8, 13, 17 };
int subitems[] = { 3, 6, 17 };
int indices[std::size(subitems)] = {0};
auto selected_it = std::begin(items), it = std::begin(subitems);
auto indices_it = std::begin(indices);
get_indices(std::begin(items), std::end(items)
, std::begin(subitems), std::end(subitems)
, std::begin(indices));
for (auto i : indices) {
std::cout << i << ", ";
}
return 0;
}

We can use find_if to simplify the implementation of the function:
template<class SourceIt, class SelectIt, class IndexIt>
void get_indicies(SourceIt begin, SourceIt end, SelectIt sbegin, SelectIt send, IndexIt dest) {
auto scan = begin;
for(; sbegin != send; ++sbegin) {
auto&& key = *sbegin;
scan = std::find_if(scan, end, [&](auto&& obj) { return obj >= key; });
if(scan == end) break;
for(; scan != end && *scan == key; ++scan) {
*dest = std::distance(begin, scan);
++dest;
}
}
}
This doesn't make it that much shorter, but the code looks a little cleaner now. You scan until you find an element at least as large as the key, and then you copy indices to the destination as long as the source matches the key.
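Because the source range is sorted, the scan step could also be expressed with std::lower_bound, which finds the first element that is not less than the key. The following is a sketch of that variation rather than the code above; get_indices_sorted is a hypothetical name, and the binary search only pays off for random-access iterators:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical variant: std::lower_bound replaces the linear std::find_if scan.
template <class SourceIt, class SelectIt, class IndexIt>
IndexIt get_indices_sorted(SourceIt first, SourceIt last,
                           SelectIt sfirst, SelectIt slast, IndexIt dest)
{
    assert(std::is_sorted(first, last));  // precondition: ascending source range
    auto scan = first;
    for (; sfirst != slast && scan != last; ++sfirst) {
        // First element that is not less than the current key.
        scan = std::lower_bound(scan, last, *sfirst);
        // Emit the index of every source element equal to the key.
        for (; scan != last && *scan == *sfirst; ++scan) {
            *dest = std::distance(first, scan);
            ++dest;
        }
    }
    return dest;
}
```

The function returns the advanced output iterator, so the caller can tell how many indices were written.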

Maybe I misunderstood the question, but there is a function in the algorithm library:
std::set_intersection
This does what you want in a single call. See:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
int main()
{
// Input values
std::vector<int> items{ 1,3,6,8,13,17 };
std::vector<int> subitems{ 3,6,17 };
// Result
std::vector<int> result;
// Do the work. One liner
std::set_intersection(items.begin(),items.end(), subitems.begin(),subitems.end(),std::back_inserter(result));
// Debug output: Show result
std::copy(result.begin(), result.end(), std::ostream_iterator<int>(std::cout, " "));
return 0;
}
If I misunderstood, then please tell me and I will find another solution.
EDIT:
I indeed misunderstood. You wanted the indices. Then maybe like this?
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
using Iter = std::vector<int>::iterator;
int main()
{
// Input values
std::vector<int> items{ 1,3,6,8,13,17 };
std::vector<int> subitems{ 3,6,17 };
// Result
std::vector<int> indices{};
Iter it;
// Do the work.
std::for_each(subitems.begin(), subitems.end(), [&](int i) {it = std::find(items.begin(), items.end(), i); if (it != items.end()) indices.push_back(std::distance(items.begin(),it));});
// Debug output: Show result
std::copy(indices.begin(), indices.end(), std::ostream_iterator<int>(std::cout, " "));
return 0;
}
Unfortunately a very long "one-liner".
I need to think more . . .

The answer is yes, but it comes with C++20: you can use ranges for this purpose.
First make a view with some predicate you like:
auto result = items | ranges::view::filter(predicate);
Then take the iterator into the original array from base(): for example, result.begin().base() gives you the iterator to the first element of result in the original array.
#include <algorithm>
#include <iostream>
#include <vector>
#include <iterator>
#include <range/v3/view/filter.hpp>
#include <range/v3/view/transform.hpp>
int main()
{
std::vector<int> items = { 1, 3, 6, 8, 13, 17 };
std::vector<int> subitems = { 3, 6, 17 };
auto predicate = [&](int& n){
for(auto& s : subitems)
if(n == s)
return true;
return false;
};
auto result = items | ranges::view::filter(predicate);
for (auto& n : result)
{
std::cout << n << '\n';
}
for(auto it = result.begin(); it != result.end(); ++it )
std::cout << it.base() - items.begin() << ' ';
}
see the godbolt

By using std::set_intersection and defining an assignment_iterator class and an assignment helper, this is possible:
#include <iterator>
#include <iostream>
#include <algorithm>
#include <vector>
template <typename Transform>
class assignment_iterator
{
Transform transform;
public:
using iterator_category = std::output_iterator_tag;
using value_type = void;
using difference_type = void;
using pointer = void;
using reference = void;
assignment_iterator(Transform transform)
: transform(transform)
{}
// For some reason VC++ is assigning the iterator inside of std::copy().
// Not needed for other compilers.
#ifdef _MSC_VER
assignment_iterator& operator=(assignment_iterator const& copy)
{
transform.~Transform();
new (&transform) Transform(copy.transform);
return *this;
}
#endif
template <typename T>
constexpr assignment_iterator& operator=(T& value) {
transform(value);
return *this;
}
constexpr assignment_iterator& operator* ( ) { return *this; }
constexpr assignment_iterator& operator++( ) { return *this; }
constexpr assignment_iterator& operator++(int) { return *this; }
};
template <typename Transform>
assignment_iterator<Transform> assignment(Transform&& transform)
{
return { std::forward<Transform>(transform) };
}
int main()
{
int items[] = { 1, 3, 6, 8, 13, 17 };
int subitems[] = { 3, 6, 17 };
std::vector<int> indices;
std::set_intersection(std::begin(items), std::end(items)
, std::begin(subitems), std::end(subitems)
, assignment([&items, &indices](int& item) {
return indices.push_back(&item - &*std::begin(items));
})
);
std::copy(indices.begin(), indices.end()
, assignment([&indices](int& index) {
std::cout << index;
if (&index != &std::end(indices)[-1])
std::cout << ", ";
})
);
return 0;
}
Demo
It's more code, but assignment may be a more generic way to perform operations that currently require specific implementations such as back_inserter and ostream_iterator, and so could mean less code in the long run (for example, the second use above with std::copy).
This should work properly all the time based on the documentation here:
elements will be copied from the first range to the destination range.

You can use std::find and std::distance to find the index of the match, then put it in the container.
#include <vector>
#include <algorithm>
int main ()
{
std::vector<int> v = {1,2,3,4,5,6,7};
std::vector<int> matchIndexes;
std::vector<int>::iterator match = std::find(v.begin(), v.end(), 5);
int index = std::distance(v.begin(), match);
matchIndexes.push_back(index);
return 0;
}
To match multiple elements, you can use std::search in similar fashion.
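The snippet above records a single match and does not check whether std::find returned v.end(). A minimal sketch that collects the index of every occurrence and simply returns an empty result when nothing matches; all_indices is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: positions of every element equal to `value`.
std::vector<std::ptrdiff_t> all_indices(const std::vector<int>& v, int value)
{
    std::vector<std::ptrdiff_t> out;
    auto it = std::find(v.begin(), v.end(), value);
    while (it != v.end()) {  // end() means "no further match"
        assert(*it == value);
        out.push_back(std::distance(v.begin(), it));
        // Resume the search just past the match that was recorded.
        it = std::find(std::next(it), v.end(), value);
    }
    return out;
}
```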

LowerBound in multiset stl

I was trying to find how many elements are less than a certain X in a multiset by using:
mset.lower_bound(X) - mset.begin()
But it didn't work. Any workarounds?

You may use:
std::distance(mset.begin(), mset.lower_bound(X));
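Note that lower_bound returning mset.end() is not an error case: it means every element is less than X, so the distance is then the size of the multiset and no special check is needed. To store the result as an unsigned count, use:
size_t count = std::distance(mset.begin(), mset.lower_bound(X));
Keep in mind that std::distance has to step through the elements one by one here, because multiset iterators are bidirectional rather than random-access.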
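Wrapped up as a function, the count might look like the sketch below. count_less is a hypothetical name, not a standard facility. When lower_bound returns end(), every element is smaller than x and the distance equals the container size:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical helper: number of elements strictly less than x.
// Linear in the result, because multiset iterators are bidirectional.
template <class T>
std::size_t count_less(const std::multiset<T>& s, const T& x)
{
    auto lb = s.lower_bound(x);
    assert(lb == s.end() || !(*lb < x));  // lb is the first element not less than x
    return static_cast<std::size_t>(std::distance(s.begin(), lb));
}
```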

If computing the number of items below a lower bound is done frequently, and items are inserted seldom, you might get better performance using a std::vector and keeping it sorted.
Particularly if T is moveable.
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <iterator>
auto insert(std::vector<std::string>& v, std::string s)
{
auto lb = std::lower_bound(v.begin(), v.end(), s);
v.insert(lb, std::move(s));
}
int main()
{
std::vector<std::string> v;
insert(v, "goodbye");
insert(v, "world");
insert(v, "cruel");
auto count = std::distance(v.begin(), std::lower_bound(v.begin(), v.end(), "goodbye"));
std::cout << count << std::endl;
std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
But why still use std::distance with a vector?
Because we might change our mind if we choose to profile with different container types, so it's better to be idiomatic. The standard library contains specialisations to ensure that the idiom is optimal:
#include <vector>
#include <set>
#include <string>
#include <algorithm>
#include <iostream>
#include <iterator>
template<class Range, class Value, class Pred = std::less<>>
auto lower_bound(Range&& range, Value&& v, Pred&& pred = Pred())
{
return std::lower_bound(std::begin(range), std::end(range),
std::forward<Value>(v),
std::forward<Pred>(pred));
}
template<class Container>
auto insert(Container&& v, std::string s)
{
auto lb = lower_bound(v, s);
v.insert(lb, std::move(s));
}
template<class Range, class OutIter>
auto copy(Range&& range, OutIter dest)
{
return std::copy(std::begin(range), std::end(range), dest);
}
auto test = [](auto&& container)
{
insert(container, "goodbye");
insert(container, "world");
insert(container, "cruel");
auto count = std::distance(std::begin(container), lower_bound(container, "goodbye"));
std::cout << count << std::endl;
copy(container, std::ostream_iterator<std::string>(std::cout, "\n"));
};
int main()
{
test(std::vector<std::string>{});
test(std::multiset<std::string>{});
}
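To see why the idiom costs nothing extra for a vector: std::distance uses a single subtraction for random-access iterators and repeated increments otherwise. The sketch below only checks which iterator category each container reports. The alias category_of is a hypothetical name introduced for illustration.

```cpp
#include <iterator>
#include <set>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical alias: the iterator category a container's iterators report.
template <class Container>
using category_of =
    typename std::iterator_traits<typename Container::iterator>::iterator_category;

// vector iterators are random-access, so std::distance is a constant-time subtraction.
static_assert(std::is_same<category_of<std::vector<std::string>>,
                           std::random_access_iterator_tag>::value,
              "vector iterators are random-access");

// multiset iterators are only bidirectional, so std::distance steps element by element.
static_assert(std::is_same<category_of<std::multiset<std::string>>,
                           std::bidirectional_iterator_tag>::value,
              "multiset iterators are bidirectional");
```

Both assertions hold at compile time, so the same call site gets the cheapest available implementation for whichever container is chosen.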

C++ algorithm like python's 'groupby'

Are there any C++ transformations which are similar to itertools.groupby()?
Of course I could easily write my own, but I'd prefer to leverage the idiomatic behavior or compose one from the features provided by the STL or boost.
#include <cstdlib>
#include <map>
#include <algorithm>
#include <string>
#include <vector>
struct foo
{
int x;
std::string y;
float z;
};
bool lt_by_x(const foo &a, const foo &b)
{
return a.x < b.x;
}
void list_by_x(const std::vector<foo> &foos, std::map<int, std::vector<foo> > &foos_by_x)
{
/* ideas..? */
}
int main(int argc, const char *argv[])
{
std::vector<foo> foos;
std::map<int, std::vector<foo> > foos_by_x;
std::vector<foo> sorted_foos;
std::sort(foos.begin(), foos.end(), lt_by_x);
list_by_x(sorted_foos, foos_by_x);
return EXIT_SUCCESS;
}

This doesn't really answer your question, but for the fun of it, I implemented a group_by iterator. Maybe someone will find it useful:
#include <cassert>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
using std::cout;
using std::cerr;
using std::multiset;
using std::ostringstream;
using std::pair;
using std::vector;
struct Foo
{
int x;
std::string y;
float z;
};
struct FooX {
typedef int value_type;
value_type operator()(const Foo &f) const { return f.x; }
};
template <typename Iterator,typename KeyFunc>
struct GroupBy {
typedef typename KeyFunc::value_type KeyValue;
struct Range {
Range(Iterator begin,Iterator end)
: iter_pair(begin,end)
{
}
Iterator begin() const { return iter_pair.first; }
Iterator end() const { return iter_pair.second; }
private:
pair<Iterator,Iterator> iter_pair;
};
struct Group {
KeyValue value;
Range range;
Group(KeyValue value,Range range)
: value(value), range(range)
{
}
};
struct GroupIterator {
typedef Group value_type;
GroupIterator(Iterator iter,Iterator end,KeyFunc key_func)
: range_begin(iter), range_end(iter), end(end), key_func(key_func)
{
advance_range_end();
}
bool operator==(const GroupIterator &that) const
{
return range_begin==that.range_begin;
}
bool operator!=(const GroupIterator &that) const
{
return !(*this==that);
}
GroupIterator &operator++()
{
range_begin = range_end;
advance_range_end();
return *this;
}
value_type operator*() const
{
return value_type(key_func(*range_begin),Range(range_begin,range_end));
}
private:
void advance_range_end()
{
if (range_end!=end) {
typename KeyFunc::value_type value = key_func(*range_end++);
while (range_end!=end && key_func(*range_end)==value) {
++range_end;
}
}
}
Iterator range_begin;
Iterator range_end;
Iterator end;
KeyFunc key_func;
};
GroupBy(Iterator begin_iter,Iterator end_iter,KeyFunc key_func)
: begin_iter(begin_iter),
end_iter(end_iter),
key_func(key_func)
{
}
GroupIterator begin() { return GroupIterator(begin_iter,end_iter,key_func); }
GroupIterator end() { return GroupIterator(end_iter,end_iter,key_func); }
private:
Iterator begin_iter;
Iterator end_iter;
KeyFunc key_func;
};
template <typename Iterator,typename KeyFunc>
inline GroupBy<Iterator,KeyFunc>
group_by(
Iterator begin,
Iterator end,
const KeyFunc &key_func = KeyFunc()
)
{
return GroupBy<Iterator,KeyFunc>(begin,end,key_func);
}
static void test()
{
vector<Foo> foos;
foos.push_back({5,"bill",2.1});
foos.push_back({5,"rick",3.7});
foos.push_back({3,"tom",2.5});
foos.push_back({7,"joe",3.4});
foos.push_back({5,"bob",7.2});
ostringstream out;
for (auto group : group_by(foos.begin(),foos.end(),FooX())) {
out << group.value << ":";
for (auto elem : group.range) {
out << " " << elem.y;
}
out << "\n";
}
assert(out.str()==
"5: bill rick\n"
"3: tom\n"
"7: joe\n"
"5: bob\n"
);
}
int main(int argc,char **argv)
{
test();
return 0;
}
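Note that, like itertools.groupby, the iterator above groups only consecutive elements with equal keys, which is why key 5 shows up twice in the expected output. Sort the input by key first if you want one group per key. As a compact alternative, the same run-splitting can be sketched with std::find_if. The name group_runs and its return type are illustrative assumptions, and unlike the iterator above it copies the elements instead of exposing iterator ranges.

```cpp
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into runs of consecutive elements
// whose keys compare equal, returning (key, copied elements) pairs in order.
template <class Iterator, class KeyFunc>
auto group_runs(Iterator first, Iterator last, KeyFunc key_func)
{
    using Value = typename std::iterator_traits<Iterator>::value_type;
    using Key = std::decay_t<decltype(key_func(*first))>;
    std::vector<std::pair<Key, std::vector<Value>>> groups;
    while (first != last) {
        Key key = key_func(*first);
        // Find the first element whose key differs from the current run's key.
        Iterator run_end = std::find_if(first, last, [&](const Value& v) {
            return !(key_func(v) == key);
        });
        groups.emplace_back(std::move(key), std::vector<Value>(first, run_end));
        first = run_end;
    }
    return groups;
}
```

With the Foo and FooX types from the answer, group_runs(foos.begin(), foos.end(), FooX()) would yield the same four groups as the test.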

Eric Niebler's ranges library provides a group_by view.
According to the docs, it is a header-only library and can be included easily.
It is intended for eventual inclusion in the C++ standard, but can already be used with a recent C++11 compiler.
Minimal working example:
#include <map>
#include <vector>
#include <range/v3/all.hpp>
using namespace std;
using namespace ranges;
int main(int argc, char **argv) {
vector<int> l { 0,1,2,3,6,5,4,7,8,9 };
ranges::v3::sort(l);
auto x = l | view::group_by([](int x, int y) { return x / 5 == y / 5; });
map<int, vector<int>> res;
auto i = x.begin();
auto e = x.end();
for (;i != e; ++i) {
auto first = *((*i).begin());
res[first / 5] = to_vector(*i);
}
// res = { 0 : [0,1,2,3,4], 1: [5,6,7,8,9] }
}
(I compiled this with clang 3.9.0 and --std=c++11.)

I recently discovered cppitertools.
It fulfills this need exactly as described.
https://github.com/ryanhaining/cppitertools#groupby

What is the point of bloating the standard C++ library with an algorithm that is one line of code?
for (const auto & foo : foos) foos_by_x[foo.x].push_back(foo);
Also, take a look at std::multimap, it might be just what you need.
UPDATE:
The one-liner I have provided is not well optimized for the case when your vector is already sorted. The number of map lookups can be reduced if we remember the iterator to the previously inserted entry together with its key, and do a new lookup only when the key of the next object differs. For example:
#include <map>
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
struct foo {
int x;
std::string y;
float z;
};
class optimized_inserter {
public:
typedef std::map<int, std::vector<foo> > map_type;
optimized_inserter(map_type & map) : map(&map), it(map.end()) {}
void operator()(const foo & obj) {
typedef map_type::value_type value_type;
if (it != map->end() && last_x == obj.x) {
it->second.push_back(obj);
return;
}
last_x = obj.x;
it = map->insert(value_type(obj.x, std::vector<foo>({ obj }))).first;
}
private:
map_type *map;
map_type::iterator it;
int last_x;
};
int main()
{
std::vector<foo> foos;
std::map<int, std::vector<foo>> foos_by_x;
foos.push_back({ 1, "one", 1.0 });
foos.push_back({ 3, "third", 2.5 });
foos.push_back({ 1, "one.. but third", 1.5 });
foos.push_back({ 2, "second", 1.8 });
foos.push_back({ 1, "one.. but second", 1.5 });
std::sort(foos.begin(), foos.end(), [](const foo & lhs, const foo & rhs) {
return lhs.x < rhs.x;
});
std::for_each(foos.begin(), foos.end(), optimized_inserter(foos_by_x));
for (const auto & p : foos_by_x) {
std::cout << "--- " << p.first << "---\n";
for (auto & f : p.second) {
std::cout << '\t' << f.x << " '" << f.y << "' / " << f.z << '\n';
}
}
}
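The std::multimap suggestion from the first part of this answer can be sketched as follows. This is an illustration under assumptions, not part of the code above: to_multimap and group_size are hypothetical helper names. A multimap keeps entries with equal keys adjacent, so equal_range returns one whole group as an iterator pair.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

struct foo {
    int x;
    std::string y;
    float z;
};

// Hypothetical helper: index a copy of every foo by its x field.
std::multimap<int, foo> to_multimap(const std::vector<foo>& foos)
{
    std::multimap<int, foo> by_x;
    for (const auto& f : foos) {
        by_x.emplace(f.x, f);
    }
    return by_x;
}

// Hypothetical helper: number of entries stored under key x.
std::size_t group_size(const std::multimap<int, foo>& by_x, int x)
{
    // [range.first, range.second) spans all entries whose key equals x.
    auto range = by_x.equal_range(x);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

The same iterator pair can be walked in a loop to visit every foo in the group, which avoids building a separate vector per key.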

How about this?
template <typename StructType, typename FieldSelectorUnaryFn>
auto GroupBy(const std::vector<StructType>& instances, const FieldSelectorUnaryFn& fieldChooser)
{
StructType _;
using FieldType = decltype(fieldChooser(_));
std::map<FieldType, std::vector<StructType>> instancesByField;
for (auto& instance : instances)
{
instancesByField[fieldChooser(instance)].push_back(instance);
}
return instancesByField;
}
and use it like this:
auto itemsByX = GroupBy(items, [](const auto& item){ return item.x; });
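One caveat: the line `StructType _;` requires StructType to be default-constructible just to name the field type. A possible variant, sketched here under the hypothetical name GroupByField, deduces the key type with std::declval instead, so no instance has to be constructed.

```cpp
#include <map>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical variant of GroupBy: the key type is deduced from the selector's
// return type, so StructType need not be default-constructible.
template <typename StructType, typename FieldSelectorUnaryFn>
auto GroupByField(const std::vector<StructType>& instances,
                  const FieldSelectorUnaryFn& fieldChooser)
{
    // std::declval is only used inside decltype, so nothing is constructed.
    using FieldType =
        std::decay_t<decltype(fieldChooser(std::declval<const StructType&>()))>;
    std::map<FieldType, std::vector<StructType>> instancesByField;
    for (const auto& instance : instances) {
        instancesByField[fieldChooser(instance)].push_back(instance);
    }
    return instancesByField;
}
```

It is called the same way as the original, for example GroupByField(items, [](const auto& item){ return item.x; }).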

I wrote a C++ library to address this problem in an elegant way. Given your struct
struct foo
{
int x;
std::string y;
float z;
};
To group by y you simply do:
std::vector<foo> dataframe;
...
auto groups = group_by(dataframe, &foo::y);
You can also group by more than one variable:
auto groups = group_by(dataframe, &foo::y, &foo::x);
And then iterate through the groups normally:
for(auto& [key, group]: groups)
{
// do something
}
It also has other operations such as: subset, concat, and others.
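For readers wondering how grouping by a pointer to member such as &foo::y can work at all, here is a rough single-key sketch. It is not the library's implementation: group_by_member is a hypothetical name, and it simply copies rows into a std::map keyed by the selected field.

```cpp
#include <map>
#include <string>
#include <vector>

struct foo
{
    int x;
    std::string y;
    float z;
};

// Hypothetical sketch: group copies of each row by one data member,
// selected with a pointer to member such as &foo::y.
template <class Struct, class Field>
std::map<Field, std::vector<Struct>>
group_by_member(const std::vector<Struct>& rows, Field Struct::*member)
{
    std::map<Field, std::vector<Struct>> groups;
    for (const auto& row : rows) {
        // row.*member reads the selected field of this row.
        groups[row.*member].push_back(row);
    }
    return groups;
}
```

The result can be iterated with the same structured-binding loop shown above.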

I would simply use boolinq.h, which includes all of LINQ. No documentation, but very simple to use.
