I have a vector of integers:
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
Given that values.size() will always be even.
I simply want to convert the adjacent elements into a pair, like this:
std::vector<std::pair<int,int>> values = { {1,2}, {3,4} , {5,6}, {7,8} ,{9,10} };
I.e., the two adjacent elements are joined into a pair.
What STL algorithm can I use to easily achieve this? Is it possible to achieve this through some standard algorithms?
Of course, I can easily write an old-school indexed for loop to achieve that. But I want to know what the simplest solution could look like using range-based for loops or any other STL algorithm, like std::transform, etc.
Once we have C++23's extension to <ranges>, you can get most of the way there with std::ranges::views::chunk, although that produces subranges, not pairs.
#include <iostream>
#include <ranges>
#include <vector>
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto chunk_to_pair = [](auto chunk)
{
return std::pair(*chunk.begin(), *std::next(chunk.begin()));
};
for (auto [first, second] : values | std::ranges::views::chunk(2) | std::ranges::views::transform(chunk_to_pair))
{
std::cout << first << second << std::endl;
}
}
Alternatively, you could achieve a similar result by zipping a pair of strided views:
#include <iostream>
#include <ranges>
#include <vector>
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto odds = values | std::ranges::views::drop(0) | std::ranges::views::stride(2);
auto evens = values | std::ranges::views::drop(1) | std::ranges::views::stride(2);
for (auto [first, second] : std::ranges::views::zip(odds, evens))
{
std::cout << first << second << std::endl;
}
}
That last one can be generalised to n-tuples
template <size_t N>
struct tuple_chunk_t
{
template <typename R, size_t... Is>
static auto impl(R && r, std::index_sequence<Is...>)
{
using namespace std::views; // C++23 <ranges>; with range-v3 this would be ranges::views
return zip(r | drop(Is) | stride(N)...);
}
template <typename R>
auto operator()(R && r) const
{
return impl(std::forward<R>(r), std::make_index_sequence<N>{});
}
template <typename R>
friend auto operator|(R && r, tuple_chunk_t)
{
return impl(std::forward<R>(r), std::make_index_sequence<N>{});
}
};
template <size_t N>
constexpr tuple_chunk_t<N> tuple_chunk;
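For toolchains that do not have the C++23 views yet, the same n-element grouping can be sketched with a plain loop. This is only a minimal sketch: group_n is a hypothetical helper name, it returns std::array chunks rather than tuples, and it assumes values.size() is a multiple of N (the generalisation of the "size is always even" premise).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the answer above): groups consecutive
// elements into fixed-size std::array chunks.
// Assumption: values.size() is a multiple of N.
template <std::size_t N, class T>
std::vector<std::array<T, N>> group_n(const std::vector<T>& values)
{
    static_assert(N > 0, "chunk size must be positive");
    assert(values.size() % N == 0);
    std::vector<std::array<T, N>> output;
    output.reserve(values.size() / N);
    for (std::size_t i = 0; i < values.size(); i += N) {
        std::array<T, N> chunk{};
        // copy the N adjacent elements starting at index i
        for (std::size_t j = 0; j < N; ++j)
            chunk[j] = values[i + j];
        output.push_back(chunk);
    }
    return output;
}
```

Calling group_n<2>(values) gives one two-element array per adjacent pair; group_n<3>(values) gives triples, and so on.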
I'm not sure why you would require a standard algorithm when writing it yourself is roughly 5 lines of code (plus boilerplate):
template<class T>
std::vector<std::pair<T, T>> group_pairs(const std::vector<T>& values)
{
assert(values.size() % 2 == 0);
auto output = std::vector<std::pair<T, T>>();
output.reserve(values.size()/2);
for(size_t i = 0; i < values.size(); i+=2)
output.emplace_back(values[i], values[i+1]);
return output;
}
And call it like so:
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto result = group_pairs(values);
I am not aware of a standard algorithm that does what you want directly (though I am not very familiar with C++20 and beyond). You can always write a loop and most loops can be expressed via std::for_each which is a standard algorithm.
As you are accumulating elements in pairs, I would give std::accumulate a try:
#include <vector>
#include <numeric>
#include <iostream>
struct pair_accumulator {
std::vector<std::pair<int,int>> result;
int temp = 0;
bool set = false;
pair_accumulator& operator+(int x){
if (set) {
result.push_back({temp,x});
set = false;
} else {
temp = x;
set = true;
}
return *this;
}
};
int main() {
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
auto x = std::accumulate(values.begin(),values.end(),pair_accumulator{}).result;
for (const auto& e : x) {
std::cout << e.first << " " << e.second << "\n";
}
}
Admittedly, whether this is simpler than writing a plain loop is questionable.
If possible, I would try not to transform the vector at all. Instead of accessing result[i].first you can just as well use values[i*2], and values[i*2+1] instead of second. If this is not feasible, the next option is to populate a std::vector<std::pair<int,int>> from the start so you don't have to do the transformation. For the first option, depending on what you need in detail, the following might be a start:
#include <vector>
#include <iostream>
struct view_as_pairs {
std::vector<int>& values;
struct proxy {
std::vector<int>::iterator it;
int& first() { return *it;}
int& second() { return *(it +1); }
};
proxy operator[](size_t index){
return proxy{values.begin() + index*2};
}
size_t size() { return values.size() / 2;}
};
int main() {
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
view_as_pairs v{values};
for (size_t i=0; i < v.size(); ++i){
std::cout << v[i].first() << " " << v[i].second() << "\n";
}
}
TL;DR: Consider if you can avoid the transformation. If you cannot avoid it, it is probably cleanest to write a loop. Standard algorithms help often but not always.
OK, I hinted in the comments about using std::adjacent_find, so here is how you would do this.
And yes, many (myself included) consider this a hack, where we are using a tool meant for something else to make short work of solving a seemingly unrelated problem:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
int main()
{
//Test data
std::vector<int> v = {1,2,3,4,5,6,7,8,9,10};
// results
std::vector<std::pair<int,int>> result;
// save flag
bool save_it = true;
// Use std::adjacent_find
std::adjacent_find(v.begin(), v.end(), [&](int n1, int n2)
{ if (save_it) result.push_back({n1,n2}); save_it = !save_it; return false; });
for (auto& pr : result)
std::cout << pr.first << " " << pr.second << "\n";
}
Output:
1 2
3 4
5 6
7 8
9 10
The way it works is we ignore the second, fourth, sixth, etc. pairs, and only save the first, third, fifth, etc. pairs. That's controlled by a boolean flag variable, save_it.
Note that since we want to process all pairs, the std::adjacent_find predicate always returns false. That's the hackish part of this solution.
The solutions so far try to use the std::vector iterators as input to the algorithms directly. How about defining a custom iterator that returns a std::pair and has strides of 2? Creating the vector of pairs is then a one-liner that uses std::copy. The iterator effectively provides a "view" onto the original vector in terms of pairs. This also allows the use of many of the standard algorithms. The following example could also be generalized quite a bit to work with most container iterators, i.e. you do the difficult work of defining such an iterator once and then you can apply it to all sorts of containers and algorithms. Live example: https://godbolt.org/z/ceEsvKhzd
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <iterator> // std::back_inserter
#include <utility> // std::pair
struct pair_iterator {
using difference_type = std::vector<int>::const_iterator::difference_type;
using value_type = std::pair<int, int>;
using pointer = value_type*;
using reference = value_type; // Not a pair&, but that is ok for LegacyIterator
// Can't be forward_iterator_tag because "reference" is not a pair&
using iterator_category = std::input_iterator_tag;
reference operator*()const { return {*base_iter, *(base_iter + 1)}; }
pair_iterator & operator++() { base_iter += 2; return *this; }
pair_iterator operator++(int) { auto ret = *this; ++(*this); return ret; }
friend bool operator==(pair_iterator lhs, pair_iterator rhs){
return lhs.base_iter == rhs.base_iter;
}
friend bool operator!=(pair_iterator lhs, pair_iterator rhs){
return lhs.base_iter != rhs.base_iter;
}
std::vector<int>::const_iterator base_iter{};
};
auto pair_begin(std::vector<int> const & v){ assert(v.size()%2==0); return pair_iterator{v.begin()}; }
auto pair_end(std::vector<int> const & v){ assert(v.size()%2==0); return pair_iterator{v.end()}; }
int main()
{
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
std::vector<std::pair<int, int>> pair_values;
std::copy(pair_begin(values), pair_end(values), std::back_inserter(pair_values));
for (auto const & pair : pair_values) {
std::cout << "{" << pair.first << "," << pair.second << "} ";
}
std::cout << std::endl;
}
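To illustrate the remark that such an iterator can be handed to other standard algorithms as well, here is a small sketch that uses std::count_if instead of std::copy. The iterator is restated in compact form so the block compiles on its own; count_pairs_summing_over is a hypothetical helper invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Compact restatement of the pair_iterator idea from the answer above.
struct pair_iterator {
    using difference_type = std::ptrdiff_t;
    using value_type = std::pair<int, int>;
    using pointer = const value_type*;
    using reference = value_type;  // returned by value, not by reference
    using iterator_category = std::input_iterator_tag;

    std::vector<int>::const_iterator base_iter;

    reference operator*() const { return {*base_iter, *(base_iter + 1)}; }
    pair_iterator& operator++() { base_iter += 2; return *this; }
    pair_iterator operator++(int) { auto ret = *this; ++(*this); return ret; }
    friend bool operator==(pair_iterator a, pair_iterator b) { return a.base_iter == b.base_iter; }
    friend bool operator!=(pair_iterator a, pair_iterator b) { return !(a == b); }
};

// Hypothetical helper: counts the pairs whose two members sum to more than limit.
// Assumption: v.size() is even, as in the question.
inline std::ptrdiff_t count_pairs_summing_over(const std::vector<int>& v, int limit)
{
    assert(v.size() % 2 == 0);
    return std::count_if(pair_iterator{v.begin()}, pair_iterator{v.end()},
                         [limit](const std::pair<int, int>& p) {
                             return p.first + p.second > limit;
                         });
}
```

Because the iterator is only an input iterator, it is limited to single-pass algorithms such as std::copy, std::count_if or std::find_if.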
Following the question in Heterogenous vectors of pointers. How to call functions.
I would like to know how to identify null pointers inside the vector of boost::variant.
Example code:
#include <boost/variant.hpp>
#include <vector>
template< typename T>
class A
{
public:
A(){}
~A(){}
void write();
private:
T data;
};
template< typename T>
void A<T>::write()
{
std::cout << data << std::endl;
}
class myVisitor
: public boost::static_visitor<>
{
public:
template< typename T>
void operator() (A<T>* a) const
{
a->write();
}
};
int main()
{
A<int> one;
A<double> two;
typedef boost::variant<A<int>*, A<double>* > registry;
std::vector<registry> v;
v.push_back(&one);
v.push_back(&two);
A<int>* tst = new A<int>;
for(auto x: v)
{
boost::apply_visitor(myVisitor(), x);
try {delete tst; tst = nullptr;}
catch (...){}
}
}
Since I am deleting the pointer, I would hope that the last one will give me an error or something. How can I check whether an entry in the vector is pointing to nullptr?
Note: this partly ignores the X/Y of this question, based on the tandem question (Heterogenous vectors of pointers. How to call functions)
What you seem to be after is polymorphic collections, but not with a virtual type hierarchy.
This is known as type erasure, and Boost Type Erasure is conveniently wrapped for exactly this use case with Boost PolyCollection.
The type erased variation would probably look like any_collection:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/member.hpp>
namespace pc = boost::poly_collection;
BOOST_TYPE_ERASURE_MEMBER(has_write, write)
using writable = has_write<void()>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
void write() const { std::cout << data << std::endl; }
private:
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
w.write();
}
}
Prints
3.14159
314
Note that insertion order is only preserved within each type: iteration is done type-by-type, which is why the double prints before the int here. This is also what makes PolyCollection much more efficient than "regular" containers that do not optimize allocation sizes or use pointers.
BONUS: Natural printing operator<<
Using classical dynamic polymorphism, this would not work without adding virtual methods, but with Boost TypeErasure ostreamable is a ready-made concept:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/operators.hpp>
namespace pc = boost::poly_collection;
using writable = boost::type_erasure::ostreamable<>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
private:
friend std::ostream& operator<<(std::ostream& os, A const& a) {
return os << a.data;
}
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
std::cout << w << "\n";
}
}
Printing the same as before.
UPDATE
To the comment:
I want to create n A<someType> variables (these are big objects). All of these variables have a write function to write something to a file.
My idea is to collect all the pointers of these variables and at the end loop through the vector to call each write function. Now, it might happen that I want to allocate memory and delete a A<someType> variable. If this happens it should not execute the write function.
This sounds like one of the rare occasions where shared_ptr makes sense, because it allows you to observe the object's lifetime using weak_ptr.
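Before the larger demonstration, the core mechanism can be sketched in a few lines. This is a minimal sketch under stated assumptions: BigObject and count_alive are hypothetical names standing in for the A<someType> objects and the final "loop through the vector" step from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the "big objects" from the comment.
struct BigObject {
    int payload = 0;
};

// Counts how many of the observed objects still exist.
inline std::size_t count_alive(const std::vector<std::weak_ptr<BigObject>>& observers)
{
    std::size_t alive = 0;
    for (const auto& ref : observers) {
        // lock() yields an empty shared_ptr once the object has been destroyed
        if (ref.lock())
            ++alive;
    }
    return alive;
}
```

The owner keeps the std::shared_ptr, the registry only keeps std::weak_ptr observers, and resetting the owner is what makes the corresponding observer report expired().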
Object Graph Imagined...
Let's invent a node type that can participate in a pretty large object graph, such that you would keep an "index" of pointers to some of its nodes. For this demonstration, I'll make it a tree-structured graph, and we're going to keep References to the leaf nodes:
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
Now, let's add identification to the Node base so we have an arbitrary way to identify nodes to delete (e.g. all nodes with odd ids). In addition, any node can have child nodes, so let's put that in the base node as well:
struct INode {
virtual void write(std::ostream& os) const = 0;
std::vector<Object> children;
size_t id() const { return _id; }
private:
size_t _id = s_idgen++;
};
Now we need some concrete derived node types:
template <typename> struct Node : INode {
void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
Yeah. Imagination is not my strong suit ¯\_(ツ)_/¯
Generate Random Data
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);
Object generate_object_graph(Object node, unsigned max_depth = 10) {
std::array<std::function<Object()>, 3> factories = {
[] { return std::make_shared<Banana>(); },
[] { return std::make_shared<Pear>(); },
[] { return std::make_shared<Bicycle>(); },
};
for(auto n = s_num_children(s_prng); max_depth && n--;) {
auto pick = factories.at(s_prng() % factories.size());
node->children.push_back(generate_object_graph(pick(), max_depth - 1));
}
return node;
}
Nothing fancy. Just a randomly generated tree with a max_depth and random distribution of node types.
write to Pretty-Print
Let's add some logic to display any object graph with indentation:
// for demo output
#include <boost/core/demangle.hpp>
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
if (not children.empty()) {
for (auto& ch : children) {
ch->write(os << linebreak << "- " << indent);
os << unindent;
}
os << linebreak;
}
os << "}";
}
To keep track of the indentation level I'll define these indent/unindent
manipulators modifying some custom state inside the stream object:
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os) { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
That should do.
Getting Leaf Nodes
Leaf nodes are the nodes without any children.
This is a depth-first tree visitor taking any output iterator:
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
if (tree) {
if (tree->children.empty()) {
*out++ = tree; // that's a leaf node!
} else {
for (auto& ch : tree->children) {
out = get_leaf_nodes(ch, out); // keep the advanced output iterator
}
}
}
return out;
}
Removing some nodes:
Yet another depth-first visitor:
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
size_t n = 0;
if (!tree)
return n;
auto& c = tree->children;
// depth first
for (auto& child : c)
n += remove_nodes_if(child, predicate);
auto e = std::remove_if(begin(c), end(c), predicate);
n += std::distance(e, end(c));
c.erase(e, end(c));
return n;
}
DEMO TIME
Tying it all together, we can print a randomly generated graph:
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
This puts all its leaf node References in a container:
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
Which we can print using their write() methods:
std::cout << "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
Of course all the leafs are still alive. But we can change that! We will remove one in 5 nodes by id:
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
The reported number of leaf nodes would still seem the same. That's... not
what you wanted. Here's where your question comes in: how do we detect the
nodes that were deleted?
leafs.remove_if(std::mem_fn(&Reference::expired));
std::cout << "\nLive leafs: " << leafs.size();
Now the count will accurately reflect the number of leaf nodes remaining.
Live On Coliru
#include <memory>
#include <vector>
#include <ostream>
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
static size_t s_idgen = 0;
struct INode {
virtual void write(std::ostream& os) const = 0;
std::vector<Object> children;
size_t id() const { return _id; }
private:
size_t _id = s_idgen++;
};
template <typename> struct Node : INode {
void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
// for demo output
#include <boost/core/demangle.hpp>
#include <iostream>
#include <iomanip>
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os) { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
if (not children.empty()) {
for (auto& ch : children) {
ch->write(os << linebreak << "- " << indent);
os << unindent;
}
os << linebreak;
}
os << "}";
}
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);
Object generate_object_graph(Object node, unsigned max_depth = 10) {
std::array<std::function<Object()>, 3> factories = {
[] { return std::make_shared<Banana>(); },
[] { return std::make_shared<Pear>(); },
[] { return std::make_shared<Bicycle>(); },
};
for(auto n = s_num_children(s_prng); max_depth && n--;) {
auto pick = factories.at(s_prng() % factories.size());
node->children.push_back(generate_object_graph(pick(), max_depth - 1));
}
return node;
}
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
if (tree) {
if (tree->children.empty()) {
*out++ = tree;
} else {
for (auto& ch : tree->children) {
out = get_leaf_nodes(ch, out); // keep the advanced output iterator
}
}
}
return out;
}
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
size_t n = 0;
if (!tree)
return n;
auto& c = tree->children;
// depth first
for (auto& child : c)
n += remove_nodes_if(child, predicate);
auto e = std::remove_if(begin(c), end(c), predicate);
n += std::distance(e, end(c));
c.erase(e, end(c));
return n;
}
#include <list>
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
std::cout << "\n------------"
<< "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
// some of them are not alive, see which are gone ("detecting the null pointers")
leafs.remove_if(std::mem_fn(&Reference::expired));
std::cout << "\nLive leafs: " << leafs.size();
}
Prints e.g.
root_tag*(id:0) {
- bicycle_tag*(id:1) {}
- bicycle_tag*(id:2) {
- pear_tag*(id:3) {}
}
- bicycle_tag*(id:4) {
- bicycle_tag*(id:5) {}
- bicycle_tag*(id:6) {}
}
}
------------
Leafs: 4 bicycle_tag*(id:1) {} pear_tag*(id:3) {} bicycle_tag*(id:5) {} bicycle_tag*(id:6) {}
Removed 1 2mod5 nodes from graph
(Stale?) Leafs: 4
Live leafs: 3
Or see the COLIRU link for a much larger sample.
I am trying to solve a question on leetcode which is finding the top k frequent elements. I think my code is correct but the output for a test case is failing.
Input: [ 4,1,-1,2,-1,2,3]
K=2
My answer comes out as {1,-1}, but the expected answer is {-1,2}. I am not sure where I am going wrong.
struct myComp{
constexpr bool operator()(pair<int,int> & a,pair<int,int> &b)
const noexcept
{
if(a.second==b.second)
{
return a.first<b.first;
}
return a.second<b.second;
}
};
class Solution {
public:
vector<int> topKFrequent(vector<int>& nums, int k) {
unordered_map<int,int> mp;
for(int i=0;i<nums.size();i++)
{
mp[nums[i]]++;
}
priority_queue<pair<int,int>,vector<pair<int,int>>,myComp> minheap;
for(auto x:mp)
{
minheap.push(make_pair(x.second,x.first));
if(minheap.size()>k)
{
minheap.pop();
}
}
vector<int> x;
while(minheap.size()>0)
{
x.push_back(minheap.top().second);
minheap.pop();
}
return x;
}
};
link: https://leetcode.com/problems/top-k-frequent-elements
In the minheap, pairs of <frequency, element> are being pushed. Since we want to sort these pairs on basis of frequency, we need to compare on the basis of frequency only.
Let's say there are two pairs a and b. Then for normal sorting, the comparison would be :
a.first < b.first;
And for reverse sorting, the comparison would be :
a.first > b.first;
In case of min-heap, we need reverse sorting. Hence, the following comparator makes your code pass all the test cases :
struct myComp
{
constexpr bool operator()(const pair<int,int> & a, const pair<int,int> & b)
const noexcept
{
return a.first > b.first;
}
};
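For reference, the whole approach can be sketched as a free function. This is only one possible sketch, not the accepted LeetCode solution: top_k_frequent is a hypothetical name, and std::greater is swapped in for the hand-written comparator, so ties in frequency are broken by the element value instead of being left unordered.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch: keep a min-heap of at most k (frequency, element) pairs.
inline std::vector<int> top_k_frequent(const std::vector<int>& nums, std::size_t k)
{
    std::unordered_map<int, int> counts;
    for (int n : nums) ++counts[n];

    using Entry = std::pair<int, int>;  // (frequency, element)
    // std::greater turns the default max-heap into a min-heap, so top()
    // is the entry with the smallest frequency.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : counts) {
        heap.push(Entry(kv.second, kv.first));
        if (heap.size() > k) heap.pop();  // evict the least frequent entry
    }

    std::vector<int> result;
    while (!heap.empty()) {
        result.push_back(heap.top().second);
        heap.pop();
    }
    return result;  // least frequent of the retained entries comes first
}
```

With a real min-heap, popping whenever the size exceeds k is safe, because the entry that gets evicted is always the least frequent one seen so far.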
There are several issues with your code.
Obviously there is a using namespace std; somewhere in your code. This should be avoided. You will find many posts here on SO explaining why this should not be done.
Then we need to qualify all elements from the std library with std::, which makes the scope very clear.
Next: You do not need your own sorting function. Since you insert the elements of the pair in swapped order into the std::priority_queue, the sorting criterion has to apply to the counter part (now "first"), not to the key value. So, your sorting function was wrong anyway, because it was sorting according to "second" and not "first". But if the standard ordering suffices, we do not need a special comparison function. A std::pair has a less-than operator. So, the definition can simply be:
std::priority_queue<std::pair<int, int>> minheap;
Then, your if statement
if(minheap.size()>k)
{
minheap.pop();
}
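is wrong for this default max-heap: pop() removes the current largest element, so the k values that remain at the end are not necessarily the biggest ones. (With a genuine min-heap, popping whenever the size exceeds k is fine.) So, here you need to insert all values from the std::unordered_map first; the queue keeps them ordered, and you then take the top k.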
With some cosmetic changes the code will look like the below:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <vector>
#include <queue>
std::vector<int> topKFrequent(std::vector<int>& nums, size_t k) {
std::unordered_map<int, int> mp;
for (size_t i = 0; i < nums.size(); i++)
{
mp[nums[i]]++;
}
std::priority_queue<std::pair<int, int>> minheap;
for (auto x : mp)
{
minheap.push(std::make_pair(x.second, x.first));
}
std::vector<int> x;
for (size_t i{}; i < k && !minheap.empty(); ++i) // stop early if there are fewer than k distinct values
{
x.push_back(minheap.top().second);
minheap.pop();
}
return x;
}
int main() {
std::vector data{ 4,1,-1,2,-1,2,3 };
std::vector result = topKFrequent(data, 2);
for (const int i : result) std::cout << i << ' '; std::cout << '\n';
return 0;
}
An additional solution
#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <utility>
auto topKFrequent(std::vector<int>& nums, size_t k) {
// Count occurrences
std::unordered_map<int, size_t> counter{};
for (const int& i : nums) counter[i]++;
// For storing the top k
std::vector<std::pair<int, size_t>> top(k);
// Get top k
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const std::pair<int, size_t >& p1, const std::pair<int, size_t>& p2) { return p1.second > p2.second; });
return top;
}
// Test code
int main() {
std::vector data{ 4,1,-1,2,-1,2,3 };
for (const auto& p : topKFrequent(data, 2))
std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
return 0;
}
And of course, we do have also the universal solution for any kind of iterable container. Including the definition for type traits using SFINAE and checking for the correct template parameter.
#include <iostream>
#include <utility>
#include <unordered_map>
#include <algorithm>
#include <vector>
#include <iterator>
#include <type_traits>
#include <string>
// Helper for type trait We want to identify an iterable container ----------------------------------------------------
template <typename Container>
auto isIterableHelper(int) -> decltype (
std::begin(std::declval<Container&>()) != std::end(std::declval<Container&>()), // begin/end and operator !=
++std::declval<decltype(std::begin(std::declval<Container&>()))&>(), // operator ++
void(*std::begin(std::declval<Container&>())), // operator*
void(), // Handle potential operator ,
std::true_type{});
template <typename T>
std::false_type isIterableHelper(...);
// The type trait -----------------------------------------------------------------------------------------------------
template <typename Container>
using is_iterable = decltype(isIterableHelper<Container>(0));
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::decay_t<decltype(*std::begin(std::declval<Container&>()))>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
// Function to get the k most frequent elements used in any Container ------------------------------------------------
template <class Container>
auto topKFrequent(const Container& data, size_t k) {
if constexpr (is_iterable<Container>::value) {
// Count all occurrences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d]++;
// For storing the top k
std::vector<Pair<Container>> top(k);
// Get top k
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const auto& p1, const auto& p2) { return p1.second > p2.second; }); // generic: the key type is not always int
return top;
}
else
return data;
}
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
for (const auto& p : topKFrequent(testVector, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
for (const auto& p : topKFrequent(cStyleArray, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
std::string s{"abbcccddddeeeeeffffffggggggg"};
for (const auto& p : topKFrequent(s, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double value = 12.34;
std::cout << topKFrequent(value,2) << "\n";
return 0;
}
Developed and tested with Microsoft Visual Studio Community 2019, Version 16.8.2.
Additionally compiled and tested with clang11.0 and gcc10.2
Language: C++17
Here is my code:
#include <functional>
#include <iostream>
#include<vector>
using namespace std;
// vector iterator
template <class T> class vit
{
private:
//vector<T>::iterator it;
vector<T> m_v;
function<bool (T, T)> m_fptr;
int len, pos;
public:
vit(vector<T> &v) { this->m_v = v; len = v.size(); pos = 0;};
// it= v.begin(); };
bool next(T &i) {
//if(it == m_v.end()) return false;
if(pos==len) return false;
//i = *it;
i = m_v[pos];
//if(idle) { idle = false ; return true; }
//it++;
pos++;
return true;};
//bool idle = true;
void set_same(function<bool (T,T)> fptr) { m_fptr = fptr ;};
//void set_same(function<bool(int, int)> fun) { return ; }
bool grp_begin() {
return pos == 0 || ! m_fptr(m_v[pos], m_v[pos-1]); };
bool grp_end() {
return pos == len || ! m_fptr(m_v[pos], m_v[pos+1]); };
};
bool is_same(int a, int b) { return a == b; }
int main()
{
vector<int> v ={ 1, 1, 2, 2, 2, 3, 1, 1, 1 };
int total;
for(auto it = v.begin(); it != v.end(); it++) {
if(it == v.begin() || *it != *(it-1)) {
total = 0;
}
total += *it;
if(it+1 == v.end() || *it != *(it+1)) {
cout << total << endl;
}
}
cout << "let's gry a group" <<endl;
vit<int> g(v);
int i;
while(g.next(i)) { cout << i << endl; }
cout << "now let's get really fancy" << endl;
vit<int> a_vit(v);
//auto is_same = [](int a, int b) { return a == b; };
a_vit.set_same(is_same);
//int total;
while(a_vit.next(i)) {
if(a_vit.grp_begin()) total = 0;
total += i;
if(a_vit.grp_end()) cout << total << endl ;
}
}
When I compile it with g++ -std=c++11 iter.cc -o iter, I get the result:
iter.cc: In function 'int main()':
iter.cc:63:17: error: reference to 'is_same' is ambiguous
a_vit.set_same(is_same);
^
iter.cc:37:6: note: candidates are: bool is_same(int, int)
bool is_same(int a, int b) { return a == b; }
^
In file included from /usr/include/c++/5.3.0/bits/move.h:57:0,
from /usr/include/c++/5.3.0/bits/stl_pair.h:59,
from /usr/include/c++/5.3.0/utility:70,
from /usr/include/c++/5.3.0/tuple:38,
from /usr/include/c++/5.3.0/functional:55,
from iter.cc:1:
/usr/include/c++/5.3.0/type_traits:958:12: note: template<class, class> struct std::is_same
struct is_same;
^
By way of explanation, I have created a class called 'vit'. It does two things: iterate over a vector, and determine if a new group has been reached.
The class function 'set_same' is supposed to store a function provided by the calling class to determine if two adjacent elements of a vector are in the same group. However, I can't seem to store the function in the class for future use by grp_begin() and grp_end() on account of the ostensible ambiguity of is_same.
What gives?
There is an is_same function defined by you and there is a struct is_same defined by the C++ Standard Library. Since you are using namespace std, your compiler doesn't know which is_same you meant to use.
It's what the error says: it's not clear whether you mean your is_same (in the global namespace) or the class template is_same (in namespace std).
You may disambiguate as follows:
::is_same
… with the leading :: meaning "in the global namespace".
Though you should consider putting your code in a namespace of its own.
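Both suggestions can be sketched side by side. The using-directive is kept here only to reproduce the clash, and grouping is a hypothetical namespace name chosen for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

using namespace std;  // kept only to reproduce the clash; best avoided

namespace grouping {  // hypothetical namespace name
inline bool is_same(int a, int b) { return a == b; }
}

// A global function that collides with std::is_same under the using-directive.
inline bool is_same(int a, int b) { return a == b; }

inline bool demo()
{
    // function<bool(int, int)> f = is_same;  // error: reference to 'is_same' is ambiguous
    function<bool(int, int)> from_global = ::is_same;      // explicitly the global one
    function<bool(int, int)> from_ns = grouping::is_same;  // explicitly the namespaced one
    return from_global(1, 1) && !from_ns(1, 2);
}
```

Qualified lookup with a leading :: finds the global function first and never consults the using-directive, which is why the first form compiles. The namespaced version stays unambiguous even if the global function is removed.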
Thanks guys. This is my first time touching C++ after more than a decade. I have cleaned up the code, and used a lambda to bring the "is_same" function closer to where it is called.
Did you spot the bug in my code? 'pos' was off-by-one when calling grp_begin() and grp_end(). Here is the revised code:
#include <functional>
#include <iostream>
#include <vector>
// vector iterator
template <class T> class vit
{
private:
std::vector<T> m_v;
std::function<bool (T, T)> m_fptr;
int len, pos;
public:
vit(std::vector<T> &v) { m_v = v; len = v.size(); pos = -1;};
bool next(T &val) {
pos++;
if(pos==len) return false;
val = m_v[pos];
return true;};
void set_same(std::function<bool (T,T)> fptr) { m_fptr = fptr ;};
bool grp_begin() {
return pos == 0 || ! m_fptr(m_v[pos], m_v[pos-1]); };
bool grp_end() {
return pos+1 == len || ! m_fptr(m_v[pos], m_v[pos+1]); };
};
int main()
{
std::vector<int> v ={ 1, 1, 2, 2, 2, 3, 1, 1, 1 };
vit<int> a_vit(v);
std::function<bool (int, int)> is_same = [](int a, int b) { return a == b; };
a_vit.set_same(is_same);
int i, total;
while(a_vit.next(i)) {
if(a_vit.grp_begin()) total = 0;
total += i;
if(a_vit.grp_end()) std::cout << total << std::endl ;
}
}
My class definition isn't bullet-proof and could be better: if the user forgets to call set_same, for example, grp_begin() and grp_end() will invoke an empty std::function, which throws std::bad_function_call.
Nevertheless, I'm pretty chuffed with my solution so far. The class caller is relieved of all the bookkeeping related to iterating over the vector and working out whether a group boundary has been crossed.
The calling code looks very compact and intuitive to me. I can see C++ being my go-to language.
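The missing guard for an unset comparison could be sketched as follows. Both function names are hypothetical, and falling back to operator== is just an assumed default; throwing a more descriptive exception would be an equally reasonable choice.

```cpp
#include <cassert>
#include <functional>

// Hypothetical guard a class like vit could apply before using its stored
// comparison: falls back to operator== when nothing was set.
template <class T>
bool same_group(const std::function<bool(T, T)>& pred, const T& a, const T& b)
{
    if (!pred)          // empty std::function: set_same was never called
        return a == b;  // assumed default
    return pred(a, b);
}

// Shows what happens without the guard: calling an empty std::function throws.
inline bool empty_call_throws()
{
    std::function<bool(int, int)> unset;
    try {
        unset(1, 2);
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```

A std::function converts to false in a boolean context while it is empty, so the check costs one branch per call.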
Are there any C++ transformations which are similar to itertools.groupby()?
Of course I could easily write my own, but I'd prefer to leverage the idiomatic behavior or compose one from the features provided by the STL or boost.
#include <cstdlib>
#include <map>
#include <algorithm>
#include <string>
#include <vector>
struct foo
{
int x;
std::string y;
float z;
};
bool lt_by_x(const foo &a, const foo &b)
{
return a.x < b.x;
}
void list_by_x(const std::vector<foo> &foos, std::map<int, std::vector<foo> > &foos_by_x)
{
/* ideas..? */
}
int main(int argc, const char *argv[])
{
std::vector<foo> foos;
std::map<int, std::vector<foo> > foos_by_x;
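// Sort a copy, so that the vector passed on is the one that was actually sorted.
std::vector<foo> sorted_foos = foos;
std::sort(sorted_foos.begin(), sorted_foos.end(), lt_by_x);
list_by_x(sorted_foos, foos_by_x);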
return EXIT_SUCCESS;
}
This doesn't really answer your question, but for the fun of it, I implemented a group_by iterator. Maybe someone will find it useful:
#include <assert.h>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>
using std::cout;
using std::cerr;
using std::multiset;
using std::ostringstream;
using std::pair;
using std::vector;
struct Foo
{
int x;
std::string y;
float z;
};
struct FooX {
typedef int value_type;
value_type operator()(const Foo &f) const { return f.x; }
};
template <typename Iterator,typename KeyFunc>
struct GroupBy {
typedef typename KeyFunc::value_type KeyValue;
struct Range {
Range(Iterator begin,Iterator end)
: iter_pair(begin,end)
{
}
Iterator begin() const { return iter_pair.first; }
Iterator end() const { return iter_pair.second; }
private:
pair<Iterator,Iterator> iter_pair;
};
struct Group {
KeyValue value;
Range range;
Group(KeyValue value,Range range)
: value(value), range(range)
{
}
};
struct GroupIterator {
typedef Group value_type;
GroupIterator(Iterator iter,Iterator end,KeyFunc key_func)
: range_begin(iter), range_end(iter), end(end), key_func(key_func)
{
advance_range_end();
}
bool operator==(const GroupIterator &that) const
{
return range_begin==that.range_begin;
}
bool operator!=(const GroupIterator &that) const
{
return !(*this==that);
}
GroupIterator &operator++()
{
range_begin = range_end;
advance_range_end();
return *this;
}
value_type operator*() const
{
return value_type(key_func(*range_begin),Range(range_begin,range_end));
}
private:
void advance_range_end()
{
if (range_end!=end) {
typename KeyFunc::value_type value = key_func(*range_end++);
while (range_end!=end && key_func(*range_end)==value) {
++range_end;
}
}
}
Iterator range_begin;
Iterator range_end;
Iterator end;
KeyFunc key_func;
};
GroupBy(Iterator begin_iter,Iterator end_iter,KeyFunc key_func)
: begin_iter(begin_iter),
end_iter(end_iter),
key_func(key_func)
{
}
GroupIterator begin() { return GroupIterator(begin_iter,end_iter,key_func); }
GroupIterator end() { return GroupIterator(end_iter,end_iter,key_func); }
private:
Iterator begin_iter;
Iterator end_iter;
KeyFunc key_func;
};
template <typename Iterator,typename KeyFunc>
inline GroupBy<Iterator,KeyFunc>
group_by(
Iterator begin,
Iterator end,
const KeyFunc &key_func = KeyFunc()
)
{
return GroupBy<Iterator,KeyFunc>(begin,end,key_func);
}
static void test()
{
vector<Foo> foos;
foos.push_back({5,"bill",2.1});
foos.push_back({5,"rick",3.7});
foos.push_back({3,"tom",2.5});
foos.push_back({7,"joe",3.4});
foos.push_back({5,"bob",7.2});
ostringstream out;
for (auto group : group_by(foos.begin(),foos.end(),FooX())) {
out << group.value << ":";
for (auto elem : group.range) {
out << " " << elem.y;
}
out << "\n";
}
assert(out.str()==
"5: bill rick\n"
"3: tom\n"
"7: joe\n"
"5: bob\n"
);
}
int main(int argc,char **argv)
{
test();
return 0;
}
Eric Niebler's ranges library provides a group_by view.
According to the docs, it is a header-only library and can be included easily.
It is intended for eventual inclusion in the C++ standard, but it can already be used with a recent C++11 compiler.
Minimal working example:
#include <map>
#include <vector>
#include <range/v3/all.hpp>
using namespace std;
using namespace ranges;
int main(int argc, char **argv) {
vector<int> l { 0,1,2,3,6,5,4,7,8,9 };
ranges::v3::sort(l);
auto x = l | view::group_by([](int x, int y) { return x / 5 == y / 5; });
map<int, vector<int>> res;
auto i = x.begin();
auto e = x.end();
for (;i != e; ++i) {
auto first = *((*i).begin());
res[first / 5] = to_vector(*i);
}
// res = { 0 : [0,1,2,3,4], 1: [5,6,7,8,9] }
}
(I compiled this with clang 3.9.0 and --std=c++11.)
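If pulling in the ranges library is not an option, the same kind of consecutive grouping can be sketched with the standard library alone. This is not how range-v3 implements its view; group_runs is a name made up for this example, and it copies each run into its own vector instead of producing lazy views. It compares every element with the first element of its run, which for an equivalence relation such as x / 5 == y / 5 gives the same result as comparing neighbours.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: splits v into consecutive runs.
// An element stays in the current run while pred(first_of_run, element) is true.
template <typename T, typename Pred>
std::vector<std::vector<T>> group_runs(const std::vector<T>& v, Pred pred) {
    std::vector<std::vector<T>> groups;
    auto first = v.begin();
    while (first != v.end()) {
        // Find the first element that no longer belongs to the run starting at `first`.
        auto last = std::find_if_not(std::next(first), v.end(),
                                     [&](const T& x) { return pred(*first, x); });
        groups.emplace_back(first, last);
        first = last;
    }
    return groups;
}
```

As with the view above, the input needs to be sorted (or at least clustered by key) first; otherwise equal keys that are not adjacent end up in separate runs.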
I recently discovered cppitertools.
It fulfills this need exactly as described.
https://github.com/ryanhaining/cppitertools#groupby
What is the point of bloating the standard C++ library with an algorithm that is one line of code?
for (const auto & foo : foos) foos_by_x[foo.x].push_back(foo);
Also, take a look at std::multimap, it might be just what you need.
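A rough sketch of the std::multimap variant, in case it helps. The function names index_by_x and names_for are invented for this example; foo is the struct from the question.

```cpp
#include <map>
#include <string>
#include <vector>

struct foo {
    int x;
    std::string y;
    float z;
};

// Index every foo by its x; elements with equal keys sit next to each other in the multimap.
inline std::multimap<int, foo> index_by_x(const std::vector<foo>& foos) {
    std::multimap<int, foo> by_x;
    for (const auto& f : foos) by_x.emplace(f.x, f);
    return by_x;
}

// Collect the y fields of one group; equal_range returns the [first, last) span for that key.
inline std::vector<std::string> names_for(const std::multimap<int, foo>& by_x, int x) {
    std::vector<std::string> names;
    auto range = by_x.equal_range(x);
    for (auto it = range.first; it != range.second; ++it) names.push_back(it->second.y);
    return names;
}
```

Compared with std::map<int, std::vector<foo>>, there is no nested container to manage, but each group has to be looked up with equal_range rather than read as a ready-made vector.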
UPDATE:
The one-liner I have provided is not well-optimized for the case when your vector is already sorted. A number of map lookups can be saved if we remember the iterator of the previously inserted object together with its key, and do a lookup only when the key of the next object differs. For example:
#include <map>
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
struct foo {
int x;
std::string y;
float z;
};
class optimized_inserter {
public:
typedef std::map<int, std::vector<foo> > map_type;
optimized_inserter(map_type & map) : map(&map), it(map.end()) {}
void operator()(const foo & obj) {
typedef map_type::value_type value_type;
if (it != map->end() && last_x == obj.x) {
it->second.push_back(obj);
return;
}
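last_x = obj.x;
// insert() leaves an existing entry untouched, so create (or find) the entry first and append separately.
it = map->insert(value_type(obj.x, std::vector<foo>())).first;
it->second.push_back(obj);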
}
private:
map_type *map;
map_type::iterator it;
int last_x;
};
int main()
{
std::vector<foo> foos;
std::map<int, std::vector<foo>> foos_by_x;
foos.push_back({ 1, "one", 1.0 });
foos.push_back({ 3, "third", 2.5 });
foos.push_back({ 1, "one.. but third", 1.5 });
foos.push_back({ 2, "second", 1.8 });
foos.push_back({ 1, "one.. but second", 1.5 });
std::sort(foos.begin(), foos.end(), [](const foo & lhs, const foo & rhs) {
return lhs.x < rhs.x;
});
std::for_each(foos.begin(), foos.end(), optimized_inserter(foos_by_x));
for (const auto & p : foos_by_x) {
std::cout << "--- " << p.first << "---\n";
for (auto & f : p.second) {
std::cout << '\t' << f.x << " '" << f.y << "' / " << f.z << '\n';
}
}
}
How about this?
template <typename StructType, typename FieldSelectorUnaryFn>
auto GroupBy(const std::vector<StructType>& instances, const FieldSelectorUnaryFn& fieldChooser)
{
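// Deduce the key type without requiring StructType to be default-constructible (needs <utility> and <type_traits>).
using FieldType = std::decay_t<decltype(fieldChooser(std::declval<const StructType&>()))>;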
std::map<FieldType, std::vector<StructType>> instancesByField;
for (auto& instance : instances)
{
instancesByField[fieldChooser(instance)].push_back(instance);
}
return instancesByField;
}
and use it like this:
auto itemsByX = GroupBy(items, [](const auto& item){ return item.x; });
I wrote a C++ library to address this problem in an elegant way. Given your struct
struct foo
{
int x;
std::string y;
float z;
};
To group by y you simply do:
std::vector<foo> dataframe;
...
auto groups = group_by(dataframe, &foo::y);
You can also group by more than one variable:
auto groups = group_by(dataframe, &foo::y, &foo::x);
And then iterate through the groups normally:
for(auto& [key, group]: groups)
{
// do something
}
It also has other operations such as: subset, concat, and others.
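For readers wondering how grouping by a member pointer can work at all, here is a minimal single-key sketch. It is not the library's implementation or API; group_by_member is a hypothetical name, and the result is a plain std::map holding copies of the elements.

```cpp
#include <map>
#include <string>
#include <vector>

struct foo {
    int x;
    std::string y;
    float z;
};

// Hypothetical helper: groups copies of the rows by the value of one data member.
// T and Field are deduced from the pointer-to-member, e.g. &foo::y gives T = foo, Field = std::string.
template <typename T, typename Field>
std::map<Field, std::vector<T>> group_by_member(const std::vector<T>& rows, Field T::*member) {
    std::map<Field, std::vector<T>> groups;
    for (const auto& row : rows) groups[row.*member].push_back(row);
    return groups;
}
```

The result can be iterated with the same structured-binding loop shown above when compiling as C++17. Grouping by several members at once would need something like a std::tuple key, which this sketch leaves out.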
I would simply use boolinq.h, which includes all of LINQ. No documentation, but very simple to use.