Following the question in Heterogenous vectors of pointers. How to call functions.
I would like to know how to identify null pointers inside a vector of boost::variant.
Example code:
#include <boost/variant.hpp>
#include <iostream>
#include <vector>
template< typename T>
class A
{
public:
A(){}
~A(){}
void write();
private:
T data;
};
template< typename T>
void A<T>::write()
{
std::cout << data << std::endl;
}
class myVisitor
: public boost::static_visitor<>
{
public:
template< typename T>
void operator() (A<T>* a) const
{
a->write();
}
};
int main()
{
A<int> one;
A<double> two;
typedef boost::variant<A<int>*, A<double>* > registry;
std::vector<registry> v;
v.push_back(&one);
v.push_back(&two);
A<int>* tst = new A<int>;
for(auto x: v)
{
boost::apply_visitor(myVisitor(), x);
try {delete tst; tst = nullptr;}
catch (...){}
}
}
Since I am deleting the pointer, I would hope that the last one gives me an error or something. How can I check whether an entry in the vector is pointing to nullptr?
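For what it's worth, checking for null inside the variant only takes a visitor whose call operator accepts any of the pointer types. The sketch below replaces boost::variant/apply_visitor with C++17 std::variant/std::visit. The helper names holds_null and remove_null_entries are hypothetical. The check has a limit: it only catches pointers that were explicitly stored as null. In the code above, `delete tst; tst = nullptr;` changes only the local variable `tst`. The pointers copied into the vector are untouched, and `tst` never pointed to `one` or `two` anyway. A dangling pointer still looks non-null.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical stand-in for the question's A<T>; only the pointer matters here.
template <typename T> struct A { T data{}; };

// Same shape as the question's registry, but with std::variant instead of boost::variant.
using registry = std::variant<A<int>*, A<double>*>;

// True if the pointer currently held by the variant (whichever type it is) is null.
inline bool holds_null(registry const& r) {
    return std::visit([](auto* p) { return p == nullptr; }, r);
}

// Erases every entry that holds a null pointer; returns how many were erased.
inline std::size_t remove_null_entries(std::vector<registry>& v) {
    auto const before = v.size();
    v.erase(std::remove_if(v.begin(), v.end(), holds_null), v.end());
    return before - v.size();
}
```

With boost::variant, the same check can be written as a boost::static_visitor<bool> with a templated operator() taking A<T>*, like the question's myVisitor.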
Note: this partly ignores the X/Y of this question, based on the tandem question (Heterogenous vectors of pointers. How to call functions)
What you seem to be after is polymorphic collections, but not with a virtual type hierarchy.
This is known as type erasure, and Boost Type Erasure is conveniently wrapped for exactly this use case with Boost PolyCollection.
The type erased variation would probably look like any_collection:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/member.hpp>
namespace pc = boost::poly_collection;
BOOST_TYPE_ERASURE_MEMBER(has_write, write)
using writable = has_write<void()>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
void write() const { std::cout << data << std::endl; }
private:
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
w.write();
}
}
Prints
3.14159
314
Note that the insertion order is not preserved: iteration is done type-by-type. This is also what makes PolyCollection much more efficient than "regular" containers, which do not optimize allocation sizes or which store pointers.
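To show what "type erasure" means without Boost, here is a minimal hand-rolled sketch. It is a wrapper that accepts any type with a `write(std::ostream&) const` member. The names AnyWritable and the stream-taking write() are assumptions made for illustration. They are not part of Boost.TypeErasure or PolyCollection. Unlike any_collection, each element here lives in its own heap allocation, and iteration follows insertion order.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <utility>
#include <vector>

// Hand-rolled type erasure: holds any T providing `void write(std::ostream&) const`.
class AnyWritable {
    struct Concept {                                      // the erased interface
        virtual ~Concept() = default;
        virtual void write(std::ostream& os) const = 0;
    };
    template <typename T> struct Model final : Concept {  // adapts one concrete T
        T value;
        explicit Model(T v) : value(std::move(v)) {}
        void write(std::ostream& os) const override { value.write(os); }
    };
    std::unique_ptr<Concept> self_;

public:
    // Disabled for AnyWritable itself so it cannot hijack move construction.
    template <typename T,
              typename = std::enable_if_t<!std::is_same<std::decay_t<T>, AnyWritable>::value>>
    AnyWritable(T value) : self_(std::make_unique<Model<T>>(std::move(value))) {}

    void write(std::ostream& os) const { self_->write(os); }
};

// Hypothetical variant of the answer's A<T> that writes to a caller-supplied stream.
template <typename T> class A {
public:
    explicit A(T value = 0) : data(value) {}
    void write(std::ostream& os) const { os << data << '\n'; }
private:
    T data;
};
```

Usage: `std::vector<AnyWritable> registry; registry.emplace_back(A<int>(314));` and then call `w.write(std::cout)` for each element.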
BONUS: Natural printing operator<<
Using classical dynamic polymorphism, this would not work without adding virtual methods, but with Boost.TypeErasure, ostreamable is a ready-made concept:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/operators.hpp>
namespace pc = boost::poly_collection;
using writable = boost::type_erasure::ostreamable<>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
private:
friend std::ostream& operator<<(std::ostream& os, A const& a) {
return os << a.data;
}
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
std::cout << w << "\n";
}
}
Printing the same as before.
UPDATE
To the comment:
I want to create n A<someType> variables (these are big objects). All of these variables have a write function to write something to a file.
My idea is to collect all the pointers of these variables and at the end loop through the vector to call each write function. Now, it might happen that I want to allocate memory and delete a A<someType> variable. If this happens it should not execute the write function.
This sounds like one of the rare occasions where shared_ptr makes sense, because it allows you to observe the object's lifetime using weak_ptr.
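A minimal sketch of that idea follows. The owners hold std::shared_ptr and the write list holds std::weak_ptr. At write time, lock() tells you whether the object still exists. The names Big and write_alive are hypothetical stand-ins for the commenter's A<someType> objects and their final loop.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for a big A<someType> object.
struct Big {
    int id;
    void write(std::ostream& os) const { os << "big#" << id << '\n'; }
};

// Drops references whose objects are already destroyed, then writes the survivors.
inline void write_alive(std::vector<std::weak_ptr<Big>>& refs, std::ostream& os) {
    refs.erase(std::remove_if(refs.begin(), refs.end(),
                              [](std::weak_ptr<Big> const& r) { return r.expired(); }),
               refs.end());
    for (auto const& r : refs)
        if (auto alive = r.lock())  // empty shared_ptr if the object is gone
            alive->write(os);
}
```

Locking, rather than only checking expired(), also keeps the object alive for the duration of the write() call.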
Object Graph Imagined...
Let's invent a node type that can participate in a pretty large object graph, such that you would keep an "index" of pointers to some of its nodes. For this demonstration, I'll make it a tree-structured graph, and we're going to keep References to the leaf nodes:
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
Now, let's add identification to the Node base so we have an arbitrary way to identify nodes to delete (e.g. all nodes with odd ids). In addition, any node can have child nodes, so let's put that in the base node as well:
static size_t s_idgen = 0; // source of unique node ids
struct INode {
virtual void write(std::ostream& os) const = 0;
std::vector<Object> children;
size_t id() const { return _id; }
private:
size_t _id = s_idgen++;
};
Now we need some concrete derived node types:
template <typename> struct Node : INode {
void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
Yeah. Imagination is not my strong suit ¯\_(ツ)_/¯
Generate Random Data
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);
Object generate_object_graph(Object node, unsigned max_depth = 10) {
std::array<std::function<Object()>, 3> factories = {
[] { return std::make_shared<Banana>(); },
[] { return std::make_shared<Pear>(); },
[] { return std::make_shared<Bicycle>(); },
};
for(auto n = s_num_children(s_prng); max_depth && n--;) {
auto pick = factories.at(s_prng() % factories.size());
node->children.push_back(generate_object_graph(pick(), max_depth - 1));
}
return node;
}
Nothing fancy. Just a randomly generated tree with a max_depth and random distribution of node types.
write to Pretty-Print
Let's add some logic to display any object graph with indentation:
// for demo output
#include <boost/core/demangle.hpp>
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
if (not children.empty()) {
for (auto& ch : children) {
ch->write(os << linebreak << "- " << indent);
os << unindent;
}
os << linebreak;
}
os << "}";
}
To keep track of the indentation level, I'll define these indent/unindent manipulators, which modify some custom state inside the stream object:
#include <iomanip> // std::setw
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os) { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
That should do.
Getting Leaf Nodes
Leaf nodes are the nodes without any children.
This is a depth-first tree visitor taking any output iterator:
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
if (tree) {
if (tree->children.empty()) {
*out++ = tree; // that's a leaf node!
} else {
for (auto& ch : tree->children) {
out = get_leaf_nodes(ch, out); // keep the advanced iterator
}
}
}
return out;
}
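Because the visitor returns the advanced output iterator, each recursive call has to feed its result back into `out`. Otherwise an iterator with value semantics, such as a raw pointer, would overwrite earlier leaves. Here is a minimal sketch on a hypothetical integer tree (Tree and collect_leaves are illustrative names) that writes into a plain array:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal tree: each node has a value and owned children.
struct Tree {
    int value;
    std::vector<std::shared_ptr<Tree>> children;
};

// Depth-first leaf collection that threads the output iterator through the recursion.
template <typename Out>
Out collect_leaves(std::shared_ptr<Tree> const& t, Out out) {
    if (!t) return out;
    if (t->children.empty()) {
        *out++ = t->value;                 // a leaf: emit it
    } else {
        for (auto const& ch : t->children)
            out = collect_leaves(ch, out); // continue where the subtree stopped
    }
    return out;
}
```

With std::back_inserter the difference is invisible, because copies of a back_insert_iterator all append to the same container.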
Removing some nodes:
Yet another depth-first visitor:
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
size_t n = 0;
if (!tree)
return n;
auto& c = tree->children;
// depth first
for (auto& child : c)
n += remove_nodes_if(child, predicate);
auto e = std::remove_if(begin(c), end(c), predicate);
n += std::distance(e, end(c));
c.erase(e, end(c));
return n;
}
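Why do leaves vanish even when they themselves don't match the predicate? Erasing a child from its parent's vector drops a shared_ptr. If that was the last owner, the child is destroyed, and it takes its whole subtree with it. A minimal sketch with hypothetical names (Node, remove_children_if) observes this through a weak_ptr:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

struct Node {
    int id;
    std::vector<std::shared_ptr<Node>> children;
};

// Erase-remove on one parent's child list; returns the number of children erased.
// Releasing the last shared_ptr to a child destroys it and, recursively, its subtree.
template <typename Pred>
std::size_t remove_children_if(Node& parent, Pred pred) {
    auto& c = parent.children;
    auto e = std::remove_if(c.begin(), c.end(), pred);
    auto n = static_cast<std::size_t>(std::distance(e, c.end()));
    c.erase(e, c.end());
    return n;
}
```

This matches the sample output further down, where removing node 2 also kills its leaf, node 3.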
DEMO TIME
Tying it all together, we can print a randomly generated graph:
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
This puts all its leaf node References in a container:
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
Which we can print using their write() methods:
std::cout << "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
Of course, all the leaves are still alive. But we can change that! We will remove one in five nodes by id:
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
The reported number of leaf nodes would still seem the same. That's... not what you wanted. Here's where your question comes in: how do we detect the nodes that were deleted?
leafs.remove_if([](Reference const& ref) { return ref.expired(); });
std::cout << "\nLive leafs: " << leafs.size();
Now the count will accurately reflect the number of leaf nodes remaining.
Live On Coliru
#include <algorithm>
#include <iterator>
#include <memory>
#include <ostream>
#include <typeinfo>
#include <vector>
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
static size_t s_idgen = 0;
struct INode {
virtual void write(std::ostream& os) const = 0;
std::vector<Object> children;
size_t id() const { return _id; }
private:
size_t _id = s_idgen++;
};
template <typename> struct Node : INode {
void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
// for demo output
#include <boost/core/demangle.hpp>
#include <iostream>
#include <iomanip>
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os) { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
if (not children.empty()) {
for (auto& ch : children) {
ch->write(os << linebreak << "- " << indent);
os << unindent;
}
os << linebreak;
}
os << "}";
}
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);
Object generate_object_graph(Object node, unsigned max_depth = 10) {
std::array<std::function<Object()>, 3> factories = {
[] { return std::make_shared<Banana>(); },
[] { return std::make_shared<Pear>(); },
[] { return std::make_shared<Bicycle>(); },
};
for(auto n = s_num_children(s_prng); max_depth && n--;) {
auto pick = factories.at(s_prng() % factories.size());
node->children.push_back(generate_object_graph(pick(), max_depth - 1));
}
return node;
}
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
if (tree) {
if (tree->children.empty()) {
*out++ = tree;
} else {
for (auto& ch : tree->children) {
out = get_leaf_nodes(ch, out); // keep the advanced iterator
}
}
}
return out;
}
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
size_t n = 0;
if (!tree)
return n;
auto& c = tree->children;
// depth first
for (auto& child : c)
n += remove_nodes_if(child, predicate);
auto e = std::remove_if(begin(c), end(c), predicate);
n += std::distance(e, end(c));
c.erase(e, end(c));
return n;
}
#include <list>
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
std::cout << "\n------------"
<< "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
// some of them are not alive, see which are gone ("detecting the null pointers")
leafs.remove_if([](Reference const& ref) { return ref.expired(); });
std::cout << "\nLive leafs: " << leafs.size();
}
Prints e.g.
root_tag*(id:0) {
- bicycle_tag*(id:1) {}
- bicycle_tag*(id:2) {
- pear_tag*(id:3) {}
}
- bicycle_tag*(id:4) {
- bicycle_tag*(id:5) {}
- bicycle_tag*(id:6) {}
}
}
------------
Leafs: 4 bicycle_tag*(id:1) {} pear_tag*(id:3) {} bicycle_tag*(id:5) {} bicycle_tag*(id:6) {}
Removed 1 2mod5 nodes from graph
(Stale?) Leafs: 4
Live leafs: 3
Or see the COLIRU link for a much larger sample.
I have a parser in which I want to capture certain types of whitespace as enum values and preserve the spaces for the "text" values.
My whitespace parser is pretty basic (Note: I've only added the pipe character here for test/dev purposes):
struct whitespace_p : x3::symbols<Whitespace>
{
whitespace_p()
{
add
("\n", Whitespace::NEWLINE)
("\t", Whitespace::TAB)
("|", Whitespace::PIPE)
;
}
} whitespace;
And I want to capture everything either into my enum or into std::strings:
struct Element : x3::variant<Whitespace, std::string>
{
using base_type::base_type;
using base_type::operator=;
};
And to parse my input I use something like this:
const auto contentParser
= x3::rule<class ContentParserID, Element, true> { "contentParser" }
= x3::no_skip[+(x3::char_ - (whitespace))]
| whitespace
;
using Elements = std::vector<Element>;
const auto elementsParser
= x3::rule<class ContentParserID, Elements, true> { "elementsParser" }
= contentParser >> *(contentParser);
The problem though is that the parser stops at the first tab or newline it hits.
Code: http://coliru.stacked-crooked.com/a/d2cda4ce721279a4
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
enum Whitespace
{
NEWLINE,
TAB,
PIPE
};
struct whitespace_p : x3::symbols<Whitespace>
{
whitespace_p()
{
add
("\n", Whitespace::NEWLINE)
("\t", Whitespace::TAB)
("|", Whitespace::PIPE)
;
}
} whitespace;
struct Element : x3::variant<Whitespace, std::string>
{
using base_type::base_type;
using base_type::operator=;
};
const auto contentParser
= x3::rule<class ContentParserID, Element, true> { "contentParser" }
= x3::no_skip[+(x3::char_ - (whitespace))]
| whitespace
;
using Elements = std::vector<Element>;
const auto elementsParser
= x3::rule<class ContentParserID, Elements, true> { "elementsParser" }
= contentParser >> *(contentParser);
struct print_visitor
: public boost::static_visitor<std::string>
{
std::string operator()(const Whitespace& ws) const
{
if (ws == Whitespace::NEWLINE)
{
return "newline";
}
else if (ws == Whitespace::PIPE)
{
return "pipe";
}
else
{
return "tab";
}
}
std::string operator()(const std::string& str) const
{
return str;
}
};
int main()
{
const std::string text = "Hello \n World";
std::string::const_iterator start = std::begin(text);
const std::string::const_iterator stop = std::end(text);
Elements elements{};
bool result =
phrase_parse(start, stop, elementsParser, x3::ascii::space, elements);
if (!result)
{
std::cout << "failed to parse!\n";
}
else if (start != stop)
{
std::cout << "unparsed: " << std::string{start, stop} << '\n';
}
else
{
for (const auto& e : elements)
{
std::cout << "element: [" << boost::apply_visitor(print_visitor{}, e) << "]\n";
}
}
}
If I parse the text Hello | World then I get the results I'm expecting. But if I instead use Hello \n World the whitespace after the \n is swallowed and the World is never parsed. Ideally I'd like to see this output:
element: [Hello ]
element: [newline]
element: [ World]
How can I accomplish this? Thank you!
My go-to reference on skipper issues: Boost spirit skipper issues
In this case you made it work with no_skip[]. That's correct.
no_skip is like lexeme, except that it doesn't pre-skip. From the source (boost/spirit/home/x3/directive/no_skip.hpp):
// same as lexeme[], but does not pre-skip
Alternative Take
In your case I would flip the logic: just adjust the skipper itself.
Also, don't supply the skipper with phrase_parse, because your grammar is highly sensitive to the correct value of the skipper.
Your whole grammar could be:
const auto p = x3::skip(x3::space - whitespace) [
*(+x3::graph | whitespace)
];
Here's a Live Demo On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace x3 = boost::spirit::x3;
enum Whitespace { NEWLINE, TAB, PIPE };
struct whitespace_p : x3::symbols<Whitespace> {
whitespace_p() {
add
("\n", Whitespace::NEWLINE)
("\t", Whitespace::TAB)
("|", Whitespace::PIPE)
;
}
} static const whitespace;
struct Element : x3::variant<Whitespace, std::string> {
using base_type::base_type;
using base_type::operator=;
};
using Elements = std::vector<Element>;
static inline std::ostream& operator<<(std::ostream& os, Element const& el) {
struct print_visitor {
std::ostream& os;
auto& operator()(Whitespace ws) const {
switch(ws) {
case Whitespace::NEWLINE: return os << "[newline]";
case Whitespace::PIPE: return os << "[pipe]";
case Whitespace::TAB: return os << "[tab]";
}
return os << "?";
}
auto& operator()(const std::string& str) const { return os << std::quoted(str); }
} vis{os};
return boost::apply_visitor(vis, el);
}
int main() {
std::string const text = "\tHello \n World";
auto start = begin(text), stop = end(text);
const auto p = x3::skip(x3::space - whitespace) [
*(+x3::graph | whitespace)
];
Elements elements;
if (!parse(start, stop, p, elements)) {
std::cout << "failed to parse!\n";
} else {
std::copy(begin(elements), end(elements), std::ostream_iterator<Element>(std::cout, "\n"));
}
if (start != stop) {
std::cout << "unparsed: " << std::quoted(std::string(start, stop)) << '\n';
}
}
Prints
[tab]
"Hello"
[newline]
"World"
Even Simpler?
It doesn't seem like you'd need any skipper here at all. Why not:
const auto p = *(+~x3::char_("\n\t|") | whitespace);
While we're at it, there's no need for symbols to map enums:
struct Element : x3::variant<char, std::string> {
// ...
};
using Elements = std::vector<Element>;
And then
const auto p
= x3::rule<struct ID, Element> {}
= +~x3::char_("\n\t|") | x3::char_;
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace x3 = boost::spirit::x3;
struct Element : x3::variant<char, std::string> {
using variant = x3::variant<char, std::string>;
using variant::variant;
using variant::operator=;
friend std::ostream& operator<<(std::ostream& os, Element const& el) {
struct print_visitor {
std::ostream& os;
auto& operator()(char ws) const {
switch(ws) {
case '\n': return os << "[newline]";
case '\t': return os << "[tab]";
case '|': return os << "[pipe]";
}
return os << "?";
}
auto& operator()(const std::string& str) const { return os << std::quoted(str); }
} vis{os};
return boost::apply_visitor(vis, el);
}
};
using Elements = std::vector<Element>;
int main() {
std::string const text = "\tHello \n World";
auto start = begin(text);
auto const stop = end(text);
Elements elements;
const auto p
= x3::rule<struct ID, Element> {}
= +~x3::char_("\n\t|") | x3::char_;
if (!parse(start, stop, *p, elements)) {
std::cout << "failed to parse!\n";
} else {
std::copy(begin(elements), end(elements), std::ostream_iterator<Element>(std::cout, "\n"));
}
if (start != stop) {
std::cout << "unparsed: " << std::quoted(std::string(start, stop)) << '\n';
}
}
Prints
[tab]
"Hello "
[newline]
" World"
The problem is that you are calling phrase_parse instead of parse in main.
Try to use something like
bool result =
parse(start, stop, elementsParser, elements);
Your phrase_parse call was instructed to skip spaces, which is exactly what you don't want.
See the first answer to How to use boost::spirit to parse a sequence of words into a vector?
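For comparison, the X3 grammar `*(+~x3::char_("\n\t|") | x3::char_)` can also be written as a plain loop. Each element is either a maximal run of non-separator characters, kept verbatim with its spaces, or a single separator character. This is a hand-written sketch, not Spirit. The names tokenize and Element are hypothetical, and std::variant stands in for x3::variant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// A separator character or a run of ordinary text.
using Element = std::variant<char, std::string>;

inline std::vector<Element> tokenize(std::string_view text, std::string_view seps = "\n\t|") {
    std::vector<Element> out;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t j = text.find_first_of(seps, i);
        if (j == std::string_view::npos) j = text.size();
        if (j > i) {                  // like +~char_(seps): one or more non-separators
            out.emplace_back(std::string(text.substr(i, j - i)));
            i = j;
        } else {                      // like char_: the separator itself
            out.emplace_back(text[i]);
            ++i;
        }
    }
    return out;
}
```

For "\tHello \n World" this yields '\t', "Hello ", '\n' and " World", matching the skipper-free X3 version.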
Here is the map with a string key and a structure value.
First, I create a map with a string key and a structure as value, std::map<std::string, values>, and then I add these map objects to a set, std::set<std::map<std::string, values>>. I would like to understand how to loop through this set.
I am not able to access the maps that are part of this set; please suggest how.
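#include <iostream>
#include <map>
#include <set>
#include <string>
#include <tuple>
struct values
{
std::string a;
std::string b;
values():a("milepost"),b("dummyval"){};
values( std::string ab, std::string bc)
{
a=ab;
b=bc;
};
bool operator<(const values& other) const {
// lexicographic comparison: a strict weak ordering, as std::set requires
return std::tie(a, b) < std::tie(other.a, other.b);
}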
friend std::ostream& operator<<(std::ostream& os, const values& val);
};
std::ostream& operator<< (std::ostream& os , const values& val)
{
os << val.a <<"\t"<< val.b;
return os;
}
typedef std::map<std::string,values> myWsData;
int main()
{
values a;
myWsData et_Data1,pt_Data2;
et_Data1.insert(std::make_pair("780256", a));
pt_Data2.insert(std::make_pair("780256", a));
std::set<myWsData> myet_pt_data;
myet_pt_data.insert(et_Data1);
myet_pt_data.insert(pt_Data2);
for (auto &i:myet_pt_data)
{
std::cout<<i<<"\n";
}
}
You have to use two loops, like so:
for (auto const& it1 : myet_pt_data)
{
for (auto const& it2 : it1)
{
std::cout << it2.first << '\t' << it2.second << std::endl;
}
}
When you use auto const& in your range-based for loops, you avoid copying each map and each key/value pair out of the set.
The types are as follows:
decltype(it1) is std::map<std::string,values> const&, and
decltype(it2) is std::pair<std::string const,values> const&
For completeness, note that the std::string in it2 (the std::pair) is const.
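Here is the same two-level loop as a self-contained sketch. It uses C++17 structured bindings and an operator< based on std::tie. That operator matters because std::set orders its maps by comparing their pairs, and this needs a strict weak ordering on values. The question's `a < other.a && b < other.b` is not one. The simplified values aggregate and the dump function are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <tuple>

struct values {
    std::string a = "milepost";
    std::string b = "dummyval";
    // lexicographic on (a, b): a valid strict weak ordering
    bool operator<(values const& o) const { return std::tie(a, b) < std::tie(o.a, o.b); }
};
using myWsData = std::map<std::string, values>;

// Outer loop: each map in the set. Inner loop: each key/value pair of that map.
inline void dump(std::set<myWsData> const& data, std::ostream& os) {
    for (auto const& the_map : data)
        for (auto const& [key, val] : the_map)
            os << key << '\t' << val.a << '\t' << val.b << '\n';
}
```

The maps come out in the set's order: lexicographically by their (key, value) pairs, not in insertion order.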
How about this:
#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <string>
using namespace std;
int main()
{
set<map<int,string> > list;
//fill list
// const auto& avoids copying each map and each pair (generic lambdas need C++14)
std::for_each(list.begin(), list.end(), [](const auto& set_el){
std::for_each(set_el.begin(),set_el.end(),[](const auto& map_el) {
std::cout<<map_el.first<<"\t"<<map_el.second<<std::endl;
});
});
return 0;
}
You are probably missing an inner loop:
// iterates through all elements of the set:
for (const auto& the_map : myet_pt_data)
{
// the_map takes all values from the set
// the_map actual type is const std::map<std::string,values>&
for (const auto& the_value : the_map)
{
// the_value takes all value of the current map (the_map)
// the_value actual type is const std::pair<const std::string,values>&
std::cout << the_value.first << " value is " << the_value.second << std::endl;
}
}
Are there any C++ transformations which are similar to itertools.groupby()?
Of course I could easily write my own, but I'd prefer to leverage the idiomatic behavior or compose one from the features provided by the STL or boost.
#include <cstdlib>
#include <map>
#include <algorithm>
#include <string>
#include <vector>
struct foo
{
int x;
std::string y;
float z;
};
bool lt_by_x(const foo &a, const foo &b)
{
return a.x < b.x;
}
void list_by_x(const std::vector<foo> &foos, std::map<int, std::vector<foo> > &foos_by_x)
{
/* ideas..? */
}
int main(int argc, const char *argv[])
{
std::vector<foo> foos;
std::map<int, std::vector<foo> > foos_by_x;
std::vector<foo> sorted_foos(foos);
std::sort(sorted_foos.begin(), sorted_foos.end(), lt_by_x);
list_by_x(sorted_foos, foos_by_x);
return EXIT_SUCCESS;
}
This doesn't really answer your question, but for the fun of it, I implemented a group_by iterator. Maybe someone will find it useful:
#include <assert.h>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>
using std::cout;
using std::cerr;
using std::multiset;
using std::ostringstream;
using std::pair;
using std::vector;
struct Foo
{
int x;
std::string y;
float z;
};
struct FooX {
typedef int value_type;
value_type operator()(const Foo &f) const { return f.x; }
};
template <typename Iterator,typename KeyFunc>
struct GroupBy {
typedef typename KeyFunc::value_type KeyValue;
struct Range {
Range(Iterator begin,Iterator end)
: iter_pair(begin,end)
{
}
Iterator begin() const { return iter_pair.first; }
Iterator end() const { return iter_pair.second; }
private:
pair<Iterator,Iterator> iter_pair;
};
struct Group {
KeyValue value;
Range range;
Group(KeyValue value,Range range)
: value(value), range(range)
{
}
};
struct GroupIterator {
typedef Group value_type;
GroupIterator(Iterator iter,Iterator end,KeyFunc key_func)
: range_begin(iter), range_end(iter), end(end), key_func(key_func)
{
advance_range_end();
}
bool operator==(const GroupIterator &that) const
{
return range_begin==that.range_begin;
}
bool operator!=(const GroupIterator &that) const
{
return !(*this==that);
}
GroupIterator operator++()
{
range_begin = range_end;
advance_range_end();
return *this;
}
value_type operator*() const
{
return value_type(key_func(*range_begin),Range(range_begin,range_end));
}
private:
void advance_range_end()
{
if (range_end!=end) {
typename KeyFunc::value_type value = key_func(*range_end++);
while (range_end!=end && key_func(*range_end)==value) {
++range_end;
}
}
}
Iterator range_begin;
Iterator range_end;
Iterator end;
KeyFunc key_func;
};
GroupBy(Iterator begin_iter,Iterator end_iter,KeyFunc key_func)
: begin_iter(begin_iter),
end_iter(end_iter),
key_func(key_func)
{
}
GroupIterator begin() { return GroupIterator(begin_iter,end_iter,key_func); }
GroupIterator end() { return GroupIterator(end_iter,end_iter,key_func); }
private:
Iterator begin_iter;
Iterator end_iter;
KeyFunc key_func;
};
template <typename Iterator,typename KeyFunc>
inline GroupBy<Iterator,KeyFunc>
group_by(
Iterator begin,
Iterator end,
const KeyFunc &key_func = KeyFunc()
)
{
return GroupBy<Iterator,KeyFunc>(begin,end,key_func);
}
static void test()
{
vector<Foo> foos;
foos.push_back({5,"bill",2.1});
foos.push_back({5,"rick",3.7});
foos.push_back({3,"tom",2.5});
foos.push_back({7,"joe",3.4});
foos.push_back({5,"bob",7.2});
ostringstream out;
for (auto group : group_by(foos.begin(),foos.end(),FooX())) {
out << group.value << ":";
for (auto elem : group.range) {
out << " " << elem.y;
}
out << "\n";
}
assert(out.str()==
"5: bill rick\n"
"3: tom\n"
"7: joe\n"
"5: bob\n"
);
}
int main(int argc,char **argv)
{
test();
return 0;
}
Eric Niebler's ranges library (range-v3) provides a group_by view.
According to the docs, it is a header-only library and can be included easily.
It's intended to go into standard C++, but it can already be used with a recent C++11 compiler.
Minimal working example:
#include <map>
#include <vector>
#include <range/v3/all.hpp>
using namespace std;
using namespace ranges;
int main(int argc, char **argv) {
vector<int> l { 0,1,2,3,6,5,4,7,8,9 };
ranges::v3::sort(l);
auto x = l | view::group_by([](int x, int y) { return x / 5 == y / 5; });
map<int, vector<int>> res;
auto i = x.begin();
auto e = x.end();
for (;i != e; ++i) {
auto first = *((*i).begin());
res[first / 5] = to_vector(*i);
}
// res = { 0 : [0,1,2,3,4], 1: [5,6,7,8,9] }
}
(I compiled this with clang 3.9.0 and --std=c++11.)
I recently discovered cppitertools.
It fulfills this need exactly as described.
https://github.com/ryanhaining/cppitertools#groupby
What is the point of bloating the standard C++ library with an algorithm that is one line of code?
for (const auto & foo : foos) foos_by_x[foo.x].push_back(foo);
Also, take a look at std::multimap; it might be just what you need.
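A minimal sketch of the std::multimap route follows: index every foo by x, then read one group back with equal_range. The helper names index_by_x and names_with_x are hypothetical. Since C++11, elements with equal keys are inserted at the upper end of their range, so each group keeps the input order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct foo { int x; std::string y; float z; };

// One multimap entry per foo, keyed by x.
inline std::multimap<int, foo> index_by_x(std::vector<foo> const& foos) {
    std::multimap<int, foo> m;
    for (auto const& f : foos) m.emplace(f.x, f);
    return m;
}

// Collects the y fields of the group with key x (empty if there is none).
inline std::vector<std::string> names_with_x(std::multimap<int, foo> const& m, int x) {
    std::vector<std::string> names;
    auto [first, last] = m.equal_range(x);
    for (auto it = first; it != last; ++it) names.push_back(it->second.y);
    return names;
}
```

Unlike std::map<int, std::vector<foo>>, the groups here are implicit ranges of one container. Iterating the multimap from begin() to end() therefore visits the groups in key order.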
UPDATE:
The one-liner I have provided is not well optimized for the case when your vector is already sorted. The number of map lookups can be reduced if we remember the iterator of the previously inserted object together with its "key", and do a lookup only when the key of the next object changes. For example:
#include <map>
#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
struct foo {
int x;
std::string y;
float z;
};
class optimized_inserter {
public:
typedef std::map<int, std::vector<foo> > map_type;
optimized_inserter(map_type & map) : map(&map), it(map.end()) {}
void operator()(const foo & obj) {
typedef map_type::value_type value_type;
if (it != map->end() && last_x == obj.x) {
it->second.push_back(obj);
return;
}
last_x = obj.x;
it = map->insert(value_type(obj.x, std::vector<foo>({ obj }))).first;
}
private:
map_type *map;
map_type::iterator it;
int last_x;
};
int main()
{
std::vector<foo> foos;
std::map<int, std::vector<foo>> foos_by_x;
foos.push_back({ 1, "one", 1.0 });
foos.push_back({ 3, "third", 2.5 });
foos.push_back({ 1, "one.. but third", 1.5 });
foos.push_back({ 2, "second", 1.8 });
foos.push_back({ 1, "one.. but second", 1.5 });
std::sort(foos.begin(), foos.end(), [](const foo & lhs, const foo & rhs) {
return lhs.x < rhs.x;
});
std::for_each(foos.begin(), foos.end(), optimized_inserter(foos_by_x));
for (const auto & p : foos_by_x) {
std::cout << "--- " << p.first << "---\n";
for (auto & f : p.second) {
std::cout << '\t' << f.x << " '" << f.y << "' / " << f.z << '\n';
}
}
}
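A possible C++17 alternative to optimized_inserter uses map::try_emplace with an end() hint. For sorted input every new key belongs at the end of the map, so the hinted insertion is amortized constant time. If a key already exists, try_emplace returns the existing group instead of discarding the element. By contrast, optimized_inserter's `map->insert(...)` silently drops the object when an earlier group with that key exists, so it relies on the input being sorted. group_sorted_by_x is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct foo { int x; std::string y; float z; };

// Groups foos by x; fastest when the input is sorted by x, still correct otherwise.
inline std::map<int, std::vector<foo>> group_sorted_by_x(std::vector<foo> const& foos) {
    std::map<int, std::vector<foo>> groups;
    auto last = groups.end();                                  // group of the previous element
    for (auto const& f : foos) {
        if (last == groups.end() || last->first != f.x)
            last = groups.try_emplace(groups.end(), f.x);      // hinted lookup/insert
        last->second.push_back(f);
    }
    return groups;
}
```
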
How about this?
#include <map>
#include <type_traits>
#include <utility>
#include <vector>
template <typename StructType, typename FieldSelectorUnaryFn>
auto GroupBy(const std::vector<StructType>& instances, const FieldSelectorUnaryFn& fieldChooser)
{
// deduce the key type without requiring StructType to be default-constructible
using FieldType = std::decay_t<decltype(fieldChooser(std::declval<const StructType&>()))>;
std::map<FieldType, std::vector<StructType>> instancesByField;
for (const auto& instance : instances)
{
instancesByField[fieldChooser(instance)].push_back(instance);
}
return instancesByField;
}
and use it like this:
auto itemsByX = GroupBy(items, [](const auto& item){ return item.x; });
I wrote a C++ library to address this problem in an elegant way. Given your struct
struct foo
{
int x;
std::string y;
float z;
};
To group by y you simply do:
std::vector<foo> dataframe;
...
auto groups = group_by(dataframe, &foo::y);
You can also group by more than one variable:
auto groups = group_by(dataframe, &foo::y, &foo::x);
And then iterate through the groups normally:
for(auto& [key, group]: groups)
{
// do something
}
It also has other operations, such as subset and concat.
I would simply use boolinq.h, which includes all of LINQ. It has no documentation, but it is very simple to use.
I came across a requirement where each record is stored as
Name : Employee_Id : Address
where Name and Employee_Id are both supposed to be keys; that is, a search function has to be provided on both Name and Employee_Id.
I can think of using a map to store this structure
std::map< std::pair<std::string,std::string> , std::string >
// < < Name , Employee-Id> , Address >
but I'm not exactly sure what the search function would look like.
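With the map layout proposed above, the two searches look quite different. This minimal sketch uses hypothetical helper names. Name is the leading component of the key, so all entries for a name are contiguous, and lower_bound finds the first one in O(log n). Employee_Id is the second component, so the map's ordering cannot help: that search has to scan every entry, which is O(n). That asymmetry is what a container with one index per key avoids.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// The question's layout: key = (Name, Employee_Id), value = Address.
using Directory = std::map<std::pair<std::string, std::string>, std::string>;

// Name is the leading key component: jump to the first (name, *) entry.
inline std::optional<std::string> find_address_by_name(Directory const& d, std::string const& name) {
    auto it = d.lower_bound({name, std::string{}});
    if (it != d.end() && it->first.first == name) return it->second;
    return std::nullopt;
}

// Employee_Id is the second component: the ordering does not help, so scan linearly.
inline std::optional<std::string> find_address_by_id(Directory const& d, std::string const& id) {
    for (auto const& [key, address] : d)
        if (key.second == id) return address;
    return std::nullopt;
}
```
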
Boost.MultiIndex
This is a Boost example
In the above example an ordered index is used, but you can also use a hashed index:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <string>
#include <iostream>
struct employee
{
int id_;
std::string name_;
std::string address_;
employee(int id,std::string name,std::string address):id_(id),name_(name),address_(address) {}
};
struct id{};
struct name{};
struct address{};
struct id_hash{};
struct name_hash{};
typedef boost::multi_index_container<
employee,
boost::multi_index::indexed_by<
boost::multi_index::ordered_unique<boost::multi_index::tag<id>, BOOST_MULTI_INDEX_MEMBER(employee,int,id_)>,
boost::multi_index::ordered_unique<boost::multi_index::tag<name>,BOOST_MULTI_INDEX_MEMBER(employee,std::string,name_)>,
boost::multi_index::ordered_unique<boost::multi_index::tag<address>, BOOST_MULTI_INDEX_MEMBER(employee,std::string,address_)>,
boost::multi_index::hashed_unique<boost::multi_index::tag<id_hash>, BOOST_MULTI_INDEX_MEMBER(employee,int,id_)>,
boost::multi_index::hashed_unique<boost::multi_index::tag<name_hash>, BOOST_MULTI_INDEX_MEMBER(employee,std::string,name_)>
>
> employee_set;
typedef boost::multi_index::index<employee_set,id>::type employee_set_ordered_by_id_index_t;
typedef boost::multi_index::index<employee_set,name>::type employee_set_ordered_by_name_index_t;
typedef boost::multi_index::index<employee_set,name_hash>::type employee_set_hashed_by_name_index_t;
typedef boost::multi_index::index<employee_set,id>::type::const_iterator employee_set_ordered_by_id_iterator_t;
typedef boost::multi_index::index<employee_set,name>::type::const_iterator employee_set_ordered_by_name_iterator_t;
typedef boost::multi_index::index<employee_set,id_hash>::type::const_iterator employee_set_hashed_by_id_iterator_t;
typedef boost::multi_index::index<employee_set,name_hash>::type::const_iterator employee_set_hashed_by_name_iterator_t;
int main()
{
employee_set employee_set_;
employee_set_.insert(employee(1, "Employer1", "Address1"));
employee_set_.insert(employee(2, "Employer2", "Address2"));
employee_set_.insert(employee(3, "Employer3", "Address3"));
employee_set_.insert(employee(4, "Employer4", "Address4"));
// search by id using an ordered index
{
const employee_set_ordered_by_id_index_t& index_id = boost::multi_index::get<id>(employee_set_);
employee_set_ordered_by_id_iterator_t id_itr = index_id.find(2);
if (id_itr != index_id.end() ) {
const employee& tmp = *id_itr;
std::cout << tmp.id_ << ", " << tmp.name_ << ", " << tmp.address_ << std::endl;
} else {
std::cout << "No records have been found\n";
}
}
// search by a non-existing id using an ordered index
{
const employee_set_ordered_by_id_index_t& index_id = boost::multi_index::get<id>(employee_set_);
employee_set_ordered_by_id_iterator_t id_itr = index_id.find(2234);
if (id_itr != index_id.end() ) {
const employee& tmp = *id_itr;
std::cout << tmp.id_ << ", " << tmp.name_ << ", " << tmp.address_ << std::endl;
} else {
std::cout << "No records have been found\n";
}
}
// search by name using an ordered index
{
const employee_set_ordered_by_name_index_t& index_name = boost::multi_index::get<name>(employee_set_);
employee_set_ordered_by_name_iterator_t name_itr = index_name.find("Employer3");
if (name_itr != index_name.end() ) {
const employee& tmp = *name_itr;
std::cout << tmp.id_ << ", " << tmp.name_ << ", " << tmp.address_ << std::endl;
} else {
std::cout << "No records have been found\n";
}
}
// search by name using a hashed index
{
employee_set_hashed_by_name_index_t& index_name = boost::multi_index::get<name_hash>(employee_set_);
employee_set_hashed_by_name_iterator_t name_itr = index_name.find("Employer4");
if (name_itr != index_name.end() ) {
const employee& tmp = *name_itr;
std::cout << tmp.id_ << ", " << tmp.name_ << ", " << tmp.address_ << std::endl;
} else {
std::cout << "No records have been found\n";
}
}
// search by name using a hashed index when the name does not exist in the container
{
employee_set_hashed_by_name_index_t& index_name = boost::multi_index::get<name_hash>(employee_set_);
employee_set_hashed_by_name_iterator_t name_itr = index_name.find("Employer46545");
if (name_itr != index_name.end() ) {
const employee& tmp = *name_itr;
std::cout << tmp.id_ << ", " << tmp.name_ << ", " << tmp.address_ << std::endl;
} else {
std::cout << "No records have been found\n";
}
}
return 0;
}
If you want to use std::map, you can have two separate containers, each with a different key (name, employee id), whose values are pointers to the structure, so that you do not keep multiple copies of the same data.
Example with two keys:
#include <memory>
#include <map>
#include <string>
#include <iostream>
template <class KEY1,class KEY2, class OTHER >
class MultiKeyMap {
public:
struct Entry
{
KEY1 key1;
KEY2 key2;
OTHER otherVal;
Entry( const KEY1 &_key1,
const KEY2 &_key2,
const OTHER &_otherVal):
key1(_key1),key2(_key2),otherVal(_otherVal) {};
Entry() {};
};
private:
struct ExtendedEntry;
typedef std::shared_ptr<ExtendedEntry> ExtendedEntrySptr;
struct ExtendedEntry {
Entry entry;
typename std::map<KEY1,ExtendedEntrySptr>::iterator it1;
typename std::map<KEY2,ExtendedEntrySptr>::iterator it2;
ExtendedEntry() {};
ExtendedEntry(const Entry &e):entry(e) {};
};
std::map<KEY1,ExtendedEntrySptr> byKey1;
std::map<KEY2,ExtendedEntrySptr> byKey2;
public:
void del(ExtendedEntrySptr p)
{
if (p)
{
byKey1.erase(p->it1);
byKey2.erase(p->it2);
}
}
void insert(const Entry &entry) {
auto p=ExtendedEntrySptr(new ExtendedEntry(entry));
p->it1=byKey1.insert(std::make_pair(entry.key1,p)).first;
p->it2=byKey2.insert(std::make_pair(entry.key2,p)).first;
}
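std::pair<Entry,bool> getByKey1(const KEY1 &key1)
{
// find() instead of operator[]: a failed lookup must not insert a null entry
auto it=byKey1.find(key1);
if (it!=byKey1.end())
return std::make_pair(it->second->entry,true);
return std::make_pair(Entry(),false);
}
std::pair<Entry,bool> getByKey2(const KEY2 &key2)
{
auto it=byKey2.find(key2);
if (it!=byKey2.end())
return std::make_pair(it->second->entry,true);
return std::make_pair(Entry(),false);
}
void deleteByKey1(const KEY1 &key1)
{
auto it=byKey1.find(key1);
if (it!=byKey1.end())
del(it->second); // del() takes a copy, so the entry stays alive while both maps are erased
}
void deleteByKey2(const KEY2 &key2)
{
auto it=byKey2.find(key2);
if (it!=byKey2.end())
del(it->second);
}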
};
int main(int argc, const char *argv[])
{
typedef MultiKeyMap<int,std::string,int> M;
M map1;
map1.insert(M::Entry(1,"aaa",7));
map1.insert(M::Entry(2,"bbb",8));
map1.insert(M::Entry(3,"ccc",9));
map1.insert(M::Entry(7,"eee",9));
map1.insert(M::Entry(4,"ddd",9));
map1.deleteByKey1(7);
auto a=map1.getByKey1(2);
auto b=map1.getByKey2("ddd");
auto c=map1.getByKey1(7);
std::cout << "by key1=2 (should be bbb ): "<< (a.second ? a.first.key2:"Null") << std::endl;
std::cout << "by key2=ddd (should be ddd ): "<< (b.second ? b.first.key2:"Null") << std::endl;
std::cout << "by key1=7 (does not exist): "<< (c.second ? c.first.key2:"Null") << std::endl;
return 0;
}
Output:
by key1=2 (should be bbb ): bbb
by key2=ddd (should be ddd ): ddd
by key1=7 (does not exist): Null
If EmployeeID is the unique identifier, why use other keys? I would use EmployeeID as the internal key everywhere, and have other mappings from external/human readable IDs (such as Name) to it.
C++14 std::set::find non-key searches solution
This method saves you from storing each key twice, once in the indexed object and again as the key of a map, as done at: https://stackoverflow.com/a/44526820/895245
For smaller examples of the central technique, which may be easier to understand first, see: How to make a C++ map container where the key is part of the value?
#include <cassert>
#include <set>
#include <type_traits>
#include <vector>
struct Point {
int x;
int y;
int z;
};
class PointIndexXY {
public:
void insert(Point *point) {
sx.insert(point);
sy.insert(point);
}
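void erase(Point *point) {
sx.erase(point);
sy.erase(point);
}
// Return nullptr when nothing matches instead of dereferencing end().
Point* findX(int x) {
auto it = sx.find(x);
return it == sx.end() ? nullptr : *it;
}
Point* findY(int y) {
auto it = sy.find(y);
return it == sy.end() ? nullptr : *it;
}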
private:
struct PointCmpX {
typedef std::true_type is_transparent;
bool operator()(const Point* lhs, int rhs) const { return lhs->x < rhs; }
bool operator()(int lhs, const Point* rhs) const { return lhs < rhs->x; }
bool operator()(const Point* lhs, const Point* rhs) const { return lhs->x < rhs->x; }
};
struct PointCmpY {
typedef std::true_type is_transparent;
bool operator()(const Point* lhs, int rhs) const { return lhs->y < rhs; }
bool operator()(int lhs, const Point* rhs) const { return lhs < rhs->y; }
bool operator()(const Point* lhs, const Point* rhs) const { return lhs->y < rhs->y; }
};
std::set<Point*, PointCmpX> sx;
std::set<Point*, PointCmpY> sy;
};
int main() {
std::vector<Point> points{
{1, -1, 1},
{2, -2, 4},
{0, 0, 0},
{3, -3, 9},
};
PointIndexXY idx;
for (auto& point : points) {
idx.insert(&point);
}
Point *p;
p = idx.findX(0);
assert(p->y == 0 && p->z == 0);
p = idx.findX(1);
assert(p->y == -1 && p->z == 1);
p = idx.findY(-2);
assert(p->x == 2 && p->z == 4);
}