How to interface this DFS code with a Boost Graph DFS? - c++
Background
I have defined formatting operations on trees that work well when they are used as pre/in/post-order operators with the following structure and DFS definition (it also works on k-ary trees):
struct Node
{
Node *parent = nullptr;
Node *left = nullptr;
Node *right = nullptr;
char data;
template<class Op1, class Op2, class Op3>
void depth_first_search(Op1 pre_order, Op2 in_order, Op3 post_order)
{
pre_order(*this);
if(this->left != nullptr && this->right != nullptr)
{
this->left->depth_first_search(pre_order, in_order, post_order);
in_order(*this);
this->right->depth_first_search(pre_order, in_order, post_order);
in_order(*this);
}
post_order(*this);
}
};
I can format this structure with the following:
Formatter formatter(my_config);
tree.depth_first_search(formatter.get_pre_order(), formatter.get_in_order(), formatter.get_post_order());
auto result = formatter.take_result();
Objective
Since they work nicely, I would like to reuse the same functors to format Boost graphs when those graphs represent trees. So I am trying to get this (approximate/buggy) syntax working:
template<class Formatter>
class my_visitor : boost::default_dfs_visitor
{
Formatter & _formatter;
public:
my_visitor(Formatter& formatter) : _formatter(formatter){}
auto discover_vertex(auto vertex, auto const& g)
{
auto f = _formatter.get_pre_order();
f(vertex);
}
auto examine_edge(auto edge, auto const& g)
{
auto f = _formatter.get_in_order();
f(edge);
}
auto finish_vertex(auto vertex, auto const& g)
{
auto f = _formatter.get_post_order();
f(vertex);
}
};
So I can format the tree using a syntax like
Formatter formatter(my_config);
my_visitor vis(formatter);
depth_first_search(graph, root, boost::visitor(vis));
auto s = formatter.take_result();
Code
In its present state, the code compiles and runs on Godbolt: https://godbolt.org/z/bzjqjbvE3
However, the last method called in the main function does not exist (yet) since I don't know how to specify it:
auto s = newick::generate_from(tree);
There is a commented-out draft of such a function, but I am struggling to fit it to the BGL:
///
/// #brief Generate a Newick string from a k-ary tree with no properties attached to edges or vertices
///
std::string generate_from(quetzal::coalescence::k_ary_tree<> graph)
{
using vertex_t = typename quetzal::coalescence::k_ary_tree<>::vertex_descriptor;
// Data access
std::predicate<vertex_t> auto has_parent = [&graph](vertex_t v){ return graph.has_parent(v); };
std::predicate<vertex_t> auto has_children = [&graph](vertex_t v){ return graph.has_children(v); };
newick::Formattable<vertex_t> auto label = [&graph](auto){ return ""; };
newick::Formattable<vertex_t> auto branch_length = [&graph](auto){ return ""; };
// We declare a generator passing it the data interfaces
auto generator = newick::make_generator(has_parent, has_children, label, branch_length);
// We expose its interface to the boost DFS algorithm
detail::newick_no_property_visitor vis(generator);
depth_first_search(graph, boost::visitor(vis));
}
Problem
I cannot really wrap my head around how the two interfaces fit together, and I struggle to identify the root (lol) of the problem:
the pre/in/post operations as I defined them are callables taking one parameter with Node semantics (encapsulating both the node identifier and the graph it refers to),
whereas the visitor methods defined by the BGL have heterogeneous signatures, taking either vertex + graph or edge + graph parameters.
I also struggle to map the pre/in/post-order operations onto the BGL visitor's more convoluted interface. I think that pre_order maps to discover_vertex, in_order maps to examine_edge and post_order maps to finish_vertex, but I am not certain.
Is there a way to reconcile these two interfaces, or are their semantics too different, so that I have to duplicate/modify the formatting grammar?
Here's my take on it. Instead of generating a random graph, let's replicate the exact graph that you treated with your "old-style Node* 2-ary tree":
using G = quetzal::coalescence::k_ary_tree<>;
using V = G::vertex_descriptor;
enum {a,b,c,d,e,N};
G tree(N);
/*
* a
* / \
* / c
* / / \
* b d e
*/
add_edge(a, b, tree);
add_edge(a, c, tree);
add_edge(c, d, tree);
add_edge(c, e, tree);
auto name_map = boost::make_function_property_map<V>([](V v) -> char { return 'a' + v; });
print_graph(tree, name_map);
Prints
a --> b c
b -->
c --> d e
d -->
e -->
Implementing The DFS Visitor
The key is to wrap the "formatter" (generator) into a DFS visitor suitably.
Some notes:
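the events exposed don't map 1:1: BGL has no in-order event, so tree_edge (which fires before each child is discovered) stands in for it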
visitors are copied, so to refer to the same generator, the visitor should hold a reference to it
to make this easily extensible, I put the generator inside a State struct:
// We declare a generator passing it the data interfaces
struct State {
newick::generator<V, P, P, F, F, decltype(flavor)> gen;
std::stack<int> nth_child;
} state{{has_parent, has_children, label, branch_length, flavor}, {}};
// We expose its interface to the boost DFS algorithm
struct VisWrap : boost::default_dfs_visitor {
State& state_;
VisWrap(State& ref) : state_(ref) {}
void discover_vertex(V v, G const&) const {
state_.nth_child.push(0);
state_.gen.pre_order()(v);
}
void finish_vertex(V v, G const&) const {
state_.gen.post_order()(v);
state_.nth_child.pop();
}
void tree_edge(E e, G const& g) const {
if (state_.nth_child.top()++ > 0)
state_.gen.in_order()(target(e, g));
}
} vis{state};
depth_first_search(graph, boost::visitor(vis));
return state.gen.take_result();
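The wrapper above can be tried out without Boost. The following is a minimal sketch, not the code from the demo: `Tree`, `State`, `Visitor`, `dfs`, `sample_tree` and `newick_topology` are hypothetical names, and the hand-written `dfs` only mimics the way boost::depth_first_search takes its visitor by value and fires discover_vertex, tree_edge and finish_vertex. It illustrates why the visitor holds a reference to shared state and how the per-vertex counter suppresses the comma before the first child.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Hypothetical stand-in for a BGL graph: children[v] lists the children of v.
using Tree = std::vector<std::vector<int>>;

// Shared state, owned by the caller so that it outlives every visitor copy.
struct State {
    std::string out;
    std::stack<int> nth_child; // one counter per vertex that is currently open
};

// Cheap to copy: the visitor only holds a reference to the shared state.
struct Visitor {
    State& s;
    void discover_vertex(int v, Tree const& t) const {
        s.nth_child.push(0);
        if (!t[v].empty()) s.out += '(';           // pre-order action
    }
    void tree_edge(int /*from*/, int /*to*/) const {
        if (s.nth_child.top()++ > 0) s.out += ','; // in-order action, skipped for the first child
    }
    void finish_vertex(int v, Tree const& t) const {
        if (!t[v].empty()) s.out += ')';           // post-order action
        s.out += static_cast<char>('a' + v);
        s.nth_child.pop();
    }
};

// Takes the visitor by value on purpose, so every recursive call works on a copy.
void dfs(Tree const& t, int v, Visitor vis) {
    vis.discover_vertex(v, t);
    for (int child : t[v]) {
        vis.tree_edge(v, child);
        dfs(t, child, vis);
    }
    vis.finish_vertex(v, t);
}

// Same topology as the example: a -> b, a -> c, c -> d, c -> e.
Tree sample_tree() { return {{1, 2}, {}, {3, 4}, {}, {}}; }

std::string newick_topology(Tree const& t, int root) {
    State state;
    dfs(t, root, Visitor{state});
    return state.out + ';';
}
```

Calling newick_topology(sample_tree(), 0) gives "(b,(d,e)c)a;". If `Visitor` stored a `State` by value instead of a reference, each copy would write to its own string and the caller's state would stay empty.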
Side Notes/Bug
You had mixed up in_edges and out_edges inside has_parent() and has_children(). So I changed those from:
bool has_parent(vertex_descriptor v) const {
auto [it1, it2] = out_edges(v, *this);
// since it is a tree, at most 1 parent
assert(std::distance(it1,it2) <= 1);
// if iterators equal, then no parent
return it1 == it2 ? false : true;
}
bool has_children(vertex_descriptor v) const {
auto [it1, it2] = in_edges(v, *this);
return std::distance(it1, it2) >= 1;
}
Into the (slightly modernized):
bool has_parent(vertex_descriptor v) const {
auto pp = make_iterator_range(in_edges(v, *this));
// since it is a tree, at most 1 parent
assert(boost::size(pp) <= 1);
return !pp.empty();
}
bool has_children(vertex_descriptor v) const {
return !make_iterator_range(out_edges(v, *this)).empty();
}
Note that using in_edges does necessitate changing to a model of BidirectionalGraph by changing
using directed_type = boost::directedS;
into
using directed_type = boost::bidirectionalS; // for boost::in_edges
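To make the in-edge/out-edge distinction concrete, here is a small sketch that does not use Boost. `BidirectionalTree` and `sample_bidirectional_tree` are hypothetical names; the struct only imitates the extra bookkeeping that bidirectionalS adds, namely storing the incoming edges of each vertex next to the outgoing ones. Edges point from parent to child, as in the example tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal directed graph that records both edge directions.
struct BidirectionalTree {
    std::vector<std::vector<std::size_t>> out; // out[v]: targets of edges leaving v (children)
    std::vector<std::vector<std::size_t>> in;  // in[v]: sources of edges entering v (parents)

    explicit BidirectionalTree(std::size_t n) : out(n), in(n) {}

    void add_edge(std::size_t parent, std::size_t child) {
        out[parent].push_back(child);
        in[child].push_back(parent);
    }
    // A parent is found by following an edge backwards, hence the in-edges.
    bool has_parent(std::size_t v) const {
        assert(in[v].size() <= 1); // a tree vertex has at most one parent
        return !in[v].empty();
    }
    // Children are the targets of the out-edges.
    bool has_children(std::size_t v) const { return !out[v].empty(); }
};

// a=0, b=1, c=2, d=3, e=4
BidirectionalTree sample_bidirectional_tree() {
    BidirectionalTree t(5);
    t.add_edge(0, 1);
    t.add_edge(0, 2);
    t.add_edge(2, 3);
    t.add_edge(2, 4);
    return t;
}
```

With the two queries swapped, the root would report a parent and no children, which is the symptom of the original mix-up.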
Due to the timing issues of the generator operations vs. the visitor events I had to add some guards around adding/removing the comma delimiters. I have a feeling that if you remove the "impedance mismatch" in this area things will become simpler.
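The timing difference is easier to see with a trace of the legacy traversal. This is only an illustrative sketch: `TraceNode` and `legacy_event_trace` are names made up here, the node copies the shape of the legacy Node, and the callbacks record '+' for pre_order, ',' for in_order and '-' for post_order instead of formatting anything.

```cpp
#include <cassert>
#include <string>

// Same traversal as the legacy Node; TraceNode is a name used only in this sketch.
struct TraceNode {
    TraceNode* left = nullptr;
    TraceNode* right = nullptr;
    char data = '?';

    template <class Op1, class Op2, class Op3>
    void depth_first_search(Op1 pre_order, Op2 in_order, Op3 post_order) {
        pre_order(*this);
        if (left != nullptr && right != nullptr) {
            left->depth_first_search(pre_order, in_order, post_order);
            in_order(*this);
            right->depth_first_search(pre_order, in_order, post_order);
            in_order(*this); // fires again after the last child
        }
        post_order(*this);
    }
};

// Records the callback sequence for the tree a -> (b, c), c -> (d, e).
std::string legacy_event_trace() {
    TraceNode a, b, c, d, e;
    a.data = 'a'; b.data = 'b'; c.data = 'c'; d.data = 'd'; e.data = 'e';
    a.left = &b; a.right = &c;
    c.left = &d; c.right = &e;

    std::string trace;
    auto pre  = [&trace](TraceNode const& n) { trace += '+'; trace += n.data; };
    auto in   = [&trace](TraceNode const& n) { trace += ','; trace += n.data; };
    auto post = [&trace](TraceNode const& n) { trace += '-'; trace += n.data; };
    a.depth_first_search(pre, in, post);
    return trace;
}
```

The trace is "+a+b-b,a+c+d-d,c+e-e,c-c,a-a": in_order runs after every child, including the last one, which is why _post_order has to drop a trailing comma. BGL's tree_edge instead runs before each child is discovered, so the visitor skips it for the first child.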
Some work remains that you (wisely) left out of scope for this question, namely generalizing the label/length functions.
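One possible direction for that generalization is to pass the label and branch-length lookups in as callables. The sketch below is an assumption about how this could look, not part of the demo: it uses plain recursion over a hypothetical `ChildLists` adjacency structure instead of a BGL visitor, and `to_newick` and `labelled_example` are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the graph: children[v] lists the children of v.
using ChildLists = std::vector<std::vector<std::size_t>>;

// Label and BranchLength are callables taking a vertex index and returning
// something convertible to std::string.
template <class Label, class BranchLength>
std::string to_newick(ChildLists const& children, std::size_t v, bool is_root,
                      Label label, BranchLength branch_length) {
    std::string out;
    if (!children[v].empty()) {
        out += '(';
        for (std::size_t i = 0; i < children[v].size(); ++i) {
            if (i > 0) out += ','; // separator between siblings only
            out += to_newick(children, children[v][i], false, label, branch_length);
        }
        out += ')';
    }
    out += std::string(label(v));
    if (!is_root) { // the root has no branch above it
        std::string const length = branch_length(v);
        if (!length.empty()) {
            out += ':';
            out += length;
        }
    }
    return out;
}

// Example data: per-vertex names and branch lengths kept in side tables.
std::string labelled_example() {
    ChildLists const children = {{1, 2}, {}, {3, 4}, {}, {}};
    std::vector<std::string> const names = {"a", "b", "c", "d", "e"};
    std::vector<std::string> const lengths = {"", "0.1", "0.2", "0.3", "0.4"};
    auto label = [&names](std::size_t v) { return names[v]; };
    auto length = [&lengths](std::size_t v) { return lengths[v]; };
    return to_newick(children, 0, true, label, length) + ";";
}
```

Here labelled_example() returns "(b:0.1,(d:0.3,e:0.4)c:0.2)a;". In the BGL version the two callables would presumably read vertex and edge properties from the graph instead of side tables.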
Full Demo
Live On Compiler Explorer
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>
#include <boost/graph/isomorphism.hpp>
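#include <cassert>
#include <concepts>
#include <functional>
#include <iostream>
#include <regex>
#include <stack>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>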
namespace quetzal::coalescence {
namespace details {
///
/// #brief Defines the desired properties and constraints for a coalescent graph
///
struct tree_traits {
/// #brief Trees are sparse graph in nature, adjacency_matrix would not be justified here
template <class... Types> using model = boost::adjacency_list<Types...>;
/// #brief We want to enforce avoiding multi-graphs (edges with same end nodes)
using out_edge_list_type = boost::setS;
/// #brief We don't allow for inserting vertices except at the end and we don't remove vertices.
/// This means that neither reallocation cost nor stability are reasons for preferring
/// listS to vecS.
using vertex_list_type = boost::vecS;
/// #brief Coalescent trees are directed acyclic graphs
using directed_type = boost::bidirectionalS; // for boost::in_edges
};
} // namespace details
///
/// #brief A class adapted for simulating coalescent trees as a rooted K-ary tree, where each node can
/// hold at most k number of children.
///
/// #remark
/// - Since topology (pure structure) matter, a coalescence tree is more than a data container.
/// - This class inherits from a boost::graph with specialized edge and vertex properties
/// defined in details#tree_traits
/// - The simulation interface intentionally does not allow for removing edges or vertices,
/// but you may access the underlying boost graph object to do so.
///
template <class VertexProperties = boost::no_property, class EdgeProperties = boost::no_property>
struct k_ary_tree
: public details::tree_traits::model<
details::tree_traits::out_edge_list_type, details::tree_traits::vertex_list_type,
details::tree_traits::directed_type, VertexProperties, EdgeProperties> {
/// #brief Properties of an edge, e.g. a structure representing the series of demes visited or simply
/// the branch length.
using edge_properties = EdgeProperties;
/// #brief Properties of a vertex, e.g. a structure representing the mutational state.
using vertex_properties = VertexProperties;
/// #brief The type of graph held by the tree class
using base_type = details::tree_traits::model<
details::tree_traits::out_edge_list_type, details::tree_traits::vertex_list_type,
details::tree_traits::directed_type, vertex_properties, edge_properties>;
using self_type = k_ary_tree<vertex_properties, edge_properties>;
/// #brief The type used for identifying vertices within the graph
using vertex_descriptor = typename self_type::vertex_descriptor;
/// #brief Inherit all constructors from boost graph
using base_type::base_type;
///
/// #brief Print the tree to the graphviz format
/// #remarks Intends to hide the bundles writer syntax
///
void to_graphviz(std::ostream& out) const {
using namespace boost;
return boost::write_graphviz(out, *this,
boost::make_label_writer(boost::get(vertex_bundle, *this)),
boost::make_label_writer(boost::get(edge_bundle, *this)));
}
///
/// #brief Returns true if there exists an isomorphism between this and other and false otherwise.
///
template <class T> bool is_isomorphic(T const& other) noexcept {
return boost::isomorphism(*this, other);
}
///
/// #brief Returns true if a given node has a parent node
///
bool has_parent(vertex_descriptor v) const {
auto pp = make_iterator_range(in_edges(v, *this));
// since it is a tree, at most 1 parent
assert(boost::size(pp) <= 1);
return !pp.empty();
}
///
/// #brief Returns true if a given node has children nodes
///
bool has_children(vertex_descriptor v) const {
return !make_iterator_range(out_edges(v, *this)).empty();
}
}; // end class Tree
} // end namespace quetzal::coalescence
namespace quetzal::format::newick {
namespace detail {
// Replacement for `std::function<T(U)>::argument_type`
template <typename T> struct single_function_argument;
template <typename Ret, typename Arg> struct single_function_argument<std::function<Ret(Arg)>> {
using type = Arg;
};
template <typename P1> struct single_function_argument_impl {
using type = typename single_function_argument<decltype(std::function{std::declval<P1>()})>::type;
};
template <typename P1>
using single_function_argument_t = typename single_function_argument_impl<P1>::type;
/// #brief Tag
struct parenthesis {};
/// #brief Tag
struct square_bracket {};
///
/// #brief Check if the string is balanced for open/close symbols (parenthesis,brackets)
///
///
/// #note Since parenthesis checking is a context-free grammar, it requires a stack.
/// Regex can not accomplish that since they do not have memory.
///
bool check_if_balanced(std::string_view input, char const& open = '(', char const& close = ')') {
int count = 0;
for (auto const& ch : input) {
if (ch == open)
count++;
if (ch == close)
count--;
// if a parenthesis is closed without being opened return false
if (count < 0)
return false;
}
// in the end the test is passed only if count is zero
return count == 0;
}
///
/// #brief Default comment removal policy: do not change anything
///
struct identity {
static std::string edit(const std::string s) { return s; }
};
///
/// #brief Class template, base for further specialization
///
template <class tag> struct is_balanced {};
///
/// #brief Specialization for parenthesis
///
template <> struct is_balanced<detail::parenthesis> {
static bool check(std::string_view s) { return check_if_balanced(s, '(', ')'); }
};
///
/// #brief Specialization for square bracket
///
template <> struct is_balanced<detail::square_bracket> {
static bool check(std::string_view s) { return check_if_balanced(s, '[', ']'); }
};
} // namespace detail
using namespace std::string_literals;
///
/// #brief Node names can be any character except blanks, colons, semicolons, parentheses, and square
/// brackets.
///
static inline std::vector<std::string> forbidden_labels = {" "s, ","s, ";"s, "("s, ")"s, "["s, "]"s};
///
/// #brief Underscore characters in unquoted labels are converted to blanks.
///
/// #detail Because you may want to include a blank in a name, it is assumed
/// that an underscore character ("_") stands for a blank; any of
/// these in a name will be converted to a blank when it is read in.
///
static inline constexpr char blank = '_';
///
/// #brief Template class.
///
template <unsigned int N> struct remove_comments_of_depth {};
///
/// #brief Do not remove anything
///
template <> struct remove_comments_of_depth<0> : detail::identity {};
///
/// #brief Remove all comments substrings contained between square brackets
///
/// #note Assumes that text is well formatted so there are no such things like [[]] or unclosed bracket
///
template <> struct remove_comments_of_depth<1> {
static std::string edit(const std::string s) {
if (s.empty())
return s;
return std::regex_replace(s, std::regex(R"(\[[^()]*\])"), "");
}
};
///
/// #brief Remove all comments substrings contained between square brackets of depth 2
///
/// #note Assumes that text is well formatted so there are no such things like [[]] or unclosed bracket
///
template <> struct remove_comments_of_depth<2> {
static std::string edit(const std::string s) {
std::string buffer;
int counter = 0;
for (auto const& ch : s) {
if (ch == '[')
counter++;
if (ch == ']')
counter--;
if (ch == '[' && counter == 2)
continue; // do nothing, that was the opening
if (ch == ']' && counter == 1)
continue; // do nothing, that was the closing
if (!(counter >= 2 || (counter == 1 && ch == ']')))
buffer.append(std::string(1, ch));
}
return buffer;
}
};
///
/// #brief Policy allowing to keep nested comments.
///
/// #note Use this as a template parameter to specialize a Newick generator policy
///
struct PAUP {
// return empty string
static inline std::string root_branch_length() { return ""; }
// do nothing
static inline std::string treat_comments(std::string& s) { return s; }
};
///
/// #brief Set a root node branch length to zero, allow comments of depth 1, but will remove nested
/// comments.
///
/// #note Use this as a template parameter to specialize a Newick generator policy
///
struct TreeAlign {
// Set explicit null branch length for root node
static inline std::string root_branch_length() { return ":0.0"; }
// Remove comments that are nested, keep comments of depth 1
static inline std::string treat_comments(const std::string s) {
return remove_comments_of_depth<2>::edit(s);
}
};
///
/// #brief Requires that an unrooted tree begin with a trifurcation; it will not "uproot" a rooted tree.
/// Allow comments of depth 1, but does not allow nested comments.
/// #note Use this as a template parameter to specialize a Newick generator policy
///
struct PHYLIP {
// Branch length for root node is not explicit.
static inline std::string root_branch_length() { return ""; }
// Remove comments that are nested, keep comments of depth 1
static inline std::string treat_comments(std::string& s) {
// Allow comments of depth 1, but does not allow nested comments.
return remove_comments_of_depth<2>::edit(s);
}
};
///
/// #brief Concept for label name: invocable and the return type is convertible to a string.
///
template <class F, class... Args>
concept Formattable =
std::invocable<F, Args...> && std::convertible_to<std::invoke_result_t<F, Args...>, std::string>;
///
/// #brief Generate the Newick formula from an external (custom) tree class.
///
/// #remark This is a non-intrusive interface implementation so users can reuse Newick formatting
/// logic and expose the formatting internals to their own Tree class's DFS.
template <class T, std::predicate<T> P1, std::predicate<T> P2, Formattable<T> F1, Formattable<T> F2,
class Policy = PAUP>
class generator : public Policy {
public:
///
/// #brief Type of node being formatted
///
using node_type = T;
///
/// #brief Type of formula being generated
///
using formula_type = std::string;
///
/// #brief Type of policy being applied
///
using policy_type = Policy;
private:
///
/// #brief End character.
///
static inline constexpr char _end = ';';
///
/// #brief The string formula to be updated.
///
mutable formula_type _formula;
///
/// #brief Functor inspecting if the node being visited has a parent.
///
P1 _has_parent;
///
/// #brief Functor inspecting if the node being visited has children.
///
P2 _has_children;
///
/// #brief Retrieve the name of the node being visited.
///
/// #detail A name can be any string of printable characters except blanks,
/// colons, semicolons, parentheses, and square brackets.
///
/// #remark Return type must be convertible to std::string
///
F1 _label;
///
/// #brief Retrieve the branch length immediately above the node being visited.
///
/// #details Branch lengths can be incorporated into a tree by putting a
/// real number, with or without decimal point, after a node and
/// preceded by a colon. This represents the length of the branch
/// immediately above that node (that is, distance to parent node)
/// #remark Return type must be convertible to std::string
///
F2 _branch_length;
void _pre_order(node_type const& node) const {
if (std::invoke(_has_children, node)) {
_formula += '(';
}
}
void _in_order(node_type const&) const { _formula += ","; }
void _post_order(node_type const& node) const {
if (std::invoke(_has_children, node)) {
if (_formula.back() == ',')
_formula.pop_back(); // Remove comma
_formula += ')';
}
if (std::invoke(_has_parent, node)) {
auto label = std::invoke(_label, node);
if (has_forbidden_characters(remove_comments_of_depth<1>::edit(label))) {
throw std::invalid_argument(std::string("Label with forbidden characters:") +
std::string(label));
}
_formula += label;
std::string branch(std::invoke(_branch_length, node));
if (!branch.empty()) {
_formula += ":";
_formula += branch;
}
} else {
_formula += std::invoke(_label, node);
_formula += policy_type::root_branch_length();
}
}
public:
///
/// #brief Constructor
///
generator(P1 has_parent, P2 has_children, F1 label, F2 branch_length, Policy pol = {})
: policy_type(std::move(pol))
, _has_parent(std::move(has_parent))
, _has_children(std::move(has_children))
, _label(std::move(label))
, _branch_length(std::move(branch_length)) {}
///
/// #brief Operation called in the general DFS algorithm to open a parenthesis if node has children to
/// be visited.
///
/// #param node the node currently visited
///
auto pre_order() {
return [this](node_type const& node) { this->_pre_order(node); };
}
///
/// #brief Operation called in the general DFS algorithm to add a comma between visited nodes.
///
auto in_order() {
return [this](node_type const& node) { this->_in_order(node); };
}
///
/// #brief Operation to be passed to a generic DFS algorithm to close the parenthesis if node has
/// children, then append the node label and branch length.
///
/// #param node the node currently visited
///
auto post_order() {
return [this](node_type const& node) { this->_post_order(node); };
}
///
/// #brief Check if a string contains characters forbidden by the standard
///
bool has_forbidden_characters(std::string const& s) const {
if (s.empty())
return false;
std::string forbidden = " ,;()[\\]";
bool is_forbidden = std::regex_search(s, std::regex("[" + forbidden + "]"));
return is_forbidden;
}
///
/// #brief Clear the formula buffer to refresh the generator.
///
void clear() { _formula.clear(); }
///
/// #brief Retrieve the formatted string of the given node in the specified format
///
std::string&& take_result() const {
_formula.push_back(this->_end);
_formula = policy_type::treat_comments(_formula);
if (detail::is_balanced<detail::parenthesis>::check(_formula) == false) {
throw std::runtime_error(std::string("Failed: formula parenthesis are not balanced:") +
_formula);
}
if (detail::is_balanced<detail::square_bracket>::check(_formula) == false) {
throw std::runtime_error(std::string("Failed: formula square brackets are not balanced:") +
_formula);
}
return std::move(_formula);
}
}; // end structure generator
///
/// #brief User-defined deduction guide where the node/graph type T is deduced from P1
/// #remark Template deduction guides are patterns associated with a template
/// class that tell the compiler how to translate a set of constructor
/// arguments (and their types) into template parameters for the class.
template <class P1, class P2, class F1, class F2, class Policy = PAUP>
generator(P1, P2, F1, F2, Policy pol = {})
-> generator<detail::single_function_argument_t<P1>, P1, P2, F1, F2, Policy>;
} // end namespace quetzal::format::newick
#include <boost/graph/connected_components.hpp>
#include <boost/graph/graph_utility.hpp>
#include <boost/property_map/function_property_map.hpp>
#include <iomanip>
// Simplistic tree for testing - emulates implementations found in my field
struct Node {
Node* parent = nullptr;
Node* left = nullptr;
Node* right = nullptr;
char data;
template <class Op1, class Op2, class Op3>
void depth_first_search(Op1 pre_order, Op2 in_order, Op3 post_order) {
pre_order(*this);
if (this->left != nullptr && this->right != nullptr) {
this->left->depth_first_search(pre_order, in_order, post_order);
in_order(*this);
this->right->depth_first_search(pre_order, in_order, post_order);
in_order(*this);
}
post_order(*this);
}
};
using Flavor = quetzal::format::newick::TreeAlign;
std::string legacy_implementation() {
/* Topology :
*
* a
* / \
* / c
* / / \
* b d e
*/
Node a, b, c, d, e;
a.data = 'a';
b.data = 'b';
c.data = 'c';
d.data = 'd';
e.data = 'e';
a.left = &b;
b.parent = &a;
a.right = &c;
c.parent = &a;
c.left = &d;
d.parent = &c;
c.right = &e;
e.parent = &c;
namespace newick = quetzal::format::newick;
// Interfacing Quetzal generator with non-quetzal tree types
std::predicate<Node> auto has_parent = [](Node const& n) { return n.parent; };
std::predicate<Node> auto has_children = [](Node const& n) { return n.left && n.right; };
// A constant placeholder is used for the branch length
newick::Formattable<Node> auto branch_length = [](Node const&) /*-> std::string*/ { return "0.1"; };
// More sophisticated label formatting
newick::Formattable<Node> auto label = [](Node const& n) {
return std::string(1, n.data) + "[my[comment]]";
};
// Writes a root node branch length with a value of 0.0 and disable/remove nested comments
newick::generator generator(has_parent, has_children, label, branch_length, Flavor());
a.depth_first_search(generator.pre_order(), generator.in_order(), generator.post_order());
// We retrieve the formatted string
return generator.take_result();
}
//
// #brief Generate a Newick string from a k-ary tree with no properties attached to edges or vertices
//
std::string generate_from(quetzal::coalescence::k_ary_tree<> const& graph, auto flavor) {
using G = quetzal::coalescence::k_ary_tree<>;
using V = G::vertex_descriptor;
using E = G::edge_descriptor;
namespace newick = quetzal::format::newick;
// Data access
using P = std::function<bool(V)>;
using F = std::function<std::string(V)>;
P has_parent = [&graph](V v) { return graph.has_parent(v); };
P has_children = [&graph](V v) { return graph.has_children(v); };
F branch_length = [](V) { return "0.1"; };
F label = [](V v) {
std::string r = "a[my[comment]]";
r.front() += v;
return r;
};
// We declare a generator passing it the data interfaces
struct State {
newick::generator<V, P, P, F, F, decltype(flavor)> gen;
std::stack<int> nth_child;
} state{{has_parent, has_children, label, branch_length, flavor}, {}};
// We expose its interface to the boost DFS algorithm
struct VisWrap : boost::default_dfs_visitor {
State& state_;
VisWrap(State& ref) : state_(ref) {}
void discover_vertex(V v, G const&) const {
state_.nth_child.push(0);
state_.gen.pre_order()(v);
}
void finish_vertex(V v, G const&) const {
state_.gen.post_order()(v);
state_.nth_child.pop();
}
void tree_edge(E e, G const& g) const {
if (state_.nth_child.top()++ > 0)
state_.gen.in_order()(target(e, g));
}
} vis{state};
depth_first_search(graph, boost::visitor(vis));
return state.gen.take_result();
}
int main() {
std::string const legacy = legacy_implementation();
assert(legacy == "(b[my]:0.1,(d[my]:0.1,e[my]:0.1)c[my]:0.1)a[my]:0.0;");
// NOW WITH BGL GRAPHS
using G = quetzal::coalescence::k_ary_tree<>;
enum {a,b,c,d,e,N};
G tree(N);
add_edge(a, b, tree);
add_edge(a, c, tree);
add_edge(c, d, tree);
add_edge(c, e, tree);
// Generate the newick string
auto const bgl = generate_from(tree, Flavor());
std::cout << quoted(bgl) << "\n";
std::cout << quoted(legacy) << "\n";
std::cout << (bgl == legacy?"matching":"MISMATCH") << "\n";
}
Prints
"(b[my]:0.1,(d[my]:0.1,e[my]:0.1)c[my]:0.1)a[my]:0.0;"
"(b[my]:0.1,(d[my]:0.1,e[my]:0.1)c[my]:0.1)a[my]:0.0;"
matching
Related
How to pick proper sorted container?
I'm developing a real-time game client C++. And wondering how to pick container correctly. Server sends new game objects (monsters, players), their changes and their removal. All of them have to be stored in a single container called World identified by unique <int>ID. So the most common operations are getObjByID() to change something and push_back(), remove(). Besides that client have to dynamically sort that World by objects fields like Distance or Type to pick an object for clients needs. That sorting can be made very often like every 10ms, and objects values can be dynamically changed by incoming server info like player moves and all other objects distance changes. My first intention was to use std::vector without alloc\free - reinit deleted objects, but reading internet made me thinking about std::map. The main reason I doubt about map is that values cannot be sorted. Is there a performant way to sort and filter std::vector or std::map without copying elements? Something like c# linq: var mons = world.Where(o=> o.isMonster()).OrderBy(o=> o.Distance); foreach(var mon in mons){ //do smth }
I recommend a different approach for two key reasons: A single data structure is unlikely to satisfy your needs. Using an interlocked set of structures with one main index and multiple indices for specific query types will serve you better Updating every entry when a single object moves is pretty wasteful. There is an entire set of spatial data structures that is designed to deal with looking up positions and finding objects in the vicinity. For my example, I'm using the R-Tree in Boost Let's start with some basic type definitions. I assume 2D coordinates and use simple integers for object and type IDs. Adapt as necessary: #include <boost/geometry.hpp> #include <boost/geometry/geometries/point.hpp> #include <boost/geometry/geometries/box.hpp> #include <boost/geometry/index/rtree.hpp> #include <iterator> // using std::back_inserter #include <unordered_map> #include <utility> // using std::swap, std::pair #include <vector> namespace game { using ObjectID = int; using TypeID = int; namespace bg = boost::geometry; namespace bgi = boost::geometry::index; using Point = bg::model::d2::point_xy<float, bg::cs::cartesian>; using PointEntry = std::pair<Point, ObjectID>; using RTree = bgi::rtree<PointEntry, bgi::quadratic<16> >; You want to query specific types, e.g. only monsters. So we need to keep track of objects per type and their positions. The way we set up the R-Tree, mapping a Point to an ObjectID, even allows us to iterate over all objects of a specific type by just using the RTree. 
class TypeState { public: RTree positions; void add(ObjectID id, Point position) { positions.insert(std::make_pair(position, id)); } void erase(ObjectID id, Point position) { positions.remove(std::make_pair(position, id)); } void move(ObjectID id, Point from, Point to) { positions.remove(std::make_pair(from, id)); positions.insert(std::make_pair(to, id)); } RTree::const_iterator begin() const noexcept { return positions.begin(); } RTree::const_iterator end() const noexcept { return positions.end(); } }; Next, we define the state per object. This needs to be linked to the type so that deleting the object will remove it from the RTree. Since I plan to keep all types in an unordered_map and these guarantee that pointers are not invalidated when elements are added or removed, we can simply use that. using TypeMap = std::unordered_map<TypeID, TypeState>; using TypePointer = TypeMap::pointer; class ObjectState { TypePointer type; ObjectID id; Point position; public: ObjectState() noexcept : type(), id() {} ObjectState(TypePointer type, ObjectID id, Point position) : type(type), id(id), position(position) { type->second.add(id, position); } ObjectState(ObjectState&& o) noexcept : type(o.type), id(o.id), position(o.position) { o.type = nullptr; } ObjectState(const ObjectState&) = delete; ~ObjectState() { if(type) type->second.erase(id, position); } void swap(ObjectState& o) noexcept { using std::swap; swap(type, o.type); swap(id, o.id); swap(position, o.position); } ObjectState& operator=(ObjectState&& o) noexcept { ObjectState tmp = std::move(o); swap(tmp); return *this; } ObjectState& operator=(const ObjectState&) = delete; TypeID get_type() const noexcept { return type->first; } ObjectID get_id() const noexcept { return id; } Point get_position() const noexcept { return position; } /** * Changes position * * Do not call this directly! 
Call WorldState::move */ void move(Point to) { type->second.move(id, position, to); position = to; } }; Finally, we can put it all together. Since we may also want to query all objects regardless of type, we add a second R-tree for just that purpose. This is also the place where we define our spatial queries. There are a lot of possibilities, such as K nearest neighbours, or all points within a range. See Predicates (boost::geometry::index::) There are also iterative queries that don't need temporary storage but I haven't used those for simplicity. Be careful about modifying data structures while queries are running. class WorldState { using ObjectMap = std::unordered_map<ObjectID, ObjectState>; TypeMap types; ObjectMap objects; RTree positions; /* * Warning: TypeMap must come before ObjectMap because ObjectState * borrows pointers to TypeMap entries. Therefore destructor order matters */ public: void add(TypeID type, ObjectID object, Point pos) { TypeMap::iterator typeentry = types.emplace(std::piecewise_construct, std::forward_as_tuple(type), std::forward_as_tuple()).first; objects.emplace(std::piecewise_construct, std::forward_as_tuple(object), std::forward_as_tuple(&(*typeentry), object, pos)); positions.insert(std::make_pair(pos, object)); } void move(ObjectID object, Point newpos) { ObjectState& objectstate = objects.at(object); positions.remove(std::make_pair(objectstate.get_position(), object)); positions.insert(std::make_pair(newpos, object)); objectstate.move(newpos); } void erase(ObjectID object) { ObjectMap::iterator found = objects.find(object); positions.remove(std::make_pair(found->second.get_position(), object)); objects.erase(found); } /** * Calls functor for all objects * * Do not add or remove objects during the query! 
* * \param fun functor called with (ObjectID, const ObjectState&) */ template<class Functor> void for_all_objects(Functor fun) const { for(ObjectMap::const_reference entry: objects) fun(entry.first, entry.second); } /** * Calls functor for all objects of given type * * \see for_all_objects */ template<class Functor> void for_all_of_type(TypeID type, Functor fun) const { TypeMap::const_iterator foundtype = types.find(type); if(foundtype == types.cend()) return; for(const PointEntry& entry: foundtype->second) fun(entry.second, objects.find(entry.second)->second); } /** * Calls functor for the K nearest objects around the given object * * The object passed to the functor can be manipulated, removed, or other * objects inserted during the functor call. But do not erase other * objects! * * \param fun functor called with (ObjectID, ObjectState&) */ template<class Functor> void for_k_around_object( unsigned count, ObjectID object, Functor fun) { Point pos = objects.at(object).get_position(); std::vector<PointEntry> result_n; positions.query(bgi::nearest(pos, count + 1), std::back_inserter(result_n)); for(const PointEntry& entry: result_n) { ObjectID found = entry.second; if(entry.second != object) // exclude itself fun(found, objects.find(found)->second); } } /** * K nearest objects of specific type around the given object * * \see for_k_around_object */ template<class Functor> void for_k_of_type_around_object( unsigned count, TypeID type, ObjectID object, Functor fun) { TypeMap::const_iterator foundtype = types.find(type); if(foundtype == types.cend()) return; const ObjectState& objectstate = objects.at(object); if(objectstate.get_type() == type) count += 1; // self will be returned by query Point pos = objectstate.get_position(); std::vector<PointEntry> result_n; foundtype->second.positions.query( bgi::nearest(pos, count), std::back_inserter(result_n)); for(const PointEntry& entry: result_n) { ObjectID found = entry.second; if(entry.second != object) // exclude itself 
fun(found, objects.find(found)->second); } } }; } /* namespace game */
Clang Tool that extracts the lambda body given the lambda type
I'm implementing a Clang tool built on RecursiveASTVisitor (following this tutorial) that applies code transformations based on lambdas passed to a function. For example, it should generate something from the lambda given as an argument to foo:
foo([](){});
This case is easy: find every CallExpr whose callee is named foo, then find every LambdaExpr that is a descendant of that CallExpr:
struct LambdaExprVisitor : public RecursiveASTVisitor<LambdaExprVisitor> {
    bool VisitLambdaExpr(LambdaExpr * lambdaExpr) {
        //Do stuff
        lambdaExpr->getCallOperator()->getBody()->dump();
        return true;
    }
};
struct CallVisitor : public RecursiveASTVisitor<CallVisitor> {
    //Find call expressions based on the given name
    bool VisitCallExpr(CallExpr * expr) {
        auto * callee = expr->getDirectCallee();
        if(callee != nullptr && callee->getName() == "foo") {
            visitor.TraverseCallExpr(expr);
        }
        return true;
    }
    LambdaExprVisitor visitor;
};
The problem is that there are several ways to pass a lambda to foo, e.g.:
auto bar() { return [](){}; }
int main() { foo(bar()); }
Here the approach above does not find the body, because the LambdaExpr is not a descendant of the call to foo. Since a lambda's body is known at compile time, I thought it must be possible to infer the body from the Expr of the given parameter:
struct CallVisitor : public RecursiveASTVisitor<CallVisitor> {
    bool VisitCallExpr(CallExpr * expr) {
        auto * callee = expr->getDirectCallee();
        if(callee != nullptr && callee->getName() == "foo") {
            //Get the first argument which must be the lambda
            auto * arg = expr->getArg(0);
            //do something with the first argument
            //?
        }
        return true;
    }
};
Is there a way to get the lambda body at this point? If not, is there a way to infer the lambda body without implementing every possible way of passing a lambda to foo? Note: a matcher-based solution would also work for me.
A solution is to first traverse the translation unit and gather all lambdaExprs in a map that has their type as keys. Then, in a second traversal of the translation unit, it is possible to infer the lambda's body by the type. Here are the modified Visitors that now store a reference to this map (The type is encoded as its string representation): struct LambdaExprVisitor : public RecursiveASTVisitor<LambdaExprVisitor> { LambdaExprVisitor(std::map<std::string, LambdaExpr *>& id2Expr) : RecursiveASTVisitor<LambdaExprVisitor> {}, id2Expr { id2Expr } {} std::map<std::string, LambdaExpr *>& id2Expr; bool VisitLambdaExpr(LambdaExpr * lambdaExpr) { id2Expr.emplace( lambdaExpr->getType().getAsString(), lambdaExpr ); return true; } }; struct CallVisitor : public RecursiveASTVisitor<CallVisitor> { CallVisitor(std::map<std::string, LambdaExpr *>& id2Expr) : RecursiveASTVisitor<CallVisitor> {}, id2Expr { id2Expr } {} std::map<std::string, LambdaExpr *>& id2Expr; bool VisitCallExpr(CallExpr * expr) { auto * callee = expr->getDirectCallee(); if(callee != nullptr && callee->getName() == "foo") { //Get the expr from the map auto arg = expr->getArg(0)->getType().getAsString(); if(auto iter = id2Expr.find(arg); iter != id2Expr.end()) { //Do stuff with the lambdaExpr auto * lambdaExpr = iter->second; lambdaExpr->dump(); } }; return true; } }; The ASTConsumer to handle this just stores this map and executes the two visitors: struct Consumer : public ASTConsumer { public: virtual void HandleTranslationUnit(clang::ASTContext &Context) override { visitor.TraverseDecl(Context.getTranslationUnitDecl()); visitor2.TraverseDecl(Context.getTranslationUnitDecl()); } std::map<std::string, LambdaExpr *> id2Expr; LambdaExprVisitor visitor { id2Expr }; CallVisitor visitor2 { id2Expr }; };
How can I split "Segment" to two "Segments"? c++
Given the following classes:
#include <vector>
using std::vector;
enum TypeSeg { OPEN, CLOSE, CLOSE_LEFT, CLOSE_RIGHT };
template<class T>
class Segment {
    T left;
    T right;
    TypeSeg typeS;
public:
    Segment(T left, T right, TypeSeg typeS) : left(left), right(right), typeS(typeS) { }
    //.....
};
template<class T>
class SegCollection {
    vector<Segment<T>> segments;
public:
    //.....
};
A Segment describes the segment between left and right, where typeS says which endpoints are included:
if typeS == OPEN: (left, right)
if typeS == CLOSE: [left, right]
if typeS == CLOSE_LEFT: [left, right)
if typeS == CLOSE_RIGHT: (left, right]
A SegCollection describes a collection of segments with these rules: it never contains the same segment twice, it never contains two intersecting segments (it contains their union instead), and it never contains two adjacent segments such as [1,4) and [4,5) (it contains [1,5) instead).
How can I implement operator-() for SegCollection so that it removes a segment from the collection? The removed segment does not have to be an element of the SegCollection. Instead, every point that lies both in the removed segment and in some segment of the SegCollection must be removed from that segment. For example, given [1,7] , [9,12] , removing (2,5) gives [1,2] , [5,7] , [9,12].
I have thought about this for hours and still don't know how to handle the case where a segment has to be split by the removal (like [1,7] in the example, which becomes [1,2] , [5,7]).
Note: Segment is a class template because its endpoints can be of different types, e.g. (int, int) or (float, float).
This tip may help.
template<typename T>
class SegCollection {
    vector<Segment<T>> segments;
public:
    void push_back(Segment<T> seg) { segments.push_back(seg); }
    const SegCollection& operator-(const Segment<T> seg) {
        // implement some rules that recognize field
        // TypeSeg and depending on it, changing segments' content
        // ...
        return *this;
    }
};
Below is a very short implementation based on your description. It only handles the case where the removed segment is OPEN and lies completely inside one stored segment. I changed class to struct for access reasons (you can keep class, but then you need to provide getters for the private members).
#include <algorithm>
#include <iostream>
#include <vector>
using std::vector;
enum TypeSeg { OPEN, CLOSE, CLOSE_LEFT, CLOSE_RIGHT };
template<typename T>
struct Segment {
    T left;
    T right;
    TypeSeg typeS;
    Segment(T left, T right, TypeSeg typeS) : left(left), right(right), typeS(typeS) { }
};
template<typename T>
struct SegCollection {
    vector<Segment<T>> segments;
    void push_back(Segment<T> seg) { segments.push_back(seg); }
    const SegCollection& operator-(const Segment<T> seg) {
        //Find segment that could be divided
        auto it = std::find_if(segments.begin(), segments.end(), [&seg](const auto& i) {
            return i.left <= seg.left && seg.right <= i.right;
        });
        //If not found, return *this
        if (it == segments.end()) {
            std::cout << "Cannot do this, Bro." << std::endl;
            return *this;
        }
        //Set boundaries for new segments (use T, not int, so float segments work too)
        T new_right = seg.left;
        T new_left = seg.right;
        // Here you have to apply other conditions
        // I have used only one case - as you have mentioned in your post
        if (seg.typeS == OPEN) {
            Segment<T> seg_first(it->left, new_right, CLOSE);
            Segment<T> seg_second(new_left, it->right, CLOSE);
            *it = seg_second;
            this->segments.insert(it, seg_first);
        }
        return *this;
    }
};
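To cover the remaining endpoint combinations, one possible approach is to subtract the removed segment from each stored segment separately and keep whatever is left (zero, one or two pieces). The sketch below is only an illustration under some assumptions: Segment is a plain aggregate, the helper names closedLeft, closedRight, makeType, nonEmpty, subtractOne and subtractAll are made up for this example, and the removed segment is assumed to be non-empty.

```cpp
#include <cassert>
#include <vector>

enum TypeSeg { OPEN, CLOSE, CLOSE_LEFT, CLOSE_RIGHT };

template <typename T>
struct Segment {
    T left;
    T right;
    TypeSeg typeS;
};

// Hypothetical helpers that decode and encode which endpoints are included.
inline bool closedLeft(TypeSeg t)  { return t == CLOSE || t == CLOSE_LEFT; }
inline bool closedRight(TypeSeg t) { return t == CLOSE || t == CLOSE_RIGHT; }
inline TypeSeg makeType(bool left, bool right) {
    if (left) return right ? CLOSE : CLOSE_LEFT;
    return right ? CLOSE_RIGHT : OPEN;
}

// A segment holds at least one point if left < right, or if it is [x, x].
template <typename T>
bool nonEmpty(const Segment<T>& s) {
    return s.left < s.right || (s.left == s.right && s.typeS == CLOSE);
}

// Returns the 0, 1 or 2 pieces of `a` that remain after removing `b`.
template <typename T>
std::vector<Segment<T>> subtractOne(const Segment<T>& a, const Segment<T>& b) {
    assert(nonEmpty(b));  // removing an empty segment is not handled here
    std::vector<Segment<T>> out;

    // Piece of `a` left of `b`: it ends where `b` starts and keeps that
    // endpoint only if `b` does not contain it.
    Segment<T> lo = a;
    if (b.left < a.right) {
        lo.right = b.left;
        lo.typeS = makeType(closedLeft(a.typeS), !closedLeft(b.typeS));
    } else if (b.left == a.right) {
        lo.typeS = makeType(closedLeft(a.typeS),
                            closedRight(a.typeS) && !closedLeft(b.typeS));
    }
    if (nonEmpty(lo)) out.push_back(lo);

    // Piece of `a` right of `b`, by symmetry.
    Segment<T> hi = a;
    if (a.left < b.right) {
        hi.left = b.right;
        hi.typeS = makeType(!closedRight(b.typeS), closedRight(a.typeS));
    } else if (a.left == b.right) {
        hi.typeS = makeType(closedLeft(a.typeS) && !closedRight(b.typeS),
                            closedRight(a.typeS));
    }
    if (nonEmpty(hi)) out.push_back(hi);
    return out;
}

// Applies subtractOne to every segment of a collection, preserving order.
template <typename T>
std::vector<Segment<T>> subtractAll(const std::vector<Segment<T>>& segs,
                                    const Segment<T>& b) {
    std::vector<Segment<T>> out;
    for (const Segment<T>& s : segs)
        for (const Segment<T>& piece : subtractOne(s, b))
            out.push_back(piece);
    return out;
}
```

With the example from the question, subtractAll on [1,7] , [9,12] with (2,5) yields [1,2] , [5,7] , [9,12]. Because the stored segments are disjoint and non-adjacent, the pieces need no re-merging afterwards.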
How to delete an element from custom list type in c++?
I am trying to implement an adjacency list in C++. I want to write a function that deletes an edge from a vertex. Refer to the following code.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
class edge {
private:
    int destinationVertex; /*!< ::vertex.id of destination vertex in graph */
public:
    /**
     * Constructor for edge.
     */
    edge (int ver) : destinationVertex(ver) { }
    friend std::ostream& operator<<(std::ostream& a, edge const& e) { return a << e.destinationVertex; }
    /** @return value of ::destinationVertex */
    int getDestinationVertex() { return destinationVertex; }
    ~edge() { }
};
class graph;
class vertex {
    friend class graph;
    /** id of the vertex */
    int id;
    /** list of destinations */
    std::list<edge> list;
public:
    /**
     * Constructor that creates a new empty vertex.
     */
    vertex(int id) : id(id) { }
    /**
     * @brief Overloading for << operator.
     * @details friend function that overloads the << operator for vertex
     * class and defines a printing behaviour.
     */
    friend std::ostream& operator<<(std::ostream& s, vertex const& v) {
        s << v.id << "->";
        std::copy(v.list.begin(), v.list.end(), std::ostream_iterator<edge>(s, ","));
        return s;
    }
    /**
     * Linear search for a in list of edges of the vertex.
     * @param a value to search
     * @return true if element matches, else false
     */
    bool find(int a) {
        for(std::list<edge>::iterator it = list.begin(); it != list.end(); ++it) {
            if((*it).getDestinationVertex() == a)
                return true;
        }
        return false;
    }
    /**
     * Returns degree of a vertex.
     * @return number of edges in vertex
     */
    int deg() { return list.size(); }
    void removeEdge(const int id) {
        /// How do I use the remove function of list to delete elements?
        /// Is there any other way to write this function?
    }
};
See vertex.removeEdge(...). I have tried using list.remove(id); but it didn't work.
std::list::remove() removes all items that match a specified value. Your edge class can be constructed from an int value, but it has no comparison operators that std::list::remove() can use to compare edge objects for equality. You need to implement those operators, or else use std::list::remove_if() instead so you can do the comparisons using a predicate function/lambda. On the other hand, if the int value that is being passed to vertex::removeEdge() represents the same kind of value that is being passed to vertex::find(), then you could just use the same looping logic in removeEdge() that you already have in find(). Use std::list::erase() to remove an item by iterator. BTW, you might consider re-writing vertex::find() to use std::find_if() with a predicate, instead of using a manual loop.
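As an illustration of the remove_if() suggestion, here is a minimal sketch. It uses a trimmed-down version of the classes from the question; the member is renamed to edges, addEdge() is a hypothetical helper added only so the example can be exercised, and getDestinationVertex() is made const so it can be called from the predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

class edge {
    int destinationVertex;
public:
    explicit edge(int ver) : destinationVertex(ver) {}
    int getDestinationVertex() const { return destinationVertex; }
};

class vertex {
    std::list<edge> edges;  // called `list` in the question
public:
    // Hypothetical helper, not part of the original class.
    void addEdge(int destination) { edges.emplace_back(destination); }

    bool find(int destination) const {
        for (const edge& e : edges)
            if (e.getDestinationVertex() == destination) return true;
        return false;
    }

    // Removes every edge that points at `destination`.
    void removeEdge(int destination) {
        edges.remove_if([destination](const edge& e) {
            return e.getDestinationVertex() == destination;
        });
        assert(!find(destination));  // postcondition: no such edge is left
    }

    std::size_t deg() const { return edges.size(); }
};
```

The predicate does the comparison itself, so edge needs no operator==. Removing a destination that is not present simply leaves the list unchanged.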
C++ elegant way of holding different types of data in a member variable
If I have a class, and the type of its data may be int, float, double, char[], std::string, std::vector ... Now I'm using an enum to indicate which type the data is and a void* to dynamically allocate memory for the data. However, I'm sure there must be a much more elegant way. How to implement it without using boost?
Implement a "Variant" or "Any" type. As others have pointed out, there are existing implementations you can use, but you can write a simple version of your own if you don't want to use Boost or other alternatives. You need two structures: a base class, which is what you store, and a derived class template, which holds the actual object. Let's call them PlaceHolder and Holder.
This is the base structure:
/**
 * @brief The place holder structure.
 */
struct PlaceHolder
{
    /**
     * @brief Finalizes an instance of the PlaceHolder class.
     */
    virtual ~PlaceHolder() {}
    /**
     * @brief Gets the type of the underlying value.
     */
    virtual const std::type_info& getType() const = 0;
    /**
     * @brief Clones the holder.
     */
    virtual PlaceHolder * clone() const = 0;
};
And this is the derived class:
template<typename ValueType>
struct Holder: public PlaceHolder
{
    /**
     * @brief Initializes a new instance of the Holder class.
     *
     * @param value The value to be held.
     */
    Holder(const ValueType & value) : held(value) {}
    /**
     * @brief Gets the type of the underlying value.
     */
    virtual const std::type_info & getType() const { return typeid(ValueType); }
    /**
     * @brief Clones the holder.
     */
    virtual PlaceHolder * clone() const { return new Holder(held); }
    ValueType held;
};
Now we can do this:
PlaceHolder* any = new Holder<int>(3);
And we can get the value back from it like this:
int number = static_cast<Holder<int> *>(any)->held;
This is not very practical, so we create a class that handles all of this for us and adds some conveniences. Let's call it Any:
#include <typeinfo>
#include <utility>
/**
 * @brief This data type can be used to represent any other data type (for example, integer, floating-point,
 * single- and double-precision, user defined types, etc.).
 *
 * While the use of not explicitly declared variants such as this is not recommended, they can be of use when the needed
 * data type can only be known at runtime, when the data type is expected to vary, or when optional parameters
 * and parameter arrays are desired.
 */
class Any
{
public:
    /**
     * @brief Initializes a new instance of the Any class.
     */
    Any() : m_content(0) { }
    /**
     * @brief Initializes a new instance of the Any class.
     *
     * @param value The value to be held.
     */
    template<typename ValueType>
    Any(const ValueType & value) : m_content(new Holder<ValueType>(value)) { }
    /**
     * @brief Initializes a new instance of the Any class.
     *
     * @param other The Any object to copy.
     */
    Any(const Any & other) : m_content(other.m_content ? other.m_content->clone() : 0) { }
    /**
     * @brief Finalizes an instance of the Any class.
     */
    virtual ~Any() { delete m_content; }
    /**
     * @brief Exchange values of two objects.
     *
     * @param rhs The Any object to be swapped with.
     *
     * @return A reference to this.
     */
    Any& swap(Any & rhs) { std::swap(m_content, rhs.m_content); return *this; }
    /**
     * @brief The assignment operator.
     *
     * @param rhs The value to be assigned.
     *
     * @return A reference to this.
     */
    template<typename ValueType>
    Any& operator=(const ValueType & rhs) { Any(rhs).swap(*this); return *this; }
    /**
     * @brief The assignment operator.
     *
     * @param rhs The value to be assigned.
     *
     * @return A reference to this.
     */
    Any & operator=(const Any & rhs) { Any(rhs).swap(*this); return *this; }
    /**
     * @brief The () operator.
     *
     * @return The held value. Throws std::bad_cast if empty or of another type.
     */
    template<typename ValueType>
    ValueType operator()() const
    {
        if (!m_content || getType() != typeid(ValueType))
            throw std::bad_cast();
        return static_cast<Any::Holder<ValueType> *>(m_content)->held;
    }
    /**
     * @brief Gets the underlying value.
     *
     * @return The held value. Throws std::bad_cast if empty or of another type.
     */
    template<typename ValueType>
    ValueType get(void) const
    {
        if (!m_content || getType() != typeid(ValueType))
            throw std::bad_cast();
        return static_cast<Any::Holder<ValueType> *>(m_content)->held;
    }
    /**
     * @brief Tells whether the holder is empty or not.
     *
     * @return <tt>true</tt> if the holder is empty; otherwise <tt>false</tt>.
     */
    bool isEmpty() const { return !m_content; }
    /**
     * @brief Gets the type of the underlying value.
     */
    const std::type_info& getType() const { return m_content ? m_content->getType() : typeid(void); }
protected:
    /**
     * @brief The place holder structure.
     */
    struct PlaceHolder
    {
        /**
         * @brief Finalizes an instance of the PlaceHolder class.
         */
        virtual ~PlaceHolder() {}
        /**
         * @brief Gets the type of the underlying value.
         */
        virtual const std::type_info& getType() const = 0;
        /**
         * @brief Clones the holder.
         */
        virtual PlaceHolder * clone() const = 0;
    };
    template<typename ValueType>
    struct Holder: public PlaceHolder
    {
        /**
         * @brief Initializes a new instance of the Holder class.
         *
         * @param value The value to be held.
         */
        Holder(const ValueType & value) : held(value) {}
        /**
         * @brief Gets the type of the underlying value.
         */
        virtual const std::type_info & getType() const { return typeid(ValueType); }
        /**
         * @brief Clones the holder.
         */
        virtual PlaceHolder * clone() const { return new Holder(held); }
        ValueType held;
    };
protected:
    PlaceHolder* m_content;
};
This implementation is based on Ogre's Any. You can use it like this, for example (with <iostream> and <string> included):
int main()
{
    Any three = 3;
    int number = three.get<int>();
    std::cout << number << "\n";
    three = std::string("Three");
    std::string word = three.get<std::string>();
    std::cout << word << "\n";
    return 0;
}
output:
3
Three
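If a C++17 compiler is available, the standard library already offers std::any and std::variant, so no Boost is needed. This is a different technique from the hand-written Any above, shown only as a sketch: the set of alternatives in Value, the TypeName functor, the Item class and variantExample() are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// The closed set of types the member may hold (chosen for this example).
using Value = std::variant<int, double, std::string, std::vector<int>>;

// Hypothetical functor for std::visit: one overload per alternative.
struct TypeName {
    std::string operator()(int) const { return "int"; }
    std::string operator()(double) const { return "double"; }
    std::string operator()(const std::string&) const { return "string"; }
    std::string operator()(const std::vector<int>&) const { return "vector<int>"; }
};

class Item {
public:
    explicit Item(Value value) : data_(std::move(value)) {}

    // Dispatches on the alternative that is currently stored.
    std::string typeName() const { return std::visit(TypeName{}, data_); }

    // Returns a pointer to the stored T, or nullptr if another type is stored.
    template <typename T>
    const T* getIf() const { return std::get_if<T>(&data_); }

private:
    Value data_;  // replaces the enum tag plus void* pair
};

// Small usage example.
inline void variantExample() {
    Item number(Value{3});
    assert(number.typeName() == "int");
    assert(*number.getIf<int>() == 3);
    assert(number.getIf<double>() == nullptr);  // wrong type: no value returned

    Item word(Value{std::string("Three")});
    assert(word.typeName() == "string");
    assert(*word.getIf<std::string>() == "Three");
}
```

std::variant stores the value in place and tracks the active type itself, so there is no manual allocation or cast. It only fits when the list of possible types is known up front; for a truly open set, std::any plays the role of the Any class above.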
If there is a finite list of types, consider the visitor pattern. It is designed for cases where you have a small set of types but many algorithms that operate on the data. You often see it in 3D graphics scene graphs. It lets you effectively dynamic_cast a node to any type, but only requires a pair of virtual calls to do it, rather than a large number of dynamic_casts.
class Visitor;
class IntNode;
class FloatNode;
class Node {
public:
    virtual ~Node() {}
    virtual void accept(Visitor& inVisitor) = 0;
};
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void visit(IntNode& inNode) = 0;
    virtual void visit(FloatNode& inNode) = 0;
};
class IntNode : public Node {
public:
    explicit IntNode(int inValue = 0) : mValue(inValue) {}
    virtual void accept(Visitor& inVisitor) { inVisitor.visit(*this); }
    int& value() { return mValue; }
private:
    int mValue;
};
class FloatNode : public Node {
public:
    explicit FloatNode(float inValue = 0) : mValue(inValue) {}
    virtual void accept(Visitor& inVisitor) { inVisitor.visit(*this); }
    float& value() { return mValue; }
private:
    float mValue;
};
The idea is that you build an algorithm as a Visitor and pass that algorithm to the Nodes. Each Node type's accept function "knows" the type of the node, so it can call the visitor's function for that particular type. Now the visitor knows the type of the node and can do special processing for it.
As an example, consider copying a node, done first your initial way (OldNode stands for your enum plus void* layout) and then with the Visitor pattern:
OldNode* copyNodeOldWayWithEnums(OldNode* inNode) {
    switch(inNode->type) {
    case INT_TYPE: {
        int* oldValue = static_cast<int*>(inNode->value);
        OldNode* rval = new OldNode;
        rval->type = INT_TYPE;
        rval->value = new int(*oldValue);
        return rval;
    }
    case FLOAT_TYPE: {
        float* oldValue = static_cast<float*>(inNode->value);
        OldNode* rval = new OldNode;
        rval->type = FLOAT_TYPE;
        rval->value = new float(*oldValue);
        return rval;
    }
    default:
        throw std::runtime_error("Someone added a new type, but the copy algorithm didn't get updated");
    }
}
class CopyVisitor : public Visitor {
public:
    CopyVisitor() : mResult(0) {}
    virtual void visit(IntNode& inNode) {
        int value = inNode.value();
        mResult = new IntNode(value);
    }
    virtual void visit(FloatNode& inNode) {
        float value = inNode.value();
        mResult = new FloatNode(value);
    }
    Node* mResult;
};
Node* copyNode(Node* inNode) {
    CopyVisitor v;
    inNode->accept(v);
    return v.mResult;
}
Traits of the visitor pattern:
It is not immediately intuitive. Of the design patterns that appear in the Gang of Four's Design Patterns (the definitive book of object-oriented designs), it is by FAR the most difficult to understand. Yes, that is a disadvantage... but it can be worth it nonetheless.
It is very typesafe. There are no unreliable static_casts or expensive dynamic_casts.
Adding a type is very time consuming, so make sure you know the node types before writing a lot of visitors. However, if you do add a type, you immediately get compiler errors until all of your visitors are updated. The enum method you were using doesn't give you a compiler error; you have to wait for a runtime error, which is much harder to find.
The Visitor pattern is extremely efficient at handling tree structures. This is why 3D scene graphs use it. If you find yourself using things like std::vector<Node*>, you'll find this pattern is very effective.
The Node destructor is naturally virtual. This means you can do something like "delete mNode" and have it release memory safely. You don't have to put a switch in your destructor to figure out the real type behind the void* and delete it correctly.
It works best when there is a small number of node types and a large number of algorithms.
It does a very good job of aggregating algorithm code in one place (in a Visitor), instead of distributing it across the nodes.
Now, all of this assumes you have a small list of node types. Visitor is designed for roughly 5-20 types. If you want to store anything and everything in your node structure, boost::any is such a good solution that you should just take the time to install Boost and use it. You will not beat it.
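To show how a second algorithm slots in without touching the node classes, here is a small self-contained sketch of the same double-dispatch structure. It assumes C++14 for std::make_unique, and SumVisitor and visitorExample() are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class IntNode;
class FloatNode;

class Visitor {
public:
    virtual ~Visitor() = default;
    virtual void visit(IntNode& node) = 0;
    virtual void visit(FloatNode& node) = 0;
};

class Node {
public:
    virtual ~Node() = default;  // lets containers of Node pointers clean up safely
    virtual void accept(Visitor& visitor) = 0;
};

class IntNode : public Node {
public:
    explicit IntNode(int value) : mValue(value) {}
    // First virtual call picks the node type, second picks the algorithm.
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    int value() const { return mValue; }
private:
    int mValue;
};

class FloatNode : public Node {
public:
    explicit FloatNode(float value) : mValue(value) {}
    void accept(Visitor& visitor) override { visitor.visit(*this); }
    float value() const { return mValue; }
private:
    float mValue;
};

// Hypothetical algorithm: adds up all values and counts nodes per type.
class SumVisitor : public Visitor {
public:
    void visit(IntNode& node) override { total += node.value(); ++intCount; }
    void visit(FloatNode& node) override { total += node.value(); ++floatCount; }
    double total = 0.0;
    int intCount = 0;
    int floatCount = 0;
};

// Small usage example over a mixed container of nodes.
inline void visitorExample() {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.push_back(std::make_unique<IntNode>(2));
    nodes.push_back(std::make_unique<FloatNode>(1.5f));
    nodes.push_back(std::make_unique<IntNode>(3));

    SumVisitor sum;
    for (const auto& node : nodes) node->accept(sum);

    assert(sum.intCount == 2 && sum.floatCount == 1);
    assert(sum.total == 6.5);
}
```

Adding another algorithm means writing one more Visitor subclass. Adding another node type means adding a visit overload to Visitor, which makes every existing visitor fail to compile until it handles the new type.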