Vector of string shared memory map - c++

How do I append a string to a vector contained inside a map? The structure is map<float, vector<string>>, and the map lives in shared memory. In other words: if key == desired key, how do I append a string to that key's vector of strings?

Do you mean something like this:
#include <map>
#include <vector>
#include <string>
#include <iostream>
int main()
{
    std::map<float, std::vector<std::string>> m;
    m[.5f].emplace_back("First");
    m[.5f].emplace_back("Second");
    m[.0f].emplace_back("Hello");
    m[.0f].emplace_back("World");

    for(const auto& [key, value] : m)
    {
        std::cout << "Key: " << key << '\n';
        for(const auto& str : value)
            std::cout << '\t' << str << '\n';
    }
    std::cout.flush();
    return 0;
}

Doing this in shared memory is pretty hard, actually.
If you get all the allocators right, and add the locking, you'd usually get very clunky code that is hard to read due to all the allocator passing around.
You can, however, employ Boost's scoped allocator adaptor which will do a lot (lot) of magic that makes life better.
I think the following code sample just about nails the sweet spot.
Warning: this builds on years of experience trying to beat this into submission. If you fall just outside the boundary of the "magic" (mostly the in-place construction support due to uses_allocator<> and scoped_allocator_adaptor), you will find it breaks down and you'll be writing a lot of manual constructor/conversion calls to make things work.
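As a minimal, std-only sketch of that uses_allocator/scoped_allocator_adaptor machinery in isolation (no shared memory, no locking; the aliases here are just stand-ins for the Shared:: ones in the real sample below):
#include <memory>
#include <scoped_allocator>
#include <string>
#include <vector>
// Sketch only: the outer container's scoped allocator is propagated to the
// nested strings, so emplace_back can construct them in place from a char const*.
template <typename T>
using Alloc  = std::scoped_allocator_adaptor<std::allocator<T>>;
using String = std::basic_string<char, std::char_traits<char>, Alloc<char>>;
using Vector = std::vector<String, Alloc<String>>;
int main() {
    Vector v;
    v.emplace_back("hello"); // the inner String gets its allocator via uses-allocator construction
}
This is exactly the propagation that saves you from spelling out the allocator at every nesting level once the interprocess allocator is swapped in.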
Live On Coliru
#define DEMO
#include <iostream>
#include <iomanip>
#include <mutex>
#include <boost/interprocess/containers/map.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/managed_mapped_file.hpp> // For Coliru (doesn't support shared memory)
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/container/scoped_allocator.hpp>

namespace bip = boost::interprocess;
namespace bc = boost::container;

namespace Shared {
    using Segment = bip::managed_mapped_file; // Coliru doesn't support bip::managed_shared_memory

    template <typename T> using Alloc = bc::scoped_allocator_adaptor<bip::allocator<T, Segment::segment_manager> >;

    template <typename V>
    using Vector = bip::vector<V, Alloc<V> >;

    template <typename K, typename V, typename Cmp = std::less<K> >
    using Map = bip::map<K, V, Cmp, Alloc<std::pair<K const, V> > >;

    using String = bip::basic_string<char, std::char_traits<char>, Alloc<char> >;

    using Mutex = bip::interprocess_mutex;
}

namespace Lib {
    using namespace Shared;

    struct Data {
        using Map = Shared::Map<float, Shared::Vector<Shared::String> >;

        mutable Mutex _mx;
        Map _map;

        template <typename Alloc> Data(Alloc alloc = {}) : _map(alloc) {}

        bool append(float f, std::string s) {
            std::lock_guard<Mutex> lk(_mx); // lock

            auto it = _map.find(f);
            bool const exists = it != _map.end();
#ifndef DEMO
            if (exists) {
                it->second.emplace_back(s);
            }
#else
            // you didn't specify this, but lets insert new keys here, if
            // only for the demo
            _map[f].emplace_back(s);
#endif
            return exists;
        }

        size_t size() const {
            std::lock_guard<Mutex> lk(_mx); // lock
            return _map.size();
        }

        friend std::ostream& operator<<(std::ostream& os, Data const& data) {
            std::lock_guard<Mutex> lk(data._mx); // lock
            for (auto& [f,v] : data._map) {
                os << f << " ->";
                for (auto& ss : v) {
                    os << " " << std::quoted(std::string(ss));
                }
                os << "\n";
            }
            return os;
        }
    };
}

struct Program {
    Shared::Segment msm { bip::open_or_create, "data.bin", 10*1024 };
    Lib::Data& _data = *msm.find_or_construct<Lib::Data>("data")(msm.get_segment_manager());

    void report() const {
        std::cout << "Map contains " << _data.size() << " entries\n" << _data;
    }
};

struct Client : Program {
    void run(float f) {
        _data.append(f, "one");
        _data.append(f, "two");
    }
};

int main() {
    {
        Program server;
        server.report();

        Client().run(.5f);
        Client().run(.6f);
    }

    // report again
    Program().report();
}
First run would print:
Map contains 0 entries
Map contains 2 entries
0.5 -> "one" "two"
0.6 -> "one" "two"
A second run:
Map contains 2 entries
0.5 -> "one" "two"
0.6 -> "one" "two"
Map contains 2 entries
0.5 -> "one" "two" "one" "two"
0.6 -> "one" "two" "one" "two"


How to use u8_to_u32_iterator in Boost Spirit X3?

I am using Boost Spirit X3 to create a programming language, but when I try to support Unicode, I get an error!
Here is an example of a simplified version of that program.
#define BOOST_SPIRIT_X3_UNICODE
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
struct sample : x3::symbols<unsigned> {
    sample()
    {
        add("48", 10);
    }
};

int main()
{
    const std::string s("🌸");
    boost::u8_to_u32_iterator<std::string::const_iterator> first{cbegin(s)},
        last{cend(s)};
    x3::parse(first, last, sample{});
}
Live on wandbox
What should I do?
As you noticed, internally char_encoding::unicode employs char32_t.
So, first changing the symbols accordingly:
template <typename T>
using symbols = x3::symbols_parser<boost::spirit::char_encoding::unicode, T>;

struct sample : symbols<unsigned> {
    sample() { add(U"48", 10); }
};
Now the code fails calling into case_compare:
/home/sehe/custom/boost_1_78_0/boost/spirit/home/x3/string/detail/tst.hpp|74 col 33| error: no match for call to ‘(boost::spirit::x3::case_compare<boost::spirit::char_encoding::unicode>) (reference, char32_t&)’
As you can see it expects a char32_t reference, but u8_to_u32_iterator returns unsigned ints (std::uint32_t).
Just for comparison / sanity check: https://godbolt.org/z/1zozxq96W
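If you want to check that locally instead of on godbolt, a quick sanity check might look like this (assuming u8_to_u32_iterator's usual home in boost/regex/pending/unicode_iterator.hpp and its default second template parameter of boost::uint32_t):
#include <boost/regex/pending/unicode_iterator.hpp> // boost::u8_to_u32_iterator
#include <iterator>
#include <string>
#include <type_traits>
int main() {
    using Default  = boost::u8_to_u32_iterator<std::string::const_iterator>;
    using AsChar32 = boost::u8_to_u32_iterator<std::string::const_iterator, char32_t>;
    // the default co-domain type is boost::uint32_t, not char32_t...
    static_assert(std::is_same<std::iterator_traits<Default>::value_type, boost::uint32_t>::value, "");
    // ...but it can be overridden via the second template parameter
    static_assert(std::is_same<std::iterator_traits<AsChar32>::value_type, char32_t>::value, "");
}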
Luckily you can instruct the u8_to_u32_iterator to use another co-domain type:
Live On Compiler Explorer
#define BOOST_SPIRIT_X3_UNICODE
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;

template <typename T>
using symbols = x3::symbols_parser<boost::spirit::char_encoding::unicode, T>;

struct sample : symbols<unsigned> {
    sample() { add(U"48", 10)(U"🌸", 11); }
};

int main() {
    auto test = [](auto const& s) {
        boost::u8_to_u32_iterator<decltype(cbegin(s)), char32_t> first{
            cbegin(s)},
            last{cend(s)};
        unsigned parsed_value;
        if (x3::parse(first, last, sample{}, parsed_value)) {
            std::cout << s << " -> " << parsed_value << "\n";
        } else {
            std::cout << s << " FAIL\n";
        }
    };

    for (std::string s : {"🌸", "48", "🤷"})
        test(s);
}
Prints
🌸 -> 11
48 -> 10
🤷 FAIL

Detecting null pointers inside a vector of boost::variant

Following the question in Heterogenous vectors of pointers. How to call functions.
I would like to know how to identify null pointers inside a vector of boost::variant.
Example code:
#include <boost/variant.hpp>
#include <iostream> // for std::cout in A<T>::write()
#include <vector>

template <typename T>
class A
{
public:
    A(){}
    ~A(){}
    void write();
private:
    T data;
};

template <typename T>
void A<T>::write()
{
    std::cout << data << std::endl;
}

class myVisitor
    : public boost::static_visitor<>
{
public:
    template <typename T>
    void operator() (A<T>* a) const
    {
        a->write();
    }
};

int main()
{
    A<int> one;
    A<double> two;
    typedef boost::variant<A<int>*, A<double>* > registry;
    std::vector<registry> v;
    v.push_back(&one);
    v.push_back(&two);
    A<int>* tst = new A<int>;
    for(auto x: v)
    {
        boost::apply_visitor(myVisitor(), x);
        try {delete tst; tst = nullptr;}
        catch (...){}
    }
}
Since I am deleting the pointer, I would hope that the last one gives me an error or something. How can I check whether an entry in the vector is pointing to nullptr?
Note: this partly ignores the X/Y of this question, based on the tandem question (Heterogenous vectors of pointers. How to call functions)
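(As an aside, if all you need is the literal null check inside the variant, a generic visitor is enough. A minimal sketch, assuming C++14 generic lambdas and a reasonably recent Boost; B and C stand in for the question's A<int> and A<double>:)
#include <boost/variant.hpp>
#include <iostream>
#include <vector>
struct B {}; struct C {}; // stand-ins for A<int> / A<double>
int main() {
    using registry = boost::variant<B*, C*>;
    std::vector<registry> v{new B, static_cast<C*>(nullptr)};
    for (auto const& x : v) {
        // visit with a generic lambda; every alternative is a pointer type
        bool is_null = boost::apply_visitor([](auto const* p) { return p == nullptr; }, x);
        std::cout << std::boolalpha << is_null << "\n"; // false, then true
    }
    delete boost::get<B*>(v[0]);
}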
What you seem to be after is polymorphic collections, but not with a virtual type hierarchy.
This is known as type erasure, and Boost Type Erasure is conveniently wrapped for exactly this use case with Boost PolyCollection.
The type erased variation would probably look like any_collection:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/member.hpp>
namespace pc = boost::poly_collection;
BOOST_TYPE_ERASURE_MEMBER(has_write, write)
using writable = has_write<void()>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
void write() const { std::cout << data << std::endl; }
private:
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
w.write();
}
}
Prints
3.14159
314
Note that the insertion order is preserved, but iteration is done type-by-type. This is also what makes PolyCollection much more efficient than "regular" containers that do not optimize allocation sizes or use pointers.
BONUS: Natural printing operator<<
Using classical dynamic polymorphism, this would not work without adding virtual methods, but with Boost TypeErasure ostreamable is a ready-made concept:
Live On Coliru
#include <boost/variant.hpp>
#include <cmath>
#include <iostream>
#include <vector>
#include <boost/poly_collection/any_collection.hpp>
#include <boost/type_erasure/operators.hpp>
namespace pc = boost::poly_collection;
using writable = boost::type_erasure::ostreamable<>;
template <typename T> class A {
public:
A(T value = 0) : data(value) {}
// A() = default; // rule of zero
//~A() = default;
private:
friend std::ostream& operator<<(std::ostream& os, A const& a) {
return os << a.data;
}
T data/* = 0*/;
};
int main()
{
pc::any_collection<writable> registry;
A<int> one(314);
A<double> two(M_PI);
registry.insert(one);
registry.insert(two);
for (auto& w : registry) {
std::cout << w << "\n";
}
}
Printing the same as before.
UPDATE
To the comment:
I want to create n A<someType> variables (these are big objects). All of these variables have a write function to write something to a file.
My idea is to collect all the pointers of these variables and at the end loop through the vector to call each write function. Now, it might happen that I want to allocate memory and delete a A<someType> variable. If this happens it should not execute the write function.
This sounds like one of the rare occasions where shared_ptr makes sense, because it allows you to observe the object's lifetime using weak_ptr.
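Stripped of the object graph that follows, the lifetime observation itself is just this (minimal std-only sketch):
#include <iostream>
#include <memory>
int main() {
    auto object = std::make_shared<int>(42);
    std::weak_ptr<int> ref = object;      // non-owning observer
    if (auto alive = ref.lock())          // still alive: lock() yields a shared_ptr
        std::cout << *alive << "\n";
    object.reset();                       // last owner gone, the int is destroyed
    std::cout << std::boolalpha << ref.expired() << "\n"; // true: the observer noticed
}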
Object Graph Imagined...
Let's invent a node type that can participate in a pretty large object graph, such that you would keep an "index" of pointers to some of its nodes. For this demonstration, I'll make it a tree-structured graph, and we're going to keep References to the leaf nodes:
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
Now, let's add identification to the Node base so we have an arbitrary way to identify nodes to delete (e.g. all nodes with odd ids). In addition, any node can have child nodes, so let's put that in the base node as well:
struct INode {
    virtual void write(std::ostream& os) const = 0;
    std::vector<Object> children;
    size_t id() const { return _id; }
  private:
    size_t _id = s_idgen++;
};
Now we need some concrete derived node types:
template <typename> struct Node : INode {
    void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
Yeah. Imagination is not my strong suit ¯\_(ツ)_/¯
Generate Random Data
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);

Object generate_object_graph(Object node, unsigned max_depth = 10) {
    std::array<std::function<Object()>, 3> factories = {
        [] { return std::make_shared<Banana>(); },
        [] { return std::make_shared<Pear>(); },
        [] { return std::make_shared<Bicycle>(); },
    };

    for(auto n = s_num_children(s_prng); max_depth && n--;) {
        auto pick = factories.at(s_prng() % factories.size());
        node->children.push_back(generate_object_graph(pick(), max_depth - 1));
    }
    return node;
}
Nothing fancy. Just a randomly generated tree with a max_depth and random distribution of node types.
write to Pretty-Print
Let's add some logic to display any object graph with indentation:
// for demo output
#include <boost/core/demangle.hpp>
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
    os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
    if (not children.empty()) {
        for (auto& ch : children) {
            ch->write(os << linebreak << "- " << indent);
            os << unindent;
        }
        os << linebreak;
    }
    os << "}";
}
To keep track of the indentation level I'll define these indent/unindent
manipulators modifying some custom state inside the stream object:
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os)   { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
    return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
That should do.
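In isolation (repeating the three manipulators so the sketch is self-contained), the indentation state travels with the stream like this:
#include <iomanip>
#include <iostream>
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os)   { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
    return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
int main() {
    std::cout << "root {" << indent
              << linebreak << "child {" << indent
              << linebreak << "leaf"    << unindent
              << linebreak << "}"       << unindent
              << linebreak << "}\n";
}
which prints a nested block indented by three spaces per level.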
Getting Leaf Nodes
Leaf nodes are the nodes without any children.
This is a depth-first tree visitor taking any output iterator:
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
    if (tree) {
        if (tree->children.empty()) {
            *out++ = tree; // that's a leaf node!
        } else {
            for (auto& ch : tree->children) {
                get_leaf_nodes(ch, out);
            }
        }
    }
    return out;
}
Removing some nodes:
Yet another depth-first visitor:
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
    size_t n = 0;
    if (!tree)
        return n;
    auto& c = tree->children;
    // depth first
    for (auto& child : c)
        n += remove_nodes_if(child, predicate);
    auto e = std::remove_if(begin(c), end(c), predicate);
    n += std::distance(e, end(c));
    c.erase(e, end(c));
    return n;
}
DEMO TIME
Tying it all together, we can print a randomly generated graph:
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
This puts all its leaf node References in a container:
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
Which we can print using their write() methods:
std::cout << "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
Of course all the leafs are still alive. But we can change that! We will remove one in 5 nodes by id:
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
The reported number of leaf nodes would still seem the same. That's... not what you wanted. Here's where your question comes in: how do we detect the nodes that were deleted?
leafs.remove_if(std::mem_fn(&Reference::expired));
std::cout << "\nLive leafs: " << leafs.size();
Now the count will accurately reflect the number of leaf nodes remaining.
Live On Coliru
#include <memory>
#include <vector>
#include <ostream>
using Object = std::shared_ptr<struct INode>;
using Reference = std::weak_ptr<struct INode>;
static size_t s_idgen = 0;
struct INode {
virtual void write(std::ostream& os) const = 0;
std::vector<Object> children;
size_t id() const { return _id; }
private:
size_t _id = s_idgen++;
};
template <typename> struct Node : INode {
void write(std::ostream& os) const override;
};
using Root = Node<struct root_tag>;
using Banana = Node<struct banana_tag>;
using Pear = Node<struct pear_tag>;
using Bicycle = Node<struct bicycle_tag>;
// etc
// for demo output
#include <boost/core/demangle.hpp>
#include <iostream>
#include <iomanip>
static auto s_indent = std::ios::xalloc();
std::ostream& indent(std::ostream& os) { return os.iword(s_indent) += 3, os; }
std::ostream& unindent(std::ostream& os) { return os.iword(s_indent) -= 3, os; }
std::ostream& linebreak(std::ostream& os) {
return os << "\n" << std::setw(os.iword(s_indent)) << "";
}
template <typename Tag> void Node<Tag>::write(std::ostream& os) const {
os << boost::core::demangle(typeid(Tag*).name()) << "(id:" << id() << ") {";
if (not children.empty()) {
for (auto& ch : children) {
ch->write(os << linebreak << "- " << indent);
os << unindent;
}
os << linebreak;
}
os << "}";
}
// generating demo data
#include <random>
#include <functional>
#include <array>
static std::mt19937 s_prng{std::random_device{}()};
static std::uniform_int_distribution<size_t> s_num_children(0, 3);
Object generate_object_graph(Object node, unsigned max_depth = 10) {
std::array<std::function<Object()>, 3> factories = {
[] { return std::make_shared<Banana>(); },
[] { return std::make_shared<Pear>(); },
[] { return std::make_shared<Bicycle>(); },
};
for(auto n = s_num_children(s_prng); max_depth && n--;) {
auto pick = factories.at(s_prng() % factories.size());
node->children.push_back(generate_object_graph(pick(), max_depth - 1));
}
return node;
}
template <typename Out>
Out get_leaf_nodes(Object const& tree, Out out) {
if (tree) {
if (tree->children.empty()) {
*out++ = tree;
} else {
for (auto& ch : tree->children) {
get_leaf_nodes(ch, out);
}
}
}
return out;
}
template <typename Pred>
size_t remove_nodes_if(Object tree, Pred predicate)
{
size_t n = 0;
if (!tree)
return n;
auto& c = tree->children;
// depth first
for (auto& child : c)
n += remove_nodes_if(child, predicate);
auto e = std::remove_if(begin(c), end(c), predicate);
n += std::distance(e, end(c));
c.erase(e, end(c));
return n;
}
#include <list>
int main()
{
auto root = generate_object_graph(std::make_shared<Root>());
root->write(std::cout);
std::list<Reference> leafs;
get_leaf_nodes(root, back_inserter(leafs));
std::cout << "\n------------"
<< "\nLeafs: " << leafs.size();
for (Reference& ref : leafs)
if (Object alive = ref.lock())
alive->write(std::cout << " ");
auto _2mod5 = [](Object const& node) { return (2 == node->id() % 5); };
std::cout << "\nRemoved " << remove_nodes_if(root, _2mod5) << " 2mod5 nodes from graph\n";
std::cout << "\n(Stale?) Leafs: " << leafs.size();
// some of them are not alive, see which are gone ("detecting the null pointers")
leafs.remove_if(std::mem_fn(&Reference::expired));
std::cout << "\nLive leafs: " << leafs.size();
}
Prints e.g.
root_tag*(id:0) {
- bicycle_tag*(id:1) {}
- bicycle_tag*(id:2) {
- pear_tag*(id:3) {}
}
- bicycle_tag*(id:4) {
- bicycle_tag*(id:5) {}
- bicycle_tag*(id:6) {}
}
}
------------
Leafs: 4 bicycle_tag*(id:1) {} pear_tag*(id:3) {} bicycle_tag*(id:5) {} bicycle_tag*(id:6) {}
Removed 1 2mod5 nodes from graph
(Stale?) Leafs: 4
Live leafs: 3
Or see the COLIRU link for a much larger sample.

Insert an element into a range

What is the right implementation of Insert method in the code below?
#include <ranges>
#include <vector>
#include <set>

template <std::ranges::range Range>
class Processor
{
public:
    using T = std::ranges::range_value_t<Range>;
    void Insert(Range range, T val)
    {
        //add val into range
    }
};

int main()
{
    std::vector<int> v;
    Processor<std::vector<int>> p;
    p.Insert(v, 5);

    std::set<int> set;
    Processor<std::set<int>> p1;
    p1.Insert(set, 5);
    return 0;
}
Is it possible to insert to vector and set with the same code? (insertion into vector is probably push_back)
Is it possible to insert to vector and set with the same code? (insertion into vector is probably push_back)
If you want to call the same function for set and vector, you could try this solution using templates, constexpr if and std::same_as.
example:
#include <concepts> // std::same_as
#include <ios>
#include <iostream>
#include <set>
#include <type_traits>
#include <vector>

template <typename Range, typename T>
void
insertInSetOrVector (Range &range, T const &t)
{
  if constexpr (std::same_as<std::vector<T>, Range>)
    {
      range.push_back (t);
    }
  else if constexpr (std::same_as<std::set<T>, Range>)
    {
      range.insert (t);
    }
}

int
main ()
{
  auto vec = std::vector<int>{};
  insertInSetOrVector (vec, 42);

  auto set = std::set<int>{};
  insertInSetOrVector (set, 42);

  std::cout << "vector: " << vec.at (0) << std::endl;
  std::cout << std::boolalpha << "set has 42: " << (set.count (42) == 1) << std::endl;
  return 0;
}
which prints:
vector: 42
set has 42: true
edit: thanks for the suggestion; (set.count (42) == 1) is more accurate.
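If you'd rather not enumerate the container types at all, a more generic sketch is to probe for push_back with a requires-expression and fall back to insert (C++20; insertGeneric is just an illustrative name):
#include <iostream>
#include <list>
#include <set>
#include <vector>
// use push_back when the container has it, insert otherwise
template <typename Range, typename T>
void insertGeneric(Range& range, T const& t)
{
    if constexpr (requires { range.push_back(t); })
        range.push_back(t);
    else
        range.insert(t);
}
int main()
{
    std::vector<int> vec;
    std::list<int> lst;
    std::set<int> set;
    insertGeneric(vec, 42);
    insertGeneric(lst, 42);
    insertGeneric(set, 42);
    std::cout << vec.front() << " " << lst.front() << " " << set.count(42) << "\n"; // 42 42 1
}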

How to find most commonly occurring non-unique keys in Boost MultiIndex?

Boost MultiIndex Container, when defined to have hashed_non_unique keys, can group equivalent keys together and return them all against an equal_range query, as mentioned here. But I see no way of querying the largest range (or n largest ranges) in a set. Without comparing between the range sizes of distinct hashes, which can become computationally very expensive, is there a way to query the largest equal ranges?
If we consider a simple example, such as this one, I would like to query by frequency and get Tom as the first result, and then Jack and Leo in no particular order.
Ok, if you're using non-unique hashed indices, it turns out equal_range does not invoke equality comparison for all the elements in the returned range (unlike common implementations of std::unordered_multimap, BTW), so the following can be very efficient:
template<typename HashIndex>
std::multimap<
  std::size_t,
  std::reference_wrapper<const typename HashIndex::value_type>,
  std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
  decltype(group_sizes(i)) res;
  for(auto it=i.begin(),end=i.end();it!=end;){
    auto next=i.equal_range(*it).second;
    res.emplace((std::size_t)std::distance(it,next),*it);
    it=next;
  }
  return res;
}
To check how efficient this actually is, let's try instrumenting the element type:
Live Coliru Demo
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <cstring>
#include <functional>
#include <iostream>
#include <string>
#include <tuple>
#include <map>
template<typename HashIndex>
std::multimap<
std::size_t,
std::reference_wrapper<const typename HashIndex::value_type>,
std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
decltype(group_sizes(i)) res;
for(auto it=i.begin(),end=i.end();it!=end;){
auto next=i.equal_range(*it).second;
res.emplace((std::size_t)std::distance(it,next),*it);
it=next;
}
return res;
}
struct instrumented_string:std::string
{
using std::string::string;
static void reset_nums()
{
num_hashes=0;
num_eqs=0;
}
static std::size_t num_hashes,num_eqs;
};
std::size_t instrumented_string::num_hashes=0;
std::size_t instrumented_string::num_eqs=0;
bool operator==(const instrumented_string& x,const instrumented_string& y)
{
++instrumented_string::num_eqs;
return static_cast<std::string>(x)==y;
}
std::size_t hash_value(const instrumented_string& x)
{
++instrumented_string::num_hashes;
return boost::hash<std::string>{}(x);
}
using namespace boost::multi_index;
using container=multi_index_container<
instrumented_string,
indexed_by<
hashed_non_unique<identity<instrumented_string>>
>
>;
int main()
{
auto values={"Tom","Jack","Leo","Bjarne","Subhamoy"};
container c;
for(auto& v:values){
for(auto i=100*std::strlen(v);i--;)c.insert(v);
}
instrumented_string::reset_nums();
auto gs=group_sizes(c);
for(const auto& g:gs){
std::cout<<g.first<<": "<<g.second.get()<<"\n";
}
std::cout<<"# hashes: "<<instrumented_string::num_hashes<<"\n";
std::cout<<"# eqs: "<<instrumented_string::num_eqs<<"\n";
}
Output
800: Subhamoy
600: Bjarne
400: Jack
300: Tom
300: Leo
# hashes: 5
# eqs: 5
So, for a container with 2,400 elements, invoking group_sizes has resulted in just 5 hash calculations and 5 equality comparisons (plus ~2,400 iterator increments, of course).
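Since the question asks for the n largest ranges: the multimap returned by group_sizes is already ordered by descending group size, so taking the top n is just walking the first n entries. A small std-only sketch, using the sizes from the output above:
#include <functional>
#include <iostream>
#include <map>
#include <string>
int main() {
    // same shape as the result of group_sizes(): size -> key, descending by size
    std::multimap<std::size_t, std::string, std::greater<std::size_t>> gs{
        {800, "Subhamoy"}, {600, "Bjarne"}, {400, "Jack"}, {300, "Tom"}, {300, "Leo"}};
    std::size_t n = 2; // the two most frequent keys
    for (auto it = gs.begin(); it != gs.end() && n--; ++it)
        std::cout << it->second << ": " << it->first << "\n"; // Subhamoy: 800, then Bjarne: 600
}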
If you really want to get rid of hashes, the following can do:
Live Coliru Demo
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <cstring>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <map>
template<typename HashIndex>
struct internal_reference
{
const HashIndex& i;
const typename HashIndex::value_type& r;
std::size_t buc;
};
template<typename HashIndex>
struct internal_reference_equal_to
{
bool operator()(
const typename HashIndex::value_type& x,
const internal_reference<HashIndex>& y)const
{
return
std::addressof(x)==std::addressof(y.r)||
y.i.key_eq()(y.i.key_extractor()(x),y.i.key_extractor()(y.r));
}
bool operator()(
const internal_reference<HashIndex>& x,
const typename HashIndex::value_type& y)const
{
return (*this)(y,x);
}
};
template<typename HashIndex>
struct internal_reference_hash
{
std::size_t operator()(const internal_reference<HashIndex>& x)const
{
return x.buc;
}
};
template<typename HashIndex>
std::multimap<
std::size_t,
std::reference_wrapper<const typename HashIndex::value_type>,
std::greater<std::size_t>
> group_sizes(const HashIndex& i)
{
decltype(group_sizes(i)) res;
for(std::size_t buc=0,buc_count=i.bucket_count();buc<buc_count;++buc){
for(auto it=i.begin(buc),end=i.end(buc);it!=end;){
auto p=i.equal_range(
internal_reference<HashIndex>{i,*it,buc},
internal_reference_hash<HashIndex>{},
internal_reference_equal_to<HashIndex>{});
std::size_t dist=0;
auto next=it;
while(p.first!=p.second){
++p.first;
++dist;
++next;
}
res.emplace(dist,*it);
it=next;
}
}
return res;
}
struct instrumented_string:std::string
{
using std::string::string;
static void reset_nums()
{
num_hashes=0;
num_eqs=0;
}
static std::size_t num_hashes,num_eqs;
};
std::size_t instrumented_string::num_hashes=0;
std::size_t instrumented_string::num_eqs=0;
bool operator==(const instrumented_string& x,const instrumented_string& y)
{
++instrumented_string::num_eqs;
return static_cast<std::string>(x)==y;
}
std::size_t hash_value(const instrumented_string& x)
{
++instrumented_string::num_hashes;
return boost::hash<std::string>{}(x);
}
using namespace boost::multi_index;
using container=multi_index_container<
instrumented_string,
indexed_by<
hashed_non_unique<identity<instrumented_string>>
>
>;
int main()
{
auto values={"Tom","Jack","Leo","Bjarne","Subhamoy"};
container c;
for(auto& v:values){
for(auto i=100*std::strlen(v);i--;)c.insert(v);
}
instrumented_string::reset_nums();
auto gs=group_sizes(c);
for(const auto& g:gs){
std::cout<<g.first<<": "<<g.second.get()<<"\n";
}
std::cout<<"# hashes: "<<instrumented_string::num_hashes<<"\n";
std::cout<<"# eqs: "<<instrumented_string::num_eqs<<"\n";
}
Output
800: Subhamoy
600: Bjarne
400: Jack
300: Tom
300: Leo
# hashes: 0
# eqs: 0
But please bear in mind this version of group_sizes exploits the undocumented fact that elements with hash value h get placed in the bucket h%bucket_count() (or, put another way, internal_reference<HashIndex> hashing is technically not a conformant compatible extension of the index hash function).
It seems like you might be better served with a std::map<K, std::vector<V> >-like interface here.
You would still always have to do the counting.
To have the counting done "magically" you might consider making the "bucket key" a refcounting type.
This would be more magical than I'd be comfortable with for my code-bases. In particular, copied elements could easily cause overcounting.
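For reference, the map-of-vectors shape would look roughly like this minimal std-only sketch (the counting is still explicit, via vector::size()):
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>
struct Person { int id; std::string name; };
int main() {
    // "bucket key" -> all elements sharing it; the group size is just the vector size
    std::map<std::string, std::vector<Person>> byName;
    std::vector<Person> people{{1, "Tom"}, {2, "Jack"}, {3, "Tom"}, {4, "Leo"}};
    for (Person const& p : people)
        byName[p.name].push_back(p);
    auto largest = std::max_element(
        byName.begin(), byName.end(),
        [](auto const& a, auto const& b) { return a.second.size() < b.second.size(); });
    std::cout << largest->first << " occurs " << largest->second.size() << " times\n"; // Tom occurs 2 times
}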
Approach 1: BMI + RangeV3 for syntactic sugar
Warning: I consider this "advanced", as in the learning curve might be steepish. However, when you wield Ranges with ease, this can become a great productivity boost.
Note also, this does not in any way promise to increase performance. But you should note that no elements are copied, the vector (groups) merely contains subranges, which are iterator ranges into the multi-index container.
Live On Compiler Explorer
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index_container.hpp>
#include <iostream>
#include <iomanip>
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace bmi = boost::multi_index;
namespace vw = ranges::views;
namespace act = ranges::actions;
struct Person {
int m_id;
std::string m_name;
friend std::ostream& operator<<(std::ostream& os, Person const& p) {
return os << "[" << p.m_id << ", " << std::quoted(p.m_name) << "]";
}
};
typedef bmi::multi_index_container<
Person,
bmi::indexed_by<
bmi::ordered_unique<bmi::member<Person, int, &Person::m_id>>,
bmi::ordered_unique<
bmi::tag<struct by_name_id>,
bmi::composite_key<Person,
bmi::member<Person, std::string, &Person::m_name>,
bmi::member<Person, int, &Person::m_id>>
>
> >
Roster;
template <typename Index, typename KeyExtractor>
std::size_t distinct(const Index& i, KeyExtractor key) {
std::size_t res = 0;
for (auto it = i.begin(), it_end = i.end(); it != it_end;) {
++res;
it = i.upper_bound(key(*it));
}
return res;
}
int main() {
Roster const r {
{1, "Tom"},
{2, "Jack"},
{3, "Tom"},
{4, "Leo"}
};
fmt::print("Roster: {}\n", r);
static constexpr auto eq_ = std::equal_to<>{};
static constexpr auto name_ = std::mem_fn(&Person::m_name);
static constexpr auto size_ = [](auto const& r) constexpr { return std::distance(begin(r), end(r)); };
auto& idx = r.get<by_name_id>();
fmt::print("Distinct: {}, Index: {}\n", distinct(idx, name_), idx);
auto by_name_ = vw::group_by([](auto const&... arg) { return eq_(name_(arg)...); });
auto by_size_ = [](auto const&... subrange) { return (size_(subrange) > ...); };
auto groups = idx | by_name_ | ranges::to_vector;
for (auto&& x : groups |= act::sort(by_size_)) {
fmt::print("#{} persons in group {}: {}\n",
size_(x),
name_(ranges::front(x)),
x);
}
}
Prints:
Roster: {[1, "Tom"], [2, "Jack"], [3, "Tom"], [4, "Leo"]}
Distinct: 3, Index: {[2, "Jack"], [4, "Leo"], [1, "Tom"], [3, "Tom"]}
#2 persons in group Tom: {[1, "Tom"], [3, "Tom"]}
#1 persons in group Jack: {[2, "Jack"]}
#1 persons in group Leo: {[4, "Leo"]}
Note, I merely kept the distinct() function from the original link. You could drop it to remove some noise.
Approach 2: The same, but w/o Boost
Multi-index seems to be supplying nothing more than the ordered container now, so let's simplify:
Live On Compiler Explorer
#include <set>
#include <iostream>
#include <iomanip>
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace vw = ranges::views;
namespace act = ranges::actions;
struct Person {
int m_id;
std::string m_name;
friend std::ostream& operator<<(std::ostream& os, Person const& p) {
return os << "[" << p.m_id << ", " << std::quoted(p.m_name) << "]";
}
bool operator<(Person const& o) const { return m_name < o.m_name; }
};
int main() {
std::multiset<Person> const r {
{1, "Tom"},
{2, "Jack"},
{3, "Tom"},
{4, "Leo"}
};
fmt::print("Roster: {}\n", r);
static constexpr auto eq_ = std::equal_to<>{};
static constexpr auto name_ = std::mem_fn(&Person::m_name);
static constexpr auto size_ = [](auto const& r) constexpr { return std::distance(begin(r), end(r)); };
auto by_name_ = vw::group_by([](auto const&... arg) { return eq_(name_(arg)...); });
auto by_size_ = [](auto const&... subrange) { return (size_(subrange) > ...); };
auto groups = r | by_name_ | ranges::to_vector;
for (auto&& x : groups |= act::sort(by_size_)) {
fmt::print("#{} persons in group {}: {}\n",
size_(x),
name_(ranges::front(x)),
x);
}
}
Prints
Roster: {[2, "Jack"], [4, "Leo"], [1, "Tom"], [3, "Tom"]}
#2 persons in group Tom: {[1, "Tom"], [3, "Tom"]}
#1 persons in group Jack: {[2, "Jack"]}
#1 persons in group Leo: {[4, "Leo"]}
Bonus: Slightly more simplified assuming equality operator on Person suffices: https://godbolt.org/z/58xsTK

Hash an arbitrary precision value (boost::multiprecision::cpp_int)

I need to get the hash of a value with arbitrary precision (from Boost.Multiprecision); I use the cpp_int backend. I came up with the following code:
boost::multiprecision::cpp_int x0 = 1;
const auto seed = std::hash<std::string>{}(x0.str());
I don't need the code to be as fast as possible, but I find it very clumsy to hash the string representation.
So my question is twofold:
Keeping the arbitrary precision, can I hash the value more efficiently?
Maybe I should not insist on keeping the arbitrary precision and should instead convert to a double, which I could hash easily (I would still, however, do the comparisons needed by the hash table using the arbitrary-precision value)?
You can (ab)use the serialization support:
Support for serialization comes in two forms:
Classes number, debug_adaptor, logged_adaptor and rational_adaptor have "pass through" serialization support which requires the underlying backend to be serializable.
Backends cpp_int, cpp_bin_float, cpp_dec_float and float128 have full support for Boost.Serialization.
So, let me cobble something together that works with boost and std unordered containers:
template <typename Map>
void test(Map const& map) {
    std::cout << "\n" << __PRETTY_FUNCTION__ << "\n";
    for(auto& p : map)
        std::cout << p.second << "\t" << p.first << "\n";
}

int main() {
    using boost::multiprecision::cpp_int;

    test(std::unordered_map<cpp_int, std::string> {
        { cpp_int(1) << 111, "one" },
        { cpp_int(2) << 222, "two" },
        { cpp_int(3) << 333, "three" },
    });

    test(boost::unordered_map<cpp_int, std::string> {
        { cpp_int(1) << 111, "one" },
        { cpp_int(2) << 222, "two" },
        { cpp_int(3) << 333, "three" },
    });
}
Let's forward the relevant hash<> implementations to our own hash_impl specialization that uses Multiprecision and Serialization:
namespace std {
    template <typename backend>
    struct hash<boost::multiprecision::number<backend> >
        : mp_hashing::hash_impl<boost::multiprecision::number<backend> >
    {};
}

namespace boost {
    template <typename backend>
    struct hash<multiprecision::number<backend> >
        : mp_hashing::hash_impl<multiprecision::number<backend> >
    {};
}
Now, of course, this begs the question, how is hash_impl implemented?
template <typename T> struct hash_impl {
    size_t operator()(T const& v) const {
        using namespace boost;
        size_t seed = 0;
        {
            iostreams::stream<hash_sink> os(seed);
            archive::binary_oarchive oa(os, archive::no_header | archive::no_codecvt);
            oa << v;
        }
        return seed;
    }
};
This looks pretty simple. That's because Boost is awesome, and writing a hash_sink device for use with Boost Iostreams is just the following straightforward exercise:
namespace io = boost::iostreams;

struct hash_sink {
    hash_sink(size_t& seed_ref) : _ptr(&seed_ref) {}

    typedef char char_type;
    typedef io::sink_tag category;

    std::streamsize write(const char* s, std::streamsize n) {
        boost::hash_combine(*_ptr, boost::hash_range(s, s+n));
        return n;
    }
  private:
    size_t* _ptr;
};
Full Demo:
Live On Coliru
#include <iostream>
#include <iomanip>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_int/serialize.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/functional/hash.hpp>
namespace mp_hashing {
namespace io = boost::iostreams;
struct hash_sink {
hash_sink(size_t& seed_ref) : _ptr(&seed_ref) {}
typedef char char_type;
typedef io::sink_tag category;
std::streamsize write(const char* s, std::streamsize n) {
boost::hash_combine(*_ptr, boost::hash_range(s, s+n));
return n;
}
private:
size_t* _ptr;
};
template <typename T> struct hash_impl {
size_t operator()(T const& v) const {
using namespace boost;
size_t seed = 0;
{
iostreams::stream<hash_sink> os(seed);
archive::binary_oarchive oa(os, archive::no_header | archive::no_codecvt);
oa << v;
}
return seed;
}
};
}
#include <unordered_map>
#include <boost/unordered_map.hpp>
namespace std {
template <typename backend>
struct hash<boost::multiprecision::number<backend> >
: mp_hashing::hash_impl<boost::multiprecision::number<backend> >
{};
}
namespace boost {
template <typename backend>
struct hash<multiprecision::number<backend> >
: mp_hashing::hash_impl<multiprecision::number<backend> >
{};
}
template <typename Map>
void test(Map const& map) {
std::cout << "\n" << __PRETTY_FUNCTION__ << "\n";
for(auto& p : map)
std::cout << p.second << "\t" << p.first << "\n";
}
int main() {
using boost::multiprecision::cpp_int;
test(std::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});
test(boost::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});
}
Prints
void test(const Map&) [with Map = std::unordered_map<boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<> >, std::basic_string<char> >]
one 2596148429267413814265248164610048
three 52494017394792286184940053450822912768476066341437098474218494553838871980785022157364316248553291776
two 13479973333575319897333507543509815336818572211270286240551805124608
void test(const Map&) [with Map = boost::unordered::unordered_map<boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<> >, std::basic_string<char> >]
three 52494017394792286184940053450822912768476066341437098474218494553838871980785022157364316248553291776
two 13479973333575319897333507543509815336818572211270286240551805124608
one 2596148429267413814265248164610048
As you can see, the difference in implementation between Boost's and the standard library's unordered_map shows up in the different orderings for identical hashes.
Just to say that I've just added native hashing support (for Boost.Hash and std::hash) to git develop. It works for all the number types including those from GMP etc. Unfortunately that code won't be released until Boost-1.62 now.
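For completeness, the direct limb access mentioned there looks roughly like the following sketch (assuming the cpp_int backend's limbs()/size() accessors and Boost.Hash; see the linked docs for the authoritative version):
#include <boost/functional/hash.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <iostream>
// Sketch: hash the raw limbs plus the sign instead of serializing or stringifying.
std::size_t limb_hash(boost::multiprecision::cpp_int const& v) {
    std::size_t seed = boost::hash_range(v.backend().limbs(),
                                         v.backend().limbs() + v.backend().size());
    boost::hash_combine(seed, v.sign());
    return seed;
}
int main() {
    boost::multiprecision::cpp_int x = 1;
    x <<= 111;
    std::cout << limb_hash(x) << "\n";
}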
The answer above that (ab)uses serialization support, is actually extremely cool and really rather clever ;) However, it wouldn't work if you wanted to use a vector-based hasher like CityHash, I added an example of using that by accessing the limbs directly to the docs: https://htmlpreview.github.io/?https://github.com/boostorg/multiprecision/blob/develop/doc/html/boost_multiprecision/tut/hash.html Either direct limb-access or the serialization tip will work with all previous releases of course.