Modify internal structure when implementing std::hash<T> - c++

I'm writing a custom OrderedTree class I want to use as a key to an unordered_set.
I want to do a couple things when hashing the Tree:
calculate the hash lazily and cache it as needed (since this may be an expensive operation),
maybe balance the tree.
Neither of these operations change the semantic equality or hash value of the object, but they do modify some private fields.
Unfortunately, trying to modify any members in OrderedTree while inside std::hash<Tree>::operator() seems to violate const correctness that unordered_set expects.
Can I use my OrderedTree with unordered_set? If so, how?
EDIT:
As per request in the comments, minimal proof of concept:
#include <unordered_set>
std::size_t hash_combine(std::size_t a, std::size_t b) {
// TODO: Copy from boost source or something
return 0;
}
struct Node {
int value;
Node *left, *right, *parent;
std::size_t hash(std::size_t seed) const {
if (left != nullptr)
seed = left->hash(seed);
std::hash<int> hasher;
seed = hash_combine(seed, hasher(value));
if (right != nullptr)
seed = right->hash(seed);
return seed;
}
};
struct Tree {
Tree(): hash_(0), root(nullptr) {}
Node *root;
std::size_t hash() const {
if (hash_ == 0 && root != nullptr) {
hash_ = root->hash(7);
}
return hash_;
}
private:
std::size_t hash_;
};
namespace std {
template<>
struct hash<Tree> {
std::size_t operator()(const Tree& t) const {
return t.hash();
}
};
}
int main() {
std::unordered_set<Tree> set;
}
When I try to compile I get:
Sample.cc:31:13: error: cannot assign to non-static data member within const member function 'hash'
hash_ = root->hash(7);
~~~~~ ^
Sample.cc:29:15: note: member function 'Tree::hash' is declared const here
std::size_t hash() const {
~~~~~~~~~~~~^~~~~~~~~~~~

There is a guarantee that std containers will only call const members when doing const or logically const operations. If those const operations are multiple-reader safe, then so is the container; conversely, if they are not, neither is the container.
The immutability of the hash value and equality (or < on ordered containers) are the only things you need to guarantee in a key type in an associative container. Actual const gives the above multiple-reader guarantee, which can be quite useful. What's more, violating it costs you that guarantee in the future, and/or invites subtle bugs when someone does presume const means immutable.
You could carefully synchronize the write operation internally to keep the multiple-reader guarantee, or you can give it up.
To violate const, typically you use mutable. A const method that uses casting to bypass const risks Undefined Behaviour if the object was actually const, and not just a const view of a non-const object.
In general, be careful before using this kind of optimization; it can easily increase code complexity (hence bugs, maintenance, etc.) more than it adds speed. And speeding up code is fungible: make sure you have identified this code as slow, and this part as a bottleneck, before investing in it. And if you are going to balance in hash, why wait for hash? Balance before insert!
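To make the mutable route concrete, here is a minimal sketch. It assumes a simplified key type (CachedKey, a made-up name holding a vector of ints instead of a real tree) and uses a Boost-style combine step written out inline. The cache lives in mutable members, and a separate validity flag is used so that a genuine hash of 0 is cached as well. Note that this version is not synchronized, so it gives up the multiple-reader guarantee discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tree: the values are the logical state.
class CachedKey {
    std::vector<int> values_;
    mutable std::size_t hash_ = 0;     // cache only, not part of the logical state
    mutable bool hash_valid_ = false;  // separate flag, so a real hash of 0 is cached too
public:
    explicit CachedKey(std::vector<int> values) : values_(std::move(values)) {}

    // Logically const: the result never changes, only the cache is filled in.
    std::size_t hash() const {
        if (!hash_valid_) {
            std::size_t seed = 7;
            for (int v : values_) {
                // Boost-style hash_combine step, written inline for this sketch.
                seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
            }
            hash_ = seed;
            hash_valid_ = true;
        }
        return hash_;
    }

    bool operator==(const CachedKey& other) const { return values_ == other.values_; }
};

namespace std {
template <>
struct hash<CachedKey> {
    std::size_t operator()(const CachedKey& k) const { return k.hash(); }
};
}
```

With this in place, std::unordered_set<CachedKey> compiles and calls hash() through a const reference without complaint.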

Related

Copyable C++ coroutine with data

I have written a forward iterator that iterates over the nodes of a graph in order of a (preorder/postorder/inorder) DFS spanning tree. Since it is quite complicated compared to writing a simple DFS and calling a callback for each encountered node, I thought I could use C++20 coroutines to simplify the code of the iterator.
However, C++20 coroutines are not copyable (much less so if they are stateful) but iterators should better be copyable!
Is there any way I could still use some coroutine-like code for my iterator?
Note: I want all iterators to iterate independently from each other. If I copy an iterator and then call ++ on it, then only the iterator should have advanced, but not its copy.
I figured that, with the Miro Knejp "goto-hack", something resembling copyable co-routines is possible as follows (this toy example just counts "+1" "*2" until a certain value but it illustrates the point).
(1) this is just a simple wrapper for the actual function
template<class Func, class Data>
struct CopyableCoroutine {
Data data;
Func func;
CopyableCoroutine(const Data& _data, const Func& _func): data(_data), func(_func) {}
bool done() const { return data.done(); }
template<class... Args> decltype(auto) operator()(Args&&... args) {
return func(data, std::forward<Args>(args)...);
}
};
(2) this is where the magic happens
struct my_stack {
int n;
int next_label_idx = 0;
bool done() const { return n > 30; }
};
int wierd_count(my_stack& coro_data) {
static constexpr void* jump_table[] = {&&beginning, &&before_add, &&after_add, &&the_end};
goto* jump_table[coro_data.next_label_idx];
beginning:
while(coro_data.n <= 32) {
coro_data.next_label_idx = 1;
return coro_data.n;
before_add:
coro_data.next_label_idx = 2;
++coro_data.n; // odd steps: add one
return coro_data.n;
after_add:
coro_data.next_label_idx = 3;
coro_data.n *= 2; // even steps: multiply 2
}
the_end:
return -1;
}
NOTE: unfortunately, this "dirty hack" requires some extra work and I would be super happy if that could be avoided somehow. I'd really love to see C++ native coroutines that can be copied if the user promises that its stack is copyable.
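For what it's worth, the &&label jump table is a GCC/Clang extension. A portable sketch of the same trick, assuming all state lives in a plain copyable struct, replaces the jump table with a switch whose case labels sit inside the loop. CountState and next_value are made-up names for illustration; the sequence mirrors the toy example above, starting from n = 1.

```cpp
#include <cassert>

// Hypothetical state struct: everything the "coroutine" needs lives here,
// so copying the struct copies the whole suspended computation.
struct CountState {
    int n = 1;
    int resume_point = 0;  // which case label to resume at on the next call
};

// Returns the next value of the sequence, or -1 once it is exhausted.
int next_value(CountState& s) {
    switch (s.resume_point) {
    case 0:
        while (s.n <= 32) {
            s.resume_point = 1;
            return s.n;          // first suspension point
    case 1:
            s.resume_point = 2;
            ++s.n;               // odd steps: add one
            return s.n;          // second suspension point
    case 2:
            s.n *= 2;            // even steps: multiply by two
        }
        s.resume_point = 3;      // finished: every later call returns -1
        break;
    default:
        break;
    }
    return -1;
}
```

Copying a CountState after a few calls gives two sequences that advance independently, which is the property an iterator needs. The price is the same as with the goto version: no local variables may be live across a suspension point, so everything has to be moved into the state struct by hand.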

C++ Constant anonymous instance with aggregate initialization

Basically I want to get a pointer to a constant, anonymous object, such as an instance of a class, array, or struct that is initialised with T {x, y, z...}. Sorry for my poor wording.
The basic code that I'm trying to write is as follows:
//Clunky; I'm sure there is a built-in class that can replace this, any information would be a nice addition
template<class T> class TerminatedArray {
public:
T* children;
int length;
TerminatedArray(const T* children) {
this->children = children;
length = 0;
while ((unsigned long)&children[length] != 0)
length++;
}
TerminatedArray() {
length = 0;
while ((unsigned long)&children[length] != 0)
length++;
}
const T get(int i) {
if (i < 0 || i >= length)
return 0;
return children[i];
}
};
const TerminatedArray<const int> i = (const TerminatedArray<const int>){(const int[]){1,2,3,4,5,6,0}};
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
TerminatedArray<const int> const children;
};
const Directory* baseDir;
const TerminatedArray<const Option>* options;
Settings(const Directory* _baseDir, const TerminatedArray<const Option> *_options);
};
//in some init method's:
Settings s = Settings(
&(const Settings::Directory){
"Clock",
(const TerminatedArray<const int>){(const int[]){1,2,0}}
},
&(const TerminatedArray<const Settings::Option>){(const Settings::Option[]){
{"testFoo"},
{"foofoo"},
0
}}
);
The code that I refer to is at the very bottom, the definition of s. I seem to be able to initialize a constant array of integers, but when applying the same technique to classes, it fails with:
error: taking address of temporary [-fpermissive]
I don't even know if C++ supports such things. I want to avoid having separate const definitions dirtying and splitting up the code, and instead keep them clean and anonymous.
The reason for wanting all these definitions as constants is that I'm working on an Arduino project that requires efficient balancing of SRAM against Flash, and I have a lot of Flash at my disposal.
My question is this. How can I declare a constant anonymous class/struct using aggregate initialization?
The direct (and better) equivalent to TerminatedArray is std::initializer_list:
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
std::initializer_list<const int> const children;
};
const Directory* baseDir;
const std::initializer_list<const Option>* options;
Settings(const Directory& _baseDir, const std::initializer_list<const Option>& _options);
};
//in some init method's:
Settings s = Settings(
{
"Clock",
{1,2,0}
},
{
{"testFoo"},
{"foofoo"}
}
);
https://godbolt.org/z/8t7j0f
However, this will almost certainly have lifetime issues (which the compiler tried to warn you about with "taking address of temporary"). If you want to store a (non-owning) pointer (or reference) then somebody else should have ownership of the object. But when initializing with temporary objects like this, nobody else does. The temporaries die at the end of the full expression, so your stored pointers now point to dead objects. Fixing this is a different matter (possibly making your requirements conflicting).
Somewhat relatedly, I'm not sure whether storing a std::initializer_list as a class member is a good idea. But it's certainly the thing you can use as a function parameter to make aggregate initialization nicer.
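One way out of the lifetime problem is to give every object a name and static storage duration, so that somebody does own it for the whole program. The following is only a sketch under that assumption: ArrayView is a made-up non-owning pointer-plus-length type (it carries the length, so no terminating 0 is needed), and the variable names are invented. It does give up the "anonymous" part of the requirement. Whether the data actually ends up in Flash depends on the toolchain (on AVR that typically needs PROGMEM), which this sketch does not address.

```cpp
#include <cassert>
#include <cstddef>

struct Option { const char* name; };

// Hypothetical non-owning view over a built-in array.
template <class T>
struct ArrayView {
    const T* data;
    std::size_t size;
    template <std::size_t N>
    constexpr ArrayView(const T (&arr)[N]) : data(arr), size(N) {}
};

struct Directory {
    const char* name;
    ArrayView<int> children;
};

struct Settings {
    const Directory* baseDir;
    ArrayView<Option> options;
};

// Named objects with static storage duration: they outlive every expression,
// so storing pointers to them is safe.
static const int clock_children[] = {1, 2};
static const Option all_options[] = {{"testFoo"}, {"foofoo"}};
static const Directory base_dir = {"Clock", clock_children};
static const Settings settings = {&base_dir, all_options};
```

Because none of these structs declares a constructor of its own (only the view does), Directory and Settings here really are aggregates and are initialized member by member.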
The condition &children[length] != 0 is either always true or undefined behaviour, since the address of an array element is never null.
If you don't want to allocate memory, you could refer to an existing array:
class Settings {
public:
struct Option {
const char* name;
};
struct Directory {
const char* name;
std::span<const int> const children;
};
const Directory baseDir;
const std::span<const Option> options;
Settings(Directory baseDir, std::span<const Option> options);
};
//in some method:
const std::array<int, 3> ints{{1,2,0}};
const std::array<Settings::Option, 2> options{{{"testFoo"}, {"foofoo"}}}; // extra braces: std::array wraps a built-in array
Settings s{{"Clock", {ints}}, options};
First, you're not aggregate-initializing anything. This is uniform initialization and you're calling constructors instead of directly initializing members. This is because your classes have user-defined constructors, and classes with constructors can't be aggregate-initialized.
Second, you're not really able to "initialize a constant array of integers". It merely compiles. Trying to run it gives undefined behavior - in my case, trying to construct i goes into an infinite search for element value 0.
In C++, there's values on the stack, there's values on the heap and there's temporary values (I genuinely apologize to anyone who knows C++ for this statement).
Values on the heap have permanent addresses which you can pass around freely.
Values on the stack have temporary addresses which are valid until the end of the block.
Temporary values either don't have addresses (as your compiler warns you) or have a valid address for the duration of the expression they're used for.
You're using such a temporary to initialize i, and trying to store and use the address of a temporary. This is an error and to fix it you can create your "temporary" array on the stack if you don't plan to use i outside of the block where your array will be.
Or you can create your array on the heap, use its address to initialize i, and remember to explicitly delete your array when you're done with it.
I recommend reading https://isocpp.org/faq and getting familiar with lifetime of variables and memory management before attempting to fix this code. It should give you a much better idea of what you need to do to make your code do what you want it to do.
Best of luck.

Modify key of std::map

Is there a way to modify the key of a std::map? This example shows how to do so with rebalancing the tree. But what if I provide some guarantees that the key won't need to be rebalanced?
#include <vector>
#include <iostream>
#include <map>
class Keymap
{
private:
int key; // this key will be used for the indexing
int total;
public:
Keymap(int key): key(key), total(0)
{}
bool operator<(const Keymap& rhs) const{
return key < rhs.key;
}
void inc()
{
total++;
}
};
std::map<Keymap, int> my_index;
int main (){
std::map<Keymap, int> my_index;
Keymap k(2);
my_index.insert(std::make_pair(k, 0));
auto it = my_index.begin();
it->first.inc(); // this won't rebalance the tree from my understanding
return 0;
}
The modification won't compile because of the constness of it->first
Is there any way to override this behavior?
You could make inc const and total mutable
class Keymap
{
private:
int key; // this key will be used for the indexing
mutable int total;
public:
Keymap(int key): key(key), total(0)
{}
bool operator<(const Keymap& rhs) const{
return key < rhs.key;
}
void inc() const
{
total++;
}
};
But you do need to ask yourself why you are doing this; mutable isn't used much.
You're right that no rebalancing is going to happen.
If you cannot change the design and introduce surrogate read-only keys, your best option is to use Boost.MultiIndex container (I am not aware of reasonable alternatives). It is designed specifically for this purpose and has consistent built-in support of updating the indexed object, including the transactional variant. Documentation and code examples are here.
Generally, patterns like storing business entities in self-keyed sets, having mutable keys serve an additional purpose (counters and whatnot), etc. tend to hurt the maintainability, performance, and scalability of the code.
You could wrap your keys into a class that allows modification of const objects. One such class would be std::unique_ptr:
using KeymapPtr = std::unique_ptr<Keymap>;
struct PtrComp
{
template<class T>
bool operator()(const std::unique_ptr<T>& lhs, const std::unique_ptr<T>& rhs) const
{
return *lhs < *rhs;
}
};
template<class V>
using PtrMap = std::map<KeymapPtr, V, PtrComp>;
int main (){
PtrMap<int> my_index;
KeymapPtr k = std::make_unique<Keymap>(2);
my_index.emplace(std::move(k), 0);
auto it = my_index.begin();
it->first->inc(); // this won't rebalance the tree from my understanding
return 0;
}
Note that we have to supply a custom comparator object since we (presumably) want to sort by the key values, not the pointer values.
To be clear, this is not what unique_ptr is meant for, and the const semantics of smart pointers (which follow those of regular pointers) are a bit backwards from this perspective (why can I get a non-const reference from a const object? A linter may complain about this kind of use...), but it does the trick here. The same would of course work with naked pointers (where a T* const can have the T value changed but not the pointer location, whereas a const T* can have its location changed but not the T), but this mimics the ownership/lifetime model of your original code.
Needless to say, this opens the door to breaking the map invariants (breaking the sortedness by keys) so think twice before using it. But unlike const_casting your key directly, it is free of UB.
std::map and the other standard associative containers do not provide a way to do this without removing and adding an element, likely causing tree rebalancing side effects. You can go around the map key constness in various ways (e.g. using mutable members), but then it's entirely up to you to make sure you don't actually break the key ordering.
If you need this sort of efficiency but a bit more safety, you might consider changing the container to a boost::multi_index_container instead.
A std::map<K,V> is similar to:
namespace BMI = boost::multi_index;
using map_value_type = std::pair<K, V>;
using map_type = BMI::multi_index_container<
map_value_type,
BMI::indexed_by<BMI::ordered_unique<
BMI::member<map_value_type, K, &map_value_type::first> // member takes <Class, Type, pointer-to-member>
>>>;
except that in a multi_index_container, the entire element is always const. If you want to be able to directly modify the second members, a means for that is described on this boost page.
multi_index_container provides two members the standard associative containers do not, replace and modify. Both of these will check for whether the modified element is in the same sort order or not. If it is, no rebalancing is done.
auto it = my_index.begin();
auto pair = *it;
pair.first.inc();
my_index.replace(it, pair);
// OR
auto it = my_index.begin();
my_index.modify(it, [](auto& pair) { pair.first.inc(); });
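If you stay with the standard containers and accept the remove-and-re-add route, C++17 node handles at least avoid copying or reallocating the element: extract the node, change its key, and insert it again. A minimal sketch, assuming a std::map<int, std::string> and a made-up helper name rekey:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Changes the key of one element by extracting its node, editing the key and
// reinserting the node. Returns false (and leaves the map untouched) if
// old_key is absent or new_key is already taken.
bool rekey(std::map<int, std::string>& m, int old_key, int new_key) {
    if (m.count(new_key) != 0) return false;
    auto node = m.extract(old_key);  // C++17: unlinks the node without destroying it
    if (node.empty()) return false;  // old_key was not in the map
    node.key() = new_key;            // the key is mutable while outside the map
    m.insert(std::move(node));       // relinks the same node at its new position
    return true;
}
```

The tree is still adjusted on extract and insert, so this does not skip rebalancing; it only keeps the map's ordering invariant intact without any const tricks.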

Creating a map for millions of objects in C++

I have an abstract class called Object that has a few virtual functions, one of which is a function that will retrieve the id of an Object.
Currently, I am using a std::vector<Object> to store tens of millions of these objects. Unfortunately, adding, copying, or removing from this is painfully slow.
I wanted to create a hash map that could maybe have the Object->id as the key, and maybe the object itself as a value? Or is there some type of data structure that would allow for easy insertion and removal like a std::vector but would be faster for tens of millions of objects?
I would want the class to end up looking something like this outline:
stl::container<Objects*> obj_container;
DataContainer::DataContainer()
: stl::container(initialized_here)
{}
DataContainer::addObject(Object* object)
{
obj_container.insert(object);
}
DataContainer::removeObject(Object* object)
{
obj_container.remove(object);
}
DataContainer::preSort()
{
obj_container.sort_by_id();
}
DataContainer::getObject(Object* object)
{
if(!obj_container.contains(object)) { return; }
binary_search(object);
}
Is there anything really fast at processing large amounts of these objects, or is there anything really fast that could possibly use an unsigned integer id from an object to process the data?
Also, my class would get pre-sorted, so every object would be sorted by ID before being added to the container. Then I would do a binary search on the data by ID.
You could probably use std::set (if the ids have some order and are unique) or std::unordered_set, and I would suggest you make it a member of your container rather than deriving your container from it. You had better have a way of constructing a local fake Object with only its id ...
class Object {
friend class BigContainer;
unsigned long _id;
// other fields;
// your constructors
public:
unsigned long id() const { return _id; };
private:
Object(unsigned long pseudoid); // construct a fake object
};
struct LessById {
bool operator () (const Object &ob1, const Object& ob2) const
{ return ob1.id() < ob2.id(); }
bool operator () (const Object &ob, unsigned long idl) const
{ return ob.id() < idl; }
};
class BigContainer {
std::set<Object,LessById> set;
public:
// add members, constructors, destructors, etc...
bool contains(unsigned long id) const {
Object fakeobj{id};
return set.find(fakeobj) != set.end();
}
const Object* find_by_id(unsigned long id) const {
Object fakeobj{id};
auto p = set.find(fakeobj);
if (p != set.end()) return &(*p);
return nullptr;
}
bool contains(const Object& ob) const {
return set.find(ob) != set.end();
}
// not const: modifies the set
void add(const Object& ob) {
set.insert(ob); // does nothing if an object with the same id is already present
}
// not const: modifies the set
void remove(unsigned long id) {
Object fakeobj{id};
auto p = set.find(fakeobj);
if (p != set.end()) set.erase(p);
}
};
If you want a set of pointers use a set of some smart pointers and adapt the scheme above.
If the Object is big and you have trouble defining an efficient way of constructing local fake objects for a given id, define a base struct BoxedId { unsigned long id; BoxedId(unsigned long l): id(l) {}; }, declare internally a std::set<std::shared_ptr<BoxedId>,BoxedLessById>, make class Object : public BoxedId, etc...
BTW, since Object has virtual methods you will probably subclass it, so you need a set of pointers. You need to define a pointer policy (is every actual instance of a subclass of Object in your container?) and use some smart pointer. You also need to define who is in charge of deleting your Objects (who owns the pointer): is it only the unique BigContainer?
Read the C++11 rule of five.
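If ordering is not actually needed, the hash-map idea from the question can be sketched like this. It is only an illustration under assumptions: the container owns the objects through std::unique_ptr, ids are 32-bit unsigned integers, and Widget is a made-up concrete subclass that exists only so the example can run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Stand-in for the abstract base class from the question.
class Object {
public:
    virtual ~Object() = default;
    virtual std::uint32_t id() const = 0;
};

// Hypothetical concrete subclass, only here so the sketch can run.
class Widget : public Object {
    std::uint32_t id_;
public:
    explicit Widget(std::uint32_t id) : id_(id) {}
    std::uint32_t id() const override { return id_; }
};

class DataContainer {
    std::unordered_map<std::uint32_t, std::unique_ptr<Object>> objects_;
public:
    // Takes ownership. Returns false if the id is already present; in that
    // case the passed-in object is destroyed and the stored one is kept.
    bool add(std::unique_ptr<Object> obj) {
        const std::uint32_t key = obj->id();
        return objects_.emplace(key, std::move(obj)).second;
    }
    // Returns true if an object with this id was stored and has been removed.
    bool remove(std::uint32_t id) { return objects_.erase(id) > 0; }
    // Returns nullptr when the id is unknown.
    const Object* find(std::uint32_t id) const {
        auto it = objects_.find(id);
        return it == objects_.end() ? nullptr : it->second.get();
    }
    std::size_t size() const { return objects_.size(); }
};
```

Insertion, removal, and lookup by id are average constant time here, and no pre-sorting or binary search is needed. Calling reserve on the map up front may help when the final size is roughly known.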
Please have a look at this site : http://www.cs.northwestern.edu/~riesbeck/programming/c++/stl-summary.html
It shows the time complexity of each operation on each STL container.
First be clear about your requirements, then choose a container by comparing the time complexities shown at the link above.

Setter and Getter method for map

string var;
void setvar(string ivar)
{
var=ivar;
}
string getVar() const
{
return var;
}
In the same way, how can I write setter and getter methods for a map like this?
std::map varmap;
You can write a getter or setter for a field that's a std::map just as you would any other field - just have the getter return a std::map and have the setter accept a std::map.
Of course, if you have a field that's a std::map that you're trying to use getters and setters on, that might suggest that there's a better way to structure the program. Can you provide more details about what you're trying to do?
EDIT: The above answer is for a slightly different question than the one you asked. It seems like what you're interested in is
Given a class with a std::map as a data member, write a function to set a given key/value pair and a function to return the value associated with a given key.
The setter logic for this is not too hard - you just write a function that takes in the key and value and associates the key with the value. For example:
void put(const string& key, const string& value) {
varmap[key] = value;
}
Writing a getter is trickier because there's no guarantee that there's a value associated with a particular key. When this happens, you have multiple options.
You could return a sentinel value. For example, you might return an empty string if the given value isn't stored in the map anywhere. This makes the code for using the function easier to read, but risks using an invalid value in code.
You could throw an exception. This would be good if it represents a serious error for the given value not to exist. This has the drawback that if you look up a value, you always need to try/catch the logic to avoid propagation of errors.
You could associate a default value with the key, then hand that back. If you're writing a program that represents a music library, for example, you might hand back "(none)" or "(unknown)" if you tried to look up the artist for a song on which you have no data, for example.
No one of these approaches works best, and you'll need to think over which is most appropriate to your particular circumstance.
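The three getter options could look roughly like this. This is a sketch, assuming a map from string to string; the class name Registry and the method names are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

class Registry {  // hypothetical class holding the map
    std::map<std::string, std::string> varmap;
public:
    void put(const std::string& key, const std::string& value) {
        varmap[key] = value;
    }
    // Option 1: sentinel value (empty string) when the key is missing.
    std::string get_or_empty(const std::string& key) const {
        auto it = varmap.find(key);
        return it == varmap.end() ? std::string() : it->second;
    }
    // Option 2: std::map::at throws std::out_of_range when the key is missing.
    const std::string& get_or_throw(const std::string& key) const {
        return varmap.at(key);
    }
    // Option 3: hand back a caller-supplied default when the key is missing.
    std::string get_or(const std::string& key, const std::string& fallback) const {
        auto it = varmap.find(key);
        return it == varmap.end() ? fallback : it->second;
    }
};
```

All three getters use find or at rather than operator[], because operator[] inserts a default entry for a missing key and therefore cannot be called on a const map.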
Entries in a std::map<Key, Value> must have a key and a value. The normal way of getting and setting them is:
my_map[a_key] = new_value; // set
do_something_with(my_map[a_key]); // get and use...
If you want to add new functions, they probably wouldn't look like what you're proposing because:
your set is only given one parameter despite needing a key and value (admittedly, you could adopt some convention like having the first ':' or '=' separate them), and
the get() function doesn't provide any key.
You could instead have something more like:
void set(const Key&, const Value&);
Value get(const Key&) const;
But, even if you have write permissions to do so, you shouldn't add that directly in the map header file - all C++ programs compiled on that computer will share that file and won't expect it to be modified. Any small mistake could cause trouble, and if you ship your program to another computer you won't be able to compile it there without making a similar modification - if that computer uses a different C++ compiler the necessary details of that modification may be slightly different too.
So, you can either write your own (preferably templated) class that derives from (inherits) or contains (composition) a std::map, providing your functions in your custom class. An inheritance based solution is easier and more concise to write:
template <typename Key, typename Value>
struct My_Map : std::map<Key, Value>
{
My_Map(...); // have to provide any non-default constructors you want...
// this-> is needed because std::map<Key, Value> is a dependent base class
void set(const Key& key, const Value& value) { this->operator[](key) = value; }
// if you want entries for non-existent keys to be created with a default Value...
Value& get(const Key& key) { return this->operator[](key); }
// --- OR (keep only one of the two non-const get overloads) ---
// if you want an exception thrown for non-existent keys...
Value& get(const Key& key) { return this->at(key); }
const Value& get(const Key& key) const { return this->at(key); }
};
This is slightly dangerous if you're planning to pass My_Maps around by pointer and accidentally end up with a "new My_Map" pointer that's later deleted as a std::map pointer, as in:
void f(std::map<int, string>* p) { /* use *p */ delete p; }
My_Map<int, string>* p = new My_Map<int, string>;
f(p);
Still, in most programs there's no real danger of accidentally disposing of a map like this, so go ahead and do it.
Further, and this is the kind of thinking that'll make me unpopular with the Standard-fearing purists around here - because My_Map hasn't added any data members or other bases, the std::map<> destructor probably does all the necessary tear-down even though it's technically Undefined Behaviour. I'm NOT encouraging you to ignore the issue (and would consider it unprofessional in a job requiring robustness), but you can at least rest a little easier. I'd be curious to hear from anyone with any compiler/settings where it demonstrably doesn't operate safely.
If you use composition, you'll have to write your own "forwarding" functions to let you use My_Map like a std::map, accessing iterators, find, erase, insert etc.. It's a pain.
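For comparison, a composition-based version might start out like this. It is a sketch only: MapWrapper is a made-up name, and only a handful of forwarding functions are shown; a real wrapper would forward whatever else the callers need.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

template <typename Key, typename Value>
class MapWrapper {  // hypothetical name
    std::map<Key, Value> map_;  // contained, not inherited
public:
    using const_iterator = typename std::map<Key, Value>::const_iterator;

    void set(const Key& key, const Value& value) { map_[key] = value; }
    // Throws std::out_of_range for a missing key, like std::map::at.
    const Value& get(const Key& key) const { return map_.at(key); }

    // Forwarding functions: each one has to be written out by hand.
    bool contains(const Key& key) const { return map_.find(key) != map_.end(); }
    std::size_t erase(const Key& key) { return map_.erase(key); }
    std::size_t size() const { return map_.size(); }
    const_iterator begin() const { return map_.begin(); }
    const_iterator end() const { return map_.end(); }
};
```

Since nothing derives from std::map here, the delete-through-base-pointer problem described above cannot occur.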
A setter and getter for a std::map are no different, except that you need to pass the necessary parameters to the setter. Assume I have a struct with a member variable of type std::map whose key is of type char and whose data is of type int. The method signatures would be of the format -
void setEncode( char* key, int* data, const int& size ); because std::map requires a key and data, plus the size of the arrays being passed. Without knowing the size, it is unknown how many elements to insert into the container.
std::map<char, int> getEncode() const ; The const keyword signifies that it is a non-modifying member function, because its job is just to return a variable of type std::map.
Example -
struct myMap
{
std::map<char, int> encode;
void setEncode( char* key, int* data, const int& size );
std::map<char, int> getEncode() const ;
};
void myMap::setEncode( char *key, int* data, const int& size )
{
int i=0;
while( i < size )
{
encode.insert(std::pair<char, int>(key[i], data[i]));
++i ;
}
}
std::map<char, int> myMap::getEncode() const
{
return encode;
}
Results on IdeOne. This should give you an idea, but you should also follow the general rules that @templatetypedef and @tony suggested.
Do you want to set a key-value pair in an existing map (probably that's what you want) or create a new map itself?
void setvar(string key, int value)
{
myMap[key] = value;
}
int getVar(string key) const
{
return myMap.at(key); // operator[] is not const; at() throws std::out_of_range if the key is missing
}
where int and string are interchangeable
For the latter, the setter will probably have to iterate over all the map values, and the getter should just return a pointer to that map.