unordered_set non const iterator - c++

For testing purposes I created a small unordered_set and tried to iterate over it. The set holds a class of my own:
class Student {
private:
int matrNr;
string name;
public:
Student( const int& matrNr = 0, const string& name = "" )
: matrNr( matrNr ), name( name ) {}
void setNr( const int& matrNr ) {
this->matrNr = matrNr;
}
...
};
I inserted some elements and tried to change the objects during iteration:
unordered_set<Student, meinHash> meineHashTable;
meineHashTable.emplace( 12, "Fred" );
meineHashTable.emplace( 22, "Barney" );
meineHashTable.emplace( 33, "Wilma" );
for (int i = 0; i < meineHashTable.bucket_count(); i++) {
cout << "Bucketnummer: " << i << endl;
unordered_set<Student, meinHash>::local_iterator iter; // not constant?!?
if (meineHashTable.bucket_size( i ) > 0) {
for (iter = meineHashTable.begin( i ); iter != meineHashTable.end( i ); iter++) {
//const_cast<Student&>(*iter).setNr( 1234 ); //This does work
iter->setNr( 1234 ); //This does not work
}
}
else {
cout << "An empty Bucket" << endl;
}
}
I used a local_iterator (not a const_local_iterator), but I still can't change the objects. For some reason the iterator still refers to a constant object.
My question: why is this so? If the normal iterator refers to a const object, what is the difference between the const and the non-const iterator?
Tested with Visual Studio 2013 and MinGW.
Thanks in advance for any help :-)
EDIT:
The Hash functor:
struct meinHash {
size_t operator()( const Student& s ) const {
return s.getNr();
}
};
For anyone who finds this topic later with the same question, here is some example output if you change the matrNr by force:
const_cast<Student&>(*iter).setNr( 5 );
and try to display it:
unordered_set<Student, meinHash>::local_iterator iter = meineHashTable.find( 5 );
iter->display();
you may get something like:
Bucketnummer: 0
An empty Bucket
Bucketnummer: 1
Matrikelnummer: 5
Name: Wilma
Bucketnummer: 2
An empty Bucket
Bucketnummer: 3
An empty Bucket
Bucketnummer: 4
Matrikelnummer: 5
Name: Fred
Bucketnummer: 5
An empty Bucket
Bucketnummer: 6
Matrikelnummer: 5
Name: Barney
Bucketnummer: 7
An empty Bucket
//The not wanted output ;-)
Matrikelnummer: -842150451
Name:

Both set and unordered_set have read-only keys. It's easy to see why this is the case - if the key value were to change, the data structure would have it filed in the wrong spot and you wouldn't be able to find it anymore.
Per your example, suppose your hash function simply returned the matrNr field. When the hash number changes, any lookup for 1234 will fail because there's nothing stored in that hash bucket.
It would be possible to allow changes to parts of the object that are not used in computing the hash, but that would invite hard-to-track-down bugs. The standards committee decided to eliminate that possibility by making the entire key const.
There are two ways around this restriction. The first is to split the key from the value and use a map or unordered_map instead. The second is to remove the item from the set and reinsert it after it's modified.
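The second workaround (remove, modify, reinsert) could look roughly like the sketch below. It rests on assumptions that are not in the question: Student is reduced to a plain struct, equality is assumed to compare only matrNr, and StudentHash and renumber are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Simplified stand-in for the question's Student class (assumption).
struct Student {
    int matrNr;
    std::string name;
    bool operator==(const Student& other) const { return matrNr == other.matrNr; }
};

// Hypothetical hash that, like meinHash, depends only on matrNr.
struct StudentHash {
    std::size_t operator()(const Student& s) const { return std::hash<int>()(s.matrNr); }
};

using StudentSet = std::unordered_set<Student, StudentHash>;

// Changes the key of one element by erasing it and inserting a modified copy.
// Returns false if oldNr is not present or newNr is already in use.
bool renumber(StudentSet& set, int oldNr, int newNr) {
    auto it = set.find(Student{oldNr, ""});
    if (it == set.end()) return false;
    if (set.count(Student{newNr, ""}) != 0) return false;
    Student copy = *it;  // copy the element out while the iterator is still valid
    set.erase(it);       // the old entry leaves its bucket
    copy.matrNr = newNr;
    const bool inserted = set.insert(copy).second;  // filed under the new hash
    assert(inserted);    // guaranteed by the count() check above
    return inserted;
}
```

Because the element is re-inserted, the container files it under its new hash, so a later find for the new number succeeds.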

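The value type of a set<K> is K, but its elements are only ever exposed as const K; for a map<K, T> the value type is pair<const K, T>; ditto for the unordered versions.
For a map, an iterator gives you access to value_type & and a const_iterator to const value_type &. For a set, both iterator types are constant iterators and give you const value_type &. As you can see, neither iterator type can "undo" the constness of the key.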
The reason the key is immutable is that it forms an integral part of the underlying data structure; changing the key would require a non-trivial internal rearrangement which would cause all sorts of problems (e.g. non-zero computational complexity (for element access!), and confused iterator ordering).
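These type relationships can be checked at compile time. The sketch below uses static_assert for that, and then shows the map-based alternative, where only the key half of the pair is const. The function set_name is a hypothetical helper, and keying the map by matrNr is an assumption for illustration.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// For sets, even the plain (non-const) iterator yields a const reference.
static_assert(std::is_same<std::iterator_traits<std::set<int>::iterator>::reference,
                           const int&>::value, "set iterator is constant");
static_assert(std::is_same<std::iterator_traits<std::unordered_set<int>::iterator>::reference,
                           const int&>::value, "unordered_set iterator is constant");
static_assert(std::is_same<std::iterator_traits<std::unordered_set<int>::local_iterator>::reference,
                           const int&>::value, "local_iterator is constant too");

// For maps, only the key is const; the mapped value stays writable.
static_assert(std::is_same<std::map<int, std::string>::value_type,
                           std::pair<const int, std::string>>::value, "key is const");
static_assert(std::is_same<std::iterator_traits<std::map<int, std::string>::iterator>::reference,
                           std::pair<const int, std::string>&>::value, "pair itself is non-const");

// Hypothetical helper: with the key split from the value, the name can be
// changed in place through an ordinary iterator.
bool set_name(std::unordered_map<int, std::string>& students, int matrNr,
              const std::string& newName) {
    auto it = students.find(matrNr);
    if (it == students.end()) return false;
    it->second = newName;  // fine: the mapped value is not const
    // it->first = 5;      // would not compile: the key is const int
    return true;
}
```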

I had a similar problem and was confused too. All the sources I looked at indicated that std::unordered_set::find can return a non-const iterator that dereferences to value_type&, which is non-const. On the other hand, the answers above state that changing a field of the instance changes its hash, and therefore where it is stored, which seems to make that impossible. It seems uncharacteristically "sloppy" for the spec to provide an interface that cannot be used, so there has to be a way to do something like what the questioner wants, and there is. You have to separate the fields that identify an element from the fields that are merely associated data, and tell the compiler explicitly which fields may change. To further simplify the original question, consider the following:
struct student {
std::string name;
double gpa;
// needed for any element of a hash table; the defaulted version (C++20) compares all fields
bool operator==(const student& other) const = default;
student(const char* _name)
: name(_name)
, gpa(2.0) {}
};
std::unordered_set<student> student_set;
auto found = student_set.find("edgar"); // danger!! See note below
if (found != student_set.end()) {
found->gpa = 4.0; // <- compile failure here: *found is a const student
}
Note that the standard library provides no default std::hash<student>: for a user-defined type you must supply the hash yourself, so the declaration above only compiles once a specialization exists. Suppose a naive one folds in all the data from the struct, perhaps some combination of std::hash<std::string>()(name) and std::hash<double>()(gpa). That is the problem to which the other answers allude: changing any hashed part of a record changes its table index. The compiler cannot inspect what a hash function reads, so the container simply exposes every element as const.
The unordered_set in the original question uses meinHash, which returns the matrNr, so changing matrNr via an iterator runs into exactly the problem described by the above answers.
Typically, though, not all the data in a record is used to uniquely identify an instance within a set. Let's say "name" is enough to disambiguate the student and gpa is just associated data that we may update. The constructor above strongly implies that, which makes the call to find above dangerous. It creates a temporary using the constructor, with the given name and a gpa of 2.0, and then looks up the student using BOTH pieces of information. If "edgar" was added to the set with a gpa of 3.0, his record will never be found, let alone updated by the operation on the iterator (which won't even compile).
So the first thing you need to do is identify which fields are truly intrinsic and required for the hash, and which are not. Then you supply a hash function that uses only the intrinsic fields, something like the following:
struct student_hash {
std::size_t operator()(const student& hashed_student) const {
return std::hash<std::string>()(hashed_student.name);
}
};
For me (using clang), this was necessary but not sufficient. The custom hash ensures that changing "gpa" has no effect on the index of a record within the hash table, but the compiler does not know that: the element is still accessed as const. I then had to declare "gpa" with the mutable keyword, which explicitly tells the compiler that this field may change even in a const object, that is, without changing what we as writers consider the state of this data. Typically, it's used for refcounts or master pointers and other kinds of metadata not intrinsic to the state of the struct instance, but it applies here as well. operator== must likewise compare only the name. So now we have:
struct student {
std::string name;
mutable double gpa;
// indicates that a matching name means a hit
bool operator==(const student& other) const {
return name.compare(other.name) == 0;
}
student(const char* _name)
: name(_name)
, gpa(2.0) {}
};
std::unordered_set<student, student_hash> student_set;
auto found = student_set.find("edgar"); // will find "edgar" regardless of gpa
if (found != student_set.end()) {
found->gpa = 4.0; // <- no longer fails here: gpa is mutable, so it can be assigned through a const student
}

unordered_set is a kind of data structure where you can't modify an item without potentially changing its location.
The non-const iterator is const here because the STL protects you from such an obvious mistake.
If you want to modify an unordered_set's item, you have to remove it and add it again.

You can cast const type to non-const type. By this you are 'telling the compiler' that you know what you are doing, so you should indeed know what you are doing.

Related

Make object searchable with two different keys

Given a class with two keys:
class A {
int key1;
int key2;
byte x[]; // large array
};
If multiple objects of class A are instantiated and I want to sort them by key1, I can insert them into an std::set.
But if I want to sort these objects both by key1 and by key2, how would I do that?
I could create two sets where one set sorts by key1 and the other set sorts by key2, but that doubles the amount of memory used. How can I avoid this?
Edit 1:
As far as I know, when an object is inserted into a set, the object is copied. So if I create two sets (one sorted by key1 and one sorted by key2), that means two versions of the object will exist: one in set1 and one in set2. This means that member x also exists twice, which unnecessarily doubles the amount of memory used.
Edit 2:
To give a more specific example: given the class Person.
class Person {
std::string name;
std::string address;
// other fields
};
I want to be able to find people either by their name or by their address. The two keys won't be used at the same time: I want to be able to call find(name) and find(address).
Also, objects of the Person class won't be added to or removed from the data structure that often, but lookups will happen often. So lookups should ideally be fast.
Edit 3:
Storing pointers to the objects in the set instead of the objects themselves seems like a good solution. But would it be possible to store pointers in both sets? I.e.
std::set<A*> set_sorted_by_key1;
std::set<A*> set_sorted_by_key2;
A *obj_p = new A();
set_sorted_by_key1.insert(obj_p);
set_sorted_by_key2.insert(obj_p);
Finding an element in a sorted vector via binary_search is O(log(N)), just as std::set::find is O(log(N)). Hence, if you want to stay with standard containers, the type of container you choose isn't that important as far as the time complexity of finding elements is concerned.
Concerning the additional memory, you won't get it any cheaper than storing an additional pointer to each element somewhere.
So what you can do is
std::vector<A> sorted1;
std::sort(sorted1.begin(),sorted1.end(),
[](const A& a,const A& b) { return a.key1 < b.key1; });
std::vector<A*> sorted2;
// ... fill with pointers to elements in sorted1
std::sort(sorted2.begin(),sorted2.end(),
[](A* a, A* b) { return a->key2 < b->key2; });
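The lookup side of this approach can be sketched with std::lower_bound over the pointer index. This is a minimal sketch under assumptions: A is reduced to its two keys, and build_key2_index and find_by_key2 are hypothetical helper names. The pointers stay valid only as long as the vector holding the objects is not resized or reordered afterwards.

```cpp
#include <algorithm>
#include <vector>

// Reduced version of the question's class: only the two keys (assumption).
struct A {
    int key1;
    int key2;
};

// Builds a secondary index: pointers into `items`, sorted by key2.
std::vector<const A*> build_key2_index(const std::vector<A>& items) {
    std::vector<const A*> index;
    index.reserve(items.size());
    for (const A& a : items) index.push_back(&a);
    std::sort(index.begin(), index.end(),
              [](const A* l, const A* r) { return l->key2 < r->key2; });
    return index;
}

// Binary search in the index; returns nullptr when no element has this key2.
const A* find_by_key2(const std::vector<const A*>& index, int key2) {
    auto it = std::lower_bound(index.begin(), index.end(), key2,
                               [](const A* a, int k) { return a->key2 < k; });
    return (it != index.end() && (*it)->key2 == key2) ? *it : nullptr;
}
```

The extra memory is one pointer per object, and the large member of each object exists only once.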
Storing pointers to the objects in the set instead of the objects themselves seems like a good solution. But would it be possible to store pointers in both sets?
Sure, your sets seem to share ownership of those objects, so:
struct Person { // struct, so that the comparators below can access the members
std::string name;
std::string address;
// other fields
};
using PersonPtr = std::shared_ptr<Person>;
Now you want to sort them by name:
struct CmpName {
using is_transparent = void;
bool operator()( const PersonPtr &p1, const PersonPtr &p2 ) const { return p1->name < p2->name; }
bool operator()( const std::string &s, const PersonPtr &p2 ) const { return s < p2->name; }
bool operator()( const PersonPtr &p1, const std::string &s ) const { return p1->name < s; }
};
std::set<PersonPtr,CmpName> byName;
Note that the type alias using is_transparent = void; and the two additional overloads enable heterogeneous lookup in std::set (C++14); otherwise you would have to create an instance of std::shared_ptr<Person> just to do a lookup. Details can be found in "What are transparent comparators?"
And search it:
auto f = byName.find( "John" );
Here how it works: Live example
Searching by address can be done in a very similar way: just add another comparator struct and initialize a second std::set with it.
Alternatively, you can store the objects themselves and keep multiple indexes with Boost.MultiIndex, but it has a learning curve.
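Putting both indexes together might look like the sketch below. It assumes C++14 (for heterogeneous find) and that names and addresses are each unique. CmpField and Directory are hypothetical names; CmpField generalizes the CmpName comparator so the same template serves both fields.

```cpp
#include <memory>
#include <set>
#include <string>

struct Person {
    std::string name;
    std::string address;
};
using PersonPtr = std::shared_ptr<Person>;

// One transparent comparator for any std::string member of Person.
template <std::string Person::*Field>
struct CmpField {
    using is_transparent = void;  // enables find() with a std::string argument
    bool operator()(const PersonPtr& a, const PersonPtr& b) const { return (*a).*Field < (*b).*Field; }
    bool operator()(const std::string& s, const PersonPtr& p) const { return s < (*p).*Field; }
    bool operator()(const PersonPtr& p, const std::string& s) const { return (*p).*Field < s; }
};

// Two indexes that share ownership of the same Person objects.
struct Directory {
    std::set<PersonPtr, CmpField<&Person::name>> byName;
    std::set<PersonPtr, CmpField<&Person::address>> byAddress;

    // Inserts into both indexes, or into neither if a key is already taken.
    bool add(const std::string& name, const std::string& address) {
        PersonPtr p = std::make_shared<Person>(Person{name, address});
        if (!byName.insert(p).second) return false;
        if (!byAddress.insert(p).second) {
            byName.erase(p);  // roll back so both indexes stay in sync
            return false;
        }
        return true;
    }

    PersonPtr findByName(const std::string& name) const {
        auto it = byName.find(name);
        if (it == byName.end()) return nullptr;
        return *it;
    }

    PersonPtr findByAddress(const std::string& address) const {
        auto it = byAddress.find(address);
        if (it == byAddress.end()) return nullptr;
        return *it;
    }
};
```

Each Person exists once; the sets hold only shared_ptr copies. If duplicate names or addresses must be allowed, std::multiset would be the container to reach for instead.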

How to convert vector to set?

I have a vector, in which I save objects. I need to convert it to set. I have been reading about sets, but I still have a couple of questions:
How to correctly initialize it? Honestly, some tutorials say it is fine to initialize it like set<ObjectName> something. Others say that you need an iterator there too, like set<Iterator, ObjectName> something.
How to insert them correctly. Again, is it enough to just write something.insert(object) and that's all?
How to get a specific object (for example, an object which has a named variable in it, which is equal to "ben") from the set?
I have to convert the vector itself to be a set (a.k.a. I have to use a set rather than a vector).
Suppose you have a vector of strings, to convert it to a set you can:
std::vector<std::string> v;
std::set<std::string> s(v.begin(), v.end());
For other types, you must have operator< defined.
All of the answers so far have copied a vector to a set. Since you asked to 'convert' a vector to a set, I'll show a more optimized method which moves each element into a set instead of copying each element:
std::vector<T> v = /*...*/;
std::set<T> s(std::make_move_iterator(v.begin()),
std::make_move_iterator(v.end()));
Note, you need C++11 support for this.
You can initialize a set using the objects in a vector in the following manner:
vector<T> a;
... some stuff ...
set<T> s(a.begin(), a.end());
This is the easy part. Now, you have to realize that in order to store elements in a set, you need to have bool operator<(const T& a, const T& b) overloaded. Also, a set can hold no more than one element with a given value according to that operator's definition. So the set s cannot contain two elements a and b for which neither operator<(a,b) nor operator<(b,a) is true. As long as you know and realize that, you should be good to go.
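The uniqueness rule can be seen in a small sketch. Item and to_set are hypothetical names, and the ordering is assumed to look at the id only, so two Items with the same id count as the same element even if their labels differ.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical element type for illustration.
struct Item {
    int id;
    std::string label;
};

// The ordering looks at id only, so Items with equal ids are equivalent
// as far as std::set is concerned.
bool operator<(const Item& a, const Item& b) { return a.id < b.id; }

// Builds a set from the vector; equivalent elements collapse into one.
std::set<Item> to_set(const std::vector<Item>& v) {
    return std::set<Item>(v.begin(), v.end());
}
```

A vector holding ids 3, 1, 3, 2 therefore produces a set with three elements, iterated in the order 1, 2, 3; only one of the two Items with id 3 is kept.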
If all you want to do is store the elements you already have in a vector, in a set:
std::vector<int> vec;
// fill the vector
std::set<int> myset(vec.begin(), vec.end());
You haven't told us much about your objects, but suppose you have a class like this:
class Thing
{
public:
int n;
double x;
string name;
};
You want to put some Things into a set, so you try this:
Thing A;
set<Thing> S;
S.insert(A);
This fails, because sets are sorted, and there's no way to sort Things, because there's no way to compare two of them. You must provide either an operator<:
class Thing
{
public:
int n;
double x;
string name;
bool operator<(const Thing &Other) const;
};
bool Thing::operator<(const Thing &Other) const
{
return(Other.n<n);
}
...
set<Thing> S;
or a comparison function object:
class Thing
{
public:
int n;
double x;
string name;
};
struct ltThing
{
bool operator()(const Thing &T1, const Thing &T2) const
{
return(T1.x < T2.x);
}
};
...
set<Thing, ltThing> S;
To find the Thing whose name is "ben", you can iterate over the set, but it would really help if you told us more specifically what you want to do.
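Iterating to find a Thing by name could be done with std::find_if, as in the sketch below. Thing and ltThing follow the definitions above; find_by_name is a hypothetical helper. Since the set is ordered by x and not by name, this is a linear search.

```cpp
#include <algorithm>
#include <set>
#include <string>

struct Thing {
    int n;
    double x;
    std::string name;
};

// Same ordering as the ltThing above: by x.
struct ltThing {
    bool operator()(const Thing& a, const Thing& b) const { return a.x < b.x; }
};

using ThingSet = std::set<Thing, ltThing>;

// Linear search by name; returns s.end() when no Thing has that name.
ThingSet::const_iterator find_by_name(const ThingSet& s, const std::string& name) {
    return std::find_if(s.begin(), s.end(),
                        [&name](const Thing& t) { return t.name == name; });
}
```

If lookups by name are the main use, ordering the set by name (or using a std::map keyed by name) would allow a logarithmic find instead.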
How to correctly initialize it?
std::set<YourType> set;
The only condition is that YourType must have bool operator<(const YourType&) const and be copyable (copy constructor + assignment operator). For std::vector, being copyable is enough.
How to insert them correctly.
set.insert(my_elem);
How to get specific object (for example object, which has name variable in it, which is equal to "ben") from set?
That may be the point. A set is just a collection of objects: you can only check whether an object is in it, or iterate through the whole set.
Creating a set is just like creating a vector. Where you have
std::vector<int> my_vec;
(or some other type rather than int) replace it with
std::set<int> my_set;
To add elements to the set, use insert:
my_set.insert(3);
my_set.insert(2);
my_set.insert(1);

Access elements of map which is member of vector without creating copy

I have the following data types:
typedef std::map <std::string,std::string> leaf;
typedef std::map <std::string,leaf> child;
typedef std::vector<child> parent;
Now I want to access the parent element at index 0 and the child element with key "x", and then perform some operation on its values.
The first way of doing this would be:
parentobject[0]["x"]["r"]
But I have to repeat these indexes every time I want to access that value.
The second way of doing this would be:
std::string value=parentobject[0]["x"]["r"];
Then use the value object. But the problem with this approach is that this line creates a copy of the string.
Is there a better way to access variable without creating copy?
You can use a reference to avoid the copy:
std::string & value = parentobject[x][y][z];
Or, will you be okay with this instead:
//define a lambda in the scope; [&] captures parentobject by reference if it is a local variable
auto get = [&] (int x, std::string const & y, std::string const & z)
-> std::string &
{
return parentobject[x][y][z];
};
//then use it as many times as you want in the scope
std::string & value = get(0, "x", "r");
get(1, "y", "s") = "modify";
Use a reference:
const std::string& value = parentobject[0]["x"]["r"];
Now you can refer to value anywhere you like (within the same block scope) without having to perform the map lookups again.
Remove the const if you really need to.
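One thing to keep in mind with the reference approach: operator[] on a std::map inserts a default-constructed entry when the key is missing, whereas at() throws std::out_of_range. The sketch below wraps the lookup with at(), so a mistyped key cannot silently grow the maps. value_at is a hypothetical helper name; the typedefs follow the question.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> leaf;
typedef std::map<std::string, leaf> child;
typedef std::vector<child> parent;

// Returns a reference to the stored string, so no copy is made.
// Throws std::out_of_range if the index or either key does not exist.
std::string& value_at(parent& p, std::size_t index,
                      const std::string& childKey, const std::string& leafKey) {
    return p.at(index).at(childKey).at(leafKey);
}
```

The returned reference stays valid as long as the element is not erased from its map and the vector is not resized.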
Please purchase and read one of these books in order to learn C++'s basic features.

How to make std::map compare to work on multiple data types?

I have an entire table stored in std::deque<record *> and I need to allow the user to sort the table on any column. The table is presented to the user in a list box format.
Each record consists of multiple strings (struct of strings). However, the fields are of different types i.e., time (HH:MM:SS), float, and strings, even though they are all stored as strings.
The user is permitted to sort on any of these columns. When the user clicks on the column, I store each record in a multimap so that the table is shown in sorted format to the user.
However, since the columns are of different types, how do I write a single compare method, that handles all these efficiently?
I thought of the following ways
Use different maps for each type and write one compare function class for each of the maps.
Use a single map, with a compare class that handles all three different types. But for each insertion, the comparison class has to decide the type , and insert accordingly.
Is there a better way than these two?
Example:
struct ltDataCompare
{
bool operator()( const CString& csData1, const CString& csData2) const
{
if ( isTimeFormat(csData1) && isTimeFormat(csData2) )
{
// Do time relevant comparison
}
else if ( isNumberFormat( csData1 ) && isNumberFormat(csData2) )
{
double dPrice1 = atof((LPCTSTR)csData1);
double dPrice2 = atof((LPCTSTR)csData2);
return ( dPrice1 < dPrice2);
}
return ( csData1 < csData2 );
}
};
std::multimap<CString,list_record_t*,ltDataCompare> _mapAllRecords; // Used only for sorting
You can't re-sort a map or multimap - once an item is inserted, its position is locked. It would be better to use a different container such as a vector and sort it when necessary.
The nice thing about a comparison class is that it is allowed to contain state. You can have a member with some constant or pointer to determine which comparison method to use.
You can use the same principle to choose which field to sort on.
struct ltDataCompare
{
ltDataCompare(int field, int method) : m_field(field), m_method(method) {}
bool operator()( const record& left, const record& right) const
{
if (m_method == enumTimeFormat)
return CompareTimes(left[m_field], right[m_field]);
else if (m_method == enumNumberFormat)
return CompareNumbers(left[m_field], right[m_field]);
// ...
}
int m_field;
int m_method;
};
std::sort(table.begin(), table.end(), ltDataCompare(0, enumTimeFormat));
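A self-contained variant of the same idea is sketched below, using only the standard library. Everything here is an assumption for illustration: Record is modelled as a vector of strings, Method, RecordLess, to_seconds and sort_table are hypothetical names, times are assumed to be well-formed "HH:MM:SS", and numeric fields are assumed to parse with std::stod (which throws on invalid input).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Record = std::vector<std::string>;  // one string per column (assumption)

enum class Method { Text, Number, Time };

// Converts a well-formed "HH:MM:SS" string to seconds.
int to_seconds(const std::string& t) {
    return std::stoi(t.substr(0, 2)) * 3600 +
           std::stoi(t.substr(3, 2)) * 60 +
           std::stoi(t.substr(6, 2));
}

// Stateful comparator: the column and the comparison method are chosen
// once, when the comparator is constructed, not on every comparison.
struct RecordLess {
    std::size_t field;
    Method method;

    bool operator()(const Record& left, const Record& right) const {
        const std::string& a = left.at(field);
        const std::string& b = right.at(field);
        switch (method) {
            case Method::Number: return std::stod(a) < std::stod(b);
            case Method::Time:   return to_seconds(a) < to_seconds(b);
            case Method::Text:   break;
        }
        return a < b;  // plain lexicographic comparison
    }
};

// Sorts the whole table on one column with the given method.
void sort_table(std::vector<Record>& table, std::size_t field, Method method) {
    std::sort(table.begin(), table.end(), RecordLess{field, method});
}
```

When the user clicks a column header, the caller would pass that column's index and its known type, so the comparator never has to guess the format from the string contents.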
You could be more elegant about it (though I don't know that you'd save yourself any work) if you had a class with a < operator in it for each of the types. If you have a superclass that has a virtual < operator then you can use it as the key type, as in
std::multimap< superclass, list_record_t >
Now you can use any of the child types as the actual keys (so long as you remain consistent).
Actually I'm not sure whether this is more clever or more elegant. More clever is generally a bad thing (as it means more obscure/less maintainable). If it makes for fewer lines of code, that's usually a good thing.