STL set::find redefined search


My program is based around a set of pairs, namely
typedef std::pair<int,int> innerPair;
typedef std::pair<innerPair,int> setElement;
std::set<setElement> Foo;
The innerPair element is what really defines the setElement, but I need to attach a group ID to every element, hence the latter setElement definition.
In the remainder of the program, I need to find innerPair regardless of their group ID so I basically need a function
std::set<setElement>::iterator find(innerPair);
that will find the innerPair regardless of its group ID. As it stands I could simply cycle through all available group IDs and make multiple find() calls, but that is far from efficient.
Is there a concise way of defining a find( ... ) member function that performs some sort of wildcard search, or do I need to overload it with my own definition?

If you have multiple elements with the same inner pair and a differing group id, you could use std::multimap<innerPair, int>.
This allows you to store multiple elements with the same innerPair.
It also simplifies searching with lower_bound/upper_bound or equal_range.
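The multimap approach can be sketched as follows. This is a minimal illustration, not the asker's code: the typedef GroupMap and the helper groupsOf are names made up here. equal_range returns the half-open range of all entries stored under one innerPair, whatever their group IDs.

```cpp
#include <map>
#include <utility>
#include <vector>

typedef std::pair<int, int> innerPair;
typedef std::multimap<innerPair, int> GroupMap; // innerPair -> group ID (hypothetical name)

// Hypothetical helper: collect every group ID stored under `key`.
std::vector<int> groupsOf(const GroupMap& m, const innerPair& key)
{
    std::vector<int> ids;
    // equal_range yields [first, last) covering all entries with this key.
    std::pair<GroupMap::const_iterator, GroupMap::const_iterator> range = m.equal_range(key);
    for (GroupMap::const_iterator it = range.first; it != range.second; ++it)
        ids.push_back(it->second);
    return ids;
}
```

An empty result means the innerPair is not present under any group ID.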

I see two possible designs for this. One may be more applicable than the other.
It may be more appropriate for innerPair to be a struct with 3 members (first, second, and group ID) or an std::tuple<int, int, int>. Is the group ID part of the innerPair objects? If so, then I suggest this is the better design.
If not (and to be honest, I think in your situation it's not), you should be using an std::map<innerPair,int> to create a mapping from innerPair objects to group IDs. You can then find an element easily with:
std::map<innerPair,int> mapping;
// Fill your map
innerPair key = {1, 2};
auto found_iter = mapping.find(key);
You can also get the group ID for a specific innerPair with:
int group_id = mapping[key];
You don't need to provide a custom comparator because operator< is already defined for std::pair .

If you want to search by a partial object, you are probably best off not to use a std::set<...> but rather a std::map<...>:
std::map<std::pair<int, int>, int>
This has nearly the value type you targeted for your std::set<...> anyway (the difference is that the first member is declared to be const).
If you really insist on using a std::set<...> you'd need to create a comparator type which ignores the second member (the group ID). I'm not sure if std::set<...> supports mixed type comparisons (I think it does not).
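If the std::set keeps its default ordering, there is another option that the answer above does not mention: std::pair compares lexicographically, so lower_bound with the smallest possible group ID lands on the first element carrying a given innerPair. The sketch below assumes exactly that default ordering; findInner is a hypothetical helper name.

```cpp
#include <limits>
#include <set>
#include <utility>

typedef std::pair<int, int> innerPair;
typedef std::pair<innerPair, int> setElement;

// Hypothetical helper: first element whose innerPair equals `key`, or s.end().
std::set<setElement>::const_iterator
findInner(const std::set<setElement>& s, const innerPair& key)
{
    // (key, INT_MIN) sorts at or before every element that carries `key`,
    // so lower_bound finds the first such element in O(log n).
    std::set<setElement>::const_iterator it =
        s.lower_bound(setElement(key, std::numeric_limits<int>::min()));
    return (it != s.end() && it->first == key) ? it : s.end();
}
```

If several group IDs share the same innerPair, incrementing the returned iterator while it->first == key visits all of them.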

You could use a multiset, with a custom comparison function, or functor for instance:
struct innerPairCompare
{
    bool operator () (const setElement &a, const setElement &b) const
    {
        const innerPair &a_ = a.first;
        const innerPair &b_ = b.first;
        // Order by innerPair only (descending); the group ID is ignored.
        return a_.first > b_.first || (a_.first == b_.first && a_.second > b_.second);
    }
};
then use it for your multiset:
std::multiset<setElement,innerPairCompare> Foo;
It will store all the setElements with the same innerPair next to each other.
Alternatively, if you need all the innerPair with a given GroupID, use a function comparing on groupID.
Finally, if you don't really need to keep the GroupID and the innerPair together, you could use a map (or multimap), using the GroupID or the innerPair as the Key.

If you want to keep using an std::set then you can use std::find_if with a custom predicate, take a look at this answer.
Basically you will define a function
bool pairCorresponds(const setElement& element)
that will do the work for you.
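A minimal sketch of the std::find_if route, assuming the typedefs from the question. The predicate object HasInnerPair and the helper findLinear are hypothetical names. Note that this is a linear scan, so it costs O(n) rather than the O(log n) of a set lookup.

```cpp
#include <algorithm>
#include <set>
#include <utility>

typedef std::pair<int, int> innerPair;
typedef std::pair<innerPair, int> setElement;

// Predicate object: true when an element's innerPair matches the target.
struct HasInnerPair
{
    innerPair target;
    explicit HasInnerPair(const innerPair& t) : target(t) {}
    bool operator()(const setElement& e) const { return e.first == target; }
};

// Hypothetical helper: scans the whole set; returns s.end() when nothing matches.
std::set<setElement>::const_iterator
findLinear(const std::set<setElement>& s, const innerPair& key)
{
    return std::find_if(s.begin(), s.end(), HasInnerPair(key));
}
```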

Related

Efficient way to store C++ object with multiple fields of different types in std::set/std::map

I have a general question. Let's assume I have a C++ class with multiple fields of different types. I want/need to store the objects of this class in std::set or std::map (in order to access them in O(log(N))).
In order to do it I need to overload operator< BUT what if operator< has no logical meaning in my case? For example I have a class faceDescription which contains fields like eye color, nose type etc.
The most obvious would be to implement operator< just by comparing each field like this:
if (fieldA < other.fieldA)
{
return true;
}
else if (fieldA == other.fieldA && fieldB < other.fieldB)
...
and so on. But if I have many fields this method will be too long with too many branches, hardly readable and probably hard to maintain.
I was thinking about "packing" all the fields into a buffer and then compare it with something like std::memcmp but the point is that some fields may be pointers or different classes/structs.
So my question:
Is there an efficient and generic way to define a "unique identifier" to the class (maybe with some std methods) based on the fields values so that this "unique identifier" could be used to compare/sort objects of that class?
EDIT
Just an example which may explain the motivation and should be clear for everyone:
Assume video processing with face recognition, where the program receives face description objects and has to count how many times each face appeared during the given video. There may be thousands/millions of faces. So the efficient way to do it is to maintain a map with the face description object as the key and the number of appearances as the value.
Thanks in advance!

Your question is actually more like three questions packed in one:
I need to overload operator< BUT what if operator< has no logical meaning in my case?
You don't really need to overload operator<; just provide a custom comparer to std::set or std::map (it's the second template argument of std::set and the third of std::map); the default is std::less (which uses operator<), but you can provide any binary functor that defines a strict weak ordering relation between your elements.
The most obvious would be to implement operator< just by comparing each field [...] But if I have many fields this method will be too long with too many branches, hardly readable and probably hard to maintain.
Unfortunately, C++ has no reflection (not even compile-time reflection, which would solve the situation here), so there's no easy way to automate the "remember to add all the fields to the comparer whenever I add them to the struct" step.
However, the lexicographical comparison of a tuple of heterogeneous values is already solved (in C++11) by std::tuple; you can easily implement an operator< (or, FWIW, your custom comparer) by using std::tie (from <tuple>) and calling < on the returned tuples:
bool myComparer(const MyStruct &a, const MyStruct &b) {
return std::tie(a.member1, a.member2, a.member3) < std::tie(b.member1, b.member2, b.member3);
}
You can find a similar example at its reference page on cppreference.com.
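Putting the two points together (a custom comparer plus std::tie), the face-counting map from the question might look roughly like this. The struct FaceDescription, its fields, and the helper recordAppearance are invented for illustration and are not taken from the question.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for the question's face description.
struct FaceDescription
{
    std::string eyeColor;
    std::string noseType;
    int age;
};

// Comparator functor: lexicographic order over all fields via std::tie.
struct FaceLess
{
    bool operator()(const FaceDescription& a, const FaceDescription& b) const
    {
        return std::tie(a.eyeColor, a.noseType, a.age)
             < std::tie(b.eyeColor, b.noseType, b.age);
    }
};

// Map from face to number of appearances; the struct itself needs no operator<.
typedef std::map<FaceDescription, int, FaceLess> FaceCounter;

// Records one appearance and returns the updated count.
int recordAppearance(FaceCounter& counter, const FaceDescription& face)
{
    return ++counter[face];
}
```

Adding a field to the struct still means adding it to both std::tie calls by hand.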
Is there an efficient and generic way to define a "unique identifier" to the class (maybe with some std methods) based on the fields values so that this "unique identifier" could be used to compare/sort objects of that class?
Creating a unique identifier to compare/sort objects (i.e. satisfies the constraints of a strict weak ordering) depends on the exact details of your object - but probably, if you say that your objects do not have a meaningful ordering (besides an artificial one that you can impose by lexicographically comparing their components) you don't actually want such a thing; you just want to be able to use associative containers.
Enter std::unordered_map and std::unordered_set (actually hashtables behind the standardese decoy names); what they require is a "somewhat unique" identifier that can quickly discriminate between different keys, AKA a hash function, and they can retrieve your element on average in O(1) time. In C++11, this function is std::hash.
The standard already defines specializations of it for primitive types plus some other types; you can define your own hash (following the standard signature; see at the bottom for an example specialization) by combining the hashes of the individual components of your struct; the combination can go from plain XOR or sum to something more elaborate like this.

You can create your own hash function that takes the class members as arguments, and then store your objects in a std::map or std::unordered_map using these hash values as keys, so that you don't have to compare a new object against all the objects in the map. You can also use std::hash for this purpose.
You can specialize std::hash for a user defined class (from the reference):
#include <iostream>
#include <functional>
#include <string>
struct S
{
std::string first_name;
std::string last_name;
};
namespace std
{
template<>
struct hash<S>
{
typedef S argument_type;
typedef std::size_t result_type;
result_type operator()(argument_type const& s) const
{
result_type const h1 ( std::hash<std::string>()(s.first_name) );
result_type const h2 ( std::hash<std::string>()(s.last_name) );
return h1 ^ (h2 << 1);
}
};
}
int main()
{
S s;
s.first_name = "Bender";
s.last_name = "Rodriguez";
std::hash<S> hash_fn;
std::cout << "hash(s) = " << hash_fn(s) << "\n";
}
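To use such a type as a key in std::unordered_map, an equality comparison is needed in addition to the hash. The following sketch passes the hasher as a template argument instead of specializing std::hash; the names Name, NameHash, NameCounter and countName are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Name
{
    std::string first_name;
    std::string last_name;
};

// Unordered containers need equality in addition to a hash.
bool operator==(const Name& a, const Name& b)
{
    return a.first_name == b.first_name && a.last_name == b.last_name;
}

// Hasher supplied as a template argument rather than via std::hash.
struct NameHash
{
    std::size_t operator()(const Name& n) const
    {
        const std::size_t h1 = std::hash<std::string>()(n.first_name);
        const std::size_t h2 = std::hash<std::string>()(n.last_name);
        return h1 ^ (h2 << 1); // same combination as in the example above
    }
};

typedef std::unordered_map<Name, int, NameHash> NameCounter;

// Increments and returns the count stored for `n`.
int countName(NameCounter& counter, const Name& n)
{
    return ++counter[n];
}
```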

Have you considered using tuple?
// Multi-index map
map<tuple<int, char, float>, string> m;
m[make_tuple(31, 'd', 23.5f)] = "Just an idea";

Priority queue to push pair and int

I want to use priority_queue like this:
priority_queue< pair< int, int> ,int ,cmp >
The comparison should be based on the int value in non-decreasing order.
Example:
((2,5),1),((2,5),2),((2,5),3)

Read the template parameters of std::priority_queue again. The second parameter is the underlying container. You can't use an int.
What you seem to be asking is how to store a pair and an int in a priority queue and sort by the int. Well, you've already figured out how to store a pair of ints. Simply expand that idea and store a pair of a pair and an int. That's a naïve solution though. Instead, I recommend using a struct with the pair and the int as members so that you can give them descriptive names. Consider using a struct for the pair of ints too. Then simply use a compare functor which compares only the third int, in the order of your choosing, ignoring the pair.
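A minimal sketch of that suggestion, with invented names (Entry, ByPriority, EntryQueue, popOrder). Because std::priority_queue keeps the "largest" element on top, the functor compares with > so that the smallest int comes out first.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Hypothetical element type: a pair of ints plus the int used as priority.
struct Entry
{
    std::pair<int, int> coords;
    int priority;
};

// Compares only the priority and ignores the pair. Using > turns the
// default max-heap into a min-heap, giving non-decreasing pop order.
struct ByPriority
{
    bool operator()(const Entry& a, const Entry& b) const
    {
        return a.priority > b.priority;
    }
};

// The second template argument is the underlying container, not an int.
typedef std::priority_queue<Entry, std::vector<Entry>, ByPriority> EntryQueue;

// Drains a copy of the queue and returns the priorities in pop order.
std::vector<int> popOrder(EntryQueue q)
{
    std::vector<int> order;
    while (!q.empty())
    {
        order.push_back(q.top().priority);
        q.pop();
    }
    return order;
}
```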

priority_queue holds only one element type, although that type can be anything. To keep a pair, an int and any other component together in the priority queue, you need to bundle them into a single struct. Then you have to define operator< to tell the compiler how to compare the elements of your type.
bool operator < ( const structName& left, const structName& right)
{
    return left.number < right.number;
}
This means that the comparison is done on the member named number.

Creating unordered_set of unordered_set

I want to create a container that will store unique sets of integers inside.
I want to create something similar to
std::unordered_set<std::unordered_set<unsigned int>>
But g++ does not let me do that and says:
invalid use of incomplete type 'struct std::hash<std::unordered_set<unsigned int> >'
What I want to achieve is to have unique sets of unsigned ints.
How can I do that?

I'm adding yet another answer to this question as currently no one has touched upon a key point.
Everyone is telling you that you need to create a hash function for unordered_set<unsigned>, and this is correct. You can do so by specializing std::hash<unordered_set<unsigned>>, or you can create your own functor and use it like this:
unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor> s;
Either way is fine. However there is a big problem you need to watch out for:
For any two unordered_set<unsigned> that compare equal (x == y), they must hash to the same value: hash(x) == hash(y). If you fail to follow this rule, you will get run time errors. Also note that the following two unordered_sets compare equal (using pseudo code here for clarity):
{1, 2, 3} == {3, 2, 1}
Therefore hash({1, 2, 3}) must equal hash({3, 2, 1}). Said differently, the unordered containers have an equality operator where order does not matter. So however you construct your hash function, its result must be independent of the order of the elements in the container.
Alternatively you can replace the equality predicate used in the unordered_set such that it does respect order:
unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor,
my_unordered_equal> s;
The burden of getting all of this right makes:
unordered_set<set<unsigned>, my_set_hash_functor>
look fairly attractive. You still have to create a hash functor for set<unsigned>, but now you don't have to worry about getting the same hash code for {1, 2, 3} and {3, 2, 1}. Instead you have to make sure these hash codes are different.
I note that Walter's answer gives a hash functor that has the right behavior: it ignores order in computing the hash code. But then his answer (currently) tells you that this is not a good solution. :-) It actually is a good solution for unordered containers. An even better solution would be to return the sum of the individual hashes instead of hashing the sum of the elements.
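The "sum of the individual hashes" idea can be sketched as below. The names InnerSet, SumOfHashes and SetOfSets are made up for this illustration. Addition is commutative, so the result does not depend on the order in which the elements are visited.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

typedef std::unordered_set<unsigned> InnerSet;

// Sums the hashes of the elements. Two sets that compare equal may iterate
// in different orders, but they still produce the same sum.
struct SumOfHashes
{
    std::size_t operator()(const InnerSet& s) const
    {
        std::size_t h = 0; // unsigned arithmetic wraps around on overflow
        for (InnerSet::const_iterator it = s.begin(); it != s.end(); ++it)
            h += std::hash<unsigned>()(*it);
        return h;
    }
};

// The default equality (operator== of unordered_set) already ignores order.
typedef std::unordered_set<InnerSet, SumOfHashes> SetOfSets;
```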

You can do this, but like every unordered_set/map element type, the inner unordered_set now needs a hash function to be defined. It does not have one by default but you can write one yourself.

What you have to do is to define an appropriate hash for keys of type std::unordered_set<unsigned int> (since operator== is already defined for this key, you will not need to also provide the EqualKey template parameter for std::unordered_set<std::unordered_set<unsigned int>, Hash, EqualKey>).
One simple (albeit inefficient) option is to hash on the total sum of all elements of the set. This would look similar to this:
// requires <numeric> for std::accumulate
template<typename T>
struct hash_on_sum
: private std::hash<typename T::value_type>
{
    typedef typename T::value_type count_type;
    typedef std::hash<count_type> base;
    std::size_t operator()(T const&obj) const
    {
        return base::operator()(std::accumulate(obj.begin(),obj.end(),count_type()));
    }
};
typedef std::unordered_set<unsigned int> inner_type;
typedef std::unordered_set<inner_type, hash_on_sum<inner_type>> set_of_unique_sets;
However, while simple, this is not good, since it does not guarantee the following requirement. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max().

std::unordered_set<unsigned int> does not meet the requirements to be an element of a std::unordered_set, since there is no default hash function (i.e. std::hash<> is not specialized for std::unordered_set<unsigned int>).
You can provide one (it should be fast, and avoid collisions as much as possible):
class MyHash
{
public:
std::size_t operator()(const std::unordered_set<unsigned int>& s) const
{
return ... // return some meaningful hash of the set elements
}
};
int main() {
std::unordered_set<std::unordered_set<unsigned int>, MyHash> u;
}
You can see very good examples of hash functions in this answer.
You should really provide both a Hash and an Equality function meeting the standard requirement of an Unordered Associative Container.

Hash(), the default function used to create hashes of your set's elements, does not know how to deal with an entire set as an element. Create a hash function that creates a unique value for every unique set and you're good to go.
This is the constructor for an unordered_set
explicit unordered_set( size_type bucket_count = /*implementation-defined*/,
const Hash& hash = Hash(),
const KeyEqual& equal = KeyEqual(),
const Allocator& alloc = Allocator() );
http://en.cppreference.com/w/cpp/container/unordered_set/unordered_set
Perhaps the simplest thing for you to do is create a hash function for your unordered_set<unsigned int>
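std::size_t my_hash(const std::unordered_set<unsigned int>& element)
{
    std::size_t h = 0;
    for (unsigned int e : element)
    {
        // some sort of math to create a hash for every unique set;
        // summing the element hashes is one order-independent choice
        h += std::hash<unsigned int>()(e);
    }
    return h;
}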
edit: as seen in another answer, which I forgot completely, the hashing function must be within a Hash object. At least according to the constructor I pasted in my answer.

There's a reason there is no hash for unordered_set. An unordered_set is mutable by default, and a hash must hold the same value for as long as the object is in the unordered_set. Thus your elements must be immutable. This is not guaranteed by using the modifier const&, as it only guarantees that the main unordered_set and its methods will not modify the sub-unordered_set. Not using a reference could be a safe solution (you'd still have to write the hash function) but do you really want the overhead of moving/copying unordered_sets?
You could instead use some kind of pointer. This is fine; a pointer is only a memory address and your unordered_set itself does not relocate (it might reallocate its element pool, but who cares ?). Therefore your pointer is constant and it can hold the same hash for its lifetime in the unordered_set.
( EDIT: as Howard pointed out, you must ensure that, whatever order the elements are stored in, two sets with the same elements are considered equal. By enforcing an order in how you store your integers, you get for free that two equal sets correspond to two equal vectors. )
As a bonus, you now can use a smart pointer within the main set itself to manage the memory of sub-unordered_set if you allocated them on the heap.
Note that this is still not the most efficient way to get a collection of sets of int. To make your sub-sets, you could write a quick wrapper around std::vector that stores the ints, ordered by value. ints are small and cheap to compare, and a dichotomic (binary) search is only O(log n) in complexity. A std::unordered_set is a heavy structure, and what you lose by going from O(1) to O(log n) you gain back by having compact memory for each set. This shouldn't be too hard to implement but is almost guaranteed to be better in performance.
A harder-to-implement solution would involve a trie.
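The sorted-vector idea might look like the sketch below. It is deliberately simplified: instead of a wrapper class and an unordered outer container, it uses plain functions and a std::set as the outer container, since std::vector already provides operator<. The names SortedInts, makeSortedSet, contains and UniqueSets are assumptions made for this example.

```cpp
#include <algorithm>
#include <set>
#include <vector>

typedef std::vector<unsigned> SortedInts;

// Normalizes a list of ints into canonical form: sorted, without duplicates.
// Two inputs holding the same values then become equal vectors.
SortedInts makeSortedSet(SortedInts values)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Membership test by binary search, O(log n) on the canonical form.
bool contains(const SortedInts& s, unsigned value)
{
    return std::binary_search(s.begin(), s.end(), value);
}

// Collection of unique sets, compared lexicographically as vectors.
typedef std::set<SortedInts> UniqueSets;
```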

which element will be returned from std::multimap::find, and similarly std::multiset::find?

Most likely this question is a duplicate but I could not find a reference to it.
I'm looking at std::multiset::find & std::multimap::find functions and I was wondering which element will be returned if a specific key was inserted multiple times?
From the description:
Notice that this function returns an iterator to a single element (of
the possibly multiple equivalent elements)
Question
Is it guaranteed that the single element is the first one inserted or is it random?
Background
The reason I'm asking is that I'm implementing a multimap-like class:
struct Item
{
    string m_name;
};
typedef std::vector<Item> Item_vector;
class MyItemMultiMap
{
public:
// forgive me for not checking if key exist in the map. it is just an example.
void add_item( const Item& v ) { m_map[v.m_name].push_back(v); }
// is returning the first item in the vector mimic std::multimap::find behavior?
Item& get_item( const string& v ) { return m_map[v][0]; }
private:
std::map<string,Item_vector> m_map;
};
I'd like get_item() to work exactly like std::multimap::find. Is it possible? If so, how would it be implemented?

The find method may return an arbitrary one if more than one is present, though your STL implementation might indeed just give the first one.
It's safer to use the 'lower_bound' method, and ++ iterate from there (see std::multimap::lower_bound). Do note though that 'lower_bound' returns an iterator to a different element (the first one whose key is not less than the key you asked for), or end(), if what you're looking for isn't present!

The C++ standard says that for any associative container a, a.find(k) "returns an iterator pointing to an element with the key equivalent to k, or a.end() if such an element is not found", and it doesn't impose any additional requirements on multimap. Since it doesn't specify which element is returned, the implementation is permitted to return any matching element.
If you're trying to imitate the exact behavior of multimap on the platform where you're running, that's bad news, but if your goal is just to satisfy the same requirements as multimap, it's good news: you can return any matching element that you want to, and in particular it's fine to just always return the first one.

http://en.cppreference.com/w/cpp/container/multimap/find
Finds an element with key key. If there are several elements with key
in the container, the one inserted earlier is selected.
So, an iterator to the first element will be returned.
In general, I find equal_range to be the more useful method, returning a pair of iterators pointing respectively at the first, and after the last, elements matching the key.
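A small sketch of the equal_range approach, assuming a C++11 library; ItemMap and firstValue are hypothetical names. Since C++11, entries with equivalent keys keep their relative insertion order, so the start of the range is the earliest-inserted entry.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, int> ItemMap;

// Hypothetical helper: value of the earliest-inserted entry for `key`,
// or `fallback` when the key is absent.
int firstValue(const ItemMap& m, const std::string& key, int fallback)
{
    // [range.first, range.second) covers every entry with this key;
    // an empty range means the key is not present.
    std::pair<ItemMap::const_iterator, ItemMap::const_iterator> range = m.equal_range(key);
    return range.first != range.second ? range.first->second : fallback;
}
```

Checking that the range is non-empty avoids the pitfall of lower_bound pointing at an unrelated element.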

Does QMap support custom comparator functions?

I couldn't find a way to set a custom comparator function for QMap, like I can for std::map (the typename _Compare = std::less<_Key> part of its template arguments).
Does QMap have a way to set one?

It's not documented (and it's a mistake, I think), but you can specialize the qMapLessThanKey template function for your types (cf. the source). That will allow your type to use some other function rather than operator<:
template<> bool qMapLessThanKey<int>(const int &key1, const int &key2)
{
return key1 > key2; // sort by operator> !
}
Nonetheless, std::map has the advantage that you can specify a different comparator for each map, while here you can't (all maps using your type must see that specialization, or everything will fall apart).

No, as far as I know QMap doesn't have that functionality; it requires its key type to have operator<, so you are stuck with std::map if you really need that compare functionality.

QMap's key type must provide operator<(). QMap uses it to keep its items sorted, and assumes that two keys x and y are equal if neither x < y nor y < x is true.
If your key type lacks one, overload operator<() for it.
