What container to store unique values with no operator< defined - c++

I need to store unique objects in a container. The object type provides operator== and operator!=, but neither operator< nor operator>.
I can't use std::set, as it requires an operator<.
I can't use std::unordered_set, as it requires a hash function and I have none. Let's say I can't write one considering my object type (or I'm lazy).
Am I really forced to use a std::vector and make sure myself that items do not get duplicated in the container (using std::find, which uses operator==)?
Is there really no container that could be used to store unique items only using the operator==?

There's indeed no standard container, and that's because it would be inefficient. Each insertion would be O(N), to be precise: exactly the brute-force search you imagine.
Both std::set<T> and std::unordered_set<T> avoid a brute-force search by taking advantage of a non-trivial property of T. Lacking either property, any of the existing N members of a container could be equal to a potential new value V, and you must therefore compare all N members using operator== repeatedly.

"Let's say I can't write a hash function considering my object type (or I'm lazy)."
Well, you're lazy, but I'll write one for you anyway : template<typename T> size_t degenerate_hash(T) { return 0; }.
Of course, this means you get O(N) performance because every value collides with every other value, but that was the best possible outcome anyway.
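For illustration, here is a minimal sketch of how such a degenerate hash could be plugged into std::unordered_set. The container wants a hash functor type rather than a function template, so the sketch wraps the idea in a struct. Widget, DegenerateHash and UniqueWidgets are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical type that offers only equality comparison.
struct Widget {
    int id;
    bool operator==(const Widget& other) const { return id == other.id; }
    bool operator!=(const Widget& other) const { return !(*this == other); }
};

// Functor form of the degenerate hash: every value lands in the same bucket.
struct DegenerateHash {
    std::size_t operator()(const Widget&) const noexcept { return 0; }
};

// Uniqueness is enforced through operator== (std::equal_to is the default).
using UniqueWidgets = std::unordered_set<Widget, DegenerateHash>;
```

Inserting into a UniqueWidgets rejects duplicates as usual, but every insert and lookup degrades to a linear scan of the single bucket.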

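Use a std::vector, and before each std::vector::push_back or std::vector::insert, call std::find to check whether the element already exists in the vector.
Alternatively, remove duplicates once all insertions are done. Note that std::unique in combination with std::vector::erase only removes adjacent duplicates, so without an ordering to sort by first it is not sufficient on its own; you would still need an O(N^2) pass based on operator==.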
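The check-before-insert approach could look roughly like this. insert_unique is a hypothetical helper name, and the sketch assumes T is copyable or movable and provides operator==.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Appends value only if no equal element is already stored.
// Returns true when the value was inserted. Cost: O(N) per call.
template <typename T>
bool insert_unique(std::vector<T>& items, T value) {
    if (std::find(items.begin(), items.end(), value) != items.end()) {
        return false;  // an equal element already exists
    }
    items.push_back(std::move(value));
    return true;
}
```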

Related

How to implement set::find() to match only the key of a pair?

I have a container class which stores data in a std::set. I don't need or use the extended facilities provided by std::map. There is a method values() which returns a const reference to the private set so if I were to use a map instead then I would have to copy the entire container. I want to keep it as a std::set.
The set contains objects of a class similar to std::pair with a key and a value and implements operator < for use in a set.
I have a method in the container which accepts the 'key' portion of the pair for the purpose of searching the set and returning a complete pair while only matching the key.
I can iterate through the set sequentially but then I lose the O(log N).
Also note that the set needs to be sorted, which removes the option of using an unordered_set.
It's not clear exactly what your operator< actually compares, but the long and the short of it is that with a std::set, the only way to efficiently search the set is by using its defined comparison function.
Based on your question, I am assuming that your set is
std::set<std::pair<firstType, secondType>, ComparisonClass>
With ComparisonClass implementing the strict weak ordering. Or, you could also be using a:
std::set<PairClass>
With PairClass being a subclass of std::pair that implements an operator< providing the strict weak ordering. Your question appears to describe one or the other, but either way the two alternatives are logically equivalent for the purpose of the following answer:
If your operator< implements strict weak ordering based on both the value pair's first and second, then that's pretty much it. You can only execute the set's built-in logarithmic search by searching for the same first and second.
There's no easy way to do anything other than that. So, what now?
Well, the root problem seems to be that you might not be using the right container. Consider the following container that, with a little bit of work, will be equivalent to your set:
std::multimap<firstType, std::set<secondType>>
That is, your container is a multimap keyed by your pair's first, with the value of your multimap being a std::set of all the secondType that are paired up with a given firstType.
The only thing you have to be careful about here is to define the insert and remove operations on this container so that you never end up with a firstType mapped to an empty std::set value. As long as this condition is met, this should be logically equivalent to a std::set of your std::pairs. Furthermore:
1) You can still implement a logarithmic search for a firstType+secondType by first doing a logarithmic search on the firstType, grabbing the value std::set, and then executing a logarithmic search on that. Logically equivalent.
2) You can implement a logarithmic search for just the firstType by doing only the first half of the full search. This gives you the value std::set, which provides the equivalent of all pairs that have the same firstType.
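A rough sketch of such a two-level container follows. It uses a plain std::map for the outer level, on the assumption that a single std::set per key is all that is needed. PairIndex, Key, Value and the member names are hypothetical stand-ins for your own types.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical key and value types standing in for firstType / secondType.
using Key = std::string;
using Value = int;

class PairIndex {
public:
    void insert(const Key& k, const Value& v) { data_[k].insert(v); }

    // Removes one pair; drops the key once its set becomes empty.
    void remove(const Key& k, const Value& v) {
        auto it = data_.find(k);
        if (it == data_.end()) return;
        it->second.erase(v);
        if (it->second.empty()) data_.erase(it);
    }

    // Full search: two logarithmic lookups.
    bool contains(const Key& k, const Value& v) const {
        auto it = data_.find(k);
        return it != data_.end() && it->second.count(v) != 0;
    }

    // Key-only search: returns the values paired with k, or nullptr.
    const std::set<Value>* values_for(const Key& k) const {
        auto it = data_.find(k);
        return it == data_.end() ? nullptr : &it->second;
    }

private:
    std::map<Key, std::set<Value>> data_;
};
```

The remove member is what keeps the "no key with an empty set" condition from point above the list intact.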

Binary_search in STL set over set's member function find?

Why do we have 2 ways like above to search for an element in the set?
Also, the find algorithm can be used to find an element in a list or a vector, but what would be the harm in these containers providing a member function as well, since member functions are expected to be faster than a generic algorithm?
Why do we need the remove algorithm and all the drama about erase-remove, where remove just shifts the elements and erase is then used to delete the actual elements? Just like the STL list provides a member function remove, why can't the other containers just offer a remove function and be done with it?
Binary_search in STL set over set's member function find?
Why do we have 2 ways like above to search for an element in the set?
std::binary_search() returns a bool, whereas set::find() returns an iterator. In order to compare apples to apples, the algorithm to compare set::find() with is std::lower_bound(), which also returns an iterator.
You can apply std::lower_bound() on an arbitrary sorted range specified by a pair of (forward / bidirectional / random access) iterators and not only on a std::set. So having std::lower_bound() is justified. As std::set happens to be a sorted range, you can call
std::lower_bound(mySet.begin(), mySet.end(), value);
but the
mySet.find(value);
call is not only more concise, it is also more efficient. If you look into the implementation of std::lower_bound() you will find something like std::advance(__middle, __half); whose complexity depends on the iterator category (forward, bidirectional, or random access). For std::set, the iterators are bidirectional, so each advance is linear in the distance moved: std::lower_bound() still performs O(log N) comparisons, but it needs O(N) iterator increments overall. Ouch! In contrast, std::set::find() is guaranteed to perform the whole search in logarithmic time, which the underlying implementation (a red-black tree in the case of libstdc++) makes possible. So offering set::find() is also justified, as it is more efficient than calling std::lower_bound() on a std::set.
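To make the comparison concrete, here is a small sketch of both lookups on a std::set<int>. contains_member and contains_generic are hypothetical helper names; both return the same answer and differ only in cost.

```cpp
#include <algorithm>
#include <set>

// Preferred: the member function walks the tree, O(log N) overall.
inline bool contains_member(const std::set<int>& s, int value) {
    return s.find(value) != s.end();
}

// Generic algorithm: O(log N) comparisons, but the bidirectional
// iterators force O(N) iterator steps.
inline bool contains_generic(const std::set<int>& s, int value) {
    auto it = std::lower_bound(s.begin(), s.end(), value);
    // lower_bound returns the first element not less than value,
    // so an extra check is needed to confirm an actual match.
    return it != s.end() && !(value < *it);
}
```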
Also, the find algorithm can be used to find an element in a list or a
vector, but what would be the harm in these containers providing a member
function as well, since member functions are expected to be faster than
a generic algorithm?
I don't see how you could provide a faster member function for list or vector, unless the container is sorted (or possesses some special property).
Why do we need the remove algorithm and all the drama about erase-remove,
where remove just shifts the elements and erase is then used to
delete the actual elements? Just like the STL list provides a member
function remove, why can't the other containers just offer a remove
function and be done with it?
I can think of two reasons.
Yes, the STL is seriously lacking many convenience functions. I often feel like I live in a begin-end hell when using algorithms on an entire container; I often provide my own wrappers that accept a container, something like:
template <typename T>
bool contains(const std::vector<T>& v, const T& elem) {
    return std::find(v.begin(), v.end(), elem) != v.end();
}
so that I can write
if (contains(myVector, 42)) {
instead of
if (std::find(myVector.begin(), myVector.end(), 42) != myVector.end()) {
Unfortunately, you quite often have to roll your own or use boost. Why? Because standardization is painful and slow so the standardization committee focuses on more important things. The people on the committee often donate their free time and are not paid for their work.
Now deleting elements from a vector can be tricky: Do you care about the order of your elements? Are your elements PODs? What are your exception safety requirements?
Let's assume you don't care about the order of your elements and you want to delete the i-th element:
std::swap(myVector[i], myVector.back());
myVector.pop_back();
or even simpler:
myVector[i] = myVector.back(); // but if operator= throws during copying you might be in trouble
myVector.pop_back();
In C++11 with move semantics:
myVector[i] = std::move(myVector.back());
myVector.pop_back();
Note that these are O(1) operations instead of O(N). These are examples of the efficiency and exception safety considerations that the standard committee leaves up to you. Providing a member function and "one size fits all" is not the C++ way.
Having said all this, I repeat that I wish we had more convenience functions; I understand your problem.
I'll answer part of your question. The Erase-Remove idiom is from the book “Effective STL” written by Scott Meyers. As to why remove() doesn't actually delete elements from the container, there is a good answer here; I'll just copy part of it:
The key is to realize that remove() is designed to work on not just a
container but on any arbitrary forward iterator pair: that means it
can't actually delete the elements, because an arbitrary iterator pair
doesn't necessarily have the ability to delete elements.
As for why std::list provides a member function remove while the other containers do not offer one: I think it's because the idiom is already the most efficient way to remove specific values from the contiguous-memory containers, whereas a list can do better than the generic algorithm by unlinking nodes instead of moving elements.
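For reference, the idiom wrapped into a convenience function might look like the sketch below. erase_all is a hypothetical name; since C++20 the standard library offers std::erase for the same purpose.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Removes every element equal to value and returns how many were removed.
template <typename T>
std::size_t erase_all(std::vector<T>& v, const T& value) {
    // std::remove shifts the kept elements forward and returns the new logical end.
    auto new_end = std::remove(v.begin(), v.end(), value);
    const std::size_t removed = static_cast<std::size_t>(v.end() - new_end);
    // erase then actually shrinks the container.
    v.erase(new_end, v.end());
    return removed;
}
```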

Retrieve container's comparison function given an iterator

Given an iterator, is it possible to retrieve/use the correct comparison function for the collection that this iterator refers to?
For example, let's assume I'm writing a generic algorithm:
template <class InIt, class T>
void do_something(InIt b, InIt e, T v) {
// ...
}
Now, let's say I want to do something simple, like find v in [b..e). If b and e are iterators over a std::vector, I can simply use if (*b == v) .... Let's assume, however, that b and e are iterators over a std::map. In this case, I should only compare the keys, not the whole value type of what's contained in the map.
So the question is, given those iterators into the map, how do I retrieve that map's comparison function that will only compare the keys? At the same time, I don't want to blindly assume that I'm working with a map either. For example, if the iterators pointed to a set, I'd want to use the comparison function defined for that set. If they pointed to a vector or deque, I'd probably have to use ==, because those containers won't have a comparison function defined.
Oh, almost forgot: I realize that in many cases, a container will only have an equivalent of operator< rather than operator== for the elements it contains -- I'm perfectly fine with being able to use that.
Iterators don't have to be connected to containers, so they don't give you any details about the containers that they aren't necessarily connected to. That's the essential iterator abstraction: iterators delimit sequences, without regard to where the sequence comes from. If you need to know about containers you have to write algorithms that take containers.
There is no standard way to map from an iterator to the underlying container type (if there is such a container at all). You might be able to use some heuristics to try to determine which container, although that will not be simple and probably not guaranteed either.
For example, you can use a metafunction to determine whether the value_type is std::pair<const K, T>, which is a hint that this could be a std::map. After extracting the types K and T, you can use another metafunction to check whether the iterator type matches std::map<K,T,X,Y>::iterator or std::map<K,T,X,Y>::const_iterator for a particular combination of X and Y.
In the case of the map that could be sufficient to determine (i.e. guess with a high chance of success) that the iterator refers to a std::map. But note that even if you can do that, and even extract the type X of the comparator, that is not sufficient to replicate the comparator in the general case. While uncommon (and not recommended), comparators can have state, and you would not know the particular state of the comparator without direct access to the container. Also note that there are cases where this type of heuristic will not help at all: in some implementations of std::vector<> the iterator type is simply a pointer, and in that case you cannot differentiate between an 'iterator' into an array and an iterator into a std::vector<> of the same underlying type.
Unfortunately iterators don't always know about the container that contains them (and sometimes they aren't in a standard container at all). Even the iterator_traits only have information about the value_type which doesn't specifically tell you how to compare.
Instead, let's draw inspiration from the standard library. All the associative containers (map, etc.) have their own find methods rather than using std::find. And if you do need to use std::find on such a container, you don't: you use find_if.
It sounds like your solution is that for associative containers you need a do_something_if that accepts a predicate telling it how to compare the entries.
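A sketch of that idea, assuming C++14 for the generic lambdas: do_something_if and lookup_or are hypothetical names, and here the algorithm simply returns the first match so that there is something observable.

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical predicate-taking variant of do_something: the caller
// decides how an element is matched. Returns the first match, or e.
template <class InIt, class Pred>
InIt do_something_if(InIt b, InIt e, Pred matches) {
    return std::find_if(b, e, matches);
}

// Convenience overload for plain sequences, falling back to ==.
template <class InIt, class T>
InIt do_something(InIt b, InIt e, const T& v) {
    return do_something_if(b, e, [&v](const auto& x) { return x == v; });
}

// Example for a std::map: the predicate looks at the key only.
inline int lookup_or(const std::map<std::string, int>& m,
                     const std::string& key, int fallback) {
    auto it = do_something_if(m.begin(), m.end(),
                              [&key](const auto& kv) { return kv.first == key; });
    return it == m.end() ? fallback : it->second;
}
```

Keep in mind that this is a linear scan; when the container itself is at hand, its own find member remains the better choice.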

quick accessing to element of std::map

Do you know if there is any difference in performance when I access a std::map element using find or operator []?
One returns an iterator and the other a reference to the object.
Which one might be quicker because of everything the STL does behind the scenes?
When you use [] on a key that doesn't exist, the default element will be inserted. This default element depends on your map definition (for example, for an int it will be a zero).
When you use find, there is no "automatic" insertion, so it can be considerably faster if you often search for keys that do not exist.
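For std::map, both find() and operator[] are O(log N), since each walks the same balanced tree, so neither is asymptotically faster. operator[] only does extra work when the key is missing, because it then inserts a default value. (O(1) average lookup belongs to std::unordered_map, not std::map.)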
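The behavioural difference is easy to see in a small sketch. get_or and get_inserting are hypothetical helper names, and the map type is just an example.

```cpp
#include <map>
#include <string>

// Read-only lookup: never modifies the map and works on const maps.
inline int get_or(const std::map<std::string, int>& m,
                  const std::string& key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}

// operator[] lookup: inserts a value-initialized int (0) when the key
// is missing, so it needs a non-const map and may grow it.
inline int get_inserting(std::map<std::string, int>& m, const std::string& key) {
    return m[key];
}
```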

Is it possible to use a custom class in place of std::pair in an STL map?

Is this possible?
#include <map>
#include <string>
// CustomPair is the hypothetical key/value class template being asked about.
class Example {
private:
    std::map<std::string, std::string, std::less<std::string>,
             std::allocator<CustomPair<std::string, std::string>>> myMap;
};
In the example above, CustomPair would be a template class holding a key and value. If this is possible, is it that simple or is there anything I should look out for?
One can only speculate what your real intent is here, so I assume you already have a class that contains both key and value. In that case std::set with a custom comparison may be a better choice than a std::map.
You then need to provide a comparison that will only compare the key part of your class and the key part must be const (not change over time) as long as the object is in the set.
As mentioned in the comment, the elements of a set are only accessible as const, so if you want to change the value part of such an element you need to const_cast for write access or declare the member mutable.
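A minimal sketch of the set-based variant, using the mutable option: Entry, KeyLess, EntrySet and set_value are hypothetical names, with std::string chosen arbitrarily for key and value.

```cpp
#include <set>
#include <string>

// Hypothetical element type holding both key and value.
struct Entry {
    const std::string key;      // must not change while stored in the set
    mutable std::string value;  // may be modified through a const reference
};

// Orders entries by key only.
struct KeyLess {
    bool operator()(const Entry& a, const Entry& b) const { return a.key < b.key; }
};

using EntrySet = std::set<Entry, KeyLess>;

// Updates the value of an existing key; returns false if the key is absent.
inline bool set_value(EntrySet& s, const std::string& key, const std::string& value) {
    auto it = s.find(Entry{key, {}});  // builds a probe Entry just for the lookup
    if (it == s.end()) return false;
    it->value = value;                 // allowed because value is mutable
    return true;
}
```

Building a probe Entry for every lookup is the price of this design; whether that is acceptable depends on how expensive the class is to construct.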
In another answer iain made another very good suggestion. If you rarely insert into the container and mostly access the container searching for elements then a sorted std::vector and std::binary_search are a very effective alternative to the set.
I would be more likely to use std::set.
I would either use a set as described by lothar or use a sorted std::vector as described in "Effective STL", Item 23: "Consider replacing associative containers with sorted vectors".
The rationale for this is that a std::binary_search of a sorted vector with a custom comparator is nearly as fast as, and sometimes faster than, a map lookup, and iteration is much faster. The insert operations are more expensive though (you have to call sort after each insert, or insert at the position found by std::lower_bound). A lot of map use cases insert very infrequently though.
The vector would be more flexible than the set.
I replaced a map of 2000 complex objects (indexed by int) with this approach; iterating over and processing every object went from 50 seconds to less than 5 on a server-class system. There was no noticeable difference in lookup times.
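The sorted-vector approach might be sketched as follows. Record, key_less, sorted_insert and sorted_find are hypothetical names, and the sketch uses std::lower_bound rather than std::binary_search because the latter only reports whether a match exists.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical record indexed by an int key.
struct Record {
    int key;
    double payload;
};

// Compares a stored record against a bare key, as std::lower_bound expects.
inline bool key_less(const Record& r, int key) { return r.key < key; }

// Inserts at the sorted position (no full re-sort needed); rejects duplicate keys.
inline bool sorted_insert(std::vector<Record>& v, Record r) {
    auto pos = std::lower_bound(v.begin(), v.end(), r.key, key_less);
    if (pos != v.end() && pos->key == r.key) return false;
    v.insert(pos, std::move(r));  // O(N) because later elements are shifted
    return true;
}

// Binary-search lookup; returns nullptr when the key is absent.
inline const Record* sorted_find(const std::vector<Record>& v, int key) {
    auto pos = std::lower_bound(v.begin(), v.end(), key, key_less);
    return (pos != v.end() && pos->key == key) ? &*pos : nullptr;
}
```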
I think you can do it but will not get the desired effect, because the use of std::allocator will be done via rebind<std::pair>, thus overriding your selection of CustomPair. In fact, it probably doesn't matter what type you put there, the STL functions will ignore it. At least some of them will definitely do this, but I'm not sure all will. Strictly speaking this is almost certainly implementation dependent. I don't know what the standard says.