Ordering a container on something other than the key - C++

I am currently trying to implement an A* algorithm and I've run into a problem:
I want to keep a set of distinct objects, identified by a hash (I've used boost::hash and family, but can use anything else) and ordered by a public int value that is a member of those objects.
The goal is to be able to retrieve the smallest object, based on the int value, in O(1) and to guarantee uniqueness in the most efficient manner (a hash seemed a good way to achieve that, but I'm open to alternatives). I don't need to iterate over the container if those two conditions are met.
Is there an existing implementation that meets those requirements? Am I mistaken in my assumptions? Should I just extend an existing container?
EDIT:
Apparently it was unclear what "smaller based on int value" means. I mean that my object has a public attribute (let's say score). For two objects a and b, a < b if and only if a.score < b.score.
I want a and b to be in a container, ordered by score. And if I try to insert c with c.hash == a.hash, I want the insertion to fail.

Although std::priority_queue is an adapter, its Container template parameter has to satisfy SequenceContainer, so you can't build one backed by a std::set.
It looks like your best option is to maintain both a set and a priority queue, and use the former to control insertion into the latter. It may be a good idea to encapsulate that into a container-concept class, but you might get away with a couple of methods if your use of it is quite localised.
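A minimal sketch of that combination, under some assumptions: `Node`, its `id` field (standing in for the object's hash), `ByScoreGreater` and `UniqueMinQueue` are hypothetical names, and uniqueness is only enforced among the elements currently queued.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical node type: `id` stands in for the object's hash,
// `score` is the value the container is ordered by.
struct Node {
    std::size_t id;
    int score;
};

// "Greater" comparison so that the smallest score ends up at top().
struct ByScoreGreater {
    bool operator()(const Node& a, const Node& b) const { return a.score > b.score; }
};

class UniqueMinQueue {
public:
    // Returns false (and inserts nothing) if a node with the same id is already queued.
    bool push(const Node& n) {
        if (!ids_.insert(n.id).second) return false;
        queue_.push(n);
        return true;
    }
    // Smallest score, O(1).
    const Node& top() const { return queue_.top(); }
    // Removes the smallest element, O(log N), and frees its id for reuse.
    void pop() {
        ids_.erase(queue_.top().id);
        queue_.pop();
    }
    bool empty() const { return queue_.empty(); }
    std::size_t size() const { return queue_.size(); }

private:
    std::unordered_set<std::size_t> ids_;  // controls insertion into the queue
    std::priority_queue<Node, std::vector<Node>, ByScoreGreater> queue_;
};
```

For A*, where a closed node must never be reopened, one might keep the id in the set after `pop()` instead of erasing it; that is a design choice this sketch does not make.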

Use a custom comparator and a std::set:
#include <set>
#include <string>
struct Object
{
int value;
long hash;
std::string data;
Object(int value, std::string data) :
value(value), data(data)
{
}
bool operator<(const Object& other) const
{
return data < other.data;
}
};
struct ObjComp1
{
bool operator()(const Object& lhs, const Object& rhs) const
{
return lhs.value < rhs.value;
}
};
struct ObjComp2
{
bool operator()(const Object& lhs, const Object& rhs) const
{
if (lhs.value != rhs.value)
{
return lhs.value < rhs.value;
}
return lhs < rhs;
}
};
int main()
{
Object o1(5, "a");
Object o2(1, "b");
Object o3(1, "c");
Object o4(1, "c");
std::set<Object, ObjComp1> set;
set.insert(o1);
set.insert(o2);
set.insert(o3);
set.insert(o4);
std::set<Object, ObjComp2> set2;
set2.insert(o1);
set2.insert(o2);
set2.insert(o3);
set2.insert(o4);
return 0;
}
The first variant will only let you insert o1 and o2; the second variant will let you insert o1, o2 and o3. I show both because it's not really clear which one you need. The only downside is that you need to write your own operator< for the Object type.
Alternatively, if you don't want to create a custom operator< for your data type, you can wrap a std::map, but this is less straightforward.

You could use the STL type priority_queue. Note that it is a max-heap by default, so top() returns the largest element; to keep the minimum on top, pass std::greater as the comparator. If your elements are integers then you could do:
priority_queue<int, vector<int>, greater<int>> q;
Priority queues are internally implemented with heaps, i.e. complete binary trees whose root is always the extreme element under the chosen comparator (here, the minimum). So you can read it in O(1) by calling top().
However, as your algorithm progresses, you will need to extract items with pop(). Since the heap is a binary tree, extraction takes O(log N), which is not O(1), but it is fast and it is a guaranteed bound, in contrast with the expected time you would get from an imperfect hash table.
I do not know of a way to maintain a set and extract the minimum in O(1).

Related

C++ map set hybrid

I have folder information in
struct folder
{
int id;
int folder_count;
long long size;
};
I need to keep folders (can be 1000 or more ) sorted by their folder_count and size respectively (folder having most folder_count must be first, and if there are same folder_count, it needs to be sorted by size).
I have achieved it by custom comparator
struct folder_comparator
{
bool operator() (const folder& a, const folder& b) const
{
return a.folder_count>b.folder_count || (a.folder_count==b.folder_count && a.size>=b.size);
}
};
and putting folders into set
set <folder, folder_comparator> folders;
But in the meantime the folders get many updates. I can handle those with a map keyed by the folder id:
map<int, folder> folders;
But in this case I cannot keep the custom order (mentioned above).
I need both. (keep custom order and O(1) or at least O(log(N)) complexity search)
What data structure or hybrid of set and map can help me in this situation?
I need both. (keep custom order and O(1) or at least O(log(N)) complexity search)
This means Boost Multi-Index will most probably fit your needs perfectly, with a hashed index on id and a non-unique ordered index using your (fixed) comparator. It has a learning curve, but it seems it would be worth the effort in your case (rather than maintaining 2 independent containers).
PS: your current comparator does not meet the requirements of a strict weak ordering, which all sorted standard and Boost containers require. You need to fix it in either case. The easiest way to provide a proper comparator is to use std::tie (from <tuple>):
bool operator() (const folder& a, const folder& b) const
{
return std::tie( a.folder_count, a.size ) > std::tie( b.folder_count, b.size );
}
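For comparison, here is a rough sketch of the "two independent containers" alternative using only the standard library. The wrapper `folder_index` and its member names are hypothetical, and adding `id` as a final tie-breaker in the comparator is an assumption made so that distinct folders with equal count and size can coexist in a std::set.

```cpp
#include <map>
#include <set>
#include <tuple>

struct folder {
    int id;
    int folder_count;
    long long size;
};

// Strict weak ordering: largest folder_count first, then largest size.
// The id tie-breaker is an assumption, see the note above.
struct folder_comparator {
    bool operator()(const folder& a, const folder& b) const {
        return std::tie(a.folder_count, a.size, a.id) >
               std::tie(b.folder_count, b.size, b.id);
    }
};

// Hypothetical wrapper that keeps an id lookup and an ordered view in sync.
class folder_index {
public:
    // Inserts a new folder or replaces the one with the same id, O(log N).
    void upsert(const folder& f) {
        auto it = by_id_.find(f.id);
        if (it != by_id_.end()) {
            ordered_.erase(it->second);  // drop the stale entry before overwriting it
            it->second = f;
        } else {
            by_id_.emplace(f.id, f);
        }
        ordered_.insert(f);
    }
    // Lookup by id, O(log N); returns nullptr if the id is unknown.
    const folder* find(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &it->second;
    }
    const std::set<folder, folder_comparator>& ordered() const { return ordered_; }

private:
    std::map<int, folder> by_id_;
    std::set<folder, folder_comparator> ordered_;
};
```

Every update has to go through `upsert` so the two containers never disagree, which is the bookkeeping that Boost Multi-Index would otherwise do for you.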

A comparator can be used to set a new key, can't it?

I need to change the "key" of a multiset:
multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber> playingNotes;
so that when I use the .find() function it searches for and returns the first object (iterator) with that NoteNumber property value.
I said "first" because my multiset could contain several objects with the same "key". So I did:
struct IMidiMsgExtCompByNoteNumber {
bool operator()(const IMidiMsgExt& lhs, const IMidiMsgExt& rhs) {
return lhs.NoteNumber() < rhs.NoteNumber();
}
};
but when I try to do:
auto it = playingNotes.find(60);
the compiler says no instance of overloaded function "std::multiset<_Kty, _Pr, _Alloc>::find [with _Kty=IMidiMsgExt, _Pr=IMidiMsgExtCompByNoteNumber, _Alloc=std::allocator<IMidiMsgExt>]" matches the argument list
Am I misunderstanding the whole thing? What's wrong?
I do believe that you have some misunderstandings here:
1. Part of an associative container's type is its key type and comparator. Because C++ is strongly typed, the only way to change the comparator on a container is to create a new container, copying or moving all the elements into it.
2. Creating a copy of all the elements in a container is a potentially expensive process.
3. By creating a copy you are violating the Single Source of Truth best practice.
4. multiset is used infrequently; I have used it once in my career. Others have pointed out its shortcomings and recommended that you use another container or write your own; in my case I'd suggest simply using a vector and sorting it however you want when you have to.
I'm going to catalog your comments to show how the answer I've already given you is correct:
We're going to assume that the multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber> that you've selected is necessary and cannot be improved upon by using vector as suggested in 4, where:
struct IMidiMsgExtCompByNoteNumber {
bool operator()(const IMidiMsgExt& lhs, const IMidiMsgExt& rhs) {
return lhs.NoteNumber() < rhs.NoteNumber();
}
};
You cannot use multiset::find because that requires you to specify the exact IMidiMsgExt you are searching for; so you'll need to use find_if(cbegin(playingNotes), cend(playingNotes), [value = int{60}](const auto& i){return i.mNote == value;}) to search for a specific property value. That is fine to use directly on playingNotes without changing the sorting, because you say:
I want to delete the first note that has mNote of 60. No matter the mTime when deleting.
You'll need to capture the result of the find_if, check if it is valid, and if so erase it as demonstrated in my answer, because you say:
The first element find will find for that, erase. [sic]
I would roll the code from my answer into a function because you say:
Ill recall find if I want another element, maybe with same value, to get deleted [sic]
Your final solution should be to write a function like this:
bool foo(multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber>& playingNotes, const int value) {
    // playingNotes is taken by non-const reference so that erase can modify it
    const auto it = find_if(cbegin(playingNotes), cend(playingNotes), [=](const auto& i){return i.mNote == value;});
    const auto result = it != cend(playingNotes);
    if (result) {
        playingNotes.erase(it);
    }
    return result;
}
And you'd call it something like this: foo(playingNotes, 60). If you wish to know whether an element was removed, you can test foo's return value.
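As an aside, if C++14 or later is available and the container really is ordered by note number, a transparent comparator allows lookups by a plain int in logarithmic time. The following is only a sketch under those assumptions: `Note`, `CompByNoteNumber` and `erase_one` are hypothetical stand-ins, not the asker's real types.

```cpp
#include <set>

// Hypothetical stand-in for IMidiMsgExt.
struct Note {
    int number;
    int time;
};

struct CompByNoteNumber {
    // Marks the comparator as transparent, which enables the
    // find/lower_bound overloads that accept non-Note keys (C++14).
    using is_transparent = void;
    bool operator()(const Note& lhs, const Note& rhs) const { return lhs.number < rhs.number; }
    bool operator()(const Note& lhs, int rhs) const { return lhs.number < rhs; }
    bool operator()(int lhs, const Note& rhs) const { return lhs < rhs.number; }
};

// Erases the first note with the given number; returns whether one was removed.
bool erase_one(std::multiset<Note, CompByNoteNumber>& notes, int number) {
    // lower_bound yields the first element that is not ordered before `number`.
    const auto it = notes.lower_bound(number);
    if (it == notes.end() || it->number != number) return false;
    notes.erase(it);
    return true;
}
```

Calling `erase_one(notes, 60)` repeatedly removes the matching notes one at a time until it returns false.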

How not to use custom comparison function of std::map in searching ( map::find)?

As you can see in my code, lenMap is a std::map with a custom comparison function. This function just checks the string's length.
Now when I want to search for some key (using map::find), the map still uses that custom comparison function.
But how can I force my map not to use it when I search for a key?
Code:
struct CompareByLength : public std::binary_function<string, string, bool>
{
bool operator()(const string& lhs, const string& rhs) const
{
return lhs.length() < rhs.length();
}
};
int main()
{
typedef map<string, string, CompareByLength> lenMap;
lenMap mymap;
mymap["one"] = "one";
mymap["a"] = "a";
mymap["foobar"] = "foobar";
// Now In mymap: [a, one, foobar]
string target = "b";
if (mymap.find(target) == mymap.end())
cout << "Not Found :) !";
else
cout << "Found :( !"; // I don't want to reach here because of "a" item !
return 0;
}
The map itself does not offer such an operation. The idea of the comparison functor is to create an internal ordering for faster lookup, so the elements are actually ordered according to your functor.
If you need to search for elements in a different way, you can either use the STL algorithm std::find_if() (which has linear time complexity) or create a second map that uses another comparison functor.
In your specific example, since you seem only to be interested in the string's length, you should rather use the length (of type std::size_t) and not the string itself as a key.
By the way, std::binary_function is not needed as a base class. It has been deprecated since C++11 and was removed in C++17.
The comparison function tells the map how to order elements and how to differentiate between them. If it only compares the length, two different strings with the same length are treated as the same key, so they share one entry in the map (assigning through operator[] with the second string overwrites the value stored for the first).
Either store your strings in a different data structure and sort them, or perhaps try this comparison function:
struct CompareByLength
{
bool operator()(const string& lhs, const string& rhs) const
{
if (lhs.length() < rhs.length())
{
return true;
}
else if (rhs.length() < lhs.length())
{
return false;
}
else
{
return lhs < rhs;
}
}
};
I didn't test it, but I believe this will first order strings by length, and then however strings normally compare.
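A compact variant of that comparator, together with the map from the question, can be used to check the behavior. This is a sketch: the names `CompareByLengthThenValue`, `LenMap` and `make_example` are hypothetical.

```cpp
#include <map>
#include <string>

// Orders by length first, then lexicographically, so that only
// identical strings are considered equivalent keys.
struct CompareByLengthThenValue {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        if (lhs.length() != rhs.length()) {
            return lhs.length() < rhs.length();
        }
        return lhs < rhs;
    }
};

using LenMap = std::map<std::string, std::string, CompareByLengthThenValue>;

// Hypothetical helper that builds the map from the question.
LenMap make_example() {
    LenMap m;
    m["one"] = "one";
    m["a"] = "a";
    m["foobar"] = "foobar";
    return m;
}
```

With this ordering, `make_example().find("b")` returns `end()`, because "b" and "a" have equal length but are no longer equivalent, while iteration still visits the shorter strings first.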
You could also use std::map<std::string::size_type, std::map<std::string, std::string>> and use the length for the first map and the string value for the second map. You would probably want to wrap this in a class to make it easier to use, as there is no protection against messing it up.

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should be able to work well with std::binary_search?
But maybe there is an altogether easier way to do this?
Before you answer "just use a map where a map ought to be used": I am actually using a vector that is manually sorted (well, with std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but it doesn't seem to be easily possible to do so, so I might just stick with the vector.
C++14 introduces the ability to look up by a key without constructing the entire stored object. This can be used as follows:
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.
Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.
If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.

stl predicate with different types

I have a vector of ordered container classes where I need to know the index of the container that has a given element.
So I would like to do the following, but this obviously doesn't work. I could create a dummy Container to house the date to find, but I was wondering if there was a nicer way.
struct FooAccDateComp
{
bool operator()(const Container& d1, const MyDate& f1) const
{ return d1->myDate < f1; }
};
class Container
{
MyDate myDate;
...
};
vector<Container> mystuff;
MyDate temp(2008, 3, 15);
//add stuff to variable mystuff
int index = int(upper_bound(events.begin(), events.end(),temp, FooAccDateComp())-events.begin());
EDIT: The container class can contain other dates.
upper_bound needs to be able to evaluate expressions like Comp(date,container), but you've only provided Comp(container,date). You'll need to provide both:
struct FooAccDateComp
{
bool operator()(const Container& c, const MyDate& d) const
{ return c.myDate < d; }
bool operator()(const MyDate& d, const Container& c) const
{ return d < c.myDate; }
};
Remember that the vector must be sorted according to this comparison for upper_bound and friends to work.
You don't necessarily need a special predicate, just enable comparison between Container and MyDate.
#include <vector>
struct MyDate {
MyDate(int, int, int);
};
struct Container {
MyDate myDate;
};
// enable comparison between Container and MyDate
bool operator<(Container const&, MyDate const&);
bool operator==(Container const&, MyDate const&);
std::vector<Container> v;
// add stuff to v
MyDate temp(2008, 3, 15);
std::vector<Container>::iterator i = std::lower_bound(v.begin(), v.end(), temp);
ptrdiff_t index = i != v.end() && *i == temp ? i - v.begin() : -1;
You can use find_if if you don't mind degrading performance (you said that you have a vector of sorted Container, so binary search would be faster).
Or you can add
struct Container {
    MyDate myDate;
    // implicit conversion, so a Container can be compared as a MyDate
    operator MyDate() const { return myDate; }
};
bool operator<(MyDate const& lhs, MyDate const& rhs)
{
    return /* your logic here */;
}
Now you can use binary search functions
std::vector<Container>::iterator i = std::upper_bound(v.begin(), v.end(), MyDateObject);
Of course, it will work only if your vector is sorted by Container.myDate.
Your example is broken in several trivial ways: the class Container should be defined before FooAccDateComp in order for it to be used there, you should make myDate a public member of Container, access that member in the comparison method using .myDate rather than ->myDate, and finally decide whether to call your vector mystuff or events, but not mix both. I'll suppose that appropriate corrections have been made.
You should have defined your comparison function to take a Date parameter as first argument and a Container parameter as second, the opposite of what you did. Or you could use std::lower_bound instead of std::upper_bound if that would suit your purpose (since you don't say what you are going to do with index, it is hard to tell), because the argument order chosen in the question is the one std::lower_bound expects. Contrary to what the currently accepted answer says, you do not need both overloads if you are only using std::upper_bound or only std::lower_bound (though you would need both if using std::equal_range, or when using both std::upper_bound and std::lower_bound).
These requirements are specified in the standard and may look a bit strange at first sight, but there is a way to understand why they have to be like this without looking them up. When using lower_bound, you want to find the point that separates the Container entries that are (strictly) less than your given Date from those that are not, and this requires calling the comparison function with that Date argument in second position. If however you ask for an upper_bound (as you are), you want to find the point that separates the entries that are not strictly greater than your given Date from those that are, and this requires calling the comparison function with that Date argument in first position (and negating the boolean result it returns). And for equal_range you of course need both possibilities.
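The argument orders described above can be illustrated with a small sketch. It assumes drastically simplified types (a date reduced to one integer); the functor and helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical simplified types: a date is reduced to a single integer.
struct MyDate { int day; };
struct Container { MyDate myDate; };

// std::lower_bound calls comp(element, value): Container first, date second.
struct ContainerBeforeDate {
    bool operator()(const Container& c, const MyDate& d) const {
        return c.myDate.day < d.day;
    }
};

// std::upper_bound calls comp(value, element): date first, Container second.
struct DateBeforeContainer {
    bool operator()(const MyDate& d, const Container& c) const {
        return d.day < c.myDate.day;
    }
};

// Index of the first element whose date is not less than d.
std::size_t lower_index(const std::vector<Container>& v, const MyDate& d) {
    return static_cast<std::size_t>(
        std::lower_bound(v.begin(), v.end(), d, ContainerBeforeDate()) - v.begin());
}

// Index of the first element whose date is greater than d.
std::size_t upper_index(const std::vector<Container>& v, const MyDate& d) {
    return static_cast<std::size_t>(
        std::upper_bound(v.begin(), v.end(), d, DateBeforeContainer()) - v.begin());
}
```

Each functor provides only one overload, which is enough as long as it is paired with the matching algorithm; std::equal_range would need a single functor offering both.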