boost::multi_index composite keys efficiency

boost::multi_index composite keys efficiency - c++

Long time reader first time poster! I'm playing around with the boost::multi_index container stuff and have a rather in-depth question that hopefully a boost or C++ container expert might know (my knowledge in C++ containers is pretty basic). For reference, the boost documentation on composite keys can be found here: boost::multi_index composite keys.
When using a composite key, the documentation states that "Composite keys are sorted by lexicographical order, i.e. sorting is performed by the first key, then the second key if the first one is equal, etc". Does this mean that the structure is stored such that a lookup for a specific 2-part composite key will take O(n=1) time, i.e. is the container sorted such that there is a pointer directly to each item, or does the boost container retrieve a list that matches the first part of the composite key and then need to perform a search for items matching the second part of the key and thus is slower?
For example, if I was to maintain two containers manually using two different indices and wanted to find items that matched a specific 2-part query I would probably filter the first container for all items matching the 1st part of the query, and then filter the result for items that match the 2nd part of the query. So this manual method would effectively involve two searches. Does boost effectively do this or does it improve on efficiency somehow via the use of composite keys?
Hopefully I've explained myself here but please ask questions and I will try my best to clarify exactly what I mean!

Lookups involving composite keys do not go through any two-stage process as you describe. composite_key-induced orderings are normal orderings, the only special thing about it being its dependence on two or more element keys rather than one.
Maybe an example will clarify. Consider this use of composite_key:
struct element
{
int x,y,z;
};
typedef multi_index_container<
element,
indexed_by<
ordered_unique<
composite_key<
element,
member<element,int,&element::x>,
member<element,int,&element::y>,
member<element,int,&element::z>
>
>
>
> multi_t;
The resulting container is in a sense equivalent to this:
struct element_cmp
{
bool operator()(const element& v1, const element& v2)const
{
if(v1.x<v2.x)return true;
if(v2.x<v1.x)return false;
if(v1.y<v2.y)return true;
if(v2.y<v1.y)return false;
return v1.z<v2.z;
}
};
typedef std::set<element,element_cmp> set_t;
composite_key automatically generates equivalent code to that in element_cmp::operator(), and additionally allows for lookup on just the first n keys, but the underlying data structure does not change with respect to the case using std::set.

Related

performance of boost multi_index_container

I am interested to know the performance of multi_index_container for the following use case:
struct idx_1 {};
struct idx_2 {};
typedef multi_index_container<
Object,
indexed_by<
// Keyed by: idx1
hashed_unique<
tag<idx_1>,
unique_key >,
// Keyed by: (attribute1, attribute2 and attribute3)
ordered_non_unique<
tag<idx_2>,
composite_key<
Object,
attribute1,
attribute2,
attribute3 > >
>
> ObjectMap;
I need a map to save the object, and the number of objects should be more than 300,000. while each object has 1 unique key and 3 attributes. The details of the keys:
unique key is "unique" as the name
each attribute only has a few possible values, say there's only 16 combinations. So with 300,000 objects, each combination will have a list of 300,000/16 objects
attribute1 needs to be modified from one value to another value occasionally
object finding is always be done via the unique_key while the composite_key is used to iterating objects with one or several attributes
For such use case, multi_index_container is a very good fit as I don't need to maintain several map independently. For the unique key part, I believe hashed_unique is a good candidate instead of ordered_unique.
But I am extremely not comfortable about the "ordered_non_unique" part. I don't know how's implemented in boost. My guess it boost maintain a list of objects in a single list for each combination similar to the unordered_map(forgive me if it's too naive!). If that's the case, modify the attribute an existing object will be a big pain as it requires to 1) go through a long list of objects for a particular combination 2) execute the equal comparison 3) and move the destination combination.
the steps that I suspect with high latency:
ObjectMap objects_;
auto& by_idx1 = objects_.get<idx1>();
auto it = by_idx1.find(some_unique_key);
Object new_value;
by_idx1.modify(it, [&](const Object& object) {
object = new_value;
});
My concern is that whether the last "modify" function has some liner behavior as stated to go through some potential long list of objects under one combination...

As this is a very specific piece of code, I'd suggest you benchmark and profile it using a large amount of real-world data.

As Fabio comments, your best option is to profile the case and see the outcome. Anyway, an ordered_non_unique index is implemented exactly as a std::multimap, namely via a regular rb-tree with the provision that elements with equivalent keys are allowed to coexist in the container; no lists of equivalent elements or anything. As for modify (for your particular use case replace is a better fit), the following procedure is executed:
Check if the element is in place: O(1).
If not, rearrange: O(log n), which for 300,000 elements amounts to a maximum of 19 element comparisons (not 300,000/16=18,750 as you suggest): these comparisons are done lexicographically on the triple (attribute1, attribute2, attribute3). Is this fast enough or not? Well, that depends on your performance requirements, so only profiling can really tell.

STL map like data structure to allow searching based on two keys

I want to make a map like structure to allow searching by two keys both will be strings, here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
i don't want a code solution, i just want some help to get started like the structures i can use to achieve what i want, help is appreciated guys, it's finals week.

Put your records into a vector (or list). Add a pointer to the record objects to two maps, one with one key and one with the other.

There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is implemented as red-black tree that provides increadibly quick self-balancing (rotations) and all of mentioned operations (lookup/find, insert, delete) with complexity of O(log(n)). Note that this suits the idea of single key.
With 2 keys you can not keep elements sorted because the order based on one key will be most likely different than order based on the other one. The most straightforward and natural approach would be storing records in one container and holding the keys used by this container in 2 different structures, one optimized for retrieving this key given id and the other one for retrieving it given name.
If there is a constraint of storing everything at one place while you'd like to optimize find operation that will support two different keys, then you could create a wrapper of std::map<std::string, Person> where each element would be contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n))
insertion & deletion with complexity O(2*log(2*n))
extremely simple implementation (using existing container)
you just need to remember than the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
Disadvantage is that it will consume 2x so much memory and there might even be a constraint that:
none of the names should be an id of some other person at the same time and vice versa (no id should be a name of some other person)

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector(vector(int)) which means goes in the order of source(destination(value)). A subsequent process needs to check this container, to see if an element exists in for a particular source, and a particular destination - it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)
Is a std::map<int, int> better for this purpose, and how do I use one?

I need a container to store a value (int) according to two attributes,
source (int) and destination (int)
std::map<std::pair<int, int>, int>
A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.
std::map::find
http://www.cplusplus.com/reference/map/map/find/
The container has the possibility of being very sparse.
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.

First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed very closely sequentially as 0...N, will be doing a lot of look-ups, and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>> - usually implemented by a binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it

If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.

Which sorted STL container to use for fast insert and find with a special key?

I have some data with a key associated with each data item. The key is made of two parts: let's call them color and id. I want to iterate the container by color to speed up rendering and I also want to find items in the container by id alone.
I tried using std::map for this with a key
class MyKey {
public:
int color;
int id;
bool operator<(...)
bool operator==(...)
};
But I cannot provide a < operator that would keep the data sorted by color and at the same time allow map::find to work on id alone (i.e. no information about color).
I want both the insert and find operations to be fast (e.g. O(log(n))).
Any ideas what kind of container I could use to implement this?

Adapt the example here from Boost.Multi_index based on the following modifications:
typedef multi_index_container<
MyKey,
indexed_by<ordered_unique<identity<MyKey> >,
ordered_non_unique<member<MyKey,int,&MyKey::color> >
>
> DataSet;

Try Boost.BiMap, it's a map with 2 views.
Otherwise there is the more complicated Boost.MultiIndex.

If you'd rather write your own than use Boost (dumb in my opinion) you can make a class that holds two maps. One to map color and the other to map ID. The map values would be pointers. You'd define insert and find operations for your new class. The insert would new a data struct and insert the pointer into both maps. The find operations would locate the item using either map as appropriate. For delete/remove operations it is a little trickier: you'd need to look up the data struct by a map, then remove it from both maps, probably by using the data in the data struct to find the other map entry.

Container for database-like searches

I'm looking for some STL, boost, or similar container to use the same way indexes are used in databases to search for record using a query like this:
select * from table1 where field1 starting with 'X';
or
select * from table1 where field1 like 'X%';
I thought about using std::map, but I cannot because I need to search for fields that "start with" some text, and not those that are "equal to". Beside that, I need it to work on multiple fields (each "record" has 6 fields, for example), so I would need a separate std::map for each one.
I could create a sorted vector or list and use binary search (breaking the set in 2 in each step by reading the element in the middle and seeing if it's more or less than 'X'), but I wonder if there is some ready-made container I could use without reinventing the wheel?

Boost.Multi-Index allows you to manage with several index and it implements the lower_bound as for std::set/map. You will need to select the index corresponding to the field and then do as if it was a map or a set.
Next follows a generic function that could be used to get a couple of iterators, the fist to the first item starting with a given prefix, the second the first item starting with the next prefix, i.e. the end of the search
template <typename SortedAssociateveContainer>
std::pair<typename SortedAssociateveContainer::iterator,
typename SortedAssociateveContainer::iterator>
starts_with(
SortedAssociateveContainer const& coll,
typename SortedAssociateveContainer::key_type const& k)
{
return make_pair(coll.lower_bound(k),
coll.lower_bound(next_prefix(k));
}
where
next_prefix gets the next prefix using lexicographic order based on the SortedAssociateveContainer comparator (of course this function needs more arguments to be completely generic, see the question).
The result of starts_with can be used on any range algorithm (see Boost.Range)

std::map is fine, or std::set if there's no data other than the string. Pass your prefix string into lower_bound to get the first string which sorts at or after that point. Then iterate forward through the map until you hit the end or find an element which doesn't begin with your prefix.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js