performance of boost multi_index_container - c++

I am interested to know the performance of multi_index_container for the following use case:
struct idx_1 {};
struct idx_2 {};
typedef multi_index_container<
Object,
indexed_by<
// Keyed by: idx1
hashed_unique<
tag<idx_1>,
unique_key >,
// Keyed by: (attribute1, attribute2 and attribute3)
ordered_non_unique<
tag<idx_2>,
composite_key<
Object,
attribute1,
attribute2,
attribute3 > >
>
> ObjectMap;
I need a map to save the object, and the number of objects should be more than 300,000. while each object has 1 unique key and 3 attributes. The details of the keys:
unique key is "unique" as the name
each attribute only has a few possible values, say there's only 16 combinations. So with 300,000 objects, each combination will have a list of 300,000/16 objects
attribute1 needs to be modified from one value to another value occasionally
object finding is always be done via the unique_key while the composite_key is used to iterating objects with one or several attributes
For such use case, multi_index_container is a very good fit as I don't need to maintain several map independently. For the unique key part, I believe hashed_unique is a good candidate instead of ordered_unique.
But I am extremely not comfortable about the "ordered_non_unique" part. I don't know how's implemented in boost. My guess it boost maintain a list of objects in a single list for each combination similar to the unordered_map(forgive me if it's too naive!). If that's the case, modify the attribute an existing object will be a big pain as it requires to 1) go through a long list of objects for a particular combination 2) execute the equal comparison 3) and move the destination combination.
the steps that I suspect with high latency:
ObjectMap objects_;
auto& by_idx1 = objects_.get<idx1>();
auto it = by_idx1.find(some_unique_key);
Object new_value;
by_idx1.modify(it, [&](const Object& object) {
object = new_value;
});
My concern is that whether the last "modify" function has some liner behavior as stated to go through some potential long list of objects under one combination...

As this is a very specific piece of code, I'd suggest you benchmark and profile it using a large amount of real-world data.

As Fabio comments, your best option is to profile the case and see the outcome. Anyway, an ordered_non_unique index is implemented exactly as a std::multimap, namely via a regular rb-tree with the provision that elements with equivalent keys are allowed to coexist in the container; no lists of equivalent elements or anything. As for modify (for your particular use case replace is a better fit), the following procedure is executed:
Check if the element is in place: O(1).
If not, rearrange: O(log n), which for 300,000 elements amounts to a maximum of 19 element comparisons (not 300,000/16=18,750 as you suggest): these comparisons are done lexicographically on the triple (attribute1, attribute2, attribute3). Is this fast enough or not? Well, that depends on your performance requirements, so only profiling can really tell.

Related

Is There a way to Find the Number of Keys in a multimap Inline?

A multimap's size reports the number of values it contains. I'm interested in the number of keys it contains. For example, given multimap<int, double> foo I'd like to be able to do this:
const auto keyCount = ???
One way to get this is to use a for-loop on a zero initialized keyCount:
for(auto it = cbegin(foo); foo != cend(foo); it = foo.upper_bound(foo->first)) ++keyCount;
This however, does not let me perform the operation inline. So I can't initialize a const auto keyCount.
A solution could be a lambda or function which wraps this for-loop such as:
template <typename Key, typename Value>
size_t getKeyCount(const multimap<Key, Value>& arg) {
size_t result = 0U;
for(auto it = cbegin(foo); foo != cend(foo); it = foo.upper_bound(foo->first)) ++result;
return result;
}
But I was hoping for something provided by the standard. Does such a thing exist?
No, there is no built-in in the standard to do this. Consider that your count function works because a multimap is internally sorted. Typical implementations such as libstdc++ use red-black trees for the internal representation. You would need to walk the tree in order to count all the (unique) keys. It's unavoidable.
1st, multimap doesn't contain a key count naively. So your question then is about how to use something from the algorithm library to find the count. The 2 constraints you have placed that preclude everything in the library are:
Must be used inline, so a count, not an iterator must be returned
Must perform at least as well as upper_bound used in a lambda, which has time complexity: O(log n)
1 leaves us with: count, count_if, for_each, as well as most of the algorithms in the numeric library
2 eliminates consideration of all of these as each of them has time complexity: O(n)
As such your getKeyCount is preferable to any other option that the standard supplies.
Just a comment about another option that may be presented: the maintenance of keyCount whenever something is added or removed from foo, this may seem workable, but it requires a check before each insert if the inserted key exists, and a check after each removal if the removed key still exists. Aside from this, there is also the consideration of the danger of multi-threaded inoperability and the loss of code readability, where it's not clear that this keyCount must be maintained along with foo. Ultimately this a bad idea unless the key count is taken significantly more frequently than foo is updated.
If you only are creating multimap and inserting values in it, you can keep a companion map to record different types of keys. That size of that map will give you key count.

STL map like data structure to allow searching based on two keys

I want to make a map like structure to allow searching by two keys both will be strings, here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
i don't want a code solution, i just want some help to get started like the structures i can use to achieve what i want, help is appreciated guys, it's finals week.
Put your records into a vector (or list). Add a pointer to the record objects to two maps, one with one key and one with the other.
There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is implemented as red-black tree that provides increadibly quick self-balancing (rotations) and all of mentioned operations (lookup/find, insert, delete) with complexity of O(log(n)). Note that this suits the idea of single key.
With 2 keys you can not keep elements sorted because the order based on one key will be most likely different than order based on the other one. The most straightforward and natural approach would be storing records in one container and holding the keys used by this container in 2 different structures, one optimized for retrieving this key given id and the other one for retrieving it given name.
If there is a constraint of storing everything at one place while you'd like to optimize find operation that will support two different keys, then you could create a wrapper of std::map<std::string, Person> where each element would be contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n))
insertion & deletion with complexity O(2*log(2*n))
extremely simple implementation (using existing container)
you just need to remember than the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
Disadvantage is that it will consume 2x so much memory and there might even be a constraint that:
none of the names should be an id of some other person at the same time and vice versa (no id should be a name of some other person)

boost::multi_index composite keys efficiency

Long time reader first time poster! I'm playing around with the boost::multi_index container stuff and have a rather in-depth question that hopefully a boost or C++ container expert might know (my knowledge in C++ containers is pretty basic). For reference, the boost documentation on composite keys can be found here: boost::multi_index composite keys.
When using a composite key, the documentation states that "Composite keys are sorted by lexicographical order, i.e. sorting is performed by the first key, then the second key if the first one is equal, etc". Does this mean that the structure is stored such that a lookup for a specific 2-part composite key will take O(n=1) time, i.e. is the container sorted such that there is a pointer directly to each item, or does the boost container retrieve a list that matches the first part of the composite key and then need to perform a search for items matching the second part of the key and thus is slower?
For example, if I was to maintain two containers manually using two different indices and wanted to find items that matched a specific 2-part query I would probably filter the first container for all items matching the 1st part of the query, and then filter the result for items that match the 2nd part of the query. So this manual method would effectively involve two searches. Does boost effectively do this or does it improve on efficiency somehow via the use of composite keys?
Hopefully I've explained myself here but please ask questions and I will try my best to clarify exactly what I mean!
Lookups involving composite keys do not go through any two-stage process as you describe. composite_key-induced orderings are normal orderings, the only special thing about it being its dependence on two or more element keys rather than one.
Maybe an example will clarify. Consider this use of composite_key:
struct element
{
int x,y,z;
};
typedef multi_index_container<
element,
indexed_by<
ordered_unique<
composite_key<
element,
member<element,int,&element::x>,
member<element,int,&element::y>,
member<element,int,&element::z>
>
>
>
> multi_t;
The resulting container is in a sense equivalent to this:
struct element_cmp
{
bool operator()(const element& v1, const element& v2)const
{
if(v1.x<v2.x)return true;
if(v2.x<v1.x)return false;
if(v1.y<v2.y)return true;
if(v2.y<v1.y)return false;
return v1.z<v2.z;
}
};
typedef std::set<element,element_cmp> set_t;
composite_key automatically generates equivalent code to that in element_cmp::operator(), and additionally allows for lookup on just the first n keys, but the underlying data structure does not change with respect to the case using std::set.

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector(vector(int)) which means goes in the order of source(destination(value)). A subsequent process needs to check this container, to see if an element exists in for a particular source, and a particular destination - it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)
Is a std::map<int, int> better for this purpose, and how do I use one?
I need a container to store a value (int) according to two attributes,
source (int) and destination (int)
std::map<std::pair<int, int>, int>
A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.
std::map::find
http://www.cplusplus.com/reference/map/map/find/
The container has the possibility of being very sparse.
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed very closely sequentially as 0...N, will be doing a lot of look-ups, and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>> - usually implemented by a binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it
If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.

how boost multi_index is implemented

I have some difficulties understanding how Boost.MultiIndex is implemented. Lets say I have the following:
typedef multi_index_container<
employee,
indexed_by<
ordered_unique<member<employee, std::string, &employee::name> >,
ordered_unique<member<employee, int, &employee::age> >
>
> employee_set;
I imagine that I have one array, Employee[], which actually stores the employee objects, and two maps
map<std::string, employee*>
map<int, employee*>
with name and age as keys. Each map has employee* value which points to the stored object in the array. Is this ok?
A short explanation on the underlying structure is given here, quoted below:
The implementation is based on nodes interlinked with pointers, just as say your favorite std::set implementation. I'll elaborate a bit on this: A std::set is usually implemented as an rb-tree where nodes look like
struct node
{
// header
color c;
pointer parent,left,right;
// payload
value_type value;
};
Well, a multi_index_container's node is basically a "multinode" with as many headers as indices as well as the payload. For instance, a multi_index_container with two so-called ordered indices uses an internal node that looks like
struct node
{
// header index #0
color c0;
pointer parent0,left0,right0;
// header index #1
color c1;
pointer parent1,left1,right2;
// payload
value_type value;
};
(The reality is more complicated, these nodes are generated through some metaprogramming etc. but you get the idea) [...]
Conceptually, yes.
From what I understand of Boost.MultiIndex (I've used it, but not seen the implementation), your example with two ordered_unique indices will indeed create two sorted associative containers (like std::map) which store pointers/references/indices into a common set of employees.
In any case, every employee is stored only once in the multi-indexed container, whereas a combination of map<string,employee> and map<int,employee> would store every employee twice.
It may very well be that there is indeed a (dynamic) array inside some multi-indexed containers, but there is no guarantee that this is true:
[Random access indices] do not provide memory contiguity,
a property of std::vectors by which
elements are stored adjacent to one
another in a single block of memory.
Also, Boost.Bimap is based on Boost.MultiIndex and the former allows for different representations of its "backbone" structure.
Actually I do not think it is.
Based on what is located in detail/node_type.hpp. It seems to me that like a std::map the node will contain both the value and the index. Except that in this case the various indices differ from one another and thus the node interleaving would actually differ depending on the index you're following.
I am not sure about this though, Boost headers are definitely hard to parse, however it would make sense if you think in term of memory:
less allocations: faster allocation/deallocation
better cache locality
I would appreciate a definitive answer though, if anyone knows about the gore.