multimap representation in memory - c++

I'm debugging my code and at one point I have a multimap which contains pairs of a long and a Note object created like this:
void Track::addNote(Note &note) {
    long key = note.measureNumber * 1000000 + note.startTime;
    this->noteList.insert(make_pair(key, note));
}
I wanted to check whether these values are actually inserted in the multimap, so I placed a breakpoint; this is what the multimap looks like (in Xcode):
It seems like I can infinitely expand the elements (my actual multimap is the first element, called noteList). Any ideas whether this is normal, and why I can't read the actual pair values (the long and the Note)?

libstdc++ implements its maps and sets using a generic red-black tree. The nodes of the tree use a base class, _Rb_tree_node_base, which contains pointers of the same type for the parent/left/right nodes.
To access the data, it performs a static cast to the node type specific to the template arguments you provided. You won't be able to see the data in Xcode unless you can force that cast.
It does something similar with linked lists, with a linked list node base.
Edit: It does this to reduce the amount of duplicate code generated by the template. Rather than having an RbTree&lt;Type1&gt;, RbTree&lt;Type2&gt;, and so on, libstdc++ has a single set of operations that work on the base class, and those operations are the same regardless of the underlying type of the map. It only casts when it needs to examine the data, and the actual rotation/rebalance code is the same for all of the trees.

Seems like a bug in the component that renders the collection. About halfway down the list there is an entry that is 0x00000000, but the rendering continues below it, without any valid pointers. You may need to apply your own common-sense interpretation of the displayed data and treat a null value as the end of that branch of the tree.

Related

Fill CapnProto List with non-primitive

According to the CapnProto documentation: (NOTE: I am using the C++ version)
For List&lt;Foo&gt; where Foo is a non-primitive type, the type returned by
operator[] and iterator::operator*() is Foo::Reader (for
List&lt;Foo&gt;::Reader) or Foo::Builder (for List&lt;Foo&gt;::Builder). The
builder's set method takes a Foo::Reader as its second parameter.
While using "set" seems to work fine for primitive types:
Other stack overflow question for primitives only
There does not appear to be a "set" function for automatically generated lists of non-primitives. Did my CapnProto generation fail in some way, or is there another method for setting elements in a list of non-primitives?
There is a "set" method, but it is called setWithCaveats():
destListBuilder.setWithCaveats(index, sourceStructReader)
This is to let you know that there are some obscure problems with setting an element of a struct list. The problem stems from the fact that struct lists are not represented as a list of pointers as you might expect, but rather they are a "flattened" series of consecutive structs, all of the same size. This implies that space for all the structs in the list is allocated at the time that you initialize the list. So, when you call setWithCaveats(), the target space is already allocated previously, and you're copying the source struct into that space.
This presents a problem in the face of varying versions: the source struct might have been constructed using a newer version of the protocol, in which additional fields were defined. In this case, it may actually be larger than expected. But, the destination space was already allocated, based on the protocol version you compiled with. So, it's too small! Unfortunately, there's no choice but to discard the newly-defined fields that we don't know about. Hence, data may be lost.
Of course, it may be that in your application, you know that the struct value doesn't come from a newer version, or that you don't care if you lose fields that you don't know about. In this case, setWithCaveats() will do what you want.
If you want to be careful to preserve unknown fields, you may want to look into the method capnp::Orphanage::newOrphanConcat(). This method can concatenate a list of lists of struct readers into a single list in such a way that no data is lost -- the target list is allocated with each struct's size equal to the maximum of all input structs.
auto orphanage = Orphanage::getForMessageContaining(builder);
auto orphan = orphanage.newOrphanConcat({list1Reader, list2Reader});
builder.adoptListField(kj::mv(orphan));

Can I reinterpret a memory mapped file of key-value pairs as a map in order to sort them?

I have a memory-mapped file that contains key-value pairs. Both the key and value are uint32_t, and all the keys and values are stored in the file in binary, where a key immediately precedes its value. The file contains only these pairs, with no delimiters.
I want to be able to sort all of these key-value pairs by increasing key.
The following just compiled in my code:
struct FileAsMap { map<uint32_t, uint32_t> keyValueMap; };
const FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
but I don't really know what to do from here, since by definition the map container keeps a strict weak ordering of the pairs by key. If I just reinterpret the mapped file as a map, how can I get the pairs to order?
It's not an answer, but the explanation doesn't fit into the comment length limit.
The keys in a map are usually unique (at least in std::map they are). But maps in general differ from one another in how they store and sort keys. For example, std::map is based on a balanced binary tree with an average complexity of O(log n) for retrieving a given key, where n is the number of elements in the map. std::unordered_map, by contrast, is internally a hash map with average access time O(1); that is, it looks up a key in constant time regardless of the number of elements inside.
In any case, all these data containers demand a dedicated internal in-memory structure, which practically never looks like a simple stream of key-value pairs. That's why I said above, in the first comment, that it's almost impossible to reuse one of the standard maps as a convenient data accessor for mmap-ed data without first reading and unpacking the data stream.
But you can create your own map-like class which iterates over the data in the mmap-ed area and checks, in its operator[](size_t i), whether a stored key matches the requested one. I guess the simplest implementation would take a single screen of code.
But beware: a sequential scan is a relatively expensive operation, so if there are enough elements in the file it could become unacceptably slow. In that case you'll need some optimized indexing; for example, all keys could be read at the start of processing and an index array built. But all these questions depend heavily on the details of the task, so it's better to stop the explanation here.
If you have any further questions, feel free to ask. Of course, a good question assumes that you have already studied the subject and have now encountered a particular problem that you can't solve yourself.
There are a lot of reasons why the answer is no. The two simplest are:
Maps are a structure that stores data in a form in which it's already sorted. Your data isn't already sorted, so it's simply not a map.
The map class has its own internal data structure that it uses to store maps. Unless your file replicates this internal structure perfectly (which it almost certainly can't since it likely includes pointers into memory) the map class will misunderstand the data in the file.
How did you serialize the data to the file?
Assuming that you serialized a struct consisting of maps, you'd deserialize as below:
FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
Gives access to entire structure (blob).
(*fileAsMap).keyValueMap gives access to map.

How to do per node caching in a tree visitor

I have an application where I want to calculate different representations (mesh, voxelization, signed distance function, ...) of a tree of primitives (leaf nodes) that are combined via boolean operations (inner nodes).
My first approach was to write an abstract base class with a virtual getter function for each of the different representations, caching the intermediate results at the respective nodes as long as there was no change in their subtree (a change would flush the cache).
However, I was unsatisfied with the ugly coupling of the tree structure with each of the different representations. To alleviate this I removed the abstract base classes and instead set up a visitor for each of the representations.
This neatly decoupled the tree from the representations but left me with the problem that I now need to cache the intermediate results somewhere else and this is where my problem starts.
TL;DR
How do I cache (arbitrary many differently typed) intermediate values at inner nodes of the tree without making the tree dependent on the value type?
My Approaches
The requirements offer two choices:
store the data in the tree but with type erasure
store the data outside the tree and somehow "connect" it to the node
The first leaves me puzzled over an efficiency problem: I could easily add a container of boost::any (or something equivalent) to the nodes, but then each visitor would have to search the whole container for its own data.
The separation in the second introduces the problem of keeping the cache up to date with the current tree. If there are changes in the tree (deletions, alterations of nodes), the cached values must at least be invalidated. My intuition was to use some hash function and an unordered_map, but I hit some problems there as well:
I cannot use the tree nodes themselves as keys, so I need to introduce another class that just references tree nodes and represents them in the map
referencing the values from the unordered_map's keys requires erasing all entries whose referents are deleted, or we have a dangling reference (or pointer) in the unordered_map, which could get triggered on rehash
changes in the tree would require reconstructing the unordered_map because keys might have changed
Am I missing some obvious solution to this?
Which approach would you favor (and why)?
I once had a similar problem and my solution was as follows:
Let each node have a unique identifier.
Let each node have a version number. Modifications that invalidate calculated values for the node just increase the version number.
Let each visitor have a caching map, where the node's ID is the key, mapped to a version/value pair.
When (re-)walking the tree, look for the node's entry in the map. If the version is current, use the cached value. If it is outdated, calculate the new value and replace the old version/value pair.
At first I used the node's address as the ID, but for memory reasons I had to reuse subtrees, so I picked the path to the node as the ID instead. Such a path has the advantage that it can be calculated by each visitor and need not be stored at the node. In my case, each node could have at most two children, so a path was merely a sequence of left/right decisions, which can be stored in a simple unsigned int with some bit-shifting (my trees never reached a depth of 32, so a 32-bit unsigned was more than enough as a key).

QMap::contains() VS QMap::find()

I often see code like:
if (myQMap.contains("my key")) {
    myValue = myQMap["my key"];
}
which theoretically performs two lookups in the QMap.
My first reaction is that it should be replaced by the following, which performs only one lookup and should be twice as fast:
auto it = myQMap.find("my key");
if (it != myQMap.end()) {
    myValue = it.value();
}
I am wondering if QMap does this optimization automatically for me?
In other words, I am wondering if QMap saves the position of the last element found with QMap::contains() and checks it first before performing the next lookup?
I would expect QMap to provide both functions for a better class interface. It's more natural to ask whether the map contains a value with a specified key than it is to call the find function.
As the source code shows, both find and contains call the following internal function:
Node *n = d->findNode(akey);
So if you're going to use the returned iterator, then using find and checking the return value will be more efficient, but if you just want to know if the value exists in the map, calling contains is better for readability.
If you look at the source code, you'll see that QMap is implemented as a binary tree structure of nodes. Calling findNode iterates through the nodes and does not cache the result.
The QMap source code reveals that there is no special caching code in the QMap::contains() method.
In some cases you can use QMap::value() or QMap::values() to get the value for a key and check whether it is correct. These methods (and the const operator[]) will copy the value, although this is probably fine for most Qt types, since their underlying data is copy-on-write (notably QMap itself).

how boost multi_index is implemented

I have some difficulties understanding how Boost.MultiIndex is implemented. Let's say I have the following:
typedef multi_index_container<
    employee,
    indexed_by<
        ordered_unique<member<employee, std::string, &employee::name> >,
        ordered_unique<member<employee, int, &employee::age> >
    >
> employee_set;
I imagine that I have one array, employee[], which actually stores the employee objects, and two maps
map<std::string, employee*>
map<int, employee*>
with name and age as keys. Each map's employee* value points to the stored object in the array. Is this correct?
A short explanation of the underlying structure is given here, quoted below:
The implementation is based on nodes interlinked with pointers, just as in, say, your favorite std::set implementation. I'll elaborate a bit on this: a std::set is usually implemented as an rb-tree where nodes look like
struct node
{
    // header
    color c;
    pointer parent, left, right;
    // payload
    value_type value;
};
Well, a multi_index_container's node is basically a "multinode" with as many headers as indices as well as the payload. For instance, a multi_index_container with two so-called ordered indices uses an internal node that looks like
struct node
{
    // header index #0
    color c0;
    pointer parent0, left0, right0;
    // header index #1
    color c1;
    pointer parent1, left1, right1;
    // payload
    value_type value;
};
(The reality is more complicated, these nodes are generated through some metaprogramming etc. but you get the idea) [...]
Conceptually, yes.
From what I understand of Boost.MultiIndex (I've used it, but not seen the implementation), your example with two ordered_unique indices will indeed create two sorted associative containers (like std::map) which store pointers/references/indices into a common set of employees.
In any case, every employee is stored only once in the multi-indexed container, whereas a combination of map<string,employee> and map<int,employee> would store every employee twice.
It may very well be that there is indeed a (dynamic) array inside some multi-indexed containers, but there is no guarantee of that:
[Random access indices] do not provide memory contiguity, a property of std::vectors by which elements are stored adjacent to one another in a single block of memory.
Also, Boost.Bimap is based on Boost.MultiIndex and the former allows for different representations of its "backbone" structure.
Actually, I do not think it is.
Based on what is in detail/node_type.hpp, it seems to me that, like a std::map, the node contains both the value and the index headers. Except that in this case the various indices differ from one another, and thus the node interleaving would actually differ depending on the index you're following.
I am not sure about this though; Boost headers are definitely hard to parse. However, it would make sense if you think in terms of memory:
fewer allocations: faster allocation/deallocation
better cache locality
I would appreciate a definitive answer though, if anyone knows the gory details.