Performance in Nesting three maps Vs Separate maps in C++

Performance in Nesting three maps Vs Separate maps in C++ - c++

I am confused to choose between the two methods to have a STL structure ,
Method A:
map<pair<string,int>,map<string,map<ULONG,vector<string>>*>*>
Method B:
Is the above advisable or having a separate maps like below,
map<pair<string,int>,vector<string>>
After querying from this parent map , then iterating the vector and query the second map
map<string,map<ULONG,vector<string>>*>
Out of the above two methods which is the optimal way and which will cause more performance overhead?
Update 1:
My target is to store the output logs in memory which has three groups.. the outermost key "pair" is parent grouping and which has it's own sub groups.. And each sub groups will have it's own groups.
After TypeDef the Method A:
typedef map<ULONG,vector<string>> Sub_Map2;
typedef map<string,Sub_Map2*> Sub_Map1;
typedef map<pair<string,int>,Sub_Map1*> Parent_map;
For better readability

Don't go with premature optimization. Use clean code and try to optimize it only if you see a bottleneck in that code. Use typedef's in order to maintain readability.
I.e. (I don't know how you want to organize it).
typedef map<ULONG, vector<string>> IDLogMap;
typedef map<pair<string, int>, IDLogMap> PairLogMap;
Anyway I suggest you to refactor a bit your code, creating some log message class and so on, because map<pair<string,int>,map<string,map<ULONG,vector<string>>*>*> it's a bit too complicated for me, especially if you want to obtain a specific log message. Also, try to avoid raw pointers.

std::map will allocate each key-value pair separately, so there is no reallocation done when you insert or remove elements from the maps. Meaning, there is no overhead difference between the two versions (asides from the extra lookup).
That said, option B may be nicer if you ever need to iterate the inner maps on their own - if you don't, no need to complicate the code.

Related

unordered_map to find indices of an array

I want to find indices of a set efficiently. I am using unordered_map and making the inverse map like this
std::unordered_map <int, int> myHash (size);
Int i = 0;
for (it = someSet.begin(); it != someSet.end(); it++)
{
myHash.insert({*it , i++});
}
It works but it is not efficient. I did this so anytime I need the indices I could access them O(1). Performance analysis is showing me that this part became hotspot of my code.
VTune tells me that new operator is my hotspot. I guess something is happening inside the unordered_map.
It seems to me that this case should be handled efficiently. I couldn't find a good way yet. Is there a better solution? a correct constructor?
Maybe I should pass more info to the constructor. I looked up the initialize list but it is not exactly what I want.
Update: Let me add some more information. The set is not that important; I save the set in to an array (sorted). Later I need to find the index of the values which are unique. I can do it in logn but it is not fast enough. It is why I decided to use a hash. The size of the set (columns of submatrix) doesn't change after this point.
It arise from sparse matrix computation which I need to find index of submatrices in a bigger matrix. Therefore the size and the pattern of the look ups is depend on the input matrix. It works reasonable on smaller problems. I could use a lookup table but while I am planning to do it in parallel the lookup table for each thread can be expensive. I have the exact size of hash in the time of creation. I thought by sending it to the constructor it stops reallocating. I really don't understand why it is reallocating this much.

The problem is, std::unordered_map, mainly implemented as a list of vectors, is extremely cache-unfriendly, and will perform especially poorly with small keys/values (like int,int in your case), not to mention requiring tons of (re-)allocations.
As an alternative you can try a third-party hash map implementing open addressing with linear probing (a mouthful, but the underlying structure is simply a vector, i.e. much more cache-friendly). For example, Google's dense_hash_map or this: flat_hash_map. Both can be used as a drop-in replacement for unordered_map, and only additionally require to designate one int value as the "empty" key.

std::unordered_map<int, int> is often implemented as if it was
std::vector<std::list<std::par<int, int>>>
Which causes a lot of allocations and deallocations of each node, each (de-)allocation is using a lock which causes contention.
You can help it a bit by using emplace instead of insert, or you can jump out in the fantastic new world of pmr allocators. If your creation and destruction of the pmr::unordered_map is single threaded you should be able to get a lot of extra performance out of it. See Jason Turners C++ Weekly - Ep 222 - 3.5x Faster Standard Containers With PMR!, his example is a bit on the small side but you can get the general idea.

Searching data using different keys

I am no expert in C++ and STL.
I use a structure in a Map as data. Key is some class C1.
I would like to access the same data but using a different key C2 too (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?

You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().

Not in the Standard Library, but Boost offers boost::multi_index

Two keys of different types
I must admit I've misread a bit, and didn't really notice you want 2 keys of different types, not values. The solution for that will base on what's below, though. Other answers have pretty much what will be needed for that, I'd just add that you could make an universal lookup function: (C++14-ish pseudocode).
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds: (simplest meaning exactly that it should be really thought through). I'd also switch to unordered_map as default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
Idea 2
Another solution would be to use pointers; then the only cost of duplicate is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
Object of Data will be alive as long as there is at least one key pointing to it.

What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since it is possible that pointers are invalidated because of the future growth of the vector, though, I will choose to use the vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.

Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile", but saying that a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default - but that's not particularly reliable. That's a world away from recommending every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.

You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees.

I had the same problem, at first holding two map for shared pointers sound very cool. But you will still need to manage this two maps(inserting, removing etc...).
Than I came up with other way of doing this.
My reason was; accessing a data with x-y or radius-angle. Think like each point will hold data but point could be described as cartesian x,y or radius-angle .
So I wrote a struct like
struct MyPoint
{
std::pair<int, int> cartesianPoint;
std::pair<int, int> radianPoint;
bool operator== (const MyPoint& rhs)
{
if (cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint)
return true;
return false;
}
}
After that I could used that as key,
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure if your case is the same or adjustable to this scenerio but it can be a option.

Is using a map where value is std::shared_ptr a good design choice for having multi-indexed lists of classes?

problem is simple:
We have a class that has members a,b,c,d...
We want to be able to quickly search(key being value of one member) and update class list with new value by providing current value for a or b or c ...
I thought about having a bunch of
std::map<decltype(MyClass.a/*b,c,d*/),shared_ptr<MyClass>>.
1) Is that a good idea?
2) Is boost multi index superior to this handcrafted solution in every way?
PS SQL is out of the question for simplicity/perf reasons.

Boost MultiIndex may have a distinct disadvantage that it will attempt to keep all indices up to date after each mutation of the collection.
This may be a large performance penalty if you have a data load phase with many separate writes.
The usage patterns of Boost Multi Index may not fit with the coding style (and taste...) of the project (members). This should be a minor disadvantage, but I thought I'd mention it
As ildjarn mentioned, Boost MI doesn't support move semantics as of yet
Otherwise, I'd consider Boost MultiIndex superior in most occasions, since you'd be unlikely to reach the amount of testing it received.

You want want to consider containing all of your maps in a single class, arbitrarily deciding on one of the containers as the one that stores the "real" objects, and then just use a std::map with a mapped type of raw pointers to elements of the first std::map.
This would be a little more difficult if you ever need to make copies of those maps, however.

An integer hashing problem

I have a (C++) std::map<int, MyObject*> that contains a couple of millions of objects of type MyObject*. The maximum number of objects that I can have, is around 100 millions. The key is the object's id. During a certain process, these objects must be somehow marked( with a 0 or 1) as fast as possible. The marking cannot happen on the objects themselves (so I cannot introduce a member variable and use that for the marking process). Since I know the minimum and maximum id (1 to 100_000_000), the first thought that occured to me, was to use a std::bit_set<100000000> and perform my marking there. This solves my problem and also makes it easier when marking processes run in parallel, since these use their own bit_set to mark things, but I was wondering what the solution could be, if I had to use something else instead of a 0-1 marking, e.g what could I use if I had to mark all objects with an integer number ?
Is there some form of a data structure that can deal with this kind of problem in a compact (memory-wise) manner, and also be fast ? The main queries of interest are whether an object is marked, and with what was marked with.
Thank you.
Note: std::map<int, MyObject*> cannot be changed. Whatever data structure I use, must not deal with the map itself.

How about making the value_type of your map a std::pair<bool, MyObject*> instead of MyObject*?

If you're not concerned with memory, then a std::vector<int> (or whatever suits your need in place of an int) should work.
If you don't like that, and you can't modify your map, then why not create a parallel map for the markers?
std::map<id,T> my_object_map;
std::map<id,int> my_marker_map;
If you cannot modify the objects directly, have you considered wrapping the objects before you place them in the map? e.g.:
struct
{
int marker;
T *p_x;
} T_wrapper;
std::map<int,T_wrapper> my_map;
If you're going to need to do lookups anyway, then this will be no slower.
EDIT: As #tenfour suggests in his/her answer, a std::pair may be a cleaner solution here, as it saves the struct definition. Personally, I'm not a big fan of std::pairs, because you have to refer to everything as first and second, rather than by meaningful names. But that's just me...

The most important question to ask yourself is "How many of these 100,000,000 objects might be marked (or remain unmarked)?" If the answer is smaller than roughly 100,000,000/(2*sizeof(int)), then just use another std::set or std::tr1::unordered_set (hash_set previous to tr1) to track which ones are so marked (or remained unmarked).
Where does 2*sizeof(int) come from? It's an estimate of the amount of memory overhead to maintain a heap structure in a deque of the list of items that will be marked.
If it is larger, then use std::bitset as you were about to use. It's overhead is effectively 0% for the scale of quantity you need. You'll need about 13 megabytes of contiguous ram to hold the bitset.
If you need to store a marking as well as presence, then use std::tr1::unordered_map using the key of Object* and value of marker_type. And again, if the percentage of marked nodes is higher than the aforementioned comparison, then you'll want to use some sort of bitset to hold the number of bits needed, with suitable adjustments in size, at 12.5 megabytes per bit.
A purpose-built object holding the bitset might be your best choice, given the clarification of the requirements.
Edit: this assumes that you've done proper time-complexity computations for what are acceptable solutions to you, since changing the base std::map structure is no longer permitted.

If you don't mind using hacks, take a look at the memory optimization used in Boost.MultiIndex. It can store one bit in the LSB of a stored pointer.

Is this use of nested vector/multimap/map okay?

I am looking for the perfect data structure for the following scenario:
I have an index i, and for each one I need to support the following operation 1: Quickly look up its Foo objects (see below), each of which is associated with a double value.
So I did this:
struct Foo {
int a, b, c;
};
typedef std::map<Foo, double> VecElem;
std::vector<VecElem> vec;
But it turns out to be inefficient because I also have to provide very fast support for the following operation 2: Remove all Foos that have a certain value for a and b (together with the associated double values).
To perform this operation 2, I have to iterate over the maps in the vector, checking the Foos for their a and b values and erasing them one by one from the map, which seems to be very expensive.
So I am now considering this data structure instead:
struct Foo0 {
int a, b;
};
typedef std::multimap<Foo0, std::map<int, double> > VecElem;
std::vector<VecElem> vec;
This should provide fast support for both operations 1 and 2 above. Is that reasonable? Is there lots of overhead from the nested container structures?
Note: Each of the multimaps will usually only have one or two keys (of type Foo0), each of which will have about 5-20 values (of type std::map<int,double>).

To answer the headline question: yes, nesting STL containers is perfectly fine. Depending on your usage profile, this could result in excessive copying behind the scenes though. A better option might be to wrap the contents of all but top-level container using Boost::shared_ptr, so that container housekeeping does not require a deep copy of your nested container's entire contents. This would be the case say if you plan on spending a lot of time inserting and removing VecElem in the toplevel vector - expensive if VecElem is a direct multimap.
Memory overhead in the data structures is likely to be not significantly worse than anything you could design with equivalent functionality, and more likely better unless you plan to spend more time on this than is healthy.

Well, you have a reasonable start on this idea ... but there are some questions that must be addressed first.
For instance, is the type Foo mutable? If it is, then you need to be careful about creating a type Foo0 (um ... a different name may be a good idea hear to avoid confusion) since changes to Foo may invalidate Foo0.
Second, you need to decide whether you also need this structure to work well for inserts/updates. If the population of Foo is static and unchanging - this isn't an issue, but if it isn't, you may end up spending a lot of time maintaining Vec and VecElem.
As far as the question of nesting STL containers goes, this is fine - and is often used to create arbitrarily complex structures.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js