Is this use of nested vector/multimap/map okay? - c++

I am looking for the perfect data structure for the following scenario:
I have an index i, and for each one I need to support the following operation 1: Quickly look up its Foo objects (see below), each of which is associated with a double value.
So I did this:
struct Foo {
int a, b, c;
};
typedef std::map<Foo, double> VecElem;
std::vector<VecElem> vec;
But it turns out to be inefficient because I also have to provide very fast support for the following operation 2: Remove all Foos that have a certain value for a and b (together with the associated double values).
To perform this operation 2, I have to iterate over the maps in the vector, checking the Foos for their a and b values and erasing them one by one from the map, which seems to be very expensive.
So I am now considering this data structure instead:
struct Foo0 {
int a, b;
};
typedef std::multimap<Foo0, std::map<int, double> > VecElem;
std::vector<VecElem> vec;
This should provide fast support for both operations 1 and 2 above. Is that reasonable? Is there lots of overhead from the nested container structures?
Note: Each of the multimaps will usually only have one or two keys (of type Foo0), each of which will have about 5-20 values (of type std::map<int,double>).
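For illustration, the second layout can be sketched roughly as follows. The lexicographic operator< for Foo0 and the helper name remove_ab are assumptions made for this sketch; the question shows neither.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct Foo0 {
    int a, b;
};

// Assumed ordering: std::multimap needs one, the question does not show it.
inline bool operator<(const Foo0& lhs, const Foo0& rhs) {
    return std::tie(lhs.a, lhs.b) < std::tie(rhs.a, rhs.b);
}

typedef std::multimap<Foo0, std::map<int, double> > VecElem;

// Operation 2 (hypothetical helper): drop every entry keyed by (a, b)
// from all multimaps. Returns how many multimap entries were erased.
inline std::size_t remove_ab(std::vector<VecElem>& vec, int a, int b) {
    std::size_t erased = 0;
    const Foo0 key = {a, b};
    for (std::size_t i = 0; i < vec.size(); ++i) {
        erased += vec[i].erase(key);  // erases the whole equal range for key
    }
    return erased;
}
```

Each erase call costs a logarithmic search plus the destruction of the matching inner maps, so there is no per-Foo comparison any more.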

To answer the headline question: yes, nesting STL containers is perfectly fine. Depending on your usage profile, though, it can result in excessive copying behind the scenes. A better option might be to wrap the contents of all but the top-level container in boost::shared_ptr, so that container housekeeping does not require a deep copy of the nested container's entire contents. This matters if, say, you plan to spend a lot of time inserting and removing VecElem objects in the top-level vector, which is expensive if VecElem is a multimap held by value.
Memory overhead in the data structures is unlikely to be significantly worse than anything you could design with equivalent functionality, and is more likely better unless you plan to spend more time on this than is healthy.
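A minimal sketch of the pointer-wrapping idea, using std::shared_ptr from C++11 in place of the Boost class (an assumption about the available toolchain). Inner, Outer and add_inner are hypothetical names, and Inner stands in for whatever nested container is actually used.

```cpp
#include <map>
#include <memory>
#include <vector>

typedef std::map<int, double> Inner;                 // stand-in for the nested container
typedef std::vector<std::shared_ptr<Inner> > Outer;  // top level holds pointers only

// Appends a new, empty inner container and returns a handle to it.
inline std::shared_ptr<Inner> add_inner(Outer& outer) {
    outer.push_back(std::make_shared<Inner>());
    return outer.back();
}
```

Copying or reallocating Outer now copies pointers rather than whole maps. The flip side is that a copy of Outer shares its inner containers with the original instead of owning independent ones.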

Well, you have a reasonable start on this idea ... but there are some questions that must be addressed first.
For instance, is the type Foo mutable? If it is, then you need to be careful about creating a type Foo0 (a different name may be a good idea here to avoid confusion), since changes to Foo may invalidate Foo0.
Second, you need to decide whether you also need this structure to work well for inserts/updates. If the population of Foo is static and unchanging - this isn't an issue, but if it isn't, you may end up spending a lot of time maintaining Vec and VecElem.
As far as the question of nesting STL containers goes, this is fine - and is often used to create arbitrarily complex structures.

Related

Performance in Nesting three maps Vs Separate maps in C++

I am unsure which of the following two methods to choose for an STL structure.
Method A:
map<pair<string,int>,map<string,map<ULONG,vector<string>>*>*>
Is the above advisable, or is it better to have separate maps like the ones below?
Method B:
map<pair<string,int>,vector<string>>
After querying this parent map, I would iterate over the vector and query the second map:
map<string,map<ULONG,vector<string>>*>
Which of the two methods is better, and which one causes more performance overhead?
Update 1:
My goal is to store output logs in memory, organized into three levels of groups. The outermost key (the pair) is the parent grouping, which has its own subgroups, and each subgroup has its own groups.
Method A after adding typedefs:
typedef map<ULONG,vector<string>> Sub_Map2;
typedef map<string,Sub_Map2*> Sub_Map1;
typedef map<pair<string,int>,Sub_Map1*> Parent_map;
For better readability
Don't optimize prematurely. Write clean code and optimize it only if you see a bottleneck in that code. Use typedefs to maintain readability.
For example (I don't know how you want to organize it):
typedef map<ULONG, vector<string>> IDLogMap;
typedef map<pair<string, int>, IDLogMap> PairLogMap;
In any case, I suggest refactoring your code a bit, for example by creating a log message class, because map<pair<string,int>,map<string,map<ULONG,vector<string>>*>*> is too complicated to work with, especially if you want to retrieve a specific log message. Also, try to avoid raw pointers.
std::map allocates each key-value pair separately, so no reallocation takes place when you insert or remove elements from the maps. This means there is no overhead difference between the two versions (aside from the extra lookup).
That said, option B may be nicer if you ever need to iterate the inner maps on their own - if you don't, no need to complicate the code.
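As a sketch of Method A without raw pointers, the inner maps can be held by value. ULONG is assumed here to be unsigned long, and add_log and find_logs are hypothetical helper names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef unsigned long ULONG;  // assumption: ULONG is an unsigned long
typedef std::map<ULONG, std::vector<std::string> > Sub_Map2;
typedef std::map<std::string, Sub_Map2> Sub_Map1;
typedef std::map<std::pair<std::string, int>, Sub_Map1> Parent_map;

// Hypothetical helper: append a log line, creating intermediate maps on demand.
inline void add_log(Parent_map& logs, const std::string& name, int id,
                    const std::string& group, ULONG sub, const std::string& line) {
    logs[std::make_pair(name, id)][group][sub].push_back(line);
}

// Hypothetical helper: look up without inserting; returns nullptr if absent.
inline const std::vector<std::string>* find_logs(const Parent_map& logs,
                                                 const std::string& name, int id,
                                                 const std::string& group, ULONG sub) {
    Parent_map::const_iterator p = logs.find(std::make_pair(name, id));
    if (p == logs.end()) return nullptr;
    Sub_Map1::const_iterator g = p->second.find(group);
    if (g == p->second.end()) return nullptr;
    Sub_Map2::const_iterator s = g->second.find(sub);
    return s == g->second.end() ? nullptr : &s->second;
}
```

Holding the inner maps by value removes the manual new/delete bookkeeping that the pointer version needs.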

Define struct with minimum size

I want to define a struct, e.g. type, such that sizeof(type) is no less than some value.
Motivation:
I have a std::vector<type> from which I will remove some elements. Because I have stored the indexes of some elements elsewhere, I want to mark a removed slot as unused and reuse it later rather than erase it. To do that, I want to store the index of the next available position inside each erased slot, forming a free list. As a result, sizeof(type) must be no less than sizeof(size_t), and type must be suitably aligned as well.
Possible Solutions:
boost::variant<type, size_t>
This has two problems from my point of view. If I use boost::get<type>, performance decreases significantly. If I use boost::apply_visitor, the syntax becomes awkward and performance also decreases according to my profiling.
union{type t; size_t s;}
This works, of course, but has two shortcomings. Firstly, the syntax for referring to members of type becomes messier. Secondly, I have to define a constructor, copy constructor, etc. for this union.
Extend type by char[sizeof(size_t) - sizeof(type)]
This almost fulfills my requirements. However, it risks a zero-length array, which is not supported by the C++ standard, and possibly wrong alignment.
Since I won't use type as size_t often, I'd like to just ensure I can use reinterpret_cast<size_t> when needed.
Complements
After reading the comments, I think the best solution for my problem is boost::variant. But I am still wondering whether there is a way to combine the benefits of solutions 2 and 3, i.e.
a. I can access members of type without changes.
b. Get the guarantee that reinterpret_cast<size_t> works.
You can mitigate the concerns about solution 3 with something like:
struct data
{
// ...
};
template<class T, bool> class pad_;
template<class T> class pad_<T, true> { char dummy[sizeof(T) - sizeof(data)]; };
template<class T> class pad_<T, false> {};
template<class T> using pad = pad_<T, (sizeof(T) > sizeof(data))>;
class type : public data, pad<size_t>
{
// ...
};
This code:
assumes the empty base optimization, so that pad can be optimized out of the layout of type completely when sizeof(data) >= sizeof(size_t)
avoids the risk of a zero-length array
Though this is an interesting problem, the design itself seems questionable.
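The size guarantee can be checked at compile time. The following is a generalized sketch of the same trick, not the answer's exact code: Pad, Padded, Small and Large are hypothetical names, and only the size (not the alignment) requirement is verified.

```cpp
#include <cstddef>

// Empty by default: used when Data is already at least sizeof(std::size_t).
template <class Data, bool NeedsPad>
struct Pad {};

// Only instantiated when Data is too small, so the array length is never zero.
template <class Data>
struct Pad<Data, true> {
    char dummy[sizeof(std::size_t) - sizeof(Data)];
};

// Data plus whatever padding is needed to reach sizeof(std::size_t).
template <class Data>
struct Padded : Data, Pad<Data, (sizeof(Data) < sizeof(std::size_t))> {};

struct Small { char c; };       // smaller than std::size_t on typical platforms
struct Large { double d[4]; };  // already large enough

static_assert(sizeof(Padded<Small>) >= sizeof(std::size_t), "padding too small");
static_assert(sizeof(Padded<Large>) >= sizeof(std::size_t), "padding too small");
```

Alignment is a separate concern. Adding alignas(std::size_t) to the derived class is one possible way to address it, mentioned here as an assumption rather than as something the answer proposes.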
When inserting a new element, items marked unused are considered first, before growing the vector. This means that the relative order of items is unpredictable. If that is acceptable, you could simply have used a vector of (smart) pointers.
Typically a vector is inefficient when removing items from the middle. Since the order doesn't matter, it is possible to swap the element being removed with the last element and then pop the last element.
All elements are of the same size; allocating them from a pool could be faster than using the system allocator.
A pool basically allocates memory in big chunks and hands out smaller chunks on request. A pool usually stores the free list in the not-yet-allocated chunks to track available memory (the very same idea described in the question). There are some good implementations readily available (from Boost and other sources).
As for the original design, it is cumbersome to enumerate the elements in the vector because real elements are mixed with "holes", and the logic ends up obscured by additional checks.
There is probably some solid reasoning behind the original design; unfortunately @user1535111 has not given the details.
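The swap-with-last removal mentioned above can be sketched as a small helper. The name swap_remove is hypothetical, and the caller is assumed to pass a valid index.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[index] in O(1) by moving the last element into its place.
// Element order is not preserved, and the former last element changes index.
// Precondition: index < v.size().
template <class T>
void swap_remove(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        std::swap(v[index], v.back());
    }
    v.pop_back();
}
```

Because the question stores element indexes elsewhere, any stored index that referred to the old last element would have to be updated after each removal.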

Searching data using different keys

I am no expert in C++ and STL.
I use a structure in a Map as data. Key is some class C1.
I would like to access the same data but using a different key C2 too (where C1 and C2 are two unrelated classes).
Is this possible without duplicating the data?
I tried searching in google, but had a tough time finding an answer that I could understand.
This is for an embedded target where boost libraries are not supported.
Can somebody offer help?
You may store pointers to Data as std::map values, and you can have two maps with different keys pointing to the same data.
I think a smart pointer like std::shared_ptr is a good option in this case of shared ownership of data:
#include <map> // for std::map
#include <memory> // for std::shared_ptr
....
std::map<C1, std::shared_ptr<Data>> map1;
std::map<C2, std::shared_ptr<Data>> map2;
Instances of Data can be allocated using std::make_shared().
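A minimal sketch of the two-map arrangement. Data is a placeholder payload, std::string and int stand in for the unrelated key classes C1 and C2, and TwoKeyStore is a hypothetical wrapper name.

```cpp
#include <map>
#include <memory>
#include <string>

struct Data { int value; };  // placeholder payload

// Stand-ins for the unrelated key classes C1 and C2.
typedef std::string C1;
typedef int C2;

struct TwoKeyStore {
    std::map<C1, std::shared_ptr<Data> > map1;
    std::map<C2, std::shared_ptr<Data> > map2;

    // Registers one Data object under both keys; both maps share ownership.
    void insert(const C1& k1, const C2& k2, int value) {
        std::shared_ptr<Data> d = std::make_shared<Data>();
        d->value = value;
        map1[k1] = d;
        map2[k2] = d;
    }
};
```

A change made through one map is visible through the other, because both entries point at the same object.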
Not in the Standard Library, but Boost offers boost::multi_index
Two keys of different types
I must admit I misread the question a bit and didn't notice that you want two keys of different types, not different values. The solution for that builds on what's below, though. Other answers cover most of what is needed; I'd just add that you could write a universal lookup function (C++14-ish pseudocode):
template<class Key>
auto lookup (Key const& key) { }
And specialize it for your keys (arguably easier than SFINAE)
template<>
auto lookup<KeyA> (KeyA const& key) { return map_of_keys_a[key]; }
And the same for KeyB.
If you wanted to encapsulate it in a class, an obvious choice would be to change lookup to operator[].
Key of the same type, but different value
Idea 1
The simplest solution I can think of in 60 seconds (simplest meaning precisely that it still needs to be thought through properly). I'd also switch to unordered_map as the default.
map<Key, Data> data;
map<Key2, Key> keys;
Access via data[keys["multikey"]].
This will obviously waste some space (duplicating objects of Key type), but I am assuming they are much smaller than the Data type.
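A sketch of this two-level lookup, with find() in place of operator[] so that a failed lookup does not insert empty entries. The key types, the Data payload and the names AliasedStore and find_by_alias are placeholders chosen for illustration.

```cpp
#include <map>
#include <string>

struct Data { int value; };  // placeholder payload
typedef int Key;             // primary key (placeholder type)
typedef std::string Key2;    // secondary key (placeholder type)

struct AliasedStore {
    std::map<Key, Data> data;
    std::map<Key2, Key> keys;

    // Returns the Data reached through a secondary key, or nullptr if either
    // the alias or the primary key it names is missing.
    Data* find_by_alias(const Key2& alias) {
        std::map<Key2, Key>::iterator k = keys.find(alias);
        if (k == keys.end()) return nullptr;
        std::map<Key, Data>::iterator d = data.find(k->second);
        return d == data.end() ? nullptr : &d->second;
    }
};
```

The cost is two tree searches per secondary-key access, in exchange for storing each Data object exactly once.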
Idea 2
Another solution would be to use pointers; then the only cost of duplication is a (smart) pointer:
map<Key, shared_ptr<Data>> data;
The Data object stays alive as long as at least one key points to it.
What I usually do in these cases is use non-owned pointers. I store my data in a vector:
std::vector<Data> myData;
And then I map pointers to each element. Since it is possible that pointers are invalidated because of the future growth of the vector, though, I will choose to use the vector indexes in this case.
std::map<Key1, int> myMap1;
std::map<Key2, int> myMap2;
Don't expose the data containers to your clients. Encapsulate element insertion and removal in specific functions, which insert everywhere and remove everywhere.
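A rough sketch of that encapsulation, covering insertion and lookup only. Registry, the key types and the Data payload are hypothetical, and std::size_t is used for the stored indexes instead of int. Removal is left out because it would also have to fix up the indexes held in both maps.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Data { int value; };  // placeholder payload
typedef std::string Key1;    // placeholder key types
typedef int Key2;

class Registry {
public:
    // Stores d once and indexes it under both keys.
    void insert(const Key1& k1, const Key2& k2, const Data& d) {
        const std::size_t index = items_.size();
        items_.push_back(d);
        by_key1_[k1] = index;
        by_key2_[k2] = index;
    }

    // Overloads give one lookup name for both key types; nullptr if absent.
    // The returned pointer is only valid until the next insert.
    Data* find(const Key1& k) { return lookup(by_key1_, k); }
    Data* find(const Key2& k) { return lookup(by_key2_, k); }

private:
    template <class Map, class Key>
    Data* lookup(Map& index, const Key& k) {
        typename Map::iterator it = index.find(k);
        return it == index.end() ? nullptr : &items_[it->second];
    }

    std::vector<Data> items_;
    std::map<Key1, std::size_t> by_key1_;
    std::map<Key2, std::size_t> by_key2_;
};
```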
Bartek's "Idea 1" is good (though there's no compelling reason to prefer unordered_map to map).
Alternatively, you could have a std::map<C2, Data*>, or std::map<C2, std::map<C1, Data>::iterator> to allow direct access to Data objects after one C2-keyed search, but then you'd need to be more careful not to access invalid (erased) Data (or more precisely, to erase from both containers atomically from the perspective of any other users).
It's also possible for one or both maps to move to shared_ptr<Data> - the other could use weak_ptr<> if that's helpful ownership-wise. (These are in the C++11 Standard, otherwise the obvious source - boost - is apparently out for you, but maybe you've implemented your own or selected another library? Pretty fundamental classes for modern C++).
EDIT - hash tables versus balanced binary trees
This isn't particularly relevant to the question, but has received comments/interest below and I need more space to address it properly. Some points:
1) Bartek's casually advising to change from map to unordered_map without recommending an impact study re iterator/pointer invalidation is dangerous, and unwarranted given there's no reason to think it's needed (the question doesn't mention performance) and no recommendation to profile.
3) Relatively few data structures in a program are important to performance-critical behaviours, and there are plenty of times when the relative performance of one versus another is of insignificant interest. Supporting this claim - masses of code were written with std::map to ensure portability before C++11, and perform just fine.
4) When performance is a serious concern, the advice should be "Care => profile", but saying that a rule of thumb is ok - in line with "Don't pessimise prematurely" (see e.g. Sutter and Alexandrescu's C++ Coding Standards) - and if asked for one here I'd happily recommend unordered_map by default - but that's not particularly reliable. That's a world away from recommending every std::map usage I see be changed.
5) This container performance side-track has started to pull in ad-hoc snippets of useful insight, but is far from being comprehensive or balanced. This question is not a sane venue for such a discussion. If there's another question addressing this where it makes sense to continue this discussion and someone asks me to chip in, I'll do it sometime over the next month or two.
You could consider having a plain std::list holding all your data, and then various std::map objects mapping arbitrary key values to iterators pointing into the list:
std::list<Data> values;
std::map<C1, std::list<Data>::iterator> byC1;
std::map<C2, std::list<Data>::iterator> byC2;
I.e. instead of fiddling with more-or-less-raw pointers, you use plain iterators. And iterators into a std::list have very good invalidation guarantees.
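A sketch of how insertion and removal might be kept consistent across the three containers. ListStore and the key types are hypothetical, and the erase helper assumes the caller knows both keys of the element.

```cpp
#include <list>
#include <map>
#include <string>

struct Data { int value; };  // placeholder payload
typedef std::string C1;      // placeholder key types
typedef int C2;

struct ListStore {
    std::list<Data> values;
    std::map<C1, std::list<Data>::iterator> byC1;
    std::map<C2, std::list<Data>::iterator> byC2;

    void insert(const C1& k1, const C2& k2, const Data& d) {
        std::list<Data>::iterator it = values.insert(values.end(), d);
        byC1[k1] = it;
        byC2[k2] = it;
    }

    // Erases the element and both index entries; returns false if k1 is unknown.
    // Simplification: k2 must be the second key of the same element.
    bool erase(const C1& k1, const C2& k2) {
        std::map<C1, std::list<Data>::iterator>::iterator a = byC1.find(k1);
        if (a == byC1.end()) return false;
        values.erase(a->second);  // iterators to other list elements stay valid
        byC1.erase(a);
        byC2.erase(k2);
        return true;
    }
};
```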
I had the same problem. At first, holding two maps of shared pointers sounded very cool, but you still need to manage these two maps (inserting, removing, etc.).
Then I came up with another way of doing this.
My use case was accessing data by either x-y or radius-angle: each point holds data, but the point can be described either in Cartesian (x, y) or in polar (radius, angle) coordinates.
So I wrote a struct like this:
struct MyPoint
{
std::pair<int, int> cartesianPoint;
std::pair<int, int> radianPoint;
bool operator== (const MyPoint& rhs) const
{
return cartesianPoint == rhs.cartesianPoint || radianPoint == rhs.radianPoint;
}
};
After that I could use it as a key (std::unordered_map additionally requires a hash function for MyPoint, which is not shown here):
std::unordered_map<MyPoint, DataType> myMultIndexMap;
I am not sure whether your case is the same or can be adapted to this scenario, but it may be an option.

C++ index-to-index map

I have a list of IDs (integers).
They are sorted in a really efficient way so that my application can easily handle them, for example
9382
297832
92
83723
173934
(this sort is really important in my application).
Now I am facing the problem of having to access certain values of an ID in another vector.
For example certain values for ID 9382 are located on someVectorB[30].
I have been using
const int UNITS_MAX_SIZE = 400000;
class clsUnitsUnitIDToArrayIndex : public CBaseStructure
{
private:
int m_content[UNITS_MAX_SIZE];
long m_size;
protected:
void ProcessTxtLine(string line);
public:
clsUnitsUnitIDToArrayIndex();
int *Content();
long Size();
};
But now that I raised UNITS_MAX_SIZE to 400.000, I get page stack errors, and that tells me that I am doing something wrong. I think the entire approach is not really good.
What should I use if I want to locate an ID in a different vector if the "position" is different?
ps: I am looking for something simple that can be easily read-in from a file and that can also easily be serialized to a file. That is why I have been using this brute-force approach before.
If you want a mapping from ints to ints and your index numbers are non-consecutive, you should consider a std::map. In this case you would define it like this:
std::map<int, int> m_idLocations;
A map represents a mapping between two types. The first type is the "key" and is used for looking up the second type, known as the "value". For each id you can insert its position with:
m_idLocations[id] = position;
// or
m_idLocations.insert(std::pair<int,int>(id, position));
And you can look them up using the following syntax:
m_idLocations[id];
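One detail worth knowing is that operator[] inserts a zero-valued entry when the id is not present. A lookup that leaves the map unchanged can be sketched with find(); the helper name position_of and the -1 "not found" value are assumptions made for this example.

```cpp
#include <map>

// Returns the stored position for id, or -1 if the id is unknown.
// find() is used because operator[] would insert a zero entry on a miss.
inline int position_of(const std::map<int, int>& idLocations, int id) {
    std::map<int, int>::const_iterator it = idLocations.find(id);
    return it == idLocations.end() ? -1 : it->second;
}
```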
Typically a std::map in the STL is implemented using a red-black tree, which has a worst-case lookup time of O(log n). This is slightly slower than the O(1) you get from the huge array; however, it is a substantially better use of space, and you are unlikely to notice the difference in practice unless you are storing truly gigantic amounts of numbers or doing an enormous number of lookups.
Edit:
In response to some of the comments, I think it's important to point out that moving from O(1) to O(log n) can make a significant difference to the speed of your application, not to mention the practical speed concerns of moving from a fixed block of memory to a tree-based structure. However, I think it's important to first represent what you're trying to express (an int-to-int mapping) and to avoid the pitfall of premature optimization.
After you've represented the concept you should then use a profiler to determine if and where the speed issues are. If you find that the map is causing issues then you should look at replacing your mapping with something that you think will be quicker. Make sure to test that the optimization helped and don't forget to include a big comment about what you are representing and why it needed to be changed.
If nothing else works, you can allocate the array dynamically in the constructor. This moves the large array to the heap and avoids your stack error. Remember to release it in the destructor of clsUnitsUnitIDToArrayIndex.
But the recommended approach, as others have suggested, is to use a std::vector or std::map.
You are probably getting a stack overflow because of int m_content[UNITS_MAX_SIZE]. The array is part of the object, so if the object is created on the stack, the array is too, and 400000 ints is a lot for the stack. You can use std::vector instead; it allocates its elements dynamically, and you can return a const reference to the vector member to avoid a copy:
std::vector<int> m_content; // class member; size it in the constructor's initializer list: m_content(UNITS_MAX_SIZE)
const std::vector<int> &clsUnitsUnitIDToArrayIndex::Content() const
{
return m_content;
}
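Put together, a simplified stand-in for the class might look like the sketch below. The base class and the file parsing are omitted, and the name UnitIdToIndex, the Set method and the -1 "no entry" marker are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

const std::size_t UNITS_MAX_SIZE = 400000;

// Simplified stand-in for clsUnitsUnitIDToArrayIndex.
class UnitIdToIndex {
public:
    // -1 marks "no entry" (an assumption of this sketch).
    UnitIdToIndex() : m_content(UNITS_MAX_SIZE, -1) {}

    // at() throws std::out_of_range for ids beyond UNITS_MAX_SIZE.
    void Set(std::size_t id, int index) { m_content.at(id) = index; }

    const std::vector<int>& Content() const { return m_content; }
    std::size_t Size() const { return m_content.size(); }

private:
    std::vector<int> m_content;  // elements live on the heap, not inside the object
};
```

The object itself stays small, so creating it as a local variable no longer puts 400000 ints on the stack.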

Choosing specific objects satisfying conditions

Let's say I have objects which look very roughly like this:
class object
{
public:
// ctors etc.
bool has_property_X() const { ... }
std::size_t size() const { ... }
private:
// a little something here, but not really much
};
I'm storing these objects inside a vector and the vector is rather small (say, at most around 1000 elements). Then, inside a performance critical algorithm, I would like to choose the object that both has the property X and has the least size (in case there are multiple such objects, choose any of them). I need to do this "choosing" multiple times, and both the holding of the property X and the size may vary in between the choices, so that the objects are in a way dynamic here. Both queries (property, size) can be made in constant time.
How would I best achieve this? Performance is profiled to be important here. My ideas at the moment:
1) Use std::min_element with a suitable predicate. This would probably also need boost::filter_iterator or something similar to iterate over objects satisfying property X?
2) Use some data structure, such as a priority queue. I would store pointers or reference_wrappers to the objects and so forth. This, at least to me, feels slow, and it's probably not even feasible because of the dynamic nature of the objects.
Any other suggestions or comments on these thoughts? Should I just go ahead and try any or both of these schemes and profile?
Your last choice is always a good one. Our intuitions about how code will run are often wrong. So where possible profiling is always useful on critical code.
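As a baseline to profile against, option 1 from the question can be written as a single pass without boost::filter_iterator. This is only a sketch: the object class below is a minimal stand-in with assumed members, and smallest_with_X is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for the question's class; the data members are assumptions.
class object {
public:
    object(bool x, std::size_t n) : x_(x), n_(n) {}
    bool has_property_X() const { return x_; }
    std::size_t size() const { return n_; }
private:
    bool x_;
    std::size_t n_;
};

// Returns a pointer to a smallest object that has property X, or nullptr if
// no object has it. On ties, the first such object encountered is returned.
inline const object* smallest_with_X(const std::vector<object>& objs) {
    const object* best = nullptr;
    for (std::size_t i = 0; i < objs.size(); ++i) {
        const object& o = objs[i];
        if (o.has_property_X() && (!best || o.size() < best->size())) {
            best = &o;
        }
    }
    return best;
}
```

For around 1000 contiguous elements this linear scan touches memory sequentially, which is the kind of baseline any more elaborate structure would need to beat in the profiler.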