I have created a vector which contains several map<>.
vector<map<key,value>*> v;
v.push_back(&map1);
// ...
v.push_back(&map2);
// ...
v.push_back(&map3);
At any point of time, if a value has to be retrieved, I iterate through the vector and find the key in every map element (i.e. v[0], v[1] etc.) until it's found. Is this the best way ? I am open for any suggestion. This is just an idea I have given, I am yet to implement this way (please show if any mistake).
Edit: It's not important, in which map the element is found. In multiple modules different maps are prepared. And they are added one by one as the code progresses. Whenever any key is searched, the result should be searched in all maps combined till that time.
Without more information on the purpose and use, it might be a little difficult to answer. For example, is it necessary to have multiple map objects? If not, then you could store all of the items in a single map and eliminate the vector altogether. This would be more efficient to do the lookups. If there are duplicate entries in the maps, then the key for each value could include the differentiating information that currently defines into which map the values are put.
If you need to know which submap the key was found in, try:
unordered_set<key, pair<mapid, value>>
This has much better complexity for searching.
If the keys do not overlap, i.e., are unique througout all maps, then I'd advice a set or unordered_set with a custom comparision functor, as this will help with the lookup. Or even extend the first map with the new maps, if profiling shows that is fast enough / faster.
If the keys are not unique, go with a multiset or unordered_multiset, again with a custom comparision functor.
You could also sort your vector manually and search it with a binary_search. In any case, I advice using a tree to store all maps.
It depends on how your maps are "independently created", but if it's an option, I'd make just one global map (or multimap) object and pass that to all your creators. If you have lots of small maps all over the place, you can just call insert on the global one to merge your maps into it.
That way you have only a single object in which to perform lookup, which is reasonably efficient (O(log n) for multimap, expected O(1) for unordered_multimap).
This also saves you from having to pass raw pointers to containers around and having to clean up!
Related
I am trying to sort a large collection of objects into a series of groups, which represent some kind of commonality between them.
There seems to be two ways I can go about this:
1) I can manage everything by hand, sorting out all the objects into a vector of vectors. However, this means that I have to iterate over all the upper level vectors every time I want to try and find an existing group for an ungrouped object. I imagine this will become very computationally expensive very quickly as the number of disjoint groups increases.
2) I can use the identifiers of each object that I'm using to classify them as a key for an std::map, where the value is a vector. At that point, all I have to do is iterate over all the input objects once, calling myMap[object.identifier].push_back(object) each time. The map will sort everything out into the appropriate vector, and then I can just iterate over the resulting values afterwards.
My question is...
Which method would be best to use? It seems like a vector of vectors would be faster initially, but it's going to slow down as more and more groups are created. AFAIK, std::map uses RB trees internally, which means that finding the appropriate vector to add the object to should be faster, but you're going to pay for that when the tree inevitably needs to be rebalanced.
The additional memory consumption from an std::map doesn't matter. I'm dealing with anywhere from 12000 to 80000 individual objects that need to be grouped together, and I expect there to be anywhere from 12000 to 20000 groups once everything is said and done.
Instead of using either of your mentioned approaches directly, I suggest you evaluate the use of std::unordered_map (docs here) for your use case. It uses maps with buckets and hashed values internally and has average constant complexity for search, insertion and removal.
For certain reasons, I have a linked list of objects, with the Object containing a string.
I might be required to search for a particular string, and in doing so, retrieve the object, based on that string.
The starting header of the list is the only input I have for the list.
Though the number of objects I have, is capped at 3000, and that's not so much, I still wondered if there was an efficient way to do this, instead of searching the objects one by one for a matching string.
The Objects in the list are not sorted in any way, and I cannot expect them to be sorted, and the entry point to the linked list is the only input I have.
So, could anyone tell me if there's an efficient way ( search algorithm perhaps ) to achieve this?
Separately for this kind of search, if required, what would be the sort of data structure recommended, assuming that this search be the most data intensive function of the object?
Thanks..
Use a std::map<std::string, YourObjectType>. You can still iterate all objects. But they are sorted by the string now.
If you might have multiple objects with the same string, use a multimap instead.
If you can't switch to any different structure/container, then there is no way to do this better than linear to size of list.
Having 3000 you would like to use a unordered map instead of a linked list, which will give you average O(1) lookup, insertion, and deletion time.
I am writing a program for numerical simulation by using std::map to store some key-value pairs. The map is used as storing the states evoluted during the simulation. The type of the key is a integer and the value of corresponds to the key tells how many copies are there for the same keys, i.e. std::map. For each step of the simulation, I need to calculate how many values are there for the same key, so I will check that by the following code
if (map[key]>0) {do something here with the number of copies}
However, I soon find that this code doesn't work because even there is no such key in the map, whenever you call the map[key], it will generate a placeholder for that key and set the value as zero; therefore, I always overcount the total number of keys by std::map.size(). I later change the code as follow to search the key instead
if (map.find(key)!=map.end()) {...}
So is it the only and fastest way to check if a key exists or not for a map? I am going to run the simulation for hundreds millions times and it will call above code very often to check the key. Will it be too slow to use map.find() instead? Thanks.
The find member function is probably the fastest way to find whether a key is already in the map. That said, if you don't need to iterate over items in the map in order, you might get better performance with an std::unordered_map instead.
In a std::map or hashtable (std::unordered_map), the find function is very fast, as fast as using the [] subscripting operator. In fact, it's faster when the element is not found, because it doesn't have to insert one.
I don't think there is much difference in speed for various ways to check for existence of key. On the other hand: if your keys are integers and range is known, you might just use the array.
BTW:
I got interested about the speed of simple array, vector, map and unordered map. I have written simple program, that does 100000000 container[n]++, where n is a random number in range of 0 to 10000. The results:
array: 1.27s
vector: 1.36s
unordered map: 2.6s
map: 11.6s
The overhead of loop + index calculation in this simple case is ~0.8s.
So it all depends on how much time is spent elsewhere. If it's considerably more (per 100000000 iterations) then it does not matter much what you use. But if it's not, it can be quite a difference.
you can use hash_map, it is the fastest data structures for your key-value type;
also you can use map,but it is slower than hash_map
I have to store information about contents in a lookup table such that it can be accessed very quickly.I might need to refer some of the elements in look up table recursively to get complete information about contents. What will be better data structure to use:
Map with one of parameter, which will be unique to all the entries in look up table, as key and rest of the information as value
Use static array for each unique entries and access them when needed according to key(which will be same as the one used in MAP).
I want my software to be robust as if we have any crash it will be catastrophic for my product.
It depends on the range of keys that you have.
Usually, when you say lookup table, you mean a smallish table which you can index directly ( O(1) ). As a dumb example, for a substitution cipher, you could have a char cipher[256] and simply index with the ASCII code of a character to get the substitution character. If the keys are complex objects or simply too many, you're probably stuck with a map.
You might also consider a hashtable (see unordered_map).
Reply:
If the key itself can be any 32-bit number, it wouldn't make sense to store a very sparse 4-billion element array.
If however your keys are themselves between say 0..10000, then you can have a 10000-element array containing pointers to your objects (or the objects themselves), with only 2000-5000 of your elements containing non-null pointers (or meaningful data, respectively). Access will be O(1).
If you can have large keys, then I'd probably go with the unordered_map. With a map of 5000 elements, you'd get O(log n) to mean around ~12 accesses, a hash table should be pretty much one or two accesses tops.
I'm not familiar with perfect hashes, so I can't advise about their implementation. If you do choose that, I'd be grateful for a link or two with ideas to keep in mind.
The lookup times in a std::map should be O=ln(n), with a linear search in a static array in the worst case O=n.
I'd strongly opt for a std::map even if it has a larger memory footprint (which should not matter, in the most cases).
Also you can make "maps of maps" or even deeper structures:
typedef std::map<MyKeyType, std::map<MyKeyType, MyValueType> > MyDoubleMapType;
I have a vector which contains a list of Hash maps in Clojure and I have an add-watch on this vector to see any changes made. Is there an easy way to do a diff on the changes made to the hash map, so that maybe I could get a list of just the changed entries in the hash?
Note: This follows on from some earlier posts I have had where I have tried to persist changes to a database for a data structure stored in a ref. I have realised that the easiest way to save state is simple to watch the ref for changes and then store those changes. My ideal solution would be if the add-watch was passed a changelist as well :)
You probably need to define "diff" a little more precisely. For example, does an insertion in the middle of the vector count as a single change or as a change of that element and all subsequent ones? Also are your vectors guaranteed to be the same length?
Having said that, the simple approach would be something like:
First check the length of the two vectors. If one is longer, then consider the extra elements as changes
Then compare all the other elements with the corresponding element in the other vector using not= (this workes with hashes and will be very fast in the common case that the elements haven't changed). Something to get you started: (map not= vector-1 vector-2)
You can then use the answer from stackoverflow.com/questions/3387155/difference-between-two-maps that pmf mentioned if you want to find out exactly how the two maps differ.