perform a binary search on map elements - c++

I have a map<double, unique_ptr<Item>>. I would like to search this map to find the item whose computed value is closest to a search value. The computed value is produced by Item::compute, which is a lengthy computation that I would like to avoid running for every element. It can be assumed that the map is already ordered according to the results of the compute function.
So I thought that I could do a binary search, but the problem is that I cannot really jump to the nth element in the map, since it is a map and not a vector. More specifically, I would need to get the middle item between two arbitrary items in the map. Is that possible? Is there an efficient way to perform a binary search within a map?

Use either the lower_bound() or the upper_bound() std::map methods. lower_bound() returns the first element whose key is not less than the search key, and upper_bound() returns the first element whose key is greater than it; comparing that element with its predecessor gives the existing key nearest to the search key when there is no exact match. You don't need to code the binary search yourself, these methods do it for you.
Although using double as a map key is problematic, of course, I guess that using lower_bound() or upper_bound() might produce reasonable results in this use case.
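A minimal sketch of the lower_bound() approach, assuming the map key is the computed value itself (or at least orders the items the same way). Item here is a hypothetical stand-in for the question's type, and find_nearest is a name made up for illustration:

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <memory>

// Hypothetical stand-in for the question's Item type.
struct Item {
    int id;
};

using ItemMap = std::map<double, std::unique_ptr<Item>>;

// Returns an iterator to the entry whose key is closest to `target`,
// or m.end() if the map is empty. On a tie the smaller key wins.
ItemMap::const_iterator find_nearest(const ItemMap& m, double target) {
    if (m.empty()) return m.end();
    auto hi = m.lower_bound(target);   // first key >= target
    if (hi == m.begin()) return hi;    // target is not above any key
    auto lo = std::prev(hi);           // last key < target
    if (hi == m.end()) return lo;      // target is above every key
    // Invariant guaranteed by lower_bound at this point.
    assert(lo->first < target && target <= hi->first);
    return (target - lo->first <= hi->first - target) ? lo : hi;
}
```

Both lower_bound() and the single step back with std::prev are cheap, so the whole lookup stays logarithmic and never touches Item::compute.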

Related

How to only make searching in std::map using a custom comparator?

I want the std::map to use the comparator only while searching, e.g. the rest operations including the operation of insertion one must use the default one. Is it possible?
I want the std::map to use the comparator only while searching ... Is it possible?
Well, you can do linear search on the map using whatever comparator you want. But that won't be as fast as using the search tree structure that the map has, which is built using the comparator of the map.
I have a map whose keys are regular expressions (represented by strings). So when I want to find a value by key, the map must check whether the search string matches one of the map's regular expressions.
It seems that linear search is what you need indeed.
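A linear search of that kind could look roughly like the sketch below. It assumes a std::map<std::string, int> whose keys are valid std::regex patterns; find_by_pattern is a hypothetical helper name, and compiling each pattern on every call is only acceptable for small maps:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>

using PatternMap = std::map<std::string, int>;

// Returns the first entry (in the map's normal key order) whose key,
// interpreted as a regular expression, matches all of `text`.
// Returns m.end() when no pattern matches.
// Note: std::regex throws std::regex_error for an invalid pattern.
PatternMap::const_iterator find_by_pattern(const PatternMap& m,
                                           const std::string& text) {
    return std::find_if(
        m.begin(), m.end(),
        [&text](const std::pair<const std::string, int>& kv) {
            return std::regex_match(text, std::regex(kv.first));
        });
}
```

Insertion and ordinary find() still use the map's default string comparison; only this lookup path uses the regular expressions.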

Speeding up positional access to a std::map

I find myself in a situation where I have to access a std::map by position. Since std::advance(iter, position) is slow af, I want to add a second data structure to speed up this operation.
My idea: Maintain a vector of iterators, one for every key-value pair in the map. Then access the vector by position, vector[position]->second.
When erasing/inserting an element I obviously have to update the vector as well. But besides that, the iterator-preserving properties of std::map seem to be sufficient.
Question: Is this a good idea?
Alternative: Maintain a vector of map keys. Then access the vector by position and use the key to look up the value in the map, map[vector[position]]. Is this smarter?
If iteration through the map is your primary performance issue, then you should be using a flat_map (it only entered the standard with C++23 as std::flat_map, but Boost has a decent one). If you don't have one of those, just use a vector<pair> that you keep sorted using the same comparison function you would have used in your map.
You can use std::lower_bound as the equivalent function for being able to find a value by its key. You can also use the iterator returned from std::lower_bound as the position for doing a single-element insertion of a new element. Everything else works just like any other vector; simply keep it sorted and you'll be fine.
std::map search, removal, and insertion operations have logarithmic complexity. A sorted std::vector achieves the same complexity for search, but insertion and removal are linear because the elements after the affected position have to be shifted.
Don't use a map, but a vector. Keep it sorted by key. Binary search by key is logarithmic. Access by position is the fastest.
The only drawback is that inserting and removing must shift elements, and inserting may occasionally reallocate memory. You may test its performance and consider whether it's worth it.
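The sorted-vector idea from these answers can be sketched as follows. The int/std::string element types and the names flat_insert and flat_find are assumptions chosen for illustration, not part of any library:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;
using FlatMap = std::vector<Entry>;  // kept sorted by Entry::first

// Orders an entry against a bare key, for std::lower_bound.
inline bool key_less(const Entry& e, int key) { return e.first < key; }

// Inserts a new entry or overwrites the value of an existing key.
// The binary search is O(log n); the vector insert itself is O(n).
void flat_insert(FlatMap& v, int key, std::string value) {
    auto it = std::lower_bound(v.begin(), v.end(), key, key_less);
    if (it != v.end() && it->first == key) {
        it->second = std::move(value);
    } else {
        v.insert(it, Entry(key, std::move(value)));
    }
    assert(std::is_sorted(v.begin(), v.end()));  // invariant check
}

// Returns a pointer to the value stored under `key`, or nullptr.
const std::string* flat_find(const FlatMap& v, int key) {
    auto it = std::lower_bound(v.begin(), v.end(), key, key_less);
    return (it != v.end() && it->first == key) ? &it->second : nullptr;
}
```

Positional access is then plain indexing, for example v[position].second, which is what std::advance on a map iterator cannot do cheaply.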

How can I find out if a map contains a given value?

I'd like to find out whether a given value is present in a map. Also getting the corresponding key(s) would be nice but is not required.
bool map::contains(string value);
Is there a simple way to do this other than iterating over the whole map and comparing each value with the given value? Why is there no corresponding method in the STL?
std::map only indexes its elements by key; it does not index them by value. Therefore, there is no way to look up an element by its value without iterating over the map.
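For reference, the iteration itself is short. This is a sketch assuming a std::map<int, std::string>; keys_for_value is a made-up helper name that also covers the "corresponding key(s)" part of the question:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collects every key whose mapped value equals `value`.
// This is a linear scan over the whole map, O(n).
std::vector<int> keys_for_value(const std::map<int, std::string>& m,
                                const std::string& value) {
    std::vector<int> keys;
    for (const auto& kv : m) {
        if (kv.second == value) keys.push_back(kv.first);
    }
    return keys;  // empty when the value is not present
}
```

A "contains" check is then just !keys_for_value(m, value).empty(), although a version that stops at the first hit would be cheaper if the keys are not needed.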
Take a look at Boost.Bimap:
Boost.Bimap is a bidirectional maps library for C++. With Boost.Bimap you can create associative containers in which both types can be used as key. A bimap<X,Y> can be thought of as a combination of a std::map<X,Y> and a std::map<Y,X>.
Using it is pretty straightforward, although you will of course need to consider the question of whether duplicate values are allowed.
Also, see Is there a Boost.Bimap alternative in c++11?

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data is inserted successfully (meaning it was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
To replicate Java's LinkedHashSet in C++, I think you will need two plain std::map objects. (Note that this gives you a LinkedTreeSet rather than a true LinkedHashSet, so insert and delete are O(log n).)
The first uses the actual value as key and the insertion order (usually an int or long int) as value.
The second is the reverse: it uses the insertion order as key and the actual value as value.
When inserting, use std::map::find on the first std::map to check that no identical object already exists in it.
If one already exists, ignore the new one.
If not, map the object to the incremented insertion order in both of the maps mentioned above.
To iterate in insertion order, iterate through the second std::map, since it is sorted by insertion order (anything stored in a std::map or std::set is sorted automatically).
To remove an element, use std::map::find on the first map to get its insertion order. Use that insertion order to remove the element from the second std::map, then remove the object from the first one.
Note that this solution is not perfect. If you plan to use it long term, you will need to "compact" the insertion order after a certain number of removals, since you will eventually run out of insertion-order values (2^32 indexes for a 32-bit unsigned int or 2^64 indexes for a 64-bit unsigned long long int).
To do this, put all the "value" objects into a vector, clear both maps, and then re-insert the values from the vector into both maps. This procedure takes O(n log n) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, but you cannot replace the second one. The reason is that std::unordered_map indexes by hash code, so its iteration order does not follow the insertion order.
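The two-map scheme described in this answer might be sketched like this. The class name InsertionOrderedSet and the std::string element type are assumptions for illustration, and the "compaction" step is left out:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class InsertionOrderedSet {
public:
    // Returns false (and changes nothing) if the value is already present.
    bool insert(const std::string& value) {
        if (order_of_.find(value) != order_of_.end()) return false;
        order_of_[value] = next_;
        value_at_[next_] = value;
        ++next_;
        assert(order_of_.size() == value_at_.size());  // maps stay in sync
        return true;
    }

    // Looks up the insertion order in the first map, then erases from both.
    bool erase(const std::string& value) {
        auto it = order_of_.find(value);
        if (it == order_of_.end()) return false;
        value_at_.erase(it->second);
        order_of_.erase(it);
        return true;
    }

    // The second map is keyed by insertion order, so walking it
    // yields the values in the order they were inserted.
    std::vector<std::string> in_insertion_order() const {
        std::vector<std::string> out;
        for (const auto& kv : value_at_) out.push_back(kv.second);
        return out;
    }

private:
    std::map<std::string, std::uint64_t> order_of_;  // value -> order
    std::map<std::uint64_t, std::string> value_at_;  // order -> value
    std::uint64_t next_ = 0;                         // next order to assign
};
```

With a 64-bit counter the exhaustion problem mentioned above is mostly theoretical, but the counter still never shrinks unless the maps are rebuilt.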
You might wanna know that std::map gives you O(log n) lookup, not constant-time lookup. And using the std::tr1 unordered containers is a trade-off too, because they give up any ordering to get constant lookup time.
Try a Boost multi-index container if you want more freedom here.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.
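Without Boost, the std::unordered_map plus std::list combination mentioned above could be sketched as below. The class name LinkedHashSet and the std::string element type are assumptions; note that this simple version stores each value twice, once as the map key and once in the list:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

class LinkedHashSet {
public:
    LinkedHashSet() = default;
    // Copying would leave the copied map pointing into the original list.
    LinkedHashSet(const LinkedHashSet&) = delete;
    LinkedHashSet& operator=(const LinkedHashSet&) = delete;

    // Appends the value if it is new; returns false if already present.
    bool insert(const std::string& value) {
        if (index_.count(value) != 0) return false;
        order_.push_back(value);
        index_.emplace(value, std::prev(order_.end()));
        assert(index_.size() == order_.size());  // both stay in sync
        return true;
    }

    // Average O(1): the stored list iterator avoids searching the list.
    bool erase(const std::string& value) {
        auto it = index_.find(value);
        if (it == index_.end()) return false;
        order_.erase(it->second);
        index_.erase(it);
        return true;
    }

    bool contains(const std::string& value) const {
        return index_.count(value) != 0;
    }

    // Elements in insertion order.
    const std::list<std::string>& items() const { return order_; }

private:
    std::list<std::string> order_;
    std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

std::list is used rather than std::vector because list iterators stay valid when other elements are inserted or erased, which is what makes removal from the middle cheap.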

Why retrieving elements from CMap is not ordered

In my application, I have a CMap of CString values. After adding elements to the map, if I retrieve them somewhere else, I don't get them in the order of insertion. For example, when I retrieve the third element, I get the fifth. Is this the expected behavior of CMap? Why does this happen?
You asked for "why", so here goes:
A Map provides an efficient way to retrieve values by key. It does this by using a clever data structure that is faster for this than a list or an array would be (where you have to search through the whole list before you know whether an element is in there or not). There are trade-offs, such as increased memory usage, and the inability to do some other things (such as knowing in which order things were inserted).
There are two common ways to implement this:
a hashmap, which puts keys into buckets by hash value.
a treemap, which arranges keys into a binary tree, according to how they are sorted
You can iterate over maps, but it will be according to how they are stored internally, either in key order (treemap) or completely unpredictable (hashmap). Your CMap seems to be a hashmap.
Either way, insertion order is not preserved. If you want that, you need an extra data structure (such as a list).
How about reading the documentation for CMap?
http://msdn.microsoft.com/ru-ru/library/s897094z%28v=vs.71%29.aspx
It really is an unordered map. How do you retrieve the elements? With GetStartPosition and GetNextAssoc? Read the Remarks section here: http://msdn.microsoft.com/ru-ru/library/d82fyybt%28v=vs.71%29.aspx
Remarks
The iteration sequence is not predictable; therefore, the "first element in the map" has no special significance.
CMap is a dictionary collection class that maps unique keys to values. Once you have inserted a key-value pair (element) into the map, you can efficiently retrieve or delete the pair using the key to access it. You can also iterate over all the elements in the map.