stl map performance? - c++

I am using map<MyStruct, I*> map1;. Apparently 9% of my total app time is spent in there. Specifically on one line of one of my major functions. The map isn't very big (<1k almost always, <20 is common).
Is there an alternative implementation i may want to use? I think i shouldn't write my own but i could if i thought it was a good idea.
Additional info: I always check before adding an element. If a key already exists I need to report a problem. Then, after a certain point, I will be using the map heavily for lookups and will not add any more elements.

First you need to understand what a map is and what the operations that you are doing represent. A std::map is a balanced binary tree, lookup will take O( log N ) operations, each of which is a comparison of the keys plus some extra that you can ignore in most cases (pointer management). Insertion takes roughly the same time to locate the point of insertion, plus allocation of the new node, the actual insertion into the tree and rebalancing. The complexity is again O( log N ) although the hidden constants are higher.
When you try to determine whether a key is in the map prior to insertion, you incur the cost of the lookup and, if it does not succeed, the same cost again to locate the point of insertion. You can avoid the extra cost by using std::map::insert, which returns a pair with an iterator and a bool telling you whether the insertion actually happened or the element was already there.
Beyond that, you need to understand how costly it is to compare your keys, which falls outside what the question shows (MyStruct could hold just one int or a thousand of them) and is something you need to take into account.
Finally, it might be the case that a map is not the most efficient data structure for your needs, and you might want to consider using either a std::unordered_map (hash table), which has expected constant time insertions (if the hash function is not horrible), or for small data sets even a plain ordered array (or std::vector) on which you can use binary search to locate the elements (this will reduce the number of allocations, at the cost of more expensive insertions, but if the held types are small enough it might be worth it).
As always with performance, measure and then try to understand where the time is being spent. Also note that 10% of the time spent in a particular function or data structure might be a lot or almost nothing at all, depending on what your application is. For example, if your application is just performing lookups and insertions into a data set, and that takes only 10% of the CPU, you have a lot to optimize everywhere else!

It will probably be quicker to just do the insert and check whether pair.second is false, which means the key already exists:
// key is a MyStruct and value is an I* (placeholder names)
if ( myMap.insert( std::make_pair( key, value ) ).second == false )
{
// report error: the key was already present
}
else
{
// inserted new value
}
... rather than doing a find call every time.

Instead of map you could try unordered_map which uses hash keys, instead of a tree, to find elements. This answer gives some hints when to prefer unordered_map over map.

It might be a long shot, but for small collections, sometimes the most critical factor is the cache performance.
Since std::map is typically implemented as a red-black tree, which is [AFAIK] not very cache-efficient, maybe implementing the map as a std::vector<pair<MyStruct,I*>> would be a good idea, using binary search there [instead of map look-ups]. At the very least it should be efficient once you only look things up [and stop inserting elements], since the std::vector is more likely to fit in cache than the map.
This factor [cpu-cache] is usually neglected and hidden as constant in the big O notation, but for large collections it might have major effect.
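A minimal sketch of the sorted-vector idea, assuming the two-phase usage described in the question (fill first, then only look up). MyStruct's single int field, the empty I type and the helper names freeze, lookup and hasDuplicateKeys are placeholders, not taken from the question:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the question's MyStruct and I.
struct MyStruct {
    int id;
    bool operator<(const MyStruct& other) const { return id < other.id; }
};
struct I {};

using Entry = std::pair<MyStruct, I*>;

// Call once after the last insertion, before the lookup-heavy phase.
inline void freeze(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// On a sorted vector, equal keys are adjacent, so duplicates can be reported here.
inline bool hasDuplicateKeys(const std::vector<Entry>& sorted) {
    return std::adjacent_find(sorted.begin(), sorted.end(),
               [](const Entry& a, const Entry& b) { return !(a.first < b.first); })
           != sorted.end();
}

// Binary search over contiguous storage; returns nullptr when the key is absent.
inline I* lookup(const std::vector<Entry>& sorted, const MyStruct& key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key,
        [](const Entry& e, const MyStruct& k) { return e.first < k; });
    if (it != sorted.end() && !(key < it->first)) return it->second;
    return nullptr;
}
```

Whether this beats std::map for a given MyStruct is something only a profile of the real application can tell.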

The way you are using the map, you're doing lookups on the basis of a MyStruct instance and depending on your particular implementation, the required comparison may or may not be costly.

Is there an alternative implementation i may want to use? I think i shouldn't write my own but i could if i thought it was a good idea.
If you understand the problem well enough, you should detail how your implementation will be superior.
Is map the proper structure? If so, then your standard library's implementation will likely be of good quality (well optimized).
Can MyStruct comparison be simplified?
Where is the problem -- resizing? lookup?
Have you minimized copy and assign costs for your structures?

As stated in the comments, without proper code there are few universal answers to give you. However, if MyStruct is really huge, copying it may be costly. Perhaps it makes sense to store pointers to MyStruct and implement your own compare mechanism:
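template <typename T> struct deref_cmp {
// take the pointers by const reference to avoid reference-count updates on every comparison
bool operator()(const std::shared_ptr<T>& lhs, const std::shared_ptr<T>& rhs) const {
return *lhs < *rhs;
}
};
std::map<std::shared_ptr<MyStruct>, I*, deref_cmp<MyStruct>> mymap;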
However, this is something you will have to profile. It might speed things up.
You would look up an element like this
template <typename T> struct NullDeleter {
void operator()(T const*) const {} // non-owning: never deletes the pointee
};
// needle being a non-const MyStruct
mymap.find(std::shared_ptr<MyStruct>(&needle, NullDeleter<MyStruct>()));
Needless to say, there is more potential to optimise.

Related

unordered_map to find indices of an array

I want to find indices of a set efficiently. I am using unordered_map and making the inverse map like this
std::unordered_map<int, int> myHash(size);
int i = 0;
for (auto it = someSet.begin(); it != someSet.end(); ++it)
{
myHash.insert({*it, i++});
}
It works but it is not efficient. I did this so anytime I need the indices I could access them O(1). Performance analysis is showing me that this part became hotspot of my code.
VTune tells me that new operator is my hotspot. I guess something is happening inside the unordered_map.
It seems to me that this case should be handled efficiently. I couldn't find a good way yet. Is there a better solution? a correct constructor?
Maybe I should pass more info to the constructor. I looked up the initialize list but it is not exactly what I want.
Update: Let me add some more information. The set is not that important; I save the set into a (sorted) array. Later I need to find the index of each value; the values are unique. I can do it in O(log n), but that is not fast enough, which is why I decided to use a hash. The size of the set (columns of a submatrix) doesn't change after this point.
It arises from a sparse matrix computation in which I need to find the indices of submatrices in a bigger matrix. Therefore the size and the pattern of the lookups depend on the input matrix. It works reasonably well on smaller problems. I could use a lookup table, but since I am planning to do this in parallel, a lookup table for each thread can be expensive. I have the exact size of the hash at the time of creation. I thought that passing it to the constructor would stop it from reallocating. I really don't understand why it is reallocating this much.
The problem is that std::unordered_map, typically implemented as an array of buckets holding linked lists of separately allocated nodes, is extremely cache-unfriendly and will perform especially poorly with small keys/values (like int,int in your case), not to mention requiring tons of allocations (one per inserted element).
As an alternative you can try a third-party hash map implementing open addressing with linear probing (a mouthful, but the underlying structure is simply a vector, i.e. much more cache-friendly). For example, Google's dense_hash_map or this: flat_hash_map. Both can be used as a drop-in replacement for unordered_map, and only additionally require to designate one int value as the "empty" key.
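To make "open addressing with linear probing" concrete, here is a minimal sketch of such a table for int keys. It is an illustration only, not the API of dense_hash_map or flat_hash_map: FlatIntMap and its members are hypothetical names, the capacity is fixed at construction (no growth, no erase), and the caller must never insert the designated empty key or more elements than announced.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Fixed-capacity int -> int table: one contiguous vector, linear probing.
class FlatIntMap {
public:
    // expected: maximum number of elements; emptyKey: a value never used as a real key.
    FlatIntMap(std::size_t expected, int emptyKey) : empty_(emptyKey) {
        std::size_t capacity = 16;
        while (capacity < expected * 2) capacity *= 2;  // power of two, load factor <= 0.5
        slots_.assign(capacity, Slot{emptyKey, 0});
    }

    // Inserts key -> value; returns false if the key was already present.
    bool insert(int key, int value) {
        const std::size_t i = probe(key);
        if (slots_[i].key != empty_) return false;
        slots_[i] = Slot{key, value};
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(int key) const {
        const Slot& slot = slots_[probe(key)];
        return slot.key == empty_ ? nullptr : &slot.value;
    }

private:
    struct Slot { int key; int value; };

    // Walks forward from the hashed position until it finds the key or a free slot.
    std::size_t probe(int key) const {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = std::hash<int>{}(key) & mask;
        while (slots_[i].key != empty_ && slots_[i].key != key) i = (i + 1) & mask;
        return i;
    }

    int empty_;
    std::vector<Slot> slots_;
};
```

All slots are allocated once in the constructor, so building the table performs no per-element allocation.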
std::unordered_map<int, int> is often implemented as if it was
std::vector<std::list<std::pair<int, int>>>
This causes an allocation and deallocation for each node, and with many general-purpose allocators each (de-)allocation may take a lock, which causes contention.
You can help it a bit by using emplace instead of insert, or you can jump into the fantastic new world of pmr allocators. If the creation and destruction of your pmr::unordered_map is single-threaded, you should be able to get a lot of extra performance out of it. See Jason Turner's C++ Weekly - Ep 222 - 3.5x Faster Standard Containers With PMR!; his example is a bit on the small side, but you can get the general idea.
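A rough sketch of the pmr idea applied to the question's inverse index, assuming a C++17 standard library that ships <memory_resource>. IndexLookup is a hypothetical wrapper, not something from the question or the video; it only shows how a std::pmr::unordered_map can draw its nodes from a std::pmr::monotonic_buffer_resource that is used by a single thread:

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_map>
#include <vector>

// Maps each value of a sorted, duplicate-free array to its index.
class IndexLookup {
public:
    explicit IndexLookup(const std::vector<int>& sortedUnique) : index_(&pool_) {
        index_.reserve(sortedUnique.size());  // size the bucket array once
        int i = 0;
        for (int value : sortedUnique) index_.emplace(value, i++);
    }

    // The map points into pool_, so copying is disabled.
    IndexLookup(const IndexLookup&) = delete;
    IndexLookup& operator=(const IndexLookup&) = delete;

    // Returns the index of value, or -1 if it is not in the array.
    int find(int value) const {
        const auto it = index_.find(value);
        return it == index_.end() ? -1 : it->second;
    }

private:
    // Declared before index_ so that it is constructed first and destroyed last.
    std::pmr::monotonic_buffer_resource pool_;
    std::pmr::unordered_map<int, int> index_;
};
```

The monotonic resource hands out node memory from larger chunks and releases everything at once on destruction, which suits a table that is built once and never shrinks. Whether it removes the hotspot has to be measured.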

C++ some questions on boost::unordered_map & boost::hash

I've only recently started delving into Boost and its containers, and I read a few articles on the web and on stackoverflow saying that a boost::unordered_map is the fastest performing container for big collections.
So, I have this class State, which must be unique in the container (no duplicates), and there will be millions if not billions of states in the container.
Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it.
In my case, the State describes sensorimotor information from a robot, so there can be an enormous number of states, and therefore fast lookup is of the utmost priority.
Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function.
So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value.
The operator== compares basically the state's hash values.
So:
Is std::size_t enough to cover billions of possible hash_function combinations? In order to avoid duplicate states I intend to use their hash_values.
When creating a state_map, should I use as key the State* or the hash value?
i.e: boost::unordered_map<State*,std::size_t> state_map;
Or
boost::unordered_map<std::size_t,State*> state_map;
Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value?
Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.
EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function.
I have followed Boost's documentation and implemented it, but I still cannot figure out how to use Boost's hash function with the unordered set.
This is a bit muddled.
What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).
You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.
size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.
What's crucial is that your hash function is good, i.e. has few collisions.
You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.
Using hash_combine is good.
Baby example:
struct State
{
inline bool operator==(const State &) const; // must compare the actual fields
/* Stuff */
};
namespace std
{
template <> struct hash<State>
{
inline std::size_t operator()(const State & s) const
{
/* your hash algorithm here; must return a std::size_t */
}
};
}
std::size_t Foo(const State & s) { /* some code */ }
int main()
{
std::unordered_set<State> states; // no extra data needed; uses std::hash<State>
// another hash function: name its type, then pass a bucket count and the function itself
std::unordered_set<State, std::size_t (*)(const State &)> other_states(16, Foo);
}
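A filled-in version of the same skeleton, as a sketch only. The two int fields of State are invented for illustration, and hashCombine is a local helper modelled on the formula commonly cited for boost::hash_combine, not the Boost function itself:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical State with two fields; the real class would hold the robot's data.
struct State {
    int leftMotor;
    int rightMotor;
    // Equality compares the fields themselves, never the hash values.
    bool operator==(const State& other) const {
        return leftMotor == other.leftMotor && rightMotor == other.rightMotor;
    }
};

// Mixes one more value into a running seed.
inline void hashCombine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std {
template <> struct hash<State> {
    std::size_t operator()(const State& s) const noexcept {
        std::size_t seed = 0;
        hashCombine(seed, std::hash<int>{}(s.leftMotor));
        hashCombine(seed, std::hash<int>{}(s.rightMotor));
        return seed;
    }
};
}  // namespace std

// Returns true if the state was new, false if an equal state was already stored.
inline bool addIfNew(std::unordered_set<State>& states, const State& s) {
    return states.insert(s).second;
}
```

The set calls the hash to pick a bucket and operator== to tell apart states that land in the same bucket, so two different states with equal hashes are still stored separately.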
An unordered_map is a hashtable. You don't store the hash; it is done internally as the storage and lookup method.
Given your requirements, an unordered_set might be more appropriate, since your object is the only item to store.
You are a little confused though -- the equality operator and hash function are not truly performance items, but required for nontrivial objects for the container to work correctly. A good hash function will distribute your nodes evenly across the buckets, and the equality operator will be used to remove any ambiguity about matches based on the hash function.
std::size_t is fine for the hash function. Remember that no hash is perfect; there will be collisions, and these collision items are stored in a linked list at that bucket position.
Thus, .find() will be O(1) in the optimal case and very close to O(1) in the average case (and O(N) in the worst case, but a decent hash function will avoid that.)
You don't mention your platform or architecture; at billions of entries you still might have to worry about out-of-memory situations depending on those and the size of your State object.
Forget about the hash; there is nothing (at least in your question) that suggests you have a meaningful key.
Let's take a step back and rephrase your actual performance goal:
you want to quickly validate that no duplicates ever exist for any of your State objects.
Comment if I need to add others.
Given that goal, and your comment, I would suggest you actually use an ordered set (std::set) rather than an unordered_map. Yes, lookup in the ordered set is a tree search costing O(log n), while the unordered container offers O(1) lookup on average.
However, the difference is that with this approach you need the ordered set ONLY to check that an equal state doesn't already exist when you are about to create a new one, that is, at State creation time.
In all the other lookups you don't need to look into the ordered set at all, because you already have the key, a State*, and the key gives access to the value through the dereference operator: *key
So with this approach you use the ordered set only as an index to verify States at creation time. In all other cases you access your State by dereferencing the pointer.
If all of the above wasn't enough to convince you, here is the final nail in the coffin of the idea of using a hash to quickly determine equality: a hash function has a small probability of collision, and as the number of states grows, the probability that some pair collides approaches certainty. So, depending on your fault tolerance, you are going to have to deal with state collisions (and from your question and the number of States you expect, it seems you will deal with a lot of them).
For this to work, you obviously need the compare predicate to test all the internal properties of your state (gyroscope, thrusters, accelerometers, proton rays, etc.).

Why is std::tr1::unordered_map slower than a homegrown hash map?

I wrote a basic program that takes strings and counts the incidences of unique ones by inserting them into a string->integer hash map.
I use std::tr1::unordered_map for the storage, templated for a custom hash function and a custom equality function. The key type is actually char* rather than the too-slow std::string.
I then changed the same code to use a very, very simple hash table (really an array of {key, value} structures indexed by hash) with a power-of-two size and linear probing for collisions. The program got 33% faster.
Given that when I was using tr1::unordered_map I presized the hash table so it never had to grow, and that I was using exactly the same hash and comparison routines, what is tr1::unordered_map doing that slows it down by 50% as compared to the most basic hash map imaginable?
Code for the hash map type I'm talking about as "simple" here:
typedef struct dataitem {
char* item;
size_t count;
} dataitem_t;
dataitem_t hashtable[HASHTABLE_SIZE] = {{NULL,0}}; // Start off with empty table
void insert(char* item) {
size_t hash = generate_hash(item) & HASHTABLE_SIZE_MASK; // Bitmasking effect is hash %= HASHTABLE_SIZE
size_t firsthash = hash; // Masked start bucket, so the wrap-around check below can match
while (true) {
if (hashtable[hash].item == NULL) { // Free bucket
hashtable[hash].item = item;
hashtable[hash].count = 1;
break;
}
if (strcmp(hashtable[hash].item, item) == 0) { // Not hash collision; same item
hashtable[hash].count += 1;
break;
}
hash = (hash + 1) & HASHTABLE_SIZE_MASK; // Hash collision. Move to next bucket (linear probing)
if (hash == firsthash) {
// Table is full. This does not happen because the presizing is correct.
exit(1);
}
}
}
I wish to extend @AProgrammer's answer.
Your hash map is simple because it is custom-tailored to your need. On the other hand, std::tr1::unordered_map has to fulfill a number of different tasks and do well in all cases. This requires a mean-performance approach in all cases, so it will never be excellent in any particular area.
Hash containers are very special in that there are many ways to implement them. You chose open addressing, while the standard forces a bucket approach on the implementors. Both have different trade-offs, and this is one reason why the standard, this time, actually enforced a particular implementation: so that performance does not change dramatically when switching from one library to another. Simply specifying Big-O complexity / amortized complexity would not have been enough here.
You say that you told the unordered_map the final number of elements, but did you change the load factor? Chaining is notoriously "bad" (because of the lack of memory locality) in case of collisions, and using a smaller load factor would favor spreading out your elements.
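A small sketch of lowering the load factor before presizing. It uses C++11 std::unordered_map with std::string keys as a stand-in for the question's std::tr1::unordered_map with char* keys; as far as I recall the TR1 container offers max_load_factor and rehash but not reserve, so treat the exact calls as an assumption. countWords is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Counts occurrences of each word in a table tuned to keep bucket chains short.
inline std::unordered_map<std::string, std::size_t>
countWords(const std::vector<std::string>& words, float maxLoad) {
    std::unordered_map<std::string, std::size_t> counts;
    counts.max_load_factor(maxLoad);  // e.g. 0.5f: at most one element per two buckets on average
    counts.reserve(words.size());     // bucket count is now sized for the lower load factor
    for (const std::string& word : words) ++counts[word];
    return counts;
}
```

The order matters: setting the load factor first lets reserve pick a bucket count that already honours it, so no rehash happens during the counting loop.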
Finally, to point out one difference: what happens when you resize your hash map ? By using chaining, the unordered_map does not move the elements in memory:
references to them are still valid (even though the iterators may be invalidated)
in case of big or complex objects, there is no invocation of copy constructors
This is in contrast with your simple implementation, which would incur O(N) copies (unless you use linear rehashing to spread out the work, but this is definitely not simple).
It seems, therefore, that the choice for unordered_map was to smooth the spikes, at the cost of a slower average insert.
There is something you can do though: provide a custom allocator. Write a specific allocator for your use case that allocates all its memory in one go (you know how many objects will be inserted, and can have the allocator report how much memory a node takes), then hands out the nodes in a stack-like fashion (a simple pointer increment). It should improve the performance (somewhat).
Your "homegrown hash map" is not a hash map at all, it's an intrusive hash set.
And that's the reason it's faster. Simple as that.
Well, actually intrusive hash set isn't exact either, but it's the closest match.
In general, comparing the speed of components not built to the same spec isn't fair.
Without knowing exactly what you have measured -- which mix of operations at which load factor with which mix of present/absent data -- it is difficult to explain where the difference comes from.
The TR1 implementation in g++ resolves collisions by chaining. This implies dynamic allocation. But this also gives better performance at high load levels.
Your "homegrown" hash map is faster [1] than std::tr1::unordered_map because, as you yourself said, your homegrown hash map is "simple" and it doesn't handle checking if the hash table is full, and possibly many other things that you're not checking before operating on it. That may be the reason why your hash map is faster than std::tr1::unordered_map.
Also, the performance of std::tr1::unordered_map is defined by the implementation, so different implementations will perform differently speed-wise. You can look at its implementation and compare it with yours; that is the first thing you can do, and I believe it will also answer your question to some extent.
1. I just assumed your claim to be correct, and based on it I said the above thing.

Fastest C++ map?

Correct me if I'm wrong, but std::map is an ordered map, thus each time I insert a value the map uses an algorithm to sort its items internally, which takes some time.
My application gets information regarding some items on a constant interval.
This app keeps a map which is defined like this:
::std::map<DWORD, myItem*>
At first all items are considered "new" to the app. An "Item" object is being allocated and added to this map, associating its id and a pointer to it.
When it's not a "new" item (just an update of this object), my app should find the object in the map using the given id and update it.
Most of the time I get updates.
My question is:
Is there any faster map implementation or should I keep using this one?
Am I better use unordered_map?
Am I better use unordered_map?
Possibly.
std::map provides consistent performance at O(log n) because it needs to be implemented as a balanced tree. But std::unordered_map will be implemented as a hash table, which might give you O(1) performance (good hash function and distribution of keys across hash buckets), but it could be O(n) (everything in one hash bucket, so it devolves to a list). One would normally expect something in between these extremes.
So you can have reasonable performance (O(log n)) all the time, or you need to ensure everything lines up to get good performance with a hash.
As with any such question: you need to measure before committing to one approach. Unless your datasets are large you might find there is no significant difference.
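One way to do that measurement is a small timing helper that works for either container. This is a sketch under assumptions: timeLookups and LookupTiming are hypothetical names, the key type is unsigned as a stand-in for DWORD, and a serious benchmark would use the application's real ids and repeat the run several times.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

struct LookupTiming {
    std::size_t hits;        // how many ids were found
    long long microseconds;  // wall-clock time spent in the lookup loop
};

// Times find() over the given ids; works for std::map and std::unordered_map alike.
template <class Map>
LookupTiming timeLookups(const Map& items, const std::vector<unsigned>& ids) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (unsigned id : ids) {
        if (items.find(id) != items.end()) ++hits;
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return LookupTiming{hits, static_cast<long long>(elapsed.count())};
}
```

Returning the hit count alongside the time keeps the compiler from discarding the loop and doubles as a check that both containers hold the same data.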
Important warning: Unless you have measured (and your question suggests that you haven't) that map performance substantially influences your application performance (large percentage of time is spent on searching and updating the map) don't bother with making it faster.
Stick to std::map (or std::unordered_map or any available hash_map implementation).
Speeding up your application by 1% probably will not be worth the effort.
Make it bug free instead.
Echoing Richard's answer: measure performance with different map implementation using your real classes and real data.
Some additional notes:
Understand the difference between expected cost (hash maps usually have it lower), worst-case cost (O(log n) for a balanced binary tree, but much higher for a hash map if an insert triggers reallocation of the hash array) and amortized cost (total cost divided by the number of operations or elements; depends on things like the ratio of new to existing elements). You need to find out which is more constraining in your case. For example, the reallocation of a hash map can be too much if you need to adhere to a very low latency limit.
Find out where the real bottleneck is. It might be that the cost of searching the map is insignificant compared to e.g. IO cost.
Try a more specialized map implementation. For example, a lot can be gained if you know something more about the map's keys. Authors of generic map implementations do not have such knowledge.
In your example (32-bit unsigned integer keys which strongly cluster, e.g. are assigned sequentially) you can use a radix-based approach. Very simple example (treat it as an illustration, not a ready-to-use recipe):
Item *sentinel[65536]; // sentinel page, initialized to NULLs.
Item **pages[65536]; // table of pages; each entry points to a page of 65536 Item* slots,
// initialized so every entry points to sentinel
Then search is as simple as:
Item *value = pages[index >> 16][index & 0xFFFF];
When you need to set new value:
if (pages[index >> 16] == sentinel) {
pages[index >> 16] = allocate_new_null_filled_page();
}
pages[index >> 16][index & 0xFFFF] = value;
Tweak your map implementation.
E.g. every hash_map likes to know approximate number of elements in advance. It helps avoid unnecessary reallocation of hash table and (possibly) rehashing of all keys.
With my specialized example above you certainly would try different page sizes, or three level version.
Common optimization is providing specialized memory allocator to avoid multiple allocations of small objects.
Whenever you insert or delete an item, the memory allocation/deallocation costs a lot. Instead you can use an allocator like this one: https://github.com/moya-lang/Allocator which, according to its author, makes std::map about twice as fast; I found the gain to be even larger, especially for other STL containers.

How can I increase the performance in a map lookup with key type std::string?

I'm using a std::map (VC++ implementation) and it's a little slow for lookups via the map's find method.
The key type is std::string.
Can I increase the performance of this std::map lookup via a custom key compare override for the map? For example, maybe std::string < compare doesn't take into consideration a simple string::size() compare before comparing its data?
Any other ideas to speed up the compare?
In my situation the map will always contain < 15 elements, but it is being queried non stop and performance is critical. Maybe there is a better data structure that I can use that would be faster?
Update: The map contains file paths.
Update2: The map's elements are changing often.
First, turn off all the profiling and DEBUG switches. These can slow down STL immensely.
If that's not it, part of the problem may be that your strings are identical for the first 80-90% of the string. This isn't bad for map, necessarily, but it is for string comparisons. If this is the case, your search can take much longer.
For example, in this code find() will likely result in a couple of string compares, but each will return after comparing the first character until "david", and then the first three characters will be checked. So at most, 5 characters will be checked per call.
map<string,int> names;
names["larry"] = 1;
names["david"] = 2;
names["juanita"] = 3;
map<string,int>::iterator iter = names.find("daniel");
On the other hand, in the following code, find() will likely check 135+ characters:
map<string,int> names;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/wilma"] = 1;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/fred"] = 2;
names["/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/barney"] = 3;
map<string,int>::iterator iter = names.find("/usr/local/lib/fancy-pants/share/etc/doc/foobar/longpath/yadda/yadda/betty");
That's because the string comparisons have to search deeper to find a match since the beginning of each string is the same.
Using size() in your comparison for equality won't help you much here since your data set is so small. A std::map is kept sorted so its elements can be searched with a binary search. Each call to find should result in less than 5 string comparisons for a miss, and an average of 2 comparisons for a hit. But it does depend on your data. If most of your path strings are of different lengths, then a size check like Motti describes could help a lot.
Something to consider when thinking of alternative algorithms is how many "hits" you get. Are most of your find() calls returning end() or a hit? If most of your find()s return end() (misses), then every search goes all the way down the tree (2 log n string compares).
Hash_map is a good idea; it should cut your search time in about half for hits; more for misses.
A custom algorithm may be called for because of the nature of path strings, especially if your data set has common ancestry like in the above code.
Another thing to consider is how you get your search strings. If you are reusing them, it may help to encode them into something that is easier to compare. If you use them once and discard them, then this encoding step is probably too expensive.
I used something like a Huffman coding tree once (a long time ago) to optimize string searches. A binary string search tree like that may be more efficient in some cases, but it's pretty expensive for small sets like yours.
Finally, look into alternative std::map implementations. I've heard bad things about some of VC's stl code performance. The DEBUG library in particular is bad about checking you on every call. StlPort used to be a good alternative, but I haven't tried it in a few years. I've always loved Boost too.
As Even said the operator used in a set is < not ==.
If you don't care about the order of the strings in your set you can pass the set a custom comparator that performs better than the regular less-than.
For example if a lot of your strings have similar prefixes (but they vary in length) you can sort by string length (since string.length is constant speed).
If you do so beware a common mistake:
struct comp {
bool operator()(const std::string& lhs, const std::string& rhs) const
{
if (lhs.length() < rhs.length())
return true;
return lhs < rhs;
}
};
This operator does not maintain a strict weak ordering, as it can treat two strings as each less than the other.
string a = "z";
string b = "aa";
Follow the logic and you'll see that comp(a, b) == true and comp(b, a) == true.
The correct implementation is:
struct comp {
bool operator()(const std::string& lhs, const std::string& rhs) const
{
if (lhs.length() != rhs.length())
return lhs.length() < rhs.length();
return lhs < rhs;
}
};
The first thing is to try using a hash_map if that's possible - you are right that the standard string compare doesn't first check for size (since it compares lexicographically), but writing your own map code is something you'd be better off avoiding. From your question it sounds like you do not need to iterate over ranges; in that case map doesn't have anything hash_map doesn't.
It also depends on what sort of keys you have in your map. Are they typically very long? Also what does "a little slow" mean? If you have not profiled the code it's quite possible that it's a different part taking time.
Update: Hmm, the bottleneck in your program is a map::find, but the map always has less than 15 elements. This makes me suspect that the profile was somehow misleading, because a find on a map this small should not be slow, at all. In fact, a map::find should be so fast, just the overhead of profiling could be more than the find call itself. I have to ask again, are you sure this is really the bottleneck in your program? You say the strings are paths, but you're not doing any sort of OS calls, file system access, disk access in this loop? Any of those should be orders of magnitude slower than a map::find on a small map. Really any way of getting a string should be slower than the map::find.
You can try to use a sorted vector (here's one sample); this may turn out to be faster (you'll have to profile it to make sure, of course).
Reasons to think it'll be faster:
Fewer memory allocations and deallocations (the vector will expand to the maximal size used and then reuse freed memory).
Binary find with random access should be faster than tree traversal (especially due to data locality).
Reasons to think it'll be slower:
Deletions and additions will mean moving strings around in memory; since string's swap is efficient and the data set is small, this may not be an issue.
std::map's comparator isn't std::equal_to, it's std::less. I'm not sure what the best way is to short-circuit a < comparison so that it would be faster than the built-in one.
If there are always < 15 elems, perhaps you could use a key besides std::string?
Motti has a good solution. However, I'm pretty sure that for your < 15 elements a map isn't the right way because its overhead will always be greater than that of a simple lookup table with an appropriate hashing scheme. In your case, it might even be enough to hash by length alone, and if that still produces collisions, use a linear search through all entries of the same length.
To establish if I'm right, a benchmark is of course required but I'm quite sure of its outcome.
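A sketch of what such a length-keyed lookup table could look like. SmallStringTable, its int value type and the bucket count of 32 are assumptions made for illustration; only the idea (pick a bucket from the string length, then search that bucket linearly) comes from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Tiny string -> int table that "hashes" a key by its length only.
class SmallStringTable {
public:
    // Inserts the key or overwrites the value already stored for it.
    void set(const std::string& key, int value) {
        auto& bucket = buckets_[key.size() % kBuckets];
        for (auto& entry : bucket) {
            if (entry.first == key) { entry.second = value; return; }
        }
        bucket.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const auto& bucket = buckets_[key.size() % kBuckets];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    static constexpr std::size_t kBuckets = 32;
    std::array<std::vector<std::pair<std::string, int>>, kBuckets> buckets_;
};
```

With fewer than 15 paths, most buckets hold zero or one entry, so a lookup usually costs a single string comparison; paths of equal length fall back to the linear scan.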
You might consider pre-computing a hash for a string, and saving that in your map. Doing so gives the advantage of hash compares instead of string compares during the search through the std::map tree.
class HashedString
{
unsigned m_hash;
std::string m_string;
public:
HashedString(const std::string& str)
: m_hash(HashString(str)) // HashString: any string hash function you supply
, m_string(str)
{}
// ... copy constructor and etc...
unsigned GetHash() const {return m_hash;}
const std::string& GetString() const {return m_string;}
};
This has the benefits of computing a hash of the string once, on construction. After this, you could implement a comparison function:
struct comp
{
    // const, so that std::map can call it on a const comparator object
    bool operator()(const HashedString& lhs, const HashedString& rhs) const
    {
        if(lhs.GetHash() < rhs.GetHash()) return true;
        if(lhs.GetHash() > rhs.GetHash()) return false;
        return lhs.GetString() < rhs.GetString();
    }
};
Since hashes are now computed on HashedString construction, they are stored that way in the std::map, and so the compare can happen very quickly (an integer compare) in an astronomically high percentage of the time, falling back on standard string compares when the hashes are equal.
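For completeness, here is a self-contained version that compiles as-is. The hash function is an assumption: I've plugged in 32-bit FNV-1a for HashString, but any reasonable string hash would do. The names HashedLess and PathMap, and the int mapped type, are also just placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumption: 32-bit FNV-1a stands in for the unspecified HashString().
inline std::uint32_t HashString(const std::string& s)
{
    std::uint32_t h = 2166136261u;      // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;                 // FNV prime
    }
    return h;
}

class HashedString
{
    std::uint32_t m_hash;
    std::string m_string;
public:
    HashedString(const std::string& str)
        : m_hash(HashString(str)), m_string(str) {}
    std::uint32_t GetHash() const { return m_hash; }
    const std::string& GetString() const { return m_string; }
};

// Orders by hash first; falls back to the string only when hashes tie.
struct HashedLess
{
    bool operator()(const HashedString& lhs, const HashedString& rhs) const
    {
        if (lhs.GetHash() != rhs.GetHash())
            return lhs.GetHash() < rhs.GetHash();
        return lhs.GetString() < rhs.GetString();
    }
};

// Hypothetical map type keyed by the pre-hashed strings.
typedef std::map<HashedString, int, HashedLess> PathMap;
```

Note that looking up a plain std::string means constructing a HashedString first, which hashes the whole string once. The saving comes from the comparisons inside the tree, and it is largest if you can keep HashedString keys around instead of rebuilding them for every lookup.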
Maybe you could reverse the strings prior to using them as keys in the map? That could help if the first few letters of each string are identical.
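A variation on the same idea that avoids actually reversing the strings: give the map a comparator that compares from the last character backwards. The name ReverseLess is made up for this sketch, and it only helps under the assumption that your keys differ near the end rather than near the beginning.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Compares two strings starting from their last characters, so keys with a
// long common prefix (e.g. paths in one directory) are told apart quickly.
struct ReverseLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.rbegin(), a.rend(),
                                            b.rbegin(), b.rend());
    }
};

// Hypothetical map type; the int mapped type is only a placeholder.
typedef std::map<std::string, int, ReverseLess> ReversedKeyMap;
```

The map's iteration order changes (it is sorted by reversed key), but find and insert behave as before.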
Here are some things you can consider:
0) Are you sure this is where the performance bottleneck is? Do you have results from Quantify, Cachegrind, gprof or something like that? Because lookups on such a small map should be fairly fast...
1) You can override the functor used to compare the keys in std::map<>, there is a second template parameter to do that. I doubt you can do much better than operator<, however.
2) Are the contents of the map changing a lot? If not, and given the very small size of your map, maybe using a sorted vector and binary search could yield better results (for example because you can exploit memory locality better).
3) Are the elements known at compile time? You could use a perfect hash function to improve lookup times if that is the case. Search for gperf on the web.
4) Do you have a lot of lookups that fail to find anything? If so, maybe comparing with the first and last elements in the collection may eliminate many mismatches quicker than a full search every time.
These have been suggested already, but in more detail:
5) Since you have so few strings, maybe you could use a different key. For example, are your keys all the same size? Can you use a class containing a fixed-length array of characters? Can you convert your strings to numbers or some data structure with only numbers?
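To illustrate point 4, here is a sketch of a lookup that rejects keys outside the map's range before doing the full search. The function name bounded_find is made up for this example. It costs up to two extra comparisons on every successful lookup, so it only pays off under the assumption that many lookups miss.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Returns m.end() right away when k sorts before the smallest key or after
// the largest key; otherwise falls back to the normal find.
template <class Map>
typename Map::const_iterator
bounded_find(const Map& m, const typename Map::key_type& k)
{
    if (m.empty())
        return m.end();
    typename Map::key_compare less = m.key_comp();
    if (less(k, m.begin()->first))              // below the first key
        return m.end();
    if (less(std::prev(m.end())->first, k))     // above the last key
        return m.end();
    return m.find(k);
}
```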
Depending on the use cases, there are some other techniques you can use. For example, we had an application that needed to keep track of over a million different file paths. The problem was that there were thousands of objects that each needed to keep small maps of these file paths.
Since adding new file paths to the data set was an infrequent operation, when a path was added to the system, a master map was searched. If the path was not found, then it was added and a new sequenced integer (starting at 1) was returned. If the path already existed, then the previously assigned integer was returned. Then each map maintained by each object was converted from a string-based map to an integer map. Not only did this greatly improve performance, it also reduced memory usage by not having so many duplicate copies of the strings.
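The master map could look something like this. This is a sketch from memory of the idea, not the code we actually had: the class name PathRegistry and the method name intern are invented for illustration, and there is no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical master map: hands out one small integer per distinct path.
class PathRegistry
{
    std::map<std::string, unsigned> ids_;
    unsigned next_id_;

public:
    PathRegistry() : next_id_(1) {}     // ids start at 1

    // Returns the id already assigned to path, or assigns the next one.
    unsigned intern(const std::string& path)
    {
        // emplace searches once; it inserts only if the path is new.
        std::pair<std::map<std::string, unsigned>::iterator, bool> r =
            ids_.emplace(path, next_id_);
        if (r.second)
            ++next_id_;                 // the id was consumed by a new path
        return r.first->second;
    }

    std::size_t size() const { return ids_.size(); }
};
```

The per-object maps then become something like std::map<unsigned, T>, so every comparison during a lookup is a single integer compare.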
Sure, this is a very specific optimization. But when it comes to performance improvements, you often find yourself having to make tailored solutions to specific problems.
And I hate strings :) Not only are they slow to compare, but they can really trash your CPU caches in high-performance software.
Try std::tr1::unordered_map (found in the header <tr1/unordered_map>). This is a hash map, and, while it doesn't maintain a sorted order of elements, will likely be far faster than a regular map.
If your compiler doesn't support TR1, get a newer version. MSVC and gcc both support TR1, and I believe the newest versions of most other compilers also have support. Unfortunately, a lot of the library reference sites haven't been updated, so TR1 remains a largely-unknown piece of technology.
I hope C++0x isn't the same way.
EDIT: Note that the default hashing method for tr1::unordered_map is tr1::hash, which needs to be specialized to work on a UDT, probably.
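Here is roughly what that looks like for a user-defined key. This sketch uses std::unordered_map from the <unordered_map> header, which is where the TR1 container ended up in C++11, and passes the hash as a template argument instead of specializing the default one. MyKey, MyKeyHash, MyKeyEqual and KeyTable are hypothetical names, and the two-int key is only a stand-in since the real MyStruct isn't shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical key type standing in for the real struct.
struct MyKey
{
    int a;
    int b;
};

// Equality is required by the hash table in addition to the hash.
struct MyKeyEqual
{
    bool operator()(const MyKey& l, const MyKey& r) const
    {
        return l.a == r.a && l.b == r.b;
    }
};

// Combines the member hashes with boost::hash_combine-style mixing.
struct MyKeyHash
{
    std::size_t operator()(const MyKey& k) const
    {
        std::size_t h = std::hash<int>()(k.a);
        h ^= std::hash<int>()(k.b) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

// Placeholder mapped type int; use whatever the map really stores.
typedef std::unordered_map<MyKey, int, MyKeyHash, MyKeyEqual> KeyTable;
```

As with std::map, insert and emplace return a pair whose bool tells you whether the key was already there, so the "check before adding" step needs no separate find.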
Where you have long common substrings, a trie might be a better data structure than a map or a hash_map. I said "might", though - a hash_map already only traverses the key once per lookup, so should be fairly fast. I won't discuss it further since others already have.
You could also consider a splay tree if some keys are more frequently looked up than others, but of course this makes the worst-case lookup worse than a balanced tree, and lookups are mutating operations, which may matter to you if you're using e.g. a reader-writer lock.
If you care about the performance of lookups more than modifications, you might do better with an AVL tree than a red-black tree, which I think is what STL implementations generally use for map. An AVL tree is typically better balanced and so will on average require fewer comparisons per lookup, but the difference is marginal.
Finding an implementation of these that you're happy with might be an issue. A search on the Boost main page suggests they have a splay and AVL tree but not a trie.
You mentioned in a comment that you never have a lookup that fails to find anything. So you could in theory skip the final comparison, which in a tree of 15 < 2^4 elements could give you something like a 20-25% speedup without doing anything else. In fact, maybe more than that, since equal strings are the slowest to compare. Whether it's worth writing your own container just for this optimisation is another question.
You might also consider locality of reference - I don't know whether you could avoid the occasional page miss by allocating the keys and the nodes out of a small heap. If you only need about 15 entries at a time, then assuming a file name limit below 256 bytes you could ensure that everything accessed during a lookup fits into a single 4k page (apart from the key being looked up, of course). It may be that comparing the strings is insignificant compared with a couple of page loads. However, if this is your bottleneck there must be an enormous number of lookups going on, so I'd guess that everything is reasonably close to the CPU. Worth checking, maybe.
Another thought: if you are using pessimistic locking on a structure where there's a lot of contention (you said in a comment the program is massively multi-threaded) then regardless of what the profiler tells you (what code the CPU cycles are spent in), it might be costing you more than you think by effectively limiting you to 1 core. Try a reader-writer lock?
hash_map is not standard, try using unordered_map available in tr1 (which is available in boost if your tool chain doesn't already have it).
For small numbers of strings you might be better using vector, as map is typically implemented as a tree.
Why don't you use a hash table instead? boost::unordered_map could do. Or you can roll your own solution and store the CRC of a string instead of the string itself. Or better yet, put #defines for the strings, and use those for lookup, e.g.,
#define STRING_1 "STRING_1"