I was using a map with a std::string key and while everything was working fine I wasn't getting the performance I expected. I searched for places to optimize and improved things only a little and that's when a colleague said, "that string key is going to be slow."
I read dozens of questions and they consistently say:
"don't use a char * as a key"
"std::string keys are never your bottleneck"
"the performance difference between a char * and a
std::string is a myth."
I reluctantly tried a char * key and there was a difference, a big difference.
I boiled the problem down to a simple example:
#include <stdio.h>
#include <stdlib.h>
#include <map>
#ifdef USE_STRING
#include <string>
typedef std::map<std::string, int> Map;
#else
#include <string.h>
struct char_cmp {
bool operator () (const char *a,const char *b) const
{
return strcmp(a,b)<0;
}
};
typedef std::map<const char *, int, char_cmp> Map;
#endif
Map m;
bool test(const char *s)
{
Map::iterator it = m.find(s);
return it != m.end();
}
int main(int argc, char *argv[])
{
m.insert( Map::value_type("hello", 42) );
const int lcount = atoi(argv[1]);
for (int i=0 ; i<lcount ; i++) test("hello");
}
First the std::string version:
$ g++ -O3 -o test test.cpp -DUSE_STRING
$ time ./test 20000000
real 0m1.893s
Next the 'char *' version:
g++ -O3 -o test test.cpp
$ time ./test 20000000
real 0m0.465s
That's a pretty big performance difference and about the same difference I see in my larger program.
Using a char * key is a pain to handle freeing the key and just doesn't feel right. C++ experts what am I missing? Any thoughts or suggestions?
You are using a const char * as a lookup key for find(). For the map containing const char* this is the correct type that find expects and the lookup can be done directly.
The map containing std::string expects the parameter of find() to be a std::string, so in this case the const char* first has to be converted to a std::string. This is probably the difference you are seeing.
As sth noted, the issue is one of specifications of the associative containers (sets and maps), in that their member search methods always force a conversion to the key_type, even if an operator< exists that would accept to compare your key against the keys in the map despite their different types.
On the other hand, the functions in <algorithm> do not suffer from this, for example lower_bound is defined as:
template< class ForwardIt, class T >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value );
template< class ForwardIt, class T, class Compare >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp );
So, an alternative could be:
std::vector< std::pair< std::string, int > >
And then you could do:
std::lower_bound(vec.begin(), vec.end(), std::make_pair("hello", 0), CompareFirst{})
Where CompareFirst is defined as:
struct CompareFirst {
template <typename T, typename U>
bool operator()(T const& t, U const& u) const { return t.first < u.first; }
};
Or even build a completely custom comparator (but it's a bit harder).
A vector of pair is generally more efficient in read-heavy loads, so it's really to store a configuration for example.
I do advise to provide methods to wrap the accesses. lower_bound is pretty low-level.
If your in C++ 11, the copy constructor is not called unless the string is changed. Because std::string is a C++ construct, at least 1 dereference is needed to get at the string data.
My guess would be the time is taken up in an extra dereference (which if done 10000 times is costly), and std::string is likely doing appropriate null pointer checks, which again eats up cycles.
Store the std::string as a pointer and then you lose the copy constructor overhead.
But after you have to remember to handle the deletes.
The reason std::string is slow is that is constructs itself. Calls the copy constructor, and then at the end calls delete. If you create the string on the heap you lose the copy construction.
After compilation the 2 "Hello" string literals will have the same memory address. On the char * case you use this memory addresses as keys.
In the string case every "Hello"s will be converted to a different object. This is a small part (really really small) of your performance difference.
A bigger part can be that as all the "Hello"s you are using has the same memory address strcmp will always get 2 equivalent char pointers and I'm quite sure that it early checks for this case :) So it will never really iterate on the all characters but the std::string comparison will.
One solution to this is use a custom key class that acts as a cross between a const char * and a std::string, but has a boolean to tell at run time if it is "owning" or "non-owning". That way you can insert a key into the map which owns it's data (and will free it on destruction), and then compare with a key that does not own it's data. (This is a similar concept to the rust Cow<'a, str> type).
The below example also inherits from boost's string_ref to avoid having to re-implement hash functions etc.
NOTE this has the dangerous effect that if you accidentally insert into the map with the non-owning version, and the string you are pointing at goes out of scope, the key will point at already freed memory. The non-owning version can only be used for lookups.
#include <iostream>
#include <map>
#include <cstring>
#include <boost/utility/string_ref.hpp>
class MaybeOwned: public boost::string_ref {
public:
// owning constructor, takes a std::string and copies the data
// deletes it's copy on destruction
MaybeOwned(const std::string& string):
boost::string_ref(
(char *)malloc(string.size() * sizeof(char)),
string.size()
),
owned(true)
{
memcpy((void *)data(), (void *)string.data(), string.size());
}
// non-owning constructor, takes a string ref and points to the same data
// does not delete it's data on destruction
MaybeOwned(boost::string_ref string):
boost::string_ref(string),
owned(false)
{
}
// non-owning constructor, takes a c string and points to the same data
// does not delete it's data on destruction
MaybeOwned(const char * string):
boost::string_ref(string),
owned(false)
{
}
// move constructor, tells source that it no longer owns the data if it did
// to avoid double free
MaybeOwned(MaybeOwned&& other):
boost::string_ref(other),
owned(other.owned)
{
other.owned = false;
}
// I was to lazy to write a proper copy constructor
// (it would need to malloc and memcpy again if it owned the data)
MaybeOwned(const MaybeOwned& other) = delete;
// free owned data if it has any
~MaybeOwned() {
if (owned) {
free((void *)data());
}
}
private:
bool owned;
};
int main()
{
std::map<MaybeOwned, std::string> map;
map.emplace(std::string("key"), "value");
map["key"] += " here";
std::cout << map["key"] << "\n";
}
Related
I am trying to figure out why the following code is not working, and I am assuming it is an issue with using char* as the key type, however I am not sure how I can resolve it or why it is occuring. All of the other functions I use (in the HL2 SDK) use char* so using std::string is going to cause a lot of unnecessary complications.
std::map<char*, int> g_PlayerNames;
int PlayerManager::CreateFakePlayer()
{
FakePlayer *player = new FakePlayer();
int index = g_FakePlayers.AddToTail(player);
bool foundName = false;
// Iterate through Player Names and find an Unused one
for(std::map<char*,int>::iterator it = g_PlayerNames.begin(); it != g_PlayerNames.end(); ++it)
{
if(it->second == NAME_AVAILABLE)
{
// We found an Available Name. Mark as Unavailable and move it to the end of the list
foundName = true;
g_FakePlayers.Element(index)->name = it->first;
g_PlayerNames.insert(std::pair<char*, int>(it->first, NAME_UNAVAILABLE));
g_PlayerNames.erase(it); // Remove name since we added it to the end of the list
break;
}
}
// If we can't find a usable name, just user 'player'
if(!foundName)
{
g_FakePlayers.Element(index)->name = "player";
}
g_FakePlayers.Element(index)->connectTime = time(NULL);
g_FakePlayers.Element(index)->score = 0;
return index;
}
You need to give a comparison functor to the map otherwise it's comparing the pointer, not the null-terminated string it points to. In general, this is the case anytime you want your map key to be a pointer.
For example:
struct cmp_str
{
bool operator()(char const *a, char const *b) const
{
return std::strcmp(a, b) < 0;
}
};
map<char *, int, cmp_str> BlahBlah;
You can't use char* unless you are absolutely 100% sure you are going to access the map with the exact same pointers, not strings.
Example:
char *s1; // pointing to a string "hello" stored memory location #12
char *s2; // pointing to a string "hello" stored memory location #20
If you access map with s1 you will get a different location than accessing it with s2.
Two C-style strings can have equal contents but be at different addresses. And that map compares the pointers, not the contents.
The cost of converting to std::map<std::string, int> may not be as much as you think.
But if you really do need to use const char* as map keys, try:
#include <functional>
#include <cstring>
struct StrCompare : public std::binary_function<const char*, const char*, bool> {
public:
bool operator() (const char* str1, const char* str2) const
{ return std::strcmp(str1, str2) < 0; }
};
typedef std::map<const char*, int, StrCompare> NameMap;
NameMap g_PlayerNames;
You can get it working with std::map<const char*, int>, but must not use non-const pointers (note the added const for the key), because you must not change those strings while the map refers to them as keys. (While a map protects its keys by making them const, this would only constify the pointer, not the string it points to.)
But why don't you simply use std::map<std::string, int>? It works out of the box without headaches.
You are comparing using a char * to using a string. They are not the same.
A char * is a pointer to a char. Ultimately, it is an integer type whose value is interpreted as a valid address for a char.
A string is a string.
The container works correctly, but as a container for pairs in which the key is a char * and the value is an int.
As the others say, you should probably use std::string instead of a char* in this case although there is nothing wrong in principle with a pointer as a key if that's what is really required.
I think another reason this code isn't working is because once you find an available entry in the map you attempt to reinsert it into the map with the same key (the char*). Since that key already exists in your map, the insert will fail. The standard for map::insert() defines this behaviour...if the key value exists the insert fails and the mapped value remains unchanged. Then it gets deleted anyway. You'd need to delete it first and then reinsert.
Even if you change the char* to a std::string this problem will remain.
I know this thread is quite old and you've fixed it all by now but I didn't see anyone making this point so for the sake of future viewers I'm answering.
Had a hard time using the char* as the map key when I try to find element in multiple source files. It works fine when all the accessing/finding within the same source file where the elements are inserted. However, when I try to access the element using find in another file, I am not able to get the element which is definitely inside the map.
It turns out the reason is as Plabo pointed out, the pointers (every compilation unit has its own constant char*) are NOT the same at all when it is accessed in another cpp file.
std::map<char*,int> will use the default std::less<char*,int> to compare char* keys, which will do a pointer comparison. But you can specify your own Compare class like this:
class StringPtrCmp {
public:
StringPtrCmp() {}
bool operator()(const char *str1, const char *str2) const {
if (str1 == str2)
return false; // same pointer so "not less"
else
return (strcmp(str1, str2) < 0); //string compare: str1<str2 ?
}
};
std::map<char*, YourType, StringPtrCmp> myMap;
Bear in mind that you have to make sure that the char* pointer are valid.
I would advice to use std::map<std::string, int> anyway.
You can bind a lambda that does the same job.
#include <map>
#include <functional>
class a{
std::map < const char*, Property*, std::function<bool(const char* a, const char* b)> > propertyStore{
std::bind([](const char* a, const char* b) {return std::strcmp(a,b) < 0;},std::placeholders::_1,std::placeholders::_2)
};
};
There's no problem to use any key type as long as it supports comparison (<, >, ==) and assignment.
One point that should be mentioned - take into account that you're using a template class. As the result compiler will generate two different instantiations for char* and int*. Whereas the actual code of both will be virtually identical.
Hence - I'd consider using a void* as a key type, and then casting as necessary.
This is my opinion.
I am trying to figure out why the following code is not working, and I am assuming it is an issue with using char* as the key type, however I am not sure how I can resolve it or why it is occuring. All of the other functions I use (in the HL2 SDK) use char* so using std::string is going to cause a lot of unnecessary complications.
std::map<char*, int> g_PlayerNames;
int PlayerManager::CreateFakePlayer()
{
FakePlayer *player = new FakePlayer();
int index = g_FakePlayers.AddToTail(player);
bool foundName = false;
// Iterate through Player Names and find an Unused one
for(std::map<char*,int>::iterator it = g_PlayerNames.begin(); it != g_PlayerNames.end(); ++it)
{
if(it->second == NAME_AVAILABLE)
{
// We found an Available Name. Mark as Unavailable and move it to the end of the list
foundName = true;
g_FakePlayers.Element(index)->name = it->first;
g_PlayerNames.insert(std::pair<char*, int>(it->first, NAME_UNAVAILABLE));
g_PlayerNames.erase(it); // Remove name since we added it to the end of the list
break;
}
}
// If we can't find a usable name, just user 'player'
if(!foundName)
{
g_FakePlayers.Element(index)->name = "player";
}
g_FakePlayers.Element(index)->connectTime = time(NULL);
g_FakePlayers.Element(index)->score = 0;
return index;
}
You need to give a comparison functor to the map otherwise it's comparing the pointer, not the null-terminated string it points to. In general, this is the case anytime you want your map key to be a pointer.
For example:
struct cmp_str
{
bool operator()(char const *a, char const *b) const
{
return std::strcmp(a, b) < 0;
}
};
map<char *, int, cmp_str> BlahBlah;
You can't use char* unless you are absolutely 100% sure you are going to access the map with the exact same pointers, not strings.
Example:
char *s1; // pointing to a string "hello" stored memory location #12
char *s2; // pointing to a string "hello" stored memory location #20
If you access map with s1 you will get a different location than accessing it with s2.
Two C-style strings can have equal contents but be at different addresses. And that map compares the pointers, not the contents.
The cost of converting to std::map<std::string, int> may not be as much as you think.
But if you really do need to use const char* as map keys, try:
#include <functional>
#include <cstring>
struct StrCompare : public std::binary_function<const char*, const char*, bool> {
public:
bool operator() (const char* str1, const char* str2) const
{ return std::strcmp(str1, str2) < 0; }
};
typedef std::map<const char*, int, StrCompare> NameMap;
NameMap g_PlayerNames;
You can get it working with std::map<const char*, int>, but must not use non-const pointers (note the added const for the key), because you must not change those strings while the map refers to them as keys. (While a map protects its keys by making them const, this would only constify the pointer, not the string it points to.)
But why don't you simply use std::map<std::string, int>? It works out of the box without headaches.
You are comparing using a char * to using a string. They are not the same.
A char * is a pointer to a char. Ultimately, it is an integer type whose value is interpreted as a valid address for a char.
A string is a string.
The container works correctly, but as a container for pairs in which the key is a char * and the value is an int.
As the others say, you should probably use std::string instead of a char* in this case although there is nothing wrong in principle with a pointer as a key if that's what is really required.
I think another reason this code isn't working is because once you find an available entry in the map you attempt to reinsert it into the map with the same key (the char*). Since that key already exists in your map, the insert will fail. The standard for map::insert() defines this behaviour...if the key value exists the insert fails and the mapped value remains unchanged. Then it gets deleted anyway. You'd need to delete it first and then reinsert.
Even if you change the char* to a std::string this problem will remain.
I know this thread is quite old and you've fixed it all by now but I didn't see anyone making this point so for the sake of future viewers I'm answering.
Had a hard time using the char* as the map key when I try to find element in multiple source files. It works fine when all the accessing/finding within the same source file where the elements are inserted. However, when I try to access the element using find in another file, I am not able to get the element which is definitely inside the map.
It turns out the reason is as Plabo pointed out, the pointers (every compilation unit has its own constant char*) are NOT the same at all when it is accessed in another cpp file.
std::map<char*,int> will use the default std::less<char*,int> to compare char* keys, which will do a pointer comparison. But you can specify your own Compare class like this:
class StringPtrCmp {
public:
StringPtrCmp() {}
bool operator()(const char *str1, const char *str2) const {
if (str1 == str2)
return false; // same pointer so "not less"
else
return (strcmp(str1, str2) < 0); //string compare: str1<str2 ?
}
};
std::map<char*, YourType, StringPtrCmp> myMap;
Bear in mind that you have to make sure that the char* pointer are valid.
I would advice to use std::map<std::string, int> anyway.
You can bind a lambda that does the same job.
#include <map>
#include <functional>
class a{
std::map < const char*, Property*, std::function<bool(const char* a, const char* b)> > propertyStore{
std::bind([](const char* a, const char* b) {return std::strcmp(a,b) < 0;},std::placeholders::_1,std::placeholders::_2)
};
};
There's no problem to use any key type as long as it supports comparison (<, >, ==) and assignment.
One point that should be mentioned - take into account that you're using a template class. As the result compiler will generate two different instantiations for char* and int*. Whereas the actual code of both will be virtually identical.
Hence - I'd consider using a void* as a key type, and then casting as necessary.
This is my opinion.
This question already has answers here:
Avoiding key construction for std::map::find()
(4 answers)
Closed 8 years ago.
Consider the following code:
std::map<std::string, int> m1;
auto i = m1.find("foo");
const char* key = ...
auto j = m1.find(key);
This will create a temporary std::string object for every map lookup. What are the canonical ways to avoid it?
Don't use pointers; instead, pass strings directly. Then you can take advantage of references:
void do_something(std::string const & key)
{
auto it = m.find(key);
// ....
}
C++ typically becomes "more correct" the more you use its idioms and don't try to write C with it.
You can avoid the temporary by giving the std::map a custom comparator class, that can compare char *s. (The default will use the pointer's address, which isn't what you want. You need to compare on the string's value.)
Thus, something like:
class StrCmp
{
public:
bool operator () (const char *a, const char *b)
{
return strcmp(a, b) < 0;
}
};
// Later:
std::map<const char *, int, StrCmp> m;
Then, use like a normal map, but pass char *'s. Keep in mind that anything you store in the map must remain alive for the duration of the map. That means you need char literals, or you have to keep the data pointed to by the pointer alive on your own. For these reasons, I'd go with a std::map<std::string> and eat the temporary until profiling showed that the above was really needed.
There is no way to avoid a temporary std::string instance that copies character data. Note that this cost is very low and does not incur dynamic memory allocation if your standard library implementation uses short string optimizations.
However, if you need to proxy C-style strings on a frequent basis, you can still come up with custom solutions that will by-pass this allocation. This can pay off if you have to do this really often and your strings are lengthy enough not to benefit from short string optimizations.
If you only need a very small subset of string functionality (e.g. only assignment and copies), then you can write a small special-purpose string class that stores a const char * pointer and a function to release the memory.
class cheap_string
{
public:
typedef void(*Free)(const char*);
private:
const char * myData;
std::size_t mySize;
Free myFree;
public:
// direct member assignments, use with care.
cheap_string ( const char * data, std::size_t size, Free free );
// releases using custom deleter (a no-op for proxies).
~cheap_string ();
// create real copies (safety first).
cheap_string ( const cheap_string& );
cheap_string& operator= ( const cheap_string& );
cheap_string ( const char * data );
cheap_string ( const char * data, std::size_t size )
: myData(new char[size+1]), mySize(size), myFree(&destroy)
{
strcpy(myData, data);
myData[mySize] = '\0';
}
const char * data () const;
const std::size_t size () const;
// whatever string functionality you need.
bool operator< ( const cheap_string& ) const;
bool operator== ( const cheap_string& ) const;
// create proxies for existing character buffers.
static const cheap_string proxy ( const char * data )
{
return cheap_string(data, strlen(data), &abandon);
}
static const cheap_string proxy ( const char * data, std::size_t size )
{
return cheap_string(data, size, &abandon);
}
private:
// deleter for proxies (no-op)
static void abandon ( const char * data )
{
// no-op, this is used for proxies, which don't own the data!
}
// deleter for copies (delete[]).
static void destroy ( const char * data )
{
delete [] data;
}
};
Then, you can use this class as:
std::map<cheap_string, int> m1;
auto i = m1.find(cheap_string::proxy("foo"));
The temporary cheap_string instance does not create a copy of the character buffer like std::string does, yet it preserves safe copy semantics for storing instances of cheap_string in standard containers.
notes: if your implementation does not use return value optimization, you'll want to find an alternate syntax for the proxy method, such as a constructor with a special overload (taking a custom proxy_t type à la std::nothrow for placement new).
Well, map's find actually accepts a constant reference to the key, so you cannot avoid creating it at one point or another.
For the first part of the code you can have a constant static std::string with value "foo" to lookup. That way you won't create copies.
If you want to go Spartan's way, you can always create your own type that can be used like a string, but also can hold pointer to string literals.
But in any event, overhead associated with map lookups is so huge so this doesn't really make sense. If I were you I'd first replace map/unordered_map with google's dense hash. Then I would run Intel's VTune (amplifier these days) and see where the time is going and optimize those places. I doubt strings as keys will show up at a bottleneck top ten list.
Take a look at the StringRef class from llvm.
They can be constructed very cheap from c-strings, string literals or std::string. If you made a map of those, instead of std::string, the construction would be very fast.
It's a very fragile system though. You need to be sure that whatever the source of the strings you insert stays alive and unmodified for the lifetime of the map.
I am trying to figure out why the following code is not working, and I am assuming it is an issue with using char* as the key type, however I am not sure how I can resolve it or why it is occuring. All of the other functions I use (in the HL2 SDK) use char* so using std::string is going to cause a lot of unnecessary complications.
std::map<char*, int> g_PlayerNames;
int PlayerManager::CreateFakePlayer()
{
FakePlayer *player = new FakePlayer();
int index = g_FakePlayers.AddToTail(player);
bool foundName = false;
// Iterate through Player Names and find an Unused one
for(std::map<char*,int>::iterator it = g_PlayerNames.begin(); it != g_PlayerNames.end(); ++it)
{
if(it->second == NAME_AVAILABLE)
{
// We found an Available Name. Mark as Unavailable and move it to the end of the list
foundName = true;
g_FakePlayers.Element(index)->name = it->first;
g_PlayerNames.insert(std::pair<char*, int>(it->first, NAME_UNAVAILABLE));
g_PlayerNames.erase(it); // Remove name since we added it to the end of the list
break;
}
}
// If we can't find a usable name, just user 'player'
if(!foundName)
{
g_FakePlayers.Element(index)->name = "player";
}
g_FakePlayers.Element(index)->connectTime = time(NULL);
g_FakePlayers.Element(index)->score = 0;
return index;
}
You need to give a comparison functor to the map otherwise it's comparing the pointer, not the null-terminated string it points to. In general, this is the case anytime you want your map key to be a pointer.
For example:
struct cmp_str
{
bool operator()(char const *a, char const *b) const
{
return std::strcmp(a, b) < 0;
}
};
map<char *, int, cmp_str> BlahBlah;
You can't use char* unless you are absolutely 100% sure you are going to access the map with the exact same pointers, not strings.
Example:
char *s1; // pointing to a string "hello" stored memory location #12
char *s2; // pointing to a string "hello" stored memory location #20
If you access map with s1 you will get a different location than accessing it with s2.
Two C-style strings can have equal contents but be at different addresses. And that map compares the pointers, not the contents.
The cost of converting to std::map<std::string, int> may not be as much as you think.
But if you really do need to use const char* as map keys, try:
#include <functional>
#include <cstring>
struct StrCompare : public std::binary_function<const char*, const char*, bool> {
public:
bool operator() (const char* str1, const char* str2) const
{ return std::strcmp(str1, str2) < 0; }
};
typedef std::map<const char*, int, StrCompare> NameMap;
NameMap g_PlayerNames;
You can get it working with std::map<const char*, int>, but must not use non-const pointers (note the added const for the key), because you must not change those strings while the map refers to them as keys. (While a map protects its keys by making them const, this would only constify the pointer, not the string it points to.)
But why don't you simply use std::map<std::string, int>? It works out of the box without headaches.
You are comparing using a char * to using a string. They are not the same.
A char * is a pointer to a char. Ultimately, it is an integer type whose value is interpreted as a valid address for a char.
A string is a string.
The container works correctly, but as a container for pairs in which the key is a char * and the value is an int.
As the others say, you should probably use std::string instead of a char* in this case although there is nothing wrong in principle with a pointer as a key if that's what is really required.
I think another reason this code isn't working is because once you find an available entry in the map you attempt to reinsert it into the map with the same key (the char*). Since that key already exists in your map, the insert will fail. The standard for map::insert() defines this behaviour...if the key value exists the insert fails and the mapped value remains unchanged. Then it gets deleted anyway. You'd need to delete it first and then reinsert.
Even if you change the char* to a std::string this problem will remain.
I know this thread is quite old and you've fixed it all by now but I didn't see anyone making this point so for the sake of future viewers I'm answering.
Had a hard time using the char* as the map key when I try to find element in multiple source files. It works fine when all the accessing/finding within the same source file where the elements are inserted. However, when I try to access the element using find in another file, I am not able to get the element which is definitely inside the map.
It turns out the reason is as Plabo pointed out, the pointers (every compilation unit has its own constant char*) are NOT the same at all when it is accessed in another cpp file.
std::map<char*,int> will use the default std::less<char*,int> to compare char* keys, which will do a pointer comparison. But you can specify your own Compare class like this:
class StringPtrCmp {
public:
StringPtrCmp() {}
bool operator()(const char *str1, const char *str2) const {
if (str1 == str2)
return false; // same pointer so "not less"
else
return (strcmp(str1, str2) < 0); //string compare: str1<str2 ?
}
};
std::map<char*, YourType, StringPtrCmp> myMap;
Bear in mind that you have to make sure that the char* pointer are valid.
I would advice to use std::map<std::string, int> anyway.
You can bind a lambda that does the same job.
#include <map>
#include <functional>
class a{
std::map < const char*, Property*, std::function<bool(const char* a, const char* b)> > propertyStore{
std::bind([](const char* a, const char* b) {return std::strcmp(a,b) < 0;},std::placeholders::_1,std::placeholders::_2)
};
};
There's no problem to use any key type as long as it supports comparison (<, >, ==) and assignment.
One point that should be mentioned - take into account that you're using a template class. As the result compiler will generate two different instantiations for char* and int*. Whereas the actual code of both will be virtually identical.
Hence - I'd consider using a void* as a key type, and then casting as necessary.
This is my opinion.
I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase) but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(stdext::pair<LPCWSTR, SomeStruct>(L"a string", struct));
someMap.find(L"a string")
someMap.find(L"A STRING")
The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary:
struct my_hash_compare {
const size_t bucket_size = 4;
const size_t min_buckets = 8;
size_t operator()(const LPCWSTR &Key) const {
// implement a case-insensitive hash function here,
// or find something in the Windows libraries.
}
bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
// implement a case-insensitive comparison function here
return _wcsicmp(Key1, Key2) < 0;
// or something like that. There's warnings about
// locale plastered all over this function's docs.
}
};
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for what I've written, in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
#include "stdafx.h"
#include "atlbase.h"
#include <map>
#include <wchar.h>
typedef std::pair<std::wstring, int> MyPair;
struct key_comparer
{
bool operator()(std::wstring a, std::wstring b) const
{
return _wcsicmp(a.c_str(), b.c_str()) < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::map<std::wstring, int, key_comparer> mymap;
mymap.insert(MyPair(L"GHI",3));
mymap.insert(MyPair(L"DEF",2));
mymap.insert(MyPair(L"ABC",1));
std::map<std::wstring, int, key_comparer>::iterator iter;
iter = mymap.find(L"def");
if (iter == mymap.end()) {
printf("No match.\n");
} else {
printf("match: %i\n", iter->second);
}
return 0;
}
If you use an std::map instead of the non-standard hash_map, you can set the comparison function to be used when doing the binary search:
// Function object for case insensitive comparison
struct case_insensitive_compare
{
case_insensitive_compare() {}
// Function objects overloader operator()
// When used as a comparer, it should function as operator<(a,b)
bool operator()(const std::string& a, const std::string& b) const
{
return to_lower(a) < to_lower(b);
}
std::string to_lower(const std::string& a) const
{
std::string s(a);
std::for_each(s.begin(), s.end(), char_to_lower);
return s;
}
void char_to_lower(char& c) const
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
}
};
// ...
std::map<std::string, std::string, case_insensitive_compare> someMap;
someMap["foo"] = "Hello, world!";
std::cout << someMap["FOO"] << endl; // Hello, world!
LPCWSTR is a pointer to a null-terminated array of unicode characters and probably not what you want in this case. Use the wstring specialization of basic_string instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.