Using char* as a key in std::map - c++

I am trying to figure out why the following code is not working. I assume it is an issue with using char* as the key type, but I am not sure how to resolve it or why it is occurring. All of the other functions I use (in the HL2 SDK) use char*, so using std::string is going to cause a lot of unnecessary complications.
std::map<char*, int> g_PlayerNames;
int PlayerManager::CreateFakePlayer()
{
FakePlayer *player = new FakePlayer();
int index = g_FakePlayers.AddToTail(player);
bool foundName = false;
// Iterate through Player Names and find an Unused one
for(std::map<char*,int>::iterator it = g_PlayerNames.begin(); it != g_PlayerNames.end(); ++it)
{
if(it->second == NAME_AVAILABLE)
{
// We found an Available Name. Mark as Unavailable and move it to the end of the list
foundName = true;
g_FakePlayers.Element(index)->name = it->first;
g_PlayerNames.insert(std::pair<char*, int>(it->first, NAME_UNAVAILABLE));
g_PlayerNames.erase(it); // Remove name since we added it to the end of the list
break;
}
}
// If we can't find a usable name, just use 'player'
if(!foundName)
{
g_FakePlayers.Element(index)->name = "player";
}
g_FakePlayers.Element(index)->connectTime = time(NULL);
g_FakePlayers.Element(index)->score = 0;
return index;
}

You need to give the map a comparison functor; otherwise it compares the pointers, not the null-terminated strings they point to. In general, this is the case any time you want your map key to be a pointer.
For example:
struct cmp_str
{
bool operator()(char const *a, char const *b) const
{
return std::strcmp(a, b) < 0;
}
};
std::map<char *, int, cmp_str> BlahBlah;

You can't use char* unless you are absolutely 100% sure you are going to access the map with the exact same pointers, not strings.
Example:
char *s1; // pointing to a string "hello" stored memory location #12
char *s2; // pointing to a string "hello" stored memory location #20
If you access the map with s1, you will reach a different entry than if you access it with s2.
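The difference can be seen in a minimal sketch, assuming two separate buffers that hold the same text. The names CStrLess and pointer_vs_content_demo are made up for illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Hypothetical comparator: orders keys by string contents, not by address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Returns true when the default map misses the key and the strcmp map finds it.
bool pointer_vs_content_demo() {
    // Two distinct buffers holding the same characters.
    char s1[] = "hello";
    char s2[] = "hello";

    std::map<const char*, int> by_address;            // default std::less<const char*>
    std::map<const char*, int, CStrLess> by_content;  // strcmp-based ordering

    by_address[s1] = 1;
    by_content[s1] = 1;

    // s2 has the same text as s1 but lives at a different address.
    const bool missed = by_address.find(s2) == by_address.end();
    const bool found = by_content.find(s2) != by_content.end();
    assert(missed && found);
    return missed && found;
}
```

Calling pointer_vs_content_demo() from main should return true: the lookup by address misses, while the lookup by contents succeeds.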

Two C-style strings can have equal contents but be at different addresses. And that map compares the pointers, not the contents.
The cost of converting to std::map<std::string, int> may not be as much as you think.
But if you really do need to use const char* as map keys, try:
#include <cstring>
#include <map>
// std::binary_function was deprecated in C++11 and removed in C++17; std::map does not need it.
struct StrCompare {
bool operator() (const char* str1, const char* str2) const
{ return std::strcmp(str1, str2) < 0; }
};
typedef std::map<const char*, int, StrCompare> NameMap;
NameMap g_PlayerNames;

You can get it working with std::map<const char*, int>, but must not use non-const pointers (note the added const for the key), because you must not change those strings while the map refers to them as keys. (While a map protects its keys by making them const, this would only constify the pointer, not the string it points to.)
But why don't you simply use std::map<std::string, int>? It works out of the box without headaches.
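As a rough sketch of how std::string keys can coexist with char*-based code: the map copies the text on insertion, and c_str() hands a const char* back to C-style functions. The function sdk_name_length is a hypothetical stand-in for an SDK call, not something from the HL2 SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Hypothetical stand-in for an SDK function that takes a C string.
std::size_t sdk_name_length(const char* name) {
    return std::strlen(name);
}

bool string_key_demo() {
    std::map<std::string, int> names;

    char buffer[] = "gordon";  // e.g. a name handed over by a char* API
    names[buffer] = 1;         // the map stores its own copy of the text

    buffer[0] = 'G';           // changing the buffer does not affect the stored key
    const bool still_found = names.count("gordon") == 1;

    // c_str() gives a const char* view for calls back into char*-based code.
    const std::size_t len = sdk_name_length(names.begin()->first.c_str());

    assert(still_found && len == 6);
    return still_found && len == 6;
}
```

Because the key is a copy, the caller no longer has to keep the original buffer alive or unchanged while the entry is in the map.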

You are conflating a char * with a string. They are not the same.
A char * is a pointer to a char. Ultimately, its value is an address, and that address is what the map compares.
A string is a string.
The container works correctly, but as a container for pairs in which the key is a char * and the value is an int.

As the others say, you should probably use std::string instead of a char* in this case although there is nothing wrong in principle with a pointer as a key if that's what is really required.
I think another reason this code isn't working is because once you find an available entry in the map you attempt to reinsert it into the map with the same key (the char*). Since that key already exists in your map, the insert will fail. The standard for map::insert() defines this behaviour...if the key value exists the insert fails and the mapped value remains unchanged. Then it gets deleted anyway. You'd need to delete it first and then reinsert.
Even if you change the char* to a std::string this problem will remain.
I know this thread is quite old and you've fixed it all by now but I didn't see anyone making this point so for the sake of future viewers I'm answering.
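A minimal sketch of one way around the failed insert: since the key does not change, the mapped value can be updated through the iterator, with no insert or erase at all. Note also that a std::map is ordered by key, so there is no "end of the list" to move an entry to. The enum values and the function names below are assumptions made for this example, and std::string keys are used only to keep it self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the constants used in the question.
enum { NAME_AVAILABLE = 0, NAME_UNAVAILABLE = 1 };

typedef std::map<std::string, int> NameMap;

// Claims the first available name by updating its mapped value in place.
// Returns "player" when every name is taken, as the question's code does.
std::string claim_name(NameMap& names) {
    for (NameMap::iterator it = names.begin(); it != names.end(); ++it) {
        if (it->second == NAME_AVAILABLE) {
            it->second = NAME_UNAVAILABLE;  // no insert/erase needed
            return it->first;
        }
    }
    return "player";
}

// Shows that insert() leaves an entry untouched when the key already exists.
bool insert_keeps_existing_value() {
    NameMap names;
    names["alice"] = NAME_AVAILABLE;
    std::pair<NameMap::iterator, bool> result =
        names.insert(std::make_pair(std::string("alice"), int(NAME_UNAVAILABLE)));
    assert(!result.second);  // insertion was rejected
    return !result.second && names["alice"] == NAME_AVAILABLE;
}
```

The second function reproduces the behaviour described in this answer: the insert reports failure and the old value stays in place, so a following erase would simply remove the only entry for that name.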

I had a hard time using char* as the map key when looking up elements from multiple source files. Everything works when all the lookups happen in the same source file where the elements are inserted. However, when I call find from another file, I cannot get an element that is definitely in the map.
It turns out the reason is as Plabo pointed out: the pointers are NOT the same when the map is accessed from another cpp file, because every compilation unit can have its own copy of the constant string.

std::map<char*,int> will use the default std::less<char*> to compare char* keys, which does a pointer comparison. But you can specify your own Compare class like this:
class StringPtrCmp {
public:
StringPtrCmp() {}
bool operator()(const char *str1, const char *str2) const {
if (str1 == str2)
return false; // same pointer so "not less"
else
return (strcmp(str1, str2) < 0); //string compare: str1<str2 ?
}
};
std::map<char*, YourType, StringPtrCmp> myMap;
Bear in mind that you have to make sure that the char* pointers remain valid.
I would advise using std::map<std::string, int> anyway.

You can bind a lambda that does the same job.
#include <cstring>
#include <functional>
#include <map>
class a{
std::map < const char*, Property*, std::function<bool(const char* a, const char* b)> > propertyStore{
std::bind([](const char* a, const char* b) {return std::strcmp(a,b) < 0;},std::placeholders::_1,std::placeholders::_2)
};
};

There's no problem to use any key type as long as it supports comparison (<, >, ==) and assignment.
One point that should be mentioned - take into account that you're using a template class. As the result compiler will generate two different instantiations for char* and int*. Whereas the actual code of both will be virtually identical.
Hence - I'd consider using a void* as a key type, and then casting as necessary.
This is my opinion.

Related

map<char*, int> how to use it correctly? [duplicate]


find wchar_t data in std::map and duplicates [duplicate]


How to use std::string as key in stxxl::map

I am trying to use std::string as a key in an stxxl::map.
Insertion was fine for a small number of strings, about 10-100.
But when I try to insert a large number of strings, about 100000, I get a segmentation fault.
The code is as follows:
struct CompareGreaterString {
bool operator () (const std::string& a, const std::string& b) const {
return a > b;
}
static std::string max_value() {
return "";
}
};
// template parameter <KeyType, DataType, CompareType, RawNodeSize, RawLeafSize, PDAllocStrategy (optional)>
typedef stxxl::map<std::string, unsigned int, CompareGreaterString, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> name_map;
name_map strMap((name_map::node_block_type::raw_size)*3, (name_map::leaf_block_type::raw_size)*3);
for (unsigned int i = 0; i < 1000000; i++) { /// Inserting 1 million strings
std::stringstream strStream;
strStream << (i);
Console::println("Inserting: " + strStream.str());
strMap[strStream.str()]=i;
}
I am unable to identify why I cannot insert a larger number of strings. I get a segmentation fault exactly while inserting "1377". I am, however, able to add any number of integers as keys. I suspect that the variable size of the strings might be causing this trouble.
Also, I do not understand what to return for max_value of the string. I simply returned an empty string.
According to documentation:
CompareType must also provide a static max_value method, that returns a value of type KeyType that is larger than any key stored in map
Because empty string happens to compare as smaller than any other string, it breaks this precondition and may thus cause unspecified behaviour.
Here's a max_value that should work. MAX_KEY_LEN is just an integer which is larger or equal to the length of the longest possible string key that the map can have.
struct CompareGreaterString {
// ...
static std::string max_value() {
return std::string(MAX_KEY_LEN, std::numeric_limits<unsigned char>::max());
}
};
I have finally found the solution to my problem with great help from Timo Bingmann, user2079303 and Martin Ba. Thank you.
I would like to share it with you.
Firstly, stxxl supports POD only. That means it stores fixed-size structures only. Hence std::string cannot be a key. stxxl::map worked for about 100-1000 strings because they fit in physical memory. When more strings are inserted, it has to write to disk, which internally causes problems.
Hence we need to use a fixed-size string based on char[], as follows:
static const int MAX_KEY_LEN = 16;
class FixedString {
public:
char charStr[MAX_KEY_LEN];
bool operator< (const FixedString& fixedString) const {
return std::lexicographical_compare(charStr, charStr+MAX_KEY_LEN,
fixedString.charStr, fixedString.charStr+MAX_KEY_LEN);
}
bool operator==(const FixedString& fixedString) const {
return std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
bool operator!=(const FixedString& fixedString) const {
return !std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
}
};
struct comp_type : public std::less<FixedString> {
static FixedString max_value()
{
FixedString s;
std::fill(s.charStr, s.charStr+MAX_KEY_LEN, 0x7f);
return s;
}
};
Please note that the operators (mainly (), == and !=) need to be overloaded for all the stxxl::map functions to work.
Now we may define fixed_name_map for map as follows:
typedef stxxl::map<FixedString, unsigned int, comp_type, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> fixed_name_map;
fixed_name_map myFixedMap((fixed_name_map::node_block_type::raw_size)*5, (fixed_name_map::leaf_block_type::raw_size)*5);
Now the program compiles fine and accepts about 10^8 strings without any problem.
We can also use myFixedMap like std::map itself (for example: myFixedMap[fixedString] = 10).
If you are using C++11, then as an alternative to the FixedString class you could use std::array<char, MAX_KEY_LEN>. It is an STL layer on top of an ordinary fixed-size C array, implementing comparisons and iterators as you are used to from std::string, but it's a POD type, so STXXL should support it.
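A minimal sketch of the std::array idea, assuming C++11. It only shows the key type, a padding helper and a comparator with max_value modelled on comp_type above. The names FixedKey, make_key and FixedKeyLess are hypothetical, and whether STXXL accepts this key type has not been verified here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <type_traits>

static const std::size_t MAX_KEY_LEN = 16;  // same limit as in the answer above
typedef std::array<char, MAX_KEY_LEN> FixedKey;

// A fixed-size, trivially copyable key can be copied byte for byte.
static_assert(std::is_trivially_copyable<FixedKey>::value,
              "FixedKey must be trivially copyable");

// Hypothetical helper: copies up to MAX_KEY_LEN characters and zero-pads the rest.
// Longer input is silently truncated.
FixedKey make_key(const char* text) {
    FixedKey key;
    key.fill('\0');
    const std::size_t n = std::min(std::strlen(text), MAX_KEY_LEN);
    std::copy(text, text + n, key.begin());
    return key;
}

// Comparator with a max_value in the style of comp_type from the answer above.
struct FixedKeyLess : public std::less<FixedKey> {
    static FixedKey max_value() {
        FixedKey key;
        key.fill(0x7f);
        return key;
    }
};

// std::array supplies ==, != and < (lexicographical), so none are hand-written.
bool fixed_key_demo() {
    const bool ok = make_key("abc") == make_key("abc") &&
                    make_key("abc") != make_key("abd") &&
                    make_key("abc") < make_key("abd") &&
                    make_key("zzz") < FixedKeyLess::max_value();
    assert(ok);
    return ok;
}
```

As with FixedString, the 0x7f sentinel is only larger than every key if the keys contain plain 7-bit ASCII characters.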
Alternatively, you can use serialization_sort in TPIE. It can sort elements of type std::pair<std::string, unsigned int> just fine, so if all you need is to insert everything in bulk and then access it in bulk, this will be sufficient for your case (and probably faster depending on the exact case).

C++ map<std::string> vs map<char *> performance (I know, "again?")

I was using a map with a std::string key and while everything was working fine I wasn't getting the performance I expected. I searched for places to optimize and improved things only a little and that's when a colleague said, "that string key is going to be slow."
I read dozens of questions and they consistently say:
"don't use a char * as a key"
"std::string keys are never your bottleneck"
"the performance difference between a char * and a
std::string is a myth."
I reluctantly tried a char * key and there was a difference, a big difference.
I boiled the problem down to a simple example:
#include <stdio.h>
#include <stdlib.h>
#include <map>
#ifdef USE_STRING
#include <string>
typedef std::map<std::string, int> Map;
#else
#include <string.h>
struct char_cmp {
bool operator () (const char *a,const char *b) const
{
return strcmp(a,b)<0;
}
};
typedef std::map<const char *, int, char_cmp> Map;
#endif
Map m;
bool test(const char *s)
{
Map::iterator it = m.find(s);
return it != m.end();
}
int main(int argc, char *argv[])
{
m.insert( Map::value_type("hello", 42) );
const int lcount = atoi(argv[1]);
for (int i=0 ; i<lcount ; i++) test("hello");
}
First the std::string version:
$ g++ -O3 -o test test.cpp -DUSE_STRING
$ time ./test 20000000
real 0m1.893s
Next the 'char *' version:
$ g++ -O3 -o test test.cpp
$ time ./test 20000000
real 0m0.465s
That's a pretty big performance difference and about the same difference I see in my larger program.
Using a char * key is a pain to handle freeing the key and just doesn't feel right. C++ experts what am I missing? Any thoughts or suggestions?
You are using a const char * as a lookup key for find(). For the map containing const char* this is the correct type that find expects and the lookup can be done directly.
The map containing std::string expects the parameter of find() to be a std::string, so in this case the const char* first has to be converted to a std::string. This is probably the difference you are seeing.
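Since C++14 the conversion described above can be avoided by giving the map a transparent comparator such as std::less<>. find() then becomes a template and compares the const char* against the stored std::string keys directly. The following is a minimal sketch under that assumption; the names TransparentMap and contains are made up for this example, and no timing was measured for it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// std::less<> (C++14) is a transparent comparator: find() accepts any type
// that can be compared with std::string, without building a temporary key.
typedef std::map<std::string, int, std::less<> > TransparentMap;

bool contains(const TransparentMap& m, const char* s) {
    // Uses operator<(std::string, const char*) during the search.
    return m.find(s) != m.end();
}

bool transparent_find_demo() {
    TransparentMap m;
    m.insert(TransparentMap::value_type("hello", 42));
    const bool ok = contains(m, "hello") && !contains(m, "world");
    assert(ok);
    return ok;
}
```

The keys are still owned std::string objects, so nothing has to be freed by hand; only the lookup path changes.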
As sth noted, the issue is one of specifications of the associative containers (sets and maps), in that their member search methods always force a conversion to the key_type, even if an operator< exists that would accept to compare your key against the keys in the map despite their different types.
On the other hand, the functions in <algorithm> do not suffer from this, for example lower_bound is defined as:
template< class ForwardIt, class T >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value );
template< class ForwardIt, class T, class Compare >
ForwardIt lower_bound( ForwardIt first, ForwardIt last, const T& value, Compare comp );
So, an alternative could be:
std::vector< std::pair< std::string, int > >
And then you could do:
std::lower_bound(vec.begin(), vec.end(), std::make_pair("hello", 0), CompareFirst{})
Where CompareFirst is defined as:
struct CompareFirst {
template <typename T, typename U>
bool operator()(T const& t, U const& u) const { return t.first < u.first; }
};
Or even build a completely custom comparator (but it's a bit harder).
A vector of pairs is generally more efficient under read-heavy loads, so it is well suited to storing a configuration, for example.
I do advise to provide methods to wrap the accesses. lower_bound is pretty low-level.
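One possible shape for such a wrapper, as a sketch only: a sorted vector of pairs hidden behind set and find, with lower_bound comparing the stored std::string keys against a const char* directly. The class name FlatStringMap and its interface are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper around a vector of (key, value) pairs kept sorted by key.
class FlatStringMap {
    typedef std::pair<std::string, int> Entry;

    // lower_bound calls comp(element, value), so only this order is needed.
    struct KeyLess {
        bool operator()(const Entry& e, const char* key) const {
            return e.first < key;
        }
    };

    std::vector<Entry> entries_;

public:
    // Inserts a new entry or overwrites an existing one; keeps the vector sorted.
    void set(const char* key, int value) {
        std::vector<Entry>::iterator it =
            std::lower_bound(entries_.begin(), entries_.end(), key, KeyLess());
        if (it != entries_.end() && it->first == key)
            it->second = value;
        else
            entries_.insert(it, Entry(key, value));
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    // The lookup itself does not construct a std::string.
    const int* find(const char* key) const {
        std::vector<Entry>::const_iterator it =
            std::lower_bound(entries_.begin(), entries_.end(), key, KeyLess());
        return (it != entries_.end() && it->first == key) ? &it->second : nullptr;
    }
};
```

Insertion into the middle of a vector is linear, which is why this layout fits data that is written rarely and read often.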
If you're using C++11, the copy constructor is not called unless the string is changed. Because std::string is a C++ construct, at least one dereference is needed to get at the string data.
My guess would be that the time is taken up by an extra dereference (which is costly if done 10000 times), and std::string is likely doing appropriate null pointer checks, which again eat up cycles.
Store the std::string as a pointer and then you lose the copy constructor overhead.
But afterwards you have to remember to handle the deletes.
The reason std::string is slow is that it constructs itself: it calls the copy constructor, and then at the end calls delete. If you create the string on the heap you lose the copy construction.
After compilation the two "hello" string literals will typically have the same memory address. In the char * case you use this memory address as the key.
In the string case every "hello" is converted to a different object. This is a small part (really really small) of your performance difference.
A bigger part can be that, as all the "hello"s you are using have the same memory address, strcmp will always get two identical char pointers, and I'm quite sure that it checks for this case early :) So it will never really iterate over all the characters, but the std::string comparison will.
One solution to this is to use a custom key class that acts as a cross between a const char * and a std::string, but has a boolean to tell at run time whether it is "owning" or "non-owning". That way you can insert a key into the map which owns its data (and will free it on destruction), and then compare with a key that does not own its data. (This is a similar concept to the Rust Cow<'a, str> type.)
The below example also inherits from boost's string_ref to avoid having to re-implement hash functions etc.
NOTE this has the dangerous effect that if you accidentally insert into the map with the non-owning version, and the string you are pointing at goes out of scope, the key will point at already freed memory. The non-owning version can only be used for lookups.
#include <iostream>
#include <map>
#include <string>
#include <cstdlib>
#include <cstring>
#include <boost/utility/string_ref.hpp>
class MaybeOwned: public boost::string_ref {
public:
// owning constructor, takes a std::string and copies the data
// deletes its copy on destruction
MaybeOwned(const std::string& string):
boost::string_ref(
(char *)malloc(string.size() * sizeof(char)),
string.size()
),
owned(true)
{
memcpy((void *)data(), (void *)string.data(), string.size());
}
// non-owning constructor, takes a string ref and points to the same data
// does not delete its data on destruction
MaybeOwned(boost::string_ref string):
boost::string_ref(string),
owned(false)
{
}
// non-owning constructor, takes a C string and points to the same data
// does not delete its data on destruction
MaybeOwned(const char * string):
boost::string_ref(string),
owned(false)
{
}
// move constructor, tells source that it no longer owns the data if it did
// to avoid double free
MaybeOwned(MaybeOwned&& other):
boost::string_ref(other),
owned(other.owned)
{
other.owned = false;
}
// I was too lazy to write a proper copy constructor
// (it would need to malloc and memcpy again if it owned the data)
MaybeOwned(const MaybeOwned& other) = delete;
// free owned data if it has any
~MaybeOwned() {
if (owned) {
free((void *)data());
}
}
private:
bool owned;
};
int main()
{
std::map<MaybeOwned, std::string> map;
map.emplace(std::string("key"), "value");
map["key"] += " here";
std::cout << map["key"] << "\n";
}

How to use a hash_map with case insensitive unicode string for key?

I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase) but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(stdext::pair<LPCWSTR, SomeStruct>(L"a string", struct));
someMap.find(L"a string")
someMap.find(L"A STRING")
The trouble is, neither find operation usually works (it returns someMap.end()). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses case-insensitive keys and can store my custom struct?
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase you had in your Dictionary:
struct my_hash_compare {
static const size_t bucket_size = 4;
static const size_t min_buckets = 8;
size_t operator()(const LPCWSTR &Key) const {
// implement a case-insensitive hash function here,
// or find something in the Windows libraries.
}
bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
// implement a case-insensitive comparison function here
return _wcsicmp(Key1, Key2) < 0;
// or something like that. There's warnings about
// locale plastered all over this function's docs.
}
};
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==. That is, they might require that if my_hash_compare()(a,b) is false, and my_hash_compare()(b,a) is false, then a == b. Obviously that's not true for what I've written, in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert, in which case you should use wstring as the key.
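For comparison, here is a minimal sketch that uses the standard std::unordered_map (C++11) instead of stdext::hash_map, with std::wstring keys so that the map keeps its own copy of each string. The hash and the equality predicate both work on a lower-cased copy, which keeps them consistent with each other. The helper fold_case and the struct names are assumptions for illustration, and std::towlower is locale-dependent and works character by character, so this is not full Unicode case folding.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical helper: lower-cases each wchar_t with std::towlower.
std::wstring fold_case(const std::wstring& s) {
    std::wstring out(s);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<wchar_t>(
            std::towlower(static_cast<std::wint_t>(out[i])));
    return out;
}

// Hashes the folded key so that L"a string" and L"A STRING" hash identically.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::wstring& key) const {
        return std::hash<std::wstring>()(fold_case(key));
    }
};

// Two keys are equal when their folded forms are equal.
struct CaseInsensitiveEqual {
    bool operator()(const std::wstring& a, const std::wstring& b) const {
        return fold_case(a) == fold_case(b);
    }
};

// Placeholder for the custom struct mentioned in the question.
struct SomeStruct {
    int value;
};

typedef std::unordered_map<std::wstring, SomeStruct,
                           CaseInsensitiveHash, CaseInsensitiveEqual>
    CaseInsensitiveMap;

bool case_insensitive_demo() {
    CaseInsensitiveMap m;
    SomeStruct s = {7};
    m[L"a string"] = s;
    const bool ok = m.find(L"A STRING") != m.end() &&
                    m.find(L"a string") != m.end() &&
                    m.find(L"other") == m.end();
    assert(ok);
    return ok;
}
```

Folding on every hash and comparison costs extra copies; storing already-folded keys, as suggested above, avoids that at the price of losing the original spelling.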
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
#include "stdafx.h"
#include "atlbase.h"
#include <map>
#include <string>
#include <wchar.h>
typedef std::pair<std::wstring, int> MyPair;
struct key_comparer
{
bool operator()(const std::wstring& a, const std::wstring& b) const
{
return _wcsicmp(a.c_str(), b.c_str()) < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::map<std::wstring, int, key_comparer> mymap;
mymap.insert(MyPair(L"GHI",3));
mymap.insert(MyPair(L"DEF",2));
mymap.insert(MyPair(L"ABC",1));
std::map<std::wstring, int, key_comparer>::iterator iter;
iter = mymap.find(L"def");
if (iter == mymap.end()) {
printf("No match.\n");
} else {
printf("match: %i\n", iter->second);
}
return 0;
}
If you use an std::map instead of the non-standard hash_map, you can set the comparison function to be used when doing the binary search:
// Function object for case-insensitive comparison (ASCII letters only)
struct case_insensitive_compare
{
case_insensitive_compare() {}
// Overloaded operator()
// When used as a comparer, it should behave like operator<(a, b)
bool operator()(const std::string& a, const std::string& b) const
{
return to_lower(a) < to_lower(b);
}
static std::string to_lower(const std::string& a)
{
std::string s(a);
std::for_each(s.begin(), s.end(), char_to_lower);
return s;
}
// static, so that it can be passed to std::for_each as a plain function
static void char_to_lower(char& c)
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
}
};
// ...
std::map<std::string, std::string, case_insensitive_compare> someMap;
someMap["foo"] = "Hello, world!";
std::cout << someMap["FOO"] << std::endl; // Hello, world!
LPCWSTR is a pointer to a null-terminated array of unicode characters and probably not what you want in this case. Use the wstring specialization of basic_string instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.