C++ index into string map without allocation

I'm writing an application with a high performance thread that doesn't allow allocation. I have a map that looks like this:
map<String, MyCustomClass> objectCollection;
Where String is a custom wrapper around std::string. I want to be able to write code like this on the high priority thread:
int someValue = objectCollection["some string"].value;
When I do this, indexing into the map causes the construction of a String, which requires allocation. My thought was that I might be able to define a custom comparator for my map that would accept a const char*, and be able to do string comparison with a String's C string guts. Is this possible? How might it look?
I can do something like this with String instances:
String strTest = "";
const char* chars = strTest.chars();

You can get away with doing only one allocation.
static const String Key("some string");  // constructed (and allocated) only once
int someValue = objectCollection[Key].value;
Doing it with zero allocations would require a different string class. You would somehow have to make use of const char* and a custom comparison mechanism.

A custom comparison won't do you any good with a map; the lookup operator always converts its argument to the key type, regardless of how the comparison operator works. But when you want fast lookups, there's probably a better way.
Keeping things in a sorted vector and looking them up using the binary search algorithms (lower_bound() etc) is usually faster than looking them up in a map (because, among other things, a map's internal tree structure imposes a good deal of pointer chasing on each lookup). A map is much faster for insertion than a sorted vector, but when fast lookup is more important than fast insertion, the vector is usually faster, and the vector has the advantage that you can use a heterogeneous comparison function (one that takes two different argument types).
Something like this:
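#include <algorithm>
#include <string>
#include <vector>

struct Thing { int value = 0; };  // placeholder for the real value type

struct Element {
    std::string key;
    Thing value;
};

// Heterogeneous comparison: an element on the left, a C string on the right.
bool compare(const Element& lhs, const char* rhs) {
    return lhs.key < rhs;
}

using Collection = std::vector<Element>;

inline Thing lookup(const char* key, const Collection& coll) {
    // Requires coll to be already sorted by key
    auto i(std::lower_bound(coll.begin(), coll.end(), key, compare));
    if (i != coll.end() && i->key == key)
        return i->value;
    else
        return Thing();
}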
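The lookup above assumes the vector is already sorted. A minimal self-contained sketch of the whole workflow (sort once at setup time, then look up without allocating) might look like this; `int` stands in for `Thing`, and `compareKey`, `sortCollection` and `lookupValue` are hypothetical names:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical element type: `int` stands in for the answer's `Thing`.
struct Element {
    std::string key;
    int value;
};

using Collection = std::vector<Element>;

// Heterogeneous comparison used by lower_bound: element on the left, C string on the right.
// std::string's operator< overload for const char* compares contents without allocating.
inline bool compareKey(const Element& lhs, const char* rhs) {
    return lhs.key < rhs;
}

// Sort once during setup (allocation is fine here) so lookups can use binary search.
inline void sortCollection(Collection& coll) {
    std::sort(coll.begin(), coll.end(),
              [](const Element& a, const Element& b) { return a.key < b.key; });
}

// Allocation-free lookup for the high-priority thread; returns `fallback` if the key is absent.
inline int lookupValue(const char* key, const Collection& coll, int fallback = 0) {
    auto it = std::lower_bound(coll.begin(), coll.end(), key, compareKey);
    if (it != coll.end() && it->key == key)
        return it->value;
    return fallback;
}
```

The sort must order keys the same way `compareKey` does; otherwise `lower_bound` gives meaningless results.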

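In C++14, there are some neat new features that should allow this to happen. For instance, there's a templated map::find, which is only available when the map's comparator is transparent (defines is_transparent), as std::less<> does: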
template< class K > iterator find( const K& x );
http://en.cppreference.com/w/cpp/container/map/find
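A minimal sketch of that C++14 approach, assuming std::string keys. With the question's custom `String`, you would additionally need `<` overloads between `String` and `const char*`, which the question doesn't show. `MyCustomClass` and `findByCString` are placeholders:

```cpp
#include <functional>
#include <map>
#include <string>

// Placeholder value type standing in for the question's MyCustomClass.
struct MyCustomClass {
    int value = 0;
};

// std::less<> is transparent (it defines is_transparent), so the templated
// find can compare a const char* directly against std::string keys.
using Collection = std::map<std::string, MyCustomClass, std::less<>>;

// Looks up a key given as a C string; no temporary std::string is built.
// Returns nullptr when the key is absent (operator[] would insert instead).
inline const MyCustomClass* findByCString(const Collection& coll, const char* key) {
    auto it = coll.find(key);
    return it == coll.end() ? nullptr : &it->second;
}
```

Only `find`, `count`, `lower_bound`, `upper_bound` and `equal_range` gained these template overloads in C++14; `operator[]` still converts its argument to the key type.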

All you can do is change the key_type to const char*, since map::find (before C++14), map::operator[] and map::at all take key_type as their argument. As such, even if you pass a const char*, it will construct a String before the map function is even called. So unless you make your String static, you won't get away without constructing one.

Related

Do map::erase(iterator) functions require valid key contents?

Consider the following:
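#include <cstdlib>
#include <cstring>
#include <map>
#include <utility>

struct Value {};  // placeholder value type

struct ExternalBuffer {
    ExternalBuffer(const char *s) : s(s) {}
    const char *s;
    bool operator<(const ExternalBuffer& other) const {
        return std::strcmp(s, other.s) < 0;
    }
};

int main() {
    char *some_cstr = strdup("hello");  // strdup is POSIX, not ISO C++
    std::map<ExternalBuffer, Value> m;
    m.insert(std::make_pair(ExternalBuffer(some_cstr), Value()));
    auto it = m.find(ExternalBuffer(some_cstr));
    std::free(some_cstr);  // key storage released while the key is still in the map
    m.erase(it);           // the question: does erasing by iterator compare keys?
}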
In the above example, ExternalBuffer is just a wrapper object for the necessary comparison functions (e.g. operator<(), or std::hash).
Is it safe to assume that comparison operations on the key are not performed when erasing by iterator? If not, is anyone aware of implementations that do perform these operations?
The 'safe' thing in my example is of course to free the string after erasing the iterator, but I'm just wondering if that's required.
Keys in a map are const objects, and they need to stay valid as long as they are in the map. Any change to their state that can change the behavior of the comparison function results in Undefined Behavior.
Because free(some_cstr); will break this comparison (since it will invoke Undefined Behavior by dereferencing a freed pointer), your map is broken.
In practical terms, however, it is unlikely that the comparison will need to be run when erasing from a map using an iterator. It is possible that some debug or validation code could complain (if, for example, there is validation of the state of that portion of the map before erasing the element).
But is this guaranteed to work by the standard? No.
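A sketch of the ordering that is guaranteed to be fine: erase the entry first, then release the key's storage. It uses `std::malloc`/`std::strcpy` instead of the POSIX `strdup`, and `Value` and `insertThenEraseSafely` are placeholder names:

```cpp
#include <cstdlib>
#include <cstring>
#include <map>
#include <utility>

struct ExternalBuffer {
    ExternalBuffer(const char* s) : s(s) {}
    const char* s;
    bool operator<(const ExternalBuffer& other) const {
        return std::strcmp(s, other.s) < 0;
    }
};

struct Value {};  // placeholder value type

// Inserts a key backed by heap memory, erases it by iterator, and only then
// frees the memory. Returns how many entries remain in the map.
inline std::size_t insertThenEraseSafely() {
    const char text[] = "hello";
    char* some_cstr = static_cast<char*>(std::malloc(sizeof text));
    if (some_cstr == nullptr)
        return 0;  // allocation failed: nothing inserted, so the map would be empty
    std::strcpy(some_cstr, text);

    std::map<ExternalBuffer, Value> m;
    m.insert(std::make_pair(ExternalBuffer(some_cstr), Value()));
    auto it = m.find(ExternalBuffer(some_cstr));
    if (it != m.end())
        m.erase(it);       // the key still points at live memory here
    std::free(some_cstr);  // safe: no key in the map refers to this buffer any more
    return m.size();
}
```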

std::set::find vs std::find on std::set with const

I wrote a little (working) test code, but I do not understand why in the test1 function I can only pass an int* const as parameter, while in the test2 function I can pass a const int*. If I pass a const int* to test1, I get a discard qualifier error.
In my research, I found that both std::find and set::find have a const version so I can't see why they behave differently. I also tried with boost::container::flat_set instead of a std::set and I got the same result.
Could someone please explain this to me?
#include <algorithm>
#include <iostream>
#include <set>

class myClass
{
public:
    myClass() {}
    ~myClass() {}
    void add(int* ref)
    {
        this->_ref.insert(ref);
    }
    // set::find takes its argument as key_type, i.e. int*
    bool test1(int* const ref) const
    {
        return ( this->_ref.find(ref) != this->_ref.end() );
    }
    // std::find compares each element with operator==
    inline
    bool test2(const int* ref) const
    {
        return ( std::find(this->_ref.begin(), this->_ref.end(), ref) != this->_ref.end() );
    }
    std::set<int*> _ref;
};
int main()
{
    myClass test;
    // the ints are never deleted; tolerable in a throwaway test
    test.add(new int(18));
    test.add(new int(35));
    test.add(new int(78));
    test.add(new int(156));
    std::cout<<test.test1(0)<<std::endl;
    std::cout<<test.test1(*test._ref.begin())<<std::endl;
    std::cout<<test.test2(0)<<std::endl;
    std::cout<<test.test2(*test._ref.begin())<<std::endl;
    return 0;
}
set::find() gives the answer in O(log N), while std::find() gives the answer in O(N).
Similarly, map::find() gives the answer in O(log N), while std::find() gives the answer in O(N).
The container std::set<int*> has only homogeneous lookup, so you can only search keys by comparing them with a value of the same type: find, count, erase. Naturally, a value of type const int* does not have the same type as int*, so your test1 code (which calls set::find) attempts to convert the former to the latter, which is not an allowed conversion.
The fact that containers could only be used in a homogeneous way like that has been a shortcoming of C++ since inception, and more egregious examples of undesired conversions arise when you have a map with std::string keys and want to look up an element with a key provided as a string literal. You always have to construct the dynamic std::string object, even though std::string provides comparison operators with string literals.
Therefore, since C++14, you can also make a set (or map) with heterogeneous lookup by spelling it std::set<int*, std::less<>>. With such a container, the lookup functions become templates, and you can indeed compare values of different types (leaving the conversion logic to the underlying < operator). But note that std::less<int*> is required to provide a strict weak ordering on pointers, whereas the original C++14 wording did not require this of std::less<>, so you may end up with undefined behaviour. C++17 extended the total-order guarantee to std::less<> when it compares pointers.
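A sketch of the question's class rewritten with a transparent comparator (C++14 or later). It is trimmed to the relevant parts and stores pointers to ints owned by the caller instead of leaking `new int` allocations:

```cpp
#include <functional>
#include <set>

class myClass {
public:
    void add(int* ref) { _ref.insert(ref); }

    // With std::less<>, set::find is a template, so a const int* is compared
    // against the stored int* directly instead of being converted to int*.
    bool test1(const int* ref) const {
        return _ref.find(ref) != _ref.end();
    }

    std::set<int*, std::less<>> _ref;
};
```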

Why does std::vector transfer its constness to the contained objects?

A const int * and an int *const are very different. Similarly with const std::auto_ptr<int> vs. std::auto_ptr<const int>. However, there appears to be no such distinction with const std::vector<int> vs. std::vector<const int> (actually I'm not sure the second is even allowed). Why is this?
Sometimes I have a function which I want to pass a reference to a vector. The function shouldn't modify the vector itself (e.g. no push_back()), but it wants to modify each of the contained values (say, increment them). Similarly, I might want a function to only change the vector structure but not modify any of its existing contents (though this would be odd). This kind of thing is possible with std::auto_ptr (for example), but because std::vector::front() (for example) is defined as
const T &front() const;
T &front();
rather than just
T &front() const;
There's no way to express this.
Examples of what I want to do:
//create a (non-modifiable) auto_ptr containing a (modifiable) int
const std::auto_ptr<int> a(new int(3));
//this works and makes sense - changing the value pointed to, not the pointer itself
*a = 4;
//this is an error, as it should be
a.reset();
//create a (non-modifiable) vector containing a (modifiable) int
const std::vector<int> v(1, 3);
//this makes sense to me but doesn't work - trying to change the value in the vector, not the vector itself
v.front() = 4;
//this is an error, as it should be
v.clear();
It's a design decision.
If you have a const container, it usually stands to reason that you don't want anybody to modify the elements that it contains, which are an intrinsic part of it. That the container completely "owns" these elements "solidifies the bond", if you will.
This is in contrast to the historic, lower-level "container" implementations (i.e. raw arrays), which are more hands-off. As you quite rightly say, there is a big difference between int const* and int * const. But standard containers simply choose to pass the constness on.
The difference is that pointers to int do not own the ints that they point to, whereas a vector<int> does own the contained ints. A vector<int> can be conceptualised as a struct with int members, where the number of members just happens to be variable.
If you want to create a function that can modify the values contained in the vector but not the vector itself then you should design the function to accept iterator arguments.
Example:
#include <algorithm>
#include <vector>

void setAllToOne(std::vector<int>::iterator begin, std::vector<int>::iterator end)
{
    std::for_each(begin, end, [](int& elem) { elem = 1; });
}
If you can afford to put the desired functionality in a header, then it can be made generic as:
#include <algorithm>
#include <iterator>

// Needs iterators with an assignable reference type (e.g. vector or list iterators)
template<typename ForwardIterator>
void setAllToOne(ForwardIterator begin, ForwardIterator end)
{
    typedef typename std::iterator_traits<ForwardIterator>::reference ref;
    std::for_each(begin, end, [](ref elem) { elem = 1; });
}
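A short sketch of the iterator-based approach for the question's "increment each value" case; `incrementAll` is a hypothetical name. Because it only receives iterators, it has no way to `push_back` or `clear`:

```cpp
#include <vector>

// Increments every element in [begin, end). It can change values but cannot
// add or remove elements, since it never sees the container itself.
template<typename ForwardIterator>
void incrementAll(ForwardIterator begin, ForwardIterator end)
{
    for (; begin != end; ++begin)
        ++*begin;
}
```

For the specific case of setting every element to one value, `std::fill(begin, end, 1)` from `<algorithm>` already does what `setAllToOne` does.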
One big problem syntactically with what you suggest is this: a std::vector<const T> is not the same type as a std::vector<T>. Therefore, you could not pass a vector<T> to a function that expects a vector<const T> without some kind of conversion. Not a simple cast, but the creation of a new vector<const T>. And that new one could not simply share data with the old; it would have to either copy or move the data from the old one to the new one.
You can get away with this with std::shared_ptr, but that's because those are shared pointers. You can have two objects that reference the same pointer, so the conversion from a std::shared_ptr<T> to shared_ptr<const T> doesn't hurt (beyond bumping the reference count). There is no such thing as a shared_vector.
std::unique_ptr works too because they can only be moved from, not copied. Therefore, only one of them will ever have the pointer.
So what you're asking for is simply not possible.
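To illustrate the shared_ptr point, a minimal sketch: converting a `std::shared_ptr<int>` to a `std::shared_ptr<const int>` shares the same object and only bumps the reference count. `shareAsConst` is a hypothetical helper:

```cpp
#include <memory>

// Stores a read-only handle to the same int in `out` and returns the new use count.
inline long shareAsConst(const std::shared_ptr<int>& p, std::shared_ptr<const int>& out)
{
    out = p;               // implicit conversion shared_ptr<int> -> shared_ptr<const int>
    return p.use_count();  // both handles now own the same int
}
```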
You are correct, it is not possible to have a vector of const int, primarily because the elements would not be assignable (a requirement on the element type of a vector).
If you want a function that only modifies the elements of a vector but not add elements to the vector itself, this is primarily what STL does for you -- have functions that are agnostic about which container a sequence of elements is contained in. The function simply takes a pair of iterators and does its thing for that sequence, completely oblivious to the fact that they are contained in a vector.
Look up "insert iterators" to learn how to insert something into a container without needing to know what kind of container it is. E.g., back_inserter takes a container, and all it cares about is that the container has a member function called "push_back".
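A small sketch of an insert iterator at work; `appendEvens` is a hypothetical helper. `std::copy_if` (C++11) writes through `std::back_inserter`, which turns each assignment into a `push_back` on the target container:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Appends the even numbers from [first, last) to any container with push_back.
template<typename InputIt, typename Container>
void appendEvens(InputIt first, InputIt last, Container& out)
{
    std::copy_if(first, last, std::back_inserter(out),
                 [](int x) { return x % 2 == 0; });
}
```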

How to std::find using a Compare object?

I am confused about the interface of std::find. Why doesn't it take a Compare object that tells it how to compare two objects?
If I could pass a Compare object I could make the following code work, where I would like to compare by value, instead of just comparing the pointer values directly:
typedef std::vector<std::string*> Vec;
Vec vec;
std::string* s1 = new std::string("foo");
std::string* s2 = new std::string("foo");
vec.push_back(s1);
Vec::const_iterator found = std::find(vec.begin(), vec.end(), s2);
// not found, obviously, because I can't tell it to compare by value
delete s1;
delete s2;
Is the following the recommended way to do it?
template<class T>
struct MyEqualsByVal {
const T& x_;
MyEqualsByVal(const T& x) : x_(x) {}
bool operator()(const T& y) const {
return *x_ == *y;
}
};
// ...
vec.push_back(s1);
Vec::const_iterator found =
std::find_if(vec.begin(), vec.end(),
MyEqualsByVal<std::string*>(s2)); // OK, will find "foo"
find can't be overloaded to take a unary predicate instead of a value, because it's an unconstrained template parameter. So if you called find(first, last, my_predicate), there would be a potential ambiguity whether you want the predicate to be evaluated on each member of the range, or whether you want to find a member of the range that's equal to the predicate itself (it could be a range of predicates, for all the designers of the standard libraries know or care, or the value_type of the iterator could be convertible both to the predicate type, and to its argument_type). Hence the need for find_if to go under a separate name.
find could have been overloaded to take an optional binary predicate, in addition to the value searched for. But capturing values in functors, as you've done, is such a standard technique that I don't think it would be a massive gain: it's certainly never necessary since you can always achieve the same result with find_if.
If you got the find you wanted, you'd still have to write a functor (or use boost), since <functional> doesn't contain anything to dereference a pointer. Your functor would be a little simpler as a binary predicate, though, or you could use a function pointer, so it'd be a modest gain. So I don't know why this isn't provided. Given the copy_if fiasco (copy_if was left out of C++98 and only added in C++11), I'm not sure there's much value in assuming there are always good reasons for algorithms that aren't available :-)
Since your T is a pointer, you may as well store a copy of the pointer in the function object.
Other than that, that is how it is done and there's not a whole lot more to it.
As an aside, it's not a good idea to store bare pointers in a container, unless you are extremely careful with ensuring exception safety, which is almost always more hassle than it's worth.
That's exactly what find_if is for - it takes a predicate that is called to compare elements.
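Since C++11, a lambda can replace the hand-written functor. A sketch for the question's vector of `std::string*`, with `findByValue` as a hypothetical name; it captures a copy of the pointer rather than a reference to it:

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Vec = std::vector<std::string*>;

// Finds the first element whose pointed-to string equals *target (compare by value).
inline Vec::const_iterator findByValue(const Vec& vec, const std::string* target)
{
    return std::find_if(vec.begin(), vec.end(),
                        [target](const std::string* s) { return *s == *target; });
}
```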

Simplest, safest way of holding a bunch of const char* in a set?

I want to hold a bunch of const char pointers into an std::set container [1]. std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.
If you don't want to wrap them in std::strings, you can define a functor class:
#include <cstring>
#include <set>

struct ConstCharStarComparator
{
    bool operator()(const char *s1, const char *s2) const
    {
        return std::strcmp(s1, s2) < 0;  // order by contents, not by address
    }
};
typedef std::set<const char *, ConstCharStarComparator> stringset_t;
stringset_t myStringSet;
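To show that this comparator compares contents rather than addresses, here is a sketch with two distinct arrays holding the same text; `countDistinctContents` is a hypothetical name:

```cpp
#include <cstring>
#include <set>

struct ConstCharStarComparator
{
    bool operator()(const char* s1, const char* s2) const
    {
        return std::strcmp(s1, s2) < 0;
    }
};

typedef std::set<const char*, ConstCharStarComparator> stringset_t;

// Two arrays with the same contents live at different addresses, but the
// strcmp-based comparator treats them as the same key.
inline std::size_t countDistinctContents()
{
    static const char first[] = "foo";
    static const char second[] = "foo";
    stringset_t s;
    s.insert(first);
    s.insert(second);  // not inserted: equivalent to the "foo" already present
    return s.size();
}
```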
Just go ahead and use the default ordering, which is std::less<const char*>. The Standard guarantees that less will work even for pointers to different objects:
"For templates greater, less, greater_equal, and less_equal, the specializations for any
pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."
The guarantee is there exactly for things like your set<const char*>.
The "optimized way"
If we ignore the "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:
#include <cstring>

struct MyCharComparator
{
    bool operator()(const char * A, const char * B) const
    {
        return (std::strcmp(A, B) < 0) ;
    }
} ;
To use with a:
std::set<const char *, MyCharComparator>
The standard way
Use a:
std::set<std::string>
It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).
Of course, if you need to extract the data, you'll have to go through std::string::c_str(). On the other hand, as it is a set, I guess you only want to know whether "AAA" is in the set, not extract the value "AAA" itself.
Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...
The "never do it" way
I noted the following comment after my answer:
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
This smells of C (use of the "static" keyword, deprecated at namespace scope in C++03 though C++11 lifted that deprecation, probable premature optimization used for std::string bashing, and string comparison through their addresses).
Anyway, you don't want to compare your strings through their address, because I guess the last thing you want is to have a set containing:
{ "AAA", "AAA", "AAA" }
Of course, if you only use the same global variables to contain the string, this is another story.
In this case, I suggest:
std::set<const char *>
Of course, it won't work if you compare strings with the same contents but different variables/addresses.
And, of course, it won't work with static const char * strings if those strings are defined in a header.
But this is another story.
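A sketch of the pitfall with the default `std::less<const char*>`: two distinct arrays that both contain "AAA" are kept as separate elements because they have different addresses. `countByAddress` is a hypothetical name:

```cpp
#include <cstddef>
#include <set>

// The default comparator orders and deduplicates by address, not by contents.
inline std::size_t countByAddress()
{
    static const char a[] = "AAA";
    static const char b[] = "AAA";  // same text, but a distinct object
    std::set<const char*> s;
    s.insert(a);
    s.insert(b);
    s.insert(a);  // same address as the first insert: ignored
    return s.size();
}
```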
Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of std::strings in the set. That way you won't have to write any extra glue code.
Must the set contain const char*?
What immediately springs to mind is storing the strings in a std::string instead, and putting those into the std::set. This will allow comparisons without a problem, and you can always get the raw const char* with a simple function call:
const char* data = theString.c_str();
Either use a comparator, or use a wrapper type to be contained in the set. (Note: std::string is a wrapper, too.)
#include <cstring>
#include <set>

struct CWrap {
    const char* p;
    bool operator<(const CWrap& other) const {
        return std::strcmp( p, other.p ) < 0;  // compare contents
    }
    CWrap( const char* p ): p(p) {}  // implicit, so a const char* converts to CWrap
};

int main() {
    const char* a("a");
    const char* b("b");
    std::set<CWrap> myset;
    myset.insert(a);
    myset.insert(b);
}
Others have already posted plenty of solutions showing how to do lexical comparisons with const char*, so I won't bother.
Please do not suggest creating std::strings - it is a waste of time and space.
If std::string is a waste of time and space, then std::set might be a waste of time and space as well. Each element in a std::set is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more than std::set's O(log n) lookups help performance. You may get better results using another data structure, such as a sorted std::vector, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.
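A sketch of the sorted-array alternative, assuming the key set is fixed and known up front. Here the array is sorted by hand (not at compile time), and `kKeys`, `cstrLess` and `containsKey` are hypothetical names; lookups are O(log n) with no per-element heap allocation:

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Hypothetical fixed key set, written in strcmp order so it can be binary-searched.
static const char* const kKeys[] = {"apple", "banana", "cantaloupe", "date"};

// Content-based ordering for C strings.
inline bool cstrLess(const char* a, const char* b)
{
    return std::strcmp(a, b) < 0;
}

inline bool containsKey(const char* key)
{
    return std::binary_search(std::begin(kKeys), std::end(kKeys), key, cstrLess);
}
```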
the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
The strings are static, so they can be compared for (in)equality based on their address.
That depends on what the pointers point to. If all of the keys are allocated from the same array, then using operator< to compare pointers is not undefined behavior.
Example of an array containing separate static strings:
static const char keys[] = "apple\0banana\0cantaloupe";
If you create a std::set<const char*> and fill it with pointers that point into that array, their ordering will be well-defined.
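A sketch that fills such a set from the `keys` array; `pointersIntoKeys` is a hypothetical name. Since every pointer points into the same array, the set's order matches the order of the strings in the array:

```cpp
#include <cstring>
#include <set>

static const char keys[] = "apple\0banana\0cantaloupe";

// Collects a pointer to the start of each string packed into `keys`.
inline std::set<const char*> pointersIntoKeys()
{
    std::set<const char*> result;
    const char* end = keys + sizeof(keys);  // one past the final '\0'
    for (const char* p = keys; p < end; p += std::strlen(p) + 1)
        result.insert(p);
    return result;
}
```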
If, however, the strings are all separate string literals, comparing their addresses with the built-in < operator gives an unspecified result. Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.
If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?
If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them
Casting the pointers to uintptr_t would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.
Presumably you don't want to use std::string because of performance reasons.
I'm running MSVC and gcc, and they both seem to not mind this:
bool foo = "blah" < "grar";
EDIT: However, the behaviour in this case is unspecified.
They also don't complain about std::set<const char*>.
If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to ints.
Edit:
Hey, I got voted down, despite this being one of the answers that most directly addresses the question. I'm new to Stack Overflow, so let me defend it here:
The question is not looking for std::string solutions. Every time you insert a std::string into the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.
Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.
The question stated that storing the pointers to the strings was fine, I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a const char* over either std::string or a custom comparison that calls strcmp. Yes, it's less safe, and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.