Do map::erase(iterator) functions require valid key contents? - c++

Consider the following:
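#include <cstdlib>   // free
#include <cstring>   // std::strcmp; strdup is POSIX, not standard C++
#include <map>
#include <utility>   // std::make_pair
struct Value {};     // stand-in for the real mapped type
struct ExternalBuffer {
    ExternalBuffer(const char *s) : s(s) {}
    const char *s;
    bool operator<(const ExternalBuffer& other) const {
        return std::strcmp(s, other.s) < 0;
    }
};
int main() {
    char *some_cstr = strdup("hello");
    std::map<ExternalBuffer, Value> m;
    m.insert(std::make_pair(ExternalBuffer(some_cstr), Value()));
    auto it = m.find(ExternalBuffer(some_cstr));
    free(some_cstr);  // the key's storage is released while the key is still in the map
    m.erase(it);
}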
In the above example, ExternalBuffer is just a wrapper object for the necessary comparison functions (e.g. operator<(), or std::hash).
Is it safe to assume that comparison operations on the key are not performed when erasing by iterator? If not, is anyone aware of implementations that do perform these operations?
The 'safe' thing in my example is of course to free the string after erasing the iterator, but I'm just wondering if that's required.

Keys in a map are const objects, and they need to stay valid as long as they are in the map. Any change to their state that can change the behavior of the comparison function results in Undefined Behavior.
Because free(some_cstr); will break this comparison (comparing that key would invoke Undefined Behavior by dereferencing a freed pointer), your map is broken.
In practical terms, however, it is unlikely that the comparison will need to be run when erasing from a map using an iterator. It is possible that some debug or validation code could complain (if, for example, it validates the state of that portion of the map before erasing the element).
But is this guaranteed to work by the standard? No.
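Since keys must stay valid for as long as they are in the map, the ordering that is guaranteed to work is: erase first, then free. A minimal sketch, assuming an int mapped value in place of the question's Value, and using a hypothetical copy_cstr helper instead of POSIX strdup so the code stays within standard C++:

```cpp
#include <cstdlib>
#include <cstring>
#include <map>

struct ExternalBuffer {
    explicit ExternalBuffer(const char* s) : s(s) {}
    const char* s;
    bool operator<(const ExternalBuffer& other) const {
        return std::strcmp(s, other.s) < 0;
    }
};

// The mapped type is assumed to be int purely for illustration.
using BufferMap = std::map<ExternalBuffer, int>;

// Hypothetical helper: copies a C string into malloc'd storage
// (a portable stand-in for POSIX strdup). Returns nullptr on failure.
char* copy_cstr(const char* src) {
    char* dst = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (dst) std::strcpy(dst, src);
    return dst;
}

// Erases the entry at `it`, then releases `buffer`, the storage that
// entry's key points to.
void erase_then_free(BufferMap& m, BufferMap::iterator it, char* buffer) {
    m.erase(it);        // the key still points to live memory during the erase
    std::free(buffer);  // nothing in the map refers to `buffer` any more
}
```

Freeing only after erase() returns means no implementation detail of erase (debug checks included) can ever see a dangling key.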

Related

Constant time `contains` for `std::vector`? [duplicate]

I am working with some code that checks if std::vector contains a given element in constant time by comparing its address to those describing the extent of the vector's data. However I suspect that, although it works, it relies on undefined behaviour. If the element is not contained by the vector then the pointer comparisons are not permitted.
#include <vector>
template <typename T>
bool contains(const std::vector<T>& v, const T& a) {
    return (v.data() <= &a) && (&a < v.data() + v.size());
}
Am I right in believing it is undefined behaviour? If so, is there any way to do the same thing without drastically changing the time complexity of the code?
You can use std::less.
A specialization of std::less for any pointer type yields the implementation-defined strict total order, even if the built-in < operator does not.
Update:
The standard doesn't guarantee that this will actually work for contains though. If you have say two vectors a and b, the total order is permitted to be &a[0], &b[0], &a[1], &b[1], &a[2], &b[2], ..., i.e., with the elements interleaved.
As pointed out in the comments, the standard only guarantees that std::less yields the implementation-defined strict total order, which is consistent with the partial order imposed by the built-in operators. However, the standard doesn't guarantee the order of pointers pointing to different objects or arrays. Related: https://devblogs.microsoft.com/oldnewthing/20170927-00/?p=97095
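A sketch of the question's function rewritten with std::less, renamed contains_address here (a hypothetical name) to stress that it tests addresses rather than values. Every comparison is now well defined; as the update says, the standard still does not promise that an unrelated object compares outside the range, although mainstream flat-memory implementations order pointers by address:

```cpp
#include <functional>
#include <memory>
#include <vector>

// Address-range membership test. std::less<const T*> imposes a strict total
// order on pointers, so comparing unrelated pointers is not undefined.
template <typename T>
bool contains_address(const std::vector<T>& v, const T& a) {
    std::less<const T*> before;
    const T* p = std::addressof(a);  // immune to an overloaded operator&
    const T* first = v.data();
    const T* last = v.data() + v.size();  // one past the last element
    return !before(p, first) && before(p, last);
}
```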
One interesting thing is that there's a similar usage in Herb Sutter's gcpp library. There's a comment saying that it is portable, though the library is experimental.
// Return whether p points into this page's storage and is allocated.
//
inline
bool gpage::contains(gsl::not_null<const byte*> p) const noexcept {
    // Use std::less<> to compare (possibly unrelated) pointers portably
    auto const cmp = std::less<>{};
    auto const ext = extent();
    return !cmp(p, ext.data()) && cmp(p, ext.data() + ext.size());
}
Yes, the comparisons as written are not permitted if the reference doesn't reference something that is already an element of the vector.
You can make the behavior defined by casting all pointers to uintptr_t and comparing those. This will work on all architectures with a flat (non-segmented) address space (i.e. possibly not old 16-bit x86), although I don't know whether the specific semantics are guaranteed.
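A sketch of the uintptr_t variant; contains_address_uintptr is a hypothetical name. The pointer-to-integer mapping is implementation-defined, so the result is only meaningful on a flat address space, exactly as caveated above:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Compares addresses as integers. The conversion itself is defined, but the
// numeric values (and hence the ordering) are implementation-defined.
template <typename T>
bool contains_address_uintptr(const std::vector<T>& v, const T& a) {
    const auto p = reinterpret_cast<std::uintptr_t>(std::addressof(a));
    const auto first = reinterpret_cast<std::uintptr_t>(v.data());
    const auto last = reinterpret_cast<std::uintptr_t>(v.data() + v.size());
    return first <= p && p < last;
}
```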
As a side note, I would always interpret the name contains to be about the value, and thus be very surprised if the semantics are anything other than std::find(v.begin(), v.end(), a) != v.end(). Consider using a more expressive name.
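For the value-based meaning suggested here, a linear-time sketch; contains_value is a hypothetical name chosen to make the semantics explicit:

```cpp
#include <algorithm>
#include <vector>

// True if any element compares equal to `a` (uses T's operator==).
template <typename T>
bool contains_value(const std::vector<T>& v, const T& a) {
    return std::find(v.begin(), v.end(), a) != v.end();
}
```

This is O(n) rather than constant time, but it answers the question most readers expect a function called contains to answer.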

Modification of elements of std::set - Defined behavior?

Given the following code:
#include <set>
struct X {
    int a, b;
    friend bool operator<(X const& lhs, X const& rhs) {
        return lhs.a < rhs.a;
    }
};
int main() {
    std::set<X> xs;
    // some insertion...
    auto it = xs.find({0, 0}); // assume it != xs.end()
    const_cast<X&>(*it).b = 4; // (1)
}
Is (1) defined behavior? I.e., am I allowed to const_cast a reference to an element obtained from a const_iterator of a std::set and modify it, if the modification does not alter the ordering?
I have read some posts here and there proposing this kind of const_cast, but I was wondering if this was actually defined behavior.
Whether this has defined behaviour is unclear, but I believe it does.
There appears to be no specific prohibition in the description of std::set on modifying the values, other than the restriction you already hinted at that the comparer must return the same result when passed the same inputs ([associative.reqmts]p3).
However, the general rule on modifying objects defined as const does apply. Whether set defines its elements as const is not spelt out. If it does, then modifying the elements is not allowed ([dcl.type.cv]p4).
But [container.requirements.general]p3 reads:
For the components affected by this subclause that declare an allocator_type, objects stored in these components shall be constructed using the allocator_traits<allocator_type>::construct function and
destroyed using the allocator_traits<allocator_type>::destroy function (20.7.8.2).
std::set<T> declares an allocator_type, which defaults to std::allocator<T>. std::allocator_traits<allocator_type>::construct passes it a T *, causing a T to be constructed, not a const T.
I believe this means std::set<T> is not permitted to define its elements as const, and for lack of any other prohibition, means modifying the elements through const_cast is allowed.
Personally, if I couldn't find any better alternative, I would consider avoiding the issue by putting the whole thing in a wrapper struct, which defines a member mutable X x;. This would allow modifications without const_cast, so long as you take care to avoid changing the relative order of elements already in the set. It would additionally serve as documentation to other readers of your code that the elements of your set can and will be modified.
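A sketch of that wrapper approach, reusing the X from the question; Entry and set_b are hypothetical names. The ordering only reads x.a, so writing x.b through the mutable member leaves the set's ordering intact, and no const_cast is involved:

```cpp
#include <set>

struct X {
    int a, b;
    friend bool operator<(X const& lhs, X const& rhs) { return lhs.a < rhs.a; }
};

// Wrapper whose member is mutable: reachable for writing even through the
// const references a std::set hands out.
struct Entry {
    mutable X x;
    friend bool operator<(Entry const& lhs, Entry const& rhs) { return lhs.x < rhs.x; }
};

// Sets the b field of the entry keyed by `a`; returns false if there is none.
bool set_b(std::set<Entry>& s, int a, int new_b) {
    auto it = s.find(Entry{X{a, 0}});
    if (it == s.end()) return false;
    it->x.b = new_b;  // b is not part of the ordering, so the set stays valid
    return true;
}
```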

Curious behaviour of std::string::operator[] in MSVC

I've been using some semi-iterators to tokenize a std::string, and I've run into a curious problem with operator[]. When constructing a new string from a position using char*, I've used something like the following:
t.begin = i;
t.end = i + 1;
t.contents = std::string(&arg.second[t.begin], &arg.second[t.end]);
where arg.second is a std::string. But if i is the position of the last character, then arg.second[t.end] triggers a debug assertion, even though taking a pointer one past the end is well-defined behaviour and even common for primitive arrays, and since the constructor is being called with iterators, I know that the end iterator will never be dereferenced. Doesn't it seem logical that arg.second[arg.second.size()] should be a valid expression, producing the equivalent of arg.second.end() as a char*?
You're not taking a pointer to one past the end; you're ACCESSING one past the end and then taking the address of that. Those are entirely different: the former is well defined and well formed, the latter is neither. I suggest using the iterator constructor, which is basically what you ARE using, but with iterators instead of char*. See Alexandre's comment.
operator[](size_type pos) const doesn't return a reference to one-past-the-end if pos == size(); it returns charT(), which is a temporary. In the non-const version of operator[], the behavior is undefined.
21.3.4/1
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1 Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const
version returns charT(). Otherwise, the behavior is undefined.
What is well-defined is creating an iterator one past the end. (Pointers might be iterators, too.) However, dereferencing such an iterator will yield Undefined Behavior.
Now, what you're doing is array subscription, and that is very different from forming iterators, because it returns a reference to the referred-to object (much akin to dereferencing an iterator). You are certainly not allowed to access an array one-past-the-end.
std::string is not an array. It is an object whose interface loosely resembles an array (namely, it provides operator[]). But that's where the similarity ends.
Even if we assume for a second that std::string is just a wrapper built on top of an ordinary array, then in order to obtain the one-past-the-end pointer for the stored sequence you have to do something like &arg.second[0] + t.end, i.e. instead of going through the std::string interface, first move into the domain of ordinary pointers and use ordinary low-level pointer arithmetic.
However, even that assumption is not correct, and doing something like &arg.second[0] + t.end is a recipe for disaster. std::string is not guaranteed to store its controlled sequence as an array. It is not guaranteed to be stored contiguously, meaning that regardless of where your pointers point, you cannot assume that you'll be able to iterate from one to another by using pointer arithmetic.
If you want to use an std::string in some legacy pointer-based interface, the only choice you have is to go through the std::string::c_str() method, which may generate a non-permanent array-based copy of the controlled sequence.
(This describes C++03. Since C++11, std::string is required to store its characters contiguously, and c_str()/data() return a pointer into that storage rather than a copy.)
P.S. Note, BTW, that in the original C and C++ specifications it is illegal to use the &a[N] method to obtain the one-past-the-end pointer even for an ordinary built-in array. You always have to make sure that you are not using the [] operator with past-the-end index. The legal way to obtain the pointer has always been something like a + N or &a[0] + N, but not &a[N]. Recent changes legalized the &a[N] approach as well, but nevertheless originally it was not legal.
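A small illustration of the a + N form for a built-in array; sum_array is a hypothetical helper. The end pointer is formed by arithmetic and only compared against, never dereferenced:

```cpp
#include <cstddef>

// Walks from the first element to the one-past-the-end pointer a + N.
// Only pointers strictly before a + N are dereferenced.
template <std::size_t N>
int sum_array(const int (&a)[N]) {
    int total = 0;
    for (const int* p = a; p != a + N; ++p) {
        total += *p;
    }
    return total;
}
```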
A string is not a primitive array, so I'd say the implementation is free to add some debug diagnostics if you are doing something dangerous like accessing elements outside its range. I would guess that a release build will probably work.
But...
For what you are trying to do, why not just use the basic_string( const basic_string& str, size_type index, size_type length ); constructor to create the sub strings?
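A sketch of both safe alternatives for the question's token, covering the (str, pos, count) constructor suggested here and the iterator-pair constructor mentioned earlier; token_by_position and token_by_iterators are hypothetical names, and [begin, end) mirrors the question's t.begin/t.end:

```cpp
#include <string>

// Copies [begin, end) of `src` with the basic_string(str, pos, count)
// constructor; src[end] is never accessed.
std::string token_by_position(const std::string& src,
                              std::string::size_type begin,
                              std::string::size_type end) {
    return std::string(src, begin, end - begin);
}

// Same token via the iterator-pair constructor. When end == src.size(), the
// second iterator equals src.end(), a valid one-past-the-end iterator that
// the constructor never dereferences.
std::string token_by_iterators(const std::string& src,
                               std::string::size_type begin,
                               std::string::size_type end) {
    using diff = std::string::difference_type;
    return std::string(src.begin() + static_cast<diff>(begin),
                       src.begin() + static_cast<diff>(end));
}
```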

Any ideas why QHash and QMap return const T instead of const T&?

Unlike std::map and std::hash_map, the corresponding Qt containers do not bother to return a reference. Isn't that quite inefficient if I build a hash of a rather bulky class?
EDIT
Especially since there is a separate method value(), which could then return it by value.
const subscript operators of STL containers can return a reference-to-const because they flat out deny calls to it with indexes that do not exist in the container. Behaviour in this case is undefined. Consequently, as a wise design choice, std::map doesn't even provide a const subscript operator overload.
QMap tries to be a bit more accommodating: it provides a const subscript operator overload as syntactic sugar, runs into the problem with non-existing keys, again tries to be more accommodating, and returns a default-constructed value instead.
If you wanted to keep STL's return-by-const-reference convention, you'd need to allocate a static value and return a reference to that. That, however, would be quite at odds with the reentrancy guarantees that QMap provides, so the only option is to return by value. The const there is just sugar coating to prevent some stupid mistakes like constmap["foo"]++ from compiling.
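A std::map analogue of the const-lookup semantics described here (this is not Qt code; value_or_default is a hypothetical name): a missing key yields a fresh default-constructed T returned by value, so no shared static object and no insertion is needed:

```cpp
#include <map>

// Returns a copy of the mapped value, or T() if `key` is absent.
// The map is never modified, so this works on a const map.
template <typename K, typename T>
T value_or_default(const std::map<K, T>& m, const K& key) {
    auto it = m.find(key);
    return it != m.end() ? it->second : T();
}
```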
That said, returning by reference is not always the most efficient way. If you return a fundamental type, or, with more aggressive optimisation, when sizeof(T)<=sizeof(void*), return-by-value often makes the compiler return the result in a register directly instead of indirectly (address to result in register) or, heaven forbid, on the stack.
The other reason (besides premature pessimisation) to prefer pass-by-const-reference, slicing, doesn't apply here, since both std::map and QMap are value-based, and therefore homogeneous. For a heterogeneous container, you'd need to hold pointers, and pointers are fundamental types (except smart ones, of course).
That all said, I almost never use the const subscript operator in Qt. Yes, it has nicer syntax than find()+*it, but invariably, you'll end up with count()/contains() calls right in front of the const subscript operator, which means you're doing the binary search twice. And then you won't notice the minuscule differences in return value performance anyway :)
For value() const, though, I agree that it should return reference-to-const, defaulting to the reference-to-default-value being passed in as second argument, but I guess the Qt developers felt that was too much magic.
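A std::map sketch of that idea (not Qt's API; value_or is a hypothetical name): the mapped value or the caller's fallback is returned as a reference-to-const. The "magic" is the lifetime rule: the caller must keep the fallback alive, since passing a temporary would leave the returned reference dangling:

```cpp
#include <map>

// Returns a reference to the stored value, or to `fallback` if `key` is
// absent. The caller must keep `fallback` alive while the reference is used.
template <typename K, typename T>
const T& value_or(const std::map<K, T>& m, const K& key, const T& fallback) {
    auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}
```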
The documentation for QMap and QHash specifically say to avoid operator[] for lookup due to the reason Martin B stated.
If you want a const reference, use const_iterator find ( const Key & key ) const where you can then use any of:
const Key & key () const
const T & value () const
const T & operator* () const
const T * operator-> () const
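The same single-lookup idea, sketched with std::map rather than Qt (find_ptr is a hypothetical name): one find() replaces a contains() check followed by operator[], and the result points at the stored value, so a bulky T is never copied:

```cpp
#include <map>

// One O(log n) search. Returns a pointer to the stored value, or nullptr if
// `key` is absent.
template <typename K, typename T>
const T* find_ptr(const std::map<K, T>& m, const K& key) {
    auto it = m.find(key);
    return it != m.end() ? &it->second : nullptr;
}
```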
Actually, some of the methods do return a reference... for example, the non-const version of operator[] returns a T &.
However, the const version of operator[] returns a const T. Why? As "unwind" has already noted, the reason has to do with what happens when the key does not exist in the map. In the non-const operator[], we can add the key to the map, then return a reference to the newly added entry. However, the const operator[] can't do this because it can't modify the map. So what should it return a reference to? The solution is to make the const operator[] return const T, and then return a default-constructed T in the case where the key is not present in the map.
Weird, yes.
Perhaps this is because of the desired semantics, where doing e.g. value() on an unspecified key, returns a default-constructed value of the proper type. That's not possible using references, at least not as cleanly.
Also, things like named return value optimization can lessen the performance impact of this design.

Simplest, safest way of holding a bunch of const char* in a set?

I want to hold a bunch of const char pointers in a std::set container [1]. The std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.
If you don't want to wrap them in std::strings, you can define a functor class:
#include <cstring>
#include <set>
struct ConstCharStarComparator
{
    bool operator()(const char *s1, const char *s2) const
    {
        return std::strcmp(s1, s2) < 0;
    }
};
typedef std::set<const char *, ConstCharStarComparator> stringset_t;
stringset_t myStringSet;
Just go ahead and use the default ordering which is less<>. The Standard guarantees that less will work even for pointers to different objects:
"For templates greater, less, greater_equal, and less_equal, the specializations for any
pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."
The guarantee is there exactly for things like your set<const char*>.
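To make the difference between the two orderings concrete, a sketch contrasting the default std::less<const char*> (address order, guaranteed total) with a strcmp comparator (content order); CStrLess, AddressSet and ContentSet are hypothetical names. Two distinct arrays holding the same text are two keys in the first set and one key in the second:

```cpp
#include <cstring>
#include <set>

// Orders C strings by their characters.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Default comparator std::less<const char*>: keys are compared by address.
using AddressSet = std::set<const char*>;
// strcmp comparator: keys are compared by contents.
using ContentSet = std::set<const char*, CStrLess>;
```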
The "optimized way"
If we ignore the "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:
#include <cstring>
struct MyCharComparator
{
    bool operator()(const char * A, const char * B) const
    {
        return (std::strcmp(A, B) < 0);
    }
};
To use with a:
std::set<const char *, MyCharComparator>
The standard way
Use a:
std::set<std::string>
It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).
Of course, if you need to extract the data, you'll have to go through std::string::c_str(). On the other hand, since it is a set, I guess you only want to know whether "AAA" is in the set, not to extract the value "AAA".
Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...
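A sketch of the std::set<std::string> route fed from C strings; make_string_set is a hypothetical helper. Each insertion copies the characters, lookups compare by contents, and c_str() gives the raw pointer back:

```cpp
#include <initializer_list>
#include <set>
#include <string>

// Builds a content-compared set from C strings; duplicates (by text,
// regardless of address) collapse into one element.
std::set<std::string> make_string_set(std::initializer_list<const char*> items) {
    return std::set<std::string>(items.begin(), items.end());
}
```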
The "never do it" way
I noted the following comment after my answer:
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
This smells of C (use of the deprecated "static" keyword, probable premature optimization used for std::string bashing, and string comparison through their addresses).
Anyway, you don't want to compare your strings by their addresses, because I guess the last thing you want is a set containing:
{ "AAA", "AAA", "AAA" }
Of course, if you only use the same global variables to contain the string, this is another story.
In this case, I suggest:
std::set<const char *>
Of course, it won't work if you compare strings with the same contents but different variables/addresses.
And, of course, it won't work with static const char * strings if those strings are defined in a header.
But this is another story.
Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of std::strings in the set. That way you won't have to write any extra glue code.
Must the set contain const char*?
What immediately springs to mind is storing the strings in a std::string instead, and putting those into the std::set. This will allow comparisons without a problem, and you can always get the raw const char* with a simple function call:
const char* data = theString.c_str();
Either use a comparator, or use a wrapper type to be contained in the set. (Note: std::string is a wrapper, too....)
#include <cstring>
#include <set>
struct CWrap {
    const char* p;
    bool operator<(const CWrap& other) const {
        return std::strcmp(p, other.p) < 0;
    }
    CWrap(const char* p) : p(p) {}
};
int main() {
    const char* a("a");
    const char* b("b");
    std::set<CWrap> myset;
    myset.insert(a);  // implicit conversion const char* -> CWrap
    myset.insert(b);
}
Others have already posted plenty of solutions showing how to do lexical comparisons with const char*, so I won't bother.
Please do not suggest creating std::strings - it is a waste of time and space.
If std::string is a waste of time and space, then std::set might be a waste of time and space as well. Each element in a std::set is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more than std::set's O(log n) lookups help performance. You may get better results using another data structure, such as a sorted std::vector, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.
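A sketch of the sorted std::vector alternative, assuming the strings outlive the container (as the question states); SortedCStrings and CStrLess are hypothetical names. All pointers live in one contiguous allocation and lookups are binary searches by contents:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Sorted, de-duplicated array of C-string pointers: no per-element
// allocation, O(log n) lookup via std::binary_search.
class SortedCStrings {
public:
    explicit SortedCStrings(std::vector<const char*> items) : items_(std::move(items)) {
        std::sort(items_.begin(), items_.end(), CStrLess{});
        auto same = [](const char* a, const char* b) { return std::strcmp(a, b) == 0; };
        items_.erase(std::unique(items_.begin(), items_.end(), same), items_.end());
    }
    bool contains(const char* s) const {
        return std::binary_search(items_.begin(), items_.end(), s, CStrLess{});
    }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<const char*> items_;
};
```

If the set is built once and then only queried, this trades std::set's cheap insertion for better locality and fewer allocations.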
the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
The strings are static, so they can be compared for (in)equality based on their address.
That depends on what the pointers point to. If all of the keys are allocated from the same array, then using operator< to compare pointers is not undefined behavior.
Example of an array containing separate static strings:
static const char keys[] = "apple\0banana\0cantaloupe";
If you create a std::set<const char*> and fill it with pointers that point into that array, their ordering will be well-defined.
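A sketch of that situation using the same keys array; pointers_into_keys is a hypothetical helper. Because every pointer points into one array object, the default ordering is by position in the array, which here is also alphabetical:

```cpp
#include <cstring>
#include <set>

// All keys live in one array, so comparing pointers into it is well defined.
static const char keys[] = "apple\0banana\0cantaloupe";

// Collects a pointer to the start of each NUL-separated string in `keys`.
std::set<const char*> pointers_into_keys() {
    std::set<const char*> result;
    const char* const end = keys + sizeof keys;  // one past the final '\0'
    for (const char* p = keys; p != end; p += std::strlen(p) + 1) {
        result.insert(p);
    }
    return result;
}
```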
If, however, the strings are all separate string literals, comparing their addresses will most likely involve undefined behavior. Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.
If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?
If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them
Casting the pointers to uintptr_t would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.
Presumably you don't want to use std::string because of performance reasons.
I'm running MSVC and gcc, and they both seem to not mind this:
bool foo = "blah" < "grar";
EDIT: However, the behaviour in this case is unspecified. See comments...
They also don't complain about std::set<const char*>.
If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to ints.
Edit:
Hey, I got voted down... Despite being one of the few people here that most directly answered his question. I'm new to Stack Overflow, is there any way to defend yourself if this happens? That being said, I'll try to right here:
The question is not looking for std::string solutions. Every time you enter an std::string into the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.
Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.
The question stated that storing the pointers to the strings was fine, I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a const char* over either std::string or a custom comparison that calls strcmp. Yes, it's less safe, and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.