Issue
In C++17, associative containers in standard library will have insert_or_assign member function, which will do what its name suggests. Unfortunately, it seems like it doesn't have iterator based interface for bulk insert/assign. I even tried to compile small example, and from the compiler error, compiler couldn't find suitable overload, and neither candidate was reasonably close to iterator based interface.
Question
Why C++17 didn't include iterator based insert_or_assign for operations in bulk? Was there any technical reasons? Design issues?
My assumptions and ideas
I don't see any technical reason to not add iterator based bulk insertion/addition. It seems quite doable. It needs to lookup the key anyway, so I don't see any violations of "Don't pay for what you don't use".
Actually not having the overload makes standard library less consistent, and kind of going against itself. Plain insert supports it, so I'd expect insert_or_assign to support that too. I don't think that not having the overload will make it "Easier to use correctly, and harder to use incorrectly".
The only clue left is this notice from cppreference:
insert_or_assign returns more information than operator[] and does not require default-constructibility of the mapped type.
I'm not sure why this might be a limitation, as the associative container has access to all of the internals and doesn't need to deal with operator[] at all.
Binary compatibility is not applicable here, if I didn't forget anything. Modification will be in a header, and everything will need to recompile anyway.
I couldn't find any associated paper either. Splicing maps and sets doesn't seem to mention it. The feature looks like a phantom.
Standard could at least include insert_assign_iterator, so that one could write std::copy(input_first, input_last, insert_assign_iterator{map});, but standard includes neither.
insert_or_emplace is intended to be a better form of doing some_map[key] = value;. The latter requires that the mapped_type is default constructible, while insert_or_emplace does not.
What you're talking about is similar, but different. You have some range of key-value pairs, and you want to stick all of those values into the map, whether they have equivalent keys or not. This was not the problem that these functions were created to solve, as evidenced by the original proposal for them.
It's not that they're necessarily a bad idea. It simply wasn't the problem the function was added to solve.
Related
Why is std::range::sort (and other range-based algorithms) implemented in the range namespace? Why isn't it defined as an overload of std::sort taking a range?
It's to avoid disrupting existing code bases. Eric Niebler, Sean Parent and Andrew Sutton discussed different approaches in their design paper D4128.
3.3.6 Algorithm Return Types are Changed to Accommodate Sentinels
... In similar fashion, most algorithm get new return types when they
are generalized to support sentinels. This is a source-breaking change
in many cases. In some cases, like for_each, the change is unlikely to
be very disruptive. In other cases it may be more so. Merely accepting
the breakage is clearly not acceptable. We can imagine three ways to
mitigate the problem:
Only change the return type when the types of the iterator and the sentinel differ. This leads to a slightly more complicated interface
that may confuse users. It also greatly complicates generic code,
which would need metaprogramming logic just to use the result of
calling some algorithms. For this reason, this possibility is not
explored here.
Make the new return type of the algorithms implicitly convertible to the old return type. Consider copy, which currently returns the
ending position of the output iterator. When changed to accommodate
sentinels, the return type would be changed to something like pair<I, O>; that is, a pair of the input and output iterators. Instead of
returning a pair, we could return a kind of pair that is implicitly
convertible to its second argument. This avoids breakage in some, but
not all, scenarios. This subterfuge is unlikely to go completely
unnoticed.
Deliver the new standard library in a separate namespace that users must opt into. In that case, no code is broken until the user
explicitly ports their code. The user would have to accommodate the
changed return types then. An automated upgrade tool similar to clang
modernize can greatly help here.
We, the authors, prefer (3).
Ultimately, it was to be the least disruptive to existing code bases that move onto building using C++20 enabled compilers. It's the approach they themselves preferred, and seems like the rest is history.
I have a codebase which makes extensive use of CUDA, which unfortunately only supports C++14 to date. However, I still want to use string_view, which is a C++17 feature. The implementation is relatively straight-forward, especially since I do not require the 'find' functionalities.
However, I do need hashing to work. The standard mandates that std::hash of a string_view must be the equal to the hash of a string constructed from the string_view (and I intend to rely on this guarantee). Is there a standard-conform way of getting the output from std::hash without having to temporarily construct a string object, which may come with an unoptimizable heap allocation (which is the route that string-view-lite went)? I would rather not rely on copying over the algorithm from the concrete stdlib implementation, since that might break in the future or already breaks compilation with older versions.
Alternatively, is there a way to let MSVC (EDIT: v14.16) use std::string_view in C++14-mode, which NVCC also recognizes? It would be great if Clang and GCC had a similar option as well, since the codebase might migrate away from MSVC one day.
You are out of luck in my opinion since, also by assuming that you could mimic the internal structure and pass a tailored object to fulfill and look like a std::string, that would surely be more fragile than just copying the implementation.
I see two choices, either:
you copy the std::hash<std::string> specialization implementation and put some asserts with hardcoded cases which may warn you if anything changes or is different (rather a clumsy solution).
you provide your own hash function which overrides std::string one and pass it as template argument so that you can enforce the constraint of them being equal when working with STL collections.
I've been reading a bit about C++20's consistent comparison (i.e. operator<=>) but couldn't understand what's the practical difference between std::strong_ordering and std::weak_ordering (same goes for the _equality version for this manner).
Other than being very descriptive about the substitutability of the type, does it actually affect the generated code? Does it add any constraints for how one could use the type?
Would love to see a real-life example that demonstrates this.
Does it add any constraints for how one could use the type?
One very significant constraint (which wasn't intended by the original paper) was the adoption of the significance of strong_ordering by P0732 as an indicator that a class type can be used as a non-type template parameter. weak_ordering isn't sufficient for this case due to how template equivalence has to work. This is no longer the case, as non-type template parameters no longer work this way (see P1907R0 for explanation of issues and P1907R1 for wording of the new rules).
Generally, it's possible that some algorithms simply require weak_ordering but other algorithms require strong_ordering, so being able to annotate that on the type might mean a compile error (insufficiently strong ordering provided) instead of simply failing to meet the algorithm's requirements at runtime and hence just being undefined behavior. But all the algorithms in the standard library and the Ranges TS that I know of simply require weak_ordering. I do not know of one that requires strong_ordering off the top of my head.
Does it actually affect the generated code?
Outside of the cases where strong_ordering is required, or an algorithm explicitly chooses different behavior based on the comparison category, no.
There really isn't any reason to have std::weak_ordering. It's true that the standard describes operations like sorting in terms of a "strict" weak order, but there's an isomorphism between strict weak orderings and a totally ordered partition of the original set into incomparability equivalence classes. It's rare to encounter generic code that is interested both in the order structure (which considers each equivalence class to be one "value") and in some possibly finer notion of equivalence: note that when the standard library uses < (or <=>) it does not use == (which might be finer).
The usual example for std::weak_ordering is a case-insensitive string, since for instance printing two strings that differ only by case certainly produces different behavior despite their equivalence (under any operator). However, lots of types can have different behavior despite being ==: two std::vector<int> objects, for instance, might have the same contents and different capacities, so that appending to them might invalidate iterators differently.
The simple fact is that the "equality" implied by std::strong_ordering::equivalent but not by std::weak_ordering::equivalent is irrelevant to the very code that stands to benefit from it, because generic code doesn't depend on the implied behavioral changes, and non-generic code doesn't need to distinguish between the ordering types because it knows the rules for the type on which it operates.
The standard attempts to give the distinction meaning by talking about "substitutability", but that is inevitably circular because it can sensibly refer only to the very state examined by the comparisons. This was discussed prior to publishing C++20, but (perhaps for the obvious reasons) not much of the planned further discussion has taken place.
From what I understand, standard layout allows three things:
Empty base class optimization
Backwards compatibility with C with certain pointer casts
Use of offsetof
Now, included in the library is the is_standard_layout predicate metafunction, but I can't see much use for it in generic code as those C features I listed above seem extremely rare to need checking in generic code. The only thing I can think of is using it inside static_assert, but that is only to make code more robust and isn't required.
How is is_standard_layout useful? Are there any things which would be impossible without it, thus requiring it in the standard library?
General response
It is a way of validating assumptions. You wouldn't want to write code that assumes standard layout if that wasn't the case.
C++11 provides a bunch of utilities like this. They are particularly valuable for writing generic code (templates) where you would otherwise have to trust the client code to not make any mistakes.
Notes specific to is_standard_layout
It looks to me like the (pseudo code) definition of is_pod would roughly be...
// note: applied recursively to all members
bool is_pod(T) { return is_standard_layout(T) && is_trivial(T); }
So, you need to know is_standard_layout in order to implement is_pod. Given that, we might as well expose is_standard_layout as a tool available to library developers. Also of note: if you have a use-case for is_pod, you might want to consider the possibility that is_standard_layout might actually be a better (more accurate) choice in that case, since POD is essentially a subset of standard layout.
I get the feeling that they added every conceivable variant of type evaluation, regardless of any obvious value, just in case someone might encounter a need sometime before the next standard comes out. I doubt if piling on these "extra" type properties adds a significant additional burden to compiler developers.
There is a nice discussion of standard layout here: Why is C++11's POD "standard layout" definition the way it is?
There is also a lot of good detail at cppreference.com: Non-static data members
std::set_union and its kin take two pairs of iterators for the sets to be operated on. That's great in that it's the most flexible thing to do. However they very easily could have made an additional convenience functions which would be more elegant for 80% of typical uses.
For instance:
template<typename ContainerType, typename OutputIterator>
OutputIterator set_union( const ContainerType & container1,
const ContainerType & container2,
OutputIterator & result )
{
return std::set_union( container1.begin(), container1.end(),
container2.begin(), container2.end(),
result );
}
would turn:
std::set_union( mathStudents.begin(), mathStudents.end(),
physicsStudents.begin(), physicsStudents.end(),
students.begin() );
into:
std::set_union( mathStudents, physicsStudents, students.begin() );
So:
Are there convenience functions like this hiding somewhere that I just haven't found?
If not, can anyone thing of a reason why it would be left out of STL?
Is there perhaps a more full featured set library in boost? (I can't find one)
I can of course always put my implementations in a utility library somewhere, but it's hard to keep such things organized so that they're used across all projects, but not conglomerated improperly.
Are there convenience functions like this hiding somewhere that I just haven't found?
Not in the standard library.
If not, can anyone thing of a reason why it would be left out of STL?
The general idea with algorithms is that they work with iterators, not containers. Containers can be modified, altered, and poked at; iterators cannot. Therefore, you know that, after executing an algorithm, it has not altered the container itself, only potentially the container's contents.
Is there perhaps a more full featured set library in boost?
Boost.Range does this. Granted, Boost.Range does more than this. It's algorithms don't take "containers"; they take iterator ranges, which STL containers happen to satisfy the conditions for. They also have lazy evaluation, which can be nice for performance.
One reason for working with iterators is of course that it is more general and works on ranges that are not containers, or just a part of a container.
Another reason is that the signatures would be mixed up. Many algorithms, like std::sort have more than one signature already:
sort(Begin, End);
sort(Begin, End, Compare);
Where the second one is for using a custom Compare when sorting on other than standard less-than.
If we now add a set of sort for containers, we get these new functions
sort(Container);
sort(Container, Compare);
Now we have the two signatures sort(Begin, End) and sort(Container, Compare) which both take two template parameters, and the compiler will have problems resolving the call.
If we change the name of one of the functions to resolve this (sort_range, sort_container?) it will not be as convenient anymore.
I agree, STL should take containers instead of iterators-pairs for the following reasons;
Simpler code
Algorithms could be overloaded for specified containers, ie, could use the map::find algorithm instead of std::find -> More general code
A subrange could easily be wrapped into a container, as is done in boost::range
#Bo Persson has pointed to a problem with ambiguity, and I think that's quite valid.
I think there's a historical reason that probably prevented that from ever really even being considered though.
The STL was introduced into C++ relatively late in the standardization process. Shortly after it was accepted, the committee voted against even considering any more new features for addition into C++98 (maybe even at the same meeting). By the time most people had wrapped their head around the existing STL to the point of recognizing how much convenience you could get from something like ranges instead of individual iterators, it was too late to even be considered.
Even if the committee was still considering new features, and somebody had written a proposals to allow passing containers instead of discrete iterators, and had dealt acceptably with the potential for ambiguity, I suspect the proposal would have been rejected. Many (especially the C-oriented people) saw the STL as a huge addition to the standard library anyway. I'm reasonably certain quite a few people would have considered it completely unacceptable to add (lots) more functions/overloads/specializations just to allowing passing one parameter in place of two.
Using the begin & end elements for iteration allows one to use non-container types as inputs. For example:
ContainerType students[10];
vector<ContainerType> physicsStudents;
std::set_union(physicsStudents.begin(), physicsStudents.end(),
&students[0], &students[10],
physicsStudents.begin());
Since they are such simple implementations, I think it makes sense not to add them to the std library and allow authors to add their own. Especially given that they are templates, thus potentially increasing the lib size of the code and adding convenience functions across std would lead to code bloat.