Comparators in std::map, std::set and std::priority_queue - c++

All three of these containers accept a function object by const reference, as opposed to either by value or by forwarding reference. As a result, the function object has to be copied into the container's internal storage (at most twice).
Is there a reason for copying a function object twice, as opposed to letting the user pass any type of function object and having that be constructed directly into the internal functor storage? That way the library would be more general and there would be fewer surprises for the user.
The same philosophy is applied in the push_back() functions: they have two overloads, one taking a const reference and one taking an rvalue reference, because this gives the user more control over whether the value is moved or copied. The library remains efficient in the general case without making any assumptions about the use case.
I suspect this is a design decision that has been carried over since the pre-C++11 days. Would changing this be a decent proposal for the standard?

Typically, the comparator is a pretty small object that is cheap to copy, and you're only going to construct your container once. That one extra copy, made one time, isn't really going to matter. You're probably not creating a bunch of std::maps in your latency-sensitive code. So there's simply not a lot of benefit to introducing more constructors for these containers. And what would such a proposal look like? Would you then want to take the Allocator by rvalue reference as well? Now we're adding a bunch more constructors. Change all the constructors taking a Compare const& to instead take a constrained forwarding reference? Now we've broken ABI for a gain that is still marginal, if there is any at all. Constructors are complicated. I'm not even convinced that if std::map were designed today, the interface would look different in this regard. If anything, we'd probably just take Compare by value instead of by const&.
On the other hand, push_back is used a LOT, with a wide variety of types, during the main runtime of programs. Being able to move into a vector, or emplace into a vector, is a huge win. The two situations aren't really comparable.
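To put a number on "that extra one copy", here is a minimal sketch that counts comparator copies during map construction. CountingLess and copies_for_one_map are made-up names for illustration, and the exact count depends on your standard library implementation, so the only thing the sketch relies on is that at least one copy happens.

```cpp
#include <cassert>
#include <map>

// Hypothetical comparator that counts how often it is copied.
struct CountingLess {
    static int copies;
    CountingLess() = default;
    CountingLess(const CountingLess&) { ++copies; }
    bool operator()(int a, int b) const { return a < b; }
};
int CountingLess::copies = 0;

// Constructs one map from an lvalue comparator and returns the number of
// comparator copies made by the constructor alone.
int copies_for_one_map() {
    CountingLess::copies = 0;
    CountingLess cmp;
    std::map<int, int, CountingLess> m(cmp);   // takes `const Compare&`, stores a copy
    const int during_construction = CountingLess::copies;

    m[2] = 20;
    m[1] = 10;
    assert(m.begin()->first == 1);             // the stored copy orders the keys
    return during_construction;
}
```

For a stateless comparator like this one, each copy is essentially free, which is the point the answer is making.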

Related

Why/when should I use std::unique/shared_ptr (std::vector<>) over just std::vector<>?

I'm a little bit confused about the main use of std::unique/shared_ptr(std::vector<>) when I can simply use a std::vector<>, which, as far as I know, is itself inherently a dynamic array. I have also seen people say that there is no performance difference between the two. So, based on all this, what is the point of using a smart pointer pointing to a container (in this case, a vector) instead of a vector alone?
First of all, you shouldn't be using std::shared_ptr unless you need the specific "shared ownership" semantics associated with std::shared_ptr. If you need a smart pointer, you should use std::unique_ptr by default, and only switch away from it in the scenario where you expressly find that you need to.
Secondly: ostensibly, the reason to prefer std::unique_ptr<TYPE> over TYPE is if you plan to move the object around a lot. This is the common design paradigm for large objects that are either unmovable or otherwise expensive to move, i.e. they implemented a Copy Constructor and didn't implement a Move Constructor, so moves are forced to behave like a Copy.
std::vector, however, does have relatively efficient move semantics: if you move a std::vector around, regardless of how complex its contained types are, the move only constitutes a couple of pointer swaps. There's no real risk that moving a std::vector will incur a large amount of computational complexity. Even in the scenario where you're overwriting a previously allocated array (invoking the Destructors of all objects in the vector), you'd still have that complexity if you were using std::unique_ptr<std::vector<TYPE>> instead, saving you nothing.
There are two advantages to std::unique_ptr<std::vector<TYPE>>. The first is that it gets rid of the implicit copy constructor; maybe you want to make it clear to maintaining programmers that the object shouldn't be copied. But that's a pretty niche use. The other advantage is that it lets you represent the scenario where there's no vector at all, i.e. vec.size() == 0 is a different condition than doesNotExist(vec). But even in that scenario, you should prefer std::optional<std::vector> instead, which better conveys the intent of the object through the code. Granted, std::optional is only available in C++17 code, so maybe you're in an environment that hasn't implemented it yet. But otherwise, there's little reason to use std::unique_ptr<std::vector>.
So in general, I don't believe there are practical uses for std::unique_ptr<std::vector>. There's no practical performance difference between it and std::vector, and using it will just make your code needlessly complex.
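A small sketch of the two points above, assuming a C++17 compiler. find_rows and move_keeps_buffer are hypothetical names invented for this example: the first shows std::optional separating "no vector" from "empty vector", the second shows that move-constructing a vector hands over the existing heap buffer rather than copying elements.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lookup: std::nullopt means "there is no vector at all",
// while an engaged but empty vector means "a vector with zero elements".
std::optional<std::vector<int>> find_rows(int table_id) {
    if (table_id < 0) return std::nullopt;          // table does not exist
    if (table_id == 0) return std::vector<int>{};   // table exists, but is empty
    return std::vector<int>{1, 2, 3};
}

// Move-constructing a vector transfers its buffer; no elements are copied.
bool move_keeps_buffer() {
    std::vector<int> source(10000, 42);
    const int* buffer = source.data();
    std::vector<int> target = std::move(source);
    assert(target.size() == 10000);
    return target.data() == buffer;
}
```

No heap-allocated wrapper is needed for either job, which is why the unique_ptr version buys so little.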

Why does std::promise::set_value() have two overloads

For the case when std::promise<> is instantiated with a non-reference type, why does the set_value() method have two distinct overloads as opposed to one pass-by-value overload?
So instead of the following two
std::promise::set_value(const Type& value);
std::promise::set_value(Type&& value);
just one
std::promise::set_value(Type value);
This has at least the following two benefits
Enable users to move the value into the promise/future when they want, since the API argument is a value type. When copying is not supported it is obvious that the value is going to be copied. Further when the expression being passed into the function is a prvalue it can be completely elided easily by the compiler (especially so in C++17)
It conveys the point that the class requires a copy of the value a lot better and succinctly than two overloads which accomplish the same task.
I was making a similar API (as far as ownership is concerned) and I was wondering what benefits the design decision employed by the C++ standard library has as opposed to what I mentioned.
Thanks!
Passing an argument by value if it needs to be "transferred" unconditionally and then moving from it is a neat little trick, but it does incur at least one mandatory move. Therefore, this trick is best in leaf code that is used rarely or only in situations that are completely under the control of the author.
By contrast, a core library whose users and uses are mostly unknown to the author should not unnecessarily add avoidable costs, and providing two separate reference parameter overloads is more efficient.
In a nutshell, the more leaf and user you are, the more you should favour simplicity over micro-optimizations, and the more library you are, the more you should go out of your way to be general and efficient.
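The "at least one mandatory move" can be made visible with a counting type. This is only a sketch: Tracked, ByValueSink, OverloadedSink and measure are hypothetical names, and the sinks merely imitate the shape of set_value, they are not how std::promise is implemented.

```cpp
#include <cassert>
#include <utility>

// Hypothetical payload that records copies and moves in a global counter.
struct Counters { int copies = 0; int moves = 0; };
Counters g_counts;

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_counts.copies; }
    Tracked(Tracked&&) noexcept { ++g_counts.moves; }
    Tracked& operator=(const Tracked&) { ++g_counts.copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++g_counts.moves; return *this; }
};

// Single by-value overload: simple, but always pays one extra move.
struct ByValueSink {
    Tracked stored;
    void set(Tracked value) { stored = std::move(value); }
};

// Two reference overloads, in the style of std::promise::set_value.
struct OverloadedSink {
    Tracked stored;
    void set(const Tracked& value) { stored = value; }
    void set(Tracked&& value) { stored = std::move(value); }
};

// Counts the operations needed to hand an lvalue or an rvalue to a sink.
template <class Sink>
Counters measure(bool pass_rvalue) {
    Sink sink;
    Tracked item;
    g_counts = Counters{};
    if (pass_rvalue) sink.set(std::move(item));
    else sink.set(item);
    return g_counts;
}
```

Passing an rvalue costs two moves with the by-value sink and one with the overload pair; passing an lvalue costs a copy plus a move versus a copy alone. For a type that is cheap to move the difference is noise, which is why the trick is fine in leaf code.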

How C++11 has changed the standard containers?

I get that C++11 has introduced the new move semantics and that, as a consequence, the standard containers have changed to meet the new definitions and specifications of the language, but I don't really get how the standard containers benefit from them.
I also think I have understood what an rvalue is and how move semantics work; the problem is that I don't see any useful point to this. Moving things and changing their labels doesn't sound like a meaningful feature.
Can I ask for a good resource about how map, list, vector, ... have changed in the new C++11?
The reason rvalue references were introduced is that it can be much faster to move an object's data than to copy it. The performance gain comes mainly from internally stored pointers whose targets do not need to be replicated during a move; a copy would otherwise involve needless malloc calls and memcpys. For instance, std::string contains a pointer to a char array that can be very large. Copying it involves copying the data in that char array, whereas moving simply involves copying the pointer to that data.
With respect to lvalues and rvalues, the only things that have changed, as far as I'm aware, are that we now have a shiny new constructor to play with, and that many of the member functions have been rewritten to take advantage of move semantics.
For instance, std::vector now has a std::vector::vector(std::vector&& move) ctor, and functions like push_back have been changed to also accept rvalues.
This should be, for the most part, seamless to you. If you're writing a library, rather than just using one, you need to know this, and universal (forwarding) references too.
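As a rough illustration of the two push_back overloads mentioned above (build_lines is a made-up function, and the strings are deliberately long so that they are not stored inline):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Appends one string by copy and one by move; the names are illustrative only.
std::vector<std::string> build_lines() {
    std::vector<std::string> lines;
    std::string header(1000, '=');
    std::string body(1000, '-');

    lines.push_back(header);            // push_back(const T&): copies the characters
    lines.push_back(std::move(body));   // push_back(T&&): typically just takes over the buffer

    // `header` is untouched by the copy; `body` is left valid but unspecified,
    // so it should not be read again without assigning to it first.
    assert(header.size() == 1000);
    return lines;                       // moved or elided, never deep-copied
}
```

From the caller's side nothing looks different unless std::move is written explicitly, which is what "seamless" means here.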
You can check out the containers' descriptions at cppreference, which explains the interface of STL. There, you can find what the C++11 standard adds - it is marked with '(since C++11)' tag. If you click on a container type, it will show you which methods are new for it.

Is it good practice to generally make heavyweight classes non-copyable?

I have a Shape class containing potentially many vertices, and I was contemplating making copy-constructor/copy-assignment private to prevent accidental needless copying of my heavyweight class (for example, passing by value instead of by reference).
To make a copy of Shape, one would have to deliberately call a "clone" or "duplicate" method.
Is this good practice? I wonder why STL containers don't use this approach, as I rarely want to pass them by value.
Restricting your users isn't always a good idea. Just documenting that copying may be expensive is enough. If a user really wants to copy, then using the native syntax of C++ by providing a copy constructor is a much cleaner approach.
Therefore, I think the real answer depends on the context. Perhaps the real class you're writing (not the imaginary Shape) shouldn't be copied, perhaps it should. But as a general approach, I certainly can't say that one should discourage users from copying large objects by forcing them to use explicit method calls.
IMHO, whether or not to provide a copy constructor and assignment operator depends more on what your class models than on the cost of copying.
If your class represents values, that is, if passing an object or a copy of the object doesn't make a difference, then provide them (and provide the equality operator as well).
If it doesn't, that is, if you think that objects of the class have an identity and a state (one also speaks of entities), don't. If a copy makes sense, provide it with a clone or copy member.
There are sometimes classes you can't easily classify. Containers are in that position. It is meaningful to consider them as entities, pass them only by reference and have special operations to make a copy when needed. You can also consider them simply as aggregations of values, in which case copying makes sense. The STL was designed around value types, and as everything is a value, it makes sense for containers to be values too. That allows things like map<int, list<> >, which are useful. (Remember, before C++11 you couldn't put non-copyable classes in an STL container.)
Generally, you do not make classes non-copyable just because they are heavy (you have given a good example yourself: the STL).
You make them non-copyable when they are tied to some non-copyable resource like a socket, file or lock, or when they are not designed to be copied at all (for example, because they have internal structures that are hard to deep copy).
However, in your case your object is copyable, so leave it as is.
Small note about clone(): it is used as a polymorphic copy constructor, so it has a different meaning and is used differently.
Most programmers are already aware of the cost of copying various objects, and know how to avoid copies, using techniques such as pass by reference.
Note the STL's vector, string, map, list etc. could all be variously considered 'heavyweight' objects (especially something like a vector with 10,000 elements!). Those classes all still provide copy constructors and assignment operators, so if you know what you're doing (such as making a std::list of vectors), you can copy them when necessary.
So if it's useful, provide them anyway, but be sure to document they are expensive operations.
Depending on your needs...
If you want to ensure that a copy won't happen by mistake, and making a copy would cause a severe bottleneck or simply doesn't make sense, then this is good practice. Compile errors are better than performance investigations.
If you are not sure how your class will be used, and are unsure if it's a good idea or not then it is not good practice. Most of the time you would not limit your class in this way.
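For reference, the approach the question describes might look like the sketch below in C++11, using deleted copy operations rather than private ones. The Shape layout, Vertex and vertex_count are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

struct Vertex { double x; double y; };

// Hypothetical heavyweight class: moves are allowed, implicit copies are not.
class Shape {
public:
    explicit Shape(std::vector<Vertex> vertices) : vertices_(std::move(vertices)) {}

    Shape(const Shape&) = delete;              // no accidental pass-by-value copies
    Shape& operator=(const Shape&) = delete;
    Shape(Shape&&) = default;                  // moving stays cheap
    Shape& operator=(Shape&&) = default;

    // The only way to duplicate the vertices is to ask for it by name.
    Shape clone() const { return Shape(vertices_); }

    std::size_t vertex_count() const { return vertices_.size(); }

private:
    std::vector<Vertex> vertices_;
};

static_assert(!std::is_copy_constructible<Shape>::value, "implicit copies are rejected");
static_assert(std::is_move_constructible<Shape>::value, "moves still work");
```

With this in place, `void draw(Shape s)` called with an lvalue fails to compile, while `Shape copy = original.clone();` works. Whether that restriction is worth it is exactly the judgment call the answers above disagree on.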

"const T &arg" vs. "T arg"

Which of the following examples is the better way of declaring the following function and why?
void myFunction (const int &myArgument);
or
void myFunction (int myArgument);
Use const T & arg if sizeof(T)>sizeof(void*) and use T arg if sizeof(T) <= sizeof(void*)
They do different things. const T& makes the function take a reference to the variable. On the other hand, T arg will call the copy constructor of the object and pass the copy.
If the copy constructor is not accessible (e.g. it's private), T arg won't work:
class Demo {
public:
    Demo() {}
private:
    Demo(const Demo& t) { }
};
void foo(Demo t) { }
int main() {
    Demo t;
    foo(t); // error: cannot copy `t`.
    return 0;
}
For small values like primitive types (where all that matters is the contents of the object, not its actual referential identity; say, it's not a handle or something), T arg is generally preferred. For large objects, and for objects that you can't copy or whose referential identity must be preserved (regardless of size), passing by reference is preferred.
Another advantage of T arg is that since it's a copy, the callee cannot maliciously alter the original value. It can freely mutate the variable like any local variables to do its work.
Taken from Move constructors. I like the easy rules
1. If the function intends to change the argument as a side effect, take it by reference/pointer to a non-const object. Example:
void Transmogrify(Widget& toChange);
void Increment(int* pToBump);
2. If the function doesn't modify its argument and the argument is of primitive type, take it by value. Example:
double Cube(double value);
3. Otherwise
3.1. If the function always makes a copy of its argument inside, take it by value.
3.2. If the function never makes a copy of its argument, take it by reference to const.
3.3. Added by me: If the function sometimes makes a copy, then decide on gut feeling: If the copy is done almost always, then take by value. If the copy is done half of the time, go the safe way and take by reference to const.
In your case, you should take the int by value, because you don't intend to modify the argument, and the argument is of primitive type. I think of "primitive type" as either a non-class type or a type without a user defined copy constructor and where sizeof(T) is only a couple of bytes.
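The rules above can be sketched with one function per case. All function names here are invented for illustration and stick to standard library types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Rule 1: the function changes its argument, so take a non-const reference.
void increment(int& counter) { ++counter; }

// Rule 2: read-only argument of primitive type, so take it by value.
double cube(double value) { return value * value * value; }

// Rule 3.1: the function needs its own copy anyway, so take it by value
// and let the caller decide whether that copy is a copy or a move.
std::string sorted_letters(std::string text) {
    std::sort(text.begin(), text.end());
    return text;
}

// Rule 3.2: the function only inspects the argument, so take a reference to const.
std::size_t count_char(const std::string& text, char wanted) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), wanted));
}

// Usage sketch.
void demo_rules() {
    int n = 1;
    increment(n);
    assert(n == 2);
    assert(sorted_letters("cab") == "abc");
}
```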
There's a popular piece of advice that states that the method of passing ("by value" vs "by const reference") should be chosen depending on the actual size of the type you are going to pass. Even in this discussion you have an answer labeled as "correct" that suggests exactly that.
In reality, basing your decision on the size of the type is not only incorrect, this is a major and rather blatant design error, revealing a serious lack of intuition/understanding of good programming practices.
Decisions based on the actual implementation-dependent physical sizes of the objects must be left to the compiler as often as possible. Trying to "tailor" your code to these sizes by hard-coding the passing method is a completely counterproductive waste of effort in 99 cases out of 100. (Yes, it is true that in the case of the C++ language, the compiler doesn't have enough freedom to use these methods interchangeably; they are not really interchangeable in C++ in the general case. If necessary, though, a proper size-based [semi-]automatic selection of the passing method might be implemented through template metaprogramming; but that's a different story.)
The much more meaningful criterion for selecting the passing method when you write the code "by hand" might sound as follows:
Prefer to pass "by value" when you are passing an atomic, unitary, indivisible entity, such as a single non-aggregate value of any type - a number, a pointer, an iterator. Note that, for example, iterators are unitary values at the logical level. So, prefer to pass iterators by value, regardless of whether their actual size is greater than sizeof(void*). (STL implementation does exactly that, BTW).
Prefer to pass "by const reference" when you are passing an aggregate, compound value of any kind. i.e. a value that has exposed pronouncedly "compound" nature at the logical level, even if its size is no greater than sizeof(void*).
The separation between the two is not always clear, but that's how things always are with all such recommendations. Moreover, the separation into "atomic" and "compound" entities might depend on the specifics of your design, so the decision might actually differ from one design to the other.
Note, that this rule might produce decisions different from those of the allegedly "correct" size-based method mentioned in this discussion.
As an example, it is interesting to observe that the size-based method will suggest you manually hard-code different passing methods for different kinds of iterators, depending on their physical size. This makes it especially obvious how bogus the size-based method is.
Once again, one of the basic principles from which good programming practices derive is to avoid basing your decisions on physical characteristics of the platform (as much as possible). Instead, your decisions have to be based on the logical and conceptual properties of the entities in your program (as much as possible). The issue of passing "by value" or "by reference" is no exception here.
In C++11, the introduction of move semantics into the language produced a notable shift in the relative priorities of the different parameter-passing methods. Under certain circumstances it can be perfectly feasible to pass even complex objects by value. See:
Should all/most setter functions in C++11 be written as function templates accepting universal references?
Contrary to popular and long-held beliefs, passing by const reference isn't necessarily faster even when you're passing a large object. You might want to read Dave Abrahams' recent article on this very subject.
Edit: (mostly in response to Jeff Hardy's comments): It's true that passing by const reference is probably the "safest" alternative under the largest number of circumstances -- but that doesn't mean it's always the best thing to do. But, to understand what's being discussed here, you really do need to read Dave's entire article quite carefully, as it is fairly technical, and the reasoning behind its conclusions is not always intuitively obvious (and you need to understand the reasoning to make intelligent choices).
Usually for built-in types you can just pass by value. They're small types.
For user-defined types (or templates, when you don't know what is going to be passed) prefer const&. The size of a reference is probably smaller than the size of the type. And it won't incur an extra copy (no call to a copy constructor).
Well, yes ... the other answers about efficiency are true. But there's something else going on here which is important - passing a class by value creates a copy and, therefore, invokes the copy constructor. If you're doing fancy stuff there, it's another reason to use references.
A reference to const T is not worth the typing effort in case of scalar types like int, double, etc. The rule of thumb is that class-types should be accepted via ref-to-const. But for iterators (which could be class-types) we often make an exception.
In generic code you should probably write "T const&" most of the time to be on the safe side. There's also boost's call traits you can use to select the most promising parameter passing type. It basically uses ref-to-const for class types and pass-by-value for scalar types as far as I can tell.
But there are also situations where you might want to accept parameters by value, regardless of how expensive creating a copy can be. See Dave's article "Want Speed? Use pass by value!".
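The call-traits idea can be approximated in a few lines. To be clear, param_t and contains below are hypothetical names written for this sketch; this is not Boost's actual interface, only the "scalars by value, everything else by ref-to-const" selection described above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper, not Boost's API: scalars travel by value,
// everything else by reference to const.
template <class T>
using param_t = typename std::conditional<std::is_scalar<T>::value, T, const T&>::type;

static_assert(std::is_same<param_t<int>, int>::value, "int is passed by value");
static_assert(std::is_same<param_t<const char*>, const char*>::value,
              "pointers are passed by value");
static_assert(std::is_same<param_t<std::string>, const std::string&>::value,
              "class types are passed by reference to const");

// Generic code can then spell the parameter type once.
// T is deduced from `items` only; param_t<T> is a non-deduced context.
template <class T>
bool contains(const std::vector<T>& items, param_t<T> wanted) {
    for (const T& item : items) {
        if (item == wanted) return true;
    }
    return false;
}
```

Note that a trait like this would send class-type iterators by reference to const, so the usual exception for iterators still has to be made by hand.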
For simple types like int, double and char*, it makes sense to pass it by value. For more complex types, I use const T& unless there is a specific reason not to.
The cost of passing a 4 - 8 byte parameter is as low as you can get. You don't buy anything by passing a reference. For larger types, passing them by value can be expensive.
It won't make any difference for an int, as when you use a reference the memory address still has to be passed, and the memory address (void*) is usually about the size of an integer.
For types that contain a lot of data it becomes far more efficient as it avoids the huge overhead from having to copy the data.
Well the difference between the two doesn't really mean much for ints.
However, when using larger structures (or objects), the first method you used, pass by const reference, gives you access to the structure without need to copy it. The second case pass by value will instantiate a new structure that will have the same value as the argument.
In both cases you see this in the caller
myFunct(item);
To the caller, item will not be changed by myFunct, but the pass by reference will not incur the cost of creating a copy.
There is a very good answer to a similar question over at Pass by Reference / Value in C++
The difference between them is that one passes an int (which gets copied), and one uses the existing int. Since it's a const reference, it doesn't get changed, so it works much the same. The big difference here is that the function can alter the value of the int locally, but not the const reference. (I suppose some idiot could do the same thing with const_cast<>, or at least try to.) For larger objects, I can think of two differences.
First, some objects simply can't get copied, auto_ptr<>s and objects containing them being the obvious example.
Second, for large and complicated objects it's faster to pass by const reference than to copy. It's usually not a big deal, but passing objects by const reference is a useful habit to get into.
Either works fine. Don't waste your time worrying about this stuff.
The only time it might make a difference is when the type is a large struct, which might be expensive to pass on the stack. In that case, passing the arg as a pointer or a reference is (slightly) more efficient.
The problem appears when you are passing objects. If you pass by value, the copy constructor will be called. If you haven't implemented one, then a shallow copy of that object will be passed to the function.
Why is this a problem? If the object holds pointers to dynamically allocated memory, that memory can be freed when the destructor of the copy is called (when the copy goes out of scope at the end of the function). Then, when the destructor of the original object runs, you'll have a double free.
Moral: Write your copy constructors.
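A minimal sketch of what "write your copy constructors" means for a class that owns raw memory. IntBuffer and first_after_reset are invented for this example; in real code a std::vector member would give you the same behaviour with no hand-written copy operations at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owner of a heap array. Without the copy operations below, the
// compiler-generated shallow copy would make two objects delete the same pointer.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t size) : size_(size), data_(new int[size]()) {}
    ~IntBuffer() { delete[] data_; }

    // Deep copy: the new object gets its own array.
    IntBuffer(const IntBuffer& other) : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Copy-and-swap keeps assignment safe, including self-assignment.
    IntBuffer& operator=(IntBuffer other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    int& operator[](std::size_t i) { return data_[i]; }
    int operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};

// Pass by value: the callee works on its own copy and destroys only that copy.
int first_after_reset(IntBuffer copy) {
    copy[0] = 0;
    return copy[0];
}
```

After `first_after_reset(b)` returns, `b` still owns its array and its first element is unchanged, so there is no double free when `b` is destroyed.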