(This question stems from this more specific question about stream iterators.)
A type is said [Stepanov, McJones] to be Regular if:
it is equality-comparable
it is assignable (from other values of the type)
it is destructible
it is default-constructible (i.e. constructible with no arguments)
It has a (default) total ordering of its values
(and there's some wording about "underlying type" which I didn't quite get.)
Some/many people claim that, when designing types - e.g. for the standard library of C++ - it is worthwhile or even important to make an effort to make these types regular, possibly ignoring the total-order requirement. Some mention the maxim:
Do as the ints do.
and indeed, the int type satisfies all these requirements. However, default-constructible types, when default-constructed, hold some kind of null, invalid, or junk value - ints certainly do. A different approach is to require initialization on construction (and de-initialization on destruction), so that the object's lifetime corresponds to its time of viability.
To contrast these two approaches, one can perhaps think about a T-pointer type, and a T-reference or T-reference-wrapper type. The pointer is basically regular (the ordering assumes a linear address space), and has nullptr which one needs to look out for; the reference is, under the hood, a pointer - but you can't just construct a reference-to-nothing or a junk-reference. Now, pointers may have their place, but we mostly like to work with references (or possibly reference wrappers which are assignable).
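As a rough sketch of that contrast (the types are the standard ones; the comments only describe what the type system allows, not any particular library design):
#include <functional>

int main() {
    int a = 1, b = 2;

    int* p;                  // default-constructible, but starts out holding junk
    p = nullptr;             // or a null value that every user must check for
    p = &a;                  // copyable, assignable, comparable: essentially regular

    // std::reference_wrapper<int> r;    // does not compile: no default constructor
    std::reference_wrapper<int> r(a);    // must be bound to a live object at birth
    r = std::ref(b);                     // still assignable, so it works in containers
    return r.get();                      // always refers to something valid
}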
So, why should we prefer designing (as library authors) and using regular types? Or at least, why should we prefer them in so many contexts?
I doubt this is the most significant answer, but you could make the argument that "no uninitialized state" / "no junk value" is a principle that doesn't sit well with move-assignment: After you move from your value, and take its resources away - what state does it remain in? It is not very far from default-constructing it (unless the move-assignment is based on some sort of swapping; but then - you could look at move construction).
The rebuttal would be: "Fine, so let's not make such types move-assignable, only move-destructible"; unfortunately, C++ decided in the 2000s to go with non-destructive moves. See also this question & answer by @HowardHinnant:
Why does C++ move semantics leave the source constructed?
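To make the moved-from point concrete, here is a small sketch using std::vector, whose moved-from state is valid but unspecified (in practice usually empty, i.e. very close to default-constructed):
#include <cassert>
#include <utility>
#include <vector>

int main() {
    std::vector<int> src{1, 2, 3};
    std::vector<int> dst = std::move(src);  // move construction steals the buffer

    // src is now in a "valid but unspecified" state: it must still be
    // destructible and assignable, which is essentially the same contract
    // a default-constructed object has to satisfy.
    src = {4, 5};                           // re-initialize before further use
    assert(dst.size() == 3 && src.size() == 2);
    return 0;
}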
All three of these containers accept a function object by const reference, as opposed to either a value or a forwarding reference. This results in the function object being copied into the container's internal storage (at most twice).
Is there a reason for copying a function object twice, as opposed to letting the user pass any kind of function object and having it constructed into the internal functor storage? That way the library is more general and there are fewer surprises for the user.
The same philosophy is applied to the push_back() functions - they have two overloads, one taking a const reference and one taking an rvalue reference, because this gives the user more control over whether they want to move or copy the value. The library remains efficient in the general case without making any assumptions about the use case.
I suspect this is a design decision that has been carried over since the pre-C++11 days. Would changing this be a decent proposal for the standard?
Typically, the comparator is a pretty small object that is cheap to copy, and you're only going to construct your container once. That extra copy, made once, isn't really going to matter. You're probably not creating a bunch of std::maps in your latency-sensitive code. So there's simply not a lot of benefit from introducing more constructors for these containers. And what would such a proposal look like? Would you then want to take the Allocator by rvalue reference as well? Now we're adding a bunch more constructors. Change all the constructors taking a Compare const& to instead take constrained forwarding references? Now we've broken ABI for marginal gain, if any. Constructors are complicated. I'm not even convinced that if std::map were designed today, the interface would look different in this regard. If anything, we'd probably just take Compare by value instead of by const&.
On the other hand, push_back is used a LOT, with a wide variety of types, during the main runtime of programs. Being able to move into a vector, or emplace into a vector, is a huge win. The two situations aren't really comparable.
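A small sketch of why those two push_back overloads (plus emplace_back) matter at the call site:
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> v;
    std::string s(1000, 'x');

    v.push_back(s);              // push_back(const T&): copies the long string
    v.push_back(std::move(s));   // push_back(T&&): steals the buffer, no copy
    v.emplace_back(1000, 'y');   // constructs the string in place inside the vector
    return 0;
}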
As far as I know, when two pointers (or references) have types that do not alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations on that basis, e.g., reordering instructions. Therefore, having pointers of different types hold the same value may be problematic. However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine whether or not they address the same location. Am I right?
As far as I know, when two pointers (or references) have types that do not alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations on that basis, e.g., reordering instructions.
Correct. GCC, for example, does perform optimizations of this form which can be disabled by passing the flag -fno-strict-aliasing.
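A classic illustration of the kind of optimization involved (a minimal sketch; with strict aliasing enabled the compiler is entitled to assume the two pointers never overlap):
int foo(int* i, float* f) {
    *i = 1;
    *f = 2.0f;   // int and float may not alias, so *f is assumed distinct from *i
    return *i;   // the compiler may fold this to "return 1" without re-reading memory
}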
However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine whether or not they address the same location. Am I right?
The standard doesn't distinguish based on where those pointers came from. If your operation has undefined behavior, the program has undefined behavior, period. The compiler is in no way obliged to analyze the operands at compile time, but it may give you a warning.
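For example, the following has undefined behavior even though both pointers are created and visible within the same function body (a minimal sketch):
#include <cstdio>

int main() {
    int x = 0;
    float* f = reinterpret_cast<float*>(&x);  // the cast itself is fine...
    *f = 1.0f;      // ...but writing to an int object through a float lvalue
                    // violates the strict aliasing rule: undefined behavior
    std::printf("%d\n", x);
    return 0;
}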
Implementations which are designed and intended to be suitable for low-level programming should have no particular difficulty recognizing common patterns where storage of one type is reused or reinterpreted as another in situations not involving aliasing, provided that:
Within any particular function or loop, all pointers or lvalues used to access a particular piece of storage are derived from lvalues of a common type which identify the same object or elements of the same array, and
Between the creation of a derived-type pointer and the last use of it or any pointer derived from it, all operations involving the storage are performed only using the derived pointer or other pointers derived from it.
Most low-level programming scenarios requiring reuse or reinterpretation of storage fit these criteria, and handling code that fits these criteria will typically be rather straightforward in an implementation designed for low-level programming. If an implementation caches lvalues in registers and performs loop hoisting, for example, it could support the above semantics reasonably efficiently by flushing all cached values of type T whenever a T or T* is used to form a pointer or lvalue of another type. Such an approach may not be optimal, but it would degrade performance much less than having to block all type-based optimizations entirely.
Note that it is probably in many cases not worthwhile for even an implementation intended for low-level programming to try to handle all possible scenarios involving aliasing. Doing that would be much more expensive than handling the far more common scenarios that don't involve aliasing.
Implementations which are specialized for other purposes are, of course, not required to make any attempt whatsoever to support any exceptions to 6.5p7--not even those that are often treated as part of the Standard. Whether such an implementation should be able to support such constructs would depend upon the particular purposes for which it is designed.
Lately I have been seeing a lot of material about generic programming, and there is one thing about designing types that I still cannot wrap my head around. I am not sure what the best approach is; let me explain.
For some types, it is natural to provide a default constructor.
Either every possible value of that type is valid, or an obvious default exists, so it makes sense to provide a default. This is the case for basic types.
Then there are types for which default-constructing them does not yield a value: in the standard library, for example, std::function<Sig> and std::thread. They are nevertheless default-constructible, even though they do not hold a value.
Then we have the proposed optional<T> in the standard. It makes a lot of sense to use it for basic types, since for those types every possible assignment yields a valid value (except NaN for double and float), but I don't see how you would use it for a std::thread or a std::function<Sig>, since those types don't hold a "value" when default-constructed. It is AS IF those types had "optional" embedded in the type directly.
This has the following drawbacks. Since there is no "natural" default (or value) construction, as with an int:
Now I have to litter my class with if (valid) checks throughout my design and signal the error, OR
make it less safe to use if I don't do that check. The precondition becomes: assign before use if default-constructed.
So when I want to design a type, I always find the question: should I make it default constructible?
Pros:
Easier to reuse in more generic contexts, because my type will more easily model SemiRegular or Regular if I add the appropriate operations.
Cons:
Littering the whole class with if statements, or making a contract with the user under which the class is less safe to use.
For example, let's say I have a class Song with ID, artist, title, duration and year. It is convenient, for use with the standard library, to make the type default-constructible. But:
I can't find a natural way to construct a "default Song".
I have to litter the class with if (validsong) checks or make it unsafe to use (see the sketch below).
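A minimal sketch of the two options for a hypothetical Song (the names are illustrative, not from any real library):
#include <string>
#include <utility>

// Option A: default-constructible, so it models SemiRegular,
// but every member function has to guard against the empty state.
struct SongA {
    int id = 0;
    std::string artist, title;
    bool valid = false;                    // the "if (validsong)" litter
    SongA() = default;
    SongA(int i, std::string a, std::string t)
        : id(i), artist(std::move(a)), title(std::move(t)), valid(true) {}
};

// Option B: no default constructor; every SongB that exists is a real song,
// but it no longer drops into generic code that default-constructs values.
struct SongB {
    int id;
    std::string artist, title;
    SongB(int i, std::string a, std::string t)
        : id(i), artist(std::move(a)), title(std::move(t)) {}
};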
So my questions are:
How should I design a type that has no "natural (as in value)" defaults? Should I provide a default constructor or not?
In the case that I choose to provide a default constructor, how does optional<T> fit into all this? My view is that giving a default constructor to a type that is not "naturally" default-constructible makes optional<T> useless in that case.
Should optional<T> just be used for types whose domain of values is complete, meaning I cannot assign an invalid value to its representation because every representation holds a value, as with int?
Why were types such as std::function<Sig> made default-constructible in the standard in the first place? When default-constructed, such an object does not hold a value, so I don't see why a default constructor should be provided. You could always write optional<function<void ()>>, for example. Is this just a design choice with both options valid, or is one of the designs, default- versus non-default-constructible, superior to the other in this case?
(Note: a problem with lots of questions in one question is that some parts of it can be duplicate. Best to ask smaller questions, and check each for prior posts. "One question per question" is a good policy; easier said than done sometimes, I guess.)
Why were types such as std::function made default constructible in the first place in the standard? When constructed, it does not hold a value, so I don't see why a default constructor should be provided. You could always do: optional<function<void ()>>, for example.
See Why do std::function instances have a default constructor?
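A small sketch of the practical difference, using std::optional as it eventually shipped in C++17 (std::bad_function_call is what a default-constructed std::function throws when invoked):
#include <cstdio>
#include <functional>
#include <optional>

int main() {
    std::function<void()> f;            // default-constructed: "empty", not a value
    if (f) f();                          // must check; calling f() unchecked here
                                         // would throw std::bad_function_call

    std::optional<std::function<void()>> g;   // emptiness lives in the wrapper instead
    if (g) (*g)();                             // same check, but expressed in the type
    std::puts("done");
    return 0;
}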
How should I design a type that has no "natural (as in value)" defaults? Should I provide a default constructor or not?
Default constructors for types that have a tough time meaningfully defining themselves without some kind of data are how a lot of classes implement a null value. Is optional a better choice? I usually think so, but I'm assuming you're aware that std::optional was voted out of C++14. Even if it were the perfect answer, it can't be everyone's answer...it's not soup yet.
It will always add some overhead to do runtime tracking of whether the value is bound or not. Perhaps not a lot of overhead. But when you are using a language whose raison d'etre is to allow abstraction while still letting you shoot yourself in the foot as close to the metal as you want...shaving off a byte per value in a giant vector can be important.
So even if optional<T> semantics and compile-time checking were perfect, you still might face a scenario where it's advantageous to scrap it and allow your type to encode its own nullity. Gotta push those pixels or polygons or packets or... pfafftowns.
In the case I choose to provide a default constructor, how does optional fit into all this puzzle? My view is that making a type that is not "naturally" default constructible provide a default constructor makes optional useless in this case.
Should optional just be used for types whose domain of values is complete, meaning, I cannot assign an invalid value to its representation because all of them hold a value (except float and double NaN I guess).
In my own case, I found myself wanting to distinguish, with compile-time checking, between routines that could handle null pointers and those that could not. But suddenly an optional<pointer> offered three states: the optional being unbound, being bound to a null pointer, and being bound to a non-null pointer. The compile-time sanity check seemed less of a win than it had.
So how about optional references? They're controversial to the point that, last I heard, they're one of the sticking points that delayed std::optional from C++14. Which was a bit annoying after I'd converted my optional pointers to optional references. :-/
I had a vague idea to write a book about "pathological C++" where you pick some idea and start taking it to its logical conclusions. optional<T> was one kick I got on and going with essentially the principles you identify. Remove the possibility of "nullity" from being encoded in the type itself, and then suddenly you can get the compiler doing the type-checking for whether a given bit of code is prepared to expect a null or not.
(These days I tend toward suspecting if you get very hung up on this kind of "pathological C++" you'll wind up reinventing Haskell. :-/ See the popular Data.Maybe monad.)
Which of the following examples is the better way of declaring the following function and why?
void myFunction (const int &myArgument);
or
void myFunction (int myArgument);
Use const T & arg if sizeof(T)>sizeof(void*) and use T arg if sizeof(T) <= sizeof(void*)
They do different things. const T& makes the function take a reference to the variable. On the other hand, T arg calls the copy constructor of the object and passes a copy.
If the copy constructor is not accessible (e.g. it's private), T arg won't work:
class Demo {
public:
    Demo() {}
private:
    Demo(const Demo& t) {}
};

void foo(Demo t) {}

int main() {
    Demo t;
    foo(t);  // error: 'Demo::Demo(const Demo&)' is private -- cannot copy 't'
    return 0;
}
For small values like primitive types (where all that matters is the contents of the object, not its referential identity; say, it's not a handle or something), T arg is generally preferred. For large objects, and for objects that you can't copy and/or whose referential identity is important (regardless of size), passing the reference is preferred.
Another advantage of T arg is that, since it's a copy, the callee cannot maliciously alter the original value. It can freely mutate the parameter like any other local variable to do its work.
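A tiny sketch of that point:
void byValue(int x)           { x = 42; }  // modifies only the local copy
void byConstRef(const int&)   {}           // x = 42; would not even compile
void byRef(int& x)            { x = 42; }  // modifies the caller's variable

int main() {
    int i = 0;
    byValue(i);     // i is still 0
    byConstRef(i);  // i is still 0
    byRef(i);       // i is now 42
    return i;
}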
Taken from Move Constructors. I like these easy rules:
If the function intends to change the argument as a side effect, take it by reference/pointer to a non-const object. Example:
void Transmogrify(Widget& toChange);
void Increment(int* pToBump);
If the function doesn't modify its argument and the argument is of primitive type, take it by value. Example:
double Cube(double value);
Otherwise
3.1. If the function always makes a copy of its argument inside, take it by value (see the sketch after this list).
3.2. If the function never makes a copy of its argument, take it by reference to const.
3.3. Added by me: If the function sometimes makes a copy, then decide on gut feeling: If the copy is done almost always, then take by value. If the copy is done half of the time, go the safe way and take by reference to const.
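A minimal sketch of rule 3.1, with an illustrative setName (not from any real API): since the function stores a copy anyway, taking the parameter by value and moving from it handles both lvalue and rvalue arguments with no extra copies.
#include <string>
#include <utility>

class Person {
    std::string name_;
public:
    // Always keeps a copy, so take by value and move from the parameter.
    void setName(std::string name) { name_ = std::move(name); }
};

int main() {
    Person p;
    std::string n = "Ada";
    p.setName(n);             // one copy into the parameter
    p.setName(std::move(n));  // no copy: n's buffer is moved all the way in
    return 0;
}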
In your case, you should take the int by value, because you don't intend to modify the argument, and the argument is of primitive type. I think of "primitive type" as either a non-class type or a type without a user defined copy constructor and where sizeof(T) is only a couple of bytes.
There's a popular advice that states that the method of passing ("by value" vs "by const reference") should be chosen depending in the actual size of the type you are going to pass. Even in this discussion you have an answer labeled as "correct" that suggests exactly that.
In reality, basing your decision on the size of the type is not only incorrect, this is a major and rather blatant design error, revealing a serious lack of intuition/understanding of good programming practices.
Decisions based on the actual implementation-dependent physical sizes of the objects must be left to the compiler as often as possible. Trying to "tailor" your code to these sizes by hard-coding the passing method is a completely counterproductive waste of effort in 99 cases out of 100. (Yes, it is true that in the case of the C++ language the compiler doesn't have enough freedom to use these methods interchangeably - they are not really interchangeable in C++ in the general case. Although, if necessary, a proper size-based [semi-]automatic selection of the passing method might be implemented through template metaprogramming; but that's a different story.)
The much more meaningful criterion for selecting the passing method when you write the code "by hand" might sound as follows:
Prefer to pass "by value" when you are passing an atomic, unitary, indivisible entity, such as a single non-aggregate value of any type - a number, a pointer, an iterator. Note that, for example, iterators are unitary values at the logical level. So, prefer to pass iterators by value, regardless of whether their actual size is greater than sizeof(void*). (STL implementation does exactly that, BTW).
Prefer to pass "by const reference" when you are passing an aggregate, compound value of any kind. i.e. a value that has exposed pronouncedly "compound" nature at the logical level, even if its size is no greater than sizeof(void*).
The separation between the two is not always clear, but that's how things always are with such recommendations. Moreover, the separation into "atomic" and "compound" entities might depend on the specifics of your design, so the decision might differ from one design to another.
Note, that this rule might produce decisions different from those of the allegedly "correct" size-based method mentioned in this discussion.
As an example, it is interesting to observe that the size-based method would suggest you manually hard-code different passing methods for different kinds of iterators, depending on their physical size. This makes it especially obvious how bogus the size-based method is.
Once again, one of the basic principles from which good programming practices derive is to avoid basing your decisions on physical characteristics of the platform (as much as possible). Instead, your decisions have to be based on the logical and conceptual properties of the entities in your program (as much as possible). The issue of passing "by value" or "by reference" is no exception here.
In C++11, the introduction of move semantics into the language produced a notable shift in the relative priorities of the different parameter-passing methods. Under certain circumstances it might become perfectly feasible to pass even complex objects by value:
Should all/most setter functions in C++11 be written as function templates accepting universal references?
Contrary to popular and long-held belief, passing by const reference isn't necessarily faster even when you're passing a large object. You might want to read Dave Abrahams' article on this very subject.
Edit: (mostly in response to Jeff Hardy's comments): It's true that passing by const reference is probably the "safest" alternative under the largest number of circumstances -- but that doesn't mean it's always the best thing to do. But, to understand what's being discussed here, you really do need to read Dave's entire article quite carefully, as it is fairly technical, and the reasoning behind its conclusions is not always intuitively obvious (and you need to understand the reasoning to make intelligent choices).
Usually for built-in types you can just pass by value. They're small types.
For user-defined types (or templates, when you don't know what is going to be passed) prefer const&. The size of a reference is probably smaller than the size of the type, and it won't incur an extra copy (no call to a copy constructor).
Well, yes ... the other answers about efficiency are true. But there's something else going on here which is important - passing a class by value creates a copy and, therefore, invokes the copy constructor. If you're doing fancy stuff there, it's another reason to use references.
A reference to const T is not worth the typing effort in case of scalar types like int, double, etc. The rule of thumb is that class-types should be accepted via ref-to-const. But for iterators (which could be class-types) we often make an exception.
In generic code you should probably write T const& most of the time to be on the safe side. There's also Boost's call_traits, which you can use to select the most promising parameter-passing type. As far as I can tell, it basically uses ref-to-const for class types and pass-by-value for scalar types.
But there are also situations where you might want to accept parameters by value, regardless of how expensive creating a copy can be. See Dave's article "Want Speed? Use pass by value!".
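The gist of that article, as a hedged sketch: when the function is going to keep a copy anyway (a "sink"), a by-value parameter lets the caller choose between copying and moving, and lets temporaries be constructed directly into the parameter. The names here are illustrative.
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> names;

// A "sink": the argument is stored, so it is taken by value.
void addName(std::string s) { names.push_back(std::move(s)); }

int main() {
    std::string a = "Alan";
    addName(a);                 // lvalue argument: one copy, then a move
    addName(std::move(a));      // rvalue argument: moves all the way, no copy
    addName("Barbara");         // temporary: constructed directly into the parameter
    return 0;
}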
For simple types like int, double and char*, it makes sense to pass it by value. For more complex types, I use const T& unless there is a specific reason not to.
The cost of passing a 4 - 8 byte parameter is as low as you can get. You don't buy anything by passing a reference. For larger types, passing them by value can be expensive.
It won't make any difference for an int: when you use a reference, a memory address still has to be passed, and a memory address is usually at least the size of an int.
For types that contain a lot of data it becomes far more efficient as it avoids the huge overhead from having to copy the data.
Well the difference between the two doesn't really mean much for ints.
However, when using larger structures (or objects), the first method you used, pass by const reference, gives you access to the structure without needing to copy it. The second case, pass by value, will instantiate a new structure that has the same value as the argument.
In both cases you see this in the caller
myFunct(item);
To the caller, item will not be changed by myFunct, but the pass by reference will not incur the cost of creating a copy.
There is a very good answer to a similar question over at Pass by Reference / Value in C++
The difference between them is that one passes an int (which gets copied), and one uses the existing int. Since it's a const reference, it doesn't get changed, so it works much the same. The big difference here is that the function can alter the value of the int locally, but not the const reference. (I suppose some idiot could do the same thing with const_cast<>, or at least try to.) For larger objects, I can think of two differences.
First, some objects simply can't get copied, auto_ptr<>s and objects containing them being the obvious example.
Second, for large and complicated objects it's faster to pass by const reference than to copy. It's usually not a big deal, but passing objects by const reference is a useful habit to get into.
Either works fine. Don't waste your time worrying about this stuff.
The only time it might make a difference is when the type is a large struct, which might be expensive to pass on the stack. In that case, passing the arg as a pointer or a reference is (slightly) more efficient.
The problem appears when you are passing objects. If you pass by value, the copy constructor will be called. If you haven't implemented one, a shallow copy of that object will be passed to the function.
Why is this a problem? If the object holds pointers to dynamically allocated memory, that memory could be freed when the destructor of the copy is called (when the copy leaves the function's scope). Then, when the destructor runs again for the original object, you'll have a double free.
Moral: Write your copy constructors.
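A minimal sketch of the double free and the fix (the Rule of Three: if you write a destructor, you almost certainly need a copy constructor and copy assignment too); the Buffer type is purely illustrative:
#include <algorithm>
#include <cstddef>

struct Buffer {
    int* data;
    std::size_t size;

    explicit Buffer(std::size_t n) : data(new int[n]), size(n) {}
    ~Buffer() { delete[] data; }

    // Without this, the compiler-generated copy would copy just the pointer,
    // and the copy's destructor would free memory the original still owns.
    Buffer(const Buffer& other) : data(new int[other.size]), size(other.size) {
        std::copy(other.data, other.data + other.size, data);
    }
    Buffer& operator=(const Buffer& other);  // should be provided as well
};

void use(Buffer b) {}   // pass by value: the copy constructor runs here

int main() {
    Buffer b(16);
    use(b);             // safe only because Buffer defines a deep copy
    return 0;
}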