Implementing rvalue references as parameters in function overloads - c++

I've already asked on code review and software engineering but the topic didn't fit the site, so I'm asking here hoping this is not opinion-based. I am an "old school" C++ developer (I've stopped at C++ 2003) but now I've read a few books on modern C++ 11/17 and I'm rewriting some libraries of mine.
The first thing I did was add move constructors and move assignment operators where needed ( = classes that already had a destructor, copy constructor and copy assignment operator). Basically I'm using the rule of five.
Most of my functions are declared like
func(const std::string& s);
Which is the common way to pass a reference avoiding a copy. However, there are now also move semantics, and there's something that I wasn't able to find in my books or online. This code:
void fun(std::string& x) {
x.append(" world");
std::cout << x;
}
int main()
{
std::string s{"Hello "};
fun(s);
}
Can also be written as:
void fun(std::string&& x) {
x.append(" world");
std::cout << x;
}
int main()
{
std::string s{"Hello "};
fun(std::move(s));
//or fun("Hello ");
// or fun(std::string {"Hello" });
}
My question is: when should I declare functions that accept a parameter that is an rvalue reference?
I understand the usage of && semantics on constructors and assignment operators but not really on functions. In the example above (first function) I have a std::string& x, which of course cannot be called as fun("Hello "); for that I would have to declare the parameter as const std::string& x. But then the const doesn't allow me to change the string!
Yes, I could use a const cast but I rarely do casts (and if it's the case, they're dynamic casts). The power of the && is that I avoid copies, I don't have to do something like
std::string x = "...";
fun(x); //void fun(std::string& x) {}
and I can pass temporary values that will be moved. Should I declare functions with rvalue references when possible?
I have a library that I'm rewriting with modern C++ 17 and I have functions like:
//only const-ref
Type1 func(const type2& x);
Type3 function(const type4& x);
I am asking if it's worth rewriting all of them as
//const-ref AND rvalue reference
Type1 func(const type2& x);
Type3 function(const type4& x);
Type1 func(type2&& x);
Type3 function(type4&& x);
I don't want to create too many overloads that may be useless, but if a user of my library wanted to use the move operation I would have to provide the && parameter types. Of course I am not doing this for primitive types (int, double, char...) but for containers or classes. What do you suggest?
I am not sure if the latter scenario (with both versions) would be useful or not.

Let me comment on four scenarios in your question and examples.
std::string_view with pass-by-value is supposed to replace const std::string& parameters and whenever you can guarantee the necessary preconditions for a safe usage of std::string_view (lifetime, pointee doesn't change), it's a good candidate to start modernizing your function signatures.
const T& vs. T&& (where T is not subject to template type deduction) with known usage scenarios. The void fun function that appends to a given, modifiable string only makes sense as void fun(std::string&&) if the calling code doesn't need the result after the call. In this case, the rvalue-reference signature documents this expectation nicely and is the way to go. But these cases are rather rare in my experience.
const T& vs. T&& (again, no type deduction) with unknown usage scenarios. A good reference here is std::vector::push_back, which is overloaded for both rvalue and lvalue references. The push_back operation is assumed to be cheap compared to move-constructing a T, which is why the overload makes sense. When a function is assumed to be more expensive than such a move-construction, passing the argument by value is a simplification that can make sense (see also Item 41 in EMC++).
const T& vs. T&& when type deduction takes place. Here, use universal references together with std::forward whenever possible and the parameters can't be const qualified. If they aren't modified in the function body, go with const T&.
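As a minimal sketch of the first scenario (the function name count_spaces is hypothetical and not taken from the question), a read-only string parameter can be taken as std::string_view by value in C++17. It assumes the precondition stated above: the viewed characters outlive the call and are not modified while it runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical read-only helper: takes a view by value instead of const std::string&.
// Assumed precondition: the viewed characters outlive the call and are not modified during it.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == ' ') ++n;
    }
    return n;
}

// Usage sketch: a std::string and a string literal are both accepted,
// and the literal does not require building a std::string first.
void demo() {
    const std::string s = "Hello modern world";
    assert(count_spaces(s) == 2);
    assert(count_spaces("a b c d") == 3);
}
```

Because the function only reads the characters, there is nothing to move from, so an rvalue-reference overload would add nothing here.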

You want to use rvalue references only if:
You might retain a copy and you need the extra performance (measure!)
Example for this would be writing a library type (e.g. std::vector) where performance matters to its users.
You want only temporaries to be passed to your function
Example for this is the move assignment operator: After the assignment, the original object's state will not exist anymore.
Forwarding references (T&& with T deduced) fall under the first option.

Rvalue reference (not to be confused with a forwarding reference!) in function arguments is used when there is a need to move ownership from one object to another.
It is true that it is often done in the context of move constructors/assignment operators, but this is not the only case. For example, a function accepting ownership of a std::unique_ptr could accept its argument by rvalue reference.
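A minimal sketch of such an ownership-taking function, assuming a hypothetical Registry class and member name adopt (neither appears in the question):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical owner type, used only for illustration.
class Registry {
    std::vector<std::unique_ptr<int>> items_;
public:
    // Rvalue-reference parameter: callers must write std::move(p) or pass a temporary,
    // which makes the ownership transfer visible at the call site.
    void adopt(std::unique_ptr<int>&& p) { items_.push_back(std::move(p)); }
    std::size_t size() const { return items_.size(); }
};

void demo() {
    Registry r;
    auto p = std::make_unique<int>(42);
    // r.adopt(p);           // would not compile: p is an lvalue
    r.adopt(std::move(p));   // ownership moves into the registry
    assert(p == nullptr);    // a moved-from unique_ptr is guaranteed to be null
    r.adopt(std::make_unique<int>(7));
    assert(r.size() == 2);
}
```

Taking the std::unique_ptr by value is a common alternative with a similar effect at the call site; which of the two reads better is largely a style choice.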

Related

Forwarding reference vs const lvalue reference in template code

I've recently been looking into forwarding references in C++ and below is a quick summary of my current understanding of the concept.
Let's say I have a template function foo taking a forwarding reference to a single argument of type T.
template<typename T>
void foo(T&& arg);
If I call this function with an lvalue then T will be deduced as T& making the arg parameter be of type T& due to the reference collapsing rules T& && -> T&.
If this function gets called with an unnamed temporary, such as the result of a function call, then T will be deduced as T making the arg parameter be of type T&&.
Inside foo however, arg is a named parameter so I will need to use std::forward if I want to pass the parameter along to other functions and still maintain its value category.
template<typename T>
void foo(T&& arg)
{
bar(std::forward<T>(arg));
}
As far as I understand the cv-qualifiers are unaffected by this forwarding. This means that if I call foo with a named const variable then T will be deduced as const T& and hence the type of arg will also be const T& due to the reference collapsing rules. For const rvalues T will be deduced as const T and hence arg will be of type const T&&.
This also means that if I modify the value of arg inside foo I will get a compile time error if I did in fact pass a const variable to it.
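The deduction rules summarized above can be checked with a small sketch. The helper names deduced_lvalue and deduced_const are hypothetical; they only report what T was deduced as.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Reports whether T was deduced as an lvalue reference (i.e. the argument was an lvalue).
template <typename T>
constexpr bool deduced_lvalue(T&&) {
    return std::is_lvalue_reference_v<T>;
}

// Reports whether the deduced T carries a const qualifier on the referred-to type.
template <typename T>
constexpr bool deduced_const(T&&) {
    return std::is_const_v<std::remove_reference_t<T>>;
}

void demo() {
    int x = 0;
    const int cx = 0;
    assert(deduced_lvalue(x));              // T = int&
    assert(!deduced_lvalue(42));            // T = int
    assert(deduced_lvalue(cx));             // T = const int&
    assert(deduced_const(cx));              // const is preserved by deduction
    assert(!deduced_const(x));
    assert(deduced_const(std::move(cx)));   // T = const int
}
```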
Now onto my question.
Assume I am writing a container class and want to provide a method for inserting objects into my container.
template<typename T>
class Container
{
public:
void insert(T&& obj) { storage[size++] = std::forward<T>(obj); }
private:
T *storage;
std::size_t size;
/* ... */
};
By making the insert member function take a forwarding reference to obj I can use std::forward to take advantage of the move assignment operator of the stored type T if insert was in fact passed a temporary object.
Previously, when I didn't know anything about forwarding references I would have written this member function taking a const lvalue reference:
void insert(const T& obj).
The downside of this is that this code does not take advantage of the (presumably more efficient) move assignment operator if insert was passed a temporary object.
Assuming I haven't missed anything.
Is there any reason to provide two overloads for the insert function? One taking a const lvalue reference and one taking a forwarding reference.
void insert(const T& obj);
void insert(T&& obj);
The reason I'm asking is that the reference documentation for std::vector states that the push_back method comes in two overloads.
void push_back (const value_type& val);
void push_back (value_type&& val);
Why is the first version (taking a const value_type&) needed?
You have to be careful about function templates, versus non-template methods of class templates. Your member insert is not itself a template. It's a method of a template class.
Container<int> c;
c.insert(...);
We can pretty easily see that T is not deduced on the second line, because it's already fixed to int on the first line, because T is a template parameter of the class, not the method.
Once the class has been instantiated, non-template methods of class templates differ from regular methods in only one way: they aren't instantiated unless they are actually called. This is useful because it allows a template class to work with types for which only some of the methods make sense (STL containers are full of examples like this).
The bottom line is that in my example above, since T is fixed to int, your method becomes:
void insert(int&& obj) { storage[size++] = std::forward<int>(obj); }
This is not a forwarding reference at all; it simply takes its argument by rvalue reference, i.e. it only binds to rvalues. That is why you typically see two overloads for things like push_back, one for lvalues and one for rvalues.
@Nir Friedman already answered the question, so I'm going to offer some additional advice.
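The point can be checked at compile time. The sketch below uses a hypothetical stand-in class RvalueOnly and a hypothetical detection trait can_insert; neither is part of the question's code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the question's Container: T is fixed by the class,
// so insert(T&&) takes a plain rvalue reference, not a forwarding reference.
template <typename T>
struct RvalueOnly {
    void insert(T&&) {}
};

// Detects whether c.insert(arg) compiles for an argument of type/category Arg.
template <typename C, typename Arg, typename = void>
struct can_insert : std::false_type {};

template <typename C, typename Arg>
struct can_insert<C, Arg,
    std::void_t<decltype(std::declval<C&>().insert(std::declval<Arg>()))>>
    : std::true_type {};

static_assert(can_insert<RvalueOnly<int>, int>::value, "rvalues bind to int&&");
static_assert(!can_insert<RvalueOnly<int>, int&>::value, "lvalues do not bind to int&&");
static_assert(!can_insert<RvalueOnly<int>, const int&>::value, "const lvalues do not bind either");
```

With only the T&& member, a call such as c.insert(x) for a named int x is rejected, which is the gap the const T& overload fills.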
If your Container class is not meant to store polymorphic types (which is common of containers, including std::vector and other similar STL containers), you can get away with simplifying your code, in the way you're trying to do in your original example.
Instead of:
void insert(T const& t) {
storage[size++] = t;
}
void insert(T && t) {
storage[size++] = std::move(t);
}
You could get perfectly correct code by writing the following instead:
void insert(T t) {
storage[size++] = std::move(t);
}
The reason for this is that if the object is being copied in, t will be copy-constructed with the object provided, and then move-assigned into storage[size++], whereas if the object is being moved in, t will be move-constructed with the object provided, and then move-assigned into storage[size++]. So you've simplified your code at the cost of a single extra move-assignment, which many compilers will happily optimize out.
There is a major downside to this approach, though: If the object defines a copy-constructor and doesn't define a move-constructor (common for older types in legacy code), this results in double-copies in all cases. Your compiler might be able to optimize it away (because compilers can optimize to completely different code so long as the user-visible effects are unchanged), but maybe not. That could be a significant performance hit if you have to work with heavy objects that don't implement move-semantics. This is probably the reason STL containers don't use this technique (they value performance over brevity). But if you're looking for a way to reduce the amount of boilerplate code you write, and aren't worried about having to use "copy-only" objects, then this will probably work fine for you.
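To make the trade-off concrete, here is a sketch that counts copies and moves through a single by-value insert. Tracker, Counts and Slot are hypothetical instrumented types introduced only for this illustration.

```cpp
#include <cassert>
#include <utility>

struct Counts { int copies = 0; int moves = 0; };

// Hypothetical instrumented type that records every copy and move it undergoes.
struct Tracker {
    Counts* c;
    explicit Tracker(Counts* counts) : c(counts) {}
    Tracker(const Tracker& o) : c(o.c) { ++c->copies; }
    Tracker(Tracker&& o) noexcept : c(o.c) { ++c->moves; }
    Tracker& operator=(const Tracker& o) { c = o.c; ++c->copies; return *this; }
    Tracker& operator=(Tracker&& o) noexcept { c = o.c; ++c->moves; return *this; }
};

// Single by-value "sink" overload, in the style described above.
struct Slot {
    Tracker value;
    explicit Slot(Counts* counts) : value(counts) {}
    void insert(Tracker t) { value = std::move(t); }
};

void demo() {
    Counts n;
    Slot slot(&n);
    Tracker t(&n);
    slot.insert(t);             // lvalue: copy-construct the parameter, then move-assign
    assert(n.copies == 1 && n.moves == 1);
    slot.insert(std::move(t));  // rvalue: move-construct the parameter, then move-assign
    assert(n.copies == 1 && n.moves == 3);
}
```

With the two-overload version, the same calls would cost one copy assignment and one move assignment respectively, so the by-value form adds one move per call for a movable type.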

C++ reference for both LValue and Rvalue without type deduction

I was reading a good tutorial on lvalue/rvalue references. If I've understood correctly when there is type deduction something like T&& can accept both an lvalue and an rvalue.
But is there a way to achieve that without a generic class? I'd like to avoid duplicating all my methods for accepting both lvalues and rvalues. And of course avoid passing big objects by value.
R-value references are mostly used in move constructors and move assignment operators.
For regular method, you may stick with one reference type only:
For read only parameter (without copy), const reference is enough.
if you have to do a copy, you may take your argument by value and use std::move:
Example:
class Test
{
public:
void displayString(const std::string& s) const { std::cout << s << m_s; }
void setString(std::string s) { m_s = std::move(s); }
private:
std::string m_s;
};
If the function that you implement does not need rvalue semantic, then you can simply pass the argument by reference or by constant reference.
However, if you can take advantage of rvalues and do not want to duplicate your code, you can pass by value and move the result. That should be almost as efficient and can be more maintainable than code duplication or an implementation with universal references.
This answer shows the technique: Should all/most setter functions in C++11 be written as function templates accepting universal references?
// copy, then move
void set_a(A a_) { a = std::move(a_); }

Pass by value or rvalue-ref

For move enabled classes is there a difference between this two?
struct Foo {
typedef std::vector<std::string> Vectype;
Vectype m_vec;
//this or
void bar(Vectype&& vec)
{
m_vec = std::move(vec);
}
//that
void bar(Vectype vec)
{
m_vec = std::move(vec);
}
};
int main()
{
Foo::Vectype myvec{"alpha","beta","gamma"};
Foo fool;
fool.bar(std::move(myvec));
}
My understanding is that if you pass an lvalue myvec, you are also required to introduce a const Vectype& version of Foo::bar(), since Vectype&& won't bind to it. That aside, in the rvalue case, Foo::bar(Vectype) will construct the vector using the move constructor, or better yet elide the copy altogether, seeing that vec is an rvalue (would it?). So is there a compelling reason not to prefer the by-value declaration over lvalue and rvalue overloads?
(Consider I need to copy the vector to the member variable in any case.)
The pass-by-value version allows an lvalue argument and makes a copy of it. The rvalue-reference version can't be called with an lvalue argument.
Use const Type& when you don't need to change or copy the argument at all, use pass-by-value when you want a modifiable value but don't care how you get it, and use Type& and Type&& overloads when you want something slightly different to happen depending on the context.
The pass-by-value function is sufficient (and equivalent), as long as the argument type has an efficient move constructor, which is true in this case for std::vector.
Otherwise, using the pass-by-value function may introduce an extra copy-construction compared to using the pass-by-rvalue-ref function.
See the answer https://stackoverflow.com/a/7587151/1190077 to the related question Do I need to overload methods accepting const lvalue reference for rvalue references explicitly? .
Yes, the first one (Vectype&& vec) won't accept a const object or any other lvalue.
If you want to save the object inside like you do, it's best to copy(or move if you pass an rvalue) in the interface and then move, just like you did in your second example.

"Reference qualifier correctness" or should a non-const method ever apply to rvalues?

Now that GCC 4.8.1 and Clang 2.9 and higher support them, reference qualifiers (also known as "rvalue references for *this") have become more widely available. They allow classes to behave even more like built-in types by, e.g., disallowing assignment to rvalues (which can otherwise cause an unwanted cast of an rvalue to an lvalue):
class A
{
// ...
public:
A& operator=(A const& o) &
{
// ...
return *this;
}
};
In general, it is sensible to call a const member function of an rvalue, so an lvalue reference qualifier would be out of place (unless the rvalue qualifier can be used for an optimization such as moving a member out of a class instead of returning a copy).
On the flip side, mutating operators such as the pre-decrement/pre-increment operators should be lvalue-qualified, as they usually return an lvalue reference to the object. Hence also the question: Are there any reasons to ever allow mutating/non-const methods (including operators) to be called on rvalue references, aside from conceptually const methods which are only not marked const because const-correctness (including proper application of mutable when using an internal cache, which may include ensuring certain thread-safety guarantees now) was neglected in the code base?
To clarify, I am not suggesting to forbid mutating methods on rvalues on the language level (at the very least this could break legacy code) but I believe that defaulting (as an idiom / coding style) to only allowing lvalues for mutating methods will generally lead to cleaner, safer APIs. However I am interested in examples where not doing so leads to cleaner, less astonishing APIs.
A mutator that operates on an R-value can be useful if the R-value is used to accomplish some task, but in the interim it maintains some state. For example:
struct StringFormatter {
StringFormatter &addString(string const &) &;
StringFormatter &&addString(string const &) &&;
StringFormatter &addNumber(int) &;
StringFormatter &&addNumber(int) &&;
string finish() &;
string finish() &&;
};
int main() {
string message = StringFormatter()
.addString("The answer is: ")
.addNumber(42)
.finish();
cout << message << endl;
}
By allowing either an L-value or an R-value, one can construct an object, pass it through some mutators, and use the result of the expression to accomplish some task without having to store it in an L-value, even if the mutators are member functions.
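Since the StringFormatter above is declaration-only, here is a runnable sketch of the same idea. The class name Formatter and the single add member are simplifications assumed for illustration, not the answerer's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified variant of the StringFormatter interface above.
class Formatter {
    std::string buf_;
public:
    // Lvalue object: mutate in place and return an lvalue reference.
    Formatter& add(const std::string& s) & { buf_ += s; return *this; }
    // Rvalue object: mutate, then keep the rvalue category for the next call in the chain.
    Formatter&& add(const std::string& s) && { buf_ += s; return std::move(*this); }
    // Lvalue object: the buffer is still needed afterwards, so copy it.
    std::string finish() const & { return buf_; }
    // Rvalue object: the buffer is about to be destroyed, so move it out.
    std::string finish() && { return std::move(buf_); }
};

void demo() {
    std::string msg = Formatter().add("The answer is: ").add("42").finish();
    assert(msg == "The answer is: 42");

    Formatter f;
    f.add("kept");
    assert(f.finish() == "kept");   // copies; f still holds its buffer
    assert(f.finish() == "kept");
}
```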
Also note that not all mutating operators return a reference to the self. User-defined mutators can implement any signature they need or want. A mutator may consume the state of the object to return something more useful, and by acting on an R-value, the fact that the object is consumed isn't a problem since the state would have otherwise been discarded. In fact, a member function that consumes the state of the object to produce something else useful will have to be marked as such, making it easier to see when l-values are consumed. For example:
MagicBuilder mbuilder("foo", "bar");
// Shouldn't compile (because it silently consumes mbuilder's state):
// MagicThing thing = mbuilder.construct();
// Good (the consumption of mbuilder is explicit):
MagicThing thing = move(mbuilder).construct();
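A sketch of how such a consuming member can be declared, assuming a hypothetical Builder class that returns a std::string (MagicBuilder's real interface is not shown in the answer). Providing only the &&-qualified overload is what rejects the lvalue call.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical builder; only the &&-qualified construct() exists,
// so it can be called on rvalues only.
class Builder {
    std::string a_, b_;
public:
    Builder(std::string a, std::string b) : a_(std::move(a)), b_(std::move(b)) {}
    std::string construct() && { return std::move(a_) + std::move(b_); }
};

// Detects whether construct() is callable on an object expression of category B.
template <typename B, typename = void>
struct can_construct : std::false_type {};
template <typename B>
struct can_construct<B, std::void_t<decltype(std::declval<B>().construct())>>
    : std::true_type {};

static_assert(!can_construct<Builder&>::value, "lvalue builder: rejected");
static_assert(can_construct<Builder&&>::value, "rvalue builder: accepted");

void demo() {
    Builder b("foo", "bar");
    std::string thing = std::move(b).construct();  // consumption is explicit
    assert(thing == "foobar");
}
```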
I think it comes about in cases where the only way to retrieve some value is by mutating another value. For instance, iterators don't provide a "+1" or a "next" method. So suppose I'm constructing a wrapper for stl list iterators (perhaps to create an iterator for my own list-backed data-structure):
template <typename T>
class my_iter {
private:
typename std::list<T>::iterator iter;
void assign_to_next(typename std::list<T>::iterator&& rhs) {
iter = std::move(++rhs);
}
};
Here, the assign_to_next method takes an iterator and assigns this one to have the next position after that one. It's not too hard to imagine situations where this might be useful, but more importantly there is nothing surprising about this implementation. True, we could also say iter = std::move(rhs); ++iter; or ++(iter = std::move(rhs));, but I don't see any arguments for why those would be any cleaner or faster. I think this implementation is the most natural to me.
FWIW HIC++ agrees with you as far as assignment operators:
http://www.codingstandard.com/rule/12-5-7-declare-assignment-operators-with-the-ref-qualifier/
Should a non-const method ever apply to rvalues?
This question puzzles me. A more sensible question to me would be:
Should a const method ever apply exclusively to rvalues?
To which I believe the answer is no. I can't imagine a situation in which you would want to overload on const rvalue *this, just as I can't imagine a situation in which you would want to overload on const rvalue arguments.
You overload on rvalues because it's possible to handle them more efficiently when you know that you can steal their guts, but you can't steal the guts of a const object.
There are four possible ways to overload on *this:
struct foo {
void bar() &;
void bar() &&;
void bar() const &;
void bar() const &&;
};
The constness of the latter two overloads means that neither one can mutate *this, so there can be no difference between what the const & overload is allowed to do to *this and what the const && overload is allowed to do to *this. In the absence of the const && overload, the const & will bind to both lvalues and rvalues anyway.
Given that overloading on const && is useless and only really provided for completeness (prove me wrong!) we are left with only one remaining use case for ref-qualifiers: overloading on non-const rvalue *this. One can define a function body for a && overload, or one can = delete it (this happens implicitly if only a & overload is provided). I can imagine plenty of cases in which defining a && function body might be useful.
A proxy object which implements pointer semantics by overloading operator-> and unary operator*, such as boost::detail::operator_arrow_dispatch, might find it useful to use ref-qualifiers on its operator*:
template <typename T>
struct proxy {
proxy(T obj) : m_obj(std::move(obj)) {}
T const* operator->() const { return &m_obj; }
T operator*() const& { return m_obj; }
T operator*() && { return std::move(m_obj); }
private:
T m_obj;
};
If *this is a rvalue then operator* can return by move instead of by copy.
I can imagine functions that move from the actual object to a parameter.

C++0x rvalue references - lvalues-rvalue binding

This is a follow-on question to
C++0x rvalue references and temporaries
In the previous question, I asked how this code should work:
void f(const std::string &); //less efficient
void f(std::string &&); //more efficient
void g(const char * arg)
{
f(arg);
}
It seems that the move overload should probably be called because of the implicit temporary, and this happens in GCC but not MSVC (or the EDG front-end used in MSVC's Intellisense).
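Under the overload resolution rules as standardized in C++11, binding an rvalue reference to an rvalue is preferred over binding a const lvalue reference to it, so the second overload is selected for the implicit temporary. A small sketch, with the hypothetical enum Picked added only to make the selected overload observable:

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class Picked { ConstLvalueRef, RvalueRef };

Picked f(const std::string&) { return Picked::ConstLvalueRef; }
Picked f(std::string&&)      { return Picked::RvalueRef; }

// Mirrors g() from the question: the const char* is converted to a temporary std::string.
Picked g(const char* arg) { return f(arg); }

void demo() {
    std::string s = "named";
    const std::string cs = "const";
    assert(g("literal") == Picked::RvalueRef);       // implicit temporary
    assert(f(s) == Picked::ConstLvalueRef);          // named lvalue
    assert(f(cs) == Picked::ConstLvalueRef);         // const lvalue
    assert(f(std::move(s)) == Picked::RvalueRef);    // explicit std::move
}
```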
What about this code?
void f(std::string &&); //NB: No const string & overload supplied
void g1(const char * arg)
{
f(arg);
}
void g2(const std::string & arg)
{
f(arg);
}
It seems that, based on the answers to my previous question that function g1 is legal (and is accepted by GCC 4.3-4.5, but not by MSVC). However, GCC and MSVC both reject g2 because of clause 13.3.3.1.4/3, which prohibits lvalues from binding to rvalue ref arguments. I understand the rationale behind this - it is explained in N2831 "Fixing a safety problem with rvalue references". I also think that GCC is probably implementing this clause as intended by the authors of that paper, because the original patch to GCC was written by one of the authors (Doug Gregor).
However, I don't think this is quite intuitive. To me, (a) a const string & is conceptually closer to a string && than a const char *, and (b) the compiler could create a temporary string in g2, as if it were written like this:
void g2(const std::string & arg)
{
f(std::string(arg));
}
Indeed, sometimes the copy constructor is considered to be an implicit conversion operator. Syntactically, this is suggested by the form of a copy constructor, and the standard even mentions this specifically in clause 13.3.3.1.2/4, where the copy constructor for derived-base conversions is given a higher conversion rank than other user-defined conversions:
A conversion of an expression of class type to the same class type is given Exact Match rank, and a conversion
of an expression of class type to a base class of that type is given Conversion rank, in spite of the fact that
a copy/move constructor (i.e., a user-defined conversion function) is called for those cases.
(I assume this is used when passing a derived class to a function like void h(Base), which takes a base class by value.)
Motivation
My motivation for asking this is something like the question asked in "How to reduce redundant code when adding new c++0x rvalue reference operator overloads".
If you have a function that accepts a number of potentially-moveable arguments, and would move them if it can (e.g. a factory function/constructor: Object create_object(string, vector<string>, string) or the like), and want to move or copy each argument as appropriate, you quickly start writing a lot of code.
If the argument types are movable, then one could just write one version that accepts the arguments by value, as above. But if the arguments are (legacy) non-movable-but-swappable classes a la C++03, and you can't change them, then writing rvalue reference overloads is more efficient.
So if lvalues did bind to rvalues via an implicit copy, then you could write just one overload like create_object(legacy_string &&, legacy_vector<legacy_string> &&, legacy_string &&) and it would more or less work like providing all the combinations of rvalue/lvalue reference overloads - actual arguments that were lvalues would get copied and then bound to the arguments, actual arguments that were rvalues would get directly bound.
Clarification/edit: I realize this is virtually identical to accepting arguments by value for movable types, like C++0x std::string and std::vector (save for the number of times the move constructor is conceptually invoked). However, it is not identical for copyable, but non-movable types, which includes all C++03 classes with explicitly-defined copy constructors. Consider this example:
class legacy_string { legacy_string(const legacy_string &); }; //defined in a header somewhere; not modifiable.
void f(legacy_string s1, legacy_string s2); //A *new* (C++0x) function that wants to move from its arguments where possible, and avoid copying
void g() //A C++0x function as well
{
legacy_string x(/*initialization*/);
legacy_string y(/*initialization*/);
f(std::move(x), std::move(y));
}
If g calls f, then x and y would be copied - I don't see how the compiler can move them. If f were instead declared as taking legacy_string && arguments, it could avoid those copies where the caller explicitly invoked std::move on the arguments. I don't see how these are equivalent.
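The difference described here can be observed with a counting sketch. Legacy, by_value and by_rref are hypothetical names; Legacy plays the role of legacy_string, with a user-declared copy constructor and therefore no implicit move constructor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C++03-style class: a user-declared copy constructor and no move constructor.
struct Legacy {
    static inline int copies = 0;   // C++17 inline variable, used for counting only
    Legacy() = default;
    Legacy(const Legacy&) { ++copies; }
};

void by_value(Legacy) {}    // the parameter must be constructed from the argument
void by_rref(Legacy&&) {}   // the parameter binds directly to the argument

void demo() {
    Legacy::copies = 0;
    Legacy x;
    by_value(std::move(x));         // no move constructor: falls back to a copy
    assert(Legacy::copies == 1);
    by_rref(std::move(x));          // no construction at all
    assert(Legacy::copies == 1);
}
```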
Questions
My questions are then:
Is this a valid interpretation of the standard? It seems that it's not the conventional or intended one, at any rate.
Does it make intuitive sense?
Is there a problem with this idea that I'm not seeing? It seems like you could get copies being quietly created when that's not exactly expected, but that's the status quo in places in C++03 anyway. Also, it would make some overloads viable when they're currently not, but I don't see it being a problem in practice.
Is this a significant enough improvement that it would be worth making e.g. an experimental patch for GCC?
What about this code?
void f(std::string &&); //NB: No const string & overload supplied
void g2(const std::string & arg)
{
f(arg);
}
...However, GCC and MSVC both reject g2 because of clause 13.3.3.1.4/3, which prohibits lvalues from binding to rvalue ref arguments. I understand the rationale behind this - it is explained in N2831 "Fixing a safety problem with rvalue references". I also think that GCC is probably implementing this clause as intended by the authors of that paper, because the original patch to GCC was written by one of the authors (Doug Gregor)....
No, that's only half of the reason why both compilers reject your code. The other reason is that you can't initialize a reference to non-const with an expression referring to a const object. So, even before N2831 this didn't work. There is simply no need for a conversion because a string is already a string. It seems you want to use string&& like string. Then, simply write your function f so that it takes a string by value. If you want the compiler to create a temporary copy of a const string lvalue just so you can invoke a function taking a string&&, there wouldn't be a difference between taking the string by value or by rref, would it?
N2831 has little to do with this scenario.
If you have a function that accepts a number of potentially-moveable arguments, and would move them if it can (e.g. a factory function/constructor: Object create_object(string, vector<string>, string) or the like), and want to move or copy each argument as appropriate, you quickly start writing a lot of code.
Not really. Why would you want to write a lot of code? There is little reason to clutter all your code with const&/&& overloads. You can still use a single function with a mix of pass-by-value and pass-by-ref-to-const -- depending on what you want to do with the parameters. As for factories, the idea is to use perfect forwarding:
template<class T, class... Args>
unique_ptr<T> make_unique(Args&&... args)
{
T* ptr = new T(std::forward<Args>(args)...);
return unique_ptr<T>(ptr);
}
...and all is well. A special template argument deduction rule helps differentiating between lvalue and rvalue arguments and std::forward allows you to create expressions with the same "value-ness" as the actual arguments had. So, if you write something like this:
string foo();
int main() {
auto ups = make_unique<string>(foo());
}
the string that foo returned is automatically moved to the heap.
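To see that the value category really is preserved through such a factory, here is a sketch with a hypothetical probe type. make_owned has the same shape as the make_unique sketch above and is renamed only to avoid clashing with std::make_unique.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical probe type that remembers how it was constructed.
struct Probe {
    bool moved_in = false;
    Probe() = default;
    Probe(const Probe&) {}
    Probe(Probe&&) noexcept : moved_in(true) {}
};

// Same shape as the factory above, under a different (hypothetical) name.
template <class T, class... Args>
std::unique_ptr<T> make_owned(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

void demo() {
    Probe p;
    auto copied = make_owned<Probe>(p);             // lvalue: copy constructor
    assert(!copied->moved_in);
    auto moved = make_owned<Probe>(std::move(p));   // rvalue: move constructor
    assert(moved->moved_in);
    auto from_temp = make_owned<Probe>(Probe{});    // temporary: move constructor
    assert(from_temp->moved_in);
}
```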
So if lvalues did bind to rvalues via an implicit copy, then you could write just one overload like create_object(legacy_string &&, legacy_vector<legacy_string> &&, legacy_string &&) and it would more or less work like providing all the combinations of rvalue/lvalue reference overloads...
Well, and it would be pretty much equivalent to a function taking the parameters by value. No kidding.
Is this a significant enough improvement that it would be worth making e.g. an experimental patch for GCC?
There's no improvement.
I don't quite see your point in this question. If you have a class that is movable, then you just need a T version:
struct A {
T t;
A(T t):t(move(t)) { }
};
And if the class is traditional but has an efficient swap, you can write the swap version or you can fall back to the const T& way:
struct A {
T t;
A(T t) { swap(this->t, t); }
};
Regarding the swap version, I would rather go with the const T& way instead of that swap. The main advantage of the swap technique is exception safety and is to move the copy closer to the caller so that it can optimize away copies of temporaries. But what do you have to save if you are just constructing the object anyway? And if the constructor is small, the compiler can look into it and can optimize away copies too.
struct A {
T t;
A(T const& t):t(t) { }
};
To me, it doesn't seem right to automatically convert a string lvalue to an rvalue copy of itself just to bind to an rvalue reference. An rvalue reference says it binds to rvalues. If you try binding it to an lvalue of the same type, it had better fail. Introducing hidden copies to allow that doesn't sound right to me, because when people see an X&& and you pass an X lvalue, I bet most will expect that there is no copy and that the binding is direct, if it works at all. Better to fail straight away so the user can fix their code.