STL swap on return? - c++

Sorry for such a long question, but I am trying to be as clear as possible. This follows on from my previous question about strings in C++. I'm trying to figure out how I could return a std::string from a function without redundant memory allocations and without relying on NRVO. The reasons why I don't want to rely on NRVO are:
it is not supported by the compiler we currently use
even when it is supported it might not always be enabled in Debug mode
it might fail in some cases (example)
Please note that I need a C++03 compatible solution (no C++0x rvalue references thus, unfortunately...)
The simplest way to do this is to pass by reference and swap, like this:
void test(std::string& res)
{
    std::string s;
    //...
    res.swap(s);
}
But it is more natural and often more convenient to return by value than pass by reference, so what I want to achieve is this:
std::string test()
{
    std::string s;
    //...
    return SOMETHING(s);
}
Ideally it would just do a swap with the "return value", but I don't see how to do this in C++. There is auto_ptr already which does move instead of copy, and I could actually use auto_ptr<string>, but I'd like to avoid dynamically allocating the string object itself.
My idea is to somehow "tag" a string object that it is being returned from a function to permit moving its data when a copy constructor is called on return. So I ended up with this code, which does exactly what I want:
struct Str
{
    struct Moveable
    {
        Str & ref;
        explicit Moveable(Str & other): ref(other) {}
    };
    Str() {}
    Str(const std::string& other) : data(other) {} // copy
    Str(Moveable& other) { data.swap(other.ref.data); } // move
    Moveable Move()
    {
        return Moveable(*this);
    }
    std::string data;
};
Str test()
{
    Str s;
    //...
    return s.Move(); // no allocation, even without NRVO
}
So... does all this make sense, or are there some serious issues that I'm missing? (I'm not sure whether there is a lifetime issue, for example.) Maybe you have already seen such an idea in a library (book, article...) and could give me a reference to it?
EDIT: As @rstevens noticed, this code is MSVC-specific and won't compile under g++, which does not accept binding a temporary to a non-const reference. This is an issue, but let's just assume this implementation is MSVC-specific.

The implementation of boost uses move semantics emulation internally for libraries like Boost.Thread. You may want to look at the implementation and do something similar.
Edit: there is actually an active development of a library Boost.Move, so you can already start using it.

Did you check this code on g++?
You call Str(Moveable&) with a temporary object (the one returned by s.Move()), and a temporary cannot bind to a non-const reference.
This is not standard-compliant and is not supported by g++. It is supported by MSVC! (MS calls this a feature...).
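One way the idea might be made acceptable to g++ is to take the helper by value instead of by non-const reference, since a temporary can initialize a by-value parameter. The following is only a sketch under that assumption: make_str and demo_str are hypothetical names, and before C++17 it still depends on the compiler whether the final copy out of the converted temporary is elided.

```cpp
#include <cassert>
#include <string>

struct Str
{
    struct Moveable
    {
        Str& ref;
        explicit Moveable(Str& other) : ref(other) {}
    };

    Str() {}
    Str(const std::string& other) : data(other) {}      // copy
    // By-value parameter: a temporary Moveable can be passed here,
    // which is not allowed for a Moveable& parameter.
    Str(Moveable other) { data.swap(other.ref.data); }  // "move" via swap

    Moveable Move() { return Moveable(*this); }

    std::string data;
};

// Hypothetical helper, for illustration only.
Str make_str()
{
    Str s;
    s.data = "hello";
    // The return value is initialized before s is destroyed,
    // so the reference held by Moveable is still valid during the swap.
    return s.Move();
}

// Hypothetical usage check.
void demo_str()
{
    Str r = make_str();
    assert(r.data == "hello");
}
```

The swap itself never allocates; what this sketch does not guarantee on a pre-C++17 compiler is that the temporary Str created by the conversion is not copied once more.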

Have you actually determined that returning by value is a performance problem in your application? That seems like the simplest/easiest way to go, and when you upgrade to a more modern compiler you can use rvalue references.
I can't answer the question regarding the destruction order of s versus the Moveable's reference. You could, for your compiler, put code in the various constructors and destructors to see what the order is. Even if it looks OK, I'd still consider using one of the normal patterns you outlined, just to prevent reader confusion and possible breakage on another compiler.


why should we use std::move semantic with unique pointers?

Conceptual Question
Say we have simple example like this:
void foo(std::unique_ptr<int> ptr)
{
    std::cout << *ptr.get() << std::endl;
}
int main()
{
    std::unique_ptr<int> uobj = std::make_unique<int>(4);
    foo(uobj); // line-(1) Problem, but Alternative -> foo(std::move(uobj))
    std::unique_ptr<int> uobjAlt = uobj; // line-(2) Problem, but Alternative -> std::unique_ptr<int> uobjAlt = std::move(uobj);
    return EXIT_SUCCESS;
}
We know that std::unique_ptr models a resource with a single owner, where ownership can be moved from one owner to another, while shared_ptr models the opposite: shared ownership.
In the example above, line-(1) and line-(2) violate the rules of the standard, because std::unique_ptr has its copy constructor and copy assignment operator deleted. To avoid the compilation errors we have to use std::move instead.
Problem
Why can't a modern C++ compiler automatically generate the instructions to move the resource between unique pointers in line-(1) and line-(2), given that the unique pointer is intentionally designed for that? Why do we have to use std::move explicitly to tell the machine to move ownership of the resource?
std::unique_ptr is nothing but a class template, we know that. But in the situations in line-(1) and line-(2) the compiler complains that copying unique pointers is not allowed (deleted functions). Why do we get these errors, and why can't the C++ standard and compiler vendors override this behavior?
The unique pointer is intentionally designed to move a resource while passing on its ownership. When we pass it as a function or constructor argument, or assign it to another unique pointer, it should conceptually move the resource together with its ownership and nothing else. So why do we have to use std::move to tell the compiler to actually move, and why can't we write line-(1) and line-(2) as they are, with an intelligent compiler generating the move between unique pointers for us (unless the argument is passed by const or non-const reference)?
(Sorry for long description and broken English) Thank you.
unique_ptr is useful to free memory for you automatically when uobj goes out of scope. That's its job. So, since it has 1 pointer it has to free, it has to be unique, and hence its name: unique_ptr!
When you do something like this:
std::unique_ptr<int> uobjAlt = uobj;
You're issuing a copy operation, but you're not supposed to copy the pointer: after a copy, uobjAlt and uobj would both try to free the same memory, and that double free is undefined behavior that typically ends in a crash. By using std::move, you transfer ownership from one object to the other instead.
If you want to have multiple pointers to a single object, you should consider using std::shared_ptr.
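As a small illustration of that ownership transfer, the sketch below passes a unique_ptr to a by-value parameter and checks that the source is empty afterwards. The names consume and demo_move are hypothetical, and new int(4) is used instead of std::make_unique only so that the sketch also builds as C++11.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical sink: taking the pointer by value means the caller gives up ownership.
int consume(std::unique_ptr<int> ptr)
{
    return *ptr;   // the int is freed when ptr goes out of scope here
}

// Returns true when the source pointer is empty after the explicit move.
bool demo_move()
{
    std::unique_ptr<int> uobj(new int(4));
    int value = consume(std::move(uobj));   // ownership moves into the parameter
    assert(value == 4);
    return uobj == nullptr;                 // a moved-from unique_ptr holds nullptr
}
```

Exactly one object owns the int at any time, so it is freed exactly once.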
This has nothing to do with whether the compiler can do this. It certainly could work that way, and in fact, it did work that way prior to C++11 with std::auto_ptr<>. It was horrible.
std::auto_ptr<int> x = std::auto_ptr<int>(new int(5));
std::auto_ptr<int> y = x;
// Now, x is NULL
The problem here is that the = sign usually means "copy from x to y", but in this case what is happening is "move from x to y, invalidating x in the process". Yes, if you are a savvy programmer you would understand what is going on here and it wouldn't surprise you, at least not all of the time. However, in more common situations it would be horribly surprising:
Here's MyClass.h:
class MyClass {
private:
    std::auto_ptr<Type> my_field;
    ...
};
Here's MyClass.cpp:
void MyClass::Method() {
    SomeFunction(my_field);
    OtherFunction(my_field);
}
Here's Functions.h:
// Which overload, hmm?
void SomeFunction(Type &x);
void SomeFunction(std::auto_ptr<Type> x);
void OtherFunction(const std::auto_ptr<Type> &x);
Now you have to look at three different files before you can figure out that my_field is set to NULL. With std::unique_ptr you only have to look at one:
void MyClass::Method() {
    SomeFunction(std::move(my_field));
    OtherFunction(my_field);
}
Just looking at this one function I know that it's wrong, I don't have to figure out which overload is being used for SomeFunction, and I don't have to know what the type of my_field is. There's definitely a balance that we need to have between making things explicit and implicit. In this case, the fact that you couldn't explicitly tell the difference between moving and copying a value in C++ was such a problem that rvalue references, std::move, std::unique_ptr, etc. were added to C++ to clear things up, and they're pretty amazing.
The other reason why auto_ptr was so bad is because it interacted poorly with containers.
// This was a recipe for disaster
std::vector<std::auto_ptr<Type> > my_vector;
In general, many templates worked poorly with auto_ptr, not just containers.
If the compiler were allowed to auto-infer move semantics for types such as std::unique_ptr, code like this would break:
template<typename T> void simple_swap(T& a, T& b) {
    T tmp = a;
    a = b;
    b = tmp;
}
The above counts on tmp being a copy of a (because it continues to use a as the left-hand side of an assignment operator). There is code in the standard algorithms which actually requires temporary copies of container values. Inferring moves would break them, causing crashes at run-time. This is why people were warned never to use std::auto_ptr in STL containers.
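For contrast, here is a sketch of the same swap written with explicit moves, next to the copy-based version from above. move_swap and demo_swap are hypothetical names; the standard library's std::swap already does this, so the sketch only shows why the move has to be visible in the source.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Copy-based swap from above: tmp must be a real copy of a.
template <typename T>
void simple_swap(T& a, T& b)
{
    T tmp = a;
    a = b;
    b = tmp;
}

// Move-based variant: every transfer of ownership is spelled out.
template <typename T>
void move_swap(T& a, T& b)
{
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}

bool demo_swap()
{
    int x = 1, y = 2;
    simple_swap(x, y);                      // copyable type: copies are fine

    std::unique_ptr<int> p(new int(10));
    std::unique_ptr<int> q(new int(20));
    move_swap(p, q);                        // move-only type: needs explicit moves
    assert(p && q);

    return x == 2 && y == 1 && *p == 20 && *q == 10;
}
```

Calling simple_swap(p, q) would not compile, which is the point: the author of simple_swap asked for copies, and the compiler does not silently turn them into moves.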

How to avoid the copy when I return

I have a function which returns a vector or set:
set<int> foo() {
    set<int> bar;
    // create and massage bar
    return bar;
}
set<int> afoo = foo();
In this case, I create a temporary memory space in function foo(), and then
assign it to afoo by copying. I really want to avoid this copy, any easy way I
can do this in C++11? I think this has to do with the rvalue thing.
OK, update to the question: If I am going to return an object defined by myself,
not the vector or set thing, does that mean I should define a move constructor?
like this:
class value_to_return {
public:
    value_to_return (value_to_return && other) {
        // how to write it here? I think std::move is supposed to be used?
    }
};
Thanks!
A modern C++ compiler will do the following, given a type T:
If T has an accessible copy or move constructor, the compiler may
choose to elide the copy. This is the so-called (named) return value
optimization (RVO), which was specified even before C++11 and is
supported by most compilers.
Otherwise, if T has a move constructor, T is moved(Since C++11).
Otherwise, if T has a copy constructor, T is copied.
Otherwise, a compile-time error is emitted.
Check out return value optimization. A modern compiler will optimize this situation, and in straightforward situations like these, no copy will be made on any of the major compilers.
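For the follow-up about a user-defined type, a move constructor could look roughly like the sketch below. The members size_ and data_ and the helper make_value are hypothetical, chosen only to have something worth stealing; for members of class type (a std::vector, say) you would write member_(std::move(other.member_)) in the initializer list instead.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class value_to_return
{
public:
    explicit value_to_return(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Move constructor: take over the buffer and leave other empty but valid.
    value_to_return(value_to_return&& other) noexcept
        : size_(other.size_), data_(other.data_)
    {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    // Copying is disabled to keep the sketch short.
    value_to_return(const value_to_return&) = delete;
    value_to_return& operator=(const value_to_return&) = delete;

    ~value_to_return() { delete[] data_; }   // delete[] on nullptr is a no-op

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

// Hypothetical factory: the local is either elided or moved, never copied.
value_to_return make_value()
{
    value_to_return v(3);
    return v;
}

void demo_value()
{
    value_to_return a = make_value();
    value_to_return b(std::move(a));         // explicit move from a named object
    assert(b.size() == 3 && a.size() == 0);
}
```

Note that no std::move is needed in the return statement; returning a local by name already lets the compiler elide or move it.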
In principle, you could also create your object outside the function, and then call the function and pass the object to it by reference. That would be the old way of avoiding a copy, but it is unnecessary and undesirable now.
I usually work around this by using a function signature like
void foo(set<int> *x)
Just pass it by reference or the other option is already mentioned in the comment.
Edit: I have changed the argument type to illustrate that x could be changed.
set<int> s;
foo(&s);
This is only preferred when you have an old compiler. I suppose that could be the case with some of the projects.
The better thing to do is either to use move semantics with C++11, or to go ahead and return the container and rely on RVO in modern compilers.

Return statement vs. accepting pointer to write into [duplicate]

Possible Duplicate:
In C++, is it still bad practice to return a vector from a function?
In terms of performance, when needing to return 'heavier' objects like std::vector or std::string from a function, is it recommended to use this form:
void func(std::vector<int> *dest)
{
}
instead of this form:
std::vector<int> func()
{
    std::vector<int> arr;
    // ...
    return arr;
}
I am assuming that the first form should be faster, but at the same time I've seen the second form often enough, the Qt API often returns a QString for example, probably because it is much more convenient or intuitive to use.
Also I've wondered if there are compiler optimizations which could remove the unnecessary copying of objects when using the return statement.
Edit
Are there any popular compilers still used today which do not perform the optimizations mentioned in the answers?
is it recommended to use [pass by pointer] instead of [return by value]?
No.
A modern C++ compiler performs named return value optimisation (NRVO) which effectively means that the compiler reliably elides the copy here. No copy is performed.
Note that this is regardless of which C++ version you are using: C++03 does it as well as C++11. The only thing that changed in C++11 is that the language makes it easier for libraries to make moving out of a value (as is happening here) efficient when no copy elision can be performed.
For return values, copy elision can normally be performed – it’s more relevant in other cases (passing parameters by value, for instance). There are exceptions though; the following code cannot use named return value optimisation. It can use a C++11 move though:
std::string foo() {
    std::string one = "Foo";
    std::string two = "Bar";
    if (rand() % 2 == 0)
        return one;
    else
        return two;
}
The reason is that now two code paths return different named objects; this prevents NRVO.
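To see the C++11 fallback at work, one could instrument a type and count constructor calls, as in the sketch below. Tracker, pick and demo_pick are hypothetical names, and a bool parameter stands in for rand(). Only the copy count is checked, because the number of moves depends on the compiler and on the language version.

```cpp
#include <cassert>

// Hypothetical instrumented type that counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Two named locals on different return paths, mirroring foo() above.
Tracker pick(bool first)
{
    Tracker one;
    Tracker two;
    if (first)
        return one;   // a returned local is treated as an rvalue: moved, not copied
    else
        return two;
}

void demo_pick()
{
    Tracker t = pick(true);
    (void)t;
    assert(Tracker::copies == 0);   // no copy constructor call on this path
}
```

Even when NRVO is not available, the copy constructor is never selected here, so for std::string the worst case is a cheap move.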
Return by value:
std::vector<int> func();
C++ allows for copy elision for situations just like this, and beyond that the new C++ defined move semantics to make those operations cheap. Compilers generally implement this well. (With copy elision, your local arr will actually end up being constructed right in the recipient call site variable. This situation is also known as "return value optimization".)
The rules allowing RVO and NRVO were present in the ARM (1990),
so it would be surprising if any compiler didn't implement them.
More importantly, using out parameters (either pointers or
references to non-const) is extremely unwieldy. Don't do it,
until the profiler says that you really are having time problems
because of the copying of the return value. At which point,
overload the function, along the lines of:
void func( std::vector<int>& dest )
{
    // ...
}
std::vector<int> func()
{
    std::vector<int> results;
    func( results );
    return results;
}
Then try both, at the location where the profiler says you are
having problems, and choose the one which solves the problems (if
it makes a difference).
I actually had to do this once, but that was sometime around
1991 or 1992. I haven't had to do it since, and for the last
couple of years, I've been working on some pretty performance
critical stuff; we still regularly return std::vector or our
in house Matrix classes. Without the advantages of C++11,
since not all of the targeted compilers support it.
std::vector<int> func();
With C++11, the above should be your function. The local std::vector object will be moved when the function returns. If possible, even move can be elided by the compiler.
In short, no worry. Return by value.
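A quick way to convince yourself is to record the address of the local vector's heap buffer and compare it with the buffer of the returned vector. This is a sketch assuming C++11 or later; g_buffer_inside and demo_func are hypothetical names used only for the check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical global, used only to record where the local vector's buffer lives.
const int* g_buffer_inside = nullptr;

std::vector<int> func()
{
    std::vector<int> arr(1000, 7);
    g_buffer_inside = arr.data();   // remember the heap buffer address
    return arr;                     // elided or moved, so the buffer is not duplicated
}

void demo_func()
{
    std::vector<int> result = func();
    assert(result.size() == 1000);
    assert(result.data() == g_buffer_inside);   // same buffer: no element was copied
}
```

Whether the compiler elides the return or falls back to the move constructor, the 1000 elements stay where they were allocated.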

Trying to write string class that can do move semantics from an std::string

I am writing my own string class, really just for learning and cementing some knowledge. I have everything working, except that I want a constructor that uses move semantics with a std::string.
Within my constructor I need to copy and null out the std::string data pointers and other things, it needs to be left in an empty but valid state, without deleting the data the string points to, how do I do this?
So far I have this
class String
{
private:
    char* mpData;
    unsigned int mLength;
public:
    String( std::string&& str)
        :mpData(nullptr), mLength(0)
    {
        // need to copy the memory pointer from std::string to this->mpData
        // need to null out the std::string memory pointer
        //str.clear(); // can't use clear because it deletes the memory
    }
    ~String()
    {
        delete[] mpData;
        mLength = 0;
    }
};
There is no way to do this. The internals of std::string are left to the implementation, and every implementation is different.
Further, there is no guarantee that the string will be contained in a dynamically allocated array. Some std::string implementations perform a small string optimization, where small strings are stored inside of the std::string object itself.
The below implementation accomplishes what was requested, but at some risk.
Notes about this approach:
It uses std::string to manage the allocated memory. In my view, layering the allocation like this is a good idea because it reduces the number of things that a single class is trying to accomplish (but due to the use of a pointer, this class still has potential bugs associated with compiler-generated copy operations).
I did away with the delete operation since that is now performed automatically by the allocation object.
It will invoke so-called undefined behavior if mpData is used to modify the underlying data. It is undefined, as indicated here, because the standard says it is undefined. I wonder, though, if there are real-world implementations for which const char * std::string::data() behaves differently than T * std::vector::data() -- through which such modifications would be perfectly legal. It may be possible that modifications via data() would not be reflected in subsequent accesses to allocation, but based on the discussion in this question, it seems very unlikely that such modifications would result in unpredictable behavior assuming that no further changes are made via the allocation object.
Is it truly optimized for move semantics? That may be implementation defined. It may also depend on the actual value of the incoming string. As I noted in my other answer, the move constructor provides a mechanism for optimization -- but it doesn't guarantee that an optimization will occur.
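class String
{
private:
    std::string allocation; // declared first so it is initialized before mpData
    char* mpData;
    unsigned int mLength;
public:
    String( std::string&& str)
        : allocation(std::move(str)) // this is where the magic happens
        // Take the pointer from allocation, not from str: with the small string
        // optimization the characters may live inside the string object itself.
        , mpData(const_cast<char*>(allocation.data())) // cast away const; writing through mpData is the UB noted above
        , mLength(static_cast<unsigned int>(allocation.length()))
    {}
};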
I am interpreting the question as "can I make the move constructor result in correct behavior" and not "can I make the move constructor optimally fast".
If the question is strictly, "is there a portable way to steal the internal memory from std::string", then the answer is "no, because there is no 'transfer memory ownership' operation provided in the public API".
The following quote from this explanation of move semantics provides a good summary of "move constructors"...
C++0x introduces a new mechanism called "rvalue reference" which,
among other things, allows us to detect rvalue arguments via function
overloading. All we have to do is write a constructor with an rvalue
reference parameter. Inside that constructor we can do anything we
want with the source, as long as we leave it in some valid state.
Based on this description, it seems to me that you can implement the "move semantics" constructor (or "move constructor") without being obligated to actually steal the internal data buffers.
An example implementation:
String( std::string&& str)
    : mpData(new char[str.length()]), mLength(str.length())
{
    // copy the characters; the source is left unchanged, which is still a valid state
    for ( unsigned int i=0; i<mLength; i++ ) mpData[i] = str[i];
}
As I understand it, the point of move semantics is that you can be more efficient if you want to. Since the incoming object is transient, its contents do not need to be preserved -- so it is legal to steal them, but it is not mandatory. Maybe, there is no point to implementing this if you aren't transferring ownership of some heap-based object, but it seems like it should be legal. Perhaps it is useful as a stepping stone -- you can steal as much as is useful, even if that isn't the entire contents.
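Put together, a class along those lines might look like the sketch below: the std::string&& constructor copies the characters, while the class's own move constructor can really steal the buffer. This is an illustration under assumptions, not the asker's class: it uses std::size_t for the length, stores a terminating null, disables copying to stay short, and demo_string is a hypothetical usage check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

class String
{
public:
    // "Move" from std::string: the characters are copied, because the
    // std::string buffer cannot be taken over portably.
    String(std::string&& str)
        : mpData(new char[str.length() + 1]), mLength(str.length())
    {
        std::memcpy(mpData, str.c_str(), mLength + 1);   // includes the terminator
        str.clear();                                     // leave the source empty but valid
    }

    // Move from another String: here the buffer really can be stolen.
    String(String&& other) noexcept
        : mpData(other.mpData), mLength(other.mLength)
    {
        other.mpData = nullptr;
        other.mLength = 0;
    }

    String(const String&) = delete;
    String& operator=(const String&) = delete;

    ~String() { delete[] mpData; }

    std::size_t length() const { return mLength; }
    const char* c_str() const { return mpData ? mpData : ""; }

private:
    char* mpData;
    std::size_t mLength;
};

void demo_string()
{
    std::string src = "hello";
    String a(std::move(src));
    String b(std::move(a));
    assert(b.length() == 5 && a.length() == 0);
}
```

Clearing the source is optional here; it is done only to make the "empty but valid" state visible.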
By the way, there is a closely related question here in which the same kind of non-standard string is being built and includes a move constructor for std::string. The internals of the class are different however, and it is suggested that std::string may have built-in support for move semantics internally (std::string -> std::string).

C++ returning an object copy

I wrote the following code:
class MyObjectHolder {
public:
    std::vector<int> getMyObject() const {
        return myObject;
    }
private:
    std::vector<int> myObject;
};
At some point of my program I attempt to use the getMyObject method and use only const methods on the retrieved object:
const std::vector<int> myObject = myObjectHolder.getMyObject();
myObject.size();
int a = myObject.front();
Now, is it possible that the compiler will optimize this code so that no copies of the std::vector<int> are done?
Is it somehow possible that the compiler determines that I'm only using the const methods on the retrieved object (and let's assume there is no mutable nonsense happening behind it) and it would not make any copies of the objects and perform these const operations on the private member of the MyObjectHolder instead?
If yes, would it be possible if I didn't explicitly declare the const std::vector<int> myObject as const?
If no, what are the reasons not to do this? In which cases would this optimization be too hard to implement / deduce that it's possible and correct here / etc... ?
Now, is it possible that the compiler will optimize this code so that no copies of the std::vector<int> are done?
No, the compiler doesn't know what callers will do with that object unless you are making use of global optimization over all code that uses the object (the compiler can't generally make assumptions about its use; moreover if object is exported from a dll it can't make any assumption at all).
If yes, would it be possible if I didn't explicitly declare the const std::vector myObject as const?
No, anyway the conversion from non-const to const could be implicit.
If no, what are the reasons not to do this? In which cases would this optimization be too hard to implement / deduce that it's possible and correct here / etc... ?
It's an optimization that would have to be done inside getMyObject(), but the compiler can't be sure that callers won't cast away the const. Actually this is a very old debate about the use of const; I usually find it clearer to think of const as something for programmers and not for compilers.
I would suggest to use
const std::vector<int>& getMyObject() const {
    return myObject;
}
This returns a const reference to myObject without copying it.
And use the result with
const std::vector<int>& myObject = myObjectHolder.getMyObject();
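The difference between the two getters can be checked by comparing buffer addresses, as in this sketch. getMyObjectCopy, the constructor that fills the vector and demo_holder are hypothetical additions for the comparison. Keep in mind that the reference is only valid for as long as the holder object is alive.

```cpp
#include <cassert>
#include <vector>

class MyObjectHolder
{
public:
    MyObjectHolder() : myObject(3, 42) {}

    // Returns a copy: the caller gets its own vector and its own buffer.
    std::vector<int> getMyObjectCopy() const { return myObject; }

    // Returns a reference: no vector is constructed at all.
    const std::vector<int>& getMyObject() const { return myObject; }

private:
    std::vector<int> myObject;
};

void demo_holder()
{
    MyObjectHolder holder;
    const std::vector<int>& ref = holder.getMyObject();
    const std::vector<int> copy = holder.getMyObjectCopy();

    assert(ref.data() == holder.getMyObject().data());   // same underlying member
    assert(copy.data() != ref.data());                   // the copy owns a separate buffer
    assert(copy.front() == 42 && ref.size() == 3);
}
```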
It is possible that copy elision (return value optimization) will kick in for the returned temporary. If you use a compiler with C++11 support, move semantics may help as well, although a member returned by value from a const method still has to be copied once.
I'd recommend to read the excellent article Want Speed? Pass by Value by Dave Abrahams with discussion in the comments below it.
However, for details you should refer to the documentation of your C++ compiler.