Please consider the following code:
struct MyStruct
{
int iInteger;
string strString;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct NewStruct = { 8, "Hello" };
vecStructs.push_back(std::move(NewStruct));
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}
Why does this work?
At the moment MyFunc is called, the return address should be placed on the stack of the current thread. Then the NewStruct object gets created, which should be placed on the stack as well. With std::move, I tell the compiler that I do not plan to use the NewStruct reference anymore. It can steal the memory. (The push_back function is the one with the move semantics.)
But when the function returns, NewStruct falls out of scope. Even if the compiler did not remove the memory occupied by the original structure from the stack, it at least has to remove the previously stored return address.
This would lead to a fragmented stack, and future allocations would overwrite the "moved" memory.
Can someone explain this to me, please?
EDIT:
First of all: Thank you very much for your answers.
But from what I have learned, I still cannot understand why the following does not work like I expect it to:
struct MyStruct
{
int iInteger;
string strString;
string strString2;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct oNewStruct = { 8, "Hello", "Definetly more than 16 characters" };
vecStructs.push_back(std::move(oNewStruct));
// At this point, oNewStruct.strString2 should be "", because its memory was stolen.
// But only when I explicitly create a move-constructor in the form which was
// stated by Yakk, it is really that case.
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}
First, std::move does not move, and std::forward does not forward.
std::move is a cast to an rvalue reference. By convention, rvalue references are treated as "references you are permitted to move the data out of, as the caller promises they really don't need that data anymore".
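To make the "it is only a cast" point concrete, here is a minimal sketch. The function names are made up for illustration. The first function applies std::move and nothing else, so the source string is untouched; the second passes the rvalue reference on to a move constructor, which is where any actual transfer happens.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Applies std::move without handing the result to anything that moves.
// Returns true if the string still holds its original text afterwards.
bool cast_alone_leaves_source_intact()
{
    std::string s = "a string that is long enough to need heap storage";
    const std::string before = s;

    // std::move(s) is just an expression of type std::string&&.
    static_assert(std::is_same<decltype(std::move(s)), std::string&&>::value,
                  "std::move yields an rvalue reference");

    std::string&& ref = std::move(s); // binds a reference, moves nothing
    (void)ref;

    return s == before;
}

// The move constructor of std::string is what actually takes the contents.
std::string consume(std::string&& src)
{
    std::string dst(std::move(src)); // src is left valid but unspecified
    return dst;
}
```

Calling consume(std::move(t)) returns the original text of t, while t itself should afterwards only be assigned to or destroyed.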
On the other side of the fence, rvalue references implicitly bind to the return value of std::move (and sometimes forward), to temporary objects, in certain cases when returning a local from a function, and when using a member of a temporary or a moved-from object.
What happens within the function taking an rvalue reference is not magic. It cannot claim the storage directly within the object in question. It can, however, tear out its guts; it has permission (by convention) to mess with its argument's internal state if it can do the operation faster that way.
Now, C++ will automatically write some move constructors for you.
struct MyStruct
{
int iInteger;
string strString;
};
In this case, it will write something that roughly looks like this:
MyStruct::MyStruct( MyStruct&& other ) noexcept(true) :
iInteger( std::move(other.iInteger) ),
strString( std::move(other.strString) )
{}
I.e., it will do an element-wise move construct.
When you move an integer, nothing interesting happens. There isn't any benefit to messing with the source integer's state.
When you move a std::string, we get some efficiencies. The C++ standard describes what happens when you move from one std::string to another. Basically, if the source std::string is using the heap, the heap storage is transferred to the destination std::string.
This is a general pattern of C++ containers; when you move from them, they steal the "heap allocated" storage of the source container and reuse it in the destination.
Note that the source std::string remains a std::string, just one that has its "guts torn out". Most container-like things are left empty; for std::string the standard only guarantees a valid but unspecified state, and it isn't important right now.
In short, when you move from something, its memory is not "reused", but memory it owns can be reused.
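A rough way to observe this is to compare the address of the string's character buffer before and after the move. The sketch below assumes a string long enough (64 characters) that mainstream implementations keep it on the heap; the helper name buffer_was_transferred is made up. The standard's wording does not spell out that the address is preserved, so treat the pointer comparison as an observation about typical implementations, not a guarantee.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct MyStruct
{
    int iInteger;
    std::string strString;
};

// Moves a local MyStruct into the vector and reports whether the string's
// character buffer kept its address, i.e. whether the heap block changed owner.
bool buffer_was_transferred(std::vector<MyStruct>& vecStructs)
{
    MyStruct local{8, std::string(64, 'x')};      // too long for a typical SBO buffer
    const char* before = local.strString.data();  // heap block owned by local

    vecStructs.push_back(std::move(local));       // implicit member-wise move constructor

    // The stack object `local` still exists until the function returns;
    // only the heap block it owned now belongs to the vector's element.
    return vecStructs.back().strString.data() == before;
}
```

The stack slot of local is released as usual on return. Nothing in the vector refers to it, which is why the stack does not get "fragmented".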
In your case, MyStruct has a std::string which can use heap allocated memory. This heap allocated memory can be moved into the MyStruct stored in the std::vector.
Going a bit further down the rabbit hole, "Hello" is likely to be so short that SBO occurs (small buffer optimization), and the std::string doesn't use the heap at all. For this particular case, there may be next to no performance improvement due to moving.
Your example can be reduced to:
vector<string> vec;
string str; // populate with a really long string
vec.push_back(std::move(str));
This still raises the question, "Is it possible to move local stack variables." It just removes some extraneous code to make it easier to understand.
The answer is yes. Code like the above can benefit from std::move because std::string (at least if the content is large enough) stores its actual data on the heap, even if the variable is on the stack.
If you do not use std::move(), you can expect code like the above to copy the content of str, which could be arbitrarily large. If you do use std::move(), only the direct members of the string will be copied (move does not need to "zero out" the old locations), and the data will be used without modification or copying.
It's basically the difference between this:
char* str; // populate with a really long string
char* other = new char[strlen(str)+1];
strcpy(other, str);
vs
char* str; // populate with a really long string
char* other = str;
In both cases, the variables are on the stack. But the data is not.
If you have a case where truly all the data is on the stack, such as a std::string with the "small string optimization" in effect, or a struct containing integers, then std::move() will buy you nothing.
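As a small illustration of the "buys you nothing" case, here is a sketch with a made-up Point struct that holds only integers. For such a trivially copyable type the move constructor is just a copy, so the source keeps its values.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical type whose data lives entirely inside the object.
struct Point
{
    int x;
    int y;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point has nothing a move could steal");

// "Moves" a Point and returns the x value still stored in the source.
int source_x_after_move()
{
    Point src{3, 4};
    Point dst = std::move(src); // compiles to a plain copy of two ints
    assert(dst.x == 3 && dst.y == 4);
    return src.x;               // unchanged: nothing was taken out of src
}
```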
Related
Say I have something like this
extern "C" void make_foo (char** tgt) {
*tgt = (char*) malloc(4*sizeof(char));
strncpy(*tgt, "foo", 4);
}
int main() {
char* foo;
make_foo(&foo);
std::string foos{{foo}};
free(foo);
...
return 0;
}
Now, I would like to avoid using and then deleting the foo buffer. I.e., I'd like to change the initialisation of foos to something like
std::string foos{{std::move(foo)}};
and use no explicit free.
Turns out this actually compiles and seems to work, but I have a rather suspicious feeling about it: does it actually move the C-defined string and properly free the storage? Or does it just ignore the std::move and leak the storage once the foo pointer goes out of scope?
It's not that I worry too much about the extra copy, but I do wonder if it's possible to write this in modern move-semantics style.
std::string constructor #5:
Constructs the string with the contents initialized with a copy of
the null-terminated character string pointed to by s. The length of
the string is determined by the first null character. The behavior is
undefined if s does not point at an array of at least
Traits::length(s)+1 elements of CharT, including the case when s is a
null pointer.
Your C-string is copied (the std::move doesn't matter here) and thus it is up to you to call free on foo.
A std::string will never take ownership.
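If the goal is mainly to get rid of the explicit free, one option is to let a std::unique_ptr with a custom deleter own the malloc'ed buffer. This is only a sketch: FreeDeleter, CString and make_foo_string are names invented here, and the copy into the std::string still happens.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Stand-in for the C function from the question.
extern "C" void make_foo(char** tgt)
{
    *tgt = static_cast<char*>(std::malloc(4 * sizeof(char)));
    std::strncpy(*tgt, "foo", 4);
}

// Deleter that releases memory obtained from malloc.
struct FreeDeleter
{
    void operator()(char* p) const noexcept { std::free(p); }
};
using CString = std::unique_ptr<char, FreeDeleter>;

std::string make_foo_string()
{
    char* raw = nullptr;
    make_foo(&raw);
    CString owner(raw);              // free() runs when owner leaves scope
    return std::string(owner.get()); // the characters are copied, not adopted
}
```

The buffer is released automatically even if the std::string constructor throws, which is the part of "modern style" that does apply here.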
tl;dr: Not really.
Pointers don't have any special move semantics. x = std::move(my_char_ptr) is the same as x = my_char_ptr. They are not similar in that regard to, say, std::vector, in which moving takes away the allocated space.
However, in your case, if you want to keep existing heap buffers and treat them as strings, you can't use std::string, as it can't be constructed as a wrapper around an existing buffer (and there's small-string optimization etc.). Instead, consider implementing a custom container, e.g. with some string data buffer (std::vector<char>) and a std::vector<std::string_view> whose elements point into that buffer.
According to the reference:
1. std::vector::swap exchanges contents;
2. copying strings is deep.
But how about swapping a function returned array of strings?
My guess is, the function returns a copy of the internal strings. So the swapping should be fine. However, debugging in Visual Studio, the internal strings and the outside strings (after swapping) have the same memory addresses in the raw view, so I doubt my guess.
Thank you.
std::vector<std::string> get_name_list()
{
std::string name1 = "foo";
std::string name2 = "bar";
std::vector<std::string> names;
names.push_back(name1);
names.push_back(name2);
return names;
}
void main()
{
std::vector<std::string> list;
list.swap(get_name_list()); // deep copy strings? or access local memory?
}
In general passing and returning by value avoids memory leaks, though of course the types involved might still have buggy memory management. This shouldn't be the case for standard library containers and std::string.
There is no memory leak in your code [edit: assuming it compiles, that is; you can make it compile by changing it to get_name_list().swap(list).] Swapping two vectors does not copy or move the vectors' elements. You can imagine that the two vectors' pointers to their internal data arrays are simply swapped, leaving the objects themselves in place.
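The "pointers are simply swapped" picture can be checked by looking at data() before and after the swap. A minimal sketch, where swap_exchanges_buffers is a helper name made up for this example:

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> get_name_list()
{
    std::vector<std::string> names;
    names.push_back("foo");
    names.push_back("bar");
    return names;
}

// Swaps a freshly returned list into `list` and reports whether the two
// element arrays merely changed owners (same addresses before and after).
bool swap_exchanges_buffers(std::vector<std::string>& list)
{
    std::vector<std::string> fresh = get_name_list();
    const std::string* fresh_elems = fresh.data();
    const std::string* list_elems = list.data();

    fresh.swap(list); // no string is copied or moved here

    return list.data() == fresh_elems && fresh.data() == list_elems;
}
```

This matches what the debugger showed: the strings seen from outside after the swap are the very objects the function created, now owned by the outer vector.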
Your code doesn't compile since you're trying to bind a temporary to a non-const lvalue reference
template <class T> void swap (T& a, T& b)
MSVC seems to accept it and swaps the contents (it doesn't copy the contents) but this is not conformant. It shouldn't leak (it's swapping the internal contents, not copying the contents) but it shouldn't work that way either.
Assuming both a C++11-conformant compiler and standard library, you'd be better off relying on the compiler to make the right choice: the returned std::vector<std::string> is indeed a temporary and subject to move semantics. No memory leak would be involved since you're using a vector, which (assuming no bugs in the implementation, of course) provides a move constructor and a move assignment operator.
std::vector<std::string> list;
list = get_name_list();
The signature
void main()
is also wrong even though, as Brian commented, MSVC might accept it. The standard signature is
int main()
I am writing my own string class, really just for learning and cementing some knowledge. I have everything working, except I want to have a constructor that uses move semantics with a std::string.
Within my constructor I need to copy and null out the std::string's data pointer and other members. It needs to be left in an empty but valid state, without deleting the data the string points to. How do I do this?
So far I have this
class String
{
private:
char* mpData;
unsigned int mLength;
public:
String( std::string&& str)
:mpData(nullptr), mLength(0)
{
// need to copy the memory pointer from std::string to this->mpData
// need to null out the std::string memory pointer
//str.clear(); // can't use clear because it deletes the memory
}
~String()
{
delete[] mpData;
mLength = 0;
}
};
There is no way to do this. The implementation of std::string is implementation-defined. Every implementation is different.
Further, there is no guarantee that the string will be contained in a dynamically allocated array. Some std::string implementations perform a small string optimization, where small strings are stored inside of the std::string object itself.
The below implementation accomplishes what was requested, but at some risk.
Notes about this approach:
It uses std::string to manage the allocated memory. In my view, layering the allocation like this is a good idea because it reduces the number of things that a single class is trying to accomplish (but due to the use of a pointer, this class still has potential bugs associated with compiler-generated copy operations).
I did away with the delete operation since that is now performed automatically by the allocation object.
It will invoke so-called undefined behavior if mpData is used to modify the underlying data. It is undefined, as indicated here, because the standard says it is undefined. I wonder, though, if there are real-world implementations for which const char * std::string::data() behaves differently than T * std::vector::data() -- through which such modifications would be perfectly legal. It may be possible that modifications via data() would not be reflected in subsequent accesses to allocation, but based on the discussion in this question, it seems very unlikely that such modifications would result in unpredictable behavior assuming that no further changes are made via the allocation object.
Is it truly optimized for move semantics? That may be implementation defined. It may also depend on the actual value of the incoming string. As I noted in my other answer, the move constructor provides a mechanism for optimization -- but it doesn't guarantee that an optimization will occur.
class String
{
private:
std::string allocation; // declared first: members are initialized in declaration order
char* mpData;
unsigned int mLength;
public:
String( std::string&& str)
: allocation(std::move(str)) // this is where the magic happens
, mpData(const_cast<char*>(allocation.data())) // read after the move, so it stays valid even with SSO; the cast drops const
, mLength(allocation.length())
{}
};
I am interpreting the question as "can I make the move constructor result in correct behavior" and not "can I make the move constructor optimally fast".
If the question is strictly, "is there a portable way to steal the internal memory from std::string", then the answer is "no, because there is no 'transfer memory ownership' operation provided in the public API".
The following quote from this explanation of move semantics provides a good summary of "move constructors"...
C++0x introduces a new mechanism called "rvalue reference" which,
among other things, allows us to detect rvalue arguments via function
overloading. All we have to do is write a constructor with an rvalue
reference parameter. Inside that constructor we can do anything we
want with the source, as long as we leave it in some valid state.
Based on this description, it seems to me that you can implement the "move semantics" constructor (or "move constructor") without being obligated to actually steal the internal data buffers.
An example implementation:
String( std::string&& str)
:mpData(new char[str.length()]), mLength(str.length())
{
// Copy the characters; nothing is stolen from str, which stays valid.
for ( unsigned int i=0; i<mLength; i++ ) mpData[i] = str[i];
}
As I understand it, the point of move semantics is that you can be more efficient if you want to. Since the incoming object is transient, its contents do not need to be preserved -- so it is legal to steal them, but it is not mandatory. Maybe, there is no point to implementing this if you aren't transferring ownership of some heap-based object, but it seems like it should be legal. Perhaps it is useful as a stepping stone -- you can steal as much as is useful, even if that isn't the entire contents.
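To contrast the two situations, here is a sketch of a hand-written String (my own layout, not the asker's full class). The constructor taking std::string&& can only copy, while the String&& move constructor really can steal because the class owns its representation. The null terminator and the c_str()/length() accessors are additions for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

class String
{
    char* mpData;
    std::size_t mLength;

public:
    // Accepts an rvalue std::string but copies its characters, because
    // std::string has no public way to release its buffer.
    explicit String(std::string&& str)
        : mpData(new char[str.length() + 1]), mLength(str.length())
    {
        std::memcpy(mpData, str.c_str(), mLength + 1); // includes the terminator
    }

    // Moving from another String steals the pointer and empties the source.
    String(String&& other) noexcept
        : mpData(other.mpData), mLength(other.mLength)
    {
        other.mpData = nullptr;
        other.mLength = 0;
    }

    // Copying is disabled to keep the sketch short and free of double deletes.
    String(const String&) = delete;
    String& operator=(const String&) = delete;

    ~String() { delete[] mpData; }

    const char* c_str() const { return mpData ? mpData : ""; }
    std::size_t length() const { return mLength; }
};
```

After String b(std::move(a)), b.c_str() returns the same address a.c_str() returned before, and a is empty but still safe to destroy.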
By the way, there is a closely related question here in which the same kind of non-standard string is being built and includes a move constructor for std::string. The internals of the class are different however, and it is suggested that std::string may have built-in support for move semantics internally (std::string -> std::string).
Prior to C++11, if I had a function that operated on large objects, my instinct would be to write functions with this kind of prototype.
void f(A &return_value, A const &parameter_value);
(Here, return_value is just a blank object which will receive the output of the function. A is just some class which is large and expensive to copy.)
In C++11, taking advantage of move semantics, the default recommendation (as I understand it) is the more straightforward:
A f(A const &parameter_value);
Is there ever still a need to do it the old way, passing in an object to hold the return value?
Others have covered the case where A might not have a cheap move constructor. I'm assuming your A does. But there is still one more situation where you might want to pass in an "out" parameter:
If A is some type like vector or string and it is known that the "out" parameter already has resources (such as memory) that can be reused within f, then it makes sense to reuse that resource if you can. For example consider:
void get_info(std::string&);
bool process_info(const std::string&);
void
foo()
{
std::string info;
for (bool not_done = true; not_done;)
{
info.clear();
get_info(info);
not_done = process_info(info);
}
}
vs:
std::string get_info();
bool process_info(const std::string&);
void
foo()
{
for (bool not_done = true; not_done;)
{
std::string info = get_info();
not_done = process_info(info);
}
}
In the first case, capacity will build up in the string as the loop executes, and that capacity is then potentially reused on each iteration of the loop. In the second case a new string is allocated on every iteration (neglecting the small string optimization buffer).
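One way to see the capacity reuse is to count how often the string's buffer address changes over the loop. This is a sketch with a made-up get_info that takes a length parameter. The standard does not strictly promise that clear() keeps the capacity, though the common implementations do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical producer that writes n characters into a caller-supplied string.
void get_info(std::string& out, std::size_t n)
{
    out.assign(n, 'x');
}

// Reuses one string across the loop and counts how often its buffer moves.
int reallocations_with_reuse(int iterations, std::size_t n)
{
    std::string info;
    int changes = 0;
    const char* last = info.data();
    for (int i = 0; i < iterations; ++i) {
        info.clear();      // length becomes 0, capacity is normally kept
        get_info(info, n); // fits into the existing buffer after the first pass
        if (info.data() != last) {
            ++changes;
            last = info.data();
        }
    }
    return changes;
}
```

With n = 1000 the first pass has to allocate once; on the usual implementations every later pass writes into that same buffer. The by-value version would allocate on each pass for strings of that size.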
Now this isn't to say that you should never return std::string by value. Just that you should be aware of this issue and apply engineering judgment on a case by case basis.
It is possible for an object to be large and expensive to copy, and for which move semantics cannot improve on copying. Consider:
struct A {
std::array<double,100000> m_data;
};
It may not be a good idea to design your objects this way, but if you have an object of this type for some reason and you want to write a function to fill the data in then you might do it using an out param.
It depends: does your compiler support return-value-optimization, and is your function f designed to be able to use the RVO your compiler supports?
If so, then yes, by all means return by value. You will gain nothing at all by passing a mutable parameter, and you'll gain a great deal of code clarity by doing it this way. If not, then you have to investigate the definition of A.
For some types, a move is nothing more than a copy. If A doesn't contain anything that is actually worth moving (pointers transferring ownership and so forth), then you're not going to gain anything by moving. A move isn't free, after all; it's simply a copy that knows that anything owned by the original is being transferred to the copy. If the type doesn't own anything, then a move is just a copy.
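Whether returning by value costs a copy can be checked with a small counting type. The Tracer struct and the function names below are made up for this sketch. Since C++11 a returned local is moved if the compiler does not elide the operation, so the copy constructor should never run here; how many moves happen depends on the compiler and the language version.

```cpp
#include <cassert>

// Hypothetical type that counts how it gets constructed.
struct Tracer
{
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

Tracer make_unnamed() { return Tracer(); }  // returns a temporary
Tracer make_named() { Tracer t; return t; } // candidate for named RVO

// Returns how many copy constructions the two by-value returns caused.
int copies_made_by_returning()
{
    Tracer::copies = 0;
    Tracer::moves = 0;
    Tracer a = make_unnamed();
    Tracer b = make_named();
    (void)a;
    (void)b;
    return Tracer::copies;
}
```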
I'm fairly novice with C++'s strings so the following pattern may be a little fugly. I'm reviewing some code I've written before beginning integration testing with a larger system. What I'd like to know is if it is safe, or if it would be prone to leaking memory?
string somefunc( void ) {
string returnString;
returnString.assign( "A string" );
return returnString;
}
void anotherfunc( void ) {
string myString;
myString.assign( somefunc() );
// ...
return;
}
The understanding I have is that the value of returnString is assigned to a new object myString and then the returnString object is destroyed as part of resolving the call to somefunc. At some point in the future when myString goes out of scope, it too is destroyed.
I would typically have passed a pointer to myString into somefunc() and directly assigned values to myString, but I'm striving to be a little clearer in my code (and relying on the side-effect function style less).
Yes, returning a string this way (by value) is safe, albeit I would prefer assigning it this way:
string myString = somefunc();
This is easier to read, and is also more efficient (saving the construction of an empty string, which would then be overwritten by the next call to assign).
std::string manages its own memory, and it has properly written copy constructor and assignment operator, so it is safe to use strings this way.
Yes. By doing
return returnString;
you are invoking the string's copy constructor (or, since C++11, its move constructor, unless the compiler elides the operation altogether), which performs a copy* of returnString into the temporary (aka rvalue) that takes the place of "somefunc()" in the calling expression:
myString.assign( somefunc() /*somefunc()'s return becomes temporary*/);
This is in turn passed to assign and used by assign to perform a copy into myString.
So in your case, the copy constructor of string guarantees a deep copy and ensures no memory leaks.
* Note this may or may not be a true deep copy, the behavior of the copy constructor is implementation specific. Some string libraries implement copy-on-write which has some internal bookkeeping to prevent copying until actually needed.
You're completely safe because you're returning the string by value, where the string will be "copied", and not by reference. If you were to return a std::string &, then you'd be doing it wrong, as you'd have a dangling reference. Some compilers, even, might perform return value optimization, which won't even really copy the string upon return. See this post for more information.
Yes, it's (at least normally) safe. One of the most basic contributions of almost any reasonable string class is the ability to act like a basic value for which normal assignment, returns, etc., "just work".
As you said, a string returnString is created inside somefunc and a copy is given back when the function returns. This is perfectly safe.
What you want is to give a reference to myString to somefunc (don't use pointer). It will be perfectly clear:
void somefunc( string& myString ) {
myString.assign( "A string" );
}
void anotherfunc( void ) {
string myString;
somefunc(myString);
// ...
return;
}