std::move - std::string - internal pointer - c++

I'm surprised that the internal pointers of s and s2 to "sample" are not equal. What is the explanation?
#include <string>
#include <cassert>
int main()
{
    std::string s("sample");
    std::string s2(std::move(s));
    assert(
        reinterpret_cast<int*>(const_cast<char*>(s.data())) ==
        reinterpret_cast<int*>(const_cast<char*>(s2.data()))
    ); // assertion failure here
    return 1;
}

Why do you assume that they should be the same? You are constructing s2 from s using its move constructor. This transfers ownership of the data from s to s2 and leaves s in a valid but unspecified state (in practice, usually empty). Calling s.data() afterwards is allowed, but the standard doesn’t say what it returns, so you cannot rely on the result without assigning a new value to s first.
A much simplified (and incomplete) version of string could look as follows:
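#include <cstring> // std::strlen, std::strcpy

class string {
    char* buffer; // owning pointer to the character data
public:
    string(char const* from)
        : buffer(new char[std::strlen(from) + 1])
    {
        std::strcpy(buffer, from);
    }
    string(string&& other)
        : buffer(other.buffer) // take over the other string's buffer
    {
        other.buffer = nullptr; // (*)
    }
    ~string() {
        delete[] buffer; // deleting a null pointer is a no-op
    }
    char const* data() const { return buffer; }
};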
I hope that shows why the data() members are not equal. If we had omitted the line marked by (*) we would delete the internal buffer twice at the end of main: once for s and once for s2. Resetting the buffer pointer ensures that this doesn’t happen.

Each std::string has its own buffer that it points to. Moving one into the other does not make them share a buffer. When you initialize s2, it takes over the buffer from s and becomes the owner of that buffer. To keep s from "owning" the same buffer, the implementation simply points s at a new empty one (which s is now responsible for).
Technically, there are also some optimizations involved: most likely no buffer is explicitly allocated for empty or very small strings, and the implementation instead uses part of the std::string object's own memory. This is usually known as the small-string optimization. With it, a short string like "sample" most likely lives inside the string object itself, so s2.data() points into s2 and cannot equal s.data().
Also note that s has been moved from, so its contents are unspecified: your code may legally call data() on it, but it could return anything.

You should not use the moved-from string before replacing its value with some known value:
The library code is required to leave a valid value in the argument, but unless the type or function documents otherwise, there are no other constraints on the resulting argument value. This means that it's generally wisest to avoid using a moved from argument again. If you have to use it again, be sure to re-initialize it with a known value before doing so.
The library can stick anything it wants into the string, but it's very likely that you would end up with an empty string. That's what running an example from cppreference produces. However, one should not expect to find anything in particular inside a moved-from object.
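The advice above can be sketched as follows. This is only an illustration; the helper name take_and_reset is made up for the example and is not part of any library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Moves `source` into a new string, then gives `source` a known value
// again before anyone reads it. Returns the moved-to string.
// (take_and_reset is a hypothetical helper for illustration.)
std::string take_and_reset(std::string& source)
{
    std::string taken(std::move(source));
    // `source` is valid but unspecified here: it may be assigned to or
    // destroyed, but its contents should not be relied upon.
    source = "reset"; // re-initialize with a known value
    return taken;
}
```

After the call, the returned string holds the original text and the argument holds "reset", regardless of what the library left in it in between.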

Related

Can a char* be moved into an std::string?

Say I have something like this
extern "C" void make_foo (char** tgt) {
    *tgt = (char*) malloc(4*sizeof(char));
    strncpy(*tgt, "foo", 4);
}
int main() {
    char* foo;
    make_foo(&foo);
    std::string foos{{foo}};
    free(foo);
    ...
    return 0;
}
Now, I would like to avoid using and then deleting the foo buffer. I.e., I'd like to change the initialisation of foos to something like
std::string foos{{std::move(foo)}};
and use no explicit free.
Turns out this actually compiles and seems to work, but I have a rather suspicious feel about it: does it actually move the C-defined string and properly free the storage? Or does it just ignore the std::move and leak the storage once the foo pointer goes out of scope?
It's not that I worry too much about the extra copy, but I do wonder if it's possible to write this in modern move-semantics style.
std::string constructor #5:
Constructs the string with the contents initialized with a copy of
the null-terminated character string pointed to by s. The length of
the string is determined by the first null character. The behavior is
undefined if s does not point at an array of at least
Traits::length(s)+1 elements of CharT, including the case when s is a
null pointer.
Your C-string is copied (the std::move doesn't matter here) and thus it is up to you to call free on foo.
A std::string will never take ownership.
tl;dr: Not really.
Pointers don't have any special move semantics. x = std::move(my_char_ptr) is the same as x = my_char_ptr. In that regard they are not like, say, std::vector, where moving takes away the allocated space.
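A small sketch of that point, using a hypothetical helper name: "moving" a raw pointer only copies the address, so the source pointer is unchanged and the buffer still has to be freed exactly once.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Returns true if "moving" a raw pointer left the source pointer unchanged.
// (pointer_move_is_a_copy is a made-up name for this illustration.)
bool pointer_move_is_a_copy()
{
    char* original = static_cast<char*>(std::malloc(4));
    if (!original) return false;          // allocation failed
    std::memcpy(original, "foo", 4);      // includes the terminating '\0'

    char* moved = std::move(original);    // just copies the address
    bool same = (moved == original);      // both still point at the buffer

    std::free(original);                  // one explicit free is still needed
    return same;
}
```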
However, in your case, if you want to keep existing heap buffers and treat them as strings, you can't do it with std::string, since a std::string can't be constructed as a wrapper around an existing buffer (and there's the small-string optimization etc.). Instead, consider implementing a custom container, e.g. one with a string data buffer (std::vector<char>) and a std::vector<std::string_view> whose elements point into that buffer.
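For a single buffer, one possible shape of such a wrapper is sketched below. It is not what the answer describes in detail, just an assumption-laden variant: OwnedCString, FreeDeleter and make_foo_buffer are hypothetical names, the buffer is assumed to come from malloc and be null-terminated, and std::string_view requires C++17.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string_view>

// Deleter that releases memory obtained from malloc.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};

// Hypothetical owner of a malloc'ed, null-terminated C string.
// It adopts the buffer without copying and frees it on destruction.
class OwnedCString {
    std::unique_ptr<char, FreeDeleter> buffer_;
public:
    explicit OwnedCString(char* adopted) noexcept : buffer_(adopted) {}

    // Non-owning view of the characters; valid only while *this is alive.
    std::string_view view() const {
        return buffer_ ? std::string_view(buffer_.get()) : std::string_view();
    }
};

// Stand-in for the C function from the question (assumed behaviour).
inline char* make_foo_buffer() {
    char* p = static_cast<char*>(std::malloc(4));
    if (p) std::memcpy(p, "foo", 4);
    return p;
}
```

With this, OwnedCString foo(make_foo_buffer()); needs no explicit free, and foo.view() can be passed to code that accepts std::string_view. Code that insists on a std::string still has to copy.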

Is it possible to std::move local stack variables?

Please consider the following code:
struct MyStruct
{
    int iInteger;
    string strString;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
    MyStruct NewStruct = { 8, "Hello" };
    vecStructs.push_back(std::move(NewStruct));
}
int main()
{
    vector<MyStruct> vecStructs;
    MyFunc(vecStructs);
}
Why does this work?
At the moment MyFunc is called, the return address should be placed on the stack of the current thread. Then the NewStruct object gets created, which should be placed on the stack as well. With std::move, I tell the compiler that I do not plan to use the NewStruct reference anymore, so it can steal the memory. (The push_back function is the one with the move semantics.)
But then the function returns and NewStruct goes out of scope. Even if the compiler did not remove the memory occupied by the original structure from the stack, it at least has to remove the previously stored return address.
This would lead to a fragmented stack, and future allocations would overwrite the "moved" memory.
Can someone explain this to me, please?
EDIT:
First of all: Thank you very much for your answers.
But from what I have learned, I still cannot understand why the following does not work the way I expect:
struct MyStruct
{
    int iInteger;
    string strString;
    string strString2;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
    MyStruct oNewStruct = { 8, "Hello", "Definitely more than 16 characters" };
    vecStructs.push_back(std::move(oNewStruct));
    // At this point, oNewStruct.strString2 should be "", because its memory was stolen.
    // But only when I explicitly create a move-constructor in the form which was
    // stated by Yakk, it is really that case.
}
int main()
{
    vector<MyStruct> vecStructs;
    MyFunc(vecStructs);
}
First, std::move does not move, and std::forward does not forward.
std::move is a cast to an rvalue reference. By convention, rvalue references are treated as "references you are permitted to move the data out of, as the caller promises they really don't need that data anymore".
On the other side of the fence, rvalue references implicitly bind to the return value of std::move (and sometimes forward), to temporary objects, in certain cases when returning a local from a function, and when using a member of a temporary or a moved-from object.
What happens within the function taking an rvalue reference is not magic. It cannot claim the storage directly within the object in question. It can, however, tear out its guts; it has permission (by convention) to mess with its argument's internal state if it can do the operation faster that way.
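To make the "std::move does not move" point concrete, here is a minimal sketch (the function name cast_without_moving is made up). Binding the result of std::move to a reference changes nothing; only a move constructor or move assignment that receives the reference may alter the source.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move(x) is an expression of type T&&, i.e. just a cast.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// Casting alone moves nothing: the string keeps its contents.
std::string cast_without_moving()
{
    std::string text = "still here";
    std::string&& ref = std::move(text);                    // only a cast
    std::string&& same = static_cast<std::string&&>(text);  // equivalent cast
    (void)ref;   // silence unused-variable warnings
    (void)same;
    return text; // no move constructor has touched `text` before this point
}
```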
Now, C++ will automatically write some move constructors for you.
struct MyStruct
{
    int iInteger;
    string strString;
};
In this case, it will write something that roughly looks like this:
MyStruct::MyStruct( MyStruct&& other ) noexcept(true) :
    iInteger( std::move(other.iInteger) ),
    strString( std::move(other.strString) )
{}
I.e., it will do an element-wise move construction.
When you move an integer, nothing interesting happens. There isn't any benefit to messing with the source integer's state.
When you move a std::string, we get some efficiencies. The C++ standard describes what happens when you move from one std::string to another. Basically, if the source std::string is using the heap, the heap storage is transferred to the destination std::string.
This is a general pattern of C++ containers; when you move from them, they steal the "heap allocated" storage of the source container and reuse it in the destination.
Note that the source std::string remains a std::string, just one that has its "guts torn out". Most container-like things are left empty; std::string does not strictly make that guarantee (the standard only promises a valid but unspecified state, which leaves room for SBO), and it isn't important right now.
In short, when you move from something, its memory is not "reused", but memory it owns can be reused.
In your case, MyStruct has a std::string which can use heap allocated memory. This heap allocated memory can be moved into the MyStruct stored in the std::vector.
Going a bit further down the rabbit hole, "Hello" is likely to be so short that SBO (small buffer optimization) kicks in, and the std::string doesn't use the heap at all. For this particular case, there may be next to no performance improvement from moving.
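The struct from the edit to the question can be checked directly. The sketch below assumes a compiler with full C++11 support; older compilers with incomplete support may not generate the implicit move constructor, which could explain the behaviour described in the edit.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

struct MyStruct
{
    int iInteger;
    std::string strString;
    std::string strString2;
};

// The compiler-generated move constructor moves member by member and is
// noexcept because the move constructors of int and std::string are.
static_assert(std::is_nothrow_move_constructible<MyStruct>::value,
              "MyStruct has an implicit noexcept move constructor");

void MyFunc(std::vector<MyStruct>& vecStructs)
{
    MyStruct oNewStruct = { 8, "Hello", "Definitely more than 16 characters" };
    vecStructs.push_back(std::move(oNewStruct));
    // oNewStruct.strString2 is now valid but unspecified (commonly empty).
}
```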
Your example can be reduced to:
vector<string> vec;
string str; // populate with a really long string
vec.push_back(std::move(str));
This still raises the question, "Is it possible to move local stack variables." It just removes some extraneous code to make it easier to understand.
The answer is yes. Code like the above can benefit from std::move because std::string (at least if the content is large enough) stores its actual data on the heap, even if the variable is on the stack.
If you do not use std::move(), you can expect code like the above to copy the content of str, which could be arbitrarily large. If you do use std::move(), only the direct members of the string will be copied (plus whatever resetting is needed to leave str in a valid state), and the character data will be used without modification or copying.
It's basically the difference between this:
char* str; // populate with a really long string
char* other = new char[strlen(str)+1];
strcpy(other, str);
vs
char* str; // populate with a really long string
char* other = str;
In both cases, the variables are on the stack. But the data is not.
If you have a case where truly all the data is on the stack, such as a std::string with the "small string optimization" in effect, or a struct containing integers, then std::move() will buy you nothing.
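One way to observe this is to compare the data pointer before and after the move. The sketch below uses a made-up helper name, and the pointer comparison reflects what mainstream implementations typically do for long strings; the standard does not guarantee it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Moves a long string into `vec` and reports whether the new element uses
// the very same character buffer the original string had.
// (buffer_was_reused is a hypothetical helper for illustration.)
bool buffer_was_reused(std::vector<std::string>& vec)
{
    std::string str(1000, 'x');          // long enough to live on the heap
    const char* before = str.data();     // address of the heap buffer
    vec.push_back(std::move(str));       // steals the buffer instead of copying
    return vec.back().data() == before;  // typically true, not guaranteed
}
```

With a short string that fits the small-string optimization, the same comparison would be expected to come out false, because the characters live inside the string object and have to be copied.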

Does the use of std::move have any performance benefits?

Please consider this code :
#include <iostream>
#include <vector>
#include <utility>
std::vector<int> vecTest;
int main()
{
    int someRval = 3;
    vecTest.push_back(someRval);
    vecTest.push_back(std::move(someRval));
    return 0;
}
So as far as I understand, someRval's value will be copied into vecTest on the first call of push_back(), but on the second call std::move(someRval) produces an xvalue. My question is: will there ever be any performance benefit? Probably not with int, but would there be some benefit when working with much larger objects?
The performance benefit from moving usually comes from avoiding dynamic allocation.
Consider an over-simplified (and naive) string (missing a copy-assignment operator and a move-assignment operator):
#include <cstring> // strlen, strcpy

class MyString
{
public:
    MyString() : data(nullptr) {}
    ~MyString()
    {
        delete[] data;
    }
    MyString(const MyString& other) : data(nullptr) // copy constructor
    {
        if (other.data) // an empty source has no buffer to copy
        {
            data = new char[strlen(other.data) + 1]; // another allocation
            strcpy(data, other.data); // copy over the old string buffer
        }
    }
    void set(const char* str)
    {
        char* newString = new char[strlen(str) + 1];
        strcpy(newString, str);
        delete[] data;
        data = newString;
    }
    const char* c_str() const
    {
        return data;
    }
private:
    char* data;
};
This is all fine and dandy, but the copy constructor here is possibly expensive if your string becomes long. The copy constructor is required to copy everything because it is not allowed to touch the other object; it must do exactly what its name says: copy the contents. That is the price you have to pay if you need a copy of the string, but if you just want to use the string's state and don't care what happens to it afterwards, you might as well move it.
Moving only requires leaving the other object in some valid state, so we can use everything in other, which is exactly what we want. Instead of copying the content our data pointer points to, we just re-assign our data pointer to the one of other. We are basically stealing the contents of other. We also set the original data pointer to nullptr, so that the destructor of other does not delete the buffer we now own:
MyString(MyString&& other)
{
    data = other.data;
    other.data = nullptr;
}
There, this is all we have to do. This is obviously way faster than copying the whole buffer over like the copy constructor is doing.
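The class above was described as missing its assignment operators. A sketch of how the complete class might look is given below. It is one possible way to write it, not the answer's own code: the const char* constructor is added purely for convenience, and std::exchange requires C++14.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

class MyString
{
public:
    MyString() : data(nullptr) {}
    // Convenience constructor, added for this sketch only.
    explicit MyString(const char* str) : data(nullptr) { set(str); }
    ~MyString() { delete[] data; }

    MyString(const MyString& other) : data(nullptr)   // copy constructor
    {
        if (other.data) set(other.data);              // allocate and copy
    }
    MyString(MyString&& other) noexcept               // move constructor
        : data(std::exchange(other.data, nullptr)) {} // steal, reset source

    MyString& operator=(const MyString& other)        // copy assignment
    {
        if (this != &other) {
            MyString copy(other);       // may throw; *this is untouched so far
            std::swap(data, copy.data); // old buffer is freed with `copy`
        }
        return *this;
    }
    MyString& operator=(MyString&& other) noexcept    // move assignment
    {
        if (this != &other) {
            delete[] data;                              // drop our own buffer
            data = std::exchange(other.data, nullptr);  // take over theirs
        }
        return *this;
    }

    void set(const char* str)
    {
        char* newString = new char[std::strlen(str) + 1];
        std::strcpy(newString, str);
        delete[] data;
        data = newString;
    }
    const char* c_str() const { return data; }

private:
    char* data;
};
```

After MyString b(std::move(a)), b.c_str() returns the pointer a used to hold and a.c_str() returns nullptr, so no characters were copied.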
Moving "primitive" types like int or even char* does nothing different than copying them.
Complex types, like std::string, can use the information that you are willing to sacrifice the source-object state to make moving far more efficient than copying.
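For the primitive case this is easy to verify. In the sketch below, Point and source_after_move are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Point { int x; int y; }; // hypothetical plain struct

// For such types a move is a plain member-wise copy.
static_assert(std::is_trivially_move_constructible<Point>::value,
              "moving a Point is a plain copy");

// Returns the value of the source after "moving" from it.
int source_after_move()
{
    int someRval = 3;
    int copy = std::move(someRval); // identical to: int copy = someRval;
    (void)copy;                     // silence unused-variable warnings
    return someRval;                // the source int is unchanged
}
```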
Yes, but it depends on the details of your application: the size of the object and the frequency of the operation.
Casting it to an rvalue and moving it (by using std::move()) avoids a copy. If the object is large enough, this saves time (consider for example an array of 1 000 000 doubles: copying it typically means copying about 8 MB of memory).
The other point is frequency: if your code does the respective operation very often, it can add up considerably.
Note that the source object is left in a valid but unspecified state (its contents are typically gone), and this might or might not be acceptable for your logic. You need to understand it and code accordingly. If you still need the source object's value afterwards, moving obviously would not work.
Generally, don't optimize unless you need to optimize.

Can I detach the buffer from a std::string in C++?

I'm new to C++. I was assuming std::string uses a reference count to determine when to release the buffer. In the following example, s buffer will be released when f() returns. What if I wanted to give ownership of the string buffer to give_ownership_of and not to release it?
void f()
{
    string s = read_str();
    give_ownership_of(s);
}
UPDATE
Let me add more details into the question. The actual code looks like this,
string read_str();
void write_str_async(const char *str, void (*free_fn)(const char*));
void f() {
    string s = read_str();
    // write_str_async() need to access the buffer of s after f() returns.
    // So I'm responsible to keep s alive until write_str_async() calls free_fn to release the buffer when the write is done.
    // The PROBLEM here is that s will be released when the variable scope ends. NOTE: I'm not able to change either read_str() or write_str_async() here.
    write_str_async(s.c_str(), my_free_fn);
}
In C++11, they added something like this, called a move. std::string has a move constructor and a move assignment operator.
A move basically just copies a few integers / pointers around, rather than the contents of the std::string. Be aware that the compiler does not move a named local variable into a function argument on its own, even if s is never used again; passing s by name copies it, and automatic moves only happen in limited situations such as returning a local variable. Note also that a move is still slower than passing by reference, so if a reference works for you, you should prefer that regardless.
https://akrzemi1.wordpress.com/2011/08/11/move-constructor/
I would strongly recommend against using std::shared_ptr for this, as there is no actual sharing of ownership.
In cases where you want to make the move explicit, then you would do this:
give_ownership_of(std::move(s));
Note that you do not need to (and in fact should not) use std::move when returning a value from a function. Just return the value normally. The compiler can in many cases perform "return value optimization", which means that there is no copy and no move. It's similar to passing the value in by reference and assigning to that, except it actually gives the optimizer a little more room (because it knows that the std::string is a unique object that doesn't alias anything). It's also more straightforward to read.
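Put together, the two recommendations might look like the sketch below. The bodies of read_str and give_ownership_of are invented stand-ins, since the question does not show them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for read_str() from the question.
std::string read_str()
{
    std::string result = "contents read from somewhere";
    return result; // no std::move: eligible for NRVO, otherwise moved
}

// Takes ownership by value; the caller chooses between copying and moving.
// Returning the size is only there so the sketch has an observable result.
std::size_t give_ownership_of(std::string owned)
{
    return owned.size(); // `owned` is destroyed when this function returns
}

std::size_t f()
{
    std::string s = read_str();
    return give_ownership_of(std::move(s)); // explicit move; s not used after
}
```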
There is no standard way to take ownership of the underlying data of a std::string. Generally, one should instead return a string object itself or have the caller pass in a reference, e.g.
void f(std::string& s) {
    s = read_str();
}
The question is ambiguous but the examples below should illustrate all the alternatives. The last one is probably what you want, and it's a new feature added in C++11 (std::move and rvalue references).
This allows you to transfer the buffer to a new object of the same type, but you can never eliminate std::string entirely. You can ignore that string and treat the buffer memory as bytes, but deallocation must be performed by destroying a string.
// will retain s for duration of function
void give_ownership_of( std::string &s );
// will retain a copy of s for duration of function
void give_ownership_of( std::string s );
struct give_ownership_of {
    std::string s;
    // will retain a copy of in_s for object lifetime
    give_ownership_of( std::string const &in_s ) : s( in_s ) {}
    // will transfer buffer of in_s to s and retain that (C++11 only)
    // you would write give_ownership_of( std::move( s ) );
    give_ownership_of( std::string &&in_s ) : s( std::move( in_s ) ) {}
};
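For the write_str_async case from the update, one possible approach, given that deallocation has to happen by destroying a string, is to park the string on the heap until the callback fires. The sketch below is an assumption-heavy illustration: live_strings and keep_alive are made-up names, the registry is not thread-safe, and it assumes the callback receives exactly the pointer that was passed in.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical registry that keeps strings alive, keyed by their c_str()
// pointer. Not thread-safe; a real version would need a mutex if the
// callback runs on another thread.
std::map<const char*, std::unique_ptr<std::string>>& live_strings()
{
    static std::map<const char*, std::unique_ptr<std::string>> registry;
    return registry;
}

// Moves `s` to the heap and returns a pointer that stays valid until
// my_free_fn() is called with it.
const char* keep_alive(std::string s)
{
    auto owner = std::make_unique<std::string>(std::move(s));
    const char* key = owner->c_str();       // taken after the move
    live_strings()[key] = std::move(owner); // moves the pointer, not the string
    return key;
}

// Matches the free_fn signature from the question.
void my_free_fn(const char* str)
{
    live_strings().erase(str); // destroys the string, releasing its buffer
}
```

In f() this would be used as write_str_async(keep_alive(std::move(s)), my_free_fn); so the buffer outlives the local variable s.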

Is returning a C++ std::string object safe from memory leaks?

I'm fairly novice with C++'s strings so the following pattern may be a little fugly. I'm reviewing some code I've written before beginning integration testing with a larger system. What I'd like to know is if it is safe, or if it would be prone to leaking memory?
string somefunc( void ) {
    string returnString;
    returnString.assign( "A string" );
    return returnString;
}
void anotherfunc( void ) {
    string myString;
    myString.assign( somefunc() );
    // ...
    return;
}
My understanding is that the value of returnString is assigned to a new object myString, and then the returnString object is destroyed as part of resolving the call to somefunc. At some point in the future, when myString goes out of scope, it too is destroyed.
I would typically have passed a pointer to myString into somefunc() and assigned the value directly to myString, but I'm striving to be a little clearer in my code (and to rely less on the side-effect style of function).
Yes, returning a string this way (by value) is safe, although I would prefer assigning it this way:
string myString = somefunc();
This is easier to read, and is also more efficient (saving the construction of an empty string, which would then be overwritten by the next call to assign).
std::string manages its own memory, and it has properly written copy constructor and assignment operator, so it is safe to use strings this way.
Yes by doing
return returnString
You are invoking the string's copy constructor (since C++11 the move constructor is used instead, and the compiler may elide the operation entirely). It performs a copy* of returnString into the temporary (aka rvalue) that takes the place of "somefunc()" in the calling expression:
myString.assign( somefunc() /*somefunc()'s return becomes temporary*/);
This is in turn passed to assign and used by assign to perform a copy into myString.
So in your case, the copy constructor of string guarantees a deep copy and ensures no memory leaks.
* Note this may or may not be a true deep copy; the behavior of the copy constructor is implementation specific. Some older string libraries implement copy-on-write, which uses internal bookkeeping to postpone copying until it is actually needed (the C++11 requirements on std::string effectively rule this out).
You're completely safe because you're returning the string by value, where the string will be "copied", and not by reference. If you were to return a std::string & to the local variable, you'd be doing it wrong, as you'd have a dangling reference. Many compilers also perform return value optimization, which avoids copying the string on return altogether.
Yes, it's (at least normally) safe. One of the most basic contributions of almost any reasonable string class is the ability to act like a basic value for which normal assignment, returns, etc., "just work".
As you said, a string returnString is created inside somefunc and a copy is given back when the function returns. This is perfectly safe.
If you would rather fill in an existing string, pass a reference to myString to somefunc (don't use a pointer). It will be perfectly clear:
void somefunc( string& myString ) {
    myString.assign( "A string" );
}
void anotherfunc( void ) {
    string myString;
    somefunc(myString);
    // ...
    return;
}