Can I detach the buffer from a std::string in C++?


I'm new to C++. I was assuming std::string uses a reference count to determine when to release the buffer. In the following example, the buffer of s will be released when f() returns. What if I want to give ownership of the string buffer to give_ownership_of so that it is not released?
void f()
{
string s = read_str();
give_ownership_of(s);
}
UPDATE
Let me add more details into the question. The actual code looks like this,
string read_str();
void write_str_async(const char *str, void (*free_fn)(const char*));
void f() {
string s = read_str();
// write_str_async() needs to access the buffer of s after f() returns,
// so I'm responsible for keeping s alive until write_str_async() calls free_fn to release the buffer when the write is done.
// The PROBLEM here is that s will be destroyed when it goes out of scope. NOTE: I'm not able to change either read_str() or write_str_async() here.
write_str_async(s.c_str(), my_free_fn);
}

In C++11, they added something like this, called a move. std::string has a move constructor and a move assignment operator.
A move is basically just copying a few integers / pointers around, rather than the contents of the std::string. The compiler does not do this on its own for a named variable such as s, even if s is never used again after the call; the argument has to be an rvalue. Note that a move is still slower than passing by reference, so if a reference works for you, you should prefer that regardless.
https://akrzemi1.wordpress.com/2011/08/11/move-constructor/
I would strongly recommend against using std::shared_ptr for this, as there is no actual sharing of ownership.
To move from s, make the move explicit:
give_ownership_of(std::move(s));
Note that you do not need to (and in fact should not) use std::move when returning a local variable from a function. Just return it normally. The compiler can in many cases perform "return value optimization", which means that there is no copy and no move. It's similar to passing the value in by reference and assigning to that, except it actually gives the optimizer a little more room (because it knows that the std::string is a unique object that doesn't alias anything). It's also more straightforward to read.
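The following is a minimal sketch of that advice, not code from the question: give_ownership_of is assumed here to be a sink that takes the string by value, and sink_storage and the body of read_str are hypothetical stand-ins.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical place where the sink keeps the strings it was given.
inline std::vector<std::string>& sink_storage() {
    static std::vector<std::string> storage;
    return storage;
}

// Assumed sink: takes the string by value, then moves it into storage.
inline void give_ownership_of(std::string s) {
    sink_storage().push_back(std::move(s));
}

// Stand-in for the question's read_str(); returns a local by value, no std::move.
inline std::string read_str() {
    std::string result(100, 'a');
    return result;
}

inline void f() {
    std::string s = read_str();
    give_ownership_of(std::move(s));  // explicit move: s is a named variable
    // s is still a valid object here, but its contents are unspecified.
}
```

After f() returns, the 100-character string lives on in sink_storage() without its contents having been copied.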

There is no standard way to take ownership of the underlying data of a std::string. Generally, one should instead return a string object itself or have the caller pass in a reference, e.g.
void f(std::string& s) {
s = read_str();
}

The question is ambiguous but the examples below should illustrate all the alternatives. The last one is probably what you want, and it's a new feature added in C++11 (std::move and rvalue references).
This allows you to transfer the buffer to a new object of the same type, but you can never eliminate std::string entirely. You can ignore that string and treat the buffer memory as bytes, but deallocation must be performed by destroying a string.
// will retain s for duration of function
void give_ownership_of( std::string &s );
// will retain a copy of s for duration of function
void give_ownership_of( std::string s );
struct give_ownership_of {
std::string s;
// will retain a copy of in_s for object lifetime
give_ownership_of( std::string const &in_s ) : s( in_s ) {}
// will transfer buffer of in_s to s and retain that (C++11 only)
// you would write give_ownership_of( std::move( s ) );
give_ownership_of( std::string &&in_s ) : s( std::move( in_s ) ) {}
};
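For the write_str_async case in the question, one possible approach is to move the string into a heap-allocated std::string and destroy that object from the free callback. This is only a sketch under assumptions: keep_alive, release_string and live_strings are hypothetical names, the registry is not thread-safe, and std::make_unique requires C++14.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry mapping the pointer handed to the C-style API back
// to the std::string object that owns it. Not thread-safe as written.
inline std::unordered_map<const char*, std::unique_ptr<std::string>>& live_strings() {
    static std::unordered_map<const char*, std::unique_ptr<std::string>> registry;
    return registry;
}

// Moves s into a heap-allocated std::string and returns a pointer that
// stays valid until release_string() is called with it.
inline const char* keep_alive(std::string s) {
    auto owner = std::make_unique<std::string>(std::move(s));
    // Taken after the move: the heap object never relocates, so the pointer
    // is stable even when the small-string optimization is in effect.
    const char* p = owner->c_str();
    live_strings().emplace(p, std::move(owner));
    return p;
}

// Matches the free_fn signature from the question; destroying the
// std::string is what actually releases the buffer.
inline void release_string(const char* p) {
    live_strings().erase(p);
}
```

With these helpers, the call in f() would look like write_str_async(keep_alive(read_str()), release_string);. If free_fn can run on another thread, the registry needs a mutex.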

Related

Can a char* be moved into an std::string?

Say I have something like this
extern "C" void make_foo (char** tgt) {
*tgt = (char*) malloc(4*sizeof(char));
strncpy(*tgt, "foo", 4);
}
int main() {
char* foo;
make_foo(&foo);
std::string foos{foo};
free(foo);
...
return 0;
}
Now, I would like to avoid using and then deleting the foo buffer. I.e., I'd like to change the initialisation of foos to something like
std::string foos{std::move(foo)};
and use no explicit free.
Turns out this actually compiles and seems to work, but I have a rather suspicious feel about it: does it actually move the C-defined string and properly free the storage? Or does it just ignore the std::move and leak the storage once the foo pointer goes out of scope?
It's not that I worry too much about the extra copy, but I do wonder if it's possible to write this in modern move-semantics style.

std::string constructor #5:
Constructs the string with the contents initialized with a copy of
the null-terminated character string pointed to by s. The length of
the string is determined by the first null character. The behavior is
undefined if s does not point at an array of at least
Traits::length(s)+1 elements of CharT, including the case when s is a
null pointer.
Your C-string is copied (the std::move doesn't matter here) and thus it is up to you to call free on foo.
A std::string will never take ownership.

tl;dr: Not really.
Pointers don't have any special move semantics. x = std::move(my_char_ptr) is the same as x = my_char_ptr. In that regard they are not like, say, a std::vector, where moving takes away the allocated space.
However, if you want to keep existing heap buffers and treat them as strings, that can't be done with std::string, since a std::string can't be constructed as a wrapper around an existing buffer (and there's the small-string optimization etc.). Instead, consider implementing a custom container, e.g. one with a string data buffer (std::vector<char>) and a std::vector<std::string_view> whose elements point into that buffer.
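A lighter alternative to a full custom container, if the goal is only to drop the explicit free, is to let a std::unique_ptr with a free-based deleter own the malloc'ed buffer and to read it through a std::string_view (C++17). This is a different technique from the container suggested above; FreeDeleter, CStringPtr and adopt_foo are hypothetical names, and make_foo is a stand-in for the C function in the question.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string_view>

// Deleter that releases memory obtained from malloc.
struct FreeDeleter {
    void operator()(char* p) const noexcept { std::free(p); }
};
using CStringPtr = std::unique_ptr<char, FreeDeleter>;

// Stand-in for the C function from the question.
inline void make_foo(char** tgt) {
    *tgt = static_cast<char*>(std::malloc(4));
    if (*tgt != nullptr) {
        std::memcpy(*tgt, "foo", 4);  // copies the terminating '\0' as well
    }
}

// Takes ownership of the buffer; free() runs when the result is destroyed.
inline CStringPtr adopt_foo() {
    char* raw = nullptr;
    make_foo(&raw);
    return CStringPtr(raw);
}
```

A caller can then write CStringPtr foo = adopt_foo(); and, after checking foo is not null, view it as std::string_view(foo.get()) with no copy and no explicit free.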

Is it possible to std::move local stack variables?

Please consider the following code:
struct MyStruct
{
int iInteger;
string strString;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct NewStruct = { 8, "Hello" };
vecStructs.push_back(std::move(NewStruct));
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}
Why does this work?
At the moment MyFunc is called, the return address should be placed on the stack of the current thread. Then the NewStruct object gets created, which should be placed on the stack as well. With std::move, I tell the compiler that I do not plan to use NewStruct anymore, so it can steal the memory. (The push_back function is the one with the move semantics.)
But then the function returns and NewStruct goes out of scope. Even if the compiler did not remove the memory occupied by the original structure from the stack, it at least has to remove the previously stored return address.
This would lead to a fragmented stack, and future allocations would overwrite the "moved" memory.
Can someone explain this to me, please?
EDIT:
First of all: Thank you very much for your answers.
But from what i have learned, I still cannot understand, why the following does not work like I expect it to work:
struct MyStruct
{
int iInteger;
string strString;
string strString2;
};
void MyFunc(vector<MyStruct>& vecStructs)
{
MyStruct oNewStruct = { 8, "Hello", "Definetly more than 16 characters" };
vecStructs.push_back(std::move(oNewStruct));
// At this point, oNewStruct.strString2 should be "", because its memory was stolen.
// But that is only the case when I explicitly write a move constructor in the form
// stated by Yakk.
}
int main()
{
vector<MyStruct> vecStructs;
MyFunc(vecStructs);
}

First, std::move does not move, and std::forward does not forward.
std::move is a cast to an rvalue reference. By convention, rvalue references are treated as "references you are permitted to move the data out of, as the caller promises they really don't need that data anymore".
On the other side of the fence, rvalue references implicitly bind to the return value of std::move (and sometimes forward), to temporary objects, in certain cases when returning a local from a function, and when using a member of a temporary or a moved-from object.
What happens within the function taking an rvalue reference is not magic. It cannot claim the storage directly within the object in question. It can, however, tear out its guts; it has permission (by convention) to mess with its argument's internal state if it can do the operation faster that way.
Now, C++ will automatically write some move constructors for you.
struct MyStruct
{
int iInteger;
string strString;
};
In this case, it will write something that roughly looks like this:
MyStruct::MyStruct( MyStruct&& other ) noexcept(true) :
iInteger( std::move(other.iInteger) ),
strString( std::move(other.strString) )
{}
I.e., it will do an element-wise move construction.
When you move an integer, nothing interesting happens. There isn't any benefit to messing with the source integer's state.
When you move a std::string, we get some efficiencies. The C++ standard describes what happens when you move from one std::string to another. Basically, if the source std::string is using the heap, the heap storage is transferred to the destination std::string.
This is a general pattern of C++ containers; when you move from them, they steal the "heap allocated" storage of the source container and reuse it in the destination.
Note that the source std::string remains a std::string, just one that has its "guts torn out". Most container-like things are left empty; std::string only guarantees a valid but unspecified state (with SBO it may simply keep its characters), and it isn't important right now.
In short, when you move from something, its memory is not "reused", but memory it owns can be reused.
In your case, MyStruct has a std::string which can use heap allocated memory. This heap allocated memory can be moved into the MyStruct stored in the std::vector.
Going a bit further down the rabbit hole, "Hello" is likely to be so short that SBO occurs (small buffer optimization), and the std::string doesn't use the heap at all. For this particular case, there may be next to no performance improvement due to moving.
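The points above can be checked with a small sketch. The build function and the 64-character string are assumptions made for illustration; the string is only chosen to be long enough that common implementations store it on the heap.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

struct MyStruct {
    int iInteger;
    std::string strString;
};

// The implicitly generated move constructor exists and is noexcept, because
// both members have noexcept move constructors.
static_assert(std::is_nothrow_move_constructible<MyStruct>::value,
              "MyStruct should be nothrow move constructible");

inline std::vector<MyStruct> build() {
    std::vector<MyStruct> vecStructs;
    MyStruct newStruct{8, std::string(64, 'x')};

    // std::move only casts: the expression has type MyStruct&&, nothing has moved yet.
    static_assert(std::is_same<decltype(std::move(newStruct)), MyStruct&&>::value,
                  "std::move is a cast to an rvalue reference");

    // push_back(MyStruct&&) runs the move constructor, which takes over the
    // string's heap storage. newStruct itself still lives on the stack and is
    // destroyed normally when build() returns.
    vecStructs.push_back(std::move(newStruct));
    return vecStructs;
}
```

Nothing on the stack is "stolen": only the heap buffer owned by strString changes hands, which is why the stack frame can be torn down as usual.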

Your example can be reduced to:
vector<string> vec;
string str; // populate with a really long string
vec.push_back(std::move(str));
This still raises the question, "Is it possible to move local stack variables." It just removes some extraneous code to make it easier to understand.
The answer is yes. Code like the above can benefit from std::move because std::string--at least if the content is large enough--stores its actual data on the heap, even if the variable is on the stack.
If you do not use std::move(), you can expect code like the above to copy the content of str, which could be arbitrarily large. If you do use std::move(), only the direct members of the string will be copied (the source is merely reset so that it no longer owns the buffer), and the data will be used without modification or copying.
It's basically the difference between this:
char* str; // populate with a really long string
char* other = new char[strlen(str)+1];
strcpy(other, str);
vs
char* str; // populate with a really long string
char* other = str;
In both cases, the variables are on the stack. But the data is not.
If you have a case where truly all the data is on the stack, such as a std::string with the "small string optimization" in effect, or a struct containing integers, then std::move() will buy you nothing.
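Whether a particular string's data is on the heap or inside the object can be probed with a sketch like the following. stored_inline is a hypothetical helper, and the result for short strings depends on the standard library implementation, so treat it as a diagnostic aid rather than a portable guarantee.

```cpp
#include <cstdint>
#include <string>

// Returns true if the character data lives inside the std::string object
// itself rather than in a separate allocation.
inline bool stored_inline(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end = begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}
```

On implementations with the small string optimization, stored_inline(std::string("Hello")) is typically true, while a string of a thousand characters cannot fit inside the object and therefore always reports false.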

How to use a std::string without copying?

I have a class say,
class Foo
{
public:
void ProcessString(std::string &buffer)
{
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
void Bar(std::string &buffer)
{
// perform other operations on "std::string" buffer
}
void Baz(std::string &buffer)
{
// perform other operations on "std::string" buffer
}
};
This class tries to use a std::string buffer to perform operations on it using various methods under these conditions:
I don't want to pass a copy of std::string which I already have.
I don't want to create multiple objects of this class.
For example:
// Once an object is created
Foo myObject;
// We could pass many different std::string's to same method without copying
std::string s1, s2, s3;
myObject.ProcessString(s1);
myObject.ProcessString(s2);
myObject.ProcessString(s3);
I could store the string as a class member so that the other functions know about it.
But it seems we cannot have a reference class member std::string &buffer because it can only be initialized from constructor.
I could use a pointer to std::string i.e. std::string *buffer and use it as a class member and then pass the addresses of s1, s2, s3.
class Foo
{
public:
void ProcessString(std::string *buf)
{
// Save pointer
buffer = buf;
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
void Bar()
{
// perform other operations on "std::string" buffer
}
void Baz()
{
// perform other operations on "std::string" buffer
}
private:
std::string *buffer;
};
Or, the other way could be to pass each function a reference to the std::string buffer, just as shown in the first example above.
Both ways seem like somewhat ugly workarounds for using a std::string without copying, as I have rarely seen a std::string used through a pointer, or every function of a class passed the same argument.
Is there a better way around this, or is what I'm doing just fine?

Keeping in myObject a reference or a pointer to a string which is not owned by your object is dangerous. It makes it easy to get nasty undefined behaviour.
Look at the following legal example (Bar is public):
myObject.ProcessString(s1); // start with s1 and keep its address
myObject.Bar(); // works with s1 (using address previously stored)
Look at the following UB:
if (is_today) {
myObject.ProcessString(string("Hello")); // uses an automatic temporary string
} // !! end of block: temporary is destroyed!
else {
string tmp = to_string(1234); // create a block variable
myObject.ProcessString(tmp); // call the main function
} // !! end of block: tmp is destroyed
myObject.Bar(); // expects to work with the stored pointer, but in reality uses an object that was already destroyed !! => UB
These errors are very nasty, because when reading the calling code, everything seems fine and well managed. The problem is hidden by the automatic destruction of block variables.
So if you really want to avoid copying the string, you could use a pointer as you suggested, but you should only use this pointer in functions called directly by ProcessString(), and make these functions private.
In all other cases, I'd strongly suggest reconsidering your position and considering:
a local copy of the string in the object that uses it.
Or use a string& parameter in all of the object's functions that need it. This avoids the copies but leaves the caller responsible for managing the string's lifetime properly.
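A minimal sketch of the safer layout described above: nothing is stored between calls, and the helpers are private so they can only run while the caller's string is still alive. The bodies of Bar and Baz are hypothetical, chosen only to make the effect observable.

```cpp
#include <string>

class Foo {
public:
    // The caller keeps ownership; the reference is only used during this call.
    void ProcessString(std::string& buffer) {
        Bar(buffer);
        Baz(buffer);
    }

private:
    // Private helpers: reachable only through ProcessString, so they never
    // see a string that has already been destroyed.
    void Bar(std::string& buffer) { buffer += "-bar"; }
    void Baz(std::string& buffer) { buffer += "-baz"; }
};
```

One Foo object can then process s1, s2 and s3 in turn without copying any of them and without remembering a pointer to any of them.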

You basically need to answer this question: who owns the string? Does Foo own the string? Does the external caller own the string? Or do they both share ownership of the string.
"Owning" the string means that the lifetime of the string is tied to it. So if Foo owns the string, the string will stop existing when Foo stops existing or destroys it. Shared ownership is far more complicated, but we can make it simpler by saying that the string will exist as long as any of the owners keep it.
Each situation has a different answer:
Foo owns the string: Copy the string into Foo, then let the member methods mutate it.
External resource owns the string: Foo should never hold a reference to the string outside of its own stack, since the string could be destroyed without its knowledge. This means that it needs to be passed by reference to every method that uses it and does not own it, even if the methods are in the same class.
Shared ownership: Use a shared_ptr when creating the string, then pass that shared_ptr to every instance that shares ownership. You then copy the shared_ptr to a member variable, and methods access it. This has much higher overhead than passing by reference, but if you want shared ownership it is one of the safest ways to do so.
There are actually several other kinds of ways to model ownership, but they tend to be more esoteric. Weak ownership, transferable ownership, etc.
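The first and third situations can be sketched as follows (the second one is plain pass-by-reference, as in the question's first example). OwningFoo and SharingFoo are hypothetical names, and Bar's body is only a placeholder operation.

```cpp
#include <memory>
#include <string>
#include <utility>

// Foo owns the string: take it by value and move it into the member.
class OwningFoo {
public:
    explicit OwningFoo(std::string s) : buffer_(std::move(s)) {}
    void Bar() { buffer_ += "!"; }
    const std::string& buffer() const { return buffer_; }

private:
    std::string buffer_;
};

// Shared ownership: the string lives as long as any holder of the shared_ptr.
class SharingFoo {
public:
    explicit SharingFoo(std::shared_ptr<std::string> s) : buffer_(std::move(s)) {}
    void Bar() { *buffer_ += "!"; }
    const std::string& buffer() const { return *buffer_; }

private:
    std::shared_ptr<std::string> buffer_;
};
```

A caller that no longer needs its copy of the string can pass std::move(s) to OwningFoo, so owning the string does not have to cost a copy.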

Since your requirement is that
1. I don't want to pass a copy of std::string which I already have.
2. I don't want to create multiple objects of this class.
Passing by reference would be the solution to 1.
Using static would be the solution to 2. Since it is a static member function, it won't belong to any object, and you can call it directly instead of through an object.
For example,
class Foo
{
public:
// static: callable as Foo::ProcessString(...) without creating an object
static void ProcessString(std::string &s)
{
// perform operations on std::string
// call other functions within class
// which use same std::string string
}
};
when you call this method, it would be something like this:
std::string s1, s2, s3;
Foo::ProcessString(s1);
Foo::ProcessString(s2);
Foo::ProcessString(s3);
One step further, if you want only one instance of this class, you can refer to singleton design pattern.

std::move - std::string - internal pointer

I'm surprised that the internal pointers of s and s2 to "sample" are not equal. What is the explanation?
#include <string>
#include <cassert>
int main()
{
std::string s("sample");
std::string s2(std::move(s));
assert(
reinterpret_cast<int*>(const_cast<char*>(s.data())) ==
reinterpret_cast<int*>(const_cast<char*>(s2.data()))
); // assertion failure here
return 1;
}

Why do you assume that they should be the same? You are constructing s2 from s using its move constructor. This transfers the data ownership from s over to s2 and leaves s in a valid but unspecified state, typically “empty”. The standard doesn’t specify what s contains afterwards, so you should not rely on s’s data after that (without re-assigning it first).
A much simplified (and incomplete) version of string could look as follows:
class string {
char* buffer;
public:
string(char const* from)
: buffer(new char[std::strlen(from) + 1])
{
std::strcpy(buffer, from);
}
string(string&& other)
: buffer(other.buffer)
{
other.buffer = nullptr; // (*)
}
~string() {
delete[] buffer;
}
char const* data() const { return buffer; }
};
I hope that shows why the data() members are not equal. If we had omitted the line marked by (*) we would delete the internal buffer twice at the end of main: once for s and once for s2. Resetting the buffer pointer ensures that this doesn’t happen.

Each std::string has its own buffer that it points to. Moving one into the other doesn't make them share a buffer. When you initialize s2, it takes over the buffer from s and becomes the owner of that buffer. So that s doesn't "own" the same buffer, s's buffer is simply set to a new empty one (which s is now responsible for).
Technically, there are also some optimizations involved: most likely no real buffer is explicitly allocated for empty or very small strings, and the implementation of std::string instead uses a part of the std::string object's own memory. This is usually known as the small-string optimization.
Also note that s has been moved from, so its contents are unspecified and your code cannot expect anything in particular from its data.

You should not use the moved-from string before replacing its value with some known value:
The library code is required to leave a valid value in the argument, but unless the type or function documents otherwise, there are no other constraints on the resulting argument value. This means that it's generally wisest to avoid using a moved from argument again. If you have to use it again, be sure to re-initialize it with a known value before doing so.
The library can stick anything it wants into the string, but it's very likely that you would end up with an empty string. That's what running an example from cppreference produces. However, one should not expect to find anything in particular inside a moved-from object.
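A small sketch of that advice, with take_and_reset as a hypothetical helper: the moved-from string is given a known value before anyone looks at it again.

```cpp
#include <string>
#include <utility>

// Moves the contents out of s and leaves s in a known (empty) state.
inline std::string take_and_reset(std::string& s) {
    std::string taken(std::move(s));
    s.clear();  // valid on a moved-from string; s is now definitely empty
    return taken;
}
```

Calling clear() is allowed because it has no preconditions; after it, s can be used like any other empty string.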

Is returning a C++ std::string object safe from memory leaks?

I'm fairly novice with C++'s strings so the following pattern may be a little fugly. I'm reviewing some code I've written before beginning integration testing with a larger system. What I'd like to know is if it is safe, or if it would be prone to leaking memory?
string somefunc( void ) {
string returnString;
returnString.assign( "A string" );
return returnString;
}
void anotherfunc( void ) {
string myString;
myString.assign( somefunc() );
// ...
return;
}
The understanding I have is that the value of returnString is assigned to a new object myString and then the returnString object is destroyed as part of resolving the call to somefunc. At some point in the future when myString goes out of scope, it too is destroyed.
I would have typically passed a pointer to myString into somefunc() and directly assigned to values to myString but I'm striving to be a little clearer in my code ( and relying on the side effect function style less ).

Yes, returning a string this way (by value) is safe, although I would prefer assigning it this way:
string myString = somefunc();
This is easier to read, and is also more efficient (saving the construction of an empty string, which would then be overwritten by the next call to assign).
std::string manages its own memory, and it has properly written copy constructor and assignment operator, so it is safe to use strings this way.
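Put together, the pattern from the question with the suggested initialization might look like the sketch below. Returning the size from anotherfunc is an addition made only so the result can be observed.

```cpp
#include <cstddef>
#include <string>

inline std::string somefunc() {
    std::string returnString;
    returnString.assign("A string");
    return returnString;  // moved or elided on return; nothing leaks
}

inline std::size_t anotherfunc() {
    std::string myString = somefunc();  // initialized directly from the returned value
    return myString.size();
}  // myString's destructor releases its memory here
```

No manual cleanup appears anywhere: each std::string releases its own memory when it is destroyed.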

Yes by doing
return returnString
You are conceptually invoking the string's copy constructor (since C++11 the move constructor, and the compiler may elide the operation entirely), which performs a copy* of returnString into the temporary (an rvalue) that takes the place of "somefunc()" in the calling expression:
myString.assign( somefunc() /*somefunc()'s return becomes temporary*/);
This is in turn passed to assign and used by assign to perform a copy into myString.
So in your case, the copy constructor of string guarantees a deep copy and ensures no memory leaks.
* Note this may or may not be a true deep copy, the behavior of the copy constructor is implementation specific. Some string libraries implement copy-on-write which has some internal bookkeeping to prevent copying until actually needed.

You're completely safe because you're returning the string by value, where the string will be "copied", and not by reference. If you were to return a std::string & to the local variable, you'd be doing it wrong, as you'd have a dangling reference. Some compilers may even perform return value optimization, which avoids copying the string upon return at all.

Yes, it's (at least normally) safe. One of the most basic contributions of almost any reasonable string class is the ability to act like a basic value for which normal assignment, returns, etc., "just work".

As you said, a string returnString is created inside somefunc and a copy is given back when the function returns. This is perfectly safe.
What you want is to give a reference to myString to somefunc (don't use pointer). It will be perfectly clear:
void somefunc( string& myString ) {
myString.assign( "A string" );
}
void anotherfunc( void ) {
string myString;
somefunc(myString);
// ...
return;
}
