Should I use unique_ptr for a string?

Should I use unique_ptr for a string? - c++

I'm new to C++. I've heard that using unique_ptr / shared_ptr is the "way to go" for references to data allocated on the heap. Does it make sense, therefore, to use unique_ptrs instead of std::strings?

Why would you want to do that?
An std::string object manages the life time of the "contained" string (memory bytes) by itself.
Since you are new to C++. Inside your function / class method, I will advice you create your objects on the stack:
Like so:
std::string s;
as opposed to using the heap:
std::string* s = new std::string();
Objects created on the stack will be destroyed when your object goes out of scope. So there is no need for smart pointers.
You can following this link to know more: http://www.learncpp.com/cpp-tutorial/79-the-stack-and-the-heap/

There's no need to use a std::unique_ptr or std::shared_ptr for a std::string.
You do not have to allocate memory for a simple std::string.
std::string str = "abc";
As simple as that. No need for memory allocation, as the std::string manages the 'real' string by itself.
There are situations which may lead to usage of a pointer, though it is likely a class/struct instance.
For instance consider using
std::unique_ptr<MyClass> p;
instead of
MyClass *p;
if possible.

You normally don't need pointers to strings, just like you generally don't need pointers to integers. You can just store string values, you can pass string values, etc.
But if you are in the exceptional situation where you'd need a pointer to a string, then yes, std::unique_ptr<std::string> or std::shared_ptr<std::string> are better than std::string*.

An std::unique_ptr ensures that the object pointed is not accidentally copied and properly deleted.
As you should avoid dynamic allocation as much as you can, you can simply keep the std::string as a member of your class. If it is a returned value, as already pointed out, the string class is smart enough to properly move resources in a safe way.
You don't need to guarantee that a string is unique, I mean, no copies exist, so the unique ptr is just a too strong constraint.
KIS rule: Keep it simple.

Usually, and by default the answer is no, as others have suggested. However - sometimes, the answer may paradoxically be "Possibly Yes"! How come?
With std::string, you don't control how, when and by whom your buffer is allocated. While this is somewhat mitigated if you use a different allocator (std::basic_string<char, MyAllocatorType>) - the resulting class is not std::string; and will generally not be accepted by functions which taken std::string's. And it may not make sense to get into allocators just for this purpose.
More specifically, you can allow your unique-pointer-based string class to be created as an owning wrapper for an existing buffer.
Now that we can use string_view's - and they are even in the standard in C++20 (std::string_view) - you don't have to rewrite the whole string class for unique-pointer-based strings; all you need is to create a string view over it, using the raw pointer and the size in bytes (or size - 1 if you want better null-termination safety.) And if you do want the std::string methods still, they'll be one-liners, e.g.
std::string_view view() const {
return std::string_view{uptr_.get(), size_};
}
substr(size_type pos = 0, size_type count = npos) const {
return view().substr(pos, count);
}
If you want to update a string in-place, while maintaining its size - an std::string won't work for you: Either its completely constant, or mutable both in size and in contents.

Related

Why moving a std::string landed on a different address? [duplicate]

I want to provide zero-copy, move based API. I want to move a string from thread A into thread B. Ideologically it seems that move shall be able to simply pass\move data from instance A into new instance B with minimal to none copy operations (mainly for addresses). So all data like data pointers will be simply copied no new instance (constructed via move). So does std::move on std::string garantee that .c_str() returns same result on instance before move and instance created via move constructor?

No. There's no requirement for std::string to use dynamic allocation or to do anything specific with such an allocation if it has one. In fact, modern implementations usually put short strings into the string object itself and don't allocate anything; then moving is the same as copying.
It's important to keep in mind that std::string is not a container, even though it looks very similar to one. Containers make stronger guarantees with respect to their elements than std::string does.

No, it's not guaranteed.
Guaranteeing it would basically prohibit (for one example) the short string optimization, in which the entire body of a short string is stored in the string object itself, rather than being allocated separately on the heap.
At least for now, I think SSO is regarded as important enough that the committee would be extremely reluctant to prohibit it (but that could change--when the original C++98 standard was written, they went to considerable trouble to allow copy-on-write strings, but they are now prohibited).

No,
but if that is needed, an option is to put the string in std::unique_ptr. Personally I would typically not rely on the c_str() value for more than the local scope.
Example, on request:
#include <iostream>
#include <string>
#include <memory>
int main() {
std::string ss("hello");
auto u_str = std::make_unique<std::string>(ss);
std::cout << u_str->c_str() <<std::endl;
std::cout << *u_str <<std::endl;
return 0;
}
if you don't have make_unique (new in C++14).
auto u_str = std::unique_ptr<std::string>(new std::string(ss));
Or just copy the whole implementation from the proposal by S.T.L.:
Ideone example on how to do that

It is documented here, so you can assume that the c_str() result is stable under some conditions.
You cannot however assume that c_str() will remain the same after move.
In practice it will stay in case of long string, but it won't stay for short strings.

Does std::move on std::string garantee that .c_str() returns same result?

I want to provide zero-copy, move based API. I want to move a string from thread A into thread B. Ideologically it seems that move shall be able to simply pass\move data from instance A into new instance B with minimal to none copy operations (mainly for addresses). So all data like data pointers will be simply copied no new instance (constructed via move). So does std::move on std::string garantee that .c_str() returns same result on instance before move and instance created via move constructor?

No. There's no requirement for std::string to use dynamic allocation or to do anything specific with such an allocation if it has one. In fact, modern implementations usually put short strings into the string object itself and don't allocate anything; then moving is the same as copying.
It's important to keep in mind that std::string is not a container, even though it looks very similar to one. Containers make stronger guarantees with respect to their elements than std::string does.

No, it's not guaranteed.
Guaranteeing it would basically prohibit (for one example) the short string optimization, in which the entire body of a short string is stored in the string object itself, rather than being allocated separately on the heap.
At least for now, I think SSO is regarded as important enough that the committee would be extremely reluctant to prohibit it (but that could change--when the original C++98 standard was written, they went to considerable trouble to allow copy-on-write strings, but they are now prohibited).

No,
but if that is needed, an option is to put the string in std::unique_ptr. Personally I would typically not rely on the c_str() value for more than the local scope.
Example, on request:
#include <iostream>
#include <string>
#include <memory>
int main() {
std::string ss("hello");
auto u_str = std::make_unique<std::string>(ss);
std::cout << u_str->c_str() <<std::endl;
std::cout << *u_str <<std::endl;
return 0;
}
if you don't have make_unique (new in C++14).
auto u_str = std::unique_ptr<std::string>(new std::string(ss));
Or just copy the whole implementation from the proposal by S.T.L.:
Ideone example on how to do that

It is documented here, so you can assume that the c_str() result is stable under some conditions.
You cannot however assume that c_str() will remain the same after move.
In practice it will stay in case of long string, but it won't stay for short strings.

Array: Storing Objects or References

As a Java developer I have the following C++ question.
If I have objects of type A and I want to store a collection of them in an array,
then should I just store pointers to the objects or is it better to store the object itself?
In my opinion it is better to store pointers because:
1) One can easily remove an object, by setting its pointer to null
2) One saves space.

Pointers or just the objects?
You can't put references in an array in C++. You can make an array of pointers, but I'd still prefer a container and of actual objects rather than pointers because:
No chance to leak, exception safety is easier to deal with.
It isn't less space - if you store an array of pointers you need the memory for the object plus the memory for a pointer.
The only times I'd advocate putting pointers (or smart pointers would be better) in a container (or array if you must) is when your object isn't copy construable and assignable (a requirement for containers, pointers always meet this) or you need them to be polymorphic. E.g.
#include <vector>
struct foo {
virtual void it() {}
};
struct bar : public foo {
int a;
virtual void it() {}
};
int main() {
std::vector<foo> v;
v.push_back(bar()); // not doing what you expected! (the temporary bar gets "made into" a foo before storing as a foo and your vector doesn't get a bar added)
std::vector<foo*> v2;
v2.push_back(new bar()); // Fine
}
If you want to go down this road boost pointer containers might be of interest because they do all of the hard work for you.
Removing from arrays or containers.
Assigning NULL doesn't cause there to be any less pointers in your container/array, (it doesn't handle the delete either), the size remains the same but there are now pointers you can't legally dereference. This makes the rest of your code more complex in the form of extra if statements and prohibits things like:
// need to go out of our way to make sure there's no NULL here
std::for_each(v2.begin(),v2.end(), std::mem_fun(&foo::it));
I really dislike the idea of allowing NULLs in sequences of pointers in general because you quickly end up burying all the real work in a sequence of conditional statements. The alternative is that std::vector provides an erase method that takes an iterator so you can write:
v2.erase(v2.begin());
to remove the first or v2.begin()+1 for the second. There's no easy "erase the nth element" method though on std::vector because of the time complexity - if you're doing lots of erasing then there are other containers which might be more appropriate.
For an array you can simulate erasing with:
#include <utility>
#include <iterator>
#include <algorithm>
#include <iostream>
int main() {
int arr[] = {1,2,3,4};
int len = sizeof(arr)/sizeof(*arr);
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// remove 2nd element, without preserving order:
std::swap(arr[1], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
// and again, first element:
std::swap(arr[0], arr[len-1]);
len -= 1;
std::copy(arr, arr+len, std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
preserving the order requires a series of shuffles instead of a single swap, which nicely illustrates the complexity of erasing that std::vector faces. Of course by doing this you've just reinvented a pretty big wheel a whole lot less usefully and flexibly than a standard library container would do for you for free!

It sounds like you are confusing references with pointers. C++ has 3 common ways of representing object handles
References
Pointers
Values
Coming from Java the most analogous way is to do so with a pointer. This is likely what you are trying to do here.
How they are stored though has some pretty fundamental effects on their behaviors. When you store as a value you are often dealing with copies of the values. Where pointers are dealing with one object with multiple references. Giving a flat answer of one is better than the other is not really possible without a bit more context on what these objects do

It completely depends on what you want to do... but you're misguided in some ways.
Things you should know are:
You can't set a reference to NULL in C++, though you can set a pointer to NULL.
A reference can only be made to an existing object - it must start initialized as such.
A reference cannot be changed (though the referenced value can be).
You wouldn't save space, in fact you would use more since you're using an object and a reference. If you need to reference the same object multiple times then you save space, but you might as well use a pointer - it's more flexible in MOST (read: not all) scenarios.
A last important one: STL containers (vector, list, etc) have COPY semantics - they cannot work with references. They can work with pointers, but it gets complicated, so for now you should always use copyable objects in those containers and accept that they will be copied, like it or not. The STL is designed to be efficient and safe with copy semantics.
Hope that helps! :)
PS (EDIT): You can use some new features in BOOST/TR1 (google them), and make a container/array of shared_ptr (reference counting smart pointers) which will give you a similar feel to Java's references and garbage collection. There's a flurry of differences but you'll have to read about it yourself - they are a great feature of the new standard.

You should always store objects when possible; that way, the container will manage the objects' lifetimes for you.
Occasionally, you will need to store pointers; most commonly, pointers to a base class where the objects themselves will be of different types. In that case, you need to be careful to manage the lifetime of the objects yourself; ensuring that they are not destroyed while in the container, but that they are destroyed once they are no longer needed.
Unlike Java, setting a pointer to null does not deallocate the object pointed to; instead, you get a memory leak if there are no more pointers to the object. If the object was created using new, then delete must be called at some point. Your best options here are to store smart pointers (shared_ptr, or perhaps unique_ptr if available), or to use Boost's pointer containers.

You can't store references in a container. You could store (naked) pointers instead, but that's prone to errors and is therefore frowned upon.
Thus, the real choice is between storing objects and smart pointers to objects. Both have their uses. My recommendation would be to go with storing objects by value unless the particular situation demands otherwise. This could happen:
if you need to NULL out the object without removing it from the
container;
if you need to store pointers to the same object in
multiple containers;
if you need to treat elements of the container
polymorphically.
One reason to not do it is to save space, since storing elements by value is likely to be more space-efficient.

To add to the answer of aix:
If you want to store polymorphic objects, you must use smart pointers because the containers make a copy, and for derived types only copy the base part (at least the standard ones, I think boost has some containers which work differently). Therefore you'll lose any polymorphic behaviour (and any derived-class state) of your objects.

C++ containers on classes, returning pointers

I'm having some trouble to find the best way to accomplish what I have in mind due to my inexperience. I have a class where I need to a vector of objects. So my first question will be:
is there any problem having this: vector< AnyType > container* and then on the constructor initialize it with new (and deleting it on the destructor)?
Another question is: if this vector is going to store objects, shouldn't it be more like vector< AnyTipe* > so they could be dynamically created? In that case how would I return an object from a method and how to avoid memory leaks (trying to use only STL)?

Yes, you can do vector<AnyType> *container and new/delete it. Just be careful when you do subscript notation to access its elements; be sure to say (*container)[i], not container[i], or worse, *container[i], which will probably compile and lead to a crash.
When you do a vector<AnyType>, constructors/destructors are called automatically as needed. However, this approach may lead to unwanted object copying if you plan to pass objects around. Although vector<AnyType> lends itself to better syntactic sugar for the most obvious operations, I recommend vector<AnyType*> for non-primitive objects simply because it's more flexible.

is there any problem having this: vector< AnyType > *container and then on the constructor initialize it with new (and deleting it on the destructor)
No there isn't a problem. But based on that, neither is there a need to dynamically allocate the vector.
Simply make the vector a member of the class:
class foo
{
std::vector<AnyType> container;
...
}
The container will be automatically constructed/destructed along with the instance of foo. Since that was your entire description of what you wanted to do, just let the compiler do the work for you.

Don't use new and delete for anything.
Sometimes you have to, but usually you don't, so try to avoid it and see how you get on. It's hard to explain exactly how without a more concrete example, but in particular if you're doing:
SomeType *myobject = new SomeType();
... use myobject for something ...
delete myobject;
return;
Then firstly this code is leak-prone, and secondly it should be replaced with:
SomeType myobject;
... use myobject for something (replacing -> with . etc.) ...
return;
Especially don't create a vector with new - it's almost always wrong because in practice a vector almost always has one well-defined owner. That owner should have a vector variable, not a pointer-to-vector that they have to remember to delete. You wouldn't dynamically allocate an int just to be a loop counter, and you don't dynamically allocate a vector just to hold some values. In C++, all types can behave in many respects like built-in types. The issues are what lifetime you want them to have, and (sometimes) whether it's expensive to pass them by value or otherwise copy them.
shouldn't it be more like vector< AnyTipe* > so they could be dynamically created?
Only if they need to be dynamically created for some other reason, aside from just that you want to organise them in a vector. Until you hit that reason, don't look for one.
In that case how would I return an object from a method and how to avoid memory leaks (trying to use only STL)?
The standard libraries don't really provide the tools to avoid memory leaks in all common cases. If you must manage memory, I promise you that it is less effort to get hold of an implementation of shared_ptr than it is to do it right without one.

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.
void unleashMonkeyFish()
{
MonkeyFish * monkey_fish = new MonkeyFish();
std::string localname = "Wanda";
monkey_fish->setName(localname);
monkey_fish->go();
}
In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.
Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:
std::string name;
std::string & name;
When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:
void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );
I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?
What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?
Basically, I want to avoid these scenarios:
I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
I don't want to copy the string if I can help it.
I would prefer not to new localname
What prototype and data member combination should I use?
CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometime several times in a single day.)
EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.

One way to do this is to have your string
std::string name;
As the data-member of your object. And then, in the unleashMonkeyFish function create a string like you did, and pass it by reference like you showed
void setName( const std::string & parameter_name ) {
name = parameter_name;
}
It will do what you want - creating one copy to copy the string into your data-member. It's not like it has to re-allocate a new buffer internally if you assign another string. Probably, assigning a new string just copies a few bytes. std::string has the capability to reserve bytes. So you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (i have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a c-string. They say they have a copy-on-write string, which would explain that behavior).
The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects - they manage their own stuff. Classes have a destructor that they use to free allocated resources once objects die, std::string has too. In my opinion, you should not worry about having that std::string local in the function. It will not do anything noticeable to your performance anyway most likely. Some std::string implementations (msvc++ afaik) have a small-buffer optimization: For up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.
Edit:
As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):
void setName(std::string parameter_name) {
name.swap(parameter_name);
}
The reason that this is better, is that now the caller knows that the argument is being copied. Return value optimization and similar optimizations can now be applied easily by the compiler. Consider this case, for example
obj.setName("Mr. " + things.getName());
If you had the setName take a reference, then the temporary created in the argument would be bound to that reference, and within setName it would be copied, and after it returns, the temporary would be destroyed - which was a throw-away product anyway. This is only suboptimal, because the temporary itself could have been used, instead of its copy. Having the parameter not a reference will make the caller see that the argument is being copied anyway, and make the optimizer's job much more easy - because it wouldn't have to inline the call to see that the argument is copied anyway.
For further explanation, read the excellent article BoostCon09/Rvalue-References

If you use the following method declaration:
void setName( const std::string & parameter_name );
then you would also use the member declaration:
std::string name;
and the assignment in the setName body:
name = parameter_name;
You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).
Finally, your std::string implementation probably uses reference counted strings anyway, so no copy of the actual string data is being made in the assignment. If you're that concerned about performance, you had better be intimately familiar with the STL implementation you are using.

Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.
Ok, so storing a reference to an object is perfectly legit, but obviously you must be aware of the scope of that object. Much easier to pass the string by reference, then copy to the class member variable. Unless the string is very large, or your performing this operation a lot (and I mean a lot, a lot) then there's really no need to worry.
Can you clarify exactly why you don't want to copy the string?
Edit
An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. Then get a new MonkeyName by requesting one from the pool (sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkayName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.
... hope that made some sense :)

This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in a way such that it lives at least as long as every pointer at it.
Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.

When the compiler sees ...
std::string localname = "Wanda";
... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.
You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.

As a simple rule of thumb store your data as a copy within a class, and pass and return data by (const) reference, use reference counting pointers wherever possible.
I'm not so concerned about copying a few 1000s bytes of string data, until such time that the profiler says it is a significant cost. OTOH I do care that the data structures that hold several 10s of MBs of data don't get copied.

In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:
class MonkeyFish {
public:
void setName( const std::string & parameter_name ) { name = parameter_name; }
private:
std::string name;
};
This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)

You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).
I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.

If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.
As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.
Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string as a reference pointer to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Should I use unique_ptr for a string? - c++

I'm new to C++. I've heard that using unique_ptr / shared_ptr is the "way to go" for references to data allocated on the heap. Does it make sense, therefore, to use unique_ptrs instead of std::strings?

Related

Why moving a std::string landed on a different address? [duplicate]

Does std::move on std::string garantee that .c_str() returns same result?

Array: Storing Objects or References

C++ containers on classes, returning pointers

When is it not a good idea to pass by reference?

Categories

Resources