C++ Passing a long string to constructor or setter - c++

I have a class with a Glib::ustring member (if you're not familar with it, assume it's std::string) which is expected to contain a long string, i.e. at lest one paragraph, maybe a few more. Maybe even more than 10 paragraphs. The string is planned to be displayed in a GUI, so maybe in the future it will be stored in the text widget's buffer, but for now it's just a string member object of my C++ class.
The question is: how to pass a string to the constructor, and how to pass it to the set_string() setter method. A long string means a big copy, so I though a good solution would be to take an rvalue reference and std::move the argument into the member object. But I also don't want the class interface to be suprising and hard to use/understand. You know, the rule of least surprise.
So I was thinking, what's the expected/common solution in this case?
(for the setter method here's another option: since editing is done in GUI, just let the GUI edit the string directly, and then the only use of the setter method is to completely replace the string programatically, e.g. reset it or undo a recent edit)
class MyClass
{
public:
explicit MyClass (Glib::ustring str);
void set_string (Glib::ustring str);
private:
Glib::ustring str;
}
(I've seen code of existing libraries, e.g. gtkmm, taking strings by const reference, but I also saw SO posts with answers saying pass-by-value to allow optimization)

http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Your function should take the string by value, assuming your string has an efficient move-constructor.
When you expect that a string will be long, the caller calls std::move and passes the value to the setter/constructor. This isn't surprising, because std::move makes it pretty explicit that you are moving the data.
If your system has no more than a modest amount of concurrency, and you rarely modify strings (really, most strings are shared far more than they are modified) shared pointers to immutable strings is actually a pretty useful pattern. (The shared write data is the reference count, so with high levels of concurrency can cause contention)

I'd go for references (not only passing them but also storing a reference). However, a setter has to re-seat the reference, which isn't possible. If you really need to change the string after construction, you have to use pointers (maybe smart pointers).
Depending on the surrounding code you might want to use shared ownership of a string. In this case, I'd use a std::shared_pointer<GLib::ustring>. If you need a setter. (Otherwise, a reference is better.)
Please note that "some paragraphs" aren't very long strings. In user interfaces, a couple of milliseconds of delay, let's say when loading some text file, is totally acceptable. As always: please first profile your code, detect the bottleneck, then optimize if you need it to be faster.

Related

Temporary modifyable adapters

I find it convenient to (and have a lot of code which) wrappers some storage object in an allocation adapter, and then this allocation adapter is used commonly to scope the guaranteed backstore in a managing object for its lifetime, which is normally the lifetime of the function call.
This seems to be using a nonstandard VisualStudio extension, however, and I'm curious as to what is a better paradigm and why..
e.g. a lot of our code still uses CString. One of the features of a CString is the ability to lock its contents so that you can take a non-const pointer into the underlying buffer and manipulate it directly - a'la C library functions.
This makes it very easy to use a CString for its automatic resource management, but still interface to legacy libraries / code which needed a writable character buffer. However, locking the buffer is not itself an RAII operation, so I made an RAII class to perform that function (like taking on a lock):
// replaces each occurrence of any of the given list of characters with a specified replacement character in-place
inline void ReplaceAll(CStringW & str, const wchar_t * chsOld, const wchar_t chNew)
{
ReplaceAll(make_autobuffer(str), chsOld, chNew);
}
The code behind make_autobuffer is rather long-winded, but the idea boils down to returning an object that holds a lock on the underlying string's buffer, and will ask the str to release that buffer lock on its destruction, thus releasing the buffer back to control by the CString.
This is a trivial idea, and the implementation is fairly trivial as well (lots of boiler plate to make it robust wrt wide/narrow strings, and variations that take a fixed buffer size or not, and some debugging helpers, etc.).
But more generally speaking, I have often found it very useful to have a class which takes a non-const reference to an instance of something, and wrappers that instance in some sort of mutable adapter layer, and then releases the underlying entity upon termination.
The question is whether there is a better way to accomplish this than having to create a named variable to do the adaptation:
// replaces each occurrence of any of the given list of characters with a specified replacement character in-place
inline void ReplaceAll(CStringW & str, const wchar_t * chsOld, const wchar_t chNew)
{
auto adapter = make_autobuffer(str);
ReplaceAll(adapter, chsOld, chNew);
}
This works, and doesn't violate the standard. I'm no longer trying to pass a non-const object by ref from a temp object - since I've forced the temp object into being less temporary by naming it.
But... that seems silly. At the end of the day, doing the above doesn't change the meaning as far as I can tell. And if the VisualStudio allowance was non-standard, then is there a better standard way that isn't so silly?
But... that seems silly. At the end of the day, doing the above doesn't change the meaning as far as I can tell.
Yes it is quite different. However, had you written
auto const& adapter = make_autobuffer(str);
ReplaceAll(adapter, chsOld, chNew);
that would be the same as
ReplaceAll(make_autobuffer(str), chsOld, chNew);
(consider what happens if the copy constructor isn't accessible for the return type of make_autobuffer).
WARNING I just realized we know nothing about what make_autobuffer actually returns. Let me state my assumption: I assume you return a RAII wrapper class with implicit conversion to the parameter type expected by the ReplaceAll function (e.g. char const*¹)
Onto the "main question" - this seems pretty vague. I think you're talking about the MSVC non-standard extension to "lifetime extensions of temporaries when bound to non-const references".
I don't see where that comes in in the code you showed, for the simple reason that you show make_autobuffer being called inside the parameter list. As such, the temporary is guaranteed to exist until after that function call returned anyways, no need to bind/name things to achieve that. Not even in standard c++03.
If you wish to extend the lock to beyond the function call, then yes, name the RAII handle.
Similar design points in the standard library (and Boost counterparts):
std::lock_guard needs to be named to keep the lock beyond the end of the containing full expression
std::async returns a future that needs to be kept in a named variable if you want any semblance of actual async execution (otherwise, the destructor of the future still causes immediate blocking execution)
¹ LPCTSTR in your region?
Ultimately what I chose to do to solve this was to change from using Wrapper & to Wrapper (by copy / move).
Passing a non-const wrapper by ref is a clear violation of C++ argument rules. Having to declare it explicitly was annoying. Passing it by value meant that I could take advantage of Cx11's move rules and maintain no copies for a CStringAutoBuffer (and other similar types of temporary wrapper objects), but still create one on the fly without having to explicitly name one locally.
This preserves const correctness as well.
Overall, a happy resolution to this genre of issue. :)

Questions and Verifications on immutable [string] objects c++

I've been doing some reading on immutable strings in general and in c++, here, here, and I think I have a decent understanding of how things work. However I have built a few assumptions that I would just like to run by some people for verification. Some of the assumptions are more general than the title would suggest:
While a const string in c++ is the closest thing to an immutable string in STL, it is only locally immutable and therefore doesn't experience the benefit of being a smaller object. So it has all the trimmings of a standard string object but it can't access all of the member functions. This means that it doesn't create any optimization in the program over non-const? But rather just protects the object from modification? I understand that this is an important attribute but I'm simply looking to know what it means to use this
I'm assuming that an object's member functions exist only once in read-only memory, and how is probably implementation specific, so does a const object have a separate location in memory? Or are the member functions limited in another way? If there are only 'const string' objects and no non-const strings in a code base, does the compiler leave out the inaccessible functions?
I recall hearing that each string literal is stored only once in read-only memory in c++, however I don't find anything on this here. In other words, if I use some string literal multiple times in the same program, each instance references the same location in memory. I'm going to assume no, but would two string objects initialized by the same string literal point to the same string until one is modified?
I apologize if I have included too many disjunct thoughts in the same post, they are all related to me as string representation and just learning how to code better.
As far as I know, std::string cannot assume that the input string is a read-only constant string from your data segment. Therefore, point (3) does not apply. It will most likely allocate a buffer and copy the string in the buffer.
Note that C++ (like C) has a const qualifier for compilation time, it is a good idea to use it for two reasons: (a) it will help you find bugs, a statement such as a = 5; if a is declared const fails to compile; (b) the compile may be able to optimize the code more easily (it may otherwise not be able to figure out that the object is constant.)
However, C++ has a special cast to remove the const-ness of a variable. So our a variable can be cast and assigned a value as in const_cast<int&>(a) = 5;. An std::string can also get its const-ness removed. (Note that C does not have a special cast, but it offers the exact same behavior: * (int *) &a = 5)
Are all class members defined in the final binary?
No. std::string as most of the STL uses templates. Templates are compiled once per unit (your .o object files) and the link will reduce duplicates automatically. So if you look at the size of all the .o files and add them up, the final output will be a lot small.
That also means only the functions that are used in a unit are compiled and saved in the object file. Any other function "disappear". That being said, often function A calls function B, so B will be defined, even if you did not explicitly call it.
On the other hand, because these are templates, very often the functions get inlined. But that is a choice by the compiler, not the language or the STL (although you can use the inline keyword for fun; the compiler has the right to ignore it anyway).
Smaller objects... No, in C++ an object has a very specific size that cannot change. Otherwise the sizeof(something) would vary from place to place and C/C++ would go berserk!
Static strings that are saved in read-only data sections, however, can be optimized. If the linker/compiler are good enough, they will be able to merge the same string in a single location. These are just plan char * or wchar_t *, of course. The Microsoft compiler has been able to do that one for a while now.
Yet, the const on a string does not always force your string to be put in a read-only data section. That will generally depend on your command line option. C++ may have corrected that, but I think C still put everything in a read/write section unless you use the correct command line option. That's something you need to test to make sure (your compiler is likely to do it, but without testing you won't know.)
Finally, although std::string may not use it, C++ offers a quite interesting keyword called mutable. If you heard about it, you would know that a variable member can be marked as mutable and that means even const functions can modify that variable member. There are two main reason for using that keyword: (1) you are writing a multi-thread program and that class has to be multi-thread safe, in that case you mark the mutex as mutable, very practical; (2) you want to have a buffer used to cache a computed value which is costly, that buffer is only initialized when that value is requested to not waste time otherwise, that buffer is made mutable too.
Therefore the "immutable" concept is only really something that you can count on at a higher level. In practice, reality is often quite different. For example, an std::string c_str() function may reallocate the buffer to add the necessary '\0' terminator, yet that function is marked as being a const:
const CharT* c_str() const;
Actually, an implementation is free to allocate a completely different buffer, copy its existing data to that buffer and return that bare pointer. That means internally the std::string could be allocate many buffers to store large strings (instead of using realloc() which can be costly.)
Once thing, though... when you copy string A into string B (B = A;) the string data does not get copied. Instead A and B will share the same data buffer. Once you modify A or B, and only then, the data gets copied. This means calling a function which accepts a string by copy does not waste that much time:
int func(std::string a)
{
...
if(some_test)
{
// deep copy only happens here
a += "?";
}
}
std::string b;
func(b);
The characters of string b do not get copied at the time func() gets called. And if func() never modifies 'a', the string data remains the same all along. This is often referenced as a shallow copy or copy on write.

Would this optimization in the implementation of std::string be allowed?

I was just thinking about the implementation of std::string::substr. It returns a new std::string object, which seems a bit wasteful to me. Why not return an object that refers to the contents of the original string and can be implicitly assigned to a std::string? A kind of lazy evaluation of the actual copying. Such a class could look something like this:
template <class Ch, class Tr, class A>
class string_ref {
public:
// not important yet, but *looks* like basic_string's for the most part
private:
const basic_string<Ch, Tr, A> &s_;
const size_type pos_;
const size_type len_;
};
The public interface of this class would mimic all of the read-only operations of a real std::string, so the usage would be seamless. std::string could then have a new constructor which takes a string_ref so the user would never be the wiser. The moment you try to "store" the result, you end up creating a copy, so no real issues with the reference pointing to data and then having it modified behind its back.
The idea being that code like this:
std::string s1 = "hello world";
std::string s2 = "world";
if(s1.substr(6) == s2) {
std::cout << "match!" << std::endl;
}
would have no more than 2 std::string objects constructed in total. This seems like a useful optimization for code which that performs a lot of string manipulations. Of course, this doesn't just apply to std::string, but to any type which can return a subset of its contents.
As far as I know, no implementations do this.
I suppose the core of the question is:
Given a class that can be implicitly converted to a std::string as needed, would it be conforming to the standard for a library writer to change the prototype of a member's to return type? Or more generally, do the library writers have the leeway to return "proxy objects" instead of regular objects in these types of cases as an optimization?
My gut is that this is not allowed and that the prototypes must match exactly. Given that you cannot overload on return type alone, that would leave no room for library writers to take advantage of these types of situations. Like I said, I think the answer is no, but I figured I'd ask :-).
This idea is copy-on-write, but instead of COW'ing the entire buffer, you keep track of which subset of the buffer is the "real" string. (COW, in its normal form, was (is?) used in some library implementations.)
So you don't need a proxy object or change of interface at all because these details can be made completely internal. Conceptually, you need to keep track of four things: a source buffer, a reference count for the buffer, and the start and end of the string within this buffer.
Anytime an operation modifies the buffer at all, it creates its own copy (from the start and end delimiters), decreases the old buffer's reference count by one, and sets the new buffer's reference count to one. The rest of the reference counting rules are the same: copy and increase count by one, destruct a string and decrease count by one, reach zero and delete, etc.
substr just makes a new string instance, except with the start and end delimiters explicitly specified.
This is a quite well-known optimization that is relatively widely used, called copy-on-write or COW. The basic thing is not even to do with substrings, but with something as simple as
s1 = s2;
Now, the problem with this optimization is that for C++ libraries that are supposed to be used on targets supporting multiple threads, the reference count for the string has to be accessed using atomic operations (or worse, protected with a mutex in case the target platform doesn't supply atomic operations). This is expensive enough that in most cases the simple non-COW string implementation is faster.
See GOTW #43-45:
http://www.gotw.ca/gotw/043.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/045.htm
To make matters worse, libraries that have used COW, such as the GNU C++ library, cannot simply revert to the simple implementation since that would break the ABI. (Although, C++0x to the rescue, as that will require an ABI bump anyway! :) )
Since substr returns std::string, there is no way to return a proxy object, and they can't just change the return type or overload on it (for the reasons you mentioned).
They could do this by making string itself capable of being a sub of another string. This would mean a memory penalty for all usages (to hold an extra string and two size_types). Also, every operation would need to check to see if it has the characters or is a proxy. Perhaps this could be done with an implementation pointer -- the problem is, now we're making a general purpose class slower for a possible edge case.
If you need this, the best way is to create another class, substring, that constructs from a string, pos, and length, and coverts to string. You can't use it as s1.substr(6), but you can do
substring sub(s1, 6);
You would also need to create common operations that take a substring and string to avoid the conversion (since that's the whole point).
Regarding your specific example, this worked for me:
if (&s1[6] == s2) {
std::cout << "match!" << std::endl;
}
That may not answer your question for a general-purpose solution. For that, you'd need sub-string CoW, as #GMan suggests.
What you are talking about is (or was) one of the core features of Java's java.lang.String class (http://fishbowl.pastiche.org/2005/04/27/the_string_memory_gotcha/). In many ways, the designs of Java's String class and C++'s basic_string template are similar, so I would imagine that writing an implementation of the basic_string template utilizing this "substring optimization" is possible.
One thing that you will need to consider is how to write the implementation of the c_str() const member. Depending on the location of a string as a substring of another, it may have to create a new copy. It definitely would have to create a new copy of the internal array if the string for which the c_str was requested is not a trailing substring. I think that this necessitates using the mutable keyword on most, if not all, of the data members of the basic_string implementation, greatly complicating the implementation of other const methods because the compiler is no longer able to assist the programmer with const correctness.
EDIT: Actually, to accommodate c_str() const and data() const, you could use a single mutable field of type const charT*. Initially set to NULL, it could be per-instance, initialized to a pointer to a new charT array whenever c_str() const or data() const are called, and deleted in the basic_string destructor if non-NULL.
If and only if you really need more performance than std::string provides then go ahead and write something that works the way you need it to. I have worked with variants of strings before.
My own preference is to use non-mutable strings rather than copy-on-write, and to use boost::shared_ptr or equivalent but only when the string is actually beyond 16 in length, so the string class also has a private buffer for short strings.
This does mean that the string class might carry a bit of weight.
I also have in my collection list a "slice" class that can look at a "subset" of a class that lives elsewhere as long as the lifetime of the original object is intact. So in your case I could slice the string to see a substring. Of course it would not be null-terminated, nor is there any way of making it such without copying it. And it is not a string class.

Coding practices in C++, what is your pick and why?

I have a big object say MyApplicationContext which keeps information about MyApplication such as name, path, loginInformation, description, details and others..
//MyApplicationCtx
class MyApplicationCtx{
// ....
private:
std::string name;
std::string path;
std::string desciption;
struct loginInformation loginInfo;
int appVersion;
std::string appPresident;
//others
}
this is my method cloneApplication() which actually sets up a new application. there are two ways to do it as shown in Code 1 and Code 2. Which one should I prefer and why?
//Code 1
public void cloneApplication(MyApplicationCtx appObj){
setAppName(appObj);
setAppPath(appObj);
setAppAddress(&appObj); // Note this address is passed
setAppDescription(appObj);
setAppLoginInformation(appObj);
setAppVersion(appObj);
setAppPresident(appObj);
}
public void setAppLoginInformation(MyApplicationCtx appObj){
this->loginInfo = appObj.loginInfo; //assume it is correct
}
public void setAppAddress(MyApplicationCtx *appObj){
this->address = appObj->address;
}
.... // same way other setAppXXX(appObj) methods are called.
Q1. Does passing the big object appObj everytime has a performance impact?
Q2. If I pass it using reference, what should be the impact on performance?
public void setAppLoginInformation(MyApplicationCtx &appObj){
this->loginInfo = appObj.loginInfo;
}
//Code 2
public void setUpApplication(MyApplicationCtx appObj){
std::string appName;
appName += appOj.getName();
appName += "myname";
setAppName(appName);
std::string appPath;
appPath += appObj.getPath();
appPath += "myname";
setAppPath(appPath);
std::string appaddress;
appaddress += appObj.getAppAddress();
appaddress += "myname";
setAppAddress(appaddress);
... same way setup the string for description and pass it to function
setAppDescription(appdescription);
struct loginInformation loginInfo = appObj.getLoginInfo();
setAppLoginInformation(loginInfo);
... similarly appVersion
setAppVersion(appVersion);
... similarly appPresident
setAppPresident(appPresident);
}
Q3. Compare code 1 and code 2, which one should I use? Personally i like Code 1
You're better off defining a Copy Constructor and an Assignment Operator:
// Note the use of passing by const reference! This avoids the overhead of copying the object in the function call.
MyApplicationCtx(const MyApplicationCtx& other);
MyApplicationCtx& operator = (const MyApplicationCtx& other);
Better still, also define a private struct in your class that looks like:
struct AppInfo
{
std::string name;
std::string path;
std::string desciption;
struct loginInformation loginInfo;
int appVersion;
std::string appPresident;
};
In your App class' copy constructor and assignment operator you can take advantage of AppInfo's automatically generated assignment operator to do all of the assignment for you. This is assuming you only want a subset of MyApplicationCtx's members copied when you "clone".
This will also automatically be correct if you add or remove members of the AppInfo struct without having to go and change all of your boilerplate.
Short answer:
Q1: Given the size of your MyAppCtx class, yes, a significant performance hit will take place if the data is dealt with very frequently.
Q2: Minimal, you're passing a pointer.
Q3: Neither, for large objects like that you should use reference semantics and access the data through accessors. Don't worry about function call overhead, with optimizations turned on, the compiler can inline them if they meet various criteria (which I leave up to you to find out).
Long answer:
Given functions:
void FuncByValue(MyAppCtx ctx);
void FuncByRef1(MyAppCtx& ctx);
void FuncByRef2(MyAppCtx* ctx);
When passing large objects like your MyApplicationCtx, it's a good idea to use reference semantics (FuncByRef1 & FuncByRef2), passing by reference is identical in performance to passing a pointer, the difference is only the syntax. If you pass the object by value, the object is copy-constructed into the function, such that the argument you pass into FuncByValue is different from the parameter FuncByValue receives. This is where you have to be careful of pointers (if any) contained in an object that was passed by value, because the pointer will have been copied as well, so it's very possible that more than one object will point to one element in memory at a given time, which could lead to memory leaks, corruption, etc.
In general, for objects like your MyAppCtx, I would recommend passing by reference and using accessors as appropriate.
Note, the reason I differentiated between argument and parameter above is that there is a difference between a function argument and a function parameter, it is as follows:
Given (template T is used simply to demonstrate that object type is irrelevent here):
template<typename T>
void MyFunc(T myTobject);
When calling MyFunc, you pass in an argument, eg:
int my_arg = 3;
MyFunc(my_arg);
And MyFunc receives a parameter, eg:
template<typename T>
void MyFunc(T myTobject)
{
T cloned_param = T(myTobject);
}
In other words, my_arg is an argument, myTobject is a parameter.
Another note, in the above examples, there are essentially three versions of my_arg in memory: the original argument, the copy-constructed parameter myTobject, plus cloned_param which was explicitly copied as well.
Luke beat me to tell you about copy constructors, to answer your other questions passing a large object by value has a performance impact when compared to passing by reference, make it a const if the function won't change it as in this case.
General:
Why do you need to copy application object? Isn't it better to use singleton for this (with completely disabled copying by the way)?
Q1:
Not only performance (yes, they will be copied) but memory too. As soon as I saw std::string implementations they at least occupy 2 memory chunks and first is in any case significantly less then minimal allocation size so such objects could cause memory efficiency problem if cloned extensively.
Q2:
Passing reference is barely different (from performance point of view) from passing pointer so this should in general constant complexity. It is much better. Don't forget to add "const" modifier to block modifications.
Q3:
I don't like actually both because of encapsulation broken. Once I saw good Java programmer article called something like "Why setters/getters are evil" (Well, I found it easily, there is not so much based on Java itself). This is VERY useful article to change style forever.
Q1..
Pass by object is heavy operation since it will create copy by invoking copy constructor.
Q2.
Pass reference to constant , it will improve perfomance.
Q1 - yes, every time you pass an object a copy is done
Q2 - minimal since an object passed by reference is basically just a pointer
Q3 - It is generally not a good idea to have large monolithic objects, instead you should split your objects into smaller objects, this allows for better re-usability and makes the code easier to read.
The best practice for cloning an object where all the members are copyable, like "Code 1" appears to be doing, is to use the default copy constructor - you don't have to write any code at all. Just copy like this:
MyApplicationCtx new_app = old_app;
"Code 2" is doing something different to "Code 1", so choosing one over the other is a matter of what you want the code to do, not a matter of style.
Q1. Passing a large object by value will cause it to be copied, which will have an impact on performance.
Q2. Yes, passing a reference to a large structure is more efficient than passing a copy. The only way to tell how large the impact is is to measure it with a profiler.
There is one single point that has not been dealt in any other of the answers (that focused on your explicit questions more than in the general approach). I agree with #luke in that you should use what is idiomatic: copy constructor and assignment operators are there for a reason. But just for the sake of discusion on the first possibility you presented:
In the first block you propose small functions like:
public:
void setAppLoginInformation(MyApplicationCtx appObj){
this->loginInfo = appObj.loginInfo; //assume it is correct
}
Now, besides the fact that the parameter should be passed by const reference, there are some other design issues there. You are offering a public operation that promises to change the login information, but the argument you require from your user is a full blown application context.
If you want to provide a method for just setting one of the attributes, I would change the method signature to match the attribute type:
public:
void setAppLoginInformation( loginInformation const & li ); // struct is optional here
This offers the possibility of changing the loginInformation both with a full application context or just with some specific login information object you can build yourself.
If on the other hand you want to disallow changing particular attributes of your class and you want to allow setting the values only from another application context object, then you should use the assignment operator, and if you want to do it in terms of small setter functions (assuming that the compiler provided assigment operator does not suffice), make them private.
With the proposed design you are offering users the possibility of setting each attribute to any value, but you are doing so in a cumbersome, hard to use way.

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.
void unleashMonkeyFish()
{
MonkeyFish * monkey_fish = new MonkeyFish();
std::string localname = "Wanda";
monkey_fish->setName(localname);
monkey_fish->go();
}
In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.
Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:
std::string name;
std::string & name;
When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:
void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );
I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?
What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?
Basically, I want to avoid these scenarios:
I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
I don't want to copy the string if I can help it.
I would prefer not to new localname
What prototype and data member combination should I use?
CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometime several times in a single day.)
EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.
One way to do this is to have your string
std::string name;
As the data-member of your object. And then, in the unleashMonkeyFish function create a string like you did, and pass it by reference like you showed
void setName( const std::string & parameter_name ) {
name = parameter_name;
}
It will do what you want - creating one copy to copy the string into your data-member. It's not like it has to re-allocate a new buffer internally if you assign another string. Probably, assigning a new string just copies a few bytes. std::string has the capability to reserve bytes. So you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (i have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a c-string. They say they have a copy-on-write string, which would explain that behavior).
The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects - they manage their own stuff. Classes have a destructor that they use to free allocated resources once objects die, std::string has too. In my opinion, you should not worry about having that std::string local in the function. It will not do anything noticeable to your performance anyway most likely. Some std::string implementations (msvc++ afaik) have a small-buffer optimization: For up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.
Edit:
As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):
void setName(std::string parameter_name) {
name.swap(parameter_name);
}
The reason that this is better, is that now the caller knows that the argument is being copied. Return value optimization and similar optimizations can now be applied easily by the compiler. Consider this case, for example
obj.setName("Mr. " + things.getName());
If you had the setName take a reference, then the temporary created in the argument would be bound to that reference, and within setName it would be copied, and after it returns, the temporary would be destroyed - which was a throw-away product anyway. This is only suboptimal, because the temporary itself could have been used, instead of its copy. Having the parameter not a reference will make the caller see that the argument is being copied anyway, and make the optimizer's job much more easy - because it wouldn't have to inline the call to see that the argument is copied anyway.
For further explanation, read the excellent article BoostCon09/Rvalue-References
If you use the following method declaration:
void setName( const std::string & parameter_name );
then you would also use the member declaration:
std::string name;
and the assignment in the setName body:
name = parameter_name;
You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).
Finally, your std::string implementation probably uses reference counted strings anyway, so no copy of the actual string data is being made in the assignment. If you're that concerned about performance, you had better be intimately familiar with the STL implementation you are using.
Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.
Ok, so storing a reference to an object is perfectly legit, but obviously you must be aware of the scope of that object. Much easier to pass the string by reference, then copy to the class member variable. Unless the string is very large, or your performing this operation a lot (and I mean a lot, a lot) then there's really no need to worry.
Can you clarify exactly why you don't want to copy the string?
Edit
An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. Then get a new MonkeyName by requesting one from the pool (sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkayName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.
... hope that made some sense :)
This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in a way such that it lives at least as long as every pointer at it.
Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.
When the compiler sees ...
std::string localname = "Wanda";
... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.
You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.
As a simple rule of thumb store your data as a copy within a class, and pass and return data by (const) reference, use reference counting pointers wherever possible.
I'm not so concerned about copying a few 1000s bytes of string data, until such time that the profiler says it is a significant cost. OTOH I do care that the data structures that hold several 10s of MBs of data don't get copied.
In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:
class MonkeyFish {
public:
void setName( const std::string & parameter_name ) { name = parameter_name; }
private:
std::string name;
};
This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)
You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).
I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.
If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.
As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.
Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string as a reference pointer to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.