Coding practices in C++, what is your pick and why? - c++

I have a big object, say MyApplicationContext, which keeps information about MyApplication such as name, path, loginInformation, description, details and others.
//MyApplicationCtx
class MyApplicationCtx{
// ....
private:
std::string name;
std::string path;
std::string description;
struct loginInformation loginInfo;
int appVersion;
std::string appPresident;
//others
};
This is my method cloneApplication(), which sets up a new application. There are two ways to do it, as shown in Code 1 and Code 2. Which one should I prefer and why?
//Code 1
public void cloneApplication(MyApplicationCtx appObj){
setAppName(appObj);
setAppPath(appObj);
setAppAddress(&appObj); // Note this address is passed
setAppDescription(appObj);
setAppLoginInformation(appObj);
setAppVersion(appObj);
setAppPresident(appObj);
}
public void setAppLoginInformation(MyApplicationCtx appObj){
this->loginInfo = appObj.loginInfo; //assume it is correct
}
public void setAppAddress(MyApplicationCtx *appObj){
this->address = appObj->address;
}
.... // same way other setAppXXX(appObj) methods are called.
Q1. Does passing the big object appObj every time have a performance impact?
Q2. If I pass it using reference, what should be the impact on performance?
public void setAppLoginInformation(MyApplicationCtx &appObj){
this->loginInfo = appObj.loginInfo;
}
//Code 2
public void setUpApplication(MyApplicationCtx appObj){
std::string appName;
appName += appObj.getName();
appName += "myname";
setAppName(appName);
std::string appPath;
appPath += appObj.getPath();
appPath += "myname";
setAppPath(appPath);
std::string appaddress;
appaddress += appObj.getAppAddress();
appaddress += "myname";
setAppAddress(appaddress);
... same way setup the string for description and pass it to function
setAppDescription(appdescription);
struct loginInformation loginInfo = appObj.getLoginInfo();
setAppLoginInformation(loginInfo);
... similarly appVersion
setAppVersion(appVersion);
... similarly appPresident
setAppPresident(appPresident);
}
Q3. Compare Code 1 and Code 2: which one should I use? Personally I like Code 1.

You're better off defining a Copy Constructor and an Assignment Operator:
// Note the use of passing by const reference! This avoids the overhead of copying the object in the function call.
MyApplicationCtx(const MyApplicationCtx& other);
MyApplicationCtx& operator = (const MyApplicationCtx& other);
Better still, also define a private struct in your class that looks like:
struct AppInfo
{
std::string name;
std::string path;
std::string description;
struct loginInformation loginInfo;
int appVersion;
std::string appPresident;
};
In your App class' copy constructor and assignment operator you can take advantage of AppInfo's automatically generated assignment operator to do all of the assignment for you. This is assuming you only want a subset of MyApplicationCtx's members copied when you "clone".
This will also automatically be correct if you add or remove members of the AppInfo struct without having to go and change all of your boilerplate.
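A minimal sketch of that idea follows. The LoginInformation fields, the sessionId member (standing in for state that should not be cloned) and the accessor names are assumptions made for illustration; none of them come from the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the question's loginInformation struct.
struct LoginInformation {
    std::string user;
    std::string password;
};

class MyApplicationCtx {
public:
    MyApplicationCtx() = default;

    // Copy only the cloneable subset; sessionId keeps its default value.
    MyApplicationCtx(const MyApplicationCtx& other) : info(other.info) {}

    MyApplicationCtx& operator=(const MyApplicationCtx& other) {
        info = other.info;  // one memberwise assignment covers every AppInfo field
        return *this;
    }

    void setName(const std::string& n) { info.name = n; }
    const std::string& name() const { return info.name; }
    void setSessionId(int id) { sessionId = id; }
    int getSessionId() const { return sessionId; }

private:
    struct AppInfo {
        std::string name;
        std::string path;
        std::string description;
        LoginInformation loginInfo;
        int appVersion = 0;
        std::string appPresident;
    };

    AppInfo info;
    int sessionId = 0;  // hypothetical member that is deliberately not cloned
};

// Usage: the clone receives the AppInfo data but not the session id.
void cloneExample() {
    MyApplicationCtx original;
    original.setName("demo");
    original.setSessionId(7);

    MyApplicationCtx clone = original;
    assert(clone.name() == "demo");
    assert(clone.getSessionId() == 0);
}
```

Adding a field to AppInfo requires no change to the copy constructor or the assignment operator, which is the maintenance benefit described above.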

Short answer:
Q1: Given the size of your MyAppCtx class, yes, a significant performance hit will take place if the data is dealt with very frequently.
Q2: Minimal, you're passing a pointer.
Q3: Neither, for large objects like that you should use reference semantics and access the data through accessors. Don't worry about function call overhead, with optimizations turned on, the compiler can inline them if they meet various criteria (which I leave up to you to find out).
Long answer:
Given functions:
void FuncByValue(MyAppCtx ctx);
void FuncByRef1(MyAppCtx& ctx);
void FuncByRef2(MyAppCtx* ctx);
When passing large objects like your MyApplicationCtx, it's a good idea to use reference semantics (FuncByRef1 & FuncByRef2), passing by reference is identical in performance to passing a pointer, the difference is only the syntax. If you pass the object by value, the object is copy-constructed into the function, such that the argument you pass into FuncByValue is different from the parameter FuncByValue receives. This is where you have to be careful of pointers (if any) contained in an object that was passed by value, because the pointer will have been copied as well, so it's very possible that more than one object will point to one element in memory at a given time, which could lead to memory leaks, corruption, etc.
In general, for objects like your MyAppCtx, I would recommend passing by reference and using accessors as appropriate.
Note, the reason I differentiated between argument and parameter above is that there is a difference between a function argument and a function parameter, it is as follows:
Given (template T is used simply to demonstrate that the object type is irrelevant here):
template<typename T>
void MyFunc(T myTobject);
When calling MyFunc, you pass in an argument, eg:
int my_arg = 3;
MyFunc(my_arg);
And MyFunc receives a parameter, eg:
template<typename T>
void MyFunc(T myTobject)
{
T cloned_param = T(myTobject);
}
In other words, my_arg is an argument, myTobject is a parameter.
Another note, in the above examples, there are essentially three versions of my_arg in memory: the original argument, the copy-constructed parameter myTobject, plus cloned_param which was explicitly copied as well.
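To make the copy count visible, here is a small sketch using a hypothetical Counted type whose copy constructor increments a counter. The function names byValue and byConstRef are made up for this illustration.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

template <typename T>
void byValue(T param) {        // param is copy-constructed from the argument
    T cloned_param(param);     // a second, explicit copy
    (void)cloned_param;
}

template <typename T>
void byConstRef(const T& param) {  // param is another name for the argument
    (void)param;
}

// Number of copy constructions caused by one call of each flavour.
int copiesForByValue() {
    Counted::copies = 0;
    Counted arg;
    byValue(arg);
    return Counted::copies;
}

int copiesForByConstRef() {
    Counted::copies = 0;
    Counted arg;
    byConstRef(arg);
    return Counted::copies;
}

void copyCountExample() {
    assert(copiesForByValue() == 2);     // parameter + cloned_param
    assert(copiesForByConstRef() == 0);  // nothing is copied
}
```

The two copies in the by-value case, plus the original argument, are the three objects mentioned above.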

Luke beat me to telling you about copy constructors. To answer your other questions: passing a large object by value has a performance impact compared to passing by reference. Make it a const reference if the function won't change it, as in this case.

General:
Why do you need to copy application object? Isn't it better to use singleton for this (with completely disabled copying by the way)?
Q1:
Not only performance (yes, they will be copied) but memory too. In the std::string implementations I have seen, each string occupies at least two memory chunks, and the first is in any case significantly smaller than the minimal allocation size, so such objects can cause memory-efficiency problems if cloned extensively.
Q2:
Passing a reference is barely different (from a performance point of view) from passing a pointer, so this should in general be of constant complexity. It is much better. Don't forget to add the "const" modifier to block modifications.
Q3:
I don't actually like either, because both break encapsulation. I once read a good article by a Java programmer called something like "Why setters/getters are evil" (I found it again easily, and not much of it is specific to Java). It is a VERY useful article that can change your style for good.

Q1..
Passing by value is a heavy operation, since it creates a copy by invoking the copy constructor.
Q2.
Pass a reference to const; it will improve performance.

Q1 - yes, every time you pass an object a copy is done
Q2 - minimal since an object passed by reference is basically just a pointer
Q3 - It is generally not a good idea to have large monolithic objects, instead you should split your objects into smaller objects, this allows for better re-usability and makes the code easier to read.

The best practice for cloning an object where all the members are copyable, like "Code 1" appears to be doing, is to use the default copy constructor - you don't have to write any code at all. Just copy like this:
MyApplicationCtx new_app = old_app;
"Code 2" is doing something different to "Code 1", so choosing one over the other is a matter of what you want the code to do, not a matter of style.
Q1. Passing a large object by value will cause it to be copied, which will have an impact on performance.
Q2. Yes, passing a reference to a large structure is more efficient than passing a copy. The only way to tell how large the impact is is to measure it with a profiler.

There is one single point that has not been dealt with in any of the other answers (which focused on your explicit questions more than on the general approach). I agree with @luke that you should use what is idiomatic: copy constructors and assignment operators are there for a reason. But just for the sake of discussion on the first possibility you presented:
In the first block you propose small functions like:
public:
void setAppLoginInformation(MyApplicationCtx appObj){
this->loginInfo = appObj.loginInfo; //assume it is correct
}
Now, besides the fact that the parameter should be passed by const reference, there are some other design issues there. You are offering a public operation that promises to change the login information, but the argument you require from your user is a full blown application context.
If you want to provide a method for just setting one of the attributes, I would change the method signature to match the attribute type:
public:
void setAppLoginInformation( loginInformation const & li ); // struct is optional here
This offers the possibility of changing the loginInformation both with a full application context or just with some specific login information object you can build yourself.
If on the other hand you want to disallow changing particular attributes of your class and you want to allow setting the values only from another application context object, then you should use the assignment operator, and if you want to do it in terms of small setter functions (assuming that the compiler-provided assignment operator does not suffice), make them private.
With the proposed design you are offering users the possibility of setting each attribute to any value, but you are doing so in a cumbersome, hard to use way.
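A short sketch of the narrower setter might look like this. The fields of loginInformation and the getLoginInfo accessor are assumptions, since the question does not show them.

```cpp
#include <cassert>
#include <string>

// Hypothetical fields; the question does not show loginInformation's contents.
struct loginInformation {
    std::string user;
    std::string password;
};

class MyApplicationCtx {
public:
    // Public setter: asks only for the data it actually changes.
    void setAppLoginInformation(const loginInformation& li) { loginInfo = li; }

    // Cloning from another context reuses the same setter.
    void cloneApplication(const MyApplicationCtx& other) {
        setAppLoginInformation(other.loginInfo);
    }

    const loginInformation& getLoginInfo() const { return loginInfo; }

private:
    loginInformation loginInfo;
};

// Usage: the setter works with a stand-alone login object or a full context.
void setterExample() {
    MyApplicationCtx a;
    a.setAppLoginInformation(loginInformation{"alice", "secret"});

    MyApplicationCtx b;
    b.cloneApplication(a);
    assert(b.getLoginInfo().user == "alice");
}
```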

Related

function returning - unique_ptr VS passing result as parameter VS returning by value

In C++, what is the preferred/recommended way to create an object in a function/method and return it to be used outside the creation function's scope?
In most functional languages, option 3 (and sometimes even option 1) would be preferred, but what's the c++ way of best handling this?
Option 1 (return unique_ptr)
pros: function is pure and does not change input params
cons: is this an unnecessarily complicated solution?
std::unique_ptr<SomeClass> createSometing(){
auto s = std::make_unique<SomeClass>();
return s;
}
Option 2 (pass result as a reference parameter)
pros: simple and does not involve pointers
cons: input parameter is changed (makes function less pure and more unpredictable - the result reference param could be changed anywhere within the function and it could get hard/messy to track in larger functions).
void createSometing(SomeClass& result){
SomeClass s;
result = s;
}
Option 3 (return by value - involves copying)
pros: simple and clear
cons: involves copying an object - which could be expensive. But is this ok?
SomeClass createSometing(){
SomeClass s;
return s;
}
In modern C++, the rule is that the compiler is smarter than the programmer. Said differently, the programmer is expected to write code that is easy to read and maintain. And except when profiling has proven that there is an unacceptable bottleneck, low-level concerns should be left to the optimizing compiler.
For that reason, and unless profiling has proven that another way is required, I would first try option 3 and return a plain object. If the object is movable, moving it is generally not too expensive. Furthermore, most compilers are able to fully elide the copy/move operation if they can. If I remember correctly, copy elision is even required starting with C++17 for statements like this:
T foo = functionReturningT();
This is a loaded question, because the matter involves a decision to create the object on the heap vs not creating it on the heap. In C++, it’s ideal to have objects that can be passed around as values cheaply. std::string is a good example of that. It’s generally a premature pessimization to allocate std::string on the heap. On the other hand, the object you may be creating may be large and expensive to copy. In that case, putting it on the heap would be preferable. But that assumes that a copy would have to take place. By default, the copy is elided! But also: figure out if the type could be made cheaper to copy.
So there’s no “one way suits all”. In my experience, legacy code tends to overuse the heap.
In most cases, returning by value is preferable, since all mainstream compilers will have the function instantiate the object in the storage where it’ll reside, without moves nor copies.
Then, the object can be copy-constructed on the heap by the user of the function, if they so desire, and the compiler will get rid of that copy as well.
Micromanagement of this stuff, without looking at actual generated code, is typically a waste of time, since the code declares intent and not the implementation. Compilers these days literally produce code that has equivalent meaning, taking the C++ source’s semantics, but not necessarily using the source to dictate identical implementation at the machine level.
Thus, in most instances, returning by value is the sensible default, unless the type is borked and doesn’t support that. Unfortunately, some widely used types are in this camp, e.g. Qt’s QObject.
TL;DR: Given MyType myFactoryFunction();, the statement auto obj = std::make_unique<MyType>(myFactoryFunction()); will not copy nor move on modern compilers in the release build, if the type is designed well.
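The return-by-value case can be checked with a small experiment. The sketch below uses a hypothetical Tracked type that counts copy and move constructions. Returning a temporary, as here, is guaranteed to construct the object directly in the caller's storage from C++17 onwards; earlier standards allow it but do not require it. Returning a named local (`return s;`) may also be elided, but the standard only permits that and otherwise falls back to a move.

```cpp
#include <cassert>

// Hypothetical type that counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a temporary; in C++17 mode the result is built in place
// in the variable that receives it.
Tracked makeTracked() {
    return Tracked{};
}

void elisionExample() {
    Tracked t = makeTracked();
    (void)t;
    assert(Tracked::copies == 0);
    assert(Tracked::moves == 0);
}
```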
There isn't a single right answer and it depends on the situation and personal preference to some extent. Here are pros and cons of different approaches.
Just declare it
SomeClass foo(arg1, arg2);
Factory functions should be relatively uncommon and only needed if the code creating the object doesn't have all the necessary information to create it (or shouldn't, due to encapsulation reasons). Perhaps it's more common in other languages to have factory functions for everything, but instantiating objects directly should be the first pick.
Return by value
SomeClass createSomeClass();
The first question is whether you want the resulting object to live on the stack or the heap. The default for small objects is the stack, since it's more efficient as you skip the call to malloc(). With Return Value Optimization usually there's no copy.
Return by pointer
std::unique_ptr<SomeClass> createSomeClass();
or
SomeClass* createSomeClass();
Reasons you might pick this include being a large object that you want to be heap allocated; the object is created out of some data store and the caller won't own the memory; you want a nullable return type to signal errors.
Out parameter
bool createSomeClass(SomeClass&);
The main benefit of out parameters comes when you have multiple return values. For example, you might want to return true/false for whether the object creation succeeded (e.g. if your object doesn't have a valid "unset" state, like an integer). You might also have a factory function that returns multiple things, e.g.
void createUserAndToken(User& user, Token& token);
In summary, I'd say by default, go with return by value. Do you need to signal failure? Out parameter or pointer. Is it a large object that lives on the heap, or some other data structure and you're giving out a handle? Return by pointer. If you don't strictly need a factory function, just declare it.
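As a rough sketch of the two failure-signalling styles mentioned in the summary, consider a hypothetical Token type and two made-up factory functions, createToken and createTokenPtr. The rule that empty input means failure is an assumption for the example only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical value type used only for illustration.
struct Token {
    std::string value;
};

// Out-parameter style: the return value reports success, and the result
// is written through the reference only when creation succeeds.
bool createToken(const std::string& raw, Token& out) {
    if (raw.empty()) {
        return false;  // 'out' is left untouched on failure
    }
    out.value = raw;
    return true;
}

// Pointer style: a null unique_ptr signals failure.
std::unique_ptr<Token> createTokenPtr(const std::string& raw) {
    if (raw.empty()) {
        return nullptr;
    }
    std::unique_ptr<Token> token(new Token{raw});
    return token;
}

void failureSignalExample() {
    Token t;
    assert(!createToken("", t));
    assert(createToken("abc", t));
    assert(t.value == "abc");

    assert(createTokenPtr("") == nullptr);
    assert(createTokenPtr("abc")->value == "abc");
}
```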

Getter for large member variables w/o copying

I have a class containing large member variables. In my case, the large member variable is a container of many objects and it must be private as I don't want to allow a user to modify it directly
class Example {
public:
std::vector<BigObject> get_very_big_object() const { return very_big_object; }
private:
std::vector<BigObject> very_big_object;
};
I want a user to be able to view the object without making a copy:
Example e; // note: "Example e();" would declare a function, not an object
auto very_big_object = e.get_very_big_object(); // Uh oh, made a copy
std::cout << very_big_object[11]; // Look at any element in the vector etc
I'm a bit confused about the best way to do it. I thought about returning a constant reference, i.e., make my getter:
const std::vector<BigObject>& get_very_big_object() const { return very_big_object; }
I read this article that suggests it could be risky and that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics. But I found that a bit cryptic.
What's the modern best practice for doing this?
I read this article that suggests it could be risky and that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics.
On this point, the article is flat-out wrong. A smart pointer does not remove the "risk".
Quick summary of relevant parts of the article
If a class returns a const reference to a data member, client code may introduce a const_cast and thereby change the data member without going through the class' API.
The article proposes (incorrectly) that the above can be avoided by using a smart pointer. The setup is for the class to maintain a shared pointer to the data, and have the getter return that pointer cast to a shared pointer to const data.
Critique of the points
First of all, this does not work. All one has to do is de-reference the smart pointer to get a const reference to the data, which can then be const_cast as before. Using the author's own example, instead of
std::string &evil = const_cast<std::string&>(obj.someStr());
use
std::string &evil = const_cast<std::string&>(*obj.str_ptr());
to get the same data-changing results when returning a smart pointer. The entire article is not wrong, but it does get several points wrong. This is one of them.
Second of all, this is not your concern. When you return a const reference, you are telling client code that this value is not to be changed. If the client code does so anyway, it's the client code that broke the agreement. If the underlying object was itself declared const, the client code has even invoked undefined behavior, in which case anything can happen, including a crash.
What's the modern best practice for doing this?
Simply return a const reference. (Most rules have exceptions, but in my experience, this one seems to be on target 95-99.9% of the time.)
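A minimal sketch of the const-reference getter follows, assuming a trivial BigObject and a constructor that takes the element count (both invented for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BigObject {  // hypothetical payload
    int id = 0;
};

class Example {
public:
    explicit Example(std::size_t n) : very_big_object(n) {}

    // Read-only view of the member; the call itself makes no copy.
    const std::vector<BigObject>& get_very_big_object() const {
        return very_big_object;
    }

private:
    std::vector<BigObject> very_big_object;
};

void getterExample() {
    Example e(20);

    const auto& view = e.get_very_big_object();  // binds to the member, no copy
    auto copy = e.get_very_big_object();         // plain auto still copies

    assert(&view == &e.get_very_big_object());   // same object as the member
    assert(&copy != &view);                      // a separate vector
    assert(view.size() == 20);
}
```

Note that the caller has to write `const auto&` to avoid the copy, because plain `auto` drops the reference. The reference is only valid for as long as the Example object is alive.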
What I did when I was working on my BDD-library for school is to create a wrapper class called VeryBigObject, which contacts the singleton upon instantiation and hides a reference-counting pointer, from there you can override the operator->() method to allow for direct access to the class's methods.
So something like this
class VeryBigObject {
private:
std::vector<BigObject>* obj;
public:
VeryBigObject() {
// find a way to instantiate with a pointer, not by copying
}
VeryBigObject(const VeryBigObject& o) {
// Update reference counts
obj = o.obj;
}
std::vector<BigObject>* operator->() const { return obj; } // forwards member access to the wrapped vector
... // Do other overloads as you see fit to mask working with the pointer directly.
};
This allows you to create a small portable class that you don't have to worry about copying, but also has access to the larger object easily. You'll still need to worry about things like caching and such though

Is this the right way to return a struct in a parameter?

I made the following method in a C++/CLI project:
void GetSessionData(CDROM_TOC_SESSION_DATA& data)
{
auto state = CDROM_TOC_SESSION_DATA{};
// ...
data = state;
}
Then I use it like this in another method:
CDROM_TOC_SESSION_DATA data;
GetSessionData(data);
// do something with data
It does work, returned data is not garbage, however there's something I don't understand.
Question:
C++ is supposed to clean up state when it has exited its scope, so data is a copy of state, correct?
And in what exactly it is different from the following you see on many examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data); // signature should be GetSessionData(CDROM_TOC_SESSION_DATA *data)
Which one makes more sense to use or is the right way ?
Reference:
CDROM_TOC_SESSION_DATA
Using a reference vs a pointer for an out parameter is really more of a matter of style. Both function equally well, but some people feel that the explicit & when calling a function makes it more clear that the function may modify the parameter it was passed.
i.e.
doAThing(someObject);
// It's not clear that doAThing accepts a reference and
// therefore may modify someObject
vs
doAThing(&someObject);
// It's clear that doAThing accepts a pointer and it's
// therefore possible for it to modify someObject
Note that 99% of the time the correct way to return a class/struct type is to just return it. i.e.:
MyType getObject()
{
MyType object{};
// ...
return object;
}
Called as
auto obj = getObject();
In the specific case of CDROM_TOC_SESSION_DATA it likely makes sense to use an out parameter, since the class contains a flexible array member. That means that the parameter is almost certainly a reference/pointer to the beginning of some memory buffer that's larger than sizeof(CDROM_TOC_SESSION_DATA), and so must be handled in a somewhat peculiar way.
C++ is supposed to clean up state when it has exited its scope, so
data is a copy of state, correct?
In the first example, the statement
data = state
presumably copies the value of state into local variable data, which is a reference to the same object that is identified by data in the caller's scope (because those are the chosen names -- they don't have to match). I say "presumably" because in principle, an overloaded assignment operator could do something else entirely. In any library you would actually want to use, you can assume that the assignment operator does something sensible, but it may be important to know the details, so you should check.
The lifetimes of local variables data and state end when the method exits. They will be cleaned up at that point, and no attempt may be made to access them thereafter. None of that affects the caller's data object.
And in what exactly it is different from the following you see on many
examples:
CDROM_TOC_SESSION_DATA data;
GetSessionData(&data);
Not much. Here the caller passes a pointer instead of a reference. GetSessionData must be declared appropriately for that, and its implementation must explicitly dereference the pointer to access the caller's data object, but the general idea is the same for most intents and purposes. Pointer and reference are similar mechanisms for indirect access.
Which one makes more sense to use or is the right way ?
It depends. Passing a reference is generally a bit more idiomatic in C++, and it has the advantage that the method does not have to worry about receiving a null or invalid pointer. On the other hand, passing a pointer is necessary if the function has C linkage, or if you need to accommodate the possibility of receiving a null pointer.
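The two styles can be put side by side in a short sketch. SessionData is a hypothetical stand-in for CDROM_TOC_SESSION_DATA, and the field values are invented; only the parameter-passing mechanics are the point here.

```cpp
#include <cassert>

// Hypothetical stand-in for CDROM_TOC_SESSION_DATA.
struct SessionData {
    int firstSession = 0;
    int lastSession = 0;
};

// Reference version: the caller's object always exists.
void GetSessionData(SessionData& data) {
    SessionData state{};     // local; destroyed when the function returns
    state.firstSession = 1;
    state.lastSession = 3;
    data = state;            // copies the values into the caller's object
}

// Pointer version: must dereference, and has to cope with a null pointer.
bool GetSessionData(SessionData* data) {
    if (data == nullptr) {
        return false;
    }
    GetSessionData(*data);   // reuse the reference version
    return true;
}

void outParamExample() {
    SessionData a;
    GetSessionData(a);       // reference: no & at the call site
    assert(a.lastSession == 3);

    SessionData b;
    assert(GetSessionData(&b));  // pointer: & makes the mutation visible
    assert(b.firstSession == 1);

    assert(!GetSessionData(static_cast<SessionData*>(nullptr)));
}
```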

c++: Excessive copying of large objects

While there are quite a few questions about copy constructors/assignment operators on SO already, I did not find an answer that fit my problem.
I have a class like
class Foo
{
// ...
private:
std::vector<int> vec1;
std::vector<int> vec2;
boost::bimap<unsigned int, unsigned int> bimap;
// And a couple more
};
Now it seems that there is some quite excessive copying going on (based on profile data).. So my question is how to best tackle this?
Should I implement custom copy constructor/assignment operator and use swap? Or should I define my own swap method and use that (where appropriate) instead of assignment?
As I am not a c++ expert, examples that show how to properly handle this situation are greatly appreciated.
UPDATE: It appears I was not terribly clear.. Let me try to explain. The program is basically an on-the-fly breadth-first search program, and for each step taken I need to store metadata about the step (which is the Foo class).. Now the problem is that there are (usually) exponentially many steps, so you can imagine a large number of these objects need to be stored.. I do pass by (const) reference always as far as I know.. Each time I calculate a successor from a node in the graph I need to create and store ONE Foo object (however, some of the data members will be added to this one foo further on in the processing of this successor)..
My profile data shows roughly something like this (I don't have the actual numbers on this machine):
SearchStrategy::Search 13s
FooStore::Save 10s
So you can see I spend nearly as much time saving this meta data as I do searching through the graph.. Oh, and FooStore saves Foo in a google::sparse_hash_map<long long, Foo, boost::hash<long long> >.
Compiler is g++4.4 or g++4.5 (I'm not at my dev. machine, so I cannot check at the moment)..
UPDATE 2 I assign some of the members after construction to a Foo instance like
void SetVec1(const std::vector<int>& vec1) { this->vec1 = vec1; };
I guess tomorrow, I should change this to use the swap method, which should definitely improve this a bit..
I'm sorry if I'm not entirely clear about what semantics I'm trying to achieve, but the reason is that I am not quite sure.
Regards,
Morten
Everything depends on what copying this object means in your case :
1. it means copying its whole value
2. it means the copied object will refer to the same content
If it's 1, then this class seems correct. You're not very clear about the operations that you say make a lot of copies, so I'm assuming you are trying to copy the whole object.
If it's 2, then you need to use something like shared_ptr to share the containers between the objects. Just using shared_ptr instead of real objects as members will implicitly allow the buffers to be referred to by both objects (the copy and the copied).
That's the easier way (using boost::shared_ptr or std::shared_ptr if you have a C++0x enabled compiler providing it).
There are harder ways but they will certainly become a problem later.
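For the second meaning (the copy refers to the same content), a minimal sketch with std::shared_ptr could look like this. Only one of the question's members is shown, and the add, size and owners helpers are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch of option 2: copies of Foo refer to the same vector.
class Foo {
public:
    Foo() : vec1(std::make_shared<std::vector<int>>()) {}

    void add(int v) { vec1->push_back(v); }
    std::size_t size() const { return vec1->size(); }
    long owners() const { return vec1.use_count(); }

private:
    // Copying Foo copies only this handle, not the elements.
    std::shared_ptr<std::vector<int>> vec1;
};

void sharedExample() {
    Foo a;
    a.add(1);

    Foo b = a;   // cheap: bumps a reference count
    b.add(2);    // visible through 'a' as well

    assert(a.size() == 2);
    assert(a.owners() == 2);
}
```

The flip side is visible in the usage: a change made through one copy is seen by every other copy, so this only fits if shared content is really what copying should mean.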
Of course, and everyone says this, don't optimize prematurely. Don't bother with this until and unless you prove a) that your program goes too slowly, and b) it would go faster if you didn't copy so much data.
If your program design requires you to hold multiple simultaneous copies of the data, there is nothing you can do. You just have to bite the bullet and copy the data. No, implementing a custom copy constructor and custom assignment operator won't make it go faster.
If your program doesn't require multiple simultaneous copies of this data, then you do have a couple of tricks to reduce the number of copies you perform.
Instrument your copy methods
If it were me, the first thing I would do, even before trying to improve anything, is to count the number of times my copy methods were invoked.
class Foo {
public:
static int numberOfConstructors;
static int numberOfCopyConstructors;
static int numberOfAssignments;
Foo() { ++numberOfConstructors; ...; }
Foo(const Foo& f) : vec1(f.vec1), vec2(f.vec2), bimap(f.bimap) {
++numberOfCopyConstructors;
...;
}
Foo& operator=(const Foo& f) {
++numberOfAssignments;
...;
return *this;
}
};
Run your program with and without your improvements. Print out the value of those static members to see if your changes had any effect.
Avoid assignments in function calls by using references
If you pass objects of type Foo to functions, consider if you can do it by reference. If you don't change the passed copy, passing it by const reference is a no-brainer.
// WAS:
extern void SomeFunction(Foo f);
// EASY change -- if this compiles, you know that it is correct
extern void SomeFunction(const Foo& f);
// HARD change -- you have to examine your code to see if this is safe
extern void SomeFunction(Foo& f);
Avoid copies by using Foo::swap
If you use the copy methods (either explicitly or implicitly) a lot, consider whether the assigned-from item could give up its data, rather than copying it.
// Was:
vectorOfFoo.push_back(myFoo);
// maybe faster:
vectorOfFoo.push_back(Foo());
vectorOfFoo.back().swap(myFoo);
// Was:
newFoo = oldFoo;
// maybe faster
newFoo.swap(oldFoo);
Of course, this only works if myFoo and oldFoo no longer need access to their data. And you have to implement Foo::swap:
void Foo::swap(Foo& old) {
std::swap(this->vec1, old.vec1);
std::swap(this->vec2, old.vec2);
...
}
Whatever you do, measure your program before and after your change. Measure the number of times your copy methods are invoked, and the total time improvement in your program.
Your class doesn't seem that bad, but you do not show how you use it.
If there is a lot of copying, then you need to pass objects of that class by reference (or, if possible, const reference).
If that class has to be copied, then you can not do anything.
If it's really a problem, you might consider implementing the pimpl idiom. But I doubt it's a problem, though I'd have to see your use of the class to be sure.
Copying huge vectors is unlikely to be cheap. The most promising approach is to copy less often. While it's quite easy (maybe too easy) in C++ to invoke a copy unintentionally, there are ways to avoid needless copying:
passing by const and non-const reference
move-constructors
smart pointers with ownership transfer
These techniques may leave only copies which are required by algorithm.
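As an illustration of the move-constructor item, here is a sketch that hands a Foo over to a container without copying its vectors. It assumes a compiler with C++11 move semantics and uses a simplified Foo (no bimap) plus a hypothetical save function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Foo {
    std::vector<int> vec1;
    std::vector<int> vec2;
    // No user-declared copy operations, so the compiler also generates
    // a move constructor and move assignment.
};

// Stores 'foo' by moving from it; the caller's object is left valid but
// in an unspecified (typically empty) state.
void save(std::vector<Foo>& store, Foo&& foo) {
    store.push_back(std::move(foo));
}

void moveExample() {
    std::vector<Foo> store;

    Foo f;
    f.vec1.assign(1000, 42);
    const int* buffer = f.vec1.data();

    save(store, std::move(f));  // f must not be relied upon afterwards

    assert(store.back().vec1.size() == 1000);
    assert(store.back().vec1.data() == buffer);  // same buffer, nothing copied
}
```

This is the same trade as the swap trick: it only helps when the source object no longer needs its data.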
Sometimes it's possible to avoid even some of those copies. For example, if you need two objects where the second one is a reversed copy of the first, a wrapper object may be created which acts like the reversed one but, instead of storing an entire copy, holds only a reference.
The obvious way to reduce copying is to use something like a shared_ptr. With multithreading, however, this cure can be worse than the disease -- incrementing and decrementing reference counts needs to be done atomically, which can be quite expensive. If, however, you typically end up modifying the copies and need each copy to act unique (i.e., modifying a copy doesn't affect the original) you can end up with worse performance still, paying for the atomic increment/decrement for reference counting, and still doing lots of copies anyway.
There are a couple of obvious ways to avoid that. One is to move unique objects instead of copying at all -- this is great if you can make it work. Another is to use non-atomic reference counting most of the time, and do deep copies only when moving data between threads.
There is no one answer that's universal and really clean, though.

When is it not a good idea to pass by reference?

This is a memory allocation issue that I've never really understood.
void unleashMonkeyFish()
{
MonkeyFish * monkey_fish = new MonkeyFish();
std::string localname = "Wanda";
monkey_fish->setName(localname);
monkey_fish->go();
}
In the above code, I've created a MonkeyFish object on the heap, assigned it a name, and then unleashed it upon the world. Let's say that ownership of the allocated memory has been transferred to the MonkeyFish object itself - and only the MonkeyFish itself will decide when to die and delete itself.
Now, when I define the "name" data member inside the MonkeyFish class, I can choose one of the following:
std::string name;
std::string & name;
When I define the prototype for the setName() function inside the MonkeyFish class, I can choose one of the following:
void setName( const std::string & parameter_name );
void setName( const std::string parameter_name );
I want to be able to minimize string copies. In fact, I want to eliminate them entirely if I can. So, it seems like I should pass the parameter by reference...right?
What bugs me is that it seems that my localname variable is going to go out of scope once the unleashMonkeyFish() function completes. Does that mean I'm FORCED to pass the parameter by copy? Or can I pass it by reference and "get away with it" somehow?
Basically, I want to avoid these scenarios:
I don't want to set the MonkeyFish's name, only to have the memory for the localname string go away when the unleashMonkeyFish() function terminates. (This seems like it would be very bad.)
I don't want to copy the string if I can help it.
I would prefer not to new localname
What prototype and data member combination should I use?
CLARIFICATION: Several answers suggested using the static keyword to ensure that the memory is not automatically de-allocated when unleashMonkeyFish() ends. Since the ultimate goal of this application is to unleash N MonkeyFish (all of which must have unique names) this is not a viable option. (And yes, MonkeyFish - being fickle creatures - often change their names, sometime several times in a single day.)
EDIT: Greg Hewgil has pointed out that it is illegal to store the name variable as a reference, since it is not being set in the constructor. I'm leaving the mistake in the question as-is, since I think my mistake (and Greg's correction) might be useful to someone seeing this problem for the first time.
One way to do this is to make the string a plain data member of your object:
std::string name;
Then, in the unleashMonkeyFish function, create a string as you did and pass it by const reference, as you showed:
void setName( const std::string & parameter_name ) {
name = parameter_name;
}
This does what you want: exactly one copy is made, from the argument into your data member. Assigning another string does not necessarily reallocate the internal buffer; if the existing capacity is large enough, the assignment may just copy a few bytes. std::string can reserve capacity, so you can call "name.reserve(25);" in your constructor and it will likely not reallocate if you assign something smaller. (I have done tests, and it looks like GCC always reallocates if you assign from another std::string, but not if you assign from a C string. GCC's string is said to be copy-on-write, which would explain that behavior.)
The string you create in the unleashMonkeyFish function will automatically release its allocated resources. That's the key feature of those objects: they manage their own resources. Classes use a destructor to free allocated resources once objects die, and std::string has one too. In my opinion, you should not worry about having that std::string local in the function. It will most likely not do anything noticeable to your performance anyway. Some std::string implementations (MSVC++, as far as I know) have a small-buffer optimization: up to some small limit, they keep characters in an embedded buffer instead of allocating from the heap.
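To make the lifetime point concrete, here is a minimal sketch assuming the value-member layout described above. The getName() accessor, the returning variant of unleashMonkeyFish(), and the check function are illustrative additions, not part of the question. It only shows that the member keeps its own copy after the local string has been destroyed.

```cpp
#include <cassert>
#include <string>

class MonkeyFish {
public:
    // Copies the argument into the member; the caller's string is untouched.
    void setName(const std::string& parameter_name) { name = parameter_name; }

    // Hypothetical accessor, added only so the result can be inspected.
    const std::string& getName() const { return name; }

private:
    std::string name;  // owned by the object, independent of any caller
};

// Hypothetical variant of unleashMonkeyFish() that returns the object
// so that it can be examined after the local string is gone.
MonkeyFish* unleashMonkeyFish() {
    MonkeyFish* monkey_fish = new MonkeyFish();
    std::string localname = "Wanda";
    monkey_fish->setName(localname);
    return monkey_fish;  // localname is destroyed here; the member copy is not
}

// Illustrative check: the name is still intact after the function returned.
void checkNameSurvives() {
    MonkeyFish* fish = unleashMonkeyFish();
    assert(fish->getName() == "Wanda");
    delete fish;
}
```

The one copy happens inside setName(); nothing in the object refers back to localname afterwards.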
Edit:
As it turns out, there is a better way to do this for classes that have an efficient swap implementation (constant time):
void setName(std::string parameter_name) {
name.swap(parameter_name);
}
The reason this is better is that the caller now knows the argument is being copied. Return value optimization and similar optimizations can then be applied easily by the compiler. Consider this case, for example:
obj.setName("Mr. " + things.getName());
If setName took a reference, the temporary created in the argument would be bound to that reference, copied within setName, and destroyed after the call returns, even though it was a throw-away product anyway. This is suboptimal only because the temporary itself could have been used instead of its copy. Making the parameter a non-reference lets the caller see that the argument is being copied anyway, and makes the optimizer's job much easier, because it does not have to inline the call to see that the argument is copied.
For further explanation, read the excellent article BoostCon09/Rvalue-References
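For readers on C++11 or later, the same by-value idea is usually written with std::move instead of swap. The following is only a sketch of that variant, not the code from the answer above; getName() and demoSetName() are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

class MonkeyFish {
public:
    // By-value parameter: an lvalue argument is copied once into the
    // parameter, a temporary is moved (or constructed in place).
    // The parameter's buffer is then moved into the member.
    void setName(std::string parameter_name) { name = std::move(parameter_name); }

    // Hypothetical accessor, added only so the result can be inspected.
    const std::string& getName() const { return name; }

private:
    std::string name;
};

// Illustrative usage covering the three kinds of caller.
void demoSetName() {
    MonkeyFish fish;
    std::string localname = "Wanda";

    fish.setName(localname);             // one copy; localname stays valid
    assert(localname == "Wanda");

    fish.setName("Mr. " + localname);    // temporary: no extra deep copy needed
    assert(fish.getName() == "Mr. Wanda");

    fish.setName(std::move(localname));  // caller explicitly gives up its string
    assert(fish.getName() == "Wanda");
}
```

After the last call, localname is left in a valid but unspecified state, so the caller should not rely on its contents.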
If you use the following method declaration:
void setName( const std::string & parameter_name );
then you would also use the member declaration:
std::string name;
and the assignment in the setName body:
name = parameter_name;
You cannot declare the name member as a reference because you must initialise a reference member in the object constructor (which means you couldn't set it in setName).
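Finally, your std::string implementation may use reference-counted (copy-on-write) strings, in which case the assignment makes no copy of the actual string data. Do not rely on this, though: the C++11 requirements on std::string effectively rule out copy-on-write, so current standard libraries perform a real copy. If you're that concerned about performance, you had better be intimately familiar with the STL implementation you are using.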
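To illustrate why a reference member cannot simply be "set" later, here is a small sketch. NameWatcher and demoReferenceMember() are hypothetical names used only for this example. The reference has to be bound in the constructor, and a later assignment writes through it rather than rebinding it.

```cpp
#include <cassert>
#include <string>

// Hypothetical class, only to show how a reference member behaves.
class NameWatcher {
public:
    // A reference member must be bound in the constructor's initializer list.
    explicit NameWatcher(std::string& target) : name(target) {}

    // This does NOT rebind the reference; it assigns to the referred-to string.
    void setName(const std::string& parameter_name) { name = parameter_name; }

    const std::string& getName() const { return name; }

private:
    std::string& name;
};

void demoReferenceMember() {
    std::string original = "Wanda";
    NameWatcher watcher(original);

    watcher.setName("Gill");
    assert(original == "Gill");               // the caller's string was overwritten
    assert(&watcher.getName() == &original);  // still refers to the same object
}
```

If original went out of scope before watcher did, the member would dangle, which is exactly the hazard raised in the question.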
Just to clarify the terminology, you've created MonkeyFish from the heap (using new) and localname on the stack.
OK, so storing a reference to an object is perfectly legitimate, but obviously you must be aware of the scope of that object. It is much easier to pass the string by reference and then copy it into the class member variable. Unless the string is very large, or you're performing this operation a lot (and I mean a lot, a lot), there's really no need to worry.
Can you clarify exactly why you don't want to copy the string?
Edit
An alternative approach is to create a pool of MonkeyName objects. Each MonkeyName stores a pointer to a string. You then get a new MonkeyName by requesting one from the pool (which sets the name on the internal string *). Now pass that into the class by reference and perform a straight pointer swap. Of course, the MonkeyName object passed in is changed, but if it goes straight back into the pool, that won't make a difference. The only overhead is then the actual setting of the name when you get the MonkeyName from the pool.
... hope that made some sense :)
This is precisely the problem that reference counting is meant to solve. You could use the Boost shared_ptr<> to reference the string object in such a way that it lives at least as long as every pointer to it.
Personally I never trust it, though, preferring to be explicit about the allocation and lifespan of all my objects. litb's solution is preferable.
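A rough sketch of the reference-counting approach, written with std::shared_ptr (the C++11 standard counterpart of the Boost class mentioned above). The class layout, getName(), and demoSharedName() are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class MonkeyFish {
public:
    // Shares ownership of the string instead of copying its characters.
    void setName(std::shared_ptr<const std::string> parameter_name) {
        name = std::move(parameter_name);
    }

    // Hypothetical accessor; assumes setName() has been called first.
    const std::string& getName() const { return *name; }

private:
    std::shared_ptr<const std::string> name;
};

void demoSharedName() {
    MonkeyFish fish;
    {
        auto localname = std::make_shared<std::string>("Wanda");
        fish.setName(localname);
        assert(localname.use_count() == 2);  // the local pointer and the member
    }
    // The local pointer is gone, but the string lives on through the member.
    assert(fish.getName() == "Wanda");
}
```

No character data is copied here, at the price of one heap allocation for the shared string and the reference-count bookkeeping.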
When the compiler sees ...
std::string localname = "Wanda";
... it will (barring optimization magic) emit 0x57 0x61 0x6E 0x64 0x61 0x00 [Wanda with the null terminator] and store it somewhere in the static section of your code. Then it will invoke std::string(const char *) and pass it that address. Since the author of the constructor has no way of knowing the lifetime of the supplied const char *, s/he must make a copy. In MonkeyFish::setName(const std::string &), the compiler will see std::string::operator=(const std::string &), and, if your std::string is implemented with copy-on-write semantics, the compiler will emit code to increment the reference count but make no copy.
You will thus pay for one copy. Do you need even one? Do you know at compile time what the names of the MonkeyFish shall be? Do the MonkeyFish ever change their names to something that is not known at compile time? If all the possible names of MonkeyFish are known at compile time, you can avoid all the copying by using a static table of string literals, and implementing MonkeyFish's data member as a const char *.
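If the names really are all known at compile time, the static-table idea could look roughly like this. The table contents, the index-based setName(), and demoLiteralNames() are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical name table; the entries are invented for this example.
static const char* const kMonkeyFishNames[] = { "Wanda", "Gill", "Finn" };

class MonkeyFish {
public:
    MonkeyFish() : name(kMonkeyFishNames[0]) {}

    // Stores only a pointer. This is safe because string literals live for
    // the whole program. No bounds check is done in this sketch.
    void setName(std::size_t index) { name = kMonkeyFishNames[index]; }

    const char* getName() const { return name; }

private:
    const char* name;
};

void demoLiteralNames() {
    MonkeyFish fish;
    fish.setName(1);
    assert(std::strcmp(fish.getName(), "Gill") == 0);
    assert(fish.getName() == kMonkeyFishNames[1]);  // same pointer, nothing copied
}
```

Renaming a MonkeyFish is then a single pointer assignment, but only names that already exist in the table can be used.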
As a simple rule of thumb: store your data as a copy within a class, pass and return data by (const) reference, and use reference-counting pointers wherever possible.
I'm not so concerned about copying a few thousand bytes of string data until the profiler says it is a significant cost. On the other hand, I do care that data structures holding several tens of MBs of data don't get copied.
In your example code, yes, you are forced to copy the string at least once. The cleanest solution is defining your object like this:
class MonkeyFish {
public:
void setName( const std::string & parameter_name ) { name = parameter_name; }
private:
std::string name;
};
This will pass a reference to the local string, which is copied into a permanent string inside the object. Any solutions that involve zero copying are extremely fragile, because you would have to be careful that the string you pass stays alive until after the object is deleted. Better not go there unless it's absolutely necessary, and string copies aren't THAT expensive -- worry about that only when you have to. :-)
You could make the string in unleashMonkeyFish static but I don't think that really helps anything (and could be quite bad depending on how this is implemented).
I've moved "down" from higher-level languages (like C#, Java) and have hit this same issue recently. I assume that often the only choice is to copy the string.
If you use a temporary variable to assign the name (as in your sample code) you will eventually have to copy the string to your MonkeyFish object in order to avoid the temporary string object going end-of-scope on you.
As Andrew Flanagan mentioned, you can avoid the string copy by using a local static variable or a constant.
Assuming that that isn't an option, you can at least minimize the number of string copies to exactly one. Pass the string by const reference to setName(), and then perform the copy inside the setName() function itself. This way, you can be sure that the copy is being performed only once.