Constructors accepting string reference. Bad idea?

Is it considered a bad idea or bad design to have a class with a constructor accepting a reference, like the following?
class Compiler
{
public:
    Compiler( const std::string& fileName );
    ~Compiler();
    //etc
private:
    const std::string& m_CurrentFileName;
};
or should I use values?
I actually do care about performance.

If you used a value parameter in this case, the reference member would be bound to the constructor's own parameter, which is destroyed as soon as the constructor returns, leaving the class with a dangling reference.
The bad idea here is probably storing a reference as a member in the class. It is almost always simpler and more correct to store a value. And in that case, passing the constructor a const reference is the right thing to do.
And as for performance, you should only care about this where it matters, which you can only find out by profiling your code. You should always write your code firstly for correctness, secondly for clarity and lastly for performance.

It's fine as long as the constructor either just uses it without retaining it, copies it for further use (at which point, using a reference probably doesn't matter), or assumes ownership of it (which is iffy because you're depending on the user to behave correctly and not use the string reference further).
However, in most cases, the string copy probably won't be a bottleneck and should be preferred for bug avoidance reasons. If, later, you can PROVE that it's a bottleneck (using profiling, for instance), you might want to think about fixing it.

If you can guarantee that the string the reference is bound to won't go out of scope until the class does, then it is maybe OK to use (I wouldn't). If you were having issues, you may be better off passing the string around with a reference-counted smart pointer.
It may be worthwhile, and safer, to write your application so that the class constructor copies the string, and then, if you have performance issues, profile them. Most of the time it is not this sort of thing that causes the issues, but choices at the algorithm and data structure level.
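As a rough sketch of the reference-counted option mentioned above, assuming C++11's std::shared_ptr is available (the accessor name currentFileName is made up for illustration and is not part of the original class):

```cpp
#include <memory>
#include <string>
#include <utility>

class Compiler {
public:
    // The class shares ownership of the string, so the string stays alive
    // for as long as any Compiler (or any other owner) still holds it.
    explicit Compiler(std::shared_ptr<const std::string> fileName)
        : m_CurrentFileName(std::move(fileName)) {}

    // Hypothetical accessor, added only to make the sketch usable.
    const std::string& currentFileName() const { return *m_CurrentFileName; }

private:
    std::shared_ptr<const std::string> m_CurrentFileName;
};
```

The caller creates the string with std::make_shared and may drop its own handle afterwards; the Compiler keeps the string alive. Whether this is cheaper than a plain copy depends on the program, so it is still something to measure.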

While passing the parameter via a const reference is a nice thing (you should do that in most cases), storing it as a const reference is dangerous -- if the object passed ceases to exist, the reference dangles and you might get a segfault.
Also remember -- premature optimization is the root of all evil! If you have performance issues after writing working code, use a tool like gprof to find where the bottleneck is. And from experience I can tell that the bottleneck will almost always be in bad design, not in bad use of the language.

I agree with other people that you should be more concerned about correctness and robustness over performance (so the member variable should be a copy, not a reference) and that if you're really concerned about performance, you should profile your code.
That said, it's not always clear-cut that passing by const reference is faster. For example, if you pass by value instead and if the argument is an rvalue, the compiler can do copy elision (see http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/) and avoid an extra copy when you save it to the member variable. (This isn't very intuitive, and it's probably not something you'd want to do everywhere, so again: profile!)
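A minimal sketch of the pass-by-value idea, assuming C++11 move semantics are available (the linked article discusses the same idea in terms of copy elision; the accessor name currentFileName is hypothetical):

```cpp
#include <string>
#include <utility>

class Compiler {
public:
    // Take the string by value: an lvalue argument is copied once into
    // fileName, an rvalue argument is moved (or elided). Either way the
    // parameter is then moved into the member, so no second copy is made.
    explicit Compiler(std::string fileName)
        : m_CurrentFileName(std::move(fileName)) {}

    // Hypothetical accessor, added only for illustration.
    const std::string& currentFileName() const { return m_CurrentFileName; }

private:
    std::string m_CurrentFileName;  // the class owns its own value
};
```

The member is a value, so the lifetime problem of the original class does not arise. Whether this beats a const reference parameter in practice is, again, a question for the profiler.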

If you are writing a compiler, copying the filename once or twice will not be the bottleneck. This is more of a C++ style issue, which I leave to the more C++ savvy people around here.

Your class must be self-contained, and avoid unnecessary dependencies. In your example, your "Compiler" class will depend on the CurrentFileName string for its whole existence. Meaning that if CurrentFileName is destroyed before Compiler, you'll have a problem.
Dependents & Managers
So, I guess it depends on the nature of the dependency between the Dependent class and its Manager class (i.e. the Dependent class depends on the Manager class, or, in your example, the Compiler class depends on the std::string class)...
If the dependency is "soft", then the Dependent should make a copy of the Manager.
If the dependency is "strong", then the Dependent could have a reference to the Manager.
Soft vs. Strong
An example of "soft dependency" is when your Dependent needs only a value, not the exact object itself. A string is usually seen as a value.
An example of "strong dependency" is when your Dependent needs to have access to its Manager, or when the Dependent has no meaning without a Manager (i.e. if the Manager is destroyed, then all Dependents should have been destroyed before).
Conclusion
Usually, the dependency is soft.
If in doubt, consider it soft. You'll have fewer problems (this is one of the ways to get a pretty C++ segfault without pointer arithmetic), and you still have the possibility of optimizing it if needed.
About your case: Soft
Do yourself a favor, and make a copy of the value.
Premature optimization is the root of all evil, so unless your profiler says the copy of the string is a problem, make a copy, as below:
class Compiler
{
public:
    Compiler( const std::string& fileName ); // a reference
    ~Compiler();
    //etc
private:
    const std::string m_CurrentFileName; // a value
};
Compiler::Compiler(const std::string& fileName )
    : m_CurrentFileName(fileName) // copy of the filename
{
}
About my case: Strong
Now, you could be in a situation where the Dependent's existence has no meaning without the Manager itself.
I work currently on code where the user creates Notifier objects to subscribe to events, and retrieve them when needed. The Notifier object is attached to a Manager at construction, and cannot be detached from it.
It is a design choice to require that the library user keeps the Manager alive longer than the Notifier. This means the following code:
Manager manager ;
Notifier notifier(manager) ; // manager is passed as reference
The Notifier code is quite similar to the code you proposed:
class Notifier
{
public :
    Notifier(Manager & manager) : m_manager(manager) {}
private :
    Manager & m_manager ;
} ;
If you look closely at the design, the m_manager is used as an immutable pointer to the Manager object. I use the C++ reference to be sure:
The reference is defined at construction, and won't ever be changed
The reference is never supposed to be NULL or invalid
This is part of the contract.
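One way to see the "never changed" part of that contract is that the compiler enforces it: a class with a reference member loses its implicit copy assignment. The sketch below assumes C++11 for static_assert and <type_traits>; the empty Manager and the manager() accessor are stand-ins invented for the example.

```cpp
#include <type_traits>

class Manager {};  // stand-in for the real Manager class

class Notifier {
public:
    explicit Notifier(Manager& manager) : m_manager(manager) {}

    // Hypothetical accessor, only here so the binding can be observed.
    Manager& manager() const { return m_manager; }

private:
    Manager& m_manager;  // bound once at construction, cannot be reseated
};

// A reference member cannot be rebound, so the implicit copy assignment
// operator is deleted; copy construction still works and yields another
// Notifier attached to the same Manager.
static_assert(!std::is_copy_assignable<Notifier>::value,
              "Notifier cannot be re-attached by assignment");
static_assert(std::is_copy_constructible<Notifier>::value,
              "copies refer to the same Manager");
```

Nothing here checks that the Manager outlives the Notifier; that part of the contract remains the caller's responsibility.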

If you are highly concerned about performance, then passing by reference is the better approach.
Consider the following example to make the picture clearer:
class A{
public : A() {}
};
class B : public A{
public : B() {}
};
class MyClass{
    B bObj;
public:
    MyClass(B b) : bObj(b) { } // constructor and destructor overhead
    MyClass(B &b) { }
};

Related

Is using a reference as a return value considered as bad coding style?

I got a question regarding to these two possibilities of setting a value:
Let's say I got a string of a class which I want to change. I am using this:
void setfunc(std::string& st) { this->str = st; }
And another function that can do exactly the same thing, but by returning a string reference instead of taking a parameter and returning void:
std::string& reffunc() { return this->str; }
now if I am going to set a value I can use:
std::string text("mytext");
setfunc(text);
//or
reffunc() = text;
Now my question is whether using the second form of setting the value is considered bad style.
There is no performance difference.

The reason to have getters and setters in the first place is that the class can protect its invariants and is easier to modify.
If you have only setters and getters that return by value, your class has the following freedoms, without breaking API:
Change the internal representation. Maybe the string is stored in a different format that is more appropriate for internal operations. Maybe it isn't stored in the class itself.
Validate the incoming value. Does the string have a maximum or minimum length? A setter can enforce this.
Preserve invariants. Is there a second member of the class that needs to change if the string changes? The setter can perform the change. Maybe the string is a URL and the class caches some kind of information about it. The setter can clear the cache.
If you change the getter to return a const reference, as is sometimes done to save a copy, you lose some freedom of representation. You now need an actual object of the return type that you can reference which lives long enough. You need to add lifetime guarantees to the return value, e.g. promising that the reference is not invalidated until a non-const member is used. You can still return a reference to an object that is not a direct member of the class, but maybe a member of a member (for example, returning a reference to the first name part of an internal name struct), or a dereferenced pointer.
But if you return by non-const reference, almost all bets are off. Since the client can change the value referenced, you can no longer rely on a setter being called and code controlled by the class being executed when the value changes. You cannot constrain the value, and you cannot preserve invariants. Returning by non-const reference makes the class little different from a simple struct with public members.
Which leads us to that last option, simply making the data member public. Compared to a getter returning a non-const reference, the only thing you lose is that the object returned can no longer be an indirect member; it has to be a direct, real member.
On the other side of that equation is performance and code overhead. Every getter and setter is additional code to write, with additional opportunities for errors (ever copy-pasted a getter/setter pair for one member to create the same for another and then forgot to change one of the variables in there?). The compiler will probably inline the accessors, but if it doesn't, there's call overhead. If you return something like a string by value, it has to be copied, which is expensive.
It's a trade-off, but most of the time, it's better to write the accessors because it gives you better encapsulation and flexibility.
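To make the setter's freedoms concrete, here is a small sketch along the lines of the URL example above. The class name Page, its members, and the simplistic host extraction are all invented for illustration; real URL parsing is more involved.

```cpp
#include <stdexcept>
#include <string>

class Page {
public:
    // The setter is the single place where the URL can change.
    void setUrl(const std::string& url) {
        if (url.empty())
            throw std::invalid_argument("url must not be empty");  // validate
        m_url = url;
        m_cachedHost.clear();  // preserve the invariant: cache matches m_url
    }

    // A const reference saves a copy but gives the caller no write access.
    const std::string& url() const { return m_url; }

    // Computed on demand and cached until the next setUrl() call.
    const std::string& host() {
        if (m_cachedHost.empty() && !m_url.empty()) {
            const std::string::size_type scheme = m_url.find("://");
            const std::string::size_type begin =
                (scheme == std::string::npos) ? 0 : scheme + 3;
            const std::string::size_type end = m_url.find('/', begin);
            m_cachedHost = m_url.substr(
                begin,
                end == std::string::npos ? std::string::npos : end - begin);
        }
        return m_cachedHost;
    }

private:
    std::string m_url;
    std::string m_cachedHost;
};
```

If url() returned a non-const std::string& instead, a caller could write page.url() = "" and both the validation and the cache reset would be bypassed.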

We cannot see the definition of the member str.
If it's private, your reffunc() exposes a reference to a private member; you're writing a function to violate your design, so you should reconsider what you're doing.
Moreover, it's a reference, so you have to be sure that the object containing str still exists when you use that reference.
Moreover, you are showing outside implementation details, that could change in the future, changing the interface itself (if str becomes something different, setfunc()'s implementation could be adapted, reffunc()'s signature has to change).
It's not wrong what you wrote, but it could be used in the wrong way. You're reducing the encapsulation. It's a design choice.

It's fine. However, you have to watch out for these pitfalls:
the referenced object is modifiable. When you return a non-const reference, you expose data without protection against modifications. Obvious, but be aware of this anyway!
referenced objects can go out of scope. If the referenced object's lifetime ends, accessing the reference is undefined behavior. However, a const reference bound directly to a temporary extends that temporary's lifetime.

The way you used the reffunc() function is considered bad coding. But (as mentioned in the comments), generally speaking, returning references is not bad coding.
Here's why reffunc() = text; is considered bad coding:
People usually do not expect function calls on the left hand of an assignment, but on the right side. The natural expectation when seeing a function call is that it computes and returns a value (or rvalue, which is expected to be on the right hand side of assignment) and not a reference (or lvalue, which is expected to be on the left hand side of assignment).
So by putting a function call on the left hand side of the assignment, you are making your code more complicated, and therefore, less readable. Keeping in mind that you do not have any other motivations for it (as you say, performance is the same, and it usually is in these situations), good coding recommends that you use a "set" function.
Read the great book "Clean Code" for more issues on clean coding.
As for returning references in functions, which is the title of your question, it is not always bad coding and is sometimes required for having cleaner and briefer code. Specifically, many operator overloading features in C++ work properly only if you return a reference (see operator[] in std::vector and the assignment operator, which usually help the code become more readable and less complex; see the comments).
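For instance, a container-like class usually returns a reference from operator[] so that element assignment reads naturally. A minimal sketch (the class name Samples is made up for the example):

```cpp
#include <cstddef>
#include <vector>

class Samples {
public:
    explicit Samples(std::size_t count) : m_values(count, 0) {}

    // Non-const access returns a reference so that s[i] = x modifies the
    // stored element rather than a temporary copy.
    int& operator[](std::size_t i) { return m_values[i]; }

    // Const access returns a const reference: readable, not writable.
    const int& operator[](std::size_t i) const { return m_values[i]; }

    std::size_t size() const { return m_values.size(); }

private:
    std::vector<int> m_values;
};
```

Here a call on the left-hand side of an assignment, as in s[1] = 42, is exactly what readers expect, which is the difference from reffunc() = text.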

Should a C++ object always be in a valid state?

Whenever an object is constructed, should the constructor always leave it in an "initialised" state?
For example, if an Image class has two constructors where one takes a file path string, and the other takes no parameters, is it bad practice for the latter to leave the image object in an invalid state?
Assuming the class is written to handle both states.
I ask this because I find it almost necessary to have a default constructor in a lot of cases. Especially when an object is a member of a class and you want to initialise it IN the constructor of that class.
EDIT: I am aware of the member initialiser list. I have found a few situations where I would like to construct the object DURING the constructor of the class it is held in, not before. Although, I understand that this could potentially be more dangerous than any other alternative.

It all boils down to the definition of a "valid state": if the methods of your class handle the state when the path is empty, then the state with the empty path is a valid state, and is definitely acceptable.
This may not be optimal from the coding perspective, though, because potentially you might need to add multiple checks for the path to be valid. You can often manage complexity by implementing the State Pattern.
I find it almost necessary to have a default constructor in a lot of cases. Especially when an object is a member of a class and you want to initialise it IN the constructor of that class.
You do not need a default constructor in order to initialize an object in the constructor of the class of which it is a member, as long as you construct the dependent in the initialization list.

Your last line:
I ask this because I find it almost necessary to have a default constructor in a lot of cases. Especially when an object is a member of a class and you want to initialise it IN the constructor of that class.
Implies that you are not using member initializer lists. You do not need a default constructor in this case. Example:
#include <iostream>
#include <string>

class Member {
public:
    Member(std::string str) { std::cout << str << std::endl; }
};
class Foo {
public:
    Foo() : member_("Foo") {}
private:
    Member member_;
};
Additionally, your question title and body conflict and the terminology is a bit vague. When constructing, it is usually best to leave the object in a valid and usable state. Sometimes the second aspect (being usable) is less necessary, and many solutions require it. Further, in C++11, moving from an object must leave it in a valid state, but doesn't necessarily (and in many cases shouldn't) leave it in a usable state.
EDIT: To address your concern about doing work in your constructor, consider moving the work to either a static member of the Member class, or a private (static or non-static) function in the owning class:
class Member {
public:
    Member(std::string str) { std::cout << str << std::endl; }
};
class Foo {
public:
    Foo() : member_(CreateMember()) {}
private:
    // Non-static here, so it must not read members that are not yet initialized.
    Member CreateMember() {
        std::string str;
        std::cin >> str;
        return Member(str);
    }
    Member member_;
};
One danger of this approach, however, is that the initialization order can be important if you use a non-static member function to do the creation. A static function is much safer, but you may wish to pass some other pertinent member info. Remember that initialization is done in order of member declaration within the class, NOT in the order of the initializer list.

Yes, it should always be valid. However, it is usually not very well defined what makes the object valid. At the very least, the object should be usable in some way without crashing. This, however, does not mean that all operations can be performed on the object, but there should be at least one. In many cases, that's just assignment from another source (e.g. std container iterators) and destruction (this one is mandatory, even after a move). But the more operations the object supports in any kind of state, the less prone to error it will be.
It is usually a trade-off. If you can get away with objects only having states where all operations are valid, that's certainly great. However, those cases are rare, and if you have to jump through hoops to get there, it's usually easier to just add and document preconditions to some of its functionality. In some cases, you might even split the interface to differentiate between functions that make this trade-off and those that do not. A popular example of this is in std::vector, where you need to have enough elements as a precondition to using operator[]. On the other hand, the at() function will still work, but throw an exception.
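The std::vector split can be shown in a few lines. The helper name atRejects is invented for this sketch; at() and std::out_of_range are the standard library's own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(index) reports a bad index by throwing.
// operator[] has the precondition index < v.size() and performs no check,
// so it must only be used when the caller already knows the index is valid.
bool atRejects(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);  // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a three-element vector, atRejects(v, 2) is false and atRejects(v, 3) is true, whereas v[3] would simply be undefined behavior.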

First, let us define what exactly a "valid state" is: it's a state where the object can do its work.
For example, if we are writing a class that manages a file and lets us write and read the file, a valid state (following our definition) could be a state where the object is holding a correctly opened file and is ready to read from or write to it.
But consider another situation: what is the state of a moved-from rvalue?
File::File( File&& other )
{
    _file_handle = other._file_handle;
    other._file_handle = nullptr; // What is this state?
}
It's a state where the file object is not ready to write/read on a file, but is ready to be initialized. That is, it is a ready-to-initialize state.
Now consider an alternative implementation of the above ctor using the copy and swap idiom:
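File::File() :
    _file_handle{ nullptr }
{}
File::File( File&& other ) : File() // Set the object to a ready-to-initialize state
{
    using std::swap; // Enable ADL
    // Assumes a swap(File&, File&) overload exists; plain std::swap would call this constructor again
    swap( *this , other );
}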
Here we use the default ctor to put the object in a ready-to-initialize state, and just swap the passed rvalue with this object, resulting in exactly the same behaviour as the first implementation.
As we have seen above, a ready-to-work state, where the object is ready to do what it is supposed to do, is one thing, and a ready-to-initialize state is a completely different thing: a state where the object is not ready to work, but is ready to be initialized and set up to work.
My answer to your question is: an object is not always in a valid state (it's not always ready to be used), but if it's not ready to be used it should be ready to be initialized and then ready to work.

Normally, yes. I've seen a few good counterexamples but they are so rare.

Algorithmic initialization of instance variables in C++

I have been using Java for a very long time and I have problem getting used to C++ programming styles.
How we can manage scenarios like below:
Instance variables are objects which cannot be created using a default constructor. In Java, constructor parameters can be decided upon in the higher-level class constructor.
Instance variable is a reference type and we need to run a simple algorithm (condition, calculation,...) in the constructor and then create and assign an object to that reference.
There are possibly similar scenarios in which we need to initialize instance variables in places other than the constructor initialization list. I guess GCC would allow that (while issuing a warning), but VC++ does not seem to allow it.
I guess most of these can be done using pointers but I am trying to avoid pointers as much as I can (to minimize run-time crash and also hard to debug problems).

Instance variables are objects which cannot be created using a default constructor. In Java, constructor parameters can be decided upon in the higher-level class constructor.
class A {
public:
    A(int n);
};
class B {
public:
    B(int n) : a1(n), a2(n+1) {}
private:
    A a1, a2;
};
Instance variable is a reference type and we need to run a simple algorithm (condition, calculation,...) in the constructor and then create and assign an object to that reference.
static int n = 1;
static int m = 2;
class A {
public:
    A(bool useN) : ref(useN ? n : m) {}
private:
    int &ref;
};
You can hide more complicated computations in (static) helper functions, of course, having ref(f(parameters)) in the initializer list.
If you need to create an object first and then assign it to the reference, where does that object primarily live? A reference, after all, is just someone pointing at someone else saying “that's me, over there.” If your outer object is actually the one owning this object, you don't want a reference. You either want an object or a smart pointer.
A Java reference is probably closest to C++11's std::shared_ptr, one of the smart pointers of the standard library highly recommended for everyday use. In this kind of setting, you might also want to consider std::unique_ptr, which has a little less overhead, but comes with limitations. Whether the fact that it requires you to create a proper copy constructor is a problem is a matter of taste – pretty often, the compiler-generated copy constructor combined with shared_ptr is not the behavior you want, anyway.
Stay clear of std::auto_ptr, which is only in the language for backwards compatibility (it was deprecated in C++11 and removed in C++17) – it's tricky to use correctly in a lot of situations.
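Putting the pieces together, here is a sketch of a member whose concrete type is decided by a small algorithm in the constructor, using a static helper in the initializer list and std::unique_ptr for ownership. All class names (Shape, Circle, Square, Canvas) are invented for the example, and C++11 is assumed.

```cpp
#include <memory>
#include <string>

struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

class Canvas {
public:
    // The decision logic lives in a static helper, so the member can still
    // be initialized in the initializer list rather than assigned later.
    explicit Canvas(bool round) : m_shape(makeShape(round)) {}

    std::string shapeName() const { return m_shape->name(); }

private:
    static std::unique_ptr<Shape> makeShape(bool round) {
        if (round)
            return std::unique_ptr<Shape>(new Circle);
        return std::unique_ptr<Shape>(new Square);
    }

    std::unique_ptr<Shape> m_shape;  // Canvas owns the object it created
};
```

Because of the unique_ptr member, Canvas is movable but not copyable unless a copy constructor is written by hand, which is the limitation mentioned above.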

Should I use virtual 'Initialize()' functions to initialize an object of my class?

I'm currently having a discussion with my teacher about class design and we came to the point of Initialize() functions, which he heavily promotes. Example:
class Foo{
public:
    Foo()
    { // acquire light-weight resources only / default initialize
    }
    virtual void Initialize()
    { // do allocation, acquire heavy-weight resources, load data from disk
    }
    // optionally provide a Destroy() function
    // virtual void Destroy(){ /*...*/ }
};
Everything with optional parameters of course.
Now, he also puts emphasis on extendability and usage in class hierarchies (he's a game developer and his company sells a game engine), with the following arguments (taken verbatim, only translated):
Arguments against constructors:
can't be overridden by derived classes
can't call virtual functions
Arguments for Initialize() functions:
derived class can completely replace initialization code
derived class can do the base class initialization at any time during its own initialization
I have always been taught to do the real initialization directly in the constructor and to not provide such Initialize() functions. That said, I for sure don't have as much experience as he does when it comes to deploying a library / engine, so I thought I'd ask at good ol' SO.
So, what exactly are the arguments for and against such Initialize() functions? Does it depend on the environment where it should be used? If yes, please provide reasonings for library / engine developers or, if you can, even game developer in general.
Edit: I should have mentioned, that such classes will be used as member variables in other classes only, as anything else wouldn't make sense for them. Sorry.

For Initialize: exactly what your teacher says, but in well-designed code you'll probably never need it.
Against: non-standard, may defeat the purpose of a constructor if used spuriously. More importantly: client needs to remember to call Initialize. So, either instances will be in an inconsistent state upon construction, or they need lots of extra bookkeeping to prevent client code from calling anything else:
void Foo::im_a_method()
{
    if (!fully_initialized)
        throw Uninitialized("Foo::im_a_method called before Initialize");
    // do actual work
}
The only way to prevent this kind of code is to start using factory functions. So, if you use Initialize in every class, you'll need a factory for every hierarchy.
In other words: don't do this if it's not necessary; always check if the code can be redesigned in terms of standard constructs. And certainly don't add a public Destroy member, that's the destructor's task. Destructors can (and in inheritance situations, must) be virtual anyway.

I'm against 'double initialization' in C++ whatsoever.
Arguments against constructors:
can't be overridden by derived classes
can't call virtual functions
If you have to write such code, it means your design is wrong (e.g. MFC). Design your base class so all the necessary information that can be overridden is passed through the parameters of its constructor, so the derived class can override it like this:
Derived::Derived() : Base(GetSomeParameter())
{
}
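A slightly fuller sketch of that pattern follows. The resource-name parameter, the accessor, and the returned value are placeholders chosen for the example; only the shape of Derived's constructor comes from the snippet above.

```cpp
#include <string>

class Base {
public:
    // Everything a derived class may want to vary arrives as a parameter,
    // so Base is fully initialized by the time its constructor returns.
    explicit Base(const std::string& resourceName)
        : m_resourceName(resourceName) {}
    virtual ~Base() {}

    const std::string& resourceName() const { return m_resourceName; }

private:
    std::string m_resourceName;
};

class Derived : public Base {
public:
    Derived() : Base(GetSomeParameter()) {}

private:
    // Static, because no part of the Derived object exists yet when the
    // Base constructor argument is evaluated.
    static std::string GetSomeParameter() { return "derived.cfg"; }
};
```

The derived class customizes the base's initialization without a virtual call and without a separate Initialize() step.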

This is a terrible, terrible idea. Ask yourself- what's the point of the constructor if you just have to call Initialize() later? If the derived class wants to override the base class, then don't derive.
When the constructor finishes, it should make sense to use the object. If it doesn't, you've done it wrong.

One argument for preferring initialization in the constructor: it makes it easier to ensure that every object has a valid state. Using two-phase initialization, there's a window where the object is ill-formed.
One argument against using the constructor is that the only way of signalling a problem is through throwing an exception; there's no ability to return anything from a constructor.
Another plus for a separate initialization function is that it makes it easier to support multiple constructors with different parameter lists.
As with everything this is really a design decision that should be made with the specific requirements of the problem at hand, rather than making a blanket generalization.

A voice of dissension is in order here.
You might be working in an environment where you have no choice but to separate construction and initialization. Welcome to my world. Don't tell me to find a different environment; I have no choice. The preferred embodiment of the products I create is not in my hands.
Tell me how to initialize some aspects of object B with respect to object C, other aspects with respect to object A; some aspects of object C with respect to object B, other aspects with respect to object A. The next time around the situation may well be reversed. I won't even get into how to initialize object A. The apparently circular initialization dependencies can be resolved, but not by the constructors.
Similar concerns go for destruction versus shutdown. The object may need to live past shutdown, it may need to be reused for Monte Carlo purposes, and it might need to be restarted from a checkpoint dumped three months ago. Putting all of the deallocation code directly in the destructor is a very bad idea because it leaks.

Forget about the Initialize() function - that is the job of the constructor.
When an object is created, if the construction passed successfully (no exception thrown), the object should be fully initialized.

While I agree with the downsides of doing initialization exclusively in the constructor, I do think that those are actually signs of bad design.
A deriving class should not need to override base class initialization behaviour entirely. This is a design flaw which should be cured, rather than introducing Initialize()-functions as a workaround.

Not calling Initialize may be easy to do accidentally and won't give you a properly constructed object. It also doesn't follow the RAII principle since there are separate steps in constructing/destructing the object: What happens if Initialize fails (how do you deal with the invalid object)?
By forcing default initialization you may end up doing more work than doing initialization in the constructor proper.

Ignoring the RAII implications, which others have adequately covered, a virtual initialization method greatly complicates your design. You can't have any private data, because for the ability to override the initialization routine to be at all useful, the derived object needs access to it. So now the class's invariants are required to be maintained not only by the class, but by every class that inherits from it. Avoiding that sort of burden is part of the point behind inheritance in the first place, and the reason constructors work the way they do with regard to subobject creation.

Others have argued at length against the use of Initialize; I myself see one use: laziness.
For example:
File file("/tmp/xxx");
foo(file);
Now, if foo never uses file (after all), then it's completely unnecessary to try and read it (and would indeed be a waste of resources).
In this situation, I support Lazy Initialization, however it should not rely on the client calling the function, but rather each member function should check if it is necessary to initialize or not. In this example name() does not require it, but encoding() does.
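A sketch of what such a File class could look like. The members, the loadCount() counter, and the hard-coded "utf-8" result are placeholders for this example; a real class would open and inspect the file where the comment indicates.

```cpp
#include <string>

class File {
public:
    explicit File(const std::string& path)
        : m_path(path), m_loaded(false), m_loadCount(0) {}

    // Needs nothing from the disk, so it never triggers the load.
    const std::string& name() const { return m_path; }

    // Needs the file contents, so it initializes on first use.
    const std::string& encoding() {
        ensureLoaded();
        return m_encoding;
    }

    // Only here so the sketch can show how often the load ran.
    int loadCount() const { return m_loadCount; }

private:
    void ensureLoaded() {
        if (m_loaded)
            return;
        // Placeholder for the expensive part (opening and reading m_path).
        m_encoding = "utf-8";
        m_loaded = true;
        ++m_loadCount;
    }

    std::string m_path;
    std::string m_encoding;
    bool m_loaded;
    int m_loadCount;
};
```

The client never calls an initialization function; the class decides for itself when the work is needed and performs it at most once.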

Only use initialize function if you don't have the data available at point of creation.
For example, you're dynamically building a model of data, and the data that determines the object hierarchy must be consumed before the data that describes object parameters.

If you use it, then you should make the constructor private and use factory methods instead that call the initialize() method for you. For example:
class MyClass
{
public:
    static std::unique_ptr<MyClass> Create()
    {
        std::unique_ptr<MyClass> result(new MyClass);
        result->initialize();
        return result;
    }
private:
    MyClass();
    void initialize();
};
That said, initializer methods are not very elegant, but they can be useful for the exact reasons your teacher said. I would not consider them 'wrong' per se. If your design is good then you probably will never need them. However, real-life code sometimes forces you to make compromises.

Some members simply must have values at construction (e.g. references, const values, objects designed for RAII without default constructors)... they can't be constructed in the initialise() function, and some can't be reassigned then.
So, in general it's not a choice of constructor vs. initialise(), it's a question of whether you'll end up having code split between the two.
Of bases and members that could be initialised later, for the derived class to do it implies they're not private; if you go so far as to make bases/members non-private for the sake of delaying initialisation you break encapsulation - one of the core principles of OOP. Breaking encapsulation prevents base class developer(s) from reasoning about the invariants the class should protect; they can't develop their code without risking breaking derived classes - which they might not have visibility into.
Other times it's possible but sometimes inefficient if you must default construct a base or member with a value you'll never use, then assign it a different value soon after. The optimiser may help - particularly if both functions are inlined and called in quick succession - but may not.
[constructors] can't be overridden by derived classes
...so you can actually rely on them doing what the base class needs...
[constructors] can't call virtual functions
The CRTP allows derived classes to inject functionality - that's typically a better option than a separate initialise() routine, being faster.
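As a rough illustration of the CRTP point, the base below asks the derived type for a value during construction through a static call, with no virtual dispatch involved. Widget, Button, and defaultLabel are names invented for this sketch.

```cpp
#include <string>

// The base is a template parameterized on the derived type (CRTP).
template <typename DerivedT>
class Widget {
public:
    // The derived type supplies its customization at compile time. Only
    // static members of DerivedT are used here, because the derived part
    // of the object has not been constructed yet.
    Widget() : m_label(DerivedT::defaultLabel()) {}

    const std::string& label() const { return m_label; }

private:
    std::string m_label;
};

class Button : public Widget<Button> {
public:
    static std::string defaultLabel() { return "button"; }
};
```

The base keeps its member private and is complete once its constructor returns, yet each derived class still gets to inject its own behaviour.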
Arguments for Initialize() functions:
derived class can completely replace initialization code
I'd say that's an argument against, as above.
derived class can do the base class initialization at any time during its own initialization
That's flexible but risky - if the base class isn't initialised the derived class could easily end up (due to oversight during the evolution of the code) calling something that relies on that base being initialised and consequently fails at run time.
More generally, there's the question of reliable invocation, usage and error handling. With initialise, client code has to remember to call it with failures evident at runtime not compile time. Issues may be reported using return types instead of exceptions or state, which can sometimes be better.
If initialise() needs to be called to set say a pointer to nullptr or a value safe for the destructor to delete, but some other data member or code throws first, all hell breaks loose.
initialise() also forces the entire class to be non-const in the client code, even if the client just wants to create an initial state and ensure it won't be further modified - basically you've thrown const-correctness out the window.
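A small sketch of the const-correctness point, using hypothetical classes Config (constructor only) and LateConfig (two-phase); demo_paths() exists only to exercise them.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that is fully initialised by its constructor.
class Config {
public:
    explicit Config(std::string path) : path_(std::move(path)) {}
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

// Hypothetical two-phase class: initialise() must be non-const because it
// modifies the object, so it cannot be called on a const instance.
class LateConfig {
public:
    void initialise(const std::string& path) { path_ = path; }
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

inline std::string demo_paths() {
    const Config fixed("a.cfg");  // const from the moment it exists

    LateConfig late;              // has to stay non-const for initialise()
    late.initialise("b.cfg");
    // const LateConfig bad;
    // bad.initialise("b.cfg");   // would not compile: initialise() is non-const

    return fixed.path() + "," + late.path();
}
```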
Code doing things like p_x = new X(values, for, initialisation);, f(X(values, for, initialisation)), v.push_back(X(values, for, initialisation)) won't be possible - forcing verbose and clumsy alternatives.
If a destroy() function is also used, many of the above problems are exacerbated.

Returning a const reference to an object instead of a copy

Whilst refactoring some code I came across some getter methods that return a std::string. Something like this for example:
class foo
{
private:
std::string name_;
public:
std::string name()
{
return name_;
}
};
Surely the getter would be better returning a const std::string&? The current method is returning a copy which isn't as efficient. Would returning a const reference instead cause any problems?

The only way this can cause a problem is if the caller stores the reference, rather than copy the string, and tries to use it after the object is destroyed. Like this:
foo *pFoo = new foo;
const std::string &myName = pFoo->name();
delete pFoo;
std::cout << myName; // error! dangling reference
However, since your existing function returns a copy, you would not break any of the existing code.
Edit: Modern C++ (i.e. C++11 and up) has move semantics in addition to Return Value Optimization, so returning things by value is no longer frowned upon. Note that a getter returning a data member by value still copies that member. One should still be mindful of returning extremely large objects by value, but in most cases it should be OK.

Actually, another issue specifically with returning a string not by reference, is the fact that std::string provides access via pointer to an internal const char* via the c_str() method. This has caused me many hours of debugging headache. For instance, let's say I want to get the name from foo, and pass it to JNI to be used to construct a jstring to pass into Java later on, and that name() is returning a copy and not a reference. I might write something like this:
foo myFoo = getFoo(); // Get the foo from somewhere.
const char* fooCName = myFoo.name().c_str(); // Whoops! myFoo.name() creates a temporary that's destroyed at the end of this statement!
jniEnv->NewStringUTF(fooCName); // No good: fooCName dangles because the temporary has already been destroyed.
If your caller is going to be doing this kind of thing, it might be better to use some type of smart pointer, or a const reference, or at the very least have a nasty warning comment header over your foo.name() method. I mention JNI because former Java coders might be particularly vulnerable to this type of method chaining that may seem otherwise harmless.
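One way to avoid the dangling pointer while still returning by value is to keep the returned copy in a named variable. The sketch below assumes a simplified foo and uses a hypothetical c_length() as a stand-in for the C API call (it is not JNI).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the class in the question; name() returns a copy.
class foo {
public:
    explicit foo(std::string name) : name_(std::move(name)) {}
    std::string name() const { return name_; }
private:
    std::string name_;
};

// Hypothetical consumer of a C string, standing in for a call that takes const char*.
inline std::size_t c_length(const char* s) { return std::strlen(s); }

inline std::size_t demo_length() {
    foo myFoo("compiler");

    // Keep the returned copy in a named variable so its buffer outlives the pointer.
    const std::string fooName = myFoo.name();
    const char* fooCName = fooName.c_str();  // valid for as long as fooName is alive

    return c_length(fooCName);
}
```

Passing the temporary straight into the call, as in c_length(myFoo.name().c_str()), is also fine, because the temporary lives until the end of that full expression; the bug only appears when the pointer is stored and used in a later statement.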

One problem for the const reference return would be if the user coded something like:
const std::string & str = myObject.getSomeString() ;
With a std::string return, the temporary object would remain alive and attached to str until str goes out of scope.
But what happens with a const std::string &? My guess is that we would have a const reference to an object that could die when its parent object deallocates it:
MyObject * myObject = new MyObject("My String") ;
const std::string & str = myObject->getSomeString() ;
delete myObject ;
// Use str... which references a destroyed object.
So my preference goes to the const reference return (because, anyway, I'm just more comfortable with sending a reference than hoping the compiler will optimize the extra temporary), as long as the following contract is respected: "if you want it beyond my object's existence, then copy it before my object's destruction"
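The two cases above can be put side by side in a short sketch. It assumes C++14 for std::make_unique, and copyOfString() is a hypothetical by-value getter added only for comparison with getSomeString().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class offering both styles of getter.
class MyObject {
public:
    explicit MyObject(std::string s) : text_(std::move(s)) {}
    std::string copyOfString() const { return text_; }          // returns a temporary
    const std::string& getSomeString() const { return text_; }  // refers to the member
private:
    std::string text_;
};

inline std::string demo_lifetimes() {
    auto myObject = std::make_unique<MyObject>("My String");

    // Binding a const reference to a returned temporary extends the
    // temporary's lifetime to that of the reference.
    const std::string& extended = myObject->copyOfString();

    // Honouring the contract: copy before the owning object is destroyed.
    const std::string kept = myObject->getSomeString();

    myObject.reset();  // destroys the MyObject and its member string

    return extended + "|" + kept;  // both are still valid here
}
```

Had kept been declared as const std::string& instead, it would refer to the destroyed member after reset(), which is exactly the dangling case described above.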

Some implementations of std::string share memory with copy-on-write semantics, so return-by-value can be almost as efficient as return-by-reference and you don't have to worry about the lifetime issues (the runtime does it for you).
If you're worried about performance, then benchmark it (<= can't stress that enough) !!! Try both approaches and measure the gain (or lack thereof). If one is better and you really care, then use it. If not, then prefer by-value for the protection it offers against the lifetime issues mentioned by other people.
You know what they say about making assumptions...

Okay, so the differences between returning a copy and returning the reference are:
Performance: Returning the reference may or may not be faster; it depends on how std::string is implemented by your compiler implementation (as others have pointed out). But even if you return the reference the assignment after the function call usually involves a copy, as in std::string name = obj.name();
Safety: Returning the reference may or may not cause problems (dangling reference). If the users of your function don't know what they are doing, storing the reference as reference and using it after the providing object goes out of scope then there's a problem.
If you want it fast and safe use boost::shared_ptr. Your object can internally store the string as a shared_ptr and return a shared_ptr. That way, there will be no copying of the object going on and it's always safe (unless your users pull out the raw pointer with get() and do stuff with it after your object goes out of scope).
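A rough sketch of the shared pointer idea, using std::shared_ptr (the C++11 standard counterpart of boost::shared_ptr) as an assumption; Person and demo_shared_name() are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class that stores its string behind a shared pointer.
class Person {
public:
    explicit Person(std::string name)
        : name_(std::make_shared<const std::string>(std::move(name))) {}

    // Copies the pointer (and bumps the reference count), not the string.
    std::shared_ptr<const std::string> name() const { return name_; }

private:
    std::shared_ptr<const std::string> name_;
};

inline std::string demo_shared_name() {
    std::shared_ptr<const std::string> held;
    {
        Person p("Ada");
        held = p.name();
    }  // p is destroyed here; the string survives because held still owns it
    return *held;
}
```

The trade-off is that every call to name() updates a reference count, so whether this is actually faster than copying a short string is something to measure rather than assume.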

I'd change it to return const std::string&. The caller will probably make a copy of the result anyway if you don't change all the calling code, but it won't introduce any problems.
One potential wrinkle arises if you have multiple threads calling name(). If you return a reference, but then later change the underlying value, then the caller's value will change. But the existing code doesn't look thread-safe anyway.
Take a look at Dima's answer for a related potential-but-unlikely problem.
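For reference, here is a sketch of the changed getter together with the aliasing wrinkle mentioned above, shown single-threaded for simplicity. The constructor, rename() and demo_reference_tracks_changes() are hypothetical additions to the foo from the question.

```cpp
#include <cassert>
#include <string>
#include <utility>

class foo {
public:
    explicit foo(std::string name) : name_(std::move(name)) {}

    // Returns a reference to the member instead of a copy.
    const std::string& name() const { return name_; }

    // Hypothetical mutator, added to show how a held reference behaves.
    void rename(const std::string& n) { name_ = n; }

private:
    std::string name_;
};

inline std::string demo_reference_tracks_changes() {
    foo f("old");

    std::string copy = f.name();        // existing call sites keep making a copy
    const std::string& ref = f.name();  // no copy: this aliases the member

    f.rename("new");

    return copy + "," + ref;  // the copy is unchanged, the reference sees the update
}
```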

It is conceivable that you could break something if the caller really wanted a copy, because they were about to alter the original and wanted to preserve a copy of it. However it is far more likely that it should, indeed, just be returning a const reference.
The easiest thing to do is try it and then test it to see if it still works, provided that you have some sort of test you can run. If not, I'd focus on writing the test first, before continuing with refactoring.

Odds are pretty good that typical usage of that function won't break if you change to a const reference.
If all of the code calling that function is under your control, just make the change and see if the compiler complains.

Does it matter? As soon as you use a modern optimizing compiler, functions that return by value will not involve a copy unless they are semantically required to.
See the C++ lite FAQ on this.

Depends what you need to do. Maybe you want to allow the caller to change the returned value without changing the class. If you return the const reference that won't fly.
Of course, the next argument is that the caller could then make their own copy. But if you know how the function will be used and know that happens anyway, then maybe doing this saves you a step later in code.

I normally return const& unless I can't. QBziZ gives an example of where this is the case. Of course QBziZ also claims that std::string has copy-on-write semantics, which is rarely true today since COW involves a lot of overhead in a multi-threaded environment. By returning const& you put the onus on the caller to do the right thing with the string on their end. But since you are dealing with code that is already in use you probably shouldn't change it unless profiling shows that the copying of this string is causing massive performance problems. Then if you decide to change it you will need to test thoroughly to make sure you didn't break anything. Hopefully the other developers you work with don't do sketchy stuff like in Dima's answer.

Returning a reference to a member exposes the implementation of the class.
That could prevent you from changing the class later. It may be useful for private or protected methods in case the optimization is needed.
What should a C++ getter return

Constructors accepting string reference. Bad idea? - c++

If you are writing a compiler, copying the filename once or twice will not be the bottleneck. This is more of a C++ style issue, which I leave to the more C++ savvy people around here.
