This is probably a simple question, but this came across my mind. It is regarding the difference between the two functions below:
T func_one(T obj) { //for the purpose of this question,
return obj + obj; //T is a large object and has an overloaded '+' operator
}
T func_two(T obj) {
T output = obj + obj;
return output;
}
In func_one(), rather than creating an object T, assigning it a value and then returning the object, I just return the value itself without creating a new object. If T was a large object, would func_one() be more efficient than func_two() or does func_one() make an object T anyways when returning the sum of the two objects?
The compiler would optimize away fund_two into something similar to func_one which would then be optimized to something else, long story short, you need not to worry about this, unless you really do need to worry about this, then in that case you can look at the asm output.
Short answer: We can't know
Long answer: it depends highly on how T works and your compilers support for return value optimization.
Any function which returns by value can have RVO or NRVO optimization applied to it.
This means that it will construct the return value directly into the calling function, eliminating the copy constructor. As this is the problem with returning large objects by value, this will mean a substantial gain in performance.
The difference between func_one and func_two is that func_one returns an anonymous temporary value, an r-value; this means RVO can trivially be used. func_two returns a named value, an l-value, so NRVO, a much harder optimization, will be used. However, func_two is trivial, so it will almost certainly have NRVO applied, and both functions will be basically identical.
This is assuming you have a modern or even semi-modern compiler; if not, it will depend highly on how you implemented T.
If T has move semantics, your compiler will instead be able to move rather than copy. This should apply to both functions, as temporaries exist in both; however, as func_two returns a named value, it may not be capable of using move semantics. It's up to the compiler, and if the compiler isn't doing RVO or NRVO, I doubt it's doing move.
Finally, it depends on how + operator and = operator are implemented. If, for example, they were implemented as expression templates, then fun_two still requires an assignment, which will slow it down, where as func_one will simply return a highly optimized temporary.
In Summary
In almost all practical contexts, these are identical. In the vanishingly small window where your compiler is acting very strange, func_one is almost universally faster.
Modern compilers can transform the version with the extra variable to the one without (named return value optimization, this is quite a frequent source of questions here on SO, Why isn't the copy-constructor called when returning LOCAL variable for example). So this is not the overhead you should worry about.
The overhead you should worry about, is the function call overhead. An addition takes a modern CPU at most a single cycle. A function call takes between 10 and 20 cycles, depending on the amount of arguments.
I am a bit unsure what you mean with T in your question (is it a template parameter? is it a class? is it a placeholder for a type that you didn't want to disclose in your question?). However, the question whether you have a function call overhead problem depends on that type. And it depends on whether your compiler can inline your function.
Obviously, if it's inlined, you're fine, there's no function call overhead.
If T is a complex type with an expensive operator+() overload, then you are fine as well.
However, if T is int, for instance, and your function is not inlined, then you have roughly 90% overhead in your function.
Related
When returning a container, I always had to determine if I should use return value or use output parameter. If the performance matter, I chose the second option, otherwise I always chose the first option because it is more intuitive.
Frankly, I personally have been strongly objected about output parameters, possibly because of my mathematical background, but it was kind of OK to use them when I had no other options.
However, the things have been completely changed when it comes to generic programming. There are situations I encountered where a function may not know whether or not the object it returns is a huge container or just a simple value.
Consistently using output parameters may be the solution that I want to know if I can avoid. It is just so awkward if I have to do
int a;
f(a, other_arguments);
compared to
auto a = f(other_arguments);
Furthermore, sometimes the return type of f() has no default constructor. If output parameters are used, there is no graceful way to deal with that case.
I wonder if it is possible to return a "modifier object", a functor taking output parameters to modify them appropriately. (Perhaps this is a kind of lazy evaluation?) Well, returning such objects is not a problem, but the problem is I can't insert an appropriate overload of the assignment operator (or constructor) that takes such an object and triggers it to do its job, when the return type belongs to a library that I can't touch, e.g., std::vector. Of course, conversion operators are not helpful as they have no access to existing resources prepared for the target object.
Some people might ask why not use assign(); define a "generator object" which has begin() & end(), and pass those iterators to std::vector::assign. It is not a solution. For the first reason, the "generator object" does not have the full access to the target object and this may limit what could be done. For the second and more important reason, the call site of my function f() may also be a template which does not know the exact return type of f(), so it cannot determine which of the assignment operator or the assign() member function should be used.
I think that the "modifier object" approach to modify containers should have been already discussed in the past as it is not at all a new idea.
To sum up,
Is it possible to use return values to simulate what would happen when output parameters are used instead, in particular, when outputs are containers?
If not, was adding those supports to the standard discussed before? If it was, what were the issues? Is it a terrible idea?
Edit
The code example I've put above is misleading. The function f() may be used to initialize a local variable, but it may be also used to modify existing variables defined elsewhere. For the first case, as Rakete1111 mentioned, there is no problem with return by value as copy elision comes into play. But for the second case, there may be unnecessary resource releasing/acquiring.
I don't think your "modifier object" was ever proposed (AFAIK). And it will never go into the standard. Why? Because we already have a way to get rid of the expensive copy, and that is return by value (plus compiler optimizations).
Before C++17, compilers were allowed to do basically almost the same thing. This optimization is known as (N)RVO, which optimizes away the copy when returning a (named) temporary from a function.
auto a = f(other_arguments);
Will not return a temporary, then copy it into a. The compiler will optimize the copy away entirely, it is not needed. In theory, you cannot assume that your compiler supports this, but the three major ones do (clang, gcc, MSVC) so no need to worry - I don't know about ICC and the others, so I can't say.
So, as there is no copy (or move) involved, there is no performance penalty of using return values instead of output parameters (most probably, if for some reason your compiler doesn't support it, you'll get a move most of the time). You should always use return parameters if possible, and only use output parameters or some other technique if you measure that you get significantly better performance otherwise.
(Edited, based on comments)
You are right you should avoid output parameters if possible, because the code using them is harder to read and to debug.
Since C++11 we have a feature called move constructors (see reference). You can use std::move on all primitive and STL containers types. It is efficient (question: Efficiency difference between copy and move constructor), because you don't actually copy the values of the variables. Only the pointers are swapped. For your own complex types, you can write your own move constructor.
On the other hand, the only thing you can do is to return a reference to a temporary, which is of undefined behavior, for example:
#include <iostream>
int &&add(int initial, int howMany) {
return std::move(initial + howMany);
}
int main() {
std::cout << add(5, 2) << std::endl;
return 0;
}
Its output could be:
7
You could avoid the problem of temporary variables using global or static ones, but it is bad either. #Rakete is right, there is no good way of achieving it.
One commonly known compiler optimisation is is the so-called return value optimisation. This optimisation basically allows the compiler to not copy a local variable that is being returned from a function, but instead moving it.
However, I was wondering if the same is also possible for passing arguments to a function by value if it is known that the return value of the function will overwrite the original argument.
Here is an example. Let's assume we have the following function:
std::vector<Foo> modify(std::vector<Foo> data) {
/* Do some funny things to data */
return data;
}
This function is then used in the following way:
std::vector<Foo> bigData = /* big data */;
bigData = modify(bigData); // Here copying the data into the function could be omitted
Now, in this case it can be clearly determined that the return value of the function call will override the argument that is passed into the function per value. My question is whether current compilers are able to optimise this code in a way so that the argument data is not copied when passed to the function, or if this might even be a part of the so-called return value optimisation.
Update
Let's take C++11 into account. I wonder if the following understanding is correct: If the value passed to a function parameter by value is an r-value, and the type of the parameter has a move-constructor, the move constructor will be used instead of the copy constructor.
For example:
std::vector<Foo> bigData = /* big data */;
bigData = modify(std::move(bigData));
If this is assumption is correct, this eliminates the copy operation when passing the value. From the answers already given it seems that the optimisation I referred to earlier is not commonly undertaken. Looking at this manual approach I don't really understand why, as appears to be pretty straightforward to apply.
It's hard to say for sure because in principle compilers can optimize many things, as long as they are certain it has the same behavior. However, in my experience, this optimization will not occur without inlining. Consider the following code:
__attribute__((noinline)) std::vector<double> modify(std::vector<double> data) {
std::sort(data.begin(), data.end());
return data;
}
std::vector<double> blah(std::vector<double> v) {
v = modify(v);
return v;
}
You can look at the assembly generated for this for various compilers; here I have clang 4.0 with O3 optimization: https://godbolt.org/g/xa2Dhf. If you look at the assembly carefully, you'll see a call to operator new in blah. This proves that blah is indeed performing a copy in order to call modify.
Of course, if inlining occurs, it should be pretty trivial for the compiler to remove the copy.
In C++11 the compiler could determine that bigData is reassigned after use in the function and pass it as rvalue, but there is no guarantee for that, unlike for the RVO (from c++17).
For std::vector at least you can make sure this happens by calling the function as modify(std::move(bigData)), which will construct the value in modify from the rvalue reference, which it cannot optimize with the RVO afaik, because it is the function parameter, which is explicitly excluded from this optimization (3rd point here). However the compiler should understand that the return value is an r-value, and move it into big-data again.
Whether some compilers elide a move from an object into a function and out of the function back into the object I don't know for sure, but I know nothing that explicitly allows it, and since the move-constructor could have observable side-effects, that probably means, that it is not allowed (cf. the Notes section in above link).
That is really compiler specific and depends on how you perform operations(whether we are modifying the data or not) with the data. Mostly you shouldn't expect the compiler to do such kind of optimizations unless you really benchmark it. I did some tests with VS2012 compiler that performs copy operations though we don't modify it.
Please have a look at this post(Does the compiler optimize the function parameters passed by value?), that may give you a better idea I hope.
In my current project I need to implement quite a few functions/methods that take some parameters and generate a collection of results (rather large). So in order to return this collection without copying, I can either create a new collection and return a smart pointer:
boost::shared_ptr<std::vector<Stuff> > generate();
or take a reference to a vector which will be populated:
void generate(std::vector<Stuff> &output);
Both approaches have benefits. The first clearly shows that the vector is the output of the function, it is trivial to use in a parallelized scenario, etc. The second might be more efficient when called in a loop (because we don't allocate memory every time), but then it is not that obvious that the parameter is the output, and someone needs to clean the old data from the vector...
Which would be more customary in real life (i.e. what is the best practise)? In C#/java I would argue that the first one, what is the case in C++?
Also, is it possible to effectively return a vector by value using C++11? What would the pitfalls be?
do correctness first, then optimize if necessary
with both move semantics and Return Value Optimization conspiring to make an ordinary function result non-copying, you would probably have to work at it to make it sufficiently inefficient to be worth optimization work
so, just return the collection as a function result, then MEASURE if you feel that it's too slow
You should return by value.
is it possible to effectively return a vector by value using C++11?
Yes, C++11 supports move semantics. You return a value, but the compiler knows it's a temporary, and therefore can invoke a special constructor (move constructor) that is especially designed to simply "steal the guts" of the returned object. After all, you won't use that temporary object anymore, so why copying it when you can just move its content?
Apart from this, it may be worth mentioning that most C++ compilers, even pre-C++11, implement (Named) Return Value Optimization, which would elide the copy anyway, incurring in no overhead. Thus, you may want to actually measure the performance penalty you (possibly) get before optimizing.
I think you should pass by reference, or return a shared pointer, only when you need reference semantics. This does not seem to be your case.
There is an alternative approach. If you can make your functions template, make them take an output iterator (whose type is a template argument) as argument:
tempalte<class OutputIterator>
void your_algorithm(OutputIterator out) {
for(/*condition*/) {
++out = /* calculation */;
}
}
This has the advantage that the caller can decide in what kind of collection he wants to store the result (the output iterator could for instance write directly to a file, or store the result in a std::vector, or filter it, etc.).
The best practise will probably be surprising to you. I would recommend returning by value in both C++03 and C++11.
In C++03, if you create a std::vector local to generate and return it, the copy may be elided by the compiler (and almost certainly will be). See C++03 §12.8/15:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object with the same cv-unqualified type as the function return type, the copy operation can be omitted by constructing the automatic object directly into the function's return value
In C++11, if you create a std::vector local to generate and return it, the copy will first be considered as a move first (which will already be very fast) and then that may be elided (and almost certainly will be). See C++11 §12.8/31:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
And §12.8/32:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
So return by value!
Believe it or not, I'm going to suggest that instead of either of those approaches just take the obvious implementation and return by value! Compilers are very often able to optimize away the notional copy that would be induced, removing it completely. By writing the code in the most obvious manner you make it very clear to future maintainers what the intent is.
But let's say you try return by value and your program runs too slow and let's further suppose that your profiler shows that the return by value is in fact your bottleneck. In this case I would allocate the container on the heap and return as an auto_ptr in C++03 or a unique_ptr in C++11 to clearly indicate that ownership is being transferred and that the generate isn't keeping a copy of that shared_ptr for its own purposes later.
Finally, the series at http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ provides a great perspective on almost the exact same question.
Which of the following examples is the better way of declaring the following function and why?
void myFunction (const int &myArgument);
or
void myFunction (int myArgument);
Use const T & arg if sizeof(T)>sizeof(void*) and use T arg if sizeof(T) <= sizeof(void*)
They do different things. const T& makes the function take a reference to the variable. On the other hand, T arg will call the copy constructor of the object and passes the copy.
If the copy constructor is not accessible (e.g. it's private), T arg won't work:
class Demo {
public: Demo() {}
private: Demo(const Demo& t) { }
};
void foo(Demo t) { }
int main() {
Demo t;
foo(t); // error: cannot copy `t`.
return 0;
}
For small values like primitive types (where all matters is the contents of the object, not the actual referential identity; say, it's not a handle or something), T arg is generally preferred. For large objects and objects that you can't copy and/or preserving referential identity is important (regardless of the size), passing the reference is preferred.
Another advantage of T arg is that since it's a copy, the callee cannot maliciously alter the original value. It can freely mutate the variable like any local variables to do its work.
Taken from Move constructors. I like the easy rules
If the function intends to change the argument as a side effect, take it by reference/pointer to a non-const object. Example:
void Transmogrify(Widget& toChange);
void Increment(int* pToBump);
If the function doesn't modify its argument and the argument is of primitive type, take it by value. Example:
double Cube(double value);
Otherwise
3.1. If the function always makes a copy of its argument inside, take it by value.
3.2. If the function never makes a copy of its argument, take it by reference to const.
3.3. Added by me: If the function sometimes makes a copy, then decide on gut feeling: If the copy is done almost always, then take by value. If the copy is done half of the time, go the safe way and take by reference to const.
In your case, you should take the int by value, because you don't intend to modify the argument, and the argument is of primitive type. I think of "primitive type" as either a non-class type or a type without a user defined copy constructor and where sizeof(T) is only a couple of bytes.
There's a popular advice that states that the method of passing ("by value" vs "by const reference") should be chosen depending in the actual size of the type you are going to pass. Even in this discussion you have an answer labeled as "correct" that suggests exactly that.
In reality, basing your decision on the size of the type is not only incorrect, this is a major and rather blatant design error, revealing a serious lack of intuition/understanding of good programming practices.
Decisions based on the actual implementation-dependent physical sizes of the objects must be left to the compiler as often as possible. Trying to "tailor" your code to these sizes by hard-coding the passing method is a completely counterproductive waste of effort in 99 cases out of 100. (Yes, it is true, that in case of C++ language, the compiler doesn't have enough freedom to use these methods interchangeably - they are not really interchangeable in C++ in general case. Although, if necessary, a proper size-based [semi-]automatic passing methios selection might be implemented through template metaprogramming; but that's a different story).
The much more meaningful criterion for selecting the passing method when you write the code "by hand" might sound as follows:
Prefer to pass "by value" when you are passing an atomic, unitary, indivisible entity, such as a single non-aggregate value of any type - a number, a pointer, an iterator. Note that, for example, iterators are unitary values at the logical level. So, prefer to pass iterators by value, regardless of whether their actual size is greater than sizeof(void*). (STL implementation does exactly that, BTW).
Prefer to pass "by const reference" when you are passing an aggregate, compound value of any kind. i.e. a value that has exposed pronouncedly "compound" nature at the logical level, even if its size is no greater than sizeof(void*).
The separation between the two is not always clear, but that how things always are with all such recommendations. Moreover, the separation into "atomic" and "compound" entities might depend on the specifics of your design, so the decision might actually differ from one design to the other.
Note, that this rule might produce decisions different from those of the allegedly "correct" size-based method mentioned in this discussion.
As an example, it is interesing to observe, that the size-based method will suggest you manually hard-code different passing methods for different kinds of iterators, depending on their physical size. This makes is especially obvious how bogus the size-based method is.
Once again, one of the basic principles from which good programming practices derive, is to avoid basing your decisions on physical characteristics of the platform (as much as possible). Instead, you decisions have to be based on the logical and conceptual properties of the entities in your program (as much as possible). The issue of passing "by value" or "by reference" is no exception here.
In C++11 introduction of move semantics into the language produced a notable shift in the relative priorities of different parameter-passing methods. Under certain circumstances it might become perfectly feasible to pass even complex objects by value
Should all/most setter functions in C++11 be written as function templates accepting universal references?
Contrary to popular and long-held beliefs, passing by const reference isn't necessarily faster even when you're passing a large object. You might want to read Dave Abrahams recent article on this very subject.
Edit: (mostly in response to Jeff Hardy's comments): It's true that passing by const reference is probably the "safest" alternative under the largest number of circumstances -- but that doesn't mean it's always the best thing to do. But, to understand what's being discussed here, you really do need to read Dave's entire article quite carefully, as it is fairly technical, and the reasoning behind its conclusions is not always intuitively obvious (and you need to understand the reasoning to make intelligent choices).
Usually for built-in types you can just pass by value. They're small types.
For user defined types (or templates, when you don't what is going to be passed) prefer const&. The size of a reference is probably smaller than the size of the type. And it won't incurr an extra copy (no call to a copy constructor).
Well, yes ... the other answers about efficiency are true. But there's something else going on here which is important - passing a class by value creates a copy and, therefore, invokes the copy constructor. If you're doing fancy stuff there, it's another reason to use references.
A reference to const T is not worth the typing effort in case of scalar types like int, double, etc. The rule of thumb is that class-types should be accepted via ref-to-const. But for iterators (which could be class-types) we often make an exception.
In generic code you should probably write "T const&" most of the time to be on the safe side. There's also boost's call traits you can use to select the most promising parameter passing type. It basically uses ref-to-const for class types and pass-by-value for scalar types as far as I can tell.
But there are also situations where you might want to accept parameters by value, regardless of how expensive creating a copy can be. See Dave's article "Want Speed? Use pass by value!".
For simple types like int, double and char*, it makes sense to pass it by value. For more complex types, I use const T& unless there is a specific reason not to.
The cost of passing a 4 - 8 byte parameter is as low as you can get. You don't buy anything by passing a reference. For larger types, passing them by value can be expensive.
It won't make any difference for an int, as when you use a reference the memory address still has to be passed, and the memory address (void*) is usually about the size of an integer.
For types that contain a lot of data it becomes far more efficient as it avoids the huge overhead from having to copy the data.
Well the difference between the two doesn't really mean much for ints.
However, when using larger structures (or objects), the first method you used, pass by const reference, gives you access to the structure without need to copy it. The second case pass by value will instantiate a new structure that will have the same value as the argument.
In both cases you see this in the caller
myFunct(item);
To the caller, item will not be changed by myFunct, but the pass by reference will not incur the cost of creating a copy.
There is a very good answer to a similar question over at Pass by Reference / Value in C++
The difference between them is that one passes an int (which gets copied), and one uses the existing int. Since it's a const reference, it doesn't get changed, so it works much the same. The big difference here is that the function can alter the value of the int locally, but not the const reference. (I suppose some idiot could do the same thing with const_cast<>, or at least try to.) For larger objects, I can think of two differences.
First, some objects simply can't get copied, auto_ptr<>s and objects containing them being the obvious example.
Second, for large and complicated objects it's faster to pass by const reference than to copy. It's usually not a big deal, but passing objects by const reference is a useful habit to get into.
Either works fine. Don't waste your time worrying about this stuff.
The only time it might make a difference is when the type is a large struct, which might be expensive to pass on the stack. In that case, passing the arg as a pointer or a reference is (slightly) more efficient.
The problem appears when you are passing objects. If you pass by value, the copy constructor will be called. If you haven't implemented one, then a shallow copy of that object will be passed to the function.
Why is this a problem? If you have pointers to dynamically allocated memory, this could be freed when the destructor of the copy is called (when the object leaves the function's scope). Then, when you re call your destructor, youll have a double free.
Moral: Write your copy constructors.
I'm trying to use std::string instead of char* whenever possible, but I worry I may be degrading performance too much. Is this a good way of returning strings (no error checking for brevity)?
std::string linux_settings_provider::get_home_folder() {
return std::string(getenv("HOME"));
}
Also, a related question: when accepting strings as parameters, should I receive them as const std::string& or const char*?
Thanks.
Return the string.
I think the better abstraction is worth it. Until you can measure a meaningful performance difference, I'd argue that it's a micro-optimization that only exists in your imagination.
It took many years to get a good string abstraction into C++. I don't believe that Bjarne Stroustroup, so famous for his conservative "only pay for what you use" dictum, would have permitted an obvious performance killer into the language. Higher abstraction is good.
Return the string, like everyone says.
when accepting strings as parameters, should I receive them as const std::string& or const char*?
I'd say take any const parameters by reference, unless either they're lightweight enough to take by value, or in those rare cases where you need a null pointer to be a valid input meaning "none of the above". This policy isn't specific to strings.
Non-const reference parameters are debatable, because from the calling code (without a good IDE), you can't immediately see whether they're passed by value or by reference, and the difference is important. So the code may be unclear. For const params, that doesn't apply. People reading the calling code can usually just assume that it's not their problem, so they'll only occasionally need to check the signature.
In the case where you're going to take a copy of the argument in the function, your general policy should be to take the argument by value. Then you already have a copy you can use, and if you would have copied it into some specific location (like a data member) then you can move it (in C++11) or swap it (in C++03) to get it there. This gives the compiler the best opportunity to optimize cases where the caller passes a temporary object.
For string in particular, this covers the case where your function takes a std::string by value, and the caller specifies as the argument expression a string literal or a char* pointing to a nul-terminated string. If you took a const std::string& and copied it in the function, that would result in the construction of two strings.
The cost of copying strings by value varies based on the STL implementation you're working with:
std::string under MSVC uses the short string optimisation, so that short strings (< 16 characters iirc) don't require any memory allocation (they're stored within the std::string itself), while longer ones require a heap allocation every time the string is copied.
std::string under GCC uses a reference counted implementation: when constructing a std::string from a char*, a heap allocation is done every time, but when passing by value to a function, a reference count is simply incremented, avoiding the memory allocation.
In general, you're better off just forgetting about the above and returning std::strings by value, unless you're doing it thousands of times a second.
re: parameter passing, keep in mind that there's a cost from going from char*->std::string, but not from going from std::string->char*. In general, this means you're better off accepting a const reference to a std::string. However, the best justification for accepting a const std::string& as an argument is that then the callee doesn't have to have extra code for checking vs. null.
Seems like a good idea.
If this is not part of a realtime software (like a game) but a regular application, you should be more than fine.
Remember, "Premature optimization is the root of all evil"
It's human nature to worry about performance especially when programming language supports low-level optimization.
What we shouldn't forget as programmers though is that program performance is just one thing among many that we can optimize and admire. In addition to program speed we can find beauty in our own performance. We can minimize our efforts while trying to achieve maximum visual output and user-interface interactiveness. Do you think that could be more motivation that worrying about bits and cycles in a long run... So yes, return string:s. They minimize your code size, and your efforts, and make the amount of work you put in less depressing.
In your case Return Value Optimization will take place so std::string will not be copied.
Beware when you cross module boundaries.
Then it's best to return primitive types since C++ types are not necessarily binary compatible across even different versions of the same compiler.
I agree with the other posters, that you should use string.
But know, that depending on how aggressively your compiler optimizes temporaries, you will probably have some extra overhead (over using a dynamic array of chars). (Note: The good news is that in C++0a, the judicious use of rvalue references will not require compiler optimizations to buy efficiency here - and programmers will be able to make some additional performance guarantees about their code without relying on the quality of the compiler.)
In your situation, is the extra overhead worth introducing manual memory management? Most reasonable programmers would disagree - but if your application does end up having performance issues, the next step would be to profile your application - thus, if you do introduce complexity, you only do it once you have good evidence that it is needed to improve overall efficiency.
Someone mentioned that Return Value optimization (RVO) is irrelevant here - I disagree.
The standard text (C++03) on this reads (12.2):
[Begin Standard Quote]
Temporaries of class type are created in various contexts: binding an rvalue to a reference (8.5.3), returning an rvalue (6.6.3), a conversion that creates an rvalue (4.1, 5.2.9, 5.2.11, 5.4), throwing an exception (15.1), entering a handler (15.3), and in some initializations (8.5). [Note: the lifetime of exception objects is described in 15.1. ] Even when the creation of the temporary object is avoided (12.8), all the semantic
restrictions must be respected as if the temporary object was created. [Example: even if the copy constructor is not called, all the semantic restrictions, such as accessibility (clause 11), shall be satisfied. ]
[Example:
struct X {
X(int);
X(const X&);
˜X();
};
X f(X);
void g()
{
X a(1);
X b = f(X(2));
a = f(a);
}
Here, an implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy-constructor; alternatively, X(2) might be constructed in the space used to hold the argument. Also, a temporary might be used to hold the result of f(X(2)) before copying it to b using X’s copyconstructor; alternatively, f()’s result might be constructed in b. On the other hand, the expression a=f(a) requires a temporary for either the argument a or the result of f(a) to avoid undesired aliasing of
a. ]
[End Standard Quote]
Essentially, the text above says that you can possibly rely on RVO in initialization situations, but not in assignment situations. The reason is, when you are initializing an object, there is no way that what you are initializing it with could ever be aliased to the object itself (which is why you never do a self check in a copy constructor), but when you do an assignment, it could.
There is nothing about your code, that inherently prohibits RVO - but read your compiler documentation to ensure that you can truly rely on it, if you do indeed need it.
I agree with duffymo. You should make an understandable working application first and then, if there is a need, attack optimization. It is at this point that you will have an idea where the major bottlenecks are and will be able to more efficiently manage your time in making a faster app.
I agree with #duffymo. Don't optimize until you have measured, this holds double true when doing micro-optimizations. And always: measure before and after you've optimized, to see if you actually changed things to the better.
Return the string, it's not that big of a loss in term of performance but it will surely ease your job afterward.
Plus, you could always inline the function but most optimizer will fix it anyways.
If you pass a referenced string and you work on that string you don't need to return anything. ;)