In c++, what the preferred/recommended way to create an object in a function/method and return it to be used outside the creation function's scope?
In most functional languages, option 3 (and sometimes even option 1) would be preferred, but what's the c++ way of best handling this?
Option 1 (return unique_ptr)
pros: function is pure and does not change input params
cons: is this an unnecessarily complicated solution?
std::unique_ptr<SomeClass> createSometing(){
auto s = std::make_unique<SomeClass>();
return s;
}
Option 2 (pass result as a reference parameter)
pros: simple and does not involve pointers
cons: input parameter is changed (makes function less pure and more unpredictable - the result reference param could be changed anywhere within the function and it could get hard/messy to track in larger functions).
void createSometing(SomeClass& result){
SomeClass s;
result = s;
}
Option 3 (return by value - involves copying)
pros: simple and clear
cons: involves copying an object - which could be expensive. But is this ok?
SomeClass createSometing(){
SomeClass s;
return s;
}
In modern C++, the rule is that the compiler is smarter than the programmer. Said differently the programmer is expected to write code that will be easy to read and maintain. And except when profiling have proven that there is a non acceptable bottleneck, low level concerns should be left to the optimizing compilers.
For that reason and except if profiling has proven that another way is required I would first try option 3 and return a plain object. If the object is moveable, moving an object is generally not too expensive. Furthermore, most compilers are able to fully elide the copy/move operation if they can. If I correctly remember, copy elision is even required starting with C++17 for statements like that:
T foo = functionReturningT();
This is a loaded question, because the matter involves a decision to create the object on the heap vs not creating it on the heap. In C++, it’s ideal to have objects that can be passed around as values cheaply. std::string is a good example of that. It’s generally a premature pessimization to allocate std::string on the heap. On the other hand, the object you may be creating may be large and expensive to copy. In that case, putting it on the heap would be preferable. But that assumes that a copy would have to take place. By default, the copy is eluded! But also: figure out if the type could be made cheaper to copy.
So there’s no “one way suits all”. In my experience, legacy code tends to overuse the heap.
In most cases, returning by value is preferable, since all mainstream compilers will have the function instantiate the object in the storage where it’ll reside, without moves nor copies.
Then, the object can be copy-constructed on the heap by the user of the function, if they so desire, and the compiler will get rid of that copy as well.
Micromanagement of this stuff, without looking at actual generated code, is typically a waste of time, since the code declares intent and not the implementation. Compilers these days literally produce code that has equivalent meaning, taking the C++ source’s semantics, but not necessarily using the source to dictate identical implementation at the machine level.
Thus, in most instances, returning by value is the sensible default, unless the type is borked and doesn’t support that. Unfortunately , some widely used types are in this camp, eg. Qt’s QObject.
TL;DR: Given MyType myFactoryFunction();, the statement auto obj = std::make_unique<MyType>(myFactoryFunction()); will not copy nor move on modern compilers in the release build, if the type is designed well.
There isn't a single right answer and it depends on the situation and personal preference to some extent. Here are pros and cons of different approaches.
Just declare it
SomeClass foo(arg1, arg2);
Factory functions should be relatively uncommon and only needed if the code creating the object doesn't have all the necessary information to create it (or shouldn't, due to encapsulation reasons). Perhaps it's more common in other languages to have factory functions for everything, but instantiating objects directly should be the first pick.
Return by value
SomeClass createSomeClass();
The first question is whether you want the resulting object to live on the stack or the heap. The default for small objects is the stack, since it's more efficient as you skip the call to malloc(). With Return Value Optimization usually there's no copy.
Return by pointer
std::unique_ptr<SomeClass> createSomeClass();
or
SomeClass* createSomeClass();
Reasons you might pick this include being a large object that you want to be heap allocated; the object is created out of some data store and the caller won't own the memory; you want a nullable return type to signal errors.
Out parameter
bool createSomeClass(SomeClass&);
Main benefits of using out parameters are when you have multiple return types. For example, you might want to return true/false for whether the object creation succeeded (e.g. if your object doesn't have a valid "unset" state, like an integer). You might also have a factory function that returns multiple things, e.g.
void createUserAndToken(User& user, Token& token);
In summary, I'd say by default, go with return by value. Do you need to signal failure? Out parameter or pointer. Is it a large object that lives on the heap, or some other data structure and you're giving out a handle? Return by pointer. If you don't strictly need a factory function, just declare it.
I have been attempting to learn C++ over the past few weeks and have a question regarding good practice.
Let's say I have a function that will produce some object. Is it better to define the function to produce an output of type object, or is it better to have the function be passed an object pointer as an argument such that it can modify it directly?
I suppose this answer is dependent on the scenario, but I'm curious if efficiency comes into play. When passing objects into a function as an argument, I know it is more efficient to use const reference such that the function has immediate access to the object with no need of generating a copy.
Does such concern of efficiency come into play when outputting function results?
The following:
MyType someFunc()
{
MyType result;
// produce value here
return result;
}
Used like this:
MyType var = someFunc();
Will do no copy, and no move, but rather RVO.
This means that it can't get more efficient anyway, and it is
also easy to read, and hard to use wrong. Don't help the compiler.
You can return created object as a pointer or shared pointer from function. This is useful for immediate checking return value.
std::shared_ptr<Object> CreateObject(int type)
{
if (type == SupportedType)
return std::make_shared<Object>();
else
return std::shared_ptr<Object>();
}
...
if (std::shared_ptr<Object> object = CreateObject(param))
// do something with object
else
// process error
This is more compact way than passing reference to object's pointer as param and maybe a bit more intuitive.
By passing things by reference you are saving memory resources, this will prevent you from creating copies of things when not needed.
I find it is good practice to pass everything as constant pointers initially and go back and change if needed. This makes sure you are really aware of the structure of your code.
As the best practice, often having easy-to-read code is the most important factor. See what method makes that block of code easier to read and go that way. In most cases the answer by sp2danny is the clearest.
If for your project the speed has the highest priority then test all the possible methods and see which one is faster. Because most likely your code is more complicated than calling a single function and getting an object back, and probably a few other functions interact with that object too. Hence, you should consider the whole code while trying to improve the speed.
Suppose I have a function that will return a large data structure, with the intention that the caller will immediately copy the return value:
Large large()
{
return Large();
}
Now suppose I do not want to rely on any kind of compiler optimizations such as return value optimization etc. Also suppose that I cannot rely on the C++11 move constructor. I would like to gather some opinions on the "correctness" of the following code:
const Large& large()
{
static Large large;
large = Large();
return large;
}
It should work as intended, but is it poor style to return a reference to a static local even if it is const qualified?
It all depends on what should work as expected means. In this case all callers will share references to the exact same variable. Also note that if callers will copy, then you are effectively disabling RVO (Return Value Optimization), which will work in all current compilers [*].
I would stay away from that approach as much as possible, it is not idiomatic and will probably cause confusion in many cases.
[*]The calling convention in all compilers I know of determines that a function that returns a large (i.e. does not fit a register) variable receives a hidden pointer to the location in which the caller has allocated the space for the variable. That is, the optization is forced by the calling convention.
I don't think there's any issue with doing this. So long as this code base is, and forever will be, single threaded.
Do this on a multithreaded piece of code, and you might never be able to figure out why your data are occasionally being randomly corrupted.
Which is better for performance when calling a function that provides a simple datatype -- having it fill in a memory location (passed by pointer) or having it return the simple data?
I've oversimplified the example returning a static value of 5 here, but assume the lookup/functionality that determines the return value would be dynamic in real life...
Conventional logic would tell me the first approach is quicker since we are operating by reference instead of having to return a copy as in the 2nd approach... But, I'd like others' opinions.
Thanks
void func(int *a) {
*a = 5;
}
or...
int func() {
return 5;
}
In general, if your function acts like a function (that is, returning a single logical value), then it's probably best to use int func(). Even if the return value is a complex C++ object, there's a common optimisation called Return Value Optimisation that avoids unnecessary object copying and makes the two forms roughly equivalent in runtime performance.
Most compilers will return a value in a register as long as what you're returning is small enough to fit in a register. It's pretty unusual (and often nearly impossible) for anything else to be more efficient than that.
For PODs, there is no or almost no difference and I'd always go with a return value as I find those cleaner and easier to read.
For non-PODs the answer is "it depends" - a lot of compilers use Return Value Optimisation in this sort of scenario which tends to create an implicit reference parameter.
However unless you have measured - not "know", but actually measured with a profiler - that returning the results of the function using a return value is actually a bottleneck in your software, go for the more readable version of the code.
In my opinion, always go with return unless you know of a reason not to, or you have to return more than one value from the function. Returning a built-in type is very efficient, and whatever the difference vs. returning via pointer, it must be negligible. But the real benefit here is using return is clearer and simpler for those who read the code later.
Returning a simple value is just something like an instrution in assmbly ( ie MOV eax,xxxx ), passing a parameter introduce a little more overhead. in any case you should not worry about that, difference are hard to notice.
Another important issue is that a function returniong on the left is generally cleaner in term of design, and preferred when possible.
This is a low level thing, where it would be hard to see any difference.
Easy answer: it depends.
It depends on the types being used, whether they can be copied cheaply or not (or at all), whether the compiler can use RVO in some circumstances or not, inline things better with one form or another...
Use what makes sense in the context.
Lets say I know a guy who is new to C++. He does not pass around pointers (rightly so) but he refuses to pass by reference. He uses pass by value always. Reason being that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const by value might be silly, because if you promise not to change the object then you might as well not make your own copy of it. Unless its a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as argument. This adds complexity; instead, you might be able to get away with it by taking a reference to a pointer, and get the same effect.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.
There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil"1.
1
Various attributions, including C A Hoare (although apparently he disclaims it).
I think there is a huge misunderstanding in the question itself.
There is not relationship between stack or heap allocated objects on the one hand and pass by value or reference or pointer on the other.
Stack vs Heap allocation
Always prefer stack when possible, the object's lifetime is then managed for you which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
And I might miss some, but in this case you should use SBRM (Scope Bound Resources Management) to leverage the stack lifetime management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind like Haskell) do not offer reference/pointer by default. The values are immutable once created. Apart from some work-arounds for dealing with the exterior environment, they are not that restricted by this use and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example thing of swap, it cannot be implemented with pass-by-value.
Finally, Polymorphism does not allow pass-by-value semantics.
Now, let's speak about performances.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and user-defined big classes should be passed by reference/pointer (to avoid copying). big in fact generally means that the Copy Constructor is not trivial.
There is however an open question regarding small user-defined classes. Some articles published recently suggest that in some case pass-by-value might allow better optimization from the compiler, for example, in this case:
Object foo(Object d) { d.bar(); return d; }
int main(int argc, char* argv[])
{
Object o;
o = foo(o);
return 0;
}
Here a smart compiler is able to determine that o can be modified in place without any copying! (It is necessary that the function definition be visible I think, I don't know if Link-Time Optimization would figure it out)
Therefore, there is only one possibility to the performance issue, like always: measure.
Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and then all the code uses are a few 3D vectors and large control structures? That is a broken design. Good code achieves modularity through the use of data structures. It doesn’t seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.
First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning, I think he might be wrong because he is too generalized, but what he has done might be correct... or not?
For example, the Windows Forms library use Rectangle struct that have 4 members, the Apple's QuartzCore also has CGRect struct, and those structs always passed by value. I think we can compare that to Vector with 3 floating-point variable.
However, as I do not see the code, I feel I should not judge what he has done, though I have a feeling he might did the right thing despite of his over generalized idea.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
I would say that Not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were actually inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency continue to be vital until this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory that it would otherwise! Referring to the memory location of a data is much more efficient then copying all the data around.
Pointers usually are a concept that is difficult to grasp for those beginning to program, because all the experiments done involve small arrays, maybe a few structs, but basically they consist of working with a couple of megabytes (if you're lucky) when you have 1GB of memory laying around the house. In this scene, a couple of MB are nothing and it usually is too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements - 2GB of data - that you need to pass to function that will write all the data to the disk. Now, what technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4GB of RAM? Will you spend $ and buy chips of RAM just because you are afraid of using pointers?
Re-copying the data in memory sounds a bit redundant when you don't have to, and its a waste of computer resource.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.
As already mentioned the big difference between a reference and a pointer is that a pointer can be null. If a class requires data a reference declaration will make it required. Adding const will make it 'read only' if that is what is desired by the caller.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value but when a class instance is passed by value temporary objects are created which requires constructors and later on destructor's to be called on the class and on all of the member variable in the class. This is exasperated when large class hierarchies are used because parent class constructors/destructor's must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. heap may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate via heap if it reaches its capacity.
If pass by value is being so that the callers values are not modified then just use a const reference.
The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
The only major argument I can think of
is that the stack could possibly
overflow, but I'm guessing that it is
improbable that this will occur? Are
there any other arguments against
using only the stack/pass by value as
opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass a variable-sized data structures, arrays, etc. around on the stack. You have to dynamically allocate it and pass a pointers or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which if the elements are doubles would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so if a program that had to deal with those would be five times slower per function call with pass-by-value due to the increased copying. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architecures like ARM, PowerPC, 64-bit x86, 680x0 - but not 32-bit x86 - pointers (and references, which are secretly pointers wearing fancy syntatical clothing) are commonly be passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and XWindows on Linux, Quartz on Mac OS X, Direct X on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass by anything much larger than a single integer or float by reference, and often fill in results by reference rather than as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.
Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3 & and vector3 const&, and demonstrate how the latter may be initialized by a constant as in vec_function( vector3(1,2,3) );, but not the former. Pass by const reference is a simple optimization of pass by value.
Buy your friend a good c++ book. Passing non-trivial objects by reference is a good practice and saves you a lot of unneccessary constructor/destructor calls. This has also nothing to do with allocating on free store vs. using stack. You can (or should) pass objects allocated on program stack by reference without any free store usage. You also can ignore free store completely, but that throws you back to the old fortran days which your friend probably hadn't in mind - otherwise he would pick an ancient f77 compiler for your project, wouldn't he...?