Efficiently Using A Function Output - c++

I have been attempting to learn C++ over the past few weeks and have a question regarding good practice.
Let's say I have a function that will produce some object. Is it better to define the function to produce an output of type object, or is it better to have the function be passed an object pointer as an argument such that it can modify it directly?
I suppose this answer is dependent on the scenario, but I'm curious if efficiency comes into play. When passing objects into a function as an argument, I know it is more efficient to use const reference such that the function has immediate access to the object with no need of generating a copy.
Does such concern of efficiency come into play when outputting function results?

The following:
MyType someFunc()
{
MyType result;
// produce value here
return result;
}
Used like this:
MyType var = someFunc();
Will typically do no copy and no move: the compiler applies RVO (return value optimization) and constructs the result directly in var.
This means that it can't get more efficient anyway, and it is
also easy to read and hard to use wrong. Don't try to help the compiler.
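One way to check this on your own compiler is to count constructor calls. The sketch below is only an illustration: Counted and movesForOneCall are made-up names standing in for MyType and the calling code, and it assumes C++11 or later.

```cpp
#include <cassert>

// Hypothetical stand-in for MyType that counts copies and moves.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted someFunc()
{
    Counted result;
    // produce value here
    return result;  // eligible for RVO; if not elided, it is moved, not copied
}

// Runs the call from the text once and reports how many moves were needed.
int movesForOneCall()
{
    Counted::copies = 0;
    Counted::moves = 0;
    Counted var = someFunc();
    (void)var;
    assert(Counted::copies == 0);  // holds with or without RVO
    return Counted::moves;         // 0 when the compiler elides the moves
}
```

The copy count stays at zero either way, because a returned local is treated as an rvalue when elision does not happen. Whether the move count is also zero depends on the compiler and its settings.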

You can return the created object from the function as a pointer or shared pointer. This is useful for checking the return value immediately.
std::shared_ptr<Object> CreateObject(int type)
{
if (type == SupportedType)
return std::make_shared<Object>();
else
return std::shared_ptr<Object>();
}
...
if (std::shared_ptr<Object> object = CreateObject(param))
// do something with object
else
// process error
This is more compact than passing a reference to the object's pointer as a parameter, and maybe a bit more intuitive.

Passing things by reference saves memory, because it prevents copies from being created when they are not needed.
I find it is good practice to pass everything as constant pointers initially and go back and change if needed. This makes sure you are really aware of the structure of your code.

As the best practice, often having easy-to-read code is the most important factor. See what method makes that block of code easier to read and go that way. In most cases the answer by sp2danny is the clearest.
If for your project the speed has the highest priority then test all the possible methods and see which one is faster. Because most likely your code is more complicated than calling a single function and getting an object back, and probably a few other functions interact with that object too. Hence, you should consider the whole code while trying to improve the speed.

Related

Returning a Static Local Reference

Suppose I have a function that will return a large data structure, with the intention that the caller will immediately copy the return value:
Large large()
{
return Large();
}
Now suppose I do not want to rely on any kind of compiler optimizations such as return value optimization etc. Also suppose that I cannot rely on the C++11 move constructor. I would like to gather some opinions on the "correctness" of the following code:
const Large& large()
{
static Large large;
large = Large();
return large;
}
It should work as intended, but is it poor style to return a reference to a static local even if it is const qualified?
It all depends on what should work as expected means. In this case all callers will share references to the exact same variable. Also note that if callers will copy, then you are effectively disabling RVO (Return Value Optimization), which will work in all current compilers [*].
I would stay away from that approach as much as possible, it is not idiomatic and will probably cause confusion in many cases.
[*] The calling convention in all compilers I know of determines that a function that returns a large (i.e. does not fit in a register) variable receives a hidden pointer to the location in which the caller has allocated the space for the variable. That is, the optimization is forced by the calling convention.
I don't think there's any issue with doing this, so long as this code base is, and forever will be, single threaded.
Do this on a multithreaded piece of code, and you might never be able to figure out why your data are occasionally being randomly corrupted.
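The sharing problem mentioned above shows up even in single-threaded code as soon as a caller keeps the reference instead of copying. A minimal sketch, assuming a made-up int payload and an extra argument (neither is in the question) so that successive calls produce different contents:

```cpp
#include <cassert>

struct Large {
    int value;  // stand-in for a large payload
    Large() : value(0) {}
};

// Same shape as the function in the question; the parameter is hypothetical.
const Large& large(int v)
{
    static Large instance;
    instance = Large();
    instance.value = v;
    return instance;
}

// Returns true if a reference kept from the first call still sees the first value.
bool firstResultSurvives()
{
    const Large& first = large(1);   // caller keeps a reference, no copy
    const Large& second = large(2);  // overwrites the same static object
    assert(&first == &second);       // both names alias one variable
    return first.value == 1;         // false: first now reads 2
}
```

A caller who writes Large copy = large(1); is unaffected, which is exactly why the bug is easy to miss in review.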

C++ Is using auto_ptr references as out variables idiomatic?

Suppose I want to write factory method that is supposed to allocate heterogeneous objects on the heap and return them to the caller. I am thinking of designing the API like this:
bool MakeEm(auto_ptr<Foo>& outFoo, auto_ptr<Bar>& outBar) {
...
if (...) {
return false;
}
outFoo.reset(new Foo(...));
outBar.reset(new Bar(...));
return true;
}
This allows a caller to do this:
auto_ptr<Foo> foo;
auto_ptr<Bar> bar;
MakeEm(foo, bar);
My question is: "Is this idiomatic? If not, what is the right way to do this?"
The alternative approaches I can think of include returning a struct of auto_ptrs, or writing the factory API to take raw pointer references. They both require writing more code, and the latter has other gotchas when it comes to exception safety.
Asking if something is idiomatic can get you some very subjective answers.
In general, however, I think auto_ptr is a great way to convey ownership, so as a return from a class factory - it's probably a Good Thing.
I would want to refactor this, such that
You return one object instead of 2. If you need 2 objects that are so tightly coupled they cannot exist without each other I'd say you have a strong case for is-a or has-a refactoring.
This is C++. Really ask yourself if you should return a value indicating success, forcing the consumer of your factory to have to check every time. Throw exceptions or pass exceptions from the constructors of your classes in the factory. Would you ever want to be OK with false and try to operate on uninitialized auto_ptr's?
You don't have to make up your own struct to return two values - you can use std::pair. In that case there isn't much syntactic overhead in returning the two values. This solution does have the problem that ".first" and ".second" aren't very descriptive names, but if the types involved and the name of the function make the intent clear enough then that's not necessarily a problem.
If you are using C++0x you could use unique_ptr instead of auto_ptr and the caller can use auto instead of having to type the longer std::pair<std::unique_ptr<A>, std::unique_ptr<B>>. If you are not using C++0x you might consider using a typedef for that instead.
If you return the two values then you won't have space for the bool. You could use a C++0x tuple to return all three values. You could also indicate error by throwing an exception or by returning null pointers. I would prefer an exception assuming that the error is rare/exceptional.
As other answers have pointed out, it is often preferable to have two separate functions that each return a single object. If you can't do that because the initialization of the two objects is inextricably linked then you could make a class that encapsulates the initialization. You could pass the necessary information to make the two objects to the constructor (requires exception to signal errors) and then have two methods on that class that yield one object each.
Let's assume that a return value of false means "don't look at the output parameters".
Then what I would do is get rid of the bool return value, return a struct or pair that holds the auto_ptrs you want, and throw an exception in the error condition.
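A rough sketch of that shape, using unique_ptr as suggested for C++0x. The Foo and Bar members, the bool parameter and the sumOfIds caller are invented here purely for illustration:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct Foo { int id; };
struct Bar { int id; };

// Hypothetical factory: returns both objects, throws instead of returning false.
std::pair<std::unique_ptr<Foo>, std::unique_ptr<Bar>> MakeEm(bool ok)
{
    if (!ok) {
        throw std::runtime_error("MakeEm: could not create Foo and Bar");
    }
    std::unique_ptr<Foo> foo(new Foo{1});
    std::unique_ptr<Bar> bar(new Bar{2});
    return std::make_pair(std::move(foo), std::move(bar));
}

// Caller side: no bool to check and no half-initialised outputs to inspect.
int sumOfIds(bool ok)
{
    try {
        auto made = MakeEm(ok);  // auto spares spelling out the pair type
        assert(made.first && made.second);
        return made.first->id + made.second->id;
    } catch (const std::runtime_error&) {
        return -1;               // hypothetical error handling
    }
}
```

If the failure is common rather than exceptional, returning a pair of null pointers is the alternative mentioned in the other answers.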
Usually when you have auto_ptr parameters they are not references.
This is because when you pass something to a function that takes auto_ptr you are expecting that function to take ownership. If you are passing by reference it does not actually take the object (it may take the object).
It's a subtle point, but in the end you need to look at what your interface is trying to say to the user.
Also you seem to be using it as an out parameter.
Personally I have never seen this use case (though I can see the point of it). Just document what you are trying to do and, more importantly, why.
As a general rule, if it involves auto_ptr, it's not idiomatic. The structure is not idiomatic either: normally, you'd make one function for each object, return by value, and throw an exception on failure; if the two need to share variables, make it an object.

Pass reference to output location vs using return

Which is better for performance when calling a function that provides a simple datatype -- having it fill in a memory location (passed by pointer) or having it return the simple data?
I've oversimplified the example returning a static value of 5 here, but assume the lookup/functionality that determines the return value would be dynamic in real life...
Conventional logic would tell me the first approach is quicker since we are operating by reference instead of having to return a copy as in the 2nd approach... But, I'd like others' opinions.
Thanks
void func(int *a) {
*a = 5;
}
or...
int func() {
return 5;
}
In general, if your function acts like a function (that is, returning a single logical value), then it's probably best to use int func(). Even if the return value is a complex C++ object, there's a common optimisation called Return Value Optimisation that avoids unnecessary object copying and makes the two forms roughly equivalent in runtime performance.
Most compilers will return a value in a register as long as what you're returning is small enough to fit in a register. It's pretty unusual (and often nearly impossible) for anything else to be more efficient than that.
For PODs, there is no or almost no difference and I'd always go with a return value as I find those cleaner and easier to read.
For non-PODs the answer is "it depends" - a lot of compilers use Return Value Optimisation in this sort of scenario which tends to create an implicit reference parameter.
However unless you have measured - not "know", but actually measured with a profiler - that returning the results of the function using a return value is actually a bottleneck in your software, go for the more readable version of the code.
In my opinion, always go with return unless you know of a reason not to, or you have to return more than one value from the function. Returning a built-in type is very efficient, and whatever the difference vs. returning via pointer, it must be negligible. But the real benefit here is using return is clearer and simpler for those who read the code later.
Returning a simple value amounts to something like a single instruction in assembly (e.g. MOV eax,xxxx), whereas passing a parameter introduces a little more overhead. In any case you should not worry about that; the difference is hard to notice.
Another important point is that a function that returns its result (so the target appears on the left of the assignment) is generally cleaner in terms of design, and is preferred when possible.
This is a low level thing, where it would be hard to see any difference.
Easy answer: it depends.
It depends on the types being used, whether they can be copied cheaply or not (or at all), whether the compiler can use RVO in some circumstances or not, inline things better with one form or another...
Use what makes sense in the context.

How to avoid out parameters?

I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced of the reasons why to avoid them, but I find myself unsure if I'm running into cases where it is unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy //Defines some interface I can't change.
{
public:
ExpensiveCopy(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ }
ExpensiveCopy& operator=(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ return *this; }
void addToData(SomeData);
SomeData getData();
};
class B
{
public:
static void doWork(ExpensiveCopy& ec_out, int someParam);
//or
// Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
std::vector<ComplexType> theCollection;
for(std::size_t index = 0; index < theCollection.size(); ++index)
{
doWork(theCollection[index]);
}
void doWork(ComplexType& ct_out)
{
//Do work on the individual element.
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it then you've got a couple of options
Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out)
return a (smart) pointer from doWork instead of copying it. (which you don't want to do as you want to keep things on the stack)
Rely on RVO (which others have pointed out is supported by pretty much all modern compilers)
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
// ... no branched returns here
return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
Expensive e; // named object
// ... no branched returns here
return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is so prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all the comments here, I'd mention that in C++0x you'd rarely use an output parameter for optimization purposes, because of move constructors.
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
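To illustrate the move-constructor point, here is a sketch in which the function has two named candidates and branched returns, a situation where NRVO is unlikely to apply. Expensive, its vector member and copiesMadeByDoWork are hypothetical names, and C++11 or later is assumed:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical expensive-to-copy type that counts its copies.
struct Expensive {
    static int copies;
    std::vector<int> data;
    Expensive() = default;
    Expensive(const Expensive& other) : data(other.data) { ++copies; }
    Expensive(Expensive&& other) noexcept : data(std::move(other.data)) {}
};
int Expensive::copies = 0;

Expensive doWork(int someParam)
{
    Expensive small;
    Expensive big;
    small.data.assign(1, someParam);
    big.data.assign(1000, someParam);
    if (someParam < 0) {
        return small;  // two different named objects can be returned, so NRVO
    }
    return big;        // is unlikely; the result is moved rather than copied
}

// Runs doWork once and reports how many copy constructions it caused.
int copiesMadeByDoWork(int someParam)
{
    Expensive::copies = 0;
    Expensive theResult = doWork(someParam);
    assert(!theResult.data.empty());
    return Expensive::copies;
}
```

The move still has a cost, but for a type like this it is a pointer swap rather than a copy of all the elements.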
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminate an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and multiple-value-return, in which a list of symbols is provided by the bind and a list of values is returned. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have some method that performs work on this expensive to copy object in the first place. Say you have a tree, would you send the tree off into some building method or else give the tree its own building method? Situations like this come up constantly when your design is a little bit off, but tend to fold into themselves once you have it down pat.
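The atoi/strtol contrast can be sketched like this. parseLong and parseOr are hypothetical wrappers written for this illustration; only std::strtol and its end-pointer out parameter come from the standard library:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper: the return value reports success, the out
// parameter carries the parsed number.
bool parseLong(const char* text, long& value_out)
{
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;  // no digits, trailing junk, or out of range
    }
    value_out = parsed;
    return true;
}

// Convenience caller: returns fallback when text is not a number.
long parseOr(const char* text, long fallback)
{
    long value = 0;
    assert(text != nullptr);
    return parseLong(text, value) ? value : fallback;
}
```

With this, "0" and "frog" are distinguishable, whereas std::atoi returns 0 for both.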
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).

C++ - passing references to std::shared_ptr or boost::shared_ptr

If I have a function that needs to work with a shared_ptr, wouldn't it be more efficient to pass it a reference to it (so to avoid copying the shared_ptr object)?
What are the possible bad side effects?
I envision two possible cases:
1) inside the function a copy is made of the argument, like in
ClassA::take_copy_of_sp(boost::shared_ptr<foo> &sp)
{
...
m_sp_member=sp; //This will copy the object, incrementing refcount
...
}
2) inside the function the argument is only used, like in
Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here
{
...
sp->do_something();
...
}
I can't see, in either case, a good reason to pass the boost::shared_ptr<foo> by value instead of by reference. Passing by value would only "temporarily" increment the reference count due to the copying, and then decrement it when exiting the function scope.
Am I overlooking something?
Just to clarify, after reading several answers: I perfectly agree on the premature-optimization concerns, and I always try to first-profile-then-work-on-the-hotspots. My question was more from a purely technical code-point-of-view, if you know what I mean.
I found myself disagreeing with the highest-voted answer, so I went looking for expert opinions and here they are.
From http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2011-Scott-Andrei-and-Herb-Ask-Us-Anything
Herb Sutter: "when you pass shared_ptrs, copies are expensive"
Scott Meyers: "There's nothing special about shared_ptr when it comes to whether you pass it by value, or pass it by reference. Use exactly the same analysis you use for any other user defined type. People seem to have this perception that shared_ptr somehow solves all management problems, and that because it's small, it's necessarily inexpensive to pass by value. It has to be copied, and there is a cost associated with that... it's expensive to pass it by value, so if I can get away with it with proper semantics in my program, I'm gonna pass it by reference to const or reference instead"
Herb Sutter: "always pass them by reference to const, and very occasionally maybe because you know what you called might modify the thing you got a reference from, maybe then you might pass by value... if you copy them as parameters, oh my goodness you almost never need to bump that reference count because it's being held alive anyway, and you should be passing it by reference, so please do that"
Update: Herb has expanded on this here: http://herbsutter.com/2013/06/05/gotw-91-solution-smart-pointer-parameters/, although the moral of the story is that you shouldn't be passing shared_ptrs at all "unless you want to use or manipulate the smart pointer itself, such as to share or transfer ownership."
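The reference-count bump the quotes refer to can be observed with use_count(). This is only a sketch: it uses std::shared_ptr rather than boost::shared_ptr, and the function names are made up:

```cpp
#include <cassert>
#include <memory>

struct foo { int n; };

// By value: the parameter is a second owner for the duration of the call.
long countByValue(std::shared_ptr<foo> sp) { return sp.use_count(); }

// By reference to const: no copy, so the owner count is unchanged.
long countByConstRef(const std::shared_ptr<foo>& sp) { return sp.use_count(); }

// Returns how many extra owners exist while passing by value.
long extraOwnersWhenPassedByValue()
{
    std::shared_ptr<foo> owner = std::make_shared<foo>();
    const long byValue = countByValue(owner);   // copy: 2 owners during the call
    const long byRef = countByConstRef(owner);  // no copy: 1 owner
    assert(owner.use_count() == 1);             // the temporary bump is gone again
    return byValue - byRef;
}
```

Each such bump is an atomic increment and decrement in a thread-safe implementation, which is the cost being discussed.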
The point of a distinct shared_ptr instance is to guarantee (as far as possible) that as long as this shared_ptr is in scope, the object it points to will still exist, because its reference count will be at least 1.
Class::only_work_with_sp(boost::shared_ptr<foo> sp)
{
// sp points to an object that cannot be destroyed during this function
}
So by using a reference to a shared_ptr, you disable that guarantee. So in your second case:
Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here
{
...
sp->do_something();
...
}
How do you know that sp->do_something() will not blow up due to a null pointer?
It all depends what is in those '...' sections of the code. What if you call something during the first '...' that has the side-effect (somewhere in another part of the code) of clearing a shared_ptr to that same object? And what if it happens to be the only remaining distinct shared_ptr to that object? Bye bye object, just where you're about to try and use it.
So there are two ways to answer that question:
Examine the source of your entire program very carefully until you are sure the object won't die during the function body.
Change the parameter back to be a distinct object instead of a reference.
General bit of advice that applies here: don't bother making risky changes to your code for the sake of performance until you've timed your product in a realistic situation in a profiler and conclusively measured that the change you want to make will make a significant difference to performance.
Update for commenter JQ
Here's a contrived example. It's deliberately simple, so the mistake will be obvious. In real examples, the mistake is not so obvious because it is hidden in layers of real detail.
We have a function that will send a message somewhere. It may be a large message so rather than using a std::string that likely gets copied as it is passed around to multiple places, we use a shared_ptr to a string:
void send_message(std::shared_ptr<std::string> msg)
{
std::cout << (*msg.get()) << std::endl;
}
(We just "send" it to the console for this example).
Now we want to add a facility to remember the previous message. We want the following behaviour: a variable must exist that contains the most recently sent message, but while a message is currently being sent then there must be no previous message (the variable should be reset before sending). So we declare the new variable:
std::shared_ptr<std::string> previous_message;
Then we amend our function according to the rules we specified:
void send_message(std::shared_ptr<std::string> msg)
{
previous_message = 0;
std::cout << *msg << std::endl;
previous_message = msg;
}
So, before we start sending we discard the current previous message, and then after the send is complete we can store the new previous message. All good. Here's some test code:
send_message(std::shared_ptr<std::string>(new std::string("Hi")));
send_message(previous_message);
And as expected, this prints Hi twice.
Now along comes Mr Maintainer, who looks at the code and thinks: Hey, that parameter to send_message is a shared_ptr:
void send_message(std::shared_ptr<std::string> msg)
Obviously that can be changed to:
void send_message(const std::shared_ptr<std::string> &msg)
Think of the performance enhancement this will bring! (Never mind that we're about to send a typically large message over some channel, so the performance enhancement will be so small as to be unmeasureable).
But the real problem is that now the test code will exhibit undefined behaviour (in Visual C++ 2010 debug builds, it crashes).
Mr Maintainer is surprised by this, but adds a defensive check to send_message in an attempt to stop the problem happening:
void send_message(const std::shared_ptr<std::string> &msg)
{
if (msg == 0)
return;
But of course it still goes ahead and crashes, because msg is never null when send_message is called.
As I say, with all the code so close together in a trivial example, it's easy to find the mistake. But in real programs, with more complex relationships between mutable objects that hold pointers to each other, it is easy to make the mistake, and hard to construct the necessary test cases to detect the mistake.
The easy solution, where you want a function to be able to rely on a shared_ptr continuing to be non-null throughout, is for the function to allocate its own true shared_ptr, rather than relying on a reference to an existing shared_ptr.
The downside is that copying a shared_ptr is not free: even "lock-free" implementations have to use an interlocked operation to honour threading guarantees. So there may be situations where a program can be significantly sped up by changing a shared_ptr into a shared_ptr &. But this is not a change that can be safely made to all programs. It changes the logical meaning of the program.
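For illustration, here is one way the "own shared_ptr" remedy could look if the const& signature is kept. The std::ostream parameter and the runTest helper are assumptions added so the output can be inspected; they are not part of the original example:

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

std::shared_ptr<std::string> previous_message;

// const& parameter as in Mr Maintainer's version, but the function takes its
// own owning copy before it touches previous_message.
void send_message(const std::shared_ptr<std::string>& msg, std::ostream& out)
{
    const std::shared_ptr<std::string> keep_alive = msg;  // own reference
    previous_message.reset();  // may reset the very pointer msg refers to
    if (keep_alive) {
        out << *keep_alive << '\n';
    }
    previous_message = keep_alive;
}

// Replays the test from the text against a string stream.
std::string runTest()
{
    std::ostringstream out;
    send_message(std::make_shared<std::string>("Hi"), out);
    send_message(previous_message, out);
    assert(previous_message && *previous_message == "Hi");
    return out.str();
}
```

Note that this pays for the same copy as passing by value; it only makes the copy explicit at the point where it matters.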
Note that a similar bug would occur if we used std::string throughout instead of std::shared_ptr<std::string>, and instead of:
previous_message = 0;
to clear the message, we said:
previous_message.clear();
Then the symptom would be the accidental sending of an empty message, instead of undefined behaviour. The cost of an extra copy of a very large string may be a lot more significant than the cost of copying a shared_ptr, so the trade-off may be different.
I would advise against this practice unless you and the other programmers you work with really, really know what you are all doing.
First, you have no idea how the interface to your class might evolve and you want to prevent other programmers from doing bad things. Passing a shared_ptr by reference isn't something a programmer should expect to see, because it isn't idiomatic, and that makes it easy to use it incorrectly. Program defensively: make the interface hard to use incorrectly. Passing by reference is just going to invite problems later on.
Second, don't optimize until you know this particular class is going to be a problem. Profile first, and then if your program really needs the boost given by passing by reference, then maybe. Otherwise, don't sweat the small stuff (i.e. the extra N instructions it takes to pass by value) instead worry about design, data structures, algorithms, and long-term maintainability.
Yes, taking a reference is fine there. You don't intend to give the method shared ownership; it only wants to work with it. You could take a reference for the first case too, since you copy it anyway. But for first case, it takes ownership. There is this trick to still copy it only once:
void ClassA::take_copy_of_sp(boost::shared_ptr<foo> sp) {
m_sp_member.swap(sp);
}
You should also copy when you return it (i.e. not return a reference), because your class doesn't know what the client is doing with it (it could store a pointer to it and then big bang happens). If it later turns out it's a bottleneck (first profile!), then you can still return a reference.
Edit: Of course, as others point out, this only is true if you know your code and know that you don't reset the passed shared pointer in some way. If in doubt, just pass by value.
It is sensible to pass shared_ptrs by const&. It will not likely cause trouble (except in the unlikely case that the referenced shared_ptr is deleted during the function call, as detailed by Earwicker) and it will likely be faster if you pass a lot of these around. Remember; the default boost::shared_ptr is thread safe, so copying it includes a thread safe increment.
Try to use const& rather than just &, because temporary objects may not be passed by non-const reference. (Even though a language extension in MSVC allows you to do it anyway)
In the second case, doing this is simpler:
Class::only_work_with_sp(foo &sp)
{
...
sp.do_something();
...
}
You can call it as
only_work_with_sp(*sp);
I would avoid a "plain" reference unless the function explicitly may modify the pointer.
A const & may be a sensible micro-optimization when calling small functions - e.g. to enable further optimizations, like inlining away some conditions. Also, the increment/decrement - since it's thread safe - is a synchronization point. I would not expect this to make a big difference in most scenarios, though.
Generally, you should use the simpler style unless you have reason not to. Then, either use the const & consistently, or add a comment as to why if you use it just in a few places.
I would advocate passing shared pointers by const reference: it conveys that the function receiving the pointer does NOT own it, which is a clean idiom for developers.
The only pitfall is that, in multithreaded programs, the object pointed to by the shared pointer may get destroyed in another thread. So it is safe to say that passing a const reference to a shared pointer is safe in a single-threaded program.
Passing shared pointer by non-const reference is sometimes dangerous - the reason is the swap and reset functions the function may invoke inside so as to destroy the object which is still considered valid after the function returns.
It is not about premature optimization, I guess - it is about avoiding unnecessary waste of CPU cycles when you are clear what you want to do and the coding idiom has firmly been adopted by your fellow developers.
Just my 2 cents :-)
It seems that all the pros and cons here can actually be generalised to ANY type passed by reference not just shared_ptr. In my opinion, you should know the semantic of passing by reference, const reference and value and use it correctly. But there is absolutely nothing inherently wrong with passing shared_ptr by reference, unless you think that all references are bad...
To go back to the example:
void Class::only_work_with_sp( foo &sp ) //Again, no copy here
{
...
sp.do_something();
...
}
How do you know that sp.do_something() will not blow up due to a dangling pointer?
The truth is that, shared_ptr or not, const or not, this could happen if you have a design flaw, such as directly or indirectly sharing the ownership of sp between threads, misusing an object that does delete this, circular ownership, or other ownership errors.
One thing that I haven't seen mentioned yet is that when you pass shared pointers by reference, you lose the implicit conversion that you get if you want to pass a derived class shared pointer through a reference to a base class shared pointer.
For example, this code will produce an error, but it will work if you change test() so that the shared pointer is not passed by reference.
#include <boost/shared_ptr.hpp>
class Base { };
class Derived: public Base { };
// ONLY instances of Base can be passed by reference. If you have a shared_ptr
// to a derived type, you have to cast it manually. If you remove the reference
// and pass the shared_ptr by value, then the cast is implicit so you don't have
// to worry about it.
void test(boost::shared_ptr<Base>& b)
{
return;
}
int main(void)
{
boost::shared_ptr<Derived> d(new Derived);
test(d); // error: a shared_ptr<Derived> cannot bind to a non-const shared_ptr<Base>&
// If you want the above call to work with references, you will have to manually cast
// pointers like this, EVERY time you call the function. Since you are creating a new
// shared pointer, you lose the benefit of passing by reference.
boost::shared_ptr<Base> b = boost::dynamic_pointer_cast<Base>(d);
test(b);
return 0;
}
I'll assume that you are familiar with premature optimization and are asking this either for academic purposes or because you have isolated some pre-existing code that is under-performing.
Passing by reference is okay
Passing by const reference is better, and can usually be used, as it does not force const-ness on the object pointed to.
You are not at risk of losing the pointer due to using a reference. That reference is evidence that you have a copy of the smart pointer earlier in the stack and only one thread owns a call stack, so that pre-existing copy isn't going away.
Using references is often more efficient for the reasons you mention, but not guaranteed. Remember that dereferencing an object can take work too. Your ideal reference-usage scenario would be if your coding style involves many small functions, where the pointer would get passed from function to function to function before being used.
You should always avoid storing your smart pointer as a reference. Your Class::take_copy_of_sp(&sp) example shows correct usage for that.
Assuming we are not concerned with const correctness (or rather, that you mean to allow the functions to modify or share ownership of the data being passed in), passing a boost::shared_ptr by value is safer than passing it by reference, as we allow the original boost::shared_ptr to control its own lifetime. Consider the results of the following code...
#include <boost/shared_ptr.hpp>
void FooTakesReference( boost::shared_ptr< int > & ptr )
{
    ptr.reset(); // This resets the caller's sharedA as well, so the int is deleted.
}
void FooTakesValue( boost::shared_ptr< int > ptr )
{
    ptr.reset(); // Only our local copy is reset; sharedB still owns its int.
}
int main()
{
    boost::shared_ptr< int > sharedA( new int( 13 ) );
    boost::shared_ptr< int > sharedB( new int( 14 ) );
    FooTakesReference( sharedA );
    FooTakesValue( sharedB );
}
From the example above we see that passing sharedA by reference allows FooTakesReference to reset the original pointer, which reduces its use count to 0, destroying its data. FooTakesValue, however, can't reset the original pointer, guaranteeing sharedB's data is still usable. When another developer inevitably comes along and attempts to piggyback on sharedA's fragile existence, chaos ensues. The lucky sharedB developer, however, goes home early as all is right in his world.
The code safety, in this case, far outweighs any speed improvement gained by avoiding the copy. After all, the boost::shared_ptr is meant to improve code safety. It will be far easier to go from a copy to a reference later, if something requires this kind of niche optimization.
Sandy wrote: "It seems that all the pros and cons here can actually be generalised to ANY type passed by reference not just shared_ptr."
True to some extent, but the point of using shared_ptr is to eliminate concerns regarding object lifetimes and to let the compiler handle that for you. If you're going to pass a shared pointer by reference and allow clients of your reference-counted object to call non-const methods that might free the object data, then using a shared pointer is almost pointless.
I wrote "almost" in that previous sentence because performance can be a concern, and it might be justified in rare cases, but I would avoid this scenario myself and first look at all other possible optimizations, such as adding another level of indirection, lazy evaluation, etc.
Code that outlives its author, or even its author's memory of it, and that relies on implicit assumptions about behavior, in particular about object lifetimes, requires clear, concise, readable documentation, and even then many clients won't read it! Simplicity almost always trumps efficiency, and there are almost always other ways to be efficient. If you really need to pass values by reference to avoid deep copying by the copy constructors (and assignment operators) of your reference-counted objects, then perhaps you should consider making the deep-copied data reference-counted pointers that can be copied quickly. (Of course, that's just one design scenario that might not apply to your situation.)
I used to work on a project whose policy was very strict about passing smart pointers by value. When I was asked to do some performance analysis, I found that the application spent between 4% and 6% of its processor time incrementing and decrementing the smart pointers' reference counters.
If you want to pass smart pointers by value just to avoid issues in the weird cases described by Daniel Earwicker, make sure you understand the price you are paying for it.
If you decide to go with a reference, the main reason to use a const reference is that it allows implicit upcasting when you need to pass a shared pointer to an object of a class that inherits from the class used in the interface.
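A rough sketch of that upcasting point, assuming std::shared_ptr instead of boost and made-up Base/Derived types. Note what actually happens: the compiler builds a temporary shared_ptr<Base> from the shared_ptr<Derived> and binds the const reference to it, so in this particular case the reference count is still incremented for the duration of the call.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class hierarchy, for illustration only.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Returns the use count seen inside the callee.
long seen_by_const_ref(const std::shared_ptr<Base>& b) {
    return b.use_count();
}

long upcast_demo() {
    std::shared_ptr<Derived> d = std::make_shared<Derived>();
    assert(d.use_count() == 1);

    // Compiles: a temporary shared_ptr<Base> is created from d and bound to
    // the const reference. With a non-const reference this would be an error.
    return seen_by_const_ref(d);
}
```

When the argument already has exactly the parameter's type, no temporary is needed and the count stays unchanged.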
In addition to what litb said, I'd like to point out that it's probably best to pass by const reference in the second example; that way you are sure you don't accidentally modify it.
struct A {
    shared_ptr<Message> msg;
    shared_ptr<Message> * ptr_msg;
};
pass by value:
void set(shared_ptr<Message> msg) {
    this->msg = msg; /// the parameter is already a copy (count +1); the assignment adds another reference
} /// on return the parameter copy is destroyed, so its reference is released again
pass by reference:
void set(shared_ptr<Message>& msg) {
    this->msg = msg; /// the reference is just an alias (no copy on the call); only the assignment adds a reference
}
pass by pointer:
void set(shared_ptr<Message>* msg) {
    this->ptr_msg = msg; /// only the address is stored; the reference count does not change
}
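The three snippets above can be checked with a small self-contained sketch. It assumes std::shared_ptr instead of an unqualified shared_ptr, a made-up Message type, and hypothetical set_by_* names so that the three setters can live in one struct. Each helper returns the caller's use count after the setter has returned.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical payload type, for illustration only.
struct Message { std::string text; };

struct A {
    std::shared_ptr<Message> msg;
    std::shared_ptr<Message>* ptr_msg = nullptr;

    void set_by_value(std::shared_ptr<Message> m) { msg = m; }
    void set_by_reference(std::shared_ptr<Message>& m) { msg = m; }
    void set_by_pointer(std::shared_ptr<Message>* m) { ptr_msg = m; }
};

long count_after_value() {
    std::shared_ptr<Message> m = std::make_shared<Message>();
    A a;
    a.set_by_value(m);      // temporary copy is gone again; a.msg holds one reference
    return m.use_count();   // m + a.msg
}

long count_after_reference() {
    std::shared_ptr<Message> m = std::make_shared<Message>();
    A a;
    a.set_by_reference(m);  // no copy for the call; a.msg holds one reference
    return m.use_count();   // m + a.msg
}

long count_after_pointer() {
    std::shared_ptr<Message> m = std::make_shared<Message>();
    A a;
    a.set_by_pointer(&m);   // only the address of m is stored
    assert(a.ptr_msg == &m);
    return m.use_count();   // m only
}
```

The pointer variant does not share ownership at all, so A would be left with a dangling pointer if the caller's shared_ptr went out of scope first.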
Every piece of code should carry some meaning. If you pass a shared pointer by value everywhere in the application, this means "I am unsure about what's going on elsewhere, hence I favour raw safety". This is not what I would call a sign of confidence to other programmers who might consult the code.
Anyway, even if a function gets a const reference and you are "unsure", you can still create a copy of the shared pointer at the head of the function, to add a strong reference to the pointer. This could also be seen as a hint about the design ("the pointer could be modified elsewhere").
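Here is a minimal sketch of that "copy at the head of the function" idea, assuming std::shared_ptr and made-up Widget/Registry types. The function receives a const reference, takes its own strong reference, and then does something that replaces the shared pointer the reference refers to. The local copy keeps the original object alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical types, for illustration only.
struct Widget { int value; };

struct Registry {
    std::shared_ptr<Widget> current = std::make_shared<Widget>(Widget{5});

    // Drops the old Widget (unless someone else still holds it).
    void replace() { current = std::make_shared<Widget>(Widget{9}); }
};

int read_after_replace(Registry& reg, const std::shared_ptr<Widget>& w) {
    // Take a strong reference first: from here on the Widget cannot die
    // underneath us, whatever happens to the caller's shared_ptr.
    std::shared_ptr<Widget> pinned = w;
    assert(pinned);

    reg.replace();         // may reassign the very shared_ptr that w aliases

    return pinned->value;  // still the original Widget
}
```

Called as read_after_replace(reg, reg.current), the function still reads the original Widget even though reg.current has been replaced in the meantime.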
So yes, IMO, the default should be "pass by const reference".