Smart pointers in C++ APIs? - c++

It is fairly often suggested not to use raw pointers in modern C++, except for a few rare cases. What is the common practice of using smart pointers in C++ library APIs?
The following use cases come up to my mind:
A function which returns a new object.
A function which returns a new object, but it has also created another reference to this object.
A function which only uses an object it receives as an argument.
A function which takes over the ownership of an object.
A function which will store a reference to an object which it has received as an argument, but there might be other references to the very same object (from the callers side).

Unfortunately, a library API design goes far beyond the rule of the language itself: suddenly you have to care about implementations details such as the ABI.
Contrary to C, where common implementations can easily interact together, C++ implementations have very different ABIs (Microsoft ABI for VC++ is completely incompatible with the Itanium ABI used by gcc and Clang), and C++ Standard Library implementations are also incompatible with each others, so that a library compiled with libstdc++ (bundled with gcc) cannot be used by a program using another major version of libstdc++ or another implementation such as libc++ (bundled with Clang) if std:: classes appear in the interface.
Therefore, it really depends on whether you intend your library to be delivered as a binary or assume the user will be in a position to compile the library with its own compiler and Standard Library implementation of choice. Only in the latter case should you expose a C++ interface, for binary distributions sticking to C is better (and it's also easier to integrate with other languages).
With that out of the way, let's assume you decided to use a C++ API:
1) A function which returns a new object.
If the object can be returned by value, do so; if it is polymorphic, use std::unique_ptr<Object>.
2) A function which returns a new object, but it has also created another reference to this object.
Rare Case. I suppose that you mean it has kept a reference, somehow. In this case you have shared ownership so the obvious choice is std::shared_ptr<Object>.
3) A function which only uses an object it receives as an argument.
Much more complex than it first seems, even supposing no reference to the object is kept.
in general, pass by reference (const or not)
unless you intend to operate on a copy of the object, in which case pass by value to benefit from a potential move
4) A function which takes over the ownership of an object.
Simple ownership: std::unique_ptr<Object>.
5) A function which will store a reference to an object which it has received as an argument, but there might be other references to the very same object (from the callers side).
Shared ownership: std::shared_ptr<Object>

Unique pointers may do the job in the general case.
std::unique_ptr<Foo> func();
So that you can assign them directly to auto variables at the callers site, and not worry about memory management.
auto u = func();
In case you need shared ownership, you can convert smart pointers on the callers site:
std::shared_ptr<Foo> s{func()};
However, in case the return type of the function is not polymorphic, and that type is fast to move, you may prefer return by value:
Foo func();
Shared pointers.
std::shared_ptr<Foo> func();
Simply use a reference or const reference to the object, with raw pointers or smart pointers "dereferenced" at the callers site.
void func(Foo& obj);
void func(const Foo& obj);
In case the parameters are optional, you may use raw pointers, so that you can easily pass a nullptr to them.
void func(Foo *obj);
void func(const Foo *obj);
Unique pointers.
void func(std::unique_ptr<Foo> obj);
Const reference to shared pointers.
void func(const std::shared_ptr<Foo>& obj);
The const reference (to shared pointer) is simply an optimization to prevent adjusting the reference count when the function is called and when it returns.

This is a very old question but also an important one.
One school of thought missing from the answers is that you should not return raw nor smart pointers from your API if you can help it.
The raw pointer issue has been discussed. But smart pointers can also be misused, especially in larger projects with complex internal APIs.
A workaround is creating an RAII/wrapper object that holds the ( never-exposed ) smart pointer.

Related

C++ conventions regarding passing objects (pointer vs reference)

I'd like to work out conventions on passing parameters to functions/methods. I know it's a common issue and it has been answered many times, but I searched a lot and found nothing that fully satisfies me.
Passing by value is obvious and I won't mention this. What I came up with is:
Passing by non-const reference means, that object is MODIFIED
Passing by const reference means, that object is USED
Passing by pointer means, that a reference to object is going to be STORED. Whether ownership is passed or not will depend on the context.
It seems to be consistent, but when I want to pick heap-allocated object and pass it to 2. case parameter, it'd look like this:
void use(const Object &object) { ... }
//...
Object *obj = getOrCreateObject();
use(*obj);
or
Object &obj = *getOrCreateObject();
use(obj);
Both look weird to me. What would you advise?
PS I know that one should avoid raw pointers and use smart instead (easier memory managment and expressiveness in ownership) and it can be the next step in refactoring the project I work on.
You can use these conventions if you like. But keep in mind that you cannot assume conventions when dealing with code written by other people. You also cannot assume that people reading your code are aware of your conventions. You should document an interface with comments when it might be ambiguous.
Passing by pointer means, that object is going to be STORED. Who's its owner will depend on the context.
I can think of only one context where the ownership of a pointer argument should transfer to the callee: Constructor of a smart pointer.
Besides possible intention of storing, a pointer argument can alternatively have the same meaning as a reference argument, with the addition that the argument is optional. You typically cannot represent an optional argument with a reference since they cannot be null - although with custom types you could use a reference to a sentinel value.
Both look weird to me. What would you advise?
Neither look weird to me, so my advise is to get accustomed.
The main problem with your conventions is that you make no allowance for the possibility of interfacing to code (e.g. written by someone else) that doesn't follow your conventions.
Generally speaking, I use a different set of conventions, and rarely find a need to work around them. (The main exception will be if there is a need to use a pointer to a pointer, but I rarely need to do that directly).
Passing by non-const reference is appropriate if ANY of the following MAY be true;
The object may be changed;
The object may be passed to another function by a non-const reference [relevant when using third party code by developers who choose to omit the const - which is actually something a lot of beginners or lazy developers do];
The object may be passed to another function by a non-const pointer [relevant when using third party code be developers who choose to omit the const, or when using legacy APIs];
Non-const member functions of the object are called (regardless of whether they change the object or not) [also often a consideration when using third-party code by developers who prefer to avoid using const].
Conversely, const references may be passed if ALL of the following are true;
No non-mutable members of the object are changed;
The object is only passed to other functions by const reference, by const pointer, or by value;
Only const member functions of the object are called (even if those members are able to change mutable members.
I'll pass by value instead of by const reference in cases where the function would copy the object anyway. (e.g. I won't pass by const reference, and then construct a copy of the passed object within the function).
Passing non-const pointers is relevant if it is appropriate to pass a non-const reference but there is also a possibility of passing no object (e.g. a nullptr).
Passing const pointers is relevant if it is appropriate to pass a const reference but there is also a possibility of passing no object (e.g. a nullptr).
I would not change the convention for either of the following
Storing a reference or pointer to the object within the function for later use - it is possible to convert a pointer to a reference or vice versa. And either one can be stored (a pointer can be assigned, a reference can be used to construct an object);
Distinguishing between dynamically allocated and other objects - since I mostly either avoid using dynamic memory allocation at all (e.g. use standard containers, and pass them around by reference or simply pass iterators from them around) or - if I must use a new expression directly - store the pointer in another object that becomes responsible for deallocation (e.g. a std::smart_pointer) and then pass the containing object around.
In my opionion, they are the same. In the first part of your post, you are talking about the signature, but your example is about function call.

Why is raw pointer to shared_ptr construction allowed in all cases?

I was reading Top 10 dumb mistakes to avoid with C++11 smart pointer.
Number #5 reads:
Mistake # 5 : Not assigning an object(raw pointer) to a shared_ptr as
soon as it is created !
int main()
{
Aircraft* myAircraft = new Aircraft("F-16");
shared_ptr<aircraft> pAircraft(myAircraft);
...
shared_ptr<aircraft> p2(myAircraft);
// will do a double delete and possibly crash
}
and the recommendation is something like:
Use make_shared or new and immediately construct the pointer with
it.
Ok, no doubt about it the problem and the recommendation.
However I have a question about the design of shared_ptr.
This is a very easy mistake to make and the whole "safe" design of shared_ptr could be thrown away by very easy-to-detect missuses.
Now the question is, could this be easily been fixed with an alternative design of shared_ptr in which the only constructor from raw pointer would be that from a r-value reference?
template<class T>
struct shared_ptr{
shared_ptr(T*&& t){...basically current implementation...}
shared_ptr(T* t) = delete; // this is to...
shared_ptr(T* const& t) = delete; // ... illustrate the point.
shared_ptr(T*& t) = delete;
...
};
In this way shared_ptr could be only initialized from the result of new or some factory function.
Is this an underexploitation of the C++ language in the library?
or What is the point of having a constructor from raw pointer (l-value) reference if this is going to be most likely a misuse?
Is this a historical accident? (e.g. shared_ptr was proposed before r-value references were introduced, etc) Backwards compatibility?
(Of course one could say std::shared_ptr<type>(std::move(ptr)); that that is easier to catch and also a work around if this is really necessary.)
Am I missing something?
Pointers are very easy to copy. Even if you restrict to r-value reference you can sill easily make copies (like when you pass a pointer as a function parameter) which will invalidate the safety setup. Moreover you will run into problems in templates where you can easily have T* const or T*& as a type and you get type mismatches.
So you are proposing to create more restrictions without significant safety gains, which is likely why it was not in the standard to begin with.
The point of make_shared is to atomize the construction of a shared pointer. Say you have f(shared_ptr<int>(new int(5)), throw_some_exception()). The order of parameter invokation is not guaranteed by the standard. The compiler is allowed to create a new int, run throw_some_exception and then construct the shared_ptr which means that you could leak the int (if throw_some_exception actually throws an exception). make_shared just creates the object and the shared pointer inside itself, which doesn't allow the compiler to change the order, so it becomes safe.
I do not have any special insight into the design of shared_ptr, but I think the most likely explanation is that the timelines involved made this impossible:
The shared_ptr was introduced at the same time as rvalue-references, in C++11. The shared_ptr already had a working reference implementation in boost, so it could be expected to be added to standard libraries relatively quickly.
If the constructor for shared_ptr had only supported construction from rvalue references, it would have been unusable until the compiler had also implemented support for rvalue references.
And at that time, compiler and standards development was much more asynchronous, so it could have taken years until all compiler had implemented support, if at all. (export templates were still fresh on peoples minds in 2011)
Additionally, I assume the standards committee would have felt uncomfortable standardizing an API that did not have a reference implementation, and could not even get one until after the standard was published.
There's a number of cases in which you may not be able to call make_shared(). For example, your code may not be responsible for allocating and constructing the class in question. The following paradigm (private constructors + factory functions) is used in some C++ code bases for a variety of reasons:
struct A {
private:
A();
};
A* A_factory();
In this case, if you wanted to stick the A* you get from A_factory() into a shared_ptr<>, you'd have to use the constructor which takes a raw pointer instead of make_shared().
Off the top of my head, some other examples:
You want to get aligned memory for your type using posix_memalign() and then store it in a shared_ptr<> with a custom deleter that calls free() (this use case will go away soon when we add aligned allocation to the language!).
You want to stick a pointer to a memory-mapped region created with mmap() into a shared_ptr<> with a custom deleter that calls munmap() (this use case will go away when we get a standardized facility for shmem, something I'm hoping to work on in the next few months).
You want to stick a pointer allocated by into a shared_ptr<> with a custom deleter.

Difference between reference and non-reference in C++ [duplicate]

I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make things harder to follow/read, less safe and far more dangerous manipulations than any other constructs.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
if you provide nullptr as the argument, you're going in undefined-behaviour-land;
the reference attribute version doesn't allow (without easy to spot tricks) the problem with 1.
the reference attribute version is simpler to understand for the user: you have to provide a valid object, not something that could be null.
If the behaviour of the function would have to work with or without a given object, then using a pointer as attribute suggests that you can pass nullptr as the argument and it is fine for the function. That's kind of a contract between the user and the implementation.
The performances are exactly the same, as references are implemented internally as pointers. Thus you do not need to worry about that.
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer an existing object and pointers when a NULL value is ok.
Some coding convention (like Google's) prescribe that one should always use pointers, or const references, because references have a bit of unclear-syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int &current = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning thay can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is somewhat not clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documentated) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, functions signatures are somewhat less ambiguous. Not to mention the fact that operators syntax is way more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer for the question C++ - passing references to boost::shared_ptr, and the following comments show that even then, passing a smart pointer by copy or by reference is not as clear cut as one would like (I favor myself the "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
There is problem with "use references wherever possible" rule and it arises if you want to keep reference for further use. To illustrate this with example, imagine you have following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have parameter in RefPhone(const SimCard & card) constructor passed by a reference, because it prevents passing wrong/null pointers to the constructor. It somehow encourages allocation of variables on stack and taking benefits from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references you trade off possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically same effect.
edit: Note that I sticked to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, example is not to show that references as such are bad. They can be misused however, just like pointers and they can bring their own threats to the code.
There are following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value, but has pointer semantics (acts like pointer).
Reference can not be directly initialized to 0 (null).
Reference (reference, not referenced object) can not be modified (equivalent to "* const" pointer).
const reference can accept temporary parameter.
Local const references prolong the lifetime of temporary objects
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is acceptable parameter value or you need to store parameter for further use. If 0 (null) is acceptable I am adding "_n" suffix to parameter, use guarded pointer (like QPointer in Qt) or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with by design memory leaks and responsibility mess).
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you a have very good reason for doing so.
Points to keep in mind:
Pointers can be NULL, references cannot be NULL.
References are easier to use, const can be used for a reference when we don't want to change value and just need a reference in a function.
Pointer used with a * while references used with a &.
Use pointers when pointer arithmetic operation are required.
You can have pointers to a void type int a=5; void *p = &a; but cannot have a reference to a void type.
Pointer Vs Reference
void fun(int *a)
{
cout<<a<<'\n'; // address of a = 0x7fff79f83eac
cout<<*a<<'\n'; // value at a = 5
cout<<a+1<<'\n'; // address of a increment by 4 bytes(int) = 0x7fff79f83eb0
cout<<*(a+1)<<'\n'; // value here is by default = 0
}
void fun(int &a)
{
cout<<a<<'\n'; // reference of original a passed a = 5
}
int a=5;
fun(&a);
fun(a);
Verdict when to use what
Pointer: For array, linklist, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const
reference to increase program efficiency.You save the time and space needed to
copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference.The semantics of class design often require using a reference, which is the main reason C++ added
this feature.Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1.If the data object is a built-in data type, use a pointer. If you spot code
like fixit(&x), where x is an int, it’s pretty clear that this function intends to modify x.
2.If the data object is an array, use your only choice: a pointer.
3.If the data object is a structure, use a reference or a pointer.
4.If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
If you want to handle object manipulation on your own i.e if you want to allocate memory space for an object on the Heap rather on the Stack you must use Pointer
int *pInt = new int; // allocates *pInt on the Heap
In my practice I personally settled down with one simple rule - Use references for primitives and values that are copyable/movable and pointers for objects with long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test. A sneeky one at that. I just let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output they are exactly the same. Other than the symbolnaming. So looking at performance (in a simple example) there is no issue.
Now on the topic of pointers vs references. IMHO I think clearity stands above all. As soon as I read implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. it will crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement, that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration my preferences go to
using references wherever/whenever possible.
For pointers, you need them to point to something, so pointers cost memory space.
For example a function that takes an integer pointer will not take the integer variable. So you will need to create a pointer for that first to pass on to the function.
As for a reference, it will not cost memory. You have an integer variable, and you can pass it as a reference variable. That's it. You don't need to create a reference variable specially for it.

When to use references vs. pointers

I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make things harder to follow/read, less safe and far more dangerous manipulations than any other constructs.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
if you provide nullptr as the argument, you're going in undefined-behaviour-land;
the reference attribute version doesn't allow (without easy to spot tricks) the problem with 1.
the reference attribute version is simpler to understand for the user: you have to provide a valid object, not something that could be null.
If the behaviour of the function would have to work with or without a given object, then using a pointer as attribute suggests that you can pass nullptr as the argument and it is fine for the function. That's kind of a contract between the user and the implementation.
The performances are exactly the same, as references are implemented internally as pointers. Thus you do not need to worry about that.
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer an existing object and pointers when a NULL value is ok.
Some coding convention (like Google's) prescribe that one should always use pointers, or const references, because references have a bit of unclear-syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int &current = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning thay can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is somewhat not clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documentated) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, functions signatures are somewhat less ambiguous. Not to mention the fact that operators syntax is way more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer for the question C++ - passing references to boost::shared_ptr, and the following comments show that even then, passing a smart pointer by copy or by reference is not as clear cut as one would like (I favor myself the "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
There is problem with "use references wherever possible" rule and it arises if you want to keep reference for further use. To illustrate this with example, imagine you have following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have parameter in RefPhone(const SimCard & card) constructor passed by a reference, because it prevents passing wrong/null pointers to the constructor. It somehow encourages allocation of variables on stack and taking benefits from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references you trade off possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically same effect.
edit: Note that I sticked to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, example is not to show that references as such are bad. They can be misused however, just like pointers and they can bring their own threats to the code.
There are following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value, but has pointer semantics (acts like pointer).
Reference can not be directly initialized to 0 (null).
Reference (reference, not referenced object) can not be modified (equivalent to "* const" pointer).
const reference can accept temporary parameter.
Local const references prolong the lifetime of temporary objects
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is acceptable parameter value or you need to store parameter for further use. If 0 (null) is acceptable I am adding "_n" suffix to parameter, use guarded pointer (like QPointer in Qt) or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with by design memory leaks and responsibility mess).
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you a have very good reason for doing so.
Points to keep in mind:
Pointers can be NULL, references cannot be NULL.
References are easier to use, const can be used for a reference when we don't want to change value and just need a reference in a function.
Pointer used with a * while references used with a &.
Use pointers when pointer arithmetic operation are required.
You can have pointers to a void type int a=5; void *p = &a; but cannot have a reference to a void type.
Pointer Vs Reference
void fun(int *a)
{
cout<<a<<'\n'; // address of a = 0x7fff79f83eac
cout<<*a<<'\n'; // value at a = 5
cout<<a+1<<'\n'; // address of a increment by 4 bytes(int) = 0x7fff79f83eb0
cout<<*(a+1)<<'\n'; // value here is by default = 0
}
void fun(int &a)
{
cout<<a<<'\n'; // reference of original a passed a = 5
}
int a=5;
fun(&a);
fun(a);
Verdict when to use what
Pointer: For array, linklist, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const
reference to increase program efficiency.You save the time and space needed to
copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference.The semantics of class design often require using a reference, which is the main reason C++ added
this feature.Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1.If the data object is a built-in data type, use a pointer. If you spot code
like fixit(&x), where x is an int, it’s pretty clear that this function intends to modify x.
2.If the data object is an array, use your only choice: a pointer.
3.If the data object is a structure, use a reference or a pointer.
4.If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
If you want to handle object manipulation on your own i.e if you want to allocate memory space for an object on the Heap rather on the Stack you must use Pointer
int *pInt = new int; // allocates *pInt on the Heap
In my practice I personally settled down with one simple rule - Use references for primitives and values that are copyable/movable and pointers for objects with long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test. A sneeky one at that. I just let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output they are exactly the same. Other than the symbolnaming. So looking at performance (in a simple example) there is no issue.
Now on the topic of pointers vs references. IMHO I think clearity stands above all. As soon as I read implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. it will crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement, that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration my preferences go to
using references wherever/whenever possible.
For pointers, you need them to point to something, so pointers cost memory space.
For example a function that takes an integer pointer will not take the integer variable. So you will need to create a pointer for that first to pass on to the function.
As for a reference, it will not cost memory. You have an integer variable, and you can pass it as a reference variable. That's it. You don't need to create a reference variable specially for it.

A new generic pointer any_ptr (now dumb_ptr) to make code more reusable among smart pointers

I have been using a lot of different boost smart pointers lately, as well as normal pointers. I have noticed that as you develop you tend to realise that you have to switch pointer types and memory management mechanism because you overlooks some circular dependency or some othe small annoying thing. When this happens and you change your pointer type you have to either go and change a whole bunch of you method signatures to take the new pointer type, or at each call site you have to convert between pointer types. You also have a problem if you have the same function but want it to take multiple pointer types.
I am wondering if there already exists a generic way to deal with, i.e. write methods that are agnostic of the pointer type you pass to it?
Obviously i can see a few ways to do this, one is to write overloaded methods for each pointer type, but this quickly becomes a hassle. The other is to use a template style solution with some type inference, but this will cause some significant bloat in the compiled code and is likely to start throwing strange unsolvable template errors.
My idea is to write a new class any_ptr<T> with conversion constructors from all the major pointer types, say, T*, shared_ptr<T>, auto_ptr<T>, scoped_ptr<T> perhaps even weak_ptr<T> and then have it expose the * and -> operators. In this way it could be used in any function that does not return the pointer outside of the function and could be called with any combination of common pointer types.
My question is whether this is a a really stupid thing to do? I see that it could be abused, but assuming it is never used for functions that return the any_ptr, is there a major problem I would be walking into? Your thoughts please.
EDIT 1
Having read your answers I would like to make some notes that were too long for the comments.
Firstly with regards to using raw pointers or references (#shoosh). I agree that you could make the functions use raw pointers, but then assume the case where I was using shared_ptr which means I would have had to go ptr.get() at each call site, now assume I realize I made a cyclic reference and I have to change the pointer to a weak_ptr then I have to go and change all those call sites to x.lock().get() . Now I agree this is not a catastrophe but it is irritating and I feel there is an elegant solution to this. The same could be said for passing the as T& references and going *x, similar call site changes would have to be made.
What I trying to do here is make the code more elegant to read and easier to refactor even through large changes in pointer type.
Secondly with regards to smart_ptr semantics: I agree that different smart pointers are used for different reasons , and have certain considerations that must be taken care of with regards to copying and storage (this is why boost::shared_ptr<T> is not automatically convertible to a T*).
However I envisioned any_ptr (maybe a bad name in retrospect) to only be used in cases where the pointer would not be stored (in anything other than perhaps temporary variables on the stack). It should just be implicitly constructible from the various smart pointer types, overload the * and -> operators and be convertible to a T* (through a custom conversion function T*()). In this way the semantics of any_ptr are the exact same as as T*. And as such it should only be used in places where it would be safe to use a raw ptr (This is what # Alexandre_C was saying in a comment). This also means that there would be none of the "heavy machinery" that #Matthieu_M was talking about.
Thirdly with regards to templates. While templates are great for somethings I am wary of them for the reasons I have made above.
FINALLY: So basically what I am trying to do is for functions where one would normally use raw ptr's (T*) as the parameters I would like to create a system where those parameters can automatically accept any of the various smart_ptr types without having to do the conversions at the call sites. The reason I want to do this is because I think it will make the code more readable by eliminating conversion cruft (and hence also slightly shorter, though not by much) and it would make refactoring and trying different smart pointer regimes less of a hassle.
Perhaps I should have called it unmanaged_ptr instead of any_ptr. That would more correctly describe the semantics. I apologize for the crummy name.
EDIT 2
Okay so here is the class I had in mind. I have called it dumb_ptr.
template<typename T>
class dumb_ptr {
public:
dumb_ptr(const dumb_ptr<T> & dm_ptr) : raw_ptr(dm_ptr.raw_ptr) { }
dumb_ptr(T* raw_ptr) : raw_ptr(raw_ptr) { }
dumb_ptr(const boost::shared_ptr<T> & sh_ptr) : raw_ptr(sh_ptr.get()) { }
dumb_ptr(const boost::weak_ptr<T> & wk_ptr) : raw_ptr(wk_ptr.lock().get()) { }
dumb_ptr(const boost::scoped_ptr<T> & sc_ptr) : raw_ptr(sc_ptr.get()) { }
dumb_ptr(const std::auto_ptr<T> & au_ptr) : raw_ptr(au_ptr.get()) { }
T& operator*() { return *raw_ptr; }
T * operator->() { return raw_ptr; }
operator T*() { return raw_ptr; }
private:
dumb_ptr() { }
dumb_ptr<T> operator=(const dumb_ptr<T> & x) { }
T* raw_ptr;
};
It can convert from the common smart pointers automatically and can be treated as a raw T* pointer, further it can be converted automatically to a T*. The default constructor and the assignment operator (=) have been hidden to deter people from using it for anything other than function arguments. When used as a function argument the following can be done.
void some_fn(dumb_ptr<A> ptr) {
B = ptr->b;
A a = *ptr;
A* raw = ptr;
ptr==raw;
ptr+1;
}
Which is pretty much everything you would want to do with a pointer. It has the exact same semantics as a raw pointer T*. But now it can be used with any smart pointer as the parameter without having to repeat the conversion code (.get,.lock) at each call site. Also if you change your smart pointers you don't have to go around fixing each call site.
Now I think this is reasonably useful, and I can't see problems with it?
With such a class any_ptr, you will not be able to do practically anything other than * and ->. No assignment, copy construction, duplication or destruction. if that's all you need then just write the function taking as argument the raw pointer T* and then call it use .get() or whatnot on your automatic pointer.
You can use different smart pointers because they offer different semantics and tradeoffs.
What would be the semantics of any_ptr ? If it comes from unique_ptr / scoped_ptr it should not be copied. If it comes from shared_ptr or weak_ptr it would need their heavy machinery for reference counting.
You would have a type with all the disadvantages (heavy, no copyable) for little gain...
What you have here is a problem of interface. Unless the methods manage the lifetime of the object (in which case the exact pointer type is required), then it need not be aware of how this lifetime is managed.
This means that methods that merely act on the object should take either a pointer or a reference (depending on whether it might be null), and be called accordingly:
void foo(T* ptr); which is called shared_ptr<T> p; foo(p.get());
void foo(T& ptr); which is called foo(*p);
Note: the second interface is much more pointer agnostic
This is plain encapsulation: a method need be aware only of the bare minimum it needs to operate. If it does not manage the lifetime, then exposing how this lifetime is managed to the method... causes those ripples that you've witnessed.
If your methods really have to be smart pointer-agnostic, why not pass a garden variety pointer:
virtual void MyMethod(MyClass* ptr)
{
ptr->doSomething();
}
and do
MyObj->MyMethod(mySmartPtr.get());
at the call site ?
You can even pass a reference:
virtual void MyMethod(const MyClass& ptr)
{
ptr->doSomething();
}
and do
MyObj->MyMethod(*mySmartPointer);
which has the advantage to have pretty much always the same syntax (except for std::weak_ptr).
Otherwise, smart pointer types are becoming standard: std::unique_ptr, std::shared_ptr and std::weak_ptr have their use.
Smart pointers are for storing objects according to certain resource management semantics. You can pretty much always extract a bare pointer from them.
Methods should therefore in general expect bare pointers, or specific kinds of smart pointers in case they need to. For instance, you can pass a shared_ptr by const reference in order to store a shared handle to an object.
To this goal, you can add typedefs:
struct MyClass
{
typedef std::shared_ptr<MyClass> Handle;
static Handle CreateHandle(...);
...
private:
Handle internalHandle;
void setHandle(const MyClass::Handle& h);
};
and do
void MyClass::setHandle(const MyClass::Handle& h)
{
this->internalHandle = h;
}
When this happens and you change your
pointer type you have to either go and
change a whole bunch of you method
signatures to take the new pointer
type, or at each call site you have to
convert between pointer types. You
also have a problem if you have the
same function but want it to take
multiple pointer types.
I am wondering if there already exists
a generic way to deal with, i.e. write
methods that are agnostic of the
pointer type you pass to it?
You almost answered your own question.
A smart-pointer should be present in function signatures only if ownership of an object is being transferred. If a function does not take ownership of an object, it should accept it by plain pointer or reference. Only functions that need to own an object or extend its lifetime should accept it by smart pointer.
Answering my own question so it doesn't appear in the unanswered list. Basically none of the other answers found what I considered to be a serious flaw in my idea. Props to #Alexandre C. for understanding what I was trying to do.