Does it make sense to combine optional with reference_wrapper? - c++

It occurred to me that in C++ it is possible to use the type std::optional<std::reference_wrapper<T>>. An object of this type is essentially a reference to an object of type T or a null value, i.e., pretty much a pointer. My questions:
Is there any conceptual difference between std::optional<std::reference_wrapper<T>> and T*?
Is there any practical difference? Are there situations where it might be advisable to choose std::optional<std::reference_wrapper<T>> over T*?

Is there any conceptual difference between std::optional<std::reference_wrapper<T>> and T*?
std::optional<>, as the name already suggests, is meant to be used when we might have a value or might not have any value at all.
The equivalent of having no value for a T* object would be assigning nullptr to it, i.e., the pointer points nowhere, as opposed to somewhere (or even anywhere, i.e., uninitialized). It can be said that std::optional<> exports the concept of nullptr for pointers to any arbitrary type. So, I would say they are conceptually very similar, with the std::optional<> approach being a kind of generalization.
Is there any practical difference? Are there situations where it might be advisable to choose std::optional<std::reference_wrapper<T>> over T*?
I can think of the size. std::optional<> contains an internal flag for indicating the presence/absence of a value, whereas for T* the nullptr is encoded directly as one of the values the pointer can store. So a std::optional<std::reference_wrapper<T>> object will be larger than a T*.
When it comes to safety, unlike T*, std::optional<> provides the member function value(), which throws std::bad_optional_access if there is no value (it also provides an unchecked operator*(), just as T* does).
Also, using std::optional<std::reference_wrapper<T>> instead of T*, for example as a function's return value, may indicate more explicitly that there might be no value at all.
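To make the size and safety points concrete, here is a minimal sketch (C++17). The names find_value, find_value_ptr, throws_when_empty and optional_is_larger are hypothetical, and the size comparison reflects what common standard library implementations do rather than a guarantee of the standard.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

using OptionalIntRef = std::optional<std::reference_wrapper<int>>;

// Hypothetical lookup: a reference to the first element equal to key,
// or an empty optional when nothing matches.
OptionalIntRef find_value(std::vector<int>& values, int key) {
    for (int& v : values) {
        if (v == key) return std::ref(v);
    }
    return std::nullopt;
}

// The pointer-based equivalent: nullptr plays the role of "no value".
int* find_value_ptr(std::vector<int>& values, int key) {
    for (int& v : values) {
        if (v == key) return &v;
    }
    return nullptr;
}

// value() is the checked accessor: it throws when the optional is empty.
bool throws_when_empty(const OptionalIntRef& ref) {
    try {
        (void)ref.value();
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}

// The optional stores an "engaged" flag next to the wrapped pointer.
// Assumption: true on mainstream implementations, not mandated by the standard.
constexpr bool optional_is_larger = sizeof(OptionalIntRef) > sizeof(int*);
```

Writing through a hit looks like find_value(v, 2)->get() = 20; whereas the pointer version would be *find_value_ptr(v, 2) = 20; in both cases only after checking for emptiness.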

The main difference between std::optional<std::reference_wrapper<T>> and T* is that with T* you have to think about who owns the memory that is pointed to.
If a function returns T* you have to know if you are responsible for freeing the memory or someone else is. That's not something you have to be concerned with when it's a reference.

Related

Difference between reference and non-reference in C++ [duplicate]

I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make code harder to follow and read, and they allow less safe and far more dangerous manipulations than any other construct.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
1. if you provide nullptr as the argument, you're going in undefined-behaviour-land;
2. the reference parameter version doesn't allow (without easy to spot tricks) the problem with 1;
3. the reference parameter version is simpler to understand for the user: you have to provide a valid object, not something that could be null.
If the function has to work with or without a given object, then using a pointer as parameter suggests that you can pass nullptr as the argument and it is fine for the function. That's kind of a contract between the user and the implementation.
The performances are exactly the same, as references are implemented internally as pointers. Thus you do not need to worry about that.
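The contract idea can be sketched as follows. add_one is taken from the question; divide and its optional remainder parameter are hypothetical, chosen only to show a function that genuinely works with or without the pointed-to object.

```cpp
#include <cassert>

// Mandatory argument: the caller has to supply a real int.
void add_one(int& n) { n += 1; }

// Optional argument (hypothetical): the remainder is written only when the
// caller asks for it by passing a non-null pointer.
int divide(int dividend, int divisor, int* remainder = nullptr) {
    assert(divisor != 0);  // precondition of this sketch
    if (remainder != nullptr) {
        *remainder = dividend % divisor;
    }
    return dividend / divisor;
}
```

Here the pointer signals "you may leave this out", while the reference signals "an object is required".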
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer to an existing object and pointers when a NULL value is ok.
Some coding conventions (like Google's) prescribe that one should always use pointers, or const references, because references have a bit of unclear syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
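"Reseating" can be illustrated with a small sketch. The struct Pair and both function names are made up for this example.

```cpp
#include <cassert>

struct Pair { int a; int b; };

// Assigning through a reference writes to the referent; it never rebinds.
Pair assign_through_reference() {
    int a = 1, b = 2;
    int& r = a;
    r = b;              // copies b's value into a
    assert(&r == &a);   // r still aliases a
    return {a, b};      // {2, 2}
}

// A pointer can be reseated to refer to a different object.
Pair reseat_pointer() {
    int a = 1, b = 2;
    int* p = &a;
    p = &b;             // p now points at b; a is untouched
    *p = 5;
    return {a, b};      // {1, 5}
}
```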
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int &current = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning they can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is not entirely clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documented) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, functions signatures are somewhat less ambiguous. Not to mention the fact that operators syntax is way more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
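A short sketch of the point about empty smart pointers. The helpers value_or and require are hypothetical; require follows the quoted advice of checking once at the point where a pointer is turned into a reference.

```cpp
#include <cassert>
#include <memory>

// Reads through the pointer only after checking that it owns something.
int value_or(const std::unique_ptr<int>& p, int fallback) {
    return p ? *p : fallback;
}

// Checks for null once, then hands out a reference that callers can use
// without further checks. In this sketch the check is a debug-only assert.
int& require(const std::unique_ptr<int>& p) {
    assert(p != nullptr);
    return *p;
}
```

A default-constructed std::unique_ptr<int> or std::shared_ptr<int> compares equal to nullptr, so value_or returns the fallback for it.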
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer for the question C++ - passing references to boost::shared_ptr, and the following comments show that even then, passing a smart pointer by copy or by reference is not as clear cut as one would like (I favor myself the "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
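The last rule can be sketched like this, assuming the collection stores std::shared_ptr elements. The function name keep_element_alive is made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

int keep_element_alive() {
    std::vector<std::shared_ptr<int>> items;
    items.push_back(std::make_shared<int>(42));

    std::shared_ptr<int> held = items.front();  // shares ownership with the vector
    assert(held.use_count() == 2);

    items.clear();  // the vector drops its owner; a plain int& to the element
                    // would be left dangling at this point
    return *held;   // still valid: held keeps the int alive
}
```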
There is a problem with the "use references wherever possible" rule, and it arises if you want to keep the reference for further use. To illustrate this with an example, imagine you have the following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have parameter in RefPhone(const SimCard & card) constructor passed by a reference, because it prevents passing wrong/null pointers to the constructor. It somehow encourages allocation of variables on stack and taking benefits from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references, you trade off the possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically the same effect.
edit: Note that I stuck to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, the example is not meant to show that references as such are bad. They can be misused, however, just like pointers, and they can bring their own threats to the code.
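One possible mitigation, not part of the original answer, is to delete the constructor overload that would accept a temporary. The class SafeRefPhone and the function read_sim_id_demo below are hypothetical; the sketch assumes C++17 for std::is_constructible_v.

```cpp
#include <cassert>
#include <type_traits>

class SimCard {
public:
    explicit SimCard(int id) : m_id(id) {}
    int getId() const { return m_id; }
private:
    int m_id;
};

// Hypothetical variant of RefPhone that rejects temporaries at compile time.
class SafeRefPhone {
public:
    explicit SafeRefPhone(const SimCard& card) : m_card(card) {}
    // Rvalues prefer this overload, and selecting a deleted function is an error.
    explicit SafeRefPhone(const SimCard&&) = delete;
    int getSimId() const { return m_card.getId(); }
private:
    const SimCard& m_card;
};

static_assert(std::is_constructible_v<SafeRefPhone, SimCard&>);  // named object: ok
static_assert(!std::is_constructible_v<SafeRefPhone, SimCard>);  // temporary: rejected

int read_sim_id_demo() {
    SimCard card(666);         // outlives the phone below
    SafeRefPhone phone(card);
    return phone.getSimId();
}
```

This only catches temporaries; a named SimCard that is destroyed before the phone still leaves a dangling reference.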
There are the following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value, but has pointer semantics (acts like a pointer).
A reference cannot be directly initialized to 0 (null).
A reference (the reference itself, not the referenced object) cannot be modified (equivalent to a "* const" pointer).
A const reference can accept a temporary argument.
Local const references prolong the lifetime of temporary objects.
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is acceptable parameter value or you need to store parameter for further use. If 0 (null) is acceptable I am adding "_n" suffix to parameter, use guarded pointer (like QPointer in Qt) or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with by design memory leaks and responsibility mess).
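The point about local const references can be sketched as follows. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string make_greeting() { return "hello"; }

std::size_t greeting_length() {
    // The temporary returned by make_greeting() lives as long as s does,
    // because it is bound directly to a local const reference.
    const std::string& s = make_greeting();
    assert(!s.empty());
    return s.size();
}
```

This extension applies to the local reference only. A temporary bound to a constructor's reference parameter is destroyed at the end of the full expression, so a reference member initialized from that parameter is left dangling.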
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you have a very good reason for doing so.
Points to keep in mind:
Pointers can be NULL, references cannot be NULL.
References are easier to use, const can be used for a reference when we don't want to change value and just need a reference in a function.
Pointers are declared with a * while references are declared with a &.
Use pointers when pointer arithmetic operations are required.
You can have pointers to a void type int a=5; void *p = &a; but cannot have a reference to a void type.
Pointer Vs Reference
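#include <iostream>
void fun(int *a)
{
    std::cout<<a<<'\n'; // address stored in a, e.g. 0x7fff79f83eac
    std::cout<<*a<<'\n'; // value at that address = 5
    std::cout<<a+1<<'\n'; // address advanced by sizeof(int), e.g. 0x7fff79f83eb0
    // *(a+1) must not be read here: a points to a single int, so
    // dereferencing one past it is undefined behaviour, not "0 by default"
}
void fun(int &a)
{
    std::cout<<a<<'\n'; // a aliases the caller's variable, prints 5
}
int main()
{
    int a=5;
    fun(&a); // calls the pointer overload
    fun(a); // calls the reference overload
}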
Verdict when to use what
Pointer: For array, linklist, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const
reference to increase program efficiency. You save the time and space needed to
copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference. The semantics of class design often require using a reference, which is the main reason C++ added
this feature. Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1. If the data object is a built-in data type, use a pointer. If you spot code
like fixit(&x), where x is an int, it’s pretty clear that this function intends to modify x.
2. If the data object is an array, use your only choice: a pointer.
3. If the data object is a structure, use a reference or a pointer.
4. If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
If you want to manage an object's lifetime yourself, i.e., if you want to allocate memory for an object on the heap rather than on the stack, you must use a pointer.
int *pInt = new int; // allocates *pInt on the Heap
In my practice I personally settled down with one simple rule - Use references for primitives and values that are copyable/movable and pointers for objects with long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test. A sneaky one at that. I just let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output they are exactly the same, other than the symbol naming. So looking at performance (in a simple example) there is no issue.
Now on the topic of pointers vs references. IMHO clarity stands above all. As soon as I read implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. It will typically crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement, that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration my preferences go to
using references wherever/whenever possible.
For pointers, you need them to point to something, so pointers cost memory space.
For example a function that takes an integer pointer will not take the integer variable. So you will need to create a pointer for that first to pass on to the function.
As for a reference, it will not cost memory. You have an integer variable, and you can pass it as a reference variable. That's it. You don't need to create a reference variable specially for it.

Can I cast shared_ptr<T> & to shared_ptr<T const> & without changing use_count?

I have a program that uses boost::shared_ptrs and, in particular, relies on the accuracy of the use_count to perform optimizations.
For instance, imagine an addition operation with two argument pointers called lhs and rhs. Say they both have the type shared_ptr<Node>. When it comes time to perform the addition, I'll check the use_count, and if I find that one of the arguments has a reference count of exactly one, then I'll reuse it to perform the operation in place. If neither argument can be reused, I must allocate a new data buffer and perform the operation out-of-place. I'm dealing with enormous data structures, so the in-place optimization is very beneficial.
Because of this, I can never copy the shared_ptrs without reason, i.e., every function takes the shared_ptrs by reference or const reference to avoid distorting use_count.
My question is this: I sometimes have a shared_ptr<T> & that I want to cast to shared_ptr<T const> &, but how can I do it without distorting the use count? static_pointer_cast returns a new object rather than a reference. I'd be inclined to think that it would work to just cast the whole shared_ptr, as in:
void f(shared_ptr<T> & x)
{
shared_ptr<T const> & x_ = *reinterpret_cast<shared_ptr<T const> *>(&x);
}
I highly doubt this complies with the standard, but, as I said, it will probably work. Is there a way to do this that's guaranteed safe and correct?
Updating to Focus the Question
Critiquing the design does not help answer this post. There are two interesting questions to consider:
Is there any guarantee (by the writer of boost::shared_ptr, or by the standard, in the case of std::tr1::shared_ptr) that shared_ptr<T> and shared_ptr<T const> have identical layouts and behavior?
If (1) is true, then is the above a legal use of reinterpret_cast? I think you would be hard-pressed to find a compiler that generates failing code for the above example, but that doesn't mean it's legal. Whatever your answer, can you find support for it in the C++ standard?
I sometimes have a shared_ptr<T> & that I want to cast to shared_ptr<T const> &, but how can I do it without distorting the use count?
You don't. The very concept is wrong. Consider what happens with a naked pointer T* and const T*. When you cast your T* into a const T*, you now have two pointers. You don't have two references to the same pointer; you have two pointers.
Why should this be different for smart pointers? You have two pointers: one to a T, and one to a const T. They're both sharing ownership of the same object, so you are using two of them. Your use_count therefore ought to be 2, not 1.
Your problem is your attempt to overload the meaning of use_count, co-opting its functionality for some other purpose. In short: you're doing it wrong.
Your description of what you do with shared_ptrs whose use_count is one is... frightening. You're basically saying that certain functions co-opt one of their arguments, which the caller is clearly using (since the caller obviously is still using it). And the caller doesn't know which one was claimed (if any), so the caller has no idea what the state of the arguments is after the function. Modifying the arguments for operations like that is usually not a good idea.
Plus, what you're doing can only work if you pass shared_ptr<T> by reference, which itself isn't a good idea (like regular pointers, smart pointers should almost always be taken by value).
In short, you're taking a very commonly used object with well-defined idioms and semantics, then requiring that it be used in a way that they are almost never used, with specialized semantics that work counter to the way everyone actually uses them. That's not a good thing.
You have effectively created the concept of co-optable pointer, a shared pointer that can be in 3 use states: empty, in use by the person who gave it to you only and thus you can steal from it, and in use by more than one person so you can't have it. It's not the semantics that shared_ptr exists to support. So you should write your own smart pointer that provides these semantics in a much more natural way.
Something that recognizes the difference between how many instances of a pointer you have around and how many actual users of it you have. That way, you can pass it around by value properly, but you have some way of saying that you are currently using it and don't want one of these other functions to claim it. It could use shared_ptr internally, but it should provide its own semantics.
static_pointer_cast is the right tool for the job — you've already identified that.
The problem with it isn't that it returns a new object, but rather that it leaves the old object unchanged. You want to get rid of the non-const pointer and move on with the const pointer. What you really want is static_pointer_cast< T const >( std::move( old_ptr ) ). But there isn't an overload for rvalue references.
The workaround is simple: manually invalidate the old pointer just as std::move would.
auto my_const_pointer = static_pointer_cast< T const >( modifiable_pointer );
modifiable_pointer = nullptr;
It might be slightly slower than reinterpret_cast, but it's a lot more likely to work. Don't underestimate how complex the library implementation is, and how it can fail when abused.
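Wrapped into a function, the workaround might look like the sketch below. The names Node and freeze are assumptions, and std::shared_ptr is used here in place of boost::shared_ptr. The assert compares owner counts, which is only meaningful while no other thread touches the pointer.

```cpp
#include <cassert>
#include <memory>

struct Node { int value = 0; };

// Hypothetical helper: hands ownership over to a pointer-to-const and
// empties the original, so the number of owners ends up unchanged.
std::shared_ptr<const Node> freeze(std::shared_ptr<Node>& modifiable) {
    const long owners_before = modifiable.use_count();
    std::shared_ptr<const Node> frozen =
        std::static_pointer_cast<const Node>(modifiable);
    modifiable = nullptr;  // drop the non-const owner
    assert(frozen.use_count() == owners_before);
    return frozen;
}
```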
An aside: use pointer.unique() instead of use_count() == 1. Some implementations might use a linked list with no cached use count, making use_count() O(N) whereas the unique test remains O(1). The Standard recommends unique for copy on write optimization.
EDIT: Now I see you mention
I can never copy the shared_ptrs without reason, i.e., every function takes the shared_ptrs by reference or const reference to avoid distorting use_count.
This is Doing It Wrong. You've added another layer of ownership semantics atop what shared_ptr already does. They should be passed by value, with std::move used where the caller no longer desires ownership. If the profiler says you're spending time adjusting reference counts, then you might add some references-to-pointer in the inner loops. But as a general rule, if you can't set a pointer to nullptr because you're no longer using it, but someone else might be, then you've really lost track of ownership.
If you cast a shared_ptr to a different type, without changing the reference count, this implies that you'll now have two pointers to the same data. Hence, unless you erase the old pointer, you can't do this with shared_ptrs without "distorting the reference count".
I would suggest that you use raw pointers here instead, rather than going out of your way to not use the features of shared_ptrs. If you need to sometimes create new references, use enable_shared_from_this to derive a new shared_ptr to an existing raw pointer.
When it comes time to perform the addition, I'll check the use_count, and if I find that one of the arguments has a reference count of exactly one, then I'll reuse it to perform the operation in place.
This isn't necessarily valid unless you're applying some other rules across the whole program to make it so. Consider:
shared_ptr<Node> add(shared_ptr<Node> const &lhs,shared_ptr<Node> const &rhs) {
if(lhs.use_count()==1) {
// do whatever, reusing lhs
return lhs;
}
if(rhs.use_count()==1) {
// do whatever, reusing rhs
return rhs;
}
shared_ptr<Node> new_node = ... // do whatever without reusing lhs or rhs
return new_node;
}
void foo() {
shared_ptr<Node> a,b;
shared_ptr<Node> c = add(a,b);
// error, we still have a and b, and expect that they're unchanged! they could have been modified!
}
Instead if you take the smart pointers by value:
shared_ptr<Node> add(shared_ptr<Node> lhs,shared_ptr<Node> rhs) {
And the use_count()==1 then you know that your copy is the only one and it should be safe to reuse it.
However, there's a problem in using this as an optimization, because copying a shared_ptr requires synchronization. It could well be that doing all this synchronization all over the place costs far more than you save by reusing existing shared_ptrs. All this synchronization is the reason it's recommended that code that does not take ownership of a shared_ptr should take the shared_ptr by reference instead of by value.
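A sketch of the by-value variant, assuming a made-up Node that holds a vector of doubles and element-wise addition as the operation. A caller that no longer needs an argument passes it with std::move, which is what lets use_count() reach 1 inside the function.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node { std::vector<double> data; };

// Adds src into dst element by element.
void accumulate(Node& dst, const Node& src) {
    assert(dst.data.size() == src.data.size());
    for (std::size_t i = 0; i < dst.data.size(); ++i) dst.data[i] += src.data[i];
}

// Parameters are taken by value, so each one is this function's own owner.
// use_count() == 1 then means no caller still holds that node.
std::shared_ptr<Node> add(std::shared_ptr<Node> lhs, std::shared_ptr<Node> rhs) {
    assert(lhs && rhs);
    if (lhs.use_count() == 1) { accumulate(*lhs, *rhs); return lhs; }
    if (rhs.use_count() == 1) { accumulate(*rhs, *lhs); return rhs; }
    auto out = std::make_shared<Node>(*lhs);  // out-of-place: copy, then add
    accumulate(*out, *rhs);
    return out;
}
```

With add(a, b) both arguments are copied and a new node is allocated; with add(std::move(a), b) the node formerly held by a is reused in place, provided no other owner exists.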

When to use references vs. pointers

I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make things harder to follow/read, less safe and far more dangerous manipulations than any other constructs.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
1. if you provide nullptr as the argument, you're in undefined-behaviour land;
2. the reference parameter version doesn't allow the problem in 1 (short of easy-to-spot tricks);
3. the reference parameter version is simpler for the user to understand: you have to provide a valid object, not something that could be null.
If the function has to work with or without a given object, then using a pointer as the parameter suggests that you can pass nullptr as the argument and that this is fine for the function. That's a kind of contract between the user and the implementation.
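That contract can be sketched as follows. The `sum` function and its optional `log` parameter are hypothetical; only `add_one` comes from the question:

```cpp
#include <string>
#include <vector>

// Reference parameter: the caller must supply a real object; there is no null case.
void add_one(int& n) { n += 1; }

// Pointer parameter: nullptr is an accepted value meaning "no log wanted".
// `log` is a hypothetical optional output, not part of the original example.
int sum(const std::vector<int>& values, std::vector<std::string>* log = nullptr) {
    int total = 0;
    for (int v : values) {
        total += v;
        if (log) log->push_back("added " + std::to_string(v));
    }
    return total;
}
```

Here the signature alone tells the caller which arguments are mandatory and which may be left out.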
The performance is the same in practice, as references are typically implemented internally as pointers. Thus you do not need to worry about that.
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer to an existing object and pointers when a NULL value is ok.
Some coding conventions (like Google's) prescribe that one should always use pointers or const references, because references have somewhat unclear syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int &current = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning they can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is not entirely clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documented) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, function signatures are somewhat less ambiguous. Not to mention the fact that operator syntax is far more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
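A minimal sketch of "check once, then use references", assuming a hypothetical `Player` type and hypothetical functions `on_hit` and `apply_damage`. Throwing on a null pointer is just one possible policy chosen for the example:

```cpp
#include <stdexcept>

struct Player { int health = 100; };   // hypothetical type

// Inner code takes a reference and never has to think about null.
void apply_damage(Player& p, int amount) { p.health -= amount; }

// Boundary function: the only place where the pointer is checked
// and turned into a reference.
void on_hit(Player* maybe_player, int amount) {
    if (!maybe_player) throw std::invalid_argument("on_hit: null player");
    apply_damage(*maybe_player, amount);
}
```

Everything called from `apply_damage` onwards can rely on the not-NULL contract of the reference.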
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
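As a small illustration, assuming a hypothetical `Widget` type and a hypothetical helper `value_or`: a default-constructed smart pointer is empty and converts to false, so it has to be checked just like a raw pointer.

```cpp
#include <memory>

struct Widget { int value = 42; };   // hypothetical type

// Returns the widget's value, or `fallback` when the smart pointer is empty.
int value_or(const std::unique_ptr<Widget>& w, int fallback) {
    return w ? w->value : fallback;
}

// Same check for shared ownership.
int value_or(const std::shared_ptr<Widget>& w, int fallback) {
    return w ? w->value : fallback;
}
```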
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer to the question C++ - passing references to boost::shared_ptr, and the comments that follow it, show that even then, passing a smart pointer by copy or by reference is not as clear-cut as one would like (I myself favor "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
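A sketch of the last rule, assuming a hypothetical `Item` type stored in a vector of `std::shared_ptr`. A copy of the shared pointer keeps the element alive even after the collection drops it:

```cpp
#include <memory>
#include <vector>

struct Item { int id; };   // hypothetical element type

using Items = std::vector<std::shared_ptr<Item>>;

// Returns a shared owner of the first item, or an empty pointer if there is none.
std::shared_ptr<Item> first_item(const Items& items) {
    if (items.empty()) return nullptr;
    return items.front();
}
```

If the caller stores the result and the vector is later cleared, the stored pointer still refers to a live `Item`; a plain `Item&` obtained from the vector would dangle.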
There is a problem with the "use references wherever possible" rule, and it arises if you want to keep the reference for further use. To illustrate this with an example, imagine you have the following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have the parameter of the RefPhone(const SimCard & card) constructor passed by reference, because it prevents passing wrong/null pointers to the constructor. It also encourages allocating variables on the stack and benefiting from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references, you trade the possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically the same effect.
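One common way to catch the temporary case at compile time is to delete the rvalue overload of the constructor. The sketch below uses a hypothetical `SafeRefPhone` class (not part of the original answer) and repeats a minimal `SimCard` so that it is self-contained:

```cpp
#include <type_traits>

class SimCard {
public:
    explicit SimCard(int id) : m_id(id) {}
    int getId() const { return m_id; }
private:
    int m_id;
};

// Hypothetical variant of RefPhone that refuses to bind to temporaries.
class SafeRefPhone {
public:
    explicit SafeRefPhone(const SimCard& card) : m_card(card) {}
    // Rvalues (temporaries) select this overload; deleting it turns
    // SafeRefPhone(SimCard(666)) into a compile error.
    explicit SafeRefPhone(const SimCard&&) = delete;
    int getSimId() const { return m_card.getId(); }
private:
    const SimCard& m_card;
};

static_assert(!std::is_constructible<SafeRefPhone, SimCard>::value,
              "temporaries are rejected");
static_assert(std::is_constructible<SafeRefPhone, SimCard&>::value,
              "named objects are accepted");
```

This only guards against temporaries. A named SimCard that is destroyed before the phone still leaves a dangling reference, so the lifetime question remains.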
edit: Note that I stuck to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, the example is not meant to show that references as such are bad. They can be misused, however, just like pointers, and they can bring their own threats to the code.
There are the following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value but has pointer semantics (it acts like a pointer).
A reference cannot be directly initialized to 0 (null).
A reference (the reference itself, not the referenced object) cannot be reseated (it is equivalent to a "* const" pointer).
A const reference can accept a temporary argument.
Local const references prolong the lifetime of temporary objects.
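The last point can be observed with a small counting type. `Tracker` and the helper functions below are hypothetical and exist only to make object lifetimes visible:

```cpp
// Counts how many Tracker objects are currently alive.
struct Tracker {
    static int alive;
    Tracker() { ++alive; }
    Tracker(const Tracker&) { ++alive; }
    ~Tracker() { --alive; }
};
int Tracker::alive = 0;

Tracker make_tracker() { return Tracker{}; }

// The temporary is bound to a local const reference,
// so it lives until the end of the enclosing scope.
int alive_while_bound() {
    const Tracker& t = make_tracker();
    (void)t;
    return Tracker::alive;   // the temporary still exists here
}

// Without a reference, the temporary dies at the end of the full expression.
int alive_after_discard() {
    make_tracker();
    return Tracker::alive;   // the temporary is already gone
}
```

Note that this extension applies to local references only. A temporary bound to a constructor's reference parameter and then stored in a reference member, as in the RefPhone example, is still destroyed at the end of the full expression.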
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is an acceptable parameter value or you need to store the parameter for further use. If 0 (null) is acceptable, I add an "_n" suffix to the parameter name, use a guarded pointer (like QPointer in Qt), or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with memory leaks by design and a responsibility mess).
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you have a very good reason for doing so.
Points to keep in mind:
Pointers can be NULL; references cannot be NULL.
References are easier to use. const can be applied to a reference when we don't want to change the value and just need a reference in a function.
A pointer is declared with a *, while a reference is declared with a &.
Use pointers when pointer arithmetic operations are required.
You can have a pointer to void, e.g. int a=5; void *p = &a; but you cannot have a reference to void.
Pointer Vs Reference
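#include <iostream>
void fun(int *a)
{
std::cout << a << '\n'; // address stored in a, e.g. 0x7fff79f83eac
std::cout << *a << '\n'; // value at that address = 5
std::cout << a + 1 << '\n'; // address one int (typically 4 bytes) further, e.g. 0x7fff79f83eb0
// *(a + 1) must not be read here: a points to a single int,
// so a + 1 is one past it and dereferencing it is undefined behaviour
}
void fun(int &a)
{
std::cout << a << '\n'; // a refers to the caller's variable, prints 5
}
int main()
{
int a = 5;
fun(&a); // calls the pointer overload
fun(a); // calls the reference overload
}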
Verdict when to use what
Pointer: For array, linked list, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const reference to increase program efficiency. You save the time and space needed to copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference. The semantics of class design often require using a reference, which is the main reason C++ added this feature. Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1. If the data object is a built-in data type, use a pointer. If you spot code like fixit(&x), where x is an int, it's pretty clear that this function intends to modify x.
2. If the data object is an array, use your only choice: a pointer.
3. If the data object is a structure, use a reference or a pointer.
4. If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
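The guidelines above could translate into signatures like the following. Apart from `fixit`, which the text mentions, all names (`Point`, `norm1`, `sum`, `length`, `shout`) are hypothetical:

```cpp
#include <cstddef>
#include <string>

struct Point { double x, y; };   // hypothetical small structure

// Read-only, small object: pass by value.
double norm1(Point p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

// Read-only array: pointer to const plus an element count.
int sum(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Read-only class object: const reference.
std::size_t length(const std::string& s) { return s.size(); }

// Modifies a built-in type: pointer, so the call site reads fixit(&x).
void fixit(int* x) { *x = 0; }

// Modifies a class object: reference.
void shout(std::string& s) { s += "!"; }
```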
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
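A short sketch of the difference, using hypothetical helper functions. Assigning to a pointer makes it point elsewhere, whereas assigning to a reference writes to the object it is bound to:

```cpp
#include <utility>

// Reseating a pointer: p first points at a, then at b; a is never written.
std::pair<int, int> reseat_pointer() {
    int a = 1, b = 2;
    int* p = &a;
    p = &b;          // p now points at b
    *p = 20;         // writes b
    return {a, b};   // {1, 20}
}

// "Assigning" to a reference writes through it; r stays bound to a.
std::pair<int, int> assign_through_reference() {
    int a = 1, b = 2;
    int& r = a;
    r = b;           // copies b's value into a; r still refers to a
    r = 30;          // writes a again
    return {a, b};   // {30, 2}
}
```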
If you want to manage an object's lifetime yourself, i.e. if you want to allocate memory for an object on the heap rather than on the stack, you must use a pointer:
int *pInt = new int; // allocates *pInt on the Heap
In my practice I have settled on one simple rule: use references for primitives and values that are copyable/movable, and pointers for objects with a long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test, a sneaky one at that: I let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output, they are exactly the same apart from the symbol naming. So looking at performance (in a simple example) there is no issue.
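The original test program is not shown, so the following is only a guess at what such a mini-program might look like; the function names and the file name add_one.cpp are assumptions:

```cpp
// Two versions of the same operation. Compiling with something like
//   g++ -O2 -S add_one.cpp
// and comparing the two function bodies in add_one.s typically shows the
// same instructions, differing only in the mangled symbol names.
void add_one_ptr(int* n) { *n += 1; }
void add_one_ref(int& n) { n += 1; }
```

The exact output depends on the compiler, target and optimization level, so this is a way to check a particular setup rather than a guarantee.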
Now on the topic of pointers vs references. IMHO clarity stands above all. As soon as I read about implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. It will typically crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption, causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration, my preference goes to
using references wherever/whenever possible.
A pointer is itself an object that has to hold the address of something, so a pointer costs memory space.
For example, a function that takes an integer pointer will not accept the integer variable itself. You first need to produce a pointer to it (for instance with &) to pass to the function.
A reference, on the other hand, need not occupy storage of its own (although compilers typically implement reference parameters as pointers under the hood). You have an integer variable, and you can pass it directly as a reference. That's it. You don't need to create a separate variable for it.

Return as pointer, reference or object? [duplicate]

I'm moving from Java to C++ and am a bit confused by the language's flexibility. One point is that there are three ways to store objects: a pointer, a reference and a scalar (storing the object itself, if I understand it correctly).
I tend to use references where possible, because that is as close to Java as possible. In some cases, e.g. getters for derived attributes, this is not possible:
MyType &MyClass::getSomeAttribute() {
MyType t;
return t;
}
This does not work (most compilers will warn about it), because t exists only within the scope of getSomeAttribute(); if I return a reference to it, the reference dangles before the client can use it.
Therefore I'm left with two options:
Return a pointer
Return a scalar
Returning a pointer would look like this:
MyType *MyClass::getSomeAttribute() {
MyType *t = new MyType;
return t;
}
This'd work, but the client would have to check this pointer for NULL in order to be really sure, something that's not necessary with references. Another problem is that the caller would have to make sure that t is deallocated, I'd rather not deal with that if I can avoid it.
The alternative would be to return the object itself (scalar):
MyType MyClass::getSomeAttribute() {
MyType t;
return t;
}
That's pretty straightforward and just what I want in this case: It feels like a reference and it can't be null. If the object is out of scope in the client's code, it is deleted. Pretty handy. However, I rarely see anyone doing that, is there a reason for that? Is there some kind of performance problem if I return a scalar instead of a pointer or reference?
What is the most common/elegant approach to handle this problem?
Return by value. The compiler can optimize away the copy, so the end result is what you want. An object is created, and returned to the caller.
I think the reason why you rarely see people do this is because you're looking at the wrong C++ code. ;)
Most people coming from Java feel uncomfortable doing something like this, so they call new all over the place. And then they get memory leaks all over the place, have to check for NULL and all the other problems that can cause. :)
It might also be worth pointing out that C++ references have very little in common with Java references.
A reference in Java is much more similar to a pointer (it can be reseated, or set to NULL).
In fact the only real differences are that a pointer can point to a garbage value as well (if it is uninitialized, or it points to an object that has gone out of scope), and that you can do pointer arithmetic on a pointer into an array.
A C++ reference is an alias for an object. A Java reference doesn't behave like that.
Quite simply, avoid using pointers and dynamic allocation by new wherever possible. Use values, references and automatically allocated objects instead. Of course you can't always avoid dynamic allocation, but it should be a last resort, not a first.
Returning by value can introduce performance penalties because this means the object needs to be copied. If it is a large object, like a list, that operation might be very expensive.
But modern compilers are very good at making this not happen. The C++ standard explicitly states that the compiler is allowed to elide copies in certain circumstances. The particular instance that would be relevant in the example code you gave is called the 'return value optimization'.
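A minimal sketch that makes the copies visible, using a hypothetical `Counted` type whose copy constructor increments a counter. Since C++11 a returned local variable is moved if the copy is not elided, so the count of real copies stays at zero either way:

```cpp
// Hypothetical type that records how often it is copied.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept {}
};
int Counted::copies = 0;

// Returns a local object by value, as in the question's getter.
Counted make() {
    Counted c;
    return c;   // elided (NRVO) or moved, never copied
}

// Reports how many copies a return by value caused.
int copies_after_return_by_value() {
    Counted::copies = 0;
    Counted c = make();
    (void)c;
    return Counted::copies;
}
```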
Personally, I return by (usually const) reference when I'm returning a member variable, and return some kind of smart pointer (frequently ::std::auto_ptr, which std::unique_ptr has replaced since C++11) when I need to dynamically allocate something. Otherwise I return by value.
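These three return styles might look as follows in one class. `Account` and its members are hypothetical, and the sketch assumes C++14 and uses std::unique_ptr as the smart pointer:

```cpp
#include <memory>
#include <string>
#include <utility>

class Account {   // hypothetical class
public:
    explicit Account(std::string owner) : m_owner(std::move(owner)) {}

    // Member variable: return by const reference; no copy is made.
    const std::string& owner() const { return m_owner; }

    // Computed value: return by value.
    std::string label() const { return "Account of " + m_owner; }

    // Dynamically allocated object: return a smart pointer that owns it.
    std::unique_ptr<Account> clone() const {
        return std::make_unique<Account>(m_owner);
    }

private:
    std::string m_owner;
};
```

The reference returned by `owner()` is only valid while the `Account` it came from is alive.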
I also very frequently have const reference parameters, and this is very common in C++. This is a way of passing a parameter and saying "the function is not allowed to touch this". Basically a read-only parameter. It should only be used for objects that are more complex than a single integer or pointer though.
I think one big change from Java is that const is important and used very frequently. Learn to understand it and make it your friend.
I also think Neil's answer is correct in stating that avoiding dynamic allocation whenever possible is a good idea. You should not contort your design too much to make that happen, but you should definitely prefer design choices in which it doesn't have to happen.
Returning by value is a common thing practised in C++. However, when you are passing an object, you pass by reference.
Example
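bool isTraderAllowed(const equity& trdobj); // declared before its first use
int main()
{
equity trader;
isTraderAllowed(trader); // trader is passed by const reference, no copy is made
....
}
bool isTraderAllowed(const equity& trdobj)
{
... // Perform your function routine here.
}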
The above is a simple example of passing an object by reference. In reality, you would have a method called isTraderAllowed for the class equity, but I was showing you a real use of passing by reference.
A point regarding passing by value or reference:
Considering optimizations, and assuming the function is inlined: if its parameter is declared as "const DataType objectName" (where DataType can be anything, even a primitive), typically no object copy is involved; and if its parameter is declared as "const DataType & objectName" or "DataType & objectName" (again for any DataType, even a primitive), typically no address taking or pointer is involved. In both cases the input arguments are used directly in the generated assembly code.
A point regarding references:
A reference is not always a pointer. For instance, when you have the following code in the body of a function, the reference is not a pointer:
int adad=5;
int & reference=adad;
A point regarding returning by value:
As some people have mentioned, with a good optimizing compiler, returning by value will usually not cause an extra copy, whatever the type.
A point regarding return by reference:
In case of inline functions and optimizations, returning by reference will not involve address taking or pointer.
