According to Wikipedia:
An object is first-class when it:
can be stored in variables and data structures
can be passed as a parameter to a subroutine
can be returned as the result of a subroutine
can be constructed at runtime
has intrinsic identity (independent of any given name)
Somebody had once told me that raw pointers are not first class objects while smart pointers like std::auto_ptr are. But to me, a raw pointer (to an object or to a function) in C++ does seem to me to satisfy the conditions stated above to qualify as a first class object. Am I missing something?
The definition in Wikipedia of "first-class" is incorrect. In a programming language, an entity is considered to be "first-class" if it does not require any special treatment by the compiler and can be interpreted, understood, used, etc. just like any other object in the language. The reason pointers in C++ are not first class objects is that *p for a pointer p is not interpreted as simply invoking an overloaded operator* like it would be for any other object; instead, the compiler has to treat p specially based on the fact that it is a pointer type. When passing a pointer or a reference you cannot simply pass any value object (and some of those value objects happen to be pointers), but you are actually saying that it is a different type (a pointer type or a reference type, in place of a value type), and its interpretation / usage is subject to its type.
A good example of a language where all objects are first-class is Python. In Python, everything has some sort of type associated with it, it can be treated as an object, it can have members, functions, etc. As an extreme example, even functions in Python are objects that simply contain code and happen to be callable; in fact, if you try to call the function using the __call__ member function (which is akin to overloading operator() in C++, and which other non-function objects can provide), an ordinary function will return a function wrapper as a result (so that you don't have to treat functions as a special case). You can even add members to functions like you would to other objects. Here is an example of such a thing:
>>> def f(x):
... return x;
...
>>> func = f;
>>> print f.__class__
>>> print f(5)
5
>>> print f.__call__(5)
5
>>> f.myOwnMember = "Hello world!"
>>> print f.myOwnMember
Hello world!
I apologize for the Python in a C++ post, but it is hard to explain the concept without the contrast.
The Wikipedia article is shit (for once I agree with "citation needed"), but you are not missing anything: according to the Wikipedia definition, and also according to any sane definition, raw C++ pointers are first-class objects.
If anyone would like to volunteer to help me with some searches through Google Scholar, I would really like to see that Wikipedia article fixed. I am a subject-matter expert, but I am also extremely busy—I would love to have a partner to work on this with.
Actually both - "Pointers are FCO" and "Pointers are not FCO" - are correct. We need to take the statements in the respective context. (FCO - First Class Object)
When we talk of a 'pointer' in C++, we are talking of a data type that stores the address of some other data. Now whether this is FCO or not, depends really on how we envisage to use it. Unfortunately, this use semantics is not built-in for Pointers in C++.
If we are using pointers merely to 'point' to data, it will satisfy the requirements of being an FCO. However, if we use a pointer to 'hold' a data, then it can no longer be considered an FCO as its copy and assignment semantics do not work. Such 'resource handling' pointers (or more directly referred as 'raw pointers') are our interest in the context of Smart Pointer study. These are not FCO while the corresponding Smart Pointers are. In contrast, mere tracking pointers would continue to meet the requirements for an FCO.
The following paragraph from "Modern C++ Design" book nicely elucidates the point.
I quote from the Chapter on Smart Pointers:
An object with value semantics is an
object that you can copy and assign
to. Type int is the perfect example of
a first-class object. You can create,
copy, and change integer values
freely. A pointer that you use to
iterate in a buffer also has value
semantics—you initialize it to point
to the beginning of the buffer, and
you bump it until you reach the end.
Along the way, you can copy its value
to other variables to hold temporary
results.
With pointers that hold values
allocated with new, however, the story
is very different. Once you have
written
Widget* p = new Widget;
the variable p not only points to, but also owns, the memory
allocated for the Widget object. This
is because later you must issue delete
p to ensure that the Widget object is
destroyed and its memory is released.
If in the line after the line just
shown you write
p = 0; // assign something else to p
you lose ownership of the object previously pointed to by p, and you
have no chance at all to get a grip on
it again. You have a resource leak,
and resource leaks never help.
I hope this clarifies.
As Mike commented there seems to be a bit of confusion with the word Object here. Your friend seems to be confusing between objects as in "first class objects" and Objects as "instances of C++ classes", two very different things where the word object is used for two things completely different.
In Object Oriented programming, the word Object is used to speak of some data structure gathering datafields and methods to manipulate them. With C++ you define them in classes, get inheritance, etc.
On the other hand in the expression "first class object" the word object has a much broader meaning, not limited to Objects of object oriented languages. An integer qualify as a first class object, a pointer also do. I believe "Objects" with the other meaning does not really qualify as first class object in C++ (they do in other OO programming languages) because of restrictions on them on parameter passing (but you can pass around pointers or references to them, the difference is not so big).
Then the things are really the opposite of what you friend said: pointers are first class objects but are not instances of C++ classes, smart pointers are instances of C++ classes but are not first class objects.
From wiki:
In computing, a first-class object
(also value, entity, and citizen), in
the context of a particular
programming language, is an entity
which can be passed as a parameter,
returned from a subroutine, or
assigned into a variable.1 In
computer science the term reification
is used when referring to the process
(technique, mechanism) of making
something a first-class
object.[citation needed]
I included the entire entry in wiki-pedia for context. According to this definition - an entity which can be passed as a parameter - the C++ pointer would be a first class object.
Additional links supporting the argument:
catalysoft
wapedia
reification: referring to the process (technique, mechanism) of making something a first-class object
Related
Picture of text
To read something, we need somewhere to read into; that is, we need somewhere in the computer's memory to place what we read. We call such a "place" an object. An object is a region of memory with a type that specifies what kind of information can be placed in it. A named object is called a variable. For example, character strings are put into string variables and integers are put into int variables. You can think of an object as a "box" into which you can put a value of the object's type
An object is a region of memory with a type that specifies what kind of information can be placed in it? Here they give an example with ints and strings which are like built in datatypes, I already know some oop so I was kinda confused because like I thought an object was kinda its own separate entity with its own user created datatype? I get that all objects have their own region of memory but they're basically calling a variable of type int, age, with the value 42 an object. Is this correct?
This is the wording from the C++ standard [intro.object]:
The constructs in a C++ program create, destroy, refer to, access, and
manipulate objects. An object is created by a definition, by a
new-expression ([expr.new]), by an operation that implicitly creates
objects (see below), when implicitly changing the active member of a
union, or when a temporary object is created ([conv.rval],
[class.temporary]). An object occupies a region of storage in its
period of construction ([class.cdtor]), throughout its lifetime, and
in its period of destruction ([class.cdtor]).
There is more to read about objects, I just picked this paragraph from the intro so you can see that the term "object" in C++ is more general than "instance of a user defined class". Note how the last sentence is quite similar with what Bjarne wrote in the book. What you refer to (instance of a class) is called "class object".
A perhaps more readable introduction to objects in C++ can be found here: https://en.cppreference.com/w/cpp/language/object.
PS: I am not a historian, but I suppose that the term "object" was already in wide-spread use before Object Orientation came into existance. There have been objects before. They were not invented by OOP, just put into the focus. Also note that C++ is not a (pure) OOP language, but rather multi-paradigm.
but they're basically calling a variable of type int, age, with the value 42 an object. Is this correct?
They're correct.
I already know some oop
Modern OOP nomenclature is different from nomenclature of C++.
and strings which are like built in datatypes
The std::string type is a class i.e. a "user defined" data type (although it is provided by the standard library).
I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make things harder to follow/read, less safe and far more dangerous manipulations than any other constructs.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
if you provide nullptr as the argument, you're going in undefined-behaviour-land;
the reference attribute version doesn't allow (without easy to spot tricks) the problem with 1.
the reference attribute version is simpler to understand for the user: you have to provide a valid object, not something that could be null.
If the behaviour of the function would have to work with or without a given object, then using a pointer as attribute suggests that you can pass nullptr as the argument and it is fine for the function. That's kind of a contract between the user and the implementation.
The performances are exactly the same, as references are implemented internally as pointers. Thus you do not need to worry about that.
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer an existing object and pointers when a NULL value is ok.
Some coding convention (like Google's) prescribe that one should always use pointers, or const references, because references have a bit of unclear-syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int ¤t = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning thay can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is somewhat not clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documentated) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, functions signatures are somewhat less ambiguous. Not to mention the fact that operators syntax is way more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer for the question C++ - passing references to boost::shared_ptr, and the following comments show that even then, passing a smart pointer by copy or by reference is not as clear cut as one would like (I favor myself the "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
There is problem with "use references wherever possible" rule and it arises if you want to keep reference for further use. To illustrate this with example, imagine you have following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have parameter in RefPhone(const SimCard & card) constructor passed by a reference, because it prevents passing wrong/null pointers to the constructor. It somehow encourages allocation of variables on stack and taking benefits from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references you trade off possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically same effect.
edit: Note that I sticked to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, example is not to show that references as such are bad. They can be misused however, just like pointers and they can bring their own threats to the code.
There are following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value, but has pointer semantics (acts like pointer).
Reference can not be directly initialized to 0 (null).
Reference (reference, not referenced object) can not be modified (equivalent to "* const" pointer).
const reference can accept temporary parameter.
Local const references prolong the lifetime of temporary objects
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is acceptable parameter value or you need to store parameter for further use. If 0 (null) is acceptable I am adding "_n" suffix to parameter, use guarded pointer (like QPointer in Qt) or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with by design memory leaks and responsibility mess).
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you a have very good reason for doing so.
Points to keep in mind:
Pointers can be NULL, references cannot be NULL.
References are easier to use, const can be used for a reference when we don't want to change value and just need a reference in a function.
Pointer used with a * while references used with a &.
Use pointers when pointer arithmetic operation are required.
You can have pointers to a void type int a=5; void *p = &a; but cannot have a reference to a void type.
Pointer Vs Reference
void fun(int *a)
{
cout<<a<<'\n'; // address of a = 0x7fff79f83eac
cout<<*a<<'\n'; // value at a = 5
cout<<a+1<<'\n'; // address of a increment by 4 bytes(int) = 0x7fff79f83eb0
cout<<*(a+1)<<'\n'; // value here is by default = 0
}
void fun(int &a)
{
cout<<a<<'\n'; // reference of original a passed a = 5
}
int a=5;
fun(&a);
fun(a);
Verdict when to use what
Pointer: For array, linklist, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const
reference to increase program efficiency.You save the time and space needed to
copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference.The semantics of class design often require using a reference, which is the main reason C++ added
this feature.Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1.If the data object is a built-in data type, use a pointer. If you spot code
like fixit(&x), where x is an int, it’s pretty clear that this function intends to modify x.
2.If the data object is an array, use your only choice: a pointer.
3.If the data object is a structure, use a reference or a pointer.
4.If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
If you want to handle object manipulation on your own i.e if you want to allocate memory space for an object on the Heap rather on the Stack you must use Pointer
int *pInt = new int; // allocates *pInt on the Heap
In my practice I personally settled down with one simple rule - Use references for primitives and values that are copyable/movable and pointers for objects with long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test. A sneeky one at that. I just let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output they are exactly the same. Other than the symbolnaming. So looking at performance (in a simple example) there is no issue.
Now on the topic of pointers vs references. IMHO I think clearity stands above all. As soon as I read implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. it will crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement, that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration my preferences go to
using references wherever/whenever possible.
For pointers, you need them to point to something, so pointers cost memory space.
For example a function that takes an integer pointer will not take the integer variable. So you will need to create a pointer for that first to pass on to the function.
As for a reference, it will not cost memory. You have an integer variable, and you can pass it as a reference variable. That's it. You don't need to create a reference variable specially for it.
I found below post
C++ polymorphism without pointers
that explains to have polymorphism feature C++ must use pointer or reference types.
I looked into some further resources all of them says the same but the reason .
Is there any technical difficulty to support polymorphism with values or it is possible but C++ have decided to not to provide that ability ?
The problem with treating values polymorphically boils down to the object slicing issue: since derived objects could use more memory than their base class, declaring a value in the automatic storage (i.e. on the stack) leads to allocating memory only for the base, not for the derived object. Therefore, parts of the object that belong to the derived class may be sliced off. That is why C++ designers made a conscious decision to re-route virtual member-functions to the implementations in the base class, which cannot touch the data members of the derived class.
The difficulty comes from the fact that what you call objects are allocated in automatic memory (on the stack) and the size must be known at compile-time.
Size of pointers are known at compile-time regardless of what they point to, and references are implemented as pointers under the hood, so no worries there.
Consider objects though:
BaseObject obj = ObjectFactory::createDerived();
How much memory should be allocated for obj if createDerived() conditionally returns derived objects? To overcome this, the object returned is sliced and "converted* to a BaseObject whose size is known.
This all stems from the "pay for what you use" mentality.
The short answer is because the standard specifies it. But are there any insurmountable technical barriers to allowing it?
C++ data structures have known size. Polymorphism typically requires that the data structures can vary in size. In general, you cannot store a different (larger) type within the storage of a smaller type, so storing a child class with extra variables (or other reasons to be larger) within storage for a parent class is not generally possible.
Now, we can get around this. We can create a buffer larger than what is required to store the parent class, and construct child classes within that buffer: but in this case, exposure to said instance will be via references, and you will carefully wrap the class.
This is similar to the technique known as "small object optimization" used by boost::any, boost::variant and many implementations of std::string, where we store (by value) objects in a buffer within a class and manage their lifetime manually.
There is also an issue where Derived pointers to an instance can have different values than Base pointers to an instance: value instances of objects in C++ are presumed to exist where the storage for the instance starts by most implementations.
So in theory, C++ could allow polymorphic instances if we restricted it to derived classes that could be stored in the same memory footprint, with the same "pointer to" value for both Derived and Base, but this would be an extremely narrow corner case, and could reduce the kinds of optimizations and assumptions compilers could make about value instances of a class in nearly every case! (Right now, the compiler can assume that value instances of a class C have virtual methods that are not overridden elsewhere, as an example) That is a non trivial cost for an extremely marginal benefit.
What more, we are capable of using the C++ language to emulate this corner case using existing language features (placement new, references, and manual destruction) if we really need it, without imposing that above cost.
It is not immediately clear what you mean by "polymorphism with values". In C++ when you have an object of type A, it always behaves as an object of type A. This is perfectly normal and logical thing to expect. I don't see how it can possible behave in any other way. So, it is not clear what "ability" that someone decided "not to provide" you are talking about.
Polymorphism in C++ means one thing: virtual function calls made through an expression with polymorphic type are resolved in accordance with the dynamic type of that expression (as opposed to static type for non-virtual functions). That's all there is to it.
Polymorphism in C++ always works in accordance with the above rule. It works that way through pointers. It works that way through references. It works that way through immediate objects ("values" as you called them). So, it not not correct to say that polymorphism in C++ only works with pointers and references. It works with "values" as well. They all follow the same rule, as stated above.
However, for an immediate object (a "value") its dynamic type is always the same as it static type. So, even though polymorphism works for immediate values, it does not demonstrate anything truly "polymorphic". The behavior of an immediate object with polymorphism is the same as it would be without polymorphism. So, polymorphism of an immediate object is degenerate, trivial polymorphism. It exists only conceptually. This is, again, perfectly logical: an object of type A should behave as an object of type A. How else can it behave?
In order to observe the actual non-degenerate polymorphism, one needs an expression whose static type is different from its dynamic type. Non-trivial polymorphism is observed when an expression of static type A behaves (with regard to virtual function calls) as an object of different type B. For this an expression of static type A must actually refer to an object of type B. This is only possible with pointers or references. The only way to create that difference between static and dynamic type of an expression is through using pointers or references.
In other words, its not correct to say that polymorphism in C++ only works through pointers or references. It is correct to say is that with pointers or references polymorphism becomes observable and non-trivial.
I understand the syntax and general semantics of pointers versus references, but how should I decide when it is more-or-less appropriate to use references or pointers in an API?
Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.
E.g. in the following code:
void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
int a = 0;
add_one(a); // Not clear that a may be modified
add_one(&a); // 'a' is clearly being passed destructively
}
With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?
EDIT (OUTDATED):
Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".
Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.
Use reference wherever you can, pointers wherever you must.
Avoid pointers until you can't.
The reason is that pointers make things harder to follow/read, less safe and far more dangerous manipulations than any other constructs.
So the rule of thumb is to use pointers only if there is no other choice.
For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and it is assumed it will. That said, a better option would be to use something similar to std::optional (requires C++17; before that, there's boost::optional).
Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous parts of the whole code base.
In your example, there is no point in using a pointer as argument because:
if you provide nullptr as the argument, you're going in undefined-behaviour-land;
the reference attribute version doesn't allow (without easy to spot tricks) the problem with 1.
the reference attribute version is simpler to understand for the user: you have to provide a valid object, not something that could be null.
If the behaviour of the function would have to work with or without a given object, then using a pointer as attribute suggests that you can pass nullptr as the argument and it is fine for the function. That's kind of a contract between the user and the implementation.
The performances are exactly the same, as references are implemented internally as pointers. Thus you do not need to worry about that.
There is no generally accepted convention regarding when to use references and pointers. In a few cases you have to return or accept references (copy constructor, for instance), but other than that you are free to do as you wish. A rather common convention I've encountered is to use references when the parameter must refer an existing object and pointers when a NULL value is ok.
Some coding convention (like Google's) prescribe that one should always use pointers, or const references, because references have a bit of unclear-syntax: they have reference behaviour but value syntax.
From C++ FAQ Lite -
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need
"reseating". This usually means that references are most useful in a
class's public interface. References typically appear on the skin of
an object, and pointers on the inside.
The exception to the above is where a function's parameter or return
value needs a "sentinel" reference — a reference that does not refer
to an object. This is usually best done by returning/taking a pointer,
and giving the NULL pointer this special significance (references must
always alias objects, not a dereferenced NULL pointer).
Note: Old line C programmers sometimes don't like references since
they provide reference semantics that isn't explicit in the caller's
code. After some C++ experience, however, one quickly realizes this is
a form of information hiding, which is an asset rather than a
liability. E.g., programmers should write code in the language of the
problem rather than the language of the machine.
My rule of thumb is:
Use pointers for outgoing or in/out parameters. So it can be seen that the value is going to be changed. (You must use &)
Use pointers if NULL parameter is acceptable value. (Make sure it's const if it's an incoming parameter)
Use references for incoming parameter if it cannot be NULL and is not a primitive type (const T&).
Use pointers or smart pointers when returning a newly created object.
Use pointers or smart pointers as struct or class members instead of references.
Use references for aliasing (eg. int ¤t = someArray[i])
Regardless which one you use, don't forget to document your functions and the meaning of their parameters if they are not obvious.
Disclaimer: other than the fact that references cannot be NULL nor "rebound" (meaning thay can't change the object they're the alias of), it really comes down to a matter of taste, so I'm not going to say "this is better".
That said, I disagree with your last statement in the post, in that I don't think the code loses clarity with references. In your example,
add_one(&a);
might be clearer than
add_one(a);
since you know that most likely the value of a is going to change. On the other hand though, the signature of the function
void add_one(int* const n);
is somewhat not clear either: is n going to be a single integer or an array? Sometimes you only have access to (poorly documentated) headers, and signatures like
foo(int* const a, int b);
are not easy to interpret at first sight.
Imho, references are as good as pointers when no (re)allocation nor rebinding (in the sense explained before) is needed. Moreover, if a developer only uses pointers for arrays, functions signatures are somewhat less ambiguous. Not to mention the fact that operators syntax is way more readable with references.
Like others already answered: Always use references, unless the variable being NULL/nullptr is really a valid state.
John Carmack's viewpoint on the subject is similar:
NULL pointers are the biggest problem in C/C++, at least in our code. The dual use of a single value as both a flag and an address causes an incredible number of fatal issues. C++ references should be favored over pointers whenever possible; while a reference is “really” just a pointer, it has the implicit contract of being not-NULL. Perform NULL checks when pointers are turned into references, then you can ignore the issue thereafter.
http://www.altdevblogaday.com/2011/12/24/static-code-analysis/
Edit 2012-03-13
User Bret Kuhns rightly remarks:
The C++11 standard has been finalized. I think it's time in this thread to mention that most code should do perfectly fine with a combination of references, shared_ptr, and unique_ptr.
True enough, but the question still remains, even when replacing raw pointers with smart pointers.
For example, both std::unique_ptr and std::shared_ptr can be constructed as "empty" pointers through their default constructor:
http://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr
http://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr
... meaning that using them without verifying they are not empty risks a crash, which is exactly what J. Carmack's discussion is all about.
And then, we have the amusing problem of "how do we pass a smart pointer as a function parameter?"
Jon's answer for the question C++ - passing references to boost::shared_ptr, and the following comments show that even then, passing a smart pointer by copy or by reference is not as clear cut as one would like (I favor myself the "by-reference" by default, but I could be wrong).
It is not a matter of taste. Here are some definitive rules.
If you want to refer to a statically declared variable within the scope in which it was declared then use a C++ reference, and it will be perfectly safe. The same applies to a statically declared smart pointer. Passing parameters by reference is an example of this usage.
If you want to refer to anything from a scope that is wider than the scope in which it is declared then you should use a reference counted smart pointer for it to be perfectly safe.
You can refer to an element of a collection with a reference for syntactic convenience, but it is not safe; the element can be deleted at anytime.
To safely hold a reference to an element of a collection you must use a reference counted smart pointer.
There is problem with "use references wherever possible" rule and it arises if you want to keep reference for further use. To illustrate this with example, imagine you have following classes.
class SimCard
{
public:
explicit SimCard(int id):
m_id(id)
{
}
int getId() const
{
return m_id;
}
private:
int m_id;
};
class RefPhone
{
public:
explicit RefPhone(const SimCard & card):
m_card(card)
{
}
int getSimId()
{
return m_card.getId();
}
private:
const SimCard & m_card;
};
At first it may seem to be a good idea to have parameter in RefPhone(const SimCard & card) constructor passed by a reference, because it prevents passing wrong/null pointers to the constructor. It somehow encourages allocation of variables on stack and taking benefits from RAII.
PtrPhone nullPhone(0); //this will not happen that easily
SimCard * cardPtr = new SimCard(666); //evil pointer
delete cardPtr; //muahaha
PtrPhone uninitPhone(cardPtr); //this will not happen that easily
But then temporaries come to destroy your happy world.
RefPhone tempPhone(SimCard(666)); //evil temporary
//function referring to destroyed object
tempPhone.getSimId(); //this can happen
So if you blindly stick to references you trade off possibility of passing invalid pointers for the possibility of storing references to destroyed objects, which has basically same effect.
edit: Note that I sticked to the rule "Use reference wherever you can, pointers wherever you must. Avoid pointers until you can't." from the most upvoted and accepted answer (other answers also suggest so). Though it should be obvious, example is not to show that references as such are bad. They can be misused however, just like pointers and they can bring their own threats to the code.
There are following differences between pointers and references.
When it comes to passing variables, pass by reference looks like pass by value, but has pointer semantics (acts like pointer).
Reference can not be directly initialized to 0 (null).
Reference (reference, not referenced object) can not be modified (equivalent to "* const" pointer).
const reference can accept temporary parameter.
Local const references prolong the lifetime of temporary objects
Taking those into account my current rules are as follows.
Use references for parameters that will be used locally within a function scope.
Use pointers when 0 (null) is acceptable parameter value or you need to store parameter for further use. If 0 (null) is acceptable I am adding "_n" suffix to parameter, use guarded pointer (like QPointer in Qt) or just document it. You can also use smart pointers. You have to be even more careful with shared pointers than with normal pointers (otherwise you can end up with by design memory leaks and responsibility mess).
Any performance difference would be so small that it wouldn't justify using the approach that's less clear.
First, one case that wasn't mentioned where references are generally superior is const references. For non-simple types, passing a const reference avoids creating a temporary and doesn't cause the confusion you're concerned about (because the value isn't modified). Here, forcing a person to pass a pointer causes the very confusion you're worried about, as seeing the address taken and passed to a function might make you think the value changed.
In any event, I basically agree with you. I don't like functions taking references to modify their value when it's not very obvious that this is what the function is doing. I too prefer to use pointers in that case.
When you need to return a value in a complex type, I tend to prefer references. For example:
bool GetFooArray(array &foo); // my preference
bool GetFooArray(array *foo); // alternative
Here, the function name makes it clear that you're getting information back in an array. So there's no confusion.
The main advantages of references are that they always contain a valid value, are cleaner than pointers, and support polymorphism without needing any extra syntax. If none of these advantages apply, there is no reason to prefer a reference over a pointer.
Copied from wiki-
A consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly. References are a syntactically controversial feature of C++ because they obscure an identifier's level of indirection; that is, unlike C code where pointers usually stand out syntactically, in a large block of C++ code it may not be immediately obvious if the object being accessed is defined as a local or global variable or whether it is a reference (implicit pointer) to some other location, especially if the code mixes references and pointers. This aspect can make poorly written C++ code harder to read and debug (see Aliasing).
I agree 100% with this, and this is why I believe that you should only use a reference when you a have very good reason for doing so.
Points to keep in mind:
Pointers can be NULL, references cannot be NULL.
References are easier to use, const can be used for a reference when we don't want to change value and just need a reference in a function.
Pointer used with a * while references used with a &.
Use pointers when pointer arithmetic operation are required.
You can have pointers to a void type int a=5; void *p = &a; but cannot have a reference to a void type.
Pointer Vs Reference
void fun(int *a)
{
cout<<a<<'\n'; // address of a = 0x7fff79f83eac
cout<<*a<<'\n'; // value at a = 5
cout<<a+1<<'\n'; // address of a increment by 4 bytes(int) = 0x7fff79f83eb0
cout<<*(a+1)<<'\n'; // value here is by default = 0
}
void fun(int &a)
{
cout<<a<<'\n'; // reference of original a passed a = 5
}
int a=5;
fun(&a);
fun(a);
Verdict when to use what
Pointer: For array, linklist, tree implementations and pointer arithmetic.
Reference: In function parameters and return types.
The following are some guidelines.
A function uses passed data without modifying it:
If the data object is small, such as a built-in data type or a small structure, pass it by value.
If the data object is an array, use a pointer because that’s your only choice. Make the pointer a pointer to const.
If the data object is a good-sized structure, use a const pointer or a const
reference to increase program efficiency.You save the time and space needed to
copy a structure or a class design. Make the pointer or reference const.
If the data object is a class object, use a const reference.The semantics of class design often require using a reference, which is the main reason C++ added
this feature.Thus, the standard way to pass class object arguments is by reference.
A function modifies data in the calling function:
1.If the data object is a built-in data type, use a pointer. If you spot code
like fixit(&x), where x is an int, it’s pretty clear that this function intends to modify x.
2.If the data object is an array, use your only choice: a pointer.
3.If the data object is a structure, use a reference or a pointer.
4.If the data object is a class object, use a reference.
Of course, these are just guidelines, and there might be reasons for making different
choices. For example, cin uses references for basic types so that you can use cin >> n
instead of cin >> &n.
Your properly written example should look like
void add_one(int& n) { n += 1; }
void add_one(int* const n)
{
if (n)
*n += 1;
}
That's why references are preferable if possible
...
References are cleaner and easier to use, and they do a better job of hiding information.
References cannot be reassigned, however.
If you need to point first to one object and then to another, you must use a pointer. References cannot be null, so if any chance exists that the object in question might be null, you must not use a reference. You must use a pointer.
If you want to handle object manipulation on your own i.e if you want to allocate memory space for an object on the Heap rather on the Stack you must use Pointer
int *pInt = new int; // allocates *pInt on the Heap
In my practice I personally settled down with one simple rule - Use references for primitives and values that are copyable/movable and pointers for objects with long life cycle.
For Node example I would definitely use
AddChild(Node* pNode);
Just putting my dime in. I just performed a test. A sneeky one at that. I just let g++ create the assembly files of the same mini-program using pointers compared to using references.
When looking at the output they are exactly the same. Other than the symbolnaming. So looking at performance (in a simple example) there is no issue.
Now on the topic of pointers vs references. IMHO I think clearity stands above all. As soon as I read implicit behaviour my toes start to curl. I agree that it is nice implicit behaviour that a reference cannot be NULL.
Dereferencing a NULL pointer is not the problem. it will crash your application and will be easy to debug. A bigger problem is uninitialized pointers containing invalid values. This will most likely result in memory corruption causing undefined behaviour without a clear origin.
This is where I think references are much safer than pointers. And I agree with a previous statement, that the interface (which should be clearly documented, see design by contract, Bertrand Meyer) defines the result of the parameters to a function. Now taking this all into consideration my preferences go to
using references wherever/whenever possible.
For pointers, you need them to point to something, so pointers cost memory space.
For example a function that takes an integer pointer will not take the integer variable. So you will need to create a pointer for that first to pass on to the function.
As for a reference, it will not cost memory. You have an integer variable, and you can pass it as a reference variable. That's it. You don't need to create a reference variable specially for it.
What is the best practice for returning references from class methods. Is it the case that basic types you want to return without a reference whereas class objects you want to return by reference. Any articles, best practices article that you recommend.
I'll assume that by class method you mean member function. And that by "return by reference" you mean "return reference to member data". This is mainly as opposed to returning a reference to local, which is clearly wrong.
When should you return a reference to member data, and when the data itself ?
By default, you should be returning the data itself (aka "by value"). This avoids several problems with returning a reference:
Users storing the reference and becoming dependant on the lifetime of your members, without considering how long the containing object (your object) will live. Leads to dangling pointers.
User code becoming dependant on the exact return type. For example, you use a vector<T> for implementation (and that's what your getter returns). User code like "vector<T> foo = obj.getItems()" appears. Then you change your implementation (and getter) to use a deque<T> -- user code breaks. If you had been returning by value, you could simply make the getter create a local vector, copy the data over from the member deque, and return the result. Quite reasonable for small-sized collections. [*]
So when should you return a reference instead?
You can consider it when the returned object is huge (Image) or non-copyable (boost::signal). But, as always, you can instead opt for the more OOP pattern of having your class do stuff rather than have stuff hanging from it. In the Image case, you can provide a drawCircle member function, rather than returning Image& and having your users draw a circle on it.
When your data is logically owned by your user, and you're just holding it for him. Consider std collections: vector<T>::operator[] returns a reference to T because that's what I want to get at: my exact object, not a copy of it.
[*] There is a better way to ensure future-proof code. Rather than returning a vector (by ref of by value) return a pair of iterators to your vector -- a beginning and an ending one. This lets your users do everything they normally do with a deque or a vector, but independent of the actual implementation. Boost provides boost::iterator_pair for this purpose. As a perk, it also has operator[] overloaded, so you can even do "int i = obj.getItems()[5]" rather than "int i = obj.getItems().begin()[5]".
This solution is generalizable to any situation which allows you to treat types generically. For example, if you keep a Dog member but your users only need to know it's an Animal (because they only call eat() and sleep()), return an Animal reference/pointer to a freestore-allocated copy of your dog. Then when you decide dogs are weaklings and you really need a wolf for the implementation, user code won't break.
This sort of information-hiding does more than ensure future-compatibility. It also helps keep your design clean.
Overloading assignment operators (like =, +=, -= etc.) is a good example where returning by reference makes a lot of sense. This kind of methods would obviously return large objects, and you don't want to get back pointers, so returning a reference is the best way to go. Works like a pointer and looks like returning by value.
A reference is a pointer in disguise (so it's 4 bytes on 32 bit machines, and 8 bytes on 64 bit machines). So the rule of thumb is: if copying the object is more expensive than returning a pointer, use a pointer (or reference, since that's the same thing).
Which types are more expensive to copy depends on the architecture, compiler, the type itself etc. In some cases copying an object that's 16 bytes can be faster than returning a pointer to it (for example, if an object maps to SSE register, or similar situation).
Now, of course, returning a reference to a local variable does not make sense. Because the local variable will be gone after the function exits. So usually you'd return references/pointers to member variables, or global/static variables, or dynamically allocated objects.
There are situations where you don't want to return pointer/reference to an object, even if copying the object is expensive. Mostly when you don't want to tie the calling code into the lifetime of the original object.
Scott Meyers' book, Effective C++, has several items related to this topic. I would definitely check out the item titled, "Don't try to return a reference when you must return an object." This is item #23 in the 1st or 2nd editions, or #21 in the 3rd edition.
I would recommend staying away from returning references for the same reason as Iraimbilanja points out, but in my opinion you can get very good results by using shared pointers (e.g., boost, tr1) on the member data and use those in return. That way you do not have to copy the object but can still manage life time issues.
class Foo
{
private:
shared_ptr<Bar> _bar;
public:
shared_ptr<Bar> getBar() {return _bar;}
};
Usually the cost of copying Bar is greater than the cost of constructing new shared_ptrs, should this not be the case it can still be worth using for life time management.
if NULL is a possible return value, the method must return a pointer because you can't return a reference to NULL.
Return basic types by value, except if you want to let the caller access the actual member.
Return class objects (even std::string) by reference.