All,
I recently posted this question on DAL design. From that it would seem that passing a reference to an object into a function, with the function then populating that object, would be a good interface for a C++ Data Access Layer, e.g.
bool DAL::loadCar(int id, Car& car) {}
I'm now wondering if using a reference to a boost::shared_ptr would be better, e.g.
bool DAL::loadCar(int id, boost::shared_ptr<Car> &car)
Any thoughts? Does one offer advantages over the other?
What would be the implications of applying const correctness to both calls?
Thanks in advance.
As sbi says, "It depends on what the function does. "
However, I think the most important aspect of the above is not whether NULL is allowed or not, but whether the function stores a pointer to the object for later use. If the function just fills in some data, then I would use a reference for the following reasons:
the function can still be used by clients who do not use shared_ptr, for example with stack objects
using the function with shared_ptr is still trivial - shared_ptr has dereferencing operator that returns a reference
passing NULL is not possible
less typing
I don't like using "stuff" when I don't have to
If the function needs to store a pointer for later use, or you anticipate that the function might change in a way that will require storing a pointer, then use shared_ptr.
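A minimal sketch of the reference-based interface. `Car`, its fields and the stubbed lookup inside `DAL::loadCar` are hypothetical stand-ins for a real database call. It shows that a caller holding a shared pointer can still use the reference version simply by dereferencing; `std::shared_ptr` is used here, and `boost::shared_ptr` behaves the same way for this purpose.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical domain type, for illustration only.
struct Car {
    int id = 0;
    std::string model;
};

// Hypothetical DAL; a real one would query a database.
class DAL {
public:
    // Fills in `car`; returns false if no row with `id` exists.
    bool loadCar(int id, Car& car) const {
        if (id <= 0) return false;              // stand-in for "not found"
        car.id = id;
        car.model = "model-" + std::to_string(id);
        return true;
    }
};

// The same function serves both kinds of caller.
void usageExample() {
    DAL dal;
    Car onStack;
    dal.loadCar(1, onStack);                    // plain stack object

    auto shared = std::make_shared<Car>();
    dal.loadCar(2, *shared);                    // operator* yields Car&
}
```

Because the function only fills in the object, it never needs to know how the caller owns it.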
It depends on what the function does.
In general, a function taking a pointer indicates that callers might call this function even if they don't have an object to pass to it: they can always pass NULL. If that fits the function's spec, then use a (smart) pointer. Passing reference-counting smart pointers by reference instead of copying them is an optimization (and not a premature one, I should add), because it avoids needlessly incrementing and decrementing the reference count, which can be a noticeable performance hit in multithreaded environments.
A function taking a non-const reference as an argument expects to be passed a valid object that it might change. Callers cannot (legally) call that function unless they have a valid object and they will not call it unless they are willing to have the function change the object's state. If that better fits the function's spec, use a reference.
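The reference-count cost can be observed with `use_count()`. A small sketch, assuming `std::shared_ptr` (whose count is typically updated atomically when threads are supported); the function names are hypothetical:

```cpp
#include <cassert>
#include <memory>

struct Car { int id = 0; };

// Copying the shared_ptr increments the reference count on entry
// and decrements it again when the parameter is destroyed.
long countByValue(std::shared_ptr<Car> p) {
    return p.use_count();
}

// Binding a reference leaves the reference count untouched.
long countByConstRef(const std::shared_ptr<Car>& p) {
    return p.use_count();
}
```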
If you must receive a valid object (i.e. you don't want the caller to pass NULL), then by all means, do not use boost::shared_ptr. Your second example passes a reference to a "smart pointer"... ignoring the details, it is a "pointer to pointer to Car". Because it's a reference, the shared_ptr object itself cannot be NULL... but that doesn't mean it can't hold a NULL value (i.e. point to a "null" object).
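A sketch of that pitfall: the reference always binds to some `shared_ptr` object, but that object may be empty, so the callee still has to decide what an empty pointer means. Allocating a new `Car` in that case is one possible policy, chosen here purely for illustration; `Car` and `loadCar` are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Car { int id = 0; };

// The reference cannot be null, but the shared_ptr it refers to can be empty.
bool loadCar(int id, std::shared_ptr<Car>& car) {
    if (!car) {
        // Illustrative policy (an assumption): allocate on the caller's behalf.
        car = std::make_shared<Car>();
    }
    car->id = id;
    return true;
}
```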
I don't understand exactly why you would think that a reference to a smart pointer would be "better" - does the caller function use smart pointer already?
As for the implications of "const"... do you mean something like
bool DAL::loadCar(int id, const Car& car) {}
?
If so, it would be counter-productive: you tell the compiler that "car" doesn't change (but presumably you want it to change!).
Or do you mean to make the function "const", something like
class DAL {
public:
    bool loadCar(int id, Car& car) const;
};
?
In the latter case, you communicate to the compiler and the API user that the method "loadCar" does not modify the DAL object. It's a good idea to do so if this is true. It rarely enables compiler optimizations (const can be cast away, and mutable members can still change), but it is generally a good thing to specify in the "contract" (the function signature) that the function makes no modifications to DAL, especially if your code relies on that assumption: the compiler then makes sure it stays true, so nobody can later change the "loadCar" function in a way that modifies the "DAL" object.
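A small sketch of the const member function variant, with a hypothetical `DAL` and `Car`. It shows that a const `loadCar` can be called through a `const DAL&`, and also that a `mutable` member (here a hypothetical lookup counter) can still change inside it, which is one reason const is mainly a contract for readers rather than an optimization hint.

```cpp
#include <cassert>
#include <string>

struct Car { int id = 0; std::string model; };

class DAL {
public:
    // const: promises not to change the DAL's observable state and
    // allows calls through a const DAL& or const DAL*.
    bool loadCar(int id, Car& car) const {
        ++lookups_;                 // allowed only because lookups_ is mutable
        car.id = id;
        car.model = "unknown";
        return true;
    }
    int lookups() const { return lookups_; }

private:
    mutable int lookups_ = 0;       // hypothetical bookkeeping, not part of the DAL's logical state
};

// Compiles only because loadCar is a const member function.
bool loadThroughConstRef(const DAL& dal, int id, Car& car) {
    return dal.loadCar(id, car);
}
```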
In the first case you simply pass a Car and "fill it" with information. For example, you may create a "default" Car and then fill it. I see one inconvenience in this: it's not very OO to have two classes of Cars: one poor, default, useless, "empty" Car, and one truly filled in Car after it comes from the function. To me, a Car is a Car, so it should be a valid Car (one I can drive from location A to B, for example; one that I can accelerate, brake, start, stop) before and after your function.
I typically work with traditional pointers, not boost (without a problem, by the way) so I really can't comment on the latter alternative.
Related
So, as we're all hopefully aware, in object-oriented programming, when you need to access an instance of a class in another class's method, you pass that instance in as an argument.
I'm curious, what's the difference in terms of good practice / less prone to breaking things when it comes to either passing an Object, or a Pointer to that object?
Get into the habit of passing objects by reference.
void DoStuff(const vector<int>& values)
If you need to modify the original object, omit the const qualifier.
void DoStuff(vector<int>& values)
If you want to be able to accept "no object" (an empty/nothing argument), pass it by pointer.
void DoStuff(vector<int>* values)
If you want to do stuff to a local copy, pass it by value.
void DoStuff(vector<int> values)
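The four signatures above can be compared in one compilable sketch. The function names are hypothetical replacements for `DoStuff` so that all four can coexist, and the pointer version takes `const vector<int>*` because it only reads:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using std::vector;

// By const reference: read-only access, no copy.
std::size_t sizeReadOnly(const vector<int>& values) {
    return values.size();
}

// By non-const reference: may modify the caller's vector.
void appendZero(vector<int>& values) {
    values.push_back(0);
}

// By pointer: "no vector" is expressed as nullptr and must be checked.
std::size_t sizeOrZero(const vector<int>* values) {
    return values ? values->size() : 0;
}

// By value: works on a private copy; the caller's vector is untouched.
std::size_t sizeAfterAppendToCopy(vector<int> values) {
    values.push_back(0);
    return values.size();
}
```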
Problems will only really pop up when you introduce tons of concurrency. By that time, you will know enough to know when to not use certain passing techniques.
Pass a pointer to the object if you want to be able to indicate nonexistence (by passing a NULL).
Try not to pass objects by value, as that invokes a copy constructor to create a local version of the object within the scope of the called function. Instead, pass by reference. There are two modes here, however. To get the same effective behavior as passing by value (an immutable "copy") without the overhead, pass by const reference. If you will need to alter the passed object, pass by (non-const) reference.
I choose const reference as a default. Of course, non-const if you must mutate the object for the client. Deviation from using references is rarely required.
Pointers are not very C++-like, since references are available. References are nice because they are forbidden to refer to nothing. Update: to clarify, proper containers for types and arrays are preferred, but for some internal implementations you will need to pass a pointer.
Objects/values are completely different in semantics. If I need a copy, I will typically just create it inside the function where needed:
void method(const std::string& str) {
    std::string myCopy(str);
    // ... work on myCopy ...
}
In fact, what you can pass to a method is a pointer to an object, a reference to the object, or a copy of the object, and all of these can also be const. Choose the one that best suits your needs.
The first decision you can make is whether the thing you pass should be able to change in your method. If you do not intend to change it, then a const reference is probably the best alternative (by not changing I also mean that you do not intend to call any non-const methods of that object). What are the advantages? You save the time of copying the object, and the method signature itself says "I will not change that parameter".
If you have to change the object, you can pass either a reference or a pointer to it. Neither is mandatory, so you can use either. The only difference I can think of is that a pointer can be NULL (i.e. not pointing to any object at all), while a reference always refers to an existing object.
If what your method needs is a copy of the object, then you should pass a copy (not a reference and not a pointer). For instance, if your method looks like
void Foo(const A& a) {
A temp = a;
}
Then that is a clear indication that passing a copy is a better alternative.
Hope this makes things a bit clearer.
Actually, there's really no good reason for passing a pointer to an object, unless you want to somehow indicate that no object exists.
If you want to change the object, pass a reference to it. If you want to protect it from change within the function, pass it by value or at least const reference.
Some people pass by reference for the speed improvements (passing only an address of a large structure rather than the structure itself for example) but I don't agree with that. In most cases, I'd prefer my software to be safe than fast, a corollary of the saying: "you can't get any less optimised than wrong". :-)
Object-oriented programming is about polymorphism, Liskov Substitution Principle, old code calling new code, you name it. Pass a concrete (derived) object to a routine that works with more abstract (base) objects. If you are not doing that, you are not doing OOP.
This is only achievable when passing references or pointers. Passing by value is best reserved for, um, values.
It is useful to distinguish between values and objects. Values are always concrete, there's no polymorphism. They are often immutable. 5 is 5 and "abc" is "abc". You can pass them by value or by (const) reference.
Objects are always abstract to some degree. Given an object, one can almost always refine it to a more concrete object. A RectangularArea could be a Drawable which could be a Window which could be a ToplevelWindow which could be a ManagedWindow which could be... These must be passed by reference.
Pointers are a wholly separate can of worms. In my experience, naked pointers are best avoided. Use a smart pointer that cannot be NULL. If you need an optional argument, use an explicit optional class template such as boost::optional.
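A sketch of the explicit-optional style using `std::optional`, the C++17 standard-library counterpart of `boost::optional`; `greet` is a hypothetical function. Note that `std::optional<T&>` is not allowed in C++17, so for an optional out-parameter a pointer is still the usual choice:

```cpp
#include <cassert>
#include <optional>
#include <string>

// The argument's optionality is spelled out in the signature,
// instead of being implied by "this pointer may be null".
std::string greet(const std::optional<std::string>& name = std::nullopt) {
    return "Hello, " + name.value_or("stranger");
}
```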
I'm moving from Java to C++ and am a bit confused by the language's flexibility. One point is that there are three ways to store objects: a pointer, a reference and a scalar (storing the object itself, if I understand it correctly).
I tend to use references where possible, because that is as close to Java as possible. In some cases, e.g. getters for derived attributes, this is not possible:
MyType &MyClass::getSomeAttribute() {
MyType t;
return t;
}
This compiles (most compilers emit a warning), but it is wrong: t exists only within the scope of getSomeAttribute(), so the returned reference dangles before the client can use it.
Therefore I'm left with two options:
Return a pointer
Return a scalar
Returning a pointer would look like this:
MyType *MyClass::getSomeAttribute() {
MyType *t = new MyType;
return t;
}
This'd work, but the client would have to check this pointer for NULL in order to be really sure, something that's not necessary with references. Another problem is that the caller would have to make sure that t is deallocated, I'd rather not deal with that if I can avoid it.
The alternative would be to return the object itself (scalar):
MyType MyClass::getSomeAttribute() {
MyType t;
return t;
}
That's pretty straightforward and just what I want in this case: It feels like a reference and it can't be null. If the object is out of scope in the client's code, it is deleted. Pretty handy. However, I rarely see anyone doing that, is there a reason for that? Is there some kind of performance problem if I return a scalar instead of a pointer or reference?
What is the most common/elegant approach to handle this problem?
Return by value. The compiler can optimize away the copy, so the end result is what you want. An object is created, and returned to the caller.
I think the reason why you rarely see people do this is because you're looking at the wrong C++ code. ;)
Most people coming from Java feel uncomfortable doing something like this, so they call new all over the place. And then they get memory leaks all over the place, have to check for NULL and all the other problems that can cause. :)
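A minimal sketch of the return-by-value version, assuming a hypothetical `MyType` that holds a `std::vector<int>` and a `MyClass` that computes it on demand. There is no `new`, no null check and no `delete`; the compiler may elide the copy (named return value optimization), and if it doesn't, C++11 and later move the local instead of copying it:

```cpp
#include <cassert>
#include <vector>

struct MyType {
    std::vector<int> data;
};

class MyClass {
public:
    explicit MyClass(int n) : n_(n) {}

    // Derived attribute computed on demand and returned by value.
    MyType getSomeAttribute() const {
        MyType t;
        for (int i = 0; i < n_; ++i) t.data.push_back(i * i);
        return t;   // copy elided (NRVO) or, failing that, moved
    }

private:
    int n_;
};
```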
It might also be worth pointing out that C++ references have very little in common with Java references.
A reference in Java is much more similar to a pointer (it can be reseated, or set to NULL).
In fact, the only real differences are that a pointer can also point to a garbage value (if it is uninitialized, or it points to an object that has gone out of scope), and that you can do pointer arithmetic on a pointer into an array.
A C++ reference is an alias for an object. A Java reference doesn't behave like that.
Quite simply, avoid using pointers and dynamic allocation by new wherever possible. Use values, references and automatically allocated objects instead. Of course you can't always avoid dynamic allocation, but it should be a last resort, not a first.
Returning by value can introduce performance penalties because this means the object needs to be copied. If it is a large object, like a list, that operation might be very expensive.
But modern compilers are very good at avoiding this. The C++ standard explicitly allows the compiler to elide copies in certain circumstances. The particular case relevant to the example code you gave is called the 'return value optimization' (more precisely the named return value optimization, since t is a named local).
Personally, I return by (usually const) reference when I'm returning a member variable, and return some sort of smart pointer object (frequently ::std::auto_ptr, which was deprecated in C++11 and removed in C++17; ::std::unique_ptr is its modern replacement) when I need to dynamically allocate something. Otherwise I return by value.
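Since `std::auto_ptr` is gone from modern C++, here is a sketch of the same idea with `std::unique_ptr`, using a hypothetical `Shape` hierarchy where dynamic allocation is genuinely needed because the concrete type is chosen at run time. `std::make_unique` requires C++14:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};
struct Circle : Shape { std::string name() const override { return "circle"; } };
struct Square : Shape { std::string name() const override { return "square"; } };

// Ownership is transferred to the caller; the object is deleted
// automatically when the returned unique_ptr goes out of scope.
std::unique_ptr<Shape> makeShape(bool round) {
    if (round) return std::make_unique<Circle>();
    return std::make_unique<Square>();
}
```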
I also very frequently have const reference parameters, and this is very common in C++. This is a way of passing a parameter and saying "the function is not allowed to touch this". Basically a read-only parameter. It should only be used for objects that are more complex than a single integer or pointer though.
I think one big change from Java is that const is important and used very frequently. Learn to understand it and make it your friend.
I also think Neil's answer is correct in stating that avoiding dynamic allocation whenever possible is a good idea. You should not contort your design too much to make that happen, but you should definitely prefer design choices in which it doesn't have to happen.
Returning by value is a common thing practised in C++. However, when you are passing an object, you pass by reference.
Example
bool isTraderAllowed(const equity& trdobj);  // declared before main uses it

int main()
{
    equity trader;
    isTraderAllowed(trader);
    // ...
}

bool isTraderAllowed(const equity& trdobj)
{
    // ... Perform your function routine here.
}
The above is a simple example of passing an object by reference. In reality, you would have a method called isTraderAllowed for the class equity, but I was showing you a real use of passing by reference.
A point regarding passing by value or reference:
With optimizations enabled and the function inlined, a parameter declared as "const DataType objectName" (DataType can be anything, even a primitive) need not involve an object copy, as long as the copy has no observable side effects; and a parameter declared as "const DataType & objectName" or "DataType & objectName" (again, any DataType) need not involve taking an address or using a pointer. In both cases the compiler can use the input arguments directly in the generated assembly code.
A point regarding references:
A reference is not always a pointer. For instance, when you have the following code in the body of a function, the reference is not a pointer:
int adad=5;
int & reference=adad;
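A tiny sketch of the alias behaviour. Whether such a local reference occupies any storage at all is up to the compiler; with optimization on it typically does not. The function names are hypothetical:

```cpp
#include <cassert>

// adad and reference name the same int, so writing through
// the reference modifies adad itself.
int writeThroughReference() {
    int adad = 5;
    int& reference = adad;
    reference = 7;
    return adad;
}

// Taking the address of a reference yields the address of the
// object it refers to, so the two addresses always compare equal.
bool sameAddress() {
    int adad = 5;
    int& reference = adad;
    return &reference == &adad;
}
```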
A point regarding returning by value:
As some people have mentioned, with a good optimizing compiler, returning by value usually does not cause an extra copy (return value optimization), whatever the type.
A point regarding return by reference:
In the case of inline functions and optimizations, returning by reference need not involve taking an address or using a pointer.
I'm having a problem with a class like this:
class Sprite {
...
bool checkCollision(Sprite &spr);
...
};
So, if I have that class, I can do this:
ball.checkCollision(bar1);
But if I change the class to this:
class Sprite {
...
bool checkCollision(Sprite* spr);
...
};
I have to do this:
ball.checkCollision(&bar1);
So, what's the difference? Is one way better than the other?
Thank you.
In both cases you are actually passing the address of bar1 (and you're not copying the value), since both pointers (Sprite *) and references (Sprite &) have reference semantics. With pointers this is explicit: you have to explicitly dereference the pointer to manipulate the pointed-to object, and you have to explicitly pass the address of the object to a pointer parameter. With references it is implicit: when you manipulate a reference it's as if you're manipulating the object itself, so references have value syntax, and the caller's code doesn't explicitly pass a pointer using the & operator.
So, the big difference between pointers and references is on what you can do on the pointer/reference variable: pointer variables themselves can be modified, so they may be changed to point to something else, can be NULLed, incremented, decremented, etc, so there's a strong separation between activities on the pointer (that you access directly with the variable name) and on the object that it points to (that you access with the * operator - or, if you want to access to the members, with the -> shortcut).
References, instead, aim to be just an alias for the object they refer to, and do not allow changes to the reference itself: you initialize them with the object they refer to, and then they act as if they were that object for their whole life.
In general, in C++ references are preferred over pointers, for the reasons I gave and for some others that you can find in the appropriate section of the C++ FAQ.
In terms of performance, they should be the same, because a reference is actually a pointer in disguise; still, there may be some corner case in which the compiler may optimize more when the code uses a reference instead of a pointer, because references are guaranteed not to change the address they hide (i.e., from the beginning to the end of their life they always point to the same object), so in some strange case you may gain something in performance using references, but, again, the point of using references is about good programming style and readability, not performance.
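A sketch of both overloads side by side, assuming a hypothetical axis-aligned bounding-box `Sprite`; the parameters are made `const` because a collision check shouldn't modify the other sprite. The pointer overload has to decide what a null pointer means (here: no collision), while the reference overload has no such case:

```cpp
#include <cassert>

class Sprite {
public:
    Sprite(int x, int y, int w, int h) : x_(x), y_(y), w_(w), h_(h) {}

    // Reference version: spr always refers to a real Sprite.
    bool checkCollision(const Sprite& spr) const {
        return x_ < spr.x_ + spr.w_ && spr.x_ < x_ + w_ &&
               y_ < spr.y_ + spr.h_ && spr.y_ < y_ + h_;
    }

    // Pointer version: must handle nullptr before dereferencing.
    bool checkCollision(const Sprite* spr) const {
        return spr != nullptr && checkCollision(*spr);
    }

private:
    int x_, y_, w_, h_;
};
```

Call sites then read `ball.checkCollision(bar1)` for the reference overload and `ball.checkCollision(&bar1)` for the pointer overload, exactly as in the question.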
A reference cannot be null. A pointer can.
If you don't want to allow passing null pointers into your function then use a reference.
With the pointer you need to explicitly tell the compiler that you want to pass the address of the object; with a reference, the compiler takes the address for you. Both are OK and it's a matter of taste. I personally don't like references because I like to see what's going on, but that's just me.
They both do the (essentially) same thing - they pass an object to a function by reference so that only the address of the object is copied. This is efficient and means the function can change the object.
In the simple case you give they are equivalent.
The main difference is that a reference cannot be null, so you don't have to test for null in the function; but you also cannot pass "no object" when that case is valid.
Some people also dislike the pass by reference version because it is not obvious in the calling code that the object you pass in might be modified. Some coding standards recommend you only pass const references to functions.
Suppose I want to pass a modifiable parameter to a function: should I pass it by pointer or by reference?
bool GetFoo ( Foo& whereToPlaceResult );
bool GetFoo ( Foo* whereToPlaceResult );
I am asking because I always considered passing by reference (1) to be best practice, but after examining a local code base, I concluded that the most common way is (2). Moreover, the man himself (Bjarne Stroustrup) recommends using (2). What are the [dis]advantages of (1) and (2), or is it just a matter of personal taste?
I prefer a reference instead of a pointer when:
It can't be null
It can't be changed (to point to something else)
It mustn't be deleted (by whoever receives the pointer)
Some people say though that the difference between a reference and a const reference is too subtle for many people, and is invisible in the code which calls the method (i.e., if you read the calling code which passes a parameter by reference, you can't see whether it's a const or a non-const reference), and that therefore you should make it a pointer (to make it explicit in the calling code that you're giving away the address of your variable, and that therefore the value of your variable may be altered by the callee).
I personally prefer a reference, for the following reasons:
[1.] I think that a routine should know what subroutine it's calling.
[2.] A subroutine shouldn't assume anything about what routine it's being called from.
[1.] implies that making the mutability visible to the caller doesn't matter much, because the caller should already (by other means) understand what the subroutine does (including the fact that it will modify the parameter).
[2.] implies that if it's a pointer then the subroutine should handle the possibility of the parameter's being a null pointer, which may be extra and IMO useless code.
Furthermore, whenever I see a pointer I think, "who's going to delete this, and when?", so whenever/wherever ownership/lifetime/deletion isn't an issue I prefer to use a reference.
For what it's worth I'm in the habit of writing const-correct code: so if I declare that a method has a non-const reference parameter, the fact that it's non-const is significant. If people weren't writing const-correct code then maybe it would be harder to tell whether a parameter will be modified in a subroutine, and the argument for another mechanism (e.g. a pointer instead of a reference) would be a bit stronger.
Advantages to passing by reference:
Forces user to supply a value.
Less error-prone: Handles pointer dereferencing itself. Don't have to check for null inside.
Makes the calling code look much cleaner.
Advantages to passing pointer by value:
Allows null to be passed for "optional" parameters. Kinda an ugly hack, but sometimes useful.
Forces caller to know what is being done w/ the parameter.
Gives the reader half a clue of what might be being done w/ the parameter without having to read the API.
Since reference passing is in the language, any non-pointer parameters might be getting modified too, and you can't assume that pointer parameters are being modified either: I've seen APIs where they are treated as constants. So pointer passing doesn't really give readers any info that they can count on. For some people that might be good enough, but for me it isn't.
Really, pointer passing is just an error-prone messy hack leftover from C which had no other way to pass values by reference. C++ has a way, so the hack is no longer needed.
One advantage to passing by reference is that they cannot be null (unlike pointers), obviating the need to null-check every out parameter.
I'd recommend that you consider (may not be best for every situation) returning Foo from the function rather than modifying a parameter. Your function prototype would look like this:
Foo GetFoo() // const (if a member function)
As you appear to be returning a success/failure flag, using an exception might be a better strategy.
Advantages:
You avoid all of the pointer/reference issues
Simplifies life for the caller. Can pass the return value to other functions without using a local variable, for example.
Caller cannot ignore error status if you throw an exception.
Return value optimization means that it may be as efficient as modifying a parameter.
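A sketch of that alternative, assuming a hypothetical `Foo` and an integer key added so that the function has something to look up; failure is reported with `std::invalid_argument` rather than a `bool`:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct Foo { std::string name; };

// Returns the result directly; the caller cannot silently ignore a failure.
Foo GetFoo(int key) {
    if (key < 0) throw std::invalid_argument("no Foo for a negative key");
    return Foo{"foo-" + std::to_string(key)};
}
```

The result can be passed straight on, e.g. `use(GetFoo(3))`, without declaring a local variable first.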
I choose #2 because it is obvious at the point of call that the parameter will be changed.
GetFoo(&var) rather than GetFoo(var)
I prefer pass by reference for just const references, where I am trying to avoid a copy constructor call.
Pass by reference, and avoid the whole NULL pointer problem.
I seem to recall that in C++ references were not null and pointers could be. I've not done C++ for a long time, though, so my memory could be rusty.
The difference here is relatively minor.
A reference cannot be NULL.
A nullpointer may be passed.
Thus you can check if that happens and react accordingly.
I personally can't think of a real advantage of one of the two possibilities.
I find this a matter of personal taste. I actually prefer to pass by reference because pointers give more freedom but they also tend to cause a lot of problems.
The benefit to a pointer is that you can pass nothing, ie. use it as if the parameter was completely optional and not have a variable the caller passes in.
References are otherwise safer: if you have one, it's guaranteed to exist and be writable (unless it's const, of course).
I think it's a matter of preference otherwise, but I don't like mixing the two, as I think it makes maintenance and readability of your code harder (especially as your 2 functions look the same to the caller).
These days I use const references for input parameters and pointers for out parameters. FWIW, the Google C++ Style Guide recommends the same approach (not that I always agree with their style guide; for instance, they don't use exceptions, which usually does not make much sense).
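A sketch of that convention with a hypothetical lookup function: inputs arrive by const reference, the result goes out through a pointer, and the `&r` at the call site signals that `r` may be written. Tolerating a null out-pointer is an extra design choice here, not part of the convention:

```cpp
#include <cassert>
#include <map>
#include <string>

// Inputs by const reference, output by pointer.
bool GetFoo(const std::map<std::string, int>& table,
            const std::string& key, int* result) {
    auto it = table.find(key);
    if (it == table.end()) return false;
    if (result) *result = it->second;   // caller may pass nullptr to just test existence
    return true;
}
```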
My preference is a reference. First, because it rhymes. :) Also because of the issues pointed out by other answers: no need to dereference, and no possibility of a reference being NULL. Another reason, which I have not seen mentioned, is that when you see a pointer you cannot be sure whether or not it points to dynamically allocated memory, and you may be tempted to call delete on it. A reference, on the other hand, dispenses with any ambiguity regarding memory management.
Having said that, there are of course many cases when passing a pointer is preferable, or even necessary. If you know in advance that the parameter is optional, then allowing it to be NULL is very useful. Similarly, you may know in advance that the parameter is always dynamically allocated and have the memory management all worked out.
What is the best practice for returning references from class methods? Is it the case that you want to return basic types without a reference, whereas you want to return class objects by reference? Are there any articles or best-practice guides that you recommend?
I'll assume that by class method you mean member function. And that by "return by reference" you mean "return reference to member data". This is mainly as opposed to returning a reference to local, which is clearly wrong.
When should you return a reference to member data, and when the data itself ?
By default, you should be returning the data itself (aka "by value"). This avoids several problems with returning a reference:
Users storing the reference and becoming dependent on the lifetime of your members, without considering how long the containing object (your object) will live. This leads to dangling references.
User code becoming dependent on the exact return type. For example, you use a vector<T> for the implementation (and that's what your getter returns). User code like "vector<T> foo = obj.getItems()" appears. Then you change your implementation (and getter) to use a deque<T>, and user code breaks. If you had been returning by value, you could simply make the getter create a local vector, copy the data over from the member deque, and return the result. That's quite reasonable for small collections. [*]
So when should you return a reference instead?
You can consider it when the returned object is huge (Image) or non-copyable (boost::signal). But, as always, you can instead opt for the more OOP pattern of having your class do stuff rather than have stuff hanging from it. In the Image case, you can provide a drawCircle member function, rather than returning Image& and having your users draw a circle on it.
When your data is logically owned by your user, and you're just holding it for him. Consider std collections: vector<T>::operator[] returns a reference to T because that's what I want to get at: my exact object, not a copy of it.
[*] There is a better way to ensure future-proof code. Rather than returning a vector (by ref or by value), return a pair of iterators to your vector: a beginning one and an ending one. This lets your users do everything they normally do with a deque or a vector, independent of the actual container. Boost provides boost::iterator_range (part of Boost.Range) for this purpose. As a perk, it also overloads operator[] for random-access iterators, so you can even write "int i = obj.getItems()[5]" rather than "int i = obj.getItems().begin()[5]".
This solution is generalizable to any situation which allows you to treat types generically. For example, if you keep a Dog member but your users only need to know it's an Animal (because they only call eat() and sleep()), return an Animal reference/pointer to a freestore-allocated copy of your dog. Then when you decide dogs are weaklings and you really need a wolf for the implementation, user code won't break.
This sort of information-hiding does more than ensure future-compatibility. It also helps keep your design clean.
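A hand-rolled sketch of the iterator-pair idea, so it does not depend on Boost; `ItemRange` and `Inventory` are hypothetical names (Boost.Range's `boost::iterator_range` is the ready-made version). Switching the member from `std::deque<int>` to `std::vector<int>` changes only the private `Storage` alias, and callers written against `auto` or `Inventory::Items` keep compiling. `decltype(auto)` requires C++14:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Minimal read-only view over a pair of random-access iterators.
template <typename It>
class ItemRange {
public:
    ItemRange(It first, It last) : first_(first), last_(last) {}
    It begin() const { return first_; }
    It end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    // Random access, so obj.getItems()[5] works.
    decltype(auto) operator[](std::ptrdiff_t i) const { return first_[i]; }

private:
    It first_;
    It last_;
};

class Inventory {
    using Storage = std::deque<int>;   // implementation detail; could be std::vector<int>
public:
    using Items = ItemRange<Storage::const_iterator>;
    void add(int value) { items_.push_back(value); }
    Items getItems() const { return Items(items_.begin(), items_.end()); }

private:
    Storage items_;
};
```

As with returning a reference, the range is only valid while the `Inventory` is alive and unmodified.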
Overloading assignment operators (like =, +=, -= etc.) is a good example where returning by reference makes a lot of sense. These operators conventionally return *this, the object that was just assigned to, so returning a reference avoids a copy, allows chaining such as a = b = c, and doesn't force callers to deal with pointers. It works like a pointer and looks like returning by value.
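A sketch with a hypothetical `Vec2` type: `operator+=` returns `*this` by reference, so no copy is made and calls can be chained; the compiler-generated copy assignment follows the same convention, which is what makes `d = e = a` work:

```cpp
#include <cassert>

struct Vec2 {
    double x = 0, y = 0;

    // Returns the left-hand operand itself, not a copy of it.
    Vec2& operator+=(const Vec2& other) {
        x += other.x;
        y += other.y;
        return *this;
    }
};
```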
A reference is a pointer in disguise (so it's 4 bytes on 32 bit machines, and 8 bytes on 64 bit machines). So the rule of thumb is: if copying the object is more expensive than returning a pointer, use a pointer (or reference, since that's the same thing).
Which types are more expensive to copy depends on the architecture, compiler, the type itself etc. In some cases copying an object that's 16 bytes can be faster than returning a pointer to it (for example, if an object maps to SSE register, or similar situation).
Now, of course, returning a reference to a local variable does not make sense. Because the local variable will be gone after the function exits. So usually you'd return references/pointers to member variables, or global/static variables, or dynamically allocated objects.
There are situations where you don't want to return pointer/reference to an object, even if copying the object is expensive. Mostly when you don't want to tie the calling code into the lifetime of the original object.
Scott Meyers' book, Effective C++, has several items related to this topic. I would definitely check out the item titled, "Don't try to return a reference when you must return an object." This is item #23 in the 1st or 2nd editions, or #21 in the 3rd edition.
I would recommend staying away from returning references for the same reason as Iraimbilanja points out, but in my opinion you can get very good results by using shared pointers (e.g., boost, tr1) on the member data and use those in return. That way you do not have to copy the object but can still manage life time issues.
class Foo
{
private:
shared_ptr<Bar> _bar;
public:
shared_ptr<Bar> getBar() {return _bar;}
};
Usually the cost of copying Bar is greater than the cost of copying a shared_ptr; if that is not the case, shared_ptr can still be worth using for lifetime management.
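A sketch of that approach with `std::shared_ptr` (the `Bar` contents are hypothetical). Returning the shared pointer copies only the pointer and bumps the reference count, so the `Bar` outlives its `Foo` if a caller still holds it. If callers shouldn't modify `Bar`, returning `std::shared_ptr<const Bar>` is a reasonable variant:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Bar { std::string label; };

class Foo {
public:
    Foo() : bar_(std::make_shared<Bar>(Bar{"bar"})) {}

    // Shares ownership with the caller instead of copying Bar.
    std::shared_ptr<Bar> getBar() const { return bar_; }

private:
    std::shared_ptr<Bar> bar_;
};
```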
if NULL is a possible return value, the method must return a pointer because you can't return a reference to NULL.
Return basic types by value, except if you want to let the caller access the actual member.
Return class objects (even std::string) by reference.