Enforce safe use of class containing reference or raw pointer - c++

Suppose we have a class that looks like the following.
class DoStuffWithRef
{
public:
explicit DoStuffWithRef(LargeObject& lo) : lo_(lo) {}
// a bunch of member functions, some of them useful
// [...]
private:
LargeObject& lo_; // non-owning: someone else must keep the LargeObject alive
};
The class is designed such that the only sane use is when at least one other entity has ownership of the object that lo_ references. If client code uses the class appropriately, there's no need for DoStuffWithRef to have ownership of the LargeObject.
Is there a way to enforce this usage or signal an error if the class is being misused? Can anything be done besides documenting the intended use of DoStuffWithRef?
For example, an automatically stored DoStuffWithRef might refer to an automatically stored LargeObject.
void foo()
{
LargeObject lo;
DoStuffWithRef dswr(lo);
// some code that makes use of the DoStuffWithRef instance
return;
}
This is the primary usage I have in mind although there are other possible cases where someone else might be guaranteed to have ownership.
The problem is that it's very possible for client code to create a DoStuffWithRef instance that will end up with a dangling reference. If no one else has ownership of a LargeObject referred to by a DoStuffWithRef, then chaos ensues when the DoStuffWithRef tries to access the LargeObject. Worse yet, chaos may ensue only long after the error has been made.
When this has come up in the past, I've used a boost::shared_ptr<> instead of a reference (this was in C++03). That doesn't quite express the semantics of the class appropriately. The premise is that DoStuffWithRef doesn't need ownership -- it is meant to act on objects owned by others and if no one else owns the object then the functionality provided by DoStuffWithRef makes no sense. Giving ownership to DoStuffWithRef has unnecessary overhead and forces shared ownership semantics. weak_ptr<> would also have the wrong semantics because we shouldn't be asking at runtime if the pointer is valid.
I've never had this scenario come up in a performance sensitive section of code or in a case where shared ownership was a significant burden so it just hasn't really mattered but I wish I knew of a way to accurately express this in C++. The costs are small but they're also unnecessary. More importantly, a shared_ptr<> will hide incorrect use of the class. If a DoStuffWithRef, modified to use a shared_ptr<>, is the only remaining owner of a LargeObject, then the dangling reference has been averted in the short term but the client code ends up in uncharted territory.

Is DoStuffWithRef otherwise stateful? If not, it seems like it is a bucket of utility functions. The reference to LargeObject could just as well be passed as an argument to each function. That also resolves the question of ownership.
Otherwise, if it is stateful, this is a clear case of shared ownership, since DoStuffWithRef really does need to extend the lifetime of a single live instance of LargeObject to that of itself. Or, use a weak pointer and eat the cost of the runtime check...
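A minimal sketch of the stateless alternative, assuming a hypothetical LargeObject with a single data member; the namespace and function names are illustrative only and do not come from the question. Each call borrows the object only for its own duration, so no reference can outlive the object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the LargeObject in the question.
struct LargeObject {
    std::vector<int> data;
};

// Utility functions borrow the object per call instead of storing a reference.
namespace do_stuff {

inline std::size_t count_positive(const LargeObject& lo) {
    std::size_t n = 0;
    for (int v : lo.data) {
        if (v > 0) ++n;
    }
    return n;
}

inline void clamp_negative_to_zero(LargeObject& lo) {
    for (int& v : lo.data) {
        if (v < 0) v = 0;
    }
}

}  // namespace do_stuff
```

With this shape there is no object whose lifetime could exceed that of the LargeObject, so the ownership question does not arise.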

Related

How to enable Rust Ownership paradigm in C++

The system programming language Rust uses the ownership paradigm to ensure at compile time with zero cost for the runtime when a resource has to be freed.
In C++ we commonly use smart pointers to achieve the same goal of hiding the complexity of managing resource allocation. There are a couple of differences though:
In Rust there is always only one owner, whereas C++ shared_ptr can easily leak ownership.
In Rust we can borrow references we do not own, whereas C++ unique_ptr cannot be shared in a safe way via weak_ptr and lock().
Reference counting of shared_ptr is costly.
My question is: How can we emulate the ownership paradigm in C++ within the following constraints:
Only one owner at any time
Possibility to borrow a pointer and use it temporarily without fear of the resource going out of scope (observer_ptr is useless for this)
As much compile-time checks as possible.
Edit: Given the comments so far, we can conclude:
No compile-time support for this (I was hoping for some decltype/template magic unknown to me) in the compilers. Might be possible using static analysis elsewhere (taint?)
No way to get this without reference counting.
No standard implementation to distinguish shared_ptrs with owning or borrowing semantic
Could roll your own by creating wrapper types around shared_ptr and weak_ptr:
owned_ptr: non-copyable, move-semantics, encapsulates shared_ptr, access to borrowed_ptr
borrowed_ptr: copyable, encapsulates weak_ptr, lock method
locked_ptr: non-copyable, move-semantics, encapsulates shared_ptr from locking weak_ptr
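The three wrappers listed above might look roughly like the following. This is only a sketch under the assumptions in the list: none of these types exist in the standard library, and the member names (borrow, lock) are chosen for illustration. Note that a live locked_ptr still keeps the object alive after the owned_ptr is gone, so "single owner" remains partly a convention.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical wrappers; none of these names exist in the standard library.

// Keeps the object alive for as long as the lock exists. Move-only.
template <typename T>
class locked_ptr {
public:
    explicit locked_ptr(std::shared_ptr<T> p) : p_(std::move(p)) {}
    locked_ptr(locked_ptr&&) = default;
    locked_ptr& operator=(locked_ptr&&) = default;
    locked_ptr(const locked_ptr&) = delete;
    locked_ptr& operator=(const locked_ptr&) = delete;

    explicit operator bool() const { return static_cast<bool>(p_); }
    T& operator*() const { return *p_; }
    T* operator->() const { return p_.get(); }
private:
    std::shared_ptr<T> p_;
};

// Copyable, non-owning handle; must be locked before use.
template <typename T>
class borrowed_ptr {
public:
    explicit borrowed_ptr(std::weak_ptr<T> w) : w_(std::move(w)) {}
    locked_ptr<T> lock() const { return locked_ptr<T>(w_.lock()); }
private:
    std::weak_ptr<T> w_;
};

// Single owner. Move-only.
template <typename T>
class owned_ptr {
public:
    explicit owned_ptr(T value) : p_(std::make_shared<T>(std::move(value))) {}
    owned_ptr(owned_ptr&&) = default;
    owned_ptr& operator=(owned_ptr&&) = default;
    owned_ptr(const owned_ptr&) = delete;
    owned_ptr& operator=(const owned_ptr&) = delete;

    borrowed_ptr<T> borrow() const { return borrowed_ptr<T>(p_); }
    T& operator*() const { return *p_; }
private:
    std::shared_ptr<T> p_;
};
```

A borrower calls lock() and checks the result before dereferencing; once the owned_ptr and all locks are gone, lock() yields an empty locked_ptr.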
You can't do this with compile-time checks at all. The C++ type system is lacking any way to reason about when an object goes out of scope, is moved, or is destroyed — much less turn this into a type constraint.
What you could do is have a variant of unique_ptr that keeps a counter of how many "borrows" are active at run time. Instead of get() returning a raw pointer, it would return a smart pointer that increments this counter on construction and decrements it on destruction. If the unique_ptr is destroyed while the count is non-zero, at least you know someone somewhere did something wrong.
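As a rough illustration of that idea, here is a sketch of an owner that counts active borrows. CountedOwner and Borrow are hypothetical names, the check is a plain assert (so it disappears with NDEBUG), and the owner is deliberately neither copyable nor movable so that borrows can safely point back at it. It is single-threaded only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <typename T> class CountedOwner;

// Hypothetical borrow handle: bumps the owner's counter for its lifetime.
template <typename T>
class Borrow {
public:
    explicit Borrow(CountedOwner<T>& owner) : owner_(&owner) { ++owner_->borrows_; }
    Borrow(const Borrow& other) : owner_(other.owner_) { ++owner_->borrows_; }
    Borrow& operator=(const Borrow&) = delete;
    ~Borrow() { --owner_->borrows_; }

    T& operator*() const { return *owner_->ptr_; }
    T* operator->() const { return owner_->ptr_.get(); }
private:
    CountedOwner<T>* owner_;
};

// Hypothetical unique owner that knows how many borrows are outstanding.
template <typename T>
class CountedOwner {
public:
    explicit CountedOwner(std::unique_ptr<T> p) : ptr_(std::move(p)) {}
    CountedOwner(const CountedOwner&) = delete;
    CountedOwner& operator=(const CountedOwner&) = delete;

    // Destroying the owner while borrows are active is a usage error.
    ~CountedOwner() { assert(borrows_ == 0 && "owner destroyed while borrowed"); }

    Borrow<T> borrow() { return Borrow<T>(*this); }
    std::size_t active_borrows() const { return borrows_; }
private:
    friend class Borrow<T>;
    std::unique_ptr<T> ptr_;
    std::size_t borrows_ = 0;
};
```

The count only reports misuse after the fact; it does not prevent someone from taking the address of the borrowed object and keeping it.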
However, this is not a fool-proof solution. Regardless of how hard you try to prevent it, there will always be ways to get a raw pointer to the underlying object, and then it's game over, since that raw pointer can easily outlive the smart pointer and the unique_ptr. It will even sometimes be necessary to get a raw pointer, to interact with an API that requires raw pointers.
Moreover, ownership is not about pointers. Box/unique_ptr allows you to heap allocate an object, but it changes nothing about ownership, life time, etc. compared to putting the same object on the stack (or inside another object, or anywhere else really). To get the same mileage out of such a system in C++, you'd have to make such "borrow counting" wrappers for all objects everywhere, not just for unique_ptrs. And that is pretty impractical.
So let's revisit the compile time option. The C++ compiler can't help us, but maybe lints can? Theoretically, if you implement the whole life time part of the type system and add annotations to all APIs you use (in addition to your own code), that may work.
But it requires annotations for all functions used in the whole program, including private helper functions of third-party libraries, functions for which no source code is available, and functions whose implementations are too complicated for the linter to understand (from Rust experience, sometimes the reasons something is safe are too subtle to express in the static model of lifetimes, and it has to be written slightly differently to help the compiler). For the last two, the linter can't verify that the annotation is indeed correct, so you're back to trusting the programmer. Additionally, some APIs (or rather, the conditions for when they are safe) can't really be expressed very well in the lifetime system as Rust uses it.
In other words, a complete and practically useful linter for this would be substantial original research with the associated risk of failure.
Maybe there is a middle ground that gets 80% of the benefits with 20% of the cost, but since you want a hard guarantee (and honestly, I'd like that too), tough luck. Existing "good practices" in C++ already go a long way to minimizing the risks, by essentially thinking (and documenting) the way a Rust programmer does, just without compiler aid. I'm not sure if there is much improvement over that to be had considering the state of C++ and its ecosystem.
tl;dr Just use Rust ;-)
What follows are some examples of ways people have tried to emulate parts of Rust's ownership paradigm in C++, with limited success:
Lifetime safety: Preventing common dangling. The most thorough and rigorous approach, involving several additions to the language to support the necessary annotations. If the effort is still alive (last commit was in 2019), getting this analysis added to a mainstream compiler is probably the most likely route to "borrow checked" C++. Discussed on IRLO.
Borrowing Trouble: The Difficulties Of A C++ Borrow-Checker
Is it possible to achieve Rust's ownership model with a generic C++ wrapper?
C++Now 2017: Jonathan Müller “Emulating Rust's borrow checker in C++" (video) and associated code, about which the author says, "You're not actually supposed to use that, if you need such a feature, you should use Rust."
Emulating the Rust borrow checker with C++ move-only types and part II (which is actually more like emulating RefCell than the borrow checker, per se)
I believe you can get some of the benefits of Rust by enforcing some strict coding conventions (which is after all what you'd have to do anyway, since there's no way with "template magic" to tell the compiler not to compile code that doesn't use said "magic"). Off the top of my head, the following could get you...well...kind of close, but only for single-threaded applications:
Never use new directly; instead, use make_unique. This goes partway toward ensuring that heap-allocated objects are "owned" in a Rust-like manner.
"Borrowing" should always be represented via reference parameters to function calls. Functions that take a reference should never create any sort of pointer to the referred-to object. (It may in some cases be necessary to use a raw pointer as a parameter instead of a reference, but the same rule should apply.)
Note that this works for objects on the stack or on the heap; the function shouldn't care.
Transfer of ownership is, of course, represented via R-value references (&&) and/or R-value references to unique_ptrs.
Unfortunately, I can't think of any way to enforce Rust's rule that mutable references can only exist anywhere in the system when there are no other extant references.
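The conventions above could look like this in practice. This is a sketch, assuming C++14 for std::make_unique; Document and the function names are hypothetical. Ownership transfer is shown by taking the unique_ptr by value, which is one common way to express it; nothing here is enforced by the compiler beyond the move-only nature of unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resource type used only for illustration.
struct Document {
    std::string text;
};

// "Borrow": takes a reference, uses it, and keeps no pointer to it.
inline std::size_t length_of(const Document& doc) { return doc.text.size(); }

// "Mutable borrow": same rule, but allowed to modify.
inline void append_line(Document& doc, const std::string& line) {
    doc.text += line;
    doc.text += '\n';
}

// "Transfer of ownership": the caller must std::move the unique_ptr in.
inline std::size_t consume(std::unique_ptr<Document> doc) {
    return doc->text.size();  // doc is destroyed when this function returns
}

inline std::size_t conventions_demo() {
    auto doc = std::make_unique<Document>();  // never a bare new
    append_line(*doc, "hello");
    std::size_t before = length_of(*doc);
    std::size_t after = consume(std::move(doc));
    assert(doc == nullptr);  // a moved-from unique_ptr is guaranteed to be null
    return before + after;
}
```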
Also, for any kind of parallelism, you would need to start dealing with lifetimes, and the only way I can think of to permit cross-thread lifetime management (or cross-process lifetime management using shared memory) would be to implement your own "ptr-with-lifetime" wrapper. This could be implemented using shared_ptr, because here, reference-counting would actually be important; it's still a bit of unnecessary overhead, though, because reference-count blocks actually have two reference counters (one for all the shared_ptrs pointing to the object, another for all the weak_ptrs). It's also a little... odd, because in a shared_ptr scenario, everybody with a shared_ptr has "equal" ownership, whereas in a "borrowing with lifetime" scenario, only one thread/process should actually "own" the memory.
I think one could add a degree of compile-time introspection and custom sanitisation by introducing custom wrapper classes that track ownership and borrowing.
The code below is a hypothetical sketch, and not a production solution which would need a lot more tooling, e.g. #def out the checks when not sanitising. It uses a very naive lifetime checker to 'count' borrow errors in ints, in this instance during compilation. static_asserts are not possible as the ints are not constexpr, but the values are there and can be interrogated before runtime. I believe this answers your 3 constraints, regardless of whether these are heap allocations, so I'm using a simple int type to demo the idea, rather than a smart pointer.
Try uncommenting the use cases in main() below (run in compiler explorer with -O3 to see boilerplate optimise away), and you'll see the warning counters change.
https://godbolt.org/z/Pj4WMr
// Hypothetical Rust-like owner / borrow wrappers in C++
// This wraps types with data which is compiled away in release
// It is not possible to static_assert, so this uses static ints to count errors.
#include <utility>
// Statics to track errors. Ideally these would be static_asserts
// but they depend on Owner::has_been_moved which changes during compilation.
static int owner_already_moved = 0;
static int owner_use_after_move = 0;
static int owner_already_borrowed = 0;
// This method exists to ensure static errors are reported in compiler explorer
int get_fault_count() {
return owner_already_moved + owner_use_after_move + owner_already_borrowed;
}
// Storage for ownership of a type T.
// Equivalent to mut usage in Rust
// Disallows move by value, instead ownership must be explicitly moved.
template <typename T>
struct Owner {
Owner(T v) : value(v) {}
Owner(Owner<T>& ov) = delete;
Owner(Owner<T>&& ov) {
if (ov.has_been_moved) {
owner_already_moved++;
}
value = std::move(ov.value);
ov.has_been_moved = true;
}
T& operator*() {
if (has_been_moved) {
owner_use_after_move++;
}
return value;
}
T value;
bool has_been_moved{false};
};
// Safely borrow a value of type T
// Implicit construction from Owner of same type to check borrow is safe
template <typename T>
struct Borrower {
Borrower(Owner<T>& v) : value(v.value) {
if (v.has_been_moved) {
owner_already_borrowed++;
}
}
const T& operator*() const {
return value;
}
T value;
};
// Example of function borrowing a value, can only read const ref
static void use(Borrower<int> v) {
(void)*v;
}
// Example of function taking ownership of value, can mutate via owner ref
static void use_mut(Owner<int> v) {
*v = 5;
}
int main() {
// Rather than just 'int', Owner<int> tracks the lifetime of the value
Owner<int> x{3};
// Borrowing value before mutating causes no problems
use(x);
// Mutating value passes ownership, has_been_moved set on original x
use_mut(std::move(x));
// Uncomment for owner_already_borrowed = 1
//use(x);
// Uncomment for owner_already_moved = 1
//use_mut(std::move(x));
// Uncomment for another owner_already_borrowed++
//Borrower<int> y = x;
// Uncomment for owner_use_after_move = 1;
//return *x;
}
The use of static counters is obviously not desirable, but it is not possible to use static_assert as owner_already_moved is non-const. The idea is these statics give hints to errors appearing, and in final production code they could be #defed out.
You can use an enhanced version of a unique_ptr (to enforce a unique owner) together with an enhanced version of observer_ptr (to get a nice runtime exception for dangling pointers, i.e. if the original object maintained through unique_ptr went out of scope). The Trilinos package implements this enhanced observer_ptr, they call it Ptr. I have implemented the enhanced version of unique_ptr here (I call it UniquePtr): https://github.com/certik/trilinos/pull/1
Finally, if you want the object to be stack allocated, but still be able to pass safe references around, you need to use the Viewable class, see my initial implementation here: https://github.com/certik/trilinos/pull/2
This should allow you to use C++ just like Rust for pointers, except that in Rust you get a compile time error, while in C++ you get a runtime exception. Also, it should be noted, that you only get a runtime exception in Debug mode. In Release mode, the classes do not do these checks, so they are as fast as in Rust (essentially as fast as raw pointers), but then they can segfault. So one has to make sure the whole test suite runs in Debug mode.

Suicide object implementation leveraging `std::weak_ptr`

I'm considering using "suicide objects" to model entities in a game, that is, objects able to delete themselves. Now, the usual C++03 implementation (plain old delete this) does nothing for other objects potentially referring to the suicide object, which is why I'm using std::shared_ptr and std::weak_ptr.
Now for the code dump :
#include <memory>
#include <iostream>
#include <cassert>
struct SuObj {
SuObj() { std::cout << __func__ << '\n'; }
~SuObj() { std::cout << __func__ << '\n'; }
void die() {
ptr.reset();
}
static std::weak_ptr<SuObj> create() {
std::shared_ptr<SuObj> obj = std::make_shared<SuObj>();
return (obj->ptr = std::move(obj));
}
private:
std::shared_ptr<SuObj> ptr;
};
int main() {
std::weak_ptr<SuObj> obj = SuObj::create();
assert(!obj.expired());
std::cout << "Still alive\n";
obj.lock()->die();
assert(obj.expired());
std::cout << "Deleted\n";
return 0;
}
Question
This code appears to work fine. However, I'd like to have someone else's eye to gauge it. Does this code make sense ? Did I blindly sail into undefined lands ? Should I drop my keyboard and begin art studies right now ?
I hope this question is sufficiently narrowed down for SO. Seemed a bit tiny and low-level for CR.
Minor precision
I do not intend to use this in multithreaded code. If the need ever arises, I'll be sure to reconsider the whole thing.
When you have shared_ptr based object lifetime, the lifetime of your object is the "lifetime" of the union of the shared_ptrs who own it collectively.
In your case, you have an internal shared_ptr, and your object will not die until that internal shared_ptr expires.
However, this does not mean you can commit suicide. If you remove that last reference, your object continues to exist if anyone has .lock()'d the weak_ptr and stored the result. As this is the only way you can access the object externally, it may happen (see note 1).
In short, die() can fail to kill the object. It might better be called remove_life_support(), as something else could keep the object alive after said life support is removed.
Other than that, your design works.
Note 1: You could say "well, then callers should just not keep the shared_ptr around" -- but that doesn't work, as the check that the object is valid is only valid as long as the shared_ptr persists. Plus, by exposing the way to create shared_ptr, you have no type guarantees that the client code won't store them (accidentally or on purpose).
A transaction based model (where you pass a lambda in, and it operates on it internally) could help with this if you want seriously paranoid robustness.
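One possible reading of the transaction-based model, as a sketch: a hypothetical Handle type wraps the weak_ptr and only exposes the object inside a callback, so client code never holds a shared_ptr it could store. The names Handle and with are assumptions, and the sketch is single-threaded like the question.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical handle: clients never see a shared_ptr, so they cannot store one.
template <typename T>
class Handle {
public:
    explicit Handle(std::weak_ptr<T> w) : w_(std::move(w)) {}

    // Runs f on the object if it is still alive; returns false otherwise.
    // The temporary shared_ptr lives only for the duration of this call.
    template <typename F>
    bool with(F&& f) const {
        if (std::shared_ptr<T> p = w_.lock()) {
            std::forward<F>(f)(*p);
            return true;
        }
        return false;
    }
private:
    std::weak_ptr<T> w_;
};
```

If the callback removes the object's last other owner, the object is destroyed when with() returns rather than in the middle of the callback. A callback can still leak the object's address, so this narrows the window for misuse rather than closing it.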
Or you can live with the object sometimes living too long.
Consider hiding these messy details behind a Regular Type (or almost-regular) that has a pImpl to the nasty memory management problem. That pImpl could be a weak_ptr with the above semantics.
Then users of your code need only interact with the Regular (or pseudoRegular) wrapper.
If you don't want cloning to be easy, disable copy construction/assignment and only expose move.
Now your nasty memory management is hiding behind a facade, and if you decide you did it all wrong the external pseudoRegular interface can have different guts.
Regular type in C++11
Not a direct answer but potentially useful information.
In Chromium codebase there is a concept of exactly what you are trying to achieve. They call it WeakPtrFactory. Their solution cannot be directly taken into your code since they have their own implementation of e.g. shared_ptr and weak_ptr but design wise it can be of use to you.
I tried implementing it and found that the problem of double deletion can be solved by passing a custom empty deleter to the inner shared_ptr. From that point on, neither the shared_ptrs created from the weak_ptr nor the inner shared_ptr will be able to call the destructor (again) on your object.
The only problem left to solve is what happens if your object is deleted while a shared_ptr to it is kept somewhere else. From what I can see, no magic solves this; it requires designing the whole project so that it simply never happens, e.g. by using shared_ptr only in local scope and ensuring that a given set of operations (creating the suicide object, using it, ordering its suicide) can only be performed in the same thread.
I understand you're trying to create a minimal example for SO, but I see a few challenges you'll want to consider:
You have a public constructor and destructor, so technically there's no guarantee that the create() method is always used.
You could make those protected or private but that decision would interfere with use with std algorithms and containers.
This doesn't guarantee that the object will actually destruct because as long as someone has a shared_ptr it's going to exist. That may or may not be a problem for your use case, but because of that I don't think this will add as much value as you're hoping.
This is likely going to be confusing and counter-intuitive to other developers. It might make maintenance harder, even if your intent is to make it easier. That's a bit of a value judgement, but I'd encourage you to consider if it's truly easier to manage.
I commend you for putting thought into memory management up front. Disciplined use of shared_ptr and weak_ptr will help with your memory management issues -- I'd counsel against trying to have the instance try to manage its own lifecycle.
As for art studies... I'd only recommend that if that's truly your passion! Good luck!

Is it alright to return a reference to a non-pointer member variable as a pointer?

I recently came across some C++ code that looked like this:
class SomeObject
{
private:
// NOT a pointer
BigObject foobar;
public:
BigObject * getFoobar() const
{
return &foobar;
}
};
I asked the programmer why he didn't just make foobar a pointer, and he said that this way he didn't have to worry about allocating/deallocating memory. I asked if he considered using some smart pointer, he said this worked just as well.
Is this bad practice? It seems very hackish.
That's perfectly reasonable, and not "hackish" in any way; although it might be considered better to return a reference to indicate that the object definitely exists. A pointer might be null, and might lead some to think that they should delete it after use.
The object has to exist somewhere, and existing as a member of an object is usually as good as existing anywhere else. Adding an extra level of indirection by dynamically allocating it separately from the object that owns it makes the code less efficient, and adds the burden of making sure it's correctly deallocated.
Of course, the member function can't be const if it returns a non-const reference or pointer to a member. That's another advantage of making it a member: a const qualifier on SomeObject applies to its members too, but doesn't apply to any objects it merely has a pointer to.
The only danger is that the object might be destroyed while someone still has a pointer or reference to it; but that danger is still present however you manage it. Smart pointers can help here, if the object lifetimes are too complex to manage otherwise.
You are returning a pointer to a member variable not a reference. This is bad design.
Your class manages the lifetime of foobar object and by returning a pointer to its members you enable the consumers of your class to keep using the pointer beyond the lifetime of SomeObject object. And also it enables the users to change the state of SomeObject object as they wish.
Instead you should refactor your class to include the operations that would be done on the foobar in SomeObject class as methods.
ps. Consider naming your classes properly. When you define it is a class. When you instantiate, then you have an object of that class.
It's generally considered less than ideal to return pointers to internal data at all; it prevents the class from managing access to its own data. But if you want to do that anyway I see no great problem here; it simplifies the management of memory.
Is this bad practice? It seems very hackish.
It is. If the class goes out of scope before the pointer does, the member variable will no longer exist, yet a pointer to it still exists. Any attempt to dereference that pointer post class destruction will result in undefined behaviour - this could result in a crash, or it could result in hard to find bugs where arbitrary memory is read and treated as a BigObject.
if he considered using some smart pointer
Using smart pointers, specifically std::shared_ptr<T> or the boost version, would technically work here and avoid the potential crash (if you allocate via the shared pointer constructor) - however, it also confuses who owns that pointer - the class, or the caller? Furthermore, I'm not sure you can just add a pointer to an object to a smart pointer.
Both of these two points deal with the technical issue of getting a pointer out of a class, but the real question should be "why?" as in "why are you returning a pointer from a class?" There are cases where this is the only way, but more often than not you don't need to return a pointer. For example, suppose that variable needs to be passed to a C API which takes a pointer to that type. In this case, you would probably be better encapsulating that C call in the class.
As long as the caller knows that the pointer returned from getFoobar() becomes invalid when the SomeObject object destructs, it's fine. Such provisos and caveats are common in older C++ programs and frameworks.
Even current libraries have to do this for historical reasons. e.g. std::string::c_str, which returns a pointer to an internal buffer in the string, which becomes unusable when the string destructs.
Of course, that is difficult to ensure in a large or complex program. In modern C++ the preferred approach is to give everything simple "value semantics" as far as possible, so that every object's life time is controlled by the code that uses it in a trivial way. So there are no naked pointers, no explicit new or delete calls scattered around your code, etc., and so no need to require programmers to manually ensure they are following the rules.
(And then you can resort to smart pointers in cases where you are totally unable to avoid shared responsibility for object lifetimes.)
Two unrelated issues here:
1) How would you like your instance of SomeObject to manage the instance of BigObject that it needs? If each instance of SomeObject needs its own BigObject, then a BigObject data member is totally reasonable. There are situations where you'd want to do something different, but unless that situation arises stick with the simple solution.
2) Do you want to give users of SomeObject direct access to its BigObject? By default the answer here would be "no", on the basis of good encapsulation. But if you do want to, then that doesn't change the assessment of (1). Also if you do want to, you don't necessarily need to do so via a pointer -- it could be via a reference or even a public data member.
A third possible issue might arise that does change the assessment of (1):
3) Do you want to give users of SomeObject direct access to an instance of BigObject that they continue using beyond the lifetime of the instance of SomeObject that they got it from? If so then of course a data member is no good. The proper solution might be shared_ptr, or for SomeObject::getFoobar to be a factory that returns a different BigObject each time it's called.
In summary:
Other than the fact it doesn't compile (getFoobar() needs to return const BigObject*), there is no reason so far to suppose that this code is wrong. Other issues could arise that make it wrong.
It might be better style to return const & rather than const *. Which you return has no bearing on whether foobar should be a BigObject data member.
There is certainly no "just" about making foobar a pointer or a smart pointer -- either one would necessitate extra code to create an instance of BigObject to point to.
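A sketch of what a const-correct version might look like when returning references instead of a pointer. BigObject here is a hypothetical stand-in with one field, and the accessor and member names are illustrative rather than taken from the code under review.

```cpp
#include <cassert>

// Hypothetical stand-in for the BigObject in the question.
struct BigObject {
    int value = 0;
};

class SomeObject {
public:
    // Read-only access: const member function, const reference.
    const BigObject& foobar() const { return foobar_; }
    // Mutable access is only available on a non-const SomeObject.
    BigObject& foobar() { return foobar_; }
private:
    BigObject foobar_;  // plain data member, lives exactly as long as SomeObject
};
```

The returned references are still only valid while the SomeObject exists; the reference form just makes it clear that the object is never null and is not the caller's to delete.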

What is the best way to implement smart pointers in C++?

I've been evaluating various smart pointer implementations (wow, there are a LOT out there) and it seems to me that most of them can be categorized into two broad classifications:
1) This category uses inheritance on the objects referenced so that they have reference counts and usually up() and down() (or their equivalents) implemented. IE, to use the smart pointer, the objects you're pointing at must inherit from some class the ref implementation provides.
2) This category uses a secondary object to hold the reference counts. For example, instead of pointing the smart pointer right at an object, it actually points at this meta data object... Who has a reference count and up() and down() implementations (and who usually provides a mechanism for the pointer to get at the actual object being pointed to, so that the smart pointer can properly implement operator ->()).
Now, 1 has the downside that it forces all of the objects you'd like to reference count to inherit from a common ancestor, and this means that you cannot use this to reference count objects that you don't have control over the source code to.
2 has the problem that since the count is stored in another object, if you ever have a situation that a pointer to an existing reference counted object is being converted into a reference, you probably have a bug (I.E., since the count is not in the actual object, there is no way for the new reference to get the count... ref to ref copy construction or assignment is fine, because they can share the count object, but if you ever have to convert from a pointer, you're totally hosed)...
Now, as I understand it, boost::shared_ptr uses mechanism 2, or something like it... That said, I can't quite make up my mind which is worse! I have only ever used mechanism 1 in production code... Does anyone have experience with both styles? Or perhaps there is another way that's better than both of these?
"What is the best way to implement smart pointers in C++"
Don't! Use an existing, well tested smart pointer, such as boost::shared_ptr or std::tr1::shared_ptr (std::unique_ptr and std::shared_ptr with C++ 11)
If you have to, then remember to:
use safe-bool idiom
provide an operator->
provide the strong exception guarantee
document the exception requirements your class makes on the deleter
use copy-modify-swap where possible to implement the strong exception guarantee
document whether you handle multithreading correctly
write extensive unit tests
implement conversion-to-base in such a way that it will delete on the derived pointer type (policied smart pointers / dynamic deleter smart pointers)
support getting access to raw pointer
consider cost/benefit of providing weak pointers to break cycles
provide appropriate casting operators for your smart pointers
make your constructor templated to handle constructing base pointer from derived.
And don't forget anything I may have forgotten in the above incomplete list.
Just to supply a different view to the ubiquitous Boost answer (even though it is the right answer for many uses), take a look at Loki's implementation of smart pointers. For a discourse on the design philosophy, the original creator of Loki wrote the book Modern C++ Design.
I've been using boost::shared_ptr for several years now and while you are right about the downside (no assignment via pointer possible), I think it was definitely worth it because of the huge amount of pointer-related bugs it saved me from.
In my homebrew game engine I've replaced normal pointers with shared_ptr as much as possible. The performance hit this causes is actually not so bad if you are calling most functions by reference so that the compiler does not have to create too many temporary shared_ptr instances.
Boost also has an intrusive pointer (like solution 1) that doesn't require inheriting from anything. It does require changing the pointed-to class to store the reference count and provide appropriate member functions. I've used this in cases where memory efficiency was important and I didn't want the overhead of another object for each shared pointer used.
Example:
#include <boost/intrusive_ptr.hpp>
class Event {
public:
typedef boost::intrusive_ptr<Event> Ptr;
Event() : fRefCount(0) {} // start with no owners
void addRef();
unsigned release();
// ...
private:
unsigned fRefCount;
};
inline void Event::addRef()
{
fRefCount++;
}
inline unsigned Event::release(){
fRefCount--;
return fRefCount;
}
inline void intrusive_ptr_add_ref(Event* e)
{
e->addRef();
}
inline void intrusive_ptr_release(Event* e)
{
if (e->release() == 0)
delete e;
}
The Ptr typedef is used so that I can easily switch between boost::shared_ptr<> and boost::intrusive_ptr<> without changing any client code.
If you stick with the ones that are in the standard library you will be fine.
Though there are a few other types than the ones you specified.
Shared: Where the ownership is shared between multiple objects
Owned: Where one object owns the object but transfer is allowed.
Unmovable: Where one object owns the object and it can not be transferred.
The standard library has:
std::auto_ptr
Boost has a couple more that have been adopted by TR1 (next version of the standard)
std::tr1::shared_ptr
std::tr1::weak_ptr
And those still in Boost (which is pretty much a must-have anyway) that will hopefully make it into TR2.
boost::scoped_ptr
boost::scoped_array
boost::shared_array
boost::intrusive_ptr
See:
Smart Pointers: Or who owns you baby?
It seems to me this question is kind of like asking "Which is the best sort algorithm?" There is no one answer, it depends on your circumstances.
For my own purposes, I'm using your type 1. I don't have access to the TR1 library. I do have complete control over all the classes I need to have shared pointers to. The additional memory and time efficiency of type 1 might be pretty slight, but memory usage and speed are big issues for my code, so type 1 was a slam dunk.
On the other hand, for anyone who can use TR1, I'd think the type 2 std::tr1::shared_ptr class would be a sensible default choice, to be used whenever there isn't some pressing reason not to use it.
The problem with 2 can be worked around. Boost offers boost::enable_shared_from_this for this very reason. In practice, it's not a big problem.
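For illustration, here is a minimal sketch of that workaround, assuming the problem in question is obtaining a shared pointer from inside the object itself. It uses std::enable_shared_from_this, the standard counterpart of the Boost facility; the Node class and its self() member are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class. Deriving from std::enable_shared_from_this lets an
// object hand out shared_ptrs to itself that share the existing count.
class Node : public std::enable_shared_from_this<Node> {
public:
    // Only valid when *this is already managed by a shared_ptr.
    std::shared_ptr<Node> self() { return shared_from_this(); }
};

// Returns the use count seen after obtaining a second handle via self().
long handles_after_self() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = a->self();   // shares ownership with a
    assert(a.get() == b.get());
    return a.use_count();
}
```

Note that this does rely on inheritance, but only for the classes that actually need to produce a shared pointer to themselves.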
But the reason they went with your option #2 is that it can be used in all cases. Relying on inheritance isn't always an option, and then you're left with a smart pointer you can't use for half your code.
I'd have to say #2 is best, simply because it can be used in any circumstances.
Our project uses smart pointers extensively. In the beginning there was uncertainty about which pointer to use, and so one of the main authors chose an intrusive pointer in his module and the other a non-intrusive version.
In general, the differences between the two pointer types were not significant. The only exception being that early versions of our non-intrusive pointer implicitly converted from a raw pointer and this can easily lead to memory problems if the pointers are used incorrectly:
void doSomething (NIPtr<int> const &);
void foo () {
NIPtr<int> i = new int;
int & j = *i;
doSomething (&j); // Ooops - owned by two pointers! :(
}
A while ago, some refactoring resulted in some parts of the code being merged, and so a choice had to be made about which pointer type to use. The non-intrusive pointer now had the converting constructor declared as explicit and so it was decided to go with the intrusive pointer to save on the amount of code change that was required.
To our great surprise one thing we did notice was that we had an immediate performance improvement by using the intrusive pointer. We did not put much research into this, and just assumed that the difference was the cost of maintaining the count object. It is possible that other implementations of non-intrusive shared pointer have solved this problem by now.
What you are talking about are intrusive and non-intrusive smart pointers. Boost has both. boost::intrusive_ptr calls a function to increase or decrease the reference count of your object every time it needs to change the reference count. It calls free functions, not member functions, so it allows managing objects without the need to change the definition of their types. And as you say, boost::shared_ptr is non-intrusive, your category 2.
I have an answer explaining intrusive_ptr: Making shared_ptr not use delete. In short, you use it if you have an object that already has reference counting, or if you need (as you explain) an object that is already referenced to be owned by an intrusive_ptr.
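To show the mechanism without pulling in Boost, here is a simplified stand-in for an intrusive pointer. MiniIntrusivePtr and this Event type are hypothetical and are not Boost's implementation; only the two hook names, intrusive_ptr_add_ref and intrusive_ptr_release, mirror what boost::intrusive_ptr expects.

```cpp
#include <cassert>

// Hypothetical reference-counted type: the count lives inside the object.
struct Event {
    unsigned refs = 0;
    static int destroyed;
    ~Event() { ++destroyed; }
};
int Event::destroyed = 0;

// Free-function hooks, found by argument-dependent lookup.
inline void intrusive_ptr_add_ref(Event* e) { ++e->refs; }
inline void intrusive_ptr_release(Event* e) { if (--e->refs == 0) delete e; }

// Simplified stand-in for an intrusive smart pointer (not Boost's code).
template <class T>
class MiniIntrusivePtr {
public:
    explicit MiniIntrusivePtr(T* p = nullptr) : p_(p) { if (p_) intrusive_ptr_add_ref(p_); }
    MiniIntrusivePtr(const MiniIntrusivePtr& o) : p_(o.p_) { if (p_) intrusive_ptr_add_ref(p_); }
    // Copy-and-swap assignment: the by-value parameter releases the old object.
    MiniIntrusivePtr& operator=(MiniIntrusivePtr o) { T* t = p_; p_ = o.p_; o.p_ = t; return *this; }
    ~MiniIntrusivePtr() { if (p_) intrusive_ptr_release(p_); }
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};

// Returns the count observed while two handles share one Event.
unsigned refs_with_two_handles() {
    Event* raw = new Event;
    MiniIntrusivePtr<Event> a(raw);
    MiniIntrusivePtr<Event> b(raw);  // safe: both handles use the count stored in *raw
    assert(a.get() == b.get());
    return raw->refs;
}
```

Constructing two handles from the same raw pointer is fine here because the count is stored in the object, which is exactly what a non-intrusive pointer cannot offer.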

C++ - passing references to std::shared_ptr or boost::shared_ptr

If I have a function that needs to work with a shared_ptr, wouldn't it be more efficient to pass it a reference to it (so to avoid copying the shared_ptr object)?
What are the possible bad side effects?
I envision two possible cases:
1) inside the function a copy is made of the argument, like in
ClassA::take_copy_of_sp(boost::shared_ptr<foo> &sp)
{
...
m_sp_member=sp; //This will copy the object, incrementing refcount
...
}
2) inside the function the argument is only used, like in
Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here
{
...
sp->do_something();
...
}
In neither case can I see a good reason to pass the boost::shared_ptr<foo> by value instead of by reference. Passing by value would only "temporarily" increment the reference count due to the copying, and then decrement it when exiting the function scope.
Am I overlooking something?
Just to clarify, after reading several answers: I perfectly agree on the premature-optimization concerns, and I always try to first-profile-then-work-on-the-hotspots. My question was more from a purely technical code-point-of-view, if you know what I mean.
I found myself disagreeing with the highest-voted answer, so I went looking for expert opinions and here they are.
From http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2011-Scott-Andrei-and-Herb-Ask-Us-Anything
Herb Sutter: "when you pass shared_ptrs, copies are expensive"
Scott Meyers: "There's nothing special about shared_ptr when it comes to whether you pass it by value, or pass it by reference. Use exactly the same analysis you use for any other user defined type. People seem to have this perception that shared_ptr somehow solves all management problems, and that because it's small, it's necessarily inexpensive to pass by value. It has to be copied, and there is a cost associated with that... it's expensive to pass it by value, so if I can get away with it with proper semantics in my program, I'm gonna pass it by reference to const or reference instead"
Herb Sutter: "always pass them by reference to const, and very occasionally maybe because you know what you called might modify the thing you got a reference from, maybe then you might pass by value... if you copy them as parameters, oh my goodness you almost never need to bump that reference count because it's being held alive anyway, and you should be passing it by reference, so please do that"
Update: Herb has expanded on this here: http://herbsutter.com/2013/06/05/gotw-91-solution-smart-pointer-parameters/, although the moral of the story is that you shouldn't be passing shared_ptrs at all "unless you want to use or manipulate the smart pointer itself, such as to share or transfer ownership."
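A rough sketch of how that advice might look in code. Foo, name_length and remember are hypothetical names made up for the illustration, and this is one reading of the guidance, not code from the talk or the article.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type standing in for the pointed-to object.
struct Foo {
    std::string name;
};

// Only uses the object: no smart pointer in the signature at all.
std::size_t name_length(const Foo& f) { return f.name.size(); }

// Wants to share ownership: take the shared_ptr by value, then move the
// parameter into its final home so the count is bumped only once.
void remember(std::vector<std::shared_ptr<Foo>>& registry, std::shared_ptr<Foo> sp) {
    registry.push_back(std::move(sp));
}

// Small self-check of the reference counts involved.
void check_parameter_styles() {
    std::shared_ptr<Foo> sp = std::make_shared<Foo>();
    sp->name = "abc";
    assert(name_length(*sp) == 3);   // caller dereferences; count untouched
    assert(sp.use_count() == 1);
    std::vector<std::shared_ptr<Foo>> registry;
    remember(registry, sp);          // one extra owner: the registry entry
    assert(sp.use_count() == 2);
}
```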
The point of a distinct shared_ptr instance is to guarantee (as far as possible) that as long as this shared_ptr is in scope, the object it points to will still exist, because its reference count will be at least 1.
Class::only_work_with_sp(boost::shared_ptr<foo> sp)
{
// sp points to an object that cannot be destroyed during this function
}
So by using a reference to a shared_ptr, you disable that guarantee. So in your second case:
Class::only_work_with_sp(boost::shared_ptr<foo> &sp) //Again, no copy here
{
...
sp->do_something();
...
}
How do you know that sp->do_something() will not blow up due to a null pointer?
It all depends what is in those '...' sections of the code. What if you call something during the first '...' that has the side-effect (somewhere in another part of the code) of clearing a shared_ptr to that same object? And what if it happens to be the only remaining distinct shared_ptr to that object? Bye bye object, just where you're about to try and use it.
So there are two ways to answer that question:
Examine the source of your entire program very carefully until you are sure the object won't die during the function body.
Change the parameter back to be a distinct object instead of a reference.
General bit of advice that applies here: don't bother making risky changes to your code for the sake of performance until you've timed your product in a realistic situation in a profiler and conclusively measured that the change you want to make will make a significant difference to performance.
Update for commenter JQ
Here's a contrived example. It's deliberately simple, so the mistake will be obvious. In real examples, the mistake is not so obvious because it is hidden in layers of real detail.
We have a function that will send a message somewhere. It may be a large message so rather than using a std::string that likely gets copied as it is passed around to multiple places, we use a shared_ptr to a string:
void send_message(std::shared_ptr<std::string> msg)
{
std::cout << (*msg.get()) << std::endl;
}
(We just "send" it to the console for this example).
Now we want to add a facility to remember the previous message. We want the following behaviour: a variable must exist that contains the most recently sent message, but while a message is currently being sent then there must be no previous message (the variable should be reset before sending). So we declare the new variable:
std::shared_ptr<std::string> previous_message;
Then we amend our function according to the rules we specified:
void send_message(std::shared_ptr<std::string> msg)
{
previous_message = 0;
std::cout << *msg << std::endl;
previous_message = msg;
}
So, before we start sending we discard the current previous message, and then after the send is complete we can store the new previous message. All good. Here's some test code:
send_message(std::shared_ptr<std::string>(new std::string("Hi")));
send_message(previous_message);
And as expected, this prints Hi twice.
Now along comes Mr Maintainer, who looks at the code and thinks: Hey, that parameter to send_message is a shared_ptr:
void send_message(std::shared_ptr<std::string> msg)
Obviously that can be changed to:
void send_message(const std::shared_ptr<std::string> &msg)
Think of the performance enhancement this will bring! (Never mind that we're about to send a typically large message over some channel, so the performance enhancement will be so small as to be unmeasurable).
But the real problem is that now the test code will exhibit undefined behaviour (in Visual C++ 2010 debug builds, it crashes).
Mr Maintainer is surprised by this, but adds a defensive check to send_message in an attempt to stop the problem happening:
void send_message(const std::shared_ptr<std::string> &msg)
{
if (msg == 0)
return;
But of course it still goes ahead and crashes, because msg is never null when send_message is called.
As I say, with all the code so close together in a trivial example, it's easy to find the mistake. But in real programs, with more complex relationships between mutable objects that hold pointers to each other, it is easy to make the mistake, and hard to construct the necessary test cases to detect the mistake.
The easy solution, where you want a function to be able to rely on a shared_ptr continuing to be non-null throughout, is for the function to allocate its own true shared_ptr, rather than relying on a reference to an existing shared_ptr.
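As a sketch of that fix applied to the example: send_message_pinned is a hypothetical variant that keeps the const reference in the signature but takes its own copy before doing anything else. It returns the text instead of printing it, purely so the result is easy to check.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for the previous_message variable from the example.
std::shared_ptr<std::string> previous_message;

// Hypothetical variant of send_message: the local copy is an extra owner,
// so the string stays alive for the whole body even when msg is an alias
// for previous_message.
std::string send_message_pinned(const std::shared_ptr<std::string>& msg) {
    std::shared_ptr<std::string> local = msg;   // pin the object for this call
    previous_message.reset();                   // may null out msg itself
    std::string sent = local ? *local : std::string();  // "send" the text
    previous_message = local;
    return sent;
}

// Replays the test code from the example against the pinned variant.
void check_pinned() {
    assert(send_message_pinned(std::make_shared<std::string>("Hi")) == "Hi");
    assert(send_message_pinned(previous_message) == "Hi");  // no dangling access
}
```

In terms of reference-count traffic this is equivalent to taking the parameter by value; the copy has simply moved from the call site into the body.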
The downside is that copying a shared_ptr is not free: even "lock-free" implementations have to use an interlocked operation to honour threading guarantees. So there may be situations where a program can be significantly sped up by changing a shared_ptr into a shared_ptr &. But this is not a change that can be safely made to all programs. It changes the logical meaning of the program.
Note that a similar bug would occur if we used std::string throughout instead of std::shared_ptr<std::string>, and instead of:
previous_message = 0;
to clear the message, we said:
previous_message.clear();
Then the symptom would be the accidental sending of an empty message, instead of undefined behaviour. The cost of an extra copy of a very large string may be a lot more significant than the cost of copying a shared_ptr, so the trade-off may be different.
I would advise against this practice unless you and the other programmers you work with really, really know what you are all doing.
First, you have no idea how the interface to your class might evolve and you want to prevent other programmers from doing bad things. Passing a shared_ptr by reference isn't something a programmer should expect to see, because it isn't idiomatic, and that makes it easy to use it incorrectly. Program defensively: make the interface hard to use incorrectly. Passing by reference is just going to invite problems later on.
Second, don't optimize until you know this particular class is going to be a problem. Profile first, and then if your program really needs the boost given by passing by reference, then maybe. Otherwise, don't sweat the small stuff (i.e. the extra N instructions it takes to pass by value); instead worry about design, data structures, algorithms, and long-term maintainability.
Yes, taking a reference is fine there. You don't intend to give the method shared ownership; it only wants to work with it. You could take a reference for the first case too, since you copy it anyway. But in the first case, the method takes ownership. There is this trick to still copy it only once:
void ClassA::take_copy_of_sp(boost::shared_ptr<foo> sp) {
m_sp_member.swap(sp);
}
You should also copy when you return it (i.e. not return a reference), because your class doesn't know what the client is doing with it (it could store a pointer to it, and then the big bang happens). If it later turns out it's a bottleneck (first profile!), then you can still return a reference.
Edit: Of course, as others point out, this only is true if you know your code and know that you don't reset the passed shared pointer in some way. If in doubt, just pass by value.
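Here is the copy-once trick as a checkable sketch, written against std::shared_ptr instead of boost::shared_ptr. The take_by_move variant is an assumption on my part: with C++11 the same effect can be had with std::move, which was not available when the swap idiom was written. The method names and the owners() helper are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct foo {};  // placeholder for the question's foo

class ClassA {
public:
    // The by-value parameter is the single copy; swapping hands it to the
    // member, and the old member value dies with the parameter.
    void take_by_swap(std::shared_ptr<foo> sp) { m_sp_member.swap(sp); }
    // C++11 alternative: move the parameter into the member.
    void take_by_move(std::shared_ptr<foo> sp) { m_sp_member = std::move(sp); }
    long owners() const { return m_sp_member.use_count(); }
private:
    std::shared_ptr<foo> m_sp_member;
};

// Checks that each variant leaves exactly one extra owner behind.
void check_copy_once() {
    std::shared_ptr<foo> p = std::make_shared<foo>();
    ClassA a;
    a.take_by_swap(p);
    assert(p.use_count() == 2);   // p plus the member
    std::shared_ptr<foo> q = std::make_shared<foo>();
    a.take_by_move(q);
    assert(p.use_count() == 1);   // the member let go of p's object
    assert(q.use_count() == 2);
}
```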
It is sensible to pass shared_ptrs by const&. It will not likely cause trouble (except in the unlikely case that the referenced shared_ptr is deleted during the function call, as detailed by Earwicker) and it will likely be faster if you pass a lot of these around. Remember: the default boost::shared_ptr is thread safe, so copying it includes a thread-safe increment.
Try to use const& rather than just &, because temporary objects may not be passed by non-const reference (even though a language extension in MSVC allows you to do it anyway).
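A small sketch that makes the difference visible, using std::shared_ptr and two hypothetical helper functions that simply report the owner count seen from inside the call:

```cpp
#include <cassert>
#include <memory>

// Reports the owner count as seen from inside each kind of call.
long count_by_value(std::shared_ptr<int> sp) { return sp.use_count(); }
long count_by_const_ref(const std::shared_ptr<int>& sp) { return sp.use_count(); }

// Small self-check tying the two together.
void check_counts() {
    std::shared_ptr<int> p = std::make_shared<int>(7);
    assert(count_by_value(p) == 2);       // the parameter is a second owner
    assert(count_by_const_ref(p) == 1);   // no copy, no increment
    // A temporary binds to const& but would not bind to a plain &.
    assert(count_by_const_ref(std::make_shared<int>(1)) == 1);
}
```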
In the second case, doing this is simpler:
Class::only_work_with_sp(foo &sp)
{
...
sp.do_something();
...
}
You can call it as
only_work_with_sp(*sp);
I would avoid a "plain" reference unless the function may explicitly modify the pointer.
A const & may be a sensible micro-optimization when calling small functions - e.g. to enable further optimizations, like inlining away some conditions. Also, the increment/decrement - since it's thread safe - is a synchronization point. I would not expect this to make a big difference in most scenarios, though.
Generally, you should use the simpler style unless you have reason not to. Then, either use the const & consistently, or add a comment as to why if you use it just in a few places.
I would advocate passing a shared pointer by const reference. It signals that the function receiving the pointer does NOT own it, which is a clean idiom for developers.
The only pitfall is that in multithreaded programs the object pointed to by the shared pointer may get destroyed in another thread. So it is safe to say that using a const reference to a shared pointer is safe in a single-threaded program.
Passing a shared pointer by non-const reference is sometimes dangerous, because the function may invoke swap or reset and so destroy an object that the caller still considers valid after the function returns.
It is not about premature optimization, I guess - it is about avoiding unnecessary waste of CPU cycles when you are clear what you want to do and the coding idiom has firmly been adopted by your fellow developers.
Just my 2 cents :-)
It seems that all the pros and cons here can actually be generalised to ANY type passed by reference not just shared_ptr. In my opinion, you should know the semantic of passing by reference, const reference and value and use it correctly. But there is absolutely nothing inherently wrong with passing shared_ptr by reference, unless you think that all references are bad...
To go back to the example:
Class::only_work_with_sp( foo &sp ) //Again, no copy here
{
...
sp.do_something();
...
}
How do you know that sp.do_something() will not blow up due to a dangling pointer?
The truth is that, shared_ptr or not, const or not, this could happen if you have a design flaw, like directly or indirectly sharing the ownership of sp between threads, misusing an object that does delete this, having circular ownership, or other ownership errors.
One thing that I haven't seen mentioned yet is that when you pass shared pointers by reference, you lose the implicit conversion that you get if you want to pass a derived class shared pointer through a reference to a base class shared pointer.
For example, this code will produce an error, but it will work if you change test() so that the shared pointer is not passed by reference.
#include <boost/shared_ptr.hpp>
class Base { };
class Derived: public Base { };
// ONLY instances of Base can be passed by reference. If you have a shared_ptr
// to a derived type, you have to cast it manually. If you remove the reference
// and pass the shared_ptr by value, then the cast is implicit so you don't have
// to worry about it.
void test(boost::shared_ptr<Base>& b)
{
return;
}
int main(void)
{
boost::shared_ptr<Derived> d(new Derived);
test(d);
// If you want the above call to work with references, you will have to manually cast
// pointers like this, EVERY time you call the function. Since you are creating a new
// shared pointer, you lose the benefit of passing by reference.
boost::shared_ptr<Base> b = boost::dynamic_pointer_cast<Base>(d);
test(b);
return 0;
}
I'll assume that you are familiar with premature optimization and are asking this either for academic purposes or because you have isolated some pre-existing code that is under-performing.
Passing by reference is okay
Passing by const reference is better, and can usually be used, as it does not force const-ness on the object pointed to.
You are not at risk of losing the pointer due to using a reference. That reference is evidence that you have a copy of the smart pointer earlier in the stack and only one thread owns a call stack, so that pre-existing copy isn't going away.
Using references is often more efficient for the reasons you mention, but not guaranteed. Remember that dereferencing an object can take work too. Your ideal reference-usage scenario would be if your coding style involves many small functions, where the pointer would get passed from function to function to function before being used.
You should always avoid storing your smart pointer as a reference. Your Class::take_copy_of_sp(&sp) example shows correct usage for that.
Assuming we are not concerned with const correctness (or rather, that you mean to allow the functions to modify or share ownership of the data being passed in), passing a boost::shared_ptr by value is safer than passing it by reference, as we allow the original boost::shared_ptr to control its own lifetime. Consider the results of the following code...
void FooTakesReference( boost::shared_ptr< int > & ptr )
{
ptr.reset(); // We reset, and so does sharedA, memory is deleted.
}
void FooTakesValue( boost::shared_ptr< int > ptr )
{
ptr.reset(); // Our temporary is reset, however sharedB hasn't.
}
int main()
{
boost::shared_ptr< int > sharedA( new int( 13 ) );
boost::shared_ptr< int > sharedB( new int( 14 ) );
FooTakesReference( sharedA );
FooTakesValue( sharedB );
}
From the example above we see that passing sharedA by reference allows FooTakesReference to reset the original pointer, which reduces its use count to 0, destroying its data. FooTakesValue, however, can't reset the original pointer, guaranteeing sharedB's data is still usable. When another developer inevitably comes along and attempts to piggyback on sharedA's fragile existence, chaos ensues. The lucky sharedB developer, however, goes home early as all is right in his world.
The code safety, in this case, far outweighs any speed improvement that avoiding the copy brings. At the same time, the boost::shared_ptr is meant to improve code safety. It will be far easier to go from a copy to a reference later, if something requires this kind of niche optimization.
Sandy wrote: "It seems that all the pros and cons here can actually be generalised to ANY type passed by reference not just shared_ptr."
True to some extent, but the point of using shared_ptr is to eliminate concerns regarding object lifetimes and to let the compiler handle that for you. If you're going to pass a shared pointer by reference and allow clients of your reference-counted object to call non-const methods that might free the object data, then using a shared pointer is almost pointless.
I wrote "almost" in that previous sentence because performance can be a concern, and it 'might' be justified in rare cases, but I would avoid this scenario myself and look at all other possible optimizations first, such as adding another level of indirection, lazy evaluation, etc.
Code that outlives its author, or even its author's memory, and that relies on implicit assumptions about behavior, in particular about object lifetimes, requires clear, concise, readable documentation, and even then many clients won't read it anyway! Simplicity almost always trumps efficiency, and there are almost always other ways to be efficient. If you really need to pass values by reference to avoid deep copying by copy constructors of your reference-counted objects (and the equals operator), then perhaps you should consider ways to make the deep-copied data be reference-counted pointers that can be copied quickly. (Of course, that's just one design scenario that might not apply to your situation).
I used to work on a project where the principle of passing smart pointers by value was very strong. When I was asked to do some performance analysis, I found that the application spent 4-6% of its utilized processor time incrementing and decrementing the reference counters of the smart pointers.
If you want to pass smart pointers by value just to avoid issues in weird cases like the one described by Daniel Earwicker, make sure you understand the price you are paying for it.
If you decide to go with a reference, the main reason to use a const reference is to allow implicit upcasting when you need to pass a shared pointer to an object of a class derived from the class used in the interface.
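A minimal sketch of that upcasting point, using std::shared_ptr. Base, Derived and owners_seen are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <memory>

struct Base { virtual ~Base() {} };
struct Derived : Base {};

// Accepts shared_ptr<Base> directly, or anything implicitly convertible to it.
long owners_seen(const std::shared_ptr<Base>& b) { return b.use_count(); }

// Compares a direct call with a call that needs the derived-to-base conversion.
void check_upcast() {
    std::shared_ptr<Base> b = std::make_shared<Base>();
    assert(owners_seen(b) == 1);   // binds directly, no copy

    std::shared_ptr<Derived> d = std::make_shared<Derived>();
    // Compiles because a temporary shared_ptr<Base> can bind to const&.
    // That temporary is a second owner for the duration of the call.
    assert(owners_seen(d) == 2);
    assert(d.use_count() == 1);    // the temporary is gone again
}
```

So the const reference keeps the call site convenient, but when the conversion kicks in you pay for a copy anyway.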
In addition to what litb said, I'd like to point out that it's probably better to pass by const reference in the second example; that way you are sure you don't accidentally modify it.
struct A {
shared_ptr<Message> msg;
shared_ptr<Message> * ptr_msg;
};
pass by value:
void set(shared_ptr<Message> msg) {
this->msg = msg; // the parameter is already a new shared_ptr; this assignment increments the reference count again
} // on return the parameter is destroyed, so the reference count is decremented once
pass by reference:
void set(shared_ptr<Message>& msg) {
this->msg = msg; // the reference count is incremented once; the reference is just an alias for the caller's shared_ptr
}
pass by pointer:
void set(shared_ptr<Message>* msg) {
this->ptr_msg = msg; // the reference count is not incremented; only the address of the caller's shared_ptr is stored
}
Every code piece must carry some sense. If you pass a shared pointer by value everywhere in the application, this means "I am unsure about what's going on elsewhere, hence I favour raw safety". This is not what I call a good confidence sign to other programmers who could consult the code.
Anyway, even if a function gets a const reference and you are "unsure", you can still create a copy of the shared pointer at the head of the function, to add a strong reference to the pointer. This could also be seen as a hint about the design ("the pointer could be modified elsewhere").
So yes, IMO, the default should be "pass by const reference".