Revisiting reasons for no implicit conversion on smart pointer - c++

[One answer, below, is that this forces a choice between .release() and .get(); sadly the common general advice is to just use .get()]
Summary: This question asks for technical reasons, or behavioural reasons (perhaps based on experience), why smart pointers such as unique_ptr are stripped of the major characteristic of a pointer, i.e. the ability to be passed where a pointer is expected (C APIs).
I have researched the topic and cite two major claimed reasons below, but these hardly seem valid.
But my arrogance is not boundless, and my experience not so extensive as to convince me that I must be right.
It may be that unique_ptr was not designed with simple lifetime management of dumb C API pointers as a main use; certainly this proposed development of unique_ptr would not be [http://bartoszmilewski.com/2009/05/21/unique_ptr-how-unique-is-it/]. However, unique_ptr claims to be "what auto_ptr should have been (but that we couldn't write in C++98)" [http://www.stroustrup.com/C++11FAQ.html#std-unique_ptr], though perhaps that was not a prime use of auto_ptr either.
I'm using unique_ptr for management of some C API resources, and shocked [yes, shocked :-)] to find that so-called smart pointers hardly behave as pointers at all.
The APIs I use expect pointers, and I really don't want to be adding .get() all over the place. It all makes the smart pointer unique_ptr seem quite dumb.
What's the current reasoning for unique_ptr not automatically converting to the pointer it holds when it is being cast to the type of that pointer?
void do_something(BLOB* b);
unique_ptr<BLOB> b(new_BLOB(20));
do_something(b);
// equivalent to do_something(b.get()) because of implicit cast
I have read http://herbsutter.com/2012/06/21/reader-qa-why-dont-modern-smart-pointers-implicitly-convert-to/ and it remarkably (given the author) doesn't actually answer the question convincingly, so I wonder if there are more real examples or technical/behavioural reasons to justify this.
To reference the article's examples: I'm not trying to call do_something(b + 42), and + is not defined on the object I'm pointing to, so *b + 42 doesn't make sense.
But if it did, and if I meant it, then I would actually type *b + 42, and if I wanted to add 42 to the pointer I would type b + 42 because I'm expecting my smart pointer to actually act like a pointer.
Can a reason for making smart pointers dumb really be the fear that the C++ coder won't understand how to use a pointer, or will keep forgetting to dereference it? That if I make an error with a smart pointer, it will silently compile and behave just as it does with a dumb pointer? [Surely that argument has no end; I might forget the > in ->]
For the other point in the post, I'm no more likely to write delete b than I am to write delete b.get(), though this seems to be a commonly proposed reason (perhaps because of legacy code conversions), and it is discussed in C++ "smart pointer" template that auto-converts to bare pointer but can't be explicitly deleted. However, the Meyers ambiguity trick of 1996, mentioned in http://www.informit.com/articles/article.aspx?p=31529&seqNum=7, seems to answer that case well: define a cast to void* as well as to T*, so that delete can't work out which cast to use.
The delete problem seems to have had some legitimacy, as it is likely to be a real problem when porting some legacy code but it seems to be well addressed even prior to Meyer (http://www.boost.org/doc/libs/1_43_0/libs/smart_ptr/sp_techniques.html#preventing_delete)
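The two-conversion trick can be sketched as follows; `two_cast_ptr`, `take_value` and the other names here are hypothetical, and whether `delete p` is actually rejected as ambiguous depends on how the compiler treats the `void*` conversion in a delete-expression:

```cpp
// Sketch of the two-conversion trick: expose both T* and void*,
// aiming to make `delete p` ambiguous while T* arguments still work.
template <typename T>
class two_cast_ptr {
public:
    explicit two_cast_ptr(T* t) : p_(t) {}
    ~two_cast_ptr() { delete p_; }
    two_cast_ptr(const two_cast_ptr&) = delete;
    two_cast_ptr& operator=(const two_cast_ptr&) = delete;
    operator T*() { return p_; }     // lets C APIs take the raw pointer
    operator void*() { return p_; }  // second candidate for delete's conversion
private:
    T* p_;
};

int take_value(int* p) { return *p; }  // stand-in for a C API function

int demo() {
    two_cast_ptr<int> p(new int(7));
    // delete p;  // the trick intends this to fail as ambiguous
    return take_value(p);  // unambiguous: only operator T*() yields an int*
}
```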
Are there more technical reasons for denying this basic pointer-behaviour to smart pointers? Or did the reasons just seem very compelling at the time?
Previous discussions contain general warnings of bad things [Add implicit conversion from unique_ptr<T> to T* , Why doesn't `unique_ptr<QByteArray>` degrade to `QByteArray*`?, C++ "smart pointer" template that auto-converts to bare pointer but can't be explicitly deleted, http://bartoszmilewski.com/2009/05/21/unique_ptr-how-unique-is-it/] but nothing specific, or that is any worse than use of non-smart C pointers to C API's.
The inconvenience, measured against the implied risks of coders blindly adding .get() everywhere and getting all the same harms they were supposed to have been protected against, makes the whole limitation seem very unworthwhile.
In my case, I used Meyers' trick of two casts, and accept the accompanying hidden risks, hoping that readers will help me know what they are.
template<typename T, typename D, D F>
class unique_dptr : public std::unique_ptr<T, D> {
public:
    unique_dptr(T* t) : std::unique_ptr<T, D>(t, F) { }
    operator T*() { return this->get(); }
    operator void*() { return this->get(); }
};

#define d_type(f) decltype(&f), f
with thanks to @Yakk for the macro tip (Clean implementation of function template taking function pointer, How to fix error refactoring decltype inside template)
using RIP_ptr = unique_dptr<RIP, d_type(::RIP_free)>;
RIP_ptr rip1(RIP_new_bitmap("/tmp/test.png"));
Now that's a smart pointer I can use.
* I declare the free function once when the smart pointer type is defined
* I can pass it around like a pointer
* It gets freed when the scope dies
Yes, I can use the API wrongly and leak owned references, but .get() doesn't stop that either, despite its inconvenience.
Maybe I should have some consts in there as a nod to lack of ownership-transference.
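For readers who want to compile the pattern, here is a self-contained sketch with a hypothetical C-style `blob` API standing in for the `RIP_*` functions:

```cpp
#include <memory>

// Hypothetical C-style API standing in for RIP_new_bitmap / RIP_free.
struct blob { int size; };
blob* blob_new(int size) { return new blob{size}; }
void blob_free(blob* b) { delete b; }
int blob_size(blob* b) { return b->size; }

// The wrapper from above: deleter bound when the type is defined,
// and implicit conversions restored so C APIs accept it directly.
template <typename T, typename D, D F>
class unique_dptr : public std::unique_ptr<T, D> {
public:
    unique_dptr(T* t) : std::unique_ptr<T, D>(t, F) {}
    operator T*() { return this->get(); }
    operator void*() { return this->get(); }
};
#define d_type(f) decltype(&f), f

using blob_ptr = unique_dptr<blob, d_type(blob_free)>;

int demo() {
    blob_ptr b(blob_new(20));
    return blob_size(b);  // passed implicitly as blob*; freed at scope exit
}
```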

One answer that I'm surprised I haven't found in my searches is implied in the documentation for unique_ptr::release [http://en.cppreference.com/w/cpp/memory/unique_ptr/release]
release() returns the pointer, and the unique_ptr thereafter holds nullptr, so clearly this can be used for passing an owned reference on to an API that doesn't use smart pointers.
By inference, get() is the corresponding function for passing an unowned reference.
As a pair, these functions explain why implicit conversion is not permitted: the coder is forced to replace each pointer use with either .release() or .get(), depending on how the called function will treat the pointer.
And thus the coder is forced to take intelligent action and choose one behaviour or the other, and upgrade the legacy code to be more explicit and safe.
.get() is a weird name for that use, but this explanation makes good sense to me.
Sadly, this strategy is ineffective if the only advice the coder has is to use .get(); the coder is not aware of the choice and misses a chance to make his code safe and clear.
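The .release()/.get() choice can be illustrated against a hypothetical C API that documents ownership: `widget_id` borrows the pointer, `widget_consume` takes ownership.

```cpp
#include <memory>

struct widget { int id; };

// Hypothetical C-style API: one function borrows, one takes ownership.
int widget_id(widget* w) { return w->id; }    // borrows: caller keeps ownership
void widget_consume(widget* w) { delete w; }  // takes ownership: frees w

int demo() {
    std::unique_ptr<widget> w(new widget{42});
    int id = widget_id(w.get());  // .get(): lend an unowned reference
    widget_consume(w.release());  // .release(): hand ownership over
    // w now holds nullptr, so nothing is double-deleted at scope exit
    return id;
}
```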

Related

Is implicit conversion from unique_ptr to raw pointer supported?

I was reading Effective C++ 3rd Edition. In page 70, the author says:
Like virtually all smart pointer classes, tr1::shared_ptr and auto_ptr also overload the pointer dereferencing operators (operator-> and operator*), and this allows implicit conversion to the underlying raw pointers (...)
He then shows an example with shared_ptr (which was part of tr1 at the time) featuring implicit conversion based on a class named Investment:
shared_ptr<Investment> pi1(createInvestment());
bool taxable1 = !(pi1->isTaxFree());   // "implicit conversion" via operator->
shared_ptr<Investment> pi2(createInvestment());
bool taxable2 = !((*pi2).isTaxFree()); // "implicit conversion" via operator*
Well, I have since then written a few test cases with unique_ptr and they hold up.
I also found out that unique_ptr supports arrays, and that shared_ptr is going to as well (see note). However, in my testing, implicit conversion does not seem to work for smart pointers around arrays.
Example: I wanted this to be valid...
unique_ptr<int[]> test(new int[1]);
(*test)[0] = 5;
but it is not, according to my compiler (Visual C++ 2015 Update 3).
Then, from a little research, I found some evidence suggesting that implicit conversion isn't supported at all... like this one for instance: https://herbsutter.com/2012/06/21/reader-qa-why-dont-modern-smart-pointers-implicitly-convert-to.
At this point I am in doubt. Is it supported (by the Standard), or is it not?
Note: The book might be a bit outdated on this topic, since the author also says on page 65 that "there is nothing like auto_ptr or tr1::shared_ptr for dynamically allocated arrays, not even in TR1".
Well, here's the thing. There is no implicit conversion to the underlying pointer, you have to call a specific get member function (it's a theme in the standard library, think std::string::c_str).
But that's a good thing! Implicitly converting the pointer can break the guarantees of unique_ptr. Consider the following:
std::unique_ptr<int> p1(new int);
std::unique_ptr<int> p2(p1);
In the above code, the compiler could try to pass p1's pointee to p2! (It won't, since this call would be ambiguous anyway, but assume it wasn't.) They would both call delete on it!
But we still want to use the smart pointer as if it was a raw one. Hence all the operators are overloaded.
Now let's consider your code:
(*test)[0] = 5;
It calls unique_ptr::operator*, which produces an int&.¹ Then you try to use the subscript operator on it. That's your error.
If you have a std::unique_ptr<int[]>, then just use the operator[] overload that the handle provides:
test[0] = 5;
¹ As David Scarlett pointed out, it shouldn't even compile: the array version isn't supposed to have this operator.
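A minimal check of the array form's subscript operator, matching the fix above:

```cpp
#include <memory>

int demo() {
    std::unique_ptr<int[]> test(new int[3]{0, 0, 0});
    test[0] = 5;  // the unique_ptr<T[]> specialization provides operator[]
    // (*test)[0] = 5;  // ill-formed: the array form has no operator*
    return test[0];
}
```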
As StoryTeller indicates, implicit conversions would ruin the show, but I'd like to suggest another way of thinking about this:
Smart pointers like unique_ptr and shared_ptr try to hide the underlying raw pointer because they try to maintain a certain kind of ownership semantics over it. If you were to freely obtain that pointer and pass it around, you could easily violate those semantics. They still provide a way to access it (get), since they couldn't stop you completely even if they wanted to (after all you can just follow the smart pointer and get the address of the pointee). But they still want to put a barrier to make sure you don't access it accidentally.
All is not lost though! You can still gain that syntactic convenience by defining a new kind of smart pointer with very weak semantics, such that it can safely be implicitly constructed from most other smart pointers. Consider:
// ipiwdostbtetci_ptr stands for:
// "I promise I won't delete or store this beyond the expression that created it" ptr
template<class T>
struct ipiwdostbtetci_ptr {
    T* _ptr;
    T& operator*() { return *_ptr; }
    T* operator->() { return _ptr; }
    ipiwdostbtetci_ptr(T* raw): _ptr{raw} {}
    ipiwdostbtetci_ptr(const std::unique_ptr<T>& unq): _ptr{unq.get()} {}
    ipiwdostbtetci_ptr(const std::shared_ptr<T>& shr): _ptr{shr.get()} {}
};
So, what's the point of this satirically named smart pointer? It's just a kind of pointer that's verbally given a contract that the user will never keep it or a copy of it alive beyond the expression that created it and the user will also never attempt to delete it. With these constraints followed by the user (without the compiler checking it), it's completely safe to implicitly convert many smart pointers as well as any raw pointer.
Now you can implement functions that expect a ipiwdostbtetci_ptr (with the assumption that they'll honor the semantics), and conveniently call them:
void f(ipiwdostbtetci_ptr<MyClass>);
...
std::unique_ptr<MyClass> p = ...
f(p);
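A runnable sketch of the above, filling in a hypothetical `MyClass` and a hypothetical callee `f` that honors the contract:

```cpp
#include <memory>

// "I promise I won't delete or store this beyond the expression
// that created it" pointer, as defined in the answer above.
template <class T>
struct ipiwdostbtetci_ptr {
    T* _ptr;
    T& operator*() { return *_ptr; }
    T* operator->() { return _ptr; }
    ipiwdostbtetci_ptr(T* raw) : _ptr{raw} {}
    ipiwdostbtetci_ptr(const std::unique_ptr<T>& unq) : _ptr{unq.get()} {}
    ipiwdostbtetci_ptr(const std::shared_ptr<T>& shr) : _ptr{shr.get()} {}
};

struct MyClass { int value = 3; };

// Hypothetical callee: uses the pointee within the call, stores nothing.
int f(ipiwdostbtetci_ptr<MyClass> p) { return p->value; }

int demo() {
    std::unique_ptr<MyClass> p(new MyClass);
    return f(p);  // implicit conversion from unique_ptr; no .get() needed
}
```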

Why is raw pointer to shared_ptr construction allowed in all cases?

I was reading Top 10 dumb mistakes to avoid with C++11 smart pointer.
Number #5 reads:
Mistake # 5: Not assigning an object (raw pointer) to a shared_ptr as soon as it is created!
int main()
{
    Aircraft* myAircraft = new Aircraft("F-16");
    shared_ptr<Aircraft> pAircraft(myAircraft);
    ...
    shared_ptr<Aircraft> p2(myAircraft);
    // will do a double delete and possibly crash
}
and the recommendation is something like:
Use make_shared, or new and immediately construct the pointer with it.
Ok, no doubt about it the problem and the recommendation.
However I have a question about the design of shared_ptr.
This is a very easy mistake to make, and the whole "safe" design of shared_ptr can be thrown away by very easy-to-detect misuses.
Now the question is: could this have easily been fixed with an alternative design of shared_ptr in which the only constructor from a raw pointer would be one taking an r-value reference?
template<class T>
struct shared_ptr {
    shared_ptr(T*&& t) { ...basically current implementation... }
    shared_ptr(T* t) = delete;        // this is to...
    shared_ptr(T* const& t) = delete; // ...illustrate the point.
    shared_ptr(T*& t) = delete;
    ...
};
In this way shared_ptr could be only initialized from the result of new or some factory function.
Is this an underexploitation of the C++ language in the library?
Or what is the point of having a constructor from a raw-pointer l-value if this is most likely to be a misuse?
Is this a historical accident? (e.g. shared_ptr was proposed before r-value references were introduced, etc) Backwards compatibility?
(Of course one could say std::shared_ptr<type>(std::move(ptr)); that is easier to catch, and it is also a workaround if this is really necessary.)
Am I missing something?
Pointers are very easy to copy. Even if you restrict to r-value references you can still easily make copies (as when you pass a pointer as a function parameter), which will invalidate the safety setup. Moreover, you will run into problems in templates, where you can easily have T* const or T*& as a type and you get type mismatches.
So you are proposing to create more restrictions without significant safety gains, which is likely why it was not in the standard to begin with.
The point of make_shared is to make the construction of a shared pointer atomic. Say you have f(shared_ptr<int>(new int(5)), throw_some_exception()). The order of evaluation of function arguments is not guaranteed by the standard. The compiler is allowed to create the new int, run throw_some_exception, and only then construct the shared_ptr, which means that you could leak the int (if throw_some_exception actually throws an exception). make_shared just creates the object and the shared pointer inside itself, which doesn't allow the compiler to reorder, so it becomes safe.
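A sketch of that hazard (names hypothetical). Note that C++17 later banned this interleaving of argument evaluation, but make_shared remains the simpler and safer spelling:

```cpp
#include <memory>
#include <stdexcept>

int throw_some_exception() { throw std::runtime_error("boom"); }

int use(std::shared_ptr<int> p, int x) { return *p + x; }

int demo() {
    try {
        // Pre-C++17, the compiler could evaluate `new int(5)`, then
        // throw_some_exception(), and only then run the shared_ptr
        // constructor, leaking the int when the exception fires.
        return use(std::shared_ptr<int>(new int(5)), throw_some_exception());
    } catch (const std::runtime_error&) {
        // make_shared fuses allocation and ownership into one call,
        // so no raw pointer is ever exposed between subexpressions.
        return use(std::make_shared<int>(5), 0);
    }
}
```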
I do not have any special insight into the design of shared_ptr, but I think the most likely explanation is that the timelines involved made this impossible:
The shared_ptr was standardized alongside rvalue references in C++11 (having already appeared in TR1). It already had a working reference implementation in boost, so it could be expected to be added to standard libraries relatively quickly.
If the constructor for shared_ptr had only supported construction from rvalue references, it would have been unusable until the compiler had also implemented support for rvalue references.
And at that time, compiler and standards development was much more asynchronous, so it could have taken years until all compilers had implemented support, if at all. (export templates were still fresh in people's minds in 2011)
Additionally, I assume the standards committee would have felt uncomfortable standardizing an API that did not have a reference implementation, and could not even get one until after the standard was published.
There's a number of cases in which you may not be able to call make_shared(). For example, your code may not be responsible for allocating and constructing the class in question. The following paradigm (private constructors + factory functions) is used in some C++ code bases for a variety of reasons:
struct A {
private:
    A();
    friend A* A_factory();  // the factory needs access to the private constructor
};

A* A_factory();
In this case, if you wanted to stick the A* you get from A_factory() into a shared_ptr<>, you'd have to use the constructor which takes a raw pointer instead of make_shared().
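A compilable version of that paradigm (names hypothetical), with the factory befriended so it can reach the private constructor; note that make_shared cannot be used here precisely because it, rather than your code, would need constructor access:

```cpp
#include <memory>

struct A {
    int tag() const { return 7; }
private:
    A() = default;
    friend A* A_factory();  // only the factory may construct A
};

A* A_factory() { return new A; }

int demo() {
    // std::make_shared<A>() would not compile: the constructor is private.
    std::shared_ptr<A> sp(A_factory());  // the raw-pointer ctor is the only route
    return sp->tag();
}
```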
Off the top of my head, some other examples:
You want to get aligned memory for your type using posix_memalign() and then store it in a shared_ptr<> with a custom deleter that calls free() (this use case will go away soon when we add aligned allocation to the language!).
You want to stick a pointer to a memory-mapped region created with mmap() into a shared_ptr<> with a custom deleter that calls munmap() (this use case will go away when we get a standardized facility for shmem, something I'm hoping to work on in the next few months).
You want to stick a pointer allocated by some other facility into a shared_ptr<> with a custom deleter.
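All of these examples share one shape: a shared_ptr whose deleter matches a non-new allocator. A portable sketch using plain malloc/free in place of posix_memalign or mmap:

```cpp
#include <cstdlib>
#include <memory>

int demo() {
    // Memory that must NOT be released with `delete`:
    int* raw = static_cast<int*>(std::malloc(sizeof(int)));
    *raw = 9;
    // Pair it with a custom deleter; the same shape works for
    // posix_memalign()/free() or mmap()/munmap() pairs.
    std::shared_ptr<int> p(raw, [](int* q) { std::free(q); });
    return *p;  // freed via std::free when the reference count reaches zero
}
```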

Why don't smart pointers have conversion operator back to the base pointer?

I often find myself using code like this:
boost::scoped_ptr<TFoo> f(new TFoo);
Bar(f.get()); // call legacy or 3rd party function : void Bar (TFoo *)
Now, I think the smart pointers could easily define an implicit conversion operator back to the 'raw' pointer type, which would allow this code to still be valid, and ease the 'smartening' of old code
Bar(f);
But, they don't -or at least, not the ones I've found. Why?
IMO implicit conversion is the root of all evil in C++, and one of the toughest kinds of bugs to track down.
It's good practice not to rely on them; you can't predict all behaviours.
Because it's then very easy to accidentally bypass the smart pointer. For example, what if you write:
delete f;
In your example, bad things would happen. Real functions could be similar: they might store their own copy of the pointer, which then breaks the smart pointer. At least calling get() forces you to think "is this safe?"

Is it dangerous to have a cast operator on a unique_ptr?

We have an extensive code base which currently uses raw pointers, and I'm hoping to migrate to unique_ptr. However, many functions expect raw pointers as parameters and a unique_ptr cannot be used in these cases. I realize I can use the get() method to pass the raw pointer, but this increases the number of lines of code I have to touch, and I find it a tad unsightly. I've rolled my own unique_ptr which looks like this:
template <class T>
class my_unique_ptr : public unique_ptr<T>
{
public:
    using unique_ptr<T>::unique_ptr;        // inherit the constructors
    operator T*() { return this->get(); }   // this-> needed to find get() in the dependent base
};
Then every time I pass a my_unique_ptr to a function parameter which expects a raw pointer, it automagically turns into the raw pointer.
Question: Is there something inherently dangerous about doing this? I would have thought this would have been part of the unique_ptr implementation, so I'm presuming its omission is deliberate - does anyone know why?
There are a lot of ugly things that can happen by accident with implicit conversions, such as this:
std::unique_ptr<resource> grab_resource()
{
    return std::unique_ptr<resource>(new resource());
}

int main() {
    resource* ptr = grab_resource(); // compiles just fine, no problem
    ptr->thing(); // except the resource has been deallocated before this line
    return 0; // This program has undefined behavior.
}
It is the same as invoking get() on the unique_ptr<>, but it will be done automatically. You will have to make sure the pointer is not stored or used after the unique_ptr<> dies (as the unique_ptr<> will delete the pointee when its own lifetime ends).
Also make sure you don't call delete (even indirectly) on the raw pointer.
Yet another thing to make sure of is that you do not create another smart pointer that takes ownership of the pointer (e.g. another unique_ptr<>); see the delete note above.
The reason unique_ptr<> does not do the conversion for you automatically (making you call get() explicitly) is to ensure you keep control over when you access the raw pointer, so that the above issues cannot happen silently.
The main "danger" in providing an implicit conversion operator (to a raw pointer) on a unique_ptr stems from the fact that unique_ptr is supposed to model single-ownership semantics. This is also why it's not possible to copy a unique_ptr, but it can be moved.
Consider an example of this "danger":
class A {};
/* ... */
unique_ptr<A> a(new A);
A* a2 = a;
unique_ptr<A> a3(a2);
Now two unique_ptrs model single-ownership semantics over the object pointed to, and the fate of the free world -- nay, the universe -- hangs in the balance.
OK, I'm being a bit dramatic, but that's the idea.
As far as workarounds go, I would normally just call .get() on the pointer and be done with it, being careful to recognize a lack of ownership on what I just got().
Given that you have a large, legacy code-base that you are trying to migrate to using unique_ptr, I think your conversion wrapper is fine -- but only assuming you and your co-workers don't make mistakes in the future when maintaining this code. If that is a possibility you find likely (and I normally would because I'm paranoid), I would try to retrofit all existing code to call .get() explicitly instead of providing an implicit conversion. The compiler will happily find all the instances for you where this change needs to be made.
Is it dangerous to have a cast operator on a unique_ptr?
Edited see comments
~~No, it certainly should not be dangerous~~: in fact the standard requires an ~~implicit~~ explicit conversion to bool (see the Safe Bool Idiom for the C++03 history).
Explanation for edit:
The new standard (which also introduced unique_ptr) added explicit operator T() casts. For unique_ptr this still works much like an implicit conversion, in that if (x), while (x) and for (; x; ) perform automatic contextual conversion to bool.
In my defense, my knowledge was largely based on C++03 libraries like Boost, which do define the conversion to unspecified-bool-type as implicit.
I hope this is still informative to anyone else.
(see the other answers for more canonical treatment, including the conversion to raw pointer)
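The contextual conversion mentioned in the edit can be demonstrated briefly; because operator bool is explicit, conditions work but genuine implicit conversions do not:

```cpp
#include <memory>

int demo() {
    std::unique_ptr<int> p(new int(1));
    std::unique_ptr<int> q;  // empty
    int n = 0;
    if (p) n += 1;   // explicit operator bool, contextually converted
    if (!q) n += 2;  // also fine in conditions
    // bool b = p;   // ill-formed: no implicit conversion outside a condition
    return n;
}
```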

auto_ptr design

In my opinion, a class should provide a well-defined abstraction, and no private members should be modified without the knowledge of the class. But when I checked auto_ptr (or any other smart pointer), this rule is violated. Please see the following code:
class Foo {
public:
    Foo() {}
};

int main(int argc, char* argv[])
{
    std::auto_ptr<Foo> fooPtr(new Foo);
    delete fooPtr.operator->();
    return 0;
}
The operator overload (->) gives out the underlying pointer, and the pointee can be deleted without the knowledge of auto_ptr. I can't see it as simply bad design, since the smart pointers were designed by C++ geeks, but I am wondering why they allowed this. Is there any way to write a smart pointer without this problem?
Appreciate your thoughts.
There are two desirable properties a smart pointer should have:
The raw pointer can be retrieved (e.g. for passing to legacy library functions)
The raw pointer cannot be retrieved (to prevent double-delete)
Obviously, these properties are contradictory and cannot be realised at the same time! Even Boost's shared_ptr<Foo> et al. have get(), so they have this "problem." In practice, the first is more important, so the second has to go.
By the way, I'm not sure why you reached for the slightly obscure operator->() when the ordinary old get() method causes the same problem:
std::auto_ptr<Foo> fooPtr(new Foo);
delete fooPtr.get();
In order to provide fast, convenient, "pointer-like" access to the underlying object, operator-> unfortunately has to "leak" its abstraction a bit. Otherwise, smart pointers would have to manually wrap all of the members that are allowed to be exposed. This either requires a lot of "configuration" work on the part of those instantiating the smart pointer, or a level of meta-programming that just isn't present in C++. Besides, as pyrsta points out, even if this hole were plugged, there are still many other (perhaps non-standard) ways to subvert C++'s access control mechanisms.
Is there any way to write a smart pointer without this problem?
It isn't easy, and generally no (i.e., you can't do it for every, general Foo class).
The only way I can think of, to do this, would be by changing the declaration of the Foo class: make the Foo destructor private (or define a private delete operator as a member of the Foo class), and also specify in the declaration of the Foo class that std::auto_ptr<Foo> is a friend.
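A sketch of that friend-based approach, using unique_ptr for concreteness: the destructor is made private, and only the smart pointer's deleter component is befriended, so stray deletes fail to compile. Names are hypothetical:

```cpp
#include <memory>

class Foo {
public:
    Foo() = default;
    int id() const { return 4; }
private:
    ~Foo() = default;  // blocks `delete` from arbitrary code...
    friend struct std::default_delete<Foo>;  // ...except the smart pointer's deleter
};

int demo() {
    std::unique_ptr<Foo> p(new Foo);
    // delete p.get();  // ill-formed: ~Foo() is inaccessible here
    return p->id();
}
```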
No, there's no way to completely prohibit such bad usage in C++.
As a general rule, the user of any library code should never call delete on any wrapped pointers unless specifically documented. And in my opinion, all modern C++ code should be designed so that the user of a class is never left with the full responsibility of manually releasing her acquired resources (i.e. use RAII instead).
Aside note: std::auto_ptr<T> isn't the best option anymore. Its bad behaviour on copying can lead to serious coding errors. Often a better idea is to use boost::scoped_ptr<T> or std::tr1::shared_ptr<T> (or its Boost variant) instead.
Moreover, in C++0x, std::unique_ptr<T> will functionally supersede std::auto_ptr<T> as a safer-to-use class. Some discussion on the topic and a recent C++03 implementation for unique_ptr emulation can be found here.
I don't think this shows that auto_ptr has an encapsulation problem. Whenever dealing with owned pointers, it is critical for people to understand who owns what. In the case of auto_ptr, it owns the pointer that it holds[1]; this is part of auto_ptr's abstraction. Therefore, deleting that pointer in any other way violates the contract that auto_ptr provides.
I'd agree that it's relatively easy to mis-use auto_ptr[2], which is very not ideal, but in C++, you can never avoid the fundamental issue of "who owns this pointer?", because for better or worse, C++ does not manage memory for you.
[1] Quote from cplusplus.com: "auto_ptr objects have the peculiarity of taking ownership of the pointers assigned to them": http://www.cplusplus.com/reference/std/memory/auto_ptr/
[2] For example, you might mistakenly believe that it has value semantics, and use it as a vector template parameter: http://www.gamedev.net/topic/502150-c-why-is-stdvectorstdauto_ptrmytype--bad/
I think this question addresses a non-issue. Smart pointers are there to manage ownership of pointers, and if doing so they make the pointer inaccessible, they fail their purpose.
Also consider this: any container type gives you iterators over it; if it is such an iterator, then &*it is a pointer to an item in the container, and if you say delete &*it then you are dead. But exposing the addresses of its items is not a defect of container types.