Why hasn't not_null made it into the C++ standard yet? - c++

After adding the comment "// not null" to a raw pointer for the Nth time I wondered once again whatever happened to the not_null template.
The C++ core guidelines were created quite some time ago now and a few things have made into into the standard including for example std::span (some like string_view and std::array originated before the core guidelines themselves but are sometimes conflated). Given its relative simplicity why hasn't not_null (or anything similar) made it into the standard yet?
I scan the ISO mailings regularly (but perhaps not thoroughly) and I am not even aware of a proposal for it.
Possibly answering my own question. I do not recall coming across any cases where it would have prevented a bug in code I've worked on as we try not to write code that way.
The guidelines themselves are quite popular, making it into clang-tidy and sonar for example. The support libraries seem a little less popular.
For example boost has been available as a package on Linux from near the start. I am not aware of any implementation of GSL that is. Though, I presume it is bundled with Visual C++ on windows.
Since people have asked in the comments.
Myself I would use it to document intent.
A construct like not_null<> potentially has semantic value which a comment does not.
Enforcing it is secondary though I can see its place. This would preferably be done with zero overhead (perhaps at compile time only for a limited number of cases).
I was thinking mainly about the case of a raw pointer member variable. I had forgotten about the case of passing a pointer to a function for which I always use a reference to mean not-null and also to mean "I am not taking ownership".
Likewise (for class members) we could also document ownership owned<> not_owned<>.
I suppose there is also whether the associated object is allowed to be changed. This might be too high level though. You could use references members instead of pointers to document this. I avoid reference members myself because I almost always want copyable and assignable types. However, see for example Should I prefer pointers or references in member data? for some discussion of this.
Another dimension is whether another entity can modify a variable.
"const" says I promise not to modify it. In multi-threaded code we would like to say almost the opposite. That is "other code promises not to modify it while we are using it" (without an explicit lock) but this is way off topic...

There is one big technical issue that is likely unsolvable which makes standardizing not_null a problem: it cannot work with move-only smart pointers.
The most important use case for not_null is with smart pointers (for raw pointers a reference usually is adequate, but even then, there are times when a reference won't work). not_null<shared_ptr<T>> is a useful thing that says something important about the API that consumes such an object.
But not_null<unique_ptr<T>> doesn't work. It cannot work. The reason being that moving from a unique pointer leaves the old object null. Which is exactly what not_null is expected to prevent. Therefore, not_null<T> always forces a copy on its contained T. Which... you can't do with a unique_ptr, because that's the whole point of the type.
Being able to say that the unqiue_ptr consumed by an API is not null is good and useful. But you can't actually do that with not_null, which puts a hole in its utility.
So long as move-only smart pointers can't work with not_null, standardizing the class becomes problematic.

Related

Should I now be passing by value?

In this talk (sorry about the sound) Chandler Carruth suggests not passing by reference, even const reference, in the vast majority of cases due to the way in which it limits the back-end to perform optimisation.
He claims that in most cases the copy is negligible - which I am happy to believe, most data structures/classes etc. have a very small part allocated on the stack - especially when compared with the back-end having to assume pointer aliasing and all the nasty things that could be done to a reference type.
Let's say that we have large object on the stack - say ~4kB and a function that does something to an instance of this object (assume free-standing function).
Classically I would write:
void DoSomething(ExpensiveType* inOut);
ExpensiveType data;
...
DoSomething(&data);
He's suggesting:
ExpensiveType DoSomething(ExpensiveType in);
ExpensiveType data;
...
data = DoSomething(data);
According to what I got from the talk, the second would tend to optimise better. Is there a limit to how big I make something like this though, or is the back-end copy-elision stuff just going to prefer the values in almost all cases?
EDIT: To clarify I'm interested in the whole system, since I feel that this would be a major change to the way I write code, I've had use of refs over values drilled into me for anything larger than integral types for a long time now.
EDIT2: I tested it too, results and code here. No competition really, as we've been taught for a long time, the pointer is a far faster method of doing things. What intrigues me now is why it was suggested during that talk that we move to pass by value, but as the numbers don't support it, it's not something I'm going to do.
I have now watched parts of Chandler's talk. I think the general discussion along the lines "should I now always pass by value" does not do his talk justice. Edit: And actually his talk has been discussed before, here value semantics vs output params with large data structures and in a blog from Eric Niebler, http://ericniebler.com/2013/10/13/out-parameters-vs-move-semantics/.
Back to Chandler. In the key note he specifically (around the 4x-5x minute mark mentioned elsewhere) mentions the following points:
If the optimizer cannot see the code of the called function you have much bigger problems than passing refs or values. It pretty much prevents optimization. (There is a follow-up question at that point about link time optimization which may be discussed later, I don't know.)
He recommends the "new classical" way of returning values using move semantics. Instead of the old school way of passing a reference to an existing object as an in-out parameter the value should be constructed locally and moved out. The big advantage is that the optimizer can be sure that no part of the object is alisased since only the function has access to it.
He mentions threads, storing a variable's value in globals, and observable behaviour like output as examples for unknowns which prevent optimization when only refs/pointers are passed. I think an abstract description could be "the local code can not assume that local value changes are undetected elsewhere, and it cannot assume that a value which is not changed locally has not changed at all". With local copies these assumptions could be made.
Obviously, when passing (and possibly, if objects cannot be moved, when returning) by value, there is a trade-off between the copy cost and the optimization benefits. Size and other things making copying costly will tip the balance towards reference strategies, while lots of optimizable work on the object in the function tips it towards value passing. (His examples involved pointers to ints, not to 4k sized objects.)
Based on the parts I watched I do not think Chandler promoted passing by value as a one-fits-all strategy. I think he dissed passing by reference mostly in the context of passing an out parameter instead of returning a new object. His example was not about a function which modified an existing object.
On a general note:
A program should express the programmer's intent. If you need a copy, by all means do copy! If you want to modify an existing object, by all means use references or pointers. Only if side effects or run time behavior become unbearable; really only then try do do something smart.
One should also be aware that compiler optimizations are sometimes surprising. Other platforms, compilers, compiling options, class libraries or even just small changes in your own code may all prevent the compiler from coming to the rescue. The run-time cost of the change would in many cases come totally unexpected.
Perhaps you took that part of the talk out of context, or something. For large objects, typically it depends on whether the function needs a copy of the object or not. For example:
ExpensiveType DoSomething(ExpensiveType in)
{
cout << in.member;
}
you wasted a lot of resource copying the object unnecessarily, when you could have passed by const reference instead.
But if the function is:
ExpensiveType DoSomething(ExpensiveType in)
{
in.member = 5;
do_something_else(in);
}
and we did not want to modify the calling function's object, then this code is likely to be more efficient than:
ExpensiveType DoSomething(ExpensiveType const &inr)
{
ExpensiveType in = inr;
in.member = 5;
do_something_else(in);
}
The difference comes when invoked with an rvalue (e.g. DoSomething( ExpensiveType(6) ); The latter creates a temporary , makes a copy, then destroys both; whereas the former will create a temporary and use that to move-construct in. (I think this can even undergo copy elision).
NB. Don't use pointers as a hack to implement pass-by-reference. C++ has native pass by reference.

Well definedness of C++ programs hiding pointers

According to Wikipedia:
C++11 defines conditions under which pointer values are "safely
derived" from other values. An implementation may specify that it
operates under "strict pointer safety," in which case pointers that
are not derived according to these rules can become invalid.
As I read it you can get the safety model used by an implementation, however that's fixed for the compiler (possibly variable with a command line switch).
Suppose I have code that hides pointers, such code definitely would not run with a naive bolt on garbage collector. However collectors (like my own) and Boehm provide hooks for finding pointers in certain objects.
I am in particular thinking about JudyArrays. These are digital tries which necessarily hide the keys. My question is basically whether using such data structures would render the behaviour of a program undefined in C++11.
I hope not (since Judy Arrays outperform everything else). Also as it happens .. I'm using them to implement a garbage collector. I am concerned however because "minimal requirements" don't general work at all and were strongly opposed in the original debate on the C++ conformance model (by the UK and Australia). Parametric requirements are better. But the C++11 GC related text seems to be a bit of both so I'm confused!
It's implementation defined whether an implementation provides relaxed pointer safety (what you seem to want) or strict pointer safety (pointers remain valid only when safely derived). As you've implied, you can call get_pointer_safety to find out what the policy is, but the standard provides no way to specify/change the policy.
You may, however, be able to side-step this question. If you can make a call to declare_reachable (passing that pointer value) before you hide the pointer, it remains valid until a matching call to undeclare_reachable (and here "matching" means calls nest).

What is preferred way of passing pointer/reference to existing object in a constructor?

I'll start from example. There is a nice "tokenizer" class in boost. It take a string to be tokenized as a parameter in a constructor:
std::string string_to_tokenize("a bb ccc ddd 0");
boost::tokenizer<boost::char_separator<char> > my_tok(string_to_tokenize);
/* do something with my_tok */
The string isn't modifed in the tokenizer, so it is passed by const object reference. Therefore I can pass a temporary object there:
boost::tokenizer<boost::char_separator<char> > my_tok(std::string("a bb ccc ddd 0"));
/* do something with my_tok */
Everything looks fine, but if I try to use the tokenizer, a disaster occurs. After short investigation I realized, that the tokenizer class store the reference that I gave to it, and use in further use. Of course it cannot work well for reference to temporary object.
The documentation doesn't say explicitly that the object passed in the constructor will be used later, but ok, it is also not stated, that it won't be :) So I cannot assume this, my mistake.
It is a bit confusing however. In general case, when one object take another one by const reference, it suggest that temporary object can be given there. What do you think? Is this a bad convention? Maybe pointer to object (rather than reference) should be used in such cases? Or even further - won't it be useful to have some special keyword to argument that allow/disallow giving temporary object as parameter?
EDIT: The documentation (version 1.49) is rather minimalistic and the only part that may suggest such a problem is:
Note: No parsing is actually done upon construction. Parsing is done on demand as the tokens are accessed via the iterator provided by begin.
But it doesn't state explicitly, that the same object that was given will be used.
However, the point of this question is rather discussion about coding style in such a case, this is only an example that inspired me.
If some function (such as a constructor) takes an argument as reference-to-const then it should either
Document clearly that the lifetime of the referenced object must satisfy certain requirements (as in "Is not destroyed before this and that happens")
or
Create copies internally if it needs to make use of the given object at a later point.
In this particular case (the boost::tokenizer class) I'd assume that the latter isn't done for performance reasons and/or to make the class usable with container types which aren't even copyable in the first place. For this reason, I'd consider this a documentation bug.
Personally I think it's a bad idea, and it would be better write the constructor either to copy the string, or to take a const std::string* instead. It's only one extra character for the caller to type, but that character stops them accidentally using a temporary.
As a rule: don't create responsibilities on people to maintain objects without making it very obvious that they have that responsibility.
I think a special keyword wouldn't be a complete enough solution to justify changing the language. It's not actually temporaries that are the problem, it's any object that lives for less time than the object being constructed. In some circumstances a temporary would be fine (for example if the tokenizer object itself were also a temporary in the same full-expression). I don't really want to mess about with the language for the sake of half a fix, and there are fuller fixes available (for example take a shared_ptr, although that has its own issues).
"So I cannot assume this, my mistake"
I don't think it really is your mistake, I agree with Frerich that as well as being against my personal style guide to do this at all, if you do it and don't document then that's a documentation bug in any reasonable style guide.
It's absolutely essential that the required lifetime of by-reference function parameters is documented, if it's anything other than "at least as long as the function call". It's something that docs are often lax about, and needs to be done properly to avoid errors.
Even in garbage-collected languages, where lifetime itself is automatically handled and so tends to get neglected, it matters whether or not you can change or re-use your object without changing the behavior of some other object that you passed it to method of, some time in the past. So functions should document whether they retain an alias to their arguments in any language that lacks referential transparency. Especially so in C++ where object lifetime is the caller's problem.
Unfortunately the only mechanism to actually ensure that your function cannot retain a reference is to pass by value, which has a performance cost. If you can invent a language that allows aliasing normally, but also has a C-style restrict property that is enforced at compile-time, const-style, to prevent functions from squirreling away references to their arguments, then good luck and sign me up.
As others said, the boost::tokenizer example is the result of either a bug in the tokenizer or a warning missing from the documentation.
To generally answer the question, I found the following priority list useful. If you can't choose an option for some reason, you go to the next item.
Pass by value (copyable at an acceptable cost and don't need to change original object)
Pass by const reference (don't need to change original object)
Pass by reference (need to change original object)
Pass by shared_ptr (the lifetime of the object is managed by something else, this also clearly shows the intention to keep the reference)
Pass by raw pointer (you got an address to cast to, or you can't use a smart pointer for some reason)
Also, if your reasoning to choose the next item from the list is "performance", then sit down and measure the difference. In my experience, most people (especially with Java or C# backgrounds) tend to over-estimate the cost of passing an object by value (and under-estimate the cost of dereferencing). Passing by value is the safest option (it will not cause any surprises outside the object or function, not even in another thread), don't give up that huge advantage easily.
A lot of time it will depend on context, for example if it's a functor which will be called in a for_each or similar, then you will often store a reference or a pointer within your functor to an object you expect will have a lifetime beyond your functor.
If it is a general use class then you have to consider how people are going to use it.
If you are writing a tokenizer, you need to consider that copying what you are tokenizing over might be expensive, however you also need to consider that if you are writing a boost library you are writing it for the general public who will use it in a multi-purpose way.
Storing a const char * would be better than a std::string const& here. If the user has a std::string then the const char * will remain valid as long as they don't modify their string, and they probably won't. If they have a const char * or something that holds an array of chars and passes it in, it will copy it anyway to create the std::string const & and you are in great danger of the fact that it won't live past your constructor.
Of course, with a const char * you can't use all the lovely std::basic_string functions in your implementation.
There is an option to take, as parameter, a std::string& (not const reference) which should guarantee (with a compliant compiler) that nobody will pass in a temporary, but you will be able to document that you don't actually change it, and the rationale behind your seemingly not const-correct code. Note, I have used this trick once in my code too. And you can happily use string's find functions. (As well as, if you wish, taking basic_string rather than string so you can tokenize wide character strings too).

c++: Loki StrongPtr looks unsafe to me, is that so?

I am currently looking at the most popular smart Ptr implementations such as boost shared and weak pointers aswell as loki Smart and Strong pointer since I want to implement my own and from what I understand Loki Strong pointer looks unsafe to me but I rather think that I understand it wrong so I'd like to discuss whether it's safe or not. The reason why I think it's not safe is that as far as I can tell it does not treat weak Pointers (that is a StrongPtr, where false indicates its weak) with enough care:
for instance the dereferencing functions:
PointerType operator -> ()
{
KP::OnDereference( GetPointer() ); //this only asserts by default as far as i know
//could be invalidated right here
return GetPointer();
}
In a multithreaded environment a weak pointer could be invalidated at any time, so that this function might return an invalidated Ptr.
As far as my understanding goes you would either have to create a strongPtr instance of the ptr you are dereferencing to ensure that it does not get invalidated half way through. I think thats also the reason why boost does not allow you to dereference a weak_ptr without creating a shared_ptr instance first. Lokis StrongPtr Constructor suffers from the same problem I think.
Is this a problem or am I reading the src wrong?
Regarding the use of assert, it's a programming error to use operator-> on an empty StrongPtr<> instance; i.e., it is the caller's responsibility to ensure that the StrongPtr<> instance is non-empty before dereferencing it. Why should anything more than an assert be needed? That said, if you deem some other behavior more appropriate than assert, then that's exactly what the policy is for.
This is a fundamental difference between preconditions and postconditions; here's a long but very good thread on the subject: comp.lang.c++.moderated: Exceptions. Read in particular the posts by D. Abrahams, as he explains in detail what I'm stating as understood fact. ;-]
Regarding the thread-safety of StrongPtr<>, I suspect most of Loki predates any serious thread-safety concerns; on the other hand, boost::shared_ptr<> and std::shared_ptr<> are explicitly guaranteed to be thread-safe, so I'm sure their implementations make for a "better" (though much more complicated) basis for study.
After reading carefully, I think I saw the rationale.
StrongPtr objects are dual in that they represent both Strong and Weak references.
The mechanism of assert works great on a Strong version. On a Weak version, it is the caller's responsability to ensure that the object referenced will live long enough. This can be achieved either:
automatically, if you know that you have a Strong version
manually, by creating a Strong instance
The benefit wrt std::shared_ptr is that you can avoid creating a new object when you already know that the item will outlive your use. It's an arguable design decision, but works great for experts (of which Alexandrescu undoubtebly is). It may not have been targetted at regular users (us) for which enforcing that a Strong version be taken would have been much better imho.
One could also argue that it's always easier to criticize with the benefit of hindsight. Loki, for all its greatness, is old.

If de-referencing a NULL pointer is an invalid thing to do, how should auto pointers be implemented?

I thought dereferencing a NULL pointer was dangerous, if so then what about this implementation of an auto_ptr?
http://ootips.org/yonat/4dev/smart-pointers.html
If the default constructor is invoked without a parameter the internal pointer will be NULL, then when operator*() is invoked won't that be dereferencing a null pointer?
Therefore what is the industrial strength implementation of this function?
Yes, dereferencing NULL pointer = bad.
Yes, constructing an auto_ptr with NULL creates a NULL auto_ptr.
Yes, dereferencing a NULL auto_ptr = bad.
Therefore what is the industrial strength implementation of this function?
I don't understand the question. If the definition of the function in question created by the industry itself is not "industrial strength" then I have a very hard time figuring out what might be.
std::auto_ptr is intended to provide essentially the same performance as a "raw" pointer. To that end, it doesn't (necessarily) do any run-time checking that the pointer is valid before being dereferenced.
If you want a pointer that checks validity, it's relatively easy to provide that, but it's not the intended purpose of auto_ptr. In fairness, I should add that the real intent of auto_ptr is rather an interesting question -- its specification was changed several times during the original standardization process, largely because of disagreements over what it should try to accomplish. The version that made it into the standard does have some uses, but quite frankly, not very many. In particular, it has transfer-of-ownership semantics that make it unsuitable for storage in a standard container (among other things), removing one of the obvious purposes for smart pointers in general.
Its purpose to help prevent memory leaks by ensuring that delete is performed on the underlying pointer whenever the auto_ptr goes out of scope (or itself is deleted).
Just like in higher-level languages such as C#, trying to dereference a null pointer/object will still explode, as it should.
Do what you would do if you dereferenced a NULL pointer. On many platforms, this means throw an exception.
Well, just like you said: dereferencing null pointer is illegal, leads to undefined behavior. This immediately means that you must not use operator * on a default-constructed auto_ptr. Period.
Where exactly you see a problem with "industrial strength" of this implementation is not clear to me.
#Jerry Coffin: It is naughty of me to answer your answer here rather than the OP's question but I need more lines than a comment allows..
You are completely right about the ridiculous semantics of the current version, it is so completely rotten that a new feature: "mutable" HAD to be added to the language just to allow these insane semantics to be implemented.
The original purpose of "auto_ptr" was exactly what boost::scoped_ptr does (AFAIK), and I'm happy to see a version of that finally made it into the new Standard. The reason for the name "auto_ptr" is that it should model a first class pointer on the stack, i.e. an automatic variable.
This auto_ptr was an National Body requirement, based on the following logic: if we have catchable exceptions in C++, we MUST have a way to manage pointers which is exception safe IN the Standard. This also applies to non-static class members (although that's a slightly harder problem which required a change to the syntax and semantics of constructors).
In addition a reference counting pointer was required but due to a lot of different possible implementation with different tradeoffs, one can accept that this be left out of the Standard until a later time.
Have you ever played that game where you pass a message around a ring of people and at the end someone reads out the input and output messages? That's what happened. The original intent got lost because some people thought that the auto_ptr, now we HAD to have it, could be made to do more... and finally what got put in the standard can't even do what the original simple scope_ptr style one did (auto_ptr semantics don't assure the pointed at object is destroyed because it could be moved elsewhere).
If I recall the key problem was returning the value of a auto_ptr: the core design simply doesn't allow that (it's uncopyable). A sane solution like
return ap.swap(NULL)
unfortunately still destroys the intended invariant. The right way is probably closer to:
return ap.clone();
which copies the object and returns the copy, destroying the original: the compiler is then free to optimise away the copy (as written not exception safe .. the clone might leak if another exception is thrown before it returns: a ref counted pointer solves this of course).