Well definedness of C++ programs hiding pointers - c++

According to Wikipedia:
C++11 defines conditions under which pointer values are "safely
derived" from other values. An implementation may specify that it
operates under "strict pointer safety," in which case pointers that
are not derived according to these rules can become invalid.
As I read it you can get the safety model used by an implementation, however that's fixed for the compiler (possibly variable with a command line switch).
Suppose I have code that hides pointers. Such code definitely would not run with a naive bolt-on garbage collector. However, collectors such as Boehm's (and my own) provide hooks for finding pointers in certain objects.
I am thinking in particular about Judy arrays. These are digital tries which necessarily hide the keys. My question is basically whether using such data structures would render the behaviour of a program undefined in C++11.
I hope not (since Judy arrays outperform everything else). Also, as it happens, I'm using them to implement a garbage collector. I am concerned, however, because "minimal requirements" don't generally work at all and were strongly opposed in the original debate on the C++ conformance model (by the UK and Australia). Parametric requirements are better. But the C++11 GC-related text seems to be a bit of both, so I'm confused!

It's implementation defined whether an implementation provides relaxed pointer safety (what you seem to want) or strict pointer safety (pointers remain valid only when safely derived). As you've implied, you can call get_pointer_safety to find out what the policy is, but the standard provides no way to specify/change the policy.
You may, however, be able to side-step this question. If you can make a call to declare_reachable (passing that pointer value) before you hide the pointer, it remains valid until a matching call to undeclare_reachable (and here "matching" means calls nest).

Why hasn't not_null made it into the C++ standard yet?

After adding the comment "// not null" to a raw pointer for the Nth time I wondered once again whatever happened to the not_null template.
The C++ Core Guidelines were created quite some time ago now, and a few things have made it into the standard, including for example std::span (some, like string_view and std::array, originated before the Core Guidelines themselves but are sometimes conflated). Given its relative simplicity, why hasn't not_null (or anything similar) made it into the standard yet?
I scan the ISO mailings regularly (but perhaps not thoroughly) and I am not even aware of a proposal for it.
Possibly answering my own question. I do not recall coming across any cases where it would have prevented a bug in code I've worked on as we try not to write code that way.
The guidelines themselves are quite popular, making it into clang-tidy and sonar for example. The support libraries seem a little less popular.
For example, Boost has been available as a package on Linux from near the start. I am not aware of any implementation of GSL that is, though I presume it is bundled with Visual C++ on Windows.
Since people have asked in the comments.
Myself I would use it to document intent.
A construct like not_null<> potentially has semantic value which a comment does not.
Enforcing it is secondary though I can see its place. This would preferably be done with zero overhead (perhaps at compile time only for a limited number of cases).
I was thinking mainly about the case of a raw pointer member variable. I had forgotten about the case of passing a pointer to a function for which I always use a reference to mean not-null and also to mean "I am not taking ownership".
Likewise (for class members) we could also document ownership owned<> not_owned<>.
I suppose there is also the question of whether the associated object is allowed to be changed. This might be too high level though. You could use reference members instead of pointers to document this. I avoid reference members myself because I almost always want copyable and assignable types. However, see for example "Should I prefer pointers or references in member data?" for some discussion of this.
Another dimension is whether another entity can modify a variable.
"const" says I promise not to modify it. In multi-threaded code we would like to say almost the opposite. That is "other code promises not to modify it while we are using it" (without an explicit lock) but this is way off topic...
There is one big technical issue that is likely unsolvable which makes standardizing not_null a problem: it cannot work with move-only smart pointers.
The most important use case for not_null is with smart pointers (for raw pointers a reference usually is adequate, but even then, there are times when a reference won't work). not_null<shared_ptr<T>> is a useful thing that says something important about the API that consumes such an object.
But not_null<unique_ptr<T>> doesn't work. It cannot work. The reason being that moving from a unique pointer leaves the old object null. Which is exactly what not_null is expected to prevent. Therefore, not_null<T> always forces a copy on its contained T. Which... you can't do with a unique_ptr, because that's the whole point of the type.
Being able to say that the unique_ptr consumed by an API is not null is good and useful. But you can't actually do that with not_null, which puts a hole in its utility.
So long as move-only smart pointers can't work with not_null, standardizing the class becomes problematic.
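The conflict described in this answer can be reproduced with the standard library alone. The following is only a minimal sketch: consume and source_is_null_after_move are hypothetical names chosen for illustration, and no actual not_null implementation is involved.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical consumer that takes ownership of its argument.
int consume(std::unique_ptr<int> p) {
    return *p;
}

// Returns true if the source pointer is null after being moved from.
bool source_is_null_after_move() {
    std::unique_ptr<int> owner(new int(42));

    // Moving is the only way to hand a unique_ptr to someone else.
    int value = consume(std::move(owner));
    assert(value == 42);

    // The standard guarantees that a moved-from unique_ptr compares equal
    // to nullptr, the very state a not_null wrapper is meant to rule out.
    return owner == nullptr;
}
```

Calling source_is_null_after_move() returns true, which is why a wrapper that promises "never null" cannot also allow its contents to be moved out.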

Technical reason why objects were not movable in the first C++ versions? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
IMPORTANT: I'm not asking for opinions, nor about what's better or worse, but about the objective facts: the actual reasons that Stroustrup or other collaborators had for deciding that objects cannot be relocatable.
I'm not able to locate the technical reason why it was decided that the address of an object cannot be changed (unfortunately any search about moving or copying objects gets cluttered with the move/copy semantics introduced later in the language).
I mean: C++ rightly imposes constructors when an object is copied (not doing so would go against the OOP paradigm). However, from the very beginning, it was decided that moving an object A from address addr1 to addr2 was not allowed unless the copy constructor was invoked. What's the technical reason behind that choice? Why was it decided that its address was so important for an object?
Polymorphic objects have a pointer to their vtable, but the opposite is not true, so moving an object shouldn't break vtables. Thus, there must be another reason for it.
Also, there are other OOP languages in which objects can be freely moved. So, the OOP paradigm is not the reason.
Maybe some additional programming principles were considered undeniable when they designed the C++ language and that required to forbid moving objects? Which ones?
"Maybe some additional programming principles were considered undeniable when they designed the C++ language and that required to forbid moving objects? Which ones?"
The ones that say it should map efficiently onto the hardware implementation, which implies raw pointers, which implies address-identity.
"Also, there are other OOP languages in which objects can be freely moved. So, the OOP paradigm is not the reason."
Yes, these are languages which don't mind inserting an extra layer of indirection or complexity everywhere for the convenience of the garbage collector. This is more-or-less antithetical to C++'s goals.
"it was decided that moving an object A from address addr1 to addr2 was not allowed, unless the copy constructor was invoked"
The constructor is irrelevant, because in C++ all objects (which includes POD structs and even scalar primitives) have an identity. This identity is (or is bound to) their address. You can't move an int, either, you can only copy it. You can't move an object even with a copy constructor. You can't actually move an object even with a move constructor, because that's just a destructive copy. It doesn't pretend to mutate the address of an existing object.
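The point that a move constructor is "just a destructive copy" can be checked directly. This is a small sketch using std::vector; the function name move_creates_a_second_object is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns true if move construction left the source object at its original
// address and produced a second, distinct object.
bool move_creates_a_second_object() {
    std::vector<int> source{1, 2, 3};
    const std::vector<int>* address_before = &source;

    // "Moving" constructs a new object that takes over the source's buffer.
    std::vector<int> destination(std::move(source));

    // The source still exists at the same address; it was only emptied.
    bool source_stayed_put = (&source == address_before);
    bool distinct_objects = (&destination != &source);

    return source_stayed_put && distinct_objects
        && source.empty() && destination.size() == 3;
}
```

Nothing changed address here: the heap buffer changed owners, while source and destination remain two separate objects with two separate identities.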
"I'm not able to locate the technical reason why it was decided that the address of an object cannot be changed"
Well, the address of an object can't be changed now either.
This is only ever possible in languages which hide or abstract raw pointers. C++ exposes raw references and pointers to objects, so there's no way to change an object's address without potentially breaking references to it.
If you want to make an object which is movable in the sense that the Java runtime can shuffle memory around, you need to make sure it's only ever accessible via smart pointers with the required guarantees. Since the this pointer will always be a raw pointer in instance methods, you have to also make sure it either has no instance methods, or that your smart pointers can synchronize access to prevent relocation during a call.
"I'm not able to locate the technical reason why it was decided that the address of an object cannot be changed"
In C++ an object is the bytes it occupies [1]. Even std::memmove doesn't relocate an object; it merely copies its bytes. (std::memmove and std::memcpy differ only in preconditions: memmove allows [src, src + size) to overlap [dest, dest + size), memcpy does not.)
Also you seem to have a different definition of object than what C++ has. Almost everything with a type is an object (only function types and reference types don't categorise objects). ints are objects, pointers (including function pointers) are objects.
A class need not have a vtable [2] if it has no virtual members, or if the compiler statically knows the dynamic type at every call site.
[1] Plus a lifetime. Before it is created and after it is destroyed, those bytes can be something else.
[2] Assuming that the implementation ordinarily uses vtables to dispatch virtual functions. Other methods of dynamic dispatch are possible.
To understand the historical development of C++, you need to understand the context of that development. C++ is of course historically rooted in C. That was not a random choice. C was and is historically very important.
To look at the success of C, we need to contrast it with its contemporaries. Languages like COBOL, Algol and FORTRAN were clunky and inflexible. Languages like Lisp, Smalltalk, and Modula-2 were more elegant, but did not perform well. And in the era where C blossomed, performance was still a very pressing concern. This was well before the era of multi-GHz multi-core CPUs. C could give you almost the performance of assembly, with far more portability and far less development cost.
But as noted, C lacks some of the features that are really relevant when developing bigger programs. Even at the original home of C, this was noted in their own products. AT&T developed their phone switch (5ESS) in a sort of Object-Oriented fashion, but using their own C language. C++ made perfect sense in that context. Improvements in compiler technology made it possible to get the benefits of OO with lower costs than doing it manually. A string type did not need to be very expensive, and it's far easier to use than C's strcpy.
Now C++ needed to formalize how objects worked, and this was a very early decision - it's the key development that led to the fork from C. In fact, C++ redefined the C language such that many C programs were also correct C++ programs, and the struct from C became an object type in C++.
Now we come to your "move" question. In C, you can memcpy a struct. That would break more complex types such as strings. Even memcpy'ing a char* in C is fragile. The C++ solution was the copy ctor and the assignment operator. These inventions made C++ safer compared to C, and allowed more useful data structures. But the very decision here defined how C++ objects work. C structs by default stay put, and so do C++ objects, because otherwise a C program would not be a valid C++ program.
I will rephrase the question as I understand the essence of it, and then answer it.
Why must reallocating an array of objects be done by invoking constructors and destructors? Why can't we just copy the bytes of an old object to a new address and be done with it?
To reiterate, my understanding of moving in OP's terminology is copying object's bytes to a new object (perhaps if it is guaranteed that the original object will not be used any more). It has nothing to do with C++ move semantics.
Copying bytes does not guarantee integrity of an object, for several reasons.
An implementation is free to represent objects using hidden internal pointers. Objects that use multiple inheritance are often represented this way. Copying bytes to a new place invalidates these pointers and destroys the integrity of the object.
Programmers are also free to use internal pointers. An object can contain a pointer to its own subobject. An object can also point to dependent objects which can point back to their owner. Copying bytes in this situation invalidates these pointers and destroys the integrity of the object.
The second bullet point is of course also true for C objects. But in C one cannot simply assign such objects either. Internal pointers need to be updated.
In C++, assignment always works (well, if the user wrote a correct copy constructor and assignment operator). It would be unreasonable to keep assignment always working and break reallocation of certain arrays. So it was not done.
Note that a C++ compiler is allowed to byte-copy objects instead of copy-constructing them, if it can prove that it will not change the semantics of the program.
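The internal-pointer problem from this answer can be sketched in a few lines. RawNode and SafeNode are hypothetical types invented for this illustration: the first is byte-copied, the second re-establishes its invariant in the copy constructor.

```cpp
#include <cassert>
#include <cstring>

// Trivially copyable aggregate holding a pointer into itself.
struct RawNode {
    int value;
    int* self;  // intended to point at this object's own `value`
};

// Same idea, but the copy operations re-establish the invariant.
struct SafeNode {
    int value;
    int* self;

    explicit SafeNode(int v) : value(v), self(&value) {}
    SafeNode(const SafeNode& other) : value(other.value), self(&value) {}
    SafeNode& operator=(const SafeNode& other) {
        value = other.value;  // `self` already points at our own `value`
        return *this;
    }
};

// Byte-copying keeps the old address: the copy points into the original.
bool byte_copy_breaks_invariant() {
    RawNode original{7, nullptr};
    original.self = &original.value;

    RawNode copy;
    std::memcpy(&copy, &original, sizeof copy);
    return copy.self == &original.value && copy.self != &copy.value;
}

// Copy construction runs user code, so the copy points into itself.
bool copy_constructor_keeps_invariant() {
    SafeNode original(7);
    SafeNode copy(original);
    return copy.self == &copy.value && copy.value == 7;
}
```

Both functions return true. The byte copy is well defined here only because RawNode is trivially copyable; it is the program's own invariant, not the language, that ends up broken.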

Why is it a good idea for (C++) types to be regular?

(This question stems from a more specific question about stream iterators.)
A type is said [Stepanov, McJones] to be Regular if:
it is equality-comparable
it is assignable (from other values of the type)
it is destructible
it is default-constructible (i.e. constructible with no arguments)
it has a (default) total ordering of its values
(and there's some wording about "underlying type" which I didn't quite get.)
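To make the list above concrete, here is a sketch of a small type that meets each requirement, together with generic code that relies on them. Version and behaves_like_int are hypothetical names; the example only illustrates the requirements as listed and does not reproduce the Stepanov and McJones formalism.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical regular type: default-constructible, copyable, assignable,
// destructible, equality-comparable, and totally ordered.
struct Version {
    int major_part;
    int minor_part;

    Version() : major_part(0), minor_part(0) {}
    Version(int major_value, int minor_value)
        : major_part(major_value), minor_part(minor_value) {}
};

bool operator==(const Version& a, const Version& b) {
    return a.major_part == b.major_part && a.minor_part == b.minor_part;
}

bool operator!=(const Version& a, const Version& b) {
    return !(a == b);
}

bool operator<(const Version& a, const Version& b) {
    if (a.major_part != b.major_part) return a.major_part < b.major_part;
    return a.minor_part < b.minor_part;
}

// Exercises the operations that generic code takes for granted.
bool behaves_like_int() {
    std::vector<Version> versions(2);               // default construction
    versions[0] = Version(2, 1);                    // assignment
    versions[1] = Version(1, 4);
    std::sort(versions.begin(), versions.end());    // total ordering
    Version smallest = versions.front();            // copy construction
    bool found = std::find(versions.begin(), versions.end(),
                           Version(2, 1)) != versions.end();  // equality
    return smallest == Version(1, 4) && found;
}
```

Each standard library call in behaves_like_int leans on one of the listed properties, which is the practical sense in which such a type can be used "as the ints" are.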
Some/many people claim that, when designing types - e.g. for the standard library of C++ - it is worthwhile or even important to make an effort to make these types regular, possibly ignoring the total-order requirement. Some mention the maxim:
Do as the ints do.
and indeed, the int type satisfies all these requirements. However, default-constructible types, when constructed, hold some kind of null, invalid or junk value - ints do. A different approach is requiring initialization on construction (and de-initialization on destruction), so that the object's lifetime corresponds to its time of viability.
To contrast these two approaches, one can perhaps think about a T-pointer type, and a T-reference or T-reference-wrapper type. The pointer is basically regular (the ordering assumes a linear address space), and has nullptr which one needs to look out for; the reference is, under the hood, a pointer - but you can't just construct a reference-to-nothing or a junk-reference. Now, pointers may have their place, but we mostly like to work with references (or possibly reference wrappers which are assignable).
So, why should we prefer designing (as library authors) and using regular types? Or at least, why should we prefer them in so many contexts?
I doubt this is the most significant answer, but you could make the argument that "no uninitialized state" / "no junk value" is a principle that doesn't sit well with move-assignment: After you move from your value, and take its resources away - what state does it remain in? It is not very far from default-constructing it (unless the move-assignment is based on some sort of swapping; but then - you could look at move construction).
The rebuttal would be: "Fine, so let's not have such types move-assignable, only move-destructible"; unfortunately, C++ decided sometime in the 2000s to go with non-destructive moves. See also this question and answer by @HowardHinnant:
Why does C++ move semantics leave the source constructed?

Does type aliasing issue exist only when pointers are passed to functions as arguments?

As far as I know, when two pointers (or references) do not type alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations on that basis, e.g., reordering instructions. Therefore, having pointers of different types hold the same address may be problematic. However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine whether they address the same location. Am I right?
"As far as I know, when two pointers (or references) do not type alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations on that basis, e.g., reordering instructions."
Correct. GCC, for example, does perform optimizations of this form which can be disabled by passing the flag -fno-strict-aliasing.
"However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine whether they address the same location. Am I right?"
The standard doesn't distinguish based on where those pointers came from. If your operation has undefined behavior, the program has undefined behavior, period. The compiler is in no way obliged to analyze the operands at compile time, but it may give you a warning.
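As an illustration of this answer, the sketch below shows, in a comment, a type-punned read that is undefined even though both pointers are created in the same function, followed by a well-defined alternative using std::memcpy. The helper name bits_of is hypothetical, and the static_asserts spell out the assumptions about float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Undefined behaviour, even though both pointers are created right here:
//     float f = 1.0f;
//     std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&f);
// The rule concerns the type used for the access, not the pointers' origin.

// Well-defined alternative: copy the object representation.
std::uint32_t bits_of(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn this memcpy into a single register move, so the defined version usually costs nothing extra.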
Implementations which are designed and intended to be suitable for low-level programming should have no particular difficulty recognizing common patterns where storage of one type is reused or reinterpreted as another in situations not involving aliasing, provided that:
Within any particular function or loop, all pointers or lvalues used to access a particular piece of storage are derived from lvalues of a common type which identify the same object or elements of the same array, and
Between the creation of a derived-type pointer and the last use of it or any pointer derived from it, all operations involving the storage are performed only using the derived pointer or other pointers derived from it.
Most low-level programming scenarios requiring reuse or reinterpretation of storage fit these criteria, and handling code that fits these criteria will typically be rather straightforward in an implementation designed for low-level programming. If an implementation caches lvalues in registers and performs loop hoisting, for example, it could support the above semantics reasonably efficiently by flushing all cached values of type T whenever T or T* is used to form a pointer or lvalue of another type. Such an approach may not be optimal, but would degrade performance much less than having to block all type-based optimizations entirely.
Note that it is probably in many cases not worthwhile for even an implementation intended for low-level programming to try to handle all possible scenarios involving aliasing. Doing that would be much more expensive than handling the far more common scenarios that don't involve aliasing.
Implementations which are specialized for other purposes are, of course, not required to make any attempt whatsoever to support any exceptions to 6.5p7--not even those that are often treated as part of the Standard. Whether such an implementation should be able to support such constructs would depend upon the particular purposes for which it is designed.

What kinds of types does qsort not work for in C++?

std::sort swaps elements by using std::swap, which in turn uses the copy constructor and assignment operators, guaranteeing that you get correct semantics when exchanging the values.
qsort swaps elements by simply swapping the elements' underlying bits, ignoring any semantics associated with the types you are swapping.
Even though qsort is ignorant of the semantics of the types you are sorting, it still works remarkably well with non-trivial types. If I'm not mistaken, it will work with all standard containers, despite them not being POD types.
I suppose that the prerequisite for qsort working correctly on a type T is that T is /trivially movable/. Off the top of my head, the only types that are not trivially movable are those that have inner pointers. For example:
    struct NotTriviallyMovable
    {
        NotTriviallyMovable() : m_someElement(&m_array[5]) {}
        int m_array[10];
        int* m_someElement;
    };
If you sorted an array of NotTriviallyMovable then the m_someElements would end up pointing to the wrong elements.
My question is: what other kinds of types do not work with qsort?
Any type that is not a POD type is not usable with qsort(). There might be more types that are usable with qsort() if you consider C++0x, as it changes the definition of POD. If you are going to use non-POD types with qsort(), then you are in the land of undefined behavior, and demons will fly out of your nose.
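One way to keep qsort away from unsuitable types is a compile-time check. The sketch below uses std::is_trivially_copyable as a stand-in for the POD requirement mentioned above; that substitution is an assumption made for this example, and checked_qsort and compare_ints are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <type_traits>

// Comparison callback in the form qsort expects.
int compare_ints(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

// Hypothetical wrapper that refuses to compile for types whose bytes
// cannot simply be shuffled around.
template <typename T>
void checked_qsort(T* data, std::size_t count,
                   int (*compare)(const void*, const void*)) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "qsort moves raw bytes; T must be trivially copyable");
    std::qsort(data, count, sizeof(T), compare);
}

static_assert(std::is_trivially_copyable<int>::value,
              "int can be byte-swapped");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string has non-trivial copy operations");
```

Such a check only rules out undefined behavior. A type like the question's NotTriviallyMovable, whose copy operations are implicit, would still pass it even though its inner pointer goes stale after a byte swap, so the trait cannot replace knowledge of the type's invariants.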
This doesn't work either for types that have pointers to "related" objects. Such pointers have many of the issues associated with "inner" pointers, but it's a lot harder to prove precisely what a "related" object is.
A specific kind of "related" object is an object with backpointers. If objects A and B are bit-swapped, and A and C pointed to each other, then afterwards B will point to C but C will still point to A.
You are completely mistaken. Any non-POD type working with qsort is complete and utter luck. Just because it happens to work for you on your platform with your compiler on a blue moon if you sacrifice the blood of a virgin to the Gods and do a little dance first doesn't mean that it actually works.
Oh, and here's another kind of type that is not trivially movable: types whose instances are externally observed. You move it, but you don't notify the observer, because you never called the swap or copy construction functions.
"If I'm not mistaken, it will work with all standard containers"
The whole question boils down to, in what implementation? Do you want to code to the standard, or do you want to code to implementation details of the compiler you have in front of you today? If the latter, then if all your tests pass I guess it works.
If you're asking about the C++ programming language, then qsort is required to work only for POD types. If you're asking about a specific implementation, which one? If you're asking about all implementations, then you've sort of missed your chance, since the best place for that kind of straw poll was C++0x working group meetings, since they gathered together representatives of pretty much every organization with an actively-maintained C++ implementation.
For what it's worth, I can pretty easily imagine an implementation of std::list in which a list node is embedded in the list object itself, and used as a head/tail sentinel. I don't know what implementations (if any) actually do that, since it's also common to use a null pointer as a head/tail sentinel, but certainly there are some advantages to implementing a doubly-linked list with a dummy node at each end. An instance of such a std::list would of course not be trivially movable, since the nodes for its first and last elements would no longer point to the sentinel. Its swap implementation and (in C++0x) its move constructor would account for this by updating those first and last nodes.
There is nothing to stop your compiler switching to this implementation of std::list in its next release, although that would break binary compatibility so given how most compilers are managed it would have to be a major release.
Similarly, the map/set/multimap/multiset quartet could have nodes that point to their parents. Debugging iterators for any container might conceivably contain a pointer to the container. To do what you want, you'd have to (at least) rule out the existence of any pointer into the container in any part of its implementation, and a sweeping statement like "no implementation uses any of these tricks" is pretty unwise. The whole point of having a standard is to make statements about all conforming implementations, so if you haven't deduced your conclusion from the standard, then even if your statement is true today it could become untrue tomorrow.