Consider this union:
union A{
int a;
struct{
int b;
} c;
};
c and a are not layout-compatible types, so it is not possible to read the value of b through a:
A x;
x.c.b=10;
x.a+x.a; //undefined behaviour (UB)
Trial 1
For the case below, I think that since C++17 I also get undefined behavior:
A x;
x.a=10;
auto p = &x.a; //(1)
x.c.b=12; //(2)
*p+*p; //(3) UB
Let's consider [basic.type]/3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value ([conv.ptr]) for that type, or
an invalid pointer value.
Let's call these four categories of pointer values the pointer value genre.
The value of a pointer may transition from one of the above genres to another, but the standard is not really explicit about that. Feel free to correct me if I am wrong. So I suppose that at (1) the value of p is a pointer to an object. Then in (2) a's lifetime ends and the value of p becomes an invalid pointer value. So in (3) I get UB because I try to access the value of an object (a) outside of its lifetime.
Trial 2
Now consider this weird code:
A x;
x.a=10;
auto p = &x.a; //(1)
x.c.b=12; //(2)
p = reinterpret_cast<int*>(p); //(2')
*p+*p; //(3) UB?
Could the reinterpret_cast<int*>(p) change the pointer value genre from an invalid pointer value to a pointer to an object?
reinterpret_cast<int*>(p) is defined to be equivalent to static_cast<int*>(static_cast<void*>(p)), so let's consider how the static_cast from void* to int* is defined, [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
So in our case the original pointer pointed to the object a. So I suppose the reinterpret_cast will not help because a is not within its lifetime. Is my reading too strict? Could this code be well defined?
Then in (2) a's lifetime ends and the value of p becomes an invalid pointer value.
Incorrect. Pointers only become invalid when they point into memory that has ended its storage duration.
The pointer in this case becomes a pointer to an object outside of its lifetime. The object it points to is gone, but the pointer is not "invalid" in the way the specification means it. [basic.life] spends quite a bit of time explaining what you can and cannot do to pointers to objects outside of their lifetime.
reinterpret_cast cannot turn a pointer to an object outside of its lifetime into a pointer to a different object that is within its lifetime.
The notion of objects in the standard is rather abstract and differs somewhat from intuition. An object may be within its lifetime or not, and objects not within their lifetimes can have the same address, this is why unions work at all: the definition of active member is "the member that is within its lifetime".
A pointer to an object not within its lifetime is still a pointer to an object. reinterpret_cast only changes the type of the pointer, not its validity. The UB you get from casting to non-pointer-interconvertible types is due to the strict-aliasing rule, not to the validity of the pointer.
In all your trials, including your follow-up question, you are using an object not within its lifetime in ways that aren't allowed, i.e., accessing it, and the behavior is consequently undefined.
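A minimal sketch of the pattern that stays within these rules, reusing the union A from the question: each member is read only while it is the active member, and a fresh pointer is taken after the active member changes. The function name sum_of_active_reads is made up for illustration.

```cpp
#include <cassert>

union A {
    int a;
    struct {
        int b;
    } c;
};

// Reads each member only while it is the active member of the union.
int sum_of_active_reads() {
    A x;
    x.a = 10;             // x.a is now the active member
    int* p = &x.a;        // p points to x.a, which is within its lifetime
    int first = *p + *p;  // fine: x.a is still active here

    x.c.b = 12;           // x.c becomes active; the lifetime of x.a ends
    // p still holds the same address, but *p would now access x.a
    // outside its lifetime, so p is not dereferenced again.
    int* q = &x.c.b;      // fresh pointer to the newly active member
    assert(*q == 12);
    return first + *q;
}
```

The point of the sketch is only that the old pointer p is abandoned rather than "repaired" with a cast.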
Every version to date of the C and C++ Standards has been ambiguous or contradictory with regard to what can be done with addresses of union members. The authors of the C Standard didn't want to require that compilers make pessimistic allowances for the possibility that functions might be invoked by constructs like:
someFunction(&myUnion.member1, &myUnion.member2);
in cases where the function would cause the value of one member of myUnion to be changed between accesses made via the other. While the ability to take union members' addresses would have been pretty useless if code couldn't do things like:
someFunction1(&myUnion.member1);
someFunction2(&myUnion.member2);
someFunction3(&myUnion.member1);
the authors of the Standard expected that quality implementations intended for various purposes would process constructs that invoke Undefined Behavior "in a documented fashion characteristic of the environment" when doing so would best serve those purposes, and thus thought that making support for such constructs a quality-of-implementation issue would be simpler than trying to formulate precise rules for which patterns must be supported. A compiler that generated code for the called functions in the second example without knowing their calling context wouldn't be able to interleave accesses performed by the two functions, and a quality compiler that expanded them inline while processing the above code would have no trouble noticing when each pointer was derived from myUnion.
The authors of the C89 Standard didn't think it necessary to define precise rules for how pointers to union members behave, because they thought compiler writers' desire to produce quality implementations would drive them to handle appropriate cases sensibly even without such rules. Unfortunately, some compiler writers were too lazy to handle cases like the second example above, and rather than recognizing that there was never any reason for quality compilers to be incapable of handling such cases, the authors of later C and C++ Standards have bent over backward to come up with weirdly contorted, ambiguous, and contradictory rules that justify such compiler behavior.
As a result, the address-of operator should only be regarded as meaningfully applicable to union members in cases where the resulting pointer will be used for accessing individual bytes of storage, either using character-types directly, or passing to functions like memcpy that are defined in such fashion. Unless or until there's a major revamp of the Standard, or an appendix that describes means by which implementations can offer optional guarantees beyond what the Standard requires, it would be best to pretend that union members are--like bitfields--lvalues that don't have addresses.
Related
First I need to note that I have asked a similar question several times, and last time I have got an almost satisfactory answer.
However, the wording of the C++20 standard (draft) is not clear to me, and I am curious whether it is a defect in the standard or in my comprehension of the standard.
C++20 standard introduced the notion of implicit-lifetime types.
Scalar types, implicit-lifetime class types ([class.prop]), array types, and cv-qualified versions of these types are collectively called implicit-lifetime types.
Further the standard (draft) says https://eel.is/c++draft/basic.memobj#intro.object-10
Some operations are described as implicitly creating objects within a
specified region of storage. For each operation that is specified as
implicitly creating objects, that operation implicitly creates and
starts the lifetime of zero or more objects of implicit-lifetime types
([basic.types]) in its specified region of storage if doing so would
result in the program having defined behavior. If no such set of
objects would give the program defined behavior, the behavior of the
program is undefined. If multiple such sets of objects would give the
program defined behavior, it is unspecified which such set of objects
is created.
[Note 3: Such operations do not start the lifetimes of subobjects of
such objects that are not themselves of implicit-lifetime types.— end
note]
and https://eel.is/c++draft/basic.memobj#intro.object-11.
Further, after implicitly creating objects within a specified region
of storage, some operations are described as producing a pointer to a
suitable created object. These operations select one of the
implicitly-created objects whose address is the address of the start
of the region of storage, and produce a pointer value that points to
that object, if that value would result in the program having defined
behavior.
If no such pointer value would give the program defined behavior, the
behavior of the program is undefined.
If multiple such pointer values would give the program defined
behavior, it is unspecified which such pointer value is produced.
The paper P0593R6, "Implicit creation of objects for low-level object manipulation", provides motivation and more examples for these.
One of the motivations for the introduction of implicit-lifetime objects into the standard was dynamic construction of arrays, in order to make the pointer arithmetic in the example non-UB.
One of the proposed changes to the standard 5.9. 16.5.3.5 Table 34: Cpp17Allocator requirements, (which is the same as Example 1 in allocator requirements ) gives the following example.
Example 1: When reusing storage denoted by some pointer value p,
launder(reinterpret_cast<T*>(new (p)byte[n * sizeof(T)])) can be used to implicitly create a suitable
array object and obtain a pointer to it.
Which basically implies that the idea is to make the syntax (T*)malloc(...) and (T*)(::operator new(... ) ) (and similar cases) valid for subsequent pointer arithmetic.
I also understand the idea of "superposition" - that the suitable created object to which we return the pointer is determined on the usage, as long as an object that allows to use that pointer "legally" exists, we are fine.
Hence, I believe that the following snippets are fine
int* numbers = (int*)::operator new(n * sizeof(int) );
// Fine. Operator new implicitly creates an int array of a size smaller than `n` that is determined later
// and returns a pointer to the first element Both byte and byte array are implicitly constructed, so no
// problem here.
// the pointer to suitable created object points at numbers[0] which is implicitly constructed, so no
// problem here
A slightly modified example from the cited paper:
alignas(int) alignas(float) char buffer[std::max(sizeof(int), sizeof(float))];
*(int*)buffer = 5;
// First usage implicitly created an int and we have a pointer to that int, (int*)buffer is the pointer
// to suitable created object.
new (buffer) std::byte[sizeof(buffer)];
// ends the lifetime of the int and implicitly creates a float instead of
// it since next usage (float*) buffer implies implicit float creation
*(float*)buffer = 2.0f; // (float*)buffer is the pointer to the suitable created object (float) that was
// implicitly created by previous statement
However, note that we did not have a problem yet: all the types are implicit-lifetime types, so returning a pointer to the first element of the array is fine, since that element has been implicitly created as well.
But what do we do in order to create an array of type T if T is not an implicit-lifetime type?
Well, that should not be a problem.
auto memory = ::operator new(sizeof(T) * n, std::align_val_t{alignof(T)});
// implicitly creates a T array of size n.
auto arr_ptr = reinterpret_cast<T(*)[n]>(memory); // returns a pointer to that array
auto arr = *arr_ptr; // dereferences the pointer to our implicitly constructed array, and the result
// should be valid
However, there is a problem with this construct.
First of all - one of the motivations in the proposal paper was to make the usage of casting to T* valid, not to T(*)[] or T(*)[n].
The examples in the standard and the proposal paper suggest that cast to T* is a valid usage.
Let's look at the following snippet.
T* arr = (T*)::operator new(sizeof(T) * n, std::align_val_t{alignof(T)});
Now we have a problem. arr points at an object that is not constructed. No matter how we look at it, no matter what the size of the implicitly constructed T array is, its first element is not an implicit-lifetime object and therefore a pointer value that points at arr[0] is not a pointer to one of the implicitly-created objects whose address is the address of the start of the region of storage, hence the requirement for a pointer to suitable created object is not satisfied. I believe that same reasoning makes the example in allocator.requirements invalid. More than that - if n in example is replaced with 1, we basically produce a pointer to T object of non implicit-lifetime type (it is not even array, but just a pointer to T that has not yet been constructed) which is just plain wrong.
So where is the problem? Is the example indeed invalid? But then the example in the proposal paper is still invalid (UB), and making it valid (not UB) was one of the motivations for the introduction of implicit-lifetime types into the standard.
So, how is this conflict resolved? Is there an "error" in the wording of the standard, so that a pointer to a suitable created object does not actually have to point to an object, and any valid pointer value that results in defined behavior is fine (a pointer to an array element that has not yet begun its lifetime is a valid value for pointer arithmetic as far as I am aware, and pointer arithmetic is what we want here, after all)? Or is a pointer to the first element, before the beginning of its lifetime, a suitable created object (despite the fact that it does not have an implicit-lifetime type) for reasons I do not understand? Or are the examples in the motivation and the allocator.requirements snippet wrong, so that despite being one of the main motivations for the introduction of implicit-lifetime types into the standard, they are still UB? Something else?
Disclaimer: Language-lawyer type of question.
The explanation in the answer to the cited question is fine in practice. Besides that, we have the standard library allocator facilities that allow creation of uninitialized arrays, which should cover most cases. After all, casting to (T*) (along with some laundering, maybe) works in practice, and the intention of the proposal paper is to make it work. I believe that is the intention of the standard authors as well. The question is how the standard resolves this contradiction.
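For reference, the allocator route mentioned above can be sketched as follows. This is only an illustration of the practical workaround, not an answer to the wording question: std::allocator<T>::allocate returns storage for n objects of type T without constructing them, and the elements are then constructed and destroyed one by one through std::allocator_traits. The function name total_length_of_copies and the choice of std::string as a non-implicit-lifetime T are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Builds n copies of `value` in allocator-provided storage, sums their
// lengths, then destroys the elements and releases the storage.
std::size_t total_length_of_copies(const std::string& value, std::size_t n) {
    assert(n > 0);  // keeps the sketch simple
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;
    Alloc alloc;
    std::string* arr = Traits::allocate(alloc, n);  // storage only, no elements yet

    std::size_t constructed = 0;
    std::size_t total = 0;
    try {
        for (; constructed < n; ++constructed)
            Traits::construct(alloc, arr + constructed, value);
        for (std::size_t i = 0; i < n; ++i)
            total += arr[i].size();
    } catch (...) {
        // Undo whatever was constructed before rethrowing.
        for (std::size_t i = 0; i < constructed; ++i)
            Traits::destroy(alloc, arr + i);
        Traits::deallocate(alloc, arr, n);
        throw;
    }
    for (std::size_t i = 0; i < n; ++i)
        Traits::destroy(alloc, arr + i);
    Traits::deallocate(alloc, arr, n);
    return total;
}
```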
PS:
I do not mind if the answer to this question is merged with the cited one (as long as my question here gets answered), but I do believe that these questions are different - that one asked how to create an array without initializing its elements and this one asks about something that is (I believe so at least) an ambiguity (or contradiction) in the c++20 standard (well draft of it) and how to resolve it.
My understanding is that strict aliasing in C++ is defined in basic.lval 11:
(11) If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(11.1) the dynamic type of the object,
(11.2) a cv-qualified version of the dynamic type of the object,
(11.3) a type similar (as defined in conv.qual) to the dynamic type of the object,
(11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(11.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
(11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(11.8) a char, unsigned char, or std::byte type.
By my reading, per 11.8, this is always legal, since the program accesses the stored value of x through a glvalue of type unsigned char:
int x = 0xdeadbeef;
auto y = reinterpret_cast<unsigned char*>(&x);
std::cout << y[1];
I am curious about using a pointer that aliases to an array of unsigned char:
alignas(int) unsigned char x[4];
auto y = reinterpret_cast<int*>(x);
*y = 0xdeadbeef;
Is this a violation of strict aliasing? My reading is that it isn't, however I was just told on another thread that it is. Per basic.lval only, it seems to me that there is no UB, since the program does not attempt to access the stored value: it stores a new one without reading it, and so long as subsequent reads use x, then no violation occurs.
About the definition of "access":
http://eel.is/c++draft/defns.access
3.1 access [defns.access]
⟨execution-time action⟩ read or modify the value of an object
In other words, storing a value is also an "access". It is still UB.
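One way to use an aligned unsigned char array as storage without relying on the cast alone is to create the object explicitly with placement new and then use the pointer that the new-expression returns. The sketch below assumes std::uint32_t in place of int so that the 0xdeadbeef constant fits without conversion issues; the function name store_and_load is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Creates a std::uint32_t inside an unsigned char array and accesses it
// through the pointer returned by placement new.
std::uint32_t store_and_load() {
    alignas(std::uint32_t) unsigned char storage[sizeof(std::uint32_t)];
    std::uint32_t* y = new (storage) std::uint32_t(0xdeadbeefu);
    // The new object lives at the start of the array.
    assert(static_cast<void*>(y) == static_cast<void*>(storage));
    *y += 1u;   // read and write through a pointer to the actual object
    return *y;
}
```

Here the access goes through a glvalue whose type matches the dynamic type of the object that was created, so the list in [basic.lval] is satisfied by its first bullet.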
There are many constructs which invoke UB, but which quality compilers should process correctly anyway. The use of character-typed storage to hold other types is among them. A requirement that a constructor for a char[] yield a pointer to aligned storage wouldn't make sense otherwise.
The authors of C89 did not think it necessary to fully describe every situation where a quality implementation suitable for any particular purpose would need to behave predictably. The Rationale recognizes that implementations may be conforming while being of such low quality as to be essentially useless, and suggests that there was no perceived need to forbid implementations from behaving in ways that would impair their usefulness. Every subsequent C or C++ Standard has inherited parts of C89 which were never intended to be fully complete, and none of them have fully completed those parts.
The Standard makes no distinction between
actions which invoke UB but even the most obtuse compiler writer would recognize that they should behave predictably (e.g. struct foo {int x;} s; s.x=1;);
actions which quality compilers suitable for various purposes should process predictably, but which low-quality compilers or high-quality compilers that are suitable only for other purposes, might not;
actions which some compilers may handle predictably, but where such treatment should not be generally expected from any other compilers--even those targeting the same purposes (platforms, application fields, etc.).
Declaring a char[] with a particular alignment, using the named array once to capture its address (and never using the named array again), and employing it as raw storage that can hold other types, should fall into the first category above (especially since--as noted above--alignment guarantees wouldn't serve much purpose otherwise). A compiler may not recognize any pointers' relationship to the original array, and might thus not realize that actions on such pointers could interact with a char[](*), but if the array is never again used as a char[] the compiler would have no reason to care.
(*) For example, given
char foo[10];
int test(int *p)
{
if (foo[1])
*p = 1;
return foo[1];
}
an implementation might cache and reuse the first value read from foo[1], not recognizing that a write to *p might alter the underlying storage. If the named lvalue foo is never used after the first time its address is taken, however, it wouldn't matter what assumptions the compiler might make about whether it would be safe to cache reads of lvalue foo, because there wouldn't be any.
I know that int* ptr = (int*)buffer (where buffer is char*) breaks
strict-aliasing rule.
Does this syntax int& ref = (int&)(*buffer) also break the rule?
I had some SEGFAULTs due to violation of the strict aliasing rule, and this syntax has eliminated them. It is probably still incorrect, though, isn't it?
This is not ok (assuming you're going to use said reference to access the value). § 3.10 [basic.lval] ¶ 10 of the C++14 standard (quoting N4140) says (emphasis mine):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate
or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
It doesn't matter whether you attempt to access via a pointer or a reference. For a stored object of type char, none of the bullet points permits accessing it as an int.
The last bullet point only says that you may alias any other type as char but not vice versa. It makes sense because a char is the smallest addressable unit with the weakest alignment requirements.
If you want, using a pointer is the same as using a reference except that you need to dereference explicitly in order to access the value.
Strict aliasing rules mean that you should not dereference pointers of different types pointing to the same memory location.
Since in your posted code you never dereference, it's not possible to tell if this violates the rule without seeing all the code.
Also, aliasing to the char* type is an exception and does not violate the rule. Which means you can access a memory location containing any type by converting its pointer to char*, and dereferencing it.
To conclude:
If buffer points to a memory location which contains an int, and was converted from int* to char*, this is valid. However, you should use reinterpret_cast for this.
If buffer points to a memory location which contains chars, dereferencing the int* ptr does violate the rule.
The reference version is likely to suffer from the same problem. But the compiler has no obligation to prevent this or warn you about it.
Don't use C style casts, use reinterpret_cast instead, and read the standard about which uses have defined behavior.
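The permitted direction (inspecting an object of any type through unsigned char) can be sketched like this. The helper names object_bytes and byte_sum are made up, and the byte sum is used only because it does not depend on the platform's byte order; the sketch assumes 8-bit bytes.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

// Copies out the object representation of `value` byte by byte.
// Reading through unsigned char is the case the last bullet allows.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::array<unsigned char, sizeof(T)> out{};
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out[i] = p[i];
    return out;
}

// Sum of the bytes of a 32-bit value; independent of endianness.
unsigned byte_sum(std::uint32_t value) {
    unsigned sum = 0;
    for (unsigned char b : object_bytes(value))
        sum += b;
    return sum;
}
```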
Yes, it does.
Neither C nor C++ special case accesses via pointers vs. other accesses, the strict aliasing rules apply regardless of whether you use a pointer, a reference, or any other lvalue.
If you run into trouble, the easiest solution is to use memcpy to copy the memory location into a local variable. Any self-respecting compiler will completely optimise this memcpy away and only treat it as an aliasing hint (memcpy is also preferable to unions, because the union method is not as portable).
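A minimal sketch of the memcpy approach, assuming the buffer holds the bytes of an int in the machine's native representation. The names load_int, store_int and round_trip are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cstring>

// Reads an int out of a char buffer without forming an int lvalue
// that refers to the buffer.
int load_int(const char* buffer) {
    int value;
    std::memcpy(&value, buffer, sizeof value);
    return value;
}

// Writes the bytes of an int into a char buffer.
void store_int(char* buffer, int value) {
    std::memcpy(buffer, &value, sizeof value);
}

// Stores and reloads a value at a deliberately odd offset to show that
// this approach does not depend on the buffer's alignment either.
int round_trip(int value) {
    char buffer[sizeof(int) + 1] = {};
    store_int(buffer + 1, value);
    return load_int(buffer + 1);
}
```

Because the only accesses to the buffer are byte copies, neither the strict aliasing rule nor the alignment of the buffer comes into play.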
I'm in the middle of a discussion trying to figure out whether unaligned access is allowable in C++ through reinterpret_cast. I think not, but I'm having trouble finding the right part(s) of the standard which confirm or refute that. I have been looking at C++11, but I would be okay with another version if it is more clear.
Unaligned access is undefined in C11. The relevant part of the C11 standard (§ 6.3.2.3, paragraph 7):
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Since the behavior of an unaligned access is undefined, some compilers (at least GCC) take that to mean that it is okay to generate instructions which require aligned data. Most of the time the code still works for unaligned data because most x86 and ARM instructions these days work with unaligned data, but some don't. In particular, some vector instructions don't, which means that as the compiler gets better at generating optimized instructions code which worked with older versions of the compiler may not work with newer versions. And, of course, some architectures (like MIPS) don't do as well with unaligned data.
C++11 is, of course, more complicated. § 5.2.10, paragraph 7 says:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.
Note that the last word is "unspecified", not "undefined". § 1.3.25 defines "unspecified behavior" as:
behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note]
Unless I'm missing something, the standard doesn't actually delineate the range of possible behaviors in this case, which seems to indicate to me that one very reasonable behavior is that which is implemented for C (at least by GCC): not supporting them. That would mean the compiler is free to assume unaligned accesses do not occur and emit instructions which may not work with unaligned memory, just like it does for C.
The person I'm discussing this with, however, has a different interpretation. They cite § 1.9, paragraph 5:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
Since there is no undefined behavior, they argue that the C++ compiler has no right to assume unaligned accesses don't occur.
So, are unaligned accesses through reinterpret_cast safe in C++? Where in the specification (any version) does it say?
Edit: By "access", I mean actually loading and storing. Something like
void unaligned_cp(void* a, void* b) {
*reinterpret_cast<volatile uint32_t*>(a) =
*reinterpret_cast<volatile uint32_t*>(b);
}
How the memory is allocated is actually outside my scope (it is for a library which can be called with data from anywhere), but malloc and an array on the stack are both likely candidates. I don't want to place any restrictions on how the memory is allocated.
Edit 2: Please cite sources (i.e., the C++ standard, section and paragraph) in answers.
Looking at 3.11/1:
Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated.
There's some debate in comments about exactly what constitutes allocating an object of a type. However I believe the following argument works regardless of how that discussion is resolved:
Take *reinterpret_cast<uint32_t*>(a) for example. If this expression does not cause UB, then (according to the strict aliasing rule) there must be an object of type uint32_t (or int32_t) at the given location after this statement. Whether the object was already there, or this write created it, does not matter.
According to the above Standard quote, objects with alignment requirement can only exist in a correctly aligned state.
Therefore any attempt to create or write an object that is not correctly aligned causes UB.
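Given that conclusion, a copy routine that must accept arbitrary addresses can avoid creating a misaligned uint32_t altogether by going through memcpy. The sketch below is a hedged rewrite of the question's unaligned_cp, with two stated differences: it drops the volatile qualifier of the original, and it adds a hypothetical is_aligned_for helper whose pointer-to-integer mapping is implementation-defined (it assumes the usual flat-address mapping). copy_through_misaligned_slots exists only to exercise the routine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when p is suitably aligned for T, assuming the conventional
// (implementation-defined) mapping from pointers to integers.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Copies four bytes from src to dst without ever forming a
// std::uint32_t lvalue at a possibly misaligned address.
void unaligned_cp(void* dst, const void* src) {
    std::uint32_t tmp;
    std::memcpy(&tmp, src, sizeof tmp);
    std::memcpy(dst, &tmp, sizeof tmp);
}

// Pushes a value through two slots that start one byte past an
// aligned address and returns what comes out the other side.
std::uint32_t copy_through_misaligned_slots(std::uint32_t value) {
    alignas(std::uint32_t) unsigned char src[sizeof value + 1] = {};
    alignas(std::uint32_t) unsigned char dst[sizeof value + 1] = {};
    assert(is_aligned_for<std::uint32_t>(src));
    std::memcpy(src + 1, &value, sizeof value);
    unaligned_cp(dst + 1, src + 1);
    std::uint32_t out;
    std::memcpy(&out, dst + 1, sizeof out);
    return out;
}
```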
EDIT This answers the OP's original question, which was "is accessing a misaligned pointer safe". The OP has since edited their question to "is dereferencing a misaligned pointer safe", a far more practical and less interesting question.
The round-trip cast result of the pointer value is unspecified under those circumstances. Under certain limited circumstances (involving alignment), converting a pointer to A to a pointer to B, and then back again, results in the original pointer, even if you didn't have a B in that location.
If the alignment requirements are not met, then that round trip (pointer-to-A to pointer-to-B to pointer-to-A) results in a pointer with an unspecified value.
As there are invalid pointer values, dereferencing a pointer with an unspecified value can result in undefined behavior. It is no different than *(int*)0xDEADBEEF in a sense.
Simply storing that pointer is not, however, undefined behavior.
None of the above C++ quotes talk about actually using a pointer-to-A as a pointer-to-B. Using a pointer to the "wrong type" in all but a very limited number of circumstances is undefined behavior, period.
An example of this involves creating a std::aligned_storage_t<sizeof(T), alignof(T)>. You can construct your T in that spot, and it will live there happily, even though it "actually" is an aligned_storage_t<sizeof(T), alignof(T)>. (You may, however, have to use the pointer returned from the placement new for full standard compliance; I am uncertain. See strict aliasing.)
Sadly, the standard is a bit lacking in terms of what object lifetime is. It refers to it, but does not define it well enough last I checked. You can only use a T at a particular location while a T lives there, but what that means is not made clear in all circumstances.
All of your quotes are about the pointer value, not the act of dereferencing.
5.2.10, paragraph 7 says that, assuming int has a stricter alignment than char, then the round trip of char* to int* to char* generates an unspecified value for the resulting char*.
On the other hand, if you convert int* to char* to int*, you are guaranteed to get back the exact same pointer as you started with.
It doesn't talk about what you get when you dereference said pointer. It simply states that in one case, you must be able to round trip. It washes its hands of the other way around.
Suppose you have some ints, and alignof(int) > 1:
int some_ints[3] ={0};
then you have an int pointer that is offset:
int* some_ptr = (int*)(((char*)&some_ints[0])+1);
We'll presume that copying this misaligned pointer doesn't cause undefined behavior for now.
The value of some_ptr is not specified by the standard. We'll be generous and presume it actually points to some chunk of bytes within some_ints.
Now we have an int* that points to somewhere an int cannot be allocated (3.11/1). Under (3.8) the use of a pointer to an int is restricted in a number of ways. Usual use is restricted to a pointer to a T that has been allocated properly and whose lifetime has begun (/3). Some limited use is permitted on a pointer to a T which has been allocated properly, but whose lifetime has not begun (/5 and /6).
There is no way to create an int object that does not obey the alignment restrictions of int in the standard.
So the theoretical int* which claims to point to a misaligned int does not point to an int. No restrictions are placed on the behavior of said pointer when dereferenced; usual dereferencing rules provide behavior of a valid pointer to an object (including an int) and how it behaves.
And now our other assumptions. No restrictions on the value of some_ptr here are made by the standard: int* some_ptr = (int*)(((char*)&some_ints[0])+1);.
It is not a pointer to an int, much like (int*)nullptr is not a pointer to an int. Round tripping it back to a char* results in a pointer with unspecified value (it could be 0xbaadf00d or nullptr) explicitly in the standard.
The standard defines what you must do. There are (nearly? I guess evaluating it in a boolean context must return a bool) no requirements placed on the behavior of some_ptr by the standard, other than converting it back to char* results in an unspecified value (of the pointer).
Is it legal to compare dangling pointers?
int *p, *q;
{
int a;
p = &a;
}
{
int b;
q = &b;
}
std::cout << (p == q) << '\n';
Note how both p and q point to objects that have already vanished. Is this legal?
Introduction: The first issue is whether it is legal to use the value of p at all.
After a has been destroyed, p acquires what is known as an invalid pointer value. Quote from N4430 (for discussion of N4430's status see the "Note" below):
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of the deallocated storage become invalid pointer values.
The behaviour when an invalid pointer value is used is also covered in the same section of N4430 (and almost identical text appears in C++14 [basic.stc.dynamic.deallocation]/4):
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
[ Footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault. — end footnote ]
So you will need to consult your implementation's documentation to find out what should happen here (since C++14).
The term "use" in the above quotes means anything that necessitates an lvalue-to-rvalue conversion, as in C++14 [conv.lval]/2:
When an lvalue-to-rvalue conversion is applied to an expression e, and [...] the object to which the glvalue refers contains an invalid pointer value, the behaviour is implementation-defined.
History: In C++11 this said undefined rather than implementation-defined; it was changed by DR1438. See the edit history of this post for the full quotes.
Application to p == q: Supposing we have accepted in C++14+N4430 that the result of evaluating p and q is implementation-defined, and that the implementation does not define that a hardware trap occurs; [expr.eq]/2 says:
Two pointers compare equal if they are both null, both point to the same function, or both represent the same address (3.9.2), otherwise they compare unequal.
Since it's implementation-defined what values are obtained when p and q are evaluated, we can't say for sure what will happen here. But it must be either implementation-defined or unspecified.
g++ appears to exhibit unspecified behaviour in this case; depending on the -O switch I was able to have it say either 1 or 0, corresponding to whether or not the same memory address was re-used for b after a had been destroyed.
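If all you want is to observe whether the storage was reused, one way to avoid evaluating an invalid pointer value is to capture each address as an integer while the object is still alive and compare the integers afterwards. This is a minimal sketch, not a statement of what any particular compiler does; address_bits and locals_shared_address are hypothetical names, and std::uintptr_t is an optional type:

```cpp
#include <cstdint>

// Hypothetical helper: converts a still-valid pointer to an integer.
// The mapping is implementation-defined.
std::uintptr_t address_bits(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Returns true if the two block-scope locals happened to occupy the same
// address. Only integers outlive the blocks, so no invalid pointer value
// is ever read.
bool locals_shared_address() {
    std::uintptr_t pa = 0;
    std::uintptr_t qb = 0;
    {
        int a = 0;
        pa = address_bits(&a);
    }
    {
        int b = 0;
        qb = address_bits(&b);
    }
    return pa == qb;
}
```

The result can still be either true or false depending on the compiler and optimization level, but the comparison itself is an ordinary integer comparison.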
Note about N4430: This is a proposed defect resolution to C++14, that hasn't been accepted yet. It cleans up a lot of wording surrounding object lifetime, invalid pointers, subobjects, unions, and array bounds access.
In the C++14 text, it is defined under [basic.stc.dynamic.deallocation]/4 and subsequent paragraphs that an invalid pointer value arises when delete is used. However it's not clearly stated whether or not the same principle applies to static or automatic storage.
There is a definition of "valid pointer" in [basic.compound]/3 but it is too vague to use sensibly. The [basic.life]/5 (footnote) refers to the same text to define the behaviour of pointers to objects of static storage duration, which suggests that it was meant to apply to all types of storage.
In N4430 the text is moved from that section up one level so that it does clearly apply to all storage durations. There is a note attached:
Drafting note: this should apply to all storage durations that can end, not just to dynamic storage duration. On an implementation supporting threads or segmented stacks, thread and automatic storage may behave in the same way that dynamic storage does.
My opinion: I don't see any consistent way to interpret the standard (pre-N4430) other than to say that p acquires an invalid pointer value. The behaviour doesn't seem to be covered by any other section besides what we have already looked at. So I am happy to treat the N4430 wording as representing the intent of the standard in this case.
Historically, there have been some systems where using a pointer as an rvalue might cause the system to fetch some information identified by some bits in that pointer. For example, if a pointer could contain the address of an object's header along with an offset into the object, fetching a pointer could cause the system to also fetch some information from that header. If the object has ceased to exist, the attempt to fetch information from its header could fail with arbitrary consequences.
That having been said, in the vast majority of C implementations, all pointers that were alive at some particular moment in time will forever hold the same relationships with regard to the relational and subtraction operators as they had at that particular time. Indeed, in most implementations if one has char *p, one may determine whether it identifies part of an object identified by char *base; size_t size; by checking whether (size_t)(p-base) < size; such comparison will work even retrospectively if there is any overlap in the objects' lifetime.
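In C++ the nearest standard tool for this kind of range check is std::less, which is specified to give a strict total order over pointers even when the built-in < would not be defined for them. A minimal sketch under that assumption; points_into is a hypothetical name, and note that the standard does not promise that an unrelated object can never be ordered between base and base + size, although that holds on typical flat-address-space implementations:

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: reports whether p lies in [base, base + size).
// Both p and base must hold valid (not invalid) pointer values when called.
bool points_into(const char* p, const char* base, std::size_t size) {
    std::less<const char*> before;  // total order over pointers
    return !before(p, base) && before(p, base + size);
}
```

This does not help with pointers whose storage has already ended; it only avoids the relational-operator restrictions for pointers into different live objects.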
Unfortunately, the Standard defines no means by which code can indicate that it requires any of the latter guarantees, nor is there a standard means by which code can ask whether a particular implementation can promise any of the latter behaviors and refuse compilation if it does not. Further, some hyper-modern implementations will regard any use of relational or subtraction operators on two pointers as a promise by the programmer that the pointers in question will always identify the same live object, and omit any code which would only be relevant if that assumption didn't hold. Consequently, even though many hardware platforms would be able to offer guarantees that would be useful to many algorithms, there's no safe way by which code can exploit any such guarantees even if code will never need to run on hardware which does not naturally provide them.
The pointers contain the addresses of the variables they reference. The addresses are valid even when the variables that used to be stored there are released / destroyed / unavailable.
As long as you don't try to use the values at those addresses you are safe; dereferencing the pointers (*p and *q) would be undefined behaviour.
Obviously the result is implementation-defined, therefore this code example can be used to study the features of your compiler if you don't want to dig into the assembly code.
Whether this is a meaningful practice is a totally different discussion.