Is converting an integer to a pointer always well defined?

Is this valid C++?
int main() {
    int *p;
    p = reinterpret_cast<int*>(42);
}
Assuming I never dereference p.
Looking up the C++ standard, we have
C++17 §6.9.2/3 [basic.compound]
3 Every value of pointer type is one of the following:
— a pointer to an object or function (the pointer is said to point to the object or function), or
— a pointer past the end of an object ([expr.add]), or
— the null pointer value ([conv.ptr]) for that type, or
— an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an
object represents the address of the first byte in memory
([intro.memory]) occupied by the object or the first byte in memory
after the end of the storage occupied by the object, respectively. [
Note: A pointer past the end of an object ([expr.add]) is not
considered to point to an unrelated object of the object's type that
might be located at that address. A pointer value becomes invalid when
the storage it denotes reaches the end of its storage duration; see
[basic.stc]. — end note ] For purposes of pointer arithmetic
([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past
the end of the last element of an array x of n elements is considered
to be equivalent to a pointer to a hypothetical array element n of x
and an object of type T that is not an array element is considered to
belong to an array with one element of type T.
p = reinterpret_cast<int*>(42); does not fit into the list of possible values. And:
C++17 §8.2.10/5 [expr.reinterpret.cast]
A value of integral type or enumeration type can be explicitly
converted to a pointer. A pointer converted to an integer of
sufficient size (if any such exists on the implementation) and back to
the same pointer type will have its original value; mappings between
pointers and integers are otherwise implementation-defined. [ Note:
Except as described in 6.7.4.3, the result of such a conversion will
not be a safely-derived pointer value. — end note ]
The C++ standard does not seem to say more about the integer-to-pointer conversion. Looking at the C17 standard:
C17 §6.3.2.3/5 (emphasis mine)
An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not
be correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.68)
and
C17 §6.2.6.1/5
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined.50) Such a representation is called a trap
representation.
To me, it seems like any value that does not fit into the list in [basic.compound] is a trap representation, thus p = reinterpret_cast<int*>(42); is UB. Am I correct? Is there something else making p = reinterpret_cast<int*>(42); undefined?

This is not UB, but implementation-defined, and you already cited why (§8.2.10/5 [expr.reinterpret.cast]). If a pointer has an invalid pointer value, it doesn't necessarily mean that it has a trap representation. It can have a trap representation, but then the compiler must document this. All you have here is a pointer that is not safely derived.
Note that we generate pointers with invalid pointer values all the time: if an object is freed by delete, all the pointers which pointed to that object have an invalid pointer value.
Using the resulting pointer is implementation-defined as well (not UB):
[...] if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
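As a small sketch of that last point (variable names are illustrative), both pointers below hold an invalid pointer value after the delete, and only using them as values keeps the program in implementation-defined (not undefined) territory:
int main() {
    int* p = new int(1);
    int* q = p;      // q points to the same object

    delete p;        // the storage reaches the end of its duration here

    // p and q now hold invalid pointer values. Reading them as values,
    // as below, is implementation-defined; dereferencing either of
    // them would be undefined behavior.
    int* r = q;
    (void)r;
}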

The example shown is valid C++. On some platforms this is how you access "hardware resources" (and if it were not valid, you would have found a bug/mistake in the standard text).
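For instance, a memory-mapped device register might be reached exactly this way. A minimal sketch, assuming a hypothetical device whose datasheet places a 32-bit status register at address 0x40021000 (both the address and the register name are made up for illustration):
#include <cstdint>

// Hypothetical memory-mapped status register at a fixed, documented address.
// volatile tells the compiler that every access really touches the device.
volatile std::uint32_t* const STATUS_REG =
    reinterpret_cast<volatile std::uint32_t*>(0x40021000);

std::uint32_t read_status() {
    return *STATUS_REG;  // meaningful on platforms that document this mapping
}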
See also this answer for a better explanation.
Update:
The first sentence of the reinterpret_cast paragraph, as you quoted yourself:
A value of integral type or enumeration type can be explicitly converted to a pointer.
I recommend you stop reading at this point. The rest is just a lot of detail, including possible implementation-defined behavior, etc. That doesn't make it UB or invalid.

Trap Representations
What: As covered by [C17 §6.2.6.1/5], a trap representation is a non-value. It is a bit pattern that fills the space allocated for an object of a given type, but this pattern does not correspond to a value of that type. It is a special pattern that can be recognized for the purpose of triggering behavior defined by the implementation. That is, the behavior is not covered by the standard, which means it falls under the banner of "undefined behavior". The standard sets out the possibilities for when a trap could be (not must be) triggered, but it makes no attempt to limit what a trap might do. For more information, see A: trap representation.
The undefined behavior associated with a trap representation is interesting in that an implementation has to check for it. The more common cases of undefined behavior were left undefined so that implementations do not need to check for them. The need to check for trap representations is a good reason to want few trap representations in an efficient implementation.
Who: The decision of which bit patterns (if any) constitute trap representations falls to the implementation. The standards do not force the existence of trap representations; when trap representations are mentioned, the wording is permissive, as in "might be", as opposed to demanding, as in "shall be". Trap representations are allowed, not required. In fact, N2091 came to the conclusion that trap representations are largely unused in practice, leading up to a proposal to remove them from the C standard. (It also proposes a backup plan if removal proves infeasible: explicitly call out that implementations must document which representations are trap representations, as there is no other way to know for sure whether or not a given bit pattern is a trap representation.)
Why: Theoretically, a trap representation could be used as a debugging aid. For example, an implementation could declare that 0xDDDD is a trap representation for pointer types, then choose to initialize all otherwise uninitialized pointers to this bit pattern. Reading this bit pattern could trigger a trap that alerts the programmer to the use of an uninitialized pointer. (Without the trap, a crash might not occur until later, complicating the debugging process. Sometimes early detection is the key.) In any event, a trap representation requires a trap of some sort to serve a purpose. An implementation would not define a trap representation without also defining its trap.
My point is that trap representations must be specified. They are deliberately removed from the set of values of a given type. They are not simply "everything else".
Pointer Values
C++17 §6.9.2/3 [basic.compound]
This section defines what an invalid pointer value is. It states "Every value of pointer type is one of the following" before listing four possibilities. That means that if you have a pointer value, then it is one of the four possibilities. The first three are fully specified (pointer to object or function, pointer past the end, and null pointer). The last possibility (invalid pointer value) is not fully specified elsewhere, so it becomes the catch-all "everything else" entry in the list (it is a "wild card", to borrow terminology from the comments). Hence this section defines "invalid pointer value" to mean a pointer value that does not point to something, does not point to the end of something, and is not null. If you have a pointer value that does not fit one of those three categories, then it is invalid.
In particular, if we agree that reinterpret_cast<int*>(42) does not point to something, does not point to the end of something, and is not null, then we must conclude that it is an invalid pointer value. (Admittedly, one could assume that the result of the cast is a trap representation for pointers in some implementation. In that case, yes, it does not fit into the list of possible pointer values because it would not be a pointer value, hence it's a trap representation. However, that is circular logic. Furthermore, based upon N2091, few implementations define any trap representations for pointers, so the assumption is likely groundless.)
[ Note: [...] A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see [basic.stc]. — end note ]
I should first acknowledge that this is a note. It explains and clarifies without adding new substance. One should expect no definitions in a note.
This note gives an example of an invalid pointer value. It clarifies that a pointer can (perhaps surprisingly) change from "points to an object" to "invalid pointer value" without changing its value. Looking at this from a formal logic perspective, this note is an implication: "if [something] then [invalid pointer]". Viewing this as a definition of "invalid pointer" is a fallacy; it is merely an example of one of the ways one can get an invalid pointer.
Casting
C++17 §8.2.10/5 [expr.reinterpret.cast]
A value of integral type or enumeration type can be explicitly converted to a pointer.
This explicitly permits reinterpret_cast<int*>(42). Therefore, the behavior is defined.
To be thorough, one should make sure there is nothing in the standard that makes 42 "erroneous data" to the degree that undefined behavior results from the cast. The rest of [§8.2.10/5] does not do this, and:
The C++ standard does not seem to say more about the integer-to-pointer conversion.
Is this valid C++?
Yes.
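To tie it together in code, a minimal sketch (the comments mark what is guaranteed versus implementation-defined; this assumes the implementation provides std::uintptr_t):
#include <cstdint>

int main() {
    // Well-formed; the integer-to-pointer mapping is implementation-defined.
    int* p = reinterpret_cast<int*>(42);

    // Also well-formed; only the pointer -> integer -> pointer round trip
    // is guaranteed to restore the original value.
    std::uintptr_t n = reinterpret_cast<std::uintptr_t>(p);
    (void)n;  // on common implementations n == 42, but that is not required
}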

Related

Is the value representation of integral types implementation-defined or unspecified?

To quote from N4868 6.8.2 paragraph 5:
Each value x of an unsigned integer type with width N has a unique representation...
Notably, it avoids specifying "value representation" or "object representation," so it's not clear if either is intended here.
Later on (in the index of implementation-defined behavior), N4868 does call out the value representation of pointers and of floating-point types as implementation-defined, but very notably excludes integral types.
Given this, there are four potential interpretations that I can think of:
The value representation of integral types is uniquely specified
The value representation of integral types is unspecified
The value representation of integral types is implementation-defined, but mistakenly left out of the aforementioned index
The value representation of integral types is undefined
#1 appears impossible, as implementations exist for both big- and little-endian architectures.
#3 appears unlikely, since the absence of integral types from the index is conspicuous, and the actual text for both floating-point and pointer types calls out that they are implementation-defined, while the text on integral types goes to great lengths to avoid specifying the value representation.
#2 is the most likely interpretation, but it is conspicuous in that the standard often calls out behavior as unspecified, yet here says no such thing. This would, among other things, imply that behavior can be unspecified even if not actually called out as such, which makes it difficult to distinguish merely unspecified behavior from behavior that is undefined simply because the standard does not define it at all (as opposed to behavior called out as "undefined behavior").
#4 seems absurd, as the standard implies that all types (or at least, trivially-copyable ones) have a definite, if otherwise unspecified, object representation (and by extension, value representation). Specifically, 6.7, paragraph 4 states:
For trivially copyable types, the value representation is a set of bits in the object representation
that determines a value, which is one discrete element of an implementation-defined set of values.
This seems to imply that the value representation of trivially copyable types (including integral types) is otherwise unspecified.
Scenario #2 probably indicates a failure to call the representation out as "unspecified," since we have the note under the definition of "undefined behavior" in Section 3: "Undefined behavior may be expected when this document omits any explicit definition of behavior." If the value representation of integral types isn't ever explicitly stated as unspecified / implementation-defined, then code that depends on the value representation wouldn't just be unspecified / implementation-defined; it would be undefined by omission.
However, one could also argue that the "explicit definition of behavior" clause does not apply, as the behavior is perfectly well-defined, the object representation being a sequence of objects of type unsigned char, with merely their values being left to the implementation.
After bringing this up as an editorial issue, the correct answer appears to be that the integral representation is "none of the above." It is simply left unspecified, and is not called out as such because the "unspecified" label is only generally applied to behavior.
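The practical consequence is easy to observe: inspecting the object representation of an integer reveals the implementation's byte order. A minimal sketch (what it prints depends on the implementation, which is exactly the point):
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    std::uint32_t x = 0x01020304;
    unsigned char bytes[sizeof x];
    std::memcpy(bytes, &x, sizeof x);  // copy out the object representation

    // Little-endian implementations print 04 03 02 01,
    // big-endian ones print 01 02 03 04.
    for (unsigned char b : bytes)
        std::printf("%02x ", b);
    std::printf("\n");
}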

Is reinterpret casting an integer to a pointer bijective if the integer is the same size as a pointer?

Given an integer type IntT such that sizeof(IntT) == sizeof(void*), and a variable i of said type, is it guaranteed that reinterpret_cast<IntT>(reinterpret_cast<void*>(i)) == i? This is similar to this question, but that question allowed an arbitrarily sized integer, so the answer was a straightforward no. Limiting it to integers of exactly the same size as a pointer makes it more interesting.
It strikes me as though the answer would have to be "yes," because the specification states that there exists a mapping to any integer large enough to hold the pointer value. If the variables are the same size, then that mapping must be bijective. And if it's bijective, then the conversion from int to void* must be bijective as well.
But is there a hole in that logic? Is there a wiggle word in the spec that I'm not accounting for?
I don't think this is guaranteed. The Standard guarantees that a pointer converted to a suitably large integer and back will have its original value. From this it follows that there is a mapping from pointers to a subset of the suitably large integers and back. What it does not imply is that for every suitably large integer value, there is a corresponding pointer value…
As pointed out by DavisHerring in the comments below, this means that the mapping is injective, but does not have to be surjective and, thus, bijective. I believe what the standard implies in mathematical terms would be that there is a left-unique and left-total relation between pointers and integers, not a bijective function.
Just imagine some weird architecture where, for some reason, every third bit of an address must be zero. Or a slightly more reasonable architecture that uses only the lower 48 bits of a 64-bit value to store an address. Independently of how much sense that'd make, the compiler would be free to assume that an integer value being cast to a pointer must follow the pattern of a valid address and, e.g., mask out every third bit or use only the lower six bytes, respectively…
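The asymmetry is easy to state in code. A minimal sketch (assuming an implementation where std::uintptr_t exists and has the same size as void*, which is common but not required):
#include <cassert>
#include <cstdint>

int main() {
    static_assert(sizeof(std::uintptr_t) == sizeof(void*),
                  "assumes a same-size integer type exists");

    // Guaranteed direction: pointer -> integer -> pointer.
    int x = 0;
    void* p = &x;
    void* q = reinterpret_cast<void*>(reinterpret_cast<std::uintptr_t>(p));
    assert(p == q);  // the standard guarantees this round trip

    // Not guaranteed: integer -> pointer -> integer.
    std::uintptr_t i = 42;
    std::uintptr_t j =
        reinterpret_cast<std::uintptr_t>(reinterpret_cast<void*>(i));
    // j == i on common implementations, but an implementation that
    // normalizes addresses (e.g., masks unused bits) may change it.
    (void)j;
}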

Understanding ARM assembly instructions and C/C++ pointers

I am trying to decode an assembly instruction at a given address (a 16-bit ARM Thumb instruction), so I don't think I should care about the data type: I'm only interested in the 16 bits stored there. I have a separate interpreter to make sense of those bits; I don't want to use them as data anyway.
If I have a pointer p and I want to read 4 bytes (i.e., the data from address p to p+3), will casting p to int * and dereferencing give me the data?
You have a pointer to some type. Pointer arithmetic and dereferencing honor that data type.
Please note, you can only access the stored value of any variable (object) through an lvalue expression that has either a compatible type or a character type. Blindly casting a pointer to a different, non-compatible type and attempting to dereference it violates the strict aliasing rule, and you'll face undefined behavior.
Quoting C11, chapter §6.5
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:88)
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
You can, however, always use a char * to point to any type, then dereference and increment (and repeat) to get the individual byte values, but you need to take care of endianness yourself.
Related, quoting C11, chapter §6.3.2.3
[....] When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
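Tying the two quotes together, here is a sketch of aliasing-safe ways to read 4 bytes (function names are illustrative): assembling the value byte by byte fixes the byte order explicitly, while memcpy yields whatever order the host uses:
#include <cstdint>
#include <cstring>

// Read a 32-bit little-endian value starting at p without violating
// strict aliasing: access the bytes through unsigned char and combine them.
std::uint32_t read_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Alternative: memcpy into the target type. The result is in host
// byte order, so it is not portable across endianness by itself.
std::uint32_t read_u32_host(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}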

Using reinterpret_cast to convert integer to pointer and back to integer [duplicate]

According to http://en.cppreference.com/w/cpp/language/reinterpret_cast, it is known that reinterpret_cast-ing a pointer to an integral type of sufficient size and back yields the same value. I'm wondering whether the converse is also true by the standard. That is, does reinterpret_cast-ing an integral value to a pointer type of sufficient size and back yield the same value?
No, that is not guaranteed by the standard. Quoting all parts of C++14 (n4140) [expr.reinterpret.cast] which concern pointer–integer conversions, emphasis mine:
4 A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is
implementation-defined. [ Note: It is intended to be unsurprising to those who know the addressing structure
of the underlying machine. —end note ] ...
5 A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted
to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type
will have its original value; mappings between pointers and integers are otherwise implementation-defined.
[ Note: Except as described in 3.7.4.3, the result of such a conversion will not be a safely-derived pointer
value. —end note ]
So starting with an integral value and converting it to a pointer and back (assuming no size issues) is implementation-defined. Which means you must consult your compiler's documentation to learn whether such a round trip preserves values or not. As such, it is certainly not portable.
I ran into exactly this problem in a library that exported pointers to objects as opaque identifiers: attempts to recover those pointers from external calls didn't work on old x86 CPUs (back in the Windows 98 era). So, while we can expect that behaviour, it is false in the general case. On 386-class CPUs an address is composed of overlapping segment:offset pairs, so the address of any memory position is not unique, and I found that the conversion back didn't recover the original value.
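The portable way to implement such an opaque-identifier scheme is to keep the conversion in the guaranteed direction: hand out an integer that was produced from the pointer, and convert that same integer back. A minimal sketch (the type and function names are illustrative):
#include <cstdint>

struct Session { int state = 0; };

// Export: pointer -> integer. This is the direction whose round trip
// the standard guarantees.
std::uintptr_t make_handle(Session* s) {
    return reinterpret_cast<std::uintptr_t>(s);
}

// Import: the same integer -> the original pointer.
Session* from_handle(std::uintptr_t h) {
    return reinterpret_cast<Session*>(h);
}

int main() {
    Session s;
    std::uintptr_t h = make_handle(&s);
    Session* p = from_handle(h);  // guaranteed to equal &s
    return p == &s ? 0 : 1;
}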

static_cast on integer to enum conversion

There is some function that takes in an enum as argument
void myfunc(myEnum input);
As I understand it, if I have to pass an integer to this function, it is advised to explicitly cast it to the enum type, the reason being that not all integers are valid enum values.
As per MSDN
"The static_cast operator can explicitly convert an integral value to
an enumeration type. If the value of the integral type does not fall
within the range of enumeration values, the resulting enumeration
value is undefined."
and as per the C++ standard, §5.2.9 (static_cast), paragraph 10:
"A value of integral or enumeration type can be explicitly converted
to an enumeration type. The value is unchanged if the original value
is within the range of the enumeration values (7.2). Otherwise, the
resulting value is unspecified (and might not be in that range)."
So what's the point of using static_cast in this scenario? Is there some option that would raise an exception on values outside the enum range (other than writing explicit code for that)?
As usual, the compiler is just trying to keep you from shooting yourself in the foot. That's why you cannot just pass an int to a function expecting an enum. The compiler will rightfully complain, because the int might not match any valid enum value.
By adding the cast you basically tell the compiler 'Shut up, I know what I am doing'. What you are communicating here is that you are sure that the value you pass in is 'within the range of the enumeration values'. And you better make sure that is the case, or you are on a one-way trip to undefined-behavior-land.
If this is so dangerous, then why doesn't the compiler add a runtime check for the integer value? The reason is, as so often with C++, performance. Maybe you just know from the surrounding program logic that the int value will always be valid, and you absolutely cannot waste any time on stupid runtime checks. From a language-design point of view this might not be the most reasonable default to choose, especially when your goal is writing robust code. But that's just how C++ works: a developer should never have to pay for functionality that they might not want to use.
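Since the language offers no built-in check, the usual answer is exactly that explicit code: validate the integer before the cast. A minimal sketch for an enum with known, contiguous values (the enum and its bounds are illustrative; a sparse enum would need a switch or a lookup table instead):
#include <stdexcept>

enum class MyEnum { First = 0, Second = 1, Third = 2 };

// Validate before casting; throw instead of silently producing
// a value outside the enumeration's range.
MyEnum to_my_enum(int value) {
    if (value < 0 || value > 2)
        throw std::out_of_range("value does not map to a MyEnum");
    return static_cast<MyEnum>(value);
}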