Pointer cast into char* array - c++

I'm fairly new to C++ and I'm having difficulty wrapping my head around what is going on in the final line of the below:
int numToSend = bs->GetSize();
char * tBuf = new char[NUM_LENGTH_BYTES + numToSend];
*(WORD*)tBuf = htons((WORD)numToSend);
So htons is returning a u_short or WORD, but the cast on tBuf is somewhat confusing to me. Is it something along the lines of "the value pointed to by tBuf is cast as a WORD pointer and assigned the return from htons"?
I believe this is a fairly unsafe operation in most cases. What would be the best practice here?

It may not be a recommended practice, but AFAIK, it is safe. It is true that in general, taking a pointer to P, casting it to a pointer to Q and using it as a pointer to Q leads to undefined behaviour. Here it looks even worse, because the alignment requirement of char is known to be the weakest possible.
But the char * tBuf pointer has been obtained through a new expression. Such a new expression internally relies on an allocation function to obtain storage, and draft n4296 for C++14 says in 3.7.4.1 Allocation functions [basic.stc.dynamic.allocation] §2:
The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall
return the address of the start of a block of storage whose length in bytes shall be at least as large as
the requested size... The pointer returned shall be suitably aligned so that it can be converted
to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used
to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call
to a corresponding deallocation function).
So this line *(WORD*)tBuf = htons((WORD)numToSend); only performs well-defined operations:
It converts numToSend from an integer type to an unsigned type, and 4.7 Integral conversions [conv.integral] says:
A prvalue of an integer type can be converted to a prvalue of another integer type...
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type)
It calls htons with a WORD or uint16_t as parameter, which returns a uint16_t or WORD.
It converts a pointer obtained by new to a WORD * and uses that pointer to access the object in the storage allocated.
Simply, the value of the first two bytes of the allocated array is now unspecified. More exactly it is the byte representation of the WORD in the particular implementation.
But it is still allowed to access the allocated array as a character array, even if the first bytes now contain a WORD, because it is explicitly allowed by the so-called strict aliasing rule, 3.10 Lvalues and rvalues [basic.lval] §10:
If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:...
(10.8) — a char or unsigned char type.
If the tBuf pointer had not been obtained through a new expression, the only correct way would have been to do a memcpy:
WORD n_numToSend = htons(numToSend);
memcpy(tBuf, &n_numToSend, sizeof(WORD));
As this one is allowed for any pointer provided the storage is big enough, it is what I would call the recommended practice.
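The memcpy approach can be sketched as a small helper. This is only an illustration: write_length_prefix and to_network_u16 are hypothetical names, and to_network_u16 is a hand-written stand-in for htons so that the sketch does not depend on a platform socket header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical portable stand-in for htons(): returns a 16-bit value whose
// in-memory bytes are in network (big-endian) order on any host.
inline std::uint16_t to_network_u16(std::uint16_t host_value) {
    const unsigned char bytes[2] = {
        static_cast<unsigned char>(host_value >> 8),
        static_cast<unsigned char>(host_value & 0xFF)};
    std::uint16_t network_value;
    std::memcpy(&network_value, bytes, sizeof network_value);
    return network_value;
}

// Writes a 2-byte length prefix at the start of buf using memcpy, so no
// alignment or aliasing assumption about buf is needed.
inline void write_length_prefix(char* buf, std::size_t buf_size,
                                std::uint16_t length) {
    assert(buf != nullptr && buf_size >= sizeof(std::uint16_t));
    const std::uint16_t network_length = to_network_u16(length);
    std::memcpy(buf, &network_length, sizeof network_length);
}
```

Because nothing is dereferenced as a WORD, the same call works on a pointer into the middle of a buffer, a stack array, or storage obtained from new.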

Related

Legality of using `T` pointer derived from aliased `byte` pointer

In C++, it's legal for a char, unsigned char, or std::byte pointer to alias any T pointer. Such an aliased pointer can access the object's representation in raw memory.
My question is whether it's legal and free of undefined-behavior to use a T* pointer derived from arithmetic on a char/unsigned char/std::byte pointer aliasing an original T array object sequence -- provided that the resultant pointer is still aligned correctly and within the original reachability of the aliased sequence.
As a concrete example:
// Array sequence is 2 in length
auto array = std::array<unsigned,2>{0xdead, 0xbeef};
// Derive the address of &array[1] using std::byte pointer
auto p = reinterpret_cast<std::byte*>(array.data());
auto p1 = reinterpret_cast<unsigned*>(p + sizeof(unsigned));
assert(p1 == &array[1]); // This expression should be legal since the pointer is safely derived
assert(*p1 == array[1]); // But is this legal, as per the C++ standard?
The pointer p1 should be safely derived in the above example since it doesn't exceed the reachability of array's bound. I know that in many cases in C++ it's legal to cast a pointer to a type/representation that can't actually be used/dereferenced. Is the resultant pointer here legal to dereference as far as the standard is concerned?
For the purpose of this question, assume that T may not be standard-layout type.
Note: I'm not asking about whether this will work in practice, since I have yet to see this type of code fail on compilers; but I'm curious to know whether this is, strictly-speaking, legal per the C++ standard.
Edit:
I'd rather not derail this question, but there are a few comments that are asserting that p + sizeof(unsigned) is undefined behavior as a result of [expr.add], in particular quoting both [expr.add]/4 and [expr.add]/6 due to p not actually being a byte-array.
There is, in fact, a byte sequence[1], and both [expr.add]/4 and [expr.add]/6 simply do not apply[2].
[1]: char, unsigned char, and std::byte view the object representation of an object, which is defined under [basic.types.general]/4 as being:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
The above means that there is a formal sequence of unsigned char (which std::byte is capable of representing), which means p + sizeof(unsigned) is valid. We aren't viewing through a similar pointer, we are viewing exactly as the underlying object-representation which is legal.
[2]: [expr.add]/4 is simply referring to reachability of array objects. However, as pointed out above, there is a valid and reachable object due to the unsigned char sequence, and we do not exceed reachability -- thus this doesn't apply as a case of undefined behavior.
As far as I can tell, [expr.add]/6 primarily exists to avoid cases where an array of derived types is cast to a base pointer and attempted to be indexed. However it wouldn't apply in this case because we aren't viewing through a "similar" pointer, we are viewing an object representation that is formally defined as a sequence of unsigned chars. This isn't a similar representation, it's exact.
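Whichever reading of [expr.add] is correct, a conservative alternative avoids the question by copying the object representation into a real byte array first, so that all pointer arithmetic happens inside an actual array of unsigned char. A minimal sketch, assuming T is trivially copyable (as unsigned is); read_via_byte_copy is a hypothetical name, not something from the question:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: returns element `index` of `source` by going through
// a genuine unsigned char array instead of a byte pointer that aliases the
// original objects.
template <typename T, std::size_t N>
T read_via_byte_copy(const std::array<T, N>& source, std::size_t index) {
    static_assert(N > 0, "the array must not be empty");
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only defined for trivially copyable types");
    assert(index < N);
    unsigned char bytes[sizeof(T) * N];
    std::memcpy(bytes, source.data(), sizeof bytes);
    T result;
    std::memcpy(&result, bytes + index * sizeof(T), sizeof(T));
    return result;
}
```

This does not answer whether dereferencing p1 is legal; it only shows a form whose validity does not depend on that answer.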

Assign memory address stored in int to int

Assume the value of uint32_t x is an address. How can I get the value behind this address?
My attempt was to simply assign the address to a pointer.
int*y = x;
But x is not an int pointer, it's just int with an address as value.
An integer type which is large enough to represent all data pointers can be converted into a pointer using reinterpret_cast or an explicit conversion. The pointer can be indirected to get the pointed value using the indirection operator.
Note that uint32_t is not guaranteed to be large enough to represent all pointer values (and in fact it is not large enough on modern 64-bit CPUs). uintptr_t is meant precisely for this purpose.
Note that if the pointed address does not contain an object (of compatible type), then behaviour will be undefined.
In C++ this can be done using reinterpret_cast:
8.5.1.10 Reinterpret cast [expr.reinterpret.cast]
...
5. A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. [Note: Except as described in 6.6.4.4.3, the result of such a conversion will not be a safely-derived pointer value. —end note]
And the rules in 6.6.4.4.3 state:
An integer value is an integer representation of a safely-derived pointer only if its type is at least as large as std::intptr_t and it is one of the following:
—(3.1) the result of a reinterpret_cast of a safely-derived pointer value;
—(3.2) the result of a valid conversion of an integer representation of a safely-derived pointer value;
—(3.3) the value of an object whose value was copied from a traceable pointer object, where at the time of the copy the source object contained an integer representation of a safely-derived pointer value;
—(3.4) the result of an additive or bitwise operation, one of whose operands is an integer representation of a safely-derived pointer value P, if that result converted by reinterpret_cast<void*> would compare equal to a safely-derived pointer computable from reinterpret_cast<void*>(P).
So if x (in the question) has a type at least as large as std::intptr_t and is already an integral representation of a safely derived pointer as per the rules above, you will be able to get the value behind the address stored in x.
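A minimal sketch of that round trip, assuming std::uintptr_t is available (it is optional in the standard but provided by mainstream implementations). The function names to_address_value and read_int_at are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Converts a pointer to its integer representation.
inline std::uintptr_t to_address_value(const int* pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer);
}

// Reads the int stored at `address`. Behaviour is only defined when
// `address` was produced from a pointer to a live int, for example by
// to_address_value above.
inline int read_int_at(std::uintptr_t address) {
    const int* pointer = reinterpret_cast<const int*>(address);
    assert(pointer != nullptr);
    return *pointer;
}
```

Storing the address in uint32_t instead would truncate it on a typical 64-bit platform, at which point the conversion back no longer yields the original pointer.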

Why can't I assign the address of a variable of one type(say double) to a pointer of int type?

In case of pointers, we know that their size is always same irrespective of data type of the variable it is pointing.
The data type is needed when dereferencing the pointer so that it knows how much data it should read. So why can't I assign the address of a variable of type double to a pointer of type int?
Why can't dereferencing an int pointer simply read the next 4 bytes of the double variable and print their value?
Many computers have alignment requirements, so (for example) to read a 2-byte value, the address at which it's located must be a multiple of 2 (and likewise, a 4-byte value must be located at an address that's a multiple of 4, and so on). In fact, this alignment requirement is common enough that it's frequently referred to as "natural alignment".
Likewise, some types (e.g., floating point types) impose requirements on the bit sequence that can be read as that type, so if you try to take some arbitrary data and treat it as a double, you might trigger something like a floating point exception.
If you want to do this badly enough, you can use a cast to turn the pointer into the target type (but the results, if any, aren't usually portable).
You are guaranteed that you can convert a pointer to any other type of object to a pointer to unsigned char, and use that to read the bytes that represent the pointee object.
Also, if you primarily want an opaque pointer, without type information attached, you can assign a pointer to some other type to a void *.
Finally: no, not all pointers are actually the same. Pointers to different types can be different sizes (e.g., on the early Cray compilers, a char * was substantially different from an int *).
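The unsigned char guarantee mentioned above is enough to get at "the first 4 bytes of the double" without ever forming an int pointer to it. A rough sketch; both function names are hypothetical, and the numeric result depends on the platform's floating-point format and byte order:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies the first `count` bytes of the double's object representation into
// `out`, reading them through an unsigned char pointer.
inline void copy_leading_bytes(const double& value, unsigned char* out,
                               std::size_t count) {
    assert(out != nullptr && count <= sizeof(double));
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = bytes[i];
    }
}

// Builds a 32-bit integer from the first four bytes of the double. memcpy
// into a real integer object avoids any alignment or aliasing problem.
inline std::uint32_t leading_bytes_as_u32(const double& value) {
    static_assert(sizeof(std::uint32_t) <= sizeof(double),
                  "double is assumed to be at least 4 bytes wide");
    unsigned char raw[sizeof(std::uint32_t)];
    copy_leading_bytes(value, raw, sizeof raw);
    std::uint32_t result;
    std::memcpy(&result, raw, sizeof result);
    return result;
}
```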
In case of pointers, we know that their size is always same irrespective of data type of the variable it is pointing.
No, we do not know that.
Chapter and verse for C
6.2.5 Types
...
28 A pointer to void shall have the same representation and alignment requirements as a
pointer to a character type.48) Similarly, pointers to qualified or unqualified versions of
compatible types shall have the same representation and alignment requirements. All
pointers to structure types shall have the same representation and alignment requirements
as each other. All pointers to union types shall have the same representation and
alignment requirements as each other. Pointers to other types need not have the same
representation or alignment requirements.
48) The same representation and alignment requirements are meant to imply interchangeability as
arguments to functions, return values from functions, and members of unions.
Emphasis added.
Chapter and verse for C++
3.9.2 Compound types
...
3 The type of a pointer to void or a pointer to an object type is called an object pointer type. [ Note: A pointer
to void does not have a pointer-to-object type, however, because void is not an object type. — end note ]
The type of a pointer that can designate a function is called a function pointer type. A pointer to objects
of type T is referred to as a “pointer to T.” [Example: a pointer to an object of type int is referred to as
“pointer to int ” and a pointer to an object of class X is called a “pointer to X.” — end example ] Except
for pointers to static members, text referring to “pointers” does not apply to pointers to members. Pointers
to incomplete types are allowed although there are restrictions on what can be done with them (3.11).
A valid value of an object pointer type represents either the address of a byte in memory (1.7) or a null
pointer (4.10). If an object of type T is located at an address A, a pointer of type cv T* whose value is the
address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance,
the address one past the end of an array (5.7) would be considered to point to an unrelated object of the
array’s element type that might be located at that address. There are further restrictions on pointers to
objects with dynamic storage duration; see 3.7.4.3. — end note ] The value representation of pointer types
is implementation-defined. Pointers to layout-compatible types shall have the same value representation and
alignment requirements (3.11). [ Note: Pointers to over-aligned types (3.11) have no special representation,
but their range of valid values is restricted by the extended alignment requirement. This International
Standard specifies only two ways of obtaining such a pointer: taking the address of a valid object with
an over-aligned type, and using one of the runtime pointer alignment functions. An implementation may
provide other means of obtaining a valid pointer value for an over-aligned type. — end note ]
4 A pointer to cv-qualified (3.9.3) or cv-unqualified void can be used to point to objects of unknown type.
Such a pointer shall be able to hold any object pointer. An object of type cv void* shall have the same
representation and alignment requirements as cv char*.
Emphasis added. It is entirely possible to have different sizes and representations for different pointer types. There is no reason to expect a pointer to int to have the same size and representation as a pointer to double, or a pointer to a struct type, or a pointer to a function type. It's true for commodity platforms like x86, but not all the world runs on x86.
This is why you can't assign pointer values of one type to pointer values of another type without an explicit cast (except for converting between void * and other pointer types in C), since a representation change may be required.
Secondly, pointer arithmetic depends on the size of the pointed-to type. Assume you have pointers to a 32-bit int and a 64-bit double:
int *ip;
double *dp;
The expression ip + 1 will return the address of the next integer object (current address plus 4), while the expression dp + 1 will return the address of the next double object (current address plus 8).
If I assign the address of a double to a pointer to int, incrementing that int pointer won't take me to the next double object.
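The difference in stride can be observed directly. The sketch below measures how many bytes separate p and p + 1 for a given element type; stride_in_bytes is a hypothetical helper, and the exact numbers (4 and 8 in the example above) depend on the platform:

```cpp
#include <cassert>
#include <cstddef>

// Returns how many bytes separate p and p + 1 for a pointer to T.
template <typename T>
std::ptrdiff_t stride_in_bytes() {
    T elements[2] = {};
    T* first = &elements[0];
    T* next = first + 1;  // pointer arithmetic in units of T
    const char* first_byte = reinterpret_cast<const char*>(first);
    const char* next_byte = reinterpret_cast<const char*>(next);
    assert(next_byte > first_byte);
    return next_byte - first_byte;
}
```

stride_in_bytes<int>() equals sizeof(int) and stride_in_bytes<double>() equals sizeof(double), which is why an int pointer that holds the address of a double steps by the wrong amount.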

Can std::uintptr_t be used to avoid undefined behavior of out-of-bounds pointer arithmetic?

Now we know that doing out-of-bounds-pointer-arithmetic has undefined behavior as described in this SO question.
My question is: can we work around this restriction by casting to std::uintptr_t for the arithmetic operations and then casting back to a pointer? Is that guaranteed to work?
For example:
char a[5];
auto u = reinterpret_cast<std::uintptr_t>(a) - 1;
auto p = reinterpret_cast<char*>(u + 1); // OK?
The real-world usage is for optimizing offset memory access -- instead of p[n + offset], I want to do offset_p[n].
EDIT To make the question more explicit:
Given a base pointer p of a char array, if p + n is a valid pointer, will reinterpret_cast<char*>(reinterpret_cast<std::uintptr_t>(p) + n) be guaranteed to yield the same valid pointer?
No, uintptr_t cannot be meaningfully used to avoid undefined behavior when performing pointer arithmetic.
For one thing, at least in C there is no guarantee that uintptr_t even exists. The requirement is that any value of type void* may be converted to uintptr_t and back again, yielding the original value without loss of information. In principle, there might not be any unsigned integer type wide enough to hold all pointer values. (I presume the same applies to C++, since C++ inherits most of the C standard library and defines it by reference to the C standard.)
Even if uintptr_t does exist, there is no guarantee that a given arithmetic operation on a uintptr_t value does the same thing as the corresponding operation on a pointer value.
For example, I've worked on systems (Cray vector systems, T90 and SV1) on which byte pointers are implemented in software. A native address is a 64-bit address that refers to a 64-bit word; there is no hardware support for byte addressing. A char* or void* pointer consists of a word pointer with a 3-bit offset stored in the otherwise unused high-order bits. Conversion between integers and pointers simply copies the bits. So incrementing a char* would advance it to point to the next 8-bit byte in memory; incrementing a uintptr_t obtained by converting a char* would advance it to point to the next 64-bit word.
That's just one example. More generally, conversions between pointers and integers are implementation-defined, and the language standard makes no guarantee about the semantics of those conversions (other than, in some cases, converting back to a pointer).
So yes, you can convert a pointer value to uintptr_t (if that type exists) and perform arithmetic on it without risking undefined behavior -- but the result may or may not be meaningful.
It happens that, on most systems, the mapping between pointers and integers is simpler, and you probably can get away with that kind of game. But you're better off using pointer arithmetic directly, and just being very careful to avoid any invalid operations.
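One way to get the offset_p[n] notation while staying with ordinary pointer arithmetic is to keep the offset in the index rather than in the pointer. This is only a sketch of that idea, not something proposed in the question; OffsetView is a hypothetical type:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a view over `size` chars starting at `base`, where
// view[n] means base[n + offset]. No out-of-bounds pointer is ever formed,
// because the offset is applied to the index before the array is touched.
struct OffsetView {
    char* base;
    std::size_t size;
    std::ptrdiff_t offset;

    char& operator[](std::ptrdiff_t n) const {
        const std::ptrdiff_t index = n + offset;
        assert(index >= 0 && static_cast<std::size_t>(index) < size);
        return base[index];
    }
};
```

An optimizing compiler is free to fold the addition into the addressing mode, so this form need not cost more than a pre-offset pointer, though that is a matter for measurement rather than a guarantee.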
Yes, that is legal, but you must reinterpret_cast exactly the same uintptr_t value back to char*.
(Therefore, what you're intending to do is illegal; that is, converting a different value back to a pointer.)
5.2.10 Reinterpret cast
4. A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is
implementation-defined.
5. A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted
to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type
will have its original value;
(Note that there'd be no way, in general, for the compiler to know that you subtracted one and then added it back.)

Casting between primitive type pointers

Is the following well-defined:
char* charPtr = new char[42];
int* intPtr = (int*)charPtr;
charPtr++;
intPtr = (int*) charPtr;
The intPtr isn't properly aligned (in at least one of the two cases). Is it illegal just having it there? Is it UB using it at any stage? How can you use it and how can't you?
In general, the result is unspecified (5.2.10p7) if the alignment requirements of int are greater than those of char, which they usually will be. The result will be a valid value of the type int * so it can be e.g. printed as a pointer with operator<< or converted to intptr_t.
Because the result has an unspecified value, unless specified by the implementation it is undefined behaviour to indirect it and perform lvalue-to-rvalue conversion on the resulting int lvalue (except in unevaluated contexts). Converting back to char * will not necessarily round-trip.
However, if the original char * was itself the result of a cast from int *, then the cast to int * counts as the second half of a round trip; in that case, the cast is defined.
In particular, in the case above where the char * was the result of a new[] expression, we are guaranteed (5.3.4p10) that the char * pointer is appropriately aligned for int, as long as sizeof(int) <= 42. Because the new[] expression obtains its storage from an allocation function, 3.7.4.1p2 applies; the void * pointer "can be converted to a pointer of any complete object type with a fundamental alignment requirement and then used to access the object [...]", which strongly implies, along with the note to 5.3.4p10, that the same holds for the char * pointer returned by the new[] expression.
In this case the int * is a pointer to an uninitialised int object, so performing lvalue-to-rvalue conversion on its indirection is undefined (3.8p6), but assigning to its indirection is fully defined. The int object is in the storage allocated (3.7.4.1p2), so converting the int * back to char * will yield the original value per 1.8p6. This does not hold for the incremented char * pointer because, unless sizeof(int) == 1, it is not the address of an int object.
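The alignment difference between the two casts can be checked at run time. The sketch below assumes the common, but implementation-defined, mapping in which the integer value of a pointer is its byte address; is_aligned_for is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Reports whether `pointer` is suitably aligned for T, assuming that
// converting a pointer to std::uintptr_t yields its byte address.
template <typename T>
bool is_aligned_for(const void* pointer) {
    assert(pointer != nullptr);
    return reinterpret_cast<std::uintptr_t>(pointer) % alignof(T) == 0;
}
```

On a typical platform is_aligned_for<int> holds for the pointer returned by new char[42] and fails for that pointer plus one.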
First, of course: the pointer is guaranteed to be aligned in the
first case (by §5.3.4/10 and §3.7.4.1/2), and may be correctly
aligned in both cases. (Obviously, if sizeof(int) == 1, but
even when this is not the case, an implementation doesn't
necessarily have alignment requirements.)
And to make things clear: your casts are all reinterpret_cast.
Beyond that, this is an interesting question, because as far as
I can tell, there is no difference in the two casts, as far as
the standard is concerned. The results of the conversion are
unspecified (according to §5.2.10/7); you're not even guaranteed
that converting it back into a char* will result in the
original value. (It obviously won't, for example, on machines
where int* is smaller than a char*.)
In practice, of course: the standard requires that the return
value of new char[N] be sufficiently aligned for any value
which may fit into it, so you are guaranteed to be able to do:
intPtr = new (charPtr) int;
Which has exactly the same effect as your cast, given that the
default constructor for int is a no-op. (And assuming that
sizeof(int) <= 42.) So it's hard to imagine an implementation
in which the first part fails. You should be able to use the
intPtr just like any other legally obtained intPtr. And the
idea that converting it back to a char* would somehow result
in a different value from the original char* seems
preposterous.
In the second part, all bets are off: you definitely can't
dereference the pointer (unless your implementation guarantees
otherwise), and it's also quite possible that converting it back
to char* results in something different. (Imagine a word
addressed machine, for example, where converting a char* to an
int* rounds up. Then converting back would result in
a char* which was sizeof(int) higher than the original. Or
where an attempt to convert a misaligned pointer always resulted
in a null pointer.)
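The placement-new form mentioned above can be written out as follows. It is a sketch under the stated assumption that the buffer comes from new char[] and is at least sizeof(int) bytes long; make_int_in_buffer is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Creates an int at the start of a buffer obtained from new char[], stores
// `value` in it, and returns the pointer produced by placement new.
inline int* make_int_in_buffer(char* buffer, std::size_t buffer_size,
                               int value) {
    assert(buffer != nullptr && buffer_size >= sizeof(int));
    int* intPtr = new (buffer) int;  // starts the lifetime of an int; no cast
    *intPtr = value;  // assignment is defined; reading before it would not be
    return intPtr;
}
```

This only covers the first, aligned case. Passing buffer + 1 would hit the same misalignment problem as the second cast in the question.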