Casting between primitive type pointers - c++

Is the following well-defined:
char* charPtr = new char[42];
int* intPtr = (int*)charPtr;
charPtr++;
intPtr = (int*) charPtr;
The intPtr isn't properly aligned (in at least one of the two cases). Is it illegal just having it there? Is it UB using it at any stage? How can you use it and how can't you?

In general, the result is unspecified (5.2.10p7) if the alignment requirements of int are greater than those of char, which they usually will be. The result will be a valid value of the type int * so it can be e.g. printed as a pointer with operator<< or converted to intptr_t.
Because the result has an unspecified value, unless specified by the implementation it is undefined behaviour to indirect it and perform lvalue-to-rvalue conversion on the resulting int lvalue (except in unevaluated contexts). Converting back to char * will not necessarily round-trip.
However, if the original char * was itself the result of a cast from int *, then the cast to int * counts as the second half of a round trip; in that case, the cast is defined.
In particular, in the case above where the char * was the result of a new[] expression, we are guaranteed (5.3.4p10) that the char * pointer is appropriately aligned for int, as long as sizeof(int) <= 42. Because the new[] expression obtains its storage from an allocation function, 3.7.4.1p2 applies; the void * pointer can be converted
to a pointer of any complete object type with a fundamental alignment requirement and then used to access the object [...] which strongly implies, along with the note to 5.3.4p10, that the same holds for the char * pointer returned by the new[] expression. In this case the int * is a pointer to an uninitialised int object, so performing lvalue-to-rvalue conversion on its indirection is undefined (3.8p6), but assigning to its indirection is fully defined. The int object is in the storage allocated (3.7.4.1p2) so converting the int * back to char * will yield the original value per 1.8p6. This does not hold for the incremented char * pointer as unless sizeof(int) == 1 it is not the address of an int object.

First, of course: the pointer is guaranteed to be aligned in the
first case (by §5.3.4/10 and §3.7.4.1/2), and may be correctly
aligned in both cases. (Obviously, if sizeof(int) == 1, but
even when this is not the case, an implementation doesn't
necessarily have alignment requirements.)
And to make things clear: your casts are all reinterpret_cast.
Beyond that, this is an interesting question, because as far as
I can tell, there is no difference in the two casts, as far as
the standard is concerned. The results of the conversion are
unspecified (according to §5.2.10/7); you're not even guaranteed
that converting it back into a char* will result in the
original value. (It obviously won't, for example, on machines
where int* is smaller than a char*.)
In practice, of course: the standard requires that the return
value of new char[N] be sufficiently aligned for any value
which may fit into it, so you are guaranteed to be able to do:
intPtr = new (charPtr) int;
Which has exactly the same effect as your cast, given that the
default constructor for int is a no-op. (And assuming that
sizeof(int) <= 42.) So it's hard to imagine an implementation
in which the first part fails. You should be able to use the
intPtr just like any other legally obtained intPtr. And the
idea that converting it back to a char* would somehow result
in a different value from the original char* seems
preposterous.
In the second part, all bets are off: you definitely can't
dereference the pointer (unless your implementation guarantees
otherwise), and it's also quite possible that converting it back
to char* results in something different. (Imagine a word
addressed machine, for example, where converting a char* to an
int* rounds up. Then converting back would result in
a char* which was sizeof(int) higher than the original. Or
where an attempt to convert a misaligned pointer always resulted
in a null pointer.)

Related

Is performing indirection from a pointer acquired from converting an integer value definitely UB?

Consider this example
int main(){
std::intptr_­t value = /* a special integer value */;
int* ptr = reinterpret_­cast<int*>(value ); // #1
int v = *ptr; // #2
}
[expr.reinterpret.cast] p5 says
A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined.
At least, step #1 is implementation-defined. For step #2, in my opinion, I think it has four possibilities, which are
The implementation does not support such a conversion, and implementation does anything for it.
The pointer value is exactly the address of an object of type int, the result is well-formed.
The pointer value is the address of an object other than the int type, the result is UB.
The pointer value is an invalid pointer value, the indirection is UB.
It means what the behavior of the indirection through the pointer ptr will be depends on the implementation. It is not definitely UB if the implementation takes option 2. So, I wonder whether this case is not definitely UB or is definitely UB? If it is latter, which provisions strongly state the behavior?
The standard has nothing more to say on it than what you quoted. The standard only guarantees the meaning of a integer-to-pointer cast if that integer value was taken from a pointer-to-integer cast. The meaning of all other integer-to-pointer conversions are implementation defined.
And that means everything about them is "implementation defined": what they result in and what using those results will do. After all, the pointer-to-integer-to-pointer round trip spells out that you get the "original value" back. The fact that the resulting pointer has the "original value" means that it will behave exactly like you had copied the original pointer itself. So the standard needs say nothing more on the matter.
The behavior of the pointer taken from an implementation-defined integer-to-pointer cast is... implementation-defined. If an implementation says that supports such conversions, it must spell out what the result of supported conversions are. That is, if there is some "a special integer value" for which a cast to an int* is supported by the implementation, it must say what the result of that cast is. This includes things like whether it pointer to an actual int or whatever.
Is performing indirection from a pointer acquired from converting an integer value definitely UB?
Not always. Here is an example that is definitely not UB:
int i = 42;
std::intptr_t value = reinterpret_cast<std::intptr_t>(&i);
int* ptr = reinterpret_cast<int*>(value);
int v = *ptr;
This is because converting a pointer to an integer of sufficient size and back to the same pointer type is guaranteed to yield the same pointer value as stated in the rule you quoted. Since the original pointer value was valid for indirection, so is the converted one.

Pointer cast into char* array

I'm fairly new to C++ and I'm having difficulty wrapping my head around what is going on in the final line of the below:
int numToSend = bs->GetSize();
char * tBuf = new char[NUM_LENGTH_BYTES + numToSend];
*(WORD*)tBuf = htons((WORD)numToSend);
So htons is returning a u_short or WORD, but the cast on tBuf is somewhat confusing to me. Is it something along the lines of "the value pointed to by tBuf is cast as a WORD pointer and assigned the return from htons"?
I believe this is a fairly unsafe operation in most cases, what would be the best practice here?
It may not be a recommended practice, but AFAIK, it is safe. It is true that in general, taking a pointer to P, casting it to a pointer to Q and using it as a pointer to Q leads to undefined behaviour. Here it looks even worse, because the alignment requirement of char are known to be the weakest possible.
But the char * tBuf pointer has been obtained through a new expression. Such a new expression internally rely on a allocation function to obtain storage, and draft n4296 for c++14 says in 3.7.4.1 Allocation functions [basic.stc.dynamic.allocation] §2:
The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall
return the address of the start of a block of storage whose length in bytes shall be at least as large as
the requested size... The pointer returned shall be suitably aligned so that it can be converted
to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used
to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call
to a corresponding deallocation function).
So this line *(WORD*)tBuf = htons((WORD)numToSend); only does perfectly defined operations:
convert numToSend from an integer type to an unsigned type, and 4.7 Integral conversions [conv.integral] says:
A prvalue of an integer type can be converted to a prvalue of another integer type...
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2n where n is the number of bits used to represent the unsigned type)
call htons with a WORD or uint16_t as parameter to return a uint16_t or WORD
converts a pointer obtained by new to a WORD * and uses that pointer to access the object in the storage allocated
Simply, the value of the first two bytes of the allocated array is now unspecified. More exactly it is the byte representation of the WORD in the particular implementation.
But it is still allowed to access the allocated array as a character array, even if the first bytes now contain a WORD, because it is explicitely allowed per the so called strict aliasing rule 3.10 Lvalues and rvalues [basic.lval] §10 :
If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:...
(10.8) — a char or unsigned char type.
If the tBuf pointer had not been obtained through a new expression, the only correct way would have been to do a memcpy:
WORD n_numToSend = htons(numToSend);
memcpy(tBuf, &n_numToSend, sizeof(WORD));
As this one is allowed for any pointer provided the storage is big enough, it is what I would call the recommended practice.

Can std::uintptr_t be used to avoid undefined behavior of out-of-bounds pointer arithmetic?

Now we know that doing out-of-bounds-pointer-arithmetic has undefined behavior as described in this SO question.
My question is: can we workaround such restriction by casting to std::uintptr_t for arithmetic operations and then cast back to pointer? is that guaranteed to work?
For example:
char a[5];
auto u = reinterpret_cast<std::uintptr_t>(a) - 1;
auto p = reinterpret_cast<char*>(u + 1); // OK?
The real world usage is for optimizing offsetted memory access -- instead of p[n + offset], I want to do offset_p[n].
EDIT To make the question more explicit:
Given a base pointer p of a char array, if p + n is a valid pointer, will reinterpret_cast<char*>(reinterpret_cast<std::uintptr_t>(p) + n) be guaranteed to yield the same valid pointer?
No, uintptr_t cannot be meaningfully used to avoid undefined behavior when performing pointer arithmetic.
For one thing, at least in C there is no guarantee that uintptr_t even exists. The requirement is that any value of type void* may be converted to uintptr_t and back again, yielding the original value without loss of information. In principle, there might not be any unsigned integer type wide enough to hold all pointer values. (I presume the same applies to C++, since C++ inherits most of the C standard library and defines it by reference to the C standard.)
Even if uintptr_t does exist, there is no guarantee that a given arithmetic operation on a uintptr_t value does the same thing as the corresponding operation on a pointer value.
For example, I've worked on systems (Cray vector systems, T90 and SV1) on which byte pointers are implemented in software. A native address is a 64-bit address that refers to a 64-bit word; there is no hardware support for byte addressing. A char* or void* pointer consists of a word pointer with a 3-bit offset stored in the otherwise unused high-order bits. Conversion between integers and pointers simply copies the bits. So incrementing a char* would advance it to point to the next 8-bit byte in memory; incrementing a uintptr_t obtained by converting a char* would advance it to point to the next 64-bit word.
That's just one example. More generally, conversions between pointers and integers are implementation-defined, and the language standard makes no guarantee about the semantics of those conversions (other than, in some cases, converting back to a pointer).
So yes, you can convert a pointer value to uintptr_t (if that type exists) and perform arithmetic on it without risking undefined behavior -- but the result may or may not be meaningful.
It happens that, on most systems, the mapping between pointers and integers is simpler, and you probably can get away with that kind of game. But you're better off using pointer arithmetic directly, and just being very careful to avoid any invalid operations.
Yes, that is legal, but you must reinterpret_cast exactly the same uintptr_t value back to char*.
(Therefore, what it you're intending to do is illegal; that is, converting a different value back to a pointer.)
5.2.10 Reinterpret cast
4 . A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is
implementation-defined.
5 . A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted
to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type
will have its original value;
(Note that there'd be no way, in general, for the compiler to know that you subtracted one and then added it back.)

Any type of pointer can point to anything?

Is this statement correct? Can any "TYPE" of pointer can point to any other type?
Because I believe so, still have doubts.
Why are pointers declared for definite types? E.g. int or char?
The one explanation I could get was: if an int type pointer was pointing to a char array, then when the pointer is incremented, the pointer will jump from 0 position to the 2 position, skipping 1 position in between (because int size=2).
And maybe because a pointer just holds the address of a value, not the value itself, i.e. the int or double.
Am I wrong? Was that statement correct?
Pointers may be interchangeable, but are not required to be.
In particular, on some platforms, certain types need to be aligned to certain byte-boundaries.
So while a char may be anywhere in memory, an int may need to be on a 4-byte boundary.
Another important potential difference is with function-pointers.
Pointers to functions may not be interchangeable with pointers to data-types on many platforms.
It bears repeating: This is platform-specific.
I believe Intel x86 architectures treat all pointers the same.
But you may well encounter other platforms where this is not true.
Every pointer is of some specific type. There's a special generic pointer type void* that can point to any object type, but you have to convert a void* to some specific pointer type before you can dereference it. (I'm ignoring function pointer types.)
You can convert a pointer value from one pointer type to another. In most cases, converting a pointer from foo* to bar* and back to foo* will yield the original value -- but that's not actually guaranteed in all cases.
You can cause a pointer of type foo* to point to an object of type bar, but (a) it's usually a bad idea, and (b) in some cases, it may not work (say, if the target types foo and bar have different sizes or alignment requirements).
You can get away with things like:
int n = 42;
char *p = (char*)&n;
which causes p to point to n -- but then *p doesn't give you the value of n, it gives you the value of the first byte of n as a char.
The differing behavior of pointer arithmetic is only part of the reason for having different pointer types. It's mostly about type safety. If you have a pointer of type int*, you can be reasonably sure (unless you've done something unsafe) that it actually points to an int object. And if you try to treat it as an object of a different type, the compiler will likely complain about it.
Basically, we have distinct pointer types for the same reasons we have other distinct types: so we can keep track of what kind of value is stored in each object, with help from the compiler.
(There have been languages that only have untyped generic pointers. In such a language, it's more difficult to avoid type errors, such as storing a value of one type and accidentally accessing it as if it were of another type.)
Any pointer can refer to any location in memory, so technically the statement is correct. With that said, you need to be careful when reinterpreting pointer types.
A pointer basically has two pieces of information: a memory location, and the type it expects to find there. The memory location could be anything. It could be the location where an object or value is stored; it could be in the middle of a string of text; or it could just be an arbitrary block of uninitialised memory.
The type information in a pointer is important though. The array and pointer arithmetic explanation in your question is correct -- if you try to iterate over data in memory using a pointer, then the type needs to be correct, otherwise you may not iterate correctly. This is because different types have different sizes, and may be aligned differently.
The type is also important in terms of how data is handled in your program. For example, if you have an int stored in memory, but you access it by dereferencing a float* pointer, then you'll probably get useless results (unless you've programmed it that way for a specific reason). This is because an int is stored in memory differently from the way a float is stored.
Can any "TYPE" of pointer can point to any other type?
Generally no. The types have to be related.
It is possible to use reinterpret_cast to cast a pointer from one type to another, but unless those pointers can be converted legally using a static_cast, the reinterpret_cast is invalid. Hence you can't do Foo* foo = ...; Bar* bar = (Bar*)foo; unless Foo and Bar are actually related.
You can also use reinterpret_cast to cast from an object pointer to a void* and vice versa, and in that sense a void* can point to anything -- but that's not what you seem to be asking about.
Further you can reinterpret_cast from object pointer to integral value and vice versa, but again, not what you appear to be asking.
Finally, a special exception is made for char*. You can initialize a char* variable with the address of any other type, and perform pointer math on the resulting pointer. You still can't dereference thru the pointer if the thing being pointed to isn't actually a char, but it can then be casted back to the actual type and used that way.
Also keep in mind that every time you use reinterpret_cast in any context, you are dancing on the precipice of a cliff. Dereferencing a pointer to a Foo when the thing it actually points to is a Bar yields Undefined Behavior when the types are not related. You would do well to avoid these types of casts at all costs.
Some pointers are more equal than others...
First of all, not all pointers are necessarily the same thing. Function pointers can be something very different from data pointers, for instance.
Aside: Function pointers on PPC
On the PPC platform, this was quite obvious: A function pointer was actually two pointers under the hood, so there was simply no way to meaningfully cast a function pointer to a data pointer or back. I.e. the following would hold:
int* dataP;
int (*functionP)(int);
assert(sizeof(dataP) == 4);
assert(sizeof(functionP) == 8);
assert(sizeof(dataP) != sizeof(functionP));
//impossible:
//dataP = (int*)functionP; //would loose information
//functionP = (int (*)(int))dataP; //part of the resulting pointer would be garbage
Alignment
Furthermore, there is problems with alignment: Depending on the platform some data types may need to be aligned in memory. This is especially common with vector data types, but could apply to any type larger than a byte. For instance, if an int must be 4 byte aligned, the following code might crash:
char a[4];
int* alias = (int*)a;
//int foo = *alias; //may crash because alias is not aligned properly
This is not an issue if the pointer comes from a malloc() call, as that is guaranteed to return sufficiently aligned pointers for all types:
char* a = malloc(sizeof(int));
int* alias = (int*)a;
*alias = 0; //perfectly legal, the pointer is aligned
Strict aliasing and type punning
Finally, there are strict aliasing rules: You must not access an object of one type through a pointer to another type. Type punning is forbidden:
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
//uint32_t bits = *(uint32_t*)&foo; //type punning is illegal
If you absolutely must reinterpret a bit pattern as another type, you must use memcpy():
assert(sizeof(float) == sizeof(uint32_t));
float foo = 42;
uint32_t bits;
memcpy(&bits, &foo, sizeof(bits)); //bit pattern reinterpretation is legal when copying the data
To allow memcpy() and friends to actually be implementable, the C/C++ language standards provide for an exception for char types: You can cast any pointer to a char*, copy the char data over to another buffer, and then access that other buffer as some other type. The results are implementation defined, but the standards allow it. Use cases are mostly general data manipulation routines like I/O, etc.
TL;DR:
Pointers are much less interchangeable than you think. Don't reinterpret pointers in any other way than to/from char* (check alignment in the "from" case). And even that does not work for function pointers.

How do C/C++ compilers handle type casting between types with different value ranges?

How do type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) k;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of casting from a possibly more narrow type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be stored by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
The value 10.123 is truncated to 10. Here, it does cause lost of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here. (see here). More importantly, what does happen if you cast to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note the semantics of that cast is the same as using reinterpret_cast in this case, since that is the only C++ cast being able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int uf1 = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpeler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.