Why does reinterpret_cast only work for 0? - c++

While studying the behavior of casts in C++, I discovered that reinterpret_cast-ing from float* to int* only works for 0:
float x = 0.0f;
printf("%i\n", *reinterpret_cast<int*>(&x));
prints 0, whereas
float x = 1.0f;
printf("%i\n", *reinterpret_cast<int*>(&x));
prints 1065353216.
Why? I expected the latter to print 1 just like static_cast would.

A reinterpret_cast says to reinterpret the bits of its operand. In your example, the operand, &x, is the address of a float, so it is a float *. reinterpret_cast<int *> asks to reinterpret these bits as an int *, so you get an int *. If pointers to int and pointers to float have the same representation in your C++ implementation, this may¹ work to give you an int * that points to the memory of the float.
However, the reinterpret_cast does not change the bits that are pointed to. They still have their same values. In the case of a float with value zero, the bits used to represent it are all zeros. When you access these through a dereferenced int *, they are read and interpreted as an int. Bits that are all zero represent an int value of zero.
In the case of a float with value one, the bits used to represent it in your C++ implementation are, using hexadecimal to show them, 3f800000₁₆. This is because the exponent field of the format is stored with an offset, so there are some non-zero bits to show the value of the exponent. (That is part of how the floating-point format is encoded. Conceptually, 1.0f is represented as +1.00000000000000000000000₂ • 2^0. Then the + sign and the bits after the “1.” are stored literally as zero bits. However, the exponent is stored by adding 127 and storing the result as an eight-bit integer. So an exponent of 0 is stored as 127. The represented value of the exponent is zero, but the bits that represent it are not zero.) When you access these bits through a dereferenced int *, they are read and interpreted as an int. These bits represent an int value of 1065353216 (which equals 3f800000₁₆).
Footnote
¹ The C++ standard does not guarantee this, and what actually happens is dependent on other factors.
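If the goal is just to look at the bits of the float, a well-defined alternative is to copy its object representation into an integer of the same size with std::memcpy. A minimal sketch; the printed value assumes an IEEE 754 float and a 32-bit int, which the C++ standard does not require:
#include <cstdio>
#include <cstring>

int main() {
    float x = 1.0f;
    int bits;
    static_assert(sizeof bits == sizeof x, "needs 32-bit int and float");
    std::memcpy(&bits, &x, sizeof bits);  // copy the object representation; no pointer reinterpretation
    std::printf("%i\n", bits);            // prints 1065353216 on IEEE 754 / 32-bit int systems
}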

In both cases the behaviour of the program is undefined, because you access an object through a glvalue that doesn't refer to an object of the same or a compatible type.
What you have observed is one possible behaviour. The behaviour could have been different, but there is no guarantee that it would have been, and it wasn't. Whether you expected one result or another is not guaranteed to have an effect on the behaviour.
I expected the latter to print 1 just like static_cast would.
It is unreasonable to expect reinterpret_cast to behave as static_cast would. They are wildly different and one can not be substituted for the other. Using static_cast to convert the pointers would make the program ill-formed.
reinterpret_cast should not be used unless one knows what it does and knows that its use is correct. The practical use cases are rare.
Here are a few examples that have well defined behaviour, and are guaranteed to print 1:
int i = x;
printf("%i\n", i);
printf("%i\n", static_cast<int>(x));
printf("%g\n", x);
printf("%.0f\n", x);
Given that we've concluded that behaviour is undefined, there is no need for further analysis.
But we can consider why the behaviour may have happened to be what we observed. It is however important to understand that these considerations will not be useful in controlling what the result will be while the behaviour is undefined.
The binary representations of the 32-bit IEEE 754 floating point numbers 1.0f and +0.0f happen to be:
0b00111111100000000000000000000000
0b00000000000000000000000000000000
These also happen to be the binary representations of the integers 1065353216 and 0. Is it a coincidence that the outputs of the programs were these specific integers, whose binary representations match the representations of the float values? It could be in theory, but it probably isn't.
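For illustration, those two bit patterns can be produced in a well-defined way, sketched here with C++20's std::bit_cast (the output assumes a 32-bit IEEE 754 float):
#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>

int main() {
    // print the object representation of each float as 32 bits
    std::cout << std::bitset<32>(std::bit_cast<std::uint32_t>(1.0f)) << '\n';
    std::cout << std::bitset<32>(std::bit_cast<std::uint32_t>(+0.0f)) << '\n';
}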

float has a different representation than int, so you cannot treat a float's representation as an int's. Doing so is undefined behaviour in C++.
It so happens that on modern architectures a 0-bit pattern represents the value 0 for any fundamental type (so one can memset a float or double, integer types, or pointer types with zeroes and get a 0-valued object; that is what the calloc function does). This is why that cast-and-dereference "works" for the 0 value, but it is still undefined behaviour. The C++ standard doesn't require a 0-bit pattern to represent a 0 floating-point value, nor does it require any specific representation of floating-point numbers.
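A tiny sketch of that observation; it relies on the platform convention described above (all-zero bits representing 0.0f), not on a guarantee from the C++ standard:
#include <cstdio>
#include <cstring>

int main() {
    float f;
    std::memset(&f, 0, sizeof f);  // write an all-zero bit pattern into the float
    std::printf("%g\n", f);        // prints 0 on typical (IEEE 754) platforms
}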
A conversion of float to int is implicit and no cast is required.
A solution:
float x = 1.0f;
int x2 = x;
printf("%i\n", x2);
// or
printf("%i\n", static_cast<int>(x));

Related

What happens when a pointer to a 32-bit integer is cast to a pointer to a float and dereferenced?

I saw this cast:
const std::uint32_t tmp = (some number);
float ToFloat = *(float*)(&tmp);
But when casting, the wrong number gets into the float. Moreover, if I use a static cast, then it is cast correctly.
Why is this happening?
How exactly does static_cast work?
*(float*)(&Tmp) means that we are trying to dereference the pointer to the float, which is located at the address &tmp. Is it right?
Why is this happening?
The program has undefined behavior because you read from an int through the eyes of a float. This is not allowed in C++.
How exactly does static_cast work?
In this case, it does the same conversion that float ToFloat = Tmp; would do. That is, it converts the int to float and assigns it to ToFloat.
*(float*)(&Tmp) means that we are trying to dereference the pointer to the float, which is located at the address &Tmp. Is it right?
There is no float at &Tmp, only an int. You however tell the compiler that the pointer is pointing at a float and dereference that pointer. The bit patterns of ints and floats are very different, so not only does it have undefined behavior, you are also very unlikely to get the correct result.
Storing Integers
Integers are stored in their binary (base 2) representation. int32_t occupies 32 bits.
Storing Floats
Most C++ compilers use a standard called IEEE 754 to store floats and doubles, which differs a lot from how integers are stored.
You can check how numbers are stored yourself using some sort of an IEEE-754 Floating Point Converter.
Example
Let us consider a simple example of number 5.
const uint32_t Tmp = 5;
Tmp would be stored as 0x00000005 in memory, therefore &Tmp would point to the same.
float ToFloat = *(float*)(&Tmp);
Casting it into a float pointer would treat 0x00000005 as being in IEEE 754 format, having the value 7.00649232162e-45.
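The same reinterpretation can be done in a well-defined way by copying the bytes with std::memcpy instead of dereferencing a cast pointer. A sketch; the printed value assumes a 32-bit IEEE 754 float:
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    const std::uint32_t Tmp = 5;
    float ToFloat;
    static_assert(sizeof ToFloat == sizeof Tmp, "float must be 32 bits wide");
    std::memcpy(&ToFloat, &Tmp, sizeof ToFloat);  // copies the bits; no value conversion
    std::printf("%g\n", ToFloat);                 // prints about 7.00649e-45 with IEEE 754
}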
static_cast
The static cast performs conversions between compatible types using safe implicit conversions.
Note that you are converting from int* to float* in your question, which is not a safe conversion. You can convert from int to float safely.
const uint32_t Tmp = 5;
float ToFloat = static_cast<float>(Tmp); // 5
More about static_cast conversion.
What this code attempts to do is to convert a 32-bit representation of a floating-point number into an actual floating-point value. An IEEE 754 binary floating-point value is (most of the time) a representation of a number of the form (−1)^s × m × 2^e. For a given floating-point format, a certain number of bits within a machine word is allocated for each of s (the sign bit), m (the mantissa or significand bits) and e (the exponent bits). Those bits can be manipulated directly within the bit representation in order to construct any possible floating-point value, bypassing the usual abstractions of ordinary arithmetic operations provided by the language. The converter mentioned in Aryan’s answer is a pretty good way to explore how this works.
Unfortunately, this code is not standard-conforming C++ (or, in fact, C), and that’s thanks to a little something called strict aliasing. In short: if a variable has been declared as being of integer type, it can only be accessed via pointers to that same integer type. Accessing it via a pointer to a different type is not allowed.
A more-or-less officially-sanctioned version would be:
#include <cstdint>
#include <cstring>

inline static float float_from_bits(std::uint32_t x) {
    float result;
    static_assert(sizeof(result) == sizeof(x),
                  "float must be 32 bits wide");
    std::memcpy(&result, &x, sizeof(result));
    return result;
}
Or, in C++20, simply std::bit_cast<float>(x). This is still not quite guaranteed to return expected results by the standard, but at least it doesn’t violate that particular rule.
A direct static_cast would instead convert the integer into a floating-point value denoting (approximately) the same quantity. This is not a direct bit conversion: performing it entails computing the appropriate values of s, m and e and putting them in the appropriate place according to the number format.
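To make the contrast concrete, here is a small sketch (C++20 for std::bit_cast; the printed values assume a 32-bit IEEE 754 float):
#include <bit>
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t u = 0x3f800000;                  // bit pattern of 1.0f in IEEE 754
    std::printf("%g\n", std::bit_cast<float>(u));  // reinterprets the bits: prints 1
    std::printf("%g\n", static_cast<float>(u));    // converts the value: prints about 1.06535e+09
}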

Why is the sign different after subtracting unsigned and signed?

unsigned int t = 10;
int d = 16;
float c = t - d;
int e = t - d;
Why is the value of c positive but e negative?
Let's start by analysing the result of t - d.
t is an unsigned int while d is an int, so to do arithmetic on them, the value of d is converted to an unsigned int (C++ rules say unsigned gets preference here). So we get 10u - 16u, which (assuming 32-bit int) wraps around to 4294967290u.
This value is then converted to float in the first declaration, and to int in the second one.
Assuming the typical implementation of float (32-bit single-precision IEEE), its highest representable value is roughly 1e38, so 4294967290u is well within that range. There will be rounding errors, but the conversion to float won't overflow.
For int, the situation's different. 4294967290u is too big to fit into an int, so wrap-around happens and we arrive back at the value -6. Note that such wrap-around is not guaranteed by the standard: the resulting value in this case is implementation-defined(1), which means it's up to the compiler what the result value is, but it must be documented.
(1) C++17 (N4659), [conv.integral] 7.8/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
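A quick way to see that intermediate value before any conversion to float or int (a sketch, assuming 32-bit unsigned int as above):
#include <cstdio>

int main() {
    unsigned int t = 10;
    int d = 16;
    std::printf("%u\n", t - d);  // the subtraction happens in unsigned int: prints 4294967290
}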
First, you have to understand "usual arithmetic conversions" (that link is for C, but the rules are the same in C++). In C++, if you do arithmetic with mixed types (you should avoid that when possible, by the way), there's a set of rules that decides which type the calculation is done in.
In your case, you are subtracting a signed int from an unsigned int. The promotion rules say that the actual calculation is done using unsigned int.
So your calculation is 10 - 16 in unsigned int arithmetic. Unsigned arithmetic is modulo arithmetic, meaning that it wraps around. So, assuming your typical 32-bit int, the result of this calculation is 2^32 - 6.
This is the same for both lines. Note that the subtraction is completely independent from the assignment; the type on the left side has absolutely no influence on how the calculation happens. It is a common beginner mistake to think that the type on the left side somehow influences the calculation; but float f = 5 / 6 is zero, because the division still uses integer arithmetic.
The difference, then, is what happens during the assignment. The result of the subtraction is implicitly converted to float in one case, and int in the other.
The conversion to float tries to find the closest value to the actual one that the type can represent. This will be some very large value; not quite the one the original subtraction yielded though.
The conversion to int says that if the value fits into the range of int, the value will be unchanged. But 2^32 - 6 is far larger than the 2^31 - 1 that a 32-bit int can hold, so you get the other part of the conversion rule, which says that the resulting value is implementation-defined. This is a term in the standard that means "different compilers can do different things, but they have to document what they do".
For all practical purposes, all compilers that you'll likely encounter say that the bit pattern stays the same and is just interpreted as signed. Because of the way 2's complement arithmetic works (the way that almost all computers represent negative numbers), the result is the -6 you would expect from the calculation.
But all this is a very long way of repeating the first point, which is "don't do mixed type arithmetic". Cast the types first, explicitly, to types that you know will do the right thing.
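For example, a sketch of that advice applied to the original snippet (the values here fit comfortably in int):
#include <cstdio>

int main() {
    unsigned int t = 10;
    int d = 16;
    int e = static_cast<int>(t) - d;                       // subtraction now happens in int arithmetic
    float c = static_cast<float>(static_cast<int>(t) - d); // convert the already-signed result
    std::printf("%d %g\n", e, c);                          // prints -6 -6
}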

Why cast a pointer to a float into a pointer to a long, then dereference?

I was going through this example which has a function outputting a hex bit pattern to represent an arbitrary float.
void ExamineFloat(float fValue)
{
    printf("%08lx\n", *(unsigned long *)&fValue);
}
Why take the address of fValue, cast to unsigned long pointer, then dereference? Isn't all that work just equivalent to a direct cast to unsigned long?
printf("%08lx\n", (unsigned long)fValue);
I tried it and the answer isn't the same, so confused.
(unsigned long)fValue
This converts the float value to an unsigned long value, according to the "usual arithmetic conversions".
*(unsigned long *)&fValue
The intention here is to take the address at which fValue is stored, pretend that there is not a float but an unsigned long at this address, and to then read that unsigned long. The purpose is to examine the bit pattern which is used to store the float in memory.
As shown, this causes undefined behavior though.
Reason: You may not access an object through a pointer to a type that is not "compatible" to the object's type. "Compatible" types are for example (unsigned) char and every other type, or structures that share the same initial members (speaking of C here). See §6.5/7 N1570 for the detailed (C11) list (Note that my use of "compatible" is different - more broad - than in the referenced text.)
Solution: Cast to unsigned char *, access the individual bytes of the object and assemble an unsigned long out of them:
unsigned long pattern = 0;
unsigned char * access = (unsigned char *)&fValue;
for (size_t i = 0; i < sizeof(float); ++i) {
    pattern <<= CHAR_BIT;  /* shift previous bytes up to make room */
    pattern |= *access;    /* append the byte read through the char pointer */
    ++access;
}
Note that (as @CodesInChaos pointed out) the above treats the floating point value as being stored with its most significant byte first ("big endian"). If your system uses a different byte order for floating point values, you'd need to adjust for that (or rearrange the bytes of the above unsigned long, whichever is more practical for you).
Floating-point values have memory representations: for example the bytes can represent a floating-point value using IEEE 754.
The first expression *(unsigned long *)&fValue will interpret these bytes as if they were the representation of an unsigned long value. In fact, in the C standard this results in undefined behavior (because of the so-called "strict aliasing rule"). In practice, there are issues such as endianness that have to be taken into account.
The second expression (unsigned long)fValue is C standard compliant. It has a precise meaning:
C11 (n1570), § 6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
*(unsigned long *)&fValue is not equivalent to a direct cast to an unsigned long.
The conversion to (unsigned long)fValue converts the value of fValue into an unsigned long, using the normal rules for conversion of a float value to an unsigned long value. The representation of that value in an unsigned long (for example, in terms of the bits) can be quite different from how that same value is represented in a float.
The conversion *(unsigned long *)&fValue formally has undefined behaviour. It interprets the memory occupied by fValue as if it is an unsigned long. Practically (i.e. this is what often happens, even though the behaviour is undefined) this will often yield a value quite different from fValue.
Typecasting in C does both a type conversion and a value conversion. The floating point → unsigned long conversion truncates the fractional portion of the floating point number and restricts the value to the possible range of an unsigned long. Converting from one type of pointer to another has no required change in value, so using the pointer typecast is a way to keep the same in-memory representation while changing the type associated with that representation.
In this case, it's a way to be able to output the binary representation of the floating point value.
As others have already noted, casting a pointer to a non-char type to a pointer to a different non-char type and then dereferencing is undefined behavior.
That printf("%08lx\n", *(unsigned long *)&fValue) invokes undefined behavior does not necessarily mean that running a program that attempts to perform such a travesty will result in hard drive erasure or make nasal demons erupt from one's nose (the two hallmarks of undefined behavior). On a computer in which sizeof(unsigned long)==sizeof(float) and on which both types have the same alignment requirements, that printf will almost certainly do what one expects it to do, which is to print the hex representation of the floating point value in question.
This shouldn't be surprising. The C standard openly invites implementations to extend the language. Many of these extensions are in areas that are, strictly speaking, undefined behavior. For example, the POSIX function dlsym returns a void*, but this function is typically used to find the address of a function rather than a global variable. This means the void pointer returned by dlsym needs to be cast to a function pointer and then dereferenced to call the function. This is obviously undefined behavior, but it nonetheless works on any POSIX compliant platform. This will not work on a Harvard architecture machine on which pointers to functions have different sizes than do pointers to data.
Similarly, casting a pointer to a float to a pointer to an unsigned integer and then dereferencing happens to work on almost any computer with almost any compiler in which the size and alignment requirements of that unsigned integer are the same as that of a float.
That said, using unsigned long might well get you into trouble. On my computer, an unsigned long is 64 bits long and has 64 bit alignment requirements. This is not compatible with a float. It would be better to use uint32_t -- on my computer, that is.
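A version of ExamineFloat along those lines (a sketch using std::uint32_t plus memcpy; the static_assert encodes the assumption that float is 32 bits wide):
#include <cstdint>
#include <cstdio>
#include <cstring>

void ExamineFloat(float fValue)
{
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof(float), "float must be 32 bits wide");
    std::memcpy(&bits, &fValue, sizeof bits);                  // well-defined read of the representation
    std::printf("%08lx\n", static_cast<unsigned long>(bits));  // e.g. 3f800000 for 1.0f on IEEE 754
}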
The union hack is one way around this mess:
typedef union {
    float fval;
    uint32_t ival;
} float_uint32_t;
Assigning to a float_uint32_t.fval and reading from a float_uint32_t.ival used to be undefined behavior. That is no longer the case in C. No compiler that I know of blows nasal demons for the union hack. This was not UB in C++; it was illegal. Until C++11, a compliant C++ compiler had to complain about it.
An even better way around this mess is to use the %a format, which has been part of the C standard since 1999:
printf ("%a\n", fValue);
This is simple, easy, portable, and there is no chance of undefined behavior. This prints the hexadecimal/binary representation of the double precision floating point value in question. Since printf is an archaic function, all float arguments are converted to double prior to the call to printf. This conversion must be exact per the 1999 version of the C standard. One can pick up that exact value via a call to scanf or its sisters.
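A short usage sketch; the exact spelling of the hexadecimal output (0x1p+0 versus, say, 0x8p-3) is up to the implementation:
#include <cstdio>

int main() {
    float fValue = 1.0f;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", fValue);  // e.g. "0x1p+0" on common implementations
    std::printf("%s\n", buf);

    double back = 0.0;
    std::sscanf(buf, "%la", &back);                // %la reads the value back into a double
    std::printf("%d\n", back == fValue);           // prints 1: the round trip is exact
}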

Implicit conversion in C++ between main and function

I have the below simple program:
#include <iostream>
#include <stdio.h>
void SomeFunction(int a)
{
    std::cout<<"Value in function: a = "<<a<<std::endl;
}
int main(){
    size_t a(0);
    std::cout<<"Value in main: "<<a-1<<std::endl;
    SomeFunction(a-1);
    return 0;
}
Upon executing this I get:
Value in main: 18446744073709551615
Value in function: a = -1
I think I roughly understand why the function gets the 'correct' value of -1: there is an implicit conversion from the unsigned type to the signed one, i.e. 18446744073709551615 (unsigned) = -1 (signed).
Is there any situation where the function will not get the 'correct' value?
Since size_t type is unsigned, subtracting 1 is well defined:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
However, the resultant value of 2^64-1 is out of int's range, so you get implementation-defined behavior:
[when] the new type is signed and the value cannot be represented in it, either the result is implementation-defined or an implementation-defined signal is raised.
Therefore, the answer to your question is "yes": there are platforms where the value of a would be different; there are also platforms where instead of calling SomeFunction the program will raise a signal.
Not on your computer... but technically yes, there is a situation where things can go wrong.
All modern PCs use the "two's complement" system for signed integer arithmetic (read Wikipedia for details). Two's complement has many advantages, but one of the biggest is this: unsaturated addition and subtraction of signed integers is identical to that of unsigned integers. As long as overflow/underflow causes the result to "wrap around" (i.e., 0-1 = UINT_MAX), the computer can add and subtract without even knowing whether you're interpreting the numbers as signed or unsigned.
BUT! C/C++ do not technically require two's complement for signed integers. There are two other permissible systems, known as "sign-magnitude" and "one's complement". These are unusual systems, never found outside antique architectures and embedded processors (and rarely even there). But in those systems, signed and unsigned arithmetic do not match up, and (signed)(a+b) will not necessarily equal (signed)a + (signed) b.
There's another, more mundane caveat when you're also narrowing types, as is the case between size_t and int on x64, because C/C++ don't require compilers to follow a particular rule when narrowing out-of-range values to signed types. This is likewise more a matter of language lawyering than actual unsafeness, though: VC++, GCC, Clang, and all other compilers I'm aware of narrow through truncation, leading to the expected behavior.
It's easier to compare and contrast signed and unsigned variants of the same basic type, say signed int and unsigned int.
On a system that uses 32 bits for int, the range of unsigned int is [0, 4294967295] and the range of signed int is [-2147483648, 2147483647].
Say you have a variable of type unsigned int and its value is greater than 2147483647. If you pass such a variable to SomeFunction, you will see an incorrect value in the function.
Conversely, say you have a variable of type signed int and its value is less than zero. If you pass such a variable to a function that expects an unsigned int, you will see an incorrect value in the function.
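A sketch of both directions; the 'incorrect' values shown assume 32-bit int and the usual two's complement reinterpretation, which the standard leaves implementation-defined for the unsigned-to-signed case:
#include <cstdio>

void TakesInt(int a) { std::printf("int: %d\n", a); }
void TakesUnsigned(unsigned int a) { std::printf("unsigned: %u\n", a); }

int main() {
    unsigned int big = 3000000000u;  // larger than 2147483647
    int negative = -5;
    TakesInt(big);           // typically prints int: -1294967296
    TakesUnsigned(negative); // prints unsigned: 4294967291 (well-defined modulo 2^32)
}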

Aliasing of otherwise equivalent signed and unsigned types

The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Additionally, what about compatible but not equivalent integer types?
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
If an unsigned int is read through an int*, the value must be
within the range of int or an integer overflow occurs and the
behaviour is undefined. Is this correct?
Why would it be undefined? There is no integer overflow since no conversion or computation is done. We take the object representation of an unsigned int object and see it through an int. In what way the value of the unsigned int object transposes to the value of an int is completely implementation-defined.
If an int is read through an unsigned int*, negative values wrap
around as if they were casted to unsigned int. Is this correct?
Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude though - a cast from int to unsigned is always defined through a congruence:
If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo
2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this
conversion is conceptual and there is no change in the bit pattern (if
there is no truncation). — end note ]
And now consider
10000000 00000001 // -1 in signed magnitude for 16-bit int
This would certainly be 2^15+1 if interpreted as an unsigned. A cast would yield 2^16-1 though.
If the value is within the range of both int and unsigned int,
accessing it through a pointer of either type is fully defined and
gives the same value. Is this correct?
Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.
On systems where int and long have the same range, alignment,
etc., can int* and long* alias? (I assume not.)
No. They are independent types.
Can char16_t* and uint_least16_t* alias?
Technically not, but that seems to be an unnecessary restriction of the standard.
Types char16_t and char32_t denote distinct types with the same
size, signedness, and alignment as uint_least16_t and
uint_least32_t, respectively, in <cstdint>, called the underlying
types.
So it should be practically possible without any risks (since there shouldn't be any padding).
If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:
int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;
Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. GCC documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. In such an implementation (assuming no padding bits and no trap representations), conversion of an integer value and type-punning of that value can yield different results. As an example, assume sign-magnitude and n being initialized to -1:
int n = -1; /* 10000000 00000001 assuming 16-bit integers*/
unsigned u1 = (unsigned)n; /* 11111111 11111111
effectively 2's complement, UINT_MAX */
unsigned u2 = *(unsigned *)&n; /* 10000000 00000001
only reinterpreted, the value is now INT_MAX + 2u */
Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Not exactly. The bit pattern must represent a valid value in the new type; it doesn't matter whether the old value is representable. From C11 (n1570) [omitted footnotes]:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
E.g., an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit, so something like unsigned u = ...; int n = *(int *)&u; may result in a trap representation on such a system (reading of which is undefined behaviour), but not the other way round.
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
I think, the standard would allow for one of the types to have a padding bit, which is always ignored (thus, two different bit patterns can represent the same value and that bit may be set on initialization) but be an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
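For the common case without padding bits, a small sketch of the in-range access described in the question (the aliasing itself is permitted because int and unsigned int are corresponding signed/unsigned types):
#include <cstdio>

int main() {
    unsigned int u = 42;     // in range of both int and unsigned int
    int n = *(int *)&u;      // allowed by the aliasing rules for signed/unsigned pairs
    std::printf("%d\n", n);  // prints 42 (assuming no exotic padding bits, as discussed above)
}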
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:
int n = 42;
long l = *(long *)&n; // UB
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
I'm not sure about C++, but at least for C, char16_t is a typedef, but not necessarily for uint_least16_t; it could very well be a typedef of some implementation-specific __char16_t, some type incompatible with uint_least16_t (or any other type).
It is not defined what happens, since the C standard does not exactly define how signed integers should be stored, so you cannot rely on the internal representation. Also, no overflow occurs: if you just typecast a pointer, nothing happens other than a different interpretation of the binary data in the following calculations.
Edit
Oh, I misread the phrase "but not equivalent integer types", but I'll keep the paragraph for your interest:
Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses, where the data has to lie on a multiple of the type's width. If you read an int32 from an address that is not divisible by 4 (because you cast a 2-byte int pointer), your CPU may crash.
You should not rely on the sizes of types. If you choose another compiler or platform, your long and int may not match anymore.
Conclusion:
Do not do this. You wrote highly platform dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.
Concerning your questions regarding unsigned int* and int*: if the value in the actual type doesn't fit in the type you're reading, the behavior is undefined, simply because the standard neglects to define any behavior in this case, and any time the standard fails to define behavior, the behavior is undefined. In practice, you'll almost always obtain a value (no signals or anything), but the value will vary depending on the machine: a machine with signed magnitude or 1's complement, for example, will result in different values (both ways) from the usual 2's complement.
For the rest, int and long are different types, regardless of their representations, and int* and long* cannot alias. Similarly, as you say, char16_t is a distinct type in C++, but a typedef in C (so the rules concerning aliasing are different).