Conversion between short* to int*

Conversion between short* to int* - c++

Assuming short is 2 bytes and int is 4 bytes on a 32 bit OS. Is the following an undefined behavior?
short s = 42;
int *p = (int*)(&s);

No, the code that you have posted does not exhibit undefined behavior but attempting to read *p would. Also, depending on the alignment requirements of int and short, the result of the cast may be unspecified and irreversable (see 5.2.10 [expr.reinterpret.cast] / 7).
See ISO/IEC 14882:2011 3.10 [basic.lval] / 10:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
The object that you are trying to access is a short and *p is a glvalue of type int which doesn't meet any of the above descriptions.

Your code is directly in the realm of UB, as you are reading two uninitialized bytes.
However, the opposite,
int b
short* f = (short *) &b
will probably work due to the semantics of the little endian architecture.
(This is all assuming the compiler doesn't do anything stupid)
From wikipedia:
The little-endian system has the property that the same value can be
read from memory at different lengths without using different
addresses (even when alignment restrictions are imposed). For example,
a 32-bit memory location with content 4A 00 00 00 can be read at the
same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit
(00004A), or 32-bit (0000004A), all of which retain the same numeric
value. Although this little-endian property is rarely used directly by
high-level programmers, it is often employed by code optimizers as
well as by assembly language programmers.
So, as long as you are little endian, the opposite direction should be fine.
Still undefined behavior though.

Related

ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

I recently discovered about the vreinterpret{q}_dsttype_srctype casting operator. However this doesn't seem to support conversion in the data type described at this link (bottom of the page):
Some intrinsics use an array of vector types of the form:
<type><size>x<number of lanes>x<length of array>_t
These types are treated as ordinary C structures containing a single
element named val.
An example structure definition is:
struct int16x4x2_t
{
int16x4_t val[2];
};
Do you know how to convert from uint8x16_t to uint8x8x2_t?
Note that that the problem cannot be reliably addressed using union (reading from inactive members leads to undefined behaviour Edit: That's only the case for C++, while it turns out that C allows type punning), nor by using pointers to cast (breaks the strict aliasing rule).

It's completely legal in C++ to type pun via pointer casting, as long as you're only doing it to char*. This, not coincidentally, is what memcpy is defined as working on (technically unsigned char* which is good enough).
Kindly observe the following passage:
For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char.
42 If the content of the array of char or unsigned char is copied back
into the object, the object shall subsequently hold its original
value. [Example:
#define N sizeof(T)
char buf[N];
T obj;
// obj initialized to its original value
std::memcpy(buf, &obj, N);
// between these two calls to std::memcpy,
// obj might be modified
std::memcpy(&obj, buf, N);
// at this point, each subobject of obj of scalar type
// holds its original value
— end example ]
Put simply, copying like this is the intended function of std::memcpy. As long as the types you're dealing with meet the necessary triviality requirements, it's totally legit.
Strict aliasing does not include char* or unsigned char*- you are free to alias any type with these.
Note that for unsigned ints specifically, you have some very explicit leeway here. The C++ Standard requires that they meet the requirements of the C Standard. The C Standard mandates the format. The only way that trap representations or anything like that can be involved is if your implementation has any padding bits, but ARM does not have any- 8bit bytes, 8bit and 16bit integers. So for unsigned integers on implementations with zero padding bits, any byte is a valid unsigned integer.
For unsigned integer types other than unsigned char, the bits
of the object representation shall be divided into two groups:
value bits and padding bits (there need not be any of the
latter). If there are N value bits, each bit shall represent
a different power of 2 between 1 and 2N−1, so that objects
of that type shall be capable of representing values from 0
to 2N−1 using a pure binary representation; this shall be
known as the value representation. The values of any padding bits are
unspecified.

Based on your comments, it seems you want to perform a bona fide conversion -- that is, to produce a distinct, new, separate value of a different type. This is a very different thing than a reinterpretation, such as the lead-in to your question suggests you wanted. In particular, you posit variables declared like this:
uint8x16_t a;
uint8x8x2_t b;
// code to set the value of a ...
and you want to know how to set the value of b so that it is in some sense equivalent to the value of a.
Speaking to the C language:
The strict aliasing rule (C2011 6.5/7) says,
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object, [...]
an aggregate or union type that includes one of the aforementioned types among its members [...], or
a character type.
(Emphasis added. Other enumerated options involve differently-qualified and differently-signed versions of the of the effective type of the object or compatible types; these are not relevant here.)
Note that these provisions never interfere with accessing a's value, including the member value, via variable a, and similarly for b. But don't overlook overlook the usage of the term "effective type" -- this is where things can get bolluxed up under slightly different circumstances. More on that later.
Using a union
C certainly permits you to perform a conversion via an intermediate union, or you could rely on b being a union member in the first place so as to remove the "intermediate" part:
union {
uint8x16_t x1;
uint8x8_2_t x2;
} temp;
temp.x1 = a;
b = temp.x2;
Using a typecast pointer (to produce UB)
However, although it's not so uncommon to see it, C does not permit you to type-pun via a pointer:
// UNDEFINED BEHAVIOR - strict-aliasing violation
b = *(uint8x8x2_t *)&a;
// DON'T DO THAT
There, you are accessing the value of a, whose effective type is uint8x16_t, via an lvalue of type uint8x8x2_t. Note that it is not the cast that is forbidden, nor even, I'd argue, the dereferencing -- it is reading the dereferenced value so as to apply the side effect of the = operator.
Using memcpy()
Now, what about memcpy()? This is where it gets interesting. C permits the stored values of a and b to be accessed via lvalues of character type, and although its arguments are declared to have type void *, this is the only plausible interpretation of how memcpy() works. Certainly its description characterizes it as copying characters. There is therefore nothing wrong with performing a
memcpy(&b, &a, sizeof a);
Having done so, you may freely access the value of b via variable b, as already mentioned. There are aspects of doing so that could be problematic in a more general context, but there's no UB here.
However, contrast this with the superficially similar situation in which you want to put the converted value into dynamically-allocated space:
uint8x8x2_t *c = malloc(sizeof(*c));
memcpy(c, &a, sizeof a);
What could be wrong with that? Nothing is wrong with it, as far as it goes, but here you have UB if you afterward you try to access the value of *c. Why? because the memory to which c points does not have a declared type, therefore its effective type is the effective type of whatever was last stored in it (if that has an effective type), including if that value was copied into it via memcpy() (C2011 6.5/6). As a result, the object to which c points has effective type uint8x16_t after the copy, whereas the expression *c has type uint8x8x2_t; the strict aliasing rule says that accessing that object via that lvalue produces UB.

So there are a bunch of gotchas here. This reflects C++.
First you can convert trivially copyable data to char* or unsigned char* or c++17 std::byte*, then copy it from one location to another. The result is defined behavior. The values of the bytes are unspecified.
If you do this from a value of one one type to another via something like memcpy, this can result in undefined behaviour upon access of the target type unless the target type has valid values for all byte representations, or if the layout of the two types is specified by your compiler.
There is the possibility of "trap representations" in the target type -- byte combinations that result in machine exceptions or something similar if interpreted as a value of that type. Imagine a system that doesn't use IEEE floats and where doing math on NaN or INF or the like causes a segfault.
There are also alignment concerns.
In C, I believe that type punning via unions is legal, with similar qualifications.
Finally, note that under a strict reading of the c++ standard, foo* pf = (foo*)malloc(sizeof(foo)); is not a pointer to a foo even if foo was plain old data. You must create an object before interacting with it, and the only way to create an object outside of automatic storage is via new or placement new. This means you must have data of the target type before you memcpy into it.

Do you know how to convert from uint8x16_t to uint8x8x2_t?
uint8x16_t input = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
uint8x8x2_t output = { vget_low_u8(input), vget_high_u8(input) };
One must understand that with neon intrinsics, uint8x16_t represents a 16-byte register; while uint8x8x2_t represents two adjacent 8-byte registers. For ARMv7 these may be the same thing (q0 == {d0, d1}) but for ARMv8 the register layout is different. It's necessary to get (extract) the low 8 bytes and the high 8 bytes of the single 16-byte register using two functions. The clang compiler will determine which instruction(s) are necessary based on the context.

Using memcpy to copy an int into a char array and then printing its members: undefined behaviour?

Consider the following code:
int i = 1;
char c[sizeof (i)];
memcpy(c, &i, sizeof (i));
cout << static_cast<int>(c[0]);
Please ignore whether this is good code. I know the output depends on the endianness of the system. This is only an academic question.
Is this code:
Undefined behaviour
Implementation-defined behaviour
Well-defined behaviour
Something else

The language does not say that doing this is immediately undefined behavior. It simply says that the representation of c[0] might end up being invalid (trap) representation, in which case the behavior is indeed undefined. But in cases when c[0] is not a trap representation, the behavior is implementation-defined.
If you use unsigned char array, trap representation becomes impossible and behavior becomes purely implementation-defined.

The rule you are looking for is 3.9p4:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that
hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
So if you use unsigned char, you do get implementation-defined behavior (any conforming implementation must give you a guarantee on what that behavior is).
Reading through char is also legal, but then the values are unspecified. You are however guaranteed that using unqualified char will preserve the value (therefore bare char cannot have trap representations or padding bits), according to 3.9p2:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
("unspecified" values are a bit weaker than "implementation-defined" values -- the semantics are the same but the platform is not required to document what the values are.)

It is clearly implementation defined behaviour.
The internal representation of an int is not defined by the standard (implementations can choose little or big endian or whatever else), so it cannot be well defined behaviour : the result is allowed to be different on different architectures.
On a defined system (architecture and C compiler and (eventually) configuration) the behaviour is perfectly determined : on a big endian, you will get a 1, on a little endian a 0. So it is implementation defined behaviour.

Form 16 bit words from a struct

So I'm working on creating a ICMPv4 echo request and decided to roll my own struct to hold the packet. To make identifying the packet easy to identify in wireshark, I decided to put abcde into the data field.
struct icmpPacket{
u_int8_t icmp_type:8, icmp_code:8;
u_int16_t icmp_checksum:16, icmp_id:16, icmp_seqnum:16;
char icmp_data[6]; //cheat a little bit, set the field just large enough to store "abcde";
} __attribute__((aligned (16))) icmppckt; // icmp has an 8 byte header + 6 bytes of data
What I'm getting stuck on is how to make the compiler read the struct out as a series of 16 bit word

The standard-compliant way to do this is via memcpy:
icmpPacket packet = { /* ... */ };
uint16_t buf[sizeof(icmpPacket) / sizeof(uint16_t)];
memcpy(buf, &packet, sizeof(icmpPacket));
/* Now use buf */
Modern compilers are clever enough to optimize this appropriately, without actually doing a function call. See examples with clang and g++).
A common compiler extension allows you to use unions, though this is undefined behavior under the C++ standard:
union packet_view{
icmpPacket packet;
uint16_t buf[sizeof(icmpPacket) / sizeof(uint16_t)];
};
icmpPacket packet = { /* ... */ };
packet_view view;
view.packet = packet;
/* Now read from view.buf. This is technically UB in C++ but most compilers define it. */
Using a reinterpret_cast<uint16_t*>(&packet) or its C equivalent would break strict aliasing rules and result in undefined behavior. §3.10 [basic.lval]/p10 of the C++ standard:
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including,
recursively, an element or non-static data member of a subaggregate or
contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Similarly, §6.5/p7 of C11 says:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.

you can use 16 bit pointers for that
but yout need to add aligning to 1 Byte of the structure elements !!!
in C++ you can do it like this:
#pragma pack(1)
struct icmpPacket
{
u_int8_t icmp_type:8, icmp_code:8;
u_int16_t icmp_checksum:16, icmp_id:16, icmp_seqnum:16;
char icmp_data[6]; //cheat a little bit, set the field just large enough to store "abcde";
} icmppckt; // icmp has an 8 byte header + 6 bytes of data
WORD *picmppckt16=(WORD*)((void*)&icmppckt);
#pragma pack()
change WORD to 16 bit data type your compiler knows ...

When and how is conversion to char pointer allowed?

We can look at the representation of an object of type T by converting a T* that points at that object into a char*. At least in practice:
int x = 511;
unsigned char* cp = (unsigned char*)&x;
std::cout << std::hex << std::setfill('0');
for (int i = 0; i < sizeof(int); i++) {
std::cout << std::setw(2) << (int)cp[i] << ' ';
}
This outputs the representation of 511 on my system: ff 01 00 00.
There is (surely) some implementation defined behaviour occurring here. Which of the casts is allowing me to convert an int* to an unsigned char* and which conversions does that cast entail? Am I invoking undefined behaviour as soon as I cast? Can I cast any T* type like this? What can I rely on when doing this?

Which of the casts is allowing me to convert an int* to an unsigned char*?
That C-style cast in this case is the same as reinterpret_cast<unsigned char*>.
Can I cast any T* type like this?
Yes and no. The yes part: You can safely cast any pointer type to a char* or unsigned char* (with the appropriate const and/or volatile qualifiers). The result is implementation-defined, but it is legal.
The no part: The standard explicitly allows char* and unsigned char* as the target type. However, you cannot (for example) safely cast a double* to an int*. Do this and you've crossed the boundary from implementation-defined behavior to undefined behavior. It violates the strict aliasing rule.

Your cast maps to:
unsigned char* cp = reinterpret_cast<unsigned char*>(&x);
The underlying representation of an int is implementation defined, and viewing it as characters allows you to examine that. In your case, it is 32-bit little endian.
There is nothing special here -- this method of examining the internal representation is valid for any data type.
C++03 5.2.10.7: A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
This suggests that the cast results in unspecified behavior. But pragmatically speaking, casting from any pointer type to char* will always allow you to examine (and modify) the internal representation of the referenced object.

The C-style cast in this case is equivalent to reinterpret_cast. The Standard describes the semantics in 5.2.10. Specifically, in paragraph 7:
"A pointer to an object can be explicitly converted to a pointer to a
different object type.70 When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cvT2”, the result is
static_cast<cvT2*>(static_cast<cvvoid*>(v)) if both T1 and T2 are
standard-layout types (3.9) and the alignment requirements of T2 are
no stricter than those of T1. Converting a prvalue of type “pointer to
T1” to the type “pointer to T2” (where T1 and T2 are object types and
where the alignment requirements of T2 are no stricter than those of
T1) and back to its original type yields the original pointer value.
The result of any other such pointer conversion is unspecified."
What it means in your case, the alignment requirements are satisfied, and the result is unspecified.

The implementation behaviour in your example is the endianness attribute of your system, in this case your CPU is a little endian.
About the type casting, when you cast an int* to char* all what you are doing is telling the compiler to interpret what cp is pointing to as a char, so it will read the first byte only and interpret it as a character.

The cast between pointers are themselves always possible since all pointers are nothing more than memory addresses and whatever type, in memory, can always be thought as a sequence of bytes.
But -of course- the way the sequence is formed depends on how the decomposed type is represented in memory, and that's out of the scope of the C++ specifications.
That said, unless of very pathological cases, you can expect that representation to be the same on all the code produced by a same compiler for all the machines of a same platform (or family), and you should not expect same results on different platforms.
In general one thing to avoid is to express the relation between type sizes as "predefined":
in your sample you assume sizeof(int) == 4*sizeof(char): that's not necessarily always true.
But it is always true that sizeof(T) = N*sizeof(char), hence whatever T can always be seen as a integer number of char-s

Unless you have a cast operator, then a cast is simply telling to "see" that memory area in a different way. Nothing really fancy, I would say.
Then, you are reading the memory area byte-by-byte; as long as you do not change it, it is just fine. Of course, the result of what you see depends a lot from the platform: think about endianness, word size, padding, and so on.

Just reverse the byte order then it becomes
00 00 01 ff
Which is 256 (01) + 255 (ff) = 511
This is because your platfom is little endian.

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Note: This question has been renamed and reduced to make it more focused and readable. Most of the comments refer to the old text.
According to the standard, objects of different type may not share the same memory location. So this would not be legal:
std::array<short, 4> shorts;
int* i = reinterpret_cast<int*>(shorts.data()); // Not OK
The standard, however, allows an exception to this rule: any object may be accessed through a pointer to char or unsigned char:
int i = 0;
char * c = reinterpret_cast<char*>(&i); // OK
However, it is not clear to me whether this is also allowed the other way around. For example:
char * c = read_socket(...);
unsigned * u = reinterpret_cast<unsigned*>(c); // huh?

Some of your code is questionable due to the pointer conversions involved. Keep in mind that in those instances reinterpret_cast<T*>(e) has the semantics of static_cast<T*>(static_cast<void*>(e)) because the types that are involved are standard-layout. (I would in fact recommend that you always use static_cast via cv void* when dealing with storage.)
A close reading of the Standard suggests that during a pointer conversion to or from T* it is assumed that there really is an actual object T* involved -- which is hard to fulfill in some of your snippet, even when 'cheating' thanks to the triviality of types involved (more on this later). That would be besides the point however because...
Aliasing is not about pointer conversions. This is the C++11 text that outlines the rules that are commonly referred to as 'strict aliasing' rules, from 3.10 Lvalues and rvalues [basic.lval]:
10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
(This is paragraph 15 of the same clause and subclause in C++03, with some minor changes in the text with e.g. 'lvalue' being used instead of 'glvalue' since the latter is a C++11 notion.)
In the light of those rules, let's assume that an implementation provides us with magic_cast<T*>(p) which 'somehow' converts a pointer to another pointer type. Normally this would be reinterpret_cast, which yields unspecified results in some cases, but as I've explained before this is not so for pointers to standard-layout types. Then it's plainly true that all of your snippets are correct (substituting reinterpret_cast with magic_cast), because no glvalues are involved whatsoever with the results of magic_cast.
Here is a snippet that appears to incorrectly use magic_cast, but which I will argue is correct:
// assume constexpr max
constexpr auto alignment = max(alignof(int), alignof(short));
alignas(alignment) char c[sizeof(int)];
// I'm assuming here that the OP really meant to use &c and not c
// this is, however, inconsequential
auto p = magic_cast<int*>(&c);
*p = 42;
*magic_cast<short*>(p) = 42;
To justify my reasoning, assume this superficially different snippet:
// alignment same as before
alignas(alignment) char c[sizeof(int)];
auto p = magic_cast<int*>(&c);
// end lifetime of c
c.~decltype(c)();
// reuse storage to construct new int object
new (&c) int;
*p = 42;
auto q = magic_cast<short*>(p);
// end lifetime of int object
p->~decltype(0)();
// reuse storage again
new (p) short;
*q = 42;
This snippet is carefully constructed. In particular, in new (&c) int; I'm allowed to use &c even though c was destroyed due to the rules laid out in paragraph 5 of 3.8 Object lifetime [basic.life]. Paragraph 6 of same gives very similar rules to references to storage, and paragraph 7 explains what happens to variables, pointers and references that used to refer to an object once its storage is reused -- I will refer collectively to those as 3.8/5-7.
In this instance &c is (implicitly) converted to void*, which is one of the correct use of a pointer to storage that has not been yet reused. Similarly p is obtained from &c before the new int is constructed. Its definition could perhaps be moved to after the destruction of c, depending on how deep the implementation magic is, but certainly not after the int construction: paragraph 7 would apply and this is not one of the allowed situations. The construction of the short object also relies on p becoming a pointer to storage.
Now, because int and short are trivial types, I don't have to use the explicit calls to destructors. I don't need the explicit calls to the constructors, either (that is to say, the calls to the usual, Standard placement new declared in <new>). From 3.8 Object lifetime [basic.life]:
1 [...] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
This means that I can rewrite the code such that, after folding the intermediate variable q, I end up with the original snippet.
Do note that p cannot be folded away. That is to say, the following is defintively incorrect:
alignas(alignment) char c[sizeof(int)];
*magic_cast<int*>(&c) = 42;
*magic_cast<short*>(&c) = 42;
If we assume that an int object is (trivially) constructed with the second line, then that must mean &c becomes a pointer to storage that has been reused. Thus the third line is incorrect -- although due to 3.8/5-7 and not due to aliasing rules strictly speaking.
If we don't assume that, then the second line is a violation of aliasing rules: we're reading what is actually a char c[sizeof(int)] object through a glvalue of type int, which is not one of the allowed exception. By comparison, *magic_cast<unsigned char>(&c) = 42; would be fine (we would assume a short object is trivially constructed on the third line).
Just like Alf, I would also recommend that you explicitly make use of the Standard placement new when using storage. Skipping destruction for trivial types is fine, but when encountering *some_magic_pointer = foo; you're very much likely facing either a violation of 3.8/5-7 (no matter how magically that pointer was obtained) or of the aliasing rules. This means storing the result of the new expression, too, since you most likely can't reuse the magic pointer once your object is constructed -- due to 3.8/5-7 again.
Reading the bytes of an object (this means using char or unsigned char) is fine however, and you don't even to use reinterpret_cast or anything magic at all. static_cast via cv void* is arguably fine for the job (although I do feel like the Standard could use some better wording there).

This too:
// valid: char -> type
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
That is not correct. The aliasing rules state under which circumstances it is legal/illegal to access an object through an lvalue of a different type. There is an specific rule that says that you can access any object through a pointer of type char or unsigned char, so the first case is correct. That is, A => B does not necessarily mean B => A. You can access an int through a pointer to char, but you cannot access a char through a pointer to int.
For the benefit of Alf:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.

Regarding the validity of …
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
The reinterpret_cast itself is OK or not, in the sense of producing a useful pointer value, depending on the compiler. And in this example the result isn't used, in particular, the character array isn't accessed. So there is not much more that can be said about the example as-is: it just depends.
But let's consider an extended version that does touch on the aliasing rules:
void foo( char* );
alignas(int) char c[sizeof( int )];
foo( c );
int* p = reinterpret_cast<int*>( c );
cout << *p << endl;
And let's only consider the case where the compiler guarantees a useful pointer value, one that would place the pointee in the same bytes of memory (the reason that this depends on the compiler is that the standard, in §5.2.10/7, only guarantees it for pointer conversions where the types are alignment-compatible, and otherwise leave it as "unspecified" (but then, the whole of §5.2.10 is somewhat inconsistent with §9.2/18).
Now, one interpretation of the standard's §3.10/10, the so called "strict aliasing" clause (but note that the standard does not ever use the term "strict aliasing"),
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
is that, as it itself says, concerns the dynamic type of the object residing in the c bytes.
With that interpretation, the read operation on *p is OK if foo has placed an int object there, and otherwise not. So in this case, a char array is accessed via an int* pointer. And nobody is in any doubt that the other way is valid: even though foo may have placed an int object in those bytes, you can freely access that object as a sequence of char values, by the last dash of §3.10/10.
So with this (usual) interpretation, after foo has placed an int there, we can access it as char objects, so at least one char object exists within the memory region named c; and we can access it as int, so at least that one int exists there also; and so David’s assertion in another answer that char objects cannot be accessed as int, is incompatible with this usual interpretation.
David's assertion is also incompatible with the most common use of placement new.
Regarding what other possible interpretations there are, that perhaps could be compatible with David's assertion, well, I can't think of any that make sense.
So in conclusion, as far as the Holy Standard is concerned, merely casting oneself a T* pointer to the array is practically useful or not depending on the compiler, and accessing the pointed to could-be-value is valid or not depending on what's present. In particular, think of a trap representation of int: you would not want that blowing up on you, if the bitpattern happened to be that. So to be safe you have to know what's in there, the bits, and as the call to foo above illustrates the compiler can in general not know that, like, the g++ compiler's strict alignment-based optimizer can in general not know that…

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Conversion between short* to int* - c++

Assuming short is 2 bytes and int is 4 bytes on a 32 bit OS. Is the following an undefined behavior? short s = 42; int p = (int)(&s);

Related

ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

Using memcpy to copy an int into a char array and then printing its members: undefined behaviour?

Form 16 bit words from a struct

When and how is conversion to char pointer allowed?

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Conversion between short* to int* - c++

Assuming short is 2 bytes and int is 4 bytes on a 32 bit OS. Is the following an undefined behavior? short s = 42; int *p = (int*)(&s);

Related

ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

Using memcpy to copy an int into a char array and then printing its members: undefined behaviour?

Form 16 bit words from a struct

When and how is conversion to char pointer allowed?

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Categories

Resources

Assuming short is 2 bytes and int is 4 bytes on a 32 bit OS. Is the following an undefined behavior? short s = 42; int p = (int)(&s);