QByteArray access and strict aliasing [duplicate] - c++

The accepted answer to What is the strict aliasing rule? mentions that you can use char * to alias another type but not the other way.
It doesn't make sense to me — if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?
It does, but that's not the point.
The point is that if you have one or more struct somethings then you may use a char* to read their constituent bytes, but if you have one or more chars then you may not use a struct something* to read them.
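To make the asymmetry concrete, here is a minimal sketch (the struct and variable names are invented for illustration; the commented-out lines are the part that is not allowed):

#include <cstddef>
#include <cstdio>

struct something { int a; float b; };

int main() {
    something s{42, 1.5f};

    // Fine: inspect the bytes of an existing `something` through unsigned char*.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&s);
    for (std::size_t i = 0; i < sizeof s; ++i)
        std::printf("%02x ", static_cast<unsigned>(bytes[i]));
    std::printf("\n");

    // Not fine: a plain char buffer holds char objects, so reading it through
    // a something* violates the aliasing rules (and may be misaligned, too).
    char buffer[sizeof(something)] = {};
    // const something* bad = reinterpret_cast<const something*>(buffer);
    // int x = bad->a;   // undefined behaviour
    (void)buffer;
}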

The wording in the referenced answer is slightly erroneous, so let’s get that ironed out first:
One object never aliases another object, but two pointers can “alias” the same object (meaning, the pointers point to the same memory location — as M.M. pointed out, this is still not 100% correct wording but you get the idea). Also, the standard itself doesn’t (to the best of my knowledge) actually talk about strict aliasing at all, but merely lays out rules that govern through which kinds of expressions an object may be accessed or not. Compiler flags like -fno-strict-aliasing tell the compiler whether it can assume the programmer followed those rules (so it can perform optimizations based on that assumption) or not.
Now to your question: Any object can be accessed through a pointer to char, but a char object (especially a char array) may not be accessed through most other pointer types.
Based on that, the compiler is required to make the following assumptions:
If the type of the actual object itself is not known, both char* and T* pointers may always point to the same object (alias each other) — symmetric relationship.
If types T1 and T2 are not “related” and neither is char, then T1* and T2* may never point to the same object — symmetric relationship.
A char* pointer may point to a char object or an object of any type T.
A T* pointer may not point to a char object — asymmetric relationship.
I believe the main rationale behind the asymmetric rules about accessing objects through pointers is that a char array might not satisfy the alignment requirements of, e.g., an int.
So, even without compiler optimizations based on the strict aliasing rule, writing an int to the location of a 4-byte char array at addresses 0x1, 0x2, 0x3, 0x4, for instance, will — in the best case — result in poor performance and — in the worst case — access a different memory location, because the CPU instructions might ignore the lowest two address bits when writing a 4-byte value (so here this might result in a write to 0x0, 0x1, 0x2, and 0x3).
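A small sketch of the alignment point (sizes are assumed typical; the memcpy at the end is the portable way to place an int at an arbitrary byte offset):

#include <cstdint>
#include <cstring>

int main() {
    char buf[sizeof(std::int32_t) + 1] = {};

    // buf + 1 is generally not suitably aligned for a 4-byte integer, so even
    // ignoring aliasing, *reinterpret_cast<std::int32_t*>(buf + 1) = 42 could
    // be slow, hit the wrong bytes, or trap on some CPUs.
    std::int32_t v = 42;
    std::memcpy(buf + 1, &v, sizeof v);   // a byte-wise copy has no alignment problem
    return 0;
}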
Please also be aware that the meaning of “related” differs between C and C++, but that is not relevant to your question.

if we have two pointers, one of type char * and another of type struct something * pointing to the same location, how is it possible that the first aliases the second but the second doesn't alias the first?
Pointers don't alias each other; that's sloppy use of language. Aliasing is when an lvalue is used to access an object of a different type. (Dereferencing a pointer gives an lvalue).
In your example, what's important is the type of the object being aliased. For a concrete example let's say that the object is a double. Accessing the double by dereferencing a char * pointing at the double is fine because the strict aliasing rule permits this. However, accessing a double by dereferencing a struct something * is not permitted (unless, arguably, the struct starts with double!).
If the compiler is looking at a function which takes a char * and a struct something *, and it does not have available the information about the object being pointed to (this is actually unlikely, as aliasing passes are done at a whole-program optimization stage), then it would have to allow for the possibility that the object might actually be a struct something, so no optimization could be done inside this function.

Many aspects of the C++ Standard are derived from the C Standard, which needs to be understood in the historical context when it was written. If the C Standard were being written to describe a new language which included type-based aliasing, rather than describing an existing language which was designed around the idea that accesses to lvalues were accesses to bit patterns stored in memory, there would be no reason to give any kind of privileged status to the type used for storing characters in a string. Having explicit operations to treat regions of storage as bit patterns would allow optimizations to be simultaneously more effective and safer. Had the C Standard been written in such fashion, the C++ Standard presumably would have been likewise.
As it is, however, the Standard was written to describe a language in which a very common idiom was to copy the values of objects by copying all of the bytes thereof, and the authors of the Standard wanted to allow such constructs to be usable within portable programs.
Further, the authors of the Standard intended that implementations process many non-portable constructs "in a documented manner characteristic of the environment" in cases where doing so would be useful, but waived jurisdiction over when that should happen, since compiler writers were expected to understand their customers' and prospective customers' needs far better than the Committee ever could.
Suppose that in one compilation unit, one has the function:
void copy_thing(char *dest, char *src, int size)
{
    while(size--)
        *(char volatile *)(dest++) = *(char volatile *)(src++);
}
and in another compilation unit:
float f1, f2;

float test(void)
{
    f1 = 1.0f;
    f2 = 2.0f;
    copy_thing((char*)&f2, (char*)&f1, sizeof f1);
    return f2;
}
I think there would have been a consensus among Committee members that no quality implementation should treat the fact that copy_thing never writes to an object of type float as an invitation to assume that the return value will always be 2.0f. There are many things about the above code that should prevent or discourage an implementation from consolidating the read of f2 with the preceding write, with or without a special rule regarding character types, but different implementations would have different reasons for their forbearance.
It would be difficult to describe a set of rules which would require that all implementations process the above code correctly without blocking some existing or plausible implementations from implementing what would otherwise be useful optimizations. An implementation that treated all inter-module calls as opaque would handle such code correctly even if it was oblivious to the fact that a cast from T1 to T2 is a sign that an access to a T2 may affect a T1, or the fact that a volatile access might affect other objects in ways a compiler shouldn't expect to understand. An implementation that performed cross-module in-lining and was oblivious to the implications of typecasts or volatile would process such code correctly if it refrained from making any aliasing assumptions about accesses via character pointers.
The Committee wanted to recognize something in the above construct that compilers would be required to treat as implying that f2 might be modified, since the alternative would be to view such a construct as Undefined Behavior despite the fact that it should be usable within portable programs. Choosing the fact that the access was made via a character pointer as the aspect that forced the issue was never intended to imply that compilers should be oblivious to everything else, even though unfortunately some compiler writers interpret the Standard as an invitation to do just that.

Related

How does binary I/O of POD types not break the aliasing rules?

Twenty plus years ago, I would have (and didn't) think anything of doing binary I/O with POD structs:
struct S { std::uint32_t x; std::uint16_t y; };
S s;
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
(I'm ignoring padding and byte order issues, because they're not part of what I am asking about.)
"Obviously", we can read into s and the compiler is required to assume that the contents of s.x and s.y are aliases by read(). So, s.x after the read() isn't undefined behaviour (because s was uninitialized).
Likewise in the case of
S s = { 1, 2 };
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
the compiler can't presume that s.x is still 1 after the read().
Fast forward to the modern world, where we actually have to follow the aliasing rules and avoid undefined behaviour, and so on, and I have been unable to prove to myself that this is allowed.
In C++14, for example, [basic.types] ¶2 says:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. [Footnote 42: If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.]
¶4 says:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
[basic.lval] ¶10 says:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [footnote 54]
...
— a char or unsigned char type.
[Footnote 54: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
Taken together, I think that this is the standard saying that "you can form an unsigned char or char pointer to any trivially copyable (and thus POD) type and read or write its bytes". In fact, in N2342, which gave us the modern wording, the introductory table says:
Programs can safely apply coding optimizations, particularly std::memcpy.
and later:
Yet the only data member in the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.
With the proposed resolution, the class can be made into a POD by making the default constructor trivial (with N2210 the syntax would be endian()=default), resolving all the issues.
It really sounds like N2342 is trying to say "we need to update the wording to make it so you can do I/O like read() and write() for these types", and it really seems like the updated wording was made standard.
Also, I often hear reference to "the std::memcpy() hole" or similar where you can use std::memcpy() to basically "allow aliasing". But the standard doesn't seem to call out std::memcpy() specifically (and in fact in one footnote mentions it along with std::memmove() and calls it an "example" of a way to do this).
Plus, there's the fact that I/O functions like read() tend to be OS-specific from POSIX and thus aren't discussed in the standard.
So, with all this in mind, my questions are:
What actually guarantees that we can do real-world I/O of POD structs (as shown above)?
Do we actually need to std::memcpy() the content into and out of unsigned char buffers (surely not), or can we directly read into the POD types?
Do the OS I/O functions "promise" that they manipulate the underlying memory "as if by reading or writing unsigned char values" or "as if by std::memcpy()"?
What concerns should I have when there are layers (such as Asio) between me and the raw I/O functions?
Strict aliasing is about accessing an object through a pointer/reference to a type other than that object's actual type. However, the rules of strict aliasing permit accessing any object of any type through a pointer to an array of bytes. And this rule has been around since at least C++14.
Now, that doesn't mean much, since something has to define what such an access means. For that (in terms of writing), we only really have two rules: [basic.types]/2 and /3, which cover copying the bytes of Trivially Copyable types. The question ultimately boils down to this:
Are you reading the "the underlying bytes making up [an] object" from the file?
If the data you're reading into your s was in fact copied from the bytes of a live instance of S, then you're 100% fine. It's clear from the standard that performing fwrite writes the given bytes to a file, and performing fread reads those bytes from the file. Therefore, if you write the bytes of an existing S instance to a file, and read those written bytes into an existing S, you have performed the equivalent of copying those bytes.
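As a sketch of that round trip through a file (fopen/fwrite/fread from <cstdio>; the file name is made up and error handling is minimal):

#include <cstdint>
#include <cstdio>

struct S { std::uint32_t x; std::uint16_t y; };

int main() {
    // Write the bytes of a live S to a file...
    S out{1, 2};
    std::FILE* f = std::fopen("s.bin", "wb");
    if (!f) return 1;
    std::fwrite(&out, sizeof out, 1, f);
    std::fclose(f);

    // ...and read those same bytes back into another live S.
    S in{};
    f = std::fopen("s.bin", "rb");
    if (!f) return 1;
    std::fread(&in, sizeof in, 1, f);
    std::fclose(f);

    // Prints "1 2": the read S holds the value the written S had.
    std::printf("%u %u\n", static_cast<unsigned>(in.x), static_cast<unsigned>(in.y));
    return 0;
}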
Where you run into technical issues is when you start getting into the weeds of interpretation. It is reasonable to interpret the standard as defining the behavior of such a program even when the writing and the reading happen in different invocations of the same program.
Concerns arise in one of two cases:
1: When the program which wrote the data is actually a different program than the one that read it.
2: When the program which wrote the data did not actually write an object of type S, but instead wrote bytes that just so happen to be legitimately interpret-able as an S.
The standard doesn't govern interoperability between two programs. However, C++20 does provide a tool that effectively says "if the bytes in this memory contain a legitimate object representation of a T, then I'll return a copy of what that object would look like." It's called std::bit_cast; you can pass it an array of bytes of sizeof(T), and it'll return a copy of that T.
And you get undefined behavior if you're a liar. And bit_cast doesn't even compile if T is not trivially copyable.
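A sketch of that use of std::bit_cast (C++20; the byte values and the expected result assume a 4-byte little-endian IEEE-754 float, which is an assumption of this example, not of bit_cast itself):

#include <bit>

int main() {
    static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

    // sizeof(float) bytes that happen to form a valid object representation...
    unsigned char raw[sizeof(float)] = {0x00, 0x00, 0x80, 0x3f};

    // ...bit_cast-ed into a float with exactly that object representation.
    float f = std::bit_cast<float>(raw);
    return f == 1.0f ? 0 : 1;   // 1.0f under the stated assumptions
}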
However, doing a byte copy directly into a live S from a source that wasn't technically an S, but totally could be an S, is a different matter. There isn't wording in the standard to make that work.
Our friend P0593 proposes a mechanism for explicitly declaring such an assumption, but it didn't quite make it into C++20.
The type-access rules in every version of the C and C++ Standard to date are based upon the C89 rules, which were written with the presumption that implementations intended for various tasks would uphold the Spirit of C principle described in the published Rationale as "Don't prevent [or otherwise interfere with] the programmer from doing what needs to be done [to accomplish those tasks]." The authors of C89 would have seen no reason to worry about whether or not the rules as written actually required that compilers support constructs that everyone would agree that they should (e.g. allocating storage via malloc, passing it to fread, and then using it as a standard layout structure type) since they would expect such constructs to be supported on any compiler whose customers would need them, without regard for whether or not the rules as written actually required such support.
There are many situations where constructs which should "obviously" work actually invoke UB, because the authors of the Standard saw no need to worry about whether the rules would, for example, forbid a compiler given the code:
struct S { int dat[10]; } x, y;

void test(int i)
{
    y = x;
    y.dat[i] = 1;   // (*) equivalent to *(y.dat+i) = 1;
    x = y;
}
from assuming that object y of type struct S could not possibly be accessed by the dereferenced int* on the marked line (*), and thus need not be copied back to object x. For a compiler to make such an assumption when it can see that the pointer is derived from a struct S would have been universally recognized as obtuse regardless of whether or not the Standard would forbid it, but the question of exactly when a compiler should be expected to "see" how a pointer was produced was a Quality of Implementation issue outside the Standard's jurisdiction.
(*) In fact, the rules as written would allow a compiler to make such an assumption, since the only types of lvalue that may be used to access a struct S would be that structure type, qualified versions of it, types derived from it, or character types.
It's sufficiently obvious that functions like fread() should be usable on standard-layout structures that quality compilers will generally support such usage without regard for whether the Standard would actually require them to do so. Moving such questions from Quality of Implementation issues to actual conformance issues would require adopting new terminology to describe what a statement like int *p = x.dat+3; does with the stored value of x [it should cause it to be accessible via p under at least some circumstances], and more importantly would require that the Standard itself affirm a point which is currently relegated to the published Rationale--that it is not intended to say anything bad about code which will only run on implementations that are suitable for its purpose, nor to say anything good about implementations which, although conforming, aren't suitable for their claimed purposes.

What is the purpose of std::launder?

P0137 introduces the function template std::launder and makes many, many changes to the standard in the sections concerning unions, lifetime, and pointers.
What is the problem this paper is solving? What are the changes to the language that I have to be aware of? And what are we laundering?
std::launder is aptly named, though only if you know what it's for. It performs memory laundering.
Consider the example in the paper:
struct X { const int n; };
union U { X x; float f; };
...
U u = {{ 1 }};
That statement performs aggregate initialization, initializing the first member of U with {1}.
Because n is a const variable, the compiler is free to assume that u.x.n shall always be 1.
So what happens if we do this:
X *p = new (&u.x) X {2};
Because X is trivial, we need not destroy the old object before creating a new one in its place, so this is perfectly legal code. The new object will have its n member be 2.
So tell me... what will u.x.n return?
The obvious answer will be 2. But that's wrong, because the compiler is allowed to assume that a truly const variable (not merely a const&, but an object variable declared const) will never change. But we just changed it.
[basic.life]/8 spells out the circumstances when it is OK to access the newly created object through variables/pointers/references to the old one. And having a const member is one of the disqualifying factors.
So... how can we talk about u.x.n properly?
We have to launder our memory:
assert(*std::launder(&u.x.n) == 2); //Will be true.
Money laundering is used to prevent people from tracing where you got your money from. Memory laundering is used to prevent the compiler from tracing where you got your object from, thus forcing it to avoid any optimizations that may no longer apply.
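Putting the fragments above together into one compilable sketch:

#include <cassert>
#include <new>

struct X { const int n; };
union U { X x; float f; };

int main() {
    U u = {{ 1 }};                   // u.x.n starts out as 1, and n is const
    X* p = new (&u.x) X{2};          // reuse the storage with a new X

    assert(p->n == 2);                     // fine: p refers to the new object
    // assert(u.x.n == 2);                 // not guaranteed: the compiler may still
                                           //   assume the old const value 1
    assert(*std::launder(&u.x.n) == 2);    // fine: laundered access sees the new object
}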
Another of the disqualifying factors is if you change the type of the object. std::launder can help here too:
alignas(int) char data[sizeof(int)];
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
[basic.life]/8 tells us that, if you allocate a new object in the storage of the old one, you cannot access the new object through pointers to the old. launder allows us to side-step that.
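A runnable version of that fragment, with the int given a value so the laundered pointer has something to read:

#include <new>

int main() {
    alignas(int) char data[sizeof(int)];
    int* created = new (&data) int{42};

    // reinterpret_cast<int*>(&data) points at the char array, not at the int
    // object it provides storage for, so the pointer must be laundered first.
    int* p = std::launder(reinterpret_cast<int*>(&data));
    return (*p == 42 && p == created) ? 0 : 1;
}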
std::launder is a misnomer. This function performs the opposite of laundering: It soils the pointed-to memory, to remove any expectation the compiler might have regarding the pointed-to value. It precludes any compiler optimizations based on such expectations.
Thus in @NicolBolas' answer, the compiler might be assuming that some memory holds some constant value, or is uninitialized. You're telling the compiler: "That place is (now) soiled, don't make that assumption".
If you're wondering why the compiler would always stick to its naive expectations in the first place, and would need you to conspicuously soil things for it - you might want to read this discussion:
Why introduce `std::launder` rather than have the compiler take care of it?
... which led me to this view of what std::launder means.
I think there are two purposes of std::launder.
A barrier for constant folding/propagation, including devirtualization.
A barrier for fine-grained object-structure-based alias analysis.
Barrier for overaggressive constant folding/propagation (abandoned)
Historically, the C++ standard allowed compilers to assume that the value of a const-qualified or reference non-static data member, obtained in certain ways, was immutable, even if its containing object is non-const and may be reused by placement new.
In C++17/P0137R1, std::launder is introduced as a functionality that disables the aforementioned (mis-)optimization (CWG 1776), which is needed for std::optional. And as discussed in P0532R0, portable implementations of std::vector and std::deque may also need std::launder, even if they are C++98 components.
Fortunately, such (mis-)optimization is forbidden by RU007 (included in P1971R0 and C++20). AFAIK there's no compiler performing this (mis-)optimization.
Barrier for devirtualization
A virtual table pointer (vptr) can be considered constant during the lifetime of its containing polymorphic object, which is needed for devirtualization. Given that the vptr is not a non-static data member, compilers are still allowed in some cases to perform devirtualization based on the assumption that the vptr has not changed (i.e., either the object is still within its lifetime, or it has been reused by a new object of the same dynamic type).
For some unusual uses that replace a polymorphic object with a new object of different dynamic type (shown here), std::launder is needed as a barrier for devirtualization.
IIUC Clang implemented std::launder (__builtin_launder) with these semantics (LLVM-D40218).
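For illustration, a hedged sketch of that situation (this illustrates the idea rather than the exact example referenced above; it assumes, as on common ABIs, that the Base subobject sits at offset 0 of Derived):

#include <new>

struct Base { virtual int id() const { return 1; } virtual ~Base() = default; };
struct Derived final : Base { int id() const override { return 2; } };

int main() {
    // Storage suitable for either type.
    alignas(Derived) unsigned char storage[sizeof(Derived)];

    Base* b = new (storage) Base{};
    int before = b->id();              // 1; a devirtualizer could cache the vptr here

    b->~Base();
    new (storage) Derived{};           // same storage, *different* dynamic type

    // b still holds the old address, but may not simply be reused; std::launder
    // forces the compiler to look afresh at whatever now lives there instead of
    // reusing its earlier assumption about the vptr.
    int after = std::launder(b)->id(); // 2
    std::launder(b)->~Base();          // virtual destructor destroys the Derived
    return before + after;             // 3
}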
Barrier for object-structure-based alias analysis
P0137R1 also changes the C++ object model by introducing pointer-interconvertibility. IIUC such a change enables some "object-structure-based alias analysis" proposed in N4303.
As a result, P0137R1 makes directly dereferencing a reinterpret_cast'd pointer obtained from an unsigned char [N] array undefined, even if the array is providing storage for another object of the correct type. std::launder is then needed to access the nested object.
This kind of alias analysis seems overaggressive and may break many useful code bases. AFAIK it's currently not implemented by any compiler.
Relation to type-based alias analysis/strict aliasing
IIUC std::launder and type-based alias analysis/strict aliasing are unrelated. std::launder requires that a living object of the correct type be at the provided address.
However, it seems that they were accidentally made related in Clang (LLVM-D47607).

Does type aliasing issue exist only when pointers are passed to functions as arguments?

As far as I know, when two pointers (or references) do not type-alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations based on that, e.g., reordering instructions. Therefore, having pointers to different types hold the same value may be problematic. However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine the relationship between them, i.e., whether they address the same location. Am I right?
As far as I know, when two pointers (or references) do not type-alias each other, it is legal for the compiler to assume that they address different locations and to make certain optimizations based on that, e.g., reordering instructions.
Correct. GCC, for example, does perform optimizations of this form which can be disabled by passing the flag -fno-strict-aliasing.
However, I think this issue only applies when the two pointers are passed to functions. Within the function body where the two pointers are created, the compiler should be able to determine the relationship between them, i.e., whether they address the same location. Am I right?
The standard doesn't distinguish between where those pointers came from. If your operation has undefined behavior, the program has undefined behavior, period. The compiler is in no way obliged to analyze the operands at compile time, but it may give you a warning.
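For example (a sketch; the offending read is left commented out precisely because it is undefined):

int main() {
    float f = 1.0f;

    // Both pointers are created right here, in the same function body...
    float* pf = &f;
    int*   pi = reinterpret_cast<int*>(&f);

    // ...yet reading *pi would still access a float object through an int
    // glvalue, which is undefined behaviour regardless of where the pointers
    // came from or whether any function call is involved.
    // int bits = *pi;
    (void)pf;
    (void)pi;
    return 0;
}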
Implementations which are designed and intended to be suitable for low-level programming should have no particular difficulty recognizing common patterns where storage of one type is reused or reinterpreted as another in situations not involving aliasing, provided that:
Within any particular function or loop, all pointers or lvalues used to access a particular piece of storage are derived from lvalues of a common type which identify the same object or elements of the same array, and
Between the creation of a derived-type pointer and the last use of it or any pointer derived from it, all operations involving the storage are performed only using the derived pointer or other pointers derived from it.
Most low-level programming scenarios requiring reuse or reinterpretation of storage fit these criteria, and handling code that fits these criteria will typically be rather straightforward in an implementation designed for low-level programming. If an implementation caches lvalues in registers and performs loop hoisting, for example, it could support the above semantics reasonably efficiently by flushing all cached values of type T whenever T or T* is used to form a pointer or lvalue of another type. Such an approach may not be optimal, but it would degrade performance much less than having to block all type-based optimizations entirely.
Note that it is probably in many cases not worthwhile for even an implementation intended for low-level programming to try to handle all possible scenarios involving aliasing. Doing that would be much more expensive than handling the far more common scenarios that don't involve aliasing.
Implementations which are specialized for other purposes are, of course, not required to make any attempt whatsoever to support any exceptions to 6.5p7--not even those that are often treated as part of the Standard. Whether such an implementation should be able to support such constructs would depend upon the particular purposes for which it is designed.

Accessing buffer values via 2 differently typed pointers

I have two questions, a general one about pointer type-manipulation in general, and then one for a specific case I have.
What happens when you access a buffer of memory using pointers of different types?
In practice on many different compilers, it seems to work out as my brain would like to envision it. However, I sort-of know it's UB in many (if not all) cases. For example:
typedef unsigned char byte;
struct color { /* stuff */};
std::vector<color> colors( 512 * 512 );
// pointer of one type
color* colordata = colors.data();
// pointer to another type?
byte* bytes = reinterpret_cast<byte*>( colordata );
// Proceed to read from (potentially write into)
// the "bytes" of the 512 * 512 heap array
The first question would be: Is there any point where doing this kind of conversion is legal/safe/standard-sanctioned?
The second question: spinning off the first, if you knew that the struct named color was defined as:
struct color { byte c[4]; };
Now, is it legal/safe/standard-sanctioned? Read-safe? Read/write-safe? I'd like to know, as my intuition tells me that for these very simple structs, the above naughty pointer manipulation isn't that bad, or is it?
[ Reopen Reasons: ]
While the linked question about strict aliasing applies somewhat here, it is mostly about C. The one answer referencing the C++03 standard may be outdated when compared to the C++11 standard (unless absolutely nothing has changed). This question has a practical application and I and others would benefit from more answers. Finally, this question is very specific in asking whether it is not only read-safe, write-safe, or both (or neither, and in two different scenarios (PoD data where the underlying types match and a more general case of arbitrary internal data).
Both are legal.
Firstly, since byte is a typedef for unsigned char, it has a magical get-out-of-jail-free card when it comes to strict aliasing. You can alias any type as char or one of its signed or unsigned variants.
Secondly, it is entirely legal in both C and C++ for a struct to be cast to a pointer to the type of its first element, as long as it meets certain guarantees like being POD. This means that
struct x {
    int f;
};

int main() {
    x var;
    int* p = (int*)&var;
}
does not violate strict aliasing either, even without the get-out clause used for char.
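Applied to the color case from the question, a sketch of the byte-level access that the char exception covers (reading and writing the bytes of live color objects through byte*):

#include <cstdio>
#include <vector>

typedef unsigned char byte;
struct color { byte c[4]; };

int main() {
    std::vector<color> colors(4, color{{10, 20, 30, 40}});

    // byte is unsigned char, so viewing the color objects' bytes through it
    // falls under the char-type exception to the aliasing rules.
    byte* bytes = reinterpret_cast<byte*>(colors.data());
    bytes[0] = 99;                                      // writes colors[0].c[0]
    std::printf("%u\n", unsigned(colors[0].c[0]));      // prints 99
}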
As has been stated in the comments: Accessing the same piece of memory as two different types is UB. So, that's the formal answer (note that "UB" does include "doing precisely what you would expect if you are a sane person reading the code" as well as "just about anything other than what a sane person reading the code would expect")
Having said that, it appears that all popular compilers tend to cope with this fairly well. It is not unusual to see these sort of constructs (in "good" production code - even if the code isn't strictly language-lawyer correct). However, you are at the mercy of the compiler "doing the right thing", and it's definitely a case where you may find compiler bugs if you stress things too harshly.
There are several reasons that the standard defines this as UB - the main one being that "different types of data may be stored in different memory" and "it can be hard for the compiler to figure out what is safe when someone is mucking about casting pointers to the same data with different types" - e.g. if we have a pointer to a 32-bit integer and another pointer to char, both pointing to the same address, when is it safe to read the integer value after the char value has been written. By defining it as UB, it's entirely up to the compiler vendor to decide how precisely they want to treat these conditions. If it was "defined" that this will work, compilers may not be viable for certain processor types (or code would become horribly slow due to the effect of the liberal sprinkling of "make sure partial memory writes have completed before I read" operations, even when those are generally not needed).
So, in summary: It will most likely work on most processors, but don't expect any language lawyer to approve of your code.