Generic char[] based storage and avoiding strict-aliasing related UB - c++

I'm trying to build a class template that packs a bunch of types in a suitably large char array, and allows access to the data as individual correctly typed references. Now, according to the standard this can lead to strict-aliasing violation, and hence undefined behavior, as we're accessing the char[] data via an object that is not compatible with it. Specifically, the standard states:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Given the wording of the highlighted bullet point, I came up with the following alias_cast idea:
#include <iostream>
#include <type_traits>
template <typename T>
T alias_cast(void *p) {
typedef typename std::remove_reference<T>::type BaseType;
union UT {
BaseType t;
};
return reinterpret_cast<UT*>(p)->t;
}
template <typename T, typename U>
class Data {
union {
long align_;
char data_[sizeof(T) + sizeof(U)];
};
public:
Data(T t = T(), U u = U()) { first() = t; second() = u; }
T& first() { return alias_cast<T&>(data_); }
U& second() { return alias_cast<U&>(data_ + sizeof(T)); }
};
int main() {
Data<int, unsigned short> test;
test.first() = 0xdead;
test.second() = 0xbeef;
std::cout << test.first() << ", " << test.second() << "\n";
return 0;
}
(The above test code, especially the Data class is just a dumbed-down demonstration of the idea, so please don't point out how I should use std::pair or std::tuple. The alias_cast template should also be extended to handle cv qualified types and it can only be safely used if the alignment requirements are met, but I hope this snippet is enough to demonstrate the idea.)
This trick silences the warnings by g++ (when compiled with g++ -std=c++11 -Wall -Wextra -O2 -fstrict-aliasing -Wstrict-aliasing), and the code works, but is this really a valid way of telling the compiler to skip strict-aliasing based optimizations?
If it's not valid, then how would one go about implementing a char array based generic storage class like this without violating the aliasing rules?
Edit:
replacing the alias_cast with a simple reinterpret_cast like this:
T& first() { return reinterpret_cast<T&>(*(data_ + 0)); }
U& second() { return reinterpret_cast<U&>(*(data_ + sizeof(T))); }
produces the following warning when compiled with g++:
aliastest-so-1.cpp: In instantiation of ‘T& Data::first() [with
T = int; U = short unsigned int]’: aliastest-so-1.cpp:28:16:
required from here aliastest-so-1.cpp:21:58: warning: dereferencing
type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]

Using a union is almost never a good idea if you want to stick with strict conformance, they have stringent rules when it comes to reading the active member (and this one only). Although it has to be said that implementations like to use unions as hooks for reliable behaviour, and perhaps that is what you are after. If that is the case I defer to Mike Acton who has written a nice (and long) article on aliasing rules, where he does comment on casting through a union.
To the best of my knowledge this is how you should deal with arrays of char types as storage:
// char or unsigned char are both acceptable
alignas(alignof(T)) unsigned char storage[sizeof(T)];
::new (&storage) T;
T* p = static_cast<T*>(static_cast<void*>(&storage));
The reason this is defined to work is that T is the dynamic type of the object here. The storage was reused when the new expression created the T object, which operation implicitly ended the lifetime of storage (which happens trivially as unsigned char is a, well, trivial type).
You can still use e.g. storage[0] to read the bytes of the object as this is reading the object value through a glvalue of unsigned char type, one of the listed explicit exceptions. If on the other hand storage were of a different yet still trivial element type, you could still make the above snippet work but would not be able to do storage[0].
The final piece to make the snippet sensible is the pointer conversion. Note that reinterpret_cast is not suitable in the general case. It can be valid given that T is standard-layout (there are additional restrictions on alignment, too), but if that is the case then using reinterpret_cast would be equivalent to static_casting via void like I did. It makes more sense to use that form directly in the first place, especially considering the use of storage happens a lot in generic contexts. In any case converting to and from void is one of the standard conversions (with a well-defined meaning), and you want static_cast for those.
If you are worried at all about the pointer conversions (which is the weakest link in my opinion, and not the argument about storage reuse), then an alternative is to do
T* p = ::new (&storage) T;
which costs an additional pointer in storage if you want to keep track of it.
I heartily recommend the use of std::aligned_storage.

Related

Does reinterpret_casting std::aligned_storage* to T* without std::launder violate strict-aliasing rules? [duplicate]

This question already has answers here:
Does this really break strict-aliasing rules?
(3 answers)
Closed 5 years ago.
The following example comes from std::aligned_storage page of cppreference.com:
#include <iostream>
#include <type_traits>
#include <string>
template<class T, std::size_t N>
class static_vector
{
// properly aligned uninitialized storage for N T's
typename std::aligned_storage<sizeof(T), alignof(T)>::type data[N];
std::size_t m_size = 0;
public:
// Create an object in aligned storage
template<typename ...Args> void emplace_back(Args&&... args)
{
if( m_size >= N ) // possible error handling
throw std::bad_alloc{};
new(data+m_size) T(std::forward<Args>(args)...);
++m_size;
}
// Access an object in aligned storage
const T& operator[](std::size_t pos) const
{
return *reinterpret_cast<const T*>(data+pos);
}
// Delete objects from aligned storage
~static_vector()
{
for(std::size_t pos = 0; pos < m_size; ++pos) {
reinterpret_cast<T*>(data+pos)->~T();
}
}
};
int main()
{
static_vector<std::string, 10> v1;
v1.emplace_back(5, '*');
v1.emplace_back(10, '*');
std::cout << v1[0] << '\n' << v1[1] << '\n';
}
In the example, the operator[] just reinterpret_casts std::aligned_storage* to T* without std:launder, and performs an indirection directly. However, according to this question, this seems to be undefined, even if an object of type T has been ever created.
So my question is: does the example program really violate strict-aliasing rules? If it does not, what's wrong with my comprehension?
I asked a related question in the ISO C++ Standard - Discussion forum. I learned the answer from those discussions, and write it here to hope to help someone else who is confused about this question. I will keep updating this answer according to those discussions.
Before P0137, refer to [basic.compound] paragraph 3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
and [expr.static.cast] paragraph 13:
If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A.
The expression reinterpret_cast<const T*>(data+pos) represents the address of the previously created object of type T, thus points to that object. Indirection through this pointer indeed get that object, which is well-defined.
However after P0137, the definition for a pointer value is changed and the first block-quoted words is deleted. Now refer to [basic.compound] paragraph 3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
...
and [expr.static.cast] paragraph 13:
If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
The expression reinterpret_cast<const T*>(data+pos) still points to the object of type std::aligned_storage<...>::type, and indirection get a lvalue referring to that object, though the type of the lvalue is const T. Evaluation of the expression v1[0] in the example tries to access the value of the std::aligned_storage<...>::type object through the lvalue, which is undefined behavior according to [basic.lval] paragraph 11 (i.e. the strict-aliasing rules):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in [conv.qual]) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char, unsigned char, or std​::​byte type.
The code doesn't violate the strict aliasing rule in any way. An lvalue of type const T is used to access an object of type T, which is permitted.
The rule in question, as covered by the linked question, is a lifetime rule; C++14 (N4140) [basic.life]/7. The problem is that, according to this rule, the pointer data+pos may not be used to manipulate the object created by placement-new. You're supposed to use the value "returned" by placement-new.
The question naturally follows: what about the pointer reinterpret_cast<T *>(data+pos) ? It is unclear whether accessing the new object via this new pointer violates [basic.life]/7.
The author of the answer you link to, assumes (with no justification offered) that this new pointer is still "a pointer that pointed to the original object". However it seems to me that it is also possible to argue that , being a T *, it cannot point to the original object, which is a std::aligned_storage and not a T.
This shows that the object model is underspecified. The proposal P0137, which was incorporated into C++17, was addressing a problem in a different part of the object model. But it introduced std::launder which is a sort of mjolnir to squash a wide range of aliasing, lifetime and provenance issues.
Undoubtedly the version with std::launder is correct in C++17. However, as far as I can see, P0137 and C++17 don't have any more to say about whether or not the version without launder is correct.
IMHO it is impractical to call the code UB in C++14, which did not have std::launder, because there is no way around the problem other than to waste memory storing all the result pointers of placement-new. If this is UB then it's impossible to implement std::vector in C++14, which is far from ideal.

Strict aliasing rule

I'm reading notes about reinterpret_cast and it's aliasing rules ( http://en.cppreference.com/w/cpp/language/reinterpret_cast ).
I wrote that code:
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A *ptr = reinterpret_cast<A*>(buf);
ptr->t = 1;
A *ptr2 = reinterpret_cast<A*>(buf);
cout << ptr2->t;
I think these rules doesn't apply here:
T2 is the (possibly cv-qualified) dynamic type of the object
T2 and T1 are both (possibly multi-level, possibly cv-qualified at each level) pointers to the same type T3 (since C++11)
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
T2 is the (possibly cv-qualified) signed or unsigned variant of the dynamic type of the object
T2 is a (possibly cv-qualified) base class of the dynamic type of the object
T2 is char or unsigned char
In my opinion this code is incorrect. Am I right? Is code correct or not?
On the other hand what about connect function (man 2 connect) and struct sockaddr?
int connect(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
Eg. we have struct sockaddr_in and we have to cast it to struct sockaddr. Above rules also doesn't apply, so is this cast incorrect?
Yeah, it's invalid, but not because you're converting a char* to an A*: it's because you are not obtaining a A* that actually points to an A* and, as you've identified, none of the type aliasing options fit.
You'd need something like this:
#include <new>
#include <iostream>
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A* ptr = new (buf) A;
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A *ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
Now type aliasing doesn't come into it at all (though keep reading because there's more to do!).
(live demo with -Wstrict-aliasing=2)
In reality, this is not enough. We must also consider alignment. Though the above code may appear to work, to be fully safe and whatnot you will need to placement-new into a properly-aligned region of storage, rather than just a casual block of chars.
The standard library (since C++11) gives us std::aligned_storage to do this:
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
Or, if you don't need to dynamically allocate it, just:
Storage data;
Then, do your placement-new:
new (buf) A();
// or: new(&data) A();
And to use it:
auto ptr = reinterpret_cast<A*>(buf);
// or: auto ptr = reinterpret_cast<A*>(&data);
All in it looks like this:
#include <iostream>
#include <new>
#include <type_traits>
struct A
{
int t;
};
int main()
{
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
A* ptr = new(buf) A();
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A* ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
}
(live demo)
Even then, since C++17 this is somewhat more complicated; see the relevant cppreference pages for more information and pay attention to std::launder.
Of course, this whole thing appears contrived because you only want one A and therefore don't need array form; in fact, you'd just create a bog-standard A in the first place. But, assuming buf is actually larger in reality and you're creating an allocator or something similar, this makes some sense.
The C aliasing rules from which the rules of C++ were derived included a footnote specifying that the purpose of the rules was to say when things may alias. The authors of the Standard didn't think it necessary to forbid implementations from applying the rules in needlessly restrictive fashion in cases where things don't alias, because they thought compiler writers would honor the proverb "Don't prevent the programmer from doing what needs to be done", which the authors of the Standard viewed as part of the Spirit of C.
Situations where it would be necessary to use an lvalue of an aggregate's member type to actually alias a value of the aggregate type are rare, so it's entirely reasonable that the Standard doesn't require compilers to recognize such aliasing. Applying the rules restrictively in cases that don't involve aliasing, however, would cause something like:
union foo {int x; float y;} foo;
int *p = &foo.x;
*p = 1;
or even, for that matter,
union foo {int x; float y;} foo;
foo.x = 1;
to invoke UB since the assignment is used to access the stored values of a union foo and a float using an int, which is not one of the allowed types. Any quality compiler, however, should be able to recognize that an operation done on an lvalue which is visibly freshly derived from a union foo is an access to a union foo, and an access to a union foo is allowed to affect the stored values of its members (like the float member in this case).
The authors of the Standard probably declined to make the footnote normative because doing so would require a formal definition of when an access via freshly-derived lvalue is an access to the parent, and what kinds of access patterns constitute aliasing. While most cases would be pretty clear cut, there are some corner cases which implementations intended for low-level programming should probably interpret more pessimistically than those intended for e.g. high-end number crunching, and the authors of the Standard figured that anyone who could figure out how to handle the harder cases should be able to handle the easy ones.

Can Aliasing Problems be Avoided with const Variables

My company uses a messaging server which gets a message into a const char* and then casts it to the message type.
I've become concerned about this after asking this question. I'm not aware of any bad behavior in the messaging server. Is it possible that const variables do not incur aliasing problems?
For example say that foo is defined in MessageServer in one of these ways:
As a parameter: void MessageServer(const char* foo)
Or as const variable at the top of MessageServer: const char* foo = PopMessage();
Now MessageServer is a huge function, but it never assigns anything to foo, however at 1 point in MessageServer's logic foo will be cast to the selected message type.
auto bar = reinterpret_cast<const MessageJ*>(foo);
bar will only be read from subsequently, but will be used extensively for object setup.
Is an aliasing problem possible here, or does the fact that foo is only initialized, and never modified save me?
EDIT:
Jarod42's answer finds no problem with casting from a const char* to a MessageJ*, but I'm not sure this makes sense.
We know this is illegal:
MessageX* foo = new MessageX;
const auto bar = reinterpret_cast<MessageJ*>(foo);
Are we saying this somehow makes it legal?
MessageX* foo = new MessageX;
const auto temp = reinterpret_cast<char*>(foo);
auto bar = reinterpret_cast<const MessageJ*>(temp);
My understanding of Jarod42's answer is that the cast to temp makes it legal.
EDIT:
I've gotten some comments with relation to serialization, alignment, network passing, and so on. That's not what this question is about.
This is a question about strict aliasing.
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias eachother.)
What I'm asking is: Will the initialization of a const object, by casting from a char*, ever be optimized below where that object is cast to another type of object, such that I am casting from uninitialized data?
First of all, casting pointers does not cause any aliasing violations (although it might cause alignment violations).
Aliasing refers to the process of reading or writing an object through a glvalue of different type than the object.
If an object has type T, and we read/write it via a X& and a Y& then the questions are:
Can X alias T?
Can Y alias T?
It does not directly matter whether X can alias Y or vice versa, as you seem to focus on in your question. But, the compiler can infer if X and Y are completely incompatible that there is no such type T that can be aliased by both X and Y, therefore it can assume that the two references refer to different objects.
So, to answer your question, it all hinges on what PopMessage does. If the code is something like:
const char *PopMessage()
{
static MessageJ foo = .....;
return reinterpret_cast<const char *>(&foo);
}
then it is fine to write:
const char *ptr = PopMessage();
auto bar = reinterpret_cast<const MessageJ*>(foo);
auto baz = *bar; // OK, accessing a `MessageJ` via glvalue of type `MessageJ`
auto ch = ptr[4]; // OK, accessing a `MessageJ` via glvalue of type `char`
and so on. The const has nothing to do with it. In fact if you did not use const here (or you cast it away) then you could also write through bar and ptr with no problem.
On the other hand, if PopMessage was something like:
const char *PopMessage()
{
static char buf[200];
return buf;
}
then the line auto baz = *bar; would cause UB because char cannot be aliased by MessageJ. Note that you can use placement-new to change the dynamic type of an object (in that case, char buf[200] is said to have stopped existing, and the new object created by placement-new exists and its type is T).
My company uses a messaging server which gets a message into a const char* and then casts it to the message type.
So long as you mean that it does a reinterpret_cast (or a C-style cast that devolves to a reinterpret_cast):
MessageJ *j = new MessageJ();
MessageServer(reinterpret_cast<char*>(j));
// or PushMessage(reinterpret_cast<char*>(j));
and later takes that same pointer and reinterpret_cast's it back to the actual underlying type, then that process is completely legitimate:
MessageServer(char *foo)
{
if (somehow figure out that foo is actually a MessageJ*)
{
MessageJ *bar = reinterpret_cast<MessageJ*>(foo);
// operate on bar
}
}
// or
MessageServer()
{
char *foo = PopMessage();
if (somehow figure out that foo is actually a MessageJ*)
{
MessageJ *bar = reinterpret_cast<MessageJ*>(foo);
// operate on bar
}
}
Note that I specifically dropped the const's from your examples as their presence or absence doesn't matter. The above is legitimate when the underlying object that foo points at actually is a MessageJ, otherwise it is undefined behavior. The reinterpret_cast'ing to char* and back again yields the original typed pointer. Indeed, you could reinterpret_cast to a pointer of any type and back again and get the original typed pointer. From this reference:
Only the following conversions can be done with reinterpret_cast ...
6) An lvalue expression of type T1 can be converted to reference to another type T2. The result is an lvalue or xvalue referring to the same object as the original lvalue, but with a different type. No temporary is created, no copy is made, no constructors or conversion functions are called. The resulting reference can only be accessed safely if allowed by the type aliasing rules (see below) ...
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true:
T2 is the (possibly cv-qualified) dynamic type of the object ...
Effectively, reinterpret_cast'ing between pointers of different types simply instructs the compiler to reinterpret the pointer as pointing at a different type. More importantly for your example though, round-tripping back to the original type again and then operating on it is safe. That is because all you've done is instructed the compiler to reinterpret a pointer as pointing at a different type and then told the compiler again to reinterpret that same pointer as pointing back at the original, underlying type.
So, the round trip conversion of your pointers is legitimate, but what about potential aliasing problems?
Is an aliasing problem possible here, or does the fact that foo is only initialized, and never modified save me?
The strict aliasing rule allows compilers to assume that references (and pointers) to unrelated types do not refer to the same underlying memory. This assumption allows lots of optimizations because it decouples operations on unrelated reference types as being completely independent.
#include <iostream>
int foo(int *x, long *y)
{
// foo can assume that x and y do not alias the same memory because they have unrelated types
// so it is free to reorder the operations on *x and *y as it sees fit
// and it need not worry that modifying one could affect the other
*x = -1;
*y = 0;
return *x;
}
int main()
{
long a;
int b = foo(reinterpret_cast<int*>(&a), &a); // violates strict aliasing rule
// the above call has UB because it both writes and reads a through an unrelated pointer type
// on return b might be either 0 or -1; a could similarly be arbitrary
// technically, the program could do anything because it's UB
std::cout << b << ' ' << a << std::endl;
return 0;
}
In this example, thanks to the strict aliasing rule, the compiler can assume in foo that setting *y cannot affect the value of *x. So, it can decide to just return -1 as a constant, for example. Without the strict aliasing rule, the compiler would have to assume that altering *y might actually change the value of *x. Therefore, it would have to enforce the given order of operations and reload *x after setting *y. In this example it might seem reasonable enough to enforce such paranoia, but in less trivial code doing so will greatly constrain reordering and elimination of operations and force the compiler to reload values much more often.
Here are the results on my machine when I compile the above program differently (Apple LLVM v6.0 for x86_64-apple-darwin14.1.0):
$ g++ -Wall test58.cc
$ ./a.out
0 0
$ g++ -Wall -O3 test58.cc
$ ./a.out
-1 0
In your first example, foo is a const char * and bar is a const MessageJ * reinterpret_cast'ed from foo. You further stipulate that the object's underlying type actually is a MessageJ and that no reads are done through the const char *. Instead, it is only casted to the const MessageJ * from which only reads are then done. Since you do not read nor write through the const char * alias, then there can be no aliasing optimization problem with your accesses through your second alias in the first place. This is because there are no potentially conflicting operations performed on the underlying memory through your aliases of unrelated types. However, even if you did read through foo, then there could still be no potential problem as such accesses are allowed by the type aliasing rules (see below) and any ordering of reads through foo or bar would yield the same results because there are no writes occurring here.
Let us now drop the const qualifiers from your example and presume that MessageServer does do some write operations on bar and furthermore that the function also reads through foo for some reason (e.g. - prints a hex dump of memory). Normally, there might be an aliasing problem here as we have reads and writes happening through two pointers to the same memory through unrelated types. However, in this specific example, we are saved by the fact that foo is a char*, which gets special treatment by the compiler:
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true: ...
T2 is char or unsigned char
The strict-aliasing optimizations that are allowed for operations through references (or pointers) of unrelated types are specifically disallowed when a char reference (or pointer) is in play. The compiler instead must be paranoid that operations through the char reference (or pointer) can affect and be affected by operations done through other references (or pointers). In the modified example where reads and writes operate on both foo and bar, you can still have defined behavior because foo is a char*. Therefore, the compiler is not allowed to optimize to reorder or eliminate operations on your two aliases in ways that conflict with the serial execution of the code as written. Similarly, it is forced to be paranoid about reloading values that may have been affected by operations through either alias.
The answer to your question is that, so long as your functions are properly round tripping pointers to a type through a char* back to its original type, then your function is safe, even if you were to interleave reads (and potentially writes, see caveat at end of EDIT) through the char* alias with reads+writes through the underlying type alias.
These two technical references (3.10.10) are useful for answering your question. These other references help give a better understanding of the technical information.
====
EDIT: In the comments below, zmb objects that while char* can legitimately alias a different type, that the converse is not true as several sources seem to say in varying forms: that the char* exception to the strict aliasing rule is an asymmetric, "one-way" rule.
Let us modify my above strict-aliasing code example and ask would this new version similarly result in undefined behavior?
#include <iostream>
char foo(char *x, long *y)
{
// can foo assume that x and y cannot alias the same memory?
*x = -1;
*y = 0;
return *x;
}
int main()
{
long a;
char b = foo(reinterpret_cast<char*>(&a), &a); // explicitly allowed!
// if this is defined behavior then what must the values of b and a be?
std::cout << (int) b << ' ' << a << std::endl;
return 0;
}
I argue that this is defined behavior and that both a and b must be zero after the call to foo. From the C++ standard (3.10.10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:^52
the dynamic type of the object ...
a char or unsigned char type ...
^52: The intent of this list is to specify those circumstances in which an object may or may not be aliased.
In the above program, I am accessing the stored value of an object through both its actual type and a char type, so it is defined behavior and the results have to comport with the serial execution of the code as written.
Now, there is no general way for the compiler to always statically know in foo that the pointer x actually aliases y or not (e.g. - imagine if foo was defined in a library). Maybe the program could detect such aliasing at run time by examining the values of the pointers themselves or consulting RTTI, but the overhead this would incur wouldn't be worth it. Instead, the better way to generally compile foo and allow for defined behavior when x and y do happen to alias one another is to always assume that they could (i.e. - disable strict alias optimizations when a char* is in play).
Here's what happens when I compile and run the above program:
$ g++ -Wall test59.cc
$ ./a.out
0 0
$ g++ -O3 -Wall test59.cc
$ ./a.out
0 0
This output is at odds with the earlier, similar strict-aliasing program's. This is not dispositive proof that I'm right about the standard, but the different results from the same compiler provides decent evidence that I may be right (or, at least that one important compiler seems to understand the standard the same way).
Let's examine some of the seemingly conflicting sources:
The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in volation of the strict aliasing rule. In other words, casting from a pointer of one type to pointer of an unrelated type through a char* is undefined.
The bolded bit is why this quote doesn't apply to the problem addressed by my answer nor the example I just gave. In both my answer and the example, the aliased memory is being accessed both through a char* and the actual type of the object itself, which can be defined behavior.
Both C and C++ allow accessing any object type via char * (or specifically, an lvalue of type char). They do not allow accessing a char object via an arbitrary type. So yes, the rule is a "one way" rule."
Again, the bolded bit is why this statement doesn't apply to my answers. In this and similar counter-examples, an array of characters is being accessed through a pointer of an unrelated type. Even in C, this is UB because the character array might not be aligned according to the aliased type's requirements, for example. In C++, this is UB because such access does not meet any of the type aliasing rules as the underlying type of the object actually is char.
In my examples, we first have a valid pointer to a properly constructed type that is then aliased by a char* and then reads and writes through these two aliased pointers are interleaved, which can be defined behavior. So, there seems to be some confusion and conflation out there between the strict aliasing exception for char and not accessing an underlying object through an incompatible reference.
int value;
int *p = &value;
char *q = reinterpret_cast<char*>(&value);
Both p and p refer to the same address, they are aliasing the same memory. What the language does is provide a set of rules defining the behaviors that are guaranteed: write through p read through q fine, other way around not fine.
The standard and many examples clearly state that "write through q, then read through p (or value)" can be well defined behavior. What is not as abundantly clear, but what I'm arguing for here, is that "write through p (or value), then read through q" is always well defined. I claim even further, that "reads and writes through p (or value) can be arbitrarily interleaved with reads and writes to q" with well defined behavior.
Now there is one caveat to the previous statement and why I kept sprinkling the word "can" throughout the above text. If you have a type T reference and a char reference that alias the same memory, then arbitrarily interleaving reads+writes on the T reference with reads on the char reference is always well defined. For example, you might do this to repeatedly print out a hex dump of the underlying memory as you modify it multiple times through the T reference. The standard guarantees that strict aliasing optimizations will not be applied to these interleaved accesses, which otherwise might give you undefined behavior.
But what about writes through a char reference alias? Well, such writes may or may not be well defined. If a write through the char reference violates an invariant of the underlying T type, then you can get undefined behavior. If such a write improperly modified the value of a T member pointer, then you can get undefined behavior. If such a write modified a T member value to a trap value, then you can get undefined behavior. And so on. However, in other instances, writes through the char reference can be completely well defined. Rearranging the endianness of a uint32_t or uint64_t by reading+writing to them through an aliased char reference is always well defined, for example. So, whether such writes are completely well defined or not depends on the particulars of the writes themselves. Regardless, the standard guarantees that its strict aliasing optimizations will not reorder or eliminate such writes w.r.t. other operations on the aliased memory in a manner that itself could lead to undefined behavior.
So my understanding is that you are doing something like that:
enum MType { J,K };
struct MessageX { MType type; };
struct MessageJ {
MType type{ J };
int id{ 5 };
//some other members
};
const char* popMessage() {
return reinterpret_cast<char*>(new MessageJ());
}
void MessageServer(const char* foo) {
const MessageX* msgx = reinterpret_cast<const MessageX*>(foo);
switch (msgx->type) {
case J: {
const MessageJ* msgJ = reinterpret_cast<const MessageJ*>(foo);
std::cout << msgJ->id << std::endl;
}
}
}
int main() {
const char* foo = popMessage();
MessageServer(foo);
}
If that is correct, then the expression msgJ->id is ok (as would be any access to foo), as msgJ has the correct dynamic type. msgx->type on the other hand does incur UB, because msgx has a unrelated type. The fact that the the pointer to MessageJ was cast to const char* in between is completely irrelevant.
As was cited by others, here is the relevant part in the standard (the "glvalue" is the result of dereferencing the pointer):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:52
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
As far as the discussion "cast to char*" vs "cast from char*" is concerned:
You might know that the standard doesn't talk about strict aliasing as such, it only provides the list above. Strict aliasing is one analysis technique based on that list for compilers to determine which pointers can potentially alias each other. As far as optimizations are concerned, it doesn't make a difference, if a pointer to a MessageJ object was cast to char* or vice versa. The compiler cannot (without further analysis) assume that a char* and MessageX* point to distinct objects and will not perform any optimizations (e.g. reordering) based on that.
Of course that doesn't change the fact that accessing a char array via a pointer to a different type would still be UB in C++ (I assume mostly due to alignment issues) and the compiler might perform other optimizations that could ruin your day.
EDIT:
What I'm asking is: Will the initialization of a const object, by
casting from a char*, ever be optimized below where that object is
cast to another type of object, such that I am casting from
uninitialized data?
No it will not. Aliasing analysis doesn't influence how the pointer itself is handled, but the access through that pointer. The compiler will NOT reorder the write access (store memory address in the pointer variable) with the read access (copy to other variable / load of address in order to access the memory location) to the same variable.
There is no aliasing problem as you use (const)char* type, see the last point of:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among -its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
The other answer answered the question well enough (it's a direct quotation from the C++ standard in https://isocpp.org/files/papers/N3690.pdf page 75), so I'll just point out other problems in what you're doing.
Note that your code may run into alignment problems. For example, if the alignment of MessageJ is 4 or 8 bytes (typical on 32-bit and 64-bit machines), strictly speaking, it is undefined behaviour to access an arbitrary character array pointer as a MessageJ pointer.
You won't run into any problems on x86/AMD64 architectures as they allow unaligned access. However, someday you may find that the code you're developing is ported to a mobile ARM architecture and the unaligned access would be a problem then.
It therefore seems you're doing something you shouldn't be doing. I would consider using serialization instead of accessing a character array as a MessageJ type. The only problem isn't potential alignment problems, an additional problem is that the data may have a different representation on 32-bit and 64-bit architectures.

Casting a section of a std::vector<char> to std::array<char, n>, c++11

Does the following code violate strict aliasing or otherwise result in undefined behaviour according to the C++11 standard? Are there better ways to achieve the same functionality?
void do_things(const std::array<char, 64> &block) {
// ...
}
int main() {
std::vector<char> buffer(64);
do_things(reinterpret_cast<const std::array<char, 64> &>(buffer[0]));
}
tldr: Using const char * is a lot less painful
edit: since sizeof(std::array<char, n>) isn't guaranteed to be equal to n, I then propose the following:
void do_things(const char (&block)[64]) {
// ...
}
int main() {
std::vector<char> buffer(64);
do_things(reinterpret_cast<char (&)[64]>(buffer[0]));
}
According my understanding of aliasing this should not result in undefined behaviour and capture the semantics of passing an array of fixed size. Is my understanding correct?
The strict aliasing rule refers to §3.10 [basic.lval]/p10, which provides that
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including,
recursively, an element or non-static data member of a subaggregate or
contained union)
[...]
Thus, accessing an object of type char through a glvalue of type std::array<char, N> does not break this rule, because std::array<char, N> is an aggregate type that includes char as an element of a nonstatic subaggregate data member.
However, you still can't do anything useful with the resulting reference without invoking undefined behavior because of a different rule - §9.3.1 [class.mfct.non-static]/p2:
If a non-static member function of a class X is called for an object
that is not of type X, or of a type derived from X, the behavior
is undefined.
It's also worth noting that no rule in the standard guarantees that sizeof(std::array<T, N>) == sizeof(T) * N. The only things the standard guarantees is that std::array<T, N> is an aggregate type and it can be initialized using a braced-init-list containing up to N Ts. The implementation is free to add extra stuff.
Depending on what do_things needs, you may want to make your function take random access iterators, or simply a pointer. Alternatively, if you want to limit your function to take only std::vector and std::arrays, you can write overloads that take const refs to those and call a helper function taking const char * that does the actual work.
The new version doesn't break any rule I can think of, but it's fairly bad design to require reinterpret_cast to be used pretty much every time your function is called. If you accidentally declared buffer as a std::vector<std::string>, or wrote buffer instead of buffer[0], the compiler will happily compile your code without a warning, with potentially disastrous results.

Does encapsulated char array used as object breaks strict aliasing rule

Do the following class break the strict aliasing rule:
template<typename T>
class store {
char m_data[sizeof(T)];
bool m_init;
public:
store() : m_init(false) {}
store(const T &t) : init(true) {
new(m_data) T(t);
}
~store() {
if(m_init) {
get()->~T();
}
}
store &operator=(const store &s) {
if(m_init) {
get()->~T();
}
if(s.m_init) {
new(m_data) T(*s.get());
}
m_init = s.m_init;
}
T *get() {
if (m_init) {
return reinterpret_cast<T *>(m_data);
} else {
return NULL;
}
}
}
My reading of a standard is that it is incorrect but I am not sure (my usage is to have an array of objects T + some metadata of those objects but to have control over the object construction/deconstruction without manually allocating memory) as the allocated objects are used as examples for placement new in standard.
The standard contains this note:
[ Note: A typical implementation would define aligned_storage as:
template <std::size_t Len, std::size_t Alignment>
struct aligned_storage {
typedef struct {
alignas(Alignment) unsigned char __data[Len];
} type;
};
— end note ]
— Pointer modifications [meta.trans.ptr] 20.9.7.5/1
And aligned_storage is defined in part with:
The member typedef type shall be a POD type suitable for use as uninitialized storage for any object whose size is at most Len and whose alignment is a divisor of Align.
The only property covered by the standard that restricts the addresses at which an object can be constructed is alignment. An implementation might have some other restrictions, however I'm not familiar with any that do. So just ensure that having correct alignment is enough on your implementation and I think this should be okay. (and in pre-C++11 compilers you can use use compiler extensions for setting alignment such as __attribute__((alignment(X))) or __declspec(align(X)).
I believe that as long as you don't access the underlying storage directly the aliasing rules don't even come into the picture, because the aliasing rules cover when it is okay to access the value of an object through an object of a different type. Constructing an object and accessing only that object doesn't involve accessing the object's value through an object of any other type.
Earlier answer
The aliasing rules specifically allow char arrays to alias other objects.
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
[...]
— a char or unsigned char type.
— Lvalues and rvalues [basic.lval] 3.10/10
You do need to make sure that the array is properly aligned for type T though.
alignas(T) char m_data[sizeof(T)];
The above is C++11 syntax for setting alignment, but if you're on a C++03 compiler then you'll need a compiler specific attribute to do the same thing. GCC has __attribute__((aligned(32))) and MSVC has __declspec(align(32))
Kerrek SB brings up a good point that the aliasing rules state that it's okay to access the value of a T object via a char array, but that may not mean that accessing the value of a char array via a T object is okay. However, if the placement new expression is well defined then that creates a T object which I think it's okay to accesses as a T object by definition, and reading the original char array is accessing the value of the created T object, which is covered under the aliasing rules.
I think that implies that you could store a T object in, for example, an int array, and as long as you don't access the value of that T object through the original int array then you're not hitting undefined behavior.
What is allowed is to take a T object and interpret it as an array of chars. However, it is in general not allowed to take an arbitrary array of chars and treat it as a T, or even as the pointer to an area of memory containing a T. At the very least, your char array would need to be properly aligned.
One way around this might be to use a union:
union storage { char buf[sizeof(T)]; T dummy; };
Now you can construct a T inside storage.buf:
T * p = ::new (storage.buf) T();