Strict aliasing rule - c++

I'm reading notes about reinterpret_cast and it's aliasing rules ( http://en.cppreference.com/w/cpp/language/reinterpret_cast ).
I wrote that code:
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A *ptr = reinterpret_cast<A*>(buf);
ptr->t = 1;
A *ptr2 = reinterpret_cast<A*>(buf);
cout << ptr2->t;
I think these rules doesn't apply here:
T2 is the (possibly cv-qualified) dynamic type of the object
T2 and T1 are both (possibly multi-level, possibly cv-qualified at each level) pointers to the same type T3 (since C++11)
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
T2 is the (possibly cv-qualified) signed or unsigned variant of the dynamic type of the object
T2 is a (possibly cv-qualified) base class of the dynamic type of the object
T2 is char or unsigned char
In my opinion this code is incorrect. Am I right? Is code correct or not?
On the other hand what about connect function (man 2 connect) and struct sockaddr?
int connect(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
Eg. we have struct sockaddr_in and we have to cast it to struct sockaddr. Above rules also doesn't apply, so is this cast incorrect?

Yeah, it's invalid, but not because you're converting a char* to an A*: it's because you are not obtaining a A* that actually points to an A* and, as you've identified, none of the type aliasing options fit.
You'd need something like this:
#include <new>
#include <iostream>
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A* ptr = new (buf) A;
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A *ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
Now type aliasing doesn't come into it at all (though keep reading because there's more to do!).
(live demo with -Wstrict-aliasing=2)
In reality, this is not enough. We must also consider alignment. Though the above code may appear to work, to be fully safe and whatnot you will need to placement-new into a properly-aligned region of storage, rather than just a casual block of chars.
The standard library (since C++11) gives us std::aligned_storage to do this:
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
Or, if you don't need to dynamically allocate it, just:
Storage data;
Then, do your placement-new:
new (buf) A();
// or: new(&data) A();
And to use it:
auto ptr = reinterpret_cast<A*>(buf);
// or: auto ptr = reinterpret_cast<A*>(&data);
All in it looks like this:
#include <iostream>
#include <new>
#include <type_traits>
struct A
{
int t;
};
int main()
{
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
A* ptr = new(buf) A();
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A* ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
}
(live demo)
Even then, since C++17 this is somewhat more complicated; see the relevant cppreference pages for more information and pay attention to std::launder.
Of course, this whole thing appears contrived because you only want one A and therefore don't need array form; in fact, you'd just create a bog-standard A in the first place. But, assuming buf is actually larger in reality and you're creating an allocator or something similar, this makes some sense.

The C aliasing rules from which the rules of C++ were derived included a footnote specifying that the purpose of the rules was to say when things may alias. The authors of the Standard didn't think it necessary to forbid implementations from applying the rules in needlessly restrictive fashion in cases where things don't alias, because they thought compiler writers would honor the proverb "Don't prevent the programmer from doing what needs to be done", which the authors of the Standard viewed as part of the Spirit of C.
Situations where it would be necessary to use an lvalue of an aggregate's member type to actually alias a value of the aggregate type are rare, so it's entirely reasonable that the Standard doesn't require compilers to recognize such aliasing. Applying the rules restrictively in cases that don't involve aliasing, however, would cause something like:
union foo {int x; float y;} foo;
int *p = &foo.x;
*p = 1;
or even, for that matter,
union foo {int x; float y;} foo;
foo.x = 1;
to invoke UB since the assignment is used to access the stored values of a union foo and a float using an int, which is not one of the allowed types. Any quality compiler, however, should be able to recognize that an operation done on an lvalue which is visibly freshly derived from a union foo is an access to a union foo, and an access to a union foo is allowed to affect the stored values of its members (like the float member in this case).
The authors of the Standard probably declined to make the footnote normative because doing so would require a formal definition of when an access via freshly-derived lvalue is an access to the parent, and what kinds of access patterns constitute aliasing. While most cases would be pretty clear cut, there are some corner cases which implementations intended for low-level programming should probably interpret more pessimistically than those intended for e.g. high-end number crunching, and the authors of the Standard figured that anyone who could figure out how to handle the harder cases should be able to handle the easy ones.

Related

Struct offsets and pointer safety in C++

This question is about pointers derived using pointer arithmetic with struct offsets.
Consider the following program:
#include <cstddef>
#include <iostream>
#include <new>
struct A {
float a;
double b;
int c;
};
static constexpr auto off_c = offsetof(A, c);
int main() {
A * a = new A{0.0f, 0.0, 5};
char * a_storage = reinterpret_cast<char *>(a);
int * c = reinterpret_cast<int *>(a_storage + off_c));
std::cout << *c << std::endl;
delete a;
}
This program appears to work and give expected results on compilers that I tested, using default settings and C++11 standard.
(A closely related program, where we use void * instead of char * and static_cast instead of reinterpret_cast,
is not universally accepted. gcc 5.4 issues a warning about pointer arithmetic with void pointers, and clang 6.0 says that pointer
arithmetic with void * is an error.)
Does this program have well-defined behavior according to the C++ standard?
Does the answer depend on whether the implementation has relaxed or strict pointer safety ([basic.stc.dynamic.safety])?
There are no fundamental errors in your code.
If A isn't plain old data, the above is UB (prior to C++17) and conditionally supported (after C++17).
You might want to replace char* and int* with auto*, but that is a style thing.
Note that pointers to members do this exact same thing in a type-safe manner. Most compilers implement a pointer to member ... as the offset of the member in the type. They do, however, work everywhere even on non-pod structures.
Aside: I don't see a guarantee that offsetof is constexpr in the standard. ;)
In any case, replace:
static constexpr auto off_c = offsetof(A, c);
with
static constexpr auto off_c = &A::c;
and
auto* a_storage = static_cast<char *>(a);
auto* c = reinterpret_cast<int *>(a_storage + off_c));
with
auto* c = &(a->*off_c);
to do it the C++ way.
It is safe in your specific example, but only because your struct is a standard layout, which you can double-check using std::is_standard_layout<>.
Trying to apply this to a struct such as:
struct A {
float a;
double b;
int c;
std::string str;
};
Would be illegal, even if the string is past the part of the struct that's relevant.
Edit
Heres what im concerned abt: in 3.7.4.3 [basic.stc.dynamic.safety] it says pointer is safely derived only if (conditions) and if we have strict pointer safety then a pointer is invalid if it doesnt come from such a place. In 5.7 pointer arithmetic, it says i can do usual arithmetic within an array but i dont see anything there telling me struct offset arithmetic is ok. Im trying to figure out if this isnt relevant in the way i think it is, or if struct offset arithmetic is not ok in the hypothetical "strict" impls, or if i read 5.7 wrong (n4296)
When you are doing the pointer arithmatic, you are performing it on an array of char, the size of which is at least sizeof(A), so that's fine.
Then, when you cast back into the second member, you are covered under (2.4):
— the result of a well-defined pointer conversion (4.10, 5.4) of a safely-derived pointer value;
you should examine your assumptions.
Assumption #1) offsetof gives the correct offset in bytes. This is only guaranteed if the class is considered "standard-layout", which has a number of restrictions, such as no virtual methods, avoids multiple inheritances, etc. In this case, it should be fine, but in general you can't be sure.
Assumption #2) A char is the same size as a byte. In C, this is by definition, so you are safe.
Assumption #3) The offsetof gives the correct offset from the pointer to the class, not from the beginning of the data. This is basically the same as #1, but a vtable could certainly be a problem. Again, only works with standard-layout.

Is it legal to alias a char array through a pointer to int?

I know that the following is explicitly allowed in the standard:
int n = 0;
char *ptr = (char *) &n;
cout << *ptr;
What about this?
alignas(int) char storage[sizeof(int)];
int *ptr = (int *) &storage[0];
*ptr = 0;
cout << *ptr;
Essentially, I'm asking if the aliasing rules allow for a sequence of chars to be accessed through a pointer to another type. I'd like references to the portions of the standard that indicate one way or another if possible.
Some parts of the standard have left me conflicted; (3.10.10) seems to indicate it would be undefined behavior on the assumption that the dynamic type of storage is not int. However, the definition of dynamic type is unclear to me, and the existence of std::aligned_storage would lead me to believe that this is possible.
The code int *ptr = (int *) &storage[0]; *ptr = 0; causes undefined behaviour by violating the strict aliasing rule (C++14 [basic.lval]/10)
The objects being accessed have type char but the glvalue used for the access has type int.
The "dynamic type of the object" for a char is still char. (The dynamic type only differs from the static type in the case of a derived class). C++ does not have any equivalent of C's "effective type" either, which allows typed objects to be "created" by using the assignment operator into malloc'd space.
Regarding correct use of std::aligned_storage, you're supposed to then use placement-new to create an object in the storage. The use of placement-new is considered to end the lifetime of the char (or whatever) objects, and create a new object (of dynamic storage duration) of the specified type, re-using the same storage. Then there will be no strict aliasing violation.
You could do the same thing with the char array, e.g.:
alignas(int) char storage[sizeof(int)];
int *ptr = new(storage) int;
*ptr = 0;
cout << *ptr;
Note that no pseudo-destructor call or delete is required for built-in type int. You would need to do that if using a class type with non-trivial initialization. Link to further reading
The union construct might be useful here.
union is similar to struct, except that all of the elements of a union occupy the same area of storage.
They are, in other words, "different ways to view the same thing," just like FORTRAN's EQUIVALENCE declaration. Thus, for instance:
union {
int foo;
float bar;
char bletch[8];
}
offers three entirely-different ways to consider the same area of storage. (The storage-size of a union is the size of its longest component.) foo, bar, and bletch are all synonyms for the same storage.
union is often used with typedef, as illustrated in this StackOverflow article: C: typedef union.
*ptr = 0;
writes to an int, so it is an access to int, with an lvalue of type int, so that part of the code is fine.
The cast is morally fine, but the C/C++ standard texts don't clearly describe casts, or pointers, or anything fundamental.

Is it legal to use address of one field of a union to access another field?

Consider following code:
union U
{
int a;
float b;
};
int main()
{
U u;
int *p = &u.a;
*(float *)p = 1.0f; // <-- this line
}
We all know that addresses of union fields are usually same, but I'm not sure is it well-defined behavior to do something like this.
So, question is: Is it legal and well-defined behavior to cast and dereference a pointer to union field like in the code above?
P.S. I know that it's more C than C++, but I'm trying to understand if it's legal in C++, not C.
All members of a union must reside at the same address, that is guaranteed by the standard. What you are doing is indeed well-defined behavior, but it shall be noted that you cannot read from an inactive member of a union using the same approach.
Accessing inactive union member - undefined behavior?
Note: Do not use c-style casts, prefer reinterpret_cast in this case.
As long as all you do is write to the other data-member of the union, the behavior is well-defined; but as stated this changes which is considered to be the active member of the union; meaning that you can later only read from that you just wrote to.
union U {
int a;
float b;
};
int main () {
U u;
int *p = &u.a;
reinterpret_cast<float*> (p) = 1.0f; // ok, well-defined
}
Note: There is an exception to the above rule when it comes to layout-compatible types.
The question can be rephrased into the following snippet which is semantically equivalent to a boiled down version of the "problem".
#include <type_traits>
#include <algorithm>
#include <cassert>
int main () {
using union_storage_t = std::aligned_storage<
std::max ( sizeof(int), sizeof(float)),
std::max (alignof(int), alignof(float))
>::type;
union_storage_t u;
int * p1 = reinterpret_cast< int*> (&u);
float * p2 = reinterpret_cast<float*> (p1);
float * p3 = reinterpret_cast<float*> (&u);
assert (p2 == p3); // will never fire
}
What does the Standard (n3797) say?
9.5/1 Unions [class.union]
In a union, at most one of the non-static data members can be
active at any time, that is, the value of at most one of the
non-static dat amembers ca nbe stored in a union at any time.
[...] The size of a union is sufficient to contain the largest of
its non-static data members. Each non-static data member is
allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
Note: The wording in C++11 (n3337) was underspecified, even though the intent has always been that of C++14.
Yes, it is legal. Using explicit casts, you can do almost anything.
As other comments have stated, all members in a union start at the same address / location so casting a pointer to a different member is pointless.
The assembly language will be the same. You want to make the code easy to read so I don't recommend the practice. It is confusing and there is no benefit.
Also, I recommend a "type" field so that you know when the data is in float format versus int format.

Generic char[] based storage and avoiding strict-aliasing related UB

I'm trying to build a class template that packs a bunch of types in a suitably large char array, and allows access to the data as individual correctly typed references. Now, according to the standard this can lead to strict-aliasing violation, and hence undefined behavior, as we're accessing the char[] data via an object that is not compatible with it. Specifically, the standard states:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Given the wording of the highlighted bullet point, I came up with the following alias_cast idea:
#include <iostream>
#include <type_traits>
template <typename T>
T alias_cast(void *p) {
typedef typename std::remove_reference<T>::type BaseType;
union UT {
BaseType t;
};
return reinterpret_cast<UT*>(p)->t;
}
template <typename T, typename U>
class Data {
union {
long align_;
char data_[sizeof(T) + sizeof(U)];
};
public:
Data(T t = T(), U u = U()) { first() = t; second() = u; }
T& first() { return alias_cast<T&>(data_); }
U& second() { return alias_cast<U&>(data_ + sizeof(T)); }
};
int main() {
Data<int, unsigned short> test;
test.first() = 0xdead;
test.second() = 0xbeef;
std::cout << test.first() << ", " << test.second() << "\n";
return 0;
}
(The above test code, especially the Data class is just a dumbed-down demonstration of the idea, so please don't point out how I should use std::pair or std::tuple. The alias_cast template should also be extended to handle cv qualified types and it can only be safely used if the alignment requirements are met, but I hope this snippet is enough to demonstrate the idea.)
This trick silences the warnings by g++ (when compiled with g++ -std=c++11 -Wall -Wextra -O2 -fstrict-aliasing -Wstrict-aliasing), and the code works, but is this really a valid way of telling the compiler to skip strict-aliasing based optimizations?
If it's not valid, then how would one go about implementing a char array based generic storage class like this without violating the aliasing rules?
Edit:
replacing the alias_cast with a simple reinterpret_cast like this:
T& first() { return reinterpret_cast<T&>(*(data_ + 0)); }
U& second() { return reinterpret_cast<U&>(*(data_ + sizeof(T))); }
produces the following warning when compiled with g++:
aliastest-so-1.cpp: In instantiation of ‘T& Data::first() [with
T = int; U = short unsigned int]’: aliastest-so-1.cpp:28:16:
required from here aliastest-so-1.cpp:21:58: warning: dereferencing
type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
Using a union is almost never a good idea if you want to stick with strict conformance, they have stringent rules when it comes to reading the active member (and this one only). Although it has to be said that implementations like to use unions as hooks for reliable behaviour, and perhaps that is what you are after. If that is the case I defer to Mike Acton who has written a nice (and long) article on aliasing rules, where he does comment on casting through a union.
To the best of my knowledge this is how you should deal with arrays of char types as storage:
// char or unsigned char are both acceptable
alignas(alignof(T)) unsigned char storage[sizeof(T)];
::new (&storage) T;
T* p = static_cast<T*>(static_cast<void*>(&storage));
The reason this is defined to work is that T is the dynamic type of the object here. The storage was reused when the new expression created the T object, which operation implicitly ended the lifetime of storage (which happens trivially as unsigned char is a, well, trivial type).
You can still use e.g. storage[0] to read the bytes of the object as this is reading the object value through a glvalue of unsigned char type, one of the listed explicit exceptions. If on the other hand storage were of a different yet still trivial element type, you could still make the above snippet work but would not be able to do storage[0].
The final piece to make the snippet sensible is the pointer conversion. Note that reinterpret_cast is not suitable in the general case. It can be valid given that T is standard-layout (there are additional restrictions on alignment, too), but if that is the case then using reinterpret_cast would be equivalent to static_casting via void like I did. It makes more sense to use that form directly in the first place, especially considering the use of storage happens a lot in generic contexts. In any case converting to and from void is one of the standard conversions (with a well-defined meaning), and you want static_cast for those.
If you are worried at all about the pointer conversions (which is the weakest link in my opinion, and not the argument about storage reuse), then an alternative is to do
T* p = ::new (&storage) T;
which costs an additional pointer in storage if you want to keep track of it.
I heartily recommend the use of std::aligned_storage.

Container covariance in C++

I know that C++ doesn't support covariance for containers elements, as in Java or C#. So the following code probably is undefined behavior:
#include <vector>
struct A {};
struct B : A {};
std::vector<B*> test;
std::vector<A*>* foo = reinterpret_cast<std::vector<A*>*>(&test);
Not surprisingly, I received downvotes when suggesting this a solution to another question.
But what part of the C++ standard exactly tells me that this will result in undefined behavior? It's guaranteed that both std::vector<A*> and std::vector<B*> store their pointers in a continguous block of memory. It's also guaranteed that sizeof(A*) == sizeof(B*). Finally, A* a = new B is perfectly legal.
So what bad spirits in the standard did I conjure (except style)?
The rule violated here is documented in C++03 3.10/15 [basic.lval], which specifies what is referred to informally as the "strict aliasing rule"
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
In short, given an object, you are only allowed to access that object via an expression that has one of the types in the list. For a class-type object that has no base classes, like std::vector<T>, basically you are limited to the types named in the first, second, and last bullets.
std::vector<Base*> and std::vector<Derived*> are entirely unrelated types and you can't use an object of type std::vector<Base*> as if it were a std::vector<Derived*>. The compiler could do all sorts of things if you violate this rule, including:
perform different optimizations on one than on the other, or
lay out the internal members of one differently, or
perform optimizations assuming that a std::vector<Base*>* can never refer to the same object as a std::vector<Derived*>*
use runtime checks to ensure that you aren't violating the strict aliasing rule
[It might also do none of these things and it might "work," but there's no guarantee that it will "work" and if you change compilers or compiler versions or compilation settings, it might all stop "working." I use the scare-quotes for a reason here. :-)]
Even if you just had a Base*[N] you could not use that array as if it were a Derived*[N] (though in that case, the use would probably be safer, where "safer" means "still undefined but less likely to get you into trouble).
You are invoking the bad spirit of reinterpret_cast<>.
Unless you really know what you do (I mean not proudly and not pedantically) reinterpret_cast is one of the gates of evil.
The only safe use I know of is managing classes and structures between C++ and C functions calls.
There maybe some others however.
The general problem with covariance in containers is the following:
Let's say your cast would work and be legal (it isn't but let's assume it is for the following example):
#include <vector>
struct A {};
struct B : A { public: int Method(int x, int z); };
struct C : A { public: bool Method(char y); };
std::vector<B*> test;
std::vector<A*>* foo = reinterpret_cast<std::vector<A*>*>(&test);
foo->push_back(new C);
test[0]->Method(7, 99); // What should happen here???
So you have also reinterpret-casted a C* to a B*...
Actually I don't know how .NET and Java manage this (I think they throw an exception when trying to insert a C).
I think it'll be easier to show than tell:
struct A { int a; };
struct Stranger { int a; };
struct B: Stranger, A {};
int main(int argc, char* argv[])
{
B someObject;
B* b = &someObject;
A* correct = b;
A* incorrect = reinterpret_cast<A*>(b);
assert(correct != incorrect); // troubling, isn't it ?
return 0;
}
The (specific) issue showed here is that when doing a "proper" conversion, the compiler adds some pointer ajdustement depending on the memory layout of the objects. On a reinterpret_cast, no adjustement is performed.
I suppose you'll understand why the use of reinterpet_cast should normally be banned from the code...