I have used unions comfortably before; today I was alarmed when I read this post and came to know that this code
union ARGB
{
uint32_t colour;
struct componentsTag
{
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t a;
} components;
} pixel;
pixel.colour = 0xff040201; // ARGB::colour is the active member from now on
// somewhere down the line, without any edit to pixel
if(pixel.components.a) // accessing the non-active member ARGB::components
is actually undefined behaviour, i.e. reading from a member of the union other than the one most recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can someone please explain this elaborately?
Update:
I wanted to clarify a few things in hindsight.
The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
After scouring through C++11's standard, I couldn't conclusively say that it calls out accessing/inspecting a non-active union member as undefined/unspecified/implementation-defined. All I could find was §9.5/1:
If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members.
and §9.2/19:
Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
While in C (C99 TC3 - DR 283 onwards) it's legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting to do it can still lead to undefined behavior, if the value read happens to be invalid (a so-called "trap representation") for the type it is read through. Otherwise, the value read is implementation-defined.
C89/90 called this out under unspecified behavior (Annex J), and K&R's book says it's implementation-defined. Quote from K&R:
This is the purpose of a union - a single variable that can legitimately hold any one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.
Extract from Stroustrup's TC++PL (emphasis mine)
Use of unions can be essential for compactness of data [...] sometimes misused for "type conversion".
Above all, this question (whose title remains unchanged since I asked it) was posed with the intention of understanding the purpose of unions, not what the standard allows. For example, using inheritance for code reuse is, of course, allowed by the C++ standard, but that wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is the reason Andrey's answer continues to remain the accepted one.
The purpose of unions is rather obvious, but for some reason people miss it quite often.
The purpose of a union is to save memory by using the same memory region for storing different objects at different times. That's it.
It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.
That's exactly what a union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into another member you switch the "active" status to that other member.
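To make that concrete, here is a minimal sketch (the names are mine, not from the question):
#include <cstdint>

union Scratch {
    std::int64_t id;    // needed only during an early phase
    double       ratio; // needed only later, after id is done with
};

void example() {
    Scratch s;
    s.id = 42;      // "id" becomes the active member; only it may be read now
    // ... work with s.id, then stop needing it ...
    s.ratio = 0.75; // writing "ratio" makes it the new active member
    // The two values never need to exist at the same time, so one object suffices.
}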
For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It generally leads to undefined behavior (it is described as producing implementation-defined behavior in C89/90).
EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:
struct VAROBJECT
{
enum o_t { Int, Double, String } objectType;
union
{
int intValue;
double dblValue;
char *strValue;
} value;
} object;
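A hedged usage sketch (the printObject helper is mine, written as C++): the point is that objectType is always consulted before value, so only the member that was last written is ever read:
#include <cstdio>

void printObject(const VAROBJECT& obj) {
    switch (obj.objectType) {
    case VAROBJECT::Int:    std::printf("%d\n", obj.value.intValue); break;
    case VAROBJECT::Double: std::printf("%f\n", obj.value.dblValue); break;
    case VAROBJECT::String: std::printf("%s\n", obj.value.strValue); break;
    }
}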
The behavior is undefined from the language point of view. Consider that different platforms can have different constraints in memory alignment and endianness. The code on a big-endian versus a little-endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...), limiting its usefulness.
If you are using C++ (the question carries both tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t and sets the fields appropriately through bitmask operations. The same can be done in C with a function.
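For illustration, a hedged sketch of such a setter (the struct and function names are mine):
#include <cstdint>

struct ARGB {
    std::uint8_t a, r, g, b;
};

// Decode a packed 0xAARRGGBB value into its components with shifts and masks.
// Unlike the union trick, this does not depend on the machine's endianness.
void setFromPacked(ARGB& px, std::uint32_t colour) {
    px.a = static_cast<std::uint8_t>((colour >> 24) & 0xFF);
    px.r = static_cast<std::uint8_t>((colour >> 16) & 0xFF);
    px.g = static_cast<std::uint8_t>((colour >> 8) & 0xFF);
    px.b = static_cast<std::uint8_t>(colour & 0xFF);
}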
Edit: I was expecting AProgrammer to write down an answer for me to upvote and close this one. As some comments have pointed out, endianness is dealt with in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are an important point here. The compiler is allowed to make assumptions about the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each colour component ahead of the write to the colour variable.
The most common use of union I regularly come across is aliasing.
Consider the following:
union Vector3f
{
    struct { float x, y, z; }; // anonymous struct member: a common compiler extension
    float elts[3];
};
What does this do? It allows clean, neat access to the members of a Vector3f vec; either by name:
vec.x=vec.y=vec.z=1.f ;
or by integer access into the array
for( int i = 0 ; i < 3 ; i++ )
vec.elts[i]=1.f;
In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.
union A {
int i;
double d;
};
A a[10]; // records in "a" can be either ints or doubles
a[0].i = 42;
a[1].d = 1.23;
Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++03 unions are not much use because they can only contain POD types - effectively, those without constructors and destructors. (C++11 relaxed this restriction, but you then have to manage construction and destruction of the members yourself.)
In C it was a nice way to implement something like a variant.
enum possibleTypes {
    eInt,
    eDouble,
    eChar
};

struct Value {
    union {
        int    iVal_;
        double dval;
        char   cVal;
    } value_;
    enum possibleTypes discriminator_;
};

switch (val.discriminator_)
{
case eInt:    /* use val.value_.iVal_ */ break;
case eDouble: /* use val.value_.dval  */ break;
case eChar:   /* use val.value_.cVal  */ break;
}
In times of little memory, this structure uses less memory than a struct that has all the members.
By the way C provides
typedef struct {
unsigned int mantissa_low:32; //mantissa
unsigned int mantissa_high:20;
unsigned int exponent:11; //exponent
unsigned int sign:1;
} realVal;
to access bit values.
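A hedged illustration of how such a struct is typically paired with the value it describes, assuming the realVal definition above (bit-field layout, and reading a union member other than the one last written, are implementation-defined, so the output depends on compiler and platform):
#include <cstdio>

union DoubleBits {
    double  d;
    realVal bits;
};

void show(double value) {
    DoubleBits u;
    u.d = value;
    // On a typical little-endian IEEE-754 platform, -1.5 prints sign=1 exp=1023.
    std::printf("sign=%u exp=%u\n",
                (unsigned)u.bits.sign, (unsigned)u.bits.exponent);
}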
Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type punning through pointer casts, which may well generate broken code with some compilers.
In C++, Boost.Variant implements a safe version of the union, designed to prevent undefined behavior as much as possible.
Its performance is identical to the enum + union construct (stack allocated too, etc.), but it uses a template list of types instead of the enum :)
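For reference, the same idea is now in the standard library as std::variant (C++17); a minimal sketch:
#include <iostream>
#include <string>
#include <variant>

// The discriminator is managed for you; reading the wrong alternative throws
// std::bad_variant_access instead of being undefined behaviour.
using Value = std::variant<int, double, std::string>;

int main() {
    Value v = 3.14;
    std::cout << std::get<double>(v) << '\n'; // double is the held alternative
    v = std::string("hello");
    std::cout << std::get<std::string>(v) << '\n';
}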
The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but may have different defaults. The defaults will also change depending on the optimisation settings used.
Also, unions are not just for saving space. They can help modern compilers with type punning: if you reinterpret_cast<> everything, the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient these days compared to CPU clock speed).
Technically it's undefined, but in reality most (all?) compilers treat it exactly the same as using a reinterpret_cast from one type to the other, the result of which is implementation defined. I wouldn't lose sleep over your current code.
For one more example of the actual use of unions, the CORBA framework serializes objects using the tagged union approach. All user-defined classes are members of one (huge) union, and an integer identifier tells the demarshaller how to interpret the union.
Others have mentioned the architecture differences (little - big endian).
I read the problem as: since the memory for the members is shared, writing to one changes the others, and, depending on their type, the value read could be meaningless.
eg.
union{
float f;
int i;
} x;
Writing to x.i would be meaningless if you then read from x.f - unless that is what you intended in order to look at the sign, exponent or mantissa components of the float.
I think there is also an issue of alignment: If some variables must be word aligned then you might not get the expected result.
eg.
union{
char c[4];
int i;
} x;
If, hypothetically, on some machine a char had to be word aligned then c[0] and c[1] would share storage with i but not c[2] and c[3].
In the C language as it was documented in 1974, all structure members shared a common namespace, and the meaning of "ptr->member" was defined as adding the member's displacement to "ptr" and accessing the resulting address using the member's type. This design made it possible to use the same ptr with member names taken from different structure definitions but with the same offset; programmers used that ability for a variety of purposes.
When structure members were assigned their own namespaces, it became impossible to declare two structure members with the same displacement. Adding unions to the language made it possible to achieve the same semantics that had been available in earlier versions of the language (though the inability to have names exported to an enclosing context may still have necessitated a find/replace to turn foo->member into foo->type1.member). What was important was not so much that the people who added unions had any particular target usage in mind, but rather that they provided a means by which programmers who had relied upon the earlier semantics, for whatever purpose, would still be able to achieve the same semantics, even if they had to use a different syntax to do it.
As others have mentioned, unions combined with enumerations and wrapped into structs can be used to implement tagged unions. One practical use is to implement an equivalent of Rust's Result<T, E>, which in Rust is a plain enum (Rust enums can hold additional data in their variants). Here is a C++ example:
#include <cstdint>

template <typename T, typename E> struct Result {
public:
    enum class Success : uint8_t { Ok, Err };
    Result(T val) {
        m_success = Success::Ok;
        m_value.ok = val;
    }
    Result(E val) {
        m_success = Success::Err;
        m_value.err = val;
    }
    inline bool operator==(const Result& other) {
        return other.m_success == this->m_success;
    }
    inline bool operator!=(const Result& other) {
        return other.m_success != this->m_success;
    }
    inline T expect(const char* errorMsg) {
        if (m_success == Success::Err) throw errorMsg;
        else return m_value.ok;
    }
    inline bool is_ok() {
        return m_success == Success::Ok;
    }
    inline bool is_err() {
        return m_success == Success::Err;
    }
    inline const T* ok() {
        if (is_ok()) return &m_value.ok;
        else return nullptr;
    }
    inline const E* err() {
        if (is_err()) return &m_value.err;
        else return nullptr;
    }
    // Other methods from https://doc.rust-lang.org/std/result/enum.Result.html
private:
    Success m_success;
    union _val_t { T ok; E err; } m_value;
};
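A hedged usage sketch of the class above (the divide helper is mine; note that with this simple union member, T and E should be kept to trivially copyable types such as int or const char*):
#include <iostream>

// Hypothetical helper: division that reports an error message on divide-by-zero.
Result<int, const char*> divide(int a, int b) {
    if (b == 0) return Result<int, const char*>("division by zero");
    return Result<int, const char*>(a / b);
}

int main() {
    auto r = divide(10, 2);
    if (r.is_ok()) std::cout << r.expect("unreachable") << '\n'; // prints 5
}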
You can use a union for two main reasons:
A handy way to access the same data in different ways, like in your example
A way to save space when there are different data members of which only one can ever be 'active'
1. is really more of a C-style hack to short-cut writing code on the basis that you know how the target system's memory architecture works. As already said, you can normally get away with it if you don't actually target lots of different platforms. I believe some compilers might let you use packing directives also (I know they do on structs).
A good example of 2. can be found in the VARIANT type used extensively in COM.
@bobobobo's code is correct, as @Joshua pointed out (sadly I'm not allowed to add comments, so doing it here; IMO it was a bad decision to disallow that in the first place):
https://en.cppreference.com/w/cpp/language/data_members#Standard_layout says that it is fine to do so, at least since C++14:
In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2 (except that reading a volatile member through non-volatile glvalue is undefined).
since in the current case T1 and T2 denote the same type anyway.
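A hedged illustration of that rule (the types are mine):
#include <cassert>

struct Celsius    { double value; };
struct Fahrenheit { double value; };

union Temperature {
    Celsius    c;
    Fahrenheit f;
};

void example() {
    Temperature t;
    t.c = Celsius{100.0};   // Celsius becomes the active member
    // Reading f.value is permitted: both structs are standard-layout and
    // "value" belongs to their common initial sequence.
    assert(t.f.value == 100.0);
}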
Related
Here's a little puzzle I couldn't find a good answer for:
Given a struct with bitfields, such as
struct A {
unsigned foo:13;
unsigned bar:19;
};
Is there a (portable) way in C++ to get the correct mask for one of the bitfields, preferably as a compile-time constant function or template?
Something like this:
constinit unsigned mask = getmask<A::bar>(); // mask should be 0xFFFFE000
In theory, at runtime, I could crudely do:
unsigned getmask_bar() {
union AA {
unsigned mask;
A fields;
} aa{};
aa.fields.bar -= 1;
return aa.mask;
}
That could even be wrapped in a macro (yuck!) to make it "generic".
But I guess you can readily see the various deficiencies of this method.
Is there a nicer, generic C++ way of doing it? Or even a not-so-nice way? Is there something useful coming up for the next C++ standard(s)? Reflection?
Edit: Let me add that I am trying to find a way of making bitfield manipulation more flexible, so that it is up to the programmer to modify multiple fields at the same time using masking. I am after terse notation, so that things can be expressed concisely without lots of boilerplate. Think working with hardware registers in I/O drivers as a use case.
Unfortunately, there is no better way - in fact, there is no portable way to extract individual bit-fields from a struct by inspecting its memory directly in C++.
From Cppreference:
The following properties of bit-fields are implementation-defined:
The value that results from assigning or initializing a signed bit-field with a value out of range, or from incrementing a signed bit-field past its range.
Everything about the actual allocation details of bit-fields within the class object
For example, on some platforms, bit-fields don't straddle bytes, on others they do
Also, on some platforms, bit-fields are packed left-to-right, on others right-to-left
Your compiler might give you stronger guarantees; however, if you do rely on the behavior of a specific compiler, you can't expect your code to work with a different compiler/architecture pair. GCC doesn't even document their bit field packing, as far as I can tell, and it differs from one architecture to the next. So your code might work on a specific version of GCC on x86-64 but break on literally everything else, including other versions of the same compiler.
If you really want to be able to extract bitfields from a random structure in a generic way, your best bet is to pass a function pointer around (instead of a mask); that way, the function can access the field in a safe way and return the value to its caller (or set a value instead).
Something like this:
template<typename T>
auto extractThatBitField(const void *ptr) {
return static_cast<const T *>(ptr)->m_thatBitField;
}
auto *extractor1 = &extractThatBitField<Type1>;
auto *extractor2 = &extractThatBitField<Type2>;
/* ... */
Now, if you have a pair of {pointer, extractor}, you can get the value of the bitfield safely. (Of course, the extractor function has to match the type of the object behind that pointer.) It's not much overhead compared to having a {pointer, mask} pair instead; the function pointer is maybe 4 bytes larger than the mask on a 64-bit machine (if at all). The extractor function itself will just be a memory load, some bit twiddling, and a return instruction. It'll still be super fast.
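A hypothetical sketch of such a pair (this assumes the bit-field reads back as an int, so extractThatBitField<T> deduces int and matches this function-pointer type):
// Pair a type-erased pointer with the extractor that matches its real type.
struct FieldRef {
    const void *object;
    int (*get)(const void *);
};

int readField(const FieldRef &ref) {
    return ref.get(ref.object); // safe: the extractor casts back to the correct type
}
// e.g. FieldRef ref{ &someType1Object, extractor1 };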
This is portable and supported by the C++ standard, unlike inspecting the bits of a bitfield directly.
Alternatively, C++ allows casting between standard-layout structs that have common initial members. (Though keep in mind that this falls apart as soon as inheritance or private/protected members get involved! The first solution, above, works for all those cases as well.)
struct Common {
int m_a : 13;
int m_b : 19;
int : 0; //Needed to ensure the bit fields end on a byte boundary
};
struct Type1 {
int m_a : 13;
int m_b : 19;
int : 0;
Whatever m_whatever;
};
struct Type2 {
int m_a : 13;
int m_b : 19;
int : 0;
Something m_something;
};
int getFieldA(const void *ptr) {
//We still can't do type punning directly due
//to weirdness in various compilers' aliasing resolution.
//std::memcpy is the official way to do type punning.
//This won't compile to an actual memcpy call.
Common tmp;
std::memcpy(&tmp, ptr, sizeof(Common));
return tmp.m_a;
}
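A hedged usage sketch, assuming the placeholder Whatever and Something members are real, default-constructible types:
void demo() {
    Type1 t1{};
    t1.m_a = 5;
    Type2 t2{};
    t2.m_a = -7;
    // Both calls read through the Common layout, regardless of the trailing members.
    int a1 = getFieldA(&t1); // 5
    int a2 = getFieldA(&t2); // -7
}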
See also: Can memcpy be used for type punning?
In C++, it's well-defined to read from a union member which was most recently written, aka the active union member.
My question is whether std::memcpying a whole union object, as opposed to copying a particular union member, to an uninitialized memory area will preserve the active union member.
union A {
int x;
char y[4];
};
A a;
a.y[0] = 'U';
a.y[1] = 'B';
a.y[2] = '?';
a.y[3] = '\0';
std::byte buf[sizeof(A)];
std::memcpy(buf, &a, sizeof(A));
A& a2 = *reinterpret_cast<A*>(buf);
std::cout << a2.y << '\n'; // is `A::y` the active member of `a2`?
The assignments you have are okay, because the assignment to the non-class member a.y "begins its lifetime". However, your std::memcpy does not do that, so any access to the members of a2 is invalid. So, technically, you are relying on the consequences of undefined behaviour. In practice, most toolchains are rather lax about aliasing and the lifetime of primitive-type union members.
Unfortunately, there's more UB here, in that you're violating aliasing for the union itself: you can pretend that a T is a bunch of bytes, but you can't pretend that a bunch of bytes is a T, no matter how much reinterpret_casting you do. You could instantiate an A a2 normally and std::copy/std::memcpy over it from a, and then you're just back to the union member lifetime problem, if you care about that. But, I guess, if this option were open to you, you'd just be writing A a2 = a in the first place…
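A sketch of that alternative, reusing a from the question:
A a2;                            // a real A object, so no aliasing violation
std::memcpy(&a2, &a, sizeof(A)); // copies the object representation bytewise
std::cout << a2.y << '\n';       // the member-lifetime caveat above still applies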
My question is whether std::memcpying a whole union object, as opposed to copying a particular union member, to an uninitialized memory area will preserve the active union member.
It'll be copied as expected.
It's how you're reading the result that may, or may not, make your program have undefined behavior.
Using std::memcpy copies chars from a source to a destination. Raw memory copying is ok. Reading from memory as something it wasn't initialized to be is not ok.
So far as I can tell, the C++ Standard makes no distinction between the following two functions on platforms where the sizes of int and foo happen to be identical [as would typically be the case]:
struct s1 { int x; };
struct s2 { int x; };
union foo { s1 a; s2 b; } u1, u2;
void test1(void)
{
u1.a.x = 1;
u2.b.x = 2;
std::memcpy(&u1, &u2, sizeof u1);
}
void test2(void)
{
u1.a.x = 1;
u2.b.x = 2;
std::memcpy(&u1.a.x, &u2.b.x, sizeof u1.a.x);
}
If a union of trivially-copyable types is a trivially-copyable type, that would suggest that the active member of u1 after the memcpy in test1 should be b. In the equivalent function test2, however, copying all the bytes from an int object into one that's part of the active union member u1.a should leave the active union member as a.
IMHO, this issue could be easily resolved by recognizing that a union may have multiple "potentially active" members, and allowing certain actions to be performed on any member that is at least potentially active (rather than limiting them to one particular active member). That would, among other things, allow the Common Initial Sequence rule to be made much clearer and more useful, without unduly inhibiting optimizations, by specifying that the act of taking the address of a union member makes it "at least potentially" active until the next time the union is written via non-character-based access, and by providing that common-initial-sequence inspection or bytewise writing of a potentially active union member is allowed, but does not change the active member.
Unfortunately, when the Standard was first written, there was no effort to explore all of the relevant corner cases, much less reach a consensus about how they should be handled. At the time, I don't think there would have been opposition to the idea of officially accommodating multiple potentially-active members, since most compiler designs would naturally accommodate that without difficulty. Unfortunately, some compilers have evolved in a way that would make support for such constructs more difficult than it would have been if accommodated from the start, and their maintainers would block any changes that would contradict their design decisions, even if the Standard was never intended to allow such decisions in the first place.
Before I answer your question, I think your code should add this:
static_assert(std::is_trivial<A>());
Because in order to keep compatibility with C, trivial types get extra guarantees. For example, the requirement of running an object's constructor before using it (See https://eel.is/c++draft/class.cdtor) applies only to an object whose constructor is not trivial.
Because your union is trivial, your code is fine up to and including the memcpy. Where you run into trouble is *reinterpret_cast<A*>(buf);
Specifically, you are using an A object before its lifetime has begun.
As stated in https://eel.is/c++draft/basic.life , lifetime begins when storage with proper alignment and size for the type has been obtained, and its initialization is complete. A trivial type has "vacuous" initialization, so no problem there, however the storage is a problem.
When your example gets storage for buf,
std::byte buf[sizeof(A)];
It does not obtain proper alignment for the type. You'd need to change that line to:
alignas(A) std::byte buf[sizeof(A)];
I understand there are HW platforms where you need more information to point to a char than to point to an int (the platform having non-addressable bytes, so a pointer to char needs to store a pointer to a word plus the index of a byte within the word). So it is possible that sizeof(int*) < sizeof(char*) on such platforms.
Can something similar happen with pointers to non-union classes? C++ allows covariant returns types on virtual functions. Let's say we have classes like this:
struct Gadget
{
// some content
};
struct Widget
{
virtual Gadget* getGadget();
};
Any code which calls getGadget() has to work when receiving a Gadget*, but the same code (actually the same compiled binary code) has to work when it receives a pointer to a type derived from Gadget as well (perhaps one which is defined in a totally different library). The only way I can reasonably see this happening is sizeof(T*) == sizeof(U*) for all non-union class types T and U.
So my question is, given one particular practical compiler (that excludes hypothetical Hell++) on one particular platform, is it reasonable to expect that all pointers to non-union class types will be of the same size? Or is there a practical reason why a compiler might want to use different sizes while remaining compliant with covariant return types?
On platforms where different "levels" of pointer exist (such as __near and __far), assume the same attribute applied to both.
C has a hard requirement that all pointers to all structure types have the same representation and alignment.
6.2.5 Types
27 [...] All pointers to structure types shall have the same representation and alignment requirements as each other. [...]
C++ effectively requires binary compatibility with a C implementation, because of the standard's requirements for extern "C", so indirectly, this requires all pointers to structure types that are valid in C (POD types, pretty much) to have the same representation and alignment in C++ too.
No such requirement seems to have been made for non-POD types, so an implementation would be allowed to use different pointer sizes in that case. You suggest that that cannot work, but to follow your example,
struct G { };
struct H : G { };
struct W
{
virtual G* f() { ... }
};
struct X : W
{
virtual H* f() { ... }
};
could be translated to (pseudo-code)
struct W
{
virtual G* f() { ... }
};
struct X : W
{
override G* f() { ... }
inline H* __X_f() { return static_cast<H *>(f()); }
};
which would still match the requirements of the language.
A valid reason why two pointers to structure types might not be identical is when a C++ compiler is ported to a platform that has an existing C compiler with a poorly-designed ABI. G is a POD type, so G* needs to be exactly what it is in C. H is not a POD type, so H* does not need to match any C type.
For alignment, that can actually happen: something that really happened is that the x86-32 ABI on a typical GNU/Linux system requires 64-bit integer types to be 32-bit aligned, even though the processor's preferred alignment is actually 64-bit. Now comes another implementer, and they decide that they do want to require 64-bit alignment, but are stuck if they want to remain compatible with the existing implementation.
For sizes, I cannot think of a reasonable scenario in which it would happen, but I am unsure whether that might be a lack of imagination on my part.
As far as I understand, C and C++ assume memory to be linearly byte-addressable. Certain platforms (early ARM), however, insist on word-aligned loads and stores. In such a case, it is the compiler's responsibility to round the pointer down to the word boundary and then perform the necessary bit-shift operations when fetching, say, a char.
But since this is all done only on loads and stores, all pointers still look the same.
I get this warning. I would like defined behaviour, but I would like to keep this code as it is. When may I break aliasing rules?
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
String is my own string, which is a POD. This code is called from C. s may be an int. String is pretty much struct String { RealString* s; }, but templated and with helper functions. I do a static assert to make sure String is a POD, is 4 bytes and that int is 4 bytes. I also wrote an assert which checks that all pointers are >= NotAPtr. It's in my new/malloc overload. I may put that assert in String as well, if you suggest it.
Considering the rules I am following (mainly that String is a POD and always the same size as int), would it be fine if I break the aliasing rules? Is this one of the few cases where breaking them is done right?
void func(String s) {
auto v=*(unsigned int*)&s;
myassert(v);
if(v < NotAPtr) {
//v is an int
}
else{
//v is a ptr
}
}
memcpy is fully supported. So is punning to a char* (you can then use std::copy, for instance).
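For instance, the cast in the question can be replaced with a memcpy while the rest of the code stays as it is - a sketch, reusing the question's String, myassert and NotAPtr:
#include <cstring>

void func(String s) {
    unsigned int v;
    static_assert(sizeof v == sizeof s, "String must be exactly int-sized");
    std::memcpy(&v, &s, sizeof v); // well-defined: copies the object representation
    myassert(v);
    if (v < NotAPtr) {
        // v is an int
    } else {
        // v is a ptr
    }
}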
If you cannot change the code to the two functions as proposed, why not do the following (requires uintptr_t from C99 - for older MSVC you need to define it yourself; 2008/2010 should be OK):
#include <cassert>
#include <cstdint>

void f(RealString *s) {
    uintptr_t v = reinterpret_cast<uintptr_t>(s);
    assert(v);
}
The Standard specifies a minimal set of actions that all conforming implementations must process in predictable fashion unless they encounter translation limits (whereupon all bets are off). It does not attempt to define all of the actions that an implementation must support to be suitable for any particular purpose. Instead, support for actions beyond those mandated is treated as a Quality of Implementation issue. The authors acknowledge that an implementation could be conforming and yet be of such poor quality as to be useless.
Code such as yours should be usable for quality implementations that are intended for low-level programming, and which represent things in memory in the expected fashion. It should not be expected to be usable on other kinds of implementations, including those which interpret "quality of implementation" issues as an invitation to try to behave in poor-quality-but-conforming fashion.
The safe way of treating a variable as two different types is to turn it into a union. One part of the union can be your pointer, the other part an integer.
struct String
{
union
{
RealString*s;
int i;
};
};
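Usage would then look roughly like this (a sketch, reusing the question's myassert and NotAPtr):
void func(String s) {
    myassert(s.i);
    if (s.i < NotAPtr) {
        // the value stored was an int
    } else {
        // the value stored was a pointer; use s.s
    }
}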
If I have the following struct:
struct Foo { int a; };
Is the code below conforming with the C++ Standard? I mean, can't it generate undefined behavior?
Foo foo;
int ifoo;
foo = *reinterpret_cast<Foo*>(&ifoo);
void bar(int value);
bar(*reinterpret_cast<int*>(&foo));
auto fptr = static_cast<void(*)(...)>(&bar);
fptr(foo);
9.2/20 in N3290 says
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
And your Foo is a standard-layout class.
So your second cast is correct.
I see no guarantee that the first one is correct (and I've used an architecture where a char had weaker alignment restrictions than a struct containing just a char; on such an architecture, it would be problematic). What the standard guarantees is that if you have a pointer to int which really points to the first element of a struct, you can reinterpret_cast it back to a pointer to the struct.
Likewise, I see nothing which would make your third one defined if it were a reinterpret_cast (I'm pretty sure that some ABIs use different conventions to pass structs and basic types, so it is highly suspicious and I'd need a clear mention in the standard to accept it), and I'm quite sure that nothing allows a static_cast between pointers to functions.
As long as you access only the first element of a struct, it's considered to be safe, since there's no padding before the first member of a struct. In fact, this trick is used, for example, in the Objective-C runtime, where a generic pointer type is defined as:
typedef struct objc_object {
Class isa;
} *id;
and at runtime, real objects (which are still bare struct pointers) have memory layouts like this:
struct {
Class isa;
int x; // random other data as instance variables
} *CustomObject;
and the runtime accesses the class of an actual object using this method.
Foo is a plain-old-data structure, which means it contains nothing but the data you explicitly store in it. In this case: an int.
Thus the memory layout for an int and Foo are the same.
You can typecast from one to the other without problems. Whether it's a clever idea to use this kind of stuff is a different question.
PS:
This usually works, but not necessarily, due to possibly different alignment restrictions. See AProgrammer's answer.