Is std::byte well defined?

C++17 introduces the std::byte type: a library type that can (supposedly) be used to access raw memory, but which stands separate from the character types and represents a mere lump of bits.
So far so good. But the definition has me slightly worried. As given in [cstddef.syn]:
enum class byte : unsigned char {};
I have seen two answers on SO which seem to imply different things about the robustness of the above. This answer argues (without reference) that an enumeration with an underlying type has the same size and alignment requirements as said type. Intuitively this seems correct, since specifying an underlying type allows for opaque enum declarations.
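(To illustrate that last point, this is the kind of declaration that is only legal because the underlying type is fixed, which is what makes the size/alignment intuition feel plausible; the names are mine, purely for illustration:)
enum class colour : unsigned char;   // opaque declaration: no enumerators yet
colour* make_palette();              // the type is already usable here
enum class colour : unsigned char { red, green, blue };   // later completed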
However, this answer argues that the standard only guarantees that two enumerations with the same underlying type are layout compatible, and no more.
When reading [dcl.enum] I couldn't help but notice that indeed, the underlying type is only used to specify the range of the enumerators. There is no mention of size or alignment requirements.
What am I missing?

Essentially there is special wording all around the C++17 draft standard that gives std::byte the same properties with regard to aliasing as char and unsigned char.
To give you an example, §6.10 in N4659 states
8 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined.
[...]
(8.8) — a char, unsigned char, or std::byte type.
I didn't do an exhaustive search, but essentially anywhere that char gets special treatment in the standard, the same is given to std::byte. As far as accessing memory is concerned, it seems irrelevant that it is defined as an enum or what its underlying type is.
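For example (a sketch of my own, not wording from the standard), it is this special treatment that makes inspecting an object's representation through std::byte glvalues well defined, exactly as it would be through unsigned char:
#include <cstddef>

// Count the non-zero bytes in the representation of an int by reading it
// through std::byte glvalues, which (8.8) above explicitly permits.
unsigned count_nonzero_bytes(const int& x)
{
    auto p = reinterpret_cast<const std::byte*>(&x);
    unsigned n = 0;
    for (std::size_t i = 0; i < sizeof x; ++i)
        if (p[i] != std::byte{0})
            ++n;
    return n;
}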
EDIT
Maybe I misunderstood your question: if you are asking whether the standard guarantees that sizeof(std::byte) == alignof(std::byte) == 1, then I believe this is not the case, as there seems to be no wording about how those properties depend on the underlying type of a scoped enum, and I couldn't find special wording for std::byte in that regard. As @T.C. mentions in the comments, this is probably a defect in the language.

(Documenting the comments made by @T.C. that ultimately answer my question)
(I will remove this if T.C. ever wishes to reformulate his own answer.)
Oddly enough, N2213 had wording that guarantees identical
representation to underlying type, but that wording was removed in
N2347. In fact, it even removed the C++03 wording providing for
identical sizeof without any obvious replacement.
The more general question regarding enums and their underlying types
is probably worth a core issue, given that CWG approved this
formulation of std::byte and presumably thought that the
size/alignment relationship exists. As a practical matter, the clear
intent is for std::byte to take up, well, one byte; no sane
implementer would do it differently.
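For what it's worth, the following checks do pass on every mainstream implementation; this is a practical observation, not something the wording above guarantees:
#include <cstddef>

static_assert(sizeof(std::byte) == 1, "std::byte occupies exactly one byte");
static_assert(alignof(std::byte) == 1, "std::byte has byte alignment");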

Related

Why is there no overload for printing `std::byte`?

The following code does not compile in C++20
#include <iostream>
#include <cstddef>

int main() {
    std::byte b{65};
    std::cout << "byte: " << b << '\n'; // Missing overload
}
When std::byte was added in C++17, why was there no corresponding operator<< overload for printing it? I can maybe understand the choice of not printing containers, but why not std::byte? It tries to act as a primitive type, and we even have overloads for std::string, the recent std::string_view, and perhaps the most related std::complex; even std::bitset itself can be printed.
There are also std::hex and similar modifiers, so printing 0-255 by default should not be an issue.
Was this just an oversight? What about operator>>? std::bitset has it, and it is not trivial at all.
EDIT: Found out even std::bitset can be printed.
From the paper on std::byte (P0298R3): (emphasis mine)
Design Decisions
std::byte is not an integer and not a character
The key motivation here is to make byte a distinct type – to improve program safety by leveraging the type system. This leads to the design that std::byte is not an integer type, nor a character type. It is a distinct
type for accessing the bits that ultimately make up object storage.
As such, it is not required to be implicitly convertible to, or interpretable as, a char or any integral type whatsoever, and hence it cannot be printed using std::cout unless explicitly cast to the required type.
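For example, the intended usage is to convert explicitly, e.g. via std::to_integer (standard C++17; a minimal sketch):
#include <cstddef>
#include <iostream>

int main()
{
    std::byte b{65};
    // An explicit conversion is required before printing.
    std::cout << "byte: " << std::to_integer<int>(b) << '\n'; // prints "byte: 65"
}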
Furthermore, this question might help.
std::byte is intended for accessing raw data. To allow me to replace that damn uint8_t sprinkled all over the codebase with something that actually says "this is raw and unparsed", instead of something that could be misunderstood as a C string.
To underline: std::byte doesn't "try to be a primitive", it represents something even less - raw data.
That it's implemented like this is mostly a quirk of C++ and compiler implementations (layout rules for "primitive" types are much simpler than for a struct or a class).
This kind of thing is mostly found in low level code where, honestly, printing shouldn't be used, and sometimes isn't even possible.
My use case, for example, is receiving raw bytes over I2C (or RS485) and parsing them into a frame, which is then put into a struct. Why would I want to serialize raw bytes over actual data? Data I will have access to almost immediately?
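A minimal sketch of that kind of code (the frame layout and names here are hypothetical): the buffer stays std::byte, and typed values are copied out with std::memcpy rather than punned:
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Frame { std::uint8_t id; std::uint16_t value; };   // hypothetical layout

Frame parse(const std::byte* raw)
{
    Frame f;
    std::memcpy(&f.id, raw, sizeof f.id);            // byte 0: device id
    std::memcpy(&f.value, raw + 1, sizeof f.value);  // bytes 1-2: value,
    return f;                                        // assuming host byte order
}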
To sum up this somewhat ranty answer, providing operator overloads for std::byte to work with iostream goes against the intent of this type.
And expressing intent in code as much as possible is one of important principles in modern programming.

Unions, aliasing and type-punning in practice: what works and what does not?

I have a problem understanding what can and cannot be done using unions with GCC. I read the questions (in particular here and here) about it, but they focus on the C++ standard; I feel there's a mismatch between the C++ standard and practice (the commonly used compilers).
In particular, I recently found confusing information in the GCC online docs while reading about the compilation flag -fstrict-aliasing. It says:
-fstrict-aliasing
Allow the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same.
For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.
Pay special attention to code like this:
union a_union {
    int i;
    double d;
};

int f() {
    union a_union t;
    t.d = 3.0;
    return t.i;
}
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common.
Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.
This is what I think I understood from this example and my doubts:
1) aliasing only works between similar types, or char
Consequence of 1): aliasing - as the word suggests - is when you have one value and two members to access it (i.e. the same bytes);
Doubt: are two types similar when they have the same size in bytes? If not, what are similar types?
Consequence of 1) for non similar types (whatever this means), aliasing does not work;
2) type punning is when we read a different member than the one we wrote to; it's common and it works as expected as long as the memory is accessed through the union type;
Doubt: is aliasing a specific case of type-punning where types are similar?
I get confused because it says unsigned int and double are not similar, so aliasing does not work; yet in the example there is aliasing between int and double, and it clearly says it works as expected, but calls it type-punning: not because the types are or are not similar, but because it reads from a member it did not write. But reading from a member it did not write is exactly what I understood aliasing to be for (as the word suggests). I'm lost.
The questions:
can someone clarify the difference between aliasing and type-punning and what uses of the two techniques are working as expected in GCC? And what does the compiler flag do?
Aliasing can be taken literally for what it means: it is when two different expressions refer to the same object. Type-punning is to "pun" a type, i.e. to use an object of some type as a different type.
Formally, type-punning is undefined behaviour, with only a few exceptions. It happens commonly when you fiddle with bits carelessly:
int mantissa(float f)
{
    return (int&)f & 0x7FFFFF; // Accessing a float as if it's an int
}
The exceptions are (simplified):
Accessing integers as their unsigned/signed counterparts
Accessing anything as a char, unsigned char or std::byte
This is known as the strict-aliasing rule: the compiler can safely assume two expressions of different types never refer to the same object (except for the exceptions above) because they would otherwise have undefined behaviour. This facilitates optimizations such as
void transform(float* dst, const int* src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i]; // Can be unrolled and use vector instructions
                         // If dst and src alias, the results would be wrong
}
What gcc says is that it relaxes the rules a bit and allows type-punning through unions even though the standard doesn't require it to:
union {
    int64_t num;
    struct {
        int32_t hi, lo;
    } parts;
} u = {42};

u.parts.hi = 420;
This is the type-pun gcc guarantees will work. Other cases may appear to work but may one day silently break.
Terminology is a great thing, I can use it however I want, and so can everyone else!
are two types similar when they have the same size in bytes? If not, what are similar types?
Roughly speaking, types are similar when they differ by constness or signedness. Size in bytes alone is definitely not sufficient.
is aliasing a specific case of type-punning where types are similar?
Type punning is any technique that circumvents the type system.
Aliasing is a specific case of that which involves placing objects of different types at the same address. Aliasing is generally allowed when types are similar, and forbidden otherwise. In addition, one may access an object of any type through a char (or similar to char) lvalue, but doing the opposite (i.e. accessing an object of type char through a dissimilar type lvalue) is not allowed. This is guaranteed by both C and C++ standards, GCC simply implements what the standards mandate.
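To make the asymmetry concrete (my own sketch):
void demo()
{
    double d = 1.0;
    // Allowed: any object may be examined through an unsigned char lvalue.
    unsigned char first = *reinterpret_cast<unsigned char*>(&d);

    alignas(double) unsigned char buf[sizeof(double)] = {};
    // NOT allowed: accessing char storage through a double lvalue violates
    // strict aliasing, since no double object lives in buf:
    // double bad = *reinterpret_cast<double*>(buf);   // undefined behaviour
    (void)first;
}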
GCC documentation seems to use "type punning" in a narrow sense of reading a union member other than the one last written to. This kind of type punning is allowed by the C standard even when types are not similar. OTOH the C++ standard does not allow this. GCC may or may not extend the permission to C++, the documentation is not clear on this.
Without -fstrict-aliasing, GCC apparently relaxes these requirements, but it isn't clear to what exact extent. Note that -fstrict-aliasing is the default when performing an optimised build.
Bottom line, just program to the standard. If GCC relaxes the requirements of the standard, it isn't significant and isn't worth the trouble.
In ANSI C (AKA C89) you have (section 3.3.2.3 Structure and union members):
if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined
In C99 you have (section 6.5.2.3 Structure and union members):
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
IOW, union-based type punning is allowed in C, although the actual semantics may be different, depending on the language standard supported (note that the C99 semantics is narrower than the C89's implementation-defined).
In C99 you also have (section 6.5 Expressions):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
And there's a section (6.2.7 Compatible type and composite type) in C99 that describes compatible types:
Two types have compatible type if their types are the same. Additional rules for
determining whether two types are compatible are described in 6.7.2 for type specifiers,
in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. ...
And then (6.7.5.1 Pointer declarators):
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
Simplifying it a bit, this means that in C, by using a pointer, you can access signed ints as unsigned ints (and vice versa), and you can access individual chars in anything. Anything else would amount to an aliasing violation.
You can find similar language in the various versions of the C++ standard. However, as far as I can see in C++03 and C++11 union-based type punning isn't explicitly allowed (unlike in C).
According to footnote 88 in the C11 draft N1570, the "strict aliasing rule" (6.5p7) is intended to specify the circumstances in which compilers must allow for the possibility that things may alias, but makes no attempt to define what aliasing is. Somewhere along the line, a popular belief has emerged that accesses other than those defined by the rule represent "aliasing", and those allowed don't, but in fact the opposite is true.
Given a function like:
int foo(int *p, int *q)
{ *p = 1; *q = 2; return *p; }
Section 6.5p7 doesn't say that p and q won't alias if they identify the same storage. Rather, it specifies that they are allowed to alias.
Note that not all operations which involve accessing storage of one type as another represent aliasing. An operation on an lvalue which is freshly visibly derived from another object doesn't "alias" that other object. Instead, it is an operation upon that object. Aliasing occurs if, between the time a reference to some storage is created and the time it is used, the same storage is referenced in some way not derived from the first, or code enters a context wherein that occurs.
Although the ability to recognize when an lvalue is derived from another is a Quality of Implementation issue, the authors of the Standard must have expected implementations to recognize some constructs beyond those mandated. There is no general permission to access any of the storage associated with a struct or union by using an lvalue of member type, nor does anything in the Standard explicitly say that an operation involving someStruct.member must be recognized as an operation on a someStruct. Instead, the authors of the Standard expected that compiler writers who make a reasonable effort to support constructs their customers need should be better placed than the Committee to judge the needs of those customers and fulfill them. Since any compiler that makes an even-remotely-reasonable effort to recognize derived references would notice that someStruct.member is derived from someStruct, the authors of the Standard saw no need to explicitly mandate that.
Unfortunately, the treatment of constructs like:
actOnStruct(&someUnion.someStruct);
int q = *(someUnion.intArray + i);
has evolved from "It's sufficiently obvious that actOnStruct and the pointer dereference should be expected to act upon someUnion (and consequently all the members thereof) that there's no need to mandate such behavior" to "Since the Standard doesn't require that implementations recognize that the actions above might affect someUnion, any code relying upon such behavior is broken and need not be supported". Neither of the above constructs is reliably supported by gcc or clang except in -fno-strict-aliasing mode, even though most of the "optimizations" that would be blocked by supporting them would generate code that is "efficient" but useless.
If you're using -fno-strict-aliasing on any compiler having such an option, almost anything will work. If you're using -fstrict-aliasing on icc, it will try to support constructs that use type punning without aliasing, though I don't know if there's any documentation about exactly what constructs it does or does not handle. If you use -fstrict-aliasing on gcc or clang, anything at all that works is purely by happenstance.
I think it's good to add a complementary answer, simply because when I asked the question I did not know how to fulfill my needs without using a union: I got stubborn about using it because it seemed to answer my needs precisely.
The good way to do type punning and to avoid possible consequences of undefined behavior (depending on the compiler and other env. settings) is to use std::memcpy and copy the memory bytes from one type to another. This is explained - for example - here and here.
I've also read that often when a compiler produces valid code for type punning using unions, it produces the same binary code as if std::memcpy was used.
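For concreteness, a minimal sketch of the std::memcpy approach (plain standard C++, nothing implementation-specific):
#include <cstdint>
#include <cstring>

// Well-defined type punning: copy the object representation of a float
// into an integer instead of reading it through a punned lvalue.
std::uint32_t bits_of(float f)
{
    static_assert(sizeof f == sizeof(std::uint32_t), "unexpected float size");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // compilers typically fold this to a move
    return u;
}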
Finally, even if this information does not directly answer my original question it's so strictly related that I felt it was useful to add it here.

Resolve (u)int_fastX_t at compile time

Implementations of the C++ standard typedef the (u)int_fastX types as one of their built-in types. This requires research into which type is the fastest, but there cannot be one fastest type for every case.
Wouldn't it increase performance to resolve such types at compile time, choosing the optimal type for each actual use? The compiler would analyze the use of a _fast variable and then choose the optimal type. Factors coming into play could be alignment and the kind of operations used with the variable.
This would effectively make those types a language feature.
This could introduce bugs when the compiler suddenly decides to choose another width for such a variable. But one shouldn't use a _fast type in use cases where the behaviour depends on the width, anyway.
Is such compile time resolution permitted by the standard?
If yes, why isn't it implemented as of today?
If no, why isn't it in the standard?
No, this is not permitted by the standard. Keep in mind the C++ standard defers to C for this particular area, for example, C++11 defers to C99, as per C++11 1.1 /2. Specifically, C++11 18.4.1 Header <cstdint> synopsis /2 states:
The header defines all functions, types, and macros the same as 7.18 in the C standard.
So let's get your first contention out of the way, you state:
Implementations of the C++ standard typedef the (u)int_fastX types as one of their built-in types. This requires research into which type is the fastest, but there cannot be one fastest type for every case.
The C standard has this to say, in c99 7.18.1.3 Fastest minimum-width integer types (my italics):
Each of the following types designates an integer type that is usually fastest to operate with among all integer types that have at least the specified width.
The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.
So you're indeed correct that a type cannot be fastest for all possible uses, but this seems not to be what the authors had in mind when defining these aspects.
The introduction of the fixed-width types was (in my opinion) to solve the problem all those developers had in having different int widths across the various implementations.
Similarly, once a developer knows the range of values they want, the fast minimum-width types give them a way to do arithmetic on those values at the maximum possible speed.
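For instance (the sizes mentioned in the comment below are what common implementations happen to pick; nothing guarantees them):
#include <cstdint>
#include <cstdio>

int main()
{
    // At least 16 bits, but typically widened to the machine's natural word,
    // e.g. 8 bytes on x86-64 glibc, 4 bytes with MSVC.
    std::int_fast16_t sum = 0;
    for (std::int_fast16_t i = 0; i < 1000; ++i)
        sum += i;
    std::printf("sizeof = %zu, sum = %jd\n",
                sizeof sum, static_cast<std::intmax_t>(sum));
}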
Covering your three specific questions in your final paragraph (in bold below):
(1) Is such compile time resolution permitted by the standard?
I don't believe so. The relevant part of the C standard has this little piece of text:
For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros.
That seems to indicate that it must be a typedef provided by the implementation and, since there are no "variable" typedefs, it has to be fixed.
There may be wiggle room because it could be possible to provide a different typedef depending on certain environmental considerations but the difficulty in actually implementing this seems very high (see my answer to your third question below).
Chief amongst these is that these adaptable types, should they have external linkage, would require agreement amongst all the compiled translation units when linked together. Having one unit with a 16-bit type and another with a 32-bit type is going to cause all sorts of problems.
(2) If yes, why isn't it implemented as of today?
I'm pushing "no" as an answer to your first question so I'm not going to speculate on this other than by referring you to the answer to the third question below (it's probably not implemented because it's very hard, with dubious benefits).
(3) If no, why isn't it in the standard?
A standard is a contract between the implementor and the user and describes what the implementor will provide. It's usual that the standards committees tend to be more populated by the former (who aren't that keen on making too much extra work for themselves) than the latter.
For example, I would love to have all the you-beaut C++ data structures in C but this would have the consequence that standards versions would be decades apart rather than years :-)

union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?

Background
Discussions on the mostly un-or-implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur (https://stackoverflow.com/a/31557852/2757035), on an exemption for standard-layout structs having a "common initial sequence" of member types:
C11 (6.5.2.3 Structure and union members; Semantics):
[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently
contains one of these structures, it is permitted to inspect the
common initial part of any of them anywhere that a declaration of
the completed type of the union is visible. Two structures share a
common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or
more initial members.
C++03 ([class.mem]/16):
If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one
of these POD-structs, it is permitted to inspect the common initial
part of any of them. Two POD-structs share a common initial sequence
if corresponding members have layout-compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.
Other versions of the two standards have similar language; since C++11
the terminology used is standard-layout rather than POD.
Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.
But please note the bold - "anywhere that a declaration of the completed type of the union is visible" - a clause that exists in C11 but nowhere in C++ drafts for 2003, 2011, or 2014 (all nearly identical, but later versions replace "POD" with the new term standard layout). In any case, the 'visible declaration of union type' bit is totally absent from the corresponding section of any C++ standard.
@loop and @Mints97, here - https://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).
Standards discussions around this
[snipped - see my answer]
Questions
From this, then, my questions were:
What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?
Are we to assume that this omission in C++ is very deliberate?
What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?
If the difference is intentional, then what benefits or drawbacks are there to the 2 different treatments in C vs C++?
What, if any, interesting ramifications does it have at compile- or runtime? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.
I'd imagine it permits more aggressive optimization; C can assume that
function arguments S* s and T* t do not alias even if they share a
common initial sequence as long as no union { S; T; } is in view,
while C++ can make that assumption only at link time. Might be worth
asking a separate question about that difference.
Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of the (either) Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this - assuming any compiler even bothers to enforce C's added clause - and etc. The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!
I've found my way through the labyrinth to some great sources on this, and I think I've got a pretty comprehensive summary of it. I'm posting this as an answer because it seems to explain both the (IMO very misguided) intention of the C clause and the fact that C++ does not inherit it. This will evolve over time if I discover further supporting material or the situation changes.
This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects, so I'll welcome clarifications/suggestions on how to improve this answer - or simply a better answer if anyone has one.
Finally, some concrete commentary
Through vaguely related threads, I found the following answer by @tab - and much appreciated the contained links to (illuminating, if not conclusive) GCC and Working Group defect reports: answer by tab on StackOverflow
The GCC link contains some interesting discussion and reveals a sizeable amount of confusion and conflicting interpretations on part of the Committee and compiler vendors - surrounding the subject of union member structs, punning, and aliasing in both C and C++.
At the end of that, we're linked to the main event - another BugZilla thread, Bug 65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:
Origin of the added line in C99
C proposal N685 is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to allow relaxation of aliasing rules for "common initial sequence" structs within a TU aware of some union containing instances of said struct types, as we can see from this quote:
The proposed solution is to require that a union declaration be visible
if aliases through a common initial sequence (like the above) are possible.
Therefore the following TU provides this kind of aliasing if desired:
union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
    pst2->m1 = 2;
    pst3->m1 = 0; /* might be an alias for pst2->m1 */
    return pst2->m1;
}
Judging by the GCC discussion and comments below such as @ecatmur's, this proposal - which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU - seems to have received great derision and has rarely been implemented.
It's obvious how difficult it would be to satisfy this interpretation of the added clause without totally crippling many optimisations - for little benefit, as few coders would want this guarantee, and those who do can just turn on -fno-strict-aliasing (which IMO indicates larger problems). If implemented, this allowance is more likely to catch people out and spuriously interact with other declarations of unions than to be useful.
Omission of the line from C++
Following on from this and a comment I made elsewhere, @Potatoswatter in this answer here on SO states that:
The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.
In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:
The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.
Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; compare to my final section below about C. At present, though, it is not (and again, IMO, this is good).
What does this mean for C++ and practical C implementations?
So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:
C++ defect 1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard" - cite. This is either post C++14 or at least after the final draft I have here (N3797) - and puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. I'm bolding what I consider to be the important changes, and {these comments} are mine:
In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union])
of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m
of another union member of struct type T2 provided m is part of the
common initial sequence of T1 and T2. [Note: Reading a volatile object
through a non-volatile glvalue has undefined behavior (7.1.6.1
[dcl.type.cv]). —end note]
This seems to clarify the meaning of the old wording: to me, it says that any specifically allowed 'punning' among union member structs with common initial sequences must be done via an instance of the parent union - rather than being based on the type of the structs (e.g. pointers to them passed to some function). This wording seems to rule out any other interpretation, a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
The upshot is that - as nicely demonstrated by @ecatmur and in the GCC tickets - this leaves such union member structs, by definition in C++ and practically in C, subject to the same strict aliasing rules as any other two officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, not including the vague and unimaginably tedious-to-enforce "visibility" attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?
Possible reversal of this line in C / clarification in C++
It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:
Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.
Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix."
Potatoswatter inspiringly concludes:
The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.
We can only hope!
Again, all further thoughts are welcome.
I suspect it means that the access to these common parts is permitted not only through the union type, but outside of the union. That is, suppose we have this:
union u {
    struct s1 m1;
    struct s2 m2;
};
Now suppose that in some function we have a struct s1 *p1 pointer which we know was lifted from the m1 member of such a union. We can cast this to a struct s2 * pointer and still access the members which are in common with struct s1. But somewhere in the scope, a declaration of union u has to be visible. And it has to be the complete declaration, which informs the compiler that the members are struct s1 and struct s2.
The likely intent is that if there is such a type in scope, then the compiler has knowledge that struct s1 and struct s2 are aliased, and so an access through a struct s1 * pointer is suspected of really accessing a struct s2 or vice versa.
In the absence of any visible union type which joins those types this way, there is no such knowledge; strict aliasing can be applied.
Since the wording is absent from C++, to take advantage of the "common initial members relaxation" rule in that language, you have to route the accesses through the union type, as is commonly done anyway:
union u *ptr_any;
// ...
ptr_any->m1.common_initial_member = 42;
fun(ptr_any->m2.common_initial_member); // pass 42 to fun
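Putting that together, here is a complete sketch (types and names invented for illustration) of the access pattern that is unquestionably valid in C++: reading the common initial sequence of an inactive member through the union object itself:
#include <cstdio>

struct s1 { int common_initial_member; float f; };
struct s2 { int common_initial_member; double d; };

union u { s1 m1; s2 m2; };

int main()
{
    u obj;
    obj.m1.common_initial_member = 42;   // m1 becomes the active member
    // Reading the common initial sequence via the inactive member m2 is OK
    // because the access is routed through the union object.
    std::printf("%d\n", obj.m2.common_initial_member);   // prints 42
}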

C++11 and [17.5.2.1.3] Bitmask Types

The Standard allows one to choose between an integer type, an enum, and a std::bitset.
Why would a library implementor use one over the other given these choices?
Case in point, llvm's libcxx appears to use a combination of (at least) two of these implementation options:
ctype_base::mask is implemented using an integer type:
<__locale>
regex_constants::syntax_option_type is implemented using an enum + overloaded operators:
<regex>
The gcc project's libstdc++ uses all three:
ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>
regex_constants::syntax_option_type is implemented using an integer type,
regex_constants::match_flag_type is implemented using a std::bitset
Both: <bits/regex_constants.h>
AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.
The enum solution and the integer type solution should always use the same space. std::bitset does not seem to guarantee that sizeof(std::bitset<32>) == sizeof(std::uint32_t), so I don't see what is particularly appealing about std::bitset.
The enum solution seems slightly less type safe because the combination of two masks does not generate an enumerator.
Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).
Any available enlightenment in this area would be appreciated.
The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…
Integral types are the simplest alternative, but they lack type safety. Very old legacy code will tend to use these as they are also the oldest.
Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
std::bitset may have somewhat more type safety in that bitset<5> and bitset<6> are different types, and addition is disallowed, but otherwise it is unsafe much like an integral type. This wouldn't be an issue if they had allowed types derived from std::bitset<N>.
Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.
It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.
The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).
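As a sketch of the enum-based flavour of that specification (the names are illustrative, not from any particular library), these are essentially the operators such a clause requires:
#include <type_traits>

enum class flags : unsigned { none = 0, bold = 1 << 0, italic = 1 << 1 };

constexpr flags operator|(flags a, flags b)
{
    return static_cast<flags>(
        static_cast<std::underlying_type<flags>::type>(a) |
        static_cast<std::underlying_type<flags>::type>(b));
}

constexpr flags operator&(flags a, flags b)
{
    return static_cast<flags>(
        static_cast<std::underlying_type<flags>::type>(a) &
        static_cast<std::underlying_type<flags>::type>(b));
}

// The combination is a valid value of the bitmask type even though it
// corresponds to no single enumerator.
constexpr flags both = flags::bold | flags::italic;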
My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the <ctype.h> implementation-defined constants such as _CTYPE_L and _CTYPE_U used for isupper and islower etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.
I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.
Why would the standard allow different ways of implementing the library? And the answer is: Why not?
As you have seen, all three options are obviously used in some implementations. The standard doesn't want to make existing implementations non-conforming, if that can be avoided.
One reason to use a bitset could be that its size fits better than an enum or an integer. Not all systems even have a std::uint32_t. Maybe a bitset<24> will work better there?