Using standard layout types to communicate with other languages - c++

This draft of the standard contains a note at 11.2.6 regarding standard-layout types:
[Note 3: Standard-layout classes are useful for communicating with code written in other programming languages. Their layout is specified in [class.mem]. — end note]
Following the link to [class.mem], we find the rules regarding the layout of standard-layout types, but it is not clear to me what about them makes them useful for communicating with other languages. It all seems to be about layout-compatible types and the common initial sequence, but I see no indication that these compatibility requirements extend beyond a given implementation.
I always assumed that standard layout types could not have arbitrary padding imposed by an implementation and had to follow an "intuitive" layout which would make them easy to use from other languages. But I can't seem to find any such rules.
What does this note mean? Did I miss any rules that force standard layout types to at least be consistent across a given platform?

The standard can’t meaningfully speak about other languages and implementations: even if one could unambiguously define “platform”, all it can do is constrain a C++ implementation, possibly in a fashion that would be impossible to satisfy for whatever arbitrary choices that other software makes. That said, the ABI can define such things, and standard-layout types are those that don’t have anything “C++-specific” (like references, base class subobjects, or a virtual table pointer) that would presumably fail to map into some other environment. In practice that “other environment” is just C, or some other language that itself follows C rules (e.g., ctypes in Python).
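A minimal sketch of what this means in practice, assuming C++17 or later (the names `Point`, `Shape` and `point_norm2` are hypothetical). `std::is_standard_layout` lets the compiler confirm the property, and `extern "C"` is what actually exposes a function to C; the byte layout itself still comes from the platform ABI, which the C++ standard does not pin down.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout_v

// Hypothetical type meant to be shared with C code (or Python ctypes):
// plain data members, no virtual functions, no reference members.
struct Point {
    int x;
    int y;
};

// A virtual function adds a vptr, so this class is not standard-layout.
struct Shape {
    virtual ~Shape() = default;
    int id;
};

static_assert(std::is_standard_layout_v<Point>);
static_assert(!std::is_standard_layout_v<Shape>);

// One thing the standard does guarantee for standard-layout classes:
// the first non-static data member sits at offset 0.
static_assert(offsetof(Point, x) == 0);

// extern "C" gives C language linkage (no C++ name mangling), so C code
// built for the same platform ABI could declare and call this function.
extern "C" int point_norm2(const Point* p) {
    return p->x * p->x + p->y * p->y;
}
```

On the C side one would declare a matching `struct Point { int x; int y; };` and `int point_norm2(const struct Point*);`, relying on both compilers following the same ABI.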

Related

What are the similarities and differences between C++'s concepts and Rust's traits?

In Rust, the main tool for abstraction is traits. In C++, there are two tools for abstraction: abstract classes and templates. To get rid of some of the disadvantages of using templates (e.g. hard-to-read error messages), C++ introduced concepts, which are "named sets of requirements".
Both features seem to be fairly similar:
Defining a trait/concept is done by listing requirements.
Both can be used to bound/restrict generic/template type parameters.
Rust traits and C++ templates with concepts are both monomorphized (I know Rust traits can also be used with dynamic dispatch, but that's a different story).
But from what I understand, there are also notable differences. For example, C++'s concepts seem to define a set of expressions that have to be valid instead of listing function signatures. But there is a lot of different and confusing information out there (maybe because concepts only landed in C++20?). That's why I'd like to know: what exactly are the similarities and differences between C++'s concepts and Rust's traits?
Are there features that are only offered by either concepts or traits? E.g. what about Rust's associated types and consts? Or bounding a type by multiple traits/concepts?
Disclaimer: I have not yet used concepts, all I know about them was gleaned from the various proposals and cppreference, so take this answer with a grain of salt.
Run-Time Polymorphism
Rust Traits are used both for Compile-Time Polymorphism and, sometimes, Run-Time Polymorphism; Concepts are only about Compile-Time Polymorphism.
Structural vs Nominal
The greatest difference between Concepts and Traits is that Concepts use structural typing whereas Traits use nominal typing:
In C++ a type never explicitly satisfies a Concept; it may "accidentally" satisfy it if it happens to satisfy all requirements.
In Rust a specific syntactic construct, impl Trait for Type, is used to explicitly indicate that a type implements a Trait.
There are a number of consequences. In general, Nominal Typing is better from a maintainability point of view (for example, when adding a requirement to a Trait), whereas Structural Typing is better at bridging 3rd-party libraries: a type from library A can satisfy a Concept from library B without either library being aware of the other.
Constraints
Traits are mandatory:
No method can be called on a variable of a generic type without this type being required to implement a trait providing the method.
Concepts are entirely optional:
A method can be called on a variable of a generic type without this type being required to satisfy any Concept, or being constrained in any way.
A method can be called on a variable of a generic type satisfying a Concept (or several) without that method being specified by any Concept or Constraint.
Constraints (see note) can be entirely ad-hoc, and specify requirements without using a named Concept; and once again, they are entirely optional.
Note: a Constraint is introduced by a requires clause and specifies either ad-hoc requirements or requirements based on Concepts.
Requirements
The set of expressible requirements is different:
Concepts/Constraints work by substitution, so they allow the whole breadth of the language; requirements include: nested types/constants/variables, methods, fields, the ability to be used as an argument of another function/method, the ability to be used as a generic argument of another type, and combinations thereof.
Traits, by contrast, only allow a small set of requirements: associated types/constants, and methods.
Overload Selection
Rust has no ad-hoc overloading: overloading only occurs through Traits, and specialization is not possible yet.
C++ Constraints can be used to "order" overloads from least specific to most specific, so the compiler can automatically select the most specific overload for which requirements are satisfied.
Note: prior to this, either SFINAE or tag-dispatching would be used in C++ to achieve the selection; calisthenics were required to work with open-ended overload sets.
Disjunction
How to use this feature is not quite clear to me yet.
The requirement mechanisms in Rust are purely additive (conjunctions, aka &&), in contrast, in C++ requires clauses can contain disjunctions (aka ||).

What is the difference between fundamental vs. built-in types C++

I'm reading my notes for my C++ class at college, and they state that types can be classified into categories based on their relationship to the underlying hardware facilities:
fundamental types - correspond directly to the hardware facilities
built-in types - reflect the capabilities of the hardware facilities directly and efficiently
I understand that fundamental types are int, bool, char, double and so forth.
I always thought fundamental types are built-in types as they are built in within the C++ language. Or am I wrong? What is the difference between fundamental and built-in?
There is no such dichotomy in C++. Instead, there are fundamental types and compound types. Fundamental types are also informally known as built-in types.
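The standard's own classification is visible through `<type_traits>`, as in this short sketch (C++17; the names `S` and `E` are hypothetical): every type is either fundamental or compound, and the library offers no separate "built-in" category.

```cpp
#include <type_traits>

// Fundamental types: arithmetic types, void, and std::nullptr_t.
static_assert(std::is_fundamental_v<int>);
static_assert(std::is_fundamental_v<bool>);
static_assert(std::is_fundamental_v<double>);
static_assert(std::is_fundamental_v<void>);
static_assert(std::is_fundamental_v<decltype(nullptr)>);

// Compound types are built from other types: pointers, references,
// arrays, functions, enumerations, classes, unions, pointers to members.
struct S {};
enum class E { A };
static_assert(std::is_compound_v<int*>);
static_assert(std::is_compound_v<int&>);
static_assert(std::is_compound_v<int[3]>);
static_assert(std::is_compound_v<S>);
static_assert(std::is_compound_v<E>);

// Every type falls into exactly one of the two groups.
static_assert(!std::is_fundamental_v<S> && !std::is_compound_v<int>);
```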
built-in types - reflect the capabilities of the hardware facilities directly and efficiently
The only reference I can find is the senecac.on.ca Overview, which is about an object-oriented language, not specifically C++.
C++, as others have pointed out, makes no distinction between "fundamental types" and "built-in types"; "intrinsic types" and "primitive types" are also used, and they are all synonyms.
Trying to figure out what the author of that sentence is trying to explain, I can think of the size_t type. It's not something that a CPU can use "as is": it's an unsigned integer type, but which one is implementation-defined. Once the implementation defines it, it fits that "built-in types" definition.
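A short sketch (C++17) of the point about size_t: it is not a separate built-in type but a typedef for one of the implementation's unsigned integer types, so these checks should hold on any conforming implementation even though the type it names varies.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// std::size_t is an alias, not a new type: it names one of the
// implementation's unsigned integer types (which one is implementation-defined).
static_assert(std::is_integral_v<std::size_t>);
static_assert(std::is_unsigned_v<std::size_t>);
static_assert(std::is_fundamental_v<std::size_t>);

// The only portable promise about its range is a minimum (SIZE_MAX >= 65535).
static_assert(std::numeric_limits<std::size_t>::max() >= 65535u);
```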

Resolve (u)int_fastX_t at compile time

Implementations of the C++ standard typedef the (u)int_fastX types as one of their built in types. This requires research in which type is the fastest, but there cannot be one fastest type for every case.
Wouldn't it increase performance to resolve such types at compile time, choosing the optimal type for the actual use? The compiler would analyze the use of a _fast variable and then choose the optimal type. Factors coming into play could be alignment and the kind of operations used with the variable.
This would effectively make those types a language feature.
This could introduce bugs when the compiler suddenly decides to choose another width for such a variable. But one shouldn't use a _fast type in cases where the behaviour depends on the width anyway.
Is such compile-time resolution permitted by the standard?
If yes, why isn't it implemented as of today?
If no, why isn't it in the standard?
No, this is not permitted by the standard. Keep in mind the C++ standard defers to C for this particular area, for example, C++11 defers to C99, as per C++11 1.1 /2. Specifically, C++11 18.4.1 Header <cstdint> synopsis /2 states:
The header defines all functions, types, and macros the same as 7.18 in the C standard.
So let's get your first contention out of the way, you state:
Implementations of the C++ standard typedef the (u)int_fastX types as one of their built in types. This requires research in which type is the fastest, but there cannot be one fastest type for every case.
The C standard has this to say, in c99 7.18.1.3 Fastest minimum-width integer types (my italics):
Each of the following types designates an integer type that is usually fastest to operate with among all integer types that have at least the specified width.
The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.
So you're indeed correct that a type cannot be fastest for all possible uses but this seems to not be what the authors had in mind in defining these aspects.
The introduction of the fixed-width types was (in my opinion) to solve the problem developers had with int widths differing across the various implementations.
Similarly, once a developer knows the range of values they want, the fast minimum-width types give them a way to do arithmetic on those values at the maximum possible speed.
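A sketch, assuming C++17, of how the fast minimum-width types are meant to be used: the code relies only on the guaranteed minimum range, while the implementation's `<cstdint>` fixes the actual width as an ordinary typedef. `sum_to` is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>

// Guaranteed: int_fast16_t covers at least the 16-bit range. Not guaranteed:
// its exact width, which the implementation picks once in <cstdint>.
static_assert(std::numeric_limits<std::int_fast16_t>::min() <= -32767);
static_assert(std::numeric_limits<std::int_fast16_t>::max() >= 32767);
static_assert(std::numeric_limits<std::uint_fast16_t>::max() >= 65535u);

// Hypothetical helper: the caller knows the result stays below 65536,
// so "at least 16 bits, whichever is fastest" is all that is required.
constexpr std::uint_fast16_t sum_to(std::uint_fast16_t n) {
    std::uint_fast16_t total = 0;
    for (std::uint_fast16_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

static_assert(sum_to(300) == 45150);
```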
Covering your three specific questions in your final paragraph (in bold below):
(1) Is such compile time resolution permitted by the standard?
I don't believe so. The relevant part of the C standard has this little piece of text:
For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros.
That seems to indicate that it must be a typedef provided by the implementation and, since there are no "variable" typedefs, it has to be fixed.
There may be wiggle room because it could be possible to provide a different typedef depending on certain environmental considerations but the difficulty in actually implementing this seems very high (see my answer to your third question below).
Chief amongst these is that these adaptable types, should they have external linkage, would require agreement amongst all the compiled translation units when linked together. Having one unit with a 16-bit type and another with a 32-bit type is going to cause all sorts of problems.
(2) If yes, why isn't it implemented as of today?
I'm pushing "no" as an answer to your first question so I'm not going to speculate on this other than by referring you to the answer to the third question below (it's probably not implemented because it's very hard, with dubious benefits).
(3) If no, why isn't it in the standard?
A standard is a contract between the implementor and the user, and describes what the implementor will provide. Standards committees tend to be more heavily populated by the former (who aren't that keen on making too much extra work for themselves) than the latter.
For example, I would love to have all the you-beaut C++ data structures in C but this would have the consequence that standards versions would be decades apart rather than years :-)

Does C++ standard address the concept "TYPE"?

I have been reading Design Patterns (GoF), and it draws a clear distinction between the class and the type of an object, as stated below:
The TYPE of the object is defined by its interface (the set of methods that it can handle) and the CLASS of the object defines its implementation.
I have read in many books on C++ that a Class is user-defined Type. And nothing more has been mentioned about the concept TYPE (not even as GOF mentions it.)
I just want to know whether the C++ standard mentions the concept TYPE anywhere, even if not in the way GoF does.
Or is it assumed that this difference is too basic to mention?
C++ defines several kinds of types. Class types are just one such kind of type; others are integral types, floating-point types, pointer types, array types, function types, and so forth. The concept of "type" is well defined in C++.
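For illustration (C++17; `Account` and `Fn` are hypothetical names), the standard library's `<type_traits>` exposes the standard's primary type categories directly, with class types as just one of them:

```cpp
#include <type_traits>

struct Account {};        // a class type
using Fn = int(double);   // a function type

// Some of the standard's primary type categories, queried with <type_traits>.
static_assert(std::is_integral_v<long>);
static_assert(std::is_floating_point_v<double>);
static_assert(std::is_pointer_v<int*>);
static_assert(std::is_array_v<int[4]>);
static_assert(std::is_function_v<Fn>);
static_assert(std::is_class_v<Account>);

// A class type is one kind of type among several.
static_assert(!std::is_class_v<int>);
```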
The C++ standard discusses types in section 3.9 [basic.types] (in the 2011 ISO C++ standard; the section number may be different in other editions).
The Design Patterns book is not language-specific, and it uses the words "type" and "class" differently from the way the C++ standard uses them.

union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?

Background
Discussions of the mostly undefined or implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur (https://stackoverflow.com/a/31557852/2757035), on an exemption for standard-layout structs having a "common initial sequence" of member types:
C11 (6.5.2.3 Structure and union members; Semantics):
[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
C++03 ([class.mem]/16):
If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.
Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.
But please note the phrase "anywhere that a declaration of the completed type of the union is visible": a clause that exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the new term standard-layout). In any case, the 'visible declaration of union type' bit is totally absent from the corresponding section of every C++ standard.
@loop and @Mints97, here - https://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).
Standards discussions around this
[snipped - see my answer]
Questions
From this, then, my questions were:
What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?
Are we to assume that this omission in C++ is very deliberate?
What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?
If the difference is intentional, then what benefits or drawbacks are there to the 2 different treatments in C vs C++?
What, if any, interesting ramifications does it have at compile time or run time? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows:
I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.
Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of either Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this (assuming any compiler even bothers to enforce C's added clause), and so on. The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!
I've found my way through the labyrinth to some great sources on this, and I think I've got a pretty comprehensive summary of it. I'm posting this as an answer because it seems to explain both the (IMO very misguided) intention of the C clause and the fact that C++ does not inherit it. This will evolve over time if I discover further supporting material or the situation changes.
This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects, so I'll welcome clarifications/suggestions on how to improve this answer - or simply a better answer if anyone has one.
Finally, some concrete commentary
Through vaguely related threads, I found the following answer by @tab, and much appreciated its links to (illuminating, if not conclusive) GCC and Working Group defect reports: answer by tab on StackOverflow
The GCC link contains some interesting discussion and reveals a sizeable amount of confusion and conflicting interpretations, on the part of the Committee and compiler vendors, surrounding union member structs, punning, and aliasing in both C and C++.
At the end of that, we're linked to the main event: another Bugzilla thread, Bug 65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:
Origin of the added line in C99
C proposal N685 is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to allow relaxation of aliasing rules for "common initial sequence" structs within a TU aware of some union containing instances of said struct types, as we can see from this quote:
The proposed solution is to require that a union declaration be visible
if aliases through a common initial sequence (like the above) are possible.
Therefore the following TU provides this kind of aliasing if desired:
union utag {
struct tag1 { int m1; double d2; } st1;
struct tag2 { int m1; char c2; } st2;
};
int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
pst2->m1 = 2;
pst3->m1 = 0; /* might be an alias for pst2->m1 */
return pst2->m1;
}
Judging by the GCC discussion and comments below such as @ecatmur's, this proposal, which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU, seems to have received great derision and rarely been implemented.
It's obvious how difficult it would be to satisfy this interpretation of the added clause without totally crippling many optimisations, for little benefit: few coders would want this guarantee, and those who do can just turn on -fno-strict-aliasing (which IMO indicates larger problems). If implemented, this allowance is more likely to catch people out and spuriously interact with other declarations of unions than to be useful.
Omission of the line from C++
Following on from this and a comment I made elsewhere, @Potatoswatter in this answer here on SO states that:
The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.
In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:
The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.
Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; compare to my final section below about C. At present, though, it is not (and again, IMO, this is good).
What does this mean for C++ and practical C implementations?
So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:
C++ defect 1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard". This is either post-C++14 or at least after the final draft I have here (N3797), and puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. I'm bolding what I consider to be the important changes, and {these comments} are mine:
In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]
This seems to clarify the meaning of the old wording: to me, it says that any specifically allowed 'punning' among union member structs with common initial sequences must be done via an instance of the parent union - rather than being based on the type of the structs (e.g. pointers to them passed to some function). This wording seems to rule out any other interpretation, a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
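A minimal C++ sketch of what the DR 1719 wording permits, reusing the two struct shapes from the N685 example (`read_common` is a hypothetical name). The read of the inactive member's `m1` goes through an object of the union type itself, rather than through separate `tag1*`/`tag2*` pointers as in `similar_func`. Constant evaluation still rejects reads of an inactive union member, so the function is deliberately not `constexpr`.

```cpp
#include <type_traits>

// Same member layout as the structs in the N685 example.
struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

union utag {
    tag1 st1;
    tag2 st2;
};

// The rule only applies to standard-layout unions and structs.
static_assert(std::is_standard_layout_v<utag>);

int read_common(int value) {
    utag u{tag1{value, 1.0}};  // st1 is the active member
    // m1 is part of the common initial sequence of tag1 and tag2, so reading
    // it through the inactive member st2 of the same union object is permitted.
    return u.st2.m1;
}
```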
The upshot is that, as nicely demonstrated by @ecatmur and in the GCC tickets, this leaves such union member structs by definition in C++, and practically in C, subject to the same strict aliasing rules as any other 2 officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, not including vague and unimaginably tedious-to-enforce "visibility" as attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?
Possible reversal of this line in C / clarification in C++
It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:
Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.
Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.
Potatoswatter inspiringly concludes:
The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.
We can only hope!
Again, all further thoughts are welcome.
I suspect it means that the access to these common parts is permitted not only through the union type, but outside of the union. That is, suppose we have this:
union u {
struct s1 m1;
struct s2 m2;
};
Now suppose that in some function we have a struct s1 *p1 pointer which we know was lifted from the m1 member of such a union. We can cast this to a struct s2 * pointer and still access the members which are in common with struct s1. But somewhere in the scope, a declaration of union u has to be visible. And it has to be the complete declaration, which informs the compiler that the members are struct s1 and struct s2.
The likely intent is that if there is such a type in scope, then the compiler has knowledge that struct s1 and struct s2 are aliased, and so an access through a struct s1 * pointer is suspected of really accessing a struct s2 or vice versa.
In the absence of any visible union type which joins those types this way, there is no such knowledge; strict aliasing can be applied.
Since the wording is absent from C++, then to take advantage of the "common initial members relaxation" rule in that language, you have to route the accesses through the union type, as is commonly done anyway:
union u *ptr_any;
// ...
ptr_any->m1.common_initial_member = 42;
fun(ptr_any->m2.common_initial_member); // pass 42 to fun