Are "anonymous structs" standard? And, really, what *are* they? - c++

MSDN reckons that anonymous structs are non-standard in C++:
A Microsoft C extension allows you to declare a structure variable
within another structure without giving it a name. These nested
structures are called anonymous structures. C++ does not allow
anonymous structures.
You can access the members of an anonymous structure as if they were
members in the containing structure.
@K-ballo agrees.
I'm told that this feature isn't necessarily the same as just creating an unnamed struct but I can't see a distinction in terms of standard wording.
C++11 says:
[C++11: 9/1]: [..] A class-specifier whose class-head omits the class-head-name defines an unnamed class.
and provides an entire grammatical construction for a type definition missing a name.
C++03 lacks this explicit wording, but similarly indicates that the identifier in a type definition is optional, and makes reference to "unnamed classes" in 9.4.2/5 and 3.5/4.
So is MSDN wrong, and these things are all completely standard?
Or is there some subtlety I'm missing between "unnamed structs/classes" and the same when used as members that prevents them from being covered by this C++03/C++11 functionality?
Am I missing some fundamental difference between "unnamed struct" and "anonymous struct"? They look like synonyms to me.

All the standard text refers to creating an "unnamed struct":
struct {
int hi;
int bye;
};
Just a nice friendly type, with no accessible name.
In a standard way, it could be instantiated as a member like this:
struct Foo {
struct {
int hi;
int bye;
} bar;
};
int main()
{
Foo f;
f.bar.hi = 3;
}
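Even though the nested type has no name, it is still an ordinary type. As a minimal sketch (assuming C++11 for decltype; the names BarType, sum_bar and demo_unnamed_member are made up for illustration), the unnamed type can be recovered from the member and used like any other:

```cpp
#include <cassert>
#include <type_traits>

struct Foo {
    struct {
        int hi;
        int bye;
    } bar;  // named member whose type is an unnamed struct
};

// decltype recovers the unnamed type from the member declaration.
using BarType = decltype(Foo::bar);
static_assert(std::is_class<BarType>::value, "the unnamed struct is a class type");

// The recovered type can be used in a function signature.
int sum_bar(const BarType& b) { return b.hi + b.bye; }

int demo_unnamed_member() {
    Foo f{};          // value-initializes hi and bye to 0
    f.bar.hi = 3;     // access goes through the named member
    f.bar.bye = 4;
    BarType copy = f.bar;
    assert(copy.hi == 3);
    return sum_bar(copy);
}
```

Calling demo_unnamed_member() from main exercises the whole thing; nothing here relies on a compiler extension.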
But an "anonymous struct" is subtly different — it's the combination of an "unnamed struct" and the fact that you magically get members out of it in the parent object:
struct Foo {
struct {
int hi;
int bye;
}; // <--- no member name!
};
int main()
{
Foo f;
f.hi = 3;
}
Contrary to intuition†, this does not merely create an unnamed struct that's nested within Foo; it also automatically gives you an "anonymous member" of sorts, which makes the struct's members accessible directly through the parent object.
It is this functionality that is non-standard. GCC does support it, and so does Visual C++. Windows API headers make use of this feature by default, but you can specify that you don't want it by adding #define NONAMELESSUNION before including the Windows header files.
Compare with the standard functionality of "anonymous unions" which do a similar thing:
struct Foo {
union {
int hi;
int bye;
}; // <--- no member name!
};
int main()
{
Foo f;
f.hi = 3;
}
† It appears that, though the term "unnamed" refers to the type (i.e. "the class" or "the struct") itself, the term "anonymous" refers instead to the actual instantiated member (using an older meaning of "the struct" that's closer to "an object of some structy type"). This was likely the root of your initial confusion.

The things that Microsoft calls anonymous structs are not standard. An unnamed struct is just an ordinary struct that doesn't have a name. There's not much you can do with one, unless you also define an object of that type:
struct {
int i;
double d;
} my_object;
my_object.d = 2.3;
Anonymous unions are part of the standard, and they have the behavior you'd expect from reading Microsoft's description of their anonymous structs:
union {
int i;
double d;
};
d = 2.3;

The standard talks about anonymous unions: [9.5]/5
A union of the form
union { member-specification } ;
is called an anonymous union; it defines an unnamed object of unnamed type. The member-specification of an anonymous union shall only define non-static data members. [ Note: Nested types and functions cannot be declared within an anonymous union. —end note ] The names of the members of an anonymous union shall be distinct from the names of any other entity in the scope in which the anonymous union is declared. For the purpose of name lookup, after the anonymous union definition, the members of the anonymous union are considered to have been defined in the scope in which the anonymous union is declared. [ Example:
void f() {
union { int a; const char* p; };
a = 1;
p = "Jennifer";
}
Here a and p are used like ordinary (nonmember) variables, but since they are union members they have the same address. —end example ]
The anonymous structs that Microsoft talks about are this union feature applied to structs. This is not just an unnamed definition: it's important to note that the members of the anonymous union/struct are considered to have been defined in the scope in which the anonymous union/struct is declared.
As far as I know, there is no such behavior for unnamed structs in the Standard. Note how in the cited example you can achieve things that wouldn't otherwise be possible, like sharing storage for variables on the stack, while anonymous structs bring nothing new to the table.
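The storage-sharing point can be checked directly. This is a small sketch built on the Standard's own example; the function name is hypothetical. It only takes the addresses of the two members and never reads an inactive one:

```cpp
#include <cassert>

// Returns true if the two members of a block-scope anonymous union
// occupy the same address.
bool anonymous_union_members_share_address() {
    union { int a; const char* p; };  // anonymous union: a and p are usable as plain names
    a = 1;
    const void* addr_a = &a;
    p = "Jennifer";
    const void* addr_p = &p;
    return addr_a == addr_p;
}
```

Two separate local variables would, by contrast, have distinct addresses.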

Related

What does the spec mean by the highlighted sentence in §3.3.7/1 item 5?

§3.3.7/1 item 5:
The potential scope of a declaration that extends to or past the end
of a class definition also extends to the regions defined by its
member definitions, even if the members are defined lexically outside
the class (this includes static data member definitions, nested class
definitions, and member function definitions, including the member
function body and any portion of the declarator part of such
definitions which follows the declarator-id, including a
parameter-declaration-clause and any default arguments (8.3.6)).
Would it be possible to identify such a declaration in the first example given in this paragraph?
typedef int c;
enum { i = 1 };
class X {
char v[i];
int f() { return sizeof(c); }
char c;
enum { i = 2 };
};
It looks as though it's saying, among other things, and in addition to the answer above, that given all the code outside that class definition, even if X::f were defined outside the class, like so:
typedef int c;
enum { i = 1 };
class X {
char v[i];
int f();
char c;
enum { i = 2 };
};
int X::f() {
return sizeof(c);
}
that, in the context of the definition of X::f, c would refer to the member variable X::c, not the typedef above, because even though it kind of looks like it's being defined globally, f actually lives in X's scope.
Yes. The declaration of the member c of the class X is visible inside the definition of f, even though lexically, it comes afterwards. This means that the sizeof expression applies to the member, and not to the type outside, which means it will return 1, not whatever the size of int is (probably 4).
Also, the enum constant X::i should, according to this rule, be visible when the array v is declared, although this surprises me, and I would strongly suggest avoiding such code - it sounds like a compiler bug or developer misunderstanding just waiting to happen.
Edit: Lightning Strikes in Orbit is probably right that the comment about parts of the declarator only applies to out-of-line definitions.
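A reduced version of the out-of-line case can be tested. This sketch drops the array v and the enumerators, keeps only c, and wraps everything in a made-up namespace scope_demo so the names stay out of the global scope. The helper outside() is also an addition for comparison:

```cpp
#include <cassert>

namespace scope_demo {

typedef int c;  // at namespace scope, c names a type

class X {
public:
    int f();    // defined out of line below
    char c;     // member c: an object of type char
};

// The body of X::f is in the scope of X, so c is the member here.
int X::f() { return static_cast<int>(sizeof(c)); }

// Outside the class, c is still the typedef for int.
int outside() { return static_cast<int>(sizeof(c)); }

}  // namespace scope_demo
```

Here X::f() yields sizeof(char), which is 1 by definition, while outside() yields sizeof(int).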

Why can a typedef-name for a struct not be used interchangeably with the struct name?

The following code (live example) does not compile:
struct S {};
typedef struct S T;
S s = T(); // OK
struct T * p; // error: elaborated type refers to a typedef
T::T(){} // error: C++ requires a type specifier for all declarations
Why is the language designed to not permit the last two lines?
Relevant Standard quote (N4140 §7.1.3/8):
[ Note: A typedef-name that names a class type, or a cv-qualified version thereof, is also a class-name (9.1).
If a typedef-name is used to identify the subject of an elaborated-type-specifier (7.1.6.3), a class definition (Clause 9), a constructor declaration (12.1), or a destructor declaration (12.4), the program is ill-formed.
—end note ]
So there are three unrelated issues. The first one you have in the quote you provide:
struct T * p;
That is illegal as T is a typedef.
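For contrast, here is a rough sketch of where the typedef-name does work. The member value, the static function answer and typedef_demo are invented for illustration; the ill-formed line is left as a comment so the block compiles:

```cpp
#include <cassert>
#include <type_traits>

struct S {
    int value;
    static int answer() { return 42; }
};
typedef struct S T;  // T is a typedef-name for S

static_assert(std::is_same<S, T>::value, "T and S name the same type");

int typedef_demo() {
    S s = T();            // OK: value-initialized temporary, so s.value is 0
    struct S* p = &s;     // OK: the elaborated-type-specifier names the class itself
    // struct T* q = &s;  // ill-formed: T is a typedef-name
    T* q = &s;            // OK: no class-key in front of the typedef-name
    assert(p == q);
    return p->value + T::answer();  // T also works in a nested-name-specifier
}
```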
T{};
That is illegal at namespace level, but would be legal in other contexts, for example as part of the initialization of a global, or inside a function:
T t = T{};
void f() { T{}; }
It really means to create a value-initialized temporary object of type T.
T::T(){}
That would be a valid definition for a default constructor, except that you did not declare one. If you modify the S to have a user declared default constructor that would work:
struct S { S(); };
Why is the language designed to not permit the last two lines?
Those two lines, in the updated question are:
struct T* p;
T::T() {}
The second one is legal, but you are trying to define a function that has not been declared as a member, so this is also unrelated to the original text. Which leaves us with one: struct T* p.
The motive comes from C. There, the identifiers for user-defined types and other names live in separate name spaces: when lookup resolves a name that is not qualified with struct or enum, it ignores struct and enum tags, and when it resolves a struct or enum tag, it ignores everything else. The following is valid C (and C++):
struct T {}; // 1
typedef struct S {} T; // 2
struct T t;
In C++ the rules for lookup changed a bit, and you can use a type name without qualifying it with struct or enum, but that is a different thing. Additionally, typedef-ed names can be used in contexts that were not possible in C.
A special case is lookup for an elaborated type specifier: should the typedef-ed name be usable there? If it were, the semantics of the program above would change: where in C t is of type T (defined in 1), in C++ it would become S (defined in 2).
Note that this is to some extent a wild guess, I did not make the rules and I don't know what went into consideration there. Note that C and C++ were never really compatible in this respect, a similar example changes semantics in C and C++:
int T;
void f() {
struct T { int data[10]; };
printf("%zu\n", sizeof(T));
}
That program will print a number 10x larger in C++ than in C. But the ability to use a type without having to qualify it with class or struct was probably more important than breaking compatibility in a few cases...
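The C++ half of that comparison can be sketched as follows. The namespace hiding_demo and the function names are made up, and the namespace is only there to keep T out of the global scope. The C behavior cannot be shown in the same file, so it is only described in a comment:

```cpp
#include <cassert>
#include <cstddef>

namespace hiding_demo {

int T = 0;  // ordinary identifier T

std::size_t size_inside_f() {
    struct T { int data[10]; };  // in C++ this class name hides the variable T
    return sizeof(T);            // C++: size of the struct; C would use the int variable
}

std::size_t size_of_variable() {
    return sizeof(T);            // no local struct here, so T is the int
}

}  // namespace hiding_demo
```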

Regarding struct variable declaration in C vs. C++

in C
struct node {
int x;
};
node *y; // incorrect
struct node *y; // correct
in C++
struct node{
int x;
};
node *y; // correct
y = new node; // correct
Question: Why is the tag name acceptable to create a pointer in C++ not in C?
Because the C++ standard says so, and the C standard doesn't.
I would say that C++ made the change to make classes easier to use (remember, classes and structs are fundamentally the same thing in C++; they only differ in default visibility), and C didn't follow suit because that would break tons of existing C code.
The reason is not just that the standard says so. C++ actually has namespaces, while in C the struct keyword provided a primitive form of namespace that lets you have a struct and a non-struct identifier with the same name. We can see this primitive form of namespace in the C99 draft standard, section 6.2.3 Name spaces of identifiers, which says:
If more than one declaration of a particular identifier is visible at any point in a
translation unit, the syntactic context disambiguates uses that refer to different entities.
Thus, there are separate name spaces for various categories of identifiers, as follows:
and has the following bullets:
— label names (disambiguated by the syntax of the label declaration and use);
— the tags of structures, unions, and enumerations (disambiguated by following any
of the keywords struct, union, or enum);
— the members of structures or unions; each structure or union has a separate name
space for its members (disambiguated by the type of the expression used to access the
member via the . or -> operator);
— all other identifiers, called ordinary identifiers (declared in ordinary declarators or as enumeration constants).
I could make up an example but POSIX gives us a great example with stat which is both a function and a struct:
int stat(const char *restrict path, struct stat *restrict buf);
    ^^^^                            ^^^^^^^^^^^
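C++ keeps this pattern working: a function may share its name with a class declared in the same scope, and the struct keyword then selects the class. The following sketch imitates the stat arrangement with invented names (tag_demo, info, query) rather than the real POSIX declarations:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace tag_demo {

struct info { std::size_t length; };  // the class name

// A function with the same name in the same scope; it hides the class name.
int info(const char* path, struct info* out) {
    out->length = std::strlen(path);
    return 0;
}

std::size_t query(const char* path) {
    struct info result = {0};      // 'struct' is required: plain 'info' is the function
    int rc = info(path, &result);  // plain 'info' calls the function
    assert(rc == 0);
    return result.length;
}

}  // namespace tag_demo
```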
If you want to be able to write node *y in C, just add a typedef:
typedef struct node {
int x;
} node;

Name lookup Clarification

$10.2/4- "[ Note: Looking up a name in
an elaborated-type-specifier (3.4.4)
or base-specifier (Clause 10), for
instance, ignores all nontype
declarations, while looking up a name
in a nested-name-specifier (3.4.3)
ignores function, variable, and
enumerator declarations."
I find this statement, from the section describing name lookup, very confusing.
void S(){}
struct S{
S(){cout << 1;}
void f(){}
static const int x = 0;
};
int main(){
struct S *p = new struct ::S; // here ::S refers to type
p->::S::f();
S::x; // base specifier, ignores the function declaration 'S'
::S(); // nested name specifier, ignores the struct declaration 'S'.
delete p;
}
My questions:
Is my understanding of the rules correct?
Why is ::S on the line doing new automatically treated as meaning struct S, whereas in the last line ::S means the function S in the global namespace?
Does this point to an ambiguity in the documentation, or is it yet another day for me to stay away from C++ Standard document?
Q1: I think so.
Q2: Compatibility with C. When you declare a struct in C, the tag name is just that, a tag name. To be able to use it in a standalone way, you need a typedef. In C++ you don't need the typedef, which makes life easier. But the C++ rules have been complicated by the need to be able to import already existing C headers which "overloaded" the tag name with a function name. The canonical example of that is the Unix stat() function, which takes a struct stat* as an argument.
Q3: Reading the Standard is usually quite difficult... you need to already know that nothing elsewhere modifies what you are reading. It isn't strange that the people who know how to do that are called language lawyers...
You are mistaken about the second comment. In S::x, the S is a name in a nested name specifier. What the Standard refers to with "base-specifier" is the following
namespace B { struct X { }; void X(); }
struct A : B::X { }; // B::X is a base-specifier
You are also not correct about this:
::S(); // nested name specifier, ignores the struct declaration 'S'.
That code calls the function not because ::S would be a nested-name-specifier (it isn't a nested-name-specifier!), but because function names hide class or enumeration names if both the function and the class/enumeration are declared in the same scope.
FWIW, the following code would be equally valid for line 2 of your main
p->S::f();
What's important is that S precedes a ::, which makes lookup ignore the function. That you put :: before S has no effect in your case.
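The three lookup contexts discussed here can be put side by side. This is only a sketch: the namespace lookup_demo, the values 7, 1 and 42, and the function use are invented, and the constructor that prints is left out:

```cpp
#include <cassert>

namespace lookup_demo {

struct S {
    static const int x = 7;
    int f() { return 1; }
};

int S() { return 42; }  // function in the same scope; it hides the class name

struct Derived : S {};  // base-specifier: lookup ignores non-type names, finds the class

int use() {
    struct S obj;        // elaborated-type-specifier: finds the class
    int a = S::x;        // S before :: ignores the function, finds the class
    int b = obj.S::f();  // same rule inside a member access
    int c = S();         // unqualified S names the function
    Derived d;
    assert(d.f() == 1);  // Derived really inherits from the class S
    return a + b + c;
}

}  // namespace lookup_demo
```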

Unnamed classes

Motivated by the discussion
The grammar for C++ classes is defined as
class-key identifier *[opt]* base-clause *[opt]* (Italics are mine)
This to me means that the class name is optional and we can have unnamed classes in C++.
So, is the following well-formed?
struct X{
struct{
int x;
int y;
};
};
int main(){}
VS: error C2467: illegal declaration of anonymous 'struct'
Comeau online: error: declaration does not declare anything
struct{
GCC(ideone): Compiles fine
Any thoughts?
No, it is not well-formed. You cannot derive the language syntax from these grammar statements alone. The extra requirements given in the text of the standard also have to be taken into account. In this case that would be
7 Declarations
...
3 In a simple-declaration, the optional init-declarator-list can be
omitted only when declaring a class
(clause 9) or enumeration (7.2), that
is, when the decl-specifier-seq
contains either a class-specifier, an
elaborated-type-specifier with a
class-key (9.1), or an enum-specifier.
In these cases and whenever a
class-specifier or enum-specifier is
present in the decl-specifier-seq, the
identifiers in these specifiers are
among the names being declared by the
declaration (as class-names,
enum-names, or enumerators, depending
on the syntax). In such cases, and
except for the declaration of an
unnamed bit-field (9.6), the
decl-specifier-seq shall introduce one
or more names into the program, or
shall redeclare a name introduced by a
previous declaration.
The last sentence is the one that matters in this case.
The "optional" part is only there to allow declarations like
struct { int x; } s;
typedef struct { int x, y; } Point;
The first one declares a class type with no linkage and a variable s of that type. Note that types with no linkage cannot be used to declare a variable with linkage, meaning that such a declaration cannot be used in namespace scope.
Your example is ill-formed, but this would be legal
struct X {
struct {
int x;
int y;
} point;
};
Also, nameless class syntax is used to declare anonymous unions (although I'm a bit puzzled by the fact that 7/3 does not mention anonymous unions).
That code is actually valid in MSVC; you must have compiled in a restricted mode.
And while I would most likely never use them, they do allow for some interesting usage, like this:
X obj;
obj.x=1;
obj.y=2;
They're used for example in the LARGE_INTEGER class, only it's a union instead. This way you can avoid sub-objects when all you really want is one member to be splittable into smaller pieces.
The LARGE_INTEGER declaration as a visual example:
#if defined(MIDL_PASS)
typedef struct _LARGE_INTEGER {
#else // MIDL_PASS
typedef union _LARGE_INTEGER {
struct {
DWORD LowPart;
LONG HighPart;
};
struct {
DWORD LowPart;
LONG HighPart;
} u;
#endif //MIDL_PASS
LONGLONG QuadPart;
} LARGE_INTEGER;
As far as I know however, this isn't valid standard C++, it's only allowed as extensions in gcc and msvc.
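If portability matters, the named inner member (like u above) is the part that standard C++ accepts. The sketch below uses a hypothetical LargeInt type with fixed-width integers in place of the Windows typedefs. It deliberately reads back only the member that was written, since reading QuadPart after writing u would be type punning through a union:

```cpp
#include <cassert>
#include <cstdint>

union LargeInt {  // hypothetical stand-in, not the real LARGE_INTEGER
    struct {
        std::uint32_t LowPart;
        std::int32_t HighPart;
    } u;          // named member of an unnamed struct type: standard C++
    std::int64_t QuadPart;
};

// Combines the two halves arithmetically instead of reading QuadPart.
std::int64_t combine(std::uint32_t low, std::int32_t high) {
    LargeInt li = {{low, high}};  // brace-initializes the first member, u
    return static_cast<std::int64_t>(li.u.HighPart) * 4294967296LL + li.u.LowPart;
}
```

The cost compared with the extension is the extra .u in every access.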