Why doesn't C++ need forward declarations for class members? - c++

I was under the impression that everything in C++ must be declared before being used.
In fact, I remember reading that this is the reason why the use of auto in return types is not valid C++0x without something like decltype: the compiler must know the declared type before evaluating the function body.
Imagine my surprise when I noticed (after a long time) that the following code is in fact perfectly legal:
[Edit: Changed example.]
class Foo
{
Foo(int x = y);
static const int y = 5;
};
So now I don't understand:
Why doesn't the compiler require a forward declaration inside classes, when it requires them in other places?

The standard says (section 3.3.7):
The potential scope of a name declared in a class consists not only of the declarative region following the name’s point of declaration, but also of all function bodies, brace-or-equal-initializers of non-static data members, and default arguments in that class (including such things in nested classes).
This is probably accomplished by delaying processing bodies of inline member functions until after parsing the entire class definition.
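To make the quoted rule concrete, here is a minimal sketch that touches all three contexts it names. The class `Widget` and its members are hypothetical, chosen only for illustration; each use of `limit` appears before `limit` is declared.

```cpp
#include <cassert>

// Hypothetical class: every use of `limit` below precedes its declaration.
struct Widget {
    // Default argument: `limit` is found although it is declared further down.
    void resize(int n = limit) { count = n; }

    // Member function body: sees both `count` and `limit`.
    bool full() const { return count >= limit; }

    // Default member initializer (a C++11 brace-or-equal-initializer).
    int count = limit / 2;

    static const int limit = 8;
};

// Usage sketch: exercises the three contexts.
void widget_demo() {
    Widget w;             // count starts at limit / 2
    assert(w.count == 4);
    assert(!w.full());
    w.resize();           // the default argument supplies limit
    assert(w.full());
}
```

Moving `limit` to the top of the class should not change the meaning of any of these uses.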

Function definitions within the class body are treated as if they were actually defined after the class has been defined. So your code is equivalent to:
class Foo
{
Foo();
int x, *p;
};
inline Foo::Foo() { p = &x; }

Actually, I think you need to reverse the question to understand it.
Why does C++ require forward declarations?
Because of the way C++ works (include files, not modules), it would otherwise need to wait for the whole Translation Unit before being able to assess, for sure, what the functions are. There are several downsides here:
compilation time would take yet another hit
it would be nigh impossible to provide any guarantee for code in headers, since any introduction of a later function could invalidate it all
Why is a class different?
A class is by definition contained. It's a small unit (or should be...). Therefore:
there is little compilation-time cost, since the compiler only has to wait until the end of the class before analyzing it
there is no risk of dependency hell, since all dependencies are clearly identified and isolated
Therefore we can eschew this annoying forward-declaration rule for classes.

Just guessing: the compiler saves the body of the function and doesn't actually process it until the class declaration is complete.

Unlike a namespace, a class's scope cannot be reopened; it is bounded.
Imagine implementing a class in a header if everything needed to be declared in advance. I presume that, since the scope is bounded, it was more logical to write the language as it is rather than requiring the user to write forward declarations inside the class (or to keep definitions separate from declarations).
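A small sketch of the contrast being drawn here, with hypothetical names (`shapes`, `Box`): a namespace can be extended in several places, while a class has exactly one closing brace after which its member list is final.

```cpp
#include <cassert>

namespace shapes {
    int area(int w, int h) { return w * h; }
}

// The same namespace, reopened later (possibly from another header).
namespace shapes {
    int square_area(int side) { return area(side, side); }
}

struct Box {
    int volume() const { return w * h * d; }  // members declared just below
    int w, h, d;
};
// struct Box { int extra; };  // would not compile: a class cannot be reopened

// Usage sketch.
void scope_demo() {
    Box b{2, 3, 4};
    assert(shapes::square_area(3) == 9);
    assert(b.volume() == 24);
}
```

Because the member list is complete at the closing brace, the compiler knows exactly when it can go back and process the function bodies.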

Related

Why don't methods of structs have to be declared in C++?

Take, for example, the following code:
#include <iostream>
#include <string>
int main()
{
print("Hello!");
}
void print(std::string s) {
std::cout << s << std::endl;
}
When trying to build this, I get the following:
program.cpp: In function ‘int main()’:
program.cpp:6:16: error: ‘print’ was not declared in this scope
Which makes sense.
So why can I conduct a similar concept in a struct, but not get yelled at for it?
struct Snake {
...
Snake() {
...
addBlock(Block(...));
}
void addBlock(Block block) {
...
}
void update() {
...
}
} snake1;
Not only do I not get warnings, but the program actually compiles! Without error! Is this just the nature of structs? What's happening here? Clearly addBlock(Block) was called before the method was ever declared.
A struct in C++ is actually a class definition where all its content is public, unless specified otherwise by including a protected: or private: declaration.
When the compiler sees a class or struct, it first digests all its declarations from inside the block ({}) before operating on them.
In the free-function case, the compiler has not yet seen a declaration of print at the point where main calls it.
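The two situations can be put side by side in a short sketch. The names `Counter`, `decorate` and `greet` are made up for this example; the point is only that the member call needs no prior declaration, while the free function does.

```cpp
#include <cassert>
#include <string>

// Class scope: the constructor body is processed as if it appeared after
// the closing brace, so add() and total are already known.
struct Counter {
    Counter() { add(1); }
    void add(int n) { total += n; }
    int total = 0;
};

// Namespace scope: a declaration must precede the call.
std::string decorate(const std::string& s);        // declaration only

std::string greet() { return decorate("Hello"); }  // fine: decorate is declared

std::string decorate(const std::string& s) { return s + "!"; }

// Usage sketch.
void order_demo() {
    assert(Counter().total == 1);
    assert(greet() == "Hello!");
}
```

Removing the declaration of `decorate` should bring back the same kind of "was not declared in this scope" error as in the question.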
C++ standard 3.4.1:
.4:
A name used in global scope, outside of any function, class or
user-declared namespace, shall be declared before its use in global
scope.
This is why global variables and functions cannot be used before they are declared.
.5:
A name used in a user-declared namespace outside of the definition of
any function or class shall be declared before its use in that
namespace or before its use in a namespace enclosing its namespace.
This is the same rule restated: paragraph .4 explicitly restricts itself to the global scope, and this paragraph says that the same holds in namespaces.
.7:
A name used in the definition of a class X outside of a member function body or nested class definition shall be declared in one of the following ways:
before its use in class X or be a member of a base class of X (10.2), or
if X is a nested class of class Y (9.7), before the definition of X in Y, or shall be a member of a base class of Y (this lookup applies in turn to Y's enclosing classes, starting with the innermost enclosing class), or
if X is a local class (9.8) or is a nested class of a local class, before the definition of class X in a block enclosing the definition of class X, or
if X is a member of namespace N, or is a nested class of a class that is a member of N, or is a local class or a nested class within a local class of a function that is a member of N, before the definition of class X in namespace N or in one of N's enclosing namespaces.
I think this covers all the code in a class that is not inside a function body (i.e. the declarative code).
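As I read paragraph .7, names used in the member declarations themselves still follow declare-before-use; only the function bodies get the relaxed treatment. A minimal sketch, with a hypothetical `Buffer` class:

```cpp
#include <cassert>

struct Buffer {
    // size_type capacity_;   // would not compile: size_type is not declared yet
    typedef unsigned long size_type;

    // Inside a member function body, size_ may be used before its declaration.
    size_type size() const { return size_; }

    // In the declaration itself, size_type must already be known.
    size_type size_ = 3;
};

// Usage sketch.
void buffer_demo() {
    Buffer b;
    assert(b.size() == 3u);
}
```

So the typedef has to come before the data member that uses it, but the function that returns the data member can sit anywhere in the class.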
and finally the interesting part:
3.3.7 Class scope [basic.scope.class]
1 The following rules describe the scope of names declared in classes.
1) The potential scope of a name declared in a class consists not only of the declarative region following the name’s point of declaration, but also of all function bodies, brace-or-equal-initializers of non-static data members, and default arguments in that class (including such things in nested classes).
2) A name N used in a class S shall refer to the same declaration in its context and when re-evaluated in the completed scope of S. No diagnostic is required for a violation of this rule.
3) If reordering member declarations in a class yields an alternate valid program under (1) and (2), the program is ill-formed, no diagnostic is required.
In particular, the last point defines in a negative way that the order of members must not matter: if reordering the member declarations would change what a name refers to, the program is ill-formed. It is a roundabout way of saying "you can reorder the members of a well-formed class and it doesn't change anything".
Effectively, names in a class are looked up in a two-phase fashion.
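Here is a sketch of what rules 2 and 3 guard against, using hypothetical names. The commented-out typedef is the classic problem case: it would make `T` mean something different once the class is complete. The rule says no diagnostic is required, although some compilers do reject it.

```cpp
#include <cassert>
#include <type_traits>

typedef int T;   // namespace-scope T

struct S {
    T a;                    // T is ::T (int) at this point
    // typedef double T;    // ill-formed if uncommented: in the completed
                            // class, the T used for `a` would name S::T
};

// Reordering members whose lookup does not change is harmless.
struct R1 {
    int get() const { return v; }
    int v = 5;
};
struct R2 {
    int v = 5;
    int get() const { return v; }
};

static_assert(std::is_same<decltype(S::a), int>::value, "a uses ::T");

// Usage sketch.
void reorder_demo() {
    assert(R1().get() == R2().get());
}
```

`R1` and `R2` differ only in member order and behave the same, which is the situation the rules are designed to preserve.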
"why can I conduct a similar concept in a struct, but not get yelled at for it?"
In a struct or class definition you're presenting the public interface to a class and it's much easier to understand, search and maintain/update that API if it's presented in:
a predictable order, with
minimal clutter.
For predictable order, people have their own styles and there's a bit of "art" involved, but for example I use each access specifier at most once and always public before protected before private, then within those I normally put typedefs, const data, constructors, destructors, mutating/non-const functions, const functions, statics, friends....
To minimise clutter, if a function is defined in the class, it might as well be without a prior declaration. Having both tends only to obfuscate the interface.
This is different from functions that aren't members of a class - where people who like top-down programming do use function declarations and hide the definitions later in the file - in that:
people who prefer a bottom-up programming style won't appreciate being forced to either have separate declarations in classes or abandon the oft-conflicting practice of grouping by access specifier
Classes are statistically more likely to have many very short functions, largely because they provide encapsulation and wrap a lot of trivial data member accesses or provide operator overloading, casting operators, implicit constructors and other convenience features that aren't relevant to non-OO, non-member functions. That makes a constant forced separation of declarations and definitions more painful for many classes (not so much in the public interfaces where definitions might be in a separate file, but definitely for e.g. classes in anonymous namespaces supporting the current translation unit).
Best practice is for classes not to cram in a wildly extensive interface... you generally want a functional core and then some discretionary convenience functions, after which it's worth considering what can be added as non-member functions. std::string is often claimed to have too many member functions, though I personally think it's quite reasonable. Still, this also differs from a header file declaring a library interface, where exhaustive functionality can be expected to be crammed together, making a separation of even inline implementation more desirable.

Changing struct to class (and other type changes) and ABI/code generation

It is well-established and a canonical reference question that in C++ structs and classes are pretty much interchangeable, when writing code by hand.
However, if I want to link to existing code, can I expect it to make any difference (i.e. break, nasal demons etc.) if I redeclare a struct as a class, or vice versa, in a header after the original code has been generated?
So the situation is the type was compiled as a struct (or a class), and I'm then changing the header file to the other declaration before including it in my project.
The real-world use case is that I'm auto-generating code with SWIG, which generates different output depending on whether it's given structs or classes; I need to change one to the other to get it to output the right interface.
The example is here (Irrlicht, SVertexManipulator.h) - given:
struct IVertexManipulator
{
};
I am redeclaring it mechanically as:
/*struct*/class IVertexManipulator
{public:
};
The original library compiles with the original headers, untouched. The wrapper code is generated using the modified forms, and compiled using them. The two are then linked into the same program to work together. Assume I'm using the exact same compiler for both libraries.
Is this sort of thing undefined? "Undefined", but expected to work on real-world compilers? Perfectly allowable?
Other similar changes I'm making include removing some default values from parameters (to prevent ambiguity), and removing field declarations from a couple of classes where the type is not visible to SWIG (which changes the structure of the class, but my reasoning is that the generated code shouldn't need that information, only to link to member functions). Again, how much havoc could this cause?
e.g. IGPUProgrammingServices.h:
s32 addHighLevelShaderMaterial(
const c8* vertexShaderProgram,
const c8* vertexShaderEntryPointName/*="main"*/,
E_VERTEX_SHADER_TYPE vsCompileTarget/*=EVST_VS_1_1*/,
const c8* pixelShaderProgram=0,
...
CIndexBuffer.h:
public:
//IIndexList *Indices;
...and so on like that. Other changes include replacing some template parameter types with their typedefs and removing the packed attribute from some structs. Again, it seems like there should be no problem if the altered struct declarations are never actually used in machine code (just to generate names to link to accessor functions in the main library), but is this reliably the case? Ever the case?
This is technically undefined behavior.
3.2/5:
There can be more than one definition of a class type, [... or other things that should be defined in header files ...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
each definition of D shall consist of the same sequence of tokens; and
...
... If the definitions of D satisfy all these requirements, then the program shall behave as if there were a single definition of D. If the definitions of D do not satisfy these requirements, then the behavior is undefined.
Essentially, you are changing the first token from struct to class, and inserting tokens public and : as appropriate. The Standard doesn't allow that.
But in all compilers I'm familiar with, this will be fine in practice.
Other similar changes I'm making include removing some default values from parameters (to prevent ambiguity)
This actually is formally allowed, if the declaration doesn't happen to be within a class definition. Different translation units and even different scopes within a TU can define different default function arguments. So you're probably fine there too.
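A sketch of the "different scopes, different defaults" point, using a hypothetical free function `scale`. Each block-scope declaration carries its own default argument, and none of them affects the others.

```cpp
#include <cassert>

int scale(int x, int factor);   // no default argument at namespace scope

int twice(int x) {
    int scale(int, int = 2);    // block-scope declaration with its own default
    return scale(x);
}

int thrice(int x) {
    int scale(int, int = 3);    // a different default in a different scope
    return scale(x);
}

int scale(int x, int factor) { return x * factor; }

// Usage sketch.
void defaults_demo() {
    assert(twice(5) == 10);
    assert(thrice(5) == 15);
}
```

All three declarations refer to the same function; only the call sites differ in what they pass.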
Other changes include replacing some template parameter types with their typedefs
Also formally allowed outside of a class definition: two declarations of a function that use different ways of naming the same type refer to the same function.
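For instance, in this sketch the alias `u32` is a made-up stand-in for a library typedef. The declaration spells the type through the alias, the definition spells it directly, and both name one function.

```cpp
#include <cassert>

typedef unsigned int u32;   // hypothetical alias for illustration

u32 half(u32 value);        // declared using the alias

// Defined using the underlying type: this is the same function, not an overload.
unsigned int half(unsigned int value) { return value / 2; }

// Usage sketch.
void alias_demo() {
    assert(half(10u) == 5u);
}
```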
... removing field declarations ... and removing the packed attribute from some structs
Now you're in severe danger territory, though. I'm not familiar with SWIG, but if you do this sort of thing, you'd better be darn sure the code using these "wrong" definitions never:
create or destroy an object of the class type
define a type that inherits or contains a member of the class type
use a non-static data member of the class
call an inline or template function that uses a non-static data member of the class
call a virtual member function of the class type or a derived type
try to find sizeof or alignof the class type
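To see why the list above matters, compare two views of "the same" struct. In this sketch `Original` stands for what the library was compiled with and `Edited` for the header after a field was removed; both names are hypothetical, and they must differ here only because the two views are shown in one translation unit.

```cpp
#include <cassert>
#include <cstddef>

// What the library was compiled against.
struct Original { int id; double weight; int flags; };

// The edited header's view, with one field removed.
struct Edited { int id; /* weight removed */ int flags; };

// Usage sketch: the two views disagree on size and on where flags lives.
void layout_demo() {
    assert(sizeof(Original) > sizeof(Edited));
    assert(offsetof(Original, flags) > offsetof(Edited, flags));
}
```

Code compiled against the edited view would read `flags` from the wrong offset and allocate too little storage, which is why anything that touches data members, size or layout has to be avoided.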

Definition of a class's private integral constant: in the header or in the cpp file?

Subject has been addressed mostly here (Where to declare/define class scope constants in C++?)
and in particular here.
What I would like to fully understand, in case of integral constants, is there any difference between:
//In the header
class A {
private:
static const int member = 0; //Declaration and definition
};
And:
//In the header
class A {
private:
static const int member; //Only declaration
};
//In the cpp
const int A::member = 0; //Definition
(I understand that the second might have the advantage that if I change the value of the constant, I have to recompile only one file)
Side questions:
What happens, for example, with an inline method defined in the header that accesses member? Will it simply not be inlined? What would happen if, going to one extreme, all methods were defined in the header file as inline methods and all constants were defined in the cpp file?
Edit:
My apologies: I thought it was not necessary, but I missed the fact that the member is static. My question stays, but now the code is legal.
If, as it was before the question was changed to make it static, it's a non-static member, then it can only be initialised in the constructor's initialiser list or (since 2011) in the member's declaration. Your second example was ill-formed.
If it's static, then you need a definition if it's odr-used: roughly speaking, if you do anything that requires its address rather than just its value. If you only use the value, then the first example is fine. But note that the comment is wrong - it's just a declaration, not a definition.
If you do need a definition, then it's up to you whether you specify the value in the declaration or the definition. Specifying it in the declaration allows better scope for optimisation, since the value is always available when the variable is used. Specifying it in the definition gives better encapsulation, only requiring one translation unit to be recompiled if it changes.
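A sketch of the value-versus-address distinction, shown in one file for brevity (in a real project the out-of-class definition would normally live in the cpp file). The function names are hypothetical and the member is public only to keep the example short.

```cpp
#include <cassert>

struct A {
    static const int member = 7;   // declaration with an initializer, not a definition
};

// Definition: needed once in the program because address_of_member()
// odr-uses the variable. The initializer stays on the in-class declaration.
const int A::member;

int value_plus_one() { return A::member + 1; }          // only the value is needed
const int* address_of_member() { return &A::member; }   // needs actual storage

// Usage sketch.
void constant_demo() {
    assert(value_plus_one() == 8);
    assert(*address_of_member() == 7);
}
```

Without the out-of-class definition, `value_plus_one` would still be fine, but `address_of_member` would typically fail at link time.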
What happens for example with an inline method defined in the header that accesses member? Will it simply not be inlined?
There's no reason why accessing a data object defined in another translation unit should prevent a function from being inlined.
There are two points of view to take into account, namely visibility and addressing.
Note that the two are orthogonal, for you can actually declare the variable as initialized and still define it in a translation unit so it has an effective address in memory.
Visibility
Visibility affects the usage of the variable, and has some technical impacts.
For usage in template code as a non-type template parameter, the value must be visible at the point of use. Also, in C++11, this might be necessary for constexpr usage. Otherwise, it is not necessary that the value be visible.
Technically, a visible value can trigger compiler optimizations. For example, if (A::member) is trivially false, so the test can be elided. This is generally referred to as constant propagation. While this may seem a good thing at first glance, it has a profound impact: all clients of the header file potentially depend on this value, and thus any change to it means they should be recompiled. If you deliver this header as part of a shared library, changing this value breaks the ABI.
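A small sketch of the two visibility effects described above, with hypothetical names (`Limits`, `Table`): the constant is used as a non-type template argument, which requires the value to be visible, and as a condition the compiler can evaluate at compile time.

```cpp
#include <cassert>

struct Limits {
    static const int slots = 4;   // value visible to everyone who includes this
};

template <int N>
struct Table {
    int data[N];
    static int size() { return N; }
};

// Requires the value of Limits::slots at this point, not just its declaration.
typedef Table<Limits::slots> SlotTable;

bool has_slots() {
    if (Limits::slots) return true;   // condition is a compile-time constant
    return false;
}

// Usage sketch.
void visibility_demo() {
    assert(SlotTable::size() == 4);
    assert(has_slots());
}
```

If `slots` were only declared in the header and given its value in a cpp file, the `SlotTable` typedef would not compile, and every client that did compile would no longer bake the value 4 into its own object code.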
Addressing
The rule here is quite simple: if the variable can be addressed (either passed by pointer or reference), then it needs to reside somewhere in memory. This requires a definition in one translation unit.
This is a question of data hiding: whether you want to expose internal class fields or not. If you are shipping a class library and want to hide the implementation details, then it is better to show as few entities as possible in the interface; even a declaration of the private member is too much.
I would just declare this value as a static variable inside a .cpp file.

Why field inside a local class cannot be static?

void foo (int x)
{
struct A { static const int d = 0; }; // error
}
Other than the reference from standard, is there any motivation behind this to disallow static field inside an inner class ?
error: field `foo(int)::A::d' in local class cannot be static
Edit: However, static member functions are allowed. I have one use case for such scenario. Suppose I want foo() to be called only for PODs then I can implement it like,
template<typename T>
void foo (T x)
{
struct A { static const T d = 0; }; // many compilers allow double, float etc.
}
foo() should pass for PODs only (if static is allowed) and not for other data types. This is just one use case which comes to my mind.
Because static members of a class need to be defined in a global scope, e.g.
foo.h
class A {
static int dude;
};
foo.cpp
int A::dude = 314;
Since the scope inside void foo(int x) is local to that function, there is no scope to define its static member[s].
Magnus Skog has given the real answer: a static data member is just a declaration; the object must be defined elsewhere, at namespace scope, and the class definition isn't visible at namespace scope.
Note that this restriction only applies to static data members. Which means that there is a simple work-around:
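class Local
{
public: // needed so that the enclosing function can call static_i()
static int& static_i()
{
static int value; // function-local static: defined right where it is declared
return value;
}
};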
This provides you with exactly the same functionality, at the cost of
using the function syntax to access it.
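Put inside an actual function, the workaround might look like the following sketch. `next_id` and `counter` are hypothetical names; the local struct keeps its state in a function-local static rather than a static data member.

```cpp
#include <cassert>

int next_id() {
    struct Local {
        // static int count;   // would not compile: static data member in a local class
        static int& counter() {
            static int value = 0;   // function-local static, defined right here
            return value;
        }
    };
    return ++Local::counter();
}

// Usage sketch: the state survives between calls.
void id_demo() {
    int a = next_id();
    int b = next_id();
    assert(b == a + 1);
}
```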
Because nobody saw any need for it ?
[edit]: static variables need be defined only once, generally outside of the class (except for built-ins). Allowing them within a local class would require designing a way to define them also. [/edit]
Any feature added to a language has a cost:
it must be implemented by the compiler
it must be maintained in the compiler (and may introduce bugs, even in other features)
it lives in the compiler (and thus may cause some slow down even when unused)
Sometimes, not implementing a feature is the right decision.
Local functions, and classes, add difficulty already to the language, for little gain: they can be avoided with static functions and unnamed namespaces.
Frankly, if I had to make the decision, I'd remove them entirely: they just clutter the grammar.
A single example: The Most Vexing Parse.
I think this is the same naming problem that has prevented us from using local types in template instantiations.
The name foo()::A::d is not a good name for the linker to resolve, so how should it find the definition of the static member? What if there is another struct A in function baz()?
Interesting question, but I have difficulty understanding why you'd want a static member in a local class. Statics are typically used to maintain state across program flow, but in this case wouldn't it be better to use a static variable whose scope was foo()?
If I had to guess why the restriction exists, I'd say it was something to do with the difficulty for the compiler in knowing when to perform the static initialisation. The C++ standards docs might provide a more formal justification.
Just because.
One annoying thing about C++ is that there's a strong dependence on a "global context" concept where everything must be uniquely named. Even the nested namespaces machinery is just string trickery.
I suppose (just a wild guess) that one serious technical issue is working with linkers that were designed for C and that just got some tweak to get them working with C++ (and C++ code needs C interoperability).
It would be nice to be able to get any C++ code and "wrap it" to be able to use it without conflicts in a larger project, but this is not the case because of linkage problems. I don't think there is any reasonable philosophical reason for forbidding statics or non-inline methods (or even nested functions) at the function level but this is what we got (for now).
Even the declaration/definition duality with all its annoying verbosity and implications is just about implementation problems (and to give the ability to sell usable object code without providing the source, something that is now a lot less popular for good reasons).

Why is the 'Declare before use' rule not required inside a class? [duplicate]

I'm wondering why the declare-before-use rule of C++ doesn't hold inside a class.
Look at this example:
#ifdef BASE
struct Base {
#endif
struct B;
struct A {
B *b;
A(){ b->foo(); }
};
struct B {
void foo() {}
};
#ifdef BASE
};
#endif
int main( ) { return 0; }
If BASE is defined, the code is valid.
Within A's constructor I can use B::foo, which hasn't been declared yet.
Why does this work and, more importantly, why does it only work inside a class?
Well, to be pedantic there's no "declare before use rule" in C++. There are rules of name lookup, which are pretty complicated, but which can be (and often are) roughly simplified into the generic "declare before use rule" with a number of exceptions. (In a way, the situation is similar to "operator precedence and associativity" rules. While the language specification has no such concepts, we often use them in practice, even though they are not entirely accurate.)
This is actually one of those exceptions. Member function definitions in C++ are specifically and intentionally excluded from that "declare before use rule" in a sense that name lookup from the bodies of these members is performed as if they are defined after the class definition.
The language specification states that in 3.4.1/8 (and footnote 30), although it uses a different wording. It says that during the name lookup from the member function definition, the entire class definition is inspected, not just the portion above the member function definition. Footnote 30 additionally states though that the lookup rules are the same for functions defined inside the class definition or outside the class definition (which is pretty much what I said above).
Your example is a bit non-trivial. It raises the immediate question about member function definitions in nested classes: should they be interpreted as if they are defined after the definition of the most enclosing class? The answer is yes. 3.4.1/8 covers this situation as well.
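A stripped-down sketch of the nested-class case, with hypothetical names. Here `B` is not even forward-declared before `A`; the body of `A::call` is still allowed to use it, because the body is treated as if it came after the end of `Outer`.

```cpp
#include <cassert>

struct Outer {
    struct A {
        // B is declared later in Outer, yet the body may use it: bodies of
        // nested-class members are processed once Outer is complete.
        int call() const { return B().foo(); }
    };
    struct B {
        int foo() const { return 42; }
    };
};

// Usage sketch.
void nested_demo() {
    assert(Outer::A().call() == 42);
}
```

Taking `A` and `B` out of `Outer` and leaving them in this order at namespace scope would make `call` fail to compile, which matches the behaviour observed in the question.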
"Design & Evolution of C++" book describes the reasoning behind these decisions.
That's because member functions are compiled only after the whole class definition has been parsed by the compiler, even when the function definition is written inline, whereas regular functions are compiled immediately after being read. The C++ standard requires this behaviour.
I don't know the chapter and verse of the standard on this.
But if you would apply the "declare before use" rule strictly within a class, you would not be able to declare member variables at the bottom of the class declaration either. You would have to declare them first, in order to use them e.g. in a constructor initialization list.
I could imagine the "declare before use" rule has been relaxed a bit within the class declaration to allow for "cleaner" overall layout.
Just guesswork, as I said.
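The layout this answer has in mind is the common "interface first, data last" style. A minimal sketch with a hypothetical `Point` class:

```cpp
#include <cassert>

class Point {
public:
    // The mem-initializer list and the function body both name x_ and y_,
    // which are declared at the bottom of the class.
    Point(int x, int y) : x_(x), y_(y) {}
    int sum() const { return x_ + y_; }

private:
    int x_;
    int y_;
};

// Usage sketch.
void point_demo() {
    Point p(2, 3);
    assert(p.sum() == 5);
}
```

Under a strict declare-before-use rule, the private section would have to move above the constructor for this to compile.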
The most stubborn problems in the definition of C++ relate to name lookup: exactly which uses of a name refer to which declarations? Here, I'll describe just one kind of lookup problem: the ones that relate to order dependencies between class member declarations. [...]
Difficulties arise because of conflicts between goals:
We want to be able to do syntax analysis reading the source text once only.
Reordering the members of a class should not change the meaning of the class.
A member function body explicitly written inline should mean the same thing when written out of line.
Names from an outer scope should be usable from an inner scope (in the same way as they are in C).
The rules for name lookup should be independent of what a name refers to.
If all of these hold, the language will be reasonably fast to parse, and users won't have to worry about these rules because the compiler will catch the ambiguous and near ambiguous cases. The current rules come very close to this ideal.
[The Design And Evolution Of C++, section 6.3.1 called Lookup Issues on page 138]