Is the compiler allowed to optimise out private data members? - c++

If the compiler can prove that a (private) member of a class is never used, including by potential friends, does the standard allow the compiler to remove this member from the memory footprint of the class?
It is self-evident that this not possible for protected or public members at compile time, but there could be circumstances where it is possible regarding private data members for such a proof to be constructed.
Related questions:
Behind the scenes of public, private and protected (sparked this question)
Is C++ compiler allowed to optimize out unreferenced local objects (about automatic objects)
Will a static variable always use up memory? (about static objects)

Possible in theory (along with unused public members), but not with the kind of compiler ecosystem we're used to (targeting a fixed ABI that can link separately-compiled code). Removing unused members could only be done with whole-program optimization that forbids separate libraries1.
Other compilation units might need to agree on sizeof(foo), but that wouldn't be something you could derive from a .h if it depended on verifying that no implementation of a member function's behaviour depended on any private members.
Remember C++ only really specifies one program, not a way to do libraries. The language ISO C++ specifies is compatible with the style of implementation we're used to (of course), but implementations that take all the .cpp and .h files at once and produce a single self-contained non-extensible executable are possible.
If you constrain the implementation enough (no fixed ABI), aggressive whole-program application of the as-if rule becomes possible.
Footnote 1: I was going to add "or exports the size information somehow to other code being compiled" as a way to allow libraries, if the compiler could already see definitions for every member function declared in the class. But #PasserBy's answer points out that a separately-compiled library could be the thing that used the declared private members in ways that ultimately produce externally-visible side effects (like I/O). So we'd have to fully rule them out.
Given that, public and private members are equivalent for the purposes of such an optimization.

If the compiler can prove that a (private) member of a class is never used
The compiler cannot prove that, because private members can be used in other compilation units. Concretely, this is possible in the context of a pointer to member in a template argument according to [temp.spec]/6 of the standard, as originally described by Johannes Schaub.
So, in summary: no, the compiler must not optimise out private data members any more than public or protected members (subject to the as-if rule).

No, because you can subvert the access control system legally.
class A
{
int x;
};
auto f();
template<auto x>
struct cheat
{
friend auto f() { return x; }
};
template struct cheat<&A::x>; // see [temp.spec]/6
int& foo(A& a)
{
return a.*f(); // returns a.x
}
Given that the compiler must fix the ABI when A is first used, and that it can never know whether some future code may access x, it must fix the memory of A to contain x.

Related

Access to protected member through member-pointer: is it a hack?

We all know members specified protected from a base class can only be accessed from a derived class own instance. This is a feature from the Standard, and this has been discussed on Stack Overflow multiple times:
Cannot access protected member of another instance from derived type's scope
;
Why can't my object access protected members of another object defined in common base class?
And others.
But it seems possible to walk around this restriction with member pointers, as user chtz has shown me:
struct Base { protected: int value; };
struct Derived : Base
{
void f(Base const& other)
{
//int n = other.value; // error: 'int Base::value' is protected within this context
int n = other.*(&Derived::value); // ok??? why?
(void) n;
}
};
Live demo on coliru
Why is this possible, is it a wanted feature or a glitch somewhere in the implementation or the wording of the Standard?
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
The fact that a member is not accessible using class member access expr.ref (aclass.amember) due to access control [class.access] does not make this member inaccessible using other expressions.
The expression &Derived::value (whose type is int Base::*) is perfectly standard compliant, and it designates the member value of Base. Then the expression a_base.*p where p is a pointer to a member of Base and a_base an instance of Base is also standard compliant.
So any standard compliant compiler shall make the expression other.*(&Derived::value); defined behavior: access the member value of other.
is it a hack?
In similar vein to using reinterpret_cast, this can be dangerous and may potentially be a source of hard to find bugs. But it's well formed and there's no doubt whether it should work.
To clarify the analogy: The behaviour of reinterpret_cast is also specified exactly in the standard and can be used without any UB. But reinterpret_cast circumvents the type system, and the type system is there for a reason. Similarly, this pointer to member trick is well formed according to the standard, but it circumvents the encapsulation of members, and that encapsulation (typically) exists for a reason (I say typically, since I suppose a programmer can use encapsulation frivolously).
[Is it] a glitch somewhere in the implementation or the wording of the Standard?
No, the implementation is correct. This is how the language has been specified to work.
Member function of Derived can obviously access &Derived::value, since it is a protected member of a base.
The result of that operation is a pointer to a member of Base. This can be applied to a reference to Base. Member access privileges does not apply to pointers to members: It applies only to the names of the members.
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
Not UB. Base has the member.
Just to add to the answers and zoom in a bit on the horror I can read between your lines. If you see access specifiers as 'the law', policing you to keep you from doing 'bad things', I think you are missing the point. public, protected, private, const ... are all part of a system that is a huge plus for C++. Languages without it may have many merits but when you build large systems such things are a real asset.
Having said that: I think it's a good thing that it is possible to get around almost all the safety nets provided to you. As long as you remember that 'possible' does not mean 'good'. This is why it should never be 'easy'. But for the rest - it's up to you. You are the architect.
Years ago I could simply do this (and it may still work in certain environments):
#define private public
Very helpful for 'hostile' external header files. Good practice? What do you think? But sometimes your options are limited.
So yes, what you show is kind-of a breach in the system. But hey, what keeps you from deriving and hand out public references to the member? If horrible maintenance problems turn you on - by all means, why not?
Basically what you're doing is tricking the compiler, and this is supposed to work. I always see this kind of questions and people some times get bad results and some times it works, depending on how this converts to assembler code.
I remember seeing a case with a const keyword on a integer, but then with some trickery the guy was able to change the value and successfully circumvented the compiler's awareness. The result was: A wrong value for a simple mathematical operation. The reason is simple: Assembly in x86 does make a distinction between constants and variables, because some instructions do contain constants in their opcode. So, since the compiler believes it's a constant, it'll treat it as a constant and deal with it in an optimized way with the wrong CPU instruction, and baam, you have an error in the resulting number.
In other words: The compiler will try to enforce all the rules it can enforce, but you can probably eventually trick it, and you may or may not get wrong results based on what you're trying to do, so you better do such things only if you know what you're doing.
In your case, the pointer &Derived::value can be calculated from an object by how many bytes there are from the beginning of the class. This is basically how the compiler accesses it, so, the compiler:
Doesn't see any problem with permissions, because you're accessing value through derived at compile-time.
Can do it, because you're taking the offset in bytes in an object that has the same structure as derived (well, obviously, the base).
So, you're not violating any rules. You successfully circumvented the compilation rules. You shouldn't do it, exactly because of the reasons described in the links you attached, as it breaks OOP encapsulation, but, well, if you know what you're doing...

Definition of a class's private integral constant: in the header or in the cpp file?

Subject has been addressed mostly here (Where to declare/define class scope constants in C++?)
and in particular here.
What I would like to fully understand, in case of integral constants, is there any difference between:
//In the header
class A {
private:
static const int member = 0; //Declaration and definition
};
And:
//In the header
class A {
private:
static const int member; //Only declaration
};
//In the cpp
const int A::member = 0; //Definition
(I understand that the second might have the advantage that if I change the value of the constant, I have to recompile only one file)
Side questions:
What happens for example with an inline method defined in the header that access member? Will it simply be not inlined? What would happens if, going to one extreme, all methods were defined in the header file as inline methods and all constants were defined in the cpp file?
Edit:
My apologizes: I thought it was not necessary, but I missed the fact that the member is static. My question stays, but now the code is legal.
If, as it was before the question was changed to make it static, it's a non-static member, then it can only be initialised in the constructor's initialiser list or (since 2011) in the member's declaration. Your second example was ill-formed.
If it's static, then you need a definition if it's odr-used: roughly speaking, if you do anything that requires its address rather than just its value. If you only use the value, then the first example is fine. But note that the comment is wrong - it's just a declaration, not a definition.
If you do need a definition, then it's up to you whether you specify the value in the declaration or the definition. Specifying it in the declaration allows better scope for optimisation, since the value is always available when the variable is used. Specifying it in the definition gives better encapsulation, only requiring one translation unit to be recompiled if it changes.
What happens for example with an inline method defined in the header that access member? Will it simply be not inlined?
There's no reason why accessing a data object defined in another translation unit should prevent a function from being inlined.
There are two points of view to take into account, namely visibility and addressing.
Note that the two are orthogonal, for you can actually declare the variable as initialized and still define it in a translation unit so it has an effective address in memory.
Visibility
Visibility affects the usage of the variable, and has some technical impacts.
For usage in template code as a non-type template parameter, the value must be visible at the point of use. Also, in C++11, this might be necessary for constexpr usage. Otherwise, it is not necessary that the value be visible.
Technically a visible value can trigger optimizations from the compiler. For example if (A::member) is trivially false so the test can be elided. This is generally referred to as Constant Propagation. While this may seem a good thing, at first glance, there is a profound impact though: all clients of the header file potentially depends on this value, and thus any change to this value means they should be recompiled. If you deliver this header as part of a shared library, this means that changing this value breaks the ABI.
Addressing
The rule here is quite simple: if the variable can be addressed (either passed by pointer or reference), then it needs to reside somewhere in memory. This requires a definition in one translation unit.
This is the question of data hiding. Whether you want to unveil internal class fields or not. If you are shipping a classes library and want to hide the implementation details then it is better to show in the interface as few entities as possible, then even a declaration of the private field member is too much.
I would just declare this value as a static variable inside a .cpp file.

Inline Class Functions that have private data members

Now I have been learning about inline functions and I encountered something that really made me confused
See this class
class Nebla{
private:
int x;
public:
inline void set(int y){x=y;}
inline void print(){cout<<x<<endl;}
};
it has a private data member: int x;
And it has two public inline functions: set(int y) and print()
Now since they two functions are inline, when they are called the compiler replaces the function call with the contents of the function.
So if I do this
Nebla n;
n.set(1);
n.print();
since the two functions are inline, It should be the equivalent of this:
Nebla n;
n.x=1;
cout<<n.x<<endl;
but wait a second, x is private. Therefore, this shouldn't work.
But it does and I'm confused why it does work although normally you cant access private members from outside the class?
Can anyone explain to be why you can access private data members from outside the class but when a member function is inline it can although inline just replaces the function call with the contents of the function?
Data member protection is purely conceptual. It exists only at the compiler level. It is checked and enforced when the compiler translates the source code. Once the code is compiled, there's no difference between public and private data members anymore, i.e. there are no physical mechanisms that would enforce access control and prevent access to private data members.
Member access is enforced by the compiler in accordance with the language specification. The language specification states that class member functions (regardless of whether they are inline or not) have access to private members of the class. So the compiler allows that access. Meanwhile, other functions are prohibited such access, so the compiler complains about it.
In your example you are accessing private data member from a member function. That is allowed, so the code compiles, i.e. the compiler does not complain. What happens later in the generated machine code, after the function gets inlined, is completely irrelevant. That's all there is to it.
You misunderstand how inline works. The compiler inlines the logic of the code, not the actual text of the code.
Can anyone explain to be why you can access private data members from outside the class but when a member function is inline it can although inline just replaces the function call with the contents of the function?
Because the contents of the function are the contents of the function. They don't stop being the function just because they've been inlined. You are allowed to access private member variables from inside a member function. When a member function is inlined, its code is still inside the member function because the function is inlined.
First of all, whether or not it gets inlined is up to the compiler. In a lot of cases it will decide is not the best thing to do.
Second, in the case it does inline it, it does so with a compiled binary, product of the behavior described in the C++ source code, not the actual text.
Morbo says the inline keyword doesn't work that way.
Morbo says that the inline keyword says that symbol conflict at linker time involving this function should be ignored, and that all functions whose implementation is within the declaration of the class are implicitly inline.
Morbo is wise. You should listen to Morbo, even if there is a minor technical additional meaning of inline that involves taking addresses.
More seriously, inline just let's you put definitions of the implementation into a header file. Actually making the code inline is thus easier because it doesn't have to happen at link time (and most C++ linkers are too lazy) but it does not cause the code to be inline.
And finally privacy is conceptual, it is not enforced by the C++ run time. It is just enforced at compile time by telling you that something is out of bounds.

Why field inside a local class cannot be static?

void foo (int x)
{
struct A { static const int d = 0; }; // error
}
Other than the reference from standard, is there any motivation behind this to disallow static field inside an inner class ?
error: field `foo(int)::A::d' in local class cannot be static
Edit: However, static member functions are allowed. I have one use case for such scenario. Suppose I want foo() to be called only for PODs then I can implement it like,
template<typename T>
void foo (T x)
{
struct A { static const T d = 0; }; // many compilers allow double, float etc.
}
foo() should pass for PODs only (if static is allowed) and not for other data types. This is just one use case which comes to my mind.
Because, static members of a class need to be defined in global a scope, e.g.
foo.h
class A {
static int dude;
};
foo.cpp
int A::dude = 314;
Since the scope inside void foo(int x) is local to that function, there is no scope to define its static member[s].
Magnus Skog has given the real answer: a static data member is just a declaration; the object must be defined elsewhere, at namespace scope, and the class definition isn't visible at namespace scope.
Note that this restriction only applies to static data members. Which means that there is a simple work-around:
class Local
{
static int& static_i()
{
static int value;
return value;
}
};
This provides you with exactly the same functionality, at the cost of
using the function syntax to access it.
Because nobody saw any need for it ?
[edit]: static variables need be defined only once, generally outside of the class (except for built-ins). Allowing them within a local class would require designing a way to define them also. [/edit]
Any feature added to a language has a cost:
it must be implemented by the compiler
it must be maintained in the compiler (and may introduce bugs, even in other features)
it lives in the compiler (and thus may cause some slow down even when unused)
Sometimes, not implementing a feature is the right decision.
Local functions, and classes, add difficulty already to the language, for little gain: they can be avoided with static functions and unnamed namespaces.
Frankly, if I had to make the decision, I'd remove them entirely: they just clutter the grammar.
A single example: The Most Vexing Parse.
I think this is the same naming problem that has prevented us from using local types in template instantiations.
The name foo()::A::d is not a good name for the linker to resolve, so how should it find the definition of the static member? What if there is another struct A in function baz()?
Interesting question, but I have difficulty understanding why you'd want a static member in a local class. Statics are typically used to maintain state across program flow, but in this case wouldn't it be better to use a static variable whose scope was foo()?
If I had to guess why the restriction exists, I'd say it was something to do with the difficulty for the compiler in knowing when to perform the static initialisation. The C++ standards docs might provide a more formal justification.
Just because.
One annoying thing about C++ is that there's a strong dependence on a "global context" concept where everything must be uniquely named. Even the nested namespaces machinery is just string trickery.
I suppose (just a wild guess) that one serious technical issue is working with linkers that were designed for C and that just got some tweak to get them working with C++ (and C++ code needs C interoperability).
It would be nice to be able to get any C++ code and "wrap it" to be able to use it without conflicts in a larger project, but this is not the case because of linkage problems. I don't think there is any reasonable philosophical reason for forbidding statics or non-inline methods (or even nested functions) at the function level but this is what we got (for now).
Even the declaration/definition duality with all its annoying verbosity and implications is just about implementation problems (and to give the ability to sell usable object code without providing the source, something that is now a lot less popular for good reasons).

Inheritance Costs in C++

Taking the following snippet as an example:
struct Foo
{
typedef int type;
};
class Bar : private Foo
{
};
class Baz
{
};
As you can see, no virtual functions exist in this relationship. Since this is the case, are the the following assumptions accurate as far as the language is concerned?
No virtual function table will be created in Bar.
sizeof(Bar) == sizeof(Baz)
Basically, I'm trying to figure out if I'll be paying any sort of penalty for doing this. My initial testing (albeit on a single compiler) indicates that my assertions are valid, but I'm not sure if this is my compiler's optimizer or the language specification that's responsible for what I'm seeing.
According to the standard, Bar is not a POD (plain old data) type, because it has a base. As a result, the standard gives C++ compilers wide latitude with what they do with such a type.
However, very few compilers are going to do anything insane here. The one thing you probably have to look out for is the Empty Base Optimization. For various technical reasons, the C++ standard requires that any instance be allocated storage space. For some compilers, Foo will be allocated dedicated space in the bar class. Compilers which implement the Empty Base Optimization (most all in modern use) will remove the empty base, however.
If the given compiler does not implement EBO, then sizeof(foo) will be at least twice sizeof(baz).
Yeah, without any virtual members or member variables, there shouldn't be a size difference.
As far as I know the compiler will optimize this correctly, if any optimizing is needed at all.