Maximum number of fields for a C++ object

This answer states that in Java the maximum number of fields an object may have is 65536.
Is there any such limit imposed on an object in C++?

C++03 standard, Annex B (implementation quantities):
Because computers are finite, C++ implementations are inevitably limited in the size of the programs they can successfully process. Every implementation shall document those limitations where known. This documentation may cite fixed limits where they exist, say how to compute variable limits as a function of available resources, or say that fixed limits do not exist or are unknown. The limits may constrain quantities that include those described below or others. The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.
The list includes:
Size of an object [262 144].
Data members in a single class, structure, or union [16 384].
Members declared in a single class [4 096].
So there's no defined limit, but an implementation which applies a limit "should" make the limit at least as big as the value indicated. I'm afraid I don't know what common implementations actually do, but if they don't document it they're either not compliant, or else the limit is "unknown". I guess that "unknown" generally means, "as many as we can fit in the available memory at compile time".
Btw, I'm not sure what the difference is between "data members in a class" and "members declared in a class". I think it means that if your base class has 10 data members and your class declares 10 members, then your class has 20 (or 21) data members in total (depending on whether the base class sub-object counts as a data member or not), as sketched below.
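To illustrate the distinction as I understand it (the class names are mine, and the counting rule is my guess at the standard's intent):

struct Base10 {              // declares 10 data members
    int b0, b1, b2, b3, b4, b5, b6, b7, b8, b9;
};

struct Derived10 : Base10 {  // declares another 10
    int d0, d1, d2, d3, d4, d5, d6, d7, d8, d9;
};

// Derived10 "declares" 10 members, but an object of it contains 20 data
// members once those inherited from Base10 are counted, which is presumably
// what the "data members in a single class" quantity limits.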

I don't believe that there is anything in the C++ spec to cover this, but I suspect that different compilers will have different limits.

There is no hard limit on the number of fields an object can have, but I imagine this is highly platform- and compiler-dependent.
Also, there is probably something very wrong with the design of your class if it has even 100 or more fields, so you shouldn't have to worry about limits; worry about OOP design instead.

Related

Access to protected member through member-pointer: is it a hack?

We all know that members specified protected in a base class can only be accessed from a derived class's own instance. This is a feature of the Standard, and it has been discussed on Stack Overflow multiple times:
Cannot access protected member of another instance from derived type's scope
Why can't my object access protected members of another object defined in common base class?
And others.
But it seems possible to walk around this restriction with member pointers, as user chtz has shown me:
struct Base { protected: int value; };

struct Derived : Base
{
    void f(Base const& other)
    {
        //int n = other.value; // error: 'int Base::value' is protected within this context
        int n = other.*(&Derived::value); // ok??? why?
        (void) n;
    }
};
Why is this possible, is it a wanted feature or a glitch somewhere in the implementation or the wording of the Standard?
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
The fact that a member is not accessible using a class member access expression ([expr.ref], e.g. aclass.amember) due to access control ([class.access]) does not make this member inaccessible using other expressions.
The expression &Derived::value (whose type is int Base::*) is perfectly standard compliant, and it designates the member value of Base. Then the expression a_base.*p where p is a pointer to a member of Base and a_base an instance of Base is also standard compliant.
So any standard-compliant compiler shall treat the expression other.*(&Derived::value) as defined behavior: it accesses the member value of other.
is it a hack?
In a similar vein to using reinterpret_cast, this can be dangerous and may potentially be a source of hard-to-find bugs. But it's well formed and there's no doubt whether it should work.
To clarify the analogy: The behaviour of reinterpret_cast is also specified exactly in the standard and can be used without any UB. But reinterpret_cast circumvents the type system, and the type system is there for a reason. Similarly, this pointer to member trick is well formed according to the standard, but it circumvents the encapsulation of members, and that encapsulation (typically) exists for a reason (I say typically, since I suppose a programmer can use encapsulation frivolously).
[Is it] a glitch somewhere in the implementation or the wording of the Standard?
No, the implementation is correct. This is how the language has been specified to work.
A member function of Derived can obviously access &Derived::value, since value is a protected member of a base.
The result of that operation is a pointer to a member of Base, which can be applied to a reference to Base. Member access control does not apply to pointers to members: it applies only to the names of the members.
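To make the two steps explicit, here is a minimal self-contained sketch (the helper name read is mine):

#include <iostream>

struct Base { protected: int value = 42; };

struct Derived : Base
{
    static int read(Base const& other)
    {
        int Base::* p = &Derived::value; // ok: the name value is used inside Derived,
                                         // and the resulting type is int Base::*
        return other.*p;                 // ok: access control applies to names,
                                         // not to pointers to members
    }
};

int main()
{
    Base b;
    std::cout << Derived::read(b) << '\n'; // prints 42
}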
From comments emerged another question: if Derived::f is called with an actual Base, is it undefined behaviour?
Not UB. Base has the member.
Just to add to the answers and zoom in a bit on the horror I can read between your lines. If you see access specifiers as 'the law', policing you to keep you from doing 'bad things', I think you are missing the point. public, protected, private, const ... are all part of a system that is a huge plus for C++. Languages without it may have many merits but when you build large systems such things are a real asset.
Having said that: I think it's a good thing that it is possible to get around almost all the safety nets provided to you. As long as you remember that 'possible' does not mean 'good'. This is why it should never be 'easy'. But for the rest - it's up to you. You are the architect.
Years ago I could simply do this (and it may still work in certain environments):
#define private public
Very helpful for 'hostile' external header files. Good practice? What do you think? But sometimes your options are limited.
So yes, what you show is kind of a breach in the system. But hey, what keeps you from deriving and handing out public references to the member, as sketched below? If horrible maintenance problems turn you on - by all means, why not?
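A hedged sketch of that "deriving" route (the class name Open is mine): a derived class can simply re-publish the member, no member-pointer trick needed.

struct Base { protected: int value; };

struct Open : Base
{
    using Base::value; // value is now public when accessed through Open
};

// Open o; o.value = 1; // compiles: the using-declaration changed accessibility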
Basically, what you're doing is tricking the compiler, and this is supposed to work. I often see this kind of question; sometimes people get bad results and sometimes it works, depending on how the construct translates to assembly code.
I remember seeing a case with a const keyword on an integer, where with some trickery the guy was able to change the value and successfully circumvent the compiler's awareness. The result was a wrong value for a simple mathematical operation. The reason is simple: x86 assembly does distinguish between constants and variables, because some instructions contain constants in their opcode. So, since the compiler believes it's a constant, it treats it as a constant and deals with it in an optimized way, with the wrong CPU instruction, and bam, you have an error in the resulting number.
In other words: the compiler will try to enforce all the rules it can enforce, but you can probably eventually trick it, and you may or may not get wrong results depending on what you're trying to do, so you'd better do such things only if you know what you're doing.
In your case, the pointer &Derived::value can be computed as a byte offset from the beginning of the object. This is basically how the compiler accesses it, so the compiler:
Doesn't see any problem with permissions, because you're accessing value through Derived at compile time.
Can do it, because you're taking the offset in bytes in an object that has the same structure as Derived (well, obviously: the base).
So you're not violating any rules; you've successfully circumvented the compile-time checks. You shouldn't do it, exactly for the reasons described in the links you attached, as it breaks OOP encapsulation, but, well, if you know what you're doing...

Resolve (u)int_fastX_t at compile time

Implementations of the C++ standard typedef the (u)int_fastX_t types as one of their built-in types. This requires research into which type is the fastest, but there cannot be one fastest type for every case.
Wouldn't it increase performance to resolve such types at compile time to account for each case, choosing the optimal type for the actual use? The compiler would analyze the use of a _fast variable and then choose the optimal type. Factors coming into play could be alignment and the kinds of operations used with the variable.
This would effectively make those types a language feature.
This could introduce bugs when the compiler suddenly decides to choose another width for such a variable. But one shouldn't use a _fast type in use cases where the behaviour depends on the width anyway.
Is such compile-time resolution permitted by the standard?
If yes, why isn't it implemented as of today?
If no, why isn't it in the standard?
No, this is not permitted by the standard. Keep in mind the C++ standard defers to C for this particular area, for example, C++11 defers to C99, as per C++11 1.1 /2. Specifically, C++11 18.4.1 Header <cstdint> synopsis /2 states:
The header defines all functions, types, and macros the same as 7.18 in the C standard.
So let's get your first contention out of the way, you state:
Implementations of the C++ standard typedef the (u)int_fastX types as one of their built in types. This requires research in which type is the fastest, but there cannot be one fastest type for every case.
The C standard has this to say, in C99 7.18.1.3 Fastest minimum-width integer types (my italics):
Each of the following types designates an integer type that is usually fastest to operate with among all integer types that have at least the specified width.
The designated type is not guaranteed to be fastest for all purposes; if the implementation has no clear grounds for choosing one type over another, it will simply pick some integer type satisfying the signedness and width requirements.
So you're indeed correct that a type cannot be fastest for all possible uses but this seems to not be what the authors had in mind in defining these aspects.
The introduction of the fixed-width types was (in my opinion) to solve the problem all those developers had in having different int widths across the various implementations.
Similarly, once a developer knows the range of values they want, the fast minimum-width types give them a way to do arithmetic on those values at the maximum possible speed.
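For example (hedged, since the chosen widths are implementation-specific; the sizes in the comment are what one common x86-64 implementation picks):

#include <cstdint>
#include <cstdio>

int main()
{
    std::int16_t fixed = 0;     // exactly 16 bits, if the implementation provides it
    std::int_fast16_t fast = 0; // at least 16 bits, whatever is deemed fastest

    std::printf("int16_t: %zu bytes, int_fast16_t: %zu bytes\n",
                sizeof fixed, sizeof fast); // e.g. "2 bytes, 8 bytes" on x86-64 glibc
}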
Covering your three specific questions in your final paragraph (in bold below):
(1) Is such compile time resolution permitted by the standard?
I don't believe so. The relevant part of the C standard has this little piece of text:
For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros.
That seems to indicate that it must be a typedef provided by the implementation and, since there are no "variable" typedefs, it has to be fixed.
There may be wiggle room because it could be possible to provide a different typedef depending on certain environmental considerations but the difficulty in actually implementing this seems very high (see my answer to your third question below).
Chief amongst these is that these adaptable types, should they have external linkage, would require agreement amongst all the compiled translation units when linked together. Having one unit with a 16-bit type and another with a 32-bit type is going to cause all sorts of problems.
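A hedged, single-file illustration of that problem (the alias names are hypothetical, standing in for what two differently-compiled translation units might each believe int_fast16_t to be):

#include <cstdint>
#include <cstdio>

using fast16_in_tu1 = std::int16_t; // the width one translation unit might pick
using fast16_in_tu2 = std::int64_t; // the width another might pick

int main()
{
    // The "same" array of 8 fast integers would have different sizes and
    // layouts in the two units; passing one across the boundary corrupts memory.
    std::printf("TU1 thinks 8 elements take %zu bytes\n", 8 * sizeof(fast16_in_tu1));
    std::printf("TU2 thinks 8 elements take %zu bytes\n", 8 * sizeof(fast16_in_tu2));
}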
(2) If yes, why isn't it implemented as of today?
I'm pushing "no" as an answer to your first question so I'm not going to speculate on this other than by referring you to the answer to the third question below (it's probably not implemented because it's very hard, with dubious benefits).
(3) If no, why isn't it in the standard?
A standard is a contract between the implementor and the user and describes what the implementor will provide. It's usual that the standards committees tend to be more populated by the former (who aren't that keen on making too much extra work for themselves) than the latter.
For example, I would love to have all the you-beaut C++ data structures in C but this would have the consequence that standards versions would be decades apart rather than years :-)

What does "their representation is part of their definition" mean for C++ concrete types?

In both of his books
The C++ Programming Language, 2013 (4th edition) and
A Tour of C++, 2013
Bjarne Stroustrup writes:
Types such as complex ... are called concrete types because
their representation is part of their definition.
What follows to some extent clarifies the above statement:
In that, they resemble built-in types. In contrast, an abstract type
is a type that completely insulates a user from implementation
details. To do that, we decouple the interface from the
representation and give up genuine local variables. Since we don’t
know anything about the representation of an abstract type (not even
its size), we must allocate objects on the free store and access them
through references or pointers.
Questions
In the phrase "...their representation is part of their definition."
What is the meaning of type representation? That is, the representation of what exactly: The object layout in memory? The private and public data that the type holds? Or something else?
What is the meaning of type definition?
Are these typical meanings of type representation and definition as related to C++?
I decided to do some more research and I checked other sources. First I looked through ISO/IEC 14882:2011 specifications that state requirements for implementations of the C++ programming language, then through other sources.
Ad question 1
I was not able to find in the ISO specs anything like "type representation" or "representation of a type". Instead, there are two terms related to objects:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
So it seems to me that the term type representation does not have any conventional meaning within the ISO standards.
Ok. Maybe it is something outside the ISO standards? Let's see what
Linux Standard Base C++ Specification 3.1 > Chapter 7. C++ Class Representations > 7.1. C++ Data Representation says:
An object file generated by the compilation process for a C++ program shall contain several closely related internal objects, or Class Components, to represent each C++ Class. Such objects are not a visible part of the source code. The following table describes these Class Components at a high level.
Table: Class Components

Object                        Contains
Class Data                    Class members
Virtual Table                 Information needed to dispatch virtual functions, access virtual base class subobjects, and access the RTTI information
RTTI                          Run-Time Type Information used by the typeid and dynamic_cast operators, and exception handlers
Typeinfo Name                 String representation of the class name
Construction Virtual Table    Information needed during construction and destruction of classes with non-trivial inheritance relationships
VTT                           A table of virtual table pointers which holds the addresses of construction and non-construction virtual tables
Ad question 2
I was again not able to find in ISO specs an explicit explanation of type definition.
Instead I found the following:
A declaration may introduce one or more names into a translation
unit... A class declaration introduces the class name into the
scope where it is declared...A declaration is a definition unless
[I removed things not directly related to the class declaration], ...
it is a class name declaration...
Here is a Microsoft interpretation of the same thing:
C++ Declarations - MSDN - Microsoft
A declaration introduces
one or more names into a program. Declarations can occur more than
once in a program...Declarations also serve as definitions, except
when the declaration:...;Is a class name declaration with no
following definition, such as class T;...
and
C++ Definitions - MSDN - Microsoft
A definition is a unique
specification of an object or variable, function, class, or
enumerator. Because definitions must be unique, a program can contain
only one definition for a given program element. There can be a
many-to-one correspondence between declarations and definitions.
There are two cases in which a program element can be declared and not defined: A function is declared but never referenced with a
function call or with an expression that takes the function's address.
A class is used only in a way that does not require its definition be
known.
Examples:
struct S;           // declares, but does not define, S
class T {};         // declares and defines T
class P { int a; }; // declares and defines P and P::a
Conclusions:
Candidate Answer N1:
proposed by Jonathan Wakely
(below is my understanding)
The phrase "Types such as complex ... are called concrete types because their representation is part of their definition" should be interpreted and understood in the following way:
● their (= type) definition is a technical C++ term whose meaning is conventional and can be found in the C++ specs;
● their (= type) representation is (according to Jonathan Wakely) not a technical C++ term in this context, but its meaning can easily be figured out by anybody who understands English well enough (and probably, this is my guess, has previously been exposed to a generous amount of C++ code and text). Type representation in this context means
"the properties that define what the type is and what it does", that is:
"for a concrete type: the type and layout of its members",
"for an abstract type: its member functions and their observable behavior".
● The whole phrase then (we are talking about concrete classes) translates to:
"Types such as complex ... are called concrete types because the types and layouts of their members are part of their definition."
I think this interpretation makes sense, is understandable, and also agrees well with what follows it in the BS books.
Please correct me if something here is not ok.
QUESTIONS: In the phrase "...their representation is part of their definition": 1) What is the meaning of type representation? (That is, the representation of WHAT exactly: the object layout in memory, or the private and public data that the type holds, or something else?) 2) What is the meaning of type definition? 3) Are these typical meanings of type representation and definition as related to C++?
You're asking for the meaning of terms that Stroustrup doesn't use in the text you quoted!
He's not trying to define a formal specification of a term like "type representation" the way the C++ standard does, he's writing prose that is more informal. All the references to technical terms that you've dug up are misleading and not directly relevant.
(that is, the representation of WHAT exactly: object layout in memory or private and public data that the type holds OR something else)
Yes, both the things you mention. For a concrete type the properties that define what it is and what it does include the type and layout of its members. i.e. how it is represented in the source code.
For an abstract class, the properties that define what it is and what it does are its member functions and their observable behaviour. The details of how it produces that observable behaviour are not necessarily important, and sometimes aren't even visible in the source code because you actually use some concrete class defined in another piece of code and only use it through an abstract interface.
Edit: Judging from the comments you wrote below you apparently missed that I tried to give you an answer. What I wrote above refers to the properties that define what a type is and what it does. That is a "definition of a type".
If you had to write documentation for a C++ type for users, how would you define it?
For a concrete type you might describe the types of its members and so define some of its properties in terms of the properties of its members. e.g. "A std::complex<float> stores two float members, which represent the real and imaginary parts of the complex number." This tells you that std::complex<float> can only store complex numbers with the same precision as float, i.e. its precision is determined by the fact it is represented using two float members.
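A quick check of that description (hedged: the sizeof relation below is all but forced by C++11's array-compatibility requirement for std::complex, and it holds on every implementation I'm aware of):

#include <complex>

static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
              "std::complex<float> is represented as two floats");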
For an abstract class you would describe the behaviour of its member functions, which are likely to be virtual, so you describe it in terms of the interface it follows, not in terms of the details of its implementation.
But they are not formal terms; I think you are wrong to treat them as strict technical terms. He's just using the words with their usual English meaning.
You go looking for a vegetable for dinner tonight. Wait... a vegetable? The word vegetable surely defines something, but it carries no representation. Someone will certainly ask you which vegetable. So a vegetable is an abstract concept.
So now you order some potatoes and onions. Well, they define some properties and represent themselves well enough that you can locate them in the store. Potatoes and onions make up a concrete representation of a type, with well-defined properties and behavior.
Try writing two classes following this analogy. You may connect to what is meant by "representation is part of their definition".
I stumbled over the same passage in the text, and it took me a while, but I believe I deduced from the text what is meant by representation and definition of a class.
Answer to question 1: The representation of a type is its data members: the members of the type that store the information/state, as opposed to the methods/operations on them.
Answer to question 2: The definition is simply the code implementing the class. (like the definition of Vector below).
Rationale: See section 2.3.2 of the same book and pay close attention on the use of the word 'representation':
Having the data specified separately from the operations on it has advantages, such as the ability to use the data in arbitrary ways. However, a tighter connection between the representation and the operations is needed for a user-defined type to have all the properties expected of a "real type."
It seems that "representation" here has now replaced "data".
Here, the representation of a Vector (the members elem and sz) [...]
elem and sz are precisely the data members of the Vector class defined in that section:
class Vector {
public:
    Vector(int s) :elem{new double[s]}, sz{s} {} // construct a Vector
    double& operator[](int i) { return elem[i]; } // element access: subscripting
    int size() { return sz; }
private:
    double* elem; // pointer to the elements
    int sz;       // the number of elements
};
Further Explanation:
For a concrete type, it is possible from the definition to tell how much memory must be allocated for the data members of an object of this type. When you declare a variable to be of that type somewhere in the source code of your program, the compiler will know its size in memory.
In the case of the class Vector defined above, the memory required for the data members of an instance of that class would be whatever memory is needed for an integer sz and a pointer to a double elem.
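Reusing the book's Vector from just above, a hedged sketch of what "the compiler knows its size" buys you (the exact size is ABI-specific because of padding):

#include <cstdio>

int main()
{
    Vector v(10); // a genuine local variable: no free store, no indirection needed
    static_assert(sizeof(Vector) >= sizeof(double*) + sizeof(int),
                  "the representation is the two data members, plus any padding");
    std::printf("sizeof(Vector) = %zu\n", sizeof(Vector)); // e.g. 16 on a typical 64-bit ABI
}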
An abstract type on the other hand, may not specify data members in its definition, so that the memory required for an object of such a type would be unknown.
For more on abstract types see section 3.2.2 of the same book and note that the abstract class Container defined in that section has no data members (further supporting my answer to question 1).
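For reference, a sketch along the lines of that section's Container (quoted from memory, so take the exact signatures as approximate):

class Container {
public:
    virtual double& operator[](int) = 0; // pure virtual: representation left entirely
    virtual int size() const = 0;        // to the derived, concrete containers
    virtual ~Container() {}
};
// No data members: nothing in this definition says how elements are stored,
// so the memory an implementation needs cannot be deduced from it.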
With this understanding in mind, some of the exposition that follows the sentence in question where the words definition and representation are used makes sense.
I'll paraphrase:
since the representation is part of the definition of a concrete type, we can place an object of such a type on the stack, in statically allocated memory, and in other objects and we can refer to such objects directly and without the use of pointers or references, etc..
If we didn't know the size of the data members of an object, we wouldn't be able to do these things.
In response to question 3:
I do not know the answer to question number 3, but I believe that, as stated in previous answers, the terminology used here is informal and shouldn't be viewed as some sort of standard. This fits the spirit of the part of the book it appears in, which only gives a brief, high-level, informal overview of C++, assumes no previous knowledge, and thus avoids jargon.

Do the C++ standards guarantee that unused private fields will influence sizeof?

Consider the following struct:
class Foo {
    int a;
};
Testing in g++, I get that sizeof(Foo) == 4 but is that guaranteed by the standard? Would a compiler be allowed to notice that a is an unused private field and remove it from the in-memory representation of the class (leading to a smaller sizeof)?
I don't expect any compilers to actually do that kind of optimization but this question popped up in a language lawyering discussion so now I'm curious.
The C++ standard doesn't define a lot about memory layouts. The fundamental rule for this case is item 4 under section 9 Classes:
4 Complete objects and member subobjects of class type shall have nonzero size. [ Note: Class objects can be assigned, passed as arguments to functions, and returned by functions (except objects of classes for which copying or moving has been restricted; see 12.8). Other plausible operators, such as equality comparison, can be defined by the user; see 13.5. — end note ]
Now there is one more restriction, though: Standard-layout classes. (no static elements, no virtuals, same visibility for all members) Section 9.2 Class members requires layout compatibility between different classes for standard-layout classes. This prevents elimination of members from such classes.
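A hedged illustration of that constraint (the class names are mine; I believe the size equality below follows from layout compatibility, and it certainly holds on every implementation I know of):

// Two standard-layout classes with the same member list are layout-compatible,
// so neither may silently drop its (possibly unused) member:
struct S1 { int a; };
class  S2 { public: int a; };

static_assert(sizeof(S1) == sizeof(S2),
              "layout-compatible standard-layout types match in size");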
For non-trivial non-standard-layout classes I see no further restriction in the standard. The exact behavior of sizeof(), reinterpret_cast(), etc. is implementation-defined (e.g. 5.2.10: "The mapping function is implementation-defined.").
The answer is yes and no. A compiler could not exhibit exactly that behaviour within the standard, but it could do so partly.
There is no reason at all why a compiler could not optimise away the storage for the struct if that storage is never referenced. If the compiler gets its analysis right, then no program that you could write would ever be able to tell whether the storage exists or not.
However, the compiler cannot report a smaller sizeof() thereby. The standard is pretty clear that objects have to be big enough to hold the bits and bytes they contain (see for example 3.9/4 in N3797), and to report a sizeof smaller than that required to hold an int would be wrong.
At N3797 5.3.2:
The sizeof operator yields the number of bytes in the object
representation of its operand
I do not see that this 'representation' can change according to whether the struct or member is referenced.
As another way of looking at it:
#include <cassert>

struct A {
    int i;
};

struct B {
    int i;
};

int main()
{
    A a;
    a.i = 0; // A is used, B never is
    assert(sizeof(A) == sizeof(B));
}
I do not see that this assert can be allowed to fail in a standards-conforming implementation.
If you look at templates, you'll notice that such "optimization" often ends up producing nearly nothing in the output, even though the template files may be thousands of lines...
I think the optimization you are talking about will nearly always occur in a function when the object is used on the stack, the object doesn't get copied or passed down to another function, and the private field is never accessed (not even initialized... which could be viewed as a bug!).

What is the size limit for a class?

I was wondering what the size limit for a class is. I did a simple test:
#define CLS(name, other) \
class name               \
{                        \
public:                  \
    name() {};           \
    other a;             \
    other b;             \
    other c;             \
    other d;             \
    other e;             \
    other f;             \
    other g;             \
    other h;             \
    other i;             \
    other j;             \
    other k;             \
};

class A {
    int k;
public:
    A() {};
};

CLS(B, A);
CLS(C, B);
CLS(D, C);
CLS(E, D);
CLS(F, E);
CLS(G, F);
CLS(H, G);
CLS(I, H);
CLS(J, I);
It fails to compile with
"'J' : class is too large"
If I remove the final declaration - CLS(J,I);, it all compiles fine.
Is this a compiler-imposed restriction, or is it somewhere in the standard?
In C++11 this is Annex B. Implementations can impose limits, but they should be at least:
Size of an object [262 144].
Data members in a single class [16 384].
Members declared in a single class [4 096].
The third one isn't directly relevant to the kind of construction you're using; I mention it just because it indicates that the second one is indeed the total number of data members, presumably including those in bases (I'm not sure about members-of-members). So it's not just about the members listed in a single class definition.
Your implementation appears to have given up either at 2^31 data members or at size 2^32, since it accepts I but not J. It's fairly obviously reasonable for a compiler to refuse to consider classes with size greater than SIZE_MAX, even if the program happens not to instantiate the class or use sizeof on the type. So even with the best possible effort on the part of the compiler, I wouldn't ever expect this to work on a 32-bit implementation.
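The arithmetic behind that estimate, as a hedged sketch (it assumes sizeof(int) == 4): each CLS level has 11 members of the previous type, so the object size grows by a factor of 11 per level.

#include <cstdio>

int main()
{
    unsigned long long size = 4; // sizeof(A): one int, assumed to be 4 bytes
    for (char c = 'B'; c <= 'J'; ++c) {
        size *= 11; // 11 members of the previous class
        std::printf("sizeof(%c) ~ %llu bytes\n", c, size);
    }
    // sizeof(I) ~   857,435,524 bytes: fits below 2^32
    // sizeof(J) ~ 9,431,790,764 bytes: exceeds 2^32, so the compiler gives up
}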
Note that "these quantities are only guidelines and do not determine compliance", so a conforming implementation can impose an arbitrarily smaller limit even where it has sufficient resources to compile a program that uses larger numbers. There's no minimum limit for conformance.
There are various opportunities in the C++ standard for a conforming implementation to be useless due to ridiculously small resource limits, so there's no additional harm done if this is another one.
C++03 is more-or-less the same:
Size of an object [262 144].
Data members in a single class, structure, or union [16 384].
Members declared in a single class [4 096].
I want to mention another place where a class size limit comes up, namely section 1.2 of the Itanium C++ ABI draft:
Various representations specified by this ABI impose limitations on
conforming user programs. These include, for the 64-bit Itanium ABI:
The offset of a non-virtual base subobject in the full object
containing it must be representable by a 56-bit signed integer (due to
RTTI implementation). This implies a practical limit of 2**55 bytes on
the size of a class.
I'm sure it's compiler-dependent. You can run your compiler in a preprocess-only mode to see what the generated output is, if you're curious. You might also want to look at template expansion rather than macros.