their representation is part of their definition as related to C++ concrete types? - c++

In both of his books
The C++ Programming Language, 2013 (4th edition) and
A Tour of C++, 2013
Bjarne Stroustrup writes:
Types such as complex ... are called concrete types because
their representation is part of their definition.
What follows to some extent clarifies the above statement:
In that, they resemble built-in types. In contrast, an abstract type
is a type that completely insulates a user from implementation
details. To do that, we decouple the interface from the
representation and give up genuine local variables. Since we don’t
know anything about the representation of an abstract type (not even
its size), we must allocate objects on the free store and access them
through references or pointers.
Questions
In the phrase "...their representation is part of their definition."
What is the meaning of type representation? That is, the representation of what exactly: The object layout in memory? The private and public data that the type holds? Or something else?
What is the meaning of type definition?
Are these typical meanings of type representation and definition as related to C++?
I decided to do some more research and I checked other sources. First I looked through ISO/IEC 14882:2011 specifications that state requirements for implementations of the C++ programming language, then through other sources.
Ad question 1
I was not able to find in ISO specs anything like "type representation" or "representation of a type". Instead there are 2 terms related to objects:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
So it seems to me that the term type representation does not have any conventional meaning within the ISO standards.
Ok. Maybe it is something outside the ISO standards? Let's see what
Linux Standard Base C++ Specification 3.1 > Chapter 7. C++ Class Representations > 7.1. C++ Data Representation says:
An object file generated by the compilation process for a C++ program shall contain several closely related internal objects, or Class Components, to represent each C++ Class. Such objects are not a visible part of the source code. The following table describes these Class Components at a high level.
Table Class Components
Object.......................Contains
=----------------------------------------=
Class Data...................Class members
Virtual Table................Information needed to dispatch virtual functions,
access virtual base class subobjects and to access
the RTTI information
RTTI.........................Run-Time Type Information used by the typeid and
dynamic_cast operators, and exception handlers
Typeinfo Name................String representation of Class name
Construction Virtual Table...Information needed during construction and
destruction of Classes with non-trivial
inheritance relationships.
VTT..........................A table of virtual table pointers which holds the
addresses of construction and non-construction
virtual tables.
Ad question 2
I was again not able to find in ISO specs an explicit explanation of type definition.
Instead I found the following:
A declaration may introduce one or more names into a translation
unit... A class declaration introduces the class name into the
scope where it is declared...A declaration is a definition unless
[I removed things not directly related to the class declaration], ...
it is a class name declaration...
Here is a Microsoft interpretation of the same thing:
C++ Declarations - MSDN - Microsoft
A declaration introduces
one or more names into a program. Declarations can occur more than
once in a program...Declarations also serve as definitions, except
when the declaration:...;Is a class name declaration with no
following definition, such as class T;...
and
C++ Definitions - MSDN - Microsoft
A definition is a unique
specification of an object or variable, function, class, or
enumerator. Because definitions must be unique, a program can contain
only one definition for a given program element. There can be a
many-to-one correspondence between declarations and definitions.
There are two cases in which a program element can be declared and not defined: A function is declared but never referenced with a
function call or with an expression that takes the function's address.
A class is used only in a way that does not require its definition be
known.
Examples:
struct S; // declares, but not defines S
class T {}; // declares, and defines T
class P { int a;}; // declares, and defines P, P::a
Conclusions:
Candidate Answer N1:
proposed by Jonathan Wakely
(below is my understanding)
The phrase "Types such as complex ... are called concrete types because their representation is part of their definition" should be interpreted and understood in the following way:
● their(=type) definition is a technical c++ term whose meaning is conventional and can be found in c++ specs;
● their(=type) representation is (according to Jonathan Wakely) not a technical c++ term in this context, but its meaning can be easily figured out by anybody who understands English language well enough (and probably, it is my guess, has been previously exposed to the generous amount of c++ codes and texts). Type representation in this context means
"the properties that define what the type is and what it does", that is:
"for a concrete type: the type and layout of its members",
"for an abstract type: its member functions and their observable behavior"
● The whole phrase then (we are talking about the concrete classes) translates to:
"Types such as complex ... are called concrete types because the types and layouts of their members are part of their definition"
I think this interpretation makes sense, is understandable, and also agrees well with what follows it in the BS books.
Please correct me if something here is not ok**

QUESTIONS: in the phrase "...their representation is part of their definition." 1) What is the meaning of type representation? (that is, the representation of WHAT exactly: object layout in memory or private and public data that the type holds OR something else) 2) What is the meaning of type definition? 3) Are these typical meanings of type representation and definition as related to c++?
You're asking for the meaning of terms that Stroustrup doesn't use in the text you quoted!
He's not trying to define a formal specification of a term like "type representation" the way the C++ standard does, he's writing prose that is more informal. All the references to technical terms that you've dug up are misleading and not directly relevant.
(that is, the representation of WHAT exactly: object layout in memory or private and public data that the type holds OR something else)
Yes, both the things you mention. For a concrete type the properties that define what it is and what it does include the type and layout of its members. i.e. how it is represented in the source code.
For an abstract class, the properties that define what it is and what it does are its member functions and their observable behaviour. The details of how it produces that observable behaviour are not necessarily important, and sometimes aren't even visible in the source code because you actually use some concrete class defined in another piece of code and only use it through an abstract interface.
Edit: Judging from the comments you wrote below you apparently missed that I tried to give you an answer. What I wrote above refers to the properties that define what a type is and what it does. That is a "definition of a type".
If you had to write documentation for a C++ type for users, how would you define it?
For a concrete type you might describe the types of its members and so define some of its properties in terms of the properties of its members. e.g. "A std::complex<float> stores two float members, which represent the real and imaginary parts of the complex number." This tells you that std::complex<float> can only store complex numbers with the same precision as float, i.e. its precision is determined by the fact it is represented using two float members.
For an abstract class you would describe the behaviour of its member functions, which are likely to be virtual, so you describe it in terms of the interface it follows, not in terms of the details of its implementation.
But they are not formal terms, I think you are wrong to treat them as strict technical terms. He's just using the words with their usual English meaning.

You go looking out for a vegetable in dinner tonight. Wait.. a vegetable? The word vegetable defines something for sure but it carries no representation. Someone will surely ask you which vegetable. So a vegetable is an abstract concept.
So now you order some potatoes and onions. Well, they define some properties and represent themselves well enough so that you can locate them in the store. Potatoes and onions make up for concrete representation of a type with a well defined property and behavior.
Try writing two classes following this analogy. You may connect to what is meant by representation is part of their definition.

I stumbled over the same passage in the text, and it took me a while, but I believe I deduced from the text what is meant by representation and definition of a class.
Answer to question 1: The representation of a type are the data members. Those are the members of the type which store the information/state, as opposed to the methods/operations on them.
Answer to question 2: The definition is simply the code implementing the class. (like the definition of Vector below).
Rationale: See section 2.3.2 of the same book and pay close attention on the use of the word 'representation':
Having the data specified separately from the operations on it has advantages, such as the ability to use the data in arbitrary ways. However, a tighter connection between the representation and the operations is needed for a user-defined type to have all the properties expected of a "real type."
It seems that "representation" here now replaced "data".
Here, the representation of a Vector (the members elem and sz) [...]
elem and sz are precisely the data members of the Vector class defined in that section:
class Vector {
public:
Vector(int s) :elem{new double[s]}, sz{s} {} // construct a Vector
double& operator[](int i) { return elem[i]; } // element access: subscripting
int size() { return sz; }
private:
double* elem; // pointer to the elements
int sz; // the number of elements
};
Further Explanation:
For a concrete type, it is possible from the definition to tell how much memory must be allocated for the data members of an object of this type. When you declare a variable to be of that type somewhere in the source code of your program, the compiler will know its size in memory.
In the case of the class Vector defined above, the memory required for the data members of an instance of that class would be whatever memory is needed for an integer sz and a pointer to a double elem.
An abstract type on the other hand, may not specify data members in its definition, so that the memory required for an object of such a type would be unknown.
For more on abstract types see section 3.2.2 of the same book and note that the abstract class Container defined in that section has no data members (further supporting my answer to question 1).
With this understanding in mind, some of the exposition that follows the sentence in question where the words definition and representation are used makes sense.
I'll paraphrase:
since the representation is part of the definition of a concrete type, we can place an object of such a type on the stack, in statically allocated memory, and in other objects and we can refer to such objects directly and without the use of pointers or references, etc..
If we didn't know the size of the data members of an object, we wouldn't be able to do these things.
In response to question 3:
I do not know the answer to question number 3, but believe that as stated in previous answers, the terminology used here is informal and shouldn't be viewed as some sort of standard. This goes with the spirit of the part of the book this is written in which is only giving a brief high-level informal overview over C++ not assuming previous knowledge and thus avoiding jargon.

Related

Rationale for non-virtual derived class not being pointer-interconvertible with its first base

There are a few questions and answers on the site already concerning pointer-interconvertibility of structs and their first member variable, as well as structs and their first public base. This question is one of them, for example.
However, what I'm interested in is not the fact that it's undefined behavior to reinterpret_cast (or static_cast through a void *) between a non-standard-layout struct and its public base, but rather the reasoning why the C++ standard currently forbids such casts. The existing questions and answers don't cover this aspect.
Consider the following example in particular (Godbolt):
#include <type_traits>
struct Base {
int m_base_var = 1;
};
struct Derived: public Base {
int m_derived_var = 2;
};
Derived g_derived;
constexpr Derived *g_pDerived = &g_derived;
constexpr Base *g_pBase = &g_derived;
constexpr void *g_pvDerived = &g_derived;
//These assertions all hold
static_assert(!std::is_pointer_interconvertible_base_of_v<Base, Derived>);
static_assert((void *)g_pDerived == (void *)g_pBase);
static_assert((void *)g_pDerived == g_pvDerived);
static_assert((void *)g_pBase == g_pvDerived);
//This is well-defined and returns &g_derived
Derived * getDerived() {
return static_cast<Derived *>(g_pvDerived);
}
//This is also well-defined; outer static_cast added to illustrate the sequence of conversions
Base * getBase() {
return static_cast<Base *>(static_cast<Derived *>(g_pvDerived));
}
//This is UB due to the first static_assert!
Base * getBaseUB() {
return static_cast<Base *>(g_pvDerived);
}
As you can see from the Godbolt link, all three functions compile to the exact same assembly on x86-64 GCC. However, the standard forbids the third variant since Base is not a pointer-interconvertible base of Derived.
My question is: Is there an obvious reason why the standard forbids this kind of cast? In particular, on all implementations that I know of, the value of a pointer to the Base subobject is the same as that of the pointer to the whole Derived, and I don't see a particular reason why Derived should not be considered standard-layout anymore. (In other words, Base lives at offset zero within Derived.) Would it be legal for a C++ implementation to place Base at a non-zero offset within Derived? (Is there maybe an implementation already that does this?)
Note that this question is only about cases without virtual member functions / virtual inheritance / multiple inheritance.
This is really two question:
What is standard layout, like really?
Why is pointer-interconvertibility linked to standard layout?
Standard layout was constructed out of one half of the pre-C++11 concept of "plain old data" types. The other half is trivial copyability (ie: memcpying an instance of the object is just as good as a copy constructor). These two halves didn't really interact, but POD required both.
From a purely standard C++ perspective, standard layout is a requirement for the ability to construct a type whose layout matches an existing type, such that if you shove both of those types into a union, you can access subobjects of the non-active members. This is the core functionality that standard layout enabled within the language since it was invented in C++11.
This is why standard layout only allows one member in the class hierarchy to have non-static data members. If only one class has NSDMs, then there's no question about the ordering between NSDMs of different base classes and so forth. And being able to know a priori what that ordering is is vital for being able to know that two types match.
This is also useful for communicating across languages.
Once standard layout was defined however, it started getting used for things that were... less clearly part of its domain.
For example, standard layout became the determinator for whether offsetof was valid behavior. This is due to offsetof oroginally being based on being a POD type, so when the layout part was spun off, offsetof was updated to use that. However, this was suboptimal, as the only layout-based thing that would break offsetof is having virtual base classes (the offset of members of a virtual base class depends on the most-derived class, which depends on the runtime type of the object). Now, the new restriction was still was better than POD, but it could have been expanded to include more stuff. But that would mean coming up with a new definition.
Something similar likely goes with pointer-interconvertibility. This concept was invented in C++17 to resolve various issues with the object model. There is no evidence in the papers explaining why they picked standard layout to hang pointer-interconvertibility on. But it was an existing tool with well-defined rules that already had well-defined rules on what the "first subobject" was for any given type.
Expanding the rules like you wants requires creating a new definition for "first suboject".
Is a base class pointer-interconvertible to a particular derived class? Well, that depends on what other classes that derived class inherits from. Is the first NSDM pointer-interconvertible to the class it is a member of? That depends on what other classes are involved in the inheritance diagram.
These dependencies already exist, but they all key off of a specific, pre-existing rule. What you want requires creating a new rule that's more complex. It will have to copy 90% of the existing standard-layout rules (forbidding virtual, public/private members, etc) and then add its own rules. The first NSDM is pointer-interconvertible unless any base classes are non-empty. Any particular base class is pointer-interconvertible only so long as all previous base classes in declaration order are empty of NSDMs.
It's so much easier to just piggyback off of the standard layout rules and say "a standard-layout type is pointer-interconvertible with its first NSDM and all of its base classes."
Having an extra rule also imposes some burden on the specification. It's a partial redundancy, and that breeds errors. For example, C++23 is on track to expand standard layout types by removing the forbidding of mixing public and private members, forcing layout to be ordered strictly by declaration. If pointer-interconvertibility had its own rules, it would have been possible to update standard-layout but not pointer-interconvertibility.
Both C and C++ were defined by tradition before standards were written. When the first standards were written, it was more important to avoid requiring that any implementations break programs that existed for them, than to forbid implementations that would processing various programs from changing in such a fashion as to break them.
Although it would be typical for a derived class object to store all of the parent data in a sequence of consecutive bytes, followed by the remainder of the data for the object, it could sometimes be useful for implementations to deviate from that. If a base class had a char field, a derived class had an int, and sub-derived class had another five char fields, an implementation that nestles one of the sub-derived class fields between the base-class field and first-derived class might be able to store data more compactly than one which places everything from each layer in a single sequence of consecutive bytes.
Note that the C++ Standard expressly waives jurisdiction over what C++ programs should be viewed as "conforming". As such, there was no perceived need to ensure that it didn't classify as UB any programs which most implementations should process usefully. If the authors had foreseen the way compilers would treat the Committee's decisions to waive jurisdiction over various constructs as an invitation to process them nonsensically, they probably would have defined behavior in many more corner cases than they actually did.

How are primitve/fundamental datatypes in C++ structured?

As the title says I want to know how primitive/fundamental datatypes in C++ are structured? If I remember right basically when programming I have always treated them like "class objects". So I ask myself if they are structured the same way like for example
class int
{
//content of int
};
After some basic research I don't think primitve datatypes are structured like this. Still I need to know how they are structured.
C++ language has grammar for specifying classes, and the definition of a class can be shown by writing it in C++:
class class_type {}; // an example of a class
Unlike class types, fundamental types aren't and cannot be defined using the C++ language and as such, it is not possible to show their definition using C++.
The standard specifies the behaviour of fundamental types. Objects of fundamental types occupy some memory storage, and the state of the memory represents some value. Exactly what memory representation has any particular value is not specified by the standard and can be different across separate systems.

Do the C++ standards guarantee that unused private fields will influence sizeof?

Consider the following struct:
class Foo {
int a;
};
Testing in g++, I get that sizeof(Foo) == 4 but is that guaranteed by the standard? Would a compiler be allowed to notice that a is an unused private field and remove it from the in-memory representation of the class (leading to a smaller sizeof)?
I don't expect any compilers to actually do that kind of optimization but this question popped up in a language lawyering discussion so now I'm curious.
The C++ standard doesn't define a lot about memory layouts. The fundamental rule for this case is item 4 under section 9 Classes:
4 Complete objects and member subobjects of class type shall have nonzero size. [ Note: Class objects can be assigned, passed as arguments to functions, and returned by functions (except objects of classes for which copying or moving has been restricted; see 12.8). Other plausible operators, such as equality comparison, can be defined by the user; see 13.5. — end note ]
Now there is one more restriction, though: Standard-layout classes. (no static elements, no virtuals, same visibility for all members) Section 9.2 Class members requires layout compatibility between different classes for standard-layout classes. This prevents elimination of members from such classes.
For non-trivial non-standard-layout classes I see no further restriction in the standard. The exact behavior of sizeof(), reinterpret_cast(), ... are implementation defined (i.e. 5.2.10 "The mapping function is implementation-defined.").
The answer is yes and no. A compiler could not exhibit exactly that behaviour within the standard, but it could do so partly.
There is no reason at all why a compiler could not optimise away the storage for the struct if that storage is never referenced. If the compiler gets its analysis right, then no program that you could write would ever be able to tell whether the storage exists or not.
However, the compiler cannot report a smaller sizeof() thereby. The standard is pretty clear that objects have to be big enough to hold the bits and bytes they contain (see for example 3.9/4 in N3797), and to report a sizeof smaller than that required to hold an int would be wrong.
At N3797 5.3.2:
The sizeof operator yields the number of bytes in the object
representation of its operand
I do not se that 'representation' can change according to whether the struct or member is referenced.
As another way of looking at it:
struct A {
int i;
};
struct B {
int i;
};
A a;
a.i = 0;
assert(sizeof(A)==sizeof(B));
I do not see that this assert can be allowed to fail in a standards-conforming implementation.
If you look at templates, you'll notice that "optimization" of such often ends up with nearly nothing in the output even though the template files may be thousands of lines...
I think that the optimization you are talking about will nearly always occur in a function when the object is used on the stack and the object doesn't get copied or passed down to another function and the private field is never accessed (not even initialized... which could be viewed as a bug!)

What are the storage classes in D?

Up until now I was under the impression that things like immutable and const were storage classes. In a recent video (at around 11:55) Walter Bright states that immutable is not a storage class, but rather it's a type constructor. In the official documentation, immutable, const, and among many other keywords, are listed as being storage classes:
StorageClass:
abstract
auto
const
deprecated
enum
extern
final
immutable
inout
shared
nothrow
override
pure
__gshared
Property
scope
static
synchronized
Is this list wrong? Some of it doesn't make sense(e.g., deprecated, override).
I know static and ref are storage classes, but what's the rest? And which one of the keywords in D are type constructors?
I would point out that there is a big difference between a grammar rule named StorageClass, and what is semantically a storage class in the language. The grammar rules have to do with parsing, not the semantic phase of compilation.
First off, TDPL, chapter 8, is explicitly about type qualifiers (for which Walter used the term type constructor). There are only 3 of them in D:
const
immutable
shared
All three of them are a part of the type that they modify. Such is not true with storage classes such as ref.
inout is what TDPL calls a "wildcard qualifier symbol," so it's a placeholder for a type qualifier rather than really being either a type qualifer or a storage class.
Now, as to what's a storage classes or not, I give two quotes from TDPL:
Each function parameter (base and exponent in the example above) has, in addition to its type, an optional storage class that decides the way that arguments are passed to the function when invoked.
(from pages 6 - 7)
Although static is not related to passing arguments to functions, discussing it here is appropriate because, just like ref, static applied to data is a storage class, meaning an indication about a detail regarding how data is stored.
(from page 137)
Also, there's this line with regards to storage classes in C which seems to be used quite a bit in explanations on storage classes in C found online:
A storage class defines the scope (visibility) and life time of variables and/or functions within a C Program.
A storage class has no effect on the type of a variable, just how it's stored. Unfortunately, I can't find an exact list of storage classes in D, and people are quite liberal with the term storage class, using it even when it doesn't apply. Pretty much any attribute applied to a type save for access modifiers seems to get called a storage class, depending on who's talking. However, there are a few which are beyond a doubt storage classes:
enum (when used as a manifest constant)
extern
lazy
out
ref
scope
static
lazy, out, and ref can be used to modify function parameters and indicate how they're passed, whereas enum and static are used to indicate how the variables are stored (which is nowhere in the case of enum, since manifest constants are copy-pasted everywhere that they're used rather than being actual variables). extern affects linkage.
in is a hybrid, since it's a synonym for scope const, and while scope is a storage class, const is a type qualifier.
The online documentation also refers to auto and synchronized as storage classes, though I don't know on what basis. auto is like inout in that it's a placeholder (in its case a placeholder for a type rather than a type qualifier) and therefore indicates nothing about how a type is stored, so I wouldn't have thought that it would be a storage class. synchronized doesn't modify variables but rather classes.
__gshared is probably a storage class as well, though it's a bit funny, since it does more or less what shared (which is a type qualifier) does, but it's not part of the type.
Beyond that, I don't know. The fact that synchronized is listed as a storage class implies that some of the others (such as final) might be, but (like synchronized) they have nothing to do with how variables are stored or linked. So, I don't know how they could be considered storage classes.
I'll ask on the newsgroup though and see if I can get a more definitive list.
EDIT: It seems that there is no definitive, official list of storage classes in D. The term is used for almost any attribute used on a variable declaration which doesn't affect its type (i.e. not a type qualifier). It seems that Walter and Andrei tend to make a big point about the type qualifiers to underline which attributes actually affect the type of a variable, but the term storage class hasn't been given anywhere near the same level of importance and ends up being used informally rather than per any rigorous definition.

what is a type in C++?

What all constructs(class,struct,union) classify as types in C++? Can anyone explain the rationale behind calling and qualifying certain C++ constructs as a 'type'.
$3.9/1 - "There are two kinds of
types: fundamental types and compound
types. Types describe objects (1.8),
references (8.3.2), or functions
(8.3.5). ]"
Fundamental types are char, int, bool and so on.
Compound types are arrays, enums, classes, references, unions etc
A variable contains a value.
A type is a specification of the value. (eg, number, text, date, person, truck)
All variables must have a type, because they must hold strictly defined values.
Types can be built-in primitives (such as int), custom types (such as enums and classes), or some other things.
Other answers address the kinds of types C++ makes available, so I'll address the motivation part. Note that C++ didn't invent the notion of a data type. Quoting from the Wikipedia entry on Type system.
a type system may be defined as "a
tractable syntactic framework for
classifying phrases according to the
kinds of values they compute"
Another interesting definition from the Data Type page:
a data type (or datatype) is a
classification identifying one of
various types of data, such as
floating-point, integer, or Boolean,
stating the possible values for that
type, the operations that can be done
on that type, and the way the values
of that type are stored
Note that this last one is very close to what C++ means by "type". Perhaps obvious for the built-in (fundamental) types like bool:
possible values are true and false
operations - per definition of arithmetic operators that can accept bool as argument
the way it's stored - actually not mandated by the C++ standard, but one can guess that on some systems a type requiring only a single bit can be stored efficiently (although I think most C++ systems don't do this optimization).
For more complex, user created, types, the situation is more difficult. Consider enum types: you know exactly the range of values a variable of an enum type can get. What about struct and class? There, also, your type declaration tells the compiler what possible values the struct can have, what operations you can do on it (operator overloading and functions accepting objects of this type), and it will even infer how to store it.
Re range of values, although huge, remember it's finite. Even a struct with N 32-bit integers has a finite range of possible values, which is 2^(32N).
Cite from the book "Bjarne Stroustrup - Programming Principles and Practice Using C++", page 77, chapter 3.8:
A type defines a set of possible values and a set of operations (for an object).
An object is some memory that holds a value of a given type.
A value is a set of bits in memory interpreted according to a type.
A variable is a named object.
A declaration is a statement that gives a name to an object.
A definition is a declaration that sets aside memory for an object.
Sounds like a matter of semantics to me...A type refers to something with a construct that can be used to describe it in a away that conforms to traditional Object Oriented concepts(properties and methods). Anything that isn't called a type is probably created with a less robust construct.