I'm watching Chandler Carruth's talk in CppCon 2019:
There are no Zero-Cost Abstractions
In it, he gives an example of how surprised he was by just how much overhead you incur by using a std::unique_ptr<int> instead of an int*; that segment starts at about 17:25.
You can have a look at the compilation results of his example pair of snippets (godbolt.org) to witness that, indeed, the compiler is not willing to pass the unique_ptr value - which in the end is just an address - in a register; it passes it only through memory.
One of the points Mr. Carruth makes at around 27:00 is that the C++ ABI requires some by-value parameters (but not all; perhaps non-primitive types? non-trivially-constructible types?) to be passed in memory rather than in a register.
My questions:
Is this actually an ABI requirement on some platforms? (which?) Or maybe it's just some pessimization in certain scenarios?
Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
Has the C++ standards committee discussed this point in recent years, or ever?
PS - So as not to leave this question with no code:
Plain pointer:
void bar(int* ptr) noexcept;
void baz(int* ptr) noexcept;
void foo(int* ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr);
        *ptr = 42;
    }
    baz(ptr);
}
Unique pointer:
using std::unique_ptr;
void bar(int* ptr) noexcept;
void baz(unique_ptr<int> ptr) noexcept;
void foo(unique_ptr<int> ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr.get());
        *ptr = 42;
    }
    baz(std::move(ptr));
}
Is this actually an ABI requirement, or maybe it's just some pessimization in certain scenarios?
One example is the System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI covers 64-bit x86-compatible CPUs (the Linux x86_64 architecture) and is followed on Solaris, Linux, FreeBSD, macOS, and Windows Subsystem for Linux:
If a C++ object has either a non-trivial copy constructor or a non-trivial
destructor, it is passed by invisible reference (the object is replaced in the
parameter list by a pointer that has class INTEGER).
An object with either a non-trivial copy constructor or a non-trivial destructor cannot be
passed by value because such objects must have well defined addresses. Similar issues apply
when returning an object from a function.
Note that only 2 general-purpose registers can be used for passing one object with a trivial copy constructor and a trivial destructor, i.e. only objects with sizeof no greater than 16 can be passed in registers. See Calling conventions by Agner Fog for a detailed treatment of calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.
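A sketch of that size limit (the type names are made up; whether registers are actually used also depends on the ABI's classification rules, and the sizes assume an LP64 platform such as Linux x86_64):

```cpp
#include <type_traits>

// Hypothetical illustration types for the System V AMD64 size limit.
struct TwoRegs  { long a, b; };     // 16 bytes: may travel in two registers (e.g. rdi/rsi)
struct InMemory { long a, b, c; };  // 24 bytes: too big, passed in memory

static_assert(std::is_trivially_copyable<TwoRegs>::value,
              "trivial copy ctor and dtor: eligible for register passing");
```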
There are different ABIs for other CPU architectures.
There is also the Itanium C++ ABI, which most compilers (apart from MSVC) comply with, and which requires:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
A type is considered non-trivial for the purposes of calls if:
it has a non-trivial copy constructor, move constructor, or destructor, or
all of its copy and move constructors are deleted.
This definition, as applied to class types, is intended to be the complement of the definition in [class.temporary]p3 of types for which an extra temporary is allowed when passing or returning a type. A type which is trivial for the purposes of the ABI will be passed and returned according to the rules of the base C ABI, e.g. in registers; often this has the effect of performing a trivial copy of the type.
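A rough way to see which side of this line a type falls on is via type traits. Note that std::is_trivially_copyable is only an approximation of the ABI's "trivial for the purposes of calls", but it agrees for simple cases like these (type names made up):

```cpp
#include <type_traits>

struct Trivial { int x; };                  // trivial: passed per the base C ABI, e.g. in a register
struct HasDtor { int x; ~HasDtor() {} };    // non-trivial destructor: caller passes a temporary by reference
struct NoCopyMove {
    int x;
    NoCopyMove(const NoCopyMove&) = delete; // all copy and move constructors deleted:
    NoCopyMove(NoCopyMove&&) = delete;      // also non-trivial for the purposes of calls
};

static_assert(std::is_trivially_copyable<Trivial>::value, "");
static_assert(!std::is_trivially_copyable<HasDtor>::value, "");
```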
Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
It is an implementation detail, but when an exception is handled, during stack unwinding, the objects with automatic storage duration being destroyed must be addressable relative to the function stack frame because the registers have been clobbered by that time. Stack unwinding code needs objects' addresses to invoke their destructors but objects in registers do not have an address.
Pedantically, destructors operate on objects:
An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction.
and an object cannot exist in C++ unless addressable storage is allocated for it, because an object's identity is its address.
When the address of an object with a trivial copy constructor that is kept in registers is needed, the compiler can simply store the object into memory and obtain the address. If the copy constructor is non-trivial, on the other hand, the compiler cannot just store the object into memory; it needs to call the copy constructor, which takes a reference and hence requires the address of the object. The calling convention cannot depend on whether the copy constructor gets inlined into the callee or not.
Another way to think about this is that for trivially copyable types the compiler transfers the value of the object in registers, from which an object can be recovered by plain memory stores if necessary. E.g.:
void f(long*);
void g(long a) { f(&a); }
on x86_64 with System V ABI compiles into:
g(long): // Argument a is in rdi.
push rax // Align stack, faster sub rsp, 8.
mov qword ptr [rsp], rdi // Store the value of a in rdi into the stack to create an object.
mov rdi, rsp // Load the address of the object on the stack into rdi.
call f(long*) // Call f with the address in rdi.
pop rax // Faster add rsp, 8.
ret // The destructor of the stack object is trivial, no code to emit.
In his thought-provoking talk, Chandler Carruth mentions that a breaking ABI change may be necessary (among other things) to implement destructive move, which could improve things. IMO, the ABI change could be made non-breaking if functions using the new ABI explicitly opted in to a new, different linkage, e.g. by being declared in an extern "C++20" {} block (possibly within a new inline namespace for migrating existing APIs). That way, only code compiled against the new function declarations with the new linkage could use the new ABI.
Note that the ABI doesn't apply when the called function has been inlined. Likewise, with link-time code generation the compiler can inline functions defined in other translation units, or use custom calling conventions.
With common ABIs, non-trivial destructor -> can't pass in registers
(An illustration of a point in @MaximEgorushkin's answer using @harold's example in a comment; corrected as per @Yakk's comment.)
If you compile:
struct Foo { int bar; };
Foo test(Foo byval) { return byval; }
you get:
test(Foo):
mov eax, edi
ret
i.e. the Foo object is passed to test in a register (edi) and also returned in a register (eax).
When the destructor is not trivial (as in the OP's std::unique_ptr example), common ABIs require placement on the stack. This is true even if the destructor does not use the object's address at all.
Thus even in the extreme case of a do-nothing destructor, if you compile:
struct Foo2 {
    int bar;
    ~Foo2() { }
};
Foo2 test(Foo2 byval) { return byval; }
you get:
test(Foo2):
mov edx, DWORD PTR [rsi]
mov rax, rdi
mov DWORD PTR [rdi], edx
ret
with useless loading and storing.
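The penalty comes from the destructor being user-provided, not from what it does. A small sketch: explicitly defaulting the destructor keeps it trivial, so such a type should be eligible for register passing again:

```cpp
#include <type_traits>

struct Foo2 { int bar; ~Foo2() { } };         // user-provided destructor: non-trivial, passed in memory
struct Foo3 { int bar; ~Foo3() = default; };  // defaulted destructor: trivial, passed in registers

static_assert(!std::is_trivially_destructible<Foo2>::value, "");
static_assert(std::is_trivially_destructible<Foo3>::value, "");
```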
Is this actually an ABI requirement on some platforms? (which?) Or maybe it's just some pessimization in certain scenarios?
If something is visible at a compilation unit boundary, then whether it is defined implicitly or explicitly, it becomes part of the ABI.
Why is the ABI like that?
The fundamental problem is that registers get saved and restored all the time as you move down and up the call stack. So it's not practical to have a reference or pointer to them.
Inlining, and the optimizations that result from it, are nice when they happen, but an ABI designer can't rely on them happening. They have to design the ABI assuming the worst case. I don't think programmers would be very happy with a compiler where the ABI changed depending on the optimization level.
A trivially copyable type can be passed in registers because the logical copy operation can be split into two parts. The parameters are copied to the registers used for passing parameters by the caller and then copied to the local variable by the callee. Whether the local variable has a memory location or not is thus only the concern of the callee.
A type where a copy or move constructor must be used, on the other hand, cannot have its copy operation split up in this way, so it must be passed in memory.
Has the C++ standards committee discussed this point in recent years, or ever?
I have no idea if the standards bodies have considered this.
The obvious solution to me would be to add proper destructive moves (rather than the current halfway house of a "valid but otherwise unspecified state") to the language, then introduce a way to flag a type as allowing "trivial destructive moves" even if it does not allow trivial copies.
But such a solution would require breaking the ABI of existing code to implement for existing types, which may meet a fair bit of resistance (though ABI breaks as a result of new C++ standard versions are not unprecedented; for example, the std::string changes in C++11 resulted in an ABI break).
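To tie this back to the opening example: std::unique_ptr has a non-trivial destructor and move constructor, so under the rules quoted above it takes the in-memory path even though it holds only a single pointer. The type traits confirm this (the sizeof equality holds on common implementations that apply the empty-deleter optimization):

```cpp
#include <memory>
#include <type_traits>

// unique_ptr with the default deleter is one word on common implementations...
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "single pointer on common implementations");
// ...but it is not trivially copyable, so common ABIs pass it in memory
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value, "");
```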
First we need to go back to what it means to pass by value and by reference.
For languages like Java and SML, pass by value is straightforward (and there is no pass by reference), just as copying a variable's value is, since all variables are just scalars with built-in copy semantics: they are either what would count as arithmetic types in C++, or "references" (pointers under a different name and syntax).
In C we have scalar and user defined types:
Scalars have a numeric or abstract value (pointers are not numbers; they have an abstract value) that is copied.
Aggregate types have all their possibly initialized members copied:
for product types (arrays and structures): recursively, all members of structures and elements of arrays are copied (the C function syntax doesn't make it possible to pass arrays by value directly, only arrays that are members of a struct, but that's a detail).
for sum types (unions): the value of the "active member" is preserved; obviously, a member-by-member copy isn't in order here, as not all members can be initialized.
In C++, user defined types can have user defined copy semantics, which enables truly "object oriented" programming with objects that own their resources, and "deep copy" operations. In such a case, a copy operation is really a call to a function that can do almost arbitrary operations.
For C structs compiled as C++, "copying" is still defined as calling the user defined copy operation (either the constructor or the assignment operator), which is implicitly generated by the compiler. This means that the semantics of a program in the C/C++ common subset differ between C and C++: in C the whole aggregate is copied, while in C++ an implicitly generated copy function is called to copy each member; the end result is that in either case each member is copied.
(There is an exception, I think, when a struct inside a union is copied.)
So for a class type, the only way (outside union copies) to make a new instance is via a constructor (even for those with trivial compiler generated constructors).
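A minimal sketch of that (the type is made up): even a C-style aggregate gets its new instances via the implicitly generated copy constructor, whose effect is just a member-by-member copy:

```cpp
struct Point { int x; int y; };  // valid in both C and C++

Point duplicate(const Point& p) {
    return p;  // in C++: calls the implicitly generated copy constructor,
               // which copies x and y member by member
}
```

So `Point b = duplicate(a);` leaves `b` with the same member values as `a`, exactly as a C struct assignment would.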
You can't take the address of an rvalue via the unary & operator, but that doesn't mean there is no rvalue object; and an object, by definition, has an address. That address is even represented by a syntactic construct: an object of class type can only be created by a constructor, and the constructor has a this pointer; but for trivial types there is no user-written constructor, so there is no place to use this until after the copy has been constructed and named.
For a scalar type, the value of an object is the rvalue of the object: the pure mathematical value stored in the object.
For a class type, the only notion of the value of an object is another copy of the object, which can only be made by the copy constructor, a real function (although for trivial types that function is so trivial that copies can sometimes be made without calling it). That means the value of the object is the result of a change of global program state by an execution; it is not accessible as a pure mathematical value.
So pass by value really isn't a thing: it's pass by copy-constructor call, which is less pretty. The copy constructor is expected to perform a sensible "copy" operation according to the proper semantics of the object's type, respecting its internal invariants (which are abstract user properties, not intrinsic C++ properties).
Pass by value of a class object means:
create another instance
then make the called function act on that instance.
Note that the issue has nothing to do with whether the copy itself is an object with an address: all function parameters are objects and have an address (at the language semantic level).
The issue is whether:
the copy is a new object initialized with the pure mathematical value (the true pure rvalue) of the original object, as with scalars;
or the copy is the value of the original object, as with classes.
In the case of a trivial class type, you can still define the member-by-member copy of the original, so you get to define the pure rvalue of the original, thanks to the triviality of the copy operations (copy constructor and assignment). Not so with arbitrary user-provided special functions: there, a value of the original has to be a constructed copy.
Class objects must be constructed by the caller; a constructor formally has a this pointer, but formalism isn't relevant here: all objects formally have an address, but only those whose address is actually used in non-purely-local ways (unlike *&i = 1;, which is a purely local use of an address) need a well defined address.
An object absolutely must be passed by address if it must appear to have the same address in both of these two separately compiled functions:
void callee(int &i) {
    something(&i);
}

void caller() {
    int i;
    callee(i);
    something(&i);
}
Here, even if something(address) is a pure function or macro or whatever (like printf("%p", arg)) that can't store the address or communicate it to another entity, we still have to pass by address, because the address must be well defined for a unique int object with a unique identity.
We don't know whether an external function will be "pure" in terms of the addresses passed to it.
Here, the potential for a real use of the address in either a non-trivial constructor or destructor on the caller's side is probably the reason for taking the safe, simplistic route: give the object an identity in the caller and pass its address. That makes sure that any non-trivial use of its address - in the constructor, after construction, and in the destructor - is consistent: this must appear to be the same throughout the object's existence.
A non-trivial constructor or destructor, like any other function, can use the this pointer in a way that requires consistency of its value, even though some objects with non-trivial members might not:
struct file_handler { // don't use this class!
    int fileno;
    file_handler () { this->fileno = -1; }
    file_handler (int f) { this->fileno = f; }
    file_handler (const file_handler& rhs) {
        if (rhs.fileno != -1)  // test the source object, not the uninitialized this->fileno
            this->fileno = dup(rhs.fileno);
        else
            this->fileno = -1;
    }
    ~file_handler () {
        if (this->fileno != -1)
            close(this->fileno);
    }
    file_handler &operator= (const file_handler& rhs);
};
Note that in this case, despite the explicit use of a pointer (the explicit this-> syntax), the object's identity is irrelevant: the compiler could well copy the object around bitwise to move it, and perform "copy elision". This is based on the level of "purity" of the use of this in the special member functions (the address doesn't escape).
But purity isn't an attribute available at the standard declaration level (compiler extensions exist that add purity annotations to non-inline function declarations), so you can't define an ABI based on the purity of code that may not be available (the code may or may not be inline and available for analysis).
Purity is measured as "certainly pure" or "impure or unknown". The common ground - the upper bound of the semantics, the analogue of a least common multiple - is "unknown". So the ABI settles on unknown.
Summary:
Some constructs require the compiler to define the object identity.
The ABI is defined in terms of classes of programs, not specific cases that might be optimized.
Possible future work:
Is purity annotation useful enough to be generalized and standardized?
On Win32, I just noticed that making a struct non-POD changes the signature of functions that return that struct by value.
Why is this? And doesn't that mean C could not declare the signature of such an extern "C" function?
For instance, suppose the preprocessor sees __cplusplus and so slips in a constructor or two, along with some other member functions, yielding a struct with an identical layout. Why should that matter in this way?
Well, the ultimate answer can only be given by whoever defined the ABI, but the probable reason is that C copies structs by just copying memory, while for non-PODs such a memory copy may not work correctly. The information needed to decide whether a memcpy is valid may be in a different file, and even if all the information is available, the compiler may not be able to decide (in general it is equivalent to the halting problem). Therefore the ABI designers probably decided to just assume that it is not possible for non-PODs (even if in a given case it might actually be provable, it's just not worth the effort to try).
Also note that, formally, adding a member function makes the type different in C++, and having one declaration with and another without that member function makes your code technically undefined behaviour, even if it affects neither the layout nor the PODness of the class (as with a non-virtual, non-special member function).
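A sketch of that scenario (the types are hypothetical): adding a non-virtual, non-special member function leaves both the size and the trivial copyability untouched, even though the two declarations are formally different types:

```cpp
#include <type_traits>

struct S_c   { int a; double b; };                               // the "C view" of the struct
struct S_cpp { int a; double b; int get() const { return a; } }; // the "C++ view" with a member function

static_assert(sizeof(S_c) == sizeof(S_cpp), "identical layout");
static_assert(std::is_trivially_copyable<S_cpp>::value,
              "a plain member function does not affect trivial copyability");
```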
I can't remember where I saw it now, but somewhere I read that dynamic polymorphism prevents the compiler from making various optimizations.
Besides inlining, could somebody please enlighten me with examples of such "missed" optimization opportunities that polymorphism prevents the compiler from making?
With:
Derived d;
d.vMethod(); // that will call Derived::vMethod statically (allowing inlining).
With (unless one of Derived or Derived::vMethod is declared final in C++11):
void foo(Derived& d)
{
    d.vMethod(); // this will call vMethod virtually (disallowing inlining).
}
A virtual call has an additional cost (an indirection through the vtable).
C++11 introduces the final keyword, which may turn the last example into a static call.
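A minimal sketch of that effect (the types are hypothetical): once Derived is declared final, the compiler knows d.vMethod() cannot be overridden any further, so a call through Derived& may be devirtualized and even inlined:

```cpp
struct Base {
    virtual int vMethod() const { return 1; }
    virtual ~Base() = default;
};

struct Derived final : Base {            // final: no further overriding is possible
    int vMethod() const override { return 2; }
};

int foo(const Derived& d) {
    return d.vMethod();                  // may be turned into a direct (static) call
}
```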
At least in C++, polymorphic objects must be handled through pointers or references. Sometimes this prevents putting them in stack variables or in containers of values: you can't use a List of objects, you need a List of pointers to them. Stack variables spare dynamic allocations, etc.
A call to Poly.vmethod() on an object is always resolved at compile time, even if vmethod() is virtual, while Poly->vmethod() through a pointer consults the virtual method table. (Well, if the method is virtual it is meant to be polymorphic; static methods are statically resolved in either case.)
Return value optimization (RVO) is another trick that does not take place when returning pointers or references. RVO is typically implemented by passing a hidden parameter: a pointer to a memory region that is filled with the "returned" object. The size and the type of this region must be perfectly known at compile time.
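A small sketch of RVO in action (the Tracker type is made up, and the guarantee assumes C++17 mandatory copy elision for prvalues): the returned object is constructed directly in the caller's storage, so no copy constructor runs:

```cpp
struct Tracker {
    int copies = 0;
    Tracker() = default;
    Tracker(const Tracker& other) : copies(other.copies + 1) {}  // counts copies
};

Tracker make() {
    return Tracker{};  // prvalue: constructed directly in the caller's object (C++17)
}
```

After `Tracker t = make();` the counter stays at zero, showing the copy was elided.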
According to C++ Standard, it's perfectly acceptable to do this:
class P
{
    void Method() {}
};
...
P* p = NULL;
p->Method();
However, a slight change to this:
class P
{
    virtual void Method() {}
};
...
P* p = NULL;
p->Method();
produces an access violation when compiled with Visual Studio 2005.
As far as I understand, this is caused by some quirk in Microsoft's compiler implementation and not by my sheer incompetence for a change, so the questions are:
1) Does this behavior persist in more recent versions of VS?
2) Are there any, I don't know, compiler settings that prevent this access violation?
According to C++ Standard, it's perfectly acceptable to do this
No it is not!
Dereferencing a NULL pointer is Undefined Behavior as per the C++ Standard.[#1]
However, if you do not access any members inside a non-virtual member function, it will most likely work on every implementation, because for a non-virtual member function this only needs to be dereferenced to access members of the object; since no members are accessed inside the function, it appears to work.
However, just because the observable behavior seems okay does not mean the program is well-formed or correct.
It is still an invalid program nevertheless.
The second version crashes because, when calling a virtual member function, the this pointer needs to be dereferenced just to find the appropriate member function to call, even if no members are accessed within that member function.
A good read:
What's the difference between how virtual and non-virtual member functions are called?
[#1]Reference:
C++03 Standard: §1.9/4
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. ]
As said in other answers... I'll even explain why: in many C++ implementations the this pointer is simply passed as the first "hidden" parameter of the method. So what you see as
void Method() {}
is really
void Method(P* this) {}
But for virtual methods it's more complex. The runtime needs to dereference the pointer to find the "real" type of the P* so it can call the "right" virtual implementation of the method. So it's something like
p->virtualTable->Method(p);
so p is always used.
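A sketch of that transformation (the names are made up): a non-virtual member function behaves like a free function that takes the object pointer as its first parameter, which is why calling it through a null pointer can appear to work as long as the body never touches that pointer:

```cpp
struct P2 {
    int x;
    int method() { return x * 2; }   // conceptually: int method(P2* this)
};

// roughly what the compiler generates for the non-virtual member function:
int method_impl(P2* self) { return self->x * 2; }
```

Calling `p.method()` and `method_impl(&p)` produces the same result; the member syntax is just sugar over the hidden parameter.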
First of all, neither one will even compile, because you've defined Method as private.
Assuming you make Method public, you end up with undefined behavior in both cases. Based on the typical implementation, most compilers will allow the first to "work" (for a rather loose definition of "work"), while the second will essentially always fail.
This is because a non-virtual member function is basically a normal function that receives an extra parameter. Inside that function, the keyword this refers to that extra parameter, which is a pointer to the class instance for which the function was invoked. If you invoke the member function via a null pointer, it mostly means that inside that function this will be a null pointer. As long as nothing in the function attempts to dereference this, chances are pretty good that you won't see any noticeable side effects.
A virtual function, however, is basically a function called via a pointer. In a typical implementation, any class that has one or more virtual functions (whether defined directly in that class, or inherited from a base class) will have a vtable. Each instance of that class (i.e., each object) will contain a pointer to the vtable for its class. When you try to call a virtual function via a pointer, the compiler will generate code that:
Dereferences that pointer to reach the object
Gets the vtable pointer from the proper offset in that object
Dereferences the vtable pointer to get the class's vtable
Looks at the proper offset in the vtable to get a pointer to the function to invoke
Invokes that function
Given a null pointer, step one of that process is going to break.
I'd note for the record that this applies to virtually all C++ compilers; VC++ is far from unique in this regard. Quite the contrary: while it's theoretically possible for a compiler to implement virtual functions differently, the reality is that every compiler I'm aware of works essentially identically for the kind of code you posted. Virtually all C++ compilers will show similar behavior given the same code; major differences in implementation are mostly a theoretical possibility, not one you're at all likely to encounter in practice.
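The steps above can be sketched by hand-rolling the dispatch with plain function pointers (everything here is hypothetical; real vtables are an unspecified implementation detail):

```cpp
struct Obj;                                    // forward declaration
struct VTable { int (*method)(const Obj*); };  // the "virtual table": a table of function pointers

struct Obj {
    const VTable* vptr;                        // each object carries a pointer to its class's table
    int x;
};

int derived_method(const Obj* self) { return self->x + 1; }
const VTable derived_vtable{ &derived_method };

int call_virtual(const Obj* p) {
    const VTable* vt = p->vptr;  // steps 1-3: dereference the object, fetch the vtable pointer
    return vt->method(p);        // steps 4-5: index the table, call through the function pointer
}
```

With p == nullptr, the very first dereference (p->vptr) faults, which is exactly why the virtual call produces the access violation while the non-virtual one may not.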
Any class having a virtual function gets an extra hidden pointer, which points to the virtual table of the object's most derived class.
What is the type of this vptr?
It has no type. It's an implementation detail unspecified by the standard; it is not part of the language.
Note that C++ doesn't say that there has to be a virtual table or a virtual "pointer" at all (though this is the most common implementation of RTTI in C++ toolchains).
Also, your analysis is wrong. In, say, GCC, each object usually gets a vptr that points to the virtual table relevant to that object's type: the object has the pointer, the type has the table.
The standard does not guarantee the presence of the virtual table pointer, even though most implementations use it.
As a result, it has no type; what it points to is simply an array of function pointers.
It has a compiler-dependent type, which may be anything as long as the compiler understands it. Since the language doesn't say anything about a vptr, and programmers don't use it in their code, compilers are free to create any arbitrary type for implementing runtime polymorphism. That type doesn't have to conform to the C++ language.