I've seen a lot of discussion about how C++ doesn't have a Standard ABI quite in the same way that C does. I'm curious as to what, exactly, the issues are. So far, I've come up with:
Name mangling
Exception handling
RTTI
Are there any other ABI issues pertaining to C++?
Off the top of my head:
C++ Specific:
Where the 'this' parameter can be found.
How virtual functions are called
i.e., does it use a vtable or some other mechanism?
What is the layout of the structures used for implementing this.
How are multiple definitions handled
Multiple template instantiations
Inline functions that were not inlined.
Static Storage Duration Objects
How to handle creation (in the global scope)
How to handle creation of function local (how do you add it to the destructor list)
How to handle destruction (destroy in reverse order of creation)
You mention exceptions. But also how exceptions are handled outside main()
ie before or after main()
Generic.
Parameter passing locations
Return value location
Member alignment
Padding
Register usage (which registers are preserved which are scratch)
size of primitive types (such as int)
format of primitive types (Floating point format)
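A few of the generic items above (type sizes, member alignment, padding) can be inspected directly. The following is only a rough sketch: the Sample struct and print_layout function are made up for illustration, and the printed numbers are whatever your particular compiler and platform choose. For instance, long is commonly 8 bytes on 64-bit Linux and 4 bytes on 64-bit Windows.

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

// Hypothetical record, used only for illustration.
struct Sample {
    char   tag;    // 1 byte; padding usually follows
    double value;  // alignment is chosen by the platform ABI
    int    count;
};

// Prints facts that a platform ABI has to pin down; the numbers may
// differ between compilers, operating systems, and CPU architectures.
void print_layout() {
    std::printf("sizeof(int)     = %zu\n", sizeof(int));
    std::printf("sizeof(long)    = %zu\n", sizeof(long));
    std::printf("sizeof(Sample)  = %zu\n", sizeof(Sample));
    std::printf("alignof(Sample) = %zu\n", alignof(Sample));
    std::printf("offset of value = %zu\n", offsetof(Sample, value));
    std::printf("offset of count = %zu\n", offsetof(Sample, count));

    // Only relations like these hold everywhere; the exact numbers do not.
    assert(offsetof(Sample, value) >= sizeof(char));
    assert(sizeof(Sample) >= sizeof(char) + sizeof(double) + sizeof(int));
}
```

Two object files can only exchange a Sample safely if both were built with the same answers to all of these questions.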
The big problem, in my experience, is the C++ standard library. Even if you had an ABI that dictates how a class should be laid out, different compilers provide different implementations of standard objects like std::string and std::vector.
I'm not saying that it would not be possible to standardize the internal layout of C++ library objects, only that it has not been done before.
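To see the library-layout point on your own machine, a small sketch like the one below can help. The function name print_library_layout is hypothetical, and no particular sizes are claimed here: the values printed depend on which standard library implementation you build against, even with the same compiler and class-layout rules.

```cpp
#include <cassert>  // assert
#include <cstdio>   // std::printf
#include <string>
#include <vector>

// Prints the object sizes chosen by the standard library in use.
// Different library implementations are free to pick different layouts.
void print_library_layout() {
    std::printf("sizeof(std::string)      = %zu\n", sizeof(std::string));
    std::printf("sizeof(std::vector<int>) = %zu\n", sizeof(std::vector<int>));

    // The source-level behaviour is the same everywhere, whatever the layout.
    const std::string s = "hello";
    assert(s.size() == 5);
}
```

If two modules disagree about these sizes, passing a std::string by value or by reference between them cannot work, even when everything else about the ABI matches.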
The closest thing we have to a standard C++ ABI is the Itanium C++ ABI:
"this document is written as a generic specification, to be usable by C++ implementations on a variety of architectures. However, it does contain processor-specific material for the Itanium 64-bit ABI, identified as such."
The GCC doc explains support of this ABI for C++:
Starting with GCC 3.2, GCC binary conventions for C++ are based
on a written, vendor-neutral C++ ABI that was designed to be specific
to 64-bit Itanium but also includes generic specifications that apply
to any platform. This C++ ABI is also implemented by other compiler
vendors on some platforms, notably GNU/Linux and BSD systems
As was pointed out by @Lindydancer, you need to use the same C++ standard library/runtime as well.
An ABI standard for any language really needs to come from a given platform that wants to support such a thing. Language standards, especially those for C and C++, really cannot do this for many reasons, but mostly because it would make the language less flexible and less portable, and therefore less used. C doesn't really have a defined ABI, but many platforms define one (directly or indirectly). The reason this isn't happening with C++ is that the language is much bigger and changes more often. However, Herb Sutter has a very interesting proposal about how to get more platforms to create standard ABIs and how developers can write code that uses the ABI in a standard way:
https://isocpp.org/blog/2014/05/n4028
He points out that C++ has a standard way to link against a platform's C ABI (via extern "C") but no equivalent for a platform's C++ ABI. I think this proposal could go a long way toward allowing interfaces to be defined in terms of C++ instead of C.
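For readers who have not used it, here is a minimal sketch of what linking through the platform's C ABI via extern "C" looks like today. The names stats::mean, stats_mean, and check_stats_mean are hypothetical and exist only for this example.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <numeric>   // std::accumulate

namespace stats {
// Ordinary C++ function: its symbol name is mangled in a
// compiler-specific way, so other toolchains may not find it.
double mean(const double* first, const double* last) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    return n == 0 ? 0.0 : std::accumulate(first, last, 0.0) / static_cast<double>(n);
}
}  // namespace stats

// C-linkage entry point: unmangled name, C calling convention,
// and only C-compatible types in the signature.
extern "C" double stats_mean(const double* data, std::size_t n) {
    if (data == nullptr) return 0.0;
    return stats::mean(data, data + n);
}

// Quick self-check of the wrapper.
void check_stats_mean() {
    const double xs[] = {1.0, 2.0, 3.0};
    assert(stats_mean(xs, 3) == 2.0);
}
```

The cost is visible in the signature: everything C++-specific (namespaces, overloads, references, library types) has to be flattened away at the boundary.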
I've seen a lot of discussion about how C++ doesn't have a Standard ABI quite in the same way that C does.
What standard C ABI? Annex J of the C99 standard is 27 pages long. In addition to undefined behavior (and some implementations give some UB a well-defined behavior), it covers unspecified behavior, implementation-defined behavior, locale-specific behavior, and common extensions.
I often see the statement "implementation-defined" in the C Standard documentations, as well as getting it as answer very much.
I have then searched in the C99 Standard for it, and:
ISO/IEC 9899:1999 (C99) states under §3.12:
3.12
Implementation
particular set of software, running in a particular translation environment under particular control options, that performs translation of programs for, and supports execution of functions in, a particular execution environment
And under §5:
Environment
An implementation translates C source files and executes C programs in two data-processing-system environments, which will be called the translation environment and the execution environment in this International Standard. Their characteristics define and constrain the results of executing conforming C programs constructed according to the syntactic and semantic rules for conforming implementations.
But which software exactly does it refer to?
Which set of software in particular?
It is described as providing both a translation and an execution environment, so it can't be the compiler alone. Or am I wrong about this assumption?
Which parts of my system can I think of as part of "the implementation"?
Is it the combination of the compiler and the C standard it targets, the operating system, the C standard itself, or a mix of all of these?
Does it also include hardware (the processor, mainboard, etc.)?
I don't quite understand what an implementation exactly is.
I feel like I have to be a cyborg with 100 years of experience to know exactly what it includes.
Generally speaking, an "implementation" refers to a given compiler and the machine it runs on. The latter is important due to things such as endianness, which dictates the byte ordering of integer and floating point types, among other considerations.
An implementation is required to document its implementation defined behavior. For example you can find GCC's implementation defined behavior here.
Compilers often support multiple versions of the C standard, so each operating mode can also be considered an implementation. For example, you can pass the -std option to GCC to specify c89, c99, or c11 modes.
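Some of the properties that differ from one implementation to the next can be probed from inside a program. The sketch below is written in C++ rather than C, and the helper names is_little_endian and print_implementation_traits are invented for this example; treat it as an illustration of the idea, not a complete inventory.

```cpp
#include <cassert>   // assert
#include <climits>   // CHAR_BIT, CHAR_MIN
#include <cstdint>   // std::uint32_t
#include <cstdio>    // std::printf
#include <cstring>   // std::memcpy

// Looks at the first byte in memory of the value 1.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Prints a few characteristics that the implementation decides.
void print_implementation_traits() {
    assert(CHAR_BIT >= 8);  // the standard guarantees at least 8 bits per byte
    std::printf("CHAR_BIT      = %d\n", CHAR_BIT);
    std::printf("plain char is   %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    std::printf("byte order    = %s\n",
                is_little_endian() ? "little-endian" : "not little-endian");
    // The value reflects the language mode selected, e.g. via -std.
    std::printf("__cplusplus   = %ld\n", static_cast<long>(__cplusplus));
}
```

Building the same file with a different -std setting, or for a different target, may change what this prints, which is one way to see that each combination behaves as its own implementation.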
I think you have a good formal sense of what it is, and are focusing your question on specifics of real-world implementations, so that's what I'll address. "The implementation" actually tends to encompass a number of components which act and depend upon one another via a number of interface contracts, all of which need to be honored in order to have any hope of the implementation as a whole being conforming.
These include possibly:
the compiler
possibly an assembler, if the compiler produces asm as an intermediate form
the linker
library code that's part of the standard library (which is part of the language, as the language is specified, not a separate component, but only for hosted implementations not freestanding ones)
library code that's "compiler glue" for implementing language constructs for which the compiler doesn't directly emit code (on GCC, this is libgcc), often used for floating point on machines that lack a hardware FPU, division on machines that lack a hardware divider, etc.
the dynamic linker, if the implementation uses dynamic-linked programs
the operating system kernel, if the implementation's library functions don't directly drive the hardware, but depend on syscalls or "software interrupts" or similar defined by the operating system having their specified behavior in order to implement part of the standard library or other (e.g. startup or glue) library code
etc.
Arguably you could also say the hardware itself is part of the implementation.
The C99 Standard defines many things, but some are just not that relevant so they did not care to define them in the Standard in detail. Instead, they write "implementation defined" which means that whoever actually programs a compiler according to their standard can choose how exactly they do that.
For example, gcc is an implementation of that standard (Actually, gcc implements various different Standards, as pmg points out in his comment. But that's not too important right now). If you were to write your own compiler, you can only call it a "C99 Compiler" if it adheres to the standard. But where the standard states that something is implementation dependent, you are free to choose what your compiler should do.
As far as I can tell, compiler extensions may be considered undefined rather than implementation-defined. I am guessing (but do not know for sure) that this applies to the C++ standard as well as C standard.
Both GCC and LLVM offer an -fexceptions feature that appears to ensure that throwing an exception from C++ code through C code and then catching it in C++ code will behave as expected, i.e., unwinding the stack frames in both C and C++ and invoking the destructors for the C++ locals. (Note: I understand that resources allocated in the C stack frames being unwound will not be freed. That is not part of my question.) Here is the relevant text from the GCC documentation:
If you do not specify this option, GCC enables it by default for languages like C++ that normally require exception handling, and disables it for languages like C that do not normally require it. However, you may need to enable this option when compiling C code that needs to interoperate properly with exception handlers written in C++.
However, I cannot find anything in the C or C++ standards indicating how stack-unwinding should be expected to interact with a stack containing frames compiled from different source languages. The C++ standard appears to only mention unwinding in 15.2, except.ctor, which simply explains the rules regarding destroying local objects when an exception is thrown.
Therefore, is passing an exception through C code undefined behavior, even using a language extension designed to make it work in a well-defined way? Is using such an implementation-provided extension "wrong"?
For context, this question is inspired by two fairly lengthy discussions in the Rust community about stack-unwinding through C code:
Rust internals thread
GitHub issue
Relying on Implementation Documentation
The essential question here is whether we can rely on specifications provided by a C or C++ implementation. (Since we are dealing with a situation with mixed C and C++ code, I will refer to this combined implementation as a single implementation.)
In fact, we must rely on implementation documentation. The C and C++ standards do not apply unless and until an implementation asserts that it conforms (at least in part) to the standards. The standards have no power of law; they do not apply to any person or undertaking until somebody decides to adopt them. (The C 2018 Foreword refers to an ISO statement explaining the standards are voluntary.)
If an implementation tells you it conforms to the C and C++ standards, and it also tells you it supports throwing C++ exceptions through C code, there is no reason to believe one and not the other. If you accept the implementation’s documentation, then it both conforms to the language standard and supports throwing exceptions through C code. If you do not accept the implementation’s documentation, then there is no reason to expect conformance to the language standards. (This is a general view, neglecting instances where apparent bugs give us reason to doubt specific behaviors, for example.)
If you ask whether passing an exception through C code is “undefined” in the sense used in the C or C++ standards, the answer is yes. But those standards are only discussing what they define. Their use of “undefined” does not prohibit anybody else from defining behavior. In fact, if you are using an implementation’s documentation, you have a definition for the behavior. The C and C++ standards do not undo, negate, or nullify definitions made by other documents:
Where the C or C++ standard says any behavior is undefined that only means the behavior is undefined within the context of the C or C++ standard.
Any other specification a programmer chooses to use may define additional behavior that is not defined by the C or C++ standard. The C and C++ standards do not prohibit this.
Example
As an example, some of the documents one might rely on to specify the behavior of a commercial software product include:
The C standard.
The C++ standard.
The assembler manual.
The compiler documentation.
Apple’s Developer Tools documentation, including behaviors of Xcode, the linker, and other tools used during a software build.
Processor manuals.
Instruction set architecture specifications.
IEEE-754 Standard for Floating Point Arithmetic.
Unix documentation for command-line tools.
Unix documentation for system interfaces.
For much software, it would be impossible to produce the software if the overall behavior were not defined by all these specifications combined. The notion that the C or C++ standard overrides or trumps other documentation is ludicrous.
Writing Portable Code
Any software project, or any engineering project, works from premises: It takes as given various tool specifications, material properties, device properties, and so on, and it derives desired products from those premises. Rarely does any complete end-user commercial product rely solely on the C or C++ standard. When you buy an iPhone, it obeys the laws of physics, and you are entitled to rely on it to conform to safety specifications for electrical devices and to radio frequency behaviors regulated by governmental agencies. It conforms to many specifications, and the notion that the C standard should be regarded as trumping those other specifications is absurd. If your device burst into flame because of a programming error that the C standard says has undefined behavior, that is not acceptable; the fact that the C standard says it is not defined does not trump the safety specification.
Even in purely software projects, very few strictly conform to the C or C++ standards. Largely, only software that does some pure computations and limited input/output can be written in strictly conforming C or C++. That can include very useful libraries that are included in other software, but it includes very few complete commercial end-user programs—such as a few things used by mathematicians and scientists to answer questions about logic, math, and modeling, for example. Most software in this world interacts with devices and operating systems in ways not defined by the C or C++ standards. Most software uses extensions not defined by the standards—extensions that manipulate files and memory in ways not defined by the standards, that interact with devices and users in ways not defined by the standards. They display GUI windows and accept mouse and keyboard input from the user. They transmit and receive data over a network. They send radio waves to other devices.
These things are impossible without using behaviors not defined by the language standards. And, if the language standards trumped the definitions of these behaviors, writing such software would be impossible. If you wanted to send a Wi-Fi radio signal, and you had adopted the C standard, and the C standard trumped other definitions, that would mean it would be impossible for you to write software that reliably sends a radio signal. Obviously, that is not true. The C standard does not trump other specifications.
Writing “portable code” is not a feasible requirement for most software projects. It is, of course, desirable to contain non-portable code to clear interfaces. It is desirable to write what code one can using portable code so that it can be reused. But this is only part of most projects. For most projects, the project as a whole must use behaviors defined by documents other than the language standards.
In the sense that C does not define what happens when you call a function written in a language other than C, much less what happens if that function fails to return but instead ends its lifetime and the lifetime of the C caller in some other way, yes, it is undefined behavior. It is not "implementation-defined behavior", because the defining characteristic of implementation-defined behavior is that the language standard imposes a requirement on implementations that they document a particular behavior, and that is not the case here; the topic in question is completely outside the scope of the relevant standard.
From a standpoint of reasonable and portable C programming, you should not use or depend on -fexceptions and C++ code that's intended to be called from C should catch all exceptions in the outermost extern "C" function (or function exposed via a function pointer to C callers) and translate them into error codes or some mechanism compatible with C (e.g. a longjmp, but only if it's documented that the C caller has to be prepared for the callee to do so).
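A minimal sketch of that pattern, catching everything in the outermost extern "C" function and translating it into error codes, might look like this. The status codes and the functions detail::parse_positive, parse_positive_c, and check_parse_positive_c are hypothetical names chosen for the example.

```cpp
#include <cassert>    // assert
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::invalid_argument

// Hypothetical error codes for the C-facing API.
enum parse_status {
    PARSE_OK = 0,
    PARSE_INVALID = 1,
    PARSE_NO_MEMORY = 2,
    PARSE_UNKNOWN = 3
};

namespace detail {
// Internal C++ code is free to throw.
int parse_positive(int raw) {
    if (raw <= 0) throw std::invalid_argument("value must be positive");
    return raw * 2;
}
}  // namespace detail

// Outermost function seen by C callers: no exception may leave it.
extern "C" int parse_positive_c(int raw, int* out) noexcept {
    try {
        const int result = detail::parse_positive(raw);
        if (out != nullptr) *out = result;
        return PARSE_OK;
    } catch (const std::invalid_argument&) {
        return PARSE_INVALID;
    } catch (const std::bad_alloc&) {
        return PARSE_NO_MEMORY;
    } catch (...) {
        return PARSE_UNKNOWN;  // anything else, including non-standard exceptions
    }
}

// Quick self-check of both the success and the failure path.
void check_parse_positive_c() {
    int out = 0;
    assert(parse_positive_c(21, &out) == PARSE_OK && out == 42);
    assert(parse_positive_c(-1, &out) == PARSE_INVALID && out == 42);
}
```

With this shape the C caller never has a C++ exception unwinding through its frames, so the code does not depend on -fexceptions or on any other compiler-specific guarantee.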
The code is not UB because it is not written in standard C++; it is written in the dialect "C++ with gcc/clang extensions". In that dialect the behavior is documented and well defined. In standard C++ the same code would be UB.
So if you take the same code and compile it in pure standard C++ then that code would exhibit UB. But if you compile it in C++ with gcc/clang extensions then the code is well defined.
I often wrap my high-performance C++ classes with a thin C layer that I compile to shared libraries, and then load them in other programming languages, such as Python.
From my reading here and there, I understand that the only requirement for this to work, is to have the function interfaces use only native types or structs of these types. (so, int and longs, float, double, etc and their pointers of any rank).
My question is: Assuming full ABI compatibility between various compilers, is this the only requirement I have to fulfill to have full API compatibility with a shared library?
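For concreteness, the kind of thin C layer described in the question could be sketched as follows. Accumulator, acc_handle, the acc_* functions, and check_accumulator_wrapper are all hypothetical names; in a real project the extern "C" declarations would live in a header that a C compiler (or a foreign-function loader) can read, with acc_handle left as an incomplete type there.

```cpp
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <new>      // std::nothrow

// The C++ class being wrapped (hypothetical).
class Accumulator {
public:
    void add(double x) noexcept { total_ += x; ++count_; }
    double mean() const noexcept {
        return count_ == 0 ? 0.0 : total_ / static_cast<double>(count_);
    }
private:
    double total_ = 0.0;
    std::size_t count_ = 0;
};

// Callers only ever hold a pointer to this type and never see its layout.
struct acc_handle { Accumulator impl; };

extern "C" {
// Only pointers and native types cross the boundary.
acc_handle* acc_create(void) { return new (std::nothrow) acc_handle(); }
void acc_add(acc_handle* h, double x) { if (h != nullptr) h->impl.add(x); }
double acc_mean(const acc_handle* h) { return h != nullptr ? h->impl.mean() : 0.0; }
void acc_destroy(acc_handle* h) { delete h; }
}

// Quick self-check: create, use, and destroy through the C interface only.
void check_accumulator_wrapper() {
    acc_handle* h = acc_create();
    assert(h != nullptr);
    acc_add(h, 1.0);
    acc_add(h, 3.0);
    assert(acc_mean(h) == 2.0);
    acc_destroy(h);
}
```

Because the object is created and destroyed inside the library, the caller never depends on the class layout, the allocator, or the C++ runtime used to build it.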
Why can't C++ libraries be ported? Here's my understanding:
Case 1: Consider the type std::string. Internally it contains a char* null-terminated string, and a size integer. The C++ standard doesn't say which of these should come first (right?). Meaning that if I put std::string on a function interface, two different compilers may have them in different order, which will not work.
Case 2: Consider inheritance and vtables for a class with virtual methods. The C++ standard doesn't require any specific position/order for where vtable pointers have to go (right?). They could be at the beginning of the class before any other variable, and they could also be at the end, after all other member variables. So again, interfacing this class on a function will not be consistent.
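The effect described in Case 2 can be observed, though not pinned down, with a sketch like this one. Plain, Polymorphic, and print_object_sizes are hypothetical names, and the sizes printed are whatever your compiler chooses; the standard does not say where, or even whether, a vtable pointer is stored.

```cpp
#include <cassert>  // assert
#include <cstdio>   // std::printf

// No virtual functions: the object typically holds just its data.
struct Plain {
    int id;
    int get() const { return id; }
};

// Virtual functions: most compilers add a hidden pointer to a vtable,
// but its position inside the object is an ABI decision.
struct Polymorphic {
    int id;
    virtual ~Polymorphic() = default;
    virtual int get() const { return id; }
};

void print_object_sizes() {
    std::printf("sizeof(Plain)       = %zu\n", sizeof(Plain));
    std::printf("sizeof(Polymorphic) = %zu\n", sizeof(Polymorphic));

    // Each type must at least hold its int member; beyond that,
    // the layout is up to the implementation.
    assert(sizeof(Plain) >= sizeof(int));
    assert(sizeof(Polymorphic) >= sizeof(int));
}
```

Any extra bytes in Polymorphic are bookkeeping that two compilers must agree on before they can share such an object.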
An additional question following my first one: Doesn't this very problem happen also inside function calls? Or is it that nothing matters after it's compiled to binary, and types have no meaning anymore? Wouldn't RTTI elements cause problems, for example, if I put them in a C wrapper interface?
The reason why there is no C++ ABI is partly because there is no C ABI. As stated by Bjarne Stroustrup (source):
The technical hardest problem is probably the lack of a C++ binary interface (ABI). There is no C ABI either, but on most (all?) Unix platforms there is a dominant compiler and other compilers have had to conform to its calling conventions and structure layout rules - or become unused. In C++ there are more things that can vary - such as the layout of the virtual function table - and no vendor has created a C++ ABI by fiat by eliminating all competitors that did not conform. In the same way as it used to be impossible to link code from two different PC C compilers together, it is generally impossible to link the code from two different Unix C++ compilers together (unless there are compatibility switches).
The lack of an ABI gives more freedom to compiler implementations, and allows the languages to be spread to multiple different types of systems.
On Windows there are some platform-specific dependencies that rely on the way the compiler lays out its output. One example comes from COM, where pure virtual interfaces are required to be laid out in a specific way. So on Windows most compilers will at least agree on that.
On 32-bit x86, the Windows API uses the stdcall calling convention, so when coding against the Windows API, there is a fixed set of rules for how to pass parameters to a function. But again this is system dependent, and there is nothing preventing you from writing a program that uses a different convention.
Say an OS/kernel is written with C++ in mind and does not "do" any pure C style stuff, but instead exposes the C standard library built upon a full-fledged C++ standard library. Is this possible? If not, why?
PS: I know the C library is "part of C++", but let's say it's internally based on a C++-based implementation.
Small update: It seems I've stirred up a discussion as to what is "allowed" by my rules here. Generally speaking: the C Standard library implementation should use C++ everywhere that is possible/Right (tm). I mostly think about algorithms and acting on static class objects behind the scenes. I'm not really excluding any language features, but instead trying to put the emphasis on a sane C++ implementation. With regard to the setjmp example, I see no reason why valid C (which would either use other C library parts already implemented in C++ or use no other library functions at all) would be a violation of my "rules" here. If there is no counterpart in the C++ library, why debate the use of it?
Yes, that is possible. It would be much like one exports a C API from a library written in C++, FORTRAN, assembler or most any other language for that matter.
Actually, c++ has the ability to be faster than c in many ways, due to its ability to support many translation-time constructs like expression templates. For this reason, c++ matrix libraries tend to be much more optimised than c ones: they involve fewer temporaries, unroll loops, etc. With new c++0x features like variadic templates, the printf function, for instance, could be much faster and more typesafe than a version implemented in c. It may even be able to honor the interfaces of many c constructs and evaluate some of their arguments (like string literals) at translation time.
Unfortunately, many people think c is faster than c++ because many people use OOP to mean that all relations and usage must occur through large inheritance hierarchies, virtual dispatch, etc. That caused some early comparisons to be completely different from what is considered good usage these days. If you were to use virtual dispatch where it is appropriate (e.g. like filesystems in the kernel, where they build vtables through function pointers and often basically build c++ in c), you would have no pessimisation from c, and with all of the new features, can be significantly faster.
Not only is speed a possible improvement, but there are places where the implementation would benefit from better type safety. There are common tricks in c (like storing data in void pointers when it must be generic) that break type safety and where c++ can provide strong error checking. This won't always translate through the interfaces to the c library, since those have fixed typing, but it will definitely be of use to the implementers of the library and could assist in some places where it may be possible to extract more information from calls by providing "as-if" interfaces (for instance, an interface that takes a void* might be implemented as a generic interface with a concept check that the argument is implicitly convertible to void*).
I think this would be a great test of the power of c++ over c.
Given that "pure C stuff" has such a large overlap with C++, I fail to see how you'd avoid it entirely in anything, much less an OS kernel. After all, is the + operation "pure C stuff"? :)
That said, you could certainly implement certain C library functions using classes and whatnot. Implement qsort using std::sort? Sure, no problem. Just don't forget your extern "C".
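As a rough sketch of that idea, here is a qsort-style function built on std::sort. It is named my_qsort (hypothetical) so that it does not clash with the real qsort, and compare_ints and check_my_qsort are likewise made up for the example. Since std::sort cannot iterate directly over elements whose size is only known at run time, this version sorts a vector of element pointers and then copies the elements back in order; that is just one possible approach, and it does not handle allocation failure.

```cpp
#include <algorithm>  // std::sort, std::is_sorted
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcpy
#include <vector>

// qsort-compatible signature, exported with C linkage.
// Sketch only: a std::bad_alloc from the vectors is not handled here.
extern "C" void my_qsort(void* base, std::size_t count, std::size_t size,
                         int (*compar)(const void*, const void*)) {
    if (base == nullptr || compar == nullptr || count < 2 || size == 0) return;
    unsigned char* bytes = static_cast<unsigned char*>(base);

    // Sort pointers to the elements instead of the elements themselves.
    std::vector<const unsigned char*> order(count);
    for (std::size_t i = 0; i < count; ++i) order[i] = bytes + i * size;
    std::sort(order.begin(), order.end(),
              [compar](const unsigned char* a, const unsigned char* b) {
                  return compar(a, b) < 0;
              });

    // Gather the elements in sorted order, then copy them back in place.
    std::vector<unsigned char> sorted(count * size);
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(sorted.data() + i * size, order[i], size);
    std::memcpy(bytes, sorted.data(), sorted.size());
}

// Comparator in the usual qsort style.
extern "C" int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Quick self-check.
void check_my_qsort() {
    int values[] = {5, 1, 4, 2, 3};
    my_qsort(values, 5, sizeof(int), compare_ints);
    assert(std::is_sorted(values, values + 5));
}
```

From the caller's side this is indistinguishable from a C implementation: the signature uses only C types, and the C++ machinery stays behind the extern "C" boundary.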
I see no reason why you couldn't do it, but I also see no reason why someone would use such an implementation. It's going to use a lot more memory, and be at least somewhat slower, than a normal implementation...although it might not be much worse than glibc, whose implementation of stdio is already essentially C++ anyway... (Look up GNU libio... you'll be horrified.)
Kernels like Linux have a very strict ABI, based on syscalls, ioctls, filesystems, and conformance to quite a few standards (POSIX being the major one). Since the ABI has to be stable, its surface is also limited. It would be a lot of work (particularly since you need a minimally useful kernel as well), but these standards could be implemented in any language.
Edit: You mentioned the libc as well. That is not part of the kernel, and the language of the libc can be entirely unrelated to that of the kernel, thanks to the aforementioned ABI. Unlike the kernel, the libc needs to be C or have a very good FFI for C. C++ with parts in extern "C" would fit the bill.
The title says everything. I am talking about C/C++ specifically, because both consider this as "implementation issue". I think, defining a standard interface can ease building a module system on top of it, and many other good things.
What could C/C++ "lose" if they defined a standard ABI?
The freedom to implement things in the most natural way on each processor.
I imagine that C in particular has conforming implementations on more different architectures than any other language. Abiding by an ABI optimized for the currently common, high-end, general-purpose CPUs would require unnatural contortions on some of the odder machines out there.
Backwards compatibility on every platform except for the one whose ABI was chosen.
Basically, everyone missed that one of the C++14 proposals actually DID define a standard ABI. It was a standard ABI specifically for libraries that used a subset of C++. You define specific sections of "ABI" code (like a namespace) and it's required to conform to the subset.
Not only that, it was written by THE Herb Sutter, C++ expert and author of the "Exceptional C++" book series.
The proposal goes into many reasons why a portable ABI is difficult, as well as novel solutions.
https://isocpp.org/blog/2014/05/n4028
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4028.pdf
Note that he defines a "target platform" to be a combination of CPU architecture (x64, x86, ARM, etc), OS, and bitness (32/64).
So the goal here is actually to have C++ code (Visual Studio) be able to talk to other C++ code (GCC, older Visual Studio, etc.) on the same platform. The goal is not a universal ABI that lets cellphone libraries run on your Windows machine.
This proposal was NOT ratified in C++14, however, it was moved into the "Evolution" phase of C++17 for further discussion/iteration.
https://www.ibm.com/developerworks/community/blogs/5894415f-be62-4bc0-81c5-3956e82276f3/entry/c_14_is_ratified_the_view_from_the_june_2014_c_standard_meeting?lang=en
So as of January 2017, my fingers remain crossed.
Rather than defining a generic ABI for all platforms (which would be disastrous, as it would be optimal for only one platform), the standards committee could say that each platform must conform to a specific ABI.
But who defines it? The first compiler through the door? In that case they get an excessive competitive advantage. Or a committee, after 5 years of compilers (which would be another horrible idea)?
It also does not give compilers leeway to research new optimization strategies; you would be stuck with the tricks available at the point where the standard was defined.
The C (or C++) language specifications define the source language. They don't care about the processor running it (A C program could even be interpreted by a human slave, but that would be unethical and not cost-effective).
The ABI is by definition something about the target system. It is related to the processor and the system (and the existing libraries following the ABI).
In the past, it did happen that some processors had proprietary (i.e. undisclosed) specification (even their machine instruction set was not public), and they had a non-public ABI which was followed by a compiler (respecting more or less the language standard).
Defining a programming language doesn't require the same skill set as defining an ABI.
You could even define a newer ABI for an existing processor, but that requires a lot of work (patching the compiler, recompiling everything, including the C and C++ standard libraries and all the utilities and libraries that you need), so it is generally useless.
Execution speed would suffer drastically on a majority of platforms. So much so that it would likely no longer be reasonable to use the C language for a number of embedded platforms. The standards body could be liable for an antitrust suit brought by the makers of the various chips not compatible with the ABI.
Well, there wouldn't be one standard ABI, but about 1000. You would need one for every combination of OS and processor architecture.
Initially, nothing would be lost. But eventually, somebody would find some horrible bug and they would either fix it, breaking the ABI, or leave it, causing problems.
I think that the situation right now is fine. Any OS is free to define an ABI for itself (and they do), which makes sense. It should be the job of the OS to define its ABI, not the C/C++ standard.
C always had a standard ABI, and it is even the one that most other standard ABIs build on (I mean, the C ABI is the ABI of choice when different languages or systems have to bind to each other). The C ABI is a kind of common denominator of other ABIs. C++ is more complex, although it extends and is thus based on C, and indeed a standard ABI for C++ is more challenging and may restrict the freedom a C++ compiler has in generating target machine code. However, it seems to actually have a standard ABI; see the Itanium C++ ABI.
So the question may not be so much “what could they lose?” but rather “what do they lose?” (if they really lose anything at all).
Side note: keep in mind that ABIs are always architecture- and OS-dependent. So if “Standard ABI” means “standard across architectures and platforms”, then there may never have been, and may never be, such a thing, apart from communication protocols.