What could C/C++ "lose" if they defined a standard ABI? - c++

The title says everything. I am talking about C/C++ specifically, because both consider this an "implementation issue". I think defining a standard interface could ease building a module system on top of it, and enable many other good things.
What could C/C++ "lose" if they defined a standard ABI?

The freedom to implement things in the most natural way on each processor.
I imagine that C in particular has conforming implementations on more different architectures than any other language. Abiding by an ABI optimized for the currently common, high-end, general-purpose CPUs would require unnatural contortions on some of the odder machines out there.

Backwards compatibility on every platform except for the one whose ABI was chosen.

Basically, everyone missed that one of the C++14 proposals actually DID define a standard ABI. It was a standard ABI specifically for libraries that used a subset of C++. You define specific sections of "ABI" code (like a namespace) and it's required to conform to the subset.
Not only that, it was written by THE Herb Sutter, C++ expert and author of the "Exceptional C++" book series.
The proposal goes into many reasons why a portable ABI is difficult, as well as novel solutions.
https://isocpp.org/blog/2014/05/n4028
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4028.pdf
Note that he defines a "target platform" to be a combination of CPU architecture (x64, x86, ARM, etc), OS, and bitness (32/64).
So the goal here is actually to have C++ code (Visual Studio) be able to talk to other C++ code (GCC, older Visual Studio, etc.) on the same platform. The goal is not a universal ABI that lets cellphone libraries run on your Windows machine.
This proposal was NOT ratified in C++14; however, it was moved into the "Evolution" phase of C++17 for further discussion/iteration.
https://www.ibm.com/developerworks/community/blogs/5894415f-be62-4bc0-81c5-3956e82276f3/entry/c_14_is_ratified_the_view_from_the_june_2014_c_standard_meeting?lang=en
So as of January 2017, my fingers remain crossed.

Rather than a generic ABI for all platforms (which would be disastrous, as it would be optimal for only one platform), the standards committee could say that each platform must conform to a specific ABI.
But who defines it? The first compiler through the door? In that case they get an excessive competitive advantage. Or a committee, after 5 years of compilers (which would be another horrible idea)?
Also, it does not give the compiler leeway to do further research into new optimization strategies; you would be stuck with the tricks available at the point where the standard was defined.

The C (or C++) language specifications define the source language. They don't care about the processor running it (A C program could even be interpreted by a human slave, but that would be unethical and not cost-effective).
The ABI is by definition something about the target system. It is related to the processor and the system (and the existing libraries following the ABI).
In the past, it did happen that some processors had proprietary (i.e. undisclosed) specification (even their machine instruction set was not public), and they had a non-public ABI which was followed by a compiler (respecting more or less the language standard).
Defining a programming language doesn't require the same skill set as defining an ABI.
You could even define a newer ABI for an existing processor, but that requires a lot of work (patching the compiler, recompiling everything, including the C and C++ standard libraries and all the utilities and libraries that you need), so it is generally useless.

Execution speed would suffer drastically on a majority of platforms. So much so that it would likely no longer be reasonable to use the C language for a number of embedded platforms. The standards body could be liable for an antitrust suit brought by the makers of the various chips not compatible with the ABI.

Well, there wouldn't be one standard ABI, but about 1000. You would need one for every combination of OS and processor architecture.
Initially, nothing would be lost. But eventually, somebody would find some horrible bug and they would either fix it, breaking the ABI, or leave it, causing problems.
I think that the situation right now is fine. Any OS is free to define an ABI for itself (and they do), which makes sense. It should be the job of the OS to define its ABI, not the C/C++ standard.

C has always had a standard ABI, and it is the one that most other standard ABIs build on (I mean, the C ABI is the ABI of choice when different languages or systems have to bind to each other). The C ABI is a kind of common denominator of other ABIs. C++ is more complex, although it extends and is thus based on C, and indeed a standard ABI for C++ is more challenging and may restrict the freedom a C++ compiler has in how it generates the target machine code. However, it actually seems to have a standard ABI; see the Itanium C++ ABI.
So the question may be not so much “what could they lose?” but rather “what do they lose?” (if they really lose anything at all).
Side note: keep in mind that ABIs are always architecture- and OS-dependent. So if what was meant by “standard ABI” is “standard across architectures and platforms”, then there may never have been, and may never be, such a thing, only communication protocols.

Related

Is it safe to package C++11 software on current Linux distributions?

As a downstream maintainer in a Linux distribution, some of the packages that I usually maintain are starting to use the C++11 features in their code base. All of them depend on different libraries packaged by the Linux distributions.
Problems with the ABI could appear when mixing C++11 code with C++98 and AFAIK, most of the current major Linux Distributions are not enabling the C++11 flag by default when compiling software to generate packages.
The question is: How are the major Linux distributions handling the entry of C++11 code? Is there a decent way of checking or avoiding these problems with the ABI when using system libraries?
Thanks.
The issue has nothing to do with C++11 vs C++98 except that C++11 can motivate binary changes. There is nothing special about binary changes motivated by C++11. They are just as breaking or non-breaking as regular binary changes. Furthermore, they are only changed if the library maintainer specifically chooses to change his binary interface.
In other words, this has nothing to do with the Standard version and everything to do with the library, unless the library explicitly chooses to offer two different binary interfaces to different Standard versions (which is still a library choice). Excepting this case, you are just as broken in C++98 as you are in C++11. Itanium is backwards compatible between the C++11-supporting versions and the C++98-supporting versions, so the compiler ABIs are not broken.
From memory, unless you're using GCC 4.7.0, which they broke for fun and then unbroke, you're pretty much safe with libstdc++; they are storing up ABI breakage for a future release when they can make one big break.
In other words, whilst the transition period to C++11 can introduce additional motivation to break ABI and therefore additional risk, actually using C++11 itself does not introduce any additional risk.

Objective-C stable ABI

I'm mainly a C++ guy. As C++ lacks an official ABI I always use a COM-like approach for component designs that support more than one compiler.
Recently I came across the question of whether Objective-C could be a replacement for the COM-like approach. Obviously, for Objective-C to be a replacement, one would need a stable ABI; therefore I'd like to know if a stable ABI for Objective-C exists (on all major OSes [OSX, GNU/Linux, Windows]) and how easy it would be to use Objective-C(++) as "glue" between components created by different compilers.
EDIT:
As Nikolai Ruhe pointed out, a short description of COM may be helpful. COM is essentially a "binary standard" that allows mixing binaries from different compilers (and in a variety of languages). The vehicle of COM is the interface, which defines methods (which map to C++'s virtual functions). Components implement at least one interface and are distributed as DLLs. They can be located anywhere on the system (the location is specified in the Registry) and can be loaded by any COM client via the ID of the interface they implement.
I can only speak for Apple's implementation, as I have no experience with the GNU or other ports.
Objective-C relies on C's ABI for the most part (like function calls and memory layout of structs).
Its own ABI underwent a couple of changes in Apple's implementation, like non-fragile instance variables introduced with the "Modern Runtime", the introduction of properties, faster exception handling, garbage collection, and __weak support for ARC.
Some of the changes were backwards compatible, some not. But since the whole system and frameworks are provided by Apple and the changes were usually introduced with other non-compatible changes (switch to Intel, and LP64) this was without consequences to users.
Edit: One thing you should have in mind is that Objective-C does not only rely on a fixed ABI but also on a compatible runtime. That's one more headache to care about for your purpose.

C++ ABI issues list

I've seen a lot of discussion about how C++ doesn't have a Standard ABI quite in the same way that C does. I'm curious as to what, exactly, the issues are. So far, I've come up with
Name mangling
Exception handling
RTTI
Are there any other ABI issues pertaining to C++?
Off the top of my head:
C++ Specific:
  Where the 'this' parameter can be found.
  How virtual functions are called
    ie does it use a vtable or other
    What is the layout of the structures used for implementing this.
  How are multiple definitions handled
    Multiple template instantiations
    Inline functions that were not inlined.
  Static Storage Duration Objects
    How to handle creation (in the global scope)
    How to handle creation of function local (how do you add it to the destructor list)
    How to handle destruction (destroy in reverse order of creation)
  You mention exceptions. But also how exceptions are handled outside main()
    ie before or after main()
Generic.
  Parameter passing locations
  Return value location
  Member alignment
  Padding
  Register usage (which registers are preserved which are scratch)
  size of primitive types (such as int)
  format of primitive types (Floating point format)
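To make the "member alignment" and "padding" items concrete, here is a minimal sketch. The struct Sample and the two helper functions are hypothetical names made up for this illustration; the actual numbers depend on the platform ABI, which is exactly the point.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int32_t

// Hypothetical record, used only for illustration.
struct Sample {
    char tag;            // 1 byte
    std::int32_t value;  // usually 4-byte aligned, so padding usually follows `tag`
    char flag;           // trailing padding usually follows this member
};

// Bytes the compiler added beyond the members themselves (ABI-dependent).
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(std::int32_t) + sizeof(char));
}

// True when `value` starts at an offset that satisfies its alignment requirement.
constexpr bool value_is_aligned() {
    return offsetof(Sample, value) % alignof(std::int32_t) == 0;
}
```

Two modules only agree on where value lives if both compilers apply the same alignment and padding rules, so a header shared between them is not enough on its own.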
The big problem, in my experience, is the C++ standard library. Even if you had an ABI that dictates how a class should be laid out, different compilers provide different implementations of standard objects like std::string and std::vector.
I'm not saying that it would not be possible to standardize the internal layout of C++ library objects, only that it has not been done before.
The closest thing we have to a standard C++ ABI is the Itanium C++ ABI:
"this document is written as a generic specification, to be usable by C++ implementations on a variety of architectures. However, it does contain processor-specific material for the Itanium 64-bit ABI, identified as such."
The GCC doc explains support of this ABI for C++:
Starting with GCC 3.2, GCC binary conventions for C++ are based
on a written, vendor-neutral C++ ABI that was designed to be specific
to 64-bit Itanium but also includes generic specifications that apply
to any platform. This C++ ABI is also implemented by other compiler
vendors on some platforms, notably GNU/Linux and BSD systems
As was pointed out by @Lindydancer, you need to use the same C++ standard library/runtime as well.
An ABI standard for any language really needs to come from a given platform that wants to support such a thing. Language standards, especially those for C and C++, really cannot do this for many reasons, but mostly because such a thing would make the language less flexible and less portable and therefore less used. C really doesn't have a defined ABI, but many platforms define one (directly or indirectly). The reason this isn't happening with C++ is that the language is much bigger and changes are made more often. However, Herb Sutter has a very interesting proposal about how to get more platforms to create standard ABIs and how developers can write code that uses the ABI in a standard way:
https://isocpp.org/blog/2014/05/n4028
He points out how C++ has a standard way to link into a platform's C ABI, via extern "C", but no equivalent way to link into a C++ ABI. I think this proposal could go a long way to allowing interfaces to be defined in terms of C++ instead of C.
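For readers who have not used it, a small sketch of what extern "C" gives you today. The function names (abi_demo_add, add, apply) are hypothetical and exist only for this example.

```cpp
#include <cassert>

// C linkage: the symbol follows the platform C ABI and its name is not
// mangled, so any language that can call C can call this function.
extern "C" int abi_demo_add(int a, int b) {
    return a + b;
}

// C++ linkage: overloading works, but the symbol names that encode the
// parameter types are chosen by the compiler's own C++ ABI.
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }

// A function pointer type with C linkage, as a C header would declare it.
extern "C" typedef int (*binary_op)(int, int);

int apply(binary_op op, int a, int b) { return op(a, b); }
```

Note that overloading is not available for the extern "C" function: there is only one unmangled name per function, which is part of what makes the C ABI simple enough to standardize per platform.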
I've seen a lot of discussion about how C++ doesn't have a Standard ABI quite in the same way that C does.
What standard C ABI? Appendix J in the C99 standard is 27 pages long. In addition to undefined behavior (and some implementations give some UB a well-defined behavior), it covers unspecified behavior, implementation-defined behavior, locale-specific behavior, and common extensions.
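A few of those implementation-defined choices can be observed directly. This is only a sketch; PlatformTraits and describe_platform are hypothetical names, and the values reported will differ between platforms.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT, CHAR_MIN
#include <cstddef>   // std::size_t

// A snapshot of choices that the C and C++ standards leave to the implementation.
struct PlatformTraits {
    bool char_is_signed;       // plain char may be signed or unsigned
    int bits_per_byte;         // CHAR_BIT is at least 8, not necessarily exactly 8
    std::size_t int_size;      // sizeof(int) in bytes
    std::size_t long_size;     // 4 on 64-bit Windows, 8 on most 64-bit Unix systems
    std::size_t pointer_size;  // sizeof(void*)
};

PlatformTraits describe_platform() {
    PlatformTraits t;
    t.char_is_signed = (CHAR_MIN < 0);
    t.bits_per_byte = CHAR_BIT;
    t.int_size = sizeof(int);
    t.long_size = sizeof(long);
    t.pointer_size = sizeof(void*);
    return t;
}
```

It is the platform ABI documents, not the language standard, that pin these values down for a given OS and processor.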

GCC vs MS C++ compiler for maintaining API backwards binary compatibility

I came from the Linux world and know a lot of articles about maintaining backwards binary compatibility (BC) of a dynamic library API written in C++ language. One of them is "Policies/Binary Compatibility Issues With C++" based on the Itanium C++ ABI, which is used by the GCC compiler. But I can't find anything similar for the Microsoft C++ compiler (from MSVC).
I understand that most of the techniques are applicable to the MS C++ compiler and I would like to discover compiler-specific issues related to ABI differences (v-table layout, mangling, etc.)
So, my questions are the following:
Do you know any differences between MS C++ and GCC compilers when maintaining BC?
Where can I find information about MS C++ ABI or about maintaining BC of API in Windows?
Any related information will be highly appreciated.
Thanks a lot for your help!
First of all, these policies are general and do not refer to GCC only. For example, the private/public marking of functions is something specific to MSVC and not GCC.
So basically these rules are fully applicable to MSVC and to compilers in general as well.
But...
You should remember:
GCC's C++ ABI has been stable since the 3.4 release, which is about 7 years (since 2004), while MSVC breaks its ABI every major release: MSVC8 (2005), MSVC9 (2008), MSVC10 (2010) are not compatible with each other.
Some frequently used MSVC flags can break the ABI as well (like the exception model).
MSVC has incompatible run-times for Debug and Release modes.
So yes, you can use these rules, but as usual with MSVC, it has many more quirks.
See also "Some thoughts on binary compatibility"; Qt keeps its ABI stable with MSVC as well.
Note I have some experience with this as I follow these rules in CppCMS
On Windows, you basically have 2 options for long term binary compatibility:
COM
mimicking COM
Check out my post here. There you'll see a way to create DLLs and access DLLs in a binary compatible way across different compilers and compiler versions.
C++ DLL plugin interface
The best rule for MSVC binary compatibility is to use a C interface. The only C++ feature you can get away with, in my experience, is single-inheritance interfaces. So represent everything as interfaces which use C datatypes.
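A minimal sketch of such an interface, under the assumption that every compiler involved lays out a single-inheritance vtable the same way. ICounter, Counter and create_counter are hypothetical names, not part of any real API.

```cpp
#include <cassert>
#include <cstdint>
#include <new>   // std::nothrow

// Interface seen by clients: pure virtual functions only, single inheritance,
// C data types in every signature.
struct ICounter {
    virtual std::int32_t increment(std::int32_t by) = 0;
    virtual std::int32_t value() const = 0;
    // The object frees itself, so allocation and deallocation stay in one module.
    virtual void release() = 0;
protected:
    ~ICounter() = default;  // clients must call release(), never delete
};

namespace {
// Implementation, private to the module that exports the factory.
class Counter final : public ICounter {
public:
    std::int32_t increment(std::int32_t by) override { total_ += by; return total_; }
    std::int32_t value() const override { return total_; }
    void release() override { delete this; }
private:
    std::int32_t total_ = 0;
};
}  // namespace

// Factory with C linkage: the only symbol a client has to look up by name.
// Returns nullptr if the allocation fails, so no exception crosses the boundary.
extern "C" ICounter* create_counter() {
    return new (std::nothrow) Counter();
}
```

The protected destructor is a design choice here: it makes it a compile error for a client to delete the object with its own runtime's operator delete.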
Here's a list of things which are not binary compatible:
The STL. The binary format changes even just between debug/release, and depending on compiler flags, so you're best off not using STL cross-module.
Heaps. Do not new / malloc in one module and delete / free in another. There are different heaps which do not know about each other. Another reason the STL won't work cross-modules.
Exceptions. Don't let exceptions propagate from one module to another.
RTTI/dynamic_casting datatypes from other modules.
Don't trust any other C++ features.
In short, C++ has no consistent ABI, but C does, so avoid C++ features crossing modules. Because single inheritance is a simple v-table, you can usefully use it to expose C++ objects, providing they use C datatypes and don't make cross-heap allocations. This is the approach used by Microsoft themselves as well, e.g. for the Direct3D API. GCC may be useful in providing a stable ABI, but the standard does not require this, and MSVC takes advantage of this flexibility.
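The heap and exception rules above can be sketched like this. All names (demo_status, make_greeting, demo_greet, demo_free) are hypothetical; the idea is only that the module hands out plain C data, frees what it allocated itself, and turns exceptions into status codes before they reach the boundary.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status codes returned instead of exceptions.
enum demo_status { DEMO_OK = 0, DEMO_BAD_ARGUMENT = 1, DEMO_OUT_OF_MEMORY = 2, DEMO_FAILED = 3 };

// Internal C++ code may use std::string and throw freely.
static std::string make_greeting(const char* name) {
    if (name == nullptr || *name == '\0') throw std::invalid_argument("empty name");
    return std::string("hello, ") + name;
}

// Boundary function: C types only, and no exception is allowed to escape.
// On success *out points to memory owned by this module.
extern "C" int demo_greet(const char* name, char** out) {
    if (out == nullptr) return DEMO_BAD_ARGUMENT;
    *out = nullptr;
    try {
        const std::string text = make_greeting(name);
        char* buffer = new char[text.size() + 1];
        std::memcpy(buffer, text.c_str(), text.size() + 1);  // includes the terminator
        *out = buffer;
        return DEMO_OK;
    } catch (const std::invalid_argument&) {
        return DEMO_BAD_ARGUMENT;
    } catch (const std::bad_alloc&) {
        return DEMO_OUT_OF_MEMORY;
    } catch (...) {
        return DEMO_FAILED;
    }
}

// The caller hands the buffer back, so it is freed by the same module
// (and therefore the same heap) that allocated it.
extern "C" void demo_free(char* text) {
    delete[] text;
}
```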

Developing embedded software library, C or C++?

I'm in the process of developing a software library to be used for embedded systems like an ARM chip or a TI DSP (mostly for embedded systems, but it would be nice if it could also be used in a PC environment). Obviously this is a pretty broad range of target systems, so being able to easily port to different systems is a priority. The library will be used for interfacing with specific hardware and running some algorithms.
I am thinking C++ is the best option, over C, because it is much easier to maintain and read. I think the additional overhead is worth it for being able to work in the object oriented paradigm. If I was writing for a very specific system, I would work in C but this is not the case.
I'm assuming that these days most compilers for popular embedded systems can handle C++. Is this correct?
Are there any other factors I should consider? Is my line of thinking correct?
If portability is very important for you, especially on an embedded system, then C is certainly a better option than C++. While C++ compilers on embedded platforms are catching up, there's simply no match for the widespread use of C, for which any self-respecting platform has a compliant compiler.
Moreover, I don't think C is inferior to C++ where it comes to interfacing hardware. The amount of abstraction is sufficiently low (i.e. no deep class hierarchies) to make C just as good an option.
There is certainly good support of C++ for ARM. ARM have their own compiler and g++ can also generate EABI compliant ARM code. When it comes to the DSPs, you will have to look at their toolchain to decide what you are going to do. Be aware that the library that comes with a DSP may well not implement the full C or C++ standard library.
C++ is suitable for low-level embedded development and is used in the SymbianOS Kernel. Having said that, you should keep things as simple as possible.
Avoid exceptions which may demand more library support than what is present (therefore use new (std::nothrow) Foo instead of new Foo).
Avoid memory allocations as much as possible and do them as early as possible.
Avoid complex patterns.
Be aware that templates can bloat your code.
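As a sketch of the first point, assuming a hypothetical Message type: new (std::nothrow) reports failure through a null pointer, so the code works without exception support as long as you check the result.

```cpp
#include <cassert>
#include <cstddef>
#include <new>   // std::nothrow

// Hypothetical fixed-size message buffer.
struct Message {
    unsigned char payload[64];
    std::size_t length;
};

// Returns nullptr instead of throwing std::bad_alloc when memory runs out.
Message* allocate_message() {
    Message* m = new (std::nothrow) Message();  // () zero-initializes the members
    return m;                                   // may be nullptr: the caller must check
}

void free_message(Message* m) {
    delete m;  // deleting a null pointer is a no-op
}
```

In line with the second point, a library like this would ideally call allocate_message() a fixed number of times during start-up rather than in the middle of time-critical work.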
I have seen many complaints that C++ is "bloated" and inappropriate for embedded systems.
However, in an interview with Stroustrup and Sutter, Bjarne Stroustrup mentioned that he'd seen heavily templated C++ code going into (IIRC) the braking systems of BMWs, as well as in missile guidance systems for fighter aircraft.
What I take away from this is that experts of the language can generate sophisticated, efficient code in C++ that is most certainly suitable for embedded systems. However, a "C With Classes"[1] programmer that does not know the language inside out will generate bloated code that is inappropriate.
The question boils down to, as always: in which language can your team deliver the best product?
[1] I know that sounds somewhat derogatory, but let me say that I know an awful lot of these guys, and they churn out an awful lot of relatively simple code that gets the job done.
C++ compilers for embedded platforms are much closer to 83's C with classes than 98's C++ standard, let alone C++0x. For instance, some platforms we use still compile with a special version of gcc made from gcc-2.95!
This means that your library interface will not be able to provide interfaces with containers/iterators, streams, or such advanced C++ features. You'll have to stick with simple C++ classes, that can very easily be expressed as a C interface with a pointer to a structure as first parameter.
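To illustrate how directly such a class maps to a C interface, here is a sketch with a hypothetical Filter class and a matching set of filter_* functions taking the object pointer as first parameter. None of these names come from a real library.

```cpp
#include <cassert>
#include <new>   // std::nothrow

// A simple C++ class of the kind that even old embedded compilers handle.
class Filter {
public:
    explicit Filter(int gain) : gain_(gain), last_(0) {}
    int process(int sample) { last_ = sample * gain_; return last_; }
    int last() const { return last_; }
private:
    int gain_;
    int last_;
};

// In a header shared with C code, only this forward declaration would appear.
struct filter;

// Definition, private to the C++ implementation file.
struct filter {
    explicit filter(int gain) : impl(gain) {}
    Filter impl;
};

// The same operations as free functions with the handle as first parameter.
extern "C" filter* filter_create(int gain) { return new (std::nothrow) filter(gain); }
extern "C" int filter_process(filter* f, int sample) { return f->impl.process(sample); }
extern "C" int filter_last(const filter* f) { return f->impl.last(); }
extern "C" void filter_destroy(filter* f) { delete f; }
```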
This also means that within your library, you won't be able to use templates to their full power. If you want portability, you will still be restricted to using templates for generic containers, which is, I'm sure you'll admit, only a very tiny part of the power of C++ templates.
C++ has little or no overhead compared to C if used properly in an embedded environment. C++ has many advantages for information hiding, OO, etc. If your embedded processor is supported by gcc in C then chances are it will also be supported with C++.
On the PC, C++ isn't a problem at all -- high quality compilers are extremely widespread and almost every C compiler is directly associated with a C++ compiler that's quite good, though there are a few exceptions such as lcc and the newly revived pcc.
Larger embedded systems like those based on the ARM are generally quite similar to desktop systems in terms of tool chain availability. In fact, many of the same tools available for desktop machines can also generate code to run on ARM-based machines (e.g., lots of them use ports of gcc/g++). There's less variety for TI DSPs (and a greater emphasis on quality of generated code than source code features), but there are still at least a couple of respectable C++ compilers available.
If you want to work with smaller embedded systems, the situation changes in a hurry. If you want to be able to target something like a PIC or an AVR, C++ isn't really much of an option. In theory, you could get (for example) Comeau to produce a custom port that generated code you could compile on that target's C compiler -- but chances are pretty good that even if you did, it wouldn't work out very well. These systems are really just too limited (especially on memory size) for C++ to fit them well.
Depending on what your intended use is for the library, I think I'd suggest implementing it first as C - but the design should keep in mind how it would be incorporated into a C++ design. Then implement C++ classes on top of and/or along side of the C implementation (there's no reason this step cannot be done concurrently with the first). If your C design is done with a C++ design in mind, it's likely to be as clean, readable and maintainable as the C++ design would be. This is somewhat more work, but I think you'll end up with a library that's useful in more situations.
While you'll find C++ used more and more on various embedded projects, there are still many that restrict themselves to C (and I'd guess this is more often the case than not) - regardless of whether or not the tools support C++. It would be a shame to have a nice library of routines that you could bring to a new project you're working on, but be unable to use them because C++ isn't being used on that particular project.
In general, it's much easier to use a well-designed C library from C++ than the other way around. I've taken this approach with several sets of code including parsing Intel Hex files, a simple command parser, manipulating synchronization objects, FSM frameworks, etc. I'm planning on doing a simple XML parser at some point.
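A rough sketch of that layering, using a hypothetical ring buffer rather than any of the libraries mentioned above: the core is written in the C subset (so a C-only project can use it unchanged), and the C++ class on top is a thin convenience layer with no state of its own.

```cpp
#include <cassert>
#include <stddef.h>  // size_t, as a C header would use it

// --- C-style core (hypothetical names) ---
extern "C" {
enum { RING_CAPACITY = 8 };

typedef struct ring {
    int data[RING_CAPACITY];
    size_t head;   // index of the oldest element
    size_t count;  // number of stored elements
} ring;

void ring_init(ring* r) { r->head = 0; r->count = 0; }

// Returns 1 on success, 0 when the buffer is full.
int ring_push(ring* r, int value) {
    if (r->count == RING_CAPACITY) return 0;
    r->data[(r->head + r->count) % RING_CAPACITY] = value;
    ++r->count;
    return 1;
}

// Returns 1 and stores the oldest element in *out, or 0 when the buffer is empty.
int ring_pop(ring* r, int* out) {
    if (r->count == 0) return 0;
    *out = r->data[r->head];
    r->head = (r->head + 1) % RING_CAPACITY;
    --r->count;
    return 1;
}
}  // extern "C"

// --- Thin C++ layer on top of the C core ---
class Ring {
public:
    Ring() { ring_init(&r_); }
    bool push(int value) { return ring_push(&r_, value) != 0; }
    bool pop(int& out) { return ring_pop(&r_, &out) != 0; }
    size_t size() const { return r_.count; }
private:
    ring r_;
};
```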
Here's an entirely different C++-vs-C argument: stable ABIs. If your library exports a C ABI, it can be compiled with any compiler that works on the system, because C ABIs are generally platform standards. If your library exports a C++ ABI, it can only be compiled with a matching compiler -- because C++ ABIs are usually not platform standards, and often differ from compiler to compiler and even version to version.
Interestingly, one of the rare exceptions to this is ARM; there's an ARM C++ ABI specification, and all compliant ARM compilers follow it. This is not true on x86; on x86, you're lucky if a C++ library compiled with a 4.1 version of GCC will link correctly with an application compiled with GCC 4.4, and don't even ask about 3.4.6.
Even if you export a C ABI, you can have problems. If your library uses C++ internally, it will then link to libstdc++ for things in the C++ std:: namespace. If your user compiles a C++ application that uses your library, they'll also link to libstdc++ -- and so the overall application gets linked to libstdc++ twice, and their libstdc++ may not be compatible with your libstdc++, which can (or so I understand) lead to odd errors from the intersection of the two. Considerably less likely, but still possible.
All of these arguments only apply because you're writing a library, and they're not showstoppers. But they are things to be aware of.