Difference Between C and C++ Executables? - c++

I know there are differences in the source code between C and C++ programs - this is not what I'm asking about.
I also know this will vary from CPU to CPU and OS to OS, depending on compiler.
I'm teaching myself C++ and I've seen numerous references to libraries that can be used by both languages. This has started me thinking - are there significant differences between the binary executables of the two languages?
For libraries to be easily used by both, I would think they'd have to be similar on an executable level.
Are there many situations where a person could examine a executable file and tell whether it was created by C or C++ source code? Or would the binaries be pretty similar?

In most cases, yes, it's pretty easy. Here are just a few clues that I've seen often enough to remember them easily:
C++ program will typically end up with at least a few visible symbols that have been mangled.
C++ program will typically have at least a few calls to virtual functions, which are typically quite distinctive from code you'll typically see in C.
Many C++ compilers implement a calling convention for C++ that gives special consideration to passing the this pointer into C++ member functions. Again, since the this pointer simply doesn't exist in C, you'll rarely see a direct analog (though in some cases, they will use the same convention to pass some other pointer, so you need to be careful about this one).

A executable is a executable is a executable, no matter what language it's written in. If it's built for the target architecture, it'll run on the architecture.
The (arguably) most important difference between C and C++-compiled code, and the one relevant to libraries that can be linked both against C and C++ executables, is that of name mangling. Basically: when a library is compiled, it exports a set of symbols (function names, exported variables, etc.) that executables linked against the library can use. How these symbols are named is a fairly compiler/linker-specific, and if the subsequent executable is linked using a linker using an incompatible convention, then symbols won't resolve correctly. In addition, C and C++ have slightly different conventions. The Wikipedia article linked above has more of the details; suffice to say, when declaring exported symbols in a header file, you'll usually see a construction like:
#ifdef __cplusplus
extern "C" {
#endif
/* exported declarations here */
#ifdef __cplusplus
}
#endif
__cplusplus is a preprocessor macro only defined when compiling C++ code. The idea here is that, when using the header in C++, the compiler is instructed to use the C way of naming exported symbols (inside the "extern "C" { /* foo */ }" block, so the library can be linked both in C and C++ correctly.

I think I could tell if something is C++ or C from reading the disassembled binary code [for processor architectures that I'm familiar with, x86, x86_64 and ARM]. But in reality, there isn't much difference, you'd have to look pretty hard to know for sure.
Signs to look for are "indirect calls" (function pointer calls via a table) and this-pointers. Although C can have pointer to struct arguments and will often use function pointers, it's not usually set up in the way that C++ does it. Also, you'll notice, sometimes, that the compiler takes a pointer to a struct and adds a small offset - that's removing the outer layer of an inherited class. This CAN happen in C as well, but it won't be as common/distinctive.
Looking just at the binary [unless you can "do disassembly in your head" would be a lot harder - especially if it's been stripped of symbols - that's like the guy who could tell you what classical music something was on an old Vinyl record from looking at the tracks [with the label hidden] - not something most people can do, even if they are "good".

In practice, a C program (or a C++ program) is rarely only pure standard C (or C++) (for instance the C99 standard has no mean to scan a directory). So programs use additional libraries.
On Linux, most binaries are dynamically linked. Use the ldd command to find out.
If the binary is linked to the stdc++ library, the source code is likely C++.
If only the libc.so library is linked, the source code is probably only C (but you could link statically the libstdc++.a library).
You can also use tools working on binary files (e.g. objdump, readelf, strings, nm on Linux ....) to find more about them.

The code generated by C and C++ compilers is generally the same code. There are two important differences:
Name mangling: Each function and global variable becomes a symbol at compile time. In C these symbol's names are the same as their names in your source code. In C++ they are being mangled a bit to allow for polymorphic code
Calling conventions: If you call a method in C++ the this-pointer is passed as a hidden first parameter. Other conventions might also be different such as call by reference which does not exist in C
You can use an block such as this to let the C++-compiler generate code compatible to C:
extern "C" {
/* code */
}

Related

How do I (programmatically) ensure that a dll contains definitions for all exported functions?

Let's say I have this lib
//// testlib.h
#pragma once
#include <iostream>
void __declspec(dllexport) test();
int __declspec(dllexport) a();
If I omit the definition for a() and test() from my testlib.cpp, the library still compiles, because the interface [1] is still valid. (If I use them from a client app then they won't link, obviously)
Is there a way I can ensure that when the obj is created (which I gather is the compiler's job) it actually looks for the definitions of the functions that I explicitly exported, and fails if doesn't ?
This is not related to any real world issue. Just curious.
[1] MSVC docs
No, it's not possible.
Partially because a dllexport declaration might legally not even be implemented in the same DLL, let alone library, but be a mere forward declaration for something provided by yet another DLL.
Especially it's impossible to decide on the object level. It's just another forward declaration like any other.
You can dump the exported symbols once the DLL has been linked, but there is no common tool for checking completeness.
Ultimately, you can't do without a client test application which attempts to load all exported interfaces. You can't check that on compile time yet. Even just successfully linking the test application isn't enough, you have to actually run it.
It gets even worse if there are delay-loaded DLLs (and yes, there usually are), because now you can't even check for completeness unless you actually call at least one symbol from each involved DLL.
Tools like Dependency Walker etc. exist for this very reason.
You asked
"Is there a way I can ensure that when the obj is created (which I gather is the compiler's job) it actually looks for the definitions of the functions that I explicitly exported, and fails if doesn't ?"
When you load a DLL then the actual function binding is happenning at runtime(late binding of functions) so it is not possible for the compiler to know if the function definition are available in the DLL or not. Hope this answer to your query.
How do I (programmatically) ensure that a dll contains definitions for all exported functions?
In general, in standard C++11 (read the document n3337 and see this C++ reference), you cannot
Because C++11 does not know about DLL or dynamic linking. It might sometimes make sense to dynamically load code which does not define a function which you pragmatically know will never be called (e.g. an incomplete DLL for drawing shapes, but you happen to know that circles would never be drawn in your particular usage, so the loaded DLL might not define any class Circle related code. In standard C++11, every called function should be defined somewhere (in some other translation unit).
Look also into Qt vision of plugins.
Read Levine's Linkers and loaders book. Notice that on Linux, plugins loaded with dlopen(3) have a different semantics than Windows DLLs. The evil is in the details
In practice, you might consider using some recent variant of the GCC compiler and develop your GCC plugin to check that. This could require several weeks of work.
Alternatively, adapt the Clang static analyzer for your needs. Again, budget several weeks of work.
See also this draft report and think about C++ cross compilers (e.g. compiling on Windows a plugin for a RaspBerry Pi)
Consider also runtime code generation frameworks like asmjit or libgccjit. You might think of generating at runtime the missing stubs or functions (and fill appropriately function pointers with them, or even vtables). C++ exceptions could also be an additional issue.
If your DLL contains calls to the functions, the linker will fail if a definition isn't provided for those functions. It doesn't matter if the calls are never executed, only that they exist.
//// testlib.h
#pragma once
#include <iostream>
#ifndef DLLEXPORT
#define DLLEXPORT(TYPE) TYPE __declspec(dllexport)
#endif
DLLEXPORT(void) test();
DLLEXPORT(int) a();
//// testlib_verify.c
#define DLLEXPORT(TYPE)
void DummyFunc()
{
#include testlib.h
}
This macro-based solution only works for functions with no parameters, but it should be easy to extend.

Explicit name mangling (also called name decoration) for all symbols in a library

Is there any way to explicitly do name mangling (also called name decoration) in a library written in c(or cpp).I want all the symbols of my shared library to have their names mangled.
Consider this question:
Two library of different versions in an application
In this if I can explicitly have all their names mangled , I think i can resolve that issue.May be there is some option in gcc compiler itself to do this.
Your question is:
Is there any way to explicitly do name mangling (also called name decoration) in a library written in c(or cpp).I want all the symbols of my shared library to have their names mangled.
However, I suspect that you're using the term name mangling inappropriately. Name mangling has nothing to do with library release version. If you mean to version each object exported in your library, then there are plenty of questions to answer that. Personally, I would use a versioned namespace -- but only because I haven't (yet) been bitten by it. Here's a quick example:
namespace mylibrary {
namespace v1 {
class foo {};
}
using foo = v1::foo;
}
mylibrary::foo f; // mylibrary::v1::foo
...then on a later release...
namespace mylibrary {
namespace v1 {
class foo {};
}
namespace v2 {
class foo;
}
using foo = v2::foo;
}
mylibrary::foo newer_f; // mylibrary::v2::foo
mylibrary::v1::foo older_f;
There are of course many permutations you could have. And there are a lot of caveats, especially if you have templated code or make use of ADL. If you release version 1 of the library with one definition of class foo but then version 2 has a different definition, then the two libraries will not be compatible! That's rather the whole point though.
If however I am incorrect and you truly do want to enforce C++ name mangling in your C++ library (which is odd, because it should be done by default), then the answer is twofold. First, take a look at some related questions:
Why would you use 'extern "C++"'?
"Undefined reference to" error while linking object files
"Undefined reference to" error when linking static C library with C++ code
The reading is related but not causal. The related questions are answering your question in reverse.
Many operating systems are written in C and that is typically why you'd see extern "C" when including system headers. It's also why you sometimes see the linker complaining about missing functions when you try to use things declared in a header whose library was compiled with C instead of C++.
So to go in the other direction (in your direction): in your header file, you can declare your exports to be extern "C++". That tells the compiler to specifically use mangled names when importing or exporting the object.
Using extern "C++" won't by itself be your magic trick. There are some GCC options which control some of the more specific functionality about name mangling. So, secondly, take a look at those. The (external link) to the GCC manual page is here: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html
Any option which mentions ABI, such as -fabi, might impact you. "-fabi" flags relate to "Application Binary Interface". You might want to learn more about these terms too. What is an application binary interface has some excellent answers describing what an ABI is and how you can start to reason about them. "-Wabi" will tell GCC to emit warnings when it detects potential ABI conflicts. But, like all things C++, it's not foolproof. I would not be surprised if there are name mangling issues which might not be detected by it. That's particularly true if you ever mix heterogenous compiler vendors or versions.
Importantly: mixing ABIs is likely going to be a big headache. I'd be very concerned about ABI incompatibilities being forced together and causing very difficult-to-debug undefined behaviors!

C migrating to C++ (embedded)

I have a project in pure C - st usb library and I need to migrate it to c++ and change same structures into classes. I removed all c++ "protections" like:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
I changed all files extensions from .c to .cpp (except HAL library).
I realized that c++ .hex is 7kB smaller then c .hex. When I looked into .map file I saw that many functions are missing. I thought that staticfunctions caused that, but removing static key word didn't help. Does anyone have idea what could cause that some functions weren't compiled. When extensions are .c everything is fine.
I can think of two main reasons:
Inlining. The compiler can decide there is no need to emit the function as a standalone function if all usages can be inlined.
Unused code. The compiler can see that a function isn't used anywhere in your code and decide to eliminate it from the final result.
If the result is to be used as sort of library, that your environment calls specific functions without having an explicit call in your own code, I think the best method is to compile and link your code as a library (probably dynamic lib) and exporting those functions as library interface (visibility in gcc, dllexport in MSVC) will force the compiler/linker to include them in the produced binary even if they don't see why they are needed right now.
(Of course, this is a wild guess about your target environment.)
Another option is to turn off the specific compiler/linker optimizations for removing dead code and forcing emitting standalone function instance for inlined functions, but I think it's very indirect approach, have a wider effect and can complicate maintenance later.
C++ functions are given different signatures than C functions. Since you lost functionality and the code is much smaller, it's likely that a function that requires C linkage is being compiled as C++ and the signature mismatch is preventing proper linking.
A likely place for this to happen is in the interrupt vector table. If the handler function is compiled with C++ linkage, the handler address won't make it into a table that's compiled with C.
Double check the interrupt vectors and make sure that they reference the correct functions. If they are correct, check any other code compiled with C that might reference an external symbol compiled with C++.

How to link old C code with reserved keywords in it with C++?

I have a 10+ years old C library which -- I believe -- used to work just fine in the good old days, but when I tried to use it with a C++ source (containing the main function) the other day I ran into some difficulties.
Edit: to clarify, the C library compiles just fine with gcc, and it generates an object file old_c_library.o. This library was supposed to be used in a way so that the C header file old_c_library.h is #included in your main.c C source file. Then your main C source file should be compiled and linked together with old_c_library.o via gcc. Here I want to use a C++ source file main.cpp instead, and compile/link it with g++.
The following three problems occurred, during the compilation of the C++ source file:
one of the header files of the C library contains the C++ reserved word new (it is the name of an integer), which resulted in fatal error; and
one of the header files of the C library contains a calloc call (an explicit typecast is missing), which resulted in fatal error; and
various files of the C library contain code where comparison of signed and unsigned integers happen, which resulted in warnings.
Edit: I tried to use the #extern "C" { #include "obsolete_c_library.h" } "trick", as suggested in the comments, but that did not solve any of my problems.
I can sort out problem 1 by renaming all instances of the reserved words and replacing them by -- basically -- anything else. I can sort out problem 2 by typecasting the calloc call. I might try to sort out the warnings by ideas suggested here: How to disable GCC warnings for a few lines of code.
But I still wonder, is there a way to overcome these difficulties in an elegant, high-level way, without actually touching the original library?
Relevant:
Where is C not a subset of C++? and Do I cast the result of malloc? and How do I use extern to share variables between source files?.
Generally speaking, it is not safe to #include C header files into C++ sources if those headers were not built in anticipation of such usage. Under some circumstances it can be made to work, but you need to be prepared to either modify the headers or write your own declarations for the functions and global variables you want to access.
At minimum, if the C headers declare any functions and you are not recompiling those functions in C++ then you must ensure that the declarations are assigned C linkage in your C++ code. It's not uncommon for C headers to account for that automatically via conditional compilation directives, but if they do not then you can do it on the other side by wrapping the inclusion(s) in a C linkage block:
extern "C" {
#include "myclib.h"
}
If the C headers declare globals whose names conflict with C++ keywords, and which you do not need to reference, then you may be able to use the preprocessor to redefine them:
#define new extern_new
#include "myclib.h"
#undef new
That's not guaranteed to work, but it's worth a try. Do not forget to #undef such macros after including the C headers, as shown.
There may be other fun tricks you can play with macros to adapt specific headers to C++, but at some point it makes more sense to just copy / rewrite the needed declarations (and only those), either in your main C++ source or in your own C++ header. Note that doing so does not remove the need to declare C linkage -- that requirement comes from the library having been compiled by a C compiler rather than a C++ compiler.
You can write a duplicate of the c header with the only difference being lack of declaration of extern int new. Then use the newly created c++ friendly header instead of the old, incompatible one.
If you need to refer that variable from c++, then you need to do something more complex. You will need to write a new wrapper library in c, that exposes read and write functions a pointer to c++ that can be used to access the unfortunately named variable.
If some inline functions refer to the variable, then you can leave those out from the duplicate header and if you need to call them from c++, re-implement them in a c++ friendly manner. Just don't give the re-implementation same name within extern "C", because that would give you a conflict with the original inline function possibly used by some existing c code.
Do the same header duplication as described in 1. leaving out the problematic inline function and re-implementing if needed.
Warnings can be ignored or disabled as you already know. No need to modify the original headers. You could rewrite any {const,type}-unsafe inline functions with versions that don't generate warnings, but you have to consider whether your time is worth it.
The disadvantage of this approach is that you now have two vesions of some/all of the headers. New ones used by c++, and the old ones that may be used by some old code.
If you can give up the requirement of not touching the original library and don't need to work with an existing compiled binary, then an elegant solution would be to simply rename the problematic variables (1.). Make non-c++-compatible inline functions (2.) non-inline and move them into c source files. Fix unsafe code (3.) that generates warnings or if the warning is c++ specific, simply make the function non-inline.

Header-only linking

Many C++ projects (e.g. many Boost libraries) are "header-only linked".
Is this possible also in plain C? How to put the source code into headers? Are there any sites about it?
Executive summary: You can, but you shouldn't.
C and C++ code is preprocessed before it's compiled: all headers are "pasted" into the source files that include them, recursively. If you define a function in a header and it is included by two C files, you will end up with two copies in each object file (One Definition Rule violation).
You can create "header-only" C libraries if all your functions are marked as static, that is, not visible outside the translation unit. But it also means you will get a copy of all the static functions in each translation unit that includes the header file.
It is a bit different in C++: inline functions are not static, symbols emitted by the compiler are still visible by the linker, but the linker can discard duplicates, rather than giving up ("weak" symbols).
It's not idiomatic to write C code in the headers, unless it's based on macros (e.g. queue(3)). In C++, the main reason to keep code in the headers are templates, which may result in code instantiation for different template parameters, which is not applicable to C.
You do not link headers.
In C++ it's slightly easier to write code that's already better-off in headers than in separately-compiled modules because templates require it. 1
But you can also use the inline keyword for functions, which exists in C as well as C++. 2
// Won't cause redefinition link errors, because of 6.7.4/5
inline void foo(void) {
// ...
}
[c99: 6.7.4/5:] A function declared with an inline function
specifier is an inline function. The function specifier may appear
more than once; the behavior is the same as if it appeared only once.
Making a function an inline function suggests that calls to the
function be as fast as possible. The extent to which such
suggestions are effective is implementation-defined.
You're a bit stuck when it comes to data objects, though.
1 - Sort of.
2 - C99 for sure. C89/C90 I'd have to check.
Boost makes heavy use templates and template meta-programming which you cannot emulate (all that easily) in C.
But you can of course cheat by having declarations and code in C headers which you #include but that is not the same thing. I'd say "When in Rome..." and program C as per C conventions with libraries.
Yes, it is quite possible. Declare all functions in headers and either all as static or just use a single compilation unit (i.e. only a single c file) in your projects.
As a personal anecdote, I know quite a number of physicists who insist that this technique is the only true way to program C. It is beneficial because it's the poor man's version of -fwhole-program, i.e. makes optimizations based on the knowledge of function behaviour possible. It is practical because you don't need to learn about using the linker flags. It is a bad idea because your whole program must be compiled as a whole and recompiled with each minor change.
Personally, I'd recommend to let it be or at least go with static for only a few functions.