Wrapping different C API implementations in C++

Assume we are using gcc/g++ and a C API specified by a random committee. This specification defines the function
void foo(void);
Now, there are several implementations of this specification. Let's pick two as samples and call them nfoo and xfoo (provided by libnfoo and libxfoo as static and dynamic libraries, respectively).
Now, we want to create a C++ framework for the foo-API. Thus, we specify an abstract class
class Foo
{
public:
    virtual void foo(void) = 0;
};
and corresponding implementations
#include <nfoo.h>
#include "Foo.h"
class NFoo : public Foo
{
public:
    virtual void foo(void)
    {
        ::foo(); // calling foo from the nfoo C-API
    }
};
as well as
#include <xfoo.h>
#include "Foo.h"
class XFoo : public Foo
{
public:
    virtual void foo(void)
    {
        ::foo(); // calling foo from the xfoo C-API
    }
};
Now, we are facing a problem: How do we create (i.e. link) everything into one library?
I see that there will be a symbol clash with the foo function symbols of the C API implementations.
I already tried to split the C++ wrapper implementations into separate static libraries, but then I realized (again) that a static library is just a collection of unlinked object files. So this will not work at all, unless there is a way to fully link the C libraries into the wrappers and remove/hide their symbols.
Suggestions are highly appreciated.
Update: An optimal solution should support both implementations at the same time.
Note: The code is not meant to be functional; treat it as pseudo code.

Could you use dlopen/dlsym at runtime to resolve your foo calls?
Something like this example code (may not compile as-is):
#include <dlfcn.h>
void *handle, *handle2;
void (*fnfoo)(void) = NULL;
void (*fxfoo)(void) = NULL;
/* open the needed objects */
handle = dlopen("/usr/home/me/libnfoo.so", RTLD_LOCAL | RTLD_LAZY);
handle2 = dlopen("/usr/home/me/libxfoo.so", RTLD_LOCAL | RTLD_LAZY);
/* look up "foo" in each library separately */
fnfoo = (void (*)(void)) dlsym(handle, "foo");
fxfoo = (void (*)(void)) dlsym(handle2, "foo");
/* invoke the functions */
(*fnfoo)();
(*fxfoo)();
/* don't forget dlclose()'s */
Otherwise, the symbols in the libraries would need to be modified.
This is not portable to Windows.

First things first: if you are going to wrap a C API in C++ code, you should hide that dependency behind a compilation firewall. This is to (1) avoid polluting the global namespace with the names from the C API, and (2) free the user code from a dependency on the third-party headers. In this example, a rather trivial modification isolates the dependency on the C APIs. You should do this:
// In NFoo.h:
#include "Foo.h"
class NFoo : public Foo
{
public:
    virtual void foo(void);
};
// In NFoo.cpp:
#include "NFoo.h"
#include <nfoo.h>
void NFoo::foo(void) {
    ::foo(); // calling foo from the nfoo C-API
}
The point of the above is that the C API header, <nfoo.h>, is only included in the cpp file, not in the header file. This means that user code will not need the C API headers in order to compile code that uses your library, nor will the global-namespace names from the C API risk clashing with anything else being compiled. Also, if your C API (or any other external dependency for that matter) requires creating a number of things (e.g., handles, objects, etc.) when using the API, then you can also wrap them in a PImpl (a pointer to a forward-declared implementation class that is declared and defined only in the cpp file) to achieve the same isolation of the external dependency (i.e., a "compilation firewall").
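To make the PImpl idea concrete, here is a minimal sketch; the nfoo_handle type and the nfoo_open/nfoo_close functions are made up purely for illustration (the spec in the question only defines foo), the point is just that no C-API type leaks into the header:
// In NFoo.h: no C-API types appear here.
#include "Foo.h"
class NFoo : public Foo
{
public:
    NFoo();
    virtual ~NFoo();
    virtual void foo(void);
private:
    struct Impl;   // defined only in NFoo.cpp
    Impl* pimpl;
};
// In NFoo.cpp: the C API is visible only in this translation unit.
#include "NFoo.h"
#include <nfoo.h>
struct NFoo::Impl
{
    nfoo_handle* handle;  // hypothetical C-API handle type
};
NFoo::NFoo() : pimpl(new Impl) { pimpl->handle = nfoo_open(); }  // hypothetical init call
NFoo::~NFoo() { nfoo_close(pimpl->handle); delete pimpl; }       // hypothetical cleanup call
void NFoo::foo(void) { ::foo(); }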
Now that the basic stuff is out of the way, we can move to the issue at hand: simultaneously linking to two C APIs with name-clashing symbols. This is a problem and there is no easy way out. The compilation-firewall technique above is really about isolating and minimizing dependencies during compilation, and with it you could easily compile code that depends on two APIs with conflicting names (which isn't possible in your version); however, you will still be hit hard with multiple-definition (ODR, One Definition Rule) errors when you reach the linking phase.
This thread has a few useful tricks for resolving C API name conflicts. In summary, you have the following choices:
If you have access to static libraries (or object files) for at least one of the two C APIs, then you can use a utility like objcopy (in Unix/Linux) to add a prefix to all the symbols in that static library (object files), e.g., with the command objcopy --prefix-symbols=libn_ libn.o to prefix all the symbols in libn.o with libn_. Of course, this implies that you will need to add the same prefix to the declarations in the API's header file(s) (or make a reduced version with only what you need), but this is not a problem from a maintenance perspective as long as you have a proper compilation firewall in place for that external dependency.
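As a rough sketch of what that ends up looking like (the nfoo_ prefix and the hand-written reduced header are assumptions for illustration; also note that --prefix-symbols renames every symbol in the object file, including undefined references to other libraries, so for a single clashing function objcopy --redefine-sym foo=nfoo_foo is often the safer variant):
// nfoo_prefixed.h: reduced, hand-written header matching the renamed symbols
extern "C" void nfoo_foo(void);
// NFoo.cpp
#include "NFoo.h"
#include "nfoo_prefixed.h"
void NFoo::foo(void)
{
    nfoo_foo(); // the renamed foo from the modified libnfoo
}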
If you don't have access to static libraries (or object files), or don't want to do the above (somewhat troublesome) approach, you will have to go with dynamic libraries. However, this isn't as trivial as it sounds (and I'm not even going to go into the topic of DLL Hell). You must use dynamic loading of the dynamic-link library (or shared-object file), as opposed to the more usual static loading. That is, you must use LoadLibrary / GetProcAddress / FreeLibrary (on Windows) or dlopen / dlsym / dlclose (on Unix-like OSes). This means that you have to individually load and set the function-pointer address for each function that you wish to use. Again, if the dependencies are properly isolated in the code, this is going to be just a matter of writing all this repetitive code, but there is not much danger involved here.
If your use of the C APIs is much simpler than the C APIs themselves (i.e., you use only a few functions out of hundreds), it might be a lot easier for you to create two dynamic libraries, one for each C API, each exporting only the limited subset of functions you need, under unique names, as wrappers around calls to the underlying C API. Then your main application or library can link to those two dynamic libraries directly (statically loaded). Of course, if you need to do that for all the functions in the C API, then there is no point in going through all this trouble.
So, choose whatever seems most reasonable or feasible to you; there is no doubt that it will require quite a bit of manual work to fix this up.

If you only want to access one library implementation at a time, a natural way to go about it is a dynamic library.
On Windows that also works for accessing two or more library implementations at a time, because Windows DLLs provide total encapsulation of whatever is inside.

IIUC, #ifdef is what you need.
Put #define NFOO in the nfoo lib and #define XFOO in the xfoo lib.
Also remember that if the nfoo lib and the xfoo lib both have a function called foo, there will be an error when linking. To avoid this, GCC/G++ uses function overloading through name mangling.
You can then check whether xfoo is linked using #ifdef:
#ifdef XFOO
//call xfoo's foo()
#endif

A linker cannot distinguish between two different definitions of the same symbol name, so if you're trying to use two functions with the same name you'll have to separate them somehow.
The way to separate them is to put them in dynamic libraries. You can choose which things to export from a dynamic library, so you can export the wrappers while leaving the underlying API functions hidden. You can also load the dynamic library at runtime and bind to symbols one at a time, so even if the same name is defined in more than one library, they won't interfere with each other.


C++ library export specific functions only

I am looking for ways to control the 'exposure' of functions/classes/variables to third-party users while I still have full access within a C++ project/library.
In JavaScript you have modules, which do exactly this.
In Java/C# you can get pretty far with access modifiers.
But in C/C++ there doesn't seem to be any control beyond the file itself (i.e. once something is in the .h/.hpp file, it's accessible from anywhere).
So the question is: is there a way to access functions/classes/variables from files within the project without exposing them to third-party users?
Well, don't put them in the API: if these symbols are only needed internally, keep them in header files that are used only for building your project, not installed as development headers for your consumers.
Say, you have a class declaration
class Whatpeopleuse {
private:
    int _base;
public:
    Whatpeopleuse(int base);
    int nth_power(unsigned int exponent);
};
in your n247s/toolbox.hpp, which you install / ship to customers.
And the implementation in your mycode.cc file
#include "n247s/toolbox.hpp"
#include "math_functions.hpp" // contains declaration of power_function
Whatpeopleuse::Whatpeopleuse(int base) : _base(base)
{
}
int
Whatpeopleuse::nth_power(unsigned int exponent)
{
    return power_function(_base, exponent);
}
with power_function defined in another file, math_functions.cc:
#include "math_functions.hpp"
int power_function(int base, unsigned int exponent)
{
    int result = 1;
    for (; exponent; --exponent)
        result *= base;
    return result;
}
Then you compile your mycode.cc and your math_functions.cc, link them together to a n247s.so (or .dll, or whatever your shared library extension is), and ship that to the customer together with toolbox.hpp.
The customer never sees the declarations in math_functions.hpp, or the code in math_functions.cc or mycode.cc. That's internal to the binary you produced.
What the customer sees/gets is
the header toolbox.hpp, which tells them what symbols/types in your library they are able to access (otherwise their compiler wouldn't know what there is to call in your library)
the binary n247s library, containing the symbols as declared in toolbox.hpp.
Of these symbols, only those that have visibility are actually given a name associated with an address within the shared library file. You'll find that it's common to tell the compiler that none of the symbols should be visible by default, and to explicitly mark the classes and functions you do want to be visible using __attribute__((visibility("default"))) (at least that's what's in my macros for this; for MSVC, the attribute specification looks different).
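A minimal sketch of such a macro, assuming GCC/Clang and a library built with -fvisibility=hidden; the TOOLBOX_API name is made up, and the MSVC branch is simplified (real code would also switch between dllexport and dllimport):
// toolbox_export.hpp (hypothetical helper header)
#if defined(_MSC_VER)
#  define TOOLBOX_API __declspec(dllexport)
#else
#  define TOOLBOX_API __attribute__((visibility("default")))
#endif
// In n247s/toolbox.hpp: only marked classes/functions become visible symbols.
class TOOLBOX_API Whatpeopleuse {
private:
    int _base;
public:
    Whatpeopleuse(int base);
    int nth_power(unsigned int exponent);
};
Everything not marked this way (power_function, for example) then stays hidden when the library is compiled with -fvisibility=hidden.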
The user of the class Whatpeopleuse can only access its public: members (there are ways around that, though, within limits), but they can see that you have private members (like _base). If that's too much disclosure, you can make your customer-facing classes contain only a single member pointer to something usually called a detail, with the customer-facing member functions' implementations just calling detail->name_of_member.
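As a small sketch of that detail-pointer variant (only the header is shown; the struct itself would be defined inside the library, e.g. in mycode.cc):
// n247s/toolbox.hpp: the header no longer discloses any data members.
class Whatpeopleuse {
public:
    Whatpeopleuse(int base);
    ~Whatpeopleuse();
    int nth_power(unsigned int exponent);
private:
    struct detail;  // defined only inside the library
    detail* _d;     // all state lives behind this pointer
};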
I'd like to add that you don't want to make it hard for your customers to know what your class is doing. If they're motivated enough, they can reverse engineer quite a lot. Making something harder to read and understand because its headers go to great lengths to obfuscate what's happening behind the scenes is frustrating, and people won't like it.
On the other hand, the above methodology is something you typically find in large code bases – not to "hide" things from the user, but to keep the API clean – only the things that are in the public-facing header are part of the API, the rest is never seen by the user's compiler. That's great, because it means
you make clear what is meant to be used and what is internal "plumbing". This is important because it's easy to lose track of the functionality you actually want to offer and end up writing confusing, hard-to-use libraries.
It minimizes the ABI of your library: as long as the ABI of the symbols in your public-facing header doesn't change, your users can drop-in replace your library v1.1.2 with v1.1.3 by replacing the library binary, without recompilation.
It makes clear what needs user-friendly documentation and what doesn't. If you ship someone a library without documentation, they will go through hell to avoid using your library. I have been in that position myself. I know companies that rewrote whole driver suites from scratch because the documentation they got from the hardware vendor did not explain the behavior.

Can you weak link (optionally link) a static library in C++ like Objective-C can?

I am developing a static library A with a mix of Objective-C and C++ in Xcode, and I have run into the need to "weak link" another static library; let's call it B. A calls some functions defined in B. The idea is that if B is not linked/provided, A will not throw any undefined-symbol errors.
I know that in Objective-C you can do that, and in code I can rely on runtime checks such as NSClassFromString or [someObject class] to see whether a certain class is present/available from another static library, but I don't know if I can achieve that in one of my .cpp source files. Please advise, and thank you!
I created a very simple sample project for illustration purpose:
Library A:
Core_ObjC.h, this is the header that will be exposed
#import <Foundation/Foundation.h>
@interface Core_ObjC : NSObject
- (int)calculate;
@end
Core_ObjC.mm
#import "Core_ObjC.h"
#include "Core_CPP.h"
@implementation Core_ObjC
- (int)calculate {
    return calculate_Core(); // call into C++ here
}
@end
Core_CPP.cpp
#include "Core_CPP.h"
#include "NonCore_CPP.h"
int calculate_Core(){
return calculate_NonCore();//Call into another cpp here but it's defined in Library B
}
Library B:
NonCore_CPP.cpp
#include "NonCore_CPP.h"
int calculate_NonCore(){
return 100;
}
If I link both libraries in a sample app, the app will compile fine. However, when I link only A from the sample app, I encounter an error like:
Undefined symbols for architecture arm64:
"calculate_NonCore()", referenced from:
calculate_Core() in CoreFramework(Core_CPP.o)
The error does make sense to me because B has the missing definition, but I am looking for a solution where the build won't complain when only A is present.
So, weak linking works at the C function level as well. For details, see Apple's documentation, but basically the symbol needs to be declared using __attribute__((weak_import)), as follows:
extern int MyFunction() __attribute__((weak_import));
extern int MyVariable __attribute__((weak_import));
and then you can check if its address is zero to check if it was found during link time, using e.g. if (MyFunction != NULL) or if (&MyVariable != NULL).
Your title mentions static libraries. Note that static libraries are just collections of object files, so you either link the library or you don't; there's no weak linking a static library because, by definition, it's either present at build time or it isn't. If you want runtime behaviour, use a dynamic library (dylib).
The above should actually cover the example you have given: you'll just need to provide alternative code for when calculate_NonCore is unavailable, as you can't call it without crashing the program. (Makes sense, it's missing)
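Applied to the example, a minimal sketch could look like this; it assumes B is built as a dynamic library (per the previous paragraph, weak linking is resolved by the dynamic linker), and the fallback value is made up:
// Core_CPP.cpp
#include "Core_CPP.h"
// Declare the symbol from library B as weakly imported instead of including NonCore_CPP.h.
extern int calculate_NonCore() __attribute__((weak_import));
int calculate_Core() {
    if (calculate_NonCore != NULL)   // was the symbol resolved at load time?
        return calculate_NonCore();  // B is present
    return -1;                       // fallback when B is not available
}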
You mention C++ too, although your code doesn't make any obvious use of it. C++ makes this a little more complicated because of its mangled names; additionally, classes themselves don't have any linkage, so you can't test for presence of a class. The compiler seems to allow applying the weak_import attribute to member functions, static member variables, etc. but I'm not sure to what extent the NULL checks will work. Free functions should certainly not be an issue - treat them like C functions - and I'd guess that static members are probably a good candidate because you can easily take their address.
I'd probably try to avoid calling into any code that makes use of any part of a class that might not be there, and set things up such that you can simply check for the presence of a "canary" static member variable or function, which, if present, you can conclude that the whole class is present.

How do I (programmatically) ensure that a dll contains definitions for all exported functions?

Let's say I have this lib
//// testlib.h
#pragma once
#include <iostream>
void __declspec(dllexport) test();
int __declspec(dllexport) a();
If I omit the definition for a() and test() from my testlib.cpp, the library still compiles, because the interface [1] is still valid. (If I use them from a client app then they won't link, obviously)
Is there a way I can ensure that when the obj is created (which I gather is the compiler's job) it actually looks for the definitions of the functions that I explicitly exported, and fails if it doesn't?
This is not related to any real world issue. Just curious.
[1] MSVC docs
No, it's not possible.
Partially because a dllexport declaration might legally not even be implemented in the same DLL, let alone library, but be a mere forward declaration for something provided by yet another DLL.
In particular, it's impossible to decide at the object-file level. It's just another forward declaration like any other.
You can dump the exported symbols once the DLL has been linked, but there is no common tool for checking completeness.
Ultimately, you can't do without a client test application which attempts to load all exported interfaces. You can't check that at compile time. Even just successfully linking the test application isn't enough; you have to actually run it.
It gets even worse if there are delay-loaded DLLs (and yes, there usually are), because now you can't even check for completeness unless you actually call at least one symbol from each involved DLL.
Tools like Dependency Walker etc. exist for this very reason.
You asked
"Is there a way I can ensure that when the obj is created (which I gather is the compiler's job) it actually looks for the definitions of the functions that I explicitly exported, and fails if doesn't ?"
When you load a DLL, the actual function binding happens at runtime (late binding of functions), so it is not possible for the compiler to know whether the function definitions are available in the DLL or not. Hope this answers your query.
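To illustrate that, a minimal runtime check (Windows-only sketch; it assumes the functions are exported with unmangled names, e.g. declared extern "C", otherwise you would have to pass the mangled names to GetProcAddress):
#include <windows.h>
#include <cstdio>
typedef void (*test_fn)(void);
int main()
{
    HMODULE lib = LoadLibraryA("testlib.dll");
    if (!lib) { std::printf("could not load testlib.dll\n"); return 1; }
    // GetProcAddress returns NULL when the export is missing from the DLL.
    test_fn test = reinterpret_cast<test_fn>(GetProcAddress(lib, "test"));
    if (!test)
        std::printf("export 'test' is missing\n");
    else
        test();
    FreeLibrary(lib);
    return 0;
}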
How do I (programmatically) ensure that a dll contains definitions for all exported functions?
In general, in standard C++11 (read the document n3337 and see this C++ reference), you cannot
Because C++11 does not know about DLLs or dynamic linking. It might sometimes make sense to dynamically load code which does not define a function that you pragmatically know will never be called (e.g. an incomplete DLL for drawing shapes, where you happen to know that circles will never be drawn in your particular usage, so the loaded DLL might not define any class Circle related code). In standard C++11, every function that is called must be defined somewhere (in some translation unit).
Look also into Qt's vision of plugins.
Read Levine's Linkers and Loaders book. Notice that on Linux, plugins loaded with dlopen(3) have different semantics than Windows DLLs. The devil is in the details.
In practice, you might consider using some recent variant of the GCC compiler and develop your GCC plugin to check that. This could require several weeks of work.
Alternatively, adapt the Clang static analyzer for your needs. Again, budget several weeks of work.
See also this draft report and think about C++ cross-compilers (e.g. compiling on Windows a plugin for a Raspberry Pi).
Consider also runtime code generation frameworks like asmjit or libgccjit. You might think of generating the missing stubs or functions at runtime (and filling function pointers, or even vtables, with them appropriately). C++ exceptions could also be an additional issue.
If your DLL contains calls to the functions, the linker will fail if a definition isn't provided for those functions. It doesn't matter if the calls are never executed, only that they exist.
//// testlib.h
#pragma once
#include <iostream>
#ifndef DLLEXPORT
#define DLLEXPORT(TYPE) TYPE __declspec(dllexport)
#endif
DLLEXPORT(void) test();
DLLEXPORT(int) a();
//// testlib_verify.c
#define DLLEXPORT(TYPE)   // strip the return type, so each exported declaration becomes a bare call
void DummyFunc()
{
#include "testlib.h"
}
This macro-based solution only works for functions with no parameters, but it should be easy to extend.

Intercept and forward C calls to a third-party library

I have an application (A) which calls a third-party shared library (C). I want to write a library of my own (B) that intercepts the calls from A to C and, in some cases, replaces them with my own code; in some cases does some extra processing and then calls the matching function in C; and in some cases just forwards the calls to C directly.
The application is open source, so I could just change each call site to call a similarly named function in B, and then call the corresponding function in C when required, but that would be a lot of work and would make merging upstream changes in the application difficult. I do not have the source to the third-party library. If it were header-only, then I could just use namespaces to achieve this, but I'm not sure how to go about it when both my library and the third-party library seemingly need the exact same symbols defined.
Is there even any way to make this work? I'm primarily targeting OS X, but would like it to work on linux and then eventually Windows as well.
You can use LD_PRELOAD to point to your own shared library. With LD_PRELOAD all functions in your shared library will be called instead of functions with the same name in those other libraries.
If you wish to inject code and then call the original functions, you will need to call dlsym to get the original function from the third-party library.
There are some examples here: https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/
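A minimal Linux-flavoured sketch of such an interposer for the foo example (on macOS, which the question primarily targets, the rough equivalent is DYLD_INSERT_LIBRARIES plus dyld's interposing mechanism rather than LD_PRELOAD):
// intercept.cpp, build with: g++ -shared -fPIC -o libintercept.so intercept.cpp -ldl
// Run the unmodified application with: LD_PRELOAD=./libintercept.so ./app
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstdio>
extern "C" void foo(void)
{
    typedef void (*foo_fn)(void);
    // Look up the next 'foo' in the search order, i.e. the real one in library C.
    static foo_fn real_foo = reinterpret_cast<foo_fn>(dlsym(RTLD_NEXT, "foo"));
    std::printf("foo() intercepted\n");  // extra processing goes here
    if (real_foo)
        real_foo();                      // forward to the third-party implementation
}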
One solution is as follows.
In a header file, #define all functions you want to intercept to your wrapper functions.
#define foo wrap_foo
#define bar wrap_bar
Stick this in some header file that is included everywhere, so that all of the code includes this header. This header should get included before the header for the library that you are intercepting. gcc also has an -include option that causes a header file to be included in all sources.
The goal here is to replace all calls to actual functions with calls to the wrapper functions.
Now define your wrapper functions somewhere, and intercept the calls as you wish. This file should not include the wrapper header that you created earlier (it should see the real library's declarations instead, so that foo() and bar() below resolve to the original functions).
int wrap_foo() {
    return foo();   // calls the real foo(), since this file does not see the #define
}
int wrap_bar() {
    return bar();   // calls the real bar()
}
Link it all together.
Note that this will let you intercept calls to foo() and bar() etc. only in the code that is compiled including the wrapper header. Any pre-compiled code (in libraries or objects) will link to foo() and bar() etc. directly.

Enforce linking with a shared library with -Wl,--as-needed (when only templates are provided)

I am creating a template-only C++ library. However, I'd like to provide an 'empty' shared library as well, so that through controlling SONAME I would be able to enforce rebuilds of the template consumers whenever the templates change in a way resulting in instantiated template ABI incompatibility.
Sadly, if a particular user has -Wl,--as-needed in his LDFLAGS, the linker is going to remove my shared library from NEEDED because the compiled executable does not request any symbols from it. How can I ensure that the program will always be linked against my library, preferably without introducing unnecessary dummy function calls (or, if I have to, making them as lightweight as possible)?
Edit: as a note, the particular template class provides static methods, and usually only those static methods are used. Thus, it is not a good idea to rely on anything put in the constructor, and I'd really like to avoid burdening all the methods with some kind of enforcement.
Inspired by @EmployedRussian's answer, I came up with:
extern int dummy;
namespace
{
    struct G
    {
        inline G()
        {
            dummy = 0;
        }
    };
    static const G g;
}
But sadly, that performs the assignment once for every translation unit that includes the header file.
However, I'd like to provide an 'empty' shared library as well, so that through controlling SONAME I would be able to enforce rebuilds of the template consumers whenever the templates change in a way resulting in instantiated template ABI incompatibility.
This will force an error at runtime.
You can trivially achieve the same result (a runtime error) without using SONAME. In one of your template headers, put in a global object that will, at runtime, either
take the address of (or call) libmysolib_version_<N>, or
do dlopen("libmysolib.so", ...) and dlsym("libmysolib_version_<N>", ...).
Then just keep incrementing N every time you break the ABI.
preferably not introducing unnecessary dummy function calls
Taking the address of libmysolib_version_<N> does not call a function; it just forces the runtime linker to find that symbol once (at startup). You may, though, run afoul of linker garbage collection.
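For illustration, combining this with the G struct from the question might look roughly like the following; libmysolib_abi_3 is a made-up name for the per-ABI-version symbol, and like the original version this still runs once per including translation unit:
// In the shared library's source (define one such symbol per ABI version):
extern "C" { int libmysolib_abi_3 = 0; }
// In one of the template headers shipped to consumers:
extern "C" int libmysolib_abi_3;
namespace
{
    struct MysolibAbiCheck
    {
        MysolibAbiCheck() { libmysolib_abi_3 = 0; }  // unresolved unless the matching libmysolib.so is linked
    };
    static const MysolibAbiCheck mysolib_abi_check;
}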
I'd recommend an alternative approach:
myheader.h
namespace mylib_1 {
    void foo();
    // all the code goes there
}
namespace mylib = mylib_1;
User calls:
mylib::foo()
Code built against a different myheader version would not link, because the namespace changes the mangled names of the functions.
This approach is used by ICU.