Limiting the scope of global symbols from linked objects - c++

I have a C library in an archive file, clib.a. I've written a C++ wrapper for it, cpp.o, and would like to use this as a static library:
ar cTrvs cppwrap.a clib.a cpp.o
Code which links to this won't be able to use the stuff from clib.a directly unless the correct header is included. However, if someone coincidentally creates an appropriate prototype -- e.g. void myCoincidentallyNamedGlobalFunction() -- I'm concerned which definition of myCoincidentallyNamedGlobalFunction will apply.
Since the symbols from clib.a only need to be accessed in cpp.o, and not anything linked to cppwrap.a, is there a way to completely hide them so that there is no possible collision (so even including the clib header would fail)?

You can manually remove unneeded symbols on the final combined library:
$ objcopy -N foo cppwrap.a   # remove symbol
Or, if you need the symbols but want to make sure that external users can't get to them:
$ objcopy -L bar cppwrap.a   # localize symbol
Or, if a symbol in clib.a must be visible by something in cpp.o but you don't want it to be used by anyone else:
$ objcopy -W baz cppwrap.a   # weaken symbol
In this case, collisions with symbols from other object files/libraries will defer to their usage, even though the symbol will still be visible. To obscure things further or to reduce chances of even a deferential collision, you can also use:
$ objcopy --redefine-sym old=new cppwrap.a
An anonymous namespace may help in some cases, but not if there's functionality that your wrapper needs but is trying to hide from external users.

Related

hide private symbols automatically

I have a C++ project with public and private header files.
To increase encapsulation and decrease symbol clashes in a larger project I would like to export only the minimal set of symbols.
Although we could manually annotate each function with visibility attributes, I'd prefer an approach that does not require changing the source code.
Given the following project structure:
LibA
include
*.h
src
*.h
*.cpp
Is there a way to automatically hide all the symbols that don't appear in include/*.h ?
Is there an elegant way of instrumenting the compiler/linker?
Could we automatically generate a version-script ?
With gcc and clang, this is as simple as building with -fvisibility=hidden. Then you only have to explicitly export the few public symbols you want exposed.
For more details, there's a gcc article on symbol visibility that you may want to read.
Could we automatically generate a version-script?
You sure could: run nm -C *.o | egrep ' [TDBW] ' to get the list of global symbols, then look in include/*.h to see which ones should be exported. This will likely be fragile: if you e.g. use macros to generate symbol names, this will probably not work at all.
It may be worth it to generate the list once, hand-curate it, and then maintain it together with the sources by hand in the revision control system.
If the number of symbols to be exported is relatively small, compiling with -fvisibility=hidden and annotating just the public symbols is a much more robust solution.
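A minimal sketch of the annotation approach, assuming GCC or Clang on an ELF platform. The macro names LIBA_API and LIBA_LOCAL and both functions are hypothetical, chosen only for illustration. Built into a shared library with -fvisibility=hidden, only liba_percent should end up in the dynamic symbol table.

```c
#include <assert.h>

/* Hypothetical export macros (not from the original project). */
#if defined(__GNUC__)
#  define LIBA_API   __attribute__((visibility("default")))
#  define LIBA_LOCAL __attribute__((visibility("hidden")))
#else
#  define LIBA_API
#  define LIBA_LOCAL
#endif

/* Internal helper shared between the library's source files.
 * Hidden explicitly, so it stays private even without -fvisibility=hidden. */
LIBA_LOCAL int liba_clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Public entry point: the only symbol meant to be exported. */
LIBA_API int liba_percent(int value)
{
    return liba_clamp(value, 0, 100);
}
```

Running nm -D on the resulting .so is one way to check which of the two names is actually exported.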

Is it possible to artificially induce object file extraction for a given static library?

I was recently reading this answer and noticed that it seems inconvenient for users to have to link static libraries in the correct order.
Is there some flag or #pragma I can pass to gcc when compiling my library so that my library's object files will always be included?
To be more specific, I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
Is there some flag or #pragma I can pass to gcc
No.
I want it to be the case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
Ship your "library" as a single object file. In other words, instead of:
ar ru libMyLibrary.a ${OBJS}
use:
ld -r -o libMyLibrary.a ${OBJS}
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
You can name your object file libMyLibrary.a. I believe the linker will search for it using the usual rules, but when it finds it, it will discover that this is an object file, and treat it as such, despite it being "misnamed". This should work at least on Linux and other ELF platforms. I am not sure whether it will work on Windows.

Can an export map select only the functions you want to link to?

I am writing a test harness with Googletest and need to control the symbol table to avoid conflicts (the code base is mainly C with a bit of C++ on Linux).
I am looking for a way to link against only the functions I want in a file and also to be able to create custom sets of functions to link against for each test.
This is a bit broad I know but any suggestions or ideas will be most welcome!
You can use a linker version script to define which symbols are exported in the symbol table.
Such a version script can look like this:
{
global:
symb1;
symb2;
symb3;
local: *;
};
This example will only export the symbols symb1-3, all other symbols are omitted from the symbol table.
Now specify this script as version script for the linker, an example for a shared library:
cc -shared obj1.o obj2.o obj3.o -o library.so -Wl,--version-script=<scriptname>
Even more control can be gained through symbol versions, more details can be found in the ld-documentation: http://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_25.html

How to hide the exported symbols name within a shared library

For VC, I can write a DEF file and use the 'NONAME' directive to leave only the ordinal number in the DLL's export table.
How could I do the same thing with gcc and ELF format shared library?
Or, is there something equivalent in ELF shared library like the ordinal number in a PE format DLL? If not, how could I hide the exported symbol's name within a shared library?
======================================
UPDATE: Some additional descriptions:
On Windows, you can export a function by placing only an integer ID (the ordinal) in the export table, with an empty name.
To illustrate, the normal layout of a DLL's export table looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo22.gif.
The "NONAME" one looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo23.gif.
Notice that the function names are "N/A" in the second picture. Here is a full explanation of it: hxxp://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm.
======================================
UPDATE: Many thanks to everyone who gave me advice. In the end, I decided to keep using a static library on Linux/POSIX platforms, but to extract the small "special part" (which uses features not suitable for a static library, e.g. TLS slots) into a normal shared library. Because this small shared library does only a few things, none of them sensitive, there is no need to obscure or hide its API.
I think it's the simplest way to solve my problem :-D
The previous answers regarding __attribute__ ((visibility ("hidden"))) are good when you want to maintain the code long term, but if you only have a few symbols that you want visible and want a quick fix, then on the symbols that you want to export, add
__attribute__ ((visibility ("default")))
Then you can pass -fvisibility=hidden to the compiler
There is a thorough explanation here:
http://gcc.gnu.org/wiki/Visibility
Edit: An alternative would be to build a static library/archive (make a .a archive with ar -cru mylib.a *.o) or to combine the objects into a single object file, as described in the question "combine two GCC compiled .o object files into a third .o file".
If you are asking "Why combine object files instead of just making a static library?": the linker treats .o files differently from .a files (I don't know why, just that it does). Specifically, it will let you link a .o file into a shared library or a binary even if all of the symbols are hidden (even the ones you are using). This has the added benefit of reducing startup time (one less DSO and far fewer symbols to look up) and binary size (the symbols typically make up ~20% of the size, and stripping only takes care of about half of that: just the externally visible parts).
for binaries strip --strip-all -R .note -R .comment mybinary
for libraries strip --strip-unneeded -R .note -R .comment mylib.so
More on the benefits of static linking here: http://sta.li/faq but they don't discuss licensing issues which are the main reason not to use a static library and since you are wanting to hide your API, that may be an issue
Now that we have an object that is "symbol clean", it is possible to use our combined object to build a libpublic.so by linking private.o and public.c (which aliases/exports only what you want public) into a shared library.
This method lends itself well to finding the "extra code" that is unneeded in your public API as well. If you add -fdata-sections -ffunction-sections to your object builds, when you link with -Wl,--gc-sections,--print-gc-sections , it will eliminate unused sections and print an output of what was removed.
Edit 2 - or you could hide the whole API and alias only the functions you want to export
alias ("target")
The alias attribute causes the declaration to be emitted as an alias for another symbol, which must be specified. For instance,
void __f () { /* Do something. */; }
void f () __attribute__ ((weak, alias ("__f")));
defines `f' to be a weak alias for `__f'. In C++, the mangled name for the target must be used. It is an error if `__f' is not defined in the same translation unit.
Not all target machines support this attribute.
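A minimal sketch of the "hide everything, alias only what you export" idea, assuming GCC or Clang targeting ELF (as noted above, the alias attribute is not available on every target). The names counter_next and counter_next_impl are hypothetical. Unlike the documentation snippet, the implementation here has internal linkage, so the alias is the only name visible to the linker.

```c
#include <assert.h>

/* Implementation with internal linkage: the name counter_next_impl
 * is not a global symbol in the object file. */
static int counter_next_impl(int current)
{
    assert(current >= 0);
    return current + 1;
}

/* Public name, emitted as an alias for the implementation.
 * The target must be defined in this same translation unit. */
int counter_next(int current) __attribute__((alias("counter_next_impl")));
```

Callers only ever see counter_next; the implementation name can be changed freely without affecting the exported interface.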
You could consider using the GCC function attribute for visibility to make symbols hidden, i.e. adding __attribute__((visibility ("hidden"))) at the appropriate places in your header file.
This hides the symbols you don't want exposed and keeps the ones you do.
This is a GCC extension (perhaps supported by other compilers like Clang or Icc).
addenda
In the Linux world, a shared library should export functions (or perhaps global data) by their names, as published in header files. Otherwise, don't call these functions "exported" -they are not!
If you absolutely want to have a function in a shared library which is reachable but not exported, you could register it in some way (for instance, putting the function pointer in some slot of a global data, e.g. an array), this means that you have (or provide) some function registration machinery. But this is not an exported function anymore.
To be more concrete, you could have in your main program a global array of function pointers
// in a global header.h
// signature of some functions
typedef void signature_t(int, char*);
#define MAX_NBFUN 100
// global array of function pointers
extern signature_t *funtab[MAX_NBFUN];
then in your main.c file of your program
signature_t *funtab[MAX_NBFUN];
Then in your shared object (e.g. in a myshared.c file compiled into libmyshared.so), a constructor function:
static void my_constructor(void) __attribute__((constructor));
static void myfun(int, char*); // defined elsewhere in the same file
static void
my_constructor(void) { // called at shared object initialization
funtab[3] = myfun;
}
Later on your main program (or some other shared object) might call
funtab[3](124, "foo");
but I would never call such things "exported" functions, only reachable functions.
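Putting the fragments above together, here is a single-file sketch, assuming GCC or Clang for __attribute__((constructor)). In a real program funtab would be defined in the main program and the constructor in the shared object. The helper call_slot and the variable last_arg are hypothetical additions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef void signature_t(int, char *);
#define MAX_NBFUN 100

/* Global table of function pointers (would live in main.c). */
signature_t *funtab[MAX_NBFUN];

/* Hypothetical: remembers the last integer myfun received. */
int last_arg = 0;

/* Never exported: static, reachable only through funtab. */
static void myfun(int n, char *text)
{
    (void)text;
    last_arg = n;
}

/* Runs before main() (or when the shared object is loaded). */
static void my_constructor(void) __attribute__((constructor));
static void my_constructor(void)
{
    funtab[3] = myfun;
}

/* Hypothetical helper: call slot i if something is registered there.
 * Returns 1 if a function was called, 0 if the slot is empty. */
int call_slot(size_t i, int n, char *text)
{
    assert(i < MAX_NBFUN);
    if (funtab[i] == NULL)
        return 0;
    funtab[i](n, text);
    return 1;
}
```

Checking for an empty slot before the call guards against using a plugin that was never loaded.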
See also C++ software like Qt, FLTK, RefPerSys, GCC, GTKmm, FOX-Toolkit, Clang, etc.... They all are extendable thru plugins or callbacks or closures (and internally a good C++ compiler would emit and optimize calls to closures for C++ lambda expressions). Look also inside interpreters like Python, fish, Lua, or GNU guile, you can extend them with C++ code.
Consider also generating machine code on the fly and using it in your program. Libraries like asmjit or libgccjit or LLVM or GNU lightning could be helpful.
On Linux, you might generate at runtime some C++ code into /tmp/generated.cc, compile that code into a /tmp/generated-plugin.so plugin by forking (perhaps with system(3) or popen(3)...) some command like g++ -Wall -O -fPIC -shared /tmp/generated.cc -o /tmp/generated-plugin.so then use dlopen(3) and dlsym(3). Use then extern "C" functions, and see the C++ dlopen minihowto. You might be interested in __attribute__((constructor)).
My personal experience (in past projects that I am not allowed to mention here, but are mentioned on my web page) is that you can on Linux generate many hundred thousands plugins. I would still dare mention my manydl.c program (whose GPLv3+ license allows you to adapt it to C++).
At the conceptual level, reading the GC handbook might be helpful. There is a delicate issue in garbage collecting code (or plugins).
Read also Drepper's paper How to write shared libraries, see elf(5), ld(1), nm(1), readelf(1), ldd(1), execve(2), mmap(2), syscalls(2), dlopen(3), dlsym(3), Advanced Linux Programming, the Program Library HOWTO, the C++ dlopen mini-howto, and Ian Taylor's libbacktrace.
To hide the meaning of the exported functions on UNIX, you can just obfuscate their names with simple renaming, by using #defines. Like this:
#define YourGoodFunction_CreateSomething MeaninglessFunction1
#define YourGoodFunction_AddSomethingElseToSomething lxstat__
#define YourGoodFunction_SaveSomething GoAway_Cracker
#define YourGoodFunction_ReleaseSomething Abracadabra
and so on.
For a few functions this can be done by hand. If you need thousands, you should use code generation:
get the list of your real function names, use grep, awk, cut, etc.
prepare a dictionary of the meaningless names
write a script (or binary) generator which will output a C header file with #defines as shown above.
The only question is how you can get the dictionary. Well, I see a few options here:
you could ask your co-workers to randomly type on their keyboards ;-)
generate random strings, like: read(/dev/urandom, 10-20 bytes) | base64
use some real dictionary (general English, specific domain)
collect real system API names and change them a bit: __lxstat -> lxstat__
this is limited only by your imagination.
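A minimal sketch of what such a renaming header does, reusing two of the names above; the function bodies are placeholders invented for illustration. After preprocessing, the compiler only ever sees the meaningless names, so those are what end up in the symbol table.

```c
#include <assert.h>

/* Renaming header (would be generated and included everywhere). */
#define YourGoodFunction_CreateSomething  MeaninglessFunction1
#define YourGoodFunction_ReleaseSomething Abracadabra

/* The source keeps the readable names; the object file defines
 * MeaninglessFunction1 and Abracadabra instead. */
int YourGoodFunction_CreateSomething(int size)
{
    assert(size >= 0);
    return size * 2;   /* placeholder body */
}

int YourGoodFunction_ReleaseSomething(int handle)
{
    return handle / 2; /* placeholder body */
}
```

Client code must include the same header, otherwise its calls to the readable names will not resolve at link time.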
You can write a version-script and pass it to the linker to do this.
A simple script looks like this:
testfile.exp:
{
global:
myExportedFunction1;
myExportedFunction2;
local: *;
};
Then link your executable with the following options:
-Wl,--version-script=testfile.exp
When applied to a shared library this will still list the symbols in the .so file for debugging purposes, but it is not possible to access them from the outside of the library.
I was looking for a solution to the same problem. So far, I couldn't find a robust solution. However, as a proof of concept I used objcopy to achieve the desired results. Basically, after compiling an object file I redefine some of its symbols. Then the translated object file is used to build the final shared object or executable. As a result, the class/method names that could be used as a hint to reverse engineer my algorithm are completely replaced by the meaningless names m1, m2, m3.
Here is the test I used to ensure that the idea works:
Makefile:
# Recipe lines must start with a tab character.
.PHONY: all clean run
all: libshared_object.so executable.exe
clean:
	rm -f *.o *.so *.exe
libshared_object.so : shared_object.o
	g++ -fPIC --shared -O2 $< -o $@
	strip $@
# Rename the definitions of the secret methods in the library object.
shared_object.o : shared_object.cpp interface.h
	g++ -fPIC -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@
# Libraries are listed after the objects that reference them.
executable.exe : executable.o libshared_object.so
	g++ -O2 $< -L. -lshared_object -o $@
	strip $@
# Rename the matching undefined references in the caller's object.
executable.o : executable.cpp interface.h
	g++ -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@
run: all
	LD_LIBRARY_PATH=. ./executable.exe
interface.h
class MyVerySecretClass
{
private:
int secret_var;
public:
MyVerySecretClass();
~MyVerySecretClass();
void secret_method1();
void secret_method2();
void secret_method3();
};
shared_object.cpp
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include "interface.h"
MyVerySecretClass::MyVerySecretClass()
: secret_var(0)
{}
MyVerySecretClass::~MyVerySecretClass()
{
secret_var = -1;
}
void MyVerySecretClass::secret_method1()
{
++secret_var;
}
void MyVerySecretClass::secret_method2()
{
printf("The value of secret variable is %d\n", secret_var);
}
void MyVerySecretClass::secret_method3()
{
char cmdln[128];
sprintf( cmdln, "pstack %d", getpid() );
system( cmdln );
}
executable.cpp
#include "interface.h"
int main ( void )
{
MyVerySecretClass o;
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method3();
return 0;
}

Is there a .def file equivalent on Linux for controlling exported function names in a shared library?

I am building a shared library on Ubuntu 9.10. I want to export only a subset of my functions from the library. On the Windows platform, this would be done using a module definition (.def) file which would contain a list of the external and internal names of the functions exported from the library.
I have the following questions:
How can I restrict the exported functions of a shared library to those I want (i.e. a .def file equivalent)
Using .def files as an example, you can give a function an external name that is different from its internal name (useful for preventing name collisions and also redecorating mangled names etc)
On windows I can use the EXPORT command (IIRC) to check the list of exported functions and addresses, what is the equivalent way to do this on Linux?
The most common way to only make certain symbols visible in a shared object on linux is to pass the -fvisibility=hidden to gcc and then decorate the symbols that you want to be visible with __attribute__((visibility("default"))).
If you're looking for an export-file-like solution, you might want to look at the linker option --retain-symbols-file=FILENAME, which may do what you are looking for.
I don't know an easy way of exporting a function with a different name from its function name, but it is probably possible with an elf editor. Edit: I think you can use a linker script (have a look at the man page for ld) to assign values to symbols in the link step, hence giving an alternative name to a given function. Note, I haven't ever actually tried this.
To view the visible symbols in a shared object you can use the readelf command. readelf -Ds if I remember correctly.
How can I restrict the exported functions of a shared library to those I want (i.e. a .def file equivalent)
Perhaps you're looking for GNU Export Maps or Symbol Versioning
g++ -shared spaceship.cpp -o libspaceship.so.1 \
-Wl,-soname=libspaceship.so.1 \
-Wl,--version-script=spaceship.expmap
gcc also supports the VC syntax of __declspec(dllexport). See this.
Another option is to use the strip command, like this:
strip --keep-symbol=symbol_to_export1 --keep-symbol=symbol_to_export2 ... \
libtotrip.so -o libout.so