How to hide the exported symbols name within a shared library - c++

For VC, I can write a DEF file and use the 'NONAME' directive to leave only the ordinal number in the DLL's export table.
How could I do the same thing with gcc and ELF format shared library?
Or, is there something equivalent in ELF shared library like the ordinal number in a PE format DLL? If not, how could I hide the exported symbol's name within a shared library?
======================================
UPDATE: Some additional descriptions:
In Windows, you can export a function by placing only an integer ID (the ordinal) in the export table, with an empty name.
To illustrate, the normal layout of a DLL's export table looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo22.gif.
The "NONAME" one looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo23.gif.
Notice the function names are "N/A" in the second picture. Here is a full explanation of it: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm.
======================================
UPDATE: Many thanks to everyone who gave me advice. In the end, I decided to keep using a static library on Linux/POSIX platforms, but to extract the small "special part" (which uses features unsuitable for a static lib, e.g. TLS slots) into a normal shared library. Because that small shared library only does a few things, and that work is entirely insensitive, there is no need to obscure/hide its APIs.
I think it's the simplest way to solve my problem :-D

The previous answers regarding __attribute__((visibility("hidden"))) are good when you want to maintain the code long term, but if you only have a few symbols that you want visible and want a quick fix... On the symbols that you want to export, add
__attribute__ ((visibility ("default")))
Then you can pass -fvisibility=hidden to the compiler
There is a thorough explanation here:
http://gcc.gnu.org/wiki/Visibility
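A minimal sketch of the idea (library and function names invented): build with something like `gcc -shared -fPIC -fvisibility=hidden mylib.c -o libmylib.so`, and `nm -D libmylib.so` should then list `public_entry` but not `internal_helper`.

```c
/* mylib.c -- hypothetical example, not from the original post.
   Build: gcc -shared -fPIC -fvisibility=hidden mylib.c -o libmylib.so */

/* non-static, yet hidden from the dynamic symbol table by -fvisibility=hidden */
int internal_helper(int x) { return x * 2; }

/* marked "default" so it survives -fvisibility=hidden and stays exported */
__attribute__((visibility("default")))
int public_entry(int x) { return internal_helper(x) + 1; }
```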
Edit: An alternative would be to build a static library/archive (make a .a archive with ar -cru mylib.a *.o) or combine the objects into a single object file, as described in "combine two GCC compiled .o object files into a third .o file".
If you are asking "Why combine object files instead of just making a static library?"... because the linker treats .o files differently than .a files (I don't know why, just that it does). Specifically, it will let you link a .o file into a shared library or a binary even if all of the symbols are hidden (even the ones you are using). This has the added benefit of reducing startup time (one less DSO and far fewer symbols to look up) and binary size (the symbols typically make up ~20% of the size, and stripping only takes care of about half of that: just the externally visible parts).
For binaries: strip --strip-all -R .note -R .comment mybinary
For libraries: strip --strip-unneeded -R .note -R .comment mylib.so
More on the benefits of static linking here: http://sta.li/faq. They don't discuss licensing issues, though, which are the main reason not to use a static library; and since you want to hide your API, that may be an issue.
Now that we have an object that is "symbol clean", it is possible to use our combined object to build a libpublic.so by linking private.o and public.c (which aliases/exports only what you want public) into a shared library.
This method also lends itself well to finding the "extra code" that is unneeded in your public API. If you add -fdata-sections -ffunction-sections to your object builds, then when you link with -Wl,--gc-sections,--print-gc-sections, it will eliminate unused sections and print an output of what was removed.
Edit 2 - or you could hide the whole API and alias only the functions you want to export
alias ("target")
The alias attribute causes the declaration to be emitted as an alias for another symbol, which must be specified. For instance,
void __f () { /* Do something. */; }
void f () __attribute__ ((weak, alias ("__f")));
defines `f` to be a weak alias for `__f`. In C++, the mangled name for the target must be used. It is an error if `__f` is not defined in the same translation unit.
Not all target machines support this attribute.

You could consider using the GCC function attribute for visibility and making symbols hidden, i.e. adding __attribute__((visibility("hidden"))) at the appropriate places in your header file.
You'll thus hide your internal symbols, and keep the good ones.
This is a GCC extension (perhaps supported by other compilers like Clang or Icc).
addenda
In the Linux world, a shared library should export functions (or perhaps global data) by their names, as published in header files. Otherwise, don't call these functions "exported" -they are not!
If you absolutely want to have a function in a shared library which is reachable but not exported, you could register it in some way (for instance, putting the function pointer in some slot of a global data, e.g. an array), this means that you have (or provide) some function registration machinery. But this is not an exported function anymore.
To be more concrete, you could have in your main program a global array of function pointers
// in a global header.h
// signature of some functions
typedef void signature_t(int, char*);
#define MAX_NBFUN 100
// global array of function pointers
extern signature_t *funtab[MAX_NBFUN];
then in your main.c file of your program
signature_t *funtab[MAX_NBFUN];
Then in your shared object (e.g. in a myshared.c file compiled into libmyshared.so) a constructor function:
static void my_constructor(void) __attribute__((constructor));
static void myfun(int, char*); // defined elsewhere in the same file
static void
my_constructor(void) { // called at shared object initialization
    funtab[3] = myfun;
}
Later on your main program (or some other shared object) might call
funtab[3](124, "foo");
but I would never call such things "exported" functions, only reachable functions.
See also C++ software like Qt, FLTK, RefPerSys, GCC, GTKmm, FOX-Toolkit, Clang, etc.... They all are extendable thru plugins or callbacks or closures (and internally a good C++ compiler would emit and optimize calls to closures for C++ lambda expressions). Look also inside interpreters like Python, fish, Lua, or GNU guile, you can extend them with C++ code.
Consider also generating machine code on the fly and using it in your program. Libraries like asmjit or libgccjit or LLVM or GNU lightning could be helpful.
On Linux, you might generate at runtime some C++ code into /tmp/generated.cc, compile that code into a /tmp/generated-plugin.so plugin by forking (perhaps with system(3) or popen(3)...) some command like g++ -Wall -O -fPIC -shared /tmp/generated.cc -o /tmp/generated-plugin.so then use dlopen(3) and dlsym(3). Use then extern "C" functions, and see the C++ dlopen minihowto. You might be interested in __attribute__((constructor)).
My personal experience (in past projects that I am not allowed to mention here, but are mentioned on my web page) is that you can on Linux generate many hundred thousands plugins. I would still dare mention my manydl.c program (whose GPLv3+ license allows you to adapt it to C++).
At the conceptual level, reading the GC handbook might be helpful. There is a delicate issue in garbage collecting code (or plugins).
Read also Drepper's paper How to write shared libraries, see elf(5), ld(1), nm(1), readelf(1), ldd(1), execve(2), mmap(2), syscalls(2), dlopen(3), dlsym(3), Advanced Linux Programming, the Program Library HOWTO, the C++ dlopen mini-howto, and Ian Taylor's libbacktrace.

To hide the meaning of the exported functions on UNIX, you can just obfuscate their names with simple renaming, by using #defines. Like this:
#define YourGoodFunction_CreateSomething MeaninglessFunction1
#define YourGoodFunction_AddSomethingElseToSomething lxstat__
#define YourGoodFunction_SaveSomething GoAway_Cracker
#define YourGoodFunction_ReleaseSomething Abracadabra
and so on.
In the case of a few functions this can be done by hand. If you need thousands, you should use code generation.
get the list of your real function names, use grep, awk, cut, etc.
prepare a dictionary of the meaningless names
write a script (or binary) generator which will output a C header file with #defines as shown above.
The only question is how you can get the dictionary. Well, I see a few options here:
you could ask your co-workers to randomly type on their keyboards ;-)
generate random strings, e.g. read 10-20 bytes from /dev/urandom and pipe them through base64
use some real dictionary (general English, specific domain)
collect real system API names and change them a bit: __lxstat -> lxstat__
this is limited only by your imagination.
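A throwaway generator along those lines might look like this (file names and the fn_ prefix are invented for the sketch):

```shell
# list of real exported names, one per line
cat > real_names.txt <<'EOF'
YourGoodFunction_CreateSomething
YourGoodFunction_SaveSomething
EOF

# emit a header mapping each real name to a random meaningless one
while read -r name; do
  alias=$(head -c 32 /dev/urandom | base64 | tr -dc 'a-zA-Z' | head -c 12)
  printf '#define %s fn_%s\n' "$name" "$alias"
done < real_names.txt > obfuscated_names.h

cat obfuscated_names.h
```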

You can write a version-script and pass it to the linker to do this.
A simple script looks like this:
testfile.exp:
{
global:
    myExportedFunction1;
    myExportedFunction2;
local: *;
};
Then link your executable with the following options:
-Wl,--version-script=testfile.exp
When applied to a shared library this will still list the symbols in the .so file for debugging purposes, but it is not possible to access them from the outside of the library.
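An end-to-end sketch of the approach (file and function names invented; assumes gcc and binutils): only the two listed functions survive in the dynamic symbol table.

```shell
cat > vis.c <<'EOF'
int myExportedFunction1(void) { return 1; }
int myExportedFunction2(void) { return 2; }
int internalHelper(void)      { return 3; }
EOF
cat > testfile.exp <<'EOF'
{
global:
    myExportedFunction1;
    myExportedFunction2;
local: *;
};
EOF
gcc -shared -fPIC vis.c -o libvis.so -Wl,--version-script=testfile.exp
nm -D --defined-only libvis.so   # lists only the two exported functions
```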

I was looking for a solution to the same problem. So far I couldn't find a robust solution. However, as a proof of concept I used objcopy to achieve the desired results. Basically, after compiling an object file I redefine some of its symbols. Then the translated object file is used to build the final shared object or executable. As a result, the class/method names that could be used as a hint to reverse engineer my algorithm are completely replaced by the meaningless names m1, m2, m3.
Here is the test I used to ensure that the idea works:
Makefile:
all: libshared_object.so executable.exe

clean:
	rm -f *.o *.so *.exe

libshared_object.so: shared_object.o
	g++ -fPIC -shared -O2 $< -o $@
	strip $@

shared_object.o: shared_object.cpp interface.h
	g++ -fPIC -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	        --redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	        --redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@

executable.exe: executable.o libshared_object.so
	g++ -O2 $< -L. -lshared_object -o $@
	strip $@

executable.o: executable.cpp interface.h
	g++ -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	        --redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	        --redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@

run: all
	LD_LIBRARY_PATH=. ./executable.exe
interface.h
class MyVerySecretClass
{
private:
int secret_var;
public:
MyVerySecretClass();
~MyVerySecretClass();
void secret_method1();
void secret_method2();
void secret_method3();
};
shared_object.cpp
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include "interface.h"
MyVerySecretClass::MyVerySecretClass()
: secret_var(0)
{}
MyVerySecretClass::~MyVerySecretClass()
{
secret_var = -1;
}
void MyVerySecretClass::secret_method1()
{
++secret_var;
}
void MyVerySecretClass::secret_method2()
{
printf("The value of secret variable is %d\n", secret_var);
}
void MyVerySecretClass::secret_method3()
{
char cmdln[128];
sprintf( cmdln, "pstack %d", getpid() );
system( cmdln );
}
executable.cpp
#include "interface.h"
int main ( void )
{
MyVerySecretClass o;
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method3();
return 0;
}
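The renaming step above can be sanity-checked in isolation with a tiny invented class (assumes g++, objcopy, and nm are available):

```shell
cat > secret.cpp <<'EOF'
struct S { int f(); };
int S::f() { return 42; }
EOF
g++ -c -fPIC secret.cpp -o secret.o
nm secret.o                                  # shows the mangled _ZN1S1fEv
objcopy --redefine-sym _ZN1S1fEv=m1 secret.o
nm secret.o                                  # the method is now just "m1"
```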

Related

Relocatable code for .SO and .DLL libraries

I am developing a C++ library that requires some external assembly functions to be included.
Currently, the C/C++ functions are being declared this way (generic format, not the exact code):
#if defined _WIN32
#define DLL_ENTITY __declspec(dllexport)
#endif
#if defined _WIN32
DLL_ENTITY
#endif
int Function (int argument);
and compile it (using a Makefile) with GCC using the -fPIC flag to create relocatable code that can be used from the programs linked to my library. For example (one command output of my Makefile):
g++ -I`pwd`/.. -Wall -fPIC -march=x86-64 -mtune=generic -g -c sort.cpp
and in Windows I create and configure a project with executable format DLL option in Visual Studio, then it does all the job.
Okay, my assembly functions look like this:
global function
global _function
function:
_function:
ENTER reserved_bytes,nest_level
; ...get possible parameters...
; ...do something...
LEAVE
ret
Well, according to the NASM manual, for the Windows DLL libraries I must add something like:
export function
My doubts are these:
1) For Linux it does not mention anything about 'export'; I guess it is the same as with the C/C++ function/class prototypes that do not require any special treatment, declared the same way as in standalone programs. Is that right? Just use 'export' for Windows and nothing for Linux?
2) What about relocatable code generation? For Windows, does the 'export' keyword make the code relocatable, or JUST EXPORTABLE? And for Linux, do I need some flag equivalent to -fPIC, or must I create the relocatable code by using BASED addressing? For example:
add WORD[BP+myNumber],10h
instead of
add WORD[myNumber],10h
but in this case, how can I find the base address of the function to set BP (EBP/RBP) to it (just in case that I require to access LOCAL variables or data)?

Link two different version of protobuf library in the same C++ project

Can I use both the protobuf 2.6 and 3.0 libraries in the same C++ project and link them together?
You cannot link two different versions of libprotobuf into the same program. (It may be possible on some OS's, but it definitely won't work on Linux, where the declarations with the same names would overwrite each other. It may be possible to make work on Windows or Mac, but it's probably not a good idea to rely on this.)
However, you don't need to do this. libprotobuf 3.x supports both "proto3" and "proto2" syntax. As long as you can rebuild your code from source (including regenerating the .pb.h and .pb.cc files), you should be able to rebuild everything using version 3.x, even if some of the proto files use proto2-exclusive features.
While C++ might not support the concept of linking multiple versions of the same symbol into a single object, it can still be done. Executable formats like ELF or PE support many things that are not part of the C++ standard. Using symbol visibility and partial linking, one can have code that uses two different copies of the same symbols.
I'm guessing you want to link to two different already compiled protobuf shared libraries. That won't work. You'll have to statically link at least one protobuf and compile it yourself.
It would look something like this:
// lib1.c
void test(void) { printf("test version 1\n"); }
// lib2.c
void test(void) { printf("test version 2\n"); }
// uselib1.c
void test(void);
void usetest(void) { test(); }
// main.c
void test(void);
void usetest(void);
int main(void) { usetest(); test(); }
We want usetest() from uselib1.c to call the version of test() that's in lib1.c, while main() should call the version that's in lib2.c. If we just link all these together, it doesn't work:
$ gcc uselib1.c lib1.c main.c lib2.c
/tmp/ccqQhm5c.o: In function `test':
lib2.c:(.text+0x0): multiple definition of `test'
You can't have multiple copies of test(). But what we can do is partially link just uselib1 and lib1, which works since there is only one test() within just those two objects. Then the symbols from lib1 are localized so that nothing else using the combined uselib1+lib1 will see the lib1 symbols.
$ gcc -c -fvisibility=hidden lib1.c
$ gcc -c uselib1.c
$ ld -r uselib1.o lib1.o -o combined1.o
$ objcopy --localize-hidden combined1.o
$ gcc main.c lib2.c combined1.o
$ ./a.out
test version 1
test version 2
When compiling lib1, I use -fvisibility=hidden to mark all the symbols in lib1 as hidden. This would make a difference if it were a shared library. As an object (or static lib) they can still be used by other code, and they are used when "ld -r" partially links lib1.o and uselib1.o into combined1.o. Then objcopy localizes all the hidden symbols. This has the effect of making the copy of test() inside combined1.o act like it was a static function. When combined1.o, main.c, and lib2.c are all linked, main.c will use test() from lib2.c like we want.
Of course, using two different versions of the same library in one project is a nightmare to maintain. You'll constantly be including the wrong version's headers and getting subtle bugs.
I'm not familiar with the library, but in general no, unless each library is 100% contained in its own unique namespace. Otherwise there will be name clashes with each class, function, etc.

How dynamic linking works, its usage and how and why you would make a dylib

I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that was employed to take full advantage of the virtual memory of the system. One process can share its pages with other processes. For example, libc++ needs to be linked with every C++ program, but instead of copying it into every executable, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled, it needs to have references to the C++ library functions and code (say, for example, the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system, since the compiler has to put a reference to the dynamic library in the executable?
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled. It needs to have references to the C++
library functions and code (say for example the code for the library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. objdump on a Mac is called gobjdump and comes with brew in the binutils package. Try this on a Mac...
gobjdump --syms /usr/lib/libz.dylib
You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like
g++ -Wall -g -pedantic DynoLib_main.cpp -ldyno -o dyno_main
Note the -ldyno in that command. This is telling the compiler (really the linker ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you the developer asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
DynamicLib();
int square(int a);
};
#endif
Compile them to be a shared library as follows. This is linux specific...
g++ -Wall -g -pedantic -shared -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>
using namespace std;
int main(void) {
    DynamicLib *lib = new DynamicLib();
    std::cout << "Square " << lib->square(1729) << std::endl;
    delete lib;
    return 0;
}
Compile it as follows
g++ -Wall -g -pedantic DynoLib_main.cpp -L. -ldyno -o dyno_main
./dyno_main
Square 2989441
You can also have a look at the main binary using nm. In the following I'm checking whether anything with the string square appears in it, i.e. whether the symbol I need from libdyno.so is referenced in my binary.
nm dyno_main | grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined, but this is the symbol name for the square method in the DynamicLib class that we created earlier. The odd-looking name is due to name mangling, which is its own topic.
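As an aside, the mangled name can be decoded with c++filt (shipped with binutils):

```shell
echo _ZN10DynamicLib6squareEi | c++filt
# DynamicLib::square(int)
```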
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc.) do the work. Note the -l flag names the library and the -L tells it where to look. There's a decent write-up on how the compiler finds things here
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld..
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.
If you managed to get here it pays dividends to learn about the linker ie ld. It plays an important job and is the source of a ton of confusion because most people start out dealing with a compiler and think that compiler == linker and this is not true.
The main difference is that you include static linked libraries with your app. They are linked when you build your app. Dynamic libraries are linked at run time, so you do not need to include them with your app. These days dynamic libraries are used to reduce the size of apps by having many dynamic libraries on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.

Can an export map select only the functions you want to link to?

I am writing a test harness with Googletest and need to control the symbol table to avoid conflicts (the code base is mainly C with a bit of C++ on Linux).
I am looking for a way to link against only the functions I want in a file and also to be able to create custom sets of functions to link against for each test.
This is a bit broad I know but any suggestions or ideas will be most welcome!
You can use a version script for your linker to define which symbols should be exported in the symbol table.
Such a version script can look like this:
{
global:
    symb1;
    symb2;
    symb3;
local: *;
};
This example will only export the symbols symb1-3, all other symbols are omitted from the symbol table.
Now specify this script as version script for the linker, an example for a shared library:
cc -shared obj1.o obj2.o obj3.o -o library.so -Wl,--version-script=<scriptname>
Even more control can be gained through symbol versions, more details can be found in the ld-documentation: http://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_25.html
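For illustration only (symbol and version-node names invented), a versioned script builds on the same global/local split by naming version nodes and chaining them:

```
LIB_1.0 {
global:
    symb1;
    symb2;
local: *;
};

LIB_1.1 {
global:
    symb3;
} LIB_1.0;
```

Symbols added later go into a new node (here LIB_1.1) that inherits from the previous one, so old clients keep binding to the versions they were linked against.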

gcc: Linking C library in a C++ app results in "multiple definition of" errors

I have a working C library which I want to link to a C++ application using gcc but the linker (g++) is giving me the "multiple definition" error. With a C application and gcc it works.
The headers defining the interface all contain the:
#ifdef __cplusplus
extern "C" {
#endif
I checked the library using the "nm" command and it does have multiple definitions of the method (the method in question is not from the public interface).
My questions are:
Why does my library have multiple definitions (some have the T while others have U)?
Why does it work if the application including the file is a C application? (I'm using -Wall to build.)
Do I need any special attribute or use a specific file extension to make it work or is the case that I need to go back to programming school :) ?
Paying more attention to the lib.a file I can see that one of the objects is included twice. For example, I have two sections for the same object:
obj1.o
00000000 T Method
obj2.o
00000000 T Hello
obj1.o
00000000 T Method
I guess this is the problem?
Any help is really appreciated.
My wild guess is that the "#define BLAHBLAH_H" and "#ifndef BLAHBLAH_H / #endif" guards are set outside the 'extern "C" {}' block.
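For reference, the conventional layout (header and function names invented) keeps the include guard outermost, with the extern "C" block just inside it:

```c
/* mylib.h -- hypothetical layout sketch; the static inline body
   stands in for what would normally be plain declarations */
#ifndef MYLIB_H
#define MYLIB_H

#ifdef __cplusplus
extern "C" {
#endif

static inline int my_function(int x) { return x + 1; }

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_H */
```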
After playing around I found that the whole command line (it's kind of a complex application with automated compilation and linkage) actually contained the --whole-archive parameter before the inclusion of the C library. Moving the library after the --no-whole-archive fixed the problem.
Original command:
gcc -Wl,--whole-archive -l:otherlibs -Llibpath -l:libname -Wl,--no-whole-archive -o myApp hello.c
Fixed command:
gcc -Wl,--whole-archive -l:otherlibs -Wl,--no-whole-archive -Llibpath -l:libname -o myApp hello.c
Thank you for everyone's help guys and sorry if I didn't provide enough/accurate information.
Best Regards