I'm using a 3rd party static library in my application (which means I can't recompile it).
This library was built with -stdlib=libstdc++, i.e. for compatibility with macOS versions older than 10.9; since 10.9, the default is -stdlib=libc++.
But this means all my application code has to be built with -stdlib=libstdc++, which prevents me from using some C++11 features.
So I tried to convert this static library to a shared library, but then the symbols were not visible.
Indeed, using nm I can see they are marked t instead of T which seems to be OK when linking statically, but is not when linking dynamically.
I wanted to change the visibility of the symbols I need to global.
I'm not sure how I can achieve this on macOS, since objcopy is not available here (it has an option called --globalize-symbol which seems to do what I want, according to this SO answer).
I managed to fix this using only ld.
ld -r lib.a -o new_lib.a -alias _old_function_name _new_function_name
Symbols exported this way are marked as global.
It is not possible to reuse the same name in one call, but it works when done in two steps, i.e. _function (t) -> function (T) -> _function (T).
Then I just built my shared library using something like:
g++ -fpic -shared -Wl,-force_load new_lib.a -stdlib=libstdc++ -o lib.dylib
The only thing that bothers me is that I didn't find how to remove the old symbols when creating the new, global ones, but this doesn't seem to create any issue.
I want to hide as much information as I can from ldd, so I'm learning how to statically link in libraries instead of dynamically linking them. I've read from another stackoverflow post that the correct syntax is
g++ -ldynamiclib -o exe files.cpp staticlib.a
Thus, my current compilation code looks like this:
STATIC_LIB=""
STATIC_LIB="$STATIC_LIB ${PATH}/libcrypto.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libdl-2.5.so" # I couldn't find the .a version for this, so I tried doing it this way, and have also tried doing just -ldl
STATIC_LIB="$STATIC_LIB ${PATH}/libstdc++.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libgcc.a"
STATIC_LIB="$STATIC_LIB ${PATH}/libc.a"
g++ -g -I${INCLUDE_PATH} -o executable file1.cpp file2.cpp $STATIC_LIB
I've confirmed with ldd that this way works for libcrypto, as it is an external library that I brought in. However, this does not work at all for everything else, and I can still see them being listed when I use ldd. Does anyone know the correct way of doing this?
P.S. I've also tried several other alternatives such as including -static, or using -Wl,-Bstatic, and I couldn't get either of those to work. Not sure if it's my syntax or if it's just not possible.
Those libraries libstdc++, libgcc and libc are special in that they're very fundamental to the running of any program compiled with gcc. Special gcc options exist if you want to link them statically, namely -static-libstdc++ and -static-libgcc.
Note that you should really know what you're doing if you choose these options. It can create portability problems for your program, many of which express themselves in unintuitive ways.
I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that takes advantage of the system's virtual memory: one process can share its pages with other processes. For example, the C++ standard library (libc++) needs to be linked with all C++ programs, but instead of copying the library into every executable, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled, it needs to have references to the C++ library functions and code (say for example the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system, since the compiler has to make a reference to the dynamic library in the executable?
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled, it needs to have references to the C++
library functions and code (say for example the code for the library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. On a Mac, objdump is called gobjdump and comes with the binutils package from brew. Try this on a Mac...
gobjdump --syms /usr/lib/libz.dylib
You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like
g++ -Wall -g -pedantic DynoLib_main.cpp -ldyno -o dyno_main
Note the -ldyno in that command. This is telling the compiler (really the linker ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you the developer asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
DynamicLib();
int square(int a);
};
#endif
Compile them into a shared library as follows. This is Linux-specific...
g++ -Wall -g -pedantic -shared -fPIC -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>
using namespace std;
int main(void) {
    DynamicLib lib; // automatic storage, so nothing is leaked
    std::cout << "Square " << lib.square(1729) << std::endl;
    return 0; // 0 signals success to the shell
}
Compile it as follows
g++ -Wall -g -pedantic DynoLib_main.cpp -L. -ldyno -o dyno_main
LD_LIBRARY_PATH=. ./dyno_main
Square 2989441
You can also have a look at the main binary using nm. In the following I'm checking whether anything contains the string square, i.e. whether the symbol I need from libdyno.so is referenced in any way in my binary.
nm dyno_main | grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined, but this is the symbol name for the square method of the DynamicLib class that we created earlier. The odd-looking name is due to name mangling, which is its own topic.
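To make the mangled name above less mysterious, here is a minimal sketch that turns it back into a readable C++ name. It assumes GCC or Clang, whose runtime provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangle is my own and not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Turn an Itanium-ABI mangled symbol (as printed by nm) back into a
// readable C++ name. Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc()ed buffer that the caller must free.
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

Calling demangle("_ZN10DynamicLib6squareEi") gives DynamicLib::square(int). On the command line, nm -C (or piping through c++filt) does the same job.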
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc.) do the work. Note that the -l flag names the library and -L tells the linker where to look. There's a decent write-up on how the compiler finds things here
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld..
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.
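As a mental model of what -L and -l do together, here is a small sketch that lists the file names a GNU-style linker would try for -lusb. It is only an illustration under stated assumptions: the function name link_candidates and the directory lists are hypothetical, the real search is done inside ld, and on macOS the shared library suffix would be .dylib rather than .so.

```cpp
#include <cassert>
#include <string>
#include <vector>

// List the files a GNU-style linker would try for "-l<name>", in order:
// every -L directory first (in command-line order), then the default
// directories. Within one directory, lib<name>.so is tried before
// lib<name>.a.
std::vector<std::string> link_candidates(
        const std::string& name,
        const std::vector<std::string>& cmdline_dirs,    // from -L options
        const std::vector<std::string>& default_dirs) {  // e.g. /usr/lib
    std::vector<std::string> dirs(cmdline_dirs);
    dirs.insert(dirs.end(), default_dirs.begin(), default_dirs.end());

    std::vector<std::string> candidates;
    for (const std::string& dir : dirs) {
        candidates.push_back(dir + "/lib" + name + ".so");
        candidates.push_back(dir + "/lib" + name + ".a");
    }
    return candidates;
}
```

For example, link_candidates("usb", {"."}, {"/usr/lib"}) yields ./libusb.so, ./libusb.a, /usr/lib/libusb.so, /usr/lib/libusb.a, and the first file that exists is the one that gets linked.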
If you managed to get here, it pays dividends to learn about the linker, i.e. ld. It does an important job and is the source of a ton of confusion, because most people start out dealing with a compiler and think that compiler == linker, which is not true.
The main difference is that you include static linked libraries with your app. They are linked when you build your app. Dynamic libraries are linked at run time, so you do not need to include them with your app. These days dynamic libraries are used to reduce the size of apps by having many dynamic libraries on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.
I have a library called A.a, and its .hpp file called A.hpp. When programs need to use this library, they #include "A.hpp", and get linked to it like this: g++ test1.cpp A.a -o test1. I'd like to be able to only compile it like this g++ test1.cpp -o test1, without explicitly typing A.a in there, just like I don't need to explicitly link my program with iostream. How can I achieve this?
It can be done on Visual C++ (the compiler can embed some linker options in object files, requests to link a library being one of those that are possible).
Gcc (and, to my knowledge, clang) do not have such a feature. You have to provide the libraries on the command line; there is no way around it (build tools are not technically a way around it; they also put the libraries onto the command lines they use to run the linker).
For VC, I can write a DEF file and use the 'NONAME' directive to leave only the ordinal number in the DLL's export table.
How could I do the same thing with gcc and ELF format shared library?
Or, is there something equivalent in ELF shared library like the ordinal number in a PE format DLL? If not, how could I hide the exported symbol's name within a shared library?
======================================
UPDATE: Some additional descriptions:
In Windows, you can export a function by placing only an integer ID (the ordinal) in the export table, with an empty name.
To show it, the normal layout for a dll's export table looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo22.gif.
the "NONAME" one looks like this: http://home.hiwaay.net/~georgech/WhitePapers/Exporting/HowTo23.gif.
Notice the function names are "N/A" in the second picture. Here is a full explanation of it: hxxp://home.hiwaay.net/~georgech/WhitePapers/Exporting/Exp.htm.
======================================
UPDATE: Many thanks to everyone who gave me advice. In the end, I decided to keep using a static library on Linux/POSIX platforms, but to extract the small "special part" (which uses some features not suitable for a static lib, e.g. TLS slots) into a normal shared library. Because this small shared library does only a few things, and that work is not sensitive at all, there is no need to obscure/hide its APIs.
I think it's the simplest way to solve my problem :-D
The previous answers regarding __attribute__ ((visibility ("hidden"))) are good when you want to maintain the code long term, but if you only have a few symbols that you want visible and want a quick fix... On the symbols that you want to export, add
__attribute__ ((visibility ("default")))
Then you can pass -fvisibility=hidden to the compiler
There is a thorough explanation here:
http://gcc.gnu.org/wiki/Visibility
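As a minimal sketch of how the two attributes look in source, assuming GCC or Clang on an ELF platform: the macro names MYLIB_EXPORT and MYLIB_HIDDEN and both functions are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical macro names; the attribute itself is the GCC/Clang extension.
#define MYLIB_EXPORT __attribute__((visibility("default")))
#define MYLIB_HIDDEN __attribute__((visibility("hidden")))

// Internal helper: kept out of the shared library's dynamic symbol table.
MYLIB_HIDDEN int scale_by_three(int x) { return x * 3; }

// Public entry point: stays visible even when the whole library is
// compiled with -fvisibility=hidden.
MYLIB_EXPORT int mylib_triple(int x) { return scale_by_three(x); }

// Possible build line (file names are assumptions):
//   g++ -fPIC -shared -fvisibility=hidden mylib.cpp -o libmylib.so
// Afterwards, nm -D -C libmylib.so should list mylib_triple(int)
// but not scale_by_three(int).
```

With -fvisibility=hidden on the command line, the MYLIB_HIDDEN marker becomes redundant and only the exported functions need annotating, which is the "quick fix" described above.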
Edit: An alternative would be to build a static library/archive (make .a archive with ar -cru mylib.a *.o) or combine the objects into a single object file according to this combine two GCC compiled .o object files into a third .o file
If you are asking "Why combine object files instead of just making a static library?" ... because the linker will treat .o files differently than .a files (I don't know why, just that it does). Specifically, it will allow you to link a .o file into a shared library or a binary even if all of the symbols are hidden (even the ones you are using). This has the added benefit of reducing startup times (one less DSO and a lot fewer symbols to look up) and binary size (the symbols typically make up ~20% of the size, and stripping only takes care of about half of that, just the externally visible parts).
for binaries strip --strip-all -R .note -R .comment mybinary
for libraries strip --strip-unneeded -R .note -R .comment mylib.so
More on the benefits of static linking here: http://sta.li/faq but they don't discuss licensing issues which are the main reason not to use a static library and since you are wanting to hide your API, that may be an issue
Now that we have an object that is "symbol clean", it is possible to use our combined object to build a libpublic.so by linking private.o and public.c (which aliases/exports only what you want public) into a shared library.
This method lends itself well to finding the "extra code" that is unneeded in your public API as well. If you add -fdata-sections -ffunction-sections to your object builds, when you link with -Wl,--gc-sections,--print-gc-sections , it will eliminate unused sections and print an output of what was removed.
Edit 2 - or you could hide the whole API and alias only the functions you want to export
alias ("target")
The alias attribute causes the declaration to be emitted as an alias for another symbol, which must be specified. For instance,
void __f () { /* Do something. */; }
void f () __attribute__ ((weak, alias ("__f")));
defines `f' to be a weak alias for `__f'. In C++, the mangled name for the target must be used. It is an error if `__f' is not defined in the same translation unit.
Not all target machines support this attribute.
You could consider using GCC function attribute for visibility and make it hidden, i.e. adding __attribute__((visibility ("hidden"))) at many appropriate places in your header file.
You'll thus hide your useless symbols, and keep the good ones.
This is a GCC extension (perhaps supported by other compilers like Clang or Icc).
addenda
In the Linux world, a shared library should export functions (or perhaps global data) by their names, as published in header files. Otherwise, don't call these functions "exported" -they are not!
If you absolutely want to have a function in a shared library which is reachable but not exported, you could register it in some way (for instance, putting the function pointer in some slot of a global data, e.g. an array), this means that you have (or provide) some function registration machinery. But this is not an exported function anymore.
To be more concrete, you could have in your main program a global array of function pointers
// in a global header.h
// signature of some functions
typedef void signature_t(int, char*);
#define MAX_NBFUN 100
// global array of function pointers
extern signature_t *funtab[MAX_NBFUN];
then in your main.c file of your program
signature_t *funtab[MAX_NBFUN];
Then in your shared object (e.g. in a myshared.c file compiled into libmyshared.so) a constructor function:
static void my_constructor(void) __attribute__((constructor));
static void myfun(int, char*); // defined elsewhere in the same file
static void
my_constructor(void) { // called at shared object initialization
funtab[3] = myfun;
}
Later on your main program (or some other shared object) might call
funtab[3](124, "foo");
but I would never call such things "exported" functions, only reachable functions.
See also C++ software like Qt, FLTK, RefPerSys, GCC, GTKmm, FOX-Toolkit, Clang, etc.... They all are extendable thru plugins or callbacks or closures (and internally a good C++ compiler would emit and optimize calls to closures for C++ lambda expressions). Look also inside interpreters like Python, fish, Lua, or GNU guile, you can extend them with C++ code.
Consider also generating machine code on the fly and using it in your program. Libraries like asmjit or libgccjit or LLVM or GNU lightning could be helpful.
On Linux, you might generate at runtime some C++ code into /tmp/generated.cc, compile that code into a /tmp/generated-plugin.so plugin by forking (perhaps with system(3) or popen(3)...) some command like g++ -Wall -O -fPIC -shared /tmp/generated.cc -o /tmp/generated-plugin.so then use dlopen(3) and dlsym(3). Use then extern "C" functions, and see the C++ dlopen minihowto. You might be interested in __attribute__((constructor)).
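The dlopen(3)/dlsym(3) step can be sketched as follows. This assumes a POSIX system; the helper name call_plugin and the double(double) signature of the plugin function are my own choices for illustration, and on older glibc versions the program has to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Assumed signature of the extern "C" function exported by the plugin.
using plugin_fn = double (*)(double);

// Load the shared object at `path`, look up `symbol`, call it with `x`.
// Returns false and fills `err` if loading or lookup fails.
bool call_plugin(const char* path, const char* symbol, double x,
                 double& result, std::string& err) {
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        const char* msg = dlerror();
        err = msg ? msg : "dlopen failed";
        return false;
    }
    dlerror();  // clear any stale error before dlsym
    void* sym = dlsym(handle, symbol);
    const char* msg = dlerror();
    if (msg != nullptr || sym == nullptr) {
        err = msg ? msg : "symbol not found";
        dlclose(handle);
        return false;
    }
    // POSIX guarantees that this conversion works for dlsym results.
    result = reinterpret_cast<plugin_fn>(sym)(x);
    dlclose(handle);
    return true;
}
```

With a plugin built as described above, a call could look like call_plugin("/tmp/generated-plugin.so", "plugin_entry", 2.0, result, err), where plugin_entry is a hypothetical extern "C" double plugin_entry(double) defined in the generated code. Declaring it extern "C" is what lets dlsym find it under its plain name.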
My personal experience (in past projects that I am not allowed to mention here, but are mentioned on my web page) is that on Linux you can generate many hundreds of thousands of plugins. I would still dare mention my manydl.c program (whose GPLv3+ license allows you to adapt it to C++).
At the conceptual level, reading the GC handbook might be helpful. There is a delicate issue in garbage collecting code (or plugins).
Read also Drepper's paper How to write shared libraries, see elf(5), ld(1), nm(1), readelf(1), ldd(1), execve(2), mmap(2), syscalls(2), dlopen(3), dlsym(3), Advanced Linux Programming, the Program Library HOWTO, the C++ dlopen mini-howto, and Ian Taylor's libbacktrace.
To hide the meaning of the exported functions on UNIX, you can just obfuscate their names with simple renaming, by using #defines. Like this:
#define YourGoodFunction_CreateSomething MeaninglessFunction1
#define YourGoodFunction_AddSomethingElseToSomething lxstat__
#define YourGoodFunction_SaveSomething GoAway_Cracker
#define YourGoodFunction_ReleaseSomething Abracadabra
and so on.
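A small sketch of what the preprocessor does with such a #define: the source keeps the readable name, while the function that actually gets compiled carries the meaningless one. The function body and the helper emitted_name are stand-ins I made up for the demonstration.

```cpp
#include <cassert>
#include <string>

// Content of the obfuscation header: readable name -> meaningless name.
#define YourGoodFunction_CreateSomething MeaninglessFunction1

// Two-step stringification so the macro is expanded before being quoted.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// The source uses the readable name; after preprocessing the function is
// really called MeaninglessFunction1. The body is only a stand-in.
extern "C" int YourGoodFunction_CreateSomething(int x) { return x + 1; }

// The name the compiler sees for the function above.
std::string emitted_name() {
    return STRINGIFY(YourGoodFunction_CreateSomething);
}
```

Here emitted_name() returns "MeaninglessFunction1", and both YourGoodFunction_CreateSomething(41) and MeaninglessFunction1(41) call the same function. Client code that includes the same header keeps compiling against the readable names.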
With only a few functions this can be done by hand. If you need thousands, you should use code generation.
get the list of your real function names, use grep, awk, cut, etc.
prepare a dictionary of the meaningless names
write a script (or binary) generator which will output a C header file with #defines as shown above.
The only question is how you can get the dictionary. Well, I see a few options here:
you could ask your co-workers to randomly type on their keyboards ;-)
generate random strings like: read(/dev/urandom, 10-20 bytes) | base64
use some real dictionary (general English, specific domain)
collect real system API names and change them a bit: __lxstat -> lxstat__
this is limited only by your imagination.
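The generator step could be sketched like this. It is a minimal version under my own assumptions: the function name make_obfuscation_header and the include guard OBFUSCATED_NAMES_H are hypothetical, and the list of real names and the dictionary are taken as already prepared by the earlier steps.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Pair each real function name with one dictionary entry and return the
// text of a C header made of the corresponding #define lines.
std::string make_obfuscation_header(
        const std::vector<std::string>& real_names,
        const std::vector<std::string>& dictionary) {
    if (dictionary.size() < real_names.size()) {
        throw std::invalid_argument("dictionary has too few entries");
    }
    std::string header =
        "#ifndef OBFUSCATED_NAMES_H\n#define OBFUSCATED_NAMES_H\n";
    for (std::size_t i = 0; i < real_names.size(); ++i) {
        header += "#define " + real_names[i] + " " + dictionary[i] + "\n";
    }
    header += "#endif\n";
    return header;
}
```

The returned string would be written to a header that both the library and its clients include. The dictionary entries must be unique and must be valid C identifiers, which raw base64 output is not (it can contain +, / and =), so such strings need filtering first.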
You can write a version-script and pass it to the linker to do this.
A simple script looks like this:
testfile.exp:
{
global:
myExportedFunction1;
myExportedFunction2;
local: *;
}
Then link your executable with the following options:
-Wl,--version-script=testfile.exp
When applied to a shared library this will still list the symbols in the .so file for debugging purposes, but it is not possible to access them from the outside of the library.
I was looking for a solution to the same problem. So far I couldn't find a robust one. However, as a proof of concept I used objcopy to achieve the desired result. Basically, after compiling an object file I redefine some of its symbols. Then the translated object file is used to build the final shared object or executable. As a result, the class/method names that could be used as a hint to reverse engineer my algorithm are replaced by the meaningless names m1, m2, m3.
Here is the test I used to ensure that the idea works:
Makefile:
all: libshared_object.so executable.exe
clean:
	rm *.o *.so *.exe
libshared_object.so : shared_object.o
	g++ -fPIC --shared -O2 $< -o $@
	strip $@
shared_object.o : shared_object.cpp interface.h
	g++ -fPIC -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@
executable.exe : executable.o libshared_object.so
	g++ -O2 $< -L. -lshared_object -o $@
	strip $@
executable.o : executable.cpp interface.h
	g++ -O2 $< -c -o $@
	objcopy --redefine-sym _ZN17MyVerySecretClass14secret_method1Ev=m1 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method2Ev=m2 \
	--redefine-sym _ZN17MyVerySecretClass14secret_method3Ev=m3 $@
run: all
	LD_LIBRARY_PATH=. ./executable.exe
interface.h
class MyVerySecretClass
{
private:
int secret_var;
public:
MyVerySecretClass();
~MyVerySecretClass();
void secret_method1();
void secret_method2();
void secret_method3();
};
shared_object.cpp
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include "interface.h"
MyVerySecretClass::MyVerySecretClass()
: secret_var(0)
{}
MyVerySecretClass::~MyVerySecretClass()
{
secret_var = -1;
}
void MyVerySecretClass::secret_method1()
{
++secret_var;
}
void MyVerySecretClass::secret_method2()
{
printf("The value of secret variable is %d\n", secret_var);
}
void MyVerySecretClass::secret_method3()
{
char cmdln[128];
sprintf( cmdln, "pstack %d", getpid() );
system( cmdln );
}
executable.cpp
#include "interface.h"
int main ( void )
{
MyVerySecretClass o;
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method1();
o.secret_method2();
o.secret_method3();
return 0;
}