Can a static library produced by Visual Studio be stripped of symbols? - c++

I'll divide this question into 3 parts:
1. I would like to produce a static library and strip its symbols, similar to the strip command on Linux (debug info is already excluded). Can it be done?
2. Is there a Windows equivalent of the nm tool on Linux?
3. When creating a static library with VS2008, is it possible to define a script that excludes some of the produced .obj files from the build and from the static lib? Can it be dynamic? I mean I'd define a compilation mode in the script, and this would result in specific object files being excluded from the build.

If anything is visible that you feel should not be, try declaring it with the "static" keyword. This tells the compiler that it is accessible only within the current source file (translation unit).
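A minimal sketch of that suggestion, assuming hypothetical helper names (`scale`, `offset`, `public_entry`) that are not from the original question. Both `static` and an unnamed namespace give a function internal linkage, so other object files cannot link against it; only `public_entry` remains an external symbol.

```cpp
// helpers.cpp (hypothetical file name)

// Internal linkage via `static`: not visible to other translation units.
static int scale(int value) { return value * 2; }

// An unnamed namespace is the C++ way to get the same effect.
namespace {
int offset(int value) { return value + 1; }
}  // namespace

// External linkage: the only function other object files can call.
int public_entry(int value) { return offset(scale(value)); }
```

Whether the internal names still appear as text inside the .obj file depends on the compiler settings, so treat this as a way to limit linkage, not as a guarantee that the names vanish.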

There are cases where it would be convenient to be able to strip out all but a small number of "exported" public symbols, but it's not really feasible.
A static library is little more than a collection of .obj files. The internal dependencies haven't been resolved yet, and they won't be resolved until link time.
For example, if your .lib consists of foo.obj and bar.obj, and there's a call in foo.obj to a function defined in bar.obj, then that symbol must be available at link time, even if nothing outside of the library should be able to see it.
For that reason, you cannot strip the symbols (with the possible exception of file-scope static symbols). Even class methods that are protected or private (in the C++-sense) will exist in the symbol table, since the enforcement of the visibility is a compile-time issue, not a link-time one.
In contrast, a dynamic library is a standalone binary that has already been linked. References from foo.obj to bar.obj have already been resolved. Thus a DLL can be stripped of symbols except for the ones that must be exported (and even those can be renamed or replaced by ordinals).
If your DLL exposes a simple C API, then you're all set. But if you want to expose a C++ class, you're probably going to end up exporting all of its methods, even the protected and private ones (since inlining in the external application might result in direct calls to private methods).
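To illustrate the "simple C API" case, here is a sketch that hides a C++ class behind an opaque handle so that only four C functions need to be exported. All names (`COUNTER_API`, `Counter`, `counter_*`) are hypothetical, and the export macro is just one common way to mark exports, not something taken from the question.

```cpp
#include <new>

// Hypothetical export macro: dllexport on Windows, default visibility elsewhere.
#if defined(_WIN32)
#define COUNTER_API __declspec(dllexport)
#else
#define COUNTER_API __attribute__((visibility("default")))
#endif

// Internal C++ class: it never appears in the public header.
namespace {
class Counter {
public:
    void add(int n) { total_ += n; }
    int total() const { return total_; }
private:
    int total_ = 0;
};
}  // namespace

extern "C" {
// Clients only ever see this incomplete type.
typedef struct counter_handle counter_handle;

COUNTER_API counter_handle* counter_create() {
    return reinterpret_cast<counter_handle*>(new (std::nothrow) Counter());
}
COUNTER_API void counter_add(counter_handle* h, int n) {
    if (h) reinterpret_cast<Counter*>(h)->add(n);
}
COUNTER_API int counter_total(const counter_handle* h) {
    return h ? reinterpret_cast<const Counter*>(h)->total() : 0;
}
COUNTER_API void counter_destroy(counter_handle* h) {
    delete reinterpret_cast<Counter*>(h);
}
}  // extern "C"
```

With this layout the class methods themselves do not have to be exported, because client code can never inline a call into them.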

1. No. How would the users of the static library link against it without knowing where the symbols they use are defined?
2. Yes, try the DUMPBIN utility.
3. Well, yes. You can run the LIB utility with /REMOVE:foo.
That said, I think you are doing something that either is not worth doing or could be done a lot simpler than with removing library members.

I kept finding the names of certain (but not all) static functions in .obj files produced by VS2010. Interestingly, they were visible in my Release .obj files but not the Debug .obj files. I just used cygwin strings to perform the search:
$ strings myObjectFile.obj | grep myStaticFunctionName
I tracked it down to the "Whole Program Optimization = Yes" setting ("/GL"). When I switched this to "No" the function names no longer appear.
Update: As a followup test I opened the "cleansed" myObjectFile.obj in vim and I can still find them (with either :set encoding=utf-8 or :set encoding=latin1). I'm not sure why strings was missing the matches. Oh well.

Related

How do I *prevent* "weak" linking of static library symbols in Visual-C++?

As far as my practical tests go, when linking a static library (.lib) into your executable in Visual-C++, if any executable .obj file defines a duplicate symbol to one in the static library, the symbol in the static library will be silently ignored.
Confirm ( Feb 18 '10 at 17:46 Michael Burr):
MSVC used to behave such that if a symbol is defined in a .obj file
and a .lib it would use the one on the .obj file without warning. I
recall that it would also handle the situation where the symbol is
defined in multiple libs it would use the one in the library named
first in the list.
I can't say I've tried this in a while, but I'd be surprised if they
changed this behavior (especially that .obj defined symbols override
symbols in .lib files).
A brief test with VS 2010 RC indicates that the behavior I described is still there.
('Windows Static Library with Default Functions' seems also a confirmation to me)
Now first of all, I would love to be proven wrong, but at least for a regular C++ function this seems to be the way it is.
Second, is there any way to prevent this? I have a function, that when any binary links to the static library containing this function, I would like to confirm that the version from the static library is actually used, and not some leftover or whatever in the other project. (Do note: Fn in question is test_suite* init_unit_test_suite(int argc, char* argv[]), (*) so I cannot practically change it because it is from a third party lib.)
(*): This is the Boost.Test main function that should be supplied by a custom static lib of ours. If any dev creates a Unit Test project -- which are linked to the static lib automatically through a property sheet -- but erroneously also defines the function, the build should break instead of using the dev supplied function.
I think the linker behaves differently if you link against an independent .obj file that is not packaged in a static lib. At the very least you should get a warning or error about the duplicate symbol.
When I needed something similar a while ago I couldn't find it in the MS toolchain either, but there are two MS devices that come close and might be handy: __declspec(selectany) and the undocumented /alternatename linker option, which can be passed through #pragma comment(linker, ...). Perhaps linking to an .obj file and declaring the symbol as selectany would do the trick? If not, perhaps adding a #pragma comment(linker, "/alternatename:_YourSymbol=_DefaultExeSymbol") in the exe's .obj file would do it.

Workaround for when --whole-archive is not available

I'm trying to compile in an environment where the -Wl,--whole-archive flag is not supported (emscripten). How can I force the toolchain to include the exported symbols? The solution should meet as many of these properties as possible:
Could be applied on a single library (I don't want to include unused symbols from other libraries)
Could be automatically generated (for example by fetching the exported symbol table with nm?)
Would work with functions & member functions
I thought about computing a file with something like :
int x = (int)(&func_a)+(int)(&func_b)+...;
But it doesn't work with member functions, which cannot be cast to int (and can be private).
Do you have any ideas?
Ideas:
Use the --whole-archive flag just before the library you want, and add --no-whole-archive right after it, before listing the other libs, so that only the one you need is wholly linked. Also try adding the --export-dynamic flag with a linker that supports it.
Then go down the nm/objdump/export-map road: see http://accu.org/index.php/journals/1372 for exporting and building link info, and http://runtimecompiledcplusplus.blogspot.fr/ for using exported maps and code, so that you can mimic the -Wl behaviour in your code.
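The address-taking idea from the question can be extended to member functions by storing the pointers in their own types instead of casting them to int. The sketch below is only an illustration: `func_a`, `Shape`, `keep_symbol` and `force_link_symbols` are hypothetical names, it has not been tried on emscripten, and private members stay out of reach because their address cannot be taken from outside the class.

```cpp
#include <cstddef>

// Hypothetical stand-ins for symbols that would really live in the library.
int func_a(int x) { return x + 1; }
struct Shape {
    int sides() const { return 4; }
};

// Stores any pointer (free function or member function) in a volatile
// object of its own type, so the reference is not optimized away.
template <typename Ptr>
std::size_t keep_symbol(Ptr ptr) {
    volatile Ptr sink = ptr;
    (void)sink;
    return 1;
}

// Reference this function from the program that needs the symbols.
// A generated version would contain one keep_symbol call per exported name.
std::size_t force_link_symbols() {
    return keep_symbol(&func_a) + keep_symbol(&Shape::sides);
}
```

A file like this could be generated from the nm output of the single library in question, which keeps unused symbols from other libraries out of the link.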

How to detect multiply defined symbols

I have a common scenario:
Some source files for an executable which depend on some standard libraries and some user libraries. All the user libraries are statically linked into the executable whereas the standard libraries are linked dynamically to it.
Problem:
I believe that I have multiply defined symbols in my complete package (executable which already includes the user library code + shared standard libraries). The linker obviously has insight into it, but as I understand the linker won't complain unless it encounters multiple strong named symbols. My fear is that, while I am moving my code from solaris 8/sparc platform to solaris10/sparc platform, some standard unix functions have been implemented in the user libraries which are causing the app to crash at runtime. Note that the app runs fine in solaris 8/sparc platform
I have been facing weird issues which have led me to believe this might be the source
Modifying one variable from one library is changing the value of another variable in another library
Solaris 8-10: host2ip conversion problems
What I need:
Is there a way to easily list all multiply defined symbols?
Is there a way to easily list all multiply defined symbols stemming from user libraries?
Do you guys think the issue #1 might be caused by linking issues, or you feel it might be a sign of some other issue?
Edit1:
Since then I have learned that a map file generated with ld contains a section of multiply defined symbols, which I am going through to find names that look like standard library calls. For people who do not know, the linker will only fail to link if it finds multiple symbols with the same name AND those symbols are strong.
You could turn on MAP file generation in the compiler (actually linker) settings and look through the map file for symbols that match the UNIX system functions you are concerned about. You'd probably have to write a script to automate it, but this would be a good starting point. The command line switch is probably -map or something similar, it will depend on which compiler/linker you are using.
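As a rough sketch of such a script, the function below scans symbol listings and reports names that are defined in more than one file. It assumes GNU-style `nm -A` lines of the form `file:address TYPE name` (Solaris nm prints a different layout, so the parsing would need adjusting), and it only counts the strong global definition types T, D, B and R. The function name `find_multiply_defined` is made up for this example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Maps each multiply defined symbol to the set of files that define it.
std::map<std::string, std::set<std::string>>
find_multiply_defined(const std::vector<std::string>& nm_lines) {
    std::map<std::string, std::set<std::string>> owners;
    for (const std::string& line : nm_lines) {
        // Split the line on whitespace.
        std::istringstream in(line);
        std::vector<std::string> tok;
        for (std::string t; in >> t;) tok.push_back(t);
        if (tok.size() < 3) continue;

        const std::string& type = tok[tok.size() - 2];
        const std::string& name = tok.back();
        // Keep only strong global definitions; skip U (undefined), weak, local.
        if (type.size() != 1 || std::string("TDBR").find(type) == std::string::npos)
            continue;

        // The first token is "file:address"; cut at the last colon.
        std::size_t colon = tok[0].rfind(':');
        if (colon == std::string::npos) continue;
        owners[name].insert(tok[0].substr(0, colon));
    }
    // Drop symbols that have a single definition.
    for (auto it = owners.begin(); it != owners.end();) {
        if (it->second.size() < 2) it = owners.erase(it);
        else ++it;
    }
    return owners;
}
```

Filtering the result against a list of standard function names would then narrow it down to the collisions the question is worried about.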
The actual problem that was happening is:
The library (let's call it lib1) had an array like below
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];
This array is used, through an extern declaration, by another of my libraries (let's call it lib2), which in turn is used by my application.
While compiling lib2 (or was it the app, I'm not sure), we did not define ARRAY_SIZE at all. This somehow caused the compiler of lib2 (or the app) to miscalculate the size of global_array, which in turn caused it to place some other variable at a location that was already allocated to global_array.
After defining ARRAY_SIZE again while compiling my libs and apps, everything started behaving normally. I do not fully understand what caused the issue or why this resolves it, since extern declarations of arrays do not contain the size. Also, if the library really used the macro ARRAY_SIZE, why didn't the compilation fail? There is also the possibility that the name used for the define is a standard name (the actual string was FD_SETSIZE).
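The thread does not establish the exact mechanism, so the following is only a defensive sketch, not an explanation of the bug. It assumes the fix is to stop clients from depending on the macro at all: the library that defines the array also exports its element count. `global_array_size` and `store` are hypothetical helpers, and both "sides" are shown in one file for brevity.

```cpp
#include <cstddef>

struct SomeStruct {
    int value;
};

// --- lib1 side: the only place where the size is fixed ---
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];

// Exported so clients never need their own copy of ARRAY_SIZE.
std::size_t global_array_size() {
    return sizeof(global_array) / sizeof(global_array[0]);
}

// --- client side: would normally be another translation unit that ---
// --- declares `extern SomeStruct* global_array[];` and the size function ---
bool store(std::size_t index, SomeStruct* item) {
    if (index >= global_array_size()) return false;  // bounds come from lib1
    global_array[index] = item;
    return true;
}
```

With this arrangement a mismatched or missing definition of the macro in lib2 or the application cannot change what they believe the array size to be.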
My initial gut feeling about the linker was wrong.

static link library

I am writing a hello world C++ application. The #include directive helps the compiler or linker import the C++ library, and my cout << "hello world"; uses cout from that library. The generated exe is about 96k in size after compiling, so what instructions does this exe file actually contain? Does it also contain the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argc, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
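The resolution loop described above can be modelled in a few lines. This is a toy simulation under simplifying assumptions, not how any particular linker is implemented: `ObjectFile` and `link_from_library` are invented names, each object file is just a set of defined and needed symbol names, and the library is rescanned until nothing new gets pulled in.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of one member of a static library.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;  // symbols this object provides
    std::set<std::string> needs;    // symbols this object references
};

// Pulls in library members until `unresolved` is empty or no member helps.
// Returns the names of the members pulled in; whatever is left in
// `unresolved` afterwards would be reported as a link error.
std::set<std::string> link_from_library(const std::vector<ObjectFile>& library,
                                        std::set<std::string>& unresolved) {
    std::set<std::string> pulled;
    std::set<std::string> defined;
    bool progress = true;
    while (progress && !unresolved.empty()) {
        progress = false;
        for (const ObjectFile& obj : library) {
            if (pulled.count(obj.name)) continue;
            // Does this member satisfy at least one outstanding symbol?
            bool satisfies = false;
            for (const std::string& sym : obj.defines) {
                if (unresolved.count(sym)) { satisfies = true; break; }
            }
            if (!satisfies) continue;

            pulled.insert(obj.name);
            for (const std::string& sym : obj.defines) {
                defined.insert(sym);
                unresolved.erase(sym);
            }
            // The new member may bring its own unresolved references.
            for (const std::string& sym : obj.needs) {
                if (!defined.count(sym)) unresolved.insert(sym);
            }
            progress = true;
        }
    }
    return pulled;
}
```

Starting with only `cout` unresolved, a member that defines `cout` but needs `printf` drags in the member that defines `printf`, while members that satisfy nothing stay out of the executable.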
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so, that if you wanted to just use one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
On Windows, the parts of the libraries that are actually used are usually included in the .exe itself; on Linux the C++ runtime is typically linked dynamically by default. There are also optimization options that affect the size.
I guess this Wiki link will be useful : Static Libraries

C++ shared library shows internal symbols

I have built a shared library (.dll, .so) with VC++2008 and GCC.
The problem is that both libs contain the names of private symbols (classes, functions), even though they weren't exported.
I don't want my app to display the names of classes/functions that weren't exported.
Is there any way I can do that?
In GCC i did:
Compiled with -fvisibility=hidden and then made the public symbols visible with __attribute__((visibility("default")))
In VC++:
__declspec(dllexport)
Thanks!
For GNU toolchains you can use the strip command to remove symbols from object files. It takes various command-line options to control its behavior. It may do what you want.
You can create a header file to obfuscate the internal function and method names you want to be hidden, i.e. something like the following (it needs an include guard too):
#define someFunctionName1 sJkahe28273jwknd
#define someFunctionName2 lSKlajdwe98
#define someMethodName1 ksdKLJLKJl22fss
#define someMethodName2 lsk89hHHuhu7g
...and include this in the header files where the real definitions live.
The private keyword, when used for access specification, only takes effect at compile time and is intended as an aid to programmers, not a security feature. As you have found out, the "privacy" is implemented using lexical means.
It's easy to see that this must be so - if you implement two private functions with dependencies between each other in two separate .cpp files, the linker has to find the private names in the resulting object (or library) files.
Bottom line - C++ has no code security features - if you give someone the object code of your program, they will always be able to examine it.