I have a very large executable built on IBM AIX. With function-level linking enabled, the executable is 2.8GB; with function-level linking disabled, it grows to 3.50GB.
This most likely means that additional object files are being pulled in which my application doesn't need, right? If so, how can I find the symbols that are removed by function-level linking?
I tried looking at the nm output for both executables, but I had no idea what to look for or what to diff.
You need to add -Wl,--print-gc-sections to LDFLAGS (this is a GNU ld option, so it applies only if you link with GNU ld).
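For the nm comparison the question asks about, one approach is to extract the symbol names from each executable, sort them, and report the names present in one list but missing from the other. A minimal sketch in C, assuming the names have already been extracted (for example from nm output) and sorted; the function name symbols_only_in_first is hypothetical and not part of any tool:

```c
#include <stddef.h>
#include <string.h>

/*
 * Given two arrays of symbol names, each sorted in strcmp order,
 * store in out[] the names that appear in a[] but not in b[].
 * Returns the number of names written; out must have room for na entries.
 */
size_t symbols_only_in_first(const char *const *a, size_t na,
                             const char *const *b, size_t nb,
                             const char **out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na) {
        /* Once b is exhausted, every remaining name in a is unmatched. */
        int cmp = (j < nb) ? strcmp(a[i], b[j]) : -1;
        if (cmp < 0) {
            out[n++] = a[i++];   /* a[i] is missing from b */
        } else if (cmp > 0) {
            j++;                 /* b[j] is not in a; skip it */
        } else {
            i++;                 /* present in both lists */
            j++;
        }
    }
    return n;
}
```

Calling it with the symbol list of the larger executable as a and that of the smaller one as b yields the names that function-level linking dropped.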
I've been able to make a fair amount of progress in trying to add Bazel build files to enable the building of the gennorm2 tool in ICU. Here is my work-in-progress PR using the Bazel target //icu4c/source/tools/gennorm2.
I'm currently getting stuck when running bazelisk build //icu4c/source/tools/gennorm2 --verbose_failures --sandbox_debug with these errors.
They reference functions defined in urename.h. As I understand it, urename.h is also used to rename certain functions by appending a suffix with the version number (_68), but I defined a preprocessor constant U_DISABLE_RENAMING to disable that specific behavior. This only had the effect of changing the names of the undefined function names in the error output, but otherwise not changing it (ex: errors now complain of u_errorName instead of u_errorName_68).
The part that puzzles me is why the error output claims that these symbols are not found. As you can see, the target //icu4c/source/tools/gennorm2 depends on //icu4c/source/common:platform, which in turn depends on //icu4c/source/common:headers, which includes the field hdrs = glob(["unicode/*.h", "unicode/*.h",]), which should match /icu4c/source/common/unicode/urename.h.
In case it helps, this is the verbose log output when running make VERBOSE=1 using the current autotools-based configure + make build on a fresh checkout of ICU.
A teammate was able to take a look and help me reason through the errors and ultimately fix them.
The first thing is to acknowledge that it is indeed a linker error, which you can see by noticing that the error message references the linker program ld.
This is important because we had previously spent time in the wrong place, debugging the compile configs as if the problem happened during the compile phase, which comes before the link phase. This means that my attempts to solve the problem by specifying preprocessor defines for the compile phase were misguided. (I did learn one way to debug compile problems, though: take the raw GCC command printed by --verbose_failures --sandbox_debug, replace -c with -E, and change the argument of -o to a .txt file in /tmp. That file then holds what the compiler sees for the source file after all the includes have been recursively inlined.)
The project's documentation on dependencies revealed that I had mis-specified a dependency of one of the targets: it named only the headers (//icu4c/source/common:headers) instead of the relevant definitions and headers (//icu4c/source/common:platform).
After doing that, we solved another problem that was small and interesting. The gennorm2 target depends on code to get the current year (ex: for printing help messages that include the copyright statement with the year range). As an i18n library, ICU has code for that, somewhere in //icu4c/source/i18n:icu4ci18n. Depending on it pulls in an excessive amount of code for an isolated use case (and will cause problems for follow-on work), so we replaced the block of code in gennorm2 that calls those calendar functions (ucal_open, ucal_getNow, ucal_setMillis, ucal_get, ucal_close) with the libc date functions, which give us the year as a number. We also added linkopts = ["-ldl"]; note that libdl is the dynamic-loading library (dlopen and friends) rather than a date library, and the libc time functions themselves need no extra link option.
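The post does not show the replacement code, so the following is only a sketch of what getting the year from libc can look like, assuming the standard time() and localtime() functions; the function name current_year is hypothetical:

```c
#include <stddef.h>
#include <time.h>

/* Returns the current calendar year in local time, or -1 on failure. */
int current_year(void)
{
    time_t now = time(NULL);
    if (now == (time_t)-1)
        return -1;

    struct tm *local = localtime(&now);
    if (local == NULL)
        return -1;

    return local->tm_year + 1900;  /* tm_year counts years since 1900 */
}
```

Both calls are part of the standard C library, which is why this approach avoids the dependency on the ICU calendar code.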
I have a common scenario:
Some source files for an executable which depend on some standard libraries and some user libraries. All the user libraries are statically linked into the executable whereas the standard libraries are linked dynamically to it.
Problem:
I believe that I have multiply defined symbols in my complete package (the executable, which already includes the user library code, plus the shared standard libraries). The linker obviously has insight into this, but as I understand it the linker won't complain unless it encounters multiple strong symbols with the same name. My fear is that, as I move my code from the Solaris 8/SPARC platform to the Solaris 10/SPARC platform, some standard Unix functions that have been implemented in the user libraries are causing the app to crash at runtime. Note that the app runs fine on the Solaris 8/SPARC platform.
I have been facing weird issues which have led me to believe this might be the source
Modifying one variable from one library is changing the value of another variable in another library
Solaris 8-10: host2ip conversion problems
What I need:
Is there a way to easily list all multiply defined symbols?
Is there a way to easily list all multiply defined symbols stemming from user libraries?
Do you guys think the issue #1 might be caused by linking issues, or you feel it might be a sign of some other issue?
Edit1:
Since then I have learned that when ld generates a map file, the file has a section listing multiply defined symbols, which I am going through to find names that look like standard library calls. For people who do not know, the linker will only fail to link if it finds multiple symbols with the same name AND those symbols are strong.
You could turn on MAP file generation in the compiler (actually linker) settings and look through the map file for symbols that match the UNIX system functions you are concerned about. You'd probably have to write a script to automate it, but this would be a good starting point. The command line switch is probably -map or something similar, it will depend on which compiler/linker you are using.
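The scripting step suggested above comes down to finding names that occur more than once across the symbol lists. A minimal sketch in C, assuming the names have already been collected into one array (for example from the map file or from nm output per library); the function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int compare_names(const void *lhs, const void *rhs)
{
    const char *const *a = lhs;
    const char *const *b = rhs;
    return strcmp(*a, *b);
}

/*
 * Sorts names[] in place and stores each name that occurs more than once
 * in dups[] (once per distinct name). Returns the number of entries
 * written; dups must have room for count / 2 entries.
 */
size_t find_duplicate_symbols(const char **names, size_t count,
                              const char **dups)
{
    size_t n = 0;

    qsort(names, count, sizeof names[0], compare_names);
    for (size_t i = 1; i < count; i++) {
        int same_as_prev = strcmp(names[i], names[i - 1]) == 0;
        int already_reported = n > 0 && strcmp(dups[n - 1], names[i]) == 0;
        if (same_as_prev && !already_reported)
            dups[n++] = names[i];
    }
    return n;
}
```

The resulting list can then be checked against the UNIX system function names you are concerned about.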
The actual problem that was happening is:
The library (let's call it lib1) had an array like below
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];
This array is used by another of my libraries (let's call it lib2), which in turn is used by my application, through an extern declaration.
While compiling lib2 (or possibly the app, I am not sure), we did not define ARRAY_SIZE at all. This somehow caused the compiler of lib2 (or the app) to miscalculate the size of global_array, which in turn caused it to place some other variable at a location that was already allocated to global_array.
After defining ARRAY_SIZE again while compiling my libs and apps, everything behaves normally. I do not fully understand what caused the issue or why this resolves it, since extern declarations of arrays do not contain the size. Also, if the library really used the macro ARRAY_SIZE, why didn't the compilation fail? One possibility is that the name used for the define is a standard name (the actual string was FD_SETSIZE).
My initial gut feeling about the linker was wrong.
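The post leaves the exact mechanism open. One way to make such a setup less fragile, offered here only as a sketch and not as what the author did, is to let the library that defines the array also export its element count and a bounds-checked accessor, so that other modules never depend on their own copy of the macro. The accessor names below are hypothetical, and SomeStruct is left as an opaque type:

```c
#include <stddef.h>

/* Assumed to live in lib1 only; other modules do not need this macro. */
#define ARRAY_SIZE 1024

typedef struct SomeStruct SomeStruct;   /* opaque stand-in for the real type */

/* The definition: reserves storage for ARRAY_SIZE pointers. */
SomeStruct *global_array[ARRAY_SIZE];

/* Element count as seen by the unit that actually defines the array. */
size_t global_array_count(void)
{
    return sizeof global_array / sizeof global_array[0];
}

/* Bounds-checked read; returns NULL for an out-of-range index. */
SomeStruct *global_array_get(size_t index)
{
    if (index >= global_array_count())
        return NULL;
    return global_array[index];
}
```

With this layout, a module built with a different value of the macro can no longer disagree with lib1 about how large the array is.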
I'll divide this question into 3 parts:
I would like to produce a static library and strip off its symbols. (Debug info is already not included.)
Similar to the strip command on Linux. Can it be done?
Is there an equivalent tool in the Windows environment to the nm tool on Linux?
When creating a static library using VS2008, is it possible to define a script that will exclude some of the produced .obj files from the build and from the static lib?
Can it be dynamic? I mean, I'd define a compilation mode in the script, and this would result in specific object files being excluded from the build.
If anything is visible that you feel should not be, try declaring it with the "static" keyword. This tells the compiler that it is accessible only to the current module.
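As a small illustration of the static suggestion, here is a sketch in C with hypothetical function names. The helper has internal linkage and cannot be referenced from other object files, while the public function keeps external linkage:

```c
/* Internal helper: 'static' gives it internal linkage, so other
 * object files cannot refer to it at link time. */
static int clamp_to_byte(int value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return value;
}

/* Public entry point: external linkage, so other modules can call it. */
int brighten(int pixel, int amount)
{
    return clamp_to_byte(pixel + amount);
}
```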
There are cases where it would be convenient to be able to strip out all but a small number of "exported" public symbols, but it's not really feasible.
A static library is little more than a collection of .obj files. The internal dependencies haven't been resolved yet, and they won't be resolved until link time.
For example, if your .lib consists of foo.obj and bar.obj, and there's a call in foo.obj to a function defined in bar.obj, then that symbol must be available at link time, even if nothing outside of the library should be able to see it.
For that reason, you cannot strip the symbols (with the possible exception of file-scope static symbols). Even class methods that are protected or private (in the C++-sense) will exist in the symbol table, since the enforcement of the visibility is a compile-time issue, not a link-time one.
In contrast, a dynamic library is a standalone binary that has already been linked. References from foo.obj to bar.obj have already been resolved. Thus a DLL can be stripped of symbols except for the ones that must be exported (and even those can be renamed or replaced by ordinals).
If your DLL exposes a simple C API, then you're all set. But if you want to expose a C++ class, you're probably going to end up exporting all of its methods, even the protected and private ones (since inlining in the external application might result in direct calls to private methods).
No. How do you think the users of the static library would link to it without knowing where the symbols they use are defined?
Yes, try the DUMPBIN utility.
Well, yes. You can run the LIB utility with /REMOVE:foo.
That said, I think you are doing something that either is not worth doing or could be done a lot simpler than with removing library members.
I kept finding the names of certain (but not all) static functions in .obj files produced by VS2010. Interestingly, they were visible in my Release .obj files but not the Debug .obj files. I just used cygwin strings to perform the search:
$ strings myObjectFile.obj | grep myStaticFunctionName
I tracked it down to the "Whole Program Optimization = Yes" setting ("/GL"). When I switched this to "No" the function names no longer appear.
Update: As a followup test I opened the "cleansed" myObjectFile.obj in vim and I can still find them (with either :set encoding=utf-8 or :set encoding=latin1). I'm not sure why strings was missing the matches. Oh well.
I am trying to limit the ABI of a shared library using gcc's -fvisibility feature. However, I am confused about the correct way to do it.
My makefile organizes the build process in two stages. In the first step, all .cpp files are built into object files using some gcc options. Then all the object files are linked together using another set of gcc and ld options. From what I have read, -fvisibility is relevant to the second step. However, this contradicts the results I observe. If I add -fvisibility=hidden to the compile-time options, the result is as expected: nm -D reports a much smaller set of exported symbols. By contrast, if I add it to the link-time options, it does not seem to affect the build.
While looking for an explanation, I compared the object files produced with and without -fvisibility. The difference seems to be in the addresses of the symbols inside the object file. However, I do not see how that difference in addresses carries the message to the linker so that it can hide the symbols in one case and expose them in the other.
Could anyone please explain that to me? Thank you for your time.
Use it at compile time: the visibility is recorded in the object (.o) files and then used by the linker when creating the complete executable or shared object. If you pass it at link time but not at compile time, it has no effect, as the visibility in the object files is still default. I've also found there's no need to pass it at link time at all.
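A minimal sketch of how this is usually combined with per-symbol annotations, assuming GCC or Clang; the macro and function names are hypothetical. With -fvisibility=hidden on the compile line, only the symbols explicitly marked "default" remain exported:

```c
/* Assumes GCC or Clang; other compilers get empty macros. */
#if defined(__GNUC__)
#define MYLIB_EXPORT __attribute__((visibility("default")))
#define MYLIB_LOCAL  __attribute__((visibility("hidden")))
#else
#define MYLIB_EXPORT
#define MYLIB_LOCAL
#endif

/* Hidden: callable from other object files inside the library,
 * but not exported from the shared object. */
MYLIB_LOCAL int mylib_internal_scale(int value)
{
    return value * 2;
}

/* Exported: stays visible even when compiled with -fvisibility=hidden. */
MYLIB_EXPORT int mylib_double(int value)
{
    return mylib_internal_scale(value);
}
```

After compiling with -fvisibility=hidden -fPIC and linking with -shared, nm -D on the result should list mylib_double but not mylib_internal_scale.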
As for how the visibility is stored: in ELF object files it is recorded per symbol in the symbol table (the st_other field of each entry holds STV_DEFAULT, STV_HIDDEN, and so on), rather than through the segment a symbol lives in.
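As far as the ELF format goes, each symbol table entry carries its visibility in the low two bits of its st_other field, which is how the setting travels from the compiler to the linker. A minimal sketch of decoding that field in C; the enum and function names are hypothetical, and the numeric values are the ones the ELF specification assigns to STV_DEFAULT, STV_INTERNAL, STV_HIDDEN and STV_PROTECTED:

```c
/* Visibility codes as defined by the ELF specification. */
enum elf_visibility {
    VIS_DEFAULT = 0,    /* STV_DEFAULT */
    VIS_INTERNAL = 1,   /* STV_INTERNAL */
    VIS_HIDDEN = 2,     /* STV_HIDDEN */
    VIS_PROTECTED = 3   /* STV_PROTECTED */
};

/* Maps the st_other byte of a symbol table entry to a readable name. */
const char *visibility_name(unsigned char st_other)
{
    switch (st_other & 0x3) {   /* only the low two bits encode visibility */
    case VIS_DEFAULT:   return "default";
    case VIS_INTERNAL:  return "internal";
    case VIS_HIDDEN:    return "hidden";
    case VIS_PROTECTED: return "protected";
    }
    return "unknown";           /* not reachable: all four values handled */
}
```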
You may find http://gcc.gnu.org/wiki/Visibility to be helpful
Linux/Gcc/LD - Toolchain.
I would like to remove STL/Boost debug symbols from libraries and executable, for two reasons:
Linking gets very slow for big programs
Debugging jumps into stl/boost code, which is annoying
For 1., incremental linking would be a big improvement, but AFAIK ld does not support incremental linking. There is a workaround, "pseudo incremental linking", in a 1999 Dr. Dobb's Journal article (not on the web any more, but at archive.org). The idea is to put everything in a dynamic library and all updated object files in a second one that is loaded first, but this is not really a general solution.
For 2. there is a script here, but a) it did not work for me (it did not remove symbols), b) it is very slow as it works at the end of the pipe, while it would be more efficient to remove the symbols earlier.
Obviously, the other debug symbols should stay in place.
GNU strip accepts wildcard patterns as arguments to --strip-symbol= when --wildcard is given.
The STL and boost symbols are name-mangled because of the namespaces they're in. I don't have GNU binutils handy at this moment, but just peek at the name mangling used for namespaces, construct the pattern for 'symbols from namespace X', and pass this to --strip-symbol=
As far as I know there's no real option to do what you want in gcc. The main problem being that all the code you want to strip debug symbols for is defined in headers.
Otherwise it would be possible to build a library separately, strip that, and link with the stripped version.
But only getting debug symbols from certain parts of a compilation unit, while building and linking (for your desired link time speedup) is not possible in gcc as far as I know.
You probably don't want to strip the debug symbols from the shared libraries, as you may need that at some point.
If you are using GDB or DDD to debug, you may be able to get away with removing the Boost source files from the Source Path so it can't trace into the functions. (Or just don't trace into them, trace over!)
You can remove the option to compile the program with debug symbols, which will speed the link time.
Like the script you link to, you can consult the strip program ("man strip") to remove all or certain symbols.
You may want to use strip.
strip --strip-unneeded --strip-debug libfoo.so
Why don't you just build without debugging in the first place though?
This answer provides some specifics that I needed to make MSalters' answer work for removing STL symbols.
The STL symbol names are mangled. The trick is to find a regular expression that covers these names. I looked these symbols up with GNU's Binutils:
> nm --debug-syms <objectfile>
I basically searched on STL functions, like resize. If this is difficult, the output becomes readable when using the following command:
> nm --debug-syms --demangle <objectfile>
Look up a line number containing an STL function call, then look up its mangled name on that same line number using the first provided command. This allowed me to see that all STL symbol names began with _ZNSt[0-9]+ or _ZSt[0-9]+, etc.
To allow GNU Strip to remove these symbols I used:
> strip --wildcard \
--strip-symbol='_ZNKSt*' \
--strip-symbol='_ZNSt*' \
--strip-symbol='_ZSt*' \
--strip-symbol='_ZNSa*' \
<objectfile>
I used these commands directly on the compiled/linked binary. I verified the removal of these symbols by comparing the output of nm before and after the removal (I wrote the output to files and used vimdiff). The --wildcard option enables shell-style wildcard (glob) patterns, not regular expressions. This is why [0-9]* does not mean zero or more digits, as it would in a regular expression; here it means one digit followed by any sequence of characters (until the end of the name).
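The glob semantics can be tried out with the POSIX fnmatch() function. This is a sketch for experimenting with patterns, not a statement about how strip matches names internally; the helper names are hypothetical, and the sample mangled name used in the note below is the usual Itanium C++ ABI mangling of std::vector<int>::resize(unsigned long):

```c
#include <fnmatch.h>
#include <stddef.h>

/* The four glob patterns from the strip command above. */
static const char *const stl_patterns[] = {
    "_ZNKSt*", "_ZNSt*", "_ZSt*", "_ZNSa*"
};

/* Returns 1 if name matches the shell-style pattern, else 0. */
int glob_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, 0) == 0;
}

/* Returns 1 if the mangled name matches any of the patterns, else 0. */
int matches_stl_pattern(const char *mangled_name)
{
    size_t count = sizeof stl_patterns / sizeof stl_patterns[0];

    for (size_t i = 0; i < count; i++) {
        if (glob_matches(stl_patterns[i], mangled_name))
            return 1;
    }
    return 0;
}
```

For example, matches_stl_pattern("_ZNSt6vectorIiSaIiEE6resizeEm") reports a match, while a user symbol such as _ZN3foo3barEv does not match any of the four patterns.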
If you are looking to not step into STL code this can be achieved by gdb's skip file command, as done here.
Hope it helps
Which compiler are you using? For example, if I understand your question correctly, this is a trivial matter in MS Visual Studio.