I have been working on a cross-platform windowing library intended specifically for use with OpenGL, currently focusing on Linux. I am using glload to manage OpenGL extensions, and it is compiled, along with other libraries that I will use later, into a .so. This .so is dynamically loaded as you would expect, but at run time the program gives the following output (manually wrapped so it is easier to read):
_dist/x64-linux-debug/bin/test: Symbol `glXCreateContextAttribsARB' has \
different size in shared object, consider re-linking
Now, obviously I have tried re-linking, going as far as rebuilding the entire project many times (testing things out, not just blindly hoping it will magically make it all better). The program does seem to be willing to run as it will produce some logging output as I would expect it to. I have used nm to confirm that the 'symbol' is in the .so
nm _dist/x64-linux-debug/lib64/libvendor.so | grep glXCreateContextAttribsARB
00000000009e0e78 B glXCreateContextAttribsARB
If I use readelf to look at the symbols being defined I get the following (again, I have manually wrapped the first three lines for formatting sake):
readelf -Ws _dist/x64-linux-debug/bin/test \
_dist/x64-linux-debug/lib64/libvendor.so | \
grep glXCreateContextAttribsARB
348: 000000000062b318 8 OBJECT GLOBAL DEFAULT 26 glXCreateContextAttribsARB
421: 000000000062b318 8 OBJECT GLOBAL DEFAULT 26 glXCreateContextAttribsARB
1370: 00000000009e0e78 8 OBJECT GLOBAL DEFAULT 25 glXCreateContextAttribsARB
17464: 00000000009e0e78 8 OBJECT GLOBAL DEFAULT 25 glXCreateContextAttribsARB
I am afraid that this is about all I can offer to help, as I really do not know what to try or look into. Like I said, I am sure more info will be needed, so please just say and I will provide what I can. I am running these commands from my project root, in case you are wondering.
wilsonmichaelpatrick's answer is mostly correct, but using gdb is likely not the fastest way to find the problem, and will likely not work at all if you have a non-debug build.
First, you should confirm that there in fact is a problem:
readelf -Ws _dist/x64-linux-debug/bin/test _dist/x64-linux-debug/lib64/libvendor.so |
grep glXCreateContextAttribsARB
This should show the symbol being defined in test and libvendor.so, with different size.
Second, re-link test and libvendor.so with -Wl,-y,glXCreateContextAttribsARB flag. That will tell you which object files (or libraries) provide the (different) definitions.
Finally, preprocess the sources that produce above object files with -E and -dD flags, and see what's different between them.
Update:
I need help digesting what it is saying
Don't be helpless. Read man readelf, or just run it by hand. You'll see something like this:
readelf -Ws /bin/date | head -5
Symbol table '.dynsym' contains 75 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __ctype_toupper_loc@GLIBC_2.3 (2)
This tells you the meaning of the data you've got. In particular, this tells you that the size of the symbol in test and in libvendor.so is the same (8). Therefore, the problem is not in these two ELF files, but somewhere else. Run readelf on your other libraries, and look for definition of glXCreateContextAttribsARB that has a different size. Then follow the rest of the procedure.
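As a rough illustration of the comparison described above, the following Python sketch groups the rows of readelf -Ws output by symbol name and reports any defined symbol that shows up with more than one size. The function names (defined_symbol_sizes, mismatched) are made up for this example, and the parsing assumes the eight-column layout shown above.

```python
from collections import defaultdict


def defined_symbol_sizes(readelf_output):
    """Map each defined symbol name to the set of sizes seen for it."""
    sizes = defaultdict(set)
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        size, ndx, name = fields[2], fields[6], fields[7]
        if ndx == "UND":
            # Undefined references carry no meaningful size.
            continue
        # readelf prints very large sizes in hex with a 0x prefix.
        value = int(size, 16) if size.lower().startswith("0x") else int(size)
        # Drop a symbol version suffix such as @GLIBC_2.3.
        sizes[name.split("@")[0]].add(value)
    return dict(sizes)


def mismatched(sizes):
    """Keep only the symbols that were seen with more than one size."""
    return {name: seen for name, seen in sizes.items() if len(seen) > 1}
```

One way to use it is to concatenate the readelf -Ws output of the executable and of every library it loads, pass the text to defined_symbol_sizes, and look at what mismatched returns.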
The runtime is noticing that glXCreateContextAttribsARB as compiled in the shared object, and glXCreateContextAttribsARB as compiled in the main program (or maybe even some other shared object previously linked) have different sizes. This means that, in the separate builds for the shared object and whatever else references that object, they must be looking at different code (probably in a shared object) where this is defined. Sometimes this occurs because they are looking at different files, sometimes this occurs because of different #defines causing different interpretations of the same file. Whatever the reason, you absolutely need to make sure that the same symbol (e.g. a structure) is defined the same way (i.e. with the same member variables and size) across everything that is linked together at runtime.
It's actually a very good thing that it is refusing to run, as this is a catastrophe when two parts of the code interpret the same bit of memory in different ways at runtime. (Not too much of an exaggeration to say anything could happen if this was allowed to proceed.)
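To make "different sizes" concrete, here is a small Python sketch that uses ctypes to model two builds disagreeing on the layout of the same structure. The names SettingsAsSeenByApp and SettingsAsSeenByLib are hypothetical; in a real project both would come from the same C declaration compiled with different headers or #defines.

```python
import ctypes


class SettingsAsSeenByApp(ctypes.Structure):
    # Layout the executable was compiled against.
    _fields_ = [("width", ctypes.c_int32), ("height", ctypes.c_int32)]


class SettingsAsSeenByLib(ctypes.Structure):
    # Layout the shared object was compiled against: one extra member,
    # for example enabled by a #define that only the library build sets.
    _fields_ = [("width", ctypes.c_int32), ("height", ctypes.c_int32),
                ("depth", ctypes.c_int32)]


def sizes_agree(first, second):
    """True if both views of the structure occupy the same number of bytes."""
    return ctypes.sizeof(first) == ctypes.sizeof(second)


print(ctypes.sizeof(SettingsAsSeenByApp), ctypes.sizeof(SettingsAsSeenByLib))
print(sizes_agree(SettingsAsSeenByApp, SettingsAsSeenByLib))
```

If the two sides disagree like this, code in one of them reads or writes memory that the other never reserved, which is why the loader warns about it.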
You might want to try just loading up the executable in gdb (without running it) and typing
info types
to see where it is defined, and then load the shared object in gdb (without running it) and doing another info types there to see what each of them thinks it's looking at. If it's the same thing, check the preprocessor directives.
I ran into a tedious issue with objects of different sizes, so I want to share my experience, even though it is only one possible cause of differing object sizes and not necessarily the OP's.
The symptoms were objects of different sizes in debug mode, none in release mode. The linker produced the according warnings. The symbol names were hard to decipher but related to some unnamed static variables in instances of class templates.
The cause was a debug logging feature along the lines of LOG("Do something.");. The LOG macro used the standard C macro __FILE__, which expanded to a different path depending on whether the header was included by the application or by the shared library. That string was exactly the aforementioned unnamed static variable.
Even more tedious was the fact that due to our make environment the __FILE__ macro sometimes expanded to, let's say, C:\temp\file.h and sometimes to C:\other\..\temp\file.h so that building the application and the library from the same place didn't solve the problem either.
I hope this experience saves some of you some time.
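The __FILE__ effect can be illustrated with a few lines of Python. This is only a model of what the compiler does: literal_size is a hypothetical helper that computes how many bytes a C string literal holding the path would occupy, and ntpath is used so that the Windows-style paths from the example normalize the same way on any platform.

```python
import ntpath


def literal_size(path):
    """Bytes a C string literal holding this path occupies, including the NUL."""
    return len(path.encode("utf-8")) + 1


from_app = r"C:\temp\file.h"            # __FILE__ as seen by one build
from_lib = r"C:\other\..\temp\file.h"   # same header, different spelling

# The two static strings have different sizes ...
print(literal_size(from_app), literal_size(from_lib))
# ... although both spellings name the same file.
print(ntpath.normpath(from_lib) == from_app)
```

Passing a normalized path to the logging macro (or using only the base name of the file) would be one way to make both builds agree.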
In most cases you're probably just linking against the wrong library (a different version). For example, you have libfoo installed twice and link your executable with -L /path/to/version1 -lfoo but during runtime you link with /path/to/version2 (you can see this one with ldd yourprogram).
One reason could be that the executable was linked with -rpath,/path/to/version1 but (as recent linker versions tend to do) this set the RUNPATH entry in the dynamic section rather than RPATH, while you have LD_LIBRARY_PATH=/path/to/version2. LD_LIBRARY_PATH takes precedence over RUNPATH (but not over RPATH). In this case delete the library from /path/to/version2 (or remove that path from LD_LIBRARY_PATH).
EXAMPLE
$ minimal
/home/carlo/minimal: Symbol `_ZN6libcwd8libcw_doE' has different size in shared object, consider re-linking
COREDUMP : /home/carlo/projects/libcwd/libcwd/elfxx.cc:2381: void libcwd::elfxx::objfile_ct::load_dwarf(): Assertion `size == sizeof(address)' failed.
(libcwd is smart enough to see it too; aka the problem here is with libcwd):
$ ldd minimal | grep libcwd_r
libcwd_r.so.5 => /usr/local/install/6.0.0-1ubuntu2/lib/libcwd_r.so.5 (0x00007f0b69840000)
$ echo $LD_LIBRARY_PATH
/usr/local/install/6.0.0-1ubuntu2/lib
$ objdump -a -x minimal | grep PATH
RUNPATH /opt/gitache/libcwd_r/888f62c44fd64f1486176bf9e35b36f79612790017c31f95e117fc59743a54ca/lib
Unsetting LD_LIBRARY_PATH or removing libcwd from that path results in
$ unset LD_LIBRARY_PATH
$ ldd minimal | grep libcwd_r
libcwd_r.so.5 => /opt/gitache/libcwd_r/888f62c44fd64f1486176bf9e35b36f79612790017c31f95e117fc59743a54ca/lib/libcwd_r.so.5 (0x00007f11d7298000)
and things work again. Or alternatively I could add to my CMakeLists.txt of the project:
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--disable-new-dtags")
After which we get,
$ objdump -a -x minimal | grep PATH
RPATH /opt/gitache/libcwd_r/888f62c44fd64f1486176bf9e35b36f79612790017c31f95e117fc59743a54ca/lib
which now has precedence over LD_LIBRARY_PATH and therefore also solves the issue. This is not the recommended way however: if you set LD_LIBRARY_PATH you should know what you are doing. If that doesn't work, you should fix LD_LIBRARY_PATH or remove the offending library.
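The precedence rules at play here can be summarized in a short Python sketch. It follows the order documented for the glibc dynamic loader, but it is a simplified model: search_order is a made-up name, and the sketch ignores secure-execution mode, $ORIGIN expansion, ld.so.cache and the default system directories.

```python
def search_order(rpath=None, runpath=None, ld_library_path=None):
    """Directories tried, in order, before ld.so.cache and the defaults."""

    def split(value):
        # Entries are colon-separated; empty entries are dropped here.
        return [entry for entry in (value or "").split(":") if entry]

    order = []
    if not runpath:
        # DT_RPATH is only honoured when no DT_RUNPATH is present.
        order += split(rpath)
    order += split(ld_library_path)
    order += split(runpath)
    return order


# RUNPATH build: the directory from LD_LIBRARY_PATH is tried first.
print(search_order(runpath="/opt/v1/lib", ld_library_path="/usr/local/v2/lib"))
# RPATH build (--disable-new-dtags): the embedded path is tried first.
print(search_order(rpath="/opt/v1/lib", ld_library_path="/usr/local/v2/lib"))
```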
Related
I seem to somehow have an impossible situation, that I can only assume means my analysis is somehow wrong, since the following all seem to be true:
The executable runs, so it must have all dependent functions provided
The executable depends on a function I'll call Foo::Bar::_ex
This function is not defined in any .a or .so file in the entire filesystem
One of the dependent libraries requires this undefined function
I cannot link the code into an executable because I can't find any library that provides this function
I can see the requirement of this function by the application by running ldd on the app, and seeing that it requires a library I'll call libExample.so. I can see by running objdump -T on the .so file that it requires the mystery function:
ldd APP
libExample.so => /path/to/libExample.so
objdump -T /path/to/libExample.so | c++filt | grep Foo::Bar::_ex
00000000 D *UND* 00000000 Foo::Bar::_ex
For every /path/to/libWhatever.a I collected the library path and output of objdump -t /path/to/libWhatever.a | c++filt into ~/adump.txt. Similarly, I collected the path of every .so file and output of objdump -T /path/to/libWhatever.so | c++filt into ~/sodump.txt.
When I grep adump.txt for Foo::Bar::_ex, I get only entries like the following:
00000000 *UND* 00000000 Foo::Bar::_ex
00000108 g O .data.rel.local 00000004 Foo::Bar::_ex
When I grep sodump.txt for Foo::Bar::_ex, I get only entries like the following:
00000000 D *UND* 00000000 Foo::Bar::_ex
004f9bc4 g DO .data 000000004 Base Foo::Bar::_ex
00000000009ff5f8 g DO .data 0000000000000008 Base Foo::Bar::_ex
I understand from the objdump man page that DF means defining a function, and DO means defining an object, and that if I could find a DF entry for Foo::Bar::_ex in some library, my problem would be solved, just use that library in the link command.
I don't understand what "Object" means in objdump terms - it obviously isn't function code or a runtime object, so what is it?
How does the app run without complaint about a missing function, when none of the libraries provide anything acceptable to the linker?
I think I found out my real problem today, and it isn't what I thought. I have one shared library that is somehow buggered in a way where it only works if you pass the path to it on the command line instead of using -L and -l.
In other words, just for this one library, g++ -L /path/to/lib/dir -l libName.so does not work. The linker says it cannot find any of the functions in it, which clearly exist. It doesn't complain about the file not being found, it just can't find the functions.
If I use g++ /path/to/libName.so, now it is happy, and links the app with the specific path given. As long as that path can be loaded at runtime, it works.
So the dorky process I use is to copy the lib to the current dir, give just the name of the library to g++, then remove the copy. The exe is then able to find the library in the usual way at runtime.
Go figure.
Well today I was going through the "fun" exercise of trying to pull in all the dependencies (read: lots of swearing, just not loud enough for anyone else to hear), and was getting frustrated with linker errors about not finding a compatible libstdc++ implementation.
So I rolled back the set of OS packages I was installing in my docker container through a Dockerfile, and .... it worked! Suddenly linking succeeded.
As far as I can tell, I started bringing in some unrelated OS packages that provide libraries whose names are similar to totally unrelated libraries that were sitting in a directory (god only knows where they came from).
Linking to the unrelated OS packages starts asking for other OS stuff, and things quickly go off the rails from there. Once I linked just to the provided libraries of unknown origin, my problem went away :)
Thanks for your answers, lesson learned!
Say that I have three libraries: libMissingSymbol.so, libMiddle.so, and libSymbolHaver.so. libMissingSymbol contains a symbol defined in libSymbolHaver, but only has a dependency on libMiddle. libMiddle is supposed to have a dependency on libSymbolHaver, but it doesn't. I don't have the source code or unlinked object files that these libraries were assembled from. Is it possible for me to link libMiddle with libSymbolHaver so that libMissingSymbol can find the symbol it needs at load time? Is there any way that I can fix this using only these three shared object files and any necessary tools? I have to end up with libraries with the same contents (including SONAMEs) barring the dependency change to libMiddle in order to not break things further down the line in my project.
Hypothetical readelf output (trimmed for relevance) to clarify:
$ readelf -s libMissingSymbol.so
123: 00000000 0 OBJECT GLOBAL DEFAULT UND MangledSymbol
$ readelf -d libMissingSymbol.so
Dynamic section at offset 0x42434 contains 37 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libMiddle.so]
0x0000000e (SONAME) Library soname: [libMissingSymbol.so]
$ readelf -d libMiddle.so
Dynamic section at offset 0x75b28 contains 29 entries:
Tag Type Name/Value
0x0000000e (SONAME) Library soname: [libMiddle.so]
$ readelf -s libSymbolHaver.so
35: 00064d0c 4 OBJECT GLOBAL DEFAULT 22 MangledSymbol
Is it possible for me to link libMiddle with libSymbolHaver so that libMissingSymbol can find the symbol it needs at load time?
No: all UNIX linkers, except the AIX one, consider .so the final link product and no further modification is possible.
Update:
viability of doing this a different way (e.g. decompiling libMiddle and rebuilding it with the correct dependencies)?
I don't believe that is viable either -- it is really hard to modify a fully-linked ELF file and not violate myriad of internal consistency constraints.
I suggest the following approach, which is very likely to just work(TM).
Abandon your "only using these three libraries" restriction. It appears to be artificial and unnecessary.
Copy libMiddle.so -> libZiddle.so (be sure to make a copy of the original libMiddle.so somewhere else in case things go wrong).
binary-patch the SONAME in libZiddle.so to match the new name. The string "libMiddle.so" is in the .dynstr section of the library, and (I believe) is not hashed in any way, so changing one letter in it will not introduce any self-inconsistencies into the new library.
Once you've done this, compare readelf -a libMiddle.so and readelf -a libZiddle.so, the SONAME should be the only difference.
Remove libMiddle.so.
Link a new libMiddle.so containing some_unused_function(), and having dynamic dependency on both libZiddle.so and libSymbolHaver.so.
Now any binary that currently links against libMiddle.so and fails with missing symbol (e.g. libMissingSymbol.so) will find the new (empty) libMiddle.so, but because the new libMiddle.so requires both libZiddle.so (where most of the symbols are) and libSymbolHaver.so, it should just work.
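The binary-patching step could be sketched in Python as follows. This is a deliberately naive helper, not a general ELF editor: patch_name is a hypothetical name, and it assumes the old name occurs exactly once in the file as a NUL-terminated string and that the new name has the same length, so that no offsets move.

```python
def patch_name(image, old, new):
    """Return a copy of image with the NUL-terminated string old replaced by new."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    needle = old + b"\0"
    found = image.count(needle)
    if found != 1:
        # Refuse to guess which occurrence is the SONAME.
        raise ValueError("expected exactly one occurrence, found %d" % found)
    return image.replace(needle, new + b"\0")
```

To apply it, read libZiddle.so in binary mode, pass the bytes through patch_name(data, b"libMiddle.so", b"libZiddle.so"), write the result back, and then compare the readelf -a output of both files as described in the steps above.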
I have this problem all the time in Linux programming. Since all the manuals and almost all the source code for Linux are C-centric, any reference to a function needs only an #include <something.h> line and the function is accessible from C/C++ code.
But I am programming in assembly language and know almost nothing about C/C++.
In order to be able to call some function, I have to import it from the corresponding .so library.
How to determine the file name of the library? It often differs from the name of the library itself and is not specified in the manuals.
For example, the name of the XLib is actually libX11.so.6. The name of the XShm extension library seems to be libXext.so.6.
Is there an easy way to determine the secret real name of the library, using the provided C manuals and references?
This is another not-100%-accurate method that may give you some ideas as to how you can narrow things down a bit. It doesn't exactly fit the question because it uses common linux utilities instead of man files, but it may still be helpful.
Use your distribution's package management software.
For example, on Arch Linux, if you were interested in a function in GLFW/glfw3.h, you could find out who owns that file:
$ pacman -Qo /usr/include/GLFW/glfw3.h
/usr/include/GLFW/glfw3.h is owned by glfw 3.1-1
Find out which .so files are in that package:
$ pacman -Ql glfw | grep 'so$'
glfw /usr/lib/libglfw.so
And, if needed, find the actual file that link points to:
$ readlink -f /usr/lib/libglfw.so
/usr/lib/libglfw.so.3.1
This will depend on your distribution. I believe on Ubuntu/Debian you'd use dpkg-query instead.
Edit: DevSolar points out in a comment that you can use apt-file search <header> and apt-file list <package> instead of dpkg-query -S <header> and dpkg-query -L <package>. apt-file appears to work even for packages that aren't installed (though it seems slower?).
I also noticed (on my Ubuntu VM at least) that, e.g., libglfw-dev contains the libglfw.so symlink, while libglfw2 contains the actual libglfw.so.2 object.
Once you have a set of .so files, you can check them for whatever function you are interested in:
$ nm -D /usr/lib/libglfw.so | grep "glfwCreateWindow"
0000000000007cd0 T glfwCreateWindow
Note that I pulled this last step from a comment on the previous question and don't fully understand it. Maybe you could even skip the earlier steps and rely on nm and grep alone?
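A rough programmatic equivalent of the nm check is to ask the dynamic loader itself. The Python sketch below uses ctypes for that; provides is a made-up helper name. Note that the underlying lookup also searches the dependencies of the library, so a hit is a hint rather than proof that this exact file defines the function; nm -D remains the more precise check.

```python
import ctypes


def provides(library, symbol):
    """True if the loader can open the library and resolve the symbol through it."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The loader could not find or open the file.
        return False
    # Attribute access on a CDLL object performs the symbol lookup.
    return hasattr(handle, symbol)
```

For example, provides("libX11.so.6", "XCreateWindow") would be expected to return True on a system where that library is installed.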
This is not a sure fire way, but it can help in many cases.
Basically, you can usually find the library name at the bottom of the man page.
Eg, man XCreateWindow says libX11 on the last line. Then you look for libX11.so and use nm or readelf to see all exported functions.
Another example, man XShm says libXext at the bottom. And so on.
UPDATE
If the function is in section (2) of the man pages, it's a system call (see man man) and is provided by glibc, which would be libc-2.??.so.
Lastly (thanks Basile), if the function does not mention the library, it is also most likely provided by glibc.
DISCLAIMER: Again this is not a 100% accurate method -- but it should help in most cases.
You can ask gcc to tell you which file it would use for linking like so:
gcc --print-file-name=libX11.so
Sample output:
/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/libX11.so
This file will usually be a symlink, so you'll have to pipe it through readlink or realpath to get the actual file. For example:
readlink -f $(gcc --print-file-name=libXext.so)
Sample output:
/usr/lib/x86_64-linux-gnu/libXext.so.6.4.0
As I commented, you could use gcc to link your program, and then it should be able to accept -lX11 ; by using gcc -v instead of gcc you'll find out what is actually linked and how.
However, you have a much more significant issue than finding the lib*.so.*; most C or C++ APIs are described in header files, and these C or C++ header files also contain symbolic constants (like O_RDONLY for open(2)...) or macros (like WIFEXITED in POSIX wait ...) whose value or expansion you have to find manually in header files or documentation. (Quite often, such constants are either preprocessor #define-d constants or enum values). Also, some headers, in particular in C++, contain a lot of inline-d functions (or macros)!
A possible way might be to generate some C files to find all these constants, enums, macros, inlined functions..., and/or to customize the GCC compiler (e.g. with MELT ...) to find them.
So my message is that for better or worse, the C language is deeply tied to Linux & POSIX.
You might restrict yourself to use only syscalls(2) from your assembler code. Then you won't use libX11 and you don't need any header or constant (except the ones for syscalls, starting from <asm/unistd.h>).
BTW, in 2015, coding entirely in assembler for performance reasons is a mistake. The compiler is generating better code than you reasonably can (as soon as you have more than a few hundred machine instructions). In practice, you can code in assembler with GCC by using extended asm instructions in your C functions.
Or are you building your own compiler ? Then you should have told so in your question!
Read also the Program Library HowTo & the Linux Assembly HowTo
List items 1- 4 are the steps that I did.
List item 5 describes the problem
List item 6 provides additional information
I have compiled a C source code say c1.c with -g flag.
I also have a dynamic shared library, say liba1.so, built with -g for all the source files that it has.
I built the executable say exe1 by linking c1.o (c1.c object code) with the liba1.so .
I do gdb exe1. and am able to step through the sources of c1.c. When c1 calls the shared library, I am also able to put a breakpoint on a function in the shared library.
However, when I try to step through the function, it says "Single stepping until exit from function foo1, which has no line number information". It should also ordinarily show the values of the parameters passed into foo1, but it does not. This happens for all functions in the shared library, including some very big ones, so the values cannot have been optimized out.
I did an objdump -t on the shared library and the executable; it shows the symbol table (the fact that I can set a breakpoint on the function also supports this). I can also see the values of the variables used in c1.c. So what should I do to ensure that I can see the values of the local variables inside the shared library? Here are the other arguments used to compile the shared library: "-O2 -std=gnu99 -Werror -fno-stack-protector -Wstack-protector --param ssp-buffer-size=1 -g -nostdinc". Doing info f and trying to look at memory addresses on the frame also does not give any information.
I am looking for some suggestion to at least troubleshoot it. Can I find out, using objdump (or any other utility), whether a shared library has line number information?
I am looking for some suggestion to at least troubleshoot it.
The most likely reason for no line number information, is that there is in fact no line number information, and the most likely reason for that is that you have two copies of liba1.so -- one that has debug info, and one that doesn't, and you are loading at runtime the latter.
First step: (gdb) info shared will tell you exactly which liba1.so is loaded.
If it is in fact the version that you've just built with -g, you should verify that it does have the debug info you are expecting. The exact commands for doing so are platform specific (and you didn't tell which platform you are on). On an ELF platform, objdump -g liba1.so or readelf -w liba1.so should work.
One common reason for -g code to not have debug info is presence of -s (strip) flag on the link line; make sure you don't have "stray" flags on your link line. Some platforms also require -g to be used at link time in addition to compile time.
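If you would rather script the "does it have debug info" check than read tool output, the section names can be pulled straight from the ELF headers. The Python sketch below is a minimal reader under stated assumptions: it handles only 64-bit little-endian ELF files, ignores the extended section numbering used by files with a huge number of sections, and the function names are invented for this example. It only checks for an uncompressed .debug_line section.

```python
import struct


def elf64_section_names(data):
    """Return the section names of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # e_shoff lives at offset 0x28; e_shentsize, e_shnum and e_shstrndx
    # are the last three 16-bit fields of the header, starting at 0x3A.
    (shoff,) = struct.unpack_from("<Q", data, 0x28)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(index):
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", data, shoff + index * shentsize)

    # The section-name string table is itself a section.
    str_offset, str_size = header(shstrndx)[4:6]
    strtab = data[str_offset:str_offset + str_size]

    names = []
    for index in range(shnum):
        start = header(index)[0]
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode())
    return names


def has_line_info(data):
    """True if the image carries a .debug_line section."""
    return ".debug_line" in elf64_section_names(data)
```

Typical use would be has_line_info(open("liba1.so", "rb").read()), run against the exact file that (gdb) info shared reports as loaded.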
We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. Then we compile an executable using these libraries and deploy the binary on the front-end. It would be extremely useful to be able to identify that binary. Ideally what we would like to have is a small script that would retrieve the following information directly from the binary:
$ident binary
$binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
$ dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...
So it should display all the information for the binary itself and for all of its dependencies.
Currently our approach is:
During compilation for each product we generate Manifest.h and Manifest.cpp and then inject Manifest.o into binary
ident script parses target binary, finds generated stuff there and prints this information
However, this approach is not always reliable across different versions of gcc.
I would like to ask the SO community: is there a better approach to solve this problem?
Thanks for any advice
One of the catches with storing data in source code (your Manifest.h and .cpp) is the size limit for literal data, which is dependent on the compiler.
My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (so does objcopy). If you prefer to write your own solution, have a look at libbfd.
Let us say we have a hello.cpp containing the usual C++ "Hello world" example. Now we have the following make file (GNUmakefile):
hello: hello.o hello.om
	$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@
%.om: %.manifest
	ld -r -b binary -o $@ $<
%.manifest:
	echo "$@" > $@
What I'm doing here is to separate out the linking stage, because I want the manifest (after conversion to ELF object format) linked into the binary as well. Since I am using pattern rules this is one way to go; others are certainly possible, including a better naming scheme for the manifests where they also end up as .o files and GNU make can figure out how to create those. Here I'm being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data) converted to ELF objects, created from .manifest files. The recipe states to convert the binary input into an ELF object. The recipe for creating the .manifest itself simply redirects a string into the file.
Obviously the tricky part in your case isn't storing the manifest data, but rather generating it. And frankly I know too little about your build system to even attempt to suggest a recipe for the .manifest generation.
Whatever you throw into your .manifest file should probably be some structured text that can be interpreted by the script you mention or that can even be output by the binary itself if you implement a command line switch (and disregard .so files and .so files hacked into behaving like ordinary executables when run from the shell).
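As an example of "structured text that can be interpreted by the script", here is a minimal Python sketch that parses one manifest line in the Key=Value;Key=Value form shown in the question. The format is taken from the question's sample output; parse_manifest is a hypothetical helper, and how the line is extracted from the binary is left out.

```python
def parse_manifest(line):
    """Split 'Key=Value;Key=Value' into a dict, ignoring empty fields."""
    fields = {}
    for item in line.strip().split(";"):
        if not item:
            # Tolerate a trailing or doubled semicolon.
            continue
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError("field without '=': %r" % item)
        fields[key.strip()] = value.strip()
    return fields


print(parse_manifest("Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx"))
```

Keeping the format this simple means the same line can be printed as-is by the binary and parsed by the ident script without any extra tooling.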
The above make file doesn't take into account the dependencies - or rather it doesn't help you create the dependency list in any way. You can probably coerce GNU make into helping you with that if you express your dependencies clearly for each goal (i.e. the static libraries etc). But it may not be worth it to take that route ...
Also look at:
C/C++ with GCC: Statically add resource files to executable/library and
Is there a Linux equivalent of Windows' "resource files"?
If you want particular names for the symbols generated from the data (in your case the manifest), you need to use a slightly different route and use the method described by John Ripley here.
How to access the symbols? Easy. Declare them as external (C linkage!) data and then use them:
#include <cstddef>
#include <cstdio>
extern "C" char _binary_hello_manifest_start;
extern "C" char _binary_hello_manifest_end;
int main(int argc, char** argv)
{
    // The two symbols mark the first byte and one past the last byte of the embedded data.
    const std::ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
    // "%.*s" prints at most len characters, so the buffer need not be zero-terminated.
    printf("Hello world: %.*s\n", (int)len, &_binary_hello_manifest_start);
}
The symbols are the exact characters/bytes. You could also declare them as char[], but it would result in problems down the road. E.g. for the printf call.
The reason I am calculating the size myself is because a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I didn't find any documentation on interfacing with the *_size variable.
Side-note: the .* in the format string tells printf to read the maximum number of characters to print (the precision) from the next argument and then take the following argument as the string to print.
You can insert any data you like into a .comment section in your output binary. You can do this with the linker after the fact, but it's probably easier to place it in your C++ code like this:
asm (".section .comment.manifest\n\t"
".string \"hello, this is a comment\"\n\t"
".section .text");
int main() {
....
The asm statement should go outside any function, in this instance. This should work as long as your compiler puts normal functions in the .text section. If it doesn't then you should make the obvious substitution.
The linker should gather all the .comment.manifest sections into one blob in the final binary. You can extract them from any .o or executable with this:
objdump -j .comment.manifest -s example.o
Have you thought about using the standard packaging system of your distro? In our company we have thousands of packages and hundreds of them are automatically deployed every day.
We are using Debian packages that contain all the necessary information:
Full changelog that includes:
authors;
versions;
short descriptions and timestamps of changes.
Dependency information:
a list of all packages that must be installed for the current one to work correctly.
Installation scripts that set up environment for a package.
I think you may not need to create manifests your own way, since a ready-made solution already exists. You can have a look at the Debian package HowTo here.