Loading GNU ld script with dlopen - c++

I have C++14 code that should load an arbitrary shared object file with dlopen. Unfortunately, on some systems (e.g. my Arch Linux; reportedly this also applies to some .so files on Ubuntu and Gentoo), these .so files can be "GNU ld scripts" instead of actual binaries.
For reference, here is the content of my /usr/lib/libm.so:
/* GNU ld script
*/
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /usr/lib/libm.so.6 AS_NEEDED ( /usr/lib/libmvec.so.1 ) )
I have found a couple of code pieces that deal with this issue in ghc and ruby. I would like to avoid resorting to parsing the dlerror text and then manually parsing the text file. I feel that is terribly evil, and I won't be able to implement and maintain the corner cases of this format.
Is there a clean way to handle this case? Frankly, I am puzzled as to why dlopen does not handle these transparently.
Note: Considering the aforementioned patches I think this is not simply an issue with my system configuration / versions. If this should work out-of-the-box with dlopen (bug instead of missing feature), please let me know.

The linker scripts are intended to be used by the linker, not the run-time linker.
The GNU ld script comment should have been a giveaway: this is for ld, not for ld.so. ;-)
See for instance: http://www.math.utah.edu/docs/info/ld_3.html
So I guess using this with dlopen() would mean mimicking/importing part of ld's magic for this, which would confirm your fears about resorting to manually parsing the text and maintaining terribly evil code.
EDIT: There seems to be one thing that can help you though:
https://www.sourceware.org/ml/libc-alpha/2011-07/msg00152.html
<gnu/lib-names.h> should contain a define LIBM_SO which should point you to the correct file that you can actually dlopen().
That means that normally no evil code would be necessary.
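As a minimal sketch of that suggestion, assuming a glibc system where <gnu/lib-names.h> defines LIBM_SO: the helper name cos_via_dlopen is made up for illustration, and on older glibc versions the program has to be linked with -ldl.

```cpp
#include <dlfcn.h>          // dlopen, dlsym, dlclose, dlerror
#include <gnu/lib-names.h>  // LIBM_SO (glibc-specific header)

// Hypothetical helper: load the real libm through LIBM_SO and call cos().
// Returns true and stores the result in `out` on success.
bool cos_via_dlopen(double x, double& out) {
    // LIBM_SO names the actual shared object, not the ld script.
    void* handle = dlopen(LIBM_SO, RTLD_NOW);
    if (handle == nullptr) {
        return false;  // dlerror() would describe the failure
    }
    using cos_fn = double (*)(double);
    auto fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    const bool ok = (fn != nullptr);
    if (ok) {
        out = fn(x);
    }
    dlclose(handle);
    return ok;
}
```

This only covers libraries that glibc itself lists in that header; for an arbitrary third-party .so there is no such macro.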

Related

Change Linux shared library (.so file) version after it was compiled

I'm compiling Linux libraries (for Android, using NDK's g++, but I bet my question makes sense for any Linux system). When delivering those libraries to partners, I need to mark them with a version number. I must also be able to access the version number programmatically (to show it in an "About" dialog or a GetVersion function for instance).
I first compile the libraries with an unset version (0.0) and need to change it to a real one once testing is done, just before sending the library to the partner. I know it would be easier to modify the source and recompile, but we don't want to do that, for three reasons: recompiling would mean testing everything again; we feel patching is less error-prone (see the comments to this post); and our development environment already works this way for Windows binaries, where we set a 0.0 version string in the resources (.rc) and later change it with verpatch. We'd like to follow the same process when shipping Linux binaries.
What would be the best strategy here?
To summarize, requirements are:
1. Compile binaries with an "unset" version (0.0 or anything else).
2. Be able to change this "unset" version to a specific one without having to recompile the binary (ideally by running a 3rd party tool command, as we do with verpatch under Windows).
3. Be able to have the library code retrieve its version information at runtime.
If your answer is "rename the .so", then please provide a solution for 3.: how to retrieve version name (i.e.: file name) at runtime.
I was thinking of some solutions but have no idea if they could work and how to achieve them.
Have a version variable (one string or 3 int) in the code and have a way to change it in the binary file later? Using a binary sed...?
Have a version variable within a resource and have a way to change it in the binary file later? (as we do for win32/win64)
Use a field of the .so (like SONAME) dedicated to this and have a tool allowing to change it...and make it accessible from C++ code.
Rename the lib + change SONAME (did not find how this can be achieved)...and find a way to retrieve it from C++ code.
...
Note that we use QtCreator to compile the Android .so files, but they may not rely on Qt. So using Qt resources is not an ideal solution.
I am afraid you started to solve your problem from the end. First of all SONAME is provided at link time as a parameter of linker, so in the beginning you need to find a way to get version from source and pass to the linker. One of the possible solutions - use ident utility and supply a version string in your binary, for example:
const char version[] = "$Revision:1.2$";
This string should appear in the binary, and the ident utility will detect it. Alternatively, you can parse the source file directly with grep or something similar. If conflicts are possible, add an extra marker that you can use later to detect this string, for example:
const char version[] = "VERSION_1.2_VERSION";
So you detect version number either from source file or from .o file and just pass it to linker. This should work.
As for debug version to have version 0.0 it is easy - just avoid detection when you build debug and just use 0.0 as version unconditionally.
For 3rd party build system I would recommend to use cmake, but this is just my personal preference. Solution can be easily implemented in standard Makefile as well. I am not sure about qmake though.
Discussion with Slava made me realize that any const char* was actually visible in the binary file and could then be easily patched to anything else.
So here is a nice way to fix my own problem:
1. Create a library with:
   - a definition of const char version[] = "VERSIONSTRING:00000.00000.00000.00000"; (it needs to be long enough, because we can later safely modify the binary file content but not extend it)
   - a GetVersion function that cleans the version variable above (removes VERSIONSTRING: and the useless zeros). It would return:
     0.0 if version is VERSIONSTRING:00000.00000.00000.00000
     2.3 if version is VERSIONSTRING:00002.00003.00000.00000
     2.3.40 if version is VERSIONSTRING:00002.00003.00040.00000
     ...
2. Compile the library; let's name it mylib.so.
3. Load it from a program and ask for its version (call GetVersion); it returns 0.0, no surprise.
4. Create a little program (I did it in C++, but it could be done in Python or any other language) that will:
   - load the whole binary file content into memory (using std::fstream with std::ios_base::binary)
   - find VERSIONSTRING:00000.00000.00000.00000 in it
   - confirm it appears only once (to be sure we don't modify something we did not mean to; that's why I prefix the string with VERSIONSTRING, to make it more unique)
   - patch it to VERSIONSTRING:00002.00003.00040.00000 if the expected version number is 2.3.40
   - save the binary file back from the patched content
5. Patch mylib.so using the above tool (requesting version 2.3 for instance).
6. Run the same program as in step 3; it now reports 2.3!
No recompilation nor linking, you patched the binary version!
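The two pieces of logic described above might look roughly like this. It is a sketch, not the tool I actually shipped: the names format_version, make_version_string and patch_version are made up here, and the patcher works on an in-memory copy of the file (reading and writing the file with std::fstream is left out).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Marker and field width follow the scheme above; all names are hypothetical.
const std::string kMarker = "VERSIONSTRING:";

// Turn "VERSIONSTRING:00002.00003.00040.00000" into "2.3.40".
// Trailing zero fields are dropped, but "major.minor" is always kept.
std::string format_version(const std::string& raw) {
    unsigned f[4] = {0, 0, 0, 0};
    if (raw.compare(0, kMarker.size(), kMarker) != 0 ||
        std::sscanf(raw.c_str() + kMarker.size(), "%u.%u.%u.%u",
                    &f[0], &f[1], &f[2], &f[3]) != 4) {
        return "0.0";  // malformed string: report the "unset" version
    }
    int count = 4;
    while (count > 2 && f[count - 1] == 0) --count;
    std::string out = std::to_string(f[0]);
    for (int i = 1; i < count; ++i) out += "." + std::to_string(f[i]);
    return out;
}

// Build the fixed-width string for a version, e.g. (2, 3, 40, 0).
std::string make_version_string(unsigned a, unsigned b, unsigned c, unsigned d) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s%05u.%05u.%05u.%05u",
                  kMarker.c_str(), a, b, c, d);
    return buf;
}

// Patch the in-memory image of a binary. Succeeds only if the unset
// placeholder occurs exactly once and every field fits in five digits,
// so the image never changes size.
bool patch_version(std::string& image,
                   unsigned a, unsigned b, unsigned c, unsigned d) {
    if (a > 99999 || b > 99999 || c > 99999 || d > 99999) return false;
    const std::string placeholder = make_version_string(0, 0, 0, 0);
    const std::size_t first = image.find(placeholder);
    if (first == std::string::npos) return false;
    if (image.find(placeholder, first + 1) != std::string::npos) return false;
    image.replace(first, placeholder.size(), make_version_string(a, b, c, d));
    return true;
}
```

One thing to watch in the library itself: if the compiler can see the constant string inside GetVersion, it may fold part of the parsing at compile time, in which case patching the bytes would not change the result. Reading the string through a volatile pointer is one way to guard against that, though I have not verified it on every toolchain.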

Reference function of shared library in executable with relocations but without PIC

I am wondering whether it is possible to build and link an executable against a shared object so that it uses load-time relocations instead of PIC (and therefore the PLT).
I think that if this is possible, the code section has to be writable (which should in principle be no problem).
If I try with no additional gcc parameters, it uses PIC (usually, to create a PIC shared lib, I have to add -fPIC).
I know that it is possible with data, for that case a R_386_COPY relocation is executed.
So, is this possible for functions? And if so, with which gcc parameters?
is this possible for functions?
Sure.
And if so, with which gcc parameters?
No version of ld that I know of will do that (as generally this is considered the wrong thing to do). You'll have to build ld from source, and apply a patch to make it do what you want.
the code section has to be writable
Correct.
(which should in principle be no problem).
Many environments, e.g. SELinux, prohibit mappings that are both writable and executable, because such mappings are exceedingly insecure.
So while your binary with writable code section would run in some environments with no problem, it will not run in many others.

How to compile and execute a stand-alone SML-NJ executable

I have seen one other answer, but what I don't understand is what basis.cm is and what its use is.
You are asking two questions.
What is basis.cm and what's its use?
This is the Basis library. It allows the use of built-in functions.
How to compile and execute a stand-alone SML-NJ executable
Assuming you followed Jesper Reenberg's tutorial on how to execute a heap image, the next thing you need in order to have SML/NJ produce a stand-alone executable is to convert this heap image. One should hypothetically be able to do this using heap2exec, a tool that takes the heap image, e.g. the .x86-linux file generated on my system, and generates an .asm file that can be assembled and linked.
Unfortunately, this tool is not very well-maintained, so you have to
Go to the smlnj.org page and fix the download link by removing 'www.' (this page and the SourceForge page don't contain the same explanations or assumptions about argument count, and neither page's download link works).
Download and extract this tool, and fix the 'build' script so it points to your ml-build tool
Fix the tool's argument use by changing [inf, outf] to [_, inf, outf]
Run ./build which generates 'heap2asm.x86-linux' on my system
For example, in order to generate an .asm file for the heap2asm program itself, run
sml @SMLload=heap2asm.x86-linux heap2asm.x86-linux heap2asm.s
At this point, I have unfortunately been unable to produce an executable that works. E.g. if you run gcc -c heap2asm.s and ld heap2asm.o, you get a warning of a missing _start label. The resulting executable segfaults even if you rename the existing _sml_heap_image label to _start. That is, it seems that a piece of entry code that the runtime environment normally delivers is missing here.
At this point, discard SML/NJ and use MLton for producing stand-alone binaries.

How to determine in which .SO library is given C function?

I have this problem all the time in Linux programming. Since all the manuals and almost all the source code for Linux are C-centric, every reference to a function needs only an #include <something.h> line, and the function is then accessible from C/C++ code.
But I am programming in assembly language and know almost nothing about C/C++.
In order to be able to call some function, I have to import it from the corresponding .so library.
How to determine the file name of the library? It often differs from the name of the library itself and is not specified in the manuals.
For example, the name of the XLib is actually libX11.so.6. The name of the XShm extension library seems to be libXext.so.6.
Is there an easy way to determine the secret real name of the library, using the provided C manuals and references?
This is another not-100%-accurate method that may give you some ideas as to how you can narrow things down a bit. It doesn't exactly fit the question because it uses common linux utilities instead of man files, but it may still be helpful.
Use your distribution's package management software.
For example, on Arch Linux, if you were interested in a function in GLFW/glfw3.h, you could find out who owns that file:
$ pacman -Qo /usr/include/GLFW/glfw3.h
/usr/include/GLFW/glfw3.h is owned by glfw 3.1-1
Find out which .so files are in that package:
$ pacman -Ql glfw | grep 'so$'
glfw /usr/lib/libglfw.so
And, if needed, find the actual file that link points to:
$ readlink -f /usr/lib/libglfw.so
/usr/lib/libglfw.so.3.1
This will depend on your distribution. I believe on Ubuntu/Debian you'd use dpkg-query instead.
Edit: DevSolar points out in a comment that you can use apt-file search <header> and apt-file list <package> instead of dpkg-query -S <header> and dpkg-query -L <package>. apt-file appears to work even for packages that aren't installed (though it seems slower?).
I also noticed (on my Ubuntu VM at least) that, e.g., libglfw-dev contains the libglfw.so symlink, while libglfw2 contains the actual libglfw.so.2 object.
Once you have a set of .so files, you can check them for whatever function you are interested in:
$ nm -D /usr/lib/libglfw.so | grep "glfwCreateWindow"
0000000000007cd0 T glfwCreateWindow
Note that I pulled this last step from a comment on the previous question and don't fully understand it. Maybe you could even skip the earlier steps and rely on nm and grep alone?
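The same check can also be made from a program instead of with nm. The following is a rough sketch under some assumptions: a glibc system (dladdr is a GNU extension, and older glibc needs -ldl at link time), and the helper name defining_object is invented for this example. It asks the dynamic loader to resolve the symbol through a candidate library and reports which file the resolved address lives in.

```cpp
#include <dlfcn.h>  // dlopen, dlsym, dladdr, dlclose
#include <string>

// Hypothetical helper: load `library` (a soname such as "libX11.so.6" or a
// full path), look up `symbol`, and return the path of the object that
// defines it. Returns an empty string if the library or symbol is missing.
std::string defining_object(const char* library, const char* symbol) {
    void* handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr) {
        return "";  // not installed, or not a loadable ELF object
    }
    std::string file;
    void* addr = dlsym(handle, symbol);
    Dl_info info;
    if (addr != nullptr && dladdr(addr, &info) != 0 && info.dli_fname != nullptr) {
        file = info.dli_fname;  // object containing the resolved address
    }
    dlclose(handle);
    return file;
}
```

Note that dlsym on a handle also searches the dependencies of that library, so the returned file can be a different object from the one that was opened. Calling defining_object("libglfw.so.3", "glfwCreateWindow") would be the programmatic counterpart of the nm line above, assuming that soname exists on the system.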
This is not a surefire way, but it can help in many cases.
Basically, you can usually find the library name at the bottom of the man page.
E.g., man XCreateWindow says libX11 on the last line. Then you look for libX11.so and use nm or readelf to see all exported functions.
Another example, man XShm says libXext at the bottom. And so on.
UPDATE
If the function is in section (2) of the man pages, it's a system call (see man man) and is provided by glibc, which would be libc-2.??.so.
Lastly (thanks Basile), if the function does not mention the library, it is also most likely provided by glibc.
DISCLAIMER: Again this is not a 100% accurate method -- but it should help in most cases.
You can ask gcc to tell you which file it would use for linking like so:
gcc --print-file-name=libX11.so
Sample output:
/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/libX11.so
This file will usually be a symlink, so you'll have to pipe it through readlink or realpath to get the actual file. For example:
readlink -f $(gcc --print-file-name=libXext.so)
Sample output:
/usr/lib/x86_64-linux-gnu/libXext.so.6.4.0
As I commented, you could use gcc to link your program, and then it should be able to accept -lX11 ; by using gcc -v instead of gcc you'll find out what is actually linked and how.
However, you have a much more significant issue than finding the lib*.so.*: most C or C++ APIs are described in header files, and these header files also contain symbolic constants (like O_RDONLY for open(2)...) or macros (like WIFEXITED in POSIX wait ...) whose value or expansion you have to find manually in header files or documentation. (Quite often, such constants are either preprocessor #define-d constants or enum values.) Also, some headers, in particular in C++, contain a lot of inline-d functions (or macros)!
A possible way might be to generate some C files to find all these constants, enums, macros, inlined functions..., and/or to customize the GCC compiler (e.g. with MELT ...) to find them.
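To illustrate the "generate some C files" idea, here is a small sketch in C++. The selection of constants is arbitrary and the function names are made up; the point is only that the compiler, not the assembly programmer, looks up the values.

```cpp
#include <fcntl.h>     // O_RDONLY, O_WRONLY, O_RDWR, O_CREAT
#include <sys/wait.h>  // WIFEXITED
#include <map>
#include <string>

// Collect the numeric values of a few symbolic constants so they can be
// copied into assembly source. The selection is purely illustrative.
std::map<std::string, long> header_constants() {
    return {
        {"O_RDONLY", O_RDONLY},
        {"O_WRONLY", O_WRONLY},
        {"O_RDWR", O_RDWR},
        {"O_CREAT", O_CREAT},
    };
}

// A macro such as WIFEXITED has no single value; wrapping it in a function
// gives something that can be called, or disassembled to see its expansion.
bool exited_normally(int status) {
    return WIFEXITED(status);
}
```

Printing the map in whatever syntax your assembler uses for constants would give you an include file. The values are specific to the target architecture and C library the helper was compiled for.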
So my message is that for better or worse, the C language is deeply tied to Linux & POSIX.
You might restrict yourself to use only syscalls(2) from your assembler code. Then you won't use libX11 and you don't need any header or constant (except the ones for syscalls, starting from <asm/unistd.h>).
BTW, in 2015, coding entirely in assembler for performance reasons is a mistake. The compiler is generating better code than you reasonably can (as soon as you have more than a few hundred machine instructions). In practice, you can code in assembler with GCC by using extended asm instructions in your C functions.
Or are you building your own compiler? Then you should have said so in your question!
Read also the Program Library HowTo & the Linux Assembly HowTo

Emacs, Cedet and semantic

I've configured CEDET for Emacs following Alex's article (great!!).
Now, the questions:
I've generated GTAGS with GNU Global in my /usr/include; how can I check whether semantic is using GTAGS?
Can I keep my GTAGS in another directory and instruct semantic to use that dir?
In C/C++ sources, completion on include statements (from system headers) doesn't list all available headers. OK, this is a stupid problem, but it makes me think something is not working right.
You can use the command:
M-x semantic-c-describe-environment RET
to find out about your include path and CPP macro settings.
To test GNU Global use, you can use:
M-x semanticdb-test-gnu-global RET printf RET
to search for "printf" in some project. Since your project (perhaps in /home/you/myproject) does not have printf in it, the search will fail, but if you open a file in /usr/include and run the same command, it will hopefully identify printf.
A more general way to ask about GNU Global is with:
M-x cedet-gnu-global-version-check RET
That said, the GNU Global support works best when you have lots and lots of preparsed files that you access infrequently. After a header has been accessed once (as for printf), the GNU Global database is no longer used for it, because an equivalent Semantic database will have been created. This is necessary because GNU Global does not provide enough information to do smart completion.