Determine the path of a *.dll or *.so at runtime - c++

A C++ library must read a file from disk that is located in the same directory as the library. I don't know if the question is trivial or if it is impossible: How can the library determine its own path on disk? The solution should work for Linux (.so) and Windows (.dll).

Although the task you describe is usually solved in a different fashion, here's the solution for Linux:
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* dladdr() is a GNU extension; g++ defines this already */
#endif
#include <stdio.h>
#include <dlfcn.h>
void foo() {
    Dl_info dlInfo;
    /* dladdr() returns 0 if the address cannot be matched to a loaded object. */
    if (dladdr((void *)puts, &dlInfo) != 0 && dlInfo.dli_fname != NULL)
        printf("puts is loaded from %s\n", dlInfo.dli_fname);
    else
        printf("It's strange but puts is not found\n");
    puts("Hello, world! I'm foo!");
}
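Here dladdr determines which shared object provides the puts function. To find the library's own path, pass the address of a function defined in that library instead.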
For Windows see this recipe
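As a rough sketch of how a library could locate itself on both platforms: the function below asks the loader which module contains the address of a function defined in the same module. The names anchorSymbol and currentModulePath are made up for this illustration. The Linux branch uses dladdr (older glibc versions need -ldl at link time, and dli_fname may be a relative path, exactly as the loader saw it); the Windows branch uses GetModuleHandleEx with the FROM_ADDRESS flag followed by GetModuleFileName. Treat it as a starting point, not a tested cross-platform implementation.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

// Hypothetical anchor: any function defined in this module will do,
// because only its address is used.
static void anchorSymbol() {}

// Returns the path of the module (.so, .dll or executable) that contains
// anchorSymbol(), or an empty string if it cannot be determined.
std::string currentModulePath() {
#ifdef _WIN32
    HMODULE module = nullptr;
    if (!GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                                GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                            reinterpret_cast<LPCSTR>(&anchorSymbol), &module))
        return std::string();
    char buffer[MAX_PATH];
    DWORD length = GetModuleFileNameA(module, buffer, MAX_PATH);
    // A return value equal to the buffer size means the path was truncated.
    if (length == 0 || length >= MAX_PATH)
        return std::string();
    return std::string(buffer, length);
#else
    Dl_info info;
    if (dladdr(reinterpret_cast<void*>(&anchorSymbol), &info) == 0 ||
        info.dli_fname == nullptr)
        return std::string();
    return info.dli_fname;
#endif
}
```

When this is compiled into a shared library, the result names that library; when it is compiled into the main program, it names the executable.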
When I said that the task is usually solved differently, I meant the following. On Linux (and other typical Unix-like systems), shared libraries are kept separate from the rest of the application data: the former reside in /lib, /lib64, /usr/lib or /usr/lib64 (depending on the target platform, the importance of a particular library and other factors), while the latter typically goes to /usr/share/<appname>. So the path to the library won't help you. The path to application data files is usually configured at compile time, or sometimes via a dedicated setting in the application's configuration.
For those "third-party" apps which are intended to be installed to /opt/ or other "special locations", like /usr/local the path to application data is calculated from the path of the application, which in turn is often determined from application binary location, which is known right from main() arguments. Certainly your case may require a special layout, but it's useful to keep in mind these considerations.

Thank you for the answers. As I understand now getting the path is overly complicated and not compatible across platforms. It seems to be better to embed the file in the executable.
Edit: I have found
http://www.fourmilab.ch/xd/
which is really easy to use.

Is using an absolute path to load system DLLs with LoadLibrary() and GetModuleHandle() better to prevent DLL hijacking?

In many cases, to load some newer API one would use a construct such as:
(FARPROC&)pfnZwQueryVirtualMemory = ::GetProcAddress(::GetModuleHandle(L"ntdll.dll"), "ZwQueryVirtualMemory");
But then, considering the chance of DLL hijacking, is it better to specify the DLL's absolute path, as below? Or is that just overkill?
WCHAR buff[MAX_PATH];
buff[0] = 0;
if(::GetSystemDirectory(buff, MAX_PATH) &&
    ::PathAddBackslash(buff) &&
    SUCCEEDED(::StringCchCat(buff, MAX_PATH, L"ntdll.dll")))
{
    (FARPROC&)pfnZwQueryVirtualMemory = ::GetProcAddress(::GetModuleHandle(buff), "ZwQueryVirtualMemory");
}
else
{
    //Something went wrong
    pfnZwQueryVirtualMemory = NULL;
}
The problem with the latter method is that it doesn't always work (for instance with Comctl32.dll.)
You don't have to do anything special for ntdll.dll and kernel32.dll: they are loaded before you get the chance to do anything, and they are also on the known-dlls list.
DLL hijacking issues often involve auxiliary libraries. Take version.dll, for example: it is no longer on the known-dlls list, so explicitly linking to it is problematic and it needs to be loaded dynamically.
The best solution is a combination of 3 things:
1) Call SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_SYSTEM32) if it is available (Win8+ and updated Win7).
2) Call LoadLibrary with the full path (GetSystemDirectory) before calling GetModuleHandle.
3) Don't explicitly link to anything other than kernel32, user32, gdi32, shlwapi, shell32, ole32, comdlg32 and comctl32.
If SetDefaultDllDirectories is not available, it is really hard to protect yourself when you don't control the application directory, because various Windows functions (especially the shell APIs) delay-load DLLs like shcore.dll without full paths. SetDllDirectory("") helps against the current/working directory, but there is no good application-directory workaround for unpatched pre-Win8 systems; you just have to test with Process Monitor and manually load the problematic libraries early in WinMain.
The application directory is a problem because some users just put everything in the downloads folder and run it from there. This means you might end up with a malicious dll in your application directory.
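A minimal sketch of the first two points, assuming you want to load an auxiliary DLL such as version.dll from the system directory. The helper names joinWindowsPath and loadSystemDll are made up for this example. SetDefaultDllDirectories is looked up at runtime because it is missing on unpatched older systems, and in a real program it would be called once at startup, not on every load. The Windows part is untested here and only compiles under _WIN32.

```cpp
#include <string>

// Pure helper: joins a directory and a file name with exactly one backslash.
std::wstring joinWindowsPath(std::wstring directory, const std::wstring& file) {
    if (!directory.empty() && directory.back() != L'\\')
        directory += L'\\';
    return directory + file;
}

#ifdef _WIN32
#include <windows.h>

// Loads `name` from the system directory using a full path.
// Returns nullptr on failure.
HMODULE loadSystemDll(const wchar_t* name) {
    // Restrict the default search path if the API exists (Win8+, updated Win7).
    typedef BOOL(WINAPI* SetDefaultDllDirectoriesFn)(DWORD);
    HMODULE kernel32 = ::GetModuleHandleW(L"kernel32.dll");
    if (kernel32 != nullptr) {
        SetDefaultDllDirectoriesFn setDirs =
            reinterpret_cast<SetDefaultDllDirectoriesFn>(
                ::GetProcAddress(kernel32, "SetDefaultDllDirectories"));
        if (setDirs != nullptr)
            setDirs(0x00000800);  // value of LOAD_LIBRARY_SEARCH_SYSTEM32
    }

    wchar_t directory[MAX_PATH];
    UINT length = ::GetSystemDirectoryW(directory, MAX_PATH);
    if (length == 0 || length >= MAX_PATH)
        return nullptr;
    return ::LoadLibraryW(joinWindowsPath(directory, name).c_str());
}
#endif
```

After loadSystemDll(L"version.dll") succeeds, GetModuleHandle and GetProcAddress on that module refer to the copy loaded from the system directory.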

Substitute for call to "system" in my C++ program

I'm trying to find a substitute for a call to "system" (from stdlib.h) in my C++ program.
So far I've been using it to call g++ in my program to compile and then link a variable number of source files in a directory chosen by the user.
Here is an example of roughly what the command could look like: "C:/mingw32/bin/g++.exe -L"C:\mingw32\lib" [...]"
However, I have the problem that (at least with the MinGW compiler I'm using) I get the error "Command line is too long" when the command string gets too long.
In my case it was about 12000 characters long. So I probably need another way to call g++.
Additionally, I've read that you generally shouldn't use "system" anyway: http://www.cplusplus.com/forum/articles/11153/
So I'm in need for some substitute (that should also be as platform independent as possible, because I want the program to run on Windows and Linux).
I've found one candidate that generally looks quite well suited:
_execv / execv:
Platform independent, but:
a) http://linux.die.net/man/3/exec says "The exec() family of functions replaces the current process image with a new process image". So do I need to call "fork" first so that the C++ program isn't terminated? Is fork also available on Windows/MSVC?
b) Using "system", I've tested whether the return value was 0 to see if the source file could be compiled. How would this work with exec? If I understand the manpage correctly, will it only return the success of creating the new process and not the status of g++? And with which function could I suspend my program to wait for g++ to finish and get the return value?
All in all, I'm not quite sure how I should handle this. What are your suggestions? How do multiplatform programs like Java (Runtime.getRuntime().exec(command)) or the Eclipse C++ IDE solve this internally? What would you suggest I do to call g++ in a system-independent way, with as many arguments as I want?
EDIT:
Now I'm using the following code - I've only tested it on Windows yet, but at least there it seems to work as expected. Thanks for your idea, jxh!
Maybe I'll look into shortening the commands by using relative paths in the future. Then I would have to find a platform independent way of changing the working directory of the new process.
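int success = -1;
#ifdef WIN32
success = spawnv(P_WAIT, sCompiler.c_str(), argv);
#else
pid_t pid = fork();
switch (pid) {
case -1:
    cerr << "Error using fork()" << endl;
    return -1;
case 0:
    execv(sCompiler.c_str(), argv);
    // execv() only returns on failure; leave the child without running parent code.
    _exit(127);
default: {
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        cerr << "Error using waitpid()" << endl;
        return -1;
    }
    // Only trust the exit code if g++ terminated normally.
    success = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
}
#endif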
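For reference, the POSIX branch can be wrapped into a self-contained function roughly like this. It is only a sketch: runAndWait is a made-up name, it covers Linux/Unix only (the Windows branch would still use spawnv), and it does not retry waitpid when a signal interrupts it.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `program` with `args` (args[0] is conventionally the program name) and
// returns its exit code, or -1 if it could not be forked or died abnormally.
// If execv() itself fails, the child exits with 127.
int runAndWait(const std::string& program, const std::vector<std::string>& args) {
    // execv() expects a null-terminated array of char*.
    std::vector<char*> argv;
    for (const std::string& arg : args)
        argv.push_back(const_cast<char*>(arg.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(program.c_str(), argv.data());
        _exit(127);  // only reached when execv() failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the arguments are passed as separate strings and no shell is involved, the length limit of the shell's command line does not apply, although the operating system still caps the total size of the argument list.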
You might get some traction with some of these command line options if all your files are in (or could be moved to) one (or a small number) of directories. Given your sample path to audio.o, this would reduce your command line by about 90%.
-Ldir
Add directory dir to the list of directories to be searched for `-l'.
From: https://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_3.html#SEC17
-llibrary
Search the library named library when linking.
It makes a difference where in the command you write this option; the linker searches and processes libraries and object files in the order they are specified. Thus, `foo.o -lz bar.o' searches library `z' after file `foo.o' but before `bar.o'. If `bar.o' refers to functions in `z', those functions may not be loaded.
The linker searches a standard list of directories for the library, which is actually a file named `liblibrary.a'. The linker then uses this file as if it had been specified precisely by name.
The directories searched include several standard system directories plus any that you specify with `-L'.
Normally the files found this way are library files--archive files whose members are object files. The linker handles an archive file by scanning through it for members which define symbols that have so far been referenced but not defined. But if the file that is found is an ordinary object file, it is linked in the usual fashion. The only difference between using an `-l' option and specifying a file name is that `-l' surrounds library with `lib' and `.a' and searches several directories.
From: http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_3.html
Here's another option, perhaps closer to what you need. Try changing the directory before calling system(). For example, here's what happens in Ruby...I'm guessing it would act the same in C++.
> system('pwd')
/Users/dhempy/cm/acts_rateable
=> true
> Dir.chdir('..')
=> 0
> system('pwd')
/Users/dhempy/cm
=> true
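In C++ on a POSIX system the same idea might look like the sketch below: change into the directory, run the command with shorter relative paths, then change back. The names currentDirectory and systemIn are made up for this example, and on Windows the equivalent calls would be _chdir and _getcwd.

```cpp
#include <cstdlib>
#include <string>
#include <limits.h>
#include <unistd.h>

// Returns the current working directory, or "" on failure.
std::string currentDirectory() {
    char buffer[PATH_MAX];
    return getcwd(buffer, sizeof buffer) ? std::string(buffer) : std::string();
}

// Runs `command` through system() with `directory` as the working directory,
// then restores the previous one. Returns system()'s raw result, or -1 if a
// directory change failed.
int systemIn(const std::string& directory, const std::string& command) {
    const std::string previous = currentDirectory();
    if (previous.empty() || chdir(directory.c_str()) != 0)
        return -1;
    int result = std::system(command.c_str());
    if (chdir(previous.c_str()) != 0)
        return -1;
    return result;
}
```

The working directory is process-wide, so this is not safe if other threads rely on relative paths at the same time.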
If none of the other answers pan out, here's another. You could set an environment variable to be the path to the directory, then use that variable before each file that you link in.
I don't like this approach much, as you have to tinker with the environment, and I don't know if that would actually affect the command line limit. It may be that the limit applies after interpolating the command. But it's something to think about, regardless.

How to get the application running path at runtime in c++ [duplicate]

This question already has answers here:
Finding current executable's path without /proc/self/exe
(14 answers)
Closed 7 years ago.
Is there a way in C/C++ to find the location (full path) of the current executed program?
(The problem with argv[0] is that it does not give the full path.)
To summarize:
On Unixes with /proc, the most straightforward and reliable way is:
readlink("/proc/self/exe", buf, bufsize) (Linux)
readlink("/proc/curproc/file", buf, bufsize) (FreeBSD)
readlink("/proc/self/path/a.out", buf, bufsize) (Solaris)
On Unixes without /proc (i.e. if above fails):
If argv[0] starts with "/" (absolute path) this is the path.
Otherwise if argv[0] contains "/" (relative path) append it to cwd
(assuming it hasn't been changed yet).
Otherwise search directories in $PATH for executable argv[0].
Afterwards it may be reasonable to check whether the executable isn't actually a symlink.
If it is resolve it relative to the symlink directory.
This step is not necessary in /proc method (at least for Linux).
There the proc symlink points directly to executable.
Note that it is up to the calling process to set argv[0] correctly.
It is right most of the time; however, there are occasions when the calling process cannot be trusted (e.g. a setuid executable).
On Windows: use GetModuleFileName(NULL, buf, bufsize)
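The Linux variant of the /proc method could be sketched as follows. executablePath is a made-up name; the loop is there because readlink does not null-terminate the result and silently truncates it when the buffer is too small.

```cpp
#include <string>
#include <vector>
#include <unistd.h>

// Returns the absolute path of the running executable on Linux,
// or an empty string if /proc/self/exe cannot be read.
std::string executablePath() {
    std::vector<char> buffer(256);
    for (;;) {
        ssize_t length = readlink("/proc/self/exe", buffer.data(), buffer.size());
        if (length < 0)
            return std::string();
        if (static_cast<size_t>(length) < buffer.size())
            return std::string(buffer.data(), static_cast<size_t>(length));
        // The result filled the whole buffer, so it may have been truncated.
        buffer.resize(buffer.size() * 2);
    }
}
```

For FreeBSD or Solaris, the same function should work with the respective link names listed above in place of /proc/self/exe.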
Use GetModuleFileName() function if you are using Windows.
Please note that the following comments are unix-only.
The pedantic answer to this question is that there is no general way to answer this question correctly in all cases. As you've discovered, argv[0] can be set to anything at all by the parent process, and so need have no relation whatsoever to the actual name of the program or its location in the file system.
However, the following heuristic often works:
If argv[0] is an absolute path, assume this is the full path to the executable.
If argv[0] is a relative path, ie, it contains a /, determine the current working directory with getcwd() and then append argv[0] to it.
If argv[0] is a plain word, search $PATH looking for argv[0], and append argv[0] to whatever directory you find it in.
Note that all of these can be circumvented by the process which invoked the program in question. Finally, you can use linux-specific techniques, such as mentioned by emg-2. There are probably equivalent techniques on other operating systems.
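The heuristic above could be sketched like this. guessExecutablePath is a made-up name, and the working directory and the PATH value are passed in as parameters (in a real program they would come from getcwd() and getenv("PATH")) so the function is easy to test. Note that access() with X_OK also succeeds for directories, so a stricter version would additionally check the file type with stat().

```cpp
#include <sstream>
#include <string>
#include <unistd.h>

// Guesses the executable's path from argv[0]. Returns "" if nothing matches.
std::string guessExecutablePath(const std::string& argv0,
                                const std::string& cwd,
                                const std::string& pathVar) {
    if (argv0.empty())
        return std::string();
    if (argv0[0] == '/')
        return argv0;                           // absolute path
    if (argv0.find('/') != std::string::npos)
        return cwd + "/" + argv0;               // relative path

    // Plain word: search each directory listed in PATH.
    std::istringstream directories(pathVar);
    std::string directory;
    while (std::getline(directories, directory, ':')) {
        if (directory.empty())
            directory = ".";  // an empty PATH entry means the current directory
        const std::string candidate = directory + "/" + argv0;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;
    }
    return std::string();
}
```

The result is not normalized ("/tmp/./foo" stays as it is); realpath() can clean it up and resolve symlinks afterwards.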
Even supposing that the steps above give you a valid path name, you still might not have the path name you actually want (since I suspect that what you actually want to do is find a configuration file somewhere). The presence of hard links means that you can have the following situation:
-- assume /app/bin/foo is the actual program
$ mkdir /some/where/else
$ ln /app/bin/foo /some/where/else/foo # create a hard link to foo
$ /some/where/else/foo
Now, the approach above (including, I suspect, /proc/$pid/exe) will give /some/where/else/foo as the real path to the program. And, in fact, it is a real path to the program, just not the one you wanted. Note that this problem doesn't occur with symbolic links which are much more common in practice than hard links.
In spite of the fact that this approach is in principle unreliable, it works well enough in practice for most purposes.
Not an answer actually, but just a note to keep in mind.
As we could see, the problem of finding the location of running executable is quite tricky and platform-specific in Linux and Unix. One should think twice before doing that.
If you need your executable location for discovering some configuration or resource files, maybe you should follow the Unix way of placing files in the system: put configs in /etc, /usr/local/etc or the current user's home directory; /usr/share is a good place for your resource files.
On many Unix-like systems with /proc (Linux, for example) you can check the symlink located at /proc/PID/exe. A few examples:
# file /proc/*/exe
/proc/1001/exe: symbolic link to /usr/bin/distccd
/proc/1023/exe: symbolic link to /usr/sbin/sendmail.sendmail
/proc/1043/exe: symbolic link to /usr/sbin/crond
Remember that on Unix systems the binary may have been removed since it was started. It's perfectly legal and safe on Unix. Last I checked Windows will not allow you to remove a running binary.
/proc/self/exe will still be readable, but it will not be a working symlink really. It will be... odd.
On Mac OS X, use _NSGetExecutablePath.
See man 3 dyld and this answer to a similar question.
For Linux, the /proc/self/exe way of doing things is bundled up in a nice library called binreloc, which you can find at:
http://autopackage.org/docs/binreloc/
I would
1) Use the dirname() function (documented on the same page as basename()) to get the directory part of the executable's path: http://linux.die.net/man/3/basename
2) chdir() to that directory
3) Use getcwd() to get the current directory
That way you'll get the directory in a neat, full form, instead of ./ or ../bin/.
Maybe you'll want to save and restore the current directory, if that is important for your program.

How to find "my" lib directory?

I'm developing a C++ program under Linux. I want to put some stuff (to be specific, LLVM bitcode files, but that's not important) in libraries, so I want the following directory structure:
/somewhere/bin/myBin
/somewhere/lib/myLib.bc
How do I find the lib directory? I tried to compute a relative path from argv[0], but if /somewhere is in my PATH, argv[0] will just contain myBin. Is there some way to get this path? Or do I have to set it at compile time?
How do GNU autotools deal with this? What happens exactly if I supply the --prefix option to ./configure?
Edit: The word library is a bit misleading in my case. My library consists of LLVM bitcode, so it's not an actual (shared) object file, just a file I want to open from my program. You can think of it as an image or text file.
maybe what you want is :
/usr/lib
unix directory reference: http://www.comptechdoc.org/os/linux/usersguide/linux_ugfilestruct.html
Assume your lib directory is "../lib" relative to the executable.
First you need to identify where myBin is located. You can get that by reading /proc/self/exe.
Then appending "../lib" to the directory that contains the binary gives you the lib directory.
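The path arithmetic for this layout could be sketched as follows, assuming the executable path is already absolute (for example, read from /proc/self/exe). The names parentDirectory and libDirectoryFor are made up for this example, and the code uses plain string handling so it does not need C++17's std::filesystem.

```cpp
#include <string>

// Returns the directory part of `path` ("/a/b/c" -> "/a/b"), or "" if `path`
// contains no slash.
std::string parentDirectory(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    if (slash == std::string::npos)
        return std::string();
    if (slash == 0)
        return "/";
    return path.substr(0, slash);
}

// For /somewhere/bin/myBin this yields /somewhere/lib, which is the
// normalized form of /somewhere/bin/../lib.
std::string libDirectoryFor(const std::string& executablePath) {
    const std::string binDirectory = parentDirectory(executablePath);
    const std::string prefix = parentDirectory(binDirectory);
    if (prefix.empty())
        return std::string();
    return prefix == "/" ? std::string("/lib") : prefix + "/lib";
}
```

With the layout from the question, libDirectoryFor("/somewhere/bin/myBin") + "/myLib.bc" names the bitcode file.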
You will have to use a compiler flag to tell the program. For example, if you have a plugin dir:
# Makefile.am
AM_CPPFLAGS = -DPLUGIN_DIR=\"${pkglibdir}\"
bin_PROGRAMS = awesome_prog
pkglib_LTLIBRARIES = someplugin.la
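On the C++ side, the macro defined by that flag can then be used like any string literal. A small sketch, where pluginPath and the fallback directory are made up for illustration (the fallback only exists so the snippet compiles without the -DPLUGIN_DIR flag):

```cpp
#include <string>

// PLUGIN_DIR is normally injected by the build system via -DPLUGIN_DIR="...".
// The fallback below is a hypothetical default for standalone compilation.
#ifndef PLUGIN_DIR
#define PLUGIN_DIR "/usr/local/lib/awesome_prog"
#endif

// Builds the full path of a file inside the compiled-in plugin directory.
std::string pluginPath(const std::string& fileName) {
    return std::string(PLUGIN_DIR) + "/" + fileName;
}
```

The directory is fixed at build time, so the installed program cannot be moved to another prefix without rebuilding.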
The list of directories to be searched is stored in the file /etc/ld.so.conf.
In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated set of directories where libraries should be searched for first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes.
LD_LIBRARY_PATH is handy for development and testing:
$ export LD_LIBRARY_PATH=/path/to/dir/containing/mylib
$ ./myprogram
Addressing only the portion of the question "how do GNU autotools deal with this?"...
When you pass --prefix to configure, basically two things happen: 1) it instructs the build system that everything is to be installed in ${prefix}, and 2) it looks in ${prefix}/share/config.site for any additional information about how the system is set up (it is common for that file not to exist). It does absolutely nothing to help find libraries, but depends on the user having set up the toolchain properly. If you want to use a library in /foo/lib, you must have your toolchain set up to look there (e.g., by putting /foo/lib in /etc/ld.so.conf, or by putting -L/foo/lib in LDFLAGS and /foo/lib in LD_LIBRARY_PATH).
The configure script relies on you to have the environment set up. It does not help you set up that environment, but does help by alerting you that you have not done so.
You could use the readlink system call on /proc/self/exe to get the path of your executable. You might then use realpath etc.

C++ How to get a filename (and path) of the executing .so module in Unix

C++ How to get a filename (and path) of the executing .so module in Unix?
Something similar to GetModuleFileName on Windows.
Although it is not a POSIX standard interface, the dladdr() function is available on many systems including Linux, Solaris, Darwin/Mac OS X, FreeBSD, HP-UX, and IRIX. This function takes an address, which could be a pointer to a static function within the module for example (if cast to void *), and fills in a Dl_info structure with information including the path name of the shared object containing that address (in the dli_fname member).
Unfortunately, there is no way to do that using UNIX or POSIX. If you need to use it to look up some sort of data, you should use the $PATH environment variable and search for the data in a path that is relative to each entry in $PATH. For example, it is not uncommon to store binaries in "installdir/bin" for some installation directory "installdir" and to store the associated data in "installdir/share/name_of_program" for some installation directory and some program named "name_of_program". If that is the case, then looking at "../share/name_of_program/name_of_resource_file" relative to each entry in getenv("PATH") is a good way of searching for resources. Another thing you could do is allow the necessary information to be provided on the commandline or in some configuration file, and only perform the search if needed as a fallback option.
Edit
Now that you've stated your rationale for this, I would advise you to simply use the QSettings class from Qt for your configuration information, as it uses the preferred native mechanism for each platform (the registry on Windows, a PLIST file on Mac OS X, the Gnome GConf database on Linux). You may want to take a look at my C++ Project Template as it uses Qt to do just this, and it provides simple commandline options to easily tweak the configuration settings ("--prefset", "--prefget", and "--preflist" manipulate QSettings).
That said, if you absolutely must use an XML configuration file of your own instead of using the preferred native mechanism, I strongly advise you to place the system-wide configuration in "installdir/etc" while placing your library in "installdir/lib" for some installation directory "installdir", as that is the typical place for configuration files on UNIX systems, and "installdir/lib" should ONLY be used for library files, not for configuration files and other errata. I suggest you place a user-specific version of the configuration file in "$XDG_CONFIG_HOME" (if it is defined) or in "$HOME/.config" (where "$HOME" is the user's home folder).
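The lookup of the user-specific directory described here could be sketched as below. userConfigDirectory is a made-up name, and an empty XDG_CONFIG_HOME is treated the same as an unset one.

```cpp
#include <cstdlib>
#include <string>

// Returns $XDG_CONFIG_HOME if it is set and non-empty, otherwise
// $HOME/.config, otherwise an empty string.
std::string userConfigDirectory() {
    const char* xdg = std::getenv("XDG_CONFIG_HOME");
    if (xdg != nullptr && xdg[0] != '\0')
        return xdg;
    const char* home = std::getenv("HOME");
    if (home != nullptr && home[0] != '\0')
        return std::string(home) + "/.config";
    return std::string();
}
```

The per-user configuration file would then live at userConfigDirectory() + "/name_of_your_program.conf.xml", following the naming used in this answer.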
When searching for the system-wide configuration file, I would recommend searching within $XDG_CONFIG_DIRS if it is defined. If it isn't, a sensible approach is to fall back to "/etc/xdg", or to search for "../etc/name_of_your_program.conf.xml" relative to each entry in "$PATH" (and possibly also relative to "$LD_LIBRARY_PATH", "$DYLD_LIBRARY_PATH", "$DYLD_FALLBACK_LIBRARY_PATH", the contents of "/etc/ld.so.conf" if it exists, and the contents of "/etc/ld.so.conf.d/*.conf" if those files exist), halting the search as soon as you encounter the first valid configuration file.
Credit goes to Roger for pointing out the XDG Basedir Spec and for his excellent constructive criticisms.
Possible solutions:
Read the /proc/{PID}/maps file for the list of mapped shared libraries, where {PID} is the process ID (you can get it using getpid(), or simply use /proc/self/maps).
Call the command-line tool ldd on the program binary (typically named by argv[0]).
If you write a solution from scratch, take a look at the source code of uClibc's ldd command to see how to get the list of shared libraries from an ELF binary.
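A Linux-only sketch of the first option: it reads /proc/self/maps and collects every file-backed mapping, which includes the executable itself and all loaded shared objects. mappedFiles is a made-up name, and the parsing assumes the usual "address perms offset dev inode pathname" layout of each line.

```cpp
#include <fstream>
#include <set>
#include <sstream>
#include <string>

// Returns the set of absolute file paths currently mapped into this process.
std::set<std::string> mappedFiles() {
    std::set<std::string> files;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string address, perms, offset, device, inode, path;
        fields >> address >> perms >> offset >> device >> inode;
        // The rest of the line is the path, which may contain spaces.
        std::getline(fields >> std::ws, path);
        // Skip anonymous mappings and pseudo entries such as [heap] or [stack].
        if (!path.empty() && path[0] == '/')
            files.insert(path);
    }
    return files;
}
```

To find one particular library, look for the entry whose file name matches it; the entry is already a full path.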