How to explore the functions in a .a library file - c++

I created a .a file and a corresponding .dll file with MinGW, to be run on Windows. Now I am trying to use the functions in the .a library from a different component, and I am getting a linker error. I wanted to check whether my functions were exported properly, so I followed this link and ran:
> lib.exe /list libSomeLibrary.a
As a result I get something like this:
d001861.o
d001862.o
d001863.o
d001864.o
d001865.o
d001866.o
d001867.o
d001868.o
d001869.o
d001870.o
d001871.o
d001872.o
d001873.o
d001874.o
d001875.o
d001876.o
d001877.o
d001878.o
d001879.o
d001880.o
d001881.o
d001882.o
d001883.o
....
....
Is this correct? Is there a way for me to actually get the names of the functions in the lib file?

Since you are using MinGW, you may use the nm command to list the contents of any libfoo.a archive file; used thus:
nm -A libfoo.a | more
(or better still, if you have the less command, e.g. in MinGW's MSYS shell environment):
nm -A libfoo.a | less
it will list all symbols specified within the archive, grouped and qualified by the name of the embedded object file which provides each one.

A .a library file contains a number of individual object files combined together into a library archive. It is the individual .o object files which actually contain the symbols and code.
You should be able to use the dumpbin tool to examine a library on Windows systems. The /exports option should show exported symbols. I don't have any experience with this tool, but the page above implies it should be able to operate directly on Windows lib files.
You state you are using MinGW, which comes with GNU Binutils, including nm and objdump, both of which can be used to examine object files. To use these tools you might find it more convenient to extract the .o files from the .a file, which you can do using the ar command. If you don't do that, they will list the information for every object file within the archive.
Example (shown on a Linux system, but these tools should work the same on Windows if using MinGW):
Extract corefile.o from libbfd.a
$ ar x /usr/lib64/libbfd.a corefile.o
Examine global (external) symbols in corefile.o
$ nm -g corefile.o
0000000000000000 T bfd_core_file_failing_command
0000000000000030 T bfd_core_file_failing_signal
0000000000000060 T bfd_core_file_pid
U bfd_set_error
0000000000000090 T core_file_matches_executable_p
U filename_cmp
00000000000000d0 T generic_core_file_matches_executable_p
U _GLOBAL_OFFSET_TABLE_
U strrchr
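To post-process a listing like the one above in a program, each line of default nm output can be split into an optional address, a one-letter type and a name. The following is a minimal C++17 sketch; NmSymbol and parse_nm_line are hypothetical names introduced here for illustration, and the code assumes the default output format without the -A prefix.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One entry of default `nm` output (illustrative type, not part of binutils).
struct NmSymbol {
    std::string address;  // empty for undefined symbols, which have no address
    char type;            // e.g. 'T' = defined in the text section, 'U' = undefined
    std::string name;
};

// Splits a line such as "0000000000000030 T bfd_core_file_failing_signal"
// or "                 U strrchr" into its fields.
// Returns std::nullopt for lines that do not have that shape.
std::optional<NmSymbol> parse_nm_line(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string field; in >> field;) {
        fields.push_back(field);
    }
    if (fields.size() == 3 && fields[1].size() == 1) {
        return NmSymbol{fields[0], fields[1][0], fields[2]};
    }
    if (fields.size() == 2 && fields[0].size() == 1) {
        return NmSymbol{"", fields[0][0], fields[1]};
    }
    return std::nullopt;
}
```

Entries whose type is T are the functions the object file defines and exports; entries with type U are the ones it expects some other object or library to provide.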

Related

How to find the functions used by a particular library in a C++ file

I'm working with legacy C++ code compiled with g++. The files in question are compiled using a library. My goal is to determine every use of a function or macro from a particular library in each of these files. (In my case, OpenSSL is the library in question, and I'll reference it as such throughout the rest of the post. However, I think my question generically applies to any C library I'd compile against.)
I could conceive of this being easier if OpenSSL were a C++ library using a namespace - I could simply grep on the namespace to find the OpenSSL functions. Since, however, it is a C library, undecorated OpenSSL functions and macros are sprinkled across some of the source files and I can't readily tell by scanning the source which functions are from OpenSSL and which are other local functions or functions from other libraries.
Looking through Stack Overflow, I see questions like this for the Windows environment, but I don't see any answers for a Linux environment. Broadening my search, I see references to nm and objdump, but if it's possible to get the details I'm looking for from these tools from an object file, I can't figure out the correct parameters to use.
Thanks in advance for your help!
I don't think there is a simple and quick solution for this; you will have to do some work.
There are three ways your software might link with openssl:
Static linking.
Dynamic linking with the runtime linker.
Manual linking with dlopen.
In all cases, the best solution would be to remove the header files and the openssl library from their location and recompile the code.
If you do not have access to the code you have to use nm or objdump to get the symbols from your executable and cross reference them with the ones in the openssl library. This will not work if you are using dlopen to link the library.
Another option would be to get the openssl library and recompile it with tracing enabled and execute your code with the new library.
The nm tool lists all the symbols in an object, regardless of whether it is a library or an executable. You can write a bash script that cross-references the output of nm on the openssl library with its output on your executable. The way to call this is nm objname. The third column is the one with the symbols.
objdump is a more precise tool that you can use to list all the symbols that are undefined in your executable. You can use it to print the private headers of your executable (objdump -p objname); on ELF systems this includes the NEEDED entries naming the libraries your executable needs at run time. If openssl is listed there, you are linking against it dynamically with the runtime linker. You can use objdump -T with the openssl shared library to get the symbols in its interface, and cross-reference these with the undefined dynamic symbols listed by objdump -T for your executable.
A coworker of mine was able to get this information using nm. Here's the procedure we followed:
Get the List of Symbols
As suggested by riodoro1 above, the list of objects from the library used by your code can be obtained by linking without the library (without -lcrypto in my case, for instance). Alternatively, this can be obtained as described below using nm
Run nm on all relevant objects:
find . -name '*.o' -exec nm {} \; > nm.txt
Find undefined symbols referenced by objects and strip symbols:
grep '^ *U' nm.txt > nm2.txt
Remove C++ symbols (mangled names begin with _Z), uniquify those remaining:
grep -v ' _Z' nm2.txt | sort | uniq > nm3.txt
Manually edit nm3.txt, remove symbols not part of openssl, write to nm4.txt.
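The grep/sort/uniq steps above can also be sketched in C++. This is only an illustration under stated assumptions: undefined_c_symbols is a hypothetical helper, the input is the default nm output already read into a vector of lines, and mangled names are assumed to start with _Z as on Linux.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the sorted, de-duplicated names of undefined symbols that do not
// look like mangled C++ names (same effect as the grep | sort | uniq steps,
// except that only the names are kept, not the whole lines).
std::set<std::string> undefined_c_symbols(const std::vector<std::string>& nm_lines) {
    std::set<std::string> names;
    for (const std::string& line : nm_lines) {
        std::istringstream in(line);
        std::string type;
        std::string name;
        // Undefined entries have no address column: "<spaces>U <name>".
        if (!(in >> type >> name) || type != "U") {
            continue;
        }
        // Skip mangled C++ names, which begin with "_Z".
        if (name.rfind("_Z", 0) == 0) {
            continue;
        }
        names.insert(name);
    }
    return names;
}
```

The result still has to be reviewed by hand, as in the last step above, because it also contains symbols from libc and any other C library the objects use.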
Use the Preprocessor to Expand Macros
Build the cc files normally, capture output to log file. Isolate the lines that show the commands that compiled lotus source files. Search and replace in the output to produce commands to invoke the preprocessor. Change:
-o .../file.o => -o .../file.i
' -c ' => ' -E '
Run the modified commands to produce preprocessor output.
The preprocessor output contains the full text from all included
header files, followed by the preprocessed C code. Headers are
long and uninteresting so strip them from the output. We'll get
just C code with expanded macros.
bash -c 'for f in `find . -name "*.i"`; do cat "${f}" | perl cat-preproc-without-headers.pl > "${f}"cc; done'
Here's the contents of cat-preproc-without-headers.pl:
#!/usr/bin/perl
# Write lines to stdout if cat != 0
$cat = 0;
while(<>) {
    if(/^# [1-9]\d* .*\.cc/) {
        $cat = 1;
    } elsif(/^# [0-9]/) {
        $cat = 0;
    } elsif($cat) {
        print;
    }
}
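For readers who would rather not use Perl, the same filter can be sketched in C++. This is a rough port, not the script the answer actually used: strip_header_lines is a hypothetical name, and it assumes the preprocessor output has already been read into a string.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Keeps only the lines that follow a line marker naming a .cc file and drops
// everything that follows a line marker naming any other file (the headers).
std::string strip_header_lines(const std::string& preprocessed) {
    static const std::regex source_marker(R"(^# [1-9][0-9]* .*\.cc)");
    static const std::regex any_marker(R"(^# [0-9])");

    std::istringstream in(preprocessed);
    std::string out;
    std::string line;
    bool copying = false;
    while (std::getline(in, line)) {
        if (std::regex_search(line, source_marker)) {
            copying = true;   // entering text that came from a .cc file
        } else if (std::regex_search(line, any_marker)) {
            copying = false;  // entering text that came from some other file
        } else if (copying) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

As in the Perl version, the line markers themselves are never copied, so the result is plain C++ with macros expanded.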
Conclusion
With the list of symbols and the expanded macros, you now have all the symbols from the library and the places where they are used in the source code.
As per @firebrush's suggestion, I am posting my comment as an answer (maybe for posterity).
To see where the library functions are used, you can remove the library from the link step and see which .o files have missing references.

How dynamic linking works, its usage and how and why you would make a dylib

I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that was employed to take full advantage of the virtual memory of the system. One process can share its pages with other processes. For example the libc++ needs to be linked with all C++ programs but instead of copying over the executable to every process, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled. It needs to have references to the C++ library functions and code (say for example the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system? Since the compiler has to make a reference to the dynamic library in the executable.
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled. It needs to have references to the C++
library functions and code (say for example the code for the library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. objdump on a Mac is called gobjdump and comes with brew in the binutils package. Try this on a Mac...
gobjdump --syms /usr/lib/libz.dylib
You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like
g++ -Wall -g -pedantic DynoLib_main.cpp -ldyno -o dyno_main
Note the -ldyno in that command. This is telling the compiler (really the linker ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you the developer asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
    return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
    DynamicLib();
    int square(int a);
};
#endif
Compile them to be a shared library as follows. This is linux specific...
g++ -Wall -g -pedantic -shared -fPIC -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>
int main() {
    DynamicLib lib;  // automatic storage, so there is nothing to delete
    std::cout << "Square " << lib.square(1729) << std::endl;
    return 0;  // 0 reports success to the shell
}
Compile it as follows
g++ -Wall -g -pedantic DynoLib_main.cpp -L. -ldyno -o dyno_main
LD_LIBRARY_PATH=. ./dyno_main
Square 2989441
You can also have a look at the main binary using nm. In the following I check whether anything in it contains the string square, i.e. whether the symbol I need from libdyno.so is referenced in my binary.
nm dyno_main | grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined, but this is the symbol name for our square method in the DynamicLib class that we created earlier. The odd-looking name is due to name mangling, which is its own topic.
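To turn a mangled name like this back into something readable from inside a program, GCC and Clang provide abi::__cxa_demangle in <cxxabi.h> (on the command line, c++filt or nm -C do the same job). A minimal sketch, assuming a GCC or Clang toolchain; the demangle wrapper is a hypothetical helper, not part of any library:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the demangled form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it has to be released with free.
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

Calling demangle("_ZN10DynamicLib6squareEi") yields DynamicLib::square(int): the mangled name encodes the class, the method and the parameter types.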
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc.) do the work. Note the -l flag names the library and the -L flag tells it where to look. There's a decent write-up on how the compiler finds things here:
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld:
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.
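To make the -L and -l lookup concrete, the following sketch lists the file names a linker would try for a given -l name. It models only the documented behaviour of GNU ld on ELF systems, where each directory is checked for libNAME.so before libNAME.a; candidate_library_paths is a hypothetical helper, and the real search also covers the default system directories.

```cpp
#include <string>
#include <vector>

// For "-lNAME", returns the candidate files in the order they would be tried:
// directories in command-line order, and within each directory the shared
// library before the static archive.
std::vector<std::string> candidate_library_paths(
        const std::string& name, const std::vector<std::string>& search_dirs) {
    std::vector<std::string> candidates;
    for (const std::string& dir : search_dirs) {
        candidates.push_back(dir + "/lib" + name + ".so");
        candidates.push_back(dir + "/lib" + name + ".a");
    }
    return candidates;
}
```

So -L. -L/usr/lib -lusb means: try ./libusb.so, ./libusb.a, /usr/lib/libusb.so, /usr/lib/libusb.a, and use the first one that exists. This is also why you rarely have to choose between the .a and the .dylib/.so yourself.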
If you managed to get here, it pays dividends to learn about the linker, i.e. ld. It does an important job and is the source of a ton of confusion, because most people start out dealing with a compiler and think that compiler == linker, which is not true.
The main difference is that you include static linked libraries with your app. They are linked when you build your app. Dynamic libraries are linked at run time, so you do not need to include them with your app. These days dynamic libraries are used to reduce the size of apps by having many dynamic libraries on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.

Creating correct D3D11 library file for GCC

What is the correct way to convert files like d3d11.lib that are provided in the DirectX SDK to the *.a GCC library format? I've tried the common reimp method for converting *.lib files to *.a files, but it doesn't seem to work.
Step one involves creating a definitions file:
bin\reimp -d d3d11.lib
Let's say I want to use the D3D11CreateDevice function that should be provided in this library. If I open the created definitions file everything seems to be OK:
LIBRARY "d3d11.dll"
EXPORTS
(...)
D3D11CreateDevice
D3D11CreateDeviceAndSwapChain
(...)
Next I try to create the *.a file using the definitions file and the original lib file:
bin\dlltool -v -d d3d11.def -l libd3d11.a
This does in fact produce a valid library (and no error messages when dlltool is set to verbose), but if I try to use the function D3D11CreateDevice that should be implemented in it, I get an error:
undefined reference to `D3D11CreateDevice'
If I ask nm what symbols are present in the library (and filter using grep), I get this:
D:\Tools\LIB2A>bin\nm libd3d11.a | grep D3D11CreateDevice
File STDIN:
00000000 I __imp__D3D11CreateDeviceAndSwapChain
00000000 T _D3D11CreateDeviceAndSwapChain
00000000 I __imp__D3D11CreateDevice
00000000 T _D3D11CreateDevice
The imp function is the function that calls the actual implementation of D3D11CreateDevice inside the DLL. However, that actual implementation is now prefixed by an underscore.
Why is "_D3D11CreateDevice" defined while "D3D11CreateDevice" is not even though it is mentioned in the definitions file?
Just do:
copy d3d11.lib libd3d11.a
Alternatively, you can use X:\path\to\d3d11.lib on the GCC command line instead of -ld3d11. The GNU utilities on Windows use the same PE/COFF archive format that Microsoft's tools use.
An outdated version of dlltool will prepend an underscore to every function when converting d3d11.lib. I solved it by using a dlltool.exe from MinGW-w64 4.9.2. This dlltool produces a library with the correct function names.
When using the regular d3d11.lib provided by Microsoft in combination with headers provided by anyone, a SIGSEGV will occur when stepping into the library at runtime. This means that you do have to convert to the *.a format for some reason not investigated.

Is there a .def file equivalent on Linux for controlling exported function names in a shared library?

I am building a shared library on Ubuntu 9.10. I want to export only a subset of my functions from the library. On the Windows platform, this would be done using a module definition (.def) file which would contain a list of the external and internal names of the functions exported from the library.
I have the following questions:
How can I restrict the exported functions of a shared library to those I want (i.e. a .def file equivalent)
Using .def files as an example, you can give a function an external name that is different from its internal name (useful for preventing name collisions and also redecorating mangled names etc.)
On Windows I can use the EXPORT command (IIRC) to check the list of exported functions and addresses; what is the equivalent way to do this on Linux?
The most common way to make only certain symbols visible in a shared object on Linux is to pass -fvisibility=hidden to gcc and then decorate the symbols that you want to be visible with __attribute__((visibility("default"))).
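As a rough illustration of that approach, the sketch below marks one function as exported and leaves a helper unmarked. The macro SPACESHIP_API and both function names are hypothetical and chosen only for this example.

```cpp
// Hypothetical export macro: expands to the GCC/Clang visibility attribute
// and to nothing on compilers that do not define __GNUC__.
#if defined(__GNUC__)
#define SPACESHIP_API __attribute__((visibility("default")))
#else
#define SPACESHIP_API
#endif

// Not marked: hidden from the shared object's dynamic symbol table
// when the file is compiled with -fvisibility=hidden.
int clamp_warp(int warp) {
    return warp < 0 ? 0 : (warp > 9 ? 9 : warp);
}

// Marked: stays visible to users of the library. extern "C" keeps the
// exported name unmangled.
extern "C" SPACESHIP_API int spaceship_speed(int warp) {
    return clamp_warp(warp) * 100;
}
```

If this is saved as spaceship.cpp and built with g++ -shared -fPIC -fvisibility=hidden spaceship.cpp -o libspaceship.so, then nm -D --defined-only libspaceship.so should list spaceship_speed but not clamp_warp.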
If you're looking for an export-file-like solution, you might want to look at the linker option --retain-symbols-file=FILENAME, which may do what you are looking for.
I don't know an easy way of exporting a function with a different name from its function name, but it is probably possible with an elf editor. Edit: I think you can use a linker script (have a look at the man page for ld) to assign values to symbols in the link step, hence giving an alternative name to a given function. Note, I haven't ever actually tried this.
To view the visible symbols in a shared object you can use the readelf command. readelf -Ds if I remember correctly.
How can I restrict the exported functions of a shared library to those I want (i.e. a .def file equivalent)
Perhaps you're looking for GNU Export Maps or Symbol Versioning
g++ -shared spaceship.cpp -o libspaceship.so.1 \
    -Wl,-soname=libspaceship.so.1 \
    -Wl,--version-script=spaceship.expmap
gcc also supports the VC syntax of __declspec(dllexport). See this.
Another option is to use the strip command this way:
strip --keep-symbol=symbol_to_export1 --keep-symbol=symbol_to_export2 ... \
libtostrip.so -o libout.so

Includes with the Linux GCC Linker

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable OR does the compiler insert a reference to an external binary file (a-la Windows DLL?)
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?
Well. When you include math.h the compiler will read the file that contains declarations of the functions and macros that can be used. If you call a function declared in that file (header), then the compiler inserts a call instruction into that place in your object file that will be made from the file you compile (let's call it test.c and the object file created test.o). It also adds an entry into the relocation table of that object-file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This would be a relocation entry for a function bar. An entry in the symbol table will be made noting the function is yet undefined:
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library called libm.so . The so extension is similar to the .dll extension for windows. It means it is a shared object file. The compiler, when linking, will fix-up all the places that appear in the relocation table of test.o, replacing its entries with the proper address of the bar function. Depending on whether you use the shared version of the library or the static one (it's called libm.a then), the compiler will do that fix-up after compiling, or later, at runtime when you actually start your program. When finished, it will inject an entry in the table of shared libraries needed for that program. (can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, if you start your program, the dynamic linker will lookup that library, and will link that library to your executable image. In Linux, the program doing this is called ld.so. Static libraries don't have a place in the dynamic section, as they are just linked to the other object files and then they are forgotten about; they are part of the executable from then on.
In reality it is actually much more complex, and I also don't understand this in detail. That's the rough plan, though.
There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files might actually not contain the implementation, only its declaration. If the implementation is located somewhere else, you've got to tell the compiler/linker that. By default, you do this by simply passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a resides in a directory where it can be found by the linker):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation of the static library libcurl.a (the compiler gcc actually ignores these arguments because it doesn't know what to do with them, and simply passes them on to the linker). However, this is called static linking. There's also dynamic linking, which takes place at startup of your program, and which happens with .dlls under Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files under Linux usually have the file extension .so.
The best way to learn more about these files is to familiarize yourself with the GCC linker, ld, as well as the excellent toolset binutils, with which you can edit/view library files effortlessly (any binary code files, really).
Is there an equivalent concept to Windows DLLs under *nix?
Yes, they are called "Shared Objects" or .so files. They are dynamically linked into your binary at runtime. On Linux you can use the "ldd" command on your executable to see which shared objects your binary is linked to. You can use ListDLLs from Sysinternals to accomplish the same thing on Windows.
The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions itself. Those live in libm.so, a "shared object", similar to Windows DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.
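A small sketch of the split between header and library: the header only declares sqrt, while the definition is expected to come from libm at link time. The function hypotenuse is a hypothetical example. Note that the g++ driver adds -lm on its own; it is gcc compiling C code that needs it spelled out.

```cpp
#include <cmath>  // declarations only; the definition of sqrt lives in libm

// Calls a math library function. In the object file this typically shows up
// as an undefined reference to sqrt that the linker resolves against libm,
// unless the compiler replaces the call with an inline instruction.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```

Compiling only this file with g++ -c and running nm on the resulting object is a quick way to check whether sqrt appears with type U.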
There is. The include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker. The -l argument to gcc/g++ tells the linker what library(ies) to add in. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.