What is the expected relation of C++ modules and dynamic linkage? - c++

The C++ modules TS provides an excellent facility for eliminating the preprocessor, improving compile times, and generally supporting much more robust, modular code development in C++, at least for non-template code.
The underlying machinery provides control over import and export of symbols in ordinary programs.
However, there is a major problem when developing libraries for two kinds of dynamic loading: startup-time (load-time) loading and run-time loading. The problem involves exporting symbols from the library, which is often discussed in terms of visibility.
Generally, not all the extern symbols of the translation units used to construct a dynamic link library should be made visible to the user. In addition, with run time loading, especially with a plugin concept, the same symbol must be exported from many concurrently loaded libraries.
On Windows, the language extensions __declspec(dllexport) and __declspec(dllimport) are attached in source code as attributes of symbols. More recently, gcc and clang on unix platforms use __attribute__((visibility("default"))) and __attribute__((visibility("hidden"))). Both sets of attributes are meant to support providing, and using, the symbols that a library makes public.

Using these attributes is complicated and messy. On Windows, macros must export the symbols while the library is compiled, but import them when the library is used. On unix platforms, the visibility must be set to default both to export and to import the symbols, and the compiler decides which applies based on whether a definition is found. The compiler must also be invoked with the -fvisibility=hidden switch, so that everything not explicitly marked stays hidden.

The export/import attributes are not required for static linkage, and should probably be macro'd out to an empty string there. Making the code and fiddling with the build system so that all this works is very hard. This is especially true because #included headers must have the correct symbol visibility set while library translation units are compiled. The file structure required in repositories is a mess, the source code is littered with macros, and in general the whole thing is a disaster. Almost all open source repositories FAIL to correctly export symbols for dynamic linkage. Most programmers also have no idea that dynamic library code structure (using two-level namespaces) is quite different from static linkage.
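A single macro usually selects the right decoration for each platform and build type. The sketch below shows that arrangement. It uses hypothetical macro names: MYLIB_API is the decoration, MYLIB_BUILDING is defined only while the library itself is compiled, and MYLIB_STATIC is defined for static builds. Three hypothetical files are collapsed into one listing so that it compiles on its own.

```cpp
// Normally passed on the library's own command line (e.g. -DMYLIB_BUILDING);
// defined here only so this single-file sketch compiles on every platform.
#define MYLIB_BUILDING

// mylib_export.h (hypothetical): one macro, right decoration per build.
#if defined(MYLIB_STATIC)
  #define MYLIB_API                            // static linkage: no decoration
#elif defined(_WIN32)
  #if defined(MYLIB_BUILDING)
    #define MYLIB_API __declspec(dllexport)    // compiling the DLL
  #else
    #define MYLIB_API __declspec(dllimport)    // consuming the DLL
  #endif
#elif defined(__GNUC__)
  // gcc/clang: "default" serves for both export and import; build the
  // library with -fvisibility=hidden so unmarked symbols stay hidden.
  #define MYLIB_API __attribute__((visibility("default")))
#else
  #define MYLIB_API
#endif

// mylib.h (hypothetical): the public entry point is marked.
MYLIB_API int mylib_version();

// mylib.cpp (hypothetical): the helper carries no macro, so under
// -fvisibility=hidden it is usable inside the library but not exported.
int mylib_internal_helper() { return 2; }

MYLIB_API int mylib_version() { return mylib_internal_helper(); }
```

Every public header then has to include mylib_export.h, and the build system has to define MYLIB_BUILDING or MYLIB_STATIC consistently.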
An example of how to do it (hopefully correctly) can be seen here:
https://github.com/calccrypto/uint256_t
This repository used to have 2 headers and 2 implementation files, the user of the built library would see 2 headers. There are now 7 headers and 2 implementation files and the user of the built library will see 5 header files (3 with extension include to indicate they're not to be directly included).
So after that long winded explanation, the question is: will the final C++ modules specification help to solve problems with export and import of symbols for dynamic linkage? Can we expect to be able to develop for shared libraries without polluting our code with vendor specific extensions and macros?

Modules don't help you with symbol visibility across DLL boundaries. We can check this with a quick experiment.
// A.ixx
export module A;
export int f() { return 1; }
Here we have a simple module interface file that exports one symbol, f, from the interface of module A. The module name happens to match the file's base name, but this isn't necessary. Let's compile it like so:
cl /c /std:c++20 /interface /TP A.ixx
The /c flag skips invoking the linker, which otherwise happens automatically. /std:c++20 or later is required for the module syntax to work. The /interface flag tells the compiler that we are compiling a module interface unit. The /TP flag says "treat the source input as C++" and is needed when /interface is specified. Finally, we have our input file.
Running the above produces an interface file A.ifc and an object file A.obj. Note that there is no import lib or .exp file, which you would expect if you were building a DLL.
Next, let's write a file that consumes this.
// main.cpp
import A;
int main() { return f(); }
To compile this into an executable, we can use the following command:
cl /std:c++20 main.cpp A.obj
The A.obj input there is not optional: without it, we get the classic linker error of f being an unresolved symbol. Running this command produces a main.exe that statically links the code in A.obj.
What happens if we try to compile A.ixx into a DLL, that is, produce a DLL with the linker from A.obj? The answer: you get a DLL, but no import lib or .exp file. If you run link /noentry /dll A.obj /out:A.dll, you get an A.dll containing the expected code (visible via dumpbin /disasm), but no export table.
Dump of file A.dll
File Type: DLL
0000000180001000: B8 01 00 00 00 mov eax,1
0000000180001005: C3 ret
Summary
1000 .rdata
1000 .text
That's the disassembly we expect in A.dll, but checking the exports with dumpbin /exports A.dll reveals nothing. The reason, of course, is that we didn't export the symbol!
If we change the source code of A.ixx to the following:
// A.ixx
export module A;
export __declspec(dllexport) int f() { return 1; }
... we can repeat the steps (compile A.obj, link A.dll) and find that this time the linker produces an import lib and .exp file, as we'd expect. Running dumpbin /exports A.lib on the generated import lib should show the ?f@@YAHXZ symbol.
Now we can link main.cpp against A.lib (as opposed to A.obj) via cl /std:c++20 main.cpp A.lib to produce a valid executable. This time it relies on A.dll for the code instead of having f statically embedded.
We can check in WinDbg that this is in fact happening as expected. The module pane lists A.dll. In the disassembly view, however, we are about to call main!f. Uh oh, not good. The call does properly resolve into the A module, but only via an extra indirection through the import address table.
This is the classic problem that happens when you forget to decorate a function or other symbol with __declspec(dllimport). When the compiler meets a symbol declared without dllimport, it emits an ordinary call with a relocation entry that is expected to be resolved at link time. Because the symbol actually lives in a DLL, that call lands on a jmp stub, which reads the real address from the import address table. I won't go further into this classic problem here. The upshot is an extra, unnecessary indirection, because the compiler assumed the symbol exported from module A would be statically linked.
It turns out, we can't fix this easily. If we try to add another declaration of f to main.cpp or some other translation unit, the linker will complain that it sees f with "inconsistent dll linkage." The only way to resolve this is to compile a second version of the A module interface with dllimport decorations (much like how headers typically have macros that expand to dllexport or dllimport depending on the TU using the header).
The moral of the story is that DLL linkage and module linkage, while not completely at odds, aren't particularly compatible either. Exporting a symbol from a module does not put it in the DLL's export table, which is what resolves symbols across DLL boundaries. Furthermore, even once the symbols are in the export table, you are still left with an extra indirection through the import address table after the implicit dynamic link is done.

Related

C++ - Visual Studio tries to link against symbols with @[num], but compiles symbols without that suffix

I'm trying to link against ZLib, which my solution builds with the same configuration type as my project (Debug|Win32). When I build my main project, I get these unresolved symbols:
__imp__compress@16
__imp__compressBound@4
__imp__uncompress@16
If I had to guess, I would say that the @[num] is the number of bytes of the function arguments, as that seems to line up nicely. I have tried to find the technical term for that suffix, hoping to learn how to force the compiler to export with it, but with no luck.
Inspecting the symbols of my zlibd.lib, I can see all three of those symbols, except they don't have the suffix at all, nor do any other symbols in the lib.
6.zlibd.dll __imp__compress
6.zlibd.dll _compress
7.zlibd.dll __imp__compress2
7.zlibd.dll _compress2
8.zlibd.dll __imp__compressBound
8.zlibd.dll _compressBound
9.zlibd.dll __imp__crc32
9.zlibd.dll _crc32
Am I missing an option in Visual Studio to export them in that manner?
Also, I know for sure that the linker can see my zlibd.lib, as it shows up in every search and even says at the end that it is unused.
...
Searching C:\<path-to-lib>\zlibd.lib:
...
Unused libraries:
...
C:\<path-to-lib>\zlibd.lib
If I inspect kernel32.lib, for example, I see the @ in the symbols:
10.KERNEL32.dll __imp__AddConsoleAliasA@12
11.KERNEL32.dll _AddConsoleAliasW@12
11.KERNEL32.dll __imp__AddConsoleAliasW@12
12.KERNEL32.dll _AddDllDirectory@4
12.KERNEL32.dll __imp__AddDllDirectory@4
The @16 and @4 in the unresolved symbols mean that the __stdcall calling convention is used when you import those symbols. The missing @ in zlibd.lib means that the __cdecl calling convention was used when building zlibd.dll. You must use the same calling convention when exporting and importing a function.
Since you have not provided any debugging details (build settings, a minimal example), I can only guess:
You use different compiler settings for the two projects: you maybe use /Gz in the app and /Gd in the dll. Use the same compiler setting.
You do not define the macro ZLIB_WINAPI in the app. Add the macro definition -DZLIB_WINAPI=1.
You define the macro ZLIB_WINAPI in the app. Remove the macro definition.
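A macro that selects the calling convention, used by both the DLL and every client, keeps the two sides consistent. zlib's own headers use ZLIB_WINAPI for this purpose. The sketch below only mirrors the idea and does not reproduce zlib's code. MYLIB_WINAPI, MYLIB_CALL and mylib_sum4 are hypothetical names.

```cpp
// Hypothetical switch in the spirit of ZLIB_WINAPI: both the DLL build and
// every client must see the same definition of MYLIB_CALL.
#if defined(_WIN32) && defined(MYLIB_WINAPI)
  // On 32-bit x86, __stdcall decorates C names as _name@<argument bytes>,
  // e.g. _mylib_sum4@16 for four 4-byte arguments (ignored on x64).
  #define MYLIB_CALL __stdcall
#else
  // Compiler default (__cdecl on 32-bit MSVC): decorated simply as _name.
  #define MYLIB_CALL
#endif

// Header declaration: C linkage, so only the calling convention affects
// the decorated name, not C++ mangling.
extern "C" int MYLIB_CALL mylib_sum4(int a, int b, int c, int d);

// Library definition: must use the same convention as the declaration.
extern "C" int MYLIB_CALL mylib_sum4(int a, int b, int c, int d) {
    return a + b + c + d;
}
```

If one side defines MYLIB_WINAPI and the other does not, the client asks for _mylib_sum4@16 while the library provides _mylib_sum4. That is the same mismatch shown in the question.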

How do you merge multiple static linked libraries into a single dll given each static lib defines exported functionality (vc++ 2008)?

My layout has a single DLL project and multiple subprojects that are linked in statically (in the DLL project). Some of the symbols in the subprojects (.lib) refuse to be exported in the final DLL, despite being marked as __declspec(dllexport).
Generating a .def file and marking the symbols explicitly for export could solve this problem. However, identifying which symbols are marked as __declspec(dllexport) proves to be a problem. There are a large number of exported classes and functions, and above all the names are mangled, so maintaining the list by hand is unaffordable. Generating the list of symbols that were marked for export is the only viable option.
Is there a utility or compiler directive that could do this?
Use a DEF file.
Always use a DEF file.
Never fail to use a DEF file.
Just accept that a DEF file is the thing to use.
Stop using __declspec(dllexport), and use a dang-dratted def file already.
Also don't export classes. Export those class members which need to be exported only. And use a DEF file to do it.
Seriously, if you export classes without a DEF file, the function names will be several times longer than the actual program data. You should use ordinals for exporting C++ member functions.
After a bit of trial and error, I found that the lib /def command can be used to generate an import library and export file. The export file appears to contain all symbols that are marked with __declspec(dllexport). The .exp file can then be inspected with dumpbin and used as a reference to generate a module definition file.
Starting with Visual Studio 2015 Update 2, there is a new way of doing this: the linker option /WHOLEARCHIVE.
The documentation describes it as follows:
The /WHOLEARCHIVE option forces the linker to include every object file from either a specified static library, or if no library is specified, from all static libraries specified to the LINK command. To specify the /WHOLEARCHIVE option for multiple libraries, you can use more than one /WHOLEARCHIVE switch on the linker command line. By default, the linker includes object files in the linked output only if they export symbols referenced by other object files in the executable. The /WHOLEARCHIVE option makes the linker treat all object files archived in a static library as if they were specified individually on the linker command line.

why do I need to link a lib file to my project?

I am creating a project that uses a DLL. To build my project, I need to include a header file and a lib file. Why do I need the lib file? Shouldn't the header file declare all the needed information, and then any needed library/DLL be loaded at runtime?
In many other languages, the equivalent of the header file is all you need. But the common C linkers on Windows have always used import libraries, C++ linkers followed suit, and it's probably too late to change.
As a thought experiment, one could imagine syntax like this:
__declspec(dllimport, "kernel32") void __stdcall Sleep(DWORD dwMilliseconds);
Armed with that information the compiler/linker tool chain could do the rest.
As a further example, in Delphi one would import this function, using implicit linking, like so:
procedure Sleep(dwMilliseconds: DWORD); stdcall; external 'kernel32';
which just goes to show that import libraries are not, a priori, essential for linking to DLLs.
That is a so-called "import library" that contains minimal wiring that will later (at load time) ask the operating system to load the DLL.
DLLs are a Windows (MS/Intel) thing. The (generated) lib contains the code needed to call into the DLL and it exposes 'normal' functions to the rest of your App.
No, the header file isn't necessarily enough. The header file can contain just the declarations of the functions, classes and other things you need, not their implementations.
There is a world of difference between this code:
int Multiply(int x, int y);
and this code:
int Multiply(int x, int y)
{
    return x * y;
}
The first is a declaration, and the second is a definition or implementation. Usually the first is put in header files, and the second in .CPP files (if you are creating libraries). Suppose you included a header with the first and didn't link in anything. How would your application know how Multiply is implemented?
Now, if you are using header files whose code is ALL inlined, you do not need to link anything. But if even one function is NOT inlined and has its implementation in a .CPP file that is compiled into a .lib file, then you need to link in the .lib file.
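Here is a minimal sketch of the header-only case just described. It assumes a hypothetical multiply.h that carries the definition itself, marked inline so that many translation units can include it without multiple-definition errors at link time.

```cpp
// multiply.h (hypothetical header-only library): the definition lives in
// the header and is marked inline, so every .cpp that includes it can use
// Multiply without linking against any .lib file.
#ifndef MULTIPLY_H
#define MULTIPLY_H

inline int Multiply(int x, int y)
{
    return x * y;
}

#endif  // MULTIPLY_H
```

The trade-off is that every client recompiles the function body. Changing the implementation therefore means rebuilding the clients rather than just replacing a library.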
[EDIT]
With your use of Import Libraries, you are telling the linker to NOT include the implementation details of the imported code into your binary. Instead the OS will then load the import DLL at run-time into your process. This will make your application smaller, but you have to ship another DLL with it. If the implementation of the library changes, you can just reship another DLL to your customers, and not have to reship the entire application.
There is another option where you can just link in a library and you don't need to ship another DLL. That option is where the Linker will include the implementation into your application, making it bigger in size. If you have to change the implementation details in the imported library, then you have to recompile and relink your entire application, and reship the entire thing to your customers.
There are two relevant phases in the building process here:
compilation: from the source code to an object file. During compilation, the compiler needs to know what external things are available, and for that it needs declarations. Declarations designed to be used in several compilation units are grouped in headers. So you need the headers for the library.
linking: For static libraries, you need the compiled version of the library. For dynamic libraries, in Unix you need the library, in windows, you need the "import library".
You could think that a library could also embed the declarations, or that the header could name the library that needs to be linked. The first is often done in other languages. The second is sometimes available through pragmas in C and C++, but there is no standard way to do it. It would also conflict with common usage, such as choosing one library among several that provide code variants for the same declarations (for instance debug/release or single-threaded/multithreaded). And neither choice fits well with the simple compilation model of C and C++, which has its roots in the '60s.
The header file is consumed by the compiler. It contains all the forward declarations of functions, classes and global variables that will be used. It may also contain some inline function definitions as well.
These are used by the compiler to give it the bare minimum information that it needs to compile your code. It will not contain the implementation details.
However, you still need to link in all the function and variable definitions you have told the compiler about. Failure to do so results in a linker error. Often these definitions are contained in other object files, which may be joined into a single static library.
In the case of DLLs (or .so files), we still need to tell the linker where in the DLL or shared object the missing symbols are. On Windows, this information is contained in a .lib file (the import library), which provides what is needed to load and link the code at runtime.
On Unix, the roles of the DLL and the lib file are combined in a single .so file, which you must link against to avoid linker errors.
You can still use a dll without a .lib file but you will then have to load and link in all the symbols manually using operating system APIs.
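Here is a minimal sketch of that manual route, assuming a small cross-platform wrapper. open_library, find_symbol and close_library are hypothetical helpers around LoadLibraryA/GetProcAddress/FreeLibrary on Windows and dlopen/dlsym/dlclose on POSIX.

```cpp
#include <string>

#if defined(_WIN32)
  #include <windows.h>
  using LibHandle = HMODULE;
#else
  #include <dlfcn.h>   // POSIX; older glibc needs -ldl at link time
  using LibHandle = void*;
#endif

// Opens a library by file name; returns nullptr on failure.
// POSIX branch only: an empty name opens the running program itself.
LibHandle open_library(const std::string& name) {
#if defined(_WIN32)
    return LoadLibraryA(name.c_str());
#else
    return dlopen(name.empty() ? nullptr : name.c_str(), RTLD_NOW);
#endif
}

// Looks up an exported symbol by name; returns nullptr if it is missing.
void* find_symbol(LibHandle lib, const char* symbol) {
#if defined(_WIN32)
    return reinterpret_cast<void*>(GetProcAddress(lib, symbol));
#else
    return dlsym(lib, symbol);
#endif
}

// Releases the handle obtained from open_library.
void close_library(LibHandle lib) {
#if defined(_WIN32)
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}
```

The caller must cast the returned pointer to the function's exact type, for example reinterpret_cast<int (*)(int)>(p), before calling it. A mismatch in signature or calling convention is undefined behaviour that no linker will catch, because no linker is involved.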
From 1000 ft: the lib contains the list of functions that the DLL exports and the addresses needed for the calls.

Can a Visual Studio produced static library, be stripped of symbols?

I'll divide this question into 3 parts:
1. I would like to produce a static library and strip off its symbols (debug info is already not included), similar to the strip command in Linux. Can it be done?
2. Is there a tool in the Windows environment equivalent to the nm tool in Linux?
3. When creating a static library using VS2008, is it possible to define a script that will exclude some of the produced .obj files from the build and from the static lib? Can it be dynamic? I mean, I'd define a compilation mode in the script, and this would result in specific object files being excluded from the build.
If anything is visible that you feel should not be, try declaring it with the "static" keyword. This tells the compiler that it is accessible only within the current translation unit (source file).
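Here is a minimal sketch of that advice with hypothetical function names. Both namespace-scope static and an unnamed namespace give a helper internal linkage, so it never becomes an external symbol that another object file or library client could link against. The name may still appear in the .obj as a local symbol entry.

```cpp
// Internal linkage via 'static' at namespace scope.
static int helper_static(int x) {
    return x * 2;
}

// Internal linkage via an unnamed namespace (the idiomatic C++ spelling).
namespace {
int helper_unnamed(int x) {
    return x + 1;
}
}  // namespace

// The only function here with external linkage, i.e. the only one the
// linker can resolve from other translation units.
int public_entry(int x) {
    return helper_static(helper_unnamed(x));
}
```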
There are cases where it would be convenient to be able to strip out all but a small number of "exported" public symbols, but it's not really feasible.
A static library is little more than a collection of .obj files. The internal dependencies haven't been resolved yet, and they won't be resolved until link time.
For example, if your .lib consists of foo.obj and bar.obj, and there's a call in foo.obj to a function defined in bar.obj, then that symbol must be available at link time, even if nothing outside of the library should be able to see it.
For that reason, you cannot strip the symbols (with the possible exception of file-scope static symbols). Even class methods that are protected or private (in the C++-sense) will exist in the symbol table, since the enforcement of the visibility is a compile-time issue, not a link-time one.
In contrast, a dynamic library is a standalone binary that has already been linked. References from foo.obj to bar.obj have already been resolved. Thus a DLL can be stripped of symbols except for the ones that must be exported (and even those can be renamed or replaced by ordinals).
If your DLL exposes a simple C API, then you're all set. But if you want to expose a C++ class, you're probably going to end up exporting all of its methods, even the protected and private ones (since inlining in the external application might result in direct calls to private methods).
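Here is a minimal sketch of the "simple C API" route, assuming a hypothetical Counter class and counter_* functions. The C++ class stays internal, and only a few flat extern "C" functions taking an opaque handle would need exporting, so no private or protected members leak into the export table. In a real DLL, these functions would also carry the export decoration or be listed in a DEF file.

```cpp
#include <new>

// Internal C++ class: never exported, free to change layout.
class Counter {
public:
    void add(int n) { total_ += n; }
    int total() const { return total_; }
private:
    int total_ = 0;
};

extern "C" {

// Opaque handle: clients only see a pointer to an incomplete type.
typedef struct counter_handle counter_handle;

counter_handle* counter_create() {
    // Returns nullptr instead of throwing across the C boundary.
    return reinterpret_cast<counter_handle*>(new (std::nothrow) Counter());
}

void counter_add(counter_handle* h, int n) {
    reinterpret_cast<Counter*>(h)->add(n);
}

int counter_total(const counter_handle* h) {
    return reinterpret_cast<const Counter*>(h)->total();
}

void counter_destroy(counter_handle* h) {
    delete reinterpret_cast<Counter*>(h);  // deleting nullptr is a no-op
}

}  // extern "C"
```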
No. How do you think the users of the static library would link to it without knowing where the symbols they use are defined?
Yes, try the DUMPBIN utility.
Well, yes. You can run the LIB utility with /REMOVE:foo.
That said, I think you are doing something that either is not worth doing or could be done a lot simpler than with removing library members.
I kept finding the names of certain (but not all) static functions in .obj files produced by VS2010. Interestingly, they were visible in my Release .obj files but not the Debug .obj files. I just used cygwin strings to perform the search:
$ strings myObjectFile.obj | grep myStaticFunctionName
I tracked it down to the "Whole Program Optimization = Yes" setting ("/GL"). When I switched this to "No" the function names no longer appear.
Update: As a followup test I opened the "cleansed" myObjectFile.obj in vim and I can still find them (with either :set encoding=utf-8 or :set encoding=latin1). I'm not sure why strings was missing the matches. Oh well.

Trouble compiling dll that accesses another dll

So, I have an interesting issue. I am working with a proprietary set of DLLs that I obviously don't have the source for. The goal is to write an intermediate DLL that groups together a large series of function calls from the proprietary DLLs. The problem I am having, when compiling with g++, is that I get errors for the original DLLs along the lines of:
cannot export libname_NULL_THUNK_DATA. Symbol not found.
If I add a main and just compile to an executable everything works as expected. I'm using mingw for compilation. Thanks for any help.
In response to the first reply: Either I'm confused about what you're saying or I didn't word my question very well. I'm not explicitly trying to export anything from my wrapper I am just calling functions from their dlls. The problem is that I get errors that it can't export these specific symbols from the dll to my wrapper. The issue is that I'm not even entirely sure what these _NULL_THUNK_DATA symbols are for. I did a search and read somewhere that they shouldn't be exported because they're internal symbols that windows uses. I have tried using the --exclude-symbols directive to the linker but it didn't seem to do anything. I apologize if I'm completely misunderstanding what you're trying to say.
So, I think my issue was related to this. When just compiling a standard executable that uses a dll I was able to include the headers and directly call the functions for example:
#include "3rdparty.h"
int main() {
    dostuff(); // a function in 3rdparty.dll
}
this would compile and run fine. I just needed to link the libraries in the g++ command.
When linking with the -shared flag I would get these errors (with main removed of course). I think it has something to do with the fact that by default g++ attempts to import all symbols from the dll. What I didn't understand is why this happens in the dll vs in an executable. I will try doing it using GetProcAddress(). Thank you!
It should be as easy as you think it should be.
For example, your DLL code needs:
void doStuff()
{
    login();    // functions declared in the third-party header
    dostuff();
    logoff();
}
So far, so good: you've included the right headers, if you have them. If you don't, you need to load the library using LoadLibrary(), obtain a function pointer to each exported DLL entry point using GetProcAddress(), and then call through that function pointer.
You then link with the 3rd party lib and that's it. Occasionally you will have to wrap the definitions with 'extern "C"' in order to get the linkage name mangling correct.
As you say you're using g++, you can't be getting confused with __declspec(dllimport) which is a MS VC extension.
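Here is a minimal sketch of the extern "C" wrapping mentioned above, assuming a hypothetical thirdparty_api.h with tp_login/tp_logoff. Inside the guard, C++ compilers give the declarations C linkage, so the referenced names are not C++-mangled and can match the plain C exports of the library. The stand-in definitions exist only so the sketch links on its own. In reality they would come from the third-party DLL via its import library.

```cpp
// thirdparty_api.h (hypothetical): usable from both C and C++.
#ifndef THIRDPARTY_API_H
#define THIRDPARTY_API_H

#ifdef __cplusplus
extern "C" {
#endif

int tp_login(const char* user);
void tp_logoff(void);

#ifdef __cplusplus
}  // extern "C"
#endif

#endif  // THIRDPARTY_API_H

// Stand-in definitions for this sketch only (normally in the 3rd-party DLL).
extern "C" int tp_login(const char* user) { return user != nullptr ? 1 : 0; }
extern "C" void tp_logoff(void) {}

// Wrapper that groups the third-party calls, as in the answer above.
int do_stuff(const char* user) {
    if (!tp_login(user)) {
        return -1;  // login failed
    }
    tp_logoff();
    return 0;
}
```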
"Compiling" tells me that you're approaching this from the wrong end. Your DLL should not export its own wrapper functions, but directly refer to exports from other DLLs.
E.g. in a Windows Kernel32.DEF file, the following forward exists:
EXPORTS
...
HeapAlloc = NTDLL.RtlAllocHeap
There's no code for the HeapAlloc function.