I'm trying to hook GCC's __cxa_throw so I'll be able to receive exception backtraces without cluttering exception objects themselves. While it worked well with executable, it doesn't work with shared objects.
I'm constructing tiny static library which contains hook itself and redirects it to my function in a separate SO. And my __cxa_throw appears nicely as exported symbol in .text section. Though it doesn't help, even with LD's -Bsymbolix, -Bsymbolic-functions.
So the question is. Can I force LD to find my existing implementation of __cxa_throw and satisfy external symbol with it?
Thanks
First and foremost, your hook function must be compiled into object file and linked into resulting SO as object file. Seems that symbols from object files have higher priority than even default imports like libstd. This should already satisfy all symbol references inside whole SO, including archives being linked in.
Second, if you want your hook to not be visible from outside world, just use __attribute__((visibility("hidden")))
Related
Background
First of all, I think this question goes beyond the C++ standard. The standard deals with multiple translation units (instantiation units) and thus multiple object modules, but does not seem to acknowledge the possibility of having multiple independently compiled and linked binary modules (i.e., .so files on Linux and .dll files on Windows). After all, the latter more of less enters into the world of application binary interface (ABI) that the standard leaves to implementations to consider at present.
When only a single binary module is involved, the following code snippet illustrates an elegant and portable (standard-compliant) solution to make singletons.
inline T& get() {
static T var{};
return var;
}
There are two things to note about this solution. First, the inline specifier makes the function a candidate to be included in multiple translation units, which is very convenient. Note that, the standard guarantees there is only a single instance of get() and the local static variable var in the final binary module (see here).
The second thing to note is that since C++11, initialization of static local variables is properly synchronized (see the Static local variables section here). So, concurrent invocations of get() is fine.
Now, I try to extend this solution to the case when multiple binary modules are involved. I find the following variant works with VC++ on Windows.
// dllexport is used in building the library module, and
// dllimport is used in using the library in an application module.
// Usually controlled by a macro switch.
__declspec(dllexport/dllimport) inline T& get() {
static T var{};
return var;
}
Note for non-Windows users: __declspec(dllexport) specifies that an entity (i.e., a function, a class or an object) is implemented (defined) in this module and is to be referenced by other modules. __declspec(dllimport), on the other hand, specifies that an entity is not implemented in this module and is to be found in some other module.
Since VC++ supports exporting and importing template instantiations (see here), the above solution can even be templated. For example:
template <typename T> inline
T& get() {
static T var{};
return var;
}
// EXTERN is defined to be empty in building the library module, and
// to `extern` in using the library module in an application module.
// Again, this is usually controlled by a macro switch.
EXTERN template __declspec(dllexport/dllimport) int& get<int>();
As a side note, the inline specifier is not mandatory here. See this S.O. question.
The Question
Since there is no __declspec(dllexport/import) equivalents in GCC and clang, is there a way to make a variant of the above solution that works on these two compilers?
Also, in Boost.Log, I noticed the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT macro (see the Global logger objects section here). It is claimed to create singletons even if the application consists of multiple modules. If someone knows about the inner workings of this macro, explanations are welcome here.
Finally, if you know about any better solutions for making singletons, feel free to post it as an answer.
Since there is no __declspec(dllexport/import) equivalents in GCC and clang, is there a way to make a variant of the above solution that works on these two compilers?
First, this is not as much a compiler-related question but rather an underlying operating system one. GCC (and supposedly clang) do support __declspec(dllexport/import) on Windows and basically do the same as what MSVC does with the functions and object marked this way. Basically, the marked symbol is placed in a table of exported symbols from the dll (an export table). This table can be used, for instance, when you query for a symbol in a dll in run time (see GetProcAddress).
Along with the dll there comes an associated lib file that contains auxiliary data for linking your application with the dll. When you link your application with the library, the linker uses the lib file to resolve references to the dll symbols and compose the import table in your application binary. When the application starts, the OS (or rather the runtime loader component of the OS) uses the import table to find out what dlls your application depends on and what symbols it imports from those dlls. It then uses export tables in the dlls to resolve addresses of the referenced symbols in the dlls and complete the linking process.
The important side effect of this process is that only imported symbols are dynamically resolved and every symbol you dynamically link to is associated with a particular dll. You can have same-named symbols in multiple dlls and the application itself, and these symbols will refer to distinct entities as long as they are not exported. If they are exported, linking process will fail because of ambiguity. This makes process-wide singletons difficult on Windows. This also breaks some C/C++ language rules, because taking address of an object or function with external linkage (in the language terms) can produce different addresses in different parts of the program. On the other hand, the dlls are more self-contained and depend on the loading context in a lesser degree.
Things are significantly different on Linux and other POSIX-like OSs. When linked, for each shared object (which can be an so library or the application executable) a table of symbols is compiled. It lists both the symbols this shared object implements and the symbols it is missing. Additionally, the linker may embed into the shared object a list of other shared objects (optionally, with search paths) that could be used to resolve the missing symbols. The runtime loader includes a linker which loads the shared objects sequentially and constructs a global table of symbols comprising symbols from all shared objects. As that table is constructed, the duplicate symbols from multiple shared objects are resolved to a single implementation (as all implementations are considered equivalent, the first shared object in the load list that implements the symbol is used). Any missing symbols are also resolved as the subsequent shared objects in the link order are loaded.
The effect of this process is that each symbol with external linkage resolve to a single implementation in one of the shared objects, even if multiple shared objects implement it. This is more in line with the C/C++ language rules and makes it simpler to implement process-wide singletons. A simple function-local static variable, not marked in any special way, is enough.
Now, there are ways to influence the linking process, and in particular there are ways to limit the symbols that are exported from a shared object. The most common ways to do that are using symbol visibility and linker scripts. With these tools it is possible to achieve linking behavior very close to Windows, with all its pros and cons. Note that when you limit symbol visibility you do have to mark the symbols you intend to export from the shared object with the visibility attribute or pragma. There's no need to mark symbols for import though.
Also, in Boost.Log, I noticed the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT macro (see the Global logger objects section here). It is claimed to create singletons even if the application consists of multiple modules. If someone knows about the inner workings of this macro, explanations are welcome here.
Boost.Log requires to be built as a shared library when it is used from a multi-module application. This makes it possible for it to have a process-wide storage of references to global loggers declared throughout the application (the storage is implemented within the Boost.Log dll/so). When you obtain a logger declared with the BOOST_LOG_INLINE_GLOBAL_LOGGER_DEFAULT or similar macro, the storage is first looked up for the reference to the logger. If it is not found, the logger is created and a reference to it is stored back to the internal storage. Otherwise the existing reference is used. Along with reference caching, this provides performance very close to a function-local static variable.
Finally, if you know about any better solutions for making singletons, feel free to post it as an answer.
Although this is not really an answer, you should generally avoid singletons. They are difficult to implement correctly and in a way that does not hamper performance. If you really do have to implement one then the solution similar to Boost.Log looks generic enough. Note however that with this solution it is generally not known which module created (and as such, 'owns') the singleton, so you can't unload any modules dynamically. There may be simpler ways that are case-specific, like exporting a function returning a reference to the local static object. If you want portability and support non-default symbol visibility by default, always explicitly export your symbols.
GCC has the ability to make a symbol link weakly via __attribute__((weak)). I want to use the a weak symbol in a static library that users can override in their application. A GCC style weak symbol would let me do that, but I don't know if it can be done with visual studio.
Does Visual Studio offer a similar feature?
You can do it, here is an example in C:
/*
* pWeakValue MUST be an extern const variable, which will be aliased to
* pDefaultWeakValue if no real user definition is present, thanks to the
* alternatename directive.
*/
extern const char * pWeakValue;
extern const char * pDefaultWeakValue = NULL;
#pragma comment(linker, "/alternatename:_pWeakValue=_pDefaultWeakValue")
MSVC++ has __declspec(selectany) which covers part of the functionality of weak symbols: it allows you to define multiple identical symbols with external linkage, directing the compiler to choose any one of several available. However, I don't think MSVC++ has anything that would cover the other part of weak symbol functionality: the possibility to provide "replaceable" definitions in a library.
This, BTW, makes one wonder how the support for standard replaceable ::operator new and ::operator delete functions works in MSVC++.
MSVC used to behave such that if a symbol is defined in a .obj file and a .lib it would use the one on the .obj file without warning. I recall that it would also handle the situation where the symbol is defined in multiple libs it would use the one in the library named first in the list.
I can't say I've tried this in a while, but I'd be surprised if they changed this behavior (especially that .obj defined symbols override symbols in .lib files).
The only way i know. Place each symbol in a separate library. User objects with overrides also must be combined to library. Then link all together to an application. User library must be specified as input file, your lib's must be transfered to linker using /DEFAULTLIB: option.
From here:
... if a needed symbol can be satisfied without consulting a library,
then the OBJ in the library will not be used. This lets you override a
symbol in a library by explicitly placing it an OBJ. You can also
override a symbol in a library to putting it in another library that
gets searched ahead of the one you want to override. But you can’t
override a symbol in an explicit OBJ, because those are part of the
initial conditions.
This behavior results from the algorithm adopted by the linker.
So for short, to override a function,
With GCC, you must use the __attribute__((weak)). You cannot rely on the input order of object files into the linker to decide which function implementation is used.
With VS toolchain, you can rely on the order of the object/lib files and you must place your implementation of the function before the LIB that implements the original function. And you can put your implementation as an OBJ or LIB.
There isn't an MS-VC equivalent to this attribute. See http://connect.microsoft.com/VisualStudio/feedback/details/505028/add-weak-function-references-for-visual-c-c. I'm going to suggest something horrible: reading the purpose of it here: http://www.kolpackov.net/pipermail/notes/2004-March/000006.html it is essentially to define functions that, if their symbols exist, are used, otherwise, are not, so...
Why not use pre-processor for this purpose, with the huge caveat of "if you need to do this at all"? (I'm not a fan of recommending pre-processor).
Example:
#ifdef USE_MY_FUNCTION
extern void function();
#endif
then call appropriately in the application logic, surrounded by #ifdef statements. If your static library is linked in, as part of the linking in process, tweak the defines to define USE_MY_FUNCTION.
Not quite a direct equivalent and very ugly but it's the best I can think of.
I have an argument with another developer, I'd like to settle here over Dynamic Link vs. Static Link.
In Theory:
Say you have a library with 100 functions, each has significant amounts of code inside it:
int A()
int B()
int C()
..
..and so on...
And your application only calls or depends on one of them.
You have two methods at your disposal.
Build the library as a dynamic linked library
Build the library as a statically linked library
My colleague claims that linking the static library to our application, the compiler/linker will not add the code of the 99 unused functions into our executable. I claim it will. I claim in this scenario the only advantage is having a single executable and not having to distribute the library with our application, but it will not have significant size differences if we used a dynamically linked library approach.
Who is correct?
It can depend on a combination of how the code is organized, and what compiler flags you use.
Following the classic, simple model of things, the linker would link in whatever object files in the library were needed to satisfy the symbol references, so if your A(), B() and C() were each defined in different object files, only the object file that contained the symbol you actually used would be linked into the program (unless it, in turn, depended upon one or more of the others, in which case, the linker would find object files to satisfy those references as well, recursively, until it either satisfied them all, or found one it couldn't satisfy (at which time you'd get the standard "Unresolved external XXX" error message).
More recently, most compilers can "package" functions into separate "modules" without your having to put them into separate source files to create separate object files. Details vary, but can reduce (or eliminate) the necessity for having each source file as tiny as possible just to keep what ends up in the final executable to a minimum.
So, bottom line: at least for the most part, he's right and you're wrong.
It depends :-)
If you put each function in its own source file, or use the /Gy compile option, each function will be packaged in a separate section of the static library.
The linker will then be able to pick them up as needed, and only include the functions that are actually called.
GCC has the ability to make a symbol link weakly via __attribute__((weak)). I want to use the a weak symbol in a static library that users can override in their application. A GCC style weak symbol would let me do that, but I don't know if it can be done with visual studio.
Does Visual Studio offer a similar feature?
You can do it, here is an example in C:
/*
* pWeakValue MUST be an extern const variable, which will be aliased to
* pDefaultWeakValue if no real user definition is present, thanks to the
* alternatename directive.
*/
extern const char * pWeakValue;
extern const char * pDefaultWeakValue = NULL;
#pragma comment(linker, "/alternatename:_pWeakValue=_pDefaultWeakValue")
MSVC++ has __declspec(selectany) which covers part of the functionality of weak symbols: it allows you to define multiple identical symbols with external linkage, directing the compiler to choose any one of several available. However, I don't think MSVC++ has anything that would cover the other part of weak symbol functionality: the possibility to provide "replaceable" definitions in a library.
This, BTW, makes one wonder how the support for standard replaceable ::operator new and ::operator delete functions works in MSVC++.
MSVC used to behave such that if a symbol is defined in a .obj file and a .lib it would use the one on the .obj file without warning. I recall that it would also handle the situation where the symbol is defined in multiple libs it would use the one in the library named first in the list.
I can't say I've tried this in a while, but I'd be surprised if they changed this behavior (especially that .obj defined symbols override symbols in .lib files).
The only way i know. Place each symbol in a separate library. User objects with overrides also must be combined to library. Then link all together to an application. User library must be specified as input file, your lib's must be transfered to linker using /DEFAULTLIB: option.
From here:
... if a needed symbol can be satisfied without consulting a library,
then the OBJ in the library will not be used. This lets you override a
symbol in a library by explicitly placing it an OBJ. You can also
override a symbol in a library to putting it in another library that
gets searched ahead of the one you want to override. But you can’t
override a symbol in an explicit OBJ, because those are part of the
initial conditions.
This behavior results from the algorithm adopted by the linker.
So for short, to override a function,
With GCC, you must use the __attribute__((weak)). You cannot rely on the input order of object files into the linker to decide which function implementation is used.
With VS toolchain, you can rely on the order of the object/lib files and you must place your implementation of the function before the LIB that implements the original function. And you can put your implementation as an OBJ or LIB.
There isn't an MS-VC equivalent to this attribute. See http://connect.microsoft.com/VisualStudio/feedback/details/505028/add-weak-function-references-for-visual-c-c. I'm going to suggest something horrible: reading the purpose of it here: http://www.kolpackov.net/pipermail/notes/2004-March/000006.html it is essentially to define functions that, if their symbols exist, are used, otherwise, are not, so...
Why not use pre-processor for this purpose, with the huge caveat of "if you need to do this at all"? (I'm not a fan of recommending pre-processor).
Example:
#ifdef USE_MY_FUNCTION
extern void function();
#endif
then call appropriately in the application logic, surrounded by #ifdef statements. If your static library is linked in, as part of the linking in process, tweak the defines to define USE_MY_FUNCTION.
Not quite a direct equivalent and very ugly but it's the best I can think of.
Can someone describe what a symbol table is within the context of C and C++?
There are two common and related meaning of symbol tables here.
First, there's the symbol table in your object files. Usually, a C or C++ compiler compiles a single source file into an object file with a .obj or .o extension. This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand. If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.
Second, there's also the symbol table in a shared library or DLL. This is produced by the linker and serves to name all the functions and data items that are visible to users of the library. This allows the system to do run-time linking, resolving open references to those names to the location where the library is loaded in memory.
If you want to learn more, I suggest John Levine's excellent book "Linkers and Loaders".link text
Briefly, it is the mapping of the name you assign a variable to its address in memory, including metadata like type, scope, and size. It is used by the compiler.
That's in general, not just C[++]*. Technically, it doesn't always include direct memory address. It depends on what language, platform, etc. the compiler is targeting.
In Linux, you can use command:
nm [object file]
to list the symbol table of that object file. From this printout, you may then decipher the in-use linker symbols from their mangled names.
The symbol table is the list of "symbols" in a program/unit. Symbols are most often the names of variables or functions. The symbol table can be used to determine where in memory variables or functions will be located.
Check out the Symbol Table wikipedia entry.
Symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc.
From the "Computer Systems A Programmer’s Perspective" book, Ch 7 Linking. "Symbols and Symbol Tables":
Symbol table is information about functions and global variables that
are defined and referenced in the program
And important note (form the same chapter):
It is important to realize that local linker symbols are not the same
as local program variables. The symbol table does not contain any
symbols that correspond to local nonstatic program variables. These
are managed at run time on the stack and are not of interest to the
linker