What causes a TU to be included in compilation? - c++

I am a little out of my depth here, I'll be honest.
I am doing some rather curious experimentation, having pre-main functions register my classes in a factory, through anonymous namespaces. Until recently, adding the following to a class definition (.cpp) would do the trick.
namespace { int x = Register<classType>(className); }
This would be wrapped in a macro and 'Register' would pass the type and name off to my factory.
This worked fine and every class that included this macro was getting registered, until I went to move the code into a static library. Now, since the classes are only referenced by the factory, it looks like they're being omitted from the build - my 'Register' functions are no longer being called and so my factory is empty.
I have managed to fix this by moving all my macros into the constructor of a manager object, but I noticed that as soon as I referenced them there, the macros in the .cpp files started getting called again. I guessed because now the classes are actually being referenced by something.
However, I don't really want to do it this way and I can't seem to find a non-committal way of referencing the classes in the constructor (e.g class ClassToRegister;) which includes them in the build, so that my register macros will get called.
Firstly, does this make sense?
Secondly, any advice on how I can force these TUs to compile so that the anonymous namespaces 'kick in' at runtime?

It appears this is part and parcel of using static libraries; unused code will not make it through without linker voodoo.

A static library is a bunch of object files you give the linker saying "hey, find here what I didn't define elsewhere". So if a given object file in the library doesn't fill a dependency, you won't be able to include it in your program without relying on other features of the linker (for instance some linkers have a way to include all the object files of static libraries instead of only the one who fill dependencies).

Most likely you are a victim of aggressive optimisation, but with a reason : since you do not use objects in a nameless namespace, the compiler removes them.
You could try to get around like this :
namespace foo
{
namespace{
MACRO_TO_DEFINE_VARIABLE( MyClass ); // define a variable named registrationObj
};
MyClass::MyClass()
{
(void)registrationObj;
}
}

Related

Can you weak link(optional link) a static library in C++ like Objc could?

I am developing a static library A with a mix of Objective C and C++ in Xcode, I run into the need to "weak link" another static library and Let's call it B. A calls some methods defined in B. The idea is that if B is not linked/provided, A will not throw any undefined symbols error.
I know that in Objective C you can do that and in code I can rely on run time methods such as NSClassFromString or [someObject Class] to check if a certain function is present/available from another static library, but I don't know if I can achieve that in one of my .cpp source file. Please advise and thank you!
I created a very simple sample project for illustration purpose:
Library A:
Core_ObjC.h, this is the header that will be exposed
#import <Foundation/Foundation.h>
#interface Core_Objc : NSObject
-(int) calculate;
#end
Core_ObjC.mm
#import "Core_ObjC.h"
#include "Core_CPP.h"
#implementation Core_Objc
-(int) calculate{
return calculate_Core();//Call into cpp here
}
#end
Core_CPP.cpp
#include "Core_CPP.h"
#include "NonCore_CPP.h"
int calculate_Core(){
return calculate_NonCore();//Call into another cpp here but it's defined in Library B
}
Library B:
NonCore_CPP.cpp
#include "NonCore_CPP.h"
int calculate_NonCore(){
return 100;
}
If I link both libraries in a sample app, the app will compile fine. However, when I link only A from the sample app, I will encounter error like:
Undefined symbols for architecture arm64:
"calculate_NonCore()", referenced from:
calculate_Core() in CoreFramework(Core_CPP.o)
The error does make sense to me because B will have the missing definition, but I am just looking for a solution that the compilation won't complain when there is only A.
So, weak linking works at the C function level as well. For details, see Apple's documentation but basically, the symbol needs to be declared using __attribute__((weak_import)), as follows
extern int MyFunction() __attribute__((weak_import));
extern int MyVariable __attribute__((weak_import));
and then you can check if its address is zero to check if it was found during link time, using e.g. if (MyFunction != NULL) or if (&MyVariable != NULL).
Your title mentions static libraries. Note that static libraries are just collections of object files, so you either link the library or you don't; there's no weak linking a static library, because by definition, it's present at build time. Or not. If you want runtime behaviour, use a dynamic library. (dylib)
The above should actually cover the example you have given: you'll just need to provide alternative code for when calculate_NonCore is unavailable, as you can't call it without crashing the program. (Makes sense, it's missing)
You mention C++ too, although your code doesn't make any obvious use of it. C++ makes this a little more complicated because of its mangled names; additionally, classes themselves don't have any linkage, so you can't test for presence of a class. The compiler seems to allow applying the weak_import attribute to member functions, static member variables, etc. but I'm not sure to what extent the NULL checks will work. Free functions should certainly not be an issue - treat them like C functions - and I'd guess that static members are probably a good candidate because you can easily take their address.
I'd probably try to avoid calling into any code that makes use of any part of a class that might not be there, and set things up such that you can simply check for the presence of a "canary" static member variable or function, which, if present, you can conclude that the whole class is present.

Linker removing static initialiser [duplicate]

I am working on a factory that will have types added to them, however, if the class is not explicitly instiated in the .exe that is exectured (compile-time), then the type is not added to the factory. This is due to the fact that the static call is some how not being made. Does anyone have any suggestions on how to fix this? Below is five very small files that I am putting into a lib, then an .exe will call this lib. If there is any suggestions on how I can get this to work, or maybe a better design pattern, please let me know. Here is basically what I am looking for
1) A factory that can take in types
2) Auto registration to go in the classes .cpp file, any and all registration code should go in the class .cpp (for the example below, RandomClass.cpp) and no other files.
BaseClass.h : http://codepad.org/zGRZvIZf
RandomClass.h : http://codepad.org/rqIZ1atp
RandomClass.cpp : http://codepad.org/WqnQDWQd
TemplateFactory.h : http://codepad.org/94YfusgC
TemplateFactory.cpp : http://codepad.org/Hc2tSfzZ
When you are linking with a static library, you are in fact extracting from it the object files which provide symbols which are currently used but not defined. In the pattern that you are using, there is probably no undefined symbols provided by the object file which contains the static variable which triggers registration.
Solutions:
use explicit registration
have somehow an undefined symbol provided by the compilation unit
use the linker arguments to add your static variables as a undefined symbols
something useful, but this is often not natural
a dummy one, well it is not natural if it is provided by the main program, as a linker argument it main be easier than using the mangled name of the static variable
use a linker argument stating that all the objects of a library have to be included
dynamic libraries are fully imported, thus don't have that problem
As a general rule of thumb, an application do not include static or global variables from a library unless they are implicitly or explicitly used by the application.
There are hundred different ways this can be refactored. One method could be to place the static variable inside function, and make sure the function is called.
To expand on one of #AProgrammer's excellent suggestions, here is a portable way to guarantee the calling program will reference at least one symbol from the library.
In the library code declare a global function that returns an int.
int make_sure_compilation_unit_referenced() { return 0; }
Then in the header for the library declare a static variable that is initialized by calling the global function:
extern int make_sure_compilation_unit_referenced();
static int never_actually_used = make_sure_compilation_unit_referenced();
Every compilation unit that includes the header will have a static variable that needs to be initialized by calling a (useless) function in the library.
This is made a little cleaner if your library has its own namespace encapsulating both of the definitions, then there's less chance of name collisions between the bogus function in your library with other libraries, or of the static variable with other variables in the compilation unit(s) that include the header.

Different C++ Class Declarations

I'm trying to use a third party C++ library that isn't using namespaces and is causing symbol conflicts. The conflicting symbols are for classes my code isn't utilizing, so I was considering creating custom header files for the third party library where the class declarations only include the public members my code is using, leaving out any members that use the conflicting classes. Basically creating an interface.
I have three questions:
If the compilation to .obj files works, will this technique still cause symbol conflicts when I get to linking?
If that isn't a problem, will the varying class declarations cause problems when linking? For example, does the linker verify that the declaration of a class used by each .obj file has the same number of members?
If neither of those are a problem and I'm able to link the .obj files, will it cause problems when invoking methods? I don't know exactly how C++ works under the hood, but if it uses indexes to point to class methods, and those indexes were different from one .obj file to another, I'm guessing this approach would blow up at runtime.
In theory, you need identical declarations for this to work.
In practice, you will definitely need to make sure your declarations contain:
All the methods you use
All the virtual methods, used or not.
All the data members
You need all these in the right order of declaration too.
You might get away with faking the data members, but would need to make sure you put in stubs that had the same size.
If you do not do all this, you will not get the same object layout and even if a link works it will fail badly and quickly at run-time.
If you do this, it still seems risky to me and as a worst case may appear to work but have odd run time failures.
"if it uses indexes ": To some extent exactly how virtual functions work is implementation defined, but typically it does use an index into a virtual function table.
What you might be able to do is to:
Take the original headers
Keep the full declarations for the classes you use
Stub out the classes and declarations you do not use but are referenced by the ones you do.
Remove all the types not referenced at all.
For explanatory purposes a simplified explaination follows.
c++ allows you to use functions you declare. what you do is putting multiple definitions to a single declaration across multiple translation units. if you expose the class declaration in a header file your compiler sees this in each translation unit, that includes the header file.
Therefore your own class functions have to be defined exactly as they have been declared (same function names same arguments).
if the function is not called you are allowed not to define it, because the compiler doesn't know whether it might be defined in another translation unit.
Compilation causes label creation for each defined function(symbol) in the object code. On the other hand a unresolved label is created for each symbol that is referenced to (a call site, a variable use).
So if you follow this rules you should get to the point where your code compiles but fails to link. The linker is the tool that maps defined symbols from each translation-unit to symbol references.
If the object files that are linked together have multiple definitions to the same functions the linker is unable to create an exact match and therefore fails to link.
In practice you most likely want to provide a library and enjoy using your own classes without bothering what your user might define. In spite of the programmer taking extra care to put things into a namespace two users might still choose the same name for a namespace. This will lead to link failures, because the compiler exposed the symbols and is supposed to link them.
gcc has added an attribute to explicitly mark symbols, that should not be exposed to the linker. (called attribute hidden (see this SO question))
This makes it possible to have multiple definitions of a class with the same name.
In order for this to work across compilation units, you have to make sure class declarations are not exposed in an interface header as it could cause multiple unmatching declarations.
I recommend using a wrapper to encapsulate the third party library.
Wrapper.h
#ifndef WRAPPER_H_
#define WRAPPER_H_
#include <memory>
class third_party;
class Wrapper
{
public:
void wrappedFunction();
Wrapper();
private:
// A better choice would be a unique_ptr but g++ and clang++ failed to
// compile due to "incomplete type" which is the whole point
std::shared_ptr<third_party> wrapped;
};
#endif
Wrapper.cpp
#include "Wrapper.h"
#include <third_party.h>
void Wrapper::wrappedFunction()
{
wrapped->command();
}
Wrapper::Wrapper():wrapped{std::make_shared<third_party>()}
{
}
The reason why a unique_ptr doesn't work is explained here: std::unique_ptr with an incomplete type won't compile
You can move the entire library into a namespace by using a clever trick to do with imports. All the import directive does is copy the relevant code into the current "translation unit" (a fancy name for the current code). You can take advantage of this as so
I've borrowed heavily from another answer by user JohnB which was later deleted by him.
// my_thirdparty.h
namespace ThirdParty {
#include "thirdparty.h"
//... Include all the headers here that you need to use for thirdparty.
}
// my_thirdparty.cpp / .cc
namespace ThirdParty {
#include "thirdparty.cpp"
//... Put all .cpp files in here that are currently in your project
}
Finally, remove all the .cpp files in the third party library from your project. Only compile my_thirdparty.cpp.
Warning: If you include many library files from the single my_thirdparty.cpp this might introduce compiler issues due to interaction between the individual .cpp files. Things such as include namespace or bad define / include directives can cause this. Either resolve or create multiple my_thirdparty.cpp files, splitting the library between them.

Static variable initialization over a library

I am working on a factory that will have types added to them, however, if the class is not explicitly instiated in the .exe that is exectured (compile-time), then the type is not added to the factory. This is due to the fact that the static call is some how not being made. Does anyone have any suggestions on how to fix this? Below is five very small files that I am putting into a lib, then an .exe will call this lib. If there is any suggestions on how I can get this to work, or maybe a better design pattern, please let me know. Here is basically what I am looking for
1) A factory that can take in types
2) Auto registration to go in the classes .cpp file, any and all registration code should go in the class .cpp (for the example below, RandomClass.cpp) and no other files.
BaseClass.h : http://codepad.org/zGRZvIZf
RandomClass.h : http://codepad.org/rqIZ1atp
RandomClass.cpp : http://codepad.org/WqnQDWQd
TemplateFactory.h : http://codepad.org/94YfusgC
TemplateFactory.cpp : http://codepad.org/Hc2tSfzZ
When you are linking with a static library, you are in fact extracting from it the object files which provide symbols which are currently used but not defined. In the pattern that you are using, there is probably no undefined symbols provided by the object file which contains the static variable which triggers registration.
Solutions:
use explicit registration
have somehow an undefined symbol provided by the compilation unit
use the linker arguments to add your static variables as a undefined symbols
something useful, but this is often not natural
a dummy one, well it is not natural if it is provided by the main program, as a linker argument it main be easier than using the mangled name of the static variable
use a linker argument stating that all the objects of a library have to be included
dynamic libraries are fully imported, thus don't have that problem
As a general rule of thumb, an application do not include static or global variables from a library unless they are implicitly or explicitly used by the application.
There are hundred different ways this can be refactored. One method could be to place the static variable inside function, and make sure the function is called.
To expand on one of #AProgrammer's excellent suggestions, here is a portable way to guarantee the calling program will reference at least one symbol from the library.
In the library code declare a global function that returns an int.
int make_sure_compilation_unit_referenced() { return 0; }
Then in the header for the library declare a static variable that is initialized by calling the global function:
extern int make_sure_compilation_unit_referenced();
static int never_actually_used = make_sure_compilation_unit_referenced();
Every compilation unit that includes the header will have a static variable that needs to be initialized by calling a (useless) function in the library.
This is made a little cleaner if your library has its own namespace encapsulating both of the definitions, then there's less chance of name collisions between the bogus function in your library with other libraries, or of the static variable with other variables in the compilation unit(s) that include the header.

How do I prevent my 'unused' global variables being compiled out of my static library?

I'm using static initialisation to ease the process of registering some classes with a factory in C++. Unfortunately, I think the compiler is optimising out the 'unused' objects which are meant to do the useful work in their constructors. Is there any way to tell the compiler not to optimise out a global variable?
class SomeClass {
public:
SomeClass() {
/* do something useful */
}
};
SomeClass instance;
My breakpoint in SomeClass's constructor doesn't get hit. In my actual code, SomeClass is in a header file and instance is in a source file, more or less alone.
EDIT: As guessed by KJAWolf, this code is actually compiled into a static lib, not the executable. Its purpose is to register some types also provided by the static lib with a static list of types and their creators, for a factory to then read from on construction. Since these types are provided with the lib, adding this code to the executable is undesirable.
Also I discovered that by moving the code to another source file that contains other existing code, it works fine. It seems that having a file purely consisting of these global objects is what's causing the problem. It's as if that translation unit was entirely ignored.
The compiler is not allowed to optimiza away global objects.
Even if they are never used.
Somthing else is happening in your code.
Now if you built a static library with your global object and that global object is not referenced from the executable it will not be pulled into the executable by the linker.
The compiler should never optimize away such globals - if it does that, it is simply broken.
You can force that one object (your list of types) pulls some other objects with it by partially linking them before building the complete static lib.
With GNU linker:
ld -Ur -o TypeBundle.o type1.o type2.o type3.o static_list.o
ld -static -o MyStaticLib.a type_bundle.o other_object.o another_object.o ...
Thus, whenever the static list is referenced by code using the library, the complete "TypeBundle.o" object will get linked into the resulting binary, including type1.o, type2.o, and type3.o.
While at it, do check out the meaning of "-Ur" in the manual.
To build off of Arthur Ulfeldt, volatile tells the compiler that this variable can change outside of the knowledge of the compiler. I've used it for put a statement to allow the debugger to set a breakpoint. It's also useful for hardware registers that can change based on the environment or that need a special sequence. i.e. Serial Port receive register and certain watchdog registers.
you could use
#pragma optimize off
int globalVar
#pragma optimize on
but I dunno if that only works in Visual Studio ( http://msdn.microsoft.com/en-us/library/chh3fb0k(VS.80).aspx ).
You could also tell the compiler to not optimize at all, especially if you're debugging...
Are you using gcc with gdb? There was a problem in the past where gdb could not accurately set breakpoints in constructors.
Also, are you using an optimization level which allows the compiler to inline methods in the class definition.
You need to use -whole-archive when linking. See the answer here:
ld linker question: the --whole-archive option
I have same setup & problem on VS2008.
I found that if you declare you class with dllexport it will not optimize.
class __declspec( dllexport ) Cxxx
{
.
}
However this generates a lot of warnings in my case because I must declare all classes used in this class also as dllexport.
All optimizations are off (in debug mode), still this is optimized. Also volatile/pragma optimize off. On global variable created of this class (in same cpp file) etc does not work.
Just found that dllexport does require at least to include header files of these classes in some other cpp file from exe to work! So only option is to add a file with calls to some static members for each class, and add this file to all projects used these classes.
It would not be a compiler, but the library linker (or the tool that pools object files into the .lib), who can decide that the whole file is not used, so discard it.
A workaround would be to add an empty function to that file and call it from a file that contains stuff that is called directly.
How about using the keyword volatile? It will prevent the compiler from too much optimization.