I'm trying to use a third party C++ library that isn't using namespaces and is causing symbol conflicts. The conflicting symbols are for classes my code isn't utilizing, so I was considering creating custom header files for the third party library where the class declarations only include the public members my code is using, leaving out any members that use the conflicting classes. Basically creating an interface.
I have three questions:
If the compilation to .obj files works, will this technique still cause symbol conflicts when I get to linking?
If that isn't a problem, will the varying class declarations cause problems when linking? For example, does the linker verify that the declaration of a class used by each .obj file has the same number of members?
If neither of those are a problem and I'm able to link the .obj files, will it cause problems when invoking methods? I don't know exactly how C++ works under the hood, but if it uses indexes to point to class methods, and those indexes were different from one .obj file to another, I'm guessing this approach would blow up at runtime.
In theory, you need identical declarations for this to work.
In practice, you will definitely need to make sure your declarations contain:
All the methods you use
All the virtual methods, used or not.
All the data members
You need all these in the right order of declaration too.
You might get away with faking the data members, but would need to make sure you put in stubs that had the same size.
If you do not do all this, you will not get the same object layout and even if a link works it will fail badly and quickly at run-time.
Even if you do all this, it still seems risky to me; as a worst case it may appear to work but produce odd run-time failures.
"if it uses indexes ": To some extent exactly how virtual functions work is implementation defined, but typically it does use an index into a virtual function table.
What you might be able to do is to:
Take the original headers
Keep the full declarations for the classes you use
Stub out the classes and declarations you do not use but are referenced by the ones you do.
Remove all the types not referenced at all.
For explanatory purposes, a simplified explanation follows.
C++ lets you call any function you have declared; the danger is ending up with multiple, differing definitions for a single declaration across translation units. If you expose the class declaration in a header file, your compiler sees it in each translation unit that includes that header.
Therefore your own class functions have to be defined exactly as they were declared (same function names, same arguments).
If a function is never called, you are allowed to leave it undefined, because the compiler doesn't know whether it might be defined in another translation unit.
Compilation creates a label in the object code for each defined function (symbol). Conversely, an unresolved label is created for each symbol that is merely referenced (a call site, a use of a variable).
So if you follow these rules, you should get to the point where your code compiles but fails to link. The linker is the tool that matches the symbols defined in each translation unit to the symbol references.
If the object files being linked together contain multiple definitions of the same function, the linker cannot make an exact match and therefore fails to link.
In practice you most likely want to provide a library and let its users enjoy your classes without worrying about what they might define themselves. Even when a programmer takes extra care to put things into a namespace, two users might still choose the same name for a namespace. This will lead to link failures, because the compiler exposed the symbols and expects the linker to resolve them.
gcc has an attribute to explicitly mark symbols that should not be exposed to the linker, called the hidden visibility attribute (see this SO question).
This makes it possible to have multiple definitions of a class with the same name.
In order for this to work across compilation units, you have to make sure class declarations are not exposed in an interface header as it could cause multiple unmatching declarations.
I recommend using a wrapper to encapsulate the third party library.
Wrapper.h
#ifndef WRAPPER_H_
#define WRAPPER_H_
#include <memory>
class third_party;
class Wrapper
{
public:
    Wrapper();
    void wrappedFunction();
private:
    // A better choice would be a unique_ptr but g++ and clang++ failed to
    // compile due to "incomplete type" which is the whole point
    std::shared_ptr<third_party> wrapped;
};
#endif
Wrapper.cpp
#include "Wrapper.h"
#include <third_party.h>
void Wrapper::wrappedFunction()
{
    wrapped->command();
}

Wrapper::Wrapper() : wrapped{std::make_shared<third_party>()}
{
}
The reason why a unique_ptr doesn't work is explained here: std::unique_ptr with an incomplete type won't compile
You can move the entire library into a namespace by using a clever trick with #include. All the #include directive does is copy the relevant code into the current "translation unit" (a fancy name for the current code). You can take advantage of this as follows.
I've borrowed heavily from another answer by user JohnB which was later deleted by him.
// my_thirdparty.h
namespace ThirdParty {
#include "thirdparty.h"
//... Include all the headers here that you need to use for thirdparty.
}
// my_thirdparty.cpp / .cc
namespace ThirdParty {
#include "thirdparty.cpp"
//... Put all .cpp files in here that are currently in your project
}
Finally, remove all the .cpp files in the third party library from your project. Only compile my_thirdparty.cpp.
Warning: If you include many library files from the single my_thirdparty.cpp, this might introduce compiler issues due to interactions between the individual .cpp files. Things such as clashing using-directives or bad #define / #include directives can cause this. Either resolve these or create multiple my_thirdparty.cpp files, splitting the library between them.
Imagine I've got the following files:
simulate.h:
#ifndef SIMULATE_H
#define SIMULATE_H
#include "my_data_type.h"
MyDataType Simulate ();
#endif
simulate.cpp:
#include "simulate.h"
// include lots of other things
// define lots of functions and new classes to solve sub-problems
// finally we define the "Simulate" function, which is the **only** thing we want to export.
Now, imagine that we have lots of header/cpp files pairs like above (with a tonne of functions/data types that aren't required outside of the cpp files).
Am I right in thinking that this creates unnecessary overhead for both the compiler and the linker?
As I understand it, the compiler can't know what won't be used by other object files, so this would create more bloated .o files and thereby slow down the linker, is that right?
I know that modules solve a lot of these problems in C++20, but is there some standard way around it in C++17?
I can think of one way that would seem to communicate to the compiler that the introduced functions/data types are not going to be reused: wrap them up in a class, put everything in the private section and expose only one method to the public. However, this is super hacky and ugly.
As I understand it, the compiler can't know what won't be used by other object files, so this would create more bloated .obj files and thereby slow down the linker, is that right?
Broadly speaking... no.
The code has to be in the .o files, because the code you do "export" uses it. Any code transitively used by the executable has to get compiled and stored in some object file. So it's going to have to be somewhere, and the linker will have to read it and store it in the executable.
As for whether it slows down linkers, they're pretty good about finding the specific code they're looking for. I mean yes, some linkers may load the whole file into memory, so a bigger .o will take longer to load. But again, your "exported" function uses that code, so it still matters.
As Jarod42 pointed out in the comments, we can use unnamed/anonymous namespaces, whose contents are unreachable from other translation units (N3797: 7.3.1.1):
An unnamed-namespace-definition behaves as if it were replaced
by
inline /*opt*/ namespace unique { /* empty body */ }
using namespace unique;
namespace unique { namespace-body }
where inline appears if and only if it appears in the
unnamed-namespace-definition, all occurrences of unique in a
translation unit are replaced by the same identifier, and this
identifier differs from all other identifiers in the entire program.
So we can use this to tell the compiler what's local to our translation unit:
namespace // Sub-problem stuff:
{
    // ... classes
    // ... functions
}
// Export stuff:
// ... classes
// ... functions
This way the compiler is only bound to the correct behaviour of the "exports", so can do things like remove some of the sub-problem functions that have been inlined.
In OOP, you want to break apart a program into multiple classes.
In C#, you would do as such:
namespace a
{
    public class ClassA
    {
        // Methods... that are defined and have code in them.
    }
}
and to use that class, you just write "using a;".
Say I want to create a class in C++, and define them, and put code in them.
class ClassA
{
public:
    void methodA();
};

void ClassA::methodA()
{
    //implementation.
}
To access this implementation, you would just use #include "ClassA.h". I fully understand that, but then do you have to implement that code again? That seems counterproductive, as I like to spread my project over a lot of classes.
So what would be the proper procedure to implement ClassA without re-implementing all its methods again?
You don't have to reimplement them in each CPP file, the C++ linker takes care of making sure the definitions get matched together.
All you need is:
A header:
#ifndef FOO_H
#define FOO_H
class Foo{
public:
    void junk(); // declarations for junk go here
};
#endif
A cpp:
#include "foo.h"
//implementations for junk goes here
void Foo::junk(){
}
And then you can include foo.h. Each cpp will be compiled to a .o file. Then, those .o files are handed to the linker, which can figure out where definitions are and piece together the code correctly.
C and C++ have a different way of doing things. As long as you have a declaration for a class, method, or external variable the compiler will happily compile and leave off the actual definition of the methods, classes, etc, for link time. This is simplifying things a lot, but basically the compiler will leave a hint to the linker in the object file, saying that the linker needs to insert the address of the method here.
So you just need to include the "ClassA.h" file and you can compile fine.
Because of this you see some different behavior in C and C++ than you would in C#. For example, in C or C++ it's perfectly fine to have two different items (methods, variables, etc) that are named the same in different files as long as neither one is visible outside the file. Whereas in C# you would have to use different namespaces or different names. Note - not that I'm saying this is good practice, it's just possible.
The .h header files contain the class specification. The corresponding .cpp files contain the implementation and are compiled to .o files. During development, you would include .h files to access the APIs provided by the class. During compilation/linking stage, you would include the .o files also along with your source files to form the final binary. You don't need to implement anything again, w.r.t to the class you are using.
We have attempted to reduce code duplication through the use of TEST_GROUP_BASE to create a shared base class. When we attempt to use this TEST_GROUP_BASE in more than one test class, we get linker warnings complaining about 'getwchar' and 'putwchar': inconsistent dll linkage, and errors reporting multiple definitions of both these functions and a number of other char/wchar pairs (e.g. strchr/wcschr, strpbrk/wcspbrk). If I only include one test file that makes use of the TEST_GROUP_BASE macro, the linker errors don't appear.
The base class is defined as a TEST_BASE in a .h file with all the member functions inlined. This .h file is then included in the derived test files with the TEST_GROUP_BASE macro used to incorporate the shared TEST_BASE. Have I missed anything?
I've not managed to find any examples of TEST_GROUP_BASE being used so I'm not sure whether I've missed a critical piece of configuration. We are testing legacy C code, but all references to the production code are made within extern "C" braces, since our simple tests pass that would suggest that the c/c++ is linking OK.
Can anyone suggest any possible causes, or point me in the direction of any opensource examples of how TEST_GROUP_BASE is being used elsewhere?
The development environment is VS2010.
I'm not quite sure why there are errors on putwchar and getwchar; that is probably unrelated to TEST_BASE and TEST_GROUP_BASE but likely relates to them being inline and the header file being included with different linkage. Without a code example, it is hard to figure out where the different linkage comes from, especially as you mentioned that it works with only one TEST_GROUP_BASE.
Probably the best way to resolve this problem, though, is to not put all the TEST_BASE functions inline in the header file. The TEST_BASE macro is actually a very simple replacement for "struct testBaseClass : public Utest". So a TEST_BASE is simply any class that is sub-classed from Utest. That means you can simply put the implementation in a cpp file.
One of the reasons why you can't find much usage of TEST_GROUP_BASE is that many people (including me) recommend against using it. It is often more flexible to put the parts that you want to re-use in a separate class and use (rather than derive from) that class in your TEST_GROUP. This allows for many smaller "fixture" classes that can be re-used across different tests.
Hope this helps.
I am a little out of my depth here, I'll be honest.
I am doing some rather curious experimentation, having pre-main functions register my classes in a factory, through anonymous namespaces. Until recently, adding the following to a class definition (.cpp) would do the trick.
namespace { int x = Register<classType>(className); }
This would be wrapped in a macro and 'Register' would pass the type and name off to my factory.
This worked fine and every class that included this macro was getting registered, until I went to move the code into a static library. Now, since the classes are only referenced by the factory, it looks like they're being omitted from the build - my 'Register' functions are no longer being called and so my factory is empty.
I have managed to fix this by moving all my macros into the constructor of a manager object, but I noticed that as soon as I referenced them there, the macros in the .cpp files started getting called again. I guessed because now the classes are actually being referenced by something.
However, I don't really want to do it this way and I can't seem to find a non-committal way of referencing the classes in the constructor (e.g class ClassToRegister;) which includes them in the build, so that my register macros will get called.
Firstly, does this make sense?
Secondly, any advice on how I can force these TUs to compile so that the anonymous namespaces 'kick in' at runtime?
It appears this is part and parcel of using static libraries; unused code will not make it through without linker voodoo.
A static library is a bunch of object files you give the linker, saying "hey, find here what I didn't define elsewhere". So if a given object file in the library doesn't fill a dependency, it won't be included in your program without relying on other features of the linker (for instance, some linkers have a way to include all the object files of a static library instead of only the ones that fill dependencies).
Most likely you are a victim of aggressive optimisation, but with a reason: since you do not use the objects in the nameless namespace, the compiler removes them.
You could try to get around like this :
namespace foo
{
    namespace
    {
        MACRO_TO_DEFINE_VARIABLE( MyClass ); // define a variable named registrationObj
    }

    MyClass::MyClass()
    {
        (void)registrationObj;
    }
}
I am compiling C++ on VS 2005.
When and why should I use #include, and when and why should I use a forward declaration such as "class XXXX;"?
What is the benefit of each option, and which one is preferred?
I would also be glad for a good tutorial on compiling.
Always prefer a forward declaration whenever possible. Changes to the referred-to class's header will then not trigger recompilation of the .cpp files that only use the forward declaration. This reduces the dependencies a bit.
In each place where you effectively use the class XXXX, you will have to include its header. If you derive from class XXXX, you will also have to include the header.
A header file is used to contain the declarations of entities that are defined in separate compilation units. If you did not have a header file, you'd have to enter such declarations in every compilation unit (which is essentially what #include does for you: it inserts the contained text at that point in the file). Without a header you'd have to repeat the declarations by hand, which is both error prone and difficult to maintain when the code changes.
You'd use a declaration directly in the .cpp file for example if the symbol being defined is only ever used within that compilation unit and therefore did not need global visibility. In the case of data declarations, you also typically declare them static to give them scope limited to the compilation unit.