The effect of the number of functions in a C++ library

Suppose two C++ libraries are available: one library has exactly the functions that will be needed by the program (a C++ application that will invoke the library), and the other has the necessary functions plus additional functions that the program will not use. We assume that the functions common to both libraries are implemented identically. My question is: when the program uses the library to perform a certain task, what effect does the choice of library have on the program's performance?
The reason I ask is that when developing a C++ library I often write additional functions that may never be invoked by the library's users but are important for debugging. When the library is finished, I have two choices: keep these auxiliary functions, or remove them (or use another strategy to keep them, for example a macro that compiles them out). If keeping these auxiliary functions will not hurt performance, I would like to keep them.

Everything else being the same, there will be no performance difference.
In addition, if the library is a static library, the linker will not include the functions that are not used, and the executables will have the same size.
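To make that concrete, here is a minimal sketch (file and function names are hypothetical). Traditional static linking resolves references at object-file granularity, so if each function lives in its own translation unit (or the library is built with -ffunction-sections and linked with -Wl,--gc-sections on GNU toolchains), unreferenced code never reaches the executable:

// used.cpp -- compiled into libfoo.a (hypothetical)
int used() { return 1; }

// unused.cpp -- also in libfoo.a, in its own object file
int unused() { return 2; }

// main.cpp -- the application
int used();                    // declaration, e.g. from foo.h
int main() { return used(); }  // only used.o is pulled from the
                               // archive; unused.o never ends up
                               // in the final executable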

Well, if you have written a static library, which I guess you have, then the only difference it will create is that the static library's code becomes part of your executable whether you use it or not.
I don't think it will hurt you in terms of speed, but it can occupy more space, since a copy of the library's code is placed into your executable.

Related

What can be changed in .so library without breaking compatibility

For example, can I add a new function to the header file without needing to recompile all programs using that library?
You can add functions and objects to a shared library without breaking existing programs that rely on that library. Under some circumstances you could increase the size of objects (in particular, arrays) in the library.
You can also replace the implementation of a function, provided that the function signature does not change. This will not cause any problems with dynamic linking, but if the new implementation's behavior does not meet existing programs' expectations then you'll see program misbehavior.
You can remove functions and objects that no program links to. If you're concerned only with existing programs then you may be able to catalog what functions and objects those are, but otherwise you can only base such an evaluation on the visibility of the functions / objects within the shared library -- externally-visible functions and objects cannot safely be removed.
There may be other implementation-specific details of a shared library that can be changed without breaking compatibility as well.
Note, however, that none of that has anything directly to do with header files. Compatibility of shared libraries is primarily a run time consideration. Header files are relevant only at compile time.
Another point is that you have to be very careful of any shared structures. If a function in your library accepts or returns a structure or a pointer to a structure, and if you make any changes to that structure (adding, removing, or rearranging members), you're likely to introduce incompatibility.
(Strictly speaking, changes like this do count as changes to the function's signature, as mentioned by others.)
If you're very, very careful, you can arrange to add new members at the end of a structure, but it generally requires explicit cooperation by callers, using mechanisms defined in advance (that is, adhered to since version 0, by all calling code, before any changes get made).
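One common mechanism of that kind is a size field as the first member, in the spirit of the Win32 cbSize convention. Callers set it to the sizeof the struct they were compiled against, and the library reads only fields the caller actually provided. A minimal sketch (all names hypothetical):

#include <cstddef>

struct Options {
    std::size_t struct_size;  // caller sets this to sizeof(Options)
    int verbosity;            // present since version 0
    // --- member added in a later library version ---
    int timeout_ms;           // read only if struct_size covers it
};

void configure(const Options *opt) {
    int timeout = 1000;       // default for callers built against v0
    if (opt->struct_size >= offsetof(Options, timeout_ms) + sizeof(int))
        timeout = opt->timeout_ms;  // newer callers provide it
    // ... use opt->verbosity and timeout ...
}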

What is the runtime performance cost of including a library? [duplicate]

This question already has answers here:
Will there be a performance hit on including unused header files in C/C++?
Is there any runtime performance difference between including an entire library (with probably hundreds of functions) and then using only a single function like:
#include <foo>

int main(int argc, char *argv[]) {
    bar();  // from library foo
    return 0;
}
versus pasting the relevant code fragment from the library directly into the code, like:
void bar() {
    ...
}

int main(int argc, char *argv[]) {
    bar();  // defined just above
    return 0;
}
What would prevent me from mindlessly including all of my favourite (and most frequently used) libraries at the beginning of my C files? This popular thread, C/C++: Detecting superfluous #includes?, suggests that the compilation time would increase. But would the compiled binary be any different? Would the second program actually outperform the first one?
Related: what does #include <stdio.h> really do in a C program
Edit: the question here is different from the related Will there be a performance hit on including unused header files in C/C++? question, as here a single file is included. I am asking whether including a single file is any different from copy-pasting the actually used code fragments into the source. I have slightly adjusted the title to reflect this difference.
There is no performance difference as far as the final program is concerned. The linker will only link the functions that are actually used into your program. Unused functions present in the library will not get linked.
If you include a lot of libraries, it might take longer to compile the program.
The main reason why you shouldn't include all your "favourite libraries" is program design. Your file shouldn't include anything except the resources it is using, in order to reduce dependencies between files. The less your file knows about the rest of the program, the better. It should be as autonomous as possible.
This is not such a simple question, and so it does not deserve a simple answer. There are a number of things you may need to consider when determining what is more performant.
Your Compiler And Linker: Different compilers will optimize in different ways. This is easily overlooked and can cause issues when making generalisations. For the most part, modern compilers and linkers will optimize the binary to include only what is strictly necessary for execution; however, not all compilers will optimize your binary.
Dynamic Linking: There are two types of linking when using other libraries; they behave in similar ways but are fundamentally different. When you link against a dynamic library, the library remains separate from the program and is only loaded at runtime. Dynamic libraries are usually known as shared libraries and should therefore be treated as if they are used by multiple binaries. Because these libraries are often shared, the linker will not remove any functionality from the library, since it does not know what parts of that library will be needed by all binaries within that system or OS. Because of this, a binary linked against a dynamic library takes a small performance hit, especially immediately after starting the program, and this hit increases with the number of dynamic linkages.
Static Linking: When you link a binary against a static library (with an optimizing linker), the linker 'knows' what functionality you need from that particular library and removes functionality that will not be used from your resulting binary. Because of this the binary becomes more efficient and therefore more performant. This does, however, come at a cost.
e.g.
Say you have an operating system that uses a library extensively throughout a large number of binaries across the entire system. If you were to build that library as a shared library, all binaries would share it, whilst perhaps using different parts of its functionality. Now say you statically link every binary against that library instead. You would end up with an extensively large duplication of binary functionality, as each binary would carry a copy of whatever functionality it needed from that library.
Conclusion: Before asking what will make your program more performant, you should ask what is more performant in your case. If your program is intended to take up the majority of the CPU's time, go for a statically linked library. If your program is only run occasionally, go for a dynamically linked library to reduce disk usage. It is also worth noting that using a header-based library will give you only a very marginal performance gain (if any) over a statically linked binary, and will greatly increase your compilation time.
It depends greatly on the libraries and how they are structured, and possibly on the compiler implementation.
The linker (ld) will only assemble code from the library that is referenced by the code, so if you have two functions a and b in a library, but only reference a, then function b may not be in the final code at all.
Header files (#include), if they only contain declarations, and if those declarations do not result in references to the library, should make no difference: you should see no change between just typing out the parts you need (as per your example) and including the entire header file.
Historically, the linker ld would pull in code from the library file by file, so as long as functions a and b were in different files when the library was created, there would be no implications at all.
However, if the library is not carefully constructed, or if the compiler implementation pulls in every single bit of code from the lib whether needed or not, then you could have performance implications, as your code will be bigger and may be harder to fit into the CPU cache, and the CPU's execution pipeline would occasionally have to wait to fetch instructions from main memory rather than from cache.
It depends heavily on the libraries in question.
They might initialize global state, which would slow down the startup and/or shutdown of the program. Or they might start threads that do something in parallel to your code. If you have multiple threads, this might impact performance too.
Some libraries might even modify existing library functions. Maybe to collect statistics about memory or thread usage or for security auditing purposes.
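As a sketch of the global-state point (names are hypothetical): merely linking such a library can cost startup time, because constructors of its global objects run before main():

#include <map>
#include <string>

// somewhere inside the library:
struct Registry {
    std::map<std::string, int> entries;
    Registry() {
        // imagine expensive setup here: reading configuration,
        // spawning a worker thread, registering handlers...
        entries["default"] = 0;
    }
};

// constructed during static initialization, before main() runs,
// in every program that pulls this object file in
static Registry g_registry;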

init_seg and warning C4073 from library code?

We have a C++ library. It has four static objects that are sensitive to initialization order (two of them are strings from the standard library).
We are using init_seg(lib) to control the order of initialization of C++ static objects in the library. The source file that uses it is compiled into either a dynamic link library or a static library. According to the documentation for init_seg:
... It is particularly important to use the init_seg pragma in dynamic-link libraries (DLLs) or libraries requiring initialization. (emphasis mine)
The Visual Studio solution is organized into four projects: one is the static library, one is the dynamic library, one is the test driver for the static library, and one is the test driver for the dynamic library.
Under Visual Studio, compiling the source file with the init_seg results in warning C4073, with the text initializers put in library initialization area. According to MSDN:
Only third-party library developers should use the library initialization area, which is specified by #pragma init_seg. The following sample generates C4073...
The code using the init_seg is used in the libraries only, and not used with the test drivers. I've verified the static library and dynamic library project settings, and they are clearly calling out library artifacts.
Why am I receiving the C4073 warning?
It's just warning you; not even a wag of the finger, more like "are you sure?". When you use a 3rd-party library that uses this feature, it warned its developer too, unless he turned it off with #pragma warning. You could do the same. Or you could use segment user rather than segment lib: your strings will still be constructed before your application code runs. Segment lib is really intended for things like ... oh, Qt or MFC or a framework like that, which needs to be initialized before any application code runs, including your early-initialization stuff in user.
Here's some more information:
Suppose you have a library for your own app, and it has some stuff that needs to be initialized before any of its code is run, because certain classes exposed by this library are intended to be (or are allowed to be) statically allocated in your application code, and those classes do complicated stuff in their constructors that needs, say, some large bollix of precomputed (but not static) data. For that precomputed stuff, you precompute it in the constructor of a different class, and you statically allocate an instance of that class; the initialization of that instance calls its constructor, which precomputes all that stuff, and you mark that static instance with #pragma init_seg(user). Now, before any of your application code runs, including any constructors of static instances of this library's classes in your code, the library's init_seg(user) code will all run; so by the time your static instances of this library's classes get constructed, the data they need will be there.
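A minimal sketch of that pattern (MSVC-specific pragma; the helper class and its contents are hypothetical):

// precompute.cpp -- a translation unit inside the library
#pragma init_seg(user)  // statics defined below go into the user
                        // initialization segment

#include <array>
#include <cmath>

// helper whose constructor does the expensive precomputation
struct SineTable {
    std::array<double, 360> values;
    SineTable() {
        for (int i = 0; i < 360; ++i)
            values[i] = std::sin(i * 3.14159265358979 / 180.0);
    }
};

// per the discussion above, this instance is constructed before the
// application's static instances of the library's classes, so the
// precomputed data is ready when their constructors need it
SineTable g_sine_table;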
Now consider stuff that really has got to be there early, of which static instances exist that you can call. E.g., std::cout. You can call methods on std::cout in constructors of your own classes that you have static instances of. Obviously, the object std::cout needs to be initialized before any of your code runs. And some of your code might run in stuff you've marked init_seg(user). So Microsoft places all such code - that is, the constructors for std::cout etc. - in init_seg(compiler). That stuff will run before anything else.
So what is init_seg(lib) for? Well, suppose you're a framework like MFC. You expose stuff like the Application object that the user will (is likely to, is expected to) create a static instance of. Some of the code in Application which will be run at static initialization time depends on other stuff in MFC that needs to be initialized. So obviously it needs to be init_segged, but where? compiler is for the compiler and runtimes alone, and your framework stuff (MFC) might be used in the init_seg(user) that the user is allowed to use ... so you get an intermediate level between compiler and user - that's lib.
Now it is fairly rare to need anything like that, because most C++ libraries used by programs are not themselves used by multiple libraries, and thus don't need to make sure they're initialized before all other "ordinary" libraries. MFC does, because you might have bought 3rd-party MFC controls or other libraries from one or more vendors, and those depend on MFC, and those might use MFC things in their constructors, and you might use objects from those libraries statically, so MFC needs to be initialized before those other libraries.
But in most cases, your library isn't going to be a dependency of other C++ libraries that people are using. Thus there isn't any kind of dependency chain where your library needs to be initialized before other libraries. Your libraries might need to be initialized before your user code, yes, but there isn't any order in which they need to be initialized. So init_seg(user) is fine for all of them.
And, Microsoft (and generally, most C++ experts) will tell you: If there is some kind of order dependency in which your separate libraries need to be initialized at the time of static initialization of your app then you're doing it wrong. Seriously. There's a design problem there.
So, to respond to a comment: It isn't a compiler bug. It's a user warning. Fairly benign: nothing's going to go wrong if you ignore it (unlike, say, ignoring a warning about casting from long to int). But if you're using init_seg(lib) you might not really understand what that compiler feature is all about, so they'd like you to think about it. And after you've thought about it, if you still want to do it, go ahead and turn the warning off.

Why are not all Boost libraries header-only?

Why are all libraries in Boost not header-only?
Put differently, what makes the use of .lib/.dll files mandatory?
Is it when a class can't be a template or has static fields?
Different points, I guess.
Binary size. Could header-only put a size burden on the client?
Compilation times. Could header-only mean a significant decrease in compilation performance?
Runtime Performance. Could header-only give superior performance?
Restrictions. Does the design require header-only?
About binary size (and a bit of security).
If there's a lot of reachable code in the Boost library, or code the compiler cannot prove is unreachable by the client, it has to be put into the final binary. (*)
On operating systems that have package management (e.g. RPM- or .deb-based), shared libraries can mean a big decrease in binary distribution size and have a security advantage: security fixes are distributed faster and are then automatically used by all .so/.DLL users. So you have one recompile and one redistribution, but N beneficiaries. With a header-only library, you have N recompiles and N redistributions for every single fix, and some members of those N are huge in themselves already.
(*) reachable here means "potentially executed"
About compilation times.
Some Boost libraries are huge. If you #include it all, then each time you change a bit in your source file, you have to recompile everything you #included.
This can be counter-measured with cherry-picked headers, e.g.
#include <boost/huge-boost-library.hpp> // < BAD
#include <boost/huge-boost-library/just-a-part-of-it.hpp> // < BETTER
but sometimes the stuff you really need to include is already big enough to cripple your recompiles.
The countermeasure is to make it a static or shared library, in turn meaning "compile completely exactly once (until the next boost update)".
About runtime performance.
We are still not in an age where global optimization solves all of our C++ performance problems. To make sure you give the compiler all the information it needs, you can make stuff header-only and let the compiler make inlining decisions.
In that respect, note that inlining does not always give superior performance, because of caching and speculation issues on the CPU.
Note also that this argument applies mostly to Boost libraries that are used frequently; e.g. one could expect boost::shared_ptr<> to be used very often, and thus to be a relevant performance factor.
But consider the real and only relevant reason boost::shared_ptr<> is header-only ...
About restrictions.
Some stuff in C++ cannot be put into compiled libraries, namely templates and enumerations.
But note that this is only halfway true. You can write typesafe, templated interfaces to your real data structures and algorithms, which in turn have their runtime-generic implementation in a library.
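A minimal sketch of that split (names are hypothetical): a header-only, type-safe template forwards to a runtime-generic core that is compiled once into the library:

#include <cstddef>

// declared in the header, defined once in the compiled library:
void generic_sort(void *base, std::size_t count, std::size_t size,
                  bool (*less)(const void *, const void *));

// header-only, type-safe facade over the runtime-generic core:
template <typename T>
void typed_sort(T *data, std::size_t count) {
    generic_sort(data, count, sizeof(T),
                 [](const void *a, const void *b) {
                     return *static_cast<const T *>(a)
                          < *static_cast<const T *>(b);
                 });
}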
Likewise, some stuff in C++ should be put into source files, and in the case of Boost, into compiled libraries. Basically, this is everything that would give "multiple definition" errors, like static member variables or global variables in general.
Some examples can also be found in the standard library: std::cout is defined in the standard as extern ostream cout;, and so cout basically requires the distribution of something (library or sourcefile) that defines it once and only once.
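The same one-definition pattern in miniature (names are hypothetical):

// counter.h -- safe to include in many translation units
extern int g_call_count;  // declaration only; no storage allocated

// counter.cpp -- compiled exactly once into the library
int g_call_count = 0;     // the single definition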

Why do game engines prefer static libraries over dynamic link libraries

I've been reading a few gaming books, and they always prefer to create the engine as a static library rather than a dynamic link library. I am new to C++, so I am not highly knowledgeable when it comes to static libraries and dynamic link libraries. All I know is that static libraries increase the size of your program, whereas DLLs are loaded as you need them within your program.
[edit]
I've played games where it almost seemed they used DLLs to load in sound, lighting, and whatnot, all individually, as the level was loading up, because you don't necessarily need all of that when you're at the game menu.
Dynamic link libraries need to be position independent; this can cause performance inefficiencies on some processor architectures.
Static libraries can be optimized when included in your program, e.g., by stripping dead code. This can improve cache performance.
By position independent, he means that since the game engine and DLL are completely separated, the DLL is stand-alone and cannot be interwoven into the game engine code, whereas statically linking a library allows the compiler to optimize using both your game engine code AND the library code.
For example, say there's a small function that the compiler thinks should be inlined (copied directly in place of a function call). Then with a static library, the compiler would be able to inline this code, since it knows what the code is (you're linking at compile-time). However, with a dynamic library, the compiler would be unable to inline that code, since it does not know what the code is (since it will be linking at run-time).
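A sketch of the difference (names are hypothetical): with static linking plus link-time optimization (e.g. -flto), the call below can be inlined across the library boundary; against a shared library it stays an out-of-line call (on ELF platforms, typically through the PLT):

// math3d.cpp -- compiled into a static libmath3d.a
float dot(const float a[3], const float b[3]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// game.cpp -- linked against the library
float dot(const float a[3], const float b[3]);  // from math3d.h

float speed_squared(const float v[3]) {
    return dot(v, v);  // inlinable with static linking + LTO;
                       // remains a real call into a .so/.dll
}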
Another often-overlooked reason deserves mention: for many games you aren't going to be running much else at the same time, and many libraries used for games aren't used by the other things you might run alongside a game. So you lose little by giving up one of the major benefits of shared libraries, namely that only one copy of (most of) the library needs to be loaded while several programs make use of that one copy. When running a game, you will probably have only one program that wants that library running anyway, because you probably aren't running many other programs (particularly other games or 3D programs) at the same time.
You also open up the possibility of global/link time optimization, which is much more difficult with shared libraries.
Another question covers the differences between static and dynamic libraries: When to use dynamic vs. static libraries
As for why they use static libraries, the extra speed may be worth it, and you can avoid DLL hell (which was a big problem in the past). It's also useful if you want to distribute your program and libraries together, ensuring the recipient has the correct dependencies, though nothing stops you from distributing DLLs together with the executable.
When developing games for a console, dynamic linking often isn't an option. If you want to use the engine for both console and PC development, it is best to avoid dynamic linking.