It seems that in C++ and D, languages which are statically compiled and in which template metaprogramming is a popular technique, there is a decent amount of concern about template instantiation bloat. It seems to me like mostly a theoretical concern, except on very resource-constrained embedded systems. Outside of the embedded space, I have yet to hear of an example of someone being able to demonstrate that it was a problem in practice.
Can anyone provide an example outside of severely resource-limited embedded systems of where template instantiation bloat mattered in practice and had measurable, practically significant negative effects?
There's little problem in C++, because the amount of template stuff you can do in C++ is limited by their complexity.
In D however ... before CTFE (compile-time function evaluation) existed, we had to use templates for string processing. This is also the reason big mangled symbols are compressed in DMD - the strings used as template arguments become part of the mangled symbol names, and when instantiating templates with longer segments of code (for instance) the resulting symbol sizes would spectacularly explode the object format.
It's better nowadays. But on the whole, templates still cause a lot of bloat for a simple reason - they parse faster and are more powerful than in C++, so people naturally use them a lot more (even in cases that technically wouldn't need templates). I must admit I'm one of the prime offenders here (take a look at tools.base if you like, but be sure to keep a barf bag handy - the file is effectively 90% template code).
Template bloat is NOT an issue (It is a mental problem not a code problem).
Yes it can get big. But what's the alternative?
You could write all the code yourself manually (once for each type). Do you think writting it manually will make it smaller. The compiler only instanciate the versions it actually needs and the linker will remove multiple copies spread over compilation units.
So there is no actual bloat.
It is just building what you use. If you use a lot of different types you need to write more code.
I think you'll need to find an older compiler to see the template code bloat in practice. Modern C++ compilers (and linkers) have been able to optimize it away for a while.
I think it's mainly mental bloat. The next programmer to work on your code will first need to figure out what subset of it matters.
Template instantion bloat IS a matter in practice, because it can increases ( a lot!!! ) compile and link time.
I personnaly thinks that c++ #1 problem is compil time, and it's mainly due to template.
I worked on a project with about 50 libs. We had our own rtti system using templates. I had to rewrite because of the template bloat
Here is some numbers:
libs went from 640 mbytes to 420 mbytes
temps went from 4.3 gbytes to 2.9 gbytes
full rebuild went from 19:30 to 13:10
Related
In C, when you'd like to do generic programming, your only language-supported option is macros. They work great and are widely used, but are discouraged if you can get by with inline functions or regular functions instead. (If using gcc, you can also use gcc statement expressions, which avoid the double-evaluation "bug". Example.)
C++, however, has done away with the so-called "evils" of macros by creating templates. I'm still somewhat new to the full-blown gigantic behemoth of a language that is C++ (I assess it must have like 4 or 5x as many features and language constructs as C), and generally have favored macros or gcc statement expressions, but am being pressured more and more to use templates in their place. This begs the question: in general, when doing generic programming in C++, which will produce smaller executables: macros or templates?
If you say, "size doesn't matter, choose safety over size", I'm going to go ahead and stop you right there. For large computers and application programming, this may be true, but for microcontroller programming on an Arduino, ATTiny85 with 8KB Flash space for the program, or other small devices, that's hogwash. Size matters too, so tradeoffs must be made.
Which produces smaller executables for the same code when doing generic programming? Macros or templates? Any additional insight is welcome.
Related:
Do c++ templates make programs slow?
Side note:
Some things can only be done with macros, NOT templates. Take, for example, non-name-mangled stringizing/stringifying and X macros. More on X macros:
Real-world use of X-Macros
https://www.geeksforgeeks.org/x-macros-in-c/
https://en.wikipedia.org/wiki/X_Macro
At this time of history, 2020, this is only the job of the optimizer. You can achieve better speed with assembly, the point is that it's not worth in both size and speed. With proper C++ programming your code will be fast enough and small enough. Getting faster or smaller by messing the readability of the code is not worth the trouble.
That said, macros replace stuff at the preprocessor level, templates do that at the compile level. You may get faster compilation time with macros, but a good compiler will optimize them more that macros. This means that you can have the same exe size, or possibly less with templates.
The vast, 99%, troubles of speed or size in an application comes from the programmers errors, not from the language. Very often I discover that some photo resources are PNG instead of proper JPG in my executable and voila, I have a bloat. Or that I accidentally forgot to use weak_ptr to break a reference and now I have two shared pointers that share 100MB of memory that will not be freed. It's almost always the human error.
... in general, when doing generic programming in C++, which will produce smaller executables: macros or templates?
Measure it. There shouldn't be a significant difference assuming you do a good job writing both versions (see the first point above), and your compiler is decent, and your code is equally sympathetic to both.
If you write something that is much bigger with templates - ask a specific question about that.
Note that the linked question's answer is talking about multiple non-mergeable instantiations. IME function templates are very often inlined, in which case they behave very much like type-safe macros, and there's no reason for the inlining site to be larger if it's otherwise the same code. If you start taking the addresses of function template instantiations, for example, that changes.
... C++ ... generally have favored macros ... being pressured more and more to use templates in their place.
You need to learn C++ properly. Perhaps you already have, but don't use templates much: that still leaves you badly-placed to write a good comparison.
This begs the question
No, it prompts the question.
I'm working on a C++ project with extensive compile-time computations. Long compilation time is slowing us down. How might I find out the slowest parts of our template meta-programs so I can optimize them? (When we have slow runtime computations, I have many profilers to choose from, e.g. valgrind's callgrind tool. So I tried building a debug GCC and profiling it compiling our code, but I didn't learn much from that.)
I use GCC and Clang, but any suggestions are welcome.
I found profile_templates on Boost's site, but it seems to be thinly documented and require the jam/bjam build system. If you show how to use it on a non-jam project1, I will upvote you. https://svn.boost.org/svn/boost/sandbox/tools/profile_templates/ appears to count number-of-instantiations, whereas counting time taken would be ideal.
1 Our project uses CMake and is small enough that hacking together a Jamfile just for template profiling could be acceptable.
I know this is an old question, but there is a newer answer that I would like to give.
There is a clang-based set of projects that target this particular problem. The first component is an instrumentation onto the clang compiler which produces a complete trace of all the template instantiations that occurred during compilation, with timing values and optionally memory usage counts as well. That tool is called Templight, as is accessible here (currently needs to compile against a patched clang source tree):
https://github.com/mikael-s-persson/templight
A second component is a conversion tool that allows you to convert the templight traces into other formats, such as easily parsable text-based format (yaml, xml, text, etc.) and into formats that can more easily be visualized, such as graphviz / graphML, and more importantly a callgrind output that can be loaded into KCacheGrind to visualize and inspect the meta-call-graph of template instantiations and their compile-time costs, such as this screenshot of a template instantiation profile of a piece of code that creates a boost::container::vector and sorts it with std::sort:
Check it out here:
https://github.com/mikael-s-persson/templight-tools
Finally, there is also another related project that creates an interactive shell and debugger to be able to interactively walk up and down the template instantiation graph:
https://github.com/sabel83/metashell
I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.
The only technique I know of is a divide and conquer approach, either by separating code into different files, commenting out bodies of template definitions, or by wrapping your template instantiations in #define macros and temporarily redefining those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down.
Incidentally just separating the same code into more, smaller files may make it compile faster. I'm not just talking about opportunity for parallel compilation - even serially, I observed it to still be faster. I've observed this effect in gcc both when compiling my library, and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has an O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the execution unit.
Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions whereever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler should be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.
The classic book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond comes with a 20 page appendix on profiling compile-time costs. It has a companion CD with the code that generates the graphs in that appendix.
Another paper is http://aszt.inf.elte.hu/~gsd/s/cikkek/profiling/profile.pdf, perhaps that is of use to you.
Yet another, but more labor-intensive, way is to count the number of instantiations of each class from your compiler output.
http://www.cs.uoregon.edu/research/tau/about.php might be something which can be of your interest as for templated entities, it shows the breakup of time spent for each instantiation. Other data includes how many times each function was called, how many profiled functions did each function invoke, and what the mean inclusive time per call was
I'm working on a C++ project with extensive compile-time computations. Long compilation time is slowing us down. How might I find out the slowest parts of our template meta-programs so I can optimize them? (When we have slow runtime computations, I have many profilers to choose from, e.g. valgrind's callgrind tool. So I tried building a debug GCC and profiling it compiling our code, but I didn't learn much from that.)
I use GCC and Clang, but any suggestions are welcome.
I found profile_templates on Boost's site, but it seems to be thinly documented and require the jam/bjam build system. If you show how to use it on a non-jam project1, I will upvote you. https://svn.boost.org/svn/boost/sandbox/tools/profile_templates/ appears to count number-of-instantiations, whereas counting time taken would be ideal.
1 Our project uses CMake and is small enough that hacking together a Jamfile just for template profiling could be acceptable.
I know this is an old question, but there is a newer answer that I would like to give.
There is a clang-based set of projects that target this particular problem. The first component is an instrumentation onto the clang compiler which produces a complete trace of all the template instantiations that occurred during compilation, with timing values and optionally memory usage counts as well. That tool is called Templight, as is accessible here (currently needs to compile against a patched clang source tree):
https://github.com/mikael-s-persson/templight
A second component is a conversion tool that allows you to convert the templight traces into other formats, such as easily parsable text-based format (yaml, xml, text, etc.) and into formats that can more easily be visualized, such as graphviz / graphML, and more importantly a callgrind output that can be loaded into KCacheGrind to visualize and inspect the meta-call-graph of template instantiations and their compile-time costs, such as this screenshot of a template instantiation profile of a piece of code that creates a boost::container::vector and sorts it with std::sort:
Check it out here:
https://github.com/mikael-s-persson/templight-tools
Finally, there is also another related project that creates an interactive shell and debugger to be able to interactively walk up and down the template instantiation graph:
https://github.com/sabel83/metashell
I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.
The only technique I know of is a divide and conquer approach, either by separating code into different files, commenting out bodies of template definitions, or by wrapping your template instantiations in #define macros and temporarily redefining those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down.
Incidentally just separating the same code into more, smaller files may make it compile faster. I'm not just talking about opportunity for parallel compilation - even serially, I observed it to still be faster. I've observed this effect in gcc both when compiling my library, and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has an O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the execution unit.
Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions whereever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler should be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.
The classic book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond comes with a 20 page appendix on profiling compile-time costs. It has a companion CD with the code that generates the graphs in that appendix.
Another paper is http://aszt.inf.elte.hu/~gsd/s/cikkek/profiling/profile.pdf, perhaps that is of use to you.
Yet another, but more labor-intensive, way is to count the number of instantiations of each class from your compiler output.
http://www.cs.uoregon.edu/research/tau/about.php might be something which can be of your interest as for templated entities, it shows the breakup of time spent for each instantiation. Other data includes how many times each function was called, how many profiled functions did each function invoke, and what the mean inclusive time per call was
In this chapter Scott Meyer mentioned a few technique to avoid header files dependency. The main goal is to avoid recompiling a cpp file if changes are limited to other included header files.
My questions are:
In my past projects I never paid attention to this rule. The compilation time is not short but it is not intolerable. It could have more to do with the scale (or the lack of) of my projects. How practical is this tip today given the advance in the compiler technology (e.g. clang)?
Where can I find more examples of the use of this techniques? (e.g. Gnome or other OSS projects)
P.S. I am using the 2nd edition.
I don't think compiler technology has advanced particularly. clang is not some piece of magic - if you have dependencies then and you make changes, then dependent code will have to be recompiled. This can take a very, very long time - read hours, or even days for a big project, so people try to minimise such dependencies where possible.
Having said that, it is possible to overdo things - making all classes into PIMPLs, forward declaring everything, etc. Doing this just leads to obfuscated code, and should be avoided whenever possible.
Reducing compilation times is a red herring, and a form of premature optimization. Reorganizing your code to reduce compilation times (when this matters) can be done, but at a somehow great cost.
As for Gnome, Gnome has a "private pointer" in every GObject. This implements the pimpl idiom. This reduces dependencies between source files, and allow for some form of encapsulation. There are fewer compile time problems for C projects.
Modern C++ designs make heavy use of templates, which inevitably make your compilation times skyrocket. Using the pimpl idiom and forward declaring classes (instead of including a header, where possible) reduces the logical dependencies between translation units (this is a good thing), but in many situations do not really help with compilation times.
Using boost greatly increase compilation times (beware if you indirectly include boost headers in many source files), and many C++ projects use it.
I should mention also the thin template idiom is often used to reduce code bloat with templates.
I've noticed that when I use a boost feature the app size tends to increase by about .1 - .3 MB. This may not seem like much, but compared to using other external libraries it is (for me at least). Why is this?
Boost uses templates everywhere. These templates can be instantiated multiple times with the same parameters. A sufficiently smart linker will throw out all but one copy. However, not all linkers are sufficiently smart. Also, templates are instantiated implicitly sometimes and it's hard to even know how many times one has been instantiated.
"so much" is a comparative term, and I'm afraid you're comparing apples to oranges. Just because other libraries are smaller doesn't imply you should assume Boost is as small.
Look at the sheer amount of work Boost does for you!
I doubt making a custom library with the same functionality would be of any considerable lesser size. The only valid comparison to make is "Boost's library that does X" versus "Another library that does X". Not "Boost's library that does X" and "Another library that does Y."
The file system library is very powerful, and this means lots of functions, and lot's of back-bone code to provide you and I with a simple interface. Also, like others mentioned templates in general can increase code size, but it's not like that's an avoidable thing. Templates or hand-coded, either one will results in the same size code. The only difference is templates are much easier.
It all depends on how it is used. Since Boost is a bunch of templates, it causes a bunch of member functions to be compiled per type used. If you use boost with n types, the member functions are defined (by C++ templates) n times, one for each type.
Boost consists primarily of very generalized and sometimes quite complex templates, which means, types and functions are created by the compiler as required per usage, and not simply by declaration. In other words, a small amount of source code can produce a significant quantity of object code to fulfill all variations of templates declared or used. Boost also depends on standard libraries, pulling in those dependencies as well. However, the most significant contribution is the fact that Boost source code exists almost primarily in include files. Including standard c include files (outside of STL) typically includes very little source code and contains mostly prototypes, small macros or type declarations without their implementations. Boost contains most of its implementations in its include file.