Reducing STL code bloat by wrapping containers

Reducing STL code bloat by wrapping containers - c++

I have a C++ library (with over 50 source files) which uses a lot of STL routines with primary containers being list and vector. This has caused a huge code bloat and I would like to reduce the code bloat by creating a wrapper over the list and vector.
Shown below is my wrapper over std:: and the wrapped instances.
template<typename T>
class wlist
{
private:
std::list<T> m_list;
public:
// new iterator set.
typedef typename std::list<T>::iterator iterator;
typedef typename std::list<T>::const_iterator cIterator;
typedef typename std::list<T>::reverse_iterator reverse_iterator;
unsigned int size () { return m_list.size(); }
bool empty () { return m_list.empty(); }
void pop_back () { m_list.pop_back(); }
void pop_front () { m_list.pop_front(); }
void push_front (T& item) { m_list.push_front(item); }
void push_back (T item) { m_list.push_back(item); }
iterator insert(iterator position, T item) {m_list.insert(position,item);}
bool delete_item (T& item);
T back () { return (m_list.empty()) ? NULL : m_list.back();}
T front () { return (m_list.empty()) ? NULL : m_list.front();}
iterator erase(iterator item ) { return m_list.erase(item); }
iterator begin() { return m_list.begin(); }
iterator end() { return m_list.end(); }
reverse_iterator rbegin() { return m_list.rbegin(); }
};
File A:
class label {
public:
int getPosition(void);
setPosition(int x);
private:
wlist<text*> _elementText; // used in place of list<text> _elementText;
}
File B:
class image {
private:
void draw image() {
wlist<label*>::iterator currentElement = _elementText.begin();
((label*)(*currentElement))->getPosition();
currentElement ++;
}
}
My belief was that by wrapping the STL container, I would be able to reduce the code bloat but the reduction in code size seems to be insignificant while my motive to wrap the STL was to achieve a code reduction of roughly 20%.
1) By exposing the "wrapped" iterator, have I in-turn embedded STL into my client code thereby negating all the code saving that I was trying to do ????
2) Have I chosen the right profiling method ????
Size before modification:
$ size libWrap.so
text: 813115
data: 99436
bss: 132704
dec : 1045255
hex: ff307
Size after modification:
$ size libWrap.so
text: 806607
data: 98780
bss: 132704
dec : 1038091
hex: fd70b

Firstly, the interface offered by your wrapper is completely and totally disgusting. There's a reason that iterators exist, and it's because your implementation flat out doesn't work for non-pointer types. Returning and taking by value instead of by reference? A terrible design.
Secondly, you can never reduce the size of your program by introducing more code. Your wrapper still uses the STL list under the hood, so you're still instantiating all of those types. Most likely, the compiler just completely removed the whole lot.
Thirdly, you're not even doing an equivalent replacement, because you've replaced what used to be a list of values wth a list of pointers, introducing six million lifetime headaches and other problems.
Fourthly, even the idea of code bloat is quite ridiculous on the vast majority of platforms. I, of course, cannot psychically know that you are not working on some embedded platform with hardly any memory (although I doubt you would use many lists on such a platform) but on virtually every system, the size of the code itself is meaningless compared to other assets needed for the program to execute.
What you can do is try something like SCARY iterators or partial specializations for T*.

I am trying to imagine why you are concerned with this. Why is it a problem?
My guess is that you have many ( hundreds? ) of different classes, and each one generates a copy of the templated containers.
If this is so and if it is necessary, then sit back and let the compiler do the tedious work for you.
If it is not necessary, then the problem seems likely to be that all your different classes are not necessary. There is a problem with your class design. You might have many different classes that differ only slightly. If the difference is so slight that the extra code generated to handle the difference seems out of proportion, then the different behavior might be better handled by code inside a single class.

It seems that you want to pre-compile your templated wrapper once only in your library rather than have the compiler figure out the templated class every time it gets called. You can do this by moving your declaration from the header file (where it normally is for templated code) into your .cpp file. This also has the advantage that it reduces compilation times. There is a price in flexability in this approach, however, you have know from the beginnings the types that you want your class to work for (but you don't want the compiler to figure it out for you, anyway)
Putting templated code into a .cpp file will usually result in linker errors. To avoid these you need to expliciltly declaire the templates that you want the compiler to compile in the cpp file:
At the end of the .cpp file, you write something like
template class wlist<double>;
template class wlist<int>;
etc.
This instructs the compiler to compiler to compile these version of the class (and only these versions).
This of cause reduces the flexibility of your library - if you call a wlist<complex> then you would get the linker errors.
See here for more info: http://www.parashift.com/c++-faq-lite/templates.html#faq-35.12
I believe this is usually done to reduce compilation times - I imagine it will reduce code bloat too, but I have never used the technique for this reason and so never checked the size of my executable....

Related

Displaying the sizes of all objects, containers, programs etc. С++

Let's imagine that we have a big project. In which there are many classes, objects of these classes, containers of objects, and so on.
Can I view the dimensions of all these objects? Output, for example, a table or a list of all objects, containers, with their sizes.
Of course, for a project, for example, in 20000+ lines, I will not be able to display everything by hand.
It is important that this can be done not only after the completion of the program, but during execution
Are there any utilities, programs, perhaps this can be done using gdb or some other means.
I hope I explained the question exhaustively, ask questions, I will explain what I can.

Output, for example, a table or a list of all objects, containers, with their sizes.
Of course, for a project, for example, in 20000+ lines, I will not be able to display everything by hand.
Yes, you can do that. Either by writing appropriate object-tracking code and linking it into your program, or by writing a GDB script to enumerate objects "externally".
However, usually this request means that you want to understand your program via debugger instead of understanding it by reading the code. And the former task is usually about 10 times harder than the latter.
In addition, 20,000 line program is relatively small. A big project is something with a few million lines of code.
Update:
Imagine that you have a multi-threaded program with various modules and "legacy code" that leaks a container or something else, such a leak cannot be determined from the dump from the sanitizer, because when the program ends, RAII cleans everything.
Ah, so this is a good example of an XY problem.
As you can imagine, this problem isn't unique to your project, and there are existing tools to solve it.
For example, TCMalloc includes heap profiler, which allows you to snapshot current heap state and answer questions such as "for all currently allocated objects, where did they get allocated from?".
One advantage of TCMalloc sampling approach (compared to what you proposed) is that its overhead is quite small, and so it can be live in production -- you don't need a special heap-debug build and can analyze any process at will.

As Neil Butterworth pointed out, C++ does not maintain a list of objects by itself, but if you really wanted to, you could implement it yourself: Create a class that tracks its instances and have all classes whose instances you want to monitor inherit from it. However, you’re probably better off using conventional debugging tools (example) to get the information that you need. Maybe there is a better whay to diagnose your problem than to print a list of all objects and their sizes – see the XY problem.
If you really want to try the instance list approach, here’s some code:
ListedObject.h++
#include <unordered_set>
#include <string>
#include <cstdio>
class ListedObject
{
private:
static std::unordered_set<const ListedObject*> objectList;
public:
ListedObject ()
{
objectList.insert(this);
}
virtual ~ListedObject ()
{
objectList.erase(this);
}
virtual std::size_t loSize () const = 0;
// You will need to override this with e.g.
// std::size_t loSize () const override {return size();}
virtual std::string loName () const
{
return "<anonymous>";
}
// You need a mechanism to give the objects names that exist at runtime
// (your variable names won’t)
static void printObjects ()
{
for (const ListedObject *const lo : objectList)
{
std::printf("%s %zu\n", lo->loName().c_str(), lo->loSize());
}
}
};
ListedObject.c++
#include "ListedObject.h++"
std::unordered_set<const ListedObject*> ListedObject::objectList;
You would use it like this:
#include "ListedObject.h++"
class DummyContainer: public ListedObject
// not really a container, just something that simulates having a size
{
public:
std::size_t dummySize;
DummyContainer (std::size_t dummySize): dummySize(dummySize) {}
std::size_t loSize () const override
{
return dummySize;
}
};
int main (void)
{
DummyContainer d1(4);
DummyContainer d2(12);
{
DummyContainer d3(1337);
}
DummyContainer d4(42);
ListedObject::printObjects();
// For me, this prints:
// <anonymous> 42
// <anonymous> 12
// <anonymous> 4
return 0;
}
However, this approach won’t work for classes you can’t edit, like std::vectors. You could try to get around this with something like…
template <class T>
class ListedVector: public std::vector<T>, public ListedObject
{
public:
std::size_t loSize () const override
{
return sizeof(T) *
//size(); // does not compile for some reason¹
static_cast<const std::vector<T>*>(this)->size();
}
};
…, but this will not inherit the various constructor signatures of std::vectors.
¹ gcc says: “error: there are no arguments to ‘size’ that depend on a template parameter, so a declaration of ‘size’ must be available [-fpermissive]”

Efficient configuration of class hierarchy at compile-time

This question is specifically about C++ architecture on embedded, hard real-time systems. This implies that large parts of the data-structures as well as the exact program-flow are given at compile-time, performance is important and a lot of code can be inlined. Solutions preferably use C++03 only, but C++11 inputs are also welcome.
I am looking for established design-patterns and solutions to the architectural problem where the same code-base should be re-used for several, closely related products, while some parts (e.g. the hardware-abstraction) will necessarily be different.
I will likely end up with a hierarchical structure of modules encapsulated in classes that might then look somehow like this, assuming 4 layers:
Product A Product B
Toplevel_A Toplevel_B (different for A and B, but with common parts)
Middle_generic Middle_generic (same for A and B)
Sub_generic Sub_generic (same for A and B)
Hardware_A Hardware_B (different for A and B)
Here, some classes inherit from a common base class (e.g. Toplevel_A from Toplevel_base) while others do not need to be specialized at all (e.g. Middle_generic).
Currently I can think of the following approaches:
(A): If this was a regular desktop-application, I would use virtual inheritance and create the instances at run-time, using e.g. an Abstract Factory.
Drawback: However the *_B classes will never be used in product A and hence the dereferencing of all the virtual function calls and members not linked to an address at run-time will lead to quite some overhead.
(B) Using template specialization as inheritance mechanism (e.g. CRTP)
template<class Derived>
class Toplevel { /* generic stuff ... */ };
class Toplevel_A : public Toplevel<Toplevel_A> { /* specific stuff ... */ };
Drawback: Hard to understand.
(C): Use different sets of matching files and let the build-scripts include the right one
// common/toplevel_base.h
class Toplevel_base { /* ... */ };
// product_A/toplevel.h
class Toplevel : Toplevel_base { /* ... */ };
// product_B/toplevel.h
class Toplevel : Toplevel_base { /* ... */ };
// build_script.A
compiler -Icommon -Iproduct_A
Drawback: Confusing, tricky to maintain and test.
(D): One big typedef (or #define) file
//typedef_A.h
typedef Toplevel_A Toplevel_to_be_used;
typedef Hardware_A Hardware_to_be_used;
// etc.
// sub_generic.h
class sub_generic {
Hardware_to_be_used the_hardware;
// etc.
};
Drawback: One file to be included everywhere and still the need of another mechnism to actually switch between different configurations.
(E): A similar, "Policy based" configuration, e.g.
template <class Policy>
class Toplevel {
Middle_generic<Policy> the_middle;
// ...
};
// ...
template <class Policy>
class Sub_generic {
class Policy::Hardware_to_be_used the_hardware;
// ...
};
// used as
class Policy_A {
typedef Hardware_A Hardware_to_be_used;
};
Toplevel<Policy_A> the_toplevel;
Drawback: Everything is a template now; a lot of code needs to be re-compiled every time.
(F): Compiler switch and preprocessor
// sub_generic.h
class Sub_generic {
#if PRODUCT_IS_A
Hardware_A _hardware;
#endif
#if PRODUCT_IS_B
Hardware_B _hardware;
#endif
};
Drawback: Brrr..., only if all else fails.
Is there any (other) established design-pattern or a better solution to this problem, such that the compiler can statically allocate as many objects as possible and inline large parts of the code, knowing which product is being built and which classes are going to be used?

I'd go for A. Until it's PROVEN that this is not good enough, go for the same decisions as for desktop (well, of course, allocating several kilobytes on the stack, or using global variables that are many megabytes large may be "obvious" that it's not going to work). Yes, there is SOME overhead in calling virtual functions, but I would go for the most obvious and natural C++ solution FIRST, then redesign if it's not "good enough" (obviously, try to determine performance and such early on, and use tools like a sampling profiler to determine where you are spending time, rather than "guessing" - humans are proven pretty poor guessers).
I'd then move to option B if A is proven to not work. This is indeed not entirely obvious, but it is, roughly, how LLVM/Clang solves this problem for combinations of hardware and OS, see:
https://github.com/llvm-mirror/clang/blob/master/lib/Basic/Targets.cpp

First I would like to point out that you basically answered your own question in the question :-)
Next I would like to point out that in C++
the exact program-flow are given at compile-time, performance is
important and a lot of code can be inlined
is called templates. The other approaches that leverage language features as opposed to build system features will serve only as a logical way of structuring the code in your project to the benefit of developers.
Further, as noted in other answers C is more common for hard real-time systems than are C++, and in C it is customary to rely on MACROS to make this kind of optimization at compile time.
Finally, you have noted under your B solution above that template specialization is hard to understand. I would argue that this depends on how you do it and also on how much experience your team has on C++/templates. I find many "template ridden" projects to be extremely hard to read and the error messages they produce to be unholy at best, but I still manage to make effective use of templates in my own projects because I respect the KISS principle while doing it.
So my answer to you is, go with B or ditch C++ for C

I understand that you have two important requirements :
Data types are known at compile time
Program-flow is known at compile time
The CRTP wouldn't really address the problem you are trying to solve as it would allow the HardwareLayer to call methods on the Sub_generic, Middle_generic or TopLevel and I don't believe it is what you are looking for.
Both of your requirements can be met using the Trait pattern (another reference). Here is an example proving both requirements are met. First, we define empty shells representing two Hardwares you might want to support.
class Hardware_A {};
class Hardware_B {};
Then let's consider a class that describes a general case which corresponds to Hardware_A.
template <typename Hardware>
class HardwareLayer
{
public:
typedef long int64_t;
static int64_t getCPUSerialNumber() {return 0;}
};
Now let's see a specialization for Hardware_B :
template <>
class HardwareLayer<Hardware_B>
{
public:
typedef int int64_t;
static int64_t getCPUSerialNumber() {return 1;}
};
Now, here is a usage example within the Sub_generic layer :
template <typename Hardware>
class Sub_generic
{
public:
typedef HardwareLayer<Hardware> HwLayer;
typedef typename HwLayer::int64_t int64_t;
int64_t doSomething() {return HwLayer::getCPUSerialNumber();}
};
And finally, a short main that executes both code paths and use both data types :
int main(int argc, const char * argv[]) {
std::cout << "Hardware_A : " << Sub_generic<Hardware_A>().doSomething() << std::endl;
std::cout << "Hardware_B : " << Sub_generic<Hardware_B>().doSomething() << std::endl;
}
Now if your HardwareLayer needs to maintain state, here is another way to implement the HardLayer and Sub_generic layer classes.
template <typename Hardware>
class HardwareLayer
{
public:
typedef long hwint64_t;
hwint64_t getCPUSerialNumber() {return mySerial;}
private:
hwint64_t mySerial = 0;
};
template <>
class HardwareLayer<Hardware_B>
{
public:
typedef int hwint64_t;
hwint64_t getCPUSerialNumber() {return mySerial;}
private:
hwint64_t mySerial = 1;
};
template <typename Hardware>
class Sub_generic : public HardwareLayer<Hardware>
{
public:
typedef HardwareLayer<Hardware> HwLayer;
typedef typename HwLayer::hwint64_t hwint64_t;
hwint64_t doSomething() {return HwLayer::getCPUSerialNumber();}
};
And here is a last variant where only the Sub_generic implementation changes :
template <typename Hardware>
class Sub_generic
{
public:
typedef HardwareLayer<Hardware> HwLayer;
typedef typename HwLayer::hwint64_t hwint64_t;
hwint64_t doSomething() {return hw.getCPUSerialNumber();}
private:
HwLayer hw;
};

On a similar train of thought to F, you could just have a directory layout like this:
Hardware/
common/inc/hardware.h
hardware1/src/hardware.cpp
hardware2/src/hardware.cpp
Simplify the interface to only assume a single hardware exists:
// sub_generic.h
class Sub_generic {
Hardware _hardware;
};
And then only compile the folder that contains the .cpp files for the hardware for that platform.
The benefits to this approach are:
It's simple to understand whats happening and to add a hardware3
hardware.h still serves as your API
It takes away the abstraction from the compiler (for your speed concerns)
Compiler 1 doesn't need to compile hardware2.cpp or hardware3.cpp which may contain things Compiler 1 can't do (like inline assembly, or some other specific Compiler 2 thing)
hardware3 might be much more complicated for some reason you haven't considered yet.. so giving it a whole directory structure encapsulates it.

Since this is for a hard real time embedded system, usually you would go for a C type of solution not c++.
With modern compilers I'd say that the overhead of c++ is not that great, so it's not entirely a matter of performance, but embedded systems tend to prefer c instead of c++.
What you are trying to build would resemble a classic device drivers library (like the one for ftdi chips).
The approach there would be (since it's written in C) something similar to your F, but with no compile time options - you would specialize the code, at runtime, based on somethig like PID, VID, SN, etc...
Now if you what to use c++ for this, templates should probably be your last option (code readability usually ranks higher than any advantage templates bring to the table). So you would probably go for something similar to A: a basic class inheritance scheme, but no particularly fancy design pattern is required.
Hope this helps...

I am going to assume that these classes only need to be created a single time, and that their instances persist throughout the entire program run time.
In this case I would recommend using the Object Factory pattern since the factory will only get run one time to create the class. From that point on the specialized classes are all a known type.

Most effective method of executing functions an in unknown order

Let's say I have a large, between 50 and 200, pool of individual functions whose job it is to operate on a single object and modify it. The pool of functions is selectively put into a single array and arranged in an arbitrary order.
The functions themselves take no arguments outside of the values present within the object it is modifying, and in this way the object's behavior is determined only by which functions are executed and in what order.
A way I have tentatively used so far is this, which might explain better what my goal is:
class Behavior{
public:
virtual void act(Object * obj) = 0;
};
class SpecificBehavior : public Behavior{
// many classes like this exist
public:
void act(Object * obj){ /* do something specific with obj*/ };
};
class Object{
public:
std::list<Behavior*> behavior;
void behave(){
std::list<Behavior*>::iterator iter = behavior.front();
while(iter != behavior.end()){
iter->act(this);
++iter;
};
};
};
My Question is, what is the most efficient way in C++ of organizing such a pool of functions, in terms of performance and maintainability. This is for some A.I research I am doing, and this methodology is what most closely matches what I am trying to achieve.
edits: The array itself can be changed at any time by any other part of the code not listed here, but it's guaranteed to never change during the call to behave(). The array it is stored in needs to be able to change and expand to any size

If the behaviour functions have no state and only take one Object argument, then I'd go with a container of function objects:
#include <functional>
#include <vector>
typedef std::function<void(Object &)> BehaveFun;
typedef std::vector<BehaveFun> BehaviourCollection;
class Object {
BehaviourCollection b;
void behave() {
for (auto it = b.cbegin(); it != b.cend(); ++it) *it(*this);
}
};
Now you just need to load all your functions into the collection.

if the main thing you will be doing with this collection is iterating over it, you'll probably want to use a vector as dereferencing and incrementing your iterators will equate to simple pointer arithmetic.
If you want to use all your cores, and your operations do not share any state, you might want to have a look at a library like Intel's TBB (see the parallel_for example)

I'd keep it exactly as you have it.
Perofmance should be OK (there may be an extra indirection due to the vtable look up but that shouldn't matter.)
My reasons for keeping it as is are:
You might be able to lift common sub-behaviour into an intermediate class between Behaviour and your implementation classes. This is not as easy using function pointers.
struct AlsoWaveArmsBase : public Behaviour
{
void act( Object * obj )
{
start_waving_arms(obj); // Concrete call
do_other_action(obj); // Abstract call
end_waving_arms(obj); // Concrete call
}
void start_waving_arms(Object*obj);
void end_waving_arms(Object*obj);
virtual void do_other_actions(Object * obj)=0;
};
struct WaveAndWalk : public AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { walk(obj); }
};
struct WaveAndDance : pubic AlsoWaveArmsBase
{
void do_other_actions(Object * obj) { walk(obj); }
}
You might want to use state in your behaviour
struct Count : public Behavior
{
Behaviour() : i(0) {}
int i;
void act(Object * obj)
{
count(obj,i);
++i;
}
}
You might want to add helper functions e.g. you might want to add a can_act like this:
void Object::behave(){
std::list<Behavior*>::iterator iter = behavior.front();
while(iter != behavior.end()){
if( iter->can_act(this) ){
iter->act(this);
}
++iter;
};
};
IMO, these flexibilities outweigh the benefits of moving to a pure function approach.

For maintainability, your current approach is the best (virtual functions). You might get a tiny little gain from using free function pointers, but I doubt it's measurable, and even if so, I don't think it is worth the trouble. The current OO approach is fast enough and maintainable. The little gain I'm talking about comes from the fact that you are dereferencing a pointer to an object and then (behind the scenes) dereferencing a pointer to a function (which happening as the implementation of calling a virtual function).
I wouldn't use std::function, because it's not very performant (though that might differ between implementations). See this and this. Function pointers are as fast as it gets when you need this kind of dynamism at runtime.
If you need to improve the performance, I suggest to look into improving the algorithm, not this implementation.

Extending a thrift generated object in C++

Using the following .thrift file
struct myElement {
1: required i32 num,
}
struct stuff {
1: optional map<i32,myElement> mymap,
}
I get thrift-generated class with an STL map. The instance of this class is long-lived
(I append and remove from it as well as write it to disk using TSimpleFileTransport).
I would like to extend myElement in C++, the extenstions should not affect
the serialized version of this object (and this object is not used in any
other language). Whats a clean way to acomplish that?
I contemplated the following, but they didn't seem clean:
Make a second, non thrift map that is indexed with the same key
keeping both in sync could prove to be a pain
Modify the generated code either by post-processing of the generated
header (incl. proprocessor hackery).
Similar to #2, but modify the generation side to include the following in the generated struct and then define NAME_CXX_EXT in a forced-included header
#ifdef NAME_CXX_EXT
NAME_CXX_EXT ...
#endif
All of the above seem rather nasty
The solution I am going to go with for now:
[This is all pseudo code, didn't check this copy for compilation]
The following generated code, which I cannot modify
(though I can change the map to a set)
class GeneratedElement {
public:
// ...
int32_t num;
// ...
};
class GeneratedMap {
public:
// ...
std::map<int32_t, GeneratedElement> myGeneratedMap;
// ...
};
// End of generated code
Elsewhere in the app:
class Element {
public:
GeneratedElement* pGenerated; // <<== ptr into element of another std::map!
time_t lastAccessTime;
};
class MapWrapper {
private:
GeneratedMap theGenerated;
public:
// ...
std::map<int32_t, Element> myMap;
// ...
void doStuffWIthBoth(int32_t key)
{
// instead of
// theGenerated.myGeneratedMap[key].num++; [lookup in map #1]
// time(&myMap[key].lastAccessTime); [lookup in map #2]
Element& el=myMap[key];
el.pGenerated->num++;
time(&el.lastAccessTime);
}
};
I wanted to avoid the double map lookup for every access
(though I know that the complexity remains the same, it is still two lookups ).
I figured I can guarantee that all insertions and removals to/from the theGenerated)
are done in a single spot, and in that same spot is where I populate/remove
the corresponding entry in myMap, I would then be able to initialize
Element::pGenerated to its corresponding element in theGenerated.myGeneratedMap
Not only will this let me save half of the lookup time, I may even change
myMap to a better container type for my keytype (say a hash_map or even a boost
multi index map)
At first this sounded to me like a bad idea. With std::vector and std::dqueue I can
see how this can be a problem as the values will be moved around,
invalidating the pointers. Given that std::map is implemented with a tree
structure, is there really a time where a map element will be relocated?
(my above assumptions were confirmed by the discussion in enter link description here)
While I probably won't provide an access method to each member of myElement or any syntactic sugar (like overloading [] () etc), this lets me treat these elements almost a consistent manner. The only key is that (aside for insertion) I never look for members of mymap directly.

Have you considered just using simple containership?
You're using C++, so you can just wrap the struct(s) in some class or other struct, and provide wrapper methods to do whatever you want.

Where do you find templates useful?

At my workplace, we tend to use iostream, string, vector, map, and the odd algorithm or two. We haven't actually found many situations where template techniques were a best solution to a problem.
What I am looking for here are ideas, and optionally sample code that shows how you used a template technique to create a new solution to a problem that you encountered in real life.
As a bribe, expect an up vote for your answer.

General info on templates:
Templates are useful anytime you need to use the same code but operating on different data types, where the types are known at compile time. And also when you have any kind of container object.
A very common usage is for just about every type of data structure. For example: Singly linked lists, doubly linked lists, trees, tries, hashtables, ...
Another very common usage is for sorting algorithms.
One of the main advantages of using templates is that you can remove code duplication. Code duplication is one of the biggest things you should avoid when programming.
You could implement a function Max as both a macro or a template, but the template implementation would be type safe and therefore better.
And now onto the cool stuff:
Also see template metaprogramming, which is a way of pre-evaluating code at compile-time rather than at run-time. Template metaprogramming has only immutable variables, and therefore its variables cannot change. Because of this template metaprogramming can be seen as a type of functional programming.
Check out this example of template metaprogramming from Wikipedia. It shows how templates can be used to execute code at compile time. Therefore at runtime you have a pre-calculated constant.
template <int N>
struct Factorial
{
enum { value = N * Factorial<N - 1>::value };
};
template <>
struct Factorial<0>
{
enum { value = 1 };
};
// Factorial<4>::value == 24
// Factorial<0>::value == 1
void foo()
{
int x = Factorial<4>::value; // == 24
int y = Factorial<0>::value; // == 1
}

I've used a lot of template code, mostly in Boost and the STL, but I've seldom had a need to write any.
One of the exceptions, a few years ago, was in a program that manipulated Windows PE-format EXE files. The company wanted to add 64-bit support, but the ExeFile class that I'd written to handle the files only worked with 32-bit ones. The code required to manipulate the 64-bit version was essentially identical, but it needed to use a different address type (64-bit instead of 32-bit), which caused two other data structures to be different as well.
Based on the STL's use of a single template to support both std::string and std::wstring, I decided to try making ExeFile a template, with the differing data structures and the address type as parameters. There were two places where I still had to use #ifdef WIN64 lines (slightly different processing requirements), but it wasn't really difficult to do. We've got full 32- and 64-bit support in that program now, and using the template means that every modification we've done since automatically applies to both versions.

One place that I do use templates to create my own code is to implement policy classes as described by Andrei Alexandrescu in Modern C++ Design. At present I'm working on a project that includes a set of classes that interact with BEA\h\h\h Oracle's Tuxedo TP monitor.
One facility that Tuxedo provides is transactional persistant queues, so I have a class TpQueue that interacts with the queue:
class TpQueue {
public:
void enqueue(...)
void dequeue(...)
...
}
However as the queue is transactional I need to decide what transaction behaviour I want; this could be done seperately outside of the TpQueue class but I think it's more explicit and less error prone if each TpQueue instance has its own policy on transactions. So I have a set of TransactionPolicy classes such as:
class OwnTransaction {
public:
begin(...) // Suspend any open transaction and start a new one
commit(..) // Commit my transaction and resume any suspended one
abort(...)
}
class SharedTransaction {
public:
begin(...) // Join the currently active transaction or start a new one if there isn't one
...
}
And the TpQueue class gets re-written as
template <typename TXNPOLICY = SharedTransaction>
class TpQueue : public TXNPOLICY {
...
}
So inside TpQueue I can call begin(), abort(), commit() as needed but can change the behaviour based on the way I declare the instance:
TpQueue<SharedTransaction> queue1 ;
TpQueue<OwnTransaction> queue2 ;

I used templates (with the help of Boost.Fusion) to achieve type-safe integers for a hypergraph library that I was developing. I have a (hyper)edge ID and a vertex ID both of which are integers. With templates, vertex and hyperedge IDs became different types and using one when the other was expected generated a compile-time error. Saved me a lot of headache that I'd otherwise have with run-time debugging.

Here's one example from a real project. I have getter functions like this:
bool getValue(wxString key, wxString& value);
bool getValue(wxString key, int& value);
bool getValue(wxString key, double& value);
bool getValue(wxString key, bool& value);
bool getValue(wxString key, StorageGranularity& value);
bool getValue(wxString key, std::vector<wxString>& value);
And then a variant with the 'default' value. It returns the value for key if it exists, or default value if it doesn't. Template saved me from having to create 6 new functions myself.
template <typename T>
T get(wxString key, const T& defaultValue)
{
T temp;
if (getValue(key, temp))
return temp;
else
return defaultValue;
}

Templates I regulary consume are a multitude of container classes, boost smart pointers, scopeguards, a few STL algorithms.
Scenarios in which I have written templates:
custom containers
memory management, implementing type safety and CTor/DTor invocation on top of void * allocators
common implementation for overloads wiht different types, e.g.
bool ContainsNan(float * , int)
bool ContainsNan(double *, int)
which both just call a (local, hidden) helper function
template <typename T>
bool ContainsNanT<T>(T * values, int len) { ... actual code goes here } ;
Specific algorithms that are independent of the type, as long as the type has certain properties, e.g. binary serialization.
template <typename T>
void BinStream::Serialize(T & value) { ... }
// to make a type serializable, you need to implement
void SerializeElement(BinStream & strean, Foo & element);
void DeserializeElement(BinStream & stream, Foo & element)
Unlike virtual functions, templates allow more optimizations to take place.
Generally, templates allow to implement one concept or algorithm for a multitude of types, and have the differences resolved already at compile time.

We use COM and accept a pointer to an object that can either implement another interface directly or via [IServiceProvider](http://msdn.microsoft.com/en-us/library/cc678965(VS.85).aspx) this prompted me to create this helper cast-like function.
// Get interface either via QueryInterface of via QueryService
template <class IFace>
CComPtr<IFace> GetIFace(IUnknown* unk)
{
CComQIPtr<IFace> ret = unk; // Try QueryInterface
if (ret == NULL) { // Fallback to QueryService
if(CComQIPtr<IServiceProvider> ser = unk)
ser->QueryService(__uuidof(IFace), __uuidof(IFace), (void**)&ret);
}
return ret;
}

I use templates to specify function object types. I often write code that takes a function object as an argument -- a function to integrate, a function to optimize, etc. -- and I find templates more convenient than inheritance. So my code receiving a function object -- such as an integrator or optimizer -- has a template parameter to specify the kind of function object it operates on.

The obvious reasons (like preventing code-duplication by operating on different data types) aside, there is this really cool pattern that's called policy based design. I have asked a question about policies vs strategies.
Now, what's so nifty about this feature. Consider you are writing an interface for others to use. You know that your interface will be used, because it is a module in its own domain. But you don't know yet how people are going to use it. Policy-based design strengthens your code for future reuse; it makes you independent of data types a particular implementation relies on. The code is just "slurped in". :-)
Traits are per se a wonderful idea. They can attach particular behaviour, data and typedata to a model. Traits allow complete parameterization of all of these three fields. And the best of it, it's a very good way to make code reusable.

I once saw the following code:
void doSomethingGeneric1(SomeClass * c, SomeClass & d)
{
// three lines of code
callFunctionGeneric1(c) ;
// three lines of code
}
repeated ten times:
void doSomethingGeneric2(SomeClass * c, SomeClass & d)
void doSomethingGeneric3(SomeClass * c, SomeClass & d)
void doSomethingGeneric4(SomeClass * c, SomeClass & d)
// Etc
Each function having the same 6 lines of code copy/pasted, and each time calling another function callFunctionGenericX with the same number suffix.
There were no way to refactor the whole thing altogether. So I kept the refactoring local.
I changed the code this way (from memory):
template<typename T>
void doSomethingGenericAnything(SomeClass * c, SomeClass & d, T t)
{
// three lines of code
t(c) ;
// three lines of code
}
And modified the existing code with:
void doSomethingGeneric1(SomeClass * c, SomeClass & d)
{
doSomethingGenericAnything(c, d, callFunctionGeneric1) ;
}
void doSomethingGeneric2(SomeClass * c, SomeClass & d)
{
doSomethingGenericAnything(c, d, callFunctionGeneric2) ;
}
Etc.
This is somewhat highjacking the template thing, but in the end, I guess it's better than play with typedefed function pointers or using macros.

I personally have used the Curiously Recurring Template Pattern as a means of enforcing some form of top-down design and bottom-up implementation. An example would be a specification for a generic handler where certain requirements on both form and interface are enforced on derived types at compile time. It looks something like this:
template <class Derived>
struct handler_base : Derived {
void pre_call() {
// do any universal pre_call handling here
static_cast<Derived *>(this)->pre_call();
};
void post_call(typename Derived::result_type & result) {
static_cast<Derived *>(this)->post_call(result);
// do any universal post_call handling here
};
typename Derived::result_type
operator() (typename Derived::arg_pack const & args) {
pre_call();
typename Derived::result_type temp = static_cast<Derived *>(this)->eval(args);
post_call(temp);
return temp;
};
};
Something like this can be used then to make sure your handlers derive from this template and enforce top-down design and then allow for bottom-up customization:
struct my_handler : handler_base<my_handler> {
typedef int result_type; // required to compile
typedef tuple<int, int> arg_pack; // required to compile
void pre_call(); // required to compile
void post_call(int &); // required to compile
int eval(arg_pack const &); // required to compile
};
This then allows you to have generic polymorphic functions that deal with only handler_base<> derived types:
template <class T, class Arg0, class Arg1>
typename T::result_type
invoke(handler_base<T> & handler, Arg0 const & arg0, Arg1 const & arg1) {
return handler(make_tuple(arg0, arg1));
};

It's already been mentioned that you can use templates as policy classes to do something. I use this a lot.
I also use them, with the help of property maps (see boost site for more information on this), in order to access data in a generic way. This gives the opportunity to change the way you store data, without ever having to change the way you retrieve it.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js