C++ architecture for compile-time array dimensionality and named indexing

I can work with code in C++, but it's not where I spend most of my time. I usually work in another language, where, over the course of my career, I have put together a well-defined architecture for building predictor/corrector (e.g. Kalman filter) type algorithms that are easily maintained and modified. For the sake of a ground-up deployment of a recently designed filter, I am hoping to replicate this architecture within a C++ framework. Hopefully, we can get the same level of extensibility built into the deployed product, so I don't need to keep jumping back and forth to another language whenever I want to modify the model being used by the filter.
The idea here is that we're going to have an array that contains a bunch of different information about the state of a given system. Let's say, for example, we have an object with a position and orientation in 3D... We'll use a quaternion for the orientation, but the specifics of that aren't super important.
Here's some pseudo-code to demonstrate what I'm trying to accomplish:
function build_model()
model.add_state('quaternion',[0;0;0;1],[1;1;1]);
model.add_state('position',[0;0;0],[10;10;10]);
model.add_input('velocity',[0;0;0]);
model.add_input('angular_rate',[0;0;0]);
model.add_noise('velocity_noise',[1;1;1]);
model.add_noise('angular_rate_noise',0.01*[1;1;1]);
end
where the above have the form:
add_state(state_name, initial_state_estimate, init_error_std_deviation_estimate)
add_input(input_name, initial_input_value)
add_noise(noise_name, noise_std_deviation)
After build_model() has been called, I end up with a bunch of information about the estimator.
The state space is of dimension 7
The state error space is of dimension 6
The input vector is of dimension 6
The "process noise" vector is of dimension 6
Further (indexed from 0), I have some arrays, such that:
state[0:3] holds the quaternion
state[4:6] holds the position
state_err[0:2] holds quaternion error
state_err[3:5] holds position error
input[0:2] holds velocity
input[3:5] holds angular_rate
process_noise[0:2] holds velocity noise
process_noise[3:5] holds angular rate noise
... but, I don't want a bunch of hard-coded indices... in fact, once the model is built, the rest of the code should be designed to be completely agnostic to the positions/dimensions/etc of the variables/model/state/error-space etc.
Since the estimator and the model don't really care about each other, I try to keep them encapsulated... i.e. the estimator just has state/error/noise of known dimensions and processes it with functions of a generic format, and then the model specific stuff is presented in the appropriate format. This, unfortunately, makes using an indexed array (rather than a struct or something) preferable.
Essentially what I'm looking for, is a pre-compiler way to associate names (like a structure) and indices (like an array) with the same data... ideally building it up piece by piece using simple language as shown above, to a final dimension, determined by the pre-compiler based on the model definition, to be used for defining the size of various arrays within the estimator runtime algorithm.
I'm not looking for someone to do this for me, but I'd love a push in the right direction. Good architecture early pays dividends in the long run, so I'm willing to invest some time to get it right.
So, a couple of things I've thought about:
There are definitely ways to do this at run-time with dynamic memory and things like std::vector, structures, enums, and so forth. But, since the deployed version of this is going to be running in real-time, performance is an issue... besides, all of this stuff shouldn't need to happen at run-time anyway. If we had a sufficiently sophisticated precompiler, it could just calculate all of this out, define some constants/macros/whatever to manipulate the model by name while using indices behind the scenes... unfortunately, fancy precompiler stuff is a pretty niche area that I have little experience with.
It seems like template meta-programming and/or macros might be a way to go, but I'm hesitant to dive head-first into that without guidance, and I recognize that this is shady at best in terms of modern software design.
I could always write code to write the C++ code for me... i.e. spit out a bunch of #defines or enums for the indices by name, as well as the dimensionality of the model/estimator components, and just copy paste this into the C++ code... but that feels wrong for different reasons. On the other hand, that's one way to get a "sufficiently sophisticated pre-compiler".
Giving up on the compile-time dimensioning of my arrays would also solve the problem, but since all of this is constant once computed, run-time seems like the wrong place for it...
So, is there an elegant solution out there? I'd hate to just brute force this, but I don't see a clear alternative. Also, much of the above may be WAY OFF for any number of reasons... apologies if so, and I appreciate any input you might have :-)

I ended up getting most of the way there using template meta-programming... [see below]
I'd like to find a way to add the state to state_enum and define its corresponding set_state struct at the same time, i.e.:
add_state(quaternion,{0,0,0,1},{1,1,1})
just for cleanliness and to prevent one happening without the other... if anyone has ideas on how to do this (preferably without using __COUNTER__ or boost), let me know. Thanks!
#include <cstdio>   // std::printf
#include <iostream>
struct state_enum{
enum{quaternion,position,last};
};
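// Primary template: the fallback for indices without a specialization, such as state_enum::last.
// Note: empty (zero-length) arrays like these are a compiler extension accepted by GCC and Clang, not standard C++.
template <int state_num> struct set_state{
static constexpr double x0[] = {};
static constexpr double sx0[] = {};
};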
template <> struct set_state<state_enum::quaternion>{
static constexpr double x0[] = {0,0,0,1};
static constexpr double sx0[] = {1,1,1};
};
template <> struct set_state<state_enum::position>{
static constexpr double x0[] = {0,0,0};
static constexpr double sx0[] = {2,2,2};
};
template <int state_num> struct state{
enum{
m_x = sizeof(set_state<state_num>::x0)/sizeof(set_state<state_num>::x0[0]),
m_dx = sizeof(set_state<state_num>::sx0)/sizeof(set_state<state_num>::sx0[0])
};
enum{
m_x_cummulative = state<state_num-1>::m_x_cummulative+m_x,
m_dx_cummulative=state<state_num-1>::m_dx_cummulative+m_dx,
i_x0=state<state_num-1>::m_x_cummulative,
i_dx0=state<state_num-1>::m_dx_cummulative,
i_x1=state<state_num-1>::m_x_cummulative+m_x-1,
i_dx1=state<state_num-1>::m_dx_cummulative+m_dx-1
};
};
template <> struct state<-1>{
enum{m_x = 0, m_dx=0};
enum{m_x_cummulative = 0, m_dx_cummulative=0, i_x0 = 0, i_dx0=0, i_x1 = 0, i_dx1=0};
};
int main(int argc, const char * argv[]) {
std::cout << "Summary of model indexing and dimensions...\n\n";
std::printf("%-32s %02i\n","quaternion first state index",state<state_enum::quaternion>::i_x0);
std::printf("%-32s %02i\n","quaternion final state index",state<state_enum::quaternion>::i_x1);
std::printf("%-32s %02i\n","position first state index",state<state_enum::position>::i_x0);
std::printf("%-32s %02i\n","position final state index",state<state_enum::position>::i_x1);
std::printf("%-32s %02i\n","full state vector dimensionality",state<state_enum::last>::m_x_cummulative);
std::cout << "\n";
std::printf("%-32s %02i\n","quaternion first error index",state<state_enum::quaternion>::i_dx0);
std::printf("%-32s %02i\n","quaternion final error index",state<state_enum::quaternion>::i_dx1);
std::printf("%-32s %02i\n","position first error index",state<state_enum::position>::i_dx0);
std::printf("%-32s %02i\n","position final error index",state<state_enum::position>::i_dx1);
std::printf("%-32s %02i\n","full error vector dimensionality",state<state_enum::last>::m_dx_cummulative);
std::cout << "\n\n";
return 0;
}
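One possible way to declare each state's name and its dimensions in a single place, without __COUNTER__ or Boost, is an X-macro combined with a C++14 constexpr function. This is a different technique from the template recursion above, offered only as a sketch: the names STATE_LIST, state_id, state_dim, error_dim and offset are hypothetical, and only the dimensions (not the initial values) are carried to keep it short.

```cpp
#include <array>
#include <cstddef>

// Hypothetical single source of truth: name, state dimension, error dimension.
#define STATE_LIST(X)   \
    X(quaternion, 4, 3) \
    X(position, 3, 3)

// Expand the list once into an enum of names...
#define AS_ENUM(name, nx, ndx) name,
enum state_id : std::size_t { STATE_LIST(AS_ENUM) state_count };
#undef AS_ENUM

// ...and twice more into parallel tables of dimensions.
#define AS_NX(name, nx, ndx) nx,
#define AS_NDX(name, nx, ndx) ndx,
constexpr std::size_t state_dim[state_count] = { STATE_LIST(AS_NX) };
constexpr std::size_t error_dim[state_count] = { STATE_LIST(AS_NDX) };
#undef AS_NX
#undef AS_NDX

// Index of the first element of block `id`: the summed size of all earlier
// blocks. Passing state_count yields the total dimension.
constexpr std::size_t offset(const std::size_t (&dims)[state_count], std::size_t id) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < id; ++i) sum += dims[i];
    return sum;
}

// Array sizes are fixed at compile time from the model definition.
using state_vector = std::array<double, offset(state_dim, state_count)>;
using error_vector = std::array<double, offset(error_dim, state_count)>;

static_assert(offset(state_dim, position) == 4, "position starts after the quaternion");
static_assert(std::tuple_size<state_vector>::value == 7, "4 + 3 states");
static_assert(std::tuple_size<error_vector>::value == 6, "3 + 3 error states");
```

Estimator code can then index with x[offset(state_dim, position)] and never mention a literal 4. Adding a state is one new line in STATE_LIST.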

Related

How to create optimized variations of the same algorithm without too much copy/paste code? (C++)

When I write C++ code for realtime optimised purposes, such as audio or graphics processing, I run into the following problem quite often:
I need several variations of a piece of code, but with only some tiny change in their inner loops. Most often the required variation is in how the algorithm outputs the results, i.e. whether it should replace the previous data in the output buffer, add to it, multiply it, etc.
Copy-pasting the whole method and changing just couple of characters from one line in the inner loop feels like an awful way to do things.
One way to cure this fairly efficiently would be to always use some simple blending formula which you give parameters, such as:
*p_buffer = A*(*p_buffer) + B*new_value + C; ++p_buffer;
Then you could simply give A, B and C to that algorithm and be done with it. But that feels like a waste of CPU cycles. Also if the buffer hasn't been initialized with valid float values and there's a NAN, then the results might also be NAN, even if you intend to multiply the previous value in the buffer with 0.
So is there another efficient way of creating these kinds of variations for a method without sacrificing speed? I'm hoping to have separate methods for each of the variations I use. For example:
Render();
RenderAdditive();
RenderMultiplicative();
RenderSomeOtherWay();
EDIT:
Based on the answers, I went with defining the method as:
template<int TYPE> void Render(...)
{
if constexpr (TYPE == 0)
*p_output_buffer++ = data;
if constexpr (TYPE == 1)
*p_output_buffer++ += data;
if constexpr (TYPE == 2)
*p_output_buffer++ *= data;
}
This approach works great.
Thank you everyone for helping!
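For readers who cannot use if constexpr (it needs C++17), a policy-based variant gives the same effect with plain templates. This is a sketch of that alternative technique, not the code from the edit above; the policy names Replace, Add and Multiply and the render signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical output policies: each one says how a new sample is combined
// with what is already in the buffer.
struct Replace  { static void apply(float& dst, float v) { dst = v; } };
struct Add      { static void apply(float& dst, float v) { dst += v; } };
struct Multiply { static void apply(float& dst, float v) { dst *= v; } };

// One inner loop, instantiated once per policy. Op::apply is resolved at
// compile time, so there is no run-time branch on the mode inside the loop.
template <typename Op>
void render(float* out, const float* in, std::size_t n) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        Op::apply(out[i], in[i]);
    }
}

// Thin named wrappers, matching the method names asked for in the question.
inline void Render(float* out, const float* in, std::size_t n)               { render<Replace>(out, in, n); }
inline void RenderAdditive(float* out, const float* in, std::size_t n)       { render<Add>(out, in, n); }
inline void RenderMultiplicative(float* out, const float* in, std::size_t n) { render<Multiply>(out, in, n); }
```

Note that Replace never reads the old buffer contents, so an uninitialized buffer holding NaN cannot leak into the result.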

Do constexpr values cause binary size to increase?

I know that templatized types such as the below cost nothing on the compiled binary size:
template<auto value>
struct ValueHolder{};
I'm making a program that will use a LOT of such wrapped types, and I don't think I want to be using integral_constants for that reason, since they have a ::value member. I can get away with something more like:
template<typename ValHolder>
struct Deducer;
template<auto value>
struct Deducer<ValueHolder<value>> {
using Output = ValueHolder<value+1>;
};
But it's definitely a bit more work, so I want to make sure I'm not doing it for nothing. Note that we're talking TONS of such values (I'd explain, but I don't want to go on too far a tangent; I'd probably get more comments about "should I do the project" than the question!).
So the question is: Do [static] constexpr values take any size at all in the compiled binary, or are the values substituted at compile-time, as if they were typed-in literally? I'm pretty sure they DO take size in the binary, but I'm not positive.
I did a little test at godbolt to look at the assembly of a constexpr vs non-constexpr array side-by-side, and everything looked pretty similar to me: https://godbolt.org/z/9hecfq
#include <cstddef>
#include <iostream>
int main()
{
    // Non-constexpr large array
    std::size_t arr[0xFFFF] = {};
    // Constexpr large array
    constexpr std::size_t cArr[0xFFF] = {};
    // Just to avoid unused variable optimizations / warnings
    std::cout << arr[0] << cArr[0] << std::endl;
    return 0;
}
This depends entirely on:
How much the compiler feels like optimizing the variable away.
How you use the variable.
Consider the code you posted. You created a constexpr array. As this is an array, it is 100% legal to index it with a runtime value. This would require the compiler to emit code that accesses the array at that index, which would require that array to actually exist in memory. So if you use it in such a way, it must have storage.
However, since your code only indexes this array with a constant expression index, a compiler that wants to think a bit more than -O0 would allow would realize that it knows the value of all of the elements in that array. So it knows exactly what cArr[0] is. And that means the compiler can just convert that expression into the proper value and just ignore that cArr exists.
Such a compiler could do the same with arr, BTW; it doesn't have to be a constant expression for the compiler to detect a no-op.
Also, note that since both arrays are non-static, neither will take up storage "in the compiled binary". If runtime storage for them is needed, it will be stack space, not executable space.
Broadly speaking, a constexpr variable will take up storage at any reasonable optimization level if you do something that requires it to take up storage. This could be something as innocuous as passing it to an (un-inlined) function that takes the parameter by const&.
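The three kinds of use described in this answer can be put side by side. The following is only an illustrative sketch (kTable, third, lookup and address_of are made-up names); whether storage is actually emitted still depends on the compiler and optimization level, as stated above.

```cpp
#include <cstddef>

constexpr int kTable[4] = {10, 20, 30, 40};

// Constant index: the compiler knows the value and is free to substitute 30
// without ever touching kTable in memory.
constexpr int third() { return kTable[2]; }
static_assert(third() == 30, "folded at compile time");

// Runtime index: the element is only known when the program runs, so kTable
// has to exist somewhere in memory for this function to read from.
inline int lookup(std::size_t i) { return kTable[i % 4]; }

// Binding to const& needs an object with an address. Unless the call is
// inlined away, that also forces storage for kTable.
inline const int* address_of(const int& value) { return &value; }
```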
Ask your linker :) There is nothing anywhere in the C++ standard that has any bearing on the answer. So you absolutely, positively, must build your code in release mode, and check if in the particular use scenario it does increase the size or not.
Any general results you obtain on other platforms, different compilers (1), other compile options, other modules added/removed to your project, or even any changes to the code, will not have much relevance.
You have a specific question that depends on so many factors that general answers are IMHO useless.
But moreover, if you actually care about the binary size, then it should be already in your test/benchmark suite, you should have integration builds fail when things grow when they shouldn’t, etc. No measurement and no automation are prima facie evidence that you don’t actually care.
So, since you presumably do care about the binary size, just write the code you had in mind and look in your CI dashboard at the binary size metric. Oh, you don’t have it? Well, that’s the first thing to get done before you go any further. I’m serious.
(1): Same compiler = same binary. I’m crazy, you say? No. It bit me once too many. If the compiler binary is different (other than time stamps), it’s not the same compiler, end of story.

Creating custom or using built in types

In some projects people create custom types for everything, and in others they just use ints and floats to represent temperatures, lengths and angles.
I can see advantages and drawbacks to both, and I guess whether it is a good idea to create these kinds of types depends on the type of project you are working on.
Here is what I'm thinking of:
class SomeClass
{
Physics::Temperature TemperatureOnMoon(Geometry::Distance distanceFromSun);
Geometry::Area Shadow(Geometry::Angle xAngle, Geometry::Angle yAngle, Geometry::Triangle triangle);
};
The Temperature type would have a Fahrenheit() and Celsius() method, the Area type would have a constructor that takes two Point types and so on.
This gives great type safety and I think it increases readability, but it also creates a lot of dependencies. Suddenly everyone who uses SomeClass has to include all these other headers, so you have to do a lot more work when you're creating unit tests. It also takes time to develop all the types.
The approach using built-in types is much simpler to use and has fewer dependencies:
class SomeClass
{
double TemperatureOnMoon(double distanceFromSun);
double Shadow(double xAngle, double yAngle, double triangle);
};
My question is, to what degree do you create these kinds of types? Would you prefer them in larger projects? Are there ready made libraries for this kind of stuff?
I would avoid creating new types when it's unnecessary. Here are a few issues you will have to deal with:
It hides the information about precision. In the case of a Distance, what can a distance be? Is it an integer, a float, or a double?
You will have problems using standard libraries. For example, can you use max(distance1, distance2)? How about sorting distances? You will have to create a compare function explicitly. It also depends on how you define your type. If it's a typedef of a primitive type, you may not need to create a new compare function or max function, but it will still be confusing. If your Distance is a class or a struct, then you will have to overload all the operators explicitly: + - = * and so on.
Since you don't know if it's a floating-point type or an integer, you don't know if you can safely use == to compare two distances. They may be floating-point values, and if they are computed differently they may end up with a different result than in theory, due to precision issues.
The number of files to maintain is going to be bigger, and the build process will be unnecessarily longer.
I would create new types if they don't make sense as primitives at all, and you do want to overload all the operators or not allow some. I'm struggling to find a good example, but AN example can be "a binary number" so if you define a BinaryNumber as a class/struct instead of using it as an integer that would make sense since if you had a int binaryNumber1=1, binaryNumber2=1; and somewhere along the process you do binaryNumber1+binaryNumber2 you would expect the result to be 10 instead of 2, right? So you would define a BinaryNumber class/struct and overload the operator + - * / etc.
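To make the trade-off concrete, here is roughly what a small class-based type costs in code. It is a sketch only: the Distance class, its metres unit and the farther helper are hypothetical, and a real project would likely need more operators than shown.

```cpp
#include <algorithm>

// Hypothetical strong type: a length stored in metres. The explicit
// constructor is what stops a bare double being passed by accident.
class Distance {
public:
    constexpr explicit Distance(double metres) : metres_(metres) {}
    constexpr double metres() const { return metres_; }

private:
    double metres_;
};

// Only the operations that make sense for a length are provided.
constexpr Distance operator+(Distance a, Distance b) { return Distance(a.metres() + b.metres()); }
constexpr bool operator<(Distance a, Distance b) { return a.metres() < b.metres(); }
constexpr bool operator==(Distance a, Distance b) { return a.metres() == b.metres(); }

// With operator< defined, standard algorithms such as std::max work unchanged.
inline Distance farther(Distance a, Distance b) { return std::max(a, b); }
```

The precision question raised above is answered by the class itself (here a double), at the price of one accessor call wherever the raw number is needed.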

C++ test to verify equality operator is kept consistent with struct over time

I voted up @TomalakGeretkal for a good note about by-contract; I haven't accepted an answer, as my question is how to programmatically check the equals function.
I have a POD struct & an equality operator, a (very) small part of a system with >100 engineers.
Over time I expect the struct to be modified (members added/removed/reordered) and I want to write a test to verify that the equality op is testing every member of the struct (eg is kept up to date as the struct changes).
As Tomalak pointed out - comments & "by contract" is often the best/only way to enforce this; however in my situation I expect issues and want to explore whether there are any ways to proactively catch (at least many) of the modifications.
I'm not coming up with a satisfactory answer - this is the best I've thought of:
- create two instances of the struct (x, y) and fill each with identical non-zero data
- check x==y
- modify x "byte by byte":
  - take ptr to be (unsigned char*)&x
  - iterate over ptr (for sizeof(x) bytes):
    - increment the current byte
    - check !(x==y)
    - decrement the current byte
    - check x==y
The test passes if the equality operator caught every byte (NOTE: there is a caveat to this - not all bytes are used in the compiler's representation of x, therefore the test would have to 'skip' these bytes - e.g. hard-code which bytes to ignore)
My proposed test has significant problems: (at least) the 'don't care' bytes, and the fact that incrementing one byte of the types in x may not result in a valid value for the variable at that memory location.
Any better solutions?
(This shouldn't matter, but I'm using VS2008, rtti is off, googletest suite)
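The byte-by-byte procedure listed above could look roughly like this. It is a sketch under a strong assumption: the struct has no padding bytes and every byte pattern is a valid value, which holds for the hypothetical Sample below but not for structs in general (the caveats in the question still apply). The names Sample and count_unchecked_bytes are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct with no padding bytes (three 4-byte members).
struct Sample {
    std::int32_t a;
    std::int32_t b;
    std::int32_t c;
};

// Deliberately stale operator==: member c was added later and forgotten.
inline bool operator==(const Sample& lhs, const Sample& rhs) {
    return lhs.a == rhs.a && lhs.b == rhs.b;
}

// Changes each byte of a copy of `x` in turn and counts the bytes whose
// change operator== fails to notice. Zero means every byte is compared.
template <typename T>
std::size_t count_unchecked_bytes(T x, const T& y) {
    assert(x == y);  // precondition: start from equal objects
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&x);
    std::size_t unchecked = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        ++bytes[i];               // perturb one byte
        if (x == y) ++unchecked;  // still "equal": this byte is ignored
        --bytes[i];               // restore it
    }
    return unchecked;
}
```

A unit test would then require count_unchecked_bytes(s, s) to be zero; with the stale operator above it reports the four bytes of c instead.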
Though tempting to make code 'fool-proof' with self-checks like this, it's my experience that keeping the self-checks themselves fool-proof is, well, a fool's errand.
Keep it simple and localise the effect of any changes. Write a comment in the struct definition making it clear that the equality operator must also be updated if the struct is; then, if this fails, it's just the programmer's fault.
I know that this will not seem optimal to you as it leaves the potential for user error in the future, but in reality you can't get around this (at least without making your code horrendously complicated), and often it's most practical just not to bother.
I agree with (and upvoted) Tomalak's answer. It's unlikely that you'll find a foolproof solution. Nonetheless, one simple semi-automated approach could be to validate the expected size within the equality operator:
bool MyStruct::operator==(const MyStruct &rhs) const
{
assert(sizeof(MyStruct) == 42); // reminder to update as new members added
// actual functionality here ...
}
This way, if any new members are added, the assert will fire until someone updates the equality operator. This isn't foolproof, of course. (Member vars might be replaced with something of same size, etc.) Nonetheless, it's a relatively simple (one line assert) that has a good shot of detecting the error case.
I'm sure I'm going to get downvoted for this but...
How about a template equality function that takes a reference to an int parameter, and the two objects being tested. The equality function will return bool, but will increment the size reference (int) by the sizeof(T).
Then have a large test function that calls the template for each object and sums the total size --> compare this sum with the sizeof the object. The existence of virtual functions/inheritance, etc could kill this idea.
it's actually a difficult problem to solve correctly in a self-test.
the easiest solution i can think of is to take a few template functions which operate on multiple types, perform the necessary conversions, promotions, and comparisons, then verify the result in an external unit test. when a breaking change is introduced, at least you'll know.
some of these challenges are more easily maintained/verified using approaches such as composition, rather than extension/subclassing.
Agree with Tomalak and Eric. I have used this for very similar problems.
assert is compiled out when NDEBUG is defined (as it usually is in release builds), so potentially you can release code that is wrong. These tests will also not always work reliably. If the structure contains bit fields, or items are inserted that take up slack space caused by the compiler aligning to word boundaries, the size won't change. For this reason they offer limited value, e.g.
struct MyStruct {
    char a;
    ulong l;
};
changed to
struct MyStruct {
    char a;
    char b;
    ulong l;
};
Both structures are 8 bytes (on 32bit Linux x86)
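The effect can be checked directly. The sketch below uses std::uint32_t as a stand-in for ulong so the member size is explicit, and it assumes an ABI that aligns std::uint32_t to 4 bytes (the common case, but not something the standard guarantees); Before, After and size_guard_detects_change are hypothetical names.

```cpp
#include <cstdint>

// Stand-ins for the two versions of the struct shown above.
struct Before {
    char a;
    std::uint32_t l;
};

struct After {
    char a;
    char b;  // new member lands in what used to be padding
    std::uint32_t l;
};

// A guard based on sizeof only notices the change if the size differs.
// With 4-byte alignment for std::uint32_t both structs occupy 8 bytes,
// so this evaluates to false and the new member goes undetected.
constexpr bool size_guard_detects_change = sizeof(Before) != sizeof(After);
```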

Template Metaprogramming - I still don't get it :(

I have a problem... I don't understand template metaprogramming.
The problem is, that I’ve read a lot about it, but it still doesn’t make much sense to me.
Fact nr. 1: Template metaprogramming is faster
template <int N>
struct Factorial
{
enum { value = N * Factorial<N - 1>::value };
};
template <>
struct Factorial<0>
{
enum { value = 1 };
};
// Factorial<4>::value == 24
// Factorial<0>::value == 1
void foo()
{
int x = Factorial<4>::value; // == 24
int y = Factorial<0>::value; // == 1
}
So this metaprogram is faster ... because of the constant literal.
BUT: Where in the real world do we have constant literals? Most programs I use react on user input.
Fact nr. 2: Template metaprogramming can accomplish better maintainability.
Yeah, the factorial example may be maintainable, but when it comes to complex functions, I and most other C++ programmers can't read them.
Also, the debugging options are very poor (or at least I don't know how to debug).
When does template metaprogramming make sense?
Just as factorial is not a realistic example of recursion in non-functional languages, neither is it a realistic example of template metaprogramming. It's just the standard example people reach for when they want to show you recursion.
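For comparison, since C++11 the same compile-time value can also be produced by a constexpr function, which reads like ordinary code and still works for run-time input. A minimal sketch (the names factorial and factorial_of_input are just for illustration):

```cpp
// The template recursion from the question, written as a constexpr function.
constexpr unsigned long long factorial(unsigned n) {
    return n == 0 ? 1ULL : n * factorial(n - 1);
}

// Evaluated by the compiler, just like Factorial<4>::value.
static_assert(factorial(4) == 24, "4! is 24");
static_assert(factorial(0) == 1, "0! is 1");

// The same function also accepts a value that is only known at run time,
// for example one derived from user input.
inline unsigned long long factorial_of_input(unsigned n) { return factorial(n); }
```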
In writing templates for realistic purposes, such as in everyday libraries, often the template has to adapt what it does depending on the type parameters it is instantiated with. This can get quite complex, as the template effectively chooses what code to generate, conditionally. This is what template metaprogramming is; if the template has to loop (via recursion) and choose between alternatives, it is effectively like a small program that executes during compilation to generate the right code.
Here's a really nice tutorial from the boost documentation pages (actually excerpted from a brilliant book, well worth reading).
http://www.boost.org/doc/libs/1_39_0/libs/mpl/doc/tutorial/representing-dimensions.html
I use template metaprogramming for SSE swizzling operators to optimize shuffles at compile time.
SSE swizzles ('shuffles') can only be masked with a byte literal (immediate value), so we created a 'mask merger' template class that merges masks at compile time when multiple shuffles occur:
template <unsigned target, unsigned mask>
struct _mask_merger
{
enum
{
ROW0 = ((target >> (((mask >> 0) & 3) << 1)) & 3) << 0,
ROW1 = ((target >> (((mask >> 2) & 3) << 1)) & 3) << 2,
ROW2 = ((target >> (((mask >> 4) & 3) << 1)) & 3) << 4,
ROW3 = ((target >> (((mask >> 6) & 3) << 1)) & 3) << 6,
MASK = ROW0 | ROW1 | ROW2 | ROW3,
};
};
This works and produces remarkable code without generated code overhead and little extra compile time.
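A few compile-time checks show what the merger computes. This sketch repeats the template under the name mask_merger (without the leading underscore, since such names are reserved at global scope) and assumes the usual SSE encoding of two bits per lane with lane 0 in the lowest bits; kIdentity and kReverse are hypothetical names for the masks 0xE4 and 0x1B.

```cpp
// Same arithmetic as the template above, renamed for the example.
template <unsigned target, unsigned mask>
struct mask_merger {
    enum {
        ROW0 = ((target >> (((mask >> 0) & 3) << 1)) & 3) << 0,
        ROW1 = ((target >> (((mask >> 2) & 3) << 1)) & 3) << 2,
        ROW2 = ((target >> (((mask >> 4) & 3) << 1)) & 3) << 4,
        ROW3 = ((target >> (((mask >> 6) & 3) << 1)) & 3) << 6,
        MASK = ROW0 | ROW1 | ROW2 | ROW3
    };
};

constexpr unsigned kIdentity = 0xE4u;  // selects lanes 3,2,1,0: leaves the order unchanged
constexpr unsigned kReverse  = 0x1Bu;  // selects lanes 0,1,2,3: reverses the order

// Merging with the identity mask changes nothing, in either position.
static_assert(static_cast<unsigned>(mask_merger<kReverse, kIdentity>::MASK) == kReverse, "identity on the right");
static_assert(static_cast<unsigned>(mask_merger<kIdentity, kReverse>::MASK) == kReverse, "identity on the left");

// Two reversals collapse into a single identity shuffle at compile time.
static_assert(static_cast<unsigned>(mask_merger<kReverse, kReverse>::MASK) == kIdentity, "reverse twice");
```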
So this metaprogram is faster ... because of the constant literal.
BUT: Where in the real world do we have constant literals?
Most programs I use react on user input.
That's why it's hardly ever used for values. Usually, it is used on types: using types to compute and generate new types.
There are many real-world uses, some of which you're already familiar with even if you don't realize it.
One of my favorite examples is that of iterators. They're mostly designed just with generic programming, yes, but template metaprogramming is useful in one place in particular:
To patch up pointers so they can be used as iterators. An iterator must expose a handful of typedef's, such as value_type. Pointers don't do that.
So code such as the following (basically identical to what you find in Boost.Iterator)
template <typename T>
struct value_type {
typedef typename T::value_type type;
};
template <typename T>
struct value_type<T*> {
typedef T type;
};
is a very simple template metaprogram, but which is very useful. It lets you get the value type of any iterator type T, whether it is a pointer or a class, simply by value_type<T>::type.
And I think the above has some very clear benefits when it comes to maintainability. Your algorithm operating on iterators only has to be implemented once. Without this trick, you'd have to make one implementation for pointers, and another for "proper" class-based iterators.
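As a usage illustration, a single generic function can rely on the trait for both kinds of iterator. The sum function below is a hypothetical example, not part of Boost; the standard library offers the same facility as std::iterator_traits.

```cpp
#include <vector>

// The trait from above: class-type iterators expose value_type themselves...
template <typename T>
struct value_type {
    typedef typename T::value_type type;
};

// ...and raw pointers are patched up by the partial specialization.
template <typename T>
struct value_type<T*> {
    typedef T type;
};

// One implementation for raw pointers and class-type iterators alike.
// (Assumes the element type is non-const and supports += and value-initialization.)
template <typename Iter>
typename value_type<Iter>::type sum(Iter first, Iter last) {
    typename value_type<Iter>::type total = typename value_type<Iter>::type();
    for (; first != last; ++first) {
        total += *first;
    }
    return total;
}
```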
Tricks like boost::enable_if can be very valuable too. You have an overload of a function which should be enabled for a specific set of types only. Rather than defining an overload for each type, you can use metaprogramming to specify the condition and pass it to enable_if.
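A small sketch of that idea, using std::enable_if from C++11 rather than the Boost version (std::enable_if takes a bool directly, like boost::enable_if_c). The function name same and the 1e-9 tolerance are assumptions chosen for the example.

```cpp
#include <type_traits>

// Enabled only for integral types: exact comparison is safe.
template <typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, bool>::type
same(T a, T b) {
    return a == b;
}

// Enabled only for floating-point types: compare within a tolerance.
template <typename T>
constexpr typename std::enable_if<std::is_floating_point<T>::value, bool>::type
same(T a, T b) {
    return (a > b ? a - b : b - a) < static_cast<T>(1e-9);
}

static_assert(same(3, 3), "integral overload");
static_assert(same(0.1 + 0.2, 0.3), "floating-point overload tolerates rounding");
```

Calling same with a type that is neither integral nor floating-point finds no viable overload and fails to compile, which is the point of specifying the condition once.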
Earwicker already mentioned another good example, a framework for expressing physical units and dimensions. It allows you to express computations with physical units attached, and enforces the result type. Multiplying meters by meters yields a number of square meters. Template metaprogramming can be used to automatically produce the right type.
But most of the time, template metaprogramming is used (and useful) in small, isolated cases, basically to smooth out bumps and exceptional cases, to make a set of types look and behave uniformly, allowing you to use generic programming more efficiently.
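A stripped-down sketch of such a units framework, tracking only length and time exponents. It is far simpler than the Boost tutorial linked earlier, and every name in it (quantity, length, duration, velocity, area) is made up for illustration.

```cpp
#include <type_traits>

// Hypothetical quantity tagged with exponents of length (L) and time (T).
template <int L, int T>
struct quantity {
    double value;
};

typedef quantity<1, 0>  length;    // m
typedef quantity<0, 1>  duration;  // s
typedef quantity<1, -1> velocity;  // m/s
typedef quantity<2, 0>  area;      // m^2

// Multiplication adds the exponents, division subtracts them.
template <int L1, int T1, int L2, int T2>
constexpr quantity<L1 + L2, T1 + T2> operator*(quantity<L1, T1> a, quantity<L2, T2> b) {
    return quantity<L1 + L2, T1 + T2>{a.value * b.value};
}

template <int L1, int T1, int L2, int T2>
constexpr quantity<L1 - L2, T1 - T2> operator/(quantity<L1, T1> a, quantity<L2, T2> b) {
    return quantity<L1 - L2, T1 - T2>{a.value / b.value};
}

// Addition exists only for operands of identical dimension.
template <int L, int T>
constexpr quantity<L, T> operator+(quantity<L, T> a, quantity<L, T> b) {
    return quantity<L, T>{a.value + b.value};
}

// The compiler works out the result type: metres times metres is an area.
static_assert(std::is_same<decltype(length{2.0} * length{3.0}), area>::value, "m * m = m^2");
static_assert(std::is_same<decltype(length{1.0} / duration{1.0}), velocity>::value, "m / s");
```

Adding a length to a duration does not compile, because no operator+ matches two different quantity types.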
Seconding the recommendation for Alexandrescu's Modern C++ Design.
Templates really shine when you're writing a library that has pieces which can be assembled combinatorically in a "choose a Foo, a Bar and a Baz" approach, and you expect users to make use of these pieces in some form that is fixed at compile time. For example, I coauthored a data mining library that uses template metaprogramming to let the programmer decide what DecisionType to use (classification, ranking or regression), what InputType to expect (floats, ints, enumerated values, whatever), and what KernelMethod to use (it's a data mining thing). We then implemented several different classes for each category, such that there were several dozen possible combinations.
Implementing 60 separate classes to do this would have involved a lot of annoying, hard-to-maintain code duplication. Template metaprogramming meant that we could implement each concept as a code unit, and give the programmer a simple interface for instantiating combinations of these concepts at compile-time.
Dimensional analysis is also an excellent example, but other people have covered that.
I also once wrote some simple compile-time pseudo-random number generators just to mess with people's heads, but that doesn't really count IMO.
The factorial example is about as useful for real-world TMP as "Hello, world!" is for common programming: It's there to show you a few useful techniques (recursion instead of iteration, "else-if-then" etc.) in a very simple, relatively easy to understand example that doesn't have much relevance for your every-day coding. (When was the last time you needed to write a program that emitted "Hello, world"?)
TMP is about executing algorithms at compile-time and this implies a few obvious advantages:
Since these algorithms failing means your code doesn't compile, failing algorithms never make it to your customer and thus can't fail at the customer's. For me, during the last decade this was the single-most important advantage that led me to introduce TMP into the code of the companies I worked for.
Since the result of executing template-meta programs is ordinary code that's then compiled by the compiler, all advantages of code generating algorithms (reduced redundancy etc.) apply.
Of course, since they are executed at compile-time, these algorithms won't need any run-time and will thus run faster. TMP is mostly about compile-time computing with a few, mostly small, inlined functions sprinkled in between, so compilers have ample opportunities to optimize the resulting code.
Of course, there are disadvantages, too:
The error messages can be horrible.
There's no debugging.
The code is often hard to read.
As always, you'll just have to weigh the advantages against the disadvantages in every case.
As for a more useful example: Once you have grasped type lists and basic compile-time algorithms operating on them, you might understand the following:
typedef
type_list_generator< signed char
, signed short
, signed int
, signed long
>::result_type
signed_int_type_list;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<8>
>::result_type
int8_t;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<16>
>::result_type
int16_t;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<32>
>::result_type
int32_t;
This is (slightly simplified) actual code I wrote a few weeks ago. It will pick the appropriate types from a type list, replacing the #ifdef orgies common in portable code. It doesn't need maintenance, works without adaption on every platform your code might need to get ported to, and emits a compile error if the current platform doesn't have the right type.
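The author's type_list_generator and type_list_find_if are not shown, so the following is not that code. It is a sketch of the same idea with C++11 variadic templates: pick the first candidate type with an exact bit width, and fail to compile if none exists. The names type_is, first_with_bits, int8_type and int32_type are hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

template <typename T>
struct type_is {
    typedef T type;
};

// Primary template is declared but never defined: running off the end of the
// list (no candidate has the requested width) is a compile error.
template <std::size_t Bits, typename... Candidates>
struct first_with_bits;

// Either stop at T or keep searching the rest. Inheriting from the selected
// branch means the unselected branch is never instantiated.
template <std::size_t Bits, typename T, typename... Rest>
struct first_with_bits<Bits, T, Rest...>
    : std::conditional<sizeof(T) * CHAR_BIT == Bits,
                       type_is<T>,
                       first_with_bits<Bits, Rest...>>::type {};

typedef first_with_bits<8, signed char, short, int, long>::type int8_type;
typedef first_with_bits<32, signed char, short, int, long>::type int32_type;

// Whatever type was picked, it has exactly the requested width.
static_assert(sizeof(int8_type) * CHAR_BIT == 8, "picked an 8-bit type");
static_assert(sizeof(int32_type) * CHAR_BIT == 32, "picked a 32-bit type");
```

On a platform where none of the candidates is 32 bits wide, the second typedef would refuse to compile rather than silently pick a wrong type.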
Another example is this:
template< typename TFunc, typename TFwdIter >
typename func_traits<TFunc>::result_t callFunc(TFunc f, TFwdIter begin, TFwdIter end);
Given a function f and a sequence of strings, this will dissect the function's signature, convert the strings from the sequence into the right types, and call the function with these objects. And it's mostly TMP inside.
Here's one trivial example, a binary constant converter, from a previous question here on StackOverflow:
C++ binary constant/literal
template< unsigned long long N >
struct binary
{
enum { value = (N % 10) + 2 * binary< N / 10 > :: value } ;
};
template<>
struct binary< 0 >
{
enum { value = 0 } ;
};
TMP does not necessarily mean faster or more maintainable code. I used the boost spirit library to implement a simple SQL expression parser that builds an evaluation tree structure. While the development time was reduced since I had some familiarity with TMP and lambda, the learning curve is a brick wall for "C with classes" developers, and the performance is not as good as a traditional LEX/YACC.
I see Template Meta Programming as just another tool in my tool-belt. When it works for you use it, if it doesn't, use another tool.
Scott Meyers has been working on enforcing code constraints using TMP.
It's quite a good read:
http://www.artima.com/cppsource/codefeatures.html
In this article he introduces the concept of sets of types (not a new concept, but his work is built on top of it). He then uses TMP to make sure that, no matter what order you specify the members of the set in, two sets made of the same members are equivalent. This requires that he be able to sort and re-order a list of types and compare them dynamically, thus generating compile-time errors when they do not match.
I suggest you read Modern C++ Design by Andrei Alexandrescu - this is probably one of the best books on real-world uses of C++ template metaprogramming; and describes many problems which C++ templates are an excellent solution.
TMP can be used for anything from ensuring dimensional correctness (ensuring that mass cannot be divided by time, but distance can be divided by time and assigned to a velocity variable) to optimizing matrix operations by removing temporary objects and merging loops when many matrices are involved.
'static const' values work as well. And pointers-to-member. And don't forget the world of types (explicit and deduced) as compile-time arguments!
BUT : Where in the real World do we have constant Literals ?
Suppose you have some code that has to run as fast as possible. It contains the critical inner loop of your CPU-bound computation in fact. You'd be willing to increase the size of your executable a bit to make it faster. It looks like:
double innerLoop(const bool b, const vector<double> & v)
{
    // some logic involving b
    for (vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
    {
        // significant logic involving b
    }
    // more logic involving b
    return ....
}
The details aren't important, but the use of 'b' is pervasive in the implementation.
Now, with templates, you can refactor it a bit:
template <bool b> double innerLoop_templ_B(const vector<double> & v) { ... same as before ... }
double innerLoop(const bool b, const vector<double> & v)
{ return b ? innerLoop_templ_B<true>(v) : innerLoop_templ_B<false>(v); }
Any time you have a relatively small, discrete, set of values for a parameter you can automatically instantiate separate versions for them.
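A complete, compilable version of this pattern might look as follows. The loop body is invented for the example (it doubles each element when the flag is set), and the names weighted_sum_impl and weighted_sum are hypothetical.

```cpp
#include <vector>

// Hypothetical kernel: `Scaled` is a compile-time constant inside the loop,
// so each instantiation contains only the branch it needs.
template <bool Scaled>
double weighted_sum_impl(const std::vector<double>& v) {
    double total = 0.0;
    for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it) {
        total += Scaled ? 2.0 * *it : *it;
    }
    return total;
}

// The run-time flag is tested once, outside the loop, and selects one of the
// two instantiations generated from the same source.
inline double weighted_sum(bool scaled, const std::vector<double>& v) {
    return scaled ? weighted_sum_impl<true>(v) : weighted_sum_impl<false>(v);
}
```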
Consider the possibilities when 'b' is based on CPU detection. You can run a differently-optimized set of code depending on run-time detection. All from the same source code, or you can specialize some functions for some sets of values.
As a concrete example, I once saw some code that needed to merge some integer coordinates. Coordinate system 'a' was one of two resolutions (known at compile time), and coordinate system 'b' was one of two different resolutions (also known at compile time). The target coordinate system needed to be the least common multiple of the two source coordinate systems. A library was used to compute the LCM at compile time and instantiate code for the different possibilities.
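The library used in that code is not named, so the following is only a guess at the shape of such a solution, written with C++11 constexpr functions. It assumes a resolution means "integer steps per unit", and the names gcd_of, lcm_of and merged_grid are hypothetical.

```cpp
// Euclid's algorithm, usable in constant expressions.
constexpr unsigned gcd_of(unsigned a, unsigned b) {
    return b == 0 ? a : gcd_of(b, a % b);
}

constexpr unsigned lcm_of(unsigned a, unsigned b) {
    return a / gcd_of(a, b) * b;
}

// Target grid for merging coordinates from two source resolutions that are
// both known at compile time.
template <unsigned ResA, unsigned ResB>
struct merged_grid {
    static constexpr unsigned resolution = lcm_of(ResA, ResB);

    // Rescale a coordinate from either source system onto the target grid.
    static constexpr unsigned from_a(unsigned c) { return c * (resolution / ResA); }
    static constexpr unsigned from_b(unsigned c) { return c * (resolution / ResB); }
};

// One instantiation per combination of resolutions; the scale factors are constants.
static_assert(merged_grid<4, 6>::resolution == 12, "lcm(4, 6)");
static_assert(merged_grid<4, 6>::from_a(3) == 9, "3/4 == 9/12");
static_assert(merged_grid<4, 6>::from_b(3) == 6, "3/6 == 6/12");
```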