While working on an embedded project, I encountered a function that is called thousands of times over the application's lifetime, often in loops, dozens of times per second. I wondered whether I could reduce its cost, and I found that most of its parameters are known at compile time.
Let me illustrate it with an example.
The original hpp/cpp files can be approximated like this:
original.hpp:
void example(bool arg1, bool arg2, const char* data);
original.cpp:
#include "ex1.hpp"
#include <iostream>
void example(bool arg1, bool arg2, const char* data)
{
if (arg1 && arg2)
{
std::cout << "Both true " << data << std::endl;
}
else if (!arg1 && arg2)
{
std::cout << "False and true " << data << std::endl;
}
else if (arg1 && !arg2)
{
std::cout << "True and false " << data << std::endl;
}
else
{
std::cout << "Both false " << data << std::endl;
}
}
Let's assume that every single time the function is called, arg1 and arg2 are known at compile time. The data argument isn't, and for a variety of reasons its processing cannot be moved into the header file.
However, all those if statements can be handled by the compiler with a little bit of template magic:
magic.hpp:
template<bool arg1, bool arg2>
void example(const char* data);
magic.cpp:
#include "ex1.hpp"
#include <iostream>
template<bool arg1, bool arg2>
struct Processor;
template<>
struct Processor<true, true>
{
static void process(const char* data)
{
std::cout << "Both true " << data << std::endl;
}
};
template<>
struct Processor<false, true>
{
static void process(const char* data)
{
std::cout << "False and true " << data << std::endl;
}
};
template<>
struct Processor<true, false>
{
static void process(const char* data)
{
std::cout << "True and false " << data << std::endl;
}
};
template<>
struct Processor<false, false>
{
static void process(const char* data)
{
std::cout << "Both false " << data << std::endl;
}
};
template<bool arg1, bool arg2>
void example(const char* data)
{
Processor<arg1, arg2>::process(data);
}
template void example<true, true>(const char*);
template void example<false, true>(const char*);
template void example<true, false>(const char*);
template void example<false, false>(const char*);
As you can see, even in this tiny example the cpp file got significantly bigger than the original. But I did remove a few assembler instructions!
Now, in my real-life case things are a bit more complex, because instead of two bool arguments I have enums and structures. Long story short, the arguments give me about one thousand combinations, so I have that many instances of the line template void example<something>(const char*);
Of course I do not generate them manually, but with macros; yet the cpp file still gets humongous compared to the original, and the object file is even worse.
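For illustration, a simplified sketch of what those instantiation-generating macros might look like (my real code deals with enums and structs, so treat this as an approximation):
// Illustrative only; the real macros cover roughly a thousand combinations.
#define INSTANTIATE_EXAMPLE(A, B) \
    template void example<A, B>(const char*);
#define INSTANTIATE_EXAMPLE_BOTH(A) \
    INSTANTIATE_EXAMPLE(A, true) \
    INSTANTIATE_EXAMPLE(A, false)
INSTANTIATE_EXAMPLE_BOTH(true)   // expands to example<true, true> and example<true, false>
INSTANTIATE_EXAMPLE_BOTH(false)  // expands to example<false, true> and example<false, false>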
All this in the name of removing several if statements and one switch statement.
My question is: is size the only problem with the template-magic approach? I wonder if there is some hidden cost to using so many versions of the same function. Did I really save some resources, or just the opposite?
The problem with an increased binary size is almost never the storage of the file itself - the problem is that more code means a lower % of the program instructions are available in cache at any point, leading to cache misses. If you're calling the same instantiation in a tight loop, then having it do less work is great. But if you're constantly bouncing around between different template instantiations, then the cost of going to main memory to load instructions may be far higher than what you save by removing some instructions from inside the function.
This kind of thing can be VERY difficult to predict, though. The way to find the sweet spot in this (and any) type of optimization is to measure. It is also likely to change across platforms - especially in an embedded world.
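For what it's worth, a minimal measurement sketch (not a rigorous benchmark; it reuses the example<...> template and magic.hpp from the question) might look like this:
#include "magic.hpp"   // declares template<bool, bool> void example(const char*);
#include <chrono>
#include <iostream>

// Times an arbitrary callable in microseconds.
template <typename F>
long long micros(F&& body) {
    auto start = std::chrono::steady_clock::now();
    body();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main() {
    // The call pattern should resemble the real application, since
    // instruction-cache effects only show up in context.
    auto t = micros([] {
        for (int i = 0; i < 10000; ++i)
            example<true, false>("payload");
    });
    std::cout << t << " us\n";
}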
Related
So I've found a variety of articles and posts saying that there is no way to convert a typename to a string, but I haven't found one about the opposite. I have a function template with specializations:
template <typename T>
void foo(T sth) {}
template <>
void foo<int>(int sth) {}
...
and I'm reading from a file constructed like this:
int 20
double 12.492
string word
Is there a way to call the correct specialization of foo() depending on the content of the file?
Yes there is, but it requires manual code and that you know all the types that are going to appear in the file. That's because templates are compile time constructs and they cannot be instantiated at runtime.
You can always use the preprocessor or other tricks to try and reduce the boilerplate if you want to.
#include <any>     // std::any, std::any_cast
#include <string>  // std::string

void callFoo(std::string type, std::any arg) {
if (type == "int")
foo<int>(std::any_cast<int>(arg));
else if (type == "double")
foo<double>(std::any_cast<double>(arg));
else if (type == "string")
foo<std::string>(std::any_cast<std::string>(arg));
}
Of course, this requires that you pass in the correct type (no implicit conversions!). I don't see any way to avoid that.
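A short usage sketch (assuming foo<int>, foo<double> and foo<std::string> are instantiable), which also shows the no-implicit-conversion caveat:
callFoo("int", 20);                      // fine: the any holds an int
callFoo("double", 12.492);               // fine: the any holds a double
callFoo("string", std::string("word"));  // fine: the any holds a std::string
// callFoo("string", "word");            // throws std::bad_any_cast: the any holds a const char*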
To be honest, I am not sure I understand your question. As I interpret it, you do not need any kind of runtime dispatcher, nor do you need to compute a string containing the type name. You simply write a general function template that calls a tagged overload of foo() which disambiguates according to the type: the specialized foo() receives a second tag parameter (the_type<T>) used for disambiguation.
Here is a full working demo:
#include <string>
#include <iostream>
using namespace std;
template<class T> struct the_type { using type = T; };
template <typename T>
void foo(const T par)
{
foo(par, the_type<T>());
}
void foo(int par, the_type<int>)
{
cout << "int " << par << endl;
}
void foo(double par, the_type<double>)
{
cout << "double " << par << endl;
}
void foo(const string & par, the_type<string>)
{
cout << "string " << par << endl;
}
void foo(const char * par, the_type<const char*>)
{
cout << "char* " << par << endl;
}
int main()
{
foo(20);
foo(12.492);
foo("word");
foo(string("word"));
}
whose output is:
int 20
double 12.492
char* word
string word
If you need another specialization, you simply define it. In some cases, you will have to pass the type explicitly as the template parameter.
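For example (my addition, not part of the demo above), forcing the string overload for a string literal:
foo<string>("word");   // dispatches to foo(const string&, the_type<string>) rather than the const char* overload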
You could use macro manipulation to avoid repetitive code. For example, given that the foo() structure is the same, you could encapsulate it in a macro. Something like this:
# define GENFOO(type_name) \
void foo(type_name par, the_type<type_name>) \
{ \
cout << #type_name " " << par << endl; \
}
GENFOO(int);
GENFOO(double);
GENFOO(string)
However, I would say that in practice each specialized version of foo() would not be so similar.
Almost every OOP programmer has been exposed to the concept of inversion of control. In C++, we can implement that principle with dynamic callbacks (i.e. functors such as lambdas, or function pointers). But if we know at compile time what procedure we are to inject into the driver, I believe there is, in theory, a way to eliminate the overhead of passing and invoking the callback by composing the callbacks and the driver/signal/whatever function into an "unrolled procedure". Here is an example.
For a GUI program, we have logic for window 1) setup, 2) loop, and 3) termination. We can inject code 1) after window setup, 2) in each render loop, and 3) before termination. A procedural approach is to write it in this manner:
// Snippet 1:
init_window();
init_input_handler();
init_canvas();
init_socket();
while (!window_should_close()) {
update_window();
handle_input();
draw_on_canvas();
send_through_socket();
}
drop_input_handler();
drop_canvas();
drop_socket();
terminate_window();
We OOP programmers pride ourselves on decoupling and proper abstraction. Instead, we write this:
// Snippet 2:
init_window();
on_window_init_signal.send();
while (!window_should_close()) {
update_window();
on_render_signal.send();
}
on_exit_signal.send();
terminate_window();
But this brings unwanted overhead, as said above. My question is: how can we use C++ metaprogramming mechanisms to achieve zero-overhead inversion of control, so that code in a form similar to snippet 2 can be transformed into snippet 1 statically (i.e. at compile time)?
EDIT: I can think of the loop optimizations widely found in optimizers. Maybe this is a generalized version of that issue.
"Zero Overhead" & "But if we know at compile time what procedure we are to inject into the driver, " is possible.
You can use a template class to pass the functions to call like that:
#include <iostream>

struct SomeInjects
{
static void AtInit() { std::cout << "AtInit from SomeInjects" << std::endl; }
static void AtHandleInput() { std::cout << "AtHandleInput from SomeInjects" << std::endl; }
static void AtDraw() { std::cout << "AtDraw from SomeInjects" << std::endl; }
};
struct OtherInject
{
static void AtInit() { std::cout << "AtInit from OtherInject" << std::endl; }
static void AtHandleInput() { std::cout << "AtHandleInput from OtherInject" << std::endl; }
static void AtDraw() { std::cout << "AtDraw from OtherInject" << std::endl; }
};
template < typename Mixin >
struct Win
{
void Init()
{
Mixin::AtInit();
}
void HandleInput()
{
Mixin::AtHandleInput();
}
void Draw()
{
Mixin::AtDraw();
}
};
int main()
{
Win<SomeInjects> wsi;
wsi.Init();
wsi.HandleInput();
wsi.Draw();
Win<OtherInject> wso;
wso.Init();
wso.HandleInput();
wso.Draw();
}
But this has the drawback that it needs static functions.
A more elaborate attempt:
struct SomeInjects
{
void AtInit() { std::cout << "AtInit from SomeInjects" << std::endl; }
void AtHandleInput() { std::cout << "AtHandleInput from SomeInjects" << std::endl; }
void AtDraw() { std::cout << "AtDraw from SomeInjects" << std::endl; }
};
struct OtherInject
{
void AtInit() { std::cout << "AtInit from OtherInject" << std::endl; }
void AtHandleInput() { std::cout << "AtHandleInput from OtherInject" << std::endl; }
void AtDraw() { std::cout << "AtDraw from OtherInject" << std::endl; }
};
template < typename Mixin >
struct Win: Mixin
{
void Init()
{
this->AtInit();
}
void HandleInput()
{
this->AtHandleInput();
}
void Draw()
{
this->AtDraw();
}
};
int main()
{
Win<SomeInjects> wsi;
wsi.Init();
wsi.HandleInput();
wsi.Draw();
Win<OtherInject> wso;
wso.Init();
wso.HandleInput();
wso.Draw();
}
The last technique is called a mixin.
Whether your compiler inlines each and every call depends on many things, but typically all calls are inlined if the called functions are not too big.
But if you need runtime-changeable callbacks, you have to use some kind of callable representation. That can be function pointers or something like std::function; the latter almost always generates some minor overhead.
But remember: a simple call through a pointer is typically not the speed problem at all. More important is that in such cases constants cannot be propagated and the code cannot be inlined, so overall optimization is no longer possible. If runtime flexibility is needed, it will have some cost. As always: measure before you optimize!
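For completeness, another zero-overhead shape (only a sketch, reusing the function names from the question's snippet 1) is to pass the hooks as callables whose types are deduced as template parameters, e.g. lambdas; since their concrete types are known at compile time, the compiler can inline everything:
// Sketch: the hooks are ordinary callables whose types are known at compile time.
template <typename OnInit, typename OnRender, typename OnExit>
void run_window(OnInit on_init, OnRender on_render, OnExit on_exit)
{
    init_window();
    on_init();                        // injected after window setup
    while (!window_should_close()) {
        update_window();
        on_render();                  // injected into each render loop
    }
    on_exit();                        // injected before termination
    terminate_window();
}
// Usage:
// run_window([]{ init_input_handler(); init_canvas(); init_socket(); },
//            []{ handle_input(); draw_on_canvas(); send_through_socket(); },
//            []{ drop_input_handler(); drop_canvas(); drop_socket(); });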
My program gets a couple of Boolean variables from the user, and their values won't change afterwards. Each Boolean variable enables a part of code. Something like this:
#include <iostream>
void callback_function(bool task_1, bool task_2, bool task_3) {
if (task_1) {
std::cout << "Running task 1" << std::endl;
}
if (task_2) {
std::cout << "Running task 2" << std::endl;
}
if (task_3) {
std::cout << "Running task 3" << std::endl;
}
}
int main() {
bool task_1 = true;
bool task_2 = false;
bool task_3 = true;
while (true) {
callback_function(task_1, task_2, task_3);
}
return 0;
}
Now my question is, since the Boolean variables are fixed every time the program calls callback_function(), is there a way to avoid the if statements inside the callback function?
This is one way to avoid the run-time checks (implement a callback function for all permutations of the Boolean variables --- only two cases are shown below):
#include <functional>
#include <iostream>
void callback_function_for_tasks_1_2_3() {
std::cout << "Running task 1" << std::endl;
std::cout << "Running task 2" << std::endl;
std::cout << "Running task 3" << std::endl;
}
void callback_function_for_tasks_1_3() {
std::cout << "Running task 1" << std::endl;
std::cout << "Running task 3" << std::endl;
}
int main() {
bool task_1 = true;
bool task_2 = false;
bool task_3 = true;
std::function<void()> callback_function;
if (task_1 && task_2 && task_3) {
callback_function = callback_function_for_tasks_1_2_3;
} else if (task_1 && !task_2 && task_3) {
callback_function = callback_function_for_tasks_1_3;
}
while (true) {
callback_function();
}
return 0;
}
The problem is that I have to implement 2^n different callback functions if there are n Boolean variables. Is there a better way to accomplish this?
Ensuring that if statements are evaluated at compile time
C++17 introduces if constexpr, which does exactly this:
#include <iostream>

template<bool task_1, bool task_2, bool task_3>
void callback_function() {
if constexpr (task_1) {
std::cout << "Running task 1" << std::endl;
}
if constexpr (task_2) {
std::cout << "Running task 2" << std::endl;
}
if constexpr (task_3) {
std::cout << "Running task 3" << std::endl;
}
}
If you have optimizations enabled, if constexpr isn't strictly necessary. Even if you use a regular if instead of if constexpr, because the bools are now template parameters, the compiler can eliminate the if statements entirely and just run the tasks. If you look at the generated assembly, you'll see that even at -O1 there are no if statements in any of the callback functions.
We can now use callback_function directly as a function pointer, avoiding std::function<void()>:
int main() {
using callback_t = void(*)();
callback_t func = callback_function<true, false, true>;
// Do stuff with func
}
We can also name the bools by assigning them to constexpr variables:
int main() {
using callback_t = void(*)();
constexpr bool do_task1 = true;
constexpr bool do_task2 = false;
constexpr bool do_task3 = true;
callback_t func = callback_function<do_task1, do_task2, do_task3>;
// Do stuff with func
}
Automatically creating a lookup table of all possible callback functions
You mentioned choosing between different callback functions at runtime. We can do this pretty easily with a lookup table, and we can use templates to automatically create a lookup table of all possible callback functions.
The first step is to get a callback function from a particular index:
#include <array>    // std::array
#include <cstddef>  // size_t
#include <utility>  // std::index_sequence, std::make_index_sequence

// void(*)() is ugly to type, so I alias it
using callback_t = void(*)();
// Unpacks the bits
template<size_t index>
constexpr auto getCallbackFromIndex() -> callback_t
{
constexpr bool do_task1 = (index & 4) != 0;
constexpr bool do_task2 = (index & 2) != 0;
constexpr bool do_task3 = (index & 1) != 0;
return callback_function<do_task1, do_task2, do_task3>;
}
Once we can do that, we can write a function to create a lookup table from a bunch of indexes. Our lookup table will just be a std::array.
// Create a std::array based on a list of flags
// See https://en.cppreference.com/w/cpp/utility/integer_sequence
// For more information
template<size_t... Indexes>
constexpr auto getVersionLookup(std::index_sequence<Indexes...>)
-> std::array<callback_t, sizeof...(Indexes)>
{
return {getCallbackFromIndex<Indexes>()...};
}
// Makes a lookup table containing all 8 possible callback functions
constexpr auto callbackLookupTable =
getVersionLookup(std::make_index_sequence<8>());
Here, callbackLookupTable contains all 8 possible callback functions, where callbackLookupTable[i] expands the bits of i to get the callback. For example, if i == 6, then i's bits are 110 in binary, so
callbackLookupTable[6] is callback_function<true, true, false>
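As a quick compile-time sanity check (my addition, assuming C++17):
static_assert(callbackLookupTable[6] == callback_function<true, true, false>,
              "index 6 is binary 110, i.e. tasks 1 and 2 enabled");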
Using the lookup table at runtime
Using the lookup table is really simple. We can get an index from a bunch of bools by bitshifting:
callback_t getCallbackBasedOnTasks(bool task1, bool task2, bool task3) {
// Get the index based on bit shifting
int index = ((int)task1 << 2) + ((int)task2 << 1) + ((int)task3);
// return the correct callback
return callbackLookupTable[index];
}
Example demonstrating how to read in tasks
We can get the bools at runtime now, and just call getCallbackBasedOnTasks to get the correct callback:
int main() {
bool t1, t2, t3;
// Read in bools
std::cin >> t1 >> t2 >> t3;
// Get the callback
callback_t func = getCallbackBasedOnTasks(t1, t2, t3);
// Invoke the callback
func();
}
Leave the code as it is.
Execution time of an "if" compared to writing to std::out is practically zero, so you are arguing over nothing. Well, unless you spend some time measuring the execution time as it is, and with the if's removed according to the values of the three constants, and found that there is a real difference.
At most, you might make the function inline or static, and the compiler will probably realise the arguments are always the same when optimisation is turned on. (My compiler would give a warning that you are using a function without a prototype, which means you should have either put a prototype into a header file, telling the compiler to expect calls from other call sites, or you should have made it static, telling the compiler that it knows all the calls and can use static analysis for optimisations).
And what you think is a constant, might not stay a constant forever. The original code will work. Any new code most likely won't.
Short of JIT compilation, you can't do better than your 2^n functions (and the resulting binary size). You can of course use a template to avoid writing them all out. To prevent the source from scaling exponentially just from selecting the correct implementation, you can write a recursive dispatcher:
// f is the fully templated implementation, e.g. template<bool...> void f();
// Each call to g converts one runtime bool into a compile-time template argument.
template<bool... BB>
auto g() { return f<BB...>; }

template<bool... BB, class... TT>
auto g(bool b, TT... tt)
{
    return b ? g<BB..., true>(tt...) : g<BB..., false>(tt...);
}
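A usage sketch (my addition, assuming f is the fully templated callback, e.g. the callback_function<...> template above):
int main() {
    bool t1 = true, t2 = false, t3 = true;  // could come from user input
    auto callback = g(t1, t2, t3);          // resolves to f<true, false, true>
    callback();
}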