Template parameter as function specifier and compiler optimization - c++

I have found this very useful post and I'd like to clarify something about compiler optimizations. Let's say we have this function (the same as in the original post):
template<int action>
__global__ void kernel()
{
switch(action) {
case 1:
// First code
break;
case 2:
// Second code
break;
}
}
Would the compiler eliminate the unreachable code even if I called the function with a template argument that is unknown at compile time, by creating something like two separate functions? E.g.:
kernel<argv[1][0]>();

Short answer: no.
Templates are instantiated purely at compile time, so you can't use the values in argv, since they are not known at compile time.
Makes me wonder why you did not just give it a try and throw that code at a compiler: it would have told you that template arguments must be compile-time constants.
Update:
Since you told us in the comments that it's not primarily about performance but about readability, I'd recommend using switch/case:
template <char c> void kernel() {
//...
switch(c) { /* ... */ }
}
switch (argv[1][0]) {
case 'a':
kernel<'a'>();
break;
case 'b':
kernel<'b'>();
break;
//...
}
Since the value you have to base the decision on (i.e. argv[1][0]) is only known at runtime, you have to use a runtime decision mechanism. Of those, switch/case is among the fastest, especially if there are more than two cases but not too many, and especially if there are no gaps between the case values (i.e. 'a', 'b', 'c' instead of 1, 55, 2048). The compiler can then produce very fast jump tables.
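The dispatch pattern described above can be sketched in plain C++ (no CUDA launch syntax). The names run and dispatch are hypothetical stand-ins for the kernel and its caller; the point is only that the runtime switch selects among instantiations whose own switch is on a compile-time constant.

```cpp
// Hypothetical stand-in for the kernel: Action is a compile-time constant,
// so an optimizing compiler can keep only the matching branch in each instantiation.
template <int Action>
int run(int x) {
    switch (Action) {
        case 1: return x + 1;  // "First code"
        case 2: return x * 2;  // "Second code"
    }
    return x;  // any other Action leaves x unchanged
}

// Runtime dispatch: the runtime value only chooses which instantiation to call.
int dispatch(char selector, int x) {
    switch (selector) {
        case '1': return run<1>(x);
        case '2': return run<2>(x);
        default:  return x;  // unknown selector: do nothing
    }
}
```

Calling dispatch(argv[1][0], value) would then pick run<1> or run<2> at runtime, while each instantiation itself was generated at compile time.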

Being new to templates, I had to study some essentials first. Finally I came up with a solution to my problem. If I want to call functions with template arguments that depend on command line arguments, I should do it like this:
if(argv[1][0] == '1')
kernel<1><<< ... >>>();
if(argv[1][0] == '2')
kernel<2><<< ... >>>();
I also checked the PTX file of this program and found that in this case the compiler does the optimization, producing two different kernel functions without a switch statement.

Related

Code before first case in switch statement

Please note that this is a question about the C++ language, not about how realistic or useful the example I'm giving to illustrate it is.
Imagine we have an enum in a namespace (or namespaces):
namespace SomeVeryLargeNamespaceExample {
enum class E {
One,
Two,
};
}
Now, we want to use it in the expression of a switch statement. Today I found that it is possible to add using SomeVeryLargeNamespaceExample::E inside the switch statement, before the first case, reducing clutter in the cases:
switch (e) {
using SomeVeryLargeNamespaceExample::E;
case E::One:
std::cout << "One\n";
break;
case E::Two:
std::cout << "Two\n";
break;
}
At first glance I thought it was some kind of "do this before any case statement" feature that I had never learned about, but some statements, such as a function call, are not executed (in fact, gcc warns "statement will never be executed [-Wswitch-unreachable]"). On the other hand, a variable declaration is possible (but not one with an initializer).
My question is, what other statements are possible? Is this a feature or just a consequence of how switch is designed?
Note: I found this other question, but it is specific to C (so it doesn't mention the namespaces case, for example), and I'm curious about C++.
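Any valid C++ statement is syntactically possible there (with the restriction you noticed: a jump to a case label must not bypass a declaration that has an initializer):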
int a;
void foo(int x)
{
switch (x) {
a=4;
case 0:
a=1;
break;
case 1:
a=2;
break;
}
}
This is syntactically valid C++, and gcc compiles it and produces an executable without issue. However, the initial statement can never be reached, so gcc gives you a warning message:
t.C: In function ‘void foo(int)’:
t.C:8:18: warning: statement will never be executed [-Wswitch-unreachable]
    8 |                  a=4;
      |
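To make the distinction concrete, here is a minimal sketch that puts a using-declaration, an uninitialized variable declaration and an ordinary statement before the first case. The function name describe and the values it returns are made up for illustration; gcc is expected to emit the -Wswitch-unreachable warning for the assignment.

```cpp
namespace SomeVeryLargeNamespaceExample {
enum class E { One, Two };
}

int describe(SomeVeryLargeNamespaceExample::E e) {
    int result = 0;
    switch (e) {
        using SomeVeryLargeNamespaceExample::E;  // declaration: in effect for the rest of the block
        int scratch;                             // declaration without initializer: allowed
        // int bad = 0;                          // would be ill-formed: the case labels would bypass the initialization
        result = -1;                             // statement: compiles, but control never reaches it
    case E::One:
        scratch = 10;
        result += scratch;
        break;
    case E::Two:
        scratch = 20;
        result += scratch;
        break;
    }
    return result;
}
```

Since control jumps straight to the matching label, result is never set to -1, so the function returns 10 or 20.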

std::type_info for array of runtime defined length

When using typeid(char[10]) one gets the std::type_info for char[10].
Now I have the problem that I need to get typeid(char[n]) where n isn't constexpr.
Is there a way to do that?
My current implementation just uses templates in a recursive way to generate calls from typeid(char[1]) to typeid(char[100]) and then choose the right call with a recursive function that stops at the right number.
While this works, it only works for lengths up to 100, and if I raise the limit much higher it generates a lot of code or stops compiling because the recursion is too deep.
Are there other possibilities?
Naive implementation that would do what I want:
const std::type_info& getTypeInfoForCharArray(size_t len)
{
switch(len)
{
case 1: return typeid(char[1]);
case 2: return typeid(char[2]);
case 3: return typeid(char[3]);
case 4: return typeid(char[4]);
...
}
}
Background
Now one might ask why I need such a function. To put it briefly, I have to integrate the definitions of multiple structs from multiple DLLs, where the lengths of members can change, and this shouldn't require a recompilation of the code I work on. I need this to properly allocate and access the memory for those structs so I can call functions in those DLLs.
Part of the implementation is a runtime type check for field access to avoid access violations because the C++ compiler can't check for those without knowing the struct at compile time. All this works well except for arrays.
If the answer to my question is "no, it can't be done" then I'll just have to treat arrays differently than other types.
You can use std::integer_sequence:
#include <cassert>
#include <cstddef>
#include <iostream>
#include <typeinfo>
#include <utility>
template <typename T>
struct Helper;
template <std::size_t ...L>
struct Helper<std::integer_sequence<std::size_t, L...>> {
static const std::type_info &get(std::size_t len) {
static const std::type_info *a[sizeof...(L)] = {&typeid(char[L])...};
return *a[len];
}
};
const std::type_info &getTypeInfoForCharArray(std::size_t len) {
const std::size_t max = 10000;
assert(len<=max);
return Helper<std::make_integer_sequence<std::size_t, max+1>>::get(len);
}
int main() {
auto &t = getTypeInfoForCharArray(10000);
std::cout << t.name() << "\n";
}
This compiles in ~1 second with clang (with max size of 10,000).
Note that this solution generates all the type_info objects from 0 to max into the binary, which may require a significant amount of data (for this example, the resulting binary is ~1 MB).
I think that if you don't have the list of possible sizes beforehand, this is the best you can do
(or maybe you can consider a compiler-dependent solution, such as exploiting the known format of type_info::name(); this is hacky, but may be fine if you use the feature for debugging only).
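If arrays end up being treated differently from other types, as the question suggests as a fallback, one possible shape for that is to describe a field by its element type plus a runtime length instead of a single type_info. This is only a sketch of that alternative, not something from the answer; FieldType, charArray and scalar are hypothetical names.

```cpp
#include <cstddef>
#include <typeindex>
#include <typeinfo>

// Hypothetical runtime type descriptor: element type plus array length.
struct FieldType {
    std::type_index element;  // type of one element
    std::size_t length;       // number of elements; 0 is used here to mean "not an array"
};

inline bool operator==(const FieldType& a, const FieldType& b) {
    return a.element == b.element && a.length == b.length;
}

// Descriptor for char[n], where n is only known at runtime.
inline FieldType charArray(std::size_t n) {
    return FieldType{std::type_index(typeid(char)), n};
}

// Descriptor for a non-array type T.
template <typename T>
FieldType scalar() {
    return FieldType{std::type_index(typeid(T)), 0};
}
```

A runtime field check can then compare two FieldType values with ==, with no upper bound on the array length and no per-length type_info objects in the binary.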

Multiply chars of a string at compile time (in enumeration)

Currently I am doing the following:
enum TC_ID {
CMD01 = 'C'*'M'*'D'*'0'*'1',
CMD02 = 'C'*'M'*'D'*'0'*'2',
..
};
This works, but is going to take quite some effort for a whole lot of commands :D
So, I am looking for a Macro, or inline function or something else which multiplies all the chars of a char array/string with a fixed size, so that I don't have to type them in manually in my code.
Is something like this possible?
Some unnecessary but maybe interesting information:
Well, this looks kinda stupid, why am I doing this you might ask ;)
My goal is to use this enum in a switch statement, which in the end is used to execute telecommands for my project.
The size of my telecommands is always 5.
So I am calculating some kind of very simple hash value which will be used inside the switch statement:
char *id; // contains the telecommand as a string
TC_ID hash = static_cast<TC_ID>(id[0]*id[1]*id[2]*id[3]*id[4]);
switch (hash) {
case (CMD01):
// execute function..
break;
case (CMD02):
// do something else
break;
default:
// unknown command
}
I know that instead of a switch I could just use a lot of if else statements and strcmp, but I don't want to because it's ugly :D
EDIT: Also, using an appropriate hash function would be much better.
However, how can this be implemented in an enumeration, so that I can still use my switch statement for the commands?
I think what I want is basically some kind of hash table which I can generate at the start for all command words and then make a switch over all of them.. but just how?
EDIT2: My compiler version is C++98
EDIT3: Workaround solution in comment in answer post
This works (C++11):
constexpr int multChars(const char* s /*string*/, int t = 1 /*tally*/){
return *s ? multChars(s+1, t*(*s)) : t;
}
//--------------------------------------------------------
//test it on a template (won't compile unless N is evaluated at compile time)
#include <iostream>
template<int N>
void printN() { std::cout<<N<<'\n'; }
int main(){
printN<multChars("ab")>();
return 0;
}
The ASCII code of 'a' is 97 and the ASCII code of 'b' is 98.
This prints 9506 (97 * 98) as expected.

Templates: instantiating from (and referring to) non-typed parameter at runtime?

I developed a generic "Unsigned" class, or really a class template Unsigned<size_t N>, that is modeled after the C (C++) built-in unsigned types, using the number of uint8_ts as its parameter. For example, Unsigned<4> is identical to a uint32_t and Unsigned<32> would be identical to a uint256_t, if it existed.
So far I have managed to follow most if not all of the semantics expected from a built-in unsigned -- in particular sizeof(Natural<N>)==N, (Natural<N>(-1) == "max_value_all_bits_1" == ~Natural<N>(0)), compatibility with abs(), sign(), div (using a custom div_t structure), ilogb() (exclusive to GCC it seems) and numeric_limits<>.
However I'm facing the issue that, since 1.- a class template is just a template so templated forms are unrelated, and 2.- the template non-typed parameter requires a "compile-time constant", which is way stricter than "a const", I'm essentially unable to create a Unsigned given an unknown N.
In other words, I can't have code like this:
...
( ... assuming all adequate headers are included ...)
using namespace std;
using lpp::Unsigned;
std::string str;
cout<< "Enter an arbitrarily long integer (end it with <ENTER>) :>";
getline(cin, str, '\n');
const int digits10 = log10(str.length()) + 1;
const int digits256 = (digits10 + 1) * ceil(log(10)/log(256)); // from "10×10^D = 256^T"
// at this point, I "should" be able to, semantically, do this:
Unsigned<digits256> num; // <-- THIS I CAN'T -- num would be guaranteed
// big enough to hold str's binary expression,
// no more space is needed
Unsigned::from_str(num, str); // somehow converts (essentially a base change algo)
// now I could do whatever I wanted with num "as if" a builtin.
std::string str_b3 = change_base(num, 3); // a generic implemented somehow
cout<< "The number above, in base 3, is: "<< str_b3<< endl;
...
(A/N: This is part of the test suite for Unsigned, which reads a "slightly large number" (I have tried up to 120 digits, after setting N accordingly) and does things like expressing it in other bases, which in and of itself already tests all arithmetic functions.)
In looking for possible ways to bypass or otherwise alleviate this limitation, I have been running into some concepts that I'd like to try and explore, but I wouldn't like to spend too much effort into an alternative that is only going to make things more complicated or that would make the behaviour of the class(es) deviate too much.
The first thing I thought was that if I wasn't able to pick up a Unsigned<N> of my choice, I could at least pick up from a set of pre-selected values of N which would lead to the adequate constructor being called at runtime, but depending on a compile-time value:
???? GetMeAnUnsigned (size_t S) {
switch (S) {
case 0: { throw something(); } // we can't have a zero-size number, right?
case 1, 2, 3, 4: { return Unsigned<4>(); break; }
case 5, 6, 7, 8: { return Unsigned<8>(); break; }
case 9, 10, 11, 12, 13, 14, 15, 16: { return Unsigned<16>(); break; }
....
default: { return Unsigned<128>(); break; } // wow, a 1Kib number!
} // end switch
exit(1); // this point *shouldn't* be reachable!
} // end function
I personally like the approach. However, I don't know what I can use to specify the return type. It doesn't actually "solve" the problem; it only reduces its severity to a certain degree. I'm sure the trick with the switch would work, since the instantiations are from compile-time constants; it only changes which of them takes place.
The only viable help to declare the return type seems to be this new C++0(1?)X "decltype" construct which would allow me to obtain the adequate type, something like, if I understood the feature correctly:
decltype (Unsigned<N>) GetMeAnUnsigned (size_t S) {
.. do some choices that originate an N
return Unsigned<N>();
}
... or something like that. I haven't entered into C++?X beyond auto (for iterators) yet, so the first question would be: would features like decltype or auto help me to achieve what I want? (Runtime selection of the instantiation, even if limited)
For an alternative, I was thinking that if the problem was the relation between my classes then I could make them all a "kind-of" Base by deriving the template itself:
template <size_t N>
class Unsigned : private UnsignedCommon { ...
... but I left that approach on the back burner because, well, one doesn't do that (make all a "kind-of") with built-ins; plus, for the cases where one does actually treat them as a common class, it requires initializing statics, returning pointers and leaving the client to destruct, if I recall correctly. Second question then: was I wrong to discard this alternative so early?
In a nutshell, your problem is no different from that of the built-in integral types. Given a short, you can't store large integers in it. And you can't decide at runtime which type of integer to use, unless you use a switch or similar to choose between several predefined options (short, int, long, long long, for example; or in your case, Unsigned<4>, Unsigned<8>, Unsigned<256>). The size cannot be computed dynamically at runtime in any way.
You have to either define a dynamically sized type (similar to std::vector), where the size is not a template parameter, so that a single type can store any type of integer (and then accept the loss of efficiency that implies), or accept that the size must be chosen at compile-time, and the only option you have for handling "arbitrary" integers is to hardcode a set of predefined sizes and choose between them at runtime.
decltype won't solve your problem either. It is fairly similar to auto: it works entirely at compile time and just yields the type of an expression. (The type of 2+2 is int, and the compiler knows this at compile time, even if the value 4 were only computed at runtime.)
The problem you are facing is quite common. Templates are resolved at compile time, while you need to change your behavior at runtime. As much as you might want to do that with the mythical one extra layer of indirection the problem won't go away: you cannot choose the return type of your function.
Since you need to perform the operations based on runtime information you must fall back to using dynamic polymorphism (instead of the static polymorphism that templates provide). That will imply using dynamic allocation inside the GetMeAnUnsigned method and possibly returning a pointer.
There are some tricks that you can play, like hiding the pointer inside a class that offers the public interface and delegates to an internally allocated object, in the same style as boost::any, so that the user sees a single type even if the actual object is chosen at runtime. That will make the design harder. I am not sure how much more complex the code will be, but you will need to think carefully about the minimal interface that the internal class hierarchy must offer to fulfill the requirements of the external interface. This seems like a really interesting problem to tackle...
You can't directly do that. Each unsigned with a separate number has a separate type, and the compiler needs to know the return type of your method at compile time.
What you need to do is have an Unsigned_base base class from which the Unsigned<t> types derive. You can then have your GetMeAnUnsigned method return a pointer to Unsigned_base. That pointer could then be cast using something like dynamic_cast<Unsigned<8>*>(ptr).
You might be better off having your function return a union of the possible unsigned<n> types, but that's only going to work if your type meets the requirements of being a union member.
EDIT: Here's an example:
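#include <cstddef>
#include <memory>
struct UnsignedBase
{
virtual ~UnsignedBase() {}
};
template<std::size_t c>
class Unsigned : public UnsignedBase
{
//Implementation goes here.
};
// std::unique_ptr is used instead of std::auto_ptr, which was removed in C++17.
std::unique_ptr<UnsignedBase> GiveMeAnUnsigned(std::size_t i)
{
std::unique_ptr<UnsignedBase> result;
switch(i)
{
case 42:
result.reset(new Unsigned<23>());
break; // without this break, control falls through and default overwrites the result
default:
result.reset(new Unsigned<2>());
break;
}
return result;
}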
It's a very common problem indeed, last time I saw it was with matrices (dimensions as template parameters and how to deal with runtime supplied value).
It's unfortunately an intractable problem.
The issue is not specific to C++ per se, it's specific to strong typing coupled with compile-time checking. For example Haskell could exhibit a similar behavior.
There are 2 ways to deal with this:
You use a switch not to create the type but to launch the full computation, i.e. main is almost empty and only serves to read the input value
You use boxing: you put the actual type in a generic container (either by hand-crafted class or boost::any or boost::variant) and then, when necessary, unbox the value for specific treatment.
I personally prefer the second approach.
The easier way to do this is to use a base class (interface):
struct UnsignedBase: boost::noncopyable
{
virtual ~UnsignedBase() {}
virtual UnsignedBase* clone() const = 0;
virtual size_t bytes() const = 0;
virtual void add(UnsignedBase const& rhs) = 0;
virtual void substract(UnsignedBase const& rhs) = 0;
};
Then you wrap this class in a simple manager to ease memory management for clients (you hide the fact that you rely on heap allocation + unique_ptr):
class UnsignedBox
{
public:
explicit UnsignedBox(std::string const& integer);
template <size_t N>
explicit UnsignedBox(Unsigned<N> const& integer);
size_t bytes() const { return mData->bytes(); }
void add(UnsignedBox const& rhs) { mData->add(*rhs.mData); }
void substract(UnsignedBox const& rhs) { mData->substract(*rhs.mData); }
private:
std::unique_ptr<UnsignedBase> mData;
};
Here, the virtual dispatch takes care of unboxing (somewhat), you can also unbox manually using a dynamic_cast (or static_cast if you know the number of digits):
void func(UnsignedBase* i)
{
if (Unsigned<2>* ptr = dynamic_cast< Unsigned<2>* >(i))
{
}
else if (Unsigned<4>* ptr = dynamic_cast< Unsigned<4>* >(i))
{
}
// ...
else
{
throw UnableToProceed(i);
}
}
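The boxing idea can also be sketched without a class hierarchy, using std::variant from C++17 as the standard counterpart of the boost::variant mentioned above. Everything here is an assumption for illustration: Unsigned is a bare stub standing in for the real class, and the names AnyUnsigned, makeUnsigned and byteCount are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>

// Stub standing in for the real Unsigned<N>: just N bytes of storage.
template <std::size_t N>
struct Unsigned {
    std::uint8_t bytes[N] = {};
};

// The "box": one type that can hold any of a fixed set of instantiations.
using AnyUnsigned = std::variant<Unsigned<4>, Unsigned<8>, Unsigned<16>>;

// Runtime selection among the predefined sizes.
AnyUnsigned makeUnsigned(std::size_t neededBytes) {
    if (neededBytes == 0) throw std::invalid_argument("zero-size number");
    if (neededBytes <= 4) return Unsigned<4>{};
    if (neededBytes <= 8) return Unsigned<8>{};
    if (neededBytes <= 16) return Unsigned<16>{};
    throw std::length_error("number too large for the predefined sizes");
}

// "Unboxing": std::visit calls the lambda with the concrete Unsigned<N> that is stored.
std::size_t byteCount(const AnyUnsigned& value) {
    return std::visit([](const auto& u) { return sizeof(u.bytes); }, value);
}
```

The return type problem from the question goes away because the function always returns AnyUnsigned; the set of possible sizes is still fixed at compile time, as the answers point out.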

Efficient switch statement

Of the following two versions of a switch statement, I am wondering which is more efficient.
1:
string* convertToString(int i)
{
switch(i)
{
case 1:
return new string("one");
case 2:
return new string("two");
case 3:
return new string("three");
.
.
default:
return new string("error");
}
}
2:
string* convertToString(int i)
{
string *intAsString;
switch(i)
{
case 1:
intAsString = new string("one");
break;
case 2:
intAsString = new string("two");
break;
case 3:
intAsString = new string("three");
break;
.
.
default:
intAsString = new string("error");
break;
}
return intAsString;
}
Version 1 has multiple return statements; will that cause the compiler to generate extra code?
This is a premature optimization worry.
The former form is clearer and has fewer source lines, which is a compelling reason to choose it (in my opinion).
You should (as usual) profile your program to determine if this function is even on the "hot list" for optimization. This will tell you if there is a performance penalty for using break.
As was pointed out in the comments, it's very possible that the main performance culprit of this code is the dynamically allocated strings. Generally, when implementing this kind of "integer to string" mapping function, you should return string constants.
Both are.
What you should really be concerned about is your use of pointers here. Is it necessary? Who will delete these strings? Isn't there a simpler alternative?
There should be no difference in the compiled code.
However:
You'll probably find returning the strings by value to be more efficient.
If there are a lot of strings, consider prepopulating a vector with them (or declaring a static array) and using i as the index into it.
A switch statement may be compiled to a series of if-style comparisons (or to a jump table, depending on the compiler and the case values). When it ends up as a comparison chain, one simple optimization strategy is to place the most frequent case first in the switch statement.
I also recommend the same solution as Sebastian but without the assert.
static const char *numberAsString[] = {
"Zero",
"One",
"Two",
"Three",
"Four",
"Five",
"Six",
};
const char *ConvertToString(int num) {
if (num < 1 || num >= (sizeof(numberAsString)/sizeof(char*)))
return "error";
return numberAsString[num];
}
You can never know how optimization will influence the code produced unless you compile with a specific compiler version, a specific set of settings and a specific code base.
C++ optimizing compilers may decide to turn your source code upside down to gain a specific optimization only available for compiler architecture so-and-so without you ever knowing it. A powerful optimizing compiler may e.g. find out that only 2 out of 10 cases are ever needed and will optimize away the whole switch-case-statement.
So my answer to your question is: Mu.
If you turn optimizing on, both functions will very likely generate equivalent code.
The compiler most probably will optimize both versions to the same code.
They will almost certainly both be compiled to an identical, highly-efficient branch table. Use whichever one you feel is clearer.
I would suggest something of the form:
void CScope::ToStr( int i, std::string& strOutput )
{
switch( i )
{
case 1:
strOutput = "Some text involving the number 1";
... etc etc
}
By returning a pointer to a string created on the heap, you risk memory leaks. Specifically regarding your question, I would suggest that the least number of return paths is more advisable than premature optimisation.
Consider keeping the strings as static constants:
static char const* const g_aaczNUMBER[] =
{
"Zero", "One", ...
};
static char const g_aczERROR[] = "Error";
char const* convertIntToString(int i) {
return i<0 || 9<i ? g_aczERROR : g_aaczNUMBER[i];
}
You optimise[*] switch statements by doing as little work as possible in the switch (because it's uncertain whether the compiler will common up the duplication). If you insist on returning a string by pointer, and using a switch statement, I'd write this:
string *convertToString(int i) {
const char *str;
switch(i) {
case 1 : str = "one"; break;
// etc
default : str = "error"; break;
}
return new string(str);
}
But of course for this example I'd probably just use a lookup table:
const char *values[] = {"error", "one", ... };
string convertToString(unsigned int i) {
if (i >= sizeof(values)/sizeof(*values)) i = 0;
return values[i];
}
That said, I just answered a question about the static initialization order fiasco, so you don't in general want rules of thumb which demand globals. What you do has to depend on the context of the function.
[*] Where I mean the kind of rule-of-thumb optimisation that you do when writing portable code, or in your first version, in the hope of creating code that is clear to read and won't need too much real optimisation. Real optimisation involves real measurements.
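For completeness, the lookup table can also live inside the function as a local static, which keeps it out of global scope. This is a small variation on the table above, not the answerer's code; the entries after "one" are filled in only for the sake of a complete example.

```cpp
#include <cstddef>
#include <string>

std::string convertToString(unsigned int i) {
    // Function-local table: index 0 doubles as the "error" entry.
    static const char* const values[] = {"error", "one", "two", "three"};
    const std::size_t count = sizeof(values) / sizeof(*values);
    // Out-of-range indices map to the error entry.
    return values[i < count ? i : 0];
}
```

Returning std::string by value also sidesteps the question of who deletes a heap-allocated string.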
There won't be any difference in efficiency here. Certainly none that will matter. The only benefit of going with option #2 is if you'll need to do some post-processing of the string that applies to all cases.
There should not be any measurable difference, return statements should not generate any machinery. They should put a pointer to the string object (allocated on the heap) on the stack of the callsite.
The funny part is that you worry about the efficiency of break versus return but create a new string every time.
The answer is that it's up to the compiler, but it should not matter either way. Avoiding the new string will matter if you call this all the time.
The switch can often be optimized so that it performs a jump instead of a bunch of if else, but if you look in the assembly source you'll generally be underwhelmed by how little the optimizer does.