generic loop-unrolling with user-defined function - c++

Recently I decided to write a loop unroller. My function iterates from _Beg to _End (both template parameters) and calls the function _func on each index:
template<size_t _Beg, size_t _End, typename _Func>
typename std::enable_if<_Beg < _End, void>::type
for_r(_Func _func)
{
_func(_Beg);
for_r<_Beg+1, _End>(_func);
}
template<size_t _Beg, size_t _End, typename _Func>
typename std::enable_if<_Beg >= _End, void>::type
for_r(_Func _func)
{}
It works ok like this:
for_r<0, 10>([](size_t index){cout << index << endl;});
However, the value of 'index' is known at compile time, so it would be logical to be able to use 'index' as a constant expression inside the lambda, like this:
tuple<int, int, int, int> tpl(1, 2, 3, 4);
for_r<0, 4>([&](size_t index){cout << get<index>(tpl) << endl;});
But 'index' is an ordinary variable, and it is not possible to pass it into the lambda as constexpr. Is there some way to get this behaviour without unrolling the loop by hand, like this:
cout << get<0>(tpl) << endl << get<1>(tpl) << endl << get<2>(tpl) << endl << get<3>(tpl) << endl;
?

Seven years have passed since this question was asked, and with features like C++17's fold expressions and C++20's explicitly templated lambdas it is now possible to both write and use such "constexpr for"-like utilities easily. For example:
// Offsets std::integer_sequence by OFFSET
template <typename T, T OFFSET, T... IDXS>
constexpr std::integer_sequence<T, OFFSET + IDXS...> offset(
std::integer_sequence<T, IDXS...>)
{
return {};
}
// Calls templated operator() of given function object
// with INDEX template parameter from range [START; END)
template<std::integral T, T START, T END>
void for_constexpr_interval(auto iteration)
{
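// The body lives inside if constexpr so that an empty or inverted interval
// (START >= END) never instantiates std::make_integer_sequence with a
// negative length; an early return would not discard the code after it
if constexpr (START < END) {
[&]<T... IDXS>(std::integer_sequence<T, IDXS...>) {
(iteration.template operator()<IDXS>(), ...);
}(offset<T, START>(std::make_integer_sequence<T, END - START>()));
}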
}
Inside for_constexpr_interval we use an explicitly templated, immediately invoked lambda to introduce the template parameter pack IDXS, which is deduced from the std::integer_sequence argument (built by applying the offset function to std::make_integer_sequence so that it represents the interval [START; END)). This avoids polluting the namespace with extra helper functions. The pack can then be used directly in a fold expression, with no need for recursive instantiations. In that fold expression we call the templated operator() of the iteration function object with the appropriate template argument. The .template operator() syntax is needed to tell the compiler that operator() names a template, so that, e.g., the following < is not parsed as a "less than" sign.
It can be used like this:
std::tuple tpl(1, 2, 3, 4);
for_constexpr_interval<size_t, 0, 4>([&]<size_t INDEX>() {
if constexpr (INDEX > 0)
std::cout << ' ';
std::cout << std::get<INDEX>(tpl);
});
std::cout << std::endl;
for_constexpr_interval<int, -1, 2>([&]<int INDEX>() {
std::cout << INDEX << std::endl;
});
Note that when passing function objects to for_constexpr_interval we once again use explicitly templated lambdas. Also, the generated assembly shows that even at -O1 the calls in main get inlined, at least with the latest GCC, Clang and MSVC. If needed, inlining can be forced with compiler-specific attributes/specifiers (on any combination of for_constexpr_interval, the inner immediately invoked lambda, and the passed lambda), though such decisions are usually better left to the compiler.
Finally, for_constexpr_interval can be changed/generalized/supplemented further, e.g. by allowing reverse traversal of indices, a STEP (increment) argument or non-std::integral index types, but this will require a more complex implementation for creating the sequence of indices in the inner lambda's argument (probably abandoning std::make_integer_sequence for a custom alternative) and some care for floating-point indices.
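As a rough illustration of the STEP and reverse-traversal idea, here is a minimal sketch. It is not the code from the answer above: the names step_count, for_constexpr_step_impl, for_constexpr_step and the two demo functions are made up for this example, the index type is fixed to int, and the index is passed as a std::integral_constant argument (instead of through a templated operator()) so that the sketch compiles as C++17.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper: how many indices are visited when walking from START
// towards END (exclusive) in increments of STEP.
template <int START, int END, int STEP>
constexpr std::size_t step_count()
{
    static_assert(STEP != 0, "STEP must be non-zero");
    if constexpr (STEP > 0)
        return START < END ? (END - START + STEP - 1) / STEP : 0;
    else
        return START > END ? (START - END - STEP - 1) / -STEP : 0;
}

// Calls f(std::integral_constant<int, START + I * STEP>{}) for I = 0, 1, ...
template <int START, int STEP, typename F, std::size_t... Is>
constexpr void for_constexpr_step_impl(F&& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<int, START + static_cast<int>(Is) * STEP>{}), ...);
}

template <int START, int END, int STEP, typename F>
constexpr void for_constexpr_step(F&& f)
{
    for_constexpr_step_impl<START, STEP>(
        f, std::make_index_sequence<step_count<START, END, STEP>()>{});
}

// Reverse traversal: visits indices 3, 2, 1, 0 of the tuple.
std::string reverse_tuple_demo()
{
    const auto tpl = std::make_tuple(1, 2, 3, 4);
    std::string out;
    for_constexpr_step<3, -1, -1>([&](auto index) {
        out += std::to_string(std::get<decltype(index)::value>(tpl));
    });
    return out;
}

// Step of 2: visits indices 0 and 2 of the tuple.
std::string every_second_demo()
{
    const auto tpl = std::make_tuple(1, 2, 3, 4);
    std::string out;
    for_constexpr_step<0, 4, 2>([&](auto index) {
        out += std::to_string(std::get<decltype(index)::value>(tpl));
    });
    return out;
}
```

The count is computed up front so that std::make_index_sequence can still be used; the actual index values are derived from the 0-based pack inside the fold expression.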

Related

Boost Hana filter a constexpr tuple

Super basic question about Boost Hana.
From the examples it seems I should be able to do the following:
int main() {
constexpr auto a = boost::hana::make_tuple(-1, 4, 5, -4);
constexpr auto b = boost::hana::filter(a, [](int elem) constexpr { boost::hana::bool_c<(elem > 0)>;});
}
However, I get
error: 'elem' is not a constant expression
which seems weird because I added constexpr wherever I could...
Is this at all possible? Or am I just missing something?
Note: I realize that you can achieve this with constexpr functions or something, but I would like to see how to do it with Hana as well, for educational purposes.
The return type must be statically deducible. However, you are returning different types depending on the argument.
You could do it, but not using hana::filter because that specifically moves into the runtime domain by putting the (statically known) tuple elements in (runtime) function arguments.
As soon as your predicate depends on more than statically known information (i.e. the types of the elements), it cannot be evaluated as a constant expression. The examples in the documentation show how it is meant to be used:
static_assert(hana::filter(hana::make_tuple(1, 2.0, 3, 4.0), is_integral) == hana::make_tuple(1, 3), "");
static_assert(hana::filter(hana::just(3), is_integral) == hana::just(3), "");
BOOST_HANA_CONSTANT_CHECK(hana::filter(hana::just(3.0), is_integral) == hana::nothing);
On the other hand, you can have your cake and eat it if you move the elements to the compile time domain:
constexpr auto a = hana::make_tuple(-1_c, 4_c, 5_c, -4_c);
constexpr auto b = hana::filter( //
a, //
[](auto elem) constexpr { return std::integral_constant<bool, (elem() > 0)>{}; } //
);
std::cout << "a:" << hana::size(a) << "\n";
std::cout << "b:" << hana::size(b) << "\n";
Prints
a:4
b:2
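The same principle can be sketched without Hana, using only the standard library. This is not how hana::filter is implemented; int_c, keep_if, filter_tuple and filtered_size_demo are hypothetical names for this illustration. The point it tries to show is that the predicate's answer is carried by its return type, so the size of the result can depend on it.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for a compile-time integer: the value lives in the type.
template <int N>
using int_c = std::integral_constant<int, N>;

// Keeps `value` (as a one-element tuple) only if the *return type* of the
// predicate says so; the decision never looks at a run-time value.
template <typename T, typename Pred>
constexpr auto keep_if(T value, Pred pred)
{
    if constexpr (decltype(pred(value))::value)
        return std::make_tuple(value);
    else
        return std::tuple<>{};
}

// Concatenates the zero- or one-element tuples produced for each element.
template <typename... Ts, typename Pred>
constexpr auto filter_tuple(const std::tuple<Ts...>& t, Pred pred)
{
    return std::apply(
        [&](auto... elems) { return std::tuple_cat(keep_if(elems, pred)...); }, t);
}

inline std::size_t filtered_size_demo()
{
    constexpr auto a =
        std::make_tuple(int_c<-1>{}, int_c<4>{}, int_c<5>{}, int_c<-4>{});
    auto is_positive = [](auto elem) {
        return std::bool_constant<(decltype(elem)::value > 0)>{};
    };
    auto b = filter_tuple(a, is_positive);
    static_assert(std::is_same_v<decltype(b), std::tuple<int_c<4>, int_c<5>>>);
    return std::tuple_size_v<decltype(b)>;
}
```

With plain int elements this cannot work, because every element would have the same type and the predicate's return type could not differ between them.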

What does "::type=0" in first template mean?

The following is part of the code:
// Null-delimited strings, and the like.
template < typename CharT,
typename std::enable_if <
std::is_pointer<CharT>::value &&
!std::is_array<CharT>::value &&
std::is_integral<typename std::remove_pointer<CharT>::type>::value &&
sizeof(typename std::remove_pointer<CharT>::type) == 1,
int >::type = 0 >
contiguous_bytes_input_adapter input_adapter(CharT b)
{
auto length = std::strlen(reinterpret_cast<const char*>(b));
const auto* ptr = reinterpret_cast<const char*>(b);
return input_adapter(ptr, ptr + length);
}
template<typename InputType>
JSON_HEDLEY_WARN_UNUSED_RESULT
static basic_json parse(InputType&& i,
const parser_callback_t cb = nullptr,
const bool allow_exceptions = true,
const bool ignore_comments = false)
{
basic_json result;
parser(detail::input_adapter(std::forward<InputType>(i)), cb, allow_exceptions, ignore_comments).parse(true, result);
return result;
}
static const char *g_sJsonTextInput = "{"
" \"_nested\": {"
" \"_bool\": false,"
" \"_int\": 0,"
" \"_double\": 0,"
" \"_string\": \"foo\""
" }"
"}";
parse(g_sJsonTextInput);
I think the result of "std::enable_if<...>::type" is the first template parameter, is that correct?
If so, how should I understand "::type = 0"?
Please help, Thanks!
typename std::enable_if<...>::type is the type of the second template parameter. That parameter is unnamed. It has a default argument = 0, which is used if you don't specify any argument for it (which is the intent here).
the result of "std::enable_if<...>::type" is the first template parameter
It's either the second parameter of enable_if if the condition is true, or an invalid type if the condition is false. It being invalid would trigger SFINAE and disable this function for a specific CharT type.
In modern C++, you would use a shorter notation:
std::enable_if_t<...> (without typename and ::type).
std::is_..._v<...> (instead of std::is_...<...>::value).
Also, instead of int = 0 you should use std::nullptr_t = nullptr. With an int, a user can inadvertently create several different instantiations of your template by passing different integers to it, which is impossible with std::nullptr_t because it only has one possible value.
Or, if you use C++20, you should use the requires notation.
This is SFINAE (substitution failure is not an error) template stuff. If we look at enable_if, then it says:
template< bool B, class T = void >
struct enable_if;
If B is true, std::enable_if has a public member typedef type, equal to T; otherwise, there is no member typedef.
So if the big boolean expression evaluates to true, the enable_if expression becomes int = 0, which is fine. If it evaluates to false, there is no ::type member and the substitution fails, but that only results in a compilation error if all other possible substitutions also fail (hence the SFINAE name).
std::enable_if<...>::type is a hack of the template resolution system.
The SFINAE principle states that if a template resolution is not possible for a function call, the compilation should not fail, but simply try the next (and possibly worse) template resolution.
std::enable_if is defined in such a way that if its first argument evaluates to false, ::type is not defined.
To break it down:
template <typename Something,
int = 0>
void function() {}
is valid C++.
So is std::enable_if<condition_evaluating_to_true, SomeType>::type (and the whole expression evaluates to SomeType).
std::enable_if<condition_evaluating_to_false, SomeType>::type is not a valid expression.
std::enable_if<...>::type = 0 roughly means "only apply the template resolution containing std::enable_if when the condition within enable_if evaluates to true".
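A small sketch of the pattern in isolation may help. The function name describe is hypothetical and has nothing to do with the JSON library from the question; the two overloads differ only in the condition inside enable_if, and the unnamed int parameter with its default of 0 exists purely to host that condition.

```cpp
#include <string>
#include <type_traits>

// Participates in overload resolution only when T is an integral type.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
std::string describe(T)
{
    return "integral";
}

// Participates in overload resolution only when T is a floating-point type.
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
std::string describe(T)
{
    return "floating point";
}
```

For describe(42) the second template fails substitution and is silently dropped, so the first one is chosen; for describe(2.5) it is the other way round. A call such as describe("text") would match neither and fail to compile.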

Is there a way to make a type variable in c++?

I am curious about a way to make a type variable.
What I mean by that is explained in the code below:
using var = type_var<int>; // currently var is of type type_var<int>
/// somewhere like in constexpr functions
var::store<float>; // now var is of type type_var<float>
static_assert(std::is_same<var::get_type, float>::value, "");
Of course, as far as I know, this code will never work, since using will make var 'immutable'.
But still, I wonder if there is a way to store types mutably.
In other words: is there a way to make an entity that stores a type, such that the stored type can be changed at compile time?
The simple answer is No!
C++ does not have anything like "compile-time variables". Everything follows the One Definition Rule (ODR).
With templates, C++ offers its own kind of compile-time language, often called Template MetaProgramming (TMP). TMP follows the general concepts of a functional programming language.
Taken from the above linked text:
Functional programs do not have assignment statements, that is, the value of a variable in a functional program never changes once defined.
If I understand your pseudocode example correctly, you have something like the following in mind:
template < auto x >
struct Variable
{
static constexpr decltype(x) value = x;
};
Variable< 10 > intvar;
Variable< 'a' > charvar;
int main()
{
// accessing the data:
std::cout << intvar.value << std::endl;
std::cout << charvar.value << std::endl;
}
But in the world of templates and types there is no way to "assign" a new value or type to an existing template instantiation; there simply is no syntax for it.
You can also program algorithms in TMP, but the "results" of "calls" are never stored in any kind of variable; they always define new "values".
Here is an example of some template metaprogramming. It shows how to write an "add" "function" that concatenates two type containers:
// define a data structure which can contain a list of types
template < typename ... TYPES >
struct TypeContainer;
// let us define some TMP "variables", which are types in c++
using list1 = TypeContainer<int, float, int >;
using list2 = TypeContainer< char, bool, int >;
// and now we define a TMP "function"
template < typename ... PARMS > struct Concat;
// which simply adds two typelists
template < typename ... LIST1, typename ... LIST2 >
struct Concat< TypeContainer< LIST1... >, TypeContainer< LIST2...>>
{
using RESULT = TypeContainer< LIST1..., LIST2...>;
};
using list3 = Concat<list1, list2 >::RESULT;
// But you never can change the "variable", because of the
// One Definition Rule (ODR)
// ( will fail to compile )
//using list2 = Concat<list1, list2 >::RESULT;
// helper to let us know what we have in the typelists:
// works for gcc or clang but is implementation specific
template < typename T>
void Print()
{
std::cout << __PRETTY_FUNCTION__ << std::endl;
}
int main()
{
Print<list3>();
}
You will find such algorithms already defined in the standard library: std::tuple serves as the type container and std::tuple_cat does the "add" of two tuples. My code is only meant as a simple example of what is going on, without relying on any magic from inside the standard library.
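For comparison, the standard-library route mentioned above could look roughly like this. It is a sketch that reuses the alias names list1, list2 and list3 from the example; list3_size is a hypothetical helper added only so the result can be checked at run time.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

using list1 = std::tuple<int, float, int>;
using list2 = std::tuple<char, bool, int>;

// decltype only inspects the result type; std::tuple_cat is never executed.
using list3 = decltype(std::tuple_cat(std::declval<list1>(), std::declval<list2>()));

// list3 is a new type; list1 and list2 themselves are unchanged.
static_assert(std::is_same_v<list3, std::tuple<int, float, int, char, bool, int>>,
              "concatenation yields a new type list");

// Hypothetical helper: number of types in the concatenated list.
constexpr std::size_t list3_size()
{
    return std::tuple_size_v<list3>;
}
```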

Why isn't a for-loop a compile-time expression?

If I want to do something like iterate over a tuple, I have to resort to crazy template metaprogramming and template helper specializations. For example, the following program won't work:
#include <iostream>
#include <tuple>
#include <utility>
constexpr auto multiple_return_values()
{
return std::make_tuple(3, 3.14, "pi");
}
template <typename T>
constexpr void foo(T t)
{
for (auto i = 0u; i < std::tuple_size<T>::value; ++i)
{
std::get<i>(t);
}
}
int main()
{
constexpr auto ret = multiple_return_values();
foo(ret);
}
Because i can't be const or we wouldn't be able to implement it. But for loops are a compile-time construct that can be evaluated statically. Compilers are free to remove it, transform it, fold it, unroll it or do whatever they want with it thanks to the as-if rule. But then why can't loops be used in a constexpr manner? There's nothing in this code that needs to be done at "runtime". Compiler optimizations are proof of that.
I know that you could potentially modify i inside the body of the loop, but the compiler would still be able to detect that. Example:
// ...snip...
template <typename T>
constexpr int foo(T t)
{
/* Dead code */
for (auto i = 0u; i < std::tuple_size<T>::value; ++i)
{
}
return 42;
}
int main()
{
constexpr auto ret = multiple_return_values();
/* No error */
std::array<int, foo(ret)> arr;
}
Since std::get<>() is a compile-time construct, unlike std::cout.operator<<, I can't see why it's disallowed.
πάντα ῥεῖ gave a good and useful answer, I would like to mention another issue though with constexpr for.
In C++, at the most fundamental level, all expressions have a type which can be determined statically (at compile-time). There are things like RTTI and boost::any of course, but they are built on top of this framework, and the static type of an expression is an important concept for understanding some of the rules in the standard.
Suppose that you could iterate over a heterogeneous container using a fancy for syntax, like this maybe:
std::tuple<int, float, std::string> my_tuple;
for (const auto & x : my_tuple) {
f(x);
}
Here, f is some overloaded function. Clearly, the intended meaning of this is to call different overloads of f for each of the types in the tuple. What this really means is that in the expression f(x), overload resolution has to run three different times. If we play by the current rules of C++, the only way this can make sense is if we basically unroll the loop into three different loop bodies, before we try to figure out what the types of the expressions are.
What if the code is actually
for (const auto & x : my_tuple) {
auto y = f(x);
}
auto is not magic, it doesn't mean "no type info", it means, "deduce the type, please, compiler". But clearly, there really need to be three different types of y in general.
On the other hand, there are tricky issues with this kind of thing -- in C++ the parser needs to be able to know what names are types and what names are templates in order to correctly parse the language. Can the parser be modified to do some loop unrolling of constexpr for loops before all the types are resolved? I don't know but I think it might be nontrivial. Maybe there is a better way...
To avoid this issue, in current versions of C++, people use the visitor pattern. The idea is that you will have an overloaded function or function object and it will be applied to each element of the sequence. Then each overload has its own "body" so there's no ambiguity as to the types or meanings of the variables in them. There are libraries like boost::fusion or boost::hana that let you do iteration over heterogeneous sequences using a given visitor -- you would use their mechanism instead of a for-loop.
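To make the visitor idea concrete without pulling in Boost, here is a minimal sketch on top of std::apply (C++17). TypeNamer, visit_all and visitor_demo are hypothetical names for this illustration, not part of boost::fusion or boost::hana.

```cpp
#include <string>
#include <tuple>

// Hypothetical visitor: one overload per element type, each with its own body.
struct TypeNamer {
    std::string operator()(int) const { return "int"; }
    std::string operator()(float) const { return "float"; }
    std::string operator()(const std::string&) const { return "string"; }
};

// Applies the visitor to every element; overload resolution runs once per
// element type because the fold expression expands to one call per element.
template <typename Tuple, typename Visitor>
std::string visit_all(const Tuple& t, Visitor v)
{
    std::string out;
    std::apply(
        [&](const auto&... elems) { ((out += v(elems), out += ' '), ...); }, t);
    return out;
}

inline std::string visitor_demo()
{
    std::tuple<int, float, std::string> my_tuple{1, 2.5f, "three"};
    return visit_all(my_tuple, TypeNamer{});
}
```

Each overload is an ordinary function with a single static type for every expression in it, which sidesteps the question of what type a loop variable would have.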
If you could do constexpr for with just ints, e.g.
for (constexpr i = 0; i < 10; ++i) { ... }
this raises the same difficulty as a heterogeneous for loop. If you can use i as a template parameter inside the body, then you can make variables that refer to different types in different runs of the loop body, and then it's not clear what the static types of the expressions should be.
So, I'm not sure, but I think there may be some nontrivial technical issues associated with actually adding a constexpr for feature to the language. The visitor pattern / the planned reflection features may end up being less of a headache IMO... who knows.
Let me give another example I just thought of that shows the difficulty involved.
In normal C++, the compiler knows the static type of every variable on the stack, and so it can compute the layout of the stack frame for that function.
You can be sure that the address of a local variable won't change while the function is executing. For instance,
std::array<int, 3> a{{1,2,3}};
for (int i = 0; i < 3; ++i) {
auto x = a[i];
int y = 15;
std::cout << &y << std::endl;
}
In this code, y is a local variable in the body of a for loop. It has a well-defined address throughout this function, and the address printed by the program will be the same each time.
What should be the behavior of similar code with constexpr for?
std::tuple<int, long double, std::string> a{};
for (int i = 0; i < 3; ++i) {
auto x = std::get<i>(a);
int y = 15;
std::cout << &y << std::endl;
}
The point is that the type of x is deduced differently in each pass through the loop -- since it has a different type, it may have different size and alignment on the stack. Since y comes after it on the stack, that means that y might change its address on different runs of the loop -- right?
What should be the behavior if a pointer to y is taken in one pass through the loop, and then dereferenced in a later pass? Should it be undefined behavior, even though it would probably be legal in the similar "no-constexpr for" code with std::array shown above?
Should the address of y not be allowed to change? Should the compiler have to pad the address of y so that the largest of the types in the tuple can be accommodated before y? Does that mean that the compiler can't simply unroll the loops and start generating code, but must unroll every instance of the loop before-hand, then collect all of the type information from each of the N instantiations and then find a satisfactory layout?
I think you are better off just using a pack expansion, it's a lot more clear how it is supposed to be implemented by the compiler, and how efficient it's going to be at compile and run time.
Here's a way to do it that does not need too much boilerplate, inspired by http://stackoverflow.com/a/26902803/1495627 :
template<std::size_t N>
struct num { static const constexpr auto value = N; };
template <class F, std::size_t... Is>
void for_(F func, std::index_sequence<Is...>)
{
using expander = int[];
(void)expander{0, ((void)func(num<Is>{}), 0)...};
}
template <std::size_t N, typename F>
void for_(F func)
{
for_(func, std::make_index_sequence<N>());
}
Then you can do :
for_<N>([&] (auto i) {
std::get<i.value>(t); // do stuff
});
If you have a C++17 compiler accessible, it can be simplified to
template <class F, std::size_t... Is>
void for_(F func, std::index_sequence<Is...>)
{
(func(num<Is>{}), ...);
}
In C++20 most of the functions in <algorithm> will be constexpr. For example, using std::transform, many operations that require a loop can be done at compile time. Consider this example, which calculates the factorial of every number in an array at compile time (adapted from the Boost.Hana documentation):
#include <array>
#include <algorithm>
#include <type_traits>
constexpr int factorial(int n) {
return n == 0 ? 1 : n * factorial(n - 1);
}
// std::result_of was removed in C++20, so std::invoke_result_t is used instead
template <typename T, std::size_t N, typename F>
constexpr std::array<std::invoke_result_t<F, T>, N>
transform_array(std::array<T, N> array, F f) {
auto array_f = std::array<std::invoke_result_t<F, T>, N>{};
// This is a constexpr "loop":
std::transform(array.begin(), array.end(), array_f.begin(), [&f](auto el){return f(el);});
return array_f;
}
int main() {
constexpr std::array<int, 4> ints{{1, 2, 3, 4}};
// This can be done at compile time!
constexpr std::array<int, 4> facts = transform_array(ints, factorial);
static_assert(facts == std::array<int, 4>{{1, 2, 6, 24}}, "");
}
See how the array facts can be computed at compile time using a "loop", i.e. a standard algorithm. At the time of writing, you need an experimental version of the newest clang or gcc release, which you can try out on godbolt.org. But soon C++20 will be fully implemented by all the major compilers in their release versions.
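For what it is worth, the same table can also be built with a plain for loop inside a constexpr function, which already works in C++17 as long as the loop index is only used as an ordinary value and not as a template argument. A minimal sketch follows; the function name factorials is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Fills out[i] with i! using an ordinary loop; every element has the same
// type, so nothing in the body depends on i being a constant expression.
template <std::size_t N>
constexpr std::array<int, N> factorials()
{
    std::array<int, N> out{};
    int acc = 1;
    for (std::size_t i = 0; i < N; ++i) {
        if (i > 0)
            acc *= static_cast<int>(i);
        out[i] = acc;
    }
    return out;
}

// The whole call is evaluated during constant evaluation.
constexpr auto facts = factorials<5>();
static_assert(facts[0] == 1 && facts[3] == 6 && facts[4] == 24, "");
```

What still does not work is something like std::get<i> inside that loop, because i is not a constant expression within the body.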
The "Expansion Statements" proposal is interesting here; the proposal document itself gives further explanation.
The proposal introduces the syntactic sugar for..., similar in spelling to the sizeof... operator. A for... statement is expanded at compile time, which means no loop remains at run time.
For example:
std::tuple<int, float, char> Tup1 {5, 3.14, 'K'};
for... (auto elem : Tup1) {
std::cout << elem << " ";
}
The compiler will generate the code at compile time, and the result is equivalent to:
std::tuple<int, float, char> Tup1 {5, 3.14, 'K'};
{
auto elem = std::get<0>(Tup1);
std::cout << elem << " ";
}
{
auto elem = std::get<1>(Tup1);
std::cout << elem << " ";
}
{
auto elem = std::get<2>(Tup1);
std::cout << elem << " ";
}
Thus, the expansion statement is not a loop but a repeated version of the loop body, as the proposal document says.
Since this proposal is not part of the current version of C++ or of a technical specification (even if it does get accepted), we can use an alternative from the Boost library instead: hana::for_each from <boost/hana/for_each.hpp>, together with Boost's own tuple from <boost/hana/tuple.hpp>.
#include <boost/hana/for_each.hpp>
#include <boost/hana/tuple.hpp>
using namespace boost;
...
hana::tuple<int, std::string, float> Tup1 {5, "one", 5.55};
hana::for_each(Tup1, [](auto&& x){
std::cout << x << " ";
});
// Which will print:
// 5 one 5.55
The first argument of boost::hana::for_each must be a foldable container.
Why isn't a for-loop a compile-time expression?
Because a for() loop is used to define runtime control flow in the c++ language.
Generally variadic templates cannot be unpacked within runtime control flow statements in c++.
std::get<i>(t);
cannot be deduced at compile time, since i is a runtime variable.
Use variadic template parameter unpacking instead.
You might also find this post useful (if it is not in fact a duplicate that already answers your question):
iterate over tuple
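As a sketch of what "variadic template parameter unpacking" can look like for the tuple from the question (C++17, since it uses a fold expression): the names for_each_impl, for_each_element and print_tuple_demo are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Is... is a pack of compile-time indices, so std::get<Is> is valid here.
template <typename Tuple, typename F, std::size_t... Is>
void for_each_impl(const Tuple& t, F& f, std::index_sequence<Is...>)
{
    (f(std::get<Is>(t)), ...);
}

// Generates the indices 0 .. sizeof...(Ts) - 1 and forwards to the helper.
template <typename... Ts, typename F>
void for_each_element(const std::tuple<Ts...>& t, F f)
{
    for_each_impl(t, f, std::index_sequence_for<Ts...>{});
}

inline std::string print_tuple_demo()
{
    std::ostringstream out;
    for_each_element(std::make_tuple(3, 3.14, "pi"),
                     [&](const auto& x) { out << x << ' '; });
    return out.str();
}
```

The run-time loop variable i is replaced by the pack Is, which the compiler expands into one call per index.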
Here are two examples attempting to replicate a compile-time for loop (which isn't part of the language at this time), using fold expressions and std::integer_sequence. The first example shows a simple assignment in the loop, and the second example shows tuple indexing and uses a lambda with template parameters available in C++20.
For a function with a template parameter, e.g.
template <int n>
constexpr int factorial() {
if constexpr (n == 0) { return 1; }
else { return n * factorial<n - 1>(); }
}
Where we want to loop over the template parameter, like this:
template <int N>
constexpr auto example() {
std::array<int, N> vals{};
for (int i = 0; i < N; ++i) {
vals[i] = factorial<i>(); // this doesn't work
}
return vals;
}
One can do this:
template <int... Is>
constexpr auto get_array(std::integer_sequence<int, Is...> a) -> std::array<int, a.size()> {
std::array<int, a.size()> vals{};
((vals[Is] = factorial<Is>()), ...);
return vals;
}
And then get the result at compile time:
constexpr auto x = get_array(std::make_integer_sequence<int, 5>{});
// x = {1, 1, 2, 6, 24}
Similarly, for a tuple:
constexpr auto multiple_return_values()
{
return std::make_tuple(3, 3.14, "pi");
}
int main(void) {
static constexpr auto ret = multiple_return_values();
constexpr auto for_constexpr = [&]<int... Is>(std::integer_sequence<int, Is...> a) {
((std::get<Is>(ret)), ...); // std::get<i>(t); from the question
return 0;
};
// use it:
constexpr auto w = for_constexpr(std::make_integer_sequence<int, std::tuple_size_v<decltype(ret)>>{});
}

Is There a Shortcut to decltype

In this answer I wrote the C++17 code:
cout << accumulate(cbegin(numbers), cend(numbers), decay_t<decltype(numbers[0])>{});
This received some negative commentary about the nature of C++'s type association, which I'm sad to say that I agree with :(
decay_t<decltype(numbers[0])>{} is a very complex way to get a:
Zero-initialized type of an element of numbers
Is it possible to maintain the association with the type of numbers' elements, but not type like 30 characters to get it?
EDIT:
I've got a lot of answers involving a wrapper for either accumulate or for extracting the type from numbers[0]. The problem being they require the reader to navigate to a secondary location to read a solution that is no less complex than the initialization code decay_t<decltype(numbers[0])>{}.
The only reason we have to do more than decltype(numbers[0]) is that the array subscript operator returns a reference:
error: invalid cast of an rvalue expression of type 'int' to type 'int&'
It's interesting that with respect to decltype's argument:
If the name of an object is parenthesized, it is treated as an ordinary lvalue expression
However, decltype((numbers[0])) is still just a reference to an element of numbers. So in the end these answers may be as close as we can come to simplifying this initialization :(
While I would always choose to write a helper function as per @Barry,
if numbers is a standard container, it will export the type value_type, so you can save a little complexity:
cout << accumulate(cbegin(numbers), cend(numbers), decltype(numbers)::value_type());
going further, we could define this template function:
template<class Container, class ElementType = typename Container::value_type>
constexpr auto element_of(const Container&, ElementType v = 0)
{
return v;
}
which gives us this:
cout << accumulate(cbegin(numbers), cend(numbers), element_of(numbers, 0));
Personal preference: I find the decay_t, decltype and declval dance pretty annoying and hard to read.
Instead, I would use an extra level of indirection through a type-trait value_t<It> and zero-initialization through init = R{}
template<class It>
using value_t = typename std::iterator_traits<It>::value_type;
template<class It, class R = value_t<It>>
auto accumulate(It first, It last, R init = R{}) { /* as before */ }
I think the best you can do is just factor this out somewhere:
template <class It, class R = std::decay_t<decltype(*std::declval<It>())>>
R accumulate(It first, It last, R init = 0) {
return std::accumulate(first, last, init);
}
std::cout << accumulate(cbegin(numbers), cend(numbers));
Or more generally:
template <class Range, class T =
std::decay_t<decltype(*adl_begin(std::declval<Range&&>()))>>
T accumulate(Range&& range, T init = 0) {
return std::accumulate(adl_begin(range), adl_end(range), init);
}
cout << accumulate(numbers);
where adl_begin is a version of begin() that accounts for ADL.
Sure, we technically still have all the cruft that you were trying to avoid earlier... but at least now you never have to look at it again?
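A compact variant of the range-based wrapper might look like the sketch below. The name sum is hypothetical, and the `using std::begin;` two-step is only an assumption about what an adl_begin helper would do. It value-initializes the accumulator with T{} rather than 0, so the element type is preserved.

```cpp
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

template <class Range>
auto sum(const Range& range)
{
    // Bring the std versions into scope, then call unqualified so that
    // argument-dependent lookup can also find user-provided begin/end.
    using std::begin;
    using std::end;
    // Element type with reference and cv-qualifiers stripped.
    using T = std::decay_t<decltype(*begin(range))>;
    // T{} is a zero of the element type; a literal 0 would make
    // std::accumulate sum in int and drop the fractional part of doubles.
    return std::accumulate(begin(range), end(range), T{});
}
```

At the call site this reduces the original expression to cout << sum(numbers);, at the cost of the reader having to look up the helper once.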