Members in constexpr functors causing runtime execution

I am using functors to generate compile time calculated code in the following way (I apologize for the long code, but it is the only way I have found to reproduce the behavior):
#include <array>
#include <tuple>
template <int order>
constexpr auto compute (const double h)
{
std::tuple<std::array<double,order>,
std::array<double,order> > paw{};
auto xtab = std::get<0>(paw).data();
auto weight = std::get<1>(paw).data();
if constexpr ( order == 3 )
{
xtab[0] = - 1.0E+00;
xtab[1] = 0.0E+00;
xtab[2] = 1.0E+00;
weight[0] = 1.0 / 3.0E+00;
weight[1] = 4.0 / 3.0E+00;
weight[2] = 1.0 / 3.0E+00;
}
else if constexpr ( order == 4 )
{
xtab[0] = - 1.0E+00;
xtab[1] = - 0.447213595499957939281834733746E+00;
xtab[2] = 0.447213595499957939281834733746E+00;
xtab[3] = 1.0E+00;
weight[0] = 1.0E+00 / 6.0E+00;
weight[1] = 5.0E+00 / 6.0E+00;
weight[2] = 5.0E+00 / 6.0E+00;
weight[3] = 1.0E+00 / 6.0E+00;
}
for (auto & el : std::get<0>(paw))
el = (el + 1.)/2. * h ;
for (auto & el : std::get<1>(paw))
el = el/2. * h ;
return paw;
}
template <std::size_t n>
class Basis
{
public:
constexpr Basis(const double h_) :
h(h_),
paw(compute<n>(h)),
coeffs(std::array<double,n>())
{}
const double h ;
const std::tuple<std::array<double,n>,
std::array<double,n> > paw ;
const std::array<double,n> coeffs ;
constexpr double operator () (int i, double x) const
{
return 1. ;
}
};
template <std::size_t n,std::size_t p,typename Ltype,typename number=double>
class Functor
{
public:
constexpr Functor(const Ltype L_):
L(L_)
{}
const Ltype L ;
constexpr auto operator()(const auto v) const
{
const auto l = L;
// const auto l = L();
std::array<std::array<number,p+1>,p+1> CM{},CM0{},FM{};
const auto basis = Basis<p+1>(l);
typename std::remove_const<typename std::remove_reference<decltype(v)>::type>::type w{};
for (auto i = 0u; i < p + 1; ++i)
CM0[i][0] += l;
for (auto i = 0u ; i < p+1 ; ++i)
for (auto j = 0u ; j < p+1 ; ++j)
{
w[i] += CM0[i][j]*v[j];
}
for (auto b = 1u ; b < n-1 ; ++b)
for (auto i = 0u ; i < p+1 ; ++i)
for (auto j = 0u ; j < p+1 ; ++j)
{
w[b*(p+1)+i] += CM[i][j]*v[b*(p+1)+j];
w[b*(p+1)+i] += FM[i][j]*v[(b+1)*(p+1)+j];
}
return w ;
}
};
int main(int argc,char *argv[])
{
const auto nel = 4u;
const auto p = 2u;
std::array<double,nel*(p+1)> x{} ;
constexpr auto L = 1.;
// constexpr auto L = [](){return 1.;};
const auto A = Functor<nel,p,decltype(L)>(L);
const volatile auto y = A(x);
return 0;
}
I compile using GCC 8.2.0 with the flags:
-march=native -std=c++1z -fconcepts -Ofast -Wa,-adhln
And when looking at the generated assembly, the calculation is being executed at runtime.
If I change the two lines that are commented for the lines immediately below, I find that the code is indeed being executed at compile time and just the value of the volatile variable is placed in the assembly.
I tried to generate a smaller example that reproduces the behavior but small changes in the code indeed calculate at compile time.
I somehow understand why providing constexpr lambdas helps, but I would like to understand why providing a double would not work in this case. Ideally I wouldn't like to provide lambdas because it makes my frontend messier.
This code is part of a very large code base, so please disregard what the code is actually calculating, I created this example to show the behavior and nothing more.
What would be the right way to provide a double to the functor and store it as a const member variable without changing the compile-time behavior?
Why do small modifications, for instance in the compute() function (other small changes have the same effect), make the code evaluate at compile time?
I would like to understand the actual conditions under which GCC performs these compile-time calculations, as the application I am working on requires them.
Thanks!

I'm not sure I understand when your code is executed at run time and when at compile time. In any case, the rule of the C++ language (not only g++, and ignoring the as-if rule) is that a constexpr function
can be executed at run time, and must be executed at run time when it computes values known only at run time (for example: values coming from standard input);
can be executed at compile time, and must be executed at compile time when the result goes where a compile-time known value is strictly required (for example: initialization of a constexpr variable, non-type template arguments, C-style array dimensions, static_assert() tests).
There is also a grey area: when the compiler knows the values involved in the computation at compile time, but the computed value does not go where a compile-time value is strictly required, the compiler can choose whether to compute it at compile time or at run time.
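The three cases can be illustrated with a minimal sketch. The names square, Tag, square_of_runtime_value and grey_area are made up for this illustration and do not come from the question:

```cpp
constexpr int square(int v) { return v * v; }

// Compile time is required: the result is used as a non-type template
// argument, in a static_assert, to initialize a constexpr variable, and
// as a C-style array dimension.
template <int N> struct Tag { static constexpr int value = N; };
static_assert(Tag<square(3)>::value == 9, "must be a constant expression");
constexpr int nine = square(3);
int buffer[square(2)] = {};  // array of 4 ints

// Run time is required: the argument is only known when the program runs.
int square_of_runtime_value(int runtime_value) { return square(runtime_value); }

// Grey area: the argument is a constant, but nothing demands a constant
// result, so the compiler may evaluate the call at compile time or at run time.
int grey_area() {
    int r = square(5);
    return r;
}
```

With optimizations enabled, compilers usually fold the grey-area call as well, but the language does not promise it.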
If you're interested in
const volatile auto y = A(x);
it seems to me we are in the grey area, and the compiler can choose whether to compute the initial value of y at compile time or at run time.
If you want y to be initialized at compile time, I suppose you can obtain this by defining it (and also the preceding variables) as constexpr:
constexpr auto nel = 4u;
constexpr auto p = 2u;
constexpr std::array<double,nel*(p+1)> x{} ;
constexpr auto L = 1.;
// constexpr auto L = [](){return 1.;};
constexpr auto A = Functor<nel,p,decltype(L)>(L);
constexpr volatile auto y = A(x);
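The same idea in a reduced form, as a sketch only: Scale is a hypothetical stand-in for the question's Functor (it is not the original class), and C++17 is assumed. It stores a double as a const member, and declaring the result constexpr is what obliges the compiler to evaluate the call during compilation:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the question's Functor: keeps a double as a
// const member and multiplies every element of the input by it.
struct Scale {
    constexpr explicit Scale(double factor_) : factor(factor_) {}
    const double factor;

    template <std::size_t N>
    constexpr std::array<double, N> operator()(const std::array<double, N>& v) const {
        std::array<double, N> w{};
        for (std::size_t i = 0; i < N; ++i)
            w[i] = factor * v[i];  // writing through operator[] is constexpr since C++17
        return w;
    }
};

constexpr std::array<double, 3> x{1.0, 2.0, 3.0};
constexpr Scale A{2.0};

// constexpr forces constant evaluation of A(x); with a plain const the
// compiler would be free to do the work at run time.
constexpr auto y = A(x);
static_assert(y[0] == 2.0 && y[2] == 6.0, "evaluated at compile time");
```

If the static_assert compiles, the call was necessarily evaluated by the compiler, independently of optimization flags.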

for (auto i = 0u; i < p + 1; ++i)
CM0[i][0] += l;
when l is a stateless lambda type, this converts l to a function pointer type, then to bool (an integral type). This two-step conversion is allowed because only one of the two steps is "user defined".
This conversion always produces 1, and does not depend on the state of l.
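The conversion chain can be checked in isolation. A minimal sketch, assuming C++17 (for constexpr lambdas); it demonstrates only the closure-to-bool conversion, not the += expression from the question, and the name as_bool is made up here:

```cpp
// A captureless lambda; what it returns is irrelevant to the conversion.
constexpr auto l = [] { return 0.0; };

// Closure -> function pointer (the user-defined conversion), then
// function pointer -> bool (a standard conversion). A function pointer
// obtained this way is never null, so the result is always true.
constexpr bool as_bool = l;

static_assert(as_bool, "the conversion yields true");
static_assert(l() == 0.0, "calling the lambda is a different operation");
```

Some compilers warn that the address will never be null, which is a useful hint that a call such as l() was probably intended.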

Related

Generating prime numbers at compile time

I am interested in how you can generate an array of prime numbers at compile time (I believe that the only way is using metaprogramming (in C++, not sure how this works in other languages)).
Quick note, I don't want to just say int primes[x] = {2, 3, 5, 7, 11, ...};, since I want to use this method in competitive programming, where source files cannot be larger than 10KB. So this rules out any pregenerated arrays of more than a few thousand elements.
I know that you can generate the fibonacci sequence at compile time for example, but that is rather easy, since you just add the 2 last elements. For prime numbers, I don't really know how to do this without loops (I believe it is possible, but I don't know how, using recursion I guess), and I don't know how loops could be evaluated at compile-time.
So I'm looking for an idea (at least) on how to approach this problem, maybe even a short example

We can precompute some prime numbers at compile time and put them in a compile-time generated array, then use a simple lookup mechanism to get a value. This works only for a small number of primes, but it should show you the basic mechanism.
We will first define a straightforward approach for calculating a prime number as constexpr functions:
constexpr bool isPrime(size_t n) noexcept {
if (n <= 1) return false;
for (size_t i = 2; i*i <= n; i++) if (n % i == 0) return false; // <= so that squares such as 4 and 9 are rejected
return true;
}
constexpr unsigned int primeAtIndex(size_t i) noexcept {
size_t k{3};
for (size_t counter{}; counter < i; ++k)
if (isPrime(k)) ++counter;
return k-1;
}
With that, prime numbers can easily be calculated at compile time. Then, we fill a std::array with all prime numbers. We use also a constexpr function and make it a template with a variadic parameter pack.
We use std::index_sequence to create a prime number for indices 0,1,2,3,4,5, ....
That is straightforward and not complicated:
// Some helper to create a constexpr std::array initialized by a generator function
template <typename Generator, size_t ... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
return std::array<decltype(std::declval<Generator>()(size_t{})), sizeof...(Indices) > { generator(Indices)... };
}
This function will be fed with an index sequence 0,1,2,3,4,... and a generator function and return a std::array<return type of generator function, ...> with the corresponding numbers, calculated by the generator.
Next, we write a function that calls the above with the index sequence 0,1,2,...,Size-1, like so:
template <size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
return generateArrayHelper(generator, std::make_index_sequence<Size>());
}
And now, finally,
constexpr auto Primes = generateArray<100>(primeAtIndex);
will give us a compile-time std::array<unsigned int, 100> with the name Primes containing the first 100 prime numbers. And if we need the i'th prime number, then we can simply write Primes[i]. There will be no calculation at runtime.
I do not think that there is a faster way to calculate the n'th prime number.
Please see the complete program below:
#include <iostream>
#include <utility>
#include <array>
// All done during compile time -------------------------------------------------------------------
constexpr bool isPrime(size_t n) noexcept {
if (n <= 1) return false;
for (size_t i = 2; i*i <= n; i++) if (n % i == 0) return false; // <= so that squares such as 4 and 9 are rejected
return true;
}
constexpr unsigned int primeAtIndex(size_t i) noexcept {
size_t k{3};
for (size_t counter{}; counter < i; ++k)
if (isPrime(k)) ++counter;
return k-1;
}
// Some helper to create a constexpr std::array initialized by a generator function
template <typename Generator, size_t ... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
return std::array<decltype(std::declval<Generator>()(size_t{})), sizeof...(Indices) > { generator(Indices)... };
}
template <size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
return generateArrayHelper(generator, std::make_index_sequence<Size>());
}
// This is the definition of a std::array<unsigned int, 100> with prime numbers in it
constexpr auto Primes = generateArray<100>(primeAtIndex);
// End of: All done during compile time -----------------------------------------------------------
// Some debug test driver code
int main() {
for (const auto p : Primes) std::cout << p << ' '; std::cout << '\n';
return 0;
}
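One way to confirm that such a table really is built during compilation is to check a few entries with static_assert. The sketch below repeats the helpers with a smaller table (25 entries, named SmallPrimes here purely for illustration) and uses i * i <= n in isPrime so that perfect squares are rejected; C++17 is assumed:

```cpp
#include <array>
#include <cstddef>
#include <utility>

constexpr bool isPrime(std::size_t n) noexcept {
    if (n <= 1) return false;
    // <= so that perfect squares such as 4, 9 and 25 are rejected
    for (std::size_t i = 2; i * i <= n; ++i)
        if (n % i == 0) return false;
    return true;
}

// Index 0 yields 2, index 1 yields 3, and so on.
constexpr unsigned int primeAtIndex(std::size_t i) noexcept {
    std::size_t k{3};
    for (std::size_t counter{}; counter < i; ++k)
        if (isPrime(k)) ++counter;
    return static_cast<unsigned int>(k - 1);
}

template <typename Generator, std::size_t... Indices>
constexpr auto generateArrayHelper(Generator generator, std::index_sequence<Indices...>) {
    return std::array<decltype(std::declval<Generator>()(std::size_t{})), sizeof...(Indices)>{
        generator(Indices)...};
}

template <std::size_t Size, typename Generator>
constexpr auto generateArray(Generator generator) {
    return generateArrayHelper(generator, std::make_index_sequence<Size>());
}

// Smaller table than in the answer, to keep compile-time work low.
constexpr auto SmallPrimes = generateArray<25>(primeAtIndex);

static_assert(SmallPrimes.size() == 25, "25 entries");
static_assert(SmallPrimes[0] == 2, "first prime");
static_assert(SmallPrimes[2] == 5, "4 is not reported as prime");
static_assert(SmallPrimes[24] == 97, "25th prime");
```

A static_assert can only read values that exist at compile time, so a successful build is itself the proof.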
By the way, the generateArray functionality will of course also work with other generator functions.
If you need for example triangle numbers, then you could use:
constexpr size_t getTriangleNumber(size_t row) noexcept {
size_t sum{};
for (size_t i{ 1u }; i <= row; i++) sum += i;
return sum;
}
and
constexpr auto TriangleNumber = generateArray<100>(getTriangleNumber);
would give you a compile time calculated constexpr std::array<size_t, 100> with triangle numbers.
For Fibonacci numbers you could use
constexpr unsigned long long getFibonacciNumber(size_t index) noexcept {
unsigned long long f1{ 0ull }, f2{ 1ull }, f3{};
while (index--) { f3 = f2 + f1; f1 = f2; f2 = f3; }
return f2;
}
and
constexpr auto FibonacciNumber = generateArray<93>(getFibonacciNumber);
to get ALL Fibonacci numbers that fit in a 64 bit value.
So, a rather flexible helper.
Caveat
Large array sizes can make the compiler run out of heap space.
Developed and tested with Microsoft Visual Studio Community 2019, Version 16.8.2.
Additionally compiled and tested with clang11.0 and gcc10.2
Language: C++17

The following is just to give you something to start with. It heavily relies on recursively instantiating types, which isn't quite efficient and I would not want to see in the next iteration of the implementation.
div is a divisor of x iff x % div == 0:
template <int div,int x>
struct is_divisor_of : std::conditional< (x % div != 0), std::false_type, std::true_type>::type {};
A number x is not prime if there is a p with 1 < p < x that is a divisor of x:
template <int x,int p=x-2>
struct has_divisor : std::conditional< is_divisor_of<p,x>::value, std::true_type, has_divisor<x,p-1>>::type {};
If no 1 < p < x divides x then x has no divisor (and thus is prime):
template <int x>
struct has_divisor<x,1> : std::false_type {};
A main to test it:
int main()
{
std::cout << is_divisor_of<3,12>::value;
std::cout << is_divisor_of<5,12>::value;
std::cout << has_divisor<12>::value;
std::cout << has_divisor<13>::value;
}
Output:
1010
PS: You are probably better off taking the constexpr function route, as suggested in a comment. The above is just as useful as recursive templates to calculate the Fibonacci numbers (i.e. not really useful other than for demonstration ;).
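For comparison, the constexpr function route for the same check might look like the following. This is a sketch assuming C++14 or later; the function has_divisor mirrors the template's name, and is_prime is an extra helper added only for illustration:

```cpp
// True when some d with 1 < d < x divides x (trial division up to sqrt(x)).
constexpr bool has_divisor(int x) {
    for (int d = 2; d <= x / d; ++d)
        if (x % d == 0) return true;
    return false;
}

// Hypothetical convenience wrapper: numbers below 2 are not prime.
constexpr bool is_prime(int x) { return x > 1 && !has_divisor(x); }

static_assert(has_divisor(12), "12 is divisible by 2");
static_assert(!has_divisor(13), "13 has no proper divisor");
static_assert(is_prime(13) && !is_prime(1), "prime check");
```

A plain loop replaces the chain of template instantiations, which tends to compile faster and has no recursion-depth limit to worry about.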

With "simple" constexpr (this needs C++20, where std::all_of became constexpr), you might do:
template <std::size_t N>
constexpr void fill_next_primes(std::array<std::size_t, N>& a, std::size_t n)
{
std::size_t i = (a[n - 1] & ~0x1) + 1;
while (!std::all_of(a.begin(), a.begin() + n, [&i](int e){ return i % e != 0; })) {
i += 2;
}
a[n] = i;
}
template <std::size_t N>
constexpr std::array<std::size_t, N> make_primes_array()
{
// use constexpr result
// to ensure to compute at compile time,
// even if `make_primes_array` is not called in constexpr context
constexpr auto res = [](){
std::array<std::size_t, N> res{2};
for (std::size_t i = 1; i != N; ++i) {
fill_next_primes(res, i);
}
return res;
}();
return res;
}

generic-template function always returning integer values

I am writing the linear interpolation function below, which is meant to be generic, but the current result is not.
The function finds the desired quantity of equally spaced points between two given boundary points. Both the desired quantity and the boundaries are given as parameters. A vector of linearly interpolated values is returned.
The issue I have concerns the return type, which always appears to be an integer, even when it should have a fractional part, for example:
vec = interpolatePoints(5, 1, 4);
for (auto val : vec) std::cout << val << std::endl; // prints 4, 3, 2, 1
But it should have printed: 4.2, 3.4, 2.6, 1.8
What should I do to make it generic and have correct return values?
code:
template <class T>
std::vector<T> interpolatePoints(T lower_limit, T high_limit, const unsigned int quantity) {
auto step = ((high_limit - lower_limit)/(double)(quantity+1));
std::vector<T> interpolated_points;
for(unsigned int i = 1; i <= quantity; i++) {
interpolated_points.push_back((std::min(lower_limit, high_limit) + (step*i)));
}
return interpolated_points;
}

After some simplifications the function might look like:
template<typename T, typename N, typename R = std::common_type_t<double, T>>
std::vector<R> interpolate(T lo_limit, T hi_limit, N n) {
const auto lo = static_cast<R>(lo_limit);
const auto hi = static_cast<R>(hi_limit);
const auto step = (hi - lo) / (n + 1);
std::vector<R> pts(n);
const auto gen = [=, i = N{0}]() mutable { return lo + step * ++i; };
std::generate(pts.begin(), pts.end(), gen);
return pts;
}
The type of elements in the returned std::vector is std::common_type_t<double, T>. For int, it is double, for long double, it is long double. double looks like a reasonable default type.
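The effect of std::common_type_t on the element type can be verified directly. A small sketch, assuming C++14; step_between is a hypothetical helper that isolates the step computation and is not part of the answer's function:

```cpp
#include <type_traits>

// Element types chosen by std::common_type_t<double, T> for a few T.
static_assert(std::is_same<std::common_type_t<double, int>, double>::value,
              "int inputs are promoted to double");
static_assert(std::is_same<std::common_type_t<double, float>, double>::value,
              "float inputs are promoted to double");
static_assert(std::is_same<std::common_type_t<double, long double>, long double>::value,
              "long double is kept");

// Hypothetical helper: the step is computed in R, not in T, so integer
// limits no longer truncate it.
template <typename T, typename R = std::common_type_t<double, T>>
constexpr R step_between(T lo, T hi, unsigned n) {
    return (static_cast<R>(hi) - static_cast<R>(lo)) / (n + 1);
}

static_assert(step_between(0, 1, 1) == 0.5, "int limits still give a fractional step");
```

Converting the limits to R before subtracting is the part that matters; dividing two ints first would already have lost the fraction.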

You just have to pass correct type:
auto vec = interpolatePoints(5., 1., 4); // T deduced as double
And in C++20, you might use std::lerp, to have:
template <class T>
std::vector<T> interpolatePoints(T lower_limit, T high_limit, const unsigned int quantity) {
auto step = 1 / (quantity + 1.);
std::vector<T> interpolated_points;
for(unsigned int i = 1; i <= quantity; i++) {
interpolated_points.push_back(std::lerp(lower_limit, high_limit, step * i));
}
return interpolated_points;
}

Lambda function type creep

Let's have a look at the following code:
tbb::blocked_range<int> range(0, a.rows);
uint64_t positive = tbb::parallel_reduce(range, 0, // <- initial value
[&](const tbb::blocked_range<int>& r, uint64_t v)->uint64_t {
for (int y = r.begin(); y < r.end(); ++y) {
auto rA = a[y], rB = b[y];
for (int x = 0; x < a.cols; ++x) {
auto A = rA[x], B = rB[x];
for (int l = y; l < a.rows; ++l) {
auto rAA = a[l], rBB = b[l];
for (int m = x; m < a.cols; ++m) {
if (l == y && m == x)
continue;
auto AA = rAA[m], BB = rBB[m];
if ((A == AA) && (B == BB))
v++; // <- value is changed
if ((A != AA) && (B != BB))
v++; // <- value is changed
}
}
}
}
return v;
}, [](uint64_t first, uint64_t second)->uint64_t {
std::cerr << first << " + " << second; // <- wrong values occur
return first+second;
}
);
This is a parallel reduce operation where the initial value is 0. Then, in each parallel computation, based on the initial value, we count up (local variable v in the first lambda function). The second lambda function aggregates the results from parallel workers.
Interestingly enough, this code does not work as expected. The output of the second lambda function will show enormous figures that result from integer overflows.
The code works correctly when replacing the second line with:
uint64_t positive = tbb::parallel_reduce(range, (uint64_t)0, // <- initial value
Now I wonder. Wouldn't the definition of the first lambda (uint64_t v) enforce this cast and how can a function that is supposed to operate on uint64_t operate on int instead?
The compiler is GCC 6.

It doesn't matter what argument the lambda takes. According to the docs, everything is based on the type of the 2nd argument:
template<typename Range, typename Value,
typename Func, typename Reduction>
Value parallel_reduce( const Range& range, const Value& identity,
const Func& func, const Reduction& reduction,
[, partitioner[, task_group_context& group]] );
with pseudo-signatures of:
Value Func::operator()(const Range& range, const Value& x)
Value Reduction::operator()(const Value& x, const Value& y)
So a Value is passed into Func and into Reduction and returned. If you want uint64_ts everywhere, you'll need to ensure that Value is uint64_t. Which is why your (uint64_t)0 works but your 0 doesn't (and is actually undefined behavior to boot).
Note that this is the same problem that you would get with just normal accumulate:
std::vector<uint64_t> vs{0x7fffffff, 0x7fffffff, 0x7fffffff};
uint64_t sum = std::accumulate(vs.begin(), vs.end(), 0, std::plus<uint64_t>{});
// ^^^ oops, int 0!
// even though I'm using plus<uint64_t>!
assert(sum == 0x17ffffffd); // fails because actually sum is truncated
// and is just 0x7ffffffd
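That the accumulator type follows the initial value, and not the functor, can be checked without running anything. A sketch assuming C++14; the aliases Values and AccumulateResult and the function sum_values are names introduced only for this illustration:

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

using Values = std::vector<std::uint64_t>;

// Type returned by std::accumulate over Values for an initial value of type Init.
template <typename Init>
using AccumulateResult = decltype(std::accumulate(
    std::declval<Values&>().begin(), std::declval<Values&>().end(),
    std::declval<Init>(), std::plus<std::uint64_t>{}));

// A literal 0 makes the accumulator an int, whatever the functor works on.
static_assert(std::is_same<AccumulateResult<int>, int>::value,
              "int initial value gives an int result");
static_assert(std::is_same<AccumulateResult<std::uint64_t>, std::uint64_t>::value,
              "uint64_t initial value gives a uint64_t result");

// The fix: give the initial value the type the sum should have.
inline std::uint64_t sum_values(const Values& vs) {
    return std::accumulate(vs.begin(), vs.end(), std::uint64_t{0});
}
```

The same reasoning applies to the Value parameter of tbb::parallel_reduce, according to the signature quoted above.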

Determine `constexpr` execution - during compilation or at runtime?

Is there a way to achieve different behaviour of a constexpr function in the compilation phase and at runtime?
Consider the following example (using a theoretical feature from D: static if):
constexpr int pow( int base , int exp ) noexcept
{
static if( std::evaluated_during_translation() ) {
auto result = 1;
for( int i = 0 ; i < exp ; i++ )
result *= base;
return result;
} else { // std::evaluated_during_runtime()
return std::pow( base , exp );
}
}
If not, is there a way to restrict constexpr to be compile-time only?

No, there is no such way.
Sorry.
N3583 is a paper proposing changes to allow what you are asking for.

Prior to C++20, this wasn't possible. C++20 then added std::is_constant_evaluated which is exactly for this use case:
constexpr int pow(int base, int exp) noexcept
{
if (std::is_constant_evaluated())
{
auto result = 1;
for (int i = 0; i < exp; i++)
result *= base;
return result;
}
else
{
return std::pow(base, exp);
}
}
Note that the if statement itself is not constexpr. If it were, the whole else arm would be removed from the function and it would always run the if arm, no matter if at compile time or runtime. With a normal if statement, you basically get two functions. One that runs at compile time:
constexpr int pow(int base, int exp) noexcept
{
auto result = 1;
for (int i = 0; i < exp; i++)
result *= base;
return result;
}
and one that gets compiled and runs at runtime:
constexpr int pow(int base, int exp) noexcept
{
return std::pow(base, exp);
}
The compiler can safely remove the if arm because it can prove that it isn't reachable at runtime. Pretty neat.

Compute the sum of absolute values with stl algorithms

I would like to use the algorithms of the <numeric> header to compute the sum of the absolute values of an array, in order to use the GNU parallel extensions (the array size is > 500000).
Here is my current code :
double ret = 0;
for (auto i = 0U; i < length; ++i)
{
ret += std::abs(tab[i]);
}
return ret;
So I thought about doing :
auto sumabs = [] (double a, double b)
{
return std::abs(a) + std::abs(b);
};
// 0.0, not 0: the type of the initial value is the type of the accumulator
std::accumulate(tab, tab + length, 0.0, sumabs);
But it is inefficient because if a reduction algorithm is performed (which I sincerely hope for the sake of fast computation!), std::abs will be applied to values which are already >= 0.
So is there any way to do this? Perhaps by performing the first step of the reduction "by hand" and letting std::accumulate do a simple addition over the rest? But there would be a copy and a memory hit...

You can pass a function to std::accumulate and perform the "by hand" evaluation inside that function. By the way, in your code you apply abs to the first parameter, which is not necessary.
int fAccumulate (int accumulated, int accumulateIncrement)
{
int retValue = 0;
if (accumulateIncrement >= 0)
{
retValue = accumulated + accumulateIncrement;
}
else
{
retValue = accumulated + std::abs(accumulateIncrement);
}
return retValue;
}
The use of this code could be:
int init = 0;
int numbers[] = {10,20,-30};
int a = std::accumulate (numbers, numbers+3, init, fAccumulate);

This will use the minimum number of calls to std::abs necessary:
#include <cmath>
#include <iterator>
#include <numeric>
int main() {
static const auto abssum = [] (auto x, auto y) {return x + std::abs(y);};
float entries[4] = {1.0f, 2.0f, 3.0f, 4.0f};
auto sum = std::accumulate(std::begin(entries), std::end(entries), 0.0f, abssum);
}
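A different technique from the answers above, offered only as a sketch: C++17 added std::transform_reduce, which separates the per-element step from the reduction, so std::abs is applied exactly once per element and partial sums are combined by plain addition. This is not the GNU parallel mode mentioned in the question, and abs_sum is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>

// Sum of |tab[i]| for i in [0, length).
// Transform step: std::abs, once per element.
// Reduce step: plain addition of already non-negative values.
double abs_sum(const double* tab, std::size_t length) {
    assert(tab != nullptr || length == 0);
    return std::transform_reduce(tab, tab + length, 0.0, std::plus<>{},
                                 [](double v) { return std::abs(v); });
}
```

C++17 also defines overloads of std::transform_reduce that take an execution policy such as std::execution::par as the first argument; whether they actually run in parallel, and which extra library they need, depends on the standard library in use.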
