I want to write a simple polynomial class that takes an array of coefficients and expands it into a function at compile time, so I don't need to loop over the coefficients at run time. I want to do something like this:
template <PARAM_TYPE, PARAMS>
class P {
public:
PARAM_TYPE eval(PARAM_TYPE p){
//Does PARAMS[0] * pow(p, PARAMS.length() -1) + ... + PARAMS[N-1]
}
}
Sample call
P<double,{2,4,3}> quadratic;
quadratic.eval(5); //returns 73
I don't want to be doing the loop since that will take time. Ideally I want to be able to form the expression above at compile time. Is this possible? Thanks
Here is an example of doing what you want. Whether the compiler folds all of the code into constants depends, as far as I have noticed, on how the function is used and on which compiler you use.
#include <type_traits>
template<class T, unsigned Exponent>
inline constexpr typename std::enable_if<Exponent == 0, T>::type
pow2(const T base)
{
return 1;
}
template<class T, unsigned Exponent>
inline constexpr typename std::enable_if<Exponent % 2 != 0, T>::type
pow2(const T base)
{
return base * pow2<T, (Exponent-1)/2>(base) * pow2<T, (Exponent-1)/2>(base);
}
template<class T, unsigned Exponent>
inline constexpr typename std::enable_if<Exponent != 0 && Exponent % 2 == 0, T>::type
pow2(const T base)
{
return pow2<T, Exponent / 2>(base) * pow2<T, Exponent / 2>(base);
}
template<typename ParamType>
inline constexpr ParamType polynomial(const ParamType&, const ParamType& c0)
{
return c0;
}
template<typename ParamType, typename Coeff0, typename ...Coeffs>
inline constexpr ParamType polynomial(const ParamType& x, const Coeff0& c0, const Coeffs& ...cs)
{
return (static_cast<ParamType>(c0) * pow2<ParamType, sizeof...(cs)>(x)) + polynomial(x, static_cast<ParamType>(cs)...);
}
unsigned run(unsigned x)
{
return polynomial(x, 2, 4, 3);
}
double run(double x)
{
return polynomial(x, 2, 4, 3);
}
unsigned const_unsigned()
{
static const unsigned value = polynomial(5, 2, 4, 3);
return value;
}
double const_double()
{
static const double value = polynomial(5, 2, 4, 3);
return value;
}
EDIT: I have updated the code to use a tweaked version of pow2<>() that aggressively performs calculations at compile time. This version optimizes so well at -O2 that it actually surprised me. The generated assembly for the full program can be inspected to confirm this. If all arguments are constant, the compiler will generate the entire constant value at compile time. If the first argument is runtime-dependent, it still generates very tight code.
(Thanks to @dyp on this question for the inspiration for pow2)
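For comparison, here is a minimal sketch that stays closer to the class syntax in the question. It is not the code from the answer above: it assumes C++17 (for the fold expression) and integral coefficients, because floating-point non-type template arguments are only allowed from C++20 on. The class name P comes from the question; everything else is illustrative.

```cpp
// Hypothetical class mirroring the question's P<...>. The coefficients are
// integral non-type template arguments, highest degree first.
template <typename T, T... Coeffs>
struct P {
    static_assert(sizeof...(Coeffs) > 0, "need at least one coefficient");

    template <typename X>
    static constexpr X eval(X x)
    {
        X acc = 0;
        // One Horner step per coefficient, expanded by a C++17 comma fold.
        ((acc = acc * x + static_cast<X>(Coeffs)), ...);
        return acc;
    }
};

// 2*x^2 + 4*x + 3 at x = 5
static_assert(P<int, 2, 4, 3>::eval(5) == 73, "integer argument");
static_assert(P<int, 2, 4, 3>::eval(5.0) == 73.0, "floating-point argument");
```

The fold contains no loop in the source, but whether the generated code differs from an unrolled loop is still up to the optimizer.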
To evaluate a polynomial, a good algorithm is Horner's method (see https://en.wikipedia.org/wiki/Horner%27s_method). The main idea is to compute the polynomial recursively. Let $P$ be a polynomial of degree $n$ with coefficients $a_i$. Define the sequence $P_0 = a_n$ and $P_k = P_{k-1} x_0 + a_{n-k}$ for $k = 1, \dots, n$; it is easy to see that $P(x_0) = P_n$.
So let's implement this algorithm using constexpr functions:
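#include <iostream>
// Base case: only the highest-degree coefficient is left.
template<class T>
constexpr double horner(double, T an) { return an; }
// Recursive case: coefficients are passed lowest degree first.
template<class U, class... T>
constexpr double horner(double x, U a0, T... a) { return horner(x, a...) * x + a0; }
int main()
{
std::cout << horner(5., 1, 2, 1) << std::endl; // 1 + 2*5 + 1*25
// Compiler-dependent check: prints 1 only under the pre-C++17 rule that a constant-expression call is noexcept.
std::cout << noexcept(horner(5., 1, 2, 1)) << std::endl;
}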
As you see, it is really easy to implement the evaluation of a polynomial with constexpr functions using the recursive formula.
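As a worked instance of the recurrence, take the quadratic from the original question, $2x^2 + 4x + 3$ at $x_0 = 5$, so that $n = 2$, $a_2 = 2$, $a_1 = 4$ and $a_0 = 3$:

```latex
\begin{aligned}
P_0 &= a_2 = 2 \\
P_1 &= P_0 x_0 + a_1 = 2 \cdot 5 + 4 = 14 \\
P_2 &= P_1 x_0 + a_0 = 14 \cdot 5 + 3 = 73 = P(5)
\end{aligned}
```

Because horner() takes the constant term first, this evaluation corresponds to the call horner(5., 3, 4, 2).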
I am writing an SI unit class that uses integer template arguments for the dimensions of the type. I am trying to write a power function, but I am running into the issue that the exponent is not necessarily known until runtime and therefore the template arguments are not either.
Some code is omitted in the examples
template <int m = 0, int s = 0, int kg = 0, int A = 0, int K = 0, int mol = 0, int cd = 0>
struct Dimension{
template <typename D, Ratio R> friend class Unit;
public:
real value;
template <int m2, int s2, int kg2, int A2, int K2, int mol2, int cd2>
constexpr Dimension<m+m2, s+s2, kg+kg2, A+A2, K+K2, mol+mol2, cd+cd2> operator*(Dimension<m2, s2, kg2, A2, K2, mol2, cd2> const & rhs) const {return value*rhs.value;}
//Power function here
};
The multiplication operator works because the result type depends on only template parameters.
None of the following work for the power operator because they all have the template argument depend on a runtime parameter.
constexpr auto operator^(int n) const {return Dimension<m*n, s*n, kg*n, A*n, K*n, mol*n, cd*n>{std::pow(value, n)};}
template <int N>
constexpr Dimension<m*N, s*N, kg*N, A*N, K*N, mol*N, cd*N> pow() const {return std::pow(value, N);}
constexpr auto operator^(int n) const {
if constexpr (n == 0) return Dimension<0, 0, 0, 0, 0, 0, 0>{std::pow(value, 0)};
else if constexpr (n == 1) return Dimension<m, s, kg, A, K, mol, cd>{std::pow(value, 1)};
else if constexpr (n == 2) return Dimension<m*2, s*2, kg*2, A*2, K*2, mol*2, cd*2>{std::pow(value, 2)};
//...
// I can reasonably expect a range smaller than [-10, 10], but theoretically I should be able to have any n
}
Sorry if there are minor code mistakes, I have written them directly.
Are there any solutions that don't resort to macros that would let me solve this?
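One possible direction, offered only as a sketch: the result type must be fixed at compile time, so a genuinely runtime exponent cannot select it, but a compile-time exponent can be passed to an operator through the type of its argument. The example below assumes C++14 and uses a simplified stand-in, Dim, with two dimensions and a plain double in place of the omitted real type; Dim and the variable names are hypothetical.

```cpp
#include <type_traits>

// Simplified stand-in for the question's Dimension: only metres and seconds,
// and a plain double instead of the omitted `real` type (both are assumptions).
template <int m = 0, int s = 0>
struct Dim {
    double value;

    // The exponent is a template argument, so the result type is fixed at compile time.
    template <int N>
    constexpr Dim<m * N, s * N> pow() const
    {
        double r = 1.0;
        for (int i = 0; i < (N < 0 ? -N : N); ++i) r *= value;
        return Dim<m * N, s * N>{N < 0 ? 1.0 / r : r};
    }

    // Operator spelling: the exponent travels in the type of the argument.
    template <int N>
    constexpr auto operator^(std::integral_constant<int, N>) const
    {
        return pow<N>();
    }
};

constexpr Dim<1, 0> side{3.0};
constexpr auto area = side ^ std::integral_constant<int, 2>{};
static_assert(std::is_same<decltype(area), const Dim<2, 0>>::value, "m^2");
static_assert(area.value == 9.0, "3 * 3");
```

The call site has to spell the exponent as a type, which is the price of keeping the dimensions in the type system; operator^ also keeps its low built-in precedence, so expressions usually need parentheses.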
class Whatever {
public:
// doThing overloads:
template <typename T>
inline static T doThing(T t, float n) {
/* It's a SmoothStartN function in my code,
but don't worry about the specifics.
Includes a for loop up to n times
(result gets interpolated between non-integer ns). */
return whatever;
}
template <unsigned int n, typename T>
inline static T doThing(T t) {
/* Same as the other one, except now the compiler can
unroll the for loop if appropriate.
Or so I assume, anyway; I might be wrong. */
return whatever;
}
// doMoreComplexThing overloads:
template <unsigned int n, typename T>
inline static T doMoreComplexThing(T t1, T t2) {
float halfN = ((float)n) * 0.5f;
return (doThing(t1, halfN) * doThing(t2, halfN));
}
};
My problem: doMoreComplexThing() currently has to use the presumably-less-well-optimised version of doThing() in all cases. However, in half of all cases, where n is even, it can be evenly divided into integers and thus the more efficient template-uint version is viable.
How could I set this up so that, at compile time, doMoreComplexThing() detects whether n is even and uses the appropriate overload? Is such a thing possible? For that matter, is it likely any more performant to bother with this, or should I just stick with the float overload?
Answer: Thanks to Quentin's suggestion, I believe a good solution looks something like this:
template <unsigned int n, typename T>
inline static T doMoreComplexThing(T t1, T t2) {
if constexpr((n % 2u) == 0u) {
constexpr unsigned int halfN = n / 2u; // must be constexpr to be usable as a template argument
return (doThing<halfN>(t1) * doThing<halfN>(t2));
}
else {
float halfN = ((float)n) * 0.5f;
return (doThing(t1, halfN) * doThing(t2, halfN));
}
}
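To make the dispatch easy to try out, here is a self-contained sketch, assuming C++17. The two doThing bodies are stand-ins (a plain power function), since the real SmoothStartN code is not shown in the question; only the if constexpr selection is the point.

```cpp
#include <cmath>

// Stand-in for the runtime-exponent overload (the real body is omitted in the question).
template <typename T>
T doThing(T t, float n)
{
    return static_cast<T>(std::pow(t, n));
}

// Stand-in for the compile-time overload: n is a template argument.
template <unsigned int n, typename T>
constexpr T doThing(T t)
{
    T r = 1;
    for (unsigned int i = 0; i < n; ++i) r *= t;
    return r;
}

template <unsigned int n, typename T>
constexpr T doMoreComplexThing(T t1, T t2)
{
    if constexpr (n % 2u == 0u) {
        // Even n: halfN is a constant expression, so the template overload is usable.
        constexpr unsigned int halfN = n / 2u;
        return doThing<halfN>(t1) * doThing<halfN>(t2);
    } else {
        // Odd n: fall back to the float overload; this branch is discarded for even n.
        const float halfN = static_cast<float>(n) * 0.5f;
        return doThing(t1, halfN) * doThing(t2, halfN);
    }
}

// Even n is evaluated entirely at compile time: 2^2 * 3^2.
static_assert(doMoreComplexThing<4>(2.0, 3.0) == 36.0, "even n uses the template overload");
```

Whether this is faster than always calling the float overload depends on the real function bodies and the optimizer, so it is worth measuring.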
Good day,
I tried to implement a class that can add everything to an internally stored variable _val; see below:
#include <utility>
#include <vector>
#include <cassert>
#include <type_traits>
template <typename T>
class Addable {
T _val;
public:
explicit Addable(T v) :_val(std::move(v)) {}
template <typename ...Us>
[[nodiscard]] constexpr T add(Us&& ...us) const
{
return (_val + ... + us);
}
template<typename U>
[[nodiscard]] constexpr T add(U u) const
{
if constexpr (std::is_same_v<T, std::vector<U>>) {
auto copy = _val;
for (auto& n : copy) {
n += u;
}
return copy;
}
else {
return _val + u;
}
}
};
int main()
{
using namespace std;
assert(Addable<int>{42}.add() == 42);
assert(Addable<int>{42}.add(1) == 43);
assert(Addable<int>{42}.add(1, 1) == 44);
assert(Addable<int>{2}.add(1, 1, 1, 1, 1) == 7);
{
vector v {2, 3};
vector expected {3, 4};
assert(Addable<vector<int>>{v}.add(1) == expected);
}
{
vector v {2, 3};
vector expected {5, 6};
// assert(Addable<vector<int>>{v}.add(1, 2) == expected); // compile error...
}
return 0;
}
The class works:
- with a fold expression for a simple T such as int here.
- with a T such as std::vector, but only for a single U u.
When I try to add each value from the variadic pack to every vector element, it breaks. What did I do wrong?
This one overload handles all your test cases:
template <typename ...Us>
[[nodiscard]] constexpr T add(Us&& ...us) const
{
if constexpr (!sizeof...(us))
{
return _val;
}
else if constexpr (std::is_same_v<T, std::vector< std::common_type_t<Us...> > > )
{ // [2]
auto copy = _val;
for (auto& i : copy)
i += (us + ...);
return copy;
}
else
{
return (_val + ... + us);
}
}
The [2] block, which is taken when T is a vector, iterates over all items in copy and adds to each one the result of folding the input arguments, (us + ...).
When I try to add each value from the variadic pack to every vector element, it breaks. What did I do wrong?
I don't understand exactly what you want... anyway
With
Addable<vector<int>>{v}.add(1, 2)
you call add() with two arguments.
You have two versions of add(): the variadic one and the one that receives one argument.
So, when you call it with two arguments, only the variadic one matches
template <typename ...Us>
[[nodiscard]] constexpr T add(Us&& ...us) const
{
return (_val + ... + us);
}
but the operator + in _val + ... + us, where _val is a std::vector<int> and the us... values are ints, is undefined. Hence the error.
If you call add() with a single argument, for example
Addable<vector<int>>{v}.add(2)
the code compiles (calling the add() version that handles the std::vector case) but, obviously, the assert() fails when you run the compiled program.
This poly_eval function will compute the result of evaluating a polynomial with a particular set of coefficients at a particular value of x. For example, poly_eval(5, 1, -2, -1) computes x^2 - 2x - 1 with x = 5. It's all constexpr so if you give it constants it will compute the answer at compile time.
It currently uses recursive templates to build the polynomial evaluation expression at compile time and relies on C++14 to be constexpr. I was wondering if anybody could think of a good way to remove the recursive template, perhaps using C++17. The code that exercises the template uses the __uint128_t type from clang and gcc.
#include <type_traits>
#include <utility>
template <typename X_t, typename Coeff_1_T>
constexpr auto poly_eval_accum(const X_t &x, const Coeff_1_T &c1)
{
return ::std::pair<X_t, Coeff_1_T>(x, c1);
}
template <typename X_t, typename Coeff_1_T, typename... Coeff_TList>
constexpr auto poly_eval_accum(const X_t &x, const Coeff_1_T &c1, const Coeff_TList &... coeffs)
{
const auto &tmp_result = poly_eval_accum(x, coeffs...);
auto saved = tmp_result.second + tmp_result.first * c1;
return ::std::pair<X_t, decltype(saved)>(tmp_result.first * x, saved);
}
template <typename X_t, typename... Coeff_TList>
constexpr auto poly_eval(const X_t &x, const Coeff_TList &... coeffs)
{
static_assert(sizeof...(coeffs) > 0,
"Must have at least one coefficient.");
return poly_eval_accum(x, coeffs...).second;
}
// This is just a test function to exercise the template.
__uint128_t multiply_lots(__uint128_t num, __uint128_t n2)
{
const __uint128_t cf = 5;
return poly_eval(cf, num, n2, 10);
}
// This is just a test function to exercise the template to make sure
// it computes the result at compile time.
__uint128_t eval_const()
{
return poly_eval(5, 1, -2, 1);
}
Also, am I doing anything wrong here?
-------- Comments on Answers --------
There are two excellent answers down below. One is clear and terse, but may not handle certain situations involving complex types (expression trees, matrices, etc.) well, though it does a fair job. It also relies on the somewhat obscure comma operator.
The other is less terse, but still much clearer than my original recursive template, and it handles types just as well. It expands out to 'cn + x * (cn-1 + x * (cn-2 ...' whereas my recursive version expands out to cn + x * cn-1 + x * x * cn-2 .... For most reasonable types they should be equivalent, and the answer can easily be modified to expand out to what my recursive one expands to.
I picked the first answer because it was 1st and its terseness is more within the spirit of my original question. But, if I were to choose a version for production, I'd choose the second.
Using the power of comma operator (and C++17 folding, obviously), I suppose you can write poly_eval() as follows
template <typename X_t, typename C_t, typename ... Cs_t>
constexpr auto poly_eval (X_t const & x, C_t a, Cs_t const & ... cs)
{
( (a *= x, a += cs), ..., (void)0 );
return a;
}
throwing away poly_eval_accum().
Observe that the first coefficient is now an explicit parameter, so you can also delete the static_assert(); it is passed by copy and becomes the accumulator.
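To see what the comma fold does, the sketch below puts it next to a hand-written expansion for two trailing coefficients. It assumes C++17; the names poly_eval_fold and poly_eval_expanded are illustrative and not part of the answer.

```cpp
// Same shape as the fold-based poly_eval above, under a different name.
template <typename X, typename C, typename... Cs>
constexpr auto poly_eval_fold(X const& x, C a, Cs const&... cs)
{
    ((a *= x, a += cs), ..., (void)0);
    return a;
}

// What the fold amounts to when the pack holds exactly c1 and c2.
constexpr long poly_eval_expanded(long x, long a, long c1, long c2)
{
    a *= x; a += c1;  // first pack element
    a *= x; a += c2;  // second pack element
    return a;
}

// x^2 - 2x - 1 at x = 5, coefficients highest degree first.
static_assert(poly_eval_fold(5L, 1L, -2L, -1L) == 14, "fold");
static_assert(poly_eval_expanded(5, 1, -2, -1) == 14, "manual expansion");
```

The comma operator sequences its operands left to right, so the steps run in pack order. The return type is the type of the first coefficient, because that parameter serves as the accumulator.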
-- EDIT --
Added an alternative version that solves the problem of the return type using a decltype() of an expression, as the OP suggested; in this version the first coefficient is a constant reference again.
template <typename X_t, typename C_t, typename ... Cs_t>
constexpr auto poly_eval (X_t const & x, C_t const & c1, Cs_t const & ... cs)
{
decltype(((x * c1) + ... + (x * cs))) ret { c1 };
( (ret *= x, ret += cs), ..., (void)0 );
return ret;
}
-- EDIT 2 --
Bonus answer: it's possible to avoid the recursion in C++14 as well, using the power of the comma operator (again) and initializing an unused C-style array of integers
template <typename X_t, typename C_t, typename ... Cs_t>
constexpr auto poly_eval (X_t const & x, C_t const & a, Cs_t const & ... cs)
{
using unused = int[];
std::common_type_t<decltype(x * a), decltype(x * cs)...> ret { a };
(void)unused { 0, (ret *= x, ret += cs)... };
return ret;
}
A great answer is supplied above, but it requires a common return type and will therefore not work if you are, say, building a compile time expression tree.
What we need is some way to have a fold expression that both does the multiply with the value at the evaluation point x and add a coefficient at each iteration, in order to eventually end up with an expression like: (((c0) * x + c1) * x + c2) * x + c3. This is (I think) not possible with a fold expression directly, but we can define a special type that overloads a binary operator and does the necessary calculations.
template<class M, class T>
struct MultiplyAdder
{
M mul;
T acc;
constexpr MultiplyAdder(M m, T a) : mul(m), acc(a) { }
};
template<class M, class T, class U>
constexpr auto operator<<(const MultiplyAdder<M,T>& ma, const U& u)
{
return MultiplyAdder(ma.mul, ma.acc * ma.mul + u);
}
template <typename X_t, typename C_t, typename... Coeff_TList>
constexpr auto poly_eval(const X_t &x, const C_t &a, const Coeff_TList &... coeffs)
{
return (MultiplyAdder(x, a) << ... << coeffs).acc;
}
As a bonus, this solution also ticks C++17's 'automatic class template argument deduction' box ;)
Edit: Oops, argument deduction wasn't working inside MultiplyAdder<>::operator<<(), because MultiplyAdder refers to its own template-id rather than its template-name. I've added a namespace specifier, but that unfortunately makes it dependent on its own namespace. There must be a way to refer to its actual template-name, but I can't think of any without resorting to template aliases.
Edit2: Fixed it by making operator<<() a non-member.
(This question has been dramatically edited from the original, without changing the real intent of the original question)
If we add up all the elements in a vector<int>, then the answer could overflow, requiring something like intmax_t to store the answer accurately and without overflow. But intmax_t isn't suitable for vector<double>.
I could manually specify the types:
template<typename>
struct sum_traits;
template<>
struct sum_traits<int> {
typedef long accumulate_safely_t;
};
and then use them as follows:
template<typename C>
auto sum(const C& c) {
typename sum_traits<typename C::value_type>::accumulate_safely_t r = 0;
for(auto &el: c)
r += el;
return r;
}
My questions: Is it possible to automatically identify a suitable type, a large and accurate type, so I don't have to manually specify each one via the type trait?
The main problem with your code is that auto r = 0 is equivalent to int r = 0. That's not how your C++98 code worked. In general, you can't find a perfect target type. Your code is just a variant of std::accumulate, so we can look at how the Standard solved this problem: it allows you to pass in the initial value for the accumulator, but also its type: long sum = std::accumulate(begin, end, long{0});
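A minimal sketch of that point: the type of the third argument to std::accumulate is the accumulator type. The helper names are illustrative, and long long is used here instead of long as an assumption, since long is only 32 bits on some platforms.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// The accumulator has the type of the third argument, so 0LL makes it long long.
long long sum_wide(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Usage check: where int is 32 bits this total does not fit in an int,
// but it does fit in long long.
void sum_wide_example()
{
    const std::vector<int> v{2000000000, 2000000000};
    assert(sum_wide(v) == 4000000000LL);
}
```

Passing a plain 0 instead would make the accumulator an int and bring back the overflow problem.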
Given:
If we add up all the elements in a vector, then the answer could overflow, requiring something like intmax_t to store the answer accurately and without overflow.
Question:
My questions: Is it possible to automatically identify a suitable type, a large and accurate type, so I don't have to manually specify each one via the type trait?
The problem here is that you want to take runtime data (a vector) and from it deduce a type (a compile-time thing).
Since type deduction is a compile-time operation, we must use only the information available to us at compile time to make this decision.
The only information we have at compile-time (unless you supply more) is std::numeric_limits<int>::max() and std::numeric_limits<std::vector<int>::size_type>::max().
You don't even have std::vector<int>::max_size() at this stage, as it's not mandated to be constexpr. Neither can you rely on std::vector<int>::allocator_type::max_size() because it's:
- a member function
- optional
- deprecated in C++17
So what we're left with is a maximum possible sum of:
std::numeric_limits<int>::max() * std::numeric_limits<std::vector<int>::size_type>::max()
We could now use a compile-time selection (something involving std::conditional) to find an appropriate integer type, if such a type exists.
This doesn't make the type adapt to runtime conditions, but it will at least adapt to the architecture for which you're compiling.
Something like this:
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <limits>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>
template <bool Signed, unsigned long long NofBits>
struct smallest_integer
{
template<std::size_t Bits, class...Candidates>
struct select_candidate;
template<std::size_t Bits, class...Candidates>
using select_candidate_t = typename select_candidate<Bits, Candidates...>::type;
template<std::size_t Bits, class Candidate, class...Rest>
struct select_candidate<Bits, Candidate, Rest...>
{
using type = std::conditional_t<std::numeric_limits<Candidate>::digits >= Bits, Candidate, select_candidate_t<Bits, Rest...>>;
};
template<std::size_t Bits, class Candidate>
struct select_candidate<Bits, Candidate>
{
using type = std::conditional_t<std::numeric_limits<Candidate>::digits >= Bits, Candidate, void>;
};
using type =
std::conditional_t<Signed,
select_candidate_t<NofBits, std::int8_t, std::int16_t, std::int32_t, std::int64_t, __int128_t>,
select_candidate_t<NofBits, std::uint8_t, std::uint16_t, std::uint32_t, std::uint64_t, __uint128_t>>;
};
template<bool Signed, unsigned long long NofBits> using smallest_integer_t = typename smallest_integer<Signed, NofBits>::type;
template<class L, class R>
struct result_of_multiply
{
static constexpr auto lbits = std::numeric_limits<L>::digits;
static constexpr auto rbits = std::numeric_limits<R>::digits;
static constexpr auto is_signed = std::numeric_limits<L>::is_signed or std::numeric_limits<R>::is_signed;
static constexpr auto result_bits = lbits + rbits;
using type = smallest_integer_t<is_signed, result_bits>;
};
template<class L, class R> using result_of_multiply_t = typename result_of_multiply<L, R>::type;
struct safe_multiply
{
template<class L, class R>
auto operator()(L const& l, R const& r) const -> result_of_multiply_t<L, R>
{
return result_of_multiply_t<L, R>(l) * result_of_multiply_t<L, R>(r);
}
};
template<class T>
auto accumulate_values(const std::vector<T>& v)
{
using result_type = result_of_multiply_t<T, decltype(std::declval<std::vector<T>>().max_size())>;
return std::accumulate(v.begin(), v.end(), result_type(0), std::plus<>());
}
struct uint128_t_printer
{
std::ostream& operator()(std::ostream& os) const
{
auto n = n_;
if (n == 0) return os << '0';
char str[40] = {0}; // log10(1 << 128) + '\0'
char *s = str + sizeof(str) - 1; // start at the end
while (n != 0) {
*--s = "0123456789"[n % 10]; // save last digit
n /= 10; // drop it
}
return os << s;
}
__uint128_t n_;
};
std::ostream& operator<<(std::ostream& os, const uint128_t_printer& p)
{
return p(os);
}
auto output(__uint128_t n)
{
return uint128_t_printer{n};
}
int main()
{
using rtype = result_of_multiply<std::size_t, unsigned>;
std::cout << rtype::is_signed << std::endl;
std::cout << rtype::lbits << std::endl;
std::cout << rtype::rbits << std::endl;
std::cout << rtype::result_bits << std::endl;
std::cout << std::numeric_limits<rtype::type>::digits << std::endl;
std::vector<int> v { 1, 2, 3, 4, 5, 6 };
auto z = accumulate_values(v);
std::cout << output(z) << std::endl;
auto i = safe_multiply()(std::numeric_limits<unsigned>::max(), std::numeric_limits<unsigned>::max());
std::cout << i << std::endl;
}
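A much smaller variant of the same idea, as a sketch only: instead of computing the number of bits needed, pick the widest standard type of the same kind as the element type. It assumes C++14; widest_t and sum_widest are hypothetical names, and the result can still overflow for large enough inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical trait: the widest standard type of the same "kind" as T.
template <typename T>
using widest_t = std::conditional_t<
    std::is_floating_point<T>::value, long double,
    std::conditional_t<std::is_signed<T>::value, std::intmax_t, std::uintmax_t>>;

// Sums a container into the widened type selected from its value_type.
template <typename C>
auto sum_widest(const C& c)
{
    widest_t<typename C::value_type> r = 0;
    for (const auto& el : c) r += el;
    return r;
}

void sum_widest_example()
{
    const std::vector<int> ints{2000000000, 2000000000};
    assert(sum_widest(ints) == 4000000000LL);  // would not fit in a 32-bit int

    const std::vector<double> reals{0.5, 0.25};
    static_assert(std::is_same<decltype(sum_widest(reals)), long double>::value, "");
    assert(sum_widest(reals) == 0.75L);
}
```

This removes the need for one trait specialization per element type, at the cost of always paying for the widest type.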
You can use return type deduction in C++14 just like this:
template<typename C>
auto sum(const C& c) {
auto r = 0;
for(auto &el: c)
r += el;
return r;
}
In C++11, considering your C++98 code, you may use the following:
template<typename C>
auto sum(const C& c) -> typename C::value_type {
auto r = 0;
for(auto &el: c)
r += el;
return r;
}
But, as pointed out in the comments, auto r = 0; will still resolve to int at compile time. As proposed in another answer, you may want to make the initial value type (and so the return value type) a template parameter as well:
template<typename C, typename T>
T sum(const C& c, T init) {
for(auto &el: c)
init += el;
return init;
}
// usage
std::vector<std::string> v({"Hello ", "World ", "!!!"});
std::cout << sum(v, std::string{});