I am writing my own library for a university project, containing the template classes Vector and Matrix. In addition to these template classes, there are also related template functions for vectors and matrices. The professor explicitly told us to define the matrix as a one-dimensional array in which the elements are stored column by column (column-major order), for reasons of efficiency. The Matrix template class has 3 template parameters: the element type, the number of rows, and the number of columns.
template <class T, unsigned int M, unsigned int N>
class Matrix
Having said that, here is the problem. I'm writing a function that calculates the determinant of any square matrix larger than 3 x 3, using the Laplace expansion along the first column.
I also wrote a function for 2 x 2 matrices (called D2MatrixDet) and a function for 3 x 3 matrices (called D3MatrixDet), both tested and working:
template <class T>
double D2MatrixDet(const Matrix<T, 2, 2>& _m)
template <class T>
double D3MatrixDet(const Matrix<T, 3, 3>& _m)
The template function that I have to write has two template parameters: data type of the input matrix, dimension of the matrix (since the determinant is calculated for square matrices, only one dimension is enough). It is a recursive function; the variable "result" is the one that keeps the determinant in memory at each step. Below, the code I wrote.
template <class T, unsigned int D>
void DNMatrixDet(Matrix<T, D, D> _m, double result) //LaPlace Rule respect to the first column
{
const unsigned int new_D = D - 1;
Matrix<T, new_D, new_D> temp;
if (D > 3)
{
for (unsigned int i = 0; i < _m.row; ++i)
//Indicate the element to multiply
{
for (unsigned int j = _m.row, l = 0; j < _m.row * _m.column && l < pow(new_D, 2); ++j)
//Manage the element to be inserted in temp
{
bool invalid_row = false;
for (unsigned int k = 1; k < _m.row && invalid_row == false; ++k) //Slide over row
{
if (j == (i + k * _m.row))
{
invalid_row = true;
}
}
if (invalid_row == false)
{
temp.components[l] = _m.components[j];
++l;
}
}
DNMatrixDet(temp, result);
result += pow((-1), i) * _m.components[i] * result;
}
}
else if (D == 3)
{
result += D3MatrixDet(_m);
}
}
In main, I test the function using a 5 x 5 matrix.
When I try to compile, several errors come out, all very similar and that have to do with the size of the matrix which is decreased by one at each step. This is when the initial matrix size is 5 (LA is the name of the library and Test.cpp is the file that contains the main):
LA.h: In instantiation of 'void LA::DNMatrixDet(LA::Matrix<T, M, M>, double) [with T = double;
unsigned int D = 5]':
Test.cpp:437:33: required from here
LA.h:668:34: error: no matching function for call to 'D3MatrixDet(LA::Matrix<double, 5, 5>&)'
result += D3MatrixDet(_m);
~~~~~~~~~~~^~~~
In file included from Test.cpp:1:
LA.h:619:12: note: candidate: 'template<class T> double LA::D3MatrixDet(const LA::Matrix<T, 3, 3>&)'
double D3MatrixDet(const Matrix<T, 3, 3>& _m)
^~~~~~~~~~~
LA.h:619:12: note: template argument deduction/substitution failed:
In file included from Test.cpp:1:
LA.h:668:34: note: template argument '5' does not match '3'
result += D3MatrixDet(_m);
~~~~~~~~~~~^~~~
This is when the size becomes 4:
LA.h: In instantiation of 'void LA::DNMatrixDet(LA::Matrix<T, M, M>, double) [with T = double;
unsigned int D = 4]':
LA.h:662:28: required from 'void LA::DNMatrixDet(LA::Matrix<T, M, M>, double) [with T = double;
unsigned int D = 5]'
Test.cpp:437:33: required from here
LA.h:668:34: error: no matching function for call to 'D3MatrixDet(LA::Matrix<double, 4, 4>&)'
In file included from Test.cpp:1:
LA.h:619:12: note: candidate: 'template<class T> double LA::D3MatrixDet(const LA::Matrix<T, 3, 3>&)'
double D3MatrixDet(const Matrix<T, 3, 3>& _m)
^~~~~~~~~~~
LA.h:619:12: note: template argument deduction/substitution failed:
In file included from Test.cpp:1:
LA.h:668:34: note: template argument '4' does not match '3'
result += D3MatrixDet(_m);
~~~~~~~~~~~^~~~
And so on. It keeps going down until starting over at 4294967295 (which I found to be the upper limit of a 32 bit "unsigned int") and continuing to go down until I reach the maximum number of template instances (= 900).
At each iteration, the compiler always checks the call to the function that calculates the determinant of a 3 x 3 matrix, even though that function is only executed when the input matrix is 3 x 3. So why does it check something that in theory should never happen?
I double-checked the mathematical logic of what I wrote several times, even with the help of a matrix written on paper and slowly carrying out the first steps. I believe and hope it is right. I'm pretty sure the problem has to do with using templates and recursive function.
I apologize for the very long question, I tried to explain it in the best possible way. I hope I have well explained the problem.
EDIT:
Fixed the problem by using "if constexpr" at the beginning of the DNMatrixDet function. The compilation is successful. I just need to fix the algorithm, but this is beyond the scope of the post. Below is the reprex with the changes made:
#include <iostream>
template <class T, unsigned int M, unsigned int N>
class Matrix
{
public:
T components[M * N];
unsigned int row = M;
unsigned int column = N;
Matrix()
{
for (unsigned int i = 0; i < M * N; ++i)
{
components[i] = 1;
}
}
Matrix(T* _c)
{
for (unsigned int i = 0; i < M * N; ++i, ++_c)
{
components[i] = *_c;
}
}
friend std::ostream& operator<<(std::ostream& output, const Matrix& _m)
{
output << _m.row << " x " << _m.column << " matrix:" << std::endl;
for (unsigned int i = 0; i < _m.row; ++i)
{
for (unsigned int j = 0; j < _m.column; ++j)
{
if (j == _m.column -1)
{
output << _m.components[i + j*_m.row];
}
else
{
output << _m.components[i + j*_m.row] << "\t";
}
}
output << std::endl;
}
return output;
}
};
template <class T>
double D3MatrixDet(const Matrix<T, 3, 3>& _m)
{
double result = _m.components[0] * _m.components[4] * _m.components[8] +
_m.components[3] * _m.components[7] * _m.components[2] +
_m.components[6] * _m.components[1] * _m.components[5] -
(_m.components[6] * _m.components[4] * _m.components[2] +
_m.components[3] * _m.components[1] * _m.components[8] +
_m.components[0] * _m.components[7] * _m.components[5]);
return result;
}
template <class T, unsigned int D>
void DNMatrixDet(Matrix<T, D, D> _m, double result)
{
Matrix<T, D - 1, D - 1> temp;
if constexpr (D > 3)
{
for (unsigned int i = 0; i < D; ++i)
{
for (unsigned int j = D, l = 0; j < D * D && l < (D - 1) * (D - 1); ++j)
{
bool invalid_row = false;
for (unsigned int k = 1; k < D && invalid_row == false; ++k)
{
if (j == (i + k * D))
{
invalid_row = true;
}
}
if (invalid_row == false)
{
temp.components[l] = _m.components[j];
++l;
}
}
DNMatrixDet(temp, result);
result += (i & 1 ? -1 : 1) * _m.components[i] * result;
}
}
else if (D == 3)
{
result += D3MatrixDet(_m);
}
}
int main()
{
double m_start[25] = {4, 9, 3, 20, 7, 10, 9, 50, 81, 7, 20, 1, 36, 98, 4, 20, 1, 8, 5, 93, 47, 21, 49, 36, 92};
Matrix<double, 5, 5> m = Matrix<double, 5, 5> (m_start);
double m_det = 0;
DNMatrixDet(m, m_det);
std::cout << "m is " << m << std::endl;
std::cout << "Det of m is " << m_det << std::endl;
return 0;
}
When you pass _m with the type Matrix<T, 5, 5>, the trailing else branch still contains the statement result += D3MatrixDet(_m);. With a plain if, the compiler instantiates both branches for every D, so it tries to compile this call and finds no matching overload of D3MatrixDet for a 5 x 5 matrix. For the same reason the recursive call DNMatrixDet(temp, result) keeps being instantiated for smaller and smaller D, past 0 and around to 4294967295, because nothing stops the recursion at compile time.
Since D is known at compile time, we can use if constexpr instead. Inside a template, the branch that is not taken is discarded and never instantiated.
So let's change if (D > 3) to if constexpr (D > 3).
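For the algorithm itself, here is a minimal sketch of a Laplace expansion along the first column that returns its result instead of accumulating into a by-value parameter. It assumes C++17 and a column-major layout; the stripped-down Matrix struct, the at() accessor and the name laplaceDet are stand-ins invented for this example, not the library from the question.

```cpp
#include <cassert>

// Minimal column-major matrix, standing in for the Matrix class in the question.
template <class T, unsigned int M, unsigned int N>
struct Matrix {
    T components[M * N] = {};
    // Element at row r, column c in column-major storage.
    const T& at(unsigned int r, unsigned int c) const {
        assert(r < M && c < N);
        return components[r + c * M];
    }
    T& at(unsigned int r, unsigned int c) {
        assert(r < M && c < N);
        return components[r + c * M];
    }
};

// Determinant by Laplace expansion along the first column (hypothetical helper).
template <class T, unsigned int D>
double laplaceDet(const Matrix<T, D, D>& m)
{
    if constexpr (D == 1) {
        return m.at(0, 0);
    } else if constexpr (D == 2) {
        return m.at(0, 0) * m.at(1, 1) - m.at(0, 1) * m.at(1, 0);
    } else {
        double result = 0.0;
        for (unsigned int i = 0; i < D; ++i) {
            // Build the submatrix obtained by dropping row i and column 0.
            Matrix<T, D - 1, D - 1> sub;
            for (unsigned int c = 1; c < D; ++c) {
                unsigned int rr = 0;
                for (unsigned int r = 0; r < D; ++r) {
                    if (r == i) continue;
                    sub.at(rr++, c - 1) = m.at(r, c);
                }
            }
            const double sign = (i % 2 == 0) ? 1.0 : -1.0;
            result += sign * m.at(i, 0) * laplaceDet(sub);
        }
        return result;
    }
}
```

Because every branch is an if constexpr, instantiating laplaceDet for D = 1 never touches Matrix<T, 0, 0>, so the compile-time recursion stops on its own.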
Related
I am writing the linear interpolation function below. It is meant to be generic, but the current result is not.
The function computes a requested number of equally spaced points between two given boundary points. Both the number of points and the boundaries are passed as parameters, and a vector of linearly interpolated values is returned.
My issue is with the return type: the values always come out as integers, even when they should have a fractional part. For example:
auto vec = interpolatePoints(5, 1, 4);
for (auto val : vec) std::cout << val << std::endl; // prints 4, 3, 2, 1
But it should have printed: 4.2, 3.4, 2.6, 1.8
What should I do to make it generic and have correct return values?
code:
template <class T>
std::vector<T> interpolatePoints(T lower_limit, T high_limit, const unsigned int quantity) {
auto step = ((high_limit - lower_limit)/(double)(quantity+1));
std::vector<T> interpolated_points;
for(unsigned int i = 1; i <= quantity; i++) {
interpolated_points.push_back((std::min(lower_limit, high_limit) + (step*i)));
}
return interpolated_points;
}
After some simplifications the function might look like:
template<typename T, typename N, typename R = std::common_type_t<double, T>>
std::vector<R> interpolate(T lo_limit, T hi_limit, N n) {
const auto lo = static_cast<R>(lo_limit);
const auto hi = static_cast<R>(hi_limit);
const auto step = (hi - lo) / (n + 1);
std::vector<R> pts(n);
const auto gen = [=, i = N{0}]() mutable { return lo + step * ++i; };
std::generate(pts.begin(), pts.end(), gen);
return pts;
}
The type of elements in the returned std::vector is std::common_type_t<double, T>. For int, it is double, for long double, it is long double. double looks like a reasonable default type.
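To make the type deduction concrete, here is a self-contained sketch that repeats the interpolate function from above with the headers it needs and checks the element type at compile time. It assumes C++14; the static_assert lines are additions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

template<typename T, typename N, typename R = std::common_type_t<double, T>>
std::vector<R> interpolate(T lo_limit, T hi_limit, N n) {
    const auto lo = static_cast<R>(lo_limit);
    const auto hi = static_cast<R>(hi_limit);
    const auto step = (hi - lo) / (n + 1);
    std::vector<R> pts(n);
    // The generator keeps its own counter and returns lo + step * 1, 2, ...
    const auto gen = [=, i = N{0}]() mutable { return lo + step * ++i; };
    std::generate(pts.begin(), pts.end(), gen);
    return pts;
}

// int limits produce double elements; long double limits stay long double.
static_assert(std::is_same<decltype(interpolate(5, 1, 4))::value_type,
                           double>::value, "int inputs give double");
static_assert(std::is_same<decltype(interpolate(5.0L, 1.0L, 4))::value_type,
                           long double>::value, "long double is preserved");
```

With integer arguments, interpolate(5, 1, 4) yields four values that step from 5 toward 1 in increments of -0.8.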
You just have to pass the correct type:
auto vec = interpolatePoints(5., 1., 4); // T deduced as double
And in C++20, you can use std::lerp (declared in <cmath>):
template <class T>
std::vector<T> interpolatePoints(T lower_limit, T high_limit, const unsigned int quantity) {
auto step = 1 / (quantity + 1.);
std::vector<T> interpolated_points;
for(unsigned int i = 1; i <= quantity; i++) {
interpolated_points.push_back(std::lerp(lower_limit, high_limit, step * i));
}
return interpolated_points;
}
I'm working on a deep-learning project in which I've written some tests to evaluate net weights in a neural net. The code for evaluate_net_weight looks like this:
/*! Compute the loss of the net as a function of the weight at index (i,j) in
* layer l. dx is added as an offset to the current value of the weight. */
//______________________________________________________________________________
template <typename Architecture>
auto evaluate_net_weight(TDeepNet<Architecture> &net, std::vector<typename Architecture::Matrix_t> & X,
const typename Architecture::Matrix_t &Y, const typename Architecture::Matrix_t &W, size_t l,
size_t k, size_t i, size_t j, typename Architecture::Scalar_t xvalue) ->
typename Architecture::Scalar_t
{
using Scalar_t = typename Architecture::Scalar_t;
Scalar_t prev_value = net.GetLayerAt(l)->GetWeightsAt(k).operator()(i,j);
net.GetLayerAt(l)->GetWeightsAt(k).operator()(i,j) = xvalue;
Scalar_t res = net.Loss(X, Y, W, false, false);
net.GetLayerAt(l)->GetWeightsAt(k).operator()(i,j) = prev_value;
//std::cout << "compute loss for weight " << xvalue << " " << prev_value << " result " << res << std::endl;
return res;
}
and the function is being called as follows:
// Testing input gate: input weights k = 0
auto &Wi = layer->GetWeightsAt(0);
auto &dWi = layer->GetWeightGradientsAt(0);
for (size_t i = 0; i < (size_t) Wi.GetNrows(); ++i) {
for (size_t j = 0; j < (size_t) Wi.GetNcols(); ++j) {
auto f = [&lstm, &XArch, &Y, &weights, i, j](Scalar_t x) {
return evaluate_net_weight(lstm, XArch, Y, weights, 0, 0, i, j, x);
};
ROOT::Math::Functor1D func(f);
double dy = deriv.Derivative1(func, Wi(i,j), 1.E-5);
Double_t dy_ref = dWi(i,j);
// Compute relative error if dy != 0
Double_t error;
std::string errorType;
if (std::fabs(dy_ref) > 1e-15) {
error = std::fabs((dy - dy_ref) / dy_ref);
errorType = "relative";
} else {
error = std::fabs(dy - dy_ref);
errorType = "absolute";
}
if (debug) std::cout << "Input Gate: input weight gradients (" << i << "," << j << ") : (comp, ref) " << dy << ", " << dy_ref << std::endl;
if (error >= maximum_error) {
maximum_error = error;
maxErrorType = errorType;
}
}
}
XArch holds my inputs, Y holds the predictions, and lstm is the network. These have already been defined.
When I try to build the program using cmake, I get this error:
/Users/harshitprasad/Desktop/gsoc-rnn/root/tmva/tmva/test/DNN/RNN/TestLSTMBackpropagation.h:385:24: error:
no matching function for call to 'evaluate_net_weight'
return evaluate_net_weight(lstm, XArch, Y, weights, 0, 2, i, j, x);
^~~~~~~~~~~~~~~~~~~
/Users/harshitprasad/Desktop/gsoc-rnn/root/tmva/tmva/test/DNN/RNN/TestLSTMBackpropagation.h:67:6: note:
candidate function [with Architecture = TMVA::DNN::TReference<double>] not viable: no known
conversion from 'Scalar_t' (aka 'TMatrixT<double>') to 'typename TReference<double>::Scalar_t'
(aka 'double') for 9th argument
auto evaluate_net_weight(TDeepNet<Architecture> &net, std::vector<typename Architecture::Matr...
I'm not able to figure out why this error is happening. It would be great if anyone could help me with this issue. Thanks!
You probably have different, conflicting definitions of the type alias Scalar_t in different scopes.
The error message shows that the function expects a typename TReference<double>::Scalar_t (which is double), but the argument you pass has the type Scalar_t as it is visible at the call site (perhaps defined in an enclosing or the global scope), where it is an alias for TMatrixT<double>. No conversion from TMatrixT<double> to double exists, hence the error, as Some programmer dude mentioned.
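The clash can be reproduced, and avoided, in a few lines. The sketch below is only an illustration: FakeMatrix, FakeArchitecture, twice and callThroughLambda are hypothetical stand-ins, not TMVA types or functions, and the assumption is that an outer Scalar_t alias exists at the call site as the error message suggests.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real types, only to reproduce the name clash.
struct FakeMatrix { double value; };
struct FakeArchitecture {
    using Scalar_t = double;
};

// A conflicting alias visible in the enclosing scope.
using Scalar_t = FakeMatrix;

template <typename Architecture>
typename Architecture::Scalar_t twice(typename Architecture::Scalar_t x)
{
    return 2 * x;
}

template <typename Architecture>
double callThroughLambda(double w)
{
    // Re-declaring the alias locally makes Scalar_t name the architecture's
    // scalar type. Without this line the lambda parameter below would be a
    // FakeMatrix, and the call to twice would fail to compile.
    using Scalar_t = typename Architecture::Scalar_t;
    auto f = [](Scalar_t x) { return twice<Architecture>(x); };
    return f(w);
}
```

An equivalent fix is to spell the lambda parameter as typename Architecture::Scalar_t directly, which avoids relying on whichever alias happens to be in scope.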
I'm writing a templated matrix class using C++14. This class has three template parameters: the type of data stored (dtype), the number of rows (N) and the number of columns (M).
The class signature is
template<class dtype, size_t N, size_t M>
class Matrix
I've written a determinant member function that calls specific cases when a template parameter has a certain value. For example, when the number of rows is 1 it returns a copy of the matrix itself. Alternatively, when the number of rows is 2 or 3 it returns a 1x1 matrix of the same datatype with the determinant. Finally, when the number of rows is more than 3 it uses a recursive method to calculate the determinant based on the cofactor expansion of the determinant.
I am doing this as an exercise to better learn C++14 so I'd be very grateful for some help.
The code snippet causing issues is this part right here:
Matrix<dtype, 1, 1> det() const {
if (N != M || N >= 12) {
return Matrix<dtype, 1, 1>();
} else if (N == 1) {
return this->copy();
} else if (N == 2) {
return Matrix<dtype, 1, 1>(this->get(0, 0) * this->get(1, 1) - this->get(0, 1) * this->get(1, 0));
} else if (N == 3) {
return Matrix<dtype, 1, 1>(
this->get(0, 0) * (this->get(1, 1) * this->get(2, 2) - this->get(1, 2) * this->get(2, 1)) -
this->get(0, 1) * (this->get(1, 0) * this->get(2, 2) - this->get(1, 2) * this->get(2, 0)) +
this->get(0, 2) * (this->get(1, 0) * this->get(2, 1) - this->get(1, 1) * this->get(2, 0)));
} else if (N < 12) {
Matrix<dtype, 1, 1> determinant;
Matrix<dtype, N - 1, N - 1> sub_matrix;
for (size_t i = 0; i < N; ++i) {
sub_matrix = this->drop_cross(i, i);
Matrix<dtype, 1, 1> sub_det(sub_matrix.det());
if (i % 2 == 0) determinant = determinant + (this->get(0, i) * sub_det);
else if (i % 2 == 1) determinant = determinant - (this->get(0, i) * sub_det);
}
return determinant;
}
}
This function is called by this code:
#include "lin_alg_classes.h"
int main() {
Matrix<double, 3, 3> test3(1.0, true);
std::cout << std::endl;
std::cout << test3.det();
return 0;
}
And gives the following output:
In file included from C:\Users\ekin4\CLionProjects\mt_grav\main.cpp:5:0:
C:\Users\ekin4\CLionProjects\mt_grav\lin_alg_classes.h: In instantiation of 'Matrix<dtype, 1ull, 1ull> Matrix<dtype, N, M>::det() const [with dtype = double; long long unsigned int N = 3ull; long long unsigned int M = 3ull]':
C:\Users\ekin4\CLionProjects\mt_grav\main.cpp:29:28: required from here
C:\Users\ekin4\CLionProjects\mt_grav\lin_alg_classes.h:132:31: error: could not convert 'Matrix<dtype, N, M>::copy<double, 3ull, 3ull>()' from 'Matrix<double, 3ull, 3ull>' to 'Matrix<double, 1ull, 1ull>'
return this->copy();
What I don't understand is why it is calling the N = 1 case when it should be calling the N < 12 case. I have checked braces, parentheses and semicolons, and they are all correct, but for the life of me I don't understand what is happening.
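A regular if is evaluated at run time, so every branch is still compiled for every N and M; for a 3 x 3 matrix, return this->copy(); must therefore convert a Matrix<double, 3, 3> to a Matrix<double, 1, 1>, which fails. Before C++17 (which introduced if constexpr) you can use SFINAE to enable or disable different versions of det() according to the values of N and M.
Something like this (sorry: not tested):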
template <std::size_t A = N, std::size_t B = M>
std::enable_if_t<(A != B) || (A > 11U), Matrix<dtype, 1, 1>> det() const
{ return Matrix<dtype, 1, 1>(); }
template <std::size_t A = N, std::size_t B = M>
std::enable_if_t<(A == B) && (A == 1U), Matrix<dtype, 1, 1>> det() const
{ return this->copy(); }
// other cases
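A fuller sketch of the same SFINAE idea, valid in C++14, is shown below. The Mat struct, its row-major layout and the get/set helpers are assumptions made up for this example (the asker's class is not shown in full), and det() returns a plain T rather than a 1 x 1 matrix to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical minimal matrix (row-major), not the asker's class.
template <class T, std::size_t N, std::size_t M>
struct Mat {
    T v[N * M] = {};

    T get(std::size_t r, std::size_t c) const {
        assert(r < N && c < M);
        return v[r * M + c];
    }
    void set(std::size_t r, std::size_t c, T x) {
        assert(r < N && c < M);
        v[r * M + c] = x;
    }

    // Only the overload whose condition holds takes part in overload resolution.
    template <std::size_t A = N, std::size_t B = M>
    std::enable_if_t<(A == B) && (A == 1), T> det() const {
        return get(0, 0);
    }

    template <std::size_t A = N, std::size_t B = M>
    std::enable_if_t<(A == B) && (A == 2), T> det() const {
        return get(0, 0) * get(1, 1) - get(0, 1) * get(1, 0);
    }

    // General case: cofactor expansion along the first row.
    template <std::size_t A = N, std::size_t B = M>
    std::enable_if_t<(A == B) && (A > 2), T> det() const {
        T result = 0;
        for (std::size_t col = 0; col < A; ++col) {
            Mat<T, A - 1, B - 1> sub;  // drop row 0 and column col
            for (std::size_t r = 1; r < A; ++r) {
                std::size_t cc = 0;
                for (std::size_t c = 0; c < B; ++c) {
                    if (c == col) continue;
                    sub.set(r - 1, cc++, get(r, c));
                }
            }
            const T term = get(0, col) * sub.det();
            result += (col % 2 == 0) ? term : -term;
        }
        return result;
    }
};
```

For a non-square Mat no overload is enabled, so calling det() is a compile-time error instead of a run-time fallback.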
I am trying to optimize my math calculation code base and I found the piece of code below.
This piece of code computes a matrix multiplication. However, I don't understand how an enum can be used for calculation here. Cnt is a template parameter declared in
template <int I=0, int J=0, int K=0, int Cnt=0>
and somehow we can still do
Cnt = Cnt + 1
Could anyone give me a quick tutorial on how this could be happening?
Thanks
template <int I=0, int J=0, int K=0, int Cnt=0> class MatMult
{
private :
enum
{
Cnt = Cnt + 1,
Nextk = Cnt % 4,
Nextj = (Cnt / 4) % 4,
Nexti = (Cnt / 16) % 4,
go = Cnt < 64
};
public :
static inline void GetValue(D3DMATRIX& ret, const D3DMATRIX& a, const D3DMATRIX& b)
{
ret(I, J) += a(K, J) * b(I, K);
MatMult<Nexti, Nextj, Nextk, Cnt>::GetValue(ret, a, b);
}
};
// specialization to terminate the loop
template <> class MatMult<0, 0, 0, 64>
{
public :
static inline void GetValue(D3DMATRIX& ret, const D3DMATRIX& a, const D3DMATRIX& b) { }
};
Or maybe I should ask more specifically: how do Nexti, Nextj, Nextk and Cnt get propagated to the next level when the for loop is unrolled?
thanks
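One way to see how the values propagate is a reduced sketch of the same pattern. Each instantiation computes its enumerators as compile-time constants from its own template argument and then names the next instantiation with them, so the "loop" is really a chain of distinct classes. UnrolledLoop and the visits array are hypothetical names for this illustration; standard C++ does not allow an enumerator to reuse the name of a template parameter (as Cnt = Cnt + 1 does above), so the sketch uses a distinct name, Next.

```cpp
#include <cassert>

// Primary template: one "iteration" per instantiation.
template <int Cnt = 0>
struct UnrolledLoop {
    enum {
        Next = Cnt + 1,      // counter passed on to the next instantiation
        K = Cnt % 4,         // innermost index
        J = (Cnt / 4) % 4,   // middle index
        I = (Cnt / 16) % 4   // outermost index
    };
    static void run(int (&visits)[4][4][4]) {
        ++visits[I][J][K];                // the body of this iteration
        UnrolledLoop<Next>::run(visits);  // a different class: the next step
    }
};

// Explicit specialization: ends the recursion after 64 steps.
template <>
struct UnrolledLoop<64> {
    static void run(int (&)[4][4][4]) {}
};

// The enumerators are ordinary compile-time constants.
static_assert(UnrolledLoop<21>::I == 1 && UnrolledLoop<21>::J == 1 &&
              UnrolledLoop<21>::K == 1, "21 = 1*16 + 1*4 + 1");
```

Calling UnrolledLoop<>::run(visits) instantiates UnrolledLoop<0> through UnrolledLoop<63>, and each one touches exactly one (I, J, K) triple.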
I know power of 2 can be implemented using << operator.
What about power of 10? Like 10^5? Is there any way faster than pow(10,5) in C++? It is a pretty straight-forward computation by hand. But seems not easy for computers due to binary representation of the numbers... Let us assume I am only interested in integer powers, 10^n, where n is an integer.
Something like this:
int quick_pow10(int n)
{
static int pow10[10] = {
1, 10, 100, 1000, 10000,
100000, 1000000, 10000000, 100000000, 1000000000
};
return pow10[n];
}
Obviously, you can do the same thing for long long.
This should be several times faster than any competing method. However, it is quite limited if you have lots of bases (although the number of values goes down quite dramatically with larger bases), so if there isn't a huge number of combinations, it's still doable.
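As a sketch of the long long variant, the table below covers 10^0 through 10^18, which is every power of ten that fits in a signed 64-bit value. The function name quick_pow10_ll and the assert-based range check are additions for this example.

```cpp
#include <cassert>
#include <cstddef>

// Table lookup for 10^n with n in [0, 18] (hypothetical name).
inline long long quick_pow10_ll(unsigned int n)
{
    static constexpr long long table[] = {
        1LL,
        10LL,
        100LL,
        1000LL,
        10000LL,
        100000LL,
        1000000LL,
        10000000LL,
        100000000LL,
        1000000000LL,
        10000000000LL,
        100000000000LL,
        1000000000000LL,
        10000000000000LL,
        100000000000000LL,
        1000000000000000LL,
        10000000000000000LL,
        100000000000000000LL,
        1000000000000000000LL
    };
    constexpr std::size_t size = sizeof(table) / sizeof(table[0]);
    static_assert(size == 19, "table must hold 10^0 .. 10^18");
    // Out-of-range exponents are a caller error; catch them in debug builds.
    assert(n < size && "exponent out of table range");
    return table[n];
}
```

The range check compiles away under NDEBUG, so the release build is still a single indexed load.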
As a comparison:
#include <iostream>
#include <cstdlib>
#include <cmath>
static int quick_pow10(int n)
{
static int pow10[10] = {
1, 10, 100, 1000, 10000,
100000, 1000000, 10000000, 100000000, 1000000000
};
return pow10[n];
}
static int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
static int opt_int_pow(int n)
{
int r = 1;
const int x = 10;
while (n)
{
if (n & 1)
{
r *= x;
n--;
}
else
{
r *= x * x;
n -= 2;
}
}
return r;
}
int main(int argc, char **argv)
{
long long sum = 0;
int n = strtol(argv[1], 0, 0);
const long outer_loops = 1000000000;
if (argv[2][0] == 'a')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += quick_pow10(n);
}
}
}
if (argv[2][0] == 'b')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += integer_pow(10,n);
}
}
}
if (argv[2][0] == 'c')
{
for(long i = 0; i < outer_loops / n; i++)
{
for(int j = 1; j < n+1; j++)
{
sum += opt_int_pow(n);
}
}
}
std::cout << "sum=" << sum << std::endl;
return 0;
}
Compiled with g++ 4.6.3, using -Wall -O2 -std=c++0x, gives the following results:
$ g++ -Wall -O2 -std=c++0x pow.cpp
$ time ./a.out 8 a
sum=100000000000000000
real 0m0.124s
user 0m0.119s
sys 0m0.004s
$ time ./a.out 8 b
sum=100000000000000000
real 0m7.502s
user 0m7.482s
sys 0m0.003s
$ time ./a.out 8 c
sum=100000000000000000
real 0m6.098s
user 0m6.077s
sys 0m0.002s
(I did have an option for using pow as well, but it took 1m22.56s when I first tried it, so I removed it when I decided to have optimised loop variant)
There are certainly ways to compute integral powers of 10 faster than using std::pow()! The first realization is that pow(x, n) can be implemented in O(log n) time. The next realization is that x * 10 is the same as (x << 3) + (x << 1). Of course, the compiler knows the latter, i.e., when you multiply an integer by the integer constant 10, the compiler will do whatever is fastest to multiply by 10. Based on these two rules it is easy to create fast computations, even if x is a big integer type.
In case you are interested in games like this:
A generic O(log n) version of power is discussed in Elements of Programming.
Lots of interesting "tricks" with integers are discussed in Hacker's Delight.
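The O(log n) idea can be sketched as exponentiation by squaring. This is a minimal version assuming C++14 and unsigned 64-bit arithmetic, where overflow silently wraps modulo 2^64; the name ipow is made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Exponentiation by squaring: one loop pass per bit of the exponent.
constexpr std::uint64_t ipow(std::uint64_t base, unsigned int exp)
{
    std::uint64_t result = 1;
    while (exp != 0) {
        if (exp & 1u) {
            result *= base;  // this bit of the exponent is set
        }
        base *= base;        // base, base^2, base^4, ...
        exp >>= 1;
    }
    return result;
}

static_assert(ipow(10, 5) == 100000, "10^5");
static_assert(ipow(3, 13) == 1594323, "3^13");
```

For exponents as small as the ones relevant to powers of ten, the gain over the naive loop is modest, but the number of multiplications grows with the bit length of the exponent rather than its value.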
A solution for any base using template meta-programming :
template<int E, int N>
struct pow {
enum { value = E * pow<E, N - 1>::value };
};
template <int E>
struct pow<E, 0> {
enum { value = 1 };
};
Then it can be used to generate a lookup-table that can be used at runtime :
template<int E>
long long quick_pow(unsigned int n) {
static long long lookupTable[] = {
pow<E, 0>::value, pow<E, 1>::value, pow<E, 2>::value,
pow<E, 3>::value, pow<E, 4>::value, pow<E, 5>::value,
pow<E, 6>::value, pow<E, 7>::value, pow<E, 8>::value,
pow<E, 9>::value
};
return lookupTable[n];
}
This must be used with correct compiler flags in order to detect the possible overflows.
Usage example :
for(unsigned int n = 0; n < 10; ++n) {
std::cout << quick_pow<10>(n) << std::endl;
}
An integer power function (which doesn't involve floating-point conversions and computations) may very well be faster than pow():
int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
Edit: benchmarked - the naive integer exponentiation method seems to outperform the floating-point one by about a factor of two:
h2co3-macbook:~ h2co3$ cat quirk.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <string.h>
#include <math.h>
int integer_pow(int x, int n)
{
int r = 1;
while (n--)
r *= x;
return r;
}
int main(int argc, char *argv[])
{
int x = 0;
for (int i = 0; i < 100000000; i++) {
x += powerfunc(i, 5);
}
printf("x = %d\n", x);
return 0;
}
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=integer_pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -1945812992
real 0m1.169s
user 0m1.164s
sys 0m0.003s
h2co3-macbook:~ h2co3$ clang -Wall -o quirk quirk.c -Dpowerfunc=pow
h2co3-macbook:~ h2co3$ time ./quirk
x = -2147483648
real 0m2.898s
user 0m2.891s
sys 0m0.004s
h2co3-macbook:~ h2co3$
No multiplication and no table version:
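// Computes N * 10^n using only shifts and additions.
int Npow10(int N, int n){
N <<= n; // N * 2^n
while(n--) N += N << 2; // each pass multiplies by 5 (N + 4N), so n passes give 5^n
return N; // N * 2^n * 5^n == N * 10^n
}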
Here is a stab at it:
// specialize if you have a bignum integer like type you want to work with:
template<typename T> struct is_integer_like:std::is_integral<T> {};
template<typename T> struct make_unsigned_like:std::make_unsigned<T> {};
template<typename T, typename U>
T powT( T base, U exponent ) {
static_assert( is_integer_like<U>::value, "exponent must be integer-like" );
static_assert( std::is_same< U, typename make_unsigned_like<U>::type >::value, "exponent must be unsigned" );
T retval = 1;
T& multiplicand = base;
if (exponent) {
while (true) {
// branch prediction will be awful here, you may have to micro-optimize:
retval *= (exponent&1)?multiplicand:1;
// or /2, whatever -- `>>1` is probably faster, esp for bignums:
exponent = exponent>>1;
if (!exponent)
break;
multiplicand *= multiplicand;
}
}
return retval;
}
What is going on above is a few things.
First, so that BigNum support is cheap, it is templated. Out of the box, it supports any base type that supports *= own_type and either can be implicitly converted to int, or int can be implicitly converted to it (if both are true, problems will occur), and you need to specialize some templates to indicate that the exponent type involved is both unsigned and integer-like.
In this case, integer-like and unsigned means that it supports &1 returning bool and >>1 returning something it can be constructed from and eventually (after repeated >>1s) reaches a point where evaluating it in a bool context returns false. I used traits classes to express the restriction, because naive use with a value like -1 would compile and (on some platforms) loop forever, while (on others) it would not.
Execution time for this algorithm, assuming multiplication is O(1), is O(lg(exponent)), where lg(exponent) is the number of times the exponent must be shifted with >>1 before it evaluates as false in a boolean context. For traditional integer types, this is the binary log of the exponent's value: so no more than 32 for a 32-bit type.
I also eliminated all branches within the loop (or, made it obvious to existing compilers that no branch is needed, more precisely), with just the control branch (which is true uniformly until it is false once). Possibly eliminating even that branch might be worth it for high bases and low exponents...
Now, with constexpr (C++14 or later, since the function contains a loop), you can do it like so:
constexpr int pow10(int n) {
int result = 1;
for (int i = 1; i<=n; ++i)
result *= 10;
return result;
}
int main () {
int i = pow10(5);
}
In practice the call is folded at compile time here (declaring the variable as constexpr int i would guarantee it). ASM generated for x86-64 gcc 9.2:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 100000
mov eax, 0
pop rbp
ret
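If compile-time evaluation must be guaranteed rather than left to the optimizer, the result can be bound to a constexpr variable or checked with static_assert. The sketch below assumes C++14; the function is renamed ipow10 here (a made-up name) because some C libraries have declared a non-standard pow10 of their own.

```cpp
#include <cassert>

// Same loop as above, under a name that cannot clash with a library pow10.
constexpr int ipow10(int n)
{
    int result = 1;
    for (int i = 1; i <= n; ++i)
        result *= 10;
    return result;
}

// A constexpr variable must be initialized by a constant expression,
// so this line only compiles if ipow10(5) is evaluated at compile time.
constexpr int kHundredThousand = ipow10(5);
static_assert(kHundredThousand == 100000, "10^5");
```

The same function can still be called with a run-time argument, in which case it runs as an ordinary loop.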
You can use a lookup table, which will be by far the fastest.
You can also consider using this:
template <typename T>
T expt(T p, unsigned q)
{
T r(1);
while (q != 0) {
if (q % 2 == 1) { // q is odd
r *= p;
q--;
}
p *= p;
q /= 2;
}
return r;
}
This function will calculate x ^ y much faster than pow, in the case of integer values.
int pot(int x, int y){
int solution = 1;
while(y){
if(y&1)
solution*= x;
x *= x;
y >>= 1;
}
return solution;
}
A generic table builder based on constexpr functions. The floating point part requires c++20 and gcc, but the non-floating point part works for c++17. If you change the "auto" type param to "long" you can use c++14. Not properly tested.
#include <cstdio>
#include <cassert>
#include <cmath>
// Precomputes x^N
// Inspired by https://stackoverflow.com/a/34465458
template<auto x, unsigned char N, typename AccumulatorType>
struct PowTable {
constexpr PowTable() : mTable() {
AccumulatorType p{ 1 };
for (unsigned char i = 0; i < N; ++i) {
p *= x;
mTable[i] = p;
}
}
AccumulatorType operator[](unsigned char n) const {
assert(n < N);
return mTable[n];
}
AccumulatorType mTable[N];
};
long pow10(unsigned char n) {
static constexpr PowTable<10l, 10, long> powTable;
return powTable[n-1];
}
double powe(unsigned char n) {
static constexpr PowTable<2.71828182845904523536, 10, double> powTable;
return powTable[n-1];
}
int main() {
printf("10^3=%ld\n", pow10(3));
printf("e^2=%f", powe(2));
assert(pow10(3) == 1000);
assert(std::fabs(powe(2) - 7.389056) < 0.001);
}
Based on Mats Petersson's approach, but with compile-time generation of the cache.
#include <iostream>
#include <limits>
#include <array>
#include <cstdint> // uintmax_t
#include <cstdlib> // strtol
// digits
template <typename T>
constexpr T digits(T number) {
return number == 0 ? 0
: 1 + digits<T>(number / 10);
}
// pow
// https://stackoverflow.com/questions/24656212/why-does-gcc-complain-error-type-intt-of-template-argument-0-depends-on-a
// unfortunately we can't write `template <typename T, T N>` because of partial specialization `PowerOfTen<T, 1>`
template <typename T, uintmax_t N>
struct PowerOfTen {
enum { value = 10 * PowerOfTen<T, N - 1>::value };
};
template <typename T>
struct PowerOfTen<T, 1> {
enum { value = 1 };
};
// sequence
template<typename T, T...>
struct pow10_sequence { };
template<typename T, T From, T N, T... Is>
struct make_pow10_sequence_from
: make_pow10_sequence_from<T, From, N - 1, N - 1, Is...> {
//
};
template<typename T, T From, T... Is>
struct make_pow10_sequence_from<T, From, From, Is...>
: pow10_sequence<T, Is...> {
//
};
// base10list
template <typename T, T N, T... Is>
constexpr std::array<T, N> base10list(pow10_sequence<T, Is...>) {
return {{ PowerOfTen<T, Is>::value... }};
}
template <typename T, T N>
constexpr std::array<T, N> base10list() {
return base10list<T, N>(make_pow10_sequence_from<T, 1, N+1>());
}
template <typename T>
constexpr std::array<T, digits(std::numeric_limits<T>::max())> base10list() {
return base10list<T, digits(std::numeric_limits<T>::max())>();
}
// main pow function
template <typename T>
static T template_quick_pow10(T n) {
static auto values = base10list<T>();
return values[n];
}
// client code
int main(int argc, char **argv) {
long long sum = 0;
int n = strtol(argv[1], 0, 0);
const long outer_loops = 1000000000;
if (argv[2][0] == 't') {
for(long i = 0; i < outer_loops / n; i++) {
for(int j = 1; j < n+1; j++) {
sum += template_quick_pow10(n);
}
}
}
std::cout << "sum=" << sum << std::endl;
return 0;
}
Code does not contain quick_pow10, integer_pow, opt_int_pow for better readability, but tests done with them in the code.
Compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), using -Wall -O2 -std=c++0x, gives the following results:
$ g++ -Wall -O2 -std=c++0x main.cpp
$ time ./a.out 8 a
sum=100000000000000000
real 0m0.438s
user 0m0.432s
sys 0m0.008s
$ time ./a.out 8 b
sum=100000000000000000
real 0m8.783s
user 0m8.777s
sys 0m0.004s
$ time ./a.out 8 c
sum=100000000000000000
real 0m6.708s
user 0m6.700s
sys 0m0.004s
$ time ./a.out 8 t
sum=100000000000000000
real 0m0.439s
user 0m0.436s
sys 0m0.000s
If you want to calculate, e.g., 10^5, then you can:
#include <iostream>
using namespace std;
int main() {
cout << (int)1e5 << endl; // will print 100000
cout << (int)1e3 << endl; // will print 1000
return 0;
}
result *= 10 can also be written as result = (result << 3) + (result << 1)
constexpr int pow10(int n) {
int result = 1;
for (int i = 0; i < n; i++) {
result = (result << 3) + (result << 1);
}
return result;
}