Partial template specialization to unroll loops for specific sizes

Partial template specialization to unroll loops for specific sizes - c++

I have written a matrix class which can take different sizes. Now I want to unroll loops for specific sizes. How should I do this?
The only way I can seem to get working is a child-class for 2-d. But this I would like to avoid, as it would result in much duplicate code.
For example:
#include <iostream>
template<class T, size_t M, size_t N>
class matrix
{
matrix<T,M,N>& operator*= (const matrix<T,M,N> &B);
};
template<class T, size_t M, size_t N>
matrix<T,M,N>& matrix<T,M,N>::operator*= (const matrix<T,M,N> &B)
{
// ...
return *this;
}
int main()
{
return 0;
}
Now I would like to add an implementation for the case that M = 2 and N = 2 where I unroll all loops to gain efficiency.
(I have timed the unroll in a previous implemenation, and it really does seem to make sense, in particular for more complicated operations then featured in this example.)

You can delegate operator*= to an overloaded function template. E.g.:
template<class T, size_t M, size_t N>
class matrix
{
public:
matrix<T,M,N>& operator*=(const matrix<T,M,N>&);
};
// Generic version.
template<class T, size_t M, size_t N>
void inplace_dot(matrix<T,M,N>& a, matrix<T,M,N> const& b);
// Overload for 2x2 matrix.
template<class T>
void inplace_dot(matrix<T,2,2>& a, matrix<T,2,2> const& b);
template<class T, size_t M, size_t N>
matrix<T,M,N>& matrix<T,M,N>::operator*=(const matrix<T,M,N>& b)
{
inplace_dot(*this, b);
return *this;
}

Related

C++ template matrix class - square matrix specialisation

I'm creating a simple C++ matrix template class, with the following definition:
template<uint n, uint m, typename T = double>
class Matrix {
private:
T data[n][m];
static Matrix<n, m, T> I;
public:
Matrix();
Matrix(std::initializer_list<T> l);
T& at(uint i, uint j); // one-based index
T& at_(uint i, uint j); // zero-based index
template<uint k> Matrix<n, k, T> operator*(Matrix<m, k, T>& rhs);
Matrix<m, n, T> transpose();
Matrix<n, m, T> operator+(const Matrix<n, m, T>& rhs);
Matrix<n, m, T>& operator+=(const Matrix<n, m, T>& rhs);
Matrix<n, m, T> operator-(const Matrix<n, m, T>& rhs);
Matrix<n, m, T>& operator-=(const Matrix<n, m, T>& rhs);
Matrix<n, m, T> operator*(const T& rhs);
Matrix<n, m, T>& operator*=(const T& rhs);
Matrix<n, m, T> operator/(const T& rhs);
Matrix<n, m, T>& operator/=(const T& rhs);
static Matrix<n, m, T> identity();
};
(uint is defined as an unsigned int)
The final function Matrix<n, m, T> identity() aims to return the static I member which is the identity matrix using a basic singleton pattern. Obviously the identity matrix is only defined for square matrices so I tried this:
template<uint n, typename T>
inline Matrix<n, n, T> Matrix<n, n, T>::identity() {
if (!I) {
I = Matrix<n, n, T>();
for (uint i = 0; i < n; ++i) {
I.at(i, i) = 1;
}
}
return I;
}
Which gives the error C2244 'Matrix<n,n,T>::identity': unable to match function definition to an existing declaration.
My impression was that I could do some sort of specialisation of the template where the number of columns and rows are equal. I'm not sure if this is even possible, but your help would be much appreciated.

Try this:
static Matrix<n, m> identity() {
static_assert(n == m, "Only square matrices have a identity");
return {}; //TODO
}
See: http://cpp.sh/7te2z

Partial specialization of a template class is allowed for the entire class, but not for its single member.
Members of partial specializations are not related to the members of the primary
template.
That is the reason behind your compilation error.
Of course, specializing the entire Matrix template for the square matrix case doesn't make sense. #tkausl has already answered how to make the identity() function available only for the square matrix types.
However, I would like to draw your attention toward a couple of issues in your implementation of Matrix:
The matrix data is a plain array (instead of a dynamically allocated array or std::vector). This has the following disadvantages:
sizeof(Matrix<N, M, T>) == sizeof(T)*N*M. Allocating large matrices on the stack may result in stack overflow.
Impossibility of taking advantage of move semantics (as the data is inseparable from the matrix object).
Though, if the matrix dimensions are fixed at compile time constants, then, most probably, they will be small numbers. If so, paying the cost of dynamic allocation may indeed be unjustified.
identity() returns its result by value (rather than by constant reference).
I (the identity matrix) is a static member of the class. It is better to make it a static variable inside the identity() function.

C++ introduce 'new' template arguments

I'm trying to make a reusable matrix class with the dimensions and type given in the template arguments. The struct itself is just:
template <unsigned int N, unsigned int M, typename T>
struct Matrix
{
T elements[N* M];
};
When I tried to implement matrix multiplication, I came to a problem that I need to introduce new template arguments. The size of the original matrix is N * M. The size of the second matrix is L * N and the result matrix is N * K. So the multiply function would look something like:
Matrix<N, K, T> Multiply(const Matrix<L, N, T>& other) {... }
But then I need to create a template for the function, so calling would become mat.Multiply<x, y>(mat2), which means that I have to specify things twice. Is there a way to avoid this? (something like Matrix<N, unsigned int K, T>)
Edit: I've tried this:
template <unsigned int K, unsigned int L>
Matrix<N, K, T> Multiply(const Matrix<L, N, T>& other)
And with this code I get an error saying no instance of function template matches the argument list:
Matrix<3, 2, int> mat;
Matrix<2, 3, int> mat2;
mat.Multiply(mat2)
By the way I'm using MSVC and Visual Studio.

so calling would become mat.Multiply<x, y>(mat2)
If the multiplication member function were a function template like so
template <unsigned int N, unsigned int M, typename T>
struct Matrix
{
template <unsigned int L>
Matrix<N, L, T> Multiply(const Matrix<M, L, T>& other) {... }
T elements[N * M];
};
then template argument deduction allows you to call the function like this:
mat.Multiply(mat2)
Note: you should probably consider implementing a non-member operator* too, to allow for this:
auto mat3 = mat * mat2;

It is normal that Multiply is also template.
Note that K == L for a multiplication.
template <unsigned int N, unsigned int M, typename T>
struct Matrix
{
template <unsigned int K>
Matrix<N, K, T> Multiply(const Matrix<M, K, T>& other) const {/*..*/}
T elements[N* M];
};
In the call, all template argument are deducible so you can call it that way:
int main()
{
const int N = 4;
const int M = 5;
const int K = 6;
using T = float;
Matrix<N, M, T> rhs;
Matrix<M, K, T> lhs;
Matrix<N, K, T> res = rhs.Multiply(lhs);
}

how to partially specialize a templated c++ class?

I wrote a generic matrix class like so:
template <std::size_t N, std::size_t M, class T>
class CMatrix
{
protected:
T* m_elem;
public:
CMatrix() : m_elem(new T[N*M]) {}
~CMatrix() { delete[] m_elem; }
inline bCMatrix& operator*=(const T& val);
inline bCMatrix& operator/=(const T& val);
...
template <std::size_t L> inline CMatrix<L, N, T> operator*(const CMatrix<M, L, T>& mat) const;
};
Now I would like to make several specializations for CMatrix<2,2,T>.
I'd like to add a comstructor CMatrix<2,2,T>(const T& a00, const T& a01, const T& a10, const T& a11) {...}.
And I like to specialize the matrix multiplication for CMatrix<2x2>*CMatrix<2x2>. But on the other hand I'd like to keep the functionalities of *= and /= and so on.
How would I do so?
My only working attempt is to rewrite (copy/past/search/replace) all the code for the specialization N=3, M=3, like:
template <class T>
class CMatrix<3,3,T>
{
...
};
Now I can specialize the matrix-matrix multiplication for 2x2. But I have two nearly identical pieces of code, like one for the generic matrix-vector multiplication and one for the matrix-vector multiplication with matrix 2x2. Which both have to be changed if I optimize the generic one. Sucks to maintain that.

You can read some documentation on class template, such as http://en.cppreference.com/w/cpp/language/partial_specialization. For you it entails defining the specialization in a way like this:
template<typename T>
class CMatrix<2, 2, T>
{
public:
CMatrix<2,2,T>(const T& a00, const T& a01, const T& a10, const T& a11);
...
template <std::size_t L> inline CMatrix<L, 2, T> operator*(const CMatrix<2, L, T>& mat) const;
// Special overload for 2x2 x 2x2 multiplication
inline CMatrix<2, 2, T> operator*(const CMatrix<2, 2, T>& mat) const;
};
To prevent duplication of code for operator*= etc you can make a template<int M, int N, typename T> class CMatrixBase {} and implement them in there, and have your actual matrix classes and specializations inherit from that.
Martin suggested to use CRTP, which helps you to not lose your type information. This would look like this:
template<int M, int N, typename T, typename Derived>
class CMatrixBase
{
public:
Derived& operator*=(T val) {
doMultiplication(val);
return *static_cast<Derived*>(this);
}
};

C++ Vector Template Per-Component Operations

I'm rewriting the vector math portion of my project, and I'd like to generalize vectors by their type and number of dimensions. A vector<T, N> represents an N dimensional vector of type T.
template<typename T, int N>
struct vector {
T data[N];
};
I'll need to rewrite many math functions, most of which will operate on a per-component basis. The straightforward implementation of the addition operator is shown below.
template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
vector<T, N> result;
for (int i = 0; i < N; i++) {
result[i] = lhs[i] + rhs[i];
}
return result;
}
My question: Is there a way (via template-trickery?) to implement this without the use of a for loop and a temporary variable? I understand that the compiler would most likely unroll the loop and optimize it away. I just don't like the idea of having all of my performance-critical math functions implemented this way. They will all be inlined and in the header, so having many of these functions would also make for a big ugly header file.
I'm wondering if there is a way to do this which would produce more optimal source code. Possibly a way that works like variadic templates do. Something along the lines of this.
template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
return vector<T, N>(lhs[0] + rhs[0], lhs[1] + rhs[1]...);
}

One way to do this is via lower level "map" functions:
Here's a complete working example
#include <iostream>
#include <math.h>
template<typename T, int N>
struct vector {
T data[N];
};
First declare your worker "map" functions - I've got 3 here map, map2, foreach.
template<typename T, int N, typename FN>
static void foreach(const vector<T,N> & vec, FN f) {
for(int i=0; i<N ;++i) {
f(vec.data[i]);
}
}
template<typename T, int N, typename FN>
static auto map(const vector<T,N> & vec, FN f) -> vector<decltype(f(T(0))), N> {
vector<decltype(f(T(0))), N> result;
for(int i=0; i<N ;++i) {
result.data[i] = f(vec.data[i]);
}
return result;
}
template<typename T1, typename T2, int N, typename FN>
static auto map2(const vector<T1,N> & vecA,
const vector<T2,N> & vecB,
FN f)
-> vector<decltype(f(T1(0), T2(0))), N> {
vector<decltype(f(T1(0), T2(0))), N> result;
for(int i=0; i<N ;++i) {
result.data[i] = f(vecA.data[i], vecB.data[i]);
}
return result;
}
Now use the helpers to define your higher level functions via lambdas. I'll define binary +, binary -, unary - and e^x. Oh and operator<< so we can see what is going on.
I'm pretty sure there's a better alternative to the lambdas used in operator+ and operator-, but I can't remember them
template<typename T, int N>
vector<T,N> operator+(const vector<T,N> &lhs, const vector<T,N> &rhs) {
return map2(lhs, rhs, [](T a,T b) { return a+b;} );
}
template<typename T, int N>
vector<T,N> operator-(const vector<T,N> &lhs, const vector<T,N> &rhs) {
return map2(lhs, rhs, [](T a,T b) { return a-b;} );
}
template<typename T, int N>
vector<T,N> operator-(const vector<T,N> &vec) {
return map(vec, [](T a) { return -a;} );
}
template<typename T, int N>
auto exp(const vector<T,N> &vec) -> vector<decltype(exp(T(0))), N> {
return map(vec, [](T a) { return exp(a); } );
}
template<typename T, int N>
std::ostream & operator<<(std::ostream& os, const vector<T,N> &vec) {
os<<"{";
foreach(vec, [&os](T v) { os<<v<<", "; } );
os<<"}";
return os;
}
Now look how they work just fine...
int main() {
vector<int, 5> v1 = {1,2,3,4,5};
vector<int, 5> v2 = {2,4,6,8,10};
std::cout<<v1 << " + " << v2 << " = " << v1+v2<<std::endl;
std::cout<<v1 << " - " << v2 << " = " << v1-v2<<std::endl;
std::cout<<" exp( - " << v2 << " )= " << exp(-v1)<<std::endl;
}

You can do this and I'll point you towards a solution (which compiles and runs). You are looking to get rid of the loop, preferably by inlining it in hopes the compiler will optimize things for you.
In practice I have found it sufficient to specify the dimensions needed, i.e. N = 3, 4, 5 because this allows finer grained control over what the compiler does than doing what you asked for. However you can use recursion and partial template specialization to implement your operators. I have illustrated addition.
So instead of this:
template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
vector<T, N> result;
for (int i = 0; i < N; i++) {
result[i] = lhs[i] + rhs[i];
}
return result;
}
You want code that effectively does this:
template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
vector<T, N> result;
result[0] = lhs[0] + rhs[0];
result[1] = lhs[1] + rhs[1];
...
result[N-1] = lhs[N-1] + rhs[N-1];
return result;
}
if N is 1, it is pretty easy you just want this...
template
vector operator+(vector lhs, vector rhs) {
vector result;
result[0] = lhs[0] + rhs[0];
return result;
}
and if N is 2, it is pretty easy you just want this...
template
vector operator+(vector lhs, vector rhs) {
vector result;
result[0] = lhs[0] + rhs[0];
result[1] = lhs[1] + rhs[1];
return result;
}
The easiest way is to simply define this up to as many N as you expect to use and not the answer you are looking for because you probably don't need more than N=5 or N=6 in practice right?
However, you can also use partial template specialization and recursion to get there. Consider this struct, which recursively calls itself and then assigns the index:
template<typename T, int N, int IDX>
struct Plus
{
void operator()(vector<T,N>& lhs, vector<T,N>& rhs, vector<T,N>& result)
{
Plus<T,N,IDX-1>()(lhs,rhs,result);
result.data[IDX] = lhs.data[IDX] + rhs.data[IDX];
}
};
and this partial specialization which appears to do nothing, but handles the case when the index is 0 and ends the recursion:
template<typename T, int N>
struct Plus<T,N,-1>
{
void operator()(vector<T,N>& lhs, vector<T,N>& rhs, vector<T,N>& result)
{
//noop
}
};
and finally this implementation of operator+ which instantiates Plus and calls it:
template<typename T, int N>
vector<T, N> operator+(vector<T, N> lhs, vector<T, N> rhs) {
vector<T, N> result;
Plus<T,N,N-1>()(lhs,rhs,result);
return result;
}
You'll need to turn this into an operator to make it more general purpose but you get the idea. However this is mean to the compiler and it may take awhile in big projects even if it is super cool. In practice I have found that hand typing the overloads you want or writing script code to generate the C++ results in a more debuggable experience and code that in the end is simpler to read and easier for the compiler to optimize. More specifically if you write a script to generate the C++ you can include the SIMD intrinsics in the first place and not leave things to chance.

Firstly, compiler would probably unroll the loop.
Secondly, for better performance, pass your argument by const reference instead of by value to avoid extra copies.
And to answer your question, you may use std::index_sequence to construct in place, something like:
namespace detail
{
template<typename T, int N, std::size_t...Is>
vector<T, N> add(std::index_sequence<Is...>,
const vector<T, N>& lhs,
const vector<T, N>& rhs)
{
return {{ (lhs[Is] + rhs[Is])... }};
}
}
template<typename T, int N>
vector<T, N> operator+(const vector<T, N>& lhs, const vector<T, N>& rhs) {
return detail::add(std::make_index_sequence<N>{}, lhs, rhs);
}
Demo

c++ - Implement template operator non-friend

I have a simple generice arithmetic vector class and want to implement the * operator for scalar multiplication:
template<class Value_T, unsigned int N>
class VectorT
{
public:
typedef Value_T value_type;
typedef unsigned int size_type;
size_type GetNumElements() const { return N; }
// class member
// elements in the vector
value_type elements[N];
// the number of elements
static const size_type size = N;
};
// scalar multiplication
template<class Value_T, unsigned int N>
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v, Value_T s)
{
VectorT<Value_T,N> vTemp(v);
for (unsigned int i = 0; i < v.GetNumElements(); i++)
{
vTemp.elements[i] *= s;
}
return vTemp;
}
Using it like this ...
typedef VectorT<double, 3> Vec3;
int main(int argc, char* argv[])
{
Vec3 v;
Vec3 v2 = v * 2; // multiply with scalar int
return 0;
}
... gives compiler error C2782 (MSVC2012) that the template parameter Value_T for the * operator is ambiguous: int or double.
If I define the *operator within my class as friend function the error is gone and it works fine for scalar int or double. But actually there is no need here for friend declaration as the public interface of class VectorT is sufficient for the operator (In my real code I have the members privat and some public accessor methods).
I want the scalar multiplication work only for the Value_T type: For a VectorT<int, N> I want only integer values be given to the *operator. I further want the *operator being commutativ. That's why I don't implement it as a simple member function, where the lefthand operator is always of type VectorT.
How to implement * operator as non-member and non-friend here?

Look at the operator's definition
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v, Value_T s)
and on the line
Vec3 v2 = v * 2; // multiply with scalar int
operator* is called with parameters of type VectorT and int. So Value_T is double once and int the other time. You could multiply with 2.0 to resolve the ambiguity or add a third template parameter to the operator* definition:
template<class Value_T, unsigned int N, class ScalarType>
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v, ScalarType s)

The answers posted so far suggest giving an independent template parameter to the type of function parameter s. This is a viable approach, but not the only one.
Alternatively, a better solution might be to exclude your second argument from the template argument deduction process by intentionally placing it into non-deduced context
// scalar multiplication
template<class Value_T, unsigned int N>
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v,
typename VectorT<Value_T, N>::value_type s)
{
...
}
That way the compiler will deduce the template arguments from the type of v parameter, but not from the type of s parameter. And s will still have the proper type, just like you wanted it to.
The above approach takes advantage of the fact that your class template already provides a convenient inner typename value_type. In a more general case one can achieve the same by using an identity template
template <typename T> struct identity
{
typedef T type;
};
template<class Value_T, unsigned int N>
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v,
typename identity<Value_T>::type s)
{
...
}
The C++ standard library decided not to include std::identity template, since its functionality is apparently covered by std::decay and std::remove_reference. So, by using standard templates you can implement the general solution as
template<class Value_T, unsigned int N>
const VectorT<Value_T, N> operator*(const VectorT<Value_T, N>& v,
typename std::decay<Value_T>::type s)
{
...
}
The following article mentions this specific matter
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3766.html

First of, you could use std::array instead of implementing VectorT. Nevertheless, the problem you have is due to the fact that the integral literal 2 in the expression Vec3 v2 = v * 2; is of type int. Consequently, the compiler can't deduce to the overloaded operator* of yours, since as it is implemented it works only in the cases where your VectorT contains elements of the same type with the multiplier.
In order to overcome this, you could add an additional template argument to your overloaded operator* like the example below:
template<class Value_T1, class Value_T2, unsigned int N>
VectorT<Value_T1, N> operator*(const VectorT<Value_T1, N>& v, Value_T2 s)
{
VectorT<Value_T1, N> vTemp(v);
for(unsigned int i(0), sz(v.GetNumElements()); i < sz; ++i) {
vTemp.elements[i] *= s;
}
return vTemp;
}
LIVE DEMO

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Partial template specialization to unroll loops for specific sizes - c++

Related

C++ template matrix class - square matrix specialisation

C++ introduce 'new' template arguments

how to partially specialize a templated c++ class?

C++ Vector Template Per-Component Operations

c++ - Implement template operator non-friend

Categories

Resources