Intel MKL OOP Wrapper Design and Operator Overloading - C++

I started writing an OOP wrapper for Intel's MKL library and came across some design issues. I hope you can help me find the "best" way to handle them. The issues mainly concern operator overloading; they are not critical to the wrapper, but they affect readability and/or performance.
The first issue is overloading operators given how the BLAS functions are defined. As an example, matrix multiplication (gemm) is defined as

C := alpha*A*B + beta*C

(A, B, C being matrices, alpha and beta scalars).
Now I can overload *, + and = alone, but evaluating this expression with overloaded operators takes four function calls (plus temporaries) instead of the single BLAS call. Or I could use a normal function call (which will be implemented anyway), but lose the "natural" way of writing the equation using overloaded operators, making it less readable (but still more readable than with those horrible BLAS names).
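To make the trade-off concrete, here is a minimal sketch of the function-call form (a hypothetical free function gemm wrapping MKL's cblas_dgemm; everything except cblas_dgemm itself is an illustrative name):

#include <mkl.h>  // cblas_dgemm

// One fused BLAS call computing C := alpha*A*B + beta*C (row-major,
// no transposes). A is m x k, B is k x n, C is m x n.
void gemm(double alpha, const double* A, const double* B,
          double beta, double* C, int m, int n, int k)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, alpha, A, /*lda=*/k, B, /*ldb=*/n, beta, C, /*ldc=*/n);
}

// Versus the operator form on a hypothetical Matrix wrapper:
//   C = alpha * A * B + beta * C;
// which decomposes into four operator calls (alpha*A, (alpha*A)*B,
// beta*C, and the +), each potentially materializing a temporary.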
The second issue is read and write access to the matrices. As an example, consider an upper triangular matrix such as

a11 a12 a13
  0 a22 a23
  0   0 a33

This matrix would be stored efficiently in a 1D array like this (order may vary depending on row/column-major order):

a11 a12 a13 a22 a23 a33
Since a matrix has two indices, the easiest way to overload reading would be using
<TYPE> & operator() (size_t row, size_t column);
instead of some workaround with subscript operators. The problem is handling the zeros. They are not stored in the array, but mathematically they exist. If I want to read these values in another function (not MKL), I need to be able to return the zero somehow (aside from storing the matrix type, which is done for BLAS anyway).
Since operator() returns a reference, I can't return 0. I could return a reference to a dummy variable, but if something wrote to that dummy, I wouldn't have an upper triangular matrix anymore. So I would have to either change the matrix type on such a write, forbid writing to these elements, or ignore the problem (a bad idea).
To change the matrix type I would need to detect the write, which would require explicitly using some kind of proxy object.
To prevent writing I would probably need the same, since I can't return a const reference from the non-const overload above. Alternatively I could forbid writing through operator() in general, but then I couldn't modify the existing matrix at all, which I don't want.
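To make the proxy idea concrete, here is a minimal sketch (the names and the throw-on-write policy are my assumptions, not anything from MKL): operator() would return a small proxy that reads as zero for unstored elements and decides what a write should do:

#include <stdexcept>

// Proxy returned by operator(); p_ is null for a structural zero.
template <typename T>
class ElementProxy {
    T* p_;
public:
    explicit ElementProxy(T* p) : p_(p) {}
    operator T() const { return p_ ? *p_ : T(0); }  // read: unstored elements are 0
    ElementProxy& operator=(const T& value) {
        if (!p_)  // a write here would break the triangular structure:
            throw std::logic_error("write to a structural zero");
        *p_ = value;  // alternatively, convert the matrix to a general type here
        return *this;
    }
};

The drawback is that a proxy is not a real T&, so code expecting a genuine reference (e.g. a function taking double&) no longer works transparently.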
I hope you can give me some pointers on how to handle these issues and what design principles I may be forgetting/should take into account. As I said, they are not critical (I can write appropriate functions for everything instead of operators).

I wrote a library for medical image reconstruction: https://github.com/kvahed/codeare. The matrix object there has a lot of overloaded operators and convenience functions that let one write MATLAB-like code efficiently in C++.
What you want to do for passing data between MKL and other libraries/algorithms is, in my view, impossible. How would you distinguish 0 from 1e-18? What about when you move on to some numeric optimisation, etc.? This is premature optimisation you are looking at. Even if you wanted to exploit sparsity, you could only do it, say, column-wise or row-wise, or note down, as above, that you have an upper triangular form. But skipping individual 0s? Crazy. Of course copying 0s around doesn't feel right, but getting your algorithms optimised first and then worrying about the above is the way I'd go.
Also don't forget that a lot of libraries out there cannot handle sparse matrices, at which point you would have to put in place a recopying of the non-zero part or some badass expensive iterator to deliver the results.
Btw you would not only need the operator you noted down in your question but also the const variant; in other words:
template <typename T> class Matrix {
...
T& operator()(size_t n, size_t m);
const T& operator()(size_t n, size_t m) const;
...
};
There is so much more expensive stuff to optimise than std::copy-ing data around, for example SIMD intrinsics:
https://github.com/kvahed/codeare/blob/master/src/matrix/SIMDTraits.hpp

Related

Is it possible to start indexing of matrices from 1 in Eigen?

I am using Eigen to do some linear algebra computations in my code. However, all of the mathematical formulas are based on the fact that indexing starts from 1, so each time I implement one of them I have to check that my indexing in the code is consistent with it. I was wondering if it is possible to tell Eigen to start indexing from 1 instead of 0.
Indexing operations in Eigen allow, in addition to indexing with integers, indexing with symbolic indices. You should be able to implement your own custom symbolic index, derived from Eigen::symbolic::BaseExpr, that could be used as a 1-based index API, where its eval_impl method simply subtracts 1 from its argument. E.g.:
#include <Eigen/Core>

using Eigen::Index;
using Eigen::symbolic::BaseExpr;

template<typename Arg0>
class MyIndexExpr : public BaseExpr<MyIndexExpr<Arg0>>
{
public:
    MyIndexExpr(const Arg0& arg0) : m_arg0(arg0) {}
    // Evaluate the wrapped expression, then shift from 1-based to 0-based.
    template<typename T>
    Index eval_impl(const T& values) const { return m_arg0.eval_impl(values) - 1; }
protected:
    Arg0 m_arg0;
};
To use this in product code, however, would most likely be a very bad idea: it is likely to lead to confusion and possible bugs, and it adds unmotivated run-time overhead to every indexing operation. As you are coding in C++, you might want to stick to its zero-based indexing practice. Maybe you could consider symbolic indexing when writing tests for your formulas, but use integer zero-based indexing in your product code.
The answer is "not really".
Of course, as @πάνταῥεῖ suggested, you could write a wrapper, or inherit from Eigen types and override the indexing operators accordingly. Alternatively, you could implement a custom index type which, when converted to Eigen::Index, subtracts 1.
But both approaches are error-prone and will more likely increase confusion, especially if you miss some relevant parts. They will also thoroughly confuse any C++ programmer looking at your code, as 0-based indexing is the natural way in C/C++ (and in the many languages whose syntax derives from them, like Java, C#, ...).
Finally, as @dfri also suggested: if you code in C++, get used to 0-based indexing; it will save you a lot of trouble in the long run.
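For completeness, a wrapper along those lines might look like this (a sketch only; OneBased is an illustrative name, not an Eigen type):

#include <Eigen/Dense>

// Thin 1-based access view over an Eigen dense matrix.
template <typename MatrixType>
class OneBased {
    MatrixType& m_;
public:
    explicit OneBased(MatrixType& m) : m_(m) {}
    typename MatrixType::Scalar& operator()(Eigen::Index i, Eigen::Index j) {
        return m_(i - 1, j - 1);  // shift from math's 1-based to Eigen's 0-based
    }
};

// Usage:
//   Eigen::MatrixXd A(3, 3);
//   OneBased<Eigen::MatrixXd> A1(A);
//   A1(1, 1) = 42.0;  // writes A(0, 0)

It inherits all the pitfalls described above: every reader has to know that the convention is shifted.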

How to build a constructor for a matrix type in C++

I'm a beginner in C++, and as a learning exercise I'm trying to write a library for doing matrix math (matrix multiplication, inversion and the like).
The first thing I want to do is define a class "Matrix", whose members are "rows" (the number of rows in the matrix), "cols" (the number of columns in the matrix), and "_matrix" (an array containing the elements of the matrix).
The problem is I don't have any idea how to build the constructor.
Can I write something like "Matrix(m, n, array)"? How do I make sure the array actually contains m*n elements?
I would love some guidance on how to proceed (well... how to begin, if I'm being honest :) )
Thanks!
Another answer provides a typical solution one would expect a Matrix class constructor to have (i.e. Matrix(unsigned, unsigned)).
If you are doing it as an exercise and you are serious about learning C++, I would also suggest implementing the following constructor:
Matrix(std::initializer_list<std::initializer_list<T>> init_list);
Therefore you could build your object like that:
Matrix m({{1,2,3},{4,5,6},{7,8,9}});
Note that you can take the size of the constructed matrix straight from the std::initializer_lists provided, and you can easily build templated matrices this way.
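A possible sketch (flat row-major std::vector storage and the member names are my assumptions, not requirements):

#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

template <typename T>
class Matrix {
    std::size_t rows_, cols_;
    std::vector<T> data_;  // flat row-major storage
public:
    Matrix(std::initializer_list<std::initializer_list<T>> init_list)
        : rows_(init_list.size()),
          cols_(rows_ ? init_list.begin()->size() : 0)
    {
        data_.reserve(rows_ * cols_);
        for (const auto& row : init_list) {
            if (row.size() != cols_)  // reject ragged input up front
                throw std::invalid_argument("all rows must have the same length");
            data_.insert(data_.end(), row.begin(), row.end());
        }
    }
};

The size check also answers the "how do I make sure the array contains m*n elements" question: the constructor validates the shape itself instead of trusting the caller.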
If I were using a matrix, I would expect a constructor like:
Matrix(unsigned int maximum_rows, unsigned int maximum_columns);
I don't care if the matrix is implemented by array, list or other data structure. I told it the size, so construct one.
Edit 1:
You want to hide the implementation of the Matrix from the user; how the constructor is written depends on the representation you choose.
The implementation may be different for a lower triangular matrix than a generic one. You may decide on a vector of vectors, a 2 dimensional array, a one dimensional array or a linked list.
I as the user don't really care how it's implemented. All I care is that the expected Matrix functionalities are implemented correctly, and in some cases, efficiently. So I may expect an overloaded operator + or an add method or both.
Again, search the internet to see examples of how other people have implemented a matrix.
Edit 2:
There may be cases where you want one class for the functionality and another class for the implementation. In that case you may want to pass the implementation to the Matrix's constructor (I would suggest using a reference to a base class that describes the implementation interface). But that may be overkill for what you need.

Handling large matrices in C++

I am using large matrices of doubles in C++. I need to get rows or columns from these matrices and pass them to a function. What is the fastest way I can do this?
One way is to write a function that returns a copy of the desired row or column as an std::vector.
Another way is to pass the whole thing as a reference and modify the function to be able to read the desired values.
Are there any other options? Which one do you recommend?
BTW, how do you recommend I store the data in the matrix class? I am using std::vector< std::vector< double > > right now.
EDIT
I should have mentioned that the matrices may have more than two dimensions, so using boost or arma::mat here is out of the question. I am, however, using Armadillo in other parts of the library.
If a variable number of dimensions above 2 is a key requirement, take a look at Boost's multidimensional array library, Boost.MultiArray. It has efficient (copy-free) "views" you can use to reference lower-dimensional "slices" of the full matrix.
The details of what's "fastest" for this sort of thing depend an awful lot on what exactly you're doing, and on how the access patterns/working-set "footprint" fit your hardware's various levels of cache and memory latency. In practice it can be worth copying into a more compact representation to get more cache-coherent access, rather than making sparse strided accesses which just waste a lot of each cache line. An alternative is a Morton-order access scheme, which can at least amortize the "bad axis" effects over all axes. Only your own benchmarking of your own code and use cases on your own hardware can really answer that, though.
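A minimal sketch of such a view (the extents here are arbitrary):

#include <boost/multi_array.hpp>

int main()
{
    boost::multi_array<double, 3> A(boost::extents[4][5][6]);

    // Copy-free 2-D slice of A: fix the second axis at index 2.
    using range = boost::multi_array_types::index_range;
    auto slice = A[boost::indices[range()][2][range()]];  // a 4 x 6 view
    slice[0][0] = 1.0;                                    // writes through to A[0][2][0]
}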
(Note that I wouldn't use Boost.MultiArray for 2 dimensional arrays - there are faster, better options for linear algebra / image processing applications - but for 3+ it's worth considering.)
I would use a library like Armadillo (http://arma.sourceforge.net/), because you not only get a way to store the matrix, you also get functions that can do operations on it.
Efficient (multi)linear algebra is a surprisingly deep subject; there are no easy one-size-fits-all answers. The principal challenge is data locality: the memory hardware of your computer is optimized for accessing contiguous regions of memory, and probably cannot operate on anything other than a cache line at a time (and even if it could, the efficiency would go down).
The size of a cache line varies, but think 64 or 128 bytes.
Because of this, it is a non-trivial challenge to lay out the data in a matrix so that it can be accessed efficiently in multiple directions; even more so for higher rank tensors.
And furthermore, the best choices will probably depend heavily on exactly what you're doing with the matrix.
Your question really isn't one that can be satisfactorily answered in a Q&A format like this.
But to at least get you started on researching, here are two keyphrases that may be worth looking into:
block matrix
fast transpose algorithm
You may well do better to use a library rather than trying to roll your own, e.g. Blitz++. (Disclaimer: I have not used Blitz++.)
vector<vector<...>> will be slow to allocate, slow to free, and slow to access, because it requires more than one dereference per access (not cache-friendly).
I would recommend it only if your rows (or columns) don't all have the same size, i.e. for jagged arrays.
For a "normal" matrix, you could go for something like:
#include <cstddef>
#include <vector>

template <class T, size_t nDim>
struct tensor {
    size_t dims[nDim];    // extent of each dimension
    std::vector<T> vect;  // flat storage
};
and overload operator()(size_t i, size_t j, ...) to access elements.
operator() has to do the index calculation (you have to choose between row-major and column-major order). For nDim > 2 it becomes somewhat complicated, and it can benefit from caching some of the indexing computations.
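For the 2-D row-major case, the calculation is just this (a sketch; written as a free function here, though it would naturally be a member of the struct above):

// Row-major element access for nDim == 2: a row is dims[1] elements long.
template <class T>
T& at(tensor<T, 2>& t, size_t i, size_t j)
{
    return t.vect[i * t.dims[1] + j];  // offset = row * rowLength + column
}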
To return a row or a column, you could then define sub types.
template <class T, size_t nDim>
struct row /* or column */ {
    tensor<T, nDim>& parent;  // the tensor being viewed (named to avoid clashing with the type)
    size_t iStart;            // offset of the first element
    size_t stride;            // step between consecutive elements
};
Then define an operator()(size_t i) that returns parent.vect[iStart + i*stride].
The stride value will depend on whether it is a row or a column, and on your (row-major or column-major) ordering choice.
stride will be 1 for one of the two sub types. Note that for that sub type, iterating will probably be much faster, because it is cache-friendly. For the other sub type, unfortunately, it will probably be rather slow, and there is not much you can do about it.
See other SO questions about why iterating over the rows then the columns will probably make a huge performance difference compared with iterating over the columns then the rows.
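To see the effect in isolation, here is a sketch summing the same flat row-major buffer in both orders:

#include <cstddef>
#include <vector>

// The first loop order streams through memory; the second strides by
// `cols` elements on every step and keeps missing the cache.
double sum_rows_first(const std::vector<double>& v, size_t rows, size_t cols)
{
    double s = 0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            s += v[i * cols + j];  // contiguous accesses
    return s;
}

double sum_cols_first(const std::vector<double>& v, size_t rows, size_t cols)
{
    double s = 0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            s += v[i * cols + j];  // strided accesses
    return s;
}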
I recommend you pass it by reference, as copying might be a slow process depending on the size. std::vector is fine if you want the ability to grow and shrink the container.

Matrix representation using Eigen vs double pointer

I have inherited some code which makes extensive use of double pointers to represent 2D arrays. I have little experience using Eigen but it seems easier to use and more robust than double pointers.
Does anyone have insight as to which would be preferable?
Both Eigen and Boost.uBLAS define expression hierarchies and abstract matrix data structures that can use any storage class that satisfies certain constraints. These libraries are written so that linear algebra operations can be clearly expressed and efficiently evaluated at a very high level. Both libraries use expression templates heavily, and are capable of doing pretty complicated compile-time expression transformations. In particular, Eigen can also use SIMD instructions, and is very competitive on several benchmarks.
For dense matrices, a common approach is to use a single pointer and keep track of additional row, column, and stride variables (you may need the third because, for alignment reasons, you may have allocated more memory than the x * y * sizeof(value_type) bytes you strictly need). However, you have no mechanisms in place to check for out-of-range accesses, and nothing in the code to help you debug. You would only want to use this sort of approach if, for example, you need to implement some linear algebra operations for educational purposes. (Even if that is the case, I advise that you first consider which algorithms you would like to implement, and then take a look at std::unique_ptr, std::move, std::allocator, and operator overloading.)
Remember that Eigen has a Map capability that lets you treat a contiguous array of data as an Eigen matrix. If it's difficult to completely change the code you have inherited, mapping the data into an Eigen matrix at least makes interoperating with raw pointers easier.
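A sketch of what that looks like (the function is illustrative; Map assumes contiguous storage and is column-major unless you map a RowMajor type):

#include <Eigen/Dense>

// Treat an existing rows x cols buffer as an Eigen matrix without copying.
double sum_of(double* data, Eigen::Index rows, Eigen::Index cols)
{
    Eigen::Map<Eigen::MatrixXd> M(data, rows, cols);
    return M.sum();  // the full Eigen API works on the mapped data
}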
Yes definitely, for modern C++ you should be using a container rather than raw pointers.
Eigen
When using Eigen, take note that its fixed size classes (like Vector3d) use optimizations that require them to be properly aligned. This requires special care if you include those fixed size Eigen values as members in structures or classes. You also can't pass them by value, only by reference.
If you don't care about such optimizations, it's trivial enough to disable it: simply add
#define EIGEN_DONT_ALIGN
as the first line of all source files (.h, .cpp, ...) that use Eigen.
The other two options are:
Boost Matrix
#include <boost/numeric/ublas/matrix.hpp>
boost::numeric::ublas::matrix<double> m (3, 3);
std::vector
#include <vector>
std::vector<std::vector<double> > m(3, std::vector<double>(3));

When is precomputing a value using TMP ever actually useful?

Scott Meyers in "Effective C++" points at the ability to do e.g. matrix operations in the compiler as a reason for implementing some of your algorithms in template classes/functions. But these functions obviously can't operate on arguments that are determined at run time; they only work for numbers that are written into the program, or at best given as arguments to the compiler. Once the program is compiled, it will use the same output value every time it is run. In that case, why not just calculate that value with a regular (non-templated) program and write it into the original program where necessary? It's surely not faster to calculate e.g. a 1000-point FFT in the compiler than with a regular program.
The best I can come up with is that if you need to compile different versions of your program for different clients, TMP might save you some time. But does this need ever actually arise?
The main advantage of TMP when it comes to matrix operations is not the ability to precompute the result of a matrix operation, but rather the ability to optimize the generated code for doing the actual matrix computation at runtime. You are correct that it would be pretty unlikely for you to want to precompute a matrix in the program, but it's common to want to optimize matrix math at compile time, before the program begins running. For example, consider this code:
Matrix a, b, c;
/* ... Initialize these matrices ... */
Matrix d = a + b + c;
This last line uses some overloaded operators to compute a matrix expression. Using traditional C++ programming techniques, this would work as follows:
Compute a + b, returning a temporary matrix object holding the result.
Compute (a + b) + c, again returning a temporary copy.
Copy the result into d.
This is slow: there's no good reason to make any copies of any values here. Instead, we should just loop over all indices in the matrices once and sum up the values we find. However, using a TMP technique called expression templates, it's possible to implement these operators in a way that actually does the computation in that intelligent, optimized way rather than the slow, standard way. It's this family of techniques that I think Meyers was referring to in the book.
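A stripped-down sketch of the idea (element-wise addition over flat storage only; real expression-template libraries are far more general):

#include <cstddef>
#include <vector>

struct Matrix;

// Expression node: records the operands of an addition; nothing is
// computed until an element is requested.
template <typename L, typename R>
struct AddExpr {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Matrix {
    std::vector<double> v;
    explicit Matrix(std::size_t n) : v(n) {}

    // Evaluating constructor: one fused loop, no intermediate matrices.
    template <typename L, typename R>
    Matrix(const AddExpr<L, R>& e) : v(e.size()) {
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = e[i];
    }

    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

AddExpr<Matrix, Matrix> operator+(const Matrix& l, const Matrix& r) { return {l, r}; }

template <typename L, typename R>
AddExpr<AddExpr<L, R>, Matrix> operator+(const AddExpr<L, R>& l, const Matrix& r) { return {l, r}; }

// Matrix a(9), b(9), c(9);
// Matrix d = a + b + c;  // builds an AddExpr tree, then one loop fills d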
The best-known examples of TMP are simple programs that precompute values at compile time, but it's much more complex techniques like these that actually get used in practice.