Scott Meyers in "Effective C++" points to the ability to do e.g. matrix operations in the compiler as a reason for implementing some of your algorithms as template classes/functions. But these functions can't operate on arguments that are determined at run time, obviously--they only work for numbers that are written into the program, or at best given as arguments to the compiler. Once the program is compiled, it will use the same output value every time it is run. In that case, why not just calculate that value with a regular (non-templated) program and write it into the original program where necessary? Surely it's not faster to calculate e.g. a 1000-point FFT in the compiler than it is with a regular program.
The best I can come up with is that if you need to compile different versions of your program for different clients, then TMP might save you some time. But does this need ever actually arise?
The main advantage of TMP when it comes to matrix operations is not the ability to precompute the result of a matrix operation, but rather the ability to optimize the generated code for doing the actual matrix computation at runtime. You are correct - it would be pretty unlikely that you'd ever want to precompute a matrix in the program - but it's common to want to optimize matrix math at compile time, before the program begins running. For example, consider this code:
Matrix a, b, c;
/* ... Initialize these matrices ... */
Matrix d = a + b + c;
This last line uses some overloaded operators to compute a matrix expression. Using traditional C++ programming techniques, this would work as follows:
Compute a + b, returning a temporary matrix object holding the result.
Compute (a + b) + c, again returning a temporary copy.
Copy the result into d.
This is slow - there's no good reason to make any copies of any values here. Instead, we should just loop over all indices in the matrices and sum up the values we find. However, using a TMP technique called expression templates, it's possible to implement these operators in a way that actually does the computation in this intelligent, optimized way rather than the slow, standard way. It's this family of techniques that I think Meyers was referring to in the book.
The most well-known examples of TMP are simple programs that precompute values at compile time, but in practice it's much more complex techniques like these that actually get used.
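To make the idea concrete, here is a minimal sketch of an expression template (the `Vec` and `VecSum` names are illustrative, and real libraries are far more elaborate): `operator+` builds a lightweight expression node instead of computing anything, and assignment then walks the whole expression in a single loop with no temporary matrices.

```cpp
#include <cstddef>
#include <vector>

// A simple dynamic vector; Matrix would work the same way with two indices.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression evaluates it element by element:
    // one loop, no temporaries.
    template <typename Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// An expression node: records its operands, computes nothing up front.
template <typename L, typename R>
struct VecSum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

inline VecSum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return {a, b};
}

// Allows chains like a + b + c (left-associative in C++).
template <typename L, typename R>
VecSum<VecSum<L, R>, Vec> operator+(const VecSum<L, R>& a, const Vec& b) {
    return {a, b};
}
```

With this in place, `d = a + b + c;` builds a `VecSum<VecSum<Vec, Vec>, Vec>` describing the computation, and the single loop in `operator=` evaluates it directly into `d`.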
I'm using the Eigen::LevenbergMarquardt solver for a model fitting application. The functor I'm providing to it includes functions to compute the error vector and Jacobian. These functions contain a lot of similar code, and some costly calculations are duplicated.
The prototype of the () operator used to compute the error vector includes what appears to be an optional pointer to a Jacobian matrix. If the Eigen::LevenbergMarquardt solver can be set up to compute the error vector and Jacobian at the same time in the relevant cases, it would really speed up my algorithm.
int operator()(const Eigen::VectorXf& z, Eigen::VectorXf& fvec, Eigen::MatrixXf* _j = 0)
I have not found any documentation describing this _j parameter or how it can be used. Checking its value while my code runs shows it is always a NULL pointer.
Does anyone know what this parameter is used for and if it's possible to compute the error vector and Jacobian simultaneously when both are needed?
Not sure about this particular solver, but these kinds of parameters are commonly used only when the solver needs them. Maybe it is just an extension point for the future. Looking at the source code, the LM solver never calls the functor with that parameter.
I think a better approach in your case would be to cache the redundant parts of the computation within your functor. Maybe just keep a copy of the input vector and do a quick memcmp before redoing the computation. Not ideal, but since the interface has no way of telling you when the inputs change, that's probably the most robust option you have.
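A sketch of that caching idea, with `std::vector<double>` standing in for the Eigen types and the "costly calculation" reduced to a toy sum of squares (all names here are illustrative, not from Eigen):

```cpp
#include <cstddef>
#include <vector>

// Functor sketch: results are cached keyed on the last input, so the
// residual and the Jacobian entries can share the expensive work.
struct CachingFunctor {
    std::vector<double> last_x;  // input for which the cache is valid
    std::vector<double> shared;  // expensive intermediate results
    int shared_evals = 0;        // how often the expensive part actually ran

    void refresh(const std::vector<double>& x) {
        if (!last_x.empty() && x == last_x) return;  // cache hit: nothing to do
        last_x = x;
        shared.assign(x.size(), 0.0);
        for (std::size_t i = 0; i < x.size(); ++i)
            shared[i] = x[i] * x[i];                 // stand-in for costly work
        ++shared_evals;
    }

    // Residual (toy example: sum of squares), reusing the cached work.
    double residual(const std::vector<double>& x) {
        refresh(x);
        double s = 0.0;
        for (double v : shared) s += v;
        return s;
    }

    // One "Jacobian" entry (d/dx_i of sum of squares), reusing the cache.
    double jacobian_entry(const std::vector<double>& x, std::size_t i) {
        refresh(x);
        return 2.0 * last_x[i];
    }
};
```

Calling `residual` and then `jacobian_entry` with the same input runs the expensive part only once; a new input invalidates the cache automatically.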
I started writing an OOP wrapper for Intel's MKL library and came across some design issues. I hope you can help me find the "best" way to handle them. The issues mainly concern operator overloading and are not critical to the wrapper, but they affect readability and/or performance.
The first issue is overloading operators given how the BLAS functions are defined. As an example, matrix multiplication (gemm) is defined as
C ← αAB + βC
(A, B, C being matrices, α, β scalars).
Now I can overload *, + and = alone, but for the BLAS implementation I would need four function calls using overloaded operators instead of one. Or I could use a normal function call (which will be implemented anyway), but lose the "natural" way of writing the equation using overloaded operators, making it less readable (but still more readable than with those horrible BLAS names).
The second issue is read and write access to the matrices. As example we can consider the following upper triangular matrix:
This matrix would be stored efficiently in a 1D array like this (order may vary depending on row/column major order):
Since a matrix has two indices, the easiest way to overload reading would be using
<TYPE> & operator() (size_t row, size_t column);
instead of some workaround with subscript operators. The problem is handling the zeros. They may not be stored in the array, but mathematically they exist. If I want to read these values from another function (not MKL), I need some way to return the zero (aside from storing the matrix type, which is done for BLAS anyway).
Since () returns a reference, I can't return 0. I could make a dummy variable, but if someone were to write to that value, I wouldn't have an upper triangular matrix anymore. So I would have to either change the matrix type, forbid writing to these elements, or ignore the problem (a bad idea).
To change the matrix type I would need to detect writing, which would require some kind of explicit proxy object.
To prevent writing, I would probably have to do the same, since I can't return a const value (the overload's signature doesn't allow it). Alternatively, I could forbid writing through operator() in general, but then I couldn't modify the existing matrix at all, which I don't want.
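The proxy idea mentioned above can be sketched like this (names are illustrative; here writes to a structural zero simply throw, i.e. the "forbid writing" option):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Upper-triangular matrix packed row-major into a 1D array. Reads of a
// structural zero below the diagonal yield 0.0; writes to one throw.
class UpperTri {
    std::size_t n;
    std::vector<double> a;  // packed upper triangle, n*(n+1)/2 entries
    std::size_t index(std::size_t r, std::size_t c) const {
        // offset of (r, c) with r <= c in the packed row-major layout
        return r * n - r * (r + 1) / 2 + c;
    }
public:
    explicit UpperTri(std::size_t n) : n(n), a(n * (n + 1) / 2, 0.0) {}

    // Proxy returned by operator(): distinguishes reads from writes.
    class Ref {
        UpperTri& m;
        std::size_t r, c;
    public:
        Ref(UpperTri& m, std::size_t r, std::size_t c) : m(m), r(r), c(c) {}
        operator double() const {  // read access
            return r <= c ? m.a[m.index(r, c)] : 0.0;
        }
        Ref& operator=(double v) {  // write access
            if (r > c)
                throw std::logic_error("write below the diagonal");
            m.a[m.index(r, c)] = v;
            return *this;
        }
    };

    Ref operator()(std::size_t r, std::size_t c) { return Ref(*this, r, c); }
};
```

Reading `m(2, 0)` converts the proxy to 0.0 without touching storage, while `m(2, 0) = 1.0` hits the throwing assignment; swapping the throw for a type change would give the "change the matrix type on write" variant instead.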
I hope you can give me some pointers on how to handle these issues and what design principles I may be forgetting or should take into account. As I said, they are not critical (I can write appropriate functions for everything instead of operators).
I wrote a library for medical image reconstruction: https://github.com/kvahed/codeare. The matrix object there has a lot of overloaded operators and convenience functions that let one write efficient MATLAB-like code in C++.
What you want to do for passing the data between MKL and other libraries/algorithms is, in my view, impossible. How would you distinguish 0 from 1e-18? What about when you want to move on to some numeric optimisation, etc.? This is premature optimisation you are looking at. Even if you wanted to exploit sparsity, you could only do it, say, column-wise or row-wise, or note down, as above, that you have an upper triangular form. But skipping individual zeros? Crazy. Of course copying zeros around doesn't feel right, but getting your algorithms optimised first and then worrying about the above is the way I'd go.
Also don't forget that a lot of libraries out there cannot handle sparse matrices, at which point you would have to recopy the non-zero part or provide some badly expensive iterator to deliver the results.
Btw you would not only need the operator you noted down in your question but also the const variant; in other words:
template <typename T> class Matrix {
...
T& operator()(size_t n, size_t m);
const T& operator()(size_t n, size_t m) const;
...
};
There is so much more expensive stuff to optimise than std::copy-ing data around - SIMD intrinsics, for example:
https://github.com/kvahed/codeare/blob/master/src/matrix/SIMDTraits.hpp
I'm new to C++ and I think a good way for me to jump in is to build some basic models that I've built in other languages. I want to start with just Linear Regression solved using first order methods. So here's how I want things to be organized (in pseudocode).
class LinearRegression
LinearRegression:
tol = <a supplied tolerance or defaulted to 1e-5>
max_ite = <a supplied max iter or default to 1k>
fit(X, y):
// model learns weights specific to this data set
_gradient(X, y):
// compute the gradient
score(X,y):
// model uses weights learned from fit to compute accuracy of
// y_predicted to actual y
My question: when I use the fit, score, and gradient methods, I don't actually need to pass the arrays (X and y) around or even store them anywhere, so I want to use a reference or a pointer to those structures. My problem is that if a method accepts a pointer to a 2D array, I need to supply the second dimension's size ahead of time or use templates. If I use templates, I end up with something like this for every method that accepts a 2D array:
template<std::size_t rows, std::size_t cols>
void fit(double (&X)[rows][cols], double (&y)[rows]){...}
It seems there's likely a better way. I want my regression class to work with input of any size. How is this done in industry? I know that in some situations the array is just flattened into row- or column-major format and a pointer to the first element is passed, but I don't have enough experience to know what people use in C++.
You raised quite a few points in your question, so here are some points addressing them:
Contemporary C++ discourages working directly with heap-allocated data that you need to manually allocate or deallocate. You can use, e.g., std::vector<double> to represent vectors, and std::vector<std::vector<double>> to represent matrices. Even better would be to use a matrix class, preferably one that is already in mainstream use.
Once you use such a class, you can easily get the dimension at runtime. With std::vector, for example, you can use the size() method. Other classes have other methods. Check the documentation for the one you choose.
You probably really don't want to use templates for the dimensions.
a. If you do so, you will need to recompile each time you get a different input. Your code will be duplicated (by the compiler) to the number of different dimensions you simultaneously use. Lots of bad stuff, with little gain (in this case). There's no real drawback to getting the dimension at runtime from the class.
b. Templates (in your setting) are fitting for the type of the matrix (e.g., is it a matrix of doubles or floats), or possibly the number of dimensions (e.g., for specifying tensors).
Your regressor doesn't need to store the matrix and/or vector. Pass them by const reference. Your interface looks like that of sklearn; if you like, check the source code there. Calling fit just causes the object to store the fitted parameter vector β. It doesn't copy or store the input matrix and/or vector.
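Putting these points together, a sketch of the interface might look like this (the gradient-descent details, learning rate, and toy stopping rule are illustrative, not from the question):

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// fit() takes X and y by const reference and stores only the learned
// weights; dimensions come from the containers at runtime via size().
class LinearRegression {
    Vector w;               // learned weights, one per column of X
    double tol;
    std::size_t max_iter;
public:
    explicit LinearRegression(double tol = 1e-5, std::size_t max_iter = 1000)
        : tol(tol), max_iter(max_iter) {}

    void fit(const Matrix& X, const Vector& y) {
        std::size_t cols = X[0].size();
        w.assign(cols, 0.0);
        const double step = 0.01;  // illustrative fixed learning rate
        for (std::size_t it = 0; it < max_iter; ++it) {
            Vector g = gradient(X, y);
            double norm2 = 0.0;
            for (std::size_t j = 0; j < cols; ++j) {
                w[j] -= step * g[j];
                norm2 += g[j] * g[j];
            }
            if (norm2 < tol * tol) break;  // gradient small: converged
        }
    }

    double predict(const Vector& x) const {
        double p = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j) p += w[j] * x[j];
        return p;
    }

private:
    // Gradient of the mean squared error with respect to w.
    Vector gradient(const Matrix& X, const Vector& y) const {
        std::size_t rows = X.size(), cols = X[0].size();
        Vector g(cols, 0.0);
        for (std::size_t i = 0; i < rows; ++i) {
            double err = predict(X[i]) - y[i];
            for (std::size_t j = 0; j < cols; ++j)
                g[j] += 2.0 * err * X[i][j] / double(rows);
        }
        return g;
    }
};
```

Note that nothing about X or y outlives the call to fit: only `w` remains in the object, mirroring the sklearn-style design described above.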
I have a 2D array. I need to perform a few operations on it as fast as possible (the function will be called dozens of times per second, so it would be nice to make it efficient).
Now, let's say I want to get element A[i][j], is there any difference in speed between simply using A[i][j] and *(A+(i*width+j)) (ignoring the fact that I need to calculate i*width+j, let's say I already have this value)?
With all the optimizations turned on, there should be no difference - not only in the timing, but also in the code the compiler generates for these two constructs.
The biggest difference from a programmer's point of view is readability. The first construct immediately tells the reader that he's dealing with a 2D array, while the second one requires some thinking (is it a row-major order, or a column-major order? Where is the width calculated? What was the reason to choose this way over a more obvious 2D array syntax?). That is why the first construct is preferable in real-life scenarios.
Depending on the quality of the compiler, I think the [] notation can result in faster code. The reason is that when you use raw pointers, the compiler can't be sure that pointer aliasing is not occurring, and this can preclude certain optimizations.
On the other hand, if the [] notation is used, those concerns do not apply and the compiler can get more aggressive with applying optimizations.
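For reference, the two constructs from the question wrapped in small helper functions (names are illustrative), so they can be compared on the same row-major storage:

```cpp
#include <cstddef>

// 2D-array syntax: A[i][j] is defined to mean *(*(A + i) + j).
double get_2d(double A[][4], std::size_t i, std::size_t j) {
    return A[i][j];
}

// Manual flat indexing over the same row-major storage; 'width' plays
// the role it does in the question (here 4).
double get_flat(const double* A, std::size_t width,
                std::size_t i, std::size_t j) {
    return *(A + (i * width + j));
}
```

Both functions read the same memory location for the same (i, j), which is why an optimizing compiler can generate identical code for them.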
I'm trying to build a polynomial function generator that takes a vector (of arbitrary size) as argument and generates a polynomial function I can use later.
for instance,
poly_gen(vector<int> power_index)
returns a function (or something I can call like a function) of the form
y(n) = a0 + a1*n + a2*n^2 + a3*n^3 + ... + ak*n^k
where a0, a1, ..., ak are stored in the vector power_index,
and later I can call it with
int calc_poly(int n)
and this calc_poly can return me a number, calculated by using the polynomial expression generated by poly_gen()
PS:
I don't know how to search for this question by keywords.
"function, construction, generator, pointer, functor..."
didn't give me the desired results.
thank you all!
You can't generate functions at runtime in C++, so you're going to have to go with a functor.
You can create an object that stores the coefficients given by power_index in some manner (perhaps a direct copy), and give it an operator()(int n) that takes n and calculates the value of the polynomial (Horner's rule?). Then you can pass that object around freely.
So, you need a constructor, an internal representation of the coefficients, and an operator() that does the actual calculation. Should be simple enough.
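A minimal sketch of such a functor (the class name is illustrative): it copies the coefficients at construction and evaluates with Horner's rule in operator().

```cpp
#include <utility>
#include <vector>

// Functor representing y(n) = a0 + a1*n + ... + ak*n^k.
class Polynomial {
    std::vector<int> coeffs;  // a0, a1, ..., ak
public:
    explicit Polynomial(std::vector<int> power_index)
        : coeffs(std::move(power_index)) {}

    int operator()(int n) const {
        // Horner's rule: a0 + n*(a1 + n*(a2 + ...)), highest power first.
        int y = 0;
        for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it)
            y = y * n + *it;
        return y;
    }
};
```

Usage matches the question's intent: `Polynomial p({1, 2, 3});` builds y(n) = 1 + 2n + 3n^2, and `p(2)` evaluates it, so the object can be passed anywhere a callable taking an int is expected.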
There was a nice "rosetta stone" question on (almost) this very problem a while back.
There are several C++ answers there: one using boost::lambda, one using a more conventional approach, and one using MPL (and also a C++0x version, which IMHO would be the ideal solution if your compiler supports it). Obviously the simple quadratic there will need generalising to arbitrary numbers of powers, but that's simple enough compared with getting your head around the function object concept.