I am pretty competent with Python, but I'm pretty new to C++ and things like pointers. I try to write some codes for solving ODE with the Eigen package for linear algebra (I will need to deal with lots of matrices later, so I plan to start with it). I have the following code for RK4 and they work:
#include "../eigen-eigen-b3f3d4950030/Eigen/Dense"
using namespace Eigen;
VectorXd Func(const VectorXd& a)
{ // equations for solving simple harmonic oscillator
Vector2d ans;
ans(0) = a(1); // dy/dt
ans(1) = -a(0); // d2y/dt2
return ans;
}
MatrixXd RK4(VectorXd Func(const VectorXd& y), const Ref<const VectorXd>& y0, double h, int step_num)
{
MatrixXd y(step_num, y0.rows());
y.row(0) = y0;
for (int i=1; i<step_num; i++){
VectorXd y_old = y.row(i-1).transpose();
VectorXd k1 = h*Func(y_old);
VectorXd k2 = h*Func(y_old+k1/2);
VectorXd k3 = h*Func(y_old+k2/2);
VectorXd k4 = h*Func(y_old+k3);
VectorXd dy = (k1 + 2*k2 + 2*k3 + k4)/6;
y.row(i) = y.row(i-1) + dy.transpose();
}
return y;
}
int main()
{
Vector2d v1;
v1(0) = 1.4; v1(1) = -0.1;
double h = 0.1;
int step_num = 50;
MatrixXd sol = RK4(Func,v1,h,step_num);
return 0;
}
I have the following questions:
What's the meaning of & in the function argument? Pass by reference? I just copied the code from the official documentation, but I'm not too sure if I understand every bit in the function arguments of RK4 such as VectorXd Func(const VectorXd& y). Are there alternative ways of accepting Eigen::MatrixXd and functions which accept Eigen::MatrixXd as function arguments?
From what I understand, we cannot return a whole 2D array from a function, and what we are returning is just the first element of the array (correct me if I'm wrong). What about the Eigen::MatrixX? What are we actually passing/returning? The first element of the matrix, or a completely new object defined by the Eigen library?
I'm not sure if these codes are written efficiently. Are there anything I can do to optimize this part? (Just wondering if I have done anything that may significantly slow down the speed).
Thanks
Yes, & is pass-by-reference; The latter one is syntax for passing a function, that takes a vector by reference and returns a vector. An Eigen::Matrix should always be passed by reference. There are tons of ways to pass one function to another, the most idiomatic ones in C++ are probably template arguments and std::function.
You can't have multiple return arguments, but you can return a pair or a tuple or a Matrix object. RK4 returns a whole matrix.
The code is fairly efficient. If it was really performance-critical there might be a few things that could be optimized, but I would not worry for now.
The biggest point is that RK4 is very general and works with dynamically sized types, which are a lot more expensive than their statically sized counter parts (VectorXf vs Vector2d). But this would require you to create a specialized version for all dimensions you are interested in or to get the compiler to do it for you by using templates.
Generally: Read a good book to get you started.
Related
Consider the following code:
#include <Eigen/Core>
using Matrix = Eigen::Matrix<float, 2, 2>;
Matrix func1(const Matrix& mat) { return mat + 0.5; }
Matrix func2(const Matrix& mat) { return mat / 0.5; }
func1() does not compile; you need to replace mat with mat.array() in the function body to fix it ([1]). However, func2() does compile as-is.
My question has to do with why the API is designed this way. Why is addition-with-scalar and division-by-scalar treated differently? What problems would arise if the following method is added to the Matrix class, and why haven't those problems arisen already for the operator/ method?:
auto operator+(Scalar s) const { return this->array() + s; }
From a mathematics perspective, a scalar added to a matrix "should" be the same as adding the scalar only to the diagonal. That is, a math text would usually use M + 0.5 to mean M + 0.5I, for I the identity matrix. There are many ways to justify this. For example, you can appeal to the analogy I = 1, or you can appeal to the desire to say Mx + 0.5x = (M + 0.5)x whenever x is a vector, etc.
Alternatively, you could take M + 0.5 to add 0.5 to every element. This is what you think is right if you don't think of matrices from a "linear algebra mindset" and treat them as just collections (arrays) of numbers, where it is natural to just "broadcast" scalar operations.
Since there are multiple "obvious" ways to handle + between a scalar and a matrix, where someone expecting one may be blindsided by the other, it is a good idea to do as Eigen does and ban such expressions. You are then forced to signify what you want in a less ambiguous way.
The natural definition of / from an algebra perspective coincides with the array perspective, so no reason to ban it.
I am trying to compute colsum(N * P), where N is a sparse, 1M by 2500 matrix, and P is a dense 2500 by 1.5M matrix. I am using the Eigen C++ library with Intel's MKL library. The issue is that the matrix N*P can't actually exist in memory, it's way too big (~10 TB). My question is whether Eigen will be able to handle this computation through some combination of lazy evaluation and parallelism? It says here that Eigen won't make temporary matrices unnecessarily: http://eigen.tuxfamily.org/dox-devel/TopicLazyEvaluation.html
But does Eigen know to compute N * P in piecewise chunks that will actually fit in memory? IE: it will have to do something like colsum(N * P_1) ++ colsum(N * P_2) ++ .. ++ colsum(N * P_n), where P is split into n different submatrices column-wise and "++" is concatenation.
I am working with 128 GB RAM.
I gave it a try but ended up with a bad malloc (I'm only running on 8GB on Win8). I set up my main() and used a not inline colsum function I wrote.
int main(int argc, char *argv[])
{
Eigen::MatrixXd dense = Eigen::MatrixXd::Random(1000, 100000);
Eigen::SparseMatrix<double> sparse(100000, 1000);
typedef Triplet<int> Trip;
std::vector<Trip> trps(dense.rows());
for(int i = 0; i < dense.rows(); i++)
{
trps[i] = Trip(20*i, i, 2);
}
sparse.setFromTriplets(trps.begin(), trps.end());
VectorXd res = colsum(sparse, dense);
std::cout << res;
std::cin >> argc;
return 0;
}
The attempt was simply:
__declspec(noinline) VectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
return (sparse * dense).colwise().sum();
}
That had a bad malloc. Sol it looks like you have to split it up manually on your own (unless someone else has a better solution).
EDIT
I improved the function a bit, but the get the same bad malloc:
__declspec(noinline) VectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
return (sparse * dense).topRows(4).colwise().sum();
}
EDIT 2
Another option would be to make the sparse matrix dense and force a lazy evaluation. I don't think that it would work with a sparse matrix (oh well).
__declspec(noinline) VectorXd
colsum(const Eigen::SparseMatrix<double> &sparse, const Eigen::MatrixXd &dense)
{
Eigen::MatrixXd denseSparse(sparse);
return denseSparse.lazyProduct(dense).colwise().sum();
}
This doesn't give me the bad malloc, but computes a lot of pointless 0*x_i expressions.
To answer your question: Especially, when products are involved, Eigen often evaluates parts of expressions into temporaries. In some situations this could be optimized but is not implemented yet, in some cases this is essentially the most efficient way to implement it.
However, in your case you could simply calculate the colsum of N (a 1 x 2500 vector) and multiply that by P.
Maybe future versions of Eigen will be able to make this kind of optimization themselves, but most of the time it is a good idea to make problem-specific optimizations oneself before letting the computer do the rest of the work.
Btw: I'm afraid sparse.colwise() is not implemented yet, so you must compute that manually. If you are lazy, you can instead compute Eigen::RowVectorXd Nsum = Eigen::RowVectorXd::Ones(N.rows())*P; (I have not checked it, but this might actually get optimized to near optimal code, with the most recent versions of Eigen).
I have started with Rcpp and I am working through Hadley's book / page here.
I guess these basics are more than enough for me, still though I missed, some aspect or feel that this might be less basic:
How can I assign attributes to an arbitrary R Object using C++?
E.g.:
// [[Rcpp::export]]
NumericVector attribs(CharacterVector x,NumericVector y) {
NumericVector out = y;
out.attr("my-attr") = x;
return out;
}
I understand I have to specify the type in C++, but still I wonder whether there's a way to assign an attribute to ANY R object that I pass...
I have seen that settatr in the data.table works with C++, but seems to work only with elements of class data.table. Is there any way but writing an extra function for every R mode / class?
EDIT: The ultimate purpose is to speed up assigning attributes to each element of a list.
We had discussion here previously – but it did not involve Rcpp so far (except for using it via other packages.)
Maybe you want something like this? RObject is the generic class for all R objects. Note the use of clone so that you don't accidentally modify the object passed in.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector set_attr(CharacterVector x, RObject y) {
CharacterVector new_x = clone(x);
new_x.attr("my-attr") = y;
return new_x;
}
/*** R
x <- c("a", "b", "c")
set_attr(x, 1)
set_attr(x, "a")
attributes(x)
*/
Pardon my enthusiam: It's simply amazing how Rcpp helps an absolute novice to speed up code like that!
That's why I gave it a try though Hadley's answer perfectly covers the question. I tried to turn the input given here into a solution for the more specific case of adding attributes to a list of objects as fast as possible.
Even though my code is probably far from perfect I was already able to outperform all
functions suggested in the discussion, including data.table's setattr. I guess this is probably due to the fact that I let C++ not only to do the assignment but the looping as well.
Here's the example and benchmark:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
RObject fx(List x, CharacterVector y){
int n = x.size();
NumericVector new_el;
for(int i=0; i<n; i++) {
new_el = x[i];
new_el.attr("testkey") = y;
x[i] = new_el;
}
return(x);
}
I need a 3D matrix/array structure on my code, and right now I'm relying on Eigen for both my matrices and vectors.
Right now I am creating a 3D structure using new:
MatrixXd* cube= new MatrixXd[60];
for (int i; i<60; i++) cube[i]=MatrixXd(60,60);
and for acessing the values:
double val;
MatrixXd pos;
for (int i; i<60; i++){
pos=cube[i];
for (int j; j<60; j++){
for (int k; k<60; k++){
val=pos(j,k);
//...
}
}
}
However, right now it is very slow in this part of the code, which makes me beleive that this might not be the most efficient way. Are there any alternatives?
While it was not available, when the question was asked, Eigen has been providing a Tensor module for a while now. It is still in an "unsupported" stage (meaning the API may change), but basic functionality should be mostly stable. The documentation is scattered here and here.
A solution I used is to form a fat matrix containing all the matrices you need stacked.
MatrixXd A(60*60,60);
and then access them with block operations
A0 = A.block<60,60>(0*60,0);
...
A5 = A.block<60,60>(5*60,0);
An alternative is to create a very large chunk of memory ones, and maps Eigen matrices from it:
double* data = new double(60*60 * 60*60*60);
Map<MatrixXd> Mijk(data+60*(60*(60*k)+j)+i), 60, 60);
At this stage you can use Mijk like a MatrixXd object. However, since this not a MatrixXd type, if you want to pass it to a function, your function must either:
be of the form foo(Map<MatrixXd> mat)
be a template function: template<typename Der> void foo(const MatrixBase<Der>& mat)
take a Ref<MatrixXd> object which can handle both Map<> and Matrix<> objects without being a template function and without copies. (doc)
I am doing some C++ computational mechanics (don't worry, no physics knowledge required here) and there is something that really bothers me.
Suppose I want to represent a 3D math Vector (nothing to do with std::vector):
class Vector {
public:
Vector(double x=0., double y=0., double z=0.) {
coordinates[0] = x;
coordinates[1] = y;
coordinates[2] = z;
}
private:
double coordinates[3];
};
So far so good. Now I can overload operator[] to extract coordinates:
double& Vector::operator[](int i) {
return coordinates[i] ;
}
So I can type:
Vector V;
… //complex computation with V
double x1 = V[0];
V[1] = coord2;
The problem is, indexing from 0 is NOT natural here. I mean, when sorting arrays, I don't mind, but the fact is that the conventionnal notation in every paper, book or whatever is always substripting coordinates beginning with 1.
It may seem a quibble but the fact is that in formulas, it always takes a double-take to understand what we are taking about. Of course, this is much worst with matrices.
One obvious solution is just a slightly different overloading :
double& Vector::operator[](int i) {
return coordinates[i-1] ;
}
so I can type
double x1 = V[1];
V[2] = coord2;
It seems perfect except for one thing: this i-1 subtraction which seems a good candidate for a small overhead. Very small you would say, but I am doing computationnal mechanics, so this is typically something we couldn't afford.
So now (finally) my question: do you think a compiler can optimize this, or is there a way to make it optimize ? (templates, macro, pointer or reference kludge...)
Logically, in
double xi = V[i];
the integer between the bracket being a literal most of the time (except in 3-iteration for loops), inlining operator[] should make it possible, right ?
(sorry for this looong question)
EDIT:
Thanks for all your comments and answers
I kind of disagree with people telling me that we are used to 0-indexed vectors.
From an object-oriented perspective, I see no reason for a math Vector to be 0-indexed because implemented with a 0-indexed array. We're not suppose to care about the underlying implementation. Now, suppose I don't care about performance and use a map to implement Vector class. Then I would find it natural to map '1' with the '1st' coordinate.
That said I tried out with 1-indexed vectors and matrices, and after some code writing, I find it not interacting nicely every time I use an array around. I thougth Vector and containers (std::array,std::vector...) would not interact often (meaning, transfering data between one another), but it seems I was wrong.
Now I have of a solution that I think is less controversial (please give me your opinion) :
Every time I use a Vector in some physical context, I think of using an enum :
enum Coord {
x = 0,
y = 1,
z = 2
};
Vector V;
V[x] = 1;
The only disadvantage I see being that these x,y and z can be redefined without enven a warning...
This one should be measured or verified by looking at the disassembly, but my guess is: The getter function is tiny and its arguments are constant. There is a high chance the compiler will inline the function and constant-fold the subtraction. In that case the runtime cost would be zero.
Why not to try this:
class Vector {
public:
Vector(double x=0., double y=0., double z=0.) {
coordinates[1] = x;
coordinates[2] = y;
coordinates[3] = z;
}
private:
double coordinates[4];
};
If you are not instantiating your object in quantities of millions, then the memory waist might be affordable.
Have you actually profiled it or examined the generated code? That's how this question is answered.
If the operator[] implementation is visible then this is likely to be optimized to have zero overhead.
I recommend you define this in the header (.h) for your class. If you define it in the .cpp then the compiler can't optimize as much. Also, your index should not be an "int" which can have negative values... make it a size_t:
class Vector {
// ...
public:
double& operator[](const size_t i) {
return coordinates[i-1] ;
}
};
You cannot say anything objective about performance without benchmarking. On x86, this subtraction can be compiled using relative addressing, which is very cheap. If operator[] is inlined, then the overhead is zero—you can encourage this with inline or with compiler-specific instructions such as GCC’s __attribute__((always_inline)).
If you must guarantee it, and the offset is a compile-time constant, then using a template is the way to go:
template<size_t I>
double& Vector::get() {
return coordinates[i - 1];
}
double x = v.get<1>();
For all practical purposes, this is guaranteed to have zero overhead thanks to constant-folding. You could also use named accessors:
double Vector::x() const { return coordinates[0]; }
double Vector::y() const { return coordinates[1]; }
double Vector::z() const { return coordinates[2]; }
double& Vector::x() { return coordinates[0]; }
double& Vector::y() { return coordinates[1]; }
double& Vector::z() { return coordinates[2]; }
And for loops, iterators:
const double* Vector::begin() const { return coordinates; }
const double* Vector::end() const { return coordinates + 3; }
double* Vector::begin() { return coordinates; }
double* Vector::end() { return coordinates + 3; }
// (x, y, z) -> (x + 1, y + 1, z + 1)
for (auto& i : v) ++i;
Like many of the others here, however, I disagree with the premise of your question. You really should simply use 0-based indexing, as it is more natural in the realm of C++. The language is already very complex, and you need not complicate things further for those who will maintain your code in the future.
Seriously, benchmark this all three ways (ie, compare the subtraction and the double[4] methods to just using zero-based indices in the caller).
It's entirely possible you'll get a huge win from forcing 16-byte alignment on some cache architectures, and equally possible the subtraction is effectively free on some compiler/instruction set/code path combinations.
The only way to tell is to benchmark realistic code.