In my program -- which uses the Eigen library -- I need to operate on 2D vectors. In my inner loop I have the following function:
static inline double eval(double x, double y, double xi, double yi)
{
const double invlen2 = 1/(x*x + y*y);
const double invlen4 = invlen2*invlen2;
const double invlen6 = invlen4*invlen2;
const double x2 = x*x, y2 = y*y;
const double x3 = x2*x, y3 = y2*y;
const double xi2 = xi*xi, yi2 = yi*yi;
return x*invlen2 + invlen4*(x2*xi + 2*x*y*yi - xi*y2)
+ invlen6*(x3*xi2 + 3*x*y2*yi2 + 6*x2*y*xi*yi - 3*x*xi2*y2 - 2*y3*xi*yi - x3*yi2);
}
void f(Vector2d& out, const Vector2d& R, const Vector2d& r)
{
out.x() = eval(R.x(), R.y(), r.x(), r.y());
out.y() = eval(R.y(), R.x(), r.y(), r.x());
}
This expression, although messy, seems like a prime candidate for vectorisation as both the x() and y() computations follow identical paths. My question is how to do it with Eigen without needing to manually drop down to assembly.
This answer has nothing to do with Eigen, but since you mentioned manually dropping down to assembly, I'll add this.
You don't need to use assembly to vectorize code. There are compiler intrinsics that will let you vectorize manually without assembly:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_overview.htm#intref_overview
That said: It looks like Eigen already has internal support for vectorization, but it doesn't appear to be applicable in your example. So I can see why you want to do it manually.
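For illustration, here is a rough SSE2 sketch of that idea applied to the function from the question. It only covers the first two terms of eval, packs both output lanes into one __m128d, and uses a raw-array interface; the name f_sse and that interface are my own, not from the question or from Eigen.
#include <emmintrin.h>
// returns {v[1], v[0]}
static inline __m128d swap_lanes(__m128d v)
{
    return _mm_shuffle_pd(v, v, 1);
}
void f_sse(double out[2], const double R[2], const double r[2])
{
    const __m128d x  = _mm_loadu_pd(R);   // {R[0], R[1]}
    const __m128d y  = swap_lanes(x);     // {R[1], R[0]}
    const __m128d xi = _mm_loadu_pd(r);   // {r[0], r[1]}
    const __m128d yi = swap_lanes(xi);    // {r[1], r[0]}

    const __m128d len2    = _mm_add_pd(_mm_mul_pd(x, x), _mm_mul_pd(y, y));
    const __m128d invlen2 = _mm_div_pd(_mm_set1_pd(1.0), len2);
    const __m128d invlen4 = _mm_mul_pd(invlen2, invlen2);

    const __m128d x2 = _mm_mul_pd(x, x);
    const __m128d y2 = _mm_mul_pd(y, y);

    // term1 = x*invlen2
    const __m128d term1 = _mm_mul_pd(x, invlen2);

    // term2 = invlen4*(x2*xi + 2*x*y*yi - xi*y2); the invlen6 term follows the same pattern
    const __m128d xy2   = _mm_mul_pd(_mm_set1_pd(2.0), _mm_mul_pd(x, y));
    const __m128d term2 = _mm_mul_pd(invlen4,
        _mm_sub_pd(_mm_add_pd(_mm_mul_pd(x2, xi), _mm_mul_pd(xy2, yi)),
                   _mm_mul_pd(xi, y2)));

    _mm_storeu_pd(out, _mm_add_pd(term1, term2));
}
Because the two calls in f() use the same arguments with x and y swapped, both lanes can share every intermediate value; the remaining invlen6 term would be added the same way.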
Because I plan to pass NumPy arrays into my C++ code with pybind11, I would naturally like to compute with row-major matrices. I found a (one-liner) implementation of the squared Euclidean distance on Stack Overflow:
typedef Eigen::MatrixXd Matrix;
void squared_dist(const Matrix& X1, const Matrix& X2, Matrix& D) {
D = ((-2 * X1.transpose() * X2).colwise() + X1.colwise().squaredNorm().transpose()).rowwise() + X2.colwise().squaredNorm();
}
But this requires X1, X2, and D to be the default column-major Matrix. How would I implement a similar one-liner for row-major matrices?
You can use a templated version of that one-liner function so that it can accept RowMajor as well as ColMajor Eigen::Matrix arguments:
template<class T>
void squared_dist(const T& X1, const T& X2, T& D) {
D = ((-2 * X1.transpose() * X2).colwise() + X1.colwise().squaredNorm().transpose()).rowwise() + X2.colwise().squaredNorm();
}
This godbolt demo shows how that function can be used in code.
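As a quick usage sketch (the RowMatrix alias and the random test data are my own additions, not from the answer), the same template can be instantiated with a row-major matrix type; the function is repeated here so the snippet is self-contained:
#include <Eigen/Dense>
#include <iostream>

using RowMatrix =
    Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

template<class T>
void squared_dist(const T& X1, const T& X2, T& D) {
    D = ((-2 * X1.transpose() * X2).colwise()
            + X1.colwise().squaredNorm().transpose()).rowwise()
        + X2.colwise().squaredNorm();
}

int main() {
    RowMatrix X1 = RowMatrix::Random(3, 5);   // five 3-D points stored as columns
    RowMatrix X2 = RowMatrix::Random(3, 4);   // four 3-D points stored as columns
    RowMatrix D;
    squared_dist(X1, X2, D);                  // D(i, j) = ||X1.col(i) - X2.col(j)||^2
    std::cout << D << "\n";                   // 5 x 4 matrix of squared distances
}
Note that the points are still stored as columns; only the in-memory layout of the matrices changes.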
I need to solve an ODE that has stiff behaviour. Said ODE would be given by
int odefunc (double x, const double y[], double f[], void *params)
{
double k=1e12; //At k=1e7 it works (doesn't show stiffness)
double y_eq=-1.931+1.5*log(x)-x;
f[0] = k*(exp(2*y_eq-y[0])-exp(y[0]));
return GSL_SUCCESS;
}
I'm not so used to C (I am used to C++, though), and the documentation on the GSL page for differential equations doesn't have enough comments for me to understand it. I've used this post to understand it (also because I only have one ODE, not a system of them), and to some extent it has been useful, but it's still a bit confusing to me. For my tests I've used my function instead of the one given, and I've gone along with it even though I don't understand the code. But now it has become key for me to understand, because...
I would need to modify the examples they give me: I need to use an algorithm that solves stiff equations, preferably gsl_odeiv2_step_msbdf with an adaptive step size, or something equivalent to that (the BDF method seems to be used widely in academia). That needs the Jacobian, and that part is really intricate if you're not used to GSL or C.
To implement algorithmic differentiation manually, it helps to introduce explicit intermediate steps:
int odefunc (double x, const double y[], double f[], void *params)
{
double k=1e12; //At k=1e7 it works (doesn't show stiffness)
double y_eq=-1.931+1.5*log(x)-x;
double v1 = exp(2*y_eq-y[0]);
double v2 = exp(y[0]);
double z = k*(v1-v2);
f[0] = z;
return GSL_SUCCESS;
}
Then you can enhance each step with its derivative propagation:
int jac (double x, const double y[], double *dfdy,
double dfdx[], void *params)
{
double k=1e12; //At k=1e7 it works (doesn't show stiffness)
// Dx_dx=Dy_dy=1, Dx_dy=Dy_dx=0 are used without naming them
double y_eq = -1.931+1.5*log(x)-x,
Dy_eq_dx=1.5/x-1,
Dy_eq_dy=0;
double v1 = exp(2*y_eq-y[0]),
Dv1_dx=v1*(2*Dy_eq_dx-0),
Dv1_dy=v1*(2*Dy_eq_dy-1);
double v2 = exp(y[0]),
Dv2_dx=v2*(0),
Dv2_dy=v2*(1);
double z = k*(v1-v2),
Dz_dx=k*(Dv1_dx-Dv2_dx),
Dz_dy=k*(Dv1_dy-Dv2_dy);
dfdx[0] = Dz_dx;
dfdy[0] = Dz_dy;
return GSL_SUCCESS;
}
For larger code, use a code-rewriting tool that does these steps automatically; auto-diff.org should have a list of appropriate ones.
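For completeness, here is a rough sketch of how odefunc and jac could be wired into a gsl_odeiv2 driver with the msbdf stepper the question mentions. The tolerances, initial condition, and integration interval below are placeholders I made up, not values from the question; odefunc and jac as defined above are assumed to be in the same file.
#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_odeiv2.h>

int main(void)
{
    gsl_odeiv2_system sys = { odefunc, jac, 1, NULL };   // 1-dimensional system

    // BDF stepper; arguments are initial step size, absolute and relative tolerance
    gsl_odeiv2_driver *d =
        gsl_odeiv2_driver_alloc_y_new(&sys, gsl_odeiv2_step_msbdf,
                                      1e-6, 1e-8, 1e-8);

    double x = 1.0;                 // start at x > 0 because of log(x) (placeholder)
    double x1 = 10.0;               // end of the interval (placeholder)
    double y[1] = { 0.0 };          // initial condition (placeholder)

    int status = gsl_odeiv2_driver_apply(d, &x, x1, y);
    if (status != GSL_SUCCESS)
        printf("driver_apply failed with status %d\n", status);
    else
        printf("y(%g) = %g\n", x, y[0]);

    gsl_odeiv2_driver_free(d);
    return 0;
}
Because jac is supplied, the implicit BDF stepper can build the Newton iterations it needs for the stiff regime.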
I'm only beginning in C++ and I'm struggling to understand some code from a custom vector class in an article I'm working through. The author writes it as:
class vec3
{
public:
vec3() {}
vec3(float e0, float e1, float e2)
{
e[0] = e0;
e[1] = e1;
e[2] = e2;
}
(...)
But so far I've only seen class definitions where the types of data the class holds are declared, such as:
class vec3
{
public:
float m_x;
float m_y;
float m_z;
vec3(float x, float y, float z) : m_x(x), m_y(y), m_z(z)
{}
My guess was that the code in the article creates an empty vector which it then populates with floats, or that something was assumed in the definition. Is this just a syntax difference, or is there something more fundamental that I'm missing? Apologies for what seems like a basic question, but I couldn't find any similar questions... it might be too basic for that! I just wanted to understand it before I moved on.
Thanks,
Paddy
In the code you've posted, you are correct that there is no declaration for the variable e anywhere. I'm not sure if this is because you didn't post that part of the code from the book, or if the book omitted that for brevity.
Without knowing what the book author was meaning by e, I don't want to suggest a completion of the code. There are several things that e could be declared as that would be compatible with the code you've posted.
It defines e as just a float array, float e[3]; (a possible completion is sketched below).
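In other words, the missing member declaration was probably something like this (my guess at the omitted part of the article's class, not the author's actual code):
class vec3
{
public:
    vec3() {}
    vec3(float e0, float e1, float e2)
    {
        e[0] = e0;
        e[1] = e1;
        e[2] = e2;
    }

    float e[3];   // the three components, indexed instead of individually named
};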
With this information added, there is no relevant difference between three separate members and an array. In the end, both variants require the same amount of memory, and in most cases the (optimised!) code generated by the compiler will be exactly the same:
float getY()
{
return m_y;
// will result in address of object + offset to m_y
}
float getY()
{
return e[1];
// will result in address of object + offset to e + offset to second element
// as both offsets are constant, they will be joined to one single summand
// and we are back at first variant...
}
One of the few things you can do with arrays but not with separate members is having a loop, such as the following:
float magnitude()
{
float sum = 0.0F;
for(auto f : e)
sum += f*f;
return sqrtf(sum);
}
However, for such short loops, loop unrolling is pretty likely, and the generated code is again, with high probability, equivalent to the separate-member variant:
float magnitude()
{
return sqrtf(m_x * m_x + m_y * m_y + m_z * m_z);
}
With an array, you can pass all three members to other functions in one single parameter (as a pointer to the first element); with separate members, you'd have to pass all of them separately (well, there are ways around that, but they either require extra effort or are "dirty"...).
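For illustration (print3 is a made-up function, and the array-based vec3 sketched above is assumed), the array member lets you hand the whole point over as one pointer:
#include <cstdio>

void print3(const float* v)                 // e.g. a C-style API taking float[3]
{
    std::printf("%f %f %f\n", v[0], v[1], v[2]);
}

void use(const vec3& a)
{
    print3(a.e);                            // array member: one argument
    // with m_x, m_y, m_z you would pass three separate floats,
    // or first copy them into a local array
}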
I've written a kd-tree template, its parameter being a natural number K.
As part of the template, I've written the following function to compute the squared distance between two points (kd_point is an alias for std::array):
template <unsigned K>
float kd_tree<K>::DistanceSq(const kd_point &P, const kd_point &Q)
{
float Sum = 0;
for (unsigned i = 0; i < K; i++)
Sum += (P[i] - Q[i]) * (P[i] - Q[i]);
return Sum;
}
I've turned "Enable C++ Core Check (Release)" on, and it gives me said warning. Is there a right way to write this routine to eliminate the warning?
Since you mention in the comments that your kd_points support range-based iteration (so I assume they can return iterators), you can rewrite the function without the raw loop. Use named algorithms from the standard library instead:
// requires <numeric> (std::inner_product) and <functional> (std::plus)
template <unsigned K>
float kd_tree<K>::DistanceSq(const kd_point &P, const kd_point &Q)
{
return std::inner_product(
begin(P), end(P), begin(Q), 0.0f, std::plus<float>{},
[](float pi, float qi) {
return (pi - qi)*(pi - qi);
}
);
}
The standard library would be exempt from the warning, of course. If the (in this case) marginal benefit of replacing a raw loop with a named operation doesn't appeal to you, consider that if you ever come back to this code with a C++17-enabled compiler, you'll be able to parallelize it almost effortlessly:
// requires <numeric> (std::transform_reduce) and <execution> (std::execution::par); C++17
template <unsigned K>
float kd_tree<K>::DistanceSq(const kd_point &P, const kd_point &Q)
{
return std::transform_reduce(std::execution::par, // Parallel execution enabled
begin(P), end(P), begin(Q), 0.0f, std::plus<float>{},
[](float pi, float qi) {
return (pi - qi)*(pi - qi);
}
);
}
The answer by StoryTeller is probably the most suitable C++ way to solve this particular task.
I would like to add that, in general, if you want to iterate not over one but over two sequences simultaneously, you can use the "secret overload" of boost::range::for_each that accepts two ranges:
#include <boost/range/algorithm_ext/for_each.hpp>
template <unsigned K>
float kd_tree<K>::DistanceSq(const kd_point &P, const kd_point &Q)
{
float Sum = 0;
boost::range::for_each(P, Q, [&Sum](float p, float q)
{
Sum += (p - q) * (p - q);
});
return Sum;
}
Note that, like the standard algorithms, this algorithm is header-only and won't add any library dependency to your code.
Let's say I have the following:
struct point
{
double x;
double y;
double z;
};
I can write the following:
void point_mult(point& p, double c) { p.x *= c; p.y *= c; p.z *= c; }
void point_add(point& p, const point& p2) { p.x += p2.x; p.y += p2.y; p.z += p2.z; }
So I can then do the following:
point p{1,2,3};
point_mult(p, 2);
point_add(p, point{4,5,6});
This requires no copies of point, and only two constructions, namely the construction of point{1,2,3} and the construction of point{4,5,6}. I believe this applies even if point_add, point_mult and point{...} are in separate compilation units (i.e. can't be inlined).
However, I'd like to write code in a more functional style like this:
point p = point_add(point_mult(point{1,2,3}, 2), point{4,5,6});
How can I write point_mult and point_add such that no copies are required (even if point_mult and point_add are in separate compilation units), or is functional style forced to be not as efficient in C++?
Let's ignore the implicit fallacy of the question (namely that copying automatically means reduced efficiency). And let's also ignore the question of whether any copying would actually happen, or whether it would all be elided away by any half-decent compiler. Let's just take it on face value: can this be done without copying?
Yes, and it is probably the only other legitimate use for r-value references (though the previously ignored stipulations make this use case dubious):
point &&point_mult(point &&p, double c);
Of course, this will only bind to temporaries. So you would need an alternate version for l-values:
point &point_mult(point &p, double c);
The point is that you pass the references through as they are, either as references to temporaries or references to l-values.
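A minimal sketch of what that overload pair might look like (my reading of the answer, together with matching point_add overloads that the answer doesn't spell out):
#include <utility>

struct point { double x, y, z; };

point&& point_mult(point&& p, double c)
{ p.x *= c; p.y *= c; p.z *= c; return std::move(p); }

point&  point_mult(point& p, double c)
{ p.x *= c; p.y *= c; p.z *= c; return p; }

point&& point_add(point&& p, const point& q)
{ p.x += q.x; p.y += q.y; p.z += q.z; return std::move(p); }

point&  point_add(point& p, const point& q)
{ p.x += q.x; p.y += q.y; p.z += q.z; return p; }

// The temporary point{1,2,3} is threaded through both calls by reference;
// the only copy is the final initialisation of p from the returned reference.
point p = point_add(point_mult(point{1, 2, 3}, 2), point{4, 5, 6});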
It can be done with really ugly template metaprogramming. For example, Eigen uses templates so that expressions like matrix1 + matrix2 * matrix3 don't need to create any temporaries. The gist of how it works is that the + and * operators for matrices don't return Matrix objects, but instead return some kind of matrix expression object which is templatized on the types of the expression parameters. This matrix expression object can then compute parts of the expression only when they are needed, instead of creating temporary objects to store the results of subexpressions.
Actually implementing this can get quite messy. Have a look at Eigen's source if you're interested. Boost's uBLAS also does something similar, though not as extensively as Eigen.
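To make the idea concrete, here is a deliberately tiny expression-template toy for the point struct from the question (my own illustration; Eigen's real machinery is far more elaborate):
struct point { double x, y, z; };

// Expression node for a sum; it only stores references to its operands.
template <class L, class R>
struct sum { const L& l; const R& r; };

// Reading one component from a plain point...
inline double get_x(const point& p) { return p.x; }
inline double get_y(const point& p) { return p.y; }
inline double get_z(const point& p) { return p.z; }

// ...or, recursively, from a sum node.
template <class L, class R> double get_x(const sum<L, R>& s) { return get_x(s.l) + get_x(s.r); }
template <class L, class R> double get_y(const sum<L, R>& s) { return get_y(s.l) + get_y(s.r); }
template <class L, class R> double get_z(const sum<L, R>& s) { return get_z(s.l) + get_z(s.r); }

// operator+ only builds a node: no arithmetic, no point temporaries.
inline sum<point, point> operator+(const point& a, const point& b) { return {a, b}; }

template <class L, class R>
sum<sum<L, R>, point> operator+(const sum<L, R>& a, const point& b) { return {a, b}; }

// Evaluation happens exactly once, when the whole expression becomes a point again.
template <class E>
point eval(const E& e) { return point{ get_x(e), get_y(e), get_z(e) }; }

point demo()
{
    point a{1, 2, 3}, b{4, 5, 6}, c{7, 8, 9};
    return eval(a + b + c);   // two lightweight nodes, exactly one point constructed
}
The nodes hold references, so expressions involving temporaries must be evaluated within the same full expression; libraries like Eigen handle this far more carefully.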
An efficient (and generalized) technique is expression templates. You can read a nice introductory explanation here.
It's difficult to implement and, being based on templates, you cannot use separate compilation units, but it's very efficient. An interesting application in symbolic computation is parsing: Boost.Spirit builds very efficient parsers out of them.
The C++11 auto keyword helps with their use in practical programming tasks, as it always does when dealing with complex types; see this other answer.
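A small follow-up using the toy nodes sketched earlier (note that the nodes store references, so only hold on to nodes that refer to named objects, not to temporaries):
point a{1, 2, 3}, b{4, 5, 6};
auto expr = a + b;       // its type, sum<point, point>, is spelled out for us by auto
point p = eval(expr);    // the additions run here, once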
First, why not use "better" functions?
struct Point {
double x;
double y;
double z;
Point& operator+=(Point const& right) {
x += right.x; y += right.y; z += right.z;
return *this;
}
Point& operator*=(double f) {
x *= f; y *= f; z *= f;
return *this;
}
};
Now it can be used as:
Point p = ((Point{1,2,3} *= 2) += Point{4,5,6});
But I truly think that you worry too much about copies here (and performance).
1. Make it work
2. Make it fast
If you don't have anything that already works, talking about performance is akin to chasing windmills... bottlenecks are rarely where we thought they would be.
Change the definition of point_mult() (and, analogously, point_add()) to:
point& point_mult(point& p, double c) { p.x *= c; p.y *= c; p.z *= c; return p; }
^^^^^^ ^^^^^^^^^
And call it as:
point & p = point_add(point_mult(*new point{1,2,3}, 2), point{4,5,6});
^^^ ^^^^^
There is no copy involved. However, you later have to do delete &p; to release the memory.