I have an Eigen matrix A which contains NaN values. I want to get the sum of squared differences between this matrix and multiple other matrices.
double getDistance(const Eigen::MatrixXf& from, const Eigen::MatrixXf& to)
{
    Eigen::MatrixXf difference = (to - from).cwiseAbs2();
    difference = difference.unaryExpr(
        [](float v) { return std::isnan(v) ? 0.0f : v; });
    double distance = difference.sum();
    return distance;
}
std::vector<double> getDistances(const std::vector<Eigen::MatrixXf>& from, const Eigen::MatrixXf& to)
{
    std::vector<double> distances;
    for (std::size_t i = 0; i < from.size(); ++i)
    {
        distances.push_back(getDistance(from[i], to));
    }
    return distances;
}
Right now I have to remove the NaNs from difference every single time and only then take the sum.
I was thinking about writing my own sum function that skips NaNs.
Is there an elegant way to do this?
Does unaryExpr work for summing up, where we would need an "out parameter"?
I would recommend following starmole's recommendation first, but to answer the question as asked: isNaN() and select() are for you:
return (to-from).array().isNaN().select(0,to-from).squaredNorm();
As a side note, NaN propagation handling was improved with the release of Eigen 3.4.
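For context, a minimal sketch of how this one-liner replaces the unaryExpr round-trip in getDistance (assuming Eigen 3.3 or later, where Array::isNaN() is available):

#include <Eigen/Dense>

double getDistance(const Eigen::MatrixXf& from, const Eigen::MatrixXf& to)
{
    Eigen::ArrayXXf diff = (to - from).array();
    // Replace NaN entries with 0, then sum the squares of the rest.
    Eigen::ArrayXXf cleaned = diff.isNaN().select(0.0f, diff);
    return cleaned.square().sum();
}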
I have been trying to understand the new ranges library and to convert some of the more traditional for loops into functional code. The example code given by cppreference is very straightforward and readable. However, I am unsure how to apply ranges over a vector of Points where every x and y value needs to be looked at, calculated, and compared at the end to find the greatest distance.
struct Point
{
    double x;
    double y;
};

double ComputeDistance(const Point& p1, const Point& p2)
{
    return std::hypot(p1.x - p2.x, p1.y - p2.y);
}
double GetMaxDistance(const std::vector<Point>& points)
{
    double maxDistance = 0.0;
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        for (std::size_t j = i; j < points.size(); ++j)
        {
            maxDistance = std::max(maxDistance, ComputeDistance(points.at(i), points.at(j)));
        }
    }
    return maxDistance;
}
GetMaxDistance is the code that I would love to clean up and apply ranges to. I thought it would be as simple as doing something like:
double GetMaxDistance(const std::vector<Point>& points)
{
    auto result = points | std::views::transform(ComputeDistance);
    return static_cast<double>(result);
}
And then I realized that was not correct since I am not passing any values into the function. So I thought:
double GetMaxDistance(const std::vector<Point>& points)
{
    for (auto point : points | std::views::transform(ComputeDistance))
        // get the max distance somehow and return it?
        // Do I add another for(auto nextPoint : points) here and drop the first item?
}
But then I realized that this applies the function to every point, not to each pair of points, and it still passes only one argument to ComputeDistance. Since I need the distance between all points in the vector, I have to compare each point to every other point, leaving it as an n^2 algorithm. I am not trying to beat n^2; I would just like to know whether there is a way to give this traditional for loop a modern, functional approach.
Which brings us back to the title: how do I apply std::ranges in this case? Is it even possible with what the standard has given us at this point? I know more is to be added in C++23, so I don't know whether this has to wait for that release or is not possible at all.
Thanks!
The algorithm you're looking for is combinations - but there's no range adaptor for that (not in C++20, not in range-v3, and there won't be one in C++23).
However, we can manually construct it in this case using an algorithm usually called flat-map:
inline constexpr auto flat_map = [](auto f){
    return std::views::transform(f) | std::views::join;
};
which we can use as follows:
double GetMaxDistance(const std::vector<Point>& points)
{
    namespace rv = std::views;
    return std::ranges::max(
        rv::iota(0u, points.size())
        | flat_map([&](size_t i){
              return rv::iota(i+1, points.size())
                   | rv::transform([&](size_t j){
                         return ComputeDistance(points[i], points[j]);
                     });
          }));
}
The outer iota is our first loop. And then for each i, we get a sequence from i+1 onwards to get our j. And then for each (i,j) we calculate ComputeDistance.
Or if you want the transform at top level (arguably cleaner):
double GetMaxDistance(const std::vector<Point>& points)
{
    namespace rv = std::views;
    return std::ranges::max(
        rv::iota(0u, points.size())
        | flat_map([&](size_t i){
              return rv::iota(i+1, points.size())
                   | rv::transform([&](size_t j){
                         return std::pair(i, j);
                     });
          })
        | rv::transform([&](auto p){
              return ComputeDistance(points[p.first], points[p.second]);
          }));
}
or even (this version produces a range of pairs of references to Point, to allow a more direct transform):
double GetMaxDistance(const std::vector<Point>& points)
{
    namespace rv = std::views;
    namespace hof = boost::hof;
    return std::ranges::max(
        rv::iota(0u, points.size())
        | flat_map([&](size_t i){
              return rv::iota(i+1, points.size())
                   | rv::transform([&](size_t j){
                         return std::make_pair(
                             std::ref(points[i]),
                             std::ref(points[j]));
                     });
          })
        | rv::transform(hof::unpack(ComputeDistance)));
}
These all basically do the same thing, it's just a question of where and how the ComputeDistance function is called.
C++23 will add cartesian_product and chunk (range-v3 has them now), and just recently added zip_transform, which will also allow:
double GetMaxDistance(const std::vector<Point>& points)
{
    namespace rv = std::views;
    namespace hof = boost::hof;
    return std::ranges::max(
        rv::zip_transform(
            rv::drop,
            rv::cartesian_product(points, points)
            | rv::chunk(points.size()),
            rv::iota(1))
        | rv::join
        | rv::transform(hof::unpack(ComputeDistance))
    );
}
cartesian_product by itself would give you all pairs - which both includes (x, x) for all x and both (x, y) and (y, x), neither of which you want. When we chunk it by points.size() (which produces N ranges of length N), we then repeatedly drop a steadily increasing (iota(1)) number of elements: just one from the first chunk (the pair that contains the first element twice), then two from the second chunk (the (points[1], points[0]) and (points[1], points[1]) elements), etc.
The zip_transform part still produces a range of chunks of pairs of Point; the join flattens it to a range of pairs of Point, which we then need to unpack into ComputeDistance.
This all exists in range-v3 (except that zip_transform is named zip_with there). In range-v3, though, you get common_tuple, which Boost.HOF doesn't support, but you can make it work.
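As an aside, if you would rather avoid the Boost.HOF dependency, a tiny hand-rolled unpack (my own sketch, not part of any library above) works for std::pair and std::tuple:

#include <tuple>
#include <utility>

// Adapts a callable taking N arguments into one taking a tuple-like value.
// Works for std::pair and std::tuple; range-v3's common_tuple may need
// extra adaptation.
inline constexpr auto unpack = [](auto f){
    return [f](auto&& t){
        return std::apply(f, std::forward<decltype(t)>(t));
    };
};

It can then be used as a drop-in for hof::unpack above: | rv::transform(unpack(ComputeDistance)).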
I am trying to determine the eigenvalues and eigenvectors of a sparse array in Eigen. Since I need to compute all the eigenvectors and eigenvalues, and I could not get the unsupported ArpackSupport module working, I chose to convert the system to a dense matrix and compute the eigensystem using SelfAdjointEigenSolver (I know my matrix is real and has real eigenvalues). This works well until I reach matrices of size 1024*1024, at which point I start getting deviations from the expected results.
From what I understood of the documentation of this class (https://eigen.tuxfamily.org/dox/classEigen_1_1SelfAdjointEigenSolver.html), it is possible to change the maximum number of iterations:
static const int m_maxIterations
Maximum number of iterations.
The algorithm terminates if it does not converge within m_maxIterations * n iterations, where n denotes the size of the matrix. This value is currently set to 30 (copied from LAPACK).
However, I do not understand how to implement this. Using their example:
SelfAdjointEigenSolver<Matrix4f> es;
Matrix4f X = Matrix4f::Random(4,4);
Matrix4f A = X + X.transpose();
es.compute(A);
cout << "The eigenvalues of A are: " << es.eigenvalues().transpose() << endl;
es.compute(A + Matrix4f::Identity(4,4)); // re-use es to compute eigenvalues of A+I
cout << "The eigenvalues of A+I are: " << es.eigenvalues().transpose() << endl;
How would you modify it in order to change the maximum number of iterations?
Additionally, will this solve my problem or should I try to find an alternative function or algorithm to solve the eigensystem?
My thanks in advance.
Increasing the number of iterations is unlikely to help. On the other hand, moving from float to double will help a lot!
If that does not help, please be more specific about the "deviations from the expected results".
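For illustration, a minimal sketch of the double-precision route (the random symmetric 1024x1024 matrix here is just a stand-in for the real problem):

#include <Eigen/Dense>
#include <iostream>

int main()
{
    // Placeholder for the real problem: a random symmetric matrix.
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(1024, 1024);
    Eigen::MatrixXd A = X + X.transpose();

    // Same solver as before, but in double precision.
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(A);
    if (es.info() != Eigen::Success) {
        std::cerr << "eigendecomposition did not converge\n";
        return 1;
    }
    std::cout << "largest eigenvalue: " << es.eigenvalues().maxCoeff() << '\n';
}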
m_maxIterations is a static const int variable, and as such it can be considered an intrinsic property of the type. Changing such a type property would usually be done via a specific template parameter. In this case, however, it is hard-coded to the constant 30, so that is not possible.
Therefore, your only choice is to change the value in the header file and recompile your program.
However, before doing that, I would try the singular value decomposition. According to the homepage, its accuracy is "Excellent-Proven". Moreover, it can overcome problems caused by matrices that are not perfectly symmetric numerically.
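For example, a sketch of that route using Eigen's BDCSVD (my pick here; the suggestion above just says SVD). For a symmetric A the singular values are the absolute values of the eigenvalues, so the signs have to be recovered, e.g. by comparing the columns of U and V:

#include <Eigen/Dense>
#include <iostream>

// Hedged sketch: eigenvalues of a symmetric A via the SVD. The singular
// values are |eigenvalue|; the sign follows from comparing U and V columns.
void printEigenvaluesViaSVD(const Eigen::MatrixXd& A)
{
    Eigen::BDCSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    for (Eigen::Index i = 0; i < svd.singularValues().size(); ++i) {
        // U.col(i) == +V.col(i) -> positive eigenvalue, == -V.col(i) -> negative.
        double sign = svd.matrixU().col(i).dot(svd.matrixV().col(i)) < 0 ? -1.0 : 1.0;
        std::cout << sign * svd.singularValues()(i) << ' ';
    }
    std::cout << '\n';
}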
I solved the problem by writing a Jacobi algorithm adapted from the book Numerical Recipes:
#include <cmath>
#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::VectorXd;

// Applies one Jacobi rotation to the element pair a(i,j), a(k,l).
void ROTATy(MatrixXd &a, int i, int j, int k, int l, double s, double tau)
{
    double g = a(i, j);
    double h = a(k, l);
    a(i, j) = g - s * (h + g * tau);
    a(k, l) = h + s * (g - h * tau);
}

// Diagonalizes the symmetric n x n matrix a: on return, the columns of v
// hold the eigenvectors and d holds the eigenvalues. The upper triangle
// of a is destroyed in the process.
void jacoby(int n, MatrixXd &a, MatrixXd &v, VectorXd &d)
{
    int i, ip, iq, j;
    double c, g, h, s, sm, t, tau, theta, tresh;
    VectorXd b(n);
    VectorXd z(n);

    v.setIdentity();
    z.setZero();
    for (ip = 0; ip < n; ip++)
    {
        d(ip) = a(ip, ip);
        b(ip) = d(ip);
    }
    for (i = 0; i < 50; i++)   // at most 50 sweeps
    {
        // Sum of off-diagonal elements; zero means we have converged.
        sm = 0.0;
        for (ip = 0; ip < n - 1; ip++)
            for (iq = ip + 1; iq < n; iq++)
                sm += std::fabs(a(ip, iq));
        if (sm == 0.0)
            break;
        // On the first three sweeps, only rotate on large elements.
        tresh = (i < 3) ? 0.2 * sm / (n * n) : 0.0;
        for (ip = 0; ip < n - 1; ip++)
        {
            for (iq = ip + 1; iq < n; iq++)
            {
                g = 100.0 * std::fabs(a(ip, iq));
                // After four sweeps, zero the off-diagonal element if it is
                // negligible compared to both diagonal entries.
                if (i > 3 && (std::fabs(d(ip)) + g) == std::fabs(d(ip))
                          && (std::fabs(d(iq)) + g) == std::fabs(d(iq)))
                    a(ip, iq) = 0.0;
                else if (std::fabs(a(ip, iq)) > tresh)
                {
                    h = d(iq) - d(ip);
                    if ((std::fabs(h) + g) == std::fabs(h))
                        t = a(ip, iq) / h;
                    else
                    {
                        theta = 0.5 * h / a(ip, iq);
                        t = 1.0 / (std::fabs(theta) + std::sqrt(1.0 + theta * theta));
                        if (theta < 0.0)
                            t = -t;
                    }
                    // Rotation parameters; in the Numerical Recipes original
                    // this runs for both branches above, not just the else.
                    c = 1.0 / std::sqrt(1.0 + t * t);
                    s = t * c;
                    tau = s / (1.0 + c);
                    h = t * a(ip, iq);
                    z(ip) -= h;
                    z(iq) += h;
                    d(ip) -= h;
                    d(iq) += h;
                    a(ip, iq) = 0.0;
                    for (j = 0; j < ip; j++)
                        ROTATy(a, j, ip, j, iq, s, tau);
                    for (j = ip + 1; j < iq; j++)
                        ROTATy(a, ip, j, j, iq, s, tau);
                    for (j = iq + 1; j < n; j++)
                        ROTATy(a, ip, j, iq, j, s, tau);
                    for (j = 0; j < n; j++)
                        ROTATy(v, j, ip, j, iq, s, tau);
                }
            }
        }
        // End of sweep: fold the accumulated corrections z into b and d
        // and reset z, as in the Numerical Recipes original.
        for (ip = 0; ip < n; ip++)
        {
            b(ip) += z(ip);
            d(ip) = b(ip);
            z(ip) = 0.0;
        }
    }
}
The function jacoby receives the size n of the square matrix, the matrix a whose eigensystem we want to solve, a matrix that will receive the eigenvectors in its columns, and a vector that will receive the eigenvalues. It is a bit slow, so I tried to parallelize it with OpenMP (see: Parallelization of Jacobi algorithm using eigen c++ using openmp), but for 4096x4096 matrices this did not yield an improvement in computation time, unfortunately.
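For completeness, a quick usage sketch of the routine above (the size is illustrative):

int main()
{
    const int n = 512;                   // illustrative size
    MatrixXd X = MatrixXd::Random(n, n);
    MatrixXd A = X + X.transpose();      // symmetric test input
    MatrixXd V(n, n);
    VectorXd d(n);
    jacoby(n, A, V, d);                  // eigenvalues in d, eigenvectors in the columns of V
    return 0;
}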
I am trying to use Intel TBB's parallel_reduce to obtain the sum of an array of doubles. However, the result is different from the OpenMP reduction implementation.
Here is the OpenMP one:
double dAverageTemp = 0.0;
#pragma omp parallel for reduction(+:dAverageTemp)
for (int i = 0; i < sCartesianSize; i++)
dAverageTemp += pdTempCurr[i];
This code returns the correct value which is "317.277493"; however this TBB code:
double dAverageTemp = tbb::parallel_reduce(
    tbb::blocked_range<double*>(pdTempCurr, pdTempCurr + sCartesianSize - 1),
    0.0,
    [](const tbb::blocked_range<double*> &r, double value) -> double {
        return std::accumulate(r.begin(), r.end(), value);
    },
    std::plus<double>()
);
insists that the result is "317.277193".
What am I missing here?
Although all the comments about the order of summation are perfectly correct, the simple truth here is that you have a bug in your code. All std::, thrust:: and tbb:: algorithms and constructors abide by the same philosophy when it comes to defining ranges: indicate the first element to take and the first element not to take, as in for (auto it = v.begin(); it < v.end(); it++).
Therefore, here, your code for tbb::blocked_range should go up to pdTempCurr + sCartesianSize, not to pdTempCurr + sCartesianSize - 1.
It should become:
double dAverageTemp = tbb::parallel_reduce(
    tbb::blocked_range<double*>(pdTempCurr, pdTempCurr + sCartesianSize),
    0.0,
    [](const tbb::blocked_range<double*> &r, double value) -> double {
        return std::accumulate(r.begin(), r.end(), value);
    },
    std::plus<double>()
);
My (wild) guess is that pdTempCurr[sCartesianSize-1] is around 0.0003, which would account for the numerical difference experienced.
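As for the summation-order remark at the top: even with the range fixed, small differences between OpenMP and TBB reductions can be legitimate, since floating-point addition is not associative. A tiny self-contained illustration (not the asker's data):

#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
    std::vector<double> v{1e16, 1.0, 1.0, -1e16};
    // Sequential left-to-right: each 1.0 is absorbed by 1e16 and lost.
    double leftToRight = std::accumulate(v.begin(), v.end(), 0.0);
    // A different grouping, as a parallel reduction might produce.
    double regrouped = (v[0] + v[3]) + (v[1] + v[2]);
    std::printf("%g vs %g\n", leftToRight, regrouped);  // prints: 0 vs 2
}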
So I have the code below. It perfectly calculates all the y-points of the polynomial (and prints them for plotting with gnuplot), but how do I get the resulting polynomial (1-x² in this case)?
void twoDegreePoly() {
    int n = 3;
    double x[n], y[n];
    printf ("#m=0,S=16\n");
    for (int i = 0; i < n; i++) {
        x[i] = ((double)2*i)/2 - 1;
        y[i] = f(x[i]);
        printf ("%g %g\n", x[i], y[i]);
    }
    printf ("#m=1,S=0\n");
    gsl_interp_accel *acc = gsl_interp_accel_alloc ();
    const gsl_interp_type *t = gsl_interp_polynomial;
    gsl_interp *poly = gsl_interp_alloc (t, n);
    gsl_interp_init (poly, x, y, n);
    for (double xi = x[0]; xi < x[n-1]; xi += 0.01) {
        double yi = gsl_interp_eval (poly, x, y, xi, acc);
        printf ("%g %g\n", xi, yi);
    }
    gsl_interp_free (poly);        // release the interpolation objects
    gsl_interp_accel_free (acc);
}
After a quick scan of the documentation, it doesn't seem that such a feature is available in the GSL. This could have two reasons: first, getting polynomial coefficients is special to this interpolation method and doesn't fit well into the general design (which can handle arbitrary functions). Second, citing Numerical Recipes:
Please be certain, however, that the coefficients are what you need. Generally, the coefficients of the interpolating polynomial can be determined much less accurately than its value at a desired abscissa. Therefore, it is not a good idea to determine the coefficients only for use in calculating interpolating values. Values thus calculated will not pass exactly through the tabulated points, for example, ...
The reason for this is that in principle, calculating the coefficients involves solving a linear system with a Vandermonde matrix, which is highly ill-conditioned.
Still, Numerical Recipes gives a routine polcoe by which you can obtain the coefficients of the interpolating polynomial. You can find it in chapter 3.5 of the free second edition.
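If the ill-conditioning caveat above is acceptable (it is fine for tiny n like here), a minimal sketch of the Vandermonde route using GSL's LU solver could look like this; note that interpCoefficients is a made-up name, not a GSL routine:

#include <gsl/gsl_linalg.h>

/* Solves the Vandermonde system V c = y for the monomial coefficients
   c[0] + c[1]*x + ... + c[n-1]*x^(n-1) of the interpolating polynomial. */
void interpCoefficients(const double *x, const double *y, int n, double *coeff)
{
    gsl_matrix *V = gsl_matrix_alloc(n, n);
    gsl_vector *b = gsl_vector_alloc(n);
    gsl_vector *c = gsl_vector_alloc(n);
    gsl_permutation *p = gsl_permutation_alloc(n);
    int signum;

    for (int i = 0; i < n; i++) {
        double power = 1.0;
        for (int j = 0; j < n; j++) {
            gsl_matrix_set(V, i, j, power);   /* V(i,j) = x[i]^j */
            power *= x[i];
        }
        gsl_vector_set(b, i, y[i]);
    }

    gsl_linalg_LU_decomp(V, p, &signum);
    gsl_linalg_LU_solve(V, p, b, c);

    for (int j = 0; j < n; j++)
        coeff[j] = gsl_vector_get(c, j);

    gsl_matrix_free(V);
    gsl_vector_free(b);
    gsl_vector_free(c);
    gsl_permutation_free(p);
}

For the three points in the question this yields coeff = {1, 0, -1}, i.e. the expected 1 - x².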
I have done something similar with Akima's interpolation.
First, define the state as GSL does:
typedef struct
{
    double *b;
    double *c;
    double *d;
    double *_m;
} akima_state_t;
Then, create the interpolant
spline = gsl_spline_alloc (gsl_interp_akima, M_size);
gsl_spline_init (spline, x, y, M_size);
and after that, you can do:
const akima_state_t *state = (const akima_state_t *) (spline->interp->state);
double _b, _c, _d;
for (int i = 0; i < M_size - 1; i++)   // M_size points give M_size-1 segments
{
    _b = state->b[i];
    _c = state->c[i];
    _d = state->d[i];
    std::cout << "(x>" << x[i] << ")*(x<" << x[i+1] << ")*(" << y[i]
              << "+ (x-" << x[i] << ")*(" << _b << "+(x-" << x[i]
              << ")*(" << _c << "+" << _d << "*(x-" << x[i] << ")))) + ";
}
I have not tried it with the polynomial interpolation, but here is the state struct for the polynomial type; it should be a good starting point:
typedef struct
{
    double *d;
    double *coeff;
    double *work;
} polynomial_state_t;
I have been given this equation to calculate the total energy of a signal:
E_x = ∑_n |x[n]|²
Which to me suggests that you square each element and then take the sum over the whole block. I am wondering whether the code/algorithm I have written is accurate for this equation and whether I have done it in the most efficient way.
double totalEnergy(vector<double> data, const int rows, const int cols)
{
    vector<double> temp;
    double energy = 0;
    for (int i = 0; i < 2; i++)
    {
        for (int j = 0; j < 2; j++)
        {
            temp.push_back(data[i*2+j] * data[i*2+j]);
        }
    }
    energy = accumulate(temp.begin(), temp.begin() + (rows*cols), 0);
    return energy;
}

int main(int argc, char *argv[]) {
    vector<double> data;
    data.push_back(4);
    data.push_back(4);
    data.push_back(4);
    data.push_back(4);
    totalEnergy(data, 2, 2);
}
Result: 64
Any help / advice would be greatly appreciated :)!
This is certainly not the most efficient way to do this computation, although I think the implementation is nearly correct. (If the n in the formula is meant as a factor inside the sum rather than the summation index, that multiplication got lost; since I can't see the sum index and bounds, I'm not going to change this, and I'll just reproduce the results of the implementation, only "better".) There are two obvious points which can be improved:
The code doesn't make any use of being in a particular column or row. That is, it could as well just consider the input as a flat array whose size is actually known from the size of the input vector.
The function uses two temporary vectors (the one passed to the function and one inside the function). Creating a std::vector<T> needs to allocate memory, which isn't a cheap operation.
As a first approximation, I would transform the input vector in-place and then accumulate the result:
double square(double value) {
    return value * value;
}

double totalEnergy(std::vector<double> data) {
    std::transform(data.begin(), data.end(), data.begin(), &square);
    // Note the 0.0: an int 0 would make accumulate truncate every addition.
    return std::accumulate(data.begin(), data.end(), 0.0);
}
The function still makes a copy of the data and modifies it. I don't like this. Oddly enough, the operation you implemented is basically an inner product of a vector with itself, i.e., this yields the same result without creating an extra vector either:
double totalEnergy(std::vector<double> const& data) {
    // Again 0.0, not 0, so the accumulation happens in double.
    return std::inner_product(data.begin(), data.end(), data.begin(), 0.0);
}
Assuming this implements the correct formula (although I'm still suspicious of the n in the original formula), this is probably considerably faster. It seems to be more concise, too...
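For a quick sanity check of the inner_product version against the example from the question:

#include <iostream>
#include <numeric>
#include <vector>

double totalEnergy(std::vector<double> const& data) {
    return std::inner_product(data.begin(), data.end(), data.begin(), 0.0);
}

int main() {
    std::vector<double> data{4, 4, 4, 4};
    std::cout << totalEnergy(data) << '\n';  // prints 64, matching the original result
}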