I would like to convert my for loop to STL std::for_each loop.
bool CMyclass::SomeMember()
{
int ii;
for(int i=0;i<iR20;i++)
{
ii=indexR[i];
ishell=static_cast<int>(R[ii]/xStep);
theta=atan2(data->pPOS[ii*3+1], data->pPOS[ii*3]);
al2[ishell] += massp*cos(fm*theta);
}
}
Actually, I was planning to use the parallel mode of the STL from g++ 4.4:
g++ -D_GLIBCXX_PARALLEL -fopenmp
which allows code to run in parallel without changes, provided it is written with standard STL algorithms.
You need to separate out the loop body into a separate function or functor; I've assumed all the undeclared variables are member variables.
void CMyclass::LoopFunc(int ii) {
ishell=static_cast<int>(R[ii]/xStep);
theta=atan2(data->pPOS[ii*3+1],
data->pPOS[ii*3]);
al2[ishell] += massp*cos(fm*theta);
}
bool CMyclass::SomeMember() {
std::for_each(&indexR[0], &indexR[iR20], std::tr1::bind(&CMyclass::LoopFunc, std::tr1::ref(*this), std::tr1::placeholders::_1));
}
class F {
public:
F(int* r): // the other parameters should also be passed into the constructor
r_(r) {}
void operator()(int ii) {
// ishell, theta, xStep, data, al2, massp and fm must likewise be stored as members
ishell=static_cast<int>(r_[ii]/xStep);
theta=atan2(data->pPOS[ii*3+1], data->pPOS[ii*3]);
al2[ishell] += massp*cos(fm*theta);
}
private:
int* r_; // points to the R array
// the other parameters should also be stored here
};
F f(R); // pass other parameters too
for_each(&indexR[0], &indexR[iR20], f);
However, it might not be a good idea to rely on this "automatic parallelization", since you need to keep in mind the grain size of each parallel computation; I am not sure how well the compiler takes the grain size into account.
You cannot just move the loop body into a functor and assume that it will be parallelised, because the loop body has too many dependencies.
The loop can run in parallel only if its iterations do not write to shared arrays or pointers (here, several iterations can update the same al2[ishell]). If you provide the full function body, then we can think about how to change it to a parallel version.
You'll need to convert the loop body into a function or functor. There are a lot of undeclared variables in there, so I can't easily tell how to separate out the loop body. Here's a stab at it:
class DoStuff
{
int* R;
int xStep;
Data* data;
double massp;
double fm;
double* al2;
public:
DoStuff(int* R_, int xStep_, Data* data_, double massp_, double fm_, double* al2_) :
R(R_), xStep(xStep_), data(data_), massp(massp_), fm(fm_), al2(al2_) {}
void operator()(int ii)
{
int ishell = static_cast<int>(R[ii]/xStep);
double theta = atan2(data->pPOS[ii*3+1], data->pPOS[ii*3]);
al2[ishell] += massp*cos(fm*theta);
}
};
for_each(indexR, indexR+iR20, DoStuff(R, xStep, data, massp, fm, al2));
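If C++11 is available, a lambda can capture the surrounding variables directly, which avoids writing the functor class by hand. The following is only a minimal sketch, not the original code: accumulate_shells and its parameters (pos as flat x,y,z triples, nShells) are hypothetical stand-ins for the question's member variables. It runs sequentially; the shared write to al2 would still need attention before running it in parallel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical free-function version of the loop from the question.
// All parameter names are assumptions that mirror the question's variables.
std::vector<double> accumulate_shells(const std::vector<int>& indexR,
                                      const std::vector<double>& R,
                                      const std::vector<double>& pos, // x,y,z per particle
                                      double xStep, double massp, double fm,
                                      std::size_t nShells)
{
    std::vector<double> al2(nShells, 0.0);
    // The lambda captures everything by reference, so no constructor
    // boilerplate is needed to hand the state to the loop body.
    std::for_each(indexR.begin(), indexR.end(), [&](int ii) {
        const int ishell = static_cast<int>(R[ii] / xStep);
        const double theta = std::atan2(pos[ii * 3 + 1], pos[ii * 3]);
        al2[ishell] += massp * std::cos(fm * theta);
    });
    return al2;
}
```

The caller is assumed to pass an nShells large enough that every ishell is a valid index.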
I'm using ROOT Cern to solve a multi-variable non-linear system of equations. For some problems I have 4 functions and 4 variables. However, for others I need 20 functions with 20 variables. I'm using a class called "WrappedParamFunction" to wrap the functions and then I add the wrapped functions to the "GSLMultiRootFinder" to solve them. The function is wrapped this way:
ROOT::Math::WrappedParamFunction<> g0(&f0, "number of variables", "number of parameters");
Therefore, I need to declare the f0...fi functions before my void main(){} part of the code. I'm declaring the functions in this way:
double f0(const double *x, const double *par){return -par[0]+x[0]*par[1];}
double f1(const double *x, const double *par){return -par[1]+x[1]*par[2];}
.
.
Is there a way to create those functions inside a loop and stack them in an array? Something like this:
double (*f[20])(const double *x, const double *par);
for(int i=0;i<20;i++){
f[i]= -par[i]+x[i]*par[i+1];
}
So that later I can wrap the functions in this way:
ROOT::Math::WrappedParamFunction<> g0(f[0], "number of variables", "number of parameters");
f[i]= -par[i]+x[i]*par[i+1];
You can't generate code at runtime, so you can't do exactly what you're asking for.
You can however save the value of i for use at runtime, so you have a single callable object with a hidden parameter i not passed explicitly by the caller. The simplest example is
auto f = [i](const double *x, const double *par)
{
return -par[i]+x[i]*par[i+1];
};
but this gives a unique type to the lambda f, so you can't easily have an array of them.
You can however write
using Callable = std::function<double(const double *, const double *)>;
std::array<Callable, 20> f;
and store the lambdas in that.
I think you'll need to use ROOT::Math::WrappedParamFunction<Callable> for this to work, though, since the FuncPtr parameter type is not erased.
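As a rough sketch of that idea, the array can be filled in an ordinary loop, with each lambda capturing its own copy of i. The helper name make_functions is hypothetical, and the ROOT wrapping step is deliberately left out because it depends on the wrapper's template parameter.

```cpp
#include <array>
#include <cstddef>
#include <functional>

using Callable = std::function<double(const double*, const double*)>;

// Hypothetical helper: builds the 20 callables f[i](x, par) = -par[i] + x[i]*par[i+1].
std::array<Callable, 20> make_functions()
{
    std::array<Callable, 20> f;
    for (std::size_t i = 0; i < f.size(); ++i) {
        // i is captured by value, so each stored callable keeps its own index.
        f[i] = [i](const double* x, const double* par) {
            return -par[i] + x[i] * par[i + 1];
        };
    }
    return f;
}
```

Note that f[19] reads par[20], so the par array passed in must hold at least 21 values.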
If you really can't change the WrappedParamFunction type parameter for whatever reason, you can generate a free function instead of a stateful lambda using templates - but it's pretty ugly.
Edit - I was considering writing that version out too, but fabian beat me to it. Do note that you have to either write out all that machinery for each distinct function that needs this treatment, wrap it in a macro, or generalize everything to take a function template parameter as well.
There are almost certainly better ways of accomplishing this, but this probably gets you closest to the desired result described in the question:
Create a function template with the offset as template parameter and then create an std::array of function pointers with function pointers pointing to specializations of a template function. Note that the size of the array must be a compile time constant for this to work:
template<size_t Offset>
double f(const double* y, const double* par)
{
return -par[Offset] + y[Offset] * par[Offset+1];
}
template<size_t ... Offsets>
std::array<double(*)(double const*, double const*), sizeof...(Offsets)> CreateFsHelper(std::index_sequence<Offsets...>)
{
return { &f<Offsets>... };
}
template<size_t N>
std::array<double(*)(double const*, double const*), N> CreateFs()
{
return CreateFsHelper(std::make_index_sequence<N>{});
}
int main()
{
auto functions = CreateFs<20>();
}
Making your i a template parameter and generating the functions recursively at compile time can also do the trick:
using FunctionPrototype = double(*)(const double *, const double *);
template<int i>
double func(const double * x, const double * par) {
return -par[i]+x[i]*par[i+1];
}
template<int i>
void generate_rec(FunctionPrototype f[]) {
f[i-1] = &func<i-1>;
generate_rec<i-1>(f);
}
template<>
void generate_rec<0>(FunctionPrototype f[]) { }
template<int i>
FunctionPrototype* generate_functions()
{
FunctionPrototype * f = new FunctionPrototype[i]();
generate_rec<i>(f);
return f;
}
FunctionPrototype * myFuncs = generate_functions<3>(); // myFuncs is now an array of 3 functions
"Is there a way to create an array of functions inside a loop in C or C++"
Sure, you can create a std::array or std::vector of std::function.
You can also create a container of function pointers if you so desire.
In the following example, I would like a traverse method that receives a callback. This example works perfectly as long as I don't capture anything ([]), because the lambda can then be converted to a function pointer. However, in this particular case, I would like to access sum.
struct Collection {
int array[10];
void traverse(void (*cb)(int &n)) {
for(int &i : array)
cb(i);
}
int sum() {
int sum = 0;
traverse([&](int &i) {
sum += i;
});
return sum;
}
};
What is the proper way (without using any templates) to solve this? A solution is to use a typename template as follows. But in this case, you lack visibility on what traverse gives in each iteration (an int):
template <typename F>
void traverse(F cb) {
for(int &i : array)
cb(i);
}
Lambda types are unspecified; there is no way to name them.
So you have two options:
Make traverse a template (or have it take auto, which is effectively the same thing)
Fortunately this is a completely normal and commonplace thing to do.
Have traverse take a std::function<void(int&)>. This incurs some overhead, but does at least mean the function need not be a template.
But in this case, you lack visibility on what traverse gives in each iteration (an int)
We don't tend to consider that a problem. I do understand that giving this in the function's type is more satisfying and clear, but generally a comment is sufficient, because if the callback doesn't provide an int, you'll get a compilation error anyway.
Only captureless lambdas can be used with function pointers. As every lambda definition has its own type, you have to use a template parameter wherever you accept lambdas that capture.
But in this case, you lack visibility on what traverse gives in each iteration (an int).
This can be checked easily by using SFINAE or, even more simply, by using concepts in C++20. To simplify it one step further, you do not even need to define a named concept and use it later; you can write an ad-hoc requirement directly, as shown here (this results in the double use of the requires keyword):
struct Collection {
int array[10];
template <typename F>
// check if F is "something" which can be called with an `int&` and returns void.
requires requires ( F f, int& i) { {f(i)} -> std::same_as<void>; }
void traverse(F cb)
{
for(int &i : array)
cb(i);
}
// alternatively you can use `std::invocable` from <concepts>
// check if F is "something" which can be called with an `int&`, no return type check
template <std::invocable<int&> F>
void traverse2(F cb)
{
for(int &i : array)
cb(i);
}
int sum() {
int sum = 0;
traverse([&](int &i) {
sum += i;
});
return sum;
}
};
In your case you have several ways of declaring a callback in C++:
Function pointer
void traverse(void (*cb)(int &n)) {
for(int &i : array)
cb(i);
}
This solution only supports types that can decay into a function pointer. As you mentioned, lambdas with captures would not make it.
Typename template
template <typename F>
void traverse(F cb) {
for(int &i : array)
cb(i);
}
It accepts anything, but as you noticed, the code is hard to read.
Standard Functions (C++11)
void traverse(std::function<void(int &num)> cb) {
for(int &i : array)
cb(i);
}
This is the most versatile solution, with a slight overhead cost.
Don't forget to include <functional>.
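Putting the std::function option together with the sum() method from the question gives something like the sketch below. It only illustrates the third option: the local variable is renamed to total here (an arbitrary choice) so that it does not shadow the method name.

```cpp
#include <functional>

struct Collection {
    int array[10];

    // The callback type is spelled out in the signature: it receives an int&.
    void traverse(const std::function<void(int&)>& cb) {
        for (int& i : array)
            cb(i);
    }

    int sum() {
        int total = 0;
        // A capturing lambda is fine here because std::function erases its type.
        traverse([&](int& i) { total += i; });
        return total;
    }
};
```

Taking the std::function by const reference avoids copying the wrapper on each call; passing it by value as in the snippet above works as well.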
When trying to use a conditional copy_if algorithm to copy only the values that are lower than mean of values in a vector into another vector, I hit a snag with my function object:
struct Lower_than_mean
{
private:
double mean;
vector<double>d1;
public:
Lower_than_mean(vector<double>a)
:d1{a}
{
double sum = accumulate(d1.begin(), d1.end(), 0.0);
mean = sum / (d1.size());
}
bool operator()(double& x)
{
return x < mean;
}
};
int main()
{
vector<double>vd{ 3.4,5.6, 7, 3,4,5.6,9,2 };
vector<double>vd2(vd.size());
copy_if(vd.begin(), vd.end(), vd2, Lower_than_mean(vd));
}
What is the right way of going about this?
You used vd instead of vd.begin() in the call to std::copy_if.
But, seriously, you didn't bother to even read your compiler output...
Also, as @zch suggests, your approach doesn't make sense: don't keep recalculating the mean again and again. Instead, calculate it once, and then your function becomes as simple as a [mean](double x) { return x < mean; } lambda.
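A minimal sketch of that suggestion follows. The function name lower_than_mean is made up for the example, and std::back_inserter is used instead of a pre-sized destination so that the result holds only the copied values.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the elements of v that are below the mean of v.
std::vector<double> lower_than_mean(const std::vector<double>& v)
{
    std::vector<double> out;
    if (v.empty())
        return out; // avoid dividing by zero

    // Compute the mean once, then capture it by value in the predicate.
    const double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    std::copy_if(v.begin(), v.end(), std::back_inserter(out),
                 [mean](double x) { return x < mean; });
    return out;
}
```

With the values from the question, the mean is 4.95, so the result contains 3.4, 3, 4 and 2.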
I would like to eliminate the code duplication in this problem:
class PopulationMember
{
public:
vector<int> x_;
vector<int> y_;
};
class Population
{
vector<PopulationMember*> members_;
void doComputationforX_1(); // uses the attribute x_ of all members_
void doComputationforX_2();
void doComputationforX_3();
void doComputationforY_1(); // exactly same as doComputationforX_1, but
void doComputationforY_2(); // uses the attribute y_ of all members_
void doComputationforY_3();
// EDIT: there are also functions that use all the members_ simultaneously
double standardDeviationInX(); // computes the standard deviation of all the x_'s
double standardDeviationInY(); // computes the standard deviation of all the y_'s
};
The duplication is causing me to have 6 methods instead of 3. The pairwise similarity is so striking that I can get the implementation of doComputationforY_1 out of doComputationforX_1 by simply replacing "x_" with "y_".
I thought about remaking the problem in this way:
class PopulationMember
{
public:
vector<vector<int>> data_; // data[0] == x_ and data[1] == y_
};
But it becomes less clear this way.
I know that a preprocessor macro is a bad solution in general, but I do not see any other. My subconscious keeps suggesting templates, but I just do not see how I can use them.
If you want to keep x_ and y_ as separate members of the same class PopulationMember, then it's better to pass a pointer to data member as an ordinary argument rather than use a template:
Define the generic method as:
void doComputationfor (vector<int> (PopulationMember::*member_));
// pointer to data ^^^^^^^^^^^^^^^^^^^^^^^^^^
Call it as:
doComputationfor(&PopulationMember::x_);
doComputationfor(&PopulationMember::y_);
Remember that if your doComputationfor is large, making it a template would duplicate the generated code for each instantiation.
With the pointer-to-member approach, you avoid that duplication at the cost of a small runtime penalty.
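To make the pointer-to-member idea concrete, here is a small sketch. The computation itself (sumOf) is hypothetical, since the question does not show what the doComputation methods do, and members_ holds values rather than pointers to keep the example short.

```cpp
#include <vector>

struct PopulationMember {
    std::vector<int> x_;
    std::vector<int> y_;
};

class Population {
public:
    std::vector<PopulationMember> members_;

    // Hypothetical computation: sums the chosen attribute over all members.
    // attr selects x_ or y_ at run time, so one body serves both cases.
    long sumOf(std::vector<int> PopulationMember::*attr) const {
        long total = 0;
        for (const PopulationMember& m : members_)
            for (int v : m.*attr)
                total += v;
        return total;
    }
};
```

A call then looks like population.sumOf(&PopulationMember::x_) or population.sumOf(&PopulationMember::y_).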
If the API you have specified is exactly what you want users of the class to see, then just make private methods in Population called doComputation_1( const vector<int> &v ) { do stuff on v; }
And then make the public implementations 1 line long:
public:
void DoComputationX_1() { doComputation_1( x_ ); }
void DoComputationY_1() { doComputation_1( y_ ); }
private:
// drop the 'const' if you will need to modify the vector
void doComputation_1( const vector<int> &v ) { do stuff on v; }
I don't feel like this is the right solution, but I can't piece together what your class is really trying to do in order to offer up anything more meaningful.
Suppose you have a function that you call many times, and every call returns a big object. I've optimized this using a functor that returns void and stores the result in a public member:
#include <vector>
const int N = 100;
std::vector<double> fun(const std::vector<double> & v, const int n)
{
std::vector<double> output = v;
output[n] *= output[n];
return output;
}
class F
{
public:
F() : output(N) {};
std::vector<double> output;
void operator()(const std::vector<double> & v, const int n)
{
output = v;
output[n] *= n;
}
};
int main()
{
std::vector<double> start(N,10.);
std::vector<double> end(N);
double a;
// first solution
for (unsigned long int i = 0; i != 10000000; ++i)
a = fun(start, 2)[3];
// second solution
F f;
for (unsigned long int i = 0; i != 10000000; ++i)
{
f(start, 2);
a = f.output[3];
}
}
Yes, I could use inline or optimize this in some other way, but here I want to stress one point: with the functor, I declare and construct the output variable only once, whereas with the function I do that every time it is called. The second solution is twice as fast as the first with g++ -O1 or g++ -O2. What do you think about it? Is it an ugly optimization?
Edit:
To clarify my aim: I have to evaluate the function more than 10M times, but I need the output only a few times, at random. It's important that the input is not changed, which is why I declared it as a const reference. In this example the input is always the same, but in the real world the input changes and is a function of the previous output of the function.
A more common approach is to create the object outside the function, with a large enough size reserved, and pass it to the function by pointer or by reference. You can then reuse this object across several calls to your function, which reduces repeated memory allocation.
In both cases you are allocating new vector many many times.
What you should do is to pass both input and output objects to your class/function:
void fun(const std::vector<double> & in, const int n, std::vector<double> & out)
{
out[n] *= in[n];
}
This way you separate your logic from the algorithm. You create the std::vector once and pass it to the function as many times as you want. Notice that no unnecessary copy or allocation is made.
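If the output should still start as a copy of the input, as in the question's fun, the out-parameter version might look like the sketch below. This is an assumption about the intended behaviour, not the answer's exact code; the name fun_into is made up to keep it apart from the original fun. On common standard library implementations, assigning to a vector that already has enough capacity reuses its buffer, so repeated calls typically avoid fresh allocations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical variant of fun: writes the result into a caller-owned vector.
void fun_into(const std::vector<double>& in, std::size_t n, std::vector<double>& out)
{
    out = in;         // copies the elements; the input itself is left untouched
    out[n] *= out[n]; // same operation as the question's fun
}
```

The caller declares the output vector once, outside the loop, and passes it to every call.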
p.s. it's been awhile since I did c++. It may not compile right away.
It's not an ugly optimization. It's actually a fairly decent one.
I would, however, hide output and make an operator[] member to access its members. Why? Because you just might be able to perform a lazy evaluation optimization by moving all the math to that function, thus only doing that math when the client requests that value. Until the user asks for it, why do it if you don't need to?
Edit:
Just checked the standard. Behavior of the assignment operator is based on insert(). Notes for that function state that an allocation occurs if new size exceeds current capacity. Of course this does not seem to explicitly disallow an implementation from reallocating even if otherwise...I'm pretty sure you'll find none that do and I'm sure the standard says something about it somewhere else. Thus you've improved speed by removing allocation calls.
You should still hide the internal vector. You'll have more chance to change implementation if you use encapsulation. You could also return a reference (maybe const) to the vector from the function and retain the original syntax.
I played with this a bit, and came up with the code below. I keep thinking there's a better way to do this, but it's escaping me for now.
The key differences:
I'm allergic to public member variables, so I made output private, and put getters around it.
Having the operator return void isn't necessary for the optimization, so I have it return the value as a const reference so we can preserve return value semantics.
I took a stab at generalizing the approach into a templated base class, so you can then define derived classes for a particular return type, and not re-define the plumbing. This assumes the object you want to create takes a one-arg constructor, and the function you want to call takes in one additional argument. I think you'd have to define other templates if this varies.
Enjoy...
#include <vector>
template<typename T, typename ConstructArg, typename FuncArg>
class ReturnT
{
public:
ReturnT(ConstructArg arg): output(arg){}
virtual ~ReturnT() {}
const T& operator()(const T& in, FuncArg arg)
{
output = in;
this->doOp(arg);
return this->getOutput();
}
const T& getOutput() const {return output;}
protected:
T& getOutput() {return output;}
private:
virtual void doOp(FuncArg arg) = 0;
T output;
};
class F : public ReturnT<std::vector<double>, std::size_t, const int>
{
public:
F(std::size_t size) : ReturnT<std::vector<double>, std::size_t, const int>(size) {}
private:
virtual void doOp(const int n)
{
this->getOutput()[n] *= n;
}
};
int main()
{
const int N = 100;
std::vector<double> start(N,10.);
double a;
// second solution
F f(N);
for (unsigned long int i = 0; i != 10000000; ++i)
{
a = f(start, 2)[3];
}
}
It seems quite strange (I mean the need for this optimization at all). I think that a decent compiler should perform return value optimization in such cases. Maybe all you need is to enable it.