Error: Expression must be a modifiable lvalue - c++

I have been getting this error come up in the for loop when I try to assign values to x_dev, y_dev, and pearson. As far as I can see they should all be modifiable. Can anyone see where I have gone wrong?
class LoopBody
{
    double *const x_data;
    double *const y_data;
    double const x_mean;
    double const y_mean;
    double x_dev;
    double y_dev;
    double pearson;

public:
    LoopBody(double *x, double *y, double xmean, double ymean, double xdev, double ydev, double pear)
        : x_data(x), y_data(y), x_mean(xmean), y_mean(ymean), x_dev(xdev), y_dev(ydev), pearson(pear) {}

    void operator() (const blocked_range<size_t> &r) const {
        for (size_t i = r.begin(); i != r.end(); i++)
        {
            double x_temp = x_data[i] - x_mean;
            double y_temp = y_data[i] - y_mean;
            x_dev += x_temp * x_temp;
            y_dev += y_temp * y_temp;
            pearson += x_temp * y_temp;
        }
    }
};
Having followed Bathsheba's advice I have overcome these problems. However, when running a parallel_for, operator() runs but the for loop is never entered.
This is where I call the parallel_for:
parallel_for(blocked_range<size_t>(0,n), LoopBody(x, y, x_mean, y_mean, x_dev, y_dev, pearson), auto_partitioner());

The () operator is marked const, and you're attempting to modify class member data (e.g. x_dev, y_dev and pearson). That is not allowed and is why you're getting the compile-time error.
You probably want to drop the const from the method.
Alternatively you can mark the member data that you want to modify as mutable, but this is not the preferred solution as it makes code brittle, difficult to read and can wreak havoc with multi-threading.

Seemingly you want to do reduction, i.e. compute some aggregate values over the data.
For that, TBB offers a special function template: parallel_reduce. Unlike parallel_for that perhaps you use now, parallel_reduce does not require operator() of a body class to be const, because an instance of that class accumulates partial results. However, it poses other requirements to the class: the need to have a special constructor as well as a method to merge partial results from another body instance.
More information can be found in the Intel(R) TBB User Guide: http://www.threadingbuildingblocks.org/docs/help/tbb_userguide/parallel_reduce.htm
Also there is an overload of parallel_reduce which takes two functors - one for the body and another for merging partial results - as well as a special "identity" value used to initialize accumulators. But you are computing three aggregate values at once, so you would still need a struct or class to store all three values in a single variable.
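As a concrete illustration, here is a minimal sketch of the body-class shape that parallel_reduce expects, written without the TBB headers so the split/join mechanics can be exercised by hand. In real TBB code the splitting constructor takes a tbb::split tag and operator() takes a blocked_range<size_t>; the member names follow the question's LoopBody.

```cpp
#include <cstddef>

// Hypothetical reduction body in the shape tbb::parallel_reduce expects:
// a splitting constructor plus a join() that merges partial results.
struct ReduceBody {
    const double *x_data, *y_data;
    double x_mean, y_mean;
    double x_dev = 0.0, y_dev = 0.0, pearson = 0.0; // per-body accumulators

    ReduceBody(const double *x, const double *y, double xm, double ym)
        : x_data(x), y_data(y), x_mean(xm), y_mean(ym) {}

    // Splitting constructor: TBB would call this as ReduceBody(other, tbb::split).
    // Accumulators start at zero; input pointers and means are shared.
    ReduceBody(const ReduceBody &other)
        : x_data(other.x_data), y_data(other.y_data),
          x_mean(other.x_mean), y_mean(other.y_mean) {}

    // In TBB this would take a blocked_range<size_t>; plain indices here.
    void operator()(std::size_t begin, std::size_t end) {   // note: non-const
        for (std::size_t i = begin; i != end; ++i) {
            double xt = x_data[i] - x_mean;
            double yt = y_data[i] - y_mean;
            x_dev   += xt * xt;
            y_dev   += yt * yt;
            pearson += xt * yt;
        }
    }

    // Merge partial results accumulated by another body instance.
    void join(const ReduceBody &rhs) {
        x_dev   += rhs.x_dev;
        y_dev   += rhs.y_dev;
        pearson += rhs.pearson;
    }
};
```

Running two half-ranges in two bodies and joining them should give the same totals as one body over the whole range, which is exactly the contract parallel_reduce relies on.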

Related

Returning values from a function : reference vs struct

I'm new to programming and I'm studying the C++ programming language using the book Programming: Principles and Practice Using C++. I'm here today because at the end of chapter 8 the author focuses on functions and proposes an exercise inviting the learner to think about the better solution to a problem:
Write a function that finds the smallest and the largest element of a vector argument and also computes the mean and the median. Do not use global variables. Either return a struct containing the results or pass them back through reference arguments. Which of the two ways of returning several values do you prefer and why?
Now, usually I wouldn't define a single function to perform several actions but in this case I have to create just one function and think about how to return several values. My first approach was to create a function that takes reference arguments like this :
void my_func(
    vector<double>& numbers,
    double& min,
    double& max,
    double& mean,
    double& median
);
But going on with writing the program I started to think that this solution used too many arguments and that maybe the other proposed solution (using a struct) would be better. How would you use a struct to solve this problem? How do you return several values from a function?
Using struct for this problem is simple: define a struct for the return type, and use it, like this:
struct stats {
    double min;
    double max;
    double mean;
    double median;
};

stats my_func(vector<double>& numbers) {
    stats res;
    ...
    res.min = ...
    res.max = ...
    res.mean = ...
    res.median = ...
    return res;
}
The tradeoff here is that in exchange for having a much simpler function signature, the users of your function need to extract the elements that they want one by one.
But what about if the struct is really complex and the cost of copying becomes too expensive?
It takes a structure of extreme size for the copying to become expensive in comparison to the "payload" CPU time of your function. On top of that, the C++ optimizer reduces the copying costs by employing copy elision and return value optimization when you do this:
stats s = my_func(numbers);
When the struct becomes so gigantic that you don't want to copy it, combine the two approaches like this:
void my_func(vector<double>& numbers, stats& res);
Declare the struct, i.e.

struct stats {
    double min;
    double max;
    double mean;
    double median;
};

Then

stats my_func(vector<double>& numbers);

The function will look like this:

stats my_func(vector<double>& numbers) {
    stats return_value;
    // Fill in the structure
    return return_value;
}
Well you just return a struct.
struct res {
    double min;
    double max;
    double mean;
    double median;
};

res my_func(vector<double>& numbers) {
    res result;
    // do stuff, put the results in result.min etc.
    return result;
}

The other can be done in a similar way:

void my_func(vector<double>& numbers, res& result)
{
    // the same code except no return value
    return;
}
Now for the question in your task (which I shall not answer, because it's a question of what you prefer) I'd like to mention the technical difference.
When returning a struct you will possibly create a temporary copy of the result struct (it may also result in two res objects: one for the caller and one for my_func to work with). The other approach means that you essentially pass the function the addresses where it should put the results. With a "good" implementation you may end up with the same code.
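For completeness, here is a minimal sketch of both conventions side by side. The stats2/my_func2 names are made up for illustration, and only min and mean are computed to keep it short; the struct-returning version is eligible for named return value optimization, so in practice no extra copy is made.

```cpp
#include <vector>
#include <algorithm>
#include <numeric>

// Illustrative result type (a trimmed-down version of the exercise's stats).
struct stats2 { double min; double mean; };

// Convention 1: return the struct by value.
stats2 my_func2(const std::vector<double>& numbers) {
    stats2 res;
    res.min  = *std::min_element(numbers.begin(), numbers.end());
    res.mean = std::accumulate(numbers.begin(), numbers.end(), 0.0)
               / numbers.size();
    return res;   // eligible for NRVO: usually no copy at all
}

// Convention 2: write the results through a reference parameter.
void my_func2(const std::vector<double>& numbers, stats2& out) {
    out = my_func2(numbers);   // same work, result delivered via the reference
}
```

Both call sites compute the same values; the difference is purely in how the result travels back to the caller.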

C++: Why accessing class data members is so slow compared to accessing global variables?

I'm implementing a computationally expensive program and in the last days I spent a lot of time getting familiar with object oriented design, design patterns and SOLID principles. I need to implement several metrics in my program so I designed a simple interface to get it done:
class Metric {
    typedef ... Vector;
    virtual ~Metric() {}
    virtual double distance(const Vector& a, const Vector& b) const = 0;
};
the first metric I implemented was the Minkowski metric,
class MinkowskiMetric : public Metric {
public:
    MinkowskiMetric(double p) : p(p) {}
    double distance(const Vector& a, const Vector& b) const {
        const double POW = this->p; /** hot spot */
        return std::pow((std::pow(std::abs(a - b), POW)).sum(), 1.0 / POW);
    }
private:
    const double p;
};
Using this implementation the code ran really slowly, so I tried a global variable instead of accessing the data member. My latest implementation isn't pretty, but it gets the job done and looks like this:
namespace parameters {
    const double p = 2.0; /** for instance */
}
And the hot spot line looks like:
...
const double POW = parameters::p; /** hot spot */
return ...
Just making that change, the code runs at least 275 times faster in my machine, using either gcc-4.8 or clang-3.4 with optimization flags in Ubuntu 14.04.1.
Is this problem a common pitfall?
Is there any way around it?
Am I just missing something?
The difference between the two version is that in one case, the compiler has to load p and perform some computation with it, while in the other, you're using a global constant, which the compiler can probably just substitute directly. So in one case, the resulting code probably does this:
Load p.
Call abs(a - b), name the result c
Call pow(c, p), name the result d
Call d.sum() (whatever that means), name the result e
Calculate 1.0 / p, name the result i
Call pow(e, i).
That's a bunch of library calls, and library calls are slow. Also, pow is slow.
When you use the global constant, the compiler can do some calculations by itself.
Call abs(a - b), name the result c.
pow(c, 2.0) is more efficiently calculated as c * c, name the result d
Call d.sum(), name the result e
1.0 / 2.0 is 0.5, and pow(e, 0.5) can be translated to the more efficient sqrt(e).
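The rewrites described above can be checked directly; this small sketch just restates them as functions so the equivalences are visible (the function names are illustrative, not part of the question's code):

```cpp
#include <cmath>

// What the generic code does: a full library call with a runtime exponent.
double pow_generic(double c, double p)  { return std::pow(c, p); }

// What the optimizer can emit once it knows p == 2.0.
double pow_squared(double c)            { return c * c; }

// Generic root: compute 1.0/p at runtime, then call pow.
double root_generic(double e, double p) { return std::pow(e, 1.0 / p); }

// With p == 2.0 known at compile time, pow(e, 0.5) becomes sqrt(e).
double root_sqrt(double e)              { return std::sqrt(e); }
```

The specialized forms compute the same values while avoiding the general-purpose pow call, which is the source of the speedup the answer describes.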
Let's have a look at what is going on here:
...
Metric *metric = new MinkowskiMetric(2.0);
metric->distance(a, b);
Since distance is a virtual function the runtime has to look up the address of the metric pointer to load in the virtual function table pointer and then use that to look up the address of the distance function for your object.
This is probably incidental to what is happening next:
double distance(const Vector& a, const Vector& b) const {
    const double POW = this->p; /** hot spot */
The function has to then look up the address of the this pointer (which happens to be explicitly stated here) in order to know from which location to load in the value of p. Compare that to the version which uses a global variable:
double distance(const Vector& a, const Vector& b) const {
    const double POW = parameters::p; /** hot spot */
    ...

namespace parameters {
    const double p = 2.0; /** for instance */
}
This version of p is always going to live at the same address and therefore loading in its value is only ever going to be a single operation and removes a level of indirection which is almost certainly causing a cache miss and causing the CPU to block waiting for data to be loaded from RAM.
So how can you avoid this? Try to allocate objects on the stack as much as possible. This enables a locality of reference known as spatial locality which means that your data is much more likely to be living in the CPU's cache when it needs to load it in. You can see Herb Sutter discussing this issue in the middle of this talk.
If you want to use OOP in code that should be somewhat performant you'll still have to minimise the amount of memory accesses. This means a change in design. Taking your example (assuming you're evaluating the metric a few times):
double MinkowskiMetric::distance(const Vector& a, const Vector& b) const {
    const double POW = this->p; /** hot spot */
    return std::pow((std::pow(std::abs(a - b), POW)).sum(), 1.0 / POW);
}

can be turned into

template<class VectorIter, class OutIter>
void MinkowskiMetric::distance(VectorIter aBegin, VectorIter aEnd, VectorIter bBegin, OutIter rBegin) const {
    const double pow = this->p, powInv = 1.0 / pow;
    while (aBegin != aEnd) {
        Vector a = *aBegin++;
        Vector b = *bBegin++;
        *rBegin++ = std::pow((std::pow(std::abs(a - b), pow)).sum(), powInv);
    }
}
Now you'll access the location of the virtual function and the members of this exactly once for a set of Vector pairs - adjust your algorithm accordingly to make use of this optimisation.

multithreading and dynamic arrays/matrix-op's

I am currently writing a physical simulation (t.b.m.p. solving a stochastic differential equation) and I need to parallelize it.
Now this could be achieved with MPI and I think I will have to do it some time in the future, but currently I want to utilize all 8 cores of my local machine for it. A normal run takes from 2 - 17 hours for one parameter set. Therefore I thought to utilize multithreading, specifically the following function should be executed in parallel. This function essentially solves the same SDE Nrep times for Nsteps timesteps. The results are averaged and stored for each thread into a separate row of an Nthreads x Nsteps array JpmArr.
double **JpmArr;

void worker(const dtype gamma, const dtype dt, const uint seed, const uint Nsteps, const uint Nrep,
            const ESpMatD& Jplus, const ESpMatD& Jminus, const ESpMatD& Jz, const uint tId)
{
    dtype dW(0), stdDev( sqrt(dt) );
    std::normal_distribution<> WienerDistr(0, stdDev);

    //create the arrays for the values of <t|J+J-|t>
    dtype* JpmExpect = JpmArr[tId];

    //execute Nrep repetitions of the experiment
    for (uint r(0); r < Nrep; ++r) {
        //reinitialize the wave function
        psiVecArr[tId] = globalIstate;
        //<t|J+J-|t>
        tmpVecArr[tId] = Jminus * psiVecArr[tId];
        JpmExpect[0] += tmpVecArr[tId].squaredNorm();
        //iterate over the timesteps
        for (uint s(1); s < Nsteps; ++s) {
            //get a random number
            dW = WienerDistr(RNGarr[tId]);
            //execute one step of the RK strong-order-1 scheme
            tmpPsiVecArr[tId] = F2(gamma, std::ref(Jminus), std::ref(psiVecArr[tId]));
            tmpVecArr[tId] = psiVecArr[tId] + tmpPsiVecArr[tId] * sqrt(dt);
            psiVecArr[tId] = psiVecArr[tId]
                + F1(gamma, std::ref(Jminus), std::ref(Jplus), std::ref(psiVecArr[tId])) * dt
                + tmpPsiVecArr[tId] * dW
                + 0.5 * (F2(gamma, std::ref(Jminus), std::ref(tmpVecArr[tId])) - F2(gamma, std::ref(Jminus), std::ref(psiVecArr[tId]))) * (dW * dW - dt) / sqrt(dt);
            //normalise
            psiVecArr[tId].normalize();
            //compute <t|J+J-|t>
            tmpVecArr[tId] = Jminus * psiVecArr[tId];
            JpmExpect[s] += tmpVecArr[tId].squaredNorm();
        }
    }

    //average over the repetitions
    for (uint j(0); j < Nsteps; ++j) {
        JpmExpect[j] /= Nrep;
    }
}
I am using Eigen as a library for linear algebra, thus:
typedef Eigen::SparseMatrix<dtype, Eigen::RowMajor> ESpMatD;
typedef Eigen::Matrix<dtype, Eigen::Dynamic, Eigen::RowMajor> VectorXdrm;
are used as types. The above worker function calls:
VectorXdrm& F1(const dtype a, const ESpMatD& A, const ESpMatD& B, const VectorXdrm& v) {
    z.setZero(v.size());
    y.setZero(v.size());
    // z is for simplification
    z = A * v;
    //scalar intermediate value c = <v, Av>
    dtype c = v.dot(z);
    y = a * (2.0 * c * z - B * z - c * c * v);
    return y;
}

VectorXdrm& F2(const dtype a, const ESpMatD& A, const VectorXdrm& v) {
    //zero the data
    z.setZero(v.size());
    y.setZero(v.size());
    z = A * v;
    dtype c = v.dot(z);
    y = sqrt(2.0 * a) * (z - c * v);
    return y;
}
where the vectors z,y are of type VectorXdrm and are declared in the same file (module-global).
All the arrays (RNGarr, JpmArr, tmpPsiVecArr, tmpVecArr, psiVecArr) are initialized in main (by use of extern declaration in main.cpp). After that setup is done I run the function using std::async, wait for all to finish and then collect the data from JpmArr into a single array in main() and write it to file.
Problem:
The results are nonsense if I use std::launch::async.
If I use std::launch::deferred, the computed and averaged results match (as far as the numerical method permits) the results I obtain by analytic means.
I have no idea anymore where stuff fails. I used to use Armadillo for linear algebra, but its normalize routine delivered NaNs, so I switched to Eigen, which hints (in the documentation) at being usable with multiple threads - it still fails.
Having not worked with threads before, I have spent 4 days trying to get this working and reading up on things. The latter led me to use the global arrays RNGarr, JpmArr, tmpPsiVecArr, tmpVecArr, psiVecArr (before that I just tried to create the appropriate arrays in worker and pass the results back to main by means of a struct workerResult), as well as to use std::ref() to pass the matrices Jplus, Jminus, Jz to the function (the last is omitted in the function above, for brevity).
But the results I get are still wrong, and I have no idea anymore what is wrong and what I should do to get the right results.
Any input and/or pointers to examples of solutions of such (threading) problems or references will be greatly appreciated.
There is clearly some interaction between the calculations in each thread - either because your global data is being updated by multiple threads, or because some of the structures passed by reference are mutating while running. z and y cannot be globals if they are updated by multiple threads - but there may be many other problems.
I would suggest you refactor the code as follows;
Make it object oriented - define a class which is self-contained. Take all the parameters which are given to worker and make them members of the class.
If you are not sure whether data structures are mutating, do not pass them by reference. If in doubt, assume the worst and make complete copies within the class.
In cases where threads do have to update shared structures (you should have none in your use case) you will need to protect reads and writes with mutexes for exclusive access.
Remove all globals - instead of having global JpmArr, z and y, define the data which is needed within the class.
Make worker, F1 and F2 member functions of your class.
Then in main, instantiate your newly created class as many times as you need and start each instance as a thread - making sure that you wait for the threads to complete before you read any of the data. Each thread will have its own stack and its own data within the class, and hence interference between parallel calculations becomes much less likely.
If you want to optimize further, you will need to consider each matrix calculation as a job, create a pool of threads matching the number of cores, and let each thread pick up jobs sequentially. That will reduce context-switching overhead and the CPU L1/L2 cache misses which occur when the number of threads becomes much larger than the number of cores - however, this is a lot more elaborate than what you need for your immediate problem.
Stop using globals. It's bad style anyway, and here multiple threads will be zeroing and mutating z and y at the same time.
The simplest fix is to replace your globals with local variables in the worker function - so each concurrent call to worker has its own copy - and pass them into F1 and F2 by reference.
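A minimal sketch of that fix, with plain doubles standing in for the Eigen vectors: the scratch value becomes a local inside each call instead of a module global, and each worker writes only to its own row of the result, so concurrent calls cannot interfere. The names worker_local and F2_local are hypothetical stand-ins, not the question's actual functions.

```cpp
#include <vector>
#include <future>

// Stand-in for F2: the scratch value z is now a local, one per call
// (previously a module-global shared by all threads).
double F2_local(double a, double v) {
    double z = a * v;   // per-call scratch, lives on this thread's stack
    return z + v;
}

// Each worker keeps its state local and writes only to its own row,
// so there is no shared mutable data between concurrent calls.
void worker_local(std::vector<std::vector<double>>& results,
                  unsigned tId, double a, unsigned nSteps) {
    double psi = 1.0;   // per-thread state, on this thread's stack
    for (unsigned s = 0; s < nSteps; ++s) {
        psi = F2_local(a, psi) * 0.5;
        results[tId][s] = psi;
    }
}
```

With this shape, launching one worker_local per thread via std::async (passing the shared results array with std::ref) is safe, because the only shared object is written at disjoint rows.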

Passing a QVector pointer as an argument

1) I want to pass a pointer to a QVector to a function and then do things with it. I tried this:
void MainWindow::createLinearVector(QVector<float> *vector, float min, float max )
{
    float elementDiff = (max-min)/(vector->size()-1);
    if (max > min) min -= elementDiff;
    else min += elementDiff;
    for (int i = 0; i < vector->size()+1; i++ )
    {
        min += elementDiff;
        *(vector+i) = min; //Problematic line
    }
}
However the compiler gives me "no match for operator =" for the *(vector+i) = min; line. What could be the best way to perform actions like this on a QVector?
2) The function is supposed to linearly distribute values over the vector for a plot, the way the MATLAB colon operator works, for instance vector(a:b:c). What is the simplest and best way to perform such things in Qt?
EDIT:
With help from here the initial problem is solved. :)
I also improved the method itself. The precision could be improved a lot by using linear interpolation instead of repeated addition as above. With repeated addition an error accumulates, which is in large part eliminated by linear interpolation.
Btw, the if statement in the first function was unnecessary; it could have been removed by rearranging things a little, even in the repeated-addition method.
void MainWindow::createLinearVector(QVector<double> &vector, double min, double max )
{
    double range = max - min;
    double n = vector.size();
    vector[0] = min;
    for (int i = 1; i < n; i++ )
    {
        vector[i] = min + i/(n-1)*range;
    }
}
I considered using some enhanced loop for this, but would it be more practical? With, for instance, a foreach loop I would still have to increment some variable for the interpolation, right? And also add a conditional to skip the first element?
I want to place a float a certain place in the QVector.
Then use this:
(*vector)[i] = min; //Problematic line
Here vector is a pointer to a QVector, so *vector is a QVector, which can be indexed with [i] like any QVector. However, due to precedence, one needs parentheses to get the order of operations right.
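The same precedence pitfall can be reproduced with a plain std::vector if you don't have Qt at hand; fill_at here is just an illustrative name:

```cpp
#include <vector>

// vector + i would advance the *pointer*; (*vec)[i] dereferences the
// pointer first and then indexes the container, which is what we want.
void fill_at(std::vector<float>* vec, int i, float value) {
    (*vec)[i] = value;      // parentheses first, then operator[]
    // vec->at(i) = value;  // equivalent, with bounds checking
}
```

QVector supports exactly the same (*vector)[i] and vector->operator[](i) forms, so the fix carries over unchanged.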
I think you first need to use a mutable iterator for this: Qt doc link
Something like this:
QMutableVectorIterator<float> i(vector);
i.toBack();
while (i.hasPrevious())
    qDebug() << i.{your code}
Right, so it does not make much sense to use a QVector pointer here. These are the reasons:
Using a reference for the method parameter should be more C++'ish if the implicit sharing is not fast enough for you.
Although in most cases you would not even need a reference when just passing arguments around without getting the result back in the same argument (i.e. an output argument). That is because QVector is implicitly shared and a copy only happens on write, as per the documentation. Luckily, the syntax will be the same for the caller and for the internal implementation of the method in both cases, so it is easy to change from one to the other.
Using smart pointers is preferable instead of raw pointers, but here both are unnecessarily complex solutions in my opinion.
So, I would suggest to refactor your code into this:
void MainWindow::createLinearVector(QVector<float> &vector, float min, float max)
{
    float elementDiff = (max-min) / (vector.size()-1);
    min += ((max>min) ? (-elementDiff) : elementDiff);
    for (float &f : vector) {
        min += elementDiff;
        f = min;
    }
}
Note that I fixed up the following things in your code:
Reference type parameter as opposed to pointer
"->" member resolution to "." respectively
Ternary operation instead of the unnatural if/else in this case
A range-based for loop by reference instead of low-level indexing (note that Qt's foreach iterates over a copy of the container, so writes through it would be lost), in which case your original point becomes moot
This is then how you would invoke the method from the caller:
createLinearVector(vector, fmin, fmax);

Passing multiple variables back from a single function?

I have an assignment (see below for the question) for a beginner's C++ class, where I am asked to pass two values back from a single function. I am pretty sure of my understanding of how to use functions and of the general structure of what the program should be, but I am having trouble figuring out how to pass two variables back to main from the function.
Assignment:
Write a program that simulates an airplane race. The program will display a table showing the speed in km/hour and distance in km traveled by two airplanes every second until one of them has gone 10 kilometers.
These are the requirements for the program:
-The program will use a function that has the following parameters: time and acceleration.
-The function will pass back two data items: speed and distance.
You have two options (well, three really, but I'm leaving pointers out).
Take references to output arguments and assign them within the function.
Return a data structure which contains all of the return values.
Which option is best depends on your program. If this is a one-off function that isn't called from many places then you may choose option #1. I assume by "speed" you mean the constant velocity which is reached after "time" of acceleration.
void calc_velocity_profile(double accel_time,
                           double acceleration,
                           double &out_velocity,  // these last two are
                           double &out_distance); // assigned in the function
If this is a more general purpose function and/or a function which will be called by many clients I would probably prefer option #2.
struct velocity_profile {
    double velocity;
    double distance;
};

velocity_profile calc_velocity_profile(double accel_time, double acceleration);
Everything else being equal, I prefer option #2. Given the choice, I like a function which returns a value instead of a function which mutates its input.
2017 Update: This is discussed in the C++ Core Guidelines:
F.21: To return multiple "out" values, prefer returning a tuple or struct
However, I would lean towards returning a struct over a tuple due to the named, order-independent access that is encapsulated and reusable as an explicit strong type.
In the special case of returning a bool and a T, where the T is only filled if the bool is true, consider returning a std::optional<T>. See this CppCon 2017 video for an extended discussion.
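A minimal sketch of that special case; find_first_positive is a made-up example function, not part of the assignment:

```cpp
#include <optional>
#include <vector>

// Instead of returning a bool plus an out-parameter, return an
// std::optional<double> (C++17): engaged when a value was found,
// disengaged when there was nothing to report.
std::optional<double> find_first_positive(const std::vector<double>& xs) {
    for (double x : xs)
        if (x > 0.0) return x;   // engaged: the "bool" part is true
    return std::nullopt;         // disengaged: no value
}
```

The caller then tests has_value() (or uses the optional in a condition) and reads value() only when it is engaged, which makes the "maybe absent" contract explicit in the type.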
Struct version:
struct SpeedInfo {
    float speed;
    float distance;
};

SpeedInfo getInfo()
{
    SpeedInfo si;
    si.speed = //...
    si.distance = //...
    return si;
}
The benefit of this is that you get an encapsulated type with named access.
Reference version:
void getInfo(float& speed, float& distance)
{
    speed = //...
    distance = //...
}
You have to pass in the output vars:
float s;
float d;
getInfo(s, d);
Pointer version:
void getInfo(float* speed, float* distance)
{
    if (speed)
    {
        *speed = //...
    }
    if (distance)
    {
        *distance = //...
    }
}
Pass the memory address of the output variable:
float s;
float d;
getInfo(&s, &d);
The pointer version is interesting because you can just pass a nullptr/NULL/0 for things you aren't interested in; this can be useful with a function that potentially takes a lot of parameters when you don't need all of the output values. E.g.:
float d;
getInfo(nullptr, &d);
This is something you can't do with references, although references are safer.
There is already such a data structure in C++, named std::pair. It is declared in the header <utility>. So the function could look the following way:
std::pair<int, int> func( int time, int acceleration )
{
    // some calculations
    std::pair<int, int> ret_value;
    ret_value.first = speed_value;
    ret_value.second = distance_value;
    return ret_value;
}
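With C++17 you can also unpack the pair (or a struct) directly at the call site via structured bindings. A sketch, assuming the usual kinematics formulas v = a*t and s = a*t*t/2 for the assignment's speed and distance (the function name is illustrative):

```cpp
#include <utility>

// Returns {speed, distance} after accelerating from rest for `time`
// seconds at constant `acceleration`.
std::pair<double, double> speed_and_distance(double time, double acceleration) {
    double speed    = acceleration * time;               // v = a*t
    double distance = 0.5 * acceleration * time * time;  // s = a*t^2/2
    return {speed, distance};
}
```

The call site then reads naturally, with no .first/.second noise:

auto [speed, distance] = speed_and_distance(t, a);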