I want to call parallel_reduce to sum the elements of a vector, but I find that when the vector has enough elements, the result is not correct. Please help me understand how to use this function correctly.
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>

// prepare data
const size_t allNum = 1000000;
std::vector<double> a;
a.reserve(allNum);
for (size_t i = 0; i < allNum; ++i)
{
    a.push_back(double(i + 1));
}

// lambda that performs the reduction
auto f = [&]() -> double {
    return tbb::parallel_reduce(
        tbb::blocked_range<size_t>(0, allNum),
        0.0,
        // body: sum the elements of one sub-range, starting from init
        [&](const tbb::blocked_range<size_t>& r, double init) -> double {
            for (size_t i = r.begin(); i != r.end(); ++i)
            {
                init += a[i];
            }
            return init;
        },
        // join: combine two partial sums
        [](double lhs, double rhs) -> double {
            return lhs + rhs;
        }
        /*std::plus<double>()*/);
};
// call the lambda and get the result
double correctResult = (1.0 + 1000000.0) * 500000.0;
double sum = f(); // sum != correctResult
// sum is different on every run
I tried running the above code; it worked fine and produced the correct result.
For more information about parallel_reduce, refer to the links below:
https://software.intel.com/content/www/us/en/develop/documentation/tbb-documentation/top/intel-threading-building-blocks-developer-reference/algorithms/parallelreduce-template-function.html
https://link.springer.com/content/pdf/10.1007%2F978-1-4842-4398-5.pdf
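If the small run-to-run differences you are seeing come from floating-point summation order (parallel_reduce may split and join the partial sums differently on each run), one option is a sketch of the same reduction with tbb::parallel_deterministic_reduce, which fixes the split/join order at some performance cost:
// Sketch: same reduction with a deterministic split/join order, so the
// floating-point result is identical from run to run
// (assumes the same headers and vector `a` as above).
double sumDet = tbb::parallel_deterministic_reduce(
    tbb::blocked_range<size_t>(0, allNum),
    0.0,
    [&](const tbb::blocked_range<size_t>& r, double init) -> double {
        for (size_t i = r.begin(); i != r.end(); ++i)
            init += a[i];
        return init;
    },
    [](double lhs, double rhs) { return lhs + rhs; });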
Thanks,
Santosh
I would like to run these functions in R by sourcing the C++ code into R. I have used the sourceCpp function in the Rcpp R package to load the functions from a *.cpp file.
However, when I used the function get_add_terms_lapply:
get_add_terms_lapply(t = 20,gamma=0.5772,Nb=25.13274)
I get the following error:
Error in get_add_terms_lapply(t = time_end, gamma = gamma, Nb = Nb) :
negative length vectors are not allowed
This is the source *.cpp file that I have used:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double get_add_terms(double i, double t, double Nb) {
int j = i - 3;
NumericVector x(j);
double total = 0;
if( j == 0){
total = 0;
}else{
for(int u = 0; u <= j; u++) {
int res = pow(u+1, 2.0);
x[u] = res;
}
NumericVector::iterator it;
for(it = x.begin(); it != x.end(); ++it) {
total += *it;
}
}
float log_term = pow(log(t - (0.5 + i - 2.0)), (i-1));
double int_tot = (total / t);
double exp_term = exp(-int_tot);
double add_terms = (i * exp_term * log_term) / pow(Nb,(i-1));
return add_terms;
}
// [[Rcpp::export]]
NumericVector get_add_terms_lapply(double t, double gamma, double Nb) {
float term1 = ( 2.0 * log(t - 0.5) + (2.0 * gamma)) / Nb;
double N;
if (t == 1 || t == 2) {
N = 2;
}else{
if (t <= 44) {
N = t;
}else{
N = 44;
}
}
NumericVector term_up(N);
term_up[0] = 0;
term_up[1] = term1;
for(double i = 2; i < N; ++i) {
term_up[i-1] = get_add_terms(i, t, Nb);
}
return term_up;
}
Beyond the specific example I have here, I would like to know how a previously defined user function can be called from a new user-defined function in C++.
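On the general question: within a single *.cpp file compiled by sourceCpp, one user-defined function can call another exactly as in ordinary C++; the callee just has to be declared or defined before the caller, and it only needs its own [[Rcpp::export]] attribute if it should also be callable from R. A minimal sketch with hypothetical function names:
#include <Rcpp.h>
using namespace Rcpp;

// A plain C++ helper; no [[Rcpp::export]] needed unless it should also be
// callable from R.
double square_plus_one(double x) {
  return x * x + 1.0;
}

// [[Rcpp::export]]
NumericVector apply_square_plus_one(NumericVector v) {
  NumericVector out(v.size());
  for (R_xlen_t i = 0; i < v.size(); ++i) {
    out[i] = square_plus_one(v[i]); // ordinary C++ function call
  }
  return out;
}
This is exactly what get_add_terms_lapply already does when it calls get_add_terms.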
I'm working on a project related to speech processing. I have to implement parts of it on an Intel FPGA board using the Intel HLS Compiler, which converts C code to RTL code for FPGA implementation.
I need to convert time-domain signals to the frequency domain using an FFT in order to process the speech input signals. I don't know how the FFT algorithm works; I just know that an FFT is more efficient than a plain DFT for hardware implementation.
I need a simple radix-2 implementation that doesn't use the malloc() function or the complex.h library; if a piece of code uses them, I have to port it, and since I don't have much experience with C, porting would be hard and time-consuming for something that is only a small part of the project. The math.h library is allowed and can be implemented in HLS.
I have searched and found many C and C++ implementations of the FFT algorithm, but most of them would need porting to be suitable for HLS implementation.
I found the following code, and I think it's the best match for my implementation since it does not use the complex library or malloc: https://gist.github.com/Determinant/db7889995f08fe982418
I can't understand some parts of the aforementioned code:
- What is the following definition used for?
#define comp_mul_self(c, c2) \
- What are the fft function's inputs? The input variable names are not clear enough for me.
void fft(const Comp *sig, Comp *f, int s, int n, int inv)
I would really appreciate it if someone with enough knowledge of the FFT algorithm could comment the important parts of the code, and if there is better C or C++ code for my implementation, please point me in the right direction.
Here is the full code from the link above:
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct Comp {
/* comp of the form: a + bi */
double a, b;
} Comp;
Comp comp_create(double a, double b) {
Comp res;
res.a = a;
res.b = b;
return res;
}
Comp comp_add(Comp c1, Comp c2) {
Comp res = c1;
res.a += c2.a;
res.b += c2.b;
return res;
}
Comp comp_sub(Comp c1, Comp c2) {
Comp res = c1;
res.a -= c2.a;
res.b -= c2.b;
return res;
}
Comp comp_mul(Comp c1, Comp c2) {
Comp res;
res.a = c1.a * c2.a - c1.b * c2.b;
res.b = c1.b * c2.a + c1.a * c2.b;
return res;
}
void comp_print(Comp comp) {
printf("%.6f + %.6f i\n", comp.a, comp.b);
}
/* const double PI = acos(-1); */
#define PI 3.141592653589793
#define SQR(x) ((x) * (x))
/* Calculate e^(ix) */
Comp comp_euler(double x) {
Comp res;
res.a = cos(x);
res.b = sin(x);
return res;
}
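/* comp_mul_self(c, c2) multiplies the complex number pointed to by c by the
 * one pointed to by c2, in place: *c = (*c) * (*c2). It is written as a macro
 * rather than a function so it can update c directly through the pointer;
 * the temporary `ca` saves c->a before it is overwritten. */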
#define comp_mul_self(c, c2) \
do { \
double ca = c->a; \
c->a = ca * c2->a - c->b * c2->b; \
c->b = c->b * c2->a + ca * c2->b; \
} while (0)
void dft(const Comp *sig, Comp *f, int n, int inv) {
Comp ep = comp_euler(2 * (inv ? -PI : PI) / (double)n);
Comp ei, ej, *pi = &ei, *pj = &ej, *pp = &ep;
int i, j;
pi->a = pj->a = 1;
pi->b = pj->b = 0;
for (i = 0; i < n; i++)
{
f[i].a = f[i].b = 0;
for (j = 0; j < n; j++)
{
f[i] = comp_add(f[i], comp_mul(sig[j], *pj));
comp_mul_self(pj, pi);
}
comp_mul_self(pi, pp);
}
}
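/* Recursive radix-2 decimation-in-time Cooley-Tukey FFT.
 *   sig : pointer to the first input sample of the current sub-sequence
 *   f   : output buffer of n complex values
 *   s   : stride between consecutive samples of the sub-sequence
 *         (pass 1 at the top level; it doubles at each recursion)
 *   n   : transform length at this level; must be a power of two
 *   inv : 0 for the forward transform, nonzero for the inverse
 *         (the inverse is not normalised; divide by n afterwards,
 *         as print_result does) */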
void fft(const Comp *sig, Comp *f, int s, int n, int inv) {
int i, hn = n >> 1;
Comp ep = comp_euler((inv ? PI : -PI) / (double)hn), ei;
Comp *pi = &ei, *pp = &ep;
if (!hn) *f = *sig;
else
{
fft(sig, f, s << 1, hn, inv);
fft(sig + s, f + hn, s << 1, hn, inv);
pi->a = 1;
pi->b = 0;
for (i = 0; i < hn; i++)
{
Comp even = f[i], *pe = f + i, *po = pe + hn;
comp_mul_self(po, pi);
pe->a += po->a;
pe->b += po->b;
po->a = even.a - po->a;
po->b = even.b - po->b;
comp_mul_self(pi, pp);
}
}
}
void print_result(const Comp *sig, const Comp *sig0, int n) {
int i;
double err = 0;
for (i = 0; i < n; i++)
{
Comp t = sig0[i];
t.a /= n;
t.b /= n;
comp_print(t);
t = comp_sub(t, sig[i]);
err += t.a * t.a + t.b * t.b;
}
printf("Error Squared = %.6f\n", err);
}
void test_dft(const Comp *sig, Comp *f, Comp *sig0, int n) {
int i;
puts("## Direct DFT ##");
dft(sig, f, n, 0);
for (i = 0; i < n; i++)
comp_print(f[i]);
puts("----------------");
dft(f, sig0, n, 1);
print_result(sig, sig0, n);
puts("################");
}
void test_fft(const Comp *sig, Comp *f, Comp *sig0, int n) {
int i;
puts("## Cooley–Tukey FFT ##");
fft(sig, f, 1, n, 0);
for (i = 0; i < n; i++)
comp_print(f[i]);
puts("----------------------");
fft(f, sig0, 1, n, 1);
print_result(sig, sig0, n);
puts("######################");
}
int main() {
int n, i, k;
Comp *sig, *f, *sig0;
scanf("%d", &k);
n = 1 << k;
sig = (Comp *)malloc(sizeof(Comp) * (size_t)n);
sig0 = (Comp *)malloc(sizeof(Comp) * (size_t)n);
f = (Comp *)malloc(sizeof(Comp) * (size_t)n);
for (i = 0; i < n; i++)
{
sig[i].a = rand() % 10;
sig[i].b = 0;
}
puts("## Original Signal ##");
for (i = 0; i < n; i++)
comp_print(sig[i]);
puts("#####################");
test_dft(sig, f, sig0, n);
test_fft(sig, f, sig0, n);
return 0;
}
This is the original MATLAB implementation
function[m, p] = max2(im)
[m1, k1] = max(im);
[m, k2] = max(m1);
x = k2;
y = k1(k2);
p = [y, x];
It is used inside this loop:
for r = 2.^linspace(log2(minR),log2(maxR),numSteps);
itestSeek = imresize(itestBase,minR/r);
icorr = normxcorr2(cc,itestSeek);
[m,p] = max2(icorr); % here
if (m>bestm)
bestp = p*r;
bests = ccSize*r;
bestm = m;
end;
end;
Here is my OpenCV 3.0.0 / C++ implementation:
void Utilities::Max2(cv::Mat input_image, double& m, std::vector<int>& p)
{
std::vector<double> m1(input_image.cols); // the local maximum for each column
std::vector<int> k1(input_image.cols); // the index of the local maximum
for (int c = 0; c < input_image.cols; ++c)
{
float temp_max = input_image.at<float>(0, c);
int temp_index = 0;
for (int r = 0; r < input_image.rows; ++r)
{
if (temp_max < input_image.at<float>(r, c))
{
temp_max = input_image.at<float>(r, c);
temp_index = r;
}
}
m1[c] = temp_max;
k1[c] = temp_index;
}
auto iter = std::max_element(m1.begin(), m1.end()); //max of all the local maximum;
m = *iter;
int k2 = std::distance(m1.begin(), iter);
double y = k1[k2];
p.push_back(y);
p.push_back(k2);
}
C++ usage of the function:
std::vector<double> best_p;
std::vector<double> best_s;
for (size_t i = 0; i < linspace_vector.size(); ++i)
{
cv::Mat i_test_seek;
cv::Mat i_corr;
double r = linspace_vector[i];
double resize_factor = min_r / r; // minR/r in matlab
cv::resize(i_test_base, i_test_seek, cv::Size(), resize_factor, resize_factor, cv::INTER_CUBIC);
cv::matchTemplate(i_test_seek, cc_template, i_corr, CV_TM_CCORR_NORMED);
cv::imshow("i_corr", i_corr);
cv::waitKey(0);
double m;
std::vector<int> p;
Utilities::Max2(i_corr, m, p);
if (m> best_m)
{
best_p.clear();
best_s.clear();
for (int i = 0; i < p.size(); ++i)
{
best_p.push_back(p[i] * r);
}
best_s.push_back(cc_size_height * r);
best_s.push_back(cc_size_width * r);
best_m = m;
}
}
Can you suggest a more efficient way of doing this?
I find the local maximum of each column and its index, and then I find the global maximum among all of the column maxima.
Can you try the following and benchmark whether the performance increases:
#include <limits>
void Utilities::Max2(cv::Mat input_image, double& m, std::vector<int>& p)
{
    m = std::numeric_limits<double>::lowest();
    std::pair<int, int> temp_index(0, 0); // (column, row)
    for (int r = 0; r < input_image.rows; ++r)
    {
        for (int c = 0; c < input_image.cols; ++c)
        {
            if (m < input_image.at<float>(r, c))
            {
                m = input_image.at<float>(r, c);
                temp_index = std::make_pair(c, r);
            }
        }
    }
    p.clear();
    p.push_back(temp_index.second); // row
    p.push_back(temp_index.first);  // column
}
If there is a way to get the input as a vector and you can get the number of columns, for example using:
int cols = input_image.cols;
std::vector<double> v;
// note: this assumes the Mat actually stores doubles (CV_64F) and is continuous;
// datastart/dataend are uchar*, so they must be cast to the element type
v.assign((const double*)input_image.datastart, (const double*)input_image.dataend);
Then you can compute in just one go:
std::vector<double>::iterator iter = std::max_element(v.begin(), v.end());
double m = *iter;
int k = std::distance(v.begin(), iter);
int y = (int)k / cols;
int x = k % cols;
However, I am not sure if getting the data as a vector is an option nor the performance of convert it into a vector. Maybe you can run and see how it compares to your implementation.
The first piece of code is essentially finding the max value and its indices (both x and y) in an image to my understanding.
function[m, p] = max2(im)
[m1, k1] = max(im); %find the max value in each col
[m, k2] = max(m1); %find the max value among maxes
x = k2; %the "column" of the max value
y = k1(k2); %and its "row"
p = [y, x];
This can be done with explicit loops, but iteration is almost always significantly slower than vectorized operations or OpenCV functions.
So, if my understanding is correct, this operation can simply be done by
double minVal, maxVal;
Point minLoc, maxLoc;
minMaxLoc(im, &minVal, &maxVal, &minLoc, &maxLoc);
maxLoc.y will give the row, and maxLoc.x will give the column.
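For reference, a minimal sketch of how the asker's Max2 could be written on top of minMaxLoc, assuming p should hold {row, col} to match the original p = [y, x]:
// Sketch: Max2 rewritten around cv::minMaxLoc; assumes p = {row, col}.
void Max2(const cv::Mat& input_image, double& m, std::vector<int>& p)
{
    double min_val = 0.0;
    cv::Point min_loc, max_loc;
    cv::minMaxLoc(input_image, &min_val, &m, &min_loc, &max_loc);
    p = { max_loc.y, max_loc.x }; // row, then column
}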
Update: your MATLAB code can also be simplified (which will potentially speed it up too):
[mx, ind] = max(im(:));
p = [mod(ind-1,size(im,1))+1, ceil(ind/size(im,1))];
You could also try the following:
// creating a random matrix with 2 rows and 4 columns
Mat1d mat(2, 4);
double low = -7000.0; // minimum value for generating random numbers
double high = +7000.0; // maximum value for generating random numbers
randu(mat, Scalar(low), Scalar(high)); // generating random number matrix
double max_element = *std::max_element(mat.begin(),mat.end()); // get the max element in the matrix
int max_element_index = std::max_element(mat.begin(),mat.end()) - mat.begin(); // get the max_element_index from the matrix
The max element index is a row-major value, starting from 0 up to the number of items in the matrix minus one (7 in this case):
cout << mat << endl;
cout << max_element << endl;
cout << max_element_index << endl;
[Referred to "Generate random numbers matrix in OpenCV" for the code above.]
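To recover the (row, col) position from that row-major index, a small follow-up sketch reusing mat and max_element_index from above (a freshly created Mat1d is continuous, so its iterators run in row-major order):
// Convert the row-major linear index back to a (row, col) pair.
int max_row = max_element_index / mat.cols;
int max_col = max_element_index % mat.cols;
cout << "max element at (" << max_row << ", " << max_col << ")" << endl;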
I have 20 coordinates, x[20] and y[20]. I'm trying to get the 3 coordinates nearest to the user's coordinate; this function is supposed to return the indexes of the 3 nearest values.
#include <cmath>
#include <limits>

double distanceFormula(double x1, double x2, double y1, double y2){
    return sqrt(pow((x1 - x2), 2) + pow((y1 - y2), 2));
}

int* FindNearestThree(double keyX, double keyY, double x[], double y[]){
    static int wanted[3]; // static so the returned pointer stays valid after the function returns
    double distance = std::numeric_limits<double>::max();
    double distTemp;
    for (int i = 0; i < 20; i++)
    {
        distTemp = distanceFormula(keyX, x[i], keyY, y[i]);
        if (distance > distTemp){
            distance = distTemp;
            wanted[0] = i;
        }
        // this will get only the nearest value
    }
    return wanted;
}
using Point = std::pair<int, int>;
std::array<Point, 20> points;
populate(points);
std::sort(
points.begin()
, points.end()
, [up=get_user_coords()](const Point& p1, const Point& p2) {
int d1 = std::pow(up.first - p1.first, 2) + std::pow(up.second - p1.second, 2);
int d2 = std::pow(up.first - p2.first, 2) + std::pow(up.second - p2.second, 2);
return d1 < d2;
});
// The nearest 3 points are now at indices 0, 1, 2.
If you need to work with many, many more points, then I suggest doing some research on the Nearest neighbor search algorithm, because this can get slow fast.
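If fully sorting all of the points is unnecessary, a sketch of the same comparison with std::partial_sort, which only puts the first three elements in order (same Point alias and get_user_coords() as above), would be:
// Sketch: order only the three nearest points instead of the whole array.
std::partial_sort(
    points.begin(), points.begin() + 3, points.end()
    , [up=get_user_coords()](const Point& p1, const Point& p2) {
        int d1 = std::pow(up.first - p1.first, 2) + std::pow(up.second - p1.second, 2);
        int d2 = std::pow(up.first - p2.first, 2) + std::pow(up.second - p2.second, 2);
        return d1 < d2;
    });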
The following may help:
#include <array>
#include <queue>
#include <vector>

template <std::size_t N, typename It, typename Queue>
std::array<It, N> asArray(Queue& queue, It emptyValue)
{
std::array<It, N> res;
for (auto& e : res) {
if (queue.empty()) {
e = emptyValue;
} else {
e = queue.top();
queue.pop();
}
}
return res;
}
template <std::size_t N, typename It, typename ValueGetter>
std::array<It, N>
MinNElementsBy(It begin, It end, ValueGetter valueGetter)
{
auto myComp = [&](const It& lhs, const It& rhs)
{
return valueGetter(*lhs) < valueGetter(*rhs);
};
std::priority_queue<It, std::vector<It>, decltype(myComp)> queue(myComp);
for (auto it = begin; it != end; ++it) {
queue.push(it);
if (N < queue.size()) {
queue.pop();
}
}
return asArray<N>(queue, end);
}
I guess this might be the simplest, though really ugly, solution:
for (int j = 0; j < 3; j++) {
    for (int i = 0; i < 20; i++)
    {
        /* an if statement is needed here to check whether the current
           index is already in your result set, and then your loop as it is */
    }
}
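A sketch of that idea, assuming the x[20]/y[20] arrays and distanceFormula from the question:
#include <limits>

// Sketch: repeat the single-nearest search three times, skipping indices
// that are already in the result set.
void findNearestThree(double keyX, double keyY, const double x[], const double y[], int wanted[3])
{
    for (int j = 0; j < 3; ++j) {
        double best = std::numeric_limits<double>::max();
        wanted[j] = -1;
        for (int i = 0; i < 20; ++i) {
            bool alreadyTaken = false;
            for (int k = 0; k < j; ++k) {
                if (wanted[k] == i) alreadyTaken = true;
            }
            if (alreadyTaken) continue;
            double d = distanceFormula(keyX, x[i], keyY, y[i]);
            if (d < best) {
                best = d;
                wanted[j] = i;
            }
        }
    }
}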
I am using the dlib optimization library for C++, specifically the following function:
template <
typename search_strategy_type,
typename stop_strategy_type,
typename funct,
typename funct_der,
typename T
>
double find_max (
search_strategy_type search_strategy,
stop_strategy_type stop_strategy,
const funct& f,
const funct_der& der,
T& x,
double max_f
);
Functions f and der are designed to take a vector of the parameters being modified to obtain the maximum value of my function. However, the function being maximized has four parameters (one is my dataset and another is fixed by me), and I can't pass these as inputs to my f and der functions because of the signature they are supposed to have. How do I get this data into my functions? I am currently trying the code below (I hard-code the variable c, but for the vector xgrequ I read data from a file each time the function is evaluated).
//Function to be minimized
double mleGPD(const column_vector& p)
{
std::ifstream infile("Xm-EVT.csv");
long double swapRet;
std::string closeStr;
std::vector<double> histRet;
//Read in historical swap data file
if (infile.is_open())
{
while (!infile.eof())
{
infile >> swapRet;
histRet.push_back(swapRet);
}
}
sort(histRet.begin(), histRet.end());
std::vector<double> negRet;
//separate out losses
for (unsigned c = 0; c < histRet.size(); c++)
{
if (histRet[c] < 0)
{
negRet.push_back(histRet[c]);
}
}
std::vector<double> absValRet;
//make all losses positive to fit with EVT convention
for (unsigned s = 0; s < negRet.size(); s++)
{
absValRet.push_back(abs(negRet[s]));
}
std::vector<double> xminusu, xmu, xgrequ;
int count = absValRet.size();
double uPercent = .9;
int uValIndex = ceil((1 - uPercent)*count);
int countAbove = count - uValIndex;
double c = (double)absValRet[uValIndex - 1];
//looking at returns above u
for (unsigned o = 0; o < uValIndex; ++o)
{
xmu.push_back(absValRet[o] - c);
if (xmu[o] >= 0)
{
xgrequ.push_back(absValRet[o]);
xminusu.push_back(xmu[o]);
}
}
double nu = xgrequ.size();
double sum = 0.0;
double a = p(0);
double b = p(1);
for (unsigned h = 0; h < nu; ++h)
{
sum += log((1 / b)*pow(1 - a*((xgrequ[h] - c) / b), -1 + (1 / a)));
}
return sum;
}
//Derivative of function to be minimized
const column_vector mleGPDDer(const column_vector& p)
{
std::ifstream infile("Xm-EVT.csv");
long double swapRet;
std::string closeStr;
std::vector<double> histRet;
//Read in historical swap data file
if (infile.is_open())
{
while (!infile.eof())
{
infile >> swapRet;
histRet.push_back(swapRet);
}
}
sort(histRet.begin(), histRet.end());
std::vector<double> negRet;
//separate out losses
for (unsigned c = 0; c < histRet.size(); c++)
{
if (histRet[c] < 0)
{
negRet.push_back(histRet[c]);
}
}
std::vector<double> absValRet;
//make all losses positive to fit with EVT convention
for (unsigned s = 0; s < negRet.size(); s++)
{
absValRet.push_back(abs(negRet[s]));
}
std::vector<double> xminusu, xmu, xgrequ;
int count = absValRet.size();
double uPercent = .9;
int uValIndex = ceil((1 - uPercent)*count);
int countAbove = count - uValIndex;
double c = (double)absValRet[uValIndex - 1];
//looking at returns above u
for (unsigned o = 0; o < uValIndex; ++o)
{
xmu.push_back(absValRet[o] - c);
if (xmu[o] >= 0)
{
xgrequ.push_back(absValRet[o]);
xminusu.push_back(xmu[o]);
}
}
column_vector res(2);
const double a = p(0);
const double b = p(1);
double nu = xgrequ.size();
double sum1 = 0.0;
double sum2 = 0.0;
for (unsigned h = 0; h < nu; ++h)
{
sum1 += ((xgrequ[h]-c)/b)/(1-a*((xgrequ[h]-c)/b));
sum2 += log(1 - a*((xgrequ[h] - c) / b));
}
res(0) = sum1;//df/da
res(1) = sum2;//df/db
return res;
}
Here is what my actual function call looks like:
//Dlib max finding
column_vector start(2);
start = .1, .1; //starting point for a and b
find_max(bfgs_search_strategy(), objective_delta_stop_strategy(1e-6), mleGPD, mleGPDDer, start,100);
std::cout << "solution" << start << std::endl;
This kind of API is very common. It's almost always possible to pass any callable as f and der, not just free functions; that is, you can pass a custom class object with operator() to it.
For example
struct MyF {
    //int m_state;
    // or other state variables, such as
    std::vector<double> m_histRet;
    // (default constructors will do)
    double operator()(const column_vector& p) const {
        // some_function_of is a placeholder for your actual objective
        return some_function_of(p, m_histRet);
    }
};
int main(){
. . .
MyF myf{42};
// or
MyF myf{someVectorContainingHistRet};
// then use myf as you would have used mleGPD
}
You need to initialize MyF and MyDer with the same state (the std::vector<double> histRet, I presume), either as copies or as (const) references to the same state.
Edit: a more complete example:
struct MLGDPG_State {
    std::vector<double> xgrequ;
    double c; // the threshold u, which f and fder also need
    // . . . and anything else you need in f or fder
};
MLGDPG_State makeMLGDPG_State(const std::string& filename){
std::ifstream infile(filename);
long double swapRet;
std::string closeStr;
std::vector<double> histRet;
//Read in historical swap data file
if (infile.is_open())
{
while (infile >> swapRet)
{
histRet.push_back(swapRet);
}
}
sort(histRet.begin(), histRet.end());
std::vector<double> negRet;
//separate out losses
for (unsigned c = 0; c < histRet.size(); c++)
{
if (histRet[c] < 0)
{
negRet.push_back(histRet[c]);
}
}
std::vector<double> absValRet;
//make all losses positive to fit with EVT convention
for (unsigned s = 0; s < negRet.size(); s++)
{
absValRet.push_back(abs(negRet[s]));
}
std::vector<double> xminusu, xmu, xgrequ;
int count = absValRet.size();
double uPercent = .9;
int uValIndex = ceil((1 - uPercent)*count);
int countAbove = count - uValIndex;
double c = (double)absValRet[uValIndex - 1];
//looking at returns above u
for (unsigned o = 0; o < uValIndex; ++o)
{
xmu.push_back(absValRet[o] - c);
if (xmu[o] >= 0)
{
xgrequ.push_back(absValRet[o]);
xminusu.push_back(xmu[o]);
}
}
return {std::move(xgrequ), c};
// Or just 'return MLGDPG_State{xgrequ, c};' if you are scared of {} and move
}
//Functor class for the function to be minimized
struct MleGPD{
    MLGDPG_State state;
    double operator()(const column_vector& p) const {
        const auto nu = state.xgrequ.size();
        const double c = state.c;
        const double a = p(0);
        const double b = p(1);
        double sum = 0.0;
        for (std::size_t h = 0; h < nu; ++h)
        {
            sum += log((1 / b)*pow(1 - a*((state.xgrequ[h] - c) / b), -1 + (1 / a)));
        }
        return sum;
    }
};
Use the same pattern for a struct MleGPD_Derivative.
Usage:
const auto state = makeMLGDPG_State("Xm-EVT.csv");
const auto f = MleGPD{state};
const auto der = MleGPD_Derivative{state};
column_vector start(2);
start = .1, .1; //starting point for a and b
find_max(bfgs_search_strategy(), objective_delta_stop_strategy(1e-6), f, der, start, 100);
std::cout << "solution" << start << std::endl;
Note that for simple structs like these, it's often fine to use the default constructors, copy constructor, etc. Also see http://en.cppreference.com/w/cpp/language/aggregate_initialization
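As an alternative to the functor structs, a small sketch under the same assumptions (the state built by makeMLGDPG_State above and dlib's column_vector type) that passes lambdas capturing the state directly to find_max:
// Sketch: lambdas capturing the shared state by reference can also be passed
// to find_max, since it accepts any callable with the right signature.
const auto state = makeMLGDPG_State("Xm-EVT.csv");

auto f = [&state](const column_vector& p) -> double {
    const double a = p(0), b = p(1);
    double sum = 0.0;
    for (std::size_t h = 0; h < state.xgrequ.size(); ++h)
        sum += log((1 / b) * pow(1 - a * ((state.xgrequ[h] - state.c) / b), -1 + (1 / a)));
    return sum;
};

auto der = [&state](const column_vector& p) -> column_vector {
    const double a = p(0), b = p(1);
    column_vector res(2);
    double sum1 = 0.0, sum2 = 0.0;
    for (std::size_t h = 0; h < state.xgrequ.size(); ++h)
    {
        sum1 += ((state.xgrequ[h] - state.c) / b) / (1 - a * ((state.xgrequ[h] - state.c) / b));
        sum2 += log(1 - a * ((state.xgrequ[h] - state.c) / b));
    }
    res(0) = sum1; // df/da
    res(1) = sum2; // df/db
    return res;
};

column_vector start(2);
start = .1, .1;
find_max(bfgs_search_strategy(), objective_delta_stop_strategy(1e-6), f, der, start, 100);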