Determining CPU time required to execute loop


I've done some SO searching and found this and that outlining timing methods.
My problem is that I need to determine the CPU time (in milliseconds) required to execute the following loop:
for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
I've looked at two methods, clock() and steady_clock::now(). Per the docs, I know that clock() returns "ticks", so I can get seconds by dividing the difference by CLOCKS_PER_SEC. The docs also mention that steady_clock is designed for interval timing, but you have to call duration_cast<milliseconds> to change its unit.
To time the two, I ran each one by itself (doing both in the same run might make one take longer because the other was called first):
clock_t t = clock();
for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
t = clock() - t;
cout << (float(t)/CLOCKS_PER_SEC) * 1000 << "ms taken" << endl;

chrono::steady_clock::time_point p1 = chrono::steady_clock::now();
for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::steady_clock::time_point p2 = chrono::steady_clock::now();
cout << chrono::duration_cast<chrono::milliseconds>(p2-p1).count() << "ms taken" << endl;
Output:
0ms taken
0ms taken
Do both these methods floor the result? Surely some fraction of a millisecond elapsed?
So which is ideal (or rather, more appropriate) for determining the CPU time required to execute the loop? At first glance, I would argue for clock(), since the docs specifically say it's for determining CPU time.
For context, my CLOCKS_PER_SEC holds a value of 1000.
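On the flooring question: duration_cast<milliseconds> truncates toward zero, and clock() can only advance in whole ticks (1 ms each when CLOCKS_PER_SEC is 1000), so a loop that finishes in well under a millisecond reports 0. In addition, because temp is never read in the original loop, an optimizing compiler may remove the loop entirely. A minimal sketch of keeping the fractional part, assuming C++11 and using hypothetical helper names (run_loop, wall_ms, cpu_ms): the steady_clock interval is converted to std::chrono::duration<double, std::milli>, and the clock() difference is converted in double.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <iostream>

// The loop from the question; temp is volatile so the compiler must keep the stores.
void run_loop() {
    volatile int temp = 0;
    for (int i = 0; i < 10000; i++) {
        if (i % 2 == 0)
            temp = (i / 2) + 1;
        else
            temp = 2 * i;
    }
}

// Elapsed wall-clock time in fractional milliseconds (no truncation to whole ms).
template <class F>
double wall_ms(F f) {
    auto p1 = std::chrono::steady_clock::now();
    f();
    auto p2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(p2 - p1).count();
}

// Processor time in milliseconds; still limited to whole clock() ticks.
template <class F>
double cpu_ms(F f) {
    std::clock_t t = std::clock();
    f();
    t = std::clock() - t;
    return 1000.0 * static_cast<double>(t) / CLOCKS_PER_SEC;
}
```

For example, `std::cout << wall_ms(run_loop) << "ms taken\n";` prints the wall-clock time with its fractional part. clock() measures processor time, which is what the question asks for, but it stays limited to whole ticks; steady_clock measures elapsed wall time, usually at a much finer resolution.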
Edit/Update:
Tried the following:
clock_t t = clock();
for (int j = 0; j < 1000000; j++) {
    volatile int temp = 0;
    for (int i = 0; i < 10000; i++)
    {
        if (i % 2 == 0)
        {
            temp = (i / 2) + 1;
        }
        else
        {
            temp = 2 * i;
        }
    }
}
t = clock() - t;
cout << (float(t) * 1000.0f / CLOCKS_PER_SEC / 1000000.0f) << "ms taken" << endl;
Outputs: 0.019953ms taken
clock_t start = clock();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
clock_t end = clock();
cout << fixed << setprecision(2) << 1000.0 * (end - start) / CLOCKS_PER_SEC << "ms taken" << endl;
Outputs: 0.00ms taken
chrono::high_resolution_clock::time_point p1 = chrono::high_resolution_clock::now();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::high_resolution_clock::time_point p2 = chrono::high_resolution_clock::now();
cout << (chrono::duration_cast<chrono::microseconds>(p2 - p1).count()) / 1000.0 << "ms taken" << endl;
Outputs: 0.072ms taken
chrono::steady_clock::time_point p1 = chrono::steady_clock::now();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::steady_clock::time_point p2 = chrono::steady_clock::now();
cout << (chrono::duration_cast<chrono::microseconds>(p2 - p1).count()) / 1000.0f << "ms taken" << endl;
Outputs: 0.044ms taken
So the question becomes: which is valid? The second method seems invalid to me, because I think the loop completes in less than a millisecond.
I understand the first method (it simply runs the loop long enough to be measurable), but the last two methods produce drastically different results.
One thing I've noticed: after compiling the program, the first run may give 0.073ms (high_resolution_clock) and 0.044ms (steady_clock), but all subsequent runs fall within 0.019 - 0.025ms.
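The slower first run is consistent with one-off warm-up costs such as cold caches, page faults on first touch, or the CPU ramping up its clock frequency; which of these dominates here is an assumption. One common way to reduce their influence is to run the code once untimed and then report the fastest of several timed runs. A sketch with a hypothetical helper best_of_ms:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs f once untimed (warm-up), then returns the fastest of `runs` timed calls, in ms.
template <class F>
double best_of_ms(F f, int runs) {
    assert(runs > 0);
    f();  // warm-up call: absorbs one-off costs before anything is timed
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        auto p1 = std::chrono::steady_clock::now();
        f();
        auto p2 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double, std::milli>(p2 - p1).count());
    }
    return best;
}
```

Taking the minimum reports the least-disturbed run rather than the typical one; reporting the median of the runs is an equally reasonable choice.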

You can run the loop a million times and divide the total by the number of repetitions. You can also declare temp volatile so the compiler cannot optimize the loop away.
clock_t t = clock();
for (int j = 0; j < 1000000; j++) {
    volatile int temp = 0;
    for (int i = 0; i < 10000; i++)
    {
        if (i % 2 == 0)
        {
            temp = (i / 2) + 1;
        }
        else
        {
            temp = 2 * i;
        }
    }
}
t = clock() - t;
cout << (float(t) * 1000.0f / CLOCKS_PER_SEC / 1000000.0f) << "ms taken" << endl;
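The repeat-and-divide approach from this answer can be wrapped in a small helper. This sketch keeps clock(), so it still measures processor time, but divides in double rather than float (a float carries only about 7 significant decimal digits). The name average_cpu_ms is hypothetical, and the result includes the overhead of calling f on each repetition.

```cpp
#include <cassert>
#include <ctime>

// Average processor time per call of f, in milliseconds, over `reps` calls.
template <class F>
double average_cpu_ms(F f, long reps) {
    assert(reps > 0);
    std::clock_t t = std::clock();
    for (long r = 0; r < reps; ++r)
        f();
    t = std::clock() - t;
    // Convert ticks to ms and divide by the repetition count, all in double.
    return 1000.0 * static_cast<double>(t) / CLOCKS_PER_SEC / static_cast<double>(reps);
}
```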

Using GetTickCount() seems to be a solution:
double start_s = GetTickCount();
for (int i = 0, temp = 0; i < 10000000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
double stop_s = GetTickCount();
// GetTickCount() already returns milliseconds, so no CLOCKS_PER_SEC conversion is needed
cout << (stop_s - start_s) << "ms taken" << endl;
For me this returns between 16 and 31 ms.
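GetTickCount() is updated only at the system timer interval, which Microsoft documents as typically 10 to 16 milliseconds; with the common default of about 15.6 ms, readings of 16 and 31 ms likely correspond to one and two timer ticks. A portable way to see how coarse a clock is on a given machine is to spin until its value changes and measure the step. The sketch below does this for std::chrono::steady_clock and for clock(); the function names are hypothetical, and the result estimates observable granularity, not accuracy.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Spins until steady_clock::now() changes and returns the observed step in ns.
long long steady_step_ns() {
    auto t0 = std::chrono::steady_clock::now();
    auto t1 = t0;
    while (t1 == t0)
        t1 = std::chrono::steady_clock::now();
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
}

// Same idea for clock(): spins (burning CPU time) until the counter advances,
// then returns the step in ms. Returns -1.0 if clock() is unavailable.
double clock_step_ms() {
    std::clock_t c0 = std::clock();
    if (c0 == static_cast<std::clock_t>(-1))
        return -1.0;
    std::clock_t c1 = c0;
    while (c1 == c0)
        c1 = std::clock();
    return 1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
}
```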

Related

SSE slower than standard logic [duplicate]

This question already has answers here:
SSE intrinsics without compiler optimization
(2 answers)
Idiomatic way of performance evaluation?
(1 answer)
Closed 3 days ago.
Adding two arrays element by element: standard logic vs. SSE.
int count = std::pow(2, 20);
alignas(16) float* fm1 = new float[count];
alignas(16) float* fm2 = new float[count];
alignas(16) float* res = new float[count];
for (int i = 0; i < count; ++i) {
    fm1[i] = static_cast<float>(i);
    fm2[i] = static_cast<float>(i);
}
{
    auto start = std::chrono::high_resolution_clock::now();
    for (int j = 0; j < 1000; ++j) {
        for (int i = 0; i < count; ++i) {
            res[i] = fm1[i] + fm2[i];
        }
    }
    auto diff = std::chrono::high_resolution_clock::now() - start;
    std::cout << "execute time duration = " << std::chrono::duration<double, std::milli>(diff).count() << " milliseconds" << std::endl;
}
{
    assert(count % 4 == 0);
    auto start = std::chrono::high_resolution_clock::now();
    for (int j = 0; j < 1000; ++j) {
        for (int i = 0; i < count; i += 4) {
            __m128 a = _mm_load_ps(&fm1[i]);
            __m128 b = _mm_load_ps(&fm2[i]);
            __m128 r = _mm_add_ps(a, b);
            _mm_store_ps(&res[i], r);
        }
    }
    auto diff = std::chrono::high_resolution_clock::now() - start;
    std::cout << "execute time duration = " << std::chrono::duration<double, std::milli>(diff).count() << " milliseconds" << std::endl;
}
Result:
execute time duration = 1692.19 milliseconds
execute time duration = 2339.49 milliseconds
Laptop configuration:
11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz 3.30 GHz
16.0 GB RAM
I expected the SSE version to be at least 3 times faster, but it is slower.
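Two details in this code are worth checking. First, alignas(16) on fm1, fm2 and res aligns the pointer variables themselves, not the arrays returned by new float[count], while _mm_load_ps and _mm_store_ps require 16-byte-aligned addresses; _mm_malloc is one way to obtain such memory. Second, the duplicate targets suggest the comparison was built without optimization; with optimization enabled (for example -O2 or a Release build), the compiler may also vectorize the scalar loop on its own, so the two versions may end up performing about the same. A sketch with hypothetical helpers alloc_floats, free_floats and add_arrays, assuming an x86 target for the SSE path and falling back to a scalar loop elsewhere:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_malloc, _mm_free, _mm_load_ps, _mm_add_ps, _mm_store_ps
#define HAVE_SSE 1      // hypothetical feature macro for this sketch
#endif

#ifdef HAVE_SSE
// 16-byte-aligned buffer, as _mm_load_ps/_mm_store_ps require.
float* alloc_floats(std::size_t n) { return static_cast<float*>(_mm_malloc(n * sizeof(float), 16)); }
void free_floats(float* p) { _mm_free(p); }
#else
float* alloc_floats(std::size_t n) { return static_cast<float*>(std::malloc(n * sizeof(float))); }
void free_floats(float* p) { std::free(p); }
#endif

// res[i] = a[i] + b[i]; n must be a multiple of 4, buffers from alloc_floats.
void add_arrays(const float* a, const float* b, float* res, std::size_t n) {
    assert(n % 4 == 0);
#ifdef HAVE_SSE
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 r = _mm_add_ps(_mm_load_ps(a + i), _mm_load_ps(b + i));
        _mm_store_ps(res + i, r);
    }
#else
    for (std::size_t i = 0; i < n; ++i)  // scalar fallback on non-SSE targets
        res[i] = a[i] + b[i];
#endif
}
```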

Parallel execution taking more time than serial

I am writing code that counts how many pair sums are even (among all pairs from 1 to 100000). I wrote one version using pthreads and one without, but the pthreads version takes more time than the serial one. Here is my serial code:
#include<bits/stdc++.h>
using namespace std;
int main()
{
    long long sum = 0, count = 0, n = 100000;
    auto start = chrono::high_resolution_clock::now();
    for(int i = 1; i <= n; i++)
        for(int j = i-1; j >= 0; j--)
        {
            sum = i + j;
            if(sum%2 == 0)
                count++;
        }
    cout<<"count is "<<count<<endl;
    auto end = chrono::high_resolution_clock::now();
    double time_taken = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
    time_taken *= 1e-9;
    // setprecision must come before the value it should format
    cout << "Time taken by program is : " << fixed << setprecision(9) << time_taken << " secs" << endl;
    return 0;
}
And here is my parallel code:
#include<bits/stdc++.h>
using namespace std;
#define MAX_THREAD 3
long long cnt[5] = {0};
long long n = 100000;
int work_per_thread;
int start[] = {1, 60001, 83001, 100001};
void *count_array(void* arg)
{
    int t = *((int*)arg);
    long long sum = 0;
    for(int i = start[t]; i < start[t+1]; i++)
        for(int j = i-1; j >=0; j--)
        {
            sum = i + j;
            if(sum%2 == 0)
                cnt[t]++;
        }
    cout<<"thread"<<t<<" finished work "<<cnt[t]<<endl;
    return NULL;
}
int main()
{
    pthread_t threads[MAX_THREAD];
    int arr[] = {0,1,2};
    long long total_count = 0;
    work_per_thread = n/MAX_THREAD;
    auto start = chrono::high_resolution_clock::now();
    for(int i = 0; i < MAX_THREAD; i++)
        pthread_create(&threads[i], NULL, count_array, &arr[i]);
    for(int i = 0; i < MAX_THREAD; i++)
        pthread_join(threads[i], NULL);
    for(int i = 0; i < MAX_THREAD; i++)
        total_count += cnt[i];
    cout << "count is " << total_count << endl;
    auto end = chrono::high_resolution_clock::now();
    double time_taken = chrono::duration_cast<chrono::nanoseconds>(end - start).count();
    time_taken *= 1e-9;
    // setprecision must come before the value it should format
    cout << "Time taken by program is : " << fixed << setprecision(9) << time_taken << " secs" << endl;
    return 0;
}
In the parallel code I create three threads: the 1st thread does its computation for i from 1 to 60000, the 2nd from 60001 to 83000, and the 3rd from 83001 to 100000. I chose these numbers so that each thread does approximately the same number of computations. The parallel version takes 10.3 secs, whereas the serial one takes 7.7 secs. I have 6 cores and 2 threads per core. I also used the htop command to check that the required number of threads are running, and that seems to be fine. I don't understand where the problem is.
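The hand-picked boundaries can also be derived. Outer iteration i runs i inner iterations, so the work for i from 1 to m is about m * m / 2; giving each of T threads an equal share puts boundary k near n * sqrt(k / T). A sketch with a hypothetical helper balanced_starts, which returns an array in the same form as the question's start[]:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Thread t handles i in [starts[t], starts[t + 1]).
// Work up to m is about m*m/2, so equal shares put boundary k near n * sqrt(k / T).
std::vector<long long> balanced_starts(long long n, int threads) {
    assert(threads > 0);
    std::vector<long long> starts(threads + 1);
    starts[0] = 1;
    for (int k = 1; k < threads; ++k)
        starts[k] = static_cast<long long>(n * std::sqrt(static_cast<double>(k) / threads)) + 1;
    starts[threads] = n + 1;  // one past the last i, matching i < start[t+1]
    return starts;
}
```

For n = 100000 and 3 threads it returns 1, 57736, 81650, 100001, close to the question's 1, 60001, 83001, 100001. Balancing the work does not by itself address contention on cnt[].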

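All threads in the threaded version compete for cnt[]: the per-thread counters are adjacent long long values that typically share a cache line, so every cnt[t]++ forces that line to move between cores (false sharing).
Use a local counter inside the loop and copy the result into cnt[t] after the loop finishes.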
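A minimal sketch of that change, using C++11 std::thread in place of raw pthreads (a substitution, not the question's API) and hypothetical names count_range and parallel_count. Each worker accumulates into a local variable and writes its slot of the results exactly once, after its loop has finished, so the hot loop no longer writes to shared memory. On GCC and Clang, older toolchains may need the -pthread flag.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts pairs (i, j) with lo <= i < hi and 0 <= j < i whose sum is even.
// `local` lives on this thread's stack, so no other thread touches it.
long long count_range(long long lo, long long hi) {
    long long local = 0;
    for (long long i = lo; i < hi; ++i)
        for (long long j = i - 1; j >= 0; --j)
            if ((i + j) % 2 == 0)
                ++local;
    return local;
}

// Splits the work at `starts` (same layout as the question's start[] array).
long long parallel_count(const std::vector<long long>& starts) {
    assert(starts.size() >= 2);
    std::size_t t = starts.size() - 1;
    std::vector<long long> cnt(t, 0);
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < t; ++k)
        workers.emplace_back([&, k] { cnt[k] = count_range(starts[k], starts[k + 1]); });
    for (auto& w : workers)
        w.join();
    long long total = 0;
    for (long long c : cnt)
        total += c;
    return total;
}
```

For example, parallel_count({1, 60001, 83001, 100001}) splits the work the same way as the question's start[] array.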

malloc(): corrupted top size, but I am using delete[]

void merge_sort(int* arr, int start, int stop){ // [start; stop]
int min_array_size = 8;
std::cout << start << " " << stop << std::endl;
if ((stop - start + 1) > min_array_size){
merge_sort(arr, start, start + ((stop-start+1) / 2) - 1);
merge_sort(arr, start + ((stop-start+1) / 2), stop);
std::cout << "merging: " << start << " " << stop << std::endl;
int left_index = start;
int right_index = start + ((stop-start+1) / 2);
int* new_arr = new int[stop-start + 1];
std::cout << "New_arr created!\n";
for (int i = start; i <= stop; i++){
if (arr[left_index] < arr[right_index]){
new_arr[i] = arr[left_index];
left_index++;
}
else {
new_arr[i] = arr[right_index];
right_index++;
}
if (left_index == start + ((stop-start+1) / 2)){
i++;
for (int j = i; j <= stop; j++, i++){
new_arr[j] = arr[right_index++];
}
}
if (right_index > stop){
i++;
for (int j = i; j <= stop; j++, i++){
new_arr[j] = arr[left_index++];
}
}
}
for (int i = start; i <= stop; i++){
arr[i] = new_arr[i];
std::cout << "arr[" << i << "] = " << arr[i] << std::endl;
}
delete[] new_arr;
std::cout << "memory cleaned!\n";
}
else{
selection_sort(arr + start, (stop - start + 1));
for (int i = 0; i < (stop - start) + 1; i++){
std::cout << arr[i + start] << " ";
}
std::cout << std::endl;
}
}
Can someone please tell me why I get the following error, even though I free the memory with delete[] new_arr?
malloc(): corrupted top size
I really can't understand why this happens.
Here is my selection_sort() code:
void selection_sort(int* arr, size_t size){
    // i + 1 < size avoids the unsigned wrap-around of size - 1 when size == 0
    for (size_t i = 0; i + 1 < size; i++){
        size_t min_index = i;
        for (size_t j = i + 1; j < size; j++){
            if (arr[j] < arr[min_index]){
                min_index = j;
            }
        }
        if (min_index != i){
            std::swap(arr[i], arr[min_index]);
        }
    }
}
https://godbolt.org/z/Wj56MM349

You're overflowing your heap allocations, corrupting the heap, and breaking things eventually. The problem is:
You allocate an array with stop - start + 1 elements (meaning valid indices run from 0 to stop - start).
In the subsequent loop (for (int i = start; i <= stop; i++){), you assign to any and all array indices from start to stop (inclusive on both ends).
If start is not 0, then assignment to new_arr[stop] is definitely an out-of-bounds write; the larger start gets, the more invalid indices you'll see.
The problem would rarely occur as a result of a single call (the heap metadata being corrupted would be for a subsequent heap element, not the one containing new_arr), but with the recursive calls you have plenty of opportunities to do terrible things to the heap and break it faster. Even a single such out-of-bounds write puts you in nasal demons territory, so you need to fix your code to avoid doing it even once.
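One way to fix the out-of-bounds writes is to index the temporary buffer relative to start. A sketch of the merge step under that approach, using std::vector instead of new[]/delete[] so the buffer is released automatically; merge_range is a hypothetical name, and the selection-sort cutoff for small ranges is omitted for brevity:

```cpp
#include <cassert>
#include <vector>

// Merges the sorted ranges arr[start..mid-1] and arr[mid..stop] (stop inclusive).
// The buffer has stop - start + 1 slots, so it is indexed from 0, not from start.
void merge_range(int* arr, int start, int mid, int stop) {
    std::vector<int> buf(stop - start + 1);
    int left = start, right = mid, k = 0;
    while (left < mid && right <= stop)
        buf[k++] = (arr[right] < arr[left]) ? arr[right++] : arr[left++];
    while (left < mid)
        buf[k++] = arr[left++];
    while (right <= stop)
        buf[k++] = arr[right++];
    for (int i = start; i <= stop; ++i)
        arr[i] = buf[i - start];  // shift by start when copying back
}

void merge_sort(int* arr, int start, int stop) {  // sorts arr[start..stop]
    if (stop - start < 1)
        return;  // zero or one element
    int mid = start + (stop - start + 1) / 2;
    merge_sort(arr, start, mid - 1);
    merge_sort(arr, mid, stop);
    merge_range(arr, start, mid, stop);
}
```

The key line is arr[i] = buf[i - start]; the original code writes new_arr[i] for i from start to stop instead.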

Smith Waterman for C++ (Visual Studio 14.0)

I would like to kill two birds with one stone, as the questions are very similar:
1:
I followed the Smith Waterman Alignment code on GitHub to implement Smith-Waterman in C++. After some research I understood that declaring double H[N_a+1][N_b+1]; with sizes known only at run time is not possible in standard C++ (variable-length arrays are a compiler extension that Visual C++ does not support). So I allocated the array dynamically instead, changing this line to:
double **H = new double*[nReal + 1];
for (int i = 0; i < nReal + 1; i++)
    H[i] = new double[nSynth + 1];
I used the same scheme for int I_i[N_a+1][N_b+1], I_j[N_a+1][N_b+1]; and so on (everywhere a two-dimensional array appears). Now I'm getting this exception:
Unhandled exception at 0x00007FFF7B413C58 in Smith-Waterman.exe: Microsoft C++ exception: std::bad_alloc at location 0x0000008FF4F9FA50.
What is wrong here? I have already debugged it, and the program throws the exception on the line above the for (int i = 0; i < nReal + 1; i++) loop.
2: This code uses std::string parameters. Would it also be possible to create a Smith-Waterman algorithm for cv::Mat?
For clarification, my full code looks like this:
#include "BinaryAlignment.h"
#include "WallMapping.h"
//using declarations
using namespace cv;
using namespace std;
//global variables
std::string bin;
cv::Mat temp;
std::stringstream sstrMat;
const int maxMismatch = 2;
const float mu = 0.33f;
const float delta = 1.33;
int ind;
BinaryAlignment::BinaryAlignment() { }
BinaryAlignment::~BinaryAlignment() { }
/**
*** Convert matrix to binary sequence
**/
std::string BinaryAlignment::matToBin(cv::Mat src, std::experimental::filesystem::path path) {
cv::Mat linesMat = WallMapping::wallMapping(src, path);
for (int i = 0; i < linesMat.size().height; i++) {
for (int j = 0; j < linesMat.size().width; j++) {
if (linesMat.at<Vec3b>(i, j)[0] == 0
&& linesMat.at<Vec3b>(i, j)[1] == 0
&& linesMat.at<Vec3b>(i, j)[2] == 255) {
src.at<int>(i, j) = 1;
}
else {
src.at<int>(i, j) = 0;
}
sstrMat << src.at<int>(i, j);
}
}
bin = sstrMat.str();
return bin;
}
double BinaryAlignment::similarityScore(char a, char b) {
double result;
if (a == b)
result = 1;
else
result = -mu;
return result;
}
double BinaryAlignment::findArrayMax(double array[], int length) {
double max = array[0];
ind = 0;
for (int i = 1; i < length; i++) {
if (array[i] > max) {
max = array[i];
ind = i;
}
}
return max;
}
/**
*** Smith-Waterman alignment for given sequences
**/
int BinaryAlignment::watermanAlign(std::string seqSynth, std::string seqReal, bool viableAlignment) {
const int nSynth = seqSynth.length(); //length of sequences
const int nReal = seqReal.length();
//H[nSynth + 1][nReal + 1]
double **H = new double*[nReal + 1];
for (int i = 0; i < nReal + 1; i++)
H[i] = new double[nSynth + 1];
cout << "passt";
for (int m = 0; m <= nSynth; m++)
for (int n = 0; n <= nReal; n++)
H[m][n] = 0;
double temp[4];
int **Ii = new int*[nReal + 1];
for (int i = 0; i < nReal + 1; i++)
Ii[i] = new int[nSynth + 1];
int **Ij = new int*[nReal + 1];
for (int i = 0; i < nReal + 1; i++)
Ij[i] = new int[nSynth + 1];
for (int i = 1; i <= nSynth; i++) {
for (int j = 1; j <= nReal; j++) {
temp[0] = H[i - 1][j - 1] + similarityScore(seqSynth[i - 1], seqReal[j - 1]);
temp[1] = H[i - 1][j] - delta;
temp[2] = H[i][j - 1] - delta;
temp[3] = 0;
H[i][j] = findArrayMax(temp, 4);
switch (ind) {
case 0: // score in (i,j) stems from a match/mismatch
Ii[i][j] = i - 1;
Ij[i][j] = j - 1;
break;
case 1: // score in (i,j) stems from a deletion in sequence A
Ii[i][j] = i - 1;
Ij[i][j] = j;
break;
case 2: // score in (i,j) stems from a deletion in sequence B
Ii[i][j] = i;
Ij[i][j] = j - 1;
break;
case 3: // (i,j) is the beginning of a subsequence
Ii[i][j] = i;
Ij[i][j] = j;
break;
}
}
}
//Print matrix H to console
std::cout << "**********************************************" << std::endl;
std::cout << "The scoring matrix is given by " << std::endl << std::endl;
for (int i = 1; i <= nSynth; i++) {
for (int j = 1; j <= nReal; j++) {
std::cout << H[i][j] << " ";
}
std::cout << std::endl;
}
//search H for the maximal score
double Hmax = 0;
int imax = 0, jmax = 0;
for (int i = 1; i <= nSynth; i++) {
for (int j = 1; j <= nReal; j++) {
if (H[i][j] > Hmax) {
Hmax = H[i][j];
imax = i;
jmax = j;
}
}
}
std::cout << Hmax << endl;
std::cout << nSynth << ", " << nReal << ", " << imax << ", " << jmax << std::endl;
std::cout << "max score: " << Hmax << std::endl;
std::cout << "alignment index: " << (imax - jmax) << std::endl;
//Backtracing from Hmax
int icurrent = imax, jcurrent = jmax;
int inext = Ii[icurrent][jcurrent];
int jnext = Ij[icurrent][jcurrent];
int tick = 0;
char *consensusSynth = new char[nSynth + nReal + 2];
char *consensusReal = new char[nSynth + nReal + 2];
while (((icurrent != inext) || (jcurrent != jnext)) && (jnext >= 0) && (inext >= 0)) {
if (inext == icurrent)
consensusSynth[tick] = '-'; //deletion in A
else
consensusSynth[tick] = seqSynth[icurrent - 1]; //match / mismatch in A
if (jnext == jcurrent)
consensusReal[tick] = '-'; //deletion in B
else
consensusReal[tick] = seqReal[jcurrent - 1]; //match/mismatch in B
//fix for adding first character of the alignment.
if (inext == 0)
inext = -1;
else if (jnext == 0)
jnext = -1;
else
icurrent = inext;
jcurrent = jnext;
inext = Ii[icurrent][jcurrent];
jnext = Ij[icurrent][jcurrent];
tick++;
}
// Output of the consensus motif to the console
std::cout << std::endl << "***********************************************" << std::endl;
std::cout << "The alignment of the sequences" << std::endl << std::endl;
for (int i = 0; i < nSynth; i++) {
std::cout << seqSynth[i];
};
std::cout << " and" << std::endl;
for (int i = 0; i < nReal; i++) {
std::cout << seqReal[i];
};
std::cout << std::endl << std::endl;
std::cout << "is for the parameters mu = " << mu << " and delta = " << delta << " given by" << std::endl << std::endl;
for (int i = tick - 1; i >= 0; i--)
std::cout << consensusSynth[i];
std::cout << std::endl;
for (int j = tick - 1; j >= 0; j--)
std::cout << consensusReal[j];
std::cout << std::endl;
int numMismatches = 0;
for (int i = tick - 1; i >= 0; i--) {
if (consensusSynth[i] != consensusReal[i]) {
numMismatches++;
}
}
viableAlignment = numMismatches <= maxMismatch;
return imax - jmax;
}
Thanks!
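Regarding question 1, two things in watermanAlign look suspicious. H, Ii and Ij are allocated with nReal + 1 rows of nSynth + 1 entries, but the loops index them as H[i][j] with i up to nSynth and j up to nReal, which writes out of bounds whenever the two strings differ in length. Also, each matrix needs (nSynth + 1) * (nReal + 1) entries; if the strings come from matToBin with one character per pixel, two 100,000-character sequences would already need about 80 GB for H alone, which would explain std::bad_alloc during allocation (an assumption, since the input sizes aren't shown). A sketch of the scoring fill with std::vector and matching dimensions; smith_waterman_max is a hypothetical name, and it returns only the best score, without the traceback:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fills the Smith-Waterman scoring matrix and returns its maximum entry.
// H has a.size() + 1 rows and b.size() + 1 columns, so H[i][j] pairs a[i-1] with b[j-1].
double smith_waterman_max(const std::string& a, const std::string& b, double mu, double delta) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<double>> H(n + 1, std::vector<double>(m + 1, 0.0));
    double best = 0.0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            double match = H[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 1.0 : -mu);
            double del_a = H[i - 1][j] - delta;  // deletion in sequence A
            double del_b = H[i][j - 1] - delta;  // deletion in sequence B
            H[i][j] = std::max({match, del_a, del_b, 0.0});
            best = std::max(best, H[i][j]);
        }
    }
    return best;  // memory is released automatically when H goes out of scope
}
```

Because each row depends only on the previous one, the score alone could be computed with two rows; the traceback in the question, however, needs Ii and Ij for every cell.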

Can't seem to get the for loop divided among several threads with OpenMP

I've got a project to submit for college, but I can't get the for loop divided among several threads. When I print the thread number inside the loop, it just keeps printing zeros (indicating that only the master thread is executing the loop). Here's the code:
// Filename: swarm1.cpp
#include "swarm1.h"
#include <omp.h>
cParticle particles[MAX_PARTICLES];
int main()
{
omp_set_num_threads(4);
//double times[500];
//for (int i=0;i<500;i++){
double sTime=omp_get_wtime();
srand((unsigned)time(0));
psoAlgorithm();
double eTime=omp_get_wtime();
cout<<eTime-sTime<<" Seconds \n";
//times[i]=eTime-sTime;
//}
//double avg=0;
//for (int i=0;i<500;i++)
//avg+=times[i];
/*cout << avg << " seconds" << endl;
avg/=500;
cout << avg << " seconds" << endl;
*/
return 0;
}
void psoAlgorithm()
{
int gBest = 0;
int gBestTest = 0;
int epoch = 0;
bool done = false;
initialize();
do
{
/* Two conditions can end this loop:
if the maximum number of epochs allowed has been reached, or,
if the Target value has been found.
*/
if(epoch < MAX_EPOCHS){
#pragma omp parallel for schedule (dynamic,5)
for(int i = 0; i <= MAX_PARTICLES - 1; i++)
{
cout << omp_get_thread_num() <<endl;
/*for(int j = 0; j <= MAX_INPUTS - 1; j++)
{
if(j < MAX_INPUTS - 1){
cout << particles[i].getData(j) << " + ";
}else{
cout << particles[i].getData(j) << " = ";
}
} // j*/
//cout << testProblem(i) << endl;
if(testProblem(i) == TARGET)
{
done = true;
cout << " particle " << i << " has found the target \n";
for(int j = 0; j <= MAX_INPUTS - 1; j++)
{
if(j < MAX_INPUTS - 1){
cout << particles[i].getData(j) << " + ";
}else{
cout << particles[i].getData(j) << " = ";
}
}
}
} // i
gBestTest = minimum();
//If any particle's pBest value is better than the gBest value,
//make it the new gBest Value.
if(abs(TARGET - testProblem(gBestTest)) < abs(TARGET - testProblem(gBest)))
{
gBest = gBestTest;
}
getVelocity(gBest);
updateParticles(gBest);
epoch += 1;
}else{
done = true;
}
}while(!done);
cout << epoch << " epochs completed." << endl;
return;
}
void initialize()
{
#pragma omp parallel for schedule (dynamic,5)
for(int i = 0; i <= MAX_PARTICLES - 1; i++)
{
int total = 0; // declared inside the loop so each thread gets its own copy
for(int j = 0; j <= MAX_INPUTS - 1; j++)
{
particles[i].setData(j, getRandomNumber(START_RANGE_MIN, START_RANGE_MAX));
total += particles[i].getData(j);
} // j
particles[i].setpBest(total);
} // i
return;
}
void getVelocity(int gBestIndex)
{
/* from Kennedy & Eberhart(1995).
vx[][] = vx[][] + 2 * rand() * (pbestx[][] - presentx[][]) +
2 * rand() * (pbestx[][gbest] - presentx[][])
*/
int bestResults = testProblem(gBestIndex);
#pragma omp parallel for schedule (dynamic,5)
for(int i = 0; i <= MAX_PARTICLES - 1; i++)
{
// per-iteration variables are declared here so each thread has its own copy
int testResults = testProblem(i);
float vValue = particles[i].getVelocity() +
2 * gRand() * (particles[i].getpBest() - testResults) + 2 * gRand() *
(bestResults - testResults);
if(vValue > V_MAX){
particles[i].setVelocity(V_MAX);
}else if(vValue < -V_MAX){
particles[i].setVelocity(-V_MAX);
}else{
particles[i].setVelocity(vValue);
}
} // i
}
void updateParticles(int gBestIndex)
{
#pragma omp parallel for schedule (dynamic,5)
for(int i = 0; i <= MAX_PARTICLES - 1; i++)
{
for(int j = 0; j <= MAX_INPUTS - 1; j++)
{
if(particles[i].getData(j) != particles[gBestIndex].getData(j))
{
int tempData = particles[i].getData(j); // private to each thread
particles[i].setData(j, tempData + static_cast<int>(particles[i].getVelocity()));
}
} // j
//Check pBest value.
int total = testProblem(i); // private to each thread
if(abs(TARGET - total) < particles[i].getpBest())
{
particles[i].setpBest(total);
}
} // i
}
int testProblem(int index)
{
int total = 0;
for(int i = 0; i <= MAX_INPUTS - 1; i++)
{
total += particles[index].getData(i);
} // i
return total;
}
float gRand()
{
// Returns a pseudo-random float between 0.0 and 1.0
return float(rand()/(RAND_MAX + 1.0));
}
int getRandomNumber(int low, int high)
{
// Returns a pseudo-random integer between low and high.
return low + int(((high - low) + 1) * rand() / (RAND_MAX + 1.0));
}
int minimum()
{
//Returns an array index.
int winner = 0;
bool foundNewWinner;
bool done = false;
do
{
foundNewWinner = false;
#pragma omp parallel for schedule (dynamic,2)
for(int i = 0; i <= MAX_PARTICLES - 1; i++)
{
if(i != winner){ //Avoid self-comparison.
//The minimum has to be in relation to the Target.
if(abs(TARGET - testProblem(i)) < abs(TARGET - testProblem(winner)))
{
winner = i;
foundNewWinner = true;
}
}
} // i
if(foundNewWinner == false)
{
done = true;
}
}while(!done);
return winner;
}
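A thread number that is always 0 is what you would see if the program is compiled without OpenMP enabled (GCC and Clang need -fopenmp, Visual C++ needs /openmp); the #pragma omp lines are then ignored and every loop runs on a single thread. Since the build settings aren't shown, this is an assumption. The standard _OPENMP macro is defined only when OpenMP is enabled, so a small check like the following (distinct_threads is a hypothetical name) tells the two cases apart:

```cpp
#include <cassert>
#include <iostream>
#include <set>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns how many distinct thread numbers ran iterations of a parallel loop.
// Without OpenMP at compile time the pragmas are ignored and the result is 1
// (for iterations > 0).
int distinct_threads(int iterations) {
#ifdef _OPENMP
    std::cout << "OpenMP enabled, _OPENMP = " << _OPENMP << '\n';
#else
    std::cout << "OpenMP NOT enabled for this build\n";
#endif
    std::set<int> seen;  // shared between threads; every insert is guarded below
#pragma omp parallel for
    for (int i = 0; i < iterations; ++i) {
        int id = 0;  // declared in the loop body, so private to each thread
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
#pragma omp critical
        seen.insert(id);  // std::set is not thread-safe, so one thread at a time
    }
    return static_cast<int>(seen.size());
}
```

Separately, variables such as done, winner and foundNewWinner are written by several threads inside the parallel loops, and rand() is not guaranteed to be thread-safe, so results may vary between runs even once the threads are active.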
