OpenMP, parallel for loop, Large differences in processing time


I've developed a program that reads numbers from a .txt file and stores them in a vector. The numbers then undergo a series of combinations and calculations to determine whether the result matches a target number. This work is done in multiple threads, with each thread handling some of the iterations of the parallel for loop.
Long story short, the processing time varies a lot for larger inputs (e.g. 9 numbers): it can be as short as 3 minutes or longer than 10 minutes.
Here's the benchmark that I've tried so far:
8 numbers serial : 18.119 seconds
8 numbers multithread (first-try): 10.238 seconds
8 numbers multithread (second-try): 18.943 seconds
9 numbers serial : 458.980 seconds
9 numbers multithread (first-try): 172.347 seconds
9 numbers multithread (second-try): 519.532 seconds //Seriously?
//Another try after suggested modifications
9 numbers multithread (first-try): 297.017 seconds
9 numbers multithread (second-try): 297.85 seconds
9 numbers multithread (third-try): 304.755 seconds
9 numbers multithread (fourth-try): 396.391 seconds
So the question is: is there any way to improve the multithreaded program so that it takes as little time as possible to shuffle and calculate the numbers?
Here's the portion of the code where the parallel for loop occurs (updated with slight modifications):
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <stdlib.h>
#include <algorithm>
#include <stdio.h>
#include <Windows.h>
#include <omp.h>
#define OPERATORSIZE 3
using namespace std;
int cur_target;
ofstream outFile;
string get_operator(int i) {
switch (i) {
case 0:
return "+";
case 1:
return "-";
case 2:
return "*";
case 3:
return "/";
default:
return "";
}
}
int prev_num_pos(vector<int> &cur_equation, int count) {
for (int i = count - 1; i >= 0; i--) {
if (cur_equation[i] != -1) return i + 1;
}
return 0;
}
bool nextoperator(int k, vector<int> &operator_array) {
for (int i = k - 2; i >= 0; i--) {
if (operator_array[i] < OPERATORSIZE) {
operator_array[i] += 1;
break;
}
else
operator_array[i] = 0;
switch (i) {
case 0:
return false;
}
}
return true;
}
void vector_combination(vector<int> int_list) { // Generate the number combinations from the number list
bool div_remainder = false;
int count = 0;
#pragma omp parallel for schedule(dynamic) firstprivate(div_remainder) reduction(+:count)
for (int i = 0; i < int_list.size(); ++i) {
vector<int> cur_equation, cur_temp, cur_list, operator_array;
auto list = int_list;
rotate(list.begin(), list.begin() + i, list.begin() + i + 1);
do
{
cur_list.clear();
operator_array.clear();
for (auto x : list)
cur_list.push_back(x);
for (int i = 0; i < cur_list.size() - 1; i++)
operator_array.push_back(0);
do
{
div_remainder = false;
count = 0;
cur_equation = operator_array;
cur_temp = cur_list;
for (int i = 0; i < cur_equation.size(); ++i) { // Check for equation priorities
if (cur_equation[i] == 3) {
count = i;
if (cur_temp[count] % cur_temp[count + 1] != 0) {
div_remainder = true;
break;
}
}
}
if (div_remainder)
continue;
for (int i = 0; i < cur_temp.size() - 1; ++i) {
count = -1;
if (cur_equation[i] == 2 || cur_equation[i] == 3) {
count = prev_num_pos(cur_equation, i);
}
else
continue;
if (cur_equation[i] == 2) {
cur_temp[count] *= cur_temp[i + 1];
cur_equation[i] = -1;
}
else if (cur_equation[i] == 3) {
if (cur_temp[i + 1] != 0) {
cur_temp[count] /= cur_temp[i + 1];
cur_equation[i] = -1;
}
else {
div_remainder = true;
break;
}
}
}
if (div_remainder)
continue;
for (int i = 0; i < cur_temp.size() - 1; ++i) {
switch (cur_equation[i]) {
case 0: {
cur_temp[0] += cur_temp[i + 1]; // Addition
cur_equation[i] = -1;
break;
}
case 1: { // Subtraction
cur_temp[0] -= cur_temp[i + 1];
cur_equation[i] = -1; // mark as consumed (was -i, which looks like a typo for -1)
break;
}
}
}
if (cur_temp[0] == cur_target) {
#pragma omp critical
{
for (int i = 0; i < cur_list.size(); ++i) {
outFile << cur_list[i];
if (i < cur_list.size() - 1) { outFile << get_operator(operator_array[i]); }
}
outFile << "\n";
}
}
} while (nextoperator(cur_list.size(), operator_array));
// Send to function to undergone a list of operator combinations
} while (next_permutation(list.begin() + 1, list.end()));
}
}
int main(void) {
SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
vector<int> int_list;
string line;
ifstream myfile("Problem.txt");
if (myfile.is_open()) {
while (getline(myfile, line)) {
int num = stoi(line);
int_list.push_back(num);
cur_target = num;
}
}
else
cout << "Unable to open file." << endl;
myfile.close();
int_list.pop_back();
sort(int_list.begin(), int_list.end());
outFile.open("answer.txt");
vector_combination(int_list);
outFile.close();
int answer_count = 0;
myfile.open("answer.txt");
if (myfile.is_open()) {
while (getline(myfile, line)) {
++answer_count;
if (answer_count > 1)
break;
}
}
myfile.close();
if (answer_count == 0) {
outFile.open("answer.txt");
outFile << "-1" << endl;
}
outFile.close();
return 0;
}
As for the sample input, create a .txt file named "Problem.txt" containing numbers like the following. The last number is the target result. (Updated with the sample input used for the benchmark.)
28
55
78
77
33
65
35
62
19
221
The hardware/software specification that the program runs on:
Processor: i5 Sandy Bridge 2500K,
Ram: 8GB,
OS: Windows 10 Professional,
IDE: Visual Studio 2015 Enterprise Edition,

Move the #pragma omp critical inside the if condition. Since cur_temp is thread private and cur_target is global read only, it is not necessary to protect the condition with a critical section.
This change drastically minimizes the direct interaction between the threads and, on my system, speeds up the parallel version consistently.
I would weakly guess the performance variations were influenced by the (seemingly random) phase shift between the loops running on different threads.
If performance variation persists, try enabling thread binding. Check the documentation of your OpenMP implementation, look for OMP_PROC_BIND, "thread pinning", "binding", or "affinity".

Apparently the runtime variance was caused by the vectors. I checked with a performance analyzer and noticed that the time spent copying values between vectors was not consistent. I've switched to pointer arrays instead, and the runtime is now much better and consistent.

Related

Parallel Program with POSIX Threads Slower than Serial

I know this question looks like a duplicate, but I couldn't find another topic that matches my code.
The problem statement is as follows:
There is a CSV file with 16,000 lines. A serial version of the program extracts the rows whose price (SalePrice is a column header in the CSV) is higher than a threshold given as a command-line argument, and calculates their mean and standard deviation, which are used for further computations.
For the parallel version, this large CSV file is split into 4 CSV files. Each thread is assigned one CSV file and does the same calculations (the mean and standard deviation of the rows with a price higher than the value named threshold in my code).
Since the data is large enough, I don't think the slowdown is caused by multithreading overhead.
I would be thankful if someone could help me find out which part is slowing down my parallel version.
#include <iostream>
#include <fstream>
#include <vector>
#include <math.h>
#include <iomanip>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>
#include <unistd.h>
using namespace std;
#define COMMA ','
#define EMPTY_STR ""
#define FILENAME "dataset.csv"
#define CLASSIFIER "GrLivArea"
#define SALE_PRICE "SalePrice"
const int MAX_THREAD_NUMBERS = 20;
int NUMBER_OF_THREADS;
int threshold;
int expensive_cnt[MAX_THREAD_NUMBERS];
vector<string> lines;
string head;
double _std;
long sum[MAX_THREAD_NUMBERS];
long ps[MAX_THREAD_NUMBERS];
long sumsq[MAX_THREAD_NUMBERS];
double mean;
int total_items;
int total_expensive_cnt;
struct Item
{
int x;
bool category;
};
vector<Item> items[MAX_THREAD_NUMBERS];
int getColNum(const string& head, const string& key)
{
int cnt = 0;
string cur = EMPTY_STR;
for (int i = 0 ; i < head.size() ; i++)
{
if (head[i] == COMMA)
{
if (cur == key)
return cnt;
cnt++;
cur = EMPTY_STR;
}
else
cur += head[i];
}
if (cur == key)
return cnt;
return -1;
}
vector<int> separateByComma(string s)
{
vector<int> res;
string cur = EMPTY_STR;
for (int i = 0 ; i < s.size() ; i++)
if (s[i] == COMMA)
{
res.push_back(stoi(cur));
cur = EMPTY_STR;
}
else
cur += s[i];
res.push_back(stoi(cur));
return res;
}
void* calcSums(void* tid)
{
long thread_id = (long)tid;
string filename = "dataset_" + to_string(thread_id) + ".csv";
ifstream fin(filename);
string head;
fin >> head;
int classifierColNum = getColNum(head, CLASSIFIER);
if (classifierColNum == -1)
{
printf("NO GrLivArea FOUND IN HEAD OF CSV\n");
exit(-1);
}
int priceColNum = getColNum(head, SALE_PRICE);
if (priceColNum == -1)
{
printf("NO SalePrice FOUND IN HEAD OF CSV\n");
exit(-1);
}
string line;
while (fin >> line)
{
vector<int> cur = separateByComma(line);
bool category = (cur[priceColNum] >= threshold);
Item item{cur[classifierColNum], category};
if (category)
{
sum[thread_id] += item.x;
sumsq[thread_id] += (item.x * item.x);
expensive_cnt[thread_id]++;
}
items[thread_id].push_back(item);
}
fin.close();
pthread_exit(NULL);
}
void calcMeanSTD()
{
string line;
for (int i = 0 ; ; i++)
{
struct stat buffer;
string name = "dataset_" + to_string(i) + ".csv";
if (!(stat (name.c_str(), &buffer) == 0))
break;
NUMBER_OF_THREADS++;
}
pthread_t threads[NUMBER_OF_THREADS];
int return_code;
for (long tid = 0 ; tid < NUMBER_OF_THREADS ; tid++)
{
return_code = pthread_create(&threads[tid], NULL, calcSums, (void*)tid);
if (return_code)
{
printf("ERROR; return code from pthread_create() is %d\n", return_code);
exit(-1);
}
}
for (long tid = 0 ; tid < NUMBER_OF_THREADS ; tid++)
{
return_code = pthread_join(threads[tid], NULL);
if (return_code)
{
printf("ERROR; return code from pthread_join() is %d\n", return_code);
exit(-1);
}
}
double total_sum = 0;
double total_sum_sq = 0;
total_expensive_cnt = 0;
total_items = 0;
for (int i = 0 ; i < NUMBER_OF_THREADS ; i++)
{
total_sum += sum[i];
total_sum_sq += sumsq[i];
total_expensive_cnt += expensive_cnt[i];
total_items += items[i].size();
}
mean = total_sum / total_expensive_cnt;
_std = sqrt((total_sum_sq - ((total_sum * total_sum) / (total_expensive_cnt))) / (total_expensive_cnt));
}
int main(int argc, char *argv[])
{
threshold = atoi(argv[1]);
calcMeanSTD();
cout << mean << " " << _std << endl;
return 0;
}
Please let me know if any part is unclear.
Here are some run-time values:
Read CSV (Serial): 0.043268s Calculations (Serial): 0.000151s
Measuring exact times is not easy in the multithreaded version, because the calculations and the file reading happen in the same while loop and cannot be separated. There are also many thread switches. Anyway, their sum is about 0.14587s.
As can be seen, reading the files takes almost 300 times as long as the math calculations.

Thanks to the answers in the comments, I found out what is happening:
I tried increasing the number of rows in my CSV files to see if the parallelization is working.
The run-time values for a CSV file with 1000000 rows are:
Parallel: real 0m0.558s user 0m2.173s sys 0m0.020s
Serial: real 0m1.834s user 0m1.818s sys 0m0.016s
Since I am using 4 threads, I expected 1.834 divided by 0.558 to be close to 4; it is actually about 3.29, which is fair enough.
The run times for smaller CSV files do not show this speedup, which seems to be because the math computations in my code are so simple.
The bottleneck of this code is the section that reads the CSV files. This section seems to be effectively serial, since it reads from a disk.
There is also a false sharing problem in this code: it causes cache contention when different threads update different memory locations that map to the same cache line. There are several solutions to this problem. For example, I can add padding to these arrays so that elements accessed by different threads do not share cache lines. Or, more simply, each thread can work with thread-local variables instead of the arrays and update the array elements only once at the end.

Trying to create a multithreaded program to find the total primes from 0-100000000

Hello, I am trying to write a C++ multithreaded program using the POSIX thread library to find the number of prime numbers between 1 and 10,000,000 (10 million) and to measure how many microseconds it takes.
Creating and running my threads works completely fine, but I suspect there is an error in my Prime function when it determines whether a number is prime.
I keep getting 78496 as my output, but I expect 664579. Below is my code. Any hints or pointers would be greatly appreciated.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>
#include <iostream>
#include <sys/time.h> //measure the execution time of the computations
using namespace std;
//The number of thread to be generated
#define NUMBER_OF_THREADS 4
void * Prime(void* index);
long numbers[4] = {250000, 500000, 750000, 1000000};
long start_numbers[4] = {1, 250001, 500001, 750001};
int thread_numbers[4] = {0, 1, 2, 3};
int main(){
pthread_t tid[NUMBER_OF_THREADS];
int tn;
long sum = 0;
timeval start_time, end_time;
double start_time_microseconds, end_time_microseconds;
gettimeofday(&start_time, NULL);
start_time_microseconds = start_time.tv_sec * 1000000 + start_time.tv_usec;
for(tn = 0; tn < NUMBER_OF_THREADS; tn++){
if (pthread_create(&tid[tn], NULL, Prime, (void *) &thread_numbers[tn]) == -1 ) {
perror("thread fail");
exit(-1);
}
}
long value[4];
for(int i = 0; i < NUMBER_OF_THREADS; i++){
if(pthread_join(tid[i],(void **) &value[i]) == 0){
sum = sum + value[i]; //add four sums together
}else{
perror("Thread join failed");
exit(-1);
}
}
//get the end time in microseconds
gettimeofday(&end_time, NULL);
end_time_microseconds = end_time.tv_sec * 1000000 + end_time.tv_usec;
//calculate the time passed
double time_passed = end_time_microseconds - start_time_microseconds;
cout << "Sum is: " << sum << endl;
cout << "Running time is: " << time_passed << " microseconds" << endl;
exit(0);
}
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
for (int j=2; j*j <= i; j++)
{
if (i % j == 0)
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}

This is because you used 10^6 instead of 10^7 as the upper limit.
I also added some corner cases for the numbers 1, 2 and 3:
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
// Corner cases
if(i<=1)continue;
if (i <= 3){
sum_t++;
continue;
}
for (int j=2; j*j <= i; j++)
{
if ((i % j == 0) || (i %( j+2))==0 )
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}
I tested your code with the correct limit and got the correct number of primes as output:
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 4.69242e+07 microseconds
Thanks to @chux - Reinstate Monica for pointing this out.

Besides splitting 10^7 (rather than 10^6) across the threads, there are a few other small errors, and some optimizations are possible.
First of all, the start numbers could begin at 2:
long start_numbers[4] = {2, 2500001, 5000001, 7500001};
The sum_t++ in your code may not work on edge cases. It is better to use the following algorithm in the Prime function:
bool flag = false;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
flag = false;
for (long j=2; j*j <= i; j++){
if (i % j == 0 )
{
flag = true;
break;
}
}
if(!flag)
sum_t++;
}
After these two changes I get the following result:
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 6.62618e+06 microseconds
Edit:
(Note: here j is declared as long, but int would work as well in this example, since int is 32 bits wide on the tested compiler.)

Optimized c++ function for nth prime number execution time

I am implementing a C++ function that returns the Nth prime number, using some predefined indices to reduce the run time.
My code is:
// file prime.cpp
#include <iostream>
#include <time.h>
using namespace std;
/*
#define primeAt10000 104743
#define primeAt20000 224743
#define primeAt30000 350381
#define primeAt40000 479951
#define primeAt50000 611977
*/
int prime(int n){
int pos = 1,i = 1,temp;
if(n==0)
return 2;
/*
else if(n>50000){
i = primeAt50000;
pos = 50001;
}else if(n>40000){
i = primeAt40000;
pos = 40001;
}else if(n>30000){
i = primeAt30000;
pos = 30001;
}else if(n>20000){
i = primeAt20000;
pos = 20001;
}else if(n>10000){
i = primeAt10000;
pos = 10001;
}*/
while( i+=2 ){
temp = i/2+1;
for(int j = 3 ; j <= temp ; j+=2)
if(i%j == 0)
goto con;
if(pos++ >= n)
return i;
con :;
}
}
int main(int argc, char const *argv[]){
int index;
cin >> index;
clock_t start = clock();
cout << prime(index)<<endl;
cout << (clock()-start)/CLOCKS_PER_SEC<<"sec"<< endl;
return 0;
}
compiled with:
g++ prime.cpp -o prime.exe
I ran this code three times for inputs 9999, 19999 and 29999
1st run : 1sec 6sec 14sec
2nd run : 1sec 7sec 15sec
3rd run : 1sec 7sec 16sec
After enabling the commented-out code, I ran it three more times with the same inputs:
1st run : 1sec 5sec 8sec
2nd run : 1sec 5sec 8sec
3rd run : 1sec 5sec 8sec
My questions are:
Why do the times differ between inputs after the second compilation, even though the loop runs about 125,000 times in every case?
and
Why, for input 19999, is the time after the second compilation (~104743 loop iterations) so close to the time after the first compilation (~224743 loop iterations)?

The time for each interval of 9999 indices differs because checking whether a larger number is prime takes more time than checking a smaller one.
In other words, the run time of the for loop in prime() grows with the value of the variable temp.
When we check i = 101, temp becomes 51 and the for loop runs approximately 25 times.
When we check i = 10001, temp becomes 5001 and the for loop can run up to approximately 2500 times.
This difference in the run time of the for loop increases the overall time as the candidates get larger.

After some discussion with @JonathanLeffler, I have further optimized this function to get faster results for larger input values such as indices 9999, 19689 and so on.
The complexity of my prime function is now (N^2)/12, whereas before it was (N^2)/8.
My new code is:
#include <iostream>
#include <time.h>
using namespace std;
#define primeAt10000 (104743-7)
#define primeAt20000 (224743-7)
#define primeAt30000 (350381-7)
#define primeAt40000 (479951-7)
#define primeAt50000 (611977-7)
bool checkPrime(int x){
int temp = x/2+1;
for(int j = 3 ; j <= temp ; j+=2)
if(x%j == 0)
return false;
return true;
}
int prime(int n){
int pos = 2,i = 0;
if(n==0)
return 2;
else if(n==1)
return 3;
else if(n>50000){
i = primeAt50000;
pos = 50000;
}else if(n>40000){
i = primeAt40000;
pos = 40000;
}else if(n>30000){
i = primeAt30000;
pos = 30000;
}else if(n>20000){
i = primeAt20000;
pos = 20000;
}else if(n>10000){
i = primeAt10000;
pos = 10000;
}
while( i+=6 ){
if(checkPrime(i-1))
if(pos++>=n)
return i-1;
if(checkPrime(i+1))
if(pos++>=n)
return i+1;
}
return 0;
}
int main()
{
int index;
cin >> index;
clock_t start = clock();
cout << prime(index)<<endl;
cout << (clock()-start)/(float)CLOCKS_PER_SEC<<"sec";
return 0;
}
Compiled with (on the advice of @NathanOliver and @JonathanLeffler):
g++ -O3 -Wall -Werror -Wextra prime.cpp -o prime.exe
Now prime.exe takes 1.34, 4.83 and 7.20 sec respectively for inputs 9999, 19999 and 29999.

Why my Shell sorting is so slow

I am trying to implement the Shell sort algorithm myself. I wrote my own code without looking at any code samples; I only watched a video describing the algorithm.
My sort works, but it is very slow (bubble sort, 100 items: 0.007 s; Shell sort, 100 items: 4.83 s). How can I improve it?
void print(vector<float>vec)
{
for (float i : vec)
cout << i << " ";
cout << "\n\n";
}
void Shell_sorting(vector<float>&values)
{
int swapping = 0;
int step = values.size();
clock_t start;
double duration;
start = clock();
while (step/2 >= 1)
{
step /= 2;
for (int i = 0; i < values.size()-step; i++)
{
if ((i + step < values.size()))
{
if ((values[i + step] < values[i]))
{
swap(values[i], values[i + step]);
print(values);
++swapping;
int c = i;
while (c - step > 0)
{
if (values[c] < values[c - step])
{
swap(values[c], values[c - step]);
print(values);
++swapping;
c -= step;
}
else
break;
}
}
}
else
break;
}
}
duration = (clock() - start) / (double)CLOCKS_PER_SEC;
print(values);
cout << swapping << " " << duration;
print(values);
}

A better implementation could be:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec = {
726,621,81,719,167,958,607,130,263,108,
134,235,508,407,153,162,849,923,996,975,
250,78,460,667,654,62,865,973,477,912,
580,996,156,615,542,655,240,847,613,497,
274,241,398,84,436,803,138,677,470,606,
226,593,620,396,460,448,198,958,566,599,
762,248,461,191,933,805,288,185,21,340,
458,592,703,303,509,55,190,318,310,189,
780,923,933,546,816,627,47,377,253,709,
992,421,587,768,908,261,946,75,682,948,
};
std::vector<int> gaps = {5, 2, 1};
int j;
for (int gap : gaps) {
for (int i = gap; i < vec.size(); i++)
{
j = i-gap;
while (j >= 0) {
if (vec[j+gap] < vec[j])
{
int temp = vec[j+gap];
vec[j+gap] = vec[j];
vec[j] = temp;
j = j-gap;
}
else break;
}
}
}
for (int item : vec) std::cout << item << " " << std::endl;
return 0;
}
I prefer to store the gaps in a vector so that you do not need to compute the division (which is an expensive operation). Besides, this choice gives your code more flexibility.
The outer loop cycles over the gap values. Once the gap is chosen, you iterate over your vector starting from vec[gap] and check whether there are elements smaller than it, according to the logic of Shell sort.
So, you start by setting j=i-gap and test the if condition. If it is true, swap the items and then repeat the while loop with j decremented. Note: vec[j+gap] is the element that was swapped in the previous loop cycle. If the condition is false, there's no reason to continue in the loop, so you can exit from it with a break.
On my machine, it took 0.002s, measured with the time shell command (the time includes printing the numbers).
P.S. To generate all those numbers and write them into the array, since I'm too lazy to write a random function, I used this link and then edited the output in the shell with:
sed -e 's/[[:space:]]/,/g' num | sed -e 's/$/,/'

Nested loop acting weird

EDIT: Posting everything, because it gets really weird.
#include <iostream>
#include <string>
using namespace std;
int main()
{
int doors = -1;
int jumper = 1;
bool isOpen[100];
string tf;
for(int i = 0 ; i < 100; i++){
isOpen[i] = false;
}
while(jumper < 100){
while(doors < 100){
if(isOpen[doors + jumper] == true){
isOpen[doors + jumper] = false;
}
else{
isOpen[doors + jumper] = true;
}
doors += jumper;
cout << doors << endl;
}
doors = -1;
jumper+=1;
}
for(int i = 0; i < 100; i++){
if(isOpen[i]){
tf = "open";
}
else{
tf = "closed.";
}
cout << "Door " << i << " is " << tf << endl;
}
return 0;
}
So I'm having a very odd problem with this piece of code.
It's supposed to go through an array of 100 items, 0 - 99, by ones, then twos, then threes, etc. However, after a = 10, it shoots up to 266.
Can anyone tell me why?
Edit:
This problem only happens when the for loop is commented out. When it is left in the code, the same thing happens, but not until 19.
If I comment out the "string tf;" as well, it continues to loop at 99.
This is all based on the doors count.
I'm unsure why either of these should affect a loop that neither is connected to.

According to your description, this is what you should do:
for(int adv = 1, i = 0; adv < 100;)
{
// i is array index (your b) -> use it somehow:
doSomething(arr[i]);
i += adv;
if(i >= 100)
{
i = 0;
adv++;
}
}
The (probable) reason you got weird behavior (including the 266 value) is that your code overruns the buffer. When b gets high enough (say 99), you write to isOpen[b + a], which is index 100 or higher (100 if a is 1, and that's just the first pass; later passes go much further). If the compiler places isOpen before the ints in memory, you'll be overwriting them.
