I know this question is a duplicate one, but I couldn't find any other topic similar to my code.
The problem statement is as followed:
There is a CSV file with 16,000 lines. A serial version of the program is extracting those rows with a price (SalePrice is a column head in the CSV) higher than a specific value (threshold) given to the program with command-line arguments and calculating their Mean and Standard Derivation which will be used for further computations.
This larger CSV file is broken into 4 CSV files for the parallel version. Each thread is assigned to one CSV file and should do the same calculations (Calculating Mean and STD of rows with price higher than a specific value named threshold in my code).
Since the data is large enough, I don't think this is because of the multithreading overhead.
I would be thankful if someone could please help me find out what part is slowing down my parallel version?
#include <iostream>
#include <fstream>
#include <vector>
#include <math.h>
#include <iomanip>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>
#include <unistd.h>
using namespace std;
#define COMMA ','
#define EMPTY_STR ""
#define FILENAME "dataset.csv"
#define CLASSIFIER "GrLivArea"
#define SALE_PRICE "SalePrice"
const int MAX_THREAD_NUMBERS = 20;
int NUMBER_OF_THREADS;
int threshold;
int expensive_cnt[MAX_THREAD_NUMBERS];
vector<string> lines;
string head;
double _std;
long sum[MAX_THREAD_NUMBERS];
long ps[MAX_THREAD_NUMBERS];
long sumsq[MAX_THREAD_NUMBERS];
double mean;
int total_items;
int total_expensive_cnt;
struct Item
{
int x;
bool category;
};
vector<Item> items[MAX_THREAD_NUMBERS];
int getColNum(const string& head, const string& key)
{
int cnt = 0;
string cur = EMPTY_STR;
for (int i = 0 ; i < head.size() ; i++)
{
if (head[i] == COMMA)
{
if (cur == key)
return cnt;
cnt++;
cur = EMPTY_STR;
}
else
cur += head[i];
}
if (cur == key)
return cnt;
return -1;
}
vector<int> separateByComma(string s)
{
vector<int> res;
string cur = EMPTY_STR;
for (int i = 0 ; i < s.size() ; i++)
if (s[i] == COMMA)
{
res.push_back(stoi(cur));
cur = EMPTY_STR;
}
else
cur += s[i];
res.push_back(stoi(cur));
return res;
}
void* calcSums(void* tid)
{
long thread_id = (long)tid;
string filename = "dataset_" + to_string(thread_id) + ".csv";
ifstream fin(filename);
string head;
fin >> head;
int classifierColNum = getColNum(head, CLASSIFIER);
if (classifierColNum == -1)
{
printf("NO GrLivArea FOUND IN HEAD OF CSV\n");
exit(-1);
}
int priceColNum = getColNum(head, SALE_PRICE);
if (priceColNum == -1)
{
printf("NO SalePrice FOUND IN HEAD OF CSV\n");
exit(-1);
}
string line;
while (fin >> line)
{
vector<int> cur = separateByComma(line);
bool category = (cur[priceColNum] >= threshold);
Item item{cur[classifierColNum], category};
if (category)
{
sum[thread_id] += item.x;
sumsq[thread_id] += (item.x * item.x);
expensive_cnt[thread_id]++;
}
items[thread_id].push_back(item);
}
fin.close();
pthread_exit(NULL);
}
void calcMeanSTD()
{
string line;
for (int i = 0 ; ; i++)
{
struct stat buffer;
string name = "dataset_" + to_string(i) + ".csv";
if (!(stat (name.c_str(), &buffer) == 0))
break;
NUMBER_OF_THREADS++;
}
pthread_t threads[NUMBER_OF_THREADS];
int return_code;
for (long tid = 0 ; tid < NUMBER_OF_THREADS ; tid++)
{
return_code = pthread_create(&threads[tid], NULL, calcSums, (void*)tid);
if (return_code)
{
printf("ERROR; return code from pthread_create() is %d\n", return_code);
exit(-1);
}
}
for (long tid = 0 ; tid < NUMBER_OF_THREADS ; tid++)
{
return_code = pthread_join(threads[tid], NULL);
if (return_code)
{
printf("ERROR; return code from pthread_join() is %d\n", return_code);
exit(-1);
}
}
double total_sum = 0;
double total_sum_sq = 0;
total_expensive_cnt = 0;
total_items = 0;
for (int i = 0 ; i < NUMBER_OF_THREADS ; i++)
{
total_sum += sum[i];
total_sum_sq += sumsq[i];
total_expensive_cnt += expensive_cnt[i];
total_items += items[i].size();
}
mean = total_sum / total_expensive_cnt;
_std = sqrt((total_sum_sq - ((total_sum * total_sum) / (total_expensive_cnt))) / (total_expensive_cnt));
}
int main(int argc, char *argv[])
{
threshold = atoi(argv[1]);
calcMeanSTD();
cout << mean << " " << _std << endl;
return 0;
}
Please let me know if any part is not understandable.
Here are some run-time values:
Read CSV (Serial): 0.043268s Calculations (Serial): 0.000151s
The exact time calculation isn't much easy in the multithreaded version here since the calculations and file reading are done in the same while loop which is not separable here. There also many thread switches. Anyway, their sum is about: 0.14587s
As it can be seen, the amount of time needed to read from files is almost 300 times as doing the math calculations.
Thanks to the answers in the comment, I found out what is happening:
I tried increasing the number of rows in my CSV files to see if the parallelization is working.
The run-time values for a CSV file with 1000000 rows are:
Parallel: real 0m0.558s user 0m2.173s sys 0m0.020s
Serial: real 0m1.834s user 0m1.818s sys 0m0.016s
Since I am using 4 threads, I expect 1.834 divided by 0.558 to be near to 4 which actually is 3.28 and is fair enough.
This run-time values for smaller CSV files aren't showing these results which seems to be because of the simple math computations in my code.
The bottleneck of this code is the section where I am reading from CSV files. This section seems to be serial since it is reading from a disk.
There is also a problem of False Sharing in this code
which causes cache contention due to updates of different memory locations by different threads when these locations share the same cache line mapping. There are many solutions to this problem, for example, I can introduce padding into these arrays to make sure that elements accessed by multiple threads do not share cache lines. Or, more simply, work with thread-local variables instead of arrays, and, in the end, update the array elements only once.
Hello I am trying to write a C++ multithreaded program using POSIX thread library to find the number of prime numbers between 1 and 10,000,000 (10 million) and find out how many microseconds it takes...
Creating my threads and running them works completely fine, however I feel as if there is an error found in my Prime function when determining if a number is prime or not...
I keep receiving 78496 as my output, however I desire 664579. Below is my code. Any hints or pointers would be greatly appreciated.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>
#include <iostream>
#include <sys/time.h> //measure the execution time of the computations
using namespace std;
//The number of thread to be generated
#define NUMBER_OF_THREADS 4
void * Prime(void* index);
long numbers[4] = {250000, 500000, 750000, 1000000};
long start_numbers[4] = {1, 250001, 500001, 750001};
int thread_numbers[4] = {0, 1, 2, 3};
int main(){
pthread_t tid[NUMBER_OF_THREADS];
int tn;
long sum = 0;
timeval start_time, end_time;
double start_time_microseconds, end_time_microseconds;
gettimeofday(&start_time, NULL);
start_time_microseconds = start_time.tv_sec * 1000000 + start_time.tv_usec;
for(tn = 0; tn < NUMBER_OF_THREADS; tn++){
if (pthread_create(&tid[tn], NULL, Prime, (void *) &thread_numbers[tn]) == -1 ) {
perror("thread fail");
exit(-1);
}
}
long value[4];
for(int i = 0; i < NUMBER_OF_THREADS; i++){
if(pthread_join(tid[i],(void **) &value[i]) == 0){
sum = sum + value[i]; //add four sums together
}else{
perror("Thread join failed");
exit(-1);
}
}
//get the end time in microseconds
gettimeofday(&end_time, NULL);
end_time_microseconds = end_time.tv_sec * 1000000 + end_time.tv_usec;
//calculate the time passed
double time_passed = end_time_microseconds - start_time_microseconds;
cout << "Sum is: " << sum << endl;
cout << "Running time is: " << time_passed << " microseconds" << endl;
exit(0);
}
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
for (int j=2; j*j <= i; j++)
{
if (i % j == 0)
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}```
This is because, you used 10^6 instead of 10^7.
Also, added some corner cases for numbers 1, 2 and 3:
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
// Corner cases
if(i<=1)continue;
if (i <= 3){
sum_t++;
continue;
}
for (int j=2; j*j <= i; j++)
{
if ((i % j == 0) || (i %( j+2))==0 )
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}
I tested your code with correct number and got the correct number of primes as output:
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 4.69242e+07 microseconds
Thanks to #chux - Reinstate Monica for pointing this out
Along with taking 10^7 as the numbers divided in thread instead of setting the limit as 10^6 ,a number of other small scale errors are there and a number of optimizations could be made -
First of all start numbers could be from 2 itself
long start_numbers[4] = {2, 2500001, 5000001, 7500001};
sum_t++ in your code may not work on edge cases. It is better to follow the following algorithm for calculating Prime function
bool flag = false;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
flag = false;
for (long j=2; j*j <= i; j++){
if (i % j == 0 )
{
flag = true;
break;
}
}
if(!flag)
sum_t++;
}
After these 2 operations i am getting the result as
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 6.62618e+06 microseconds
edit:
( Note : in this case j is taken as long datatype but it could work as well with int in this 'example' since the tested compiler takes int as 32 bits long)
I am implementing a c++ function to get Nth prime number using some predefined indices for time optimization purpose.
my code is :
// file prime.cpp
#include <iostream>
#include <time.h>
using namespace std;
/*
#define primeAt10000 104743
#define primeAt20000 224743
#define primeAt30000 350381
#define primeAt40000 479951
#define primeAt50000 611977
*/
int prime(int n){
int pos = 1,i = 1,temp;
if(n==0)
return 2;
/*
else if(n>50000){
i = primeAt50000;
pos = 50001;
}else if(n>40000){
i = primeAt40000;
pos = 40001;
}else if(n>30000){
i = primeAt30000;
pos = 30001;
}else if(n>20000){
i = primeAt20000;
pos = 20001;
}else if(n>10000){
i = primeAt10000;
pos = 10001;
}*/
while( i+=2 ){
temp = i/2+1;
for(int j = 3 ; j <= temp ; j+=2)
if(i%j == 0)
goto con;
if(pos++ >= n)
return i;
con :;
}
}
int main(int argc, char const *argv[]){
int index;
cin >> index;
clock_t start = clock();
cout << prime(index)<<endl;
cout << (clock()-start)/CLOCKS_PER_SEC<<"sec"<< endl;
return 0;
}
compiled with:
g++ prime.cpp -o prime.exe
I ran this code three times for inputs 9999, 19999 and 29999
1st run : 1sec 6sec 14sec
2nd run : 1sec 7sec 15sec
3rd run : 1sec 7sec 16sec
After enabling commented code again I ran three times with same inputes
1st run : 1sec 5sec 8sec
2nd run : 1sec 5sec 8sec
3rd run : 1sec 5sec 8sec
My question is :
Why this difference in taken time for each execution after second compilation while the loops are running everytime for ~1,25,000 times?
and
Why for input 19999 (~104743 looping times) it is much closer then the first 3 runs after first compilation (~224743 looping times)?
Difference in time for each 9999 interval is different because when we going towards larger numbers to check either it is prime or not it takes more time then smaller ones.
In other words directly We can say that the run-time of for-loop in prime() is increased because of larger value of variable temp.
when we checking for i = 101, the value of temp become 51 and for-loop will run approx 25 times.
while when we check for i = 10001, the value of temp become 5001 and for-loop will run for approx 2500 times.
this difference in run-time of for loop will increase your overall time complexity.
After some discussion with #JonathanLeffler I have further optimized this function to achieve fastest output for larger input values like for index 9999, 19689 and so on...
Now the complexity of my prime function is (N^2)/12 unlike before [it was (N^2)/8].
My new code is :
#include <iostream>
#include <time.h>
using namespace std;
#define primeAt10000 104743-7
#define primeAt20000 224743-7
#define primeAt30000 350381-7
#define primeAt40000 479951-7
#define primeAt50000 611977-7
bool checkPrime(int x){
int temp = x/2+1;
for(int j = 3 ; j <= temp ; j+=2)
if(x%j == 0)
return false;
return true;
}
int prime(int n){
int pos = 2,i = 0;
if(n==0)
return 2;
else if(n==1)
return 3;
else if(n>50000){
i = primeAt50000;
pos = 50000;
}else if(n>40000){
i = primeAt40000;
pos = 40000;
}else if(n>30000){
i = primeAt30000;
pos = 30000;
}else if(n>20000){
i = primeAt20000;
pos = 20000;
}else if(n>10000){
i = primeAt10000;
pos = 10000;
}
while( i+=6 ){
if(checkPrime(i-1))
if(pos++>=n)
return i-1;
if(checkPrime(i+1))
if(pos++>=n)
return i+1;
}
return 0;
}
int main()
{
int index;
cin >> index;
clock_t start = clock();
cout << prime(index)<<endl;
cout << (clock()-start)/(float)CLOCKS_PER_SEC<<"sec";
return 0;
}
Compiled with(as the advice of #NathanOliver && #JonathanLeffler) :
g++ -O3 -Wall -Werror -Wextra prime.cpp -o prime.exe
Now prime.exe taking 1.34, 4.83 and 7.20sec respectivly to inputs 9999, 19999 and 29999.
I am trying to implement shell sorting algorithm myself. I wrote my own code and didn't watch to any code samples only watch the video of algorithm description
My sort works but very slow (bubble sort 100 items - 0.007 s; shell sort 100 items - 4.83 s), how is it possible to improve it?
void print(vector<float>vec)
{
for (float i : vec)
cout << i << " ";
cout << "\n\n";
}
void Shell_sorting(vector<float>&values)
{
int swapping = 0;
int step = values.size();
clock_t start;
double duration;
start = clock();
while (step/2 >= 1)
{
step /= 2;
for (int i = 0; i < values.size()-step; i++)
{
if ((i + step < values.size()))
{
if ((values[i + step] < values[i]))
{
swap(values[i], values[i + step]);
print(values);
++swapping;
int c = i;
while (c - step > 0)
{
if (values[c] < values[c - step])
{
swap(values[c], values[c - step]);
print(values);
++swapping;
c -= step;
}
else
break;
}
}
}
else
break;
}
}
duration = (clock() - start) / (double)CLOCKS_PER_SEC;
print(values);
cout << swapping << " " << duration;
print(values);
}
A better implementation could be:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec = {
726,621,81,719,167,958,607,130,263,108,
134,235,508,407,153,162,849,923,996,975,
250,78,460,667,654,62,865,973,477,912,
580,996,156,615,542,655,240,847,613,497,
274,241,398,84,436,803,138,677,470,606,
226,593,620,396,460,448,198,958,566,599,
762,248,461,191,933,805,288,185,21,340,
458,592,703,303,509,55,190,318,310,189,
780,923,933,546,816,627,47,377,253,709,
992,421,587,768,908,261,946,75,682,948,
};
std::vector<int> gaps = {5, 2, 1};
int j;
for (int gap : gaps) {
for (int i = gap; i < vec.size(); i++)
{
j = i-gap;
while (j >= 0) {
if (vec[j+gap] < vec[j])
{
int temp = vec[j+gap];
vec[j+gap] = vec[j];
vec[j] = temp;
j = j-gap;
}
else break;
}
}
}
for (int item : vec) std::cout << item << " " << std::endl;
return 0;
}
I prefer to use a vector to store gap data so that you do not need to compute the division (which is an expansive operation). Besides, this choice, gives your code more flexibility.
the extern loop cycles on gap values. Once choosen the gap, you iterate over your vector, starting from vec[gap] and explore if there are elements smaller then it according to the logic of the Shell Sort.
So, you start setting j=i-gap and test the if condition. If it is true, swap items and then repeat the while loop decrementing j. Note: vec[j+gap]is the element that in the last loop cycle was swapped. If the condition is true, there's no reason to continue in the loop, so you can exit from it with a break.
On my machine, it took 0.002s calculated using the time shell command (the time includes the process of printing numbers).
p.s. to generate all that numbers and write them in the array, since i'm too lazy to write a random function, i used this link and then i edited the output in the shell with:
sed -e 's/[[:space:]]/,/g' num | sed -e 's/$/,/'
EDIT: Posting everything, because it gets really weird.
using namespace std;
int main()
{
int doors = -1;
int jumper = 1;
bool isOpen[100];
string tf;
for(int i = 0 ; i < 100; i++){
isOpen[i] = false;
}
while(jumper < 100){
while(doors < 100){
if(isOpen[doors + jumper] == true){
isOpen[doors + jumper] = false;
}
else{
isOpen[doors + jumper] = true;
}
doors += jumper;
cout << doors << endl;
}
doors = -1;
jumper+=1;
}
for(int i = 0; i < 100; i++){
if(isOpen[i]){
tf = "open";
}
else{
tf = "closed.";
}
cout << "Door " << i << " is " << tf << endl;
}
return 0;
}
So I'm having a very odd problem with this piece of code.
It's supposed to go through an array of 100 items. 0 - 99 by ones then tows then threes, etc. However, after a = 10, it shoots up to 266.
Can anyone tell me why?
Edit:
This problem only happens when the for loop is commented out. When it is left in the code, it does the same thing, but it doesn't happen until 19.
If I comment out the "string tf;" as well, it continues to loop at 99.
This is all based on the doors count.
I'm unsure why either of these should be a factor to the loop that neither are connected to.
According to your description this is what you should do:
for(int adv = 1, i = 0; adv < 100;)
{
// i is array index (your b) -> use it somehow:
doSomething(arr[i]);
i += adv;
if(i >= 100)
{
i = 0;
adv++;
}
}
The (probable) reason you got weird behavior (including the 266 value) is that your code overruns the buffer. When b will be high enough (say 99), you'd write to isOpen[b + a] which will be 100 or higher (100 if a is 1, and that's just the first iteration, later iterations will go much further). If the compiler allocates isOpen before the ints you'll be overwriting them.