I wrote a simple program using pthread but my results are random....
#define NTHREADS 2
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
void *add(void* numbers){
pthread_mutex_lock( &mutex1 );
int *n = (int*) numbers;
float sum;
for(int i = 0; i < 5; i++){
sum = sum + n[i] +5;
}
cout << sum/5<<endl;
pthread_mutex_unlock( &mutex1 );
}
void *substract(void* numbers){
pthread_mutex_lock( &mutex1 );
int *n = (int*) numbers;
float sum;
for(int i = 0; i < 5; i++){
sum = sum + n[i] -10;
}
cout << sum/5<<endl;
pthread_mutex_unlock( &mutex1 );
}
main(){
pthread_t thread_id[NTHREADS];
int i, j;
int *numbers = new int[5];
numbers[0] = 34; numbers[1] = 2; numbers[2]= 77; numbers[3] = 40; numbers[4] = 12;
pthread_create( &thread_id[0], NULL, add, (void*) numbers);
pthread_create( &thread_id[1], NULL, substract, (void*) numbers );
pthread_join( thread_id[0], NULL);
pthread_join( thread_id[1], NULL);
exit(EXIT_SUCCESS);
}
The output of the program is random. Sometimes I get
-2.42477e+26
23
Sometimes I get only one strange number, such as
235.69118e+13
(empty space)
I have also tried using only one thread, but the result is still random. For example, when I used a single thread to compute "add", the result was sometimes 38, which is correct, and sometimes a very strange number.
What did I do wrong? Thank you.
The reason for the random numbers, as I told you in your previous question, is that you do not initialize sum before using it. Reading an uninitialized variable yields an indeterminate value, which is why the output changes from run to run. There are other issues with your code as well (see comments), but they are not directly responsible for the random result.
You also do not need a mutex at all in your current code. In fact, by holding the mutex for the whole function body you have made your application effectively single-threaded, discarding all the benefits of multithreading. The only place where you might need a mutex is around the cout call, to ensure the output is not interleaved.
There are various things you need to fix in your code, but the most "burning issue", and the one causing you the problems is using an uninitialized variable -> sum.
In my program, I want to get the number of threads from the user. For example, if the user enters 5, I want to create 5 threads. This is only needed at the beginning of the program; I don't need to change the number of threads while it runs. So I wrote the following code:
int numberOfThread;
cout << "Enter number of threads: " ;
cin >> numberOfThread;
for(int i = 0; i < numberOfThread; i++)
{
pthread_t* mythread = new pthread_t;
pthread_create(&mythread[i],NULL, myThreadFunction, NULL);
}
for(int i = 0; i < numberOfThread; i++)
{
pthread_join(mythread[i], NULL);
}
return 0;
but I get an error on the line pthread_join(mythread[i], NULL);
error: ‘mythread’ was not declared in this scope.
What is wrong with this code?
Also, is there a better way to create a user-defined number of threads?
First, you have a memory leak when creating the threads, because you allocate memory and then lose the reference to it.
I suggest the following: create a std::vector of std::thread objects (so don't use pthread_t at all), and then you can write something like:
std::vector<std::thread> threads;
for (std::size_t i = 0; i < numberOfThread; i++) {
threads.emplace_back(myThreadFunction, 1);
}
for (auto& thread : threads) {
thread.join();
}
if your myThreadFunction looks like:
void myThreadFunction(int n) {
std::cout << n << std::endl; // output: 1, from several different threads
}
First of all, I think it is important to say that I am new to multithreading and know very little about it. I was trying to write some programs in C++ using threads and ran into a problem (question) that I will try to explain to you now:
I wanted to use several threads to fill an array, here is my code:
static const int num_threads = 5;
int A[50], n;
//------------------------------------------------------------
void ThreadFunc(int tid)
{
for (int q = 0; q < 5; q++)
{
A[n] = tid;
n++;
}
}
//------------------------------------------------------------
int main()
{
thread t[num_threads];
n = 0;
for (int i = 0; i < num_threads; i++)
{
t[i] = thread(ThreadFunc, i);
}
for (int i = 0; i < num_threads; i++)
{
t[i].join();
}
for (int i = 0; i < n; i++)
cout << A[i] << endl;
return 0;
}
As a result of this program I get:
0
0
0
0
0
1
1
1
1
1
2
2
2
2
2
and so on.
As I understand it, the second thread starts writing elements to the array only after the first thread has finished writing all of its elements.
The question is: why don't the threads work concurrently? I mean, why don't I get something like this:
0
1
2
0
3
1
4
and so on.
Is there any way to solve this problem?
Thank you in advance.
Since n is accessed from more than one thread, those accesses need to be synchronized so that changes made in one thread don't conflict with changes made in another. There are (at least) two ways to do this.
First, you can make n an atomic variable. Just change its definition, and do the increment where the value is used:
std::atomic<int> n;
...
A[n++] = tid;
Or you can wrap all the accesses inside a critical section:
std::mutex mtx;
int next_n() {
std::unique_lock<std::mutex> lock(mtx);
return n++;
}
And in each thread, instead of directly incrementing n, call that function:
A[next_n()] = tid;
This is much slower than the atomic access, so not appropriate here. In more complex situations it will be the right solution.
The worker function is so short, i.e., finishes executing so quickly, that it's possible that each thread is completing before the next one even starts. Also, you may need to link with a thread library to get real threads, e.g., -lpthread. Even with that, the results you're getting are purely by chance and could appear in any order.
There are two corrections you need to make for your program to be properly synchronized. Change:
int n;
// ...
A[n] = tid; n++;
to
std::atomic_int n;
// ...
A[n++] = tid;
Often it's preferable to avoid synchronization issues altogether and split the workload across threads. Since the work done per iteration is the same here, it's as easy as dividing the work evenly:
void ThreadFunc(int tid, int first, int last)
{
for (int i = first; i < last; i++)
A[i] = tid;
}
Inside main, modify the thread create loop:
for (int first = 0, i = 0; i < num_threads; i++) {
// num_threads may not evenly divide the array size, so the last thread takes the remainder.
int last = (i != num_threads-1) ? std::size(A)/num_threads*(i+1) : std::size(A);
t[i] = thread(ThreadFunc, i, first, last);
first = last;
}
Of course by doing this, even though the array may be written out of order, the values will be stored to the same locations every time.
I'm struggling to write a threaded program in c++ that is accurate and faster than my non-threaded version.
I'm finding the largest entry in a 2d array of random doubles.
Here is the general code:
void getLargest(double** anArray, double largestEntry, int dimLower, int dimUpper, int dim) {
for (int i = dimLower; i < dimUpper; i++) {
for (int j = 0; j < dim; j++) {
if (anArray[i][j] > largestEntry) {
largestEntry = anArray[i][j];
}
}
}
}
int main(){
// Seed the random number generator
srand( time(NULL));
// 2D array dimension
int dim = 30000;
// Specify max values
double max = (double) (dim * dim * dim);
double min = (double) (dim * dim * dim * -1.0);
double t1 = get_wallTime();
// Create a 2D array
double **myArray = new double*[dim];
for (int i=0; i<dim; i++){
myArray[i] = new double[dim];
for (int j=0; j<dim; j++){
// generate random number
myArray[i][j] = genRandNum(min, max);
}
}
double largestEntry = 0.0;
int portion = dim / 5;
std::future<void> thread1 = std::async (std::launch::async, getLargest, myArray, largestEntry, 0, portion, dim);
thread1.get();
std::future<void> thread2 = std::async (std::launch::async, getLargest, myArray, largestEntry, portion, (portion * 2), dim);
thread2.get();
std::future<void> thread3 = std::async (std::launch::async, getLargest, myArray, largestEntry, (portion * 2), (portion * 3), dim);
thread3.get();
std::future<void> thread4 = std::async (std::launch::async, getLargest, myArray, largestEntry, (portion * 3), (portion * 4), dim);
thread4.get();
std::future<void> thread5 = std::async (std::launch::async, getLargest, myArray, largestEntry, (portion *4), dim, dim);
thread5.get();
double t2 = get_wallTime();
double t3 = t2 - t1;
cout << " The largest entry is " << largestEntry << endl;
cout << "runtime : " << t3 << "\n";
}
I have the appropriate #includes.
I understand my code as updating the double largestEntry from each thread if the portion of the 2d array that the thread is processing has a larger entry than the thread prior to it. Then I output the largest entry, and the runtime.
Here is the output:
The largest entry is 0
runtime : 14.7113
This runs way faster than I'm expecting it to, and the largest entry should not be zero. Basically, I'm having trouble finding why that is. I'm not very comfortable with using async, but when I have before, this method worked very well. I know I'm not updating largestEntry correctly, though I'm unsure of where I've made a mistake.
Thanks for any advice you guys could give.
You're passing largestEntry into getLargest by value, so when it is updated only the value within the function is updated, not the value in main.
Two other notes. One, the thread1.get() etc. calls should all come after all the threads have been created, so that the threads run simultaneously.
Two, each thread should return its own value for largestEntry (it can be the value of the future); then compare those values to find the largest. If they all reference the same variable, you're going to get race conditions between the threads, CPU cache thrashing, and possibly wrong answers, depending on how the optimizer handles the updates to largestEntry (it could avoid writing the value out until all the looping is done).
I'm doing an assignment that involves calculating pi with threads. I've done this using mutex and it works fine, but I would like to get this version working as well. Here is my code.
#include <iostream>
#include <stdlib.h>
#include <iomanip>
#include <vector>
#include <pthread.h>
using namespace std;
typedef struct{
int iterations; //How many iterations this thread is going to do
int offset; //The offset multiplier for the calculations (Makes sure each thread calculates a different part of the formula)
}threadParameterList;
vector<double> partialSumList;
void* pi_calc(void* param){
threadParameterList* _param = static_cast<threadParameterList*>(param);
double k = 1.0;
for(int i = _param->iterations * _param->offset + 1; i < _param->iterations * (_param->offset + 1); ++i){
partialSumList[_param->offset] += (double)k*(4.0/((2.0*i)*(2.0*i+1.0)*(2.0*i+2.0)));
k *= -1.0;
}
pthread_exit(0);
}
int main(int argc, char* argv[]){
//Error checking
if(argc != 3){
cout << "error: two parameters required [iterations][threadcount]" << endl;
return -1;
}
if(atoi(argv[1]) <= 0 || atoi(argv[2]) <= 0){
cout << "error: invalid parameter supplied - parameters must be > 0." << endl;
return -1;
}
partialSumList.resize(atoi(argv[2]));
vector<pthread_t> threadList (atoi(argv[2]));
vector<threadParameterList> parameterList (atoi(argv[2]));
int iterations = atoi(argv[1]),
threadCount = atoi(argv[2]);
//Calculate workload for each thread
if(iterations % threadCount == 0){ //Threads divide evenly
for(int i = 0; i < threadCount; ++i){
parameterList[i].iterations = iterations/threadCount;
parameterList[i].offset = i;
pthread_create(&threadList[i], NULL, pi_calc, &parameterList[i]);
}
void* status;
for(int i = 0; i < threadCount; ++i){
pthread_join(threadList[i], &status);
}
}
else{ //Threads do not divide evenly
for(int i = 0; i < threadCount - 1; ++i){
parameterList[i].iterations = iterations/threadCount;
parameterList[i].offset = i;
pthread_create(&threadList[i], NULL, pi_calc, &parameterList[i]);
}
//Add the remainder to the last thread
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
pthread_create(&threadList[threadCount], NULL, pi_calc, &parameterList[threadCount]);
void* status;
for(int i = 0; i < threadCount-1; ++i){
pthread_join(threadList[i], &status);
cout << status << endl;
}
}
//calculate pi
double pi = 3.0;
for(int i = 0; i < partialSumList.size(); ++i){
pi += partialSumList[i];
}
cout << "Value of pi: " << setw(15) << setprecision(15) << pi << endl;
return 0;
}
The code works fine in most cases. Certain combinations of parameters, however, cause a double free or corruption error on return 0. For example, with the parameters 100 and 10, the program creates 10 threads and does 10 iterations of the formula on each thread, which works fine. With the parameters 10 and 4, the program creates 4 threads that do 2 iterations on 3 threads and 4 on the 4th thread, which also works fine. However, with 5 and 3, the program correctly calculates the value and even prints it out, but I get the error immediately afterwards. This also happens for 17 and 3, and for 10 and 3. I tried 15 and 7, but then I get a munmap_chunk(): invalid pointer error when the threads are being joined, although I think that's something for another question.
If I had to guess, it has something to do with pthread_exit deallocating memory and then the same memory being deallocated again on return, since I'm passing the parameter struct as a pointer. I tried a few different things, such as creating a local copy and defining parameterList as a vector of pointers, but that didn't solve anything. I've also tried erasing and clearing the vector before return, but that didn't help either.
I see this issue:
You are writing beyond the vector's bounds:
vector<threadParameterList> parameterList (atoi(argv[2]));
//...
int threadCount = atoi(argv[2]);
//...
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
Accessing parameterList[threadCount] is out of bounds.
I don't see in the code where threadCount is adjusted, so it remains the same value throughout that snippet.
Tip: If the goal is to access the last item in a container, use vector::back(). It works all the time for non-empty vectors.
parameterList.back().iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList.back().offset = threadCount - 1;
One thing I can see is you might be going past the end of the vector here:
for(int i = 0; i < partialSumList.capacity(); ++i)
capacity() returns how many elements the vector can hold without reallocating. This can be more than the size() of the vector. You can change your call to capacity() to size() to make sure you don't go past the end of the vector:
for(int i = 0; i < partialSumList.size(); ++i)
The second thing I spot is that when iterations % threadCount != 0 you have:
parameterList[threadCount].iterations = (iterations % threadCount) + (iterations / threadCount);
parameterList[threadCount].offset = threadCount - 1;
pthread_create(&threadList[threadCount], NULL, pi_calc, &parameterList[threadCount]);
Which is writing past the end of the vector. Then when you join all of the threads you don't join the last thread as you do:
for(int i = 0; i < threadCount-1; ++i){
^^^ uh oh. we missed the last thread
pthread_join(threadList[i], &status);
cout << status << endl;
}
There is a program I am working on that, after I launch it, works for some time and then stalls. Here is a simplified version of the program:
#include <cstdlib>
#include <iostream>
#include <pthread.h>
pthread_t* thread_handles;
pthread_mutex_t mutex;
pthread_cond_t cond_var = PTHREAD_COND_INITIALIZER;
int thread_count;
const int some_count = 77;
const int numb_count = 5;
int countR = 0;
//Initialize threads
void InitTh(char* arg[]){
/* Get number of threads */
thread_count = strtol(arg[1], NULL, 10);
/*Allocate space for threads*/
thread_handles =(pthread_t*) malloc (thread_count*sizeof(pthread_t));
}
//Terminate threads
void TermTh(){
for(long thread = 0; thread < thread_count; thread++)
pthread_join(thread_handles[thread], NULL);
free(thread_handles);
}
void* DO_WORK(void* replica) {
/*Does something*/
pthread_mutex_lock(&mutex);
countR++;
if (countR == numb_count) pthread_cond_broadcast(&cond_var);
pthread_mutex_unlock(&mutex);
}
//Some function
void FUNCTION(){
pthread_mutex_init(&mutex, NULL);
for(int k = 0; k < some_count; k++){
for(int j = 0; j < numb_count; j++){
long thread = (long) j % thread_count;
pthread_create(&thread_handles[thread], NULL, DO_WORK, (void *)j);;
}
/*Wait for threads to finish their jobs*/
pthread_mutex_lock(&mutex);
if (countR < numb_count) while(pthread_cond_wait(&cond_var,&mutex) != 0);
countR = 0;
pthread_mutex_unlock(&mutex);
/*Does more work*/
}
pthread_cond_destroy(&cond_var);
pthread_mutex_destroy(&mutex);
}
int main(int argc, char* argv[]) {
/*Initialize threads*/
InitTh(argv);
/*Do some work*/
FUNCTION();
/*Terminate threads*/
TermTh();
return 0;
}
When some_count, (in my particular case,) is less than 76, the program works fine, but if I specify a larger value the program, as mentioned earlier, works for some time and then stalls. Maybe somebody can point out what I am doing wrong?
In
long thread = (long) j % thread_count;
pthread_create(&thread_handles[thread], NULL, DO_WORK, (void *)j);;
you can overwrite thread handles that are already in use, depending on the thread count you pass on the command line.
I think you should allocate numb_count thread handles rather than taking the count from argv,
then replace
long thread = (long) j % thread_count;
with
long thread = (long) j;
I'm not sure this fixes it, but it's needed anyway.
Moreover, it's not about the number 76 or 77: you have a race condition in how the threads are used.
Let's say one of your threads has reached the point in DO_WORK where it unlocks the mutex but has not yet returned from the function (meaning the thread is still running). You may then try to create a thread with the same handle in the next iteration using:
pthread_create(&thread_handles[thread], NULL, DO_WORK, (void *)j);
To fix this, change:
pthread_mutex_lock(&mutex);
if (countR < numb_count) while(pthread_cond_wait(&cond_var,&mutex) != 0);
countR = 0;
pthread_mutex_unlock(&mutex);
to:
pthread_mutex_lock(&mutex);
if (countR < numb_count) while(pthread_cond_wait(&cond_var,&mutex) != 0);
countR = 0;
for(long thread = 0; thread < numb_count; thread++)
pthread_join(thread_handles[thread], NULL);
pthread_mutex_unlock(&mutex);
You could try to analyze it using helgrind.
Install valgrind, then launch valgrind --tool=helgrind yourproject and see what helgrind spits out
You are neither initializing your mutex correctly (not causing the error here), nor storing the threads you create correctly. Try this:
for(int count = 0; count < thread_count; ++count) {
pthread_create(&thread_handles[count], NULL, DO_WORK, (void *)(count % numb_count));
}