I'm trying to get my head around Windows API threads and thread control. I briefly worked with threads in Java so I know the basic concepts but something that has worked in Java seems to only work halfway in C++.
What I am trying to do is as follows: 2 threads in one process, sharing a common resource (in this case, a pair of global variables int a, b;).
The first thread should acquire a mutex, use rand() to generate pairs of numbers from 0 to 100 until it gets a pair such that b == 2 * a, then release the mutex.
The second thread should then acquire the mutex and check whether the b == 2 * a condition holds for the given values (printing something like "incorrect" if it does not), then release the mutex so the first thread can get it back. This process of generating and checking pairs of numbers should be repeated quite a few times, say 500 or 1000 times.
My code is as follows:
Main:
#define INIT_SEED time(NULL)
#define NUMBER_OF_CHECKS 250
int a = 0;
int b = 1;
HANDLE mutexHandle = CreateMutex(NULL, FALSE, NULL);
int main()
{
HANDLE thread1Handle = CreateThread(NULL, NULL, Thread1Behaviour, NULL, NULL, NULL);
Sleep(50);
HANDLE thread2Handle = CreateThread(NULL, NULL, Thread2Behaviour, NULL, NULL, NULL);
WaitForSingleObject(thread1Handle, INFINITE);
WaitForSingleObject(thread2Handle, INFINITE);
return 0;
}
Thread 1 behavior:
DWORD WINAPI Thread1Behaviour( LPVOID _ )
{
srand(INIT_SEED);
for (int i = 0; i < NUMBER_OF_CHECKS; i++)
{
WaitForSingleObject(mutexHandle, INFINITE);
do
{
b = rand() % 100;
a = rand() % 100;
}
while (b != 2 * a);
cout << i << ".\t" << b << " " << a << endl;
ReleaseMutex(mutexHandle);
Sleep(50);
}
return 0;
}
Thread 2 behavior:
DWORD WINAPI Thread2Behaviour( LPVOID _ )
{
for (int i = 0; i < NUMBER_OF_CHECKS; i++)
{
WaitForSingleObject(mutexHandle, INFINITE);
if (b == 2 * a)
cout << i << ".\t" << b << "\t=\t2 * " << a << endl;
else
cout << i << ".\t" << b << "\t=\t2 * " << a << "\tINCORRECT!!!" << endl;
ReleaseMutex(mutexHandle);
Sleep(50);
}
return 0;
}
The implementation is simple enough (I skipped the handle validity checks to keep the code short; if a bad handle could be the cause I can add them, but I imagine that with a bad handle everything would just crash and burn rather than run with incorrect output).
I remember that when working with threads in Java I used to sleep for some time to make sure the same thread does not reacquire the mutex. When I run the above code it mostly works as intended, but when the number of checks is big enough, the first thread sometimes gets the mutex two times in a row, leading to output like this:
1. 92 46
2. 66 33
1. 66 = 2 * 33
Which means that at the end, the second thread will end up checking the same pair several times:
249. 80 40
248. 80 = 2 * 40
249. 80 = 2 * 40
I have tried changing the sleep timer value with values between 0 and 250 but this remains the case no matter how large the sleep period is. When I put at least 250 it seems to work about half the time.
Also, if I remove the cout in the first thread, the problem becomes 2-3 times worse, with more botched synchronizations.
And one more thing I noticed is that for a certain configuration of sleep timer and cout/no cout in thread 1, the number of times the mutex is immediately reacquired is the same, so this is completely reproducible(at least for me).
Using logic, I got 2 conflicting conclusions:
Since it MOSTLY works as intended, it might be a synchronization problem, with the way threads "rush" for the mutex as soon as it is available
Since I am able to reproduce this issue in a pretty "deterministic" way, it might mean that it is an issue in the logic of the code
But the above can't both be true at once, so what exactly is the problem here?
EDIT: To clarify the question: I know that mutex is not technically used for order of execution, but in this case, why does it not work as intended and what would be the fix?
Thanks in advance!
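A mutex only guarantees mutual exclusion, not turn taking, so one way to get the strict generate-then-check alternation described above is to pair the mutex with an explicit "whose turn is it" flag and a condition variable. The following is a minimal sketch of that idea in portable C++11 rather than the Win32 API; run_alternating, pair_ready and the use of std::mt19937 instead of rand() are assumptions made for illustration, not part of the original code. With the Win32 API the same shape could presumably be built from two auto-reset events or from CONDITION_VARIABLE with SleepConditionVariableCS.

```cpp
#include <condition_variable>
#include <mutex>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): runs `rounds` generate/check
// cycles and returns every (a, b) pair the checker thread observed, in order.
std::vector<std::pair<int, int>> run_alternating(int rounds, unsigned seed)
{
    std::mutex m;
    std::condition_variable cv;
    bool pair_ready = false;  // false: generator's turn, true: checker's turn
    int a = 0, b = 1;         // shared pair, only touched while holding m
    std::vector<std::pair<int, int>> seen;

    std::thread generator([&] {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> dist(0, 99);
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !pair_ready; });  // wait for our turn
            do {
                a = dist(rng);
                b = dist(rng);
            } while (b != 2 * a);
            pair_ready = true;  // hand the pair over to the checker
            cv.notify_one();
        }
    });

    std::thread checker([&] {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return pair_ready; });   // wait for a fresh pair
            seen.emplace_back(a, b);
            pair_ready = false;  // hand the turn back to the generator
            cv.notify_one();
        }
    });

    generator.join();
    checker.join();
    return seen;
}
```

Because each thread blocks until the flag says it is its turn, no Sleep() calls are needed and neither thread can take two turns in a row, whatever the scheduler does.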
Maybe I’m confusing myself with threads, but I have two understandings of threading that conflict with each other.
I’ve created a program which uses POSIX pthreads. Without using these threads the program takes 0.061723 seconds to run, and with threads takes 0.081061 seconds to run.
At first I thought this is what should happen, as threads allow something to happen while other things should be able to happen. i.e. processing a lot of data on one thread while still having responsive UI on another, this would mean the processing of the data would take longer as the CPU divides its time between processing UI and processing the data.
However, surely the point of multithreading is to make the program take advantage of multiple CPUs/cores?
As you can tell I’m something of an intermediate so excuse me if it’s a simple question.
But what should I expect the program to do?
I’m running this on a mid-2012 Macbook Pro 13” base model. CPU is 22 nm "Ivy Bridge" 2.5 GHz Intel "Core i5" processor (3210M), with two independent processor "cores" on a single silicon chip
UPDATED WITH CODE
This is in main function. I didn’t add variable declaration for convenience but I’m sure you can work out what each does by its name:
// Loop through all items we need to process
//
while (totalNumberOfItemsToProcess > 0 && numberOfItemsToProcessOnEachIteration > 0 && startingIndex <= totalNumberOfItemsToProcess)
{
// As long as we have items to process...
//
// Align the index with number of items to process per iteration
//
const uint endIndex = startingIndex + (numberOfItemsToProcessOnEachIteration - 1);
// Create range
//
Range range = RangeMake(startingIndex,
endIndex);
rangesProcessed[i] = range;
// Create thread
//
// Create a thread identifier, 'newThread'
//
pthread_t newThread;
// Create thread with range
//
int threadStatus = pthread_create(&newThread, NULL, processCoordinatesInRangePointer, &rangesProcessed[i]);
if (threadStatus != 0)
{
std::cout << "Failed to create thread" << std::endl;
exit(1);
}
// Add thread to threads
//
threadIDs.push_back(newThread);
// Setup next iteration
//
// Starting index
//
// Realign the index with number of items to process per iteration
//
startingIndex = (endIndex + 1);
// Number of items to process on each iteration
//
if (startingIndex > (totalNumberOfItemsToProcess - numberOfItemsToProcessOnEachIteration))
{
// If the total number of items to process is less than the number of items to process on each iteration
//
numberOfItemsToProcessOnEachIteration = totalNumberOfItemsToProcess - startingIndex;
}
// Increment index
//
i++;
}
std::cout << "Number of threads: " << threadIDs.size() << std::endl;
// Loop through all threads, rejoining them back up
//
for ( size_t i = 0;
i < threadIDs.size();
i++ )
{
// Wait for each thread to finish before returning
//
pthread_t currentThreadID = threadIDs[i];
int joinStatus = pthread_join(currentThreadID, NULL);
if (joinStatus != 0)
{
std::cout << "Thread join failed" << std::endl;
exit(1);
}
}
The processing functions:
void processCoordinatesAtIndex(uint index)
{
const int previousIndex = (index - 1);
// Get coordinates from terrain
//
Coordinate3D previousCoordinate = terrain[previousIndex];
Coordinate3D currentCoordinate = terrain[index];
// Calculate...
//
// Euclidean distance
//
double euclideanDistance = Coordinate3DEuclideanDistanceBetweenPoints(previousCoordinate, currentCoordinate);
euclideanDistances[index] = euclideanDistance;
// Angle of slope
//
double slopeAngle = Coordinate3DAngleOfSlopeBetweenPoints(previousCoordinate, currentCoordinate, false);
slopeAngles[index] = slopeAngle;
}
void processCoordinatesInRange(Range range)
{
for ( uint i = range.min;
i <= range.max;
i++ )
{
processCoordinatesAtIndex(i);
}
}
void *processCoordinatesInRangePointer(void *threadID)
{
// Cast the pointer to the right type
//
struct Range *range = (struct Range *)threadID;
processCoordinatesInRange(*range);
return NULL;
}
UPDATE:
Here are my global variables, which are global only for simplicity, so don’t have a go!
std::vector<Coordinate3D> terrain;
std::vector<double> euclideanDistances;
std::vector<double> slopeAngles;
std::vector<Range> rangesProcessed;
std::vector<pthread_t> threadIDs;
Correct me if I’m wrong, but, I think the issue was with how the time elapsed was measured. Instead of using clock_t I’ve moved to gettimeofday() and that reports a shorter time, from non threaded time of 22.629000 ms to a threaded time of 8.599000 ms.
Does this seem right to people?
Of course, my original question was based on whether or not a multithreaded program SHOULD be faster or not, so I won’t mark this answer as the correct one for that reason.
Assume we have an array or vector of length 256(can be more or less) and the number of pthreads to generate to be 4(can be more or less).
I need to figure out how to assign each pthread a section of the vector to process.
So the following code dispatches the multiple threads.
for(int i = 0; i < thread_count; i++)
{
int *arg = (int *) malloc(sizeof(*arg));
*arg = i;
thread_err = pthread_create(&(threads[i]), NULL, &multiThread_Handler, arg);
if (thread_err != 0)
printf("\nCan't create thread :[%s]", strerror(thread_err));
}
As you can tell from the above code, each thread passes an argument value to the starting function. Where in the case of the four threads, the argument values range from 0 to 3, 5 threads = 0 to 4, and so forth.
Now the starting function does the following:
void* multiThread_Handler(void *arg)
{
int thread_index = *((int *)arg);
unsigned int start_index = (thread_index*(list_size/thread_count));
unsigned int end_index = ((thread_index+1)*(list_size/thread_count));
std::cout << "Start Index: " << start_index << std::endl;
std::cout << "End Index: " << end_index << std::endl;
std::cout << "i: " << thread_index << std::endl;
for(int i = start_index; i < end_index; i++)
{
std::cout <<"Processing array element at: " << i << std::endl;
}
}
So in the above code, the thread whose argument is 0 should process the section 0 - 63 (for an array size of 256 and a thread count of 4), the thread whose argument is 1 should process the section 64 - 127, and so forth, with the last thread processing 192 - 255.
Each of these four sections should be processed in parallel.
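The index arithmetic for these sections can be sketched as a small helper that also copes with sizes that do not divide evenly (for example 10 elements over 3 threads). IndexRange and slice_for are hypothetical names, and the half-open [begin, end) convention is an assumption chosen to match the i < end_index loop above.

```cpp
#include <cassert>
#include <cstddef>

struct IndexRange {
    std::size_t begin;  // first index to process
    std::size_t end;    // one past the last index to process
};

// Hypothetical helper: half-open slice for worker `thread_index` out of
// `thread_count` workers over a list of `list_size` elements.
IndexRange slice_for(std::size_t thread_index, std::size_t thread_count,
                     std::size_t list_size)
{
    assert(thread_count > 0 && thread_index < thread_count);
    const std::size_t base = list_size / thread_count;   // minimum slice length
    const std::size_t extra = list_size % thread_count;  // leftover elements

    // The first `extra` workers each take one additional element.
    const std::size_t begin =
        thread_index * base + (thread_index < extra ? thread_index : extra);
    const std::size_t length = base + (thread_index < extra ? 1 : 0);
    return IndexRange{begin, begin + length};
}
```

For 256 elements and 4 threads this gives [0, 64), [64, 128), [128, 192) and [192, 256), so the last thread touches indices 192 to 255.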
Also, the pthread_join() functions are present in the original main code to make sure each thread finishes before the main thread terminates.
The problem is that the value of i in the above for-loop takes on suspiciously large values. I'm not sure why this would occur since I am fairly new to pthreads.
It seems that sometimes it works perfectly fine, and other times the value of i becomes so large that the program either aborts or hits a segmentation fault.
The problem is indeed a data race caused by lack of synchronization. And the shared variable being used (and modified) by multiple threads is std::cout.
When using streams such as std::cout concurrently, you need to synchronize all operations with a stream by a mutex. Otherwise, depending on the platform and your luck, you might get output from multiple threads messed together (which might sometimes look like printed values being larger than you expect), or you might get the program crashed, or have other sorts of undefined behavior.
// Incorrect Code
unsigned int start_index = (thread_index*(list_size/thread_count));
unsigned int end_index = ((thread_index+1)*(list_size/thread_count));
The code above is the critical region of your program, and it is wrong as written: no synchronization mechanism is used, so there is a data race. This leads to a wrong calculation of start_index and end_index, so they may hold random garbage values and the for-loop variable "i" runs wild. You should use the following code to synchronize the critical region of your program.
// Correct Code
int s = pthread_mutex_lock(&mutexhandle);   // mutexhandle: a shared, initialised pthread_mutex_t
start_index = (thread_index*(list_size/thread_count));
end_index = ((thread_index+1)*(list_size/thread_count));
s = pthread_mutex_unlock(&mutexhandle);
I am new here and I hope I am doing everything right.
I was wondering how to find out which thread finishes after waiting for one to finish using the WaitForMultipleObjects command. Currently I have something along the lines of:
int checknum;
int loop = 0;
const int NumThreads = 3;
HANDLE threads[NumThreads];
WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
threads[loop] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
It is only supposed to have a max of three threads running at the same time. So I have a loop to begin all three threads (hence the loop value). The problem is when I go through it again, I would like to change the value of loop to the value of whichever thread just finished its task so that it can be used again. Is there any way to find out which thread in that array had finished?
I would paste the rest of my code, but I'm pretty sure no one needs all 147 lines of it. I figured this snippet would be enough.
When the third parameter is false, WaitForMultipleObjects will return as soon as ANY of the objects is signaled (it doesn't need to wait for all of them).
And the return value indicates which object caused it to return. It will be WAIT_OBJECT_0 for the first object, WAIT_OBJECT_0 + 1 for the second, etc.
I am away from my compiler and I don't know of an online IDE that works with Windows, but here is the rough idea of what you need to do.
const int NumThreads = 3;
HANDLE threads[NumThreads];
//create threads here
DWORD result = WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
if(result >= WAIT_OBJECT_0 && result - WAIT_OBJECT_0 < NumThreads){
int index = result - WAIT_OBJECT_0;
if(!CloseHandle(threads[index])){ //need to close to give handle back to system even though the thread has finished
DWORD error = GetLastError();
//TODO handle error
}
threads[index] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
}
else {
DWORD error = GetLastError();
//TODO handle error
break; //assumes this snippet runs inside the loop that hands out work
}
At work we do this a bit differently. We have made a library which wraps all needed Windows handle types and performs static type checking (through conversion operators) to make sure you can't wait for an IOCompletionPort with WaitForMultipleObjects (which is not allowed). The wait function is variadic rather than taking an array of handles and its size, and is specialized using SFINAE to use WaitForSingleObject when there is only one. It also takes lambdas as arguments and executes the corresponding one depending on the signaled event.
This is what it looks like:
Win::Event ev;
Win::Thread th([]{/*...*/ return 0;});
//...
Win::WaitFor(ev,[]{std::cout << "event" << std::endl;},
th,[]{std::cout << "thread" << std::endl;},
std::chrono::milliseconds(100),[]{std::cout << "timeout" << std::endl;});
I would highly recommend this type of wrapping because at the end of the day the compiler optimizes it to the same code but you can't make nearly as many mistakes.
I'm trying to solve Dining philosophers problem using C++.
Code is compiled with g++ -lpthread.
The entire solution is on philosophers github. The repository contains two cpp files: main.cpp and philosopher.cpp. "Main.cpp" creates a mutex variable, a semaphore, 5 condition variables and 5 forks, and starts the philosophers. The semaphore is used only to synchronize the start of the philosophers. The other parameters are passed to the philosophers to solve the problem. "Philosopher.cpp" contains the solution for the given problem, but after a few steps a deadlock occurs.
Deadlock occurs when philosopher 0 is eating and philosopher 1 (next to him) wants to take forks. Philosopher 1 has then taken the mutex and won't give it back until philosopher 0 puts his forks down. Philosopher 0 can't put his forks down because the mutex is taken, so we have a deadlock. The problem is in the Philosopher::take_fork method: the call to pthread_cond_wait(a,b) isn't releasing mutex b. I can't figure out why.
// Taking fork. If either left or right fork is taken, wait.
void Philosopher::take_fork(){
pthread_mutex_lock(&mon);
std::cout << "Philosopher " << id << " is waiting on forks" << std::endl;
while(!fork[id] || !fork[(id + 1)%N])
pthread_cond_wait(cond + id, &mon);
fork[id] = fork[(id + 1)%N] = false;
std::cout << "Philosopher " << id << " is eating" << std::endl;
pthread_mutex_unlock(&mon);
}
Please refer to this code for the rest.
Your call to pthread_cond_wait() is fine, so the problem must be elsewhere. You have three bugs that I can see:
Firstly, in main() you are only initialising the first condition variable in the array. You need to initialise all N condition variables:
for(int i = 0; i < N; i++) {
fork[i] = true;
pthread_cond_init(&cond[i], NULL);
}
pthread_mutex_init(&mon, NULL);
Secondly, in put_fork() you have an incorrect calculation for one of the condition variables to signal:
pthread_cond_signal(cond + (id-1)%N); /* incorrect */
When id is equal to zero, (id - 1) % N is equal to -1, so this will try to signal cond - 1, which does not point at a condition variable (it's possible that this pointer actually corrupts your mutex, since it might well be placed directly before cond on the stack). The calculation you actually want is:
pthread_cond_signal(cond + (id + N - 1) % N);
The third bug isn't the cause of your deadlock, but you shouldn't call srand(time(NULL)) every time you call rand() - just call that once, at the start of main().
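The wrap-around index from the second bug can be isolated in two tiny helpers, which makes the behaviour at id == 0 easy to check. left_neighbour and right_neighbour are hypothetical names used only for this sketch.

```cpp
// Hypothetical helpers: indices of the neighbours of philosopher `id`
// when `n` philosophers sit around the table (0 <= id < n).
int left_neighbour(int id, int n)
{
    return (id + n - 1) % n;  // adding n first keeps the dividend non-negative
}

int right_neighbour(int id, int n)
{
    return (id + 1) % n;
}
```

With N equal to 5, left_neighbour(0, 5) is 4, whereas the original expression (0 - 1) % 5 evaluates to -1 because integer division in C++ truncates toward zero.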
I have a problem with the order of execution of threads created consecutively.
Here is the code.
#include <iostream>
#include <Windows.h>
#include <boost/thread.hpp>
using namespace std;
boost::mutex mutexA;
boost::mutex mutexB;
boost::mutex mutexC;
boost::mutex mutexD;
void SomeWork(char letter, int index)
{
boost::mutex::scoped_lock lock;
switch(letter)
{
case 'A' : lock = boost::mutex::scoped_lock(mutexA); break;
case 'B' : lock = boost::mutex::scoped_lock(mutexB); break;
case 'C' : lock = boost::mutex::scoped_lock(mutexC); break;
case 'D' : lock = boost::mutex::scoped_lock(mutexD); break;
}
cout << letter <<index << " started" << endl;
Sleep(800);
cout << letter<<index << " finished" << endl;
}
int main(int argc , char * argv[])
{
for(int i = 0; i < 16; i++)
{
char x = rand() % 4 + 65;
boost::thread tha = boost::thread(SomeWork,x,i);
Sleep(10);
}
Sleep(6000);
system("PAUSE");
return 0;
}
Each time, a letter (from A to D) and a generation id (i) are passed to the method SomeWork, which runs as a thread. I do not care about the execution order between letters, but for a particular letter, say A, Ax has to start before Ay if x < y.
A random part of a random output of the code is:
B0 started
D1 started
C2 started
A3 started
B0 finished
B12 started
D1 finished
D15 started
C2 finished
C6 started
A3 finished
A9 started
B12 finished
B11 started --> B11 started after B12 finished.
D15 finished
D13 started
C6 finished
C7 started
A9 finished
How can I avoid such conditions?
Thanks.
I solved the problem using condition variables, but I changed the problem a bit. The solution is to keep track of the index of the for loop, so each thread knows when it is not yet its turn to work. As far as this code is concerned, there are two other things that I would like to ask about.
First, on my computer, when I set the for-loop bound to 350 I got an access violation; 310 loops were OK. So I realized that there is a maximum number of threads that can be created. How can I determine this number?
Second, in Visual Studio 2008, the release version of the code showed really strange behaviour: without using condition variables (lines 1 to 3 commented out), the threads were still ordered. How could that happen?
Here is the code:
#include <iostream>
#include <Windows.h>
#include <boost/thread.hpp>
using namespace std;
boost::mutex mutexA;
boost::mutex mutexB;
boost::mutex mutexC;
boost::mutex mutexD;
class cl
{
public:
boost::condition_variable con;
boost::mutex mutex_cl;
char Letter;
int num;
cl(char letter) : Letter(letter) , num(0)
{
}
void doWork( int index, int tracknum)
{
boost::unique_lock<boost::mutex> lock(mutex_cl);
while(num != tracknum) // line 1
con.wait(lock); // line 2
Sleep(10);
num = index;
cout << Letter<<index << endl;
con.notify_all(); // line 3
}
};
int main(int argc , char * argv[])
{
cl A('A');
cl B('B');
cl C('C');
cl D('D');
for(int i = 0; i < 100; i++)
{
boost::thread(&cl::doWork,&A,i+1,i);
boost::thread(&cl::doWork,&B,i+1,i);
boost::thread(&cl::doWork,&C,i+1,i);
boost::thread(&cl::doWork,&D,i+1,i);
}
cout << "************************************************************************" << endl;
Sleep(6000);
system("PAUSE");
return 0;
}
If you have two different threads waiting for the lock, it's entirely non-deterministic which one will acquire it once the lock is released by the previous holder. I believe this is what you are experiencing. Assume B10 is holding the lock, and in the mean time threads are spawned for B11 and B12. B10 releases the lock - it's down to a coin toss as to whether B11 or B12 acquires it next, irrespective of which thread was created first, or even which thread started waiting first.
Perhaps you should implement work queues for each letter, such that you spawn exactly 4 threads, each of which consume work units? This is the only way to easily guarantee ordering in this way. A simple mutex is not going to guarantee ordering if multiple threads are waiting for the lock.
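A minimal sketch of the work-queue suggestion, written with the C++11 standard library rather than Boost: one worker thread per letter pulls jobs from its own FIFO queue, so jobs for the same letter run in the order they were posted. SerialWorker and post are hypothetical names invented for this illustration.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical class: a single worker thread that runs queued jobs in FIFO order.
class SerialWorker {
public:
    SerialWorker() : done_(false), thread_([this] { run(); }) {}

    ~SerialWorker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();  // the worker drains any remaining jobs before exiting
    }

    void post(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down and nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so post() is never blocked by work
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_;
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

With four SerialWorker objects (one each for A to D), the main loop would post each (letter, index) job to the worker for that letter instead of spawning a new thread, and B11 could then never start after B12.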
Even though B11 is started before B12 it is not guaranteed to be given a CPU time slice to execute SomeWork() prior to B12. This decision is up to the OS and its scheduler.
Mutexes are typically used to synchronize access to data between threads, and the concern raised here is about the sequence of thread execution (i.e. data access).
If the threads for group 'A' are executing the same code on the same data then just use one thread. This will eliminate context switching between threads in the group and yield the same result. If the data is changing, consider a producer/consumer pattern. Paul Bridger gives an easy-to-understand producer/consumer example here.
Your threads have dependencies that must be satisfied before they start execution. In your example, B12 depends on B0 and B11. Somehow you have to track that dependency knowledge. Threads with unfinished dependencies must be made to wait.
I would look into condition variables. Each time a thread finishes SomeWork() it would use the condition variable's notify_all() method. Then all of the waiting threads must check if they still have dependencies. If so, go back and wait. Otherwise, go ahead and call SomeWork().
You need some way for each thread to determine if it has unfinished dependencies. This will probably be some globally available entity. You should only modify it when you have the mutex (in SomeWork()). Concurrent reads are safe only while no thread is writing, so take the same mutex when reading it as well.