I have an application where I need at least k out of n threads to return (for some consistency guarantees). Once k threads have finished I want to continue along, while the other threads keep running. I can have a variable that gets incremented at the end of each thread, and then have a spin lock, but I'd prefer to not have a spin lock. Any suggestions for something else to do?
I can have a variable that gets incremented at the end of each thread
That would work. I recommend doing that.
and then have a spin lock, but I'd prefer to not have a spin lock
There's no need to use a spin lock. Wait on a condition variable instead.
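A minimal sketch of that combination, assuming pthreads and placeholder names (done_count, k, mark_done, wait_for_k are mine, not from the question): each worker increments the counter under a mutex and signals the condition variable, then keeps running; the coordinating thread waits until the counter reaches k.

#include <pthread.h>

static pthread_mutex_t done_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cond  = PTHREAD_COND_INITIALIZER;
static int done_count = 0;

/* called by each worker thread when it reaches its consistency point */
void mark_done(void) {
    pthread_mutex_lock(&done_mutex);
    ++done_count;
    pthread_cond_signal(&done_cond);   /* wake the coordinator if it is waiting */
    pthread_mutex_unlock(&done_mutex);
    /* the worker is free to keep running after this */
}

/* called by the coordinating thread; returns once at least k workers are done */
void wait_for_k(int k) {
    pthread_mutex_lock(&done_mutex);
    while (done_count < k)             /* loop guards against spurious wakeups */
        pthread_cond_wait(&done_cond, &done_mutex);
    pthread_mutex_unlock(&done_mutex);
}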
Related
I have two threads using a common semaphore to conduct some processing. What I noticed is that Thread 1 appears to hog the semaphore, and Thread 2 is never able to acquire it. My running theory is that, maybe through compiler optimization or thread priority, it somehow just keeps being handed to Thread 1.
Thread 1:
while(condition) {
mySemaphore->acquire();
//do some stuff
mySemaphore->release();
}
Thread 2:
mySemaphore->acquire();
//block of code i never reach...
mySemaphore->release();
As soon as I add a delay before Thread 1's next iteration, Thread 2 is allowed in, which I think confirms my theory.
Basically, for this to work I might need some sort of ordering-aware (fair) lock. Does my reasoning make sense?
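If strict arrival ordering really is what you need, one possibility is a ticket lock: waiters take a ticket and are served in FIFO order, unlike a plain mutex or semaphore, which makes no fairness promise. A rough sketch (the class and member names are my own, not from the question):

#include <mutex>
#include <condition_variable>

class TicketLock {
    std::mutex m;
    std::condition_variable cv;
    unsigned long next_ticket = 0;   // next ticket to hand out
    unsigned long now_serving = 0;   // ticket currently allowed to proceed
public:
    void lock() {
        std::unique_lock<std::mutex> lk(m);
        unsigned long my_ticket = next_ticket++;
        cv.wait(lk, [&]{ return my_ticket == now_serving; });
    }
    void unlock() {
        std::lock_guard<std::mutex> lk(m);
        ++now_serving;
        cv.notify_all();             // wake waiters; only the holder of the next ticket proceeds
    }
};

With this, Thread 2 gets its turn as soon as its ticket comes up, even if Thread 1 immediately tries to re-acquire.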
Prove or Disprove the correctness of the following semaphore.
Here are my thoughts on this.
Well, if someone uses it so that wait runs before signal in a single process, there will be a deadlock. The program will call wait, decrement count, enter the count < 0 branch and wait at gate. Because it is blocked at gate, it can never reach the signal that comes right after the wait. So in that case, this might imply that the semaphore is incorrect.
However, if we assume that two processes are running, one running wait first and the other running signal first, then if the first process runs wait and blocks at wait(gate), the other process can run signal and release the blocked process. Continuing in this scheme, the algorithm would be valid and would not result in a deadlock.
The given implementation follows these principles:
The binary semaphore S protects the count variable from concurrent access.
If non-negative, count reflects the number of free resources of the general semaphore. Otherwise, the absolute value of count reflects the number of threads which are waiting (p5) or ready-to-wait (between p4 and p5) on the binary semaphore gate.
Every signal() call increments count and, if its previous value was negative, signals the binary semaphore gate.
But because of the possibility of the ready-to-wait state, the given implementation is incorrect:
Assume thread #1 calls wait() and is currently in the ready-to-wait state. Assume another thread #2 also calls wait() and is currently in the ready-to-wait state too.
Assume thread #3 calls signal() at this moment. Because count is negative (-2), the thread performs all operations including p10 (signal(gate)). Because nothing is waiting on gate at the moment, gate becomes free.
Assume another thread #4 calls signal() at this moment. Because count is still negative (-1), this thread also performs all operations including p10. But gate is already free, so signal(gate) is a no-op here, and we have missed a signal event: only one of thread #1 and thread #2 will continue after executing p5 (wait(gate)). The other thread will wait forever.
Without the possibility of the ready-to-wait state (that is, if signal(S) and wait(gate) were executed atomically), the implementation would be OK.
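For readers without the original listing, here is a sketch consistent with the description above. It is a reconstruction, so the names, the unlabeled steps, and the BinarySem helper are assumptions; only p4, p5 and p10 correspond to points mentioned in the analysis.

#include <mutex>
#include <condition_variable>

// Binary semaphore with the semantics the analysis relies on:
// signal() on an already-free semaphore is effectively a no-op (the value saturates at 1).
struct BinarySem {
    std::mutex m;
    std::condition_variable cv;
    bool free;
    explicit BinarySem(bool initially_free) : free(initially_free) {}
    void wait()   { std::unique_lock<std::mutex> lk(m); cv.wait(lk, [&]{ return free; }); free = false; }
    void signal() { std::lock_guard<std::mutex> lk(m); free = true; cv.notify_one(); }
};

// Reconstructed counting semaphore under discussion.
struct CountingSem {
    BinarySem S{true};            // protects count
    BinarySem gate{false};        // waiters block here
    int count;
    explicit CountingSem(int initial) : count(initial) {}

    void wait() {
        S.wait();
        --count;
        if (count < 0) {
            S.signal();           // p4 -- the thread is now "ready to wait"
            gate.wait();          // p5
        } else {
            S.signal();
        }
    }
    void signal() {
        S.wait();
        ++count;
        if (count <= 0)           // i.e. the previous value was negative
            gate.signal();        // p10 -- a no-op if gate is already free: the lost wakeup
        S.signal();
    }
};

Running the four-thread interleaving described above against this sketch shows the problem: the second signal(gate) finds gate already free, so only one of the two ready-to-wait threads ever gets past p5.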
I found a bug in my program: the same thread is woken up twice, taking away the opportunity for another thread to run and thus causing unintended behaviour. My program requires that all waiting threads run exactly once per turn. The bug happens because I use semaphores to make the threads wait: with a semaphore initialized to count 0, every thread calls down on the semaphore at the start of its infinite loop, and the main thread calls up in a for loop NThreads (the number of threads) times. Occasionally the same thread consumes the up call twice and the problem arises.
What is the proper way to deal with this problem? Is using condition variables and broadcasting a way to do it? Will that guarantee that every thread is woken once and only once? What other good ways are possible?
On Windows, you could use WaitForMultipleObjects to select a ready thread from the threads that have not yet run in the current NThreads iterations.
Each thread should have a "ready" event to signal when it is ready, and a "wake" event to wait on after it has signaled its "ready" event.
At the start of your main thread loop (the 1st of the NThreads iterations), call WaitForMultipleObjects with an array of your NThreads "ready" events.
Then set the "wake" event of the thread corresponding to the "ready" event returned by WaitForMultipleObjects, and remove it from the array of "ready" handles. That guarantees that a thread which has already run won't be returned by WaitForMultipleObjects on the next iteration.
Repeat until the last iteration, where you will call WaitForMultipleObjects with an array of only 1 thread handle (I think this works as if you had called WaitForSingleObject).
Then repopulate the array of NThreads "ready" events for the next NThreads iterations.
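A rough sketch of one round of that scheme (NTHREADS, readyEvents and wakeEvents are my own names; the events are assumed to be auto-reset, i.e. created with CreateEvent(NULL, FALSE, FALSE, NULL), and error handling is omitted):

#include <windows.h>

const int NTHREADS = 4;
HANDLE readyEvents[NTHREADS];   // each worker signals its own entry when it is ready
HANDLE wakeEvents[NTHREADS];    // the main thread signals this to let that worker run once

void run_one_round() {
    HANDLE pending[NTHREADS];
    int indexOf[NTHREADS];
    int remaining = NTHREADS;
    for (int i = 0; i < NTHREADS; ++i) { pending[i] = readyEvents[i]; indexOf[i] = i; }

    while (remaining > 0) {
        // wait for any thread that has not yet run in this round
        DWORD r = WaitForMultipleObjects(remaining, pending, FALSE, INFINITE);
        int slot = (int)(r - WAIT_OBJECT_0);
        SetEvent(wakeEvents[indexOf[slot]]);      // wake exactly that thread
        // remove its "ready" handle so it cannot be picked again this round
        pending[slot] = pending[remaining - 1];
        indexOf[slot] = indexOf[remaining - 1];
        --remaining;
    }
}

Each worker, in its loop, would SetEvent its readyEvents entry and then WaitForSingleObject on its wakeEvents entry before doing its work for the round.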
Well, use an array of semaphores, one for each thread. If you want each thread to run only once per turn, send one unit to each semaphore. If you want the threads to all run exactly N times, send N units to each semaphore.
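A sketch of that idea with POSIX semaphores (NTHREADS and the names are placeholders; each semaphore is assumed to have been initialized to 0 with sem_init):

#include <semaphore.h>
#include <pthread.h>

#define NTHREADS 4
sem_t turn[NTHREADS];                  /* one semaphore per thread, all starting at 0 */

/* worker i runs exactly as many times as the main thread posts to turn[i] */
void *worker(void *arg) {
    int i = *(int *)arg;
    for (;;) {
        sem_wait(&turn[i]);
        /* do this thread's work for the turn */
    }
    return NULL;
}

/* main thread: give every thread exactly one unit per round */
void start_round(void) {
    for (int i = 0; i < NTHREADS; ++i)
        sem_post(&turn[i]);
}

Because each thread waits on its own semaphore, no thread can consume another thread's unit, which is exactly the problem in the question.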
I have a class instance which is being used in multiple threads. I am updating multiple member variables from one thread and reading the same member variables from another thread. What is the correct way to maintain thread safety?
eg:
pthread_mutex_lock(&mutex1)
obj1.memberV1 = 1;
//unlock here?
Should I unlock the mutex over here? (If another thread accesses obj1's member variables 1 and 2 now, the data it reads might not be correct, because memberV2 has not yet been updated. However, if I do not release the lock, the other thread might block, because there is a time-consuming operation below.)
//perform some time consuming operation which must be done before the assignment to memberV2 and after the assignment to memberV1
obj1.memberV2 = update field 2 from some calculation
pthread_mutex_unlock(&mutex1) //should I only unlock here?
Thanks
Your locking is correct. You should not release the lock early just to allow another thread to proceed (because that would allow the other thread to see the object in an inconsistent state.)
Perhaps it would be better to do something like:
//perform time consuming calculation
pthread_mutex_lock(&mutex1)
obj1.memberV1 = 1;
obj1.memberV2 = result;
pthread_mutex_unlock(&mutex1)
This of course assumes that the values used in the calculation won't be modified on any other thread.
It's hard to tell what you are doing that is causing problems. The mutex pattern is pretty simple: you lock the mutex, access the shared data, and unlock the mutex. This protects the data because the mutex only lets one thread hold the lock at a time. Any thread that fails to get the lock has to wait till the mutex is unlocked. Unlocking wakes the waiters up; they then fight to acquire the lock, and the losers go back to sleep. The time it takes to wake up might be several milliseconds or more from the time the lock is released. Make sure you always unlock the mutex eventually.
Make sure you don't keep locks held for a long period of time. Most of the time, a "long period of time" is something like a microsecond; I prefer to keep it down to "a few lines of code". That's why people have suggested that you do the long-running calculation outside the lock. The reason for not holding locks a long time is that you increase the number of times other threads will hit the lock and have to spin or sleep, which decreases performance. You also increase the probability that your thread gets preempted while owning the lock, which means the lock stays held while that thread sleeps. That's even worse for performance.
Threads that fail to get a lock don't have to sleep. Spinning means a thread that encounters a locked mutex doesn't sleep, but loops, repeatedly testing the lock for a predefined period before giving up and sleeping. This is a good idea if you have multiple cores, or cores capable of running multiple simultaneous threads, because multiple active threads mean two threads can be executing the code at the same time. If the lock is around a small amount of code, then the thread that got the lock is going to be done very soon, and the other thread only needs to wait a couple of nanoseconds before it gets the lock. Remember that putting a thread to sleep means a context switch plus the code that attaches the thread to the mutex's list of waiters, and all of that has a cost. Plus, once your thread sleeps, you have to wait for the scheduler to wake it up, which could take several milliseconds. Look up spinlocks.
If you only have one core, then a thread encountering a locked mutex means the owner (which is currently not running) holds the lock, and no matter how long you spin it isn't going to unlock. So you would use a lock that puts a waiter to sleep immediately, in the hope that the thread owning the lock gets scheduled and finishes.
You should assume that a thread can be preempted at any machine-code instruction, and that each line of C code is probably many machine-code instructions. The classic example is i++: it is one statement in C, but a read, an increment, and a store in machine code.
If you really care about performance, try to use atomic operations first and look to mutexes as a last resort. Most concurrency problems are easily solved with atomic operations (google "gcc atomic operations" to start learning), and very few problems really need mutexes. Mutexes are much, much slower.
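For instance, a shared counter needs no mutex at all; a small sketch with std::atomic (the GCC builtins such as __sync_fetch_and_add behave the same way):

#include <atomic>

std::atomic<int> counter{0};           // shared between threads

void on_event() {
    counter.fetch_add(1, std::memory_order_relaxed);   // atomic read-modify-write, no lock
}

int read_count() {
    return counter.load(std::memory_order_relaxed);    // atomic read
}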
Protect your shared data wherever it is written and wherever it is read; otherwise, prepare for failure. You don't have to protect shared data during periods of time when only a single thread is active.
It's often useful to be able to run your app with 1 thread as well as N threads. That way you can debug race conditions more easily.
Minimize the shared data that you protect with locks. Try to organize the data into structures such that a single thread can gain exclusive access to an entire structure (perhaps by setting a single locked flag, or a version number, or both) and not have to worry about anything after that. Then most of the code isn't cluttered with locks and race conditions.
Functions that ultimately write to shared variables should use temporary variables until the last moment and then copy the results over. Not only will the compiler generate better code, but accesses to shared variables, especially writes, cause cache-line traffic between L2 and main RAM and all sorts of other performance issues. Again, if you don't care about performance, disregard this; however, I recommend you look up the document "What Every Programmer Should Know About Memory" if you want to know more.
If you are reading a single variable from the shared data you probably don't need to lock as long as the variable is an integer type and not a member of a bitfield (bitfield members are read/written with multiple instructions). Read up on atomic operations. When you need to deal with multiple values, then you need a lock to make sure you didn't read version A of one value, get preempted, and then read version B of the next value. Same holds true for writing.
You will find that copies of data, even copies of entire structures, come in handy. You can build a new copy of the data and then swap it in by changing a pointer with one atomic operation. You can make a copy of the data and then do calculations on it without worrying about whether it changes.
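A sketch of that pointer-swap idea (Data and the function names are placeholders): build the new copy privately, then publish it with a single atomic exchange. Safely reclaiming the old copy is the part this sketch deliberately leaves out.

#include <atomic>

struct Data { /* the shared structure */ };

std::atomic<Data*> current{nullptr};   // readers only ever see a completely-built Data

void publish(Data *fresh) {
    // fresh was filled in privately by this thread; swap it in atomically
    Data *old = current.exchange(fresh, std::memory_order_acq_rel);
    // 'old' can be deleted once no reader still uses it (reference counting,
    // RCU, hazard pointers, ... -- omitted here)
    (void)old;
}

Data *snapshot() {
    return current.load(std::memory_order_acquire);    // grab a consistent version to work on
}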
So maybe what you want to do is:
lock the mutex
make a copy of the input data for the long-running calculation
unlock the mutex
L1: do the calculation
lock the mutex
if the input data has changed and this matters:
read the input data, unlock the mutex and go to L1
update the data
unlock the mutex
Maybe, in the example above, you would still store the result even if the input changed, but then go back and recalculate. It depends on whether other threads can use a slightly out-of-date answer. Or maybe other threads, when they see that a thread is already doing the calculation, simply change the input data and leave it to the busy thread to notice that and redo the calculation (there will be a race condition you need to handle if you do that, but an easy one). That way the other threads can do other work rather than just sleep.
cheers.
Probably the best thing to do is:
temp = ...; //perform some time consuming operation which must be done before the assignment to memberV2
pthread_mutex_lock(&mutex1)
obj1.memberV1 = 1;
obj1.memberV2 = temp; //result from previous calculation
pthread_mutex_unlock(&mutex1)
What I would do is separate the calculation from the update:
temp = some calculation
pthread_mutex_lock(&mutex1);
obj.memberV1 = 1;
obj.memberV2 = temp;
pthread_mutex_unlock(&mutex1);
I've known for eons that the way you use a condition variable is
lock
while not task_done
wait on condition variable
unlock
Because sometimes a wait on a condition variable will wake up spontaneously. But I've never understood why that's the case. In the past I've read that it's expensive to make a condition variable that doesn't have that behavior, but nothing more than that.
So... why do you need to worry about falsely being woken up when waiting on a condition variable?
It isn't that the condition variable will erroneously wake up; the condition variable will only wake up if it has been signalled from another thread. However, it is possible that by the time the waiting thread has been rescheduled for execution, some other thread has already managed to grab the resource you were waiting for, so it is necessary to double-check. For example, suppose a group of threads x, y, z are waiting on some resource R that w was previously holding, and x, y, z, w communicate through a condition variable. When w is done with R, it signals x, y, z, so all three are taken off the wait queue and placed on the run queue to be scheduled for execution. Suppose x is scheduled first: it acquires R and may then be put to sleep. Then y might be scheduled, and when y runs, the resource R on which it was waiting is still not available, so y has to go back to sleep. Then z wakes up, also finds that R is still in use, and needs to go back to sleep again, and so on.
If you have exactly two threads, and the condition variable is shared between just the two of them, there are sometimes situations where it is OK not to perform that check. However, if you want your application to be dynamic and capable of scaling up to an arbitrary number of threads, then it's good to be in the habit (not to mention much simpler and less worrisome) of doing that extra check, as it is required in most situations.
Threads can wake up without being signaled; this is called a spurious wakeup. However, precisely why they occur is a question that seems to be mired in superstition and uncertainty. Reasons I have seen include it being a side effect of the way threading implementations work, or it being intentionally allowed in order to force programmers to properly use loops instead of plain conditionals around wait.
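In code, the loop from the question looks like this with pthreads (task_done stands in for whatever predicate you are actually waiting on):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
int task_done = 0;

void wait_for_task(void) {
    pthread_mutex_lock(&lock);
    while (!task_done)                    /* re-check: guards against spurious wakeups and */
        pthread_cond_wait(&cond, &lock);  /* against another thread grabbing the work first */
    pthread_mutex_unlock(&lock);
}

void mark_task_done(void) {
    pthread_mutex_lock(&lock);
    task_done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}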