Strict Alternation for 3 processes - c++

Recently I learned about strict alternation while studying operating-systems concepts. To reduce the chance of race conditions between two processes, we do this:
Process 0:
while (TRUE) {
while (turn != 0); // wait
critical_section();
turn = 1;
noncritical_section();
}
Process 1:
while (TRUE) {
while (turn != 1); // wait
critical_section();
turn = 0;
noncritical_section();
}
But I'm wondering how I can handle 3 processes this way, to reduce race conditions even more.
My approach is:
Process 0:
while (TRUE) {
while (turn != 0 && turn != 2); // wait
critical_section();
turn = 1;
noncritical_section();
}
Process 1:
while (TRUE) {
while (turn != 1 && turn != 0); // wait
critical_section();
turn = 2;
noncritical_section();
}
Process 2:
while (TRUE) {
while (turn != 1 && turn != 2); // wait
critical_section();
turn = 0;
noncritical_section();
}
Is my approach okay? What do you suggest? And is there anything better out there?
Thanks

With what you have, the processes wouldn't necessarily strictly alternate anyway. For instance, after the code that sets turn = 0 has run, either the code that sets turn = 1 or the code that sets turn = 2 could follow, because processes 0 and 1 both accept turn == 0. My suggestion would be to use OS-level events, one for each code path, with each process triggering the event of the one that follows.
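A minimal sketch of that suggestion, assuming threads instead of processes and a hypothetical Event class (a mutex plus condition variable standing in for an OS event object): each worker waits on its own event, runs its critical section, then signals the next worker's event, so the order 0, 1, 2, 0, ... is enforced without busy-waiting. runInOrder is a made-up name; it returns the order in which the critical sections ran.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical auto-reset event: signal() sets a flag, wait() blocks until it
// is set and then clears it again.
class Event {
public:
    void signal() {
        { std::lock_guard<std::mutex> lock(m_); set_ = true; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
        set_ = false;  // consume the signal
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// n workers take turns strictly in the order 0, 1, ..., n-1, for `rounds` rounds.
std::vector<int> runInOrder(int n, int rounds) {
    std::vector<Event> events(n);  // one event per worker (code path)
    std::vector<int> order;        // touched only by the worker whose turn it is
    std::vector<std::thread> workers;
    for (int id = 0; id < n; ++id) {
        workers.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                events[id].wait();              // wait for the previous worker
                order.push_back(id);            // critical section
                events[(id + 1) % n].signal();  // trigger the worker that follows
            }
        });
    }
    events[0].signal();  // worker 0 goes first
    for (auto& t : workers) t.join();
    return order;
}
```

For separate processes you would use named OS events instead of this in-process class; the chain of "wait on mine, signal the next one" stays the same.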

Despite all the justified criticism of the presented concept, no one has yet corrected the error in the posted approach.
The general pattern is simple in each process:
while it's not my turn, wait
do critical section
set turn to next process
…
So, as SoronelHaetir noted, the posted while (turn …) conditions for the 3-process case are wrong. For processes 0 and 1 they have to stay the same as in the 2-process case; in general, for process i it's while (turn != i) ; // wait.
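The general pattern can be sketched in C++ for n threads, assuming std::atomic<int> for the shared turn (a plain int would be a data race in C++) and a std::this_thread::yield() inside the busy-wait, which the pseudo-code doesn't have. strictAlternation is a made-up name; it returns the order in which the critical sections ran.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Strict alternation for n threads: thread i waits while (turn != i),
// runs its critical section, then sets turn to the next thread.
std::vector<int> strictAlternation(int n, int rounds) {
    std::atomic<int> turn(0);
    std::vector<int> order;  // written only inside the critical section
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            for (int r = 0; r < rounds; ++r) {
                while (turn.load() != i) {
                    std::this_thread::yield();  // not my turn: wait
                }
                order.push_back(i);       // critical section
                turn.store((i + 1) % n);  // set turn to the next process
            }
        });
    }
    for (auto& t : threads) t.join();
    return order;
}
```

Note that this still has the classic drawback of strict alternation: a thread that wants to enter twice in a row has to wait for every other thread to take its turn first.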

Related

Threading and Mutex

I'm working on a program that simulates a gas station. Each car at the station is its own thread. Each car must loop through a single bitmask to check whether a pump is open and, if it is, update the bitmask, fill up, and notify the other cars that the pump is now open. My current code works, but there are some issues with load balancing. Ideally, all the pumps are used the same amount and all cars get equal fill-ups.
EDIT: My program basically takes a number of cars, pumps, and a length of time to run the test for. During that time, cars will check for an open pump by constantly calling this function.
int Station::fillUp()
{
// loop through the pumps using the bitmask to check if they are available
for (int i = 0; i < pumpsInStation; i++)
{
//Check bitmask to see if pump is open
stationMutex->lock();
if ((freeMask & (1 << i)) == 0 )
{
//Turning the bit on
freeMask |= (1 << i);
stationMutex->unlock();
// Sleeps thread for 30ms and increments counts
pumps[i].fillTankUp();
// Turning the bit back off
stationMutex->lock();
freeMask &= ~(1 << i);
stationCondition->notify_one();
stationMutex->unlock();
// Sleep long enough for all cars to have a chance to fill up first.
this_thread::sleep_for(std::chrono::milliseconds((((carsInStation-1) * 30) / pumpsInStation)-30));
return 1;
}
stationMutex->unlock();
}
// If no pumps are available, wait until one becomes available.
std::unique_lock<std::mutex> lock(*stationMutex);
stationCondition->wait(lock);
return -1;
}
I feel the issue has something to do with locking the bitmask when I read it. Do I need to have some sort of mutex or lock around the if check?
It looks like every car checks the availability of pump #0 first, and if that pump is busy it then checks pump #1, and so on. Given that, it seems expected to me that pump #0 would service the most cars, followed by pump #1 serving the second-most cars, all the way down to pump #(pumpsInStation-1) which only ever gets used in the (relatively rare) situation where all of the pumps are in use simultaneously at the time a new car pulls in.
If you'd like to get better load-balancing, you should probably have each car choose a different random ordering to iterate over the pumps, rather than having them all check the pumps' availability in the same order.
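One way to do that, assuming each car thread owns its own random generator (a std::mt19937 should not be shared between threads without a lock): build the list of pump indices and shuffle it before scanning. randomPumpOrder is a hypothetical helper, not part of the original code.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns the pump indices 0..pumpCount-1 in a random order, so different cars
// try the pumps in different orders instead of all starting at pump #0.
std::vector<int> randomPumpOrder(int pumpCount, std::mt19937& rng) {
    std::vector<int> order(pumpCount);
    std::iota(order.begin(), order.end(), 0);       // 0, 1, ..., pumpCount-1
    std::shuffle(order.begin(), order.end(), rng);  // random permutation
    return order;
}
```

Inside fillUp, the loop would then become for (int i : randomPumpOrder(pumpsInStation, rng)), with the same bitmask checks as before.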
Normally I wouldn't suggest refactoring as it's kind of rude and doesn't go straight to the answer, but here I think it would help you a bit to break your logic into three parts, like so, to better show where the contention lies:
int Station::acquirePump()
{
// loop through the pumps using the bitmask to check if they are available
std::lock_guard<std::mutex> locker(*stationMutex);
for (int i = 0; i < pumpsInStation; i++)
{
// Check bitmask to see if pump is open
if ((freeMask & (1 << i)) == 0 )
{
//Turning the bit on
freeMask |= (1 << i);
return i;
}
}
return -1;
}
void Station::releasePump(int n)
{
std::lock_guard<std::mutex> locker(*stationMutex);
freeMask &= ~(1 << n);
stationCondition->notify_one();
}
bool Station::fillUp()
{
// If a pump is available:
int i = acquirePump();
if (i != -1)
{
// Sleeps thread for 30ms and increments counts
pumps[i].fillTankUp();
releasePump(i);
// Sleep long enough for all cars to have a chance to fill up first.
this_thread::sleep_for(std::chrono::milliseconds((((carsInStation-1) * 30) / pumpsInStation)-30));
return true;
}
// If no pumps are available, wait until one becomes available.
std::unique_lock<std::mutex> lock(*stationMutex);
stationCondition->wait(lock);
return false;
}
With the code in this form, it is easier to see a load-balancing issue that is important to fix if you don't want to "exhaust" one pump, or if a pump might have a lock of its own inside. The issue lies in acquirePump, where you check the availability of free pumps in the same order for every car. A simple tweak to balance it better is this:
int Station::acquirePump()
{
// loop through the pumps using the bitmask to check if they are available
std::lock_guard<std::mutex> locker(*stationMutex);
for (int n = 0, i = startIndex; n < pumpsInStation; ++n, i = (i+1) % pumpsInStation)
{
// Check bitmask to see if pump is open
if ((freeMask & (1 << i)) == 0 )
{
// Change the starting index used to search for a free pump for
// the next car.
startIndex = (startIndex+1) % pumpsInStation;
// Turning the bit on
freeMask |= (1 << i);
return i;
}
}
return -1;
}
Another thing I have to ask is whether it's really necessary (e.g., for memory efficiency) to use bit flags to indicate whether a pump is in use. If you can use an array of bool instead, you'll be able to avoid locking completely and simply use atomic operations to acquire and release pumps, and that'll avoid creating a traffic jam of locked threads.
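A sketch of that atomic alternative, assuming one std::atomic<bool> per pump (true meaning "in use") and a hypothetical AtomicPumps class: compare_exchange_strong flips a pump from free to busy in a single atomic step, so two cars can never claim the same pump. It does not cover what a car does when every pump is busy; it would still need to back off or block somewhere.

```cpp
#include <atomic>
#include <memory>

class AtomicPumps {
public:
    explicit AtomicPumps(int count)
        : count_(count), inUse_(new std::atomic<bool>[count]) {
        for (int i = 0; i < count_; ++i) inUse_[i].store(false);  // all pumps free
    }

    // Tries to claim a free pump, scanning from `start`.
    // Returns the pump index, or -1 if every pump is busy.
    int acquire(int start) {
        for (int n = 0; n < count_; ++n) {
            int i = (start + n) % count_;
            bool expected = false;
            // Atomically false -> true; fails if another car got there first.
            if (inUse_[i].compare_exchange_strong(expected, true)) return i;
        }
        return -1;
    }

    void release(int i) { inUse_[i].store(false); }

private:
    int count_;
    std::unique_ptr<std::atomic<bool>[]> inUse_;
};
```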
Imagine that the mutex has a queue associated with it, containing the waiting threads. Now, one of your threads manages to get the mutex that protects the bitmask of occupied stations and checks whether one specific place is free. If it isn't, it releases the mutex again and loops, only to go back to the end of the queue of threads waiting for the mutex. Firstly, this is unfair: the thread that started waiting first is not guaranteed to get the next free slot; it gets it only if that slot happens to be the one its loop counter currently points at. Secondly, it causes an extreme number of context switches, which is bad for performance. Note that your approach should still produce correct results, in that no two cars collide while accessing a single filling station, but the behaviour is suboptimal.
What you should do instead is this:
1. lock the mutex to get exclusive access to the possible filling stations
2. locate the next free filling station
3. if none of the stations are free, wait for the condition variable and restart at point 2
4. mark the slot as occupied and release the mutex
5. fill up the car (this is where the sleep in the simulation actually makes sense; the other one doesn't)
6. lock the mutex
7. mark the slot as free and signal the condition variable to wake up others
8. release the mutex again
Just in case that part isn't clear to you, waiting on a condition variable implicitly releases the mutex while waiting and reacquires it afterwards!
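The steps above map fairly directly onto std::mutex and std::condition_variable. This is a sketch with a hypothetical Station class that tracks occupied pumps in a std::vector<bool> rather than the original bitmask; the actual filling up happens between acquirePump() and releasePump(), without holding the mutex.

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

class Station {
public:
    explicit Station(int pumps) : occupied_(pumps, false) {}

    // Lock, find a free pump (waiting on the condition variable if none is
    // free), mark it occupied, then release the mutex.
    int acquirePump() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            for (int i = 0; i < static_cast<int>(occupied_.size()); ++i) {
                if (!occupied_[i]) {
                    occupied_[i] = true;
                    return i;  // the unique_lock releases the mutex here
                }
            }
            freed_.wait(lock);  // releases the mutex while waiting, re-locks after
        }
    }

    // Lock, mark the pump free, release the mutex and wake one waiting car.
    void releasePump(int i) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            occupied_[i] = false;
        }
        freed_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable freed_;
    std::vector<bool> occupied_;
};
```

A car thread would then simply do int p = station.acquirePump(); fill up; station.releasePump(p);, and the loop around wait() also takes care of spurious wakeups.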

C++ async only uses 2 cores

I am using async to run a method simultaneously, but when I check my CPU, it shows that only 2 of 8 are in use. My CPU utilization is about 13%-16% the whole time.
The function async should create a new thread with every call and thus should be able to use more processors, or did I misunderstand something?
Here's my code:
for (map<string, Cell>::iterator a = cells.begin(); a != cells.end(); ++a)
{
for (map<string, Cell>::iterator b = cells.begin(); b != cells.end(); ++b)
{
if (a->first == b->first)
continue;
if (_paths.count("path_" + b->first + "_" + a->first) > 0)
{
continue;
}
tmp = "path_" + a->first + "_" + b->first;
auto future = async(launch::async, &Pathfinder::findPath, this, &a->second, &b->second, collisionZone);
_paths[tmp] = future.get();
}
}
Did I get the concept wrong?
EDIT:
Thanks guys, I figured it out now. I didn't know that calling .get() on the future waits for it to finish, which in hindsight seems only logical...
However, I edited my code now:
for (map<string, Cell>::iterator a = cells.begin(); a != cells.end(); ++a)
{
for (map<string, Cell>::iterator b = cells.begin(); b != cells.end(); ++b)
{
if (a->first == b->first)
continue;
if (_paths.count("path_" + b->first + "_" + a->first) > 0)
{
continue;
}
tmp = "path_" + a->first + "_" + b->first;
mapBuffer[tmp] = async(launch::async, &Pathfinder::findPath, this, &a->second, &b->second, collisionZone);
}
}
for (map<string, future<list<Point>>>::iterator i = mapBuffer.begin(); i != mapBuffer.end(); ++i)
{
_paths[i->first] = i->second.get();
}
It works. Now it spawns threads properly and uses all my CPU power. You saved me a lot of trouble! Thanks again.
To answer the underlying problem:
You probably should refactor the code by splitting the loop. In the first loop, you create all the futures and put them in a map indexed by tmp. In the second loop, you loop over this map and get the value from each future, storing the results in _paths.
After the first loop, you'll have a lot of futures running in parallel, so your cores should be busy enough. If cells is big enough (>numCores), it may be wise to just split the inner loop.
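A self-contained sketch of that two-loop structure, with a made-up slowSquare function standing in for Pathfinder::findPath and string keys standing in for the path names: the first loop only launches tasks, and get() is called only in the second loop, so the tasks can run concurrently.

```cpp
#include <future>
#include <map>
#include <string>

// Hypothetical stand-in for an expensive computation such as findPath.
long slowSquare(long x) {
    long r = 0;
    for (long i = 0; i < x; ++i) r += x;  // deliberately slow x * x
    return r;
}

std::map<std::string, long> computeAll(const std::map<std::string, long>& inputs) {
    // First loop: start every task and keep its future.
    std::map<std::string, std::future<long>> pending;
    for (const auto& kv : inputs) {
        pending[kv.first] = std::async(std::launch::async, slowSquare, kv.second);
    }
    // Second loop: collect the results; get() blocks until that task is done.
    std::map<std::string, long> results;
    for (auto& kv : pending) {
        results[kv.first] = kv.second.get();
    }
    return results;
}
```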
std::async runs the specified function asynchronously and returns immediately. That's it.
How it does that is up to the implementation: some standard libraries create a thread per async operation, others use a thread pool.
I recommend to read this: https://stackoverflow.com/a/15775870/2786682
By the way, your code does not really make use of std::async, since you make a synchronous call to future.get() right after 'spawning' the async operation.
YES, you did get it wrong. Parallel code requires some thought before writing any code.
Your code creates a future (which may, and probably will, spawn a new thread), and immediately after that you call its .get() method, which blocks until the task has finished and returns its result.
So, with this strategy, your code will never utilize more than 2 CPU cores at any point in time. It can't.
Actually, most of the time your code utilizes only a single core!
The trick is "to parallelize" your code.

waitpid() and fork() to limit number of child processes

My program is supposed to limit the number of child processes to 3.
With the code below, waitpid stalls my parent process so I can't create more child processes after the first one. If I don't use waitpid then I don't know when a child process quits to decrease the number of alive processes.
int numProcs = 0;
while(1==1) {
/*
* inserts code that waits for incoming input
*/
numProcs++;
pid = fork();
if (pid == 0) {
doStuff(); // may exit anytime, based on user input
} else {
if (numProcs > 3) {
wait(&status);
numProcs--;
} else {
waitpid(pid, &status, 0); // PROBLEM!
numProcs--;
}
}
}
I've been searching for this problem the whole day. Can somebody help?
At the risk of being obvious, you basically want to just drop the else clause. The logic you're looking for is something like:
int max_active = 3; // or whatever
int number_active = 0;
bool done = false;
for (; !done; ++number_active) {
// wait for something to do;
GetSomeWork();
// wait for something to finish, if necessary.
for (; number_active >= max_active; --number_active)
wait(&status);
pid = fork();
if (pid < 0)
ReportErrorAndDie();
if (pid == 0)
DoTheWorkAndExit();
}
This actually lets you change the value of max_active without restarting, which is the only justification for the for loop around the wait() call.
The obvious complaint is that number_active in my version doesn't actually tell you how many processes are active, which is true. It tells you how many processes you haven't wait()'ed for, which means that you might keep some zombies (but the number is limited). If you're constantly running at or close to the maximum number of tasks, this doesn't matter, and unless your maximum is huge, it doesn't matter anyway, since the only Design Requirement was that you don't use more than the maximum number of tasks, and consequently you only have to know that the number active is not more than the maximum.
If this really bothers you and you want to clean the tasks up, you can put:
for (; waitpid(-1, &status, WNOHANG) > 0; --number_active) {}
before the other for loop, which will reap the zombies before checking whether you need to block. (If there are no child processes at all, waitpid(-1, &status, WNOHANG) returns -1 with errno set to ECHILD; in any event there's no point continuing the loop on an error.)
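A runnable sketch of the same idea in C++ with the POSIX calls, assuming the children simply sleep for 20 ms instead of doing real work (runLimited is a hypothetical name). It reaps finished children with WNOHANG first, blocks in wait() only when the limit is reached, and has the child call _exit() so it never falls back into the parent's loop. It returns the largest number of unreaped children it ever had.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>

// Starts `jobs` children, never more than `maxActive` at once.
// Returns the peak number of live (unreaped) children, or -1 on error.
int runLimited(int jobs, int maxActive) {
    int active = 0, peak = 0, status = 0;
    for (int j = 0; j < jobs; ++j) {
        // Reap children that already finished, without blocking.
        while (active > 0 && waitpid(-1, &status, WNOHANG) > 0) --active;
        // Still at the limit: block until some child exits.
        while (active >= maxActive) {
            if (wait(&status) > 0) --active;
            else if (errno != EINTR) return -1;
        }
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {     // child
            usleep(20000);  // pretend to do 20 ms of work
            _exit(0);       // never return into the parent's loop
        }
        if (++active > peak) peak = active;
    }
    while (active > 0 && wait(&status) > 0) --active;  // reap the rest
    return peak;
}
```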
You have two problems with your code, but the second one is masked by the first.
Your immediate problem is that waitpid(pid, &status, 0); will block until the process with the specified pid terminates. You want to pass the WNOHANG option as the third parameter to the waitpid() call. That will ensure that the call doesn't block.
This will add a new problem: you will have to check for yourself whether any child process has terminated. With WNOHANG, waitpid() returns the pid of a child it reaped, or 0 if no child has terminated yet, so check the return value before relying on status (WIFEXITED(status) then tells you whether that child exited normally):
} else {
if (waitpid(-1, &status, WNOHANG) > 0) {
numProcs--;
}
}
The second problem is that your original code only waits for the most recently created pid. You should instead pass -1, which waits for any child process.

Fast counting semaphore on Windows?

First of all, I know that it can be implemented with a mutex and condition variable, but I want the most efficient implementation possible.
I would like a semaphore with a fast-path when there's no contention. On Linux this is easy with a futex; for example, here's a wait:
if (AtomicDecrementIfPositive(_counter) > 0) return; // Uncontended
AtomicAdd(&_waiters, 1);
do
{
// Sleep while _counter == 0. EAGAIN (value already changed) and EINTR (signal) just mean "retry".
if (syscall(SYS_futex, &_counter, FUTEX_WAIT_PRIVATE, 0, nullptr, nullptr, 0) == -1 && errno != EAGAIN && errno != EINTR)
{
AtomicAdd(&_waiters, -1);
throw std::runtime_error("Failed to wait for futex");
}
}
while (AtomicDecrementIfPositive(_counter) <= 0);
AtomicAdd(&_waiters, -1);
and post:
AtomicAdd(&_counter, 1);
if (Load(_waiters) > 0 && syscall(SYS_futex, &_counter, FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0) == -1) throw std::runtime_error("Failed to wake futex"); // Wake one
At first I thought for Windows to just use NtWaitForKeyedEvent(). The problem is it's not a direct substitution because it doesn't atomically check the value at _counter before going into the kernel, and so can miss the wake from NtReleaseKeyedEvent(). Worse, then NtReleaseKeyedEvent() would block.
What's the best solution?
Windows has native semaphores with CreateSemaphore. Until and unless you have some kind of documented performance problem doing it the normal way, you shouldn't even consider optimizations that are fragile or hardware-specific.
I think something like this should work:
// bottom 16 bits: post count
// top 16 bits: wait count
struct Semaphore { unsigned val; }
wait(struct Semaphore *s)
{
retry:
do
old = s->val;
if old had posts (bottom 16 bits != 0)
new = old - 1
wait = false
else
new = old + 65536
wait = true
until successful CAS of &s->val from old to new
if wait == true
wait on keyed event
goto retry;
}
post(struct Semaphore *s)
{
do
old = s->val;
if old had waiters (top 16 bits != 0)
// perhaps new = old - 65536 and remove the "goto retry" above?
// not sure, but this is safer...
new = old - 65536 + 1
release = true
else
new = old + 1
release = false
until successful CAS of &s->val from old to new
if release == true
release keyed event
}
edit: that said, I'm not sure this would help you a lot. Your thread pool usually should be big enough that a thread is always ready to process your request. This means that not only waits, but also posts will always take the slow path and go to the kernel. So, counting semaphores are probably the one primitive where you do not really care about a userspace-only fastpath. Stock Win32 semaphores should be good enough. That said, I'm happy to be proven wrong!
I vote for your first idea, i.e., a critical section and a condition variable. A critical section is fast enough, and it does use an interlocked operation before it goes to sleep. Or, you can experiment with SRWLocks instead of a critical section. Condition variables (and SRWLocks) are very fast; their only problem is that they are not available on XP, but maybe you do not need to target that platform.
Qt has all kinds of things like QMutex, QSemaphore which are implemented in spirit like what you presented in your question.
Actually, I would suggest replacing the futex stuff with the usual OS-provided synchronization primitives; it should not matter much since that is the slow path anyway.
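For reference, here is a sketch of the pattern this answer describes, sometimes called a benaphore: an atomic counter is the fast path, and only when it goes negative does a thread touch the slow primitive. In this sketch the slow path is a mutex/condition-variable semaphore, a portable stand-in for a Win32 semaphore from CreateSemaphore (or a critical section plus condition variable); the class names are hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Slow path: an ordinary counting semaphore built from a mutex and a condition variable.
class SlowSemaphore {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void post() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Fast path: one atomic counter. A negative value -k means k threads are
// (about to be) blocked in the slow semaphore.
class FastSemaphore {
public:
    explicit FastSemaphore(int initial = 0) : count_(initial) {}
    void wait() {
        if (count_.fetch_sub(1, std::memory_order_acquire) <= 0) slow_.wait();
    }
    bool tryWait() {
        int c = count_.load(std::memory_order_relaxed);
        while (c > 0) {
            if (count_.compare_exchange_weak(c, c - 1, std::memory_order_acquire)) return true;
        }
        return false;
    }
    void post() {
        if (count_.fetch_add(1, std::memory_order_release) < 0) slow_.post();
    }
private:
    std::atomic<int> count_;
    SlowSemaphore slow_;
};
```

Because an uncontended wait() or post() is a single atomic read-modify-write, the slow semaphore is only touched when a thread actually has to sleep or wake someone.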

Mutual exclusion problem

Please take a look on the following pseudo-code:
boolean blocked[2];
int turn;
void P(int id) {
while(true) {
blocked[id] = true;
while(turn != id) {
while(blocked[1-id])
/* do nothing */;
turn = id;
}
/* critical section */
blocked[id] = false;
/* remainder */
}
}
void main() {
blocked[0] = false;
blocked[1] = false;
turn = 0;
parbegin(P(0), P(1)); //RUN P0 and P1 parallel
}
I thought that I could implement a simple mutual-exclusion solution using the code above, but it's not working. Does anyone have an idea why?
Any help would really be appreciated!
Mutual exclusion is not guaranteed in this example, for the following reason:
We begin with the following situation:
blocked = {false, false};
turn = 0;
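P1 now executes: it sets blocked[1] = true, enters the while (turn != 1) loop, skips the inner loop because blocked[0] is false, and stops just before
turn = id; // Not yet executed.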
The situation is now:
blocked {false, true}
turn = 0;
Now P0 executes. It passes the second while loop, ready to execute the critical section. And when P1 executes, it sets turn to 1, and is also ready to execute the critical section.
Btw, this method was originally invented by Hyman, who sent it to Communications of the ACM in 1966.
Mutual exclusion is not guaranteed in this example, for the following reason:
We begin with the following situation:
turn= 1;
blocked = {false, false};
The execution runs as follows:
P0: while (true) {
P0: blocked[0] = true;
P0: while (turn != 0) {
P0: while (blocked[1]) {
P0: }
P1: while (true) {
P1: blocked[1] = true;
P1: while (turn != 1) {
P1: }
P1: criticalSection(P1);
P0: turn = 0;
P0: while (turn != 0)
P0: }
P0: criticalSection(P0);
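The interleavings in both answers can be checked mechanically with a small model of the pseudo-code, where each step() executes one statement of one process and pc 4 stands for "inside the critical section". HymanModel and bothInCritical are hypothetical names for this sketch; it only replays schedules, it does not run real threads.

```cpp
#include <vector>

struct HymanModel {
    bool blocked[2] = {false, false};
    int turn = 0;
    int pc[2] = {0, 0};  // per-process program counter

    void step(int id) {
        switch (pc[id]) {
            case 0: blocked[id] = true; pc[id] = 1; break;    // blocked[id] = true
            case 1: pc[id] = (turn != id) ? 2 : 4; break;     // while (turn != id)
            case 2: if (!blocked[1 - id]) pc[id] = 3; break;  // while (blocked[1-id]) ;
            case 3: turn = id; pc[id] = 1; break;             // turn = id, re-test loop
            default: break;                                   // in the critical section
        }
    }
};

// Replays a schedule (sequence of process ids) from the given initial turn and
// reports whether both processes end up in the critical section together.
bool bothInCritical(int initialTurn, const std::vector<int>& schedule) {
    HymanModel m;
    m.turn = initialTurn;
    for (int id : schedule) m.step(id);
    return m.pc[0] == 4 && m.pc[1] == 4;
}
```

bothInCritical(1, {0, 0, 0, 1, 1, 0, 0}) replays the schedule above, and bothInCritical(0, {1, 1, 1, 0, 0, 1, 1}) replays the one from the first answer; both return true.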
Is this homework, or some embedded platform? Is there any reason why you can't use pthreads or Win32 (as relevant) synchronisation primitives?
Maybe you need to declare blocked and turn as volatile, but without specifying the programming language there is no way to know.
Concurrency cannot be implemented like this, especially in a multi-processor (or multi-core) environment: different cores/processors have different caches. Those caches may not be coherent. The pseudo-code below could execute in the order shown, with the results shown:
get blocked[0] -> false // cpu 0
set blocked[0] = true // cpu 1 (stored in CPU 1's L1 cache)
get blocked[0] -> false // cpu 0 (retrieved from CPU 0's L1 cache)
get blocked[0] -> false // cpu 2 (retrieved from main memory)
You need hardware knowledge to implement concurrency.
Compiler might have optimized out the "empty" while loop. Declaring variables as volatile might help, but is not guaranteed to be sufficient on multiprocessor systems.