I'm learning about synchronization problems and have read about the Producer-Consumer problem and the Sleeping Barber problem.
I found that the Producer-Consumer problem is very similar to the Sleeping Barber problem. Frankly, I cannot find the difference between them.
Let me say...
When the producer makes a product, he adds it to the queue; when a customer arrives at the barber shop, he goes to the waiting room.
Of course, the customer goes to the barber and wakes him up if he is sleeping. And it seems similar in the Producer-Consumer problem: the consumer may sleep if the queue is empty, and someone should wake him up when a new product is added to the queue.
When the queue is full, the producer should not make more products, so he had better sleep. When the consumer consumes a product from the queue and there is a free place in the queue, the producer should wake up (or be woken up by someone) to work.
I think this is similar in the Sleeping Barber problem. When the waiting room is full, the newly arrived customer has to wait for a place in the waiting room. When someone in the waiting room goes to the barber, a customer waiting outside can enter the waiting room. (Sure, the customer could just go back home if there is no empty chair in the waiting room, but I don't think that is a big difference.)
I think the implementations that solve both problems are similar. Among the various versions of the implementations, I saw one using two semaphores and a mutex. The two semaphores are used to wake the sleeping actors, and the mutex is used to prevent corruption of the shared data by concurrent access.
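Roughly like this, in Java (just a sketch; the class name and the choice of java.util.concurrent.Semaphore are mine, any counting semaphore and lock would do):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

class BoundedBuffer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final Semaphore empty;                    // free slots: producer sleeps at 0
    private final Semaphore full = new Semaphore(0);  // filled slots: consumer sleeps at 0
    private final Semaphore mutex = new Semaphore(1); // protects the queue itself

    BoundedBuffer(int capacity) { empty = new Semaphore(capacity); }

    void put(T item) throws InterruptedException {
        empty.acquire();   // producer "sleeps" while the queue is full
        mutex.acquire();
        queue.add(item);
        mutex.release();
        full.release();    // wakes a sleeping consumer, if any
    }

    T take() throws InterruptedException {
        full.acquire();    // consumer "sleeps" while the queue is empty
        mutex.acquire();
        T item = queue.remove();
        mutex.release();
        empty.release();   // wakes a sleeping producer, if any
        return item;
    }
}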
I believe this solution could solve both of the problems, so I feel there is no difference between the Producer-Consumer problem and the Sleeping Barber problem.
The Producer-Consumer problem is about the producer and the consumer having different throughputs: if the producer produces tasks faster than the consumer executes them, the queue of tasks between them grows, so the time between a task arriving in the queue and the consumer taking it off grows too, and eventually you run out of memory.
The Sleeping Barber problem is about race conditions. Imagine you have the same producer generating tasks (people coming to the barbershop) and a consumer (the barber). To avoid busy waiting, your consumer sleeps when there are no more tasks in the queue, so when a new task arrives it first notifies the consumer that it shouldn't sleep. Now imagine the consumer is currently executing task A when task B arrives. B sees that the consumer is working and simply joins the queue, but this is not an atomic operation: between the check (that the consumer is busy) and B adding itself to the queue, the consumer can finish task A, check the queue, see nothing (as B is not yet added), and go to sleep. B doesn't know about that and will wait until eventually another task C comes along and wakes the consumer.
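Schematically, the broken check-then-act looks like this (a hypothetical Java sketch; all the names are made up):

import java.util.ArrayDeque;
import java.util.Queue;

class BarberShopBroken {
    volatile boolean barberIsBusy;                 // set by the barber thread
    final Queue<String> waitingRoom = new ArrayDeque<>();

    // BROKEN on purpose: the busy-check and the enqueue are not one atomic step
    void customerArrives(String customer) {
        if (barberIsBusy) {
            // ...between the check above and the add below, the barber may
            // finish his current cut, see an empty room, and go to sleep...
            waitingRoom.add(customer);             // this customer now waits until another arrival wakes the barber
        } else {
            wakeBarber();
        }
    }

    void wakeBarber() { /* notify the barber thread somehow */ }
}

The fix is to make the check and the enqueue a single atomic step, i.e. do both under the same lock.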
I hope that shows that these problems are different and that there are many different ways of solving them (you can easily google some).
For example, in Java you can use a BlockingQueue to solve the Sleeping Barber problem: the queue itself will wake your consumer when there are new tasks to be executed.
One way to solve the Producer-Consumer problem is also to use a BlockingQueue, with a fixed size: when the queue is full, the producer will block when it tries to add more tasks, until there is free space in the queue again.
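A minimal sketch of that (the capacity and the trivial tasks here are arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockingQueueDemo {
    // fixed capacity: put() blocks the producer when the queue is full,
    // take() puts the consumer to sleep when it is empty
    static final BlockingQueue<Runnable> QUEUE = new ArrayBlockingQueue<>(10);

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    final int n = i;
                    QUEUE.put(() -> System.out.println("task " + n)); // blocks while full
                }
            } catch (InterruptedException ignored) { }
        });
        Thread consumer = new Thread(() -> {
            try {
                while (true) QUEUE.take().run(); // sleeps while empty
            } catch (InterruptedException ignored) { }
        });
        producer.start();
        consumer.start();
    }
}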
These problems are different, and some solutions solve just one of them, not both. For example, for the Producer-Consumer problem, apart from sleeping, depending on your requirements there are other strategies:
When the queue is full, just discard the new tasks the producer creates. This can seem stupid and inefficient, but in the real world this approach is sometimes used when you don't have a requirement to execute every produced task. A rather simple example: you have a producer which creates tasks, but these tasks have timeouts, and the timeouts are set by some third-party data provider, so the clock starts even before your consumer actually sees the task. Sleeping is not an option, and if the queue is full it's unlikely you will finish this task in time; if you grow the queue instead, not only would this task fail, the following tasks would fail with timeouts too, and eventually no task at all would be executed in time. So your choice is to discard some new tasks, so that at some point you can accept tasks again and they will be executed in time. It's a real-world example from the rate quotes banks send for trading.
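In Java terms this strategy is just a non-blocking offer() on a bounded queue (a sketch; the capacity is arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class DiscardingProducer {
    private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);

    void produce(Runnable task) {
        // offer() returns false instead of blocking when the queue is full,
        // so a task that would miss its timeout anyway is simply dropped
        if (!queue.offer(task)) {
            // optionally count or log the dropped task here
        }
    }
}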
The other option to solve this Producer-Consumer problem is to use so-called reactive streams, from reactive programming. Simply put, your consumer tells your producer about its throughput: the number of tasks it is ready to consume at the moment, or the number of tasks per second. For example, you can take a look at this implementation or this article.
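The JDK itself ships a small version of this idea, the java.util.concurrent.Flow API (Java 9+). A minimal sketch where the consumer requests tasks one at a time:

import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        SubmissionPublisher<Integer> producer = new SubmissionPublisher<>();
        producer.subscribe(new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;

            public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1);                    // "I am ready for exactly one task"
            }
            public void onNext(Integer task) {
                System.out.println("processed " + task);
                subscription.request(1);         // ask for the next one only when done
            }
            public void onError(Throwable t) { t.printStackTrace(); }
            public void onComplete() { }
        });

        for (int i = 0; i < 5; i++) producer.submit(i); // submit() blocks when the subscriber's buffer fills
        producer.close();
        Thread.sleep(500);                              // crude wait for the asynchronous deliveries
    }
}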
These two approaches solve the Producer-Consumer problem, but they do not solve the Sleeping Barber problem, as they say nothing specific about how a sleeping consumer is notified when a new task arrives in the queue.
I think you got the Sleeping Barber problem wrong: it's not about the waiting room (queue) being full, but about it being empty. The problem is that when the waiting room is empty and the barber is busy with someone else, without some synchronization you can end up in a wrong state (in the Producer-Consumer problem you are always in a correct state). Specifically: a new customer arrives, sees the barber is busy, and goes to the waiting room. Imagine that this waiting room is far away, so it takes some time to get there. While he is walking, the barber finishes his work and uses a video camera to check whether somebody is waiting in the room. There is nobody there at the moment, as the customer is still walking, so the barber goes to sleep; the customer then reaches the waiting room and sits there, forever, unless another customer comes and wakes the barber up.
There are also some other ways to solve this problem, but the most common is indeed a BlockingQueue. It may seem to you that a BlockingQueue always solves both problems, but that is true only for a fixed-size BlockingQueue. You can just as well have a blocking queue without any capacity limit (except your heap size, obviously), e.g. a LinkedBlockingQueue; then the producer will never sleep, as the queue simply grows as needed.

This is also a widely used approach, because sometimes you cannot stop your producer: it may be some remote third-party system which never stops and just keeps producing new tasks, all of which you need to consume. You obviously can't tell it to stop, as there could be other clients listening to this data who don't want to stop just because one client is not as fast as the others. So you use a queue without a fixed size: the Sleeping Barber problem is solved by the blocking queue, but we can still have a throughput problem, i.e. the Producer-Consumer problem.

To solve that as well, we can create an observer thread which periodically checks the current size of the queue. If at some point it sees that the queue size has increased significantly, it can, for example, create another consumer for this queue, or notify the developers that there is a problem, or do something else. Note that this observer doesn't help us solve the Sleeping Barber problem at all.
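A sketch of such an observer (the threshold, the period and the addConsumer() helper are all made up):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueObserver {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // unbounded

    void startObserver() {
        Thread observer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (queue.size() > 10_000) { // arbitrary threshold
                    addConsumer();           // hypothetical: spawn another worker, or alert the developers
                }
                try { Thread.sleep(1_000); } catch (InterruptedException e) { return; }
            }
        });
        observer.setDaemon(true);
        observer.start();
    }

    void addConsumer() { /* start another thread that take()s from the queue */ }
}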
I should mention that these are not the only ways of solving these problems, and each of them has its advantages and disadvantages.
I tried to highlight words you can google to understand them completely.
Related
I intend to implement a thread pool to manage threads in my project. The basic structure of a thread pool that comes to mind is a queue: some threads generate tasks and put them into the queue, and the threads managed by the thread pool wait to handle those tasks. I think this is the classic Producer-Consumer problem. But when I google thread pool implementations on the web, I find that they seldom use this classic model. My question is: why don't they use this classic model? Does it have any drawbacks? Why don't they use a "full" semaphore and an "empty" semaphore to synchronize?
If you have multiple threads waiting on a single resource (in this case the semaphores and queue) then you are creating a bottleneck. You are forcing all tasks through one queue, even though you have multiple workers. Logically this might make sense if the workers are usually idle, but the whole point of a thread pool is to deal with a heavily loaded scenario where the workers are kept busy (for maximum throughput). Using a single input queue is particularly bad on a multi-processor system, where all workers read and write the head of the queue when they try to get the next task. Even though the lock contention might be low, the queue head pointer still has to be shared/communicated from one CPU cache to another each time it is updated.
Think about the ideal case: all workers are always busy. When a new task is enqueued you want it to be dispatched to the worker that will complete its current/pending task(s) first.
If, as a client, you had a contention-free oracle that could tell you which worker to enqueue a new task to, and each worker had its own queue, then you could implement each worker with its own multi-writer-single-reader queue and always dispatch new tasks to the best queue, thus eliminating worker contention on a single shared input queue. Of course you don't have such an oracle, but this mechanism still works pretty well until a worker runs out of tasks or the queues get imbalanced. "Work stealing" deals with these cases, while still reducing contention compared to the single queue case.
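In Java, for example, this policy comes ready-made, so a sketch is one line of setup (the task bodies are arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WorkStealingDemo {
    public static void main(String[] args) throws InterruptedException {
        // A ForkJoinPool under the hood: each worker has its own deque and
        // steals from the others' deques when its own runs dry.
        ExecutorService pool = Executors.newWorkStealingPool();
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> System.out.println("task " + n));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // the pool's threads are daemons, so wait explicitly
    }
}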
See also:
Is Work Stealing always the most appropriate user-level thread scheduling algorithm?
Why there's no Producer and Consumer model implementation
This model is very generic and has lots of different implementations; one of them could be a queue:
Try Apache APR Queue:
It's documented as Thread Safe FIFO bounded queue.
http://apr.apache.org/docs/apr-util/1.3/apr__queue_8h.html
I have an epoll loop that receives incoming events and puts them into a job queue.
When an event is put into the job queue, a pthread condition signal is sent to wake up the worker threads.
However, I've hit a problem when all worker threads are busy and jobs keep stacking up in the queue. It's serious because if jobs are stacked up and no new event arrives for a while, the jobs already in the queue won't be handed to the worker threads.
As soon as a thread becomes available, I would like it to take a job from the queue automatically (if possible).
Is there any way to do this? All I can think of is adding a queue observer that sends a condition signal at intervals.
Also, I know that the STL queue is not thread-safe. Does that mean I have to take a mutex lock every time I access the STL queue? Won't this slow down my processing?
Any suggestions for solving this problem would be great.
The classic way of counting/signaling jobs on a producer-consumer queue is to use a semaphore. The producer signals it after pushing on a job and the consumer waits on it before popping one off. You need a mutex around the push/pop as well to protect the queue from multiple access.
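Your question is C/pthreads, but the pattern is easiest to show compactly in Java (a sketch; a POSIX semaphore plus a pthread mutex work the same way). The key point versus a bare condition variable is that the semaphore counts pending jobs, so a signal is never lost while every worker is busy:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

class JobQueue {
    private final Queue<Runnable> jobs = new ArrayDeque<>();
    private final Semaphore available = new Semaphore(0); // counts queued jobs
    private final Object lock = new Object();             // the mutex around push/pop

    void push(Runnable job) {                // called by the epoll/IO thread
        synchronized (lock) { jobs.add(job); }
        available.release();                 // remembered even if all workers are busy
    }

    void workerLoop() throws InterruptedException {
        while (true) {
            available.acquire();             // returns at once if jobs are already pending
            Runnable job;
            synchronized (lock) { job = jobs.remove(); }
            job.run();
        }
    }
}

A worker that becomes free simply loops around and acquires the next pending job; no extra observer or periodic signal is needed.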
Take a look at the work-stealing thread pool in .NET. Yes, you'll have to mutex-lock the global queue; to that end I've written a double-lock deque so operations on the front/back can be done in parallel. I also have a lock-free deque, but the overhead is too high for client-side apps.
It is always said that when the count of a semaphore is 0, processes requesting the semaphore are blocked and added to a wait queue.
When some process releases the semaphore and the count increases from 0 to 1, a blocked process is activated. This can be any process, randomly picked from the blocked ones.
Now my question is:
If they are added to a queue, why is the activation of blocked processes NOT in FIFO order? It seems easier to pick the next process from the queue than to pick one at random and grant it the semaphore. If there is some idea behind this random logic, please explain. Also, how does the kernel select a process at random from a queue? Picking a random element is complex as far as a queue data structure is concerned.
A FIFO is the simplest data structure for the waiting list in a system that doesn't support priorities, but it's not the absolute answer otherwise. Depending on the scheduling algorithm chosen, different threads might have different absolute priorities, or some sort of decaying priority might be in effect, in which case the OS might choose the thread which has had the least CPU time in some preceding interval. Since such strategies are widely used (particularly the latter), the usual rule is to consider that you don't know (although with absolute priorities, it will be one of the threads with the highest priority).
When a process is scheduled "at random", it's not that a process is randomly chosen; it's that the selection process is not predictable.
The algorithm used by Windows kernels is that there is a queue of threads (Windows schedules "threads", not "processes") waiting on a semaphore. When the semaphore is released, the kernel schedules the next thread waiting in the queue. However, scheduling the thread does not immediately make that thread start executing; it merely makes the thread able to execute by putting it in the queue of threads waiting to run. The thread will not actually run until a CPU has no threads of higher priority to execute.
While the thread is waiting in the scheduling queue, another thread that is actually executing may wait on the same semaphore. In a traditional queue system, that new thread would have to stop executing and go to the end of the queue waiting in line for that semaphore.
In recent Windows kernels, however, the new thread does not have to stop and wait for that semaphore. If the thread that was assigned the semaphore is still sitting in the run queue, the semaphore may be reassigned to the currently running thread, causing the previously assigned thread to go back to waiting on the semaphore again.
The advantage of this is that the thread that was about to have to wait in the queue for the semaphore and then wait in the queue to run will not have to wait at all. The disadvantage is that you cannot predict which thread will actually get the semaphore next, and it's not fair so the thread waiting on the semaphore could potentially starve.
It is not that it CAN'T be FIFO; in fact, I'd bet many implementations ARE, for just the reasons that you state. The spec isn't that the process is chosen at random; it is that it isn't specified, so your program shouldn't rely on it being chosen in any particular way. (It COULD be chosen at random; just because it isn't the fastest approach doesn't mean it can't be done.)
All of the other answers here are great descriptions of the basic problem - especially around thread priorities and ready queues. Another thing to consider however is IO. I'm only talking about Windows here, since it is the only platform I know with any authority, but other kernels are likely to have similar issues.
On Windows, when an IO completes, something called a kernel-mode APC (Asynchronous Procedure Call) is queued against the thread which initiated the IO in order to complete it. If the thread happens to be waiting on a scheduler object (such as the semaphore in your example) then the thread is removed from the wait queue for that object which causes the (internal kernel mode) wait to complete with (something like) STATUS_ALERTED. Now, since these kernel-mode APCs are an implementation detail, and you can't see them from user mode, the kernel implementation of WaitForMultipleObjects restarts the wait at that point which causes your thread to get pushed to the back of the queue. From a kernel mode perspective, the queue is still in FIFO order, since the first caller of the underlying wait API is still at the head of the queue, however from your point of view, way up in user mode, you just got pushed to the back of the queue due to something you didn't see and quite possibly had no control over. This makes the queue order appear random from user mode. The implementation is still a simple FIFO, but because of IO it doesn't look like one from a higher level of abstraction.
I'm guessing a bit more here, but I would have thought that unix-like OSes have similar constraints around signal delivery and places where the kernel needs to hijack a process to run in its context.
Now this doesn't always happen, but the documentation has to be conservative. Unless the order is explicitly guaranteed to be FIFO (which, as described above, for Windows at least it can't be), the ordering is described in the documentation as "random" or "undocumented" or similar, because an unpredictable process controls it. It also gives the OS vendors latitude to change the ordering at some later time.
Process scheduling algorithms are very specific to system functionality and operating system design. It will be hard to give a good answer to this question. If I am on a general PC, I want something with good throughput and average wait/response time. If I am on a system where I know the priority of all my jobs and know I absolutely want all my high priority jobs to run first (and don't care about preemption/starvation), then I want a Priority algorithm.
As far as random selection goes, the motivation could be various things, one being an attempt at good throughput, as mentioned above. However, it would be non-deterministic (hypothetically) and impossible to prove. This property could be an exploitation of probability (random sampling, etc.), but, again, the proof could only rest on empirical data about whether it really works.
This is probably impossible, but I'm going to ask anyway. I have a multi-threaded program (a server) that receives requests on a thread dedicated to IP communications and then passes them to worker threads to do the work; I then have to send a reply back to the client with the answers as soon as the work is actually finished, with as little delay as possible. Currently I am using a consumer/producer pattern and placing replies on a queue for the IP thread to take off and send back to my client. This, however, gives me no guarantee about WHEN this will happen: the IP thread might not get scheduled any time soon, and I cannot know. This makes my client, which is blocking on this call, think that the request has failed, which is obviously not the point.
Since I am unable to make changes to the client, I need to solve this sending issue on my side. The problem I'm facing is that I don't wish to start sharing my IP object (currently used by only one thread) with the worker threads, as then things get overly complicated. I wondered if there is some way I can use thread synchronization mechanisms to ensure that the moment my worker thread finishes, the IP thread sends the reply back to the client?
Will manual/autoreset events do this for me or are these not guaranteed to wake up the thread immediately?
If you need it sent immediately, your best bet is to bite the bullet and start sharing the connection object. Lock it before accessing it, of course, and be sure to think about what you'll do if the send buffer is already full (the connection thread will need to deal with sending the portion of the message that didn't fit the first time, or the worker thread will be blocked until the client accepts some of the data you've sent). This may not be too difficult if your clients only have one request running at a time; if that's the case you can simply pass ownership of the client object to the worker thread when it begins processing, and pass it back when you're done.
Another option is using real-time threads. The details will vary between operating systems, but in most cases, if your thread has a high enough priority, it will be scheduled in immediately if it becomes ready to run, and will preempt all other threads with lower priority until done. On Linux this can be done with the SCHED_RR priority class, for example. However, this can negatively impact performance in many cases; as well as crashing the system if your thread gets into an infinite loop. It also usually requires administrative rights to use these scheduling classes.
That said, if scheduling takes long enough that the client times out, you might have some other problems with load. You should also really put a number on how fast the response needs to be - there's no end of things you can do if you want to speed up the response, but there'll come a point where it doesn't matter anymore (do you need response in the tens of ms? single-digit ms? hundreds of microseconds? single-digit microseconds?).
There is no synchronization mechanism that will wake a thread immediately. When a synchronization mechanism for which a thread is waiting is signaled, the thread is placed in a ready queue for its priority class. It can be starved there for several seconds before it's scheduled (Windows does have mechanisms that deal with starvation over 3-4 second intervals).
I think that for out-of-band, critical communications you can have a higher priority thread to which you can enqueue the reply message and wake it up (with a condition variable, MRE or any other synchronization mechanism). If that thread has higher priority than the rest of your application's threads, waking it up will immediately effect a context switch.
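A sketch of that idea (Java only for brevity; sendToClient() stands in for your IP object's send, and thread priority is only a hint whose effect is OS-dependent):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReplySender {
    private final BlockingQueue<byte[]> replies = new LinkedBlockingQueue<>();

    void start() {
        Thread sender = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    sendToClient(replies.take()); // wakes as soon as a worker enqueues a reply
                } catch (InterruptedException e) { return; }
            }
        });
        sender.setPriority(Thread.MAX_PRIORITY); // higher than the worker threads
        sender.start();
    }

    void enqueue(byte[] reply) { replies.add(reply); } // called by worker threads when done

    void sendToClient(byte[] reply) { /* write on the connection */ }
}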
I have a multi-threaded application that uses pthreads. I have a mutex and condition variables. There are two threads: one thread produces data for the second thread, a worker, which tries to process the produced data in a real-time fashion, such that each chunk is processed as close to the elapsing of a fixed time period as possible.
This works pretty well, however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition the worker is waiting on, it does a chunk of processing on the worker's behalf if it is time to process another chunk; then, immediately upon receiving the condition, the worker thread also does a chunk of processing if it is time to process another chunk.
In this latter case, I see that I am often late processing the chunk. I'd like to eliminate this lost efficiency and do what I can to keep the chunks ticking away as close as possible to the desired frequency.
Is there anything I can do to reduce the delay between the producer releasing the condition and the worker detecting that it has been released, so that the worker resumes processing sooner? For example, would it help for the producer to call something to force itself to be context-switched out?
Bottom line: the worker has to wait each time it asks the producer to create work for it, so that the producer can modify the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during it I am also checking for real-time work the producer should do on the worker's behalf while it has exclusive access. Somehow the hand-off back to running in parallel occasionally incurs significant delay, which I would like to avoid. Please suggest how this might best be accomplished.
I could suggest the following pattern. Generally the same technique could be used, e.g. when prebuffering frames in some real-time renderers or something like that.
First, it's obvious that the approach you describe is only effective if both of your threads are loaded equally (or almost equally) all the time. If not, multi-threading wouldn't actually benefit your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding thread and a processing thread. The first of them prepares chunks of data to process; the second does the processing and stores the result somewhere (what exactly happens to it isn't important).
The effective way to make these threads work together is a proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't care what happens to that data afterwards. And, well, your buffer could be implemented as a simple FIFO queue. That means your yielding thread should prepare the data and push it onto the queue (sketched below in C with pthreads; chunk_t, buffer_push() and friends stand in for your own data type and queue):
chunk_t *x = prepare_data();         /* your producer step */
pthread_mutex_lock(&buffer_mutex);   /* guard the shared FIFO */
buffer_push(&buffer, x);
pthread_mutex_unlock(&buffer_mutex);
Now, the processing thread. Its behaviour can be described this way (you should probably add some artificial delay, like a short usleep(), between checks of the buffer):
if (!buffer_empty(&buffer))          /* in real code, do this check under the mutex too */
    process(buffer_front(&buffer));
The important point here is what your processing thread should do with the processed data. The obvious approach is to pop it off the queue once it is processed, but you will probably want to come up with some better idea. Anyway, in my variant it would look like this:
/* after the data is processed */
pthread_mutex_lock(&buffer_mutex);
buffer_pop(&buffer);                 /* drop the chunk we just finished */
pthread_mutex_unlock(&buffer_mutex);
Note that locking operations in yielding and processing threads shouldn't actually impact your performance because they are only called once per chunk of data.
Now, the interesting part. As I wrote at the beginning, this approach is only effective if the threads act somewhat the same in terms of CPU/resource usage. There is a way to make the threading solution effective even when this condition doesn't constantly hold and depends on other runtime conditions.
That way involves creating another thread, called the controller thread. This thread merely compares the time each thread takes to process one chunk of data and balances the thread priorities accordingly. Actually, we don't even have to "compare the time"; the controller thread could simply work like this (again sketched with pthreads; the threshold T and the priority values are up to you):
if (buffer_size(&buffer) > T) {      /* the backlog is growing: rebalance */
    pthread_setschedprio(yielding_thread, lower_prio);
    pthread_setschedprio(processing_thread, higher_prio);
}
Of course, you could implement better heuristics here, but the approach with the controller thread should be clear.