Multithreading program - C++

Basically my program has 2 sets of threads: workers and jobs. Each job has an arrival time, and when it arrives it is pushed onto a queue.
I want the workers (servers) to constantly look for a job on the queue; once there is a job on the queue, exactly one worker takes it off and does its thing with it.
In main, all the worker threads are created first and then the job threads are created and synchronized (each pushing its job onto the queue). I can't get the timing right: the worker threads sometimes do things at exactly the same time, OR the jobs aren't pushed onto the queue at the right times (i.e. a job with arrival time 3 is pushed before a job with arrival time 2).
How can I do this using semaphores and/or mutexes?
I tried to put a mutex in the worker function but I don't really have a good handle on mutexes/semaphores.
Any ideas would be appreciated.
Thanks!

The queue-push to queue-pop sequence needs to be atomic, i.e. it is the critical section. Put it between a mutex acquire and a mutex release; that should do it for you.
Check the POSIX threads tutorial to understand mutex acquisition and release. I use this PTHREAD TUTORIAL

Copied from an answer to one of my earlier questions. My question concerned Win32 threads, but the described concept is pretty much the same with pthreads.
Use a semaphore in your queue to indicate whether there are elements ready to be processed.
Every time you add an item, call sem_post() to increment the count associated with the semaphore
In the loop in your thread process, call sem_wait() on the handle of your semaphore object
Here is a tutorial for POSIX Semaphores
But first, like the other answer said, you have to make Q thread-safe.
void *func(void *newWorker) {
    struct workerType* worker = (struct workerType*) newWorker;
    while (numServed < maxJobs) {
        // No need to ask if the queue is empty: if a thread gets past
        // sem_wait() there has to be at least one job in the queue.
        sem_wait(&semaphore);
        pthread_mutex_lock(&mutex);
        struct jobType* job = Q.front();
        numServed++;
        cout << job->jobNum << " was served by " << worker->workerNum << endl;
        Q.pop();
        pthread_mutex_unlock(&mutex);
        // sleep(worker->runTime); // no need to sleep here either
    }
    return nullptr;
}

void *job(void *jobdata) {
    struct jobType *job = (struct jobType*) jobdata;
    // sleep(job->arrivtime);
    pthread_mutex_lock(&mutex);
    Q.push(job);
    pthread_mutex_unlock(&mutex);
    sem_post(&semaphore); // Inform the workers that another job was pushed.
    return nullptr;
}

The problem is that your servers are doing three non-atomic queue operations (empty, then front, then pop) with no synchronization to ensure that some other thread doesn't interleave its operations. You need to acquire a mutex or semaphore before calling Q.empty and release it after calling Q.pop to ensure that the empty/front/pop trio is done atomically. You also need to make sure you release the mutex properly if Q.empty fails.
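A rough sketch of the empty/front/pop trio done under one lock (reusing Q, mutex and jobType from the code above; this version still spins when the queue is empty, which is exactly what the semaphore in the other answer avoids):
void *workerLoop(void *arg) {
    while (true) {
        pthread_mutex_lock(&mutex);
        if (!Q.empty()) {                    // empty/front/pop as one atomic unit
            struct jobType *job = Q.front();
            Q.pop();
            pthread_mutex_unlock(&mutex);
            // process the job here, outside the critical section
        } else {
            pthread_mutex_unlock(&mutex);    // release the lock even when there is no work
        }
    }
    return nullptr;
}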


Where can we use std::barrier over std::latch?

I recently heard about new C++ standard features:
std::latch
std::barrier
I cannot figure out in which situations they are applicable and useful over one another.
If someone could give an example of how to use each of them wisely, it would be really helpful.
Very short answer
They're really aimed at quite different goals:
Barriers are useful when you have a bunch of threads and you want to synchronise across all of them at once, for example to do something that operates on all of their data at once.
Latches are useful if you have a bunch of work items and you want to know when they've all been handled, and aren't necessarily interested in which thread(s) handled them.
Much longer answer
Barriers and latches are often used when you have a pool of worker threads that do some processing and a queue of work items that is shared between them. It's not the only situation where they're used, but it is a very common one and does help illustrate the differences. Here's some example code that would set up some threads like this:
const size_t worker_count = 7; // or whatever
std::vector<std::thread> workers;
std::vector<Proc> procs(worker_count);
Queue<std::function<void(Proc&)>> queue;
for (size_t i = 0; i < worker_count; ++i) {
    workers.push_back(std::thread(
        [p = &procs[i], &queue]() {
            while (auto fn = queue.pop_back()) {
                fn(*p);
            }
        }
    ));
}
There are two types that I have assumed exist in that example:
Proc: a type specific to your application that contains data and logic necessary to process work items. A reference to one is passed to each callback function that's run in the thread pool.
Queue: a thread-safe blocking queue. There is nothing like this in the C++ standard library (somewhat surprisingly) but there are a lot of open-source libraries containing them e.g. Folly MPMCQueue or moodycamel::ConcurrentQueue, or you can build a less fancy one yourself with std::mutex, std::condition_variable and std::deque (there are many examples of how to do this if you Google for them).
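Purely as an illustration, a minimal (and not at all fancy) version of such a Queue might look like the sketch below. pop_back() blocks until an item is available, and pushing a default-constructed std::function (which converts to false) is one way of making the worker loop above exit:
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class Queue {
public:
    void push_back(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();                       // wake one waiting worker
    }

    T pop_back() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());     // FIFO order, despite the name used above
        items_.pop_front();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
};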
Latch
A latch is often used to wait until some work items you push onto the queue have all finished, typically so you can inspect the result.
std::vector<WorkItem> work = get_work();
std::latch latch(work.size());
for (WorkItem& work_item : work) {
    queue.push_back([&work_item, &latch](Proc& proc) {
        proc.do_work(work_item);
        latch.count_down();
    });
}
latch.wait();
// Inspect the completed work
// Inspect the completed work
How this works:
The threads will - eventually - pop the work items off of the queue, possibly with multiple threads in the pool handling different work items at the same time.
As each work item is finished, latch.count_down() is called, effectively decrementing an internal counter that started at work.size().
When all work items have finished, that counter reaches zero, at which point latch.wait() returns and the producer thread knows that the work items have all been processed.
Notes:
The latch count is the number of work items that will be processed, not the number of worker threads.
The count_down() method could be called zero times, once, or many times on each thread, and that number could differ between threads. For example, even if you push 7 work items onto a queue served by 7 threads, it might be that all 7 items end up being processed by the same thread (rather than one per thread), and that's fine.
Other unrelated work items could be interleaved with these ones (e.g. because they were pushed onto the queue by other producer threads), and again that's fine.
In principle, it's possible that latch.wait() won't be called until after all of the worker threads have already finished processing all of the work items. (This is the sort of odd condition you need to look out for when writing threaded code.) But that's OK, it's not a race condition: latch.wait() will just immediately return in that case.
An alternative to using a latch is that there's another queue, in addition to the one shown here, that contains the result of the work items. The thread pool callback pushes results on to that queue while the producer thread pops results off of it. Basically, it goes in the opposite direction to the queue in this code. That's a perfectly valid strategy too, in fact if anything it's more common, but there are other situations where the latch is more useful.
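Just to illustrate that variant (reusing the hypothetical Queue type from above; Result is a made-up type here and I'm assuming do_work returns one):
Queue<Result> results;                  // second queue, flowing back towards the producer
std::vector<WorkItem> work = get_work();
for (WorkItem& work_item : work) {
    queue.push_back([&work_item, &results](Proc& proc) {
        results.push_back(proc.do_work(work_item));   // assumes do_work returns a Result
    });
}
for (size_t i = 0; i < work.size(); ++i) {
    Result r = results.pop_back();      // blocks until the next result arrives
    // inspect r
}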
Barrier
A barrier is often used to make all threads wait simultaneously so that the data associated with all of the threads can be operated on simultaneously.
using Fn = std::function<void()>;
Fn completionFn = [&procs]() {
    // Do something with the whole vector of Proc objects
};
auto barrier = std::make_shared<std::barrier<Fn>>(worker_count, completionFn);
auto workerFn = [barrier](Proc&) {
    barrier->arrive_and_wait();
};
for (size_t i = 0; i < worker_count; ++i) {
    queue.push_back(workerFn);
}
How this works:
All of the worker threads will pop one of these workerFn items off of the queue and call barrier.arrive_and_wait().
Once all of them are waiting, one of them will call completionFn() while the others continue to wait.
Once that function completes they will all return from arrive_and_wait() and be free to pop other, unrelated, work items from the queue.
Notes:
Here the barrier count is the number of worker threads.
It is guaranteed that each thread will pop precisely one workerFn off of the queue and handle it. Once a thread has popped one off of the queue, it will wait in barrier.arrive_and_wait() until all the other copies of workerFn have been popped off by other threads, so there is no chance of it popping another one off.
I used a shared pointer to the barrier so that it will be destroyed automatically once all the work items are done. This wasn't an issue with the latch because there we could just make it a local variable in the producer thread function, because it waits until the worker threads have used the latch (it calls latch.wait()). Here the producer thread doesn't wait for the barrier so we need to manage the memory in a different way.
If you did want the original producer thread to wait until the barrier has finished, that's fine, it can call arrive_and_wait() too, but you will obviously need to pass worker_count + 1 to the barrier's constructor. (And then you wouldn't need to use a shared pointer for the barrier.)
If other work items are being pushed onto the queue at the same time, that's fine too, although it will potentially waste time as some threads will just be sitting there waiting for the barrier to be acquired while other threads are distracted by other work before they acquire the barrier.
!!! DANGER !!!
The last bullet point about other work being pushed onto the queue being "fine" is only the case if that other work doesn't also use a barrier! If you have two different producer threads putting work items with a barrier onto the same queue and those items are interleaved, then some threads will wait on one barrier and others on the other one, and neither will ever reach the required wait count - DEADLOCK. One way to avoid this is to only ever use barriers like this from a single thread, or even to only ever use one barrier in your whole program (this sounds extreme but is actually quite a common strategy, as barriers are often used for one-time initialisation on startup). Another option, if the thread queue you're using supports it, is to atomically push all work items for the barrier onto the queue at once so they're never interleaved with any other work items. (This won't work with the moodycamel queue, which supports pushing multiple items at once but doesn't guarantee that they won't be interleaved with items pushed on by other threads.)
Barrier without completion function
At the point when you asked this question, the proposed experimental API didn't support completion functions. Even the current API at least allows not using them, so I thought I should show an example of how barriers can be used like that too.
auto barrier = std::make_shared<std::barrier<>>(worker_count);
auto workerMainFn = [&procs, barrier](Proc&) {
    barrier->arrive_and_wait();
    // Do something with the whole vector of Proc objects
    barrier->arrive_and_wait();
};
auto workerOtherFn = [barrier](Proc&) {
    barrier->arrive_and_wait(); // Wait for work to start
    barrier->arrive_and_wait(); // Wait for work to finish
};
queue.push_back(std::move(workerMainFn));
for (size_t i = 0; i < worker_count - 1; ++i) {
    queue.push_back(workerOtherFn);
}
How this works:
The key idea is to wait for the barrier twice in each thread, and do the work in between. The first waits have the same purpose as the previous example: they ensure any earlier work items in the queue are finished before starting this work. The second waits ensure that any later items in the queue don't start until this work has finished.
Notes:
The notes are mostly the same as the previous barrier example, but here are some differences:
One difference is that, because the barrier is not tied to the specific completion function, it's more likely that you can share it between multiple uses, like we did in the latch example, avoiding the use of a shared pointer.
This example makes it look like using a barrier without a completion function is much more fiddly, but that's just because this situation isn't well suited to them. Sometimes, all you need is to reach the barrier. For example, whereas we initialised a queue before the threads started, maybe you have a queue for each thread but initialised in the threads' run functions. In that case, maybe the barrier just signifies that the queues have been initialised and are ready for other threads to pass messages to each other. In that case, you can use a barrier with no completion function without needing to wait on it twice like this.
You could actually use a latch for this, calling count_down() and then wait() in place of arrive_and_wait(). But using a barrier makes more sense, both because calling the combined function is a little simpler and because using a barrier communicates your intention better to future readers of the code (there's a small sketch of this scenario after these notes).
In any case, the "DANGER" warning from before still applies.
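As a rough sketch of that one-time initialisation scenario (per_thread_queues and make_queue are made-up names; each thread arrives at the barrier exactly once before entering its normal message loop):
std::barrier<> ready(worker_count);        // one arrival per worker thread

auto thread_main = [&](size_t i) {
    per_thread_queues[i] = make_queue();   // hypothetical per-thread setup
    ready.arrive_and_wait();               // block until every thread's queue exists
    // with a std::latch you'd call ready.count_down(); ready.wait(); here instead
    // ... from this point on it is safe to post messages to any thread's queue ...
};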

Thread pools and context switching slowdowns

I have a thread pool with idling threads that wait for jobs to be pushed to a queue, in a windows application.
I have a loop in my main application thread that adds 1000 jobs to the pool's queue sequentially (it adds a job, then waits for the job to finish, then adds another job, x1000). So no actual parallel processing is happening...here's some pseudocode:
//// threadpool:
class ThreadPool
{
    ....
    std::condition_variable job_cv;
    std::condition_variable finished_cv;
    std::mutex job_mutex;
    std::queue<std::function <void(void)>> job_queue;

    void addJob(std::function <void(void)> jobfn)
    {
        std::unique_lock <std::mutex> lock(job_mutex);
        job_queue.emplace(std::move(jobfn));
        job_cv.notify_one();
    }

    void waitForJobToFinish()
    {
        std::unique_lock<std::mutex> lock(job_mutex);
        finished_cv.wait(lock, [this]() { return job_queue.empty(); });
    }
    ....
    void threadFunction() // called by each thread when it's first started
    {
        std::function <void(void)> job;
        while (true)
        {
            std::unique_lock <std::mutex> latch(job_mutex);
            job_cv.wait(latch, [this]() { return !job_queue.empty(); });
            {
                job = std::move(job_queue.front());
                job_queue.pop();
                latch.unlock();
                job();
                latch.lock();
                finished_cv.notify_one();
            }
        }
    }
};
...
//// main application:
void jobfn()
{
    // do some lightweight calculation
}

int main()
{
    // test 1000 calls to the lightweight jobfn from the thread pool
    for (int q = 0; q < 1000; q++)
    {
        threadPool->addJob(&jobfn);
        threadPool->waitForJobToFinish();
    }
}
So basically what's happening is a job is added to the queue and the main loop begins to wait, a waiting thread then picks it up, and when the thread finishes, it notifies the application that the main loop can continue and another job can be added to the queue, etc. So that way 1000 jobs are processed sequentially.
It's worth noting that the jobs themselves are tiny and can complete in a few milliseconds.
However, I've noticed something strange....
The time it takes for the loop to complete is essentially O(n) where n is the number of threads in the thread pool. So even though jobs are processed one-at-a-time in all scenarios, a 10-thread pool takes 10x longer to complete the full 1000-job task than a 1-thread pool.
I'm trying to figure out why, and my only guess so far is that context switching is the bottleneck...maybe less (or zero?) context switching overhead is required when only 1 thread is grabbing jobs...but when 10 threads are continually taking their turn to process a single job at a time, there's some extra processing required? But that doesn't make sense to me...wouldn't it be the same operation required to unlock thread A for a job, as it would be thread B,C,D...? Is there some OS-level caching going on, where a thread doesn't lose context until a different thread is given it? So calling on the same thread over and over is faster than calling threads A,B,C sequentially?
But that's a complete guess at this point...maybe someone else could shed some insight on why I'm getting these results...Intuitively I assumed that so long as only 1 thread is executing at a time, I could have a thread pool with an arbitrarily large number of threads and the total task completion time for [x] jobs would be the same (so long as each job is identical and the total number of jobs is the same)...why is that wrong?
Your "guess" is correct; it's simply a resource contention issue.
Your 10 threads are not idle, they are waiting. That means the OS has to iterate over the currently active threads of your application, which means context switches are likely to occur.
The active thread is pushed back and a "waiting" thread is pulled to the front, where the code checks whether the condition has been notified and the lock can be acquired. Since it usually can't within that thread's time slice, the scheduler continues to iterate over the remaining threads, each trying to see if the lock can be acquired, which it can't, because your "active" thread hasn't been allotted a time slice in which to complete yet.
A single-thread pool doesn't have this issue because no additional threads need to be iterated over at the OS level; granted, a single-thread pool is still slower than just calling jobfn 1000 times directly.
Hope that can help.

Single reader multiple writers with pthreads and locks and without boost

Consider the following piece of code.
#include <iostream>
#include <vector>
#include <map>
#include <cstdlib>
#include <pthread.h>

using namespace std;

map<pthread_t, vector<int>> map_vec;
vector<pair<pthread_t, int>> how_much_and_where;
pthread_cond_t CV = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void* writer(void* args)
{
    while (*some condition*)
    {
        int howMuchPush = (rand() % 5) + 1;
        for (int i = 0; i < howMuchPush; ++i)
        {
            // WRITE
            map_vec[pthread_self()].push_back(rand() % 10);
        }
        how_much_and_where.push_back(make_pair(pthread_self(), howMuchPush));
        // Wake up the reader - there's something to read.
        pthread_cond_signal(&CV);
    }
    cout << "writer thread: " << pthread_self() << endl;
    return nullptr;
}

void* reader(void* args)
{
    pair<pthread_t, int> to_do;
    pthread_cond_wait(&CV, &mutex);
    while (*what condition??*)
    {
        to_do = how_much_and_where.front();
        how_much_and_where.erase(how_much_and_where.begin());
        // READ
        cout << to_do.first << " wrote " << endl;
        for (int i = 0; i < to_do.second; i++)
        {
            cout << map_vec[to_do.first][i] << endl;
        }
        // Done reading. Go to sleep.
        pthread_cond_wait(&CV, &mutex);
    }
    return nullptr;
}

//----------------------------------------------------------------------------//

int main()
{
    pthread_t threads[4];
    // Writers
    pthread_create(&threads[0], nullptr, writer, nullptr);
    pthread_create(&threads[1], nullptr, writer, nullptr);
    pthread_create(&threads[2], nullptr, writer, nullptr);
    // Reader
    pthread_create(&threads[3], nullptr, reader, nullptr);
    pthread_join(threads[0], nullptr);
    pthread_join(threads[1], nullptr);
    pthread_join(threads[2], nullptr);
    pthread_join(threads[3], nullptr);
    return 0;
}
Background
Every writer has its own container to which it writes data.
And suppose that there is a reader who knows when a writer has finished writing a chunk of data, and what the size of that chunk is (the reader has a container to which the writers write pairs of this information).
Questions
Obviously I should put locks on the shared resources - map_vec and how_much_and_where. But I don't understand, in this case, what the
efficient way to position locks on these resources is (for example, locking map_vec before every push_back in the for loop? Or maybe locking it before the for loop - but isn't pushing to a queue a wasteful and long operation that may cause the reader to wait too long?), or the
safe way to position locks in order to prevent deadlocks.
I don't understand what the right condition for the while loop should be - I thought maybe "as long as how_much_and_where is not empty", but obviously a situation in which the reader empties how_much_and_where right before a writer adds a pair may occur.
Suppose a writer sends a signal while the reader is busy reading some data. As far as I understand, this signal will be ignored, and the pair which the writer pushed may never be dealt with (the number of signals received and dealt with < the number of pairs/tasks for the reader). How can I prevent such a scenario?
To simplify things we should decouple the implementation of the general-purpose/reusable producer-consumer queue (or simply "blocking queue" as I usually call it) from the implementation of the actual producers and the consumer (which aren't general-purpose/reusable - they are specific to your program). This will make the code much clearer and more manageable from a design perspective.
1. Implementing a general-purpose (reusable) blocking queue
First you should implement a "blocking queue" that can manage multiple producers and a single consumer. This blocking queue will contain the code that handles the multithreading/synchronization and it can be used by a consumer thread to receive items from several producer threads. Such a blocking queue can be implemented in a lot of different ways (not only with the mutex+cond combo) depending on whether you have 1 or more consumers and 1 or more producers (sometimes it is possible to introduce different kinds of [platform-specific] optimizations when you have only 1 consumer or 1 producer). The simplest queue implementation with a mutex+cond pair automatically handles multiple producers and multiple consumers if needed.
The queue has only an internal container (it can be a non-thread-safe std::queue, vector or list) that holds the items and an associated mutex+cond pair that protects this container from concurrent access by multiple threads. The queue has to provide two operations:
produce(item): puts one item into the queue and returns immediately. The pseudo code looks like this:
    lock mutex
    add the new item to the internal container
    signal through cond
    unlock mutex
    return
wait_and_get(): if there is at least one item in the queue then it removes the oldest one and returns immediately, otherwise it waits until someone puts an item into the queue with the produce(item) operation.
    lock mutex
    if container is empty:
        wait for cond (pthread_cond_wait)
    remove oldest item
    unlock mutex
    return the removed oldest item
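For illustration, a minimal pthread-based sketch of such a blocking queue might look like this (error checking omitted; the while loop around pthread_cond_wait is related to the spurious-wakeup note at the end of this answer):
#include <pthread.h>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    BlockingQueue() {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~BlockingQueue() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }

    void produce(T item) {
        pthread_mutex_lock(&mutex_);
        items_.push(std::move(item));
        pthread_cond_signal(&cond_);             // wake one waiting consumer
        pthread_mutex_unlock(&mutex_);
    }

    T wait_and_get() {
        pthread_mutex_lock(&mutex_);
        while (items_.empty())                   // re-check after every wakeup
            pthread_cond_wait(&cond_, &mutex_);
        T item = std::move(items_.front());
        items_.pop();
        pthread_mutex_unlock(&mutex_);
        return item;
    }

private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    std::queue<T> items_;
};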
2. Implementing your program using the blocking queue
Now that you have a reusable blocking queue to build on we can implement the producers and the consumer along with the main thread that controls things.
The producers
They just throw a bunch of items into the queue (by calling produce(item) on the blocking queue) and then they exit. If the production of items isn't computation-heavy and doesn't require waiting for a lot of I/O operations then this will finish very quickly in your example program. To simulate real-world scenarios where the threads do heavy work you could do the following: on each producer thread put only X (let's say 5) items into the queue, but between items wait for a random number of seconds, say between 1 and 3. Note that after some time your producer threads quit by themselves when they have finished their job.
The consumer
The consumer has an infinite loop in which it always gets the next item from the queue with wait_and_get() and processes it somehow. If it is a special item that signals the end of processing then it breaks out of the infinite loop instead of processing the item. Pseudo code:
Infinite loop:
    get the next item from the queue (wait_and_get())
    if this is the special item indicating the end of processing then break out of the loop
    otherwise process this item
The main thread
Start all threads, including the producers and the consumer, in any order.
Wait for all producer threads to finish (pthread_join() them).
Remember that the producers finish and quit by themselves after some time, without external stimuli. When you have finished joining all producers it means that every producer has quit, so no one will call the produce(item) operation of the queue again. However the queue may still hold unprocessed items and the consumer may still be working on crunching those.
Put the last special "end of processing" item into the queue for the consumer.
When the consumer finishes processing the last item produced by the producers it will still ask the queue for the next item with wait_and_get() - this would deadlock, waiting for an item that never arrives. To handle this, the main thread puts one last special item into the queue that signals the end of processing to the consumer. Remember that our consumer implementation contains a check for this special item to find out when to finish. It is important that this special item is placed into the queue by the main thread only after the producers have finished (after joining them)!
If you have multiple consumers then it's easier to put multiple special "end of processing" items into the queue (one for each consumer) than to make the queue smart enough to handle multiple consumers with only one "end of processing" item. Since the main thread orchestrates the whole thing (thread creation, thread joining, etc.) it knows exactly how many consumers there are, so it's easy to put the same number of "end of processing" items into the queue.
Wait for the consumer thread to terminate by joining it.
After putting the end-of-processing special item into the queue we wait for the consumer thread to process the remaining items (produced by the producers) along with our last special item (produced by the main "coordinator" thread) that asks the consumer to finish. We do this waiting on the main thread by pthread_join()-ing the consumer thread, as the sketch below illustrates.
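Putting those steps together, an illustrative (simplified) main-thread flow could look like this, assuming the BlockingQueue sketch from above; producer_main/consumer_main are stand-ins for your real writer/reader functions, and a nullptr item plays the role of the "end of processing" marker:
#include <pthread.h>

BlockingQueue<int*> gQueue;                      // items are pointers; nullptr = "end of processing"

void* producer_main(void*) {
    for (int i = 0; i < 5; ++i)
        gQueue.produce(new int(i));              // produce a few items, then quit
    return nullptr;
}

void* consumer_main(void*) {
    while (int* item = gQueue.wait_and_get()) {  // nullptr breaks the loop
        // ... process *item ...
        delete item;
    }
    return nullptr;
}

int main() {
    pthread_t producers[3];
    pthread_t consumer;

    for (pthread_t& p : producers)               // 1. start everything
        pthread_create(&p, nullptr, producer_main, nullptr);
    pthread_create(&consumer, nullptr, consumer_main, nullptr);

    for (pthread_t& p : producers)               // 2. wait for the producers to finish
        pthread_join(p, nullptr);

    gQueue.produce(nullptr);                     // 3. one sentinel per consumer

    pthread_join(consumer, nullptr);             // 4. wait for the consumer
    return 0;
}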
Additional notes:
In my threaded system implementations the items of the blocking queue are usually pointers - pointers to "job" objects that have to be executed/processed. (You can implement the blocking queue as a template class; in that case the user of the blocking queue can specify the type of the item.) In my case it is easy to put a special "end of processing" item into the queue for the consumer: I usually use a NULL job pointer for this purpose. In your case you will have to find out what kind of special value you can use in the queue to signal the end of processing to the consumer.
The producers may have their own queues and a whole bunch of other data structures with which they play around to "produce items" but the consumer doesn't care about those data structures. The consumer cares only about individual items received through its own blocking queue. If a producer wants something from the consumer then it has to send an item (a "job") to the consumer through the queue. The blocking queue instance belongs to the consumer thread - it provides a one-way communication channel between an arbitrary thread and the consumer thread. Even the consumer thread itself can put an item into its own queue (in some cases this is useful).
The pthread_cond_wait documentation says that this function can wake up without actual signaling (although I've never seen a single bug caused by a spurious wakeup of this function in my life). To handle this, the "if container is empty then pthread_cond_wait" part of the code should be changed to "while the container is empty, pthread_cond_wait". But again, this spurious-wakeup thing is something of a Loch Ness monster that may show up only on some architectures with specific implementations of the threading primitives, so your code would probably work on desktop machines without caring about this problem.

Low performance of boost::barrier, wait operation

I have a performance issue with boost::barrier. I measure the time of the wait method call; for the single-thread situation, when the call to wait is repeated around 100000 times, it takes around 0.5 sec. Unfortunately for the two-thread scenario this time grows to 3 seconds, and it gets worse with every thread (I have an 8-core processor).
I implemented a custom method which provides the same functionality and it is much faster.
Is it normal for this method to be so slow? Is there a faster way to synchronize threads in boost (so that all threads wait for completion of the current job by all threads and then proceed to the next task; just synchronization, no data transmission is required)?
I have been asked for my current code.
What I want to achieve: in a loop I run a function; this function can be divided among many threads, however all threads should finish the current loop run before execution of the next run.
My current solution
volatile int barrierCounter1 = 0;                 // number of threads which completed the current loop run
volatile bool barrierThread1[NumberOfThreads];    // "go" signal for all threads with id > 0; all values are set to false at the beginning
boost::mutex mutexSetBarrierCounter;              // mutex for barrierCounter1 modification

void ProcessT(int threadId)
{
    do
    {
        DoWork(); // function which should be executed by every thread

        mutexSetBarrierCounter.lock();
        barrierCounter1++;                        // every thread notifies that it finished executing the function
        mutexSetBarrierCounter.unlock();

        if (threadId == 0)
        {
            // main thread (0) awaits completion of all threads
            while (barrierCounter1 != NumberOfThreads)
            {
                // I assume that the number of threads is lower than the number of processor cores,
                // so this loop should not have an impact on overall performance
            }
            // if all threads completed, notify the other threads that they can proceed to the next loop run
            for (int i = 0; i < NumberOfThreads; i++)
            {
                barrierThread1[i] = true;
            }
            // clear the counter; no lock is used because the rest of the threads wait in the else branch
            barrierCounter1 = 0;
        }
        else
        {
            // the rest of the threads wait for the "go" signal
            while (barrierThread1[threadId] == false)
            {
            }
            // once a thread is allowed to proceed it only cleans up its own slot in the array;
            // no lock is used because thread 0 will not modify this value until all threads complete the loop run
            barrierThread1[threadId] = false;
        }
    } while (!end);
}
Locking runs counter to concurrency. Lock contention is always the worst-case behaviour.
IOW: Thread synchronization (in itself) never scales.
Solution: only use synchronization primitives in situations where the contention will be low (the threads need to synchronize "relatively rarely"[1]), or do not try to employ more than one thread for the job that contends for the shared resource.
Your benchmark seems to magnify the very worst-case behavior, by making all threads always wait. If you have a significant workload on all workers between barriers, then the overhead will dwindle, and could easily become insignificant.
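If you do keep a barrier, the loop from your code written with boost::barrier is simply the following (DoWork, NumberOfThreads and end are taken from your snippet):
#include <boost/thread/barrier.hpp>

boost::barrier loopBarrier(NumberOfThreads);

void ProcessT(int threadId)
{
    do
    {
        DoWork();               // this thread's slice of the current loop run
        loopBarrier.wait();     // nobody starts the next run until everyone has arrived
    } while (!end);
}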
Trust your profiler
Profile only your application code (no silly synthetic benchmarks)
Prefer non-threading to threading (remember: asynchrony != concurrency)
[1] Which is highly relative and subjective

What is better for a message queue? mutex & cond or mutex&semaphore?

I am implementing a C++ message queue based on a std::queue.
As I need poppers to wait on an empty queue, I was considering using a mutex for mutual exclusion and a cond for suspending threads on an empty queue, as glib does with GAsyncQueue.
However it looks to me that a mutex & semaphore would do the job; I think the semaphore holds an integer count, and that seems like a pretty high limit to reach with pending messages.
A pro of the semaphore is that you don't need to manually re-check the condition each time you return from a wait, as you know for sure that someone inserted something (e.g. when someone inserted 2 items and you are the second thread arriving).
Which one would you choose?
EDIT:
Changed the question in response to #Greg Rogers
A single semaphore does not do the job - you need to be comparing (mutex + semaphore) and (mutex + condition variable).
It is pretty easy to see this by trying to implement it:
void push(T t)
{
    queue.push(t);
    sem.post();
}

T pop()
{
    sem.wait();
    T t = queue.front();
    queue.pop();
    return t;
}
As you can see there is no mutual exclusion when you are actually reading/writing to the queue, even though the signalling (from the semaphore) is there. Multiple threads can call push at the same time and break the queue, or multiple threads could call pop at the same time and break it. Or, a thread could call pop and be removing the first element of the queue while another thread called push.
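For comparison, here is a minimal sketch of the (mutex + semaphore) combination (using POSIX sem_t and std::mutex purely for illustration):
#include <mutex>
#include <queue>
#include <semaphore.h>
#include <utility>

template <typename T>
class MsgQueue {
public:
    MsgQueue() { sem_init(&count_, 0, 0); }
    ~MsgQueue() { sem_destroy(&count_); }

    void push(T t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);   // protects the queue itself
            queue_.push(std::move(t));
        }
        sem_post(&count_);                              // one more item available
    }

    T pop() {
        sem_wait(&count_);                              // wait until an item exists
        std::lock_guard<std::mutex> lock(mutex_);
        T t = std::move(queue_.front());
        queue_.pop();
        return t;
    }

private:
    std::mutex mutex_;
    sem_t count_;
    std::queue<T> queue_;
};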
You should use whichever you think is easier to implement, I doubt performance will vary much if any (it might be interesting to measure though).
Personally I use a mutex to serialize access to the list, and wake up the consumer by sending a byte over a socket (produced by socketpair()). That may be somewhat less efficient than a semaphore or condition variable, but it has the advantage of allowing the consumer to block in select()/poll(). That way the consumer can also wait on other things besides the data queue, if it wants to. It also lets you use the exact same queueing code on almost all OS's, since practically every OS supports the BSD sockets API.
Pseudocode follows:
// Called by the producer. Adds a data item to the queue, and sends a byte
// on the socket to notify the consumer, if necessary
void PushToQueue(const DataItem & di)
{
    mutex.Lock();
    bool sendSignal = (queue.size() == 0);
    queue.push_back(di);
    mutex.Unlock();
    if (sendSignal) producerSocket.SendAByteNonBlocking();
}

// Called by the consumer after consumerSocket selects as ready-for-read.
// Returns true if (di) was written to, or false if there wasn't anything to read after all.
// The consumer should call this in a loop until it returns false, and then
// go back to sleep inside select() to wait for further data from the producer.
bool PopFromQueue(DataItem & di)
{
    consumerSocket.ReadAsManyBytesAsPossibleWithoutBlockingAndThrowThemAway();
    mutex.Lock();
    bool ret = (queue.size() > 0);
    if (ret) queue.pop_front(di);
    mutex.Unlock();
    return ret;
}
If you want to allow multiple simultaneous users at a time to use your queue, you should use semaphores:
    sema(10)              // up to ten threads/processes have concurrent access
    sema_lock(&sema_obj)
        queue
    sema_unlock(&sema_obj)
A mutex will "authorize" only one user at a time:
    pthread_mutex_lock(&mutex_obj)
        global_data;
    pthread_mutex_unlock(&mutex_obj)
That's the main difference, and you should decide which solution fits your requirements.
But I'd choose the mutex approach, because you don't need to specify how many users can grab your resource.