Does std::queue have an event mechanism (signals in std::queue)? - c++

Is there any event mechanism or predefined signal in std::queue? When data or a message arrives in the queue, the queue should generate an event saying that data is ready to process, or signal another thread to do its task, instead of that thread continuously polling the queue.
POSIX message queues have a function, mq_notify(), which notifies another process or thread when data arrives in the message queue, so polling can be avoided.
Edit
If not, how can I achieve this with std::queue? I want to avoid continuous polling because it is slowing down the code.
Whenever some event occurs on the queue, it should notify the others.

std::queue is a container type, not an event mechanism. I recommend writing a class around the queue that implements a message queue.
EDIT:
OK, so I recommend using a std::queue, a std::mutex, and a std::condition_variable (Boost has the same types, if you use it). Put those in your new Queue class. When pushing, lock the mutex, push onto the queue, unlock the mutex, and call notify_one() on the condition variable. That way the condition variable is notified only when something is pushed. You can do the same on pop.
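A minimal sketch of the class described above, assuming C++11. The name MessageQueue and its push/pop interface are made up for illustration; they are not from the answer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical wrapper: a std::queue guarded by a mutex, with a
// condition variable that is notified on every push.
template <typename T>
class MessageQueue {
public:
    // Push an item and wake one waiting consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }  // unlock before notifying so the woken thread can take the mutex
        not_empty_.notify_one();
    }

    // Block (without polling) until an item is available, then return it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks the queue after spurious wakeups.
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        assert(!queue_.empty());  // holds whenever wait() returns
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};
```

A consumer thread would simply loop on pop(); it sleeps inside wait() until a producer calls push().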

There are two approaches to this. The simplest is to have an asynchronous queue, implemented using a mutex and condition variable, on which a thread blocks, waiting for another thread to push something onto the queue. This is a very common idiom for task dispatching and here are two simple implementations:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html and
http://cxx-gtk-utils.sourceforge.net/2.2/classCgu_1_1AsyncQueueDispatch.html
By using a list rather than a deque as the queue container you can allocate new nodes outside the queue's mutex which significantly improves performance under high contention (see the source code for the second link mentioned above for an example using std::list::splice to achieve this).
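The splice idea can be sketched roughly as follows. This is an illustration written for this note, not the code from the linked library; SpliceQueue is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical queue that allocates and frees list nodes outside the lock.
template <typename T>
class SpliceQueue {
public:
    void push(T value) {
        // The node is allocated here, in a private one-element list,
        // before the mutex is taken.
        std::list<T> node;
        node.push_back(std::move(value));
        {
            std::lock_guard<std::mutex> lock(mutex_);
            // splice only relinks pointers: no allocation under the lock.
            items_.splice(items_.end(), node);
        }
        not_empty_.notify_one();
    }

    T pop() {
        std::list<T> node;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            not_empty_.wait(lock, [this] { return !items_.empty(); });
            // Move the first node into the private list, again by relinking.
            node.splice(node.end(), items_, items_.begin());
        }
        assert(!node.empty());
        T value = std::move(node.front());
        return value;  // the node itself is freed after the lock is released
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::list<T> items_;
};
```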
Instead of having a designated thread block on the asynchronous queue, after a thread places an item on the queue it could instead invoke an event in the program's event loop which executes a callback which extracts the item from the queue and does something with it. Implementing this is more OS-specific but see http://www.appinf.com/docs/poco/Poco.NotificationQueue.html and http://cxx-gtk-utils.sourceforge.net/2.2/classCgu_1_1Notifier.html for different approaches to this.

Related

Thread communication in C++ using pthreads

There are two threads T1 and T2
class Sender{
public:
void sendMessage();
};
class Reciever{
public:
void getMessage(string msg);
};
Consider that Sender S is in thread T1 and Reciever R is in thread T2. I need S.sendMessage() to communicate with object R so that R executes getMessage(string msg). How can I do that? The producer-consumer approach might help, but this is a one-time requirement, so is it really necessary to maintain a common queue? Please help.
Condition variables are what you are looking for. They allow a thread to wait (blocking) for an event sent from another thread.
You correctly discerned that you do not need a producer-consumer queue if there is only one producer and only one consumer and only a single message is passed.
So, your receiver thread calls getMessage (which should either return a string, or take the string as a reference parameter), which internally waits for a condition variable. Then, in the sender thread, you notify the condition variable inside sendMessage. This wakes up the receiver thread.
Edit: Although you are asking a pthread-specific question, pthread has an equivalent of C++'s std::condition_variable. I recommend you use C++11's utilities instead of talking to pthreads directly, as they are easier to use.
Edit 2: You cannot just make another thread execute some function. The only thing you can do between threads is communication, so if you want to have some reaction in another thread to something you do in your thread, the other thread has to be actively waiting for you to trigger this event (by notifying a condition variable or similar).
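A rough sketch of that single-message handoff with C++11 primitives. The Mailbox class and its send/receive names are hypothetical; in the question's terms, sendMessage would call send() and getMessage would call receive().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical one-slot channel shared by the sender and the receiver.
class Mailbox {
public:
    // Called in the sender thread: store the message, then notify.
    void send(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            message_ = std::move(msg);
            ready_ = true;
        }
        cv_.notify_one();
    }

    // Called in the receiver thread: block until a message has been sent.
    std::string receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The ready_ flag guards against spurious wakeups and against
        // send() having run before receive() started waiting.
        cv_.wait(lock, [this] { return ready_; });
        assert(ready_);
        ready_ = false;
        return std::move(message_);
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::string message_;
    bool ready_ = false;
};
```

The flag matters: a bare notify with no waiter is lost, so the receiver must check a piece of shared state rather than rely on the notification alone.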
The standard way combines a std::queue with a mutex and a condition variable. The mutex is used by the condition variable and protects the queue. The receiver waits until the queue is not empty and then pops the message from the queue. The sender pushes the message onto the queue.
When only one type of message is needed, a queue of message objects is enough; if several types are needed, make the queue dynamic by sending shared pointers to messages.

Libuv: protecting the event loop from concurrent accesses

I would like to know what precautions are needed to be able to safely add callbacks to a libuv event loop from multiple threads in C++.
More details
I have some multi-threaded C++11 code that I want to modify to make use of libuv's network communication API. I do not want to create a new libuv event loop every time network communication is required (for that would use up resources). So I created a libuv loop in a separate thread (I prevent the loop from closing by registering a "keep-alive" timer). This event loop is currently passed to other threads using a singleton. Callbacks are then registered (from other threads) while the loop is running.
I am worried about concurrent accesses to the libuv event loop when registering new callbacks: when calling uv_tcp_init the loop is explicitly passed (rather, a pointer to the loop); when calling uv_tcp_connect the loop is not explicitly mentioned but a pointer to it is stored in the uv_tcp_t struct passed. I haven't checked whether any of the above-mentioned functions actually modify the loop, but my intuition is that at least one of them must do so (otherwise, libuv couldn't keep track of active handles).
My first thought was to add a mutex attribute to the singleton used to access the event loop and use it to prevent concurrent access to the event loop when calling any of the above functions:
EventLoop & loop = EventLoop::get(); // Access the singleton
{
std::lock_guard<std::mutex> lock(loop.mutex_attribute);
// Register callbacks, etc
}
However, this does not protect the event loop from concurrent accesses between my thread (which successfully acquired the lock) and some libuv internal function (or a registered callback triggered by libuv) since the latter are not aware of my using a singleton to protect access.
Should I be worried about said concurrent accesses? What steps may I take to mitigate the risks?
The solution I settled for was to not add handles directly to the libuv event loop from other threads, but rather to have other threads add handles to a queue (stored in the same singleton as the pointer to the event loop). Access to the queue is protected by a mutex.
The "keep-alive" timer then periodically empties the queue (the timer callback is aware of the mutex protecting the queue) by:
getting the first handle from the queue,
registering that handle with the libuv event loop (since we register the handle from a callback within the libuv event loop, there shouldn't be any risks of concurrent access), and performing any other operation needed on this handle (in my case, call uv_tcp_init and uv_tcp_connect),
repeating until the queue is empty.
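The queue-draining part of this can be sketched without touching libuv at all. PendingOps, post and drain are hypothetical names; in real code each queued callable would wrap the uv_tcp_init and uv_tcp_connect calls, and drain() would be invoked from the timer callback on the loop thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical holder for work that must run on the event-loop thread.
class PendingOps {
public:
    // Safe to call from any thread.
    void post(std::function<void()> op) {
        assert(op && "an empty std::function cannot be run later");
        std::lock_guard<std::mutex> lock(mutex_);
        ops_.push(std::move(op));
    }

    // Call only from the loop thread (e.g. the keep-alive timer callback).
    // Returns how many operations were run.
    std::size_t drain() {
        std::queue<std::function<void()>> local;
        {
            // Hold the mutex only long enough to take the whole batch.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(local, ops_);
        }
        std::size_t count = 0;
        while (!local.empty()) {
            local.front()();  // runs on the loop thread, outside the lock
            local.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> ops_;
};
```

Swapping the batch out before running it means a callback can itself call post() without deadlocking. libuv also documents uv_async_send as callable from any thread, which could replace the periodic timer as the wake-up, though that goes beyond what the answer describes.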

C++, What and/or where is a pthread executing?

I have built a multi-threaded producer-consumer (add to a Queue, consume off the queue using numerous threads), but I am trying to optimize this further by sending a new produce() directly to the execution threads, if they are idle (instead of enqueue-ing it onto the queue).
So, I need to figure out where a thread is currently executing (is it currently conditionally waiting, or is it executing something). Can anyone suggest a way to do this?
If the execution thread is idle, won't it be waiting on the queue? The fastest way to get it some work to do is probably just pushing the work onto the queue.
Do you have reason to believe that the queue is a bottleneck?
That's what the queue should already do.
First, the thread can't be idle unless the queue is empty, right?
So what does your "enqueue and signal" operation do? It puts a pointer to the data where the thread can find it and then tells the thread to work on the data. That's the minimum task to do what you want to do anyway.
So no optimization should be possible.
You could have a global flag for each thread indicating whether it is waiting or not. Just set the flag before going into pthread_cond_wait and reset it when released.
Having said this, I really don't see why you would want to venture away from the classic task queue pattern. It works well in most cases.
You can do this, but whether you actually want to is another matter - see the other posts.
First, forget about all the consumer threads waiting on a common semaphore. To do what you seem to want, waiting consumer threads have to be addressed by instance. To do this, a consumer that turns up, locks the queue and finds it empty needs to wait on an event of its very own. Also, the consumer needs to provide, in its 'pop' call, the address of where it wants the object put. So, in addition to the 'normal' object queue, consumer threads that need to wait need a struct containing a pointer and an event to wait on. You could create an array, or circular buffer, of these wait_structs when you create the P-C queue.
Then you're set.
PRODUCER: (calls push with an object ref/ptr)
Acquires queue lock and checks the list of wait_structs. If there is an entry, it loads its object into the address pointed to by the wait_struct pointer, (so 'sending a new produce() directly to the execution thread'), and signals the wait_struct event. If there is no entry in the list of wait_structs, the producer queues its object in the object queue. Oh yes - releases the queue lock :)
CONSUMER: (calls pop with the address where it wants an object ref put)
Acquires queue lock and checks the object queue count. If it's non-zero, it pops the object, shoves it into the target address it provided, releases the lock and runs on. If the object queue is empty, the consumer gets a free wait_struct in the list of wait_structs, sets the pointer to the value it passed in, releases the queue lock and waits on the event. When the event gets signaled, the consumer already has its object, (shoved in by the producer), and can just run on - no need to visit the PC-queue again.
Yes, this design works, (in Delphi, anyway - should work in C++), and is faster than a 'classic' semaphore-based PC-queue, (which is faster than a Windows Message Queue, which is faster than an IOCP queue).
I have got it working with a timeout - I'll let you figure out how to do that. (Hint - you have to make use of the consumer object location, (that is addressed by the pointer passed in), as temporary storage :)
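For what it's worth, the producer/consumer steps above might look like this in C++11. This is a sketch under my own assumptions, not the Delphi design itself: HandoffQueue and Waiter are invented names, the wait_struct lives on the consumer's stack instead of in a preallocated array, and there is no timeout.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

template <typename T>
class HandoffQueue {
    // The "wait_struct": a target address plus a private event.
    struct Waiter {
        T* target = nullptr;
        bool filled = false;
        std::condition_variable event;
    };

public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!waiters_.empty()) {
            // A consumer is waiting: load the object straight into it.
            Waiter* w = waiters_.front();
            waiters_.pop_front();
            assert(w->target != nullptr);
            *w->target = std::move(value);
            w->filled = true;
            // Notify while still holding the lock: the Waiter lives on the
            // consumer's stack and may vanish once the lock is released.
            w->event.notify_one();
        } else {
            items_.push_back(std::move(value));
        }
    }

    void pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!items_.empty()) {
            out = std::move(items_.front());
            items_.pop_front();
            return;
        }
        Waiter self;
        self.target = &out;
        waiters_.push_back(&self);
        // On wakeup the object is already in 'out'; no second queue visit.
        self.event.wait(lock, [&self] { return self.filled; });
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
    std::deque<Waiter*> waiters_;
};

// Usage sketch: whichever thread gets the lock first, the consumer ends
// up with the pushed value.
int handoff_demo() {
    HandoffQueue<int> q;
    int received = 0;
    std::thread consumer([&] { q.pop(received); });
    q.push(42);
    consumer.join();
    return received;
}
```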

pthread pool, C++

I am working on a networking program in C++ and I'd like to implement a pthread pool. Whenever I receive an event from the receive socket, I will put the data into the thread pool's queue. I am thinking of creating 5 separate threads that constantly check the queue to see whether there is any incoming data to process.
This is quite a straightforward topic, but I am not an expert, so I would like to hear anything that might help me implement it.
Please point me to any tutorials or references, or mention any problems I should be aware of.
Use Boost.Asio and have each thread in the pool invoke io_service::run().
Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked. This approach may also be used with io_service::post() to use a means to perform any computational tasks across a thread pool.
Note that all threads that have joined an io_service's pool are considered equivalent, and the io_service may distribute work across them in an arbitrary fashion.
Before I start.
Use boost::threads
If you want to know how to do it with pthread's then you need to use the pthread condition variables. These allow you to suspend threads that are waiting for work without consuming CPU.
When an item of work is added to the queue you signal the condition variable and one pthread will be released from the condition variable thus allowing it to take an item from the queue. When the thread finishes processing the work item it returns back to the condition variable to await the next piece of work.
The main loop for the threads in the loop should look like this;
ThreadWorkLoop() // The function that all the pool threads run.
{
while(poolRunning)
{
WorkItem = GetWorkItem(); // Get an item from the queue. This suspends until an item is available.
WorkItem->run(); // Then run it.
}
}
GetWorkItem()
{
Locker lock(mutex); // RAII: Lock/unlock mutex
while(workQueue.size() == 0)
{
conditionVariable.wait(mutex); // Waiting on a condition variable suspends a thread
} // until the condition variable is signalled.
// Note: the mutex is unlocked while the thread is suspended
return workQueue.popItem();
}
AddItemToQueue(item)
{
Locker lock(mutex);
workQueue.pushItem(item);
conditionVariable.signal(); // Release a thread from the condition variable.
}
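The pseudocode above translates fairly directly to C++11. The following is only a sketch: ThreadPool, addItem and workLoop are hypothetical names, work items are plain std::function objects, and shutdown (which the pseudocode leaves out) is handled by a flag checked under the same mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t threads) {
        assert(threads > 0);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { workLoop(); });
    }

    // Finishes the queued work, then stops and joins all workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    void addItem(std::function<void()> item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            work_.push(std::move(item));
        }
        wake_.notify_one();  // release one thread from the condition variable
    }

private:
    void workLoop() {
        for (;;) {
            std::function<void()> item;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Suspended here without using CPU until work or shutdown.
                wake_.wait(lock, [this] { return !running_ || !work_.empty(); });
                if (work_.empty()) return;  // shutting down, nothing left
                item = std::move(work_.front());
                work_.pop();
            }
            item();  // run the work item outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> work_;
    bool running_ = true;
    std::vector<std::thread> workers_;
};
```

For the question's setup, the receive thread would call addItem() for each incoming packet and the pool would be constructed with 5 threads.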
Have the receive thread to push the data on the queue and the 5 threads popping it. Protect the queue with a mutex and let them "fight" for the data.
You also want to have a usleep() or pthread_yield() in the worker thread's main loop
You will need a mutex and a conditional variable. Mutex will protect your job queue and when receiving threads add a job to the queue it will signal the condition variable. The worker threads will wait on the condition variable and will wake up when it is signaled.
Boost.Asio is a good solution.
But if you don't want to use it (or can't use it for whatever reason), then you'll probably want to use a semaphore-based implementation.
You can find a multithreaded queue implementation based on semaphores that I use here:
https://gist.github.com/482342
The reason for using semaphores is that you can avoid having the worker threads continually polling, and instead have them woken up by the OS when there is work to be done.

Lightest synchronization primitive for worker thread queue

I am about to implement a worker thread with work item queuing, and while I was thinking about the problem, I wanted to know if I'm doing the best thing.
The thread in question will have to have some thread local data (preinitialized at construction) and will loop on work items until some condition will be met.
pseudocode:
volatile bool run = true;
int WorkerThread(param)
{
localclassinstance c1 = new c1();
[other initialization]
while(true) {
[LOCK]
[unqueue work item]
[UNLOCK]
if([hasWorkItem]) {
[process data]
[PostMessage with pointer to data]
}
[Sleep]
if(!run)
break;
}
[uninitialize]
return 0;
}
I guess I will do the locking via critical section, as the queue will be std::vector or std::queue, but maybe there is a better way.
The part with Sleep doesn't look too great: there will be a lot of extra sleeping with big Sleep values, or lots of extra locking when the Sleep value is small, and that's definitely unnecessary.
But I can't think of a WaitForSingleObject-friendly primitive I could use instead of a critical section, as there might be two threads queuing work items at the same time. So an Event, which seems to be the best candidate, can lose the second work item if the Event was already set, and it doesn't guarantee mutual exclusion.
Maybe there is even a better approach with InterlockedExchange kind of functions that leads to even less serialization.
P.S.: I might need to preprocess the whole queue and drop the obsolete work items during the unqueuing stage.
There are a multitude of ways to do this.
One option is to use a semaphore for the waiting. The semaphore is signalled every time a value is pushed on the queue, so the worker thread will only block if there are no items in the queue. This will still require separate synchronization on the queue itself.
A second option is to use a manual-reset event which is set when there are items in the queue and cleared when the queue is empty. Again, you will need to do separate synchronization on the queue.
A third option is to have an invisible message-only window created on the thread, and use a special WM_USER or WM_APP message to post items to the queue, attaching the item to the message via a pointer.
Another option is to use condition variables. The native Windows condition variables only work if you're targeting Windows Vista or Windows 7, but condition variables are also available for Windows XP with Boost or an implementation of the C++0x thread library. An example queue using boost condition variables is available on my blog: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
It is possible to share a resource between threads without using blocking locks at all, if your scenario meets certain requirements.
You need an atomic pointer exchange primitive, such as Win32's InterlockedExchange. Most processor architectures provide some sort of atomic swap, and it's usually much less expensive than acquiring a formal lock.
You can store your queue of work items in a pointer variable that is accessible to all the threads that will be interested in it. (global var, or field of an object that all the threads have access to)
This scenario assumes that the threads involved always have something to do, and only occasionally "glance" at the shared resource. If you want a design where threads block waiting for input, use a traditional blocking event object.
Before anything begins, create your queue or work item list object and assign it to the shared pointer variable.
Now, when producers want to push something onto the queue, they "acquire" exclusive access to the queue object by swapping a null into the shared pointer variable using InterlockedExchange. If the swap returns null, then somebody else is currently modifying the queue object. Sleep(0) to release the rest of your thread's time slice, then loop to retry the swap until it returns non-null. Even if you end up looping a few times, this is many, many times faster than making a kernel call to acquire a mutex object. Kernel calls require hundreds of clock cycles to transition into kernel mode.
When you successfully obtain the pointer, make your modifications to the queue, then swap the queue pointer back into the shared pointer.
When consuming items from the queue, you do the same thing: swap a null into the shared pointer and loop until you get a non-null result, operate on the object in the local var, then swap it back into the shared pointer var.
This technique is a combination of atomic swap and brief spin loops. It works well in scenarios where the threads involved are not blocked and collisions are rare. Most of the time the swap will give you exclusive access to the shared object on the first try, and as long as the length of time the queue object is held exclusively by any thread is very short then no thread should have to loop more than a few times before the queue object becomes available again.
If you expect a lot of contention between threads in your scenario, or you want a design where threads spend most of their time blocked waiting for work to arrive, you may be better served by a formal mutex synchronization object.
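A portable sketch of the swap-and-spin technique, using std::atomic's exchange in place of Win32's InterlockedExchange and std::this_thread::yield in place of Sleep(0). Both substitutions, and the name SwapGuardedQueue, are my assumptions rather than part of the answer.

```cpp
#include <atomic>
#include <cassert>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class SwapGuardedQueue {
public:
    // Create the queue up front and publish it in the shared pointer.
    SwapGuardedQueue() : slot_(new std::queue<T>()) {}
    ~SwapGuardedQueue() { delete slot_.load(); }
    SwapGuardedQueue(const SwapGuardedQueue&) = delete;
    SwapGuardedQueue& operator=(const SwapGuardedQueue&) = delete;

    void push(T value) {
        std::queue<T>* q = acquire();
        q->push(std::move(value));  // note: an exception here would leave the slot null
        release(q);
    }

    // Non-blocking: returns false when the queue is empty.
    bool tryPop(T& out) {
        std::queue<T>* q = acquire();
        bool ok = !q->empty();
        if (ok) {
            out = std::move(q->front());
            q->pop();
        }
        release(q);
        return ok;
    }

private:
    // Swap null in; a non-null result means we now hold the queue exclusively.
    std::queue<T>* acquire() {
        for (;;) {
            std::queue<T>* q = slot_.exchange(nullptr, std::memory_order_acquire);
            if (q != nullptr) return q;
            std::this_thread::yield();  // someone else holds it; give up the time slice
        }
    }

    // Put the pointer back so other threads can acquire it.
    void release(std::queue<T>* q) {
        assert(q != nullptr);
        slot_.store(q, std::memory_order_release);
    }

    std::atomic<std::queue<T>*> slot_;
};
```

As the answer says, this suits threads that only occasionally glance at the queue; a consumer with nothing else to do would spin on tryPop(), which is exactly the polling the blocking designs avoid.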
The fastest locking primitive is usually a spin-lock or spin-sleep-lock. CRITICAL_SECTION is just such a (user-space) spin-sleep-lock.
(Well, aside from not using locking primitives at all of course. But that means using lock-free data-structures, and those are really really hard to get right.)
As for avoiding the Sleep: have a look at condition-variables. They're designed to be used together with a "mutex", and I think they're much easier to use correctly than Windows' EVENTs.
Boost.Thread has a nice portable implementation of both, fast user-space spin-sleep-locks and condition variables:
http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref
A work-queue using Boost.Thread could look something like this:
template <class T>
class Queue : private boost::noncopyable
{
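public:
// m_maxSize is const, so it has to be initialized in a constructor.
explicit Queue(size_t maxSize) : m_maxSize(maxSize) {}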
void Enqueue(T const& t)
{
unique_lock lock(m_mutex);
// wait until the queue is not full
while (m_backingStore.size() >= m_maxSize)
m_queueNotFullCondition.wait(lock); // releases the lock temporarily
m_backingStore.push_back(t);
m_queueNotEmptyCondition.notify_all(); // notify waiters that the queue is not empty
}
T DequeueOrBlock()
{
unique_lock lock(m_mutex);
// wait until the queue is not empty
while (m_backingStore.empty())
m_queueNotEmptyCondition.wait(lock); // releases the lock temporarily
T t = m_backingStore.front();
m_backingStore.pop_front();
m_queueNotFullCondition.notify_all(); // notify waiters that the queue is not full
return t;
}
private:
typedef boost::recursive_mutex mutex;
typedef boost::unique_lock<boost::recursive_mutex> unique_lock;
size_t const m_maxSize;
mutex mutable m_mutex;
boost::condition_variable_any m_queueNotEmptyCondition;
boost::condition_variable_any m_queueNotFullCondition;
std::deque<T> m_backingStore;
};
There are various ways to do this
For one, you could create an event called 'run' instead and use it to detect when the thread should terminate; the main thread then signals it. Instead of Sleep you would then use WaitForSingleObject with a timeout, so that the thread quits immediately instead of waiting out the sleep interval.
Another way is to accept messages in your loop and then invent a user defined message that you post to the thread
EDIT: depending on situation it may also be wise to have yet another thread that monitors this thread to check if it is dead or not, this can be done by the above mentioned message queue so replying to a certain message within x ms would mean that the thread hasn't locked up.
I'd restructure a bit:
WorkItem GetWorkItem()
{
while(true)
{
WaitForSingleObject(queue.Ready, INFINITE); // block until an item is (possibly) ready
{
ScopeLock lock(queue.Lock);
if(!queue.IsEmpty())
{
return queue.GetItem();
}
}
}
}
int WorkerThread(param)
{
bool done = false;
do
{
WorkItem work = GetWorkItem();
if( work.IsQuitMessage() )
{
done = true;
}
else
{
work.Process();
}
} while(!done);
return 0;
}
Points of interest:
ScopeLock is a RAII class to make critical section usage safer.
Block on event until workitem is (possibly) ready - then lock while trying to dequeue it.
don't use a global "IsDone" flag, enqueue special quitmessage WorkItems.
You can have a look at another approach here that uses C++0x atomic operations
http://www.drdobbs.com/high-performance-computing/210604448
Use a semaphore instead of an event.
Keep the signaling and synchronizing separate. Something along these lines...
// in main thread
HANDLE events[2];
events[0] = CreateEvent(...); // for shutdown
events[1] = CreateEvent(...); // for work to do
// start thread and pass the events
// in worker thread
DWORD ret;
while (true)
{
ret = WaitForMultipleObjects(2, events, FALSE, <timeout val or INFINITE>);
if shutdown
return
else if do-work
enter crit sec
unqueue work
leave crit sec
etc.
else if timeout
do something else that has to be done
}
Given that this question is tagged windows, I'll answer thus:
Don't create 1 worker thread. Your worker thread jobs are presumably independent, so you can process multiple jobs at once? If so:
In your main thread call CreateIoCompletionPort to create an I/O completion port object.
Create a pool of worker threads. The number you need to create depends on how many jobs you might want to service in parallel. Some multiple of the number of CPU cores is a good start.
Each time a job comes in, call PostQueuedCompletionStatus(), passing a pointer to the job struct as the lpOverlapped parameter.
Each worker thread calls GetQueuedCompletionStatus(), retrieves the work item from the lpOverlapped pointer, and does the job before returning to GetQueuedCompletionStatus().
This looks heavy, but io completion ports are implemented in kernel mode and represent a queue that can be deserialized into any of the worker threads associated with the queue (i.e. waiting on a call to GetQueuedCompletionStatus). The io completion port knows how many of the threads that are processing an item are actually using a CPU vs blocked on an IO call - and will release more worker threads from the pool to ensure that the concurrency count is met.
So, it's not lightweight, but it is very, very efficient. An I/O completion port can be associated with pipe and socket handles, for example, and can dequeue the results of asynchronous operations on those handles. I/O completion port designs can scale to handling tens of thousands of socket connections on a single server, but on the desktop they also make a very convenient way of scaling the processing of jobs over the 2 or 4 cores now common in desktop PCs.