Multirate threads - c++

I ran recently into a requirement in which there is a need for multithreaded application whose threads run at different rates.
The questions then become, since i am still learning multithreading:
A scenario is given to put things into perspective:
Say 1st thread runs at 100 Hz "real time"
2nd runs at 10 Hz
and say that the 1st thread provides data "myData" to the 2nd thread.
How is myData going to be provided to the 2nd thread, is the common practice to just read whatever is available from the first thread, or there need to be some kind of decimation to reduce the rate.
Does the myData need to be some kind of Singleton with locking mechanism. Although myData isn't shared, but rather updated by the first thread and used in the second thread.
How about the opposite case, when the data used in one thread need to be used at higher rate in a different thread.

How is myData going to be provided to the 2nd thread
One common method is to provide a FIFO queue -- this could be a std::dequeue or a linked list, or whatever -- and have the producer thread push data items onto one end of the queue while the consumer thread pops the data items off of the other end of the queue. Be sure to serialize all accesses to the FIFO queue (using a mutex or similar locking mechanism), to avoid race conditions.
Alternatively, instead of a queue you could have a single shared data object (essentially a queue of length one) and have your producer thread overwrite the object every time it generates new data. This could be done in cases where it's not important that the consumer thread sees every piece of data that was generated, but rather it's only important that it sees the most recent data. You'd still need to do the locking, though, to avoid the risk of the consumer thread reading from the data object at the same time the producer thread is in the middle of writing to it.
or does there need to be some kind of decimation to reduce the rate.
There doesn't need to be any decimation -- the second thread can just read in as much data as there is available to read, whenever it wakes up.
Does the myData need to be some kind of Singleton with locking
mechanism.
Singleton isn't necessary (although it's possible to do it that way). The locking mechanism is necessary, unless you have some kind of lock-free synchronization mechanism (and if you're asking this level of question, you don't have one and you don't want to try to get one either -- keep things simple for now!)
How about the opposite case, when the data used in one thread need to
be used at higher rate in a different thread.
It's the same -- if you're using a proper inter-thread communications mechanism, the rates at which the threads wake up doesn't matter, because the communications mechanism will do the right thing regardless of when or how often the the threads wake up.

Any multithreaded program has to cope with the possibility that one of the threads will work faster than another - by any ratio - even if they're executing on the same CPU with the same clock frequency.
Your choices include:
producer-consumer container than lets the first thread enqueue data, and the second thread "pop" it off for processing: you could let the queue grow as large as memory allows, or put some limit on the size after which either data would be lost or the 1st thread would be forced to slow down and wait to enqueue further values
there are libraries available (e.g. boost), or if you want to implement it yourself google some tutorials/docs on mutex and condition variables
do something conceptually similar to the above but where the size limit is 1 so there's just the single myData variable rather than a "container" - but all the synchronisation and delay choices remain the same
The Singleton pattern is orthogonal to your needs here: the two threads do need to know where the data is, but that would normally be done using e.g. a pointer argument to the function(s) run in the threads. Singleton's easily overused and best avoided unless reasons stack up high....

Related

cout on extra thread - thread safety

I have a very time sensitive task in my main thread. However, I would also like to simultaneously print some information about this task.
Problem: cout takes some time to execute and therefore slows down the time sensitive task in the main thread.
Idea: I thought about creating an extra thread which handles the output. To communicate between the main thread and the newly created thread, I thought about a vector which includes the strings that should be printed. In the newly created thread an infinite while loop would print these strings one after another.
Problem with the idea: Vectors are not thread safe. Therefore, I'm worried that locking and unlocking the vector will take nearly as much time as it would take when calling cout directly in the main thread.
Question: Is there an alternative to locking/unlocking the vector? Are my worries regarding the locking of the vector misguided? Would you take a whole different approach to solve the problem?
Depending on how time-sensitive the task is, I'd probably build up a vector of outputs in the producer thread, then pass the whole vector to the consumer thread (and repeat as needed).
The queue between the two will need to be thread safe, but you can keep the overhead minuscule by passing a vector every, say, 50-100 ms or so. This is still short enough to look like real-time to most observers, but is long enough to keep the overhead of locking far too low to care about in most cases.
You could use the idea one often sees in "interrupt" programming - send data from the thread to a ring buffer. Then, in another thread, print from the ring buffer. Actually, in the "good old days", one could write a ring buffer without any "atomics" (and can still do on some embedded systems).
But, even with atomics, ring buffers are not hard to write. There is one implementation here: c++ threadsafe ringbuffer implementation (untested, but on the first look seems OK).

Best practices for inter thread communication of large objects

I need to be extremely concerned with speed/latency in my current multi-threaded project.
Consider the following inter-thread communication:
Thread #1: Processes a large object, call it A, that it receives from network events. This data significantly alters its internal state.
Thread #2: Needs to know about the current, altered internal state of object A to make some set of decisions.
I can think of essentially two methods to proceed:
Thread #2 has a pointer to object A, and when signaled (say, through constantly checking a small object sent via a lockfree queue or by checking a shared atomic bool), Thread #2 locks object A and reads it.
Thread #1 pushes some version of a copy of the object onto the lockfree queue so that thread #2 can simply receive it directly, use it, and dispose of the copy when it is finished.
Method #1 avoids the costly copy of a large object needed for Method #2, but it always requires a lock/unlock and, if I understand things correctly, additional L3 cache hits.
I understand there may not be any simple performance answers...that I may simply need to benchmark. I am mostly interested in best practices advice. Specifically, I'd like to know to think about the problem a cache memory level to know how information is being passed and copied internally.
It really depends on how much information thread 2 is trying to access from thread 1. If it needs complete access of all of thread 1's objects/memory then yeah, copying all that is costly and I'd do a lock. But it also depends on the time. If all thread 2 needs is to check a state like one variable I would have thread 1 update a smaller object outside of itself that can be shared without locks since thread 2 only reads while thread 1 only writes.
Also you might not even need locking if thread 2 only reads and is ok with making a decision on a data state as it updates.

what's the advantage of message queue over shared data in thread communication?

I read a article about multithread program design http://drdobbs.com/architecture-and-design/215900465, it says it's a best practice that "replacing shared data with asynchronous messages. As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data".
What confuse me is that I don't see the difference between using shared data and message queues. I am now working on a non-gui project on windows, so let's use windows's message queues. and take a tradition producer-consumer problem as a example.
Using shared data, there would be a shared container and a lock guarding the container between the producer thread and the consumer thread. when producer output product, it first wait for the lock and then write something to the container then release the lock.
Using message queue, the producer could simply PostThreadMessage without block. and this is the async message's advantage. but I think there must exist some lock guarding the message queue between the two threads, otherwise the data will definitely corrupt. the PostThreadMessage call just hide the details. I don't know whether my guess is right but if it's true, the advantage seems no longer exist,since both two method do the same thing and the only difference is that the system hide the details when using message queues.
ps. maybe the message queue use a non-blocking containner, but I could use a concurrent container in the former way too. I want to know how the message queue is implemented and is there any performance difference bwtween the two ways?
updated:
I still don't get the concept of async message if the message queue operations are still blocked somewhere else. Correct me if my guess was wrong: when we use shared containers and locks we will block in our own thread. but when using message queues, myself's thread returned immediately, and left the blocking work to some system thread.
Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided. It's much easier to implement than is shared memory for intercomputer communication. Also, as you've already noticed, message passing has the advantage that application developers don't need to worry about the details of protections like shared memory.
Shared memory allows maximum speed and convenience of communication, as it can be done at memory speeds when within a computer. Shared memory is usually faster than message passing, as message-passing are typically implemented using system calls and thus require the more time-consuming tasks of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish shared-memory regions. Once established, all access are treated as normal memory accesses w/o extra assistance from the kernel.
Edit: One case that you might want implement your own queue is that there are lots of messages to be produced and consumed, e.g., a logging system. With the implemenetation of PostThreadMessage, its queue capacity is fixed. Messages will most liky get lost if that capacity is exceeded.
Imagine you have 1 thread producing data,and 4 threads processing that data (presumably to make use of a multi core machine). If you have a big global pool of data you are likely to have to lock it when any of the threads needs access, potentially blocking 3 other threads. As you add more processing threads you increase the chance of a lock having to wait and increase how many things might have to wait. Eventually adding more threads achieves nothing because all you do is spend more time blocking.
If instead you have one thread sending messages into message queues, one for each consumer thread then they can't block each other. You stil have to lock the queue between the producer and consumer threads but as you have a separate queue for each thread you have a separate lock and each thread can't block all the others waiting for data.
If you suddenly get a 32 core machine you can add 20 more processing threads (and queues) and expect that performance will scale fairly linearly unlike the first case where the new threads will just run into each other all the time.
I have used a shared memory model where the pointers to the shared memory are managed in a message queue with careful locking. In a sense, this is a hybrid between a message queue and shared memory. This is very when large quantities of data must be passed between threads while retaining the safety of the message queue.
The entire queue can be packaged in a single C++ class with appropriate locking and the like. The key is that the queue owns the shared storage and takes care of the locking. Producers acquire a lock for input to the queue and receive a pointer to the next available storage chunk (usually an object of some sort), populates it and releases it. The consumer will block until the next shared object has released by the producer. It can then acquire a lock to the storage, process the data and release it back to the pool. In A suitably designed queue can perform multiple producer/multiple consumer operations with great efficiency. Think a Java thread safe (java.util.concurrent.BlockingQueue) semantics but for pointers to storage.
Of course there is "shared data" when you pass messages. After all, the message itself is some sort of data. However, the important distinction is when you pass a message, the consumer will receive a copy.
the PostThreadMessage call just hide the details
Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it right.
I still don't get the concept of async message if the message queue operations are still blocked somewhere else.
The advantage is more safety. You have a locking mechanism that is systematically enforced when you are passing a message. You don't even need to think about it, you can't forget to lock. Given that multi-thread bugs are some of the nastiest ones (think of race conditions), this is very important. Message passing is a higher level of abstraction built on locks.
The disadvantage is that passing large amounts of data would be probably slow. In that case, you need to use need shared memory.
For passing state (i.e. worker thread reporting progress to the GUI) the messages are the way to go.
It's quite simple (I'm amazed others wrote such length responses!):
Using a message queue system instead of 'raw' shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.
With a message-based system, you can think in higher terms of "messages" without having to worry about synchronization issues anymore. For what it's worth, it's perfectly possible that a message queue is implemented using shared data internally.
I think this is the key piece of info there: "As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data". I.e. use producer-consumer :)
You can do your own message passing or use something provided by the OS. That's an implementation detail (needs to be done right ofc). The key is to avoid shared data, as in having the same region of memory modified by multiple threads. This can cause hard to find bugs, and even if the code is perfect it will eat performance because of all the locking.
I had exact the same question. After reading the answers. I feel:
in most typical use case, queue = async, shared memory (locks) = sync. Indeed, you can do a async version of shared memory, but that's more code, similar to reinvent the message passing wheel.
Less code = less bug and more time to focus on other stuff.
The pros and cons are already mentioned by previous answers so I will not repeat.

How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables?

I have a multi-threaded application that is using pthreads. I have a mutex() lock and condition variables(). There are two threads, one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real time fashion such that one chuck is processed as close to the elapsing of a fixed time period as possible.
This works pretty well, however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition upon which the worker is waiting, it does a chuck of processing for the worker if it is time to process another chuck, then immediately upon receiving the condition in the worker thread, it also does a chuck of processing if it is time to process another chuck.
In this later case, I am seeing that I am late processing the chuck many times. I'd like to eliminate this lost efficiency and do what I can to keep the chucks ticking away as close to possible to the desired frequency.
Is there anything I can do to reduce the delay between the release condition from the producer and the detection that that condition is released such that the worker resumes processing? For example, would it help for the producer to call something to force itself to be context switched out?
Bottom line is the worker has to wait each time it asks the producer to create work for itself so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access. Somehow my hand off back to running in parallel again results in significant delay occasionally that I would like to avoid. Please suggest how this might be best accomplished.
I could suggest the following pattern. Generally the same technique could be used, e.g. when prebuffering frames in some real-time renderers or something like that.
First, it's obvious that approach that you describe in your message would only be effective if both of your threads are loaded equally (or almost equally) all the time. If not, multi-threading would actually benefit in your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding and a processing thread. First of them prepares chunks of data to process, the second makes processing and stores the processing result somewhere (not actually important).
The effective way to make these threads work together is the proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't actually care about what would happen with that data. And, well, your buffer could be implemented as a simple FIFO queue. This means that your yielding thread should prepare data to process and make a PUSH call to your queue:
X = PREPARE_DATA()
BUFFER.LOCK()
BUFFER.PUSH(X)
BUFFER.UNLOCK()
Now, the processing thread. It's behaviour should be described this way (you should probably add some artificial delay like SLEEP(X) between calls to EMPTY)
IF !EMPTY(BUFFER) PROCESS(BUFFER.TOP)
The important moment here is what should your processing thread do with processed data. The obvious approach means making a POP call after the data is processed, but you will probably want to come with some better idea. Anyway, in my variant this would look like
// After data is processed
BUFFER.LOCK()
BUFFER.POP()
BUFFER.UNLOCK()
Note that locking operations in yielding and processing threads shouldn't actually impact your performance because they are only called once per chunk of data.
Now, the interesting part. As I wrote at the beginning, this approach would only be effective if threads act somewhat the same in terms of CPU / Resource usage. There is a way to make these threading solution effective even if this condition is not constantly true and matters on some other runtime conditions.
This way means creating another thread that is called controller thread. This thread would merely compare the time that each thread uses to process one chunk of data and balance the thread priorities accordingly. Actually, we don't have to "compare the time", the controller thread could simply work the way like:
IF BUFFER.SIZE() > T
DECREASE_PRIORITY(YIELDING_THREAD)
INCREASE_PRIORITY(PROCESSING_THREAD)
Of course, you could implement some better heuristics here but the approach with controller thread should be clear.

Fastest implementation of one thread providing data, many threads consuming data

I have a lot of data that I want to disseminate to many different threads. This data is coming from a single thread. The consuming threads can safely access the container simultaneously.
The data needs to be merged into the container ever delta seconds (50ms < delta < 1), during which time the consuming threads need to be locked out, but not blocked. Similarly, when the data producer wants to merge in the data, it should wait until any reading threads are finished (which should be fast), but no one else should start reading as the update needs to occur as soon as possible.
I'm working on linux (platform specific solution is perfectly fine/expected) and I care about every millisecond. What sort of locking mechanisms should I use or is there an even better model for this problem?
If there is only one data producer thread and memory is not a consideration, you may want to consider using a merge and swap algorithm.
In it, the writer thread creates a copy of the data structure while readers continue to use the original, merges in new changes, then performs an exchange of the two structures within a mutex or critical section (or reader/writer lock). If your Unix platform supports interlocked exchange as an atomic operation, you can perform a lock-free exchange maximizing read throughput through they implementation.
It looks like you need to use the pthread read/write locks. They allow you to restrict access to one writer OR multiple readers. Look at pthread_rwlock_init to initialize the lock, pthread_rwlock_rdlock to acquire the lock for reading data, and pthread_rwlock_wrlock to acquire the lock for writing data.
Sounds like a good use for pthread read-write locks along with some thread-safe queues. The producer thread inserts items into the queue. The worker pool will pull items off of the queue and process the data. I'm not sure how the output will work but you might want to use a thread-safe queue here as well... maybe a priority queue to automatically merge the data if it makes sense.
The locked queue construct is nothing more than a mutex for exclusive locking, a std::queue for data storage, and a condition variable to wake up threads that are waiting on the queue. The enqueue method grabs the lock, inserts into the queue, releases the lock, and signals the condition. The dequeue method grabs the mutex, waits on the condition using the mutex as a guard, and dequeues any data that is there when it is woken up. This is a pretty standard producer-consumer style queue.
Before you roll your own solution, you might want to check out Boost.MPI and Boost.Thread. They both provide nicer C++ interfaces over the underlying OS implementation. I've used Boost.Thread a lot but it doesn't provide a nice message passing interface, but it does improve over pthread.
If you are really into multi-processing, you might want to give Boost.MPI or maybe Apache Qpid serious consideration. I plan on looking into Qpid and AMPQ for future projects since they both provide nice message-based interfaces.