How can I improve real-time behavior in a multi-threaded app using pthreads and condition variables? - c++

I have a multi-threaded application that is using pthreads. I have a mutex lock and condition variables. There are two threads: one thread produces data for the second, a worker, which tries to process the produced data in real-time fashion, such that one chunk is processed as close to the elapsing of a fixed time period as possible.
This works pretty well; however, occasionally, when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition upon which the worker is waiting, it does a chunk of processing for the worker if it is time to process another chunk; then, immediately upon receiving the condition, the worker thread also does a chunk of processing if it is time to process another chunk.
In this latter case, I am seeing that I am frequently late processing the chunk. I'd like to eliminate this lost efficiency and do what I can to keep the chunks ticking away as close as possible to the desired frequency.
Is there anything I can do to reduce the delay between the producer signaling the condition and the worker detecting that it has been signaled and resuming processing? For example, would it help for the producer to call something that forces itself to be context-switched out?
Bottom line: the worker has to wait each time it asks the producer to create work for it, so that the producer can manipulate the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during it the producer also checks for real-time work to be done on behalf of the worker. Somehow my hand-off back to running in parallel occasionally incurs significant delay, which I would like to avoid. Please suggest how this might best be accomplished.

I can suggest the following pattern. Generally the same technique is used, for example, when prebuffering frames in some real-time renderers.
First, it's obvious that the approach you describe in your message would only be effective if both of your threads are loaded equally (or almost equally) all the time. If they are not, a more decoupled form of multi-threading would actually benefit your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding thread and a processing thread. The first of them prepares chunks of data to process; the second does the processing and stores the result somewhere (where exactly is not important here).
The effective way to make these threads work together is a proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't care what happens to that data afterwards. The buffer could be implemented as a simple FIFO queue. This means your yielding thread should prepare the data to process and make a PUSH call to the queue:
X = PREPARE_DATA()
BUFFER.LOCK()
BUFFER.PUSH(X)
BUFFER.UNLOCK()
Now, the processing thread. Its behaviour can be described this way (you should probably add some artificial delay, like SLEEP(X), between calls to EMPTY):
IF !EMPTY(BUFFER) PROCESS(BUFFER.TOP)
The important point here is what your processing thread should do with the processed data. The obvious approach is to make a POP call after the data is processed, but you may want to come up with a better idea. Anyway, in my variant it would look like this:
// After data is processed
BUFFER.LOCK()
BUFFER.POP()
BUFFER.UNLOCK()
Note that the locking operations in the yielding and processing threads shouldn't really impact your performance, because they are only called once per chunk of data.
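For concreteness, here is a minimal C++ sketch of such a buffer (Chunk is a placeholder for whatever data type is produced, and pop() combines TOP and POP for brevity). Note that instead of the SLEEP(X) polling suggested above, this variant blocks on a condition variable, which also addresses the wake-up latency the original question asks about:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

struct Chunk { /* the produced data */ };

struct ChunkBuffer {
    std::queue<Chunk> queue;
    std::mutex mutex;
    std::condition_variable not_empty;

    // Called by the yielding thread: BUFFER.LOCK / PUSH / UNLOCK.
    void push(Chunk chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            queue.push(std::move(chunk));
        }
        not_empty.notify_one();  // wake the processing thread
    }

    // Called by the processing thread: blocks until a chunk is available.
    Chunk pop() {
        std::unique_lock<std::mutex> lock(mutex);
        not_empty.wait(lock, [this] { return !queue.empty(); });
        Chunk chunk = std::move(queue.front());
        queue.pop();
        return chunk;
    }
};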
Now, the interesting part. As I wrote at the beginning, this approach is only effective if both threads are loaded roughly equally in terms of CPU / resource usage. There is a way to make the solution effective even when that condition does not constantly hold and depends on other runtime conditions.
That way is to create another thread, the controller thread. This thread would merely compare the time each thread takes to process one chunk of data and balance the thread priorities accordingly. Actually, we don't even have to "compare the time"; the controller thread could simply work like this:
IF BUFFER.SIZE() > T
DECREASE_PRIORITY(YIELDING_THREAD)
INCREASE_PRIORITY(PROCESSING_THREAD)
Of course, you could implement better heuristics here, but the approach with a controller thread should be clear.
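A rough sketch of what DECREASE_PRIORITY / INCREASE_PRIORITY might map to with pthreads. The thread handles, priority values, and threshold are assumptions; on Linux, setting SCHED_FIFO priorities like this requires root or CAP_SYS_NICE, and under the default SCHED_OTHER policy you would adjust nice values instead:

#include <cstddef>
#include <pthread.h>
#include <sched.h>

// Hypothetical controller logic; called periodically by the controller thread.
void rebalance(pthread_t yielding, pthread_t processing,
               std::size_t buffer_size, std::size_t T) {
    if (buffer_size > T) {
        sched_param low{};
        low.sched_priority = 1;    // DECREASE_PRIORITY(YIELDING_THREAD)
        sched_param high{};
        high.sched_priority = 10;  // INCREASE_PRIORITY(PROCESSING_THREAD)
        pthread_setschedparam(yielding, SCHED_FIFO, &low);
        pthread_setschedparam(processing, SCHED_FIFO, &high);
    }
}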

Related

multithreaded one read one write time_t

I'm writing a multithreaded application that has 2 threads.
One of the threads receives data from a queue and aggregates it, and the other one sends the aggregated data to a server.
I want to be able to know the last time data was received, so I use:
time_t last_data = time(NULL);
to get the current time on each event (I don't need it to be super accurate, but I need it to be fast), and the other thread then sends this value with the aggregated data.
My questions are:
Do I have to synchronize this, even though it is not very important that I get the most recent update?
I tested it with std::atomic<time_t> and it seems to have some performance issues; is there any other, faster way?
What is the worst case that can happen if I don't synchronize the reads/writes?
Is there a faster way to get the current time than time(NULL)? (It doesn't have to be super accurate.)
UPDATE
Here is an explanation of my application workflow.
Application needs:
1. Consume data from external sources using IPC (currently nanomsg).
2. Aggregate the data to bulks.
3. Send the aggregated data to remote server every given interval (1 second).
Current implementation:
Create 2 buffers to hold the aggregated data (one for receiving and one for sending).
Create a consumer thread to consume data from IPC and fill the receiving buffer.
Create a sending thread that will send the data to the server.
Every iteration of the interval, the sending thread swaps the buffers (swapping pointers, locking with a mutex) and sends the data to the server.
I don't want the consumer to wait on network I/O, so I created this flow; the buffer swap is sketched below.
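For reference, the swap in the last step might look roughly like this (a sketch; Record and take_batch are hypothetical names):

#include <mutex>
#include <utility>
#include <vector>

struct Record { /* one aggregated entry */ };

std::vector<Record> buffers[2];
std::vector<Record>* receiving = &buffers[0];  // filled by the consumer thread
std::vector<Record>* sending   = &buffers[1];  // drained by the sending thread
std::mutex swap_mutex;

// Called by the sending thread once per interval. The consumer thread must
// also hold swap_mutex while appending to *receiving.
std::vector<Record>* take_batch() {
    std::lock_guard<std::mutex> lock(swap_mutex);
    std::swap(receiving, sending);  // the consumer now fills the other buffer
    return sending;                 // the previously receiving buffer, ready to send
}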
Can I use an event-driven approach here instead of this complex mechanism, without all the locking? (Currently it is working fine, but I'm sure it can be better.)
Don't do it that way. You only need one thread. You can use select/poll/epoll. These can wait on your inputs and, at the same time, for your output to finish. You will be doing event-driven programming and non-blocking output. It is something worth learning. It is a bit harder at first, but it soon makes life easier, i.e. you won't have the problem that you have now. Also, the program will be faster.
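A minimal sketch of that single-threaded shape using poll() (the descriptors are assumed to be non-blocking, and the handlers are left as comments):

#include <poll.h>

void event_loop(int ipc_fd, int server_fd) {
    bool sending = false;  // true while a send to the server is in progress
    for (;;) {
        pollfd fds[2];
        fds[0].fd = ipc_fd;    fds[0].events = POLLIN;  fds[0].revents = 0;
        fds[1].fd = server_fd; fds[1].events = sending ? POLLOUT : 0; fds[1].revents = 0;

        int n = poll(fds, 2, 1000);  // 1000 ms: the 1-second send interval
        if (n < 0) continue;         // real code would check errno (e.g. EINTR)
        if (n == 0) { sending = true; continue; }  // interval elapsed: swap buffers, begin send

        if (fds[0].revents & POLLIN)  { /* read IPC data and aggregate it */ }
        if (fds[1].revents & POLLOUT) { /* continue the non-blocking send; clear `sending` when done */ }
    }
}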
Supposing one thread executes:
last_data = time(NULL);
and the other uses last_data, and there is no synchronization event between the two, then there are no guarantees about when, or whether, the revised value of last_data will become visible to the reading thread.
However, the most serious possibility is that the write of a time_t (often a long) isn't atomic, so another thread could read a corrupt, part-written value.
That could cause glitches in delay and time calculations that might foul downstream processing.
You might analyse your program and find that, because the two threads interact, there is a sufficient memory fence at some point that guarantees eventual update.
NB: This is an odd situation, in which I suspect you think something isn't synchronized when in fact it is! The usual experience is the other way around...
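If the value does have to be shared without a common lock, a relaxed atomic is the usual cheap fix. A sketch (the function names are mine, and CLOCK_REALTIME_COARSE is Linux-specific):

#include <atomic>
#include <time.h>

// A relaxed atomic store/load compiles to a plain aligned word access on
// mainstream hardware, so the "performance issues" usually come from the
// default sequentially consistent ordering, not from std::atomic itself.
std::atomic<time_t> last_data{0};

void on_event() {  // consumer thread
    // Linux-specific: CLOCK_REALTIME_COARSE is typically much cheaper than
    // time(NULL), at roughly millisecond resolution.
    timespec ts;
    clock_gettime(CLOCK_REALTIME_COARSE, &ts);
    last_data.store(ts.tv_sec, std::memory_order_relaxed);
}

time_t read_last_data() {  // sending thread
    return last_data.load(std::memory_order_relaxed);
}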
Basically, there's not really enough information to understand what problem you're having.
For example, if the reader thread is the only thread to read the time, I'd expect to see code like:
Thread 1:
If data received, lock L, update time, add to queue, unlock L.
Thread 2:
Lock L; if items in queue, read an item and the time, unlock L .. process item.
In which case the time will be synchronized already.
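A concrete (hypothetical) rendering of that pseudocode, with the timestamp living under the same mutex as the queue:

#include <ctime>
#include <mutex>
#include <queue>

struct Shared {
    std::mutex lock;        // "L"
    std::queue<int> items;  // int stands in for the real item type
    std::time_t last_data = 0;
};

void on_data_received(Shared& s, int item) {  // thread 1
    std::lock_guard<std::mutex> guard(s.lock);
    s.last_data = std::time(nullptr);
    s.items.push(item);
}

bool take_item(Shared& s, int& item, std::time_t& when) {  // thread 2
    std::lock_guard<std::mutex> guard(s.lock);
    if (s.items.empty()) return false;
    item = s.items.front();
    s.items.pop();
    when = s.last_data;
    return true;  // process the item after unlocking
}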
Please provide a minimal, complete, verifiable example...

Multirate threads

I recently ran into a requirement for a multithreaded application whose threads run at different rates.
The questions then become (since I am still learning multithreading):
A scenario to put things into perspective:
Say 1st thread runs at 100 Hz "real time"
2nd runs at 10 Hz
and say that the 1st thread provides data "myData" to the 2nd thread.
How is myData going to be provided to the 2nd thread? Is the common practice to just read whatever is available from the first thread, or does there need to be some kind of decimation to reduce the rate?
Does myData need to be some kind of Singleton with a locking mechanism? Although myData isn't exactly shared, it is updated by the first thread and used in the second thread.
How about the opposite case, when the data produced in one thread needs to be used at a higher rate in a different thread?
How is myData going to be provided to the 2nd thread
One common method is to provide a FIFO queue -- this could be a std::deque or a linked list, or whatever -- and have the producer thread push data items onto one end of the queue while the consumer thread pops data items off the other end. Be sure to serialize all accesses to the FIFO queue (using a mutex or similar locking mechanism) to avoid race conditions.
Alternatively, instead of a queue, you could have a single shared data object (essentially a queue of length one) and have your producer thread overwrite the object every time it generates new data. This can be done in cases where it's not important that the consumer thread sees every piece of data that was generated, but only that it sees the most recent data. You'd still need the locking, though, to avoid the risk of the consumer thread reading from the data object while the producer thread is in the middle of writing to it.
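A sketch of that "queue of length one" (Sample and the member names are placeholders):

#include <mutex>

struct Sample { double value; };

struct Mailbox {
    std::mutex mutex;
    Sample latest{};
    bool has_data = false;

    void publish(const Sample& s) {  // producer (100 Hz thread)
        std::lock_guard<std::mutex> lock(mutex);
        latest = s;                  // overwrite: older data is dropped
        has_data = true;
    }

    bool read(Sample& out) {         // consumer (10 Hz thread)
        std::lock_guard<std::mutex> lock(mutex);
        if (!has_data) return false;
        out = latest;
        return true;
    }
};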
or does there need to be some kind of decimation to reduce the rate.
There doesn't need to be any decimation -- the second thread can just read in as much data as there is available to read, whenever it wakes up.
Does the myData need to be some kind of Singleton with locking mechanism.
A Singleton isn't necessary (although it's possible to do it that way). The locking mechanism is necessary, unless you have some kind of lock-free synchronization mechanism (and if you're asking questions at this level, you don't have one, and you shouldn't try to build one either -- keep things simple for now!).
How about the opposite case, when the data used in one thread need to be used at higher rate in a different thread.
It's the same -- if you're using a proper inter-thread communications mechanism, the rates at which the threads wake up don't matter, because the communications mechanism will do the right thing regardless of when or how often the threads wake up.
Any multithreaded program has to cope with the possibility that one of the threads will work faster than another - by any ratio - even if they're executing on the same CPU with the same clock frequency.
Your choices include:
a producer-consumer container that lets the first thread enqueue data and the second thread "pop" it off for processing: you could let the queue grow as large as memory allows, or put some limit on the size, after which either data is lost or the 1st thread is forced to slow down and wait before enqueuing further values
there are libraries available (e.g. boost), or, if you want to implement it yourself, google some tutorials/docs on mutexes and condition variables
do something conceptually similar to the above, but where the size limit is 1, so there's just the single myData variable rather than a "container" - all the synchronisation and delay choices remain the same
The Singleton pattern is orthogonal to your needs here: the two threads do need to know where the data is, but that would normally be done using e.g. a pointer argument to the function(s) run in the threads. Singleton is easily overused and best avoided unless the reasons for it stack up high...
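For illustration, passing the shared state to a thread via a pointer argument might look like this (SharedState and producer are hypothetical names):

#include <pthread.h>

struct SharedState { /* myData, its mutex, etc. */ };

void* producer(void* arg) {
    SharedState* shared = static_cast<SharedState*>(arg);
    /* ... produce into *shared, under its lock ... */
    return nullptr;
}

int main() {
    SharedState shared;
    pthread_t tid;
    pthread_create(&tid, nullptr, producer, &shared);  // hand the state in
    pthread_join(tid, nullptr);
}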

Best way to handle job cancellation on thread

I wrote a simple job queue that uses a thread to run the jobs in the queue one by one. The thread itself comes from a pool, so its lifetime lasts as long as the job queue object is around. A job is popped off the queue, then run() is called on it, and then it's discarded once finished.
I'm wondering what sorts of paradigms I could use to abort a job mid-process. The naive approach is to have an abort flag which I check at regular intervals. The problem is that some jobs take a while because of blocking I/O or some other computationally heavy task.
Another option I considered was to kill the thread entirely. That is a potentially dirty and error-prone solution.
Are there other ways of doing this?
EDIT: Since I'm in C++ land, is there a way to inject an exception into the other thread? It would immediately break execution and return to the thread main. This would be ideal, I think.
Depending on the thread implementation you use, there may be different ways to manipulate the "abort flag". I would suggest looking toward boost.threads and boost's interruption points.
UPD: That injects an exception into the thread if it is at an interruption point, as you wanted.
But if you have a big, unsplittable block of heavy calculations, then, I believe, it simply has to be run to completion. Think about it: if you can see any "moments" inside this block where you could stop it, then you can split the block into parts, inserting an "abort flag" check at those moments.
So, if it really is a monolithic block, there can't be such moments, you can't interrupt the calculation the normal way, and you have to wait for it to finish.
But you can avoid the waiting problems if you run your heavy block not in a separate thread but in a separate process. Then you can kill it without fear of dirtying your main process's memory; if need be, you can even leave it calculating for hours after your main process has closed many minutes earlier, and then let it silently die. No problems.
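As a concrete illustration of the abort-flag idea checked at such "moments" (a sketch; Job and its members are hypothetical):

#include <atomic>

struct Job {
    std::atomic<bool> aborted{false};

    void abort() { aborted.store(true); }  // called from another thread

    void run() {
        for (int step = 0; step < 1000; ++step) {
            if (aborted.load()) return;    // an "interruption point"
            /* ... one splittable piece of the heavy calculation ... */
        }
    }
};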

Thread or timer to read sensor data out?

My Linux C++ application periodically reads sensor data. Readout is done by a simple file I/O operation (the OS writes to a file, and the application reads from this file).
Some information about my platform:
I have a single-core processor with hyper-threading
the sensor data update frequency is 1 second
the application GUI runs in the main thread and shouldn't be blocked
I considered two approaches for sensor data read out:
a timer running in the main application thread
a separate thread with an infinite loop which does the sensor data readout and then sleeps
Which approach makes more sense, and are there any other alternatives? What are the costs of each solution (e.g. blocking of the main thread in the first, or context switching in the second approach)?
I don't know anything about your application or the hardware, but here are a few things to consider:
If you use a thread, you will have to create a communication channel of some sort to tell the main thread that data has been updated. Usually this would be a pipe(), as signals are inherently unreliable and condition locks don't work with I/O multiplexing (i.e. select()/poll()).
Can you get the entire set of data without blocking? If so, then just reading it in the main thread is probably easier. However, if your read can block, you'll probably need some more "keep track of my read state to incorporate it into my central select()" logic, whereas a thread can just block until more data is available.
Thus, neither solution is automatically "easier" to do.
I wouldn't worry about "context switching" for a read that only occurs once per second; that's irrelevant.
What else does the main thread have to do? Is it OK if it blocks? If so, then you don't need to run the timer, etc. in a separate thread.
If the main thread can't block waiting for the periodic timer, then a separate thread must be created. The communication of data between the threads can be via an object that is accessible to both threads and protected by a mutex (look up pthread_mutex_t), which is quite simple to do.
As for which solution is better and what the costs are, it depends on what else the main thread is doing. But for something this simple, either way should be about the same, and the context switching shouldn't affect anything. What should affect performance the most is how intensive the reads are.
I believe that the cost of a context switch once per second is not an issue, even for a single-core CPU without hyper-threading, especially considering that the application runs in user space and is thus not really time-critical. Polling your sensor in the main thread complicates the logic of the application, so I would recommend starting a thread for that purpose.
A sleep loop will skew the timing, because each iteration is going to take longer than 1 second. Timers don't have that problem; they are made for exactly this scenario. So choose a timer.
Performance-wise there is no difference because you are only triggering once a second.
If the Linux driver reads sensor data and writes it to a device file every second, you shouldn't duplicate the timer logic in your application. It may happen that after a 1-second sleep your application still reads the same data as 1 second ago. A better approach is to have a thread that calls a blocking read on the device file. When new sensor data is available, the blocking read returns, and the thread can process the data and call read again.
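A sketch of that blocking-read thread (the device path and SensorShared are assumptions; the driver is expected to make read() block until fresh data arrives):

#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

struct SensorShared {
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    char data[64];
};

void* sensor_thread(void* arg) {
    SensorShared* shared = static_cast<SensorShared*>(arg);
    int fd = open("/dev/sensor0", O_RDONLY);  // hypothetical device file
    if (fd < 0) return nullptr;
    char buf[64];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  // blocks until new data
        if (n <= 0) break;
        pthread_mutex_lock(&shared->mutex);
        for (ssize_t i = 0; i < n; ++i) shared->data[i] = buf[i];
        pthread_mutex_unlock(&shared->mutex);
        // Notify the GUI thread here, e.g. by writing to a pipe().
    }
    close(fd);
    return nullptr;
}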

Threading in an endless C++ program

I have a web interface where the user submits some data and it gets written to a database. In the background, a C++ program periodically checks the database for new entries. It takes these entries, processes them, and writes the results to a directory; it then sleeps and resumes checking for new entries to process.
My question is about adding multithreading to this C++ program. I have read that it's generally a bad idea to create a new thread every time you need another job done, and that instead you should add the jobs to a queue and disperse them to a fixed number of threads that have already been created (say, 5 or so). Is this the proper design route to take for my situation? Also, if I understand pthread_join correctly, I don't actually need to call it, because I don't want to wait for all of the jobs to finish before continuing to check for new updates to the database.
I just wanted to make sure I'm headed in the right direction, any affirmations/criticisms/resources?
You should first decide whether you even need more than one thread - it sounds like checking the database and writing files at some given interval can be accomplished using only one thread. Multiple threads become useful when you have to write different data to multiple files simultaneously at non-regular intervals. You are correct that using a queue of sorts is the best way to distribute these 'jobs' to your threads, and that using a thread pool gives you a little more control over how many 'jobs' you want running simultaneously at any given time. The pthread_join method is used when you want to make sure one thread doesn't exit before another - I've used this mostly to make sure that the program's initial thread doesn't exit after creating the thread pool, as when the parent thread exits the program's execution stops. Some pseudo code based on my comments is below.
main thread:
spawn child threads
while(some exit condition){
    check database for new jobs
    if(new jobs){
        acquire job queue mutex    // the mutex ensures only one thread accesses shared data at a time
        add job to queue
        signal on shared condition variable
        release job queue mutex
    }
    sleep(some regular duration)
}
child thread:
while(some exit condition){
    acquire job queue mutex
    while(job queue's size == 0){  // a while, not an if: re-check after waking to guard against spurious wakeups
        wait on the shared condition variable
    }
    grab job from queue
    release job queue mutex
    handle job
}
See here for pthread/mutex/CV usage notes.
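A compact (hypothetical) pthread rendering of the pseudo code above; note the while loop around pthread_cond_wait, which guards against spurious wakeups:

#include <pthread.h>
#include <queue>

struct Job { int id; };  // placeholder for the real job type

std::queue<Job> jobs;
pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  queue_cv    = PTHREAD_COND_INITIALIZER;

void add_job(const Job& job) {  // main thread
    pthread_mutex_lock(&queue_mutex);
    jobs.push(job);
    pthread_cond_signal(&queue_cv);
    pthread_mutex_unlock(&queue_mutex);
}

void* worker(void*) {  // each child thread
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        while (jobs.empty())  // re-check the predicate after every wakeup
            pthread_cond_wait(&queue_cv, &queue_mutex);
        Job job = jobs.front();
        jobs.pop();
        pthread_mutex_unlock(&queue_mutex);
        /* ... handle job ... */
    }
    return nullptr;
}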
In my experience, creating a thread will most likely take tens of milliseconds. On today's computers this is not a big deal; nothing bad will happen if threads are created and destroyed often. Looking for a simple and flawless application-level design may be more important.
As a possible variant, I would recommend considering a pool of threads, one thread per available CPU core. These threads should simply sleep at the end of the loop and regularly check if there is something to do or not.
This simplistic design will add minimal overhead and allow using all available CPU power at the same time.
My 2 cents.